The Big Data Beard team takes a crack at Machine Learning

Machine learning is one of the hottest IT and data analytics buzzwords of the year.  In fact, CNBC ranked it among the top 4 tech buzzwords of 2017, which means there must be some relevance to it.  Personally, I have been so focused on other big data and analytics technologies that I haven’t paid machine learning its due attention.  I was in Austin a few weeks ago talking to a few colleagues about this cool new company called DataRobot and how they were making machine learning more accessible to everyone.  Given my lack of data science skills, this seemed like as good a spot as any to get my feet wet.  After a quick search on the DataRobot University page, I found a two-day essentials course that was taking place the following week in Boston… Perfect.

Before I get into the details of the training, here is a little bit about DataRobot.  In a nutshell, DataRobot is an automated machine learning platform that enables a wider set of users, not just data scientists, to create predictive models.  In doing so it democratizes machine learning, allowing companies to put their business analysts and other data-driven staff to work on machine learning projects rather than relying solely on data scientists, who are both expensive and in high demand.  This means more companies getting more out of their data and driving tangible business results!

I took the DataRobot Essentials training with two of my bearded friends, Cory Minton and Kyle Prins.  Both left their homes in the South and braved the Boston winter to accompany me in what was a fun and educational two days.  I’m not going to dive into much of the training logistics other than to say it was a two-day course running from 9 to 4 each day, and it cost $1,000.  We started day 1 going over the 6 steps of the machine learning life cycle, which served as the template for the modules over the next two days.  Right away we got access to the platform and were able to load data, test models and play with the DataRobot GUI.  We used some fabricated data on loan applications to reinforce the training modules and then had a final project where each team had to create its own predictive model based on healthcare data.  The class culminated with the teams sharing their models and seeing which team came up with the most effective one.  Oh yeah, there was a test at the end too, nothing crazy.

On the last day, I decided to put DataRobot to the test and see if it was truly as easy as advertised for a machine learning novice.  I put together a table of wine ratings and weather data covering the last 100 years for Bordeaux, France.  Due to lack of time, the data was all fabricated, but it was representative of what a data set should look like before loading it into DataRobot.  I then ran the data through DataRobot, blended and fine-tuned the different pre-set models and chose the one that would give the best prediction.  When I tested the model, it came within .01 variance.  Now I just need to find 100 years’ worth of real weather data and I might be able to make some good bets on wine futures.  This just shows how easy the platform makes creating machine learning models.
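
For anyone curious what "try several models and keep the one that predicts best" looks like in principle, here is a minimal sketch in plain Python.  It is not DataRobot's API and not what the platform does internally; the data is fabricated, and the function names and the two candidate models (a mean-only baseline and a one-variable straight-line fit) are my own assumptions for illustration.

```python
from statistics import mean

# Fabricated (growing-season temperature in C, wine rating) pairs.
# Illustrative only; this is not real Bordeaux data.
rows = [(15.0 + 0.1 * i, 80.0 + 0.08 * i + (0.3 if i % 2 else -0.3))
        for i in range(40)]

# Hold out every fifth row so models are scored on data they never saw.
holdout = [r for i, r in enumerate(rows) if i % 5 == 0]
train = [r for i, r in enumerate(rows) if i % 5 != 0]


def fit_mean_baseline(data):
    """Candidate 1: always predict the average training rating."""
    avg = mean(y for _, y in data)
    return lambda x: avg


def fit_linear(data):
    """Candidate 2: ordinary least-squares line, rating = a + b * temperature."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in data)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def mean_abs_error(model, data):
    """Average absolute gap between predicted and actual ratings."""
    return mean(abs(model(x) - y) for x, y in data)


# Fit every candidate on the training rows, score each on the holdout rows,
# and keep the one with the lowest error.
candidates = {"mean_baseline": fit_mean_baseline, "linear": fit_linear}
scores = {name: mean_abs_error(fit(train), holdout)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

The point of the holdout split is that a model is judged on rows it was not trained on, which is the only honest way to compare candidates.  A platform like DataRobot automates this kind of comparison across far more model types than the two toy ones here.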

I am going to wrap up this blog with some of my takeaways from the training and DataRobot:

– They have a cool platform that makes it really simple to ingest datasets, model and predict.  They do a great job of removing a lot of the complexity through automation while still allowing the user to fine-tune the models to get the most accurate prediction.

– Practically any analyst can use DataRobot with minimal training and can produce predictive models.  I was highly skeptical about this claim going into the class but was proven wrong after being able to create my own model in just two days.

– That being said, to be successful, projects still take a while.  The 6 steps of machine learning include defining project objectives and data sets, which alone could take weeks.  In addition, users still need to understand some statistical theory in order to fine-tune their models.

– DataRobot does not address the hardest part, which is data wrangling.  If you don’t have all your data in one table or it’s not consistent, DataRobot will not work.  Users will need to spend time getting all the data they defined as necessary for the project into a single, consistent table.

– The training is a little expensive for what you get out of it, but then again, I am not looking to use DataRobot full time, just play with it every now and then.  If you are looking to use DataRobot full time, it’s a great course to take.
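
To make the data wrangling takeaway above a bit more concrete, here is a rough sketch of the kind of prep work that has to happen before any modeling tool sees the data: joining two sources on a shared key, dropping incomplete rows and converting text to numbers.  The table layouts, column names and values are all hypothetical and fabricated, and this is plain Python, not anything DataRobot provides.

```python
# Hypothetical source tables keyed by vintage year; all values are fabricated.
ratings = {2001: 91, 2002: 88, 2003: 94}
weather = {
    2001: {"avg_temp_c": "17.2", "rain_mm": "610"},
    2002: {"avg_temp_c": "16.1", "rain_mm": None},   # missing rainfall
    2004: {"avg_temp_c": "17.9", "rain_mm": "580"},  # no matching rating
}


def build_training_table(ratings, weather):
    """Join both sources on year and return one consistent list of rows."""
    table = []
    # Keep only years present in BOTH sources.
    for year in sorted(ratings.keys() & weather.keys()):
        w = weather[year]
        # Skip rows with missing measurements rather than guessing values.
        if any(value is None for value in w.values()):
            continue
        table.append({
            "year": year,
            "avg_temp_c": float(w["avg_temp_c"]),  # text -> number
            "rain_mm": float(w["rain_mm"]),
            "rating": ratings[year],
        })
    return table


table = build_training_table(ratings, weather)
for row in table:
    print(row)
```

Notice that three years of ratings shrink to a single usable row once the gaps are accounted for.  Deciding whether to drop, fill in or go back and find missing values is a judgment call, and that is a big part of why this step eats so much project time.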

If you are interested in machine learning, I recommend you check out the DataRobot training.

Your bearded friend,

– Brett