Title: Everyone is a Unicorn with DataRobot
Host: Cory Minton
Co-Host: Kyle Prins & Thomas Henson
Guest: Dr. Greg Michaelson
In episode 16 of the Big Data Beard Podcast, Cory, Kyle, and Thomas talk with Dr. Greg Michaelson, VP at DataRobot, about all things machine learning. Machine learning was one of the top trends in 2017, and 2018 looks to be no different, but with the shortage of data scientists, organizations are limited in what they can do. Greg and the team explore DataRobot and how their platform allows organizations to expand their machine learning capabilities through the democratization of data science.

Registration is open for #StrataData San Jose.


[0:00] Welcome to 2018, folks. I’m your host for this Big Data Beard podcast, and Kyle and Thomas, along with their beards, are along for this fun ride.
As everybody is probably aware, machine learning is easily one of the hottest topics in analytics and big data, not only for the back half of 2017; it’s definitely the hottest thing happening in 2018.
And I’m sure it’s going to stick around for a while. A handful of us who are students of this big data space and the environment are fans of machine learning. We think it’s a cool trend and think it’s going to have a lot of power, so we’ve started poking around and looking at companies doing interesting work to accelerate machine learning in the enterprise. We wandered into these folks at DataRobot, and we thought, it looks like they’re doing some pretty interesting work simplifying how organizations and, frankly, aspiring data scientists can leverage machine learning in real life. And so, a real-life unicorn.
With us, at least in his pictures, is a very robust big data mustache. His name is Dr. Greg Michaelson. He’s the VP at DataRobot leading their work for banking customers and the DataRobot Labs teams.
Greg, welcome to the show, sir.
[1:21] Hey, thanks for having me, although I’m sad to report that my mustache did not survive the Christmas holidays. It was a casualty of the New Year’s resolution.
[1:31] I have to assume by casualty you mean that it got taken off as you were trying to eat a turkey leg and it got pulled off. That’s the only acceptable answer.
[1:36] Shaved off. Every mustache has a lifespan, and, you know, you wake up one day and that’s all she wrote for the mustache.
[1:49] See, now here’s a play. We’re going to teach you this play from my playbook: if you brand something like Big Data Beard as your thing, then it’s really hard for the wife or anyone to tell you you’ve got to shave it off, because,
I mean, there’s, like, stickers of the likeness on there.
[2:06] Yeah, my wife once told me that she wasn’t going to kiss me until I shaved my mustache.
[2:12] Yeah, well, you know, you’ve got to respect the ladies. So Greg, obviously we’re fans of DataRobot. We started doing some trolling on you, and one thing I pulled out of your bio that I thought was interesting is that you have some ties to the state of Alabama: you got your PhD at the University of Alabama. So are you one of the PhDs from Alabama who love college football and are excited about college football championships?
[2:38] Yeah, man, I went to LSU too. I did my undergrad at LSU, so, you know, occasionally my undergrad will play my grad and we’ll see which part of me is the best football team in the country.
[2:51] And I’ll follow that. I think, you know, from a mass perspective, we’re seeing a lot of data science and big data people in the Southeast, in the Heart of Dixie, practicing this stuff.
[3:05] Amen brother.
[3:06] Greg you live in Charlotte North Carolina now is that right.
[3:10] I do.
[3:11] Alright, well, let’s talk a little bit about DataRobot.
So, to give a little disclaimer: Kyle and I and one of our other Big Data Beard guys went to the DataRobot Essentials course a couple weeks back because we were just totally interested in how this thing works.
And I think the general statement that you guys use to describe the platform is the democratization of data science.
Tell me what that means. What is DataRobot really doing to make data science approachable?
[3:46] Yeah, I actually read an article this morning from Glassdoor or LinkedIn or so, I can’t recall where it was from, but it mentioned that data science salaries have topped out. They’ve stopped going up, even though they’re still really high. And that made me think that there’s some risk here. Even though there is a ton of demand out there in the market for data scientists and for data science, because companies are collecting a ton of data and they obviously want to monetize that data and use it to help make their businesses run cheaper and be more efficient and so on,
there’s a couple of problems. One is you can hardly find these people, right? They all work at Google or Facebook or whatever, they’re impossible to hire, and they change jobs every 9 months for 30% more money, so they’re super hard to get.
The other is, I mean, most of them are space cadets, right? And I speak from personal experience. These guys, you know, they like to solve problems, they like to write computer code. They’re not bankers, they’re not hospital administrators, they’re not, generally speaking, business people. And so I constantly worry that executives out there
are going to get sick of this one day. They’re going to say, you know what, we invested in data scientists, PhDs, and we’re just not getting any value.
[5:17] Right. I think AIG did something like this a few years ago: they kind of doubled down on setting up a science department,
and then, some time later,
they just laid them all off, right, because they weren’t getting any ROI. So that’s kind of the place where DataRobot plays a bit. What we’ve done is we’ve automated the technical pieces
of the data science role: training the models, pre-processing the data,
handling missing values, tuning the different models, picking out which algorithms to use, partitioning your data, all that kind of technical stuff that your PhDs normally have to do. We automated it
so that you can take business people who actually know the business problems and use them to solve the problems that they’re really the experts in. That’s what we try to do.
[6:10] Okay, that’s perfect. I’m going to unpack this a little bit, because one of the things that is interesting to me is that we have a lot of data, that’s one thing.
But once we have the data and the cleansing is done, the process of applying machine learning to a data set
is pretty tough, right? There’s a bunch of different libraries that you could use, and there could be a lot of challenges in trying
to figure out which models are right. So how do you guys do that better? I mean, you clearly have some experience building models, but
really, that model building and selection thing, why is DataRobot unique there? Because I think your history and where you guys came from may have something to do with that.
[7:00] Yeah, well, it turns out that the secret to building really good models is not
being an expert in one type.
Right, so maybe if you’re playing in, like, the computer vision space or something like that, I mean, in that space neural networks are kind of the only show in town.
But if you’re building, like, a fraud model, or you’re trying to
predict readmission in the hospital, or you’re doing drug discovery or whatever it is, there’s no rule of thumb to say, oh, you should use XGBoost here, or you should build a linear model, or whatever it is. The only way to know what’s going to work the best is to try everything.
And so our roots are kind of in this thing, you guys are familiar with it I’m sure, called Kaggle.
[7:48] Competitive data science right.
[7:50] It’s like Airbnb for data science, right?
[7:53] I’ve never heard it said that way. I was thinking it was more like a massive online game for nerds.
[8:07] Totally. It turns out the secret to winning those things is just to try
as many things as you can, and you’ll discover the thing that works for your particular dataset, for whatever reason, and that’s how you get good models. So it just tries everything,
using kind of brute force and also some heuristics and some smart things that we’ve baked into the platform. It’s basically AI building AI, which is pretty cool.
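A minimal Python sketch of the "try everything and keep what works" idea Greg describes here. It is an illustration only, not DataRobot's implementation: the three toy model families, the data, and the names `CANDIDATES` and `leaderboard` are all assumptions made up for this example. Each candidate is fit on training data and ranked by its error on separate validation data.

```python
import statistics

# Three deliberately simple, hypothetical model families.
def fit_mean(xs, ys):
    m = statistics.fmean(ys)
    return lambda x: m  # always predicts the training mean

def fit_linear(xs, ys):
    # Ordinary least squares for a single feature.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    # 1-nearest-neighbour: copy the target of the closest training point.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    return statistics.fmean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])

def leaderboard(candidates, train, valid):
    """Fit every candidate on train, score on valid, best (lowest MSE) first."""
    scores = [(mse(fit(*train), *valid), name) for name, fit in candidates.items()]
    return sorted(scores)

CANDIDATES = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}

# Toy data: y = x^2, with training and validation points kept separate.
train = ([0, 2, 4, 6, 8, 10], [0, 4, 16, 36, 64, 100])
valid = ([1, 3, 5, 7, 9], [1, 9, 25, 49, 81])

for score, name in leaderboard(CANDIDATES, train, valid):
    print(f"{name}: validation MSE = {score:.1f}")
```

A real system would search far more model types and tuning settings; the point of the sketch is only that the winner is chosen by measured validation error rather than by a rule of thumb.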
[8:36] Von Neumann probe.
[8:38] I don’t know what that means.
[8:40] Look it up: these are self-replicating spacecraft that take over galaxies. Or maybe it’s just called Skynet.
[8:48] Yes sure why not.
[8:50] So what DataRobot does is a very focused thing, because one thing that we learned in that class was that there are some things that have to happen before
DataRobot is leveraged, which is really around this concept of data wrangling. Is that fair to say?
[9:12] Yeah, I think there’s a couple of pieces to it. One is, data that you get, wherever you are, is going to be a mess,
right? You’re going to have missing values, you’re going to have, you know, just weirdness about your data that you can’t discover until you start down this process. So being able to iterate really fast is important.
And then the other piece that’s important is being able to
partition your data appropriately. I don’t know how technical you want to get, but one of the secrets to
building these models in a safe way is to never evaluate how well a model works
on data that was used to train it. So you have to partition out, set aside, some data at the start in order to have kind of a holdout or a test sample that you can use to see how well your models did.
So yeah, it’s another piece of the whole data wrangling pie that DataRobot takes care of for you.
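The holdout rule Greg describes can be sketched in a few lines of Python. This is a generic illustration, not DataRobot's partitioning logic; the function name `partition`, the 20% holdout fraction, and the fixed seed are assumptions chosen for the example.

```python
import random

def partition(rows, holdout_fraction=0.2, seed=0):
    """Shuffle rows once and set aside a holdout sample.

    Returns (train, holdout). The holdout rows are never shown to the
    model during training, so scoring on them gives an honest estimate.
    """
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_holdout = max(1, int(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

train_rows, holdout_rows = partition(range(100))
print(len(train_rows), "training rows,", len(holdout_rows), "holdout rows")
```

The split happens before any model is trained, which is why it belongs with data preparation rather than with evaluation.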
[10:10] Greg, with this tool, DataRobot, do you focus on the data scientist, as a tool for them? Because
data scientists are really hard to find and really popular. So is this something that a data engineer can start looking at and running with, or do you see
this being something that the data scientist is going to use to augment their work?
[10:31] You know, it’s funny you should ask that question. I would say probably 60% of our users,
I’m guessing, have never done data science projects before. So, you know, we went in with one of the biggest banks
in the world and started working with an organization, actually a sales organization, that was trying to increase efficiency and boost revenue and so on, and we worked with
guys and gals that had never done data science before. 90% of our users at that bank had never done this kind of work before, and within the first year they generated, you know, 10 to 20 million dollars in additional revenue just from prospecting better.
So that’s a subset of our users, kind of these new analysts that are data savvy. Maybe they spend most of their time in Excel,
you know, kind of doing that analyst-type work.
The rest of our users are your data scientists, right, with varying levels of skill. I find
that working with data scientists is a bit of a mixed bag,
because there’s some sort of a mix of skepticism and fear, I think,
with a tool like DataRobot, because we automate a big chunk of the work that a lot of data scientists do today.
[12:02] And so, you know, there’s a portion of data scientists who see DataRobot and they go,
holy crap, what am I going to do? Is my job going to be here? And then there’s a bigger chunk of data scientists that look at DataRobot and say,
well, I can accomplish 10 times more than I could yesterday using a tool like this.
[12:23] Yeah, that’s what it seems like. Like you said, there’s not nearly enough data scientists anyway, so maybe they spend their time actually focused on developing the ROI rather than developing the model for the sake of model development.
[12:38] Yeah, I mean, it’s a trend, right? In the data visualization space over the last five years we’ve had tools like QlikView and Tableau come about that have democratized the whole management reporting piece.
On the data prep and ETL side we’ve got tools like
Alteryx or Paxata or Talend or some of these others that have made it possible
for folks that can’t code to build data sets and so on. DataRobot has done the same for data science. It’s certainly the case that over the next five years, you know, technical people are going to be forced to become more business-oriented,
and business people are going to be forced to become more technical.
[13:21] Yeah, one of the things we’ve heard over and over again is, like,
you know, machine learning is cool, but until it’s applied it isn’t anything. We’ll get to some of the ways you do it, but one of my core
arguments, you know, is that where I think we’ll see things like machine learning and AI have the most impact is where intelligent software companies use it to power
either building other ML, or they use it to simplify other very technical jobs
already in place, right? Automation on steroids, to quote Andrew Ng. I think a lot of this is going that way.
I feel like what you guys are doing with machine learning to democratize it
is cool, but it also feels like it’s the nearest-term chance machine learning has to find its way into the enterprise.
[14:13] Yeah, there’s a lot of places, right? I mean, it’s certainly not new that these kinds of predictive models replace
some of the more manual work. I mean, 25 years ago when you went and got a quote for your auto insurance, you’d sit down with an agent or an underwriter and they’d, you know, fill out some forms and estimate your risk and so on. But today
that doesn’t happen, right? It’s all a predictive model.
So, you know, this is kind of the natural extension of that. If you look at a factory,
right, a manufacturing plant or something like that, why in the world wouldn’t you hook up automated machine learning to all those sensors that are taking readings in order to do predictive maintenance and keep that thing up and running?
You can build thousands of models to predict what parts are going to break and replace them just in time, without having to, you know, go through all that complex analysis and so on.
[15:08] Greg, so if we’re talking about, you know, the ability for users that are data savvy but are not data scientists to use DataRobot and get going with machine learning, because, I mean, one thing that really stuck out is you said democratizing data, so what
skills do these data-savvy engineers or analysts need to have in order to use this and look like a data scientist?
[15:30] The biggest one is knowing the business, right? I mean, if you think about the standard definition of a data scientist, there’s those 3 skill sets. You can picture that Venn diagram that everybody’s seen:
the meeting of coding and algorithms and business expertise.
I think the traditional data scientist has the most trouble with the business expertise side, and that is arguably the most important piece of the puzzle.
It turns out that if you can take somebody who knows the business, knows the problems, right, they’re losing sleep over, you know, whatever it might be,
revenue or operating expenses or risk, whatever, if you can give them the ability to spot opportunities to convert those business problems into real
ML problems, like machine learning and AI solutions, then, you know, you’re 80% of the way there, because
automation and tools can get them the rest of the way. So it’s not really a technical challenge anymore. Obviously knowing the data is important,
and, you know, having access to it and all the data infrastructure and so on that goes along with it, but
really 90% of the battle is knowing the business and being able to frame the problem.
[16:48] So once I’ve got a good data set, you know, it’s organized in a decent way that I can bring it into DataRobot,
and then you either upload it there or you give DataRobot access to a file, then
you can do some of, like you said, that data cleansing, missing values, that kind of stuff.
Then it goes into running the meat of what DataRobot does, which is it goes in and parses the data into logical sets, like you talked about, holdouts, right, so you can do some training, you can do some validation
later. And then after that, once I’ve got some models and I’ve figured out which models are best,
well, what do I do then? What does DataRobot help me do from the point of, I’ve got a model that I’ve identified as good, how do I actually put that into practice?
[17:40] Yeah, really good question. It turns out that’s the second hardest part of this process.
It’s the first hardest once you automate all the technical data science stuff. And it’s different for every use case, so your deployment options have to be
very flexible, because all use cases are going to have somewhat different implementations.
So if you imagine a fraud use case, right, where I’m a transaction processor, like a point-of-sale type transaction processor, and I want to block fraudulent transactions,
well, I need to do that in real time, right? I need to do it when the person is standing there at the checkout register. If it’s a fraudulent transaction, I want to block it before the person pays.
So that’s a real-time use case. That means it has to be highly available, it has to be redundant, it has to never fail, it has to be on all the time, and it has to respond in microseconds and so on.
That’s kind of one end of the spectrum, right, and there’s lots of stuff in that real-time space, things like high-frequency trading or ad bidding or
whatever it might be. You know, on the far other side of the spectrum is,
like, nightly batch-type processing. Maybe I have a set of customers and, on a weekly basis say, I want to score them for churn risk, right? I want to try to predict who’s going to leave me for a competitor.
[19:12] Well, I don’t need to do that in real time and I don’t need to do it every day. Maybe I need to do it
weekly or monthly or even quarterly, something like that, depending on the business. So, you know, those two use cases are both worth tons of money, but they have a really, really different
implementation path. So I don’t think you can focus on one
particular setup. That’s one of the mistakes that we actually see a lot: organizations have sort of the happy path, the one path that they deploy these models with, and it doesn’t really work for
either one of those solutions. So you have to be flexible, I guess is what I’m saying.
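To make the contrast between the two deployment paths concrete, here is a small Python sketch in which one scoring function is wrapped two ways: a per-transaction decision for the real-time case and a ranked list for the periodic batch case. Everything in it is hypothetical: the weights, the feature names `amount` and `foreign`, the 0.5 threshold, and the function names are made up for illustration and do not come from the episode or from DataRobot.

```python
import math

# Hypothetical logistic-regression weights; a real model would be trained.
WEIGHTS = {"amount": 0.002, "foreign": 1.5}
BIAS = -4.0

def predict_risk(row):
    """Return a risk score between 0 and 1 for one record."""
    z = BIAS + sum(WEIGHTS[name] * row[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def decide_realtime(row, threshold=0.5):
    """Real-time path: one record in, one immediate decision out."""
    return "block" if predict_risk(row) >= threshold else "approve"

def score_batch(rows):
    """Batch path: score a whole list and return ids, riskiest first."""
    scored = [(predict_risk(row), row["id"]) for row in rows]
    return [row_id for _, row_id in sorted(scored, reverse=True)]

records = [
    {"id": "a", "amount": 100, "foreign": 0},
    {"id": "b", "amount": 2000, "foreign": 1},
    {"id": "c", "amount": 1000, "foreign": 0},
]
print(decide_realtime(records[1]))
print(score_batch(records))
```

The model is identical in both paths; what differs is everything around it. The real-time wrapper would need availability and latency guarantees, while the batch wrapper only needs to finish before the next scheduled run.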
[19:52] Yeah, so one thing that we also know is true, as I’ve heard it, is that every algorithm is wrong, but many of them are inherently useful.
[20:06] He’s at.
[20:07] But also, their usefulness, doesn’t it degrade over time?
[20:12] They do, so you have to watch them. In fact, there’s a lot of regulation in the financial sector that actually forces organizations to do that kind of monitoring for their models, you know, in the credit scoring space
for example. Really all models within banking institutions, depending on the level of risk associated with them going wrong,
have to be monitored, right? Are they still making accurate predictions? Is the population different? Has the world changed? Because, you know, events happen in the world, and
to the extent that the relationships between the features that you include in your models and whatever outcome you’re trying to predict change, then all of a sudden your models aren’t going to be working anymore. So keeping an eye on them is really important.
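One common way to check "is the population different?" is the Population Stability Index (PSI), which is widely used in credit scoring. The Python sketch below is an assumption about how such a check could look, not something named in the episode or taken from DataRobot. The bin edges, the toy data, and the small floor used to avoid taking the log of zero are all choices made for this example.

```python
import math

def bin_shares(values, edges):
    """Share of values falling in each bin defined by sorted edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[sum(v >= e for e in edges)] += 1   # index of the bin v lands in
    # Floor each share at a tiny value so empty bins do not break the log.
    return [max(c / len(values), 1e-6) for c in counts]

def psi(expected, actual, edges):
    """Population Stability Index between a baseline and a recent sample."""
    e = bin_shares(expected, edges)
    a = bin_shares(actual, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

EDGES = [0.25, 0.5, 0.75]
baseline = [i / 100 for i in range(100)]         # feature values at training time
recent = [min(v + 0.3, 0.99) for v in baseline]  # same feature, shifted upward

print("no change:", psi(baseline, baseline, EDGES))
print("after shift:", psi(baseline, recent, EDGES))
```

A commonly cited rule of thumb treats a PSI above roughly 0.25 as a significant shift worth investigating, though the exact threshold is a policy choice. PSI only looks at the inputs; tracking prediction accuracy once outcomes arrive is the complementary check.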
[21:00] And one of the other challenges with a lot of this machine learning, and the trend towards AI, is the ability to audit what’s actually going on under the covers.
[21:11] There’s actually been a ton of progress in that space. I think all of us have probably heard the word black box before. You guys heard
that phrase? I hate it, because depending on who you’re talking to, they’ll use that word like an arrow that they’re firing at you, a flaming arrow.
So if you talk to statisticians, they’ll use the word black box to describe any machine learning technique, like a random forest or something like that.
If you talk to a business person, then all models are black boxes, you know, whatever, right? But it turns out that
the technology exists to understand how these models work.
Things like partial dependence and sensitivity analyses and, you know, reason codes and so on.
There’s plenty of ways to understand the inner workings of even your most complex deep neural networks.
You know, certainly there are models that are less interpretable than others, and there’s a trade-off when it comes to model selection in terms of picking something that’s simple and explainable versus accurate, right?
Those things sit on a spectrum.
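Partial dependence, one of the techniques Greg mentions, can be sketched without knowing anything about a model's internals. In the Python illustration below, the `black_box` model, the feature names `income` and `age`, and the data are all hypothetical. The idea is to force one feature to a chosen value for every row, average the model's predictions, and repeat across a grid of values.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction as `feature` is forced to each value in `grid`.

    `model` is treated as a black box: any callable taking a dict of
    feature values and returning a number.
    """
    curve = []
    for value in grid:
        # Overwrite the one feature, leave every other feature as observed.
        preds = [model({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve

# Hypothetical opaque model, standing in for something more complex.
def black_box(row):
    return 3 * row["income"] + (10 if row["age"] >= 40 else 0)

rows = [{"income": 1, "age": 30}, {"income": 2, "age": 50}]

for value, avg in partial_dependence(black_box, rows, "income", [0, 1, 2]):
    print(f"income={value}: average prediction {avg}")
```

Because it only calls the model, the same function works for a linear model, a random forest, or a neural network. Partial dependence shows the average effect of one feature and can hide interactions between features, so it is usually read alongside other diagnostics.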
[22:28] It’s like you can have good, fast, and cheap, but you have to pick two.
[22:32] Exactly. So, simple or accurate. Certainly it was that way more in the past, but
the tools have really come a long way as far as interpreting how these models actually work, even when they’re really quite complex.
[22:48] Interesting. So DataRobot takes this concept of, okay, I can build a model and I can visualize what the model is, and then I have the option of, if I want to do this real time,
you know, I want to deploy that fraud detection use case, or I want to do that batch.
When you say it deploys, is it opening up, like,
some sort of an API that I can integrate into some sort of commercial or proprietary application? Like, how do I actually put it into action?
[23:19] Yeah, there’s four main ways to do it inside of DataRobot. One is a GUI-based approach. So if I’m an Excel user
and I have some kind of process that I manually do, I can just drag and drop a file into the GUI. Super manual, not super repeatable,
but it exists, right? So if you don’t want to write any code, if you want kind of the easy way, then
the GUI is there. The second, like you say, is an API approach. My personal view is that using an API for deploying predictive models
is an excellent way to do it, mainly because it keeps you from having to implement scoring code, so it eliminates the possibility of implementation errors.
So there’s lots of good reasons to use an API:
low latency, high throughput, horizontally scalable, all that kind of stuff.
So that’s the second. The third is Spark. So, you know, if you’re scoring a giant mountain of data, then distributed scoring using something like Hadoop with Spark
is a super fast, super good way to do it.
And the last way to get predictions out of DataRobot is just to dump code. We’ll give you raw Java code, so if you want to,
yeah, literally see exactly every step that DataRobot is taking, you just dump the code directly out and implement that directly. That’s a more technical approach, right? I mean, that’s going to require some engineering work to get it up and running.
[24:58] It’s literally code, right? It needs to be compiled and maintained and so on. So if you want the code, you can have it, and that’s good for, you know, like an offline application, or if I need
ultra, ultra fast embedded predictions for, like, a high-frequency trading use case or something like that. So, you know, that’s the fourth option.
[25:21] Interesting. So one thing you brought up was deployment model. When I was
on your website, I noticed that you very much have a cloud service, and I’m guessing that’s probably the entry point for a lot of your new customers. They want to try it out; they don’t want to have to invest in
deploying, you know, any sort of platform themselves. They just want to play with it and tinker, which is cool. But
do you also offer, like, are there large organizations that say, look, I don’t want to send out all of this data that I want to run these models against, I want to do this, you know, within my security, in my data center, on my gear?
Is there a way for organizations to do it both ways, or is it all or none, is it only in the cloud? How do organizations actually deploy
[26:11] DataRobot in the context of their enterprise strategy?
[26:14] Yeah, it turns out that’s a really good question, and organizations have tackled it differently.
The smaller an organization is, or the newer it is, the more friendly they are towards the cloud, which I suppose is not surprising.
And the more sensitive data an organization uses, health data, personal credit information, that kind of thing, those guys
tend to want to have an on-prem solution. So about two or three years ago we kind of made the commitment to be available however you want us.
So we’ve got a cloud solution, and probably
60% of our customers use the cloud. All the big customers and all the banks and health organizations use on-prem solutions. So DataRobot can install on
clusters, on Linux servers; there’s lots and lots of ways to get DataRobot to work
in your environment. We don’t need an internet connection to run. DataRobot never tries to phone home when it’s on-prem and so on, so we can be fully behind a firewall and that’s not a problem.
[27:27] Sure, you bring that up; that’s actually one of the really intriguing things I learned about DataRobot, in terms of usefulness and enterprise adoption. One of the things we saw over the last five years was a lot of organizations, much like your AIG example previously, where they invested heavily in data science because they thought,
everybody’s doing it, we’ve got to do it. A lot of people invested in these big Hadoop clusters because they were like, there’s this data deluge, we’ve got to have a place to store it. This technology, whether it’s
Cloudera or Hortonworks or whatever, this Hadoop stuff is where everybody seems to be storing it, because then we can analyze it. And there was kind of this
huge investment, and people had built these clusters, and then they struggled sometimes with getting value
out of, one, all the data, and, two, the investments in technology that they made to basically build these big siloed data analytics environments. To me it seems like
DataRobot going in and coexisting with existing Hadoop clusters
has to be one of those target go-to-markets, and probably one of the best places for people who are,
like, your customers looking around going, well, I’ve got this giant Hadoop cluster and nobody likes it anymore.
[28:40] You’re so right there. That’s exactly what we saw, you know, two or three years ago as we were starting out in the on-prem space: organizations have spent millions of dollars on commodity hardware to run Cloudera, Hortonworks,
whatever distribution of Hadoop, and the cores are just sitting there,
right? They’re just not being used. No compute, right? Maybe the data nodes are getting used, but the compute is just kind of
dormant. So yeah, we’re as tightly integrated with Hadoop as you can be.
You know, we integrate with the security side; on Cloudera we’ll integrate with Sentry and Kerberos and so on. We look just like a native application on Cloudera: Cloudera package manager,
YARN integration, all that kind of stuff. So yeah, we’re as tight as you can get with Hadoop.
[29:43] Well, I think that’s one of those, like you said, you guys made that decision a few years ago, but I would argue that pretty much anybody that wants to be, you know, kind of in this
data analytics space, you’ve got to be a first-class citizen with the Hadoop vendors, because,
like it or not, they have made very big investments, and the customers have invested heavily in them. So I think that’s smart. One other thing: do you guys play with,
I mean, I think about the predictive models and the way that you orient data that goes into DataRobot, are there equally as big of integrations, maybe not as big,
but tight connection points into more traditional enterprise data warehouses, so scale-out databases, MPPs, that kind of stuff?
[30:26] Sure, I mean, there’s lots of ways to bring data in, right? So we can make ODBC connections, we can
bring in Parquet or Avro files from HDFS, we can connect to S3 or share drives or whatever. So there’s lots of ways
to get data in.
It turns out everybody’s infrastructure is different, right, even if you go overseas. So Asia, at least in my experience, seems to have a much lower adoption of Hadoop.
Maybe they haven’t gotten there yet, maybe they’re going down a different path, I don’t know. So the hardware and technology stacks that enterprises are using
tend to be hugely varied, and so, you know, being able to fit in with all those different technology stacks is really important.
[31:14] Yeah, I was actually over in Japan at the end of last year and I ran into a lot of your customers there. It was incredible, and I didn’t realize you guys had such a presence in Japan. It was quite stunning.
[31:26] Yeah, Japan, it turns out, loves automation. They are crazy about robots too, so.
[31:35] Stickers must go like hotcakes over there.
[31:37] Oh yeah, it’s a match made in heaven. We’ve actually got, I think, maybe 15 or 16 people at our Tokyo office,
and a massive, massive customer base there. So Japan is, I would say, the leading edge of automation and interest in deploying AI to optimize
the way everything runs.
[31:56] Talking about deploying AI, I know you’re
pretty heavily focused in the financial services industry, and I know you may not be able to talk about specific customers, but what about use cases and how
you’ve seen organizations go from a problem statement to, hey, I’m using DataRobot to effectively achieve X?
[32:19] Yeah, so it turns out that pretty much any part of an organization, particularly in banking, is ripe for automation.
Right, so if you look at the sales team, if you look at the process of generating
new business from a prospect list,
ranking those prospects in terms of maybe their propensity to buy, maybe their creditworthiness, you know, whatever, turns out to generate
much better results than whatever subjective process is being used by sales teams today. The same is true for cross-selling, right, deepening relationships with existing customers.
It turns out targeted marketing to your existing customer base to increase the level of adoption of your product is hugely beneficial to an organization.
Risk: forecasting credit losses is super easy with machine learning and a huge opportunity in banking.
Fraud, financial crime,
you know, operations-type use cases, there are just so many different avenues that you can go down.
One interesting one that I particularly like has to do with what you do when an account goes bad, right? What do you do when a loan, you know, stops paying?
[33:51] What’s the optimal resolution strategy for maximizing recoveries on a loan that’s gone bad? It turns out there’s lots of really interesting things you can do in that space.
[34:03] In 2017, or maybe the second half of 2017, we started seeing fintech get huge. You know, on Twitter it was trending all the time, and then toward the end of 2017 it got even, you know, more gas.
[34:16] Thomas was going to change his name to the Dean of Bitcoin or the Dean of Blockchain.
[34:22] Is it AI and machine learning in these use cases, is that what’s driving so much of the popularity and so much of the focus
[34:29] on fintech, or is it something totally different?
[34:33] It’s a big piece of it, right? So, you know, you’ve got fintech companies in the lending space. Lending is all about forecasting risk, right? So if I can predict how likely you are to pay back a loan, I can price it profitably
and I can steal a piece of the market. So it really is an arms race in that space, and whoever’s got the best model wins.
You know, if you look at the payments space, if you look at just so many different sectors in fintech,
building those models in a fast way, speed to market,
and being nimble in the way you do it is the reason why fintechs are getting so much traction. The reason they haven’t fully taken over, I think there are two reasons. One is they don’t have nearly as much data as the big guys do.
And they don’t have nearly as much business expertise, right? So it’s still the case, in my view, that the big players
still have the expertise in terms of how the business works and how to run, say, a bank, for example.
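The pricing point in this answer comes down to simple arithmetic. A minimal sketch, assuming a single-period loan and a lender who loses a fixed fraction of the principal on default; the function names, the loss_given_default parameter, and the numbers are illustrative assumptions, not anything described on the show:

```python
def break_even_rate(p_default, loss_given_default=1.0):
    """Interest rate at which expected profit is zero (single-period loan)."""
    # Expected profit per dollar lent: (1 - p) * rate - p * loss_given_default
    return p_default * loss_given_default / (1.0 - p_default)


def expected_profit(principal, rate, p_default, loss_given_default=1.0):
    """Expected profit on one loan under the simplified model above."""
    earned = (1.0 - p_default) * rate          # interest earned if repaid
    lost = p_default * loss_given_default      # principal lost if defaulted
    return principal * (earned - lost)


# Hypothetical borrower with a predicted 5% chance of default.
floor = break_even_rate(0.05)
print(f"break-even rate: {floor:.4f}")
print(f"expected profit at 10%: {expected_profit(1000, 0.10, 0.05):.2f}")
```

Under these assumptions, a lender whose model estimates the default probability more accurately can quote a rate closer to the true break-even without pricing below it, which is one way to read "whoever’s got the best model wins."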
[35:39] David versus Goliath.
[35:41] Yeah, I mean, the fintech guys are coming in there and they’re taking up a lot of that slack, and I think some of the big players are justifiably afraid
of the market share that they’re losing in that space. But there’s still time for the big players to,
you know, invest in AI and build models in a more nimble fashion to stay ahead.
[36:05] What area of financial services do you think machine learning will change the most, or be the most transformative in?
[36:13] So, interesting. I think the financial crimes and fraud space is really interesting.
Most of the fraud systems that we come across these days are sort of rule-based.
Right, so, you know, if this transaction meets rules A, B, and C, then we’ll block it, otherwise not. Those systems tend not to be very good. You can improve them very, very easily with machine learning.
That’s going to reduce losses, and I mean, that’s going to be really big.
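The contrast between hand-written rules and a rule learned from labeled history can be sketched in a few lines. This is only a toy illustration, assuming made-up transactions and a single learned cutoff on the transaction amount; it is not how DataRobot or any real fraud system works, and a real model would be evaluated on held-out data rather than on the history it was fit to:

```python
# Hypothetical labeled history: (amount_usd, is_foreign, was_fraud).
HISTORY = [
    (20, False, False),
    (45, False, False),
    (60, True, False),
    (80, False, False),
    (150, True, True),
    (300, False, True),
    (400, True, True),
    (900, False, True),
]


def rule_based_block(amount, is_foreign):
    """Hand-written rule: block only large foreign transactions."""
    return amount > 500 and is_foreign


def learn_amount_cutoff(history):
    """Pick the amount cutoff that misclassifies the fewest past transactions."""
    best_cutoff, best_errors = None, len(history) + 1
    for cutoff in sorted({amount for amount, _, _ in history}):
        errors = sum((amount >= cutoff) != fraud for amount, _, fraud in history)
        if errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff, best_errors


rule_errors = sum(rule_based_block(a, f) != fraud for a, f, fraud in HISTORY)
cutoff, learned_errors = learn_amount_cutoff(HISTORY)
print(f"hand-written rule misclassifies {rule_errors} of {len(HISTORY)}")
print(f"learned cutoff {cutoff} misclassifies {learned_errors} of {len(HISTORY)}")
```

The hand-written rule encodes someone’s guess about what fraud looks like, while the learned cutoff is chosen from the data. A production classifier would learn from many features at once instead of one threshold.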
[36:46] Taking it from a classification filter to much more of a live, breathing classifier, right?
[36:53] There’s a ton of room there. I think things have been really interesting in the lending space.
You know, some of the fintechs like Lending Club and some of these others are kind of going after the non-traditional lending market.
You know, folks with no credit score or bad credit or whatever it is. That’s kind of been a place where the traditional players haven’t ever really wanted to play so much,
or conversely have been kind of predatory in the way they work with those folks; you’ve got, like, payday lenders and so on.
I think there’s a lot going on in that thin-file credit space and some of those areas that are really interesting.
[37:39] So you actually spent some time prior to DataRobot running the crux of this at folks like Travelers and Regions, developing models.
So how did you end up going from being a practitioner right in the heart of this stuff to joining a tech company doing this?
[37:59] Yeah, so the founders of DataRobot, we all worked together at Travelers, so that’s where I met Jeremy Achin,
our founder, and Tom de Godoy, his co-founder. So we all kind of met there. They went off to start the company back in 2012. I joined
about two and a half, three years later, something like that, after they’d kind of built the product. One of the unique things about DataRobot is that they took, or we took, the first, say, 25 million or so in venture capital
and just dumped a hundred percent of it into building a real, legitimate product that actually does what we claim it does. We didn’t do an iota of marketing until, you know, year three,
maybe year four. So I joined right when we started to go to market,
because that’s what I wanted to do. I wanted to go in and help empower and accelerate the way organizations adopt this kind of technology.
[39:06] Interesting. So I’m curious, you don’t have a mustache anymore. I thought for sure that a large part of your statistical prowess had to originate from that robust mustache. That’s too bad.
[39:19] Yeah, I find that I can barely think straight now.
[39:23] I hate that for you. I will welcome you back to the land of the hirsute anytime you’re interested.
Well Greg, this has been super fun. I appreciate the conversation. I think what I take away from this is DataRobot is absolutely democratizing data science, developing and giving folks
some great tools to actually do data science without being a data scientist, and to actually get some ROI from machine learning in the business. And you guys have automated a bunch of stuff that maybe isn’t the easiest part of the data science process, and made it easier to take
those models and push them back into the business, back into action. It’s super cool tech that people ought to pay attention to. I’ve got to know, though: when you think about what’s next in
the machine learning space, and you kind of look at this
technology ecosystem over the next, call it, 12 to 18 months, what are some of the big macro trends that you think we ought to be paying attention to as it relates to machine learning?
[40:23] I think one of the really interesting places is unstructured data.
DataRobot does a fair amount with text, just raw text, and it always surprises me how much value
there is in just straight text,
right? It’s not a structured data field; it’s just what somebody typed in or the words that somebody said or whatever it might be.
That unstructured data tends to be really orthogonal to other data sources that exist, and so harnessing that stuff,
both text and audio and video and photos and all that kind of stuff, I think is going to be really important.
You know, we’ve started to do some work in that space using, like, decapitated neural networks to featurize images and so on, so I think there are some exciting things to come in that space. The other thing to look for
from DataRobot in particular over the next year is the things we’re doing in the time series space.
We have a time series beta that is out now that is doing stuff with time series data that I’ve never seen before.
The automated feature creation and the forecasting capabilities of DataRobot there are pretty unique in the market, and I don’t think there exists another product out there that can do what DataRobot
can do in the time series space. That’s another really interesting area.
[41:53] So, like, time series indexing, or time series and then model development?
[41:59] Yeah, model development, so forecasting-type models.
Maybe in the market space, or, you know, there are lots of different applications of time series. I mean, all the sensor data that we get in factories today is time series.
[42:16] Yeah, so when I think about time series, I start to think that a lot of the security and log players have to be folks that would be interesting for you to use as sources.
I’m thinking about things like, when you say time series index, like Splunk, time series DBs; anything that does a time series index seems like it might be a good source that you could then use
DataRobot’s time series product to build models against.
[42:40] Yeah, it turns out that munging time series data is hard, right? Most time series
aren’t regular, so you have to regularize the time series, and then, you know, just imagine all the aggregations you can make, right? You’ve got, like, moving averages;
what happened over the last day, two weeks, and so on in the data; how does that help me forecast forward for the series that I’m interested in, and so on. So
the time series feature engineering stuff is really interesting.
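The kind of aggregation mentioned here, such as moving averages over a trailing window, can be sketched as follows. This is a minimal illustration, assuming the series has already been regularized to evenly spaced readings; the function name, the feature names, and the sample values are hypothetical and say nothing about how DataRobot builds its features:

```python
from collections import deque


def lag_and_rolling_features(series, window=3):
    """For each time step, build features from the previous `window` values only."""
    rows = []
    recent = deque(maxlen=window)  # holds the most recent past values
    for t, value in enumerate(series):
        if len(recent) == window:
            rows.append({
                "t": t,
                "lag_1": recent[-1],                   # previous reading
                "rolling_mean": sum(recent) / window,  # moving average
                "rolling_max": max(recent),            # peak over the window
                "target": value,                       # value to forecast
            })
        recent.append(value)  # only now does the current value become "past"
    return rows


# Hypothetical, evenly spaced sensor readings.
readings = [10, 12, 11, 15, 14, 18]
features = lag_and_rolling_features(readings, window=3)
for row in features:
    print(row)
```

Each row uses only values that came before the target, which keeps information from the future out of the forecast. An automated system would generate many such windows and aggregations and let a model decide which ones help.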
[43:12] Very cool. So staying on DataRobot, any other cool announcements that you guys are going to be making, or other cool things we should pay attention to, specifically that DataRobot’s doing outside of this time series work?
[43:24] There’s cool stuff coming all the time, so, you know, I would say have a look at our website and see what’s
in the works there. There’s lots of stuff going on that I can’t talk about at the moment. So, you know, lots of exciting opportunities out there. The data science world is a big space,
and so, you know, reaching out to all the little nooks and crannies of the kinds of problems that folks are facing today, it’s a good time.
[43:54] So do you personally go to any of the data conferences? Are there conferences of interest that you spend time at?
[44:03] Strata’s always big. We do Strata every year; it’s a great conference. There’s another one called the Open Data Science Conference, ODSC.
It’s kind of an open source conference, and it’s exciting. That’s one that I always look forward to, and it happens several times a year. So those two are good ones
that I tend to enjoy.
[44:26] Awesome. Well, we’re actually a community sponsor of O’Reilly Strata conferences globally, and for those folks listening, remember:
you need to drop a review on iTunes or
subscribe to the podcast, and you will have a chance to win a free pass to Strata San Jose, which is in March this year. So we’re huge fans of Strata ourselves.
Actually, I don’t know if you saw, but O’Reilly actually launched, this year, a set of AI conferences: one in New York, one in Beijing, and one in London. So I’ll be excited to check those out this year as well.
[45:02] Yeah, the last time I was at Strata I think I went by the O’Reilly booth and they were doing headshots. There was a massive line kind of out the door for headshots at the O’Reilly booth. They always do good stuff there.
[45:16] Those are used for a decapitated neural network.
I kid, I kid. Greg, I appreciate the time, it’s been super fun. Thomas, Kyle, thanks for being on. What we’re going to do now, though, is shift gears real quick and go to our rapid-fire section. What I want you to do, Greg, is sit back,
relax, and give me the first thing that comes to mind when I ask you these questions, okay?
What year will Skynet go online?
[45:46] 2025.
[45:49] Alright, what’s the best book you’ve read in the last year?
[45:54] Oh boy, there are so many to pick from. I actually read Watership Down with my kids this year. It’s super sad, but it’s a good book.
[46:04] Alright, what particular genre of music are you rocking right now?
[46:09] Oh, Willie Nelson all the way.
[46:11] Classic, huge fan. What is your favorite piece of utterly useless technology?
[46:20] Wow, that’s a good one. Sitting on my desk here is a motor from a player piano
that is operated a hundred percent by suction, so it’s a suction-powered motor.
[46:37] I think that really sucks.
[46:40] I rebuilt it about a year ago in my shop just for giggles, and I can’t think of anything to do with it, but it’s so cool that I can’t imagine getting rid of it.
[46:51] So I can just see you twisting your mustache while working on a suction motor in your shop.
[46:55] You’ve got to see it, man, it’s super cool.
[46:59] Alright, so what is your biggest money pit right now?
[47:03] Oh, I’m actually starting to consider making YouTube videos.
So I’ve recently bought some lighting and some audio-video equipment to start doing that, so that’s certainly going to be a money pit. I don’t expect to ever get any benefit from doing that.
[47:21] Is that like a vlogging format or educational videos?
[47:25] I find that there is a vacuum of quality product review videos out there.
I have a variety of interests, everything from firearms to woodworking to, you name it, so.
[47:41] True Renaissance Man.
[47:43] Exactly so yeah product reviews.
[47:45] Are you a cook?
[47:48] No I don’t cook but I do eat.
[47:51] We won’t take you down the sous-vide pathway; that’s another one we’ll teach you about later. Alright, are you going anywhere interesting soon?
[47:59] I have to be in London in a few weeks for work, but, you know, I spend most of my time traveling these days anyway, so the most interesting place that I go is home.
[48:13] Couldn’t agree more. Alright, last question: what show are you binging on right now?
[48:20] I just caught up to the end of Game of Thrones, so the current debate at my house is what’s next. We’re about a season behind in Westworld, which is an awesome show.
And my oldest son is catching up on The Big Bang Theory now, so that plays at my house as well.
[48:39] Good choice. Thank you so much for your time. Let me ask real quick: where can we find you on social, if folks want to follow you, check you out, check out what you’re up to?
[48:50] Yes, my Twitter is tweeting as Greg, which is always exciting. That’s really the only social media I do. I’ve got a Facebook account, but Facebook is for grandparents at this point, so.
[49:03] Alright, well Greg, it has been super awesome. I highly encourage our listeners to check out DataRobot, a really cool
piece of technology empowering analysts and those folks just interested in data science to really have data science tools at their fingertips. We appreciate you listening.
Have a great day.