Title: “A Look at the Modern Data Architecture with Hortonworks CTO Scott Gnau”
Host: Cory Minton
Co-Host: Brett Roberts
Guest: Scott Gnau

 

In Episode 18 of the Big Data Beard Podcast, Cory and Brett talk with Scott Gnau, CTO of Hortonworks, and learn why Hadoop is absolutely not dead. Scott shares the top Big Data trends that shaped 2017, what a modern data architecture means in 2018, and why San Diego is a pretty OK place to call home as long as you don’t like the NFL.

 

Transcript
Cory Minton:
[0:00] Hey, this is Cory Minton. We are back with another episode of the Big Data Beard Podcast, and today we are recording from the Hortonworks sales and partner kickoff 2018.
I am in beautiful Phoenix, Arizona, at The Phoenician resort, and we are lucky enough to have Mr. Scott Gnau, CTO of Hortonworks, joining us here in the Cypress conference room. Scott, welcome to the show.
Scott Gnau:
[0:23] Thanks for having me. I see you pulled out all the stops on the accommodations for today’s podcast. It’s beautiful, thank you.
Cory Minton:
[0:29] Yes, we’re going for... conference boring is the theme, I think, that we went for, with a mix of paisley and...
Scott Gnau:
[0:37] We’ll focus on the big issues.
Cory Minton:
[0:38] That’s awesome. Well, joining me also as co-host for today is Brett Roberts, and we are going to jump in with Scott, because sales kickoff is really where, like every tech company, we bring in the best and the brightest from the company leadership to talk with the sales, pre-sales, and partner community that support and lead the organization, to hear what’s going on and what 2018 is going to be about.
But because it’s the start of the year, I’d like to take just a second, Scott, and talk with you about 2017 and get kind of a backward view. What were some of the big things that you saw as trends or macro trends in 2017 that you thought were really impressive, and that you think are maybe going to set us up in 2018?
Scott Gnau:
[1:20] Well, 2017 I think was a great year for many reasons, but on a technology front it feels to me like two themes really crystallized that are very important for both the industry and Hortonworks in general.
The first is that I think people started to really realize that big data is more than Hadoop. It’s more than, gee, I’m running a Spark job and all of those things. It’s really about being able to capture and connect lots and lots of different data, and that means data in motion and data at rest are really important concepts for being successful in the big data space.
And of course this is something that we’ve been positioned for as a company and as a product stack for some time, but I really started to see a lot more realization out there of, hey, there’s more to the story than just Hadoop and storing a bunch of data and running a bunch of jobs. And so we’re starting to see a lot more implementations, a lot more success stories, and things kind of driven along that path.
So that was big for me because it was kind of a validation of what we’ve been doing, and it was also a good thing, not just because of us commercially and the product stack that we have, but I think really for folks to kind of put their best foot forward and not have a big piece of the story missing in their technology strategy. So, you know, the definition of a modern data architecture.
[2:50] And folks were really looking at their streaming strategy as well as their data storage strategy around a modern data architecture, and I started to really see that come in full force. That was kind of one big thing.
The second big thing that I saw, which, again, depends on which channels you read and which magazines and so on: a lot of people talk about the hype of artificial intelligence, anywhere from, you know, robots are going to eat us alive and there will be no more humans, or we won’t have any jobs anymore, whatever, kind of at one end of the extreme, to, hey, this is just overhyped and what the heck is it, it’s really just more math.
But the thing that was really interesting and crystallizing is, I think, the whole machine learning and artificial intelligence wave has brought with it the notion of the true value of all of this data. There have been some skeptics who have been looking at the big data craze for the past 10 years and saying, wow, big data, it’s kind of interesting, but who cares, right? It’s not as valuable, it’s not like the ERP transactions that we had in the old BI stuff, where you had very dense data, relatively speaking.
And the density of that data created some economics, it created some value proposition, and so on. People looked at the big data space and said, yeah, there’s a lot of data being created, but who cares?
And the AI thing really, I think, clearly identifies: yeah, and you should care, because... right?
[4:27] And so wherever you are in the hype cycle, the way I would define it is thinking about deep learning and machine learning that drive kind of artificial intelligence applications. It’s not a new concept, right? The math was invented in the 1860s or whatever.
But the difference is, the more data you add to these algorithms, the more accurate they become, and, generally speaking, those algorithms can mathematically be more accurate than very sophisticated human beings.
And I don’t see that as a threat, I see that as an opportunity. Wouldn’t you rather have a predictive model for personalized medicine telling you how to go cure your illness than a doctor who’s 50 years in the industry but not quite as accurate? There’s a huge opportunity there.
So this whole thing really identifies the reason why you need a modern data architecture: there are these applications out there that are enriched by and enhanced by, and can create new business models around, having more and more data and the ability to process that data efficiently and create new kinds of decision-making capabilities.
And so those two things really came together: the realization of streaming and storage and analytics in the kind of whole modern data architecture, and, oh by the way, there is actually a use case that is extremely high value, and, hey, capturing this data actually does make sense, we’re not crazy.
Cory Minton:
[5:51] Yeah, so I’ve actually heard this said multiple times in the press, and it goes along with that hype cycle thing you talked about. For so many folks, if you look at the genesis of organizations like Hortonworks, the development of the technology was amazing in this early rise, but so few people really knew how to take advantage of it.
And like you said, at least we saw that there were some who would invest in it without a real clear understanding of how to derive value from it. And there were these press pieces like, oh, Hadoop is dead, this is a dead technology.
And I have this opinion, and I’m curious if you feel the same way or feel very differently: I feel like Hadoop in the way that organizations were buying it originally, which was we need to buy technology for the sake of technology, is dead. But buying, as you describe, the modern data architecture feels like something that’s maybe the next instantiation of using these interesting technologies in a real way. Do you think that’s an accurate assessment, or do you feel differently about it?
Scott Gnau:
[6:48] I’m on a parallel thread with you.
I don’t know if I would go as far as to say Hadoop is dead, because of course that’s kind of a headline attention-getter and all that kind of stuff.
Cory Minton:
[6:59] Nobody would ever do that.
Scott Gnau:
[7:01] Yeah, but it’s like a piece of the puzzle, and it’s a key enabler for the modern data architecture. I’ve done a piece, and there’s actually a website where they talk about the data tipping point. I don’t want to spend 40 minutes going through all of that here, but I’d certainly urge folks to take a look at it.
I think the key thing that the Hadoop stack did for the world was it changed the paradigm. The old paradigm for analytics data storage was you had to actually go define a schema and then set up an ETL, then you had to load the data, and then you had to figure out what the heck is there.
In the new world that’s impossible. A lot of the data is created outside of your firewall. You don’t even know what it is, you don’t know how it’s created or how it’s going to change over time as sensors get reconfigured and so on. So the stack changed the paradigm, where you can actually load the data first and then go figure something out about it.
That’s the key concept. Now, whether it’s this Apache project or that Apache project, whether it’s open source or not, that’s actually not the important thing. The important thing is we changed the paradigm to make it easy to consume data very quickly.
Everything else is then history and kind of accretes to what I described earlier as this modern data architecture to enable new kinds of applications. And the fact that the core technology, the core Hadoop technology, changed that paradigm and actually reversed that paradigm is what made the difference.
Now there are a bunch of analysts out there, some of whom are really good friends of mine, although in the public domain sometimes we spar a little bit, which makes it interesting.
[8:32] Now they’re saying Hadoop is dead, or there was even one that said the Hadoop vendors aren’t really Hadoop vendors anymore, and I would say that’s absolutely true, because it’s not just Hadoop, it’s really this modern data architecture that’s enabled by Hadoop and many other things.
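Editor’s note: the “load the data first, then figure something out about it” paradigm Scott describes above is often called schema-on-read. The following is a minimal Python sketch of that idea, not anything Hortonworks ships. The sample events, the field names, and the helpers `load_first` and `read_with_schema` are all hypothetical, chosen only to show raw records being stored untouched and a schema being applied at read time, even after a sensor’s fields drift.

```python
import json

# Raw events are stored exactly as they arrive; no schema is declared up front.
# (Hypothetical sample data: field names drift as sensors get reconfigured.)
raw_events = [
    '{"sensor": "a1", "temp_c": 21.5}',
    '{"sensor": "a2", "temp_c": 19.0, "humidity": 0.41}',  # a new field appears
    '{"sensor": "a1", "temperature_c": 22.1}',             # a field is renamed
]


def load_first(lines):
    """Land the data untouched: loading happens before any modeling."""
    return list(lines)


def read_with_schema(stored, field_aliases):
    """Apply a schema only at read time, tolerating drift in field names."""
    for line in stored:
        record = json.loads(line)
        row = {}
        for target, aliases in field_aliases.items():
            # Take the first alias present in the record, else None.
            row[target] = next((record[a] for a in aliases if a in record), None)
        yield row


stored = load_first(raw_events)
# The schema is decided later, once we know which questions we want to ask.
schema = {"sensor": ["sensor"], "temp_c": ["temp_c", "temperature_c"]}
rows = list(read_with_schema(stored, schema))
```

Because the raw lines are never rewritten, a different schema (for example one that also reads `humidity`) can be applied later without reloading anything.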
Cory Minton:
[8:48] Yeah, I would actually say that too, because it smells very similar to the evolutionary thread if you pull on a bunch of different technology companies. Hadoop was this technology stack that changed the paradigm of how we deal with data, and we deal with, as you said, transactions that are no longer dense and easy to put in the ERP. They’re sensors, they’re at the edge, they’re not our data. It feels parallel to what databases and Oracle did, right? Oracle started as a database company.
And then you fast-forward 5, 6, 10 years into their evolution and you go, are they a database company still?
Scott Gnau:
[9:21] Yeah, you know, it’s not an uncommon theme. Certainly I’ve been in the data business for longer than I should probably care to admit.
I went through kind of the data warehouse and BI evolution starting in the late 80s. Dating myself now. And way back then, when dinosaurs walked the earth and I was starting my career, one of the cool concepts was, gee, instead of having just this big monolithic database, I can parallelize this stuff, and by parallelizing this stuff I can store more data than ever, like maybe even hundreds of gigabytes, imagine that.
Excuse me. And I can efficiently process that data using parallel algorithms so I can get response times that are reasonable for human thought, and so I can apply that technology in spaces where it makes sense, like telco, where telcos might have a huge database and might want to do some price elasticity study or something. Anyway, it was technology that solved the problem for a very specific, very scaled use case, and all of that morphed into what we now know as the BI landscape, which is a multibillion-dollar kind of annual thing.
And by the way, if you don’t have a BI strategy, you’re not in business anymore, right? It’s in every unit, every item.
And so that whole morphing is what I see that we’ve done, right? Hadoop started as, hey, we changed the paradigm, you can now store a bunch of data.
This is very effective for extremely scaled organizations that have a very unique problem, like Yahoo, where we came from.
[10:54] And now you look at that and say, gee, that’s really interesting technology, but here are all of those business use cases and solutions that you can now go solve with the application of this technology. And so we’re morphing into a solutions kind of world instead of a tech-only kind of world.
It’s natural maturity. By the way, I see it happening about 3 or 4 times faster than it happened in the RDBMS time, which you would expect based on the accelerated pace of change. You would also expect that based on the open-source paradigm that we’ve adopted, where we’ve now got collaboration instead of competition in the space.
And so I think that’s all part of the natural process, and I would expect in 3 to 5 years you’ll be talking less and less about Hadoop. I even talk less and less about Hadoop. Whenever I talk, it’s about modern data architecture, here’s how you can solve the problem, and so on.
And as those solutions become ubiquitous, there’ll be new solutions, new names, new business models.
But they’re all built around that same cornerstone of differentiation of the core tech, where the core tech now enables you to do something in a very different way than you were ever able to do it before.
Brett Roberts:
[12:06] Yeah, I like this
[12:08] idea, this term, the modern data architecture. I think that’s really interesting, and I feel that there’s a new or renewed focus on machine learning as a way to apply it to the modern data architecture.
[12:20] And it really is getting there, so I think that’s such a cool
[12:25] trend in 2017, where machine learning has become more of a focus, and when applied to the modern data architecture it’s really, you know, driven it along.
Cory Minton:
[12:33] Can I... can we unpack modern data architecture? Because I like that term, but I feel like that may be one that could get lost on folks if they don’t understand what you mean. I think I know what you mean from a product perspective, but help me understand at a strategic level, not at the product level: what does modern data architecture mean to you and Hortonworks?
Scott Gnau:
[12:53] Sure, and obviously it comes around that core Hadoop enabler, right, where we’ve effectively been able to reverse the process of data acquisition and data analytics, where you can acquire and analyze versus analyze and acquire.
So that’s kind of the core. The other thing I like to think about as a concept for a modern data architecture is really connected, not converged. It’s almost like we reversed the polarity in the entire industry, which I’ve been part of for a very long time, where for 30 years it was all about: pull all of the data together into your enterprise data warehouse, model the data, make it third normal form, have one copy of it, and then distribute it to everyone. So it’s converged, pull it all together.
In an IoT world that is just physically and dramatically impossible to have happen. It’s just not going to happen.
So what we talk about here is being able to play it where it lies, being able to have data in many places, being able to have analytics that are portable and not data that... right?
And so, again, it’s a reverse polarity: you’re pushing process to the data instead of pulling data to the process. All of those concepts are really built into what I describe as required for that modern data architecture, and certainly what we’re building.
And then by extension, because a lot of this is being driven by sensors and IoT and those kinds of applications, you’re also talking about a large majority of the effectiveness of your architecture being how you actually move and capture data at the edge, in the cloud, on-prem, and how you distribute applications, to the point where...
[14:32] I think modern data architecture really needs to support applications running at the edge.
Right, so for many years, even in an RDBMS-centric world, we’d talk about real-time processing and stream processing, and what that typically meant was, gee, it’s running faster.
Cory Minton:
[14:47] Real time: we’ll get it closer to when I got it.
Scott Gnau:
[14:50] In a world where your iPhone will tell you 5 minutes before you’re going to have a heart attack that you’re about to have a heart attack, and call an ambulance for you... it might sound a little bit far-fetched, but it’s not unrealistic, right? In that world, that application can’t run somewhere in a data center. It’s got to run on the phone.
So that means it needs to be a distributed app. It’s got to have access to all of the hundreds and thousands and millions of petabytes of historical data for model building, but it’s got to be able to execute that model locally at the edge or it’s not real time.
Right? And so application architectures and data architectures that support that kind of modeling, where you’re capturing data at the edge, you’re analyzing it at the edge, you’re publishing data that makes sense to publish, you’re enhancing models a hundred percent of the time, and then you’re redistributing those models back out to the edge to run locally:
that’s a modern data architecture.
Cory Minton:
[15:50] That’s... we talk with customers a lot about some of the use cases and challenges. IoT is obviously a very interesting one.
People get excited about, like, 5G technology that will make it possible for us to actually finally get the data back to the data center, because they still operate on the model that I need all the data to come back.
Scott Gnau:
[16:06] And you don’t need all of the data to come back. In fact, one of the things that I’ve talked about a lot in 2017 is the concept of scalability. And again, for the first 30 years of my career, scalability was, well, how many gigabytes of data do you have? By the way, my first data warehouse in 1989, how big was it?
[16:26] It was the biggest in the world. How big was it?
Cory Minton:
[16:29] 30 gig? That’s huge.
Scott Gnau:
[16:30] 30 gig. 30 gig, man.
Cory Minton:
[16:34] I’m so impressed by your database right now.
Scott Gnau:
[16:38] So my point is, scalability has always... right now we’re talking about petabytes and beyond. Okay, that’s really cool. Five years from now, scalability is going to be about the amount of data that you can impact, not about the amount of data that you store centrally.
So I have edge processors, I have sensors, they all have local storage. Can I impact, can I connect that data without having to move all of it? That’s going to be the definition of true scalability. Think about that jet engine that creates a terabyte of data every hour, all the time.
And now we have 5G and satellite. Okay, great, I can transmit that terabyte of data every hour. Why? What are you going to do with it? Do you need to do that?
No. What you need to do is have a model that runs locally against the data that’s being created, finds exceptions, and then decides when to transmit things, and/or can make a local decision to improve overall safety.
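Editor’s note: a model that “runs locally against the data, finds exceptions, and then decides when to transmit” can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EdgeModel` class, its mean and standard deviation baseline, the three-sigma threshold, and the sample readings are all hypothetical stand-ins for whatever model would actually be trained centrally and pushed out to the edge.

```python
from dataclasses import dataclass


@dataclass
class EdgeModel:
    """Hypothetical model parameters, trained centrally and shipped to the edge."""
    mean: float
    std: float
    threshold: float = 3.0  # flag readings more than 3 standard deviations out

    def is_exception(self, reading: float) -> bool:
        return abs(reading - self.mean) > self.threshold * self.std


def process_locally(model, readings):
    """Score every reading on the device; keep only the exceptions to transmit."""
    return [r for r in readings if model.is_exception(r)]


# Assumed baseline for an engine temperature sensor (illustrative numbers only).
model = EdgeModel(mean=600.0, std=5.0)
readings = [598.2, 601.7, 603.0, 640.5, 599.9]

to_transmit = process_locally(model, readings)
print(f"{len(to_transmit)} of {len(readings)} readings sent upstream: {to_transmit}")
```

With these sample values only the 640.5 reading is flagged, so one reading out of five leaves the device; the rest stay in local storage until it is cheap to move them.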
Cory Minton:
[17:29] I agree with that. The question, though, is: if I want to build a model for a jet engine that’s running in thousands of airplanes around the world, where do I get the data to build that model? Because don’t I have to bring some amount of that sample data back into my, you know, my data center?
Scott Gnau:
[17:44] Or your cloud, or...
Cory Minton:
[17:45] Wherever, right? It’s going to go somewhere, because it...
Scott Gnau:
[17:47] Sure, but isn’t it better to do that when the plane is on the ground, connected to WiFi, and using cheap bandwidth?
Cory Minton:
[17:52] No, yeah, I just think that’s the consummate challenge that people get wrapped around the axle on when they start to architect these things.
It’s not an all-or-nothing, right? It’s like we said before, it’s not that Hadoop is dead. It’s not that binary. You still need lots of data to come back into that.
You know, part of that modern data architecture is having a...
Scott Gnau:
[18:13] The more data you have, the more accurate your model will be.
Cory Minton:
[18:15] Yes, I mean, like you said, it’s 1860s statistics, right? We’re building math on models of large amounts of data, so we’re still going to get the data back, but what you’re saying is we need to figure out a way to build models that are portable, that can be run at the edge, like you said, to...
Scott Gnau:
[18:29] So you can make decisions at that point.
Cory Minton:
[18:31] Absolutely, that’s very cool.
Scott Gnau:
[18:32] And reprioritize data streams at the edge as well.
So you think about the connected car case or even autonomous vehicles, which seem to have captured everyone’s imagination.
[18:45] That is the ultimate IoT application, because it’s got to make decisions in real time, locally. There’s a semi bearing down on me, what should I do?
I don’t want to wait for, gee, do I have a 3G signal? Let me transmit some data, let me run it... no, I need to get out of the way. Okay.
So being able to have that local model process, but also at the same time, gee, I found an anomaly in the data, let me send it up for model enhancement, let me refresh the model when it’s appropriate, and so on. Those are all technologies that are being deployed today and making some of the things that we thought were science fiction actually reality.
Brett Roberts:
[19:21] But let me send that data back for the model enhancement not when the tractor-trailer is bearing down on me, right? It’s after everything is good, after the car’s parked and everyone’s out, then we’ll send it.
Scott Gnau:
[19:26] After I survive the near-death experience, then let me go handle that.
Brett Roberts:
[19:33] Oh, so I went for the exact same case, although I used a kid crossing the street. Do I really want to send the data back if someone’s crossing the street in front of a self-driving car? So yeah, I know.
Scott Gnau:
[19:44] But at the same time, when you think about the sophistication of those models, right, and even image recognition of, gee, it’s a semi coming, right? I know the weight, I know the relative momentum of that vehicle, I know its turning radius, I know mine. I can actually calculate a much better outcome than a human being in a split second.
These are huge things.
Cory Minton:
[20:05] Now the fun part, and this is the part of autonomous cars that I think is really interesting, is the ethical question part of it. So I like the question I heard at a conference, I think it was...
Scott Gnau:
[20:18] “Software crash” has a new connotation.
Cory Minton:
[20:20] Man, does it ever, right? But I like the example somebody gave.
I was at a conference in Berlin and the guys from Porsche Digital Labs were talking about it, because they’re helping develop some of the autonomous driving, and they said, here’s a question: in an autonomous car, if I push processing to the edge, what do I tell it to do when the only two possible outcomes are kill the driver or drive into a group of people?
And is that something that should be disclosed to the driver when they buy said car? Because they may not be interested in that one.
Brett Roberts:
[20:54] Is it like a moral checklist, like when you’re about to start the car, or is it on your phone when you’re setting it up?
Scott Gnau:
[20:59] Yeah, or kill driver or kill passenger.
Cory Minton:
[21:02] Yeah, exactly. So who gets the brunt of that? That 18-wheeler that I identified has a turning radius of X, it’s likely to hit this one or this one.
One of the things I see that Hortonworks is bringing to the market, and it was talked about in the press last year, was HDP 3.0, HDP being the Hortonworks Data Platform, the core of the modern data architecture, if I’m capturing this right. Tell me what’s exciting that you’re going to be talking about, both at the sales kickoff with your team and for the market to hear, about what’s coming from Hortonworks.
Scott Gnau:
[21:42] Well, HDP 3.0 is a big deal for us, obviously. Anytime you start with a new major number, and we go from 2.x to 3.0, that’s a big deal.
I think it’s a big deal for our customers, really, strategically, because of its introduction of containerization across the Hadoop stack. Earlier, when I was talking about application portability and being able to push out to the edge and so on, containerization enables a lot of things.
Now, one of the key things that it enables, obviously, is better microservices and application portability, and so I think from a usability and consumability perspective this will just be yet another extension of where you see our platform deployed.
One of the other features inside of HDP 3 is erasure coding, which a lot of folks have been looking for as well, which enables us to store a lot more data more densely. Of course there are some performance trade-offs when you do that, but the overall savings can be huge.
You know, we’re not talking about a 30 gigabyte database anymore, which would fit on my phone. We’re talking about hundreds of petabytes or more, and so being able to make a 50% difference or more in the storage footprint on that will be a big deal. So I think, you know, a lot more applications, lots more access to data,
[23:14] as well as different commercial footprints in terms of the storage footprint, will be really big headliners that our customers will be excited about.
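Editor’s note: the “50% difference” in storage footprint can be checked with simple arithmetic. The sketch below compares HDFS’s default three-way replication with a Reed-Solomon layout of 6 data blocks plus 3 parity blocks. Scott does not name a specific erasure coding policy, so the RS(6,3) layout and the 100 PB figure are assumptions chosen for illustration, and `raw_storage_pb` is a hypothetical helper.

```python
def raw_storage_pb(logical_pb: float, data_units: int, redundancy_units: int) -> float:
    """Raw capacity needed to hold `logical_pb` of user data.

    Every `data_units` blocks of user data are stored alongside
    `redundancy_units` extra blocks (full copies or parity).
    """
    return logical_pb * (data_units + redundancy_units) / data_units


logical_pb = 100  # assumed amount of user data, in petabytes

# Three-way replication: each block plus two extra full copies.
replicated = raw_storage_pb(logical_pb, data_units=1, redundancy_units=2)

# Assumed RS(6,3) erasure coding: 6 data blocks plus 3 parity blocks.
erasure_coded = raw_storage_pb(logical_pb, data_units=6, redundancy_units=3)

savings = 1 - erasure_coded / replicated

print(f"replication:    {replicated:.0f} PB raw")
print(f"erasure coding: {erasure_coded:.0f} PB raw")
print(f"savings:        {savings:.0%}")
```

Under these assumptions the raw footprint drops from 300 PB to 150 PB, which matches the 50% figure; the cost, as Scott notes, is extra work on reads and writes to encode and reconstruct blocks.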
Cory Minton:
[23:23] Yeah, erasure coding’s one that we’ve heard about for a long time. I think for those of us that came into this big data ecosystem not as the applications or database folks, but came at it from an infrastructure perspective, that was always one that I looked at and went,
man, like, I get that HDFS...
Scott Gnau:
[23:41] Just do that.
Cory Minton:
[23:42] I mean, it’s been a problem, and it’s also been a boon for companies in terms of a way to sell more hardware, right? I mean, a lot of companies made a lot of money because HDFS wasn’t very efficient at storing data, right? With having replicas, which is fine, because its express intention wasn’t to be absolutely efficient first, it was to store large amounts of data effectively so that I can analyze it, right?
Different design paradigms, and as you said, there’s a trade-off. But to me that’s one of those where the platform feels like it will be easier for many enterprises to adopt at a scale that maybe they were held back from before.
Scott Gnau:
[24:23] Yeah, and it’s another choice, right? It’s another choice, it’s another trade-off. You think about it as data temperature: I have some very cold data, I’m going to do erasure coding, so I know I’m going to pay a bit of a performance penalty on read and write.
That’s okay, because otherwise I couldn’t afford to store it anyway.
And I’m not doing it for real-time decision-making at the edge. At the same time I can choose more traditional storage methods for data that’s a little bit warmer, or I can go completely in memory and use Spark, or I can use Hive LLAP, so that’s for high-performance kind of interactive, and anywhere in between. So I don’t view it as a Boolean thing, but more as just creating some more choice, and again it kind of extends the validity and the viability of the footprint into new spaces.
Cory Minton:
[25:08] So the containerization support you talked about before: there is sort of a battle going on in the container space.
Is there a single container platform that you guys have decided to incorporate? Is it that you’re container agnostic? What’s the spin on containers as a company?
Scott Gnau:
[25:26] Yeah, the goal is to be as agnostic and open as possible.
You know, we’ll be able to use some of the best open frameworks that are out there. I don’t think the container wars are completely over.
And so our goal is to be as open to what our customers decide to implement as we can possibly be and support that, and principally take that through integration with YARN and some of the other core capabilities of the platform.
Cory Minton:
[25:57] Obviously HDP is getting some big updates.
Are there other parts of the modern data architecture, the products that Hortonworks builds to support that... are there other big announcements in terms of technology direction or innovations that are coming across the portfolio?
Scott Gnau:
[26:14] Well, I think the biggest thing that we launched, and thanks for asking that, yeah, I think the biggest thing for us is what we launched in the fourth quarter of 2017, which is now behind us.
And that is the DataPlane Service concept. We talked earlier about Hadoop is dead and this and that and everything. When I think about kind of the change in the business, the change in the footprint that we’ve had even during my tenure here for the past three years, right, Hadoop and the business kind of moved from, gee, we’ve got this HDFS and MapReduce thing that was kind of the center of gravity.
When I think about what we’re shipping today and what customers really look for in the value they get, it’s the core services around security, governance, multi-tenancy, management, and operational support.
And into that you can bring your data. You can bring your data in multiple different formats. You can use HDFS, you can use S3 in Amazon if you choose, you can have Hive, you can have Spark, you can have HBase, whatever. You bring your analytics, you write your own thing.
The core services become really, really important, because when you think about the massive capability but also the massive responsibility of having all of this data, you really need to have some very consistent security and governance rules.
And you need to be able to do the operational management in a seamless fashion.
And so when I think about the value prop around modern data architecture and what Hortonworks brings to the table, it’s really those core services into which we plug all of the other things, and...
[27:52] you know, the new shiny object du jour will be something else tomorrow.
So that’s all well and good. Enter the cloud, right? So now I’ve got all these cloud footprints, I’ve got on-prem clusters, I’ve got folks who can spin stuff up in the cloud.
And I want to be able to understand my perimeter even though my perimeter now extends outside of my data center.
I want to have constant and consistent security and governance rules. So when I say governance, what I really mean is: where did the data come from, who had access to it, what did they do, and where did it go?
Cory Minton:
[28:26] Natural governance and stewardship.
Scott Gnau:
[28:27] Provenance.
True provenance: where did it come from, who touched it, right, all those kinds of things.
[28:36] We’re able now, with DataPlane Services, to kind of pull those core services out of a cluster and make them a cloud service that accesses all of their data and all of the clusters, whether they’re on-prem or in the cloud.
Cory Minton:
[28:48] And enforcing the policies back in.
Scott Gnau:
[28:50] And enforce all of those policies and the provenance and all of those kinds of things. So it’s a really interesting concept. I think it becomes a way for customers to really get into the cloud and understand that they’re able to bring with them everything that they’ve built around the security and the provenance out into the cloud, and understand and keep track of where those assets are and how they’re being used. That’s a really, really good thing. It enables customers to be cloud agnostic.
Based on price, based on performance, based on corporate whatever, I can go over here, I can go over there.
True application portability: through DataPlane Services you can guarantee that your implementation, whether it’s on-premises or in the cloud, is the same versions of software, so you’ll have application portability and be able to automatically provision and deprovision.
[29:42] So this is a whole new way to look at managing data over disparate footprints. Remember back, I said it’s not about converged, it’s about connected.
Right, in a connected world, which is inherently more complicated, and you’ve got data assets all over the place, being able to have a consistent way to understand, to find, and keep track of those assets, regardless of where those assets are, in a very consistent and enforced approach, I think is a very valuable thing. And so we launched DataPlane Services in the 4th quarter.
You’ll see us coming in 2018 with more instantiations of the DataPlane Services.
The way I think about it is, there are the core DataPlane Services that I described. I think they’re extremely valuable: if you are a Hortonworks customer, you go deploy DataPlane, great. I also view it almost like iOS and the App Store, because now that DataPlane Services has access and mapping to all of your data and your data assets, along with the governance and security, why not be able to plug applications into that?
And now those applications can have access to data wherever the data are stored, anywhere in the infrastructure, as opposed to having a different set of applications in each one of your footprints. And so the first application set that we launched with DataPlane Services is Data Lifecycle Manager.
Data Lifecycle Manager does backup, recovery, data copy, replication, those kinds of data lifecycle management.
Cory Minton:
[31:12] It’s important stuff though, because that hasn’t been typically something that was super germane and like.
Scott Gnau:
[31:18] It was hard to do.
Cory Minton:
[31:19] It was hard to do in Hadoop. You’re saying that was one of the things that, like, enterprise IT practitioners, you know, the folks responsible for running clusters who were running all the other enterprise applications in an environment, they would look at it and go,
“Wait, what do we do for DR, and how do I back this thing up?” And granted,
there’s reasons for and against each of those, but I think that’s a huge boon for you guys in terms of adoption. The M&O stuff is big. But one thing I will say, that you said a couple of times, and I do want to pull on this thread:
I feel like DataPlane is almost a requirement now, partially because one of the things you said was data assets, right? I’ve got assets all over the place.
Is it a fair assessment to also say that those could be called data liabilities? Like, if we don’t have proper security and governance and that audit and lineage.
We can probably find enough stories to just scare the crap out of them.
Scott Gnau:
[32:10] So there are a couple of things, right? There’s an ease of use, keeping track of assets and so on, so that’s the good side.
Right, the flip side is all coming to a head this year with GDPR being implemented, and GDPR is important not just because of the far-reaching requirements but also the penalty phase,
which can be quite extreme. So, you know, think about GDPR: do you want to go implement a GDPR
independent solution, that way you’re just paying the cost and so on, or do you just want to build it into your modern data architecture,
where you’ve already got provenance, so you know where it came from, where it went, where it exists, you know what your boundaries are, and therefore you can be compliant in a very seamless fashion?
So I think there’s certainly that aspect to it as well.
I think the other aspect, and this is one of the 2018 predictions that I wrote about recently online, is, you know,
cloud is a huge asset but also potentially a huge liability, or is it too much of a good thing?
And when I think about it, I think about a person I know in the industry who’s been a customer of mine for a long time, and we were talking about his evolution, right.
And he said, you guys came in and sold me this enterprise data warehouse because I was going to save money, right? I was going to get rid of all these data marts so I could put it in one place, we’d have a single version of the truth and save some money.
Going to do better analytics, life is going to be good. He said, so I did that.
[33:41] And I got a lot of benefit out of the EDW, but you know what, I never actually unplugged these data marts, so now I’m kind of double paying.
Of course the upside was the enhanced analytics he was able to get. He said, then all of a sudden Hadoop came along.
And that’s really great,
because now, because of the paradigm around Hadoop, I can actually capture data that I wouldn’t have captured before, in its native form, and I can create new analytics and all that.
But I’ve also still got the EDW, I’ve still got the data marts, and now my users are going to go to the cloud.
And so now there’s no longer any adult supervision, because you know what, for the EDW in the day, like, I had to go to the capital committee, I had to go buy a bunch of stuff.
And so at least there was some ROI, right? With the cloud, any person with a credit card and a need can just go spin stuff up willy-nilly.
And it’s going to be very hard for me to manage and control that.
So when you think about the value proposition around DataPlane Services, and at least being able to create that standard footprint and create that provenance, at least you’ll be able to measure and track what’s being done.
At the same time, you know, through those services, you’ll be able to measure the utilization rates and ask, hey, do I need this instance or not?
And so, you know, giving some level of
balance of power, as it were, in a cloud world I think becomes interesting, because with cloud, I know how it is with data, right: people get consumed by data, they get hooked on data, they get hooked on analytics, and it’s like entropy, it only goes one direction.
[35:20] And that’s really hard even in a normal data center world, let alone a cloud world where it’s seemingly boundless.
Cory Minton:
[35:28] And it is a multi-cloud world now.
Scott Gnau:
[35:29] And it’s a multi-cloud world.
Cory Minton:
[35:31] Yeah, and that’s something we recognize, because one of the things, to your point, is that generally when it was on premises and we deployed it in our enterprise, we were a little better at, like you said, securing it.
Scott Gnau:
[35:45] You knew where the fence was and you could see it.
Cory Minton:
[35:47] Yeah, exactly. Now, sometimes it wasn’t secured as well as it should have been, and that’s,
it happens. But I think, candidly, the cloud, whenever you say the cloud, I think the cloud is an operating model, less of a place. But when you go to a public cloud provider,
you can still do dumb stuff, and if you don’t have some sort of governance tool that can help you manage doing dumb stuff,
the cloud, however secure Amazon tries to make their services, or Google Cloud Platform or Azure services,
you can still have bad administrators, humans. Obviously we’ve talked about this before: the weakest link in this whole chain is the human. So it’s good that you guys are going to extend into that new world of multi-cloud.
Brett Roberts:
[36:31] Hey, you’re really only as secure as the person clicking the button, right?
Cory Minton:
[36:34] That’s the butts in the seats. So, out of curiosity,
obviously modern architecture is a big trend, and we’ve talked about it the last hour. You gave me DataPlane Services helping extend the capabilities to a multi-cloud, hybrid world
and solve some of the governance and compliance challenges.
What about on the streaming side? Because I know HDF was a big part of the modern data platform. Is there anything happening with HDF this year that we should be paying attention to?
Scott Gnau:
[37:03] Well there’s a whole lot going on in that space and thanks for reminding me to bring that up you know.
Cory Minton:
[37:10] I did say this was your show.
Scott Gnau:
[37:11] Thank you very much. So yeah, HDF, we continue to invest very heavily, and I mentioned last year there was a big aha moment in our customer install base and certainly through
the results that we published, and so far for 2017 we’ve talked about kind of the increased take rate of HDF as people seem to realize, hey, this is like as big as storing data is: how do I capture it, touch it, manage it, those kinds of things.
So our approach in the space is that we look at it as data in motion, and for us that includes streaming as well as
data flow and kind of everywhere in between. So
you saw us invest very heavily in enabling application development with Streaming Analytics Manager, which we launched. You’ll see us with some additional upgrades and enhancements to that
in 2018, as well as continuing to integrate from an operational and data provenance perspective and make that seamless, right.
So Schema Registry was our first instantiation of that, again released last year, and you’ll see us move the ball forward as we
continue to invest in that, as well as integration of things like Schema Registry and Apache Atlas and,
you know, some of the other governance models to make provenance even easier and more seamless. And I think again, even though it’s a discrete set of products that is different, that we
bundle separately because they have separate use cases and separate needs, I think over time you’ll see kind of a blurring of the lines where it will almost look like one big thing,
[38:48] right, that revolves around those kind of key core services in a DataPlane-centric kind of world.
Cory Minton:
[38:54] Yeah, so one thing we talked about with Alan Gates on a previous episode was that Hortonworks is obviously this great product company, developing some very interesting products to solve interesting
gaps in the enterprise adoption of a modern data platform. That’s very cool. But the thing I’ve heard you talk about a lot is really around
solving customer use cases, right? That was your big thing for 2017, like, hey, we’re really getting this ROI, this real achievement of we’re using it
properly to solve real customer problems. So I’m curious, coming out of ’17, what was your favorite
customer story? Was there a customer or, like, a vertical that was the most interesting to you?
Scott Gnau:
[39:33] Wow, you know, there were a lot of customer stories that I found very interesting. You know, I’ll date myself again and talk about history repeating itself.
In the early ’90s I remember a bunch of industry analysts running around saying 80% of EDWs fail.
[39:54] And gee, what was the highlight of the analyst reports last year? 80% of big data projects are unsuccessful, and so on and so on.
Okay, whatever. My point is that obviously we wouldn’t be here,
and we wouldn’t think it was disappointing, if we weren’t solving real production problems for customers. So
one of the things that I personally have issue with is a bunch of analysts running around kind of scaring the world like they did in the ’90s, and look what happened: it became this multibillion-dollar thing, and now everybody’s doing it and it’s successful and it’s table stakes.
So I think we’re kind of on the cusp of that happening in our space, and so it’s really important. The flip side is that a lot of customers who are doing things with our tech and with big data,
they’re very breakthrough things that they’re using to build new business models, to be more competitive, so the last thing they want to do is actually talk about it
and tell everybody else, hey, this is what we’re doing.
Cory Minton:
[40:52] I’m going to remind some people of this, because that’s one of those things that, as practitioners in the space,
one, if you’re a customer and you’re trying to find out what other customers are doing, like if you go to the conferences, it’s really hard to get them to talk about it. And then if you’re a provider in this space, I’m sure the Hortonworks sales team deals with this, I know that a lot of IT
technical sales and presales teams deal with this: the customer goes, “Hey, tell me what your other three customers just like me are doing with your technology,” and you go,
It just doesn’t make sense.
Scott Gnau:
[41:20] I can’t, sorry.
Cory Minton:
[41:22] Yeah, exactly, there are legal reasons why they can’t. You guys, I’m just really glad to hear someone.
Scott Gnau:
[41:25] Yeah, so we’re kind of in that conundrum right now. You know, this too shall pass. I’m not entirely worried about it, but of course,
anytime I find someone who will talk about something publicly, it’s like, yes please, let’s go do it, and we want to foster that environment and accelerate it.
There was a white paper or blog published by Geisinger Health earlier this year that I found really interesting and rewarding, because they named names and
talked about what they did and talked about the success and how they were able to work with their business.
A utility in the UK called Centrica, similar thing: they’ve done some use cases and spoken at our conferences and other conferences about what they were able to do
leveraging big data technology. And so those use cases are starting to show up, and I find that extremely rewarding.
I put the shout-out to them, you know, they’re in the public domain, and obviously I would encourage folks to look at hortonworks.com. Anytime a customer gives us permission to talk about what they’re doing, we’re going to put it out there as well, just because it’s, like, really important.
But, you know, I can tell,
not just because I run around the world talking with customers and prospects in my role as CTO, but I also have our support organization
that’s part of my team, and I can tell just from our support calls and the intensity of what our customers are asking us to help them with
[42:57] that we’re seeing a huge rush to production and production use cases, meaning
business processes are depending on this technology. And I see that as extremely encouraging, and frankly I will debate till the last of my breath if industry analysts should say
big data is failing, Hadoop has failed, or big data projects are not in production, because it’s just not what I’m seeing.
Brett Roberts:
[43:23] Not the case. So, DataWorks Summit is coming up, a couple of shows globally in 2018.
Scott Gnau:
[43:30] DataWorks Summit? Never heard of it. I’m just kidding.
Brett Roberts:
[43:32] You’re obviously a big part of that. Any preview you can give the audience, anything you can talk about that you’re going to be talking about at any of these shows, anything you can tease out there for us?
Scott Gnau:
[43:43] It’s going to be great. You know, first off, we rebranded it to DataWorks Summit from
Hadoop Summit, which gets to the earlier conversation we had: you know, it’s about modern data architecture, and it is about the data and what you can do with the data, how data works.
At DataWorks Summit you’ll see us focus on customer use case examples and really letting our customers have
the driver’s seat in terms of describing and discussing and articulating what they’re doing. Of course, in each of those venues we’ll have some additional product updates.
I don’t want to say too much more, because then you won’t tune in.
Cory Minton:
[44:26] There you go let me ask you a question if if folks wanted to find you on social are you on the Twitter.
Scott Gnau:
[44:35] I am on the Twitter.
Cory Minton:
[44:37] And do you have any.
Scott Gnau:
[44:39] Scott underscore, yes. I’m starting a hashtag. I’m from San Diego,
and I am particularly perturbed and insulted that the NFL football team was allowed to move to Los Angeles, so I have a hashtag campaign called hashtag Dean is an ass, focused on the owners of the
now LA Chargers, who left for LA after making billions of dollars out of the San Diego community over 40 years.
Cory Minton:
[45:09] I got you, okay. So go follow him on Twitter. I’m assuming we’ll see you at the DataWorks Summit.
Scott Gnau:
[45:13] You might see me at the DataWorks Summit, perhaps, if I’m invited.
Cory Minton:
Excellent.
Well, very good. Scott, we had a blast chatting with you. It’s great to get your take on the trends, where you see 2017 brought us to now and what’s ahead of us in 2018. So I want to take a few minutes and shift gears for a second to the personal:
we have a rapid fire section.
No, we’re not done, so hang on. It’s just a few quick questions, and what I want you to do is sit back, relax, and just say the first thing that comes to mind when I ask you these questions, okay?
It won’t hurt, we’ll be easy on you, or maybe you’ll be easy on us.
Scott Gnau:
[45:46] Dean is an ass.
Cory Minton:
[45:47] Alright so what year do you think Skynet will go online.
Scott Gnau:
[45:53] I don’t know.
Cory Minton:
[45:54] Okay if you bought me a book what would it be.
[46:02] Or maybe best book you read in 2017.
Scott Gnau:
[46:05] I didn’t have time to read I was busy Building Product and talking to customers.
Cory Minton:
[46:10] It’s alright.
Scott Gnau:
[46:11] Come on now. Probably some of the Tipping Point stuff, Malcolm Gladwell.
Cory Minton:
[46:14] Big fan of Malcolm Gladwell what genre of music are you currently into.
Scott Gnau:
[46:22] Wow, I went to see Willie Nelson the night before last.
Cory Minton:
[46:26] Awesome. Yeah, Willie, he’s super cool.
Scott Gnau:
[46:27] Although I don’t like country music I happen to find talented musicians everywhere very entertaining.
Cory Minton:
[46:33] That’s awesome. Literally, we had a guest on the show, and I think you’re the only two people that have said, like, the same artist, which is awesome. Way to go, Willie. So let’s go to the next one: what is your favorite piece
of just utterly useless or goofy technology?
Scott Gnau:
[46:50] My favorite useless technology is the windshield wipers in my car.
Cory Minton:
[46:54] Is the guy.
Scott Gnau:
[46:57] I live in San Diego.
Cory Minton:
[46:58] That’s awesome.
Scott Gnau:
[47:00] No, no, well, now here’s the thing: they’re way over-engineered.
Every time you turn them on and off, when they come to a rest they’ll slip halfway, so that every other time they rest on the opposite side of the rubber blade.
Cory Minton:
[47:16] Oh goodness.
Scott Gnau:
[47:17] In San Diego, where it never rains, and I actually never turn them on.
Cory Minton:
[47:20] Nice what is your biggest Money Pit right now personal Money Pit.
Scott Gnau:
[47:26] My biggest personal Money Pit it’s either my wife or my dog or perhaps the two of them combined.
Cory Minton:
[47:32] Family got it okay and are you going anywhere really interesting or cool soon.
Scott Gnau:
[47:37] I will be in Mumbai in 4 days and that’s always cool and interesting.
Cory Minton:
[47:42] It is India is super fun and then are you binging on any particular show right now.
Like, on your flight to Mumbai, are you going to rock any Netflix downloads?
Scott Gnau:
[47:53] You know I’ve watched all of the ones that I wanted to binge out on I would say I’m really looking forward to the new season of Homeland.
Cory Minton:
[48:04] Excellent alright.
Scott Gnau:
[48:06] Which will not be available in time for my 24 hour flight to Mumbai.
Cory Minton:
[48:09] Bummer.
Brett Roberts:
[48:10] Just got to rewatch all the previous seasons.
Cory Minton:
[48:12] There you go. Well, Scott, it’s been super fun to have you on, we really appreciate it. Scott Gnau again, from Hortonworks, here at the Hortonworks sales kickoff for 2018. Thanks again, and thanks for tuning in.