BDB Podcast Ep:15 “Hadoop Community Updates with Alan Gates from Hortonworks”


Title: Hadoop Community Updates with Alan Gates from Hortonworks

Join the Big Data Beard Podcast for episode 15, where we have special guest Alan Gates from Hortonworks. In this week’s episode we explore where the Hadoop community is headed in 2018 and what’s new with Apache Hive. We also take time to dive into Alan’s background working with Hadoop, Hive, and Pig in the early days at Yahoo.

Special Guest Alan Gates

Alan Gates is a founder of Hortonworks, a leader in developing open source Hadoop ecosystem tools. Alan worked at Yahoo during the early days of Hadoop and was pivotal in creating Pig, an open source ETL abstraction tool for unstructured and semi-structured data. Alan Gates full bio.
Big Data Beard Team
Host: Thomas Henson
Co-host: Cory Minton, Robert Hout

SHOW NOTES:

Transcript – Hadoop Community Updates With Alan Gates from Hortonworks

 

Transcript

Thomas:
[0:00] Hi folks, welcome to the Big Data Beard Podcast. My name is Thomas Henson and I’ll be your host for today, but it won’t just be me: I’ve got two other Beards that are going to join us. So Rob, how are you doing today?

Rob:
[0:10] Doing awesome. Great, nice gray rainy day in Seattle like normal, so all positive, thumbs up, and optimistic.

Thomas:
[0:18] We’ve also got Cory Minton, who’s going to jump on with us. Cory, how’s it going?

Cory:
[0:22] Man, living the dream. Are you guys still growing your beards? I know it’s still No Shave November.

Rob:
[0:29] Trying, trying too hard.

Thomas:
[0:30] I did a little trim-up and had a slip-up, so I started back over.

Cory:
[0:36] Disappointment all over the place.

Thomas:
[0:38] I know. Today we’ve also got a guest I’d like to introduce: the co-founder of Hortonworks, a big data community leader, and author of the Programming Pig book, Alan Gates. Alan, how are you doing today?

Alan:
[0:49] I’m good, how are you?

Thomas:
[0:51] Doing great.

Alan:
[0:52] And I’m in California, where it’s nice and sunny, no gloom here, definitely. And despite No Shave November, I’m nice and clean-shaven this morning, thank you.

Cory:
[1:03] Yeah, it’s just disappointing. We started off the podcast with you bragging about California; we know how that’s going to go.

Thomas:
[1:14] Well, thanks for being on the show, Alan. I wanted to start off by just giving you an opportunity to introduce yourself, and then we’ll kind of dig into your background. So can you tell the audience a little bit about yourself and what you do on a daily basis?

Alan:
[1:26] Sure. Let me start with a little history, since I’m an old guy. I actually started in the Hadoop world about ten and a half years ago now, when it was still very young. I started on the Pig project, which at the time was in Yahoo Labs as a research project. We took it to Apache as an open source project, and that was my first experience with both open source and Apache in any big way. I mean, I had contributed a few patches here and there before, but this was definitely my first big experience. I worked on Pig for years, then moved over and started helping with Hive six or seven years ago now, and around that time I also helped found Hortonworks. A bunch of us came out of Yahoo and helped to build Hortonworks, where I am on the architecture team. So my day-to-day job, part of it is still development, doing architecture work and reviewing other people’s stuff, and part of it is other founder stuff, like being involved in [unclear] and some other things.

Thomas:
[2:46] That’s really interesting. Let me dig in a little bit on Yahoo and how it all came about. We had Bill Schmarzo on the show not too long ago; he was there at the same time, working on some of the marketing research and analytics side of it. What was it like being a part of Yahoo during the days when Hadoop was just coming out as open source, when Pig and Hive were there and it was all just being pushed to the market?

Alan:
[3:10] Well, early on we didn’t know how big it was going to be, right? Now you look back and think it must have been amazing, but at the time it was just something we thought was for us and a few other people. I mean, the ones involved were Facebook, Netflix, Twitter, basically all the web properties that couldn’t buy a big enough database, right? We couldn’t go to Oracle or Teradata or whoever and buy a database that would handle the amount of data we needed to move at a price that we could pay. Teradata maybe could have handled the volumes that we had back then, early on, but not at a price point that was going to work out. So we ended up building kind of our own thing and sharing that with the others. But it wasn’t that long before other people started picking it up: some of the insurance companies got interested, some of the banks got interested, some of the governments got interested. It was like, oh, well, maybe this is something that other people would be interested in; maybe it’s not just us. Early on, if you did it, you had to be pretty brave; you had to have a fair amount of engineering resources to make it work. But it still was something that people started to pick up, and it was a ways after that that we started to realize, oh, this could maybe go somewhere. And by then there were other people

[4:48] that were starting to push it, Cloudera and MapR, and that’s when we started to realize, okay, maybe we could take this somewhere. So we started working with Yahoo on how we would go out and do this on our own. At the same time, Rob Bearden, our CEO, from Benchmark Capital, approached Yahoo and said, hey, how would you like to spin this out? This looks like a pretty interesting project. So it was those two streams coming together that built Hortonworks.

Cory:
[5:21] That’s a confluence of a whole bunch of interesting activity. I do find it interesting that you say you guys were just kind of trying to solve a problem. You were software engineers dealing with, like you said, the web properties’ scale issues, and frankly budget problems, that nobody else was probably dealing with at the time, right? But this wasn’t the beginning of your career, was it? Like, you hadn’t just graduated from school; was that the genesis for you?

Alan:
[5:49] No, I’d already been doing stuff for 15 years by then. My first job was in telecom, way back in the early ’90s. By the mid-’90s I worked for Informix, the database company, and then I worked for a couple of startups that I’m sure you’ve never heard of because they went nowhere, both of which were doing database stuff. Then I actually started at Yahoo, years before Hadoop, doing stuff that had nothing to do with Hadoop, just internal database tools for Yahoo, since they were building their own. Hadoop was not Yahoo’s first take at how to do all this stuff. And even originally inside Yahoo, Hadoop wasn’t for the database stuff; it was for web search, right? They needed to rebuild web search, because the web crawl is how Yahoo would crawl the web on a regular basis to build all its links. I was not involved in the web crawl stuff, so I’ll probably get the numbers wrong, but it took something like a week to do the crawl when Yahoo had its own proprietary crawler. When they started rebuilding it on Hadoop, they got it down to taking hours instead of days. It was after that

[7:21] that they turned to the internal data stuff that I was working on and said, well, maybe Hadoop’s the right solution here too, since it improved the web crawl so radically. That’s when I got involved.

Cory:
[7:32] Very cool. Now, your background: you said earlier you’re in California, in Silicon Valley. Are you one of these dudes who was involved in super cool stuff in college and was already part of, like, a board for a venture capital firm on the West Coast? Or what was your genesis like in education?

Alan:
[7:52] So my undergrad degree is in math, and I have a master’s in theology, so yeah, that fits me super well for all this database stuff.

Cory:
[8:01] I’m sure, like you said, it takes a lot of faith too.

Alan:
[8:05] There is some faith involved, but no. With my degree in math, I got involved in software because, you know, I was out of school in the early ’90s and had been goofing around; my wife and I took some time, worked in national parks, and traveled around. Then a friend of mine, who worked at this telecom company, said, hey, we need somebody to do tech support. I was like, sure, beats waiting tables, which is what I was doing. So that’s how I got started in software. I taught myself C and Unix, because I decided tech support’s not a lot of fun; you’re just answering the phone and dealing with people’s problems. I’d rather be writing the software, so I figured out how to do that and moved into the engineering department, then got hired by Informix, and that’s how I got into database work. So yeah, I took a completely unorthodox route. That was all in Oregon, actually; I moved to Silicon Valley around the end of 2000, right at the end of the boom. Kind of timed that exactly. Yeah, exactly, it’s all right.

Thomas:
[9:18] Buy low, sell high.

Alan:
[9:23] Yeah, I never quite got the hang of that, I admit. I always kind of did that backwards, because I could always look at them and go, that’s not going to work. I remember looking at all those web things that were going to sell dog food or whatever, and I was like, how are they going to make any money off of that? What I missed was that that was true in the long term but not in the short term. So I looked at it backwards and timed it exactly wrong, but oh well, whatever.

Thomas:
[9:49] So you spent some time working in the national parks and waiting tables, then you did 15 years, and fast forward, you’re at Yahoo and you’re on the Hadoop project. And you said you were all just having fun and didn’t really realize it was going to be such a big thing that you guys were building. So what’s the background on why we have Hive, why we have Pig, why we need a ZooKeeper? I talked to someone a little bit last year at the Hortonworks roadshow, and he was talking about how somebody at one point said no more naming these things after animals, since ZooKeeper kind of had to deal with all of them. Were you guys just given free rein on how you wanted to name things, and why all the animals?

Alan:
[10:31] There was definitely some free rein. I don’t even know the background behind all the animal names, to be honest. How Pig came about, I am told (it was already called this by the time I got on the project), is that originally it was just called "the language" or something, and one of the research scientists walked in one day and said, we need a name, and somebody else said, we’ll call it Pig, and it stuck. So yeah, partly it was just having fun, and you can still see that, right? In Hive, with our recent work to add MPP-type support, they call it LLAP, which stands for Live Long and Process. So.

Thomas:
[11:08] Right.

Alan:
[11:09] There’s still that kind of whimsy. I would tell people, don’t let your engineers name projects; this is what you get. There is kind of that, but part of it too, I would say, and I think this is still partly what’s going on, is there was a fair amount of experimentation. Pig was a question of: do we go back to SQL? Is SQL the way? Hive was the answer of yes, SQL is the way. Pig was, let’s try something else; maybe scripting is the right answer here. Maybe people don’t want to think about this only in a SQL way. And I think one of the things that’s strong about Hadoop, and I would say now the whole big data ecosystem, not just Hadoop, is that you can deal with data in different ways. In the old-school database world there was SQL, and if you didn’t like it, well, you should just get over it; there weren’t a lot of options outside of SQL for you. I think one of the beauties of Hadoop and the big data world is that if you don’t like SQL, you can use Pig, you can use Spark, you can use Flink if you’re doing streaming, or even not streaming. There are lots of different ways to look at and think about your data, and I actually see that as a strength, because different problems really do better with different tools, and different people frankly just think in different paradigms. Lots of people like SQL; it’s what they know, it’s what they’ve used for many years. And a lot of people look at SQL and go, yuck, I hate that, I don’t think in relations, I think in data flows. And so...

Cory:
[12:45] Yeah, but was it...

Alan:
[12:46] Their tool.

Cory:
[12:47] Was SQL, like the ability to use SQL semantics with the underlying architecture that was Hadoop... To me that felt like one of the tipping points in terms of adoption, specifically in the enterprise and for more mainstream adoption, like you said, in financial services. Do you think that’s true, that the SQL capabilities were that tipping point?

Alan:
[13:10] It’s very true. I call SQL the English of the data world: everybody speaks it, right? You may not speak it well, you may have a funny accent, but you speak it. And I think at first that’s where Hive was at; it had the funny accent because it was doing it on top of MapReduce, right, which was hard. Now we’ve gone underneath and reworked all that so that it’s very natural. It has Tez or Spark underneath, and now LLAP, which can execute those SQL queries in a much more natural fashion. So I agree it was very much the tipping point, but I would also say that the openness to other approaches has, I think, made big data take off, because people could do things like machine learning much more easily. Can you do machine learning in SQL? In theory you can, but you certainly wouldn’t want to ask somebody to have to do that.

Thomas:
[14:18] So if SQL is the English version, because everybody speaks it to some extent, then I guess in the Hadoop world and the Hadoop ecosystem, Java is kind of like Greek; it’s all kind of derived from there. Where I’m going with that, let’s talk about it a little bit: we had an episode, I think it was episode 2 (listeners, you can go back and check it out), where we talked about how, when everybody puts out these lists of languages to learn for the big data world, one of the languages on there was Pig, and it’s kind of going away. But I mean, we were at DataWorks Summit this year in San Jose, and I know Yahoo had converted a lot of their big jobs, the majority of those, over to Tez, and now I’m seeing support for Pig on Spark. I mean, is Pig still around, is Pig still going, or is it...?

Alan:
[15:04] Pig is very much still around. Yahoo still uses Pig extensively, and I know some other companies still do. In fact, we asked ourselves at Hortonworks: do we still need to support Pig in HDP version 3 when it comes out? And the answer from our user base came back a resounding yes. Plenty of people use it, and they definitely still want us to support it. It’s not growing a lot, it’s not adding a lot of new features; it kind of does what it does and people are happy with it. But there is definitely still a strong base out there.

Thomas:
[15:45] So we’ve talked a little bit about Pig, but let’s shift over and talk a little bit about Hadoop. We’ve talked about some of the shifts and some of the different ways that we can do things, so you can have your Pig jobs running in Tez or in MapReduce. Where’s the community now, specifically with Hadoop? We see a lot of people who kind of use Hadoop but don’t talk about it as much; everything is about Spark. As a community leader, where do you see that going as far as Hadoop? Does Hadoop still remain a player?

Alan:
[16:14] I would definitely say Hadoop still remains a big player. I see several shifts going on in different dimensions. One is that Hadoop itself is expanding. There’s been a lot of work on the HDFS side to add an object store, so that people who want to think in the object store paradigm can do so, and it also actually makes HDFS more scalable in some ways. On the YARN side, they’ve been doing a lot of work to add container support, so that people can run general Docker containers on top of YARN, because once you’ve got all your data someplace, it’s really nice to be able to run all your apps there, not just the ones that are Hadoop-specific. And then obviously the big question there is the cloud: what happens when you go into the cloud? I would say there are several things. One, people still want to run Hive and Spark, and some, like Spark, run natively on the cloud. Hive runs by still using the HDFS API, so even though it’s in the cloud, it’s still thinking in terms of Hadoop. So there’s still some of Hadoop up there, and you see it as a layer even if it’s not the execution. But the bigger thing we’re finding with our customers here at Hortonworks is that everybody’s doing something in the cloud, but everybody still has stuff on premise, right? It’s really a mixed world. So is there going to be Hadoop in the cloud? Yeah. Is there going to be Hadoop on premise? Yes, definitely. We’re seeing it in both places.

Cory:
[17:50] So when you say cloud, are you seeing... I mean, obviously we know AWS has been the leader in that space for many years, but in the big data space we’ve seen a number of other entrants, the other big two being Microsoft and, obviously, Google Cloud Platform. Are you seeing trends of one cloud versus the other winning in your accounts, or at a macro level in the community?

Alan:
[18:18] It’s really regional. Here in the US we see some of each, but AWS is definitely the 800-pound gorilla, right? As you get outside the US, it starts to balance out more, and we see more Microsoft. There’s still plenty of Amazon, but we start seeing more Azure, especially in Europe and Canada. Then you get to Asia, and it becomes a whole different set of players, because there are a lot of local players there in the cloud. [unclear] But I mean, there are lots of different clouds there; it’s not quite the big three that we’re used to here.

Cory:
[19:01] So at the community level, and even at the Hortonworks level, what are you seeing as some of the developments that are required for organizations and for platforms like HDP and others from the Hortonworks family, and frankly in the community? What are the big developments that have to happen to enable this multi-cloud world? Because, I think, if you’re a hardware seller, cloud is a bad idea, but it’s kind of a good idea because you’re an arms dealer to them. But we know that enterprises are still going to run the majority of their... not always, but in general we see a large majority of applications and data being on premises. What are the big things that you and your team have to do to make your platform relevant in this multi-cloud world?

Alan:
[19:50] I think the thing we have to recognize is that our users don’t want to think about where the data is, right? I mean, think about it from an IT perspective for a moment. You’re going to have some data on prem, and you’re going to have some in the cloud. You don’t want it all in one cloud, because you don’t want to be hostage to whichever cloud it is if they decide to double their rates or whatever, right? And so you’re going to spread it around. And if you’re at a big company, you may be forced by legal requirements to spread it around anyway, because even if it’s all in the same cloud, if you store some in Germany and some in the US, you probably can’t mix those streams, and all kinds of stuff, right? So from an IT perspective, you’re going to have it spread around, like you said. From an end-user perspective, I don’t want to have to think about that. I don’t want to have to think about, oh, that dataset lives in AWS in Germany, but this other one lives in Azure in Arkansas. I just want to go get my data, and I want to query it and govern it and make sure it’s secure, whatever my job is. I want to do that. So what I think we need to do as a company is make sure we are giving people the tools to work across those systems, right? If you’re the security guy, is there one pane of glass that you can go to and make sure the security policies are right for all your datasets, regardless of where they’re living? If you are an analyst and you need to get to those datasets, you may have to be a little bit aware of that, because there may be rules against joining across

[21:24] national boundaries, or there may just be the laws of physics that say moving terabytes across links under the ocean doesn’t work out so well. So you may have to be at least a little aware, but you’d still like to be able to say, okay, I’m at my laptop, I can just go find where the dataset is that I need to do my work on, and I can go there and query it. I don’t have to worry about knowing that myself, or going to some big long search page somewhere that tells me where everything is. Building those tools to make that a smooth experience for the users is going to be key for us.

Cory:
[21:59] So at Hortonworks, are there any particular products that have been announced or released, or that you guys are going to release, maybe with some cool code names, that help bring that together? Probably an animal, if I had to guess.

Alan:
[22:14] It’s not an animal, actually, because the marketers named this one instead of the engineers. It’s called DataPlane, and the idea here is that this is kind of the piece that sits there and lets you manage all the different parts. So far it has just released version 1, in October I believe it was. So far it supports replication of data, so if you need to move your data between clusters, it handles that. That will eventually be more sophisticated and allow replication from cluster to cloud and between clouds and such. We’ll be adding in the governance and security pieces, and then eventually the stuff to help users find and query their data, so they can get all that. It is kind of where we are pushing customers to say... or not pushing, sorry, that’s the wrong word, but giving the customers a way to say: here’s how, if you need to manage all this stuff, if you have more than one cluster or data on prem and in the cloud, we’re going to help you keep track of all that and get at all of it.

Thomas:
[23:27] Does that come into play also for streaming analytics and IoT, where you have data out at the edge sometimes, with more and more devices coming online? Is that where you guys see DataPlane playing as well?

Alan:
[23:39] Yeah, we do. It’s going to be across all your data assets, right? Again, you don’t want to have to think about, oh, that data is out on the edge versus it’s in this cluster; you just want to do your job. Now, some of it is going to have to be somewhat aware of the edginess of things, right? If it’s actually coming out of your car, your car might not be connected at the moment, so there has to be some awareness. But again, we want to integrate all that and give people that kind of single view of things.

Cory:
[24:09] So, since you mentioned cars and edge processing: the autonomous car projects, at the big manufacturers, the traditional ones, but also at the web properties that you mentioned were the places where big data started, are creating some really interesting challenges in that edge space, where I think DataPlane is obviously relevant. But are there any big technology advancements you’re seeing happen in the community, with NiFi or others, that have come purely out of autonomous vehicles or that trend you’re seeing?

Alan:
[24:50] You know, I don’t know; I’m not close enough to that part of it to have seen what’s going on. I mean, I know a little bit of what’s going on there in terms of the challenges. A lot of the big challenges are around: I generate tons and tons of data, but my uplink isn’t very big. What should I send, what should I not send, what do I keep for later, what’s important? And even, how do I do feedback? Let me shift the use case a little bit, because it’s a little more extreme but a similar idea. You’ve got these jet engines on an airplane. I don’t remember how much data they produce per hour, but over a transoceanic flight you can end up generating a couple of terabytes worth of data out of that engine. Obviously you can’t upload all that while you’re flying; you don’t have enough capacity on the satellite link. And even when you’re on the ground, you’re not always plugged in long enough to get it all downloaded. So what do you send, and what do you keep? Say all that data is just saying the engine is running normally; you probably don’t need to send it all. But what if one part of the engine is getting a little warmer than it should be? Does that flip some switch somewhere that says, okay, now send more data about that particular subsystem because it’s heating up, do we need to do something? Those are the kinds of questions that I know they’re asking in that space, but I’m not really knowledgeable in depth on how they’re attacking that, or on all the changes that are going on now.

Cory:
[26:27] No worries. It’s funny, we’ve said Hadoop a handful of times in this conversation, but I find it astonishing how the word, the brand Hadoop, that name, has just absolutely been shunned by a lot of folks in this ecosystem, in this community. Why is that? Why don’t we say Hadoop anymore?

Alan:
[26:51] I don’t know. I can think of two reasons, and I honestly don’t know if either of them is true. One is just, you know, it sounds a little bit kiddie, right? Hadoop and Pig and LLAP and all these; like we talked about earlier, it’s kind of a case of somebody letting the engineers play and name all the products. Does it sound more professional if you can talk about DataPlane instead of Hadoop? Maybe. Part of it too, I think, is that some companies at least really want to talk about how they’re differentiating on top of it, since Hadoop is open source and there are multiple places to get it, so that’s their way to talk about it. Probably, actually, the real answer now is that as a technology spreads out, it becomes less and less about the technology and more about the use cases, right? When was the last time you saw an ad for just Oracle’s database server? You probably still see them sometimes, but a lot more common now is to see ads for all the apps that you can run on top of it, right? They can do HR programs, they can do ERP programs, and all these other things, because they’re trying to connect with people that have specific problems to solve, not with engineers who are interested in one particular technology. I suspect we’re probably

[28:26] maturing into that same space, where people

[28:31] are connecting with customers who have a problem to solve, rather than with engineers who are like, whoa, this is the latest, coolest technology.

Thomas:
[28:38] So you talked about how you started off in Pig and that you’re involved in the Hive project. Are there any other open source projects that you’re part of and involved with in the community?

Alan:
[28:47] Sort of. One of my roles, since I’ve been in Apache for a while now, is that I help mentor new projects when they come into the Incubator. So I have been a mentor to many, many projects: [unclear], Bigtop, Calcite, and many more that I can’t think of and wouldn’t remember the names of. That doesn’t mean I wrote code on those; on most of them I’ve never written a line of code. It’s about helping them learn the Apache Way when they come in. Some of those are brand-new projects that need to get going from the ground up; some are already existing projects, but they’re becoming Apache and they have to learn the Apache Way and how we do things. So in that sense I’ve been involved in a number of projects. I’m still on their mailing lists, and often still a member of the PMC on those projects. But as far as actually contributing code, the vast majority of my work has been in Pig and Hive.

Cory:
[29:52] I want to come back to contributing code, because I have a question there about the Apache community. How has it evolved over time? Obviously, with some of the projects that you were involved in early on, through their development and then their release into open source, I’m sure Apache was a different community then

[30:12] than it is now. What are those changes? What’s different about what it means to be the Apache Way?

Alan:
[30:20] Well, I would say the big difference that I see is really just size and volume, right? Apache has really continued to grow, especially on the big data side. When I got started with Pig, we were the second or third big data project. We didn’t even use the term big data back then, but it was what you would now call big data. And so honestly, a lot of the people in Apache didn’t really know what we were about, because most of them were from the HTTPD or Tomcat side, and they were thinking more about Java servlets and those kinds of things than about distributed file systems or whatever.

Cory:
[31:05] Look, those are expensive.

[31:10] So, on Flink: is that project going to be as big a deal over time as the early buzz was?

Alan:
[31:23] Well, if I could tell the future...

Cory:
[31:28] Come on, that’s why you’re here; that’s how you got to where you are.

Thomas:
[31:30] Yeah, but we already established the buying high, selling low.

Cory:
[31:36] So I’m looking for the antithesis of what he’s doing.

Alan:
[31:39] Here’s what I would say: I think that Flink has a really great model when it comes to stream processing. And I think the interest there is how they can do truly, you know, not batch, not micro-batch, but down-to-the-record stream processing and handling. How big is the market or interest for that? That’s where I don’t know the answer well, but from a technology perspective I think it’s really interesting technology.

Cory:
[32:11] Now, like you just said, the use cases seem to be driving adoption more than anything else. From your seat, as a guy who’s been in a really pivotal role in the early stages of big data and its technologies, what are you and the community doing to shift to that use-case sort of lens, to grow your relevance and grow your adoption?

Alan:
[32:39] I would say the big thing is we’re trying to spend more time out with the users and try to figure out what exactly they are going to do with this technology, right? Because honestly, early on, Hadoop was: here’s a cool tool, now go find some problems it can help with, right? Or it was engineers who already knew they had a big problem and knew they couldn’t solve it any other way, so let’s use Hadoop. Now you do have to meet people where they’re at and say, hey, look, we built this data warehousing solution on top of Hadoop; we think it can solve your problem for you, and here’s how. So we’re really out there trying to figure out exactly what it is they’re looking at, and really our best feedback there is our existing customers, because as soon as you give them something, they’re clamoring for something more. They’re like, we used to do just ETL in Hive, now we do all our reporting in it, but I can’t hook my cubing engines up to it because it doesn’t respond to those queries fast enough, so why not? Right? So then we’re like, oh, well, I guess we should go make it work for those cubing-type queries, so we’ll work on that. It is getting that feedback from them: where are they not happy? Your customers are either complaining at you or they’re not using your stuff, so you’ve got to listen to those complaints and see where they’re trying to drive it.

Thomas:
[34:08] Are you able to take those complaints, and those, you know, optimizations or opportunities, back into the community, with your role in, you know, helping incubate new projects in Apache?

Alan:
[34:20] Definitely. I mean, sometimes it's been launching new projects, sometimes it's just adding things to existing projects, sometimes it's figuring out how to make projects work together, because you realize, "I have two projects that each have half of what I need,
and all I have to do is figure out how to hook them up and we can solve this problem."
But yeah, I definitely get a lot of chances at that. Well, not just me, our whole company; I'm not alone at all in that.

Cory:
[34:50] There's only, like, four of you, that's...

Thomas:
[34:51] Now you can tell, yeah, you can tell, man, you're running the show. This is the way we live, right?

Cory:
[34:58] That's a four-person show... I know
Hortonworks is a big place, and the Apache community is a big place, but it's still very much a technical community. Are you still getting hands-on on a weekly basis?

Alan:
[35:18] I do. It really varies kind of from time to time. There are times when
I'm getting to write code one day a week; there are times when I'm doing it four days a week. Right now I'm kind of in a four-days-a-week mode, which actually makes me really happy.
I really am an engineer top to bottom.
But, you know, what I find is that, one, I want to stay very connected just because I like doing it, but also, pretty soon you don't know what you're talking about if you're not in it day to day, right? You can
present the slides and you can talk at a high level, but somebody asks you a question at the next level down and you just don't know, and the only way to
know is to really be a part of it every day.
And also, I think when you help start a company, you have to figure out what you're good at, and the reality is I'm not a business guy and I'm not a manager. I'm good at helping build this stuff, so that's what I do.

Cory:
[36:24] What's funny is, I've had the chance to mentor
systems engineers and folks in technical pre-sales roles and capacities that drive technology adoption, right? And the one thing I'm always telling them is, man, you've got to stay hands-on, like that's where your value is.

[36:41] So I love to hear that you're still doing that. Even, you know, I think
most of us who have been in and around this community, when we hear names like yours out there, think they must just be flying around, hanging out with cool customers, just being awesome all the time. But you're actually doing the fun stuff that you love, that you enjoy doing.
I think this says it's okay for us to continue to be nerds. Does that mean you're a little pudgy?
Definitely have a beard, probably a Mountain Dew somewhere nearby?

Alan:
[37:12] Well, I'm not pudgy, actually.

Cory:
[37:15] Abu.

Alan:
[37:17] I enjoy running in my spare time, so that tends to discourage the weight. You'd definitely have, well, Diet Mountain Dew now. I'm not...

[37:28] You know, I've tried to grow a beard off and on, but it just doesn't...

Cory:
[37:33] That's all right. We
have a new sponsor of our podcast, and they're awesome. They're called Beardski, so next time you need to go hit the mountains, get yourself a Beardski. They're these really, really rad ski masks
that look like beards, and they are officially a sponsor of our show. They're a sponsor only because I had to figure out a way to get some of the dudes that don't have
beards, or won't grow them,
or aren't committed enough, a good beard. So we partnered up with Beardski.com to get us some beards. So that's awesome that you're still... Are you going to any of
the big conferences this year, like the DataWorks conferences, and if so, are you going to be presenting anything there?

Alan:
[38:19] I will. I'm sure I'll be going, because one of my side jobs here is I actually coordinate the technical content of the DataWorks Summit conferences, or help coordinate it, from the engineering side.
Obviously we rely on
community groups to choose the content, but you have to do a little bit of editing to make sure you don't get ten talks on the hottest topic and nothing on anything else.
And, you know, somebody has to be out there recruiting the reviewers,
finding last-minute replacements for
people that can't show up, and all those things, so I help with that. So I'll definitely be involved.
I usually speak at most of them; I actually quite enjoy doing that.
I haven't gotten around to putting in a talk for the next one, which is in Berlin in April, which I guess I should get to, because I think the call for abstracts closes pretty soon.

Cory:
[39:21] December. I think December 14th.
So the collective group at the Big Data Beard team would like to officially begin begging for your consideration of our abstracts.
We've all been working really hard to get some interesting talks going, so if you see our names up there, we are expecting preferential treatment, if you know what I mean.

Thomas:
[39:44] Only half of them were actually about beards; the other half were actually about big data, so I don't know if that helps or hurts us.

Cory:
[39:51] I'm actually just going to stand up there and drink beer and get foam in the mustache for 30 minutes...

Alan:
[39:57] Well, we should get you involved. You know, we have, what are they, birds-of-a-feather sessions. We should have a beards birds-of-a-feather afterwards.

Cory:
[40:07] Well then, I'm pretty sure there are a couple of sparrows hanging out in this face sweater I've got going on.

[40:13] So I've got to go back to just the general again. You've got the view of this community:
what's on the horizon for the general big data community? I know we talked about how use cases are important, but just where do you see this community going, and what do you think are the next big things that we,
as participants in the community, should either be paying attention to or starting to work on now?

Alan:
[40:40] Well, it's...
So I'll kind of throw out several directions that I see things going, and the thing is,
I get so in the weeds in a section of it that it can be hard to know if I'm seeing the whole picture or parts of it. But there are several parts I'm seeing. One, which somebody alluded to, is the connectedness of data.
People should be able to get to, operate on, use, whatever it is they do with their data, regardless of its location, whether it's in a cluster somewhere, in the cloud somewhere, or out on the edge.
I think the other thing that I'm seeing a lot of, and this is a reflection of where I spend part of my time, is around security and governance of the data.
What are the protocols for how you keep data, or how long you keep it, or who you can present it to, who you can share it with?
When the customer calls and says, you know, what are you doing with my data, or can you please delete it, or can you tell me who you've given it to, how do you answer those questions or deal with those things?
I think those are going to get more and more important. We see that certainly happening in Europe with, I don't know if you guys have heard of the GDPR legislation, but it...

Cory:
[41:58] Oh yeah.

Alan:
[41:59] Okay. That's obviously a huge deal for many of our customers, even companies that aren't primarily European, since anybody who has any data on any EU citizen is
bound by that. And I suspect we'll see other countries start to adopt either GDPR itself or, you know, something similar,
and I would say rightfully so. I mean, as companies and governments and others collect lots and lots of data, as individuals we want to know
who has what information and how secure it is. We've obviously seen failures of that recently, in terms of, you know, our information not being properly
guarded by those who have it. So I think there'll be a lot of focus going forward on that, which is
honestly a bit of a shift for people in this community, because the community has been really engineering-driven, and engineers tend to think of security and governance as something you have to do, not something you want to do.

Thomas:
[43:11] We'll do it later.

Alan:
[43:12] Yeah, it's not interesting, right? Somebody else can come along and do that. And I think that we have to shift that mindset a little bit and think about security first. We have to be asking ourselves, okay, cool,
how is this data safe? What are the ways somebody might try to get data out of it? What do I need to do to enable my users
to protect their stuff?

Rob:
[43:37] Yeah, hey, I want to roll up a couple of topics that we talked about today, because I think you just touched on something that
popped a little light on in my head. I'm working with a couple of different global organizations now where, you know, they want...

[43:52] All the data, all the time, regardless of where it originated from or where it came from. But to your point about security and governance, some of that data crosses political boundaries or nation-state boundaries, where data has to stay local, but at the scale they need to keep it,
in deep archive with many, many years of retention. So all those things together, where I've got...

[44:13] Nation boundaries, retention, governance, and the important aspects of privacy regulations around the data
are all critical. How do I continue to grow with the scale of the data and use the kinds of security, governance, and data orchestration you're talking about today?

Alan:
[44:30] Well, so let me just be upfront and say I don't think anyone's answered that question well yet. It's not like I can pull a project out of my hat and say, here's something that will magically solve your problem.
I can tell you that there are people working on these problems. We have a
project... well, I mean, there's an Apache project that we are part of, Apache Atlas, that tracks data provenance and the rules around
where data resides and those sorts of things.
And through our collaboration in ODPi, which is an industry collaboration group of us, IBM, SAS, and some others,
we're working on adding
kind of modules to Atlas that would enable it to track some of those sorts of things. But it's very early days. I mean, right now what we have is a beautiful set of slides that explain how all this is going to work.
It's not like there's a solution that's been found yet.

Cory:
[45:37] ODPi is an interesting one for a lot of us. You know, it created kind of a rift between, I think the market generally agrees, the two biggest players in the space, you guys at Hortonworks and Cloudera, in terms of approach.
What do you think the big benefit of ODPi has been in this community?

Alan:
[45:58] Well, today...
So ODPi was created really with the goal of saying, okay, there are, I don't know the number, but at the time there were five or seven or whatever it was Hadoop distributions.
Let's try to bring some kind of uniformity here so that users don't have to worry about the fact that there are...

[46:22] You know, that they may have to run on different distributions. A particular company might buy one or the other distribution, but almost anybody who's building software on top of it is going to want to run on
multiple distributions, because they don't know what their customers are going to want to use.
There was a lot of consolidation in the market, so in that sense ODPi
was actually very helpful. Pivotal and IBM ended up consolidating on top of the HDP
distribution, and we're kind of down to a point where it's us, Cloudera, and MapR,
and then there's also Amazon and its tools, if you're just thinking here in the US. So now that a lot of consolidation has been driven, ODPi is turning and trying to look at the next problem.
Okay,
the next thing is, let's look at how we would govern and secure the data there, because that's kind of one of the next big problems users have.

Thomas:
[47:24] Alan, thanks again for being on the show. We're going to roll into wrapping it up, but before we let you leave, we're going to ask you some rapid-fire questions. For these questions, it's the first thing that comes to mind; we're just going to go through them. All right, are you ready?

Alan:
[47:35] Yeah, sure, fire away.

Thomas:
[47:37] What year will Skynet go online?

Alan:
[47:41] 2035.

Thomas:
[47:44] If you bought me a book, what would it be?

Alan:
[47:49] Lord of the Rings.

Cory:
[47:50] Which one?

Alan:
[47:52] The whole thing. It's one book.

Thomas:
[47:54] Controversy here.

Cory:
[47:57] Okay, so I've got it: not The Hobbit, you're saying.

Alan:
[48:00] No, no, no, the real Lord of the Rings.

Thomas:
[48:03] What genre of music are you listening to right now?

Alan:
[48:07] 80s rock and roll.

[48:14] I'm an old man.

Thomas:
[48:17] What is your favorite piece of useless tech?

Alan:
[48:20] Favorite piece of useless tech...

Cory:
[48:23] You know, like the thing that you bought and you're like, this is kind of dumb, but I kind of love it.

Alan:
[48:28] I don't know. I don't buy stuff that's useless, actually.

Thomas:
[48:33] Spoken like a true engineer.

Alan:
[48:34] No, I don't buy all that. I'm just not the latest-junk kind of guy, so I'm trying to think what would be my...

Cory:
[48:42] Do you own an Apple Watch?

Alan:
[48:44] No I don’t.

Cory:
[48:45] All right, see, that's another vote against, because we had suspected as much: the majority of people, like, no Apple Watch.

Thomas:
[48:53] Yeah, we'll never get an Apple Watch sponsorship.

Rob:
[48:55] I love my Apple watch today.

Alan:
[48:57] Okay.
Okay, let me modify that a bit: I have bought one as a gift for someone else. My son has one, which he loves, so...

Thomas:
[49:05] What is your biggest money pit right now?

Alan:
[49:09] For me personally, or Hortonworks?

Thomas:
[49:12] Personally you.

Alan:
[49:13] Biggest money pit: my diesel Volkswagen that I need to get rid of.

Cory:
[49:20] Is it like a cool old diesel Volkswagen, or just like a '95 Passat that's beat up?

Alan:
[49:27] No, it's just...
It's a 2014 Jetta. It's just that I ordered a new car and it keeps getting delayed, and small things keep breaking on my Jetta, and it's driving me nuts.

Cory:
[49:44] Oh, that's the one that puts out more smoke than an 18-wheeler.

Alan:
[49:48] Yeah exactly.

Cory:
[49:48] Perfect love it.

Thomas:
[49:49] Are you going anywhere interesting soon.

Alan:
[49:54] I’m going to Disneyland over Christmas that’s about it.

Thomas:
[49:56] Oh, wow. What show are you binging on right now?

Alan:
[50:03] Tinker Tailor Soldier Spy, but the old Alec Guinness version, not the recent one.

Cory:
[50:09] Very good.

Thomas:
[50:10] Well, thanks again, Alan. That's all for today's show. I want to thank you for being a part of the show. Make sure that you subscribe to the Big Data Beard podcast so that you never miss an episode, and make sure you rate us on iTunes. Thanks again.

Alan:
[50:23] Alright well thank you for having me.