Big Data and Analytics are still two of the hottest topics in tech, and the Strata Data Conference in London is the epicenter of activity in this smoldering area of tech for the folks in Europe.
Funny aside…notice the conference no longer carries the word “Hadoop” in the title?!? Yeah, it’s much more than that now.  Actually, some of the sub-topics in this space, like Machine Learning, AI, and Deep Learning, are quite rapidly becoming hot topics on their own with their own conferences…more on that later from my next stop in Berlin.
The Strata Conference is the premier conference of its kind in Europe, especially from a sponsorship cost perspective, drawing attendance and presentations from brilliant folks across industries: leaders from telecom, titans of financial services intelligence, and many public servants looking to use data for the general good.
Key assessments and takeaways:
  • Cloud remains front and center in Data Analytics
    • Cloudera brought the announcement of Altus out during the keynote, officially entering Cloudera into the cloud PaaS game
    • The concept of “long-lived vs ephemeral” clusters has gone mainstream
    • Shared storage and disaggregation of compute and storage were promoted by all of the cloud vendors for a variety of reasons, including cluster life expectancy
    • Security and governance challenges and solutions remain front and center
    • That said, the only major public cloud vendor talking cloud was Google
    • AWS was not present at the show
    • Microsoft was not pitching Azure very visibly…heavier focus on their apps than ITaaS
    • Dell EMC was the only major vendor focusing on private cloud best practices and architectural concepts
    • The majority of customers I talked to still build their clusters on premise, but talked about cloud as a great option for some test/development workloads
    • Large enterprises seemed to be strongly moving towards the data lake concept, in spite of challenges
  • Machine Learning is the next manifestation of excitement in this area
    • The number of sessions on ML, TensorFlow, and deeper ML topics like computer vision was impressive
    • ML related talks also dug into hardware concepts like GPU and TPU advancements
    • Many talks on ML sought to explain what it was more than how it worked or how to apply it
I also had the pleasure of leading a technical breakout session during the conference titled: Architecture Best Practices for Big Data Deployments.
The goal of the session (slides available here) was to provide attendees with some fundamental approaches to sizing Hadoop infrastructure based on both capacity and performance requirements, while sharing some learned best practices along the way.  We’ve found that organizations looking to deploy these technologies generally struggle with infrastructure sizing as they get started, and that the decisions made early in the deployment life-cycle have significant impacts on the scalability and usefulness of a cluster over time. So we want to get it right up front, and we at Dell EMC have years of experience doing exactly that.  The approach I shared was based on three major bodies of knowledge:
  1. The last 7 years of testing, benchmarking, and engineering we’ve done around Hadoop here at Dell EMC.
  2. Feedback from customers on their experiences.
  3. My personal experiences and interactions in this tech arena for the last 5 years.
However, as I stated at the conference and I will state here again, I am confident that this approach is not right every time for every organization or deployment.  The goal is to share some directional guidance and foster consistent communication between those deploying Hadoop and those selling the clouds to run them, because clouds are operating models, not places.
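To give a flavor of what sizing by both capacity and performance can look like, here is a minimal back-of-the-envelope sketch in Python. It is not the method from my slides. Every name and default below (`replication`, `temp_overhead`, `usable_tb_per_node`, `gb_per_s_per_node`) is a hypothetical assumption chosen for illustration, so plug in your own numbers. The idea is simply to size the cluster twice, once for storage and once for throughput, and take the larger answer.

```python
import math


def nodes_for_capacity(dataset_tb, replication=3, temp_overhead=0.25,
                       usable_tb_per_node=48.0):
    """Worker nodes needed just to hold the data.

    All defaults are illustrative assumptions, not recommendations:
    replication is the number of copies kept of each block, temp_overhead
    is the fraction of raw disk reserved for intermediate/scratch data,
    and usable_tb_per_node is the raw disk capacity of one worker.
    """
    raw_tb = dataset_tb * replication / (1.0 - temp_overhead)
    return math.ceil(raw_tb / usable_tb_per_node)


def nodes_for_throughput(required_gb_per_s, gb_per_s_per_node=1.0):
    """Worker nodes needed to sustain a target aggregate scan rate.

    gb_per_s_per_node is an assumed per-node throughput; measure it on
    your own hardware rather than trusting this placeholder.
    """
    return math.ceil(required_gb_per_s / gb_per_s_per_node)


def size_cluster(dataset_tb, required_gb_per_s):
    """Take the larger of the capacity-driven and performance-driven counts."""
    return max(nodes_for_capacity(dataset_tb),
               nodes_for_throughput(required_gb_per_s))


if __name__ == "__main__":
    # Hypothetical workload: 500 TB of data, 20 GB/s aggregate scan rate.
    print("capacity-driven nodes:  ", nodes_for_capacity(500))
    print("throughput-driven nodes:", nodes_for_throughput(20))
    print("cluster size:           ", size_cluster(500, 20))
```

With these made-up inputs the cluster is capacity-bound; raise the throughput target enough and it becomes performance-bound instead, which is exactly the kind of trade-off worth talking through before buying or building anything.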
While the approaches discussed in my session are highly distilled takes on a complex problem, organizations looking to deploy Hadoop could use them to buy a Hadoop deployment in a public cloud or an on-premises private cloud, or they could equally use them to build their own clusters from their favorite mix of tech providers.  And no matter if folks want to buy or build, it is my pleasure to work for the division of Dell EMC whose sole purpose is to make complex deployments like this simple, which is why we offer our Ready Solutions for Hadoop.
Ready Solutions for Hadoop are the industry’s first, most well tested, and widely deployed reference architectures and bundles of technology for Hadoop.  We package the Hadoop distribution, Linux O/S, Compute, Storage, and Networking technologies with the best services and automation for on-premises deployments to make it simple for customers to choose Dell EMC for their Big Data needs.  And it was great to hear from multiple members of the audience that they had top-notch experiences deploying Hadoop with our Ready Solutions for both Cloudera and Hortonworks.
During the conference, I also had the chance to have a few chats with the press, including folks from O’Reilly Media and Information Management 360.  It was my distinct pleasure to share with them, and now with you folks reading this, two new tidbits of awesome news about the Ready Solutions for Dell EMC (links to interviews coming soon).  The first was the recent announcement of our TPC benchmark domination.  In this test, we show that the Dell EMC Ready Bundle for Cloudera Hadoop provides the #1 price/performance in TPCx-Big Bench for Scale Factor 10 (*Based on published results as of May 13, 2017).  See the report right here on the TPC site.
The second was the incredible honor of a Senator John McCain leadership award for the work we do with TGen, working to beat childhood cancer by building the world’s best analytics environments to support their relentless research to end this awful disease.  It is quite inspiring to know that our work at Dell EMC truly does drive human progress and being recognized this way keeps the fire in the belly strong to keep doing it!
All in all, it was a great conference for learning the latest trends in Data Analytics, but also for connecting…with customers on how they are using our technology to power their analytics achievements, with our great partners like Cloudera, Hortonworks, Intel and Microsoft, and with the analyst community to share the great work that Dell EMC continues to pour into this tech hotbed.
Now on to the next stop…Berlin for my first Blockchain, IoT, and Artificial Intelligence Tech Expo. My propeller spin rate is at a crescendo!
Your bearded friend,
Cory