The EMC SE Forum is a wrap, and the Data Fabrics Minors team was invited to attend six hours’ worth of technical breakout sessions over the two-day event focused on Big Data and Analytics solutions from EMC and our alliance partners. Now, six hours of anything can be a drain, but I feel like we embarked on a really cool “Netflix binge” of content about the exciting developments in the Big Data and Analytics ecosystem, and 83.333% of the sessions (five of the six) were delivered by folks outside of EMC. No retreading of boring old training topics here…we covered some crazy cool ground:


  • Big Data Solutions Update
    • Presenter – Chris Harold – CTO, Big Data Solutions @ EMC
    • Abstract – With the delivery of the Data Lake Foundation and Federation Business Data Lake v1.0 in the books, the Big Data Solutions team has proven it can deliver exciting solutions that solve customer problems by curating the best technology, people, and processes from across the Federation. During this session, we will hear from the team’s senior technical leader on where the team is focused over the next 12-18 months and get a sneak peek at some of the offerings they are building as we evolve the Federation Business Data Lake solution portfolio.
  • Customer Perspectives – Adobe
    • Presenter – Jason Farnsworth – Sr. Storage Engineer @ Adobe
    • Abstract – It’s fun to talk about all this data fabrics goodness going on around EMC, but many of you have asked to hear directly from a customer about the real-world experience of architecting, implementing, and operationalizing a data lake supporting various analytics applications. Well, this is the answer to a great request. Come hear from Jason Farnsworth at Adobe (the lead technical contact who helped develop the popular white paper, Virtualizing Hadoop in Large Scale Infrastructures) about what it was really like to implement a virtualized Hadoop environment using EMC technologies, and get first-hand accounts of the challenges in doing so as well as the pay-off for persevering. This is not a commercial; it will be a real conversation with a great customer who delivered real results to a really cool business.
  • Open Data Platform Update
    • Presenter – Joel Dodd @ Pivotal and Shivaji Dutta @ Hortonworks
    • Abstract – As new types of data sources such as the Internet of Things (IoT) and analytics workloads gain significant momentum, companies need to adopt a data-driven approach in all their business operations. They are looking to a strong open source ecosystem, faster business innovation cycles, and flexible deployment options to do so. This session will discuss how new open source initiatives, including the Open Data Platform (“ODP”), along with the Pivotal Big Data Suite, can help customers realize their data-driven objectives while fostering greater advancements in the Big Data landscape. We will also cover how to represent the open strategy and how EMC will work with these solutions.
  • SAS – Analytics Portfolio and Hadoop Integration
    • Presenter – Marc Wellman and Gary Spakes
    • Abstract – SAS has a powerful analytics portfolio that customers have trusted for many years and continue to trust in the new era of Big Data. During this session, we will discuss the SAS analytics approach, their portfolio of technologies, the technical architectures required to support this exciting application environment, and how solutions across the EMC portfolio can be leveraged to provide best-in-class infrastructure for SAS applications.
  • Splunk – Hunk Technical Deep Dive and Demo
    • Presenter – Raanan Dagan, Senior SE @ Splunk
    • Abstract – Splunk is one of the hottest applications and frameworks in IT today, as their model for making machine-generated data accessible and valuable to everyone is resonating with customers far and wide. The team at Splunk is supremely aware of the pervasiveness of other frameworks, like Hadoop, and is embracing them in a powerful manner as they continue to develop their awesome Hunk offering and deliver deeper integration with Hadoop across the Splunk Enterprise stack. During this session, we will discuss the exciting developments coming from the Splunk teams in the months ahead, with a specific focus on the maturing Hadoop-centric offerings, so that EMC teams understand how to determine whether “to Hadoop, or not to Hadoop” with Splunk.
  • Cloudera – Data Fabric Update
    • Presenter – Matt Harris, Director of Systems Engineering @ Cloudera
    • Abstract – Cloudera has emerged as the leader in Hadoop deployments, and their unique ecosystem, which blends open-source and proprietary solutions, has evolved to solve a number of our customers’ biggest Big Data challenges. During this session, we are going to take a technical deep dive into their unique model for SQL on Hadoop with Impala and discuss the most successful models for EMC to partner with this powerhouse Big Data player.


The sessions were focused on helping the team of Data Fabrics Minors, who have already been through a lot of training on Hadoop and EMC solutions for analytics and Big Data, get a deeper understanding of our alliance partners’ solutions and the kinds of conversations those partners have with customers about their platforms. While all the sessions were exceptionally valuable and definitely helped our team enrich their knowledge and hopefully tell better stories, the highlight of the sessions (based on feedback from the audience of more than 150 each day) was the demo executed by Raanan from Splunk.


Raanan did a great job of quickly level-setting the team on why Splunk has invested in building the Hunk platform and its various use cases, then he jumped over to a Linux virtual machine running on his MacBook that housed a single-node version of Hortonworks HDP with Splunk Enterprise and Splunk Hunk running on the same machine. We saw how easy it was to query data using Splunk Enterprise, then he switched gears to Hunk and showed how the same query language and process can be leveraged with Hunk against data that lives in the HDP HDFS data store rather than in Splunk’s own indexes…what Splunk refers to as a “virtual index.” Once the queries were entered into the Hunk dashboard and executed, Raanan switched over to the HDP environment and showed the YARN jobs being initiated right there in Hadoop. Switching back to the Hunk dashboard, he showed intermediate results showing up in Hunk even before the full YARN job had completed…a really interesting piece of tech that allows analysts to see whether they are asking the right questions, in the right way, of the right data before the batch YARN process finishes. He then killed the job in Hunk and showed the job being killed in HDP on the spot. It was a killer showcase of what I think may be “the easy button” for executing queries against data that persists outside of Splunk in an HDFS environment…especially considering the value props and use cases for each technology component in the enterprise analytics ecosystem.
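
For anyone who wants to poke at that same behavior from code, here is a minimal sketch of the pattern using the Splunk SDK for Python (splunklib): kick off a search against a virtual index, then keep pulling preview results while the underlying YARN work is still running. To be clear, this is not Raanan’s demo code; the host, credentials, and the “weblogs_vix” virtual index name are placeholders I made up for illustration.

```python
# Sketch: run a search against a Hunk virtual index and read preview
# (intermediate) results before the underlying YARN job completes.
# Host, credentials, and the virtual index name are placeholders.
import time

import splunklib.client as client
import splunklib.results as results

service = client.connect(
    host="hunk.example.com",   # hypothetical Hunk search head
    port=8089,                 # default Splunk management port
    username="admin",
    password="changeme",
)

# "weblogs_vix" is a made-up virtual index backed by data sitting in HDFS.
job = service.jobs.create("search index=weblogs_vix status=500 | stats count by host")

while not job.is_done():
    if job.is_ready():
        # Preview results stream back while the Hadoop-side work is still
        # running, which is what lets an analyst sanity-check the question
        # early instead of waiting on the full batch job.
        for event in results.ResultsReader(job.preview()):
            if isinstance(event, dict):
                print(event)
    time.sleep(2)

# job.cancel() would tear down the search, mirroring the kill we saw in the demo.
```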


The integration that Hunk brings between Splunk and Hadoop has crazy cool implications for a lot of customers I have talked to over the last year who are looking to either extend their teams’ Splunk skills into other data silos like Hadoop or bring the power of Splunk to environments where data sets are too large to cost-effectively index into their Splunk Enterprise environments. With Hunk, Splunk seems to have validated one of my predictions: Hadoop will likely become more popular as a storage persistence framework that enables integration with more advanced processing tools (think Spark in the open source world) rather than as the standard for processing itself.
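
To make that storage-versus-processing split a little more concrete, here is a tiny, hypothetical PySpark sketch of the pattern: HDFS (which could just as easily be an Isilon endpoint speaking HDFS) simply persists the data, and Spark does the processing on top of it. The URI, path, and column names are invented purely for illustration.

```python
# Sketch of the "HDFS as persistence, Spark as processing" pattern.
# The HDFS URI, path, and column names below are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-persistence-demo").getOrCreate()

# The data lives in HDFS; Spark only reaches out to it at processing time.
events = spark.read.json("hdfs://namenode.example.com:8020/data/web_events/")

# The heavy lifting happens in Spark rather than in a Hadoop MapReduce job.
(events.groupBy("status_code")
       .count()
       .orderBy("count", ascending=False)
       .show())

spark.stop()
```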


The Data Fabrics Minor sessions and demo were all pretty darn awesome to be part of, and I hope all the Data Fabrics Minors got as much from each of the sessions as I did. I take a lot of pride in trying to curate great content and tools for our teams, and I encourage folks to provide feedback on how to make this program better, so feel free to let me know what you thought about it.


Now, it’s time for me to get this goodness of HDP and Hunk running against my virtual Isilon cluster acting as my HDFS tier…spin propeller, spin.
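
Before pointing a Hunk virtual index at that Isilon HDFS tier, my first step will just be a quick sanity check that the WebHDFS endpoint answers. Here is a rough sketch using the hdfs Python package; the SmartConnect hostname, port, user, and paths are all assumptions that will vary with how your access zone is configured.

```python
# Sketch: quick WebHDFS sanity check against an Isilon HDFS access zone
# before wiring a Hunk virtual index at it. Hostname, port, user, and paths
# are all placeholders; check your own Isilon access-zone settings.
from hdfs import InsecureClient

# SmartConnect zone name and WebHDFS port are assumptions for illustration.
client = InsecureClient("http://isilon-hdfs.example.com:8082", user="hdfs")

# List what's sitting in the directory the virtual index will point at.
for entry in client.list("/apps/hunk/weblogs"):
    print(entry)

# Pull a small file down locally to confirm reads work end to end.
client.download("/apps/hunk/weblogs/sample.log", "./sample.log", overwrite=True)
```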


Your Bearded Friend, Cory.