It’s been a while since I’ve posted a blog, but I wanted to dust off the metaphorical pen (aka my Logitech keyboard), and talk about why I am excited about the upcoming Cloud Data Lake conference known as Subsurface.  

For those of you who are not familiar with Subsurface, this is a FREE two-day (July 21st & 22nd) community-driven virtual conference focused on the Cloud Data Lake ecosystem. While Subsurface is hosted by Dremio, the focus is 100% on the projects, innovations, and technologies being led by the Data Lake community.

This will be the third Subsurface event; the first was last summer, and the second was this past winter, with over 10,000 attendees braving the virtual cold to join and hear what the best and brightest in the industry are doing with their Data Lakes! If you’d like to catch a glimpse of the awesomeness to come, check out some of the past sessions.

This upcoming Subsurface is shaping up to be the most electric and exciting one yet, with keynotes from some of the biggest players in the industry and AWS, Microsoft, and Intel as Presenting Sponsors. In addition, there are more than 30 technical breakout sessions that include speakers from Uber, Netflix, LinkedIn, and even our Big Data Beard friend Thomas Henson! Here’s the complete agenda of sessions for your browsing pleasure.

Now that you have an idea of what the conference is about, let me highlight some of the sessions that I am most excited about!

  1. “Distributed Transactions on the Data Lake with Project Nessie”
    • Project Nessie is an open source project created by the team at Dremio and shared with the OSS community. Nessie enables users to maintain multiple versions of their data and leverage Git-like Branches & Tags for their Data Lake. Nessie integrates with Apache Iceberg, Delta Lake, Hive, and more. Ryan Murray, one of the creators of Nessie, will be the speaker, and this session will cover some exciting innovations for the cloud data lake.
  2. “Why and How Netflix Created and Migrated to a New Table Format: Iceberg”
    • Netflix has been at the forefront of data analytics innovation for years now. A lot of their internal projects have made their way into the open source community and have had huge impacts. A few years ago, Cory and I recorded a podcast with Michelle Ufford, former Head of Data Science Tools at Netflix, and we were blown away by the innovation and focus on Data Analytics. Anytime we get to hear someone at Netflix talk about what they are doing internally, we should listen and take notes!
  3. “Best Practices for Building a Cloud Data Lake”
    • Cloud Data Lakes are still relatively new, and many companies are still trying to figure out how to build and implement a Cloud Data Lake to handle the scale and complexities needed for our data-driven world. In this session, Roy Hasson from AWS will focus on architectural considerations and best practices for building a Cloud Data Lake, with a specific focus on ensuring security and using AWS services, including Lake Formation and Glue.
  4. “Avoiding the Architecture Undertow: Building Lightning-Fast Queries with Blazing Fast Object Storage”
    • I have to give a shout-out to Big Data Beard contributor Thomas Henson. Thomas’s knowledge of the data analytics ecosystem and data lake architectures is incredible, and every time he gives a session I learn something. A lot of customers are using on-prem object storage for their Data Lake, and Thomas will discuss how to architect a data lake to get the best performance.
  5. Lastly, I am super excited about all the keynote sessions from AWS, Microsoft & Intel. Here is the breakdown:
    • AWS’s Kevin Miller, GM, S3 Storage, will discuss the role of S3 in Cloud Data Lake storage, both today and in the future. Kevin will also talk about how AWS is working to help customers manage the scale and complexity of the Cloud Data Lake.
    • Microsoft’s Jurgen Willis, VP, Product Management, Azure Storage, will talk about how Microsoft enables customers to control their data and use it in an open architecture. I heard there might be a cool demo taking place as well!
    • Jeremy Rader, GM, Enterprise Strategy & Solutions Group, Intel, will be talking to Eric Kavanagh and Tomer Shiran about how Intel is enabling Data Analytics and the Data Lake today.

This is just a glimpse of what Subsurface has to offer. Honestly, as I was picking sessions to highlight, I had trouble picking just five! There are a ton of sessions that are super-interesting, and I’m thankful they’ll be available on-demand following the event.

I highly encourage everyone to register for Subsurface (HAVE I MENTIONED IT’S FREE?!), block off some time, and attend these really interesting and informative sessions.  

I’ll leave you with my new favorite YouTube video for motivation.  

In case you’re wondering why this is my new favorite video, Dremio’s mascot is a Narwhal… enough said.

Your Bearded Friend, 

Brett