Elastic Big Data Infrastructure with VMware’s Big Data Extensions 2.1 and Isilon

Written and originally posted in December of 2014…

The holiday season is upon us, and many of us will soon lie miserable on a couch after we inevitably consume copious quantities of tryptophan-laden turkey, ransack the cornucopia of casseroles and indulge in one too many sweet treats. With our bellies chockablock, we over-indulgers will immediately regret these transgressions of gluttony. However, a rare few innovators, like yours truly, will enjoy holiday bliss regardless of suboptimal decisions thanks to what has become my secret weapon in holiday happiness…elastic waistband pants.


Elastic is forgiving, freeing if you will. It allows my waist to expand and contract as the demands dictate without causing discomfort or encumbering my agility. And if I make the right choice in a shirt that doesn’t require tucking in to look proper, then my friends and family are none the wiser…it’s a beautiful gift I give myself each year and you should too.

I want elastic everywhere. I want to translate the wonder of elasticity into every corner of my life. Thanks to our friends at VMware and their recently released 2.1 version of Big Data Extensions (BDE), we can now bring the freedom, comfort, and unencumbered enjoyment of elasticity into your Big Data infrastructure.

The functionality of virtualizing Hadoop started with Project Serengeti in 2012, whose goal was to make it easy to deploy Hadoop or HBase clusters on a vSphere platform. The product of this open-source project was rolled into vSphere in 2013 as BDE when customers began looking for a commercially supported version of what Project Serengeti had created. Now, BDE is included at no additional cost for customers with VMware vSphere Enterprise and Enterprise Plus licensing agreements. For more official details specific to BDE from VMware and how it works, visit the VMware BDE page.

Virtualizing Hadoop makes sense for the same reasons virtualization worked so well for every other enterprise application…efficiency, scale, and simplicity. Many implementations of critical applications like Oracle and SAP had an issue where islands of servers were operating well below their performance thresholds, but VMware helped us resolve those problems by virtualizing application servers and leveraging external, shared storage. That’s the idea behind BDE…treat Hadoop and Big Data applications like other enterprise applications while recognizing that the data services to support these applications and workflows are just a little different. Interestingly enough, many of the same principles I outlined in a previous blog that make Hadoop on Isilon an interesting solution in the enterprise are analogous to the reasons why BDE is so relevant.

The recent release of BDE version 2.1 brings us a couple of new features that help evolve BDE into an even more powerful tool for simplifying infrastructure deployments that support Big Data applications. The most powerful and impactful feature comes from tighter integration with provisioning tools from major Hadoop vendors Cloudera and Hortonworks. With previous versions of BDE, you could create the basic Hadoop cluster but it would have no Hadoop software in it, meaning you would need to manually load the Hadoop software on the virtual machines using the vendors’ installation and configuration tools.


Now in 2.1, the process is greatly improved because BDE can call the APIs from Cloudera Manager or Ambari Blueprints to not only create the cluster resources, but actually initiate the installation and configuration of operational Hadoop environments. Whether you use the GUI or just love the command line, this process is well automated in BDE 2.1, and it means that we can now leverage the load balancing and resource management of vSphere’s Distributed Resource Scheduler (DRS) for Hadoop (think resource pools, reservations, shares, limits, etc.). More simply put, the elasticity and enterprise features that VMware provides to all other mission-critical apps now come into play with Big Data…I’ll have another helping of that please!
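To make the Ambari side of that hand-off a little more concrete, here is a minimal Python sketch of the two documents an Ambari Blueprints workflow generally involves: a blueprint (which components run on each host group) and a cluster creation template (which hosts land in each group). This is an illustration only, not BDE's actual payload, which this post does not show. The blueprint name, cluster name, hostnames, component layout and HDP version below are all hypothetical, and nothing is sent over the network.

```python
import json

# Ambari's REST root; host, port and credentials are deployment-specific.
AMBARI_API = "/api/v1"


def build_blueprint(stack_name, stack_version, host_groups):
    """Build a blueprint document: which components run on each host group."""
    return {
        "Blueprints": {"stack_name": stack_name, "stack_version": stack_version},
        "host_groups": [
            {
                "name": group,
                "cardinality": str(spec["count"]),
                "components": [{"name": c} for c in spec["components"]],
            }
            for group, spec in host_groups.items()
        ],
    }


def build_cluster_template(blueprint_name, hosts_by_group):
    """Build a cluster creation template: which hosts join each host group."""
    return {
        "blueprint": blueprint_name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": h} for h in hosts]}
            for group, hosts in hosts_by_group.items()
        ],
    }


def planned_requests(blueprint_name, cluster_name, blueprint, template):
    """List the (method, path, body) calls a provisioner would make, in order."""
    return [
        ("POST", f"{AMBARI_API}/blueprints/{blueprint_name}", json.dumps(blueprint)),
        ("POST", f"{AMBARI_API}/clusters/{cluster_name}", json.dumps(template)),
    ]


if __name__ == "__main__":
    # Hypothetical layout: one master VM and three worker VMs.
    layout = {
        "master": {"count": 1, "components": ["NAMENODE", "RESOURCEMANAGER", "ZOOKEEPER_SERVER"]},
        "worker": {"count": 3, "components": ["DATANODE", "NODEMANAGER"]},
    }
    blueprint = build_blueprint("HDP", "2.1", layout)
    template = build_cluster_template(
        "demo-blueprint",
        {
            "master": ["master0.example.com"],
            "worker": [f"worker{i}.example.com" for i in range(3)],
        },
    )
    for method, path, body in planned_requests("demo-blueprint", "demo-cluster", blueprint, template):
        print(method, path, len(body), "bytes")
```

The point of the sketch is the division of labor: the virtualization layer knows which VMs exist and what their hostnames are, while the blueprint carries the Hadoop-specific knowledge, so once both documents exist the installation no longer needs a human in the loop.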


That is the big deal with BDE 2.1…not only can we virtualize Hadoop, but the integration with Hadoop is now so tight that the power of VMware for mission-critical applications is relevant to Hadoop as well.

The downstream impacts of this improved integration mean big things for virtual Big Data deployments:

  • With VMware’s vSphere BDE 2.1, we can easily spin up and spin down Big Data infrastructure automatically and deliver automated elasticity to our Hadoop environments. This elasticity is enabled by leveraging the increasingly impressive set of automation and orchestration tools and APIs from Hadoop distributions, like Hortonworks’ Ambari and Cloudera Manager.
  • The simple elegance of BDE in vSphere combined with VMware’s vRealize Automation (formerly vCloud Automation Center) means that enterprise IT can deliver automated Hadoop-as-a-Service with self-service provisioning tools to empower end users with Hadoop and Big Data resources on demand.
  • Combining the power of VMware’s BDE for automatic right-sizing of Hadoop compute resources with Isilon’s industry-leading scalability and efficiency for HDFS storage creates arguably the most agile, elastic and extensible Hadoop infrastructure package in the industry.
  • All this automation and elasticity through VMware’s BDE and Isilon for HDFS comes complete with the most robust, enterprise-IT-proven security, governance, and compliance tools for any Big Data environment.
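The right-sizing in the list above leans on the vSphere resource controls named earlier: reservations, shares and limits. As a rough intuition for how the three interact when a host cluster is under contention, here is a deliberately simplified Python model: every pool first receives its reservation as a floor, then the spare capacity is split in proportion to shares, and no pool is pushed past its limit. This is an assumption-laden sketch and not the DRS algorithm (real entitlements also depend on live demand, among other things); the pool names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    reservation: float            # guaranteed floor, in MHz
    shares: int                   # relative weight under contention
    limit: float = float("inf")   # hard ceiling, in MHz


def allocate(capacity, pools):
    """Split capacity: reservations first, then spare by shares, capped by limits."""
    alloc = {p.name: p.reservation for p in pools}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("reservations exceed capacity")

    active = [p for p in pools if alloc[p.name] < p.limit]
    while remaining > 1e-9 and active:
        total_shares = sum(p.shares for p in active)
        # Pools whose proportional grant would reach their limit get capped.
        capped = [
            p for p in active
            if alloc[p.name] + remaining * p.shares / total_shares >= p.limit
        ]
        if not capped:
            # Nobody hits a ceiling: hand out everything proportionally.
            for p in active:
                alloc[p.name] += remaining * p.shares / total_shares
            remaining = 0.0
        else:
            # Pin capped pools at their limit, then re-split what is left.
            for p in capped:
                remaining -= p.limit - alloc[p.name]
                alloc[p.name] = p.limit
            active = [p for p in active if p not in capped]
    return alloc


if __name__ == "__main__":
    pools = [
        Pool("hadoop-prod", reservation=2000, shares=4000),
        Pool("hadoop-dev", reservation=0, shares=2000, limit=1000),
        Pool("other-apps", reservation=1000, shares=4000),
    ]
    for name, mhz in allocate(10000, pools).items():
        print(f"{name}: {mhz:.0f} MHz")
```

In this toy run the development pool is held at its 1000 MHz ceiling, and the capacity it cannot use flows to the production Hadoop pool and the other applications in proportion to their shares. That is the elastic behavior in miniature: compute grows and shrinks per pool without anyone re-racking servers.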


While I am as excited about this as I am about sweet potato casserole, the excitement is not all mine. Even our partners at Cloudera and Hortonworks are jazzed about what VMware is delivering in terms of enterprise capabilities in Big Data deployments.

Cloudera’s Director of Cloud Products, Tushar Shanbhag, expressed his delight in a recent blog post:

“This significantly speeds up time to value for IT operations person servicing Hadoop deployments on virtualized infrastructure. The architect, developer, QA testing person or other user comes with a request to the administrator, perhaps with a specification of their desired cluster. The vSphere administrator can now carry out the provisioning task for them using VMware BDE integration with Cloudera Manager.

We’re very excited to see more of our key partners like VMware leveraging Cloudera Manager APIs to enable better Hadoop experiences for their users.”

One of Hortonworks’ bloggers, Jeff Sposetti, recently posted a blog touting how well BDE enables Hadoop-aaS:

“Now the two companies are also working together to innovate in the area of ease of use for administrators of virtualized HDP clusters. This blog explains the features that are available for using Apache Ambari and VMware vCenter (with vSphere Big Data Extensions) in concert to cleanly provision, manage, and monitor your virtualized Hadoop clusters.

The BDE-Ambari integration works well for the user communities where an IT operations person or an architect is providing Hadoop-as-a-Service.”

Clearly, VMware should be excited about what this means for their customers’ Big Data efforts, and their Technical Marketing Manager, Justin Murray, posted a great blog detailing the configuration of BDE 2.1 with both Hortonworks and Cloudera.

It is abundantly obvious now that VMware has put an inexorable focus on helping customers provide the same ITaaS features and data services for Hadoop as they have elsewhere in the enterprise, and BDE 2.1 is the product of that relentless focus. The power of VMware’s BDE is made even more impressive when you seamlessly integrate it with the scalability, ease of use, flexible interoperation, and enterprise data services that Isilon delivers as your Data Lake storage substrate. Just as Isilon continues to reimagine Big Data infrastructure with a lens for enterprise-class standards, we are elated to know that our partner VMware has a shared view of development to support these next-generation applications. All this gooey technical goodness has my mouth watering just talking about it, and the customers with whom I have been discussing this vision and deliverable technology are equally voracious to get this running.

So I have a little holiday treat to satiate your appetites for Big Data goodies…if you are an Isilon and VMware customer today and you want to bring all this to life in your environment right now, then don’t wait to belly up to the buffet line of technology treats. We have step-by-step instructions called “Hadoop Starter Kits” that outline how to deploy VMware’s BDE with Isilon as your HDFS storage, and they are hot and ready for you on the EMC Community Network.

Enjoy the gift that is elasticity…in your Hadoop cluster deployments and in your clothes this holiday season.

References:

All images are courtesy of VMware and EMC.