
Hadoop Sizing – A Basic Capacity Approach

From time to time, I have the opportunity to work with customers setting up their initial Hadoop clusters, and we are often asked what should be a simple question:

“How many nodes do I need in my initial Hadoop cluster?”

In my time in the field, I've struggled to give an easy answer to that question, even though it struck me as something that should be pretty straightforward. Well, after a recent conversation with some senior field engineers at one of the big Hadoop shops, I finally got a simple, elegant answer based on initial storage requirements. So it goes like this:
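Here's a minimal sketch of that math, assuming the usual HDFS rules of thumb: 3x replication, roughly a quarter of your capacity kept free for temp/intermediate data, and some fixed amount of usable disk per DataNode (the 48 TB per node below is purely illustrative, not a recommendation):

```python
import math

initial_data_tb = 100                       # example: 100 TB of data landing in HDFS
raw_tb = initial_data_tb * 3 / (1 - 0.25)   # 3x replication, ~25% free for temp -> 400 TB raw
datanodes = math.ceil(raw_tb / 48)          # 48 TB usable per node (assumption) -> 9 nodes
```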

Pretty easy, right?

If you are interested in how we came up with this, here is the guidance:
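In a nutshell: every HDFS block gets written three times (the default replication factor), you want roughly 25% of raw capacity free for shuffle and intermediate data, and compression (if you use it) shrinks the footprint before replication multiplies it. A parameterized sketch of the same math, where every default is a common planning assumption rather than a hard requirement:

```python
import math

def hdfs_datanodes(initial_data_tb,
                   replication=3,         # HDFS default replication factor
                   temp_headroom=0.25,    # fraction of capacity reserved for shuffle/temp
                   compression=1.0,       # 1.0 = no compression; e.g. 0.5 for 2:1
                   disk_per_node_tb=48):  # usable disk per DataNode (illustrative)
    """Back-of-the-napkin DataNode count from an initial storage requirement."""
    raw_tb = (initial_data_tb * compression * replication) / (1 - temp_headroom)
    return math.ceil(raw_tb / disk_per_node_tb)

print(hdfs_datanodes(100))                   # -> 9 nodes
print(hdfs_datanodes(100, compression=0.5))  # 2:1 compression -> 5 nodes
```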

Extrapolated further for Isilon sizing, I came up with this:
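For Isilon, the math changes shape: OneFS protects data with erasure coding rather than 3x replication, and MapReduce temp space typically lives on the compute nodes' local disks, so the Isilon side only has to hold the data itself. A sketch, assuming a common ~80% usable-to-raw planning figure (your actual OneFS protection level will move this number):

```python
import math

def isilon_capacity_tb(initial_data_tb,
                       compression=1.0,
                       efficiency=0.8):  # OneFS erasure-coding efficiency (assumption)
    """Raw Isilon capacity for the same data set: no 3x replication, and
    shuffle/temp space lives on compute-local disk instead of the array."""
    return math.ceil(initial_data_tb * compression / efficiency)

print(isilon_capacity_tb(100))  # -> 125 TB on Isilon vs ~400 TB of DAS in the HDFS sketch
```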

Clearly, this is a super simplified approach, but dang if it isn't handy?!? Now, I am well aware of many cases where this number and the configuration of a Hadoop cluster depend on more factors than capacity…like, say, whether you are planning to use Spark, Spark Streaming, HAWQ, Impala, Tez, and on and on, but it's a handy place to start. And to top it off, when you deploy Hadoop on Isilon, you have a ton of flexibility to scale compute and storage independently, which lets you address infrastructure constraints based on workload requirements.

So here’s to making Hadoop as simple as possible…one cluster at a time.

-your bearded friend
