Testing VxRail as a Scalable Starter Platform for Splunk Enterprise – Part 1 of…..

by Keith Quebodeaux

EMC and VCE, also known as EMC Converged Platforms (EMC CP) (chest bump and crisp high five to the big data beard @Cory_Minton and @EMCBigData), have numerous platforms and products with proven value for Splunk Enterprise. VxBlock 540 and its associated XtremIO all-flash arrays for hot/warm data provide advanced data services and inline deduplication and compression to reduce the overhead of increased replication factors and search factors. VxRack provides the ability to scale out in a node-based architecture, using local storage more efficiently and improving IO performance with EMC ScaleIO software-defined storage, while still being able to scale investment in compute and storage independently. EMC Isilon provides efficient scale-out cold storage and potentially eliminates the motivation to freeze data. That’s all awesome.

Yea, so what about a net new Splunk customer that doesn’t have any of that and is just looking to deploy a starter kit for Splunk, and potentially scale out once they have demonstrated the ease of use and awesomeness that is Splunk for machine data? We think we have that covered too. Recently VCE/EMC CP entered the hyper-converged infrastructure (HCI) appliance market in partnership with VMware by launching VxRail. VxRail is a fully integrated VMware HCI appliance that starts at 4 nodes in a 2U enclosure and from that point can scale incrementally, node by node, to a maximum cluster size of 64 nodes, which just so happens to be the maximum cluster size of VMware. There are multiple models of VxRail with differing node types, but the current Splunk reference server calls for a minimum of 16 cores, so based on that the 160, 160F, 200, 200F, 240F, and 280F would be the appropriate appliances for Splunk.

Picture1.png

Just a quick note about VxRail nomenclature: the trailing F indicates all-flash drives in the appliance. The back end storage in VxRail is vSAN based, and in addition to the awesome IO speed, an all-flash VxRail also enables advanced data services on vSAN. The value of vSAN advanced data services as it relates to Splunk is to be considered TBD at this point while we continue testing their effectiveness. Our understanding of vSAN deduplication and compression is that it does not span multiple nodes, and since the CPU requirements of Splunk will generally dictate a single Splunk Enterprise server (or, in a distributed deployment, a single indexer) per VxRail node, we will have to do some testing and introspection. (Note the use of small ‘o’ throughout representing only the opinions of two architects cloistered in a room heavy with the scent of stale pizza, post digested burritos, and sour gummy bears and not the opinion of EMC, VCE, VMware, Splunk or any rational and showered individual.)

So speaking of those un-showered and un-shaven data bearded individuals: with the resources available in a single 2U VxRail, someone, and in this instance that would be us (but hopefully in the near future that would be you too), could start with as little as 20GB/day of ingest in a starter kit and, depending on replication factor and data retention requirements, expand to potentially 400-500GB per day of ingested Splunk data in a single 2U appliance. To validate that this architecture will meet or exceed Splunk’s published reference server hardware, we are currently testing a single instance of Splunk on a VxRail 200 with 16 vCPUs, 128GB RAM, a 2.4TB data volume, and 1TB of data. The intent is not only to validate Splunk on VxRail (and, as a precursor, vSAN), but to see just how easy it is to expand that environment to a distributed deployment on the same appliance.
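Since the headroom above hinges on replication factor and retention, here is a rough back-of-the-napkin sketch of how those knobs drive disk consumption. It is not part of our testing and not an official sizing tool. It assumes the commonly cited rule of thumb that compressed rawdata lands around 15% of raw ingest and index files around 35%; real ratios vary a lot by data type. The function name, the default percentages, and the retention figures in the usage lines are all hypothetical placeholders.

```python
def splunk_storage_gb(daily_ingest_gb, retention_days,
                      replication_factor=1, search_factor=1,
                      rawdata_pct=15, index_pct=35):
    """Rough estimate of total indexed storage (GB) across a deployment.

    Assumption (not from our testing): compressed rawdata is ~15% of raw
    ingest and index files are ~35%. Rawdata is kept once per replicated
    copy; index files are kept once per searchable copy.
    """
    if search_factor > replication_factor:
        # Splunk requires search factor <= replication factor.
        raise ValueError("search factor cannot exceed replication factor")
    # Percentage of each day's raw ingest that ends up on disk.
    on_disk_pct = (rawdata_pct * replication_factor
                   + index_pct * search_factor)
    return daily_ingest_gb * on_disk_pct / 100 * retention_days


if __name__ == "__main__":
    # Hypothetical starter kit: 20GB/day, single copy, 90 days retained.
    print(splunk_storage_gb(20, 90))
    # Hypothetical scaled-out case: 500GB/day, two searchable copies, 30 days.
    print(splunk_storage_gb(500, 30, replication_factor=2, search_factor=2))
```

Swap in compression ratios measured from your own data before trusting any number that comes out of this; the point is only that each extra replicated or searchable copy multiplies the footprint.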

Picture2

As was clearly stated, our objective is to validate that VxRail will meet or exceed Splunk’s published reference hardware standards. We are working closely with the team at Splunk not only to learn more about how they “pressure test hardware” but also to make sure that we configure a solution that will absolutely help customers successfully deploy Splunk atop VCE/EMC Converged Platforms. As more official validations are achieved, we will certainly share them, but for now this remains only an update on the opinions and workings of what may be considered irrational and punch drunk engineers having a lot of fun getting our hands dirty.

Your Goateed Friend,

Keith Quebodeaux