This Monday I had a briefing with Datrium. Their tagline is “Open Convergence”. I was grasping for a snappy title for this post as a lead-in to writing about what they do. As ever, my contrarian brain hit upon the opposite of convergence, which is divergence. I rather like “hyper-divergence” because, for me, it describes the fact that despite the massive growth in the “hyper-convergence” marketplace, there persist radically different approaches to “getting there” – both in the method of consumption (build-your-own VSAN vs. the ‘appliance’ model) and in the architecture (shared storage accessible directly from a hypervisor kernel (VSAN), or a “controller” VM which shares the storage back out to the hypervisor (Nutanix)). I think Datrium and the recently announced NetApp HCI are delivering yet more options on both the consumption and architecture fronts.
For Datrium, the consumption and architecture models go hand in hand. To use some clumsy shorthand to describe their technology – it’s as if a relatively new storage company (say Tintri or Nimble) and a server-side cache company (say PernixData or Proximal Data) had come together. This is a gross generalisation which I know the folks at Datrium would want to make more nuanced, but if brevity is the soul of wit, that would be it in a nutshell. What makes Datrium “Open” is that whilst they will happily sell you “Compute Nodes”, you are not obligated to buy them. You can bring your own servers and your own vSphere licenses, add SSD, and then sprinkle on VMware ESXi together with Datrium’s secret sauce. However, for the system to work, you do need to buy a “Data Node”. All IOPS are contained within the read/write SSD storage of the compute node, with persistent storage being written back to the Data Node. In the past we used to call a data node a “SAN”, but that term is a little toxic to some – and whilst the Datrium “Data Node” provides many of the features of an enterprise SAN, it does so without the complexity of configuration that sometimes comes with a SAN. Datrium thinks this offers some unique advantages over the kind of hyper-convergence that requires the purchase of dedicated appliances (often at vastly inflated prices compared to the cost of the OEM equipment). It allows you to decouple your compute from your storage in such a way that it’s open to folks who aren’t necessarily the “storage admin”. For me, hyper-convergence has always meant the consolidation of storage/servers/network into a complete solution – and that extends to the person managing it too: it converges that skill set into an individual, where previously these were specialist skills in specialist silos managed by specialist teams.
So that’s the high-level thumbnail sketch, which necessarily oversimplifies any offering. Now for some detail, wherein we will find both angels and devils. Any consumption/architecture model is likely to involve trade-offs, either for the consumer or the vendor – and hyper-convergence is no different. For instance, where hyper-convergence is a combination of storage and compute bonded together, the assumption is that your consumption of storage and compute will grow at the same ratio. The tricky thing is getting those ratios right. It’s relatively ‘easy’ when it comes to workloads like virtual desktops, where the workloads are a series of identical, expendable cattle – it’s ‘hard’ when dealing with an array of radically different server-side apps that consume data at different rates. The ‘danger’ of conventional hyper-convergence is that you end up with too much compute and not enough storage, or not enough compute and too much storage. Finding the right “Goldilocks” size of hyper-convergence appliance isn’t necessarily easy, and of course what if your needs or workloads change? This could result in under-utilized compute, with additional appliances being added just for more storage. You can see this perpetual challenge in the architecture – and it’s one of the reasons for arguing that storage and compute should not be so closely coupled. The counter-argument from the hyper-convergists is that the traditional servers/storage/network model was excessively complicated and required a small army of specialists in white coats to keep it alive. Hyper-convergence did away with the kind of data availability management that generations of SAN users had become accustomed to – and so the hyper-convergence companies have had to pivot hard to add such things as snapshots, replication, compression, dedupe, erasure coding, and encryption.
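To make the “Goldilocks” sizing problem concrete, here’s a little back-of-envelope sketch. All the figures and the helper function are invented for illustration – no vendor’s actual appliance specs – but it shows how fixed compute:storage ratios force you to over-buy one resource to satisfy the other:

```python
# Hypothetical illustration of the fixed-ratio sizing problem: with
# hyper-converged appliances, you must buy enough identical nodes to
# satisfy BOTH the compute demand and the storage demand, so whichever
# resource you need less of ends up over-provisioned.
import math

def nodes_needed(req_vcpus, req_tb, vcpus_per_node, tb_per_node):
    """Return (node count, surplus vCPUs, surplus TB) for a given demand."""
    nodes = max(math.ceil(req_vcpus / vcpus_per_node),
                math.ceil(req_tb / tb_per_node))
    return (nodes,
            nodes * vcpus_per_node - req_vcpus,
            nodes * tb_per_node - req_tb)

# A storage-heavy workload on a compute-heavy appliance (invented numbers):
nodes, spare_vcpus, spare_tb = nodes_needed(
    req_vcpus=100, req_tb=200, vcpus_per_node=64, tb_per_node=20)
print(nodes, spare_vcpus, spare_tb)  # 10 nodes bought, 540 vCPUs sit idle
```

Decoupling storage from compute, as Datrium argue for, is precisely an attempt to escape that `max()` – you scale each side independently instead.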
So it’s one thing to dispense with the complexity of LUNs and LUN presentation (WWNs, IQNs and so on) for a simpler, almost cloud-like “big buckets” approach to storage – but most companies would balk at the prospect of losing the features they get from an enterprise SAN.
Datrium believes its architecture has the right balance of components and delivers a consumption model that will make it attractive to customers. Below is a simple architecture diagram of their model:
The compute nodes can be your own servers fitted with SSD, or you can purchase “Compute Nodes” from Datrium. As you might expect, Datrium isn’t in the OEM business, so these will be re-badged servers from an OEM. Although I asked, I wasn’t able to prise out of Datrium who their vendor is. Apparently, the vendor resells their gear with a non-disclosure agreement attached. No doubt anyone looking at the rear of the chassis or seeing the start-up/boot-up screens would figure it out in an instant.
The nodes do not have to be the same size in terms of compute/storage – although I would say that best practices around HA/DRS usually mean people DO create clusters using similarly sized servers. All the IO and active data processing is held within the SSD on the server. That means configuring enough SSD for the working set. Datrium were keen to stress that this is not just a read-only caching system – which is why a glib, simplistic comparison to vendors who do this to accelerate access to a legacy SAN is somewhat lazy of me. But what can I say? I can be a lazy person sometimes! The Data Node is a mandatory component – so Datrium is not a ‘software only’ solution. Essentially, it acts as the persistent storage layer for the system – or what Datrium dub “Durable Capacity”.
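On the point about “configuring enough SSD for the working set”, here’s a deliberately naive sketch – this is not Datrium’s sizing tool, and the even-split and headroom assumptions are entirely mine – of the kind of arithmetic involved when all active IO must be serviced from local flash:

```python
# Back-of-envelope working-set sizing sketch (hypothetical, not any
# vendor's actual method). If all reads and writes are serviced from
# server-side SSD, each host's flash must comfortably hold its share
# of the cluster's active working set.
def ssd_per_host_gb(working_set_gb, hosts, headroom=1.3):
    """Naive even split of the working set across hosts, with a
    made-up 30% headroom factor for growth and metadata."""
    return working_set_gb / hosts * headroom

# E.g. a 4 TB active working set spread over 4 hosts (invented numbers):
print(round(ssd_per_host_gb(working_set_gb=4000, hosts=4)))  # 1300 GB/host
```

Real working sets are rarely spread this evenly, of course – which is exactly why getting the flash sizing right matters more here than in a read-cache-in-front-of-a-SAN design.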
This Data Node offers the kind of storage management capabilities of modern enterprise storage. It offers up an NFS target to the VMware ESXi hosts, but the UI and management tools are focused squarely on virtual machines and their virtual disks. It’s this which makes me want to compare it to the new generation of storage vendors who came on the scene in the last decade. I think I might have to revise my use of the word “new”, as some of these companies are now coming up on their first decade of trading – and in our fast-paced world, 10 years ago now feels like a century. So alongside the usual suspects of replication, compression, and dedupe, Datrium also support encryption. What I think is interesting is that whilst these features were missing from the “Minimum Viable Product” that typified the first generation of hyper-convergence, and were added at a later stage, the hyper-convergence market has “matured” to the degree that you cannot nowadays launch a hyper-convergence technology that’s missing these features. They have to be there from almost Day One to get any serious traction in a market that’s already quite crowded. So Datrium offers end-to-end encryption at points in the data flow cycle.
All the management of the system is carried out from vCenter using a plug-in. So rather than pushing yet another management “Single Pain of Glass” to go along with all your other “Single Pains of Glass” – you do management from vCenter. Of course, when I say “management” – there are no LUNs or other array-type objects to look at, so the focus is squarely on the hosts and their associated VMs.
Finally, futures – I wasn’t able to get much out of Datrium, but they did indicate that they want to deliver their own replication to targets in AWS, such as S3, for off-site DR purposes.