But Coraid hews to the appliance model of storage deployment, in which storage resources are still treated as infrastructure distinct from servers. It's an architecture soundly rejected by cloud-native operators such as Google and Facebook, along with many Hadoop implementations. At that scale, it makes no sense to waste a server's ample drive bays and processing headroom, where storage could be just another pooled resource like CPU cycles or memory, only to hang every host off an independent storage system.
In an era of distributed file systems like Ceph, GlusterFS, HDFS and Lustre, running cloud servers with DAS no longer means the capacity is dedicated to a single host and application. The only problem is that cloud stacks still support only file and object stores (with the caveat that Ceph does offer block support, though so far only to guests with the rbd kernel module installed or to QEMU VMs running on KVM).
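That block caveat is narrower than it sounds: QEMU reaches RBD images through librbd, and the same library is scriptable from user space. Here's a minimal sketch, assuming a running Ceph cluster, a pool named "rbd" and the stock config at /etc/ceph/ceph.conf; the image name is just an example.

```python
# Sketch: creating and writing a Ceph RBD (block) image from user space
# via the librbd Python bindings (python-rados / python-rbd). Assumes a
# reachable Ceph cluster, a pool named "rbd" and the default config at
# /etc/ceph/ceph.conf; the image name is an example.
import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('rbd')                       # pool name
    try:
        rbd.RBD().create(ioctx, 'demo-image', 4 * 1024 ** 3)  # 4 GB image
        image = rbd.Image(ioctx, 'demo-image')
        try:
            image.write(b'hello, block storage', 0)         # write at offset 0
            print(image.size(), 'byte image; first bytes:',
                  image.read(0, 20))
        finally:
            image.close()
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```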
Enter ScaleIO, a stealthy startup that hopes to do for storage what OpenStack and CloudStack have done for computing. SANs are hard to manage and harder yet to scale, so founder and CEO Boaz Palgi says ScaleIO set about building a SAN without the fabric and dedicated hardware, in which local disks in commodity servers are stitched together with software. That software provides all the features storage pros expect: high availability and performance, shared block volumes and distributed file systems, plus richer capabilities like snapshots, thin provisioning, disk and node redundancy, self-healing (replace a failed node and it's automatically reincorporated into the storage grid) and even performance QoS--a topic we explored in depth in Network Computing's February digital issue (registration required).
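ScaleIO hasn't published its internals, but the redundancy and self-healing behavior Palgi describes is easy to picture. The following is a toy sketch, not ScaleIO code: each chunk is kept on two different servers, a failed server's chunks get re-replicated onto the survivors, and a replacement node simply rejoins the placement pool.

```python
# Generic illustration (not ScaleIO's implementation) of node redundancy
# and self-healing in a software-defined block pool built from
# server-local disks.
import random
from collections import defaultdict

REPLICAS = 2

class StorageGrid:
    def __init__(self, nodes):
        self.nodes = set(nodes)
        self.placement = defaultdict(set)        # chunk id -> holding nodes

    def write_chunk(self, chunk_id):
        """Place a chunk on REPLICAS distinct servers."""
        self.placement[chunk_id] = set(random.sample(sorted(self.nodes), REPLICAS))

    def fail_node(self, node):
        """Drop a node and re-protect every chunk it held (self-healing)."""
        self.nodes.discard(node)
        for chunk_id, holders in self.placement.items():
            holders.discard(node)
            while len(holders) < REPLICAS and len(self.nodes) > len(holders):
                holders.add(random.choice(sorted(self.nodes - holders)))

    def add_node(self, node):
        """A replaced node rejoins the pool of placement targets."""
        self.nodes.add(node)

grid = StorageGrid(['server-a', 'server-b', 'server-c'])
for i in range(6):
    grid.write_chunk(i)
grid.fail_node('server-b')    # chunks on b are re-replicated to a and c
grid.add_node('server-d')     # the replacement is eligible for new writes
print({c: sorted(n) for c, n in grid.placement.items()})
```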
The idea is simple, says Palgi: turn local disk into a SAN, accessible by any server in the data center, that scales to thousands of systems and where adding capacity is as easy as connecting another server to the ScaleIO "hive mind." ScaleIO SANs can also mix and match solid-state and mechanical disks in a couple of ways. The simplest is building separate HDD and SSD volumes and binding applications to whichever is most appropriate. Alternatively, Palgi says, the software can use SSDs as a cache in front of HDD-backed SANs.
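Neither approach is exotic. Here's a simplified sketch, again not ScaleIO's implementation, contrasting the two: dedicated SSD and HDD volumes versus an SSD read cache fronting an HDD-backed volume (the class and volume names are invented for illustration).

```python
# Simplified illustration (not ScaleIO code) of the two SSD/HDD
# strategies: separate tiers per volume vs. an SSD read cache in front
# of an HDD-backed volume.
class Volume:
    def __init__(self, name, tier):
        self.name, self.tier = name, tier   # tier: "ssd" or "hdd"
        self.blocks = {}

    def write(self, lba, data):
        self.blocks[lba] = data

    def read(self, lba):
        return self.blocks.get(lba)

class CachedVolume:
    """HDD-backed volume with an SSD acting as a read cache."""
    def __init__(self, hdd_volume, cache_size=1024):
        self.backing = hdd_volume
        self.cache = {}                     # would live on the SSD
        self.cache_size = cache_size

    def read(self, lba):
        if lba in self.cache:               # hit: served at SSD speed
            return self.cache[lba]
        data = self.backing.read(lba)       # miss: go to spinning disk
        if len(self.cache) < self.cache_size:
            self.cache[lba] = data
        return data

# Option 1: bind the database to SSD, the archive to HDD.
db_vol = Volume('db', tier='ssd')
archive_vol = Volume('archive', tier='hdd')

# Option 2: keep data on HDD but accelerate reads with an SSD cache.
fast_archive = CachedVolume(archive_vol)
archive_vol.write(0, b'cold data')
assert fast_archive.read(0) == b'cold data'  # first read misses, then cached
```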
It's such a logical concept, one wonders why it's taken so long. Two words: it's hard. Palgi points out that even AWS, the world's premier cloud system, maintains a distinction between its object (S3) and block (EBS) storage services. And he says building cloud-like block storage is particularly difficult, noting that although all cloud providers offer some sort of object store, "almost no one offers an alternative to EBS." Servers access ScaleIO SANs through a kernel driver, with ports to Linux and ESX currently available and Windows to come. Palgi says the company has even been working with Calxeda to provide support on ARM servers.
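Because the volume arrives via a kernel block driver, applications treat it like any local disk; no special API is involved. A quick sketch, with the device path purely hypothetical (ScaleIO's actual device naming isn't documented here):

```python
# Hypothetical example: a volume exposed by a kernel block driver looks
# like any other block device, so ordinary I/O calls work against it.
# The device path below is a placeholder, not ScaleIO's real naming.
import os

DEV = '/dev/scaleio-vol0'              # placeholder path for an attached volume

fd = os.open(DEV, os.O_RDONLY)
try:
    first_block = os.read(fd, 4096)    # read the first 4 KB, raw
    print(len(first_block), 'bytes read from', DEV)
finally:
    os.close(fd)
```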
IT architectural trends are nothing if not cyclical, oscillating between epochs of extreme centralization and hyper-distribution. Storage, as witnessed by EMC's enormous and sustained success, has been living through an era of consolidation, but one that's come at a high price: the cost of buying and managing huge storage systems hasn't fallen anywhere near as fast as the price of the disks and flash chips inside them.
Cloud infrastructure is likely the catalyst to swing the architectural pendulum back toward distributed storage systems, where server storage bays don't sit empty, adding capacity isn't a moonshot project, and server admins needn't supplicate before storage gurus every time they need to spin up a new application.