In a previous blog post, I got all prophetic and said that I had seen the future of solid-state storage and that it was scale out. The sheer performance that even a small number of solid-state disks can deliver is just too much for a traditional scale-up storage controller to handle. The question then becomes: If the future of storage is scale out, which scale-out storage architecture is best?
Of course, scale-out storage architectures aren't just for solid-state; scale-out NAS and object-storage systems have been solving big data problems of one sort or another for a decade. Over time, we've seen vendors introduce several different scale-out architectures, each one attractive in its own way.
The simplest approach is to use a shared-disk clustered file system, like IBM's GPFS or Quantum's StorNext, to build a scale-out NAS system like IBM's SONAS or Symantec's FileStore. These systems use a central SAN array to hold data managed by multiple NAS heads, so both their scale and their performance are limited by the shared storage array. While clustered file systems are a good solution for organizations needing large, fast file repositories, they're not the answer for solid-state storage, where the bottleneck lies as much in the array controller as, if not more than, in the file system.
What we need for scale-out flash is a "shared nothing" cluster that allows us to add nodes to the system without any single choke point, like the shared array in a clustered file system. As we look at shared nothing storage systems, both all-solid-state and those based on spinning disks, we see two quite different architectures.
Some vendors, like SolidFire in the all-solid-state market, HP's LeftHand, and most object storage systems, build up their storage clusters from independent storage nodes. To allow the system to survive the loss of a storage node, they mirror data across two or more nodes in the cluster. This approach keeps the node hardware simple, usually using off-the-shelf servers, but because the data is mirrored across multiple nodes, they have to store at least two copies of everything. As a result, they're not terribly space efficient. While disk space is cheap, SSDs are less so, which may push solid-state vendors toward the twin model.
While storage systems for archival data can use cross-node RAID or, better yet, erasure coding to distribute data across multiple nodes without the overhead of mirroring, these approaches aren't well suited to the low-latency, high-IOPS applications that solid-state storage systems address. EMC's Isilon uses a combination of the two: mirroring for files or folders that are accessed randomly, and erasure coding for older files and those that will be accessed sequentially, like media files.
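To put rough numbers on the capacity trade-off, here's a back-of-the-envelope sketch. The figures are my own illustrative arithmetic, not any vendor's published efficiency numbers:

```python
# Capacity efficiency: two-way mirroring vs. k+m erasure coding.
# Illustrative arithmetic only, not any vendor's spec sheet.
def ec_usable_fraction(data_fragments: int, parity_fragments: int) -> float:
    """Fraction of raw capacity holding unique data under k+m erasure coding."""
    return data_fragments / (data_fragments + parity_fragments)

mirroring = 1 / 2                    # two full copies -> 50% usable
ec_10_2 = ec_usable_fraction(10, 2)  # survives two lost fragments -> ~83% usable
print(f"mirroring: {mirroring:.0%}, 10+2 erasure coding: {ec_10_2:.0%}")
```

A 10+2 layout tolerates two failures, just like keeping a third copy would, yet leaves roughly 83 percent of raw capacity usable against mirroring's 50 percent. The catch, as noted above, is the extra read-modify-write and reconstruction work on random I/O.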
Rather than using simple storage nodes, like rack-mount servers that have their own single points of failure, as their building blocks, twinned systems from Dell/EqualLogic and NetApp build a cluster of dual-controller systems. Since each storage node has two controllers and a block of storage, it can use RAID for data protection, keeping capacity overhead down. System designers can also build hybrid scale-up/scale-out systems by adding drive shelves to each controller pair.
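The in-node RAID advantage is easy to see with the same kind of back-of-the-envelope math (again, my own illustrative figures, not vendor data):

```python
# Why in-node RAID keeps capacity overhead down: usable fraction of a
# RAID-5 set vs. full cross-node mirroring. Illustrative arithmetic only.
def raid5_usable_fraction(drives: int) -> float:
    """RAID-5 dedicates one drive's worth of capacity to parity."""
    return (drives - 1) / drives

print(raid5_usable_fraction(12))  # ~0.92 usable, vs. 0.5 for two-way mirroring
```

A twelve-drive RAID-5 set inside a dual-controller node leaves about 92 percent of raw capacity usable, where a single-controller peer cluster mirroring across nodes tops out at 50 percent.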
The downside to paired systems is their behavior when a controller fails. When a controller in a paired system fails, its twin has to take on its entire workload, which may cause a significant performance loss. Most peer systems, by contrast, distribute the second copy of a node's data across all the other nodes in the cluster, so a node failure has a smaller impact on performance.
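The difference in failure behavior can be sketched with a simple model. This is my own simplification, with workload normalized so that 1.0 is a node's pre-failure load; real systems will differ:

```python
# Post-failure load on the surviving controller(s), normalized so 1.0 is
# a node's pre-failure workload. A simplified illustrative model.
def surviving_node_load(n_nodes: int, paired: bool) -> float:
    """Relative load per surviving node after a single failure."""
    if paired:
        # The twin absorbs its partner's entire workload on top of its own.
        return 2.0
    # Peer systems spread the failed node's work across the remaining nodes.
    return n_nodes / (n_nodes - 1)

print(surviving_node_load(8, paired=True))   # twin runs at 2x
print(surviving_node_load(8, paired=False))  # each peer runs at ~1.14x
```

In an eight-node cluster, a peer system's survivors each pick up about 14 percent more work, while a failed twin's partner has to carry double its normal load.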
The unanswered question for all-solid-state systems is how vendors will balance the cost of the additional SSD capacity that mirroring demands in single-node systems against the cost of the additional controllers in a twinned system. The answer will become apparent as we see scale-out twin systems from vendors like Pure Storage.
Disclaimer: Dell, HP, NetApp and SolidFire are or have been clients of DeepStorage LLC.