While I stand by my position that we put too much of the responsibility for keeping our data safe for the long term on storage systems, as I wrote in Long Term Retention: It's More Than Media, I also believe that you bet on different horses for different courses. Startup Amplidata's new AmpliStor system has most of the features on my wish list for storing large data objects like medical images or rich media.
AmpliStor is a scale-out object store based on the redundant array of inexpensive nodes (RAIN) model. Applications use a RESTful or Python API to connect to controller nodes, and the controller nodes connect to their associated storage nodes over Gigabit Ethernet. The system can theoretically support thousands of nodes.
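To make the access model concrete, here's a minimal sketch of what storing and fetching an object through a RESTful interface could look like from Python. The controller host name, object path, and headers are illustrative assumptions, not Amplidata's documented API.

```python
import requests

# Hypothetical controller-node endpoint and object path; AmpliStor's real
# REST namespace and authentication scheme may differ.
CONTROLLER = "http://controller-1.example.com:8080"
OBJECT_PATH = "/namespace/medical-images/study-42/image-001.dcm"

# Store the object: a plain HTTP PUT to a controller node.
with open("image-001.dcm", "rb") as f:
    resp = requests.put(
        CONTROLLER + OBJECT_PATH,
        data=f,                                   # stream the object body
        headers={"Content-Type": "application/dicom"},
    )
resp.raise_for_status()

# Reading it back is a plain GET against the same path.
obj = requests.get(CONTROLLER + OBJECT_PATH)
print(len(obj.content), "bytes retrieved")
```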
Where most RAIN systems use RAID and/or object replication for data protection, AmpliStor uses a unique set of advanced erasure codes that Amplidata calls BitSpread. Like Reed-Solomon codes, BitSpread provides a much higher level of data integrity than more conventional RAID systems. BitSpread implements erasure codes on a per-object basis: as each object is stored, the system applies the forward-error-correction math, breaks the data into chunks, and distributes those chunks across the drives in the cluster's storage nodes.
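BitSpread itself is proprietary, but the chunk-and-spread idea can be illustrated with a deliberately simple erasure code: split an object into equal data chunks and add one XOR parity chunk, so the object survives the loss of any single chunk. The toy sketch below (the names are mine, not Amplidata's) tolerates only one erasure where BitSpread's codes tolerate many, but the encode, spread, and reassemble flow is the same in spirit.

```python
import os

def encode_with_parity(data: bytes, data_chunks: int):
    """Split data into equal-size chunks and append one XOR parity chunk.
    Any single chunk (data or parity) can then be lost and recomputed."""
    chunk_len = -(-len(data) // data_chunks)          # ceiling division
    padded = data.ljust(chunk_len * data_chunks, b"\x00")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len]
              for i in range(data_chunks)]
    parity = bytearray(chunk_len)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return chunks + [bytes(parity)]

def reconstruct(chunks, original_len: int) -> bytes:
    """Reassemble the object; at most one entry in `chunks` may be None."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity survives only one lost chunk")
    if missing:
        chunk_len = len(next(c for c in chunks if c is not None))
        rebuilt = bytearray(chunk_len)
        for c in chunks:
            if c is not None:
                for i, byte in enumerate(c):
                    rebuilt[i] ^= byte
        chunks[missing[0]] = bytes(rebuilt)
    return b"".join(chunks[:-1])[:original_len]       # drop parity, strip padding

# Spread 4 data chunks plus 1 parity chunk across 5 hypothetical nodes,
# lose one, and rebuild the object anyway.
obj = os.urandom(1000)
stored = encode_with_parity(obj, data_chunks=4)
stored[2] = None                                      # simulate a failed drive/node
assert reconstruct(stored, len(obj)) == obj
```

Real erasure codes like Reed-Solomon generalize this: instead of a single parity chunk they compute several coding chunks, so that any sufficiently large subset of the stored chunks is enough to rebuild the object.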
Other erasure code-based systems, like NEC's HYDRAstor, allow you to specify the reliability level, that is, the number of blocks that can be lost while the data remains accessible. The AmpliStor system also allows you to specify the number of chunks each object is stored in and, therefore, how broadly the system will spread your data.
Unlike parity systems, AmpliStor stores both data and ECC information in every block, so the controller node can assemble a data object as soon as it has retrieved the minimum number of blocks needed to reconstruct it. If you select, say, 16 blocks and a reliability level of 4, the AmpliStor system will assemble objects once it has retrieved any 12 of them. For latency-tolerant applications, you could even specify 33 blocks with a reliability level of 13 and put 11 storage nodes in each of three data centers. All your data would remain protected, even in the event of a data center failure, with just over one-third overhead, where a more typical object replication system would need three times as much storage as data to cover similar failures.
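The storage arithmetic behind those policies is easy to check. This short sketch (my naming, not Amplidata's) computes, for a given total block count and reliability level, how many blocks a controller must retrieve and how the raw-capacity multiplier compares with keeping three full replicas.

```python
def policy_math(total_blocks: int, reliability: int):
    """For a policy of `total_blocks` where `reliability` blocks can be lost,
    return the minimum blocks needed to reassemble an object and the
    raw-storage multiplier relative to the object's own size."""
    needed = total_blocks - reliability
    multiplier = total_blocks / needed
    return needed, multiplier

for total, reliability in [(16, 4), (33, 13)]:
    needed, multiplier = policy_math(total, reliability)
    print(f"{total} blocks / reliability {reliability}: "
          f"reassemble from any {needed}, ~{multiplier:.2f}x raw storage "
          f"(vs. 3x for triple replication)")

# 16 blocks / reliability 4:  reassemble from any 12, ~1.33x raw storage
# 33 blocks / reliability 13: reassemble from any 20, ~1.65x raw storage
```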