In the open-source storage space, the most notable event of 2014 was the acquisition of Inktank by Red Hat, putting Ceph and Gluster in the same stable. This gives Ceph a much-needed credibility boost, since it adds large-scale support and marketing capability from a vendor with a strong set of credentials.
The payoff for Ceph is already apparent. Ceph is becoming OpenStack's preferred object store, relegating OpenStack's own Swift development to second place. In the open-source world, this means Ceph is attracting more development effort and bringing new features to market faster. An example is erasure-coding support for lower-cost data integrity, which was released in April in Ceph; we'll see it in the next release of Swift in 2015 (Kilo).
SUSE recently announced that Ceph is the core of its SUSE Storage product, placing it in competition with Red Hat, and others are expected to follow with Ceph-based products. We can expect pre-integrated appliances from China with Ceph storage stacks, providing object stores and scalable solutions.
With Gartner predicting that open-source storage will have a 20% share of the storage market as early as the end of 2017, Ceph and Gluster appear poised for high growth, and the battle for a share of that 20% will be hot and heavy. Most likely, the winners will use inexpensive COTS hardware from Chinese ODMs, and the contest will tend to be based on lowest price. The first "white-box" storage platforms are already entering the market from the likes of Quanta and Supermicro.
The release of erasure-coding support in Red Hat's Inktank Ceph Enterprise 1.2 is a major step forward for Ceph. Storage is moving to a new tiering system that uses very fast solid-state drives as the primary storage tier, and spinning disks as the bulk storage tier for cool and cold data.
For performance, primary storage has to use replication as its data-integrity method, since RAID and erasure coding can't keep up with the IO rates. In the bulk tier, the opposite is true, and erasure coding becomes the solution of choice, being more robust while using fewer drives.
Other features in this year's release of Red Hat's Inktank Ceph Enterprise 1.2 include cache tiering using SSDs and extensive additions to the management toolset.
Gluster also made strides. Both Ceph and Gluster scale well, with Gluster more aligned with traditional web storage. Red Hat bases its Storage Server product on Gluster and has just released Version 3.6, with snapshots, increased scale-out capability, support for flash drives, and Hadoop support. These features add to the 3.5 release last spring, which brought snapshotting and at-rest encryption.
In the long run, Red Hat will probably emphasize Ceph over Gluster, in line with industry uptake and open-source community technical support, but the company will likely maintain two platforms for quite a long while. Red Hat could move to build Storage Server on top of both Ceph and Gluster, although the current focus of that team is on OpenStack integration.
The best strategy for Red Hat may be to let the market decide. There are no clear winners in this space at this point, so offering a rich menu of options is potentially a good strategy, especially if the company's sales of both Ceph and Storage Server are growing rapidly, as they reportedly are.
The future of both Ceph and Gluster lies in serving as base platforms for software-defined storage, though both technologies are missing features, such as deduplication and compression, that will be offered by other vendors such as the traditional array makers. Extending support in both packages for unstructured data is also crucial for success, given the immense growth that kind of data will see over the next decade.
Moving to COTS and open-source code allows users to buy drives from distributors rather than from OEMs. The result is a price per terabyte in the $30 range for bulk storage drives. This is dramatically lower than the typical large array vendor's price, and in the end it will become the force driving open storage to a high market share.
Erasure coding roughly halves the total number of drives required under replication, which can save money in big installations. But overall, the dollars involved are small compared with the savings from the move to open source and COTS platforms.
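To make the drive-count comparison concrete, here is a minimal sketch of the arithmetic. It rests on assumptions that are not from the article: three-way replication, an erasure-coding layout of 8 data chunks plus 4 parity chunks, 4TB drives, and 1PB of usable capacity. Only the $30-per-terabyte figure comes from the text above. The function names are illustrative and are not part of Ceph or Gluster.

```python
import math

def raw_tb_replication(usable_tb, replicas=3):
    """Raw capacity needed when every object is stored `replicas` times."""
    return usable_tb * replicas

def raw_tb_erasure(usable_tb, data_chunks=8, parity_chunks=4):
    """Raw capacity needed when each object is split into `data_chunks`
    pieces and protected by `parity_chunks` extra pieces."""
    return usable_tb * (data_chunks + parity_chunks) / data_chunks

def drives_needed(raw_tb, drive_tb=4):
    """Whole drives required to hold `raw_tb` of raw capacity."""
    return math.ceil(raw_tb / drive_tb)

usable_tb = 1000       # assumed: 1PB of usable bulk storage
price_per_tb = 30      # dollars, the bulk-drive figure quoted in the article

rep_raw = raw_tb_replication(usable_tb)   # 3000 TB raw
ec_raw = raw_tb_erasure(usable_tb)        # 1500 TB raw

rep_drives = drives_needed(rep_raw)
ec_drives = drives_needed(ec_raw)

rep_cost = rep_raw * price_per_tb
ec_cost = ec_raw * price_per_tb

print(f"replication: {rep_drives} drives, ${rep_cost:,.0f}")
print(f"erasure coding: {ec_drives} drives, ${ec_cost:,.0f}")
```

Under these assumptions the erasure-coded layout needs half the drives of the replicated one. A different chunk layout or replica count changes the ratio, so the halving holds only for layouts with about 1.5x overhead set against three copies.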