At the end of a recent seminar, an audience member asked how I felt about enterprise hybrid disk drives. Could vendors, and even users, boost the performance of their disk arrays by replacing their 600 GB HDDs with 600 GB solid-state hybrid drives (SSHDs)?
Before answering the question, let's look back at the evolution of the SSHD. Like a hybrid storage system, a hybrid disk drive combines multiple storage media to deliver a price/performance proposition better than flash or spinning disks alone.
The first successful hybrid drives, like Seagate's Momentus XT, targeted the laptop market. Back in 2011, they seemed like a good solution for people who wanted to carry around more data than would fit on an SSD they could afford.
When I used the Momentus XT, it felt more like a fast hard drive than a solid-state drive. Just as you noticed that the system was stuttering to read from the disk, it would finish; with a big enough SSD, the system wouldn't stutter at all. My working set was just slightly larger than the 8 GB of flash on the drive, so I noticed whenever there was a cache miss. By the time I bought my last laptop, an SSD big enough to fit my needs was cheap enough (about $100 for 240 GB). That's why I'm now running all flash on the road.
Seagate's enterprise SSHDs marry 32 GB of flash to its fastest 15K RPM spinning disks. That works out to a cache of roughly 10% of capacity for the 300 GB model or 5% for the 600 GB version. Since many of today's hybrid storage systems default to a cache 10% the size of their disk layer, those proportions make a lot more sense than the Momentus XT's 8 GB cache for 1 TB of capacity.
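The proportions above are simple division, and a minimal sketch makes them easy to check. The function name and the dictionary layout below are illustrative, and the 1 TB drive is assumed to be 1,000 GB.

```python
def cache_ratio(flash_gb: float, capacity_gb: float) -> float:
    """Return the flash cache size as a fraction of disk capacity."""
    return flash_gb / capacity_gb

# (flash GB, disk GB) pairs taken from the figures quoted in the text.
drive_models = {
    "Enterprise SSHD, 300 GB": (32, 300),
    "Enterprise SSHD, 600 GB": (32, 600),
    "Momentus XT, 1 TB": (8, 1000),  # assumes 1 TB = 1,000 GB
}

ratios = {name: cache_ratio(flash, disk)
          for name, (flash, disk) in drive_models.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%}")
```

The enterprise models land near the 10% and 5% marks, while the laptop drive's cache is under 1% of its capacity.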
My problem with enterprise SSHDs is that, for any given amount of flash, one big pool is a lot easier to manage well than a whole bunch of little pools in the individual disk drives. If we have a storage system with 32 SSHDs, the 32 GB of flash on each drive can cache only the hottest 5% or 10% of the data stored on that particular drive, no matter how hot or cold that data is compared with the rest of the system.
Since the storage controller will decide which drive to use to store any given piece of data based on its data protection and volume management schemes, the hottest 10% of some drives will likely be significantly colder than the top 10% of others.
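To make that concrete, here is a toy model, not a benchmark. It assumes each block has a single "heat" number, that each drive's cache holds its own hottest blocks, and that a central cache with the same total number of slots holds the hottest blocks system-wide. The heat distribution, the seed, and all names are invented for illustration.

```python
import random

def captured_heat(drive_heat, slots_per_drive):
    """Return (distributed, central) total heat held in cache.

    distributed: every drive caches its own hottest blocks.
    central: one pool with the same total slot count caches the
             hottest blocks across all drives.
    """
    distributed = sum(
        sum(sorted(blocks, reverse=True)[:slots_per_drive])
        for blocks in drive_heat
    )
    all_blocks = sorted((h for blocks in drive_heat for h in blocks),
                        reverse=True)
    central = sum(all_blocks[:slots_per_drive * len(drive_heat)])
    return distributed, central

rng = random.Random(42)  # fixed seed so runs are repeatable

# 32 drives of 100 blocks each. Every drive gets its own scale factor,
# so some drives are simply colder than others (an assumption).
drive_heat = []
for _ in range(32):
    scale = rng.uniform(0.1, 1.0)
    drive_heat.append([scale * rng.paretovariate(1.5) for _ in range(100)])

distributed, central = captured_heat(drive_heat, slots_per_drive=10)
total = sum(sum(blocks) for blocks in drive_heat)

print(f"per-drive caches hold {distributed / total:.1%} of total heat")
print(f"central cache holds   {central / total:.1%} of total heat")
```

Whatever the distribution, the central pool can never capture less heat than the per-drive caches, because the per-drive selection is just one of the choices available to it. The gap grows as the drives become more uneven.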
If instead we used a pair of 480 GB SSDs as a centralized cache, the array controller could cache the hottest 960 GB of data. The centralized cache would also be more efficient because the distributed cache would have to duplicate data or parity information for data protection. A centralized cache could write new data to both SSDs and then overwrite one copy when the data block is written to the backend data store.
Since each SSHD is independent, if the SSHDs are configured as RAID-1, new data written to the system will be cached on both drives in the mirrored pair. If the controller distributes reads across both drives -- as even Windows' built-in volume manager does -- data blocks will be equally hot on both drives that hold them, and therefore any data blocks hot enough to be in cache at all will likely be cached on both drives. Since the central cache can store just one copy, it will have room for more warm data.
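A rough back-of-the-envelope calculation shows the cost of that duplication. It assumes the worst case, in which every cached block on a mirrored SSHD is also cached on its partner, and it assumes the central cache keeps a single copy of each block once it has been destaged. The function name is illustrative.

```python
def unique_cache_gb(devices: int, flash_per_device_gb: float,
                    copies_per_block: int) -> float:
    """Flash available for distinct blocks when each cached block
    occupies `copies_per_block` cache slots across the devices."""
    return devices * flash_per_device_gb / copies_per_block

# 32 SSHDs with 32 GB of flash each, mirrored (RAID-1): hot blocks are
# assumed to be cached on both drives of each pair.
mirrored_sshd = unique_cache_gb(32, 32, copies_per_block=2)

# Two 480 GB SSDs used as a central cache holding one copy per block.
central = unique_cache_gb(2, 480, copies_per_block=1)

print(f"mirrored SSHDs: {mirrored_sshd:.0f} GB of distinct cached data")
print(f"central cache:  {central:.0f} GB of distinct cached data")
```

Under these assumptions the SSHDs carry slightly more raw flash (1,024 GB versus 960 GB) but can hold only about half as much distinct data.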
Hybrid controllers can also combine their caching algorithms with their data layout. Systems like Nimble Storage's CASL can accumulate data in the cache and write large sequential stripes of data to their backend disks. Unless hybrid drives get really smart and implement their own log-based data layouts, destaging data from a drive's cache to disk will require more head motion.
I think enterprise hybrid drives, like laptop hybrids, will remain a niche product. They can provide a performance boost when installed in servers as DAS and on basic array controllers in applications with small working sets or modest IOPS. In shared storage systems or server SANs, a more centralized cache should be significantly more efficient and perform better.