Without a doubt, flash is the most important innovation in modern storage. Flash delivers significantly higher performance for randomly accessing data -- the greatest requirement in today’s virtualized datacenters.
But does that mean only flash is viable for modern storage architectures? That’s what flash-only vendors would have you believe. It's time to take a hard look at the pros and cons of flash and disk.
First, let’s consider disk, the more venerable of the two. Hard disk performance on random I/O is mostly determined by access time: the seek time needed to move the head, plus the rotational latency spent waiting for the sector to come around, which is set by rotational speed, measured in revolutions per minute (RPM). Average rotational latency for a 15K RPM disk is about 2 ms; it’s roughly double for a 7.2K RPM drive. Keep in mind that while random I/O incurs this overhead on every access, sequential I/O largely does not, so hard disk performs well with sequential I/O.
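The 2 ms figure, and its doubling on a 7.2K RPM drive, can be checked with quick arithmetic. The sketch below assumes those figures refer to average rotational latency (the time for half a revolution) and exclude head seek time; that reading, and the function name, are assumptions rather than something the figures themselves state.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: the time for half a revolution."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in a minute
    return ms_per_revolution / 2.0


for rpm in (15_000, 7_200):
    print(f"{rpm} RPM: {avg_rotational_latency_ms(rpm):.2f} ms")
```

Seek time comes on top of this for every random access, which is why sequential I/O, where the head barely moves, is so much cheaper.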
Plus, hard disk delivers loads of capacity for the price: A single 3.5-inch drive stores up to 6 TB of data, and a gigabyte costs only a few cents.
Perhaps hard disk’s strongest advantage is its age: Because it’s been around for a long time, its reliability and error characteristics (the likelihood of data/sectors being corrupted) are well understood. And, in the decades since it was first introduced, several technologies like parity/checksums and RAID have been developed to mitigate the impact of errors.
Now, let’s compare disk to flash.
NAND flash is based on solid-state technology. Since flash eliminates mechanical parts, access times are as much as 100 times faster than hard disk. In an environment requiring lots of random I/O, flash leaves hard disk in the dust. However, flash is far more delicate than hard disk. Flash stores data in memory cells, and every time a cell is erased or written to, it degrades.
The number of program/erase (P/E) cycles a flash device can sustain is determined by the kind of flash being used -- SLC, eMLC, or MLC. Devices rated for a higher number of P/E cycles have better endurance. They also come with a higher price tag: SLC has the highest endurance and is also the most expensive version of flash.
The most common way to extend flash’s lifespan is over-provisioning: allocating extra capacity so that writes can be spread out and no cell is written more times than its rated endurance allows. That makes flash pricey. Already, it’s seven to 20 times more costly than hard disk. With over-provisioning, it only gets more expensive.
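To see how over-provisioning pushes up the effective price, here is a minimal sketch. It assumes a simple model in which a fixed fraction of raw capacity is held back as spare; the function name, the raw price, and the reserve fractions are all hypothetical illustration values, not vendor figures.

```python
def cost_per_usable_gb(raw_cost_per_gb: float, reserved_fraction: float) -> float:
    """Effective cost per usable GB when a fraction of raw capacity is held in reserve."""
    if not 0.0 <= reserved_fraction < 1.0:
        raise ValueError("reserved_fraction must be in [0, 1)")
    # Only (1 - reserved_fraction) of each raw GB is available to store data.
    return raw_cost_per_gb / (1.0 - reserved_fraction)


raw_cost = 1.00  # hypothetical raw flash price in dollars per GB
for reserved in (0.0, 0.07, 0.28):  # hypothetical reserve fractions
    effective = cost_per_usable_gb(raw_cost, reserved)
    print(f"{reserved:.0%} reserved: ${effective:.2f} per usable GB")
```

The larger the reserve, the fewer usable gigabytes the same purchase price is spread across.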
Of course, the semiconductor and storage industries are working to bring prices down. At least for now, though, savings are coming at the expense of lifespan. As cost-per-GB comes down, the need to over-provision rises. That means that any initial savings erode -- even vanish -- over time.
Because of its relative newness, flash’s error rates aren't as well understood as those of hard disk. Multiple SSDs also have a greater likelihood of failing all at once -- incidents that can mean catastrophic data loss.
As it turns out, flash and disk are perfect complements. The ideal storage architecture leverages the advantages of both.
Compute power is cheap and plentiful. With that in mind, what if compute and memory could be used to coalesce random writes into sequential writes? This scenario is exactly what VMware co-founder Mendel Rosenblum proposed in his thesis, "The Design and Implementation of a Log-Structured File System."
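The coalescing idea can be illustrated with a toy example. This is a minimal sketch, not Rosenblum's design or any vendor's implementation: it leaves out segment cleaning, crash recovery, and on-disk metadata, and every name in it (LogStructuredStore, segment_blocks, and so on) is hypothetical. A Python list stands in for the disk.

```python
class LogStructuredStore:
    """Toy log-structured store: buffers random block writes in memory and
    flushes them as one sequential append. All names are hypothetical."""

    def __init__(self, segment_blocks: int = 4):
        self.segment_blocks = segment_blocks  # blocks buffered before a flush
        self.buffer = {}   # block number -> latest data, pending in memory
        self.log = []      # append-only list standing in for the disk
        self.index = {}    # block number -> position of the live copy in the log
        self.flushes = 0   # number of sequential segment writes issued

    def write(self, block_no: int, data: bytes) -> None:
        self.buffer[block_no] = data  # repeated writes to one block coalesce here
        if len(self.buffer) >= self.segment_blocks:
            self.flush()

    def flush(self) -> None:
        if not self.buffer:
            return
        for block_no, data in self.buffer.items():
            self.index[block_no] = len(self.log)  # live copy moves to the log tail
            self.log.append((block_no, data))
        self.buffer.clear()
        self.flushes += 1

    def read(self, block_no: int) -> bytes:
        if block_no in self.buffer:  # newest copy may still be in memory
            return self.buffer[block_no]
        return self.log[self.index[block_no]][1]


store = LogStructuredStore(segment_blocks=4)
for block_no in (907, 12, 455, 3):  # scattered block addresses
    store.write(block_no, f"data-{block_no}".encode())
print(store.flushes, len(store.log), store.read(455))
```

Four writes to scattered block addresses reach the log as a single sequential append, and the in-memory index keeps track of where each block now lives. Stale copies left behind by overwrites would need to be reclaimed by a cleaner, which this sketch omits.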
Sequentializing can improve performance as much as 100 times, delivering up to 40 MB/s with a single 7.2K RPM drive. Ten hard disks equal the throughput of an SSD at a fraction of the cost. That makes it possible to get a lot more performance from disk, while leveraging its capacity advantages.
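One way to see how a 100-fold gain and 40 MB/s fit together is a back-of-the-envelope calculation. The inputs below are assumptions chosen for illustration, not figures from the thesis or from any drive datasheet: roughly 100 random I/Os per second for a 7.2K RPM drive, and 4 KB per I/O.

```python
def random_throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Throughput in MB/s for small random I/O (decimal units: 1 MB = 1000 KB)."""
    return iops * io_size_kb / 1000.0


assumed_iops = 100       # hypothetical random IOPS for one 7.2K RPM drive
assumed_io_size_kb = 4   # hypothetical I/O size in KB
sequential_mb_s = 40.0   # sequentialized figure quoted above

random_mb_s = random_throughput_mb_s(assumed_iops, assumed_io_size_kb)
speedup = sequential_mb_s / random_mb_s
print(f"random: {random_mb_s:.1f} MB/s, speedup: about {speedup:.0f}x")
```

Under these assumptions the drive moves well under 1 MB/s when every write pays the full access-time penalty, which is where a two-orders-of-magnitude improvement from sequential writes comes from.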
And remember, not all applications need large amounts of flash. Based on a Nimble Storage survey, the working set size (the amount of application data that actually needs to reside on flash) required for sub-millisecond responsiveness is no greater than 10% of the total. And that includes the most performance-intensive applications.
So the next time a storage vendor tells you that disk is dead, remember that flash and disk are complementary. Optimal performance and capacity lie in leveraging both.