In mid-2020, the NVMe Organization released the first revision of the Zoned Namespace (ZNS) specification. The new specification has received a lot of attention, with promises of longer-lived drives that achieve higher performance under certain conditions. Since then, a number of companies have published roadmaps that include products with ZNS support. In this article, we'll review the significance of this specification and what ZNS drives are good for.
The specification is significant because it defines a new command set that host systems can use to access SSDs and to get more out of those SSDs under certain workloads.
Traditionally, applications on host systems write data to the SSD, and the controller on the SSD decides where on the device that data should be stored. You might assume that when data is written to an SSD, it is written once to its new 'home' in the flash memory, and that's it. However, an important limitation of flash memory can force the same data to be written and re-written on the drive several times: data can be written in relatively small units (pages), but it can only be re-written after the entire, larger block containing those units has been erased.
For example, when an SSD has free space, an application is free to write data to it, and the controller decides which block should store that data. Let's say a block has been fully written, but then the application asks to delete the data in the first half of the block and keep the data in the second half. The controller will flag the 'deleted' data as stale. It's not actually erased, because the erase operation can only be performed on the whole block, and doing that would also erase the data in the second half, which the application wants to preserve. To reclaim the stale capacity, the data in the second half of the block may be re-written to an entirely different block, and only then will the first block be completely erased, making that space free again. This extra re-writing is referred to as write amplification. It's like a game of Tetris where the data needs to be kept efficiently and compactly without any 'gaps' of wasted flash memory.
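The half-stale block scenario can be put into numbers. The following is a minimal sketch, not a model of any real controller: the function name `reclaim_block` and the figure of 8 pages per block are assumptions chosen for illustration (real blocks hold far more pages). It counts how many pages the flash absorbs compared with how many the application actually wrote.

```python
PAGES_PER_BLOCK = 8  # hypothetical figure; real flash blocks hold many more pages


def reclaim_block(valid_pages: int, pages_per_block: int = PAGES_PER_BLOCK) -> dict:
    """Model reclaiming one fully written block.

    The host wrote `pages_per_block` pages. Before the block can be erased,
    the controller must copy the pages that are still valid to another block.
    """
    if not 0 <= valid_pages <= pages_per_block:
        raise ValueError("valid_pages must be between 0 and pages_per_block")
    host_writes = pages_per_block                  # pages the application wrote
    flash_writes = pages_per_block + valid_pages   # original writes plus relocation copies
    return {
        "host_writes": host_writes,
        "flash_writes": flash_writes,
        "write_amplification": flash_writes / host_writes,
    }


# Half of the block is stale, half must be preserved (the case described above).
result = reclaim_block(valid_pages=PAGES_PER_BLOCK // 2)
print(result["write_amplification"])  # 1.5
```

In this toy case the flash performs 12 page writes for 8 pages of application data, so every host write costs 1.5 flash writes. If the whole block were stale, nothing would need relocating and the ratio would be 1.0.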
Clearly, this re-writing of data is not without purpose, since it enables more efficient use of the space on the drive. The drawback is that it effectively shortens the life of the SSD, because flash can only be overwritten a limited number of times. For single-level cell (SLC), multi-level cell (MLC), and triple-level cell (TLC) flash, this is not usually a concern, since the endurance of the drive, that is, the number of times it can be overwritten, is comparatively high. However, quad-level cell (QLC) flash drives, which have higher capacities, have lower endurance, and we need to pay close attention to managing the number of drive overwrites. This is where ZNS can become very useful.
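To see why re-writing matters more as endurance drops, a rough back-of-the-envelope estimate helps. The sketch below assumes a simple model in which the total data a host can write is the drive capacity times its rated overwrite cycles, divided by the write amplification. The function name and all of the numbers (8 TB, 1,000 cycles, amplification of 2.5) are hypothetical and are not taken from any product datasheet.

```python
def host_writes_before_wearout(capacity_tb: float,
                               rated_cycles: int,
                               write_amplification: float) -> float:
    """Approximate terabytes the host can write before the rated cycles are used up.

    Simplified model: every host write costs `write_amplification` flash writes,
    and wear is spread evenly across the drive.
    """
    if write_amplification < 1.0:
        raise ValueError("write amplification cannot be below 1.0 in this model")
    return capacity_tb * rated_cycles / write_amplification


# Hypothetical drive: 8 TB capacity, rated for 1,000 overwrite cycles.
no_rewriting = host_writes_before_wearout(8.0, 1000, write_amplification=1.0)
heavy_rewriting = host_writes_before_wearout(8.0, 1000, write_amplification=2.5)

print(no_rewriting)     # 8000.0
print(heavy_rewriting)  # 3200.0
```

Under these assumed figures, the same drive absorbs 8,000 TB of application data when nothing is re-written but only 3,200 TB when each host write triggers 2.5 flash writes. The lower the rated cycle count, the more that gap matters.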
Small packages, high capabilities
Most, if not all, ZNS SSDs will use quad-level cell (QLC) flash, meaning that four bits of data are stored in each cell of the flash. This means that QLC SSDs can be built with very high capacities in a relatively small package. However, that density comes with the trade-off of lower endurance. Many QLC SSDs are spec'd with write/erase cycles in the low thousands. As the drive is overwritten again and again, more cells will fail.
While that sounds like a bad thing, it's actually fine for certain applications. Many workloads are 'write once, read many,' meaning that once the data is on the drive, it rarely or never needs to change, and this is where Zoned Namespaces fit well. Zones provide an excellent mechanism for reducing unnecessary writes and managing the number of drive overwrites that occur over the life of the drive. How do they do that?
ZNS allows applications to manage zones on the drive that are either open for writing new data or closed to writing new data. Data is written to each zone sequentially, and a zone is reclaimed as a whole rather than piecemeal, so there is no need to go back and shuffle data later. That significantly lowers the amount of re-writing done to each flash cell in the drive. But remember, ZNS is best suited for workloads that are write once, read many. It's not well suited to workloads that involve a lot of writing, deleting, and writing again.
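The sequential-write rule can be illustrated with a toy model. This is only a sketch of the idea, not the NVMe command set: the `Zone` class, its three states, and its method names are all hypothetical, and the real specification defines more zone states and commands than are shown here.

```python
from enum import Enum


class ZoneState(Enum):
    EMPTY = "empty"
    OPEN = "open"    # accepting new data at the write pointer
    FULL = "full"    # closed to new data until the zone is reset


class Zone:
    """Toy zone: writes must land exactly at the write pointer."""

    def __init__(self, capacity_blocks: int):
        self.capacity = capacity_blocks
        self.write_pointer = 0
        self.state = ZoneState.EMPTY
        self.blocks = []

    def write(self, offset: int, data: bytes) -> None:
        if self.state is ZoneState.FULL:
            raise ValueError("zone is full; reset it before writing again")
        if offset != self.write_pointer:
            raise ValueError(
                f"non-sequential write: expected offset {self.write_pointer}, got {offset}"
            )
        self.blocks.append(data)
        self.write_pointer += 1
        self.state = (ZoneState.FULL if self.write_pointer == self.capacity
                      else ZoneState.OPEN)

    def reset(self) -> None:
        """Discard the whole zone at once; there is no partial delete."""
        self.blocks.clear()
        self.write_pointer = 0
        self.state = ZoneState.EMPTY


zone = Zone(capacity_blocks=2)
zone.write(0, b"first")
zone.write(1, b"second")
print(zone.state)  # ZoneState.FULL
```

Because the only ways to change a zone in this model are to append at the write pointer or to reset the zone entirely, there is never a half-stale region whose surviving data would have to be relocated.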
Some architectures use other persistent memory, perhaps an NVDIMM, as a staging area to make sure all data is written to the zone sequentially. While that may seem like a lot of extra effort, it can make a big difference because it enables more efficient use of the flash memory, which is the most expensive part of an SSD. Multiplied across the thousands of SSDs in a data center, the cost impact is significant.
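One way to picture such a staging area is as a reorder buffer that holds writes arriving out of order and releases them only once they form a sequential run. The sketch below is an illustration under that assumption. The `StagingBuffer` class and its methods are hypothetical names, and a real design would also have to deal with persistence, capacity limits, and failure recovery.

```python
class StagingBuffer:
    """Hold out-of-order writes until they can be released in sequential order."""

    def __init__(self, start_offset: int = 0):
        self.next_offset = start_offset  # next offset the zone expects
        self.pending = {}                # offset -> data waiting to be flushed

    def add(self, offset: int, data: bytes) -> None:
        if offset < self.next_offset or offset in self.pending:
            raise ValueError(f"offset {offset} was already staged or flushed")
        self.pending[offset] = data

    def drain(self) -> list:
        """Return the longest run of writes that is now sequential."""
        ready = []
        while self.next_offset in self.pending:
            ready.append((self.next_offset, self.pending.pop(self.next_offset)))
            self.next_offset += 1
        return ready


buffer = StagingBuffer()
buffer.add(2, b"c")
buffer.add(0, b"a")
print(buffer.drain())  # [(0, b'a')]
buffer.add(1, b"b")
print(buffer.drain())  # [(1, b'b'), (2, b'c')]
```

The write for offset 2 arrives first but stays staged until offsets 0 and 1 have been released, so the zone only ever sees writes in order.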
The NVMe Organization has prioritized interoperability and compliance when introducing this new command set. First, driver support for ZNS has been introduced in some open-source NVMe drivers, which should make ZNS drives easier to adopt. Further, compliance tests are being designed to ensure that drives introduced to the market support the new command set properly.
David Woolf is the Senior Executive of Technology Offerings at the University of New Hampshire InterOperability Laboratory (UNH-IOL).