In the last couple of years, the IT industry has created more acronyms than the US government. I won’t even get started on all the "as-a-Service" terms! Storage isn’t immune to acronym overload. Of course, most of the acronyms hide valid technical advances and there have been a lot of those in the storage industry of late.
Let's first look at the types of storage drives available today.
Hard-disk drive: HDDs are the traditional spinning drives that we’ve had for decades. They come in two main sizes, 3.5” diameter media and 2.5” diameter media, with the latter much thinner as well as smaller in area.
Solid-state drive: SSDs are rapidly replacing HDDs in the primary storage tier. Unlike HDDs, these drives have no moving parts and store data on flash memory chips. SSDs are also much faster than HDDs, by a factor of roughly 100x to 1,000x on random I/O.
Flash card: PCIe plug-in cards that look like ultra-fast SSDs to the system.
Drive interfaces have moved on from the SCSI (Small Computer System Interface) and Fibre Channel of the past. The primary drive interfaces are:
SATA: Serial ATA replaced the parallel-wired ATA interface of the 1990s. It uses just four wires to transfer data, and data rates top out at 6 Gbps.
SAS: Serial-attached SCSI is the high-end counterpart to SATA. Used on enterprise-class HDDs, it offers dual-port connections for redundancy, but otherwise shares many features with SATA. Vendors offer data rates up to 12 Gbps today, and plans for 24 Gbps are in the works. Most SAS ports can connect to SATA drives, though the reverse is not true.
PCIe/NVMe: Quite a mouthful! PCIe (Peripheral Component Interconnect Express) is the high-speed interconnect used on motherboards to connect add-in cards, while NVMe (Non-Volatile Memory Express) is a new low-overhead protocol that uses memory-to-memory operations (see RDMA below) at more than 4x SAS performance.
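To put the interface speeds side by side, here is a rough sketch that converts raw line rates into usable bandwidth. The line rates and encodings are assumptions taken from the published SATA III, SAS-3, and PCIe 3.0 figures, and the function name is made up for the example. It accounts only for line-encoding overhead, so it says nothing about protocol overhead or latency, which is where NVMe gains much of its advantage.

```python
# Rough usable bandwidth per link, after line-encoding overhead only.
# Line rates and encodings are assumed from the public SATA III, SAS-3,
# and PCIe 3.0 figures; real drives deliver less than these ceilings.

def usable_mb_per_s(gigabits_per_s, payload_bits, total_bits, lanes=1):
    """Convert a raw line rate into payload megabytes per second."""
    payload_gbps = gigabits_per_s * lanes * payload_bits / total_bits
    return payload_gbps * 1000 / 8  # Gbit/s -> MB/s, decimal units

LINKS = {
    "SATA III (6 Gbps, 8b/10b)": usable_mb_per_s(6, 8, 10),
    "SAS-3 (12 Gbps, 8b/10b)": usable_mb_per_s(12, 8, 10),
    "PCIe 3.0 x4 (8 GT/s per lane, 128b/130b)": usable_mb_per_s(8, 128, 130, lanes=4),
}

for name, mb in LINKS.items():
    print(f"{name}: about {mb:.0f} MB/s")
```

Under these assumptions, SATA works out to about 600 MB/s and SAS to about 1,200 MB/s, with a four-lane PCIe 3.0 link well ahead of both.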
Drives in a server are connected by ports on the motherboard, by an I/O expander that provides many more ports, or via a RAID controller or host-bus adapter (HBA). The latter solutions are usually SAS-based and can also be configured to connect to external boxes of drives, called JBODs (Just a Bunch of Disks). Note that the SAS external interface uses different voltages from the internal ports.
Storage often comes as a network-attached solution. Here a box of drives is attached to host servers via a controller (today, that's usually a commercial off-the-shelf, or COTS, motherboard) using a variety of connections. There are several types:
All-flash array: The AFA is a stack of flash memory devices behind a sophisticated controller, usually a COTS motherboard. Designed for simple installation into a Fibre Channel storage-area network (SAN), it provides a massive performance boost and is becoming very popular.
RAID array: The traditional box of drives with a controller. Redundant Array of Independent Disks (RAID) arrays mainly use HDDs; when they use some SSDs, they are called hybrid arrays.
Appliance: As drives get larger and, with SSDs, much faster, the industry is moving to compact appliance boxes with low drive counts, typically 10 or 12. Appliances balance network performance and processing power with drive performance.
To complicate things further in storage, there is a set of networked storage protocols that's different from drive interfaces. The most common are:
Fibre Channel: Fibre Channel (FC) was developed in the 1990s as a serial replacement for SCSI. FC deployments have evolved into a complex of arrays and servers connected in a SAN. IBM has its own version of this called FICON (Fibre Connection).
iSCSI: An Ethernet-based protocol that carries the SCSI command set over TCP/IP.
FCoE: An Ethernet-based version of Fibre Channel, now falling into disuse.
InfiniBand: InfiniBand operates with very low operating system overhead and uses RDMA to move data around. It is used in low-latency use cases, such as trading floor systems.
NoF: NVMe over Fabrics (also abbreviated NVMe-oF) is an emerging protocol that extends NVMe beyond PCIe to Ethernet or even InfiniBand. With similar RDMA performance, it's likely to replace older InfiniBand solutions.
There are a few other storage terms to know. First, some methods for finding and accessing a particular data item.
Object storage: A method of storing data objects in a flat indexing system, object storage has the advantage of unlimited scalability. It's the choice for unstructured big data applications and should replace file storage in the next few years.
Block I/O: This is the access method for the SAN. Block I/O data addressing simulates a huge drive, broken up into logical units, called LUNs. Access becomes unwieldy as the capacity increases.
File storage: Also called network-attached storage (NAS) or a filer, this is an extension of the file systems inside a computer. The file system resides on, and is processed by, the filer. With block I/O, by contrast, the server processes the addressing scheme.
Universal storage: This is a combination of all three access methods, residing typically on an object store such as Ceph. This universality should be enough to make universal approaches a preferred choice when it comes to replacing the others.
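The difference between these access methods is easiest to see in how each one addresses data. The toy classes below are a minimal sketch, not any real product's API: every class and method name is hypothetical, and the 512-byte block size is an assumption made for the example.

```python
# Toy models of the three access methods. All names are hypothetical;
# they mirror the concepts above, not a real storage product.

BLOCK_SIZE = 512  # bytes per logical block (assumed for the example)


class BlockLun:
    """Block I/O: a LUN is a run of fixed-size blocks addressed by number."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.data = bytearray(num_blocks * BLOCK_SIZE)

    def write_block(self, lba, payload):
        if not 0 <= lba < self.num_blocks or len(payload) != BLOCK_SIZE:
            raise ValueError("bad block address or payload size")
        start = lba * BLOCK_SIZE
        self.data[start:start + BLOCK_SIZE] = payload

    def read_block(self, lba):
        start = lba * BLOCK_SIZE
        return bytes(self.data[start:start + BLOCK_SIZE])


class Filer:
    """File storage: the filer owns the directory tree and resolves paths."""

    def __init__(self):
        self.root = {}

    def write_file(self, path, payload):
        parts = path.strip("/").split("/")
        node = self.root
        for directory in parts[:-1]:
            node = node.setdefault(directory, {})
        node[parts[-1]] = payload

    def read_file(self, path):
        node = self.root
        for part in path.strip("/").split("/"):
            node = node[part]
        return node


class ObjectStore:
    """Object storage: one flat namespace of keys, each with its metadata."""

    def __init__(self):
        self.objects = {}

    def put(self, key, payload, **metadata):
        self.objects[key] = (payload, metadata)

    def get(self, key):
        return self.objects[key][0]


lun = BlockLun(num_blocks=8)
lun.write_block(3, b"x" * BLOCK_SIZE)          # caller tracks what block 3 means

filer = Filer()
filer.write_file("/home/alice/report.txt", b"quarterly numbers")

store = ObjectStore()
store.put("a1b2c3", b"quarterly numbers", owner="alice")
```

With the block device, the server has to remember what lives in block 3. The filer resolves the path itself, and the object store needs only a key.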
Then there are also some common terms used to describe capabilities in storage devices.
RDMA: RDMA (Remote Direct Memory Access) is an interface mechanism for direct transfer between the main memories of two computers. This speeds up transfers and also significantly reduces the computer overhead for transactions. It can be applied to many storage protocols.
Compression: Most data can be compressed, by factors averaging around 5x. This reduces the raw capacity needed to store data, though at a price in complexity and performance. As technology speeds up processing, expect the approach to become ubiquitous.
Deduplication: A similar process to compression, deduplication stores just a single reference copy of any object, with pointers to that common copy replacing the other copies in the system. This can save huge amounts of storage. An example is keeping one image for hundreds of desktops. It is sometimes confused with compression, and even major companies use the terms interchangeably, but the process for deduplication is subtly different.
Encryption: Data needs to be protected at rest. Source-based encryption is the only viable solution to meet most compliance requirements.
Indexing: A new concept in the storage market is coupling data searching and indexing with backup approaches. This technique uses the huge bandwidth available in backup systems to save cost and time in legal and medical record environments, for example.
SDS: Software-defined storage is a new concept that involves separating processing of storage data from the actual storage devices. This separation allows storage services to be virtualized and scaled as needed, increasing operational agility.
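Because compression and deduplication are so often confused, a small sketch may help separate them. It assumes zlib as a stand-in compressor and fixed 4 KB chunks identified by SHA-256 hashes for deduplication. Real products use their own algorithms and chunking schemes, and the class and function names here are hypothetical.

```python
import hashlib
import os
import zlib


def compression_ratio(data):
    """Original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


class DedupStore:
    """Keep one copy of each unique chunk; a file is a list of chunk hashes."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # SHA-256 hex digest -> chunk bytes
        self.files = {}   # file name -> ordered list of digests

    def put(self, name, data):
        digests = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only if unseen
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        return b"".join(self.chunks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


# Repetitive text compresses well; random bytes barely compress at all.
text = b"storage acronyms " * 1000
image = os.urandom(64 * 1024)  # stand-in for a desktop image
print(f"text ratio:  {compression_ratio(text):.1f}x")
print(f"image ratio: {compression_ratio(image):.2f}x")

# Deduplication still helps: 100 desktops share one stored image.
store = DedupStore()
for n in range(100):
    store.put(f"desktop-{n}", image)
print(f"logical bytes: {100 * len(image)}, stored bytes: {store.stored_bytes()}")
```

Compression shrinks redundancy inside a piece of data, while deduplication removes repeated copies across the system. That is why the random "image" above gains nothing from zlib yet still needs to be stored only once.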
Today, the storage industry is changing faster than it ever has in the past, so expect there to be even more terminology to keep track of in the near future.