Deduplication isn't the only data reduction game in town. Compression and single-instance storage provide other tried-and-true ways to store more data on primary systems.
Lossless compression using the LZW algorithm, or one of its many descendants, dates back to the early 1990s, when Stacker appeared to compress DOS, and therefore Windows 3.x, disks. Even in those dark ages of 200-MHz Pentium Pros and 40-MBps SCSI drives, compression almost always improved I/O performance by squeezing more logical data through the disk-channel bottleneck.
Today's processors are 100 times faster than those of the early '90s, but the disk I/O channel is only about 16 times faster. So we think fears that compression will hurt performance are overblown.
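The arithmetic is easy to sanity-check. The sketch below uses hypothetical figures, a core that compresses 500 MB/s, a 100-MB/s disk channel, and data that shrinks to a third of its size, to estimate effective write throughput when compression and transfer happen back to back; the numbers are assumptions for illustration, not benchmarks.

```python
# Back-of-envelope estimate with assumed figures, not measured results.
# Writing one MB of logical data costs 1/compress_speed of CPU time plus
# compressed_fraction/channel_bw of transfer time on the disk channel.

def effective_throughput(compress_speed_mbps, channel_mbps, compressed_fraction):
    """Logical MB/s achieved when compression and transfer are serialized."""
    seconds_per_mb = 1.0 / compress_speed_mbps + compressed_fraction / channel_mbps
    return 1.0 / seconds_per_mb

raw = 100.0                                       # uncompressed writes are channel-bound
with_lz = effective_throughput(500.0, 100.0, 1.0 / 3.0)
print(f"raw: {raw:.0f} MB/s, compressed: {with_lz:.0f} MB/s")   # roughly 187 MB/s
```

Even serialized, the compressed path wins; a pipelined implementation would do better still.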
NTFS compression has been a standard feature of Windows servers since the Windows NT days, enabling administrators to compress data folder by folder. By contrast, enterprise NAS vendors have until recently avoided compression for fear it would slow performance.
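Turning folder-level NTFS compression on takes little more than a call to the built-in compact utility; here's a minimal sketch that wraps it from Python, with the share path purely a placeholder.

```python
# Minimal sketch: enable NTFS compression for a folder tree on Windows by
# wrapping the built-in compact.exe utility. The path below is a placeholder.
import subprocess

def compress_folder(path: str) -> None:
    # /c sets the compression attribute, /s:<dir> recurses into subdirectories.
    subprocess.run(["compact", "/c", f"/s:{path}"], check=True)

if __name__ == "__main__":
    compress_folder(r"D:\Shares\Engineering")
```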
Storwize's STN-6000 appliances sit between the user network and CIFS or NFS servers. They intercept requests, compress or decompress the data payload using a version of LZ compression tuned for random I/O, and pass the request on to the filer. Storwize claims typical users see about 75% data reduction (roughly 4-to-1 compression), with especially compressible data reaching 10-to-1, and that most applications also see improved performance.
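Conceptually, the appliance applies an LZ-family transform to the payload on the way to the filer and reverses it on the way back. The sketch below illustrates that round trip with stock zlib; it stands in for, and is not, Storwize's tuned codec or its CIFS/NFS protocol handling.

```python
# Illustrative only: compress a write payload on its way to the filer and
# decompress a read payload on the way back, using stock zlib (LZ77-family).
import zlib

def compress_payload(data: bytes) -> bytes:
    return zlib.compress(data, level=6)

def decompress_payload(blob: bytes) -> bytes:
    return zlib.decompress(blob)

if __name__ == "__main__":
    payload = b"quarterly report " * 1024          # highly repetitive sample data
    stored = compress_payload(payload)
    assert decompress_payload(stored) == payload   # round trip is lossless
    print(f"{len(payload)} bytes in, {len(stored)} bytes stored "
          f"({len(payload) / len(stored):.1f}-to-1)")
```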
Single-Instance Storage
Single-instance storage identifies multiple copies of the same file in a file system and replaces all but one with references to a single copy of the data. The file system manages updates by keeping a reference count for each set of data, making a separate copy when a user modifies one of the referencing files, and deleting the data once the last file referencing it is deleted. Single-instance storage works well on user home directories and similar sets of files because many users will save the same e-mail attachment or scanned delivery menu in their home folders.
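A rough sketch of the mechanism: hash each file's contents, keep the first copy, and point later copies at it, with the hard-link count standing in for the reference count. Real implementations track references inside the file system and copy-on-write when a referencing file is modified; the share path here is hypothetical.

```python
# Minimal sketch of single-instance storage: hash file contents, keep one copy,
# and replace duplicates with hard links. The link count acts as the reference
# count; a real file system also copies-on-write when a linked file changes.
import hashlib
import os

def single_instance(root: str) -> None:
    seen: dict[str, str] = {}                     # content hash -> canonical path
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            if digest in seen:
                os.remove(path)                   # drop the duplicate copy...
                os.link(seen[digest], path)       # ...and point it at the original
            else:
                seen[digest] = path

if __name__ == "__main__":
    single_instance("/srv/home")                  # hypothetical home-directory share
```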
EMC's Celerra NAS, which ranges in price from $20,000 to more than $100,000, includes both single-instance storage and data compression in its data reduction toolkit, and it sidesteps possible performance impacts by single instancing and compressing only inactive files. Admins set policies based on each file's last-accessed time stamp and schedule a task to apply them.
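The policy amounts to "touch only what nobody has read lately." A hypothetical stand-in for such a rule, not Celerra's actual mechanism, might look like the following; the 90-day threshold, the gzip step, and the share path are assumptions for illustration.

```python
# Hypothetical age-based reduction policy, not Celerra's actual mechanism:
# gzip any file whose last-access time is older than a cutoff. An admin would
# run this from a scheduler (cron, Task Scheduler) rather than interactively.
import gzip
import os
import shutil
import time

CUTOFF_DAYS = 90                                  # assumed policy threshold

def compress_inactive(root: str) -> None:
    cutoff = time.time() - CUTOFF_DAYS * 86400
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if name.endswith(".gz") or os.stat(path).st_atime > cutoff:
                continue                          # skip active or already-compressed files
            with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
                shutil.copyfileobj(src, dst)
            os.remove(path)                       # keep only the compressed copy

if __name__ == "__main__":
    compress_inactive("/srv/projects")            # hypothetical share path
```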
Windows Storage Server uses NTFS's real-time compression and, like Celerra, provides single-instance storage as a scheduled post-process.