Spectra Logic is promoting the concept of "deep storage" as a new way of selling tape libraries, but don’t let your thoughts on the viability of tape distract you from learning about deep storage.
Step back and consider where deep storage fits among recent IT innovations and what it portends for managing “heavy data” (that is, the bulk data that is a byproduct of the ongoing data explosion).
Spectra Logic defines deep storage as “extremely low cost, power efficient and dense storage that requires some latency when retrieving data.” This is a storage tier for the long-term mass storage of data, typically from about 200 Tbytes to multiple petabytes and beyond.
In essence, deep storage is a new application use for a tape library. The two principal application uses of a tape library are data protection (such as for backup and recovery) and active archiving. Yet heavy data does not fit easily into either application category. Since heavy data is working data that maintains a long-term value, it definitely does not belong under data protection, where backup/restore and disaster recovery are the primary functions.
In fact, much of heavy data is fixed content (that is, the data does not change after it is created), so, technically, it might be a target for an active archive. There might be cases where some deep storage data would fit into an active archive. But an active archive also requires overarching software that provides functionality such as compliance and e-discovery for emails. This may be too much or the wrong kind of software management overhead for many applications, such as big data, seismic data or video surveillance data.
With deep storage, specific use cases are probably best managed individually rather than under an active archiving software management umbrella. But that poses a problem: Moving data for deep storage and then accessing that data for business purposes is not easy. In fact, although you could write custom software to do the job, it would still be difficult and complex. Active archiving and data protection have solved this problem, but at the expense of focusing on targeted solutions rather than on providing a framework that can be adapted to serve as a general purpose solution.
In contrast, Spectra Logic has introduced a general-purpose interface that enables the use of deep storage and will eventually be open (at least to some extent). Spectra Logic also launched its BlackPearl appliance to put deep storage to work with its own products. Those are most likely to be its tape libraries, although the appliance will eventually work with a Spectra disk product as well.
Deep Simple Storage Service (DS3)
Spectra Logic’s DS3 is a communications interface that allows clients (as in a client/server architecture) to manage and direct bulk storage read (GET) and write (PUT) operations to deep storage, such as tape. DS3 is actually an extension of Amazon's S3 (Simple Storage Service). The extensions enable the use of sequential storage media as well as removable storage media.
Amazon S3 is a broadly accepted de facto standard: a Web services interface designed to store and retrieve large amounts of data at any time, from anywhere on the Web. Storage is in the form of objects in buckets, where each object has a unique, developer-assigned key. Spectra Logic employs this form of object storage for its deep storage architecture and solutions.
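To make the bucket-and-key model concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the concept: the `Bucket` class, the bucket name and the key are invented for this example and are not part of S3 or DS3.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Toy model of a bucket: a named container of objects addressed by key."""
    name: str
    objects: dict = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        # Writing to an existing key replaces the whole object.
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        # Raises KeyError if nothing has been stored under this key.
        return self.objects[key]


# Hypothetical bucket and key names, chosen by the developer.
surveys = Bucket("seismic-surveys")
surveys.put("2013/north-field/line-042.segy", b"raw trace data")
print(surveys.get("2013/north-field/line-042.segy"))
```

The point of the model is that an application needs only a bucket name and a key to find its data again; where the bytes physically live is the storage system's concern.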
As an extension of S3, the DS3 interface follows the REST (Representational State Transfer) client/server architectural style, moving objects to and from deep storage using the high-level GET and PUT commands. DS3 is the first native RESTful interface that can work with robotic tape libraries.
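As a rough sketch of what REST-style object access looks like, the Python below builds (but does not send) a PUT and a GET request using S3's path-style addressing of `/bucket/key`. The host name, bucket and key are hypothetical. Real S3 requests also carry signed authentication headers, and the DS3-specific extensions for sequential and removable media are not shown, so treat this as an illustration of the style rather than a working DS3 client.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint; a real deployment would supply its own address.
ENDPOINT = "http://blackpearl.example.com"


def object_url(bucket: str, key: str) -> str:
    """Path-style object address: /<bucket>/<key>."""
    return f"{ENDPOINT}/{quote(bucket)}/{quote(key)}"


def build_put(bucket: str, key: str, data: bytes) -> Request:
    """Describe a write: PUT the object's bytes to its address."""
    return Request(object_url(bucket, key), data=data, method="PUT")


def build_get(bucket: str, key: str) -> Request:
    """Describe a read: GET the object from the same address."""
    return Request(object_url(bucket, key), method="GET")


put_req = build_put("video-archive", "cam01/2013-11-05.mp4", b"...frames...")
get_req = build_get("video-archive", "cam01/2013-11-05.mp4")
print(put_req.get_method(), put_req.full_url)
print(get_req.get_method(), get_req.full_url)
```

The same URL serves both operations; only the HTTP verb changes, which is what keeps the developer-facing interface small.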
Using an extension to S3 as a cornerstone seems to be a particularly good move on the part of Spectra Logic, for two reasons. First, deep storage has to be able to play in the modern IT world, and integration with the Web services world is an essential component for doing so. Second, deep storage has to be feasible in the sense that the application developer time required has to be palatable for organizations. It's about being able to do what would once have been too costly in development time, in a time frame and at a cost that an organization deems acceptable. Previously, that was possible only for custom-built applications that could justify the investment.
The BlackPearl Deep Storage Appliance
BlackPearl is Spectra Logic’s data management appliance based on DS3 that actually implements the use of deep storage. BlackPearl does a number of things:
• Acts as a DS3 server to DS3 clients while using the DS3 interface; data is migrated from a DS3 client to the BlackPearl appliance;
• Stores data as object-based deep storage by grouping collections of data into buckets, writing that data in the open, self-describing Linear Tape File System (LTFS) format, and maintaining an object catalog of physical storage locations and metadata;
• Manages the deep storage system itself, including inventory, retries and error handling;
• Provides tight integration with Spectra’s BlueScale tape management system for actual management functions of the tape library, such as tape encryption, data integrity verification and system error detection.
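To illustrate why an object catalog of physical locations matters on sequential media, the following Python sketch models catalog entries and orders a batch of reads so that each tape is mounted once and read front to back. Everything here is hypothetical: the `CatalogEntry` fields, the barcodes and the `plan_reads` helper are invented for illustration and do not describe BlackPearl's actual catalog or scheduling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """Hypothetical catalog record: where one object lives on tape."""
    bucket: str
    key: str
    tape_barcode: str  # which cartridge holds the object
    position: int      # assumed ordinal position of the object on that tape
    size_bytes: int


def plan_reads(entries):
    """Order reads by tape, then by position, to avoid remounts and rewinds."""
    return sorted(entries, key=lambda e: (e.tape_barcode, e.position))


requested = [
    CatalogEntry("surveillance", "cam02/day3", "TAPE02", 7, 4_000),
    CatalogEntry("surveillance", "cam01/day2", "TAPE01", 12, 5_000),
    CatalogEntry("surveillance", "cam02/day1", "TAPE02", 1, 4_500),
    CatalogEntry("surveillance", "cam01/day1", "TAPE01", 3, 5_200),
]

for entry in plan_reads(requested):
    print(entry.tape_barcode, entry.position, entry.key)
```

A client that fetched objects in the order it happened to ask for them could force the library to swap cartridges repeatedly; letting the server side consult a catalog and choose the order is one plausible way to hide that cost.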
Spectra has a developer program that provides all the tools necessary to write a custom DS3 client, including the necessary API, a software development kit (SDK), and a simulator download. The vendor is also working on pre-written clients; one is for use with the Hadoop Distributed File System (HDFS), so data can be migrated out of an active HDFS-managed cluster for long-term storage and future use.
Spectra Logic claims that its tape-based deep storage platform, front-ended by a BlackPearl appliance, can cost as little as 9 cents per gigabyte in multipetabyte environments and 14 cents per gigabyte in smaller environments. Note that this is the full purchase price. The payback period would be less than a year when compared with the recurring monthly charge for a cloud service, which also has longer response times (most notably Amazon’s Glacier, which is a deep archiving service).
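The payback arithmetic is simple enough to sketch in Python. The per-gigabyte purchase prices come from Spectra Logic's claim above; the cloud rate of 1 cent per gigabyte per month is an assumed placeholder for illustration only and should be replaced with a provider's actual pricing. The sketch also ignores power, media, staffing and retrieval fees on either side.

```python
import math


def purchase_cost_dollars(capacity_gb: int, cents_per_gb: int) -> float:
    """One-time purchase price for a given capacity."""
    return capacity_gb * cents_per_gb / 100


def months_to_payback(purchase_cents_per_gb: int, cloud_cents_per_gb_month: int) -> int:
    """Months of cloud fees needed to equal the one-time purchase price."""
    return math.ceil(purchase_cents_per_gb / cloud_cents_per_gb_month)


ASSUMED_CLOUD_RATE = 1  # cents per GB per month; placeholder, not a quoted price

# 1 PB taken as 1,000,000 GB at the claimed 9 cents per GB.
print(purchase_cost_dollars(1_000_000, 9))       # 90000.0
print(months_to_payback(9, ASSUMED_CLOUD_RATE))  # 9
print(months_to_payback(14, ASSUMED_CLOUD_RATE)) # 14
```

Under this assumed rate, the multipetabyte price pays back in nine months while the smaller-environment price takes somewhat longer, so the result is sensitive to the cloud rate that is plugged in.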