Face it: even with whiz-bang technology like flash caching and data deduplication, today's storage systems are pretty stupid. Sure, as we move from block storage arrays to file stores and object storage systems, we add a little more metadata. But in general, our storage systems know very little about the data they hold.
The startup DataGravity aims to change that with its Discovery systems. Add the fact that this isn't the first rodeo for DataGravity's founder, who showed the storage world at EqualLogic that ease of use matters, and DataGravity Discovery is worth a good, hard look.
At its most basic level, DataGravity Discovery looks like many other unified storage systems. It has flash SSDs and spinning disks connected to a pair of Xeon-based controllers that deduplicate the data and present a hybrid data store over the usual SMB and NFS file protocols, or over iSCSI if block access is absolutely required.
What makes DataGravity's approach different is that, rather than leaving the second controller idle while waiting for the primary controller to fail, it uses the second controller to provide additional data-aware services. In the initial release, these services include full-text indexing, file system auditing, and file system analytics. They provide a degree of insight into user data that corporate IT folks can normally only wish for.
Leading large enterprises use tools like the Google Search Appliance to index the data on their NetApp filers, tools from Varonis to audit who modified or deleted that critical spreadsheet, and products from NTP Software or Northern Parklife to see who's filling up the NAS. The overhead of running multiple tools, each of which has to scan the NAS appliance periodically, is simply more than a small IT department can handle.
By indexing and collecting metadata inline as data is written to the appliance, DataGravity not only rolls all three of these tools into the storage system itself, but it also does so without impacting the performance of the primary data stream.
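To make the difference between inline collection and periodic scanning concrete, here is a toy sketch in Python. It is a hypothetical illustration, not DataGravity's code: the class and method names (InlineIndexingStore, write, delete, search, who_deleted) are invented, and everything runs synchronously in one process, whereas the Discovery system hands this work to its second controller. The point is that each write updates the search index, the audit trail, and the per-user capacity figures once, as the data arrives, so nothing ever has to crawl the share again.

```python
import re
import time
from collections import defaultdict


class InlineIndexingStore:
    """Toy file store that updates its search index, audit log, and per-user
    capacity figures as each write arrives, instead of rescanning later.
    Hypothetical illustration only; not DataGravity's implementation."""

    def __init__(self):
        self.files = {}                 # path -> bytes (the primary data)
        self.owners = {}                # path -> user who last wrote it
        self.index = defaultdict(set)   # word -> paths containing it (inverted index)
        self.audit_log = []             # (timestamp, user, action, path)
        self.usage = defaultdict(int)   # user -> bytes in files they last wrote

    @staticmethod
    def _words(data):
        """Lowercase alphanumeric tokens from the file's bytes."""
        return set(re.findall(r"[a-z0-9]+", data.decode("utf-8", "ignore").lower()))

    def _forget(self, path):
        """Remove an existing file's index entries and capacity charge."""
        old = self.files.get(path)
        if old is None:
            return
        for word in self._words(old):
            self.index[word].discard(path)
        self.usage[self.owners[path]] -= len(old)

    def write(self, user, path, data):
        self._forget(path)              # an overwrite replaces stale entries
        self.files[path] = data
        self.owners[path] = user
        for word in self._words(data):
            self.index[word].add(path)
        self.usage[user] += len(data)
        self.audit_log.append((time.time(), user, "write", path))

    def delete(self, user, path):
        self._forget(path)
        del self.files[path]
        del self.owners[path]
        self.audit_log.append((time.time(), user, "delete", path))

    def search(self, word):
        return sorted(self.index.get(word.lower(), set()))

    def who_deleted(self, path):
        """Users who deleted this path, oldest first, straight from the audit log."""
        return [user for _, user, action, p in self.audit_log
                if action == "delete" and p == path]


if __name__ == "__main__":
    store = InlineIndexingStore()
    store.write("alice", "/hr/budget.txt", b"Q3 budget forecast")
    store.write("bob", "/eng/notes.txt", b"budget review notes")
    print(store.search("budget"))                # ['/eng/notes.txt', '/hr/budget.txt']
    store.delete("vp_hr", "/hr/budget.txt")
    print(store.who_deleted("/hr/budget.txt"))   # ['vp_hr']
```

A scan-based tool would instead walk every file on a schedule and diff the results against its last pass, which is exactly the repeated overhead that separate search, audit, and analytics products impose.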
You would expect the system's data awareness to be limited to documents and other common data types stored in user folders. However, recognizing that most data today is actually stored on virtual machines of some kind, DataGravity has extended its discovery engine to open the .VMDK and .VHD files that make up the VM's disks. As a result, the engine can scan through files in more than 300 file formats, whether they're on virtual machines or in user shares.
The DataGravity folks have also given serious thought to data protection. Other storage players, like NetApp and Nimble Storage, pitch their snapshot mechanisms, which I admit are very nice, as an organization's first line of data defense. Snapshots, however, have two critical flaws: They share the same disks as the primary data, which leaves them vulnerable to a multiple-drive failure, and they aren't indexed, which makes restoring individual files more difficult.
Rather than relying on snapshots alone, the DataGravity Discovery system uses a separate set of disk drives to store the protected data. When the system creates a protection point, it takes a snapshot on the primary storage pool and replicates it to the backup disk pool. Deduplicating both pools minimizes the data that needs to be replicated, and the replication rate is, of course, limited only by the performance of the backup disk pool.
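Here is a minimal sketch of why deduplication shrinks replication traffic. It assumes fixed-size 4 KiB chunks identified by SHA-256 digests, and the Pool class and the store_chunks, take_protection_point, replicate, and restore functions are hypothetical; DataGravity hasn't published how its pools are organized. A protection point records each file as a list of chunk digests, and replication ships only the chunks the backup pool doesn't already hold.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real systems often use variable-size chunks


class Pool:
    """A hypothetical disk pool: unique chunks plus named snapshots."""

    def __init__(self):
        self.chunks = {}     # SHA-256 hex digest -> chunk bytes (each stored once)
        self.snapshots = {}  # snapshot name -> {path: [chunk digests]}


def store_chunks(pool, data):
    """Split data into fixed-size chunks, keep each unique chunk once, return digests."""
    digests = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        pool.chunks.setdefault(digest, chunk)  # duplicate chunks are not stored again
        digests.append(digest)
    return digests


def take_protection_point(primary, files, name):
    """Snapshot the primary pool: record every file as a list of chunk digests."""
    primary.snapshots[name] = {path: store_chunks(primary, data)
                               for path, data in files.items()}


def replicate(primary, backup, name):
    """Copy a snapshot to the backup pool, sending only chunks it lacks.

    Returns the number of bytes actually transferred."""
    sent = 0
    snapshot = primary.snapshots[name]
    for digests in snapshot.values():
        for digest in digests:
            if digest not in backup.chunks:
                backup.chunks[digest] = primary.chunks[digest]
                sent += len(primary.chunks[digest])
    backup.snapshots[name] = {path: list(d) for path, d in snapshot.items()}
    return sent


def restore(backup, name, path):
    """Rebuild a file from the backup pool alone, without touching the primary."""
    return b"".join(backup.chunks[d] for d in backup.snapshots[name][path])


if __name__ == "__main__":
    primary, backup = Pool(), Pool()
    files = {"/vm/a.vmdk": b"A" * 8192, "/vm/b.vmdk": b"A" * 4096 + b"B" * 4096}
    take_protection_point(primary, files, "pp1")
    print(replicate(primary, backup, "pp1"))  # 8192: two unique chunks
    files["/vm/b.vmdk"] = b"A" * 4096 + b"C" * 4096
    take_protection_point(primary, files, "pp2")
    print(replicate(primary, backup, "pp2"))  # 4096: only the new "C" chunk
```

Because the second protection point changes only one 4 KiB chunk, only that chunk crosses over to the backup pool. And since restore reads exclusively from the backup pool's own copies, losing the primary disks doesn't take the protected data with them.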
DataGravity takes advantage of the fact that a single set of metadata covers the data in both pools to provide a simple (these are the EqualLogic folks, remember) web interface for restoring files from within a protected Windows VM, finding the files that contain Social Security numbers, or showing that the vice president of HR really did delete the file he claimed just disappeared.
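Finding the files that contain Social Security numbers comes down to a pattern search over content the system has already extracted. Here is a minimal sketch, assuming plain text has already been pulled out of each file and that SSNs appear in the usual dashed NNN-NN-NNNN form; the function names are hypothetical, and DataGravity hasn't said how its own detection works. The validity check drops numbers the Social Security Administration never issues (area 000, 666, or 900 through 999, group 00, serial 0000).

```python
import re

# Dashed NNN-NN-NNNN form only; \b keeps it from matching inside longer digit runs.
SSN_PATTERN = re.compile(r"\b(\d{3})-(\d{2})-(\d{4})\b")


def is_plausible_ssn(area, group, serial):
    """Reject numbers the SSA never issues: area 000, 666, or 9xx; group 00; serial 0000."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def files_with_ssns(extracted_text):
    """Return, in sorted order, the paths whose text contains a plausible SSN.

    extracted_text maps path -> text already pulled out of the file (hypothetical input)."""
    return [path for path, text in sorted(extracted_text.items())
            if any(is_plausible_ssn(*m.groups()) for m in SSN_PATTERN.finditer(text))]


if __name__ == "__main__":
    docs = {
        "/hr/payroll.csv": "Jane Doe,123-45-6789,Accounting",
        "/hr/template.txt": "SSN: 000-12-3456",
        "/eng/release.txt": "Shipped 2014-08-19",
    }
    print(files_with_ssns(docs))  # ['/hr/payroll.csv']
```

Checking against the SSA's unissued ranges keeps obvious placeholders like 000-12-3456 out of the results, though a production scanner would also need to handle undashed numbers and other false positives.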
The tl;dr version of the DataGravity story is that it's a VM-aware hybrid storage system, like a Tintri, that also dedupes and automatically protects data, not by leaving snapshots on the primary disks, but by storing the backups on dedicated spindles. It's also got full-text indexing, access auditing, and analytics that don't slow it down by scanning the system over and over.
It all seems a little too good to believe, but then so did a scale-out storage system you could install in 15 minutes. The DataGravity team pulled that one off at EqualLogic, so you never know.