NetApp made headlines last year with its attempt to purchase Data Domain. While the story of that deal not working out has been very well documented, NetApp already had two deduplication technologies of its own. One leverages the ability of ONTAP (the NetApp OS) to do deduplication, and the other is available with its VTL solution. This entry will focus on the former.
As most readers are aware, NetApp was one of the first suppliers to provide deduplication on primary storage. In our testing and in conversations with clients, it works well. A potential shortcoming is that it does not currently compress data, so any storage savings depend on duplicate information being present. You can address this by adding Storwize's compression product to the mix, which is compatible with NetApp's deduplication capability. To this deduplication core, NetApp then adds SnapVault, which backs up one filer to another and only moves changed blocks. While the technology behind SnapVault isn't technically deduplication, it is block-level incremental backup, and if you combine it with the dedupe capabilities of the core OS, you get much the same effect.
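To make the "there must be duplicate information" point concrete, here is a minimal Python sketch of block-level deduplication in general. It is not NetApp's implementation: the 4 KB block size, the SHA-256 fingerprints, and the `dedupe` and `restore` helper names are all assumptions chosen for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def dedupe(data: bytes, size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks and keep one copy of each unique block."""
    store = {}  # fingerprint -> block contents (each unique block stored once)
    refs = []   # ordered fingerprints needed to rebuild the original data
    for i in range(0, len(data), size):
        block = data[i:i + size]
        fp = hashlib.sha256(block).hexdigest()
        store.setdefault(fp, block)
        refs.append(fp)
    return store, refs


def restore(store, refs) -> bytes:
    """Rebuild the original data from the block store and the reference list."""
    return b"".join(store[fp] for fp in refs)


# Three identical blocks plus one different block: 4 logical blocks, 2 stored.
redundant = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE
redundant_store, redundant_refs = dedupe(redundant)

# Four blocks with no repeats: nothing to remove, so all 4 are stored.
unique = b"".join(bytes([n]) * BLOCK_SIZE for n in range(4))
unique_store, unique_refs = dedupe(unique)
```

In the second case the store is exactly as large as the input, which is the situation where compression, rather than deduplication, would be the only way to save space.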
For example, if you have three filers to be backed up, all three will most likely do their deduplication prior to backup. Then SnapVault will initially do a full backup of these systems to a NetApp backup target, most likely a NetApp NearStore. These full backups are then deduped on the NearStore, removing redundant data between those boxes. Successive backups with SnapVault will then copy only the changed blocks from the three filers to the NearStore, where redundant blocks can again be deduplicated. The backup is essentially a live copy or mirror image of the data that was backed up. It is not stored in a proprietary backup format. This makes for very easy recoveries but, on its own, does not allow for point-in-time copies. To fix that potential shortcoming, SnapVault triggers a snapshot prior to the backup occurring. Dialing back in time is then similar to using any other NetApp snapshot.
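The flow just described (full backup first, changed blocks afterwards, a snapshot before each backup, and shared blocks on the target) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how SnapVault works internally: the `BackupTarget` class, the position-by-position fingerprint comparison used to find changed blocks, and the inline deduplication on the target are all hypothetical simplifications.

```python
import hashlib
from typing import Optional

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def fingerprints(data: bytes) -> list:
    """Return one SHA-256 fingerprint per fixed-size block."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]


class BackupTarget:
    """Hypothetical backup target holding live copies, snapshots, and shared blocks."""

    def __init__(self):
        self.blocks = {}     # fingerprint -> block, shared by every source
        self.live = {}       # source name -> fingerprints of its live copy
        self.snapshots = {}  # source name -> list of earlier fingerprint lists

    def backup(self, source: str, data: bytes) -> int:
        """Snapshot the existing copy, then take only changed blocks. Returns blocks sent."""
        old = self.live.get(source)
        if old is not None:
            # Preserve the previous state before it is overwritten.
            self.snapshots.setdefault(source, []).append(list(old))
        new = fingerprints(data)
        sent = 0
        for i, fp in enumerate(new):
            if old is None or i >= len(old) or old[i] != fp:
                sent += 1
                # Store the block only if the target has never seen it.
                self.blocks.setdefault(fp, data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
        self.live[source] = new
        return sent

    def restore(self, source: str, snapshot: Optional[int] = None) -> bytes:
        """Rebuild the live copy, or an earlier snapshot chosen by index."""
        refs = self.live[source] if snapshot is None else self.snapshots[source][snapshot]
        return b"".join(self.blocks[fp] for fp in refs)


target = BackupTarget()
v1 = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
v2 = b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE

sent_full = target.backup("filer1", v1)   # first backup sends every block
sent_other = target.backup("filer2", v1)  # same data from a second filer
sent_incr = target.backup("filer1", v2)   # only the changed block is sent
```

After these three backups the target holds only three unique blocks (A, B, and C), the live copy of `filer1` restores to `v2`, and `target.restore("filer1", 0)` returns the earlier `v1` state, which is the point-in-time behavior the snapshot adds.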
This process sets the stage for a very clean replication process: simply use NetApp's replication application, SnapMirror, to replicate the local NearStore to a DR NearStore. The local NearStore has already been deduplicated and is only receiving changed blocks, so it is well optimized for the replication process. Multiple sites can all replicate to the NearStore at the DR site, and the deduplication process can then run once again at the DR site to eliminate redundancy between the sites. While this is not WAN-optimized deduplication like we have discussed in prior entries, the WAN is effectively optimized without the need for deduplication, since only incremental block changes are transferred during each SnapVault or SnapMirror session.
The NetApp approach to capacity-optimized backup has a lot of merit, especially in NetApp-heavy data centers. What about the rest of the environment? NetApp has Open Systems SnapVault and has also allowed third-party backup applications like Symantec's NetBackup and SyncSort's Backup Express to leverage this infrastructure. Essentially, these tools perform block-level incremental backups to a NetApp backup target. These applications will also often provide the capability to do hot backups of applications like Oracle and Exchange. As is the case with SnapVault, once this block-level data is received at the NetApp backup target, it is optimized and replicated as described above. Other than the lack of compression, this overall process has a lot of value. I wonder why NetApp even bothered trying to acquire additional deduplication technology. My opinion is that they should keep investing in and improving this process instead.