Disk backup appliances leverage high-capacity hard drives and features like deduplication and compression to drive down the cost of backing up data to disk versus using tape. This allowed disk as a backup target to evolve from a "cache" in front of tape drives to the primary backup area that most data centers use today.
But disk backup appliances need to change once again to keep up with the features and capabilities of virtualized backup software products such as Veeam, PHD Virtual, and Nakivo.
These virtualization-specific backup products, as well as some of the more traditional enterprise backup applications, fully exploit the virtualized environment. They can back up data at the changed-block level, and many offer some form of recovery in place, where virtual machines (VMs) are launched directly on the backup device.
Randomized backups
Changed Block Tracking (CBT) changes the way data is streamed to the backup appliance. After the initial backup is stored on the backup appliance, only the blocks of the VM that have changed since the last backup are sent to the appliance.
This dramatically reduces the size of each transfer and enables much more frequent backups, potentially hourly instead of nightly. Instead of a single, once-per-night sequential data transfer, backup becomes a continuous, randomized stream. With small I/O coming from dozens, if not hundreds, of VMs in the environment, the way the backup appliance receives data needs to change.
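The changed-block idea described above can be illustrated with a minimal sketch. It is a toy in-memory model, not how any particular hypervisor or backup product implements CBT; the `TrackedDisk` class, its methods, and the 4-byte block size used in the usage note are all hypothetical names and values chosen for illustration.

```python
class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last backup."""

    def __init__(self, num_blocks, block_size=4096):
        self.block_size = block_size
        self.blocks = [bytes(block_size)] * num_blocks
        # Before the first backup every block counts as changed,
        # so the first backup is a full one.
        self.dirty = set(range(num_blocks))

    def write(self, index, data):
        """Overwrite one block and record it as changed."""
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = data
        self.dirty.add(index)

    def backup(self):
        """Return {block_index: data} for changed blocks, then reset tracking."""
        delta = {index: self.blocks[index] for index in sorted(self.dirty)}
        self.dirty.clear()
        return delta


def apply_backup(store, delta):
    """Merge one backup (full or incremental) into the appliance-side copy."""
    store.update(delta)
    return store
```

In this model, the first call to `backup()` returns every block, and each later call returns only the blocks written since the previous one. With many VMs each sending such small deltas at short intervals, the appliance sees scattered block-sized writes rather than one long sequential stream.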
Performance-oriented restores
Virtualization-specific backup solutions also change the way data recovery works. An increasing number of these products can instantiate the VM directly on the backup device by mapping a drive back to the original host. This eliminates the need to transfer data across the network during a recovery and allows a VM to come back online rapidly.
The disk backup appliance gap
While both CBT and recovery in place are valuable features, they expose shortcomings in disk backup appliances. The features that disk backup appliances use to drive down cost (deduplication, compression, and high-capacity drives) are poorly suited to the continuous, randomized I/O that CBT creates. These appliances are also poorly suited to launching VMs in place, because deduplication and compression get in the way of a recovered VM that is now running in production on the appliance itself.
Changes needed
Disk backup appliances must adapt to this new reality. First, they need to be able to receive a smaller but more continuous stream of I/O. Second, they need to be able to host VM stores for a period of time while delivering acceptable performance. Both of these goals can be accomplished if backup appliances integrate a very small tier of solid-state disk.
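One way a small solid-state tier could absorb the continuous stream is sketched below. This is an assumption about how such a tier might be used, not a description of any shipping appliance: the `FlashLandingTier` class, its capacity threshold, and the dictionary standing in for the high-capacity disk tier are all hypothetical.

```python
class FlashLandingTier:
    """Toy flash tier: absorbs small random writes, destages them in sorted order."""

    def __init__(self, capacity_blocks):
        self.capacity_blocks = capacity_blocks
        # (vm_id, block_index) -> data; a newer write replaces an older one.
        self.pending = {}
        # Stand-in for the high-capacity hard drive tier.
        self.capacity_store = {}
        self.flushes = 0

    def write(self, vm_id, block_index, data):
        """Accept one changed block; destage when the flash tier is full."""
        self.pending[(vm_id, block_index)] = data
        if len(self.pending) >= self.capacity_blocks:
            self.flush()

    def flush(self):
        """Move pending blocks to the capacity tier, ordered by VM and block."""
        if not self.pending:
            return
        for key in sorted(self.pending):
            self.capacity_store[key] = self.pending[key]
        self.pending.clear()
        self.flushes += 1

    def read(self, vm_id, block_index):
        """Serve reads from flash first, so a recovered VM sees the newest data."""
        key = (vm_id, block_index)
        if key in self.pending:
            return self.pending[key]
        return self.capacity_store[key]
```

The design choice here is that the hard drives only ever see larger, ordered batches, while the random small writes and the most recent reads land on flash. A real appliance would also have to handle persistence, failures, and eviction policy, which this sketch leaves out.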
At the same time, backup appliances still need to be more cost effective than the primary storage systems they are backing up. The problem is that the old tricks of deduplication and compression may not help as much, since CBT eliminates much of the redundancy that backup appliances used to count on for an effective cost per GB. Additionally, deduplicating a live VM may impact performance. Instead, these systems need to drive down the hard cost of capacity by using off-the-shelf disk drives and more efficient RAID algorithms.
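The cost argument comes down to simple arithmetic: effective cost per GB is the raw cost of capacity divided by the data reduction ratio. The sketch below makes that explicit; the function name and every figure in the usage note are hypothetical and chosen only to show the shape of the calculation, not taken from any vendor.

```python
def effective_cost_per_gb(raw_cost_per_gb, reduction_ratio):
    """Cost per GB of protected data after deduplication and compression.

    reduction_ratio is the data reduction factor, e.g. 10 for a 10:1 ratio.
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return raw_cost_per_gb / reduction_ratio


def raw_cost_needed(target_cost_per_gb, reduction_ratio):
    """Raw cost per GB required to hit a target effective cost at a given ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction_ratio must be positive")
    return target_cost_per_gb * reduction_ratio
```

As an illustration with made-up numbers, capacity that costs $0.50 per raw GB works out to about $0.05 per protected GB at a 10:1 reduction ratio, but $0.25 at 2:1. If CBT strips out most of the redundancy before data reaches the appliance, the ratio falls, and holding the same effective cost means the raw cost of capacity has to fall instead.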
A modernized disk backup appliance should enable and extend features like CBT and recovery in place. The key is to deliver the performance these features require while continuing to drive down cost.