I'm satisfied that snapshots and replication in conventional storage systems can serve the same function as more traditional backup schemes. While snapshots make satisfying the most common restore requests easy, the limitations of the snapshot mechanism in most storage systems leave most organizations using snapshots as a supplement to, not a replacement for, backup copies. Does the cloud change the snapshot-as-backup calculus? Some cloud storage vendors say it does.
The primary function of a cloud storage gateway, like those from vendors such as Nasuni, Cirtas and StorSimple, is to let users take advantage of cloud storage without rewriting their applications. Without a gateway, your applications have to put and get data objects from the cloud storage provider you've chosen through that vendor's particular API. Your users want to store their data on a NAS or file server via CIFS or NFS, and server applications like Exchange and SharePoint need traditional block interfaces.
The cloud storage gateway maps these common protocols onto the cloud
object store and provides a local cache to make your applications run faster.
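To make that mapping concrete, here's a minimal sketch of the read-through caching idea in Python. The ObjectStore interface (get/put), the class name, and the eviction policy are all my own assumptions for illustration, not any vendor's actual design:

```python
class ToyGateway:
    """A toy cloud storage gateway: file-style reads and writes backed by
    an object store, with a small local cache for the working set.
    The get/put interface and eviction policy are illustrative only."""

    def __init__(self, object_store, max_cached_files=1024):
        self.store = object_store   # must provide get(key) and put(key, data)
        self.cache = {}             # path -> bytes, the hot working set
        self.max_cached_files = max_cached_files

    def write_file(self, path: str, data: bytes) -> None:
        self.store.put(path, data)  # every write lands in the cloud...
        self._cache(path, data)     # ...and stays hot in the local cache

    def read_file(self, path: str) -> bytes:
        if path in self.cache:      # cache hit: served locally, no WAN trip
            return self.cache[path]
        data = self.store.get(path) # cache miss: object GET from the cloud
        self._cache(path, data)
        return data

    def _cache(self, path: str, data: bytes) -> None:
        if len(self.cache) >= self.max_cached_files:
            self.cache.pop(next(iter(self.cache)))  # crude FIFO-ish eviction
        self.cache[path] = data
```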
The cool part is that the gateways also provide snapshots.
Since cloud storage providers will be glad to sell you as much space as you want, the gateway vendors have designed their systems to let you keep an unlimited number of snapshots of your volume or file system. That's a big step up from the 16 to 255 snapshots most disk systems let you keep online. And because the snapshots live out in the cloud, while your gateway holds a couple of terabytes of cache for the working set of data you and your applications actually access day to day, those snapshots won't have any impact on performance.
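The economics work because a snapshot in an object store is, in effect, just a catalog of references to immutable objects: taking one copies the catalog, not the data. A toy sketch of that idea, with names of my own invention:

```python
class SnapshotCatalog:
    """Why 'unlimited' snapshots can be cheap in an object store: a
    snapshot is just a map of path -> immutable object key, so unchanged
    objects are shared by every snapshot that references them.
    Purely illustrative, not any vendor's real data structure."""

    def __init__(self):
        self.live = {}        # current view: path -> object key
        self.snapshots = {}   # snapshot name -> frozen {path: object key}

    def write(self, path: str, object_key: str) -> None:
        # A new version is a new immutable object; old keys are untouched,
        # so any snapshot still pointing at them remains valid.
        self.live[path] = object_key

    def take_snapshot(self, name: str) -> None:
        self.snapshots[name] = dict(self.live)  # copy references, not data
```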
A redundant pair of caching gateways is reliable enough that I would consider them, and the snapshot data they hold, to satisfy my need for a local copy. Since all your data is in the cloud, data is "backed up" in close to real time, and if you need to recover at a remote location you just fire up a gateway at the remote site. The new gateway will start populating its cache as your users access their data, and even though it's restoring across an Internet link, your users will get at their most critical data faster than if you restored a whole server from a conventional backup, because the gateway pulls data down in small chunks as needed.
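Here's a rough sketch of that restore-on-access behavior, again in Python with an invented object-store interface, key scheme and chunk size; real gateways will differ in the details:

```python
CHUNK = 4 * 1024 * 1024  # 4 MB chunks; the size is my assumption

class LazyRestoreVolume:
    """Recovery-site gateway sketch: chunks are fetched from the cloud
    snapshot only when something first reads them, so hot data is usable
    long before the whole volume has been pulled across the WAN."""

    def __init__(self, store, snapshot_id: str):
        self.store = store            # object store with get(key) -> bytes
        self.snapshot_id = snapshot_id
        self.local = {}               # chunk index -> bytes already fetched

    def read(self, offset: int, length: int) -> bytes:
        first = offset // CHUNK
        last = (offset + length - 1) // CHUNK
        buf = bytearray()
        for idx in range(first, last + 1):
            if idx not in self.local:  # first touch: pull just this chunk
                key = f"{self.snapshot_id}/chunk-{idx:08d}"
                self.local[idx] = self.store.get(key)
            buf += self.local[idx]
        start = offset - first * CHUNK  # trim to the exact bytes requested
        return bytes(buf[start:start + length])
```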
Now don't get me wrong, cloud snapshots aren't perfect. If you decide to keep 5,000 snapshots, you'll have to pay Amazon or Nirvanix every month to keep all the data in those snapshots online. And like other snapshots, snaps in the cloud don't come with extensive metadata, so a keyword search could be a slow and painful experience as the whole data set gets dragged down from the cloud.
Using the cloud as your primary storage also puts you at the
mercy of your cloud storage provider. If
they lose your data, raise their rates, go belly up or otherwise cause you
problems, retrieving your data and getting set up elsewhere will be a painful
process.
Now I'm pretty sure that top-notch providers are better at data management than most organizations, but there is some risk here. The truth is that cloud storage provider SLAs, as important as they may be, can't make you whole after a provider loses your data any more than Kodak sending you a new roll of film made you whole after they lost the pictures of your honeymoon in Bora Bora or the kid's first steps.
I'm looking forward to the day when cloud gateways can store their data to multiple cloud back ends to reduce this risk. Even better would be if they could write to both a local object store, like a Caringo CAStor or EMC Atmos, and a public cloud provider. That would give me fast local access for eDiscovery and real-time offsite backup.
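If a gateway did offer that, the write path might look something like this sketch; the MirroredStore class and its get/put interface are hypothetical:

```python
class MirroredStore:
    """Sketch of the dual back-end idea: every write goes to both a local
    object store and a public cloud, and reads prefer the local copy.
    Both back ends are assumed to expose get(key) and put(key, data)."""

    def __init__(self, local_store, cloud_store):
        self.local = local_store
        self.cloud = cloud_store

    def put(self, key: str, data: bytes) -> None:
        self.local.put(key, data)   # fast on-premises copy for eDiscovery
        self.cloud.put(key, data)   # offsite copy in close to real time

    def get(self, key: str) -> bytes:
        try:
            return self.local.get(key)  # serve locally when we can
        except KeyError:
            return self.cloud.get(key)  # fall back to the cloud copy
```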