In my last entry I
looked at scaling single-system solutions, and in this entry I'll take a
look at scaling backup deduplication via a storage cluster approach
delivered by companies like Sepaton and Exagrid. The idea here is to
make adding capacity and performance as simple as adding another node
to the cluster. Each time you add a node, capacity and performance
scale with it. This may be ideal for the enterprise and even for rapidly
growing mid-tier companies.
Not all clustered storage systems are created equal. As we discussed in
our entry "Storage Clusters - Tightly Coupled vs. Loosely Coupled," the
key thing to understand is how these storage clusters deliver on their
main promise while still deduplicating backups efficiently. First, while
referencing a single target that scales seamlessly in the background is
an improvement, you may also want to make sure the deduplication is
applied globally across the cluster. In some cases, deduplication is
done only on a per-node basis, so identical data that lands on different
nodes is stored more than once, which reduces deduplication effectiveness.
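To make the per-node versus global difference concrete, here is a minimal Python sketch. It is an illustration only, not how any particular vendor implements deduplication: the chunk contents, the node names, and the helper functions are all hypothetical, and real systems use far more sophisticated chunking and indexing. It simply counts how many unique chunks end up stored when each node keeps its own index versus when the cluster shares one.

```python
import hashlib

def fingerprint(chunk: bytes) -> str:
    # Identify a chunk by its content hash (SHA-256 chosen for illustration).
    return hashlib.sha256(chunk).hexdigest()

def stored_chunks_per_node(streams_by_node):
    # Each node keeps its own index, so a chunk that appears on two
    # different nodes is stored once on each of them.
    total = 0
    for streams in streams_by_node.values():
        index = set()
        for stream in streams:
            for chunk in stream:
                index.add(fingerprint(chunk))
        total += len(index)
    return total

def stored_chunks_global(streams_by_node):
    # One index shared by the whole cluster, so a chunk is stored once
    # no matter which node received it.
    index = set()
    for streams in streams_by_node.values():
        for stream in streams:
            for chunk in stream:
                index.add(fingerprint(chunk))
    return len(index)

# Hypothetical backups: two servers share the same OS chunks, but their
# backup jobs happen to land on different nodes.
os_image = [b"kernel", b"libc", b"shell"]
backups = {
    "node-a": [os_image + [b"db-data"]],
    "node-b": [os_image + [b"mail-data"]],
}

per_node_count = stored_chunks_per_node(backups)
global_count = stored_chunks_global(backups)
print(per_node_count, global_count)
```

In this toy case the shared OS chunks are stored on both nodes under per-node deduplication, while the global index keeps a single copy of each.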
Second, some systems require that you point to a specific node in the
cluster as opposed to a virtual node or control node. Neither is a deal
breaker, but both are worth being aware of. My thinking is that if you want
a clustered storage system that will grow with you, especially in the
enterprise, then you also want deduplication and performance to improve
globally as you add nodes.
Finally, as anyone who has managed a cluster of any type knows, a
cluster implies added complexity. A storage cluster is no
different. Storage vendors have reduced the complexity somewhat by
pre-packaging the base configurations of the cluster. If you have the
time to evaluate solutions, make sure you test adding a node to the
cluster. Do it yourself, from the point of opening the box all the way
through adding the node to the cluster and rebalancing storage
capacity. If you don't have time to evaluate solutions, ask hard
questions to make sure you understand exactly how nodes are added and
what you have to do to make that happen.
As is the case with primary storage, there is no one right answer for
all data centers. As a result, there is a never-ending supply of
options. Single-unit deduplication systems seem to
benefit from initial simplicity and potentially better energy efficiency,
and they should have a cost advantage. Multi-node clusters benefit from
fewer forklift upgrades and potentially global deduplication.