File servers and network-attached storage (NAS) systems are suffering from a data deluge of unbelievable proportions. This growth is coming primarily from unstructured data: in other words, data that doesn't live in databases. That data is growing is no surprise. The pace at which it is growing, however, is catching some data centers off guard.
Unstructured data is no longer just files from office productivity applications. Although the number and size of those files are growing, the real problem is coming from media such as videos and podcasts, and from machine-generated data produced by devices such as Wi-Fi security cameras.
The Problem With NAS
This growth in unstructured data is breaking many file servers and NAS solutions. First, these systems have a hard capacity limit built into them. The further a system can scale its capacity, the more it costs up front, an expense that an organization might not be able to justify. The alternative is to buy a smaller system that costs less but needs to be replaced more often.
Scale-out storage was supposed to be a solution to this problem, and largely it is. Scale-out NAS allows multiple NAS heads to be clustered so that their capacity is aggregated and they can be managed as a single unit. The challenge for scale-out storage systems is that they might start too large for some organizations, because you need enough nodes -- typically three -- to establish the initial cluster. Also, many scale-out NAS solutions won't allow you to mix node sizes, so you can't start with "small" nodes and then add big ones later.
The second problem is universal: both scale-out and traditional NAS systems have a finite limit on the number of files they can support before performance suffers, and the impact occurs long before the theoretical file limit is reached. Every file has metadata, and the NAS has to maintain and use that metadata to serve files and protect that data. The more files there are, the more metadata needs to be managed. This management consumes processing power on the NAS controller and adds overhead to file system responsiveness.
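To see why file count matters independently of raw capacity, a rough back-of-the-envelope calculation helps. The per-file metadata figure below is an assumption chosen for illustration; actual overhead varies widely by file system:

```python
# Rough illustration of metadata overhead on a file-count-heavy NAS.
# The ~2 KB per-file figure is an assumption; real file systems vary widely.
file_count = 500_000_000          # half a billion small files
metadata_per_file = 2 * 1024      # assumed bytes of metadata per file

total_metadata = file_count * metadata_per_file
print(f"{total_metadata / 1024**3:.0f} GiB of metadata to manage")
```

Even at these modest assumptions, the controller is tracking close to a terabyte of metadata before it serves a single byte of file data, which is why performance degrades well before capacity runs out.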
The result is that many data centers end up buying a new NAS system before their current one reaches maximum capacity. In fact, 50% full is a common threshold at which organizations stop adding data to a NAS and buy a new one.
A potential solution is object storage. These systems can run at 80% of capacity or more without getting bogged down by complex metadata operations. The object file system is essentially flat -- you don't create complex paths to data. Each file, or object, is assigned an ID or serial number, and the object is accessed through that number.
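That flat, ID-based model can be sketched in a few lines. This is a toy illustration only, not any vendor's API: the store assigns an identifier at write time, and lookups go straight through that identifier instead of walking a directory tree.

```python
import uuid

class FlatObjectStore:
    """Toy sketch: objects live in one flat namespace, addressed by ID, not path."""
    def __init__(self):
        self._objects = {}  # object ID -> bytes; no directories, no hierarchy

    def put(self, data: bytes) -> str:
        object_id = str(uuid.uuid4())  # the store assigns the ID
        self._objects[object_id] = data
        return object_id

    def get(self, object_id: str) -> bytes:
        # Direct lookup by ID -- no path resolution, no per-directory metadata
        return self._objects[object_id]

store = FlatObjectStore()
oid = store.put(b"video frame data")
assert store.get(oid) == b"video frame data"
```

Because there is no hierarchy to traverse, lookup cost stays roughly constant no matter how many objects the store holds, which is the property that lets these systems carry billions of files.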
The problem with object-based storage is that these systems are targeted at large cloud providers with billions of files. Although the cost to purchase and manage a large system is very compelling on a cost-per-gigabyte basis, the initial buy-in is beyond the reach of most organizations.
We are seeing the emergence of file systems and object storage systems that are designed to start very small but offer similar expansion. They also often present common interfaces -- NFS, CIFS, iSCSI -- to the object store instead of requiring a REST API to get to files. Some can be installed as virtual appliances; others are scale-out designs that can start with a single node.
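For native object stores, REST access typically means issuing HTTP requests against object IDs rather than opening file paths. A minimal sketch of what that looks like, assuming a hypothetical endpoint (the host, URL scheme, and object ID below are invented for illustration):

```python
from urllib.request import Request

# Hypothetical object store endpoint -- illustrative only, not a real service.
object_id = "0f8e4a12"
req = Request(f"https://objects.example.com/v1/objects/{object_id}", method="GET")

# The request addresses the object directly by ID, with no directory path.
print(req.get_method(), req.full_url)
# A file-gateway front end (NFS/CIFS) would instead expose the same object as
# a mounted path, hiding the HTTP layer from applications entirely.
```

The appeal of the gateway approach is that existing applications keep reading and writing ordinary files while the back end gains object storage's flat scalability.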
Unstructured data is going to be a challenge for data centers of all sizes. Even Storage Switzerland struggles with this: when we produce videos at events like VMworld, we create terabytes of data in a few days' time. And, like everyone else, we store this data forever. The time to explore object storage for more traditional use cases is now, before you end up with dozens of NAS systems.