As a use case, backup is probably the most obvious way to consume cloud storage. The nature of backup – sequential access, little emphasis on performance, infrequent reads, no latency sensitivity – makes it the ideal type of data to place in what appears to be nothing more than a large data repository. Focusing on this single use case, however, limits opportunities to use cloud storage for other workloads and to take advantage of the operational benefits that come from not having to manage infrastructure. So how can we exploit the characteristics of cloud storage for other use cases?
First, let's review the issues associated with using cloud storage for applications other than backup.
Latency – Unless you’re co-located with the likes of AWS or Azure, latency is a big problem. Traditional on-premises storage is expected to deliver I/O latency of between 5 and 10 milliseconds, with all-flash solutions consistently offering less than a millisecond. Latency has become enough of a problem that caching products such as PernixData’s FVP are used to reduce the I/O traffic leaving the server. The typical I/O latencies experienced with cloud providers make connecting over standard protocols like iSCSI a non-starter.
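To make the gap concrete, here is a minimal latency probe sketched in Python with the boto3 SDK; the bucket and key names are placeholders, and it assumes AWS credentials are already configured. Compare its median against the sub-millisecond all-flash figures above.

```python
# Minimal latency probe: times repeated small-object GETs against S3.
# BUCKET and KEY are hypothetical placeholders for illustration.
import time
import statistics
import boto3

BUCKET = "my-test-bucket"   # hypothetical bucket name
KEY = "probe/4kb-object"    # hypothetical small (~4 KB) object

s3 = boto3.client("s3")
samples = []
for _ in range(100):
    start = time.perf_counter()
    s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    samples.append((time.perf_counter() - start) * 1000)  # milliseconds

samples.sort()
print(f"median: {statistics.median(samples):.1f} ms")
print(f"p99:    {samples[98]:.1f} ms")
```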
Predictability and performance – One of the nice things about running your own data center is having control over the network traffic (whether IP or Fibre Channel based). That control makes ensuring consistency in I/O response times much more manageable, and apart from high latency, there’s nothing worse than inconsistent I/O performance. Step outside the data center and you either have to pay for dedicated network lines into a cloud provider or trust the public Internet, where there is no guarantee your data will get through at any decent service level.
Cost – Most online storage platforms charge in three places: capacity stored; I/O operations performed; and network bandwidth used (typically for moving data out of the cloud). These costs are unpredictable and can stack up; for example, AWS charges between $50 and $90 for every TB of data read per month.
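As a rough illustration of how these three dimensions combine, here is a back-of-the-envelope cost model in Python; the per-unit rates are assumed placeholders, not any provider’s actual price list.

```python
# Back-of-the-envelope monthly cost model for the three billing
# dimensions described above. All rates are illustrative assumptions.
def monthly_cost(tb_stored, million_requests, tb_egress,
                 rate_per_tb=23.0,         # $/TB-month stored (assumed)
                 rate_per_million=0.40,    # $/million requests (assumed)
                 rate_per_tb_egress=90.0): # $/TB read out (assumed)
    return (tb_stored * rate_per_tb
            + million_requests * rate_per_million
            + tb_egress * rate_per_tb_egress)

# e.g. 50 TB stored, 200M requests, 5 TB read back out in a month
print(f"${monthly_cost(50, 200, 5):,.2f}")
```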
Protocol – Most cloud storage platforms, including Amazon S3, Backblaze’s new B2 service and Azure Blob storage, are based on object stores. These are systems capable of storing large unstructured data objects, but they aren’t designed to present traditional LUNs/volumes. Some vendors do offer file services, but these are aimed more at home directories and shared data than at high-performance NFS/CIFS workloads.
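The difference in access model is easy to see in code. The sketch below, using the boto3 SDK with placeholder bucket and key names, shows that objects are written and read whole by key over HTTP; there is no notion of updating a 512-byte sector at a logical block address, as there is with a LUN.

```python
# Sketch of the object-store access model, assuming boto3 and
# configured credentials. Bucket/key/file names are placeholders.
import boto3

s3 = boto3.client("s3")

# Write: the entire object body is uploaded in one operation.
with open("report.pdf", "rb") as f:
    s3.put_object(Bucket="my-bucket", Key="docs/report.pdf", Body=f)

# Read: the object comes back whole (or as a byte range), not as
# sectors addressed by LBA the way a block device would serve them.
body = s3.get_object(Bucket="my-bucket",
                     Key="docs/report.pdf")["Body"].read()
```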
Backup is less susceptible to the issues of latency and I/O predictability, mainly because the backup process is traditionally sequential or streamed in nature, so the response time of any individual I/O isn’t a big deal. And since backup is mainly a write-intensive operation, costs are based largely on the volume of data stored. Protocols are usually not an issue either, as data can be broken down into fixed-size objects for storage in the cloud, in much the same way that products such as Symantec (now Veritas) NetBackup write data to tape.
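A minimal sketch of that pattern, assuming boto3, a placeholder bucket, and an arbitrary 64 MB chunk size, might look like this:

```python
# Sketch of the backup pattern described above: a sequential stream is
# carved into fixed-size objects, much as a backup product writes tape
# blocks. Chunk size, bucket, and object naming are assumptions.
import boto3

CHUNK = 64 * 1024 * 1024  # 64 MB per object (illustrative)
s3 = boto3.client("s3")

def stream_to_objects(stream, bucket, prefix):
    """Read a file-like stream sequentially, storing numbered chunks."""
    index = 0
    while True:
        data = stream.read(CHUNK)
        if not data:
            break
        # Sequential, large-payload PUTs: per-request latency matters
        # little because throughput dominates.
        s3.put_object(Bucket=bucket,
                      Key=f"{prefix}/chunk-{index:08d}", Body=data)
        index += 1

with open("/dev/sda1", "rb") as disk:  # hypothetical source volume
    stream_to_objects(disk, "backup-bucket", "host01/2016-04-01")
```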
Leveraging cloud storage for more than backup
There are techniques that can be used to make cloud storage work as a backend for traditional applications. They include:
Gateways – The term gateway can be controversial, so perhaps we should say “cloud storage on-ramp.” Whatever you call them, products such as Nasuni Cloud NAS, Avere vFXT or Microsoft StorSimple allow data to be stored in traditional formats through an on-premises appliance that handles the protocol translation (file and iSCSI block) into object data. These products are more than gateways, though, handling multi-tenancy, data integrity (file locking, encryption) and performance (local caching), and optimizing the way data is moved to and from the cloud.
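Stripped to its essentials, the core of such an appliance is a local cache plus protocol translation. The toy class below is a rough sketch of that idea in Python with boto3; the class name, cache path and write-through policy are all illustrative assumptions, and real products layer locking, encryption and deduplication on top.

```python
# Toy gateway sketch: present a local file-style interface, cache data
# on local disk, and translate reads/writes into object operations.
import os
import boto3

class TinyGateway:
    def __init__(self, bucket, cache_dir="/var/cache/gw"):
        self.bucket = bucket
        self.cache_dir = cache_dir
        self.s3 = boto3.client("s3")
        os.makedirs(cache_dir, exist_ok=True)

    def _local(self, path):
        # Flatten the path into a single cache file name (sketch only).
        return os.path.join(self.cache_dir, path.replace("/", "_"))

    def write(self, path, data):
        # Write-through: satisfy the local side first, then persist
        # the data as an object in the cloud.
        with open(self._local(path), "wb") as f:
            f.write(data)
        self.s3.put_object(Bucket=self.bucket, Key=path, Body=data)

    def read(self, path):
        # Serve from the local cache when possible to hide cloud latency.
        local = self._local(path)
        if os.path.exists(local):
            with open(local, "rb") as f:
                return f.read()
        data = self.s3.get_object(Bucket=self.bucket,
                                  Key=path)["Body"].read()
        with open(local, "wb") as f:
            f.write(data)  # populate the cache for the next read
        return data
```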
Modify your application – Although it’s more complicated, applications can be modified to make use of cloud storage. A good example is archiving, where the data is still a primary copy but doesn’t have the performance requirements of active data. Solutions already exist today to archive the most obvious data, such as files and email.
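As a sketch of that archiving pattern, the Python below walks a directory tree, moves files that haven’t been accessed for a set period into an object store, and leaves a stub behind; the age threshold, bucket name and stub format are all assumptions for illustration.

```python
# Sketch: archive cold files to an object store, leaving a stub that
# points at the archived copy. Threshold and bucket are assumed.
import os
import time
import boto3

AGE_LIMIT = 180 * 24 * 3600   # archive after ~6 months idle (assumed)
BUCKET = "archive-bucket"     # hypothetical bucket name
s3 = boto3.client("s3")

def archive_cold_files(root):
    now = time.time()
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if now - os.path.getatime(path) < AGE_LIMIT:
                continue  # still active data; leave it in place
            key = os.path.relpath(path, root)
            with open(path, "rb") as f:
                s3.put_object(Bucket=BUCKET, Key=key, Body=f)
            # Replace the file with a pointer to its archived copy.
            with open(path + ".archived", "w") as stub:
                stub.write(f"s3://{BUCKET}/{key}\n")
            os.remove(path)
```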
Move your application – Where possible, moving the application itself to the cloud, either completely or through “cloud bursting,” opens up the provider’s local storage services. These include the obvious unstructured and block-based offerings like AWS Elastic Block Store, as well as structured products – SQL and NoSQL databases and data warehousing services – that can eliminate much of the work involved in administering the database and the underlying storage.
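Once the application runs in the cloud, block storage is provisioned next to it rather than bridged across a WAN. Here is a hedged sketch using boto3’s EC2 client; the instance ID, availability zone and device name are placeholders.

```python
# Sketch: provision cloud block storage alongside a cloud-hosted
# application, keeping the WAN out of the I/O path entirely.
import boto3

ec2 = boto3.client("ec2")

# Create a 100 GB EBS volume in the same zone as the instance...
vol = ec2.create_volume(AvailabilityZone="us-east-1a",  # assumed zone
                        Size=100, VolumeType="gp2")
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

# ...and attach it to the instance as a local block device.
ec2.attach_volume(VolumeId=vol["VolumeId"],
                  InstanceId="i-0123456789abcdef0",  # hypothetical
                  Device="/dev/sdf")
```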
As we can see, cloud storage can mean more than backup, but effort is required to make consumption easier and friction-free. This is another example of how the storage administrator’s role is shifting toward looking after the data itself rather than the physical hardware.