When it comes to archiving, IT professionals think of dusty old bits of data that are no longer accessed but still must be stored "just in case" they are needed again in the future. This type of archive can be safely stored on high-capacity disk or even tape, since response time is not an issue. However, there is a new form of archive emerging where response time to that "just in case" event is critical and can be fulfilled only by a flash array.
This real-time archive is similar to a traditional archive in many ways. Most of the data it contains is not being accessed and has not been for months or years. It, too, is being stored just in case someone needs it. But the big difference is that, when that data is needed, it must be made available instantly -- essentially in real time.
Real-time archive use cases
The primary use case for a real-time archive is personalization of the user application experience, commonly demonstrated by Web 2.0 companies. These applications respond in real time as users interact with them, creating a personalized experience. While there are countless examples, a common one is the social networking application that suggests connections and presents advertisements as it learns about you.
But these customizations are advancing; devices now can suggest restaurants and other establishments that are around you. They can assemble photo albums from your own photos, based on the time of year or even your current mood, and offer them to you for purchase. Other organizations are using facial recognition software on what used to be surveillance data (now marketing data) to determine your buying patterns.
All these use cases -- and many, many more to come -- require that data be instantly available and rapidly searchable, but stored cost-effectively for a long period. The archive essentially changes from data that is never accessed to data that is constantly accessed but never changed.
The storage challenge
The challenge with this real-time archive is determining the type of storage on which you place it. It can't go on the traditional disk or tape archive -- access and search are too slow to respond to the instant personalization requirement. Even production-quality, high-speed disk may be too slow. Most of the Web 2.0 companies implementing these applications today are putting that data on PCIe-based flash inside servers or on all-flash arrays.
Of course, the problem with using these types of systems is one of economics. Most of the organizations at the forefront of real-time archiving can justify the massive cost by delivering increased customer retention or better advertising click-through rates. But as the concept of personalization works its way into more traditional, non-Web 2.0 companies, the cost justification needs to be easier to meet.
The TLC all-flash array
The way to fully enable a real-time archive is to bring storage costs more in line with the type of data. Triple-level cell (TLC) NAND can hit that price point and still deliver the performance needed. TLC stores three bits per cell, instead of the two bits per cell of multi-level cell (MLC) or the one bit per cell of single-level cell (SLC).
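To make the bits-per-cell comparison concrete, here is a minimal Python sketch. The bits-per-cell values come from the paragraph above; the function names are illustrative, and the capacity comparison assumes the same number of cells in every case, which ignores real-world differences such as over-provisioning and error-correction overhead.

```python
# Bits stored per cell for each NAND type, as described in the text.
BITS_PER_CELL = {"SLC": 1, "MLC": 2, "TLC": 3}


def charge_states(bits: int) -> int:
    """Distinct charge levels a cell must distinguish to hold `bits` bits."""
    return 2 ** bits


def relative_capacity(nand: str, baseline: str = "SLC") -> float:
    """Capacity of `nand` relative to `baseline` for the same number of cells."""
    return BITS_PER_CELL[nand] / BITS_PER_CELL[baseline]


if __name__ == "__main__":
    for name, bits in BITS_PER_CELL.items():
        print(
            f"{name}: {bits} bit(s) per cell, "
            f"{charge_states(bits)} charge states, "
            f"{relative_capacity(name):.1f}x SLC capacity"
        )
```

The same physical cells hold three times as much data in TLC as in SLC, which is where the cost advantage comes from. The trade-off is that each cell has to distinguish eight charge states instead of two.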
The concern over TLC in the enterprise is that it would wear out too fast, because a cell tolerates fewer write cycles as more bits are packed into it. But a TLC all-flash array archive would essentially be a write-once (or write-sparingly), read-heavy environment -- ideal for TLC NAND.
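A rough way to see why write volume matters so much is a back-of-the-envelope endurance estimate. The Python sketch below uses a common simplification: total writable data is capacity multiplied by rated program/erase cycles, divided by write amplification. Every number in it is a hypothetical placeholder chosen for illustration, not a vendor specification, and the function name and parameters are assumptions made for this example.

```python
def estimated_lifetime_years(
    capacity_tb: float,
    pe_cycles: int,
    writes_tb_per_day: float,
    write_amplification: float = 1.0,
) -> float:
    """Rough wear-out estimate in years.

    Simplified model: total writable data = capacity * P/E cycles,
    reduced by write amplification. Ignores over-provisioning,
    retention effects, and controller-specific behavior.
    """
    if writes_tb_per_day <= 0:
        raise ValueError("writes_tb_per_day must be positive")
    total_writable_tb = capacity_tb * pe_cycles / write_amplification
    return total_writable_tb / (writes_tb_per_day * 365)


if __name__ == "__main__":
    # Hypothetical array: 100 TB with 1,000 rated P/E cycles (placeholder values).
    heavy = estimated_lifetime_years(100, 1_000, writes_tb_per_day=100)
    archive = estimated_lifetime_years(100, 1_000, writes_tb_per_day=1)
    print(f"Write-heavy workload: about {heavy:.1f} years")
    print(f"Archive-style workload: about {archive:.0f} years")
```

In this model, lifetime scales inversely with daily write volume, so an archive that takes in one-hundredth of the writes lasts 100 times as long on the same media. Reads do not figure in the estimate at all.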
There was a time when using MLC-based NAND in the enterprise was thought to be too risky, but now it's commonplace. TLC carries the same kind of risk and could wear out quickly if written too often, but in a read-mostly environment, it could easily fulfill the mission at hand. That fit, combined with continued advancements in flash controller technology, will make a TLC flash archive a reality within the next 12 months.