Unstructured data continues to grow almost unabated. Applying policies and classifications to this data has proven to be a daunting task. For the most part, going back after the fact and identifying what data should go where, and how long and how securely it should be retained, isn't happening. The job is too big in today's era of stretched-too-thin IT. How then can some semblance of order be applied? What if we apply data management interactively?
Interactive data management means applying policies and descriptive information to data as you save it, without having to use a specialized document management application. For example, as we discuss in our white paper "Using Cloud Archive", most archive systems have an API set or use an open access protocol like WebDAV. What if, during the move to the cloud archive, you were able to set policies specific to that data set as part of that movement?
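To make that idea a little more concrete, here is a minimal sketch of how policy values could ride along with a WebDAV upload as custom properties in a PROPPATCH request body. The `urn:example:archive-policy` namespace and the property names are hypothetical, and whether a given archive stores or acts on such properties is an assumption; the sketch only builds the XML body and sends nothing.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"                            # standard WebDAV namespace
POLICY_NS = "urn:example:archive-policy"   # hypothetical namespace, for illustration only


def build_proppatch_body(policy):
    """Build a WebDAV PROPPATCH body that sets one custom property per policy item."""
    ET.register_namespace("D", DAV_NS)
    ET.register_namespace("P", POLICY_NS)
    root = ET.Element(f"{{{DAV_NS}}}propertyupdate")
    set_el = ET.SubElement(root, f"{{{DAV_NS}}}set")
    prop_el = ET.SubElement(set_el, f"{{{DAV_NS}}}prop")
    for name, value in policy.items():
        # Property names here are assumptions, not part of any archive product.
        ET.SubElement(prop_el, f"{{{POLICY_NS}}}{name}").text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    body = build_proppatch_body({"retention-days": 2555, "copies": 3})
    print(body.decode("utf-8"))
```

A client would send this body in a PROPPATCH request against the resource it just uploaded, so the policy is set in the same step as the move to the archive.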
For example, when you archive a project, what if that archive step let you easily set parameters such as how long the project should be kept in the archive before being permanently deleted, how long it should remain in a read-only or WORM state, how many copies of the project should be kept, whether those copies should be dispersed geographically, whether the number of copies should shrink over time, and whether the data can be compressed and deduplicated or needs to be held in its original state? Finally, you may want to add some keywords to the project so that index and retrieval functions are more accurate.
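The parameters above could be captured in a small policy record that travels with the project. The following is a sketch only: the `ArchivePolicy` class, its field names and its defaults are all hypothetical and do not reflect any particular archive product's API.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ArchivePolicy:
    """Hypothetical per-project policy, set at the moment of archiving."""
    retention_days: int                       # days until permanent deletion
    worm_days: int = 0                        # days held in a read-only / WORM state
    copies: int = 2                           # number of copies kept initially
    geo_dispersed: bool = False               # spread copies across locations?
    reduce_after_days: Optional[int] = None   # age at which the copy count drops
    reduced_copies: Optional[int] = None      # copy count after that age
    preserve_original: bool = False           # True: no compression or deduplication
    keywords: List[str] = field(default_factory=list)

    def validate(self):
        if self.retention_days <= 0:
            raise ValueError("retention_days must be positive")
        if not 0 <= self.worm_days <= self.retention_days:
            raise ValueError("worm_days must fall within the retention period")
        if self.copies < 1:
            raise ValueError("at least one copy is required")
        if self.geo_dispersed and self.copies < 2:
            raise ValueError("geographic dispersal needs two or more copies")
        if (self.reduce_after_days is None) != (self.reduced_copies is None):
            raise ValueError("set both copy-reduction fields or neither")
        if self.reduced_copies is not None and not 1 <= self.reduced_copies <= self.copies:
            raise ValueError("reduced_copies must be between 1 and copies")

    def to_json(self):
        """Validate, then serialize so the policy can accompany the archive request."""
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    policy = ArchivePolicy(retention_days=2555, worm_days=365, copies=3,
                           geo_dispersed=True, keywords=["project-x", "final"])
    print(policy.to_json())
```

Validating at the point of entry matters here because the person setting the policy is the one who knows the answers; catching a contradiction then is far cheaper than discovering it years later.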
Setting these parameters interactively, when the data is fresh and the answers to the questions above are top of mind, can make the classification and implementation of a data policy more manageable. It also breaks the classification and retention tasks into smaller chunks instead of leaving them as massive undertakings. Massive undertakings seldom get done; small, easy tasks often do.
Ideally this could evolve into a user-driven function. For example, I may reference the document that holds this blog entry a month from now, but I doubt I will six months from now, and certainly not a year or two from now. Yet all my data is replicated to the cloud via a local agent. Cloud storage services like those available from Dropbox or Soonr have agents that move data to the cloud. If those agents could evolve to let me set some or all of the above parameters at the point of saving a document, I could make the management of my data much easier.
If self-classification could be extended into the data center, where you could classify the various parameters of files as they are stored, it would offload a huge amount of work from storage managers. This interactive data management would allow classification and retention policies to actually get set, and it would automate some of the life cycle associated with files.
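As a rough illustration of what that automation might look like, the sketch below takes the values a user set at save time and works out what should happen to a file on a given day. The function name, its parameters and the returned fields are hypothetical; a real archive would apply its own rules.

```python
from datetime import date, timedelta


def lifecycle_state(archived_on, today, retention_days, worm_days, copies,
                    reduce_after_days=None, reduced_copies=None):
    """Return the action a policy engine would take for one file on `today`."""
    age_days = (today - archived_on).days
    if age_days >= retention_days:
        # Retention period is over: the file is eligible for permanent deletion.
        return {"action": "delete", "read_only": False, "copies": 0}
    target_copies = copies
    if (reduce_after_days is not None and reduced_copies is not None
            and age_days >= reduce_after_days):
        # Older data can be held with fewer copies.
        target_copies = reduced_copies
    return {
        "action": "retain",
        "read_only": age_days < worm_days,   # still inside the WORM window?
        "copies": target_copies,
    }


if __name__ == "__main__":
    archived = date(2024, 1, 1)
    for offset in (30, 200, 400):
        state = lifecycle_state(archived, archived + timedelta(days=offset),
                                retention_days=365, worm_days=90, copies=3,
                                reduce_after_days=180, reduced_copies=1)
        print(offset, state)
```

Run nightly over the archive's catalog, a check like this would handle unlocking, copy reduction and deletion without a storage manager revisiting each file.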