Network Computing is part of the Informa Tech Division of Informa PLC


Self-Describing Data: Page 3 of 5

These data-related characteristics must be matched against a matrix of storage-platform costs and capabilities, which is derived from a thorough analysis of the storage infrastructure. Storage platforms differ in topological accessibility, RAID levels, replication schemes, security features and speeds, for instance, and virtually all platforms differ in cost as a function of their design, length of service and depreciation.

The objective of an efficient capacity-utilization system is to ensure that data is placed on the right platform when it's created, and is migrated to the most cost-effective platforms--those that meet its inherent requirements at the lowest possible cost--throughout its useful life. Once the data no longer needs to be retained, such a system deletes the data from all platforms automatically.
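The placement rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any product: the `Platform` and `DataRequirements` types, the capability labels, the platform names and the cost figures are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Platform:
    name: str
    cost_per_gb: float            # hypothetical cost figure, any consistent unit
    capabilities: FrozenSet[str]  # e.g. {"raid5", "replication"}


@dataclass(frozen=True)
class DataRequirements:
    required: FrozenSet[str]      # capabilities the data inherently needs
    retain_until: date            # last day the data must be kept


def choose_platform(
    req: DataRequirements, platforms: Sequence[Platform], today: date
) -> Optional[Platform]:
    """Return the cheapest platform that meets every requirement.

    Returns None once the retention period has passed, which signals
    that the data should be deleted from all platforms.
    """
    if today > req.retain_until:
        return None
    candidates = [p for p in platforms if req.required <= p.capabilities]
    if not candidates:
        raise LookupError("no platform satisfies the data's requirements")
    return min(candidates, key=lambda p: p.cost_per_gb)


# Hypothetical cost/capability matrix for three platforms.
PLATFORMS = [
    Platform("tier1-array", 5.00, frozenset({"raid10", "replication", "encryption"})),
    Platform("midrange-array", 2.00, frozenset({"raid5", "replication"})),
    Platform("archive-shelf", 0.25, frozenset({"raid5"})),
]
```

Calling `choose_platform` again whenever the data's requirements or the platform matrix change gives the migration behavior described above: the data moves whenever a cheaper platform still satisfies its requirements.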

The Data-Naming Game





Figure: How Self-Describing Data Makes for Efficient Capacity Utilization

A true ILM solution implements efficient capacity utilization. Such a solution must provide a mechanism for analyzing applications to discern the characteristics they impart to the data they generate. It must also provide a facility for creating an easy-to-use schema to store the categories of these characteristics, to be used subsequently to add a header or other data-naming "artifact" to data upon creation. And it must provide a mechanism for applying a self-describing header to data before the data is written to any storage platform.
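As a rough illustration of the header mechanism, the Python sketch below validates a header against a small schema and prepends it to the payload before anything is written to storage. The byte layout, the `SDD1` marker, the schema categories and the function names are assumptions made for this example only; the article does not prescribe a format.

```python
import json
import struct
from typing import Dict, Tuple

MAGIC = b"SDD1"  # hypothetical marker identifying a self-describing blob

# Hypothetical schema: each category and the values it may take.
SCHEMA = {
    "retention_years": {1, 3, 7},
    "access_class": {"hot", "warm", "cold"},
    "needs_replication": {True, False},
}


def validate(header: Dict[str, object]) -> None:
    """Reject headers that omit a schema category or use an unknown value."""
    for key, allowed in SCHEMA.items():
        if header.get(key) not in allowed:
            raise ValueError(f"{key!r} missing or outside schema: {header.get(key)!r}")


def wrap(payload: bytes, header: Dict[str, object]) -> bytes:
    """Prepend marker, 4-byte big-endian header length and JSON header."""
    validate(header)
    encoded = json.dumps(header, sort_keys=True).encode("utf-8")
    return MAGIC + struct.pack(">I", len(encoded)) + encoded + payload


def unwrap(blob: bytes) -> Tuple[Dict[str, object], bytes]:
    """Split a wrapped blob back into (header, payload)."""
    if blob[:4] != MAGIC:
        raise ValueError("blob has no self-describing header")
    (length,) = struct.unpack(">I", blob[4:8])
    header = json.loads(blob[8:8 + length].decode("utf-8"))
    return header, blob[8 + length:]
```

Because the header travels with the data, any component in the storage path can call `unwrap` and read the data's requirements without consulting the application that created it.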

An ILM system demands several other components and functions as well: a knowledge base of storage-platform costs and capabilities; a mechanism that collects and documents storage-infrastructure information and then arranges it into a set of class-of-service or cost-of-service descriptions reflecting the various combinations of storage-infrastructure components available to meet the requirements of different types of named data; and an access-frequency counter that runs in the infrastructure and checks stored data at regular intervals to determine how often it has been accessed.
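The access-frequency counter could work along the lines of the following Python sketch. The class names, the thresholds and the `AccessFrequencyCounter` interface are hypothetical, and the sketch ranks data by access count alone; a real system would also weigh the requirements carried in the data's header.

```python
from collections import Counter
from typing import Dict

# Hypothetical class-of-service table: (minimum accesses per interval, class),
# ordered from most to least demanding.
SERVICE_CLASSES = [(100, "class-1"), (10, "class-2"), (0, "class-3")]


class AccessFrequencyCounter:
    """Counts accesses per data item between periodic sweeps."""

    def __init__(self) -> None:
        self._hits: Counter = Counter()

    def record_access(self, data_id: str) -> None:
        self._hits[data_id] += 1

    def sweep(self) -> Dict[str, int]:
        """Return the counts for the interval just ended and reset them.

        Items that were never accessed in the interval are absent from
        the result; callers should treat them as zero.
        """
        counts = dict(self._hits)
        self._hits.clear()
        return counts


def service_class(accesses: int) -> str:
    """Map an access count for one interval to a class of service."""
    for minimum, name in SERVICE_CLASSES:
        if accesses >= minimum:
            return name
    raise ValueError("access count cannot be negative")
```

Running `sweep` on a fixed schedule and passing each count to `service_class` yields the class each item currently merits, which can then drive migration to a cheaper or faster platform.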