
IBM's Continuing Information Infrastructure Journey

GPFS is also a file management system, and IBM believes the technology distinguishes itself from other clustered file systems by providing concurrent, high-speed file access to applications executing across a wide range of server nodes. The key capabilities of GPFS include storage management, information life cycle management tools, centralized administration and shared access to file systems from remote GPFS clusters.
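That "concurrent, high-speed file access across many nodes" claim is easier to picture with a small sketch. The Python snippet below simulates the access pattern a clustered file system such as GPFS is built to coordinate: several writers, each standing in for a server node, write to their own byte ranges of one shared file. The mount point /gpfs/fs1 and the record layout are assumptions made for illustration; on a real GPFS cluster the same positional writes would be coordinated by the file system's distributed locking, while here ordinary local processes play the role of nodes.

    # Minimal sketch (not GPFS's API): concurrent byte-range writes to one
    # shared file, with local processes standing in for cluster nodes.
    import os
    from multiprocessing import Process

    SHARED_FILE = "/gpfs/fs1/shared.dat"   # assumed GPFS mount; any local path works for the demo
    RECORD_SIZE = 4096                      # each "node" owns one fixed-size record

    def node_writer(node_id: int) -> None:
        """Write this node's record at its own offset; ranges never overlap."""
        payload = f"record from node {node_id}".encode().ljust(RECORD_SIZE, b"\0")
        fd = os.open(SHARED_FILE, os.O_WRONLY)
        try:
            # Positional write: independent of what the other "nodes" are doing.
            os.pwrite(fd, payload, node_id * RECORD_SIZE)
        finally:
            os.close(fd)

    if __name__ == "__main__":
        num_nodes = 4
        # Pre-size the file so every writer has a slot.
        with open(SHARED_FILE, "wb") as f:
            f.truncate(num_nodes * RECORD_SIZE)

        workers = [Process(target=node_writer, args=(i,)) for i in range(num_nodes)]
        for w in workers:
            w.start()
        for w in workers:
            w.join()

        # Read back each node's record to confirm the writes landed.
        with open(SHARED_FILE, "rb") as f:
            for _ in range(num_nodes):
                print(f.read(RECORD_SIZE).rstrip(b"\0").decode())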

Note that while GPFS typically deals with "unstructured" (i.e., for the most part, file) data, according to an IBM white paper it also supports relational databases with structured data that underpin mission-critical online transaction processing (OLTP) systems. That is an important point, since such environments are block-based, not file-based. Although a number of OLTP applications will probably never submit to running under a file system, many more could fall under a file system's control. The industry has long recognized that point, but never really emphasized it.

IBM is using GPFS as a key building block in other technology offerings. For example, GPFS is at the heart of the company's SoFS (Scale out File Services), which can be used to deliver the scalability that has been lacking in past NAS deployments. SoFS also incorporates a number of other technologies, such as Tivoli Storage Manager, Tivoli Hierarchical Storage Manager and Samba.

But so what? Why revisit existing technologies? Well, most of the data explosion that companies are creating is file data. That data has to be managed effectively, and GPFS can serve as the foundation for doing so, for the following reasons:

    * A true global namespace has to be put in place to eliminate islands of data, which can cripple or even prevent user access to information, as well as the ability to integrate and analyze distributed pieces of information.
    * The technology has to scale to manage what was once considered an unimaginable amount of data -- a petabyte (PB). IBM pointed out that when operating at this scale, 20% of the resources, such as disks or interconnects, might well be unavailable at any one time. That is shocking to think about in what, by definition, must be a high availability (HA) world. But the unavailability of resources does not mean that the user should notice any impact. That comes about through transparent failover from failed components to working components, as well as the ability to clean up over time using self-healing capabilities.
    * High-speed scanning is necessary for backup, archiving, replication, and other functions.
    * Management requires handling metadata effectively as well as the data itself. Files have useful metadata; blocks have extremely limited metadata. Managing an information infrastructure optimally requires intelligence (i.e., software that can make decisions based upon policy), but that intelligence needs something from which to make its decisions, that is, metadata (see the sketch after this list).
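To make the last two points concrete, here is a minimal sketch of a metadata-driven policy scan: walk a file tree, read each file's metadata, and classify files against simple rules such as "archive large files that have not been touched in 90 days." The rules, the thresholds and the /data root are all invented for the example; GPFS expresses this kind of logic in its own policy language and runs the scan in parallel across cluster nodes rather than in a single Python loop.

    # Toy illustration (not GPFS's policy engine): decisions driven purely by
    # file metadata gathered during a scan of the tree.
    import os
    import time
    from dataclasses import dataclass

    DAY = 86400  # seconds

    @dataclass
    class PolicyHit:
        path: str
        action: str   # e.g. "archive" or "replicate"

    def scan(root: str) -> list:
        """Walk the tree once and return the files matching each policy rule."""
        hits = []
        now = time.time()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path)
                except OSError:
                    continue  # file vanished or unreadable; skip it
                # Rule 1 (assumed threshold): large files untouched for 90 days
                # become archive candidates.
                if st.st_size > 100 * 1024 * 1024 and now - st.st_atime > 90 * DAY:
                    hits.append(PolicyHit(path, "archive"))
                # Rule 2 (assumed threshold): anything modified in the last day
                # is flagged for replication/backup.
                elif now - st.st_mtime < 1 * DAY:
                    hits.append(PolicyHit(path, "replicate"))
        return hits

    if __name__ == "__main__":
        for hit in scan("/data"):   # "/data" is a placeholder root
            print(hit.action, hit.path)

The point of the sketch is simply that every decision comes from metadata (size, access time, modification time); with block storage that information is not available, which is why file-level management lends itself to policy-driven automation.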