Service providers and large enterprises have a goal of delivering "24x7 forever" availability, particularly for mission-critical services and applications. EMC wants to help customers meet that goal with the concept of continuous availability (CA), which marries high availability (HA) and disaster recovery (DR). The CA approach is built around EMC's VPLEX product, as well as a new service offering to perform assessments and analyze costs.
The first step in delivering "24x7 forever" availability is to provide enough extra server and storage capacity to create an HA system. The HA system is the first line of defense against problems that threaten "five nines" application availability. Different services and applications require different levels of redundancy. For an enterprise database application, servers are typically replicated 100% for redundancy. EMC estimates, though, that a Web farm needs only about 20% more servers, so 20% redundancy is enough.
The second step is to create a DR capability at a site geographically separate from the original data center. This typically requires 100% redundancy in both servers and storage. Note that the 100% figure applies to enterprise databases and Web farms alike, because a disaster takes out the whole site (anything less would not qualify as a disaster).
Notice that the redundancy required to fully protect availability adds up to an extra 200% in the case of databases (100% for HA plus 100% for DR) and an extra 120% in the case of Web farms (20% for HA plus 100% for DR).
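The arithmetic behind those totals can be sketched in a few lines of Python. This is only an illustration of the figures quoted above; the function name and its parameters are hypothetical and are not part of any EMC tool.

```python
def conventional_extra_capacity(ha_redundancy_pct: int, dr_redundancy_pct: int = 100) -> int:
    """Extra capacity, as a percentage of the original deployment, needed for
    single-site HA plus a separate DR site (illustrative helper, not an EMC formula)."""
    return ha_redundancy_pct + dr_redundancy_pct


# Figures from the article: 100% HA redundancy for an enterprise database,
# about 20% for a Web farm, and 100% for the DR site in both cases.
database_extra = conventional_extra_capacity(ha_redundancy_pct=100)
web_farm_extra = conventional_extra_capacity(ha_redundancy_pct=20)

print(f"database: {database_extra}% extra capacity")  # database: 200% extra capacity
print(f"Web farm: {web_farm_extra}% extra capacity")  # Web farm: 120% extra capacity
```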
An Alternative to the Conventional Architecture
With its new Continuous Availability Advisory Services offering, EMC proposes an alternative to traditional scenarios: a merger of traditional single-site HA with dual-site DR to create a continuous availability system. In a full CA architecture, transactions from the same application are processed in each of the two sites simultaneously. This is done by using global load balancing to distribute transactions to each site. Web and application farms are stretched between the sites, creating active-active applications.
At the data layer, for example, a local Oracle RAC cluster can be stretched between the sites to provide a locking mechanism over the databases. The storage layer is then connected via EMC's VPLEX, which provides a data coherency mechanism that keeps the data in sync between the storage arrays deployed at the two sites.
The final piece of the architecture is the use of active-active data center infrastructure components, such as a shared name space and common IP addressing, which are deployed so that applications can run seamlessly in either site. Probably the most interesting thing about EMC's approach is that the company claims the architecture can be provisioned with off-the-shelf components and most applications can be adapted without code changes.
Where an application does not fit neatly into the mold of an active-active architecture over Oracle RAC, a near-CA architecture can be deployed in which application and database clusters run normally in one site and fail over to the other. In this near-CA architecture, the storage layer still uses VPLEX, and the applications and databases are set up in a two-site HA mode. The new paradigm that EMC is rolling out can combine CA and two-site HA modes in many different ways at the Web (presentation), application, data and storage layers to provide a level of resiliency above what was previously achievable.
In this architecture, EMC argues, each of the two sites needs only about 60% of the original performance capability, for a total of 120%, or 20% redundancy. What magic does the company use to achieve this? EMC employs an approach called "fractional provisioning" of the server count. Under normal circumstances, 100% of the original capacity is enough by definition, yet day-to-day CPU utilization in most cases averages somewhere in the 50% to 70% range. The remaining capacity (above the 50% to 70% mark) is headroom, used during peak hours or periods of heavy business activity. So, the logic goes, provision each site for the average load, roughly 60% of the original capacity, for a total of 120%.
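As a rough illustration of that logic, the sketch below works through the fractional provisioning arithmetic in Python. It assumes each site is sized at the average utilization figure and that there are two sites, as described above; the function and key names are hypothetical, not taken from EMC's service offering.

```python
def fractional_provisioning(avg_utilization_pct: int, sites: int = 2) -> dict:
    """Size each site at the average utilization of the original deployment
    (illustrative sketch of the reasoning in the article, not an EMC formula)."""
    per_site_pct = avg_utilization_pct   # each site carries the average load
    total_pct = per_site_pct * sites     # combined capacity across all sites
    return {
        "per_site_pct": per_site_pct,
        "total_pct": total_pct,
        "redundancy_pct": total_pct - 100,  # capacity beyond the original 100%
    }


# The article's example: about 60% average utilization, two sites.
plan = fractional_provisioning(avg_utilization_pct=60)
print(plan)  # {'per_site_pct': 60, 'total_pct': 120, 'redundancy_pct': 20}
```

Running the same function with 50 or 70, the ends of the utilization range quoted above, gives totals of 100% and 140% respectively, which shows how sensitive the result is to the average utilization assumed.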