
How Big Data Can Improve Data Center Management

We in IT can sometimes be slow to recognize our own power. Take cloud computing, for example. At first, we focused solely on aiding the business, automating its activities so it could increase output or make its product development processes more consistent. Only later did we figure out it was time to automate our own internal processes and bring that same consistency to our provisioning operations.

This same insight needs to happen with big data. Many organizations have been looking into big data analytics to uncover unknown correlations, hidden patterns, market trends, customer preferences, and other useful business information, and many of you have deployed big data systems such as Hadoop clusters. Ironically, these systems often strain our own data center services, forcing us to uncover hidden patterns of our own and understand the correlations between new workloads and the resources they consume.

The problem is that virtual data centers are composed of a disparate stack of components. Every host, switch, and storage system logs and presents data in whatever way its vendor deems fit. Varying granularity, time frames, and output formats make it extremely difficult to correlate the data. Even more problematic, many vendors' metrics date back to a time before x86 virtualization existed. All of this makes it extremely hard to understand the dynamics of the virtual data center and to distinguish cause and effect (causation) from mere relationship (correlation).
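To correlate anything at all, the raw feeds first have to be pulled onto one common schema. Here is a minimal sketch of that idea in Python; the three vendor samples, their field names, units, and timestamp formats are all invented for illustration, not any real vendor's output:

```python
import pandas as pd

# Hypothetical raw samples: each vendor reports utilization with its own
# field names, units, and timestamp formats.
host_sample   = {"ts": "2014-06-01T12:00:00Z", "cpu_pct": 72.5}         # host agent: ISO time, percent
switch_sample = {"time": 1401624000, "util": 0.61}                      # switch: epoch seconds, ratio
array_sample  = {"timestamp": "06/01/2014 12:00", "busy_percent": "55"} # storage array: US date, string

def normalize(source, ts, utilization_pct):
    """Map any vendor sample onto one schema: UTC timestamp + percent."""
    return {
        "source": source,
        "ts": pd.to_datetime(ts, utc=True),
        "util_pct": float(utilization_pct),
    }

records = [
    normalize("host-01", host_sample["ts"], host_sample["cpu_pct"]),
    normalize("switch-01", pd.to_datetime(switch_sample["time"], unit="s", utc=True),
              switch_sample["util"] * 100),
    normalize("array-01", pd.to_datetime(array_sample["timestamp"], format="%m/%d/%Y %H:%M"),
              array_sample["busy_percent"]),
]

# Once everything shares one schema, resampling onto a common time grid
# makes the series from different devices directly comparable.
df = pd.DataFrame(records)
unified = df.set_index("ts").groupby("source")["util_pct"].resample("5min").mean()
print(unified)
```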

The interesting thing about the hypervisor is that it's a context-rich information system, teeming with data ready to be crunched and analyzed into a holistic picture of the various resource consumers and providers. By extracting and processing all this data, you can understand current workload patterns. And with a gigantic dataset all in the same language, structure, and format, you can start to uncover unknown correlations and hidden patterns.
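As a hedged illustration of what "uncovering correlations" can look like in practice, the sketch below builds a small synthetic metrics table (all column names and numbers are invented) and lets a plain correlation matrix surface which series move together:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical 5-minute samples, already normalized into one schema:
# one row per interval, one column per metric.
n = 288  # one day of 5-minute samples
iops = rng.normal(5000, 800, n)
metrics = pd.DataFrame({
    "vm_iops": iops,
    "datastore_latency_ms": 2 + iops * 0.002 + rng.normal(0, 0.5, n),  # driven by IOPS
    "host_cpu_pct": rng.normal(55, 10, n),                             # independent noise
})

# A correlation matrix is often enough to surface candidate relationships
# worth a closer look; note it establishes correlation, not causation.
print(metrics.corr().round(2))
```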

With copious amounts of data, the only limitation is your imagination. Each time you increase your knowledge of the systems, you can mine for new data, making relationships visible and distinguishing cause from effect. This in turn feeds other data center management processes, such as operations and design.

With this information at your fingertips, you can optimize current workloads and identify which systems are best suited to host a new group of workloads. Operations will change as well, because you can now establish a fingerprint of your system. Instead of micromanaging each individual host or virtual machine, you can monitor the fingerprint of your cluster.
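The article doesn't define "fingerprint" concretely; one plausible reading is a compact statistical profile of the cluster's resource usage over a monitoring window. A minimal sketch under that assumption, with illustrative metric names and values:

```python
import pandas as pd

def fingerprint(window: pd.DataFrame) -> pd.Series:
    """One possible cluster fingerprint: summary statistics of the
    cluster-wide metrics over a monitoring window."""
    return window.agg(["mean", "std", "max"]).stack()

# Hypothetical per-interval, cluster-level metrics.
window = pd.DataFrame({
    "cpu_pct": [40, 45, 70, 65, 50],
    "mem_pct": [60, 62, 80, 78, 64],
    "iops_k":  [3.1, 3.4, 6.0, 5.8, 3.7],
})

baseline = fingerprint(window)
print(baseline)
# Later windows get compared against this baseline to spot drift,
# instead of inspecting every host and VM individually.
```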

For example, you can analyze how incoming workloads have changed the cluster's fingerprint over time. With this data, you can do trend analysis: Do you have seasonal workloads? How fast is workload growing? Trend resource usage and compare cluster and host fingerprints to understand exactly when scale-out is required. Information like this lets you manage your data center in a different manner and helps you design it far more accurately.
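Here is a hedged sketch of that kind of trend and seasonality analysis on synthetic daily demand data; the growth rate, weekly swing, and noise are all invented for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical daily cluster CPU demand over six months: slow growth
# plus a weekly cycle (business days busier than weekends).
days = pd.date_range("2014-01-01", periods=180, freq="D")
growth = np.linspace(40, 60, len(days))            # long-term trend
weekly = np.where(days.dayofweek < 5, 8.0, -8.0)   # seasonal swing
demand = pd.Series(growth + weekly + rng.normal(0, 2, len(days)), index=days)

# A centered 7-day rolling mean smooths out the weekly cycle, isolating trend;
# the residual, averaged per weekday, exposes the seasonal pattern.
smoothed = demand.rolling(window=7, center=True).mean()
trend = smoothed.dropna()
seasonal = (demand - smoothed).groupby(demand.index.dayofweek).mean()

print(f"Demand trend: {trend.iloc[0]:.1f}% -> {trend.iloc[-1]:.1f}% over the window")
print("Average seasonal offset by weekday (0=Mon):")
print(seasonal.round(1))
# When the extrapolated trend line approaches the cluster's capacity
# threshold, that is the data-driven trigger for scaling out.
```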

The beauty of having this dataset all in the same language, structure, and format is that you can now start to transcend the individual data center. Each data center's dataset is extremely valuable for managing the IT lifecycle: improving deployment and operations, and optimizing existing workloads and infrastructure for a better future design. But why stop there? The combined datasets of many virtual data centers can improve the IT lifecycle even more.

By comparing same-size data centers in the same vertical, you can start to understand the TCO of running the same VM on a particular host system (Cisco vs. Dell vs. HP) or which storage system to use. At some point, you may even be able to compare the TCO of running a virtual machine in the private data center versus a cloud offering. That type of information is what today's data center management needs. It's time to take the next step and leverage big data analytics to improve the IT lifecycle of your virtual data center.
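To make that concrete, here is a back-of-the-envelope sketch of a TCO-per-VM comparison; every figure is a made-up placeholder, not real pricing from any vendor or cloud provider:

```python
# Hypothetical TCO-per-VM comparison; all figures are illustrative
# placeholders, not real vendor or cloud pricing.
platforms = {
    #                (capex $, opex $/year, VMs hosted, lifespan years)
    "vendor-A host": (20_000, 6_000, 20, 4),
    "vendor-B host": (15_000, 7_000, 15, 4),
    "public cloud":  (0, 80 * 12, 1, 4),   # per-VM: ~$80/month, no capex
}

for name, (capex, opex_per_year, vms, years) in platforms.items():
    # Amortize capex over the hardware lifespan, then split across VMs.
    tco_per_vm_year = (capex / years + opex_per_year) / vms
    print(f"{name:>14}: ${tco_per_vm_year:,.0f} per VM per year")
```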