Now that many enterprises have started their cloud migration journeys, most are adopting hybrid cloud architectures that deploy apps across a mix of on-premises and public cloud infrastructure. This combined use of on-prem and cloud resources serves as a transitional but comfortable middle ground, maximizing cost savings and productivity while addressing privacy and security concerns.
While these are smart moves for implementing digital transformation, they introduce new operational domains that require a combination of monitoring techniques to create a modern, full-stack, hybrid cloud monitoring capability.
Regardless of how an enterprise chooses to architect its hybrid cloud deployment, two new operational domains are introduced. The first is a public data center whose infrastructure and architecture are no longer controlled by the enterprise. The second is a complex matrix of interservice communication that crosses multiple networks between the distributed application components and data centers. Instability or an outage in any one part can cascade and degrade the end-user experience.
The on-premises data center is the most familiar part of a hybrid cloud. There, the enterprise owns everything inside, from the applications to the infrastructure and networking. A combination of monitoring techniques works well in this environment. But outside the walls of the data center, where enterprises don’t own the infrastructure, traditional monitoring techniques hit their limits.
Inside the data center, that combination typically includes:
- Synthetic monitoring, which measures application availability and performance metrics such as page load and response time.
- APM (application performance management), which reflects the end-user experience through code injection and agent-based data collection.
- Infrastructure monitoring, which ranges from capturing health metrics through SNMP polling to using Unix-based utilities such as collectd to read performance data from networking equipment.
- Packet captures and flow records, which show the composition of traffic entering and leaving the data center.
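To make the SNMP polling item above more concrete: link utilization is usually not read directly. Instead, it is derived from two successive readings of an IF-MIB octet counter such as `ifHCInOctets`. The sketch below assumes a poller has already collected the two readings and the interface speed. The function name and sample values are hypothetical.

```python
# Hypothetical helper: derive link utilization from two SNMP polls of an
# IF-MIB octet counter (e.g. ifHCInOctets). Collecting the raw values is
# assumed to happen elsewhere (any SNMP library or the net-snmp CLI).

COUNTER64_MAX = 2 ** 64  # ifHCInOctets/ifHCOutOctets are 64-bit counters
COUNTER32_MAX = 2 ** 32  # ifInOctets/ifOutOctets are 32-bit counters


def utilization_pct(prev_octets: int, curr_octets: int, interval_s: float,
                    speed_bps: int, counter_max: int = COUNTER64_MAX) -> float:
    """Return percent utilization of a link between two counter readings.

    A negative delta is treated as a single counter wrap. A device reboot
    also resets the counter, and this function cannot tell the two apart.
    """
    if interval_s <= 0 or speed_bps <= 0:
        raise ValueError("interval_s and speed_bps must be positive")
    delta = curr_octets - prev_octets
    if delta < 0:
        delta += counter_max  # counter rolled over between polls
    return 100.0 * (delta * 8) / (interval_s * speed_bps)


if __name__ == "__main__":
    # Hypothetical readings taken 60 s apart on a 1 Gb/s link
    # (ifHighSpeed reports 1000, in units of 1,000,000 bit/s).
    speed = 1000 * 1_000_000
    print(f"{utilization_pct(1_000_000_000, 1_750_000_000, 60, speed):.1f}%")
```

With these sample readings the script prints `10.0%`. A production poller would also check `sysUpTime` or `ifCounterDiscontinuityTime` so that it does not mistake a device reboot for a counter wrap.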
Monitoring the cloud
In a public cloud environment, you might own the application hosted with your IaaS provider, but you have no control over the infrastructure or networking scheme. Virtual host-based packet capture probes such as ntop, or virtual taps, are options, but they add significant overhead. VPC flow logs and CloudWatch-like services become critical for understanding performance metrics from your VPC instances, and in most cases they must be integrated with modern analytics platforms such as Datadog or Splunk.
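The following sketch shows what VPC flow data looks like and how it can be used. It parses records in the default AWS VPC Flow Logs format (version 2, 14 space-separated fields) and totals bytes and rejected flows for each source/destination pair. It assumes the default record format, not a custom one. The function name and sample records are hypothetical. In practice, this kind of aggregation usually runs inside the analytics platform.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Field order of the default (version 2) AWS VPC Flow Logs record format.
FIELDS = ("version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status")


def summarize_flows(lines: Iterable[str]) -> Dict[Tuple[str, str], Dict[str, int]]:
    """Total bytes and REJECT counts per (srcaddr, dstaddr) pair."""
    summary: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(
        lambda: {"bytes": 0, "rejected": 0})
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # not a default-format record
        rec = dict(zip(FIELDS, parts))
        if rec["log_status"] != "OK":
            continue  # header line, or NODATA/SKIPDATA records with "-" fields
        key = (rec["srcaddr"], rec["dstaddr"])
        summary[key]["bytes"] += int(rec["bytes"])
        if rec["action"] == "REJECT":
            summary[key]["rejected"] += 1
    return dict(summary)


if __name__ == "__main__":
    sample = [  # hypothetical records
        "version account-id interface-id srcaddr dstaddr srcport dstport "
        "protocol packets bytes start end action log-status",
        "2 123456789012 eni-0abc 10.0.1.5 10.0.2.7 44321 443 6 10 5200 1700000000 1700000060 ACCEPT OK",
        "2 123456789012 eni-0abc 10.0.1.5 10.0.2.7 44322 443 6 4 1800 1700000060 1700000120 REJECT OK",
        "2 123456789012 eni-0abc - - - - - - - 1700000120 1700000180 - NODATA",
    ]
    for pair, stats in summarize_flows(sample).items():
        print(pair, stats)
```

The header line and the `NODATA` record are skipped because their status field is not `OK`. As a result, the sample collapses to a single pair with 7,000 bytes and one rejected flow.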
Most enterprises going hybrid rely on the Internet, which is made up of multiple ISPs, as the underlying connectivity between their on-premises data centers and the public cloud. When an outage occurs along that path, enterprises need to isolate which provider’s network is causing the problem and gather enough evidence to get that provider to respond and fix it.
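One way to start isolating the provider is to map each hop of the forward path, for example from traceroute-style probing, to the autonomous system (AS) that announces its address. You then look for where packet loss begins and persists through to the destination. The sketch below assumes the hop list, the ASN mapping, and per-hop loss have already been measured. The `Hop` type, the function name, and the 5% threshold are hypothetical choices. It ignores loss that appears at an intermediate hop but not downstream, because routers often rate-limit or deprioritize responses to probes addressed to themselves.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    ip: str
    asn: Optional[int]  # AS announcing this hop's address; None if unmapped
    loss_pct: float     # probe loss observed at this hop, in percent


def first_lossy_asn(hops: List[Hop], threshold: float = 5.0) -> Optional[int]:
    """Return the ASN where loss starts and persists to the destination.

    Returns None if the destination itself shows no significant loss, or if
    no hop in the lossy run could be mapped to an ASN.
    """
    if not hops or hops[-1].loss_pct < threshold:
        return None
    # Walk back from the destination while hops keep showing loss.
    start = len(hops) - 1
    while start > 0 and hops[start - 1].loss_pct >= threshold:
        start -= 1
    for hop in hops[start:]:
        if hop.asn is not None:
            return hop.asn
    return None


if __name__ == "__main__":
    path = [  # hypothetical measurements (documentation-range ASNs/addresses)
        Hop("10.0.0.1", None, 0.0),        # enterprise edge, private address
        Hop("203.0.113.1", 64500, 0.0),    # first ISP
        Hop("198.51.100.1", 64501, 12.0),  # transit ISP: loss starts here...
        Hop("198.51.100.9", 64501, 15.0),
        Hop("192.0.2.10", 64502, 14.0),    # ...and persists to the cloud endpoint
    ]
    print(first_lossy_asn(path))
```

Here the script prints `64501`. Loss that begins at an AS boundary can also originate on the interconnect between two providers. Treat the result as the starting point for the evidence an ISP will ask for, not as a verdict.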
Without this level of visibility, the API calls that must execute flawlessly across platforms and between microservices are exposed to a high degree of systemic risk, with no effective way to detect and remediate problems. Fortunately, a combination of active monitoring techniques can provide insight at the application, network path, and BGP routing layers.
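At the BGP routing layer, one simple form of insight is to notice when the AS path a vantage point sees for one of your prefixes changes between two snapshots. A change in the origin AS (the last AS in the path) deserves particular attention, because it can indicate a misconfiguration or a hijack. The sketch assumes per-prefix AS paths have already been collected from a BGP feed or looking glass. The function name, data shapes, and sample prefixes are hypothetical, and the sketch ignores details such as AS_SET segments.

```python
from typing import Dict, List, Optional, Tuple

AsPath = Tuple[int, ...]


def diff_routes(before: Dict[str, AsPath],
                after: Dict[str, AsPath]) -> List[Tuple[str, str]]:
    """Report per-prefix routing changes between two snapshots of one vantage point."""
    events: List[Tuple[str, str]] = []
    for prefix in sorted(before.keys() | after.keys()):
        old: Optional[AsPath] = before.get(prefix)
        new: Optional[AsPath] = after.get(prefix)
        if old == new:
            continue
        if new is None:
            events.append((prefix, "withdrawn"))
        elif old is None:
            events.append((prefix, "announced"))
        elif old and new and old[-1] != new[-1]:
            # The origin AS is the rightmost AS in the path.
            events.append((prefix, f"origin changed AS{old[-1]} -> AS{new[-1]}"))
        else:
            events.append((prefix, "path changed to " + " ".join(map(str, new))))
    return events


if __name__ == "__main__":
    before = {  # hypothetical snapshots
        "192.0.2.0/24": (64500, 64510),
        "198.51.100.0/24": (64500, 64502),
        "203.0.113.0/24": (64500, 64501),
    }
    after = {
        "192.0.2.0/24": (64500, 64511),
        "198.51.100.0/24": (64500, 64504, 64502),
    }
    for prefix, event in diff_routes(before, after):
        print(prefix, event)
```

The sample reports three events: an origin change for `192.0.2.0/24`, a path change for `198.51.100.0/24`, and a withdrawal of `203.0.113.0/24`. In practice, comparing the same prefixes from many vantage points shows whether a change is local to one provider or widespread.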
That brings us to the question: what does a full-stack, hybrid cloud monitoring capability look like? In a hybrid cloud environment, full-stack monitoring should no longer be simply a vertical approach that looks at network, server, storage, and application code in silos. That view is still valid, but enterprises also need a horizontal lens that spans different types of data centers, the connectivity between them, and the many interservice communication threads running across that connectivity.
Enterprises should consider a combination of techniques and data sets to build a comprehensive view of digital service delivery across all three operational domains. Ultimately, all that data should come together in one or more big-data-based platforms that lend themselves to automation and algorithmic analysis, making operations teams smarter and faster.
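As a small illustration of bringing that data together, the sketch below merges events from the three operational domains into one time-ordered stream. It groups events that occur close together and flags groups that span more than one domain. The `Event` fields, domain labels, and five-minute window are hypothetical. A real platform would correlate on far richer context than timestamps alone.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    ts: float     # Unix timestamp, in seconds
    domain: str   # hypothetical labels: "on-prem", "cloud", "internet"
    source: str   # e.g. "apm", "vpc-flow-logs", "bgp"
    message: str


def correlate(events: List[Event], window_s: float = 300.0) -> List[List[Event]]:
    """Group time-ordered events; a gap larger than window_s starts a new group."""
    groups: List[List[Event]] = []
    for ev in sorted(events, key=lambda e: e.ts):
        if groups and ev.ts - groups[-1][-1].ts <= window_s:
            groups[-1].append(ev)
        else:
            groups.append([ev])
    return groups


def cross_domain(groups: List[List[Event]]) -> List[List[Event]]:
    """Keep only groups whose events come from more than one operational domain."""
    return [g for g in groups if len({e.domain for e in g}) > 1]


if __name__ == "__main__":
    events = [  # hypothetical events
        Event(150, "on-prem", "apm", "checkout latency up"),
        Event(0, "internet", "bgp", "origin change for service prefix"),
        Event(120, "cloud", "vpc-flow-logs", "spike in rejected flows"),
        Event(10_000, "on-prem", "apm", "latency up"),
    ]
    for group in cross_domain(correlate(events)):
        print([f"{e.domain}/{e.source}" for e in group])
```

Grouping by time gap alone is a deliberately crude choice. It chains events that arrive in quick succession, so a burst of symptoms in the application tier can be viewed next to a routing or cloud-network event that preceded it.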