As NetOps teams continue to pick up tips, tricks, and techniques from DevOps in their quest to optimize IT and scale operations to meet business demands, it’s important to separate fact from fiction with respect to certain concepts that are core to the movement. I’m going to dig into three of the biggest misconceptions associated with DevOps that, if not corrected, can be detrimental – and even dangerous – to your ongoing network transformation initiatives.
Myth #1: Automating manual tasks is the key to speeding up deployments.
Fact: Automation is an integral component of network transformation.
Speed isn’t achieved by automating manual tasks. Speed is achieved by eliminating wait times. That’s why culture matters.
It’s a common misconception that you can deploy faster if you just automate manual tasks. After all, a task executed by hand must take longer than the same task executed by a computer. It’s not rocket science, but it is computer science. Except that isn’t where most of the delay – or the potential speedup – comes from in the first place. That’s why a focus on automation over orchestration is problematic. Automation is the codification of a specific task, such as adding a rule to the firewall. Orchestration, on the other hand, is the automation of a process – of a pipeline, to use DevOps-friendly terminology. It’s the big picture, the one that spans operational concerns and enables end-to-end automation of an app deployment process.
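To make the distinction concrete, here’s a minimal sketch in Python – the function names and steps are purely hypothetical, not a real tool or API. Each function on its own is automation of a single task; the deploy function that chains them is orchestration of the process.

```python
# Hypothetical sketch: automating a task vs. orchestrating a process.

def add_firewall_rule(rule):
    """Automation: codify one specific task."""
    print(f"applying firewall rule: {rule}")

def provision_load_balancer(app):
    print(f"provisioning load balancer for {app}")

def update_dns(app, vip):
    print(f"pointing {app} at {vip}")

def deploy_app(app, rule, vip):
    """Orchestration: automate the end-to-end process (the pipeline),
    chaining individual tasks so nothing sits in a ticket queue between them."""
    add_firewall_rule(rule)
    provision_load_balancer(app)
    update_dns(app, vip)

deploy_app("billing-api", "allow tcp/443 from 10.0.0.0/8", "203.0.113.10")
```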
Only by mapping out the process can you find the biggest time wasters: hand-offs between teams. It isn’t the five minutes of manual configuration of a firewall causing a delay, it’s the two-day wait between submission and closing of the ticket. It's time spent in the queue that causes delay, and it’s those delays that add up to lengthy deployments and frustrated stakeholders. By mapping out the process and shedding light on those wait times, you’ll be able to formulate a strategy for tightening up the hand-offs. Doing so immediately nets big wins for the entire pipeline and encourages organizations to reevaluate deployment processes that were put into place to support a completely different app architecture and organizational strategy.
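Some back-of-the-envelope arithmetic shows why queue time is the thing to attack – the numbers below are hypothetical, echoing the firewall example above:

```python
# Illustrative only: where the time in a deployment actually goes.
steps = [
    {"name": "configure firewall", "work_minutes": 5,  "queue_minutes": 2 * 24 * 60},  # two-day ticket wait
    {"name": "provision VIP",      "work_minutes": 10, "queue_minutes": 24 * 60},
    {"name": "update DNS",         "work_minutes": 2,  "queue_minutes": 4 * 60},
]

work = sum(s["work_minutes"] for s in steps)
wait = sum(s["queue_minutes"] for s in steps)
print(f"hands-on work: {work} min, time in queues: {wait} min")
print(f"share of lead time spent waiting: {wait / (work + wait):.0%}")
```

Automating the handful of minutes of hands-on work barely moves the needle; tightening the hand-offs does.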
Myth #2: Like the business, IT must be measured by uptime.
Fact: Uptime is an important business measure.
The metrics you use to measure success matter. Aligning with other operational groups to measure what matters will change behaviors that ultimately lead to the outcomes you want to achieve.
While NetOps languishes in the land of five 9s, DevOps practices encourage a focus on Mean Time to Resolution (MTTR), lead time for changes, and other behavioral metrics. This approach encourages attention to something we have control over – our behaviors and operational practices – rather than what we don’t – downtime caused by attacks, system failures, and external outages.
By focusing on a metric like MTTR, we shift our attention to proactively implementing technology and systems that augment our ability to detect and fix the inevitable. Instrumentation, observability, and real-time communication are the cornerstones of building out a DevOps-like practice in NetOps that meets or exceeds expectations for uptime. Because the faster you can find and fix a problem, the shorter the downtime. Uptime is still the desired outcome, but how we get there determines how often we actually achieve it.
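Measuring MTTR doesn’t require much machinery to get started. Here’s a minimal sketch that computes it from a list of incident records – the data structure is made up for illustration, not pulled from any particular monitoring tool:

```python
# Hypothetical incident records: when each problem was detected and resolved.
from datetime import datetime

incidents = [
    {"detected": datetime(2023, 3, 1, 9, 0),   "resolved": datetime(2023, 3, 1, 9, 40)},
    {"detected": datetime(2023, 3, 7, 14, 5),  "resolved": datetime(2023, 3, 7, 16, 5)},
    {"detected": datetime(2023, 3, 20, 2, 30), "resolved": datetime(2023, 3, 20, 3, 0)},
]

total_seconds = sum((i["resolved"] - i["detected"]).total_seconds() for i in incidents)
mttr_minutes = total_seconds / len(incidents) / 60
print(f"MTTR: {mttr_minutes:.0f} minutes across {len(incidents)} incidents")
```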
Myth #3: Automating manual processes eliminates human error.
Fact: Reducing manual inputs lowers the rate of human error.
Automation doesn’t eliminate human error, especially when you’re talking about code. Defect density is an important measure of quality – not just of the people who write the code, but of the thoroughness of your testing practices.
You might not want to call it code, but the scripts you develop to support automation in NetOps are, in fact, code. That means they’re subject to the same issues as the code written in app dev. Automation can reduce human error, yes, and that’s good news for those interested in reducing the 22% of outages caused by human error. But that doesn’t mean human error is eliminated entirely. In fact, automation introduces a secondary avenue through which human error can creep into operations: the code.
There are a multitude of errors associated with code. Logic errors are one; security-breaking practices are another. And of course, there’s a wide array of algorithmic errors that can cause an edge case to decimate your data center in 2.5 seconds – or less. Think it can’t or won’t happen to you?
It happens. Frequently. In fact, it’s accepted as inevitable, and there is an industry-standard measure of software quality based on it: defect density. Defect density is generally measured in defects per KLOC – thousand lines of code. According to Coverity, which has been scanning and reporting on quality across open source projects for many years now, the average defect density for the software industry is about 1 defect per KLOC. Striving to hit that industry standard is a good goal to start with, depending on your business’ tolerance for error.
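The math itself is trivial; the point is to track it. A quick example with hypothetical numbers for a NetOps automation code base:

```python
# Back-of-the-envelope defect density (hypothetical numbers).
defects_found = 12
lines_of_code = 8_500

defect_density = defects_found / (lines_of_code / 1000)  # defects per KLOC
print(f"{defect_density:.1f} defects per KLOC")  # ~1.4, above the ~1/KLOC benchmark
```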
One of the ways to reduce and manage defect density is to adopt DevOps practices that incorporate static script analysis and code reviews. Both are critical to discovering and remediating defects introduced during development.
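Wiring static analysis into the pipeline can be as simple as a small gate script that refuses to pass unlinted automation code. The sketch below uses pylint as an example analyzer and assumes your scripts live in an automation directory – substitute whatever linter and layout fit your environment:

```python
# Minimal CI gate: lint every automation script and fail the job on findings.
import subprocess
import sys
from pathlib import Path

scripts = [str(p) for p in Path("automation").rglob("*.py")]  # hypothetical layout

if scripts:
    result = subprocess.run(["pylint", *scripts])
    sys.exit(result.returncode)  # nonzero exit fails the CI job
```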
Lowering defect density measures is good, but it can’t eliminate the inevitability of human error on the operator side. When firing up a script to shut down systems or push a configuration change to large numbers of devices, getting the target right is critical. A single mistyped parameter by an operator can have devastating consequences. Safeguards such as two-step verification (“are you sure you want to delete 100 devices?”) are essential in preventing operator error that can – and has – triggered cascading outages and failures in large-scale systems.
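A safeguard like that can be as simple as a confirmation prompt wrapped around any bulk operation. A rough sketch, with a hypothetical device list and shutdown routine standing in for the real calls:

```python
# Hypothetical two-step safeguard before a bulk operation.
def confirm(prompt: str) -> bool:
    return input(f"{prompt} [yes/no]: ").strip().lower() == "yes"

def shutdown_devices(devices):
    if len(devices) > 10 and not confirm(
        f"Are you sure you want to shut down {len(devices)} devices?"
    ):
        print("Aborted.")
        return
    for device in devices:
        print(f"shutting down {device}")  # placeholder for the real call

shutdown_devices([f"switch-{n:03d}" for n in range(100)])
```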
The ongoing transformation of your network is important. It’s necessary, but it’s also a strategic initiative that requires just as much thought about how you’re going to do it as about actually doing it, because the practices you put in place today are going to be the practices you use for the foreseeable future. Separating fact from fiction when it comes to adopting DevOps practices for NetOps not only enables you to lay a solid foundation, but also ensures you aren’t setting yourself up for failure.