Automation is a noble goal of software definition that lets us rapidly deploy new configurations and stand up devices with a minimum of human effort. There's no point in wasting human brain cells on a task that could be accomplished by a script or a process. But the more I think about it, the more I wonder if automation really fixes a problem or just masks the symptoms.
At the recent Software Defined Data Center Symposium, I moderated a panel focused on application-centric networking. I asked the panel whether software could improve our existing processes and allow us to remove some complexity in the data center.
The response was mixed; many thought that the problem wasn't in the goal of making hard things easy with automation. The problem was: "Why are we still doing the hard things the way we do them?"
If I create a script to type in the exact same commands I would use when I configured a switch through the CLI, I've reduced the amount of time I spend configuring the switch. But I haven't really fixed the complexity problem. I just hid it behind a wall of text. I moved the bottleneck from my keyboard to an automatically executing file. Abstracting the problem away from me directly doesn't make it any less present.
That's why the promise of network APIs is so alluring. APIs let me cut through the complexity like a hot knife through butter. I no longer have to remember syntax or worry about whether my firewall script is being applied to a router by mistake. I can query a device's capabilities and let my automation engine decide how best to configure it. But that requires a higher level of thinking that isn't present in today's implementations.
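To make the idea of querying a device's capabilities concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the Device class, its get_capabilities and apply methods, and the capability names are invented for illustration and don't correspond to any vendor's API. It only shows the shape of the decision an automation engine would make.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for a device reachable through a network API."""
    name: str
    capabilities: set = field(default_factory=set)
    applied: list = field(default_factory=list)

    def get_capabilities(self):
        # A real device would answer this over its API; here it is canned data.
        return set(self.capabilities)

    def apply(self, feature, settings):
        # Record what was pushed so the result can be inspected afterward.
        self.applied.append((feature, settings))


def configure(device, feature, settings):
    """Apply a feature only if the device reports that it supports it."""
    if feature not in device.get_capabilities():
        return False  # skip rather than push firewall rules at a router
    device.apply(feature, settings)
    return True


firewall = Device("fw1", {"acl", "nat"})
router = Device("rtr1", {"routing"})
rule = {"allow_tcp": [443]}

# The same call is made against both devices; the capability check decides.
results = {d.name: configure(d, "acl", rule) for d in (firewall, router)}
print(results)
```

The script no longer needs to know which box is which. It asks, and the firewall rule never reaches the router.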
Think back to the days of DOS batch files. What happened if something went wrong? Failure meant the script didn't execute. Where did it fail? Did the applied configuration still persist? Were changes backed out before being committed?
And if I chose to do this without human intervention or monitoring, how did I know it didn't work? Did the script produce an error or some kind of report that would find its way to me via email or a text message?
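Those questions suggest what an unattended job would need at a minimum: a snapshot to fall back on, a way to back out partial changes, and a report that reaches a human. The sketch below is one way to picture that in Python, under loud assumptions: the running configuration is just a dictionary, a step "fails" when its value is missing, and notify stands in for whatever would send the email or text message. None of it reflects a real device or tool.

```python
def apply_with_rollback(running_config, changes, notify):
    """Apply changes in order; on any failure restore the snapshot and report."""
    snapshot = dict(running_config)  # copy taken before anything is touched
    applied = []
    try:
        for key, value in changes:
            if value is None:
                # Hypothetical failure condition, standing in for a rejected command.
                raise ValueError(f"no value supplied for {key!r}")
            running_config[key] = value
            applied.append(key)
    except ValueError as err:
        # Back out everything, including the steps that had succeeded.
        running_config.clear()
        running_config.update(snapshot)
        notify(f"FAILED after {len(applied)} step(s): {err}; changes backed out")
        return False
    notify(f"OK: applied {len(applied)} step(s)")
    return True


messages = []  # stands in for an email or text-message gateway
config = {"hostname": "sw1"}
ok = apply_with_rollback(
    config,
    [("vlan10", "users"), ("vlan20", None)],  # second step fails on purpose
    messages.append,
)
print(ok, config, messages)
```

A plain batch file gives you none of this for free. Each of those three pieces has to be written, tested, and maintained by whoever owns the script.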
Automating my network with scripts or current-generation tools like Puppet or Chef to make it behave the same way it does today is the Network Functions Virtualization (NFV) approach. At the end of the day, I'm still left with an integration nightmare. And if I ever decide to upgrade that equipment's firmware or swap it out for another manufacturer's gear, my scripts are useless.
APIs allow me to build a GUI that my admins can use like a picture menu. Select the firewall, check a couple of boxes to allow ports from a specific application (or group of applications), and let the API do the work of figuring out how to make the firewall do what it's told.
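As a rough illustration of what "let the API do the work" might look like behind that picture menu, the Python sketch below turns a set of checked applications into generic allow rules. The APP_PORTS catalog, the build_rules function, and the rule format are all assumptions made up for this example; a real back end would translate the result into whatever the firewall actually accepts.

```python
# Hypothetical catalog mapping application names to the ports they need.
APP_PORTS = {"web": [80, 443], "dns": [53]}


def build_rules(selected_apps, catalog=APP_PORTS):
    """Turn the admin's checkbox selections into generic allow rules."""
    # Collect ports into a set so overlapping applications don't duplicate rules.
    ports = sorted({port for app in selected_apps for port in catalog[app]})
    return [{"action": "allow", "port": port} for port in ports]


# The admin ticked two boxes; no firewall syntax was involved.
rules = build_rules(["web", "dns"])
print(rules)
```

The admin expresses intent, and the port numbers and device syntax stay behind the interface.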
APIs aren't a magic bullet in and of themselves. They do transfer complexity just like a script. The key is that they are a more extensible solution. They present an interface that can be interrogated or programmed, instead of a passive batch script simply executing on an existing interface. They can provide feedback about a device's capabilities.
This is a better construct for handling the growing needs of a network as we begin to move the focus away from simple bandwidth and availability and more toward serving the needs of applications.
Well-designed APIs can adjust to changing conditions, such as adding more back-end support to an application or instantiating a load balancer in front of a Web server seeing high traffic. I don't have to worry about adding a new script to account for these changes. Adding the new services just requires a drag-and-drop option that the back-end API automagically takes care of.
Once we start using APIs or other similar constructs to shoulder the load, we can then use that as a base to find better ways to accomplish tasks. What happens when we have a device with no CLI and only a programmatic interface? Or a controller-based network that can turn up a new device and provision it without the intervention of a third-party service like Puppet? When we have a reliable interface that scales much better than quick-and-dirty automation, we can then begin to really reduce the unnecessary parts of provisioning.
Masking complexity doesn't solve a problem. Removing complexity does. We should strive to make our solutions eliminate the things we find distasteful instead of sweeping them under the rug. To me, that's the real value of software-defined networking. When we can change our process to do things the right way and eliminate the complexity that makes troubleshooting difficult, then we will have truly advanced networking to a new era.
What do you think? Do current automation schemes do the job? Or should we be looking to scrap everything and write the tools to do it right? Are APIs the right answer or are we going to need something more to finally eliminate complexity?