A question that I have recently considered is: For networking, what are the prerequisites for widespread adoption of "infrastructure as code"?
However, before addressing this, let me offer some clarification. What do I mean by infrastructure as code? It's a systematic way of treating your network device configurations in a manner similar to source code. This includes following robust software development practices for testing configuration changes and deploying them to production.
For example, all network changes are first submitted into a source code repository and then undergo a series of tests. Assuming all the tests pass, the changes are approved and scheduled. Finally, the changes are automatically deployed into production.
Infrastructure as code implies that no changes are made directly to network devices; all changes must be made through the deployment system.
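The workflow described above can be sketched as a simple gated pipeline. This is only a minimal illustration in Python, not a reference to any particular tool: `Change`, `not_empty`, `no_blank_lines`, `run_pipeline`, and the `deploy` callback are all hypothetical names, approval is simplified to "all tests passed," and scheduling is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Change:
    """A proposed configuration change as stored in the repository (hypothetical)."""
    device: str
    config_lines: List[str]
    approved: bool = False
    log: List[str] = field(default_factory=list)


def not_empty(change: Change) -> bool:
    """Example test: the change must contain at least one config line."""
    return len(change.config_lines) > 0


def no_blank_lines(change: Change) -> bool:
    """Example test: reject changes that contain blank lines."""
    return all(line.strip() for line in change.config_lines)


def run_pipeline(
    change: Change,
    checks: List[Callable[[Change], bool]],
    deploy: Callable[[Change], None],
) -> bool:
    """Run every check in order; deploy only if all of them pass."""
    for check in checks:
        if not check(change):
            change.log.append(f"failed: {check.__name__}")
            return False  # stop at the first failure; nothing is deployed
        change.log.append(f"passed: {check.__name__}")
    change.approved = True
    deploy(change)  # the only path by which a change reaches a device
    change.log.append("deployed")
    return True
```

The point of the design is that `deploy` is reachable only through `run_pipeline`, which mirrors the rule that no change is made directly on a device.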
Now, infrastructure as code brings up several interesting questions. First, is this even a good idea? What evidence is there to support this? For server teams, does treating infrastructure as code improve their ability to meet business needs? Do they still maintain reasonable uptime and security?
Second, is the network environment significantly different from the server environment? If so, what are the implications?
I am not going to try to answer those questions today. For now, I am going to assume infrastructure as code is a good idea, and I will return to my original question: For networking, what are the prerequisites for widespread adoption of infrastructure as code?
Here are the prerequisites that I see:
- Programmatic interfaces into network devices: You could also achieve this using SSH screen scraping, but programmatic interfaces will make it significantly easier.
- Tools that facilitate and simplify this process: The tools should be extensible in a modular way, and the part of the tool chain that actually makes device changes should offer some degree of idempotency. By idempotency, I mean the tool will first check the state of the device and will make a change only if the current state does not match the required end state.
- Network engineers with sufficient skills in both the tools and programming: An alternative to this would be pairing network engineers with developers.
- Ubiquitous, inexpensive virtualized network devices: This will enable automated testing of network changes.
- Tools that make it easy to verify the state of the testing environment and the production environment: This potentially includes interim states for complex changes.
- Intelligent rollback mechanisms to deal with significant error conditions: There are other possible solutions to this problem, including out-of-band management networks and increased use of virtualized network devices (overlay/underlay solutions). But at the end of the day, we need a way to recover from large failures caused by automated network changes.
- A configuration templating system for building and deploying initial device configurations: This system must be kept up to date with changes that are made to production devices. Additionally, there should be an automated way for ensuring production device configurations always comply with internal standards. Where possible, device configurations should be standardized.
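
The idempotency mentioned in the tools prerequisite above (check the device state first, change it only if it differs from the required end state) can be sketched as follows. This is a minimal Python sketch under stated assumptions: `FakeDevice`, `get_vlans`, `set_vlan`, and `ensure_vlan` are hypothetical names, and the in-memory device stands in for whatever programmatic interface a real device would offer.

```python
class FakeDevice:
    """In-memory stand-in for a network device (hypothetical, for illustration)."""

    def __init__(self):
        self.vlans = {}   # vlan_id -> name
        self.writes = 0   # counts how many changes were actually pushed

    def get_vlans(self):
        """Return a copy of the current VLAN table (the 'check state' step)."""
        return dict(self.vlans)

    def set_vlan(self, vlan_id, name):
        """Create or rename a VLAN (the 'make a change' step)."""
        self.vlans[vlan_id] = name
        self.writes += 1


def ensure_vlan(device, vlan_id, name):
    """Bring the VLAN to the required end state.

    Returns True if a change was made, False if the device already matched.
    """
    current = device.get_vlans()
    if current.get(vlan_id) == name:
        return False  # current state matches the required end state
    device.set_vlan(vlan_id, name)
    return True
```

Calling `ensure_vlan` twice with the same arguments pushes only one change; the second call finds the device already in the required state and does nothing.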
There are many questions I want to explore in more detail, and infrastructure as code definitely intrigues me. Network engineers as a group need to improve how effectively we make changes in our environments to meet business needs. We need to break free of our highly manual, no-testing, make-changes-live-in-production, every-device-is-unique mode of operating.