When someone says, "It won’t scale" or "It's not elegant," we assume they're talking about design -- network design, protocol design, or anything else. But they aren’t really. Rather, these are statements about complexity. Okay, so you're asking, “What?” and probably thinking, “Russ has gone off the deep end!” It’s always fair game to question the state of my sanity (my kids always do), but let’s look at this issue a little more carefully.
What do we really mean when we say, “It won’t scale?” In relation to network, software, or even protocol design, what we seem to mean is one of two things: Either we can’t make it bigger without it failing, or we can’t make it change without it failing. In other words, the design in question is brittle in some way -- there is going to be a point where a single change is going to cause a systemic failure. Like an ossified bone or a piece of petrified wood, it might appear to be strong, but it’s brittle. A single hammer blow cannot shatter a piece of wood, but it can certainly shatter a piece of petrified wood.
At the protocol scale, EIGRP used to exhibit just this sort of ossification. Because of the way the stuck in active process works, EIGRP was what we called a "cliff protocol." Many folks who deployed EIGRP bragged "You don’t have to design an EIGRP network; just deploy it, and it will run." In the real world, intentionally not designing your network is a really dumb idea, but let’s not go down that rabbit trail for the moment. Instead, let’s focus on why EIGRP was seen this way.
The answer is simple: because it was true, at least to a point. You could simply throw EIGRP at any network, and, up to some scale, it would just work. Going just a little over that scale, though, would cause the entire network to melt. The number of EIGRP networks I’ve worked on that simply wouldn’t converge is quite astounding, but the techniques we used to calm the roiling waters so we could spend time actually redesigning a network that had been just "thrown together" were pretty straightforward.
EIGRP’s state machine is really simple, but there was one circumstance that had to be dealt with: What happens when the network situation steps outside the state machine? The answer was a brute-force attack to bring the control plane back into the state machine -- the stuck in active (SIA) process. The SIA process wasn’t well designed from the beginning, having positive feedback loops and other issues that caused the control to enter one of several states where it would simply not converge.
EIGRP has been reformed in recent years, pushing the cliff way off into the distance, by the SIA rewrite (effectively expanding the state machine to cover more circumstances) and other features like stub routers. However, two lessons remain: failure to design is a path to network failure, and all protocols fail at some point, under some conditions.
Elegance is a different expression of complexity; it’s a statement about the amount of complexity verses the hardness of the problem. Complexity theory, in fact, tells us that complexity is a reaction to uncertainty. For instance, Alderson and Doyly wrote in "Contrasting Views of Complexity and Their Implications For Network-Centric Infrastructures" (IEEE TON 40-4), "In our view, however, complexity is most succinctly discussed in terms of functionality and its robustness. Specifically, we argue that complexity in highly organized systems arises primarily from design strategies intended to create robustness to uncertainty in their environments and component parts."
So we add complexity to a network to deal with uncertainty, but then we realize, at some point, that this very same complexity increases brittleness. We can illustrate this problem with a simple chart.
One of the main points of design should be to somehow find the “sweet spot” where the robust sort of complexity ends, and the fragile sort of complexity begins. The problem, however, isn’t that simple (did you already suspect that? If so, good for you). In fact, the problem is that this robust/fragile divide happens along a set of axes in every possible design decision.
If you haven’t already guessed, this is going to be a series, rather than a one-off blog post. I can’t promise it will be weekly or monthly series because I’ll likely intersperse other posts in between the posts in this series. But, over time, I hope to cover some ground in the world of network complexity as it relates to design.
And did I mention I’m writing a book on this topic right now? More information on that as I progress on the outline and layout. Right now, I can say it will be published by Addison-Wesley in about a year.
Until next time -- keep it simple.