Urs Hölzle knows a thing or two about software-defined networking. As senior VP for technical infrastructure at Google, he oversaw the Web giant's foray into SDN a few years ago with its B4 project, a private WAN connecting the company's global data centers. B4 uses OpenFlow on Google's custom switches with merchant silicon. At Interop Las Vegas, Network Computing sat down with Hölzle after his keynote to discuss B4, SDN benefits and challenges, and the future of networking.
NWC: Could you talk about your B4 SDN initiative and its benefits?
Hölzle: It has really helped us. The key part is you have an API to talk to, and you can orchestrate things the way you want. Because at a high level, we understand the big flows on our network and what they want to do, for example Gmail replicating a mailbox to another data center or our log system streaming logs back to a place for centralized log processing. We also understand how these things should be treated relative to each other, what's more important. In the classical world, mapping that down to a network configuration that says Gmail is more important is tough, because first of all, you only have the language of ports and bits on the packets to tell one from the other, and second, you have very few QoS levels. So if you have 2,000 applications, and 50 big ones, how do you map them? It's kind of hard.
We can take the control plane of our applications to tell the network what it should do where, so to speak. We know, let's say, for the next eight hours the search guys want to do a large batch copy from here to here; we know they have a deadline but also that they can be interrupted at any time. No one needs to manually go in and configure the network to do that. That's been the biggest difference. Now, you don't need to be a network engineer to control a network; you can do it at a higher level -- an application level. That's been a huge difference to us.
As a side effect, that means you get better utilization and lower costs, but that's because you're using the network the way you want. That was hard to do before. Theoretically you can do it in a classical network, but it's so much work and the change rate of how you want to configure your network is too high.
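To make the idea of application-level control concrete, here is a minimal sketch of strict-priority bandwidth allocation on a single link. It is an illustration only, not Google's B4 logic: the `Flow` class, the `allocate` function, the priority numbers, and the bandwidth figures are all hypothetical, and a real traffic-engineering system would handle many links, deadlines, and preemption.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    app: str            # application name (hypothetical labels)
    priority: int       # assumption: lower number means more important
    demand_gbps: float  # bandwidth the application asks for


def allocate(flows, capacity_gbps):
    """Serve flows in priority order until the link capacity runs out."""
    remaining = capacity_gbps
    grants = {}
    for flow in sorted(flows, key=lambda f: f.priority):
        granted = min(flow.demand_gbps, remaining)
        grants[flow.app] = granted
        remaining -= granted
    return grants


# Illustrative demands only; these numbers are not from the interview.
flows = [
    Flow("search-batch-copy", priority=3, demand_gbps=80.0),
    Flow("gmail-replication", priority=1, demand_gbps=30.0),
    Flow("log-streaming", priority=2, demand_gbps=40.0),
]
grants = allocate(flows, capacity_gbps=100.0)
for app, gbps in grants.items():
    print(f"{app}: {gbps} Gbps")
```

In this sketch the interruptible batch copy simply receives whatever capacity is left after the more important flows are served, which is the kind of relative treatment Hölzle describes expressing per application rather than per port.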
NWC: You built your own hardware for B4. Do you still use your own equipment?
Hölzle: It's still the case, but it's more an incidental choice because this is equipment that we already had that we built for a different purpose. At the time, there was no commercial gear that you could control with OpenFlow. There was no choice. And I don't think the hardware is a key component in it, in the sense that we could do just as well with another switch. We haven't needed to upgrade yet. But our switch has all merchant silicon in it, there's no magic. There's nothing special, it's just a box that we made....All the value is upstream of the switch in the control plane.
It may very well be that when we upgrade this network, we might find another switch with OpenFlow that is just fine because it's built with the same chipsets and other stuff that we would have picked. Now it's different; there are choices.
NWC: Have you expanded your use of SDN?
Hölzle: Originally we only had our data centers interconnected. Now we have PoPs along the way to provide more efficient routing and failover for, say, a trans-Pacific link. Before, you would have a very sparse topology: Here's a data center and it has a direct link to a data center in Asia. But in reality, there are multiple paths. If you're not aware of that in B4 itself, you can't control it....We spent a lot of time on the manageability of things. For example, now I can pull up a page and it shows whether we're meeting our SLA for different classes of traffic....We're monitoring the packet flow and can explain now in end user terms what the network is actually doing.
NWC: What were your biggest challenges in implementing SDN?
Hölzle: Whenever you try something new, there are going to be problems with it....We were willing to take the risk to get the innovation. Our VP who runs our site reliability gave a great talk about not aiming for 100% uptime....The easiest way to make it be at 100% is to resist change, because change is when bad things happen. Looks great for your SLA, but it's bad for your business because you slow down innovation.... In the first year of running B4, [we asked] "Will we have an outage?" Realistically, yes there's a high chance because it was all new code. Are we going to be perfect? Probably not. You have to have a willingness to take a little risk.
We had an excellent testing infrastructure. It's what we're still heavily investing in, where we can simulate our worldwide backbone purely in software. That makes it possible for you to try new software releases with low risk and evaluate that they still behave the way you want.... The simulator is high-fidelity enough that you can point your monitoring system at it and it will look to the monitoring system like it's a backbone. That means you can also use it for training. To practice how to do an upgrade, you can do it in the simulator, and if you blow it up, well, you learned something. No big risk. That made it possible to keep the bug rate low enough while writing code like crazy and implementing new functionality all the time.
That's one of the values of software-defined networking in general: You can now test your control plane. Today when you get, say, a Juniper upgrade, the control plane is on the box, so your update actually is a completely new control plane and it has to work in a heterogeneous environment. Testing that is actually kind of hard, whereas with SDN and OpenFlow, what's on the box is simple. The control plane runs separately, more centralized. Since this is now software running on a server, you can test it using the same techniques you use to test software.
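As a rough illustration of testing a control plane like ordinary software, the sketch below unit-tests a toy path-computation function against a small simulated topology, including a link failure. Everything here is hypothetical: the `shortest_path` function, the node names, and the three-link topology are assumptions for the example and do not represent B4's controller or Google's simulator.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search over undirected links; returns a node list or None."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        # Sort neighbors so the result is deterministic across runs.
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Hypothetical topology: a direct link plus an alternate path through a PoP.
TOPOLOGY = {("dc-us", "dc-asia"), ("dc-us", "pop-1"), ("pop-1", "dc-asia")}


def test_prefers_direct_link():
    assert shortest_path(TOPOLOGY, "dc-us", "dc-asia") == ["dc-us", "dc-asia"]


def test_fails_over_through_pop():
    degraded = TOPOLOGY - {("dc-us", "dc-asia")}  # simulate a link failure
    assert shortest_path(degraded, "dc-us", "dc-asia") == ["dc-us", "pop-1", "dc-asia"]


def test_reports_partition():
    assert shortest_path({("dc-us", "pop-1")}, "dc-us", "dc-asia") is None


if __name__ == "__main__":
    test_prefers_direct_link()
    test_fails_over_through_pop()
    test_reports_partition()
    print("all control-plane tests passed")
```

Because the routing decision lives in a plain function rather than on a vendor box, a failure scenario becomes a one-line change to the input topology, and the tests can run on any server before a release reaches the network.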
NWC: How do you see the role of the networking pro changing with SDN?
Hölzle: SDN is a new kind of networking, but it's still networking, so you'll have to change what your skills look like, but it's actually still in the same domain. Some routine parts disappear, but new, more application-oriented parts appear. Before, you told your customer, "Sorry, we can't do that." Now you have to say, "Yes, I can do that." Some lower level stuff disappears -- there's less fiddling with router configurations -- but the higher level stuff appears and has much higher value to your customers.
The role of the network administrator and overall system administrator is kind of merging. It's more of a blurry line. At the same time, your understanding of network topology and packet loss -- those skills are just as valid as before. Networking is expanding like crazy; it's not a static world. Acquiring software skills is going to be a useful thing. At the very least, you need to understand your customer and how they think, but more likely the systems you will use to manage your network will look more like software-defined configurable things. Not a CLI on a Juniper or Cisco, but maybe a little bit of scripting.