For arcane reasons, official Ethernet releases have followed a pattern of 10x the data rate of the previous generation: 10 Mbps Ethernet begat 100 Mbps Ethernet, which begat 1 GbE and then 10 GbE. So you'd expect the next generation to be single-link 100 GbE, but the vendors aren't sticking to the rules anymore!
Now we have companies like Microsoft and Google pushing for 25 GbE, and they've joined with Broadcom, Mellanox, Arista, and Brocade to create a special interest group called the 25 Gigabit Ethernet Consortium. The group's specification covers single-lane 25 GbE and dual-lane 50 GbE link protocols.
To understand the motivation behind this support for 25 GbE and 50 GbE, one has to look at the use cases in the cloud.
First, let's look at how this effort evolved. The industry tendency of late has been to create four-lane quad links which, taking 10 GbE as a base, have given us 40 GbE. This isn't just a case of running four separate links in one cable; data is spread across all four lanes in round-robin fashion, making for a genuine 40 GbE link.
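As a rough illustration of that striping, here is a minimal Python sketch. It is purely conceptual: the four-lane round-robin matches the description above, but the 64-byte stripe size and the function names are assumptions made for the example, not the actual 802.3 lane-distribution scheme.

```python
LANES = 4    # four physical lanes ganged into one logical link
BLOCK = 64   # stripe granularity in bytes (illustrative only)

def stripe(data: bytes, lanes: int = LANES, block: int = BLOCK) -> list[list[bytes]]:
    """Deal consecutive blocks of the stream out across the lanes, round-robin."""
    queues: list[list[bytes]] = [[] for _ in range(lanes)]
    for i in range(0, len(data), block):
        queues[(i // block) % lanes].append(data[i:i + block])
    return queues

def reassemble(queues: list[list[bytes]]) -> bytes:
    """Receiver side: interleave the per-lane queues back into the original order."""
    out = bytearray()
    for row in zip(*queues):          # one block from each lane per round
        for chunk in row:
            out.extend(chunk)
    return bytes(out)

if __name__ == "__main__":
    payload = bytes(range(256)) * 4                  # 1,024 bytes, a multiple of LANES * BLOCK
    assert reassemble(stripe(payload)) == payload    # the stream comes back in order
```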
Using the same quad technique, the industry plans to deliver 100 GbE in the near future. This will be a big step up from 10 GbE not only in performance but also in the price of the gear.
The 25 Gbps lanes used in quad-lane 100 GbE are an attractive alternative to 10 GbE, being 2.5 times as fast for a relatively small increment in cost. This has created a good deal of interest among cloud service providers (CSPs) in an intermediate standard that capitalizes on the single-lane speed boost 100 GbE development has delivered, and the result is renewed interest in 25 GbE as a standard.
That thinking has also spawned interest in 50 GbE, derived by ganging two 25 Gbps lanes. If all this comes to pass, we will have 10 GbE single-lane, 40 GbE quad, and 100 GbE ten-lane links in production; 25 GbE single-lane, 50 GbE dual-lane, and 100 GbE quad links nearing production (one to two years out); and 40 GbE single-lane, 100 GbE single-lane, and 400 GbE quad links on the horizon. That's a lot more churn than we are used to in the infrastructure business!
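A quick sanity check on that roadmap: each link speed is simply lane rate times lane count. The sketch below tabulates the generations named above; the lane rates assumed for the future single-lane entries are inferred from their names rather than taken from any published specification.

```python
# Link speed = lane rate x lane count for each generation mentioned above.
roadmap = [
    ("10 GbE single-lane",   10,  1),
    ("40 GbE quad",          10,  4),
    ("100 GbE ten-lane",     10, 10),
    ("25 GbE single-lane",   25,  1),
    ("50 GbE dual-lane",     25,  2),
    ("100 GbE quad",         25,  4),
    ("40 GbE single-lane",   40,  1),   # lane rate inferred from the name
    ("100 GbE single-lane", 100,  1),   # lane rate inferred from the name
    ("400 GbE quad",        100,  4),   # lane rate inferred from the name
]

for name, lane_gbps, lane_count in roadmap:
    print(f"{name:22s} = {lane_count:2d} x {lane_gbps:3d} Gbps = {lane_count * lane_gbps} Gbps")
```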
However, as recently as April, the industry seemed to frown on the 25 GbE/50 GbE idea. The IEEE meeting failed to ratify the creation of sub-committees, partly due to technical workload and partly because of strong vested interests: some switch companies don't want erosion of the very high-margin top end of the market. Clearly, that has changed with the formation of the 25 Gigabit Ethernet Consortium.
Now, let's examine how cloud use cases are influencing this new push. CSPs buy huge quantities of servers each year, so server cost is a major bottom-line issue for them. With two 10 GbE links built into the server chipset, the cost of 10 GbE is minimal. The problem is that rich, highly networked instances are the highest-growth area of the cloud as users try to get a handle on big data and the Internet of Things. Network storage requires very high bandwidth in these situations, and even dedicating a single 10 GbE link to storage isn't adequate.
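A back-of-envelope calculation shows why a single 10 GbE link falls short; the per-SSD throughput used here is an illustrative assumption, not a measurement.

```python
LINK_GBPS = 10
link_gb_per_s = LINK_GBPS / 8            # ~1.25 GB/s of raw line rate, before protocol overhead

SSD_GB_PER_S = 0.5                       # rough throughput of a single SATA SSD (assumption)
drives_to_fill_link = link_gb_per_s / SSD_GB_PER_S

print(f"10 GbE raw capacity: {link_gb_per_s:.2f} GB/s")
print(f"SSDs needed to saturate it: ~{drives_to_fill_link:.1f}")
```

In other words, a couple of drives' worth of networked storage traffic saturates the link before overhead is even counted, which is why storage-hungry instances outgrow 10 GbE so quickly.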
The result is that CSPs are offering high-end instances with local HDDs and even SSDs to compensate for network limitations. This brings operational issues, security nightmares, and additional costs to cloud usage. The alternatives to local "instance storage" are more 10 GbE links or a move to the next available Ethernet speed. Today, that means adding either a dual-port 10 GbE or a single-link 40 GbE adapter card, at a cost in the hundreds of dollars; ten-lane 100 GbE is priced much higher.
In a nutshell, none of these scenarios is very attractive. All are expensive, drastically impacting instance marketability. Nor does single-link 100 GbE offer relief: that technology is still very much in flux and quite a ways off, and the optics it will require look set to price it well above current technologies.
Thus, for the CSP, 25 GbE makes a good deal of sense as the next step for in-rack connectivity. The price curve should track that of 10 GbE, ending in low commodity pricing. Companies like Mellanox and Broadcom will provide motherboard-down NIC chips for ODM-designed motherboards, and while there will be some cost uplift, it will be relatively small.
In the bigger picture, these intermediate speed steps also make sense. The IEEE has always worked on a five- to seven-year timescale for each 10x performance refresh. Modern design cycles are much shorter; in fact, CSPs are replacing systems on a three- to four-year cycle, which puts the IEEE cadence out of sync with need.
On the technology side, we have to accept that single-link and quad-link technologies are different approaches. The quad is excellent for backbone inter-switch links, and for storage appliance connections. (Those all-flash arrays need all the bandwidth they can get!) Acknowledging that they have separate roles and roadmaps will clarify the situation immensely.
With separate roadmaps and accelerated timescales for replacing and renovating infrastructure, 25 GbE makes a good deal of sense, though 50 GbE is somewhat more doubtful. Let's hope we don't end up with a turf fight between the IEEE and the consortium over “ownership” of the right to issue a specification.