Excerpts

[1:20] Nick McKeown: If you were to cut a line vertically through the United States and then look at all of the Internet traffic that was passing from left to right and right to left — we call this the bisection bandwidth of the Internet, it’s essentially how much capacity the Internet has crossing the United States — that is less than the amount of traffic going through a couple hundred servers inside one datacenter. It’s just an inversion — the whole model of how communication takes place has really been dominated by the inside of datacenters. ... So it means that the industry itself is focused on how to serve that market, and so the majority of the silicon that is being produced — whether it’s the CPUs, whether it’s storage, whether it’s networking, for the network interfaces or the switches — it’s mostly targeted at that datacenter business just because of the sheer scale of it.
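
To give a feel for the scale claim, here is a back-of-envelope sketch; the server count, NIC speeds, and cross-country capacity in it are illustrative assumptions, not figures from the talk.

```python
# Back-of-envelope comparison (every number here is an illustrative assumption).
servers = 200
nic_gbps = 2 * 100            # assume two 100 Gb/s interfaces per server

datacenter_tbps = servers * nic_gbps / 1000
bisection_tbps = 30           # assumed rough cross-country bisection bandwidth

print(f"~{datacenter_tbps:.0f} Tb/s through {servers} servers in one datacenter")
print(f"vs. an assumed ~{bisection_tbps} Tb/s crossing the United States")
```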

[3:25] Martin Casado: The hyperscalers are well-known for building their own datacenters, everything from [?][3:29] their own servers, to the way they do cooling, to the way they do real estate, to the entire software stack. And the reason is because they’re pushing these technologies to scales and efficiencies that they just weren’t designed for, and that’s required a pretty large redesign.

[4:05] Nick McKeown: The early datacenters, some of the first attempts, were made out of off-the-shelf computers they would just order from companies like Dell and Compaq ... and they would arrange them into racks of servers, perhaps 30 to 40 in a rack. And then they would hang as much disk storage as they could off them and put as much memory into them as they could. And then they would string them all together with networking equipment they would just buy from regular equipment vendors. ... So that’s what the early datacenters looked like. And as they grew, there was fear in their eyes, because they realized they were building systems at a phenomenal scale no one had ever built at before, and these systems just weren’t going to be able to scale up to the size that was needed.

[5:05] Nick McKeown: Now in the datacenter itself, hundreds of thousands of servers are connected together by thousands of switches, and each switch decides where it’s going to send these packets of information next so that they eventually reach the correct server. Today each of these switches is built around a single silicon chip called an ASIC, an application-specific integrated circuit.
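
A minimal sketch of the per-packet decision described here: look the destination up in a forwarding table and pick an output port. The routes and port names are hypothetical, and a real switch ASIC makes this decision in hardware for billions of packets per second.

```python
import ipaddress

# Hypothetical forwarding table: destination prefix -> output port.
forwarding_table = {
    ipaddress.ip_network("10.1.0.0/16"): "port1",
    ipaddress.ip_network("10.1.2.0/24"): "port2",   # more specific route
    ipaddress.ip_network("0.0.0.0/0"):   "port3",   # default route
}

def forward(dst: str) -> str:
    """Pick the output port by longest-prefix match on the destination."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in forwarding_table if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return forwarding_table[best]

print(forward("10.1.2.7"))   # port2: the most specific match wins
print(forward("10.1.9.9"))   # port1
print(forward("8.8.8.8"))    # port3: falls through to the default route
```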

[7:00] Nick McKeown: As these datacenter companies get bigger and bigger, they’re naturally going to consider whether the silicon that they can buy will best support their customers’ workloads, or whether they would do better to modify and design it for themselves. This is a natural question as they get to a scale where they’re operating with enormous economies of scale but dealing with ferocious competitors — the other hyperscalers. And in order to be able to compete with each other, they’ve got to be looking not only for cost reductions but also for ways to differentiate in terms of the performance that they get.

[9:10] Nick McKeown: So today the general rule of thumb is that it costs 150 to 200 million dollars to contemplate building an ASIC, and you need a team, typically of a couple hundred people, so this is not something you want to do lightly unless you’re doing something in extremely high volume or you’re going to derive immense value from the exercise. So in most cases, it benefits the industry to have technology vendors who build these chips, take on the great investment and the great cost of doing this, and sell to multiple customers. Herein lies the problem — if the costs favor having a technology vendor who is selling to multiple customers, that vendor has to figure out what product they are going to build. And if that product has a fixed functionality for all of those customers, it leaves the customers with no opportunity to differentiate from each other. Where we’ve seen this change first is with some of the accelerators that hyperscalers have developed for themselves, or in the custom network interface chips and switches that some have developed. However, there is an alternative way of doing this that Martin was just alluding to, and that is, instead of making it a fixed function, make all of those devices programmable as well, so the hyperscaler can determine how they’re going to introduce new ideas to differentiate from their competitors by programming the device themselves.
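
A toy sketch of the fixed-function versus programmable distinction, loosely in the spirit of the match-action model used by languages like P4 (real pipelines are compiled to switch hardware). The helpers `make_table` and `set_port`, and the `tunnel_id` field, are all hypothetical, invented for the illustration.

```python
# With a fixed-function chip, the match fields and actions are baked into
# the silicon; here the operator supplies both.

def make_table(match_fields, rules, default):
    """Build a lookup from a tuple of header-field values to an action."""
    def apply(packet):
        key = tuple(packet[f] for f in match_fields)
        action = rules.get(key, default)
        return action(packet)
    return apply

def set_port(port):
    def action(packet):
        packet["egress_port"] = port
        return packet
    return action

def drop(packet):
    packet["egress_port"] = None
    return packet

# One hyperscaler might match on a custom tunnel ID, another on something
# else entirely; the chip no longer dictates the choice.
table = make_table(
    match_fields=("tunnel_id",),
    rules={(42,): set_port("port7")},
    default=drop,
)

pkt = {"tunnel_id": 42, "egress_port": None}
print(table(pkt)["egress_port"])    # port7
```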

[10:45] Nick McKeown: This has been the trend we’ve seen over the last few years. It’s essentially software eating the world of the datacenter, of the infrastructure as a whole. And this, at this point, seems inevitable. It’s the natural next step beyond disaggregation of the equipment: breaking it down into chips that are programmed, rather than buying vertically-integrated, complete solutions. ... I’ve been involved in developing some networking ASICs exactly for this purpose, and I’ve observed over the last decade that as the hyperscalers want more control over how individual packets are processed, they will do interesting, crazy, sometimes wild things that I never would have thought of, that their competitors don’t think of. And so they’re actually diverging and going off in slightly different directions. That’s the nature of competition, and it’s the first time we’re seeing real competition taking place in the infrastructure of these hyperscalers.

[15:25] Martin Casado: One way to view this is the following: perhaps one of the reasons networks have never been programmable is that no one wanted to program them. Computers were programmable because everybody had a CPU to write applications for, but there were never any gains from programming the network. The hyperscalers, though, get a very specific benefit from programming networks — Nick said it perfectly, which is that a large percentage of the world’s Internet traffic is actually happening within datacenters. Latency is a competitive advantage. I’m sure Google has some algorithm that shows how much money they waste for every millisecond of latency, and a lot of that latency is going to be network related. And if that is the case, then this is, certainly for their internal operations, very important. So now you have a large set of companies that have a specific interest in programmability, and they’re putting pressure on the industry at large for programmability, and that’s happening now.
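
To show how a latency-to-dollars estimate like the one Martin imagines might be framed (the talk doesn’t give one), here is a toy calculation in which every number is an assumption.

```python
# Purely illustrative: every number below is assumed for the sake of the sketch.
queries_per_day = 5e9              # assumed query volume
revenue_per_query = 0.01           # assumed dollars of revenue per query
loss_per_ms = 0.001                # assumed 0.1% revenue loss per added ms

added_latency_ms = 10
daily_loss = queries_per_day * revenue_per_query * loss_per_ms * added_latency_ms
print(f"Assumed daily revenue impact of +{added_latency_ms} ms: ${daily_loss:,.0f}")
```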

[21:50] Nick McKeown: Another big thing that’s taking place is that as these systems turn into programmable platforms, big programmable systems where the infrastructure itself is entirely programmable, the software that they’re using is increasingly made up of open source. Open source, which frankly, apart from Linux, was considered a bit of a joke in the infrastructure industry 15 years ago, is now taken as a given: the majority of the infrastructure will be built from open source. ... And this is because a lot of the pieces of the infrastructure are necessary but non-differentiating, so it benefits them to have many, many people from outside their own companies working on those pieces together.

[23:25] Nick McKeown: Cellular networks in the past were like walled gardens — the industry was very protective of the way things were done. But 5G was deliberately designed and architected to feel very much like the rest of the Internet. And the reason people get very excited about 5G is that it’s essentially going to replace a lot of the wireless communications we use today: WiFi, ways of connecting within factories, connecting all of those new IoT devices ... All of that is being built on the same ideas — programmable, disaggregated, low cost, with open source software that is written and owned by the mobile operators. So 5G is becoming software-defined as well. So you get this entire infrastructure, from inside the hyperscalers, all the way out to edge computing, all the way out to 5G — all of this infrastructure is going to be defined by software running on programmable hardware. That is going to change everything, because it’s going to open the floodgates for massive amounts of innovation, which is exactly what we need.