
The Art of Network Engineering
Join us as we explore the world of Network Engineering! In each episode, we explore new topics, talk about technology, and interview people in our industry. We peek behind the curtain and get insights into what it's like being a network engineer - and spoiler alert - it's different for everyone!
For more information check out our website https://artofnetworkengineering.com | Be sure to follow us on Twitter and Instagram as well @artofneteng | Co-Host Twitter Handle: Andy @andylapteff
Making Ethernet Cool Again—The Ultra Ethernet Consortium
With the rise of artificial intelligence and its significant demands on network performance, experts are increasingly asking whether Ethernet can catch up to InfiniBand as the preferred choice for high-performance computing.
Join us as industry veterans Mike Bushong and Dr. Jay Metz share their insights on the necessity of open, flexible, and scalable networking solutions. Discover why many believe that Ethernet could be the platform of choice moving forward. As they dissect the need for innovation, collaboration, and competition within the ecosystem, our guests provide a forward-looking perspective on the future of network engineering.
This engaging discussion ultimately encourages listeners to rethink their understanding of Ethernet's potential, examining the increased complexity required to meet the evolving demands of modern computational tasks like AI and HPC.
For more details about the work the UEC is doing, go to https://ultraethernet.org/
Find everything AONE right here: https://linktr.ee/artofneteng
This is the Art of Network Engineering, where technology meets the human side of IT. Whether you're scaling networks, solving problems or shaping your career, we've got the insights, stories and tips to keep you ahead in the ever-evolving world of networking. Welcome to the Art of Network Engineering podcast. My name is Andy Lapteff and I'm here to tell you that InfiniBand will always be the standard for AI and HPC workloads, and Ethernet will never support the lossless fabrics these workloads require. I mean, it's obvious, guys, right? Just look at the market share that InfiniBand has in AI and HPC workloads.
Speaker 1:I was telling our guest Jay before the episode that I was down at the Frontier supercomputer last year. It's the world's fastest supercomputer, at least it was then, and they are running all kinds of flavors of InfiniBand. There's a little bit of Ethernet for management, but I don't know. If I look at the market, InfiniBand seems to have it, and people like to say don't bet against Ethernet, but I'm just not feeling it. So I have brought some luminaries in here. Mike Bushong we all know. Mike, how are you doing? I'm doing well, good to be here.
Speaker 3:And Jay Metz, or Dr Jay, as I've been instructed by his public relations firm to call him. Jay, why don't you tell the folks who you are quickly and what you do, and why you might be offended by what I said about Ethernet? Oh, offended is not the right word. Pleasantly amused, I'd have to say. So my name is Jay Metz. I am a technical director at AMD. I am the chair of the steering committee for Ultra Ethernet, which is an organization that is pulling together to create a tuned Ethernet for AI and HPC.
Speaker 1:Let's get to the problem, right. What's the problem with Ethernet? I mean, I've had people tell me, well, RoCE v2, and there's all kinds of tweaking and playing you can do, and you need special cards plugged into servers to do that stuff. Like, is it true that Ethernet out of the box, without a ton of work and customization and nerd knob turning, cannot create a lossless fabric that can support modern AI and HPC workloads? Is that a true statement?
Speaker 3:I think that would probably be a true statement. So let's define our terms just a little bit, though, because knowing where you are in the system makes a big difference in whether we're comparing apples to apples or something else. Generally speaking, when we talk about these kinds of workloads, there are effectively three different types of networks. You've got your general all-purpose network, where you've got your LAN traffic, your WAN traffic, oftentimes your storage traffic, and then you've got the actual AI or HPC network, which is a backend network, and that backend network can actually be broken down into two different types of networks as well, which we call scale up and scale out. That gets a little confusing for some people, so if you'll give me two seconds, I can try to explain what it is, and we'll use primarily AI for the examples here, because it winds up being a little easier for people to grok than the specific use cases for a particular HPC kind of thing. Generally speaking, people are more familiar with the ChatGPTs and that kind of stuff. So when I say scale up and I say scale out, what I'm really referring to is the fact that I've got these accelerators, GPUs, TPUs and so on, and I have to put in all the data to get the work done, and I have to get these things to communicate together. I spread out all of these different communications, I send the workload from one to another, to another, to another, and I call that scale out.
Speaker 3:Ultimately, what that means is that the larger these models get, or the more work that has to get done, you want to put more GPUs into the system, so you have to connect them together, and I scale that out. Sometimes the workloads themselves are pretty big and you can't fit them inside of a single GPU's memory, so I need to put multiple GPUs together and pretend it's one huge honking GPU. In order to do that, I've got to network those little suckers together, and I call that a scale-up network. So my scale-up network makes one big, huge honking GPU, and a scale-out network connects all those big, huge honking GPUs together. What we're doing at UEC is creating the scale-out network that provides the interconnection between all these big, huge honking GPUs, and that's the difference in what we're trying to do.
Speaker 3:We're not talking about your general-purpose data center, you know, connecting your home directories to your laptop kind of a thing. That's not the kind of stuff we're talking about. We're talking specifically about a purpose-built network for a type of workload. It just so happens that AI is a type of workload with many subtypes and HPC is a type of workload with many subtypes, and they all have their own little requirements. But we're working specifically on interconnecting a lot of these different backend GPU or TPU accelerators together.
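A minimal back-of-the-envelope sketch of the scale-up versus scale-out distinction described above, in Python. The GPU counts, domain size, and NIC count are illustrative assumptions, not anything UEC specifies.

```python
# Hypothetical sizing math for scale-up vs. scale-out domains.
# All numbers are illustrative assumptions, not UEC specifications.

def fabric_sizing(total_gpus: int, gpus_per_scaleup_domain: int, nics_per_gpu: int = 1):
    """Estimate how many scale-up domains and scale-out endpoints a cluster implies."""
    scaleup_domains = -(-total_gpus // gpus_per_scaleup_domain)   # ceiling division
    scaleout_endpoints = total_gpus * nics_per_gpu                # each GPU NIC is a fabric endpoint
    return scaleup_domains, scaleout_endpoints

if __name__ == "__main__":
    # e.g. 100,000 GPUs pooled 8 at a time into "one big honking GPU" each
    domains, endpoints = fabric_sizing(total_gpus=100_000, gpus_per_scaleup_domain=8)
    print(f"scale-up domains: {domains}, scale-out endpoints: {endpoints}")
```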
Speaker 1:And I was surprised to learn that AI has been around what, 60 years or something like that? I mean, I thought it was a new technology when the whole ChatGPT thing happened, like, oh my God, look at this, this is amazing. But I guess what I'm surprised at, and maybe it's just because of the explosion of LLMs now, is that Ethernet's been around a long time, AI technology, I guess, has been around a long time, but now we seem to be at a crossroads of like uh-oh. I mean, you know, the UEC was created to try to shore up some of the shortcomings of Ethernet so it could support everything you just described.
Speaker 3:Is that fair? Yeah, well, I mean, like everything else, there's scaffolding involved, right? There's the ability and the approach to solving particular problems, but the AI consideration is old, I mean really, really old, and it also has been frighteningly prescient. I mean, if you've ever seen the movie Colossus: The Forbin Project, one of my favorite movies, by modern standards it is very slow and plodding, but it is one of the scariest movies I've ever seen in my life because it is so accurate, and I think it's one of those things that we should probably be taking as a moral lesson and an ethical lesson in the work that we do. I know I certainly take it with me whenever I go into the conversations that we've got about the unbridled passion for the halcyon days in the future. But the concept of AI and how it works has been around for quite a while. What we are trying to do here is actually go underneath that workload process, though, right? We're trying to understand the infrastructure requirements to make that happen, hopefully in a positive way, because it turns out that when you start to look at how these messages get passed back and forth across a network from one device to another device, one endpoint to another endpoint, you start to realize very quickly that the amount of nuance and the variability in all of these different functions is very difficult to control, right?
Speaker 3:So you have to create this sense of flexibility while at the same time creating rigid boundaries inside that allow these kinds of traffic flows to happen unimpeded, and that's where things can get really complicated. But at the same time, what you want to do is create an environment on the network that allows for the rapid free flow of information, quick fixes when there's a problem, and telemetry that allows the devices to handle the problems that happen when they happen, without the need for manual intervention. If all that sounds like the old self-healing networks from the past, there are some parts of that in there, and I think some of us have the battle scars from that. But the principle is there. You want to use the proper telemetry to get the proper tuning up and down the Ethernet stack to get the workload to work properly when you're talking about really large numbers of devices.
Speaker 2:So on the Ethernet side, do you see Ethernet versus InfiniBand, like, is Ethernet adding stuff to try to, I guess, get to parity with InfiniBand? Or do you think that these are fairly decoupled technology streams that will kind of overlap in different areas but are going to pursue their own ends because they can do fundamentally different things?
Speaker 3:Well, in some ways they solve the same problem and in some ways they don't, right? So one of the things that happens with InfiniBand, and let me just say, I mean, I don't want to sit here with a cup of coffee and a sign that says change my mind. I'll leave that role to Andy. I like InfiniBand, I do. I like the technology, I like the way that it has been the gold standard for high-performance networking for a very long period of time.
Speaker 3:And it solves a problem in a very good way. I am not looking to beat or kill InfiniBand by any stretch of the imagination. As a matter of fact, I think for the end user, for the consumer, having options for whatever tool they need to use for whatever job they have to complete is going to be in their best interest. So I'm not looking to kill or defeat anybody in any way, right? Having said that, I do know that there are approaches that have natural limitations, right? So what we're looking to do is solve a problem that has emerged over the last couple of years, not just for AI but also for HPC, where the number of devices just keeps growing. We were talking about 10,000 devices; now we're talking about 100,000 devices, 250,000 devices, because a device now is a different thing, right? We used to talk about initiators and targets and endpoints and cards and switches. We're not talking about those anymore, because every initiator, every target, every switch has multiple endpoints on it. Every GPU has multiple endpoints on it. I mean, all these things are exploding because we're moving the trust boundary of what constitutes an endpoint further and further into the processing core. Now, that means that we have to rethink the entire end-to-end solution, and if I'm going to be talking about a million endpoints, which is what we've been doing for Ultra Ethernet, I have to make sure that those million endpoints are being treated equally across a network that is also being treated equally.
Speaker 3:Now, that means that the approaches we've been using to solve these problems, RoCE, traditional RDMA, the stuff that InfiniBand does, have a difficult time getting to that level.
Speaker 3:I mean, InfiniBand, for example, has a 16-bit LID, which means you only get about 48,000 devices or endpoints that you can put into a single subnet, which means you have to create multiple subnets and route between them, which creates questions of latency and topologies and that sort of stuff.
Speaker 3:That's fine, there's nothing wrong with it, but it does mean that there are some people who want to take the opportunity to say, hey, look, I want to try to do this with a different type of approach, and Ultra Ethernet is the way of doing that, in an open-ecosystem kind of way that allows people to say, hey, look, I can build upon Ethernet. I don't have to create a brand-new proprietary network, I can stand on the shoulders of giants, I don't have to modify the things that people already understand, but I can do the tuning up and down the stack that would otherwise be considered a layer violation and verboten, right? So Ultra Ethernet allows us to take the opportunity to solve some of the problems of scale, scope, expanse and equivalent treatment of the traffic in those kinds of environments for those types of workloads. And so that's one of the reasons why we're trying to say, look, we're just taking a different approach to solve problems that have emerged as the scales have gotten bigger, not to say that, you know, InfiniBand is necessarily our target.
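For anyone who wants to sanity-check the 16-bit LID figure above, here is a small worked sketch. The reserved and multicast ranges used are an assumption based on common descriptions of InfiniBand addressing, so treat the exact numbers as approximate.

```python
# Rough illustration of why a 16-bit InfiniBand LID caps a single subnet
# at roughly 48K-49K usable endpoints. The unicast/multicast split below is
# an assumption for illustration; consult the InfiniBand spec for exact ranges.

LID_BITS = 16
total_lids = 2 ** LID_BITS            # 65,536 possible LID values
usable_unicast = 0xBFFF               # commonly cited unicast range, ~49,151 LIDs
print(f"total LID space: {total_lids}")
print(f"approx. usable unicast LIDs per subnet: {usable_unicast}")
# Anything beyond this means multiple subnets and inter-subnet routing,
# which is the latency/topology trade-off mentioned above.
```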
Speaker 1:Did you say one million endpoints is the goal? That's how many endpoints you're targeting, Dr Evil? One million endpoints?
Speaker 3:One million endpoints is our starting point. And the reality of it is that now I'm starting to wonder if that was too small, right? So let me try to explain a little bit why I'm saying that. All right, so from a practical, nuts-and-bolts, rubber-to-the-road perspective, take these models, and we'll talk about large language models, and believe me, those aren't the only kind of models in play here. Large language models are not the same kind of models that you see for video or audio; that's a completely different type of AI in terms of the way the infrastructure works. But we're not even talking about that. We're just talking about relatively good old-fashioned, two-years-ago kinds of LLMs.
Speaker 3:All right, if I look at the number of parameters in an LLM, it started off with 70 billion parameters. Could I fit that onto a laptop? Realistically, no, but I could definitely do it inside of a server. If I move it up a little bit, from 70 to, let's say, 200 billion, that's a little bit different; now I'm talking about a small cluster. If I go to 405 billion, that's even more, and the reason is I've got to fit a model into the memory of these accelerators and they don't fit. It doesn't matter which GPU manufacturer you're talking about, they don't fit. You've got to make these things work together. But the larger the models are, the more you have to do this swapping in and out of memory. That means the network has to be good enough to handle that swapping of data in and out, which also means that the knock-on effect of this is pretty significant as well, right? So I've got my storage network, which is usually on my front-end network, nowhere near my back-end network. My front-end network now has to handle a lot of swaps in and out of my back-end network. So now I've got to accommodate storage in a persistent fashion that gets closer to memory; memory and storage are colliding, becoming very similar. All of these things are happening at the same time, they're all going at the same time, and they all have to be accommodated at the same time. And so I get to a point where I've got a trillion-parameter model, and we're right around the corner from a trillion-parameter model, right?
Speaker 3:We've already seen people talking about it. The new Llama paper just came out with a 405 billion parameter model, and from the conversations I've been hearing, people want massive amounts of data to train on, huge amounts of data to train on. But you're going to have to put it somewhere. You're going to have to move it somehow, because each of these different GPUs has to have all of that information when it needs it, where it needs it, at the time that it needs it, reliably and safely, so that you don't lose data and you don't lose precious nanoseconds trying to move the bits around.
Speaker 3:That's what we're trying to accomplish, and the way that we are addressing it in Ultra Ethernet is that we're looking and saying, hey, look, we've got a lot of flexibility in all of these different places in the stack, right? We've got a standardized approach to solving problems up into the workloads and down into the hardware, and that interface is the network. The network is the compute process for what we're looking to accomplish here. And so all of that means that whether we're talking about AI workloads for LLMs or video or audio, or HPC environments, you can tune the Ethernet based upon the semantics of that requirement in an Ultra Ethernet environment to get the best performance that you need at that particular time. That's the approach we're trying to take, because those workloads are going to require a lot of devices to be able to move that data around.
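To make the parameter-count math above concrete, here is a hedged sketch of the weight-memory arithmetic. The bytes-per-parameter and per-GPU memory figures are assumptions for illustration, and real deployments also carry optimizer state, activations, and KV caches that push the footprint well beyond this floor.

```python
# Illustrative memory math for why large models spill across many GPUs.
# Bytes-per-parameter and HBM capacity are assumptions, not vendor figures.

def min_gpus_for_weights(params_billions: float, bytes_per_param: int = 2,
                         gpu_memory_gb: int = 80) -> int:
    """Minimum GPUs needed just to hold the weights (ignores activations, optimizer, KV cache)."""
    weight_gb = params_billions * bytes_per_param          # e.g. 70B params * 2 bytes = 140 GB
    return max(1, -(-int(weight_gb) // gpu_memory_gb))     # ceiling division

for size in (70, 200, 405, 1000):                          # billions of parameters
    print(f"{size}B params at FP16: >= {min_gpus_for_weights(size)} GPUs for the weights alone")
```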
Speaker 2:I had a couple of, I guess, questions around that. So do you see, like the, I guess, what UEC is doing? So spoiler alert I actually think UEC will be successful. But there's kind of two things you've mentioned and you kind of somewhat cleverly mentioned it, by the way You've got a bunch of technical challenges and you sort of dropped in the open ecosystem comment and it was. I think people might've missed that. By the way. Do you think that UEC goes and succeeds on the back primarily of the technical capabilities or on the back of the open ecosystem part? Like I know it's kind of a 1A, 1B thing. What do you attribute the momentum to? Because I do think there is momentum in the industry around it. But I think, depending on what you're trying to do, you might look at one thing or the other thing and kind of value it more.
Speaker 3:Well, in my position as the chair, I don't have the luxury of second-guessing people's motivations. All of the major graphics processing unit manufacturers are involved and participating, and we're happy to have all of them, and I think that their own particular motivations are ultimately to solve a particular problem. And the truth of the matter is that it's not really fair to assume any one particular person or company or group is going to be able to come up with an answer that's going to solve everybody's problem, right? That's an incredible burden to place on somebody in the first place, regardless of how you feel about it.
Speaker 3:I do believe that the approach to solving a problem works when you get an emergence of a similar way of thinking about solving that problem, right? The good teams, the good squads, the good approaches to addressing an issue never come through based upon political strife. At the same time, healthy competition is always useful, right? Competition of ideas, competition of dissent, competition of approaches. All of these things are ultimately going to be good if the end goal is kept clearly in mind, and I do think that having an open ecosystem where people can feel free to contribute, because they've got a vested interest in that outcome, they can have a voice, and they don't have to say, well, I'm going to have to take whatever I get, that's going to be a good thing as well. And I think that's why we have as many different companies inside of Ultra Ethernet as we do, including all of the players that you'd probably expect would want to have proprietary systems.
Speaker 2:I actually think the open ecosystem piece is probably the biggest part of UEC, because we're probably at the first time in networking in the last 25 years where the future is not particularly certain, right? I mean, we were on a 20-year path where it's like, you know, you're going to double the capacity, right? And even if you look at the major network silicon providers, whether it's custom silicon, merchant silicon, whatever, there's a pretty well-worn roadmap where everyone's trying to hit specific targets based on what obviously comes next. I think when you get into AI, the explosion of endpoints, I don't think people really understand the order of magnitude we're talking about and the size of these data centers. To give people just a little bit of background, let's take the US out, because the US is a little bit weird: in other geographies, a 20 megawatt data center is like a good-sized data center, right? A 60 megawatt data center, a 100 megawatt data center, is like a massive data center. People are building gigawatt data centers now.
Speaker 2:When you say it's adding an order of magnitude, I mean, people need to get their heads wrapped around that. It's huge. And when you do something that's that much bigger and, frankly, around a technology where it's not obvious what comes next, I mean, the rate of change is crazy. And so the open ecosystem to me represents optionality. It's like, you know, if you go all in on one particular direction, ignoring whether it's a single supplier or whatever, you're limited to whatever that direction can provide.
Speaker 2:I think when you go in and you say we're going to allow people to compete with their ideas, to maybe be a little bit speculative in their approaches, I think what that does is it provides optionality, and I think that's the value. And then obviously there are some benefits that come out of competition, right? I mean, the number one driver of economic advantage is competition. When people are forced to compete, what you see is a level playing field, and then people have to step up, and I think that's good for everybody. But that open ecosystem piece, I don't know. I think if you did all the technology bits but you didn't have the open ecosystem piece, I think the value prop is more than halved. That open ecosystem, to me, that's where UEC really shines.
Speaker 3:Yeah, and I'm certainly not going to disagree with you there, not just because of the fact that I agree with you. I do think that history has borne you out. There are technologies that I have worked in and still continue to love, but the fewer the players that are involved, eventually you wind up with the last buggy whip maker, right? And quite frankly, even if it's a really good buggy whip, you still need the buggies. There's always going to be room for innovation in that regard. So I think that ultimately, we want to make sure that there is an encouragement of this kind of openness. And it's a really good point, Mike, I think we need to identify what openness means. Openness is one of those terms that is so overloaded now that I don't think people actually quite get what it's supposed to be.
Speaker 3:Openness is the ability for anybody who has a vested interest to participate. That's what openness means. It doesn't mean that you are just given things, and it doesn't mean that you can just throw things out and then people have to take them. It means that there's a marketplace of ideas where you get the opportunity, the chance, to go in front of a broad group of your peers and try to persuade them that your idea is a good one. And that is what open really means. Not necessarily that you are going to have to take something or that you have to give something. It is all about the ability to get that opportunity to put your idea out there and to accept other people's ideas in that marketplace.
Speaker 2:How do you avoid this turning into, I mean, the standards bodies, I think, move at a fairly glacial pace, you know, and AI, I don't think, will tolerate that. And so the question really is, how do you give everyone the opportunity? They compete, everyone has their own sort of perspectives and, in some cases, interests. How do you prevent an ecosystem-type environment from essentially devolving to the lowest common denominator, where the stuff arrives, you know, Sunday after never? How do you handle that?
Speaker 3:That's a very fair question. It's a combination, right? So we talk about the technology, but what you're asking about is the people, right? The thing is that you need to have the combination of the technology and the ideas to put forth with people who don't have a herd-of-cats mentality. They've got to have a vision that they can all agree to and subscribe to. And then there are guidelines and boundaries that we put in play to allow the companies to do the work that they need to do, and it's not easy.
Speaker 3:You know, mediation is often the art of pissing everybody off equally, right? So if you want to create an equal playing field for everybody, nobody winds up being super thrilled, because they can't get everything they want. And that takes communication, that takes constant negotiation and persuasion. That takes the ability for somebody to come in and say, look, these are the rules that we've all agreed to play by, and then you have the referees to be able to enforce that. And I will say that UEC is actually very good at doing that.
Speaker 3:We've got very strong leadership. Not every standards body does, but we have a very strong group of companies and a very strong group of people who are actually in the technical advisory committee, which is the technical arm for the steering committee, and we have a very good set of chairs for each of these different work groups who are dedicated to the cause. So we have a series of checks and balances that are built into the organization from day one, so that people know exactly what to do when they need to do it, and a long, long list of preparation in order to be able to do that. It's one of those things that is the unsung part of any standards organization. It's all the parts of, well, what do I do next? Who do I have to talk to? Knowing that in advance has solved a lot of problems before they even came up, and then you can actually do the technical stuff. It's knowing the right pit crew for that race.
Speaker 1:And there hasn't been, like, infighting or personality disorders coming to light? Right, like, when you get a group of people... I've talked to Russ White and Radia Perlman in the past about this. I remember Radia, I don't know if it was about Spanning Tree, but she told me a really interesting story about, you know, the egos in the room and how certain people wanted certain things done. So the open ecosystem, to Mike's point, is amazing, right, but I can't imagine all these networking vendors agreeing on anything. They don't have a choice, though, right?
Speaker 2:So I think what's driving this is different than what you see in some of the protocols work. The protocols work initially actually got off the ground pretty quick; the standards bodies didn't move glacially at first. They moved that way when it started getting into some of the more advanced stuff that was maybe more speculative, a little bit more niche. I think when there's forced adoption, when you have this kind of a catalyzing event, someone's going to show up with a solution. If you spend all of your time fighting, no one shows up with a solution, and the person who goes around the end sort of wins out.
Speaker 2:I think the pace of this, honestly, if you look at just the amount of money that's being spent, I mean, there are strong commercial reasons for people to figure it out together. I don't think that the market will wait, and so either people come together and figure it out or they don't. And if you look at the folks in UEC, you can't have that many people involved and have them all in a dominant, incumbent position. So almost by definition, by volume of people, it's in most people's best interest to work it out so they can get a place at the table. I think that's the thing that's different from some of the standards work over the last 15 years or so, and that's why... I'm sorry to interrupt. Go ahead.
Speaker 3:Please continue. No, it's okay, go for it. I think you're absolutely right, because when you look at the scope of the problem, we were talking about tuning from the physical layer all the way up to the software, and with that sheer scope of touchpoints, no one company can do it all. They just can't. There is so much stuff that has to go on that the barrier to entry for any new company is immense, and the barrier to entry for existing companies is immense.
Speaker 3:There's a reason why companies who have never been part of standards bodies are now part of Ultra Ethernet, and it's because it's that big of an issue, right? It's not about iterating on where you are now; it's about what you're going to put out that competes with what you're putting out now. And everybody who is a part of UEC, at some level in the back of their head, I believe, and I'm not a good mind reader, so you're going to have to take this with a grain of salt, but everybody believes that if they're going to be competitive, they've got to find who their allies are going to be on the technology level, both above and below the OSI layer that they're working on, and they can't invent every single nerd knob, as Andy was saying. You've got to work together in order for this to come together, or you might as well just pack up and go home, I think.
Speaker 2:Well, for folks who aren't, I guess, familiar with who's in UEC, you don't have to go through the names, but, you know, Ethernet's in the name, yet it's not like it's just a bunch of networking vendors; as you mentioned, it's that full stack. Can you give people a mental model for how expansive an effort this is? Because I do think this is pretty unique. I mean, the amount of technology that's represented by this group is crazy.
Speaker 3:Okay, so let's just take a very basic infrastructure mental model here. You've got two devices that are connected through a switch. Well, if you're running a workload, you're going to have to have the software, right? So you have to have the software interface into the network on one device, and you have to have a similar software interface on the other side. Then you've got to have the actual ability to identify how you're going to formulate those packets, so you're going to have to understand that in the network architecture, whether it be in a NIC or inside of a chip or something; you've got to have the actual place to put the bits in the format you need. Then you have to go all the way down that stack. So you've got the software, you have the power to run the GPUs and the CPUs, you need to have the PHYs and the SerDes at the network level to connect that onto a wire, you need the cable that goes to a switch, you have to go from that switch to another cable, another PHY and SerDes, off to another network interface card. And then you've got to be able to take that at a high enough bandwidth back into a processor and into the memory. You have the memory component that is tuned to this kind of workload and can handle the type of addressing and synchronization that goes along with the networking stack, and then you have to be able to chimney that back up into the software stack on the other side. All of those different pieces of the puzzle have to exist for one packet to work in one workload at any given point in time. Then you have the issue of what happens if I've got multiple devices and I have to negotiate across that network and across those links and across those wires to make sure that I know exactly what's going to wind up happening in a lot of these different systems, because now I've got to have a traffic cop that goes along with it. Then you have to understand how I'm going to configure all of these different things.
Speaker 3:What are the topology considerations? That's another element, an entire art form all in and of itself, right? That has a lot to do with switching, it has a lot to do with cabling, it has a lot to do with patch panels and so on and so forth. So the cabling folks, the signaling folks, the power folks, they're all part of this as well, for that very reason.
Speaker 3:And then you have the question of, well, where are we going to go? How do we make this forward compatible? Where are we going to go next year? How do I add things into this system over time? Right, because in HPC, for example, we're used to wholesale budgeting in one go.
Speaker 3:Right, everything from your storage to your networks to your compute, it's all in one bill of materials. That's how Frontier was built. That's how anything is built in terms of high-performance computing. That's not how Ethernet networks are deployed inside of regular data centers, your normal everyday mom-and-pop data centers. You've got a budget cycle for your compute, you've got a budget cycle for your network, you've got a budget cycle for your storage, and they're never aligned, right? How do you scaffold into something there? That's a different question. So now I've got power architectures that I have to take into consideration. Now, we don't get into the power stuff at UEC, right? That's not what we do, but we do consider ourselves to be everything up to that point, because we need to consume that power and distribute it appropriately across the system, which means we can't do this in a vacuum. We have to understand what the consequences are, and all of these things have long-term consequences that are going to affect a lot of different companies.
Speaker 2:Does the scope of that become, I guess, a risk?
Speaker 3:It's always a risk. I mean, so what's happened? We've always come up with more interesting, clever ways of solving problems. We don't have enough room inside of the memory for our GPUs, so we create parallelism, right? We create different ways of handling it. But that's not a cure-all, that's not a panacea for the problem, because when you create parallelism, you introduce other types of problems. You create overhead, you create additional latency between these different parallel pieces. So you've got pipeline parallelism, you've got data parallelism, you've got tensor parallelism. They all have their own tradeoffs.
Speaker 3:It's all about mitigating those tradeoffs, and anybody who comes up with a better way of mitigating is going to be successful for the short term.
Speaker 3:So I think that we're going to eventually have power problems. We're just not going to be able to power a 20 trillion parameter model, so we're not going to have 20 trillion parameter models. We're going to have to come up with some other way of addressing that need, just because of the fact that there's not enough metal to bend in order to make that happen, let alone the nuclear power requirements that go along with it. I think that we try to solve the problems that we have in our hands and we try to see what we need to be able to do. Our focus right now is to say, if we were to get to that point, from a network perspective, what are the problems that we have to solve and what are the things that we can control? And that's ultimately what we're focusing on, because we don't want to lose the scope of our own abilities by spreading ourselves too thin.
Speaker 1:With the insane growth of everything you just said in AI, is there a finish line for the UEC? Is this an effort that's never going to end? I almost envision, you come up with IPv6 and we think, oh, we'll never run out of addresses, and then something happens, like, oh crap, we ran out of addresses. I mean, the Ethernet the UEC is building, do you think that'll be good for a very long time? Or could we hit a wall of like, uh-oh, we didn't foresee this other explosive thing happening, and now we're up to 10 bajillion endpoints. Whoopsie.
Speaker 3:Okay, so there's a couple of different ways to answer that question. I believe that there are enough problems with physics right now that are yet to be solved that there is a very long runway for work that can be done inside of Ultra Ethernet. We've had to deliberately put a pause on a number of the things that we want to do because they're just, quite frankly, outside the scope of our bailiwick. That's why we're working with organizations like SNIA and OCP and IEEE, because we don't want to be an island, right? We want to work very closely with a lot of these other organizations and ecosystems, because they're solving the problems that are going to affect what we're doing, and what we're doing is going to affect them.
Speaker 3:So the way I see it, and the way that I've been approaching the leadership of Ultra Ethernet, is that we have a job to do, but we are not everything, right? You really need to understand your role in the world, and underestimating your role is equally as dangerous as overestimating it. You really need to understand where you fit so that you can make the best possible contribution, not only to what your members are doing, but to what the industry and the consumers are doing. Because if you make the best buggy whip, again, and no consumer needs it because somebody's figured out a better way to do it and you weren't paying attention to the industry, well, that's your fault. So I'm trying to take that bigger-picture perspective on the way that Ultra Ethernet works. I think that, ultimately, the problems that are being resolved in storage and memory, addressing and topologies, all of these different things that we're not really focusing on now, are going to be extremely important in the future. So, as far as I can tell, nobody's come up with a roadmap of problems that has an endpoint, and as long as there are problems that affect what we're looking to do and the partners that we've got in our ecosystem, I think we're going to be around for quite a while.
Speaker 1:That was my take on it. I don't think the UEC is going away anytime soon; it seems like it's going to be an ongoing effort for a long time.
Speaker 3:Yeah, and I'd like to get 1.0 out before we start talking about shutting it down.
Speaker 1:Yeah, yeah. When's the... you have a v1 coming up? Are you allowed to say when that is?
Speaker 3:Yeah, we're anticipating likely by the end of Q1 of this year, 2025. Like I said, we've got a lot of people. I think we've got 1,400 or 1,500 individuals now, about 120 companies in a little over a year, and we have eight different working groups, not including the technical advisory committee and the steering committee and the marketing committee and all that kind of stuff. But there are a lot of people working very, very hard on getting this thing out as quickly as possible.
Speaker 1:I was just looking through your working groups on the website. It's amazing. I didn't realize you had a group working on the physical layer, another group on the link layer, another on transport. So you're just breaking the problem into manageable pieces and putting some of the smartest people on each piece.
Speaker 3:Yeah, well, that's a really good point, because networking people have thought about networks with this model for so long they've even forgotten why it was there. The problem is that when we make changes to the link layer, you go into 802.1Q; changes to the physical layer, 802.3. There's a limited set of people involved in each of these different problems, and they're trying to solve a very specific problem in a very specific, constrained boundary. There's not a lot of discussion about what the consequences are once you start sending packets up and down that stack, right? You encapsulate it and you're good to go. Anything else is considered a layer violation, and the end user has to deal with it. All right, well, let me give you a really good practical example from storage. It also affects AI and HPC, so bear with me for a second.
Speaker 3:So we have priority flow control, which is a way of putting a pause frame on a link between two devices in order to basically keep things in order. You want to make sure that if you have no way of getting to your destination, you put a pause on the link so that the packets can remain in order, and then you don't have to do any reassembly on the other side. This was useful for RoCE, useful for RoCE v2, useful for FCoE; they all require in-order delivery. Now, the problem is that it works really, really well if you have a good understanding of your traffic type. If you have a good understanding of your fan-in ratio and your oversubscription ratios from the initiator to the target, you're okay. The problem is if somebody said, hey, look, this works really well, I can have lossless traffic all the way across, not realizing that their fan-in ratio was off the charts, and that created head-of-line blocking that would cascade across the network, right? So you had to treat your lossless traffic very differently than you treated your lossy traffic. Now, that means that once you're trying to solve that particular lossless problem in a large-scale environment, you can't use the same techniques and expect the same type of results. Sad but true.
Speaker 3:Where this went really off the rails about 10 or 12 years ago was when they tried to put iSCSI traffic onto a lossless environment. Now, Fibre Channel has an oversubscription ratio of about 4 to 1 to 16 to 1, depending upon the application. iSCSI had a 400 to 1 oversubscription ratio. So they were jamming 400 different links into one target, and it was causing all kinds of head-of-line blocking problems with iSCSI once it went outside of that single switch. So you take a solution that was really good for a well-defined and well-understood problem, you try to extrapolate that to what it wasn't designed to do, and you're going to have all kinds of issues.
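As a purely illustrative rendering of the fan-in point above, the sketch below compares offered load against a single target link at different oversubscription ratios. The link speeds are assumptions, and real PFC behavior is far more nuanced than a single ratio, but the trend is the point.

```python
# Illustrative fan-in math: why 400:1 oversubscription breaks a lossless
# (PFC-paused) design that was workable at 4:1 or 16:1.
# Link speeds are assumptions for illustration only.

def offered_vs_capacity(initiator_links: int, link_gbps: float, target_gbps: float):
    """Return the oversubscription ratio and the fraction of a full burst that must be paused."""
    offered = initiator_links * link_gbps
    ratio = offered / target_gbps
    # If every initiator bursts at once, this fraction of the traffic has to be
    # held back by pause frames, and each pause backs up into upstream switches
    # (the head-of-line blocking cascade described above).
    pause_fraction = max(0.0, 1.0 - target_gbps / offered)
    return ratio, pause_fraction

for links in (4, 16, 400):
    ratio, pause = offered_vs_capacity(initiator_links=links, link_gbps=25, target_gbps=25)
    print(f"fan-in {links}:1 -> oversubscription {ratio:.0f}:1, ~{pause:.0%} of a full burst paused")
```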
Speaker 3:So as we start to do that with large-scale Ethernet, AI, HPC, you start to realize, hey, I can't do that. I need to understand what's going on: the link layer is going to affect the transport layer, what goes on in the transport layer is going to affect the software layer, and so on and so forth, right? And so what we're saying is, I don't want to change the Ethernet structure, right? What I want to do is tweak it so that what goes on above and below is aligned for that type of traffic. So if I have reliable unordered delivery, I want my Ethernet to do that. If I have reliable ordered delivery, I want my Ethernet to do that. And if I want to do idempotent operations for HPC, I want my Ethernet to do that. But I've got to change that all the way up and down the stack, and I've got to get the error messages to go back, I've got to get the codes to go back and forth between the link layer and the physical layer. That's not in Ethernet right now. That communication does not exist natively or mandatorily, right?
Speaker 3:So we're trying to make sure that anybody who puts together an Ultra Ethernet environment knows that we've done the thinking about saying, okay, the link layer and the transport layer and the software layer have to align this way. If you're going to be using this type of AI, for example, you want this type of congestion control. If you're going to have really large systems with heavy incast, you may want to have receiver-based congestion control. You may even want to put trimming inside of the switches, but you don't have to do that, though you should know why. We're doing that heavy lifting for you so that you can say that in this environment, these are the kinds of things you're going to be doing.
Speaker 3:We've got the compliance and performance and test work groups to help say to an end user, this is why we do what we do and what we're recommending, and how you can be sure that you're actually compliant in this kind of environment. We're trying to provide all of these tools, not just for the vendors but also the end users, to understand why we're making those layer violations and making things work the way we are, because it's tuned specifically for the back-end network of a type of workload. Hopefully that made sense and wasn't too much of a ramble.
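Here is a hypothetical sketch of the "tune the transport to the workload semantics" idea described above, mapping delivery modes to a profile choice. The mode names, knob names, and mapping are invented for illustration and are not UEC 1.0 terminology.

```python
# Hypothetical profile selection: delivery semantics -> transport knobs.
# Names and mappings are invented for illustration; they are not UEC spec terms.
from dataclasses import dataclass, replace

@dataclass
class TransportProfile:
    ordered: bool
    reliable: bool
    idempotent_ops: bool
    congestion_control: str

PROFILES = {
    "reliable_ordered":   TransportProfile(ordered=True,  reliable=True, idempotent_ops=False,
                                           congestion_control="sender-based"),
    "reliable_unordered": TransportProfile(ordered=False, reliable=True, idempotent_ops=False,
                                           congestion_control="sender-based"),
    "idempotent_hpc":     TransportProfile(ordered=False, reliable=True, idempotent_ops=True,
                                           congestion_control="sender-based"),
}

def pick_profile(workload: str, heavy_incast: bool) -> TransportProfile:
    """Choose a profile; prefer receiver-based congestion control under heavy incast."""
    profile = PROFILES[workload]
    if heavy_incast:
        profile = replace(profile, congestion_control="receiver-based")
    return profile

print(pick_profile("reliable_unordered", heavy_incast=True))
```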
Speaker 2:I think it's good. The question I have: so when you do that, when you break the layers, you can do that in, I guess, a couple of different ways. You can say that we're going to bound this by a reference architecture for a specific use case, and so everything is, let's say, hardwired to work a certain way, and the integrations are done before the things are even deployed. The other way to do it, at least for parts of it, is to add some orchestration layer over the top and say some of that is configurable, because these devices are deployable in different areas. And, you know, today front-end and back-end networks are fairly distinct, and I think there are questions over time: do people reuse different devices, and where's the boundary? You've already talked a bit about some of the storage implications. Do you think that these will be, I guess, architecturally defined, or is there an orchestration requirement that comes in over the top to handle how these things come together?
Speaker 3:There's a third option, right, and that is to make the actual packet and message delivery system a lot more flexible and dynamic. So the way that we're approaching it is kind of navigating between that Scylla and Charybdis of a full-scale proprietary stack, which is very rigid, or an overarching software orchestration layer, which is very slow. What we're looking to do is say, hey, look, each of these different messages has to have equal treatment across the network, but we don't want to keep state across a million nodes, or the system that's going to require that. So we're not going to have it. We have a stateless infrastructure.
Speaker 3:So what we do is we create transactions for each individual flow where the address information of the final destination of the memory location is built into the packet itself.
Speaker 3:And that means that I'm going to set up a transaction.
Speaker 3:I can immediately, with no slow start, send off this packet, and once that transaction flow is done, each packet, each message is identified, the message has its own ID, and the destination can do the reassembly of that message in that transaction and close it down without having to maintain state across the network.
Speaker 3:It's an incredibly flexible approach, because each of these different transactions has its own semantic requirements based upon the workload that you're running, which means you can actually run different types of packet delivery at the same time, because it's all addressing that married semantic layer but not tied to it to the point where every single packet, every single message has to be that way. And that allows us to do some incredibly flexible things with equal-cost packet spraying and the ability to talk directly into the memory locations at the other end, while also maintaining the congestion control notifications that go back to a sender, so the sender itself can have a lot more control over which path is supposed to be taken for the next flow. It makes it incredibly flexible.
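Here is a very simplified, hypothetical rendering of the stateless-transaction idea Jay describes: each packet carries the transaction, message, and destination-buffer identifiers it needs, so only the receiver reassembles and no switch keeps per-flow state. Field names and sizes are illustrative assumptions, not the actual UEC packet format.

```python
# Hypothetical sketch of receiver-side reassembly for self-describing packets.
# Field names are illustrative only; this is not the UEC wire format.
from dataclasses import dataclass, field

@dataclass
class Packet:
    transaction_id: int
    message_id: int
    seq: int            # position of this chunk within the message
    total: int          # total chunks in the message
    dest_buffer: int    # destination memory location carried in the packet itself
    payload: bytes

@dataclass
class Receiver:
    partial: dict = field(default_factory=dict)   # (txn, msg) -> {seq: payload}

    def on_packet(self, p: Packet):
        """Reassemble per message at the destination; the network keeps no per-flow state."""
        key = (p.transaction_id, p.message_id)
        chunks = self.partial.setdefault(key, {})
        chunks[p.seq] = p.payload
        if len(chunks) == p.total:                            # message complete
            data = b"".join(chunks[i] for i in sorted(chunks))
            del self.partial[key]                             # transaction state closed out
            return p.dest_buffer, data                        # deliver straight to the target buffer
        return None

rx = Receiver()
for seq, chunk in enumerate([b"hel", b"lo"]):
    delivered = rx.on_packet(Packet(1, 7, seq, 2, dest_buffer=0xBEEF, payload=chunk))
print(delivered)  # (48879, b'hello') once the final chunk arrives
```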
Speaker 2:You mentioned ECMP. Are you exploring non-ECMP approaches to fan traffic out over all available links?
Speaker 3:Yeah, so a lot of people get... ECMP, it was a mistake for me to put it that way, because the way that we're doing packet spraying is more granular than normal ECMP, and I do have to be a little bit careful because there are some things I'm not supposed to be talking about in great detail before we go for 1.0. But nevertheless, it is a form of ECMP. It is not equal-cost in the sense that we would normally have it deployed in a traditional data center environment. It really has to do with the fact that we have a strong degree of variability in the radix of our links that allows us to keep a fine distribution across many, many, many links with this kind of granularity. So it gives us a higher degree of sprayability without having to go to flow-level dedication of a link, which is what you would get with ECMP.
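To illustrate the distinction being drawn here, below is a toy comparison between per-flow ECMP hashing (every packet of a flow pinned to one link) and per-packet spraying across all available links. The hashing and round-robin policies are simplifications for illustration, not the actual UEC load-balancing scheme.

```python
# Toy contrast: per-flow ECMP hashing vs. per-packet spraying.
# Policies are simplified for illustration; not the UEC load-balancing scheme.
from collections import Counter
import itertools

LINKS = 8
flow = ("10.0.0.1", "10.0.1.2", 4791, 4791)      # one elephant flow (5-tuple-ish key)

# Per-flow ECMP: every packet of the same flow hashes to the same link.
ecmp_usage = Counter(hash(flow) % LINKS for _ in range(1000))

# Per-packet spraying: packets of the same flow are distributed across links
# (round-robin here); the receiver reorders, so ordering is not a constraint.
spray = itertools.cycle(range(LINKS))
spray_usage = Counter(next(spray) for _ in range(1000))

print("per-flow ECMP:   ", dict(ecmp_usage))     # all 1000 packets land on one link
print("per-packet spray:", dict(spray_usage))    # ~125 packets on each of the 8 links
```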
Speaker 2:Well, and the failure domains, going back to the previous part of the discussion, and Odysseus would be proud, by the way. When you, I guess, navigate between an orchestrated outcome or sort of a hard-coded, pre-deploy outcome, you also change some of the failure domains, which I think is nice. You know, having worked on scheduled fabrics in the past, let's say pre-current technology, and watching entire data centers go down, I think having something that's a little bit more tolerant of different types of workloads is pretty good. I have, I guess, to put a bow on something we started earlier... we spent a lot of time talking about all the reasons UEC might fail.
Speaker 2:If Andy's going to change his mind, maybe... is there a particular reason, something you've seen, where you look at it and you say, this is why UEC is going to succeed? I'm not looking for the secret sauce, but what's a thing that you look at and you're like, you know what, that doesn't give me hope, that gives me confidence?
Speaker 3:I love that question.
Speaker 1:You got Bushong right there. I love that question.
Speaker 3:It goes back to something you said, Mike, so I'm going to take that and turn it around just a little bit. I probably spend close to 20 hours a week on UEC, and it's a part of my job, but it's definitely the biggest part of my job.
Speaker 3:In the last two years, I have watched UEC go from six companies trying to solve a very specific type of problem, where there was a period of time when people were like, you know, these are six companies with very big egos and we're not going to see these guys agree on anything, to 115 companies, with people proudly displaying the UEC logo at Supercomputing or OCP or something along those lines.
Speaker 3:When I sit in on the meetings, it's spirited, to Andy's point, but they all genuinely believe in the value of the outcome.
Speaker 3:So the one thing that I have seen, and I've been part of these standards bodies for a very long time, the thing that's going to make this succeed is just the level of sheer will in the people who are putting this together. This is something that they're excited about, they're passionate about. You know, passion derives from the Greek word for suffering, and sometimes that's exactly what happens when you deal with a passionate person. But all joking aside, they do fervently believe that they're doing something that is going to solve the problem that is going to vex a lot of people in the very near future, and they're working nonstop on it. So I think, ultimately, I have to say it's the people. The people that I've been working with, they're not resting on their heels, they're not cooling their jets, they're going full bore on this, and I mean from all companies. It's not just two or three putting in the work, it's a lot of them, and that kind of self-motivation is unmatched in anything I've ever experienced.
Speaker 2:Let me pile on that, and then I'll maybe kick it back to Andy to see if we've moved him a little bit from his starting position. I was involved a lot in the OpenDaylight stuff, and OpenDaylight came out of the gates really strong, a lot of interest. Ultimately, I don't think you saw the volume of deployments that people had hoped. I think it was instructive to the broader SDN movement, but outside of a couple of different OpenDaylight distributions, it wasn't the huge deployment success that people had hoped for. I think what's different here is that there's an immediate, very acute need, and I think when you take all those people, what unifies them is that there's a very tangible thing, very concrete, that has very real business drivers behind it. I think that's the thing that OpenDaylight didn't have. It was a little bit of theory; it was this idea that there was a better way of doing things, but it wasn't immediate, it wasn't this real kind of central need.
Speaker 2:I think what you have here is a very strong need. It's being driven by a bunch of big players, but it's not only the big players, and I think that you've got some technology milestones that are forcing, look, it's got to be deployable by, and then they pick a date, right? I think when you put those two things together, that's why you see success. You know, if necessity is the mother of invention, I think we've got the necessity side, and I think what it's doing is driving a lot of the invention, and we're seeing that, and that's how you break ties, that's how you remove the standards-body slowness. I think all of that comes together, and that urgency, to me, is the thing that's different this time. So, Andy, I don't know if that moves you, but you opened with questions. I don't know where you're at now.
Speaker 1:All right. So before I bring down my decision and bring the gavel down, I do have one question left for Jay and then maybe a comment. So you said the V1 is going to come out in Q1 of 2025. So I guess my question is what will that information look like? So I'm a network person and I manage networks and you're completely revamping Ethernet from the physical layer on up. How does a network engineer who has worked with traditional Ethernet all this time internalize, learn and be able to support and deploy what the UEC is doing? Is this going to be an 800-page white paper that I have to memorize? How are we going to help people take what you're building and learn and deploy it? Does that make sense?
Speaker 1:1,600 pages, dude.
Speaker 3:Come on, you're right. Well, okay, yeah. So we were already planning how to help educate people on this because, like I said, there are an awful lot of nerds involved, and, quite frankly, there's a lot of stuff in there that no one person has the depth of background to absorb in one sitting. There's stuff for firmware developers, there's stuff for HPC people, there's stuff for AI people, for libfabric, for storage. There are just a lot of things with nuances that are not universally understood. So the spec itself is going to be public. We're not charging anything for it. You'll be able to download it from the ultraethernet.org website and read everything yourself. We're also going to open up for public comments and feedback, for error corrections or revisions or possible future ideas, so there's going to be a way for people to provide feedback into the organization. At the same time, we're already starting.
Speaker 3:I've asked the chairs of the different work groups to, in their copious free time, start thinking about how to educate people on the work of their own particular projects, because there are a lot of them; there are a lot of independent projects going on in each of these. And we've got what we call the marketing committee, but that's really just the communication group. It's the one designing the white papers, the presentations, the seminars, the webinars, those kinds of things that will help people get a little better understanding of how to deploy Ultra Ethernet. And then we're also offering the vendors themselves, the members of Ultra Ethernet, any kind of assistance they need with their own materials, for getting their piece of the puzzle out and saying this is what we're doing and this is how it works with Ultra Ethernet, because there are so many different moving parts that any one particular company may only have a small part, or they may have a large part, and we're offering all kinds of support for making that message as consistent and clear as possible for them.
Speaker 3:Some of those companies are rather large and some of them are rather small, but we've got a good spread of contributions from all of the big names you've probably heard of. So we're already starting to plan a campaign of understanding, giving people as much information as possible so they can make informed decisions. They may still wind up going with InfiniBand, and that's perfectly fine, but we want to make sure everything is out in the open for people to understand how this stuff works and to ask the questions they need to ask. We're going to be doing an awful lot of integration with other organizations: OCP, SNIA, IEEE, OFA. The OFA puts together the libfabric stuff, so we expect a lot of joint announcements, presentations, and educational material to be coming forth.
Speaker 1:Awesome. Will Ultra Ethernet run on existing hardware, or is this going to require new hardware?
Speaker 3:Yes. So there's really only one mandatory thing you have to do in order to be Ultra Ethernet compliant, and that's the transport layer. And since most of the deployments we expect for Ultra Ethernet are going to be DPU- or NIC-based, basically a server NIC-based approach to the transport, we don't anticipate that being too difficult, because you won't have to change your switches. It'll fit inside existing GPU clusters; we don't expect anything in the Ethernet infrastructure to have to change. Obviously, once we start to go into silicon spins, anything that involves Ultra Ethernet-based trimming support or the physical layer modifications is a different story, but those are optional; they are not mandatory things you have to have in order to be compliant. When it comes out, you'll be able to use your existing infrastructure for your environments, or you can wait for the new hardware. But obviously there's scaffolding that has to happen anyway, so we're trying to make it as compatible as possible.
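To make that deployment split concrete, here is a minimal sketch, assuming hypothetical device names and a made-up uec_transport capability flag; this is not UEC reference code or a real vendor API, just an illustration of the point that the mandatory transport lives in the server NIC or DPU while existing Ethernet switches can stay in place.

```python
# Hypothetical illustration only -- not UEC reference code.
# Models the claim from the discussion: the mandatory Ultra Ethernet piece is the
# transport, expected to run on the server NIC/DPU, while the existing Ethernet
# switch fabric can stay as-is (optional features like trimming aside).

from dataclasses import dataclass


@dataclass
class FabricDevice:
    name: str
    role: str                    # "nic" or "switch"
    uec_transport: bool = False  # assumed capability flag, illustrative only


def audit(devices: list[FabricDevice]) -> None:
    """Print which parts of an existing GPU fabric would need changes."""
    for d in devices:
        if d.role == "nic" and not d.uec_transport:
            print(f"{d.name}: needs a UEC-capable NIC/DPU (mandatory transport lives here)")
        elif d.role == "switch":
            print(f"{d.name}: existing Ethernet switch can stay (only optional features affected)")
        else:
            print(f"{d.name}: ready")


if __name__ == "__main__":
    audit([
        FabricDevice("gpu-node-01 nic", "nic"),
        FabricDevice("gpu-node-02 nic", "nic", uec_transport=True),
        FabricDevice("leaf-switch-01", "switch"),
    ])
```

Running the sketch would flag the first node's NIC for an upgrade while leaving the leaf switch untouched, which mirrors the point above that the required changes sit at the endpoints rather than in the switching infrastructure.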
Speaker 1:So I said I had one question, which is a lie because I just asked two, but I will end with just one comment, which is about, I guess, the glacial pace that networking seems to move at, right? I mean, if IPv6 adoption or network automation over the past 20 years is any indication, our adoption rates have been lackluster. I'm guessing, and I don't know how you guys feel about it, but it seems like there's a strong financial incentive to support AI and HPC workloads, right? Like, this is the thing we all have to do. I'm wondering if that will push us faster than we traditionally move in networking. Does that make sense?
Speaker 2:I think it's the great tiebreaker. I think it's a good way to put it.
Speaker 3:Yeah, I mean, the thing is, remember, we're also talking about a backend network, right? We're not talking about a general-purpose network where you've got a lot of different workloads to support. You're not going to be doing VLAN configurations like you would in a typical data center. This is for a specific purpose. So, you know, what you're really looking to do is figure out how to connect my GPUs together for AI properly, and I can use this without necessarily disrupting my traditional, glacial pace of networking adoption, if that's what I want.
Speaker 1:I mean, we're all biased, I guess, in one way or another, and I just see that the market share they have. But then, after hearing everything that the UEC is working on and I'll be honest with you, probably 15 to 20% of what you said I think made sense to me and that's a compliment to you, because no, but you and the UEC are just such a brilliant group of folks who are working on such an important thing at such deep levels, like just when I saw all those working groups and you broke it down in the levels, I'm just blown away at what you're doing and it seems real to me. I told you before the recording I'm like, oh, the UEC, they've been doing this for years and we're still waiting on a spec. I mean, they've been doing this for years and we're still waiting on a spec. I mean a real cynical, shitty kind of thing to say to the man who's chairing the thing. So that's why I didn't say that. And here we are, and I'm saying it on the record, but that's right, but that's how I felt Right and I don't, you know, my mind has been changed here.
Speaker 1:I mean, the work that you're doing on Ethernet, I believe this is going to be the way of the future. I don't see how one company that owns InfiniBand can retain its stranglehold on AI and HPC forever. There's just too much growth there, there's too much revenue to be had, and we all know Ethernet. So updating Ethernet makes a hell of a lot more sense than us all trying to figure out something else. I guess Mike was right. I heard him once say don't bet against Ethernet, and at the time I was like, yeah, okay, pal. But once again Mike Bushong is right. I have been proven wrong and my mind has been changed. So that's it.
Speaker 3:The gavel has come down.
Speaker 1:That's why we're here.
Speaker 3:Jay, thank you so much for your time and all your efforts.
Speaker 1:I feel like we could have spent days talking about this. Maybe we can have you back on someday. I didn't ramble enough? You want more? There are just so many rabbit holes we could have gone down, and just in the interest of time we didn't. But the technical stuff has been really fascinating. Thanks so much for your time, and thanks for all the work you're doing. I can't wait to see the v1 spec, all 1,600 pages of it. Mike, always a pleasure. Thank you so much for being here and for your insightful questions, as always.
Speaker 1:You can find all things Art of Network Engineering on our link tree, that's linktr.ee forward slash artofneteng, most notably our Discord server, It's All About the Journey. We have about 3,500 people on there now. It's a community; if you don't have a community, it's one you could try out and hop into. We have study groups spanning all kinds of vendor certifications and different technologies, and in Q1 of 2025 we'll probably have an Ultra Ethernet group in there of folks talking about all the things that, as network engineers, we're going to have to learn, figure out, and deploy. Thanks so much for listening, and we'll catch you next time on the Art of Network Engineering podcast. For links to all of our content, including the AONE merch store and our virtual community on Discord called It's All About the Journey, head to our link tree. You can see our pretty faces on our YouTube channel, The Art of Network Engineering; that's youtube.com forward slash Art of NetEng. Thanks for listening.