The Art of Network Engineering

Networking for AI: Why Every Network Engineer Should Pay Attention in 2026

Andy and Friends


Runtime: 45:33

AI infrastructure is reshaping networking faster than most engineers realize.

In this episode, Andy Lapteff sits down with Scott Robohn to discuss why traditional network engineers should start paying attention to AI networking, GPU infrastructure, and the massive changes happening inside modern data centers.

They explore:

  • Why AI workloads are changing networking requirements
  • GPU networking and lossless Ethernet
  • Ethernet vs InfiniBand
  • The rise of NeoCloud providers
  • Co-Packaged Optics (CPO)
  • Ultra Ethernet and emerging transport technologies
  • Why AI infrastructure behaves like “one giant computer”
  • How network engineers can start learning this technology today

This conversation cuts through the hype and focuses on what actually matters for engineers trying to stay relevant as AI continues transforming the industry.

Whether you're a traditional network operator, automation engineer, architect, or someone curious about the future of networking, this episode will help you understand where the industry is heading next.

Guest: Scott Robohn
https://www.linkedin.com/in/scottrobohn/

Connect with The Art of Network Engineering:
https://linktr.ee/artofneteng


This episode has been sponsored by Meter. 

Go to meter.com/aone to book a demo now! 



This is the Art of Network Engineering, where technology meets the human side of IT. Whether you're scaling networks, solving problems, or shaping your career, we've got the insights, stories, and tips to keep you ahead in the ever-evolving world of networking. Welcome to the Art of Network Engineering podcast. If you didn't know, my name is Andy Lapteff, and I'm sure you have seen and heard this gentleman, Mr. Scott Robohn. Hi, Scott. Hi, Andy. How are you? I'm better now that I'm with you. I love seeing you. No, I mean it. There are certain people I just really enjoy, and you're one of those. I gotcha. Yeah, I love your vibe, man. It's all about the vibes. So we are in person. We are both attending NFD 40 (Networking Field Day 40), correct, tomorrow here in San Jose. I will be a presenter and you will be a delegate. I will be. Opposite ends of the table. That's right. Is this any kind of weird conflict? Are there fiery darts coming from the delegate side? Be ready, man. Well, I'm getting in and out, so you won't have time to dart me. I will tell you, all the delegate chatter has been, "I gotta prep. I gotta prep. I gotta be ready for all the vendor presentations." I have heard more people talking about prep for this than I think I have ever heard, which has caused me to get off my butt and try to do some prep too. I don't think I prepped at all when I was a delegate. I think there's a really good lineup and it's packed. It's a high-quality lineup and it's gonna be three full days, so people just want to be ready. That's a lot. It's people like Jason Gintert and Pete Welcher and Vince Slindro, who, you know, are gonna bring it. Well, that's great. I love hearing that there's prep. It probably means better engagement and more questions coming from the delegates, and as you know, that's what it's all about: if the delegates are asking questions, you're doing well.
If they're not asking questions, if they're asleep, this is not working. That's not what you want. Yeah. Well, I am thrilled to be here. I'm so happy to have you back on the show. Oh, by the way, I need a T-Nop sticker. You see my little thing over there? I'll hook you up. I need a couple. For some reason, I don't have T-Nop stickers. We have other Network Automation Forum stickers that you don't have as well. Yeah, there's my NAF frame. I will hook you up. I don't have them with me, but I might be in Philly next week. I'm going to AC5. I saw it. Yeah, yeah. I'm so excited. Dude. I know, it's gonna be great. All right. So what are we talking about here? Everything seems to be all about AI, right? I know. Is it that bad? Is it that bad? You know, I kind of oscillate, vacillate, what's the right word? Some days I'm like, this is amazing, this is really cool, I love it, because I'm using it in my everyday life and I'm able to do things with coding that I couldn't do before. I'm also learning how to automate some things for myself so I don't have to do all this manual tedium. For instance, tomorrow there is a release for the Art of Network Engineering. I still have to do the thumbnail, I still have to write the descriptions, I have to do social media, I gotta write a blog. Sounds like workflows that could be automated, yeah. Yeah, I'm just thinking out loud here. Exactly. And I've looked into how to do that. I'm heavy into Claude now, at the recommendation of a mutual friend of ours who's Mr. AI guy. So why am I babbling? I really, really dig the level at which this stuff has become useful. And sometimes I'm tired of everybody talking about it all the time. Sure. AI this, AI that. I looked briefly at the NFD40 lineup for the next few days: a whole lot of AI. I don't know if there's anybody who's not.
You and I have had conversations about this before, and our friend Mike has said it too: how do you separate the noise from the signal, right? I really think that's a great framing whenever I listen to anyone talk about this. I'd better receive some signal. Give me something real. Well, I think the exhaustion comes from the distribution of that broadcast radiation that is hype. Yeah, right. What people want is what's really useful, what's impactful: help me filter out the hype. And I want to be really careful here, because we have a lot of friends who are out there very publicly talking about the art of the possible with this. Yeah. I think that's super useful, and it's not production, right? But it's still what could be in production two years from now or four years from now, or in some time frame after I go through a process where I understand what the guardrails are. What lots of network engineers want to focus on is, okay, what's really real here, and helping them focus on that versus another generated picture and stuff like that. What's behind the hype, you think? This is why I love these conversations: I created an outline and we're completely ignoring it, which is great. I love this. We'll go to the infrastructure piece, but let's answer your question. Yeah, yeah. You've been in the industry longer than I have, with a lot of experience, which I respect enormously. There's the joke about the next big thing, right? It seems like every five or 10 years there's a next big thing. And in our vertical, networking, it was SD-WAN, it was automation, whatever, right? MPLS, I don't know. Is this just what happens in our industry? We need something, we need the hype man, we need the Flavor Flav from Public Enemy.
Somebody needs to be screaming about this thing that you need right now, so we can make money. Or is this different? I do think it's different in that it's worse than it's ever been from a competing-sources perspective. You and I both do real network engineering things and do marketing things, so we have a foot in each world. On the marketing side, you have more competition for people's attention than we've ever had. And now AI is changing that game too, generating lots more content that's vying for many more eyeballs. How do I break through that noise just to get attention and get people focused on the things I want them to hear about? It's too much, it's too much information, right? Yeah, the fire hose of it. I don't think we were built for this. No, I don't think we were built for this. No, absolutely not. A lot of things have come out of certain technologies that I don't think we were built for over the eons we've been around. And it's interesting, for lack of a better adjective, to explore how we are adapting or not adapting to some of these things. And yeah, the amount of noise has just been astounding. So before we go to the infrastructure discussion, do you think it's worse now because... and I don't know if I'm using the term CAGR correctly, maybe it's ARR. These are terms someone smart said to me once. But I think the CAGR, the compound annual growth rate, of AI is like, what, 55%? Something insane like that, right? I think in data center it's eight to ten percent. I think if you're running a successful business, your annual growth rate can be around there. Again, I took one business class in community college. I don't know what I'm talking about, other than what I hear from you and people like Mike.
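Since CAGR comes up here, a minimal sketch of the arithmetic may help, assuming the standard definition CAGR = (end / start) ** (1 / years) - 1. The 55% and roughly 9% rates are the ballpark figures mentioned in the conversation, applied to a hypothetical starting value of 100; the function names are illustrative only.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, annual_rate: float, years: int) -> float:
    """Value after compounding annual_rate once per year for `years` years."""
    return start_value * (1.0 + annual_rate) ** years


if __name__ == "__main__":
    # Ballpark rates from the conversation, applied to a hypothetical base of 100.
    for label, rate in [("AI at 55%", 0.55), ("data center at 9%", 0.09)]:
        end = project(100.0, rate, 5)
        print(f"{label}: 100 -> {end:.0f} after 5 years "
              f"(CAGR check: {cagr(100.0, end, 5):.0%})")
```

At those rates, 100 compounds to roughly 895 after five years at 55%, versus roughly 154 at 9%, which shows how quickly a 55% rate pulls away from a single-digit one.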
But when I saw the rate of growth for AI, it was orders of magnitude greater than anything I've ever seen. I'm wondering, are they related? Is there so much hype because there's so much money, so much growth? I don't know. There's a lot that we have to let play out to see what really happens. I do think there's enough that's transformational about the set of technologies we call AI that we are seeing a real inflection point. It doesn't mean that 100% of it is as impactful as every other piece, but I do think we have superpowers now, right? To do certain things in our own personal workflows, and we're figuring out how to apply that to network operations scenarios. And you don't just let it loose, right? "Here's an agent, go do whatever you want to do." I have to go through a POC, I have to go through very specific trust-building exercises, and I would do that with any new technology, not just an AI-specific thing. Human in the loop kind of thing, right? Make sure that you're in there until it isn't needed. When I've clicked OK without hesitation a thousand times, it might be time to let it loose. Which is the same kind of thing as in automation, right? Yeah, exactly. This is kind of automation on steroids, as I say, right? Yep. I struggled with automation previously because I thought you had to learn Python, and now I am building software in languages I don't know, without the knowledge I'd normally need, because the tooling has gotten so good. Right. It's part of what we'll talk about tomorrow, but there's been a democratization, I think, of software development. Yep. There's this guy, I forget his name, but he's the founder and the guy who runs Cowork at Claude, or at Anthropic, however you'd put it. And the metaphor he used, which I really liked, was... how do you say it, historic? Historic, historically. That's hard to say. I don't stutter, and I just started to stutter.
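The "click OK a thousand times before you let it loose" idea can be expressed as a simple approval gate. This is a hypothetical sketch, not any particular agent framework's API: `ApprovalGate`, `submit`, and `required_streak` are illustrative names, and `human_approves` stands in for however a real tool asks a person to confirm an action.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ApprovalGate:
    """Hypothetical human-in-the-loop gate for an automation agent.

    Each proposed action needs a human OK until `required_streak`
    consecutive actions have been approved. Any rejection resets the streak.
    """
    required_streak: int = 1000
    streak: int = 0
    log: List[Tuple[str, str]] = field(default_factory=list)

    @property
    def autonomous(self) -> bool:
        return self.streak >= self.required_streak

    def submit(self, action: str, human_approves: Callable[[str], bool]) -> bool:
        """Return True if `action` may run; ask the human only while untrusted."""
        if self.autonomous:
            self.log.append((action, "auto"))  # still record what ran unattended
            return True
        if human_approves(action):
            self.streak += 1
            self.log.append((action, "approved"))
            return True
        self.streak = 0  # one "no" and trust has to be re-earned
        self.log.append((action, "rejected"))
        return False


if __name__ == "__main__":
    gate = ApprovalGate(required_streak=3)
    for i in range(4):
        gate.submit(f"push-config-{i}", human_approves=lambda a: True)
        print(gate.log[-1], "autonomous" if gate.autonomous else "supervised")
```

A single rejection resets the streak, so trust has to be re-earned. Once the gate opens, this sketch only keeps a log; a real deployment would presumably keep auditing what runs unattended.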
Historically speaking, that's better. That's good. There you go. He used the metaphor of the printing press. Sure. That it democratized knowledge, yep, and it democratized the ability to create these things called books. And he said something crazy: these microphones wouldn't exist if it weren't for the printing press, because we would never be able to coordinate at such a scale without a shared knowledge base. Now it's democratizing software development. So, I'd say 10 years ago, my mom was a flight attendant, and every month these lines came out, which were all the flights for the month. Then she'd have to go through all this hell and be up all night trying to switch with all her friends. I was in tech, and I thought, there's gotta be a better way. Right. I had met some guy through something, and I approached him because he was some kind of software developer. Sure. I pitched it to him, and I'm thinking, oh, he's gonna steal my idea. Right. And he was basically like, all right, man, this is $75,000 to start. That gets us about three months down the line. Sure. Before the democratization of software development, for people like me, right, you needed venture capitalists and a ton of money, and you'd give over 40% of your IP for that money. Now I can create software: one-person startups. Yeah. Right. So anyway, I'm kind of agreeing with you that the value I see is finally there, because we went from chatting back and forth to agentic. Yes. But you said something I thought was compelling, which I guess is the infrastructure part. There is so much money being spent this year alone. I saw something like $690 billion, with a B, being spent on AI infrastructure alone.
For context, the most the US space program ever spent was in 1966, during the Apollo program, and it was six billion then, which is about 65 billion adjusted for inflation. Yep. So 60 billion is the most NASA ever spent to go to the moon. Sure. We're spending 690 billion in equivalent dollars this year on AI infrastructure. So much money. Yeah. And I don't know anybody building AI infrastructure. So I like how you framed it: there's a lot of money being spent, and I don't know anybody who's actually doing the work. And we're in a world of network engineers being told, hey, here's the thing, pay attention. What should you do? How do we cover that gap? So here's how we cover that gap, Andy. You and I have gone back and forth on this, and I think we had a conversation six or eight weeks ago. It might have been a text thread, as many of our conversations are, where you were saying, yeah, I think we ought to spend some more time talking about networking for GPU interworking. And I say that really specifically instead of just "networking for AI." Say that again. Networking for GPU interconnection. Yep, yep. Right. Clusters of GPUs, right? It's a very specific set of technologies. There might be 80% of the networking spend happening there, but only 20% of networkers will really be touching it. Question: is that because this is all happening at hyperscalers presently, and the hyperscaler networking people would be working on it? Hyperscalers and neo clouds. Neo clouds. Can you define neo cloud? It's a cloud company that focuses on hosting GPUs. Which is different from a hyperscaler. They're not hosting GPUs; those are for them, it's their own stuff. Yeah. A neo cloud will rent you GPU capacity. Correct. Okay. And you've got to be careful, because not all neo cloud providers are the same. They go after different use cases.
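The spending comparison here is simple division, sketched below with the figures exactly as quoted in the conversation (billions of US dollars, not independently verified). The variable names are illustrative.

```python
# Figures as quoted in the conversation, in billions of US dollars.
# They are not independently verified here.
NASA_1966_NOMINAL = 6.0      # NASA's biggest year, in 1966 dollars
NASA_1966_ADJUSTED = 65.0    # the same budget, adjusted for inflation
AI_INFRA_THIS_YEAR = 690.0   # quoted AI infrastructure spend for this year

implied_price_multiplier = NASA_1966_ADJUSTED / NASA_1966_NOMINAL
ai_vs_peak_nasa = AI_INFRA_THIS_YEAR / NASA_1966_ADJUSTED

print(f"Implied 1966-to-today price multiplier: {implied_price_multiplier:.1f}x")  # 10.8x
print(f"AI infrastructure spend vs. peak NASA year: {ai_vs_peak_nasa:.1f}x")        # 10.6x
```

On these numbers, one year of AI infrastructure spending is roughly ten times NASA's biggest Apollo-era year.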
I spent a couple of days, a couple of weeks ago, at a show in New York called the Data Center Dynamics Accelerated Compute Show and got to sit down with six different neo cloud providers. They have lots in common, and they are doing stuff differently too. That, plus hearing people like OpenAI show up and say, we can't find enough capacity for our Codex workloads and our ChatGPT workloads. The data centers aren't coming online fast enough. And that's a function of power, probably, right? It's a function of every supply chain, and that's not gonna slow down anytime soon. We've run up against some real-world constraints: we can't get chips. Right. Because they're all going into memory. Memory is the big deal, right? We can't get memory. It's basically all shared memory across the GPUs. It's one big computer that's all talking to the city. Well, you and I can't get memory. Well, yeah, right. But other companies can. Right. Because they have more buying power. And you see some of these big companies like NVIDIA investing in certain companies so they can get preference for certain supply chain components like memory. I'm fascinated. The chess being played at a strategic business level is fascinating to me. Like, I'm gonna buy a chip company so that we get chips so that we can do the thing. It's really smart. Well, on this in particular, here's the thing that totally slapped me in the face, at the NVIDIA GTC conference a week before that. GTC is the GPU Technology Conference that they hold every year in San Jose, where we are. And boy, San Jose is a lot nicer when there aren't 30,000 people here for an NVIDIA conference. NVIDIA networking rose from 8 billion in 2024 to 30 billion in 2025. And it's not because NVIDIA is going out and trying to sell networking like Cisco, like Arista, like Nokia, like HPE Juniper, right? It's riding in the... what's it called in motor racing? Drafting. Drafting.
The pickup in networking sales for GPU interconnection is just coming along as a natural consequence of all the stuff that needs to be built to meet data center demand. Two things. Ethernet won: in their numbers, Ethernet passed InfiniBand a year or two ago. And, you know, never bet against Ethernet in all this stuff. I had been to the Frontier supercomputer down at Oak Ridge (ORNL), and I knew some people doing HPC stuff. They were all using flavors of InfiniBand. And I'm thinking, everything I've seen is InfiniBand; I don't think Ethernet's gonna win. My superpower seems to be saying dumb things publicly and then being completely wrong. So Ethernet won. Well, you're not alone. So Ethernet won, which I guess I was happy to be wrong about. Right. Here's the cynical thing I want to ask. So, yes, NVIDIA's networking business has skyrocketed. I assumed it would have been on the InfiniBand side. It was not; they have an Ethernet line, Spectrum-X. Spectrum-X, yep, which I think they bought into with Cumulus. And so there's a bunch of cool stuff there, right. Do you think... and you probably can't answer this, and Jensen will probably come find us, and he lives close by. Oh, does he? I figured he'd be in Bora Bora or Hawaii. That's where I'd be if I had Jensen money. What I want to do is test this against you. Has their network business grown by orders of magnitude because they're bullies? Meaning, oh, sure, you can have 10,000 GPUs if you buy our networking too. Or is it because it's integrated? Everybody does this. This isn't a knock at them. Everybody wants to sell all the things. When I was a cable guy, if you were a triple-play customer for Comcast, you were like 180% less likely to ever leave. Correct. So I guess that's a thing in business. There's probably a name for it.
You know that? I don't. But is it rising because NVIDIA is just the hot girl at the party, for lack of a better term? And "your lead times are gonna be different, Mr. Customer, if you don't buy our networking"? Yeah, that's a fair question. I've heard multiple anecdotal reports of behavior like that. Yeah, which doesn't surprise me, and it's not a knock at them. We've seen other vendors do it too, right? I mean, Cisco has done things like that with TelePresence, with IP telephony, right? Hey, buy all the desk sets and we'll give you the networking gear for free. Taking my vendor-guy hat off, and speaking just as a network engineer, what I'm digging at, and it's probably obvious to you, is: is their networking stack better? Or is it part of the integrated solution that they hold over your head, that you have to buy? Two words: reference architecture. They have their set of reference architectures, just like Nokia validated designs or Arista validated designs or Cisco validated designs. "It will work best if you follow our reference architecture. We have tested the hell out of this." Correct. Yeah, yeah. So it's a really interesting play in the marketplace, where that's true, and that's where they decided to put their effort: if you follow this architecture, you'll get better support from us, whether they say it that directly or not. And again, that's not just an NVIDIA thing. Many vendors have had similar plays, right? But those are for very specific use cases that generally go for very large scale, and not everybody is building at that scale. Some of the neo cloud providers I chatted with are saying things like, hey, those reference architectures don't fit everything we're trying to do. So the fact that Ethernet has won helps give options in the ecosystem, where I don't have to have it all.
I could break away from the reference architecture and still make independent choices about the networking. And I think that's a trend that's going to continue. You may pay a price or you may not. Yeah, yeah. As other vendors come along and provide excellent support for this, you start to turn the ship a little bit. Right. So thank you for defining neo cloud. Another aspect of all the AI stuff, I feel, and maybe it's just because I'm getting older and now everything's scary, is that the pace of change keeps increasing. Somebody said "neo cloud" to me a month or two ago, and I'm like, oh God, another term, what is this? Every couple of weeks there's this new thing. So let's circle back: most of the people, quote unquote, building AI clusters, which would be using networking technologies for AI, are hyperscalers and neo clouds. Is that a fair-ish assumption? I've learned over the last few weeks to maybe discriminate away from the hyperscalers. If you think of the large cloud service providers, and I should be careful not to paint with an extremely broad brush, they're really good at standard compute at scale, right? At orchestrating it and providing elasticity for workloads that need to expand. AWS has been very public about that, and, for completeness, so have Oracle Cloud, Azure, Google Cloud, and others, right? But they haven't had their arms around AI workloads the way others have. So just because they're a hyperscaler doesn't mean they have the same agility for AI workloads. They're still figuring it out, and that's led to the emergence of these neo cloud providers.
They've kind of had a clean slate: we're building data centers with GPU interconnects because there's a market for it, and we can do this in a more agile manner than maybe some of the older established players. Those have only been around for 15 or 20 years, but there's already some ossification that's set in, and other people are coming along to move faster and do it better. That's interesting. So folks like you and me, who talk to the network engineering communities, are their Lorax, so to speak: we speak for the network engineers, right? I stole that from Pete Lumbis. I love Pete. I wish he could come on. So how do we speak to our people, right? How do we tell them they really need to pay attention to networking for AI and all the good work that's happening with lossless Ethernet fabrics and the new UET transport protocol, if almost none of them are working for hyperscalers or neo cloud providers building this stuff? Yes, it's huge. We've never seen adoption like this, we've never seen investment like this. It's taking over everything, and statistically almost no one is building this infrastructure, right? That's not true. What's the percentage of network engineers right now who would be interested in networking for AI? It depends how you ask the question. "Interested in" versus "I need it in my day job" are two different things, but they're related, right? You don't need it in your day job today. I mean, interested in it. It's not needed in my day job. Right. I am interested now, yeah, because of a lot of things that I've seen. Right. So our mutual friend Clayton, I heard him... Hello, mutual friend Clayton. When you watch this, make sure you tell us that you heard us. But I heard him draw the connection to how prevalent... It's amazing how hard it is to talk when you're tired. It almost sounded like "developers," but you said "prevalent." You said how prevalent these technologies are in our lives.
Almost everyone I talk to is using these. A lot of them are paying for them. I use lots of technologies without knowing how they work on the back end. Well, right. So, as an example, tomorrow we're gonna talk to network engineers about networking for AI and why you should care and why it's important. And I put myself in their shoes and I go, oh my god, I'm a trad net ops guy, man. I'm not building this stuff. Why should I care? And Clayton got me thinking: if you look at the signals in the market and you don't care, you're not paying attention, right? I wasn't there a couple of months ago, and again, some of the back-and-forth we've had has kind of pushed me there. I'm sorry, finish your thought. No, you're good. But just like with automation, I've been trying to pull the trad net ops folks along because of experiences I had and signals I see. And if you want to stay in this industry, it's even crazier with the AI stuff. So I want to pull those folks along. And the concern, just like with automation, is: why should I? Why do I have to do this new thing? I'm drowning just keeping the lights on. I don't have time for this. How the hell am I going to learn this? And now with this stuff, it's, this is why you should pay attention. "I'm not building this stuff, bro. Leave me alone. I'm tired." So we've seen this movie before. Let me just throw out a couple of potentially relevant examples. There used to be other types of circuits before Ethernet. I know that may be hard to believe. Fractional T1s? Yeah. 56K. I managed them at banks. That's right. Yeah. I think this is a Bob Metcalfe quote, though I don't know if it's apocryphal. He said something along the lines of, "I don't know what the next generation of networking technology will be, but it will be called Ethernet."
And I remember a moment in the early aughts when all wide-area infrastructure in the West was TDM and SONET. Ethernet for wide-area links? Are you kidding me? Even at one gig, at 10 gig. But guess what? It's prevalent, it's everywhere now, right? And the federal government and lots of other places are trying to get rid of all the TDM gear that's installed. So this is networking, it's technology, there's disruption. If you thought you were going to be safe, the door is right over there, and you should take your headphones off before you head out the door. There's always gonna be new technology, new stuff in the world. You've seen this, yeah. That's right. It's a pattern, and the pattern repeats. That's for emphasis. So the stuff that's driving so much spend for GPU interconnect, all the scheduled fabric stuff, all these special add-ons and modifications to Ethernet: what happens when switch manufacturers decide they should just make them all like this, and that's what Ethernet looks like on every box? Can you buy switches with native one-gig ports today? It's harder and harder. I had that same thought. If you could choose lossless Ethernet, why would you choose lossy? At what cost? That's the question. And then there's standardization for the manufacturers. Like Henry Ford making all the bolts the same size: as a vendor or manufacturer, why would you make both? There's a supply chain angle, and I'm not gonna predict when, but if you look at the pattern, correct, right? I think the tide is rising for that stuff. Let's talk about some other near-term impacts that we'll definitely see. The other thing that really hit me at GTC was the co-packaged optics thing, dude. Yep. That's another thing.
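To make the lossless-versus-lossy trade-off a little more concrete, here is a toy, tick-based model of one congested switch queue. It is a hypothetical sketch, not a model of any vendor's implementation or of the full IEEE 802.1Qbb mechanism: with `pfc=True` it behaves like priority flow control (the switch pauses the sender when the queue crosses an XOFF threshold and resumes at XON), and with `pfc=False` it tail-drops like ordinary lossy Ethernet. The thresholds, rates, and names are assumptions chosen for illustration, and link delay is ignored apart from one tick of buffer headroom.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    delivered: int      # packets forwarded out of the congested port
    dropped: int        # packets lost at the switch (lossy mode only)
    paused_ticks: int   # ticks the sender spent paused (PFC mode only)
    still_waiting: int  # packets left in the sender backlog or switch queue


def simulate(offered: List[int], line_rate: int, drain_rate: int,
             capacity: int, pfc: bool) -> Result:
    """Toy model: one sender feeding one switch queue that drains slowly.

    offered:    packets the sender wants to send at each tick
    line_rate:  most packets the sender's link can carry per tick
    drain_rate: packets the congested egress port forwards per tick
    capacity:   switch buffer size in packets
    pfc:        True = pause the sender instead of dropping (lossless);
                False = tail-drop whatever does not fit (lossy)
    """
    if capacity <= line_rate:
        raise ValueError("buffer must exceed one tick of line-rate traffic")
    xoff = capacity - line_rate  # leave one tick of headroom for in-flight packets
    xon = xoff // 2              # resume once the queue drains to half of XOFF
    queue = backlog = delivered = dropped = paused_ticks = 0
    paused = False
    for pkts in offered:
        backlog += pkts
        if paused:
            paused_ticks += 1
            sent = 0
        else:
            sent = min(backlog, line_rate)
        backlog -= sent
        accepted = min(sent, capacity - queue)
        dropped += sent - accepted       # non-zero only when the buffer overflows
        queue += accepted
        forwarded = min(queue, drain_rate)
        queue -= forwarded
        delivered += forwarded
        if pfc:                          # PAUSE/resume decision for the next tick
            if queue >= xoff:
                paused = True
            elif queue <= xon:
                paused = False
    return Result(delivered, dropped, paused_ticks, backlog + queue)


if __name__ == "__main__":
    burst = [4] * 10 + [0] * 20  # sender bursts at line rate, then goes quiet
    for mode in (False, True):
        r = simulate(burst, line_rate=4, drain_rate=2, capacity=8, pfc=mode)
        print("PFC  " if mode else "lossy", r)
```

With these toy numbers, the lossy run drops part of the burst, while the PFC run delivers all 40 packets at the cost of the sender sitting paused for several ticks. In a real fabric those pauses can spread upstream to other flows, which is the "at what cost" part, and it is why lossless Ethernet deployments typically pair PFC with ECN-based congestion control to keep pauses rare. That is not modeled here.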
I love how wrong I am, how often. Like when I heard... Well, there are so many ways to be wrong, it's kind of a safe bet, right? I met these guys at work and they were talking about co-packaged optics, and I'm like, who cares? I'm always like, who cares, just in my head, because if I have to create communication around these things, I have to care so I can then show people what to care about. And then I read, I think on the way over here, that the traces between the ASIC and the SFP up in the front, which are the copper wires, are now becoming the bottleneck, right? Correct. Because the speeds are getting so fast, they have to bring the optics closer to the ASIC. Correct. That blew my mind. The little itty-bitty copper trace is now the bottleneck. I didn't mean to cut you off, but I got all excited at that point. That's part of this conversation. Yeah, but it finally made sense to me. Because they kept saying co-packaged optics are important, and I'm like, why would you ever not use the SFP pluggable on the front? That's how we've always done it, Scott. The way we've always done it. So it's a trade-off. In engineering, everything's a trade-off, right? And I love that I can have a switch with 32 or 48 ports and a different optic for each port. Super flexibility. And back in the days when powering my data center and that particular switch, and removing all that heat, wasn't as big of a deal, boy, the flexibility was really nice. Oh, and by the way, if one optic fails, I only need to replace that pluggable. I don't need to replace the whole switch; hence the trade-off. Now I'm trying to reduce the amount of power I consume, which reduces the amount of heat that needs to be removed. You push more speed, right? That's right. And I get more speed and I can get denser boxes.
And what if I can run liquid cooling all throughout the box? That was on display at GTC in their Spectrum-X line. I know I should know this. Where are we now? Are we at 1.6 terabit? Terabit, terabyte, whatever. I think people have done early prototypes of 1.6, so we're at 800. 800 is kind of the ceiling right now, but again, our friend Clayton would know much more about that. But the reason I ask is that 1.6 isn't there in deployment terms. Right. So to get those speeds, you're generating a lot of heat, and those traces were never designed for that. So now they have to either get rid of the traces or bring the optics closer. There's electromagnetic interference that you're trying to manage at higher speeds; it all matters. So we are coming up against physics, and I think it's so cool to watch humans try to figure it out. We can't change physics. Right. So how can we change the way we've always done it to accommodate the needs that are emerging? And this is radical. You can watch NVIDIA put out terminology that they want to see the industry use. I think they talk about radical co-design or something like that, where they bring their suppliers in on their supply chain and try to bring it all together. So now I'm putting a lot of risk in the switch, right? If one port goes bad, now I need to RMA the whole box. I jumped ahead. Let's define co-packaged optics for them, and let me know if this is accurate. Instead of the laser and the diode and all the magic stuff that converts light to electricity being in the pluggable SFP at the front of the box, with co-packaged optics it's inside the switch, right, and now there's no longer a pluggable, which I hadn't thought of until you said that.
If it's not an SFP, but that magical light diode, whatever thing that used to live in the SFP, is now inside the box, I guess if that goes bad, you lose every port on the... Well, it's usually per port, right? So I might lose a port at a time. Okay, so it doesn't kill the whole box, it doesn't kill the ports on the whole box. That's right, which wouldn't make sense, I guess. And there's a slider bar here that's also interesting that we can talk about too, where maybe I don't remove all pluggables, maybe I just make it for four ports or eight ports, and I have sections of the ports where I can change out the optics. Right. And that's an Arista approach called XPO. Okay. And not to name-drop or advocate for any particular vendor here, but there's a range of... They kind of do that with ASICs, right? I remember learning that half of the box's ports are on one ASIC and half on another. So there's some... Well, all vendors have had to do that, you know, whether it's ASICs or FP4, FP5 in the big routers. So as we go to co-packaged optics, we'll just have more and more dead ports on a device over time that we'll eventually either have to figure out how to service or just swap the whole box out. Right. Yeah. Or do I have a rule of thumb where, if I'm doing huge data center designs and implementations, I can probably tolerate some failed ports, especially if it gives me a radical reduction in power consumption and the need for cooling, right? Maybe I don't replace the switch until four ports have gone bad or six ports have gone bad. And that would be up to the operator. That's not up to NVIDIA or Arista or Nokia. Operators would decide how they want to handle that. But if it gives them more capacity and I can design around lots of failures, I can again figure out where the trade-offs make the most sense for me as an operator.
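The "replace the switch after four or six dead ports" idea is an operator policy, not a vendor rule. A minimal sketch of such a policy, assuming a hypothetical `Switch` record, a hypothetical `should_replace` helper, and made-up port counts; none of these names come from any vendor tool.

```python
from dataclasses import dataclass


@dataclass
class Switch:
    """A switch with co-packaged optics: a dead port can't be fixed by swapping a pluggable."""
    name: str
    total_ports: int
    failed_ports: int = 0

    def usable_ports(self) -> int:
        return self.total_ports - self.failed_ports


def should_replace(switch: Switch, replace_at: int) -> bool:
    """Hypothetical operator policy: RMA the whole box once `replace_at` ports have failed."""
    return switch.failed_ports >= replace_at


# Hypothetical fleet of 64-port leaf switches with some dead ports.
fleet = [
    Switch("leaf-01", total_ports=64, failed_ports=1),
    Switch("leaf-02", total_ports=64, failed_ports=4),
    Switch("leaf-03", total_ports=64, failed_ports=6),
]

to_replace = [s.name for s in fleet if should_replace(s, replace_at=4)]
for s in fleet:
    # leaf-01 63 keep / leaf-02 60 replace / leaf-03 58 replace
    print(s.name, s.usable_ports(), "replace" if s.name in to_replace else "keep")
```

Raising `replace_at` keeps boxes in service longer at the cost of stranded capacity; where to set it is exactly the trade-off the operator owns.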
So if we can tie this back to what I've tried to do from the beginning: if you're trying to get into networking, or want to stay in and scale up to be relevant, where does networking for AI fit into that? And what would you study, right? Like, sure, there's a 900-page white paper that the UEC put out. I still haven't read it. It's fascinating. Why would you read it when you can feed it to a GPT and ask it to summarize it? Hey, look what you did there. Seriously. Well, right, which is amazing. That's a good starting point. And then where do I dive in? I like your historical perspective. So looking back at the next big thing, as a person working in technology whose whole career depends on having the relevant skill set at the time it's needed, when would you tell someone to start learning this stuff? And where do they start? So I talk about this a lot in the context of the next-gen network engineer, right? You kind of have to develop your own personal discipline: I should go out of my way to be learning new stuff. It should be some combination of being relevant to what I'm doing at work, or something that I'm just really interested in, or something that I think is going to be impactful and I want to learn about it before I have to, right? There are probably more than those three legs of that stool. But you can't lab GPU networking, you can't spin this up in EVE-NG, right? Do you have to buy GPUs? If you really want the real experience, yes. Right. I guess you could buy older ones, like we've all done. I guess there are older GPUs that are now doorstops, because they get upgraded every year. Yeah. But that doesn't mean they're not in production. Right. And it doesn't mean they're not doing useful work. And by the way, that's another good plug for NeoCloud providers.
They're really good at being able to keep stuff in service because they know better how the technology evolves, you know, with every next generation of NVIDIA card, of AMD card, even of Intel cards. Okay. Say we're a network engineer and we want to learn this stuff. We don't know when it's going to hit us, but let me first make the decision that yes, my career is additive. And you told me this. I'm glad you remembered that. Remember it, remember, Andy, it's additive. I think about it all the time. I don't want to be a DevOps person, I don't want to... And, well, you don't lose anything, right? Like, oh, exactly. And again, I say dumb things publicly. I don't know why that wasn't obvious to me, but when you said it, it unlocked me. It wasn't obvious to me at first either. Oh. So I think you have to look at your career that way. This is all additive. There are always new technologies, and you will always be adding things. Yep. If you don't want that, leave, because I didn't want it and I bitched for years. And the industry said, okay, go skill up or else you're not getting a job. So I did, and here I am again. So make the decision. If you make the decision that you want to learn the new stuff, and if you look at history and at the growth rate and all the money, it's networking for AI. What should we study? And then how would we lab it? I'm just thinking it through. So I think plugging the UEC white paper into an LLM and having it give me a summary would probably be good. The UET, the new transport protocol, looks pretty fascinating. When I was at NFD last year, there was a guy, I think from Intel, who was talking about GPU networking. And what I was confused about at the time, because I'm old school with 1U routers and switches and some chassis, was that it was all NICs and servers, which is still weird to me. And I know I should know.
So, like, is this all weird NICs and servers, or do vendors... I mean, I know what's out there, but are we going to have 1RU boxes that do this stuff, or is GPU networking connecting to crazy NICs on weird servers? Does that make sense? So, different NICs on different servers, and remember, in an AI cluster I have front-end networks and I have back-end networks. Oh my god. And I have stuff that can be plain old Ethernet, and I can have stuff that's InfiniBand, or, you know, Ethernet of the next generation and the generation beyond that. Do you want to play the definition game? You want to see what I know? I'm not sure. I'll screw it up. I'll be the idiot. I think the back end is GPUs talking to each other. I believe that's the one big computer: the GPUs all scream at each other, then stop, then all scream at each other again. So the back-end network is all the GPU interconnections, the GPUs talking to each other. And let's put a fine point on that. Having lossless Ethernet is really important there, because one error holding up one GPU can hold up all the GPUs; it stops everything, I think. Because it's all one big computer, right? It could be hundreds, it could be thousands, it could be tens of thousands. So avoiding errors is a very, very high priority, which we don't really care about when getting our email, because I have TCP. It's a completely new paradigm. And I can retransmit. And if my email download takes 7.2 seconds versus 7.4 seconds, I'm okay with that. I'm not losing millions of dollars over the 0.2 seconds that's added to my email download. This is something I looked at yesterday to prepare, because I've been trying to consume any content I can to get my head around this and have enough context and history to speak about it somewhat coherently.
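Why can one stalled GPU hold up all the others? In a synchronous training step, every GPU waits at a collective exchange, so the step ends only when the slowest participant finishes. A toy model of that effect, assuming hypothetical per-GPU compute and communication times in milliseconds and a hypothetical 50 ms stall on one GPU's traffic; real collectives are far more complex than a simple `max`.

```python
def step_time_ms(compute_ms: list[float], comm_ms: list[float]) -> float:
    """A synchronous step ends when the slowest GPU finishes compute plus communication."""
    return max(c + n for c, n in zip(compute_ms, comm_ms))


num_gpus = 1024
compute = [100.0] * num_gpus  # hypothetical: every GPU computes for 100 ms
comm = [20.0] * num_gpus      # hypothetical: every GPU spends 20 ms exchanging results

healthy = step_time_ms(compute, comm)

comm_with_stall = list(comm)
comm_with_stall[7] += 50.0    # hypothetical: one GPU's traffic stalls for 50 ms

degraded = step_time_ms(compute, comm_with_stall)
print(healthy, degraded)      # 120.0 170.0
```

In this model the other 1,023 GPUs sit idle for the extra 50 ms, which is the "one big computer" problem in miniature: a hiccup on one path costs the whole cluster.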
And I actually asked an LLM about the UET, which is the Ultra Ethernet Transport. I think it's the new transport protocol that the UEC came out with for lossless Ethernet. I'm like, is this at layer four, and does it replace TCP? And it kind of said no, but then ended with yes. So TCP doesn't go away. We'll always have it. Right. But for AI workloads and HPC, yes, you are not using TCP, you are using this UET thing, which gives you lossless. So to me, that was just like a whoa, holy crap, we're at a point where... You know, it sounds obvious, like, oh, there's a new transport protocol. That's obviously TCP. But we've done this before. Again, this is not without precedent. FTP versus SFTP or TFTP. You know, FTP uses TCP, TFTP uses UDP. Where did I move the responsibility for making sure the packets and the segments, well, they're not segments because it's not TCP, but that everything arrived in order? I moved it up into the application, right? And I'm just using a different service from the layer in the stack below. What we're seeing here with another transport layer is that we have lots of brownfield from an architecture and deployment perspective. And we're not starting from scratch for UEC and Ethernet; we're re-engineering certain things. And maybe someday we're going to ask a very specific model, probably not an LLM, to help engineer a new networking stack from the bottom up. And I'm mostly serious in saying that. Yeah, it's pretty fascinating and it's accessible, which I guess is my point. If you know basic-ish networking and you understand what TCP is and what it's doing, then to your point, you can't retransmit in an AI workload. That's right. Everything stops. Yep. A whole lot of money sitting there doing nothing but creating heat.
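The TFTP example is a good picture of reliability moving up the stack: UDP doesn't retransmit, so TFTP itself numbers each data block, waits for an acknowledgment, and resends on timeout (the lock-step scheme in RFC 1350). A simplified stop-and-wait simulation of that idea, assuming a random loss model and a hypothetical `send_stop_and_wait` helper; it illustrates application-level reliability in general and is not a model of UET.

```python
import random


def send_stop_and_wait(blocks: list[bytes], loss_rate: float, seed: int = 0) -> tuple[list[bytes], int]:
    """Deliver blocks in order over a simulated lossy channel using stop-and-wait ARQ.

    Returns the blocks the receiver accepted and the total number of transmissions,
    counting retransmissions after simulated timeouts.
    """
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    rng = random.Random(seed)
    received: list[bytes] = []
    transmissions = 0
    for block_num, payload in enumerate(blocks, start=1):
        while True:
            transmissions += 1
            data_lost = rng.random() < loss_rate
            ack_lost = rng.random() < loss_rate
            # The receiver uses the block number to accept each block once, ignoring duplicates.
            if not data_lost and len(received) < block_num:
                received.append(payload)
            if not data_lost and not ack_lost:
                break  # sender got the ACK for this block; move on to the next one
            # Otherwise the sender times out and retransmits the same block.
    return received, transmissions


blocks = [f"block-{i}".encode() for i in range(1, 11)]
received, sent = send_stop_and_wait(blocks, loss_rate=0.2)
assert received == blocks  # every block arrives exactly once, in order
print(f"{len(blocks)} blocks delivered using {sent} transmissions")
```

Every lost packet costs at least a timeout and a resend, which is tolerable for a config file but is exactly the kind of delay a synchronized GPU job is built to avoid.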
There's something called job completion time, which is a whole thing that I don't understand enough. But with these jobs that are run, it's all about getting them done fast, and the network is part of that. And you're taking snapshots and storing all the data at certain points in the job's progress. So if I do have an error, at least I only have to fall back to the last snapshot. And I'm sure I'm oversimplifying that. But the more you dig in, the more fascinating it gets. And then, so I... Which is why people should learn this. Well, right. You're probably curious if you're a network operator, yeah, because it's in us. Yep, and this is the new thing that's happening, right? And it is accessible. So here's the analogy I heard yesterday, or something I read. Networks traditionally connected systems that weren't connected, disparate things doing different things, and we allowed them to talk to each other. Networks for AI workloads are interesting and different, because now all the GPUs become one big computer, and they all have to talk simultaneously, all the time. I don't want to say the network's the glue, but the network is what allows an entire data center of 10,000 GPUs to act as one system. You know, Jensen has called the data center the new unit of compute. Okay, right. And these technologies are viewed together as a system, having the system's view of not just the GPUs and CPUs, and not just the network, and not just the storage; it has to be highly coordinated. And right now the most common practice is to have many of them in one physical facility. We're going to find more and more ways to have training done on machines that are in disparate locations, which has its own challenges from a lossless perspective. So I think I just answered my own question. Thank you. And I will wrap by saying that what I've been doing recently is learning the terminology. Yep. I think it's a good place to start.
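The "snapshots" here are usually called checkpoints. A back-of-the-envelope sketch of what rolling back to the last one costs, assuming checkpoints at a fixed interval starting at time zero, plus hypothetical values for cluster size, interval, and failure time.

```python
def work_lost_hours(failure_at_h: float, checkpoint_interval_h: float) -> float:
    """Hours of progress discarded when a job rolls back to its most recent checkpoint.

    Assumes a checkpoint completes every `checkpoint_interval_h` hours, starting at t=0.
    """
    if checkpoint_interval_h <= 0:
        raise ValueError("checkpoint interval must be positive")
    last_checkpoint = (failure_at_h // checkpoint_interval_h) * checkpoint_interval_h
    return failure_at_h - last_checkpoint


num_gpus = 4096  # hypothetical cluster size
interval = 2.0   # hypothetical: checkpoint every 2 hours
failure = 9.5    # hypothetical: a fault 9.5 hours into the job

lost = work_lost_hours(failure, interval)
print(lost, lost * num_gpus)  # 1.5 6144.0  (hours lost, GPU-hours lost)
```

Checkpointing more often loses less work per failure but spends more time writing checkpoints, so the interval is its own trade-off.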
Sure. Do you want to refresh on how stuff works today? That's not bad either, so you know what's different. Yeah. But terminology is good. Yeah. Well, you know, there's front end, there's back end, there's scale up, there's scale across, there's scale out. These are just words and definitions. So if you've studied for the CCNA, as an example, or Security+, or whatever, you know, it's words and definitions and what they mean. I think it's a good entry point, because for months I've been on some calls where people are talking about this stuff and I don't know the terms, right? Similar to how automation used to be. Sure. Like, oh God, what is this? What are they talking about? But if you just want to know where to start: yep, learn some terms. What does this mean? What is the network doing differently? Don't be afraid to ask questions when you don't understand. And I think everybody's kind of figuring it out. Yep, we're on the frontier. So you can be the dumb person in the room asking the question, because this is all happening in real time, and there's some comfort in that, as opposed to, like, I can't ask about BGP, it's 25 years old. So I don't know if you have any advice for anybody out there to learn this stuff, and I guess that's what we just did for 45 minutes, but I would start with learning some terms and running that white paper from the UEC through an LLM for a quick summary. And what I found helpful was some of those metaphors we talked about. It's one big computer, there are very specific networking needs, right? It has to be instant, no latency, blah, blah, blah. And the further down the rabbit hole you get, the more interesting it is to me. Sure. Like with the co-packaged optics. Why do we need co-packaged optics? Well, because that little thing on the motherboard gets too hot and it'll explode over 800 gig. Whoa, like, it's a really, really big difference. Yeah.
I would also say, from a learning perspective, remember that UEC standards are in flight and will eventually see implementation. And there are things being done today by vendors that are pre-standard, quite literally, right? Yeah. And I'm not just talking about NVIDIA. I remember, before GTC 2025, seeing some weighty press announcements from Cisco talking about an NVIDIA partnership, right? And everybody wants to be a partner with NVIDIA. Everybody wants to be a partner with NVIDIA. There are some really interesting things happening there. So I would say vendor info, vendor presentations, and even free vendor certifications on some of these things can also be very useful. Nothing against the UEC documents. You can get a much more practical explanation of how this is being done today, versus skating toward the standards and meeting up with them eventually. Good call on the education piece too. I remember when I was out of work a couple of years ago, I took, like, the InfiniBand videos. Yeah, exactly. I think that was on your... like, we were talking. I think we both did it. Yeah, and it was fascinating. I learned a lot. Yep. Scott, it's always so nice to see you. Welcome to San Jose. Be nice to me tomorrow, would you? I will not be nice to you. Of course I'll be nice to you. We're both doing our jobs. That's right. Where can people find you, if they don't know where you are, because you are everywhere? LinkedIn is the best place. Drop me a DM. If I can plug: listen to Total Network Operations, and give me feedback on other folks you'd love to have on the program, besides Andy again. I will have Andy back someday, no question. This is fun. I get to do this. It's not work. I really enjoy it. So thanks, man. That's awesome. Thanks for coming on. For all things Art of Network Engineering, you can check out our Linktree at linktr.ee forward slash Art of NetEng. What do you want to check out there?
I would direct you to our Discord server, It's All About the Journey, with thousands of folks in there talking, chatting, hanging out, studying, in study groups. Do not trudge this road alone. Find a community. It doesn't have to be ours, but find one. It's much more pleasant and enjoyable and awesome. It's a much better journey doing it with friends and a community than doing it alone. So as always, thank you so much for watching and listening. And we'll catch you next time on the Art of Network Engineering podcast. Huzzah. Hey folks, if you like what you heard today, please subscribe to our podcast in your favorite podcatcher. You can find us on socials at Art of NetEng, and you can visit linktr.ee forward slash Art of NetEng for links to all of our content, including the AONE merch store and our virtual community on Discord called It's All About the Journey. You can see our pretty faces on our YouTube channel, namely Art of Network Engineering. That's YouTube.com forward slash Art of NetEng. Thanks for listening.