YouTube Deep Summary

How Linux is built with Greg Kroah-Hartman

The Pragmatic Engineer • 79:22 minutes • Published 2025-07-06 • YouTube

Transcript (2415 entries):

## Intro [00:00]

There's a nine-week release, because every nine weeks there's a new release going out. Linus does a release at this point in time, then the merge window is considered open, and for two weeks all the maintainers send Linus all the stuff they've had pending from the last release. We have two weeks to add all new features, and then he does release candidate one. From there on it's bug fixes only for the next seven weeks: bug fixes, regression fixes, we'll revert things, no new features. Do I understand correctly that in the case of Linux, every nine weeks there will be a release? It's time based. So we have that two-week window of merging all the new features to Linus that have been in our tree, accepted already, and proven to work. And the cycle is short: nine weeks. We used to have three-year-long development cycles. And the problem there is, even if you have six-month development cycles, there's that fear: you have a feature, I want to take your feature, but it's not quite ready. Do I want to wait, and things like that? But if you know that you can get your feature in nine weeks from now, and it's just not ready, it's not ready. The pressure is off me as a maintainer to take your new feature until it's ready. Linux is the world's most widely used operating system, thanks to powering most Android devices, servers, smart TVs, and embedded systems. But how is it actually built? Today, we sat down with Greg Kroah-Hartman, a Linux kernel maintainer for 13 years, who is one of the three Linux Foundation fellows. In today's conversation, we cover how widespread Linux is and why mobile versions of Linux have three times the lines of code of server versions; what exactly it takes to get a change accepted into the Linux kernel and merged by Linus Torvalds himself; how Linux manages to have 4,000 contributors per year yet no product managers or project managers; and many more details.
If you're a software engineer, you will use Linux directly or indirectly. And this episode will help you understand why it's so widespread and how it's a lot easier to contribute than most people would assume. If you enjoy the show, please subscribe to the podcast on any podcast platform and on YouTube. Thank you. So, Greg, it's really nice to have you here, because you're one of the most well-known Linux contributors, and one of the longest-standing ones as well. So, welcome to the podcast. Thanks. Thanks for having me. I think as software engineers, we know Linux is important in the sense that it's running on most web servers that we use and run. It's a desktop OS that some people use, and of course a fork of it is powering Android. But what is there to know about Linux? How big is this thing? How complex is this thing?

## How widespread is Linux? [02:23]

Well, it's an operating system. So, it's a kernel. We took over the world without anybody noticing. I joke that there are 4 billion Android Linux users out there and they don't realize it; everything else is a rounding error, which doesn't make the server people happy with me, but it's true. It's in everything. It's in all the embedded devices. It's in the air conditioning units, the electric car charging ports, satellites; it runs the International Space Station. Really? Yeah. Air traffic control for Europe, and probably the US. All the financial markets. I don't think it's in the cameras that we're using. So I don't know of any place it hasn't taken over. The top five selling laptops for the past 10, 15 years: Chromebooks. Those are all Linux based. Not Apple, but the Chromebooks are. Oh, and iPhones: every 5G modem out there is running a copy of Linux. Really? Yeah. Wow.
So now with Apple doing their new chip, I don't know if it's the new one, but Qualcomm, all the 5G modems, probably the 4G ones too, I'm not sure, but I know all the 5G modems have Linux inside. This episode is brought to you by WorkOS. If you're building a SaaS app, at some point your customers will start asking for enterprise features like SAML authentication, SCIM provisioning, and fine-grained authorization. That's where WorkOS comes in, making it fast and painless to add enterprise features to your app. Their APIs are easy to understand, and you can ship quickly and get back to building other features. WorkOS also provides a free user management solution called AuthKit for up to 1 million monthly active users. It's a drop-in replacement for Auth0 and comes standard with useful features like domain verification, role-based access control, bot protection, and MFA. It's powered by Radix components, which means zero compromises in design. You get limitless customizations as well as modular templates designed for quick integrations. Today, hundreds of fast-growing startups are powered by WorkOS, including ones you probably know like Cursor, Vercel, and Perplexity. Check it out at workos.com to learn more. That is workos.com. Well, first of all, I'm just reflecting on why I never thought about it like this, because in my mind it was always Debian, Red Hat, it's on the server side. Maybe that's because that's where I actually see it. Of course, right now I'm a Mac user, and there's the Unix influence, which gets pretty close. I think it's a good time to reflect on how many things it actually runs. In terms of the kernel itself, how large is it? I know that for different devices it'll be split differently: server-side Linux and an Android device will use different parts of the kernel.
How big is this in terms of contributors and lines of code? I know lines of code is not a great measure, but it is interesting. So we have just under 40 million lines of code right now. That's a lot. That's all the kernel? That's the kernel. The core part that everybody runs is like 5% of that, and the rest of it is hardware support: different drivers, different devices, different architectures, different chips. So your laptop runs about two to two and a half million lines of code. Your server runs about one and a half million; servers are really easy, those are very simple things. Your phone runs about 4 million. Phones are the most complex pieces of CPU and interaction out there; they're just crazy complex. Why? Can we just pause for a second?

## The difference in complexity in different devices powered by Linux [06:00]

Again, we know lines of code is not a perfect measure of complexity, but in this sense, comparing the two with the same code base is somewhat fair. So you said, give or take, a server is one and a half million and a phone is 4 million: roughly three times the lines of code for a phone. Why the difference? I would think that the server does all this mission-critical stuff. A server is really simple: a CPU, a network card, and storage. That's it. On the SoC in a phone, you have power control, you have clocks, you have five different buses on there talking to different types of devices. You have battery control, you have to talk to your modem, and you have another version of Linux in the modem. You've got USB out the back. You've got USB bypass to talk to the audio side. You have audio drivers. You have a zillion different clocks and PHYs and all sorts of stuff in there. The SoC is an eight-core machine; there are eight processors in there. Those are not trivial things. And sometimes those processors are different sizes.
So you have big and little cores, which add complexity just for power management, but they all run the same core of Linux; it's the drivers and the devices and things like that that differ. Take your Pixel phone: Google ships a core kernel that all Android devices pick up. It's not hardware specific; it just says ARM64. Pixel has 300 other drivers they add to get the Pixel phone working. Some of these are tiny, for one tiny chip, but your phone is really one of the most complex beasts out there for software. Is it safe to say that the complexity and the lines of code scale with the hardware and its capabilities, and not with how mission critical the device is? Because of course the phone needs to be stable, the server needs to be stable, my TV needs to be stable, so that's just a given, right? Yeah. Oh, and all TVs for the past 15 years are running Linux. Oh, so my Samsung TV is running Linux? Oh yeah. My Samsung washer and dryer are running Linux. Your Samsung watch is running Linux. Samsung has their own distro; it all works really nicely. So yeah, it all comes down to the complexity of the hardware. The kernel controls the hardware. The job of Linux is to make all the hardware look agnostic to programs, so you can write the same user space program, run it on different hardware, and it just works. A kernel's job is to manage memory and devices in a common way and provide that to user space. We joke from the kernel side that user space is just a test load, but it's a tool there for you to actually solve your problems. So when you're running servers, you want to push messages through the network and storage and stuff. That's your load, and that's what they're there for.
For a phone, you want to control a display, you want to talk out the modem, you want to make calls, you want to listen to audio. Lots of different things there. I just want to touch back on the kernel, because I'm not a Linux developer. I've heard of the kernel; my assumption is that it's the critical part, as you said, the thing that runs first, and then user space runs on top of it. But what is the difference? What makes a kernel? You said the core is about 5% of all of this. How do you split this, or is there a definition?

## What is the Linux kernel? [09:20]

You're a kernel developer, so I'm trying to get a sense: let's say I'd like to contribute to Linux and understand how it works. Eventually I'm going to figure out what this kernel is, but what makes something kernel versus non-kernel? So, kernel versus user space. In a very simplified way, chips have a protected mode and a non-protected mode; there are different levels of protection. The protected mode is where the operating system, the kernel, runs. And that is where we share all the resources. It's one flat address space, and we are not isolating processes. A user space process then runs on top of that, and we isolate them: they each think they have the whole machine, but they don't. So it's multitasking; you can run multiple programs at the same time. And the kernel is there to give you memory, to give access to storage in a common way, to give access to the network in a common way, or to provide the pipes to route around the network stack into user space, since some people don't like using Linux's network stack and have their own. It provides a way for all your different mice to show up to user space in a common way.
We know all the different mice, USB storage devices, your graphics controller. We make it so that user space can talk to the kernel in an agnostic way and their stuff will just work, because all the graphics go through the same interface, we talk to all keyboards the same way, things like that. So it's a commonality: a shim layer above the hardware. And drivers, for example, do they live in user space? No, all our drivers live in the kernel. Linux is not a microkernel architecture; it's monolithic. The code is all in the same address space, so a bug in any one driver has a chance to take any part of the kernel down. Linux ships all the drivers for all the architectures in one big tarball; that's the 40 million lines of code. Other operating systems have tried to split things off. The core of Windows is their kernel, and then you can put additional drivers on top. We tie everything together in one giant blob. Theirs is still monolithic, though: any driver in theirs can crash the kernel, within reason. Our way, we can refactor the interfaces between drivers and the kernel. Linux drivers are on average one-third smaller than other operating systems' drivers, because we can see the commonalities: if you send three different drivers for three kinds of similar hardware, well, let's combine them, make them smaller, refactor things, make it easier, and oh, let's change this API. And this has to do with the open source approach, right? We see all this common code, we can refactor it and make it better and cleaner, and we're not tied to any fixed internal interface. Our fixed interface is between user space and the kernel. We will not break that. That's our guarantee. We've guaranteed it for a long, long time.
And so we always want you to be able to upgrade your kernel and not worry that your old programs are going to crash. You should always be able to upgrade. That's our guarantee to you. If it does break, it's our fault, it's a regression, and we'll fix it. There are some exceptions and some gray areas: some really low-level parts between user space and the kernel that we kind of work around and argue about all the time, but we never try to break user space on purpose. A lot of times we do it accidentally, and we'll fix it up. That's our number one, our only real rule of kernel development: don't break user space on purpose. So when we're talking about the 1.5 million lines of code per server, we're talking about the kernel plus drivers? Kernel plus drivers, because drivers are part of the kernel. And then you have this 40-million-line tarball, and every platform takes its parts of it: whatever is relevant for its use case, capabilities, drivers, and so on. And I guess a Raspberry Pi, you're going to say it's running Linux, right? Of course. Oh yeah, Raspberry Pis are huge. Those things are everywhere. That's what's in all the electric car charging stations. Those are Raspberry Pis. Really? Where you plug your car in? Yeah, Raspberry Pis. Wow. Because it's a really cheap industrial thing. Lots of signage now runs on Raspberry Pis. I guess it was safe to assume they're not running Windows, to be fair. No. The Dutch signage for the trains is all running Linux; sometimes you'll see a crashed Linux machine up there. Can we look at a specific example of how development actually flows through, with a specific patch?
Before I show a specific example: we had 4,000 developers last year. They make a change, and those 4,000 developers will send an email to a maintainer. A maintainer maintains a subset of the kernel; every part of the kernel is owned by somebody. And you are one of these maintainers? Yeah, I maintain some drivers and things like that. Then those maintainers send things up the tree to a subsystem maintainer.

## Why trust is so important with the Linux kernel development [14:00]

So USB serial will get sent to USB, and then USB will go to Linus. It's kind of a pyramid scheme that way. We have about 800 maintainers, and in the middle section we have maybe 200 different trees, and in our testing environment all those trees are merged together and tested every day to see what happens. So we have this hierarchy of developers and maintainers, and part of the hierarchy is the human aspect. If I take code from you as a maintainer, I'm now responsible for it, because my name's on it. If it's a simple one-off, or a simple driver that nobody cares about except you, great: I know you're the only one that's going to be affected by it, it's fine. But if it's a core part of the kernel and I take changes from you, I'm now responsible for it if you disappear. So I have to trust that either you're going to be here, or that I understand it well enough that I can maintain it. So part of the Linux development model is trust, and it's trust in human interaction.
I will take whatever people send me, not because I trust that they got it right, but because I trust they'll be there to fix it when they get it wrong, because we all get it wrong. That's the trust model we have, and we've been burned in the past: some major features landed in the networking core subsystem a long time ago, and once they were merged, the email address behind them disappeared, and the network developers took six months to unwind the mess. So it's hard to change a core part of the kernel, for good reason, because it affects everybody, and also because we want to make sure that you are going to be there to fix it if it breaks. But for drivers and things like that, we'll take anything, even drive-by changes; it's really simple that way. But that's the hierarchy: a change flows up the tree that way. I can show that. All right, so what are we going to see?

## A walk-through of a kernel change [16:02]

So here is a change. This was written by somebody named Chester. He made a change to the USB serial driver called option. These are USB-to-serial devices; they're in modems, they're in lots of different things. There's a ton of different ones and there's no standard for these types of devices, so you have to add a custom device ID for every single one that you want to use. It's just the way they work. So here's a patch and here's the description of it. This is just an email. There's the subject line.
"USB serial option", adding that device. The hardest part about writing a kernel change is the description of what's going on. Really? Yeah. The code is easy; explaining it is hard. You usually don't explain what it's doing; you have to explain why you're doing it. For something as simple as this, it's really easy. So this person says this device is part of a Cat 6 modem, the product ID is shared across multiple modems, and gives a little dump of what it looks like on the device. Then there's some more information. There's a Signed-off-by line. Signed-off-by is something we created a long time ago. It shows that I have the proper authorship and ownership of this and that I give it to this project under the license by which the project is run. So it's saying I'm licensing this thing under the GPL. And then way down below is probably the one-line patch. So all of this is the description: this person is giving context on the model and the different specifications, and here's the change. The red is removed, the green is added. They had to reformat some lines around the new ones they added. And that's it. So they added a few new device IDs, and we see a few hex numbers. Those are IDs? For USB, devices have a vendor ID and a product ID. There are vendors, and then they have products, and then there are some sub-device and sub-product IDs. Got it. So they're saying the driver already works for this chip, but there's a new ID, because a new vendor came along and wanted to put their own vendor ID on it. Very common.
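For readers who haven't seen a driver ID table: in the kernel, the option driver is written in C and declares a table of `struct usb_device_id` entries (built with macros such as `USB_DEVICE(vendor, product)`), and the USB core binds the driver to any device whose IDs match an entry. The Python below is only a simplified model of that matching idea, not kernel code. The `DeviceId` class, the `driver_matches` helper, and every hex ID in it are made-up placeholders, not the IDs from Chester's patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeviceId:
    """One entry of a driver's ID table (simplified, hypothetical model)."""
    vendor: int
    product: int
    interface: Optional[int] = None  # None means "any interface"


# Placeholder IDs for illustration only; they are NOT the IDs from the patch.
ID_TABLE = [
    DeviceId(0x1234, 0x0001),
    DeviceId(0x1234, 0x0002, interface=3),  # shared product ID: match one interface
]


def driver_matches(vendor: int, product: int, interface: int) -> bool:
    """Return True if any table entry claims this device interface."""
    return any(
        entry.vendor == vendor
        and entry.product == product
        and (entry.interface is None or entry.interface == interface)
        for entry in ID_TABLE
    )


# Supporting a rebadged chip is just one more table entry, no new logic:
ID_TABLE.append(DeviceId(0xABCD, 0x0100))
print(driver_matches(0xABCD, 0x0100, interface=0))  # True
```

In this model, as in the patch discussed above, the driver code itself does not change; only the table grows.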
So it sounds like this change is as simple as it gets in terms of the code changed, but the description was still very extensive, right? Part of the description was also a dump of the hardware description, just so that we can verify it's going to match? Got it. We have tools that create those dumps, but yeah, it's a lot of work for a four-line change. Well, if we talk GitHub language, this would be a PR, right? No, this should be in the patch itself, not the PR. Say you have 10 changes you want to make: the PR would be patch zero out of 10. This is the commit itself. Which is a big problem, and why I don't like the GitHub model: people don't put the description in the git commit. When you're looking at the repo later and you look at the change, you can't see the pull request information. It's gone. And that's a big problem with the GitHub model, I feel. Well, I feel this goes back to the fact that the Linux community built a tool for its use case and you're using it the way you intended, whereas GitHub built the pull request workflow on top of Git, and it is not part of Git. Maybe GitHub could have made it part of Git, but it's not, right? Well, no. We have pull requests. We created pull requests in Linux. We email them: there's a git command to create a pull request. Oh, okay. Is that part of Git? Yeah, it makes an email that says pull from this repo, and here's everything that's in this repo. And when we merge that, the merge commit has all that information in it. Okay.
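Greg's point is that the explanation lives in the commit message itself, so it stays in the repository for good. A conventional kernel commit message has a subject line, a blank line, a body explaining why, and trailers such as `Signed-off-by:` (which `git commit -s` adds for you; `git interpret-trailers` is Git's real tool for reading and writing trailers). The parser below is a simplified, hypothetical sketch of that layout for illustration: `parse_commit_message` is not a Git or kernel API, it treats the final paragraph as trailers only if every line looks like `Key: value`, and the example message is invented.

```python
import re

# "Key: value" lines, e.g. "Signed-off-by: Name <email>"
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z0-9-]*): (.+)$")


def parse_commit_message(message: str) -> dict:
    """Split a commit message into subject, body, and trailers (simplified).

    Assumes the conventional layout: subject line, blank line, body
    paragraphs, and a final paragraph made only of "Key: value" trailers.
    """
    paragraphs = [p.strip() for p in message.strip().split("\n\n")]
    subject = paragraphs[0]
    rest = paragraphs[1:]
    trailers = []
    if rest:
        matches = [TRAILER_RE.match(line) for line in rest[-1].splitlines()]
        if all(matches):  # last paragraph is entirely trailers
            trailers = [(m.group(1), m.group(2)) for m in matches]
            rest = rest[:-1]
    return {"subject": subject, "body": "\n\n".join(rest), "trailers": trailers}


# Hypothetical example message, not the actual patch from the episode.
message = """USB: serial: option: add support for a hypothetical modem

The product ID is shared by several modem variants, so match on the
interface number as well. Explain why the change is needed, not just
what it does.

Signed-off-by: Jane Developer <jane@example.com>"""

parsed = parse_commit_message(message)
print(parsed["trailers"])
# [('Signed-off-by', 'Jane Developer <jane@example.com>')]
```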
So if you look at the Linux kernel, when Linus merges in the USB tree, he sees my little message at the top saying here's everything that's going to be in this pull request. Got it. And Git is the source; that's where all the data is. So our pull requests are not external. GitHub could change the model and put that in the merge commit, but it doesn't. I was about to say that because you did it, they could do it too, but I guess it's a matter of preference. Yeah, that's fine. Anyway, the good thing about this is you can track every single line of code back to who made it, what the change was, what the changelog was, and the reasoning behind it, which is great. Okay. So this person sends it to the maintainer. Yeah, the owner of this code is Johan. There's a script we have that says: take any patch and give me the people who are responsible for it and the mailing list. So it picked Johan and me, sent it to the USB list, and copied a bunch of other people who have changed this driver in the past. And the copying is also done by the tool, as you said? It looks at who touched this code? Yep, all automated. So that was great. He sent it, and the mailing list archive has two copies of it; that's just because it went to two different mailing lists. But then: oops, I messed up. There's an email from the person right after he sent it, saying there's an interface detail and it would be a good idea to change this comment. So they go and change the comment and resend it: you just send a new version. And in this case, do they send a new patch, or do they just add one more on top?
No, you want to have a clean commit, right? So here they sent a version two patch. You can see it says version two, right? Oh, there. Version two. Got it. And here's the same information, and then there should be some comments about what changed between the two versions, hopefully. Yes: "Changes in version two" from the previous one, and there's a link back to the first one. Nice. Very nice. So that's what we do. We want to see the changes, because I get a thousand emails a day. When I review patches, I review them and then they're gone, because I'm reviewing the next one. When you send a next version, I want to see what changed from the previous one, because I don't want to go back and dig through everything. Okay, so they added some information to it. Wonderful. And then what happened? Johan, the maintainer of this subsystem, wrote, "Hey, thanks for the patch and for documenting it." Oh, he did something else first; I got the order wrong. First, Chester wrote, "Hey, please apply this," after a week or two, which is nice: "Hey, what happened to this, what's going on?" Johan said, "You submitted this patch during the merge window." I'll talk about our development model, but there's a two-week merge window when we do releases, where Linus takes all the changes from all the maintainers that have been in their development trees. We can't add new changes at that point in time, so there's a kind of two-week blackout for new development, but this is when all the stuff is flowing into Linus for the next release. So during that time, if you send me a patch, I can't really do anything with it, but it'll sit in my mailbox until then. So this happened to hit that little window of time.

## How Linux kernel development cycles work [23:20]

Just so I understand, there's a nine-week release?
Every nine weeks there's a new release going out, right? And then there's a window where patches are gathered? Yeah, let's talk about that. Linus does a release at this point in time, and then the merge window is considered open. For two weeks, all the maintainers send Linus all the stuff they've had pending from the last release. We have two weeks to add all new features. Then he does release candidate one, and from there on it's bug fixes only for the next seven weeks: bug fixes, regression fixes, we'll revert things, no new features. But during those seven weeks, people are sending me new features, so I have a separate tree, which is my "next" tree. So you're batching it for when the window opens? Yeah. And we have linux-next, where all these trees are merged together on a daily basis to make sure they work together, to be prepared for Linus's next release. Then when he does a release, after everything's good, we all throw things at him again for another two weeks. Now, Linus doesn't pull automatically from all those trees; we have to explicitly ask him, because sometimes our trees aren't good. Famously, one time when I maintained TTY and serial, it was a mess: our tree just wasn't working, with new features added. So I said, I'm skipping this release cycle; I'm going to pull out some of these bug fixes, send them to you off to the side, and go. But if it were being merged automatically, we'd have to deal with that mess.
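The cadence Greg describes can be written down as simple date arithmetic. This is an idealized sketch: it assumes exactly two weeks of merge window, one release candidate per week afterwards, and a release at the nine-week mark, whereas real cycles sometimes run an extra -rc or slip. `cycle_phase` is a hypothetical helper, not a kernel tool, and the start date is arbitrary.

```python
from datetime import date, timedelta

MERGE_WINDOW_DAYS = 14   # two-week merge window after each release
CYCLE_DAYS = 9 * 7       # idealized nine-week cycle


def cycle_phase(last_release: date, today: date) -> str:
    """Describe where `today` falls in an idealized release cycle."""
    days = (today - last_release).days
    if days < 0:
        raise ValueError("today is before the last release")
    if days < MERGE_WINDOW_DAYS:
        return "merge window: new features flow to mainline"
    if days < CYCLE_DAYS:
        # Assumption: one -rc per week, starting with rc1 when the window closes.
        rc = (days - MERGE_WINDOW_DAYS) // 7 + 1
        return f"rc{rc}: bug and regression fixes only"
    return "next release due: a new merge window opens"


release = date(2025, 1, 5)  # arbitrary example date
for week in (1, 2, 5, 9):
    print(week, cycle_phase(release, release + timedelta(weeks=week)))
```

A patch sent in the first two weeks, like Chester's, lands in the merge window and simply waits in the maintainer's mailbox until the window closes.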
It's just interesting, reflecting on the large tech companies that use Git. Take native mobile development, where there's a concept of releasing every week or two because of the App Store review process, or desktop apps, where you can't really release continuously. There's usually a target, but it's not as strict. Every now and then it's just not stable enough, and they push it back. There's not this rigid clockwork; most companies I've seen treat it more flexibly, because they come up with the features and they're in charge. It's your feature you want added, right. And as we know, when you have a milestone, features might be cut and deadlines might be moved. So do I understand correctly that in the case of Linux, every nine weeks there will be a release? It's time based. We have that two-week window of merging all the new features to Linus that have been in our trees, accepted already, and proven to work. And the cycle is short: eight, nine weeks. And that's good, because we used to have two-year-long, three-year-long development cycles. The problem there, even with six-month development cycles, is that fear: you have a feature, I want to take your feature, but it's not quite ready. Do I want to wait? Things like that. But if you know that you can get your feature in nine weeks from now, and it's just not ready, it's not ready. The pressure is off me as a maintainer to take your new feature until it's ready.
You can say, look, it'll make it into the next one, or let's make sure it's going to work properly if it's more complex. Yeah, we have lots of features like that. Famously, there's a USB feature that's on patch version number 35. It's a 25-patch series, it's on its 35th version, and it's just not ready. I just got an email today saying, well, maybe we need to do this some other way. I feel so bad for that developer, but he's been working hard, it's a complicated feature, and it's taken him a year and a half to get there. I have other patches that are only on version three, but it's been two years, because the developer took a lot of time in between. Okay. So in this case, this is a good example: the contributor said, "Hey, a reminder, I'd like this patch applied," and Johan replied, reminding him of the timeline and how it works, right? Yeah, exactly. Really friendly: "It'll be in the next one. Don't worry." Which is nice. Very positive. Yeah, we're not mean people. And don't ever feel bad about reminding me that I haven't reviewed your patch in two weeks. Now, if I haven't reviewed it in two days, I'll be a little testy, but after two weeks it's a good idea. And then Chester wrote back: "Thanks a lot for keeping an eye on it. Keep up the good work." And that was it. So then Johan has it. Johan applied it to his tree; he then wrote, and Johan is very nice here, "You didn't do the comments in the proper format; I fixed it up for you." Oh, nice. So for drive-by changes like that, we want to make it really easy. We're not mean people. Clearly this is a person who is unlikely to become a regular contributor. They're getting their work done, right?
They have a device that they have to ship. Yeah, pretty much. But we want to be friendly and open and easy for everybody, because everybody submits their first patch at some point, right? Famously, when I did my very first patch, I wrote an email saying, "How do I make a patch?" because we didn't have good documentation. Somebody wrote back and said, "Hey, here's how you do it." He became my boss eight years later; I ended up working for him. It's just funny, a small world and whatnot. But yeah, we want to make it easy. So Johan takes this, he's got the patch, and it's in his tree now. Yeah. Which is great. But that's in his local little tree. Then he has to get it off somewhere else. Johan then makes a pull request to me. Mhm. So this is the output of Git's make-pull-request command, I don't know what the actual command is, and this is what a pull request from Git looks like. This is because Johan is a subsystem maintainer: he maintains the USB-to-serial drivers. There's a bunch of drivers for these types of things. And then he sends it off to me, the USB maintainer. Got it. And he says, take this patch, or pull from this tree at this tag, and it's a signed tag. It's signed with his GPG key, so I can verify that it's really him when I pull from it. Yeah. And it says, take these patches, and here's the information. It's going to be some USB device IDs, and they have all been in linux-next with no reported issues. So they've been tested in our integration testing. We test all this stuff every day. And what does testing mean? Is it automated testing? Is it pushing it out on devices in device labs? Is it a mix? Yes, it's all of that. So, linux-next gets merged every day by a developer in Australia. He merges all the trees together, builds them, and boots them. In virtual machines? Yep. That's a non-trivial thing for a kernel to do, just to boot.
## The testing process at Kernel and Kernel CI [29:55]
If it can boot, it's usually, hey, things are going well. It isn't testing on real devices, though. Now, there are other labs out there with KernelCI, which is our CI infrastructure that can run in all the individual labs. We do push things out there, and there are people testing linux-next on their real hardware and sending us reports back in an automatic fashion. Those are rarer. Linus' tree gets tested more that way. The stable trees, and I can talk about stable trees in a little bit, get tested on real hardware even more. linux-next gets build- and boot-tested pretty well. Yeah. I don't run linux-next. I run my own development trees on my machines, so I don't run the mix of them all. Sometimes they interact, because we don't have any fiefdoms. If I have a USB change that needs to go through the networking code, I can change the networking code, and they can say, hey, maybe you shouldn't do that, and we try to get approval. They review my patches, but anyone can touch any part of the kernel, in a way. But he sent me a pull request, and with a pull request I don't actually review the changes in it. I'm not reviewing each individual patch through email. I'm trusting that he sent me four patches here and that they're good. Yeah. I know Johan, and I know that he will be there if something goes wrong. Yeah. And you'll read the description, and every now and then you might decide, for example, to deep dive into a change. Totally. For USB device IDs, it's like, okay, they're all attached to the same driver. Yeah. These are common and simple. Sometimes they're a little more complex. I don't pull from a lot of different trees, but I pull from some that I trust. Some subsystems I don't necessarily trust as well.
For those, I'll have them send the patches by email, I'll actually review them, and when I review them, I add my Signed-off-by to them. And I guess part of the trust is here: I'm just going to assume that since you and Johan know each other well and have worked together for a while, Johan will probably also, every now and then, say, "Hey, Greg, there's this change. Can you take an extra look at this?" etc. Yeah. Sometimes Johan makes changes to the code himself, or I make changes to the code myself. I put it out for review, and I have other people review my changes.
## A case for the open source development process [31:55]
So this is just fascinating to me: how trust between maintainers is so important for efficient development. Yeah, it's all about that. And then also, on trust: somebody once told me that Linux development was the scariest thing they ever did, not because it was difficult or whatnot, but because "my name is on this change, and it's public." That makes you, as an engineer, do really, really good work. I mean, this person who submitted this patch went back and looked at it instantly and said, "Oh, wait. The comment could be made a little bit better." And they're like, "Oh, yeah." That's not the normal development process in a company, where I commit and go. It makes me wonder about a few things that I kind of took for granted. For example, could this mean that with closed-source software, where the outside world does not know how it was done, there's just a bit less incentive to do such great work?
And actually, it's just a reflection: I remember when I worked at a company, my team open-sourced a component that we built, and I just remember how I put way more work into it, to make it look good, to have the documentation, and not just to look good but to make it clean. We cleaned up actual tech debt before we published it. And we didn't do that with our other stuff. So open source development, just by virtue of human pressure, makes a better engineering product. It's better engineering, and we've kind of shown through the years that this development model creates better software. I'm kind of revisiting some of my, not assumptions, but I never thought of it like this, and it's just awesome to see this. So then, what happens next after Johan sends it to you? So Johan sends it to me, and then I take it and put it in my tree. Do I send him a note saying I took it? And then you take responsibility, right? I pulled it and pushed it out. Yes. And there's my email that says that. So now I'm responsible. It's in my tree. Yeah. So now, since this is a device ID, these can go to Linus at any time. We can add bug fixes or new device IDs at any time; these are trivial. So then, a few days later, I send this change off to Linus. I said, "Hey, Linus, take all these following changes, and here's a whole bunch of USB fixes." So here's some small driver fixes, some new device IDs, and so on. I summarize it all. I say, "These are all the things in here." Yeah. And these are going to be a few dozen patches, something like that. Yeah, there's a whole bunch. And here's the list of the patches down below, and here's the diff of them, to make sure that this diff matches what he pulls from. This is signed with my key. Mhm. I do say almost all of these have been in linux-next.
I guess some of them slipped in, but we also have other testing: when you send patches to the mailing list, we have what we call a zero-day bot. It goes through, starts applying them, and build-tests them. Mh. And that runs, and then our own trees that we create also get verified that they build and boot. Yeah. And will it run some benchmarks for drivers? It doesn't really run benchmarks. And so then Linus takes this and puts it in his tree. So then it got picked up, another day later. And then let's talk about how our model works. So Linus does a release every nine weeks. Yep. Bug fixes come in during those nine weeks for the last release. You're running the last release, right? You want those bug fixes. You have a device that needs those bug fixes. A long time ago, we realized that people don't want to wait eight weeks. So we created a model where we have a development tree and a stable tree. When Linus does a release, I fork off Linus' branch and say this is a stable branch. So if it's 6.4, I do 6.4.1, .2, .3, .4, .5.
## Linux kernel branches: Stable vs. development [35:44]
And our release numbers are just numbers. They mean nothing. They're not semantic versioning; we were around way before that happened. They just mean this number is later than that number. Yep. That's all. When we switched from 4.x to 5.x, it was just because the x got too big. Yeah. And in your brain, when you see a version like 4.14 or 4.18, it looks smaller than 4.4 or 4.8. Yeah. So we just bump it up every couple of years. So then we have stable releases. I do a release every week, and the rule is that the patches have to be in Linus' tree first. We can't diverge. So if it's in Linus' tree first, and it's a bug fix, and it meets the criteria, I put it in the stable tree and I do a release. And so we do new releases every week for that. So during those nine weeks, I'll take new device IDs.
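Since the release numbers only encode order, comparing them means comparing each dot-separated component as an integer. Comparing them as plain strings gets the order wrong, the same trap a human reader can fall into. Here is a minimal Python sketch, assuming plain numeric versions such as `6.4` or `6.13.4`. It does not handle suffixes like `-rc1`, and `kernel_version_key` is a hypothetical helper name.

```python
def kernel_version_key(version: str) -> tuple[int, ...]:
    """Turn a plain release string like '6.13.4' into (6, 13, 4)."""
    return tuple(int(part) for part in version.split("."))


releases = ["6.4", "6.13.4", "4.8", "6.4.10", "4.18", "6.4.2"]

# String comparison goes character by character, so "6.13.4" < "6.4".
print(sorted(releases))
# -> ['4.18', '4.8', '6.13.4', '6.4', '6.4.10', '6.4.2']

# Numeric comparison: a mainline release sorts before its stable updates.
print(sorted(releases, key=kernel_version_key))
# -> ['4.8', '4.18', '6.4', '6.4.2', '6.4.10', '6.13.4']
```

Tuple comparison also puts `6.4` before `6.4.1`, because a shorter tuple that is a prefix of a longer one compares as smaller. That fits a stable series forking off a mainline release.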
I'll take bug fixes and whatnot. And you can tag the fixes that are going into the tree in a special way so that I'll automatically take them; I know to look at them. The other stable tree maintainer with me, Sasha, runs through them and runs a whole bunch of fuzzing. He's been doing AI since before it was even called AI. It's just pattern matching. We have a whole body of bug fixes and a whole bunch of changes: do any of these match? Oh yeah, these do, because some people don't realize that, oh, this was a bug fix and it should go into the stable tree. They've written academic papers on it for years. It's fun stuff. So it's just pattern matching, right? Then we'll pick up a whole bunch of stuff where, hey, maybe you forgot about that, and we give you a chance to respond before it goes into the stable tree. And we do those releases. When Linus does a new release, I throw that stable tree away and make a new stable tree. That's great for things that can update more often. But people who make a device want it to last a long time. So what we came up with is the idea of long-term stable trees. There, I pick one kernel a year and maintain it for two years to start with, sometimes six years. So your Android phone is running off a kernel that's five years old, but it's still getting bug fixes backported to it. So I maintain about four long-term stable trees at the same time, we backport all these fixes to all the different branches, and we pick one a year and maintain it. So there are six of them going at a time. And in this case, is there one maintainer for each of these long-term trees? No, it's just me. Oh, wow. Okay. Yeah, it's the two of us.
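The "special way" of tagging a fix for the stable trees is, per the kernel's stable-kernel-rules documentation, a `Cc: stable@vger.kernel.org` line among the trailers at the end of the commit message, next to tags such as `Signed-off-by:`. Below is a minimal Python sketch of spotting that tag, assuming the trailers sit in the message's last paragraph. `parse_trailers` and `tagged_for_stable` are hypothetical helpers, not the stable team's actual tooling. They also do nothing like the pattern matching used to find untagged fixes.

```python
import re

# A trailer is a "Key: value" line, e.g. "Signed-off-by: Name <email>".
TRAILER_RE = re.compile(r"^([A-Za-z][A-Za-z-]*):\s*(.+)$")


def parse_trailers(message: str) -> list[tuple[str, str]]:
    """Return (key, value) pairs found in the last paragraph of a commit message."""
    last_paragraph = message.strip().split("\n\n")[-1]
    trailers = []
    for line in last_paragraph.splitlines():
        match = TRAILER_RE.match(line.strip())
        if match:
            trailers.append((match.group(1), match.group(2)))
    return trailers


def tagged_for_stable(message: str) -> bool:
    """True if any Cc: trailer mentions stable@vger.kernel.org."""
    return any(
        key.lower() == "cc" and "stable@vger.kernel.org" in value
        for key, value in parse_trailers(message)
    )


# Hypothetical commit message with placeholder names and addresses.
example = """USB: serial: add device ID for a hypothetical modem

Add the USB device ID so the existing driver binds to it.

Cc: stable@vger.kernel.org
Signed-off-by: Example Contributor <contributor@example.com>
Signed-off-by: Example Maintainer <maintainer@example.org>
"""

print(tagged_for_stable(example))  # -> True
```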
The interesting thing is, the older the code is, the harder it is to maintain, and companies think, "Oh, I'll put a junior developer on maintaining old code." That's harder, because the code has diverged more from what the latest developers are using. Can you tell me a little bit more about this? The older the code, the harder it is to maintain; it feels true, but why is this the case? Is it just lost context? It's that development moves forward, right?
## Challenges of maintaining older Linux code [38:32]
So say I make a change today to the codebase that fixes a bug, and I look back and it has affected the code for the past 10 years. Yeah. All right. If I start backporting this change to code that's 10 years old, the code has evolved in that time. Yeah. And making that change to older code is harder. And the more I have to change it, the more it diverges from the original fix. So you need more context and skill to make the change to the older codebase than even the developer who made the first change did. It's not intuitive. Companies make this mistake all the time, thinking, "Oh, I'll just maintain this old codebase for a long time." We've had major security bugs in chips, like Spectre and Meltdown. Yeah. Some of those Spectre fixes have not been backported to some of the long-term kernels that are still being supported, because it was just too hairy of a fix. Anybody who cared moved to a new kernel. Yeah. So with a lot of these older kernels, again, if you're using it, you need to provide the resources to maintain it. Google, I'll call out, and Linaro, another group, do a lot of work testing these old kernels, because Google cares a lot about these kernels. So they provide testing infrastructure, merges, reproducibility, and running on real devices to make sure that these kernels still work on them and work well.
And that way, I know that if I make a change back there, it'll still work. If I didn't have the resources from them doing that work, I wouldn't be able to maintain these old kernels. Yeah. And then, going back to the bug fix: every week there's a new stable release, and then when does the big nine-week release come? Is that after this has been baking in the stable branch? So stable is independent of Linus' tree. Oh, stable is independent. So the only tie is that it has to be in Linus' tree first. We do not want divergence. We don't want you to make a fix only to a stable tree and not to Linus' tree. Got it. Sometimes I will have bugs in the stable tree due to other changes I've taken. Fixes need fixes, and I'm like, I can't take the fix for this until you get the fix into Linus' tree. It's kind of a forcing function on a developer to get a fix to Linus before I'll take it into the stable tree. Sometimes I'll revert the change in the stable tree.
## How Linux handles bug fixes [40:30]
And do I understand that the way to get a fix into Linux is, well, of course you need to get the fix into Linus' tree, which means you need to go through one of the maintainers of one of the subsystems? Yeah. And you just need to go up the tree, up the pyramid. Right. So, famously, Bluetooth breaks every other release. Bluetooth is crazy complex. The hardware is horrible. If you need to get a fix in there, it has to go to the Bluetooth tree, then that gets sucked into the networking tree, and then the networking tree goes to Linus. So it's a two-stage process sometimes. And then we have somebody tracking regressions. Regressions are really important. We don't want anything to regress. Sometimes Linus will say, I'll just take these bug fixes or regression fixes now. Boom. I'll just take them. So it depends on what they are.
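The one hard tie between the stable trees and Linus' tree is that a fix must already be in Linus' tree before it can go into a stable release. That rule can be expressed as a simple filter. This is a hypothetical sketch: the `StableCandidate` type, its fields, and the commit IDs are made up for illustration. In practice, stable backports cite the mainline commit they come from in their changelog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StableCandidate:
    subject: str
    upstream_commit: Optional[str]  # mainline commit ID, or None if it has none


def apply_mainline_first(candidates, mainline_commits):
    """Split candidates into (accepted, deferred) under a 'mainline first' rule."""
    accepted, deferred = [], []
    for candidate in candidates:
        if candidate.upstream_commit in mainline_commits:
            accepted.append(candidate)
        else:
            # Not in Linus' tree yet (or never sent there): it has to land
            # in mainline first, so it waits.
            deferred.append(candidate)
    return accepted, deferred


mainline = {"a1b2c3d", "e4f5a6b"}  # hypothetical commit IDs already in mainline
candidates = [
    StableCandidate("USB: serial: add new device ID", "a1b2c3d"),
    StableCandidate("stable-only workaround", None),
    StableCandidate("fix still under review upstream", "0f1e2d3"),
]
accepted, deferred = apply_mainline_first(candidates, mainline)
print([c.subject for c in accepted])  # -> ['USB: serial: add new device ID']
```

The deferred list is where the "forcing function" shows up: the only way out of it is to get the fix merged upstream.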
If they affect hardware that's really common, we prioritize that over hardware that isn't as common, just by virtue of, hey, this broke my laptop, right? I want it to keep working. So yeah, it's a little like that. So we have two branches going at once: development, and the stable releases happening. So then this went into Linus' tree. I picked it out as part of the stable trees, and it ended up in the stable tree as well. And I can give you dates for all this stuff. This whole process took about a week and a half. Mhm. And that was it. Okay. And here it ended up in the 6.13.4 kernel as well. Yeah. And in other ones as well. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your governance, risk, and compliance program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Vanta can help you start or scale your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. With Vanta, they centralize security workflows, complete questionnaires up to five times faster, and proactively manage vendor risk. Join over 9,000 global companies to manage risk and prove security in real time. For a limited time, my listeners get $1,000 off Vanta at vanta.com/pragmatic. That is vanta.com/pragmatic for $1,000 off. So, we saw what it takes to get a fix into Linux, and it actually wasn't that complicated. No, it really isn't. You email a change off; you use the Git workflow. So if you're familiar with Git, it's pretty simple.
Obviously, I guess you need to be able to build Linux, test it yourself, and validate locally that the basic things work, and then it's straightforward. The fun thing is, I can take a change like that without really testing it: it built, and it obviously works for your hardware. I didn't test it, but it works, and I assume it's good. Yeah. And it's a very fast workflow as far as projects go. It was about a two-week window from sending the first change, during the merge window, to getting it out into stable kernels to the world. That was pretty fast. Yeah. Overall, for a worldwide project that is everywhere. So I think I understand what it's like to be someone who contributes to Linux every now and then. But over time, some people start to contribute more. They become more regular contributors, and eventually you're one of the few, or many, people who work on Linux full-time. Are there many people working on it full-time? So, Linux has almost always been worked on by paid developers. I started keeping the numbers back in, what, 2006 or something. At that point in time, 80% of the people who contributed were being paid to do it full-time by their employer. And their employers want people who know how to do Linux, because they want to solve their problems. It's much cheaper to pay a few engineers to add a few new features than it is to write your own operating system. That's the beauty of Linux. That's why IBM put a bunch of money into it. That's why everybody uses it. It's a tool for people to get their work done, right?
## The range of work Linux kernel engineers do [44:40]
You want to run your battery. You want to run your car charger. You add a little driver for the one device you have. You have an engineer do that, it's good to go, and it'll be maintained forever, because we maintain it in the community. It's all good. So, it's cheaper.
So, we've been doing it. And the joke used to be: get three changes into the kernel, and you get a job. It's not really a joke. As long as they aren't spelling fixes, but some people do spelling fixes, which is great. We have people who do janitorial work on the kernel. They sweep the tree for common problems, clean stuff up, keep code alive, and make sure it's fresh and in proper coding style. We have coding style issues. We have people just fixing spelling mistakes in comments, which is great, because you've got to start somewhere. In fact, spelling mistakes in comments are a great place to start, because they make you get the workflow down. You figure out how to make a patch. You figure out how to send an email, pick your email client, not send HTML, and things like that. Yeah. And you can't use a web email client to send patches. It just doesn't work. There are lots of really good email tools out there. Use them. But you're now a full-time kernel maintainer. What does your day-to-day or week-to-week look like? Because I'm going to assume it's a little bit different than for most developers, who write code, review code, do those kinds of things. So, yeah. I've been working for the Linux Foundation for, what, 13 years now. Before that, I worked at Novell and SUSE. Before that, IBM, and then a little startup, all doing Linux stuff all the time. And before that, I did embedded work. When you work for a company, you end up working on features that your company wants, or reviewing code from other developers at your company, then sending off changes. Or if you're a maintainer, and the networking maintainer said this best, we're like editors. We used to be writers; now all we do is critique other people's stuff. But because we were writers, we have little side projects. So we do dabble in stuff.
Like, I looked: I did 80 changes, only 80 changes last year, because I have a few little things I want to do. That was low. But working for the Linux Foundation as a full-time maintainer, that's rare. I think there are only maybe five people, a handful of people, who just work on whatever they want to do. The Linux Foundation rule is they can't tell me what to do and I can't tell them what to do. Works out great. Me and Linus and Shuah Khan, we're all fellows there, and we work on improving Linux however we feel like. Linus and I do a lot of review and a lot of other stuff. Linus still contributes. He famously rewrote the core locking primitives in Linux a couple of years ago. I had a Microsoft developer say there's no way any of us would even be allowed to do that on Windows. You don't just change core bits and pieces like that. For one of the security features, in one of these stable releases, Linus had to rewrite the call path for how user space calls into the kernel, the core syscall path. Nobody really noticed that it got rewritten, but it did. He did it, in a stable way, and it worked. We're also part of the kernel security team. We get security bug reports all the time, and if they're easy, we'll just fix them ourselves and send out the fixes. So we do security fixes a lot. So my day-to-day is that I read other people's stuff. Like I said, I get a thousand emails a day to do something with. You're not exaggerating? No. You get a thousand emails a day. Wow. A lot of it just gets filed off. I subscribe to a number of kernel subsystem mailing lists to see what's going on. Yeah. And I don't have to do something with all of those. Yeah. But some of them you need to do something with. Yeah. Some of these I do need to do something with. So, say, USB is one of the subsystems I maintain.
I shove them all off to a mailbox, and then once a week I'll go through them all and say, okay, let's review all these. So I'll look at my inbox, and I'll have 200 USB emails and patches to go through. Other people review them and other stuff like that: okay, this maintainer said this was good, or not good, whatnot. And I apply them to my trees, see if they build, and if they fail, I'll report those, and so on.
## Greg’s review process and its parallels with Uber’s RFC process [48:33]
You know, what you're doing reminds me a little bit of when I used to work at Uber. We had this concept of RFCs, which I think got inspired by the RFC process. People would just send off a document of "here's what I'm planning to do," and there would be mailing lists for backend, mobile, and different parts. I noticed after a while that the more tenured and more experienced engineers would spend increasingly more of their time reading through these things, critiquing, giving feedback, giving pointers, connecting the dots. It just hit me when one of the first mobile engineers at Uber told me that he had one day per week blocked out just to go through all of these, which, again, wasn't part of his role, but he felt responsible. He had all the context. He helped so many people avoid certain things just by pointing them out. It's the same thing, or something similar, happening here. It's the same, but there's a different part to this, and I'll call it out. We don't have grand proposals sent to the kernel list. We don't say, "Hey, wouldn't it be great if you did this?" I don't want to see that. I want to see code that works. Mhm. And I love that as proof. Code that works matters because you've taken the time; you've proved that this can be done. Yeah. Not necessarily that it's done right or done the best way, but that it can be done. Yeah.
And now you have skin in the game, and I'm willing to work with you, so let's go from there. People do send RFCs of patches. If it's an area I care about, I'll look at it. Sometimes you can get away with this, and it's a fun trick with maintainers: if you send me a patch set that solves your problem in such a horrible way that I hate it, I'll rewrite it myself. I can't say no, because it solves the problem and I want to solve your problem, but if I don't say no, then I have to take it. You can do that about once a year to a maintainer. I sense that you're eliminating busywork, because I've seen this at different companies with a proposal process, and a lot of companies do it for good reasons. It sounds logical: instead of starting the work, maybe we'd all save time by doing a little planning up front, right? But every now and then, what happens is you get into never-ending planning, and nothing happens until either the project is abandoned, or someone just sits down, writes some code, and cuts all the discussion short, because now it works. Yeah. Well, you have to prove that it can work. And inside companies, when I worked for companies like IBM, we had planning: okay, we need to implement this feature to match parity with this old version of Unix. How are we going to do that? Let's figure out how to do this. Is this going to work? Yada yada yada. We had planning and things like that. One of the fun things is when you're dealing with open source, and this happened at IBM: an engineer over here was tasked with fixing this problem. Great. He came up with a solution and submitted all the changes upstream, and there was lots and lots of discussion. Turns out his solution was not very good.
Somebody else saw that it was a problem, rewrote it, submitted it, and got it accepted, but it wasn't the original engineer's work. And so the end of the year came, and it was like, how is this person going to be reviewed? And we're like, he caused the feature to get done. It wasn't that his code made it in, but he influenced the community, and the goal was that you wanted to see Linux support this, right? Linux supports this now. And it required a change in mentality in how management had to treat engineers, and the same thing with who owns the code.
## Linux kernel within companies like IBM [51:48]
We had people come in and end up becoming maintainers of certain subsystems, and that's great. They were maintaining this part of the kernel, and then they were reassigned to do something else within the company. It's like, oh, that's great, but you're still going to have to give him time to do that maintenance. It's like, no, no, we'll reassign it to somebody else. It's like, no, no, no, the community gave that to him. It follows him. If he goes to a different company, it follows him. If he goes to a different part of the company, it follows him. And that's actually why Linux is so good. I think when you work at a big company, you're forced to work on new things every couple of years, right? And that's part of moving up in a company. You get different tasks and whatnot. Famously, Windows has had like eight, no, five different teams work on their USB stack. Linux has had one team work on its USB stack for 20 years. And we know this stuff, and we have that development depth there.
We just keep coming back to it. I came in here using Linux, or indirectly using Linux, a lot, but not knowing its depths. I kind of thought that we would talk a lot about the tech and the processes, and every time we come back to the people and the trust. I wanted to ask why you think Linux has won so big that it's everywhere, but I'm starting to get the answer to this. Because I was thinking: why Linux, why not a commercial alternative? If I naively asked myself before this conversation: we have two teams. One is commercially funded; they're selling their software and paying the developers really well. The other one is giving it away for free; they figured out a model where people are still paid, but it's open source, anyone can use it, anyone can contribute. Which one would win in the long term? I naively would have said maybe the commercial one, because they're incentivized and are going to create all these professional things, whereas here it's more intrinsic value. But now, it is interesting. So Linux has been contributed to by companies in their own interest. It turns out everybody contributes in a selfish way: we want to solve my selfish problem. But it turns out everybody has the same problems, so your problem being solved is the same as their problem being solved.
## Why Linux is so widespread [53:52]
We had this when it came to embedded. When embedded happened, they came to us saying, we need to change Linux to make it work better on batteries. Power is really important. Power is very, very important. A lot more efficient. This was when Linux was first getting into embedded, and it was like, we need to make this very efficient. And we're like, great, that's a wonderful little solution; make it work for everybody. Like, no, no, no, we just care about embedded over here.
We're the only ones who are going to care about power. It's like, no, just make it generic. It'll all be good. Turns out data centers save billions of dollars because of power management. And it turns out everybody benefits, even the mainframes. And if it's more efficient on a mobile phone, suddenly it's a good candidate to be a mobile OS. Yes, it works for everything. Same thing for multiprocessors. When multiprocessor systems came out, we had two processors in a big data center. Who's going to care about that? Now you have 16 processors in your pocket. It just works for everybody now, right? Trends shift and things go different places, but we solved it in a generic way. We forced you to solve it in a generic way, but you contributed in a selfish manner. And that's how IBM knew they could put money into it, hire developers, and get the money back. Yeah. So it was cheaper for them in the long run to do that. And they make money selling support and selling hardware. Red Hat makes money selling support. Intel makes money selling chips. That's who contributes to Linux: the people who want to sell a different product. Now, one other thing that's interesting is the efficiency. We have 4,000 developers contributing. Some of them only contribute one change; some of them contribute a bunch. 300 to 500 companies per year. We're talking about per year. If you told me this was inside a commercial tech company, I would assume that in order to make this work for 4,000 developers, we'd probably need to hire 400 PMs. For every 50 developers, we'd have about 80 TPMs. This is how it would run. You're laughing. I know, I've been there. In fact, you've been there, but I only come from here.
Now, one thing you told me: I asked how many project managers or technical project managers you have, and you said zero. How? Well, in a way the project management already happened on the back end, before the patches got to us. At a company, say IBM, someone wants to solve a problem. They ask: how do we solve this? Let's set up this task, let's figure it out. And then the patches come out to us. We don't see that part; we just see the feature when it lands on us. That's fair. So the project managers are there, working for the individual companies to get their thing in. Sometimes they're not. Sometimes developers are just spitting things out, like the person who needed to add a new device ID. It saves a company time and money to contribute its changes upstream rather than keep them in a fork, because otherwise it has to keep maintaining that fork. So wise companies have realized: let our developers work upstream and do what they need to do there with limited project management, and it just works out better. And again, we only take things when they're ready, so we don't have to track them. We do have tools. We said everything goes through email, but the networking subsystem, for example, has a web page where you can see the status of your patches: whether they've passed all the CI, whether they've been reviewed by the maintainer, and things like that.

## How Linux Kernel Institute runs without product managers [56:50]

So we have a bunch of automatic tools built on top of email that'll help you out, and those project managers can go look at them if they want to know the status of their employees' patches. But yeah, it's just a different model. It's not that the project managers aren't there; they're hidden behind the solution for each company. That's fair, and I think it's good to remind people that that's the case.
But I feel Linux has still figured out a way to focus on ruthless efficiency, with automation and with focusing on the work once it's done. As you said, all these things do happen, but they happen beforehand, so this part of the process can be more efficient by design. Yeah. And once a year we get the core maintainers together, and we talk about non-technical things, because we can't get enough of the relevant technical people into one room for any given technical topic. We talk about process. Is our process working? Is it not working? And we refine it: "Oh, maybe we need to do this a little bit better. Wouldn't it be nicer to do this? Hey, we need more testing over here. Hey, can we do this type of stuff?" So we talk about our process all the time. Famously, the lead-up to that meeting is another public mailing list where we all discuss process, and that once-a-year public bikeshedding of our process helps shake out a lot of things. And there are problems. I'm not saying this development model is perfect, but it works really well. One odd thing about Linux is that we keep going as fast as we are. We're running at 9 to 10 changes an hour. In the stable kernels, we're running 30 to 40 changes a day. Mhm. And about 10 CVEs a day. A bug at our level is almost a CVE. Yeah. So CVEs are the critical ones? Yeah, it's a security bug, a vulnerability. It could be as stupid as a memory leak somewhere, or "I rebooted the machine," or "I took over and got permissions." When I create a CVE, I don't know how you use Linux, so I can't tell its severity. I can just say: here's a bug, you should look at this. We're responsible for that. So we're running at a huge rate of change.
Most large software projects have a huge ramp-up and then plateau in developers and rate of change, because they've solved the problem. Linux has never solved the problem. I had a manager at IBM come to me every year and say, "Hey, is Linux done yet?" I was like, "No." It took me 10 years to finally come up with the answer: it'll be done when you stop making new hardware. When they stop making new hardware or having different workloads, then we'll stop. We're one of the few projects that keeps having to add new features because of new hardware. We're not doing it just because; I mean, Linux has been working for all of us for 20 years. We're doing it to support new hardware, to support new usage models, to support things. We don't add things for fun, generally. We add them to solve a problem that somebody had. Most of Linux is written in C or C++, right? No C++, just C. And I guess for some hardware drivers, is assembly ever involved? No. Assembly comes in during the early boot of a processor, and then in some core functionality that drivers and other code call into, like locking and string functions, which go down to good assembly language tuned for the different processors. Also, when you boot, Linux looks at the processor you're running on and patches itself to use the best versions of those assembly functions, and then it continues on, which is crazy. It patches itself at boot time. So hold on, some of that is assembly? Some of it's assembly at the very beginning, plus some of those low-level functions, but drivers don't ever touch assembly. Okay. Now, one thing, and actually the way we started talking: there is a proposal to introduce Rust, because it's just more memory safe.
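Greg describes Linux inspecting the processor at boot and patching itself to use the best-tuned low-level routines. On x86 the kernel does this by rewriting instructions in place (its "alternatives" mechanism), which a short standalone example cannot reproduce. The sketch below is only a user-space analog under that assumption: a one-time startup check chooses between a generic and a "tuned" implementation, and every caller then goes through a function pointer. The names `strlen_generic`, `strlen_tuned`, `cpu_has_tuned_strings`, `kstrlen`, and `select_string_routines` are hypothetical, and the feature probe is faked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Signature shared by every variant of the routine. */
typedef size_t (*strlen_fn)(const char *s);

/* Portable fallback that works on any processor. */
static size_t strlen_generic(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Stand-in for a processor-tuned variant; in this sketch it simply
 * defers to the C library's strlen(). */
static size_t strlen_tuned(const char *s)
{
    return strlen(s);
}

/* Hypothetical feature probe. A real kernel would query the CPU (for
 * example via CPUID on x86); here we just pretend the feature exists. */
static bool cpu_has_tuned_strings(void)
{
    return true;
}

/* Every caller goes through this pointer; it starts at the safe default. */
static strlen_fn kstrlen = strlen_generic;

/* Run once at startup: pick the best variant for this CPU. */
static void select_string_routines(void)
{
    kstrlen = cpu_has_tuned_strings() ? strlen_tuned : strlen_generic;
}
```

One design difference worth noting: a call through a function pointer pays for an indirect branch on every call, which is exactly the overhead that patching the call sites directly, as the kernel does, avoids.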
It's also a language growing in popularity, and some people would like to do more Rust development. What's your take on this? Do you think Linux might support Rust at some point, and what's your thinking about doing things outside of C? So we have 25,000 lines of Rust in the kernel already. Oh, we do? Okay, awesome. Yeah. Most of that is just bindings; there's no real functionality. In the latest release, if the kernel crashes, it'll put up a QR code, and you can take a picture of it to get the crash dump. That code was written in Rust. Oh, nice. So the Rust for Linux developers have been working on this for a long time. A couple of years ago, they came to us and said, "We think we're ready to do this. Do you want it?" And we said, "Yeah, let's try this experiment. You're willing to do the work? Who am I to say no?" I mean, it's Linux. Yeah. Now, the problem with Linux and Rust is that it would be easier to write a core piece of Linux in Rust than to write a driver. A driver consumes things from everywhere in the kernel. Mhm. You want to talk locking, input and output, the driver model, the USB port, all this stuff. Drivers can be really tiny because they take resources from the rest of the kernel. In Rust, you need a binding between the C code and the Rust code, an intermediate layer. The kernel, in C, has very opinionated ideas of how it handles objects and memory; it has its own memory model. Rust has its own very opinionated model of how it does this. Same idea, and meshing the two is tough. This meshing is also the most crazy complex Rust code you've ever seen.

## The pros and cons of using Rust in Linux kernel [01:02:01]

As a new Rust developer myself, I can barely read the bindings, but I trust other people are doing it.
So the trick is that we now need to write a binding for every different part of the kernel in order to write Rust code, a Rust driver. If you want to do the QR generator, that's simple; that was one function. Yeah. So over the past couple of years, people have been writing bindings to try to do things. We've had a bunch of example drivers, like a new disk driver, to compare writing a driver in C versus Rust. It turns out there are still some performance issues with Rust code versus C code, because we can do some tricks in C that they can't do yet in Rust. Yeah, and the tooling is coming along, and the Rust developers are working on it. Some of the core Rust developers, the people behind the language, are Linux kernel developers. They've always wanted Rust to work for Linux. The Rust model is good. But memory safety at our level does not mean that you can't crash the kernel; you can still overwrite things. Memory safety in Rust just means that you know which memory you pass around is yours and which isn't, and that when things go out of scope, they get cleaned up properly. I've seen every single kernel bug for the past 18 years. Half of them will be fixed with Rust. They'll just be fixed with Rust. It's the stupid one-off bugs: "Oops, I overwrote an array by one and didn't realize it." "Oops, I forgot to clean up this error path." "I forgot to unlock this lock." It's stupid little things like that. There are logic bugs; of course you can write logic bugs in Rust. You'll always have those. But famously, with the QR code in Rust, the C code passed the Rust code a pointer to a buffer and the buffer size. The Rust code forgot to look at how big the buffer was, and it scribbled right over memory. So you can write memory-unsafe code in Rust just fine, and you can crash things in Rust. Memory safety here means the safety of object life cycles and things like that.
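The bug classes Greg lists (ignoring a buffer size, forgetting to clean up an error path, forgetting to unlock a lock) are easy to show in C. This is a hypothetical sketch, not the kernel's QR-code code: the caller passes a pointer and a buffer size, as in the QR-code story, and the function checks the size before writing. It uses the single-exit `goto` cleanup style common in kernel C, so the error path still releases the lock. `struct fake_lock`, `copy_locked`, and its parameters are invented for illustration; real kernel code would use an actual mutex or spinlock.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a kernel lock: it only records whether it is held. */
struct fake_lock {
    bool held;
};

static void fake_lock_acquire(struct fake_lock *l) { l->held = true; }
static void fake_lock_release(struct fake_lock *l) { l->held = false; }

/* Hypothetical helper: copy `len` bytes from `src` into a caller-supplied
 * buffer of `buf_size` bytes while holding `lock`.
 * Returns 0 on success or a negative errno value, kernel style. */
static int copy_locked(struct fake_lock *lock, unsigned char *buf,
                       size_t buf_size, const unsigned char *src, size_t len)
{
    int ret = 0;

    fake_lock_acquire(lock);

    /* The check the QR-code bug skipped: never write past buf_size. */
    if (len > buf_size) {
        ret = -ENOSPC;
        goto out_unlock;   /* the error path still releases the lock */
    }

    memcpy(buf, src, len);

out_unlock:
    fake_lock_release(lock);
    return ret;
}
```

Remove the `len > buf_size` check and `memcpy` writes past the end of `buf`; `return` straight out of the error branch instead of jumping to `out_unlock` and the lock stays held. Rust's ownership rules and scoped lock guards target the second kind of mistake, but, as the QR-code story shows, code that is handed a raw pointer and a length can still get the first kind wrong.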
It doesn't mean it's going to remove all bugs. It's not a golden bullet, or rather a silver bullet. But yes, I think Rust needs to come in, because it should make drivers easier to write. We have a lot of issues with lifetime rules when you yank out a device. Devices are dynamic, and dealing with the reference counting of things like that is very tricky to get right. There are parts of the kernel where we still don't have it right, and we know we don't. Rust is forcing us to document our C code better, and it's cleaning things up. So even if Rust disappeared tomorrow, the C code would be better for it. I've had to clean up code in the driver core, like, "Oh yeah, I guess we can do things better and safer in the C code in order to make Rust easier." Mhm. So it's making us rethink how we do a lot of our existing code in the kernel. To be fair, a lot of core kernel people are very resistant to that. They don't like change, and they don't like different languages. One core kernel developer said, "I don't like working on a project that has multiple languages in it, just because it's tricky," and they are free to do that. They're not stepping on anybody's toes. A lot of it is miscommunication, and a lot of it comes down to people. Again, famously: many years ago I wrote the driver core, which is how drivers work in the kernel, and there had to be a Rust binding for it. I saw this code and said, "This is horrible. This isn't going to work at all. It's miserable." I went and actually met with the developers at a Rust for Linux conference. We sat down; I think they gave a whole presentation just for me. It turns out I was wrong and they were wrong. We were both wrong. They were doing crazy things: they had a thousand lines of Rust code for something I do in two lines of C code. I'm like, "Well, why?" They're like, "Well, we didn't want to change the C code."
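Greg singles out device lifetimes, where hardware can be yanked out while other code still holds pointers to it, as the hard part to get right. A common C answer is reference counting: the device structure is freed only when the last holder drops its reference. The sketch below is loosely in the spirit of the kernel's `kref` helper, but every name here (`my_device`, `my_device_create`, `my_device_get`, `my_device_put`, `my_device_unplug`, `devices_freed`) is hypothetical, and it omits the locking a real driver would also need.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct my_device {
    atomic_int refcount;
    bool present;            /* false once the hardware has been unplugged */
};

static int devices_freed;    /* counts releases; demonstration only, not thread-safe */

static struct my_device *my_device_create(void)
{
    struct my_device *dev = malloc(sizeof *dev);

    if (!dev)
        return NULL;
    atomic_init(&dev->refcount, 1);   /* the creator holds the first reference */
    dev->present = true;
    return dev;
}

/* Take an extra reference before handing the pointer to someone else. */
static struct my_device *my_device_get(struct my_device *dev)
{
    atomic_fetch_add(&dev->refcount, 1);
    return dev;
}

/* Drop a reference; whoever drops the last one frees the structure. */
static void my_device_put(struct my_device *dev)
{
    /* atomic_fetch_sub returns the old value: 1 means this was the last ref. */
    if (atomic_fetch_sub(&dev->refcount, 1) == 1) {
        free(dev);
        devices_freed++;
    }
}

/* Unplug: mark the device gone and drop the creator's reference. Holders
 * of other references can still safely read the struct until they put it. */
static void my_device_unplug(struct my_device *dev)
{
    dev->present = false;
    my_device_put(dev);
}
```

The unplug path only marks the device gone and drops one reference; the memory stays valid until the last user calls `my_device_put()`. Getting every get/put pair right by hand, across every error path, is the bookkeeping Greg hopes Rust's ownership model will make easier.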
I'm like, "We can change the C code. I just did it that way because it was easy in C, but if I change it, you get rid of a thousand lines of Rust. Let's do that." And again, it comes down to understanding what your problems are, understanding what my problems are, and working together. Now we have bindings in the kernel that you can actually write some drivers with. The Red Hat developers are starting to write the new Nvidia GPU drivers in Rust, and they're starting to put the proposals out there. The Apple GPU drivers for the Apple MacBooks are written in Rust. Those patches are not merged, but they're written in Rust and proven on a fork, and that works great. There are a whole bunch of crazy object life-cycle issues with graphics drivers, and Rust makes those a lot easier to handle. I think you'll see a lot more simple, stupid drivers for hardware devices being written in Rust, because all they want to do is read and write some random memory bits, and that's really easy to do in Rust; you can actually do it in less code than in C. Yeah. We now have the infrastructure in there, so I think we've hit the tipping point where you'll start seeing new stuff in Rust, and we need to do that. There are mandates from governments that you can't use memory-unsafe languages like C in products. Yeah. And if I want Linux to succeed, which I do, we're going to have to change. And I can say going forward: if you want to write in Rust, you can write in Rust. That being said, we still have 40 million lines of C code. Yeah. So we have some very, very good developers out there working on mitigating the problems we have in C. We now have bounds checking for our stuff. We have other things, we call them seat belts and airbags, that protect your C code from doing stupid things.
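Greg's remark that many simple drivers "just read and write some random memory bits" refers to memory-mapped device registers. Here is a hypothetical C sketch of that pattern: the register offsets, the enable and ready bits, and the `reg_read`/`reg_write` helpers are all invented, and a plain array stands in for real device memory so the example runs anywhere. Kernel code would use its own accessors (such as `readl()`/`writel()`), which also handle ordering guarantees that `volatile` alone does not provide.

```c
#include <stdint.h>

/* Hypothetical register map for an imaginary device (byte offsets). */
#define REG_CTRL     0x00u
#define REG_STATUS   0x04u
#define CTRL_ENABLE  (1u << 0)
#define STATUS_READY (1u << 0)

/* `volatile` makes the compiler perform every access instead of caching
 * or dropping it, which matters when the "memory" is really hardware. */
static inline uint32_t reg_read(volatile uint32_t *base, uint32_t offset)
{
    return base[offset / sizeof(uint32_t)];
}

static inline void reg_write(volatile uint32_t *base, uint32_t offset,
                             uint32_t value)
{
    base[offset / sizeof(uint32_t)] = value;
}

/* Set the enable bit without disturbing the other control bits. */
static void device_enable(volatile uint32_t *base)
{
    uint32_t ctrl = reg_read(base, REG_CTRL);

    reg_write(base, REG_CTRL, ctrl | CTRL_ENABLE);
}

/* Report whether the device says it is ready. */
static int device_ready(volatile uint32_t *base)
{
    return (reg_read(base, REG_STATUS) & STATUS_READY) != 0;
}
```

The read-modify-write in `device_enable` is the typical shape of such driver code; most of the real difficulty lies in the lifetime and locking rules around it rather than in the register accesses themselves.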
And we're working with the compiler authors to add new extensions to C and make the C code safer, because we want to protect the code we have today. You're not going to rewrite it all in Rust; don't worry about that. Google famously published something recently saying that over the past couple of years they've written their new code in Rust and become overwhelmingly more secure, even though they didn't touch the old code, because bugs decay over time. There are still going to be bugs in the older stuff, but most bugs happen in your new code, not your old code. That's awesome. I'm sensing you're excited about Rust, and it's also just nice to see the evolution. Yeah, it's evolution. Let's see what happens, and if it fails tomorrow, we can rip it out. But we have developers willing to do this work for us, and it's not intruding on other people's stuff. Well, I think it goes back to what you said earlier: a big part of Linux is "show the work." If it works, it works. It sounds like that's how Rust started and how it's progressing: people are showing that it works, proving that it solves their problem, maybe even that it works better for them, and then it goes step by step. Yeah. People ask, "Well, why not Zig or Hare? Those are other good languages." I'm like, "That's great, but nobody's proposed it." If they want to do that, fine. And to be fair, I think the developers who work on those languages don't care about Linux, which is fine; they don't have to. So, looking ahead, outside of Rust, what else are you excited about that's coming in Linux, either projects or changes? We haven't mentioned LLMs except once here. Will LLMs have any impact on how development is done? No, not really.
I mean, they're all trained on Linux kernel code, so they can write out another driver. LLMs are great for writing boilerplate code and things like that, but in Linux drivers you don't have much boilerplate, because we've pushed that down into the core and made it work better. LLMs are used to find bugs, and to find the bug fixes that we should be taking. But again, we've published papers on that for eight years; there's been lots of research on it, so we've been using that for a while. I mean, an LLM is just applied statistics, right? It's pretty much pattern matching. So for code at our level, it doesn't matter that much. So, no. And as for what's coming tomorrow, I don't know, because I just see what people send to me. We don't have a plan. We always joke that Linux is evolution, not intelligent design. It's whatever shows up, because you're solving your problem, and we'll figure out how to fit it in with everybody else's stuff and make sure it all works out. People are working on new features. People say, "Oh, it's an old model. It's the old Unix model." Yeah, we can run code from 20, 30, and 40 years ago, but we can also run new stuff. We have new features. We have new I/O paths that are even better. We have new types of functionality, new security models, new capabilities. We have new stuff for the new stuff, but we didn't break the old stuff, so we can do both. You can rewrite your code if you want.

## How LLMs are utilized in bug fixes and coding in Linux [01:09:55]

I know the databases are being rewritten to use io_uring, a new way to do I/O that gets the user-space-to-kernel boundary out of the way and provides a faster path. So they're speeding up the databases by porting them to new Linux features, but their old databases still run just fine.
So people look at it and say, "Oh, nothing's changed," because the old stuff still works. But that was the goal: the old stuff still works. So I don't know what's next; I just see what's new. New hardware features, I mean: I see new hardware coming all the time. The CPU vendors tell us, "Look at this new chip," and it's like, great. So that's always fun. And then, in terms of contributing to Linux: we just went through this example, and honestly it seems pretty easy to contribute. I wanted to ask for advice on contributing, but my sense is just do it; it's not that difficult. But from a professional point of view, what would a developer who's building other stuff at a company get out of contributing even one change, or a few changes, to Linux? How could their outlook change, or what could they learn? Well, the best thing is that it's your resume. I talk to college students at VU and other universities all the time, and I say, "Hey, contribute to the kernel while you have time." Then when you go to get hired, somebody can look at you and say, "Oh yeah, look, you do play well with others, and you contributed upstream." Because when you come into a company, you're not writing code from scratch. You're working with other people. You're working with existing code bases. If you contribute to Linux, or any open source project, you show that you can work with others and with an existing codebase. So it shows a great skill set. When I hired people at IBM, if you had contributed versus not, that was an easy sell. I'd rather take that. So from a personal point of view, contributing helps you get a job, and the next job, more easily. From an engineering point of view, you get to learn new things. I wrote my first driver and sent it out: "Oh, here it is.
It's all perfect." What? Everybody's like, "No, this is wrong. This is wrong. This is wrong. Have you ever heard of multiprocessors?" I'm like, "What? What is all this?" And that's great from an engineering point of view. I want to know better. You can never have all the best developers in the world at the same company. But in open source, the best operating system people can all work on the operating system together. So the depth and talent of the people working on Linux is just amazing. Take advantage of that.

## The value of contributing to the Linux kernel or any open-source project [01:12:13]

I'll say the developers working on Rust for Linux are core Rust developers. These people are really, really, really good; they maintain core parts of the Rust infrastructure. Take advantage of them. I'm learning so much from them. So from an engineering point of view, these people are out there and willing to help you grow as an engineer, learn different processes, and learn different skills. I learned so much more working in the community than I ever did working at companies, because you have a better review process and more exposure to crazy corner cases you hadn't thought of. "Oh yeah, in the real world that would have been one in a million," but we do have to handle it, because we have a million boxes, four billion machines, out there. Plus, I guess, the more curious you are: everything is open in Linux.
I remember when I joined Uber, I was amazed by the RFC process. Internally I could read all the RFCs, and I spent a week or two just breathing it in, trying to take it all in. In Linux it's all here for anyone. Obviously it's overwhelming if you start with everything at once, but you can target something. Even if you contribute only a little, or even before you contribute, you can just learn: you can see how changes are made and try to understand these things. Yeah. I will say it's not the best learning operating system. There are really good learning operating systems out there, and we're not that. That being said, people still write academic papers about it: they want to rewrite the scheduler and do all this fun stuff with it, because it is a real-world tool. I learned from Minix, and Linus learned from Minix, which was a learning operating system, and then we took those ideas and made Linux with them. Linus did it way before me. Learning operating systems are great, but working on a real-world system is a little bit different. That being said, there are parts of the kernel that are very easy for newbies to get into. We have a whole section of code with really bad, crummy drivers that have the wrong coding style and the wrong formatting, and a lot of dead code. It's there for beginners to pick up and make their first patch: fix the spelling mistakes, fix the coding style, learn how to do this stuff. And there's a whole website, kernelnewbies.org, a wiki with a whole bunch of material on it: how to write your first kernel patch, how to get involved. I've given YouTube talks you can find if you search for how to write a kernel patch, but they're old; I need to do a newer one. It's fun. I've gone to universities and said, "Here, I've given everybody a file, and you're going to write a kernel patch for this file." It's like, what?
Okay. And they do it, and by the end of the class, at the end of two hours, they've sent a patch off and gotten it accepted. It's very simple; it's not a difficult thing to do. And we want new people to get involved, because we don't know who's out there or what they can contribute, whether they want to do something for fun or for real. It's great. Awesome. Well, this has been really interesting, and I'd like to close with some rapid-fire questions where I just ask and you answer with whatever comes to mind. What's the most memorable patch you've contributed to in Linux? So this is going to be about people again. In the early 2000s, Microsoft was saying Linux is a cancer. We were all worried about Linux. Oh, I remember that. Yes, I remember that stuff. We started getting some really, really good patches for some hardware that we didn't know that well, showing some really good stuff, and it was like, this is really good, but where did you get this information? How did you know this stuff? Is somebody trying to sneak this in? And the person wrote back and explained how they found it, how they tested it, how they did it. Okay, all right, we took it, and over time we took all these patches. Then, we have this conference once a year for all the maintainers, which you get invited to, and we thought, "Oh, let's give them an invite, because the work was really good." It was in Canada every year for a number of years, for some reason. And he showed up and said, "Sorry, I had to bring my mom," because he was in high school. He was 17 years old. None of us knew. And he contributed, and it was like, okay, great. It turns out he later went on to MIT, and now he's a professor at Stanford. Wow. Yeah. All you see is an email address. Yeah. Say Adamo. It's like, okay. Yeah. It's things like that. That's good.
Another one I'm really happy about: there are lots of drivers that sat outside the kernel tree for many, many years, just because people never got them upstream. One of them is the subsystem that handles Braille keyboards. So, Braille displays? Yeah. And those were outside the tree. A couple of other developers and I worked with those people, got them into the tree and working, and now they're shipping with all devices. So we made sure these people weren't always having to patch in this out-of-tree stuff. These devices are only needed by a very tiny subset of people, but now support is available for everybody. I'm very happy to see that happen.

## Rapid fire round [01:16:40]

Wow. And I guess this goes back to what you said about why companies contribute a few developers per year: when you take Linux, you get an OS that also has, for example, Braille support. Adding that to an existing product, or building an OS like that yourself, would be a massive undertaking. Yeah. So now it supports all the devices out there. Awesome. What's your favorite programming language? It's still C. Still C. I've been doing C for, what, 30 years, every day. So yeah, C. I've been doing a lot of Rust lately. With Rust, I feel like I'm able to write really sloppy code and it works. I don't feel like I have to be as precise as with C, which, I don't know if that's good or bad. And what's a book or two that you would recommend reading? The old Code Complete book was a really good one. It taught me that coding style matters. It doesn't matter what the coding style is; having one consistent coding style matters, because our brains work on patterns. As programmers we're reading patterns, and when the patterns are the same, the metadata goes away and we can see the logic more easily.
Code Complete has aged a little bit weirdly. If you look at the first edition, it has a lot more C examples and whatnot, but it talks about the basics behind programming and all that stuff, and it was a really, really good book. On the flip side, another really fun one that's really tiny is Programming Pearls: bit fiddling, cute little algorithms, and neat stuff like that, which, surprisingly, we still do today. We're talking about adding parity functions in a common way, and everybody's like, "No, if you do it this way, it'll be faster." So we're still messing with things that people have messed with for 40, 50, 60 years. And these things still matter, and they matter to people because cycles matter and power matters. So between those two, those are my favorites. Well, this is awesome. This has been such an interesting, and for me really educational and eye-opening, chat. I'm glad we did it. Well, thanks for having me. I found this episode to be a really interesting one about Linux. I'm still amazed that an open source project managed to become the most widespread operating system in the world despite not being a commercial business. It's such an interesting and inspiring project. You can find Greg on social media, linked in the show notes below. And if you'd like to try your hand at contributing to Linux, visit kernelnewbies.org. For more deep dives related to backend engineering, check out the Pragmatic Engineer articles linked in the show notes below. If you enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube. This helps more people discover the podcast, and a special thank you if you leave a rating. Thanks, and see you in the next
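Greg's example of Programming Pearls-style bit fiddling that still matters is the debate over a common parity function (whether a word has an odd or even number of set bits). To show why people argue about it, here are two hypothetical C versions: a straightforward loop over every bit, and the classic XOR-folding trick that finishes in a fixed five steps for a 32-bit word. Neither is claimed to be what the kernel adopted; which one is faster in practice depends on the compiler and the processor, and GCC and Clang also provide a `__builtin_parity` builtin.

```c
#include <stdint.h>

/* Straightforward version: XOR together every bit, one at a time.
 * Runs once per bit up to the highest set bit. */
static unsigned parity_loop(uint32_t x)
{
    unsigned p = 0;

    while (x != 0) {
        p ^= x & 1u;
        x >>= 1;
    }
    return p;
}

/* Bit-fiddling version: fold the word in half repeatedly with XOR, so
 * the parity of all 32 bits ends up in bit 0 after five steps. */
static unsigned parity_fold(uint32_t x)
{
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1u;
}
```

Both return 1 for an odd number of set bits and 0 for an even number; the folding version trades the data-dependent loop for a fixed sequence of shifts and XORs, which is the kind of cycle counting Greg says still matters.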