
What is a Principal Engineer at Amazon? With Steve Huynh

The Pragmatic Engineer • 73:17 minutes • Published 2025-07-09 • YouTube


📝 Transcript (2174 entries):

## Intro [00:00]

If you're going to optimize for performance, ask: why can't we be at 1 millisecond, or why can't we be at 10 milliseconds? Start from there, instead of saying, "Hey, let's try to decrease latencies by 50% or 25%." Just start from the conceptually fastest thing we could do. And that's actually how Amazon was created.

Amazon's principal engineering level is unique in many ways across Big Tech. Steve Huynh was a software engineer at Amazon for 17 years and spent the last four of those as a principal engineer. Today we talk about the ins and outs of this role, including: why being promoted from senior to principal is so hard, even though Amazon usually has hundreds of principal engineering openings and thousands of seniors trying to get into these positions; the Amazon principal engineering community, with its in-person events, its Slack group, and the Principles of Amazon internal presentation series; engineering concepts at Amazon around reliability, such as brownouts and COE (Correction of Errors); and many more topics. If you're interested in understanding one of the hardest engineering levels to get into across Big Tech, together with stories of how Steve thrived in this position, this episode is for you. Subscribing on YouTube and on your favorite podcast player greatly helps more people discover this show. If you enjoy it, thanks for doing so.

## What Steve worked on at Amazon, including Kindle, Prime Video, and payments [01:11]

So, Steve, welcome to the podcast.

Thanks for having me.

How long were you at Amazon?

17 years. Yeah, I was there for 17 and a half years, and I just quit last year. So I've been doing other things for basically a year now.

And what were the things you worked on while you were there?

People always talk about my long tenure there, but I feel like I've had five or six jobs over that time period. I started off on a project called Search Inside the Book.
I worked on the first Kindle launch.

Wow.

I worked on the precursor to Prime Video. So I worked there at the beginning of my career, and then I ended my career there, for the last five years of my time at Amazon. I worked in payments. I worked in Amazon Local, which was our Groupon project back when that type of business looked like it was going to take over. I worked on Amazon Restaurants. I worked on Amazon Tickets, which was our Ticketmaster clone. And then my last five years were spent working on live sports streaming on Prime Video.

If you want to build a great product, you have to ship quickly. But how do you know what works? More importantly, how do you avoid shipping things that don't work? The answer: Statsig. Statsig is a unified platform for flags, analytics, experiments, and more, combining five-plus products into a single platform with a unified set of data. Here's how it works. First, Statsig helps you ship a feature with a feature flag or config. Then it measures how it's working, from alerts and errors, to replays of people using that feature, to measurement of top-line impact. Then you get your analytics, user account metrics, and dashboards to track your progress over time, all linked to the stuff you ship. Even better, Statsig is incredibly affordable, with a super generous free tier, a starter program with $50,000 of free credits, and custom plans to help you consolidate your existing spend on flags, analytics, or A/B testing tools. To get started, go to statsig.com/pragmatic. That is statsig.com/pragmatic. Happy building.

This episode was brought to you by Graphite, the developer productivity platform that helps developers create, review, and merge smaller code changes, stay unblocked, and ship faster. Code review is a huge time sink for engineering teams. Most developers spend about a day per week or more reviewing code or blocked waiting for a review. It doesn't have to be this way.
Graphite brings stacked pull requests, the workflow at the heart of the best-in-class internal code review tools at companies like Meta and Google, to every software company on GitHub. Graphite also leverages high-signal, codebase-aware AI to give developers immediate, actionable feedback on their pull requests, allowing teams to cut down on review cycles. Tens of thousands of developers at top companies like Asana, Ramp, Tecton, and Vercel rely on Graphite every day. Start stacking with Graphite today for free and reduce your time to merge from days to hours. Get started at gt.dev/pragmatic. That is G for Graphite, T for technology, .dev/pragmatic.

So that's a lot of different teams. How did you work on so many teams? Is it just that there are a lot of internal transfers? Did you get bored? Did you just follow your manager? How does it work inside Amazon? Because people who have not worked at Amazon would kind of assume that you join, and then you're on a team for four, five, six years. Clearly not the case.

## How Steve was able to work on so many teams at Amazon [04:38]

It depends a little bit on corporate policy, and then on where you are in your career. I started as a support engineer, so an operationally focused person, and then I basically said, "I want to be a software developer." Getting into the company was pretty difficult, but once I was there, I set that target and changed roles, and when I changed roles, it was a natural time to move to another team. There's also some internal policy. At Amazon, it used to be that you had to stay on a team for at least a year before you transferred, and if you wanted to transfer, a senior manager or director or whoever was up top could block your transfer.
What that ended up meaning was that certain teams that were just terrible to work on actually had more than 100% attrition over the course of a year, because attrition was measured over a year-long window. Then Amazon did something actually smart at the corporate level. This happened, I don't know, probably 10 to 13 years ago. They basically said, "OK, you have freedom of movement now. A VP or a director can't block you." They can say, "We need another month to get a transition plan going," but essentially you have freedom of movement as long as you're not on a performance improvement plan. That meant certain teams were sources of high-quality engineering talent and certain teams were sinks of high-quality engineering talent, and it created an internal marketplace for different roles. Now, what that also meant was that certain teams basically didn't want you to know what the policy was. They wanted you to think you were kind of stuck.

Mhm.

But despite that sort of local gamesmanship that was going on...

Yeah. Basically, some managers didn't want their best people to leave, right? Let's just say it how it is.

Ultimately, I think it's a great strategy, because if there was a team that was difficult to staff, the problem was on the management. It wasn't something that had to be borne by the employees themselves. So, getting back to my own career journey: at a very large company like Amazon, there are so many awesome things going on, and I decided to just go where my curiosity took me. Now, there were some times when there were reorgs, or a line of business got spun down.
But ultimately, I think freedom of movement was one of the smartest things Amazon did.

And I think this is something people don't really appreciate about some large companies. Not all companies are like Amazon, and every company changes, right? Today, I'm assuming it's harder to move across as many teams within Amazon. And depending on where you are, if you're in a satellite office with two teams, you can probably move to the other team at most.

Mhm.

But I think this is one of the underrated things about large companies: once you're in, it's almost always easier to get a job on another team from the inside.

Yes.

Especially because you can talk to them. I talked with the Reddit mobile team and asked, "How can you become a platform engineer on the mobile team?" And they said, "Well, most of our hires have been internal. They helped us out on hackathons, they came around, they committed stuff. We know them. It's a low-risk hire." I think it's nice to remember that a big company like Amazon or Meta or Microsoft is just so many small teams, and once you're in, you actually have almost priority access to those teams, if you play your cards right.

Absolutely. You might still interview for that team, but it's such lower stakes than an external interview. All things being equal, would you rather take somebody who's internal and knows the culture, who knows how software is developed within a particular context, or somebody who's just as good but hasn't been onboarded? Ultimately, you're going to pick the internal person, all things being equal.

Yeah. It's kind of just business rationality, for the most part.
One thing people talk about externally with Amazon, and with large companies like Amazon, is scale, and it's hard to imagine. Can you give us a sense of the scale you've seen, or some tough engineering challenges you worked on that would have been really hard to work on at a smaller startup?

## An overview of the scale of Amazon and the dependency chain [09:12]

Yeah, I think scale is the thing you will not see at most other places. I'll give you a couple of examples. Prime is the exclusive club that everybody is a member of.

Yeah.

In the US, the shipping benefit is probably the most popular, but globally, Prime Video is the thing people use most with their subscription. So think about our service-oriented architecture and just loading up the app: the gateway page is the place where all of our requests come in, right? It's just like Netflix: an infinite scroll of carousels.

So the gateway page is the Amazon Prime landing page?

Yeah, it's the landing page. Let's say 90, 95, 99% of all your requests come from that page, that page needs to be personalized, and you have a service-oriented architecture with a bunch of microservices. One request to that page turns into, let's say, hundreds of downstream requests to different services. It might even be more than that; it's actually kind of hard to count.

Yeah. And is it this page, right? All the personalized stuff flowing in. So that's the retail one.

I was talking about the Prime Video one, but essentially it's the same thing. And the same goes for the retail website as well.
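To make the fan-out arithmetic concrete, here is a minimal Python sketch. The edge traffic, fan-out factor, and calls-per-request figures are hypothetical placeholders; the conversation only says that one gateway request turns into "hundreds" of downstream calls.

```python
def total_internal_rps(edge_rps: int, fanout: int) -> int:
    """Internal requests per second created when every edge (gateway) request
    spreads out into `fanout` downstream service calls."""
    return edge_rps * fanout


def service_rps(edge_rps: int, calls_per_edge_request: int) -> int:
    """Load on one downstream service that is called `calls_per_edge_request`
    times while serving a single gateway request."""
    return edge_rps * calls_per_edge_request


if __name__ == "__main__":
    edge_rps = 1_000  # hypothetical gateway traffic
    print(total_internal_rps(edge_rps, fanout=200))          # 200000
    print(service_rps(edge_rps, calls_per_edge_request=20))  # 20000
```

Even with these modest assumed numbers, a single downstream service lands in the tens-of-thousands-of-requests-per-second range that comes up next.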
So if you have one request spidering out into two orders of magnitude more requests internally, you start to see really, really large scale for these microservices. A microservice will have a reverse proxy or a load balancer in front of it, and you are, unironically, talking about tens of thousands or hundreds of thousands of requests per second coming into your service.

So the services behind the page, when everything is loading and spidering out to render, say, one recommendation for a video, get a lot of requests from different services. So when you're operating a smaller service inside of Amazon, you're suddenly hit with what you just said: 10k, 100k requests per second.

Exactly. And you'll essentially be DDoSing yourself. You're like, "OK, cool, let's change a caching configuration on some item details," and it turns out you've just browned out a critical service, right?

What does brown out mean?

Oh, sorry, I'm using some jargon. If we want to talk about availability: suppose you're calling a service, or sending a lot of requests over to them. You can just take them down. That would be a blackout: you send a request, you can't establish a connection, and the failure comes back immediately. But there's a type of outage where they brown out. They're reachable, and they might accept a connection, but they'll essentially time out, or return partial or bad results, or the only thing they return is a 500 for some proportion of requests.

After we waited a bunch of time for that.

Yeah.
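The blackout versus brownout distinction can be sketched as a classifier over a window of recent call outcomes. This is an illustrative Python sketch, not Amazon's tooling: the `Outcome` record, the 1-second timeout, and the 5% error-rate threshold are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Outcome:
    connected: bool    # was a connection established at all?
    latency_ms: float  # time until a response arrived, or until we gave up
    status: int        # HTTP status code; 0 when no response was received


def classify(window: List[Outcome],
             timeout_ms: float = 1000.0,      # hypothetical client timeout
             max_bad_fraction: float = 0.05,  # hypothetical error tolerance
             ) -> str:
    """Classify a dependency's health from recent call outcomes (assumed thresholds)."""
    if not window:
        return "unknown"
    # Blackout: nothing connects, so calls fail fast.
    if not any(o.connected for o in window):
        return "blackout"
    # Brownout: reachable, but too many calls time out or return 5xx errors.
    bad = sum(
        1 for o in window
        if not o.connected or o.status >= 500 or o.latency_ms >= timeout_ms
    )
    if bad / len(window) > max_bad_fraction:
        return "brownout"
    return "healthy"
```

The timeout is what makes a brownout visible at all: a service that accepts connections but answers slowly looks "up" unless the caller bounds how long it waits.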
So now we start talking about availability and resilience in the face of all this DDoSing you're doing to yourself. And the thing on top of scale that really complicates matters is your dependency chain. Your service is a dependency of some process that's going on. It may depend on AWS; it may depend on another service. Suppose there's a failure in a primary dependency and that dependency comes back up: how do you make sure you don't just inundate it with a bunch of requests as it's trying to recover? You get all these odd dynamics. I used brownouts as an example because they're a perennial problem for us. Maybe there's a dependency on a base service like S3 or DynamoDB or whatever it is. There might be some increased latency, which may cause a chain reaction of a dependency going down, and then one of these middle-tier services browns out. You're an owner of the services for your team, so the questions are: What do we do in those situations? How do we know they're browning out? What do we do in the face of a dependency outage? And critically, if there is an outage and the service comes back up, how do we give it enough space to breathe, so that as it's trying to recover, we don't just take it down again immediately?

And I guess for most of us who are not working on these services right now, these sound pretty cool in theory. But you're saying this was not theory. This was actually, "This service is going down."
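One common way to give a recovering dependency "space to breathe" is retrying with capped exponential backoff and full jitter. The conversation does not say which mechanism Amazon teams used, so treat this Python sketch as an assumed, generic technique; the function name and defaults are hypothetical, and transient failures are modeled as `ConnectionError`.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_s: float = 0.1,   # hypothetical ceiling for the first backoff
    cap_s: float = 10.0,   # hypothetical maximum backoff
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Retry `fn` on ConnectionError, sleeping a random ("full jitter") delay in
    [0, min(cap_s, base_s * 2**attempt)] between attempts, so that many clients
    do not hammer a recovering dependency in lockstep."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            ceiling = min(cap_s, base_s * 2 ** attempt)
            sleep(rng.uniform(0.0, ceiling))
    raise AssertionError("unreachable")
```

The randomness is the important design choice: without it, every client that failed at the same moment would retry at the same moment, recreating the spike that took the dependency down.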
"We are literally getting 100k requests per second, and we're pushing that onto three other services, because we need to invoke three other services. One of them has browned out. What do we do now? How do we fix it?"

Yeah. And I think at certain other large tech companies, you can do best effort, right? Which is basically, "Hey, we're temporarily down, but you have some sort of degraded service." That makes sense. But if you're on, say, a website that does purchases, now we're talking about transactions. Or if you're in the Prime Video live streaming use case, now we're talking about a football game you're unable to see, and by the time we recover, the game might be over.

Yeah.

Right. So it's much higher stakes. I think scale combined with transactional semantics is the challenge you're not going to see unless you work for a payment processor or something.

Yeah. I guess it's that real-world pressure: you are losing money. I'm starting to understand something. I've noticed that startups love to hire from certain companies. Usually, startups love to hire from other startups, because it's a similar environment. From large tech companies, it's more of a maybe. I'm generalizing, and this obviously will not be true 100% of the time, but, for example, a lot of startups are not as happy hiring from Google, because people coming from Google are used to having an amazing team and internal tools around them. But most startups love hiring from Amazon, and I'm starting to get a sense of why.

Yeah, I think that's part of the culture. You get hired as a software developer and they hand you a pager. Before phone apps and things like that, it was literally a pager from the '90s.
And it's really great, because you have to operate the software you write. You cannot write the software, hand it over to the testing team, and then throw it over to the SRE team after you're done. You own that piece of software.

Yeah. On every team, right?

Mhm.

## Amazon’s focus on latency and the tradeoffs they make to keep latency low at scale [16:40]

One interesting thing we talked about yesterday over dinner with Casey Muratori: you said Amazon measured, on their retail website (I think it was retail, maybe Amazon Prime), that the lower the latency of something loading, like a page or a purchase button, the more revenue they got. They started measuring it, and there was a linear correlation: the faster it was, the more people converted, and it seemed to have no end. And the question Casey asked was: OK, if this is the case, what would stop Amazon? You have the best technologies in the world. You have AWS. You can build whatever you want to get the latency of the website down to, let's say, 10 milliseconds or even 1 millisecond, and since revenue would go up, you would maximize revenue. So can you tell me how this measurement actually happened, and why Amazon's website is maybe still not the fastest in the world, even though being faster would generate so many more billions?

Yeah. Well, there are a couple of questions embedded in there, but we'll start with the latency-to-gross-revenue measurement. Way back when, because we invest in logs and telemetry, somebody started tracking how much gross revenue we made based on the latency of detail pages, the latency of the gateway, and the latency of the checkout pages.
And they noticed this dynamic where, if you're faster, you just make more money. It's a pretty clear correlation. I think you could even go as far as to call it causation. So there was a really big focus on latencies. I love the idea that, if you're going to optimize for performance, you ask, "Why can't we be at 1 millisecond, or why can't we be at 10 milliseconds?" and start from there, instead of saying, "Hey, let's try to decrease latencies by 50% or 25%." Let's just start from the conceptually fastest thing we could do.

Mhm.

And I think, in a vacuum, the conceptually fastest thing we could do is a monolith, which is how Amazon started: a web server with all of your catalog information, so all of the items, plus transaction processing on the same host. That would be the fastest way to run it. A web request would perform the HTTP or HTTPS handshake and hit the server, and the server, in an ideal world, has everything cached or calculated and sends it back. So the total latency would be the time for the request plus the time to transfer the data at your internet speed, and that's it.

So that is the absolute minimum? You cannot be faster than that?

I don't think so. Maybe there's some exotic protocol that predicts the future and sends it over UDP. But yes, this is your baseline.

I guess the optimum would be zero-click instead of one-click checkout, right? You just send me stuff before I know I want it. That would be the theoretical maximum.

But if there's some HTTP request and then some buy button, that would be the fastest, right? And that's actually how Amazon was created.
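Steve's first-principles floor (handshake, request, transfer, and nothing else) can be written as a back-of-the-envelope formula. The Python sketch below is a hedged approximation: it assumes a fresh connection using TCP plus TLS 1.3 (one round trip each before the request can be sent) and ignores DNS, TCP slow start, and rendering. The RTT, payload size, and bandwidth values are illustrative, not Amazon measurements.

```python
def min_latency_ms(
    rtt_ms: float,
    payload_bytes: int,
    bandwidth_mbps: float,
    server_ms: float = 0.0,      # ideal case: everything already cached/computed
    setup_round_trips: int = 2,  # assumed: TCP handshake + TLS 1.3 handshake
) -> float:
    """Lower bound for one request on a new connection:
    connection setup + one request/response round trip + time on the wire."""
    transfer_ms = payload_bytes * 8 * 1000 / (bandwidth_mbps * 1_000_000)
    return (setup_round_trips + 1) * rtt_ms + transfer_ms + server_ms


if __name__ == "__main__":
    # 20 ms RTT, 100 KB page, 100 Mbit/s link: 3 * 20 + 8 = 68 ms
    print(min_latency_ms(rtt_ms=20, payload_bytes=100_000, bandwidth_mbps=100))
```

Anything measured above this floor is overhead that can, in principle, be attacked; a monolith on one host sits close to it because nothing else is on the critical path.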
It was sort of the opposite of horizontal scaling; it was vertical scaling. We bought these big Sun boxes, we hacked up our own web server in C++, and to scale up, we bought bigger hardware. When that didn't work, we bought about six of these big boxes, and that ran Amazon. We ran that way until the early 2000s, and then we ran into a wall: when you built the C++ binary, the binary could only be 4 GB. That was a hard limit, based on the 32-bit architecture we were running on. We could not get above 4 GB. Product managers would come to the devs and say, "Can you just make a change for me?" And the devs would say, "I don't think you understand that this is a hard constraint."

So the size of the compiled binary was capped there, and by then you had so much business logic that it just filled up the 4 GB.

Yeah, yeah.
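The 4 GB ceiling follows from 32-bit addressing: a 32-bit pointer can distinguish at most 2^32 byte addresses. A tiny Python check of that arithmetic is below; the helper names are made up for illustration, and in practice the usable limit on a given OS and toolchain was often lower than the full 4 GiB.

```python
ADDRESS_BITS = 32


def addressable_bytes(bits: int = ADDRESS_BITS) -> int:
    """Number of distinct byte addresses a pointer of `bits` bits can express."""
    return 2 ** bits


def fits(binary_size_bytes: int, bits: int = ADDRESS_BITS) -> bool:
    """Hypothetical check: could an image of this size be addressed at all?"""
    return binary_size_bytes <= addressable_bytes(bits)


if __name__ == "__main__":
    print(addressable_bytes())            # 4294967296
    print(addressable_bytes() / 2 ** 30)  # 4.0 (GiB)
```

No amount of extra hardware changes this number, which is why it was a hard constraint rather than a scaling problem.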
And we had a distributed C++ build: it would take many, many hours to compile, so we distributed it across desktops. It was this whole big thing. But we ran into that wall, and what we decided to do, and I think this was super smart, was to lean into service-oriented architectures and microservices.

Yep.

When you break it down, a web service call is essentially a remote procedure call. You have this execution pointer, and you say, "OK, I need to do some computation, or gather some data," so you in turn make an HTTP request downstream to another service, and you can chain those things together. So, getting back to the original point about performance: because you have thousands and thousands of developers building this stuff, and because you cannot have a monolith as big as Amazon retail past roughly circa-2002 Amazon size, you have to lean into remote procedure calls. You have to accept that there's a web service. The best performance you can actually get is always going to be bounded by the number of web requests you end up making, whether it's the first-order calls to, say, go get the item details, or any blocking call that happens downstream.

And by a blocking call, we mean you need to wait for it to finish to get your data. For example, a service that returns, I don't know, your top five most-likely-to-buy items might need to make, let's say, five requests, or just one request, and it needs to wait for them before it can return.

Exactly. And you can use telemetry and observability to figure out, within that service call chain, what the blocking call is, and you can get some amount of visualization of it.
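The point that latency is bounded by the chain of blocking calls can be sketched as a small critical-path calculation. In this hypothetical Python model (the `Call` structure and every service name and timing are invented for illustration), calls inside a stage run in parallel, so only the slowest counts, while stages run one after another, so their costs add up.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Call:
    name: str
    self_ms: float  # time spent inside this service itself
    # Stages run one after another; calls within a stage run in parallel.
    stages: List[List["Call"]] = field(default_factory=list)


def critical_path(call: Call) -> Tuple[float, List[str]]:
    """Return (latency_ms, blocking chain) for a call tree: parallel calls
    contribute their slowest member, and sequential stages add up."""
    total = call.self_ms
    chain = [call.name]
    for stage in call.stages:
        if not stage:
            continue
        slowest_ms, slowest_chain = max(
            (critical_path(c) for c in stage), key=lambda result: result[0]
        )
        total += slowest_ms
        chain += slowest_chain
    return total, chain


if __name__ == "__main__":
    ranking = Call("ranking", 30)
    gateway = Call("gateway", 5, stages=[
        [Call("item_details", 20), Call("recommendations", 50, stages=[[ranking]])],
        [Call("pricing", 10)],
    ])
    print(critical_path(gateway))
    # (95, ['gateway', 'recommendations', 'ranking', 'pricing'])
```

In this example, shaving time off `item_details` would not change the page latency at all; only calls on the blocking chain matter, which is why finding that chain with telemetry comes first.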
So then you can get down to the point where, starting from first principles, you ask: what's the least amount of latency you can get for, say, a web request or a checkout page call? You're going to run into an absolute minimum, and it's going to be based on the required operations (evaluations, transactions, or whatever) for that particular request.

Yeah. So, as I understand it, moving to more and more microservices was great for maintainability. First, it solved the issue of the monolith's size, and, as we know from history, teams could now be more autonomous: they weren't as dependent, and they could build their own APIs. But it was a trade-off for latency. Now you had to go back and figure out the blocking calls and how to speed them up, and make trade-offs like caching, where things are fast but might not be as correct on the first load, or tricky UI where you don't show the data just yet, but it's coming, and users get a sense of progress. Those kinds of things.

It also forces teams, and product, to really ask: what is the strictly necessary processing that happens on this page? Some of the work I was doing before I left Prime Video was this: you have these really big, heavy gateway page (or landing page) requests, and if you're in a situation with high load, can you preemptively reduce the amount of personalization going on, to speed up that page or to increase your throughput so you can serve more customers? Can you do that in a smart way that anticipates the load coming onto that page?

Mhm.

Say, if there's a football game coming up, or something like that.

Yeah.
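Preemptively trimming personalization under load can be sketched as a function that picks how many carousel rows to personalize from current and forecast traffic. This is a hypothetical Python illustration of the idea Steve describes, not Prime Video's implementation; the 70% utilization threshold, the row counts, and the linear ramp are all assumptions.

```python
def personalized_rows(
    current_rps: float,
    capacity_rps: float,
    forecast_rps: float = 0.0,  # e.g. expected traffic when a big game starts
    full_rows: int = 20,        # hypothetical: rows personalized at low load
    min_rows: int = 2,          # hypothetical: rows kept even at peak load
    degrade_from: float = 0.7,  # hypothetical utilization where trimming starts
) -> int:
    """Choose how many carousel rows to personalize; the rest would fall back
    to cheaper, non-personalized content."""
    # Plan for whichever is higher, so degradation happens before the spike hits.
    utilization = max(current_rps, forecast_rps) / capacity_rps
    if utilization <= degrade_from:
        return full_rows
    if utilization >= 1.0:
        return min_rows
    # Scale linearly from full_rows down to min_rows between degrade_from and 100%.
    headroom = (1.0 - utilization) / (1.0 - degrade_from)
    return min_rows + int(headroom * (full_rows - min_rows))
```

Using the forecast, not just current traffic, is what makes the degradation preemptive: the page is already lighter when the kickoff spike arrives.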
These just seem hard to solve, but now you have to solve them. So it sounds like this kept you busy, and keeps everyone else at Amazon busy to this day, right? Do you think this is an ongoing engineering challenge for Amazon? Because what I imagine is tricky here is: OK, you can optimize whatever you have, you can find the critical path, but Amazon keeps growing, right? There are new teams, new services, new everything coming on, so this changes all the time. It's an ongoing puzzle to solve.

Yeah, absolutely. I think they definitely have a ton of work in front of them. Also, it's part of their ethos to launch new lines of business really quickly, and the ability for a team to go from zero to a launched product within the confines and context of a large corporate entity is part of the DNA there. So, as long as they're "planting seeds," as the internal terminology goes, I think software developers will be in demand for quite a while.

## Why companies should start with a monolith [26:00]

Yeah. And I guess it's a good reminder, every now and then when we have the monolith versus microservices debate, that it just makes sense for a startup to start with a monolith. You can always do what Amazon did, and you get the latency benefits: everything is in one place. I'm sure there might be reasons to start with microservices, but if you're a small team, I mean, even today I don't think that argument changes, right? Amazon got really big wins by starting with a monolith back in the day.

Yeah, absolutely.
I think it just makes a ton of sense to start with a monolith and wait until it breaks. The point where it breaks is when you have something like 50 developers working on the same piece of code. Once that breaking point occurs, you start figuring out how to break things up. But starting with a microservice architecture, especially when you're small? What a waste of time and energy.

Totally.

## The structure of engineering at Amazon and why Amazon’s Principal is so hard to reach [26:44]

So you were a principal engineer at Amazon. Companies have different levels; some have a staff level, but it's usually entry level, mid-level, senior, and then staff, or, in the case of Amazon, principal. I've learned that Amazon's principal level is really hard to get into compared to a lot of other companies, and that it's pretty special in some ways. We'll talk about that, but can you tell me how career development works? Because most people imagine it should be pretty straightforward: I spend, I don't know, roughly two years as a junior, two years as a mid, and two years as a senior, and then I get to principal. How does it actually work at Amazon?

I think it's linear up until you hit principal, right? You join, you're a junior developer, and you get promoted to mid. At mid, you're starting to influence the team. Then you get to senior, and now your expected impact is at the team level. And then there's this jump to principal.

And principal is L6?

Principal is L7.

L7, yes.

Yeah. So I think you really have to start with why that jump is so big, because at pretty much any other company, it's just a linear progression.
There's nothing necessarily special about staff; you just go to that level, then senior staff, and then principal. But for some reason, Amazon decided they weren't going to have a staff level, and I think they couched it in terms of having high standards. Basically, to get from senior to principal, from L6 to L7, you have to make something like a two-and-a-half-level jump. Technically it sounds like one level, but at some other companies this might be something like L8, L9, or L8 and a half.

Yeah.

So the hand-wavy argument is: "Hey, we have high standards, and it means something to get to that level." Fine. But I noticed that some of the best engineers I'd ever worked with were having such trouble getting to principal engineer that they ended up moving to Facebook, or Meta, and all these other places where the progression was just sane. Now they're staff or senior staff, or principal and distinguished engineers, at other companies. So, because we had high standards, we actually had a brain drain, and it wasn't a brain drain at the lower levels. It was a brain drain at the higher levels.

Mhm.

It's just an example of something where you ask, "Why did you do that to yourself?" So that's the context for being a principal at Amazon. It's safe to say it's wicked hard to get there internally, right? I'm colleagues with Ethan Evans, and we've talked about what the hardest promotion at Amazon is. I had made the argument that it was senior engineer to principal, and he said, "Yeah, that's hard, but actually the hardest one, Steve, is VP to senior VP, because there are only eight or ten spots for that, and maybe 300 VPs all trying to get there." I would say that's more of a supply-and-demand thing.
I will say that at Amazon there is gigantic demand for principal engineers, and so there are roles that have been open for years. I think it takes something on the order of 13 or 17 months to get an external hire to join as a principal engineer. But that metric is only calculated when the role is filled. Yeah. And so there are probably hundreds of principal engineer openings at Amazon. Mhm. And there are thousands of senior engineers who desperately want to get there and are putting in the work. So there's this tension, right? And I don't think you see that at the lower levels. I don't think that's happening at senior or mid or junior. So that incongruity, I think, is super interesting. But once you do get to principal engineer, one thing I've never heard of at any other company is that there is apparently a principal engineering community, which, again, I've heard from other people is tightly knit. It's actually special. It's actually ## The Principal Engineering community at Amazon [30:44] just a really nice organization. Can you talk about that? So once you got in there somehow, and I don't know, was it a blood, sweat, and tears promotion? There is a community. I think it's actually really great. My own history: I went from support engineer to senior engineer in about four years at Amazon, but then from senior to principal, it took me eight years, and I got promoted in Q1 of 2020. That turned out to be a consequential year for the industry and for the world: that was forced remote work. And so I got promoted and everybody's like, congratulations. They used to have a principal engineer offsite where they just flew everybody into Seattle or nearby to mingle and talk to other folks.
That stopped during the pandemic, and by the time the pandemic restrictions started lifting, the population of principal engineers had essentially doubled. That said, there are still hundreds and hundreds of openings for principal engineer. But then the offsite community shifted over to the senior principals, which I didn't have access to. At the moment, the manifestation of the principal engineering community is essentially the Slack channel, which is absolutely awesome. And then we had principal offsites for our local organizations, so Amazon Music, Prime Video, Twitch, that sort of thing. Those meetups were amazing. And the reason they were is this high standard that Amazon had created. What it meant is that everybody who was able to achieve that overly high standard had something exceptional about them. They were super deep in a particular technology, or they were associated with the growth of a really large line of business, either within Amazon or externally. They were essentially leaders within the industry, and you could literally just scoop out five people, put them into a room, and the conversation would be amazing, right? And I would be like, I don't even belong here. Look at this guy: he wrote a book on a particular topic. And this guy was a luminary in a particular field. And then this person is just an amazing code machine who can write an entire application over a weekend. And you're like, what am I doing here? I do wonder if that community might be coming back now.
I know you've left, but Amazon is now in person, and it sounds like a lot of the benefit was the in-person part as well, because this is something I never heard of, even before the pandemic. I didn't hear other companies do this. For example, at Uber I've heard that the senior staff engineers do get together every now and then, but it was very grassroots, so it was bottom-up. My understanding is that Amazon actually invested: not just some principal engineers saying, hey, let's get together, but making sure that group really had something. I think it's smart. I think more companies should do it, but I'm just not seeing it. The investment was also in terms of headcount. There are program managers, and essentially product managers, bringing the folks together. Awesome. There's a wonderful series called the Principals of Amazon series, where principal engineers do a presentation and it's recorded. That's been happening for 20 years, and we record everything, but it takes work to actually make that happen. And is that internal series open to everyone at Amazon, or is it for the principals themselves? It's open for everybody at Amazon to consume. And there might be some senior engineers who make a presentation, because part of their promotion packet is being able to give an Amazon-wide presentation on a particular topic. My point, though, was that that stuff doesn't just happen on its own. Yeah. You need a program manager or multiple folks to herd the cats, to schedule the offsites, and to make sure the Slack channel doesn't go off the rails and is still useful. It's just not going to happen grassroots by throwing a bunch of people into a room.
This episode is brought to you by Augment Code. You're a professional software engineer. Vibes will not cut it. Augment Code is the AI assistant built for real engineering teams. It ingests your entire repo, millions of lines, tens of thousands of files, so every suggestion lands in context and keeps you in flow. With Augment's new remote agent, queue parallel tasks like bug fixes, features, and refactors. Close your laptop and return to ready-for-review pull requests. Where other tools stall, Augment Code sprints. Augment Code never trains on or sells your code, so your team's intellectual property stays yours. And you don't have to switch tooling. Keep using VS Code, JetBrains, Android Studio, or even Vim. Don't hire an AI for vibes. Get the agent that knows you and your code base. Start your 14-day free trial at augmentcode.com/pragmatic. I think these are the things... I mean, we're now exposing a few ## The learning benefits of working for a tech giant [36:06] of these things here and there, but at some of these companies, and Amazon is a great example, there's more than meets the eye. Once you're inside Amazon, for example, as an engineer, even if you're not a principal engineer, you have access to 20 years of principal presentations. When I joined Uber, I was amazed that we had the RFCs available; I could read all the historic ones. Every company has its own, of course, but once you're in there you have access to this knowledge base, which will just never be published. It cannot be, because it contains business-sensitive information.
So I think as an engineer you can really be a sponge when you join, especially at one of the companies known to be more open internally. Amazon, I think, is a really interesting one, because my sense is that externally it's very closed. They're very careful about what they share. For example, very few AWS postmortems are published externally, but internally, as I understand, they're all there. As an engineer, you can access them and learn from them: really cool real-world learnings. Absolutely. It is an open place internally, and we're so selective about what... I say "we" as though I still work there... about what they publish externally. The postmortems, we call them COEs. COE stands for correction of error. It's this idea of holes in Swiss cheese: a failure requires that the holes line up across every layer. That's the best reading. I would just subscribe to the email list where they were published internally, so you have this stream of disasters going on within the company, and you just grab some popcorn, pop open one of these COEs, and learn so much from it. I think that's part of the secret sauce. The idea, and I don't know if it's like this for 100% of them, is that it's a blameless culture. And so to really screw up requires that multiple people drop the ball. Yeah. And you learn so much from that sort of stuff: brownouts, the lessons you learn from trying to recover from really large dependencies. Those things are immortalized inside some of these COEs. There are some very famous outages that happened within Amazon; they were egg on our face, but we really, really learned those lessons through those postmortems.
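The Swiss-cheese model Steve mentions can be made concrete with a toy calculation. This is a minimal sketch under loudly stated assumptions, not a description of Amazon's COE process: the layer names and miss rates are hypothetical, and the layers are assumed to fail independently. A fault only becomes a customer-facing outage when it slips through a hole in every layer, so under independence the chance of that is the product of the per-layer miss rates.

```python
from math import prod


def outage_probability(miss_rates: dict[str, float]) -> float:
    """Chance that one fault slips through every layer, assuming independent layers."""
    for layer, rate in miss_rates.items():
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"miss rate for {layer!r} must be in [0, 1], got {rate}")
    # With no layers at all, prod() returns 1: every fault reaches customers.
    return prod(miss_rates.values())


def reaches_customers(layer_missed: dict[str, bool]) -> bool:
    """A specific fault becomes an outage only if it found a hole in every layer."""
    return all(layer_missed.values())


# Hypothetical layers and miss rates, for illustration only.
layers = {"code_review": 0.3, "integration_tests": 0.2, "canary_deploy": 0.1, "alarms": 0.5}
print(f"{outage_probability(layers):.4f}")  # prints 0.0030

# A fault caught by the canary never becomes an outage, even though earlier layers missed it.
print(reaches_customers({"code_review": True, "integration_tests": True, "canary_deploy": False}))  # prints False
```

Real layers are rarely fully independent (a single bad assumption can defeat both review and tests), so treat the product as an optimistic estimate; that correlation is exactly what makes "multiple people drop the ball" worth studying.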
They're absolutely wonderful. As a principal engineer... so far we've kind of glamorized the role, saying it is hard to get into, but once you're there, you have the community and you do this really impactful work. But one of the principal ## Five challenges of being a Principal Engineer at Amazon [38:44] engineers at Amazon who's still there, Bavik Kothari, collected some things about principal engineering that are maybe not as glamorous, or are more challenging. He had five or six of these things. I just want to go through them with you and get your take. The first one he wrote: "There is this paradox of belonging: you're part of all teams, yet you're part of none." What does that mean? Yeah. So Bavik was actually a peer of mine. We worked in Prime Video together, so he's an awesome dude. There are all of these paradoxes, and this paradox of belonging is a really interesting one. You work for the organization, right? You're working across teams. As a senior engineer, you're embedded on a team, and you own the team's architecture, the operations, the software development life cycle, and the design. But when you get to that next level, where you're working across teams, you operate in this weird layer where you're not on pager duty for a particular team. Mhm. You have visibility across all of these teams. You're helping to guide and make decisions, but you're literally not on the ground floor anymore. And so when you work with a particular team, you might call the senior engineers or the mid-level engineers in and say, "Hey, let's whiteboard some stuff. Let's try to figure out what's going on." You're not on the team. You're kind of an adviser coming in, right?
But then maybe a director or a VP would call you in and say, "Hey, what do I own? What's going on? Explain this outage to me, or tell me why we can't build this thing." And then you're trying to whiteboard the architecture and the system, saying, "Hey, this is what's going on on the ground floor." Mhm. But you weren't part of that team. So you're operating in this in-between layer where you don't really belong on a team. I'm an immigrant; I think you are as well. My parents came from Asia. I'm not Asian, right? So when I go back to Asia, I'm definitely from the US. And then growing up in this country, it was like, I'm not quite an American, right? So you operate in this area in the gaps, where your identity is really defined by not being squarely in one of these predefined categories. It's very similar as a principal engineer. You're not on the ground floor. You will check in code, but you're not necessarily part of that team, embedded on that team. And even if you are, it's usually for a short time, and tomorrow the director calls you up and says, "Hey Steve, we need you on this other team. They're in trouble. Move over." Yeah. And you parachute in, and then they're like, "Oh, who's this guy?" And then your director is like, "What's going on? What happened during this outage? Why is the press writing about us?" And you're like, well, here's what's happening on the ground, but you're not really embedded on that team. Which leads us to the next paradox Bavik lists, which is freedom and responsibility.
And he writes that you enjoy significant autonomy in choosing what you work on; however, there's an implicit expectation of, and accountability for, resounding impact. Yeah. So I reported to a VP right before I left the company. So they were your manager, basically? Yeah, my manager was a VP. Oh, wow. I don't hear of many companies having engineers report into VPs. Yeah, that doesn't seem very standard. The org he owned, where I considered myself the tech adviser, was about 450 software developers. And what did our one-on-ones consist of, right? He didn't assign me work. He wasn't like, "Hey, I need you to build this thing; I need you to design this thing." The context he set was basically: here's a direction you need to go, and how you achieve that type of impact is up to you. Mhm. So he might say something like, "Hey, availability is so important for live sports. We just signed billion-dollar contracts with these sports leagues, so we need to increase our availability posture." Mhm. And I would be like, "Okay." Then I would go away, and we would come back and I would say, "Here's what I'm working on," right? That type of dynamic, where you're basically telling your boss what's happening, does not exist at the senior engineer level and below. I was about to say, when you said your manager didn't tell you what to do in one-on-ones, most engineers would say, "Sign me up." We all hate micromanagement. But now when you tell me he would say, "Oh, so we just signed a billion-dollar contract; availability is important," and then stop talking, I'm like, "That sounds uncomfortable."
And basically you're kind of expected to understand what he's expecting, even though he doesn't know. And I'm assuming this can go two ways, right? You come back at the next one-on-one and say something, and either he's like, "Steve, you're a principal engineer, this is not what I expect of you," and you don't want that, or you bring back the right things. So it sounds like you really need to level up in understanding how these people think. Absolutely. And he's accountable to his boss as well. And don't get me wrong, I owned aspects of availability. There's a multi-thousand-person organization at Prime Video doing this stuff, but we owned the live sports aspect of it. There are playback teams, there are recommendation teams, there are so many different teams that had to really step up and make sure availability was good. But he would say something like, "Hey, what is our availability posture for certain aspects?" and I would have to go and figure it out. Yeah. What are we measuring? What are we not measuring? There's a deadline, the start of a season, when we're expecting millions and millions of concurrent viewers to come in. What can we do between now and then, right? And if we do write some software, what is the highest-leverage piece of software we could create to increase our availability posture? So the way I describe it to people is: you are assigned not a problem, not even a problem space; you're assigned a direction. You can solve the problem with code.
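"What are we measuring?" is often answered with a request-based availability ratio and an error budget. The sketch below is a minimal illustration of that common convention, assuming availability means "fraction of requests that succeeded" and a hypothetical 99.9% target; the `WindowCounts` type and all numbers are hypothetical and are not a description of Prime Video's actual metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowCounts:
    """Request counts for one measurement window (e.g. one live event)."""
    total_requests: int
    failed_requests: int


def availability(counts: WindowCounts) -> float:
    """Request-based availability: fraction of requests that succeeded."""
    if counts.total_requests == 0:
        return 1.0  # policy choice: no traffic counts as fully available
    return 1.0 - counts.failed_requests / counts.total_requests


def error_budget_remaining(counts: WindowCounts, target: float) -> float:
    """Failures still allowed in this window before the target is missed."""
    allowed_failures = (1.0 - target) * counts.total_requests
    return allowed_failures - counts.failed_requests


# Hypothetical numbers: 1,000,000 playback requests, 400 failures, 99.9% target.
window = WindowCounts(total_requests=1_000_000, failed_requests=400)
print(f"availability: {availability(window):.4%}")  # prints availability: 99.9600%
print(f"budget left: {error_budget_remaining(window, 0.999):.0f} failures")  # prints budget left: 600 failures
```

The ratio only covers what is instrumented, which is why deciding what is *not* being measured matters as much as the formula itself.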
You can solve the problem with system design and architecture, but you could also solve it with, I don't know, some off-the-shelf software we should purchase. Maybe there's a dev team we should spin up right now whose job is to do this particular thing. Maybe we've identified a piece of software, already scoped, that a team needs to go and build, but it's not a priority for them. Now we need to figure out how we can get them to do it. Can we shuffle resources around? That sort of thing. So the way I describe it is: there are so many more things on the menu that you can use to solve the problem. And I don't think people recognize that. They think that when you're a principal, you just code a lot and it's really complicated, or you do more meetings. I mean, at the end of the day, don't get me wrong, there are a ton of meetings. Yeah. But I think it's good to shine a light on this, because it sounds like a big change, but I also feel that if you get good at this, you might not want to go back to having a manager who says, "All right, here's a project, scope it out," which you can do, right? Yeah, that's cool. The next challenge Bavik mentioned is that this all sounds great, but there's a bandwidth challenge: you become this social resource where people pull you into everything. Yeah. I wish I had taken a screenshot of my Outlook calendar. My day looked like most people's week; it looked like somebody had blown up a Tetris factory. I would be triple or quadruple booked on a Monday, all through the day.
So you'd have a manager's calendar as an IC. Yeah. And it's absolutely crazy, because for that large org I was supporting, everybody just added me as optional, or they might say, "No, you're actually required for all of these meetings." But when you have a triple-booked calendar and you're required for all this stuff, you just learn that you're going to have to disappoint a lot of people. Yeah. So it's almost easier to say no now that you're obscenely overbooked, versus when you're a senior engineer thinking, I don't have time to write code, but there's barely enough time in between the cracks. Yeah. So I think it's almost like when your schedule breaks, that's when you're finally freed, because you know you can say no to stuff. But ultimately, if I went to all of the meetings everybody said I had to go to, I would be a professional meeting attender and have literally no time to do the work. And then Bavik follows up with the next challenge, which is being truly present. He writes, and it's almost as if he was sitting next to you, "You find yourself physically present in one meeting while your mind is already racing ahead to the next three." It's a really big challenge. I pride myself on being a good communicator and being present. And when there are 20 or 100 things going on in the air, it's really, really difficult to stay single-threaded. What I ended up having to do is say, okay, I could do all of these things and they would be really impactful, but I had to aggressively prioritize and say, for availability, I'm just looking at availability.
There are all these other fires going on, which is disappointing, because there are so many things you could be focusing on. It's super difficult. I work with a lot of people to try to get them to the next level, and they say, "Steve, I'm completely overwhelmed. There are like 20 things going on." And I tell them: if you think it gets easier at a higher level, it doesn't; there are just going to be more and more things on your plate. Why wait until you burn out or break? You can start implementing these things now. Every high-level tech IC I know, managers included, has a wonderful system for isolating the signal and cutting out the noise, and if you don't have that, you literally won't survive. It's just amplified that much more at the principal level and above. I'm getting the sense that a lot of ## The types of managing work you have to do as a Principal Engineer [49:50] the work you do as a principal engineer... I mean, there's a huge amount of software engineering, and you need to be really good at building resilient systems and learning new technologies. For example, today I'm assuming whoever is a principal engineer at Amazon is expected to know everything about LLMs: their trade-offs, characteristics, etc. But you also need to do the things managers do: managing your time, switching contexts, figuring out how to get focus time. Contrary to popular belief, managers actually need focus time too, and I always try to carve some out. But you're doing it while your title is not manager. It feels like you combine a lot of managerial responsibilities with a very experienced engineer, and boom, you get the principal engineer role.
Oh, the only upside is you don't need to do performance reviews for people. Congratulations, you saved a little bit of that. Well, actually, during performance review season, they pull the principal engineers in, because if you're stack ranking people, okay, cool, someone needs to take a look at their performance. I reported to a VP, and one of my peers was a director, and he was basically like, "Hey Steve, I would like you to show up to the performance reviews for my entire org of a hundred-something people." And I'm like, "I can't do that for you and for everybody else." Okay. So now it makes sense why, as a principal engineer, your compensation package is similar to, is it a senior engineering manager, or something like that? Around that. But basically the job has a lot of overlap. The benefit is you're not the one delivering the performance review to the direct report, but you're doing almost everything else, in terms of effort, I mean. Yeah. Okay. So, having been a principal engineer for four years, what are the good things you really liked about Amazon, specifically Amazon's principal engineer role? And what are some of the not-so-good things, or things that could ## The pros and cons of the Principal Engineer role [51:47] have been better? I mean, the great part is you get visibility you just couldn't possibly have at the team level. Within a large organization like Prime Video, or wherever you're at, there are many thousands of people doing so many things, right? Typically the performance of these people is really high, and there are so many different directions going on. So to survive, you kind of have to look inward and say, "Okay, here's my service boundary. Here's all the software I own. I'm going to own everything within my sphere of ownership."
Because you've built this wall up, you tend not to be able to see the broader picture. Yeah. So as a principal engineer, I think it's really awesome to be able to spelunk, to go to different teams and see that broader picture. And I don't see a way you would get that type of visibility at a lower level. That's super interesting. Mhm. I think the other thing is, whether it's warranted or not, you do get some amount of status: when you go to a meeting, people just listen to you. They listen to your hare-brained ideas, and it's kind of nice, because you don't have to prove yourself over and over again, right? There's less of, not fights exactly, but establishing that you know what you're talking about. Yeah. Now, the bad things: there are a lot of folks who are really good in tech and really effective as principal engineers, but then they also, myself included, think, "Okay, cool, that sort of makes me an expert in pretty much everything." And so you would get these principal engineers together.
We had a weekly meeting, and it would be like, okay, if you wanted to talk about establishing a constitution for a small island nation, all of a sudden they would be like, "Well, here are the main considerations." Nobody has a background in government policy, but just because you're trained to do so, you start to pitch in: "Well, actually, maybe we should have two branches of government, or three." And it sounds like we know what we're doing, but we don't. So there's this trap, and again, I've fallen into it many times, where you think you're an expert in something but you're actually not, right? Take LLMs. There are a ton of folks who understand AI. I left before it was allowed to be used internally, but I think you can use it now. I'm not an expert in LLMs at all, but I do think the expectation would be that you understand how they work, and then the expectation is also, "Hey, what should our policy be? How should we be thinking about this stuff?" I think that's fine for mature technologies, where you can potentially ramp yourself up. But because that particular landscape is changing so quickly, there's this trap where you speak as an authority even though you haven't had the requisite time to ramp up on it. And you've been at Amazon for 17 years. What are your favorite parts of the culture? There are values ## What Steve loves about Amazon’s leadership principles [54:59] that we all know, like frugality and customer obsession. What were the things you found most interesting, or the ones that had lasting impact? And how did Amazon change over 17 years? It must have changed. No, I think the things I miss the most, and the secret sauce... Yeah.
The leadership principles are good, but I think the actual secret sauce there is principled thinking. Mhm. Right. Yeah. So there's Invent and Simplify and Bias for Action and all of this stuff, but ultimately, the thing that is amazing about those leadership principles isn't the specific stances they took. They decided that customer obsession is a big deal. They decided that bias for action is a big deal. All of these things. But really, if you look at it at a meta level, you'd be like, "Oh, these guys have principles that they won't budge on." I think about it in terms of math and axioms, where you just take certain things to be true. Two parallel lines, if you extend them out to infinity, won't touch each other. Yeah. You assume that's true. You don't prove it. It's an axiom, and based on that you're able to build a system of mathematics, right? It's the same thing with the corporate leadership principles at Amazon. They basically said, "Okay, we are going to fix these things to be true." There are 12 or 16, I don't know; they built some, and now there are 16. But there are four or five that are really core to Amazon, and we just fix those things to be true. Which ones did you feel were the most present? Customer obsession. We are absolutely customer obsessed. We'll just burn money to delight a customer. You can be in a meeting with a VP as an intern and say, "Hey, that's a bad customer experience," and it would be like a needle coming off a record. It would immediately be, "What are you talking about?" right? Bias for action: just get stuff done. Stop asking for permission. Just go and do it, right?
Ownership: you own your software, you do the operations, you own the bug count, all of this stuff, right? So those are the ones that are fixed, and then you start layering things on top of them, and I think it's really great. But you could take Amazon and make the evil-goatee version of Amazon, which is just the opposite of those things, and that would still be a really valid and awesome company. So you could say, okay, what's the opposite of customer obsession? It's not simply not being customer obsessed. I think it's being about your staff. Yeah, which is Google. It could be, "Hey, we really care about our people above everything else." Or it could be, let's not mince words, "We care about top-line or bottom-line revenue." Yeah, that's totally valid, right? And then you could just fix that. You can't prove that being staff-focused is a bad thing. You just build on that, and a certain set of things will happen: great things are going to happen, and not-so-great things are going to happen. Those not-so-great things, you can try to mitigate them, but you can't fix them, because you've started with this principled approach to everything. Yeah. I see what you mean. I think what you're saying is that it might be less about what the specific principles are. Amazon has theirs, and we know about them, but it's about sticking to them and not wiggling, because if you keep wiggling, what's the point, right?
Then you're going to have a mediocre, not truly standout company, whatever you do. What does it actually mean to be principled and to not bend when it would be really easy to do so? So that's an amazing part of Amazon's secret sauce. People look at the leadership principles, and I'm like, no, it's principled thinking. Another thing: from what I understand, talking to you earlier and to some other people, a lot of this probably comes from Jeff Bezos, from the top down, being very principled and not saying, "We will do whatever it takes." It sounds like it was customer obsession initially and then some other things. Yeah. Absolutely. He was an absolute genius when it came to that. I'm a Jeff Bezos fanboy, for sure. ## Amazon’s intense focus on writing [59:15] It just worked. Another thing that's Amazon's secret sauce is the writing culture. I spent on the order of one to four hours every day reading while I was a principal engineer. We had a standard format: a six-page memo. That would be our business strategy. That would be a system design. That would be what we called the PR FAQ, a press release and frequently asked questions for a new line of business or a new initiative. Everybody was constrained to the six-page format, and everybody just produces documents in that format for whatever they need to do. So when I tried to get up to speed on a particular thing, I would just say, "Give me your six-pagers. Give me all your documents." And I got really, really good at reading these documents to get up to speed, which was a self-fulfilling and virtuous cycle: okay, now I need to express myself, so I will write a six-pager, and that will set the context for whatever we're working on.
We'd go to a meeting, you would read the six-pager, and it was just super great to have people do study hall at the beginning of a meeting, where everybody just gets fast-forwarded, and then you have a really great discussion at the end. That is an amazing culture that I think almost every other company should replicate if they could. But I think the difficulty would be that you actually have to be disciplined and actually have a reading culture. Yeah. Be principled, then have a reading culture, and then actually value writing. Yeah. I almost wonder if, unless it comes from the top, some of these things might just be really, really hard to do. Yeah. One thing I noticed: we're in your studio right now, and you have a lot of these blocks, and I asked you what they are. Are they for promotions or projects or whatever? They're for patents.

## Patents at Amazon [01:01:11]

Yeah. This is for patent number 10,824,964. Can you tell me why you have these, how they came about, and what you needed to do for them? So the highest-order bit is that, for better or for worse, software patents exist. Amazon will say that the reason they have them is basically defensive, because other people will assert, "Hey, you're in violation of our patents or our IP," and then we'll use them reactively: "Okay, fine, but you're also in violation of these other things." Yeah. So there is a culture of trying to make sure that we protect ourselves in that way. But there's the other part of software patents, which is basically, hey, can you really patent math or whatever? And what I learned over time is that I'm just a really bad IP lawyer, even though, as a principal engineer, I might cosplay as somebody who really understands software patents, right?
At the end of the day, what we would do is take our important six-pagers and hand them over to the legal team, and they would be like, "Oh, this stuff is really interesting, let's explore that." So it turned into this awesome thing where we just had ready inputs to go into that particular system. A writing culture, it turns out, has a bunch of benefits. Exactly. And I think there's this concept called the curse of knowledge, which is essentially: if you understand something, you underestimate how hard that concept is. So it's like you don't get it, you don't get it, you don't get it, and then you get it, and you're like, "Oh, that's trivial, right?" Even though it could actually be novel, or it could actually be interesting. So what ends up happening is that you would throw these documents over to the lawyers, and they would be like, "Oh, this stuff is great," and you would be like, well, that's just regular software development, or that's just the context and domain we were living in. It turns out that there's some interesting stuff. This particular patent I'm proud of. There's a system design interview question that seems to be popular right now, which is "design Ticketmaster," right? I worked on Amazon Tickets, and we ended up shuttering that business, but we built one of the world's fastest ticket-selling systems, right? We could do many, many orders per second. The use case is basically this: at t0, for a really big ticket on-sale, that's when the maximum amount of demand and requests are coming in, and you want to sell out all of your ticket supply as quickly as possible.
The problem is one where you have seated concerts. Mhm. When you purchase a ticket, most of the time with the system design stuff, it'll be general admission, or it won't be a high-demand on-sale. You have to find contiguous seats. Yeah. So, really next to each other. Yes, exactly. And it's actually really hard. Suppose you had a SQL database as your backing store: how do you come up with a SQL query that says, "Hey, give me the best four tickets within this particular price range that are seated next to each other"? Yeah. Now you're thinking. So this is a real-world thing where you want to be as efficient as possible in terms of resource usage, maybe minimizing CPU or memory depending on what you have, I assume, and you need to do it as rapidly as possible to give this to people. Okay. So now we're talking about a problem that seems pretty novel in some ways, right? Yeah. I did this patent with a senior principal; I was a senior engineer at the time. The idea is: what is the theoretical maximum speed at which we could show this inventory to people? It turns out that even for a high-demand on-sale, you only have thousands of tickets at the end of the day. So instead of making a request to a backend that would conduct some sort of search across the space, what if you inverted it, so that each of the individual hosts had a view of the entire arena or venue, and you loaded all of that availability and inventory into the L2 cache on a CPU? Yeah. Because it's actually not that many. So you could hold it in a compact representation.
Then you can do bit manipulation to really, really quickly find contiguous seats, and then you send in that particular request and try to reserve those particular seats. Yeah. Now there's a locking problem, which is much more tractable than, hey, two million people have just hit your on-sale and I'm launching a search for each of them. Yes. So the inversion of that ordering process, where you send the inventory out to the individual nodes, load it into CPU cache, do bit manipulation, and then try to lock that resource from the individual nodes: that was the basis of this particular patent. Awesome. That's clever. People are always asking, "On my job I don't use the algorithm stuff or any of the formal methods." Sounds like there are some uses for it, especially, and this is my takeaway from the patent, when you have a problem like this and you ask what the theoretical limit is, what the fastest possible thing is. To answer that, you probably want to have access to these tools. So it can be worth the time and effort to actually get into these things. So what are you up to now that you left Amazon a year ago, after 17, 18 very long years? I'm just making content. I'm living the dream there: making YouTube videos, I started up a newsletter, I have a Discord community. Yeah. And we're going to link all of those below. I actually first got to know you before we started talking.
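The contiguous-seat trick from the Amazon Tickets patent discussion can be illustrated in a few lines. This is a minimal sketch under stated assumptions, not the patented design: it assumes each row's availability is packed into one integer bitmask (bit i set means seat i is free), treats the lowest-numbered block in the earliest row as "best," and uses hypothetical function names. Python integers are not the cache-resident structures Steve describes; the sketch only shows the shift-and-AND test for contiguity and a local check before reserving.

```python
from typing import List, Optional, Tuple


def contiguous_starts(row_mask: int, k: int) -> int:
    """Return a mask whose set bits mark seats where a run of k free seats begins.

    Bit i of row_mask is 1 when seat i in the row is still available.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    starts = row_mask
    for shift in range(1, k):
        # Keep position i only if seat i + shift is also free.
        starts &= row_mask >> shift
    return starts


def find_block(row_mask: int, k: int) -> Optional[int]:
    """Index of the lowest-numbered seat that starts a free run of k, or None."""
    starts = contiguous_starts(row_mask, k)
    if starts == 0:
        return None
    lowest = starts & -starts        # isolate the lowest set bit
    return lowest.bit_length() - 1   # convert that bit into a seat index


def find_in_venue(rows: List[int], k: int) -> Optional[Tuple[int, int]]:
    """Scan rows in order (assumed best row first); return (row, seat) or None."""
    for row_index, mask in enumerate(rows):
        seat = find_block(mask, k)
        if seat is not None:
            return row_index, seat
    return None


def reserve(row_mask: int, start: int, k: int) -> int:
    """Mark k seats starting at `start` as taken, refusing if any is already gone."""
    block = ((1 << k) - 1) << start
    if (row_mask & block) != block:
        raise ValueError("seats no longer available")
    return row_mask & ~block


# Hypothetical venue: row 0 is sold out; in row 1, seats 2-5 and 7-9 are free.
venue = [0, 0b1110111100]
row, seat = find_in_venue(venue, 4)        # -> (1, 2)
venue[row] = reserve(venue[row], seat, 4)  # seats 2-5 in row 1 are now taken
print(row, seat, bin(venue[row]))          # prints "1 2 0b1110000000"
```

AND-ing the mask with copies of itself shifted right by 1 through k-1 leaves a bit set only where k consecutive free seats begin, so each row costs k-1 word operations instead of a database query. A real system would also need a ranking of which seats count as "best" and would still have to confirm each hold with whatever component owns that inventory, which is the locking problem mentioned in the conversation.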
This was probably a few years ago, from your YouTube videos, where you shared a lot about Amazon things, software engineering things, and your general thinking, but your newsletter is a new one. So we'll link it in the show notes below. It's always a good way to keep in touch, and also your YouTube channel. Awesome. So, as closing, I have some rapid questions. Okay. I'll just ask, and you just shoot what comes to mind.

## Rapid fire round [01:07:58]

What is career advice that greatly helped you in your path? Yeah. I talk a lot about this. It's kind of like, oh, what's your favorite food or your favorite movie? There's so much there and it's hard to pick one. What I would say is, instead of asking, hey, what's the technology I should learn that's really going to make my career solid, flip it around and ask: how can I quickly learn skills? Mhm. That makes you recession-proof, right? That makes you valuable. It's essentially meta-learning: how can I learn something faster and faster? If that's your focus, then you'll never have a problem finding a job and you'll never have a problem progressing in your career. Now, for some skills it may be difficult to find resources online, but I think you should just think about what valuable skill, if you knew it right now, would make your job search easier or make you perform better on the job, and then think about acquiring that skill as quickly as possible, and do it now. Don't wait. Yeah. Well, people tend to put things off. They'll be like, "Oh, well, I'll start when everything is lined up." But to begin, you just need to begin.
Only when you start something will you know what you need to do, instead of saying, "Oh, I need to get everything I need first before I start." You've used a lot of programming languages. Which one's your favorite and why? And which one do you dislike most? Yeah. Obviously there's no perfect programming language. What I would say is I really enjoyed Perl, and nobody would ever give that answer, but I just like this concept that there are so many different ways to do it. It's a write-only language. Like, you can't read anybody else's Perl, and it's actually one of the languages that uses up the most power. It's about the least efficient. It's interpreted. It's just terrible. Most of Booking.com still runs on it, or some of it. Yeah. Amazon's back end was, for a long time, and still might be, Perl Mason, web technology bolted onto Perl. But I just kind of like it. I feel like I can express myself: however you'd like to express yourself, you can. It also looked like an ASCII factory blew up sometimes. And now that it's on a podcast, I wouldn't really advertise that fact. Among programming languages right now, I think Rust is pretty interesting, so I might pick that up. At the end of the day, I really love the boring languages. Yeah. So Java, for all of its stuff, like its verbosity, I think is just a great language, or a JVM-based language, that has great library support and a bunch of stuff written for it, but it's just super boring. Maybe it's just because I'm from Amazon and we do this enterprise stuff, but it's a fine language. And then I see you have a large bookshelf here.
You also read a lot, especially at Amazon, although mostly internal documents. What is a book you would recommend, something around software engineering that you enjoyed? And it cannot be that book. It can't be your book. What I would say is, since I just gave the advice about meta-learning and career growth, I think most software developers should read a book by Cal Newport called So Good They Can't Ignore You. The concept there is career capital: what are the skills that are in the most demand? If you can learn those skills, then you become in demand, and from there you can choose what type of lifestyle you'd like. You can also lean into some of the science of meta-learning: deliberate practice, spaced repetition, that sort of thing. In terms of tech books, I think the new AI Engineering book by Chip Huyen is amazing. Yeah. I think DDIA, Designing Data-Intensive Applications, is so good. A new version is coming at the end of the year, actually. I'm excited about that. I think that'll be pretty good. But at the end of the day, you don't want one book on your bookshelf, you want 50 books on your bookshelf. So within a particular subgenre of tech books, I'd have recommendations there. But yeah. Steve, this was great. Awesome. Really enjoyed it. Yeah, great. Thanks so much for having me. Thanks a lot to Steve for sharing all these details. Although Amazon's principal engineering level feels surprisingly difficult to get promoted to, I have yet to hear of a principal engineering community as strong as the one Amazon builds and keeps investing in. This community alone could be reason enough to consider the company at the principal-plus level, should you have the opportunity to do so.
For a deep dive into Amazon's engineering culture, including the details on compensation, career ladders, performance reviews, and engineering processes, check out the Pragmatic Engineer deep dive linked in the show notes below. If you enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube. This helps more people discover the podcast and a special thank you if you leave a rating. Thanks and see you in the next