What is a Principal Engineer at Amazon? With Steve Huynh

[00:00] (0.08s)

If you're going to optimize for

[00:01] (1.44s)

performance, saying why can't we be at 1

[00:03] (3.52s)

millisecond or why can't we be at 10

[00:05] (5.52s)

milliseconds and start from there

[00:07] (7.04s)

instead of sort of saying hey let's try

[00:08] (8.72s)

to decrease latencies by 50% or 25%.

[00:11] (11.84s)

Let's just start from what is the

[00:13] (13.44s)

conceptually fastest thing that we could

[00:15] (15.28s)

do and that's actually how Amazon was

[00:17] (17.04s)

created. Amazon's principal engineering

[00:18] (18.64s)

level is unique in many ways across big

[00:20] (20.32s)

tech. Steve Huynh was a software engineer

[00:22] (22.56s)

at Amazon for 17 years and spent the

[00:25] (25.04s)

last four years as a principal engineer.

[00:27] (27.60s)

Today we talk about the ins and outs of

[00:29] (29.20s)

this role, including why being promoted

[00:31] (31.44s)

from senior to principal is so hard,

[00:33] (33.52s)

even though Amazon usually has hundreds

[00:35] (35.28s)

of principal engineering openings and

[00:37] (37.20s)

thousands of seniors trying to get into

[00:38] (38.88s)

these positions, the Amazon principal

[00:40] (40.96s)

engineering community, the in-person

[00:42] (42.72s)

events, the Slack group, and the

[00:44] (44.16s)

Principals of Amazon internal

[00:45] (45.60s)

presentation series. Engineering

[00:47] (47.52s)

concepts at Amazon around reliability

[00:49] (49.68s)

such as brownouts and COE, correction of

[00:52] (52.16s)

errors, and many more topics. If you're

[00:54] (54.48s)

interested in understanding one of the

[00:55] (55.84s)

hardest engineering levels to get into

[00:57] (57.44s)

across big tech together with stories of

[00:59] (59.76s)

how Steve thrived in this position, this

[01:01] (61.92s)

episode is for you. Subscribing on

[01:03] (63.76s)

YouTube and on your favorite podcast

[01:05] (65.12s)

player greatly helps more people

[01:06] (66.56s)

discover this show. If you enjoy it,

[01:08] (68.72s)

thanks for doing so. So, Steve, welcome

[01:11] (71.36s)

to the podcast. Uh, thanks for having

[01:13] (73.44s)

me. How long were you at Amazon? 17

[01:16] (76.24s)

years.

[01:16] (76.96s)

Yeah, I was there for 17 and 1/2 years.

[01:19] (79.92s)

And yeah, I just quit last year. So,

[01:22] (82.48s)

I've been basically a year doing uh

[01:25] (85.20s)

other things now.

[01:26] (86.40s)

And what were the things that you worked

[01:28] (88.00s)

on while you were there?

[01:29] (89.12s)

You know, people always talk about my

[01:31] (91.12s)

long tenure there, but uh you know, I

[01:33] (93.36s)

feel like I've had like five or six jobs

[01:36] (96.32s)

uh over that time period. Um I started

[01:39] (99.28s)

off on you know, a project called Search

[01:41] (101.84s)

Inside the Book. I worked on the first

[01:43] (103.92s)

Kindle launch. Wow.

[01:46] (106.16s)

I worked on the uh precursor to Prime

[01:49] (109.12s)

Video. I sort of like worked there at

[01:50] (110.72s)

the beginning part of my career and then

[01:51] (111.92s)

I sort of ended my career there uh for

[01:54] (114.48s)

the last five years of my time there. I

[01:56] (116.96s)

worked in payments. I worked in uh

[02:00] (120.40s)

Amazon Local, which was sort of our Groupon

[02:02] (122.64s)

project when that type of business

[02:04] (124.88s)

was looking like it was going to take

[02:07] (127.12s)

over. Um I worked on Amazon restaurants.

[02:09] (129.92s)

I worked on Amazon Tickets, which was our

[02:11] (131.68s)

Ticketmaster clone,

[02:13] (133.52s)

and then um my last 5 years was working

[02:15] (135.92s)

on live sports streaming uh on Prime

[02:18] (138.24s)

Video. If you want to build a great

[02:20] (140.00s)

product, you have to ship quickly. But

[02:22] (142.32s)

how do you know what works? More

[02:24] (144.32s)

importantly, how do you avoid shipping

[02:26] (146.32s)

things that don't work? The answer,

[02:29] (149.20s)

Statsig. Statsig is a unified platform for

[02:32] (152.24s)

flags, analytics, experiments, and more.

[02:34] (154.96s)

Combining five plus products into a

[02:36] (156.64s)

single platform with a unified set of

[02:38] (158.64s)

data. Here's how it works. First,

[02:41] (161.68s)

Statsig helps you ship a feature with a

[02:43] (163.60s)

feature flag or config. Then it measures

[02:46] (166.72s)

how it's working from alerts and errors

[02:48] (168.96s)

to replays of people using that feature

[02:51] (171.12s)

to measurement of topline impact. Then

[02:53] (173.68s)

you get your analytics, user account

[02:55] (175.44s)

metrics, and dashboards to track your

[02:57] (177.04s)

progress over time, all linked to the

[02:58] (178.96s)

stuff you ship. Even better, Statsig is

[03:01] (181.44s)

incredibly affordable with a super

[03:03] (183.28s)

generous free tier, a starter program

[03:05] (185.20s)

with $50,000 of free credits, and custom

[03:07] (187.52s)

plans to help you consolidate your

[03:08] (188.96s)

existing spend on flags, analytics, or

[03:10] (190.96s)

A/B testing tools. To get started, go to

[03:13] (193.84s)

statsig.com/pragmatic.

[03:16] (196.24s)

That is statsig.com/pragmatic.

[03:19] (199.92s)

Happy building. This episode was brought

[03:21] (201.92s)

to you by Graphite, the developer

[03:23] (203.84s)

productivity platform that helps

[03:25] (205.28s)

developers create, review, and merge

[03:27] (207.04s)

smaller code changes, stay unblocked,

[03:29] (209.28s)

and ship faster.

[03:31] (211.36s)

Code review is a huge time sink for

[03:32] (212.96s)

engineering teams. Most developers spend

[03:35] (215.28s)

about a day per week or more reviewing

[03:37] (217.12s)

code or blocked waiting for a review. It

[03:40] (220.08s)

doesn't have to be this way. Graphite

[03:42] (222.24s)

brings stacked pull requests, the workflow

[03:44] (224.24s)

at the heart of the best-in-class

[03:45] (225.44s)

internal code review tools at companies

[03:47] (227.04s)

like Meta and Google to every software

[03:49] (229.20s)

company on GitHub.

[03:51] (231.20s)

Graphite also leverages high signal

[03:53] (233.12s)

codebase-aware AI to give developers

[03:55] (235.04s)

immediate actionable feedback on their

[03:56] (236.56s)

pull requests, allowing teams to cut

[03:58] (238.72s)

down on review cycles. Tens of thousands

[04:01] (241.60s)

of developers at top companies like

[04:03] (243.20s)

Asana, Ramp, Tecton, and Vercel rely

[04:05] (245.60s)

on Graphite every day. Start stacking

[04:08] (248.32s)

with Graphite today for free and reduce

[04:10] (250.40s)

your time to merge from days to hours.

[04:12] (252.64s)

Get started at gt.dev/pragmatic.

[04:15] (255.68s)

That is g for graphite t for

[04:17] (257.36s)

technology.dev/pragmatic.

[04:20] (260.16s)

So that's a lot of different

[04:21] (261.60s)

teams. How did you work

[04:24] (264.08s)

on so many teams? Is it just like

[04:25] (265.76s)

there's a lot of internal transfers? Did

[04:27] (267.52s)

you get bored? Was it just you followed

[04:29] (269.36s)

your manager? How does it work inside

[04:31] (271.28s)

Amazon? Because when people think about

[04:32] (272.56s)

companies, or people who have not worked

[04:34] (274.00s)

at Amazon, they would kind of assume you

[04:35] (275.76s)

go, you work there, you're on a team for

[04:37] (277.68s)

like, you know, four, five, 6 years.

[04:39] (279.52s)

Clearly not the case. You know, it

[04:41] (281.20s)

depends a little bit on like corporate

[04:42] (282.72s)

policy and then where you are with your

[04:44] (284.48s)

career. Uh I started as a support

[04:47] (287.12s)

engineer. So sort of like operationally

[04:50] (290.16s)

um focused person and then you know I

[04:52] (292.88s)

was basically like I want to be a

[04:54] (294.08s)

software developer and so you know I

[04:57] (297.04s)

think getting into the company was

[04:59] (299.60s)

pretty difficult but once I was there

[05:01] (301.84s)

sort of set that target and and changed

[05:04] (304.24s)

roles and when I changed the role um you

[05:07] (307.36s)

know it was a natural time to move to

[05:09] (309.92s)

another team. There's some also some uh

[05:12] (312.80s)

internal policy. So basically at Amazon,

[05:15] (315.76s)

it used to be that you had to stay on a

[05:17] (317.76s)

team for at least a year before you

[05:21] (321.20s)

transferred. And if you wanted to

[05:22] (322.64s)

transfer,

[05:24] (324.32s)

like a a senior manager or director or

[05:26] (326.40s)

whoever up top could block your

[05:28] (328.48s)

transfer.

[05:29] (329.84s)

And what that ended up meaning was that

[05:32] (332.08s)

like certain teams that were just

[05:33] (333.68s)

terrible to work on, those teams

[05:36] (336.24s)

actually had more than 100% attrition

[05:38] (338.72s)

over the course of a year because you

[05:40] (340.80s)

measured attrition with a year-long time

[05:43] (343.20s)

unit. Amazon did something actually

[05:46] (346.08s)

smart at the corporate level. Uh they

[05:48] (348.08s)

they basically said okay well you have

[05:51] (351.36s)

freedom of movement now. This sort of

[05:52] (352.88s)

happened I don't know probably like 13

[05:55] (355.28s)

years ago 10 13 years ago.

[05:57] (357.36s)

And so they said you have freedom of

[05:58] (358.80s)

movement now. A VP or a director

[06:02] (362.32s)

can't block you. They can say okay well

[06:04] (364.24s)

we need another month to get like a

[06:06] (366.08s)

transition plan going.

[06:07] (367.60s)

But essentially you have freedom of

[06:08] (368.88s)

movement as long as you're not on a

[06:10] (370.56s)

performance improvement plan. which

[06:12] (372.56s)

meant that certain teams were sources of

[06:15] (375.28s)

high-quality engineering talent and

[06:16] (376.96s)

certain teams were sinks of high-quality

[06:18] (378.80s)

engineering talent

[06:20] (380.32s)

and it sort of created an internal

[06:22] (382.08s)

marketplace for for different roles. Now

[06:25] (385.60s)

what that ended up meaning was that

[06:27] (387.52s)

certain teams they basically didn't want

[06:30] (390.72s)

you to know what the policy was. They

[06:32] (392.88s)

wanted you to to sort of think that you

[06:34] (394.64s)

were kind of stuck.

[06:35] (395.92s)

Mhm.

[06:36] (396.48s)

But you know, despite that sort of

[06:38] (398.80s)

like local gamesmanship that was going

[06:41] (401.20s)

Yeah. Like basically some managers

[06:42] (402.24s)

didn't want their best people to leave,

[06:43] (403.52s)

right? Let's just say it how it is.

[06:45] (405.28s)

But ultimately the the I think it's a

[06:47] (407.60s)

it's a great strategy because it it put

[06:49] (409.52s)

the like if there was a team that was

[06:52] (412.40s)

difficult to staff, the problem was on

[06:54] (414.88s)

the management. It wasn't something that

[06:57] (417.60s)

had to be, you know, borne by

[07:00] (420.64s)

the employee themselves. And so

[07:04] (424.48s)

you know getting back to my own career

[07:06] (426.24s)

journey at a very large company like

[07:08] (428.40s)

Amazon there are so many awesome things

[07:10] (430.64s)

that are going on and you know um I

[07:14] (434.72s)

decided to just kind of go where my

[07:16] (436.96s)

curiosity took me. Now there were some

[07:18] (438.96s)

times where you know there were reorgs

[07:20] (440.80s)

or you know a line of business got got

[07:23] (443.60s)

spun down.

[07:24] (444.88s)

Um but ultimately you know I think

[07:27] (447.12s)

freedom of movement was one of the

[07:28] (448.80s)

smartest things that that Amazon did.

[07:30] (450.72s)

And I think this is something that

[07:32] (452.64s)

people don't really appreciate about

[07:33] (453.92s)

some large companies. You know, not all

[07:35] (455.52s)

companies are like Amazon and every

[07:37] (457.12s)

company changes, right? Like today, I'm

[07:38] (458.64s)

assuming it will be hard to move as many

[07:41] (461.20s)

teams within Amazon. Depending on where

[07:43] (463.20s)

you are, you know, if you're in a if

[07:44] (464.48s)

you're in a satellite office where

[07:45] (465.76s)

there's two teams, uh, you can probably

[07:47] (467.52s)

move on to the other team at max.

[07:49] (469.28s)

Mhm.

[07:49] (469.84s)

But I think this is one of the

[07:51] (471.12s)

underrated things of large companies

[07:52] (472.56s)

like once you are in, it's almost always

[07:54] (474.72s)

easier to get that job at another team

[07:56] (476.96s)

from the inside. Yes. Especially because

[07:58] (478.72s)

you can talk to them. You know, this is

[08:00] (480.40s)

I I talked with the Reddit mobile team

[08:02] (482.40s)

and I asked like, "Oh, how how can you

[08:04] (484.24s)

become a platform engineer on the

[08:06] (486.00s)

mobile team?" And they said like, "Well,

[08:07] (487.52s)

you know, most of our hires have been

[08:09] (489.12s)

internal. They just helped us out on

[08:10] (490.80s)

hackathons. They come around, they

[08:12] (492.88s)

commit stuff. We know them. It's a it's

[08:14] (494.64s)

a low-risk hire." I think it's just nice

[08:16] (496.32s)

to remember that when you think of like

[08:17] (497.44s)

a big company like Amazon or Meta or or

[08:19] (499.60s)

Microsoft, it's just so many small teams

[08:21] (501.84s)

and once you're in, you actually have

[08:24] (504.00s)

almost priority access to those teams if

[08:26] (506.64s)

you play your cards right.

[08:27] (507.92s)

Absolutely. And you know, you might

[08:29] (509.60s)

interview for that team, but it's it's

[08:31] (511.36s)

such lower stakes than an external

[08:33] (513.60s)

interview. And you know, just all things

[08:36] (516.64s)

being equal, would you rather take

[08:38] (518.16s)

somebody that's, you know, uh, internal

[08:40] (520.56s)

and and knows the culture. They know how

[08:42] (522.80s)

software is developed within a

[08:44] (524.32s)

particular context or somebody that's

[08:47] (527.04s)

just as good but doesn't, you know,

[08:49] (529.20s)

hasn't been onboarded. And I think

[08:51] (531.04s)

ultimately you're you're going to pick

[08:52] (532.48s)

the person that's internal, all things

[08:54] (534.00s)

being equal.

[08:54] (534.88s)

Yeah. It's just kind of like business

[08:56] (536.32s)

rationality for the most part. So one

[08:58] (538.40s)

thing about Amazon and about large

[09:00] (540.32s)

companies like Amazon is people talk

[09:02] (542.08s)

about externally about the scale and

[09:04] (544.16s)

it's hard to imagine but can you give us

[09:05] (545.84s)

a sense of the scale that you've seen or

[09:08] (548.08s)

like some tough engineering challenges

[09:09] (549.44s)

that you worked on that would have been

[09:10] (550.80s)

just really hard to work at a smaller

[09:13] (553.12s)

startup? Yeah, I think that's the thing

[09:16] (556.00s)

that you just you will not see at most

[09:19] (559.52s)

other places is the the scale of of

[09:21] (561.92s)

things. I'll I'll give you a couple of

[09:23] (563.52s)

examples. So, you know, Prime is the

[09:26] (566.24s)

exclusive club that everybody is a

[09:28] (568.24s)

member of.

[09:29] (569.12s)

Yeah.

[09:29] (569.44s)

And, you know, in in the US, the the

[09:31] (571.76s)

shipping benefit is is probably, you

[09:34] (574.24s)

know, the most popular, but globally,

[09:38] (578.24s)

um, Prime Video is, you know, it's the

[09:41] (581.28s)

thing that people use the most with

[09:43] (583.20s)

their with their subscription. And so if

[09:46] (586.48s)

you think about, you know, our

[09:48] (588.08s)

service-oriented architecture and, you

[09:50] (590.16s)

know, just loading up the app, the the

[09:52] (592.56s)

the gateway page is the place where all

[09:54] (594.80s)

of our requests come in, right? And so

[09:57] (597.68s)

it's just it's just like Netflix. It's

[09:59] (599.52s)

this infinite scroll of of carousels.

[10:02] (602.00s)

So the gateway page is is it the Amazon

[10:04] (604.08s)

Prime landing page?

[10:05] (605.04s)

Yeah, it's the landing page there.

[10:06] (606.72s)

And so you're like, okay, cool. If let's

[10:09] (609.76s)

say 90, 95, 99% of all of your requests

[10:12] (612.80s)

are coming from that page and that page

[10:14] (614.24s)

needs to be personalized

[10:16] (616.24s)

you know and you have a service-oriented

[10:18] (618.16s)

architecture with a bunch of

[10:19] (619.52s)

microservices.

[10:21] (621.12s)

Um one request to that page turns into

[10:25] (625.84s)

let's just say hundreds of downstream

[10:28] (628.08s)

requests to different services. It might

[10:30] (630.32s)

even be more than that. It's it's

[10:31] (631.84s)

actually kind of hard to count.

[10:33] (633.36s)

Yeah. And and is is this page right?

[10:35] (635.20s)

Like all the all the stuff flowing all

[10:36] (636.96s)

personalized stuff. So that's the that's

[10:38] (638.56s)

the retail one, but I I was talking

[10:40] (640.08s)

about the Prime Video one,

[10:41] (641.04s)

the Prime Video one,

[10:41] (641.68s)

but essentially it's the same thing.

[10:43] (643.04s)

Yeah.

[10:43] (643.60s)

And so, you know, same thing for the the

[10:45] (645.36s)

retail website as well.

[10:46] (646.96s)

And so if you have one request sort of

[10:49] (649.44s)

spidering out into, you know, two orders

[10:52] (652.08s)

of magnitude more requests internally,

[10:54] (654.56s)

you start to see like really really

[10:57] (657.04s)

large scale for these microservices. So

[10:59] (659.44s)

a microservice will have like a reverse

[11:01] (661.04s)

proxy or a load balancer in front of it

[11:03] (663.28s)

and you are sort of unironically talking

[11:05] (665.60s)

about things like tens of thousands of

[11:08] (668.16s)

requests per second or hundreds of

[11:10] (670.08s)

thousands of requests per second coming

[11:12] (672.64s)

into your service. So, so like the

[11:14] (674.72s)

services that are like behind you know

[11:16] (676.16s)

like there's the prime there's all the

[11:17] (677.92s)

things loading they're spidering out

[11:19] (679.52s)

like making you know to to render that

[11:21] (681.28s)

one recommendation for example for I

[11:23] (683.60s)

don't know the video whatever you would

[11:24] (684.96s)

like it will make a lot of requests to

[11:26] (686.40s)

different different services and then so

[11:28] (688.32s)

when you're operating a a smaller

[11:30] (690.08s)

service inside of Amazon suddenly you're

[11:32] (692.40s)

going to be hit with what you just said

[11:33] (693.92s)

10k, 100k requests per second, that

[11:36] (696.16s)

kind of scale
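The fan-out arithmetic described here can be sketched in a few lines. This is only a back-of-the-envelope illustration: the function name `downstream_rps`, the front-door rate of 1,000 page loads per second, and the fan-out of 100 are made-up values for the example, not Amazon figures.

```python
def downstream_rps(front_door_rps: float, fanout: int) -> float:
    """Aggregate internal request rate when every page request fans out
    into `fanout` downstream service calls (hypothetical model)."""
    if front_door_rps < 0 or fanout < 0:
        raise ValueError("rate and fan-out must be non-negative")
    return front_door_rps * fanout


if __name__ == "__main__":
    # Assumed numbers: 1,000 page loads per second, 100 internal calls each.
    print(downstream_rps(1_000, 100))
```

With a fan-out of two orders of magnitude, even a modest front-door rate lands in the "hundreds of thousands of requests per second" range for the internal services combined.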

[11:36] (696.72s)

exactly and you will essentially be

[11:39] (699.84s)

DDoSing yourself.

[11:42] (702.16s)

you're you're just like Okay, cool. Um,

[11:44] (704.88s)

let's change a caching configuration on

[11:47] (707.28s)

some item details. And, uh, turns out

[11:50] (710.64s)

you've just browned out like a like a

[11:53] (713.44s)

critical service, right? Um,

[11:56] (716.16s)

What does browned out mean?

[11:57] (717.68s)

Oh, sorry. I'm using some jargon. So, we

[11:59] (719.68s)

just if you want to talk about

[12:00] (720.72s)

availability, um, if you suppose you

[12:04] (724.16s)

are DDoSing a service or sending a lot of

[12:07] (727.68s)

requests over to them, you can you know,

[12:09] (729.76s)

you can you can just take them down.

[12:11] (731.28s)

That would be like a blackout. Yeah.

[12:13] (733.04s)

Um and so like you send a request, oh

[12:15] (735.60s)

you can't establish a connection, it

[12:17] (737.20s)

immediately comes back. But there's a

[12:19] (739.84s)

there's a type of outage where they

[12:22] (742.08s)

brown out. So basically they're

[12:23] (743.60s)

reachable. They might accept a

[12:24] (744.88s)

connection.

[12:25] (745.60s)

Mhm.

[12:26] (746.24s)

But you know um they'll essentially time

[12:28] (748.56s)

out or or they might return partial

[12:31] (751.20s)

results or or bad results or the only

[12:33] (753.12s)

thing that they do return is a you know

[12:34] (754.96s)

500 for some percentage or proportion of requests

[12:37] (757.52s)

after we waited a bunch of time for

[12:38] (758.80s)

that. Yeah. And so, you know, now we we

[12:42] (762.08s)

start talking about like availability

[12:43] (763.92s)

and resilience in in the face of like

[12:46] (766.16s)

all of this DDoSing that

[12:48] (768.16s)

you're doing to yourself. And so the the

[12:50] (770.88s)

thing on top of scale that is going to

[12:53] (773.20s)

really complicate things is your

[12:54] (774.96s)

dependency chain, right? And so, you

[12:57] (777.92s)

know, your service is a dependency of

[13:00] (780.56s)

some of the process that's going on. It

[13:02] (782.32s)

depends on, you know, maybe AWS, it may

[13:05] (785.36s)

depend on another service. you know, how

[13:07] (787.28s)

do you make sure that if um you know,

[13:10] (790.00s)

suppose there's a failure for a primary

[13:11] (791.84s)

dependency and that dependency comes

[13:14] (794.08s)

back up, how do you make sure you don't

[13:16] (796.00s)

just like inundate it with a bunch of

[13:17] (797.84s)

requests as it's trying to recover?

[13:19] (799.92s)

Yeah.
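One common general answer to "how do you not inundate a recovering dependency" is exponential backoff with jitter. The conversation does not say which technique Amazon's teams used, so the sketch below is only an illustration of that general idea; `backoff_delays` and its parameters are hypothetical names chosen for the example.

```python
import random
from typing import List, Optional


def backoff_delays(base: float, cap: float, attempts: int,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return one randomized wait (in seconds) per retry attempt.

    Attempt n waits a uniform random time in [0, min(cap, base * 2**n)],
    so many callers spread their retries out instead of all hitting the
    recovering service at the same instant.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))  # exponential growth, capped
        delays.append(rng.uniform(0.0, ceiling))   # jitter over the full range
    return delays


if __name__ == "__main__":
    # Assumed values: 100 ms base delay, 5 s cap, 6 retries.
    for i, delay in enumerate(backoff_delays(0.1, 5.0, 6)):
        print(f"retry {i}: wait {delay:.3f}s")
```

The randomization matters as much as the growth: without it, every client that failed at the same moment would retry at the same moment too.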

[13:20] (800.40s)

And so you have all of these sort of

[13:21] (801.92s)

like odd dynamics that occur. I used a

[13:24] (804.64s)

brownout as something that is a

[13:27] (807.28s)

perennial problem that we have, right?

[13:29] (809.44s)

where there's maybe a dependency on a

[13:31] (811.44s)

base service like S3 or DynamoDB or

[13:35] (815.12s)

whatever it is. There might you know be

[13:37] (817.44s)

some increased latency that may cause a

[13:40] (820.72s)

chain reaction of a dependency going

[13:42] (822.48s)

down and then one of these sort of

[13:44] (824.00s)

middle tier services would brown out. So

[13:47] (827.20s)

what are like you know you're an owner

[13:49] (829.28s)

of the the services um for your team and

[13:52] (832.80s)

so then it's like okay um what do we do

[13:55] (835.36s)

in those situations? How do we know that

[13:56] (836.72s)

they're browning out? um what do we do

[13:59] (839.12s)

in the face of uh you know a dependency

[14:01] (841.68s)

outage and then critically if there is

[14:03] (843.76s)

an outage and then the the service comes

[14:06] (846.08s)

back up how do we make sure that we give

[14:07] (847.76s)

it enough space so that it can breathe

[14:10] (850.48s)

so that you know you know as they're

[14:12] (852.96s)

trying to recover from some sort of

[14:14] (854.56s)

outage we don't just take them down

[14:16] (856.48s)

immediately again and I guess for like

[14:18] (858.80s)

most of us who are not working right now

[14:22] (862.24s)

on these services like these sound

[14:23] (863.84s)

pretty cool in theory but you're saying

[14:25] (865.92s)

this was actually like like this is not

[14:27] (867.44s)

theory This actually was like, oh, this

[14:29] (869.84s)

service is going down. We are literally

[14:31] (871.36s)

having 100k requests per second and

[14:33] (873.28s)

we're like

[14:34] (874.32s)

pushing that on to like other three

[14:36] (876.16s)

services with with the same cuz we need

[14:37] (877.84s)

to invoke three other services. One

[14:39] (879.68s)

of them has browned out. What do we do

[14:41] (881.52s)

now? How do we fix it?
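The blackout versus brownout distinction from earlier in the conversation can be made concrete with a small classifier over recent call results. This is a sketch under stated assumptions, not a description of Amazon's monitoring: the `Sample` record, the `classify` function, the 1-second timeout, and the 20% threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    connected: bool             # did the connection get established?
    latency_s: Optional[float]  # None when no response arrived at all
    status: Optional[int]       # HTTP status, None on timeout or refusal


def classify(samples: List[Sample], timeout_s: float = 1.0,
             bad_fraction: float = 0.2) -> str:
    """Label a dependency 'unknown', 'healthy', 'brownout', or 'blackout'."""
    if not samples:
        return "unknown"
    if all(not s.connected for s in samples):
        return "blackout"  # unreachable: calls fail immediately

    def is_bad(s: Sample) -> bool:
        if not s.connected:
            return True
        if s.latency_s is None or s.latency_s > timeout_s:
            return True    # accepted the connection but timed out
        return s.status is None or s.status >= 500

    bad = sum(1 for s in samples if is_bad(s))
    # Reachable, but a meaningful share of calls are slow or failing.
    return "brownout" if bad / len(samples) >= bad_fraction else "healthy"
```

A caller could feed this a sliding window of recent results and shed load or serve a fallback once the label flips to `brownout`.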

[14:42] (882.88s)

Yeah. It and and I think for certain

[14:46] (886.40s)

other large tech companies, you know,

[14:49] (889.28s)

you can do best effort, right? which is

[14:53] (893.20s)

basically like, hey, we're we're

[14:54] (894.88s)

temporarily down, but you know, you can

[14:57] (897.60s)

you can uh you know, you have some sort

[14:59] (899.68s)

of degraded service. That makes sense.

[15:01] (901.44s)

But if you're on say a website that does

[15:04] (904.64s)

purchases, now we're talking about

[15:06] (906.32s)

transactions.

[15:07] (907.60s)

Or if you're in the Prime Video like

[15:10] (910.08s)

live video streaming use case, now we're

[15:12] (912.72s)

talking about a football game that

[15:14] (914.48s)

you're unable to see.

[15:16] (916.88s)

Um and then when we recover, the game

[15:19] (919.04s)

might be over. Yeah. Right. And so it's

[15:20] (920.72s)

it's much higher stakes. And so I I

[15:23] (923.04s)

think the the scale with transactional

[15:26] (926.88s)

semantics, right? Like that's actually

[15:29] (929.44s)

the challenge that you're not going to

[15:30] (930.88s)

see unless you sort of like work for a

[15:32] (932.56s)

payment processor or something. Yeah.

[15:34] (934.72s)

Yeah. I guess that that real world

[15:37] (937.28s)

pressure challenge like you are losing

[15:39] (939.04s)

money. That's it. This I'm starting to

[15:40] (940.40s)

understand why like I have noticed that

[15:42] (942.48s)

startups love to hire from certain

[15:44] (944.72s)

companies. They usually startups love to

[15:46] (946.24s)

hire from other startups because it's

[15:47] (947.44s)

similar environment. from large tech

[15:49] (949.04s)

companies, it's a bit of a maybe. I'm

[15:50] (950.72s)

generalizing. Obviously, this will

[15:52] (952.32s)

not be true 100% of the time, but for

[15:54] (954.08s)

example, hiring from Google, a lot of

[15:55] (955.60s)

startups are not as happy because the

[15:57] (957.12s)

people coming from Google are used to

[15:59] (959.04s)

having this amazing team around them,

[16:00] (960.48s)

internal tools, but most startups love

[16:02] (962.48s)

hiring from Amazon. And I'm starting to

[16:04] (964.16s)

get a sense of, you know, why this

[16:05] (965.76s)

actually is.

[16:06] (966.40s)

Yeah, I think that's part of the the

[16:07] (967.84s)

culture. You know, you you get uh you

[16:10] (970.64s)

get hired as a software developer and

[16:12] (972.32s)

they hand you a pager. And before, you

[16:15] (975.12s)

know, phone apps and and things like

[16:16] (976.88s)

that, it was like this pager from the

[16:18] (978.64s)

90s.

[16:20] (980.16s)

And it's it's really great because you

[16:22] (982.56s)

have to you have to like operate the

[16:25] (985.04s)

software that you write if you if you

[16:27] (987.28s)

actually you cannot write the software,

[16:30] (990.16s)

hand it over to the testing team, and

[16:31] (991.76s)

then throw it over to the SRE team

[16:33] (993.60s)

after you're done. Like you own that

[16:35] (995.20s)

that piece of software.

[16:36] (996.56s)

Yeah. Yeah. At every team, right?

[16:38] (998.16s)

Mhm. One interesting thing that we

[16:39] (999.76s)

talked about yesterday over over dinner

[16:41] (1001.60s)

with Casey Muratori is you said

[16:43] (1003.76s)

something interesting on how Amazon

[16:45] (1005.52s)

measured how on their retail website I

[16:47] (1007.52s)

think it was retail maybe Amazon Prime

[16:49] (1009.28s)

the lower the latency of something

[16:51] (1011.92s)

loading like a page loading like a

[16:53] (1013.60s)

purchase or a purchase button loading

[16:55] (1015.76s)

the more revenue they got and they

[16:57] (1017.12s)

started to measure and there was a

[16:58] (1018.08s)

linear correlation: the faster

[17:00] (1020.64s)

it was the more people converted and it

[17:02] (1022.32s)

seemed it had no end and the question

[17:04] (1024.56s)

Casey asked is like okay if this is the

[17:06] (1026.48s)

case what would stop Amazon because you

[17:09] (1029.68s)

have the best technologies in the world.

[17:11] (1031.20s)

You you have AWS, you know, you can

[17:13] (1033.28s)

build whatever you want to get the

[17:14] (1034.96s)

latency of the website down to let's say

[17:16] (1036.80s)

like 10 milliseconds or or even 1

[17:19] (1039.04s)

millisecond because if this goes up, you

[17:20] (1040.96s)

would maximize revenue. So can you tell

[17:23] (1043.52s)

me about like how how that thing like

[17:25] (1045.68s)

this measurement actually happened and

[17:28] (1048.24s)

you know why is Amazon's website still

[17:31] (1051.20s)

maybe not the fastest in the

[17:33] (1053.60s)

world even though it would generate so

[17:35] (1055.52s)

many more billions, right?

[17:36] (1056.88s)

Yeah. Um well there are a couple

[17:38] (1058.32s)

questions embedded in there but we'll

[17:39] (1059.84s)

we'll start with the you know the

[17:41] (1061.52s)

latency to to gross revenue measurement.

[17:45] (1065.04s)

So essentially somebody way back when um

[17:47] (1067.84s)

you know because we invest in logs and

[17:50] (1070.00s)

telemetry started tracking how much

[17:53] (1073.04s)

gross revenue we would make based off of

[17:55] (1075.60s)

like the latency for detail pages based

[17:57] (1077.76s)

off latency of gateway based off of

[17:59] (1079.44s)

latency of of the checkout pages. And

[18:01] (1081.76s)

they noticed this dynamic where it's

[18:03] (1083.36s)

like if you're faster you just make more

[18:05] (1085.76s)

money. It's a it's a pretty clear

[18:07] (1087.92s)

correlation. Um I think you would even

[18:10] (1090.48s)

go as far as to say causation. And so

[18:13] (1093.84s)

there was this really big focus on on

[18:16] (1096.56s)

latencies. I love the idea that you know

[18:19] (1099.60s)

if you're going to optimize for

[18:21] (1101.04s)

performance saying like why can't we be

[18:23] (1103.20s)

at 1 millisecond or why can't we be at

[18:25] (1105.36s)

10 milliseconds and start from there

[18:27] (1107.44s)

instead of sort of saying like hey let's

[18:29] (1109.28s)

try to decrease latencies by 50% or 25%.

[18:32] (1112.56s)

like let's just start from what is the

[18:34] (1114.72s)

conceptually fastest thing that we could

[18:36] (1116.96s)

do.

[18:37] (1117.28s)

Mhm.

[18:38] (1118.16s)

And I think in a vacuum the conceptually

[18:41] (1121.92s)

fastest thing that we could do is sort

[18:43] (1123.60s)

of like a monolith which is how Amazon

[18:46] (1126.48s)

started

[18:47] (1127.60s)

where you know you have a web server

[18:50] (1130.32s)

with all of your catalog information.

[18:52] (1132.40s)

And so all of the items that are there

[18:54] (1134.08s)

and then transaction processing on the

[18:55] (1135.68s)

host that would be the fastest

[18:58] (1138.16s)

way to um run and and basically like a

[19:01] (1141.68s)

web request would be it opens the HTTP

[19:03] (1143.68s)

or HTTPS handshake. It hits the server.

[19:06] (1146.80s)

The server in an ideal world has

[19:08] (1148.48s)

everything cached or calculated. It

[19:10] (1150.48s)

sends it back. So the total like latency

[19:13] (1153.60s)

would be the time for this request, the

[19:15] (1155.60s)

time to transfer that data and you know

[19:17] (1157.04s)

based on your internet speed and that's

[19:18] (1158.48s)

it. That is the absolute you cannot be

[19:20] (1160.24s)

faster than that. I

[19:21] (1161.04s)

I don't think so. Maybe there's some

[19:22] (1162.56s)

exotic sort of thing that's

[19:23] (1163.92s)

maybe you can do some exotic protocol

[19:25] (1165.20s)

that, I don't know, predicts the future and like

[19:26] (1166.88s)

with UDP sends it. But but yeah, but

[19:28] (1168.56s)

this this is this is your baseline.
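The baseline described here (handshake, server time, transfer time) can be written down as a rough latency floor. The sketch below is a simplified model under labeled assumptions: `latency_floor_ms` is a hypothetical helper, a fresh HTTPS connection is assumed to cost two round trips before the request can be sent (one for TCP, one for a TLS 1.3 handshake), and the example numbers are invented.

```python
def latency_floor_ms(rtt_ms: float, server_ms: float, payload_bytes: int,
                     bandwidth_mbps: float, handshake_rtts: int = 2) -> float:
    """Rough lower bound on page latency for a single request.

    handshake_rtts: round trips spent before the request can be sent
    (assumed 2 for a fresh HTTPS connection, 0 for a reused one).
    """
    if bandwidth_mbps <= 0:
        raise ValueError("bandwidth must be positive")
    # Time to push the response bytes through the client's link.
    transfer_ms = payload_bytes * 8 * 1000 / (bandwidth_mbps * 1_000_000)
    # Handshake round trips, plus one round trip for the request itself.
    return (handshake_rtts + 1) * rtt_ms + server_ms + transfer_ms


if __name__ == "__main__":
    # Assumed: 20 ms round trip, 1 ms server time, 100 kB page, 100 Mbps link.
    print(latency_floor_ms(20, 1, 100_000, 100))
    # Same request over an already-open connection.
    print(latency_floor_ms(20, 1, 100_000, 100, handshake_rtts=0))
```

In this model the network round trips dominate as soon as the server answers in about a millisecond, which is why the question "why can't we be at 1 millisecond?" is mostly a question about what happens on the server side.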

[19:29] (1169.84s)

I guess the the optimal would be like

[19:31] (1171.44s)

zero-click instead of like a one-click

[19:33] (1173.12s)

checkout, right? So we just send you

[19:34] (1174.72s)

stuff before like you know you want it.

[19:36] (1176.88s)

That that would be the I guess the

[19:38] (1178.16s)

theoretical maximum. But you know if if

[19:40] (1180.56s)

you if there's some sort of like web

[19:42] (1182.24s)

request, right? So some HTTP request and

[19:44] (1184.56s)

then some sort of like buy button that

[19:46] (1186.80s)

would be the fastest, right? And that's

[19:48] (1188.72s)

actually how Amazon was created. We we

[19:50] (1190.64s)

bought this, you know, it was sort of

[19:51] (1191.76s)

the opposite of horizontal scaling. It

[19:53] (1193.36s)

was vertical scaling. We bought these

[19:54] (1194.72s)

big Sun boxes and, you know, we hacked up

[19:58] (1198.40s)

our own web server in in C++ and you

[20:02] (1202.48s)

know to scale up we bought bigger

[20:04] (1204.96s)

hardware and then when that didn't work

[20:07] (1207.44s)

you know we bought like six of these big

[20:09] (1209.12s)

boxes and that ran Amazon and we ran

[20:11] (1211.84s)

that way up until the the early 2000s

[20:14] (1214.72s)

and then what we realized we we ran into

[20:16] (1216.96s)

a wall which was that um you know when

[20:21] (1221.04s)

you when you built the C++ binary the

[20:23] (1223.36s)

binary could only be 4 GB and that was a

[20:28] (1228.00s)

hard limit based off of the 32-bit soft

[20:30] (1230.32s)

uh the architecture that we're running

[20:31] (1231.76s)

on before. We could not get above 4 GB

[20:34] (1234.96s)

and so these product managers would come

[20:36] (1236.56s)

and just be like, well, can you just make a

[20:38] (1238.16s)

change for me

[20:39] (1239.52s)

right to the devs and then they would

[20:40] (1240.96s)

just be like I don't think you

[20:41] (1241.92s)

understand that this is a hard

[20:43] (1243.28s)

constraint and so we

[20:44] (1244.48s)

so the size of the code or the binary

[20:46] (1246.48s)

code the the compiled one it was there

[20:48] (1248.40s)

and you had so much business logic by

[20:50] (1250.16s)

then that it just filled at 4 GB.
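
The 4 GB figure follows from pointer width: a 32-bit address can name 2^32 distinct bytes. A quick check of that arithmetic, as an illustration only (this is not Amazon's build tooling, and the variable names are made up for the example):

```python
# A 32-bit pointer can address 2**32 distinct bytes.
ADDRESS_BITS = 32
addressable_bytes = 2 ** ADDRESS_BITS

GIB = 1024 ** 3  # bytes in one gibibyte

print(addressable_bytes)        # 4294967296
print(addressable_bytes / GIB)  # 4.0
```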

[20:52] (1252.16s)

Yeah. Yeah. and and you know we had a

[20:54] (1254.64s)

distributed C++ build so you know you

[20:57] (1257.76s)

could uh you know it would take many

[20:59] (1259.44s)

many hours for it to compile and so we

[21:01] (1261.44s)

would distribute it across desktops and

[21:03] (1263.28s)

it was this whole big thing but we ran

[21:05] (1265.12s)

into that wall, and so what we

[21:07] (1267.76s)

decided to do and I think this was super

[21:09] (1269.76s)

smart was like to lean into

[21:11] (1271.44s)

service-oriented architectures, right, and

[21:13] (1273.44s)

microservices

[21:14] (1274.32s)

Yeah.

[21:15] (1275.12s)

and when you break it down a web service

[21:18] (1278.32s)

call is essentially it's a remote

[21:20] (1280.48s)

procedure call right so you have this

[21:22] (1282.24s)

execution pointer and then you're

[21:23] (1283.52s)

like okay well I need to do some

[21:24] (1284.80s)

computation or I need to gather some

[21:26] (1286.40s)

data. I'm going to, in turn, make an

[21:28] (1288.64s)

HTTP request downstream to another

[21:30] (1290.96s)

service and then you can sort of chain

[21:32] (1292.32s)

those things together

[21:33] (1293.92s)

and so getting back to the original

[21:35] (1295.60s)

thing about performance

[21:37] (1297.52s)

in a world where you have to because you

[21:40] (1300.32s)

have thousands and thousands of

[21:41] (1301.60s)

developers building you know this stuff

[21:44] (1304.16s)

and the fact that you cannot have a a

[21:46] (1306.64s)

monolith as big as Amazon retail you

[21:49] (1309.28s)

know past something that's sort of like

[21:51] (1311.12s)

circa-2002 Amazon size, you have to

[21:54] (1314.16s)

lean into remote procedure call you have

[21:56] (1316.24s)

to say that there's a web service the

[21:58] (1318.32s)

best performance that you can actually

[21:59] (1319.76s)

get is always going to be bounded by the

[22:02] (1322.00s)

number of web requests that you end up

[22:04] (1324.16s)

making whether it's the you know the

[22:06] (1326.00s)

first order calls to say go get the item

[22:08] (1328.80s)

details um but then also any blocking

[22:11] (1331.68s)

call that happens downstream

[22:13] (1333.92s)

and by blocking call we mean like you

[22:16] (1336.00s)

need to wait for this to finish to get

[22:17] (1337.60s)

your data like you know a service that

[22:19] (1339.52s)

like returns I don't know your top five

[22:21] (1341.28s)

most likely to buy things. It it might

[22:23] (1343.36s)

need to make those, let's say, five

[22:24] (1344.80s)

requests or just one request. It needs

[22:26] (1346.32s)

to wait for that before it can return.

[22:28] (1348.24s)

Exactly. Exactly. And you can do this

[22:29] (1349.92s)

telemetry stuff. You can do this

[22:31] (1351.36s)

observability stuff to figure out, you

[22:33] (1353.36s)

know, within that service call chain

[22:35] (1355.36s)

what the blocking call is.

[22:37] (1357.28s)

And you can get some some uh you know,

[22:39] (1359.28s)

some amount of visualization on it. And

[22:41] (1361.04s)

so then you can get down to the point

[22:42] (1362.56s)

where it's like, okay, if we're going to

[22:44] (1364.00s)

start from first principles, what's this

[22:45] (1365.68s)

what's the least amount of latency that

[22:48] (1368.40s)

you can get for say like a web request

[22:50] (1370.56s)

or a checkout page call, you're going to

[22:52] (1372.72s)

run into like the absolute minimum,

[22:56] (1376.32s)

right? And it's going to be based off of

[22:58] (1378.00s)

like what are the required operations,

[23:01] (1381.12s)

you know, uh evaluation or transactions

[23:03] (1383.60s)

or whatever for that particular request.
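
The latency floor described here can be sketched as a small model: each call costs its own work plus whatever it must block on downstream. Sequential blocking calls add up, while calls issued concurrently cost only as much as the slowest one. The service names, millisecond figures, and the `Call` type below are hypothetical, chosen only to make the arithmetic visible.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Call:
    """One service call: its own work plus the downstream calls it blocks on."""
    name: str
    own_ms: float                                 # time spent inside this service
    downstream: list[Call] = field(default_factory=list)
    parallel: bool = False                        # True if downstream calls run concurrently


def latency_floor_ms(call: Call) -> float:
    """Best-case latency: own work plus the time spent blocked on downstream calls."""
    children = [latency_floor_ms(c) for c in call.downstream]
    if not children:
        return call.own_ms
    blocked = max(children) if call.parallel else sum(children)
    return call.own_ms + blocked


def build_page(parallel: bool) -> Call:
    """Hypothetical checkout page: item details plus a five-item recommendation call."""
    items = [Call(f"item-{i}", own_ms=10.0) for i in range(5)]
    recs = Call("recommendations", own_ms=5.0, downstream=items, parallel=parallel)
    details = Call("item-details", own_ms=8.0)
    return Call("checkout-page", own_ms=2.0, downstream=[details, recs], parallel=parallel)


print(latency_floor_ms(build_page(parallel=False)))  # 65.0
print(latency_floor_ms(build_page(parallel=True)))   # 17.0
```

In this toy model the same work drops from 65 ms to 17 ms once the blocking calls are issued concurrently, and what remains is the longest chain of calls that genuinely have to wait on each other.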

[23:06] (1386.00s)

Yeah. And then basically so as I

[23:07] (1387.52s)

understand like as it became a microser

[23:09] (1389.60s)

like more microservices and services

[23:11] (1391.04s)

this was great for maintainability and

[23:12] (1392.72s)

also, well, you first just

[23:15] (1395.12s)

solved the issue of the monolith size

[23:17] (1397.12s)

and you know as we know as with history

[23:19] (1399.44s)

of course like now teams could be more

[23:20] (1400.96s)

autonomous they're not as dependent they

[23:22] (1402.96s)

could build the APIs but it was a

[23:24] (1404.48s)

trade-off for for latency and now like

[23:26] (1406.88s)

you had to go back and figure out the

[23:29] (1409.20s)

the blocking calls how to speed those up

[23:32] (1412.32s)

how to do I guess you know trade-off

[23:33] (1413.92s)

things like caching like you know you

[23:35] (1415.52s)

can serve things fast but it might not be as

[23:37] (1417.36s)

correct on the first one or like just

[23:39] (1419.20s)

tricky UI where you don't show the data

[23:41] (1421.52s)

just yet but it's coming and the users

[23:44] (1424.48s)

get a sense of, like, progress,

[23:46] (1426.40s)

those kind of things
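
The caching trade-off mentioned here (fast answers that may be briefly out of date) can be illustrated with a small time-to-live cache. This is a generic sketch, not anything Amazon-specific: `TTLCache`, the `fetch` callback, and the price data are all hypothetical, and the clock is injectable so the behavior can be shown without waiting.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Serve cached values until they are older than ttl_s seconds."""

    def __init__(self, fetch: Callable[[str], Any], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # slow call to the source of truth
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]        # fast path: may be stale
        value = self._fetch(key)  # slow path: refresh from the source
        self._entries[key] = (now, value)
        return value


# Hypothetical data source and a fake clock for the demonstration.
prices = {"book": 10}
now = [0.0]
cache = TTLCache(fetch=lambda k: prices[k], ttl_s=30, clock=lambda: now[0])

first = cache.get("book")    # fetched from the source: 10
prices["book"] = 12          # the source of truth changes
stale = cache.get("book")    # still 10: fast, but not as correct
now[0] = 31.0                # the entry expires
fresh = cache.get("book")    # refetched: 12
print(first, stale, fresh)   # 10 10 12
```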

[23:47] (1427.36s)

it and it also I think forces teams to

[23:49] (1429.68s)

really and product to really say okay

[23:52] (1432.00s)

like what is the strictly necessary

[23:54] (1434.40s)

processing that happens on this page

[23:56] (1436.32s)

some of the work that I was doing uh

[23:58] (1438.40s)

before I left Prime Video was basically

[24:00] (1440.16s)

like you have these really really big

[24:01] (1441.60s)

heavy gateway page you know or landing

[24:04] (1444.32s)

page requests

[24:06] (1446.48s)

And you know if you're in a situation

[24:08] (1448.72s)

with high load, can you preemptively

[24:12] (1452.80s)

reduce the amount of say personalization

[24:17] (1457.04s)

that's going on to sort of speed up that

[24:19] (1459.44s)

page or you know to increase the amount

[24:22] (1462.16s)

of like throughput that you're able to

[24:23] (1463.76s)

have so to serve more customers. Can you

[24:26] (1466.08s)

do that in a smart way, right? That sort

[24:28] (1468.72s)

of anticipates load that's coming onto

[24:31] (1471.04s)

to that page. Mhm.

[24:33] (1473.04s)

Say if there's a football game coming up

[24:34] (1474.88s)

or something like that.
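
One way to picture this kind of anticipatory degradation is a function that picks how much personalization to serve based on both current and expected load. This is a minimal sketch under stated assumptions: the tier names, the thresholds (0.7 and 0.9 of capacity), and `choose_tier` itself are hypothetical and do not describe Prime Video's actual system.

```python
from enum import Enum


class PageTier(Enum):
    FULL = "full personalization"
    REDUCED = "reduced personalization"
    STATIC = "static, cached page"


def choose_tier(current_load: float, expected_load: float,
                reduce_at: float = 0.7, static_at: float = 0.9) -> PageTier:
    """Pick a page tier from load expressed as a fraction of capacity.

    Taking the larger of current and expected load lets the page start
    degrading before a known spike (for example, a big game) arrives.
    """
    load = max(current_load, expected_load)
    if load >= static_at:
        return PageTier.STATIC
    if load >= reduce_at:
        return PageTier.REDUCED
    return PageTier.FULL


print(choose_tier(0.3, 0.4).value)   # full personalization
print(choose_tier(0.3, 0.8).value)   # reduced personalization
print(choose_tier(0.95, 0.2).value)  # static, cached page
```

The second call is the interesting one: current load is low, but the forecast alone is enough to shed the expensive personalization work ahead of time.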

[24:36] (1476.08s)

Yeah. Sounds like these are just like

[24:39] (1479.92s)

a they seem just hard to solve, but now

[24:42] (1482.64s)

you have to solve them. So sounds like

[24:44] (1484.48s)

this kept you busy and now everyone

[24:47] (1487.44s)

else busy at Amazon to this date, right?

[24:49] (1489.44s)

Like, do you think this is

[24:51] (1491.52s)

this ongoing engineering challenge for

[24:52] (1492.96s)

Amazon? Cuz you know what I would

[24:54] (1494.72s)

imagine the tricky thing being here is

[24:57] (1497.52s)

like okay you can optimize whatever you

[24:59] (1499.60s)

have. you can find the critical path but

[25:01] (1501.44s)

Amazon keeps growing right like there's

[25:03] (1503.60s)

new teams new services new everything

[25:05] (1505.28s)

coming on so this thing will change all

[25:07] (1507.04s)

all the time it's an ongoing puzzle to

[25:08] (1508.88s)

solve

[25:09] (1509.36s)

yeah absolutely yeah I think um you know

[25:11] (1511.52s)

they they definitely have a ton of work

[25:14] (1514.16s)

in front of them um also you know it's

[25:16] (1516.32s)

part of their ethos to to really like

[25:18] (1518.56s)

launch new lines of businesses really

[25:20] (1520.24s)

quickly and so you know the ability for

[25:23] (1523.36s)

a team to go from zero to launch product

[25:26] (1526.32s)

within the confines and the context of a

[25:29] (1529.20s)

large corporate entity. I think that's,

[25:31] (1531.60s)

you know, part of the DNA that's there.

[25:33] (1533.20s)

So, as long as they're planting seeds as

[25:35] (1535.36s)

the the sort of like internal

[25:36] (1536.80s)

terminology is, I think that, you know,

[25:38] (1538.80s)

software developers will be uh uh in

[25:41] (1541.76s)

demand for quite a amount of time. Yeah.

[25:43] (1543.76s)

And I guess it's a good reminder that,

[25:45] (1545.04s)

you know, there's every now and then we

[25:46] (1546.08s)

have the monoliths versus microservices

[25:47] (1547.60s)

debate that it it sounds it kind of just

[25:49] (1549.60s)

makes sense for a startup to start with

[25:50] (1550.96s)

the monolith like you can always do what

[25:52] (1552.40s)

Amazon did and you have the benefits of

[25:54] (1554.24s)

latency. Everything is in one place.

[25:56] (1556.56s)

Like I'm sure there might be reason to

[25:58] (1558.32s)

start with microservices to start with,

[25:59] (1559.84s)

but if if you're a small team like I

[26:02] (1562.08s)

mean even today I don't think that

[26:03] (1563.52s)

argument changes, right? Like Amazon got

[26:05] (1565.76s)

really big wins by starting with a

[26:07] (1567.52s)

monolith back back in the day.

[26:09] (1569.68s)

Yeah, absolutely. I I I think it just

[26:13] (1573.04s)

makes a ton of sense to start with a

[26:14] (1574.64s)

monolith, wait till it breaks, and then

[26:18] (1578.00s)

the part that where it breaks is when

[26:20] (1580.16s)

you have like 50 developers working on

[26:21] (1581.92s)

the same piece of code. Once that sort

[26:23] (1583.92s)

of breaking point occurs, then you start

[26:26] (1586.16s)

to like try to figure out like how you

[26:27] (1587.92s)

can sort of break things up. But

[26:29] (1589.84s)

starting with a microservice

[26:31] (1591.20s)

architecture, especially when you're

[26:32] (1592.48s)

small, like what a waste of time and

[26:34] (1594.40s)

energy.

[26:35] (1595.20s)

Totally. So you were a principal

[26:37] (1597.68s)

engineer at Amazon. And apparently I I

[26:40] (1600.16s)

learned that you know most companies are

[26:42] (1602.08s)

they have different levels and again

[26:43] (1603.60s)

this principal engineer some companies

[26:45] (1605.28s)

have like staff level but it's usually

[26:47] (1607.04s)

like entry level mid-level senior and

[26:49] (1609.84s)

then you have staff or in the case of

[26:51] (1611.60s)

Amazon it's it's it's principal. I've

[26:53] (1613.44s)

learned that Amazon's principal level is

[26:55] (1615.76s)

both really hard to get into compared to

[26:58] (1618.24s)

a lot of other companies and it's a it's

[27:00] (1620.08s)

pretty special in some ways. So, we'll

[27:01] (1621.28s)

talk about that, but can you tell me

[27:02] (1622.40s)

like how how is the career kind of

[27:06] (1626.08s)

development? Cuz most people imagine

[27:07] (1627.60s)

like, oh, it's it should be pretty

[27:08] (1628.88s)

straightforward. I spend like I don't

[27:10] (1630.16s)

know two years as a junior, two years as

[27:11] (1631.92s)

a mid roughly, and two years a senior,

[27:13] (1633.68s)

then I get to principal. How does it

[27:15] (1635.20s)

actually work at Amazon?

[27:16] (1636.40s)

I think it's linear up until you hit

[27:18] (1638.88s)

principal, right? So, you know, you

[27:20] (1640.80s)

join, you're a junior developer, you get

[27:22] (1642.80s)

promoted to mid. at mid, you know,

[27:25] (1645.04s)

you're starting to influence the team,

[27:26] (1646.80s)

but but then you get to senior and so

[27:29] (1649.44s)

now your expected impact is at the at

[27:31] (1651.92s)

the team level and then and then there's

[27:35] (1655.04s)

this jump that you get to principal

[27:37] (1657.28s)

and principal is it's L6.

[27:39] (1659.04s)

Uh principal is L7.

[27:40] (1660.24s)

L7. Yes.

[27:41] (1661.04s)

Yeah. And so I think you really have to

[27:43] (1663.04s)

start with like why is it why is that

[27:45] (1665.44s)

jump so big? Cuz I think at every pretty

[27:47] (1667.20s)

much any other company, it's just a

[27:49] (1669.28s)

linear progression. Like there's nothing

[27:51] (1671.12s)

necessarily special about staff, you

[27:53] (1673.36s)

know, you can just sort of go to that

[27:55] (1675.36s)

level, senior staff and then principal.

[27:57] (1677.60s)

But for some reason, Amazon decided that

[28:00] (1680.16s)

they weren't going to have a staff level

[28:03] (1683.12s)

and and so and and I think they they

[28:05] (1685.92s)

sort of like couched it around like

[28:07] (1687.36s)

having high standards. Basically to get

[28:10] (1690.08s)

from senior to principal you have to do

[28:12] (1692.40s)

like two and a half level jump

[28:14] (1694.24s)

from L6 to L7. Technically it sounds

[28:16] (1696.96s)

like one level but at some other

[28:19] (1699.28s)

companies this might be like uh you know

[28:21] (1701.20s)

L8 L9 or L8 and a half.

[28:23] (1703.52s)

Yeah. And you know so the the the

[28:25] (1705.36s)

handwavy argument is like hey we have

[28:27] (1707.04s)

high standards and like you know it's it

[28:29] (1709.36s)

means something to get to that level.

[28:30] (1710.88s)

It's like fine. But I noticed that some

[28:33] (1713.12s)

of the best engineers that I'd ever

[28:34] (1714.88s)

worked with were having such problems

[28:37] (1717.60s)

getting to principal engineer that they

[28:39] (1719.60s)

ended up moving to Facebook or to Meta

[28:41] (1721.68s)

or to all these other places where the

[28:44] (1724.00s)

progression was just sane. Now

[28:46] (1726.80s)

staff or senior staff level.

[28:48] (1728.00s)

Now they're senior staff and you know

[28:49] (1729.52s)

principal and distinguished engineer at

[28:51] (1731.52s)

other companies and so

[28:53] (1733.84s)

because we had high standards we

[28:56] (1736.08s)

actually had this brain drain and it

[28:57] (1737.52s)

wasn't a brain drain at lower levels. It

[28:59] (1739.76s)

was that the brain drain at at sort of

[29:01] (1741.44s)

like the higher levels.

[29:02] (1742.72s)

Mhm.

[29:03] (1743.52s)

And it was it's just an example of

[29:05] (1745.28s)

something where it's just like why did

[29:06] (1746.64s)

you do that to yourself? And so that's

[29:08] (1748.72s)

the the the context for for being a

[29:11] (1751.36s)

principal at Amazon. you know I

[29:12] (1752.64s)

so it's safe to say it's wicked hard to

[29:14] (1754.16s)

get internally right

[29:16] (1756.16s)

so I you know I I I'm I'm colleagues

[29:18] (1758.88s)

with Ethan Evans and so we we talk about

[29:21] (1761.28s)

what's the hardest promotion at Amazon

[29:24] (1764.16s)

and you know I had made the argument

[29:25] (1765.60s)

that it was you know it was uh senior

[29:27] (1767.76s)

engineer to principal and he's like yeah

[29:30] (1770.32s)

that's hard actually the hardest one

[29:32] (1772.48s)

Steve is you know VP to senior VP cuz

[29:35] (1775.12s)

there's only there's only eight spots or

[29:37] (1777.20s)

10 spots for that, and maybe 300

[29:40] (1780.24s)

VPs that are all trying to get this. I

[29:42] (1782.32s)

would say that's more of a supply and demand

[29:43] (1783.76s)

thing. I will say that at Amazon there

[29:47] (1787.04s)

is gigantic demand for principal

[29:49] (1789.12s)

engineers and so there are roles that

[29:52] (1792.00s)

have been open for years. I think

[29:54] (1794.32s)

something on the order of like 13 months

[29:56] (1796.40s)

or 17 months or something like that to

[29:58] (1798.32s)

get an external hire to um to join as a

[30:01] (1801.84s)

principal engineer. But that metric is

[30:03] (1803.60s)

only calculated when the role is filled.

[30:05] (1805.52s)

Yeah.

[30:05] (1805.92s)

And so probably you know there are

[30:08] (1808.32s)

hundreds of principal engineer openings

[30:10] (1810.08s)

at Amazon.

[30:11] (1811.12s)

Mhm. And there are thousands of senior

[30:13] (1813.60s)

engineers

[30:14] (1814.56s)

who desperately want to get there

[30:16] (1816.56s)

putting in the work,

[30:17] (1817.60s)

you know, and so there's this sort of

[30:19] (1819.28s)

like there's this tension,

[30:21] (1821.20s)

right? Um, and I don't think you see

[30:23] (1823.68s)

that at the lower levels. I don't think

[30:25] (1825.60s)

that that's happening at senior or mid

[30:27] (1827.20s)

or junior. And so, like, that incongruity I

[30:30] (1830.16s)

think is super interesting. But

[30:32] (1832.88s)

once you do get to principal engineer,

[30:34] (1834.64s)

one thing that I've never heard any

[30:36] (1836.16s)

other company have is there is

[30:37] (1837.76s)

apparently a principal engineering

[30:39] (1839.12s)

community which is I've heard again from

[30:41] (1841.84s)

other people that it's tightly knit.

[30:43] (1843.60s)

It's actually special. It's actually

[30:45] (1845.52s)

just really nice organization. Can you

[30:46] (1846.96s)

talk about that? So like you know once

[30:48] (1848.16s)

you once you got in there somehow I

[30:50] (1850.64s)

don't know was was it Blood Switzer

[30:52] (1852.48s)

promotion?

[30:53] (1853.28s)

There is a community. I think it's

[30:54] (1854.64s)

actually really great. um my own

[30:57] (1857.60s)

history, you know, I I went from support

[31:00] (1860.64s)

engineer to senior engineer in like four

[31:02] (1862.72s)

years at Amazon, but then from senior to

[31:05] (1865.36s)

principal, it took me eight years and I

[31:08] (1868.08s)

got promoted in uh Q1 of 2020. Turns out

[31:11] (1871.52s)

to be a consequential, like, year for

[31:14] (1874.00s)

the industry for the world

[31:15] (1875.52s)

that was forced remote work.

[31:18] (1878.08s)

And so, you know, I got promoted and

[31:19] (1879.84s)

everybody's like, you know,

[31:20] (1880.72s)

congratulations. They used to have like

[31:22] (1882.80s)

a principal engineer offsite where they

[31:24] (1884.88s)

just flew everybody into Seattle or

[31:26] (1886.56s)

nearby and then to to sort of like you

[31:29] (1889.44s)

know um mingle and and to talk to other

[31:32] (1892.00s)

folks. That stopped

[31:33] (1893.68s)

during the pandemic and then um you know

[31:36] (1896.32s)

by the time the pandemic restrictions

[31:38] (1898.16s)

started lifting, the population of

[31:40] (1900.40s)

principal engineers had essentially

[31:42] (1902.00s)

doubled. That's still to say like there

[31:44] (1904.16s)

are still hundreds and hundreds of

[31:45] (1905.36s)

openings for principal engineer but then

[31:48] (1908.24s)

the you know the sort of like off-site

[31:50] (1910.08s)

community shifted over to the senior

[31:52] (1912.16s)

principals, which I didn't have access to,

[31:54] (1914.56s)

but you know at the moment the the

[31:56] (1916.64s)

manifestation of the principal

[31:58] (1918.08s)

engineering community is essentially

[32:00] (1920.40s)

through the Slack channel, which is

[32:03] (1923.04s)

absolutely awesome, and then we had

[32:06] (1926.72s)

principal offsites for, like, our local

[32:08] (1928.96s)

organization, so like Amazon Music, Prime

[32:11] (1931.28s)

Video, Twitch, that sort of thing. Those

[32:13] (1933.12s)

meetups were were amazing. So the reason

[32:15] (1935.76s)

they were is because of this high

[32:18] (1938.48s)

standard that Amazon had created. And so

[32:20] (1940.80s)

what it meant is that everybody that was

[32:23] (1943.04s)

able to achieve that overly high

[32:25] (1945.52s)

standard, there's something exceptional

[32:27] (1947.28s)

about them.

[32:28] (1948.48s)

Um there's there's, you know, um they're

[32:31] (1951.12s)

super deep in a particular technology or

[32:33] (1953.52s)

they were associated with, you know, uh

[32:36] (1956.48s)

the growth of a a really large line of

[32:38] (1958.72s)

business either within Amazon or or

[32:40] (1960.72s)

externally. They were essentially

[32:43] (1963.12s)

leaders within the industry and you

[32:46] (1966.80s)

could just literally you could just

[32:48] (1968.40s)

scoop out five people and then put them

[32:51] (1971.92s)

into a room and the conversation is just

[32:54] (1974.16s)

is just amazing, right? And and I would

[32:56] (1976.72s)

I would sort of be like I don't even

[32:58] (1978.16s)

belong here. Like look at this guy, you

[32:59] (1979.84s)

know, he wrote a book on, you know, on

[33:02] (1982.32s)

on a particular topic and and this guy,

[33:04] (1984.96s)

you know, he you know, he was, you know,

[33:07] (1987.28s)

a luminary in in a particular field. and

[33:10] (1990.56s)

then this person just like is an amazing

[33:12] (1992.96s)

code machine and can just write an

[33:15] (1995.04s)

entire application over a weekend and

[33:17] (1997.44s)

then you're like what am I doing here?

[33:19] (1999.92s)

You know, I I I do wonder if that

[33:22] (2002.16s)

community might be coming back now. I I

[33:24] (2004.16s)

know you've left but now Amazon is now

[33:26] (2006.48s)

in person because it sounds like a lot

[33:27] (2007.68s)

of the benefit was the in-person part as

[33:30] (2010.16s)

well because this is what I never heard

[33:31] (2011.76s)

again even before the pandemic. I I

[33:33] (2013.44s)

didn't hear other companies say for

[33:35] (2015.20s)

example at Uber I I've heard that the

[33:37] (2017.28s)

senior staff engineers do get together

[33:38] (2018.80s)

every now and then, but it was very

[33:40] (2020.80s)

grassroots, so it was bottoms-up, but

[33:43] (2023.68s)

my understanding is that Amazon actually

[33:45] (2025.36s)

invested not just you know some

[33:47] (2027.28s)

principal engineers saying hey let's get

[33:48] (2028.64s)

together but also just kind of you like

[33:51] (2031.44s)

making making sure that that that group

[33:54] (2034.00s)

really had something like I've I I think

[33:56] (2036.32s)

it's smart I think more companies should

[33:57] (2037.68s)

do it but I'm just not seeing it

[33:59] (2039.28s)

the investment was

[34:02] (2042.24s)

um also in terms of headcount. So there

[34:04] (2044.64s)

are program managers and and like

[34:07] (2047.20s)

product managers essentially um that are

[34:10] (2050.64s)

um you know bringing the folks together.

[34:12] (2052.40s)

Awesome.

[34:13] (2053.20s)

There's a there's a wonderful series.

[34:15] (2055.04s)

It's called the Principals of Amazon

[34:16] (2056.56s)

series where you know principal

[34:18] (2058.88s)

engineers will just you know they'll do

[34:20] (2060.48s)

a presentation and it's recorded that's

[34:22] (2062.64s)

been happening for you know 20 years and

[34:26] (2066.32s)

you know we record everything that's

[34:27] (2067.68s)

there but it takes work to actually

[34:29] (2069.52s)

but that internal series that and is

[34:32] (2072.72s)

that open to like everyone at Amazon or

[34:34] (2074.56s)

it's for the principals themselves? It's

[34:35] (2075.92s)

it's open uh for everybody at Amazon to

[34:38] (2078.24s)

consume and then um you know there might

[34:40] (2080.56s)

be some senior engineers and stuff like

[34:42] (2082.32s)

that that that would make a presentation

[34:43] (2083.92s)

that's part of their promotion packet is

[34:45] (2085.60s)

being able to make an Amazon-wide

[34:47] (2087.36s)

presentation

[34:48] (2088.40s)

on a particular thing. My point was

[34:50] (2090.48s)

though that that stuff doesn't just

[34:52] (2092.08s)

happen on its own.

[34:53] (2093.04s)

Yeah. like you have to like you need a

[34:55] (2095.44s)

program manager or multiple folks to

[34:58] (2098.00s)

sort of like herd the cats and to like

[35:01] (2101.20s)

schedule the offsites and to make

[35:03] (2103.76s)

sure that the you know the Slack channel

[35:05] (2105.68s)

doesn't go off the rails, right? And is

[35:07] (2107.36s)

still useful and it's just not going to

[35:09] (2109.44s)

happen like grassroots with just like

[35:12] (2112.24s)

throwing a bunch of people into a room.

[35:14] (2114.32s)

This episode is brought to you by

[35:15] (2115.68s)

Augment Code. You're a professional

[35:17] (2117.92s)

software engineer. Vibes will not cut

[35:19] (2119.92s)

it. Augment Code is the AI assistant

[35:22] (2122.16s)

built for real engineering teams. It

[35:24] (2124.40s)

ingests your entire repo, millions of

[35:26] (2126.48s)

lines, tens of thousands of files, so

[35:28] (2128.64s)

every suggestion lands in context and

[35:30] (2130.64s)

keeps you in flow. With Augment's new

[35:32] (2132.88s)

remote agent, queue up parallel tasks like

[35:35] (2135.04s)

bug fixes, features, and refactors.

[35:37] (2137.36s)

Close your laptop and return to ready

[35:39] (2139.12s)

for review pull requests. Where other

[35:41] (2141.44s)

tools stall, Augment Code sprints.

[35:44] (2144.24s)

Augment Code never trains or sells your

[35:46] (2146.08s)

code, so your team's intellectual

[35:47] (2147.76s)

property stays yours. And you don't have

[35:49] (2149.84s)

to switch tooling. Keep using VS Code,

[35:52] (2152.00s)

JetBrains, Android Studio, or even Vim.

[35:54] (2154.72s)

Don't hire an AI for Vibes. Get the

[35:56] (2156.64s)

agent that knows you and your code base.

[35:59] (2159.36s)

Start your 14-day free trial at

[36:01] (2161.20s)

augmentcode.com/pragmatic.

[36:03] (2163.92s)

I think, you know, these are the the

[36:05] (2165.52s)

things I mean, we're now exposing a few

[36:08] (2168.48s)

of these things here and there, but some

[36:10] (2170.32s)

of these companies like, you know,

[36:11] (2171.52s)

Amazon is a great example where there's

[36:13] (2173.28s)

more to the eye than what meets the

[36:14] (2174.96s)

surface. So like once you're inside

[36:16] (2176.40s)

Amazon for example you now as an

[36:18] (2178.08s)

engineer even if not a principal

[36:19] (2179.20s)

engineer you now have access to the

[36:20] (2180.96s)

whole you know 20 years of principal

[36:22] (2182.72s)

presentations like when I joined Uber I

[36:24] (2184.72s)

was amazed at how we had the RFCs

[36:27] (2187.44s)

available like I could read all historic

[36:29] (2189.76s)

ones so I think there is and every

[36:31] (2191.60s)

company has its own of course once

[36:33] (2193.76s)

you're in there you have access to this

[36:35] (2195.12s)

like knowledge base which it will just

[36:37] (2197.12s)

never be published it cannot because it

[36:38] (2198.80s)

has you know business sensitive things

[36:40] (2200.72s)

etc. So I think as an engineer like you

[36:42] (2202.72s)

can just really just like like be a

[36:44] (2204.96s)

sponge when when you join especially one

[36:46] (2206.72s)

of the companies that that is known to

[36:48] (2208.24s)

be a bit more open internally even if

[36:50] (2210.24s)

yeah Amazon I think a really interesting

[36:51] (2211.92s)

one because externally it's very closed

[36:53] (2213.52s)

is my sense they're very careful about

[36:55] (2215.04s)

what they share for example the

[36:56] (2216.88s)

postmortems for AWS, very few are

[36:59] (2219.12s)

published externally but internally

[37:01] (2221.76s)

they're all there as I understand there

[37:03] (2223.44s)

as an engineer you can access, you can learn

[37:05] (2225.04s)

from them like in really cool real world

[37:07] (2227.52s)

learnings

[37:08] (2228.24s)

absolutely you know um it is an open

[37:10] (2230.80s)

place internally and we're so selective

[37:13] (2233.28s)

about what we I say we as though I still

[37:15] (2235.52s)

work there but uh what what what they

[37:17] (2237.52s)

publish externally and you know uh the

[37:20] (2240.16s)

the postmortems, we call them COEs, it's

[37:22] (2242.88s)

a COE stands for

[37:24] (2244.08s)

it's a correction of error, yeah

[37:25] (2245.84s)

it's you know it's this idea that you

[37:27] (2247.60s)

know you have like holes in Swiss cheese

[37:30] (2250.16s)

and and you have like a failure requires

[37:33] (2253.92s)

that there's a there's a hole across

[37:36] (2256.16s)

layers that's the best reading like I

[37:38] (2258.24s)

would just subscribe to the email list

[37:40] (2260.16s)

where they were published internally. So

[37:41] (2261.52s)

you have this like stream of like of

[37:44] (2264.00s)

disasters that are going on within the

[37:45] (2265.92s)

company and you just, you know, you grab

[37:47] (2267.60s)

some popcorn and you you pop open one of

[37:49] (2269.52s)

these COEs and you learn so much from

[37:52] (2272.64s)

that and and I think that that's that's

[37:54] (2274.40s)

part of the secret sauce. The idea and I

[37:56] (2276.72s)

don't know if it's like this for 100% of

[37:59] (2279.12s)

them is that it's a blameless culture

[38:00] (2280.88s)

sort of thing.

[38:02] (2282.08s)

And so to really screw up requires that

[38:05] (2285.52s)

multiple people drop the ball.

[38:08] (2288.16s)

Yeah. And you learn so much from that

[38:10] (2290.96s)

that sort of stuff. You know, the the

[38:13] (2293.04s)

brownouts, you know, these uh these

[38:15] (2295.44s)

lessons that you would learn from, you

[38:17] (2297.20s)

know, trying to recover from really

[38:18] (2298.56s)

large dependencies. Those things are

[38:20] (2300.80s)

immortalized inside some of these COEs.

[38:22] (2302.96s)

So, there's some very famous outages

[38:25] (2305.04s)

that happened within Amazon and you

[38:27] (2307.92s)

know, they were egg on our face, and

[38:30] (2310.24s)

but we really really learned those

[38:31] (2311.84s)

lessons through those postmortems.

[38:33] (2313.28s)

They're they're absolutely wonderful. as

[38:35] (2315.04s)

a principal engineer, you know, you we

[38:36] (2316.56s)

so far we kind of glamorized the role

[38:38] (2318.48s)

saying, you know, it is hard to get

[38:39] (2319.52s)

into, but once you're there, you have

[38:40] (2320.72s)

the community, you do this really

[38:42] (2322.16s)

impactful work. But one of the principal

[38:44] (2324.16s)

engineers uh at Amazon who's still there

[38:46] (2326.32s)

called Bhavik Kothari, he collected

[38:49] (2329.68s)

some things that are maybe not as

[38:52] (2332.16s)

glamorous or more challenging about

[38:53] (2333.68s)

principal engineering. He had five of

[38:56] (2336.40s)

these things or five or six. I just want

[38:58] (2338.08s)

to go through them with you and get your take

[39:00] (2340.16s)

on them. The first he wrote, "There is

[39:01] (2341.92s)

this paradox of belonging that you're

[39:03] (2343.84s)

part of all teams, yet you're part of

[39:05] (2345.92s)

none." What does that mean?

[39:08] (2348.32s)

Yeah. No, so, uh, Bhavik was actually a

[39:12] (2352.48s)

peer of mine. We worked in Prime Video

[39:14] (2354.40s)

together.

[39:16] (2356.16s)

So he's he's an awesome dude. Yeah.

[39:18] (2358.08s)

There's there are all of these paradoxes

[39:19] (2359.68s)

and and uh this paradox of belonging is

[39:23] (2363.28s)

a really interesting one. You

[39:26] (2366.24s)

know, you work for the organization,

[39:28] (2368.16s)

right? you're working across teams,

[39:30] (2370.32s)

right? So, as a senior engineer, you're

[39:32] (2372.48s)

working on, you're embedded on, a team

[39:34] (2374.80s)

and you know, you own the team's

[39:36] (2376.32s)

architecture, the the operations, you

[39:38] (2378.72s)

know, the software development life

[39:40] (2380.48s)

cycle and the design. But when you get

[39:43] (2383.44s)

to that next level where you're working

[39:45] (2385.04s)

across teams, um you kind of operate in

[39:48] (2388.56s)

this weird layer where, you know, you're

[39:51] (2391.20s)

not on pager duty for a particular team.

[39:53] (2393.92s)

Mhm. um you have visibility across all

[39:56] (2396.80s)

of these teams that are there. You're

[39:58] (2398.80s)

helping to guide and make decisions, but

[40:01] (2401.12s)

you're literally not on the ground floor

[40:04] (2404.00s)

anymore.

[40:05] (2405.12s)

And so, you know, when you work with a

[40:07] (2407.20s)

particular team, you know, you might

[40:09] (2409.12s)

call the senior engineers or the

[40:10] (2410.40s)

mid-level engineers in and be like,

[40:11] (2411.76s)

"Hey, let's whiteboard some stuff. Like,

[40:13] (2413.36s)

let's try to figure out what's going

[40:14] (2414.48s)

on." You're not on the team. You're kind

[40:16] (2416.40s)

of this like adviser that's sort of

[40:18] (2418.56s)

coming in,

[40:20] (2420.00s)

right? But then, you know, maybe a

[40:22] (2422.00s)

director or a VP would call you in and

[40:24] (2424.32s)

say like, "Hey, what do I own? Like,

[40:25] (2425.92s)

what's going on? Explain to me this

[40:27] (2427.44s)

outage or tell me why we can't build

[40:29] (2429.36s)

this thing."

[40:30] (2430.72s)

And then you're you're trying to

[40:32] (2432.08s)

whiteboard the architecture and the

[40:33] (2433.92s)

system and you're trying to say like,

[40:35] (2435.12s)

"Hey, you know, this is what's going on

[40:38] (2438.24s)

on the ground floor."

[40:39] (2439.68s)

Mhm.

[40:40] (2440.32s)

But you weren't, you know, you weren't

[40:41] (2441.68s)

part of that team. So, you're just sort

[40:43] (2443.04s)

of operating in this sort of stratum

[40:45] (2445.36s)

where, you know, you don't really belong

[40:47] (2447.76s)

on a team. You know, I'm an

[40:49] (2449.68s)

immigrant. I think you are uh as well.

[40:52] (2452.32s)

And you know, my parents came from from

[40:54] (2454.72s)

Asia. I'm not Asian, right? So, when I

[40:58] (2458.16s)

go back to Asia, I'm definitely from

[40:59] (2459.84s)

from the US. And then growing up in this

[41:01] (2461.44s)

country, it was just like, you know, I'm

[41:04] (2464.00s)

I'm uh you know, not quite an American,

[41:06] (2466.80s)

right? And so you you sort of operate in

[41:08] (2468.96s)

this sort of you know area in the gaps

[41:11] (2471.28s)

where your identity is really

[41:14] (2474.64s)

defined by not being squarely in one of

[41:17] (2477.04s)

these predefined categories. So it's

[41:18] (2478.96s)

very similar to that as a principal

[41:21] (2481.20s)

engineer. You're not on the ground

[41:22] (2482.64s)

floor. You're not checking in. You will

[41:24] (2484.24s)

check in code but you're not necessarily

[41:26] (2486.08s)

part of that team embedded on that team.

[41:28] (2488.48s)

And even if you are for a short time

[41:30] (2490.32s)

it's usually a short time and like

[41:31] (2491.76s)

tomorrow the director will call you up and

[41:33] (2493.60s)

say like hey Steve we need you on this

[41:35] (2495.44s)

other team. they're in trouble. Move

[41:37] (2497.28s)

over. Like,

[41:38] (2498.08s)

yeah. And you parachute in and then, you

[41:40] (2500.16s)

know, then they're like, "Oh, who's this

[41:41] (2501.68s)

guy?" You know, and then your your

[41:43] (2503.84s)

director is like, "What's going on? What

[41:45] (2505.92s)

what happened during this outage? Why

[41:47] (2507.52s)

is, you know, why is the

[41:49] (2509.52s)

press writing about us?"

[41:51] (2511.20s)

And then you're like, well, you know,

[41:52] (2512.72s)

here's what's happening on the ground,

[41:54] (2514.00s)

but you're not really embedded on that

[41:56] (2516.24s)

team. Which leads us to the next paradox

[41:58] (2518.24s)

that Bhavik listed. He lists a few of

[42:00] (2520.40s)

the paradoxes, one of which is freedom and

[42:01] (2521.68s)

responsibility. And he writes that you

[42:03] (2523.68s)

enjoy significant autonomy in being able

[42:05] (2525.44s)

to choose what you work on. However,

[42:07] (2527.60s)

there's an implicit expectation and

[42:09] (2529.76s)

accountability for resounding impact.

[42:12] (2532.00s)

Yeah. So, you know, I

[42:14] (2534.80s)

reported to a VP right before I uh left

[42:17] (2537.28s)

the company and uh

[42:18] (2538.72s)

so they were your manager basically.

[42:20] (2540.00s)

Yeah, my manager was a was a VP.

[42:21] (2541.84s)

Oh, wow. That's

[42:24] (2544.80s)

I don't hear of many companies having

[42:27] (2547.04s)

engineers report into VPs.

[42:29] (2549.28s)

Yeah,

[42:29] (2549.60s)

that doesn't seem very standard. um you

[42:31] (2551.60s)

know and so the the org that he owned I

[42:33] (2553.60s)

you know I considered myself the the

[42:35] (2555.28s)

tech adviser for that organization was

[42:37] (2557.28s)

about 450 people uh 450 software

[42:40] (2560.56s)

developers

[42:41] (2561.68s)

And what did our one-on-ones consist of,

[42:44] (2564.80s)

right like when I when I would have our

[42:46] (2566.96s)

one-on-one it wasn't like hey here's you

[42:49] (2569.84s)

know he didn't assign me work he wasn't

[42:52] (2572.32s)

like hey I need you to build this thing

[42:54] (2574.48s)

I need you to design this thing the

[42:56] (2576.88s)

context that he set was basically like

[42:58] (2578.64s)

here's a direction right that you need

[43:01] (2581.20s)

to go and

[43:03] (2583.36s)

the way that I could achieve that type

[43:05] (2585.52s)

of impact was up to me.

[43:08] (2588.16s)

Mhm.

[43:08] (2588.48s)

Right. So he might say something like

[43:10] (2590.00s)

hey availability is so important for you

[43:13] (2593.84s)

know uh live sports. We just signed you

[43:16] (2596.56s)

know billion-dollar contracts with these

[43:18] (2598.32s)

sports leagues and so we need to

[43:20] (2600.64s)

increase our availability posture.

[43:22] (2602.72s)

Mhm. And then I would be like, "Okay."

[43:26] (2606.40s)

And then I would go away and we would

[43:28] (2608.72s)

come back and I would be like, you know,

[43:30] (2610.96s)

here's what I'm working on, right? Like

[43:33] (2613.44s)

that type of dynamic. This does

[43:36] (2616.80s)

not exist at the senior engineer and below

[43:38] (2618.80s)

level where you're basically telling

[43:40] (2620.56s)

your boss what's happening. I was

[43:42] (2622.64s)

about to say that when you said, in my

[43:44] (2624.80s)

manager one-on-ones, he didn't tell me

[43:46] (2626.24s)

what to do. I'm like most engineers

[43:47] (2627.52s)

would be like, "Sign me up." Like I I

[43:48] (2628.96s)

don't want, you know, we all hate

[43:50] (2630.00s)

micromanagement. But now when you're

[43:51] (2631.92s)

telling me like he would say like, "Oh,

[43:53] (2633.68s)

so we just signed a billion dollar

[43:54] (2634.96s)

contract. Availability is important and

[43:57] (2637.04s)

then stops talking." I'm like, "That

[43:59] (2639.20s)

sounds uncomfortable."

[44:01] (2641.60s)

And basically, like, you're kind

[44:03] (2643.04s)

of expected a little bit to like

[44:04] (2644.48s)

understand what he's expecting even

[44:06] (2646.16s)

though he doesn't know. And then and I'm

[44:07] (2647.84s)

assuming, you know, there's two ways of

[44:09] (2649.28s)

going, right? You go back on the next

[44:10] (2650.64s)

one-on-one and you say something and

[44:12] (2652.16s)

he's like, Steve, you're a

[44:14] (2654.32s)

principal engineer. This is not what I

[44:15] (2655.60s)

expect of you and you don't want that.

[44:17] (2657.92s)

Whereas, you know, if you bring

[44:19] (2659.52s)

back the right things. So, sounds like

[44:20] (2660.88s)

you really need to uplevel in like

[44:22] (2662.32s)

understanding how like these people

[44:24] (2664.32s)

think.

[44:25] (2665.28s)

Absolutely. And so, he's, you know, he's

[44:26] (2666.96s)

accountable to to his boss as well. And,

[44:29] (2669.76s)

you know, don't get me wrong, I I

[44:31] (2671.20s)

didn't, you know, I owned

[44:33] (2673.36s)

aspects of availability. You know,

[44:34] (2674.80s)

there's a multi-thousand-person organization

[44:37] (2677.36s)

at Prime Video doing this stuff, but we

[44:39] (2679.36s)

owned the the live sports aspect of

[44:41] (2681.04s)

this. Um, and you know, there are

[44:43] (2683.20s)

playback teams, there are, you know,

[44:44] (2684.88s)

recommendation teams, there, you know,

[44:46] (2686.88s)

there's so many different teams that are

[44:48] (2688.24s)

there that had to really step up and

[44:50] (2690.64s)

uh, make sure that availability was

[44:52] (2692.80s)

good. But he would say something like,

[44:55] (2695.28s)

hey, you know, what is our availability

[44:57] (2697.52s)

posture for certain aspects and I would

[45:00] (2700.80s)

have to go and figure it out. Yeah.

[45:02] (2702.80s)

Like, what are we measuring? What

[45:04] (2704.40s)

are we not measuring? There's a deadline

[45:06] (2706.48s)

for, you know, the start of a season uh

[45:08] (2708.88s)

where we're expecting, you know,

[45:10] (2710.16s)

millions and millions of concurrent viewers

[45:12] (2712.32s)

to come in. Um what can we do between

[45:15] (2715.28s)

now and then, right? And then if we do

[45:17] (2717.44s)

write some software, like, what is

[45:20] (2720.08s)

the highest leverage piece of software

[45:21] (2721.76s)

that we could create that would increase

[45:23] (2723.76s)

our availability posture. And so the way

[45:25] (2725.36s)

that I sort of describe it to people

[45:27] (2727.28s)

is you are assigned not a problem, not

[45:31] (2731.36s)

even a problem space, you're assigned a

[45:32] (2732.96s)

direction. You can solve the problem

[45:34] (2734.32s)

with code. You can solve the problem

[45:36] (2736.00s)

with system design and architecture, but

[45:38] (2738.48s)

you could also solve the problem say by,

[45:40] (2740.56s)

you know, I don't know, hey, maybe

[45:42] (2742.16s)

there's some off-the-shelf software we

[45:43] (2743.60s)

should purchase.

[45:44] (2744.64s)

Uh, maybe there's a dev team that we

[45:46] (2746.96s)

should start to spin up right now, um,

[45:49] (2749.52s)

whose job it is to do this particular

[45:52] (2752.00s)

thing. Maybe we've identified a piece of

[45:54] (2754.96s)

software and it's already been scoped

[45:56] (2756.80s)

that this team needs to go and build,

[45:59] (2759.28s)

but it's not a priority for them. Now we

[46:02] (2762.40s)

need to go and figure out like you know

[46:03] (2763.92s)

how we can get them to do it. Can we

[46:05] (2765.36s)

shuffle around resources? That sort of

[46:07] (2767.52s)

thing. And so the way I describe it is

[46:08] (2768.88s)

like there's so many more things on the

[46:10] (2770.48s)

menu

[46:11] (2771.44s)

that you can use to solve the problem.

[46:14] (2774.40s)

And I don't think people recognize that.

[46:16] (2776.48s)

They think that it's just, oh, when

[46:17] (2777.84s)

you're a principal like you just like

[46:20] (2780.00s)

code a lot and it's just really

[46:21] (2781.52s)

complicated

[46:22] (2782.16s)

Or do more meetings, you know, that's

[46:23] (2783.76s)

what happens.

[46:24] (2784.96s)

I mean at the end of the day like don't

[46:26] (2786.32s)

get me wrong, there's a ton of meetings

[46:27] (2787.68s)

that go on.

[46:28] (2788.48s)

Yeah. Yeah. But I think

[46:30] (2790.56s)

it's good to, like, shine a light

[46:31] (2791.92s)

because I also feel like it sounds

[46:34] (2794.16s)

like a big change, but I also kind of

[46:35] (2795.84s)

feel if if you get good at this, you

[46:38] (2798.08s)

might not really want to go back to, you

[46:40] (2800.64s)

know, having a manager who's like, "All

[46:41] (2801.92s)

right, here's a project. We need to

[46:43] (2803.60s)

solve like, you know, scope it out and

[46:45] (2805.20s)

which you can do, right?"

[46:46] (2806.32s)

Yeah,

[46:46] (2806.88s)

That's cool. And now the next

[46:48] (2808.48s)

challenge that Bhavik listed was, this all

[46:50] (2810.80s)

sounds great, but there's apparently

[46:52] (2812.24s)

a bandwidth challenge. So you've

[46:54] (2814.40s)

become this like social resource where

[46:56] (2816.72s)

people just pull you into everything and

[46:58] (2818.40s)

you're reading.

[46:59] (2819.92s)

Yeah. No, you know, I think, I wish I

[47:02] (2822.00s)

had taken a screenshot, but you know, I

[47:03] (2823.84s)

have my Outlook calendar, right? So it's

[47:05] (2825.36s)

my schedule. My day looked like most

[47:08] (2828.56s)

people's week, so it looked like

[47:11] (2831.04s)

somebody had just, like, blown up a Tetris

[47:13] (2833.36s)

factory. Like, I

[47:15] (2835.44s)

would be triple or quadruple booked on

[47:17] (2837.52s)

a Monday all through the day.

[47:19] (2839.28s)

So you would have the manager calendar

[47:20] (2840.64s)

as an IC.

[47:22] (2842.00s)

Yeah. And it's absolutely crazy

[47:24] (2844.24s)

because and you know for that large org

[47:26] (2846.24s)

that I was supporting everybody just

[47:28] (2848.88s)

added me as optional, or they might

[47:31] (2851.60s)

try to say like no you're actually

[47:33] (2853.04s)

required for all of these meetings but

[47:34] (2854.64s)

when you have a triple-booked

[47:36] (2856.24s)

calendar and you're required for this

[47:37] (2857.84s)

stuff you just learn that you're going

[47:40] (2860.16s)

to have to disappoint a lot of people.

[47:42] (2862.24s)

Yeah. And so it's this sort of, like,

[47:44] (2864.80s)

uh you know um this thing where it's

[47:46] (2866.64s)

like it's almost easier to say no now

[47:48] (2868.64s)

that you're obscenely overbooked, versus

[47:51] (2871.60s)

when you're a senior engineer you're

[47:52] (2872.96s)

like I don't have time to write code but

[47:55] (2875.52s)

there's just barely enough time in

[47:58] (2878.00s)

between the cracks.

[47:59] (2879.20s)

Yeah.

[47:59] (2879.76s)

And so I think that uh it's almost like

[48:02] (2882.32s)

when your schedule breaks, that's

[48:04] (2884.48s)

when you are finally freed because you

[48:06] (2886.48s)

know that you can sort of say no to

[48:07] (2887.92s)

stuff. But ultimately, if I just went to

[48:10] (2890.16s)

all of the meetings that everybody said

[48:11] (2891.68s)

that I would have to go to, I would be a

[48:13] (2893.28s)

professional meeting attender and I

[48:15] (2895.12s)

would literally have no time to do the

[48:16] (2896.88s)

work.

[48:17] (2897.28s)

And then Bhavik follows up with this next

[48:19] (2899.76s)

challenge, which is being truly present.

[48:21] (2901.52s)

And he writes, I think it's almost like,

[48:23] (2903.84s)

you know, he was sitting next to you.

[48:25] (2905.04s)

You find yourself physically present in

[48:26] (2906.64s)

one meeting while your mind is already

[48:28] (2908.16s)

racing against the next three.

[48:29] (2909.68s)

You know, it's a really big

[48:31] (2911.52s)

challenge. You know, I pride myself on

[48:34] (2914.32s)

being a good communicator and being

[48:36] (2916.00s)

present. And when there are 20

[48:38] (2918.88s)

things that are going on in the air or

[48:40] (2920.88s)

100 things that are going on, it's just

[48:43] (2923.20s)

really, really difficult to stay single

[48:45] (2925.60s)

threaded. Um, and what I ended up having

[48:49] (2929.44s)

to do is to sort of say, like, okay, I

[48:52] (2932.00s)

could do all of these things and they

[48:53] (2933.60s)

would be really impactful, but I just

[48:55] (2935.52s)

had to aggressively prioritize and say,

[48:57] (2937.84s)

you know, for the availability, I'm just

[48:59] (2939.76s)

looking at availability. There's all

[49:01] (2941.28s)

these other fires that are going on

[49:03] (2943.28s)

which is disappointing

[49:05] (2945.36s)

because there's so many things

[49:07] (2947.12s)

that you know you could be focusing on.

[49:09] (2949.52s)

It's super difficult. And so I

[49:11] (2951.92s)

you know I work with a lot of people to

[49:13] (2953.36s)

try to get them to the next level and

[49:14] (2954.64s)

they say Steve well I'm completely

[49:16] (2956.16s)

overwhelmed. There are like 20 things

[49:18] (2958.00s)

that are going on. Um and I tell them

[49:20] (2960.96s)

like, you think it gets easier when you

[49:23] (2963.84s)

get to a higher level? There's just going to

[49:25] (2965.36s)

be more and more things on your plate.

[49:27] (2967.04s)

Why wait until you burn out or you

[49:29] (2969.60s)

break? you can just start implementing

[49:31] (2971.20s)

these things now. So every high-level

[49:32] (2972.80s)

tech IC I know, and managers included,

[49:35] (2975.52s)

they have a wonderful system in order to

[49:38] (2978.80s)

like isolate signal and then cut out the

[49:41] (2981.04s)

noise and if you don't have that you

[49:43] (2983.44s)

literally won't survive. But at

[49:45] (2985.04s)

the principal level and above

[49:46] (2986.48s)

it's just amplified that much

[49:48] (2988.40s)

more. I'm getting the sense that a lot of

[49:50] (2990.88s)

the work you do as a principal

[49:52] (2992.48s)

engineer, I mean, there's huge

[49:54] (2994.32s)

amounts of software engineering and you

[49:55] (2995.84s)

need to be, uh, you know, just really

[49:58] (2998.00s)

good at building resilient systems,

[50:01] (3001.76s)

learning about new technologies you know

[50:03] (3003.60s)

for example today I'm assuming whoever

[50:05] (3005.36s)

is a principal engineer at Amazon, they're

[50:06] (3006.88s)

expected to just know everything about

[50:08] (3008.96s)

LLMs: trade-offs, characteristics, etc.,

[50:11] (3011.52s)

because they're anyway but you also need

[50:14] (3014.08s)

to just build the skills that

[50:16] (3016.40s)

managers have which is managing your

[50:18] (3018.64s)

time, uh, changing contexts, figuring out

[50:22] (3022.00s)

how to get that focus time like you know

[50:24] (3024.00s)

contrary to popular belief like managers

[50:26] (3026.16s)

actually need focus time. So like you

[50:27] (3027.76s)

know, I will also always try to carve

[50:29] (3029.68s)

out some time but you're now doing it

[50:32] (3032.32s)

while your title is not manager but

[50:34] (3034.00s)

actually it feels like you

[50:35] (3035.60s)

combine a lot of managerial

[50:37] (3037.60s)

responsibilities and a lot of you know

[50:39] (3039.12s)

like experienced engineer and boom you

[50:41] (3041.12s)

get the principal engineer role. Oh the

[50:42] (3042.72s)

only upside is like you don't need to do

[50:44] (3044.16s)

performance reviews for people.

[50:45] (3045.28s)

Congratulations you saved a little bit

[50:46] (3046.56s)

of that. Well, actually during

[50:48] (3048.88s)

performance review season, they pull the

[50:50] (3050.80s)

principal engineers in, because,

[50:52] (3052.88s)

you know, if you're stack

[50:54] (3054.72s)

ranking people, okay, cool. Well, we'll

[50:57] (3057.04s)

need to take a look at their performance

[50:58] (3058.56s)

check. So, I reported to a VP, you know,

[51:01] (3061.04s)

one of my peers was a director and he

[51:03] (3063.28s)

was basically like, "Hey, Steve, I would

[51:04] (3064.72s)

like you to show up to my performance

[51:06] (3066.24s)

review for my entire org of hundreds

[51:08] (3068.88s)

something people." And I'm like, "I

[51:10] (3070.40s)

can't do that for you and for everybody

[51:12] (3072.48s)

else." Okay. So now it would make

[51:14] (3074.64s)

sense why as a principal engineer your

[51:16] (3076.32s)

compensation package will be similar to

[51:18] (3078.24s)

like uh is it a senior engineering

[51:20] (3080.48s)

manager or something like that

[51:21] (3081.76s)

around that

[51:22] (3082.48s)

around that but basically like the job

[51:24] (3084.88s)

has a lot of overlaps. Okay, the

[51:27] (3087.68s)

benefit is you're not the one delivering

[51:29] (3089.44s)

the performance review to the direct report,

[51:31] (3091.68s)

but you're doing almost everything else

[51:33] (3093.36s)

or in terms of the effort I'm talking

[51:35] (3095.68s)

about.

[51:36] (3096.16s)

Yeah.

[51:36] (3096.80s)

Okay. So, having been a principal

[51:39] (3099.28s)

engineer for 4 years, what are the good

[51:41] (3101.52s)

things that you really really liked

[51:42] (3102.96s)

about Amazon, specifically Amazon's

[51:45] (3105.20s)

principal engineer role? And what are

[51:46] (3106.72s)

some of the, you know, not so good or it

[51:50] (3110.24s)

could have been better things?

[51:51] (3111.44s)

I mean, the great parts are you get

[51:54] (3114.72s)

visibility that you just couldn't

[51:56] (3116.48s)

possibly have at the team level. You

[51:58] (3118.64s)

know, within a large organization like

[52:00] (3120.56s)

Prime Video or wherever you're at, there

[52:03] (3123.12s)

are many thousands of people that are

[52:05] (3125.36s)

working within that organization doing

[52:07] (3127.68s)

so many things, right? And typically

[52:10] (3130.08s)

the performance of these people is

[52:11] (3131.52s)

really high. There's so many different

[52:13] (3133.20s)

directions that are going on. And so to

[52:15] (3135.60s)

survive, you kind of have to look inward

[52:17] (3137.44s)

and you say, "Okay, well, here's my

[52:18] (3138.96s)

service boundary. Here's all the

[52:20] (3140.24s)

software I own. I'm going to own

[52:22] (3142.16s)

everything within the sphere of

[52:23] (3143.44s)

ownership." because you've built this

[52:25] (3145.12s)

wall up, you tend not to be able to see

[52:27] (3147.68s)

like that broader picture.

[52:29] (3149.20s)

Yeah.

[52:29] (3149.52s)

And so, as a principal engineer, I think

[52:31] (3151.36s)

it's really awesome to be able to sort

[52:33] (3153.12s)

of, like, spelunk and be able to go to

[52:35] (3155.44s)

different teams and sort of see that

[52:37] (3157.52s)

broader picture. And I just

[52:40] (3160.24s)

don't see a way that you would be able

[52:41] (3161.60s)

to get that type of visibility,

[52:43] (3163.76s)

which is super interesting, um, at a lower

[52:46] (3166.16s)

level. Mhm.

[52:47] (3167.04s)

You know, I think the other thing is

[52:48] (3168.56s)

like, you know, whether it's

[52:50] (3170.32s)

warranted or not, you do get some amount

[52:51] (3171.92s)

of status when you go to a meeting,

[52:53] (3173.92s)

people just listen to you. They listen

[52:56] (3176.24s)

to your harebrained ideas and it's kind

[52:58] (3178.40s)

of nice because you don't necessarily

[52:59] (3179.84s)

have to like prove yourself over and

[53:02] (3182.08s)

over again, right?

[53:03] (3183.36s)

It's a bit less like professional like

[53:05] (3185.60s)

not fights, but just establishing that

[53:08] (3188.24s)

you know what you're talking about.

[53:09] (3189.76s)

Yeah. Yeah. Um, now the bad things are,

[53:13] (3193.60s)

you know, uh, there's a lot of folks

[53:16] (3196.00s)

that are really good in tech and being

[53:17] (3197.52s)

really effective as a principal

[53:18] (3198.72s)

engineer, but then they also, you know,

[53:21] (3201.28s)

myself included, they're like, "Okay,

[53:23] (3203.04s)

cool. Well, that sort of makes me an

[53:24] (3204.64s)

expert in pretty much everything." And

[53:26] (3206.96s)

so you would get these principal

[53:28] (3208.32s)

engineers together. We had a weekly

[53:29] (3209.60s)

meeting and and so it would be like okay

[53:32] (3212.16s)

if you wanted to talk about like

[53:33] (3213.68s)

establishing a constitution for a small

[53:35] (3215.76s)

island nation all of a sudden they would

[53:37] (3217.68s)

just be like, well, like, here are the main

[53:39] (3219.20s)

considerations. It's like, nobody has a

[53:41] (3221.12s)

background in government policy but all

[53:43] (3223.92s)

of a sudden like just because you're

[53:45] (3225.44s)

sort of trained to do so you start to

[53:47] (3227.44s)

like pitch in you're like well actually

[53:49] (3229.36s)

you know maybe we should have two

[53:50] (3230.48s)

branches of government or three branches

[53:51] (3231.92s)

of government and and it just sounds

[53:54] (3234.16s)

like we know what we're

[53:55] (3235.60s)

doing, but we don't. And so there's this

[53:58] (3238.88s)

trap, and again, I've fallen into it

[54:00] (3240.80s)

many times where you actually think

[54:02] (3242.24s)

you're an expert in one thing but you're

[54:05] (3245.60s)

actually not, right? And so, you know, take

[54:07] (3247.60s)

LLMs. There's a ton of folks that

[54:10] (3250.08s)

understand AI. I left before it was sort

[54:12] (3252.88s)

of like allowed to use internally but I

[54:15] (3255.20s)

think you can

[54:16] (3256.08s)

use it now um I'm not an expert in LLMs

[54:19] (3259.92s)

at all, but I do think that, um, the

[54:23] (3263.20s)

expectation would be that you understand

[54:26] (3266.24s)

you know how they work but then the

[54:28] (3268.08s)

expectation is also, like, hey, what should

[54:30] (3270.24s)

our policy be how should we be thinking

[54:32] (3272.16s)

about this stuff

[54:33] (3273.84s)

and I think that's fine for mature

[54:36] (3276.96s)

technologies potentially like you can

[54:38] (3278.64s)

ramp yourself up for it but as like that

[54:40] (3280.72s)

particular landscape is changing so

[54:42] (3282.56s)

quickly I think there's this sort of

[54:44] (3284.80s)

trap where you sort of speak as

[54:47] (3287.12s)

an authority even though you haven't had

[54:49] (3289.44s)

the requisite time to ramp up at

[54:51] (3291.20s)

something

[54:51] (3291.92s)

And you've been there for 17 years at

[54:54] (3294.00s)

Amazon what are your favorite parts of

[54:55] (3295.60s)

the culture? Like, you know, there's a

[54:57] (3297.44s)

lot of things that, uh, there are values

[55:00] (3300.32s)

that we all know, like frugality,

[55:02] (3302.80s)

customer obsession. What were the

[55:05] (3305.20s)

things that you found to be

[55:07] (3307.36s)

like the most interesting or the ones

[55:08] (3308.88s)

that had lasting impact and how did they

[55:10] (3310.72s)

change how did Amazon change over 17

[55:12] (3312.80s)

years? They must have changed.

[55:14] (3314.48s)

No, I think the things I missed the

[55:17] (3317.12s)

most, um, and the secret sauce, yeah, the

[55:19] (3319.84s)

the leadership principles are good but I

[55:21] (3321.76s)

think the actual secret sauce there is

[55:24] (3324.56s)

principled thinking Mhm.

[55:26] (3326.32s)

Right. Yeah. So, you know, there's, you

[55:29] (3329.12s)

know, uh, invent and simplify and bias

[55:31] (3331.20s)

for action and all of this stuff, but

[55:33] (3333.28s)

like ultimately the thing that is

[55:37] (3337.12s)

amazing about those leadership

[55:38] (3338.72s)

principles isn't the specific stances

[55:40] (3340.72s)

that they took. So, they decided that

[55:42] (3342.48s)

customer obsession is a big deal. They

[55:44] (3344.08s)

decided that bias for action is a big

[55:45] (3345.60s)

deal.

[55:46] (3346.32s)

All of these things. But really, if you

[55:48] (3348.32s)

looked at it at a meta level, you'd be

[55:50] (3350.08s)

like, "Oh, these guys have principles

[55:51] (3351.92s)

that they won't budge on." I sort of

[55:53] (3353.68s)

think about it in terms of math and

[55:55] (3355.60s)

axioms like you just take certain things

[55:58] (3358.16s)

to be true. You know, two lines that are

[56:01] (3361.36s)

parallel if you extend them out to

[56:02] (3362.96s)

infinity won't touch, they won't

[56:04] (3364.80s)

touch each other.

[56:05] (3365.84s)

Yeah. You assume that's true.

[56:07] (3367.28s)

Yeah. You don't prove

[56:08] (3368.80s)

that. It's an axiom and then based off

[56:10] (3370.72s)

of that you're able to build a system of

[56:13] (3373.12s)

mathematics, right? And so it's the same

[56:15] (3375.52s)

thing with the corporate leadership

[56:17] (3377.12s)

principles at Amazon. They basically

[56:19] (3379.52s)

said, "Okay, we are going to fix these

[56:22] (3382.16s)

things to be true." There are 16 or 12

[56:24] (3384.40s)

or I don't know, they just sort of built

[56:25] (3385.92s)

some

[56:26] (3386.64s)

and now they're 16

[56:28] (3388.48s)

and um but there are like four or five

[56:31] (3391.60s)

that are just really core to Amazon

[56:34] (3394.96s)

and we just fix those things to be true.

[56:37] (3397.12s)

Which ones were the ones that you

[56:38] (3398.64s)

felt were the most present?

[56:40] (3400.88s)

Customer obsession. We are absolutely

[56:43] (3403.28s)

customer obsessed. We'll just burn money

[56:45] (3405.36s)

to delight a customer. You

[56:47] (3407.44s)

can be in a meeting with a VP as an

[56:49] (3409.28s)

intern and you say, hey, that's a bad

[56:51] (3411.36s)

customer experience. It would be like a

[56:52] (3412.80s)

needle coming off a record. It would

[56:54] (3414.56s)

just be like, what are you talking

[56:55] (3415.92s)

about? Like, immediately, right? You know,

[56:57] (3417.76s)

bias for action. Uh so like just get

[57:00] (3420.32s)

some stuff done. Stop asking for

[57:01] (3421.84s)

permission. Just like go and do it,

[57:03] (3423.44s)

right? Ownership, it's just like, you own

[57:05] (3425.60s)

your software, you run the you know you

[57:07] (3427.76s)

do the operations, you know you own the

[57:10] (3430.24s)

bug count, all of this stuff, right? Um,

[57:12] (3432.72s)

so those are the ones that are like

[57:14] (3434.16s)

those are fixed and then you start

[57:16] (3436.32s)

layering things on top of it and I think

[57:18] (3438.48s)

it's really great and but you know you

[57:19] (3439.84s)

could you could take Amazon and you

[57:21] (3441.44s)

could have like the you know evil goatee

[57:23] (3443.52s)

version of Amazon which is just sort of

[57:25] (3445.12s)

the opposite of those things and that

[57:26] (3446.88s)

would still be a really valid and

[57:28] (3448.56s)

awesome company. So you could say okay

[57:30] (3450.72s)

well what's the opposite of customer

[57:32] (3452.08s)

obsession? It's not customer obsession

[57:34] (3454.24s)

or not not being customer obsessed.

[57:35] (3455.92s)

I I I think it's you know like being

[57:37] (3457.92s)

about your staff. Yeah, which is Google.

[57:42] (3462.24s)

It could be like, hey, we really care

[57:43] (3463.92s)

about our people above everything else.

[57:45] (3465.60s)

Or it could be, you know, um let's not

[57:48] (3468.08s)

mince around it. We care about topline

[57:49] (3469.84s)

or bottom line revenue. Yeah,

[57:51] (3471.28s)

that's totally valid, right? And then

[57:53] (3473.44s)

you could just fix that. You wouldn't

[57:54] (3474.88s)

you can't prove that, you know, being uh

[57:57] (3477.12s)

you know, staff focused is a bad thing.

[57:59] (3479.12s)

You just build that and then you know a

[58:01] (3481.12s)

certain set of of things will happen

[58:02] (3482.88s)

like great things are going to happen

[58:04] (3484.32s)

and then like not so great things are

[58:06] (3486.16s)

going to happen. Those not great things

[58:07] (3487.84s)

that happen, you can try to mitigate

[58:09] (3489.76s)

them, but you can't fix them because you

[58:12] (3492.00s)

have started with this principled

[58:13] (3493.60s)

approach to everything.

[58:14] (3494.72s)

Yeah. Yeah. It it it all goes like every

[58:17] (3497.36s)

everything has.

[58:18] (3498.24s)

Yeah.

[58:18] (3498.48s)

I I see what you mean, but I I think

[58:20] (3500.08s)

what you're saying is like it it might

[58:22] (3502.08s)

be less about what the specific

[58:24] (3504.16s)

principles are. I mean, Amazon has

[58:25] (3505.60s)

theirs and we know about them, but it's

[58:27] (3507.12s)

just sticking to them and not keeping

[58:28] (3508.72s)

wiggling cuz because if you keep

[58:30] (3510.08s)

wiggling, it's like what what's the

[58:31] (3511.84s)

point, right? then then you're going to

[58:32] (3512.96s)

have a really look at a mediocre not

[58:35] (3515.52s)

truly not standout company whatever you

[58:37] (3517.84s)

do

[58:38] (3518.08s)

what does it actually mean to be

[58:39] (3519.52s)

principled and to not bend when it could

[58:42] (3522.08s)

be really easy to do so so that's a

[58:44] (3524.00s)

that's an amazing secret sauce of

[58:45] (3525.68s)

Amazon's people look at the leadership

[58:47] (3527.04s)

principles. I'm like, no, it's principled

[58:48] (3528.48s)

thinking another thing

[58:49] (3529.84s)

a lot of this honestly from what I

[58:51] (3531.52s)

understand talking to you earlier and

[58:53] (3533.12s)

some other people a lot of it probably

[58:54] (3534.32s)

comes from Jeff Bezos being from the top

[58:56] (3536.88s)

down being very principled and not not

[58:58] (3538.88s)

giving not saying we we will do whatever

[59:02] (3542.32s)

it takes. Sounds like it was customer

[59:04] (3544.56s)

obsession initially and then some other

[59:06] (3546.00s)

things.

[59:06] (3546.48s)

Yeah. Yeah. Absolutely. And he's he was

[59:08] (3548.56s)

he was an absolute genius uh when it it

[59:10] (3550.80s)

came through. So I'm a I'm a you know

[59:12] (3552.16s)

I'm a Jeff Bezos fanboy. Um for sure

[59:15] (3555.12s)

like it it just it just worked. Um

[59:17] (3557.84s)

another thing that uh uh that's Amazon

[59:20] (3560.88s)

secret sauce is just the writing

[59:22] (3562.16s)

culture. And so you know I spent on the

[59:25] (3565.52s)

order of like 1 to four hours every day

[59:27] (3567.68s)

reading while I was a principal

[59:29] (3569.20s)

engineer. And the it was we had a

[59:32] (3572.08s)

standard format. It was a it was a

[59:33] (3573.76s)

six-page memo. And you know uh that

[59:36] (3576.80s)

would be our business strategy. That

[59:38] (3578.64s)

would be uh a system design. That would

[59:41] (3581.12s)

be you know uh what we called the PR

[59:44] (3584.08s)

FAQ. So a press release and frequently

[59:45] (3585.92s)

asked questions for like a new line of

[59:48] (3588.00s)

business or a new initiative.

[59:49] (3589.76s)

And everybody was sort of constrained to

[59:52] (3592.00s)

the six-page format.

[59:53] (3593.84s)

And everybody just produces documents in

[59:56] (3596.24s)

that format for whatever they need to

[59:57] (3597.84s)

do. And so when I would try to get up to

[60:00] (3600.64s)

speed on a particular thing, I would

[60:02] (3602.24s)

just be like, "Give me your six-pagers.

[60:04] (3604.32s)

Give me all your documents." And I just

[60:05] (3605.92s)

got really really good at just reading

[60:08] (3608.56s)

these documents to get up to speed,

[60:10] (3610.96s)

which was a self-fulfilling and virtuous

[60:13] (3613.28s)

cycle, which is just like, "Okay, well

[60:14] (3614.96s)

now I need to express myself." And so I

[60:17] (3617.36s)

will write a six-pager, and that will

[60:19] (3619.04s)

set the context for whatever we're

[60:20] (3620.64s)

working on. we'd go to a meeting, you

[60:22] (3622.72s)

would read the six-pager and it was just

[60:24] (3624.96s)

super great to to just actually just

[60:28] (3628.32s)

have people do study hall at the

[60:29] (3629.92s)

beginning part of a meeting where you

[60:31] (3631.76s)

just everybody just gets fast forwarded

[60:33] (3633.84s)

and then you have a really great

[60:35] (3635.20s)

discussion at the end.

[60:36] (3636.56s)

That is what an amazing culture that I

[60:39] (3639.44s)

think that almost every other company

[60:41] (3641.84s)

should replicate if they could. But I

[60:44] (3644.80s)

think the the difficulty would be like

[60:46] (3646.80s)

you actually have to be disciplined and

[60:48] (3648.64s)

actually

[60:49] (3649.84s)

have a reading culture. Yeah. In

[60:50] (3650.96s)

principle, then have a reading culture

[60:52] (3652.56s)

and then actually value writing.

[60:55] (3655.52s)

Yeah. I almost wonder if unless it comes

[60:57] (3657.60s)

from the top, some of these things might

[60:58] (3658.96s)

just be really really hard to do.

[61:00] (3660.56s)

Yeah.

[61:01] (3661.20s)

One thing that I figured is we're in

[61:04] (3664.56s)

your studio right now and you have a lot

[61:06] (3666.48s)

of these blocks and I asked you what

[61:08] (3668.08s)

they are. Are they for promotions or

[61:10] (3670.00s)

projects or whatever? They're for

[61:11] (3671.92s)

patents.

[61:12] (3672.64s)

Yeah.

[61:13] (3673.28s)

Uh and this is for patent number

[61:17] (3677.04s)

10,824,964.

[61:18] (3678.96s)

Can you tell me about why you have

[61:22] (3682.08s)

these, how they come about? Yeah. What

[61:24] (3684.32s)

you needed to do for them?

[61:25] (3685.52s)

So the the highest order bit is like you

[61:28] (3688.40s)

know um for better or for worse there

[61:30] (3690.08s)

are software patents that exist. Um

[61:32] (3692.56s)

Amazon they'll say that basically the

[61:35] (3695.12s)

reason they have them is defensively

[61:37] (3697.28s)

because you know other people will

[61:38] (3698.96s)

assert that hey you're in violation of

[61:41] (3701.20s)

our patents or our IP.

[61:43] (3703.12s)

Um and then you know we'll use them

[61:45] (3705.04s)

reactively. Okay fine but you know

[61:47] (3707.12s)

you're also in violation of these other

[61:48] (3708.88s)

things. Yeah.

[61:49] (3709.84s)

Um, and so, you know, there's a there is

[61:52] (3712.16s)

a culture of of trying to make sure

[61:54] (3714.00s)

that, you know, we protect ourselves in

[61:55] (3715.68s)

that way. But, you know, there's the

[61:57] (3717.04s)

other part of software patents, which is

[61:58] (3718.48s)

basically like, hey, can you really

[61:59] (3719.84s)

patent like math or whatever? Um, and so

[62:02] (3722.80s)

what I learned over time is that, you

[62:04] (3724.56s)

know, I'm just a really bad IP lawyer,

[62:06] (3726.64s)

even though, you know, as a principal

[62:08] (3728.40s)

engineer, I might cosplay as somebody

[62:10] (3730.08s)

that really understands software

[62:11] (3731.52s)

patents, right? um at the end of the day

[62:14] (3734.40s)

um you know what we would do is we would

[62:16] (3736.16s)

take our important six-pagers and we

[62:17] (3737.84s)

would hand them over to the legal team

[62:19] (3739.76s)

and then they would just be like oh this

[62:21] (3741.36s)

stuff is really interesting like let's

[62:23] (3743.04s)

explore that and so it it turned into

[62:25] (3745.52s)

this awesome thing where like we just

[62:27] (3747.20s)

had ready inputs to go into like the you

[62:30] (3750.72s)

know into that particular system

[62:32] (3752.16s)

a writing culture turns out has a bunch

[62:34] (3754.00s)

of benefits

[62:35] (3755.36s)

exactly and and I think that the there's

[62:38] (3758.16s)

this sort of like it's the concept is

[62:40] (3760.08s)

called like the curse of knowledge which

[62:41] (3761.44s)

is essentially Like if you understand

[62:43] (3763.60s)

something, you discount how long like

[62:46] (3766.32s)

how easy that concept is.

[62:48] (3768.24s)

Y

[62:48] (3768.80s)

and so it's just like you don't get it,

[62:50] (3770.64s)

you don't get it, you don't get it, and

[62:52] (3772.00s)

then you get it and then you're like,

[62:53] (3773.04s)

"Oh, that's trivial, right?" Even

[62:54] (3774.96s)

though, you know, there could have been,

[62:56] (3776.56s)

you know, it could actually be novel or

[62:57] (3777.92s)

it could actually be interesting. And so

[62:59] (3779.92s)

what ends up happening is that you would

[63:01] (3781.44s)

just throw these documents over to the

[63:03] (3783.20s)

lawyers and then they would basically be

[63:05] (3785.36s)

like, "Oh, this stuff is great." and you

[63:07] (3787.52s)

would just be like, well, that's just

[63:08] (3788.96s)

that's just regular software development

[63:10] (3790.40s)

or that's just the context and domain

[63:11] (3791.92s)

that we were living in. You know, it

[63:13] (3793.36s)

turns out that there's some some

[63:14] (3794.56s)

interesting stuff. This particular

[63:16] (3796.08s)

patent I'm I'm I'm proud of. So, there's

[63:18] (3798.48s)

a uh a system design interview question

[63:20] (3800.88s)

that seems to be popular right now, um

[63:22] (3802.88s)

which is like design Ticketmaster,

[63:25] (3805.44s)

right? And so I worked on Amazon Tickets

[63:27] (3807.44s)

and you know, we ended up shuttering

[63:29] (3809.04s)

that business, but you know, we ended up

[63:30] (3810.96s)

building like one of the world's fastest

[63:32] (3812.72s)

like ticket selling systems like in the

[63:35] (3815.20s)

world, right? we could do many many

[63:37] (3817.12s)

orders per second. So the use case is

[63:39] (3819.52s)

basically at t0 that's you know for a

[63:41] (3821.92s)

really big ticket on sale like that's

[63:43] (3823.44s)

when the maximum amount of demand and

[63:45] (3825.52s)

requests are coming in um and you want

[63:48] (3828.00s)

to sell out all of your ticket supply as

[63:51] (3831.20s)

quickly as possible. The problem is I

[63:54] (3834.96s)

think uh one where you have seated

[63:57] (3837.68s)

concerts.

[63:58] (3838.48s)

Mhm. And so when you purchase a a

[64:01] (3841.84s)

ticket, you know, most of the time with

[64:03] (3843.60s)

the system design stuff, it'll be like

[64:05] (3845.20s)

general admission or it won't be a high

[64:07] (3847.12s)

ticket on, you know, like one with a

[64:09] (3849.36s)

bunch of demand. You have to find

[64:10] (3850.88s)

contiguous seats.

[64:12] (3852.48s)

Yeah. So the really next to each other.

[64:15] (3855.04s)

Yes. Exactly. And so, you know, it's uh

[64:19] (3859.12s)

it's actually really hard. Like suppose

[64:21] (3861.28s)

it was a SQL database as your backing

[64:23] (3863.20s)

store. like how do you come up with a

[64:24] (3864.88s)

SQL query that's just like hey give me

[64:27] (3867.20s)

the best four tickets you know within

[64:30] (3870.24s)

this particular price range that are

[64:32] (3872.08s)

seated next to each other.

[64:33] (3873.68s)

Yeah. Now now you're thinking so this is

[64:35] (3875.76s)

a real real world thing where you need

[64:37] (3877.44s)

to you want to be as efficient as

[64:38] (3878.88s)

possible in terms of resource usage may

[64:41] (3881.60s)

not be maybe you want to minimize your

[64:42] (3882.88s)

CPU or memory depending on on what you

[64:44] (3884.80s)

have I assume and you need to do as

[64:46] (3886.80s)

quick as rapidly as possible to give

[64:49] (3889.52s)

this to people. Okay. Okay. So, so now

[64:52] (3892.00s)

we're talking about a problem that is

[64:53] (3893.60s)

seems like pretty novel in some ways,

[64:56] (3896.48s)

right?

[64:56] (3896.96s)

Yeah. And so, you know, I was I I did

[64:58] (3898.88s)

this patent with a senior principal. I

[65:00] (3900.80s)

was a senior engineer at the time, but

[65:02] (3902.56s)

the the idea is like, you know, what is

[65:05] (3905.92s)

the theoretical

[65:07] (3907.84s)

maximum speed by which we could, you

[65:10] (3910.56s)

know, show this inventory to people.

[65:12] (3912.88s)

And it turns out that, you know, even if

[65:15] (3915.84s)

you have a high ticket on sale, you only

[65:17] (3917.44s)

have like thousands of tickets at the

[65:19] (3919.36s)

end of the day. So instead of making a

[65:21] (3921.44s)

request to like a backend that would

[65:24] (3924.08s)

conduct some sort of search across the

[65:25] (3925.76s)

space,

[65:27] (3927.12s)

what if you actually inverted it and

[65:29] (3929.44s)

then you basically had each of the

[65:31] (3931.92s)

individual hosts have like some view on

[65:34] (3934.96s)

the entire arena or venue that was there

[65:38] (3938.32s)

and you loaded up all of that

[65:40] (3940.72s)

availability and inventory into like L2

[65:43] (3943.92s)

cache on a CPU.

[65:44] (3944.96s)

Yeah.

[65:45] (3945.36s)

Because it's actually not that many. So

[65:46] (3946.72s)

if you had this compact

[65:47] (3947.60s)

rep was pretty big.

[65:50] (3950.08s)

Then what you can do is you can you can

[65:52] (3952.24s)

do bit manipulation to like really

[65:54] (3954.56s)

really quickly get contiguous seats that

[65:57] (3957.20s)

are there.

[65:58] (3958.32s)

And then what you do is you can like

[66:00] (3960.32s)

send in that particular request and try

[66:02] (3962.96s)

to like reserve those particular seats.

[66:04] (3964.80s)

Yeah. Now now there's a locking problem

[66:06] (3966.80s)

which is much more tractable than like

[66:09] (3969.84s)

hey there's uh you know two million

[66:12] (3972.80s)

people that have just hit your on

[66:15] (3975.68s)

each of them. I'm launching a search for

[66:17] (3977.12s)

each of them.

[66:18] (3978.00s)

Yes. So the the inversion of that

[66:20] (3980.00s)

ordering process by which you like

[66:22] (3982.00s)

actually send out the inventory to the

[66:24] (3984.00s)

individual nodes and then like load it

[66:26] (3986.64s)

up into CPU cache and then just do bit

[66:28] (3988.72s)

manipulation

[66:30] (3990.24s)

um and then try to lock that resource

[66:32] (3992.32s)

from the individual nodes. That was that

[66:34] (3994.56s)

was the basis of this particular patent.
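
The idea described above (keep a compact bitmap of seat availability on each host, find adjacent seats with bit manipulation, then try to reserve them) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented design: the names `find_contiguous` and `SeatStore` are hypothetical, each row is assumed to fit in one integer with bit i set when seat i is free, and the in-memory `SeatStore` stands in for whatever authoritative system actually performs the lock.

```python
from typing import Dict, Optional


def find_contiguous(free_mask: int, row_width: int, k: int) -> Optional[int]:
    """Return the lowest seat index that starts a run of k free seats, or None.

    Bit i of free_mask is 1 when seat i of the row is free (an assumed encoding).
    """
    if k <= 0 or k > row_width:
        return None
    # Drop any bits that lie beyond the end of the row.
    row = free_mask & ((1 << row_width) - 1)
    # AND the row with itself shifted right by 1..k-1. Bit i survives
    # only if seats i, i+1, ..., i+k-1 are all free.
    runs = row
    for shift in range(1, k):
        runs &= row >> shift
    if runs == 0:
        return None
    # Isolate the lowest set bit and convert it to an index.
    return (runs & -runs).bit_length() - 1


class SeatStore:
    """Hypothetical stand-in for the authoritative inventory.

    A real system would make reserve() an atomic conditional update;
    this single-threaded version only shows the check-then-claim logic.
    """

    def __init__(self, rows: Dict[str, int]) -> None:
        self.rows = dict(rows)

    def reserve(self, row: str, start: int, k: int) -> bool:
        want = ((1 << k) - 1) << start
        # Fail if any requested seat was taken since the local view was loaded.
        if (self.rows.get(row, 0) & want) != want:
            return False
        self.rows[row] &= ~want
        return True
```

A host would call `find_contiguous` against its local copy of the row, then send only the chosen seats to `reserve`. If the reservation fails because another buyer got there first, the host refreshes its view and tries again, so the expensive search never runs on the shared backend.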

[66:36] (3996.48s)

Awesome. That's clever. And like that

[66:38] (3998.96s)

sounds like some you know people are

[66:40] (4000.48s)

always asking like oh you know on my job

[66:43] (4003.04s)

I don't use the algorithm stuff or or

[66:45] (4005.36s)

any of the formal methods. Sounds like

[66:47] (4007.36s)

there are some uses of it especially

[66:49] (4009.20s)

when you're trying to figure out what is

[66:50] (4010.56s)

it like when you just taking away from

[66:52] (4012.56s)

the patent like just having a problem

[66:54] (4014.88s)

like like this and saying like what is

[66:56] (4016.48s)

the theoretical limit that we can do

[66:58] (4018.48s)

what is the fastest possible like to

[67:00] (4020.48s)

answer that you probably want to have

[67:02] (4022.24s)

access to these tools like you know like

[67:04] (4024.32s)

so it's it's not always the time and

[67:06] (4026.08s)

effort to yeah actually get into these

[67:08] (4028.08s)

things and um so what are you up to now

[67:11] (4031.04s)

that you've you've left Amazon a year

[67:13] (4033.84s)

ago after like 17 18 very long years,

[67:17] (4037.20s)

you know, I'm just, you know, I'm I'm

[67:18] (4038.56s)

just making content. I'm just sort of

[67:19] (4039.92s)

living the dream there, you know, making

[67:21] (4041.60s)

YouTube videos, uh, started up a

[67:23] (4043.52s)

newsletter. Um, I have a Discord

[67:26] (4046.24s)

community and yeah, just

[67:28] (4048.24s)

Yeah. And we're going to link all all of

[67:29] (4049.60s)

those below. I actually like got to

[67:32] (4052.00s)

first know you before we started

[67:33] (4053.60s)

talking. This was like probably a few

[67:35] (4055.20s)

years ago from your YouTube videos,

[67:36] (4056.88s)

which are, you know, you know, like you

[67:38] (4058.88s)

you shared a lot about like Amazon

[67:40] (4060.72s)

things, software engineering things, and

[67:42] (4062.56s)

just like your general thinking, but

[67:44] (4064.08s)

yeah, your newsletter is a new one. So, I'm

[67:45] (4065.92s)

I'm we'll we'll link it in the show

[67:47] (4067.52s)

notes below. It's it's always a good way

[67:49] (4069.04s)

to keep in touch and also, you know,

[67:50] (4070.40s)

like on your YouTube channel.

[67:51] (4071.76s)

Awesome.

[67:52] (4072.56s)

So, as closing, I have some some rapid

[67:54] (4074.56s)

questions.

[67:55] (4075.12s)

Okay.

[67:55] (4075.52s)

So, I'll I'll just ask and you just

[67:56] (4076.88s)

shoot what comes to mind. What is career

[67:59] (4079.28s)

advice that greatly helped you in your

[68:01] (4081.60s)

path?

[68:02] (4082.16s)

Yeah. I mean, this is I you know, I talk

[68:04] (4084.40s)

a lot about this. It's kind of like, oh,

[68:06] (4086.32s)

what's what's your favorite food or your

[68:08] (4088.08s)

favorite movie? It's just like there's

[68:09] (4089.60s)

so much there and it's hard to pick one.

[68:11] (4091.52s)

What I would say is instead of saying

[68:13] (4093.60s)

like, hey, what's the technology that I

[68:15] (4095.52s)

should learn that's really going to, you

[68:18] (4098.00s)

know, u make my career uh, you know,

[68:20] (4100.56s)

solid, instead sort of flip it around

[68:23] (4103.04s)

and say like, how can I quickly learn

[68:25] (4105.52s)

skills?

[68:26] (4106.72s)

Mhm.

[68:27] (4107.28s)

That makes you that makes you sort of

[68:29] (4109.36s)

like recession proof, right? That that

[68:31] (4111.52s)

sort of makes you valuable. It's

[68:33] (4113.12s)

essentially meta-learning. It's like how

[68:34] (4114.64s)

can I learn something faster and faster?

[68:37] (4117.12s)

If if that's your focus, then you'll

[68:39] (4119.52s)

always be you you'll never have a

[68:41] (4121.36s)

problem finding a job and you'll never

[68:43] (4123.68s)

have a problem progressing in your

[68:45] (4125.84s)

career. Now some of the skills may be

[68:48] (4128.08s)

difficult to find resources on online

[68:50] (4130.56s)

but you know I think if you just sort of

[68:52] (4132.56s)

think about like what's a valuable skill

[68:54] (4134.48s)

that if I knew right now would you know

[68:57] (4137.76s)

make my you know job search easier or

[69:00] (4140.00s)

would like make me you know perform

[69:02] (4142.32s)

better on the job and then just sort of

[69:04] (4144.80s)

thinking about acquiring that skill as

[69:06] (4146.56s)

quickly as possible

[69:07] (4147.68s)

and do it now like don't wait.

[69:09] (4149.28s)

Yeah. Well people tend to postpone

[69:11] (4151.04s)

themselves. They'll be like, "Oh, well,

[69:12] (4152.48s)

I'll start when you know everything is l

[69:15] (4155.52s)

lined up." But like to begin, you just

[69:18] (4158.00s)

need to begin. Like when you start

[69:19] (4159.68s)

something that only then will you know

[69:21] (4161.36s)

what you need to do instead of saying

[69:23] (4163.36s)

like, "Oh, I need to get everything that

[69:25] (4165.68s)

I need to do first before I start."

[69:27] (4167.60s)

You've used a lot of programming

[69:28] (4168.88s)

languages. Which one's your favorite and

[69:31] (4171.28s)

why? And and which one do you dislike

[69:33] (4173.28s)

most?

[69:33] (4173.92s)

Yeah. You know, I I you know, I I have

[69:36] (4176.64s)

like a you know, obviously there's no

[69:38] (4178.24s)

perfect programming language. Um, what I

[69:40] (4180.56s)

would say is like I really enjoyed Perl

[69:45] (4185.44s)

and nobody would ever give that answer,

[69:47] (4187.76s)

but I just like this concept of like

[69:49] (4189.52s)

there's just so many different ways to

[69:51] (4191.12s)

do it. It's a it's a write only

[69:52] (4192.56s)

language. Like you can't read anybody

[69:54] (4194.16s)

else's Perl and I it's it's actually

[69:56] (4196.48s)

one of the languages that like uses up

[69:58] (4198.16s)

the most power. It's like the least

[69:59] (4199.68s)

efficient. It's interpreted. It's

[70:01] (4201.92s)

it's just like terrible.

[70:03] (4203.92s)

So most of Booking.com still runs on it, or

[70:06] (4206.16s)

some of it.

[70:06] (4206.64s)

Yeah. Amazon's back end was, you know,

[70:08] (4208.64s)

for a long time and still might be um,

[70:10] (4210.72s)

you know, sort of like Perl Mason and

[70:12] (4212.40s)

sort of like, uh, web technology bolted

[70:14] (4214.48s)

onto Perl. But I just kind of like it.

[70:16] (4216.24s)

I just feel like I can express myself

[70:17] (4217.84s)

and there's just like there's just what,

[70:19] (4219.84s)

however you'd like to express yourself,

[70:21] (4221.60s)

you can.

[70:22] (4222.64s)

Um, it also looked like an ASCII factory

[70:24] (4224.72s)

blew up sometimes. And so it's just like

[70:26] (4226.64s)

it's it's, you know, now that it's on a

[70:28] (4228.96s)

podcast, you know, I wouldn't really,

[70:30] (4230.48s)

you know, advertise that fact. The best

[70:32] (4232.32s)

programming languages right now, I think

[70:34] (4234.00s)

Rust is pretty interesting. So I might,

[70:36] (4236.00s)

you know, pick that up. Um, at the end

[70:38] (4238.40s)

of the day, like I really love the

[70:41] (4241.44s)

boring languages. Yeah.

[70:43] (4243.04s)

Um, so you know, Java with, you know,

[70:46] (4246.00s)

for all of its stuff, like its

[70:47] (4247.92s)

verbosity and I think it's just a great

[70:50] (4250.40s)

langu like a JVM based language,

[70:53] (4253.84s)

um, that has essentially like great like

[70:57] (4257.20s)

library support and a bunch of stuff

[70:58] (4258.96s)

written for it, but it's just like super

[71:00] (4260.96s)

boring. Maybe it's just because I'm from

[71:02] (4262.40s)

Amazon and we do this like enterprise

[71:04] (4264.08s)

stuff like

[71:05] (4265.52s)

it's a fine language.

[71:07] (4267.12s)

And then I see you you have a large

[71:09] (4269.04s)

bookshelf here. You also read a lot

[71:11] (4271.28s)

especially at Amazon although most

[71:12] (4272.56s)

internal documents. What is a book that

[71:14] (4274.24s)

you would recommend something around

[71:16] (4276.16s)

software engineering that that you

[71:17] (4277.60s)

enjoyed and it cannot be that book.

[71:19] (4279.68s)

It can't be your book. Um what I would

[71:22] (4282.40s)

say is you know you know I just given

[71:24] (4284.00s)

the advice about um you know meta-learning

[71:27] (4287.12s)

and and career growth. I I think that

[71:29] (4289.76s)

most software developers should read a

[71:31] (4291.76s)

book by Cal Newport. It's called So

[71:34] (4294.16s)

Good They Can't Ignore You. And so the

[71:35] (4295.84s)

concept there is around career capital.

[71:38] (4298.16s)

So like what are the skills that are in

[71:39] (4299.60s)

the most demand? And if you can just

[71:41] (4301.92s)

like learn those skills then you become

[71:44] (4304.24s)

in demand. And then you know from there

[71:46] (4306.00s)

you can choose what type of lifestyle

[71:47] (4307.92s)

that you'd like. You know you can also

[71:49] (4309.68s)

like sort of lean into you know some of

[71:52] (4312.16s)

the science of meta-learning. So

[71:53] (4313.44s)

deliberate practice, spaced repetition,

[71:55] (4315.20s)

that sort of thing. Um, in terms of like

[71:57] (4317.84s)

tech books, I think the new uh AI

[72:00] (4320.16s)

engineering book uh by Chip Huyen is is

[72:02] (4322.88s)

amazing.

[72:03] (4323.76s)

It's Yeah.

[72:04] (4324.24s)

Um, I think uh DDIA, so the the the

[72:09] (4329.12s)

design of data intensive

[72:10] (4330.16s)

so good. A new new version is coming the

[72:11] (4331.92s)

end of a year actually.

[72:13] (4333.12s)

I'm excited about that. I think that'll

[72:14] (4334.64s)

be pretty good. Um, but you know, at the

[72:16] (4336.56s)

end of the day, like you don't want one

[72:18] (4338.48s)

book on your bookshelf, you want 50

[72:20] (4340.40s)

books on your bookshelf. Um, and so, you

[72:23] (4343.44s)

know, I think within a particular

[72:25] (4345.60s)

subgenre of tech books, you know, I'd

[72:28] (4348.24s)

have recommendations there. But,

[72:29] (4349.60s)

yeah, Steve, this was great.

[72:31] (4351.20s)

Awesome.

[72:31] (4351.84s)

Really enjoyed it.

[72:32] (4352.96s)

Yeah, great. Thanks so much for having

[72:34] (4354.40s)

me. Thanks a lot to Steve for sharing

[72:36] (4356.24s)

all these details. Although Amazon's

[72:38] (4358.08s)

principal engineering level feels

[72:39] (4359.36s)

surprisingly difficult to get promoted

[72:41] (4361.04s)

to, I have yet to hear of such a strong

[72:43] (4363.20s)

principal engineering community than

[72:44] (4364.64s)

what Amazon builds and keeps investing

[72:46] (4366.48s)

in. This community itself could be a

[72:48] (4368.80s)

reason enough to consider the company

[72:50] (4370.32s)

after the principal plus level should

[72:52] (4372.16s)

you have the opportunity to do so. For a

[72:54] (4374.24s)

deep dive into Amazon's engineering

[72:55] (4375.76s)

culture, including the details on

[72:57] (4377.44s)

compensation, career ladders,

[72:59] (4379.36s)

performance reviews, and engineering

[73:00] (4380.64s)

processes, check out the Pragmatic

[73:02] (4382.48s)

Engineer deep dive linked in the show

[73:04] (4384.16s)

notes below. If you enjoyed this

[73:06] (4386.08s)

podcast, please do subscribe on your

[73:07] (4387.68s)

favorite podcast platform and on

[73:09] (4389.44s)

YouTube. This helps more people discover

[73:11] (4391.60s)

the podcast and a special thank you if

[73:13] (4393.44s)

you leave a rating. Thanks and see you

[73:15] (4395.44s)

in the next
