[00:00] (0.08s)
If you're going to optimize for
[00:01] (1.44s)
performance, saying why can't we be at 1
[00:03] (3.52s)
millisecond or why can't we be at 10
[00:05] (5.52s)
milliseconds and start from there
[00:07] (7.04s)
instead of sort of saying hey let's try
[00:08] (8.72s)
to decrease latencies by 50% or 25%.
[00:11] (11.84s)
Let's just start from what is the
[00:13] (13.44s)
conceptually fastest thing that we could
[00:15] (15.28s)
do and that's actually how Amazon was
[00:17] (17.04s)
created. Amazon's principal engineering
[00:18] (18.64s)
level is unique in many ways across big
[00:20] (20.32s)
tech. Steve Huynh was a software engineer
[00:22] (22.56s)
at Amazon for 17 years and spent the
[00:25] (25.04s)
last four years as a principal engineer.
[00:27] (27.60s)
Today we talk about the ins and outs of
[00:29] (29.20s)
this role, including why being promoted
[00:31] (31.44s)
from senior to principal is so hard,
[00:33] (33.52s)
even though Amazon usually has hundreds
[00:35] (35.28s)
of principal engineering openings and
[00:37] (37.20s)
thousands of seniors trying to get into
[00:38] (38.88s)
these positions, the Amazon principal
[00:40] (40.96s)
engineering community, the in-person
[00:42] (42.72s)
events, the Slack group, and the
[00:44] (44.16s)
Principals of Amazon internal
[00:45] (45.60s)
presentation series. Engineering
[00:47] (47.52s)
concepts at Amazon around reliability
[00:49] (49.68s)
such as brownouts and COE, correction of
[00:52] (52.16s)
errors, and many more topics. If you're
[00:54] (54.48s)
interested in understanding one of the
[00:55] (55.84s)
hardest engineering levels to get into
[00:57] (57.44s)
across big tech together with stories of
[00:59] (59.76s)
how Steve thrived in this position, this
[01:01] (61.92s)
episode is for you. Subscribing on
[01:03] (63.76s)
YouTube and on your favorite podcast
[01:05] (65.12s)
player greatly helps more people
[01:06] (66.56s)
discover this show. If you enjoy it,
[01:08] (68.72s)
thanks for doing so. So, Steve, welcome
[01:11] (71.36s)
to the podcast. Uh, thanks for having
[01:13] (73.44s)
me. How long were you at Amazon? 17
[01:16] (76.96s)
Yeah, I was there for 17 and 1/2 years.
[01:19] (79.92s)
And yeah, I just quit last year. So,
[01:22] (82.48s)
I've been basically a year doing uh
[01:25] (85.20s)
other things now.
[01:26] (86.40s)
And what were the things that you worked
[01:28] (88.00s)
on while you were there?
[01:29] (89.12s)
You know, people always talk about my
[01:31] (91.12s)
long tenure there, but uh you know, I
[01:33] (93.36s)
feel like I've had like five or six jobs
[01:36] (96.32s)
uh over that time period. Um I started
[01:39] (99.28s)
off on you know, a project called Search
[01:41] (101.84s)
Inside the Book. I worked on the first
[01:43] (103.92s)
Kindle launch. Wow.
[01:46] (106.16s)
I worked on the uh precursor to Prime
[01:49] (109.12s)
Video. I sort of like worked there at
[01:50] (110.72s)
the beginning part of my career and then
[01:51] (111.92s)
I sort of ended my career there uh for
[01:54] (114.48s)
the last five years of my time there. I
[01:56] (116.96s)
worked in payments. I worked in uh
[02:00] (120.40s)
Amazon Local, which was sort of our Groupon
[02:02] (122.64s)
project when that type of business
[02:04] (124.88s)
was looking like it was going to take
[02:07] (127.12s)
over. Um, I worked on Amazon Restaurants.
[02:09] (129.92s)
I worked on Amazon Tickets, which was a
[02:11] (131.68s)
Ticketmaster clone,
[02:13] (133.52s)
and then um my last 5 years was working
[02:15] (135.92s)
on live sports streaming uh on Prime
[02:18] (138.24s)
Video. If you want to build a great
[02:20] (140.00s)
product, you have to ship quickly. But
[02:22] (142.32s)
how do you know what works? More
[02:24] (144.32s)
importantly, how do you avoid shipping
[02:26] (146.32s)
things that don't work? The answer,
[02:29] (149.20s)
Statsig. Statsig is a unified platform for
[02:32] (152.24s)
flags, analytics, experiments, and more.
[02:34] (154.96s)
Combining five plus products into a
[02:36] (156.64s)
single platform with a unified set of
[02:38] (158.64s)
data. Here's how it works. First,
[02:41] (161.68s)
Statsig helps you ship a feature with a
[02:43] (163.60s)
feature flag or config. Then it measures
[02:46] (166.72s)
how it's working from alerts and errors
[02:48] (168.96s)
to replays of people using that feature
[02:51] (171.12s)
to measurement of topline impact. Then
[02:53] (173.68s)
you get your analytics, user account
[02:55] (175.44s)
metrics, and dashboards to track your
[02:57] (177.04s)
progress over time, all linked to the
[02:58] (178.96s)
stuff you ship. Even better, Statsig is
[03:01] (181.44s)
incredibly affordable with a super
[03:03] (183.28s)
generous free tier, a starter program
[03:05] (185.20s)
with $50,000 of free credits, and custom
[03:07] (187.52s)
plans to help you consolidate your
[03:08] (188.96s)
existing spend on flags, analytics, or
[03:10] (190.96s)
AB testing tools. To get started, go to
[03:13] (193.84s)
statsig.com/pragmatic.
[03:16] (196.24s)
That is statsig.com/pragmatic.
[03:19] (199.92s)
Happy building. This episode was brought
[03:21] (201.92s)
to you by Graphite, the developer
[03:23] (203.84s)
productivity platform that helps
[03:25] (205.28s)
developers create, review, and merge
[03:27] (207.04s)
smaller code changes, stay unblocked,
[03:29] (209.28s)
and ship faster.
[03:31] (211.36s)
Code review is a huge time sink for
[03:32] (212.96s)
engineering teams. Most developers spend
[03:35] (215.28s)
about a day per week or more reviewing
[03:37] (217.12s)
code or blocked waiting for a review. It
[03:40] (220.08s)
doesn't have to be this way. Graphite
[03:42] (222.24s)
brings stacked pull requests, the workflow
[03:44] (224.24s)
at the heart of the best-in-class
[03:45] (225.44s)
internal code review tools at companies
[03:47] (227.04s)
like Meta and Google to every software
[03:49] (229.20s)
company on GitHub.
[03:51] (231.20s)
Graphite also leverages high signal
[03:53] (233.12s)
codebase-aware AI to give developers
[03:55] (235.04s)
immediate actionable feedback on their
[03:56] (236.56s)
pull requests, allowing teams to cut
[03:58] (238.72s)
down on review cycles. Tens of thousands
[04:01] (241.60s)
of developers at top companies like
[04:03] (243.20s)
Asana, Ramp, Tecton, and Vercel rely
[04:05] (245.60s)
on Graphite every day. Start stacking
[04:08] (248.32s)
with graphite today for free and reduce
[04:10] (250.40s)
your time to merge from days to hours.
[04:12] (252.64s)
Get started at gt.dev/pragmatic.
[04:15] (255.68s)
That is g for graphite t for
[04:17] (257.36s)
technology.dev/pragmatic.
[04:20] (260.16s)
So that's a lot of different
[04:21] (261.60s)
teams. How did you work
[04:24] (264.08s)
on so many teams? Is it just that
[04:25] (265.76s)
there are a lot of internal transfers? Did
[04:27] (267.52s)
you get bored? Was it just that you followed
[04:29] (269.36s)
your manager? How does it work inside
[04:31] (271.28s)
Amazon? Because when people who have not
[04:32] (272.56s)
worked at Amazon think about these
[04:34] (274.00s)
companies, they would kind of assume you
[04:35] (275.76s)
go, you work there, you're on a team for
[04:37] (277.68s)
like, you know, four, five, 6 years.
[04:39] (279.52s)
Clearly not the case. You know, it
[04:41] (281.20s)
depends a little bit on like corporate
[04:42] (282.72s)
policy and then where you are with your
[04:44] (284.48s)
career. Uh I started as a support
[04:47] (287.12s)
engineer. So sort of like operationally
[04:50] (290.16s)
um focused person and then you know I
[04:52] (292.88s)
was basically like I want to be a
[04:54] (294.08s)
software developer and so you know I
[04:57] (297.04s)
think getting into the company was
[04:59] (299.60s)
pretty difficult but once I was there
[05:01] (301.84s)
sort of set that target and and changed
[05:04] (304.24s)
roles and when I changed the role um you
[05:07] (307.36s)
know it was a natural time to move to
[05:09] (309.92s)
another team. There's some also some uh
[05:12] (312.80s)
internal policy. So basically at Amazon,
[05:15] (315.76s)
it used to be that you had to stay on a
[05:17] (317.76s)
team for at least a year before you
[05:21] (321.20s)
transferred. And if you wanted to
[05:22] (322.64s)
transfer,
[05:24] (324.32s)
like a a senior manager or director or
[05:26] (326.40s)
whoever up top could block your
[05:28] (328.48s)
transfer.
[05:29] (329.84s)
And what that ended up meaning was that
[05:32] (332.08s)
like certain teams that were just
[05:33] (333.68s)
terrible to work on, those teams
[05:36] (336.24s)
actually had more than 100% attrition
[05:38] (338.72s)
over the course of a year because you
[05:40] (340.80s)
measured attrition with a year-long time
[05:43] (343.20s)
unit. Amazon did something actually
[05:46] (346.08s)
smart at the corporate level. Uh they
[05:48] (348.08s)
they basically said okay well you have
[05:51] (351.36s)
freedom of movement now. This sort of
[05:52] (352.88s)
happened, I don't know, probably like
[05:55] (355.28s)
10 to 13 years ago.
[05:57] (357.36s)
And so they said you have freedom of
[05:58] (358.80s)
movement now. A VP or a director
[06:02] (362.32s)
can't block you. They can say, okay, well,
[06:04] (364.24s)
we need another month to get like a
[06:06] (366.08s)
transition plan going.
[06:07] (367.60s)
But essentially you have freedom of
[06:08] (368.88s)
movement as long as you're not on a
[06:10] (370.56s)
performance improvement plan. which
[06:12] (372.56s)
meant that certain teams were sources of
[06:15] (375.28s)
high-quality engineering talent and
[06:16] (376.96s)
certain teams were sinks of high-quality
[06:18] (378.80s)
engineering talent
[06:20] (380.32s)
and it sort of created an internal
[06:22] (382.08s)
marketplace for for different roles. Now
[06:25] (385.60s)
what that ended up meaning was that
[06:27] (387.52s)
certain teams they basically didn't want
[06:30] (390.72s)
you to know what the policy was. They
[06:32] (392.88s)
wanted you to to sort of think that you
[06:34] (394.64s)
were kind of stuck.
[06:36] (396.48s)
But, you know, despite that sort of
[06:38] (398.80s)
like local gamesmanship that was going on,
[06:41] (401.20s)
Yeah. Like basically some managers
[06:42] (402.24s)
didn't want their best people to leave,
[06:43] (403.52s)
right? Let's just say it how it is.
[06:45] (405.28s)
But ultimately, I think it's
[06:47] (407.60s)
a great strategy because
[06:49] (409.52s)
if there was a team that was
[06:52] (412.40s)
difficult to staff, the problem was on
[06:54] (414.88s)
the management. It wasn't something that
[06:57] (417.60s)
had to be, you know, borne
[07:00] (420.64s)
by the employees themselves. And so
[07:04] (424.48s)
you know getting back to my own career
[07:06] (426.24s)
journey at a very large company like
[07:08] (428.40s)
Amazon there are so many awesome things
[07:10] (430.64s)
that are going on and you know um I
[07:14] (434.72s)
decided to just kind of go where my
[07:16] (436.96s)
curiosity took me. Now there were some
[07:18] (438.96s)
times where you know there were reorgs
[07:20] (440.80s)
or you know a line of business got got
[07:23] (443.60s)
spun down.
[07:24] (444.88s)
Um but ultimately you know I think
[07:27] (447.12s)
freedom of movement was one of the
[07:28] (448.80s)
smartest things that that Amazon did.
[07:30] (450.72s)
And I think this is something that
[07:32] (452.64s)
people don't really appreciate about
[07:33] (453.92s)
some large companies. You know, not all
[07:35] (455.52s)
companies are like Amazon and every
[07:37] (457.12s)
company changes, right? Like today, I'm
[07:38] (458.64s)
assuming it will be hard to move as many
[07:41] (461.20s)
teams within Amazon. Depending on where
[07:43] (463.20s)
you are, you know, if you're in a if
[07:44] (464.48s)
you're in a satellite office where
[07:45] (465.76s)
there's two teams, uh, you can probably
[07:47] (467.52s)
move on to the other team at max.
[07:49] (469.84s)
But I think this is one of the
[07:51] (471.12s)
underrated things of large companies
[07:52] (472.56s)
like once you are in, it's almost always
[07:54] (474.72s)
easier to get that job at another team
[07:56] (476.96s)
from the inside. Yes. Especially because
[07:58] (478.72s)
you can talk to them. You know, this is
[08:00] (480.40s)
I talked with the Reddit mobile team
[08:02] (482.40s)
and I asked, like, "Oh, how can you
[08:04] (484.24s)
become a platform engineer on the
[08:06] (486.00s)
mobile team?" And they said like, "Well,
[08:07] (487.52s)
you know, most of our hires have been
[08:09] (489.12s)
internal. They just helped us out on
[08:10] (490.80s)
hackathons. They come around, they
[08:12] (492.88s)
commit stuff. We know them. It's
[08:14] (494.64s)
a low-risk hire." I think it's just nice
[08:16] (496.32s)
to remember that when you think of like
[08:17] (497.44s)
a big company like Amazon or Meta or or
[08:19] (499.60s)
Microsoft, it's just so many small teams
[08:21] (501.84s)
and once you're in, you actually have
[08:24] (504.00s)
almost priority access to those teams if
[08:26] (506.64s)
you play your cards right.
[08:27] (507.92s)
Absolutely. And you know, you might
[08:29] (509.60s)
interview for that team, but it's it's
[08:31] (511.36s)
such lower stakes than an external
[08:33] (513.60s)
interview. And you know, just all things
[08:36] (516.64s)
being equal, would you rather take
[08:38] (518.16s)
somebody that's, you know, uh, internal
[08:40] (520.56s)
and and knows the culture. They know how
[08:42] (522.80s)
software is developed within a
[08:44] (524.32s)
particular context or somebody that's
[08:47] (527.04s)
just as good but doesn't, you know,
[08:49] (529.20s)
hasn't been onboarded. And I think
[08:51] (531.04s)
ultimately you're you're going to pick
[08:52] (532.48s)
the person that's internal, all things
[08:54] (534.00s)
being equal.
[08:54] (534.88s)
Yeah. It's just kind of like business
[08:56] (536.32s)
rationality for the most part. So one
[08:58] (538.40s)
thing about Amazon and about large
[09:00] (540.32s)
companies like Amazon is people talk
[09:02] (542.08s)
about externally about the scale and
[09:04] (544.16s)
it's hard to imagine but can you give us
[09:05] (545.84s)
a sense of the scale that you've seen or
[09:08] (548.08s)
like some tough engineering challenges
[09:09] (549.44s)
that you worked on that would have been
[09:10] (550.80s)
just really hard to work at a smaller
[09:13] (553.12s)
startup? Yeah, I think that's the thing
[09:16] (556.00s)
that you just you will not see at most
[09:19] (559.52s)
other places is the the scale of of
[09:21] (561.92s)
things. I'll I'll give you a couple of
[09:23] (563.52s)
examples. So, you know, Prime is the
[09:26] (566.24s)
exclusive club that everybody is a
[09:28] (568.24s)
member of.
[09:29] (569.44s)
And, you know, in in the US, the the
[09:31] (571.76s)
shipping benefit is is probably, you
[09:34] (574.24s)
know, the most popular, but globally,
[09:38] (578.24s)
um, Prime Video is, you know, it's the
[09:41] (581.28s)
thing that people use the most with
[09:43] (583.20s)
their with their subscription. And so if
[09:46] (586.48s)
you think about, you know, our
[09:48] (588.08s)
service-oriented architecture and, you
[09:50] (590.16s)
know, just loading up the app, the
[09:52] (592.56s)
gateway page is the place where all
[09:54] (594.80s)
of our requests come in, right? And so
[09:57] (597.68s)
it's just it's just like Netflix. It's
[09:59] (599.52s)
this infinite scroll of of carousels.
[10:02] (602.00s)
So the gateway page, is it the Amazon
[10:04] (604.08s)
Prime landing page?
[10:05] (605.04s)
Yeah, it's the landing page there.
[10:06] (606.72s)
And so you're like, okay, cool. If, let's
[10:09] (609.76s)
say, 90, 95, 99% of all of your requests
[10:12] (612.80s)
are coming from that page and that page
[10:14] (614.24s)
needs to be personalized,
[10:16] (616.24s)
you know, and you have a service-oriented
[10:18] (618.16s)
architecture with a bunch of
[10:19] (619.52s)
microservices.
[10:21] (621.12s)
Um one request to that page turns into
[10:25] (625.84s)
let's just say hundreds of downstream
[10:28] (628.08s)
requests to different services. It might
[10:30] (630.32s)
even be more than that. It's it's
[10:31] (631.84s)
actually kind of hard to count.
[10:33] (633.36s)
Yeah. And and is is this page right?
[10:35] (635.20s)
Like all the all the stuff flowing all
[10:36] (636.96s)
personalized stuff. So that's the that's
[10:38] (638.56s)
the retail one, but I I was talking
[10:40] (640.08s)
about the Prime Video one,
[10:41] (641.04s)
the Prime Video one,
[10:41] (641.68s)
but essentially it's the same thing.
[10:43] (643.60s)
And so, you know, same thing for the the
[10:45] (645.36s)
retail website as well.
[10:46] (646.96s)
And so if you have one request sort of
[10:49] (649.44s)
spidering out into, you know, two orders
[10:52] (652.08s)
of magnitude more requests internally,
[10:54] (654.56s)
you start to see like really really
[10:57] (657.04s)
large scale for these microservices. So
[10:59] (659.44s)
a microservice will have like a reverse
[11:01] (661.04s)
proxy or a load balancer in front of it
[11:03] (663.28s)
and you are sort of unironically talking
[11:05] (665.60s)
about things like tens of thousands of
[11:08] (668.16s)
requests per second or hundreds of
[11:10] (670.08s)
thousands of requests per second coming
[11:12] (672.64s)
into your service. So, so like the
[11:14] (674.72s)
services that are like behind you know
[11:16] (676.16s)
like there's the prime there's all the
[11:17] (677.92s)
things loading they're spidering out
[11:19] (679.52s)
like making you know to to render that
[11:21] (681.28s)
one recommendation for example for I
[11:23] (683.60s)
don't know the video whatever you would
[11:24] (684.96s)
like it will make a lot of requests to
[11:26] (686.40s)
different different services and then so
[11:28] (688.32s)
when you're operating a a smaller
[11:30] (690.08s)
service inside of Amazon suddenly you're
[11:32] (692.40s)
going to be hit with what you just said
[11:33] (693.92s)
10 10k 100k requests per second that
[11:36] (696.16s)
kind of scale
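
The fan-out arithmetic described here can be sketched in a few lines. This is only an illustration: the function name and every number below are assumptions chosen to show how "tens or hundreds of thousands of requests per second" falls out of the multiplication, not actual Amazon traffic figures.

```python
def downstream_rps(page_rps: float, fanout: int, share: float = 1.0) -> float:
    """Requests per second landing on backend services.

    page_rps: requests per second hitting the landing page (assumed figure)
    fanout:   internal calls triggered by one page request (assumed figure)
    share:    fraction of those internal calls that reach one given service
    """
    if page_rps < 0 or fanout < 0 or not 0.0 <= share <= 1.0:
        raise ValueError("inputs must be non-negative and share within [0, 1]")
    return page_rps * fanout * share


# Hypothetical: 10,000 page loads per second, each spidering out into
# 300 internal calls, 5% of which land on the one service you own.
total_internal = downstream_rps(10_000, 300)
one_service = downstream_rps(10_000, 300, share=0.05)
print(f"internal calls/s: {total_internal:,.0f}")
print(f"calls/s to one service: {one_service:,.0f}")
```

With two orders of magnitude of fan-out, even a small share of the internal traffic puts a single "small" service in the range the speakers mention.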
[11:36] (696.72s)
Exactly, and you will essentially be
[11:39] (699.84s)
DDoSing yourself.
[11:42] (702.16s)
You're just like, okay, cool. Um,
[11:44] (704.88s)
let's change a caching configuration on
[11:47] (707.28s)
some item details. And, uh, turns out
[11:50] (710.64s)
you've just browned out like a like a
[11:53] (713.44s)
critical service, right? Um,
[11:56] (716.16s)
What does brown out mean?
[11:57] (717.68s)
Oh, sorry. I'm using some jargon. So,
[11:59] (719.68s)
if you want to talk about
[12:00] (720.72s)
availability, um, suppose you
[12:04] (724.16s)
are DDoSing a service or sending a lot of
[12:07] (727.68s)
requests over to them. You can, you know,
[12:09] (729.76s)
you can just take them down.
[12:11] (731.28s)
That would be like a blackout. Yeah.
[12:13] (733.04s)
Um and so like you send a request, oh
[12:15] (735.60s)
you can't establish a connection, it
[12:17] (737.20s)
immediately comes back. But there's a
[12:19] (739.84s)
there's a type of outage where they
[12:22] (742.08s)
brown out. So basically they're
[12:23] (743.60s)
reachable. They might accept a
[12:24] (744.88s)
connection.
[12:26] (746.24s)
But, you know, um, they'll essentially time
[12:28] (748.56s)
out, or they might return partial
[12:31] (751.20s)
results or bad results, or the only
[12:33] (753.12s)
thing that they do return is a, you know,
[12:34] (754.96s)
500 for some percentage or proportion of requests
[12:37] (757.52s)
after we waited a bunch of time for
[12:38] (758.80s)
that. Yeah. And so, you know, now we
[12:42] (762.08s)
start talking about, like, availability
[12:43] (763.92s)
and resilience in the face of, like,
[12:46] (766.16s)
all of this DDoSing that
[12:48] (768.16s)
you're doing to yourself. And so the the
[12:50] (770.88s)
thing on top of scale that is going to
[12:53] (773.20s)
really complicate things is your
[12:54] (774.96s)
dependency chain, right? And so, you
[12:57] (777.92s)
know, your service is a dependency of
[13:00] (780.56s)
some of the process that's going on. It
[13:02] (782.32s)
depends on, you know, maybe AWS, it may
[13:05] (785.36s)
depend on another service. you know, how
[13:07] (787.28s)
do you make sure that if um you know,
[13:10] (790.00s)
suppose there's a failure for a primary
[13:11] (791.84s)
dependency and that dependency comes
[13:14] (794.08s)
back up, how do you make sure you don't
[13:16] (796.00s)
just like inundate it with a bunch of
[13:17] (797.84s)
requests as it's trying to recover?
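
One common way to avoid inundating a recovering dependency is to retry with exponential backoff plus random jitter. The sketch below is not taken from the conversation and is not presented as what Amazon runs; the function names, the default delays, and the "full jitter" choice are all assumptions for illustration.

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


def backoff_delays(base: float, cap: float, attempts: int,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield one randomized wait per attempt ("full jitter").

    The upper bound doubles each attempt up to `cap`; the random draw spreads
    callers out so they do not all retry at the same instant.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        upper = min(cap, base * (2 ** attempt))
        yield rng.uniform(0.0, upper)


def call_with_backoff(call: Callable[[], T], *, base: float = 0.1,
                      cap: float = 5.0, attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Try `call` up to `attempts` times, waiting a jittered delay between tries."""
    delays = list(backoff_delays(base, cap, attempts))
    last_error: Optional[Exception] = None
    for i, delay in enumerate(delays):
        try:
            return call()
        except Exception as err:  # sketch only: real code would catch narrower errors
            last_error = err
            if i < len(delays) - 1:
                sleep(delay)  # no point waiting after the final attempt
    raise RuntimeError("dependency still unavailable") from last_error
```

The growing, randomized gaps are what give the dependency room while it comes back up: callers retry less often the longer the outage lasts, and they do not retry in lockstep.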
[13:20] (800.40s)
And so you have all of these sort of
[13:21] (801.92s)
like odd dynamics that occur. I used a
[13:24] (804.64s)
brownout as something that is a
[13:27] (807.28s)
perennial problem that we have, right?
[13:29] (809.44s)
where there's maybe a dependency on a
[13:31] (811.44s)
base service like S3 or Dynamo DB or
[13:35] (815.12s)
whatever it is. There might you know be
[13:37] (817.44s)
some increased latency that may cause a
[13:40] (820.72s)
chain reaction of a dependency going
[13:42] (822.48s)
down and then one of these sort of
[13:44] (824.00s)
middle tier services would brown out. So
[13:47] (827.20s)
what are like you know you're an owner
[13:49] (829.28s)
of the the services um for your team and
[13:52] (832.80s)
so then it's like okay um what do we do
[13:55] (835.36s)
in those situations? How do we know that
[13:56] (836.72s)
they're browning out? um what do we do
[13:59] (839.12s)
in the face of uh you know a dependency
[14:01] (841.68s)
outage and then critically if there is
[14:03] (843.76s)
an outage and then the the service comes
[14:06] (846.08s)
back up how do we make sure that we give
[14:07] (847.76s)
it enough space so that it can breathe
[14:10] (850.48s)
so that you know you know as they're
[14:12] (852.96s)
trying to recover from some sort of
[14:14] (854.56s)
outage we don't just take them down
[14:16] (856.48s)
immediately again and I guess for like
[14:18] (858.80s)
most of us who are not working right now
[14:22] (862.24s)
on these services like these sound
[14:23] (863.84s)
pretty cool in theory but you're saying
[14:25] (865.92s)
this was actually like like this is not
[14:27] (867.44s)
theory This actually was like, oh, this
[14:29] (869.84s)
service is going down. We are literally
[14:31] (871.36s)
having 100k requests per second and
[14:33] (873.28s)
we're like
[14:34] (874.32s)
pushing that on to, like, three other
[14:36] (876.16s)
services with the same load, because we need
[14:37] (877.84s)
to invoke three other services. One
[14:39] (879.68s)
of them has browned out. What do we do
[14:41] (881.52s)
now? How do we fix it?
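
To make the "how do we know that they're browning out" question concrete, here is a minimal sketch of telling a blackout from a brownout on the caller's side. The class names, the window size, and the 20% threshold are hypothetical choices for illustration; it only looks at timeouts and 5xx responses and does not try to detect partial or bad results.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass
class CallResult:
    connected: bool                 # False: connection refused or unreachable
    status: Optional[int] = None    # HTTP status, None if no response arrived
    latency_s: float = 0.0          # time spent waiting for the response


def is_bad(result: CallResult, timeout_s: float) -> bool:
    """A call counts as bad if it never connected, timed out, or returned a 5xx."""
    if not result.connected:
        return True
    if result.status is None or result.latency_s >= timeout_s:
        return True
    return result.status >= 500


class DependencyHealth:
    """Rolling view of recent calls to one dependency (hypothetical helper)."""

    def __init__(self, window: int = 100, timeout_s: float = 1.0,
                 brownout_ratio: float = 0.2) -> None:
        self.timeout_s = timeout_s
        self.brownout_ratio = brownout_ratio  # assumed threshold, tune per service
        self.recent: Deque[CallResult] = deque(maxlen=window)

    def record(self, result: CallResult) -> None:
        self.recent.append(result)

    def state(self) -> str:
        if not self.recent:
            return "unknown"
        if all(not r.connected for r in self.recent):
            return "blackout"   # fails fast: nothing is even reachable
        bad = sum(is_bad(r, self.timeout_s) for r in self.recent)
        if bad / len(self.recent) >= self.brownout_ratio:
            return "brownout"   # reachable, but slow or erroring for a share of calls
        return "healthy"
```

A caller could feed every downstream response into `record` and stop sending non-essential traffic once `state()` reports a brownout; what to do at that point is a design decision this sketch leaves open.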
[14:42] (882.88s)
Yeah. It and and I think for certain
[14:46] (886.40s)
other large tech companies, you know,
[14:49] (889.28s)
you can do best effort, right? which is
[14:53] (893.20s)
basically like, hey, we're we're
[14:54] (894.88s)
temporarily down, but you know, you can
[14:57] (897.60s)
you can uh you know, you have some sort
[14:59] (899.68s)
of degraded service. That makes sense.
[15:01] (901.44s)
But if you're on say a website that does
[15:04] (904.64s)
purchases, now we're talking about
[15:06] (906.32s)
transactions.
[15:07] (907.60s)
Or if you're in the Prime Video like
[15:10] (910.08s)
live video streaming use case, now we're
[15:12] (912.72s)
talking about a football game that
[15:14] (914.48s)
you're unable to see.
[15:16] (916.88s)
Um and then when we recover, the game
[15:19] (919.04s)
might be over. Yeah. Right. And so it's
[15:20] (920.72s)
it's much higher stakes. And so I I
[15:23] (923.04s)
think the the scale with transactional
[15:26] (926.88s)
semantics, right? Like that's actually
[15:29] (929.44s)
the challenge that you're not going to
[15:30] (930.88s)
see unless you sort of like work for a
[15:32] (932.56s)
payment processor or something. Yeah.
[15:34] (934.72s)
Yeah. I guess that that real world
[15:37] (937.28s)
pressure challenge like you are losing
[15:39] (939.04s)
money. That's it. I'm starting to
[15:40] (940.40s)
understand why. I have noticed that
[15:42] (942.48s)
startups love to hire from certain
[15:44] (944.72s)
companies. Usually startups love to
[15:46] (946.24s)
hire from other startups because it's a
[15:47] (947.44s)
similar environment. From large tech
[15:49] (949.04s)
companies, it's a bit of a maybe. I'm
[15:50] (950.72s)
generalizing. Obviously, this will
[15:52] (952.32s)
not be true 100% of the time, but for
[15:54] (954.08s)
example, hiring from Google, a lot of
[15:55] (955.60s)
startups are not as happy because the
[15:57] (957.12s)
people coming from Google are used to
[15:59] (959.04s)
having this amazing team around them,
[16:00] (960.48s)
internal tools, but most startups love
[16:02] (962.48s)
hiring from Amazon. And I'm starting to
[16:04] (964.16s)
get a sense of, you know, why this
[16:05] (965.76s)
actually is.
[16:06] (966.40s)
Yeah, I think that's part of the the
[16:07] (967.84s)
culture. You know, you you get uh you
[16:10] (970.64s)
get hired as a software developer and
[16:12] (972.32s)
they hand you a pager. And before, you
[16:15] (975.12s)
know, phone apps and and things like
[16:16] (976.88s)
that, it was like this pager from the
[16:20] (980.16s)
And it's really great because you
[16:22] (982.56s)
have to, like, operate the
[16:25] (985.04s)
software that you write.
[16:27] (987.28s)
You cannot write the software,
[16:30] (990.16s)
hand it over to the testing team, and
[16:31] (991.76s)
then throw it over to the SRE team
[16:33] (993.60s)
after you're done. Like you own that
[16:35] (995.20s)
that piece of software.
[16:36] (996.56s)
Yeah. Yeah. At every team, right?
[16:38] (998.16s)
Mhm. One interesting thing that we
[16:39] (999.76s)
talked about yesterday over dinner
[16:41] (1001.60s)
with Casey Muratori is you said
[16:43] (1003.76s)
something interesting on how Amazon
[16:45] (1005.52s)
measured how on their retail website I
[16:47] (1007.52s)
think it was retail maybe Amazon Prime
[16:49] (1009.28s)
the lower the latency of something
[16:51] (1011.92s)
loading like a page loading like a
[16:53] (1013.60s)
purchase or a purchase button loading
[16:55] (1015.76s)
the more revenue they got and they
[16:57] (1017.12s)
started to measure and there was a
[16:58] (1018.08s)
linear correlation: the faster
[17:00] (1020.64s)
it was, the more people converted, and it
[17:02] (1022.32s)
seemed it had no end. And the question
[17:04] (1024.56s)
Casey asked is like okay if this is the
[17:06] (1026.48s)
case what would stop Amazon because you
[17:09] (1029.68s)
have the best technologies in the world.
[17:11] (1031.20s)
You you have AWS, you know, you can
[17:13] (1033.28s)
build whatever you want to get the
[17:14] (1034.96s)
latency of the website down to let's say
[17:16] (1036.80s)
like 10 milliseconds or or even 1
[17:19] (1039.04s)
millisecond because if this goes up, you
[17:20] (1040.96s)
would maximize revenue. So can you tell
[17:23] (1043.52s)
me about like how how that thing like
[17:25] (1045.68s)
this measurement actually happened and
[17:28] (1048.24s)
you know, why is Amazon's website still
[17:31] (1051.20s)
maybe not the fastest in the
[17:33] (1053.60s)
world, even though that would generate so
[17:35] (1055.52s)
many more billions, right?
[17:36] (1056.88s)
Yeah. Um well there are a couple
[17:38] (1058.32s)
questions embedded in there but we'll
[17:39] (1059.84s)
we'll start with the you know the
[17:41] (1061.52s)
latency to to gross revenue measurement.
[17:45] (1065.04s)
So essentially somebody way back when um
[17:47] (1067.84s)
you know because we invest in logs and
[17:50] (1070.00s)
telemetry started tracking how much
[17:53] (1073.04s)
gross revenue we would make based off of
[17:55] (1075.60s)
like the latency for detail pages based
[17:57] (1077.76s)
off latency of gateway based off of
[17:59] (1079.44s)
latency of of the checkout pages. And
[18:01] (1081.76s)
they noticed this dynamic where it's
[18:03] (1083.36s)
like if you're faster you just make more
[18:05] (1085.76s)
money. It's a it's a pretty clear
[18:07] (1087.92s)
correlation. Um I think you would even
[18:10] (1090.48s)
go as far as to say causation. And so
[18:13] (1093.84s)
there was this really big focus on on
[18:16] (1096.56s)
latencies. I love the idea that you know
[18:19] (1099.60s)
if you're going to optimize for
[18:21] (1101.04s)
performance saying like why can't we be
[18:23] (1103.20s)
at 1 millisecond or why can't we be at
[18:25] (1105.36s)
10 milliseconds and start from there
[18:27] (1107.44s)
instead of sort of saying like hey let's
[18:29] (1109.28s)
try to decrease latencies by 50% or 25%.
[18:32] (1112.56s)
like let's just start from what is the
[18:34] (1114.72s)
conceptually fastest thing that we could do.
[18:38] (1118.16s)
And I think in a vacuum the conceptually
[18:41] (1121.92s)
fastest thing that we could do is sort
[18:43] (1123.60s)
of like a monolith which is how Amazon
[18:46] (1126.48s)
started
[18:47] (1127.60s)
where you know you have a web server
[18:50] (1130.32s)
with all of your catalog information.
[18:52] (1132.40s)
And so all of the items that are there
[18:54] (1134.08s)
and then transaction processing on the
[18:55] (1135.68s)
host that would be the fastest
[18:58] (1138.16s)
way to, um, run. And basically, a
[19:01] (1141.68s)
web request would be: it opens the HTTP
[19:03] (1143.68s)
or HTTPS handshake. It hits the server.
[19:06] (1146.80s)
The server in an ideal world has
[19:08] (1148.48s)
everything cached or calculated. It
[19:10] (1150.48s)
sends it back. So the total like latency
[19:13] (1153.60s)
would be the time for this request, the
[19:15] (1155.60s)
time to transfer that data and you know
[19:17] (1157.04s)
based on your internet speed and that's
[19:18] (1158.48s)
it. That is the absolute minimum; you cannot be
[19:20] (1160.24s)
faster than that.
[19:21] (1161.04s)
I don't think so. Maybe there's some
[19:22] (1162.56s)
exotic sort of thing that's
[19:23] (1163.92s)
maybe you can do some exotic protocol
[19:25] (1165.20s)
that, I don't know, predicts the future and, like,
[19:26] (1166.88s)
sends it with UDP. But yeah,
[19:28] (1168.56s)
this is your baseline.
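
The baseline described here (handshake, a server with everything cached, then the transfer) can be written down as a rough lower-bound estimate. The function and its example numbers are assumptions for illustration; the default of two handshake round trips assumes a new TCP connection plus a TLS 1.3 handshake, and would be zero on an already-open connection.

```python
def latency_floor_ms(rtt_ms: float, payload_bytes: int, bandwidth_mbps: float,
                     server_ms: float = 0.0, handshake_rtts: int = 2) -> float:
    """Rough lower bound on latency for one request to one server.

    rtt_ms:         network round-trip time between client and server
    payload_bytes:  size of the response body
    bandwidth_mbps: client link speed in megabits per second
    server_ms:      time the server spends producing the response
    handshake_rtts: round trips spent before the request can be sent
                    (assumed 2: one for TCP, one for TLS 1.3)
    """
    request_rtts = 1  # the request going out and the first byte coming back
    transfer_ms = payload_bytes * 8 / (bandwidth_mbps * 1_000_000) * 1000
    return (handshake_rtts + request_rtts) * rtt_ms + server_ms + transfer_ms


# Hypothetical client: 20 ms round trip, 100 kB page, 50 Mbit/s link,
# everything cached so the server spends about 1 ms.
print(latency_floor_ms(rtt_ms=20, payload_bytes=100_000, bandwidth_mbps=50,
                       server_ms=1))
```

Under these assumed numbers the network terms dominate the total, which is why the monolith case is a useful reference point: every extra hop behind the web server only adds to this floor.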
[19:29] (1169.84s)
I guess the the optimal would be like
[19:31] (1171.44s)
zero-click instead of like a one-click
[19:33] (1173.12s)
checkout, right? So we just send you
[19:34] (1174.72s)
stuff before like you know you want it.
[19:36] (1176.88s)
That that would be the I guess the
[19:38] (1178.16s)
theoretical maximum. But you know if if
[19:40] (1180.56s)
you if there's some sort of like web
[19:42] (1182.24s)
request, right? So some HTTP request and
[19:44] (1184.56s)
then some sort of like buy button that
[19:46] (1186.80s)
would be the fastest, right? And that's
[19:48] (1188.72s)
actually how Amazon was created. We we
[19:50] (1190.64s)
bought this, you know, it was sort of
[19:51] (1191.76s)
the opposite of horizontal scaling. It
[19:53] (1193.36s)
was vertical scaling. We bought these
[19:54] (1194.72s)
big Sun boxes and you know we hacked up
[19:58] (1198.40s)
our own web server in in C++ and you
[20:02] (1202.48s)
know to scale up we bought bigger
[20:04] (1204.96s)
hardware and then when that didn't work
[20:07] (1207.44s)
you know we bought like six of these big
[20:09] (1209.12s)
boxes and that ran Amazon and we ran
[20:11] (1211.84s)
that way up until the the early 2000s
[20:14] (1214.72s)
and then what we realized we we ran into
[20:16] (1216.96s)
a wall which was that um you know when
[20:21] (1221.04s)
you when you built the C++ binary the
[20:23] (1223.36s)
binary could only be 4 GB and that was a
[20:28] (1228.00s)
hard limit based off of the 32-bit
[20:30] (1230.32s)
architecture that we were running
[20:31] (1231.76s)
on before. We could not get above 4 GB
[20:34] (1234.96s)
and so these product managers would come
[20:36] (1236.56s)
and just be like well can just make a
[20:38] (1238.16s)
change for me
[20:39] (1239.52s)
right to the devs and then they would
[20:40] (1240.96s)
just be like I don't think you
[20:41] (1241.92s)
understand that this is a hard
[20:43] (1243.28s)
constraint and so we
[20:44] (1244.48s)
so the size of the code or the binary
[20:46] (1246.48s)
code the the compiled one it was there
[20:48] (1248.40s)
and you had so much business logic by
[20:50] (1250.16s)
then that it just filled at 4 GB.
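The 4 GB ceiling mentioned here follows from pointer width: a 32-bit pointer can name 2^32 distinct byte addresses. A minimal sketch of that arithmetic is below. The helper name fits_in_address_space is made up for illustration, and the limit a real operating system leaves usable for one binary may be lower than the theoretical figure.

```python
# Size of the address space reachable with 32-bit pointers.
POINTER_BITS = 32
addressable_bytes = 2 ** POINTER_BITS          # one byte per distinct address
addressable_gib = addressable_bytes / 2 ** 30  # 1 GiB = 2**30 bytes


def fits_in_address_space(image_bytes: int, pointer_bits: int = POINTER_BITS) -> bool:
    """Hypothetical helper: can an image of this size be addressed at all?"""
    return image_bytes <= 2 ** pointer_bits


print(addressable_bytes, addressable_gib)
```

No amount of build tuning changes this bound; only a wider pointer or splitting the binary does, which matches the decision described next.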
[20:52] (1252.16s)
Yeah. Yeah. and and you know we had a
[20:54] (1254.64s)
distributed C++ build so you know you
[20:57] (1257.76s)
could uh you know it would take many
[20:59] (1259.44s)
many hours for it to compile and so we
[21:01] (1261.44s)
would distribute it across desktops and
[21:03] (1263.28s)
it was this whole big thing but we ran
[21:05] (1265.12s)
into that wall and so what we
[21:07] (1267.76s)
decided to do and I think this was super
[21:09] (1269.76s)
smart was like to lean into
[21:11] (1271.44s)
service-oriented architectures, right, and
[21:13] (1273.44s)
microservices
[21:15] (1275.12s)
and when you break it down a web service
[21:18] (1278.32s)
call is essentially it's a remote
[21:20] (1280.48s)
procedure call right so you have this
[21:22] (1282.24s)
execution pointer and then you're
[21:23] (1283.52s)
like okay well I need to do some
[21:24] (1284.80s)
computation or I need to gather some
[21:26] (1286.40s)
data, I'm going to in turn make an
[21:28] (1288.64s)
HTTP request downstream to another
[21:30] (1290.96s)
service and then you can sort of chain
[21:32] (1292.32s)
those things together
[21:33] (1293.92s)
and so getting back to the original
[21:35] (1295.60s)
thing about performance
[21:37] (1297.52s)
in a world where you have to because you
[21:40] (1300.32s)
have thousands and thousands of
[21:41] (1301.60s)
developers building you know this stuff
[21:44] (1304.16s)
and the fact that you cannot have a a
[21:46] (1306.64s)
monolith as big as Amazon retail you
[21:49] (1309.28s)
know past something that's sort of like
[21:51] (1311.12s)
circa-2002 Amazon size, you have to
[21:54] (1314.16s)
lean into remote procedure call you have
[21:56] (1316.24s)
to say that there's a web service the
[21:58] (1318.32s)
best performance that you can actually
[21:59] (1319.76s)
get is always going to be bounded by the
[22:02] (1322.00s)
number of web requests that you end up
[22:04] (1324.16s)
making whether it's the you know the
[22:06] (1326.00s)
first order calls to say go get the item
[22:08] (1328.80s)
details um but then also any blocking
[22:11] (1331.68s)
call that happens downstream
[22:13] (1333.92s)
and by blocking call we mean like you
[22:16] (1336.00s)
need to wait for this to finish to get
[22:17] (1337.60s)
your data like you know a service that
[22:19] (1339.52s)
like returns I don't know your top five
[22:21] (1341.28s)
most likely to buy things. It it might
[22:23] (1343.36s)
need to make those, let's say, five
[22:24] (1344.80s)
requests or just one request. It needs
[22:26] (1346.32s)
to wait for that before it can return.
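The "five requests or just one request" example can be sketched with asyncio. This is an illustration under assumptions, not Amazon's code: fetch_recommendation is a hypothetical downstream call, and asyncio.sleep stands in for network time. It shows that five blocking calls made one after another cost roughly five round trips, while the same five calls issued concurrently cost roughly one.

```python
import asyncio
import time


async def fetch_recommendation(slot: int, latency_s: float = 0.05) -> str:
    """Hypothetical downstream call; the sleep stands in for network time."""
    await asyncio.sleep(latency_s)
    return f"item-{slot}"


async def top_five_sequential() -> list[str]:
    # Each call waits for the previous one: total is about 5 x latency.
    return [await fetch_recommendation(i) for i in range(5)]


async def top_five_concurrent() -> list[str]:
    # All five calls are in flight at once: total is about 1 x latency.
    return list(await asyncio.gather(*(fetch_recommendation(i) for i in range(5))))


def timed(coro_fn):
    """Run an async function to completion and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (top_five_sequential, top_five_concurrent):
        result, seconds = timed(fn)
        print(fn.__name__, result, f"{seconds:.3f}s")
```

Either way the caller still blocks until the slowest required response arrives, which is the bound being described.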
[22:28] (1348.24s)
Exactly. Exactly. And you can do this
[22:29] (1349.92s)
telemetry stuff. You can do this
[22:31] (1351.36s)
observability stuff to figure out, you
[22:33] (1353.36s)
know, within that service call chain
[22:35] (1355.36s)
what the blocking call is.
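One way to picture what that telemetry gives you is a critical-path calculation over a call tree. The sketch below is an assumption-laden toy, not a real tracing API: the Span class, the service names, and the millisecond figures are all invented. Sequential children add up, concurrent children contribute only their slowest member.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    own_ms: float                       # time spent inside this service itself
    children: list["Span"] = field(default_factory=list)
    parallel: bool = False              # True if children are called concurrently


def critical_path(span: Span) -> tuple[float, list[str]]:
    """Return (latency_ms, names on the slowest blocking chain)."""
    if not span.children:
        return span.own_ms, [span.name]
    results = [critical_path(child) for child in span.children]
    if span.parallel:
        # Only the slowest concurrent child blocks the caller.
        child_ms, child_path = max(results, key=lambda r: r[0])
    else:
        # Sequential children all block, one after another.
        child_ms = sum(ms for ms, _ in results)
        child_path = [name for _, path in results for name in path]
    return span.own_ms + child_ms, [span.name] + child_path


# Made-up checkout page that fans out to three services concurrently.
checkout = Span("checkout", 5, parallel=True, children=[
    Span("item_details", 20),
    Span("recommendations", 10, children=[Span("ranking", 30), Span("inventory", 15)]),
    Span("pricing", 25),
])

latency_ms, path = critical_path(checkout)
print(latency_ms, path)
```

In this toy tree, speeding up item_details or pricing would not help at all; only the recommendations chain sits on the critical path.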
[22:37] (1357.28s)
And you can get some some uh you know,
[22:39] (1359.28s)
some amount of visualization on it. And
[22:41] (1361.04s)
so then you can get down to the point
[22:42] (1362.56s)
where it's like, okay, if we're going to
[22:44] (1364.00s)
start from first principles, what's this
[22:45] (1365.68s)
what's the least amount of latency that
[22:48] (1368.40s)
you can get for say like a web request
[22:50] (1370.56s)
or a checkout page call, you're going to
[22:52] (1372.72s)
run into like the absolute minimum,
[22:56] (1376.32s)
right? And it's going to be based off of
[22:58] (1378.00s)
like what are the required operations,
[23:01] (1381.12s)
you know, uh evaluation or transactions
[23:03] (1383.60s)
or whatever for that particular request.
[23:06] (1386.00s)
Yeah. And then basically so as I
[23:07] (1387.52s)
understand like as it became a microser
[23:09] (1389.60s)
like more microservices and services
[23:11] (1391.04s)
this was great for maintainability and
[23:12] (1392.72s)
also h you just so well you first just
[23:15] (1395.12s)
solved the issue of the monolith size
[23:17] (1397.12s)
and you know as we know as with history
[23:19] (1399.44s)
of course like now teams could be more
[23:20] (1400.96s)
autonomous they're not as dependent they
[23:22] (1402.96s)
could build the APIs but it was a
[23:24] (1404.48s)
trade-off for for latency and now like
[23:26] (1406.88s)
you had to go back and figure out the
[23:29] (1409.20s)
the blocking calls how to speed those up
[23:32] (1412.32s)
how to do I guess you know trade-off
[23:33] (1413.92s)
things like caching like you know you
[23:35] (1415.52s)
can make things fast but it might not be as
[23:37] (1417.36s)
correct on the first one or like just
[23:39] (1419.20s)
tricky UI where you don't show the data
[23:41] (1421.52s)
just yet but it's coming and the users
[23:44] (1424.48s)
get a sense of like progress,
[23:46] (1426.40s)
those kind of things
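The caching trade-off mentioned here (fast, but possibly not as correct) can be shown with a small time-to-live cache. This is a generic sketch, not anything described in the conversation: the TTLCache class, the loader, and the injected clock are all assumptions made so the staleness is easy to see.

```python
import time


class TTLCache:
    """Serve cached values for up to ttl_s seconds before reloading."""

    def __init__(self, loader, ttl_s: float, clock=time.monotonic):
        self._loader = loader      # slow call to the source of truth
        self._ttl_s = ttl_s
        self._clock = clock        # injectable so the example is deterministic
        self._entries = {}         # key -> (value, time it was loaded)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[1] < self._ttl_s:
            return hit[0]          # fast path: may be stale
        value = self._loader(key)  # slow path: always fresh
        self._entries[key] = (value, now)
        return value


# Made-up price table and a fake clock we can advance by hand.
now = [0.0]
prices = {"sku-1": 10}
cache = TTLCache(loader=prices.__getitem__, ttl_s=30, clock=lambda: now[0])

first = cache.get("sku-1")        # loads from the source
prices["sku-1"] = 12              # the source of truth changes
now[0] = 10.0
stale = cache.get("sku-1")        # still inside the TTL: old value
now[0] = 31.0
fresh = cache.get("sku-1")        # TTL expired: reloaded
print(first, stale, fresh)
```

The TTL is the knob: a longer one removes more blocking calls from the page and widens the window in which the answer can be wrong.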
[23:47] (1427.36s)
it and it also I think forces teams to
[23:49] (1429.68s)
really and product to really say okay
[23:52] (1432.00s)
like what is the strictly necessary
[23:54] (1434.40s)
processing that happens on this page
[23:56] (1436.32s)
some of the work that I was doing uh
[23:58] (1438.40s)
before I left Prime Video was basically
[24:00] (1440.16s)
like you have these really really big
[24:01] (1441.60s)
heavy gateway page you know or landing
[24:04] (1444.32s)
page requests
[24:06] (1446.48s)
And you know if you're in a situation
[24:08] (1448.72s)
with high load, can you preemptively
[24:12] (1452.80s)
reduce the amount of say personalization
[24:17] (1457.04s)
that's going on to sort of speed up that
[24:19] (1459.44s)
page or you know to increase the amount
[24:22] (1462.16s)
of like throughput that you're able to
[24:23] (1463.76s)
have so to serve more customers. Can you
[24:26] (1466.08s)
do that in a smart way, right? That sort
[24:28] (1468.72s)
of anticipates load that's coming onto
[24:31] (1471.04s)
the to that page. Mh.
[24:33] (1473.04s)
Say if there's a football game coming up
[24:34] (1474.88s)
or something like that.
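A minimal sketch of that idea, reducing personalization ahead of anticipated load, might look like the following. Everything in it is hypothetical: the three levels, the thresholds of 0.6 and 0.85, and the forecast input are invented for illustration and are not Prime Video's actual policy.

```python
from enum import IntEnum


class Personalization(IntEnum):
    NONE = 0      # generic page, cheapest to serve
    REDUCED = 1   # a few personalized rows
    FULL = 2      # everything personalized


def choose_level(current_rps: float, forecast_rps: float, capacity_rps: float) -> Personalization:
    """Pick a level from the worse of current and forecast utilization.

    Using the forecast lets the page degrade before the spike arrives
    (for example, shortly before a scheduled live event).
    """
    utilization = max(current_rps, forecast_rps) / capacity_rps
    if utilization < 0.6:       # hypothetical threshold
        return Personalization.FULL
    if utilization < 0.85:      # hypothetical threshold
        return Personalization.REDUCED
    return Personalization.NONE


print(choose_level(100, 100, 1000))   # quiet now, quiet later
print(choose_level(100, 700, 1000))   # quiet now, busy soon
print(choose_level(900, 100, 1000))   # busy right now
```

Taking the maximum of current and forecast load is what makes the policy preemptive rather than reactive.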
[24:36] (1476.08s)
Yeah. Sounds like these are just like
[24:39] (1479.92s)
a they seem just hard to solve, but now
[24:42] (1482.64s)
you have to solve them. So sounds like
[24:44] (1484.48s)
this this kept you busy and not everyone
[24:47] (1487.44s)
else busy at Amazon to this date, right?
[24:49] (1489.44s)
Like is is this do you think is this is
[24:51] (1491.52s)
this ongoing engineering challenge for
[24:52] (1492.96s)
Amazon? Cuz you know what I would
[24:54] (1494.72s)
imagine the tricky thing being here is
[24:57] (1497.52s)
like okay you can optimize whatever you
[24:59] (1499.60s)
have. you can find the critical path but
[25:01] (1501.44s)
Amazon keeps growing right like there's
[25:03] (1503.60s)
new teams new services new everything
[25:05] (1505.28s)
coming on so this thing will change all
[25:07] (1507.04s)
all the time it's an ongoing puzzle to
[25:09] (1509.36s)
yeah absolutely yeah I think um you know
[25:11] (1511.52s)
they they definitely have a ton of work
[25:14] (1514.16s)
in front of them um also you know it's
[25:16] (1516.32s)
part of their ethos to to really like
[25:18] (1518.56s)
launch new lines of businesses really
[25:20] (1520.24s)
quickly and so you know the ability for
[25:23] (1523.36s)
a team to go from zero to launch product
[25:26] (1526.32s)
within the confines and the context of a
[25:29] (1529.20s)
large corporate entity. I think that's,
[25:31] (1531.60s)
you know, part of the DNA that's there.
[25:33] (1533.20s)
So, as long as they're planting seeds as
[25:35] (1535.36s)
the the sort of like internal
[25:36] (1536.80s)
terminology is, I think that, you know,
[25:38] (1538.80s)
software developers will be uh uh in
[25:41] (1541.76s)
demand for quite some time. Yeah.
[25:43] (1543.76s)
And I guess it's a good reminder that,
[25:45] (1545.04s)
you know, there's every now and then we
[25:46] (1546.08s)
have the monoliths versus microservices
[25:47] (1547.60s)
debate that it it sounds it kind of just
[25:49] (1549.60s)
makes sense for a startup to start with
[25:50] (1550.96s)
the monolith like you can always do what
[25:52] (1552.40s)
Amazon did and you have the benefits of
[25:54] (1554.24s)
latency. Everything is in one place.
[25:56] (1556.56s)
Like I'm sure there might be reason to
[25:58] (1558.32s)
start with microservices to start with,
[25:59] (1559.84s)
but if if you're a small team like I
[26:02] (1562.08s)
mean even today I don't think that
[26:03] (1563.52s)
argument changes, right? Like Amazon got
[26:05] (1565.76s)
really big wins by starting with a
[26:07] (1567.52s)
monolith back back in the day.
[26:09] (1569.68s)
Yeah, absolutely. I I I think it just
[26:13] (1573.04s)
makes a ton of sense to start with a
[26:14] (1574.64s)
monolith, wait till it breaks, and then
[26:18] (1578.00s)
the part that where it breaks is when
[26:20] (1580.16s)
you have like 50 developers working on
[26:21] (1581.92s)
the same piece of code. Once that sort
[26:23] (1583.92s)
of breaking point occurs, then you start
[26:26] (1586.16s)
to like try to figure out like how you
[26:27] (1587.92s)
can sort of break things up. But
[26:29] (1589.84s)
starting with a microservice
[26:31] (1591.20s)
architecture, especially when you're
[26:32] (1592.48s)
small, like what a waste of time and
[26:34] (1594.40s)
energy.
[26:35] (1595.20s)
Totally. So you were a principal
[26:37] (1597.68s)
engineer at Amazon. And apparently I I
[26:40] (1600.16s)
learned that you know most companies are
[26:42] (1602.08s)
they have different levels and again
[26:43] (1603.60s)
this principal engineer some companies
[26:45] (1605.28s)
have like staff level but it's usually
[26:47] (1607.04s)
like entry level mid-level senior and
[26:49] (1609.84s)
then you have staff or in the case of
[26:51] (1611.60s)
Amazon it's it's it's principal. I've
[26:53] (1613.44s)
learned that Amazon's principal level is
[26:55] (1615.76s)
both really hard to get into compared to
[26:58] (1618.24s)
a lot of other companies and it's a it's
[27:00] (1620.08s)
pretty special in some ways. So, we'll
[27:01] (1621.28s)
talk about that, but can you tell me
[27:02] (1622.40s)
like how how is the career kind of
[27:06] (1626.08s)
development? Cuz most people imagine
[27:07] (1627.60s)
like, oh, it's it should be pretty
[27:08] (1628.88s)
straightforward. I spend like I don't
[27:10] (1630.16s)
know two years as a junior, two years as
[27:11] (1631.92s)
a mid roughly, and two years a senior,
[27:13] (1633.68s)
then I get to principal. How does it
[27:15] (1635.20s)
actually work at Amazon?
[27:16] (1636.40s)
I think it's linear up until you hit
[27:18] (1638.88s)
principal, right? So, you know, you
[27:20] (1640.80s)
join, you're a junior developer, you get
[27:22] (1642.80s)
promoted to mid. at mid, you know,
[27:25] (1645.04s)
you're starting to influence the team,
[27:26] (1646.80s)
but but then you get to senior and so
[27:29] (1649.44s)
now your expected impact is at the at
[27:31] (1651.92s)
the team level and then and then there's
[27:35] (1655.04s)
this jump that you get to principal
[27:37] (1657.28s)
And principal is, is it L6?
[27:39] (1659.04s)
Uh principal is L7.
[27:40] (1660.24s)
L7. Yes.
[27:41] (1661.04s)
Yeah. And so I think you really have to
[27:43] (1663.04s)
start with like why is it why is that
[27:45] (1665.44s)
jump so big? Cuz I think at every pretty
[27:47] (1667.20s)
much any other company, it's just a
[27:49] (1669.28s)
linear progression. Like there's nothing
[27:51] (1671.12s)
necessarily special about staff, you
[27:53] (1673.36s)
know, you can just sort of go to that
[27:55] (1675.36s)
level, senior staff and then principal.
[27:57] (1677.60s)
But for some reason, Amazon decided that
[28:00] (1680.16s)
they weren't going to have a staff level
[28:03] (1683.12s)
and and so and and I think they they
[28:05] (1685.92s)
sort of like couched it around like
[28:07] (1687.36s)
having high standards. Basically to get
[28:10] (1690.08s)
from senior to principal you have to do
[28:12] (1692.40s)
like two and a half level jump
[28:14] (1694.24s)
from L6 to L7. Technically it sounds
[28:16] (1696.96s)
like one level but at some other
[28:19] (1699.28s)
companies this might be like uh you know
[28:21] (1701.20s)
L8 L9 or L8 and a half.
[28:23] (1703.52s)
Yeah. And you know so the the the
[28:25] (1705.36s)
handwavy argument is like hey we have
[28:27] (1707.04s)
high standards and like you know it's it
[28:29] (1709.36s)
means something to get to that level.
[28:30] (1710.88s)
It's like fine. But I noticed that some
[28:33] (1713.12s)
of the best engineers that I'd ever
[28:34] (1714.88s)
worked with were having such problems
[28:37] (1717.60s)
getting to principal engineer that they
[28:39] (1719.60s)
ended up moving to Facebook or to Meta
[28:41] (1721.68s)
or to all these other places where the
[28:44] (1724.00s)
progression was just sane. Now
[28:46] (1726.80s)
Staff or senior staff level.
[28:48] (1728.00s)
Now they're senior staff and you know
[28:49] (1729.52s)
principal and distinguished engineer at
[28:51] (1731.52s)
other companies and so
[28:53] (1733.84s)
because we had high standards we
[28:56] (1736.08s)
actually had this brain drain and it
[28:57] (1737.52s)
wasn't a brain drain at lower levels. It
[28:59] (1739.76s)
was that the brain drain at at sort of
[29:01] (1741.44s)
like the higher levels.
[29:03] (1743.52s)
And it was it's just an example of
[29:05] (1745.28s)
something where it's just like why did
[29:06] (1746.64s)
you do that to yourself? And so that's
[29:08] (1748.72s)
the the the context for for being a
[29:11] (1751.36s)
principal at Amazon. you know I
[29:12] (1752.64s)
so it's safe to say it's wicked hard to
[29:14] (1754.16s)
get internally right
[29:16] (1756.16s)
so I you know I I I'm I'm colleagues
[29:18] (1758.88s)
with Ethan Evans and so we we talk about
[29:21] (1761.28s)
what's the hardest promotion at Amazon
[29:24] (1764.16s)
and you know I had made the argument
[29:25] (1765.60s)
that it was you know it was uh senior
[29:27] (1767.76s)
engineer to principal and he's like yeah
[29:30] (1770.32s)
that's hard actually the hardest one
[29:32] (1772.48s)
Steve is you know VP to senior VP cuz
[29:35] (1775.12s)
there's only there's only eight spots or
[29:37] (1777.20s)
10 spots to for that um and maybe 300
[29:40] (1780.24s)
VPs um that are all trying to at this I
[29:42] (1782.32s)
would that's more of a supply and demand
[29:43] (1783.76s)
thing. I will say that at Amazon there
[29:47] (1787.04s)
is gigantic demand for principal
[29:49] (1789.12s)
engineers and so there are roles that
[29:52] (1792.00s)
have been open for years. I think
[29:54] (1794.32s)
something on the order of like 13 months
[29:56] (1796.40s)
or 17 months or something like that to
[29:58] (1798.32s)
get an external hire to um to join as a
[30:01] (1801.84s)
principal engineer. But that metric is
[30:03] (1803.60s)
only calculated when the role is filled.
[30:05] (1805.92s)
And so probably you know there are
[30:08] (1808.32s)
hundreds of principal engineer openings
[30:10] (1810.08s)
at Amazon.
[30:11] (1811.12s)
Mhm. And there are thousands of senior
[30:13] (1813.60s)
engineers
[30:14] (1814.56s)
who desperately want to get there
[30:16] (1816.56s)
putting in the work,
[30:17] (1817.60s)
you know, and so there's this sort of
[30:19] (1819.28s)
like there's this tension,
[30:21] (1821.20s)
right? Um, and I don't think you see
[30:23] (1823.68s)
that at the lower levels. I don't think
[30:25] (1825.60s)
that that's happening at senior or mid
[30:27] (1827.20s)
or junior. And so like that incongruity I
[30:30] (1830.16s)
think is is super interesting. But when
[30:32] (1832.88s)
once you do get to principal engineer,
[30:34] (1834.64s)
one thing that I've never heard any
[30:36] (1836.16s)
other company have is there is
[30:37] (1837.76s)
apparently a principal engineering
[30:39] (1839.12s)
community which is I've heard again from
[30:41] (1841.84s)
other people that it's tightly knit.
[30:43] (1843.60s)
It's actually special. It's actually
[30:45] (1845.52s)
just really nice organization. Can you
[30:46] (1846.96s)
talk about that? So like you know once
[30:48] (1848.16s)
you once you got in there somehow I
[30:50] (1850.64s)
don't know, was it blood, sweat, tears,
[30:52] (1852.48s)
promotion?
[30:53] (1853.28s)
There is a community. I think it's
[30:54] (1854.64s)
actually really great. um my own
[30:57] (1857.60s)
history, you know, I I went from support
[31:00] (1860.64s)
engineer to senior engineer in like four
[31:02] (1862.72s)
years at Amazon, but then from senior to
[31:05] (1865.36s)
principal, it took me eight years and I
[31:08] (1868.08s)
got promoted in uh Q1 of 2020. Turns out
[31:11] (1871.52s)
to be a consequential like year for
[31:14] (1874.00s)
the industry for the world
[31:15] (1875.52s)
that that was forced remote work.
[31:18] (1878.08s)
And so, you know, I got promoted and
[31:19] (1879.84s)
everybody's like, you know,
[31:20] (1880.72s)
congratulations. They used to have like
[31:22] (1882.80s)
a principal engineer offsite where they
[31:24] (1884.88s)
just flew everybody into Seattle or
[31:26] (1886.56s)
nearby and then to to sort of like you
[31:29] (1889.44s)
know um mingle and and to talk to other
[31:32] (1892.00s)
folks. That stopped
[31:33] (1893.68s)
during the pandemic and then um you know
[31:36] (1896.32s)
by the time the pandemic restrictions
[31:38] (1898.16s)
started leaving the population of
[31:40] (1900.40s)
principal engineers had essentially
[31:42] (1902.00s)
doubled. That's still to say like there
[31:44] (1904.16s)
are still hundreds and hundreds of
[31:45] (1905.36s)
openings for principal engineer but then
[31:48] (1908.24s)
the you know the sort of like off-site
[31:50] (1910.08s)
community shifted over to the senior
[31:52] (1912.16s)
principals that I didn't have access to
[31:54] (1914.56s)
but you know at the moment the the
[31:56] (1916.64s)
manifestation of the principal
[31:58] (1918.08s)
engineering community is essentially
[32:00] (1920.40s)
through the Slack channel um which is
[32:03] (1923.04s)
absolutely awesome um and then um we had
[32:06] (1926.72s)
principal offsites for like our local
[32:08] (1928.96s)
organization so like Amazon Music, Prime
[32:11] (1931.28s)
Video, Twitch, that sort of thing. Those
[32:13] (1933.12s)
meetups were were amazing. So the reason
[32:15] (1935.76s)
they were is because of this high
[32:18] (1938.48s)
standard that Amazon had created. And so
[32:20] (1940.80s)
what it meant is that everybody that was
[32:23] (1943.04s)
able to achieve that that overly high
[32:25] (1945.52s)
standard, there's something exceptional
[32:27] (1947.28s)
about them.
[32:28] (1948.48s)
Um there's there's, you know, um they're
[32:31] (1951.12s)
super deep in a particular technology or
[32:33] (1953.52s)
they were associated with, you know, uh
[32:36] (1956.48s)
the growth of a a really large line of
[32:38] (1958.72s)
business either within Amazon or or
[32:40] (1960.72s)
externally. They were essentially
[32:43] (1963.12s)
leaders within the industry and you
[32:46] (1966.80s)
could just literally you could just
[32:48] (1968.40s)
scoop out five people and then put them
[32:51] (1971.92s)
into a room and the conversation is just
[32:54] (1974.16s)
is just amazing, right? And and I would
[32:56] (1976.72s)
I would sort of be like I don't even
[32:58] (1978.16s)
belong here. Like look at this guy, you
[32:59] (1979.84s)
know, he wrote a book on, you know, on
[33:02] (1982.32s)
on a particular topic and and this guy,
[33:04] (1984.96s)
you know, he you know, he was, you know,
[33:07] (1987.28s)
a luminary in in a particular field. and
[33:10] (1990.56s)
then this person just like is an amazing
[33:12] (1992.96s)
code machine and can just write an
[33:15] (1995.04s)
entire application over a weekend and
[33:17] (1997.44s)
then you're like what am I doing here?
[33:19] (1999.92s)
You know, I I I do wonder if that
[33:22] (2002.16s)
community might be coming back now. I I
[33:24] (2004.16s)
know you've left but now Amazon is now
[33:26] (2006.48s)
in person because it sounds like a lot
[33:27] (2007.68s)
of the benefit was the in-person part as
[33:30] (2010.16s)
well because this is what I never heard
[33:31] (2011.76s)
again even before the pandemic. I I
[33:33] (2013.44s)
didn't hear other companies say for
[33:35] (2015.20s)
example at Uber I I've heard that the
[33:37] (2017.28s)
senior staff engineers do get together
[33:38] (2018.80s)
every now and then but it was was very
[33:40] (2020.80s)
grassroots, so it was bottoms up, but
[33:43] (2023.68s)
my understanding is Amazon actually
[33:45] (2025.36s)
invested not just you know some
[33:47] (2027.28s)
principal engineers saying hey let's get
[33:48] (2028.64s)
together but also just kind of you like
[33:51] (2031.44s)
making making sure that that that group
[33:54] (2034.00s)
really had something like I've I I think
[33:56] (2036.32s)
it's smart I think more companies should
[33:57] (2037.68s)
do it but I'm just not seeing it
[33:59] (2039.28s)
the investment was
[34:02] (2042.24s)
um also in terms of headcount. So there
[34:04] (2044.64s)
are program managers and and like
[34:07] (2047.20s)
product managers essentially um that are
[34:10] (2050.64s)
um you know bringing the folks together.
[34:12] (2052.40s)
Awesome.
[34:13] (2053.20s)
There's a there's a wonderful series.
[34:15] (2055.04s)
It's called the Principals of Amazon
[34:16] (2056.56s)
series where you know principal
[34:18] (2058.88s)
engineers will just you know they'll do
[34:20] (2060.48s)
a presentation and it's recorded that's
[34:22] (2062.64s)
been happening for you know 20 years and
[34:26] (2066.32s)
you know we record everything that's
[34:27] (2067.68s)
there but it takes work to actually
[34:29] (2069.52s)
but that internal series that and is
[34:32] (2072.72s)
that open to like everyone at Amazon or
[34:34] (2074.56s)
it's for the principals themselves? It's
[34:35] (2075.92s)
it's open uh for everybody at Amazon to
[34:38] (2078.24s)
consume and then um you know there might
[34:40] (2080.56s)
be some senior engineers and stuff like
[34:42] (2082.32s)
that that that would make a presentation
[34:43] (2083.92s)
that's part of their promotion packet is
[34:45] (2085.60s)
be able to make an Amazon-wide
[34:47] (2087.36s)
presentation
[34:48] (2088.40s)
on a particular thing. My point was
[34:50] (2090.48s)
though that that stuff doesn't just
[34:52] (2092.08s)
happen on its own.
[34:53] (2093.04s)
Yeah. like you have to like you need a
[34:55] (2095.44s)
program manager or multiple folks to
[34:58] (2098.00s)
sort of like herd the cats and to like
[35:01] (2101.20s)
schedule the offsites and to make
[35:03] (2103.76s)
sure that the you know the Slack channel
[35:05] (2105.68s)
doesn't go off the rails, right? And is
[35:07] (2107.36s)
still useful and it's just not going to
[35:09] (2109.44s)
happen like grassroots with just like
[35:12] (2112.24s)
throwing a bunch of people into a room.
[36:03] (2163.92s)
I think, you know, these are the the
[36:05] (2165.52s)
things I mean, we're now exposing a few
[36:08] (2168.48s)
of these things here and there, but some
[36:10] (2170.32s)
of these companies like, you know,
[36:11] (2171.52s)
Amazon is a great example where there's
[36:13] (2173.28s)
more to the eye than what meets the
[36:14] (2174.96s)
surface. So like once you're inside
[36:16] (2176.40s)
Amazon for example you now as an
[36:18] (2178.08s)
engineer even if not a principal
[36:19] (2179.20s)
engineer you now have access to the
[36:20] (2180.96s)
whole you know 20 years of principal
[36:22] (2182.72s)
presentations like when I joined Uber I
[36:24] (2184.72s)
was amazed at how we had the RFC's
[36:27] (2187.44s)
available like I could read all historic
[36:29] (2189.76s)
ones so I think there is and every
[36:31] (2191.60s)
company has its own of course once
[36:33] (2193.76s)
you're in there you have access to this
[36:35] (2195.12s)
like knowledge base which it will just
[36:37] (2197.12s)
never be published it cannot because it
[36:38] (2198.80s)
has you know business sensitive things
[36:40] (2200.72s)
etc. So I think as an engineer like you
[36:42] (2202.72s)
can just really just like like be a
[36:44] (2204.96s)
sponge when when you join especially one
[36:46] (2206.72s)
of the companies that that is known to
[36:48] (2208.24s)
be a bit more open internally even if
[36:50] (2210.24s)
yeah Amazon I think a really interesting
[36:51] (2211.92s)
one because externally it's very closed
[36:53] (2213.52s)
is my sense they're very careful about
[36:55] (2215.04s)
what they share for example the
[36:56] (2216.88s)
postmortems for AWS, very few are
[36:59] (2219.12s)
published externally but internally
[37:01] (2221.76s)
they're all there as I understand there
[37:03] (2223.44s)
as an engineer you can access, you can learn
[37:05] (2225.04s)
from them like in really cool real world
[37:07] (2227.52s)
learnings
[37:08] (2228.24s)
absolutely you know um it is an open
[37:10] (2230.80s)
place internally and we're so selective
[37:13] (2233.28s)
about what we I say we as though I still
[37:15] (2235.52s)
work there but uh what what what they
[37:17] (2237.52s)
publish externally and you know uh the
[37:20] (2240.16s)
the postmortems we call them COE's it's
[37:22] (2242.88s)
a COE stands for
[37:24] (2244.08s)
it's a correction of error, yeah
[37:25] (2245.84s)
it's you know it's this idea that you
[37:27] (2247.60s)
know you have like holes in Swiss cheese
[37:30] (2250.16s)
and and you have like a failure requires
[37:33] (2253.92s)
that there's a there's a hole across
[37:36] (2256.16s)
layers that's the best reading like I
[37:38] (2258.24s)
would just subscribe to the email list
[37:40] (2260.16s)
where they were published internally. So
[37:41] (2261.52s)
you have this like stream of like of
[37:44] (2264.00s)
disasters that are going on within the
[37:45] (2265.92s)
company and you just, you know, you grab
[37:47] (2267.60s)
some popcorn and you you pop open one of
[37:49] (2269.52s)
these COE's and you learn so much from
[37:52] (2272.64s)
that and and I think that that's that's
[37:54] (2274.40s)
part of the secret sauce. The idea and I
[37:56] (2276.72s)
don't know if it's like this for 100% of
[37:59] (2279.12s)
them is that it's a blameless culture
[38:00] (2280.88s)
sort of thing.
[38:02] (2282.08s)
And so to really screw up requires that
[38:05] (2285.52s)
multiple people drop the ball.
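The Swiss cheese picture described above can be put into rough numbers. The sketch below assumes each layer of defense misses a problem independently, which real incidents often violate, and the miss rates are invented for illustration. Under that assumption, the chance that a problem passes every layer is the product of the per-layer miss rates.

```python
from math import prod


def incident_probability(layer_miss_rates: list[float]) -> float:
    """Chance a problem slips through every layer, assuming independent layers."""
    return prod(layer_miss_rates)


# Made-up miss rates for three layers: code review, tests, canary deploy.
three_layers = incident_probability([0.1, 0.1, 0.1])
two_layers = incident_probability([0.1, 0.1])
print(three_layers, two_layers)
```

With these made-up figures, removing one layer makes an incident about ten times more likely, which is why a write-up that traces every hole is more useful than one that names a single cause.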
[38:08] (2288.16s)
Yeah. And you learn so much from that
[38:10] (2290.96s)
that sort of stuff. You know, the the
[38:13] (2293.04s)
brownouts, you know, these uh these
[38:15] (2295.44s)
lessons that you would learn from, you
[38:17] (2297.20s)
know, trying to recover from really
[38:18] (2298.56s)
large dependencies. Those things are
[38:20] (2300.80s)
immortalized inside some of these COE's.
[38:22] (2302.96s)
So, there's some very famous outages
[38:25] (2305.04s)
that happened within Amazon and you
[38:27] (2307.92s)
know, they were an egg on our face and
[38:30] (2310.24s)
but we really really learned those
[38:31] (2311.84s)
lessons through those postmortems.
[38:33] (2313.28s)
They're they're absolutely wonderful. as
[38:35] (2315.04s)
a principal engineer, you know, you we
[38:36] (2316.56s)
so far we kind of glamorized the role
[38:38] (2318.48s)
saying, you know, it is hard to get
[38:39] (2319.52s)
into, but once you're there, you have
[38:40] (2320.72s)
the community, you do this this really
[38:42] (2322.16s)
impactful work. But one of the principal
[38:44] (2324.16s)
engineers uh at Amazon who's still there
[38:46] (2326.32s)
called Bobby Kot Kotari, he collected
[38:49] (2329.68s)
some things that are maybe not as
[38:52] (2332.16s)
glamorous or more challenging about
[38:53] (2333.68s)
principal engineering. He had five of of
[38:56] (2336.40s)
these things or five or six. I just want
[38:58] (2338.08s)
to go through with you and and your take
[39:00] (2340.16s)
on them. The first he wrote, "There is
[39:01] (2341.92s)
this paradox of belonging that you're
[39:03] (2343.84s)
part of of all teams yet you're part of
[39:05] (2345.92s)
none." What does that mean?
[39:08] (2348.32s)
Yeah. No, so I uh Avoc was actually a a
[39:12] (2352.48s)
peer of mine. We worked in Prime Video
[39:14] (2354.40s)
together.
[39:16] (2356.16s)
So he's he's an awesome dude. Yeah.
[39:18] (2358.08s)
There's there are all of these paradoxes
[39:19] (2359.68s)
and and uh this paradox of belonging is
[39:23] (2363.28s)
is is a really interesting one. You
[39:26] (2366.24s)
know, you work for the organization,
[39:28] (2368.16s)
right? you're working across teams,
[39:30] (2370.32s)
right? So, as a senior engineer, you're
[39:32] (2372.48s)
working on you're embedded on a team
[39:34] (2374.80s)
and you know, you own the team's
[39:36] (2376.32s)
architecture, the the operations, you
[39:38] (2378.72s)
know, the software development life
[39:40] (2380.48s)
cycle and the design. But when you get
[39:43] (2383.44s)
to that next level where you're working
[39:45] (2385.04s)
across teams, um you kind of operate in
[39:48] (2388.56s)
this weird layer where, you know, you're
[39:51] (2391.20s)
not on pager duty for a particular team.
[39:53] (2393.92s)
Mhm. um you have visibility across all
[39:56] (2396.80s)
of these teams that are there. You're
[39:58] (2398.80s)
helping to guide and make decisions, but
[40:01] (2401.12s)
you're literally not on the ground floor
[40:04] (2404.00s)
anymore.
[40:05] (2405.12s)
And so, you know, when you work with a
[40:07] (2407.20s)
particular team, you know, you might
[40:09] (2409.12s)
call the senior engineers or the
[40:10] (2410.40s)
mid-level engineers in and be like,
[40:11] (2411.76s)
"Hey, let's whiteboard some stuff. Like,
[40:13] (2413.36s)
let's try to figure out what's going
[40:14] (2414.48s)
on." You're not on the team. You're kind
[40:16] (2416.40s)
of this like adviser that's sort of
[40:18] (2418.56s)
coming in,
[40:20] (2420.00s)
right? But then, you know, maybe a
[40:22] (2422.00s)
director or a VP would call you in and
[40:24] (2424.32s)
say like, "Hey, what do I own? Like,
[40:25] (2425.92s)
what's going on? Explain to me this
[40:27] (2427.44s)
outage or tell me why we can't build
[40:29] (2429.36s)
this thing."
[40:30] (2430.72s)
And then you're you're trying to
[40:32] (2432.08s)
whiteboard the architecture and the
[40:33] (2433.92s)
system and you're trying to say like,
[40:35] (2435.12s)
"Hey, you know, this is what's going on
[40:38] (2438.24s)
on the ground floor."
[40:40] (2440.32s)
But you weren't, you know, you weren't
[40:41] (2441.68s)
part of that team. So, you're just sort
[40:43] (2443.04s)
of operating in this this sort of strata
[40:45] (2445.36s)
where, you know, you don't really belong
[40:47] (2447.76s)
on a team. you know, I'm a I'm an
[40:49] (2449.68s)
immigrant. I think you are uh as well.
[40:52] (2452.32s)
And you know, my parents came from from
[40:54] (2454.72s)
Asia. I'm not Asian, right? So, when I
[40:58] (2458.16s)
go back to Asia, I'm definitely from
[40:59] (2459.84s)
from the US. And then growing up in this
[41:01] (2461.44s)
country, it was just like, you know, I'm
[41:04] (2464.00s)
I'm uh you know, not quite an American,
[41:06] (2466.80s)
right? And so you you sort of operate in
[41:08] (2468.96s)
this sort of you know area in the gaps
[41:11] (2471.28s)
where you your identity is is is really
[41:14] (2474.64s)
defined by not being squarely in one of
[41:17] (2477.04s)
these predefined categories. So it's
[41:18] (2478.96s)
very similar to that as a principal
[41:21] (2481.20s)
engineer. You're not on the ground
[41:22] (2482.64s)
floor. You're not checking in. You will
[41:24] (2484.24s)
check in code but you're not necessarily
[41:26] (2486.08s)
part of that team embedded on that team.
[41:28] (2488.48s)
And even if you are for a short time
[41:30] (2490.32s)
it's usually a short time and like
[41:31] (2491.76s)
tomorrow the director will call you up and
[41:33] (2493.60s)
say like hey Steve we need you on this
[41:35] (2495.44s)
other team. they're in trouble. Move
[41:37] (2497.28s)
over. Like,
[41:38] (2498.08s)
yeah. And you parachute in and then, you
[41:40] (2500.16s)
know, then they're like, "Oh, who's this
[41:41] (2501.68s)
guy?" You know, and then your your
[41:43] (2503.84s)
director is like, "What's going on? What
[41:45] (2505.92s)
what happened during this outage? Why
[41:47] (2507.52s)
is, you know, why is the why is the
[41:49] (2509.52s)
press writing about us?"
[41:51] (2511.20s)
And then you're like, well, you know,
[41:52] (2512.72s)
here's what's happening on the ground,
[41:54] (2514.00s)
but you're not really embedded on that
[41:56] (2516.24s)
team. Which leads us to the next paradox
[41:58] (2518.24s)
that Bavik said. He lists a few of
[42:00] (2520.40s)
the paradoxes, which is freedom and
[42:01] (2521.68s)
responsibility. And he writes that you
[42:03] (2523.68s)
enjoy significant autonomy in being able
[42:05] (2525.44s)
to choose what you work on. However,
[42:07] (2527.60s)
there's an implicit expectation and
[42:09] (2529.76s)
accountability for resounding impact.
[42:12] (2532.00s)
Yeah. So, you know, I you know, I
[42:14] (2534.80s)
reported to a VP right before I uh left
[42:17] (2537.28s)
the company and uh
[42:18] (2538.72s)
so they were your manager basically.
[42:20] (2540.00s)
Yeah, my manager was a was a VP.
[42:21] (2541.84s)
Oh, wow. That's
[42:24] (2544.80s)
I I I don't hear many companies having
[42:27] (2547.04s)
engineers report into VPs.
[42:29] (2549.60s)
that doesn't seem very standard. um you
[42:31] (2551.60s)
know and so the the org that he owned I
[42:33] (2553.60s)
you know I considered myself the the
[42:35] (2555.28s)
tech adviser for that organization was
[42:37] (2557.28s)
about 450 people uh 450 software
[42:40] (2560.56s)
developers
[42:41] (2561.68s)
and what did our one-on-ones consist of
[42:44] (2564.80s)
right like when I when I would have our
[42:46] (2566.96s)
one-on-one it wasn't like hey here's you
[42:49] (2569.84s)
know he didn't assign me work he wasn't
[42:52] (2572.32s)
like hey I need you to build this thing
[42:54] (2574.48s)
I need you to design this thing the
[42:56] (2576.88s)
context that he set was basically like
[42:58] (2578.64s)
here's a direction right that you need
[43:01] (2581.20s)
to go and
[43:03] (2583.36s)
the way that you can achieve that type
[43:05] (2585.52s)
of impact was up to me.
[43:08] (2588.48s)
Right. So he might say something like
[43:10] (2590.00s)
hey availability is so important for you
[43:13] (2593.84s)
know uh live sports. We just signed you
[43:16] (2596.56s)
know billion-dollar contracts with these
[43:18] (2598.32s)
sports leagues and so we need to
[43:20] (2600.64s)
increase our availability posture.
[43:22] (2602.72s)
Mhm. And then I would be like, "Okay."
[43:26] (2606.40s)
And then I would go away and we would
[43:28] (2608.72s)
come back and I would be like, you know,
[43:30] (2610.96s)
here's what I'm working on, right? Like
[43:33] (2613.44s)
that type of dynamic. I don't this does
[43:36] (2616.80s)
not exist at the senior engineer and below
[43:38] (2618.80s)
level where you're basically telling
[43:40] (2620.56s)
your boss what's happening. I I was
[43:42] (2622.64s)
about to say that when you said my my
[43:44] (2624.80s)
manager one-on-ones, he didn't tell me
[43:46] (2626.24s)
what to do. I'm like most engineers
[43:47] (2627.52s)
would be like, "Sign me up." Like I I
[43:48] (2628.96s)
don't want, you know, we all hate
[43:50] (2630.00s)
micromanagement. But now when you're
[43:51] (2631.92s)
telling me like he would say like, "Oh,
[43:53] (2633.68s)
so we just signed a billion dollar
[43:54] (2634.96s)
contract. Availability is important and
[43:57] (2637.04s)
then stops talking." I'm like, "That
[43:59] (2639.20s)
sounds uncomfortable."
[44:01] (2641.60s)
And and and basically like you're kind
[44:03] (2643.04s)
of expected a little bit to like
[44:04] (2644.48s)
understand what he's expecting even
[44:06] (2646.16s)
though he doesn't know. And then and I'm
[44:07] (2647.84s)
assuming, you know, there's two ways of
[44:09] (2649.28s)
going, right? You go back on the next
[44:10] (2650.64s)
one-on-one and you say something and
[44:12] (2652.16s)
he's like like Steve like you're a
[44:14] (2654.32s)
principal engineer. This is not what I
[44:15] (2655.60s)
expect of you and you don't want that.
[44:17] (2657.92s)
whereas this, you know, if if you bring
[44:19] (2659.52s)
back the right things. So, sounds like
[44:20] (2660.88s)
you really need to uplevel in like
[44:22] (2662.32s)
understanding how like these people
[44:24] (2664.32s)
think.
[44:25] (2665.28s)
Absolutely. And so, he's, you know, he's
[44:26] (2666.96s)
accountable to to his boss as well. And,
[44:29] (2669.76s)
you know, don't get me wrong, I I
[44:31] (2671.20s)
didn't, you know, I I had a I owned
[44:33] (2673.36s)
aspects of availability. You know,
[44:34] (2674.80s)
there's a multi-thousand-person organization
[44:37] (2677.36s)
at Prime Video doing this stuff, but we
[44:39] (2679.36s)
owned the the live sports aspect of
[44:41] (2681.04s)
this. Um, and you know, there are
[44:43] (2683.20s)
playback teams, there are, you know,
[44:44] (2684.88s)
recommendation teams, there, you know,
[44:46] (2686.88s)
there's so many different teams that are
[44:48] (2688.24s)
there that had to to really step up and
[44:50] (2690.64s)
and uh make sure that availability was
[44:52] (2692.80s)
good. But he would say something like,
[44:55] (2695.28s)
hey, you know, what is our availability
[44:57] (2697.52s)
posture for certain aspects and I would
[45:00] (2700.80s)
have to go and figure it out. Yeah.
[45:02] (2702.80s)
Like where what are we measuring? What
[45:04] (2704.40s)
are we not measuring? there's a deadline
[45:06] (2706.48s)
for, you know, the start of a season uh
[45:08] (2708.88s)
where we're expecting, you know,
[45:10] (2710.16s)
millions and millions of concurrent uh
[45:12] (2712.32s)
to come in. Um what can we do between
[45:15] (2715.28s)
now and then, right? And then if we do
[45:17] (2717.44s)
write some software like what what is
[45:20] (2720.08s)
the highest leverage piece of software
[45:21] (2721.76s)
that we could create that would increase
[45:23] (2723.76s)
our availability posture. And so the way
[45:25] (2725.36s)
that I I sort of describe it to people
[45:27] (2727.28s)
is you are assigned not a problem, not
[45:31] (2731.36s)
even a problem space, you're assigned a
[45:32] (2732.96s)
direction. You can solve the problem
[45:34] (2734.32s)
with code. You can solve the problem
[45:36] (2736.00s)
with system design and architecture, but
[45:38] (2738.48s)
you could also solve the problem say by,
[45:40] (2740.56s)
you know, I don't know, hey, maybe
[45:42] (2742.16s)
there's some off-the-shelf software we
[45:43] (2743.60s)
should purchase.
[45:44] (2744.64s)
Um, maybe there's a dev team that we
[45:46] (2746.96s)
should start to spin up right now, um,
[45:49] (2749.52s)
whose job it is to do this particular
[45:52] (2752.00s)
thing. Maybe we've identified a piece of
[45:54] (2754.96s)
software and it's already been scoped
[45:56] (2756.80s)
that this team needs to go and build,
[45:59] (2759.28s)
but it's not a priority for them. now we
[46:02] (2762.40s)
need to go and figure out like you know
[46:03] (2763.92s)
how we can get them to do it. Can we
[46:05] (2765.36s)
shuffle around resources? That sort of
[46:07] (2767.52s)
thing. And so the way I describe it is
[46:08] (2768.88s)
like there's so many more things on the
[46:11] (2771.44s)
that you can use to solve the problem.
[46:14] (2774.40s)
And I don't think people recognize that.
[46:16] (2776.48s)
They they think that it's just oh when
[46:17] (2777.84s)
you're a principal like you just like
[46:20] (2780.00s)
code a lot and it's just really
[46:21] (2781.52s)
complicated
[46:22] (2782.16s)
or or do more meetings, you know, that's
[46:23] (2783.76s)
what happens.
[46:24] (2784.96s)
I mean at the end of the day like don't
[46:26] (2786.32s)
get me wrong, there's a ton of meetings
[46:27] (2787.68s)
that go on.
[46:28] (2788.48s)
Yeah. Yeah. But but this is I I I think
[46:30] (2790.56s)
it's good to like like shine light
[46:31] (2791.92s)
because I also feel like once it sounds
[46:34] (2794.16s)
like a big change, but I also kind of
[46:35] (2795.84s)
feel if if you get good at this, you
[46:38] (2798.08s)
might not really want to go back to, you
[46:40] (2800.64s)
know, having a manager who's like, "All
[46:41] (2801.92s)
right, here's a project. We need to
[46:43] (2803.60s)
solve like, you know, scope it out and
[46:45] (2805.20s)
which you can do, right?"
[46:46] (2806.88s)
that that's cool. And now the next
[46:48] (2808.48s)
challenge that Bavik said was this all
[46:50] (2810.80s)
sounds great, but there's apparently
[46:52] (2812.24s)
a bandwidth challenge. So it's, you've
[46:54] (2814.40s)
become this like social resource where
[46:56] (2816.72s)
people just pull you into everything and
[46:58] (2818.40s)
you're reading.
[46:59] (2819.92s)
Yeah. No, you know, I think I I wish I
[47:02] (2822.00s)
had taken a screenshot, but you know, I
[47:03] (2823.84s)
have my Outlook calendar, right? So it's
[47:05] (2825.36s)
my schedule. My day looked like most
[47:08] (2828.56s)
people's week, so it looked like
[47:11] (2831.04s)
somebody had just like blew up a Tetris
[47:13] (2833.36s)
factory. Like there there was like I
[47:15] (2835.44s)
would have triple or quadruple booked on
[47:17] (2837.52s)
a Monday all through the day.
[47:19] (2839.28s)
So you would have the manager calendar
[47:20] (2840.64s)
as an IC.
[47:22] (2842.00s)
Yeah. And it's it's absolutely crazy
[47:24] (2844.24s)
because and you know for that large org
[47:26] (2846.24s)
that I was supporting everybody just
[47:28] (2848.88s)
added me as optional or or they might
[47:31] (2851.60s)
try to say like no you're actually
[47:33] (2853.04s)
required for all of these meetings but
[47:34] (2854.64s)
when you have you have a triple booked
[47:36] (2856.24s)
calendar and you're required for this
[47:37] (2857.84s)
stuff you just learn that you're going
[47:40] (2860.16s)
to have to disappoint a lot of people.
[47:42] (2862.24s)
Yeah. And so it's it's this sort of like
[47:44] (2864.80s)
uh you know um this thing where it's
[47:46] (2866.64s)
like it's almost easier to say no now
[47:48] (2868.64s)
that you're obscenely overbooked versus
[47:51] (2871.60s)
when you're a senior engineer you're
[47:52] (2872.96s)
like I don't have time to write code but
[47:55] (2875.52s)
there's just barely enough time in
[47:58] (2878.00s)
between the cracks.
[47:59] (2879.76s)
And so I think that uh it's almost like
[48:02] (2882.32s)
when it when your schedule breaks that's
[48:04] (2884.48s)
when you are finally freed because you
[48:06] (2886.48s)
know that you can sort of say no to
[48:07] (2887.92s)
stuff. But ultimately, if I just went to
[48:10] (2890.16s)
all of the meetings that everybody said
[48:11] (2891.68s)
that I would have to go to, I would be a
[48:13] (2893.28s)
professional meeting attender and I
[48:15] (2895.12s)
would literally have no time to do the
[48:17] (2897.28s)
And then Bavik follows up on this next
[48:19] (2899.76s)
challenge, which is being truly present.
[48:21] (2901.52s)
And he writes, I think it's almost like,
[48:23] (2903.84s)
you know, he was sitting next to you.
[48:25] (2905.04s)
You find yourself physically present in
[48:26] (2906.64s)
one meeting while your mind is already
[48:28] (2908.16s)
racing against next three.
[48:29] (2909.68s)
You know, it's it's a it's a really big
[48:31] (2911.52s)
challenge. You know, I I pride myself on
[48:34] (2914.32s)
being a good communicator and being
[48:36] (2916.00s)
present. And when there there are 20
[48:38] (2918.88s)
things that are going on in the air or
[48:40] (2920.88s)
100 things that are going on, it's just
[48:43] (2923.20s)
really really difficult to stay single
[48:45] (2925.60s)
threaded. Um, and what I ended up having
[48:49] (2929.44s)
to do is to to sort of say like, okay, I
[48:52] (2932.00s)
could do all of these things and they
[48:53] (2933.60s)
would be really impactful, but I just
[48:55] (2935.52s)
had to aggressively prioritize and say,
[48:57] (2937.84s)
you know, for the availability, I'm just
[48:59] (2939.76s)
looking at availability. there's all
[49:01] (2941.28s)
these other fires that are going on
[49:03] (2943.28s)
which is disappointing
[49:05] (2945.36s)
because there there's so many things
[49:07] (2947.12s)
that you know you could be focusing on.
[49:09] (2949.52s)
It's it's it's super difficult. And so I
[49:11] (2951.92s)
you know I work with a lot of people to
[49:13] (2953.36s)
try to get them to the next level and
[49:14] (2954.64s)
they say Steve well I'm completely
[49:16] (2956.16s)
overwhelmed. There are like 20 things
[49:18] (2958.00s)
that are going on. Um and I tell them
[49:20] (2960.96s)
like you think it gets easier when you
[49:23] (2963.84s)
get higher level there's just going to
[49:25] (2965.36s)
be more and more things on your plate.
[49:27] (2967.04s)
Why wait until you burn out or you
[49:29] (2969.60s)
break? you can just start implementing
[49:31] (2971.20s)
these things now. So every high level
[49:32] (2972.80s)
tech IC I know, and managers included,
[49:35] (2975.52s)
they have a wonderful system in order to
[49:38] (2978.80s)
like isolate signal and then cut out the
[49:41] (2981.04s)
noise and if you don't have that you
[49:43] (2983.44s)
literally won't survive but it just at
[49:45] (2985.04s)
the at the principal level and above
[49:46] (2986.48s)
it's just it's just amplified that much
[49:48] (2988.40s)
more. I'm getting a sense that a lot of
[49:50] (2990.88s)
the work as you do as a principal
[49:52] (2992.48s)
engineer I mean most there's huge
[49:54] (2994.32s)
amounts of software engineering and you
[49:55] (2995.84s)
need to be uh you know just just really
[49:58] (2998.00s)
good at at building resilient systems
[50:01] (3001.76s)
learning about new technologies you know
[50:03] (3003.60s)
for example today I'm assuming whoever
[50:05] (3005.36s)
is a principal engineer at Amazon they
[50:06] (3006.88s)
are expected to just know everything about
[50:08] (3008.96s)
LLMs, trade-offs, characteristics, etc.,
[50:11] (3011.52s)
because they're anyway but you also need
[50:14] (3014.08s)
to just become do the skills that
[50:16] (3016.40s)
managers have which is managing your
[50:18] (3018.64s)
time uh changing contexts, figure out
[50:22] (3022.00s)
how to get that focus time like you know
[50:24] (3024.00s)
contrary to popular belief like managers
[50:26] (3026.16s)
actually need focus time. So like you
[50:27] (3027.76s)
know I I will also always try to carve
[50:29] (3029.68s)
out some time but you're now doing it
[50:32] (3032.32s)
while your title is not manager but
[50:34] (3034.00s)
actually it's it's it feels like you
[50:35] (3035.60s)
combine a manager, a lot of manager
[50:37] (3037.60s)
responsibilities and a lot of you know
[50:39] (3039.12s)
like experienced engineer and boom you
[50:41] (3041.12s)
get the principal engineer role. Oh the
[50:42] (3042.72s)
only upside is like you don't need to do
[50:44] (3044.16s)
performance reviews for people.
[50:45] (3045.28s)
Congratulations you saved a little bit
[50:46] (3046.56s)
of that. Well, actually during
[50:48] (3048.88s)
performance review season, they pull the
[50:50] (3050.80s)
principal engineers in cuz if you're if
[50:52] (3052.88s)
you're So, you know, if you're stack
[50:54] (3054.72s)
ranking people, okay, cool. Well, we'll
[50:57] (3057.04s)
need to take a look at their performance
[50:58] (3058.56s)
check. So, I reported to a VP, you know,
[51:01] (3061.04s)
one of my peers was a director and he
[51:03] (3063.28s)
was basically like, "Hey, Steve, I would
[51:04] (3064.72s)
like you to show up to my performance
[51:06] (3066.24s)
review for my entire org of a hundred
[51:08] (3068.88s)
something people." And I'm like, "I
[51:10] (3070.40s)
can't do that for you and for everybody
[51:12] (3072.48s)
else." Okay. So now so now it would make
[51:14] (3074.64s)
sense why as a principal engineer your
[51:16] (3076.32s)
compensation package will be similar to
[51:18] (3078.24s)
like uh is it a senior engineering
[51:20] (3080.48s)
manager or something like that
[51:21] (3081.76s)
around that
[51:22] (3082.48s)
around that but basically like the job
[51:24] (3084.88s)
is has a lot of overlaps okay the
[51:27] (3087.68s)
benefit is you're not the one delivering
[51:29] (3089.44s)
the performance review the direct report
[51:31] (3091.68s)
but you're doing almost everything else
[51:33] (3093.36s)
or in terms of the effort I'm talking
[51:35] (3095.68s)
about.
[51:36] (3096.80s)
Okay. So, having been a principal
[51:39] (3099.28s)
engineer for 4 years, what are the good
[51:41] (3101.52s)
things that you really really liked
[51:42] (3102.96s)
about Amazon, specifically Amazon's
[51:45] (3105.20s)
principal engineer role? And what are
[51:46] (3106.72s)
some of the, you know, not so good or it
[51:50] (3110.24s)
could have been better things?
[51:51] (3111.44s)
I mean, the the great parts are you get
[51:54] (3114.72s)
visibility that you just couldn't
[51:56] (3116.48s)
possibly have at the team level. you
[51:58] (3118.64s)
know, within a large organization like
[52:00] (3120.56s)
Prime Video or wherever you're at, there
[52:03] (3123.12s)
are many thousands of people that are
[52:05] (3125.36s)
working within that organization doing
[52:07] (3127.68s)
so many things, right? And and typically
[52:10] (3130.08s)
the performance of these people is
[52:11] (3131.52s)
really high. There's so many different
[52:13] (3133.20s)
directions that are going on. And so to
[52:15] (3135.60s)
survive, you kind of have to look inward
[52:17] (3137.44s)
and you say, "Okay, well, here's my
[52:18] (3138.96s)
service boundary. Here's all the
[52:20] (3140.24s)
software I own. I'm going to own
[52:22] (3142.16s)
everything within the sphere of
[52:23] (3143.44s)
ownership." because you've built this
[52:25] (3145.12s)
wall up, you tend not to be able to see
[52:27] (3147.68s)
like that broader picture.
[52:29] (3149.52s)
And so, as a principal engineer, I think
[52:31] (3151.36s)
it's really awesome to be able to sort
[52:33] (3153.12s)
of like spelunk and and be able to go to
[52:35] (3155.44s)
different teams and and sort of see that
[52:37] (3157.52s)
broader picture. And I just don't I
[52:40] (3160.24s)
don't see a way that you would be able
[52:41] (3161.60s)
to get that vis that type of visibility
[52:43] (3163.76s)
that's super interesting um at a lower
[52:46] (3166.16s)
level. Mhm.
[52:47] (3167.04s)
You know, I think the other thing is
[52:48] (3168.56s)
like, you know, whether it's it's
[52:50] (3170.32s)
warranted or not, you do get some amount
[52:51] (3171.92s)
of status when you go to a meeting,
[52:53] (3173.92s)
people just listen to you. They listen
[52:56] (3176.24s)
to your harebrained ideas and it's kind
[52:58] (3178.40s)
of nice because you don't necessarily
[52:59] (3179.84s)
have to like prove yourself over and
[53:02] (3182.08s)
over again, right?
[53:03] (3183.36s)
It's a bit less like professional like
[53:05] (3185.60s)
not fights, but just establishing that
[53:08] (3188.24s)
you know what you're talking about.
[53:09] (3189.76s)
Yeah. Yeah. Um, now the bad things are,
[53:13] (3193.60s)
you know, uh, there's a lot of folks
[53:16] (3196.00s)
that are really good in tech and being
[53:17] (3197.52s)
really effective as a principal
[53:18] (3198.72s)
engineer, but then they also, you know,
[53:21] (3201.28s)
myself included, they're like, "Okay,
[53:23] (3203.04s)
cool. Well, that sort of makes me an
[53:24] (3204.64s)
expert in pretty much everything." And
[53:26] (3206.96s)
so you would get these principal
[53:28] (3208.32s)
engineers together. We had a weekly
[53:29] (3209.60s)
meeting and and so it would be like okay
[53:32] (3212.16s)
if you wanted to talk about like
[53:33] (3213.68s)
establishing a constitution for a small
[53:35] (3215.76s)
island nation all of a sudden they would
[53:37] (3217.68s)
just be like well like here the main
[53:39] (3219.20s)
considerations is like we nobody has a
[53:41] (3221.12s)
background in government policy but all
[53:43] (3223.92s)
of a sudden like just because you're
[53:45] (3225.44s)
sort of trained to do so you start to
[53:47] (3227.44s)
like pitch in you're like well actually
[53:49] (3229.36s)
you know maybe we should have two
[53:50] (3230.48s)
branches of government or three branches
[53:51] (3231.92s)
of government and and it just sounds
[53:54] (3234.16s)
like we would know what we're we're
[53:55] (3235.60s)
doing but we don't and so there's this
[53:58] (3238.88s)
trap and and again I've fallen into it
[54:00] (3240.80s)
many times where you actually think
[54:02] (3242.24s)
you're an expert in one thing but you're
[54:05] (3245.60s)
actually not right and so you know take
[54:07] (3247.60s)
LLMs there's a ton of folks that
[54:10] (3250.08s)
understand AI I left before it was sort
[54:12] (3252.88s)
of like allowed to use internally but I
[54:15] (3255.20s)
think you can
[54:16] (3256.08s)
use it now um I'm not an expert in LLMs
[54:19] (3259.92s)
at all but I I do think that um the
[54:23] (3263.20s)
expectation would be that you understand
[54:26] (3266.24s)
you know how they work but then the
[54:28] (3268.08s)
expectations also like hey what should
[54:30] (3270.24s)
our policy be how should we be thinking
[54:32] (3272.16s)
about this stuff
[54:33] (3273.84s)
and I think that's fine for mature
[54:36] (3276.96s)
technologies potentially like you can
[54:38] (3278.64s)
ramp yourself up for it but as like that
[54:40] (3280.72s)
particular landscape is changing so
[54:42] (3282.56s)
quickly I think there's this sort of
[54:44] (3284.80s)
trap where you you sort of you speak as
[54:47] (3287.12s)
an authority even though you haven't had
[54:49] (3289.44s)
the requisite time to ramp up at
[54:51] (3291.20s)
something
[54:51] (3291.92s)
and you've been there for 17 years at at
[54:54] (3294.00s)
Amazon what are your favorite parts of
[54:55] (3295.60s)
the culture like I I you know there's a
[54:57] (3297.44s)
lot of things that uh there's values
[55:00] (3300.32s)
that we all know, like frugality,
[55:02] (3302.80s)
customer obsession what what were the
[55:05] (3305.20s)
things that you're that you found to be
[55:07] (3307.36s)
like the most interesting or the ones
[55:08] (3308.88s)
that had lasting impact and how did they
[55:10] (3310.72s)
change how did Amazon change over 17
[55:12] (3312.80s)
years they must have changed
[55:14] (3314.48s)
no I I think the the things I missed the
[55:17] (3317.12s)
most um and in the secret sauce yeah the
[55:19] (3319.84s)
the leadership principles are good but I
[55:21] (3321.76s)
think the actual secret sauce there is
[55:24] (3324.56s)
principled thinking Mhm.
[55:26] (3326.32s)
Right. Yeah. So, you know, there's, you
[55:29] (3329.12s)
know, uh, invent and simplify and bias
[55:31] (3331.20s)
for action and all of this stuff, but
[55:33] (3333.28s)
like ultimately the thing that is
[55:37] (3337.12s)
amazing about those leadership
[55:38] (3338.72s)
principles aren't the specific stances
[55:40] (3340.72s)
that they took. So, they decided that
[55:42] (3342.48s)
customer obsession is a big deal. They
[55:44] (3344.08s)
decided that bias for action is a big
[55:46] (3346.32s)
deal. All of these things. But really, if you
[55:48] (3348.32s)
if you looked at a meta level, you'd be
[55:50] (3350.08s)
like, "Oh, these guys have principles
[55:51] (3351.92s)
that they won't budge on." I sort of
[55:53] (3353.68s)
think about it in terms of math and
[55:55] (3355.60s)
axioms like you just take certain things
[55:58] (3358.16s)
to be true. You know, two lines that are
[56:01] (3361.36s)
parallel if you extend them out to
[56:02] (3362.96s)
infinity won't touch, they won't
[56:04] (3364.80s)
touch each other.
[56:05] (3365.84s)
Yeah. You assume that's true.
[56:07] (3367.28s)
Yeah. You you don't you don't prove
[56:08] (3368.80s)
that. It's an axiom and then based off
[56:10] (3370.72s)
of that you're able to build a system of
[56:13] (3373.12s)
mathematics, right? And so it's the same
[56:15] (3375.52s)
thing with the corporate leadership
[56:17] (3377.12s)
principles at Amazon. They basically
[56:19] (3379.52s)
said, "Okay, we are going to fix these
[56:22] (3382.16s)
things to be true." There are 16 or 12
[56:24] (3384.40s)
or I don't know, they just sort of built
[56:26] (3386.64s)
and now they're 16
[56:28] (3388.48s)
and um but there are like four or five
[56:31] (3391.60s)
that are just really core to to Amazon
[56:34] (3394.96s)
and we just fix those things to be true.
[56:37] (3397.12s)
Which which ones were the ones that you
[56:38] (3398.64s)
felt were the most present?
[56:40] (3400.88s)
Customer obsession. We are absolutely
[56:43] (3403.28s)
customer obsessed. We'll just burn money
[56:45] (3405.36s)
to to delight a customer. You can you
[56:47] (3407.44s)
can be in a meeting with a VP as an
[56:49] (3409.28s)
intern and you say hey that's a bad
[56:51] (3411.36s)
customer experience. It would be like a
[56:52] (3412.80s)
needle coming off a record. It would
[56:54] (3414.56s)
just be like what what are you talking
[56:55] (3415.92s)
about like immediately right? You know
[56:57] (3417.76s)
bias for action. Uh so like just get
[57:00] (3420.32s)
some stuff done. Stop asking for
[57:01] (3421.84s)
permission. Just like go and do it,
[57:03] (3423.44s)
right? Ownership it's just like you own
[57:05] (3425.60s)
your software, you run the you know you
[57:07] (3427.76s)
do the operations, you know you own the
[57:10] (3430.24s)
bug count, all of this stuff, right? Um,
[57:12] (3432.72s)
so those are the ones that are like
[57:14] (3434.16s)
those are fixed and then you start
[57:16] (3436.32s)
layering things on top of it and I think
[57:18] (3438.48s)
it's really great and but you know you
[57:19] (3439.84s)
could you could take Amazon and you
[57:21] (3441.44s)
could have like the you know evil goatee
[57:23] (3443.52s)
version of Amazon which is just sort of
[57:25] (3445.12s)
the opposite of those things and that
[57:26] (3446.88s)
would still be a really valid and
[57:28] (3448.56s)
awesome company. So you could say okay
[57:30] (3450.72s)
well what's the opposite of customer
[57:32] (3452.08s)
obsession? It's not customer obsession
[57:34] (3454.24s)
or not not being customer obsessed.
[57:35] (3455.92s)
I I I think it's you know like being
[57:37] (3457.92s)
about your staff. Yeah, which is Google.
[57:42] (3462.24s)
It could be like, hey, we really care
[57:43] (3463.92s)
about our people above everything else.
[57:45] (3465.60s)
Or it could be, you know, um let's not
[57:48] (3468.08s)
mince around it. We care about topline
[57:49] (3469.84s)
or bottom line revenue. Yeah,
[57:51] (3471.28s)
that's totally valid, right? And then
[57:53] (3473.44s)
you could just fix that. You wouldn't
[57:54] (3474.88s)
you can't prove that, you know, being uh
[57:57] (3477.12s)
you know, staff focused is a bad thing.
[57:59] (3479.12s)
You just build that and then you know a
[58:01] (3481.12s)
certain set of of things will happen
[58:02] (3482.88s)
like great things are going to happen
[58:04] (3484.32s)
and then like not so great things are
[58:06] (3486.16s)
going to happen. those not great things
[58:07] (3487.84s)
that happen, you can try to mitigate
[58:09] (3489.76s)
them, but you can't fix them because you
[58:12] (3492.00s)
have started with this principled
[58:13] (3493.60s)
approach to everything.
[58:14] (3494.72s)
Yeah. Yeah. It it it all goes like every
[58:17] (3497.36s)
everything has.
[58:18] (3498.48s)
I I see what you mean, but I I think
[58:20] (3500.08s)
what you're saying is like it it might
[58:22] (3502.08s)
be less about what the specific
[58:24] (3504.16s)
principles are. I mean, Amazon has
[58:25] (3505.60s)
theirs and we know about them, but it's
[58:27] (3507.12s)
just sticking to them and not keeping
[58:28] (3508.72s)
wiggling cuz because if you keep
[58:30] (3510.08s)
wiggling, it's like what what's the
[58:31] (3511.84s)
point, right? then then you're going to
[58:32] (3512.96s)
have a really, like, mediocre, not
[58:35] (3515.52s)
truly standout company, whatever you
[58:38] (3518.08s)
what does it actually mean to be
[58:39] (3519.52s)
principled and to not bend when it could
[58:42] (3522.08s)
be really easy to do so so that's a
[58:44] (3524.00s)
that's an amazing secret sauce of
[58:45] (3525.68s)
Amazon's. People look at the leadership
[58:47] (3527.04s)
principles. I'm like, no, it's principled
[58:48] (3528.48s)
thinking. Another thing
[58:49] (3529.84s)
a lot of this honestly from what I
[58:51] (3531.52s)
understand talking to you earlier and
[58:53] (3533.12s)
some other people a lot of it probably
[58:54] (3534.32s)
comes from Jeff Bezos being from the top
[58:56] (3536.88s)
down being very principled and not not
[58:58] (3538.88s)
giving not saying we we will do whatever
[59:02] (3542.32s)
it takes. Sounds like it was customer
[59:04] (3544.56s)
obsession initially and then some other
[59:06] (3546.00s)
things.
[59:06] (3546.48s)
Yeah. Yeah. Absolutely. And he's he was
[59:08] (3548.56s)
he was an absolute genius uh when it it
[59:10] (3550.80s)
came through. So I'm a I'm a you know
[59:12] (3552.16s)
I'm a Jeff Bezos fanboy. Um for sure
[59:15] (3555.12s)
like it it just it just worked. Um
[59:17] (3557.84s)
another thing that uh uh that's Amazon
[59:20] (3560.88s)
secret sauce is just the writing
[59:22] (3562.16s)
culture. And so you know I spent on the
[59:25] (3565.52s)
order of like one to four hours every day
[59:27] (3567.68s)
reading while I was a principal
[59:29] (3569.20s)
engineer. And we had a
[59:32] (3572.08s)
standard format. It was a
[59:33] (3573.76s)
six-page memo. And, you know, that
[59:36] (3576.80s)
would be our business strategy. That
[59:38] (3578.64s)
would be a system design. That would
[59:41] (3581.12s)
be, you know, what we called the
[59:44] (3584.08s)
PR/FAQ: a press release and frequently
[59:45] (3585.92s)
asked questions for like a new line of
[59:48] (3588.00s)
business or a new initiative.
[59:49] (3589.76s)
And everybody was sort of constrained to
[59:52] (3592.00s)
the six-page format.
[59:53] (3593.84s)
And everybody just produces documents in
[59:56] (3596.24s)
that format for whatever they need to
[59:57] (3597.84s)
do. And so when I would try to get up to
[60:00] (3600.64s)
speed on a particular thing, I would
[60:02] (3602.24s)
just be like, "Give me your six-pagers.
[60:04] (3604.32s)
Give me all your documents." And I just
[60:05] (3605.92s)
got really really good at just reading
[60:08] (3608.56s)
these documents to get up to speed,
[60:10] (3610.96s)
which was a self-fulfilling and virtuous
[60:13] (3613.28s)
cycle, which is just like, "Okay, well
[60:14] (3614.96s)
now I need to express myself." And so I
[60:17] (3617.36s)
will write a six-pager, and that will
[60:19] (3619.04s)
set the context for whatever we're
[60:20] (3620.64s)
working on. We'd go to a meeting, you
[60:22] (3622.72s)
would read the six-pager and it was just
[60:24] (3624.96s)
super great to actually just
[60:28] (3628.32s)
have people do study hall at the
[60:29] (3629.92s)
beginning part of a meeting where
[60:31] (3631.76s)
everybody just gets fast-forwarded
[60:33] (3633.84s)
and then you have a really great
[60:35] (3635.20s)
discussion at the end.
[60:36] (3636.56s)
That is an amazing culture that I
[60:39] (3639.44s)
think that almost every other company
[60:41] (3641.84s)
should replicate if they could. But I
[60:44] (3644.80s)
think the difficulty would be, like,
[60:46] (3646.80s)
you actually have to be disciplined and
[60:48] (3648.64s)
actually
[60:49] (3649.84s)
have a reading culture. Yeah. In
[60:50] (3650.96s)
principle, then have a reading culture
[60:52] (3652.56s)
and then actually value writing.
[60:55] (3655.52s)
Yeah. I almost wonder if unless it comes
[60:57] (3657.60s)
from the top, some of these things might
[60:58] (3658.96s)
just be really really hard to do.
[61:01] (3661.20s)
One thing that I figured is we're in
[61:04] (3664.56s)
your studio right now and you have a lot
[61:06] (3666.48s)
of these blocks and I asked you what
[61:08] (3668.08s)
they are. Are they for promotions or
[61:10] (3670.00s)
projects or whatever? They're for
[61:11] (3671.92s)
patents.
[61:13] (3673.28s)
And this is for patent number
[61:17] (3677.04s)
10,824,964.
[61:18] (3678.96s)
Can you tell me about why you have
[61:22] (3682.08s)
these, how they come about? Yeah. What
[61:24] (3684.32s)
you needed to do for them?
[61:25] (3685.52s)
So the highest-order bit is, like, you
[61:28] (3688.40s)
know um for better or for worse there
[61:30] (3690.08s)
are software patents that exist. Um
[61:32] (3692.56s)
Amazon they'll say that basically the
[61:35] (3695.12s)
reason they have them is defensively
[61:37] (3697.28s)
because you know other people will
[61:38] (3698.96s)
assert that hey you're in violation of
[61:41] (3701.20s)
our patents or our IP.
[61:43] (3703.12s)
Um and then you know we'll use them
[61:45] (3705.04s)
reactively. Okay fine but you know
[61:47] (3707.12s)
you're also in violation of these other
[61:48] (3708.88s)
things. Yeah.
[61:49] (3709.84s)
Um, and so, you know, there is
[61:52] (3712.16s)
a culture of trying to make sure
[61:54] (3714.00s)
that, you know, we protect ourselves in
[61:55] (3715.68s)
that way. But, you know, there's the
[61:57] (3717.04s)
other part of software patents, which is
[61:58] (3718.48s)
basically like, hey, can you really
[61:59] (3719.84s)
patent like math or whatever? Um, and so
[62:02] (3722.80s)
what I learned over time is that, you
[62:04] (3724.56s)
know, I'm just a really bad IP lawyer,
[62:06] (3726.64s)
even though, you know, as a principal
[62:08] (3728.40s)
engineer, I might cosplay as somebody
[62:10] (3730.08s)
that really understands software
[62:11] (3731.52s)
patents, right? um at the end of the day
[62:14] (3734.40s)
um you know what we would do is we would
[62:16] (3736.16s)
take our important six-pagers and we
[62:17] (3737.84s)
would hand them over to the legal team
[62:19] (3739.76s)
and then they would just be like oh this
[62:21] (3741.36s)
stuff is really interesting like let's
[62:23] (3743.04s)
explore that. And so it turned into
[62:25] (3745.52s)
this awesome thing where like we just
[62:27] (3747.20s)
had ready inputs to go into like the you
[62:30] (3750.72s)
know into that particular system
[62:32] (3752.16s)
A writing culture, it turns out, has a bunch
[62:34] (3754.00s)
of benefits
[62:35] (3755.36s)
Exactly. And I think that there's
[62:38] (3758.16s)
this concept that's
[62:40] (3760.08s)
called the curse of knowledge, which
[62:41] (3761.44s)
is essentially: if you understand
[62:43] (3763.60s)
something, you discount how long it took,
[62:46] (3766.32s)
how hard that concept was to get.
[62:48] (3768.80s)
And so it's just like you don't get it,
[62:50] (3770.64s)
you don't get it, you don't get it, and
[62:52] (3772.00s)
then you get it and then you're like,
[62:53] (3773.04s)
"Oh, that's trivial, right?" Even
[62:54] (3774.96s)
though, you know, there could have been,
[62:56] (3776.56s)
you know, it could actually be novel or
[62:57] (3777.92s)
it could actually be interesting. And so
[62:59] (3779.92s)
what ends up happening is that you would
[63:01] (3781.44s)
just throw these documents over to the
[63:03] (3783.20s)
lawyers and then they would basically be
[63:05] (3785.36s)
like, "Oh, this stuff is great." and you
[63:07] (3787.52s)
would just be like, well, that's just
[63:08] (3788.96s)
that's just regular software development
[63:10] (3790.40s)
or that's just the context and domain
[63:11] (3791.92s)
that we were living in. You know, it
[63:13] (3793.36s)
turns out that there's some some
[63:14] (3794.56s)
interesting stuff. This particular
[63:16] (3796.08s)
patent I'm proud of. So, there's
[63:18] (3798.48s)
a system design interview question
[63:20] (3800.88s)
that seems to be popular right now, um
[63:22] (3802.88s)
which is like design Ticketmaster,
[63:25] (3805.44s)
right? And so I worked on Amazon Tickets
[63:27] (3807.44s)
and, you know, we ended up shuttering
[63:29] (3809.04s)
that business, but, you know, we ended up
[63:30] (3810.96s)
building, like, one of the fastest
[63:32] (3812.72s)
ticket-selling systems in the
[63:35] (3815.20s)
world, right? We could do many, many
[63:37] (3817.12s)
orders per second. So the use case is
[63:39] (3819.52s)
basically at T0. You know, for a
[63:41] (3821.92s)
really big ticket on-sale, that's
[63:43] (3823.44s)
when the maximum amount of demand and
[63:45] (3825.52s)
requests are coming in um and you want
[63:48] (3828.00s)
to sell out all of your ticket supply as
[63:51] (3831.20s)
quickly as possible. The problem is I
[63:54] (3834.96s)
think uh one where you have seated
[63:57] (3837.68s)
concerts.
[63:58] (3838.48s)
Mhm. And so when you purchase a a
[64:01] (3841.84s)
ticket, you know, most of the time with
[64:03] (3843.60s)
the system design stuff, it'll be like
[64:05] (3845.20s)
general admission or it won't be a high
[64:07] (3847.12s)
ticket on, you know, like one with a
[64:09] (3849.36s)
bunch of demand. You have to find
[64:10] (3850.88s)
contiguous seats.
[64:12] (3852.48s)
Yeah. So really next to each other.
[64:15] (3855.04s)
Yes. Exactly. And so, you know, it's uh
[64:19] (3859.12s)
it's actually really hard. Like suppose
[64:21] (3861.28s)
it was a SQL database as your backing
[64:23] (3863.20s)
store. like how do you come up with a
[64:24] (3864.88s)
SQL query that's just like hey give me
[64:27] (3867.20s)
the best four tickets you know within
[64:30] (3870.24s)
this particular price range that are
[64:32] (3872.08s)
seated next to each other.
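To make the difficulty concrete, here is a rough sketch of what such a query might look like, run through Python's built-in sqlite3 module. The `seats` table, its columns, the demo data, and the rule that "best" means the lowest row number are all assumptions made for illustration; none of it comes from the system discussed in the episode.

```python
import sqlite3
from typing import Optional, Tuple

# For every candidate starting seat, count how many of the k seats beginning
# there are free and inside the price range. A block qualifies only if all k
# are. "Best" is assumed to mean lowest row_id, then lowest seat_no.
QUERY = """
SELECT s.row_id, s.seat_no
FROM seats AS s
WHERE (
    SELECT COUNT(*)
    FROM seats AS t
    WHERE t.row_id = s.row_id
      AND t.seat_no BETWEEN s.seat_no AND s.seat_no + :k - 1
      AND t.available = 1
      AND t.price BETWEEN :lo AND :hi
) = :k
ORDER BY s.row_id, s.seat_no
LIMIT 1
"""


def make_demo_db() -> sqlite3.Connection:
    """Two hypothetical rows of six seats; seat 3 in row 1 is already sold."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seats ("
        "row_id INTEGER, seat_no INTEGER, price INTEGER, available INTEGER)"
    )
    rows = []
    for row_id in (1, 2):
        for seat_no in range(1, 7):
            available = 0 if (row_id, seat_no) == (1, 3) else 1
            rows.append((row_id, seat_no, 100, available))
    conn.executemany("INSERT INTO seats VALUES (?, ?, ?, ?)", rows)
    return conn


def best_block(
    conn: sqlite3.Connection, k: int, lo: int, hi: int
) -> Optional[Tuple[int, int]]:
    """Return (row_id, first_seat_no) of the best block of k seats, or None."""
    return conn.execute(QUERY, {"k": k, "lo": lo, "hi": hi}).fetchone()
```

Even in this toy form, the query re-scans neighboring seats for every candidate starting seat, and it says nothing about what happens when many buyers run it at the same moment.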
[64:33] (3873.68s)
Yeah. Now you're thinking. So this is
[64:35] (3875.76s)
a real-world thing where
[64:37] (3877.44s)
you want to be as efficient as
[64:38] (3878.88s)
possible in terms of resource usage;
[64:41] (3881.60s)
maybe you want to minimize your
[64:42] (3882.88s)
CPU or memory, depending on what you
[64:44] (3884.80s)
have, I assume, and you need to do it as
[64:46] (3886.80s)
rapidly as possible to give
[64:49] (3889.52s)
this to people. Okay. So now
[64:52] (3892.00s)
we're talking about a problem that
[64:53] (3893.60s)
seems pretty novel in some ways,
[64:56] (3896.48s)
right?
[64:56] (3896.96s)
Yeah. And so, you know, I did
[64:58] (3898.88s)
this patent with a senior principal. I
[65:00] (3900.80s)
was a senior engineer at the time, but
[65:02] (3902.56s)
the idea is, like, you know, what is
[65:05] (3905.92s)
the theoretical
[65:07] (3907.84s)
maximum speed by which we could, you
[65:10] (3910.56s)
know, show this inventory to people.
[65:12] (3912.88s)
And it turns out that, you know, even if
[65:15] (3915.84s)
you have a high ticket on sale, you only
[65:17] (3917.44s)
have like thousands of tickets at the
[65:19] (3919.36s)
end of the day. So instead of making a
[65:21] (3921.44s)
request to like a backend that would
[65:24] (3924.08s)
conduct some sort of search across the
[65:25] (3925.76s)
space,
[65:27] (3927.12s)
what if you actually inverted it and
[65:29] (3929.44s)
then you basically had each of the
[65:31] (3931.92s)
individual hosts have like some view on
[65:34] (3934.96s)
the entire arena or venue that was there
[65:38] (3938.32s)
and you loaded up all of that
[65:40] (3940.72s)
availability and inventory into like L2
[65:43] (3943.92s)
cache on a CPU.
[65:45] (3945.36s)
Because it's actually not that many. So
[65:46] (3946.72s)
if you had this compact
[65:47] (3947.60s)
rep was pretty big.
[65:50] (3950.08s)
Then what you can do is you can you can
[65:52] (3952.24s)
do bit manipulation to like really
[65:54] (3954.56s)
really quickly get contiguous seats that
[65:57] (3957.20s)
are there.
[65:58] (3958.32s)
And then what you do is you can like
[66:00] (3960.32s)
send in that particular request and try
[66:02] (3962.96s)
to like reserve those particular seats.
[66:04] (3964.80s)
Yeah. Now there's a locking problem
[66:06] (3966.80s)
which is much more tractable than like
[66:09] (3969.84s)
hey there's uh you know two million
[66:12] (3972.80s)
people that have just hit your on
[66:15] (3975.68s)
each of them. I'm launching a search for
[66:17] (3977.12s)
each of them.
[66:18] (3978.00s)
Yes. So the inversion of that
[66:20] (3980.00s)
ordering process by which you like
[66:22] (3982.00s)
actually send out the inventory to the
[66:24] (3984.00s)
individual nodes and then like load it
[66:26] (3986.64s)
up into CPU cache and then just do bit
[66:28] (3988.72s)
manipulation
[66:30] (3990.24s)
um and then try to lock that resource
[66:32] (3992.32s)
from the individual nodes. That was that
[66:34] (3994.56s)
was the basis of this particular patent.
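The bit-manipulation step described above can be sketched roughly as follows. This is only an illustration of the general trick, not the patented method: it assumes each row of seats is stored as one integer bitmask where a set bit means "seat free", and the names `find_contiguous` and `reserve` are hypothetical. Python integers will not give you the cache-resident layout mentioned in the conversation; the point here is just the logic.

```python
from typing import Optional


def find_contiguous(row_mask: int, k: int) -> Optional[int]:
    """Return the lowest seat index that starts a run of k free seats.

    Bit i of row_mask is 1 when seat i is free. ANDing the mask with
    shifted copies of itself leaves bit i set only if seats i..i+k-1
    are all free.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    runs = row_mask
    for shift in range(1, k):
        runs &= row_mask >> shift
    if runs == 0:
        return None
    # Isolate the lowest set bit and convert it to an index.
    return (runs & -runs).bit_length() - 1


def reserve(row_mask: int, start: int, k: int) -> int:
    """Clear k seats starting at `start`, failing if any is already taken.

    In a real system this check-and-clear would happen in whatever
    authoritative store owns the inventory (an assumption here); the
    local search only proposes candidate seats.
    """
    block = ((1 << k) - 1) << start
    if row_mask & block != block:
        raise ValueError("seats no longer free")
    return row_mask & ~block
```

For example, with the mask `0b1111001110` (seats 1 to 3 and 6 to 9 free), a request for four seats finds the run starting at seat 6, while a request for five finds nothing. The search touches only a handful of machine words per row, which is why it can run locally on each host while only the final reservation needs coordination.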
[66:36] (3996.48s)
Awesome. That's clever. And like that
[66:38] (3998.96s)
sounds like some you know people are
[66:40] (4000.48s)
always asking like oh you know on my job
[66:43] (4003.04s)
I don't use the algorithm stuff or or
[66:45] (4005.36s)
any of the formal methods. Sounds like
[66:47] (4007.36s)
there are some uses of it especially
[66:49] (4009.20s)
when you're trying to figure out what is
[66:50] (4010.56s)
it, like, when you're just taking away from
[66:52] (4012.56s)
the patent, like just having a problem
[66:54] (4014.88s)
like this and saying, like, what is
[66:56] (4016.48s)
the theoretical limit that we can do
[66:58] (4018.48s)
what is the fastest possible like to
[67:00] (4020.48s)
answer that you probably want to have
[67:02] (4022.24s)
access to these tools like you know like
[67:04] (4024.32s)
so it's it's not always the time and
[67:06] (4026.08s)
effort to yeah actually get into these
[67:08] (4028.08s)
things and um so what are you up to now
[67:11] (4031.04s)
that you've left Amazon a year
[67:13] (4033.84s)
ago after, like, 17, 18 very long years?
[67:17] (4037.20s)
You know, I'm just, you know, I'm
[67:18] (4038.56s)
just making content. I'm just sort of
[67:19] (4039.92s)
living the dream there, you know, making
[67:21] (4041.60s)
YouTube videos, uh, started up a
[67:23] (4043.52s)
newsletter. Um, I have a Discord
[67:26] (4046.24s)
community and yeah, just
[67:28] (4048.24s)
Yeah. And we're going to link all of
[67:29] (4049.60s)
those below. I actually first got to
[67:32] (4052.00s)
know you before we started
[67:33] (4053.60s)
talking. This was like probably a few
[67:35] (4055.20s)
years ago from your YouTube videos,
[67:36] (4056.88s)
which are, you know, you know, like you
[67:38] (4058.88s)
you shared a lot about like Amazon
[67:40] (4060.72s)
things, software engineering things, and
[67:42] (4062.56s)
just like your general thinking, but
[67:44] (4064.08s)
yeah, your newsletter is a new one. So,
[67:45] (4065.92s)
we'll link it in the show
[67:47] (4067.52s)
notes below. It's it's always a good way
[67:49] (4069.04s)
to keep in touch and also, you know,
[67:50] (4070.40s)
like on your YouTube channel.
[67:51] (4071.76s)
Awesome.
[67:52] (4072.56s)
So, as a closing, I have some rapid
[67:54] (4074.56s)
questions.
[67:55] (4075.52s)
So, I'll just ask and you just
[67:56] (4076.88s)
shoot what comes to mind. What is career
[67:59] (4079.28s)
advice that greatly helped you in your
[68:02] (4082.16s)
Yeah. I mean, you know, I talk
[68:04] (4084.40s)
a lot about this. It's kind of like, oh,
[68:06] (4086.32s)
what's what's your favorite food or your
[68:08] (4088.08s)
favorite movie? It's just like there's
[68:09] (4089.60s)
so much there and it's hard to pick one.
[68:11] (4091.52s)
What I would say is instead of saying
[68:13] (4093.60s)
like, hey, what's the technology that I
[68:15] (4095.52s)
should learn that's really going to, you
[68:18] (4098.00s)
know, make my career, you know,
[68:20] (4100.56s)
solid, instead sort of flip it around
[68:23] (4103.04s)
and say like, how can I quickly learn
[68:25] (4105.52s)
skills?
[68:27] (4107.28s)
That makes you sort of
[68:29] (4109.36s)
recession-proof, right? That
[68:31] (4111.52s)
sort of makes you valuable. It's
[68:33] (4113.12s)
essentially meta-learning. It's like how
[68:34] (4114.64s)
can I learn something faster and faster?
[68:37] (4117.12s)
If that's your focus, then
[68:39] (4119.52s)
you'll never have a
[68:41] (4121.36s)
problem finding a job and you'll never
[68:43] (4123.68s)
have a problem progressing in your
[68:45] (4125.84s)
career. Now some of the skills may be
[68:48] (4128.08s)
difficult to find resources on online
[68:50] (4130.56s)
but you know I think if you just sort of
[68:52] (4132.56s)
think about like what's a valuable skill
[68:54] (4134.48s)
that if I knew right now would you know
[68:57] (4137.76s)
make my you know job search easier or
[69:00] (4140.00s)
would like make me you know perform
[69:02] (4142.32s)
better on the job and then just sort of
[69:04] (4144.80s)
thinking about acquiring that skill as
[69:06] (4146.56s)
quickly as possible
[69:07] (4147.68s)
and do it now like don't wait.
[69:09] (4149.28s)
Yeah. Well people tend to postpone
[69:11] (4151.04s)
themselves. They'll be like, "Oh, well,
[69:12] (4152.48s)
I'll start when, you know, everything is
[69:15] (4155.52s)
lined up." But like to begin, you just
[69:18] (4158.00s)
need to begin. Like when you start
[69:19] (4159.68s)
something, only then will you know
[69:21] (4161.36s)
what you need to do instead of saying
[69:23] (4163.36s)
like, "Oh, I need to get everything that
[69:25] (4165.68s)
I need to do first before I start."
[69:27] (4167.60s)
You've used a lot of programming
[69:28] (4168.88s)
languages. Which one's your favorite and
[69:31] (4171.28s)
why? And which one do you dislike
[69:33] (4173.92s)
Yeah. You know,
[69:36] (4176.64s)
obviously there's no
[69:38] (4178.24s)
perfect programming language. What I
[69:40] (4180.56s)
would say is, like, I really enjoyed Perl
[69:45] (4185.44s)
and nobody would ever give that answer,
[69:47] (4187.76s)
but I just like this concept of like
[69:49] (4189.52s)
there's just so many different ways to
[69:51] (4191.12s)
do it. It's a write-only
[69:52] (4192.56s)
language. Like you can't read anybody
[69:54] (4194.16s)
else's Perl, and it's actually
[69:56] (4196.48s)
one of the languages that like uses up
[69:58] (4198.16s)
the most power. It's like the least
[69:59] (4199.68s)
efficient. It's interpreted. It's
[70:01] (4201.92s)
it's just like terrible.
[70:03] (4203.92s)
So most of Booking.com still runs on it, or
[70:06] (4206.16s)
some of it.
[70:06] (4206.64s)
Yeah. Amazon's back end was, you know,
[70:08] (4208.64s)
for a long time and still might be um,
[70:10] (4210.72s)
you know, sort of like Perl Mason and
[70:12] (4212.40s)
sort of like, uh, web technology bolted
[70:14] (4214.48s)
onto Perl. But I just kind of like it.
[70:16] (4216.24s)
I just feel like I can express myself
[70:17] (4217.84s)
and, like,
[70:19] (4219.84s)
however you'd like to express yourself,
[70:21] (4221.60s)
you can.
[70:22] (4222.64s)
Um, it also looked like an ASCII factory
[70:24] (4224.72s)
blew up sometimes. And so it's just like
[70:26] (4226.64s)
you know, now that it's on a
[70:28] (4228.96s)
podcast, you know, I wouldn't really,
[70:30] (4230.48s)
you know, advertise that fact. The best
[70:32] (4232.32s)
programming languages right now, I think
[70:34] (4234.00s)
Rust is pretty interesting. So I might,
[70:36] (4236.00s)
you know, pick that up. Um, at the end
[70:38] (4238.40s)
of the day, like I really love the
[70:41] (4241.44s)
boring languages. Yeah.
[70:43] (4243.04s)
Um, so you know, Java with, you know,
[70:46] (4246.00s)
for all of its stuff, like its
[70:47] (4247.92s)
verbosity, I think it's just a great
[70:50] (4250.40s)
language, like a JVM-based language,
[70:53] (4253.84s)
um, that has essentially great
[70:57] (4257.20s)
library support and a bunch of stuff
[70:58] (4258.96s)
written for it, but it's just like super
[71:00] (4260.96s)
boring. Maybe it's just because I'm from
[71:02] (4262.40s)
Amazon and we do this like enterprise
[71:04] (4264.08s)
stuff like
[71:05] (4265.52s)
it's a fine language.
[71:07] (4267.12s)
And then I see you have a large
[71:09] (4269.04s)
bookshelf here. You also read a lot
[71:11] (4271.28s)
especially at Amazon, although mostly
[71:12] (4272.56s)
internal documents. What is a book that
[71:14] (4274.24s)
you would recommend something around
[71:16] (4276.16s)
software engineering that you
[71:17] (4277.60s)
enjoyed and it cannot be that book.
[71:19] (4279.68s)
It can't be your book. Um what I would
[71:22] (4282.40s)
say is, you know, I've just given
[71:24] (4284.00s)
the advice about, you know, meta-learning
[71:27] (4287.12s)
and career growth. I think that
[71:29] (4289.76s)
most software developers should read a
[71:31] (4291.76s)
book by Cal Newport. It's called So
[71:34] (4294.16s)
Good They Can't Ignore You. And so the
[71:35] (4295.84s)
concept there is around career capital.
[71:38] (4298.16s)
So like what are the skills that are in
[71:39] (4299.60s)
the most demand? And if you can just
[71:41] (4301.92s)
like learn those skills then you become
[71:44] (4304.24s)
in demand. And then you know from there
[71:46] (4306.00s)
you can choose what type of lifestyle
[71:47] (4307.92s)
that you'd like. You know you can also
[71:49] (4309.68s)
like sort of lean into you know some of
[71:52] (4312.16s)
the science of meta-learning. So
[71:53] (4313.44s)
deliberate practice, spaced repetition,
[71:55] (4315.20s)
that sort of thing. Um, in terms of like
[71:57] (4317.84s)
tech books, I think the new AI
[72:00] (4320.16s)
Engineering book by Chip Huyen is
[72:02] (4322.88s)
amazing.
[72:03] (4323.76s)
It's Yeah.
[72:04] (4324.24s)
Um, I think DDIA, so
[72:09] (4329.12s)
Designing Data-Intensive
[72:10] (4330.16s)
So good. A new version is coming at the
[72:11] (4331.92s)
end of the year, actually.
[72:13] (4333.12s)
I'm excited about that. I think that'll
[72:14] (4334.64s)
be pretty good. Um, but you know, at the
[72:16] (4336.56s)
end of the day, like you don't want one
[72:18] (4338.48s)
book on your bookshelf, you want 50
[72:20] (4340.40s)
books on your bookshelf. Um, and so, you
[72:23] (4343.44s)
know, I think within a particular
[72:25] (4345.60s)
subgenre of tech books, you know, I'd
[72:28] (4348.24s)
have recommendations there. But,
[72:29] (4349.60s)
yeah, Steve, this was great.
[72:31] (4351.20s)
Awesome.
[72:31] (4351.84s)
Really enjoyed it.
[72:32] (4352.96s)
Yeah, great. Thanks so much for having
[72:34] (4354.40s)
me. Thanks a lot to Steve for sharing
[72:36] (4356.24s)
all these details. Although Amazon's
[72:38] (4358.08s)
principal engineering level feels
[72:39] (4359.36s)
surprisingly difficult to get promoted
[72:41] (4361.04s)
to, I have yet to hear of as strong a
[72:43] (4363.20s)
principal engineering community as
[72:44] (4364.64s)
what Amazon builds and keeps investing
[72:46] (4366.48s)
in. This community itself could be
[72:48] (4368.80s)
reason enough to consider the company
[72:50] (4370.32s)
at the principal-plus level should
[72:52] (4372.16s)
you have the opportunity to do so. For a
[72:54] (4374.24s)
deep dive into Amazon's engineering
[72:55] (4375.76s)
culture, including the details on
[72:57] (4377.44s)
compensation, career ladders,
[72:59] (4379.36s)
performance reviews, and engineering
[73:00] (4380.64s)
processes, check out the Pragmatic
[73:02] (4382.48s)
Engineer deep dive linked in the show
[73:04] (4384.16s)
notes below. If you enjoyed this
[73:06] (4386.08s)
podcast, please do subscribe on your
[73:07] (4387.68s)
favorite podcast platform and on
[73:09] (4389.44s)
YouTube. This helps more people discover
[73:11] (4391.60s)
the podcast and a special thank you if
[73:13] (4393.44s)
you leave a rating. Thanks and see you
[73:15] (4395.44s)
in the next