[00:00] (0.08s)
COE stands for
[00:01] (1.20s)
correction of error. It's this
[00:02] (2.80s)
idea that you have holes in Swiss cheese
[00:04] (4.96s)
and a failure requires that
[00:07] (7.52s)
the holes line up across layers. That's the
[00:09] (9.52s)
best reading. Like I would just
[00:10] (10.80s)
subscribe to the email list where they
[00:12] (12.80s)
were published internally. So you have
[00:14] (14.08s)
this stream of disasters that are going
[00:16] (16.08s)
on within the company and you grab some
[00:18] (18.00s)
popcorn and you pop open one of these
[00:19] (19.84s)
COEs and you learn so much from that.
[00:22] (22.40s)
And I think that that's part of the
[00:23] (23.60s)
secret sauce. The idea, and I don't know
[00:25] (25.52s)
if it's like this for 100% of them, is
[00:27] (27.92s)
that it's a blameless culture sort of
[00:29] (29.60s)
thing. And so to really screw up
[00:31] (31.92s)
requires that multiple people drop the
[00:34] (34.56s)
ball. And you learn so much from that
[00:37] (37.04s)
sort of stuff. The brownouts, these
[00:38] (38.80s)
lessons that you would learn from trying
[00:40] (40.56s)
to recover from really large
[00:41] (41.76s)
dependencies. Those things are
[00:43] (43.04s)
immortalized inside some of these COEs.
[00:45] (45.12s)
So there's some very famous outages that
[00:47] (47.20s)
happened within Amazon and they were
[00:49] (49.28s)
egg on our face. We really, really
[00:51] (51.28s)
learned those lessons through those
[00:52] (52.56s)
postmortems. They're absolutely
[00:53] (53.84s)
wonderful.