[00:00] (0.24s)
                    This is something of a nice change. I've
                 
            
                
                    [00:02] (2.40s)
                    given a lot of scientific talks and no
                 
            
                
                    [00:04] (4.32s)
                    one claps and cheers when I come on. Not
                 
            
                
                    [00:06] (6.96s)
                    normally even when I come on.
                 
            
                
                    [00:13] (13.76s)
                    It's really exciting. It's really
                 
            
                
                    [00:15] (15.12s)
                    wonderful to be here. I guess I should
                 
            
                
                    [00:17] (17.76s)
                    start off assuming that not everyone in
                 
            
                
                    [00:19] (19.76s)
                    this cavernous hall knows who I am. Who
                 
            
                
                    [00:22] (22.80s)
                    am I? I'm someone who has done some
                 
            
                
                    [00:25] (25.68s)
                    work in AI for science who really
                 
            
                
                    [00:28] (28.24s)
                    believes that we can use the AI systems,
                 
            
                
                    [00:31] (31.52s)
                    these technologies, these ideas to
                 
            
                
                    [00:34] (34.80s)
                    change the world in a very specific way
                 
            
                
                    [00:37] (37.04s)
                    to make science go faster to enable new
                 
            
                
                    [00:39] (39.68s)
                    discoveries. I think it's really really
                 
            
                
                    [00:42] (42.00s)
                    wonderful. We have the opportunity to
                 
            
                
                    [00:44] (44.40s)
                    take these tools, these ideas
                 
            
                
                    [00:47] (47.68s)
                    and aim them toward the question of how
                 
            
                
                    [00:49] (49.92s)
                    can we build the right AI systems so
                 
            
                
                    [00:52] (52.72s)
                    that sick people can become healthy and
                 
            
                
                    [00:54] (54.96s)
                    go home from the hospital. And it's been
                 
            
                
                    [00:57] (57.84s)
                    kind of a really wonderful and winding
                 
            
                
                    [01:00] (60.16s)
                    journey for me to end up here. I was
                 
            
                
                    [01:02] (62.48s)
                    originally trained as a physicist. I
                 
            
                
                    [01:04] (64.56s)
                    thought I was going to be a laws of the
                 
            
                
                    [01:06] (66.00s)
                    universe physicist. If I was very very
                 
            
                
                    [01:08] (68.72s)
                    lucky, I could do something that would
                 
            
                
                    [01:10] (70.64s)
                    end up as one sentence in a textbook.
                 
            
                
                    [01:13] (73.60s)
                    And I did physics and I went to actually
                 
            
                
                    [01:16] (76.48s)
                    do a PhD in physics. And then kind of
                 
            
                
                    [01:19] (79.68s)
                    what I was working on didn't really grab
                 
            
                
                    [01:22] (82.24s)
                    me. It just didn't feel like what I
                 
            
                
                    [01:24] (84.08s)
                    wanted to do. So I dropped out. I didn't
                 
            
                
                    [01:26] (86.64s)
                    start a startup. That would have been
                 
            
                
                    [01:28] (88.08s)
                    very on point for this event, but I
                 
            
                
                    [01:31] (91.04s)
                    dropped out and I ended up working at a
                 
            
                
                    [01:33] (93.68s)
                    company that was doing computational
                 
            
                
                    [01:35] (95.76s)
                    biology. How do we get computers to say
                 
            
                
                    [01:38] (98.08s)
                    something smart about biology? And I
                 
            
                
                    [01:40] (100.72s)
                    loved it. I loved it not just because it
                 
            
                
                    [01:43] (103.04s)
                    was fun, but it was something that would
                 
            
                
                    [01:44] (104.64s)
                    let me do what I thought I was good at.
                 
            
                
                    [01:47] (107.28s)
                    Write code, manipulate equations, think
                 
            
                
                    [01:50] (110.08s)
                    hard thoughts about the nature of the
                 
            
                
                    [01:51] (111.76s)
                    world and use it toward this very
                 
            
                
                    [01:54] (114.40s)
                    applied purpose that at the end we want
                 
            
                
                    [01:57] (117.12s)
                    to make medicines or we
                 
            
                
                    [01:59] (119.12s)
                    want to enable others to make medicines.
                 
            
                
                    [02:01] (121.60s)
                    Then I really kind of became a biologist
                 
            
                
                    [02:04] (124.56s)
                    and a machine learner. Actually a
                 
            
                
                    [02:06] (126.16s)
                    machine learner because I left that job
                 
            
                
                    [02:07] (127.60s)
                    and I went back to grad school in
                 
            
                
                    [02:09] (129.20s)
                    biophysics and chemistry, and I no
                 
            
                
                    [02:13] (133.04s)
                    longer had access to this incredible
                 
            
                
                    [02:14] (134.96s)
                    computer hardware that I had when I was
                 
            
                
                    [02:17] (137.60s)
                    working at my previous job and in fact
                 
            
                
                    [02:19] (139.28s)
                    they had custom ASICs for simulating how
                 
            
                
                    [02:22] (142.08s)
                    proteins, this part of your body that
                 
            
                
                    [02:23] (143.60s)
                    I'll talk about, move. And since I didn't
                 
            
                
                    [02:25] (145.76s)
                    have that anymore but I still wanted to
                 
            
                
                    [02:28] (148.00s)
                    work on the same problems, well, I
                 
            
                
                    [02:29] (149.52s)
                    didn't want to just do the same thing
                 
            
                
                    [02:30] (150.72s)
                    with less compute. And so I started to
                 
            
                
                    [02:33] (153.92s)
                    learn and I was getting very interested
                 
            
                
                    [02:35] (155.68s)
                    in statistics, in machine learning. We
                 
            
                
                    [02:38] (158.64s)
                    didn't call it AI back then. In fact, we
                 
            
                
                    [02:40] (160.80s)
                    didn't even call it machine learning.
                 
            
                
                    [02:42] (162.08s)
                    That was a bit disreputable. I said, I'm
                 
            
                
                    [02:43] (163.68s)
                    working in statistical physics. But you
                 
            
                
                    [02:46] (166.64s)
                    know, how are we going to develop
                 
            
                
                    [02:48] (168.40s)
                    algorithms? How are we going to learn
                 
            
                
                    [02:50] (170.00s)
                    from data and do that instead of very
                 
            
                
                    [02:52] (172.08s)
                    large compute? And I guess it turns out
                 
            
                
                    [02:53] (173.60s)
                    in terms of AI, in addition to very large
                 
            
                
                    [02:56] (176.08s)
                    compute to answer new problems. And
                 
            
                
                    [02:59] (179.84s)
                    after this I joined Google DeepMind
                 
            
                
                    [03:03] (183.36s)
                    and really joining a company that wanted
                 
            
                
                    [03:06] (186.80s)
                    to say how are we going to take these
                 
            
                
                    [03:09] (189.12s)
                    powerful technologies and all kind of
                 
            
                
                    [03:11] (191.68s)
                    these ideas, and it was becoming
                 
            
                
                    [03:13] (193.60s)
                    very very readily apparent how powerful
                 
            
                
                    [03:15] (195.68s)
                    these technologies were with
                 
            
                
                    [03:17] (197.20s)
                    applications
                 
            
                
                    [03:18] (198.96s)
                    especially to games, but also to
                 
            
                
                    [03:22] (202.16s)
                    things like data centers and others. How
                 
            
                
                    [03:23] (203.84s)
                    are we going to take these technologies
                 
            
                
                    [03:25] (205.04s)
                    and use them to advance science and
                 
            
                
                    [03:27] (207.04s)
                    really push forward the scientific frontier?
                 
            
                
                    [03:30] (210.08s)
                    And how can we do this in an industrial
                 
            
                
                    [03:32] (212.24s)
                    setting with an incredibly fast pace
                 
            
                
                    [03:35] (215.04s)
                    working with some really smart people
                 
            
                
                    [03:36] (216.72s)
                    working with great computer resources?
                 
            
                
                    [03:38] (218.64s)
                    And with all that, you darn well better
                 
            
                
                    [03:40] (220.64s)
                    make some progress. And it's been really
                 
            
                
                    [03:42] (222.96s)
                    really fun, and the fact that I'm on this
                 
            
                
                    [03:45] (225.20s)
                    stage indicates that we made some
                 
            
                
                    [03:46] (226.80s)
                    progress. I think really the
                 
            
                
                    [03:49] (229.52s)
                    guiding principle for me has been that when
                 
            
                
                    [03:52] (232.16s)
                    we do this work, ultimately we are
                 
            
                
                    [03:56] (236.64s)
                    building tools that will enable
                 
            
                
                    [03:58] (238.16s)
                    scientists to make discoveries.
                 
            
                
                    [04:00] (240.24s)
                    And what I think is really heartening
                 
            
                
                    [04:02] (242.40s)
                    about the work we've done and the part
                 
            
                
                    [04:04] (244.16s)
                    that really I think still just resonates
                 
            
                
                    [04:07] (247.44s)
                    with me at my core is that there are about I
                 
            
                
                    [04:10] (250.00s)
                    think 35,000 citations of AlphaFold. But
                 
            
                
                    [04:13] (253.20s)
                    within that there are tens of
                 
            
                
                    [04:16] (256.40s)
                    thousands of examples of people using
                 
            
                
                    [04:18] (258.72s)
                    our tools to do science that I couldn't
                 
            
                
                    [04:21] (261.36s)
                    do on my own but are using it to make
                 
            
                
                    [04:24] (264.48s)
                    discoveries, be it vaccines, be it drug
                 
            
                
                    [04:27] (267.60s)
                    development, be it how the body works.
                 
            
                
                    [04:30] (270.08s)
                    And I think that's really really
                 
            
                
                    [04:31] (271.20s)
                    exciting. And the part I want to talk to
                 
            
                
                    [04:33] (273.84s)
                    you about today and the story I want to
                 
            
                
                    [04:35] (275.52s)
                    tell you is a bit about the problem, a
                 
            
                
                    [04:38] (278.88s)
                    bit about how we did it. And I think
                 
            
                
                    [04:40] (280.32s)
                    especially the role of research and
                 
            
                
                    [04:43] (283.20s)
                    machine learning research and the fact
                 
            
                
                    [04:44] (284.48s)
                    that it isn't just off-the-shelf machine
                 
            
                
                    [04:46] (286.24s)
                    learning. And then I want to tell you a
                 
            
                
                    [04:48] (288.24s)
                    little bit about what happens when you
                 
            
                
                    [04:50] (290.00s)
                    make something great and how people use
                 
            
                
                    [04:52] (292.32s)
                    it and what it does for the world. So,
                 
            
                
                    [04:54] (294.96s)
                    I'll start with the world's shortest
                 
            
                
                    [04:56] (296.56s)
                    biology lesson. The cell is complex.
                 
            
                
                    [05:00] (300.64s)
                    For people who have only studied
                 
            
                
                    [05:04] (304.08s)
                    biology in high school or in college,
                 
            
                
                    [05:06] (306.48s)
                    you might have this idea that the cell
                 
            
                
                    [05:08] (308.24s)
                    is a couple parts that have labels
                 
            
                
                    [05:10] (310.24s)
                    attached to them. And it's kind of
                 
            
                
                    [05:12] (312.40s)
                    simple, but really it looks much more
                 
            
                
                    [05:14] (314.00s)
                    like what you see on the screen. It's
                 
            
                
                    [05:16] (316.16s)
                    dense. It's complex. In terms of
                 
            
                
                    [05:19] (319.12s)
                    crowding, it's like the swimming pool on
                 
            
                
                    [05:20] (320.96s)
                    the 4th of July, and it's full of
                 
            
                
                    [05:24] (324.48s)
                    enormous complexity. Humans have about
                 
            
                
                    [05:27] (327.36s)
                    20,000 different types of proteins.
                 
            
                
                    [05:30] (330.00s)
                    Those are some of the blobs you see on
                 
            
                
                    [05:31] (331.60s)
                    the screen. They come together to do
                 
            
                
                    [05:33] (333.52s)
                    practically every function in your cell.
                 
            
                
                    [05:36] (336.16s)
                    You can see that kind of green tail
                 
            
                
                    [05:38] (338.88s)
                    is the flagellum of an E. coli. That's
                 
            
                
                    [05:42] (342.80s)
                    how it moves around. And you can see in
                 
            
                
                    [05:44] (344.96s)
                    fact how it moves around. And you can
                 
            
                
                    [05:46] (346.64s)
                    see that thing that looks like it turns
                 
            
                
                    [05:48] (348.08s)
                    and in fact it turns and drives this
                 
            
                
                    [05:50] (350.72s)
                    motor. All of this is made of proteins.
                 
            
                
                    [05:52] (352.56s)
                    When people say that DNA is the
                 
            
                
                    [05:55] (355.52s)
                    instruction manual for life, well, this
                 
            
                
                    [05:57] (357.28s)
                    is what it's telling you how to do. It's
                 
            
                
                    [05:59] (359.76s)
                    telling you how to build these tiny
                 
            
                
                    [06:01] (361.76s)
                    machines. And biology has evolved an
                 
            
                
                    [06:04] (364.48s)
                    incredible mechanism to build the
                 
            
                
                    [06:07] (367.20s)
                    machines it needs, literal
                 
            
                
                    [06:09] (369.12s)
                    nanomachines, and build them out of atoms.
                 
            
                
                    [06:11] (371.44s)
                    And so your DNA gives you instructions
                 
            
                
                    [06:13] (373.44s)
                    that say build a protein. Now you might
                 
            
                
                    [06:16] (376.16s)
                    say your DNA is a line, and so are
                 
            
                
                    [06:18] (378.80s)
                    proteins in a certain sense. It's
                 
            
                
                    [06:20] (380.32s)
                    instructions on how to attach one bead
                 
            
                
                    [06:22] (382.32s)
                    after another where each bead is a
                 
            
                
                    [06:24] (384.64s)
                    specific kind of molecular arrangement
                 
            
                
                    [06:26] (386.32s)
                    of atoms. And you should wonder: if my
                 
            
                
                    [06:30] (390.08s)
                    DNA is a line and I am very much not
                 
            
                
                    [06:32] (392.48s)
                    one-dimensional,
                 
            
                
                    [06:34] (394.24s)
                    what happens in between? And the answer
                 
            
                
                    [06:35] (395.92s)
                    is after you make this protein and
                 
            
                
                    [06:38] (398.72s)
                    assemble it one piece at a time, it will
                 
            
                
                    [06:41] (401.60s)
                    fold up spontaneously
                 
            
                
                    [06:43] (403.68s)
                    into a shape like you've opened your
                 
            
                
                    [06:46] (406.48s)
                    IKEA bookshelf and instead of having to
                 
            
                
                    [06:48] (408.48s)
                    do the hard work, it simply builds
                 
            
                
                    [06:50] (410.00s)
                    itself and you get this quite complex
                 
            
                
                    [06:52] (412.72s)
                    structure. You can see a quite typical
                 
            
                
                    [06:55] (415.04s)
                    protein, a kinase, for those of you who
                 
            
                
                    [06:56] (416.80s)
                    are biologists in the audience over
                 
            
                
                    [06:58] (418.88s)
                    there. And you can see this very complex
                 
            
                
                    [07:00] (420.72s)
                    arrangement of atoms and that
                 
            
                
                    [07:02] (422.72s)
                    arrangement is functional. And the
                 
            
                
                    [07:06] (426.24s)
                    majority, not every one, of the proteins
                 
            
                
                    [07:08] (428.80s)
                    in your body undergo this transformation
                 
            
                
                    [07:11] (431.28s)
                    and that is what functions and that is
                 
            
                
                    [07:13] (433.44s)
                    incredibly small.
                 
            
                
                    [07:15] (435.60s)
                    So light itself is a few hundred
                 
            
                
                    [07:19] (439.28s)
                    nanometers in size and that's a few
                 
            
                
                    [07:22] (442.16s)
                    nanometers in size. So it's smaller than
                 
            
                
                    [07:24] (444.16s)
                    you can see in a microscope. And for a
                 
            
                
                    [07:26] (446.80s)
                    long time scientists have wanted to
                 
            
                
                    [07:28] (448.72s)
                    understand this structure because they
                 
            
                
                    [07:31] (451.12s)
                    use it to predict how changes in that
                 
            
                
                    [07:34] (454.16s)
                    protein might affect disease. How does
                 
            
                
                    [07:37] (457.44s)
                    that work? How does biology work? Often
                 
            
                
                    [07:39] (459.44s)
                    if you make a drug it is to interrupt
                 
            
                
                    [07:40] (460.96s)
                    the function of a certain protein like
                 
            
                
                    [07:42] (462.56s)
                    this one.
                 
            
                
                    [07:44] (464.64s)
                    Now scientists have, through an
                 
            
                
                    [07:47] (467.36s)
                    incredible amount of cleverness, figured
                 
            
                
                    [07:49] (469.52s)
                    out the structure of lots of proteins
                 
            
                
                    [07:51] (471.92s)
                    and it remains to this day exceptionally
                 
            
                
                    [07:54] (474.64s)
                    difficult. Right? You shouldn't imagine
                 
            
                
                    [07:56] (476.88s)
                    this as I want to determine the
                 
            
                
                    [07:59] (479.84s)
                    structure of a protein. So I shall open
                 
            
                
                    [08:01] (481.92s)
                    the lab protocol for protein structure
                 
            
                
                    [08:03] (483.92s)
                    determination. I shall follow the steps.
                 
            
                
                    [08:06] (486.88s)
                    It consists of cleverness, of ideas, of
                 
            
                
                    [08:10] (490.24s)
                    finding many ways. In this case, I'm
                 
            
                
                    [08:12] (492.08s)
                    describing one type of protein structure
                 
            
                
                    [08:14] (494.32s)
                    prediction, or protein structure,
                 
            
                
                    [08:16] (496.48s)
                    sorry, determination, experimental
                 
            
                
                    [08:17] (497.92s)
                    measurement, where you convince that big
                 
            
                
                    [08:19] (499.92s)
                    ugly molecule I just showed you to form
                 
            
                
                    [08:22] (502.24s)
                    a regular crystal, kind of like table
                 
            
                
                    [08:24] (504.00s)
                    salt. No one has an easy recipe for
                 
            
                
                    [08:26] (506.88s)
                    this. So, they try many things. They
                 
            
                
                    [08:28] (508.64s)
                    have ideas and it's exceptionally
                 
            
                
                    [08:32] (512.08s)
                    difficult and filled with failure like
                 
            
                
                    [08:34] (514.40s)
                    many things in science.
                 
            
                
                    [08:36] (516.80s)
                    And you're really looking at
                 
            
                
                    [08:40] (520.32s)
                    kind of one way to get an idea of how
                 
            
                
                    [08:42] (522.24s)
                    difficult this is. Just one kind of
                 
            
                
                    [08:43] (523.92s)
                    ordinary paper that we were using. I
                 
            
                
                    [08:45] (525.68s)
                    flipped to the back and it said, you
                 
            
                
                    [08:48] (528.08s)
                    know, in their protocol, after more than
                 
            
                
                    [08:49] (529.68s)
                    a year, crystals began to form. Right?
                 
            
                
                    [08:52] (532.64s)
                    So, not only did they do all these hard
                 
            
                
                    [08:54] (534.40s)
                    experiments, but they had to wait about
                 
            
                
                    [08:56] (536.24s)
                    a year to find out if it worked. And
                 
            
                
                    [08:58] (538.08s)
                    probably that year wasn't spent waiting.
                 
            
                
                    [08:59] (539.76s)
                    It was trying a thousand other things
                 
            
                
                    [09:01] (541.44s)
                    that didn't work as well.
                 
            
                
                    [09:03] (543.92s)
                    Once you do that, you can take this to a
                 
            
                
                    [09:06] (546.72s)
                    synchrotron, a modest thing. You can
                 
            
                
                    [09:09] (549.44s)
                    see the cars ringing the outside of this
                 
            
                
                    [09:11] (551.44s)
                    instrument so that you can shine
                 
            
                
                    [09:13] (553.28s)
                    incredibly bright X-rays on it and get
                 
            
                
                    [09:15] (555.92s)
                    what is called a diffraction pattern and
                 
            
                
                    [09:18] (558.08s)
                    you can solve that and you can deposit
                 
            
                
                    [09:20] (560.96s)
                    it in what's called the PDB or the
                 
            
                
                    [09:22] (562.96s)
                    Protein Data Bank. And one of the
                 
            
                
                    [09:24] (564.96s)
                    things that enabled the work we did is
                 
            
                
                    [09:27] (567.52s)
                    that scientists 50 years ago had the
                 
            
                
                    [09:29] (569.84s)
                    foresight to say these are important,
                 
            
                
                    [09:32] (572.40s)
                    these are hard. We should collect them
                 
            
                
                    [09:35] (575.04s)
                    all in one place. So there's a data set
                 
            
                
                    [09:37] (577.68s)
                    that represents essentially all the
                 
            
                
                    [09:40] (580.32s)
                    academic output of protein structures in
                 
            
                
                    [09:43] (583.20s)
                    the community and available to everyone.
                 
            
                
                    [09:46] (586.08s)
                    So our work was on very public data.
                 
            
                
                    [09:48] (588.96s)
                    About 200,000 protein structures are
                 
            
                
                    [09:51] (591.20s)
                    known. They pretty regularly increase at
                 
            
                
                    [09:53] (593.84s)
                    about 12,000 a year.
                 
            
                
                    [09:57] (597.12s)
                    But this is much much smaller than the
                 
            
                
            
                
                    [10:01] (601.20s)
                    Getting the kind of input information,
                 
            
                
                    [10:03] (603.76s)
                    the DNA that tells you about a protein
                 
            
                
                    [10:06] (606.24s)
                    is much much much much easier. So
                 
            
                
                    [10:09] (609.28s)
                    billions of protein sequences are being
                 
            
                
                    [10:12] (612.40s)
                    discovered. About 3,000 times faster are
                 
            
                
                    [10:14] (614.96s)
                    we learning about protein sequence than
                 
            
                
                    [10:16] (616.64s)
                    protein structure.
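
A back-of-the-envelope sketch of the gap just described, using only the figures quoted in the talk (about 200,000 known structures, about 12,000 new ones a year, sequences arriving about 3,000 times faster). The function names are made up for illustration, and the constant yearly rate is an assumption, not a forecast.

```python
# Rough arithmetic on the figures quoted in the talk.
# Assumption: rates stay constant from year to year.

def implied_sequence_rate(structures_per_year: int, speedup: int) -> int:
    """Sequences learned per year if sequences arrive `speedup` times faster."""
    return structures_per_year * speedup


def years_to_double(known: int, per_year: int) -> float:
    """Years needed to add as many structures as are already known."""
    return known / per_year


if __name__ == "__main__":
    print(implied_sequence_rate(12_000, 3_000))
    print(round(years_to_double(200_000, 12_000), 1))
```

At these quoted rates, experimental structure determination alone would never catch up with sequencing, which is the motivation for prediction.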
                 
            
                
                    [10:18] (618.80s)
                    Okay, that's all scientific content, but
                 
            
                
                    [10:21] (621.76s)
                    I should talk to you about the little
                 
            
                
                    [10:24] (624.32s)
                    thing we did which has this kind of
                 
            
                
                    [10:26] (626.40s)
                    schematic diagram.
                 
            
                
                    [10:28] (628.64s)
                    We wanted to build an AI system. In
                 
            
                
                    [10:31] (631.28s)
                    fact, we didn't even care if it was an
                 
            
                
                    [10:32] (632.64s)
                    AI system. That's one of the nice things
                 
            
                
                    [10:35] (635.20s)
                    about working in AI for science is
                 
            
                
                    [10:37] (637.92s)
                    you don't care how you solve it. If it
                 
            
                
                    [10:39] (639.52s)
                    ended up being a computer program, if it
                 
            
                
                    [10:41] (641.04s)
                    ended up being anything else, we want to
                 
            
                
                    [10:43] (643.04s)
                    find some way to get from the left where
                 
            
                
                    [10:46] (646.00s)
                    each of those letters represents a
                 
            
                
                    [10:47] (647.68s)
                    specific building block of the protein
                 
            
                
                    [10:49] (649.68s)
                    considered in order. We want to put
                 
            
                
                    [10:51] (651.68s)
                    something in the middle, AlphaFold,
                 
            
                
                    [10:53] (653.76s)
                    and we want to end up with
                 
            
                
                    [10:55] (655.60s)
                    something on the right. And you'll see
                 
            
                
                    [10:57] (657.76s)
                    two structures there if you look
                 
            
                
                    [10:59] (659.36s)
                    closely where the blue is our prediction
                 
            
                
                    [11:02] (662.56s)
                    and the green is the experimental
                 
            
                
                    [11:04] (664.40s)
                    structure that took someone a year or
                 
            
                
                    [11:06] (666.08s)
                    two of effort. If you want to put an
                 
            
                
                    [11:07] (667.92s)
                    economic value on it on the order of
                 
            
                
                    [11:10] (670.80s)
                    $100,000
                 
            
                
                    [11:13] (673.04s)
                    and you can see we were able to do this
                 
            
                
                    [11:16] (676.40s)
                    and I want to tell you how
                 
            
                
                    [11:19] (679.12s)
                    and there were really three components
                 
            
                
                    [11:21] (681.84s)
                    to doing this or to do any machine
                 
            
                
                    [11:23] (683.68s)
                    learning problem and you can say you
                 
            
                
                    [11:25] (685.92s)
                    have data and you have compute and you
                 
            
                
                    [11:27] (687.60s)
                    have research
                 
            
                
                    [11:29] (689.68s)
                    and I feel like we tell too many stories
                 
            
                
                    [11:32] (692.72s)
                    about the first two and not enough about
                 
            
                
                    [11:34] (694.72s)
                    the third. In data, we had 200,000
                 
            
                
                    [11:37] (697.92s)
                    protein structures. Everyone has the
                 
            
                
                    [11:40] (700.08s)
                    same data.
                 
            
                
                    [11:41] (701.92s)
                    In terms of compute, this isn't LLM
                 
            
                
                    [11:44] (704.80s)
                    scale. The final model itself was
                 
            
                
                    [11:48] (708.64s)
                    128 TPU v3 cores, roughly equivalent to
                 
            
                
                    [11:52] (712.16s)
                    a GPU per core for two weeks. This is
                 
            
                
                    [11:55] (715.52s)
                    again within the scope of say academic
                 
            
                
                    [11:58] (718.56s)
                    resources but it's worth saying really
                 
            
                
                    [12:01] (721.68s)
                    most of your compute when you think
                 
            
                
                    [12:03] (723.12s)
                    about how much compute you need don't
                 
            
                
                    [12:04] (724.56s)
                    get distracted by the number for the
                 
            
                
                    [12:06] (726.16s)
                    final model the real cost of compute is
                 
            
                
                    [12:08] (728.80s)
                    the cost of ideas that didn't work all
                 
            
                
                    [12:12] (732.00s)
                    the things you had to do to get there
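
A minimal sketch of the compute arithmetic above, assuming "two weeks" means 14 full days of continuous training and taking the talk's rough "one GPU per TPU v3 core" equivalence at face value. The `exploration_multiplier` parameter is hypothetical: the talk gives no number for how much compute went into ideas that did not work.

```python
# Device-hours for the final training run, from the figures in the talk.
# Assumption: 14 days of continuous use on every core.

def device_hours(cores: int, days: float) -> float:
    """Total core-hours (or, roughly, GPU-hours) for one training run."""
    return cores * days * 24


def project_device_hours(final_run_hours: float, exploration_multiplier: float) -> float:
    """Total cost once failed ideas are included.

    `exploration_multiplier` is a hypothetical factor, not a figure from the talk.
    """
    return final_run_hours * exploration_multiplier


if __name__ == "__main__":
    print(device_hours(128, 14))
```

The point of the second function is only that the final-run number is a lower bound on what a project actually spends.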
                 
            
                
                    [12:14] (734.08s)
                    and then finally research and I would
                 
            
                
                    [12:15] (735.84s)
                    say this is all but about two people
                 
            
                
                    [12:19] (739.20s)
                    that worked on this it's a small group
                 
            
                
                    [12:21] (741.28s)
                    of people that end up doing this So
                 
            
                
                    [12:24] (744.72s)
                    really when you look at these machine
                 
            
                
                    [12:26] (746.16s)
                    learning breakthroughs they're probably
                 
            
                
                    [12:28] (748.40s)
                    fewer people than you imagine and really
                 
            
                
                    [12:31] (751.28s)
                    this is where our work was
                 
            
                
                    [12:33] (753.36s)
                    differentiated. We came up with a new
                 
            
                
                    [12:35] (755.12s)
                    set of ideas on how do we bring machine
                 
            
                
                    [12:39] (759.04s)
                    learning to this problem and I can say
                 
            
                
                    [12:41] (761.68s)
                    earlier systems largely based on
                 
            
                
                    [12:44] (764.40s)
                    convolutional neural networks did okay.
                 
            
                
                    [12:46] (766.72s)
                    They certainly made progress. If you
                 
            
                
                    [12:48] (768.64s)
                    replace that with a transformer you're
                 
            
                
                    [12:50] (770.24s)
                    honestly about the same. If you take the
                 
            
                
                    [12:52] (772.72s)
                    ideas of a transformer and much
                 
            
                
                    [12:54] (774.56s)
                    experimentation and many more ideas,
                 
            
                
                    [12:57] (777.28s)
                    then that's when you start to get real
                 
            
                
                    [12:59] (779.76s)
                    change. And in almost all the AI systems
                 
            
                
                    [13:03] (783.44s)
                    you can see today, a tremendous amount
                 
            
                
                    [13:05] (785.92s)
                    of research and ideas and what I would
                 
            
                
                    [13:07] (787.76s)
                    call mid-scale ideas are involved. It
                 
            
                
                    [13:10] (790.64s)
                    isn't just about the headlines where
                 
            
                
                    [13:12] (792.88s)
                    people will say transformers,
                 
            
                
                    [13:15] (795.68s)
                    you know, scaling, test time inference.
                 
            
                
                    [13:18] (798.08s)
                    These are all important but they're one
                 
            
                
                    [13:20] (800.08s)
                    of many ingredients in a really powerful
                 
            
                
                    [13:22] (802.88s)
                    system and in fact we can measure how
                 
            
                
                    [13:26] (806.00s)
                    much our research was worth.
                 
            
                
                    [13:29] (809.36s)
                    AlphaFold 2 is the system that is quite
                 
            
                
                    [13:31] (811.36s)
                    famous, the one that was quite a large
                 
            
                
                    [13:33] (813.68s)
                    improvement. AlphaFold 1 was the best
                 
            
                
                    [13:35] (815.36s)
                    in the world, but the
                 
            
                
                    [13:37] (817.76s)
                    AlQuraishi lab did a very careful
                 
            
                
                    [13:40] (820.08s)
                    experiment where they took AlphaFold 2, the
                 
            
                
                    [13:43] (823.44s)
                    architecture and they trained it on 1%
                 
            
                
                    [13:46] (826.40s)
                    of the available data and they could
                 
            
                
                    [13:48] (828.80s)
                    show that AlphaFold 2 trained on 1% of
                 
            
                
                    [13:51] (831.36s)
                    the data was as accurate or more
                 
            
                
                    [13:54] (834.08s)
                    accurate than AlphaFold 1, which was the
                 
            
                
                    [13:56] (836.16s)
                    state-of-the-art system previously. So
                 
            
                
                    [13:58] (838.56s)
                    there's a very clean thing that says
                 
            
                
                    [14:00] (840.40s)
                    that the third of these
                 
            
                
                    [14:03] (843.60s)
                    ingredients research was worth a
                 
            
                
                    [14:06] (846.16s)
                    hundredfold of the first of these
                 
            
                
                    [14:08] (848.00s)
                    ingredients data. And I think this is
                 
            
                
                    [14:10] (850.64s)
                    generally really really important that
                 
            
                
                    [14:13] (853.52s)
                    one of the big as you're all thinking as
                 
            
                
                    [14:16] (856.16s)
                    you're all in startups or thinking about
                 
            
                
                    [14:18] (858.08s)
                    startups think about the amount to which
                 
            
                
                    [14:21] (861.76s)
                    ideas research discoveries amplify data
                 
            
                
                    [14:26] (866.64s)
                    amplify compute they work together with
                 
            
                
                    [14:28] (868.64s)
                    it we wouldn't want to use less data
                 
            
                
                    [14:30] (870.48s)
                    than we have we wouldn't want to use
                 
            
                
                    [14:31] (871.92s)
                    less compute than we have available but
                 
            
                
                    [14:35] (875.36s)
                    ideas are a core component when you're
                 
            
                
                    [14:37] (877.68s)
                    doing machine learning research and they
                 
            
                
                    [14:39] (879.28s)
                    really helped to transform the world.
                 
            
                
                    [14:54] (894.32s)
                    We can even go back
                 
            
                
                    [14:56] (896.64s)
                    and we can do ablations and we can say
                 
            
                
                    [14:58] (898.40s)
                    what parts matter. And don't focus too
                 
            
                
                    [15:00] (900.24s)
                    much on the details. We pulled this from
                 
            
                
                    [15:01] (901.92s)
                    our paper. You can see here this is the
                 
            
                
                    [15:04] (904.56s)
                    difference compared to the baseline. And
                 
            
                
                    [15:06] (906.40s)
                    you take either of those and you can see
                 
            
                
                    [15:08] (908.80s)
                    that each of the ideas that you might
                 
            
                
                    [15:10] (910.64s)
                    remove from our final system kind of
                 
            
                
                    [15:12] (912.48s)
                    discrete identifiable ideas, some of
                 
            
                
                    [15:15] (915.04s)
                    which were incredibly popular research
                 
            
                
                    [15:18] (918.56s)
                    areas within the field like this work
                 
            
                
                    [15:20] (920.88s)
                    came out and a part of it was
                 
            
                
                    [15:22] (922.40s)
                    equivariant and people said equivariance
                 
            
                
                    [15:25] (925.52s)
                    that is the answer, AlphaFold is an
                 
            
                
                    [15:27] (927.52s)
                    equivariant system and it's great we
                 
            
                
                    [15:29] (929.60s)
                    must do more research on equivariance to
                 
            
                
                    [15:31] (931.52s)
                    get even more great systems well I was
                 
            
                
                    [15:34] (934.48s)
                    very confused by this because the sixth
                 
            
                
                    [15:37] (937.92s)
                    row there, no IPA, invariant point
                 
            
                
                    [15:40] (940.80s)
                    attention that removes all the
                 
            
                
                    [15:42] (942.56s)
                    equivariance in AlphaFold and it hurts
                 
            
                
                    [15:45] (945.60s)
                    a bit but only a bit. AlphaFold itself
                 
            
                
                    [15:48] (948.80s)
                    on this GDT scale that you can see on
                 
            
                
                    [15:51] (951.12s)
                    the left graph. AlphaFold 2 was about 30
                 
            
                
                    [15:54] (954.08s)
                    GDT better than AlphaFold 1 and
                 
            
                
                    [15:57] (957.44s)
                    equivariance explains two or three of
                 
            
                
                    [15:59] (959.68s)
                    this. It isn't about one idea. It's
                 
            
                
                    [16:02] (962.48s)
                    about many mid-scale ideas that add up to
                 
            
                
                    [16:05] (965.04s)
                    a transformative system. And it's very
                 
            
                
                    [16:07] (967.68s)
                    very important when you're building
                 
            
                
                    [16:08] (968.88s)
                    these systems to think about what we
                 
            
                
                    [16:11] (971.20s)
                    would call in this context biological
                 
            
                
                    [16:13] (973.04s)
                    relevance. We would have ideas that were
                 
            
                
                    [16:15] (975.28s)
                    better. We kind of got our system
                 
            
                
                    [16:18] (978.24s)
                    grinding 1% at a time. But what really
                 
            
                
                    [16:21] (981.68s)
                    mattered was when we crossed the
                 
            
                
                    [16:23] (983.52s)
                    accuracy where it mattered to an
                 
            
                
                    [16:25] (985.68s)
                    experimental biologist who didn't care
                 
            
                
                    [16:27] (987.36s)
                    about machine learning. And you have to
                 
            
                
                    [16:29] (989.76s)
                    get there through a lot of work and a
                 
            
                
                    [16:31] (991.76s)
                    lot of effort. And when you do, it is
                 
            
                
                    [16:33] (993.84s)
                    incredibly transformative. And we can
                 
            
                
                    [16:36] (996.48s)
                    measure against this axis where the
                 
            
                
                    [16:38] (998.72s)
                    dark blue is the other systems
                 
            
                
                    [16:40] (1000.88s)
                    available at the time. And this was
                 
            
                
                    [16:42] (1002.32s)
                    assessed. Protein structure prediction
                 
            
                
                    [16:45] (1005.12s)
                    is in some ways far ahead of LLMs or
                 
            
                
                    [16:49] (1009.20s)
                    the general machine learning space and
                 
            
                
                    [16:50] (1010.72s)
                    in having blind assessment. Since 1994,
                 
            
                
                    [16:53] (1013.76s)
                    every two years, everyone interested in
                 
            
                
                    [16:55] (1015.44s)
                    predicting the structure of proteins
                 
            
                
                    [16:57] (1017.44s)
                    gets together and predicts the structure
                 
            
                
                    [16:58] (1018.96s)
                    of a hundred proteins whose answer isn't
                 
            
                
                    [17:00] (1020.88s)
                    known to anyone except the research
                 
            
                
                    [17:02] (1022.32s)
                    group that just solved it, right?
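
For readers unfamiliar with the GDT scale mentioned earlier, here is a simplified sketch of a GDT_TS-style score: the average, over distance cutoffs of 1, 2, 4 and 8 angstroms, of the percentage of residues whose predicted position falls within the cutoff of the experimental one. This is an illustration under a strong assumption, namely that the two structures are already superimposed and given as matching lists of (x, y, z) coordinates. The scoring used in the blind assessment also searches over superpositions, which this sketch does not attempt.

```python
import math

# Simplified GDT_TS-style score.
# Assumption: `predicted` and `reference` are already superimposed and list
# one (x, y, z) point per residue, in the same order.

def gdt_ts(predicted, reference, thresholds=(1.0, 2.0, 4.0, 8.0)):
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty coordinate lists of equal length")
    # Per-residue distance between prediction and experiment.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff.
    fractions = [
        sum(d <= t for d in distances) / len(distances) for t in thresholds
    ]
    # Average over cutoffs, reported on a 0 to 100 scale.
    return 100.0 * sum(fractions) / len(thresholds)


if __name__ == "__main__":
    experiment = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    prediction = [(0.0, 0.0, 0.0), (1.0, 0.0, 3.0)]
    print(gdt_ts(prediction, experiment))
```

On this scale a perfect prediction scores 100, so a gain of about 30 points is a large jump.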
                 
            
                
                    [17:04] (1024.40s)
                    Unpublished. And so, you really do know
                 
            
                
                    [17:06] (1026.24s)
                    what works. And we had about a third of
                 
            
                
                    [17:08] (1028.40s)
                    the error of any other group on this
                 
            
                
                    [17:10] (1030.88s)
                    assessment. But it matters because once
                 
            
                
                    [17:13] (1033.60s)
                    you are working on problems in which you
                 
            
                
                    [17:15] (1035.20s)
                    don't know the answer, you get to really
                 
            
                
                    [17:16] (1036.72s)
                    measure how good things are. And you can
                 
            
                
                    [17:19] (1039.36s)
                    really find that a lot of systems don't
                 
            
                
                    [17:21] (1041.60s)
                    live up to what people believe over the
                 
            
                
                    [17:24] (1044.40s)
                    course of their research. And because
                 
            
                
                    [17:26] (1046.32s)
                    even if you have a benchmark, we all
                 
            
                
                    [17:28] (1048.32s)
                    overfit our ideas to the benchmark,
                 
            
                
                    [17:31] (1051.44s)
                    right? Unless you have a held-out set. And in
                 
            
                
                    [17:33] (1053.84s)
                    fact, the problems you have in the real
                 
            
                
                    [17:36] (1056.32s)
                    world are almost always harder than the
                 
            
                
                    [17:38] (1058.16s)
                    problems you train on, right? Because
                 
            
                
                    [17:40] (1060.16s)
                    you have to learn from much data and you
                 
            
                
                    [17:41] (1061.84s)
                    apply it to very important singular
                 
            
                
                    [17:44] (1064.16s)
                    problems. So it is very very important
                 
            
                
                    [17:46] (1066.48s)
                    that you measure well both as you're
                 
            
                
                    [17:48] (1068.48s)
                    developing and when people are trying to
                 
            
                
                    [17:50] (1070.64s)
                    decide whether they should use your
                 
            
                
                    [17:52] (1072.32s)
                    system. External benchmarks are
                 
            
                
                    [17:54] (1074.64s)
                    absolutely critical to figuring out what
                 
            
                
                    [17:57] (1077.76s)
                    works and that's what really helps drive
                 
            
                
                    [18:00] (1080.00s)
                    the world forward. So just some
                 
            
                
                    [18:02] (1082.24s)
                    wonderful examples: this is typical
                 
            
                
                    [18:04] (1084.00s)
                    performance for us. These are blind
                 
            
                
                    [18:05] (1085.92s)
                    predictions. You can see they're pretty
                 
            
                
                    [18:07] (1087.84s)
                    darn good. Also important, we made it
                 
            
                
                    [18:10] (1090.40s)
                    available and we thought it was and we
                 
            
                
                    [18:12] (1092.08s)
                    did a lot of assessment but we decided
                 
            
                
                    [18:13] (1093.60s)
                    that it was very important to make it
                 
            
                
                    [18:15] (1095.44s)
                    available in two ways. One is that we
                 
            
                
                    [18:17] (1097.12s)
                    open sourced the code and we actually
                 
            
                
                    [18:18] (1098.48s)
                    open sourced the code about a week
                 
            
                
                    [18:19] (1099.84s)
                    before we released a database of
                 
            
                
                    [18:22] (1102.80s)
                    predictions starting originally at
                 
            
                
                    [18:24] (1104.40s)
                    300,000 predictions and later going to
                 
            
                
                    [18:26] (1106.48s)
                    200 million, essentially every protein, um,
                 
            
                
                    [18:29] (1109.76s)
                    from an organism whose genome has been
                 
            
                
                    [18:31] (1111.76s)
                    sequenced. And this made an enormous
                 
            
                
                    [18:34] (1114.00s)
                    difference. And one of the most
                 
            
                
                    [18:35] (1115.04s)
                    interesting kind of sociological things
                 
            
                
                    [18:36] (1116.72s)
                    is this huge difference between when we
                 
            
                
                    [18:39] (1119.20s)
                    released a piece of code that
                 
            
                
                    [18:40] (1120.48s)
                    specialists could use and we got some
                 
            
                
                    [18:43] (1123.20s)
                    information and then when we made it
                 
            
                
                    [18:44] (1124.80s)
                    available to the world in this database
                 
            
                
                    [18:48] (1128.72s)
                    form. It was really interesting kind of
                 
            
                
                    [18:51] (1131.52s)
                    you know you release something and every
                 
            
                
                    [18:52] (1132.80s)
                    day you check Twitter to find out or
                 
            
                
                    [18:54] (1134.72s)
                    check X to find out what's going on. And
                 
            
                
                    [18:58] (1138.08s)
                    what we would really see is even after
                 
            
                
                    [19:01] (1141.44s)
                    that CASP assessment, I would say that
                 
            
                
                    [19:03] (1143.92s)
                    the structure predictors were convinced
                 
            
                
                    [19:05] (1145.76s)
                    this obviously was this enormous advance
                 
            
                
                    [19:08] (1148.96s)
                    that solved the problem. But general
                 
            
                
                    [19:10] (1150.96s)
                    biologists, the people we wanted to use it,
                 
            
                
                    [19:12] (1152.48s)
                    the people who didn't care about
                 
            
                
                    [19:13] (1153.36s)
                    structure prediction, they cared about
                 
            
                
                    [19:14] (1154.64s)
                    proteins to do their experiments, they
                 
            
                
                    [19:16] (1156.96s)
                    weren't as sure. They said, "Well, maybe
                 
            
                
                    [19:18] (1158.32s)
                    CASP was easy. I don't know." And then
                 
            
                
                    [19:21] (1161.04s)
                    this database came out and people got
                 
            
                
                    [19:23] (1163.36s)
                    curious and they clicked in and the
                 
            
                
                    [19:26] (1166.64s)
                    amount to which the proof was social was
                 
            
                
                    [19:28] (1168.80s)
                    extraordinary that people would look and
                 
            
                
                    [19:31] (1171.52s)
                    say, how did DeepMind get access to my
                 
            
                
                    [19:34] (1174.08s)
                    unpublished structure? You know, this
                 
            
                
                    [19:36] (1176.72s)
                    moment at which they really believed it
                 
            
                
                    [19:38] (1178.16s)
                    that everyone either had
                 
            
                
                    [19:41] (1181.60s)
                    a protein that they hadn't solved or had
                 
            
                
                    [19:43] (1183.44s)
                    a friend who had a protein that was
                 
            
                
                    [19:45] (1185.20s)
                    unpublished and they could compare and
                 
            
                
                    [19:47] (1187.28s)
                    that's what really made the difference.
                 
            
                
                    [19:49] (1189.04s)
                    And having this database, this
                 
            
                
                    [19:50] (1190.56s)
                    accessibility, this ease led everyone to
                 
            
                
                    [19:53] (1193.84s)
                    try it and figure out how it worked.
                 
            
                
                    [19:56] (1196.56s)
                    Word of mouth is really how this trust
                 
            
                
                    [19:59] (1199.12s)
                    is built. And you can kind of see some
                 
            
                
                    [20:00] (1200.88s)
                    of these testimonials, right? I wrestled
                 
            
                
                    [20:03] (1203.52s)
                    for three to four months trying to do
                 
            
                
                    [20:06] (1206.00s)
                    this uh scientific task. You know, this
                 
            
                
                    [20:09] (1209.44s)
                    morning I got an AlphaFold prediction
                 
            
                
                    [20:11] (1211.60s)
                    and now it's much better. I want my time
                 
            
                
                    [20:14] (1214.72s)
                    back, right? You know, you really
                 
            
                
                    [20:17] (1217.84s)
                    appreciate AlphaFold when you run it on
                 
            
                
                    [20:19] (1219.76s)
                    a protein that for a year refused to get
                 
            
                
                    [20:22] (1222.40s)
                    expressed and purified. Meaning, for
                 
            
                
                    [20:24] (1224.16s)
                    a year they couldn't even get the
                 
            
                
                    [20:25] (1225.28s)
                    material to start experiments. These are
                 
            
                
                    [20:27] (1227.76s)
                    really important. When you build the
                 
            
                
                    [20:29] (1229.28s)
                    right tool, when you solve the right
                 
            
                
                    [20:30] (1230.96s)
                    problem, it matters and it changes the
                 
            
                
                    [20:34] (1234.24s)
                    lives of people who are doing things not
                 
            
                
                    [20:37] (1237.20s)
                    that you would do but building on top of
                 
            
                
                    [20:39] (1239.76s)
                    your work. And I think it's just
                 
            
                
                    [20:41] (1241.68s)
                    extraordinary to see these and the
                 
            
                
                    [20:43] (1243.52s)
                    number of people I talked to. The time
                 
            
                
                    [20:45] (1245.92s)
                    that I really knew this tool mattered.
                 
            
                
                    [20:47] (1247.92s)
                    In fact, there was a special issue of
                 
            
                
                    [20:49] (1249.36s)
                    Science on the nuclear pore complex a
                 
            
                
                    [20:51] (1251.44s)
                    few months after the tool came out. And
                 
            
                
                    [20:54] (1254.96s)
                    the special issue was all about this
                 
            
                
                    [20:56] (1256.96s)
                    particular very large kind of several
                 
            
                
                    [20:59] (1259.36s)
                    hundred protein system. And three out of
                 
            
                
                    [21:02] (1262.48s)
                    the four, uh, papers in Science about this
                 
            
                
                    [21:05] (1265.76s)
                    made extensive use of AlphaFold. I
                 
            
                
                    [21:07] (1267.44s)
                    think I counted over a hundred mentions
                 
            
                
                    [21:08] (1268.88s)
                    of the word AlphaFold in Science and we
                 
            
                
                    [21:11] (1271.36s)
                    had nothing to do with it. We didn't
                 
            
                
                    [21:12] (1272.64s)
                    know it was happening. We weren't
                 
            
                
                    [21:14] (1274.16s)
                    collaborating. It was just people doing
                 
            
                
                    [21:16] (1276.56s)
                    new science on top of the tools we had
                 
            
                
                    [21:18] (1278.56s)
                    built and that is the greatest feeling
                 
            
                
                    [21:19] (1279.84s)
                    in the world. And in fact, users do the
                 
            
                
                    [21:22] (1282.32s)
                    darnedest things. They will use tools in
                 
            
                
                    [21:25] (1285.04s)
                    ways you didn't know were possible. The
                 
            
                
                    [21:28] (1288.56s)
                    tweet on the left from Yoshitaka Moriwaki
                 
            
                
                    [21:31] (1291.68s)
                    came out two days after our code was
                 
            
                
                    [21:33] (1293.68s)
                    available. We had predicted the
                 
            
                
                    [21:35] (1295.92s)
                    structure of individual proteins, but we
                 
            
                
                    [21:37] (1297.68s)
                    were working on building a
                 
            
                
                    [21:39] (1299.44s)
                    system that would predict how proteins
                 
            
                
                    [21:40] (1300.88s)
                    came together. But uh this researcher
                 
            
                
                    [21:43] (1303.52s)
                    said, "Well, I have AlphaFold. Why don't
                 
            
                
                    [21:45] (1305.52s)
                    I just put two proteins together and
                 
            
                
                    [21:47] (1307.20s)
                    I'll put something in between?" You
                 
            
                
                    [21:49] (1309.04s)
                    could think of this as prompt
                 
            
                
                    [21:50] (1310.16s)
                    engineering but for proteins. And
                 
            
                
                    [21:52] (1312.56s)
                    suddenly they find out this is the best
                 
            
                
                    [21:54] (1314.16s)
                    protein interaction prediction in the
                 
            
                
                    [21:56] (1316.00s)
                    world, right? That when you train on
                 
            
                
                    [21:58] (1318.32s)
                    these, a really, really powerful system,
                 
            
                
                    [22:00] (1320.72s)
                    it will have additional in some sense
                 
            
                
                    [22:03] (1323.12s)
                    emergent skills as long as they're
                 
            
                
                    [22:05] (1325.20s)
                    aligned. People started to find all
                 
            
                
                    [22:07] (1327.36s)
                    sorts of problems that AlphaFold would
                 
            
                
                    [22:11] (1331.12s)
                    work on that we hadn't anticipated. It
                 
            
                
                    [22:13] (1333.68s)
                    was so interesting to see the field of
                 
            
                
                    [22:16] (1336.72s)
                    science in real time reacting to the
                 
            
                
                    [22:19] (1339.04s)
                    existence of these tools, finding their
                 
            
                
                    [22:20] (1340.72s)
                    limitations, finding their possibilities
                 
            
                
                    [22:24] (1344.08s)
                    and this continues and people do all
                 
            
                
                    [22:26] (1346.96s)
                    sorts of exciting work, be it in protein
                 
            
                
                    [22:28] (1348.96s)
                    design, be it in other areas, on top of either
                 
            
                
                    [22:31] (1351.84s)
                    the ideas and often the systems we have
                 
            
                
                    [22:34] (1354.00s)
                    built. One application that really uh I
                 
            
                
                    [22:39] (1359.04s)
                    thought was really important is that
                 
            
                
                    [22:41] (1361.28s)
                    people have started to learn how to use
                 
            
                
                    [22:43] (1363.12s)
                    it to engineer big proteins or to use it
                 
            
                
                    [22:46] (1366.40s)
                    in part of... And I want to tell this story
                 
            
                
                    [22:48] (1368.40s)
                    for two reasons. One is I think it's a
                 
            
                
                    [22:50] (1370.00s)
                    really cool application but the second
                 
            
                
                    [22:52] (1372.00s)
                    is how it really changes the work of
                 
            
                
                    [22:54] (1374.08s)
                    science and often people will say
                 
            
                
                    [22:57] (1377.28s)
                    science is all about experiments and
                 
            
                
                    [22:59] (1379.36s)
                    validation. So it's great that you have
                 
            
                
                    [23:01] (1381.60s)
                    all these AlphaFold predictions. Now
                 
            
                
                    [23:03] (1383.68s)
                    all we have to do is solve all the
                 
            
                
                    [23:05] (1385.52s)
                    proteins the classic way so that we can
                 
            
                
                    [23:08] (1388.88s)
                    tell whether your predictions are right
                 
            
                
                    [23:10] (1390.72s)
                    or wrong. And they're right about one
                 
            
                
                    [23:13] (1393.76s)
                    thing. Science is about experiments.
                 
            
                
                    [23:15] (1395.84s)
                    Science is about doing these
                 
            
                
                    [23:17] (1397.20s)
                    experiments.
                 
            
                
                    [23:19] (1399.04s)
                    But they're wrong about another thing.
                 
            
                
                    [23:21] (1401.68s)
                    Um science is about making hypotheses
                 
            
                
                    [23:24] (1404.40s)
                    and testing them not about the structure
                 
            
                
                    [23:27] (1407.60s)
                    of a particular protein. In this case,
                 
            
                
                    [23:29] (1409.28s)
                    the question was they took this protein
                 
            
                
                    [23:32] (1412.16s)
                    on the left called the contractile
                 
            
                
                    [23:34] (1414.64s)
                    injection system, but that's a
                 
            
                
                    [23:36] (1416.56s)
                    mouthful. They like to call it the
                 
            
                
                    [23:37] (1417.92s)
                    molecular syringe. And what it does is
                 
            
                
                    [23:40] (1420.48s)
                    it attaches to a cell and injects a
                 
            
                
                    [23:43] (1423.52s)
                    protein into it. And the scientists at
                 
            
                
                    [23:45] (1425.84s)
                    the Zhang Lab at, uh, MIT were saying,
                 
            
                
                    [23:49] (1429.68s)
                    well, can we use this protein
                 
            
                
                    [23:53] (1433.36s)
                    to do targeted drug delivery? Can we use
                 
            
                
                    [23:55] (1435.36s)
                    it to get gene editors like Cas9 into
                 
            
                
                    [23:58] (1438.32s)
                    the cell? They tried over a hundred
                 
            
                
                    [24:01] (1441.04s)
                    methods to figure out how to take this
                 
            
                
                    [24:03] (1443.12s)
                    protein, which they didn't have a
                 
            
                
                    [24:04] (1444.48s)
                    structure of. This is just kind of a
                 
            
                
                    [24:05] (1445.92s)
                    rendition after the fact, and say, how
                 
            
                
                    [24:08] (1448.40s)
                    can we change what it recognizes? I
                 
            
                
                    [24:10] (1450.64s)
                    think it's originally involved in plant
                 
            
                
                    [24:12] (1452.00s)
                    defense or something like that, and they
                 
            
                
                    [24:14] (1454.00s)
                    didn't know how to do it. And they ran
                 
            
                
                    [24:15] (1455.28s)
                    an AlphaFold prediction. You can see
                 
            
                
                    [24:16] (1456.88s)
                    the one on the left. I wouldn't even say
                 
            
                
                    [24:18] (1458.00s)
                    it's a great AlphaFold prediction, but
                 
            
                
                    [24:20] (1460.16s)
                    almost immediately they looked at that
                 
            
                
                    [24:21] (1461.76s)
                    and said, "Wait a minute. Those legs at
                 
            
                
                    [24:23] (1463.92s)
                    the bottom are how it must recognize and
                 
            
                
                    [24:26] (1466.00s)
                    attach to cells. Why don't we just
                 
            
                
                    [24:28] (1468.56s)
                    replace those with a designed protein?"
                 
            
                
                    [24:31] (1471.28s)
                    And so almost immediately as soon as
                 
            
                
                    [24:32] (1472.96s)
                    they got the AlphaFold prediction, they
                 
            
                
                    [24:34] (1474.48s)
                    re-engineered it to add this designed protein
                 
            
                
                    [24:36] (1476.96s)
                    that you see in red uh to target a new
                 
            
                
                    [24:40] (1480.88s)
                    type of cell. And they take this system
                 
            
                
                    [24:45] (1485.04s)
                    and then they show in fact that they can
                 
            
                
                    [24:47] (1487.60s)
                    choose cells within a mouse and they can
                 
            
                
                    [24:50] (1490.32s)
                    inject proteins in this case fluorescent
                 
            
                
                    [24:52] (1492.48s)
                    proteins. So there you'll see the color
                 
            
                
                    [24:54] (1494.32s)
                    and they can target the cells they want
                 
            
                
                    [24:56] (1496.08s)
                    within a mouse brain. And so they are
                 
            
                
                    [24:58] (1498.48s)
                    using this to develop a new type of
                 
            
                
                    [25:00] (1500.64s)
                    system
                 
            
                
                    [25:02] (1502.24s)
                    of targeted drug delivery. And we see
                 
            
                
                    [25:05] (1505.12s)
                    many more examples. We see some in which
                 
            
                
                    [25:07] (1507.36s)
                    scientists are using this tool to try
                 
            
                
                    [25:10] (1510.08s)
                    thousands and thousands of interactions
                 
            
                
                    [25:11] (1511.84s)
                    to figure out which ones are likely to
                 
            
                
                    [25:14] (1514.16s)
                    be the case. In fact, they discovered a new
                 
            
                
                    [25:16] (1516.32s)
                    component of how eggs and sperm come
                 
            
                
                    [25:18] (1518.96s)
                    together in fertilization. Many, many of
                 
            
                
                    [25:21] (1521.44s)
                    these discoveries that are built on top
                 
            
                
                    [25:23] (1523.28s)
                    of this. And I like to think that our
                 
            
                
                    [25:26] (1526.80s)
                    work made the whole field of what's
                 
            
                
                    [25:29] (1529.28s)
                    called structural biology, biology that
                 
            
                
                    [25:31] (1531.04s)
                    deals with structures, you know, five or
                 
            
                
                    [25:33] (1533.76s)
                    10% faster. But the amount to which that
                 
            
                
                    [25:37] (1537.04s)
                    matters for the world is enormous and we
                 
            
                
                    [25:39] (1539.92s)
                    will have more of these discoveries. And
                 
            
                
                    [25:43] (1543.28s)
                    I think ultimately structure prediction
                 
            
                
                    [25:45] (1545.44s)
                    and larger AI for science should be
                 
            
                
                    [25:47] (1547.36s)
                    thought of as an incredible capability
                 
            
                
                    [25:49] (1549.20s)
                    to be an amplifier for the work of
                 
            
                
                    [25:51] (1551.36s)
                    experimentalists that we start from
                 
            
                
                    [25:53] (1553.76s)
                    these scattered observations, these
                 
            
                
                    [25:55] (1555.52s)
                    natural data. This is our equivalent of
                 
            
                
                    [25:58] (1558.32s)
                    all the words on the internet. And then
                 
            
                
                    [26:00] (1560.48s)
                    we train a general model that
                 
            
                
                    [26:02] (1562.08s)
                    understands the rules underneath it and
                 
            
                
                    [26:04] (1564.32s)
                    can fill in the rest of the picture. And
                 
            
                
                    [26:06] (1566.64s)
                    I think that we will continue to see
                 
            
                
                    [26:08] (1568.56s)
                    this pattern and it will get more
                 
            
                
                    [26:10] (1570.40s)
                    general that we will find the right
                 
            
                
                    [26:11] (1571.92s)
                    foundational data sources in order to do
                 
            
                
                    [26:15] (1575.04s)
                    this. And I think the other thing that
                 
            
                
                    [26:17] (1577.28s)
                    has really been a property is that you
                 
            
                
                    [26:20] (1580.72s)
                    start where you have data but then you
                 
            
                
                    [26:22] (1582.88s)
                    find what problems it can be applied to.
                 
            
                
                    [26:25] (1585.84s)
                    And so we find enormous advances,
                 
            
                
                    [26:28] (1588.64s)
                    enormous capability to understand
                 
            
                
                    [26:30] (1590.88s)
                    interactions in the cell or others that
                 
            
                
                    [26:33] (1593.04s)
                    are downstream of extracting the
                 
            
                
                    [26:35] (1595.60s)
                    scientific content of these predictions
                 
            
                
                    [26:39] (1599.12s)
                    and then the rules they use can be
                 
            
                
                    [26:41] (1601.12s)
                    adapted to new purposes. And I think
                 
            
                
                    [26:42] (1602.96s)
                    this is really where we see the
                 
            
                
                    [26:45] (1605.20s)
                    foundational model aspect of AlphaFold
                 
            
                
                    [26:47] (1607.92s)
                    or other narrow systems. And in fact, I
                 
            
                
                    [26:50] (1610.08s)
                    think we will start to see this on more
                 
            
                
                    [26:51] (1611.60s)
                    general systems, be they LLMs or others,
                 
            
                
                    [26:54] (1614.24s)
                    that we will find more and more
                 
            
                
                    [26:55] (1615.76s)
                    scientific knowledge within them and
                 
            
                
                    [26:58] (1618.32s)
                    we'll use them for important
                 
            
                
                    [27:00] (1620.32s)
                    purposes. And I think this is really
                 
            
                
                    [27:03] (1623.04s)
                    where this is going. And I think the
                 
            
                
                    [27:04] (1624.96s)
                    most exciting question in AI for science
                 
            
                
                    [27:08] (1628.08s)
                    is how general it will be. Will we find
                 
            
                
                    [27:10] (1630.64s)
                    a couple of narrow places where we have
                 
            
                
                    [27:12] (1632.80s)
                    transformative impact or will we have
                 
            
                
                    [27:15] (1635.28s)
                    very very broad systems? And I expect it
                 
            
                
                    [27:17] (1637.28s)
                    will ultimately be the latter as we
                 
            
                
                    [27:19] (1639.36s)
                    figure it out. Thank you.