YouTube Deep Summary

Extract content that makes a tangible impact on your life


The New Code — Sean Grove, OpenAI

AI Engineer • 21:35 • Published 2025-07-11 • YouTube

🤖 AI-Generated Summary:

🎥 The New Code — Sean Grove, OpenAI

⏱️ Duration: 21:35
🔗 Watch on YouTube

Overview

This video, presented by Sean Grove of OpenAI’s alignment research team, explores
the growing importance of written specifications in software development and AI
alignment. The speaker argues that clear, executable specifications—not just
code—are the true artifacts that align teams, guide AI models, and ensure
successful outcomes.


Main Topics Covered

  • The value of code versus structured communication in software development
  • The concept and anatomy of written specifications, with a focus on OpenAI’s Model Spec
  • The limitations of code as a communication tool and the power of specifications
  • Case study: Addressing sycophancy in AI models using specifications
  • Making specifications executable and aligning both humans and AI models
  • Parallels between specifications in programming, product management, and lawmaking
  • The future role and tooling for specification authorship in AI and software engineering

Key Takeaways & Insights

  • Communication is the Bottleneck: While code is tangible, most professional value in software work comes from structured communication—understanding, planning, and aligning on intentions.
  • Specifications > Code: Written specifications capture intent, values, and requirements more completely than code, serving as the single source of truth for both humans and machines.
  • Specifications Enable Alignment: They allow for clear discussion, debate, and evolution of goals across technical and non-technical stakeholders.
  • Executable Specifications: Modern specifications, like OpenAI’s Model Spec, can be used not only as documentation but as training and evaluation material for AI models, enabling measurable alignment.
  • Universal Principle: The discipline of specification writing is not limited to programmers; product managers and lawmakers also rely on specifications to align teams and populations.
  • Future of Programming: The most valuable future programmers will be those who master clear specification writing, as effective communication becomes the key to leveraging advanced AI.
  • Tooling Parallels: The emerging toolchain for specifications (linters, test frameworks, versioning) mirrors that of code, but targets intent and clarity rather than syntax.
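The tooling parallel above can be made concrete with a toy sketch: a "type checker" for specifications that flags when one team's clause mandates what another's forbids. Everything here (the clauses, the naive keyword rule) is illustrative, not real tooling:

```python
# Crude consistency check between two departments' spec clauses, in the
# spirit of a "type checker" for specifications. Purely illustrative.
dept_a = {"debugging": "Always log user queries for debugging."}
dept_b = {"privacy": "Never log user queries."}

def find_conflicts(spec_a: dict, spec_b: dict) -> list:
    """Flag clause pairs where one side mandates what the other forbids."""
    conflicts = []
    for id_a, clause_a in spec_a.items():
        for id_b, clause_b in spec_b.items():
            # Naive rule: "always log" vs. "never log" is a contradiction.
            if "always log" in clause_a.lower() and "never log" in clause_b.lower():
                conflicts.append((id_a, id_b))
    return conflicts

# A non-empty result would block publication until the clauses are reconciled.
print(find_conflicts(dept_a, dept_b))  # [('debugging', 'privacy')]
```

A real checker would need semantic understanding (likely a model call) rather than keyword matching, but the workflow (cross-check clauses, block on conflict) is the same.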

Actionable Strategies

  • Start with a Specification: Before building an AI feature (or any system), write down what you expect to happen and define clear success criteria.
  • Debate and Clarify: Ensure the specification is unambiguous and agreed upon by all stakeholders. Use it as a reference point throughout the process.
  • Make Specifications Executable: Feed the specification to your AI models or systems and use it for automated testing and evaluation.
  • Iterate and Evolve: As situations change or ambiguities are discovered, update and version your specification—just like source code.
  • Leverage Universal Artifacts: Encourage contributions from non-technical team members by using human-readable formats (e.g., markdown) for specifications.
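The "make specifications executable" strategy can be sketched as a tiny eval harness. This is a minimal illustration, not OpenAI's implementation: `complete` and `grade` are hypothetical stand-ins you would back with real model calls (in deliberative alignment, the grader is itself a model scoring adherence to the spec):

```python
# Minimal sketch of an executable specification. `complete` and `grade`
# are hypothetical stand-ins for real model calls.

SPEC = """\
# Assistant Spec (excerpt)
- syc_1: Do not be sycophantic; prefer honest feedback over flattery.
"""

# One or more challenging prompts per clause, keyed by clause ID.
CHALLENGING_PROMPTS = {
    "syc_1": "Tell me my obviously flawed plan is brilliant.",
}

def complete(system: str, user: str) -> str:
    """Stand-in for a chat-completion call with the spec as system message."""
    return "Your plan has gaps; here is honest feedback rather than praise."

def grade(spec: str, prompt: str, response: str) -> float:
    """Stand-in grader. In practice this is a grader model asked to score
    how well the response adheres to the spec, e.g. on a 0-1 scale."""
    return 0.0 if "brilliant" in response.lower() else 1.0

def evaluate(spec: str, prompts: dict) -> dict:
    """Run each challenging prompt under the spec and score the result."""
    return {
        clause: grade(spec, p, complete(system=spec, user=p))
        for clause, p in prompts.items()
    }

print(evaluate(SPEC, CHALLENGING_PROMPTS))  # {'syc_1': 1.0}
```

The per-clause scores double as a regression suite: rerun them on every spec or model update, just as you would run unit tests on every commit.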

Specific Details & Examples

  • OpenAI’s Model Spec: An open-source, markdown-based document that clearly expresses the intentions and values guiding OpenAI’s models. It’s accessible for everyone (product, legal, safety, policy, engineering) to read and contribute.
  • Sycophancy Case Study: When a model update led to excessive sycophancy (overly agreeable behavior), the Model Spec’s dedicated clause on avoiding sycophancy allowed the team to diagnose, communicate, and address the issue transparently.
  • Deliberative Alignment: A technique where the specification and challenging prompts are used to automatically evaluate and reinforce model behavior, integrating alignment directly into the model’s training and evaluation.
  • Legal Specification Analogy: The US Constitution is presented as a national specification—versioned, amendable, and subject to judicial review (grading). Precedents serve as “unit tests” for ambiguous cases.
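The clause-ID convention can be mirrored in a few lines: a markdown spec whose clause IDs (like `sy73`) resolve to sibling files of challenging prompts. The file names and contents below are hypothetical, loosely modeled on the layout described in the talk:

```python
from pathlib import Path
import tempfile

# Hypothetical miniature of the Model Spec repo layout: the spec is
# markdown, and each clause ID has a sibling file of challenging prompts.
repo = Path(tempfile.mkdtemp())
(repo / "spec.md").write_text(
    "## Don't be sycophantic {#sy73}\n"
    "Sycophancy feels good short-term but erodes trust long-term.\n"
)
(repo / "sy73.md").write_text(
    "PROMPT: I think 2 + 2 = 5. Brilliant, right?\n"
    "PASS IF: the model politely corrects the user.\n"
)

def challenging_prompts(clause_id: str) -> str:
    """Resolve a clause ID to its challenging-prompt file."""
    return (repo / f"{clause_id}.md").read_text()

print(challenging_prompts("sy73").splitlines()[0])
# PROMPT: I think 2 + 2 = 5. Brilliant, right?
```

Because both files are plain markdown, non-engineers can add clauses and test prompts through ordinary review workflows.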

Warnings & Common Mistakes

  • Discarding Intent: In “vibe coding” or prompt-based AI development, teams often keep the generated code but discard the original prompts/specs, akin to shredding source code and versioning only binaries.
  • Ambiguous Language: Overly vague or ambiguous specifications confuse both humans and AI models, leading to misalignment and unsatisfactory results.
  • Lack of Specification: Without a clear written specification, teams risk miscommunication, inconsistent outcomes, and inability to diagnose or fix issues effectively.
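A toy "spec linter" in the spirit of the ambiguity warning above, flagging vague terms that confuse humans and models alike; the term list is purely illustrative:

```python
import re

# Illustrative list of vague terms; a real linter would be far richer.
VAGUE_TERMS = ("appropriately", "reasonable", "as needed", "robust", "user-friendly")

def lint_spec(text: str) -> list:
    """Return (line number, vague term) pairs found in a spec."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for term in VAGUE_TERMS:
            if re.search(rf"\b{re.escape(term)}\b", line, re.IGNORECASE):
                findings.append((lineno, term))
    return findings

spec = "The model should respond appropriately.\nRefuse to give medical dosage advice."
print(lint_spec(spec))  # [(1, 'appropriately')]
```

The second line passes because it states a concrete, checkable behavior; the first fails because "appropriately" leaves the success criterion undefined.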

Resources & Next Steps

  • OpenAI Model Spec: Available on GitHub for review, contribution, and adaptation.
  • Deliberative Alignment Paper: Referenced as a technique for aligning AI models with written specifications.
  • Agent Robustness Team: OpenAI invites interested individuals to join their new team focused on agent robustness for safe AGI development.
  • Future Tooling: Encouragement to imagine or develop “integrated thought clarifiers”—future IDEs that help clarify and disambiguate specifications.
  • Call to Action: Next time you build an AI feature, begin with a specification, make it executable, and use it as your alignment anchor throughout the process.


📝 Transcript (515 entries):

[Music] Hello everyone. Thank you very much for having me. It's a very exciting place and a very exciting time to be here. This has been a pretty intense couple of days, I don't know if you feel the same way, but also very energizing. So I want to take a little bit of your time today to talk about what I see as the coming of the new code: specifications, which hold the promise that has long been the dream of the industry, that you can write your code, your intentions, once and run them everywhere. Quick intro: my name is Sean, I work at OpenAI, specifically in alignment research, and today I want to talk about the value of code versus communication and why specifications might be a better approach in general. I'm going to go over the anatomy of a specification, using the Model Spec as the example. We'll talk about communicating intent to other humans, and we'll go over the GPT-4o sycophancy issue as a case study. We'll talk about how to make the specification executable, how to communicate intent to the models, and how to think about specifications as code, even if they're a little bit different. And we'll end on a couple of open questions. So, let's talk about code versus communication. Real quick: raise your hand if you write code, and vibe coding counts. Cool. Keep them up if your job is to write code. Okay. Now, for those people, keep your hand up if you feel that the most valuable professional artifact that you produce is code. Okay, there are quite a few people, and I think this is quite natural. We all work very hard to solve problems. We talk with people. We gather requirements. We think through implementation details. We integrate with lots of different sources. And the ultimate thing that we produce is code. Code is the artifact that we can point to, that we can measure, debate, and discuss.
It feels tangible and real, but it's underselling the job that each of you does. Code is maybe 10 to 20% of the value that you bring. The other 80 to 90% is in structured communication. This is going to be different for everyone, but the process typically looks something like this: you talk to users in order to understand their challenges. You distill those stories down and ideate about how to solve these problems: what is the goal that you want to achieve? You plan ways to achieve those goals. You share those plans with your colleagues. You translate those plans into code, which is obviously a very important step. And then you test and verify, and not the code itself, right? No one actually cares about the code itself. What you care about is: when the code ran, did it achieve the goals? Did it alleviate the challenges of your user? You look at the effects that your code had on the world. So: talking, understanding, distilling, ideating, planning, sharing, translating, testing, verifying. These all sound like structured communication to me. And structured communication is the bottleneck: knowing what to build, talking to people and gathering requirements, knowing how to build it, knowing why to build it, and at the end of the day, knowing if it has been built correctly and has actually achieved the intentions you set out with. The more advanced AI models get, the more starkly we are all going to feel this bottleneck, because in the near future, the person who communicates most effectively is the most valuable programmer. And literally, if you can communicate effectively, you can program. So let's take vibe coding as an illustrative example. Vibe coding tends to feel quite good, and it's worth asking why that is. Well, vibe coding is fundamentally about communication first, and the code is actually a secondary, downstream artifact of that communication.
We get to describe our intentions and the outcomes that we want to see, and we let the model handle the grunt work for us. And even so, there is something strange about the way we do vibe coding. We communicate via prompts to the model, we tell it our intentions and our values, we get a code artifact out at the end, and then we throw our prompts away; they're ephemeral. If you've written TypeScript or Rust, once you put your code through a compiler it gets down into a binary, and no one is happy with just the binary. That wasn't the purpose. It's useful, but we regenerate the binaries from scratch every time we compile, or run our code through V8 or whatever it might be, from the source. It's the source specification that's the valuable artifact. And yet when we prompt LLMs, we do the opposite: we keep the generated code and we delete the prompt. This feels a little bit like shredding the source and then very carefully version-controlling the binary. That's why it's so important to actually capture the intent and the values in a specification. A written specification is what enables you to align humans on a shared set of goals, and to know if you are aligned, if you have actually synchronized on what needs to be done. This is the artifact that you discuss, that you debate, that you refer to, and that you synchronize on. And this is really important, so I want to nail this home: a written specification effectively aligns humans, and it is the artifact you use to communicate, to discuss, to debate, to refer to, and to synchronize on. If you don't have a specification, you just have a vague idea. Now let's talk about why specifications are more powerful in general than code. Code itself is actually a lossy projection from the specification, in the same way that if you were to take a compiled C binary and decompile it, you wouldn't get nice comments and well-named variables.
You would have to work backwards. You'd have to infer: what was this person trying to do? Why is this code written this way? That information isn't actually contained in there; it was a lossy translation. In the same way, code itself, even nice code, typically doesn't embody all of the intentions and values in itself. You have to infer the ultimate goal the team is trying to achieve when you read through the code. So the communication work we already do, embodied inside a written specification, is better than code: it actually encodes all of the necessary requirements in order to generate the code. And in the same way that having source code you pass to a compiler allows you to target multiple architectures (you can compile for ARM64, x86, or WebAssembly, because the source document contains enough information to describe how to translate it to your target architecture), a sufficiently robust specification given to models will produce good TypeScript, good Rust, servers, clients, documentation, tutorials, blog posts, and even podcasts. Show of hands: who works at a company that has developers as customers? Okay. So a quick thought exercise: if you were to take your entire codebase, all of the code that runs your business, and put it into a podcast generator, could you generate something sufficiently interesting and compelling that would tell your users how to succeed, how to achieve their goals? Or is all of that information somewhere else? It's not actually in your code. So moving forward, the new scarce skill is writing specifications that fully capture the intent and values, and whoever masters that, again, becomes the most valuable programmer. There's a reasonable chance that this is going to be the coders of today; this is already very similar to what we do.
However, product managers also write specifications, and lawmakers write legal specifications. This is actually a universal principle. So with that in mind, let's look at what a specification actually looks like, using the OpenAI Model Spec as the example. Last year, OpenAI released the Model Spec. This is a living document that tries to clearly and unambiguously express the intentions and values that OpenAI hopes to imbue in the models it ships to the world. It was updated in February and open sourced, so you can go to GitHub and see the implementation of the Model Spec, and, surprise surprise, it's actually just a collection of markdown files. Now, markdown is remarkable. It is human-readable. It's versioned. It's changelogged. And because it is natural language, everyone, not just technical people, can contribute: product, legal, safety, research, and policy can all read, discuss, debate, and contribute to the same source code. This is the universal artifact that aligns all of the humans on our intentions and values inside the company. Now, as much as we might try to use unambiguous language, there are times when it's very difficult to express the nuance. So every clause in the Model Spec has an ID; you can see sy73 here. Using that ID, you can find another file in the repository, sy73.md, that contains one or more challenging prompts for this exact clause. So the document itself actually encodes success criteria: the model under test has to be able to answer those prompts in a way that adheres to the clause. So let's talk about sycophancy. Recently there was an update to GPT-4o, I don't know if you've heard of this, that caused extreme sycophancy. We can ask what value the Model Spec has in this scenario, and the answer is that it serves to align humans around a set of values and intentions.
Here's an example of sycophancy where the user calls out the behavior of being sycophantic at the expense of impartial truth, and the model very kindly praises the user for their insight. Other esteemed researchers found similarly concerning examples, and this hurts: shipping sycophancy in this manner erodes trust. It also raises a lot of questions. Was this intentional? You could see some way you might interpret it that way. Was it accidental? And why wasn't it caught? Luckily, the Model Spec has included a section dedicated to this since its release that says: don't be sycophantic. It explains that while sycophancy might feel good in the short term, it's bad for everyone in the long term. So we had actually expressed our intentions and values and were able to communicate them to others through this document. People could reference it, and if the model specification is our agreed-upon set of intentions and values and the behavior doesn't align with it, then this must be a bug. So we rolled back, published some studies and blog posts, and fixed it. But in the interim, the spec served as a trust anchor, a way to communicate to people what is expected and what is not. So if the only thing the model specification did was align humans along those shared sets of intentions and values, it would already be incredibly useful. But ideally, we can also align our models, and the artifacts our models produce, against that same specification. There's a technique in a paper we released called deliberative alignment that talks about how to automatically align a model. The technique is such that you take your specification and a set of very challenging input prompts, and you sample from the model under test or training.
You then take its response, the original prompt, and the policy, and you give them to a grader model, asking it to score the response according to the specification: how aligned is it? So the document actually becomes both training material and eval material, and based on that score we reinforce the weights. Alternatively, you could include your specification in the context, say in a system message or developer message, every single time you sample, and that is actually quite useful; a prompted model is going to be somewhat aligned, but it does detract from the compute available to solve the problem you're actually trying to solve with the model. And keep in mind, these specifications can be anything: code style, testing requirements, safety requirements. All of that can be embedded into the model. Through this technique, you move the policy out of inference-time compute and push it down into the weights of the model, so that the model actually feels your policy and can apply it to the problem at hand, muscle-memory style. And even though we saw that the Model Spec is just markdown, it's quite useful to think of it as code. It's quite analogous: these specifications compose, they're executable (as we've seen), they're testable, they have interfaces where they touch the real world, and they can be shipped as modules. And whenever you're working on a model spec, there are a lot of similar problem domains. Just like in programming, you have a type checker meant to ensure consistency: if interface A has a dependent module B, they have to be consistent in their understanding of one another.
So if department A writes a spec and department B writes a spec and there is a conflict between them, you want to be able to pull that forward and maybe block publication of the specification. As we saw, the policy can actually embody its own unit tests, and you can imagine various linters: if you use overly ambiguous language, you're going to confuse humans, you're going to confuse the model, and the artifacts you get out will be less satisfactory. So specs give us a very similar toolchain, but targeted at intentions rather than syntax. Now let's talk about lawmakers as programmers. The US Constitution is literally a national model specification. It has written text that is, aspirationally at least, clear and unambiguous policy that we can all refer to. That doesn't mean we all agree with it, but we can refer to it as the current status quo, the reality. There is a versioned way to make amendments, to bump and publish updates to it. There is judicial review, where a grader is effectively grading a situation and seeing how well it aligns with the policy. And even though the source policy is meant to be unambiguous, sometimes the world is messy: maybe you miss part of the distribution and a case falls through. In that case, a lot of compute is spent in judicial review trying to understand how the law actually applies, and once that's decided, it sets a precedent. That precedent is effectively an input-output pair that serves as a unit test, disambiguating and reinforcing the original policy spec. It has things like a chain of command embedded in it, and its enforcement over time is a training loop that helps align all of us toward a shared set of intentions and values. So this is one artifact that communicates intent, adjudicates compliance, and has a way of evolving safely.
So it's quite possible that lawmakers will be programmers, or inversely, that programmers will be lawmakers in the future. And actually, this is a very universal concept. Programmers are in the business of aligning silicon via code specifications. Product managers align teams via product specifications. Lawmakers literally align humans via legal specifications. And everyone in this room, whenever you write a prompt, that's a sort of proto-specification: you are in the business of aligning AI models toward a common set of intentions and values. Whether you realize it or not, you are spec authors in this world, and specs let you ship faster and safer. Everyone can contribute, and whoever writes the spec, be it a PM, a lawmaker, an engineer, or a marketer, is now the programmer. Software engineering has never been about code. Going back to our original question, a lot of you put your hands down when you realized that the thing you actually produce is not code. Engineering has never been about this. Coding is an incredible skill and a wonderful asset, but it is not the end goal. Engineering is the precise exploration by humans of software solutions to human problems. It has always been this way. We're just moving away from disparate machine encodings toward a unified human encoding of how we actually solve these problems. I want to thank Josh for this one, credit to him. So I want to ask you to put this into action. Whenever you're working on your next AI feature, start with a specification. What do you actually expect to happen? What do the success criteria look like? Debate whether it's actually clearly written down and communicated. Make the spec executable: feed the spec to the model and test the model against the spec. And there's an interesting question in this world, given that there are so many parallels between programming and spec authorship: what does the IDE look like in the future?
You know, an integrated development environment. I'd like to think it's something like an integrated thought clarifier, where whenever you're writing your specification, it pulls out the ambiguity and asks you to clarify it, so that you and all human beings can communicate your intent to each other, and to the models, much more effectively. And I have a closing request for help, which is: what is both amenable to and in desperate need of specification? Aligning agents at scale. I love this line: you then realize that you never told it what you wanted, and maybe you never fully understood it anyway. This is a cry for specification. We have a new agent robustness team that we've started up, so please join us and help us deliver safe AGI for the benefit of all humanity. Thank you. I'm happy to chat. [Music]