[00:00] (0.08s)
Hey, my name is Greg and I've spent
[00:02] (2.00s)
hundreds of dollars hacking with Claude
[00:03] (3.52s)
Code over the last couple of months.
[00:05] (5.68s)
It's become sort of my default coding
[00:08] (8.40s)
agent whenever I'm starting new projects
[00:10] (10.56s)
or especially if I'm working on an older
[00:12] (12.72s)
project that has a larger code base.
[00:14] (14.96s)
Last week, OpenAI launched their Codex CLI,
[00:18] (18.40s)
and I was really excited to try it out.
[00:20] (20.32s)
So, I tried both Claude Code and OpenAI's
[00:24] (24.64s)
Codex out head-to-head on a couple of
[00:26] (26.56s)
different projects. I'll kind of go
[00:28] (28.80s)
against YouTube best practices here and
[00:30] (30.72s)
just spoil it: Codex fell pretty
[00:34] (34.16s)
short, and it was disappointing. And
[00:37] (37.84s)
so in the rest of the video, I'm going
[00:38] (38.80s)
to share my opinion on what OpenAI needs
[00:40] (40.64s)
to do to bring Codex up to the same
[00:43] (43.28s)
standard that Claude Code is at now.
[00:45] (45.44s)
And honestly, I think there are some
[00:47] (47.12s)
simple fixes having to do with the
[00:49] (49.20s)
developer
[00:50] (50.12s)
experience. Before we get into this, I
[00:52] (52.24s)
want to preface this by saying I'm
[00:53] (53.60s)
wearing my OpenAI DevDay hoodie. I've
[00:55] (55.92s)
been hacking on OpenAI's APIs starting
[00:58] (58.40s)
with the GPT-3 text completion endpoint.
[01:01] (61.28s)
OpenAI has shipped some of the best
[01:03] (63.76s)
developer experiences that I've ever
[01:05] (65.92s)
encountered in my career. I have a
[01:08] (68.32s)
number of former colleagues and friends
[01:09] (69.92s)
from Twilio who work there now. I
[01:12] (72.48s)
generally don't want this channel or
[01:14] (74.48s)
really anything I do to be about
[01:16] (76.08s)
criticizing other people, because
[01:18] (78.16s)
shipping stuff is really hard. I
[01:20] (80.56s)
only feel comfortable critiquing this
[01:22] (82.24s)
because I know that OpenAI's bar for
[01:24] (84.88s)
developer experiences has historically
[01:27] (87.12s)
been really high, and this one just sort
[01:30] (90.08s)
of fell surprisingly short. So, let's
[01:32] (92.48s)
talk a little bit about a few points
[01:35] (95.04s)
of the developer experience. When I
[01:37] (97.12s)
personally start up Codex, I'm given
[01:40] (100.16s)
a message that o4-mini is not available to
[01:42] (102.56s)
me. My account is tier 5. The
[01:45] (105.60s)
announcement said that o4-mini should be
[01:47] (107.04s)
rolled out to all tier 5 accounts. So,
[01:49] (109.04s)
they've set my expectations up here,
[01:51] (111.28s)
and they're delivering down here, with no
[01:53] (113.60s)
explanation of the gap. But Codex
[01:55] (115.76s)
does ship with o4-mini as the
[01:57] (117.68s)
default. And it does enough checking to
[01:59] (119.84s)
let me know that I don't have access to
[02:02] (122.32s)
that model. But why doesn't it just fall
[02:04] (124.56s)
back to a model that works then?
[02:06] (126.64s)
Instead, if I try to chat with o4-mini,
[02:08] (128.88s)
it just fails. Okay, fine. I guess I
[02:11] (131.84s)
need to change my model. It's not
[02:14] (134.08s)
obvious on the screen how I do that, but
[02:16] (136.64s)
it does say I can type /help here. So,
[02:18] (138.32s)
I'll type /help. All right, cool.
[02:19] (139.84s)
There's a /model command. Let's try
[02:23] (143.72s)
/model. Oh my gosh. Which model am I
[02:27] (147.68s)
supposed to choose from
[02:29] (149.40s)
here? Do I want Babbage? Do I want
[02:33] (153.80s)
GPT-3.5? I follow these things pretty
[02:36] (156.32s)
closely and I'm pretty sure that of this
[02:38] (158.96s)
list, o3-mini is the best option for me.
[02:42] (162.72s)
And I'm pretty sure that o3-mini is
[02:45] (165.44s)
better than this dated version of
[02:47] (167.36s)
o3-mini. I'm pretty sure o3-mini is
[02:48] (168.80s)
going to point to the latest one, but
[02:50] (170.00s)
I'm only like 90% sure. But hey,
[02:53] (173.44s)
just for shits and giggles, let's
[02:55] (175.12s)
choose one of the older ones. Let's try
[02:56] (176.80s)
Babbage. This is before my time. Let's
[02:59] (179.12s)
just try Babbage and play. Oh, man. And
[03:02] (182.40s)
it fails. All right. Well, let's try one
[03:04] (184.40s)
of these newer models. Oh, look. That
[03:06] (186.88s)
fails, too. Well, let's try one of these
[03:09] (189.04s)
other models. You know, it's
[03:11] (191.04s)
weird that DALL-E is in here, right? I
[03:12] (192.72s)
mean, that doesn't seem like we're going
[03:13] (193.84s)
to do image generation here. Let's try
[03:15] (195.20s)
that one. That fails as
[03:17] (197.56s)
well. Why are all of these models listed
[03:21] (201.84s)
here if they don't work with this
[03:24] (204.68s)
product? This just baffles me. This
[03:27] (207.76s)
is
[03:29] (209.56s)
like an if statement. You know,
[03:32] (212.72s)
this is 30 minutes of work. This is like
[03:34] (214.56s)
if you had a couple dozen people trying
[03:37] (217.24s)
this, this feedback would have been
[03:39] (219.36s)
surfaced. And so I think this sort of
[03:41] (221.84s)
thing is the thing that frustrated me
[03:43] (223.12s)
most coming from OpenAI because this
[03:45] (225.76s)
just sort of feels like wasting people's
[03:47] (227.76s)
time in order to get some headlines.
[03:50] (230.72s)
Maybe that's not it,
[03:52] (232.72s)
but that's what it feels like from my
[03:55] (235.04s)
perspective: I haven't even gotten
[03:57] (237.12s)
started yet and I'm having to make these
[03:58] (238.84s)
decisions. It's like trying to pick
[04:01] (241.28s)
which door has the death trap and which
[04:05] (245.28s)
one leads to the happy place. And I
[04:07] (247.84s)
feel like a developer should never have
[04:09] (249.36s)
to make that decision just simply to get
[04:11] (251.28s)
to hello world with your product. API
[04:13] (253.84s)
key management. When you boot up Claude
[04:15] (255.68s)
Code, it's going to ask you to auth into
[04:17] (257.76s)
your account and then it's going to set
[04:19] (259.84s)
up your API key and store it away into a
[04:22] (262.48s)
configuration file for you. Codex, on
[04:24] (264.88s)
the other hand, expects you to set the
[04:27] (267.92s)
API key as an environment variable,
[04:30] (270.80s)
specifically either as a global
[04:32] (272.88s)
environment variable or in a
[04:37] (277.08s)
.env file. This is very similar to how you
[04:40] (280.24s)
would set the API key if you were just
[04:42] (282.40s)
going to use the OpenAI API in one of
[04:45] (285.04s)
your projects. But that doesn't
[04:47] (287.84s)
actually make sense for this tool.
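For reference, the setup Codex expects looks roughly like this (the key value is a placeholder, and the .env detail is based on its documented behavior at launch):

export OPENAI_API_KEY="sk-..."   # global: anything running in your shell can read this
# or, per project, a .env file in the repo root containing:
# OPENAI_API_KEY=sk-...

Note that this is the exact same OPENAI_API_KEY variable your own application code would read.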
[04:50] (290.40s)
I really like how Anthropic expects you to
[04:52] (292.64s)
spin up a second API key or I think even
[04:55] (295.92s)
a separate project for Claude Code,
[05:00] (300.00s)
because the usage is different, right?
[05:02] (302.00s)
And so if I've set an OpenAI
[05:04] (304.88s)
API key in my project, that means, you
[05:07] (307.92s)
know, I have some code in that project
[05:09] (309.04s)
that's using the OpenAI API and it's
[05:12] (312.08s)
using that API key. And I don't want the
[05:15] (315.12s)
usage of the coding tool to spend
[05:19] (319.68s)
against that same API key. Also, I
[05:22] (322.56s)
really don't like the idea that you need
[05:24] (324.88s)
to set a global OpenAI API key if
[05:29] (329.92s)
you want to use this across all of your
[05:31] (331.28s)
projects. So, now there's this OpenAI
[05:33] (333.84s)
API key just hanging out in your
[05:35] (335.60s)
environment all the time that could be
[05:38] (338.00s)
snagged by any script that's running
[05:40] (340.56s)
locally or, I suspect, any app that's
[05:42] (342.64s)
running. And so it just seems
[05:45] (345.08s)
unnecessarily insecure. There's really
[05:48] (348.00s)
not a good option with Codex to create
[05:51] (351.68s)
an API key that's solely dedicated to
[05:54] (354.40s)
the usage of Codex and then to store it
[05:56] (356.88s)
in a persistent way so that you can use it
[05:59] (359.20s)
across all of the different projects you
[06:01] (361.20s)
might be working on. And Claude Code
[06:02] (362.96s)
makes this super easy. Cost management.
[06:05] (365.44s)
So, as I mentioned, if there's any
[06:07] (367.76s)
major complaint about Claude Code, it's
[06:09] (369.28s)
that it can get expensive compared to
[06:11] (371.36s)
what developers are used to paying in
[06:14] (374.16s)
order to write code. But still, you
[06:16] (376.08s)
know, it's probably pretty good value
[06:17] (377.36s)
compared to hiring a human to do it.
[06:19] (379.76s)
And they know this, and so they give you
[06:21] (381.68s)
a couple of options to manage the cost
[06:24] (384.40s)
over the course of your session. The
[06:26] (386.16s)
first is /cost. At any point in time,
[06:28] (388.56s)
you can run the /cost command and you
[06:30] (390.48s)
can see how much you've spent during
[06:31] (391.92s)
this coding session. This is really
[06:33] (393.36s)
really useful. Two, they give you the
[06:35] (395.68s)
/compact command, right? So, it's
[06:37] (397.92s)
just basically taking everything you've
[06:39] (399.28s)
done, bullet-pointing it, and then
[06:41] (401.68s)
significantly reducing or compacting the
[06:44] (404.40s)
context with a summary so that Claude
[06:47] (407.04s)
Code still knows what's happened, but
[06:49] (409.12s)
it's able to know that with much, much
[06:52] (412.16s)
smaller requests. So, each request is
[06:54] (414.16s)
going to be a lot less expensive.
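Concretely, both of those are just commands you type mid-session:

/cost      shows how much the current Claude Code session has spent so far
/compact   summarizes the conversation so far and carries on with the smaller context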
[06:55] (415.92s)
Codex does none of this. There's no way in any
[06:57] (417.68s)
given session to figure out how much
[06:58] (418.88s)
you've spent. And there is no concept
[07:01] (421.52s)
of compacting your history. Project
[07:03] (423.60s)
context. When you start up Claude Code,
[07:06] (426.96s)
it does a scan of your directory that
[07:10] (430.56s)
you're in, of your codebase, and it
[07:12] (432.56s)
tries to gain an understanding of your
[07:14] (434.80s)
codebase. Then you can run the
[07:17] (437.76s)
/init command to write what it's
[07:20] (440.56s)
learned about the codebase to a CLAUDE.md
[07:23] (443.36s)
file. So, it doesn't need to do that
[07:25] (445.04s)
analysis every single time. You can also
[07:27] (447.28s)
give it instructions on your
[07:29] (449.60s)
formatting preferences and on the
[07:32] (452.72s)
coding conventions that you're
[07:34] (454.16s)
going to use in the project so that
[07:35] (455.60s)
you're not starting from scratch every
[07:36] (456.96s)
time you start this up.
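To make that concrete, here's a hypothetical sketch of a CLAUDE.md; the project details are invented, but it's the kind of file /init generates and you then extend by hand with your own preferences:

# CLAUDE.md
## Overview
TypeScript monorepo: web app in apps/web, shared libraries in packages/.
## Conventions
- Run npm run lint and npm test before considering a task done.
- Two-space indentation; prefer named exports.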
[07:39] (459.52s)
Codex, when you start it up, does nothing. It just
[07:42] (462.56s)
basically gives you a text box to chat
[07:45] (465.12s)
with whatever model you have chosen. It
[07:47] (467.52s)
doesn't take any initiative to
[07:49] (469.20s)
understand your codebase to understand
[07:50] (470.96s)
what is going on. Now, once you
[07:53] (473.12s)
initiate the chat, it will. But I really
[07:56] (476.40s)
love the idea that since this is a
[07:58] (478.32s)
coding agent, and since you ran it in a
[07:59] (479.92s)
specific directory, it can assume that
[08:02] (482.16s)
you want it to understand your codebase
[08:03] (483.92s)
before you start working with it. So why
[08:05] (485.68s)
not go ahead and start doing that? And
[08:07] (487.36s)
then, once you have started doing that,
[08:09] (489.28s)
why not, in order to save costs and save
[08:11] (491.36s)
time, write what you've learned to a
[08:13] (493.60s)
standardized file so you don't have to
[08:15] (495.20s)
repeat that process every time? Fifth
[08:17] (497.28s)
is just the UI. I mean, Claude Code is a
[08:19] (499.36s)
really, really nicely designed CLI, and
[08:21] (501.84s)
there are a lot of subtleties in Claude
[08:24] (504.32s)
Code, like the way that they separate
[08:26] (506.56s)
the output from your section down
[08:28] (508.80s)
below where you're going to do the
[08:29] (509.92s)
input, the way they do the syntax
[08:32] (512.92s)
highlighting, just the colors
[08:35] (515.28s)
they've used. It just feels like
[08:37] (517.68s)
a very intentional product. By
[08:40] (520.88s)
contrast, Codex feels sort of like the
[08:42] (522.72s)
minimum viable design. You know, instead
[08:44] (524.88s)
of a single line that shows how long
[08:47] (527.92s)
you've been waiting, it just has
[08:50] (530.08s)
sort of a console log that outputs
[08:53] (533.52s)
every few seconds saying, "Okay, now
[08:55] (535.60s)
you waited for 24 seconds. Now you
[08:57] (537.04s)
waited for 26 seconds." And it just
[08:58] (538.60s)
scrolls there. There's no clear visual
[09:01] (541.04s)
delineation between where the user
[09:03] (543.68s)
does their input and the output of the
[09:05] (545.60s)
agent. And there's not a
[09:09] (549.04s)
lot of good color contrast. Claude Code
[09:11] (551.12s)
even has some nice considerations to
[09:12] (552.56s)
change the color scheme if you're color
[09:14] (554.40s)
blind. Codex does not have this. So
[09:17] (557.12s)
again, it just feels like this was the
[09:19] (559.60s)
minimum possible that could be done and
[09:21] (561.36s)
then it was pushed out. This thing
[09:22] (562.80s)
crashes. I've never had Claude
[09:25] (565.68s)
Code crash on me. Once I switched over
[09:28] (568.00s)
to o3-mini, it has crashed multiple
[09:30] (570.00s)
times on me. Just a hard crash, and then
[09:32] (572.40s)
to start it back up again, I have to go
[09:34] (574.40s)
through and set all the same preferences
[09:35] (575.68s)
again, and then I lose all the context
[09:37] (577.04s)
and everything. And it just, again,
[09:39] (579.76s)
sort of feels like table stakes.
[09:41] (581.36s)
It's super, super frustrating, and
[09:43] (583.36s)
it just feels like it's not ready. MCP
[09:45] (585.36s)
support. You can now add MCP servers
[09:49] (589.44s)
to Claude Code. One great use case
[09:51] (591.60s)
that I learned about on a Claude Code
[09:54] (594.08s)
webinar last week was using the
[09:56] (596.92s)
Puppeteer MCP server to control a
[09:59] (599.76s)
browser. So you can ask Claude Code to
[10:01] (601.44s)
make changes and then you can have it
[10:03] (603.28s)
control a browser to view those changes
[10:05] (605.28s)
and it sort of closes the loop on the
[10:07] (607.44s)
feedback cycle. Once you add MCP
[10:10] (610.40s)
servers to these tools, it just really
[10:12] (612.16s)
opens up the world of possibilities of
[10:14] (614.56s)
what you can do with them.
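For reference, wiring that up in Claude Code is roughly a one-liner; the package name here is my assumption of the community Puppeteer MCP server, so check the current docs for the exact command:

claude mcp add puppeteer -- npx -y @modelcontextprotocol/server-puppeteer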
[10:16] (616.08s)
Speaking of that webinar, I don't
[10:20] (620.16s)
remember the last time I sat in on a
[10:22] (622.08s)
webinar for a software product. But
[10:24] (624.64s)
I've used Claude Code so much that
[10:27] (627.44s)
I wanted to participate and learn how
[10:29] (629.68s)
to get more use out of it, and so the
[10:32] (632.64s)
webinar from Anthropic was called the
[10:34] (634.24s)
origin story of Claude Code and best
[10:36] (636.00s)
practices, so I'm like, I'm in. And
[10:38] (638.88s)
there was something they said that was
[10:39] (639.92s)
interesting there. They said Claude Code
[10:41] (641.20s)
started off as an internal tool, and then
[10:43] (643.36s)
when they released it internally, they
[10:45] (645.76s)
saw adoption just sort of go
[10:48] (648.40s)
up and to the right. And so they said,
[10:49] (649.76s)
okay, there must be something here, and
[10:51] (651.76s)
they polished it up, and once they got it
[10:53] (653.12s)
to the right place, they released it
[10:55] (655.12s)
out into the open. And even today,
[10:58] (658.24s)
it's how many, many developers
[11:00] (660.32s)
inside of Anthropic do their
[11:03] (663.12s)
daily coding tasks.
[11:05] (665.64s)
You can tell from the polish that
[11:09] (669.04s)
people internally had used Claude Code
[11:11] (671.20s)
before it made it to market. And you can
[11:14] (674.00s)
tell from the polish that very few
[11:16] (676.64s)
people used OpenAI's Codex CLI before
[11:19] (679.60s)
it came to market. With really just a
[11:21] (681.68s)
week or two of internal use and a little
[11:25] (685.24s)
intentionality, this could be so much
[11:28] (688.00s)
better. And notice here, I have said
[11:31] (691.16s)
nothing about this tool's ability to
[11:35] (695.20s)
solve coding problems. It is very
[11:38] (698.24s)
possible that o4-mini plus Codex would
[11:43] (703.20s)
solve my developer problems with far
[11:45] (705.04s)
more reliability
[11:47] (707.20s)
at a fraction of the cost of Claude
[11:49] (709.12s)
Code. But I never got there, because it
[11:52] (712.00s)
was so frustrating to use, because
[11:53] (713.92s)
it kept crashing on me, and because I
[11:55] (715.68s)
don't have access to the latest and
[11:56] (716.88s)
greatest models.