[00:00] Cursor and Anthropic both released coding agents in the same week, and I wanted to learn which one is better. So I put them to work on a Rails app that I have running in production, gave each of them the same three tasks to complete, and this is what I learned along the way.
[00:15] First, let's talk about the UX. For Cursor 0.46, the big change is that they promoted the agent to be the default way of interacting with the LLM, but you're still operating inside a fully featured IDE. Your interactions with the agent are the primary way you're making changes to the code, and I found I didn't actually need to see the files open; there wasn't much for me to do, since I wasn't going to tweak files in the midst of the agent's actions. I also thought some elements of the agent design were a little clunky. At times there would be two or three different places where I could click "accept." At times a terminal command running in the agent pane needed me to answer yes or no, but the prompt had scrolled off the right-hand side of the screen because the pane was so small. Often I'd see a spinning circle and think I was waiting for an LLM response, when really it was waiting for me to click a button. And now that the agent has taken prominence and is doing so much of the work for you, I found myself asking: do I really need two-thirds of the window taken up by the file editor?

By contrast, Claude Code is a CLI. You run the command in the root of your project, in the root of your codebase, and it examines your project and asks what you want it to do. You tell it, and then it just asks you a series of yes/no questions as it comes up with commands: should I run this, or not? You just have that terminal window; that's all you're seeing, and at no point are you watching files open and close. Since you're abdicating so much control to the agent, I felt that a single pane with a single interface, the agent, was the right way to do this. So when it comes to UX, I preferred Claude Code. Next, let's talk about code quality.
[02:08] Let me tell you a little about the challenges I ran them through. I have a Rails app; you can think of it as an email wrapper for GPTs. For instance, if you want to try it out, there's a roast bot: just forward an email to it, and it will roast your email and reply back to you. There's a whole bunch of these email bots set up, and each one has its own system message, tracks conversations, etc. Now, I hadn't touched this thing in nine months, because there's enough complexity there that it's hard for me to load the context back into my brain, so this felt like a good opportunity to get some momentum on a project that had stalled. The first thing I needed to do was clean up my tests: I was getting warnings from some of my gems and just needed to update some gems and dependencies. Then I wanted to replace LangChain with direct calls to the OpenAI API. And finally, I wanted to add support for Anthropic as well.
both agents the underlying model that I
[03:01] (181.44s)
was using was Claude 3.7 Sonet uh and so
[03:05] (185.88s)
as expected a lot of the code was
[03:08] (188.88s)
similar or the same on both approaches I
[03:12] (192.64s)
did find though that the one advantage
[03:15] (195.36s)
cursor had was that it had the ability
[03:17] (197.84s)
to search the web for documentation and
[03:20] (200.52s)
towards the end when I was adding
[03:22] (202.24s)
anthropic support it was kind of funny
[03:24] (204.76s)
that you know Claud 37 Sonet was
[03:27] (207.32s)
struggling to add support for anthropic
[03:29] (209.20s)
to my rails but whatever um I I found
[03:32] (212.60s)
that what Claud 37 Sonet wanted to do
[03:35] (215.00s)
was to mimic the syntax that was already
[03:37] (217.56s)
present in the code for open Ai and so
[03:41] (221.24s)
it was having a hard time getting the
[03:43] (223.00s)
anthropic gym to work and figuring out
[03:45] (225.12s)
the right parameters and the right
[03:46] (226.24s)
syntax to call and what ker was able to
[03:48] (228.48s)
do was search the web search for the
[03:49] (229.80s)
documentation and fight the right answer
[03:52] (232.08s)
uh what Claude code ended up doing was
[03:54] (234.24s)
sort of giving up and writing its own
[03:56] (236.32s)
implementation for the anthropic API
[03:58] (238.48s)
using http
[04:00] (240.68s)
uh which worked uh but the fact that it
[04:03] (243.52s)
lacked the ability to search the web and
[04:05] (245.48s)
to look up documentation is really the
[04:08] (248.08s)
only reason that I would give the plus
[04:10] (250.36s)
to cursor here like that I definitely
[04:12] (252.64s)
saw cursor use that ability to get
[04:15] (255.08s)
itself out of a jam once or twice in
[04:17] (257.08s)
this exercise next let's talk about cost
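For reference, a hand-rolled Anthropic call of the sort Claude Code fell back to looks roughly like this. This is my own sketch, not the agent's actual code; it assumes the public Messages endpoint, uses an illustrative model ID, and only builds the request so it can be inspected without sending anything:

```ruby
require "net/http"
require "uri"
require "json"

# Build (but don't send) a request to Anthropic's Messages API.
# Endpoint and headers follow the public API docs; model ID is illustrative.
def build_anthropic_request(prompt, model: "claude-3-7-sonnet-20250219", max_tokens: 1024)
  uri = URI("https://api.anthropic.com/v1/messages")
  req = Net::HTTP::Post.new(uri)
  req["x-api-key"]         = ENV.fetch("ANTHROPIC_API_KEY", "test-key")
  req["anthropic-version"] = "2023-06-01"
  req["content-type"]      = "application/json"
  req.body = JSON.generate(
    model:      model,
    max_tokens: max_tokens,
    messages:   [{ role: "user", content: prompt }]
  )
  [uri, req]
end

# To actually send it:
#   uri, req = build_anthropic_request("Say hello")
#   res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
#   puts JSON.parse(res.body).dig("content", 0, "text")
```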
[04:20] Next, let's talk about cost. Claude Code can get expensive, though I guess expensive is relative when we're talking about software development. I had about 90 minutes of working with Claude, all in, to implement these three changes to my codebase, and it ended up costing about $8. Not a lot of money in the grand scheme of software development, but if I were doing this for three or four hours a day, every day, it would certainly add up. I do think there'd be a lot of value there, absolutely, but it is non-trivial. Cursor, on the other hand: I pay my $20 a month, and with that I get 500 premium model requests. Going through these three coding tasks used fewer than 50 of my 500, so less than a tenth of a $20/month subscription. Let's just naively say it cost me $2 to run this exercise on Cursor and $8 on Claude Code: Claude Code was about four times more expensive. That's super naive, but you can see Claude Code is non-trivially more expensive. I do think the psychology of metered pricing versus subscription pricing is interesting here, but for most folks Claude Code is not going to be a replacement for Cursor; it's going to be something they use in addition to Cursor. So they're really going to have to ask themselves: even if Claude Code is better, is it worth the incremental cost on top of a subscription they're already getting so much use out of, with the Cursor agent included? Purely in terms of cost, the Cursor agent wins. The $20 a month gets you a whole lot more usage, and Claude Code does seem to be about four times more expensive than the Cursor agent.
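Here's the back-of-envelope math written out. The $2 figure is an assumption pro-rated from list prices, not a billing statement:

```ruby
# Naive cost comparison using the numbers from this exercise.
cursor_monthly    = 20.0  # $/month subscription
included_requests = 500   # premium requests included per month
requests_used     = 50    # roughly what the three tasks consumed

# Pro-rated Cursor cost for the exercise: 50/500 of $20.
cursor_cost = cursor_monthly * requests_used / included_requests  # 2.0

claude_code_cost = 8.0                  # metered spend for the same tasks
ratio = claude_code_cost / cursor_cost  # 4.0, i.e. ~4x more expensive
```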
[06:07] Next, let's talk about autonomy. I did the exercise with Claude Code first. Claude Code proposes a change to you, and you have three options: yes, run this command; yes, run this command and don't ask again for this command in the future; or no, I want you to do something else. What I found was that in the beginning I was hesitant, but after it had performed the same command a couple of times, I'd finally just say: okay, you can run this command without asking permission. By the end of my session, Claude Code was doing almost everything autonomously. It had earned my trust, and I'd given it permission to do just about everything except maybe `rm`. The Cursor agent, on the other hand, has no concept of gaining trust: it asks whether you want to accept this command, accept this change, or turn on YOLO mode. And even though I'd already been through this experience with Claude Code, where I'd given it permission to do basically everything, I did not trust the Cursor agent enough to turn on YOLO mode. I hope the Cursor agent does roll out that sort of incremental permissioning, that earned trust. It feels like an easy enough change, and I suspect we'll see it in an update soon. But as of right now, as we all grapple with the question of how much we trust our coding agents and how much we want to let them do on our local machines, I think Claude Code really nailed that model with earned trust and incremental permissions.
[07:42] Finally, let's talk about the whole software development life cycle. I tried to embrace test-driven development, or at least good test coverage, with these agents: since I'm giving up a lot of control over the code being written, I want to make sure I have a lot of tests. And I felt that Claude Code did a much better job both working with tests and working with version control. My workflow with Claude Code was to ask it to first write tests for the feature it was about to build, then build the feature, then make sure the tests pass, and then commit its changes. I'll say that the best commit messages ever written for code that, I guess, I've "written" (I didn't really write it) were written by Claude Code; its commit messages were beautiful. It also seemed to do a much better job of interacting with tests than Cursor did, and I think part of that is simply that it's a command-line tool living in the terminal. Any time Claude Code was running terminal commands, it felt natural; any time the Cursor agent was doing so, it didn't feel like it fit right. Back to some of the UX issues: I had a small terminal window in that third-of-the-screen pane on the right-hand side, and the Cursor agent just didn't seem as comfortable taking the output from my tests and updating files based on what it saw happening there. And while I do like Cursor's Git repository UI, which lets you browse all the past commits and branches (I really do like having that baked into my IDE), where it fell short was the little button that auto-generates a commit message: it basically writes a one-liner, the kind of commit message I would write, whereas Claude Code's were so detailed you have to give it points for that. Between its use of tests and its very detailed, very verbose Git commit messages, I feel Claude Code did a better job than the Cursor agent of making up for the concerns I'd have about an agent.
[10:00] Before I crown a winner, let's step back and acknowledge two things. One: I gave both of these coding agents three tasks on a project I was stalled on, and both of them completed the job. I sort of can't believe we're here. I did not expect these coding agents to work as well as they did. For the last couple of years I've thought that while LLMs worked really well for coding, it was essential to have a human in the loop orchestrating the changes, and this is one of the first times I've used a coding agent, been truly impressed with the results, and felt it did a better job than I could. Was it perfect? No. Is my codebase as complex as what you might be working on at work? Probably not. But this is a non-trivial codebase, and these things applied changes, wrote tests, and wrote commit messages better than I would have. Two: I don't want to set up a false dichotomy of "do you use Claude Code or the Cursor agent?" The truth is, you should probably be using both. In fact, you can just open Claude Code in a terminal inside Cursor and get the best of both worlds. Honestly, if you're a software developer these days and you have the means, you should probably get the $20/month Cursor subscription and get familiar with it; and as you use Claude Code, watch your costs (compacting your conversation history often will help keep them down) and get familiar with both tools. This is not an either/or thing. All that said, I preferred Claude Code. I thought the UX was better, I loved the incremental permissions and the way it earned my trust, and I thought it did a better job working with version control and with my tests. That said, the Cursor team iterates and ships so fast that I'm sure they'll be learning from Claude Code, and you'll see a lot of these changes and improvements coming to Cursor very soon.