[00:00] (0.08s)
Hey, what's up? My name is Greg and
[00:01] (1.44s)
Enthropic just announced web search via
[00:03] (3.84s)
the API. So, it used to be if you wanted
[00:06] (6.32s)
to build a research bot that would go
[00:08] (8.00s)
gather up-to-date information via the
[00:10] (10.00s)
web, you'd have to use some form of
[00:11] (11.92s)
external tool or an MCP server. This now
[00:15] (15.28s)
allows you to build a web research agent
[00:18] (18.72s)
without using any external tools, just
[00:21] (21.28s)
solely using the cloud API. So, let's
[00:23] (23.44s)
walk through a demo of what it looks
[00:25] (25.04s)
like to use web search via the cloud API
[00:27] (27.52s)
in Python. And then I'll walk you
[00:29] (29.44s)
through how to build your own with a
[00:31] (31.28s)
getting started. Then we'll talk about
[00:32] (32.80s)
some of the options that you can turn on
[00:34] (34.80s)
and off and some more complex use cases.
[00:37] (37.44s)
I'm recording this on Thursday, May 8th,
[00:39] (39.56s)
2025. Just a few hours ago, the first
[00:42] (42.40s)
American Pope was selected. Uh this is
[00:45] (45.52s)
not information that is included in the
[00:48] (48.24s)
training data for claude uh sonnet 3.7
[00:51] (51.60s)
given that it just happened a few hours
[00:52] (52.88s)
ago. Uh so in order to have AI produce a
[00:56] (56.56s)
report on this event, it would need to
[00:58] (58.24s)
use web search. So here is an example of
[01:00] (60.56s)
what we'll be building towards in this
[01:02] (62.24s)
video. I asked the question, who is the
[01:04] (64.48s)
new pope? uh and it went out and it
[01:06] (66.72s)
searched for information and gave me
[01:08] (68.64s)
back text about that question with
[01:12] (72.36s)
citations both the URL of where it
[01:14] (74.96s)
pulled the information from and a
[01:17] (77.20s)
snippet of the cited text. So let's look
[01:20] (80.00s)
at how you use the web search tool. Uh
[01:22] (82.16s)
this just to start off with would be a
[01:24] (84.08s)
hello world with the anthropic API. We
[01:26] (86.88s)
have defined a question. We are
[01:29] (89.20s)
instantiating a client. Make sure that
[01:31] (91.36s)
you set the enthropic API key as an
[01:34] (94.16s)
environment variable. And if you do have
[01:35] (95.84s)
that set in your environment, then this
[01:37] (97.52s)
client will pick it up by default
[01:39] (99.12s)
without you needing to call it out
[01:40] (100.68s)
explicitly. Uh then you just simply uh
[01:44] (104.32s)
create a new message using the client.
[01:47] (107.28s)
You choose the model that you want and
[01:50] (110.16s)
you can pass along max tokens. This is
[01:52] (112.48s)
an optional parameter just to help you
[01:54] (114.48s)
control costs. Uh and then you pass in a
[01:57] (117.60s)
list of messages. uh and you define the
[02:00] (120.56s)
user message uh with the role user and
[02:02] (122.96s)
then you pass in along a string that has
[02:05] (125.04s)
the content of that message. So so if I
[02:07] (127.20s)
were to run the script without the web
[02:09] (129.68s)
search functionality, I get a response
[02:11] (131.36s)
that says that as of the last update uh
[02:14] (134.16s)
it was Pope Francis. So now let's add in
[02:17] (137.12s)
the web search tool. And we just add an
[02:19] (139.04s)
additional parameter called tools. This
[02:20] (140.96s)
takes a list of tools. Here we're only
[02:23] (143.28s)
going to define a single one. Uh and
[02:25] (145.36s)
we're telling it to use the uh web
[02:27] (147.52s)
search tool. Uh, I'm going to use ice
[02:30] (150.00s)
cream here to print out the full
[02:33] (153.52s)
response. This just is kind of like a
[02:36] (156.08s)
very nice pretty print. Super useful
[02:38] (158.64s)
helper library if you haven't used it
[02:40] (160.68s)
before. Let's run this again. And I'm
[02:43] (163.20s)
going to start a stopwatch here. I don't
[02:44] (164.48s)
know if I'm going to edit this out or
[02:45] (165.68s)
not, but I just do want you to see what
[02:47] (167.92s)
the total time
[02:51] (171.24s)
is. All right, so about 20 seconds to
[02:53] (173.76s)
get our response back. Let's take a look
[02:56] (176.32s)
at what we got.
[02:58] (178.96s)
So we look our response now is uh
[03:01] (181.88s)
considerably more uh fuller than what we
[03:05] (185.76s)
had the first time we did. It starts off
[03:07] (187.84s)
by saying that it needs to search for
[03:09] (189.52s)
the latest information. Uh and then you
[03:11] (191.92s)
can start seeing some tool usage. And so
[03:14] (194.96s)
uh Claude makes a query uh who is the
[03:17] (197.52s)
current pope May 2025. Uh and then it
[03:21] (201.52s)
finds a page here uh from CNN. Uh the
[03:26] (206.56s)
page was from 13 minutes ago. Uh and it
[03:29] (209.76s)
has a URL here. I do think processing
[03:32] (212.32s)
through this response can be a little
[03:33] (213.68s)
bit of a challenge just wrapping your
[03:35] (215.28s)
head around the structure of it. And
[03:37] (217.12s)
I've played with a few different ways to
[03:38] (218.80s)
do this. Uh here initially I'm
[03:41] (221.44s)
converting the response to a dictionary.
[03:43] (223.84s)
Uh but let's run this again and take a
[03:46] (226.16s)
look at what sort of the raw response
[03:50] (230.44s)
like. All right. And here because of ice
[03:52] (232.96s)
cream, we get a little bit prettier of a
[03:55] (235.36s)
response. Um uh but you see here that we
[03:58] (238.72s)
do get an initial uh text back from
[04:02] (242.72s)
Claude basically telling us that it's
[04:05] (245.04s)
going to initiate the search. Then we
[04:07] (247.20s)
see a lot of search results. Each one of
[04:10] (250.56s)
those has a fairly large encrypted
[04:12] (252.72s)
content block there. I assume that this
[04:15] (255.60s)
is just because they don't want to give
[04:18] (258.32s)
uh users, they basically don't want to
[04:20] (260.40s)
get in the scraping game, right? They
[04:21] (261.92s)
don't want to give users all of the
[04:23] (263.44s)
content that they went out and scraped
[04:24] (264.88s)
from websites. So, it's encrypted so
[04:27] (267.28s)
that uh Anthropic can use it later, but
[04:30] (270.56s)
they are not just going and scraping uh
[04:33] (273.28s)
other people's content and then just
[04:34] (274.80s)
passing that right off to the user uh
[04:36] (276.72s)
for your use. uh which I think is sort
[04:39] (279.20s)
of interesting, but it does sort of
[04:41] (281.44s)
pollute the uh the responses when you're
[04:44] (284.32s)
just trying to like look through these
[04:45] (285.52s)
things. Um so what you need to notice is
[04:48] (288.64s)
that each of the tool calls is going to
[04:51] (291.20s)
return some information and then once
[04:53] (293.84s)
the tool calls have completed, Claude
[04:57] (297.04s)
then will try to sort of summarize that
[05:00] (300.08s)
information uh and it will provide
[05:02] (302.64s)
citations. And so this is actually the
[05:05] (305.12s)
information that you will probably use
[05:07] (307.12s)
as you start building with this tool. So
[05:09] (309.84s)
it says here uh we have a new text
[05:11] (311.76s)
block. Based on the search results I can
[05:13] (313.76s)
provide you with information about the
[05:14] (314.96s)
new pope. And then it gives us
[05:17] (317.72s)
citations. Here's uh the cited text. And
[05:21] (321.52s)
so this is just a snippet of that big
[05:23] (323.76s)
block of encrypted text.
[05:26] (326.88s)
uh and then it gives us the title of the
[05:28] (328.72s)
page and it gives us a link uh to that
[05:32] (332.32s)
page. With the Python library, they use
[05:35] (335.52s)
these text blocks where they're using
[05:37] (337.12s)
pyantic objects uh to format all the the
[05:40] (340.48s)
data that's coming back, which does give
[05:42] (342.48s)
you a lot of nicities in interacting
[05:44] (344.08s)
with it. Uh but it can also it's not as
[05:48] (348.20s)
straightforward as this JSON here. So if
[05:51] (351.20s)
you do want to uh just work directly
[05:53] (353.84s)
with JSON for instance you can run uh to
[05:57] (357.52s)
JSON and if we run that
[06:01] (361.00s)
again then you can see that we get uh
[06:03] (363.76s)
just a big string back uh or I've also
[06:06] (366.72s)
found it helpful to just run uh two dict
[06:09] (369.36s)
here and here you can see the Python
[06:11] (371.52s)
dictionary. Let's just look at how we
[06:13] (373.20s)
did the report in the beginning of the
[06:14] (374.64s)
video. Um, and I had a function here
[06:17] (377.84s)
called ask claude that has a standard
[06:20] (380.08s)
tool use. And then really the rest of it
[06:21] (381.92s)
was just about how are we going to
[06:23] (383.04s)
process through the responses. Um, is
[06:25] (385.76s)
that we're going to ask the question. We
[06:27] (387.12s)
get the response back. Uh, and then I'm
[06:29] (389.36s)
basically constructing the report in
[06:30] (390.88s)
peace meal. So I have this function
[06:32] (392.72s)
called get assistant messages. I'm
[06:34] (394.88s)
looking for the block type called text.
[06:37] (397.12s)
And if we do see text, then I'm adding
[06:39] (399.28s)
in the text of that block to my content.
[06:43] (403.52s)
And then if that block contains
[06:45] (405.88s)
citations then I'm grabbing the
[06:47] (407.92s)
citations. And that very simply looks
[06:50] (410.24s)
like this. So if there is is uh a
[06:54] (414.48s)
attribute called citations on that block
[06:56] (416.80s)
then I iterate through each one of those
[06:58] (418.48s)
citations and I grab the URL, the title
[07:01] (421.20s)
uh and the cited text. Uh and so here's
[07:03] (423.60s)
my initial question and then I am just
[07:06] (426.00s)
pasting in the text and the citations as
[07:08] (428.64s)
they come in. And uh this would then
[07:11] (431.12s)
allow me to come and click on uh these
[07:14] (434.88s)
different sources to actually confirm uh
[07:17] (437.44s)
the reports that I'm getting back. Okay,
[07:19] (439.76s)
let's talk about a few different ways
[07:21] (441.04s)
that you can use this thing. Uh first,
[07:23] (443.04s)
you know, it took about 20 seconds for
[07:24] (444.80s)
me to get a response back there. Uh you
[07:26] (446.88s)
are often going to want to implement
[07:28] (448.56s)
streaming. You're going to use the async
[07:30] (450.88s)
anthropic client and then you're going
[07:32] (452.64s)
to create an async function to ask the
[07:35] (455.44s)
question. uh and then you get the
[07:38] (458.40s)
results from that as a stream and as the
[07:40] (460.80s)
events come in from the string you can
[07:42] (462.88s)
print those out and so if I were to uh
[07:45] (465.84s)
run this and you can see here that we're
[07:48] (468.56s)
just streaming the results as they're
[07:50] (470.80s)
coming in and if you want to see any of
[07:52] (472.80s)
the streaming code you can check out hih
[07:54] (474.76s)
high.ai/cloud web and you'll find all of
[07:57] (477.44s)
the information there the links in the
[07:59] (479.12s)
description. All right let's look at
[08:01] (481.12s)
another option. Uh this is another thing
[08:03] (483.12s)
that people were pretty excited about.
[08:04] (484.72s)
Uh there's an optional parameter here
[08:06] (486.80s)
called allowed domains. So you can
[08:09] (489.12s)
restrict your search to only a few
[08:11] (491.20s)
different domains. Uh so for instance,
[08:13] (493.52s)
if we wanted to run this uh and we only
[08:16] (496.40s)
want to use Reuters, then we can get
[08:19] (499.20s)
information back from just Reuters. And
[08:22] (502.40s)
so then here you can see that all of the
[08:24] (504.40s)
URLs cited are Reuters URLs. Now, uh it
[08:28] (508.64s)
might not be surprising to you, you
[08:30] (510.48s)
can't scrape all websites. So for
[08:32] (512.80s)
instance, if I were to change this from
[08:35] (515.52s)
Reuters to
[08:40] (520.76s)
BBC.co.uk and I try running this
[08:44] (524.12s)
again, I'm going to get an error saying
[08:46] (526.72s)
that the following domains are not
[08:48] (528.40s)
accessible to our agent. Uh you're going
[08:50] (530.80s)
to get the same error if you run this on
[08:55] (535.72s)
reddit.com. Uh similar to allowed
[08:57] (537.84s)
domains, you can do the inverse. There's
[08:59] (539.44s)
a blocked domains parameter if you want
[09:01] (541.76s)
to do that as well. Uh it you can use
[09:04] (544.88s)
either allowed domains or blocked
[09:06] (546.64s)
domains uh in a request, but you can't
[09:08] (548.56s)
use both. Uh last bit, let's talk a
[09:10] (550.40s)
little bit about price. Uh so Anthropics
[09:13] (553.20s)
web search is going to come in at $10
[09:16] (556.72s)
per 1,000 searches plus the tokens that
[09:19] (559.36s)
you pay to do all the standard API stuff
[09:21] (561.92s)
that you typically would. Uh that is a
[09:24] (564.72s)
quite a bit cheaper than OpenAI's
[09:27] (567.76s)
solution. and theirs is uh $30 to $50
[09:32] (572.16s)
per 10,000 calls. So that's curious. I'd
[09:34] (574.32s)
be interested to know uh what Anthropic
[09:36] (576.48s)
is doing to be able to offer that so
[09:38] (578.08s)
much cheaper. Uh and also Anthropic does
[09:40] (580.96s)
not give you the difference between this
[09:42] (582.72s)
search context size low, medium, high.
[09:46] (586.48s)
Uh and uh whereas OpenAI is giving you a
[09:49] (589.36s)
little bit more fine-tuned control
[09:50] (590.72s)
there. Uh so I wonder what default
[09:53] (593.44s)
Tanthropic has made here. uh I have to
[09:55] (595.76s)
imagine that they're just operating with
[09:57] (597.20s)
much lower search context uh in order to
[10:00] (600.00s)
be able to provide that pricing for you.
[10:01] (601.76s)
Uh but uh out of the box, if we're just
[10:04] (604.00s)
comparing, you know, apples to apples or
[10:06] (606.56s)
pricing page to pricing page, uh
[10:08] (608.24s)
Anthropic does say uh $10 per thousand
[10:11] (611.12s)
requests and the cheapest one for OpenAI
[10:13] (613.28s)
is $30 per thousand requests.