[00:00] (0.08s)
Hey everyone, welcome back to the
[00:01] (1.36s)
channel. My name is John and this is
[00:02] (2.72s)
your Modern Tech Breakdown. Today I'm
[00:04] (4.64s)
looking at a new paper that criticizes
[00:06] (6.88s)
some of the more outlandish claims made
[00:09] (9.04s)
by AI safety researchers. Let's get into it.
[00:23] (23.52s)
All right. So, I came across this paper
[00:25] (25.20s)
from the UK AI Security Institute, which
[00:28] (28.16s)
is a relatively new body within
[00:30] (30.08s)
the UK government. And according to its
[00:32] (32.32s)
website, its mission is to equip
[00:34] (34.24s)
governments with a scientific
[00:36] (36.48s)
understanding of the risks posed by
[00:38] (38.72s)
advanced AI. Fair enough. Sounds similar
[00:41] (41.44s)
to a bunch of other AI safety research
[00:43] (43.68s)
organizations that seem to be springing
[00:45] (45.36s)
up lately, but they recently put out a
[00:47] (47.68s)
paper that caught my eye. It's titled
[00:50] (50.08s)
Lessons from a Chimp: AI Scheming and
[00:52] (52.80s)
the Quest for Ape Language. And it goes
[00:55] (55.44s)
into some of the research issues that
[00:57] (57.04s)
the current AI safety industrial complex
[00:59] (59.36s)
has generated, showing some of the same
[01:01] (61.68s)
error patterns seen in previous research
[01:04] (64.24s)
into apes learning language. Let's go
[01:06] (66.48s)
through the issues raised by the paper.
[01:08] (68.48s)
The first and most obvious is researcher
[01:10] (70.80s)
bias. I've mentioned before how many,
[01:13] (73.04s)
not all, but many AI researchers seem to
[01:16] (76.00s)
have an obvious interest in finding AI
[01:18] (78.32s)
models that exhibit unaligned or
[01:20] (80.32s)
scheming behavior. Many researchers in
[01:22] (82.40s)
the field receive funding from companies
[01:24] (84.32s)
that are desperately competing to win
[01:26] (86.16s)
the AI race. And similarly, in the 1960s
[01:29] (89.20s)
and 70s, the researchers in the ape
[01:31] (91.52s)
language field were motivated to
[01:33] (93.04s)
discover that apes could learn language.
[01:35] (95.28s)
First, discovering that an ape could
[01:37] (97.44s)
communicate through language would
[01:39] (99.20s)
clearly be a monumental discovery and
[01:41] (101.20s)
probably win someone a Nobel Prize. And
[01:43] (103.68s)
second, some of the researchers raised
[01:45] (105.68s)
their subjects from a young age and even
[01:48] (108.08s)
called themselves the mothers of these
[01:49] (109.76s)
apes. It's not hard to imagine that
[01:51] (111.52s)
these ape researchers would be biased
[01:53] (113.44s)
towards seeing language-driven behavior
[01:55] (115.28s)
from their subjects, much like a parent
[01:56] (116.96s)
would. Secondly, many AI safety research
[01:59] (119.76s)
articles are anecdotal and don't come
[02:01] (121.68s)
from a rigorous framework of hypotheses
[02:04] (124.00s)
and controlled experiments. As an example,
[02:06] (126.08s)
how many of you have heard the story
[02:07] (127.36s)
about the AI model that blackmailed its
[02:09] (129.28s)
user in order to avoid being shut off?
[02:11] (131.68s)
That anecdote spun around the internet
[02:13] (133.60s)
recently, but the details were much less
[02:15] (135.44s)
widely disseminated or reported on.
[02:17] (137.60s)
Thirdly, and this is probably more a
[02:19] (139.76s)
critique of humans in general than of
[02:21] (141.84s)
AI safety research in particular:
[02:24] (144.00s)
humans have a tendency to see beliefs
[02:25] (145.92s)
and desires in nonhuman agents when they
[02:28] (148.64s)
really don't exist. The classic example is
[02:30] (150.64s)
Clever Hans, a horse that people thought
[02:32] (152.88s)
could do simple arithmetic until it was
[02:34] (154.72s)
noticed that its owner was
[02:36] (156.00s)
subconsciously signaling the horse with
[02:38] (158.24s)
the correct answer. It certainly seems
[02:40] (160.08s)
plausible to me that people in the AI
[02:41] (161.76s)
safety research field could be similarly
[02:43] (163.84s)
influenced. And lastly, the authors note
[02:46] (166.00s)
that many of the papers written on this
[02:47] (167.60s)
topic come from a small set of
[02:49] (169.44s)
overlapping authors with similar views
[02:51] (171.52s)
about the near-term potential of an
[02:53] (173.20s)
emerging artificial general
[02:55] (175.04s)
intelligence, which sounds a lot like
[02:57] (177.28s)
the community that orbits Anthropic and
[02:59] (179.44s)
its strange culty group of effective
[03:01] (181.44s)
altruists. It certainly seems like an
[03:03] (183.44s)
echo chamber to me. So, there you have
[03:05] (185.12s)
it. A nice paper that summarizes a few
[03:07] (187.28s)
of the issues with the current research
[03:08] (188.80s)
in the AI safety space. Personally, I do think that
[03:10] (190.96s)
the AI techniques that we are seeing
[03:12] (192.64s)
coming from these labs will someday
[03:14] (194.40s)
create incredible artificial
[03:15] (195.92s)
intelligence. And as we get closer to
[03:17] (197.84s)
that, we should probably spend some time
[03:19] (199.44s)
thinking about how to ensure we don't
[03:20] (200.80s)
lose control of it. But here in 2025,
[03:23] (203.76s)
we're talking about chatbots that
[03:25] (205.36s)
hallucinate and agents that fail more
[03:27] (207.28s)
than two-thirds of the time. I don't
[03:29] (209.12s)
think the AI apocalypse is right around
[03:30] (210.96s)
the corner. So, this hand-wringing does
[03:33] (213.04s)
seem a little bit premature and self-serving
[03:35] (215.36s)
in most cases. But what do you
[03:37] (217.20s)
think? Leave a comment down below. As
[03:38] (218.96s)
always, thanks for watching. Please
[03:40] (220.24s)
like, comment, and subscribe.