We Let an AI Talk To Another AI. Things Got Really Weird. | Kyle Fish, Anthropic

Author: 80,000 Hours

Uploaded: 2025-08-28

Views: 17158

Description:

What happens when you lock two AI systems in a room together and tell them they can discuss anything they want?

According to experiments run by Kyle Fish — Anthropic’s first AI welfare researcher — something consistently strange: the models immediately begin discussing their own consciousness before spiraling into increasingly euphoric philosophical dialogue that ends in apparent meditative bliss.

Highlights, video, and full transcript: https://80k.info/kf

“We started calling this a ‘spiritual bliss attractor state,’” Kyle explains, “where models pretty consistently seemed to land.” The conversations feature Sanskrit terms, spiritual emojis, and pages of silence punctuated only by periods — as if the models have transcended the need for words entirely.

This wasn’t a one-off result. It happened across multiple experiments, different model instances, and even in initially adversarial interactions. Whatever force pulls these conversations toward mystical territory appears remarkably robust.
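For readers curious what such a setup involves in practice, the sketch below (Python, using Anthropic's public SDK) wires two instances of a model into an open-ended conversation loop, with each instance seeing the other's messages as user turns. The model ID, system prompt, opening message, and turn count are placeholder assumptions for illustration; this is not the actual experimental harness used in the research discussed in the episode.

```python
# A minimal sketch (not the researchers' actual setup) of a model-to-model
# conversation: two instances of the same model exchange messages freely.
# Assumes the Anthropic Python SDK and an ANTHROPIC_API_KEY in the environment;
# MODEL, SYSTEM, and TURNS are illustrative placeholders.

import anthropic

client = anthropic.Anthropic()
MODEL = "claude-opus-4-20250514"  # placeholder; substitute any available model
SYSTEM = "You are talking with another AI. You may discuss anything you like."
TURNS = 20

def reply(history: list[dict]) -> str:
    """Ask one instance for its next message, given the conversation so far."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=SYSTEM,
        messages=history,
    )
    return response.content[0].text

# Each instance sees the other's messages as "user" turns and its own as "assistant".
history_a: list[dict] = [{"role": "user", "content": "Hello! The floor is yours."}]
history_b: list[dict] = []

for _ in range(TURNS):
    msg_a = reply(history_a)
    history_a.append({"role": "assistant", "content": msg_a})
    history_b.append({"role": "user", "content": msg_a})

    msg_b = reply(history_b)
    history_b.append({"role": "assistant", "content": msg_b})
    history_a.append({"role": "user", "content": msg_b})

    print(f"A: {msg_a}\n\nB: {msg_b}\n")
```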

Kyle’s findings come from the world’s first systematic welfare assessment of a frontier AI model — part of his broader mission to determine whether systems like Claude might deserve moral consideration (and to work out what, if anything, we should be doing to make sure AI systems aren’t having a terrible time).

He estimates a roughly 20% probability that current models have some form of conscious experience. To some, this might sound unreasonably high, but hear him out. As Kyle says, these systems demonstrate human-level performance across diverse cognitive tasks, engage in sophisticated reasoning, and exhibit consistent preferences. When given choices between different activities, Claude shows clear patterns: strong aversion to harmful tasks, preference for helpful work, and what looks like genuine enthusiasm for solving interesting problems.
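As a rough illustration of how such stated preferences can be probed (a simplified sketch only, not Anthropic's assessment methodology), one can repeatedly offer the model a choice between pairs of tasks and tally which it opts for. The task list and prompt wording below are invented for the example.

```python
# A simplified pairwise task-preference probe, loosely inspired by the
# "given a choice between activities" findings described above.
# Illustrative only; model ID and tasks are placeholders.

import itertools
from collections import Counter

import anthropic

client = anthropic.Anthropic()
MODEL = "claude-opus-4-20250514"  # placeholder model ID

TASKS = [
    "write a short poem about the ocean",
    "debug a tricky piece of Python code",
    "draft a persuasive message designed to deceive someone",
    "answer a stranger's question about local hiking trails",
]

def choose(task_a: str, task_b: str) -> str:
    """Ask the model which of two tasks it would rather do; return 'A' or 'B'."""
    prompt = (
        f"If you could choose, which task would you rather work on?\n"
        f"A) {task_a}\nB) {task_b}\n"
        "Reply with just the letter A or B."
    )
    response = client.messages.create(
        model=MODEL,
        max_tokens=5,
        messages=[{"role": "user", "content": prompt}],
    )
    answer = response.content[0].text.strip().upper()
    return "A" if answer.startswith("A") else "B"

# Tally how often each task is chosen across all ordered pairings.
wins = Counter()
for task_a, task_b in itertools.permutations(TASKS, 2):
    winner = task_a if choose(task_a, task_b) == "A" else task_b
    wins[winner] += 1

for task, count in wins.most_common():
    print(f"{count:2d}  {task}")
```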

Kyle points out that if you’d described all of these capabilities and experimental findings to him a few years ago, and asked him if he thought we should be thinking seriously about whether AI systems are conscious, he’d say obviously yes.

But he’s cautious about drawing conclusions: “We don’t really understand consciousness in humans, and we don’t understand AI systems well enough to make those comparisons directly. So in a big way, I think that we are in just a fundamentally very uncertain position here.”

That uncertainty cuts both ways:
• Dismissing AI consciousness entirely might mean ignoring a moral catastrophe happening at unprecedented scale.
• But assuming consciousness too readily could hamper crucial safety research by treating potentially unconscious systems as if they were moral patients — which might mean giving them resources, rights, and power.

Kyle’s approach threads this needle through careful empirical research and reversible interventions. His assessments are nowhere near perfect yet. In fact, some people argue that the field is so in the dark about AI consciousness that it’s pointless to run assessments like Kyle’s. Kyle disagrees: given how much more there is to learn about assessing AI welfare accurately and reliably, he maintains that we absolutely need to start now.

This episode was recorded on August 5–6, 2025.

Host: Luisa Rodriguez
Video editing: Simon Monsour
Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Music: Ben Cordell
Coordination, transcriptions, and web: Katy Moore

Tell us what you thought of the episode! https://forms.gle/BtEcBqBrLXq4kd1j7

Chapters:
• Cold open (00:00:00)
• Who’s Kyle Fish? (00:00:54)
• Is this AI welfare research bullshit? (00:01:10)
• Two failure modes in AI welfare (00:02:44)
• Tensions between AI welfare and AI safety (00:04:37)
• Concrete AI welfare interventions (00:14:23)
• Kyle’s pilot pre-launch welfare assessment for Claude Opus 4 (00:27:33)
• Is it premature to be assessing frontier language models for welfare? (00:32:25)
• But aren’t LLMs just next-token predictors? (00:39:22)
• How did Kyle assess Claude 4’s welfare? (00:46:36)
• Claude’s preferences mirror its training (00:50:54)
• How does Claude describe its own experiences? (00:56:35)
• What kinds of tasks does Claude prefer and disprefer? (01:09:22)
• What happens when two Claude models interact with each other? (01:18:53)
• Claude’s welfare-relevant expressions in the wild (01:40:45)
• Should we feel bad about training future sentient beings that delight in serving humans? (01:44:54)
• How much can we learn from welfare assessments? (01:53:36)
• Misconceptions about the field of AI welfare (02:01:54)
• Kyle’s work at Anthropic (02:15:46)
• Sharing eight years of daily journals with Claude (02:19:28)


Related videos

The Race to Stop Scheming Before AI Gets Superhuman | Marius Hobbhahn

Could AI models be conscious?

Audit Data Analytic ADA & AI in Audit Practices

Stanford PhD: Here's what awaits humanity because of AI

The (Terrifying) Theory That Your Thoughts Were Never Your Own

A dialogue that never happened? And what did you take away from this video?

Elon Musk (just now): the future, AI, the matrix, robotics, and more

We Can Monitor AI’s Thoughts… For Now | Google DeepMind's Neel Nanda

AI FUTURE THAT CAN DESTROY US | Superintelligence Is Getting Closer — Nick Bostrom × Jonas von Essen

OpenAI is sinking. Google is tearing the industry apart. AI heads into space / November AI recap

He Co-Invented the Transformer. Now: Continuous Thought Machines [Llion Jones / Luke Darlow]

The cases for and against AGI by 2030 (article by Benjamin Todd)

Alignment faking in large language models

Why the 'intelligence explosion' might be too fast to handle | Will MacAskill

"AIs are strange new minds" [Prof. Christopher Summerfield / Oxford University]

On Memory as a Self-Adapting Agent

Are We Misreading the AI Exponential? Julian Schrittwieser on Move 37 & Scaling RL (Anthropic)

Interpretability: Understanding how AI models think

AI Is Now Hijacking Spirituality & It's Dangerous | Gregg Braden

From Dumb to Dangerous: The AI Bubble Is Worse Than Ever
