How many instructions can LLMs follow at once?

Author: John Tan Chong Min

Uploaded: 2025-07-21

Views: 611

Description:

As LLMs become able to do more complex tasks, how many instructions should we give them in one go for reliable, robust generation?

The ability of LLMs to follow a greater number of instructions will help them greatly in more complex tool use / multi-step reasoning / agentic tasks.

By constraining LLMs to include difficult financial keywords in a generated report, this study measures how well 20 SOTA LLMs handle a growing number of constraints.
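
A minimal sketch of how keyword-inclusion compliance like this could be scored, assuming a simple case-insensitive whole-word match. The function name `keyword_accuracy`, the sample keywords, and the sample report are hypothetical illustrations; the paper's actual scoring rules may differ.

```python
import re


def keyword_accuracy(report: str, keywords: list[str]) -> float:
    """Return the fraction of required keywords that appear in the report.

    Assumption: a keyword counts as followed if it appears as a whole word,
    ignoring case. This is an illustrative rule, not the paper's exact metric.
    """
    if not keywords:
        return 1.0  # no constraints means nothing was violated
    hits = sum(
        1
        for kw in keywords
        if re.search(rf"\b{re.escape(kw)}\b", report, flags=re.IGNORECASE)
    )
    return hits / len(keywords)


# Hypothetical example: four required keywords, two of which appear.
keywords = ["amortization", "liquidity", "arbitrage", "collateral"]
report = "The firm improved liquidity this quarter, while amortization costs stayed flat."
score = keyword_accuracy(report, keywords)
print(f"{score:.2f}")  # 0.50
```

Running the same check while increasing the length of `keywords` gives an accuracy-versus-instruction-count curve for one model.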

Overall, it appears that o3 (reasoning) and gemini-2.5-pro-preview (reasoning) are the best at following complex instructions.

On another note, should we aim to increase instruction-following complexity, or should we aim to modularise the process into easy, bite-sized steps?

~~~

Slides: https://github.com/tanchongmin/john-y...
Paper: https://arxiv.org/pdf/2507.11538

Other references:
T5 Paper: https://arxiv.org/pdf/1910.10683
Between Underthinking and Overthinking: An Empirical Study of Reasoning Length and Correctness in LLMs (longer responses tend to be inaccurate): https://arxiv.org/html/2505.00127v1

My repositories mentioned:
StrictJSON: https://github.com/tanchongmin/strict...
AgentJo: https://github.com/tanchongmin/agentjo
text-rpg (my attempt at vibe-coding an RPG): https://github.com/tanchongmin/text-rpg

~~~

0:00 Introduction
5:21 Main Results
16:22 Why is instruction following important?
24:37 Experiment Details
30:21 Report Generation Prompt
42:22 Verbosity of Response vs Accuracy
47:06 Variability of Accuracy across models
1:00:41 Does reasoning help with instruction following?
1:12:22 My guidelines: How to use LLMs in a process / agentic flow
1:28:16 Discussion
1:38:20 Conclusion

~~~

AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. I am also an avid game creator.

Discord:   / discord
LinkedIn:   / chong-min-tan-94652288
Online AI blog: https://delvingintotech.wordpress.com/
Twitter:   / johntanchongmin
Try out my games here: https://simmer.io/@chongmin

