Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Автор: Umar Jamil

Загружено: 2025-01-21

Просмотров: 59343

Описание:

Paper reading in the Discord group. All the lecture was improvised.

Join the group: / discord

Link to paper: https://github.com/deepseek-ai/DeepSe...

Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Доступные форматы для скачивания:

Скачать видео mp4

Информация по загрузке:

Скачать аудио mp3

Похожие видео

Обзор теории DeepSeek R1 | GRPO + RL + SFT

Обзор теории DeepSeek R1 | GRPO + RL + SFT

Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.

Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.

How to Train LLMs to

How to Train LLMs to "Think" (o1 & DeepSeek-R1)

Yann LeCun | Self-Supervised Learning, JEPA, World Models, and the future of AI

Yann LeCun | Self-Supervised Learning, JEPA, World Models, and the future of AI

Понимание рассуждений LLM (o1/o3, DeepSeek-R1, Gemini Thinking, Grok 3, Claude 3.7)

Понимание рассуждений LLM (o1/o3, DeepSeek-R1, Gemini Thinking, Grok 3, Claude 3.7)

Внимание — это всё, что вам нужно (Transformer) — объяснение модели (включая математику), вывод и...

Внимание — это всё, что вам нужно (Transformer) — объяснение модели (включая математику), вывод и...

[GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

[GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

[Full Workshop] Reinforcement Learning, Kernels, Reasoning, Quantization & Agents — Daniel Han

[Full Workshop] Reinforcement Learning, Kernels, Reasoning, Quantization & Agents — Daniel Han

The ONLY DeepSeek GRPO/PPO video you'll EVER need (with examples and exercises) | RL Foundations

The ONLY DeepSeek GRPO/PPO video you'll EVER need (with examples and exercises) | RL Foundations

How DeepSeek Rewrote the Transformer [MLA]

How DeepSeek Rewrote the Transformer [MLA]

Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math

Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math

How Did They Do It? DeepSeek V3 and R1 Explained

How Did They Do It? DeepSeek V3 and R1 Explained

LLM и GPT - как работают большие языковые модели? Визуальное введение в трансформеры

LLM и GPT - как работают большие языковые модели? Визуальное введение в трансформеры

How I finetuned a Small LM to THINK and solve puzzles on its own (GRPO & RL!)

How I finetuned a Small LM to THINK and solve puzzles on its own (GRPO & RL!)

Теренс Тао о том, как Григорий Перельман решил гипотезу Пуанкаре | Лекс Фридман

Теренс Тао о том, как Григорий Перельман решил гипотезу Пуанкаре | Лекс Фридман

The Key Equation Behind Probability

The Key Equation Behind Probability

Train Your Own Reasoning Model (DeepSeek Clone) Fast & With Only 7Gb Of VRAM

Train Your Own Reasoning Model (DeepSeek Clone) Fast & With Only 7Gb Of VRAM

Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning

Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning

DeepSeek R1: объяснение твоей бабушке

DeepSeek R1: объяснение твоей бабушке

GRPO - Group Relative Policy Optimization - How DeepSeek trains reasoning models

GRPO - Group Relative Policy Optimization - How DeepSeek trains reasoning models