Lecture 1, 2024, course overview: RL and DP, AlphaZero, discrete and continuous applications

Author: Dimitri Bertsekas

Uploaded: 2024-04-27

Views: 5090

Description:

Slides, class notes, and related textbook material at http://web.mit.edu/dimitrib/www/RLboo...

The sound of the 1st videolecture of the 2024 class turned out to be degraded. I have instead posted the 1st video of the 2023 class, which has better sound and essentially identical content. Slides can be found at https://web.mit.edu/dimitrib/www/RLTo...

The subsequent videolectures 2-13 are from the 2024 offering of the course. The slides of the 1st lecture of 2024 can be found at https://web.mit.edu/dimitrib/www/RLTo...

Lecture Content: Course overview, AlphaZero, off-line training, on-line play, relation to Newton's method. Exact and approximate dynamic programming for deterministic problems, discrete optimization, model predictive and adaptive control, large language models via dynamic programming, approximation in value space and reinforcement learning

Related videos

The failure of theoretical error bounds in Reinforcement Learning.

Plenary lecture at IFAC Nonlinear MPC, 2024; Model Predictive Control and Reinforcement Learning

Lecture 1 Part 1: Approximate Dynamic Programming Lectures by D. P. Bertsekas

Reinforcement Learning Course at ASU

NMPC 2024 - Model Predictive Control & RL: A Unified Framework Based on Dynamic Programming

Bertsekas - Dynamic Programming

Reinforcement Learning, Model Predictive Control, and the Newton Step for Solving Bellman's Equation

Lecture 19: Dynamic Programming I: Fibonacci, Shortest Paths

John Tsitsiklis (MIT): "The Shades of Reinforcement Learning"

The Most Complex Model We Actually Understand

Everything You Need to Know About Control Theory

The Full Reinforcement Learning Iceberg

Distributed Optimization via Alternating Direction Method of Multipliers

Lecture 1, 2023: Introduction, AlphaZero, Deterministic DP, course overview, ASU

Markov Chains: The Mathematics of Prediction [Veritasium]

Hopfield Network: How Are Memories Stored in Neural Networks? [Nobel Prize in Physics 2024...

Why Does Diffusion Work Better Than Autoregression?

Reinforcement Learning Series: Overview of Methods

INTERNET 2026: The Death of VPN, Whitelists, and Intranet Mode. A Systems Analysis of the End of the Network

Warren Buffett: If You Want to Get Rich, Stop Buying These 5 Things.
