Lecture 1, 2025, Course overview: RL and DP, AlphaZero, deterministic DP, examples, applications

Author: Dimitri Bertsekas

Uploaded: 2025-01-16

Views: 6503

Description:

Slides, class notes, and related textbook material at https://web.mit.edu/dimitrib/www/RLbo...
This site also contains complete PDFs of related textbooks by Bertsekas:
"A Course in Reinforcement Learning", 2nd edition, 2025
"Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control," 2022
"Abstract Dynamic Programming", 3rd edition, 2022
"Rollout, Policy Iteration, and Distributed Reinforcement Learning," 2020
Lecture Content: Course overview, AlphaZero, off-line training, on-line play, relation to Newton's method. Exact and approximate dynamic programming for deterministic problems, discrete optimization, model predictive and adaptive control, large language models via dynamic programming, approximation in value space and reinforcement learning

Related videos

Lecture 2, 2025, Stochastic finite and infinite horizon DP, approximation in value and policy space

Reinforcement Learning, Model Predictive Control, and the Newton Step for Solving Bellman's Equation

DeepMind x UCL | Introduction to Reinforcement Learning 2015

NMPC 2024 - Model Predictive Control & RL: A Unified Framework Based on Dynamic Programming

Stanford CS234 I Reinforcement Learning I Spring 2024 I Emma Brunskill

Lecture 1 | Convex Optimization I (Stanford)

The failure of theoretical error bounds in Reinforcement Learning.

Computer chess with model predictive control and reinforcement learning

Reinforcement Learning By the Book

Solving Combinatorial Problems Using Reinforcement Learning and LLMs | Martin Takáč

AI Learns to Walk (deep reinforcement learning)

Reinforcement Learning, by the Book

Abstract Dynamic Programming, Reinforcement Learning, Newton's Method, and Gradient Optimization

LIDS@80: Honoring Dimitri Bertsekas

L1 MDPs, Exact Solution Methods, Max-ent RL (Foundations of Deep RL Series)

The Time Paradox Hidden Inside Feynman’s Nobel Prize Work

Lecture 1, 2024, course overview: RL and DP, AlphaZero, discrete and continuous applications

Plenary lecture at IFAC Nonlinear MPC, 2024; Model Predictive Control and Reinforcement Learning

26. Chernobyl — How It Happened
