New Directions in RL: TD(lambda), aggregation, seminorm projections, free-form sampling (from 2014)

Author: Dimitri Bertsekas

Uploaded: 2025-03-01

Views: 650

Description:

This lecture explores three interrelated research directions in approximate dynamic programming and reinforcement learning:
1. Seminorm projections (unifying the projected equation and aggregation approaches), generalized Bellman equations (multistep equations with state-dependent weights; the TD(lambda) equation is an example), and free-form sampling (a flexible alternative to single long trajectory simulation).
2. Aggregation and seminorm projected equations.
3. Simulation-based implementation of iterative and matrix inversion methods using free-form sampling.
Part of this material has appeared, in varying degrees of detail, in my 2012 DP book (Vol. II) and my 2022 Abstract DP book. Slides at http://www.mit.edu/~dimitrib/Gen_Bell...
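As a rough illustration of the remark that the TD(lambda) equation is a multistep Bellman equation, the sketch below builds the mapping T^(lambda) J = (1 - lambda) * sum over l >= 0 of lambda^l * T^(l+1) J for a fixed policy and checks numerically that it has the same fixed point as the one-step mapping T. It is not taken from the lecture: the two-state chain, the costs, the discount factor, the truncation length, and all function names are assumptions made for the example, and the weights here are the same for every state, which is the special case that TD(lambda) corresponds to.

```python
# Illustrative sketch only: a 2-state policy-evaluation problem with made-up data.
ALPHA = 0.9                      # discount factor (assumed)
P = [[0.7, 0.3], [0.4, 0.6]]     # transition probabilities under the policy (assumed)
G = [1.0, 2.0]                   # expected one-stage costs (assumed)


def bellman(J):
    """One-step mapping for a fixed policy: (TJ)(i) = g(i) + alpha * sum_j p_ij * J(j)."""
    n = len(J)
    return [G[i] + ALPHA * sum(P[i][j] * J[j] for j in range(n)) for i in range(n)]


def td_lambda_operator(J, lam, terms=200):
    """Truncated multistep mapping: (1 - lam) * sum_{l=0}^{terms-1} lam^l * T^(l+1) J."""
    out = [0.0] * len(J)
    TJ = list(J)
    weight = 1.0 - lam            # weight on T^1 J
    for _ in range(terms):
        TJ = bellman(TJ)          # TJ now holds the next power T^(l+1) J
        out = [o + weight * t for o, t in zip(out, TJ)]
        weight *= lam             # geometric weights (1 - lam) * lam^l
    return out


def fixed_point(op, n, iters=500):
    """Find a fixed point of a contraction by plain repeated application."""
    J = [0.0] * n
    for _ in range(iters):
        J = op(J)
    return J


J_one_step = fixed_point(bellman, 2)
J_multistep = fixed_point(lambda J: td_lambda_operator(J, lam=0.5), 2)
print("fixed point of T:        ", J_one_step)
print("fixed point of T^(lambda):", J_multistep)
```

The two printed vectors should agree up to numerical error. The generalized equations mentioned in the description would replace the geometric weights with weights that may differ from state to state, which this sketch does not attempt.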

