New Directions in RL: TD(lambda), aggregation, seminorm projections, free-form sampling (from 2014)
Author: Dimitri Bertsekas
Uploaded: 2025-03-01
Views: 650
This lecture explores three interrelated research directions in approximate dynamic programming and reinforcement learning:
1. Seminorm projections (unifying the projected equation and aggregation approaches), generalized Bellman equations (multistep equations with state-dependent weights; the TD(lambda) equation is an example), and free-form sampling (a flexible alternative to simulating a single long trajectory).
2. Aggregation and seminorm projected equations.
3. Simulation-based implementation of iterative and matrix inversion methods using free-form sampling.
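The generalized Bellman equation from item 1 can be illustrated with a small policy-evaluation sketch. It assumes the form (T^(w) x)(i) = sum over l of w[i][l] (T^l x)(i), where T is the one-step Bellman operator and each state i has its own nonnegative weights summing to 1; this is our reading of "multistep equations with state-dependent weights", not code from the lecture or slides. The 3-state chain P, costs g, discount alpha, and horizon L are hypothetical, and the TD(lambda) weights (1 - lambda) lambda^(l-1) are truncated at L and renormalized, whereas the true TD(lambda) equation uses infinitely many terms.

```python
# Sketch (hypothetical data): generalized Bellman operator with state-dependent
# multistep weights, compared against the ordinary one-step Bellman equation.

def bellman(x, P, g, alpha):
    """One-step operator: (T x)(i) = g(i) + alpha * sum_j P[i][j] * x(j)."""
    n = len(x)
    return [g[i] + alpha * sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]

def multistep(x, P, g, alpha, w):
    """Generalized operator: (T^(w) x)(i) = sum_{l=1..L} w[i][l-1] * (T^l x)(i)."""
    n, L = len(x), len(w[0])
    out = [0.0] * n
    y = x
    for l in range(1, L + 1):
        y = bellman(y, P, g, alpha)  # y now equals T^l x
        for i in range(n):
            out[i] += w[i][l - 1] * y[i]
    return out

def fixed_point(op, x0, iters=2000):
    """Plain fixed-point iteration; converges because op is a sup-norm contraction."""
    x = x0
    for _ in range(iters):
        x = op(x)
    return x

def td_lambda_weights(lam, L):
    """TD(lambda)-style weights (1 - lam) * lam**(l-1), truncated at L and renormalized."""
    raw = [(1 - lam) * lam ** (l - 1) for l in range(1, L + 1)]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical 3-state Markov chain under a fixed policy.
P = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.2, 0.0, 0.8]]
g = [1.0, 2.0, 0.5]
alpha = 0.9
L = 5

# State-dependent weights: each row is a probability distribution over l = 1..L.
w_state = [td_lambda_weights(0.3, L),       # state index 0: truncated TD(0.3) weights
           [0.0, 0.0, 0.0, 0.0, 1.0],       # state index 1: pure 5-step equation
           [1.0, 0.0, 0.0, 0.0, 0.0]]       # state index 2: ordinary one-step equation
w_td = [td_lambda_weights(0.7, L)] * 3      # same TD(0.7)-style weights at every state

J_star = fixed_point(lambda x: bellman(x, P, g, alpha), [0.0] * 3)
J_gen = fixed_point(lambda x: multistep(x, P, g, alpha, w_state), [0.0] * 3)
J_td = fixed_point(lambda x: multistep(x, P, g, alpha, w_td), [0.0] * 3)

# Both gaps should be at floating-point round-off level.
print(max(abs(a - b) for a, b in zip(J_star, J_gen)))
print(max(abs(a - b) for a, b in zip(J_star, J_td)))
```

Since T^l J* = J* for every l, any such weighted equation has the same solution J* as the one-step equation, and T^(w) is a sup-norm contraction with modulus at most alpha. In this exact, unprojected setting the weights therefore change only the contraction modulus and the simulation noise, not the fixed point; once a projection is applied, as in the projected-equation methods the lecture discusses, the weights generally do change the approximate solution.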
Part of this material has appeared, in varying degrees of detail, in my 2012 DP book (Vol. II) and my 2022 Abstract DP book. Slides at http://www.mit.edu/~dimitrib/Gen_Bell...