Session 9: Policy Iteration & Q learning code, Finite Horizon MDPs, Dynamic Program, Theory and Exmp
Author: Mainak's PMRF Tutorials
Uploaded: 2025-04-14
Views: 96
This video starts by implementing the Q-learning and policy iteration algorithms in a dangerous grid world setting. Next, we introduce finite horizon MDPs and controlled Markov chains, and then define the finite horizon problem in RL.
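For readers who want something concrete before watching, here is a minimal sketch of policy iteration on a small grid world with one hazardous cell. It is not the code from the video: the 3x4 layout, the goal and pit positions, the rewards (+1, -1, and -0.04 per step), the discount factor 0.9, and all function names are assumptions chosen for illustration, and moves are taken to be deterministic.

```python
# Deterministic 3x4 grid world. Layout and reward numbers are illustrative assumptions.
ROWS, COLS = 3, 4
GOAL, PIT = (0, 3), (1, 3)          # terminal cells: reward +1 and -1 on entry
ACTIONS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
GAMMA = 0.9
STATES = [(r, c) for r in range(ROWS) for c in range(COLS)]


def step(state, action):
    """Return (next_state, reward); a move off the grid leaves the agent in place."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    nxt = (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state
    if nxt == GOAL:
        return nxt, 1.0
    if nxt == PIT:
        return nxt, -1.0
    return nxt, -0.04               # small cost for every ordinary step


def q_value(state, action, V):
    """One-step lookahead: immediate reward plus discounted value of the next state."""
    nxt, reward = step(state, action)
    return reward + GAMMA * V[nxt]


def policy_iteration(tol=1e-8):
    terminal = {GOAL, PIT}
    policy = {s: "U" for s in STATES if s not in terminal}   # arbitrary initial policy
    V = {s: 0.0 for s in STATES}                             # terminal values stay 0
    while True:
        # Policy evaluation: sweep the Bellman expectation equation until it settles.
        while True:
            delta = 0.0
            for s in policy:
                v_new = q_value(s, policy[s], V)
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < tol:
                break
        # Policy improvement: switch to a strictly better action where one exists.
        stable = True
        for s in policy:
            best = max(ACTIONS, key=lambda a: q_value(s, a, V))
            if q_value(s, best, V) > q_value(s, policy[s], V) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


if __name__ == "__main__":
    policy, V = policy_iteration()
    for r in range(ROWS):
        print(" ".join(policy.get((r, c), "*") for c in range(COLS)))
```

Q-learning would replace the evaluation and improvement sweeps with sampled updates of a Q-table, which is useful when the transition model is not available to the agent.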
We extend the value functions studied earlier to a three-parameter reward function and define the value of a state in the finite horizon setting. We then define subproblems for the value function and show, using the principle of optimality, that dynamic programming starting from the terminal state and running backwards in time yields the optimal solution to the problem.
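The backward recursion described above can be sketched as follows. This is an illustrative example rather than the derivation from the video: it assumes an undiscounted horizon of N stages, a terminal reward g(s), a three-parameter reward r(s, a, s'), and the recursion V_N(s) = g(s), V_n(s) = max over a of the sum over s' of P(s' | s, a) [r(s, a, s') + V_{n+1}(s')]. The two-state chain and every name in the code are hypothetical.

```python
def backward_induction(states, actions, P, r, g, N):
    """Finite horizon DP, solved from the last stage back to stage 0.

    P[(s, a)] maps each next state s2 to its probability,
    r(s, a, s2) is the three-parameter reward, g(s) is the terminal reward.
    Returns V (list of N+1 dicts) and policy (list of N dicts), indexed by stage.
    """
    V = [dict() for _ in range(N + 1)]
    policy = [dict() for _ in range(N)]
    V[N] = {s: g(s) for s in states}            # boundary condition at the horizon
    for n in range(N - 1, -1, -1):              # stages N-1, N-2, ..., 0
        for s in states:
            best_a, best_q = None, float("-inf")
            for a in actions:
                # Expected reward-to-go of taking a now and acting optimally afterwards.
                q = sum(p * (r(s, a, s2) + V[n + 1][s2])
                        for s2, p in P[(s, a)].items())
                if q > best_q:
                    best_a, best_q = a, q
            V[n][s] = best_q
            policy[n][s] = best_a
    return V, policy


# Hypothetical two-state controlled Markov chain.
states = [0, 1]
actions = [0, 1]                                # 0 = stay, 1 = try to switch
P = {
    (0, 0): {0: 1.0},
    (1, 0): {1: 1.0},
    (0, 1): {1: 0.8, 0: 0.2},                   # switching succeeds with probability 0.8
    (1, 1): {0: 0.8, 1: 0.2},
}


def reward(s, a, s2):
    return 1.0 if s2 == 1 else 0.0              # reward depends on where we land


def terminal_reward(s):
    return 0.0


V, policy = backward_induction(states, actions, P, reward, terminal_reward, N=2)
print(V[0], policy[0])
```

Note that the optimal policy is indexed by stage: in a finite horizon problem the best action in a state can depend on how many steps remain, which is why the sketch stores one decision rule per stage.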
Materials: https://drive.google.com/drive/folder...
