TD-Lambda: Blending N-Step Return Estimates
Author: Priyam Mazumdar
Uploaded: 2025-09-03
Views: 166
Code: https://github.com/priyammaz/PyTorch-...
Today we continue on to TD Lambda, which improves on TD(N). Instead of relying on a single N-step return estimate, why not take a weighted average of all the N-step estimates along the trajectory? Of course, this creates a new issue: just as with Monte Carlo, we need the full trajectory before we can update. Luckily, there is an online method that uses Eligibility Traces to enable an update at every step!
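The weighted average described above is the λ-return. As a sketch in standard notation (following Sutton & Barto), where G_t^{(n)} is the n-step return from time t:

```latex
G_t^{\lambda} = (1-\lambda)\sum_{n=1}^{\infty} \lambda^{\,n-1}\, G_t^{(n)}
```

The geometric weights (1-λ)λ^{n-1} sum to 1, so this is a proper average; λ = 0 recovers TD(0), and λ → 1 recovers the Monte Carlo return.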
We will first prove the equivalence between the standard (forward-view) TD Lambda and the eligibility-trace (backward-view) formulation. You can find a write-up of the proof here: http://incompleteideas.net/book/ebook.... Then we will implement it to see how it all comes together!
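As a preview of the implementation, here is a minimal sketch of the backward-view update with accumulating eligibility traces for tabular state-value estimation. The function name, signature, and default hyperparameters are illustrative assumptions, not the exact code from the linked repository:

```python
def td_lambda_update(V, trajectory, alpha=0.1, gamma=0.99, lam=0.9):
    """One episode of online (backward-view) TD(lambda).

    V: dict mapping state -> value estimate, updated in place.
    trajectory: iterable of (state, reward, next_state) transitions;
                next_state may be None at episode end.
    alpha, gamma, lam: learning rate, discount, trace decay
                       (illustrative defaults).
    """
    e = {s: 0.0 for s in V}  # eligibility traces, reset each episode
    for s, r, s_next in trajectory:
        # One-step TD error; terminal next_state contributes 0
        delta = r + gamma * V.get(s_next, 0.0) - V[s]
        e[s] += 1.0  # accumulating trace for the visited state
        for state in V:
            # Every state is credited in proportion to its eligibility
            V[state] += alpha * delta * e[state]
            e[state] *= gamma * lam  # decay all traces
    return V
```

With λ = 0 the traces vanish after one step and this reduces to TD(0); larger λ spreads each TD error further back along the trajectory, which is exactly the trade-off explored in the "Effect of Lambda" segment.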
I hope you are already comfortable with the following:
Monte Carlo: • Online Monte Carlo Methods for Model-Free ...
TD Learning: • Q-Learning: Off-Policy Model-Free Learning
TD-N: • N-Step TD Learning: Navigating the Bias/Va...
Timestamps:
00:00:00 - Recap MC/TD(0)/TD(N)
00:03:32 - What is TD Lambda?
00:10:54 - Prove Forward/Backward Method Equivalence
00:17:10 - Get Explicit Form for Eligibility Trace
00:23:30 - What do we want to show?
00:26:17 - Expand the Backward Method (w/ Trace)
00:36:01 - Expand the Forward Method (w/o Trace)
00:58:00 - Implement TD Lambda
01:10:40 - Effect of Lambda
Socials!
X / data_adventurer
Instagram / nixielights
Linkedin / priyammaz
Discord / discord
🚀 Github: https://github.com/priyammaz
🌐 Website: https://www.priyammazumdar.com/