RL 4: Thompson Sampling - Multi-armed bandits
Author: AI Insights - Rituraj Kaushik
Uploaded: 2019-02-03
Views: 20574
Thompson Sampling - Multi-armed bandits - In this tutorial we discuss another interesting algorithm, Thompson Sampling, to solve the multi-armed bandit problem. Unlike UCB, this is a sampling-based probabilistic approach, and it has been shown to perform better than UCB.
If you have not watched the previous videos on multi-armed bandits, I strongly encourage you to watch them before this one to understand the whole story.
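As a companion to the video, here is a minimal sketch of Thompson Sampling for Bernoulli-reward bandits. It assumes Beta(1, 1) priors on each arm's success probability; the simulated arm probabilities and round count are illustrative choices, not values from the video. Each round, we draw one sample from each arm's Beta posterior, pull the arm with the highest sample, and update that arm's posterior with the observed reward:

```python
import random

def thompson_sampling(true_probs, n_rounds=10000, seed=0):
    """Beta-Bernoulli Thompson Sampling on a simulated bandit."""
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(1, 1) uniform prior on each arm's success probability
    successes = [1] * n_arms
    failures = [1] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # Sample a plausible win rate for each arm from its posterior
        samples = [rng.betavariate(successes[i], failures[i])
                   for i in range(n_arms)]
        # Pull the arm whose sampled win rate is highest
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior
        reward = 1 if rng.random() < true_probs[arm] else 0
        total_reward += reward
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return total_reward, successes, failures

total, s, f = thompson_sampling([0.3, 0.5, 0.7])
print(total)  # total reward; the 0.7 arm should dominate over time
```

Because exploration happens naturally through posterior sampling (uncertain arms produce a wide spread of samples), no explicit exploration bonus like UCB's confidence term is needed.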
Reinforcement learning tutorial series:
1. Multi-armed Bandits: • RL 1: Multi-armed Bandits 1
2. Multi-Armed Bandits - Action value estimation: • RL 2: Multi-Armed Bandits 2 - Action value...
3. Upper confidence bound: • RL 3: Upper confidence bound (UCB) to solv...
4. Thompson Sampling: • RL 4: Thompson Sampling - Multi-armed bandits
5. Markov Decision Process - MDP: • RL 5: Markov Decision Process - MDP | Rein...
6. Policy iteration and value iteration: • RL 6: Policy iteration and value iteration...