RL 4: Thompson Sampling - Multi-armed bandits

Author: AI Insights - Rituraj Kaushik

Uploaded: 2019-02-03

Views: 20574

Description:

Thompson Sampling - Multi-armed bandits - In this tutorial we discuss another interesting algorithm, Thompson Sampling, for solving the multi-armed bandit problem. Unlike UCB, it is a sampling-based probabilistic approach, and it has often been found to perform better than UCB in practice.
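As a rough illustration of the sampling-based idea mentioned above, here is a minimal sketch of Thompson Sampling for a Bernoulli bandit. It is not taken from the video: the function name `thompson_sampling`, the `true_probs` values, the uniform Beta(1, 1) prior, and the round count are all assumptions chosen for the example.

```python
import random


def thompson_sampling(true_probs, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson Sampling on a simulated bandit.

    true_probs: hypothetical success probability of each arm
                (hidden from the agent, used only to simulate rewards).
    Returns (pulls per arm, total reward collected).
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms  # number of 1-rewards seen per arm
    failures = [0] * n_arms   # number of 0-rewards seen per arm
    pulls = [0] * n_arms
    total_reward = 0

    for _ in range(n_rounds):
        # Draw one plausible success rate per arm from its posterior,
        # Beta(1 + successes, 1 + failures), assuming a uniform prior.
        samples = [
            rng.betavariate(1 + successes[a], 1 + failures[a])
            for a in range(n_arms)
        ]
        # Play the arm whose sampled rate is highest.
        arm = max(range(n_arms), key=lambda a: samples[a])

        # Simulate a Bernoulli reward from the chosen arm.
        reward = 1 if rng.random() < true_probs[arm] else 0

        # Update the posterior counts for that arm only.
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
        total_reward += reward

    return pulls, total_reward


if __name__ == "__main__":
    pulls, total = thompson_sampling([0.2, 0.5, 0.8])
    print("pulls per arm:", pulls)
    print("total reward:", total)
```

Because each arm is chosen according to a random draw from its posterior rather than a deterministic bound, exploration happens naturally: arms with few observations have wide posteriors and are still sampled occasionally, while clearly worse arms are pulled less and less often.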

If you have not watched the previous videos on multi-armed bandits, I strongly encourage you to watch them first so that you can follow the whole story.

Reinforcement learning tutorial series:

1. Multi-armed Bandits: RL 1: Multi-armed Bandits 1
2. Multi-Armed Bandits - Action value estimation: RL 2: Multi-Armed Bandits 2 - Action value...
3. Upper confidence bound: RL 3: Upper confidence bound (UCB) to solv...
4. Thompson Sampling: RL 4: Thompson Sampling - Multi-armed bandits
5. Markov Decision Process - MDP: RL 5: Markov Decision Process - MDP | Rein...
6. Policy iteration and value iteration: RL 6: Policy iteration and value iteration...

