RL 5: Markov Decision Process - MDP | Reinforcement Learning
Author: AI Insights - Rituraj Kaushik
Uploaded: 2019-02-10
Views: 85616
Markov Decision Process - MDP - A Markov decision process is a way to formalize sequential decision making. Thus we can formalize the reinforcement learning problem as a finite Markov decision process. There are 5 components of a Markov decision process: the agent, the environment, the states, the actions and the rewards. The agent takes an action in the environment based on the current state of the environment. After every action the environment moves to another state. The agent receives a reward for its action in the previous state. The goal of the agent is to maximize the total reward it receives in an episode or a specific number of steps.
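The agent-environment loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the video: the two-state MDP, its rewards, and the names `TRANSITIONS`, `step`, `random_policy`, `greedy_policy` and `run_episode` are all made up for this example, and the transitions are deterministic for simplicity, whereas a general MDP allows stochastic transitions.

```python
import random

# Hypothetical two-state MDP, invented purely for illustration.
# TRANSITIONS[state][action] -> (next_state, reward)
TRANSITIONS = {
    "A": {"stay": ("A", 0.0), "move": ("B", 1.0)},
    "B": {"stay": ("B", 2.0), "move": ("A", 0.0)},
}


def step(state, action):
    """Environment: return the next state and the reward for `action` in `state`."""
    return TRANSITIONS[state][action]


def random_policy(state, rng):
    """Agent: pick one of the actions available in the current state at random."""
    return rng.choice(sorted(TRANSITIONS[state]))


def greedy_policy(state, rng):
    """Agent: pick the action with the highest immediate reward (rng is unused)."""
    return max(TRANSITIONS[state], key=lambda a: TRANSITIONS[state][a][1])


def run_episode(policy, start_state="A", num_steps=5, seed=0):
    """Run the loop for a fixed number of steps and return the total reward."""
    rng = random.Random(seed)
    state = start_state
    total_reward = 0.0
    for _ in range(num_steps):
        action = policy(state, rng)          # agent acts on the current state
        state, reward = step(state, action)  # environment moves to a new state
        total_reward += reward               # reward for the action just taken
    return total_reward
```

Calling `run_episode(greedy_policy)` or `run_episode(random_policy)` returns the total reward collected over five steps, which is the quantity the agent is trying to maximize.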
Reinforcement learning tutorial series:
1. Multi-armed Bandits: • RL 1: Multi-armed Bandits 1
2. Multi-Armed Bandits - Action value estimation: • RL 2: Multi-Armed Bandits 2 - Action value...
3. Upper confidence bound: • RL 3: Upper confidence bound (UCB) to solv...
4. Thompson Sampling: • RL 4: Thompson Sampling - Multi-armed bandits
5. Markov Decision Process - MDP: • RL 5: Markov Decision Process - MDP | Rein...
6. Policy iteration and value iteration: • RL 6: Policy iteration and value iteration...