Wired to Learn #4: Reinforcement Learning Bot Understands Tic Tac Toe after 10,000 Games!
Author: Prof. Weimao Ke
Uploaded: 2023-08-04
Views: 164
Create a Reinforcement Learning Bot from scratch for Tic Tac Toe, with code available at:
https://github.com/keweimao/wired
The following video explains the idea of Reinforcement Learning (RL):
• Reinforcement Learning
Key ideas of RL applied to the Tic-Tac-Toe bot:
1. Monitors every state of the game, i.e. the 'X', 'O', and ' ' marks on the 3x3 grid (at most `3^9 = 19683` configurations, not all of which can occur in legal play).
2. Makes a move and, depending on the `exploration rate`, it will select:
EITHER a random move to *explore* different situations
OR the *best move* based on past rewards
3. No reward or penalty if there is no immediate winner.
4. In the end:
IF the bot wins, the LAST move will receive a reward of `1`
IF the bot loses, the LAST move will receive a penalty of `-1`
5. This repeats with NEW games, and the bot continues to learn.
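Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the code from the repository: the names `state_key`, `choose_move`, and `q_table` are hypothetical, and storing past rewards in a dictionary keyed by (state, move) is an assumption.

```python
import random

def state_key(board):
    """Encode a 9-cell board (list of 'X', 'O', ' ') as a 9-character string."""
    return "".join(board)

def choose_move(board, q_table, exploration_rate, rng=random):
    """Pick a random empty cell with probability exploration_rate,
    otherwise the empty cell with the highest learned value."""
    legal = [i for i, cell in enumerate(board) if cell == " "]
    if rng.random() < exploration_rate:
        return rng.choice(legal)  # explore
    key = state_key(board)
    # exploit: (state, move) pairs never seen before default to 0.0
    return max(legal, key=lambda move: q_table.get((key, move), 0.0))

board = ["X", "O", " ",
         " ", "X", " ",
         " ", " ", "O"]
# Hypothetical learned values for two of the empty cells
q_table = {(state_key(board), 2): 0.1, (state_key(board), 6): 0.8}
print(choose_move(board, q_table, exploration_rate=0.0))  # 6
```

With `exploration_rate=0.0` the bot always takes the best known move; with `1.0` it always plays randomly. Values in between trade off exploring new situations against using what has been learned.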
Reading the above, one may wonder whether ONLY the LAST move is rewarded. The answer is NO.
(1) *All actions leading to* the LAST move (for a win or loss) will be rewarded or penalized but the reward/penalty will be **discounted**.
```
For example, given RLBot Actions:
Move1 to Move2 to Move3 (WIN)
Its rewards will be like:
Move3 (1 point) to Move2 (0.9 points) to Move1 (0.9*0.9 = 0.81 points)
```
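The discounting in the example above can be sketched as a small function. The name `discounted_rewards` and the discount factor of `0.9` are assumptions taken from the example numbers, and the result shows the values the bot's estimates move toward, not what a single game assigns.

```python
def discounted_rewards(num_moves, final_reward, discount=0.9):
    """Credit for each of the bot's moves, first move first.
    The last move gets the full reward; each earlier move gets
    the following move's credit multiplied by the discount."""
    return [final_reward * discount ** (num_moves - 1 - i)
            for i in range(num_moves)]

rewards = discounted_rewards(3, 1.0)    # Move1, Move2, Move3 of a win
print([round(r, 2) for r in rewards])   # [0.81, 0.9, 1.0]
```

For a loss, pass `final_reward=-1.0` and the penalties are discounted the same way.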
(2) The game has to be *repeated* many times to train the bot, so that rewards/penalties propagate back to earlier moves.
(3) Another parameter, the `learning rate`, determines *how fast* the bot updates the reward/penalty values.
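To see why repetition and the learning rate matter, here is a minimal sketch using the standard tabular Q-learning update. Whether the repository uses exactly this rule is an assumption, and `q_update` and `replay_winning_game` are hypothetical names. The sketch also simplifies the game to a single chain of three bot moves that is replayed identically, ignoring the opponent.

```python
def q_update(old_value, reward, best_next_value, learning_rate=0.5, discount=0.9):
    """Move old_value a fraction (learning_rate) of the way toward
    reward + discount * best_next_value."""
    target = reward + discount * best_next_value
    return old_value + learning_rate * (target - old_value)

def replay_winning_game(num_games, learning_rate=0.5, discount=0.9):
    """Replay the same 3-move win; only the last move is rewarded directly."""
    values = [0.0, 0.0, 0.0]      # learned value of Move1, Move2, Move3
    immediate = [0.0, 0.0, 1.0]   # reward received right after each move
    for _ in range(num_games):
        for i in range(3):
            next_value = values[i + 1] if i < 2 else 0.0
            values[i] = q_update(values[i], immediate[i], next_value,
                                 learning_rate, discount)
    return values

print(replay_winning_game(1))                          # [0.0, 0.0, 0.5]
print([round(v, 2) for v in replay_winning_game(50)])  # [0.81, 0.9, 1.0]
```

After one game only the last move has learned anything, and only half of the reward because the learning rate is `0.5`. Over repeated games the values of the earlier moves approach the discounted rewards from the example.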