Lecture 11 - Function Approximation Methods|Reinforcement Learning Phase|Reasoning LLMs from Scratch

Author: Vizuara

Uploaded: 2025-06-01

Views: 874

Description:

So far in the Reinforcement Learning Phase, we have looked at tabular methods for computing value functions. That is, the states and their values are stored in tables.

In most practical problems, these methods are not useful, since the number of states is far too large to store in a table. For example, the number of states in a game of chess is ~10^46.

From this lecture onwards, we will start to look at function approximation methods, which are used to estimate the values of certain states and then generalize to other states.

It is quite similar to supervised learning, with two differences:

(1) The target is not known beforehand
(2) The target is non-stationary
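The two differences can be made concrete with a short worked comparison. The notation here is an assumption borrowed from standard textbook treatments, not taken from the lecture itself: \hat{v}(s, \mathbf{w}) is the approximate value function with weights \mathbf{w}, \gamma is the discount factor, and the reinforcement learning target shown is the one-step TD(0) target, one common choice among several.

```latex
% Supervised learning: the label y is given in advance and never changes.
L_{\text{sup}}(\mathbf{w}) = \tfrac{1}{2}\,\bigl(y - f(x; \mathbf{w})\bigr)^{2}

% Value function approximation with a one-step TD target:
% the "label" U_t is built from the current estimate itself.
U_t = R_{t+1} + \gamma\,\hat{v}(S_{t+1}, \mathbf{w}_t)

L_{\text{TD}}(\mathbf{w}) = \tfrac{1}{2}\,\bigl(U_t - \hat{v}(S_t, \mathbf{w})\bigr)^{2}
```

Because U_t contains \hat{v}(S_{t+1}, \mathbf{w}_t), it is not available before learning starts, and it shifts every time the weights are updated, which is what makes the target non-stationary.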

We will learn how to use a function to express the value function, and also how to use gradient descent to optimize this function.
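As a rough illustration of both ideas, here is a minimal sketch of semi-gradient TD(0) with a linear value function. Everything in it is an assumption chosen for illustration rather than the lecture's own example: the five-state random walk environment, the two-component feature vector in `features`, the step size, and all function names. The point is that two weights stand in for a five-entry table, so an update at one state also moves the estimates of the others.

```python
import random

GAMMA = 1.0     # undiscounted episodic task (assumption)
N_STATES = 5    # non-terminal states 0..4 of a small random walk (assumption)


def features(state):
    """Hypothetical feature vector: a bias term plus the normalized position."""
    return [1.0, state / (N_STATES - 1)]


def v_hat(state, w):
    """Linear value estimate: dot product of the weights and the features."""
    return sum(wi * xi for wi, xi in zip(w, features(state)))


def step(state, rng):
    """Move left or right with equal probability.

    Returns (next_state, reward); next_state is None once the walk leaves
    either end. The reward is 1 only when exiting on the right.
    """
    nxt = state + rng.choice((-1, 1))
    if nxt < 0:
        return None, 0.0
    if nxt >= N_STATES:
        return None, 1.0
    return nxt, 0.0


def train(episodes=50000, alpha=0.001, seed=0):
    """Semi-gradient TD(0): gradient descent on the squared TD error,
    treating the bootstrapped target as a constant."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(episodes):
        state = N_STATES // 2  # every episode starts in the middle
        while state is not None:
            nxt, reward = step(state, rng)
            # The target depends on the current weights, so it moves as we learn.
            bootstrap = GAMMA * v_hat(nxt, w) if nxt is not None else 0.0
            delta = reward + bootstrap - v_hat(state, w)
            # For a linear model, the gradient of v_hat w.r.t. w is the features.
            x = features(state)
            w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
            state = nxt
    return w


if __name__ == "__main__":
    weights = train()
    for s in range(N_STATES):
        print(s, round(v_hat(s, weights), 3), "true:", round((s + 1) / 6, 3))
```

For this particular walk the true values are (s + 1) / 6, which happen to be exactly linear in the position, so the linear model can represent them. In general the approximation will carry some error that no choice of weights removes.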

We are now getting closer to understanding how reinforcement learning is used in language models. This lecture marks the beginning of this transition.

