CatBoost Part 1: Ordered Target Encoding
Author: StatQuest with Josh Starmer
Uploaded: Feb 27, 2023
Views: 44,265
One of the defining features of CatBoost is its concerted effort to avoid data leakage at all costs. In this video, we'll see how it eliminates a potential threat in Target Encoding by ordering the data and encoding it sequentially. This ordered approach is central to everything CatBoost does and we'll see it again in Part 2 when we talk about how it builds trees.
NOTE: This StatQuest is based on the original CatBoost manuscript... https://arxiv.org/abs/1706.09516
...and an example provided in the CatBoost documentation...
https://catboost.ai/en/docs/concepts/...
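The ordered approach described above can be sketched in a few lines of Python. This is a minimal illustration and not CatBoost's actual implementation: the function name `ordered_target_encode`, the "favorite color" column, the toy data, and the prior of 0.05 are all assumptions made for this example. It assumes a binary target and an encoding of the form (OptionCount + prior) / (n + 1), where both counts come only from rows that appear earlier in the ordering.

```python
from collections import defaultdict


def ordered_target_encode(categories, targets, prior=0.05):
    """Encode each row using only the rows that come before it.

    Hypothetical helper for illustration; assumes a binary (0/1) target.
    """
    seen = defaultdict(int)       # n: earlier rows with the same category
    positives = defaultdict(int)  # OptionCount: earlier rows with the same category and target == 1
    encoded = []
    for category, target in zip(categories, targets):
        # The current row's own target is NOT used here, which is what
        # prevents the row from leaking its label into its own encoding.
        encoded.append((positives[category] + prior) / (seen[category] + 1))
        # Only after encoding do we add the current row to the running counts.
        seen[category] += 1
        positives[category] += int(target == 1)
    return encoded


# Made-up toy data: a categorical feature and a binary "Loves Troll 2" target.
favorite_color = ["blue", "red", "green", "blue", "green", "green", "blue"]
loves_troll_2 = [1, 1, 1, 0, 1, 0, 0]

for color, value in zip(favorite_color, ordered_target_encode(favorite_color, loves_troll_2)):
    print(f"{color}: {value:.3f}")
```

Because the first row of each category has no earlier rows to draw on, its encoding falls back to the prior alone. In practice the rows would be put in a random order before encoding; this sketch simply uses the order given.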
This video has been dubbed into Spanish and Portuguese using an artificial voice via https://aloud.area120.google.com to increase accessibility. You can change the audio track language in the Settings menu.
For a complete index of all the StatQuest videos, check out:
https://statquest.org/video-index/
If you'd like to support StatQuest, please consider...
Patreon: / statquest
...or...
YouTube Membership: / @statquest
...buying one of my books, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/
...or just donating to StatQuest!
https://www.paypal.me/statquest
Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
/ joshuastarmer
0:00 Awesome song and introduction
1:56 A slight problem with k-fold target encoding
3:42 Ordered Target Encoding
Corrections:
4:09 It is also worth noting that if there were more than 2 target values, for example, if Loves Troll 2 could be 0, 1 and 2, then, when calculating the OptionCount for a sample with Loves Troll 2 = 1, we would include rows that had Loves Troll 2 = 1 and 2.
#StatQuest #CatBoost #dubbedwithaloud
