L-3 | LLM Tokenizers Explained: BPE, SentencePiece, Pretrained vs Custom (Full Hands-On Guide)

Author: Code With Aarohi

Uploaded: 2025-12-08

Views: 542

Description:

In the last lecture, we built our own TinyGPT LLM from scratch using manual tokenization.
Today, we upgrade that system using real, production-level tokenizers.

GitHub (both links contain the same code):

https://github.com/codewithaarohi/Bui...

https://github.com/AarohiSingla/Build...

📧 You can also reach me at: [email protected]

📸 Follow me on Instagram: @codewithaarohi
🔗 / codewithaarohi

If you haven’t watched the previous lecture, I highly recommend watching it first: we built the entire TinyGPT model step by step.

In this video, you will learn:
What tokenizers really do
How LLMs convert text → tokens → numbers
How to use SentencePiece
How to use BPE (Byte Pair Encoding)
How to use pretrained tokenizers like GPT-2, BERT, LLaMA, T5
How to train your own tokenizer from your own dataset
How vocabulary size, domain-specific text, and language mix affect tokens
How embedding layers convert token IDs into vectors
How to integrate everything into our TinyGPT model
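
To make the text → tokens → numbers pipeline from the list above concrete, here is a minimal, library-free sketch of BPE in Python. It is not the code from the video or the linked repositories. The function names, the toy corpus, the `</w>` end-of-word marker, and the `<unk>` token are all assumptions chosen for illustration, and real tokenizers (for example byte-level BPE) handle unknown characters and whitespace differently.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words."""
    # Every word starts as single characters plus an end-of-word marker.
    corpus = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties are broken by comparing the pairs.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges


def tokenize(word, merges):
    """Split a word into subword tokens by replaying the learned merges."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def build_vocab(words, merges):
    """Map every base character and merged symbol to an integer ID."""
    symbols = {"<unk>", "</w>"}
    for w in words:
        symbols.update(w)
    symbols.update(a + b for a, b in merges)
    return {tok: i for i, tok in enumerate(sorted(symbols))}


# Hypothetical toy corpus, used only for this illustration.
toy_corpus = ["low", "low", "lower", "lowest", "newer", "newer", "wider"]
merges = train_bpe(toy_corpus, num_merges=10)
vocab = build_vocab(toy_corpus, merges)

tokens = tokenize("lowest", merges)                    # text -> tokens
ids = [vocab.get(t, vocab["<unk>"]) for t in tokens]   # tokens -> numbers
print(tokens, ids)
```

Raising `num_merges` grows the vocabulary and tends to shorten token sequences for text that resembles the training corpus, which is one way to see why vocabulary size and domain-specific text affect token counts.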

Libraries Covered
sentencepiece (train your own tokenizer)
tokenizers (BPE, ByteLevelBPETokenizer)
gensim (Word2Vec, FastText embeddings)
transformers (HuggingFace tokenizers)
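
None of these libraries is needed to see the core idea behind the embedding step: an embedding layer is a lookup table with one vector per token ID. The sketch below uses plain Python with random values, and the names `make_embedding_table` and `embed` are hypothetical. In a PyTorch model this role is usually played by `torch.nn.Embedding`, whose weights are learned during training rather than left random.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """Create one small random vector per token ID (untrained weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(vocab_size)]


def embed(token_ids, table):
    """Look up the vector for each token ID; no arithmetic is involved."""
    return [table[i] for i in token_ids]


table = make_embedding_table(vocab_size=8, dim=4)
vectors = embed([3, 1, 3], table)

# The same token ID always maps to the same vector.
print(len(vectors), len(vectors[0]), vectors[0] == vectors[2])
```

The table has `vocab_size` rows, so the tokenizer's vocabulary size directly sets the size of this layer.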

👍 Support the Channel
Your support pushes me to create even better videos.
Please Like, Comment, Share, and Subscribe ❤️
