Build an LLM from Scratch 4: Implementing a GPT model from Scratch To Generate Text

Author: Sebastian Raschka

Uploaded: 2025-03-17

Views: 25971

Description:

Links to the book:
https://amzn.to/4fqvn0D (Amazon)
https://mng.bz/M96o (Manning)

Link to the GitHub repository: https://github.com/rasbt/LLMs-from-sc...

This is a supplementary video explaining how to code an LLM architecture from scratch.

00:00 4.1 Coding an LLM architecture
13:52 4.2 Normalizing activations with layer normalization
36:02 4.3 Implementing a feed forward network with GELU activations
52:16 4.4 Adding shortcut connections
1:03:18 4.5 Connecting attention and linear layers in a transformer block
1:15:13 4.6 Coding the GPT model

You can find additional bonus materials on GitHub, for example converting the GPT-2 architecture into Llama 2 and Llama 3: https://github.com/rasbt/LLMs-from-sc...
