Umar Jamil

I'm a Machine Learning Engineer from Milan, Italy, teaching complex deep learning and machine learning concepts to my cat, 奥利奥.
我也会中文.

Titans: Learning to Memorize at Test Time

Titans: Learning to Memorize at Test Time

Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Flash Attention derived and coded from first principles with Triton (Python)

Flash Attention derived and coded from first principles with Triton (Python)

Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation

Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation

Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem

Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem

Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math

Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math

Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.

Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.

Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math

Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math

Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer

Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer

Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code

Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code

Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training

Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training

Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)

Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)

Объяснение BERT: обучение, вывод, BERT против GPT/LLamA, тонкая настройка, токен [CLS]

Объяснение BERT: обучение, вывод, BERT против GPT/LLamA, тонкая настройка, токен [CLS]

Coding Stable Diffusion from scratch in PyTorch

Coding Stable Diffusion from scratch in PyTorch

Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm

Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm

LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

Segment Anything - Model explanation with code

Segment Anything - Model explanation with code

LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch

LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch

LongNet: Scaling Transformers to 1,000,000,000 tokens: Python Code + Explanation

LongNet: Scaling Transformers to 1,000,000,000 tokens: Python Code + Explanation

How diffusion models work - explanation and code!

How diffusion models work - explanation and code!

Variational Autoencoder - Model, ELBO, loss function and maths explained easily!

Variational Autoencoder - Model, ELBO, loss function and maths explained easily!

Внимание — это всё, что вам нужно (Transformer) — объяснение модели (включая математику), вывод и...

Внимание — это всё, что вам нужно (Transformer) — объяснение модели (включая математику), вывод и...

Coding a Transformer from scratch on PyTorch, with full explanation, training and inference.

Coding a Transformer from scratch on PyTorch, with full explanation, training and inference.

CLIP - Paper explanation (training and inference)

CLIP - Paper explanation (training and inference)

Wav2Lip (generate talking avatar videos) - Paper reading and explanation