Umar Jamil
I'm a Machine Learning Engineer from Milan, Italy, teaching complex deep learning and machine learning concepts to my cat, 奥利奥.
I also speak Chinese (我也会中文).
Titans: Learning to Memorize at Test Time
Paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Flash Attention derived and coded from first principles with Triton (Python)
Coding a Multimodal (Vision) Language Model from scratch in PyTorch with full explanation
Kolmogorov-Arnold Networks: MLP vs KAN, Math, B-Splines, Universal Approximation Theorem
Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math
Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.
Mamba and S4 Explained: Architecture, Parallel Scan, Kernel Fusion, Recurrent, Convolution, Math
Mistral / Mixtral Explained: Sliding Window Attention, Sparse Mixture of Experts, Rolling Buffer
Distributed Training with PyTorch: complete tutorial with cloud infrastructure and code
Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training
Retrieval Augmented Generation (RAG) Explained: Embedding, Sentence BERT, Vector Database (HNSW)
BERT explained: Training, Inference, BERT vs GPT/LLamA, Fine-tuning, [CLS] token
Coding Stable Diffusion from scratch in PyTorch
Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm
LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU
Segment Anything - Model explanation with code
LoRA: Low-Rank Adaptation of Large Language Models - Explained visually + PyTorch code from scratch
LongNet: Scaling Transformers to 1,000,000,000 tokens: Python Code + Explanation
How diffusion models work - explanation and code!
Variational Autoencoder - Model, ELBO, loss function and maths explained easily!
Attention Is All You Need (Transformer) - Model explanation (including math), Inference and...
Coding a Transformer from scratch on PyTorch, with full explanation, training and inference.
CLIP - Paper explanation (training and inference)
Wav2Lip (generate talking avatar videos) - Paper reading and explanation