How massive Cerebras chips rival Nvidia GPUs for AI

Author: Dr Waku

Uploaded: 2024-12-25

Views: 34198

Description:

I interviewed Joel Hestness, a key engineer at Cerebras. Like Groq and Nvidia, Cerebras produces AI accelerators, but Cerebras focuses on producing the largest chips possible. Each chip uses an entire silicon wafer and contains a million cores (in the same way that Nvidia GPUs contain a few tens of thousands of CUDA cores).

We discuss in detail how the memory architecture works for such a unique system, cooling, compiler architecture, logical mapping of cores, etc. One of the most interesting aspects is that the hardware can handle arbitrary failures of specific cores, which is necessary because almost any wafer would have some faults in it that would cause cores to stop working.
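To make the idea of a logical array that maps out broken cores more concrete, here is a minimal sketch in Python. It is an illustration only, not Cerebras's actual scheme: the function name `build_logical_map`, the per-row spare columns, and the left-to-right skipping rule are all assumptions made for this example.

```python
def build_logical_map(physical_rows, physical_cols, logical_cols, defective):
    """Map each logical (row, col) to a working physical (row, col).

    Hypothetical scheme (an assumption, not the vendor's design): every
    physical row has spare columns, and defective cores are skipped from
    left to right within that row.
    """
    mapping = {}
    for r in range(physical_rows):
        # Physical columns in this row that passed testing.
        working = [c for c in range(physical_cols) if (r, c) not in defective]
        if len(working) < logical_cols:
            raise ValueError(f"row {r} has too few working cores")
        # Assign logical columns to the first working physical columns.
        for logical_c, physical_c in enumerate(working[:logical_cols]):
            mapping[(r, logical_c)] = (r, physical_c)
    return mapping


# Usage: a 2 x 5 physical grid exposing a 2 x 4 logical grid,
# with one defective core in each row.
mapping = build_logical_map(2, 5, 4, defective={(0, 1), (1, 3)})
```

Software that targets the logical grid never sees the defective cores, which is one way a wafer with scattered faults could still present a uniform array.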

Cerebras is price-competitive with Nvidia GPUs but can perform inference many times faster on just a single node. For training, many nodes can be networked together. They demonstrate support for multi-trillion-parameter models and have out-of-the-box support for open-source models like Llama 3.3. Very interesting hardware, and I hope the company sees success in the market.

#ai #hardware #accelerators

Cerebras
https://cerebras.ai/

Announcing Cerebras Inference Research Grant
https://cerebras.ai/blog/grantfrp

Joel Hestness
/ joelhestness

0:00 Intro
0:15 Contents
0:27 Part 1: Introduction
0:43 Experience at Baidu research lab
1:57 Exposure to hardware companies like Cerebras
2:33 Focus on pretraining at Cerebras
3:27 Overview of Cerebras, using a giant wafer to accelerate AI
4:24 Very large-scale trillion-parameter models
5:40 How many GPUs is this equivalent to?
6:19 How much memory is in one Cerebras chip?
7:32 Activations (in SRAM) vs weights (off chip)
8:18 New inference solution, 4x faster than anything else
9:13 Enough memory for a 24 trillion parameter model??
10:26 Cerebras more flexible than other hardware approaches
11:42 High performance computing stack
13:03 Part 2: The hardware
13:15 How large are these chips anyway?
14:02 One million cores
14:38 Logical array of cores
15:23 Mapping out cores that aren't working
16:10 IBM Cell processor comparison
16:57 Dealing with defects in the wafer for 100% yield
18:11 It's almost like having a million separate chips
18:36 Stress testing the chips to find defects
19:20 Types of issues: stalls, bit flips, etc
19:51 Ryzen segfault bug comparison
20:34 So many ways to fail
21:35 Are these chips future proof against failures?
23:57 How do you keep these chips cool?
25:01 Matching the power density of Nvidia GPUs
25:39 Blackwell GPU power consumption halves number of nodes
26:47 Moving complexity out of hardware into software
27:54 Part 3: Accessing the hardware
28:07 Four different ways for customers to access
29:49 Inference API, support for Llama 3.3
30:40 Geographic distribution of Cerebras clusters
31:46 PyTorch compatibility and compiler
32:36 No custom code in PyTorch needed
33:41 Details of compiler implementation
34:39 Testing 1400 Hugging Face models
35:47 What is the network between nodes?
36:08 Three different kinds of nodes inside Cerebras systems
36:54 How a model fits into the architecture
37:39 Whole distributed system, codesign of hardware and ML
38:38 Other supercomputing workloads
39:33 Conclusion
39:52 Cerebras has grants available
40:12 Cerebras good at inference-time compute like o1
40:57 Outro
