Tips and tricks for distributed large model training

Author: TensorFlow

Uploaded: 2022-05-12

Views: 7966

Description:

Discover several distribution strategies and related concepts for data-parallel and model-parallel training. Walk through an example of training a 39-billion-parameter language model on TPUs, and conclude with the challenges and best practices of orchestrating large-scale language model training.

Resource:
TensorFlow website → https://goo.gle/3KejoUZ

Speakers: Nikita Namjoshi, Vaibhav Singh

Watch more:
All Google I/O 2022 Sessions → https://goo.gle/IO22_AllSessions
ML/AI at I/O 2022 playlist → https://goo.gle/IO22_ML-AI
All Google I/O 2022 technical sessions → https://goo.gle/IO22_Sessions

Subscribe to TensorFlow → https://goo.gle/TensorFlow

#GoogleIO

