Distributed Deep Learning with Horovod on Ray - Travis Addair, Uber

Author: Anyscale

Uploaded: 2020-10-03

Views: 2898

Description:

Horovod is an open-source framework created to make distributed training of deep neural networks fast and easy for TensorFlow, PyTorch, and MXNet models. Horovod's API makes it easy to take an existing training script and scale it to run on hundreds of GPUs, but provisioning a Horovod job with hundreds of GPUs is often a challenge for users who lack access to HPC systems preconfigured with tools like MPI. The recently released Elastic Horovod API adds fault tolerance and auto-scaling, but it requires additional infrastructure scaffolding to configure. In this talk, you will learn how Horovod on Ray can be used to easily provision large distributed Horovod jobs and to take advantage of Ray's auto-scaling and fault tolerance with Elastic Horovod out of the box. With the Ray Tune integration, Horovod can also be used to accelerate time-constrained hyperparameter search jobs. Finally, we'll show how Ray and Horovod are helping to define the future of machine learning workflows at scale.
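
The abstract does not go into Horovod's internals, but the core operation behind its data-parallel training is averaging gradients across workers with an allreduce, and Horovod's original design is based on the ring-allreduce algorithm. The pure-Python sketch below simulates ring-allreduce in a single process, assuming each worker's gradient is a flat list of floats. The function `ring_allreduce_mean` is hypothetical and is not part of Horovod's API; real Horovod jobs perform this exchange over network backends such as MPI, Gloo, or NCCL.

```python
from typing import List


def ring_allreduce_mean(grads: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: simulate ring-allreduce so every worker ends with the element-wise mean."""
    n = len(grads)
    size = len(grads[0])
    buffers = [list(g) for g in grads]  # each worker's local copy of its gradient
    bounds = [c * size // n for c in range(n + 1)]  # chunk c covers bounds[c]:bounds[c + 1]

    def span(c: int) -> range:
        c %= n  # wrap negative or overflowing chunk ids around the ring
        return range(bounds[c], bounds[c + 1])

    # Phase 1 (reduce-scatter): after n - 1 steps, worker i holds the full sum of chunk (i + 1) % n.
    for step in range(n - 1):
        # Snapshot every outgoing chunk first so all sends in a step happen "simultaneously".
        sends = [(i, i - step, [buffers[i][k] for k in span(i - step)]) for i in range(n)]
        for i, c, values in sends:
            dst = (i + 1) % n  # each worker only talks to its right-hand neighbor
            for k, v in zip(span(c), values):
                buffers[dst][k] += v

    # Phase 2 (allgather): circulate the fully reduced chunks until every worker has all of them.
    for step in range(n - 1):
        sends = [(i, i + 1 - step, [buffers[i][k] for k in span(i + 1 - step)]) for i in range(n)]
        for i, c, values in sends:
            dst = (i + 1) % n
            for k, v in zip(span(c), values):
                buffers[dst][k] = v  # overwrite with the already reduced values

    # Divide the summed gradients by the worker count to get the average.
    return [[x / n for x in buf] for buf in buffers]


if __name__ == "__main__":
    worker_grads = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], [9.0, 10.0, 11.0, 12.0]]
    for rank, avg in enumerate(ring_allreduce_mean(worker_grads)):
        print(rank, avg)  # every rank prints [5.0, 6.0, 7.0, 8.0]
```

Each worker sends 2(n - 1) chunks of roughly 1/n of the gradient size, so the per-worker traffic stays close to twice the gradient size no matter how many workers join; that property is why ring-based allreduce scales well to many GPUs.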

Similar videos

Keynote: Anyscale Product Demo - Edward Oakes, Software Engineer, Anyscale

A friendly introduction to distributed training (ML Tech Talks)

Deep Learning at Scale with Horovod feat. Travis Addair | Stanford MLSys Seminar Episode 10

Getting Started with Ray Clusters

NVAITC Webinar: Multi-GPU Training using Horovod

Ray Train: A Production-Ready Library for Distributed Deep Learning

How Runhouse Orchestrates Multi-Cluster Ray Workloads | Ray Summit 2025

DeepSpeed: All the tricks to scale to gigantic models

Introduction to Distributed Deep Learning

A new type of artificial intelligence is emerging: is it better than LLMs?

Prompt Learning: A Reinforcement Learning-Inspired Approach to AI Optimization | Ray Summit 2025

Training on multiple GPUs and multi-node training with PyTorch DistributedDataParallel

Frameworks & Distributed Training (5) - Infrastructure & Tooling - Full Stack Deep Learning

A Quick Overview of the Ray Libraries Built on Ray Core | Ray Summit Expo

Ray: Faster Python through parallel and distributed computing

Brendan Burns: Lessons from Building Kubernetes and the Future of AI Infrastructure

Introduction to Distributed ML Workloads with Ray on Kubernetes - Abdel Sghiouar, Google Cloud

Scaling Deep Learning on Databricks

The Energy Storage Problem No One Explained Properly

Byzantine Fault-Tolerant Machine Learning (ft. El Mahdi El Mhamdi)
