Unlocking the Full Potential of GPUs for AI Workloads on Kubernetes - Kevin Klues, NVIDIA

Author: CNCF [Cloud Native Computing Foundation]

Uploaded: 2023-12-04

Views: 8832

Description:


Dynamic Resource Allocation (DRA) is a new Kubernetes feature that puts resource scheduling in the hands of third-party developers. It moves away from the limited "countable" interface for requesting access to resources (e.g. "nvidia.com/gpu: 2"), providing an API more akin to that of persistent volumes. In the context of GPUs, this unlocks a host of new features without the need for awkward solutions shoehorned on top of the existing device plugin API. These features include:

* Controlled GPU sharing (both within a pod and across pods)
* Multiple GPU models per node (e.g. T4 and A100)
* Specifying arbitrary constraints for a GPU (min/max memory, device model, etc.)
* Dynamic allocation of Multi-Instance GPUs (MIG)
* ... the list goes on ...

In this talk, you will learn about the DRA resource driver we have built for GPUs. We walk through each of the features it provides, and conclude with a series of demos showing you how you can get started using it today.
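To illustrate why a constraint-based claim is more expressive than a count such as "nvidia.com/gpu: 2", here is a minimal Python sketch of matching a claim against the GPUs on one node. It is a toy model only, not the DRA API or the NVIDIA resource driver: the names `Gpu`, `GpuClaim`, `matches`, and `allocate`, and the inventory values, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gpu:
    """One physical GPU on a node (hypothetical inventory record)."""
    name: str
    model: str
    memory_gib: int


@dataclass(frozen=True)
class GpuClaim:
    """Constraints a workload places on the GPU it wants (hypothetical)."""
    model: Optional[str] = None          # None means "any model"
    min_memory_gib: int = 0
    max_memory_gib: Optional[int] = None  # None means "no upper bound"


def matches(gpu: Gpu, claim: GpuClaim) -> bool:
    """Return True if the GPU satisfies every constraint in the claim."""
    if claim.model is not None and gpu.model != claim.model:
        return False
    if gpu.memory_gib < claim.min_memory_gib:
        return False
    if claim.max_memory_gib is not None and gpu.memory_gib > claim.max_memory_gib:
        return False
    return True


def allocate(claim: GpuClaim, free: List[Gpu]) -> Optional[Gpu]:
    """Pick the first free GPU that satisfies the claim, or None."""
    for gpu in free:
        if matches(gpu, claim):
            return gpu
    return None


# Example node with two different GPU models (illustrative values).
NODE_GPUS = [
    Gpu(name="gpu-0", model="T4", memory_gib=16),
    Gpu(name="gpu-1", model="A100", memory_gib=40),
]

if __name__ == "__main__":
    print(allocate(GpuClaim(min_memory_gib=32), NODE_GPUS))
    print(allocate(GpuClaim(model="T4"), NODE_GPUS))
```

A purely countable request cannot say "at least 32 GiB" or "a T4 specifically"; it can only ask for some number of interchangeable devices. In this sketch those preferences become fields on the claim that the allocator checks.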

