A Practical Guide To Benchmarking AI and GPU Workloads in Kubernetes - Yuan Chen & Chen Wang

Author: CNCF [Cloud Native Computing Foundation]

Uploaded: 2025-04-15

Views: 555

Description:

A Practical Guide To Benchmarking AI and GPU Workloads in Kubernetes - Yuan Chen, NVIDIA & Chen Wang, IBM Research

Effective benchmarking is required to optimize GPU resource efficiency and enhance performance for AI workloads. This talk provides a practical guide on setting up, configuring, and running various GPU and AI workload benchmarks in Kubernetes.

The talk covers benchmarks for a range of use cases, including model serving, model training, and GPU stress testing. The tools used are NVIDIA Triton Inference Server; fmperf, an open-source tool for benchmarking LLM serving performance; MLPerf, an open benchmark suite for comparing the performance of machine learning systems; GPUStressTest; gpu-burn; and cuda benchmark. The talk also introduces GPU monitoring and load generation tools.

Through step-by-step demonstrations, attendees will gain practical experience using benchmark tools. They will learn how to effectively run benchmarks on GPUs in Kubernetes and leverage existing tools to fine-tune and optimize GPU resource and workload management for improved performance and resource efficiency.

Related videos

Enabling Fault Tolerance for GPU Accelerated AI Workloads in Kubernetes - A. Singh & A. Paithankar

Performance loop—A practical guide to profiling and benchmarking - Daniel Marbach - NDC London 2025

openSUSE MicroOS 2.0 Review: The Future of Immutable Linux, Containers & Kubernetes Workloads

Explain How Kubernetes Works With GPU Like I’m 5 - Carlos Santana, AWS

Nvidia GTC 2025 Recap + PyTorch Model Tuning +AI Systems Performance Engineering Tips

Why RAG Fails: How CLaRa Fixes Its Main Flaw

Kubernetes in Plain Language, with a Clear Example

FLOPS: The New Benchmark For AI Performance (Explained Simply)

AI in Kubernetes: How to Get Started?

#AskRaghav | How To Decide Benchmark in Performance Testing

Learn Microsoft Active Directory (ADDS) in 30 Minutes

Can AI Become Conscious? - Semikhatov, Anokhin

BoF | Fueling Cloud Native: The Data We Have, the Data We Need - Hilary Carter, SVP Research

Accelerating AI: Running Meta Llama on DigitalOcean Kubernetes (DOKS) with NVIDIA NIM

GPUs in Kubernetes for AI Workloads

Scaling Inference Deployments with NVIDIA Triton Inference Server and Ray Serve | Ray Summit 2024

Build an Ai Server for less than $1k and Run LLM's Locally FREE

Scaling AI Workloads with Kubernetes: Sharing Graphics...

Inside AI Infrastructure: How Data Flows from Archive to Accelerator

What Was the Boeing 747's HUMP REALLY For?