A Practical Guide To Benchmarking AI and GPU Workloads in Kubernetes - Yuan Chen, NVIDIA & Chen Wang, IBM Research
Author: CNCF [Cloud Native Computing Foundation]
Uploaded: 2025-04-15
Effective benchmarking is essential for optimizing GPU resource efficiency and improving the performance of AI workloads. This talk provides a practical guide to setting up, configuring, and running a variety of GPU and AI workload benchmarks in Kubernetes.
The talk covers benchmarks for a range of use cases, including model serving, model training, and GPU stress testing, using tools such as NVIDIA Triton Inference Server; fmperf, an open-source tool for benchmarking LLM serving performance; MLPerf, an open benchmark suite for comparing the performance of machine learning systems; GPUStressTest; gpu-burn; and CUDA benchmarks. The talk will also introduce GPU monitoring and load generation tools.
Through step-by-step demonstrations, attendees will gain practical experience using benchmark tools. They will learn how to effectively run benchmarks on GPUs in Kubernetes and leverage existing tools to fine-tune and optimize GPU resource and workload management for improved performance and resource efficiency.
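As a minimal sketch of the kind of setup the talk demonstrates (not taken from the talk itself), a GPU stress test such as gpu-burn can be run in Kubernetes as a Pod that requests a GPU via the NVIDIA device plugin's `nvidia.com/gpu` resource. The image name and run duration below are illustrative assumptions:

```yaml
# Hypothetical Pod spec: run gpu-burn for 300 seconds on one GPU.
# Assumes the NVIDIA device plugin is installed on the cluster so that
# nvidia.com/gpu is a schedulable resource.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-burn-benchmark
spec:
  restartPolicy: Never
  containers:
  - name: gpu-burn
    image: your-registry/gpu-burn:latest  # placeholder image built from the gpu-burn sources
    args: ["300"]                         # gpu-burn takes a run duration in seconds
    resources:
      limits:
        nvidia.com/gpu: 1                 # request a single GPU
```

The Pod can be submitted with `kubectl apply -f gpu-burn-pod.yaml`, and benchmark output inspected afterwards with `kubectl logs gpu-burn-benchmark`.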