Advanced Model Serving Techniques with Ray on Kubernetes - Andrew Sy Kim & Kai-Hsun Chen

Author: CNCF [Cloud Native Computing Foundation]

Uploaded: 2024-11-14

Views: 1975

Description:

Advanced Model Serving Techniques with Ray on Kubernetes - Andrew Sy Kim, Google & Kai-Hsun Chen, Anyscale

With the proliferation of Large Language Models, Ray, a distributed open-source framework for scaling AI/ML, has developed many advanced techniques for serving LLMs in a distributed environment. In this session, Andrew Sy Kim and Kai-Hsun Chen will provide an in-depth exploration of advanced model serving techniques using Ray, covering model composition, model multiplexing and fractional GPU scheduling. Additionally, they will discuss ongoing initiatives in Ray focused on GPU-native communication, which, when combined with Kubernetes DRA, offers a scalable approach to tensor parallelism, a technique used to fit large models across multiple GPUs. Finally, they will present a live demo showing how KubeRay enables the practical application of these techniques to real-world LLM deployments on Kubernetes. The demo will showcase Ray’s powerful capabilities to scale, compose and orchestrate popular open-source models across a diverse set of hardware accelerators and failure domains.

Related videos

Ray + vLLM Efficient Multi Node Orchestration for Sparse MoE Model Serving | Ray Summit 2025

The Hard Truth About GitOps and Database Rollbacks - Rotem Tamir, Ariga

Introduction to Distributed ML Workloads with Ray on Kubernetes - Mofi Rahman & Abdel Sghiouar

Kubernetes: In Plain Language with a Clear Example

LLM Fine-Tuning or TRAINING a Small Model? We Tested It!

AI in Kubernetes: How to Get Started?

vLLM on Kubernetes in Production

Fast LLM Serving with vLLM and PagedAttention

Understanding the LLM Inference Workload - Mark Moyou, NVIDIA

No Jsonnet? No Problem! Prometheus in Perses, Powered by Go - Saswata Mukherjee, Hélia Barroso

HOW DOES TCP/IP WORK?

KubeRay: A Ray cluster management solution on Kubernetes

What’s Going on in the Containerd Neighborhood? - P. Estes, S. Karp, A. Suda, M. Brown, K. Ashok

Enabling Cost-Efficient LLM Serving with Ray Serve

End-to-End MLOps with MLflow and Kubeflow - Nick Chase, CloudGeometry

Efficient LLM Deployment: A Unified Approach with Ray, VLLM, and Kubernetes - Lily (Xiaoxuan) Liu

The Best Kafka Guide for Beginners in 1 Hour

Introduction to Model Deployment with Ray Serve

Scaling Inference Deployments with NVIDIA Triton Inference Server and Ray Serve | Ray Summit 2024

From Manual to Managed: Prometheus Agent Deployment at Scale - Mihail Mihaylov
