Dynamic Scheduling for Large Language Model Serving | Ray Summit 2024
Author: Anyscale
Uploaded: Oct 21, 2024
Views: 468
Hanyu Zhao from Alibaba Group presents Llumnix, a dynamic request scheduling system for large language models, at Ray Summit 2024. Built on vLLM and Ray, Llumnix addresses key challenges in LLM serving through innovative runtime rescheduling and KV cache migration across instances.
Zhao discusses how Llumnix reduces prefill latencies through cross-instance defragmentation and minimizes tail decoding latencies by balancing loads and reducing preemptions. The talk covers the research journey behind Llumnix, from its origins to its publication at OSDI '24, and its subsequent deployment and evolution at Alibaba.
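The core idea of load-balancing via live KV cache migration can be illustrated with a toy scheduler. The sketch below is an assumption-laden simplification, not Llumnix's actual implementation: instance names, the block-based load unit, and the `rebalance` heuristic are all hypothetical, and real migration must transfer KV cache tensors across GPUs rather than update a dict.

```python
# Toy sketch (NOT the real Llumnix scheduler): greedily migrate requests'
# KV caches from the most loaded instance to the least loaded one until
# the load gap falls under a threshold, reducing preemption pressure.
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    # Per-request KV cache footprint in GPU blocks (hypothetical unit).
    requests: dict = field(default_factory=dict)

    @property
    def load(self) -> int:
        return sum(self.requests.values())


def rebalance(instances: list[Instance], threshold: int = 2):
    """Return a list of (request_id, src, dst) migration decisions."""
    migrations = []
    while True:
        src = max(instances, key=lambda i: i.load)
        dst = min(instances, key=lambda i: i.load)
        gap = src.load - dst.load
        if gap <= threshold or not src.requests:
            break
        # Migrate the smallest request first: cheapest KV cache to move.
        req_id = min(src.requests, key=src.requests.get)
        blocks = src.requests[req_id]
        if 2 * blocks > gap:  # moving it would not shrink the imbalance
            break
        del src.requests[req_id]
        dst.requests[req_id] = blocks
        migrations.append((req_id, src.name, dst.name))
    return migrations
```

For example, with instance A holding requests of 4, 3, and 3 blocks and instance B holding one of 1 block, a threshold of 2 migrates a single 3-block request from A to B and then stops, since any further move would overshoot.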
The presentation provides insights into the current state of Llumnix and outlines future development plans. Zhao also highlights the open-source nature of the project, available on GitHub, encouraging community engagement and collaboration.
This session offers valuable information for those interested in optimizing LLM serving, particularly in large-scale, high-performance environments. It demonstrates practical applications of Ray and vLLM in addressing complex scheduling challenges in AI infrastructure.
--
Interested in more?
Watch the full Day 1 Keynote: • Ray Summit 2024 Keynote Day 1 | Where Buil...
Watch the full Day 2 Keynote: • Ray Summit 2024 Keynote Day 2 | Where Buil...
--
🔗 Connect with us:
Subscribe to our YouTube channel: / @anyscale
Twitter: https://x.com/anyscalecompute
LinkedIn: / joinanyscale
Website: https://www.anyscale.com
