Dynamic Scheduling for Large Language Model Serving | Ray Summit 2024

Author: Anyscale

Uploaded: Oct 21, 2024

Views: 468

Description:

Hanyu Zhao from Alibaba Group presents Llumnix, a dynamic request scheduling system for large language model (LLM) serving, at Ray Summit 2024. Built on vLLM and Ray, Llumnix addresses key challenges in LLM serving by rescheduling requests at runtime and migrating their KV caches across instances.

Zhao discusses how Llumnix reduces prefill latencies through cross-instance defragmentation and minimizes tail decoding latencies by balancing loads and reducing preemptions. The talk covers the research journey behind Llumnix, from its origins to its publication at OSDI '24, and its subsequent deployment and evolution at Alibaba.
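The idea of cross-instance defragmentation can be illustrated with a toy sketch. This is not Llumnix's actual algorithm or API. The `Instance` class, the block counts, and the `place_with_defrag` policy are all hypothetical, and "migration" here only moves a number between dictionaries. The sketch shows a case where enough KV-cache space exists across the cluster in total, but no single instance can fit a new request until one running request is moved elsewhere.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Instance:
    """Toy model of one serving instance with a fixed KV-cache budget (in blocks)."""
    name: str
    capacity: int
    requests: dict[str, int] = field(default_factory=dict)  # request id -> blocks held

    @property
    def free(self) -> int:
        return self.capacity - sum(self.requests.values())


def migrate(src: Instance, dst: Instance, req_id: str) -> None:
    """Move one request's (toy) KV cache from src to dst."""
    if dst.free < src.requests[req_id]:
        raise ValueError("destination lacks free blocks")
    dst.requests[req_id] = src.requests.pop(req_id)


def place_with_defrag(instances: list[Instance], req_id: str, need: int) -> str | None:
    """Place a new request, migrating at most one running request if that frees room.

    Returns the name of the chosen instance, or None if the request must queue.
    """
    # 1. Direct fit: pick the instance with the most free blocks.
    best = max(instances, key=lambda i: i.free)
    if best.free >= need:
        best.requests[req_id] = need
        return best.name
    # 2. Hypothetical defragmentation step: move one request off a target
    #    instance so that the target can hold the new request.
    for target in sorted(instances, key=lambda i: i.free, reverse=True):
        for rid, blocks in sorted(target.requests.items(), key=lambda kv: kv[1]):
            if target.free + blocks < need:
                continue  # moving this request would not free enough space
            for other in instances:
                if other is not target and other.free >= blocks:
                    migrate(target, other, rid)
                    target.requests[req_id] = need
                    return target.name
    return None


a = Instance("A", 10, {"a1": 6})
b = Instance("B", 10, {"b1": 3, "b2": 3})
# Each instance has 4 free blocks, so a 7-block request fits nowhere directly.
print(place_with_defrag([a, b], "new", 7))  # B
print(a.requests, b.requests)
```

In this toy run, request `b1` moves from B to A, which leaves B with 7 free blocks for the new request. A real system also has to account for the cost of migration and for requests that keep growing while they decode, both of which this sketch ignores.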

The presentation provides insights into the current state of Llumnix and outlines future development plans. Zhao also highlights the open-source nature of the project, available on GitHub, encouraging community engagement and collaboration.

The session is aimed at those optimizing LLM serving, particularly in large-scale, high-performance environments, and shows how Ray and vLLM are applied to scheduling problems in AI infrastructure.

