The State of vLLM | Ray Summit 2024

Author: Anyscale

Uploaded: 2024-10-18

Views: 4765

Description:

At Ray Summit 2024, Kuntai Du from the University of Chicago and Zhuohan Li from UC Berkeley present a comprehensive update on vLLM, the open-source LLM inference and serving engine. Their talk covers the significant developments in vLLM over the past year, focusing on its growing adoption, new features, and performance improvements.

The speakers discuss the project's community growth and governance changes, providing insight into vLLM's evolving ecosystem. They conclude by outlining the roadmap for upcoming releases, offering attendees a glimpse into the future direction of this fast-growing LLM serving solution.

This presentation is particularly valuable for those interested in the latest advancements in efficient LLM deployment and serving technologies.

--

Interested in more?
Watch the full Day 1 Keynote: Ray Summit 2024 Keynote Day 1 | Where Buil...
Watch the full Day 2 Keynote: Ray Summit 2024 Keynote Day 2 | Where Buil...

--

🔗 Connect with us:
Subscribe to our YouTube channel: @anyscale
Twitter: https://x.com/anyscalecompute
LinkedIn: joinanyscale
Website: https://www.anyscale.com
