SGLang: Open-Source Model Performance Optimization

Author: AMD Developer Central

Uploaded: 2025-11-10

Views: 297

Description:

This talk introduces SGLang, a high-performance serving framework for large language models (LLMs) and vision-language models (VLMs), and reviews key advancements achieved in 2025. Yineng Zhang covers optimizations for DeepSeek V3 that improve throughput and latency, large-scale production deployments, and the integration of reinforcement learning to adapt serving policies under real workloads. The session details training acceleration via speculative decoding, hierarchical KV caching for memory efficiency at scale, and deterministic inference for reproducibility and compliance. He also highlights day-0 support for new model families, robust model deployment orchestration, and distributed inference on AMD platforms to unlock cost-effective performance.

Find the resources you need to develop using AMD products: https://www.amd.com/en/developer.html

***

© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.
