SGLang: Open-Source Model Performance Optimization
Author: AMD Developer Central
Uploaded: 2025-11-10
This talk introduces SGLang, a high-performance serving framework for large language models (LLMs) and vision-language models (VLMs), and reviews key advancements achieved in 2025. Yineng Zhang covers optimizations for DeepSeek V3 that improve throughput and latency, large-scale production deployments, and the integration of reinforcement learning to adapt serving policies under real workloads. The session details training acceleration via speculative decoding, hierarchical KV caching for memory efficiency at scale, and deterministic inference for reproducibility and compliance. He also highlights day-0 support for new model families, robust model deployment orchestration, and distributed inference on AMD platforms to unlock cost-effective performance.
Find the resources you need to develop using AMD products: https://www.amd.com/en/developer.html
***
© 2025 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, EPYC, ROCm, and AMD Instinct and combinations thereof are trademarks of Advanced Micro Devices, Inc.