VLLM K/V Caching With Ceph - Kyle Bader, IBM & Tushar Gohad, Intel
Author: Ceph
Uploaded: 2025-11-19
Views: 104
Generative AI and LLMs are all the rage right now, and many people are asking where storage fits in and how it can help accelerate AI workflows or reduce their cost. In this session we will dive into a prototype Ceph caching plugin for vLLM that offloads attention states to Ceph, lowering the cost of inference by letting clustered NVMe complement GPU memory. We will describe how K/V caching fits into inference workloads, cover the caching plugin's implementation details and how we think it should evolve, and share some preliminary performance data.
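The core idea in the abstract is that attention (K/V) states computed during prefill can be stored outside GPU memory and fetched back on a prefix hit, trading network/NVMe reads for recomputation. A minimal sketch of that pattern, using a plain in-memory dict as a stand-in for the Ceph-backed tier and Python lists as stand-ins for KV tensors (all class and method names here are hypothetical illustrations, not the actual vLLM plugin API):

```python
import hashlib

class KVOffloadStore:
    """Content-addressed store for K/V blocks.

    A real plugin would write blocks to Ceph (clustered NVMe) instead of
    this in-process dict; the keying scheme is the illustrative part.
    """

    def __init__(self):
        self.backend = {}  # stand-in for the Ceph-backed object store

    @staticmethod
    def block_key(token_ids):
        # Address each block by its token prefix, so identical prefixes
        # across requests resolve to the same cached K/V entry.
        return hashlib.sha256(repr(token_ids).encode("utf-8")).hexdigest()

    def put(self, token_ids, kv_block):
        self.backend[self.block_key(token_ids)] = kv_block

    def get(self, token_ids):
        # A hit lets the engine skip recomputing attention states for
        # this prefix; a miss falls back to normal prefill.
        return self.backend.get(self.block_key(token_ids))

store = KVOffloadStore()
store.put([1, 2, 3], {"k": [0.1, 0.2], "v": [0.3, 0.4]})
hit = store.get([1, 2, 3])    # same prefix → cached K/V returned
miss = store.get([9, 9])      # unseen prefix → None, recompute
print(hit is not None, miss is None)  # → True True
```

The content-addressing step is what makes the cache shareable across requests and across inference nodes: any node that hashes the same prefix can fetch the same object from the cluster.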