vLLM for Intel XPU on Dual Intel Arc B580 - Setup and Demo for VERY FAST LLM Performance!
Author: YourAvgDev
Uploaded: 2025-12-28
Views: 3
Write up and instructions here: https://www.roger.lol/blog/accessible...
Let's go through the process of setting up vLLM for XPU on our dual Intel Arc B580 system. We'll compare token generation speed with gpt-oss-20b at the full 128K context window against llama.cpp. Spoiler alert: vLLM for XPU is FAST. VERY FAST.
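For reference, here is a minimal Python sketch of how a vLLM generation-speed check like this could look. It is not the exact script from the video or write-up: the model id (`openai/gpt-oss-20b`), the 128K context length, and the tensor-parallel setting for the two B580 cards are assumptions; see the linked write-up for the real configuration.

```python
# Minimal sketch (assumed settings, not the video's exact script):
# run gpt-oss-20b through vLLM and time token generation.
import time

from vllm import LLM, SamplingParams

# Assumed model repo and engine arguments; vLLM uses its XPU backend
# when built/installed for Intel GPUs.
llm = LLM(
    model="openai/gpt-oss-20b",   # assumed Hugging Face repo id
    max_model_len=131072,         # full 128K context window
    tensor_parallel_size=2,       # split across the two Arc B580 cards
)

params = SamplingParams(max_tokens=256, temperature=0.8)

start = time.perf_counter()
outputs = llm.generate(["Explain what vLLM's XPU backend does."], params)
elapsed = time.perf_counter() - start

completion = outputs[0].outputs[0]
tokens = len(completion.token_ids)
print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.1f} tok/s")
```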
You also get to see some fun demos of what we can build with this kind of model and token generation speed on these GPUs. :)