Local AI Speed Test: Qwen3, Llama, GPT-OSS, and DeepSeek Models Tested
Author: InfraSec
Uploaded: 2026-01-05
Views: 34
Today, I'm comparing the speeds of different local LLMs.
I'm using Ollama with OpenWebUI as the interface on my X99 server.
ATTENTION: Nothing the LLMs said in the video was fact-checked; this is purely a speed comparison, and accuracy wasn't scored.
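For anyone who wants to reproduce the numbers without eyeballing the UI, here is a minimal Python sketch of how tokens/sec can be measured against Ollama's REST API. The model tag, prompt, and timeout are placeholder assumptions, not the exact ones used in the video.

```python
# Minimal sketch: measure generation speed via Ollama's REST API.
# Assumes Ollama is running locally on its default port (11434).
import requests

def tokens_per_second(model: str, prompt: str) -> float:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    data = resp.json()
    # eval_count = generated tokens, eval_duration = generation time in ns
    return data["eval_count"] / (data["eval_duration"] / 1e9)

# Placeholder model tag and prompt, not the ones from the video:
print(f"{tokens_per_second('qwen3:4b', 'Explain PCIe lanes briefly.'):.1f} tok/s")
```

Ollama includes eval_count and eval_duration in every non-streaming response, which is all a raw speed comparison needs.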
The specs for my server are:
CPU: Intel Xeon E5-2680 v4
RAM: 32 GB DDR4 2133 MHz
GPU: Nvidia Tesla P100 PCIe (16 GB HBM2), power limited to 125 W for cooling purposes (a 10-20% performance loss in these tests; see the sketch after this list)
Motherboard: Chinese X99 from AliExpress (Mougol X99 bundle with CPU + RAM)
Storage: 4x 1 TB HDDs, 1x 256 GB Samsung SATA SSD, 1x 512 GB Patriot NVMe SSD (PCIe 3.0 x4)
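Since the 125 W cap on the P100 directly affects these results, here is a minimal sketch of how such a limit is typically set with nvidia-smi (wrapped in Python to match the other snippet). It assumes root privileges and that 125 W is within the card's allowed power range.

```python
# Sketch: cap the GPU power limit at 125 W, as mentioned in the specs above.
# Requires root; nvidia-smi must be on PATH. The limit resets on reboot
# unless persistence mode stays enabled.
import subprocess

subprocess.run(["nvidia-smi", "-pm", "1"], check=True)   # enable persistence mode
subprocess.run(["nvidia-smi", "-pl", "125"], check=True) # set power limit to 125 W
```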
Timestamps:
0:00 GPT-OSS
2:20 Llama 3.1 8B Q8
4:00 Llama 2 Uncensored 7B Q4
5:00 Llama 3.2 3B Q5
6:00 Qwen3 1.7B Q8
7:00 Qwen3 4B Q4
8:20 Qwen3 4B FP16
10:00 Qwen3 8B Q4
11:30 Qwen3 8B FP16
13:45 Qwen3 14B Q4
16:35 Qwen3 14B Q8
19:30 Qwen3-Coder 30B Q4
24:00 DeepSeek-R1 1.5B Q4
24:40 DeepSeek-R1 7B Q4
26:05 DeepSeek-R1 14B Q4
27:45 DeepSeek-R1 32B Q4
29:15 DeepSeek-Coder 33B Q4
31:35 DeepSeek-Coder-V2 16B Q4
33:15 Summary
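To run the whole comparison above hands-off, the helper from the first sketch could be looped over the corresponding Ollama tags. The tag names below are assumptions (quantized variants usually carry suffixes like -q8_0), so check them against `ollama list` first.

```python
# Sketch: benchmark several of the models from the timestamps in one pass.
# Assumes tokens_per_second() from the first sketch is in scope.
# Tag names are assumptions; verify them with `ollama list`.
MODELS = [
    "gpt-oss:20b",
    "llama3.1:8b",
    "qwen3:4b",
    "deepseek-r1:7b",
]

for tag in MODELS:
    rate = tokens_per_second(tag, "Summarize what an X99 platform is.")
    print(f"{tag:>16}  {rate:6.1f} tok/s")
```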