Running FULL Llama 4 Locally (Test & Install!)
Author: Bijan Bowen
Uploaded: April 8, 2025
Views: 4,258
Timestamps
00:00 - Intro
00:30 - Running Locally
02:43 - Python Game Test
04:51 - Improved Game Test
06:23 - Refusal Test
07:58 - Roleplay Test
11:20 - Specific Parameters
12:14 - Closing Thoughts
In this video, we test the newly released Llama 4 Scout model running locally in LM Studio, using the Q4_K_M quantized GGUF builds by Bartowski. With local inference support now available, we're able to run this large model on a setup with 2×3090 Ti GPUs and 128GB of system RAM.
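For reference, once a model is loaded, LM Studio can serve it through an OpenAI-compatible local endpoint (http://localhost:1234/v1 by default), so the tests shown in the video could also be driven from a script. Below is a minimal sketch of querying the loaded model that way; the model identifier is an assumption and should be replaced with whatever name LM Studio shows for the Bartowski GGUF build you loaded.

# Minimal sketch: query a model served by LM Studio's local
# OpenAI-compatible endpoint. Assumes the default port (1234);
# the api_key can be any placeholder string for a local server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="llama-4-scout-17b-16e-instruct",  # assumed identifier; match your loaded model
    messages=[
        {"role": "user", "content": "Write a short synthwave-style Python game using pygame."}
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)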
We begin with a walkthrough of the local setup and configuration, then move into a series of practical tests to see how the model performs outside the cloud. We start by having the model generate a Python-based synthwave game, then push it to improve its own output, with surprisingly strong results. From there, we test its refusal handling, followed by a roleplay prompt to gauge how well it maintains character and conversational context.
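The improved-game test follows a simple iterative pattern: ask for a first version, then feed the model's own answer back with a request to do better. A rough sketch of that loop is below; the prompts are illustrative, not the exact ones used in the video, and the model identifier is again an assumption.

# Rough sketch of the "improve your own output" prompting loop.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
history = [{"role": "user", "content": "Write a synthwave-themed Python game."}]

for _ in range(2):  # initial generation, then one improvement pass
    reply = client.chat.completions.create(
        model="llama-4-scout-17b-16e-instruct",  # assumed identifier
        messages=history,
    ).choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    history.append({"role": "user",
                    "content": "Improve on your previous version: better visuals, smoother gameplay."})

print(history[-2]["content"])  # the latest generated version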
Overall, Llama 4 Scout felt snappier and more enjoyable to use locally, even compared to previous online tests. Placebo or not, the performance and personality of the model in a quantized, local setup were genuinely impressive, and fun to explore.
