Speeding Up AI Quantization Techniques for Models and Vector DBs
Author: Weaviate vector database
Uploaded: 2025-03-26
Views: 444
In this talk, Marcin Antas (/antasmarcin), a senior Core Engineer who has been at @Weaviate for over 4 years, breaks down the essential techniques for optimizing AI models through quantization.
Learn how to significantly reduce the memory footprint of large language models and embedding models while preserving their functionality, even on constrained devices like the Raspberry Pi 5!
🔑 Key Topics Covered:
LLM quantization techniques (from FP16/FP8 to 4-bit precision)
The GGUF format and the llama.cpp framework
Why feed-forward layer parameters are more sensitive to quantization than attention-layer parameters
Embedding model quantization using ONNX
Vector database quantization methods (Product, Binary, and Scalar)
Running vector databases and AI models on edge devices
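The vector quantization methods listed above can be sketched in a few lines of NumPy. This is an illustrative example of scalar (int8) and binary (1-bit) quantization of embedding vectors, not Weaviate's actual implementation; the array shapes and the min-max calibration are assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy embedding matrix: 4 vectors, 8 dimensions (real embeddings are far larger).
vectors = rng.standard_normal((4, 8)).astype(np.float32)

# Scalar quantization: map each float32 value to a uint8 bucket (4x smaller).
lo, hi = float(vectors.min()), float(vectors.max())
scale = (hi - lo) / 255.0
sq = np.round((vectors - lo) / scale).astype(np.uint8)
# Dequantize to approximate the originals; error is bounded by the bucket size.
dequantized = sq.astype(np.float32) * scale + lo

# Binary quantization: keep only the sign of each dimension (1 bit per dim).
bq = (vectors > 0).astype(np.uint8)

def hamming(a, b):
    # Distance between binary codes: number of differing bits.
    return int(np.count_nonzero(a != b))
```

Scalar quantization trades a small, bounded reconstruction error for a 4x memory saving, while binary quantization is far more aggressive (32x) and relies on cheap Hamming distances for candidate search, typically followed by rescoring with the original vectors.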
This technical deep dive is perfect for developers looking to optimize AI models for memory-constrained environments or deploy vector search capabilities on edge devices.
Learn more from Weaviate at https://weaviate.io.
