ACM AI | Compressing LLMs for Efficient Inference | Reading Group W25W6
Author: ACM at UCLA
Uploaded: 2025-02-22
Views: 96
Description:
LLMs are computationally expensive not only to train but also to run inference on. 🥲 A major focus of current research is therefore how to compress LLMs while retaining as much performance as possible. 😊 In this talk, we will explore recent papers on compressing deep learning models, with a focus on LLMs. We will cover several approaches, including Low-Rank Compression, Pruning, Quantization, and maybe Knowledge Distillation! In addition, we will discuss some techniques for post-compression fine-tuning to recover performance, and perhaps some efficient fine-tuning algorithms such as LoftQ and QLoRA!
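As a taste of what these methods do, here is a minimal sketch (not code from the talk, purely an illustration) of three of the listed ideas, magnitude pruning, symmetric 8-bit quantization, and low-rank compression, applied to a toy weight matrix in NumPy. The sparsity level, rank budget, and variable names are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only; not from the talk. All constants are assumptions.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)).astype(np.float32)  # toy "layer weights"

# --- Magnitude pruning: zero out the smallest-|w| entries (50% sparsity) ---
threshold = np.quantile(np.abs(W), 0.5)         # assumed sparsity level
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# --- Symmetric 8-bit (absmax) quantization: map floats to int8 and back ---
scale = np.abs(W).max() / 127.0                 # one scale per tensor
W_int8 = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dequant = W_int8.astype(np.float32) * scale   # what inference would see

# --- Low-rank compression: keep only the top-k singular components of W ---
U, S, Vt = np.linalg.svd(W, full_matrices=False)
k = 2                                           # assumed rank budget
W_lowrank = (U[:, :k] * S[:k]) @ Vt[:k, :]

print("sparsity:", (W_pruned == 0).mean())
print("max quantization error:", np.abs(W - W_dequant).max())
print("low-rank reconstruction error:", np.linalg.norm(W - W_lowrank))
```

In practice these operations are applied per layer (often with calibration data), and methods like LoftQ and QLoRA pair quantized base weights with small trainable low-rank adapters to recover performance.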