ACM AI | Compressing LLMs for Efficient Inference | Reading Group W25W6
Author: ACM at UCLA
Uploaded: 2025-02-22
Views: 96
Description:
LLMs are computationally expensive not only to train but also to run inference on. 🥲 A major focus of current research is therefore how to compress LLMs while retaining as much performance as possible. 😊 In this talk, we will explore recent papers on compressing deep learning models, with a focus on LLMs. We will cover several approaches, including Low-Rank Compression, Pruning, Quantization, and maybe Knowledge Distillation! In addition, we will discuss some techniques for post-compression fine-tuning to recover performance, and perhaps some efficient fine-tuning algorithms such as LoftQ and QLoRA!
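As a taste of what these methods do, here is a minimal sketch (not code from the talk, purely an illustration) of three of the listed ideas, magnitude pruning, symmetric 8-bit quantization, and low-rank compression, applied to a toy weight matrix in NumPy. The sparsity level, rank budget, and variable names are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only; not from the talk. All constants are assumptions.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4)).astype(np.float32)  # toy "layer weights"

# --- Magnitude pruning: zero out the smallest-|w| entries (50% sparsity) ---
threshold = np.quantile(np.abs(W), 0.5)         # assumed sparsity level
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)

# --- Symmetric 8-bit (absmax) quantization: map floats to int8 and back ---
scale = np.abs(W).max() / 127.0                 # one scale per tensor
W_int8 = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_dequant = W_int8.astype(np.float32) * scale   # what inference would see

# --- Low-rank compression: keep only the top-k singular components of W ---
U, S, Vt = np.linalg.svd(W, full_matrices=False)
k = 2                                           # assumed rank budget
W_lowrank = (U[:, :k] * S[:k]) @ Vt[:k, :]

print("sparsity:", (W_pruned == 0).mean())
print("max quantization error:", np.abs(W - W_dequant).max())
print("low-rank reconstruction error:", np.linalg.norm(W - W_lowrank))
```

In practice these operations are applied per layer (often with calibration data), and methods like LoftQ and QLoRA pair quantized base weights with small trainable low-rank adapters to recover performance.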