Tuning Free (Inference Time) Alignment of Large Language Models - Amrit Singh Bedi

Author: WaterlooAI

Uploaded: 2025-02-03

Description:

Abstract: Traditional fine-tuning of foundation models is computationally heavy because it updates billions of parameters. A promising alternative, alignment via decoding, leaves the model weights unchanged and instead adjusts the response distribution at inference time to maximize a target reward r, providing a lightweight and adaptable framework for alignment. However, principled decoding methods rely on oracle access to the optimal Q-function (Q*), which is often unavailable in practice. We propose Transfer Q*, which implicitly estimates the optimal value function for the target reward through a baseline model aligned with a baseline reward r_BL (which can differ from the target reward). Our approach significantly reduces the sub-optimality gap observed in prior state-of-the-art (SoTA) methods and demonstrates superior empirical performance on key metrics such as coherence, diversity, and quality in extensive tests on several synthetic and real datasets.
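To make "alignment via decoding" concrete, here is a minimal toy sketch, not the Transfer Q* algorithm from the talk. It assumes the KL-regularized formulation commonly used for decoding-based alignment, where the aligned policy is proportional to pi_ref(z | s) * exp(Q(s, z) / alpha), so a greedy decoder picks the token maximizing log pi_ref(z | s) + Q(s, z) / alpha. As a simplified stand-in for estimating Q* through a baseline model, Q(s, z) is approximated by greedily completing the prefix with a baseline policy (assumed aligned to some r_BL) and scoring the result with the target reward r. The vocabulary, the toy policies, the reward, alpha, and all function names (`rollout`, `q_hat`, `decode_step`, `value_guided_decode`) are hypothetical illustrations.

```python
import math
from typing import Callable, Dict, List

# Hypothetical toy types: a policy maps a token prefix to next-token probabilities,
# and a reward scores a complete token sequence.
Policy = Callable[[List[str]], Dict[str, float]]
Reward = Callable[[List[str]], float]

EOS = "<eos>"


def ref_policy(prefix: List[str]) -> Dict[str, float]:
    """Toy reference (unaligned) model: mildly prefers "bad"."""
    return {"good": 0.3, "bad": 0.6, EOS: 0.1}


def baseline_policy(prefix: List[str]) -> Dict[str, float]:
    """Toy baseline model, assumed to be already aligned to some baseline reward r_BL."""
    return {"good": 0.7, "bad": 0.2, EOS: 0.1}


def target_reward(seq: List[str]) -> float:
    """Toy target reward r: +1 per "good" token, -1 per "bad" token."""
    return seq.count("good") - seq.count("bad")


def rollout(policy: Policy, prefix: List[str], max_len: int) -> List[str]:
    """Greedily extend `prefix` with `policy` until EOS or `max_len` tokens."""
    seq = list(prefix)
    while len(seq) < max_len and (not seq or seq[-1] != EOS):
        dist = policy(seq)
        seq.append(max(dist, key=dist.get))
    return seq


def q_hat(baseline: Policy, reward: Reward, prefix: List[str], token: str, max_len: int) -> float:
    """Simplified Q(s, z) estimate: complete prefix + [token] with the baseline
    model, then score the finished sequence with the *target* reward."""
    return reward(rollout(baseline, prefix + [token], max_len))


def decode_step(ref: Policy, baseline: Policy, reward: Reward,
                prefix: List[str], max_len: int, alpha: float) -> str:
    """Pick argmax_z of log pi_ref(z | s) + Q_hat(s, z) / alpha."""
    scores = {
        z: math.log(p) + q_hat(baseline, reward, prefix, z, max_len) / alpha
        for z, p in ref(prefix).items()
        if p > 0.0
    }
    return max(scores, key=scores.get)


def value_guided_decode(ref: Policy, baseline: Policy, reward: Reward,
                        max_len: int, alpha: float) -> List[str]:
    """Generate token by token; the reference model's weights are never updated."""
    seq: List[str] = []
    while len(seq) < max_len and (not seq or seq[-1] != EOS):
        seq.append(decode_step(ref, baseline, reward, seq, max_len, alpha))
    return seq


print(value_guided_decode(ref_policy, baseline_policy, target_reward, max_len=4, alpha=1.0))
# -> ['good', 'good', 'good', 'good']
print(decode_step(ref_policy, baseline_policy, target_reward, [], max_len=4, alpha=100.0))
# -> bad
```

In this sketch, alpha sets the trade-off the abstract alludes to: with a small alpha the value estimate dominates and decoding follows the target reward, while a large alpha makes the decoder fall back to the reference model's own preference (here, "bad").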

Bio: Amrit Singh Bedi is an assistant professor in the Computer Science Department at the University of Central Florida, FL, USA. Before that, he was a research assistant professor in the Computer Science Department at the University of Maryland, College Park, MD, USA. He obtained his Ph.D. in Electrical Engineering from IIT Kanpur, Kanpur, India, in 2018. Following his doctoral studies, he worked as a Research Associate in the Computational and Information Sciences Directorate at the US Army Research Laboratory (ARL) in Adelphi, MD, USA, from 2019 to 2022. His research interests lie in artificial intelligence (AI) for autonomous systems, with a specific emphasis on scalable and sample-efficient learning algorithms. He is currently working on AI alignment in language models. His paper was selected as one of the Best Paper Finalists at the 2017 IEEE Asilomar Conference on Signals, Systems, and Computers. He received an honorable mention from IEEE Robotics and Automation Letters in 2020 and was awarded the Amazon Research Award in 2022.
