Can MLLMs Perform Text-to-Image In-Context Learning? (Re-recorded version)

Author: UWMadison MLOPT Idea Seminar

Uploaded: 2024-02-23

Views: 315

Description:

This video was re-recorded because the original presentation was not captured during the talk.

Speaker: Yuchen Zeng (https://yzeng58.github.io/zyc_cv/) from UW-Madison
Time: Feb 23, 2024, 12:45 PM – 1:45 PM CT
Paper Link: https://arxiv.org/abs/2402.01293
Abstract: The evolution from Large Language Models (LLMs) to Multimodal Large Language Models (MLLMs) has spurred research into extending In-Context Learning (ICL) to its multimodal counterpart. Existing such studies have primarily concentrated on image-to-text ICL. However, the Text-to-Image ICL (T2I-ICL), with its unique characteristics and potential applications, remains underexplored. To address this gap, we formally define the task of T2I-ICL and present CoBSAT, the first T2I-ICL benchmark dataset, encompassing ten tasks. Utilizing our dataset to benchmark six state-of-the-art MLLMs, we uncover considerable difficulties MLLMs encounter in solving T2I-ICL. We identify the primary challenges as the inherent complexity of multimodality and image generation. To overcome these challenges, we explore strategies like fine-tuning and Chain-of-Thought prompting, demonstrating notable improvements. Our code and dataset are available at https://github.com/UW-Madison-Lee-Lab....
Bio: Yuchen is a graduate student pursuing a PhD in the Department of Computer Science at the University of Wisconsin-Madison. She is advised by Prof. Kangwook Lee. Her current research centers on large language models.
Location: Engineering Research Building (1550 Engineering Drive) Room 106
