Multimodality-Guided Image Style Transfer Using Cross-Modal GAN Inversion

Uploader: ComputerVisionFoundation Videos

Uploaded: Jan 30, 2024

Views: 292

Description:

Authors: Hanyu Wang; Pengxiang Wu; Kevin Dela Rosa; Chen Wang; Abhinav Shrivastava
Description: Image Style Transfer (IST) is an interdisciplinary topic spanning computer vision and art that continues to attract researchers' interest. Unlike traditional Image-guided Image Style Transfer (IIST) methods, which require a style reference image as input to define the desired style, recent works tackle the problem in a text-guided manner, i.e., Text-guided Image Style Transfer (TIST). Compared to IIST, such approaches provide more flexibility through text-specified styles, which is useful in scenarios where the style is hard to define with reference images. Unfortunately, many TIST approaches produce undesirable artifacts in the transferred images. To address this issue, we present a novel method that achieves much improved style transfer based on text guidance. Moreover, to offer more flexibility than IIST and TIST, our method allows style inputs from multiple sources and modalities, enabling MultiModality-guided Image Style Transfer (MMIST). Specifically, we realize MMIST with a novel cross-modal GAN inversion method, which generates style representations consistent with the specified styles. Such style representations facilitate style transfer and, in principle, generalize any IIST method to MMIST. Large-scale experiments and user studies demonstrate that our method achieves state-of-the-art performance on the TIST task. Furthermore, comprehensive qualitative results confirm the effectiveness of our method on the MMIST task and on cross-modal style interpolation.
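
The description does not spell out how the cross-modal GAN inversion works, so the following is only a toy sketch of the general idea of inversion by latent optimization, not the authors' method. It assumes a linear stand-in "generator" (`W_g`) and "encoder" (`W_e`), and random unit vectors standing in for a text style embedding and an image style embedding; all of these names are hypothetical. A real system would presumably use a pretrained GAN and a joint text-image encoder instead. The sketch searches for a latent code whose embedded output is close, in cosine distance, to a weighted mix of the style targets.

```python
import numpy as np

def style_loss(z, A, targets, weights):
    """Weighted cosine distance between the embedded output and each style target."""
    u = A @ z
    f = u / np.linalg.norm(u)
    return float(sum(w * (1.0 - f @ t) for w, t in zip(weights, targets)))

def style_grad(z, A, targets, weights):
    """Analytic gradient of style_loss with respect to the latent code z."""
    u = A @ z
    norm = np.linalg.norm(u)
    f = u / norm
    mixed = sum(w * t for w, t in zip(weights, targets))
    # Gradient of -(f . mixed) with respect to u, then chain rule through A.
    grad_u = -(mixed - (f @ mixed) * f) / norm
    return A.T @ grad_u

def invert(A, targets, weights, z0, steps=200, lr=0.5):
    """Gradient descent on z; a step is kept only if it lowers the loss."""
    z = z0.copy()
    loss = style_loss(z, A, targets, weights)
    history = [loss]
    for _ in range(steps):
        candidate = z - lr * style_grad(z, A, targets, weights)
        cand_loss = style_loss(candidate, A, targets, weights)
        if cand_loss < loss:
            z, loss = candidate, cand_loss
        else:
            lr *= 0.5  # step was too large, shrink it and retry
        history.append(loss)
    return z, history

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

rng = np.random.default_rng(0)
latent_dim, pixel_dim, embed_dim = 4, 16, 8
W_g = rng.normal(size=(pixel_dim, latent_dim))  # toy linear "generator" (assumption)
W_e = rng.normal(size=(embed_dim, pixel_dim))   # toy linear "encoder" (assumption)
A = W_e @ W_g                                   # latent code -> embedding

text_style = unit(rng.normal(size=embed_dim))   # stand-in for a text style embedding
image_style = unit(rng.normal(size=embed_dim))  # stand-in for an image style embedding
z0 = rng.normal(size=latent_dim)

z_star, history = invert(A, [text_style, image_style], [0.5, 0.5], z0)
print(f"loss before: {history[0]:.4f}, after: {history[-1]:.4f}")
```

Changing the two weights from `[0.5, 0.5]` toward `[1.0, 0.0]` or `[0.0, 1.0]` moves the target between the two stand-in styles, which loosely mirrors the cross-modal style interpolation mentioned in the description.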
