Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation (Sep 2025)
Author: AI Papers Slop
Uploaded: 2025-10-23
Views: 33
Link: http://arxiv.org/abs/2510.01284v1
Date: September 2025
Summary:
Ovi introduces a unified, single-pass audio-video generation paradigm built on twin DiT modules with blockwise cross-modal fusion, which removes the need for separate pipelines or post hoc alignment. The audio tower uses an architecture identical to that of a strong pretrained video model and is trained from scratch on vast amounts of raw audio; the identical video and audio towers are then trained jointly on a large video corpus. Fusion comes from a blockwise exchange of timing (via scaled-RoPE embeddings) and semantics (via bidirectional cross-attention). Ovi generates realistic sound effects and speech, enabling cinematic storytelling with natural audio-visual synchronization and context-matched audio.
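To make the fusion idea more concrete, here is a toy sketch. It is not Ovi's code: it assumes a single attention head, no learned projections, a 1-D temporal RoPE, and tiny made-up token counts (5 video frames, 10 audio tokens covering the same clip). The function names `rope`, `scaled_positions`, `cross_attention`, and `fusion_block` are hypothetical. The sketch shows two things the summary describes: audio positions are scaled onto the video timeline so that co-occurring tokens share a rotary phase, and each tower attends to the other through cross-attention in both directions.

```python
import numpy as np


def rope(x: np.ndarray, positions: np.ndarray, base: float = 10000.0) -> np.ndarray:
    """Rotate feature pairs of x by angles proportional to each token's position.

    x: (num_tokens, dim) with even dim; positions: (num_tokens,) real-valued.
    """
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)            # one frequency per feature pair
    angles = positions[:, None] * freqs[None, :]          # (num_tokens, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)


def scaled_positions(num_tokens: int, num_ref_tokens: int) -> np.ndarray:
    """Place num_tokens evenly spaced tokens on a timeline of num_ref_tokens steps.

    Assumes both sequences span the same clip duration, so token i of the
    denser (audio) sequence lands at reference (video) index i * ref / own.
    """
    return np.arange(num_tokens) * (num_ref_tokens / num_tokens)


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)                 # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def cross_attention(q, kv, q_pos, kv_pos):
    """Single-head cross-attention with RoPE on queries and keys (no learned weights)."""
    scores = rope(q, q_pos) @ rope(kv, kv_pos).T / np.sqrt(q.shape[-1])
    return softmax(scores) @ kv


def fusion_block(video, audio, video_pos, audio_pos):
    """Symmetric exchange: each tower attends to the other tower's block input."""
    video_out = video + cross_attention(video, audio, video_pos, audio_pos)
    audio_out = audio + cross_attention(audio, video, audio_pos, video_pos)
    return video_out, audio_out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.standard_normal((5, 8))      # hypothetical: 5 video latent frames
    audio = rng.standard_normal((10, 8))     # hypothetical: 10 audio latent tokens
    video_pos = np.arange(5, dtype=float)
    audio_pos = scaled_positions(10, 5)      # [0.0, 0.5, 1.0, ..., 4.5]
    v, a = fusion_block(video, audio, video_pos, audio_pos)
    print(v.shape, a.shape)                  # (5, 8) (10, 8)
```

Because RoPE makes query-key scores depend only on the difference between positions, putting both modalities on one timeline gives an audio token and the video frame it co-occurs with a zero relative offset. That is the intuition behind scaling the audio positions. A real model would add learned projections, multiple heads, and the rest of each DiT block, all of which this sketch omits.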
Key Topics:
Audio-Video Generation
Cross-Modal Fusion
Diffusion Transformers (DiT)
Generative Models
Temporal Synchronization
Rotary Positional Embeddings (RoPE)
Speech Synthesis
Sound Effects Generation
Chapters:
00:00 - Introducing OVI Research
00:40 - OVI's Core Unified Approach
02:05 - Challenges in AV Generation
03:10 - OVI's Five Key Contributions
04:11 - Symmetric Twin Backbone
05:27 - Unified Prompt Input
06:19 - Achieving Temporal Alignment
07:32 - Data Pipeline Overview
08:09 - Strict Data Synchronization
09:18 - Detailed Multimodal Captions
09:58 - Two-Stage Training Strategy
11:38 - Weighted Loss Function
12:28 - Performance & Human Evaluation
14:04 - Prompt Effectiveness & Visual Proof
15:23 - OVI Limitations & Future Work
16:48 - Impact & Real-time Potential
Stock video credits:
Colin Jones - https://www.pexels.com/@larchmedia
Yaroslav Shuraev - https://www.pexels.com/@yaroslav-shuraev
Charlie Mounsey - https://www.pexels.com/@charlie-mouns...
Pressmaster - https://www.pexels.com/@pressmaster
Kindel Media - https://www.pexels.com/@kindelmedia
@svetjekolem - https://www.pexels.com/@svetjekolem
Trippy Lagoon - https://www.pexels.com/@trippy-lagoon...
StefWithAnF - https://www.pexels.com/@stefwithanf-1...
Soumya - https://www.pexels.com/@soumya-1446957
Danil Shostak - https://www.pexels.com/@danil-shostak...
Oleg Gamulinskii - https://www.pexels.com/@oleg-gamulins...
Pixabay - https://www.pexels.com/@pixabay
crazy motions - https://www.pexels.com/@crazy-motions...
Silviu Din - https://www.pexels.com/@silviu-din-16...
Dan Cristian Pădureț - https://www.pexels.com/@paduret
Pavel Danilyuk - https://www.pexels.com/@pavel-danilyuk
Pachon in Motion - https://www.pexels.com/@pachon-in-mot...
Mikhail Nilov - https://www.pexels.com/@mikhail-nilov
cottonbro studio - https://www.pexels.com/@cottonbro
Anthony 🙂 - https://www.pexels.com/@inspiredimages
Stas Knop - https://www.pexels.com/@stasknop
Engin Akyurt - https://www.pexels.com/@enginakyurt
Kelly - https://www.pexels.com/@kelly
José Alfredo Munguía Lira - https://www.pexels.com/@rectorretro
KATRIN BOLOVTSOVA - https://www.pexels.com/@ekaterina-bol...