Processing Videos for GPT-4o and Search
Author: James Briggs
Uploaded: 2024-05-21
Views: 7099
Recent multi-modal models like OpenAI's gpt-4o and Google's Gemini 1.5 models can comprehend video. When feeding video into these new models, we can push frames at a set frequency (for example, one frame every second) — but this method can be wildly inefficient and expensive.
Fortunately, there is a better method called "semantic chunking." Semantic chunking is a common technique in text-based Retrieval-Augmented Generation (RAG), and we can apply the same logic to video using image embedding models. By embedding each frame and measuring the similarity between consecutive frame embeddings, we can split a video into chunks based on the semantic meaning of its constituent frames.
In this video, we'll explore how to use two test videos and chunk them into semantic blocks.
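As a rough illustration of the idea (a minimal sketch, not the code from the repo linked below), the example samples frames from a video, embeds them with an open-source CLIP model, and starts a new chunk whenever the cosine similarity between consecutive frame embeddings drops below a threshold. The file name, the one-frame-per-second sampling rate, and the 0.85 threshold are assumed placeholder values for illustration only.

```python
# Sketch of frame-similarity chunking. Assumes opencv-python, pillow, numpy,
# and sentence-transformers are installed. Not the semantic-chunkers API.
import cv2
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")  # CLIP image encoder

def sample_frames(path: str, every_n_seconds: float = 1.0) -> list[Image.Image]:
    """Grab one frame every `every_n_seconds` from the video at `path`."""
    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(int(fps * every_n_seconds), 1)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            # OpenCV returns BGR; convert to RGB before embedding
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        idx += 1
    cap.release()
    return frames

def chunk_frames(frames: list[Image.Image], threshold: float = 0.85) -> list[list[int]]:
    """Split frame indices into chunks wherever consecutive similarity drops."""
    if not frames:
        return []
    embeddings = model.encode(frames, convert_to_numpy=True)
    # Normalise so the dot product equals cosine similarity
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    chunks, current = [], [0]
    for i in range(1, len(frames)):
        sim = float(embeddings[i - 1] @ embeddings[i])
        if sim < threshold:  # semantic break between scenes
            chunks.append(current)
            current = []
        current.append(i)
    chunks.append(current)
    return chunks

frames = sample_frames("test_video.mp4")  # hypothetical input file
for n, chunk in enumerate(chunk_frames(frames)):
    print(f"chunk {n}: frames {chunk[0]} to {chunk[-1]}")
```

In practice the split threshold needs tuning per video, and comparing a rolling window of recent embeddings tends to be more robust than comparing only single consecutive frames.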
📌 Code:
https://github.com/aurelio-labs/seman...
📖 Article:
https://www.aurelio.ai/learn/video-ch...
⭐ Repo:
https://github.com/aurelio-labs/seman...
🌟 Build Better Agents + RAG:
https://platform.aurelio.ai (use "JBMARCH2025" coupon code for $20 free credits)
👾 Discord:
/ discord
Twitter: / jamescalam
LinkedIn: / jamescalam
#ai #artificialintelligence #openai
00:00 Semantic Chunking
00:24 Video Chunking and gpt-4o
01:59 Video Chunking Code
03:28 Setting up the Vision Transformer
05:56 ViT vs. CLIP and other models
06:40 Video Chunking Results
08:37 Using CLIP for Vision Chunking
11:29 Final Conclusion on Video Processing