Processing Videos for GPT-4o and Search

Author: James Briggs

Uploaded: 2024-05-21

Views: 7099

Description:

Recent multimodal models such as OpenAI's gpt-4o and Google's Gemini 1.5 models can comprehend video. When feeding video into these models, we can push frames at a set frequency (for example, one frame every second), but this method can be wildly inefficient and expensive.
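
To make the fixed-frequency approach concrete, here is a minimal sketch of how frame indices could be picked at one frame per second. The function name `sample_frame_indices` and the 30 fps, 10-second clip are illustrative assumptions, not values from the video.

```python
def sample_frame_indices(total_frames, fps, every_seconds=1.0):
    """Return the indices of frames sampled at a fixed time interval.

    Hypothetical helper for illustration: it only computes indices and
    does not read any video file.
    """
    # Number of frames to skip between samples (at least 1).
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


# Assumed example: a 10-second clip at 30 frames per second (300 frames).
indices = sample_frame_indices(total_frames=300, fps=30)
print(len(indices), indices[:3])
```

Every sampled frame is sent to the model whether or not the scene has changed, which is where the inefficiency comes from.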

Fortunately, there is a better method called "semantic chunking." Semantic chunking is a common method used in text-based Retrieval-Augmented Generation (RAG), but we can apply the same logic to video using image embedding models. Using the similarity between frame embeddings, we can split videos based on the semantic meaning of their constituent frames.
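
As a rough illustration of the idea, the sketch below splits a sequence of frame embeddings wherever the cosine similarity between consecutive frames drops below a threshold. It is not the implementation from the linked repo: the function names, the 0.8 threshold, and the toy 2-D vectors are all assumptions, and in practice the embeddings would come from an image embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors given as lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def chunk_by_similarity(embeddings, threshold=0.8):
    """Group frame indices into chunks of semantically similar frames.

    A new chunk starts whenever a frame's similarity to the previous
    frame falls below `threshold` (an assumed, tunable value).
    """
    if not embeddings:
        return []
    chunks = [[0]]
    for i in range(1, len(embeddings)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) < threshold:
            chunks.append([i])  # scene changed: open a new chunk
        else:
            chunks[-1].append(i)  # same scene: extend the current chunk
    return chunks


# Toy stand-ins for frame embeddings: two "scenes" of two frames each.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
chunks = chunk_by_similarity(toy_embeddings)
print(chunks)  # [[0, 1], [2, 3]]
```

With chunks like these, one option is to send only a few representative frames per chunk to the model instead of every sampled frame.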

In this video, we'll take two test videos and chunk them into semantic blocks.

📌 Code:
https://github.com/aurelio-labs/seman...

📖 Article:
https://www.aurelio.ai/learn/video-ch...

⭐ Repo:
https://github.com/aurelio-labs/seman...

🌟 Build Better Agents + RAG:
https://platform.aurelio.ai (use "JBMARCH2025" coupon code for $20 free credits)

👾 Discord:
  / discord

Twitter:   / jamescalam
LinkedIn:   / jamescalam

#ai #artificialintelligence #openai

00:00 Semantic Chunking
00:24 Video Chunking and gpt-4o
01:59 Video Chunking Code
03:28 Setting up the Vision Transformer
05:56 ViT vs. CLIP and other models
06:40 Video Chunking Results
08:37 Using CLIP for Vision Chunking
11:29 Final Conclusion on Video Processing
