Multimodal AI in 2025: Testing Commercial and Open Source Models & Modalities
Author: CanAIHelp
Uploaded: 2025-04-02
Views: 357
🚀 Multimodal AI in 2025! 🚀
AI isn’t just about text anymore—it sees, hears, and even reasons across multiple types of data. But which models are actually delivering? In this video, I test and explore the latest multimodal AI models, from Gemini 2 and Apple Intelligence to open-source challengers.
More content on Neural Nets here: • Neural Nets Explained
🔍 What’s inside?
✅ Hands-on tests with cutting-edge multimodal models
✅ Testing Gemini 2 with images, YouTube videos, uploaded video files, and screen sharing
✅ Open-source challengers like QVQ and InternVL—can they compete with the big names?
✅ AI beyond speech and vision—music from images, scent mapping, and even robotic action!
📖 Chapters:
1. 00:00 Intuition behind multimodal AI
2. 00:50 Gemini 2.0
3. 02:09 Gemini in Google AI Studio
4. 03:14 Screen share with Gemini 2.0
5. 03:58 Apple Intelligence
6. 06:11 Open-source multimodal models
7. 07:47 QVQ model
8. 08:58 InternVL model
9. 09:40 Other modalities
💡 Whether you're a tech enthusiast, researcher, or just curious about AI's next leap, this video breaks it all down with real examples.
🔔 Like, subscribe, and join the conversation on the future of AI!
Links:
1. MMMU: https://mmmu-benchmark.github.io/
2. QVQ model: https://qwenlm.github.io/blog/qvq-72b...
3. InternVL: https://internvl.opengvlab.com/
4. Riffusion: https://www.riffusion.com/
5. Osmo AI: https://www.osmo.ai/
#AI #MultimodalAI #ArtificialIntelligence #Gemini2 #DeepLearning #MachineLearning #TechNews #OpenSourceAI #FutureTech