PaliGemma by Google: Train Model on Custom Detection Dataset

Author: Roboflow

Uploaded: 2024-06-03

Views: 15655

Description:

Learn how to fine-tune PaliGemma, Google's open-source Vision-Language Model, for custom object detection tasks. This step-by-step tutorial walks you through modifying Google's notebook to train PaliGemma on your own dataset. We'll use the handwritten digits and math operations dataset from RF100, explore the JSONL format, and demonstrate how to deploy your fine-tuned model for real-world inference. See what PaliGemma can do for image captioning, visual question answering (VQA), and object detection, and learn how to work around its limitations.
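For readers who want a feel for the JSONL format before watching, here is a minimal sketch of how one detection record might be built. It assumes the commonly documented PaliGemma convention: each box is written as four `<locNNNN>` tokens in the order y_min, x_min, y_max, x_max, with coordinates normalized to 1024 bins, and each line carries `image`, `prefix`, and `suffix` keys. The helper names, the file name, and the sample box are hypothetical and are not taken from the video or the notebook.

```python
import json


def to_loc_token(value, size):
    """Map a pixel coordinate to a <locNNNN> token (assumed 1024 bins, 0..1023)."""
    bin_index = min(int(value / size * 1024), 1023)
    return f"<loc{bin_index:04d}>"


def make_record(image_name, width, height, boxes, classes):
    """Build one JSONL record; boxes are (x_min, y_min, x_max, y_max, label) in pixels."""
    parts = []
    for x_min, y_min, x_max, y_max, label in boxes:
        # Assumed token order: y_min, x_min, y_max, x_max.
        tokens = (
            to_loc_token(y_min, height)
            + to_loc_token(x_min, width)
            + to_loc_token(y_max, height)
            + to_loc_token(x_max, width)
        )
        parts.append(f"{tokens} {label}")
    return {
        "image": image_name,
        "prefix": "detect " + " ; ".join(classes),
        "suffix": " ; ".join(parts),
    }


# Hypothetical example: one box labelled "7" in a 640x480 image.
record = make_record(
    "digits_001.jpg", 640, 480, [(64, 48, 320, 240, "7")], ["7", "plus"]
)
line = json.dumps(record)  # one line of the .jsonl file
print(line)
```

Each image in the dataset would contribute one such line; the exact keys and binning used by a given export should be checked against the dataset itself.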

Chapters:

00:00 PaliGemma Capabilities
02:03 Environment Setup
05:25 Dataset Format
09:07 Downloading Pre-trained Model
11:27 Loading Dataset
13:45 Training and Evaluating the Model
15:19 Deploying the Model
17:37 Important Considerations
20:02 Outro

Resources:

Roboflow: https://roboflow.com

🔴 Community Session June 6th, 2024 at 08:00 AM PST / 11:00 AM EST / 05:00 PM CET: https://roboflow.stream

⭐ Notebooks GitHub: https://github.com/roboflow/notebooks
⭐ Supervision GitHub: https://github.com/roboflow/supervision

📓 PaliGemma notebook: https://colab.research.google.com/git...

🗞 Gemma arXiv paper: https://arxiv.org/pdf/2403.08295
🗞 SigLIP arXiv paper: https://arxiv.org/pdf/2303.15343
🗞 PaliGemma blog post: https://blog.roboflow.com/how-to-fine...

🔗 RF100: https://www.rf100.org
🔗 PaliGemma model card: https://www.kaggle.com/models/google/...
🔗 PaliGemma fine-tuned checkpoints: https://huggingface.co/collections/go...
🔗 PaliGemma HF Space: https://huggingface.co/spaces/big-vis...

Stay updated with the projects I'm working on at https://github.com/roboflow and https://github.com/SkalskiP! ⭐
