Build an AI Document (PDF, DOC, XML) Processing Pipeline for RAG | Docling, OCR, Chunking, Images
Author: Venelin Valkov
Premiered: Apr 6, 2025
Views: 2,693
Full-text tutorial with source code (requires MLExpert Pro): https://www.mlexpert.io/v2-bootcamp/d...
Step-by-step tutorial on building an AI document processing pipeline - completely local. Convert PDFs, perform OCR, use VLMs for images, apply LLM semantic chunking, and add context. Get your documents ready for RAG and AI models.
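As a rough illustration of the last two stages (chunking and adding context), here is a minimal sketch in plain Python. It is not the code from the video: it assumes the document has already been converted to Markdown, it swaps the LLM semantic chunking for a simple split at headings, and `describe` is a hypothetical stand-in for whatever local model call produces the context sentence. The function names are illustrative only.

```python
import re
from typing import Callable, List


def split_markdown_by_heading(markdown: str) -> List[str]:
    """Split Markdown text into chunks, starting a new chunk at each heading.

    Assumption: heading-based splitting is used here in place of the
    LLM semantic chunking shown in the video.
    """
    chunks: List[str] = []
    current: List[str] = []
    for line in markdown.splitlines():
        # A line such as "# Title" or "## Section" opens a new chunk.
        if re.match(r"^#{1,6}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    # Drop chunks that ended up empty (e.g. leading blank lines).
    return [chunk for chunk in chunks if chunk]


def add_context(chunks: List[str], describe: Callable[[str], str]) -> List[str]:
    """Prefix each chunk with a short context string produced by `describe`.

    `describe` is hypothetical: in a real pipeline it would call a local
    model; here it is any function that maps a chunk to a string.
    """
    return [f"{describe(chunk)}\n\n{chunk}" for chunk in chunks]


if __name__ == "__main__":
    sample = "# Report\nIntro text.\n## Results\nRevenue grew."

    # Stand-in for a model call: reuse the chunk's heading as its context.
    def heading_as_context(chunk: str) -> str:
        return "Context: " + chunk.splitlines()[0].lstrip("# ")

    for enriched in add_context(split_markdown_by_heading(sample), heading_as_context):
        print(enriched)
        print("---")
```

Keeping the context step behind a plain callable means the same code can be tried with a trivial function first and a model-backed one later.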
Docling: https://docling-project.github.io/doc...
Chunking evaluation: https://research.trychroma.com/evalua...
PDF document: https://nvidianews.nvidia.com/news/nv...
AI Bootcamp: https://www.mlexpert.io/
LinkedIn: / venelin-valkov
Follow me on X: / venelin_valkov
Discord: / discord
Subscribe: http://bit.ly/venelin-subscribe
GitHub repository: https://github.com/curiousily/AI-Boot...
👍 Don't Forget to Like, Comment, and Subscribe for More Tutorials!
00:00 - Welcome
01:01 - Document processing pipeline
02:07 - Full-text tutorial and source code on MLExpert.io
02:41 - Docling
03:53 - PDF document sample
04:38 - Notebook setup
05:45 - PDF to Markdown (OCR, layout analysis, image to text)
08:45 - Visual inspection
11:02 - Image annotations
14:37 - Chunking with Ollama (and Gemma 3)
19:58 - Contextual enrichment (retrieval)
21:50 - Test the pipeline with simple RAG
24:42 - Conclusion
Join this channel to get access to the perks and support my work:
/ @venelin_valkov
#rag #ocr #documentprocessing #ollama #chatgpt #python #artificialintelligence
