Can Vision Language Models (VLMs) Replace OCR? OmniAI OCR Benchmark
Author: AI WITH Rithesh
Uploaded: Feb 24, 2025
Views: 933
Are LLMs a total replacement for traditional OCR models? This has become an increasingly hot topic, especially with models like Gemini 2.0 becoming cost-competitive with traditional OCR.
To answer this, OmniAI ran a benchmark comparing OCR accuracy between traditional OCR providers and Vision Language Models. It uses a wide variety of real-world documents, including the complex, messy, low-quality scans you might expect to see in the wild.
The evaluation dataset and methodologies are entirely open source. You can run the benchmark yourself using their benchmark repository on GitHub, and you can view the raw data from the benchmark in the Hugging Face repository. The following results evaluate the 10 most popular providers, covering the top VLMs and traditional OCR providers, on 1,000 documents. They measure accuracy, cost, and latency for each provider.
Overall, VLM performance matched or exceeded that of most traditional OCR providers. The most notable performance gains were in documents with charts/infographics, handwriting, or complex input fields (e.g. checkboxes, highlighted fields). VLMs are also more predictable on photos and low-quality scans, and are generally more capable of "looking past the noise" of scan lines, creases, and watermarks. Traditional models tend to outperform on high-density pages (textbooks, research papers) as well as common document formats like tax forms.
Relevant Links:
https://getomni.ai/ocr-benchmark
https://arxiv.org/html/2305.07895v5
If you would like to support me financially, you can buy me a coffee here (totally optional and voluntary): https://www.buymeacoffee.com/rithesh
If you like this kind of content, please subscribe to the channel here:
https://www.youtube.com/c/RitheshSree...
