ViDoRe Chunk OCR (baseline) - a vidore Collection

Updated Jan 23

Each page of the ViDoRe benchmark was partitioned into text chunks with Unstructured. Detected figures and tables were captioned with Claude 3 Sonnet.

vidore/arxivqa_test_subsampled_ocr_chunk
Updated Jun 13, 2024 • 1.44k • 28

vidore/docvqa_test_subsampled_ocr_chunk
Updated Jun 13, 2024 • 1.24k • 50

vidore/infovqa_test_subsampled_ocr_chunk
Updated Jun 13, 2024 • 2.78k • 29

vidore/tabfquad_test_subsampled_ocr_chunk
Updated Jun 13, 2024 • 636 • 19

vidore/tatdqa_test_ocr_chunk
Updated Jun 13, 2024 • 8.54k • 70

vidore/shiftproject_test_ocr_chunk
Updated Jun 13, 2024 • 2.05k • 34

vidore/syntheticDocQA_artificial_intelligence_test_ocr_chunk
Updated Jun 13, 2024 • 1.9k • 140

vidore/syntheticDocQA_energy_test_ocr_chunk
Updated Jun 13, 2024 • 2.29k • 143

vidore/syntheticDocQA_government_reports_test_ocr_chunk
Updated Jun 13, 2024 • 1.9k • 131

vidore/syntheticDocQA_healthcare_industry_test_ocr_chunk
Updated Jun 13, 2024 • 2.17k • 181
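
The datasets listed in this collection can probably be pulled with the Hugging Face `datasets` library. The following is a minimal sketch, not an official loader: the repository IDs are copied from the listing, while the helper names (`short_name`, `load_chunk_dataset`) are hypothetical and the `"test"` split name is an assumption that should be checked against each dataset card. The column layout is not described here, so the sketch does not rely on any field names.

```python
# Repository IDs copied verbatim from the collection listing.
CHUNK_OCR_DATASETS = [
    "vidore/arxivqa_test_subsampled_ocr_chunk",
    "vidore/docvqa_test_subsampled_ocr_chunk",
    "vidore/infovqa_test_subsampled_ocr_chunk",
    "vidore/tabfquad_test_subsampled_ocr_chunk",
    "vidore/tatdqa_test_ocr_chunk",
    "vidore/shiftproject_test_ocr_chunk",
    "vidore/syntheticDocQA_artificial_intelligence_test_ocr_chunk",
    "vidore/syntheticDocQA_energy_test_ocr_chunk",
    "vidore/syntheticDocQA_government_reports_test_ocr_chunk",
    "vidore/syntheticDocQA_healthcare_industry_test_ocr_chunk",
]

SUFFIX = "_ocr_chunk"


def short_name(repo_id: str) -> str:
    """Hypothetical helper: drop the 'vidore/' namespace and the '_ocr_chunk' suffix."""
    name = repo_id.split("/", 1)[1]
    if name.endswith(SUFFIX):
        name = name[: -len(SUFFIX)]
    return name


def load_chunk_dataset(repo_id: str, split: str = "test"):
    """Hypothetical helper: download one dataset from the Hub.

    The default split name "test" is an assumption, not confirmed by the listing.
    """
    # Imported lazily so the rest of the sketch runs without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split)


if __name__ == "__main__":
    for repo_id in CHUNK_OCR_DATASETS:
        print(short_name(repo_id), "->", repo_id)
```

Calling `load_chunk_dataset(CHUNK_OCR_DATASETS[0])` requires network access and the `datasets` package; inspecting the returned object's `column_names` is a reasonable first step before relying on any field.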