Benchmark for error detection in LLM responses
Ryo Kamoi
ryokamoi
AI & ML interests: NLP
Recent Activity
- updated a collection 7 days ago: ReaLMistake
- liked a dataset 15 days ago: sam-paech/mmlu-pro-nomath-sml
- liked a model about 1 month ago: Qwen/Qwen2.5-Math-7B-PRM800K
Collections (2)
Dataset for evaluating the visual perception capabilities of LVLMs.
- VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information (Paper • 2412.00947 • Published • 8)
- ryokamoi/VisOnlyQA_Eval_Real_v1.1 (Viewer • Updated • 900 • 223)
- ryokamoi/VisOnlyQA_Eval_Synthetic (Viewer • Updated • 700 • 72 • 2)
- ryokamoi/VisOnlyQA_Train (Viewer • Updated • 70k • 282 • 2)
Models (0)
None public yet
Datasets (6)
- ryokamoi/VisOnlyQA_metadata (Viewer • Updated • 3 • 35)
- ryokamoi/VisOnlyQA_Train (Viewer • Updated • 70k • 282 • 2)
- ryokamoi/VisOnlyQA_Eval_Synthetic (Viewer • Updated • 700 • 72 • 2)
- ryokamoi/VisOnlyQA_Eval_Real (Viewer • Updated • 500 • 76 • 2)
- ryokamoi/VisOnlyQA_Eval_Real_v1.1 (Viewer • Updated • 900 • 223)
- ryokamoi/realmistake (Viewer • Updated • 903 • 73 • 2)