import os
import base64

current_dir = os.path.dirname(os.path.realpath(__file__))

with open(os.path.join(current_dir, "bottom_logo.png"), "rb") as image_file:
    bottom_logo = base64.b64encode(image_file.read()).decode("utf-8")

benchname = 'KOFFVQA'

Bottom_logo = f'''<img src="data:image/png;base64,{bottom_logo}" style="width:20%;display:block;margin-left:auto;margin-right:auto">'''
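# The MIME type in Bottom_logo is hard-coded; a small stdlib-only sketch
# (hypothetical helper, not part of this Space) that derives it from the
# filename instead, so the data URI always matches the actual image format:

```python
import base64
import mimetypes


def data_uri(filename: str, data: bytes) -> str:
    """Build a data: URI, guessing the MIME type from the file extension."""
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    return f"data:{mime};base64,{base64.b64encode(data).decode('utf-8')}"


# A PNG file signature used as stand-in bytes for bottom_logo.png:
assert data_uri("bottom_logo.png", b"\x89PNG\r\n\x1a\n").startswith("data:image/png;base64,")
```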

intro_md = f'''
# {benchname} Leaderboard

[**πŸ† Leaderboard**](https://huggingface.co/spaces/maum-ai/KOFFVQA-Leaderboard) | [**πŸ“„ KOFFVQA Arxiv**](https://arxiv.org/abs/2503.23730) | [**πŸ€— KOFFVQA Dataset**](https://huggingface.co/datasets/maum-ai/KOFFVQA_Data)

{benchname} 🔍 is a free-form VQA benchmark dataset designed to evaluate Vision-Language Models (VLMs) in Korean-language environments. Unlike traditional multiple-choice or predefined-answer formats, KOFFVQA challenges models to generate open-ended, natural-language answers to visually grounded questions. This allows for a more comprehensive assessment of a model's ability to understand and generate nuanced Korean responses.

The dataset covers diverse real-world scenarios, including object attributes, object recognition, and relationships between objects.

This page will be updated continuously, and we will accept requests to add models to the leaderboard. For more details, please refer to the "Submit" tab.

'''.strip()

about_md = f'''

# About

The {benchname} benchmark is designed to evaluate and compare the performance of Vision-Language Models (VLMs) in Korean language environments.

This benchmark includes a total of 275 Korean questions across 10 tasks. The questions are open-ended, free-form VQA (Visual Question Answering) with objective answers, allowing responses without strict format constraints.

## News
* **2025-04-25**: Our [leaderboard](https://huggingface.co/spaces/maum-ai/KOFFVQA-Leaderboard) has now finished evaluating a total of **81** open- and closed-source VLMs. We have also refactored the evaluation code to make it easier to use and to support a much wider range of models.

* **2025-04-01**: Our paper [KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language](https://arxiv.org/abs/2503.23730) has been released and accepted to CVPRW 2025, the Workshop on Benchmarking and Expanding AI Multimodal Approaches (BEAM 2025) 🎉

* **2025-01-21**: [Evaluation code](https://github.com/maum-ai/KOFFVQA) and [dataset](https://huggingface.co/datasets/maum-ai/KOFFVQA_Data) release

* **2024-12-06**: Leaderboard Release!

## Citation

**BibTeX:**
'''.strip() + "\n```bibtex\n" + '''
@article{kim2025koffvqa,
  title={KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language},
  author={Kim, Yoonshik and Jung, Jaeyoon},
  journal={arXiv preprint arXiv:2503.23730},
  year={2025}
}
''' + "\n```"

submit_md = f'''

# Submit

We are not accepting model-addition requests at the moment. Once the request system is in place, we will start accepting them.

🚀 Wondering how your VLM stacks up in Korean? Just run it with our evaluation code and get your score. No API key needed!

🧑‍⚖️ We currently use google/gemma-2-9b-it as the judge model, so there's no need to worry about API keys or usage fees.

'''.strip()