Update app.py
Modify "About" page text
app.py
CHANGED
@@ -55,19 +55,18 @@ def main():
                 plot_column_dropdown_mmlu = gr.Dropdown(choices=subject_cols, value='Average', label='Select Column to Plot')
                 plot_output_mmlu = gr.Plot(lambda column: mmlu_chart(global_output_mmlu, column), inputs=plot_column_dropdown_mmlu)
             with gr.TabItem("About"):
-                gr.Markdown("#
+                gr.Markdown("# Overview")
                 gr.Markdown(
                     """
-                    This benchmark evaluates Language Models on Armenian-specific tasks, including Armenian Unified Test Exams and a translated version of the MMLU-Pro benchmark (MMLU-Pro-Hy). It is designed to measure the models' understanding and generation capabilities in the Armenian language.
+                    This benchmark is developed and maintained by [Metric](https://metric.am/). It evaluates the capabilities of Large Language Models on Armenian-specific tasks, including Armenian Unified Test Exams and a translated version of the MMLU-Pro benchmark (MMLU-Pro-Hy). It is designed to measure the models' understanding and generation capabilities in the Armenian language.
 
-                    """
+                    """
                 )
-                gr.Image("logo.png", width=200, show_label=False, show_download_button=False, show_fullscreen_button=False, show_share_button=False)
-                gr.Markdown("""
-                - [Website](https://metric.am/)
-                - [Hugging Face](https://huggingface.co/Metric-AI)
 
-
+                gr.Markdown("# Dataset")
+                gr.Markdown("""
+                - [Armenian Unified Exams](https://dimord.am/public/tests): a collection of high-school graduation test exams used in 2025 in Armenia. The highest achievable score per test is 20. The data is extracted from PDFs and manually prepared for LLM evaluation.
+                - MMLU-Pro-Hy: a massive multi-task test in MCQA format, inspired by the original [MMLU-Pro benchmark](https://arxiv.org/abs/2406.01574) and adapted for the Armenian language. Currently, a stratified sample totaling 500 questions is used for evaluation. The Armenian version is generated through machine translation, and the resulting dataset underwent extensive post-processing to ensure that a high-quality subsample is selected for evaluation.
                 """
                 )
                 gr.Markdown("## Submission Guide")
@@ -81,36 +80,47 @@
                     2. **Format your submission file**:
                         - After evaluation, you will get a `results.json` file. Ensure the file follows this format:
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+                        ```json
+                        {
+                            "mmlu_results": [
+                                {
+                                    "category": "category_name",
+                                    "score": score_value
+                                },
+                                ...
+                            ],
+                            "unified_exam_results": [
+                                {
+                                    "category": "category_name",
+                                    "score": score_value
+                                },
+                                ...
+                            ]
+                        }
+                        ```
                     3. **Submit your model**:
                         - Add the `Arm-LLM-Bench` tag and the `results.json` file to your model card.
                         - Click on the "Refresh Data" button in this app, and you will see your model's results.
                     """
                 )
-                gr.
+                with gr.Column():
+                    gr.Markdown("## Contributing")
+                    gr.Markdown(
+                        """
+                        You can contribute to this benchmark in several ways:
+                        - Providing API credits for evaluating additional API-based models.
+                        - Citing our work in your research and publications.
+                        - Contributing to the development of the benchmark itself with data or evaluation results.
+                        """
+                    )
+                with gr.Column():
+                    gr.Image("logo.png", width=200, show_label=False, show_download_button=False, show_fullscreen_button=False, show_share_button=False)
+
+                    gr.Markdown("# About Metric")
                 gr.Markdown(
                     """
-
-
-                    - Citing our work in your research and publications.
-                    - Contributing to the development of the benchmark itself.
+                    [Metric](https://metric.am/) is an AI research lab based in Yerevan, Armenia. It specializes in training custom embedding and generation models for use cases such as Document AI and underrepresented languages. If you are interested in our research or advisory services, send an email to [email protected].
+
                     """
                 )
 
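The Dataset text above says MMLU-Pro-Hy is evaluated on a stratified sample totaling 500 questions. The sampling code is not part of this commit, so the sketch below only illustrates how such a sample could be drawn; the `category` column name and per-category allocation are assumptions, not the benchmark's actual procedure.

```python
# Illustrative sketch only: draw a category-stratified sample of roughly
# n_total questions, as described for MMLU-Pro-Hy in the About text.
# Assumes the question set is a DataFrame with a "category" column; the
# benchmark's real sampling code is not shown in this commit.
import pandas as pd

def stratified_sample(df: pd.DataFrame, n_total: int = 500, seed: int = 42) -> pd.DataFrame:
    per_category = max(1, n_total // df["category"].nunique())
    return (
        df.groupby("category", group_keys=False)
          .apply(lambda g: g.sample(n=min(len(g), per_category), random_state=seed))
          .reset_index(drop=True)
    )
```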
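The submission guide added in the second hunk fixes the shape of `results.json`, but the app's own checks are not shown here. Before tagging a model, a submitter could sanity-check the file with something like the following; the filename and keys come from the diff, while the validation function itself is a minimal sketch, not code from this repository.

```python
# Minimal structural check for results.json, matching the schema shown in
# the submission guide above. Illustrative only.
import json

REQUIRED_SECTIONS = ("mmlu_results", "unified_exam_results")

def validate_results(path: str = "results.json") -> None:
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    for section in REQUIRED_SECTIONS:
        entries = data.get(section)
        if not isinstance(entries, list) or not entries:
            raise ValueError(f"'{section}' must be a non-empty list")
        for entry in entries:
            if not isinstance(entry.get("category"), str):
                raise ValueError(f"each entry in '{section}' needs a string 'category'")
            if not isinstance(entry.get("score"), (int, float)):
                raise ValueError(f"each entry in '{section}' needs a numeric 'score'")
    print("results.json looks structurally valid")

if __name__ == "__main__":
    validate_results()
```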
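The last step of the guide (tag the model `Arm-LLM-Bench`, attach `results.json`, click "Refresh Data") implies the leaderboard discovers submissions by tag. That refresh logic is not part of this diff, so the following is only a plausible sketch using `huggingface_hub`: the tag and filename come from the guide, everything else is an assumption.

```python
# Assumed discovery flow for "Refresh Data": list Hub models carrying the
# Arm-LLM-Bench tag and download each one's results.json. Not the app's
# actual implementation.
import json
from huggingface_hub import HfApi, hf_hub_download

def fetch_submissions(tag: str = "Arm-LLM-Bench") -> dict[str, dict]:
    api = HfApi()
    results = {}
    for model in api.list_models(filter=tag):
        try:
            path = hf_hub_download(repo_id=model.id, filename="results.json")
        except Exception:
            continue  # tagged model without an uploaded results file
        with open(path, encoding="utf-8") as f:
            results[model.id] = json.load(f)
    return results
```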