Hrant committed · Commit f4b3542 · verified · 1 Parent(s): 2966224

Update app.py


Modifie "About" page text

Files changed (1)
app.py +41 -31
app.py CHANGED
@@ -55,19 +55,18 @@ def main():
             plot_column_dropdown_mmlu = gr.Dropdown(choices=subject_cols, value='Average', label='Select Column to Plot')
             plot_output_mmlu = gr.Plot(lambda column: mmlu_chart(global_output_mmlu, column), inputs=plot_column_dropdown_mmlu)
             with gr.TabItem("About"):
-                gr.Markdown("# About the Benchmark")
+                gr.Markdown("# Overview")
                 gr.Markdown(
                     """
-                    This benchmark evaluates Language Models on Armenian-specific tasks, including Armenian Unified Test Exams and a translated version of the MMLU-Pro benchmark (MMLU-Pro-Hy). It is designed to measure the models' understanding and generation capabilities in the Armenian language.
+                    This benchmark is developed and maintained by [Metric](https://metric.am/). It evaluates the capabilities of Large Language Models on Armenian-specific tasks, including Armenian Unified Test Exams and a translated version of the MMLU-Pro benchmark (MMLU-Pro-Hy). It is designed to measure the models' understanding and generation capabilities in the Armenian language.
 
-                    **Creator Company:** Metric AI Research Lab, Yerevan, Armenia."""
+                    """
                 )
-                gr.Image("logo.png", width=200, show_label=False, show_download_button=False, show_fullscreen_button=False, show_share_button=False)
-                gr.Markdown("""
-                - [Website](https://metric.am/)
-                - [Hugging Face](https://huggingface.co/Metric-AI)
-
-                MMLU-Pro-Hy is a massive multi-task test in MCQA format, inspired by the original MMLU benchmark, adapted for the Armenian language. The Armenian Unified Exams benchmark allows for comparison with human-level knowledge.
+                gr.Markdown("# Dataset")
+                gr.Markdown("""
+                - [Armenian Unified Exams](https://dimord.am/public/tests): a collection of high school graduation test exams used in 2025 in Armenia. The highest achievable score per test is 20. The data is extracted from PDFs and manually prepared for LLM evaluation.
+                - MMLU-Pro-Hy: a massive multi-task test in MCQA format, inspired by the original [MMLU-Pro benchmark](https://arxiv.org/abs/2406.01574), adapted for the Armenian language. Currently, a stratified sample totaling 500 questions is used for evaluation. The Armenian version is generated through machine translation, and the resulting dataset underwent extensive post-processing to ensure that a high-quality subsample is selected for evaluation.
                 """
                 )
                 gr.Markdown("## Submission Guide")
@@ -81,36 +80,47 @@ def main():
                 2. **Format your submission file**:
                     - After evaluation, you will get a `results.json` file. Ensure the file follows this format:
 
                     ```json
                     {
                         "mmlu_results": [
                             {
                                 "category": "category_name",
                                 "score": score_value
                             },
                             ...
                         ],
                         "unified_exam_results": [
                             {
                                 "category": "category_name",
                                 "score": score_value
                             },
                             ...
                         ]
                     }
                     ```
                 3. **Submit your model**:
                     - Add the `Arm-LLM-Bench` tag and the `results.json` file to your model card.
                     - Click on the "Refresh Data" button in this app, and you will see your model's results.
                 """
                 )
-                gr.Markdown("## Contributing")
-                gr.Markdown(
-                    """
-                    You can contribute to this benchmark in several ways:
-                    - Providing API credits for evaluating API-based models.
-                    - Citing our work in your research and publications.
-                    - Contributing to the development of the benchmark itself.
-                    """
-                )
+                with gr.Column():
+                    gr.Markdown("## Contributing")
+                    gr.Markdown(
+                        """
+                        You can contribute to this benchmark in several ways:
+                        - Providing API credits for evaluating additional API-based models.
+                        - Citing our work in your research and publications.
+                        - Contributing to the development of the benchmark itself with data or with evaluation results.
+                        """
+                    )
+                with gr.Column():
+                    gr.Image("logo.png", width=200, show_label=False, show_download_button=False, show_fullscreen_button=False, show_share_button=False)
+
+                gr.Markdown("# About Metric")
+                gr.Markdown(
+                    """
+                    [Metric](https://metric.am/) is an AI Research Lab based in Yerevan, Armenia. It specializes in training custom embedding and generation models for use cases such as Document AI and under-represented languages. If you are interested in our research or advisory services, drop an email to [email protected].
+                    """
+                )
 
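For submitters, a quick structural check of `results.json` before attaching it to a model card can save a round-trip. A minimal sketch, based only on the format shown in the submission guide above (both top-level lists, each entry a `category` string and a numeric `score`):

```python
# Pre-submission sanity check for results.json, based only on the
# structure shown in the submission guide above.
import json
import sys

REQUIRED_SECTIONS = ("mmlu_results", "unified_exam_results")

def validate(path: str) -> list[str]:
    errors = []
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    for section in REQUIRED_SECTIONS:
        entries = data.get(section)
        if not isinstance(entries, list) or not entries:
            errors.append(f"'{section}' must be a non-empty list")
            continue
        for i, entry in enumerate(entries):
            if not isinstance(entry, dict):
                errors.append(f"{section}[{i}]: expected an object")
                continue
            if not isinstance(entry.get("category"), str):
                errors.append(f"{section}[{i}]: 'category' must be a string")
            if not isinstance(entry.get("score"), (int, float)):
                errors.append(f"{section}[{i}]: 'score' must be a number")
    return errors

if __name__ == "__main__":
    problems = validate(sys.argv[1] if len(sys.argv) > 1 else "results.json")
    print("\n".join(problems) or "results.json looks structurally valid")
```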
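Step 3 of the guide asks submitters to add the `Arm-LLM-Bench` tag and the `results.json` file to their model card. One way to do this programmatically is with `huggingface_hub`; a sketch assuming you are authenticated (e.g. via `huggingface-cli login`) and with `your-username/your-model` standing in for the real repo id:

```python
# One way to attach the tag and the results file to a model repo using
# huggingface_hub. "your-username/your-model" is a placeholder repo id.
from huggingface_hub import ModelCard, upload_file

REPO_ID = "your-username/your-model"  # placeholder

# Add the Arm-LLM-Bench tag to the model card metadata.
card = ModelCard.load(REPO_ID)
tags = list(card.data.tags or [])
if "Arm-LLM-Bench" not in tags:
    card.data.tags = tags + ["Arm-LLM-Bench"]
    card.push_to_hub(REPO_ID)

# Upload results.json next to the model card.
upload_file(
    path_or_fileobj="results.json",
    path_in_repo="results.json",
    repo_id=REPO_ID,
)
```

After the upload, clicking "Refresh Data" in the app should pick up the new results, per the guide above.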