Duplicated from datenlabor-bmz/ai-language-monitor

GIZ/ai-language-proficiency-monitor

ai-language-proficiency-monitor / evals / tasks.py

Commit History

Run on 40 languages, additional models

260c1a3

David Pomerenke committed 3 days ago

Shorter classification prompt + error handling

0384b92

David Pomerenke committed 3 days ago

Implement MMLU task

a683732

David Pomerenke committed 12 days ago

MMLU data loader for 3 parallel datasets

47170a5

David Pomerenke committed 13 days ago

Add Global MMLU benchmark

ce2acb0

David Pomerenke committed 13 days ago

Translation both from and to

731eddd

David Pomerenke committed 17 days ago

Run on 100 languages, adjust display

8274634

David Pomerenke committed 24 days ago

spBLEU tokenizer, run on more languages

eaf2d97

David Pomerenke committed on Mar 25

Refactor eval code into files

da6e1bc

David Pomerenke committed on Mar 15