@ZennyKenny on Hugging Face: "Besides being the coolest named benchmark in the game, HellaSwag is an…"

Hugging Face

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Back to feed

ZennyKenny

posted an update Mar 26

Post

1939

Besides being the coolest named benchmark in the game, HellaSwag is an important measurement of здравый смысль (or common sense) in LLMs.

- More on HellaSwag: https://github.com/rowanz/hellaswag

I spent the afternoon benchmarking YandexGPT Pro 4th Gen, one of the Russian tech giant's premier models.

- Yandex HF Org:

yandex
- More on Yandex models: https://yandex.cloud/ru/docs/foundation-models/concepts/yandexgpt/models

The eval notebook is available on GitHub and the resulting dataset is already on the HF Hub!

- Eval Notebook: https://github.com/kghamilton89/ai-explorer/blob/main/yandex-hellaswag/hellaswag-assess.ipynb
- Eval Dataset: ZennyKenny/yandexgptpro_4th_gen-hellaswag

And of course, everyone wants to see the results so have a look at the results in the context of other zero-shot experiments that I was able to find!

JLouisBiz

Mar 26

Sadly, not a free software license

ZennyKenny

Mar 27

Definitely true about the underlying model, but the eval dataset and notebook are!

In this post