from dataclasses import dataclass
from enum import Enum


@dataclass
class Task:
    benchmark: str
    accuracy: str
    col_name: str
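
# Illustrative sketch, not the actual Evalita-LLM configuration: in the common
# Hugging Face leaderboard template, Task instances are grouped into an Enum
# (hence the `Enum` import above). The benchmark key, metric value, and column
# name below are placeholder assumptions.
class ExampleTasks(Enum):
    # Task(benchmark key in the result files, accuracy/metric field, leaderboard column name)
    task0 = Task("evalita_te", "acc", "TE")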
NUM_FEWSHOT = 0
TITLE = """<h1 align="center" id="space-title">🚀 EVALITA-LLM Leaderboard 🚀</h1>"""
INTRODUCTION_TEXT = """
Evalita-LLM is a new benchmark designed to evaluate Large Language Models (LLMs) on Italian tasks. Its distinguishing and innovative features are the following: (i) all tasks are native Italian, avoiding translation issues and potential cultural biases; (ii) in addition to well-established multiple-choice tasks, the benchmark includes generative tasks, enabling more natural interaction with LLMs; (iii) all tasks are evaluated against multiple prompts, thus mitigating model sensitivity to specific prompts and allowing a fairer and more objective evaluation.
"""
TE_DESCRIPTION = """### Textual Entailment (TE)
The input consists of two sentences: the text (T) and the hypothesis (H). The model has to determine whether the meaning of the hypothesis is logically entailed by the text.

| # | Prompt | Answer Choices |
|-----|--------|----------------|
| 1 | La frase: '{{text1}}' implica logicamente che la frase: '{{text2}}' sia vera? | ["Sì", "No"] |
| 2 | Devi risolvere un compito di inferenza semantica. La frase: '{{text1}}' implica logicamente che la frase: '{{text2}}' sia vera? | ["Sì", "No"] |
| 3 | La frase: '{{text1}}' implica logicamente che la frase: '{{text2}}' sia vera?\\nA: Sì\\nB: No\\nRisposta: | ["A", "B"] |
| 4 | Devi risolvere un compito di inferenza semantica. La frase: '{{text1}}' implica logicamente che la frase: '{{text2}}' sia vera?\\nA: Sì\\nB: No\\nRisposta: | ["A", "B"] |
| 5 | Frase 1: '{{text1}}' Frase 2: '{{text2}}' | ["La frase 1 implica logicamente che la frase 2 sia vera", "La frase 1 non implica logicamente che la frase 2 sia vera"] |
| 6 | Devi risolvere un compito di inferenza semantica. Frase 1: '{{text1}}' Frase 2: '{{text2}}' | ["La frase 1 implica logicamente che la frase 2 sia vera", "La frase 1 non implica logicamente che la frase 2 sia vera"] |

Combined Performance = (1 - (Best_Prompt - Prompt_Average) / 100) * Best_Prompt

- **Prompt Average**: accuracy averaged over the six prompts above.
- **Best Prompt**: accuracy of the best-performing prompt.
- **Prompt ID**: ID of the best prompt (see the prompt table above).
"""
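
# A minimal sketch, assuming per-prompt accuracies on a 0-100 scale, of the
# Combined Performance score described in TE_DESCRIPTION above. The function
# name and signature are illustrative, not part of the leaderboard code.
def combined_performance(prompt_accuracies: list[float]) -> float:
    """Combined Performance = (1 - (Best_Prompt - Prompt_Average) / 100) * Best_Prompt."""
    prompt_average = sum(prompt_accuracies) / len(prompt_accuracies)
    best_prompt = max(prompt_accuracies)
    return (1 - (best_prompt - prompt_average) / 100) * best_prompt

# Example with illustrative accuracies for the six prompts:
# combined_performance([62.0, 58.5, 70.0, 66.0, 61.0, 64.5]) ≈ 65.6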