TITLE = """<h1 align="center" id="space-title">Online Mind2Web Leaderboard</h1>"""
INTRODUCTION_TEXT = """
Online Mind2Web is a benchmark designed to evaluate the real-world performance of web agents on online websites.
[[Blog]]() [[Paper]]() [[Code]]() [[Data]]()
## Tasks
Online Mind2Web includes 300 tasks from 136 popular websites across various domains. It covers a diverse set of user tasks to evaluate agents' performance in real-world environments.
Tasks are categorized into three difficulty levels based on the number of steps human annotators need (see the sketch below):
- Easy: 1 - 5 steps
- Medium: 6 - 10 steps
- Hard: 11+ steps
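
A minimal sketch of this bucketing rule (an illustration for clarity; the helper name `difficulty` is ours, not part of the benchmark code):
```python
def difficulty(num_human_steps: int) -> str:
    # Easy: 1-5 steps, Medium: 6-10 steps, Hard: 11+ steps.
    if num_human_steps <= 5:
        return "Easy"
    return "Medium" if num_human_steps <= 10 else "Hard"
```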
## Leaderboard
"""
SUBMISSION_TEXT = """
## Submissions
Participants are invited to submit their agent's trajectories for evaluation. Submissions are evaluated with our auto-eval.
### Format of submission
Each submission must include a sequence of images (i.e., the screenshots in the trajectory) and a result.json file for each task. The JSON file should contain the fields "task", "task_id", and "action_history". An example of the submission files is provided for reference.
"""
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results"
CITATION_BUTTON_TEXT = r"""
Online Mind2Web"""
SUBMIT_INTRODUCTION = """
## Please submit your trajectory files in the following format:
Each task is stored in a folder named after its `task_id`, containing:
- `trajectory/`: stores screenshots of each step.
- `result.json`: task metadata and action history.
**Structure:**
```
main_directory/
└── task_id/
    ├── result.json
    └── trajectory/
        ├── 0_screenshot.png
        ├── 1_screenshot.png
        └── ...
```
**`result.json` format:**
```json
{
    "task_id": 123,
    "task": "abc",
    "action_history": ["abc", "xyz", "..."]
}
```
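
For reference, here is a minimal sketch that writes one task folder in this layout (an illustration only; the helper name `save_task` is hypothetical, not required tooling):
```python
import json
from pathlib import Path

def save_task(root, task_id, task, action_history):
    # Create main_directory/<task_id>/ with result.json and a trajectory/ directory.
    # Screenshots go into trajectory/ as 0_screenshot.png, 1_screenshot.png, ...
    task_dir = Path(root) / str(task_id)
    (task_dir / "trajectory").mkdir(parents=True, exist_ok=True)
    with open(task_dir / "result.json", "w") as f:
        json.dump(
            {"task_id": task_id, "task": task, "action_history": action_history},
            f,
            indent=2,
        )
```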
Please send your agent's name, model family, and organization via email to [email protected], with the trajectory directory attached.
We will run the auto-evaluation. If you have conducted your own human evaluation, please attach those results as well; we will spot-check them before adding them to the human-eval table.
"""
DATA_DATASET = """## More Statistics for Online Mind2Web Benchmark
"""
def format_error(msg):
    # Red, centered message for errors (e.g., malformed submissions).
    return f"<p style='color: red; font-size: 20px; text-align: center;'>{msg}</p>"

def format_warning(msg):
    # Orange, centered message for non-fatal warnings.
    return f"<p style='color: orange; font-size: 20px; text-align: center;'>{msg}</p>"

def format_log(msg):
    # Green, centered message for success/status logs.
    return f"<p style='color: green; font-size: 20px; text-align: center;'>{msg}</p>"
def model_hyperlink(link, model_name):
    # Dotted-underline link used to render model names in the leaderboard table.
    return f'<a target="_blank" href="{link}" style="color: var(--link-text-color); text-decoration: underline; text-decoration-style: dotted;">{model_name}</a>'
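
# Illustrative usage of the helpers above (a sketch; in the Space these strings are
# assumed to be rendered through Gradio HTML/Markdown components):
if __name__ == "__main__":
    print(format_error("result.json is missing the 'action_history' field"))
    print(format_log("Submission received"))
    print(model_hyperlink("https://huggingface.co/spaces", "Example Agent"))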