
## Reproducing the Evaluation Results

- CEVAL

```Shell
wget https://huggingface.co/datasets/ceval/ceval-exam/resolve/main/ceval-exam.zip
mkdir -p data/ceval
mv ceval-exam.zip data/ceval
cd data/ceval; unzip ceval-exam.zip
cd ../../

# Qwen-7B
python evaluate_ceval.py -d data/ceval/

# Qwen-7B-Chat
pip install thefuzz
python evaluate_chat_ceval.py -d data/ceval/
```
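Before a long run, it can help to confirm that the archive actually landed where `-d` points. The sketch below is a hypothetical helper, not part of the repository. The `dev/`, `val/` and `test/` subfolder names are an assumption about how `ceval-exam.zip` unpacks, so adjust them if your copy differs.

```Shell
# Hypothetical helper (not part of the repository): verify that a dataset
# directory contains the expected subfolders before starting a long run.
# Usage: check_dataset_dirs DIR SUBDIR [SUBDIR ...]
check_dataset_dirs() {
  local root="$1"
  shift
  local missing=0 sub
  for sub in "$@"; do
    if [ ! -d "$root/$sub" ]; then
      echo "missing: $root/$sub" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Assumed C-Eval layout: dev/, val/ and test/ directly under data/ceval/.
# check_dataset_dirs data/ceval dev val test && python evaluate_ceval.py -d data/ceval/
```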

- MMLU

```Shell
wget https://people.eecs.berkeley.edu/~hendrycks/data.tar
mkdir -p data/mmlu
mv data.tar data/mmlu
cd data/mmlu; tar xf data.tar
cd ../../

# Qwen-7B
python evaluate_mmlu.py -d data/mmlu/data/

# Qwen-7B-Chat
pip install thefuzz
python evaluate_chat_mmlu.py -d data/mmlu/data/
```
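The `cd` round trips above can also be avoided with `tar -C`, which extracts straight into a target directory. The following sketch uses a hypothetical `extract_once` helper (not part of the repository) that skips extraction when a marker path already exists, so the setup can be re-run safely. It assumes `unzip` is installed when it is given a `.zip` archive.

```Shell
# Hypothetical helper (not part of the repository): extract ARCHIVE into DEST
# only if DEST/MARKER does not exist yet, so re-running setup is harmless.
# Usage: extract_once ARCHIVE DEST MARKER
extract_once() {
  local archive="$1" dest="$2" marker="$3"
  if [ -e "$dest/$marker" ]; then
    echo "skip: $dest/$marker already exists"
    return 0
  fi
  mkdir -p "$dest"
  case "$archive" in
    *.tar) tar -xf "$archive" -C "$dest" ;;
    *.zip) unzip -q -o "$archive" -d "$dest" ;;
    *) echo "unsupported archive: $archive" >&2; return 1 ;;
  esac
}

# MMLU: data.tar unpacks into a data/ folder, hence the data/mmlu/data/ path above.
# extract_once data.tar data/mmlu data
```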

- CMMLU

```Shell
wget https://huggingface.co/datasets/haonan-li/cmmlu/resolve/main/cmmlu_v1_0_1.zip
mkdir -p data/cmmlu
mv cmmlu_v1_0_1.zip data/cmmlu
cd data/cmmlu; unzip cmmlu_v1_0_1.zip
cd ../../

# Qwen-7B
python evaluate_cmmlu.py -d data/cmmlu/
```
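The three multiple-choice benchmarks above share the same `-d` interface for the base model, so they can be chained in one pass. This is a minimal sketch; the `run` and `run_base_evals` functions and the `DRY_RUN` variable are hypothetical names, not part of the repository.

```Shell
# Hypothetical wrapper (not part of the repository): run the base-model
# (Qwen-7B) multiple-choice evaluations one after another.
# Set DRY_RUN=1 to print the commands instead of executing them.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run_base_evals() {
  run python evaluate_ceval.py -d data/ceval/ &&
    run python evaluate_mmlu.py -d data/mmlu/data/ &&
    run python evaluate_cmmlu.py -d data/cmmlu/
}

# DRY_RUN=1 run_base_evals   # preview the three commands
# run_base_evals             # execute them
```

Chaining with `&&` stops at the first failing script, so a missing dataset does not silently skip a benchmark.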

- HumanEval

Get the `HumanEval.jsonl` file from [here](https://github.com/openai/human-eval/tree/master/data).

```Shell
git clone https://github.com/openai/human-eval
pip install -e human-eval

# Qwen-7B
python evaluate_humaneval.py -f HumanEval.jsonl -o HumanEval_res.jsonl
evaluate_functional_correctness HumanEval_res.jsonl

# Qwen-7B-Chat
python evaluate_chat_humaneval.py -f HumanEval.jsonl -o HumanEval_res_chat.jsonl
evaluate_functional_correctness HumanEval_res_chat.jsonl
```

When installing the human-eval package, note the following disclaimer from its authors:

> This program exists to run untrusted model-generated code. Users are strongly encouraged not to do so outside of a robust security sandbox. The execution call in execution.py is deliberately commented out to ensure users read this disclaimer before running code in a potentially unsafe manner. See the comment in execution.py for more information and instructions.
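`evaluate_functional_correctness` scores a results file against the HumanEval problem set, so a truncated results file is worth catching before scoring. The sketch below is a hypothetical pre-check, not part of the repository. It assumes the generation scripts write one completion per problem; HumanEval has 164 problems. In the upstream human-eval repository the problems ship gzipped as `data/HumanEval.jsonl.gz`, so the comment at the end shows one way to obtain the plain `HumanEval.jsonl`.

```Shell
# Hypothetical pre-check (not part of the repository). HumanEval has 164
# problems; assuming one completion per problem, the results file should
# hold exactly that many non-empty JSONL lines before it is scored.
# Usage: check_sample_count FILE [EXPECTED]
check_sample_count() {
  local file="$1" expected="${2:-164}" n
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  # grep -c . counts non-empty lines (it exits 1 when the count is 0).
  n=$(grep -c . "$file") || true
  if [ "$n" -ne "$expected" ]; then
    echo "$file: expected $expected records, found $n" >&2
    return 1
  fi
}

# gzip -dc human-eval/data/HumanEval.jsonl.gz > HumanEval.jsonl
# check_sample_count HumanEval_res.jsonl && evaluate_functional_correctness HumanEval_res.jsonl
```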

- GSM8K

```Shell
# Qwen-7B
python evaluate_gsm8k.py

# Qwen-7B-Chat
python evaluate_chat_gsm8k.py  # zero-shot
python evaluate_chat_gsm8k.py --use-fewshot  # few-shot
```
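To keep the three GSM8K settings apart, each run's console output can be written to its own log file. A minimal sketch, assuming a hypothetical `run_gsm8k` function, hypothetical mode names (`base`, `chat-zeroshot`, `chat-fewshot`) and a hypothetical `LOG_DIR` variable, none of which come from the repository:

```Shell
# Hypothetical wrapper (not part of the repository): run one GSM8K setting
# from the commands above and save its console output under LOG_DIR.
# Set DRY_RUN=1 to print the command instead of running it.
run_gsm8k() {
  local mode="$1" log_dir="${LOG_DIR:-logs}"
  case "$mode" in
    base)          set -- python evaluate_gsm8k.py ;;
    chat-zeroshot) set -- python evaluate_chat_gsm8k.py ;;
    chat-fewshot)  set -- python evaluate_chat_gsm8k.py --use-fewshot ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
    return 0
  fi
  mkdir -p "$log_dir"
  # Plain redirection keeps the script's exit status as the return value.
  "$@" > "$log_dir/gsm8k_$mode.log" 2>&1
}
```

For example, `for mode in base chat-zeroshot chat-fewshot; do run_gsm8k "$mode"; done` runs all three settings in turn. Redirecting instead of piping through `tee` means a failing evaluation is reported by the function's return value rather than masked by the pipe.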

- PLUGIN

This script reproduces the ReAct and Hugging Face Agent results reported in the Tool Usage section of the README.

```Shell
# Qwen-7B-Chat
mkdir -p data
cd data
wget https://qianwen-res.oss-cn-beijing.aliyuncs.com/opensource_data/exam_plugin_v1/exam_plugin_v1_react_positive.jsonl
wget https://qianwen-res.oss-cn-beijing.aliyuncs.com/opensource_data/exam_plugin_v1/exam_plugin_v1_react_negative.jsonl
cd ..
pip install json5 jsonlines rouge_score
python evaluate_plugin.py --eval-react-positive --eval-react-negative --eval-hfagent
```
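If a download fails, `evaluate_plugin.py` would start without its ReAct input files. The sketch below is a hypothetical pre-flight check, not part of the repository, that confirms both files are present and non-empty. The commented download loop uses `wget -P` to write into `data/` without changing directory and `wget -nc` to skip files that already exist.

```Shell
# Hypothetical pre-flight check (not part of the repository): confirm that
# both ReAct test files downloaded above exist and are non-empty in DIR.
PLUGIN_FILES="exam_plugin_v1_react_positive.jsonl exam_plugin_v1_react_negative.jsonl"

plugin_files_present() {
  local dir="${1:-data}" f ok=0
  for f in $PLUGIN_FILES; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      ok=1
    fi
  done
  return "$ok"
}

# base=https://qianwen-res.oss-cn-beijing.aliyuncs.com/opensource_data/exam_plugin_v1
# for f in $PLUGIN_FILES; do wget -nc -P data "$base/$f"; done
# plugin_files_present data && python evaluate_plugin.py --eval-react-positive --eval-react-negative --eval-hfagent
```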