lazychih114 committed
Commit · 5ec6899
1 Parent(s): d2703ad

update gaia

- README.md +7 -2
- README_zh.md +8 -2
README.md
CHANGED
@@ -172,9 +172,14 @@ Example tasks you can try:
 
 # 🧪 Experiments
 
-
-You can check the `run_gaia_roleplaying.py` file and run the following command:
+To reproduce OWL's GAIA benchmark score of 58.18:
 
+1. Switch to the `gaia58.18` branch:
+```bash
+git checkout gaia58.18
+```
+
+1. Run the evaluation script:
 ```bash
 python run_gaia_roleplaying.py
 ```
README_zh.md
CHANGED
@@ -164,9 +164,15 @@ logger.success(f"Answer: {answer}")
 - "Summarize the main points of this research paper: [paper URL]"
 # 🧪 Experiments
 
-We provide a script for reproducing the experimental results on GAIA.
-
+We provide a script for reproducing the experimental results on GAIA.
+To reproduce our score of 58.18 on the GAIA benchmark:
 
+1. Switch to the `gaia58.18` branch:
+```bash
+git checkout gaia58.18
+```
+
+2. Run the evaluation script:
 ```bash
 python run_gaia_roleplaying.py
 ```
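The two steps added by this commit (switch to the `gaia58.18` branch, then run `run_gaia_roleplaying.py`) could be wrapped in a small shell helper. The following is only a minimal sketch, not part of the commit: the function names `run` and `reproduce_gaia` and the `DRY_RUN` variable are hypothetical, and it assumes the current directory is a local clone that has the `gaia58.18` branch and that `python` resolves to an environment with the project's dependencies installed.

```bash
# Hypothetical helper, not part of the repository.
BRANCH="gaia58.18"
SCRIPT="run_gaia_roleplaying.py"

# Run a command, or only print it when DRY_RUN=1 (illustrative variable).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: switch branches. Step 2: run the evaluation script.
# The script is started only if the checkout succeeds.
reproduce_gaia() {
    run git checkout "$BRANCH" && run python "$SCRIPT"
}
```

Calling `DRY_RUN=1 reproduce_gaia` prints the two commands without executing them, which is a cheap way to check the sequence before starting a long evaluation run. Calling `reproduce_gaia` on its own executes them.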