Quest-AI/qwen-writerdemo-7b-s500

Improve language tag

#1

by lbourdois - opened 9 days ago

base: refs/heads/main ← from: refs/pr/1

Files changed (1)

README.md +34 -20

@@ -1,21 +1,35 @@
----
-license: apache-2.0
-datasets:
-- Mielikki/Erebus-87k
-base_model:
-- Qwen/Qwen2.5-7B
----
-## Qwen2.5 7b GRPO RM Train (Writing Demo)
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6491e00e057b0928b3e07b75/5zmLnxtUy1j5NehNAZ2wA.png)
-This is a base model that has had an experimental reward model RL training done over it for a subset of the Erebus dataset (creative writing).
-## Model Output Example (from 768 token prefix)
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/6491e00e057b0928b3e07b75/LHDsJ6P4OUnCI1Fq-PS2H.png)
-## Other
-Reward function files can be found here:
-[verifiers](https://wandb.ai/kalomaze/verifiers-verifiers_examples/runs/gbgsnilw/files/run_files_20250319_153009/verifiers)
 This model was trained using my chunked pref reward model baseline: [pretrain-rm-baseline-7b](https://huggingface.co/Quest-AI/pretrain-rm-baseline-7b)

+---
+license: apache-2.0
+datasets:
+- Mielikki/Erebus-87k
+base_model:
+- Qwen/Qwen2.5-7B
+language:
+- zho
+- eng
+- fra
+- spa
+- por
+- deu
+- ita
+- rus
+- jpn
+- kor
+- vie
+- tha
+- ara
+---
+## Qwen2.5 7b GRPO RM Train (Writing Demo)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6491e00e057b0928b3e07b75/5zmLnxtUy1j5NehNAZ2wA.png)
+This is a base model that has had an experimental reward model RL training done over it for a subset of the Erebus dataset (creative writing).
+## Model Output Example (from 768 token prefix)
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/6491e00e057b0928b3e07b75/LHDsJ6P4OUnCI1Fq-PS2H.png)
+## Other
+Reward function files can be found here:
+[verifiers](https://wandb.ai/kalomaze/verifiers-verifiers_examples/runs/gbgsnilw/files/run_files_20250319_153009/verifiers)
 This model was trained using my chunked pref reward model baseline: [pretrain-rm-baseline-7b](https://huggingface.co/Quest-AI/pretrain-rm-baseline-7b)
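
Review note: the only substantive change in this diff is the new `language:` list in the YAML front matter. A minimal sketch of how that list could be sanity-checked is below, assuming the front matter is a flat set of top-level keys with `- item` block lists, as it is here. The helper `front_matter_list` and the three-letter shape check are hypothetical, written for illustration with the standard library only; the check confirms the form of the codes, not that each one is a registered language code.

```python
import re

# Front matter copied from the "+" side of the diff, plus the first heading.
README_HEAD = """---
license: apache-2.0
datasets:
- Mielikki/Erebus-87k
base_model:
- Qwen/Qwen2.5-7B
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---
## Qwen2.5 7b GRPO RM Train (Writing Demo)
"""


def front_matter_list(text, key):
    """Return the items of a top-level `key:` block list in the leading front matter.

    Hypothetical helper: handles only the flat layout used in this README,
    not general YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no front matter at the top of the file
    items, in_key = [], False
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the front matter
            break
        if line.startswith("- ") and in_key:
            items.append(line[2:].strip())
        else:
            # Any non-item line either opens the requested key or closes it.
            in_key = line.strip() == f"{key}:"
    return items


languages = front_matter_list(README_HEAD, "language")
# Shape check only (assumption): three lowercase ASCII letters per code.
malformed = [code for code in languages if not re.fullmatch(r"[a-z]{3}", code)]
print(len(languages), malformed)
```

The same helper reads the other list-valued keys in this file, for example `front_matter_list(README_HEAD, "datasets")`.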