Files changed (1)
  1. README.md +34 -20
README.md CHANGED
@@ -1,21 +1,35 @@
- ---
- license: apache-2.0
- datasets:
- - Mielikki/Erebus-87k
- base_model:
- - Qwen/Qwen2.5-7B
- ---
-
- ## Qwen2.5 7b GRPO RM Train (Writing Demo)
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6491e00e057b0928b3e07b75/5zmLnxtUy1j5NehNAZ2wA.png)
-
- This is a base model that has had an experimental reward model RL training done over it for a subset of the Erebus dataset (creative writing).
-
- ## Model Output Example (from 768 token prefix)
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6491e00e057b0928b3e07b75/LHDsJ6P4OUnCI1Fq-PS2H.png)
-
- ## Other
- Reward function files can be found here:
- [verifiers](https://wandb.ai/kalomaze/verifiers-verifiers_examples/runs/gbgsnilw/files/run_files_20250319_153009/verifiers)
-
+ ---
+ license: apache-2.0
+ datasets:
+ - Mielikki/Erebus-87k
+ base_model:
+ - Qwen/Qwen2.5-7B
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ ---
+
+ ## Qwen2.5 7b GRPO RM Train (Writing Demo)
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6491e00e057b0928b3e07b75/5zmLnxtUy1j5NehNAZ2wA.png)
+
+ This is a base model that has had an experimental reward model RL training done over it for a subset of the Erebus dataset (creative writing).
+
+ ## Model Output Example (from 768 token prefix)
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6491e00e057b0928b3e07b75/LHDsJ6P4OUnCI1Fq-PS2H.png)
+
+ ## Other
+ Reward function files can be found here:
+ [verifiers](https://wandb.ai/kalomaze/verifiers-verifiers_examples/runs/gbgsnilw/files/run_files_20250319_153009/verifiers)
+
  This model was trained using my chunked pref reward model baseline: [pretrain-rm-baseline-7b](https://huggingface.co/Quest-AI/pretrain-rm-baseline-7b)