RahhulG committed on
Commit d992024 · verified · 1 Parent(s): f184af7

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,202 @@
1
+ ---
2
+ base_model: openlm-research/open_llama_3b
3
+ library_name: peft
4
+ ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Funded by [optional]:** [More Information Needed]
22
+ - **Shared by [optional]:** [More Information Needed]
23
+ - **Model type:** [More Information Needed]
24
+ - **Language(s) (NLP):** [More Information Needed]
25
+ - **License:** [More Information Needed]
26
+ - **Finetuned from model [optional]:** [More Information Needed]
27
+
28
+ ### Model Sources [optional]
29
+
30
+ <!-- Provide the basic links for the model. -->
31
+
32
+ - **Repository:** [More Information Needed]
33
+ - **Paper [optional]:** [More Information Needed]
34
+ - **Demo [optional]:** [More Information Needed]
35
+
36
+ ## Uses
37
+
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
+
40
+ ### Direct Use
41
+
42
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
+
44
+ [More Information Needed]
45
+
46
+ ### Downstream Use [optional]
47
+
48
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
+
50
+ [More Information Needed]
51
+
52
+ ### Out-of-Scope Use
53
+
54
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
+
56
+ [More Information Needed]
57
+
58
+ ## Bias, Risks, and Limitations
59
+
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
+
62
+ [More Information Needed]
63
+
64
+ ### Recommendations
65
+
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
+
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
+
76
+ ## Training Details
77
+
78
+ ### Training Data
79
+
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ [More Information Needed]
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+
88
+ #### Preprocessing [optional]
89
+
90
+ [More Information Needed]
91
+
92
+
93
+ #### Training Hyperparameters
94
+
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
+
97
+ #### Speeds, Sizes, Times [optional]
98
+
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
+
101
+ [More Information Needed]
102
+
103
+ ## Evaluation
104
+
105
+ <!-- This section describes the evaluation protocols and provides the results. -->
106
+
107
+ ### Testing Data, Factors & Metrics
108
+
109
+ #### Testing Data
110
+
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
+
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
+
121
+ #### Metrics
122
+
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
+
127
+ ### Results
128
+
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
200
+ ### Framework versions
201
+
202
+ - PEFT 0.15.2
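The model card above leaves "How to Get Started with the Model" as a placeholder. Going only by its front matter (`base_model: openlm-research/open_llama_3b`, `library_name: peft`) and the LoRA adapter files in this commit, a minimal loading sketch might look like the following; the adapter path is a placeholder, since the Hub repository id is not stated in this commit.

```python
# Minimal sketch (not from the model card): load the base OpenLLaMA-3B model and
# apply this LoRA adapter with transformers + peft. The adapter path is a
# placeholder for a local clone or the Hub repo id of this repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "openlm-research/open_llama_3b"   # from the card's base_model field
adapter_path = "path/to/this-adapter"       # placeholder, not stated in the commit

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base_model, adapter_path)  # applies the LoRA weights

prompt = "Q: What is a LoRA adapter?\nA:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```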
adapter_config.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "openlm-research/open_llama_3b",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 16,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0.05,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": null,
22
+ "peft_type": "LORA",
23
+ "r": 8,
24
+ "rank_pattern": {},
25
+ "revision": null,
26
+ "target_modules": [
27
+ "v_proj",
28
+ "q_proj"
29
+ ],
30
+ "task_type": "CAUSAL_LM",
31
+ "trainable_token_indices": null,
32
+ "use_dora": false,
33
+ "use_rslora": false
34
+ }
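For reference, the adapter_config.json above corresponds to a `peft` `LoraConfig` with rank 8, alpha 16, dropout 0.05, and LoRA applied only to the attention `q_proj` and `v_proj` modules. A sketch of the equivalent config; fields not listed in the JSON are left at their peft defaults.

```python
# Sketch only: a peft LoraConfig mirroring the values in adapter_config.json above.
from peft import LoraConfig, TaskType

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,         # "task_type": "CAUSAL_LM"
    r=8,                                  # "r": 8
    lora_alpha=16,                        # "lora_alpha": 16
    lora_dropout=0.05,                    # "lora_dropout": 0.05
    bias="none",                          # "bias": "none"
    target_modules=["q_proj", "v_proj"],  # attention query/value projections only
)
```

Adapting only the query and value projections at rank 8 keeps the trainable weights small, which is consistent with the roughly 10.7 MB `adapter_model.safetensors` pointer recorded below.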
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:607c3f2ed0bcf7175d2653b204b1d9456d5338a559ae6f2b0882238f2a4d2ae0
3
+ size 10663320
checkpoint-500/README.md ADDED
@@ -0,0 +1,202 @@
1
+ ---
2
+ base_model: openlm-research/open_llama_3b
3
+ library_name: peft
4
+ ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Funded by [optional]:** [More Information Needed]
22
+ - **Shared by [optional]:** [More Information Needed]
23
+ - **Model type:** [More Information Needed]
24
+ - **Language(s) (NLP):** [More Information Needed]
25
+ - **License:** [More Information Needed]
26
+ - **Finetuned from model [optional]:** [More Information Needed]
27
+
28
+ ### Model Sources [optional]
29
+
30
+ <!-- Provide the basic links for the model. -->
31
+
32
+ - **Repository:** [More Information Needed]
33
+ - **Paper [optional]:** [More Information Needed]
34
+ - **Demo [optional]:** [More Information Needed]
35
+
36
+ ## Uses
37
+
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
+
40
+ ### Direct Use
41
+
42
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
+
44
+ [More Information Needed]
45
+
46
+ ### Downstream Use [optional]
47
+
48
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
+
50
+ [More Information Needed]
51
+
52
+ ### Out-of-Scope Use
53
+
54
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
+
56
+ [More Information Needed]
57
+
58
+ ## Bias, Risks, and Limitations
59
+
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
+
62
+ [More Information Needed]
63
+
64
+ ### Recommendations
65
+
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
+
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
+
76
+ ## Training Details
77
+
78
+ ### Training Data
79
+
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ [More Information Needed]
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+
88
+ #### Preprocessing [optional]
89
+
90
+ [More Information Needed]
91
+
92
+
93
+ #### Training Hyperparameters
94
+
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
+
97
+ #### Speeds, Sizes, Times [optional]
98
+
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
+
101
+ [More Information Needed]
102
+
103
+ ## Evaluation
104
+
105
+ <!-- This section describes the evaluation protocols and provides the results. -->
106
+
107
+ ### Testing Data, Factors & Metrics
108
+
109
+ #### Testing Data
110
+
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
+
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
+
121
+ #### Metrics
122
+
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
+
127
+ ### Results
128
+
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
200
+ ### Framework versions
201
+
202
+ - PEFT 0.15.2
checkpoint-500/adapter_config.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "openlm-research/open_llama_3b",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 16,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0.05,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": null,
22
+ "peft_type": "LORA",
23
+ "r": 8,
24
+ "rank_pattern": {},
25
+ "revision": null,
26
+ "target_modules": [
27
+ "v_proj",
28
+ "q_proj"
29
+ ],
30
+ "task_type": "CAUSAL_LM",
31
+ "trainable_token_indices": null,
32
+ "use_dora": false,
33
+ "use_rslora": false
34
+ }
checkpoint-500/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:869ce8cb8525e2e9d25141da07e7e2fd7f96b6e826dd4bd204dbf60b5653c067
3
+ size 10663320
checkpoint-500/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:82db044f769362afc41ac564211afeeb387d8a43bdbd7e6bcde3f11f6f08d480
3
+ size 21386746
checkpoint-500/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f37c40ce327861a7ca13b719d3aa37510a143368b6e74358bdb14becb3899e1e
3
+ size 14244
checkpoint-500/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c40b1ef83e0ab438de8d7316f6b9e11585b931afa2859bc18f6b2472417e80ad
3
+ size 1064
checkpoint-500/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": true,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "</s>",
11
+ "lstrip": false,
12
+ "normalized": true,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "</s>",
17
+ "unk_token": {
18
+ "content": "<unk>",
19
+ "lstrip": false,
20
+ "normalized": true,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
checkpoint-500/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-500/tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ab1b681ec7fc02fed5edd3026687d7a692a918c4dd8e150ca2e3994a6229843b
3
+ size 534194
checkpoint-500/tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "add_prefix_space": null,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<unk>",
8
+ "lstrip": false,
9
+ "normalized": true,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<s>",
16
+ "lstrip": false,
17
+ "normalized": true,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "</s>",
24
+ "lstrip": false,
25
+ "normalized": true,
26
+ "rstrip": false,
27
+ "single_word": false,
28
+ "special": true
29
+ }
30
+ },
31
+ "bos_token": "<s>",
32
+ "clean_up_tokenization_spaces": false,
33
+ "eos_token": "</s>",
34
+ "extra_special_tokens": {},
35
+ "legacy": true,
36
+ "model_max_length": 2048,
37
+ "pad_token": "</s>",
38
+ "sp_model_kwargs": {},
39
+ "spaces_between_special_tokens": false,
40
+ "tokenizer_class": "LlamaTokenizer",
41
+ "unk_token": "<unk>",
42
+ "use_default_system_prompt": false
43
+ }
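The tokenizer files saved with this checkpoint are the stock `LlamaTokenizer` setup for OpenLLaMA: a BOS token is prepended, no EOS is appended, the pad token reuses `</s>`, and `model_max_length` is 2048. A small sketch of how those settings surface when loading from a local clone (the path is a placeholder):

```python
# Sketch: load the tokenizer saved with this checkpoint and check the settings
# declared in tokenizer_config.json. "checkpoint-500" is a placeholder local path.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("checkpoint-500")

print(tok.bos_token, tok.eos_token, tok.pad_token)  # <s> </s> </s>  (pad reuses EOS)
print(tok.model_max_length)                         # 2048
ids = tok("hello")["input_ids"]
print(ids[0] == tok.bos_token_id)                   # True: add_bos_token is enabled
```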
checkpoint-500/trainer_state.json ADDED
@@ -0,0 +1,784 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 4.0,
6
+ "eval_steps": 500,
7
+ "global_step": 500,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.08048289738430583,
14
+ "grad_norm": 1.8390964269638062,
15
+ "learning_rate": 1.976e-05,
16
+ "logits/chosen": -7.928730010986328,
17
+ "logits/rejected": -7.768202304840088,
18
+ "logps/chosen": -126.51554870605469,
19
+ "logps/rejected": -141.75454711914062,
20
+ "loss": 0.6957,
21
+ "rewards/accuracies": 0.4000000059604645,
22
+ "rewards/chosen": -0.002580838045105338,
23
+ "rewards/margins": -0.004946361295878887,
24
+ "rewards/rejected": 0.0023655227851122618,
25
+ "step": 10
26
+ },
27
+ {
28
+ "epoch": 0.16096579476861167,
29
+ "grad_norm": 2.1244845390319824,
30
+ "learning_rate": 1.9493333333333335e-05,
31
+ "logits/chosen": -8.07243537902832,
32
+ "logits/rejected": -8.191374778747559,
33
+ "logps/chosen": -140.3431854248047,
34
+ "logps/rejected": -127.8196029663086,
35
+ "loss": 0.6941,
36
+ "rewards/accuracies": 0.4749999940395355,
37
+ "rewards/chosen": 0.0015918923309072852,
38
+ "rewards/margins": -0.0016587398713454604,
39
+ "rewards/rejected": 0.0032506324350833893,
40
+ "step": 20
41
+ },
42
+ {
43
+ "epoch": 0.2414486921529175,
44
+ "grad_norm": 2.229215383529663,
45
+ "learning_rate": 1.922666666666667e-05,
46
+ "logits/chosen": -7.767237663269043,
47
+ "logits/rejected": -7.859239101409912,
48
+ "logps/chosen": -140.90402221679688,
49
+ "logps/rejected": -136.17660522460938,
50
+ "loss": 0.6931,
51
+ "rewards/accuracies": 0.4749999940395355,
52
+ "rewards/chosen": 0.0016952038276940584,
53
+ "rewards/margins": 0.0003137874882668257,
54
+ "rewards/rejected": 0.001381416223011911,
55
+ "step": 30
56
+ },
57
+ {
58
+ "epoch": 0.32193158953722334,
59
+ "grad_norm": 1.4213330745697021,
60
+ "learning_rate": 1.896e-05,
61
+ "logits/chosen": -7.892869472503662,
62
+ "logits/rejected": -7.617051124572754,
63
+ "logps/chosen": -121.54151916503906,
64
+ "logps/rejected": -132.56336975097656,
65
+ "loss": 0.6901,
66
+ "rewards/accuracies": 0.574999988079071,
67
+ "rewards/chosen": 0.009255774319171906,
68
+ "rewards/margins": 0.006459495518356562,
69
+ "rewards/rejected": 0.0027962785679847,
70
+ "step": 40
71
+ },
72
+ {
73
+ "epoch": 0.4024144869215292,
74
+ "grad_norm": 2.9331471920013428,
75
+ "learning_rate": 1.8693333333333333e-05,
76
+ "logits/chosen": -7.751120090484619,
77
+ "logits/rejected": -7.877864837646484,
78
+ "logps/chosen": -125.05207824707031,
79
+ "logps/rejected": -139.2479248046875,
80
+ "loss": 0.6888,
81
+ "rewards/accuracies": 0.699999988079071,
82
+ "rewards/chosen": 0.013416772708296776,
83
+ "rewards/margins": 0.008954327553510666,
84
+ "rewards/rejected": 0.004462444689124823,
85
+ "step": 50
86
+ },
87
+ {
88
+ "epoch": 0.482897384305835,
89
+ "grad_norm": 1.9477262496948242,
90
+ "learning_rate": 1.8426666666666668e-05,
91
+ "logits/chosen": -7.746337890625,
92
+ "logits/rejected": -7.790966987609863,
93
+ "logps/chosen": -108.3724365234375,
94
+ "logps/rejected": -141.29953002929688,
95
+ "loss": 0.6947,
96
+ "rewards/accuracies": 0.44999998807907104,
97
+ "rewards/chosen": 0.0028126670513302088,
98
+ "rewards/margins": -0.0029423837549984455,
99
+ "rewards/rejected": 0.005755049642175436,
100
+ "step": 60
101
+ },
102
+ {
103
+ "epoch": 0.5633802816901409,
104
+ "grad_norm": 2.476123809814453,
105
+ "learning_rate": 1.8160000000000002e-05,
106
+ "logits/chosen": -7.934849739074707,
107
+ "logits/rejected": -7.784144401550293,
108
+ "logps/chosen": -148.48281860351562,
109
+ "logps/rejected": -140.32388305664062,
110
+ "loss": 0.6895,
111
+ "rewards/accuracies": 0.5,
112
+ "rewards/chosen": 0.016439538449048996,
113
+ "rewards/margins": 0.007786408998072147,
114
+ "rewards/rejected": 0.008653131313621998,
115
+ "step": 70
116
+ },
117
+ {
118
+ "epoch": 0.6438631790744467,
119
+ "grad_norm": 1.9035556316375732,
120
+ "learning_rate": 1.7893333333333337e-05,
121
+ "logits/chosen": -7.956778526306152,
122
+ "logits/rejected": -7.880636692047119,
123
+ "logps/chosen": -134.6743621826172,
124
+ "logps/rejected": -142.11549377441406,
125
+ "loss": 0.6855,
126
+ "rewards/accuracies": 0.6499999761581421,
127
+ "rewards/chosen": 0.005495529621839523,
128
+ "rewards/margins": 0.015812702476978302,
129
+ "rewards/rejected": -0.010317172855138779,
130
+ "step": 80
131
+ },
132
+ {
133
+ "epoch": 0.7243460764587525,
134
+ "grad_norm": 2.0266778469085693,
135
+ "learning_rate": 1.762666666666667e-05,
136
+ "logits/chosen": -7.979184627532959,
137
+ "logits/rejected": -8.19025993347168,
138
+ "logps/chosen": -129.11795043945312,
139
+ "logps/rejected": -133.7857666015625,
140
+ "loss": 0.6911,
141
+ "rewards/accuracies": 0.5249999761581421,
142
+ "rewards/chosen": 0.00029927687137387693,
143
+ "rewards/margins": 0.0047690351493656635,
144
+ "rewards/rejected": -0.004469756968319416,
145
+ "step": 90
146
+ },
147
+ {
148
+ "epoch": 0.8048289738430584,
149
+ "grad_norm": 2.324070453643799,
150
+ "learning_rate": 1.736e-05,
151
+ "logits/chosen": -7.854997158050537,
152
+ "logits/rejected": -7.669028282165527,
153
+ "logps/chosen": -129.60366821289062,
154
+ "logps/rejected": -129.22799682617188,
155
+ "loss": 0.6872,
156
+ "rewards/accuracies": 0.5249999761581421,
157
+ "rewards/chosen": 0.005187320522964001,
158
+ "rewards/margins": 0.012428809888660908,
159
+ "rewards/rejected": -0.007241488434374332,
160
+ "step": 100
161
+ },
162
+ {
163
+ "epoch": 0.8853118712273642,
164
+ "grad_norm": 2.3035223484039307,
165
+ "learning_rate": 1.7093333333333335e-05,
166
+ "logits/chosen": -7.684693336486816,
167
+ "logits/rejected": -8.104289054870605,
168
+ "logps/chosen": -135.45704650878906,
169
+ "logps/rejected": -142.0008544921875,
170
+ "loss": 0.6842,
171
+ "rewards/accuracies": 0.625,
172
+ "rewards/chosen": -0.0016596360364928842,
173
+ "rewards/margins": 0.0194082073867321,
174
+ "rewards/rejected": -0.021067844703793526,
175
+ "step": 110
176
+ },
177
+ {
178
+ "epoch": 0.96579476861167,
179
+ "grad_norm": 2.0869996547698975,
180
+ "learning_rate": 1.682666666666667e-05,
181
+ "logits/chosen": -8.05040168762207,
182
+ "logits/rejected": -8.025721549987793,
183
+ "logps/chosen": -127.3593978881836,
184
+ "logps/rejected": -122.40535736083984,
185
+ "loss": 0.6837,
186
+ "rewards/accuracies": 0.550000011920929,
187
+ "rewards/chosen": -0.012504220008850098,
188
+ "rewards/margins": 0.0199703611433506,
189
+ "rewards/rejected": -0.0324745811522007,
190
+ "step": 120
191
+ },
192
+ {
193
+ "epoch": 1.040241448692153,
194
+ "grad_norm": 1.9564281702041626,
195
+ "learning_rate": 1.656e-05,
196
+ "logits/chosen": -8.038365364074707,
197
+ "logits/rejected": -8.249411582946777,
198
+ "logps/chosen": -122.29254913330078,
199
+ "logps/rejected": -139.63780212402344,
200
+ "loss": 0.628,
201
+ "rewards/accuracies": 0.6486486196517944,
202
+ "rewards/chosen": -0.008433423936367035,
203
+ "rewards/margins": 0.0293881893157959,
204
+ "rewards/rejected": -0.03782161325216293,
205
+ "step": 130
206
+ },
207
+ {
208
+ "epoch": 1.1207243460764587,
209
+ "grad_norm": 2.4709627628326416,
210
+ "learning_rate": 1.6293333333333335e-05,
211
+ "logits/chosen": -7.3343305587768555,
212
+ "logits/rejected": -7.533148288726807,
213
+ "logps/chosen": -125.78892517089844,
214
+ "logps/rejected": -131.87570190429688,
215
+ "loss": 0.6687,
216
+ "rewards/accuracies": 0.75,
217
+ "rewards/chosen": 0.01648072525858879,
218
+ "rewards/margins": 0.05134889483451843,
219
+ "rewards/rejected": -0.03486816957592964,
220
+ "step": 140
221
+ },
222
+ {
223
+ "epoch": 1.2012072434607646,
224
+ "grad_norm": 2.2346677780151367,
225
+ "learning_rate": 1.6026666666666667e-05,
226
+ "logits/chosen": -7.965804100036621,
227
+ "logits/rejected": -8.229939460754395,
228
+ "logps/chosen": -134.5069580078125,
229
+ "logps/rejected": -150.25515747070312,
230
+ "loss": 0.6559,
231
+ "rewards/accuracies": 0.875,
232
+ "rewards/chosen": 0.00035553425550460815,
233
+ "rewards/margins": 0.0779074877500534,
234
+ "rewards/rejected": -0.0775519534945488,
235
+ "step": 150
236
+ },
237
+ {
238
+ "epoch": 1.2816901408450705,
239
+ "grad_norm": 2.3957595825195312,
240
+ "learning_rate": 1.576e-05,
241
+ "logits/chosen": -7.930548667907715,
242
+ "logits/rejected": -8.198356628417969,
243
+ "logps/chosen": -127.7699203491211,
244
+ "logps/rejected": -140.8756866455078,
245
+ "loss": 0.6581,
246
+ "rewards/accuracies": 0.7250000238418579,
247
+ "rewards/chosen": 0.0083924550563097,
248
+ "rewards/margins": 0.07435399293899536,
249
+ "rewards/rejected": -0.06596153974533081,
250
+ "step": 160
251
+ },
252
+ {
253
+ "epoch": 1.3621730382293762,
254
+ "grad_norm": 2.6251115798950195,
255
+ "learning_rate": 1.5493333333333333e-05,
256
+ "logits/chosen": -8.110880851745605,
257
+ "logits/rejected": -8.14248275756836,
258
+ "logps/chosen": -124.23824310302734,
259
+ "logps/rejected": -133.23915100097656,
260
+ "loss": 0.6484,
261
+ "rewards/accuracies": 0.75,
262
+ "rewards/chosen": 0.02494233287870884,
263
+ "rewards/margins": 0.09594295918941498,
264
+ "rewards/rejected": -0.07100063562393188,
265
+ "step": 170
266
+ },
267
+ {
268
+ "epoch": 1.442655935613682,
269
+ "grad_norm": 2.497619867324829,
270
+ "learning_rate": 1.5226666666666668e-05,
271
+ "logits/chosen": -7.945010185241699,
272
+ "logits/rejected": -7.939410209655762,
273
+ "logps/chosen": -116.3001937866211,
274
+ "logps/rejected": -115.17778015136719,
275
+ "loss": 0.6531,
276
+ "rewards/accuracies": 0.800000011920929,
277
+ "rewards/chosen": 0.0008405310800299048,
278
+ "rewards/margins": 0.08627375215291977,
279
+ "rewards/rejected": -0.085433229804039,
280
+ "step": 180
281
+ },
282
+ {
283
+ "epoch": 1.5231388329979878,
284
+ "grad_norm": 2.756197690963745,
285
+ "learning_rate": 1.496e-05,
286
+ "logits/chosen": -7.959776401519775,
287
+ "logits/rejected": -7.889138698577881,
288
+ "logps/chosen": -137.98397827148438,
289
+ "logps/rejected": -134.6807403564453,
290
+ "loss": 0.6609,
291
+ "rewards/accuracies": 0.699999988079071,
292
+ "rewards/chosen": 0.0001740553416311741,
293
+ "rewards/margins": 0.06992082297801971,
294
+ "rewards/rejected": -0.06974677741527557,
295
+ "step": 190
296
+ },
297
+ {
298
+ "epoch": 1.6036217303822937,
299
+ "grad_norm": 3.126654624938965,
300
+ "learning_rate": 1.4693333333333336e-05,
301
+ "logits/chosen": -7.728564262390137,
302
+ "logits/rejected": -7.694819450378418,
303
+ "logps/chosen": -140.72019958496094,
304
+ "logps/rejected": -146.0216064453125,
305
+ "loss": 0.6504,
306
+ "rewards/accuracies": 0.75,
307
+ "rewards/chosen": 0.004489220678806305,
308
+ "rewards/margins": 0.09480009973049164,
309
+ "rewards/rejected": -0.09031088650226593,
310
+ "step": 200
311
+ },
312
+ {
313
+ "epoch": 1.6841046277665996,
314
+ "grad_norm": 2.8069660663604736,
315
+ "learning_rate": 1.4426666666666669e-05,
316
+ "logits/chosen": -8.206178665161133,
317
+ "logits/rejected": -7.909379482269287,
318
+ "logps/chosen": -133.2583465576172,
319
+ "logps/rejected": -140.89820861816406,
320
+ "loss": 0.6309,
321
+ "rewards/accuracies": 0.8999999761581421,
322
+ "rewards/chosen": 0.0008678575977683067,
323
+ "rewards/margins": 0.14069929718971252,
324
+ "rewards/rejected": -0.13983140885829926,
325
+ "step": 210
326
+ },
327
+ {
328
+ "epoch": 1.7645875251509056,
329
+ "grad_norm": 3.165580987930298,
330
+ "learning_rate": 1.416e-05,
331
+ "logits/chosen": -7.514456748962402,
332
+ "logits/rejected": -7.704525947570801,
333
+ "logps/chosen": -133.6655731201172,
334
+ "logps/rejected": -133.7900390625,
335
+ "loss": 0.6461,
336
+ "rewards/accuracies": 0.7749999761581421,
337
+ "rewards/chosen": -0.018594294786453247,
338
+ "rewards/margins": 0.1046491265296936,
339
+ "rewards/rejected": -0.12324341386556625,
340
+ "step": 220
341
+ },
342
+ {
343
+ "epoch": 1.8450704225352113,
344
+ "grad_norm": 2.8950154781341553,
345
+ "learning_rate": 1.3893333333333335e-05,
346
+ "logits/chosen": -7.968588352203369,
347
+ "logits/rejected": -7.763016700744629,
348
+ "logps/chosen": -121.14384460449219,
349
+ "logps/rejected": -138.0037384033203,
350
+ "loss": 0.6169,
351
+ "rewards/accuracies": 0.824999988079071,
352
+ "rewards/chosen": 0.04061353951692581,
353
+ "rewards/margins": 0.16627803444862366,
354
+ "rewards/rejected": -0.12566450238227844,
355
+ "step": 230
356
+ },
357
+ {
358
+ "epoch": 1.925553319919517,
359
+ "grad_norm": 2.657349109649658,
360
+ "learning_rate": 1.3626666666666668e-05,
361
+ "logits/chosen": -7.7075958251953125,
362
+ "logits/rejected": -7.8631134033203125,
363
+ "logps/chosen": -135.63150024414062,
364
+ "logps/rejected": -142.60678100585938,
365
+ "loss": 0.6269,
366
+ "rewards/accuracies": 0.824999988079071,
367
+ "rewards/chosen": 0.0013963343808427453,
368
+ "rewards/margins": 0.1436794400215149,
369
+ "rewards/rejected": -0.1422831118106842,
370
+ "step": 240
371
+ },
372
+ {
373
+ "epoch": 2.0,
374
+ "grad_norm": 1.3372068405151367,
375
+ "learning_rate": 1.3360000000000003e-05,
376
+ "logits/chosen": -8.093542098999023,
377
+ "logits/rejected": -7.9440999031066895,
378
+ "logps/chosen": -126.51072692871094,
379
+ "logps/rejected": -128.38294982910156,
380
+ "loss": 0.6052,
381
+ "rewards/accuracies": 0.7027027010917664,
382
+ "rewards/chosen": 0.015508824028074741,
383
+ "rewards/margins": 0.10147809982299805,
384
+ "rewards/rejected": -0.08596926927566528,
385
+ "step": 250
386
+ },
387
+ {
388
+ "epoch": 2.080482897384306,
389
+ "grad_norm": 2.8451366424560547,
390
+ "learning_rate": 1.3093333333333334e-05,
391
+ "logits/chosen": -7.710860252380371,
392
+ "logits/rejected": -7.813695430755615,
393
+ "logps/chosen": -131.70631408691406,
394
+ "logps/rejected": -132.4761962890625,
395
+ "loss": 0.6136,
396
+ "rewards/accuracies": 0.7749999761581421,
397
+ "rewards/chosen": 0.05115525797009468,
398
+ "rewards/margins": 0.1773657351732254,
399
+ "rewards/rejected": -0.12621048092842102,
400
+ "step": 260
401
+ },
402
+ {
403
+ "epoch": 2.160965794768612,
404
+ "grad_norm": 3.0381789207458496,
405
+ "learning_rate": 1.2826666666666667e-05,
406
+ "logits/chosen": -7.992362976074219,
407
+ "logits/rejected": -8.11289119720459,
408
+ "logps/chosen": -148.1260986328125,
409
+ "logps/rejected": -150.82127380371094,
410
+ "loss": 0.6069,
411
+ "rewards/accuracies": 0.875,
412
+ "rewards/chosen": -0.01732814498245716,
413
+ "rewards/margins": 0.19258996844291687,
414
+ "rewards/rejected": -0.20991814136505127,
415
+ "step": 270
416
+ },
417
+ {
418
+ "epoch": 2.2414486921529173,
419
+ "grad_norm": 3.104029417037964,
420
+ "learning_rate": 1.2560000000000002e-05,
421
+ "logits/chosen": -7.620333671569824,
422
+ "logits/rejected": -7.614747524261475,
423
+ "logps/chosen": -120.11073303222656,
424
+ "logps/rejected": -146.62948608398438,
425
+ "loss": 0.6026,
426
+ "rewards/accuracies": 0.824999988079071,
427
+ "rewards/chosen": 0.06914319843053818,
428
+ "rewards/margins": 0.22265009582042694,
429
+ "rewards/rejected": -0.15350690484046936,
430
+ "step": 280
431
+ },
432
+ {
433
+ "epoch": 2.3219315895372232,
434
+ "grad_norm": 2.918400526046753,
435
+ "learning_rate": 1.2293333333333335e-05,
436
+ "logits/chosen": -8.447749137878418,
437
+ "logits/rejected": -8.199603080749512,
438
+ "logps/chosen": -123.7292709350586,
439
+ "logps/rejected": -141.75552368164062,
440
+ "loss": 0.6137,
441
+ "rewards/accuracies": 0.8999999761581421,
442
+ "rewards/chosen": -0.0003969132958445698,
443
+ "rewards/margins": 0.17746230959892273,
444
+ "rewards/rejected": -0.17785921692848206,
445
+ "step": 290
446
+ },
447
+ {
448
+ "epoch": 2.402414486921529,
449
+ "grad_norm": 3.638888120651245,
450
+ "learning_rate": 1.202666666666667e-05,
451
+ "logits/chosen": -7.351523399353027,
452
+ "logits/rejected": -7.5525007247924805,
453
+ "logps/chosen": -114.2864990234375,
454
+ "logps/rejected": -128.71466064453125,
455
+ "loss": 0.5855,
456
+ "rewards/accuracies": 0.8500000238418579,
457
+ "rewards/chosen": 0.08879393339157104,
458
+ "rewards/margins": 0.24962946772575378,
459
+ "rewards/rejected": -0.16083553433418274,
460
+ "step": 300
461
+ },
462
+ {
463
+ "epoch": 2.482897384305835,
464
+ "grad_norm": 2.6371285915374756,
465
+ "learning_rate": 1.1760000000000001e-05,
466
+ "logits/chosen": -7.6833176612854,
467
+ "logits/rejected": -7.768864631652832,
468
+ "logps/chosen": -119.72042083740234,
469
+ "logps/rejected": -116.38240051269531,
470
+ "loss": 0.5752,
471
+ "rewards/accuracies": 0.8999999761581421,
472
+ "rewards/chosen": 0.0737290009856224,
473
+ "rewards/margins": 0.267278254032135,
474
+ "rewards/rejected": -0.1935492306947708,
475
+ "step": 310
476
+ },
477
+ {
478
+ "epoch": 2.563380281690141,
479
+ "grad_norm": 3.9384307861328125,
480
+ "learning_rate": 1.1493333333333334e-05,
481
+ "logits/chosen": -7.878905296325684,
482
+ "logits/rejected": -7.912691593170166,
483
+ "logps/chosen": -130.33595275878906,
484
+ "logps/rejected": -141.26541137695312,
485
+ "loss": 0.5933,
486
+ "rewards/accuracies": 0.800000011920929,
487
+ "rewards/chosen": 0.0017801120411604643,
488
+ "rewards/margins": 0.22636179625988007,
489
+ "rewards/rejected": -0.22458168864250183,
490
+ "step": 320
491
+ },
492
+ {
493
+ "epoch": 2.6438631790744465,
494
+ "grad_norm": 3.0923478603363037,
495
+ "learning_rate": 1.1226666666666669e-05,
496
+ "logits/chosen": -8.036542892456055,
497
+ "logits/rejected": -8.248276710510254,
498
+ "logps/chosen": -125.16792297363281,
499
+ "logps/rejected": -132.63623046875,
500
+ "loss": 0.5676,
501
+ "rewards/accuracies": 0.824999988079071,
502
+ "rewards/chosen": 0.07399231195449829,
503
+ "rewards/margins": 0.29761144518852234,
504
+ "rewards/rejected": -0.22361913323402405,
505
+ "step": 330
506
+ },
507
+ {
508
+ "epoch": 2.7243460764587524,
509
+ "grad_norm": 4.881553649902344,
510
+ "learning_rate": 1.0960000000000002e-05,
511
+ "logits/chosen": -7.877285957336426,
512
+ "logits/rejected": -8.097951889038086,
513
+ "logps/chosen": -126.46671295166016,
514
+ "logps/rejected": -139.54888916015625,
515
+ "loss": 0.5812,
516
+ "rewards/accuracies": 0.8500000238418579,
517
+ "rewards/chosen": -0.05153612047433853,
518
+ "rewards/margins": 0.2616890072822571,
519
+ "rewards/rejected": -0.3132251501083374,
520
+ "step": 340
521
+ },
522
+ {
523
+ "epoch": 2.8048289738430583,
524
+ "grad_norm": 3.55159068107605,
525
+ "learning_rate": 1.0693333333333333e-05,
526
+ "logits/chosen": -7.992476463317871,
527
+ "logits/rejected": -7.782661437988281,
528
+ "logps/chosen": -144.28924560546875,
529
+ "logps/rejected": -154.6568145751953,
530
+ "loss": 0.578,
531
+ "rewards/accuracies": 0.75,
532
+ "rewards/chosen": -0.04911986365914345,
533
+ "rewards/margins": 0.29945996403694153,
534
+ "rewards/rejected": -0.3485798239707947,
535
+ "step": 350
536
+ },
537
+ {
538
+ "epoch": 2.885311871227364,
539
+ "grad_norm": 3.865807056427002,
540
+ "learning_rate": 1.0426666666666668e-05,
541
+ "logits/chosen": -8.330907821655273,
542
+ "logits/rejected": -8.05485725402832,
543
+ "logps/chosen": -145.71182250976562,
544
+ "logps/rejected": -136.82615661621094,
545
+ "loss": 0.5965,
546
+ "rewards/accuracies": 0.7250000238418579,
547
+ "rewards/chosen": -0.02360793948173523,
548
+ "rewards/margins": 0.23897425830364227,
549
+ "rewards/rejected": -0.2625822126865387,
550
+ "step": 360
551
+ },
552
+ {
553
+ "epoch": 2.96579476861167,
554
+ "grad_norm": 5.761505126953125,
555
+ "learning_rate": 1.0160000000000001e-05,
556
+ "logits/chosen": -7.840609073638916,
557
+ "logits/rejected": -7.903719425201416,
558
+ "logps/chosen": -130.0011444091797,
559
+ "logps/rejected": -134.54214477539062,
560
+ "loss": 0.5836,
561
+ "rewards/accuracies": 0.75,
562
+ "rewards/chosen": 0.01680201105773449,
563
+ "rewards/margins": 0.30428779125213623,
564
+ "rewards/rejected": -0.2874857783317566,
565
+ "step": 370
566
+ },
567
+ {
568
+ "epoch": 3.0402414486921527,
569
+ "grad_norm": 5.530464172363281,
570
+ "learning_rate": 9.893333333333334e-06,
571
+ "logits/chosen": -7.832857608795166,
572
+ "logits/rejected": -7.836492538452148,
573
+ "logps/chosen": -126.37843322753906,
574
+ "logps/rejected": -122.53112030029297,
575
+ "loss": 0.5751,
576
+ "rewards/accuracies": 0.7027027010917664,
577
+ "rewards/chosen": -0.007279620040208101,
578
+ "rewards/margins": 0.2105865776538849,
579
+ "rewards/rejected": -0.2178661823272705,
580
+ "step": 380
581
+ },
582
+ {
583
+ "epoch": 3.1207243460764587,
584
+ "grad_norm": 2.370408296585083,
585
+ "learning_rate": 9.626666666666667e-06,
586
+ "logits/chosen": -7.422645568847656,
587
+ "logits/rejected": -7.677459716796875,
588
+ "logps/chosen": -125.61036682128906,
589
+ "logps/rejected": -142.198486328125,
590
+ "loss": 0.5097,
591
+ "rewards/accuracies": 0.875,
592
+ "rewards/chosen": 0.1445888727903366,
593
+ "rewards/margins": 0.4941856265068054,
594
+ "rewards/rejected": -0.3495967984199524,
595
+ "step": 390
596
+ },
597
+ {
598
+ "epoch": 3.2012072434607646,
599
+ "grad_norm": 2.9659812450408936,
600
+ "learning_rate": 9.360000000000002e-06,
601
+ "logits/chosen": -7.914555549621582,
602
+ "logits/rejected": -7.748204708099365,
603
+ "logps/chosen": -126.63890075683594,
604
+ "logps/rejected": -142.98391723632812,
605
+ "loss": 0.5004,
606
+ "rewards/accuracies": 0.925000011920929,
607
+ "rewards/chosen": 0.15164130926132202,
608
+ "rewards/margins": 0.46555933356285095,
609
+ "rewards/rejected": -0.31391802430152893,
610
+ "step": 400
611
+ },
612
+ {
613
+ "epoch": 3.2816901408450705,
614
+ "grad_norm": 4.255645751953125,
615
+ "learning_rate": 9.093333333333333e-06,
616
+ "logits/chosen": -8.225273132324219,
617
+ "logits/rejected": -8.452564239501953,
618
+ "logps/chosen": -139.03164672851562,
619
+ "logps/rejected": -141.5215606689453,
620
+ "loss": 0.5326,
621
+ "rewards/accuracies": 0.8500000238418579,
622
+ "rewards/chosen": 0.07439279556274414,
623
+ "rewards/margins": 0.40740150213241577,
624
+ "rewards/rejected": -0.33300870656967163,
625
+ "step": 410
626
+ },
627
+ {
628
+ "epoch": 3.3621730382293764,
629
+ "grad_norm": 3.6998543739318848,
630
+ "learning_rate": 8.826666666666668e-06,
631
+ "logits/chosen": -7.763664245605469,
632
+ "logits/rejected": -7.92657470703125,
633
+ "logps/chosen": -138.3274383544922,
634
+ "logps/rejected": -139.1911163330078,
635
+ "loss": 0.5505,
636
+ "rewards/accuracies": 0.8500000238418579,
637
+ "rewards/chosen": 0.045943666249513626,
638
+ "rewards/margins": 0.3697761595249176,
639
+ "rewards/rejected": -0.3238324820995331,
640
+ "step": 420
641
+ },
642
+ {
643
+ "epoch": 3.442655935613682,
644
+ "grad_norm": 2.9942128658294678,
645
+ "learning_rate": 8.560000000000001e-06,
646
+ "logits/chosen": -7.518572807312012,
647
+ "logits/rejected": -7.9670090675354,
648
+ "logps/chosen": -120.34236145019531,
649
+ "logps/rejected": -144.70004272460938,
650
+ "loss": 0.4991,
651
+ "rewards/accuracies": 0.875,
652
+ "rewards/chosen": 0.11089307069778442,
653
+ "rewards/margins": 0.49521318078041077,
654
+ "rewards/rejected": -0.38432011008262634,
655
+ "step": 430
656
+ },
657
+ {
658
+ "epoch": 3.523138832997988,
659
+ "grad_norm": 4.232895851135254,
660
+ "learning_rate": 8.293333333333334e-06,
661
+ "logits/chosen": -8.115842819213867,
662
+ "logits/rejected": -7.9553937911987305,
663
+ "logps/chosen": -131.760986328125,
664
+ "logps/rejected": -138.90676879882812,
665
+ "loss": 0.5524,
666
+ "rewards/accuracies": 0.824999988079071,
667
+ "rewards/chosen": 0.02769925631582737,
668
+ "rewards/margins": 0.36288028955459595,
669
+ "rewards/rejected": -0.3351810574531555,
670
+ "step": 440
671
+ },
672
+ {
673
+ "epoch": 3.6036217303822937,
674
+ "grad_norm": 3.185037851333618,
675
+ "learning_rate": 8.026666666666667e-06,
676
+ "logits/chosen": -8.362442970275879,
677
+ "logits/rejected": -8.021516799926758,
678
+ "logps/chosen": -136.4081573486328,
679
+ "logps/rejected": -128.19985961914062,
680
+ "loss": 0.5657,
681
+ "rewards/accuracies": 0.800000011920929,
682
+ "rewards/chosen": 0.07184217125177383,
683
+ "rewards/margins": 0.3203974664211273,
684
+ "rewards/rejected": -0.2485552728176117,
685
+ "step": 450
686
+ },
687
+ {
688
+ "epoch": 3.6841046277665996,
689
+ "grad_norm": 4.260532855987549,
690
+ "learning_rate": 7.76e-06,
691
+ "logits/chosen": -7.6452765464782715,
692
+ "logits/rejected": -7.829669952392578,
693
+ "logps/chosen": -118.22314453125,
694
+ "logps/rejected": -128.7630615234375,
695
+ "loss": 0.5738,
696
+ "rewards/accuracies": 0.7749999761581421,
697
+ "rewards/chosen": 0.026561100035905838,
698
+ "rewards/margins": 0.29474154114723206,
699
+ "rewards/rejected": -0.2681804597377777,
700
+ "step": 460
701
+ },
702
+ {
703
+ "epoch": 3.7645875251509056,
704
+ "grad_norm": 3.9909141063690186,
705
+ "learning_rate": 7.493333333333333e-06,
706
+ "logits/chosen": -8.017112731933594,
707
+ "logits/rejected": -7.949077606201172,
708
+ "logps/chosen": -111.22465515136719,
709
+ "logps/rejected": -145.2229461669922,
710
+ "loss": 0.5364,
711
+ "rewards/accuracies": 0.7250000238418579,
712
+ "rewards/chosen": 0.08945528417825699,
713
+ "rewards/margins": 0.4134772717952728,
714
+ "rewards/rejected": -0.32402199506759644,
715
+ "step": 470
716
+ },
717
+ {
718
+ "epoch": 3.845070422535211,
719
+ "grad_norm": 4.2791523933410645,
720
+ "learning_rate": 7.226666666666667e-06,
721
+ "logits/chosen": -8.009933471679688,
722
+ "logits/rejected": -7.980963706970215,
723
+ "logps/chosen": -143.8748779296875,
724
+ "logps/rejected": -151.89736938476562,
725
+ "loss": 0.5372,
726
+ "rewards/accuracies": 0.8500000238418579,
727
+ "rewards/chosen": 0.024521049112081528,
728
+ "rewards/margins": 0.40748023986816406,
729
+ "rewards/rejected": -0.38295918703079224,
730
+ "step": 480
731
+ },
732
+ {
733
+ "epoch": 3.925553319919517,
734
+ "grad_norm": 3.2610886096954346,
735
+ "learning_rate": 6.96e-06,
736
+ "logits/chosen": -7.851205348968506,
737
+ "logits/rejected": -7.731414794921875,
738
+ "logps/chosen": -123.91703033447266,
739
+ "logps/rejected": -123.8227767944336,
740
+ "loss": 0.5066,
741
+ "rewards/accuracies": 0.824999988079071,
742
+ "rewards/chosen": 0.10982272773981094,
743
+ "rewards/margins": 0.46656376123428345,
744
+ "rewards/rejected": -0.3567410409450531,
745
+ "step": 490
746
+ },
747
+ {
748
+ "epoch": 4.0,
749
+ "grad_norm": 2.3405067920684814,
750
+ "learning_rate": 6.693333333333334e-06,
751
+ "logits/chosen": -8.007993698120117,
752
+ "logits/rejected": -7.974534511566162,
753
+ "logps/chosen": -136.6571502685547,
754
+ "logps/rejected": -155.05227661132812,
755
+ "loss": 0.5037,
756
+ "rewards/accuracies": 0.7837837934494019,
757
+ "rewards/chosen": 0.05127580463886261,
758
+ "rewards/margins": 0.39702004194259644,
759
+ "rewards/rejected": -0.34574422240257263,
760
+ "step": 500
761
+ }
762
+ ],
763
+ "logging_steps": 10,
764
+ "max_steps": 750,
765
+ "num_input_tokens_seen": 0,
766
+ "num_train_epochs": 7,
767
+ "save_steps": 500,
768
+ "stateful_callbacks": {
769
+ "TrainerControl": {
770
+ "args": {
771
+ "should_epoch_stop": false,
772
+ "should_evaluate": false,
773
+ "should_log": false,
774
+ "should_save": true,
775
+ "should_training_stop": false
776
+ },
777
+ "attributes": {}
778
+ }
779
+ },
780
+ "total_flos": 0.0,
781
+ "train_batch_size": 1,
782
+ "trial_name": null,
783
+ "trial_params": null
784
+ }
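The `log_history` fields above (`rewards/chosen`, `rewards/rejected`, `rewards/margins`, `rewards/accuracies`, `logps/*`) match the metrics logged by TRL's `DPOTrainer`, which suggests, though the model card does not confirm it, that this adapter was trained with a DPO-style preference objective. A speculative sketch of such a run, reusing the LoRA settings from `adapter_config.json` and the batch size, logging, and save intervals recorded in this `trainer_state.json`; the dataset and the remaining hyperparameters are assumptions, and the TRL ≥ 0.12 API is assumed.

```python
# Speculative sketch: a DPO-style LoRA fine-tune whose logs would resemble the
# trainer_state.json above. Dataset and peak learning rate are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer
from peft import LoraConfig

base_id = "openlm-research/open_llama_3b"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # matches pad_token "</s>" in the tokenizer config

# Preference data needs "prompt", "chosen", "rejected" columns (placeholder file).
dataset = load_dataset("json", data_files="preferences.json", split="train")

peft_config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, bias="none",
                         target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")

args = DPOConfig(
    output_dir="out",
    per_device_train_batch_size=1,  # "train_batch_size": 1
    logging_steps=10,               # "logging_steps": 10
    save_steps=500,                 # "save_steps": 500
    num_train_epochs=7,             # "num_train_epochs": 7
    max_steps=750,                  # "max_steps": 750
    learning_rate=2e-5,             # assumed peak; the logged schedule decays from ~1.98e-05
)

trainer = DPOTrainer(model=model, args=args, train_dataset=dataset,
                     processing_class=tokenizer, peft_config=peft_config)
trainer.train()
```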
checkpoint-500/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:221328b488cbd7c8001fec74cbc28a889d631151b99e80fa9d7de1e2595f7246
3
+ size 6200
checkpoint-750/README.md ADDED
@@ -0,0 +1,202 @@
1
+ ---
2
+ base_model: openlm-research/open_llama_3b
3
+ library_name: peft
4
+ ---
5
+
6
+ # Model Card for Model ID
7
+
8
+ <!-- Provide a quick summary of what the model is/does. -->
9
+
10
+
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ <!-- Provide a longer summary of what this model is. -->
17
+
18
+
19
+
20
+ - **Developed by:** [More Information Needed]
21
+ - **Funded by [optional]:** [More Information Needed]
22
+ - **Shared by [optional]:** [More Information Needed]
23
+ - **Model type:** [More Information Needed]
24
+ - **Language(s) (NLP):** [More Information Needed]
25
+ - **License:** [More Information Needed]
26
+ - **Finetuned from model [optional]:** [More Information Needed]
27
+
28
+ ### Model Sources [optional]
29
+
30
+ <!-- Provide the basic links for the model. -->
31
+
32
+ - **Repository:** [More Information Needed]
33
+ - **Paper [optional]:** [More Information Needed]
34
+ - **Demo [optional]:** [More Information Needed]
35
+
36
+ ## Uses
37
+
38
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
+
40
+ ### Direct Use
41
+
42
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
+
44
+ [More Information Needed]
45
+
46
+ ### Downstream Use [optional]
47
+
48
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
+
50
+ [More Information Needed]
51
+
52
+ ### Out-of-Scope Use
53
+
54
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
+
56
+ [More Information Needed]
57
+
58
+ ## Bias, Risks, and Limitations
59
+
60
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
+
62
+ [More Information Needed]
63
+
64
+ ### Recommendations
65
+
66
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
+
68
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
+
70
+ ## How to Get Started with the Model
71
+
72
+ Use the code below to get started with the model.
73
+
74
+ [More Information Needed]
75
+
76
+ ## Training Details
77
+
78
+ ### Training Data
79
+
80
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
+
82
+ [More Information Needed]
83
+
84
+ ### Training Procedure
85
+
86
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
+
88
+ #### Preprocessing [optional]
89
+
90
+ [More Information Needed]
91
+
92
+
93
+ #### Training Hyperparameters
94
+
95
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
+
97
+ #### Speeds, Sizes, Times [optional]
98
+
99
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
+
101
+ [More Information Needed]
102
+
103
+ ## Evaluation
104
+
105
+ <!-- This section describes the evaluation protocols and provides the results. -->
106
+
107
+ ### Testing Data, Factors & Metrics
108
+
109
+ #### Testing Data
110
+
111
+ <!-- This should link to a Dataset Card if possible. -->
112
+
113
+ [More Information Needed]
114
+
115
+ #### Factors
116
+
117
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
+
119
+ [More Information Needed]
120
+
121
+ #### Metrics
122
+
123
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
+
125
+ [More Information Needed]
126
+
127
+ ### Results
128
+
129
+ [More Information Needed]
130
+
131
+ #### Summary
132
+
133
+
134
+
135
+ ## Model Examination [optional]
136
+
137
+ <!-- Relevant interpretability work for the model goes here -->
138
+
139
+ [More Information Needed]
140
+
141
+ ## Environmental Impact
142
+
143
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
+
145
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
+
147
+ - **Hardware Type:** [More Information Needed]
148
+ - **Hours used:** [More Information Needed]
149
+ - **Cloud Provider:** [More Information Needed]
150
+ - **Compute Region:** [More Information Needed]
151
+ - **Carbon Emitted:** [More Information Needed]
152
+
153
+ ## Technical Specifications [optional]
154
+
155
+ ### Model Architecture and Objective
156
+
157
+ [More Information Needed]
158
+
159
+ ### Compute Infrastructure
160
+
161
+ [More Information Needed]
162
+
163
+ #### Hardware
164
+
165
+ [More Information Needed]
166
+
167
+ #### Software
168
+
169
+ [More Information Needed]
170
+
171
+ ## Citation [optional]
172
+
173
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
+
175
+ **BibTeX:**
176
+
177
+ [More Information Needed]
178
+
179
+ **APA:**
180
+
181
+ [More Information Needed]
182
+
183
+ ## Glossary [optional]
184
+
185
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
+
187
+ [More Information Needed]
188
+
189
+ ## More Information [optional]
190
+
191
+ [More Information Needed]
192
+
193
+ ## Model Card Authors [optional]
194
+
195
+ [More Information Needed]
196
+
197
+ ## Model Card Contact
198
+
199
+ [More Information Needed]
200
+ ### Framework versions
201
+
202
+ - PEFT 0.15.2
checkpoint-750/adapter_config.json ADDED
@@ -0,0 +1,34 @@
1
+ {
2
+ "alpha_pattern": {},
3
+ "auto_mapping": null,
4
+ "base_model_name_or_path": "openlm-research/open_llama_3b",
5
+ "bias": "none",
6
+ "corda_config": null,
7
+ "eva_config": null,
8
+ "exclude_modules": null,
9
+ "fan_in_fan_out": false,
10
+ "inference_mode": true,
11
+ "init_lora_weights": true,
12
+ "layer_replication": null,
13
+ "layers_pattern": null,
14
+ "layers_to_transform": null,
15
+ "loftq_config": {},
16
+ "lora_alpha": 16,
17
+ "lora_bias": false,
18
+ "lora_dropout": 0.05,
19
+ "megatron_config": null,
20
+ "megatron_core": "megatron.core",
21
+ "modules_to_save": null,
22
+ "peft_type": "LORA",
23
+ "r": 8,
24
+ "rank_pattern": {},
25
+ "revision": null,
26
+ "target_modules": [
27
+ "v_proj",
28
+ "q_proj"
29
+ ],
30
+ "task_type": "CAUSAL_LM",
31
+ "trainable_token_indices": null,
32
+ "use_dora": false,
33
+ "use_rslora": false
34
+ }
checkpoint-750/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:607c3f2ed0bcf7175d2653b204b1d9456d5338a559ae6f2b0882238f2a4d2ae0
3
+ size 10663320
checkpoint-750/optimizer.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:000610572ec96ce116395efd49923a098ee3865ece676de762d5cc76f629b24b
3
+ size 21386746
checkpoint-750/rng_state.pth ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:95b6047bd8cc6f4cdf7c46dea47edb8e542435510070c6cd1e0a7d9ccf5fd7da
3
+ size 14244
checkpoint-750/scheduler.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a504ee88fd535edcd1ff9dcbaad07a4a854c27733e5765fb9b8035b9ab72d593
3
+ size 1064
checkpoint-750/special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": true,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "</s>",
11
+ "lstrip": false,
12
+ "normalized": true,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "pad_token": "</s>",
17
+ "unk_token": {
18
+ "content": "<unk>",
19
+ "lstrip": false,
20
+ "normalized": true,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ }
24
+ }
checkpoint-750/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-750/tokenizer.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:ab1b681ec7fc02fed5edd3026687d7a692a918c4dd8e150ca2e3994a6229843b
3
+ size 534194
checkpoint-750/tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
1
+ {
2
+ "add_bos_token": true,
3
+ "add_eos_token": false,
4
+ "add_prefix_space": null,
5
+ "added_tokens_decoder": {
6
+ "0": {
7
+ "content": "<unk>",
8
+ "lstrip": false,
9
+ "normalized": true,
10
+ "rstrip": false,
11
+ "single_word": false,
12
+ "special": true
13
+ },
14
+ "1": {
15
+ "content": "<s>",
16
+ "lstrip": false,
17
+ "normalized": true,
18
+ "rstrip": false,
19
+ "single_word": false,
20
+ "special": true
21
+ },
22
+ "2": {
23
+ "content": "</s>",
24
+ "lstrip": false,
25
+ "normalized": true,
26
+ "rstrip": false,
27
+ "single_word": false,
28
+ "special": true
29
+ }
30
+ },
31
+ "bos_token": "<s>",
32
+ "clean_up_tokenization_spaces": false,
33
+ "eos_token": "</s>",
34
+ "extra_special_tokens": {},
35
+ "legacy": true,
36
+ "model_max_length": 2048,
37
+ "pad_token": "</s>",
38
+ "sp_model_kwargs": {},
39
+ "spaces_between_special_tokens": false,
40
+ "tokenizer_class": "LlamaTokenizer",
41
+ "unk_token": "<unk>",
42
+ "use_default_system_prompt": false
43
+ }
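
Editor's note: two details in this tokenizer config are worth calling out. The checkpoint defines only `<s>`, `</s>`, and `<unk>` as special tokens, so `pad_token` is mapped onto the EOS token `</s>`, and `model_max_length` is 2048. A hedged sketch of loading the tokenizer from this checkpoint and confirming those settings:

```python
# Sketch: load the tokenizer saved with checkpoint-750 and inspect the
# settings recorded in tokenizer_config.json / special_tokens_map.json.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("checkpoint-750")  # local files from this commit

print(type(tok).__name__)                            # a Llama tokenizer class
print(tok.bos_token, tok.eos_token, tok.unk_token)    # <s> </s> <unk>
print(tok.pad_token)                                  # </s> -- padding reuses EOS
print(tok.model_max_length)                           # 2048

# add_bos_token=True / add_eos_token=False: a BOS id is prepended, no EOS appended.
ids = tok("hello")["input_ids"]
print(ids[0] == tok.bos_token_id)                     # True
```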
checkpoint-750/trainer_state.json ADDED
@@ -0,0 +1,1159 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 6.0,
6
+ "eval_steps": 500,
7
+ "global_step": 750,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "epoch": 0.08048289738430583,
14
+ "grad_norm": 1.8390964269638062,
15
+ "learning_rate": 1.976e-05,
16
+ "logits/chosen": -7.928730010986328,
17
+ "logits/rejected": -7.768202304840088,
18
+ "logps/chosen": -126.51554870605469,
19
+ "logps/rejected": -141.75454711914062,
20
+ "loss": 0.6957,
21
+ "rewards/accuracies": 0.4000000059604645,
22
+ "rewards/chosen": -0.002580838045105338,
23
+ "rewards/margins": -0.004946361295878887,
24
+ "rewards/rejected": 0.0023655227851122618,
25
+ "step": 10
26
+ },
27
+ {
28
+ "epoch": 0.16096579476861167,
29
+ "grad_norm": 2.1244845390319824,
30
+ "learning_rate": 1.9493333333333335e-05,
31
+ "logits/chosen": -8.07243537902832,
32
+ "logits/rejected": -8.191374778747559,
33
+ "logps/chosen": -140.3431854248047,
34
+ "logps/rejected": -127.8196029663086,
35
+ "loss": 0.6941,
36
+ "rewards/accuracies": 0.4749999940395355,
37
+ "rewards/chosen": 0.0015918923309072852,
38
+ "rewards/margins": -0.0016587398713454604,
39
+ "rewards/rejected": 0.0032506324350833893,
40
+ "step": 20
41
+ },
42
+ {
43
+ "epoch": 0.2414486921529175,
44
+ "grad_norm": 2.229215383529663,
45
+ "learning_rate": 1.922666666666667e-05,
46
+ "logits/chosen": -7.767237663269043,
47
+ "logits/rejected": -7.859239101409912,
48
+ "logps/chosen": -140.90402221679688,
49
+ "logps/rejected": -136.17660522460938,
50
+ "loss": 0.6931,
51
+ "rewards/accuracies": 0.4749999940395355,
52
+ "rewards/chosen": 0.0016952038276940584,
53
+ "rewards/margins": 0.0003137874882668257,
54
+ "rewards/rejected": 0.001381416223011911,
55
+ "step": 30
56
+ },
57
+ {
58
+ "epoch": 0.32193158953722334,
59
+ "grad_norm": 1.4213330745697021,
60
+ "learning_rate": 1.896e-05,
61
+ "logits/chosen": -7.892869472503662,
62
+ "logits/rejected": -7.617051124572754,
63
+ "logps/chosen": -121.54151916503906,
64
+ "logps/rejected": -132.56336975097656,
65
+ "loss": 0.6901,
66
+ "rewards/accuracies": 0.574999988079071,
67
+ "rewards/chosen": 0.009255774319171906,
68
+ "rewards/margins": 0.006459495518356562,
69
+ "rewards/rejected": 0.0027962785679847,
70
+ "step": 40
71
+ },
72
+ {
73
+ "epoch": 0.4024144869215292,
74
+ "grad_norm": 2.9331471920013428,
75
+ "learning_rate": 1.8693333333333333e-05,
76
+ "logits/chosen": -7.751120090484619,
77
+ "logits/rejected": -7.877864837646484,
78
+ "logps/chosen": -125.05207824707031,
79
+ "logps/rejected": -139.2479248046875,
80
+ "loss": 0.6888,
81
+ "rewards/accuracies": 0.699999988079071,
82
+ "rewards/chosen": 0.013416772708296776,
83
+ "rewards/margins": 0.008954327553510666,
84
+ "rewards/rejected": 0.004462444689124823,
85
+ "step": 50
86
+ },
87
+ {
88
+ "epoch": 0.482897384305835,
89
+ "grad_norm": 1.9477262496948242,
90
+ "learning_rate": 1.8426666666666668e-05,
91
+ "logits/chosen": -7.746337890625,
92
+ "logits/rejected": -7.790966987609863,
93
+ "logps/chosen": -108.3724365234375,
94
+ "logps/rejected": -141.29953002929688,
95
+ "loss": 0.6947,
96
+ "rewards/accuracies": 0.44999998807907104,
97
+ "rewards/chosen": 0.0028126670513302088,
98
+ "rewards/margins": -0.0029423837549984455,
99
+ "rewards/rejected": 0.005755049642175436,
100
+ "step": 60
101
+ },
102
+ {
103
+ "epoch": 0.5633802816901409,
104
+ "grad_norm": 2.476123809814453,
105
+ "learning_rate": 1.8160000000000002e-05,
106
+ "logits/chosen": -7.934849739074707,
107
+ "logits/rejected": -7.784144401550293,
108
+ "logps/chosen": -148.48281860351562,
109
+ "logps/rejected": -140.32388305664062,
110
+ "loss": 0.6895,
111
+ "rewards/accuracies": 0.5,
112
+ "rewards/chosen": 0.016439538449048996,
113
+ "rewards/margins": 0.007786408998072147,
114
+ "rewards/rejected": 0.008653131313621998,
115
+ "step": 70
116
+ },
117
+ {
118
+ "epoch": 0.6438631790744467,
119
+ "grad_norm": 1.9035556316375732,
120
+ "learning_rate": 1.7893333333333337e-05,
121
+ "logits/chosen": -7.956778526306152,
122
+ "logits/rejected": -7.880636692047119,
123
+ "logps/chosen": -134.6743621826172,
124
+ "logps/rejected": -142.11549377441406,
125
+ "loss": 0.6855,
126
+ "rewards/accuracies": 0.6499999761581421,
127
+ "rewards/chosen": 0.005495529621839523,
128
+ "rewards/margins": 0.015812702476978302,
129
+ "rewards/rejected": -0.010317172855138779,
130
+ "step": 80
131
+ },
132
+ {
133
+ "epoch": 0.7243460764587525,
134
+ "grad_norm": 2.0266778469085693,
135
+ "learning_rate": 1.762666666666667e-05,
136
+ "logits/chosen": -7.979184627532959,
137
+ "logits/rejected": -8.19025993347168,
138
+ "logps/chosen": -129.11795043945312,
139
+ "logps/rejected": -133.7857666015625,
140
+ "loss": 0.6911,
141
+ "rewards/accuracies": 0.5249999761581421,
142
+ "rewards/chosen": 0.00029927687137387693,
143
+ "rewards/margins": 0.0047690351493656635,
144
+ "rewards/rejected": -0.004469756968319416,
145
+ "step": 90
146
+ },
147
+ {
148
+ "epoch": 0.8048289738430584,
149
+ "grad_norm": 2.324070453643799,
150
+ "learning_rate": 1.736e-05,
151
+ "logits/chosen": -7.854997158050537,
152
+ "logits/rejected": -7.669028282165527,
153
+ "logps/chosen": -129.60366821289062,
154
+ "logps/rejected": -129.22799682617188,
155
+ "loss": 0.6872,
156
+ "rewards/accuracies": 0.5249999761581421,
157
+ "rewards/chosen": 0.005187320522964001,
158
+ "rewards/margins": 0.012428809888660908,
159
+ "rewards/rejected": -0.007241488434374332,
160
+ "step": 100
161
+ },
162
+ {
163
+ "epoch": 0.8853118712273642,
164
+ "grad_norm": 2.3035223484039307,
165
+ "learning_rate": 1.7093333333333335e-05,
166
+ "logits/chosen": -7.684693336486816,
167
+ "logits/rejected": -8.104289054870605,
168
+ "logps/chosen": -135.45704650878906,
169
+ "logps/rejected": -142.0008544921875,
170
+ "loss": 0.6842,
171
+ "rewards/accuracies": 0.625,
172
+ "rewards/chosen": -0.0016596360364928842,
173
+ "rewards/margins": 0.0194082073867321,
174
+ "rewards/rejected": -0.021067844703793526,
175
+ "step": 110
176
+ },
177
+ {
178
+ "epoch": 0.96579476861167,
179
+ "grad_norm": 2.0869996547698975,
180
+ "learning_rate": 1.682666666666667e-05,
181
+ "logits/chosen": -8.05040168762207,
182
+ "logits/rejected": -8.025721549987793,
183
+ "logps/chosen": -127.3593978881836,
184
+ "logps/rejected": -122.40535736083984,
185
+ "loss": 0.6837,
186
+ "rewards/accuracies": 0.550000011920929,
187
+ "rewards/chosen": -0.012504220008850098,
188
+ "rewards/margins": 0.0199703611433506,
189
+ "rewards/rejected": -0.0324745811522007,
190
+ "step": 120
191
+ },
192
+ {
193
+ "epoch": 1.040241448692153,
194
+ "grad_norm": 1.9564281702041626,
195
+ "learning_rate": 1.656e-05,
196
+ "logits/chosen": -8.038365364074707,
197
+ "logits/rejected": -8.249411582946777,
198
+ "logps/chosen": -122.29254913330078,
199
+ "logps/rejected": -139.63780212402344,
200
+ "loss": 0.628,
201
+ "rewards/accuracies": 0.6486486196517944,
202
+ "rewards/chosen": -0.008433423936367035,
203
+ "rewards/margins": 0.0293881893157959,
204
+ "rewards/rejected": -0.03782161325216293,
205
+ "step": 130
206
+ },
207
+ {
208
+ "epoch": 1.1207243460764587,
209
+ "grad_norm": 2.4709627628326416,
210
+ "learning_rate": 1.6293333333333335e-05,
211
+ "logits/chosen": -7.3343305587768555,
212
+ "logits/rejected": -7.533148288726807,
213
+ "logps/chosen": -125.78892517089844,
214
+ "logps/rejected": -131.87570190429688,
215
+ "loss": 0.6687,
216
+ "rewards/accuracies": 0.75,
217
+ "rewards/chosen": 0.01648072525858879,
218
+ "rewards/margins": 0.05134889483451843,
219
+ "rewards/rejected": -0.03486816957592964,
220
+ "step": 140
221
+ },
222
+ {
223
+ "epoch": 1.2012072434607646,
224
+ "grad_norm": 2.2346677780151367,
225
+ "learning_rate": 1.6026666666666667e-05,
226
+ "logits/chosen": -7.965804100036621,
227
+ "logits/rejected": -8.229939460754395,
228
+ "logps/chosen": -134.5069580078125,
229
+ "logps/rejected": -150.25515747070312,
230
+ "loss": 0.6559,
231
+ "rewards/accuracies": 0.875,
232
+ "rewards/chosen": 0.00035553425550460815,
233
+ "rewards/margins": 0.0779074877500534,
234
+ "rewards/rejected": -0.0775519534945488,
235
+ "step": 150
236
+ },
237
+ {
238
+ "epoch": 1.2816901408450705,
239
+ "grad_norm": 2.3957595825195312,
240
+ "learning_rate": 1.576e-05,
241
+ "logits/chosen": -7.930548667907715,
242
+ "logits/rejected": -8.198356628417969,
243
+ "logps/chosen": -127.7699203491211,
244
+ "logps/rejected": -140.8756866455078,
245
+ "loss": 0.6581,
246
+ "rewards/accuracies": 0.7250000238418579,
247
+ "rewards/chosen": 0.0083924550563097,
248
+ "rewards/margins": 0.07435399293899536,
249
+ "rewards/rejected": -0.06596153974533081,
250
+ "step": 160
251
+ },
252
+ {
253
+ "epoch": 1.3621730382293762,
254
+ "grad_norm": 2.6251115798950195,
255
+ "learning_rate": 1.5493333333333333e-05,
256
+ "logits/chosen": -8.110880851745605,
257
+ "logits/rejected": -8.14248275756836,
258
+ "logps/chosen": -124.23824310302734,
259
+ "logps/rejected": -133.23915100097656,
260
+ "loss": 0.6484,
261
+ "rewards/accuracies": 0.75,
262
+ "rewards/chosen": 0.02494233287870884,
263
+ "rewards/margins": 0.09594295918941498,
264
+ "rewards/rejected": -0.07100063562393188,
265
+ "step": 170
266
+ },
267
+ {
268
+ "epoch": 1.442655935613682,
269
+ "grad_norm": 2.497619867324829,
270
+ "learning_rate": 1.5226666666666668e-05,
271
+ "logits/chosen": -7.945010185241699,
272
+ "logits/rejected": -7.939410209655762,
273
+ "logps/chosen": -116.3001937866211,
274
+ "logps/rejected": -115.17778015136719,
275
+ "loss": 0.6531,
276
+ "rewards/accuracies": 0.800000011920929,
277
+ "rewards/chosen": 0.0008405310800299048,
278
+ "rewards/margins": 0.08627375215291977,
279
+ "rewards/rejected": -0.085433229804039,
280
+ "step": 180
281
+ },
282
+ {
283
+ "epoch": 1.5231388329979878,
284
+ "grad_norm": 2.756197690963745,
285
+ "learning_rate": 1.496e-05,
286
+ "logits/chosen": -7.959776401519775,
287
+ "logits/rejected": -7.889138698577881,
288
+ "logps/chosen": -137.98397827148438,
289
+ "logps/rejected": -134.6807403564453,
290
+ "loss": 0.6609,
291
+ "rewards/accuracies": 0.699999988079071,
292
+ "rewards/chosen": 0.0001740553416311741,
293
+ "rewards/margins": 0.06992082297801971,
294
+ "rewards/rejected": -0.06974677741527557,
295
+ "step": 190
296
+ },
297
+ {
298
+ "epoch": 1.6036217303822937,
299
+ "grad_norm": 3.126654624938965,
300
+ "learning_rate": 1.4693333333333336e-05,
301
+ "logits/chosen": -7.728564262390137,
302
+ "logits/rejected": -7.694819450378418,
303
+ "logps/chosen": -140.72019958496094,
304
+ "logps/rejected": -146.0216064453125,
305
+ "loss": 0.6504,
306
+ "rewards/accuracies": 0.75,
307
+ "rewards/chosen": 0.004489220678806305,
308
+ "rewards/margins": 0.09480009973049164,
309
+ "rewards/rejected": -0.09031088650226593,
310
+ "step": 200
311
+ },
312
+ {
313
+ "epoch": 1.6841046277665996,
314
+ "grad_norm": 2.8069660663604736,
315
+ "learning_rate": 1.4426666666666669e-05,
316
+ "logits/chosen": -8.206178665161133,
317
+ "logits/rejected": -7.909379482269287,
318
+ "logps/chosen": -133.2583465576172,
319
+ "logps/rejected": -140.89820861816406,
320
+ "loss": 0.6309,
321
+ "rewards/accuracies": 0.8999999761581421,
322
+ "rewards/chosen": 0.0008678575977683067,
323
+ "rewards/margins": 0.14069929718971252,
324
+ "rewards/rejected": -0.13983140885829926,
325
+ "step": 210
326
+ },
327
+ {
328
+ "epoch": 1.7645875251509056,
329
+ "grad_norm": 3.165580987930298,
330
+ "learning_rate": 1.416e-05,
331
+ "logits/chosen": -7.514456748962402,
332
+ "logits/rejected": -7.704525947570801,
333
+ "logps/chosen": -133.6655731201172,
334
+ "logps/rejected": -133.7900390625,
335
+ "loss": 0.6461,
336
+ "rewards/accuracies": 0.7749999761581421,
337
+ "rewards/chosen": -0.018594294786453247,
338
+ "rewards/margins": 0.1046491265296936,
339
+ "rewards/rejected": -0.12324341386556625,
340
+ "step": 220
341
+ },
342
+ {
343
+ "epoch": 1.8450704225352113,
344
+ "grad_norm": 2.8950154781341553,
345
+ "learning_rate": 1.3893333333333335e-05,
346
+ "logits/chosen": -7.968588352203369,
347
+ "logits/rejected": -7.763016700744629,
348
+ "logps/chosen": -121.14384460449219,
349
+ "logps/rejected": -138.0037384033203,
350
+ "loss": 0.6169,
351
+ "rewards/accuracies": 0.824999988079071,
352
+ "rewards/chosen": 0.04061353951692581,
353
+ "rewards/margins": 0.16627803444862366,
354
+ "rewards/rejected": -0.12566450238227844,
355
+ "step": 230
356
+ },
357
+ {
358
+ "epoch": 1.925553319919517,
359
+ "grad_norm": 2.657349109649658,
360
+ "learning_rate": 1.3626666666666668e-05,
361
+ "logits/chosen": -7.7075958251953125,
362
+ "logits/rejected": -7.8631134033203125,
363
+ "logps/chosen": -135.63150024414062,
364
+ "logps/rejected": -142.60678100585938,
365
+ "loss": 0.6269,
366
+ "rewards/accuracies": 0.824999988079071,
367
+ "rewards/chosen": 0.0013963343808427453,
368
+ "rewards/margins": 0.1436794400215149,
369
+ "rewards/rejected": -0.1422831118106842,
370
+ "step": 240
371
+ },
372
+ {
373
+ "epoch": 2.0,
374
+ "grad_norm": 1.3372068405151367,
375
+ "learning_rate": 1.3360000000000003e-05,
376
+ "logits/chosen": -8.093542098999023,
377
+ "logits/rejected": -7.9440999031066895,
378
+ "logps/chosen": -126.51072692871094,
379
+ "logps/rejected": -128.38294982910156,
380
+ "loss": 0.6052,
381
+ "rewards/accuracies": 0.7027027010917664,
382
+ "rewards/chosen": 0.015508824028074741,
383
+ "rewards/margins": 0.10147809982299805,
384
+ "rewards/rejected": -0.08596926927566528,
385
+ "step": 250
386
+ },
387
+ {
388
+ "epoch": 2.080482897384306,
389
+ "grad_norm": 2.8451366424560547,
390
+ "learning_rate": 1.3093333333333334e-05,
391
+ "logits/chosen": -7.710860252380371,
392
+ "logits/rejected": -7.813695430755615,
393
+ "logps/chosen": -131.70631408691406,
394
+ "logps/rejected": -132.4761962890625,
395
+ "loss": 0.6136,
396
+ "rewards/accuracies": 0.7749999761581421,
397
+ "rewards/chosen": 0.05115525797009468,
398
+ "rewards/margins": 0.1773657351732254,
399
+ "rewards/rejected": -0.12621048092842102,
400
+ "step": 260
401
+ },
402
+ {
403
+ "epoch": 2.160965794768612,
404
+ "grad_norm": 3.0381789207458496,
405
+ "learning_rate": 1.2826666666666667e-05,
406
+ "logits/chosen": -7.992362976074219,
407
+ "logits/rejected": -8.11289119720459,
408
+ "logps/chosen": -148.1260986328125,
409
+ "logps/rejected": -150.82127380371094,
410
+ "loss": 0.6069,
411
+ "rewards/accuracies": 0.875,
412
+ "rewards/chosen": -0.01732814498245716,
413
+ "rewards/margins": 0.19258996844291687,
414
+ "rewards/rejected": -0.20991814136505127,
415
+ "step": 270
416
+ },
417
+ {
418
+ "epoch": 2.2414486921529173,
419
+ "grad_norm": 3.104029417037964,
420
+ "learning_rate": 1.2560000000000002e-05,
421
+ "logits/chosen": -7.620333671569824,
422
+ "logits/rejected": -7.614747524261475,
423
+ "logps/chosen": -120.11073303222656,
424
+ "logps/rejected": -146.62948608398438,
425
+ "loss": 0.6026,
426
+ "rewards/accuracies": 0.824999988079071,
427
+ "rewards/chosen": 0.06914319843053818,
428
+ "rewards/margins": 0.22265009582042694,
429
+ "rewards/rejected": -0.15350690484046936,
430
+ "step": 280
431
+ },
432
+ {
433
+ "epoch": 2.3219315895372232,
434
+ "grad_norm": 2.918400526046753,
435
+ "learning_rate": 1.2293333333333335e-05,
436
+ "logits/chosen": -8.447749137878418,
437
+ "logits/rejected": -8.199603080749512,
438
+ "logps/chosen": -123.7292709350586,
439
+ "logps/rejected": -141.75552368164062,
440
+ "loss": 0.6137,
441
+ "rewards/accuracies": 0.8999999761581421,
442
+ "rewards/chosen": -0.0003969132958445698,
443
+ "rewards/margins": 0.17746230959892273,
444
+ "rewards/rejected": -0.17785921692848206,
445
+ "step": 290
446
+ },
447
+ {
448
+ "epoch": 2.402414486921529,
449
+ "grad_norm": 3.638888120651245,
450
+ "learning_rate": 1.202666666666667e-05,
451
+ "logits/chosen": -7.351523399353027,
452
+ "logits/rejected": -7.5525007247924805,
453
+ "logps/chosen": -114.2864990234375,
454
+ "logps/rejected": -128.71466064453125,
455
+ "loss": 0.5855,
456
+ "rewards/accuracies": 0.8500000238418579,
457
+ "rewards/chosen": 0.08879393339157104,
458
+ "rewards/margins": 0.24962946772575378,
459
+ "rewards/rejected": -0.16083553433418274,
460
+ "step": 300
461
+ },
462
+ {
463
+ "epoch": 2.482897384305835,
464
+ "grad_norm": 2.6371285915374756,
465
+ "learning_rate": 1.1760000000000001e-05,
466
+ "logits/chosen": -7.6833176612854,
467
+ "logits/rejected": -7.768864631652832,
468
+ "logps/chosen": -119.72042083740234,
469
+ "logps/rejected": -116.38240051269531,
470
+ "loss": 0.5752,
471
+ "rewards/accuracies": 0.8999999761581421,
472
+ "rewards/chosen": 0.0737290009856224,
473
+ "rewards/margins": 0.267278254032135,
474
+ "rewards/rejected": -0.1935492306947708,
475
+ "step": 310
476
+ },
477
+ {
478
+ "epoch": 2.563380281690141,
479
+ "grad_norm": 3.9384307861328125,
480
+ "learning_rate": 1.1493333333333334e-05,
481
+ "logits/chosen": -7.878905296325684,
482
+ "logits/rejected": -7.912691593170166,
483
+ "logps/chosen": -130.33595275878906,
484
+ "logps/rejected": -141.26541137695312,
485
+ "loss": 0.5933,
486
+ "rewards/accuracies": 0.800000011920929,
487
+ "rewards/chosen": 0.0017801120411604643,
488
+ "rewards/margins": 0.22636179625988007,
489
+ "rewards/rejected": -0.22458168864250183,
490
+ "step": 320
491
+ },
492
+ {
493
+ "epoch": 2.6438631790744465,
494
+ "grad_norm": 3.0923478603363037,
495
+ "learning_rate": 1.1226666666666669e-05,
496
+ "logits/chosen": -8.036542892456055,
497
+ "logits/rejected": -8.248276710510254,
498
+ "logps/chosen": -125.16792297363281,
499
+ "logps/rejected": -132.63623046875,
500
+ "loss": 0.5676,
501
+ "rewards/accuracies": 0.824999988079071,
502
+ "rewards/chosen": 0.07399231195449829,
503
+ "rewards/margins": 0.29761144518852234,
504
+ "rewards/rejected": -0.22361913323402405,
505
+ "step": 330
506
+ },
507
+ {
508
+ "epoch": 2.7243460764587524,
509
+ "grad_norm": 4.881553649902344,
510
+ "learning_rate": 1.0960000000000002e-05,
511
+ "logits/chosen": -7.877285957336426,
512
+ "logits/rejected": -8.097951889038086,
513
+ "logps/chosen": -126.46671295166016,
514
+ "logps/rejected": -139.54888916015625,
515
+ "loss": 0.5812,
516
+ "rewards/accuracies": 0.8500000238418579,
517
+ "rewards/chosen": -0.05153612047433853,
518
+ "rewards/margins": 0.2616890072822571,
519
+ "rewards/rejected": -0.3132251501083374,
520
+ "step": 340
521
+ },
522
+ {
523
+ "epoch": 2.8048289738430583,
524
+ "grad_norm": 3.55159068107605,
525
+ "learning_rate": 1.0693333333333333e-05,
526
+ "logits/chosen": -7.992476463317871,
527
+ "logits/rejected": -7.782661437988281,
528
+ "logps/chosen": -144.28924560546875,
529
+ "logps/rejected": -154.6568145751953,
530
+ "loss": 0.578,
531
+ "rewards/accuracies": 0.75,
532
+ "rewards/chosen": -0.04911986365914345,
533
+ "rewards/margins": 0.29945996403694153,
534
+ "rewards/rejected": -0.3485798239707947,
535
+ "step": 350
536
+ },
537
+ {
538
+ "epoch": 2.885311871227364,
539
+ "grad_norm": 3.865807056427002,
540
+ "learning_rate": 1.0426666666666668e-05,
541
+ "logits/chosen": -8.330907821655273,
542
+ "logits/rejected": -8.05485725402832,
543
+ "logps/chosen": -145.71182250976562,
544
+ "logps/rejected": -136.82615661621094,
545
+ "loss": 0.5965,
546
+ "rewards/accuracies": 0.7250000238418579,
547
+ "rewards/chosen": -0.02360793948173523,
548
+ "rewards/margins": 0.23897425830364227,
549
+ "rewards/rejected": -0.2625822126865387,
550
+ "step": 360
551
+ },
552
+ {
553
+ "epoch": 2.96579476861167,
554
+ "grad_norm": 5.761505126953125,
555
+ "learning_rate": 1.0160000000000001e-05,
556
+ "logits/chosen": -7.840609073638916,
557
+ "logits/rejected": -7.903719425201416,
558
+ "logps/chosen": -130.0011444091797,
559
+ "logps/rejected": -134.54214477539062,
560
+ "loss": 0.5836,
561
+ "rewards/accuracies": 0.75,
562
+ "rewards/chosen": 0.01680201105773449,
563
+ "rewards/margins": 0.30428779125213623,
564
+ "rewards/rejected": -0.2874857783317566,
565
+ "step": 370
566
+ },
567
+ {
568
+ "epoch": 3.0402414486921527,
569
+ "grad_norm": 5.530464172363281,
570
+ "learning_rate": 9.893333333333334e-06,
571
+ "logits/chosen": -7.832857608795166,
572
+ "logits/rejected": -7.836492538452148,
573
+ "logps/chosen": -126.37843322753906,
574
+ "logps/rejected": -122.53112030029297,
575
+ "loss": 0.5751,
576
+ "rewards/accuracies": 0.7027027010917664,
577
+ "rewards/chosen": -0.007279620040208101,
578
+ "rewards/margins": 0.2105865776538849,
579
+ "rewards/rejected": -0.2178661823272705,
580
+ "step": 380
581
+ },
582
+ {
583
+ "epoch": 3.1207243460764587,
584
+ "grad_norm": 2.370408296585083,
585
+ "learning_rate": 9.626666666666667e-06,
586
+ "logits/chosen": -7.422645568847656,
587
+ "logits/rejected": -7.677459716796875,
588
+ "logps/chosen": -125.61036682128906,
589
+ "logps/rejected": -142.198486328125,
590
+ "loss": 0.5097,
591
+ "rewards/accuracies": 0.875,
592
+ "rewards/chosen": 0.1445888727903366,
593
+ "rewards/margins": 0.4941856265068054,
594
+ "rewards/rejected": -0.3495967984199524,
595
+ "step": 390
596
+ },
597
+ {
598
+ "epoch": 3.2012072434607646,
599
+ "grad_norm": 2.9659812450408936,
600
+ "learning_rate": 9.360000000000002e-06,
601
+ "logits/chosen": -7.914555549621582,
602
+ "logits/rejected": -7.748204708099365,
603
+ "logps/chosen": -126.63890075683594,
604
+ "logps/rejected": -142.98391723632812,
605
+ "loss": 0.5004,
606
+ "rewards/accuracies": 0.925000011920929,
607
+ "rewards/chosen": 0.15164130926132202,
608
+ "rewards/margins": 0.46555933356285095,
609
+ "rewards/rejected": -0.31391802430152893,
610
+ "step": 400
611
+ },
612
+ {
613
+ "epoch": 3.2816901408450705,
614
+ "grad_norm": 4.255645751953125,
615
+ "learning_rate": 9.093333333333333e-06,
616
+ "logits/chosen": -8.225273132324219,
617
+ "logits/rejected": -8.452564239501953,
618
+ "logps/chosen": -139.03164672851562,
619
+ "logps/rejected": -141.5215606689453,
620
+ "loss": 0.5326,
621
+ "rewards/accuracies": 0.8500000238418579,
622
+ "rewards/chosen": 0.07439279556274414,
623
+ "rewards/margins": 0.40740150213241577,
624
+ "rewards/rejected": -0.33300870656967163,
625
+ "step": 410
626
+ },
627
+ {
628
+ "epoch": 3.3621730382293764,
629
+ "grad_norm": 3.6998543739318848,
630
+ "learning_rate": 8.826666666666668e-06,
631
+ "logits/chosen": -7.763664245605469,
632
+ "logits/rejected": -7.92657470703125,
633
+ "logps/chosen": -138.3274383544922,
634
+ "logps/rejected": -139.1911163330078,
635
+ "loss": 0.5505,
636
+ "rewards/accuracies": 0.8500000238418579,
637
+ "rewards/chosen": 0.045943666249513626,
638
+ "rewards/margins": 0.3697761595249176,
639
+ "rewards/rejected": -0.3238324820995331,
640
+ "step": 420
641
+ },
642
+ {
643
+ "epoch": 3.442655935613682,
644
+ "grad_norm": 2.9942128658294678,
645
+ "learning_rate": 8.560000000000001e-06,
646
+ "logits/chosen": -7.518572807312012,
647
+ "logits/rejected": -7.9670090675354,
648
+ "logps/chosen": -120.34236145019531,
649
+ "logps/rejected": -144.70004272460938,
650
+ "loss": 0.4991,
651
+ "rewards/accuracies": 0.875,
652
+ "rewards/chosen": 0.11089307069778442,
653
+ "rewards/margins": 0.49521318078041077,
654
+ "rewards/rejected": -0.38432011008262634,
655
+ "step": 430
656
+ },
657
+ {
658
+ "epoch": 3.523138832997988,
659
+ "grad_norm": 4.232895851135254,
660
+ "learning_rate": 8.293333333333334e-06,
661
+ "logits/chosen": -8.115842819213867,
662
+ "logits/rejected": -7.9553937911987305,
663
+ "logps/chosen": -131.760986328125,
664
+ "logps/rejected": -138.90676879882812,
665
+ "loss": 0.5524,
666
+ "rewards/accuracies": 0.824999988079071,
667
+ "rewards/chosen": 0.02769925631582737,
668
+ "rewards/margins": 0.36288028955459595,
669
+ "rewards/rejected": -0.3351810574531555,
670
+ "step": 440
671
+ },
672
+ {
673
+ "epoch": 3.6036217303822937,
674
+ "grad_norm": 3.185037851333618,
675
+ "learning_rate": 8.026666666666667e-06,
676
+ "logits/chosen": -8.362442970275879,
677
+ "logits/rejected": -8.021516799926758,
678
+ "logps/chosen": -136.4081573486328,
679
+ "logps/rejected": -128.19985961914062,
680
+ "loss": 0.5657,
681
+ "rewards/accuracies": 0.800000011920929,
682
+ "rewards/chosen": 0.07184217125177383,
683
+ "rewards/margins": 0.3203974664211273,
684
+ "rewards/rejected": -0.2485552728176117,
685
+ "step": 450
686
+ },
687
+ {
688
+ "epoch": 3.6841046277665996,
689
+ "grad_norm": 4.260532855987549,
690
+ "learning_rate": 7.76e-06,
691
+ "logits/chosen": -7.6452765464782715,
692
+ "logits/rejected": -7.829669952392578,
693
+ "logps/chosen": -118.22314453125,
694
+ "logps/rejected": -128.7630615234375,
695
+ "loss": 0.5738,
696
+ "rewards/accuracies": 0.7749999761581421,
697
+ "rewards/chosen": 0.026561100035905838,
698
+ "rewards/margins": 0.29474154114723206,
699
+ "rewards/rejected": -0.2681804597377777,
700
+ "step": 460
701
+ },
702
+ {
703
+ "epoch": 3.7645875251509056,
704
+ "grad_norm": 3.9909141063690186,
705
+ "learning_rate": 7.493333333333333e-06,
706
+ "logits/chosen": -8.017112731933594,
707
+ "logits/rejected": -7.949077606201172,
708
+ "logps/chosen": -111.22465515136719,
709
+ "logps/rejected": -145.2229461669922,
710
+ "loss": 0.5364,
711
+ "rewards/accuracies": 0.7250000238418579,
712
+ "rewards/chosen": 0.08945528417825699,
713
+ "rewards/margins": 0.4134772717952728,
714
+ "rewards/rejected": -0.32402199506759644,
715
+ "step": 470
716
+ },
717
+ {
718
+ "epoch": 3.845070422535211,
719
+ "grad_norm": 4.2791523933410645,
720
+ "learning_rate": 7.226666666666667e-06,
721
+ "logits/chosen": -8.009933471679688,
722
+ "logits/rejected": -7.980963706970215,
723
+ "logps/chosen": -143.8748779296875,
724
+ "logps/rejected": -151.89736938476562,
725
+ "loss": 0.5372,
726
+ "rewards/accuracies": 0.8500000238418579,
727
+ "rewards/chosen": 0.024521049112081528,
728
+ "rewards/margins": 0.40748023986816406,
729
+ "rewards/rejected": -0.38295918703079224,
730
+ "step": 480
731
+ },
732
+ {
733
+ "epoch": 3.925553319919517,
734
+ "grad_norm": 3.2610886096954346,
735
+ "learning_rate": 6.96e-06,
736
+ "logits/chosen": -7.851205348968506,
737
+ "logits/rejected": -7.731414794921875,
738
+ "logps/chosen": -123.91703033447266,
739
+ "logps/rejected": -123.8227767944336,
740
+ "loss": 0.5066,
741
+ "rewards/accuracies": 0.824999988079071,
742
+ "rewards/chosen": 0.10982272773981094,
743
+ "rewards/margins": 0.46656376123428345,
744
+ "rewards/rejected": -0.3567410409450531,
745
+ "step": 490
746
+ },
747
+ {
748
+ "epoch": 4.0,
749
+ "grad_norm": 2.3405067920684814,
750
+ "learning_rate": 6.693333333333334e-06,
751
+ "logits/chosen": -8.007993698120117,
752
+ "logits/rejected": -7.974534511566162,
753
+ "logps/chosen": -136.6571502685547,
754
+ "logps/rejected": -155.05227661132812,
755
+ "loss": 0.5037,
756
+ "rewards/accuracies": 0.7837837934494019,
757
+ "rewards/chosen": 0.05127580463886261,
758
+ "rewards/margins": 0.39702004194259644,
759
+ "rewards/rejected": -0.34574422240257263,
760
+ "step": 500
761
+ },
762
+ {
763
+ "epoch": 4.0804828973843055,
764
+ "grad_norm": 3.0611572265625,
765
+ "learning_rate": 6.426666666666668e-06,
766
+ "logits/chosen": -7.8229827880859375,
767
+ "logits/rejected": -8.056346893310547,
768
+ "logps/chosen": -123.7149887084961,
769
+ "logps/rejected": -154.68173217773438,
770
+ "loss": 0.5072,
771
+ "rewards/accuracies": 0.8500000238418579,
772
+ "rewards/chosen": 0.04092025011777878,
773
+ "rewards/margins": 0.4962732791900635,
774
+ "rewards/rejected": -0.4553530812263489,
775
+ "step": 510
776
+ },
777
+ {
778
+ "epoch": 4.160965794768612,
779
+ "grad_norm": 3.2638542652130127,
780
+ "learning_rate": 6.16e-06,
781
+ "logits/chosen": -7.906125068664551,
782
+ "logits/rejected": -7.711597442626953,
783
+ "logps/chosen": -120.27784729003906,
784
+ "logps/rejected": -128.5182647705078,
785
+ "loss": 0.5079,
786
+ "rewards/accuracies": 0.8500000238418579,
787
+ "rewards/chosen": 0.06980352103710175,
788
+ "rewards/margins": 0.5142830610275269,
789
+ "rewards/rejected": -0.4444795250892639,
790
+ "step": 520
791
+ },
792
+ {
793
+ "epoch": 4.241448692152917,
794
+ "grad_norm": 5.444788932800293,
795
+ "learning_rate": 5.893333333333334e-06,
796
+ "logits/chosen": -8.111169815063477,
797
+ "logits/rejected": -7.884283542633057,
798
+ "logps/chosen": -140.63882446289062,
799
+ "logps/rejected": -140.9823760986328,
800
+ "loss": 0.566,
801
+ "rewards/accuracies": 0.800000011920929,
802
+ "rewards/chosen": -0.051231205463409424,
803
+ "rewards/margins": 0.33118587732315063,
804
+ "rewards/rejected": -0.38241708278656006,
805
+ "step": 530
806
+ },
807
+ {
808
+ "epoch": 4.321931589537224,
809
+ "grad_norm": 3.564617156982422,
810
+ "learning_rate": 5.626666666666667e-06,
811
+ "logits/chosen": -8.093330383300781,
812
+ "logits/rejected": -8.017361640930176,
813
+ "logps/chosen": -139.41275024414062,
814
+ "logps/rejected": -138.95274353027344,
815
+ "loss": 0.5066,
816
+ "rewards/accuracies": 0.8500000238418579,
817
+ "rewards/chosen": 0.10449226200580597,
818
+ "rewards/margins": 0.4961455762386322,
819
+ "rewards/rejected": -0.39165326952934265,
820
+ "step": 540
821
+ },
822
+ {
823
+ "epoch": 4.402414486921529,
824
+ "grad_norm": 4.48131799697876,
825
+ "learning_rate": 5.36e-06,
826
+ "logits/chosen": -8.066251754760742,
827
+ "logits/rejected": -7.98406982421875,
828
+ "logps/chosen": -134.0875244140625,
829
+ "logps/rejected": -134.54095458984375,
830
+ "loss": 0.5085,
831
+ "rewards/accuracies": 0.8500000238418579,
832
+ "rewards/chosen": 0.18554718792438507,
833
+ "rewards/margins": 0.5177052617073059,
834
+ "rewards/rejected": -0.33215808868408203,
835
+ "step": 550
836
+ },
837
+ {
838
+ "epoch": 4.482897384305835,
839
+ "grad_norm": 2.9168806076049805,
840
+ "learning_rate": 5.093333333333333e-06,
841
+ "logits/chosen": -8.315153121948242,
842
+ "logits/rejected": -8.331533432006836,
843
+ "logps/chosen": -130.7205352783203,
844
+ "logps/rejected": -128.97933959960938,
845
+ "loss": 0.495,
846
+ "rewards/accuracies": 0.8500000238418579,
847
+ "rewards/chosen": 0.07991756498813629,
848
+ "rewards/margins": 0.5530857443809509,
849
+ "rewards/rejected": -0.47316819429397583,
850
+ "step": 560
851
+ },
852
+ {
853
+ "epoch": 4.563380281690141,
854
+ "grad_norm": 4.712265968322754,
855
+ "learning_rate": 4.826666666666667e-06,
856
+ "logits/chosen": -7.964537143707275,
857
+ "logits/rejected": -8.250347137451172,
858
+ "logps/chosen": -107.0856704711914,
859
+ "logps/rejected": -131.0066680908203,
860
+ "loss": 0.5051,
861
+ "rewards/accuracies": 0.7749999761581421,
862
+ "rewards/chosen": 0.03752124309539795,
863
+ "rewards/margins": 0.5069595575332642,
864
+ "rewards/rejected": -0.46943825483322144,
865
+ "step": 570
866
+ },
867
+ {
868
+ "epoch": 4.6438631790744465,
869
+ "grad_norm": 5.547153472900391,
870
+ "learning_rate": 4.56e-06,
871
+ "logits/chosen": -7.999167442321777,
872
+ "logits/rejected": -7.8645429611206055,
873
+ "logps/chosen": -123.46644592285156,
874
+ "logps/rejected": -145.62319946289062,
875
+ "loss": 0.5287,
876
+ "rewards/accuracies": 0.800000011920929,
877
+ "rewards/chosen": 0.018197838217020035,
878
+ "rewards/margins": 0.43459415435791016,
879
+ "rewards/rejected": -0.4163963198661804,
880
+ "step": 580
881
+ },
882
+ {
883
+ "epoch": 4.724346076458753,
884
+ "grad_norm": 4.132951259613037,
885
+ "learning_rate": 4.2933333333333334e-06,
886
+ "logits/chosen": -7.352658271789551,
887
+ "logits/rejected": -7.5392165184021,
888
+ "logps/chosen": -122.26692962646484,
889
+ "logps/rejected": -144.80618286132812,
890
+ "loss": 0.4491,
891
+ "rewards/accuracies": 0.875,
892
+ "rewards/chosen": 0.1904078871011734,
893
+ "rewards/margins": 0.6379453539848328,
894
+ "rewards/rejected": -0.4475374221801758,
895
+ "step": 590
896
+ },
897
+ {
898
+ "epoch": 4.804828973843058,
899
+ "grad_norm": 3.974529981613159,
900
+ "learning_rate": 4.026666666666667e-06,
901
+ "logits/chosen": -7.725615501403809,
902
+ "logits/rejected": -7.880255222320557,
903
+ "logps/chosen": -126.33785247802734,
904
+ "logps/rejected": -141.9647979736328,
905
+ "loss": 0.4566,
906
+ "rewards/accuracies": 0.875,
907
+ "rewards/chosen": 0.041528504341840744,
908
+ "rewards/margins": 0.6287237405776978,
909
+ "rewards/rejected": -0.5871952176094055,
910
+ "step": 600
911
+ },
912
+ {
913
+ "epoch": 4.885311871227364,
914
+ "grad_norm": 5.846194744110107,
915
+ "learning_rate": 3.7600000000000004e-06,
916
+ "logits/chosen": -7.711016654968262,
917
+ "logits/rejected": -7.9464921951293945,
918
+ "logps/chosen": -140.57571411132812,
919
+ "logps/rejected": -134.4864959716797,
920
+ "loss": 0.5305,
921
+ "rewards/accuracies": 0.7749999761581421,
922
+ "rewards/chosen": 0.10763657093048096,
923
+ "rewards/margins": 0.5108700394630432,
924
+ "rewards/rejected": -0.40323343873023987,
925
+ "step": 610
926
+ },
927
+ {
928
+ "epoch": 4.96579476861167,
929
+ "grad_norm": 4.602907180786133,
930
+ "learning_rate": 3.4933333333333335e-06,
931
+ "logits/chosen": -7.861893653869629,
932
+ "logits/rejected": -7.691662788391113,
933
+ "logps/chosen": -134.75808715820312,
934
+ "logps/rejected": -151.4270782470703,
935
+ "loss": 0.5428,
936
+ "rewards/accuracies": 0.800000011920929,
937
+ "rewards/chosen": 0.012985095381736755,
938
+ "rewards/margins": 0.4465901255607605,
939
+ "rewards/rejected": -0.4336049556732178,
940
+ "step": 620
941
+ },
942
+ {
943
+ "epoch": 5.040241448692153,
944
+ "grad_norm": 4.058650493621826,
945
+ "learning_rate": 3.226666666666667e-06,
946
+ "logits/chosen": -7.957271099090576,
947
+ "logits/rejected": -8.070477485656738,
948
+ "logps/chosen": -139.47348022460938,
949
+ "logps/rejected": -148.40493774414062,
950
+ "loss": 0.4508,
951
+ "rewards/accuracies": 0.8108108043670654,
952
+ "rewards/chosen": 0.11836089193820953,
953
+ "rewards/margins": 0.5762597918510437,
954
+ "rewards/rejected": -0.457898885011673,
955
+ "step": 630
956
+ },
957
+ {
958
+ "epoch": 5.120724346076459,
959
+ "grad_norm": 3.104494571685791,
960
+ "learning_rate": 2.96e-06,
961
+ "logits/chosen": -8.098637580871582,
962
+ "logits/rejected": -8.007506370544434,
963
+ "logps/chosen": -117.58427429199219,
964
+ "logps/rejected": -141.4518585205078,
965
+ "loss": 0.5134,
966
+ "rewards/accuracies": 0.8999999761581421,
967
+ "rewards/chosen": 0.013196405954658985,
968
+ "rewards/margins": 0.4996967315673828,
969
+ "rewards/rejected": -0.4865003228187561,
970
+ "step": 640
971
+ },
972
+ {
973
+ "epoch": 5.201207243460765,
974
+ "grad_norm": 4.8590216636657715,
975
+ "learning_rate": 2.6933333333333335e-06,
976
+ "logits/chosen": -7.682847499847412,
977
+ "logits/rejected": -7.851078987121582,
978
+ "logps/chosen": -128.17315673828125,
979
+ "logps/rejected": -143.27740478515625,
980
+ "loss": 0.4639,
981
+ "rewards/accuracies": 0.925000011920929,
982
+ "rewards/chosen": 0.1449861079454422,
983
+ "rewards/margins": 0.6247243881225586,
984
+ "rewards/rejected": -0.4797382950782776,
985
+ "step": 650
986
+ },
987
+ {
988
+ "epoch": 5.28169014084507,
989
+ "grad_norm": 4.140720367431641,
990
+ "learning_rate": 2.426666666666667e-06,
991
+ "logits/chosen": -8.153270721435547,
992
+ "logits/rejected": -7.843749046325684,
993
+ "logps/chosen": -145.75784301757812,
994
+ "logps/rejected": -151.5276641845703,
995
+ "loss": 0.4924,
996
+ "rewards/accuracies": 0.800000011920929,
997
+ "rewards/chosen": -0.024539103731513023,
998
+ "rewards/margins": 0.5701876878738403,
999
+ "rewards/rejected": -0.5947268009185791,
1000
+ "step": 660
1001
+ },
1002
+ {
1003
+ "epoch": 5.362173038229376,
1004
+ "grad_norm": 3.363086223602295,
1005
+ "learning_rate": 2.16e-06,
1006
+ "logits/chosen": -7.592160701751709,
1007
+ "logits/rejected": -8.117759704589844,
1008
+ "logps/chosen": -136.26693725585938,
1009
+ "logps/rejected": -131.3207550048828,
1010
+ "loss": 0.5143,
1011
+ "rewards/accuracies": 0.800000011920929,
1012
+ "rewards/chosen": 0.13678143918514252,
1013
+ "rewards/margins": 0.4933203160762787,
1014
+ "rewards/rejected": -0.356538861989975,
1015
+ "step": 670
1016
+ },
1017
+ {
1018
+ "epoch": 5.442655935613682,
1019
+ "grad_norm": 4.244340896606445,
1020
+ "learning_rate": 1.8933333333333333e-06,
1021
+ "logits/chosen": -7.977016448974609,
1022
+ "logits/rejected": -7.939810276031494,
1023
+ "logps/chosen": -135.91845703125,
1024
+ "logps/rejected": -139.4970703125,
1025
+ "loss": 0.4766,
1026
+ "rewards/accuracies": 0.824999988079071,
1027
+ "rewards/chosen": 0.14392784237861633,
1028
+ "rewards/margins": 0.6197702884674072,
1029
+ "rewards/rejected": -0.4758424162864685,
1030
+ "step": 680
1031
+ },
1032
+ {
1033
+ "epoch": 5.523138832997988,
1034
+ "grad_norm": 4.6983747482299805,
1035
+ "learning_rate": 1.6266666666666666e-06,
1036
+ "logits/chosen": -8.18427848815918,
1037
+ "logits/rejected": -8.351648330688477,
1038
+ "logps/chosen": -122.5636978149414,
1039
+ "logps/rejected": -129.7288360595703,
1040
+ "loss": 0.5299,
1041
+ "rewards/accuracies": 0.7749999761581421,
1042
+ "rewards/chosen": 0.012566042132675648,
1043
+ "rewards/margins": 0.44475775957107544,
1044
+ "rewards/rejected": -0.43219175934791565,
1045
+ "step": 690
1046
+ },
1047
+ {
1048
+ "epoch": 5.603621730382294,
1049
+ "grad_norm": 3.8425047397613525,
1050
+ "learning_rate": 1.3600000000000001e-06,
1051
+ "logits/chosen": -8.02586555480957,
1052
+ "logits/rejected": -7.971312046051025,
1053
+ "logps/chosen": -128.48104858398438,
1054
+ "logps/rejected": -124.56768798828125,
1055
+ "loss": 0.5179,
1056
+ "rewards/accuracies": 0.7250000238418579,
1057
+ "rewards/chosen": 0.08885373175144196,
1058
+ "rewards/margins": 0.564620316028595,
1059
+ "rewards/rejected": -0.4757665991783142,
1060
+ "step": 700
1061
+ },
1062
+ {
1063
+ "epoch": 5.684104627766599,
1064
+ "grad_norm": 4.374882221221924,
1065
+ "learning_rate": 1.0933333333333334e-06,
1066
+ "logits/chosen": -7.965134620666504,
1067
+ "logits/rejected": -7.833965301513672,
1068
+ "logps/chosen": -131.6035919189453,
1069
+ "logps/rejected": -149.52066040039062,
1070
+ "loss": 0.5213,
1071
+ "rewards/accuracies": 0.824999988079071,
1072
+ "rewards/chosen": -0.04283960908651352,
1073
+ "rewards/margins": 0.503414511680603,
1074
+ "rewards/rejected": -0.5462541580200195,
1075
+ "step": 710
1076
+ },
1077
+ {
1078
+ "epoch": 5.7645875251509056,
1079
+ "grad_norm": 5.414626598358154,
1080
+ "learning_rate": 8.266666666666668e-07,
1081
+ "logits/chosen": -7.947386264801025,
1082
+ "logits/rejected": -8.022677421569824,
1083
+ "logps/chosen": -120.8558120727539,
1084
+ "logps/rejected": -135.9299774169922,
1085
+ "loss": 0.4254,
1086
+ "rewards/accuracies": 0.925000011920929,
1087
+ "rewards/chosen": 0.11365139484405518,
1088
+ "rewards/margins": 0.7651789784431458,
1089
+ "rewards/rejected": -0.6515275239944458,
1090
+ "step": 720
1091
+ },
1092
+ {
1093
+ "epoch": 5.845070422535211,
1094
+ "grad_norm": 2.958667278289795,
1095
+ "learning_rate": 5.6e-07,
1096
+ "logits/chosen": -7.740857124328613,
1097
+ "logits/rejected": -7.85732364654541,
1098
+ "logps/chosen": -140.94491577148438,
1099
+ "logps/rejected": -145.21192932128906,
1100
+ "loss": 0.4625,
1101
+ "rewards/accuracies": 0.8999999761581421,
1102
+ "rewards/chosen": 0.13008669018745422,
1103
+ "rewards/margins": 0.6117427349090576,
1104
+ "rewards/rejected": -0.4816560745239258,
1105
+ "step": 730
1106
+ },
1107
+ {
1108
+ "epoch": 5.925553319919517,
1109
+ "grad_norm": 3.8819665908813477,
1110
+ "learning_rate": 2.9333333333333337e-07,
1111
+ "logits/chosen": -7.447596549987793,
1112
+ "logits/rejected": -7.661311149597168,
1113
+ "logps/chosen": -119.31522369384766,
1114
+ "logps/rejected": -155.86141967773438,
1115
+ "loss": 0.4112,
1116
+ "rewards/accuracies": 0.949999988079071,
1117
+ "rewards/chosen": 0.2565927505493164,
1118
+ "rewards/margins": 0.7656124234199524,
1119
+ "rewards/rejected": -0.509019672870636,
1120
+ "step": 740
1121
+ },
1122
+ {
1123
+ "epoch": 6.0,
1124
+ "grad_norm": 2.323993444442749,
1125
+ "learning_rate": 2.6666666666666667e-08,
1126
+ "logits/chosen": -7.972095489501953,
1127
+ "logits/rejected": -7.644292831420898,
1128
+ "logps/chosen": -119.33430480957031,
1129
+ "logps/rejected": -137.5045928955078,
1130
+ "loss": 0.4655,
1131
+ "rewards/accuracies": 0.7567567825317383,
1132
+ "rewards/chosen": 0.04585576057434082,
1133
+ "rewards/margins": 0.5475000143051147,
1134
+ "rewards/rejected": -0.5016443133354187,
1135
+ "step": 750
1136
+ }
1137
+ ],
1138
+ "logging_steps": 10,
1139
+ "max_steps": 750,
1140
+ "num_input_tokens_seen": 0,
1141
+ "num_train_epochs": 7,
1142
+ "save_steps": 500,
1143
+ "stateful_callbacks": {
1144
+ "TrainerControl": {
1145
+ "args": {
1146
+ "should_epoch_stop": false,
1147
+ "should_evaluate": false,
1148
+ "should_log": false,
1149
+ "should_save": true,
1150
+ "should_training_stop": true
1151
+ },
1152
+ "attributes": {}
1153
+ }
1154
+ },
1155
+ "total_flos": 0.0,
1156
+ "train_batch_size": 1,
1157
+ "trial_name": null,
1158
+ "trial_params": null
1159
+ }
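
Editor's note: the `log_history` above records preference-optimization metrics of the kind `trl`'s `DPOTrainer` logs (`rewards/chosen`, `rewards/rejected`, `rewards/margins`, `rewards/accuracies`) every 10 steps across 750 steps, 6 epochs at train batch size 1. Over that run the loss falls from about 0.70 to roughly 0.45–0.55 and the chosen-vs-rejected reward margin grows from near zero to about 0.5–0.8. A hedged sketch of pulling that trend back out of the saved state file:

```python
# Sketch: read checkpoint-750/trainer_state.json and print how the loss and
# the chosen-vs-rejected reward margin evolve over training.
import json

with open("checkpoint-750/trainer_state.json") as f:
    state = json.load(f)

print(f"epochs: {state['epoch']}, steps: {state['global_step']}")

for entry in state["log_history"]:
    step = entry["step"]
    loss = entry["loss"]
    margin = entry["rewards/margins"]
    acc = entry["rewards/accuracies"]
    print(f"step {step:4d}  loss {loss:.4f}  reward margin {margin:+.4f}  accuracy {acc:.2f}")
```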
checkpoint-750/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:221328b488cbd7c8001fec74cbc28a889d631151b99e80fa9d7de1e2595f7246
+ size 6200
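
Editor's note: `training_args.bin` is a pickled copy of the trainer's argument object, not model weights. If the source is trusted, it can be inspected with `torch.load`; a hedged sketch (recent PyTorch requires `weights_only=False` to unpickle arbitrary objects, and `transformers`/`trl` must be importable so the class resolves — the exact class here is an assumption):

```python
# Sketch: inspect the pickled training arguments saved alongside the checkpoint.
# Only do this for files you trust -- unpickling can execute arbitrary code.
import torch

args = torch.load("checkpoint-750/training_args.bin", weights_only=False)

print(type(args).__name__)               # a TrainingArguments subclass (likely trl's DPOConfig)
print(args.num_train_epochs)
print(args.learning_rate)
print(args.per_device_train_batch_size)
```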
runs/Apr18_06-26-04_81a10bb95825/events.out.tfevents.1744957564.81a10bb95825.14299.0 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c2dc45b7117e3c4454389fc7f22e33e8fb9be5d9d4259d13f9ad7c9bdf7735d6
+ size 57765
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": "</s>",
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ab1b681ec7fc02fed5edd3026687d7a692a918c4dd8e150ca2e3994a6229843b
+ size 534194
tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+ "add_bos_token": true,
+ "add_eos_token": false,
+ "add_prefix_space": null,
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": true,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "</s>",
+ "extra_special_tokens": {},
+ "legacy": true,
+ "model_max_length": 2048,
+ "pad_token": "</s>",
+ "sp_model_kwargs": {},
+ "spaces_between_special_tokens": false,
+ "tokenizer_class": "LlamaTokenizer",
+ "unk_token": "<unk>",
+ "use_default_system_prompt": false
+ }