lucian-li committed
Commit feae91a · verified · 1 Parent(s): 58520dd

Add new SentenceTransformer model
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,1192 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:583058
+ - loss:MultipleNegativesRankingLoss
+ base_model: Alibaba-NLP/gte-multilingual-base
+ widget:
+ - source_sentence: 'Pre-Emphasis (PE)
+
+     A pre-emphasis filter is applied to the framed offset-free input signal:
+
+     s_pe(n) = s_of(n) - 0.97 * s_of(n - 1)'
+   sentences:
+   - 'Windowing (W)
+
+     A Hamming window of length N is applied to the output of the pre-emphasis block:
+
+     s_w(n) = {0.54 - 0.46 * cos(2*pi*(n - 1)/(N - 1))} * s_pe(n), 1 <= n <= N'
+   - 'Group or broadcast call, called mobile stations (GSM only)
+
+     Within each set of voice group call or voice broadcast call attributes stored
+     in the GCR as defined in 3GPP TS 43.068
+
+     and 3GPP TS 43.069, respectively, a priority level is included if eMLPP is applied.
+     The priority level will be provided
+
+     by the GCR to the MSC together with the call attributes.
+
+     The priority level shall be indicated together with the related notification messages
+     and treated in the mobile station as
+
+     defined in 3GPP TS 43.0'
+   - 'Description of the access technology indicator mechanism
+
+     This clause describes the mechanisms that can be employed to indicate access technology
+     specific dependencies in a
+
+     multi-access technology environment.
+
+     There are cases where toolkit applications need to know which access technology
+     the terminal is currently in so that it
+
+     can issue access technology dependent commands as well as determine that the response
+     to a particular command is
+
+     technology dependent. Setting up the event, ACCESS TECHNOL'
+ - source_sentence: 'Distribution of DL delay between NG-RAN and UE
+
+     a) This measurement provides the distribution of DL packet delay between NG-RAN
+     and UE, which is the delay
+
+     incurred in NG-RAN (including the delay at gNB-CU-UP, on F1-U and on gNB-DU) and
+     the delay over Uu
+
+     interface. This measurement is split into subcounters per 5QI and subcounters
+     per S-NSSAI.
+
+     b) DER (n=1).
+
+
+     ETSI
+
+     ETSI TS 128 552 V16.18.0 (2024-08)'
+   sentences:
+   - 'Distribution of UL delay between NG-RAN and UE
+
+     a) This measurement provides the distribution of UL packet delay between NG-RAN
+     and UE, which is the delay
+
+     incurred in NG-RAN (including the delay at gNB-CU-UP, on F1-U and on gNB-DU) and
+     the delay over Uu
+
+     interface. This measurement is split into subcounters per 5QI and subcounters
+     per S-NSSAI.
+
+     b) DER (n=1).
+
+     c) The measurement is obtained by the following method:
+
+
+     The gNB performs the GTP PDU packet delay measurement for QoS monitoring per the
+     GTP '
+   - 'Subscriber data
+
+     Subscription to MExE services shall be logically separate to subscription of network
+     services. A subscriber may have a
+
+     MExE subscription to multiple MExE service providers. It may also be possible
+     for the subscriber to interrogate such
+
+     subscription registration (with a suitable means of authorisation), depending
+     on PLMN support.'
+   - 'MSC for LMU Control
+
+     When a control message has to be routed to an LMU from an SMLC, the SMLC addresses
+     the serving MSC for the
+
+     LMU using an E.164 address.
+
+
+     ETSI
+
+     ETSI TS 129 002 V10.6.0 (2012-04)'
+ - source_sentence: 'Enter SMS Block Mode Protocol +CESP
+
+     Table 3.2.4-1: +CESP Action Command Syntax
+
+     Command
+
+     Possible response(s)
+
+     +CESP
+
+
+     +CESP=?
+
+
+
+     Description
+
+     Execution command sets the TA in SMS block protocol mode. The TA shall return
+     OK (or 0) to confirm acceptance of
+
+     the command prior to entering the block mode (see clause 2.1.1). The final result
+     code OK (or 0) shall be returned when
+
+     the block mode is exited.
+
+     NOTE:
+
+     Commands following +CESP in the AT command line must not be processed by the TA.
+
+     Implementation
+
+     Ma'
+   sentences:
+   - 'SGSN
+
+     To support NBIFOM, the SGSN needs to be capable to:
+
+
+     ETSI
+
+     ETSI TS 123 161 V14.0.0 (2017-05)'
+   - 'Message Service Failure Result Code +CMS ERROR
+
+     Final result code +CMS ERROR: <err> indicates an error related to mobile equipment
+     or network. The operation is
+
+     similar to ERROR final result code. None of the following commands in the same
+     command line is executed. Neither
+
+     ERROR nor OK final result code shall be returned. ERROR is returned normally when
+     error is related to syntax or invalid
+
+     parameters.
+
+     Defined Values
+
+     <err> values used by common messaging commands:'
+   - 'C C - - P Service Priority Level'
+ - source_sentence: 'Definition
+
+     Cell synchronization accuracy is defined as the maximum deviation in frame start
+     times between any pair of cells on the
+
+     same frequency that have overlapping coverage areas.'
+   sentences:
+   - 'Minimum requirements
+
+     The cell synchronization accuracy shall be better than or equal to 3μs.'
+   - "Subsequent Inter-MSC Handover to third MSC\nWhen a Mobile Station is being handed
+     over to a third MSC, the procedure (described in GSM 03.09)\ndoes require one
+     specific interworking case in MSC-A (figure 20) between E-Interface from MSC-B
+     and E-Interface from MSC-B' other than the combination of the ones described
+     in the chapter 4.5.1 and 4.5.2.\n[figure 20, a character-graphics signalling
+     diagram between MSC-A, MSC-B and MSC-B', survives only as mojibake in the source
+     and is omitted]"
+   - 'DL Total PRB Usage
+
+     a) This measurement provides the total usage (in percentage) of physical resource
+     blocks (PRBs) on the downlink
+
+     for any purpose.
+
+     b) SI
+
+     c) This measurement is obtained as: (number of PRBs used on the downlink) /
+     (total number of PRBs available on the downlink) × 100'
+ - source_sentence: Carrier aggregation measurement accuracy
+   sentences:
+   - 'PUCCH / PUSCH / SRS time mask
+
+     The PUCCH/PUSCH/SRS time mask defines the observation period between sounding
+     reference symbol (SRS) and an
+
+     adjacent PUSCH/PUCCH symbol and subsequent sub-frame.
+
+     There are no additional requirements on UE transmit power beyond that which is
+     required in subclause 6.2.2 and
+
+     subclause 6.6.2.3
+
+
+     ETSI
+
+     ETSI TS 136 101 V9.16.0 (2013-07)'
+   - 'Reference Signal Time Difference (RSTD) Measurement Accuracy
+
+     Requirements for Carrier Aggregation
+
+     A.8
+
+     UE Measurements Procedures
+
+     A.9
+
+     Measurement Performance Requirements
+
+     NOTE:
+
+     Only requirements and test cases in this table defined for inter-band carrier
+     aggregation shall apply.
+
+
+     ETSI
+
+     ETSI TS 136 307 V10.17.0 (2016-01)'
+   - 'Operator control
+
+     Three general architectures are candidates to offer energy savings functionalities:
+
+     Distributed, NM-Centralized, EM-Centralized as defined in TS 32.500 [6].
+
+     Energy savings in cells can be initiated in several different ways. Some of the
+     mechanisms are:
+
+     For NM-centralized architecture
+
+     -
+
+     IRPManager instructs the cells to move to energySaving state (e.g. according to
+     a schedule determined by
+
+     network statistics) , configures trigger points (e.g. load threshold crossing)
+     when it want'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ ---
+
+ # SentenceTransformer based on Alibaba-NLP/gte-multilingual-base
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Alibaba-NLP/gte-multilingual-base](https://huggingface.co/Alibaba-NLP/gte-multilingual-base) <!-- at revision 9fdd4ee8bba0e2808a34e0e739576f6740d2b225 -->
+ - **Maximum Sequence Length:** 8192 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
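+
+ The module list above is a three-step pipeline: tokenize, run the transformer, take the `[CLS]` token vector (CLS pooling, per `1_Pooling/config.json`), then L2-normalize. As a rough sketch, the same embeddings can be computed with plain `transformers`; this assumes the `lucian-li/my_new_model` repository id from the usage example below and that loading the custom `NewModel` code with `trust_remote_code=True` is acceptable:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoModel, AutoTokenizer
+
+ repo = "lucian-li/my_new_model"  # assumed repo id, as in the usage example below
+ tokenizer = AutoTokenizer.from_pretrained(repo)
+ model = AutoModel.from_pretrained(repo, trust_remote_code=True)  # custom "new" architecture
+
+ batch = tokenizer(["Carrier aggregation measurement accuracy"],
+                   padding=True, truncation=True, max_length=8192, return_tensors="pt")
+ with torch.no_grad():
+     hidden = model(**batch).last_hidden_state  # (batch, seq_len, 768)
+ cls = hidden[:, 0]                             # CLS pooling
+ embeddings = F.normalize(cls, p=2, dim=1)      # Normalize() module
+ print(embeddings.shape)                        # torch.Size([1, 768])
+ ```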
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("lucian-li/my_new_model")
+ # Run inference
+ sentences = [
+     'Carrier aggregation measurement accuracy',
+     'Reference Signal Time Difference (RSTD) Measurement Accuracy\nRequirements for Carrier Aggregation\nA.8\nUE Measurements Procedures\nA.9\nMeasurement Performance Requirements\nNOTE:\nOnly requirements and test cases in this table defined for inter-band carrier aggregation shall apply.\n\n\nETSI\nETSI TS 136 307 V10.17.0 (2016-01)',
+     'Operator control\nThree general architectures are candidates to offer energy savings functionalities:\nDistributed, NM-Centralized, EM-Centralized as defined in TS 32.500 [6].\nEnergy savings in cells can be initiated in several different ways. Some of the mechanisms are:\nFor NM-centralized architecture\n-\nIRPManager instructs the cells to move to energySaving state (e.g. according to a schedule determined by\nnetwork statistics) , configures trigger points (e.g. load threshold crossing) when it want',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
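+
+ Because the final `Normalize()` module makes every embedding unit-length, the cosine similarity computed by `model.similarity` coincides with a plain dot product. Building on that, here is a small retrieval sketch; the corpus strings are headings taken from the widget examples above, while the query string is invented for illustration:
+
+ ```python
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("lucian-li/my_new_model")
+
+ corpus = [
+     "Distribution of DL delay between NG-RAN and UE",
+     "Enter SMS Block Mode Protocol +CESP",
+     "Cell synchronization accuracy",
+ ]
+ corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
+
+ query_embedding = model.encode("How accurately must neighbouring cells be synchronized?",
+                                convert_to_tensor=True)
+ # For each query, util.semantic_search returns the top_k corpus hits with cosine scores
+ hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
+ for hit in hits:
+     print(corpus[hit["corpus_id"]], round(hit["score"], 4))
+ ```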
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 583,058 training samples
+ * Columns: <code>anchor</code> and <code>positive</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor                                                                              | positive                                                                            |
+   |:--------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
+   | type    | string                                                                              | string                                                                              |
+   | details | <ul><li>min: 7 tokens</li><li>mean: 85.73 tokens</li><li>max: 229 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 85.86 tokens</li><li>max: 229 tokens</li></ul> |
+ * Samples:
+   | anchor | positive |
+   |:-------|:---------|
+   | <code>Triggering Optimization Function (TG_F)<br>This functional bloc supports the following functions: [SO2], [SO3].</code> | <code>Optimization Fallback Function (O_FB_F)<br>This functional bloc supports the following functions: [SO7], [SO9], [SO10].</code> |
+   | <code>Optimization Fallback Function (O_FB_F)<br>This functional bloc supports the following functions: [SO7], [SO9], [SO10].</code> | <code>Self-Optimization Progress Update Function (SO_PGS_UF)<br>This function updates the self-optimization progress and important events to the operator: [SO11]</code> |
+   | <code>Self-Optimization Progress Update Function (SO_PGS_UF)<br>This function updates the self-optimization progress and important events to the operator: [SO11]</code> | <code>NRM IRP Update Function (NRM_UF)<br>This function updates the E-UTRAN and EPC NRM IRP with the optimization modification if needed.</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
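+
+ For reference, a run like this one could be reproduced with the sentence-transformers v3 trainer roughly as sketched below, combining the loss settings above with the non-default hyperparameters listed in the next section; the (anchor, positive) pair is an illustrative stand-in for the real 583,058-row dataset:
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ from sentence_transformers.losses import MultipleNegativesRankingLoss
+
+ model = SentenceTransformer("Alibaba-NLP/gte-multilingual-base", trust_remote_code=True)
+
+ # Illustrative (anchor, positive) pair; the actual training set has 583,058 rows.
+ train_dataset = Dataset.from_dict({
+     "anchor": ["Distribution of DL delay between NG-RAN and UE ..."],
+     "positive": ["Distribution of UL delay between NG-RAN and UE ..."],
+ })
+
+ # In-batch negatives: every other positive in a batch acts as a negative for each anchor.
+ loss = MultipleNegativesRankingLoss(model, scale=20.0)
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="my_new_model",
+     per_device_train_batch_size=11,  # non-default values from this card
+     num_train_epochs=1,
+     warmup_ratio=0.1,
+ )
+
+ trainer = SentenceTransformerTrainer(
+     model=model, args=args, train_dataset=train_dataset, loss=loss,
+ )
+ trainer.train()
+ ```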
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `per_device_train_batch_size`: 11
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: no
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 11
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `tp_size`: 0
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch  | Step  | Training Loss |
+ |:------:|:-----:|:-------------:|
+ | 0.0019 | 100   | 0.8198        |
+ | 0.0038 | 200   | 0.7651        |
+ | 0.0057 | 300   | 0.6659        |
+ | 0.0075 | 400   | 0.6404        |
+ | 0.0094 | 500   | 0.5638        |
+ | 0.0113 | 600   | 0.5184        |
+ | 0.0132 | 700   | 0.448         |
+ | 0.0151 | 800   | 0.4464        |
+ | 0.0170 | 900   | 0.3461        |
+ | 0.0189 | 1000  | 0.3731        |
+ | 0.0208 | 1100  | 0.343         |
+ | 0.0226 | 1200  | 0.3557        |
+ | 0.0245 | 1300  | 0.3623        |
+ | 0.0264 | 1400  | 0.2941        |
+ | 0.0283 | 1500  | 0.3153        |
+ | 0.0302 | 1600  | 0.2724        |
+ | 0.0321 | 1700  | 0.2702        |
+ | 0.0340 | 1800  | 0.2934        |
+ | 0.0358 | 1900  | 0.2255        |
+ | 0.0377 | 2000  | 0.2519        |
+ | 0.0396 | 2100  | 0.2424        |
+ | 0.0415 | 2200  | 0.1883        |
+ | 0.0434 | 2300  | 0.2428        |
+ | 0.0453 | 2400  | 0.2212        |
+ | 0.0472 | 2500  | 0.1862        |
+ | 0.0491 | 2600  | 0.2451        |
+ | 0.0509 | 2700  | 0.2336        |
+ | 0.0528 | 2800  | 0.225         |
+ | 0.0547 | 2900  | 0.2154        |
+ | 0.0566 | 3000  | 0.1907        |
+ | 0.0585 | 3100  | 0.2514        |
+ | 0.0604 | 3200  | 0.2082        |
+ | 0.0623 | 3300  | 0.2076        |
+ | 0.0641 | 3400  | 0.1818        |
+ | 0.0660 | 3500  | 0.1688        |
+ | 0.0679 | 3600  | 0.2261        |
+ | 0.0698 | 3700  | 0.2108        |
+ | 0.0717 | 3800  | 0.1732        |
+ | 0.0736 | 3900  | 0.1764        |
+ | 0.0755 | 4000  | 0.1481        |
+ | 0.0773 | 4100  | 0.1687        |
+ | 0.0792 | 4200  | 0.1897        |
+ | 0.0811 | 4300  | 0.1685        |
+ | 0.0830 | 4400  | 0.1915        |
+ | 0.0849 | 4500  | 0.2013        |
+ | 0.0868 | 4600  | 0.1701        |
+ | 0.0887 | 4700  | 0.2006        |
+ | 0.0906 | 4800  | 0.2006        |
+ | 0.0924 | 4900  | 0.1617        |
+ | 0.0943 | 5000  | 0.1406        |
+ | 0.0962 | 5100  | 0.1456        |
+ | 0.0981 | 5200  | 0.1703        |
+ | 0.1000 | 5300  | 0.1464        |
+ | 0.1019 | 5400  | 0.1803        |
+ | 0.1038 | 5500  | 0.1346        |
+ | 0.1056 | 5600  | 0.134         |
+ | 0.1075 | 5700  | 0.1567        |
+ | 0.1094 | 5800  | 0.163         |
+ | 0.1113 | 5900  | 0.1544        |
+ | 0.1132 | 6000  | 0.1648        |
+ | 0.1151 | 6100  | 0.1505        |
+ | 0.1170 | 6200  | 0.1231        |
+ | 0.1189 | 6300  | 0.1591        |
+ | 0.1207 | 6400  | 0.1533        |
+ | 0.1226 | 6500  | 0.1376        |
+ | 0.1245 | 6600  | 0.1473        |
+ | 0.1264 | 6700  | 0.1405        |
+ | 0.1283 | 6800  | 0.141         |
+ | 0.1302 | 6900  | 0.1105        |
+ | 0.1321 | 7000  | 0.1712        |
+ | 0.1339 | 7100  | 0.1534        |
+ | 0.1358 | 7200  | 0.1578        |
+ | 0.1377 | 7300  | 0.1101        |
+ | 0.1396 | 7400  | 0.128         |
+ | 0.1415 | 7500  | 0.1679        |
+ | 0.1434 | 7600  | 0.1592        |
+ | 0.1453 | 7700  | 0.1383        |
+ | 0.1472 | 7800  | 0.1274        |
+ | 0.1490 | 7900  | 0.1616        |
+ | 0.1509 | 8000  | 0.1617        |
+ | 0.1528 | 8100  | 0.1361        |
+ | 0.1547 | 8200  | 0.1268        |
+ | 0.1566 | 8300  | 0.1286        |
+ | 0.1585 | 8400  | 0.1253        |
+ | 0.1604 | 8500  | 0.1157        |
+ | 0.1622 | 8600  | 0.1499        |
+ | 0.1641 | 8700  | 0.1398        |
+ | 0.1660 | 8800  | 0.1188        |
+ | 0.1679 | 8900  | 0.1103        |
+ | 0.1698 | 9000  | 0.1217        |
+ | 0.1717 | 9100  | 0.1144        |
+ | 0.1736 | 9200  | 0.1203        |
+ | 0.1755 | 9300  | 0.1074        |
+ | 0.1773 | 9400  | 0.1145        |
+ | 0.1792 | 9500  | 0.1035        |
+ | 0.1811 | 9600  | 0.1406        |
+ | 0.1830 | 9700  | 0.1465        |
+ | 0.1849 | 9800  | 0.1169        |
+ | 0.1868 | 9900  | 0.1115        |
+ | 0.1887 | 10000 | 0.1207        |
+ | 0.1905 | 10100 | 0.1191        |
+ | 0.1924 | 10200 | 0.1099        |
+ | 0.1943 | 10300 | 0.1309        |
+ | 0.1962 | 10400 | 0.1092        |
+ | 0.1981 | 10500 | 0.1075        |
+ | 0.2000 | 10600 | 0.1174        |
+ | 0.2019 | 10700 | 0.1103        |
+ | 0.2038 | 10800 | 0.1077        |
+ | 0.2056 | 10900 | 0.0844        |
+ | 0.2075 | 11000 | 0.1093        |
+ | 0.2094 | 11100 | 0.1428        |
+ | 0.2113 | 11200 | 0.0928        |
+ | 0.2132 | 11300 | 0.1039        |
+ | 0.2151 | 11400 | 0.1436        |
+ | 0.2170 | 11500 | 0.1197        |
+ | 0.2188 | 11600 | 0.1249        |
+ | 0.2207 | 11700 | 0.0856        |
+ | 0.2226 | 11800 | 0.1126        |
+ | 0.2245 | 11900 | 0.1028        |
+ | 0.2264 | 12000 | 0.0988        |
+ | 0.2283 | 12100 | 0.1031        |
+ | 0.2302 | 12200 | 0.101         |
+ | 0.2320 | 12300 | 0.1188        |
+ | 0.2339 | 12400 | 0.0908        |
+ | 0.2358 | 12500 | 0.069         |
+ | 0.2377 | 12600 | 0.1099        |
+ | 0.2396 | 12700 | 0.1227        |
+ | 0.2415 | 12800 | 0.0794        |
+ | 0.2434 | 12900 | 0.0969        |
+ | 0.2453 | 13000 | 0.0864        |
+ | 0.2471 | 13100 | 0.1193        |
+ | 0.2490 | 13200 | 0.0824        |
+ | 0.2509 | 13300 | 0.12          |
+ | 0.2528 | 13400 | 0.0928        |
+ | 0.2547 | 13500 | 0.1126        |
+ | 0.2566 | 13600 | 0.0912        |
+ | 0.2585 | 13700 | 0.1126        |
+ | 0.2603 | 13800 | 0.078         |
+ | 0.2622 | 13900 | 0.0715        |
+ | 0.2641 | 14000 | 0.1095        |
+ | 0.2660 | 14100 | 0.089         |
+ | 0.2679 | 14200 | 0.0926        |
+ | 0.2698 | 14300 | 0.086         |
+ | 0.2717 | 14400 | 0.1115        |
+ | 0.2736 | 14500 | 0.0996        |
+ | 0.2754 | 14600 | 0.1014        |
+ | 0.2773 | 14700 | 0.1033        |
+ | 0.2792 | 14800 | 0.0732        |
+ | 0.2811 | 14900 | 0.0994        |
+ | 0.2830 | 15000 | 0.0872        |
+ | 0.2849 | 15100 | 0.0923        |
+ | 0.2868 | 15200 | 0.111         |
+ | 0.2886 | 15300 | 0.0891        |
+ | 0.2905 | 15400 | 0.0868        |
+ | 0.2924 | 15500 | 0.0773        |
+ | 0.2943 | 15600 | 0.0918        |
+ | 0.2962 | 15700 | 0.0726        |
+ | 0.2981 | 15800 | 0.0951        |
+ | 0.3000 | 15900 | 0.0835        |
+ | 0.3019 | 16000 | 0.083         |
+ | 0.3037 | 16100 | 0.095         |
+ | 0.3056 | 16200 | 0.0722        |
+ | 0.3075 | 16300 | 0.1061        |
+ | 0.3094 | 16400 | 0.0902        |
+ | 0.3113 | 16500 | 0.0978        |
+ | 0.3132 | 16600 | 0.0983        |
+ | 0.3151 | 16700 | 0.0808        |
+ | 0.3169 | 16800 | 0.0758        |
+ | 0.3188 | 16900 | 0.071         |
+ | 0.3207 | 17000 | 0.0918        |
+ | 0.3226 | 17100 | 0.1011        |
+ | 0.3245 | 17200 | 0.079         |
+ | 0.3264 | 17300 | 0.0992        |
+ | 0.3283 | 17400 | 0.1089        |
+ | 0.3302 | 17500 | 0.0904        |
+ | 0.3320 | 17600 | 0.0956        |
+ | 0.3339 | 17700 | 0.0747        |
+ | 0.3358 | 17800 | 0.0961        |
+ | 0.3377 | 17900 | 0.0923        |
+ | 0.3396 | 18000 | 0.1114        |
+ | 0.3415 | 18100 | 0.0689        |
+ | 0.3434 | 18200 | 0.1308        |
+ | 0.3452 | 18300 | 0.0923        |
+ | 0.3471 | 18400 | 0.0756        |
+ | 0.3490 | 18500 | 0.0842        |
+ | 0.3509 | 18600 | 0.0859        |
+ | 0.3528 | 18700 | 0.0903        |
+ | 0.3547 | 18800 | 0.084         |
+ | 0.3566 | 18900 | 0.0923        |
+ | 0.3584 | 19000 | 0.0848        |
+ | 0.3603 | 19100 | 0.0812        |
+ | 0.3622 | 19200 | 0.0872        |
+ | 0.3641 | 19300 | 0.083         |
+ | 0.3660 | 19400 | 0.0826        |
+ | 0.3679 | 19500 | 0.101         |
+ | 0.3698 | 19600 | 0.0804        |
+ | 0.3717 | 19700 | 0.0676        |
+ | 0.3735 | 19800 | 0.0836        |
+ | 0.3754 | 19900 | 0.0711        |
+ | 0.3773 | 20000 | 0.0825        |
+ | 0.3792 | 20100 | 0.0835        |
+ | 0.3811 | 20200 | 0.0816        |
+ | 0.3830 | 20300 | 0.0812        |
+ | 0.3849 | 20400 | 0.0689        |
+ | 0.3867 | 20500 | 0.0627        |
+ | 0.3886 | 20600 | 0.0965        |
+ | 0.3905 | 20700 | 0.0632        |
+ | 0.3924 | 20800 | 0.0945        |
+ | 0.3943 | 20900 | 0.0923        |
+ | 0.3962 | 21000 | 0.0833        |
+ | 0.3981 | 21100 | 0.0537        |
+ | 0.4000 | 21200 | 0.0822        |
+ | 0.4018 | 21300 | 0.0684        |
+ | 0.4037 | 21400 | 0.0807        |
+ | 0.4056 | 21500 | 0.0945        |
+ | 0.4075 | 21600 | 0.0981        |
+ | 0.4094 | 21700 | 0.0748        |
+ | 0.4113 | 21800 | 0.0943        |
+ | 0.4132 | 21900 | 0.0709        |
+ | 0.4150 | 22000 | 0.0551        |
+ | 0.4169 | 22100 | 0.0679        |
+ | 0.4188 | 22200 | 0.0666        |
+ | 0.4207 | 22300 | 0.0976        |
+ | 0.4226 | 22400 | 0.0666        |
+ | 0.4245 | 22500 | 0.0651        |
+ | 0.4264 | 22600 | 0.0803        |
+ | 0.4283 | 22700 | 0.068         |
+ | 0.4301 | 22800 | 0.0541        |
+ | 0.4320 | 22900 | 0.0487        |
+ | 0.4339 | 23000 | 0.091         |
+ | 0.4358 | 23100 | 0.074         |
+ | 0.4377 | 23200 | 0.0733        |
+ | 0.4396 | 23300 | 0.0845        |
+ | 0.4415 | 23400 | 0.0823        |
+ | 0.4433 | 23500 | 0.0561        |
+ | 0.4452 | 23600 | 0.0508        |
+ | 0.4471 | 23700 | 0.074         |
+ | 0.4490 | 23800 | 0.0683        |
+ | 0.4509 | 23900 | 0.0797        |
+ | 0.4528 | 24000 | 0.0561        |
+ | 0.4547 | 24100 | 0.0744        |
+ | 0.4566 | 24200 | 0.0638        |
+ | 0.4584 | 24300 | 0.0633        |
+ | 0.4603 | 24400 | 0.062         |
+ | 0.4622 | 24500 | 0.0887        |
+ | 0.4641 | 24600 | 0.0908        |
+ | 0.4660 | 24700 | 0.0654        |
+ | 0.4679 | 24800 | 0.0522        |
+ | 0.4698 | 24900 | 0.0851        |
+ | 0.4716 | 25000 | 0.0763        |
+ | 0.4735 | 25100 | 0.0623        |
+ | 0.4754 | 25200 | 0.0712        |
+ | 0.4773 | 25300 | 0.0866        |
+ | 0.4792 | 25400 | 0.0812        |
+ | 0.4811 | 25500 | 0.0706        |
+ | 0.4830 | 25600 | 0.0734        |
+ | 0.4849 | 25700 | 0.068         |
+ | 0.4867 | 25800 | 0.111         |
+ | 0.4886 | 25900 | 0.0627        |
+ | 0.4905 | 26000 | 0.0459        |
+ | 0.4924 | 26100 | 0.0794        |
+ | 0.4943 | 26200 | 0.0547        |
+ | 0.4962 | 26300 | 0.0779        |
+ | 0.4981 | 26400 | 0.0609        |
+ | 0.4999 | 26500 | 0.0785        |
+ | 0.5018 | 26600 | 0.0722        |
+ | 0.5037 | 26700 | 0.0585        |
+ | 0.5056 | 26800 | 0.0572        |
+ | 0.5075 | 26900 | 0.0636        |
+ | 0.5094 | 27000 | 0.0642        |
+ | 0.5113 | 27100 | 0.0606        |
+ | 0.5131 | 27200 | 0.0725        |
+ | 0.5150 | 27300 | 0.0664        |
+ | 0.5169 | 27400 | 0.0933        |
+ | 0.5188 | 27500 | 0.0486        |
+ | 0.5207 | 27600 | 0.0514        |
+ | 0.5226 | 27700 | 0.0779        |
+ | 0.5245 | 27800 | 0.0614        |
+ | 0.5264 | 27900 | 0.0646        |
+ | 0.5282 | 28000 | 0.0606        |
+ | 0.5301 | 28100 | 0.0453        |
+ | 0.5320 | 28200 | 0.0749        |
+ | 0.5339 | 28300 | 0.0695        |
+ | 0.5358 | 28400 | 0.0897        |
+ | 0.5377 | 28500 | 0.0612        |
+ | 0.5396 | 28600 | 0.0542        |
+ | 0.5414 | 28700 | 0.0504        |
+ | 0.5433 | 28800 | 0.0539        |
+ | 0.5452 | 28900 | 0.0584        |
+ | 0.5471 | 29000 | 0.0552        |
+ | 0.5490 | 29100 | 0.076         |
+ | 0.5509 | 29200 | 0.0861        |
+ | 0.5528 | 29300 | 0.067         |
+ | 0.5547 | 29400 | 0.0887        |
+ | 0.5565 | 29500 | 0.059         |
+ | 0.5584 | 29600 | 0.0484        |
+ | 0.5603 | 29700 | 0.0703        |
+ | 0.5622 | 29800 | 0.0802        |
+ | 0.5641 | 29900 | 0.0805        |
+ | 0.5660 | 30000 | 0.0737        |
+ | 0.5679 | 30100 | 0.0518        |
+ | 0.5697 | 30200 | 0.0517        |
+ | 0.5716 | 30300 | 0.0806        |
+ | 0.5735 | 30400 | 0.0586        |
+ | 0.5754 | 30500 | 0.0491        |
+ | 0.5773 | 30600 | 0.0591        |
+ | 0.5792 | 30700 | 0.066         |
+ | 0.5811 | 30800 | 0.0419        |
+ | 0.5830 | 30900 | 0.0517        |
+ | 0.5848 | 31000 | 0.0539        |
+ | 0.5867 | 31100 | 0.0845        |
+ | 0.5886 | 31200 | 0.044         |
+ | 0.5905 | 31300 | 0.0597        |
+ | 0.5924 | 31400 | 0.0556        |
+ | 0.5943 | 31500 | 0.0724        |
+ | 0.5962 | 31600 | 0.0465        |
+ | 0.5980 | 31700 | 0.0585        |
+ | 0.5999 | 31800 | 0.0978        |
+ | 0.6018 | 31900 | 0.0657        |
+ | 0.6037 | 32000 | 0.0438        |
+ | 0.6056 | 32100 | 0.0429        |
+ | 0.6075 | 32200 | 0.0629        |
+ | 0.6094 | 32300 | 0.0591        |
+ | 0.6113 | 32400 | 0.0543        |
+ | 0.6131 | 32500 | 0.0502        |
+ | 0.6150 | 32600 | 0.0733        |
+ | 0.6169 | 32700 | 0.0426        |
+ | 0.6188 | 32800 | 0.0626        |
+ | 0.6207 | 32900 | 0.0406        |
+ | 0.6226 | 33000 | 0.0524        |
+ | 0.6245 | 33100 | 0.0619        |
+ | 0.6263 | 33200 | 0.0633        |
+ | 0.6282 | 33300 | 0.0582        |
+ | 0.6301 | 33400 | 0.0852        |
+ | 0.6320 | 33500 | 0.0482        |
+ | 0.6339 | 33600 | 0.0509        |
+ | 0.6358 | 33700 | 0.0626        |
+ | 0.6377 | 33800 | 0.0609        |
+ | 0.6396 | 33900 | 0.0508        |
+ | 0.6414 | 34000 | 0.0486        |
+ | 0.6433 | 34100 | 0.0508        |
+ | 0.6452 | 34200 | 0.0581        |
+ | 0.6471 | 34300 | 0.0409        |
+ | 0.6490 | 34400 | 0.0703        |
+ | 0.6509 | 34500 | 0.0606        |
+ | 0.6528 | 34600 | 0.0517        |
+ | 0.6546 | 34700 | 0.0493        |
+ | 0.6565 | 34800 | 0.0271        |
+ | 0.6584 | 34900 | 0.0337        |
+ | 0.6603 | 35000 | 0.0369        |
+ | 0.6622 | 35100 | 0.0474        |
+ | 0.6641 | 35200 | 0.0562        |
+ | 0.6660 | 35300 | 0.0663        |
+ | 0.6678 | 35400 | 0.0419        |
+ | 0.6697 | 35500 | 0.0766        |
+ | 0.6716 | 35600 | 0.0439        |
+ | 0.6735 | 35700 | 0.0538        |
+ | 0.6754 | 35800 | 0.0512        |
+ | 0.6773 | 35900 | 0.0388        |
+ | 0.6792 | 36000 | 0.0528        |
+ | 0.6811 | 36100 | 0.0489        |
+ | 0.6829 | 36200 | 0.0454        |
+ | 0.6848 | 36300 | 0.0449        |
+ | 0.6867 | 36400 | 0.055         |
+ | 0.6886 | 36500 | 0.0344        |
+ | 0.6905 | 36600 | 0.0485        |
+ | 0.6924 | 36700 | 0.0496        |
+ | 0.6943 | 36800 | 0.0705        |
+ | 0.6961 | 36900 | 0.0617        |
+ | 0.6980 | 37000 | 0.054         |
+ | 0.6999 | 37100 | 0.0613        |
+ | 0.7018 | 37200 | 0.0549        |
+ | 0.7037 | 37300 | 0.0378        |
+ | 0.7056 | 37400 | 0.0508        |
+ | 0.7075 | 37500 | 0.0613        |
+ | 0.7094 | 37600 | 0.0602        |
+ | 0.7112 | 37700 | 0.0592        |
+ | 0.7131 | 37800 | 0.0441        |
+ | 0.7150 | 37900 | 0.0445        |
+ | 0.7169 | 38000 | 0.0464        |
+ | 0.7188 | 38100 | 0.0537        |
+ | 0.7207 | 38200 | 0.0521        |
+ | 0.7226 | 38300 | 0.0447        |
+ | 0.7244 | 38400 | 0.044         |
+ | 0.7263 | 38500 | 0.0506        |
+ | 0.7282 | 38600 | 0.043         |
+ | 0.7301 | 38700 | 0.0441        |
+ | 0.7320 | 38800 | 0.0444        |
+ | 0.7339 | 38900 | 0.0416        |
+ | 0.7358 | 39000 | 0.0556        |
+ | 0.7377 | 39100 | 0.0829        |
+ | 0.7395 | 39200 | 0.043         |
+ | 0.7414 | 39300 | 0.0366        |
+ | 0.7433 | 39400 | 0.0457        |
+ | 0.7452 | 39500 | 0.0622        |
+ | 0.7471 | 39600 | 0.0353        |
+ | 0.7490 | 39700 | 0.0597        |
+ | 0.7509 | 39800 | 0.0468        |
+ | 0.7527 | 39900 | 0.0418        |
+ | 0.7546 | 40000 | 0.0606        |
+ | 0.7565 | 40100 | 0.0613        |
+ | 0.7584 | 40200 | 0.0654        |
+ | 0.7603 | 40300 | 0.046         |
+ | 0.7622 | 40400 | 0.034         |
+ | 0.7641 | 40500 | 0.0378        |
+ | 0.7660 | 40600 | 0.0461        |
+ | 0.7678 | 40700 | 0.0404        |
+ | 0.7697 | 40800 | 0.0583        |
+ | 0.7716 | 40900 | 0.0636        |
+ | 0.7735 | 41000 | 0.0537        |
+ | 0.7754 | 41100 | 0.0336        |
+ | 0.7773 | 41200 | 0.0315        |
+ | 0.7792 | 41300 | 0.0536        |
+ | 0.7810 | 41400 | 0.0532        |
+ | 0.7829 | 41500 | 0.0553        |
+ | 0.7848 | 41600 | 0.0458        |
+ | 0.7867 | 41700 | 0.0372        |
+ | 0.7886 | 41800 | 0.0346        |
+ | 0.7905 | 41900 | 0.0419        |
+ | 0.7924 | 42000 | 0.0461        |
+ | 0.7942 | 42100 | 0.0517        |
+ | 0.7961 | 42200 | 0.0574        |
+ | 0.7980 | 42300 | 0.0411        |
+ | 0.7999 | 42400 | 0.0389        |
+ | 0.8018 | 42500 | 0.0578        |
+ | 0.8037 | 42600 | 0.0637        |
+ | 0.8056 | 42700 | 0.0434        |
+ | 0.8075 | 42800 | 0.0776        |
+ | 0.8093 | 42900 | 0.0644        |
+ | 0.8112 | 43000 | 0.0537        |
+ | 0.8131 | 43100 | 0.0519        |
+ | 0.8150 | 43200 | 0.0241        |
+ | 0.8169 | 43300 | 0.0295        |
+ | 0.8188 | 43400 | 0.0618        |
+ | 0.8207 | 43500 | 0.0275        |
+ | 0.8225 | 43600 | 0.0605        |
+ | 0.8244 | 43700 | 0.0414        |
+ | 0.8263 | 43800 | 0.0446        |
+ | 0.8282 | 43900 | 0.0449        |
+ | 0.8301 | 44000 | 0.0558        |
+ | 0.8320 | 44100 | 0.0336        |
+ | 0.8339 | 44200 | 0.0555        |
+ | 0.8358 | 44300 | 0.0399        |
+ | 0.8376 | 44400 | 0.0319        |
+ | 0.8395 | 44500 | 0.0331        |
+ | 0.8414 | 44600 | 0.0415        |
+ | 0.8433 | 44700 | 0.0424        |
+ | 0.8452 | 44800 | 0.0287        |
+ | 0.8471 | 44900 | 0.044         |
+ | 0.8490 | 45000 | 0.0375        |
+ | 0.8508 | 45100 | 0.032         |
+ | 0.8527 | 45200 | 0.0406        |
+ | 0.8546 | 45300 | 0.0429        |
+ | 0.8565 | 45400 | 0.0727        |
+ | 0.8584 | 45500 | 0.05          |
+ | 0.8603 | 45600 | 0.0436        |
+ | 0.8622 | 45700 | 0.0401        |
+ | 0.8641 | 45800 | 0.0312        |
+ | 0.8659 | 45900 | 0.036         |
+ | 0.8678 | 46000 | 0.0558        |
+ | 0.8697 | 46100 | 0.0436        |
+ | 0.8716 | 46200 | 0.0517        |
+ | 0.8735 | 46300 | 0.0361        |
+ | 0.8754 | 46400 | 0.038         |
+ | 0.8773 | 46500 | 0.0418        |
+ | 0.8791 | 46600 | 0.0407        |
+ | 0.8810 | 46700 | 0.0336        |
+ | 0.8829 | 46800 | 0.0559        |
+ | 0.8848 | 46900 | 0.0488        |
+ | 0.8867 | 47000 | 0.0463        |
+ | 0.8886 | 47100 | 0.0504        |
+ | 0.8905 | 47200 | 0.0414        |
+ | 0.8924 | 47300 | 0.0428        |
+ | 0.8942 | 47400 | 0.0389        |
+ | 0.8961 | 47500 | 0.0422        |
+ | 0.8980 | 47600 | 0.0533        |
+ | 0.8999 | 47700 | 0.0386        |
+ | 0.9018 | 47800 | 0.0672        |
+ | 0.9037 | 47900 | 0.0505        |
+ | 0.9056 | 48000 | 0.0632        |
+ | 0.9074 | 48100 | 0.0263        |
+ | 0.9093 | 48200 | 0.0448        |
+ | 0.9112 | 48300 | 0.0413        |
+ | 0.9131 | 48400 | 0.0532        |
+ | 0.9150 | 48500 | 0.0503        |
+ | 0.9169 | 48600 | 0.0472        |
+ | 0.9188 | 48700 | 0.0255        |
+ | 0.9207 | 48800 | 0.035         |
+ | 0.9225 | 48900 | 0.0353        |
+ | 0.9244 | 49000 | 0.0407        |
+ | 0.9263 | 49100 | 0.0154        |
+ | 0.9282 | 49200 | 0.0535        |
+ | 0.9301 | 49300 | 0.0435        |
+ | 0.9320 | 49400 | 0.0461        |
+ | 0.9339 | 49500 | 0.0288        |
+ | 0.9357 | 49600 | 0.0366        |
+ | 0.9376 | 49700 | 0.0411        |
+ | 0.9395 | 49800 | 0.0605        |
+ | 0.9414 | 49900 | 0.0551        |
+ | 0.9433 | 50000 | 0.0297        |
+ | 0.9452 | 50100 | 0.0388        |
+ | 0.9471 | 50200 | 0.0402        |
+ | 0.9489 | 50300 | 0.0321        |
+ | 0.9508 | 50400 | 0.0538        |
+ | 0.9527 | 50500 | 0.036         |
+ | 0.9546 | 50600 | 0.0318        |
+ | 0.9565 | 50700 | 0.0398        |
+ | 0.9584 | 50800 | 0.0405        |
+ | 0.9603 | 50900 | 0.0408        |
+ | 0.9622 | 51000 | 0.0485        |
+ | 0.9640 | 51100 | 0.047         |
+ | 0.9659 | 51200 | 0.0452        |
+ | 0.9678 | 51300 | 0.0469        |
+ | 0.9697 | 51400 | 0.0473        |
+ | 0.9716 | 51500 | 0.039         |
+ | 0.9735 | 51600 | 0.0579        |
+ | 0.9754 | 51700 | 0.0332        |
+ | 0.9772 | 51800 | 0.0322        |
+ | 0.9791 | 51900 | 0.0324        |
+ | 0.9810 | 52000 | 0.035         |
+ | 0.9829 | 52100 | 0.0517        |
+ | 0.9848 | 52200 | 0.0275        |
+ | 0.9867 | 52300 | 0.0466        |
+ | 0.9886 | 52400 | 0.0452        |
+ | 0.9905 | 52500 | 0.0446        |
+ | 0.9923 | 52600 | 0.0357        |
+ | 0.9942 | 52700 | 0.0368        |
+ | 0.9961 | 52800 | 0.0365        |
+ | 0.9980 | 52900 | 0.0303        |
+ | 0.9999 | 53000 | 0.0288        |
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.11.12
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.51.1
+ - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.5.2
+ - Datasets: 3.5.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,49 @@
+ {
+   "architectures": [
+     "NewModel"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "auto_map": {
+     "AutoConfig": "configuration.NewConfig",
+     "AutoModel": "Alibaba-NLP/new-impl--modeling.NewModel",
+     "AutoModelForMaskedLM": "Alibaba-NLP/new-impl--modeling.NewForMaskedLM",
+     "AutoModelForMultipleChoice": "Alibaba-NLP/new-impl--modeling.NewForMultipleChoice",
+     "AutoModelForQuestionAnswering": "Alibaba-NLP/new-impl--modeling.NewForQuestionAnswering",
+     "AutoModelForSequenceClassification": "Alibaba-NLP/new-impl--modeling.NewForSequenceClassification",
+     "AutoModelForTokenClassification": "Alibaba-NLP/new-impl--modeling.NewForTokenClassification"
+   },
+   "classifier_dropout": 0.0,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "LABEL_0"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "LABEL_0": 0
+   },
+   "layer_norm_eps": 1e-12,
+   "layer_norm_type": "layer_norm",
+   "logn_attention_clip1": false,
+   "logn_attention_scale": false,
+   "max_position_embeddings": 8192,
+   "model_type": "new",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pack_qkv": true,
+   "pad_token_id": 1,
+   "position_embedding_type": "rope",
+   "rope_scaling": {
+     "factor": 8.0,
+     "type": "ntk"
+   },
+   "rope_theta": 20000,
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.1",
+   "type_vocab_size": 1,
+   "unpad_inputs": false,
+   "use_memory_efficient_attention": false,
+   "vocab_size": 250048
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.51.1",
+     "pytorch": "2.6.0+cu124"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
configuration.py ADDED
@@ -0,0 +1,145 @@
+ # coding=utf-8
+ # Copyright 2024 The GTE Team Authors and Alibaba Group.
+ # Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+ """ NEW model configuration"""
+ from transformers.configuration_utils import PretrainedConfig
+ from transformers.utils import logging
+
+ logger = logging.get_logger(__name__)
+
+
+ class NewConfig(PretrainedConfig):
+     r"""
+     This is the configuration class to store the configuration of a [`NewModel`] or a [`TFNewModel`]. It is used to
+     instantiate a NEW model according to the specified arguments, defining the model architecture. Instantiating a
+     configuration with the defaults will yield a similar configuration to that of the NEW
+     [izhx/new-base-en](https://huggingface.co/izhx/new-base-en) architecture.
+
+     Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
+     documentation from [`PretrainedConfig`] for more information.
+
+
+     Args:
+         vocab_size (`int`, *optional*, defaults to 30522):
+             Vocabulary size of the NEW model. Defines the number of different tokens that can be represented by the
+             `inputs_ids` passed when calling [`NewModel`] or [`TFNewModel`].
+         hidden_size (`int`, *optional*, defaults to 768):
+             Dimensionality of the encoder layers and the pooler layer.
+         num_hidden_layers (`int`, *optional*, defaults to 12):
+             Number of hidden layers in the Transformer encoder.
+         num_attention_heads (`int`, *optional*, defaults to 12):
+             Number of attention heads for each attention layer in the Transformer encoder.
+         intermediate_size (`int`, *optional*, defaults to 3072):
+             Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
+         hidden_act (`str` or `Callable`, *optional*, defaults to `"gelu"`):
+             The non-linear activation function (function or string) in the encoder and pooler. If string, `"gelu"`,
+             `"relu"`, `"silu"` and `"gelu_new"` are supported.
+         hidden_dropout_prob (`float`, *optional*, defaults to 0.1):
+             The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
+         attention_probs_dropout_prob (`float`, *optional*, defaults to 0.1):
+             The dropout ratio for the attention probabilities.
+         max_position_embeddings (`int`, *optional*, defaults to 512):
+             The maximum sequence length that this model might ever be used with. Typically set this to something large
+             just in case (e.g., 512 or 1024 or 2048).
+         type_vocab_size (`int`, *optional*, defaults to 2):
+             The vocabulary size of the `token_type_ids` passed when calling [`NewModel`] or [`TFNewModel`].
+         initializer_range (`float`, *optional*, defaults to 0.02):
+             The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
+         layer_norm_eps (`float`, *optional*, defaults to 1e-12):
+             The epsilon used by the layer normalization layers.
+         position_embedding_type (`str`, *optional*, defaults to `"rope"`):
+             Type of position embedding. Choose one of `"absolute"`, `"rope"`.
+         rope_theta (`float`, *optional*, defaults to 10000.0):
+             The base period of the RoPE embeddings.
+         rope_scaling (`Dict`, *optional*):
+             Dictionary containing the scaling configuration for the RoPE embeddings. Currently supports two scaling
+             strategies: linear and dynamic. Their scaling factor must be a float greater than 1. The expected format is
+             `{"type": strategy name, "factor": scaling factor}`. When using this flag, don't update
+             `max_position_embeddings` to the expected new maximum. See the following thread for more information on how
+             these scaling strategies behave:
+             https://www.reddit.com/r/LocalLLaMA/comments/14mrgpr/dynamically_scaled_rope_further_increases/. This is an
+             experimental feature, subject to breaking API changes in future versions.
+         classifier_dropout (`float`, *optional*):
+             The dropout ratio for the classification head.
+
+     Examples:
+
+     ```python
+     >>> from transformers import NewConfig, NewModel
+
+     >>> # Initializing a NEW izhx/new-base-en style configuration
+     >>> configuration = NewConfig()
+
+     >>> # Initializing a model (with random weights) from the izhx/new-base-en style configuration
+     >>> model = NewModel(configuration)
+
+     >>> # Accessing the model configuration
+     >>> configuration = model.config
+     ```"""
+
+     model_type = "new"
+
+     def __init__(
+         self,
+         vocab_size=30528,
+         hidden_size=768,
+         num_hidden_layers=12,
+         num_attention_heads=12,
+         intermediate_size=3072,
+         hidden_act="gelu",
+         hidden_dropout_prob=0.1,
+         attention_probs_dropout_prob=0.0,
+         max_position_embeddings=2048,
+         type_vocab_size=1,
+         initializer_range=0.02,
+         layer_norm_type='layer_norm',
+         layer_norm_eps=1e-12,
+         # pad_token_id=0,
+         position_embedding_type="rope",
+         rope_theta=10000.0,
+         rope_scaling=None,
+         classifier_dropout=None,
+         pack_qkv=True,
+         unpad_inputs=False,
+         use_memory_efficient_attention=False,
+         logn_attention_scale=False,
+         logn_attention_clip1=False,
+         **kwargs,
+     ):
+         super().__init__(**kwargs)
+
+         self.vocab_size = vocab_size
+         self.hidden_size = hidden_size
+         self.num_hidden_layers = num_hidden_layers
+         self.num_attention_heads = num_attention_heads
+         self.hidden_act = hidden_act
+         self.intermediate_size = intermediate_size
+         self.hidden_dropout_prob = hidden_dropout_prob
+         self.attention_probs_dropout_prob = attention_probs_dropout_prob
+         self.max_position_embeddings = max_position_embeddings
+         self.type_vocab_size = type_vocab_size
+         self.initializer_range = initializer_range
+         self.layer_norm_type = layer_norm_type
+         self.layer_norm_eps = layer_norm_eps
+         self.position_embedding_type = position_embedding_type
+         self.rope_theta = rope_theta
+         self.rope_scaling = rope_scaling
+         self.classifier_dropout = classifier_dropout
+
+         self.pack_qkv = pack_qkv
+         self.unpad_inputs = unpad_inputs
+         self.use_memory_efficient_attention = use_memory_efficient_attention
+         self.logn_attention_scale = logn_attention_scale
+         self.logn_attention_clip1 = logn_attention_clip1
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fdf748855813d79ca3904e4b67cbf5b1692effc5b0b9f98e21505d1b372d410e
+ size 1221487872
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 8192,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:aa7a6ad87a7ce8fe196787355f6af7d03aee94d19c54a5eb1392ed18c8ef451a
+ size 17082988
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "250001": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "max_length": 8192,
+   "model_max_length": 8192,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "tokenizer_class": "XLMRobertaTokenizerFast",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "<unk>"
+ }