Update README.md
README.md CHANGED
@@ -7,12 +7,15 @@ library_name: transformers
 tags:
 - mergekit
 - merge
-
+- qwen2
+license: apache-2.0
 ---
 # merge
 
 This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
 
+I wanted to see if it would be possible to improve on [FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview) and [CoderO1-DeepSeekR1-Coder-32B-Preview](https://huggingface.co/RDson/CoderO1-DeepSeekR1-Coder-32B-Preview) by using [Sky-T1-32B-Flash](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Flash) as the reasoning model that is merged with [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) instead of DeepSeek-R1-Distill-Qwen-32B. The idea is to have a strong coder model that can reason but without very long reasoning chains (hence using the Flash model).
+
 ## Merge Details
 ### Merge Method
 
@@ -40,4 +43,4 @@ base_model: Qwen/Qwen2.5-Coder-32B
 parameters:
   select_topk: 1.0
 dtype: bfloat16
-```
+```
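The second hunk only shows the tail of the merge configuration (the `base_model`, `parameters`, `select_topk`, and `dtype` lines). Since `select_topk` is the parameter used by mergekit's SCE merge method, the full config in the README presumably looks roughly like the sketch below; the `models` list and the `merge_method` line are assumptions inferred from the models named in the description, not lines visible in this diff.

```yaml
# Hypothetical reconstruction of the full mergekit config; only base_model,
# parameters, select_topk and dtype are actually visible in the diff above.
models:
  - model: NovaSky-AI/Sky-T1-32B-Flash        # assumed reasoning donor
  - model: Qwen/Qwen2.5-Coder-32B-Instruct    # assumed coder donor
merge_method: sce            # inferred: select_topk is an SCE parameter
base_model: Qwen/Qwen2.5-Coder-32B
parameters:
  select_topk: 1.0           # keep 100% of delta parameters in the select step
dtype: bfloat16
```

If the method is indeed SCE, `select_topk: 1.0` means the select step keeps all delta parameters rather than only a top-variance fraction, so the merge relies entirely on the later calculate/erase steps to resolve conflicts between the two donor models.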