tomasmcm committed · Commit 53e4634 · verified · 1 Parent(s): 41760bc

Update README.md

Files changed (1): README.md (+5 -2)
README.md CHANGED

````diff
@@ -7,12 +7,15 @@ library_name: transformers
 tags:
 - mergekit
 - merge
-
+- qwen2
+license: apache-2.0
 ---
 # merge
 
 This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
 
+I wanted to see if it would be possible to improve on [FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview) and [CoderO1-DeepSeekR1-Coder-32B-Preview](https://huggingface.co/RDson/CoderO1-DeepSeekR1-Coder-32B-Preview) by using [Sky-T1-32B-Flash](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Flash) as the reasoning model that is merged with [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) instead of DeepSeek-R1-Distill-Qwen-32B. The idea is to have a strong coder model that can reason but without very long reasoning chains (hence using the Flash model).
+
 ## Merge Details
 ### Merge Method
 
@@ -40,4 +43,4 @@ base_model: Qwen/Qwen2.5-Coder-32B
 parameters:
   select_topk: 1.0
 dtype: bfloat16
-```
+```
````
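The hunks above only show the tail of the mergekit YAML embedded in the README (`base_model`, `select_topk`, `dtype`). For readers wanting the full shape of such a config, here is a minimal sketch, assuming the `sce` merge method (the mergekit method that takes a `select_topk` parameter, and the one used by the FuseO1 merges this model builds on) and the two source models named in the new paragraph; the `models` list and `merge_method` lines are assumptions, not part of this diff:

```yaml
# Hypothetical reconstruction of the full merge config.
# Only base_model, select_topk, and dtype appear in the diff above;
# the models list and merge_method are assumed from the README prose.
models:
  - model: NovaSky-AI/Sky-T1-32B-Flash        # reasoning source (per the prose)
  - model: Qwen/Qwen2.5-Coder-32B-Instruct    # coder source (per the prose)
merge_method: sce
base_model: Qwen/Qwen2.5-Coder-32B
parameters:
  select_topk: 1.0   # keep 100% of the highest-variance deltas, i.e. no filtering
dtype: bfloat16
```

Saved as, say, `config.yaml`, a config of this shape would typically be run with mergekit's standard CLI, e.g. `mergekit-yaml config.yaml ./output-model`.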