Update README.md
README.md CHANGED
@@ -7,12 +7,15 @@ library_name: transformers
 tags:
 - mergekit
 - merge
-
+- qwen2
+license: apache-2.0
 ---
 # merge
 
 This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
 
+I wanted to see if it would be possible to improve on [FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview](https://huggingface.co/FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview) and [CoderO1-DeepSeekR1-Coder-32B-Preview](https://huggingface.co/RDson/CoderO1-DeepSeekR1-Coder-32B-Preview) by using [Sky-T1-32B-Flash](https://huggingface.co/NovaSky-AI/Sky-T1-32B-Flash) as the reasoning model that is merged with [Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) instead of DeepSeek-R1-Distill-Qwen-32B. The idea is to have a strong coder model that can reason but without very long reasoning chains (hence using the Flash model).
+
 ## Merge Details
 ### Merge Method
 
@@ -40,4 +43,4 @@ base_model: Qwen/Qwen2.5-Coder-32B
 parameters:
   select_topk: 1.0
 dtype: bfloat16
-```
+```
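The second hunk only shows the tail of the merge configuration (the `base_model`, `parameters`, `select_topk`, and `dtype` lines). Since `select_topk` is the parameter used by mergekit's SCE merge method, the full config in the README presumably looks roughly like the sketch below; the `models` list and the `merge_method` line are assumptions inferred from the models named in the description, not lines visible in this diff.

```yaml
# Hypothetical reconstruction of the full mergekit config; only base_model,
# parameters, select_topk and dtype are actually visible in the diff above.
models:
  - model: NovaSky-AI/Sky-T1-32B-Flash        # assumed reasoning donor
  - model: Qwen/Qwen2.5-Coder-32B-Instruct    # assumed coder donor
merge_method: sce            # inferred: select_topk is an SCE parameter
base_model: Qwen/Qwen2.5-Coder-32B
parameters:
  select_topk: 1.0           # keep 100% of delta parameters in the select step
dtype: bfloat16
```

If the method is indeed SCE, `select_topk: 1.0` means the select step keeps all delta parameters rather than only a top-variance fraction, so the merge relies entirely on the later calculate/erase steps to resolve conflicts between the two donor models.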