I used Open R1 (by Hugging Face) to run supervised fine-tuning (SFT) on my earlier thinker models. The results are encouraging. Checkpoints are also included.

https://github.com/ewre324/open-r1/tree/main

This follows the DeepSeek R1 approach: training on a reasoning-specific dataset to encourage more thinking. The ... tags are still not generated. TODO.
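To check whether a completion contains a reasoning block, something like the following minimal sketch may help. It assumes the missing tags are the DeepSeek-R1-style `<think>` and `</think>` pair, that the checkpoint loads through the standard `transformers` auto classes, and that a plain-text prompt with no chat template is enough. None of this is confirmed by this card. The helper names `extract_reasoning` and `generate` are hypothetical.

```python
import re

# Repo id as it appears on this model card.
MODEL_ID = "ewre324/ewre324-R1-SmolLM2-135M-Distill"

# Assumption: reasoning is wrapped in DeepSeek-R1-style <think>...</think> tags.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def extract_reasoning(text):
    """Return the stripped text of the first <think> block, or None if absent."""
    match = THINK_BLOCK.search(text)
    return match.group(1).strip() if match else None


def generate(prompt, max_new_tokens=256):
    """Generate a completion from the checkpoint (downloads weights on first call)."""
    # Imported lazily so the tag helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    # Special tokens are kept in case the tags are registered as special tokens.
    return tokenizer.decode(new_tokens, skip_special_tokens=False)
```

Calling `extract_reasoning(generate("What is 12 * 7?"))` would return `None` for as long as the tags are not generated, which gives a quick way to track the TODO above across checkpoints.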

Model size: 135M params (Safetensors, tensor type F32)