---
license: apache-2.0
---

# Rodimus+-Coder
## Introduction

Rodimus* is a series of efficient large language models designed to reduce the computational complexity of Transformer-based architectures. The series includes the base Rodimus model and its enhanced version, Rodimus+. Rodimus uses a novel Data-Dependent Tempered Selection (DDTS) mechanism within a purely recurrent, linear-attention framework, achieving high performance. Rodimus+ builds on Rodimus with a hybrid design that adds Sliding Window Shared-Key Attention (SW-SKA). The combination integrates semantic, token, and head compression techniques to balance accuracy and efficiency.
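To make the sliding-window and shared-key ideas concrete, here is a minimal pure-Python sketch. It is not the Rodimus+ implementation. It assumes that "shared-key" means one key vector per token reused by every head (with per-head queries and values) and that the window is causal; neither detail is specified in the text above. The function name `sw_shared_key_attention`, the list layouts, and the window size are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sw_shared_key_attention(queries, shared_keys, values, window):
    """Causal sliding-window attention with one key per token shared by all heads.

    queries[h][t] and values[h][t] are per-head vectors (assumed layout);
    shared_keys[t] is a single key vector used by every head (assumption).
    Each position t attends to at most `window` positions ending at t.
    """
    seq_len = len(shared_keys)
    scale = 1.0 / math.sqrt(len(shared_keys[0]))
    outputs = []
    for head_q, head_v in zip(queries, values):
        head_out = []
        for t in range(seq_len):
            start = max(0, t - window + 1)
            span = range(start, t + 1)
            scores = [dot(head_q[t], shared_keys[s]) * scale for s in span]
            weights = softmax(scores)
            dim_v = len(head_v[0])
            head_out.append(
                [sum(w * head_v[s][d] for w, s in zip(weights, span))
                 for d in range(dim_v)]
            )
        outputs.append(head_out)
    return outputs


# Toy input: 2 heads, 3 tokens, key dimension 2, value dimension 1.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[[0.0, 0.0] for _ in range(3)] for _ in range(2)]
values = [[[1.0], [3.0], [5.0]], [[2.0], [4.0], [6.0]]]
out = sw_shared_key_attention(queries, keys, values, window=2)
```

With all-zero queries the attention weights are uniform, so each output is the plain average of the values inside the window. With `window=1` each position attends only to itself and the output equals the input values. In this sketch the per-token work is bounded by the window size rather than the sequence length.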

| Datasets | Qwen2.5-Coder-1.5B | Rodimus+-Coder-1.6B-Base | Gemma2-2B-PT | Qwen2.5-Coder-3B | Rodimus+-Coder-4B-Base | Gemma3-4B-PT | Qwen2.5-Coder-7B |
|---|---|---|---|---|---|---|---|
| Coding Tasks | | | | | | | |
| HumanEval | 41.5 | 51.2 | 19.5 | 51.8 | 60.4 | 36.0 | 60.4 |
| HumanEval+ | 34.8 | 45.1 | - | 40.9 | 52.4 | - | 50.6 |
| MBPP | 57.2 | 51.2 | 31.0 | 62.6 | 64.6 | 46.0 | 70.0 |
| MBPP+ | 66.1 | 62.2 | - | 65.9 | 71.4 | - | 70.1 |
| BCB (Completion) | 21.6 | 17.9 | - | 26.2 | 30.8 | - | 30.4 |
| MultiPL-E | 46.1 | 52.5 | - | 49.4 | 60.7 | - | 56.9 |
| CRUXEval | 38.5 | 45.1 | - | 44.6 | 56.4 | - | 56.8 |
| Coding Avg. | 43.7 | 46.5 | - | 48.8 | 56.7 | - | 56.4 |
| General Tasks | | | | | | | |
| C-EVAL | 55.2 | 56.7 | - | 65.3 | 70.2 | - | 69.1 |
| CMMLU | 54.5 | 52.3 | - | 65.4 | 68.3 | - | 72.7 |
| MMLU | 55.5 | 51.1 | 52.2 | 63.3 | 62.6 | 59.6 | 70.5 |
| BBH | 21.8 | 46.8 | 42.4 | 32.5 | 61.9 | 50.9 | 67.3 |
| General Avg. | 46.8 | 51.7 | - | 56.6 | 65.8 | - | 69.9 |
| Mathematics Tasks | | | | | | | |
| GSM8K | 60.4 | 68.7 | 25.0 | 72.1 | 78.5 | 38.4 | 83.4 |
| MATH | 23.7 | 29.0 | 16.4 | 31.9 | 37.0 | 24.2 | 42.2 |
| Math Avg. | 41.9 | 48.9 | 20.7 | 52.0 | 57.8 | 31.3 | 62.8 |
| Overall | | | | | | | |
| Overall | 44.4 | 48.4 | - | 51.7 | 59.6 | - | 61.6 |
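
The source does not state how the average rows are computed. The numbers are consistent with unweighted means: each category average over its own tasks, and the overall score over all 13 tasks rather than over the three category averages. The sketch below checks that reading for the Rodimus+-Coder-4B-Base column. The averaging rule is an inference from the table, and the variable names are illustrative.

```python
# Scores for the Rodimus+-Coder-4B-Base column, copied from the table above.
scores = {
    "coding": [60.4, 52.4, 64.6, 71.4, 30.8, 60.7, 56.4],
    "general": [70.2, 68.3, 62.6, 61.9],
    "math": [78.5, 37.0],
}


def mean(values):
    """Unweighted arithmetic mean."""
    return sum(values) / len(values)


# Assumed rule: each category average is the mean of its task scores.
category_avg = {name: mean(vals) for name, vals in scores.items()}

# Assumed rule: the overall score is the mean over every task,
# not the mean of the three category averages.
all_scores = [s for vals in scores.values() for s in vals]
overall = mean(all_scores)

for name, avg in category_avg.items():
    print(f"{name}: {avg:.2f}")
print(f"overall: {overall:.2f}")
```

Under these assumptions the computed values land within rounding distance of the reported 56.7, 65.8, 57.8, and 59.6. Averaging the three category means instead would give a different overall figure, because the categories contain different numbers of tasks.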

| Datasets | Qwen2.5-Coder-1.5B-Instruct | Rodimus+-Coder-1.6B-Chat | Gemma2-2B-IT | Qwen2.5-Coder-3B-Instruct | Phi-4-Mini-3.8B | Rodimus+-Coder-4B-Chat | Gemma3-4B-IT | Qwen2.5-Coder-7B-Instruct |
|---|---|---|---|---|---|---|---|---|
| Coding Tasks | | | | | | | | |
| HumanEval | 64.6 | 76.8 | 20.1 | 79.9 | 74.4 | 86.6 | 71.3 | 87.2 |
| HumanEval+ | 63.4 | 73.8 | - | 80.5 | 68.3 | 82.9 | - | 82.3 |
| MBPP | 51.0 | 59.0 | 36.6 | 59.2 | 65.3 | 68.0 | 63.2 | 75.8 |
| MBPP+ | 53.0 | 66.4 | - | 61.9 | 63.8 | 68.5 | - | 75.1 |
| LCB (24.08-24.11) | 4.0 | 10.9 | - | 13.0 | - | 13.9 | - | 22.8 |
| BCB (Instruct) | 10.8 | 21.5 | - | 21.7 | 33.8 | 26.6 | - | 30.6 |
| HumanEval-Mul | 50.8 | 57.3 | - | 67.4 | - | 70.6 | - | 76.1 |
| MBPP-Mul | 43.4 | 52.4 | - | 53.4 | - | 59.6 | - | 61.4 |
| MBXP-EN | 55.8 | 75.5 | - | 76.0 | - | 87.3 | - | 87.7 |
| MBXP-CN | 48.8 | 75.0 | - | 68.7 | - | 84.3 | - | 83.5 |
| CRUXEval | 28.6 | 55.0 | - | 51.6 | - | 63.2 | - | 69.3 |
| HumanEvalFix | 38.9 | 52.6 | - | 55.5 | - | 68.8 | - | 69.3 |
| Spider | 61.2 | 71.4 | - | 71.8 | 42.2 | 73.5 | - | 82.0 |
| Coding Avg. | 44.2 | 57.5 | - | 58.5 | - | 65.7 | - | 69.5 |
| General Tasks | | | | | | | | |
| C-EVAL | 51.5 | 50.8 | - | 62.0 | - | 61.6 | - | 66.4 |
| CMMLU | 45.2 | 50.5 | - | 60.1 | - | 62.0 | - | 64.9 |
| MMLU | 52.0 | 49.3 | 56.1 | 61.7 | 67.3 | 57.5 | 58.1 | 66.1 |
| BBH | 24.2 | 58.7 | 41.4 | 57.3 | 70.4 | 63.7 | 72.2 | 59.1 |
| General Avg. | 43.2 | 52.3 | - | 60.3 | - | 61.2 | - | 64.1 |
| Mathematics Tasks | | | | | | | | |
| GSM8K | 54.4 | 68.5 | 62.6 | 73.5 | 88.6 | 79.2 | 89.2 | 79.5 |
| MATH | 38.1 | 33.5 | 27.2 | 44.1 | 64.0 | 44.1 | 75.6 | 60.8 |
| Math Avg. | 46.2 | 51.0 | 44.9 | 58.8 | 68.8 | 61.7 | 82.4 | 70.1 |
| Overall | | | | | | | | |
| Overall | 44.2 | 55.8 | - | 58.9 | - | 64.3 | - | 68.4 |