Hugging Face
Models: 5,542
Sort: Trending
Active filters: grpo, trl
ruslanmv/granite-3.1-8b-Reasoning • Text Generation • Updated Feb 12 • 17 • 1
mradermacher/granite-3.1-8b-Reasoning-GGUF • Text Generation • Updated Feb 12 • 175 • 1
mradermacher/granite-3.1-2b-Reasoning-GGUF • Updated Feb 13 • 220 • 3
mradermacher/granite-3.1-8b-Reasoning-i1-GGUF • Text Generation • Updated Feb 12 • 212 • 1
mradermacher/granite-3.1-2b-Reasoning-i1-GGUF • Updated Feb 13 • 315 • 2
mradermacher/Superthoughts-lite-v1-GGUF • Updated Feb 13 • 161 • 1
mradermacher/Superthoughts-lite-v1-i1-GGUF • Updated Feb 13 • 328 • 1
mradermacher/Phi-Mini-3.5-R1-GGUF • Updated Feb 14 • 133 • 1
AdamLucek/Qwen2.5-3B-Instruct-GRPO-2K-GSM8K • Text Generation • Updated Feb 15 • 17 • 1
grounded-ai/phi4-r1-guard • Text Generation • Updated Mar 4 • 96 • 5
BraylonDash/Qwen-2.5-3b-instruct-GRPO-250 • Text Generation • Updated Feb 18 • 9 • 2
ericrisco/salamandra-7b-r1 • Updated Feb 18 • 24 • 1
SmallDoge/Doge-160M-Reason-Distill • Question Answering • Updated Mar 7 • 28 • 4
mradermacher/Qwen2.5-3B-Knowledge-R1-GRPO-GGUF • Updated Feb 19 • 155 • 1
mradermacher/Bluebrain-GRPO-Qwen2.5-3B-Instruct-GGUF • Updated Feb 20 • 114 • 1
lmassaron/gemma-2-2b-it-grpo-gsm8k • Text Generation • Updated Feb 24 • 480 • 1
xingqiang/Llama3.1-8B-GRPO-Planing • Text Generation • Updated Feb 21 • 9 • 1
mradermacher/OLMoE-1B-7B-0125-Instruct-grpo-GGUF • Updated Feb 22 • 124 • 1
mradermacher/Qwen2.5-7B-GRPO-1M-Context-Medical-Reasoning-f16-GGUF • Updated Feb 22 • 189 • 1
mradermacher/Qwen2.5-7B-GRPO-1M-Context-Medical-Reasoning-f16-v2-GGUF • Updated Feb 22 • 157 • 1
Metin/LLaMA-3-8B-GRPO-Finance-Math-TR • Text Generation • Updated Feb 24 • 8 • 6
ibrahho/model • Text Generation • Updated Feb 23 • 6 • 1
mlabonne/SmolGRPO-135M • Text Generation • Updated Feb 26 • 9 • 5
valoomba/Rombo-V3.1-32B-Reasoner • Text Generation • Updated Feb 24 • 6 • 1
Creekside/Lia-01 • Text Generation • Updated Feb 26 • 13 • 1
Locutusque/Thespis-Llama-3.1-8B • Text Generation • Updated Feb 28 • 77 • 14
mradermacher/Rombo-V3.1-32B-Reasoner-i1-GGUF • Updated Feb 26 • 113 • 1
mradermacher/Thespis-Llama-3.1-8B-GGUF • Updated Feb 26 • 642 • 1
mradermacher/Thespis-Llama-3.1-8B-i1-GGUF • Updated Feb 26 • 314 • 1
mradermacher/Lia-01-GGUF • Updated Feb 26 • 115 • 1