# VL-Rethinker-72B

🚀 News: We release our meticulously curated collection of RL training queries for multimodal reasoning: ViRL39K.

VL-Rethinker-72B achieves SoTA results on various multimodal reasoning benchmarks.

It is trained with the Forced Rethinking technique on top of VL-Reasoner, which is trained with GRPO-SSR.

For details of our approach and performance comparison, please see our paper.

For details of training and evaluation, please see our code repo.

Explore further via the following links:

| 🚀Project Page | 📖Paper | 🔗Github | 🤗Data (Coming Soon) |

## Prompt

Append the following after the user query:

"""Guidelines: 
Please think step by step, and **regularly perform self-questioning, self-verification, self-correction to check your ongoing reasoning**, using connectives such as "Wait a moment", "Wait, does it seem right?", etc. Remember to put your final answer within \\boxed{}.
"""

## Citation

If you find this model useful, please cite our paper:

@article{vl-rethinker,
  title={VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning},
  author={Wang, Haozhe and Qu, Chao and Huang, Zuming and Chu, Wei and Lin, Fangzhen and Chen, Wenhu},
  journal={arXiv preprint arXiv:2504.08837},
  year={2025}
}