Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning

M1-32B is a 32B-parameter large language model fine-tuned from Qwen2.5-32B-Instruct on the M500 dataset, an interdisciplinary multi-agent collaborative reasoning dataset. M1-32B is optimized for improved reasoning, discussion, and decision-making in multi-agent systems (MAS), including frameworks such as AgentVerse.

Code: https://github.com/jincan333/MAS-TTS


🚀 Key Features

  • 🧠 Enhanced Collaborative Reasoning
    Trained on real multi-agent traces involving diverse roles such as Expert Recruiter, Problem Solvers, and Evaluator.

  • 🗣️ Role-Aware Dialogue Generation
    Learns to reason and respond from different expert perspectives based on structured, role-conditioned prompts (see the prompt sketch after this list).

  • ⚙️ Optimized for Multi-Agent Systems
    Performs well as a MAS agent with adaptive collaboration and token budgeting.

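Since M1-32B is fine-tuned from Qwen2.5-32B-Instruct, a role-conditioned turn can be driven through the standard transformers chat API. The sketch below is illustrative only: the system-prompt wording is an assumption, not the exact role format used in M500 (see the MAS-TTS repo for the actual prompts).

```python
# Minimal role-conditioned generation sketch with transformers.
# The system prompt below is a hypothetical role format, not the
# exact prompt structure used during M500 training.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Can111/m1-32b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

# Ask the model to answer as one expert in a multi-agent discussion.
messages = [
    {"role": "system", "content": (
        "You are a Problem Solver, a mathematics expert collaborating "
        "with other agents. Reason step by step, then state your answer."
    )},
    {"role": "user", "content": "What is the sum of the positive divisors of 28?"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```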

πŸ—οΈ Model Training

  • Base Model: Qwen2.5-32B-Instruct
  • Dataset: M500 (500 curated multi-agent reasoning traces)
  • Objective: Supervised Fine-Tuning (SFT) on role-conditioned prompts
  • Training Setup (an illustrative SFT sketch follows this list):
    • 8 Γ— A100 GPUs
    • 5 epochs
    • Learning rate: 1e-5
    • Frameworks: DeepSpeed, FlashAttention, LLaMA-Factory

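The released checkpoint was trained with LLaMA-Factory; purely as an illustration of the same recipe (5 epochs, learning rate 1e-5, BF16, DeepSpeed, FlashAttention), here is a sketch using trl's SFTTrainer as a stand-in. The dataset file, its schema, and the DeepSpeed config path are placeholders.

```python
# Illustrative SFT run mirroring the reported hyperparameters.
# NOTE: the authors used LLaMA-Factory; trl is a stand-in here.
# "m500.jsonl" and "ds_zero3.json" are placeholder paths, and each
# record is assumed to carry a "messages" list in chat format.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("json", data_files="m500.jsonl", split="train")

config = SFTConfig(
    output_dir="m1-32b-sft",
    num_train_epochs=5,
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    bf16=True,
    deepspeed="ds_zero3.json",  # ZeRO-3 config for 8 x A100 (placeholder)
    model_init_kwargs={"attn_implementation": "flash_attention_2"},
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```
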
📊 Performance

GPQA and Commongen measure general understanding; AIME2024 and MATH-500, mathematical reasoning; HumanEval and MBPP-S, coding.

| Model | GPQA | Commongen | AIME2024 | MATH-500 | HumanEval | MBPP-S |
|---|---|---|---|---|---|---|
| **Non-Reasoning Models** | | | | | | |
| Qwen2.5 | 50.2 | 96.7 | 21.1 | 84.4 | 89.0 | 80.2 |
| DeepSeek-V3 | 58.6 | 98.6 | 33.3 | 88.6 | 89.6 | 83.9 |
| GPT-4o | 49.2 | 97.8 | 7.8 | 81.3 | 90.9 | 85.4 |
| **Reasoning Models** | | | | | | |
| s1.1-32B | 58.3 | 94.1 | 53.3 | 90.6 | 82.3 | 77.4 |
| DeepSeek-R1 | 75.5 | 97.2 | 78.9 | 96.2 | 98.2 | 91.7 |
| o3-mini | 71.3 | 99.1 | 84.4 | 95.3 | 97.0 | 93.6 |
| M1-32B (Ours) | 61.1 | 96.9 | 60.0 | 95.1 | 92.8 | 89.1 |
| M1-32B w. CEO (Ours) | 62.1 | 97.4 | 62.2 | 95.8 | 93.9 | 90.5 |

Table: Performance comparison on general understanding, mathematical reasoning, and coding tasks, using strong reasoning and non-reasoning models within the AgentVerse framework. Our method achieves substantial improvements over Qwen2.5 and s1.1-32B on all tasks, and attains performance comparable to o3-mini and DeepSeek-R1 on MATH-500 and MBPP-S, demonstrating its effectiveness in enhancing collaborative reasoning in MAS. Note that the results for s1.1-32B were obtained without budget forcing.


💬 Intended Use

M1-32B is intended for research on multi-agent reasoning and collaboration in MAS.

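As a concrete illustration of MAS-style use, the hypothetical sketch below lets a single M1-32B instance play several roles in sequence over a shared transcript, loosely echoing the Expert Recruiter / Problem Solver / Evaluator pattern; real deployments would run through a framework such as AgentVerse (see the MAS-TTS repo).

```python
# Hypothetical single-model multi-agent loop: one M1-32B instance plays
# each role in turn over a shared discussion transcript. Real MAS
# deployments would use a framework such as AgentVerse instead.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Can111/m1-32b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

def speak(role_prompt: str, transcript: str) -> str:
    """Generate one discussion turn from the given role's perspective."""
    messages = [
        {"role": "system", "content": role_prompt},
        {"role": "user", "content": transcript},
    ]
    ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(ids, max_new_tokens=400)
    return tokenizer.decode(out[0][ids.shape[-1]:], skip_special_tokens=True)

transcript = "Task: If 3x + 5 = 20, what is x?"
for role in [
    "You are the Expert Recruiter. Name the experts needed for this task.",
    "You are a Problem Solver (mathematician). Solve the task step by step.",
    "You are the Evaluator. Check the proposed solution and give a verdict.",
]:
    transcript += "\n\n" + speak(role, transcript)  # append each turn to the shared context

print(transcript)
```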

Citation

If you use this model, please cite the following paper:

@article{jin2025two,
  title={Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning},
  author={Jin, Can and Peng, Hongwu and Zhang, Qixin and Tang, Yujin and Metaxas, Dimitris N and Che, Tong},
  journal={arXiv preprint arXiv:2504.09772},
  year={2025}
}