FlowReasoner is a query-level meta-agent: for every user query, it designs a custom multi-agent system. Unlike search-based methods, it relies on reasoning-driven optimization guided by external execution feedback.
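As a rough illustration of "optimization with external execution feedback", here is a toy sketch of a per-query loop. A meta-agent proposes a workflow, the workflow is executed against tests, and the result is fed back into the next proposal.

Everything here is hypothetical and not taken from the FlowReasoner code: `Workflow`, `Feedback`, `execute`, `optimize` and `toy_meta_agent`. The stub meta-agent stands in for the trained LLM, and the test pass rate stands in for whatever execution signal the real system uses.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: a "workflow" is an ordered list of agent
# steps, each a function that transforms the current working text.
Step = Callable[[str], str]
Test = Callable[[str], bool]


@dataclass
class Workflow:
    steps: List[Step] = field(default_factory=list)

    def run(self, query: str) -> str:
        out = query
        for step in self.steps:
            out = step(out)
        return out


@dataclass
class Feedback:
    passed: int
    total: int

    @property
    def score(self) -> float:
        return self.passed / self.total if self.total else 0.0


def execute(workflow: Workflow, query: str, tests: List[Test]) -> Feedback:
    """External execution feedback: run the workflow and check its output."""
    output = workflow.run(query)
    return Feedback(sum(1 for t in tests if t(output)), len(tests))


def optimize(
    query: str,
    tests: List[Test],
    propose: Callable[[str, Optional[Feedback]], Workflow],
    max_rounds: int = 3,
) -> Tuple[Workflow, Feedback]:
    """Per-query loop: propose a workflow, execute it, feed the result back."""
    feedback: Optional[Feedback] = None
    best: Optional[Tuple[Workflow, Feedback]] = None
    for _ in range(max_rounds):
        workflow = propose(query, feedback)
        feedback = execute(workflow, query, tests)
        if best is None or feedback.score > best[1].score:
            best = (workflow, feedback)  # keep the best-scoring workflow so far
        if feedback.score == 1.0:
            break  # every test passes; stop refining
    assert best is not None, "max_rounds must be at least 1"
    return best


# Toy stand-in for the meta-agent (hypothetical; the real one is an LLM).
def toy_meta_agent(query: str, feedback: Optional[Feedback]) -> Workflow:
    if feedback is None:
        return Workflow([str.upper])
    # After seeing a partial failure, add another agent step.
    return Workflow([str.upper, lambda s: s + "!"])


tests = [str.isupper, lambda s: s.endswith("!")]
workflow, fb = optimize("hello", tests, toy_meta_agent)
print(fb.passed, fb.total)  # 2 2
```

This sketch returns the best-scoring workflow rather than the last one, so a worse revision cannot overwrite a better one.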
- First, it uses DeepSeek R1-671B to distill reasoning data on building multi-agent systems.
- Then, this reasoning data is used to train DeepSeek-R1-Distill-Qwen-7B via supervised fine-tuning (SFT), giving it basic reasoning skills.
- Finally, reinforcement learning with GRPO (Group Relative Policy Optimization, which scores each sampled response relative to a group of responses for the same query or task) further improves its reasoning.
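The "compare responses within a group" idea behind GRPO can be sketched in a few lines. Each response's reward is normalized against the mean and standard deviation of the rewards for its group. That advantage then feeds a PPO-style clipped objective.

This is a generic illustration of GRPO, not FlowReasoner's training code. The reward values are made up, using the population standard deviation is an assumption (implementations differ), and GRPO's KL penalty toward a reference policy is omitted.

```python
from statistics import fmean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Score each response against the others sampled for the same query.

    A_i = (r_i - mean(r)) / (std(r) + eps). Population std is an assumption;
    implementations differ on this detail.
    """
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate for one token; GRPO plugs in group advantages.

    ratio = pi_new(token) / pi_old(token). The KL penalty term is omitted here.
    """
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


# Hypothetical rewards for four responses sampled for one query.
rewards = [1.0, 0.0, 0.5, 0.5]
print([round(a, 3) for a in group_relative_advantages(rewards)])
# [1.414, -1.414, 0.0, 0.0]
```

Because the baseline comes from the group itself, no separate value network is needed. Responses above the group average are reinforced, and those below it are discouraged.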
FlowReasoner: Reinforcing Query-Level Meta-Agents (2504.15257)
Code: https://github.com/sail-sg/flowreasoner