Jiarui Yao

FlippyDora

AI & ML interests

None yet

Recent Activity

updated a dataset 33 minutes ago

ScaleML-FANS/FANS-data

updated a model about 15 hours ago

ScaleML-RLHF/Qwen2.5-Math1.5B-gvm-raftpp-iter6

published a model about 15 hours ago

ScaleML-RLHF/Qwen2.5-Math1.5B-gvm-raftpp-iter6

FlippyDora's activity

upvoted 2 papers 2 days ago

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

Paper • 2504.11343 • Published 23 days ago • 16

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Paper • 2505.02391 • Published 3 days ago • 21

upvoted a collection 8 days ago

Qwen3

27 items • Updated about 5 hours ago • 543

upvoted 2 papers 16 days ago

OTC: Optimal Tool Calls via Reinforcement Learning

Paper • 2504.14870 • Published 17 days ago • 33

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published 21 days ago • 43

upvoted a paper 2 months ago

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26 • 84