Hanning Zhang

HanningZhang

AI & ML interests

None yet

Recent Activity

authored a paper 2 days ago

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

upvoted a paper 2 days ago

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

updated a dataset 4 days ago

HanningZhang/mistral1-selected-baseline

Organizations

HanningZhang's activity

upvoted a paper 2 days ago

Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL

Paper • 2505.02391 • Published 3 days ago • 21

upvoted a paper 20 days ago

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Paper • 2504.13161 • Published 21 days ago • 88

upvoted a paper 2 months ago

Self-rewarding correction for mathematical reasoning

Paper • 2502.19613 • Published Feb 26 • 84

upvoted a collection 6 months ago

RLHFlow MATH Process Reward Model

A collection of datasets and models for process reward modeling. • 15 items • Updated Nov 9, 2024 • 10