Pengxiang Li

pengxiang

pixeli99

AI & ML interests

Video generation, Image editing, AD

Recent Activity

updated a model 12 days ago

pengxiang/Qwen2.5-1.5B-Open-R1-Distill-loop

Organizations

None yet

pengxiang's activity

upvoted a paper 5 days ago

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published 9 days ago • 88

upvoted a paper 11 days ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published 19 days ago • 119

upvoted a paper 16 days ago

InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners

Paper • 2504.14239 • Published 19 days ago • 13

upvoted a paper 29 days ago

Rethinking Reflection in Pre-Training

Paper • 2504.04022 • Published Apr 5 • 77

upvoted 2 papers 30 days ago

Multi-Token Attention

Paper • 2504.00927 • Published Apr 1 • 49

One-Minute Video Generation with Test-Time Training

Paper • 2504.05298 • Published about 1 month ago • 102

upvoted 2 papers about 1 month ago

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26 • 48

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Paper • 2503.09573 • Published Mar 12 • 71

upvoted a paper about 2 months ago

Frac-Connections: Fractional Extension of Hyper-Connections

Paper • 2503.14125 • Published Mar 18 • 21

upvoted 2 papers 2 months ago

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization

Paper • 2503.04598 • Published Mar 6 • 20

Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

Paper • 2503.01307 • Published Mar 3 • 38

upvoted 3 papers 3 months ago

upvoted 6 papers 4 months ago

Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

Paper • 2501.09732 • Published Jan 16 • 72

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training

Paper • 2501.06842 • Published Jan 12 • 16

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking

Paper • 2501.04519 • Published Jan 8 • 276

InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

Paper • 2501.04575 • Published Jan 8 • 24

Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought

Paper • 2501.04682 • Published Jan 8 • 97

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published Dec 27, 2024 • 88