Yinxu Pan

cppowboy

https://github.com/Cppowboy

AI & ML interests

Code LLM, Function Calling, Code Interpreter, Vision-Language Pretraining, Text-Rich Vision-Language Pretraining

Recent Activity

liked a dataset 5 days ago

nvidia/OpenMathReasoning

cppowboy's activity

upvoted a paper 5 days ago

LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale

Paper • 2504.16030 • Published 6 days ago • 30

upvoted 2 papers 6 days ago

ToolRL: Reward is All Tool Learning Needs

Paper • 2504.13958 • Published 12 days ago • 40

Learning to Reason under Off-Policy Guidance

Paper • 2504.14945 • Published 7 days ago • 77

upvoted a collection 19 days ago

Kimi-VL-A3B

Collection

Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated 16 days ago • 63

upvoted 2 papers about 1 month ago

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Paper • 2503.14476 • Published Mar 18 • 122

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13 • 161

upvoted 5 papers 2 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 143

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper • 2502.11089 • Published Feb 16 • 156

upvoted a paper 3 months ago

UltraIF: Advancing Instruction Following from the Wild

Paper • 2502.04153 • Published Feb 6 • 22

upvoted an article 3 months ago

The N Implementation Details of RLHF with PPO

Article • Oct 24, 2023 • 50

upvoted 4 papers 3 months ago

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Paper • 2501.12326 • Published Jan 21 • 58

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 289

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11 • 89

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 99

upvoted a paper 4 months ago

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 99