Soumye Singhal

soumye

AI & ML interests

LLM Post-training

soumye's activity

upvoted a paper about 1 hour ago

AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset

Paper • 2504.16891 • Published 15 days ago • 19

upvoted 3 collections about 1 hour ago

OpenMathReasoning

Models and datasets from "AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset" • 7 items • Updated 3 days ago • 36

RL+reason model

125 items • Updated about 11 hours ago • 6

Fav-papers

31 items • Updated 1 day ago • 3

authored 6 papers 2 days ago

Llama-Nemotron: Efficient Reasoning Models

Paper • 2505.00949 • Published 7 days ago • 26

Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective

Paper • 2311.14948 • Published Nov 25, 2023

Adversarial Training of Reward Models

Paper • 2504.06141 • Published about 1 month ago

Countering Language Drift with Seeded Iterated Learning

Paper • 2003.12694 • Published Mar 28, 2020 • 1

Recall Traces: Backtracking Models for Efficient Reinforcement Learning

Paper • 1804.00379 • Published Apr 2, 2018

Supervised Seeded Iterated Learning for Interactive Language Learning

Paper • 2010.02975 • Published Oct 6, 2020

upvoted 2 papers 3 days ago

Countering Language Drift with Seeded Iterated Learning

Paper • 2003.12694 • Published Mar 28, 2020 • 1

Llama-Nemotron: Efficient Reasoning Models

Paper • 2505.00949 • Published 7 days ago • 26

liked a model 3 days ago

nvidia/Llama-3_1-Nemotron-Ultra-253B-v1-FP8

Text Generation • Updated about 2 hours ago • 43 • 7

upvoted a paper 9 days ago

Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment

Paper • 2502.00203 • Published Jan 31 • 2

authored 2 papers 24 days ago

Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment

Paper • 2502.00203 • Published Jan 31 • 2

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Paper • 2504.03624 • Published Apr 4 • 13

upvoted a paper 24 days ago

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Paper • 2504.03624 • Published Apr 4 • 13

liked 3 models 24 days ago

nvidia/Nemotron-H-56B-Base-8K

Text Generation • Updated 22 days ago • 943 • 26

nvidia/Nemotron-H-47B-Base-8K

Text Generation • Updated 16 days ago • 1.34k • 17

nvidia/Nemotron-H-8B-Base-8K

Text Generation • Updated 22 days ago • 8.94k • 39