- FLAME: Factuality-Aware Alignment for Large Language Models
  Paper • 2405.01525 • Published • 29
- DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data
  Paper • 2405.14333 • Published • 41
- Transformers Can Do Arithmetic with the Right Embeddings
  Paper • 2405.17399 • Published • 54
- EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture
  Paper • 2405.18991 • Published • 12
Collections
Collections including paper arxiv:2503.08681
- Self-Taught Self-Correction for Small Language Models
  Paper • 2503.08681 • Published • 15
- Self-Improving Robust Preference Optimization
  Paper • 2406.01660 • Published • 20
- LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
  Paper • 2503.00735 • Published • 21
- Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge
  Paper • 2407.19594 • Published • 21

- RuCCoD: Towards Automated ICD Coding in Russian
  Paper • 2502.21263 • Published • 133
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 123
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
  Paper • 2503.05179 • Published • 46
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 27

- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 121
- S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
  Paper • 2502.12853 • Published • 29
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 27
- Self-Taught Self-Correction for Small Language Models
  Paper • 2503.08681 • Published • 15

- Rho-1: Not All Tokens Are What You Need
  Paper • 2404.07965 • Published • 94
- VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time
  Paper • 2404.10667 • Published • 19
- Instruction-tuned Language Models are Better Knowledge Learners
  Paper • 2402.12847 • Published • 27
- DoRA: Weight-Decomposed Low-Rank Adaptation
  Paper • 2402.09353 • Published • 27

- Training Software Engineering Agents and Verifiers with SWE-Gym
  Paper • 2412.21139 • Published • 23
- Evaluating Language Models as Synthetic Data Generators
  Paper • 2412.03679 • Published • 49
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 148
- Self-Discover: Large Language Models Self-Compose Reasoning Structures
  Paper • 2402.03620 • Published • 116