JuanRafap's Collections
Interés
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum
Reinforcement Learning
Paper
•
2411.02337
•
Published
•
38
Mixture-of-Transformers: A Sparse and Scalable Architecture for
Multi-Modal Foundation Models
Paper
•
2411.04996
•
Published
•
52
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle
Grandmaster Level
Paper
•
2411.03562
•
Published
•
68
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via
Inference-time Hybrid Information Structurization
Paper
•
2410.08815
•
Published
•
50
Game-theoretic LLM: Agent Workflow for Negotiation Games
Paper
•
2411.05990
•
Published
•
8
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large
Language Models on Mobile Devices
Paper
•
2411.10640
•
Published
•
47
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Paper
•
2411.19146
•
Published
•
18
Snowflake/snowflake-arctic-embed-m-v2.0
Sentence Similarity
•
Updated
•
113k
•
77
Snowflake/snowflake-arctic-embed-l-v2.0
Sentence Similarity
•
Updated
•
236k
•
165
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases
Paper
•
2412.04862
•
Published
•
51
ruliad/deepthought-8b-llama-v0.01-alpha
Text Generation
•
Updated
•
25
•
145
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's
Reasoning Capability
Paper
•
2411.19943
•
Published
•
64
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on
Retrieval-Augmented Generation
Paper
•
2412.02592
•
Published
•
23
RL Zero: Zero-Shot Language to Behaviors without any Supervision
Paper
•
2412.05718
•
Published
•
5
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal
Retrieval-Augmented Generation
Paper
•
2412.10704
•
Published
•
15
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented
Generation for Preference Alignment
Paper
•
2412.13746
•
Published
•
9
Wonderful Matrices: Combining for a More Efficient and Effective
Foundation Model Architecture
Paper
•
2412.11834
•
Published
•
8
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation
Model Internet Agents
Paper
•
2412.13194
•
Published
•
12
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
Paper
•
2412.14711
•
Published
•
16
Ensembling Large Language Models with Process Reward-Guided Tree Search
for Better Complex Reasoning
Paper
•
2412.15797
•
Published
•
18
Progressive Multimodal Reasoning via Active Retrieval
Paper
•
2412.14835
•
Published
•
74
MixLLM: LLM Quantization with Global Mixed-precision between
Output-features and Highly-efficient System Design
Paper
•
2412.14590
•
Published
•
14
Learned Compression for Compressed Learning
Paper
•
2412.09405
•
Published
•
13
Token-Budget-Aware LLM Reasoning
Paper
•
2412.18547
•
Published
•
47
ericsonwillians/distilbert-base-uncased-steam-sentiment
Text Classification
•
Updated
•
27
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via
Collective Monte Carlo Tree Search
Paper
•
2412.18319
•
Published
•
40
Personalized Graph-Based Retrieval for Large Language Models
Paper
•
2501.02157
•
Published
•
32
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
Paper
•
2412.18925
•
Published
•
103
Multi-task retriever fine-tuning for domain-specific and efficient RAG
Paper
•
2501.04652
•
Published
•
10
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
•
2501.05366
•
Published
•
101
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
Paper
•
2501.02576
•
Published
•
15
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language
Models
Paper
•
2501.03262
•
Published
•
99
BoostStep: Boosting mathematical capability of Large Language Models via
improved single-step reasoning
Paper
•
2501.03226
•
Published
•
45
Evolving Deeper LLM Thinking
Paper
•
2501.09891
•
Published
•
114
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with
Large Language Models
Paper
•
2501.09686
•
Published
•
41
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
Paper
•
2501.08617
•
Published
•
10
The Lessons of Developing Process Reward Models in Mathematical
Reasoning
Paper
•
2501.07301
•
Published
•
98
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot
Paper
•
2501.09012
•
Published
•
10
ChemAgent: Self-updating Library in Large Language Models Improves
Chemical Reasoning
Paper
•
2501.06590
•
Published
•
11
CodeElo: Benchmarking Competition-level Code Generation of LLMs with
Human-comparable Elo Ratings
Paper
•
2501.01257
•
Published
•
53
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary
Feedback
Paper
•
2501.10799
•
Published
•
15
Control LLM: Controlled Evolution for Intelligence Retention in LLM
Paper
•
2501.10979
•
Published
•
6
Autonomy-of-Experts Models
Paper
•
2501.13074
•
Published
•
45
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large
Language Models via a Multi-Paradigm Perspective
Paper
•
2501.11110
•
Published
•
3
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning
Paper
•
2412.09078
•
Published
LLM2: Let Large Language Models Harness System 2 Reasoning
Paper
•
2412.20372
•
Published
TinyThinker: Distilling Reasoning through Coarse-to-Fine Knowledge
Internalization with Self-Reflection
Paper
•
2412.08024
•
Published
•
1
Table as Thought: Exploring Structured Thoughts in LLM Reasoning
Paper
•
2501.02152
•
Published
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
Reinforcement Learning
Paper
•
2501.12948
•
Published
•
390
Self-supervised Quantized Representation for Seamlessly Integrating
Knowledge Graphs with Large Language Models
Paper
•
2501.18119
•
Published
•
25
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Paper
•
2502.01534
•
Published
•
40
The Differences Between Direct Alignment Algorithms are a Blur
Paper
•
2502.01237
•
Published
•
115
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
Paper
•
2501.13200
•
Published
•
68
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning
Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles
Paper
•
2502.01081
•
Published
•
14
CODESIM: Multi-Agent Code Generation and Problem Solving through
Simulation-Driven Planning and Debugging
Paper
•
2502.05664
•
Published
•
23
Training Language Models for Social Deduction with Multi-Agent
Reinforcement Learning
Paper
•
2502.06060
•
Published
•
38
Paper
•
2502.06049
•
Published
•
30
Exploring the Limit of Outcome Reward for Learning Mathematical
Reasoning
Paper
•
2502.06781
•
Published
•
61
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time
Scaling
Paper
•
2502.06703
•
Published
•
152
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference
Paper
•
2502.04416
•
Published
•
12
Goku: Flow Based Video Generative Foundation Models
Paper
•
2502.04896
•
Published
•
104
In-Context Retrieval-Augmented Language Models
Paper
•
2302.00083
•
Published
•
1
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language
Models
Paper
•
2502.16614
•
Published
•
27
Can Large Language Models Detect Errors in Long Chain-of-Thought
Reasoning?
Paper
•
2502.19361
•
Published
•
28
STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task
Planning
Paper
•
2502.10177
•
Published
•
6
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem
Proving
Paper
•
2502.07640
•
Published
•
8
LoRACode: LoRA Adapters for Code Embeddings
Paper
•
2503.05315
•
Published
•
11
Learning from Failures in Multi-Attempt Reinforcement Learning
Paper
•
2503.04808
•
Published
•
18
ds4sd/SmolDocling-256M-preview
Image-Text-to-Text
•
Updated
•
84k
•
1.33k
Bridging Continuous and Discrete Tokens for Autoregressive Visual
Generation
Paper
•
2503.16430
•
Published
•
35
MAPS: A Multi-Agent Framework Based on Big Seven Personality and
Socratic Guidance for Multimodal Scientific Problem Solving
Paper
•
2503.16905
•
Published
•
54
Improving Autoregressive Image Generation through Coarse-to-Fine Token
Prediction
Paper
•
2503.16194
•
Published
•
8
ELTEX: A Framework for Domain-Driven Synthetic Data Generation
Paper
•
2503.15055
•
Published
•
6
Reinforcement Learning for Reasoning in Small LLMs: What Works and What
Doesn't
Paper
•
2503.16219
•
Published
•
48
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
Paper
•
2503.16356
•
Published
•
15
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement
Learning
Paper
•
2503.15265
•
Published
•
47
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
•
2503.14476
•
Published
•
124
Advancing Language Model Reasoning through Reinforcement Learning and
Inference Scaling
Paper
•
2501.11651
•
Published
•
1
API Agents vs. GUI Agents: Divergence and Convergence
Paper
•
2503.11069
•
Published
•
37
R1-VL: Learning to Reason with Multimodal Large Language Models via
Step-wise Group Relative Policy Optimization
Paper
•
2503.12937
•
Published
•
29
Self-Evolved Preference Optimization for Enhancing Mathematical
Reasoning in Small Language Models
Paper
•
2503.04813
•
Published
Full-Step-DPO: Self-Supervised Preference Optimization with Step-wise
Rewards for Mathematical Reasoning
Paper
•
2502.14356
•
Published
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM
Reasoning via Autoregressive Search
Paper
•
2502.02508
•
Published
•
23
NousResearch/DeepHermes-3-Mistral-24B-Preview
Text Generation
•
Updated
•
2.17k
•
94
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based
VLM Agent Training
Paper
•
2503.08525
•
Published
•
17
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model
for Visual Generation and Editing
Paper
•
2503.10639
•
Published
•
50
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time
Thinking
Paper
•
2503.19855
•
Published
•
27
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs
using Particle-Based Monte Carlo Methods
Paper
•
2502.01618
•
Published
•
10
Transformer^2: Self-adaptive LLMs
Paper
•
2501.06252
•
Published
•
55
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
•
2501.08313
•
Published
•
290
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language
Model Born from Transformer
Paper
•
2501.15570
•
Published
•
25
open-thoughts/OpenThoughts-114k
Viewer
•
Updated
•
228k
•
23.7k
•
703
Beyond Prompt Content: Enhancing LLM Performance via Content-Format
Integrated Prompt Optimization
Paper
•
2502.04295
•
Published
•
13
ScholarCopilot: Training Large Language Models for Academic Writing with
Accurate Citations
Paper
•
2504.00824
•
Published
•
41
ZClip: Adaptive Spike Mitigation for LLM Pre-Training
Paper
•
2504.02507
•
Published
•
78
agentica-org/DeepCoder-14B-Preview
Text Generation
•
Updated
•
48.1k
•
627
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Paper
•
2504.06261
•
Published
•
107
VAPO: Efficient and Reliable Reinforcement Learning for Advanced
Reasoning Tasks
Paper
•
2504.05118
•
Published
•
25
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper
•
2504.07128
•
Published
•
83
SQL-R1: Training Natural Language to SQL Reasoning Model By
Reinforcement Learning
Paper
•
2504.08600
•
Published
•
27
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated
Agent-Human Interplay
Paper
•
2504.03601
•
Published
•
16
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for
Autoregressive Image Generation
Paper
•
2504.08736
•
Published
•
47
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to
Reinforce
Paper
•
2504.11343
•
Published
•
16
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
Paper
•
2502.01142
•
Published
•
24
Genius: A Generalizable and Purely Unsupervised Self-Training Framework
For Advanced Reasoning
Paper
•
2504.08672
•
Published
•
54
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question
Answering?
Paper
•
2502.13233
•
Published
•
15
LongPO: Long Context Self-Evolution of Large Language Models through
Short-to-Long Preference Optimization
Paper
•
2502.13922
•
Published
•
28
WorldGUI: Dynamic Testing for Comprehensive Desktop GUI Automation
Paper
•
2502.08047
•
Published
•
27
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale
Reinforcement Learning
Paper
•
2503.07365
•
Published
•
61