Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Paper • 2504.08672 • Published 26 days ago • 54
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization Paper • 2504.10127 • Published 23 days ago • 17
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning Paper • 2504.00487 • Published Apr 1 • 18
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published Mar 27 • 62
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving Paper • 2503.16905 • Published Mar 21 • 54
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization Paper • 2503.16874 • Published Mar 21 • 44
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation Paper • 2503.13288 • Published Mar 17 • 51
GKG-LLM: A Unified Framework for Generalized Knowledge Graph Construction Paper • 2503.11227 • Published Mar 14 • 24
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era Paper • 2503.12329 • Published Mar 16 • 25
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published Feb 11 • 54
Are Large Language Models Really Good Logical Reasoners? A Comprehensive Evaluation and Beyond Paper • 2306.09841 • Published Jun 16, 2023 • 3
Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models Paper • 2501.18119 • Published Jan 30 • 25
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis Paper • 2412.19723 • Published Dec 27, 2024 • 88
ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting Paper • 2411.17176 • Published Nov 26, 2024 • 24
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization Paper • 2411.10442 • Published Nov 15, 2024 • 81
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use Paper • 2411.10323 • Published Nov 15, 2024 • 35
Vision-Language Models Can Self-Improve Reasoning via Reflection Paper • 2411.00855 • Published Oct 30, 2024 • 5
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30, 2024 • 51
AgentStore: Scalable Integration of Heterogeneous Agents As Specialized Generalist Computer Assistant Paper • 2410.18603 • Published Oct 24, 2024 • 33