Jifeng Dai

daijifeng

https://jifengdai.org/

AI & ML interests

None yet

Organizations

None yet

daijifeng's activity

authored a paper 15 days ago

VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models

Paper • 2504.15279 • Published 17 days ago • 73

upvoted a paper 24 days ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published 24 days ago • 255

authored a paper about 1 month ago

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

Paper • 2503.19757 • Published Mar 25 • 50

authored 2 papers about 2 months ago

VisualPRM: An Effective Process Reward Model for Multimodal Reasoning

Paper • 2503.10291 • Published Mar 13 • 36

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

Paper • 2503.10639 • Published Mar 13 • 50

authored a paper 4 months ago

Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding

Paper • 2501.07783 • Published Jan 14 • 7

authored 2 papers 5 months ago

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

Paper • 2412.09604 • Published Dec 12, 2024 • 38

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Paper • 2412.05271 • Published Dec 6, 2024 • 157

authored a paper 6 months ago

Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization

Paper • 2411.10442 • Published Nov 15, 2024 • 81

authored a paper 7 months ago

PUMA: Empowering Unified MLLM with Multi-granular Visual Generation

Paper • 2410.13861 • Published Oct 17, 2024 • 57

authored a paper 9 months ago

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Paper • 2408.02718 • Published Aug 5, 2024 • 62

authored 2 papers 10 months ago

Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams

Paper • 2406.08085 • Published Jun 12, 2024 • 17

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Paper • 2407.03320 • Published Jul 3, 2024 • 96

authored a paper about 1 year ago

Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments

Paper • 2403.13803 • Published Mar 20, 2024

liked a model about 1 year ago

OpenGVLab/InternVL-Chat-V1-5

Image-Text-to-Text • Updated Mar 25 • 3.1k • 411

authored 2 papers over 1 year ago

Motion-I2V: Consistent and Controllable Image-to-Video Generation with Explicit Motion Modeling

Paper • 2401.15977 • Published Jan 29, 2024 • 40

ControlLLM: Augment Language Models with Tools by Searching on Graphs

Paper • 2310.17796 • Published Oct 26, 2023 • 18

authored 3 papers almost 2 years ago

The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World

Paper • 2308.01907 • Published Aug 3, 2023 • 12

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

Paper • 2305.17144 • Published May 25, 2023 • 2

InternChat: Solving Vision-Centric Tasks by Interacting with Chatbots Beyond Language

Paper • 2305.05662 • Published May 9, 2023 • 4