- Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters
  Paper • 2403.02677 • Published • 18
- Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
  Paper • 2403.03003 • Published • 11
- InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
  Paper • 2403.01487 • Published • 16
- VisionLLaMA: A Unified LLaMA Interface for Vision Tasks
  Paper • 2403.00522 • Published • 47
Qingkai Fang
poeroz
AI & ML interests
Large Language Models, Speech-Language Models, Speech Translation
Recent Activity
authored a paper 3 days ago
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
commented on a paper 3 days ago
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
published a dataset 19 days ago
ICTNLP/Multiturn-Speech-Conversations
Collections: 1
Models: none public yet
Datasets: none public yet