Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 3 days ago • 49
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published 16 days ago • 60