> [**VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding**](https://github.com/DAMO-NLP-SG/VideoLLaMA3) <br>
> Boqiang Zhang*, Kehan Li*, Zesen Cheng*, Zhiqiang Hu*, Yuqian Yuan*, Guanzheng Chen*, Sicong Leng*, Yuming Jiang*, Hang Zhang*, Xin Li*, Peng Jin, Wenqi Zhang, Fan Wang, Lidong Bing, Deli Zhao <br>
> [GitHub](https://github.com/DAMO-NLP-SG/VideoLLaMA3) | [arXiv](https://arxiv.org/abs/2501.13106)

> [**Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding**](https://github.com/DAMO-NLP-SG/Video-LLaMA) <br>
> Hang Zhang, Xin Li, Lidong Bing <br>
> [GitHub](https://github.com/DAMO-NLP-SG/Video-LLaMA) | [arXiv](https://arxiv.org/abs/2306.02858)

> [**VCD: Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding**](https://arxiv.org/abs/2311.16922) <br>
> Sicong Leng*, Hang Zhang*, Guanzheng Chen, Xin Li, Shijian Lu, Chunyan Miao, Lidong Bing <br>
> [GitHub](https://github.com/DAMO-NLP-SG/VCD) | [arXiv](https://arxiv.org/abs/2311.16922)

> [**The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio**](https://arxiv.org/abs/2410.12787) <br>
> Sicong Leng, Yun Xing, Zesen Cheng, Yang Zhou, Hang Zhang, Xin Li, Deli Zhao, Shijian Lu, Chunyan Miao, Lidong Bing <br>
> [GitHub](https://github.com/DAMO-NLP-SG/CMM) | [arXiv](https://arxiv.org/abs/2410.12787)

> [**Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss**](https://arxiv.org/abs/2410.17243) <br>
> Zesen Cheng*, Hang Zhang*, Kehan Li*, Sicong Leng, Zhiqiang Hu, Fei Wu, Deli Zhao, Xin Li, Lidong Bing <br>
> [GitHub](https://github.com/DAMO-NLP-SG/Inf-CLIP) | [arXiv](https://arxiv.org/abs/2410.17243)