File size: 1,306 Bytes
e3ed6ed
 
e19aac6
 
 
e3ed6ed
e19aac6
e3ed6ed
 
86f2b4a
 
e3ed6ed
 
29b141e
 
 
 
 
 
 
 
e3ed6ed
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
---
title: Describe Anything
emoji: 
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.7.1
app_file: app.py
pinned: false
models:
- nvidia/DAM-3B
---

# Describe Anything: Detailed Localized Image and Video Captioning

**NVIDIA, UC Berkeley, UCSF**

[Long Lian](https://tonylian.com), [Yifan Ding](https://research.nvidia.com/person/yifan-ding), [Yunhao Ge](https://gyhandy.github.io/), [Sifei Liu](https://sifeiliu.net/), [Hanzi Mao](https://hanzimao.me/), [Boyi Li](https://sites.google.com/site/boyilics/home), [Marco Pavone](https://research.nvidia.com/person/marco-pavone), [Ming-Yu Liu](https://mingyuliu.net/), [Trevor Darrell](https://people.eecs.berkeley.edu/~trevor/), [Adam Yala](https://www.adamyala.org/), [Yin Cui](https://ycui.me/)

[[Paper](https://arxiv.org/abs/2504.16072)] | [[Code](https://github.com/NVlabs/describe-anything)] | [[Project Page](https://describe-anything.github.io/)] | [[Video](https://describe-anything.github.io/#video)] | [[HuggingFace Demo](https://huggingface.co/spaces/nvidia/describe-anything-model-demo)] | [[Model/Benchmark/Datasets](https://huggingface.co/collections/nvidia/describe-anything-680825bb8f5e41ff0785834c)] | [[Citation](#citation)]

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference