---
title: DeepSound-V1
colorFrom: blue
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
---
## [DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos](https://github.com/lym0302/DeepSound-V1)

## Highlight

DeepSound-V1 is a framework that enables audio generation from videos with initial step-by-step thinking, requiring no extra annotations, based on the internal chain-of-thought (CoT) of a multi-modal large language model (MLLM).

## Installation

```bash
conda create -n deepsound-v1 python=3.10.16 -y
conda activate deepsound-v1
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu120
pip install flash-attn==2.5.8 --no-build-isolation
pip install -e .
pip install -r requirements.txt
```

## Demo

### Pretrained models

See [MODELS.md](docs/MODELS.md).

### Command-line interface

With `demo.py`:

```bash
python demo.py -i
```
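
Before running the demo, a quick environment check can save debugging time. The snippet below is a minimal sketch, assuming the `deepsound-v1` conda environment from the Installation step is active; it only verifies that PyTorch and CUDA are usable, not the DeepSound-V1 models themselves.

```bash
# Confirm the installed PyTorch build can see the GPU before running demo.py.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```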