ibm-research/DAC.speech.v1.0

slavashe committed on Sep 19, 2024

Commit f11347d · 1 Parent(s): dc73522

update README

Files changed (1)

README.md +61 -0

@@ -1,3 +1,64 @@
 ---
 license: cdla-permissive-2.0
 ---

+## Model Summary
+[DAC auto-encoder models](https://github.com/descriptinc/descript-audio-codec) provide a compact discrete tokenization of speech and audio signals. This tokenization facilitates signal generation by cascaded generative AI models (e.g., multi-modal generative AI models) and supports high-quality reconstruction of the original signals. [The current models](https://www.isca-archive.org/interspeech_2024/shechtman24_interspeech.pdf) improve upon the [original DAC models](https://github.com/descriptinc/descript-audio-codec) by allowing a more compact representation of speech-only signals while retaining high-quality signal reconstruction.
+## Usage
+1. Follow the [DAC](https://github.com/descriptinc/descript-audio-codec) installation instructions.
+2. Download the model weights from the current repo (e.g., *weights_24khz_1.5kbps_v1.0*).
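
To put the "compact representation" claim in numbers, the following is a rough back-of-the-envelope sketch based only on the example weight name `weights_24khz_1.5kbps_v1.0`. It assumes that the name means a 24 kHz sample rate and a 1.5 kbps code stream (with 1 kbps = 1000 bit/s), and it compares against 16-bit mono PCM, which is an assumed baseline and not something the README states. The function names are illustrative only.

```python
# Values read off the weight name "weights_24khz_1.5kbps_v1.0" (assumed meaning).
SAMPLE_RATE_HZ = 24_000
CODEC_BITRATE_BPS = 1_500  # assumes 1 kbps = 1000 bit/s

# Assumed baseline: uncompressed 16-bit mono PCM (not stated in the README).
PCM_BITS_PER_SAMPLE = 16


def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate of uncompressed PCM audio in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(raw_bps, coded_bps):
    """How many times smaller the coded stream is than the raw stream."""
    return raw_bps / coded_bps


def payload_bytes(duration_s, bitrate_bps):
    """Size of the coded payload in bytes, ignoring any file-format overhead."""
    return duration_s * bitrate_bps / 8


raw_bps = pcm_bitrate_bps(SAMPLE_RATE_HZ, PCM_BITS_PER_SAMPLE)
print(raw_bps)                                        # 384000
print(compression_ratio(raw_bps, CODEC_BITRATE_BPS))  # 256.0
print(payload_bytes(60, CODEC_BITRATE_BPS))           # 11250.0
```

These figures describe the nominal code stream only; the size of a `.dac` file on disk may differ because of metadata and storage format.
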
+### Compress audio
+```
+python3 -m dac encode /path/to/input --output /path/to/output/codes --weights_path /path/to/weights_24khz_1.5kbps_v1.0
+```
+This command creates `.dac` files with the same names as the input files. It also preserves the directory structure relative to the input root and re-creates it in the output directory. Run `python3 -m dac encode --help` for more options.
+### Reconstruct audio from compressed codes
+```
+python3 -m dac decode /path/to/output/codes --output /path/to/reconstructed_input --weights_path /path/to/weights_24khz_1.5kbps_v1.0
+```
+This command creates `.wav` files with the same names as the input `.dac` files. It also preserves the directory structure relative to the input root and re-creates it in the output directory. Run `python3 -m dac decode --help` for more options.
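
The two commands above can also be driven from a script. The following is a minimal sketch that only assembles the same arguments shown above (`encode`/`decode`, `--output`, `--weights_path`) and passes them to `subprocess.run`. The helper names `build_dac_command` and `run_dac` are hypothetical and not part of the `dac` package; the sketch uses `sys.executable` in place of `python3` so that the interpreter with `dac` installed is the one invoked.

```python
import subprocess
import sys


def build_dac_command(mode, input_path, output_path, weights_path):
    """Build the argv list for `python -m dac <mode>` (hypothetical helper)."""
    if mode not in ("encode", "decode"):
        raise ValueError(f"mode must be 'encode' or 'decode', got {mode!r}")
    return [
        sys.executable, "-m", "dac", mode,
        str(input_path),
        "--output", str(output_path),
        "--weights_path", str(weights_path),
    ]


def run_dac(mode, input_path, output_path, weights_path):
    """Run the command and raise CalledProcessError if it exits non-zero."""
    cmd = build_dac_command(mode, input_path, output_path, weights_path)
    subprocess.run(cmd, check=True)


# Example: show the encode command without executing it.
print(build_dac_command(
    "encode",
    "/path/to/input",
    "/path/to/output/codes",
    "/path/to/weights_24khz_1.5kbps_v1.0",
))
```

Calling `run_dac("encode", ...)` followed by `run_dac("decode", ...)` would reproduce the two steps above, provided `dac` is installed in the current environment.
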
+### Programmatic Usage
+```py
+import dac
+from audiotools import AudioSignal
+# Load a model from locally downloaded weights
+model_path = '/path/to/weights_24khz_1.5kbps_v1.0'
+model = dac.DAC.load(model_path)
+model.to('cuda')
+# Load audio signal file
+signal = AudioSignal('input.wav')
+# Encode audio signal as one long file
+# (may run out of GPU memory on long files)
+signal.to(model.device)
+x = model.preprocess(signal.audio_data, signal.sample_rate)
+z, codes, latents, _, _ = model.encode(x)
+# Decode audio signal
+y = model.decode(z)
+# Alternatively, use the `compress` and `decompress` functions
+# to compress long files.
+signal = signal.cpu()
+x = model.compress(signal)
+# Save and load to and from disk
+x.save("compressed.dac")
+x = dac.DACFile.load("compressed.dac")
+# Decompress it back to an AudioSignal
+y = model.decompress(x)
+# Write to file
+y.write('output.wav')
+```