update README
README.md
CHANGED
@@ -1,3 +1,64 @@
---
license: cdla-permissive-2.0
---

## Model Summary
[DAC auto-encoder models](https://github.com/descriptinc/descript-audio-codec) provide a compact discrete tokenization of speech and audio signals. This tokenization facilitates signal generation by cascaded generative AI models (e.g., multi-modal generative AI models) and supports high-quality reconstruction of the original signals. [The current models](https://www.isca-archive.org/interspeech_2024/shechtman24_interspeech.pdf) improve upon the [original DAC models](https://github.com/descriptinc/descript-audio-codec) by allowing a more compact representation of speech-only signals while retaining high-quality signal reconstruction.
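
To put the compactness in perspective, the back-of-the-envelope calculation below reads the nominal figures off the example weights name used in this README (`weights_24khz_1.5kbps_v1.0`, i.e. 24 kHz audio at 1.5 kbps) and compares them with uncompressed audio. The 16-bit mono PCM baseline is an assumption made purely for illustration, and the helper names are hypothetical.

```python
# Nominal figures read off the weights name `weights_24khz_1.5kbps_v1.0`.
SAMPLE_RATE_HZ = 24_000
CODEC_BITRATE_BPS = 1_500

# Assumption for illustration only: the uncompressed baseline is 16-bit mono PCM.
PCM_BITS_PER_SAMPLE = 16


def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Bitrate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(pcm_bps: int, codec_bps: int) -> float:
    """How many times smaller the codec stream is than the PCM stream."""
    return pcm_bps / codec_bps


def payload_bytes(bitrate_bps: int, duration_s: float) -> float:
    """Size of the raw bitstream payload for a given duration, in bytes."""
    return bitrate_bps * duration_s / 8


pcm_bps = pcm_bitrate_bps(SAMPLE_RATE_HZ, PCM_BITS_PER_SAMPLE)
ratio = compression_ratio(pcm_bps, CODEC_BITRATE_BPS)

print(f"PCM baseline: {pcm_bps / 1000:.0f} kbps")    # 384 kbps
print(f"Compression ratio: {ratio:.0f}x")            # 256x
print(f"One minute of codes: {payload_bytes(CODEC_BITRATE_BPS, 60) / 1000:.2f} kB")  # 11.25 kB
```

These numbers describe the nominal token payload only; the size of an actual `.dac` file on disk may differ because of metadata and storage format.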

## Usage
Follow the [DAC](https://github.com/descriptinc/descript-audio-codec) installation instructions, then download the model weights from the current repo (e.g., *weights_24khz_1.5kbps_v1.0*).
### Compress audio
```
python3 -m dac encode /path/to/input --output /path/to/output/codes --weights_path /path/to/weights_24khz_1.5kbps_v1.0
```

This command creates `.dac` files with the same names as the input files. It also preserves the directory structure relative to the input root and re-creates it in the output directory. Run `python3 -m dac encode --help` for more options.

### Reconstruct audio from compressed codes
```
python3 -m dac decode /path/to/output/codes --output /path/to/reconstructed_input --weights_path /path/to/weights_24khz_1.5kbps_v1.0
```

This command creates `.wav` files with the same names as the input `.dac` files. It also preserves the directory structure relative to the input root and re-creates it in the output directory. Run `python3 -m dac decode --help` for more options.
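
The directory mirroring described for both commands can be pictured with the small sketch below. It is not the DAC implementation: `mirrored_output_path` is a hypothetical helper, and the `speaker1/utt01.wav` file name is made up. It only illustrates how a file's path relative to the input root maps onto the output directory with a new extension.

```python
from pathlib import Path


def mirrored_output_path(input_file: str, input_root: str, output_root: str, suffix: str) -> Path:
    """Hypothetical helper: map an input file to its mirrored output location.

    Keeps the path relative to `input_root`, re-roots it under `output_root`,
    and swaps the file extension for `suffix` (e.g. ".dac" or ".wav").
    """
    relative = Path(input_file).relative_to(input_root)
    return (Path(output_root) / relative).with_suffix(suffix)


# Encoding: a .wav file under the input root maps to a .dac file under the codes directory.
encoded = mirrored_output_path(
    "/path/to/input/speaker1/utt01.wav",
    "/path/to/input",
    "/path/to/output/codes",
    ".dac",
)
print(encoded)

# Decoding: the .dac file maps to a .wav file under the reconstruction directory.
decoded = mirrored_output_path(
    str(encoded),
    "/path/to/output/codes",
    "/path/to/reconstructed_input",
    ".wav",
)
print(decoded)
```

Only the extension and the root change; the sub-directory layout (`speaker1/` here) is carried over unchanged.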

### Programmatic Usage
```py
import dac
from audiotools import AudioSignal

# Load the model from the downloaded weights
model_path = "/path/to/weights_24khz_1.5kbps_v1.0"
model = dac.DAC.load(model_path)

model.to('cuda')

# Load audio signal file
signal = AudioSignal('input.wav')

# Encode audio signal as one long file
# (may run out of GPU memory on long files)
signal.to(model.device)

x = model.preprocess(signal.audio_data, signal.sample_rate)
z, codes, latents, _, _ = model.encode(x)

# Decode audio signal
y = model.decode(z)

# Alternatively, use the `compress` and `decompress` functions
# to compress long files.

signal = signal.cpu()
x = model.compress(signal)

# Save and load to and from disk
x.save("compressed.dac")
x = dac.DACFile.load("compressed.dac")

# Decompress it back to an AudioSignal
y = model.decompress(x)

# Write to file
y.write('output.wav')
```