# NeMo ASR+VAD Inference
This example provides the ASR+VAD inference pipeline, with the option to perform only ASR or VAD alone.
## Input
Two types of input are supported:
- A manifest passed to `manifest_filepath`.
- A directory of audio files passed to `audio_dir`, together with `audio_type` (defaults to `wav`).
The input manifest must be a JSON manifest file in which each line is a single JSON object (dictionary). The fields ["audio_filepath", "offset", "duration", "text"] are required. An example of a manifest file is:
```json
{"audio_filepath": "/path/to/audio_file1", "offset": 0, "duration": 10000, "text": "a b c d e"}
{"audio_filepath": "/path/to/audio_file2", "offset": 0, "duration": 10000, "text": "f g h i j"}
```
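A manifest in this format can be generated programmatically. The following is a minimal sketch using only the Python standard library; the helper name `write_manifest` and the `REQUIRED_FIELDS` constant are illustrative and not part of NeMo. It checks that every entry has the four required fields and writes one JSON object per line.

```python
import json
import tempfile
from pathlib import Path

# Required fields as listed in this README.
REQUIRED_FIELDS = ("audio_filepath", "offset", "duration", "text")


def write_manifest(entries, manifest_path):
    """Write one JSON object per line (hypothetical helper, not a NeMo API)."""
    entries = list(entries)
    # Validate everything first so a bad entry never leaves a partial file.
    for entry in entries:
        missing = [key for key in REQUIRED_FIELDS if key not in entry]
        if missing:
            raise ValueError(f"entry {entry!r} is missing fields: {missing}")
    manifest_path = Path(manifest_path)
    with manifest_path.open("w", encoding="utf-8") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
    return manifest_path


if __name__ == "__main__":
    # Same placeholder entries as the example manifest above.
    entries = [
        {"audio_filepath": "/path/to/audio_file1", "offset": 0, "duration": 10000, "text": "a b c d e"},
        {"audio_filepath": "/path/to/audio_file2", "offset": 0, "duration": 10000, "text": "f g h i j"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = write_manifest(entries, Path(tmp) / "manifest.json")
        print(path.read_text(encoding="utf-8"))
```

The resulting file path can then be passed as `manifest_filepath`.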
## Output
The output is a folder containing the VAD predictions and/or a manifest containing the audio transcriptions. Some temporary data is also stored.
## Usage
To run the code with ASR+VAD default settings:
```bash
python speech_to_text_with_vad.py \
manifest_filepath=/PATH/TO/MANIFEST.json \
vad_model=vad_multilingual_marblenet \
asr_model=stt_en_conformer_ctc_large \
vad_config=../conf/vad/vad_inference_postprocess.yaml
```
To use only ASR and disable VAD, set `vad_model=None` and `use_rttm=False`.
To use only VAD, set `asr_model=None` and specify both `vad_model` and `vad_config`.
To enable profiling, set `profiling=True`; note that this significantly slows down the program.
To enable or disable feature masking, set `use_rttm` to `True` or `False`.
To normalize features before masking, set `normalize=pre_norm`;
to mask before normalization, set `normalize=post_norm`.
To use a specific value for feature masking, set `feat_mask_val` to the desired value.
The default is `feat_mask_val=None`, in which case -16.530 (the zero log mel-spectrogram value) is used for `post_norm` and 0 (same as SpecAugment) is used for `pre_norm`.
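The default mask value rule can be summarized in a few lines of Python. This is only a sketch of the behavior described here, not the script's actual code; the function `resolve_feat_mask_val` and both constants are hypothetical names introduced for illustration.

```python
from typing import Optional

# Values taken from this README; the names are illustrative assumptions.
LOG_MEL_ZERO = -16.530   # zero log mel-spectrogram value, used for post_norm
SPEC_AUGMENT_MASK = 0.0  # same masking value as SpecAugment, used for pre_norm


def resolve_feat_mask_val(normalize: str, feat_mask_val: Optional[float] = None) -> float:
    """Return the masking value for a given `normalize` mode (hypothetical helper)."""
    if feat_mask_val is not None:
        # An explicit value always overrides the defaults.
        return float(feat_mask_val)
    if normalize == "pre_norm":
        # Features are already normalized when masked.
        return SPEC_AUGMENT_MASK
    if normalize == "post_norm":
        # Masking happens on raw log mel features, before normalization.
        return LOG_MEL_ZERO
    raise ValueError(f"unsupported normalize mode: {normalize!r}")
```

For example, `resolve_feat_mask_val("post_norm")` yields the log mel zero value, while `resolve_feat_mask_val("post_norm", feat_mask_val=-10.0)` returns the explicit override.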
See more options in the `InferenceConfig` class.