## Install the NVIDIA NeMo environment

You can install the NeMo environment locally by following the [installation guide](https://github.com/heartexlabs/NeMo#installation), or get started quickly with the prebuilt Docker container:

In [None]:
!docker run --gpus all -it --rm --shm-size=8g \
-p 8888:8888 -p 6006:6006 -p 8080:8080 --ulimit memlock=-1 --ulimit \
stack=67108864 --device=/dev/snd nvcr.io/nvidia/nemo:1.0.1

Note that this command publishes port 8080, the default Label Studio port, from the Docker container.

## Install Label Studio

In [None]:
!pip install label-studio

## Create ML backend with NeMo model

Let's create a simple script, `asr.py`, that wraps the NeMo inference call and converts its output to the annotation format expected by Label Studio.

In [None]:
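import logging

import nemo
import nemo.collections.asr as nemo_asr
from label_studio_ml.model import LabelStudioMLBase

# Module-level logger used by _bind_to_textarea() below
logger = logging.getLogger(__name__)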


class NemoASR(LabelStudioMLBase):

    def __init__(self, model_name='QuartzNet15x5Base-En', **kwargs):
        super(NemoASR, self).__init__(**kwargs)

        # Find TextArea control tag and bind ASR model to it
        self.from_name, self.to_name, self.value = self._bind_to_textarea()

        # Download the pre-trained model (QuartzNet15x5 by default) from NVIDIA's NGC cloud and instantiate it
        self.model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name=model_name)

    def predict(self, tasks, **kwargs):
        """Returns NeMo ASR predictions given audio files in Label Studio's tasks"""
        audio_path = self.get_local_path(tasks[0]['data'][self.value])
        transcription = self.model.transcribe(paths2audio_files=[audio_path])[0]
        return [{
            'result': [{
                'from_name': self.from_name,
                'to_name': self.to_name,
                'type': 'textarea',
                'value': {
                    'text': [transcription]
                }
            }],
            'score': 1.0
        }]

    def _bind_to_textarea(self):
        """Helper to bind inference output to annotation format expected by Label Studio"""
        from_name, to_name, value = None, None, None
        for tag_name, tag_info in self.parsed_label_config.items():
            if tag_info['type'] == 'TextArea':
                from_name = tag_name
                if len(tag_info['inputs']) > 1:
                    logger.warning(
                        'ASR model works with single Audio or AudioPlus input, '
                        'but {0} found: {1}. We\'ll use only the first one'.format(
                            len(tag_info['inputs']), ', '.join(tag_info['to_name'])))
                if tag_info['inputs'][0]['type'] not in ('Audio', 'AudioPlus'):
                    raise ValueError('{0} tag expected to be of type Audio or AudioPlus, but type {1} found'.format(
                        tag_info['to_name'][0], tag_info['inputs'][0]['type']))
                to_name = tag_info['to_name'][0]
                value = tag_info['inputs'][0]['value']
        if from_name is None:
            raise ValueError('ASR model expects a <TextArea> tag to be present in the label config.')
        return from_name, to_name, value
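
To make the data shapes concrete, here is a minimal standalone sketch that needs neither NeMo nor Label Studio. The `sample_config` dictionary is a hand-written, hypothetical stand-in for `parsed_label_config` (the tag names `transcription` and `audio` are assumptions), and `find_textarea_binding` and `build_prediction` are illustrative helpers, not part of `label_studio_ml`. Unlike `_bind_to_textarea` above, the sketch skips input validation and stops at the first `TextArea` tag.

```python
# Hypothetical stand-in for self.parsed_label_config; the tag names
# 'transcription' and 'audio' are assumptions made for illustration.
sample_config = {
    'transcription': {
        'type': 'TextArea',
        'to_name': ['audio'],
        'inputs': [{'type': 'Audio', 'value': 'audio'}],
    }
}


def find_textarea_binding(parsed_config):
    """Return (from_name, to_name, value) for the first TextArea tag found."""
    for tag_name, tag_info in parsed_config.items():
        if tag_info['type'] == 'TextArea':
            return tag_name, tag_info['to_name'][0], tag_info['inputs'][0]['value']
    raise ValueError('No <TextArea> tag found in the label config.')


def build_prediction(from_name, to_name, transcription):
    """Wrap a transcription string in the same structure that predict() returns."""
    return {
        'result': [{
            'from_name': from_name,
            'to_name': to_name,
            'type': 'textarea',
            'value': {'text': [transcription]},
        }],
        'score': 1.0,
    }


from_name, to_name, value = find_textarea_binding(sample_config)
prediction = build_prediction(from_name, to_name, 'hello world')
print(prediction)
```

Here `value` is the task data key (`audio` in this sample) that `predict` uses to look up the audio file in `tasks[0]['data']`.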

## Run ML backend

The following command initializes the ML backend by creating the directory `./nemo-ml-backend` and copying everything needed to run it, including the `asr.py` script.

In [None]:
!label-studio-ml init nemo-ml-backend --from asr.py

Then launch the ML backend, which serves on `http://localhost:9090` by default:

In [None]:
!label-studio-ml start nemo-ml-backend

## Connect ML backend to Label Studio

Launch the Label Studio web application, which runs on `http://localhost:8080`:

In [None]:
!label-studio start annotation-with-nemo --init

In Label Studio, upload audio files either by drag-and-drop, or by importing a text file with one URL referencing an audio file per line. Then, go to the **Settings** page and select the **Speech Transcription** template. Click **Save**.
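
If you go the text-file route, a small script can generate the import file. This is only a sketch: the URLs and the `audio_urls.txt` file name are placeholders, and `write_url_list` is a hypothetical helper rather than a Label Studio function.

```python
from pathlib import Path

# Placeholder audio URLs; replace them with locations Label Studio can reach.
audio_urls = [
    'https://example.com/audio/sample1.wav',
    'https://example.com/audio/sample2.wav',
]


def write_url_list(urls, path):
    """Write one URL per line to `path` and return the number of URLs written."""
    Path(path).write_text('\n'.join(urls) + '\n', encoding='utf-8')
    return len(urls)


count = write_url_list(audio_urls, 'audio_urls.txt')
print('Wrote {0} URLs'.format(count))
```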

On the **Model** page, add the ML backend URL `http://localhost:9090`. If it connects successfully, you see a green "Connected" status.

You can then start annotating your audio files by correcting the text areas prepopulated with the NeMo ASR output. After you finish labeling, you can export the results in the `ASR_MANIFEST` format, ready to use for [training a NeMo ASR model](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/asr/ASR_with_NeMo.ipynb).
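
Before training, it can help to sanity-check the exported file. NeMo ASR manifests are JSON-lines files in which each line carries `audio_filepath`, `duration`, and `text` keys; the sketch below assumes the Label Studio export follows that convention, which you should verify against your own export. The sample lines are made up, and `read_manifest` is a hypothetical helper.

```python
import json


def read_manifest(lines):
    """Parse JSON-lines manifest entries, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


# Made-up manifest content; in practice, pass an open file object instead,
# e.g. `with open(manifest_path) as f: entries = read_manifest(f)`.
sample_lines = [
    '{"audio_filepath": "audio/sample1.wav", "duration": 2.5, "text": "hello world"}',
    '',
    '{"audio_filepath": "audio/sample2.wav", "duration": 4.0, "text": "good morning"}',
]

entries = read_manifest(sample_lines)
total_duration = sum(entry['duration'] for entry in entries)
print('{0} utterances, {1} seconds of audio'.format(len(entries), total_duration))
```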