metadata

license: apache-2.0
library_name: unity-sentis
pipeline_tag: automatic-speech-recognition
tags:
  - unity-inference-engine

Whisper-Tiny model in Unity 6 with Inference Engine

This is the Whisper Tiny model running in Unity 6 with Inference Engine. It is a speech-to-text model that transcribes 16 kHz WAV audio into text.

How to Use

Create a new scene in Unity 6;
Install com.unity.ai.inference from the Package Manager;
Install com.unity.nuget.newtonsoft-json from the Package Manager;
Add the RunWhisper.cs script to the Main Camera;
Drag the decoder_model.onnx asset from the models folder into the Audio Decoder 1 field;
Drag the decoder_with_past_model.onnx asset from the models folder into the Audio Decoder 2 field;
Drag the encoder_model.onnx asset from the models folder into the Audio Encoder field;
Drag the logmel_spectrogram.onnx asset from the models folder into the Log Mel Spectro field;
Drag the vocab.json asset from the data folder into the Vocab Asset field;
Drag an audio asset, e.g. data/answering-machine16kHz.wav, into the Audio Clip field. For best results, make sure the Normalize flag is set in the asset's import settings.
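
Since the model expects 16 kHz audio, it can help to check a clip's sample rate before importing it into Unity. The following is a minimal sketch in Python, run outside Unity, that assumes the file is an uncompressed PCM WAV. The helper names describe_wav and is_expected_rate are hypothetical and are not part of this repository.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # sample rate stated in this README


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    # The stdlib wave module reads uncompressed PCM only; other encodings
    # raise wave.Error (assumption: your clip is plain PCM).
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def is_expected_rate(path, expected_rate_hz=EXPECTED_RATE_HZ):
    """Return True when the file's sample rate matches the expected rate."""
    rate, _, _ = describe_wav(path)
    return rate == expected_rate_hz
```

For example, is_expected_rate("data/answering-machine16kHz.wav") should return True if the bundled sample really is a 16 kHz PCM file. A clip that fails the check can be resampled in an audio editor before import.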

Enter Play mode. If everything is set up correctly, the transcribed text is logged to the Console.

Inference Engine is a neural network inference library for Unity. Find out more in the Unity Inference Engine documentation.