---
title: ATC Transcription Assistant
emoji: ✈️
colorFrom: purple
colorTo: red
sdk: docker
pinned: false
---

# ATC Transcription Assistant

## Overview

Welcome to the **ATC Transcription Assistant**, a tool designed to transcribe **Air Traffic Control (ATC)** audio. The app uses OpenAI’s **Whisper medium.en** model, fine-tuned specifically for ATC communications. Fine-tuning significantly improves transcription accuracy on aviation audio, making the app useful for researchers, enthusiasts, and professionals interested in analyzing ATC communications.
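
If you want to run the same kind of transcription outside the app, a minimal sketch with the Hugging Face Transformers ASR pipeline looks like the following. The checkpoint identifier and audio file name are placeholders, not part of this project:

```python
# Minimal ASR sketch (pip install transformers torch); ffmpeg is needed for MP3 decoding.
# The model identifier below is a placeholder for the fine-tuned ATC checkpoint.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="path/to/whisper-medium.en-atc-finetuned",  # placeholder checkpoint
    chunk_length_s=30,  # Whisper operates on 30-second windows
)

result = asr("example_atc_clip.wav")  # WAV or MP3 input
print(result["text"])
```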

This project is a part of a broader research initiative aimed at enhancing Automatic Speech Recognition (ASR) accuracy in high-stakes aviation environments.

## Features

- **Transcription Model**: The app uses a fine-tuned version of the **Whisper medium.en** model.
- **Audio Formats**: Supports **MP3** and **WAV** files containing ATC audio.
- **Transcription Output**: Converts uploaded audio into text and displays it in an easily readable format.
- **Enhanced Accuracy**: The fine-tuned model offers a **Word Error Rate (WER)** of **15.08%**, a significant improvement over the **94.59% WER** of the non-fine-tuned model.

## Performance

- **Fine-tuned Whisper medium.en WER**: 15.08%
- **Non-fine-tuned Whisper medium.en WER**: 94.59%
- **Relative Improvement**: 84.06%
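
Here, relative improvement is the reduction in WER expressed as a fraction of the baseline: (94.59 − 15.08) / 94.59 ≈ 84.06%.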

> While the fine-tuned model provides substantial improvements, please note that transcription accuracy is not guaranteed.

For more details on the fine-tuning process and model performance, see the [blog post](https://jacktol.net/posts/fine-tuning_whisper_on_atc_data), or check out the [project repository](https://github.com/jack-tol/fine-tuning-whisper-on-atc-data).

## How It Works

1. **Upload ATC Audio**: Upload an audio file containing ATC communications in **MP3** or **WAV** format.
2. **View Transcription**: The app will transcribe the audio and display the text on the screen.
3. **Transcribe More Audio**: To transcribe another file, click **New Chat** in the top-right corner of the app.

## Fine-Tuning Process

The Whisper model was fine-tuned on a custom ATC dataset created from publicly available resources, such as:

- The **ATCO2 test subset** (871 audio-transcription pairs).
- The **UWB-ATCC corpus** (11.3k rows in the training set and 2.82k rows in the test set).

After data preprocessing, dynamic data augmentation was applied during fine-tuning to simulate challenging audio conditions. The model was trained for 10 epochs on two A100 GPUs, achieving an average **WER of 15.08%**.
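
The linked blog post and repository cover the full training pipeline. Purely as an illustration of what on-the-fly ("dynamic") augmentation can look like, the sketch below randomly varies gain and adds channel-like noise each time an example is drawn; the transforms and parameter ranges are assumptions for illustration, not the exact augmentations used for this model.

```python
# Illustrative on-the-fly waveform augmentation (assumed transforms, not the project's exact recipe).
import numpy as np

def augment(waveform: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly perturb one audio example each time it is drawn for training."""
    out = waveform.copy()
    # Random gain between -6 dB and +6 dB to mimic varying transmission levels.
    gain_db = rng.uniform(-6.0, 6.0)
    out *= 10.0 ** (gain_db / 20.0)
    # Additive white noise at a random SNR to mimic a noisy radio channel.
    snr_db = rng.uniform(5.0, 30.0)
    signal_power = np.mean(out ** 2) + 1e-12
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    out += rng.normal(0.0, np.sqrt(noise_power), size=out.shape)
    return np.clip(out, -1.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    one_second_tone = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)  # 16 kHz
    print(augment(one_second_tone, rng)[:5])
```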

## Limitations

- **Word Error Rate (WER)**: While WER is a standard evaluation metric, it does not account for subtleties such as meaning or word proximity, which makes the evaluation fairly rigid (see the example after this list).
- **Transcription Accuracy**: In real-world use, minor transcription errors may occur, but they often do not significantly affect the communicated meaning.
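
As a concrete illustration of that rigidity, here is a minimal check using the `jiwer` package (chosen here only for illustration; it is not necessarily this project's evaluation tooling). A hypothesis that preserves the meaning exactly still scores roughly 29% WER because the numerals are written differently:

```python
# pip install jiwer
from jiwer import wer

reference = "cleared to land runway two seven left"
hypothesis = "cleared to land runway 27 left"

# "two seven" vs "27" counts as one substitution plus one deletion,
# so 2 errors over 7 reference words ≈ 0.286 despite identical meaning.
print(f"WER: {wer(reference, hypothesis):.3f}")
```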

## Get in Touch

If you have any questions or suggestions, feel free to contact me at [[email protected]](mailto:[email protected]).

## License

This project is licensed under the [MIT License](LICENSE).