	---
	title: Transcription
	emoji: 👀
	colorFrom: yellow
	colorTo: pink
	sdk: gradio
	sdk_version: 5.15.0
	app_file: app.py
	pinned: false
	short_description: This tool is intended to help transcribe interviews.
	---

	# Audio Transcription App

	A Gradio-based web application for transcribing audio files (MP3 or M4A) using OpenAI's Whisper model. It is aimed at interviews and other long recordings, and offers optional silence removal and audio chunking.

	## Features

	- Multiple Audio File Support: Process multiple MP3 or M4A files simultaneously
	- Silence Removal: Option to remove silence from audio to reduce processing time and improve accuracy
	- Audio Chunking: Split long audio files into manageable chunks for better processing
	- Multiple Language Support: Supports German (de), English (en), French (fr), Spanish (es), and Italian (it)
	- Multiple Whisper Models: Choose from various Whisper model sizes (tiny to large-v3-turbo) based on your needs
	- Detailed Output: Get both full transcriptions and segment-wise transcriptions with timestamps
	- Download Results: All processed files and transcripts are provided in a convenient ZIP file

	## Setup

	1. Clone the repository
	2. Install the required dependencies:
	```bash
	pip install -r requirements.txt
	```
	3. Make sure you have ffmpeg installed on your system
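
Before starting the app, it can help to confirm that ffmpeg is actually reachable. The following is a minimal sketch, not part of the app itself; the helper names `find_ffmpeg` and `ffmpeg_version` are made up for illustration, and it only assumes that the executable is called `ffmpeg` or that you pass its full path.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(candidate: str = "ffmpeg") -> Optional[str]:
    """Return the resolved path of the ffmpeg executable, or None if it is not found."""
    # shutil.which searches PATH for bare names and also accepts explicit paths.
    return shutil.which(candidate)


def ffmpeg_version(path: str) -> str:
    """Return the first line printed by `ffmpeg -version` for the given executable."""
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH; install it or pass its full path")
    else:
        print(ffmpeg_version(path))
```

If the check reports that ffmpeg is missing, install it with your system's package manager or point the FFmpeg Path setting at the executable.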

	## Usage

	1. Run the application:
	```bash
	python app.py
	```
	2. Open the provided local URL in your web browser
	3. Upload your audio file(s)
	4. Configure the settings:
	   - Enable/disable silence removal
	   - Enable/disable audio chunking
	   - Select the Whisper model size
	   - Choose the target language
	5. Click "Process" to start transcription
	6. View the results and download the ZIP file containing all processed files

	## Settings

	### Silence Removal
	- Minimum Silence Length: 100-2000ms (default: 500ms)
	- Silence Threshold: -70 to -30dB (default: -50dB)
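
To illustrate how the two settings interact, here is a rough, standard-library-only sketch of threshold-based silence detection. It is not the app's implementation (which may rely on a dedicated audio library); `frame_dbfs`, `find_silences`, and the 10 ms frame size are assumptions chosen for the example, and the input is assumed to be mono 16-bit PCM samples.

```python
import math


def frame_dbfs(frame, full_scale=32768):
    """RMS level of a frame of 16-bit PCM samples, in dB relative to full scale."""
    if not frame:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / full_scale)


def find_silences(samples, sample_rate, min_silence_ms=500, threshold_db=-50, frame_ms=10):
    """Return (start_ms, end_ms) spans quieter than threshold_db for at least min_silence_ms.

    Times are frame-aligned, so they are approximate when the sample rate
    does not divide evenly into frame_ms frames.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    n_frames = math.ceil(len(samples) / frame_len)
    spans = []
    run_start = None  # start time of the current quiet run, if any
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        quiet = frame_dbfs(frame) < threshold_db
        t = i * frame_ms
        if quiet and run_start is None:
            run_start = t
        elif not quiet and run_start is not None:
            # A quiet run just ended; keep it only if it lasted long enough.
            if t - run_start >= min_silence_ms:
                spans.append((run_start, t))
            run_start = None
    end = n_frames * frame_ms
    if run_start is not None and end - run_start >= min_silence_ms:
        spans.append((run_start, end))
    return spans
```

A lower (more negative) threshold treats only very quiet passages as silence, while a longer minimum length keeps short pauses between words intact.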

	### Chunking
	- Chunk Duration: 60-3600 seconds (default: 600 seconds/10 minutes)
	- FFmpeg Path: Path to ffmpeg executable (default: "ffmpeg")
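
As an illustration of what fixed-duration chunking involves, the sketch below computes chunk boundaries and builds an ffmpeg command using the segment muxer. This is one plausible approach, not necessarily how the app invokes ffmpeg; the function names and the output pattern are hypothetical.

```python
import math


def chunk_bounds(total_seconds, chunk_seconds=600):
    """Return (start, end) second offsets covering the recording in fixed-size chunks."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    count = math.ceil(total_seconds / chunk_seconds)
    return [
        (i * chunk_seconds, min((i + 1) * chunk_seconds, total_seconds))
        for i in range(count)
    ]


def ffmpeg_segment_command(input_path, output_pattern, chunk_seconds=600, ffmpeg_path="ffmpeg"):
    """Build an ffmpeg command that splits input_path into chunks without re-encoding."""
    return [
        ffmpeg_path,
        "-i", input_path,
        "-f", "segment",                      # use ffmpeg's segment muxer
        "-segment_time", str(chunk_seconds),  # target chunk length in seconds
        "-c", "copy",                         # copy the stream instead of re-encoding
        output_pattern,                       # e.g. "chunk_%03d.mp3" (hypothetical name)
    ]
```

The command list can be passed to `subprocess.run(..., check=True)`. Because the stream is copied rather than re-encoded, cut points fall on packet boundaries, so actual chunk lengths are approximate.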

	### Transcription
	- Model Size: Choose from tiny, base, small, medium, large, large-v2, large-v3, turbo, or large-v3-turbo
	- Language: German (de), English (en), French (fr), Spanish (es), Italian (it)
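
For orientation, these two settings map fairly directly onto the `openai-whisper` Python package. The sketch below assumes that package is what the app uses under the hood; `check_settings`, `transcribe`, and the default values `"base"` and `"en"` are illustrative choices, not taken from `app.py`.

```python
# Values copied from the settings listed above.
MODEL_SIZES = {
    "tiny", "base", "small", "medium", "large",
    "large-v2", "large-v3", "turbo", "large-v3-turbo",
}
SUPPORTED_LANGUAGES = {"de", "en", "fr", "es", "it"}


def check_settings(model_size, language):
    """Raise ValueError if the model size or language code is not one listed above."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {model_size!r}")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")


def transcribe(audio_path, model_size="base", language="en"):
    """Transcribe one audio file and return Whisper's result dictionary."""
    check_settings(model_size, language)
    # Imported lazily so the validation helper works without the package installed.
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(model_size)
    # The result contains the full "text" and a list of timestamped "segments".
    return model.transcribe(audio_path, language=language)
```

Larger models are generally more accurate but slower and need more memory, so it is reasonable to start with a small model and move up only if the transcript quality is insufficient.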

	## Output

	- Full Transcription: Complete text of the audio file
	- Segmented Transcription: Text segments with timestamps
	- ZIP File: An archive containing:
	  - Processed audio files
	  - Individual transcript files
	  - Combined transcript file
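
The outputs listed above could be produced along the following lines. This is a hedged sketch only: it assumes segments shaped like Whisper's (`start`, `end`, `text`), and the timestamp format, the function names, and the `combined_transcript.txt` file name are invented for the example. For brevity it packs transcripts only, not the processed audio.

```python
import io
import zipfile


def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments):
    """Render segments as one '[start - end] text' line each."""
    return "\n".join(
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    )


def build_zip(transcripts):
    """Pack {filename: text} transcripts plus a combined transcript into ZIP bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in transcripts.items():
            archive.writestr(name, text)
        # Hypothetical layout for the combined file: one labelled block per input.
        combined = "\n\n".join(f"## {name}\n{text}" for name, text in transcripts.items())
        archive.writestr("combined_transcript.txt", combined)
    return buffer.getvalue()
```

Building the archive in memory avoids leaving temporary files behind; the returned bytes can be written to disk or handed to a download component.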

	## Deployment on Hugging Face Spaces

	1. Create a new Space on Hugging Face
	2. Choose "Gradio" as the SDK
	3. Upload the following files:
	   - app.py
	   - requirements.txt
	4. The app will automatically deploy and be available at your Space's URL

	## Requirements

	- Python 3.10+ (required by Gradio 5.x, the SDK version pinned in the Space configuration)
	- ffmpeg
	- See requirements.txt for Python package dependencies

	## License

	This project is open source and available under the MIT License.

	## Acknowledgments

	- [OpenAI Whisper](https://github.com/openai/whisper)
	- [Gradio](https://gradio.app/)
	- [FFmpeg](https://ffmpeg.org/)