---
title: Transcription
emoji: 👀
colorFrom: yellow
colorTo: pink
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: false
short_description: This tool is intended to help transcribe interviews.
---

# Audio Transcription App

A Gradio-based web application for transcribing audio files (MP3 or M4A) using OpenAI's Whisper model. It is designed for interviews and other long recordings, with optional silence removal and audio chunking.

## Features

- **Multiple Audio File Support**: Process multiple MP3 or M4A files simultaneously
- **Silence Removal**: Optionally remove silence from the audio to reduce processing time and improve accuracy
- **Audio Chunking**: Split long audio files into manageable chunks for better processing
- **Multiple Language Support**: Supports German (de), English (en), French (fr), Spanish (es), and Italian (it)
- **Multiple Whisper Models**: Choose from various Whisper model sizes (tiny to large-v3-turbo) based on your needs
- **Detailed Output**: Get both full transcriptions and segment-wise transcriptions with timestamps
- **Download Results**: All processed files and transcripts are provided in a single ZIP file

## Setup

1. Clone the repository
2. Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Make sure ffmpeg is installed on your system

## Usage

1. Run the application:
   ```bash
   python app.py
   ```
2. Open the provided local URL in your web browser
3. Upload your audio file(s)
4. Configure the settings:
   - Enable/disable silence removal
   - Enable/disable audio chunking
   - Select the Whisper model size
   - Choose the target language
5. Click "Process" to start transcription
6. View the results and download the ZIP file containing all processed files

## Settings

### Silence Removal

- **Minimum Silence Length**: 100-2000 ms (default: 500 ms)
- **Silence Threshold**: -70 to -30 dB (default: -50 dB)

### Chunking

- **Chunk Duration**: 60-3600 seconds (default: 600 seconds, i.e. 10 minutes)
- **FFmpeg Path**: Path to the ffmpeg executable (default: "ffmpeg")

### Transcription

- **Model Size**: Choose from tiny, base, small, medium, large, large-v2, large-v3, turbo, or large-v3-turbo
- **Language**: German (de), English (en), French (fr), Spanish (es), Italian (it)

## Output

- **Full Transcription**: Complete text of the audio file
- **Segmented Transcription**: Text segments with timestamps
- **ZIP File**: Contains:
  - Processed audio files
  - Individual transcript files
  - Combined transcript file

## Deployment on Hugging Face Spaces

1. Create a new Space on Hugging Face
2. Choose "Gradio" as the SDK
3. Upload the following files:
   - app.py
   - requirements.txt
4. The app will automatically deploy and be available at your Space's URL

## Requirements

- Python 3.10+ (Gradio 5.x does not support older versions)
- ffmpeg
- See requirements.txt for Python package dependencies

## License

This project is open source and available under the MIT License.

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper)
- [Gradio](https://gradio.app/)
- [FFmpeg](https://ffmpeg.org/)
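
## Appendix: Chunking Sketch

The Chunking settings described earlier (chunk duration of 60-3600 seconds, configurable FFmpeg path) can be illustrated with a minimal sketch. It shows how chunk boundaries might be planned for a recording and how an ffmpeg command using the segment muxer could be assembled. The function names `plan_chunks` and `build_segment_command` and the file names are hypothetical and are not taken from app.py; the app's actual implementation may differ.

```python
from typing import List, Tuple


def plan_chunks(total_seconds: float, chunk_seconds: int = 600) -> List[Tuple[float, float]]:
    """Return (start, end) pairs that cover the recording in fixed-size chunks.

    Hypothetical helper: the 60-3600 range mirrors the documented
    Chunk Duration setting, not the app's real validation logic.
    """
    if not 60 <= chunk_seconds <= 3600:
        raise ValueError("chunk_seconds must be between 60 and 3600")
    total = float(total_seconds)
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while start < total:
        # The last chunk is shortened so it ends with the recording.
        end = min(start + chunk_seconds, total)
        chunks.append((start, end))
        start = end
    return chunks


def build_segment_command(
    input_path: str,
    output_pattern: str,
    chunk_seconds: int = 600,
    ffmpeg_path: str = "ffmpeg",
) -> List[str]:
    """Build an ffmpeg argument list that splits a file with the segment muxer.

    Assumption: stream copy (-c copy) is acceptable, which avoids re-encoding
    but means cut points are only approximately at chunk_seconds.
    """
    return [
        ffmpeg_path,
        "-i", input_path,
        "-f", "segment",
        "-segment_time", str(chunk_seconds),
        "-c", "copy",
        output_pattern,
    ]


if __name__ == "__main__":
    # A 25-minute recording with the default 10-minute chunk duration.
    for start, end in plan_chunks(1500, 600):
        print(f"{start:.0f}s - {end:.0f}s")
    print(" ".join(build_segment_command("interview.mp3", "interview_%03d.mp3")))
```

The argument list can be passed to `subprocess.run(cmd, check=True)` to perform the split. Returning a list rather than a shell string avoids quoting problems with file names that contain spaces.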