Update Readme
README.md
CHANGED
@@ -10,4 +10,89 @@ pinned: false
short_description: This tool is intended to help transcribe interviews.
---

# Audio Transcription App

A Gradio-based web application that transcribes audio files (MP3 or M4A) with OpenAI's Whisper model. It is aimed at interviews and other long recordings, and offers optional silence removal and audio chunking.

## Features

- **Multiple Audio File Support**: Process multiple MP3 or M4A files simultaneously
- **Silence Removal**: Optionally strip silent passages to shorten processing time, which may also improve accuracy
- **Audio Chunking**: Split long audio files into manageable chunks for better processing
- **Multiple Language Support**: Supports German (de), English (en), French (fr), Spanish (es), and Italian (it)
- **Multiple Whisper Models**: Choose from various Whisper model sizes (tiny to large-v3-turbo) based on your needs
- **Detailed Output**: Get both full transcriptions and segment-wise transcriptions with timestamps
- **Download Results**: All processed files and transcripts are provided in a single ZIP file

## Setup

1. Clone the repository
2. Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Make sure you have ffmpeg installed on your system

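Step 3 can be checked from Python before launching the app. The sketch below is not part of `app.py`; the helper names `find_ffmpeg` and `ffmpeg_version` are hypothetical, and it assumes ffmpeg is either on your `PATH` or given as an explicit path. It uses only the standard library.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(ffmpeg_path: str = "ffmpeg") -> Optional[str]:
    """Return the resolved path to the ffmpeg executable, or None if it is not found."""
    # shutil.which searches PATH for bare names and checks explicit paths directly.
    return shutil.which(ffmpeg_path)


def ffmpeg_version(ffmpeg_path: str = "ffmpeg") -> Optional[str]:
    """Return the first line printed by `ffmpeg -version`, or None if ffmpeg is unavailable."""
    resolved = find_ffmpeg(ffmpeg_path)
    if resolved is None:
        return None
    result = subprocess.run(
        [resolved, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None
```

Calling `ffmpeg_version()` returns a version banner when ffmpeg is installed and `None` otherwise, which is a convenient signal to install ffmpeg before running the app.
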
## Usage

1. Run the application:
   ```bash
   python app.py
   ```
2. Open the provided local URL in your web browser
3. Upload your audio file(s)
4. Configure the settings:
   - Enable/disable silence removal
   - Enable/disable audio chunking
   - Select the Whisper model size
   - Choose the target language
5. Click "Process" to start transcription
6. View the results and download the ZIP file containing all processed files

## Settings

### Silence Removal

- **Minimum Silence Length**: 100 to 2000 ms (default: 500 ms)
- **Silence Threshold**: -70 to -30 dB (default: -50 dB)

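To illustrate how the two silence settings interact, here is a minimal pure-Python sketch. It is not the app's implementation, which this README does not show. It assumes samples are floats scaled to [-1.0, 1.0], measures loudness as RMS per 10 ms window, and treats the threshold as dB relative to full scale. The names `window_dbfs`, `remove_silence`, and `window_ms` are hypothetical.

```python
import math
from typing import List, Sequence


def window_dbfs(window: Sequence[float]) -> float:
    """RMS level of a window in dB relative to full scale (samples in [-1.0, 1.0])."""
    if not window:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in window) / len(window))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def remove_silence(
    samples: Sequence[float],
    sample_rate: int,
    min_silence_ms: int = 500,
    silence_thresh_db: float = -50.0,
    window_ms: int = 10,
) -> List[float]:
    """Drop quiet stretches that last at least min_silence_ms; keep shorter pauses."""
    window_len = max(1, sample_rate * window_ms // 1000)
    min_windows = max(1, math.ceil(min_silence_ms / window_ms))
    kept: List[float] = []
    silent_run: List[List[float]] = []  # windows of the current quiet stretch

    def flush_short_pause() -> None:
        # A quiet stretch shorter than the minimum is a natural pause, so keep it.
        if len(silent_run) < min_windows:
            for quiet in silent_run:
                kept.extend(quiet)
        silent_run.clear()

    for start in range(0, len(samples), window_len):
        window = list(samples[start:start + window_len])
        if window_dbfs(window) < silence_thresh_db:
            silent_run.append(window)
        else:
            flush_short_pause()
            kept.extend(window)
    flush_short_pause()
    return kept
```

With the defaults, a 600 ms gap is removed while a 200 ms pause between words survives. Raising the threshold toward -30 dB classifies more low-level audio as silence, and lowering the minimum length removes shorter pauses.
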
### Chunking

- **Chunk Duration**: 60 to 3600 seconds (default: 600 seconds, i.e. 10 minutes)
- **FFmpeg Path**: Path to ffmpeg executable (default: "ffmpeg")

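The chunking settings map naturally onto ffmpeg's segment muxer. The sketch below is an assumption about how such a split could be set up, not the app's actual code: `chunk_boundaries` and `ffmpeg_segment_command` are hypothetical helpers, and the command only gets built here, not executed.

```python
from typing import List, Tuple


def chunk_boundaries(total_seconds: float, chunk_seconds: int = 600) -> List[Tuple[float, float]]:
    """Return (start, end) pairs covering the recording in chunk_seconds steps."""
    if not 60 <= chunk_seconds <= 3600:
        raise ValueError("chunk_seconds must be between 60 and 3600")
    total = float(total_seconds)
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < total:
        end = min(start + chunk_seconds, total)
        bounds.append((start, end))
        start = end
    return bounds


def ffmpeg_segment_command(
    input_path: str,
    output_pattern: str,
    chunk_seconds: int = 600,
    ffmpeg_path: str = "ffmpeg",
) -> List[str]:
    """Build an argument list that splits a file with ffmpeg's segment muxer."""
    return [
        ffmpeg_path,
        "-i", input_path,
        "-f", "segment",                      # write a numbered sequence of output files
        "-segment_time", str(chunk_seconds),  # target duration of each chunk in seconds
        "-c", "copy",                         # copy the stream without re-encoding
        output_pattern,                       # e.g. "chunk_%03d.mp3"
    ]
```

The resulting list can be passed to `subprocess.run`. Because `-c copy` does not re-encode, ffmpeg cuts at packet boundaries, so chunk lengths are approximate.
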
### Transcription

- **Model Size**: Choose from tiny, base, small, medium, large, large-v2, large-v3, turbo, or large-v3-turbo
- **Language**: German (de), English (en), French (fr), Spanish (es), Italian (it)

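The allowed values above can be checked before a model is loaded. This is a minimal sketch with hypothetical names (`SUPPORTED_MODELS`, `SUPPORTED_LANGUAGES`, `validate_transcription_settings`); the lists are copied from this README, not read from the app.

```python
from typing import Tuple

# Values taken from the Settings section of this README.
SUPPORTED_MODELS = (
    "tiny", "base", "small", "medium", "large",
    "large-v2", "large-v3", "turbo", "large-v3-turbo",
)
SUPPORTED_LANGUAGES = {
    "de": "German",
    "en": "English",
    "fr": "French",
    "es": "Spanish",
    "it": "Italian",
}


def validate_transcription_settings(model_size: str, language: str) -> Tuple[str, str]:
    """Normalize and validate the model size and language code, raising on bad input."""
    model = model_size.strip().lower()
    lang = language.strip().lower()
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"Unknown model size: {model_size!r}")
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {language!r}")
    return model, lang
```

Assuming the app uses the `openai-whisper` package, the validated pair would then be passed along the lines of `whisper.load_model(model)` and `model.transcribe(path, language=lang)`.
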
## Output

- **Full Transcription**: Complete text of the audio file
- **Segmented Transcription**: Text segments with timestamps
- **ZIP File**: Contains:
  - Processed audio files
  - Individual transcript files
  - Combined transcript file

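The outputs listed above could be assembled roughly as follows. This is a sketch under stated assumptions, not the app's code: segments are assumed to be dictionaries with `start`, `end`, and `text` keys (the shape `openai-whisper` returns), and the function names and the `combined_transcript.txt` file name are hypothetical. Processed audio files are left out to keep the example short.

```python
import io
import zipfile
from typing import Dict, Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS, truncating fractions."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments: Iterable[Mapping]) -> str:
    """Render one line per segment: [start - end] text."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in segments
    )


def build_zip(transcripts: Dict[str, str]) -> bytes:
    """Bundle individual transcripts plus a combined transcript into an in-memory ZIP."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in transcripts.items():
            archive.writestr(f"{name}.txt", text)
        combined = "\n\n".join(f"{name}\n{text}" for name, text in transcripts.items())
        archive.writestr("combined_transcript.txt", combined)
    return buffer.getvalue()
```

Writing the returned bytes to disk gives a single downloadable archive, one transcript file per input plus the combined file.
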
## Deployment on Hugging Face Spaces

1. Create a new Space on Hugging Face
2. Choose "Gradio" as the SDK
3. Upload the following files:
   - app.py
   - requirements.txt
4. The app will automatically deploy and be available at your Space's URL

## Requirements

- Python 3.7+
- ffmpeg
- See requirements.txt for Python package dependencies

## License

This project is open source and available under the MIT License.

## Acknowledgments

- [OpenAI Whisper](https://github.com/openai/whisper)
- [Gradio](https://gradio.app/)
- [FFmpeg](https://ffmpeg.org/)