doyouknowmarc committed on
Commit 121e197 · verified · 1 Parent(s): a64043c

Update Readme

Files changed (1)
  1. README.md +86 -1
README.md CHANGED
@@ -10,4 +10,89 @@ pinned: false
  short_description: This tool is intended to help transcribe interviews.
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Audio Transcription App
+
+ A Gradio-based web application that transcribes audio files (MP3 or M4A) with OpenAI's Whisper model. It is aimed at interviews and other long recordings, and offers optional silence removal and audio chunking.
+
+ ## Features
+
+ - **Multiple Audio File Support**: Upload and process several MP3 or M4A files at once
+ - **Silence Removal**: Optionally strip silent passages to reduce processing time, which may also improve accuracy
+ - **Audio Chunking**: Split long audio files into fixed-length chunks (60 to 3600 seconds) before transcription
+ - **Multiple Language Support**: Supports German (de), English (en), French (fr), Spanish (es), and Italian (it)
+ - **Multiple Whisper Models**: Choose a Whisper model size (tiny to large-v3-turbo) to trade speed against accuracy
+ - **Detailed Output**: Get both the full transcription and a segment-by-segment transcription with timestamps
+ - **Download Results**: All processed files and transcripts are bundled into a single ZIP file
+
+ ## Setup
+
+ 1. Clone the repository
+ 2. Install the required dependencies:
+    ```bash
+    pip install -r requirements.txt
+    ```
+ 3. Make sure ffmpeg is installed and on your PATH (or note its location for the FFmpeg Path setting)
+
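A quick way to confirm step 3 before launching the app is to check that the ffmpeg executable can be resolved. The sketch below uses only the Python standard library; the function names are hypothetical and not part of `app.py`.

```python
import shutil
import subprocess
from typing import Optional


def find_ffmpeg(ffmpeg_path: str = "ffmpeg") -> Optional[str]:
    """Resolve ffmpeg_path to an executable, or return None if it is not found."""
    # shutil.which searches PATH for bare names and also accepts explicit paths.
    return shutil.which(ffmpeg_path)


def ffmpeg_version_line(ffmpeg_path: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is missing."""
    resolved = find_ffmpeg(ffmpeg_path)
    if resolved is None:
        return None
    result = subprocess.run(
        [resolved, "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    line = ffmpeg_version_line()
    print(line if line else "ffmpeg not found; install it or set the FFmpeg Path setting")
```

If this reports that ffmpeg is missing, install it with your system's package manager or pass the full path to the executable instead of the default `"ffmpeg"`.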
+ ## Usage
+
+ 1. Run the application:
+    ```bash
+    python app.py
+    ```
+ 2. Open the local URL printed in the terminal in your web browser
+ 3. Upload your audio file(s)
+ 4. Configure the settings:
+    - Enable/disable silence removal
+    - Enable/disable audio chunking
+    - Select the Whisper model size
+    - Choose the target language
+ 5. Click "Process" to start transcription
+ 6. View the results and download the ZIP file containing all processed files
+
+ ## Settings
+
+ ### Silence Removal
+ - **Minimum Silence Length**: 100 to 2000 ms (default: 500 ms)
+ - **Silence Threshold**: -70 to -30 dB (default: -50 dB)
+
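To illustrate how the two silence settings interact, here is a minimal standard-library sketch. It is an assumption about the general technique, not the app's actual implementation: audio is modelled as one loudness value (in dB) per millisecond, and a quiet run is dropped only if it lasts at least the minimum silence length. The function name and the per-millisecond representation are hypothetical.

```python
from typing import List


def remove_silence(
    levels_db: List[float],
    min_silence_len_ms: int = 500,
    silence_thresh_db: float = -50.0,
) -> List[int]:
    """Return the millisecond indices to keep after dropping long silent runs."""
    # Ranges taken from the settings listed above.
    if not 100 <= min_silence_len_ms <= 2000:
        raise ValueError("min_silence_len_ms must be between 100 and 2000")
    if not -70 <= silence_thresh_db <= -30:
        raise ValueError("silence_thresh_db must be between -70 and -30")

    kept: List[int] = []
    run_start = None  # start index of the current quiet run, if any
    for i, level in enumerate(levels_db):
        if level < silence_thresh_db:
            if run_start is None:
                run_start = i
            continue
        if run_start is not None:
            # The quiet run ended; keep it only if it was too short to count as silence.
            if i - run_start < min_silence_len_ms:
                kept.extend(range(run_start, i))
            run_start = None
        kept.append(i)
    # Handle a quiet run that reaches the end of the recording.
    if run_start is not None and len(levels_db) - run_start < min_silence_len_ms:
        kept.extend(range(run_start, len(levels_db)))
    return kept


if __name__ == "__main__":
    # 100 ms speech, 600 ms silence, 100 ms speech, 200 ms short pause, 50 ms speech
    levels = [-20.0] * 100 + [-60.0] * 600 + [-20.0] * 100 + [-60.0] * 200 + [-20.0] * 50
    print(len(remove_silence(levels)), "of", len(levels), "ms kept")
```

A higher (less negative) threshold treats more of the recording as silence, while a longer minimum length preserves short pauses between words.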
+ ### Chunking
+ - **Chunk Duration**: 60 to 3600 seconds (default: 600 seconds, i.e. 10 minutes)
+ - **FFmpeg Path**: Path to the ffmpeg executable (default: "ffmpeg")
+
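One common way to split audio with ffmpeg is its segment muxer. The sketch below builds such a command from the two chunking settings; whether `app.py` invokes ffmpeg this way is an assumption, and the function names and output pattern are hypothetical.

```python
import math
from typing import List


def build_chunk_command(
    input_path: str,
    output_pattern: str,
    chunk_seconds: int = 600,
    ffmpeg_path: str = "ffmpeg",
) -> List[str]:
    """Build an ffmpeg command that splits input_path into fixed-length chunks."""
    if not 60 <= chunk_seconds <= 3600:
        raise ValueError("chunk_seconds must be between 60 and 3600")
    return [
        ffmpeg_path,
        "-i", input_path,
        "-f", "segment",                      # use ffmpeg's segment muxer
        "-segment_time", str(chunk_seconds),  # target length of each chunk
        "-c", "copy",                         # copy the stream without re-encoding
        output_pattern,                       # e.g. chunk_%03d.mp3
    ]


def expected_chunk_count(duration_seconds: float, chunk_seconds: int = 600) -> int:
    """Number of chunks a recording of the given duration is split into."""
    return max(1, math.ceil(duration_seconds / chunk_seconds))


if __name__ == "__main__":
    print(" ".join(build_chunk_command("interview.mp3", "chunk_%03d.mp3")))
    print(expected_chunk_count(3700), "chunks for a 3700-second recording")
```

The returned list can be passed to `subprocess.run`. Because the stream is copied rather than re-encoded, chunk boundaries fall on packet boundaries and may differ slightly from the requested duration.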
+ ### Transcription
+ - **Model Size**: Choose from tiny, base, small, medium, large, large-v2, large-v3, turbo, or large-v3-turbo
+ - **Language**: German (de), English (en), French (fr), Spanish (es), Italian (it)
+
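For orientation, the following sketch shows how these two settings could map onto the `openai-whisper` Python package. It assumes the app uses that package's `load_model` and `transcribe` calls; the wrapper functions and constants are hypothetical, and `whisper` is imported lazily so the settings check works without it installed.

```python
from typing import Any, Dict

# Values taken from the settings listed above.
MODEL_SIZES = (
    "tiny", "base", "small", "medium", "large",
    "large-v2", "large-v3", "turbo", "large-v3-turbo",
)
LANGUAGES = {"de": "German", "en": "English", "fr": "French", "es": "Spanish", "it": "Italian"}


def validate_settings(model_size: str, language: str) -> None:
    """Raise ValueError if a setting is outside the documented choices."""
    if model_size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {model_size!r}")
    if language not in LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")


def transcribe_file(audio_path: str, model_size: str, language: str) -> Dict[str, Any]:
    """Transcribe one file and return Whisper's result dictionary."""
    validate_settings(model_size, language)
    import whisper  # from the openai-whisper package; assumed to be installed

    model = whisper.load_model(model_size)
    # The result contains the full text under "text" and timed pieces under "segments".
    return model.transcribe(audio_path, language=language)
```

Larger models are generally more accurate but slower and need more memory, so `tiny` or `base` is a reasonable starting point for a quick test run.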
+ ## Output
+
+ - **Full Transcription**: Complete text of the audio file
+ - **Segmented Transcription**: Text segments with timestamps
+ - **ZIP File**: Contains:
+   - Processed audio files
+   - Individual transcript files
+   - Combined transcript file
+
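The outputs listed above could be produced along these lines. This is a sketch under assumptions: segments are taken to be Whisper-style dictionaries with `start`, `end`, and `text` keys, and the timestamp format and file names are hypothetical rather than the ones `app.py` uses. Processed audio files are left out for brevity.

```python
import io
import zipfile
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments: List[Dict[str, Any]]) -> str:
    """Render one line per segment with its start and end timestamps."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])} - {format_timestamp(seg['end'])}] "
        f"{seg['text'].strip()}"
        for seg in segments
    )


def build_zip(transcripts: Dict[str, str]) -> bytes:
    """Bundle individual transcripts plus a combined transcript into a ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, text in transcripts.items():
            archive.writestr(name, text)
        # Hypothetical name for the combined file.
        archive.writestr("combined_transcript.txt", "\n\n".join(transcripts.values()))
    return buffer.getvalue()


if __name__ == "__main__":
    demo = [{"start": 0.0, "end": 4.2, "text": " Hello and welcome."}]
    print(format_segments(demo))
```

Building the archive in memory keeps the sketch free of temporary files; writing the bytes to disk afterwards yields the downloadable ZIP.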
+ ## Deployment on Hugging Face Spaces
+
+ 1. Create a new Space on Hugging Face
+ 2. Choose "Gradio" as the SDK
+ 3. Upload the following files:
+    - app.py
+    - requirements.txt
+ 4. The Space builds and deploys automatically and is then available at your Space's URL
+
+ ## Requirements
+
+ - Python 3.7+
+ - ffmpeg
+ - See requirements.txt for Python package dependencies
+
+ ## License
+
+ This project is open source and available under the MIT License.
+
+ ## Acknowledgments
+
+ - [OpenAI Whisper](https://github.com/openai/whisper)
+ - [Gradio](https://gradio.app/)
+ - [FFmpeg](https://ffmpeg.org/)