soiz1 committed · Commit 6e16af3 · verified · Parent(s): 9aaf513

Update README.md

Files changed (1): README.md (+137 −134)

---
sdk: gradio
---
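
This YAML front matter is the Hugging Face Spaces configuration header; `sdk: gradio` tells Spaces to serve the repository as a Gradio app. A fuller header could also pin fields such as `sdk_version` or `app_file` (both are standard Spaces config keys, but whether this Space needs them is an assumption), e.g.:

```yaml
---
sdk: gradio          # run the Space with the Gradio SDK
sdk_version: 4.44.0  # hypothetical pin; omit to use the Spaces default
app_file: app.py     # entry point, matching the app.py run by the start scripts
---
```
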
# Whisper-WebUI
A Gradio-based browser interface for [Whisper](https://github.com/openai/whisper). You can use it as an easy subtitle generator!

![screen](https://github.com/user-attachments/assets/caea3afd-a73c-40af-a347-8d57914b1d0f)

## Notebook
If you wish to try this on Colab, you can do so [here](https://colab.research.google.com/github/jhj0517/Whisper-WebUI/blob/master/notebook/whisper-webui.ipynb)!

# Features
- Select the Whisper implementation you want to use from:
  - [openai/whisper](https://github.com/openai/whisper)
  - [SYSTRAN/faster-whisper](https://github.com/SYSTRAN/faster-whisper) (used by default)
  - [Vaibhavs10/insanely-fast-whisper](https://github.com/Vaibhavs10/insanely-fast-whisper)
- Generate subtitles from various sources, including:
  - Files
  - YouTube
  - Microphone
- Currently supported subtitle formats:
  - SRT
  - WebVTT
  - txt (plain text, without timestamps)
- Speech-to-text translation
  - From other languages to English (this is Whisper's end-to-end speech-to-text translation feature)
- Text-to-text translation
  - Translate subtitle files using Facebook NLLB models
  - Translate subtitle files using the DeepL API
- Pre-process audio input with [Silero VAD](https://github.com/snakers4/silero-vad).
- Pre-process audio input to separate BGM with [UVR](https://github.com/Anjok07/ultimatevocalremovergui).
- Post-process with speaker diarization using the [pyannote](https://huggingface.co/pyannote/speaker-diarization-3.1) model.
  - To download the pyannote model, you need a Hugging Face token and must manually accept the terms on the pages below (see the token setup sketch after this list).
    1. https://huggingface.co/pyannote/speaker-diarization-3.1
    2. https://huggingface.co/pyannote/segmentation-3.0

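As a minimal sketch of that token setup, assuming you use the `huggingface-cli` tool from the `huggingface_hub` package (how the WebUI itself consumes the token may differ; this only stores it locally):

```sh
# First accept the terms on both pyannote model pages linked above, then:
pip install huggingface_hub
huggingface-cli login   # paste your Hugging Face access token when prompted
```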

### Pipeline Diagram
![Transcription Pipeline](https://github.com/user-attachments/assets/1d8c63ac-72a4-4a0b-9db0-e03695dcf088)

# Installation and Running

- ## Running with Pinokio

The app can be run with [Pinokio](https://github.com/pinokiocomputer/pinokio).

1. Install the [Pinokio software](https://program.pinokio.computer/#/?id=install).
2. Open it, search for Whisper-WebUI, and install it.
3. Start Whisper-WebUI and connect to `http://localhost:7860`.

- ## Running with Docker

1. Install and launch [Docker Desktop](https://www.docker.com/products/docker-desktop/).

2. Clone the repository:

```sh
git clone https://github.com/jhj0517/Whisper-WebUI.git
```

3. Build the image (the image is about 7 GB):

```sh
docker compose build
```

4. Run the container:

```sh
docker compose up
```

5. Connect to the WebUI with your browser at `http://localhost:7860`.

If needed, update [`docker-compose.yaml`](https://github.com/jhj0517/Whisper-WebUI/blob/master/docker-compose.yaml) to match your environment.
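
A few standard `docker compose` commands that may be useful here (these are generic Docker Compose usage, not project-specific flags):

```sh
docker compose up -d      # run the container in the background
docker compose logs -f    # follow the container logs
docker compose down       # stop and remove the container when finished
```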

- ## Run Locally

### Prerequisites
To run this WebUI, you need `git`, Python (`3.10 <= python <= 3.12`), and `FFmpeg`. <br>
If you're not using an NVIDIA GPU, or are using a `CUDA` version other than 12.4, edit [`requirements.txt`](https://github.com/jhj0517/Whisper-WebUI/blob/master/requirements.txt) to match your environment (see the sketch after this list).

Please follow the links below to install the necessary software:
- git: [https://git-scm.com/downloads](https://git-scm.com/downloads)
- python: [https://www.python.org/downloads/](https://www.python.org/downloads/) **`3.10 ~ 3.12` is recommended.**
- FFmpeg: [https://ffmpeg.org/download.html](https://ffmpeg.org/download.html)
- CUDA: [https://developer.nvidia.com/cuda-downloads](https://developer.nvidia.com/cuda-downloads)

After installing FFmpeg, **make sure to add the `FFmpeg/bin` folder to your system PATH!**
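
As a sketch of the kind of `requirements.txt` edit this means, assuming the file pins GPU builds of PyTorch (the actual package lines in the repository may differ), pip's `--extra-index-url` against PyTorch's wheel index is the usual way to select a CUDA or CPU-only build:

```sh
# Hypothetical requirements.txt lines for a CUDA 12.1 environment:
#   --extra-index-url https://download.pytorch.org/whl/cu121
#   torch
# For a CPU-only machine, point at the CPU wheel index instead:
#   --extra-index-url https://download.pytorch.org/whl/cpu
#   torch
# After editing, reinstall the dependencies:
pip install -r requirements.txt
```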

### Installation Using the Script Files

1. Clone this repository:
```shell
git clone https://github.com/jhj0517/Whisper-WebUI.git
```
2. Run `install.bat` or `install.sh` to install dependencies. (It will create a `venv` directory and install dependencies there.)
3. Start the WebUI with `start-webui.bat` or `start-webui.sh`. (It will run `python app.py` after activating the venv.)

You can also run the project with command-line arguments; see the [wiki](https://github.com/jhj0517/Whisper-WebUI/wiki/Command-Line-Arguments) for a guide to the available arguments, and the sketch below for an example.
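
For instance, a minimal sketch of what the start script does, per the note above, with one argument added (`--whisper_type` is described in the next section; the value shown is an assumption, so check the wiki for accepted values):

```sh
source venv/bin/activate   # on Windows: venv\Scripts\activate
python app.py --whisper_type faster-whisper
```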

# VRAM Usage
This project is integrated with [faster-whisper](https://github.com/guillaumekln/faster-whisper) by default for better VRAM usage and transcription speed.

According to faster-whisper, the efficiency of the optimized Whisper model is as follows:

| Implementation    | Precision | Beam size | Time  | Max. GPU memory | Max. CPU memory |
|-------------------|-----------|-----------|-------|-----------------|-----------------|
| openai/whisper    | fp16      | 5         | 4m30s | 11325 MB        | 9439 MB         |
| faster-whisper    | fp16      | 5         | 54s   | 4755 MB         | 3244 MB         |

If you want to use an implementation other than faster-whisper, pass the `--whisper_type` argument with the repository name.<br>
Read the [wiki](https://github.com/jhj0517/Whisper-WebUI/wiki/Command-Line-Arguments) for more info about CLI args.
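
For example, a hypothetical invocation switching to the openai/whisper implementation (the exact accepted string is an assumption; the wiki above documents the real values):

```sh
python app.py --whisper_type whisper
```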

If you want to use a fine-tuned model, manually place the model files in `models/Whisper/`, under the subdirectory corresponding to the implementation.
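
A hedged sketch of that layout (the per-implementation subdirectory names here are illustrative assumptions; check the directories the app creates on first run):

```sh
# models/
# └── Whisper/
#     └── faster-whisper/            # hypothetical per-implementation folder
#         └── my-finetuned-model/    # place the fine-tuned model files here
ls models/Whisper/
```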

Alternatively, if you enter a Hugging Face repo ID (e.g., [deepdml/faster-whisper-large-v3-turbo-ct2](https://huggingface.co/deepdml/faster-whisper-large-v3-turbo-ct2)) in the "Model" dropdown, it will be downloaded into that directory automatically.

![image](https://github.com/user-attachments/assets/76487a46-b0a5-4154-b735-ded73b2d83d4)

# REST API
If you're interested in deploying this app as a REST API, please check out [/backend](https://github.com/jhj0517/Whisper-WebUI/tree/master/backend).

## TODO🗓

- [x] Add DeepL API translation
- [x] Add NLLB model translation
- [x] Integrate with faster-whisper
- [x] Integrate with insanely-fast-whisper
- [x] Integrate with whisperX (only the speaker diarization part)
- [x] Add background music separation pre-processing with [UVR](https://github.com/Anjok07/ultimatevocalremovergui)
- [x] Add FastAPI script
- [ ] Add CLI usage
- [ ] Support real-time transcription for microphone

### Translation 🌐
Any PRs that add translations for your language to [translation.yaml](https://github.com/jhj0517/Whisper-WebUI/blob/master/configs/translation.yaml) would be greatly appreciated!