title: Style-Bert-VITS2
app_file: app.py
sdk: gradio
sdk_version: 5.16.0
Style-Bert-VITS2
Before using this software, be sure to read the Requests and the Terms of Use for the Default Models.
Bert-VITS2 with more controllable voice styles.
https://github.com/litagin02/Style-Bert-VITS2/assets/139731664/e853f9a2-db4a-4202-a1dd-56ded3c562a0
You can install via pip install style-bert-vits2 (inference only); see library.ipynb for example usage.
Tutorial video (in Japanese): YouTube, Niconico
Release page, changelog
- 2024-09-09: Ver 2.6.1: bug fixes only (for example, training not working properly on Google Colab)
- 2024-06-16: Ver 2.6.0 (added differential merge, weighted merge, and null-model merge; see the linked article for use cases)
- 2024-06-14: Ver 2.5.1 (only changed the terms of use into requests)
- 2024-06-02: Ver 2.5.0 (added terms of use, style generation from folder layout, the Koharune Ami and Amitaro models, faster installation, etc.)
- 2024-03-16: ver 2.4.1 (changed the installation method that uses bat files)
- 2024-03-15: ver 2.4.0 (large-scale refactoring, various improvements, packaging as a library)
- 2024-02-26: ver 2.3 (dictionary feature and editor feature)
- 2024-02-09: ver 2.2
- 2024-02-07: ver 2.1
- 2024-02-03: ver 2.0 (JP-Extra)
- 2024-01-09: ver 1.3
- 2023-12-31: ver 1.2
- 2023-12-29: ver 1.1
- 2023-12-27: ver 1.0
This repository is based on Bert-VITS2 v2.1 and Japanese-Extra, so many thanks to the original author!
Overview
- Based on Bert-VITS2 v2.1 and Japanese-Extra, which generate emotionally expressive speech from the content of the input text, this project lets you freely control emotion and speaking style, including their intensity.
- Even without Git or Python, Windows users can install it easily and also train models (much of this was borrowed from EasyBertVits2). Training on Google Colab is also supported.
- If you only need speech synthesis, it runs on the CPU without a graphics card.
- If you only need speech synthesis, you can install it as a Python library with pip install style-bert-vits2. See library.ipynb for an example.
- An API server for integration with other tools is also bundled (a PR by @darai0512, thank you).
- Reading "cheerful text cheerfully and sad text sadly" is the original strength of Bert-VITS2, so emotionally expressive speech can be generated even with the default style setting.
Usage
- See the CLI documentation for command-line usage.
- See also the FAQ.
Operating environment
Each UI and the API server have been confirmed to work on Windows Command Prompt, WSL2, and Linux (Ubuntu Desktop). On WSL, take care with path specification, for example by using relative paths. Without an NVIDIA GPU, training is not possible, but speech synthesis and merging are.
Installation
For installing with pip as a Python library, and for usage examples, see library.ipynb.
For those unfamiliar with Git or Python
Windows is assumed.
- Download the zip file to a location whose path contains no Japanese characters or spaces, and extract it.
- If you have a graphics card, double-click Install-Style-Bert-VITS2.bat.
- If you do not have a graphics card, double-click Install-Style-Bert-VITS2-CPU.bat. The CPU version cannot train, but speech synthesis and merging are possible.
- Wait while the required environment is installed automatically.
- When the speech synthesis editor then launches automatically, the installation has succeeded. The default models are already downloaded, so you can start using it right away.
To update, double-click Update-Style-Bert-VITS2.bat.
However, when updating from a version older than 2.4.1 (2024-03-16), you need to delete everything and install again. Sorry for the inconvenience. See CHANGELOG.md for the migration procedure.
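The first installation step asks for a path without Japanese characters or spaces. A minimal sketch of how such a path could be checked before extracting the archive, assuming that "safe" means ASCII-only with no whitespace; the function name is hypothetical and not part of this repository:

```python
def is_safe_install_path(path: str) -> bool:
    """Return True if the path is non-empty, ASCII-only, and has no whitespace.

    Assumption: this is a stricter stand-in for "no Japanese characters
    or spaces"; it rejects every non-ASCII character, not only Japanese.
    """
    return bool(path) and path.isascii() and not any(ch.isspace() for ch in path)
```

For example, a path such as C:\tools\sbv2 passes this check, while C:\My Tools\sbv2 does not because of the space.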
For those who can use Git and Python
uv, a Python virtual environment and package management tool, is faster than pip, so installing with it is recommended. (If you prefer not to use it, regular pip also works.)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
git clone https://github.com/litagin02/Style-Bert-VITS2.git
cd Style-Bert-VITS2
uv venv venv
venv\Scripts\activate
uv pip install "torch<2.4" "torchaudio<2.4" --index-url https://download.pytorch.org/whl/cu118
uv pip install -r requirements.txt
python initialize.py
Do not forget the last step, which downloads the required models and the default TTS models.
Speech synthesis
To launch the speech synthesis editor, double-click Editor.bat or run python server_editor.py --inbrowser (add --device cpu to launch in CPU mode). In the editor you can change settings line by line to build a script, save and load it, edit the dictionary, and so on.
The default models are downloaded at installation, so you can use them without training anything.
The editor itself lives in a separate repository.
For the speech synthesis WebUI from version 2.2 and earlier, double-click App.bat or run python app.py to launch the WebUI. Inference.bat also opens the speech synthesis tab on its own.
The model files required for speech synthesis are laid out as follows (you do not need to place them manually).
model_assets
├── your_model
│   ├── config.json
│   ├── your_model_file1.safetensors
│   ├── your_model_file2.safetensors
│   ├── ...
│   └── style_vectors.npy
└── another_model
    ├── ...
As shown, inference requires config.json, *.safetensors, and style_vectors.npy. When sharing a model, share these three files.
Of these, style_vectors.npy is the file needed to control style; by default, the average style "Neutral" is generated during training.
If you want finer style control using multiple styles, see "Style generation" below (even with only the average style, sufficiently expressive speech is generated if the training data is emotionally rich).
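The three-file requirement above (config.json, at least one *.safetensors file, and style_vectors.npy) can be checked before sharing or loading a model. The following is a minimal sketch using only the standard library; the function name is hypothetical, and it only checks that the files exist, not that their contents are valid:

```python
from pathlib import Path


def missing_model_files(model_dir: Path) -> list[str]:
    """List which of the three required file kinds are absent from model_dir."""
    missing = []
    if not (model_dir / "config.json").is_file():
        missing.append("config.json")
    # Any number of weight files is fine, as long as there is at least one.
    if not any(model_dir.glob("*.safetensors")):
        missing.append("*.safetensors")
    if not (model_dir / "style_vectors.npy").is_file():
        missing.append("style_vectors.npy")
    return missing
```

Calling missing_model_files(Path("model_assets") / "your_model") returns an empty list when the folder is complete.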
Training
- See the CLI documentation for details on training from the command line.
- See the linked guides for details on training on Paperspace and on Colab.
Training requires multiple audio files of roughly 2 to 14 seconds each, together with their transcriptions.
- If you already have split audio files and transcriptions, for example from an existing corpus, you can use them as they are (correcting the transcription file if necessary). See "Training WebUI" below.
- Otherwise, if you only have audio files (of any length), bundled tools can build a dataset from them that is ready for training.
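If you bring your own pre-split clips, it can help to see which ones fall outside the roughly 2 to 14 second range mentioned above. The sketch below uses the standard wave module, so it assumes uncompressed PCM WAV files; treating 2 and 14 seconds as hard bounds is also an assumption, and the function names are hypothetical:

```python
import wave
from pathlib import Path

MIN_SECONDS = 2.0   # lower bound quoted in the text
MAX_SECONDS = 14.0  # upper bound quoted in the text


def wav_duration_seconds(path: Path) -> float:
    """Duration of a PCM WAV file, computed from frame count and sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def split_by_duration(folder: Path) -> tuple[list[Path], list[Path]]:
    """Return (usable, out_of_range) WAV files in folder, each sorted by name."""
    usable, out_of_range = [], []
    for path in sorted(folder.glob("*.wav")):
        if MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS:
            usable.append(path)
        else:
            out_of_range.append(path)
    return usable, out_of_range
```

Clips reported as out of range are candidates for the bundled slicing tool rather than for direct use.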
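Dataset creation
- In the "Dataset creation" tab of the WebUI opened by double-clicking App.bat or running python app.py, you can slice audio files into appropriate lengths and then transcribe them automatically. Double-clicking Dataset.bat also opens that tab on its own.
- After following the instructions, you can go straight on to training in the "Training" tab below.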
Training WebUI
Follow the instructions in the "Training" tab of the WebUI opened by double-clicking App.bat or running python app.py. Double-clicking Train.bat also opens that tab on its own.
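Style generation
- By default, in addition to the default style "Neutral", styles are generated according to the subfolder layout of the training folder.
- This section is for those who want to create styles manually by other means.
- In the "Style creation" tab of the WebUI opened by double-clicking App.bat or running python app.py, you can generate styles from audio files. Double-clicking StyleVectors.bat also opens that tab on its own.
- This is independent of training, so you can do it during training and redo it as many times as you like after training finishes (preprocessing must already be complete).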
API Server
Running python server_fastapi.py in the environment you set up starts the API server.
After startup, check the API specification at /docs.
- By default, input text is limited to 100 characters. This can be changed with server.limit in config.yml.
- By default, CORS is allowed for all domains. Wherever possible, change the value of server.origins in config.yml to restrict it to trusted domains (removing the key disables the CORS settings).
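Because of the default 100-character input limit, a client that wants to synthesize longer text has to send it in pieces. The following is a client-side sketch only: the function name is hypothetical, it assumes the server counts characters the same way Python's len does, and it does not call the API itself (see /docs for the actual endpoints):

```python
import re

DEFAULT_LIMIT = 100  # default character limit stated in the text (server.limit)


def split_for_limit(text: str, limit: int = DEFAULT_LIMIT) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Split after sentence-ending punctuation while keeping the punctuation.
    sentences = [s for s in re.split(r"(?<=[。．.!?！？\n])", text) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the limit.
        pieces = [sentence[i:i + limit] for i in range(0, len(sentence), limit)]
        for piece in pieces:
            if len(current) + len(piece) <= limit:
                current += piece
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Joining the returned chunks reproduces the original text, so the resulting audio segments can be concatenated in order.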
The API server for the speech synthesis editor is started with python server_editor.py, but it is not yet well maintained. At present it implements only the minimum API required by the editor repository.
For deploying the speech synthesis editor on the web, refer to the Dockerfile.
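Merge
You can blend two models along four aspects ("voice quality", "pitch", "emotional expression", and "tempo") to create a new model, or perform operations such as "adding the difference between two other models to a given model".
In the "Merge" tab of the WebUI opened by double-clicking App.bat or running python app.py, you can select two models and merge them. Double-clicking Merge.bat also opens that tab on its own.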
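Conceptually, the two operations described above are a weighted blend of two sets of weights and the addition of a difference between two other sets. The sketch below illustrates the arithmetic in plain Python, assuming each model is a mapping from parameter names to flat lists of floats with matching keys and lengths. The function names and data layout are hypothetical, and it does not show how this repository assigns parameters to the four aspects:

```python
Weights = dict[str, list[float]]


def weighted_merge(a: Weights, b: Weights, ratio: float) -> Weights:
    """Linear interpolation: ratio=0 returns a's values, ratio=1 returns b's."""
    return {
        key: [(1.0 - ratio) * x + ratio * y for x, y in zip(a[key], b[key])]
        for key in a
    }


def add_difference(target: Weights, b: Weights, c: Weights, scale: float = 1.0) -> Weights:
    """Add the difference (b - c) to target, scaled by `scale`."""
    return {
        key: [t + scale * (x - y) for t, x, y in zip(target[key], b[key], c[key])]
        for key in target
    }
```

Blending per aspect would amount to calling weighted_merge with a different ratio for each group of parameters.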
Naturalness evaluation
As "one" indicator of which training step gives the best result, a script that uses SpeechMOS is provided:
python speech_mos.py -m <model_name>
The naturalness score for each step is displayed, and the results are saved to mos_{model_name}.csv and mos_{model_name}.png in the mos_results folder. To change the text that is read aloud, edit the script and adjust it yourself. Note that this evaluation takes no account of accent, emotional expression, or intonation and is only a rough guide, so actually listening to the output and choosing is probably the best approach.
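Once the CSV has been written, the highest-scoring step can be picked out programmatically as a starting point for listening tests. This is a sketch only: the column names "step" and "mos" are assumptions, since the text does not describe the CSV layout, so pass the actual header names if they differ:

```python
import csv
from pathlib import Path


def best_step(csv_path: Path, step_col: str = "step", score_col: str = "mos") -> tuple[str, float]:
    """Return (step, score) for the row with the highest score.

    The default column names are hypothetical; check the CSV header first.
    """
    with csv_path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError(f"no rows in {csv_path}")
    best = max(rows, key=lambda row: float(row[score_col]))
    return best[step_col], float(best[score_col])
```

As the text notes, the score ignores accent and intonation, so the result is a candidate to audition rather than a final choice.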
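Relation to Bert-VITS2
Basically, this project only slightly modifies the model structure of Bert-VITS2. Both the old pretrained model and the JP-Extra pretrained model are essentially the same as those of Bert-VITS2 v2.1 or JP-Extra (with unneeded weights removed and converted to safetensors).
Specifically, it differs in the following points.
- Like EasyBertVits2, it is easy to use even for people who do not know Python or Git.
- The emotion embedding model was changed (to the 256-dimensional wespeaker-voxceleb-resnet34-LM, which is a speaker identification embedding rather than an emotion embedding).
- Vector quantization was removed from the emotion embedding, leaving a plain fully connected layer.
- By creating the style vector file style_vectors.npy, speech can be generated with those styles while continuously specifying the strength of the effect.
- Various WebUIs were created.
- Training in bf16 is supported.
- The safetensors format is supported and used by default.
- Other minor bug fixes and refactoring.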
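One plausible reading of "continuously specifying the strength of the effect" is a linear move from the average ("Neutral") style vector toward a chosen style vector. The sketch below shows that arithmetic with plain lists. It is an assumption for illustration rather than a statement of how this repository computes it, and the function name is hypothetical:

```python
def apply_style_weight(mean: list[float], style: list[float], weight: float) -> list[float]:
    """Move from the mean vector toward (or past) the style vector by `weight`.

    weight=0 gives the mean style, weight=1 gives the chosen style,
    and values above 1 exaggerate the difference.
    """
    if len(mean) != len(style):
        raise ValueError("vectors must have the same length")
    return [m + weight * (s - m) for m, s in zip(mean, style)]
```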
References
In addition to the original reference (written below), I used the following repositories:
The pretrained model and the JP-Extra version are essentially taken from the original base model of Bert-VITS2 v2.1 and the JP-Extra pretrained model of Bert-VITS2, so all credit goes to the original author (Fish Audio):
In addition, the text/user_dict/ module is based on the following repository:
- voicevox_engine (this module is licensed under LGPL v3).
LICENSE
This repository is licensed under the GNU Affero General Public License v3.0, the same as the original Bert-VITS2 repository. For more details, see LICENSE.
In addition, the text/user_dict/ module is licensed under the GNU Lesser General Public License v3.0, inherited from the original VOICEVOX engine repository. For more details, see LGPL_LICENSE.
Below is the original README.md.

Bert-VITS2
VITS2 Backbone with multilingual bert
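For a quick guide, please refer to webui_preprocess.py.
Please note: the core idea of this project comes from anyvoiceai/MassTTS, a very good TTS project.
The MassTTS demo is an AI version of Feng Ge giving a sharp critique of Feng Ge himself, and recovering the kidney he lost in the Golden Triangle.
Seasoned Travelers / Trailblazers / Captains / Doctors / sensei / Witchers / Miaomiaolu / V should consult the code and learn how to train on their own.
Using this project for any purpose that violates the Constitution of the People's Republic of China, the Criminal Law of the People's Republic of China, the Public Security Administration Punishments Law of the People's Republic of China, or the Civil Code of the People's Republic of China is strictly prohibited.
Use for any politics-related purpose is strictly prohibited.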
Video:https://www.bilibili.com/video/BV1hp4y1K78E
Demo:https://www.bilibili.com/video/BV1TF411k78w
QQ Group: 815818430
References
- anyvoiceai/MassTTS
- jaywalnut310/vits
- p0p4k/vits2_pytorch
- svc-develop-team/so-vits-svc
- PaddlePaddle/PaddleSpeech
- emotional-vits
- fish-speech
- Bert-VITS2-UI