Model Card for Arabic StyleTTS2

This is an Arabic text-to-speech model based on the StyleTTS2 architecture, adapted specifically for Arabic speech synthesis. The model achieves good-quality Arabic speech, though not yet state-of-the-art, and further experimentation is needed to optimize performance for Arabic. All training objectives from the original StyleTTS2 were kept, except for the WavLM objectives, which were removed because WavLM was pretrained primarily on English speech.

Example

Here is an example output from the model:

Sample 1 (audio sample embedded on the model page)

Efficiency and Performance

A key strength of this model lies in its efficiency and performance characteristics:

  • Compact Architecture: Achieves impressive quality with <100M parameters
  • Limited Training Data: Trained on only 22 hours of single-speaker audio
  • Transfer Learning: Successfully fine-tuned from the multi-speaker LibriTTS model to a single-speaker Arabic model
  • Resource Efficient: Good quality achieved despite limited computational resources

Note: According to the StyleTTS2 authors, performance should improve further when training a single-speaker model from scratch rather than fine-tuning. This wasn't attempted in our case due to computational resource constraints, suggesting potential for even better results with more extensive training.

Model Details

Model Description

This model is a modified version of StyleTTS2, specifically adapted for Arabic text-to-speech synthesis. It incorporates a custom-trained PL-BERT model for Arabic language understanding and removes the WavLM adversarial training component (which was primarily designed for English).

  • Developed by: Fadi (GitHub: Fadi987)
  • Model type: Text-to-Speech (StyleTTS2 architecture)
  • Language(s): Arabic
  • Finetuned from model: yl4579/StyleTTS2-LibriTTS

Uses

Direct Use

The model can be used for generating Arabic speech from text. To use the model:

  1. Clone the StyleTTS2 repository:

     ```bash
     git clone https://github.com/Fadi987/StyleTTS2
     cd StyleTTS2
     ```

  2. Install espeak-ng as the phonemization backend:

     ```bash
     # For macOS
     brew install espeak-ng

     # For Ubuntu/Debian
     sudo apt-get install espeak-ng

     # For Windows: download and install espeak-ng from
     # https://github.com/espeak-ng/espeak-ng/releases
     ```

  3. Install Python dependencies:

     ```bash
     pip install -r requirements.txt
     ```

  4. Download the model.pth and config.yml files from this repository.

  5. Run inference:

     ```bash
     python inference.py --config config.yml --model model.pth --text "ุงู„ุฅูุชู’ู‚ูŽุงู†ู ูŠูŽุญู’ุชูŽุงุฌู ุฅูู„ูŽู‰ ุงู„ู’ุนูŽู…ูŽู„ู ูˆูŽุงู„ู’ู…ูุซูŽุงุจูŽุฑูŽุฉ"
     ```

Make sure to use properly diacritized Arabic text for best results.

Out-of-Scope Use

The model is specifically designed for Arabic text-to-speech synthesis and may not perform well for:

  • Other languages
  • Heavy dialect variations
  • Non-diacritized Arabic text

Training Details

Training Data

  • Training was performed on approximately 22 hours of Arabic audiobook data
  • Dataset: fadi77/arabic-audiobook-dataset-24khz
  • The PL-BERT component was trained on fully diacritized Wikipedia Arabic text

Training Hyperparameters

  • Number of epochs: 20
  • Diffusion training: Started from epoch 5
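For reference, these two settings would correspond to fields along the following lines in a StyleTTS2-style config.yml. The key names here are assumptions and should be checked against the config.yml shipped with this model:

```yaml
# Hypothetical sketch of the relevant fields; verify against the
# actual config.yml distributed in this repository.
epochs: 20        # total fine-tuning epochs
diff_epoch: 5     # epoch at which diffusion training starts
```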

Objectives

  • Training objectives: All original StyleTTS2 objectives maintained, except WavLM adversarial training
  • Validation objectives: Identical to original StyleTTS2 validation process

Compute Infrastructure

  • Hardware Type: NVIDIA H100 GPU

Notable Modifications from Original StyleTTS2 in Architecture and Objectives

The architecture of the model follows that of StyleTTS2 with the following exceptions:

  • Removed WavLM adversarial training component
  • Custom PL-BERT trained for Arabic language
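As a minimal illustration of the first change (this is not the repository's actual training code, and the loss-term names are placeholders), removing the WavLM/SLM adversarial component amounts to excluding that term when the training losses are combined:

```python
# Illustrative sketch: combining StyleTTS2-style loss terms with the
# SLM (WavLM) adversarial term disabled. Term names are hypothetical.

def total_loss(losses: dict, use_slm_adversarial: bool = False) -> float:
    """Sum the loss terms; skip the SLM adversarial term, since WavLM
    was pretrained mostly on English and was dropped for Arabic."""
    total = (
        losses["mel"]          # mel-spectrogram reconstruction
        + losses["duration"]   # duration prediction
        + losses["f0"]         # pitch (F0) prediction
        + losses["diffusion"]  # style diffusion objective
    )
    if use_slm_adversarial:    # disabled in this Arabic adaptation
        total += losses["slm_adv"]
    return total

example = {"mel": 1.0, "duration": 0.2, "f0": 0.1,
           "diffusion": 0.3, "slm_adv": 0.4}
print(total_loss(example))  # combined loss without the SLM adversarial term
```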

Citation

BibTeX:

@article{styletts2,
  title={StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models},
  author={Li, Yinghao Aaron and Han, Cong and Raghavan, Vinay S. and Mischler, Gavin and Mesgarani, Nima},
  journal={arXiv preprint arXiv:2306.07691},
  year={2023}
}

Model Card Contact

  • GitHub: @Fadi987
  • Hugging Face: @fadi77
