xiaozhongabc/my-speecht5-tts

xiaozhongabc committed on Aug 19, 2024

Commit 40c713a (verified) · 1 Parent(s): 11c8943

Create README.md

Files changed (1)

README.md +40 -3

@@ -1,3 +1,40 @@
----
-license: apache-2.0
----

+---
+language:
+- en
+tags:
+- text-to-speech
+- tts
+- speech
+license: mit
+datasets:
+- Matthijs/cmu-arctic-xvectors
+---
+# SpeechT5 TTS
+This is a re-upload of the `microsoft/speecht5_tts` model.
+## Model description
+SpeechT5 is a unified-modal encoder-decoder model for speech and text, developed by Microsoft. This checkpoint is fine-tuned for text-to-speech (TTS).
+## Usage
+```python
+from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
+from datasets import load_dataset
+import torch
+import soundfile as sf
+
+# Replace the placeholder with this repository's ID (for example "xiaozhongabc/my-speecht5-tts").
+processor = SpeechT5Processor.from_pretrained("YOUR_USERNAME/YOUR_REPO_NAME")
+model = SpeechT5ForTextToSpeech.from_pretrained("YOUR_USERNAME/YOUR_REPO_NAME")
+# The HiFi-GAN vocoder turns the predicted spectrogram into a waveform.
+vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
+
+# Tokenize the input text.
+inputs = processor(text="Hello, how are you?", return_tensors="pt")
+
+# Load a speaker x-vector and add a batch dimension with unsqueeze(0).
+embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")
+speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)
+
+# Generate the waveform and write it to disk as 16 kHz audio.
+speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)
+sf.write("speech.wav", speech.numpy(), samplerate=16000)