xiaozhongabc committed · Commit 40c713a · verified · 1 Parent(s): 11c8943

Create README.md

Files changed (1)
  1. README.md +40 -3
README.md CHANGED
@@ -1,3 +1,40 @@
- ---
- license: apache-2.0
- ---
+ ---
+ language:
+ - en
+ tags:
+ - text-to-speech
+ - tts
+ - speech
+ license: mit
+ datasets:
+ - Matthijs/cmu-arctic-xvectors
+ ---
+
+ # SpeechT5 TTS
+
+ This is a re-upload of the `microsoft/speecht5_tts` model.
+
+ ## Model description
+
+ SpeechT5 is a unified-modal encoder-decoder model for speech and text, developed by Microsoft. This checkpoint is fine-tuned for text-to-speech (TTS).
+
+ ## Usage
+
+ ```python
+ from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech, SpeechT5HifiGan
+ from datasets import load_dataset
+ import torch
+ import soundfile as sf
+
+ processor = SpeechT5Processor.from_pretrained("YOUR_USERNAME/YOUR_REPO_NAME")  # placeholder: use this repository's ID
+ model = SpeechT5ForTextToSpeech.from_pretrained("YOUR_USERNAME/YOUR_REPO_NAME")  # same placeholder as above
+ vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")  # converts spectrograms to waveforms
+
+ inputs = processor(text="Hello, how are you?", return_tensors="pt")  # tokenize the input text
+
+ embeddings_dataset = load_dataset("Matthijs/cmu-arctic-xvectors", split="validation")  # x-vector speaker embeddings
+ speaker_embeddings = torch.tensor(embeddings_dataset[7306]["xvector"]).unsqueeze(0)  # shape (1, 512)
+
+ speech = model.generate_speech(inputs["input_ids"], speaker_embeddings, vocoder=vocoder)  # 1-D waveform tensor
+
+ sf.write("speech.wav", speech.numpy(), samplerate=16000)  # SpeechT5 produces 16 kHz audio
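
A possible variation on the last step of the usage snippet, for environments where `soundfile` is not installed: Python's standard-library `wave` module can write the same 16 kHz mono audio as 16-bit PCM. This is a minimal sketch and not part of the committed README. The helper name `write_wav_16k` is hypothetical, and the code assumes the samples are floats roughly in the range [-1.0, 1.0] (values outside it are clipped).

```python
import struct
import wave


def write_wav_16k(path, samples, sample_rate=16000):
    """Write mono float samples (assumed in [-1.0, 1.0]) as a 16-bit PCM WAV file.

    Hypothetical helper for illustration; not part of transformers or soundfile.
    Returns the number of frames written.
    """
    # Clip each sample to [-1.0, 1.0], then scale to a signed 16-bit integer.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)  # 16000 Hz, as in the snippet above
        # Pack all samples as little-endian signed shorts.
        wav_file.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return len(pcm)
```

Assuming `speech` is the 1-D tensor returned by `generate_speech`, the call would look like `write_wav_16k("speech.wav", speech.tolist())`.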