File size: 1,693 Bytes
fd1adc1
 
 
 
 
 
 
 
 
 
 
12d303c
 
 
 
 
 
fd1adc1
 
4a58eca
 
12d303c
4a58eca
12d303c
4a58eca
 
 
 
 
12d303c
4a58eca
 
12d303c
4a58eca
 
 
 
 
12d303c
 
 
4a58eca
12d303c
4a58eca
 
 
12d303c
4a58eca
12d303c
4a58eca
12d303c
4a58eca
12d303c
f31bed1
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
---
title: Tortoise TTS API
emoji: 🦀
colorFrom: yellow
colorTo: purple
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Text-to-speech using Gradio, FastAPI, and TorToise TTS
tags:
  - tortoise-tts
  - text-to-speech
  - voice-cloning
  - gradio
  - fastapi
---

# Voice Chat Assistant
A conversational voice assistant powered by AI that responds to your spoken queries with natural-sounding speech.

## Features

- Speech Recognition: Uses OpenAI's Whisper model to accurately transcribe your voice
- Natural Language Understanding: Leverages Cohere's LLM API for intelligent responses
- Text-to-Speech: Generates natural speech using Tortoise-TTS
- Reply on Pause: Automatically responds when you finish speaking
- Conversation History: Maintains context throughout your dialogue

## Demo
Speak into your microphone and the assistant will respond with voice!

## How It Works
- Your voice is transcribed to text using Whisper
- The text is processed by Cohere's LLM to generate a response
- The response is converted to speech using Tortoise-TTS
- The conversation continues with full context retention

## Technical Details

This project utilizes:

- Zero-GPU: Efficient GPU memory usage with Hugging Face's Zero-GPU technology
- FastRTC: Real-time communication for seamless voice interaction
- Gradio: Simple and intuitive user interface

## Setup

To run this locally, you'll need a Cohere API key and Python 3.8+.

## Acknowledgements

- OpenAI for the Whisper speech recognition model
- Cohere for the language model API
- Tortoise-TTS for the text-to-speech capabilities
- Hugging Face for the Spaces and Zero-GPU infrastructure