---
title: Expressive TTS Arena
emoji: 🎤
colorFrom: indigo
colorTo: pink
sdk: docker
app_file: src/main.py
python_version: "3.11"
pinned: true
license: mit
---
<div align="center">
<img src="https://storage.googleapis.com/hume-public-logos/hume/hume-banner.png">
<h1>Expressive TTS Arena</h1>
<p>
<strong>
A web application for comparing and evaluating the expressiveness of different text-to-speech models
</strong>
</p>
</div>
## Overview
Expressive TTS Arena is an open-source web application for evaluating the expressiveness of voice generation and speech synthesis from different text-to-speech providers.
For support or to join the conversation, visit our [Discord](https://discord.com/invite/humeai).
## Prerequisites
- [Python >=3.11.11](https://www.python.org/downloads/)
- [pip >=25.0](https://pypi.org/project/pip/)
- [uv >=0.5.29](https://github.com/astral-sh/uv)
- [Postgres](https://www.postgresql.org/download/)
- API keys for Hume AI, Anthropic, OpenAI, and ElevenLabs
## Project Structure
```
Expressive TTS Arena/
├── public/
├── src/
│   ├── common/
│   │   ├── __init__.py
│   │   ├── common_types.py         # Application-wide custom type aliases and definitions.
│   │   ├── config.py               # Manages application config (Singleton) loaded from env vars.
│   │   ├── constants.py            # Application-wide constant values.
│   │   └── utils.py                # General-purpose utility functions used across modules.
│   ├── core/
│   │   ├── __init__.py
│   │   ├── tts_service.py          # Service handling Text-to-Speech provider selection and API calls.
│   │   └── voting_service.py       # Service managing database operations for votes and leaderboards.
│   ├── database/                   # Database access layer using SQLAlchemy.
│   │   ├── __init__.py
│   │   ├── crud.py                 # Data Access Objects (DAO) / CRUD operations for database models.
│   │   ├── database.py             # Database connection setup (engine, session management).
│   │   └── models.py               # SQLAlchemy ORM models defining database tables.
│   ├── frontend/
│   │   ├── components/
│   │   │   ├── __init__.py
│   │   │   ├── arena.py            # UI definition and logic for the 'Arena' tab.
│   │   │   └── leaderboard.py      # UI definition and logic for the 'Leaderboard' tab.
│   │   ├── __init__.py
│   │   └── frontend.py             # Main Gradio application class; orchestrates UI components and layout.
│   ├── integrations/               # Modules for interacting with external third-party APIs.
│   │   ├── __init__.py
│   │   ├── anthropic_api.py        # Integration logic for the Anthropic API.
│   │   ├── elevenlabs_api.py       # Integration logic for the ElevenLabs API.
│   │   └── hume_api.py             # Integration logic for the Hume API.
│   ├── middleware/
│   │   ├── __init__.py
│   │   └── meta_tag_injection.py   # Middleware for injecting custom HTML meta tags into the Gradio page.
│   ├── scripts/
│   │   ├── __init__.py
│   │   ├── init_db.py              # Script to create database tables based on models.
│   │   └── test_db.py              # Script for testing the database connection configuration.
│   ├── __init__.py
│   └── main.py                     # Main script to configure and run the Gradio application.
├── static/
│   ├── audio/                      # Temporary storage for generated audio files served to the UI.
│   └── css/
│       └── styles.css              # Custom CSS overrides and styling for the Gradio UI.
├── .dockerignore
├── .env.example
├── .gitignore
├── .pre-commit-config.yaml
├── Dockerfile
├── LICENSE.txt
├── pyproject.toml
├── README.md
└── uv.lock
```
## Installation
1. This project uses the [uv](https://docs.astral.sh/uv/) package manager. Follow the installation instructions for your platform [here](https://docs.astral.sh/uv/getting-started/installation/).
2. Configure environment variables:
- Create a `.env` file based on `.env.example`
- Add your API keys:
```txt
HUME_API_KEY=YOUR_HUME_API_KEY
ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
OPENAI_API_KEY=YOUR_OPENAI_API_KEY
```
3. Run the application:
Standard
```sh
uv run python -m src.main
```
With hot-reloading
```sh
uv run watchfiles "python -m src.main" src
```
4. Test the application by navigating to the localhost URL in your browser (e.g. `localhost:7860` or `http://127.0.0.1:7860`).
5. (Optional) If contributing, install the pre-commit hook for automatic linting, formatting, and type-checking:
```sh
uv run pre-commit install
```
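
Before starting the app, it can help to confirm that the four API keys from step 2 are actually visible to the process. The following is a minimal sketch, not part of the project: `missing_api_keys` is a hypothetical helper, and it assumes the variables are already exported in the environment (it does not parse `.env` itself).

```python
import os
from collections.abc import Mapping

# Variable names taken from the .env example in step 2.
REQUIRED_KEYS = (
    "HUME_API_KEY",
    "ANTHROPIC_API_KEY",
    "ELEVENLABS_API_KEY",
    "OPENAI_API_KEY",
)


def missing_api_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or blank in `env`.

    Hypothetical helper for illustration; the project's own config handling
    lives in src/common/config.py and may behave differently.
    """
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys: " + ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Note that this only checks for presence: a placeholder such as `YOUR_HUME_API_KEY` would still pass, so a failed API call at runtime remains the real test of a key's validity.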
## User Flow
1. Select a sample character, or enter a custom character description, then click **"Generate Text"** to generate your text input.
2. Click the **"Synthesize Speech"** button to synthesize two TTS outputs based on your text and character description.
3. Listen to both audio samples to compare their expressiveness.
4. Vote for the more expressive result by clicking either **"Select Option A"** or **"Select Option B"**.
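
To illustrate how A/B votes like these could roll up into a leaderboard, here is a rough sketch that computes a per-provider win rate. Everything in it is an assumption for illustration: the `Vote` record, the `win_rates` function, and the placeholder provider names are hypothetical and do not reflect the project's actual database schema or the logic in `voting_service.py`.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """One A/B comparison (hypothetical shape, not the project's schema)."""

    option_a: str  # provider behind "Option A"
    option_b: str  # provider behind "Option B"
    selected: str  # "A" or "B"


def win_rates(votes: list[Vote]) -> dict[str, float]:
    """Return, for each provider, the fraction of its comparisons it won."""
    wins: Counter[str] = Counter()
    appearances: Counter[str] = Counter()
    for vote in votes:
        if vote.selected not in ("A", "B"):
            raise ValueError(f"selected must be 'A' or 'B', got {vote.selected!r}")
        appearances[vote.option_a] += 1
        appearances[vote.option_b] += 1
        winner = vote.option_a if vote.selected == "A" else vote.option_b
        wins[winner] += 1
    return {name: wins[name] / appearances[name] for name in appearances}


# Made-up example votes with placeholder provider names.
example_votes = [
    Vote("provider_x", "provider_y", "A"),
    Vote("provider_y", "provider_x", "A"),
    Vote("provider_x", "provider_z", "B"),
]
print(win_rates(example_votes))
```

A plain win rate is the simplest possible aggregate; it ignores which opponents a provider faced, so a real leaderboard may well use a different ranking method.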
## License
This project is licensed under the MIT License - see the [LICENSE.txt](LICENSE.txt) file for details.