---
title: Expressive TTS Arena
emoji: 🎀
colorFrom: indigo
colorTo: pink
sdk: docker
app_file: src/main.py
python_version: "3.11"
pinned: true
license: mit
---

<div align="center">
    <img src="https://storage.googleapis.com/hume-public-logos/hume/hume-banner.png">
    <h1>Expressive TTS Arena</h1>
    <p>
        <strong> 
            A web application for comparing and evaluating the expressiveness of different text-to-speech models 
        </strong>
    </p>
</div>

## Overview

Expressive TTS Arena is an open-source web application for evaluating the expressiveness of voice generation and speech synthesis from different text-to-speech providers.

For support or to join the conversation, visit our [Discord](https://discord.com/invite/humeai).

## Prerequisites

- [Python >=3.11.11](https://www.python.org/downloads/)
- [pip >=25.0](https://pypi.org/project/pip/)
- [uv >=0.5.29](https://github.com/astral-sh/uv)
- [Postgres](https://www.postgresql.org/download/)
- API keys for Hume AI, Anthropic, OpenAI, and ElevenLabs

## Project Structure

```
Expressive TTS Arena/
β”œβ”€β”€ public/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ common/
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ common_types.py         # Application-wide custom type aliases and definitions.
β”‚   β”‚   β”œβ”€β”€ config.py               # Manages application config (Singleton) loaded from env vars.
β”‚   β”‚   β”œβ”€β”€ constants.py            # Application-wide constant values.
β”‚   β”‚   β”œβ”€β”€ utils.py                # General-purpose utility functions used across modules.
β”‚   β”œβ”€β”€ core/
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ tts_service.py          # Service handling Text-to-Speech provider selection and API calls.
β”‚   β”‚   β”œβ”€β”€ voting_service.py       # Service managing database operations for votes and leaderboards.
β”‚   β”œβ”€β”€ database/                   # Database access layer using SQLAlchemy.
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ crud.py                 # Data Access Objects (DAO) / CRUD operations for database models.
β”‚   β”‚   β”œβ”€β”€ database.py             # Database connection setup (engine, session management).
β”‚   β”‚   └── models.py               # SQLAlchemy ORM models defining database tables.
β”‚   β”œβ”€β”€ frontend/
β”‚   β”‚   β”œβ”€β”€ components/
β”‚   β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”‚   β”œβ”€β”€ arena.py            # UI definition and logic for the 'Arena' tab.
β”‚   β”‚   β”‚   β”œβ”€β”€ leaderboard.py      # UI definition and logic for the 'Leaderboard' tab.
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ frontend.py             # Main Gradio application class; orchestrates UI components and layout.
β”‚   β”œβ”€β”€ integrations/               # Modules for interacting with external third-party APIs.
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ anthropic_api.py        # Integration logic for the Anthropic API.
β”‚   β”‚   β”œβ”€β”€ elevenlabs_api.py       # Integration logic for the ElevenLabs API.
β”‚   β”‚   └── hume_api.py             # Integration logic for the Hume API.
β”‚   β”œβ”€β”€ middleware/
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ meta_tag_injection.py   # Middleware for injecting custom HTML meta tags into the Gradio page.
β”‚   β”œβ”€β”€ scripts/
β”‚   β”‚   β”œβ”€β”€ __init__.py
β”‚   β”‚   β”œβ”€β”€ init_db.py              # Script to create database tables based on models.
β”‚   β”‚   β”œβ”€β”€ test_db.py              # Script for testing the database connection configuration.
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ main.py                     # Main script to configure and run the Gradio application.
β”œβ”€β”€ static/
β”‚   β”œβ”€β”€ audio/                      # Temporary storage for generated audio files served to the UI.
β”‚   β”œβ”€β”€ css/
β”‚   β”‚   β”œβ”€β”€ styles.css              # Custom CSS overrides and styling for the Gradio UI.
β”œβ”€β”€ .dockerignore
β”œβ”€β”€ .env.example
β”œβ”€β”€ .gitignore
β”œβ”€β”€ .pre-commit-config.yaml
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ LICENSE.txt
β”œβ”€β”€ pyproject.toml
β”œβ”€β”€ README.md
β”œβ”€β”€ uv.lock
```
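
As the tree notes, `src/common/config.py` loads application settings once from environment variables (a singleton). The snippet below is only a minimal sketch of that pattern, not the project's actual implementation; the class name, fields, and the `DATABASE_URL` variable are illustrative assumptions.

```python
# Illustrative sketch of an env-backed config singleton; names and fields are
# assumptions, not the actual contents of src/common/config.py.
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Application settings read once from environment variables."""

    hume_api_key: str
    anthropic_api_key: str
    elevenlabs_api_key: str
    openai_api_key: str
    database_url: str

    @classmethod
    def from_env(cls) -> "Config":
        required = ("HUME_API_KEY", "ANTHROPIC_API_KEY", "ELEVENLABS_API_KEY", "OPENAI_API_KEY")
        missing = [name for name in required if not os.environ.get(name)]
        if missing:
            raise RuntimeError(f"Missing required environment variables: {missing}")
        return cls(
            hume_api_key=os.environ["HUME_API_KEY"],
            anthropic_api_key=os.environ["ANTHROPIC_API_KEY"],
            elevenlabs_api_key=os.environ["ELEVENLABS_API_KEY"],
            openai_api_key=os.environ["OPENAI_API_KEY"],
            database_url=os.environ.get("DATABASE_URL", ""),  # assumed Postgres DSN variable
        )


# Module-level singleton: other modules import `config` rather than re-reading the env.
config = Config.from_env()
```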

## Installation

1. This project uses the [uv](https://docs.astral.sh/uv/) package manager. Follow the installation instructions for your platform [here](https://docs.astral.sh/uv/getting-started/installation/).

2. Configure environment variables:
    - Create a `.env` file based on `.env.example`
    - Add your API keys:

    ```txt
    HUME_API_KEY=YOUR_HUME_API_KEY
    ANTHROPIC_API_KEY=YOUR_ANTHROPIC_API_KEY
    ELEVENLABS_API_KEY=YOUR_ELEVENLABS_API_KEY
    OPENAI_API_KEY=YOUR_OPENAI_API_KEY
    ```

3. Run the application:

    Standard
    ```sh
    uv run python -m src.main
    ```

    With hot-reloading
    ```sh
    uv run watchfiles "python -m src.main" src
    ```

4. Test the application by navigating to the localhost URL in your browser (e.g. `localhost:7860` or `http://127.0.0.1:7860`).

5. (Optional) If contributing, install the pre-commit hooks for automatic linting, formatting, and type-checking:
    ```sh
    uv run pre-commit install
    ```
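
For orientation, the `uv run python -m src.main` command above executes `src/main.py`, which builds and launches the Gradio app. The sketch below shows only the general shape of such an entry point; the real module wires in the Arena and Leaderboard components, middleware, and static assets, so treat it as an outline rather than the actual code.

```python
# Minimal sketch of a Gradio entry point; the actual src/main.py composes the
# Arena/Leaderboard components, middleware, and static assets.
import gradio as gr


def build_demo() -> gr.Blocks:
    with gr.Blocks(title="Expressive TTS Arena") as demo:
        gr.Markdown("# Expressive TTS Arena")
        # The real app adds the 'Arena' and 'Leaderboard' tabs here.
    return demo


if __name__ == "__main__":
    build_demo().launch(server_name="0.0.0.0", server_port=7860)
```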

## User Flow

1. Select a sample character, or enter a custom character description, then click **"Generate Text"** to generate your text input.
2. Click the **"Synthesize Speech"** button to synthesize two TTS outputs based on your text and character description.
3. Listen to both audio samples to compare their expressiveness.
4. Vote for the most expressive result by clicking either **"Select Option A"** or **"Select Option B"**.
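
Votes cast in step 4 are persisted through the database layer (`src/database/`). The model below is only a sketch of how such a record might look in SQLAlchemy; the column names and types are assumptions, not the project's actual schema in `src/database/models.py`.

```python
# Illustrative SQLAlchemy model for a vote record; names and columns are
# assumptions, and the real schema lives in src/database/models.py.
from datetime import datetime, timezone

from sqlalchemy import DateTime, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column


class Base(DeclarativeBase):
    pass


class Vote(Base):
    __tablename__ = "votes"

    id: Mapped[int] = mapped_column(primary_key=True)
    provider_a: Mapped[str] = mapped_column(String(64))  # TTS provider behind Option A
    provider_b: Mapped[str] = mapped_column(String(64))  # TTS provider behind Option B
    winner: Mapped[str] = mapped_column(String(1))       # "A" or "B"
    created_at: Mapped[datetime] = mapped_column(
        DateTime(timezone=True), default=lambda: datetime.now(timezone.utc)
    )
```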

## License

This project is licensed under the MIT License - see the [LICENSE.txt](LICENSE.txt) file for details.