Harmony Speech V1

Version 1 of Harmony Speech is one of the first AI technologies created by Project Harmony.AI, back in 2023. The goal was to build an AI voice cloning engine that preserves the speaker identity of any voice during speech generation and generates speech faster than real time, even in a CPU-only environment.

It builds on a fork of CorentinJ's amazing "Real-Time-Voice-Cloning" repository, with some significant changes applied to the codebase on our side. The full training code will be open sourced at a later point, once we have had the chance to wrap it up properly.

As part of Harmony.AI release version 0.2.1, we're open sourcing these model weights under the Apache License on Hugging Face.

The inference code is part of our Harmony Speech Engine codebase.

Main components of this model:

Speaker Encoder, trained on Mozilla Common Voice 7 and a few other multilingual speech corpora so that it captures voice characteristics regardless of language
Mel Spectrogram Synthesizer based on Forward Tacotron, trained on English
Speech Vocoder based on Multi-band MelGAN, trained on the same dataset as the Synthesizer as well as on spectrograms generated by the Synthesizer
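The three components form a pipeline: the encoder turns a reference recording into a speaker embedding, the synthesizer turns text plus that embedding into a mel spectrogram, and the vocoder turns the spectrogram into audio. The following is a minimal sketch of that data flow only, assuming hypothetical function names and sizes (`encode_speaker`, `synthesize_mel`, `vocode`, `EMBED_DIM`, `N_MELS`, `HOP_LENGTH`). The stage bodies are placeholders standing in for trained networks; this is not the Harmony Speech Engine API.

```python
import math
import random
from typing import List

# Hypothetical sizes for illustration only; not Harmony Speech V1's actual configuration.
EMBED_DIM = 256   # assumed speaker embedding size
N_MELS = 80       # assumed number of mel channels per frame
HOP_LENGTH = 256  # assumed number of audio samples produced per mel frame


def encode_speaker(reference_audio: List[float]) -> List[float]:
    """Stage 1 placeholder: reference audio of any length -> fixed-size,
    L2-normalised speaker embedding. A real encoder is a trained network."""
    rng = random.Random(len(reference_audio))  # deterministic stand-in values
    vec = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def synthesize_mel(text: str, embedding: List[float]) -> List[List[float]]:
    """Stage 2 placeholder: text + speaker embedding -> mel frames.
    A real synthesizer conditions every frame on the embedding; here we only
    check its size and emit one silent frame per input character."""
    if len(embedding) != EMBED_DIM:
        raise ValueError("unexpected embedding size")
    return [[0.0] * N_MELS for _ in text]


def vocode(mel: List[List[float]]) -> List[float]:
    """Stage 3 placeholder: mel frames -> waveform samples (silence here)."""
    return [0.0] * (len(mel) * HOP_LENGTH)


def clone_voice(reference_audio: List[float], text: str) -> List[float]:
    """Chain the three hypothetical stages: encoder -> synthesizer -> vocoder."""
    embedding = encode_speaker(reference_audio)
    mel = synthesize_mel(text, embedding)
    return vocode(mel)
```

Because only the encoder sees the reference recording, its embedding can be computed once and reused for any number of sentences, which is what makes cloning an unseen voice possible without retraining the synthesizer or vocoder.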

Given the recent advances in open-source AI speech technology and our currently limited development capacity, we concluded that we can best support our community by building an inference engine with a unified API for the wide variety of speech-related AI models and toolchains, rather than by training additional custom models.

This also led to our decision to open source this model.

We hope you enjoy this release. Feel free to visit our Discord (link below) to share feedback or ask questions.

About Project Harmony.AI

Our goal: Elevating Human <-to-> AI Interaction beyond known boundaries.

Project Harmony.AI emerged from the idea of enabling AI-driven characters and humans to live together seamlessly. Since it became obvious that many of the technologies required to achieve this goal either do not exist yet or are still highly experimental, the long-term vision of Project Harmony is to establish the full set of technologies that help minimize biological and technological barriers in Human <-to-> AI Interaction.

Our principles: Fair use and accessibility

We want to counter today's tendency toward centralizing AI development in the hands of big corporations. We're pushing for maximum transparency in our own development efforts, and we aim for our software to be accessible and usable in the most democratic ways possible.

Therefore, for all our current and future software offerings, we will continuously and carefully evaluate whether we can safely open source them, partially or completely, as long as doing so does not appear harmful to achieving the project's main goal.

Harmony Speech Engine is distributed under the AGPLv3 License, because a lot of the code in the harmonyspeech module has been borrowed from Aphrodite Engine. Everyone can use this software as part of their own projects without any restrictions from our side, apart from those that follow from the license itself.

How to reach out to us

Official Website of Project Harmony.AI

If you want to collaborate or support this Project financially:

Feel free to join our Discord server and/or subscribe to our Patreon. Even $1 helps us drive this project forward.

Harmony.AI Discord Server

Harmony.AI Patreon