Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Organizations

MLX Community

Jaward's activity

replied to their post 3 days ago
posted an update 3 days ago
Thrilled to share our latest work: Voila, a family of fully open-source voice models for real-time autonomous conversation and role-play. Some of our major contributions 🧵:
1) End-to-end full-duplex architecture: directly processes simultaneous audio token streams from user to model and from model to user.
2) Voila-Tokenizer: a tokenizer trained on 100K hours of audio, with interleaved audio-text alignment, that distills semantic and acoustic tokens via residual vector quantization (RVQ).
3) Text-audio interleaved alignment: fine-grained alignment of text and audio tokens enables synchronization and expressiveness for tasks like ASR (WER 2.7%) and TTS (WER 2.8%).
4) Voice customization: supports 1M+ pre-built voices and one-shot voice cloning from 10 s audio clips using Wespeaker embeddings.
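
For readers new to RVQ, here is a minimal sketch of the general idea behind residual vector quantization: each stage picks the nearest codebook entry for whatever the previous stage left unexplained. This is not Voila's tokenizer. The codebooks below are tiny hand-written toys, and the names `rvq_encode`, `rvq_decode`, and `TOY_CODEBOOKS` are hypothetical; a real audio tokenizer learns its codebooks and applies them to neural encoder features.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Codebook = Sequence[Vector]


def nearest(codebook: Codebook, vec: Vector) -> int:
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    def dist(entry: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(entry, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rvq_encode(vec: Vector, codebooks: Sequence[Codebook]) -> Tuple[List[int], List[float]]:
    """Quantize vec stage by stage; return one code per stage and the final residual."""
    residual = list(vec)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry; the next stage only sees what is left over.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, residual


def rvq_decode(codes: Sequence[int], codebooks: Sequence[Codebook]) -> List[float]:
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Hypothetical toy codebooks: a coarse first stage and a finer second stage.
TOY_CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [0.25, 0.25]],
]

codes, residual = rvq_encode([1.2, 1.3], TOY_CODEBOOKS)
reconstruction = rvq_decode(codes, TOY_CODEBOOKS)
```

With these toy codebooks the first stage captures the coarse position and the second refines it, so the reconstruction error shrinks as stages are added. In neural audio codecs, the earlier stages tend to carry the coarser information and later stages the finer acoustic detail.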

Models: maitrix-org/voila-67e0d96962c19f221fc73fa5
Code: https://github.com/maitrix-org/Voila
Demo: maitrix-org/Voila-demo
Project page: maitrix-org/Voila-demo
posted an update 5 days ago