Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Recent Activity

liked a Space 2 days ago

maitrix-org/Voila-demo

liked a model 3 days ago

maitrix-org/Voila-audio-alpha

authored a paper 3 days ago

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Jaward's activity

liked a Space 2 days ago

Voila Demo

💻

Chat with a voice-clone AI

liked a model 3 days ago

maitrix-org/Voila-audio-alpha

Audio-to-Audio • Updated 2 days ago • 54 • 2

authored a paper 3 days ago

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published 3 days ago • 70

liked a dataset 3 days ago

maitrix-org/Voila-million-voice

Updated 2 days ago • 450 • 1

replied to their post 3 days ago

if you like this work, kindly upvote the paper, thanks: https://huggingface.co/papers/2505.02707

posted an update 3 days ago

Post

591

Thrilled to share our latest work: Voila, a family of fully open-sourced voice models for real-time autonomous convos and role-play. Some of our major contributions include 🧵:
1) An End-to-End Full-Duplex Arch that directly processes simultaneous audio token streams from user to model and vice versa.
2) Voila-Tokenizer: a tokenizer trained on 100K hours of audio, with interleaved alignment (audio & text), that distills semantic/acoustic tokens via RVQ.
3) Text-Audio Interleaved Alignment: we leverage a fine-grained alignment of text and audio tokens that allows synchronization and expressiveness for tasks like ASR (WER 2.7%) and TTS (WER 2.8%).
4) Voice Customization: supports 1M+ pre-built voices and one-shot voice cloning from 10s audio clips using Wespeaker embeddings.

Models: maitrix-org/voila-67e0d96962c19f221fc73fa5
Code: https://github.com/maitrix-org/Voila
Demo: maitrix-org/Voila-demo
Project page: maitrix-org/Voila-demo
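
For readers unfamiliar with the WER figures quoted above: word error rate is usually computed as the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a generic illustration only, not the Voila evaluation code. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions, and the paper's actual scoring setup may differ. For TTS, WER is commonly measured by transcribing the synthesized speech with an ASR system and scoring that transcript against the input text.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; normalization here is just
    lowercasing and whitespace splitting (an assumption, not Voila's setup).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # dropping all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on a mat'):.1%}")
```

A WER of 2.7% under this definition corresponds to roughly 27 word errors per 1,000 reference words.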

2 replies

upvoted a paper 3 days ago

Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play

Paper • 2505.02707 • Published 3 days ago • 70

updated a Space 3 days ago

Professor AI Feynman

🚀

Create lecture slides and audio using AI

posted an update 5 days ago

Post

1235

Late submission, but I managed to cook up a nascent Feynman-inspired agent app for Microsoft’s AI Agent hackathon. Wish me luck lol. @clem PS: I need this on GPU, thank you :)
Try Demo: Jaward/Professor-AI-Feynman
Code: https://github.com/Jaykef/professor-ai-feynman