metadata

title: VERSA Speech & Audio Evaluation Demo
emoji: 🎙️
colorFrom: blue
colorTo: indigo
sdk: docker
hf_oauth: false
license: apache-2.0

VERSA Speech & Audio Evaluation Demo

This demo allows you to evaluate speech and audio files using the VERSA (Versatile Evaluation of Speech and Audio) toolkit.

How to Use

1. Upload a ground truth audio file (the reference audio).
2. Upload a prediction audio file (the audio to be evaluated).
3. Select an evaluation metric from the dropdown menu.
4. Click the "Evaluate" button.
5. View the results in the table and in raw JSON format.
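
The steps above amount to scoring a prediction signal against a reference with a chosen metric and reporting the result. The following is a minimal sketch of that flow, not VERSA's API: it uses a plain signal-to-noise ratio as a stand-in metric, the names `snr_db`, `METRICS`, and `evaluate` are hypothetical, and the signals are short lists of samples rather than decoded audio files.

```python
import json
import math


def snr_db(reference, prediction):
    """Signal-to-noise ratio in dB, treating (reference - prediction) as noise."""
    if len(reference) != len(prediction):
        raise ValueError("signals must have the same length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - p) ** 2 for r, p in zip(reference, prediction))
    if noise_power == 0:
        return math.inf  # prediction is identical to the reference
    if signal_power == 0:
        return -math.inf  # silent reference, non-silent error
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical metric registry, standing in for the dropdown menu.
METRICS = {"snr_db": snr_db}


def evaluate(reference, prediction, metric_name):
    """Run one metric and return a JSON-serializable result record."""
    score = METRICS[metric_name](reference, prediction)
    return {"metric": metric_name, "score": round(score, 2)}


# Toy signals (assumed data, for illustration only).
reference = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
prediction = [0.0, 0.4, 0.9, 0.5, 0.1, -0.5, -0.9, -0.6]

result = evaluate(reference, prediction, "snr_db")
print(json.dumps(result, indent=2))
```

The returned dictionary plays the role of the raw JSON output; a table view would simply list the same metric name and score as a row.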

About VERSA

VERSA is a toolkit that collects evaluation metrics for speech and audio quality. It provides a single interface to state-of-the-art evaluation techniques and is tightly integrated with ESPnet.

With a full installation, VERSA offers over 80 metrics with 700+ variations, depending on configuration. The metrics draw on different external resources, including matching and non-matching reference audio, text transcriptions, and text captions.
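
Because metrics differ in the external resources they need, it can help to think of each metric as declaring its requirements. The sketch below illustrates that idea under stated assumptions: `MetricSpec`, `SPECS`, `runnable`, and the metric names are hypothetical placeholders, not VERSA's metric list or configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    name: str
    needs: frozenset  # resources required besides the audio under test


# Hypothetical entries, one per resource type named in the text.
SPECS = [
    MetricSpec("matching_ref_metric", frozenset({"matching_reference"})),
    MetricSpec("non_matching_ref_metric", frozenset({"non_matching_reference"})),
    MetricSpec("transcription_metric", frozenset({"text_transcription"})),
    MetricSpec("caption_metric", frozenset({"text_caption"})),
]


def runnable(specs, available):
    """Return names of metrics whose required resources are all available."""
    available = frozenset(available)
    return [spec.name for spec in specs if spec.needs <= available]


# With a matching reference and a transcription on hand, only the
# metrics that need those two resources can be run.
selected = runnable(SPECS, {"matching_reference", "text_transcription"})
print(selected)
```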

Learn more at the VERSA GitHub Repository.

Features

Easy-to-use interface for audio evaluation
Support for various evaluation metrics
Detailed results displayed in table format
Raw JSON output for further analysis

Citation

If you use VERSA in your research, please cite:

@misc{shi2024versaversatileevaluationtoolkit,
  title={VERSA: A Versatile Evaluation Toolkit for Speech, Audio, and Music},
  author={Jiatong Shi and Hye-jin Shim and Jinchuan Tian and Siddhant Arora and Haibin Wu and Darius Petermann and Jia Qi Yip and You Zhang and Yuxun Tang and Wangyou Zhang and Dareen Safar Alharthi and Yichen Huang and Koichi Saito and Jionghao Han and Yiwen Zhao and Chris Donahue and Shinji Watanabe},
  year={2024},
  eprint={2412.17667},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2412.17667},
}