---
license: mit
datasets:
- benax-rw/my_kinyarwanda_dataset
language:
- rw
metrics:
- wer
base_model: openai/whisper-small
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- kinyarwanda
- asr
- whisper
- low-resource
- fine-tuning
- benax-technologies
- transformers
- torchaudio
- speech-recognition
model-index:
- name: KinyaWhisper
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: KinyaWhisper Custom Dataset
      type: custom
      config: kinyarwanda
    metrics:
    - name: WER
      type: wer
      value: 51.85
---
## 🗣️ KinyaWhisper

KinyaWhisper is a fine-tuned version of OpenAI’s Whisper model (`openai/whisper-small`) for automatic speech recognition (ASR) in Kinyarwanda. It was trained on 102 manually labeled `.wav` files and serves as a reproducible baseline for speech recognition in low-resource, indigenous languages.

## 🔧 Usage

To run inference on your own audio files using the fine-tuned KinyaWhisper model:
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio

# Load the fine-tuned KinyaWhisper model and processor from Hugging Face
model = WhisperForConditionalGeneration.from_pretrained("benax-rw/KinyaWhisper")
processor = WhisperProcessor.from_pretrained("benax-rw/KinyaWhisper")

# Load the audio, downmix to mono, and resample to the 16 kHz that Whisper expects
waveform, sample_rate = torchaudio.load("your_audio.wav")
waveform = waveform.mean(dim=0)
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

# Generate token IDs and decode them to text
predicted_ids = model.generate(inputs["input_features"])
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print("🗣️ Transcription:", transcription)
```
## 🏋️ Training Details

- Model: openai/whisper-small
- Epochs: 80
- Batch size: 4
- Learning rate: 1e-5
- Optimizer: Adam
- Final loss: 0.00024
- WER: 51.85%
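
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that definition in plain Python, not the evaluation script used for this model; the function name `word_error_rate` and the example sentences are illustrative assumptions, and no text normalization (casing, punctuation) is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative strings only, not taken from the training data:
# one of three reference words is dropped, so WER is 1/3.
example = word_error_rate("muraho amakuru yawe", "muraho amakuru")
print(f"WER: {example:.2%}")
```

On this scale, a WER of 51.85% corresponds to roughly one word-level error for every two reference words.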
## ⚠️ Limitations

The model was trained on a small dataset (102 samples). It performs best on short, clear Kinyarwanda utterances and may struggle with longer or noisy audio. This is an early-stage educational model, not yet suitable for production use.
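
One practical aspect of the long-audio limitation is that Whisper's feature extractor pads or truncates each input to a 30-second window, so a single call on a longer recording only covers its beginning. A possible workaround, sketched below under that assumption, is to split the waveform into consecutive windows and transcribe each one separately. The helper `chunk_bounds` is hypothetical and not part of this repository; cutting at fixed boundaries can also split a word in half, so treat it as a starting point rather than a fix.

```python
from __future__ import annotations


def chunk_bounds(
    num_samples: int,
    sample_rate: int = 16000,
    chunk_seconds: float = 30.0,
) -> list[tuple[int, int]]:
    """Return (start, end) sample indices of consecutive windows covering the audio."""
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    chunk_size = int(sample_rate * chunk_seconds)
    if chunk_size < 1:
        raise ValueError("sample_rate * chunk_seconds must be at least one sample")
    # The last window is shortened so it never runs past the end of the audio.
    return [
        (start, min(start + chunk_size, num_samples))
        for start in range(0, num_samples, chunk_size)
    ]


# A 70-second recording at 16 kHz yields two full windows and a 10-second tail.
bounds = chunk_bounds(num_samples=16000 * 70)
print(bounds)
```

Each `waveform[start:end]` slice can then be passed through the processor and `model.generate` as in the Usage section, and the partial transcriptions joined in order. The `transformers` ASR pipeline also accepts a `chunk_length_s` argument that handles this kind of windowing internally, which may be the simpler option.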
## 📚 Citation

If you use this model, please cite:

```bibtex
@misc{baziramwabo2025kinyawhisper,
  author = {Gabriel Baziramwabo},
  title = {KinyaWhisper: Fine-Tuning Whisper for Kinyarwanda ASR},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/benax-rw/KinyaWhisper}},
  note = {Version 1.0}
}
```
## 📬 Contact

Maintained by Gabriel Baziramwabo.

✉️ [email protected]

🔗 https://benax.rw