---
license: mit
datasets:
- benax-rw/my_kinyarwanda_dataset
language:
- rw
metrics:
- wer
base_model: openai/whisper-small
pipeline_tag: automatic-speech-recognition
library_name: transformers
tags:
- kinyarwanda
- asr
- whisper
- low-resource
- fine-tuning
- benax-technologies
- transformers
- torchaudio
- speech-recognition
model-index:
- name: KinyaWhisper
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: KinyaWhisper Custom Dataset
      type: custom
      config: kinyarwanda
    metrics:
    - name: WER
      type: wer
      value: 51.85
---
## 🗣️ KinyaWhisper
KinyaWhisper is a fine-tuned version of OpenAI’s Whisper small model (`openai/whisper-small`) for automatic speech recognition (ASR) in Kinyarwanda. It was trained on 102 manually labeled .wav files and serves as a reproducible baseline for speech recognition in low-resource, indigenous languages.
## 🔧 Usage
To run inference on your own audio files using the fine-tuned KinyaWhisper model:
```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torchaudio

# Load fine-tuned KinyaWhisper model and processor from Hugging Face
model = WhisperForConditionalGeneration.from_pretrained("benax-rw/KinyaWhisper")
processor = WhisperProcessor.from_pretrained("benax-rw/KinyaWhisper")

# Load audio, downmix to mono, and resample to the 16 kHz rate Whisper expects
waveform, sample_rate = torchaudio.load("your_audio.wav")
waveform = waveform.mean(dim=0)
if sample_rate != 16000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16000)
inputs = processor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

# Generate prediction
predicted_ids = model.generate(inputs["input_features"])
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print("🗣️ Transcription:", transcription)
```
## 🏋️ Training Details
- Model: openai/whisper-small
- Epochs: 80
- Batch size: 4
- Learning rate: 1e-5
- Optimizer: Adam
- Final loss: 0.00024
- WER: 51.85%
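
For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The function below is a minimal, dependency-free sketch of that definition. It is not the evaluation script used to produce the figure above; the name `word_error_rate` and the example strings are illustrative, and it applies no text normalization such as lowercasing or punctuation stripping.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens: one substitution and one deletion over four reference words
print(word_error_rate("a b c d", "a x c"))
```

The function returns a fraction, so multiply by 100 to compare with a percentage like the one reported here. Note that WER can exceed 1.0 when the hypothesis contains many extra words.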
## ⚠️ Limitations
The model was trained on a small dataset (102 samples). It performs best on short, clear Kinyarwanda utterances and may struggle with longer or noisy audio. This is an early-stage educational model, not yet suitable for production use.
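
One practical consequence for longer recordings: by default, the Whisper feature extractor pads or truncates its input to a 30-second window, so anything past that point is dropped when a long file goes through the Usage snippet in one call. A possible workaround, sketched below under that assumption, is to split the waveform into fixed windows and transcribe each one separately. The helper name `chunk_bounds` is hypothetical and not part of this model or of `transformers`.

```python
from typing import List, Tuple


def chunk_bounds(
    num_samples: int,
    sample_rate: int = 16000,
    chunk_seconds: float = 30.0,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices that cover the audio in fixed windows."""
    if num_samples < 0 or sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("num_samples must be >= 0; rate and duration must be > 0")
    chunk_size = int(sample_rate * chunk_seconds)
    # The last window is shorter when the audio length is not a multiple of chunk_size.
    return [
        (start, min(start + chunk_size, num_samples))
        for start in range(0, num_samples, chunk_size)
    ]


# A 70-second clip at 16 kHz splits into two full windows and a 10-second tail
print(chunk_bounds(16000 * 70))
```

Each `waveform[start:end]` slice can then be passed through the processor and `model.generate` as in the Usage section, and the resulting transcripts joined. Cutting at fixed boundaries can split a word in half, so expect some errors at the seams.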
## 📚 Citation
If you use this model, please cite:
```bibtex
@misc{baziramwabo2025kinyawhisper,
author = {Gabriel Baziramwabo},
title = {KinyaWhisper: Fine-Tuning Whisper for Kinyarwanda ASR},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/benax-rw/KinyaWhisper}},
note = {Version 1.0}
}
```
## 📬 Contact
Maintained by Gabriel Baziramwabo.
✉️ [email protected]
🔗 https://benax.rw