Wav2Vec2-XLS-R-300M fine-tuned on Korean speech from native English speakers

This repository contains a Wav2Vec2-XLS-R-300M model fine-tuned for automatic speech recognition (ASR). The model was trained and evaluated on "the spoken Korean voice of native English speakers", a dataset provided by AIHub: https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=&topMenu=&aihubDataSe=data&dataSetSn=71469

Creator & Uploader: Sehyun Oh ([email protected])
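
As a starting point for trying the model, the sketch below shows one common way to transcribe a single recording with the Hugging Face transformers library. It is a minimal example under stated assumptions, not the author's own inference script: the model_id argument is a placeholder for this repository's Hub ID (not given in this card), the repository is assumed to ship the processor files (vocabulary and feature-extractor config), and torch, transformers and librosa are assumed to be installed. Audio is resampled to 16 kHz, the rate XLS-R models expect.

```python
def transcribe(audio_path, model_id):
    """Transcribe one audio file with a CTC-fine-tuned Wav2Vec2 checkpoint.

    model_id is a placeholder for this repository's Hub ID or a local path;
    it is not specified in this card.
    """
    # Heavy dependencies are imported lazily so the function can be defined
    # even on machines where they are not installed.
    import librosa
    import torch
    from transformers import AutoModelForCTC, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForCTC.from_pretrained(model_id)
    model.eval()

    # Load the recording as mono float audio, resampled to 16 kHz.
    speech, _ = librosa.load(audio_path, sr=16_000)
    inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, frames, vocab)

    # Greedy decoding: most likely label per frame, then CTC collapse.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]


# Example call (both arguments are hypothetical):
# print(transcribe("sample.wav", "<user>/<repo>"))
```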

Data Information

  • Dataset Name: the spoken Korean voice of native English speakers.

  • Data Type: Speech recordings of English speakers speaking Korean.

  • Annotation: Each utterance is annotated with Korean words and phoneme sequences.

  • Train Set: 50,525 samples, 47.91 hours

  • Valid Set: 6,510 samples, 6.18 hours

  • Test Set: 6,315 samples, 6.03 hours
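
The split sizes above imply short utterances (about 3.4 seconds on average) and a roughly 80/10/10 split by sample count. The short calculation below derives both from the listed figures; it uses only the numbers in this card.

```python
# (samples, hours) per split, copied from the list above.
splits = {
    "train": (50_525, 47.91),
    "valid": (6_510, 6.18),
    "test": (6_315, 6.03),
}

total_samples = sum(n for n, _ in splits.values())
total_hours = sum(hours for _, hours in splits.values())

# Average utterance length in seconds for each split.
avg_seconds = {name: hours * 3600 / n for name, (n, hours) in splits.items()}

for name, (n, _) in splits.items():
    share = 100 * n / total_samples
    print(f"{name}: {avg_seconds[name]:.2f} s per utterance, {share:.1f}% of samples")
print(f"total: {total_samples} samples, {total_hours:.2f} hours")
```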

Training Procedure

The model was fine-tuned for ASR using the Hugging Face transformers library. Below are the training steps:

  1. Data preprocessing to align audio with word labels.
  2. Wav2Vec2-XLS-R-300M model fine-tuning with CTC loss.
  3. Evaluation on validation and test sets.
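
A model trained with CTC loss (step 2) emits one label per audio frame, including a special blank label, so evaluation (step 3) first has to collapse those frame labels into text. The sketch below illustrates greedy CTC collapsing in plain Python. It assumes greedy (argmax) decoding, which this card does not confirm, and the vocabulary and blank ID are hypothetical values chosen for illustration.

```python
from itertools import groupby


def ctc_greedy_collapse(frame_ids, blank_id=0):
    """Turn per-frame label IDs into a label sequence.

    Consecutive repeats are merged first, then blanks are dropped, so a
    blank between two identical labels keeps them as two separate labels.
    """
    return [label for label, _ in groupby(frame_ids) if label != blank_id]


# Hypothetical vocabulary: ID 0 plays the role of the CTC blank.
vocab = {0: "<pad>", 1: "가", 2: "나"}

# Hypothetical per-frame argmax output for a short clip.
frames = [0, 1, 1, 0, 1, 2, 2, 0]

ids = ctc_greedy_collapse(frames)
text = "".join(vocab[i] for i in ids)
print(ids, text)
```

In transformers, the tokenizer's decode step performs this collapsing, with the padding token acting as the blank.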

Training Hyperparameters

  • Epochs: 50
  • Learning Rate: 0.0001
  • Warmup Ratio: 0.1
  • Scheduler: Linear
  • Batch Size: 8
  • Loss Reduction: Mean
  • Feature Extractor Freeze: Enabled
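
To make the schedule concrete, the sketch below computes the learning rate that a linear scheduler with a 0.1 warmup ratio would produce over the run. It assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; none of these is stated in this card. In transformers, these settings would typically map to TrainingArguments(num_train_epochs=50, learning_rate=1e-4, warmup_ratio=0.1, lr_scheduler_type="linear", per_device_train_batch_size=8), with ctc_loss_reduction="mean" on the model config and model.freeze_feature_encoder() for the frozen feature extractor. Whether the author used exactly these calls is also an assumption.

```python
import math

# Values copied from the hyperparameter list above.
EPOCHS = 50
PEAK_LR = 1e-4
WARMUP_RATIO = 0.1
BATCH_SIZE = 8
TRAIN_SAMPLES = 50_525  # from the Data Information section

# Assumptions (not stated in this card): one device, no gradient
# accumulation, final partial batch of each epoch kept.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = round(total_steps * WARMUP_RATIO)


def linear_lr(step):
    """Learning rate at an optimizer step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * (step / warmup_steps)
    return PEAK_LR * (max(0, total_steps - step) / (total_steps - warmup_steps))


print(f"{steps_per_epoch} steps/epoch, {total_steps} total, {warmup_steps} warmup")
for step in (0, warmup_steps // 2, warmup_steps, total_steps):
    print(step, linear_lr(step))
```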

Test Results

The model was evaluated on the test dataset with the following performance:

  • Word Error Rate (WER): 0.0130
  • Character Error Rate (CER): 0.0069
  • Phoneme Error Rate (PER): 0.0114

Sample:

  • Correct Sentence: 좋은 의견이 있으시면 의견란에 꼭 써 주시기 바랍니다 (English: "If you have any good suggestions, please be sure to write them in the comments section.")
  • Predicted Sentence: 좋은 의견이 있으시면 의견란에 꼭 써 주시기 바랍니다
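
The error rates above are edit-distance ratios: the number of substitutions, insertions and deletions needed to turn the prediction into the reference, divided by the reference length. The sketch below computes WER and CER in plain Python. It is an illustration, not the evaluation script used for this card: removing spaces before computing CER is an assumed convention, and the erroneous hypothesis is a made-up example. PER follows the same formula applied to phoneme sequences.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate over whitespace-separated words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate; spaces are ignored (an assumed convention)."""
    ref_chars = reference.replace(" ", "")
    return edit_distance(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)


reference = "좋은 의견이 있으시면 의견란에 꼭 써 주시기 바랍니다"  # sample from this card
perfect = reference                                              # matches the card's prediction
one_error = "좋은 의견이 있으시면 의견란에 꼭 써 주시기 바람니다"  # hypothetical: one wrong syllable

print(wer(reference, perfect), cer(reference, perfect))
print(wer(reference, one_error), cer(reference, one_error))
```

With one wrong syllable in an 8-word, 22-syllable sentence, WER is 1/8 while CER is only 1/22, which is why CER is usually the lower of the two figures.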

Training Logs

TensorBoard logs are available for detailed training analysis:

  • events.out.tfevents.1742786238.oem-WS-C621E-SAGE-Series.3352548.0
  • events.out.tfevents.1742889983.oem-WS-C621E-SAGE-Series.3352548.1

Use the following command to visualize logs:

tensorboard --logdir=./logs/