metadata
library_name: transformers
datasets:
- Bingsu/zeroth-korean
- google/fleurs
language:
- ko
metrics:
- cer
- wer
- bleu
base_model:
- microsoft/Phi-4-multimodal-instruct
model-index:
- name: Phi-4-multimodal-instruct-ko-asr
  results:
  - task:
      type: automatic-speech-recognition
    dataset:
      type: Bingsu/zeroth-korean
      name: zeroth-korean-test
    metrics:
    - type: bleu
      name: zeroth-test-BLEU
      value: 94.837
    - type: cer
      name: zeroth-test-CER
      value: 1.316
    - type: wer
      name: zeroth-test-WER
      value: 2.951
  - task:
      type: automatic-speech-recognition
    dataset:
      type: google/fleurs
      name: fleurs-ko-test
    metrics:
    - type: bleu
      name: fleurs-test-BLEU
      value: 67.659
    - type: cer
      name: fleurs-test-CER
      value: 7.951
    - type: wer
      name: fleurs-test-WER
      value: 18.313
pipeline_tag: automatic-speech-recognition
This model is fine-tuned from microsoft/Phi-4-multimodal-instruct on Bingsu/zeroth-korean and google/fleurs for 5 epochs.
It was trained for 960 steps on these datasets for Korean automatic speech recognition (ASR) on an H100 GPU.
Training was then continued on the CoVoST2 Dataset / CoVoST2-Ko for automatic speech translation (AST).
The AST fine-tuned model is available here: Phi-4-multimodal-instruct-ko-speech
Evaluation
Evaluation was done on the following datasets:
- ASR (Automatic Speech Recognition): Evaluated with CER (Character Error Rate) on the zeroth-test set (457 samples).
- AST (Automatic Speech Translation): Evaluated with BLEU score on fleurs ko <-> en speech translation results (270 samples).
The evaluation script is retrieved from here.
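For readers unfamiliar with the ASR metrics, the sketch below shows how CER and WER are commonly defined: the edit distance between reference and hypothesis, divided by the reference length, expressed as a percentage. It is an illustration only, not the evaluation script used for this model. The function names `edit_distance`, `cer`, and `wer` are hypothetical, and removing spaces before computing CER is an assumption that may differ from the actual script.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference item missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra item in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate in percent. Spaces are dropped (an assumption)."""
    ref_chars = list(reference.replace(" ", ""))
    hyp_chars = list(hypothesis.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return 100.0 * edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def wer(reference, hypothesis):
    """Word error rate in percent, using whitespace-separated words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hyp_words) / len(ref_words)


if __name__ == "__main__":
    print(cer("안녕하세요", "안녕하세오"))  # one substituted character out of five
    print(wer("a b c d", "a b x d"))      # one substituted word out of four
    print(cer("가", "가나다"))             # two insertions against a one-character reference
```

Because insertions count as errors while the denominator is the reference length, the rate can exceed 100 when the hypothesis is much longer than the reference, as in the last example above.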
Compared to Phi-4-mm-inst-zeroth-kor and Phi-4-multimodal-finetune-ko-speech, this model achieves a lower zeroth CER (1.31 versus 7.02 and 1.61, respectively).
| Model | zeroth-CER | zeroth-WER | fleurs-ko_en-BLEU | fleurs-ko_en-cot-BLEU | fleurs-en_ko-BLEU | fleurs-en_ko-cot-BLEU |
|---|---|---|---|---|---|---|
| original | 198.32 | - | 5.63 | 2.42 | 6.86 | 4.17 |
| daekeun-ml/Phi-4-multimodal-finetune-ko-speech | 1.61 | 3.54 | 7.67 | 8.38 | 12.31 | 9.69 |
| seastar105/Phi-4-mm-inst-zeroth-kor | 7.02 | - | 7.07 | 9.19 | 13.08 | 9.35 |
| ASR finetune (this model) | 1.31 | 2.95 | 7.46 | 6.24 | 12.15 | 8.91 |
| + 1 epoch finetune with CoVoST2-Ko | 3.88 | - | 8.07 | 10.09 | 18.82 | 15.41 |
| AST finetuned model | 1.77 | 2.99 | 8.01 | 9.09 | 17.09 | 11.82 |