How to?

#1
by etemiz - opened

Hi,
Thanks for the model. How do I use it?
whisper.load_model() does not work.

Hi,

whisper.load_model() only works with OpenAI's original models. For custom models like mine, use either:

  1. Transformers (easiest):

    from transformers import pipeline
    pipe = pipeline("automatic-speech-recognition", model="ysdede/whisper-khanacademy-large-v3-turbo-tr")
    pipe("audio.mp3")
    

    Example Notebook

  2. Faster Inference (CT2 backend):
    I recommend the optimized ct2 version for better speed:
    ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2
    Works with faster-whisper or whisperx (a faster-whisper sketch follows this list)
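
For faster-whisper, a minimal Python sketch (device and compute_type here are placeholders, adjust them for your hardware):

    from faster_whisper import WhisperModel

    # Load the CTranslate2-converted model straight from the Hugging Face Hub
    model = WhisperModel("ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2",
                         device="cuda", compute_type="int8")

    # transcribe() returns a lazy generator of segments plus transcription info
    segments, info = model.transcribe("audio.mp3", language="tr")
    for segment in segments:
        print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")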

For whisperx, simply run:

whisperx audio.mp3 --model ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2
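
Or from Python, a rough sketch using whisperx's API (parameter values are placeholders):

    import whisperx

    device = "cuda"  # or "cpu"
    # load_model passes the name through to the faster-whisper backend,
    # so the CT2 repo from the Hub can be loaded directly
    model = whisperx.load_model("ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2",
                                device, compute_type="int8", language="tr")

    audio = whisperx.load_audio("audio.mp3")
    result = model.transcribe(audio, batch_size=1)
    for seg in result["segments"]:
        print(seg["start"], seg["end"], seg["text"])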

or using a batch file:

@echo off
rem %~1    = full path of the audio file passed as the first argument
rem %~dpn1 = drive + path + file name without extension (the .vtt goes next to the audio)
rem %~dp1  = drive + path of the input file
set "input_file=%~1"
set "output_file=%~dpn1.vtt"
set "output_dir=%~dp1"
rem strip the trailing backslash that %~dp1 leaves on the directory
if "%output_dir:~-1%"=="\" set "output_dir=%output_dir:~0,-1%"

echo Input file: %input_file%
echo Output file: %output_file%
echo Output directory: %output_dir%

whisperx.exe "%input_file%" --language tr --output_format vtt --compute_type int8 --model "ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2" --segment_resolution sentence --verbose True --batch_size 1 --print_progress True --output_dir "%output_dir%"
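
To use it, pass the audio file as the script's only argument (dragging and dropping the file onto the .bat in Explorer does the same thing); the VTT is written next to the source audio.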

Important Note:
This model is not superior to vanilla large-v3 for general Turkish; it is specifically fine-tuned to recognize Khan Academy's teaching style (educational terminology, lecture cadence). For everyday speech, OpenAI's original model may perform better.

ysdede changed discussion status to closed

Thank you for the answer. In my case, whisperx had the best accuracy, faster-whisper was second, and transformers was the worst.
