How to?
Hi,
Thanks for the model. How do I use it?
`whisper.load_model()` does not work.
Hi,
`whisper.load_model()` only works with OpenAI's original models. For custom models like mine, use either:
Transformers (easiest):

```python
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="ysdede/whisper-khanacademy-large-v3-turbo-tr")
pipe("audio.mp3")
```
Faster inference (CT2 backend):
I recommend the optimized CT2 version for better speed: `ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2`. It works with `faster-whisper` or `whisperx`.
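If you'd rather call `faster-whisper` from Python than use the whisperx CLI, here is a minimal sketch. It assumes `pip install faster-whisper`; the `int8` compute type and the `format_timestamp` helper are my additions for illustration, not part of this thread.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as VTT-style HH:MM:SS.mmm (pure helper, no model needed)."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

if __name__ == "__main__":
    # Downloads the CT2 weights from the Hub on first run.
    from faster_whisper import WhisperModel

    model = WhisperModel(
        "ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2",
        compute_type="int8",
    )
    segments, info = model.transcribe("audio.mp3", language="tr")
    for seg in segments:
        print(f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}  {seg.text}")
```

Note that `transcribe` returns a lazy generator, so transcription only runs as you iterate over `segments`.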
For whisperx, simply:

```
whisperx audio.mp3 --model ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2
```
Or using a batch file:

```bat
@echo off
set "input_file=%~1"
set "output_file=%~dpn1.vtt"
set "output_dir=%~dp1"
if "%output_dir:~-1%"=="\" set "output_dir=%output_dir:~0,-1%"
echo Input file: %input_file%
echo Output file: %output_file%
echo Output directory: %output_dir%
whisperx.exe "%input_file%" --language tr --output_format vtt --compute_type int8 --model "ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2" --segment_resolution sentence --verbose True --batch_size 1 --print_progress True --output_dir "%output_dir%"
```
Important Note:
This is not superior to vanilla large-v3 for general Turkish - it's specifically fine-tuned to recognize Khan Academy's teaching style (educational terms, lecture cadence). For everyday speech, OpenAI's original may perform better.
Thank you for the answer. In my case, whisperx had the best accuracy, faster-whisper was second, and transformers was worst.