How to?

#1
by etemiz - opened

Hi,
Thanks for the model. How do I use it?
whisper.load_model() does not work.

Hi,

whisper.load_model() only works with OpenAI's original models. For custom models like mine, use either:

  1. Transformers (easiest):

    from transformers import pipeline
    pipe = pipeline("automatic-speech-recognition", model="ysdede/whisper-khanacademy-large-v3-turbo-tr")
    pipe("audio.mp3")
    

    Example Notebook

  2. Faster Inference (CT2 backend):
    I recommend the optimized ct2 version for better speed:
    ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2
    Works with faster-whisper or whisperx (a faster-whisper sketch follows this list)
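
For faster-whisper, a minimal Python sketch (device and compute_type here are placeholders, adjust them for your hardware):

    from faster_whisper import WhisperModel

    # Load the CTranslate2-converted model straight from the Hugging Face Hub
    model = WhisperModel("ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2",
                         device="cuda", compute_type="int8")

    # transcribe() returns a lazy generator of segments plus transcription info
    segments, info = model.transcribe("audio.mp3", language="tr")
    for segment in segments:
        print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")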

For whisperx, simply run:

whisperx audio.mp3 --model ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2
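
Or from Python, a rough sketch using whisperx's API (parameter values are placeholders):

    import whisperx

    device = "cuda"  # or "cpu"
    # load_model passes the name through to the faster-whisper backend,
    # so the CT2 repo from the Hub can be loaded directly
    model = whisperx.load_model("ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2",
                                device, compute_type="int8", language="tr")

    audio = whisperx.load_audio("audio.mp3")
    result = model.transcribe(audio, batch_size=1)
    for seg in result["segments"]:
        print(seg["start"], seg["end"], seg["text"])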

or using a batch file:

@echo off
rem %~1    = full path of the audio file passed as the first argument
rem %~dpn1 = drive + path + file name without extension (the .vtt goes next to the audio)
rem %~dp1  = drive + path of the input file
set "input_file=%~1"
set "output_file=%~dpn1.vtt"
set "output_dir=%~dp1"
rem strip the trailing backslash that %~dp1 leaves on the directory
if "%output_dir:~-1%"=="\" set "output_dir=%output_dir:~0,-1%"

echo Input file: %input_file%
echo Output file: %output_file%
echo Output directory: %output_dir%

whisperx.exe "%input_file%" --language tr --output_format vtt --compute_type int8 --model "ysdede/whisper-khanacademy-large-v3-turbo-tr-ct2" --segment_resolution sentence --verbose True --batch_size 1 --print_progress True --output_dir "%output_dir%"
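
To use it, pass the audio file as the script's only argument (dragging and dropping the file onto the .bat in Explorer does the same thing); the VTT is written next to the source audio.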

Important Note:
This model is not superior to vanilla large-v3 for general Turkish; it is specifically fine-tuned to recognize Khan Academy's teaching style (educational terminology, lecture cadence). For everyday speech, OpenAI's original model may perform better.

ysdede changed discussion status to closed

Thank you for the answer. In my case, whisperx had the best accuracy, faster-whisper was second, and transformers was the worst.
