benderrodriguez committed on
Commit 22f0fe4 · 1 Parent(s): c68e242

1. Update eval-d1 and eval-whatsapp results to align with dataset modifications.
2. Update the about page to include information about eval-whatsapp.
3. Add results for the 2025.05.13 ivrit-ai Whisper large-v3 and large-v3-turbo models.

Files changed (44)
  1. benchmark.csv +17 -15
  2. results/amazon-transcribe-batch/ivrit_ai_eval_d1.csv +0 -0
  3. results/amazon-transcribe-batch/ivrit_ai_eval_whatsapp.csv +0 -0
  4. results/amazon-transcribe-stream/ivrit_ai_eval_d1.csv +0 -0
  5. results/amazon-transcribe-stream/ivrit_ai_eval_whatsapp.csv +0 -0
  6. results/elevenlabs-scribe-v1/ivrit_ai_eval_d1.csv +0 -0
  7. results/elevenlabs-scribe-v1/ivrit_ai_eval_whatsapp.csv +0 -0
  8. results/faster-whisper-ivrit-ai-v2-d3-e3/ivrit_ai_eval_d1.csv +0 -0
  9. results/faster-whisper-ivrit-ai-v2-d3-e3/ivrit_ai_eval_whatsapp.csv +0 -0
  10. results/faster-whisper-ivrit-ai-v2-d4/ivrit_ai_eval_d1.csv +0 -0
  11. results/faster-whisper-ivrit-ai-v2-d4/ivrit_ai_eval_whatsapp.csv +0 -0
  12. results/faster-whisper-large-v2/ivrit_ai_eval_d1.csv +0 -0
  13. results/faster-whisper-large-v2/ivrit_ai_eval_whatsapp.csv +0 -0
  14. results/faster-whisper-large-v3-turbo/ivrit_ai_eval_d1.csv +0 -0
  15. results/faster-whisper-large-v3-turbo/ivrit_ai_eval_whatsapp.csv +0 -0
  16. results/faster-whisper-large-v3/ivrit_ai_eval_d1.csv +0 -0
  17. results/faster-whisper-large-v3/ivrit_ai_eval_whatsapp.csv +0 -0
  18. results/google-speech/ivrit_ai_eval_d1.csv +0 -0
  19. results/google-speech/ivrit_ai_eval_whatsapp.csv +0 -0
  20. results/ivrit-ai-whisper-large-v3-20250513/common_voice_17.csv +0 -0
  21. results/ivrit-ai-whisper-large-v3-20250513/fleurs.csv +0 -0
  22. results/ivrit-ai-whisper-large-v3-20250513/hebrew_speech_kan.csv +0 -0
  23. results/ivrit-ai-whisper-large-v3-20250513/ivrit_ai_eval_d1.csv +0 -0
  24. results/ivrit-ai-whisper-large-v3-20250513/ivrit_ai_eval_whatsapp.csv +0 -0
  25. results/ivrit-ai-whisper-large-v3-20250513/saspeech.csv +0 -0
  26. results/ivrit-ai-whisper-large-v3-ct2-20250209/ivrit_ai_eval_d1.csv +0 -0
  27. results/ivrit-ai-whisper-large-v3-ct2-20250209/ivrit_ai_eval_whatsapp.csv +0 -0
  28. results/ivrit-ai-whisper-large-v3-ct2-20250403/ivrit_ai_eval_d1.csv +0 -0
  29. results/ivrit-ai-whisper-large-v3-ct2-20250403/ivrit_ai_eval_whatsapp.csv +0 -0
  30. results/ivrit-ai-whisper-large-v3-turbo-20250513/common_voice_17.csv +0 -0
  31. results/ivrit-ai-whisper-large-v3-turbo-20250513/fleurs.csv +0 -0
  32. results/ivrit-ai-whisper-large-v3-turbo-20250513/hebrew_speech_kan.csv +0 -0
  33. results/ivrit-ai-whisper-large-v3-turbo-20250513/ivrit_ai_eval_d1.csv +0 -0
  34. results/ivrit-ai-whisper-large-v3-turbo-20250513/ivrit_ai_eval_whatsapp.csv +0 -0
  35. results/ivrit-ai-whisper-large-v3-turbo-20250513/saspeech.csv +0 -0
  36. results/ivrit-ai-whisper-large-v3-turbo-ct2-20250209/ivrit_ai_eval_d1.csv +0 -0
  37. results/ivrit-ai-whisper-large-v3-turbo-ct2-20250209/ivrit_ai_eval_whatsapp.csv +0 -0
  38. results/ivrit-ai-whisper-large-v3-turbo-ct2-20250403/ivrit_ai_eval_d1.csv +0 -0
  39. results/ivrit-ai-whisper-large-v3-turbo-ct2-20250403/ivrit_ai_eval_whatsapp.csv +0 -0
  40. results/openai-gpt-4o-mini-transcribe/ivrit_ai_eval_d1.csv +0 -0
  41. results/openai-gpt-4o-mini-transcribe/ivrit_ai_eval_whatsapp.csv +0 -0
  42. results/openai-gpt-4o-transcribe/ivrit_ai_eval_d1.csv +0 -0
  43. results/openai-gpt-4o-transcribe/ivrit_ai_eval_whatsapp.csv +0 -0
  44. src/about.py +5 -0
benchmark.csv CHANGED
@@ -1,16 +1,18 @@
  engine,model,ivrit-ai/eval-d1,ivrit-ai/eval-whatsapp,ivrit-ai/saspeech,google/fleurs/he,mozilla-foundation/common_voice_17_0/he,imvladikon/hebrew_speech_kan
- faster-whisper,ivrit-ai/whisper-large-v3-ct2-20250403,0.056,0.085,0.071,0.200,0.171,0.095
- faster-whisper,ivrit-ai/whisper-large-v3-ct2-20250209,0.060,-,0.074,0.208,0.172,0.094
- faster-whisper,ivrit-ai/faster-whisper-v2-d4,0.062,-,0.080,0.241,0.207,0.113
- faster-whisper,ivrit-ai/faster-whisper-v2-d3-e3,0.070,-,0.086,0.255,0.214,0.139
- amazon-transcribe,batch,0.068,0.110,0.085,0.230,0.141,0.090
- faster-whisper,large-v2,0.080,0.133,0.098,0.266,0.233,0.164
- faster-whisper,large-v3-turbo,0.085,0.131,0.104,0.289,0.280,0.156
- faster-whisper,ivrit-ai/whisper-large-v3-turbo-ct2-20250403,0.058,0.069,0.074,0.208,0.183,0.100
- faster-whisper,ivrit-ai/whisper-large-v3-turbo-ct2-20250209,0.067,-,0.077,0.245,0.209,0.100
- amazon-transcribe,stream,0.081,0.140,0.090,0.287,0.200,0.131
- faster-whisper,large-v3,0.096,0.147,0.094,0.262,0.231,0.134
- google-speech,google-speech,0.212,0.356,0.189,0.385,0.380,0.292
- openai,gpt-4o-transcribe,0.074,0.129,0.109,0.210,0.169,0.394
- openai,gpt-4o-mini-transcribe,0.093,0.156,0.150,0.300,0.237,0.468
- elevenlabs,scribe_v1,0.201,0.262,0.068,0.181,0.156,0.109
+ faster-whisper,ivrit-ai/whisper-large-v3-ct2-20250513,0.051,0.072,0.064,0.174,0.149,0.081
+ faster-whisper,ivrit-ai/whisper-large-v3-ct2-20250403,0.055,0.075,0.071,0.200,0.171,0.095
+ faster-whisper,ivrit-ai/whisper-large-v3-ct2-20250209,0.059,0.082,0.074,0.208,0.172,0.094
+ faster-whisper,ivrit-ai/faster-whisper-v2-d4,0.061,0.098,0.080,0.241,0.207,0.113
+ faster-whisper,ivrit-ai/faster-whisper-v2-d3-e3,0.068,0.104,0.086,0.255,0.214,0.139
+ amazon-transcribe,batch,0.066,0.104,0.085,0.230,0.141,0.090
+ faster-whisper,large-v2,0.077,0.121,0.098,0.266,0.233,0.164
+ faster-whisper,large-v3-turbo,0.084,0.128,0.104,0.289,0.280,0.156
+ faster-whisper,ivrit-ai/whisper-large-v3-turbo-ct2-20250513,0.053,0.071,0.066,0.181,0.151,0.082
+ faster-whisper,ivrit-ai/whisper-large-v3-turbo-ct2-20250403,0.055,0.061,0.074,0.208,0.183,0.100
+ faster-whisper,ivrit-ai/whisper-large-v3-turbo-ct2-20250209,0.071,0.104,0.077,0.245,0.209,0.100
+ amazon-transcribe,stream,0.079,0.129,0.090,0.287,0.200,0.131
+ faster-whisper,large-v3,0.098,0.132,0.094,0.262,0.231,0.134
+ google-speech,google-speech,0.211,0.352,0.189,0.385,0.380,0.292
+ openai,gpt-4o-transcribe,0.073,0.126,0.109,0.210,0.169,0.394
+ openai,gpt-4o-mini-transcribe,0.090,0.158,0.150,0.300,0.237,0.468
+ elevenlabs,scribe_v1,0.200,0.264,0.068,0.181,0.156,0.109
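To compare rows across the old and new versions of benchmark.csv, here is a minimal Python sketch. It copies a few rows verbatim from the diff above, treats the `-` placeholder (used in the old file where no eval-whatsapp result existed yet) as missing, ranks models on one dataset column, and pairs each model's old and new score. The file does not name its metric, so the sketch assumes lower is better (an error rate such as WER). The names `load_scores`, `rank`, `compare`, `OLD_CSV` and `NEW_CSV` are hypothetical and not part of this repository.

```python
import csv
import io
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (engine, model)

# Subsets of the removed (old) and added (new) benchmark.csv rows, copied verbatim.
HEADER = ("engine,model,ivrit-ai/eval-d1,ivrit-ai/eval-whatsapp,ivrit-ai/saspeech,"
          "google/fleurs/he,mozilla-foundation/common_voice_17_0/he,imvladikon/hebrew_speech_kan\n")
OLD_CSV = HEADER + """faster-whisper,ivrit-ai/whisper-large-v3-ct2-20250403,0.056,0.085,0.071,0.200,0.171,0.095
faster-whisper,ivrit-ai/whisper-large-v3-ct2-20250209,0.060,-,0.074,0.208,0.172,0.094
amazon-transcribe,batch,0.068,0.110,0.085,0.230,0.141,0.090
"""
NEW_CSV = HEADER + """faster-whisper,ivrit-ai/whisper-large-v3-ct2-20250513,0.051,0.072,0.064,0.174,0.149,0.081
faster-whisper,ivrit-ai/whisper-large-v3-ct2-20250403,0.055,0.075,0.071,0.200,0.171,0.095
faster-whisper,ivrit-ai/whisper-large-v3-ct2-20250209,0.059,0.082,0.074,0.208,0.172,0.094
amazon-transcribe,batch,0.066,0.104,0.085,0.230,0.141,0.090
"""


def load_scores(csv_text: str, dataset: str) -> Dict[Key, Optional[float]]:
    """Map (engine, model) to its score on one dataset column; '-' becomes None."""
    scores: Dict[Key, Optional[float]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        cell = row[dataset].strip()
        scores[(row["engine"], row["model"])] = None if cell == "-" else float(cell)
    return scores


def rank(scores: Dict[Key, Optional[float]]) -> List[Tuple[Key, float]]:
    """Sort models by score, lowest first, skipping missing results.

    Assumption: lower is better, i.e. the column holds an error rate.
    """
    present = [(key, score) for key, score in scores.items() if score is not None]
    return sorted(present, key=lambda item: item[1])


def compare(
    old: Dict[Key, Optional[float]], new: Dict[Key, Optional[float]]
) -> Dict[Key, Tuple[Optional[float], Optional[float]]]:
    """Pair old and new scores; a model absent from one version gets None on that side."""
    return {key: (old.get(key), new.get(key)) for key in sorted(old.keys() | new.keys())}


if __name__ == "__main__":
    column = "ivrit-ai/eval-whatsapp"
    old_scores = load_scores(OLD_CSV, column)
    new_scores = load_scores(NEW_CSV, column)
    for (engine, model), score in rank(new_scores):
        print(f"{score:.3f}  {engine}/{model}")
    for (engine, model), (before, after) in compare(old_scores, new_scores).items():
        print(f"{engine}/{model}: {before} -> {after}")
```

To run it against the full file, read benchmark.csv from disk and pass its text in place of `NEW_CSV`; keeping `-` as `None` rather than zero prevents models without a result from being ranked first.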
The diffs for the following files are too large to render. See raw diff.
results/amazon-transcribe-batch/ivrit_ai_eval_d1.csv CHANGED
results/amazon-transcribe-batch/ivrit_ai_eval_whatsapp.csv CHANGED
results/amazon-transcribe-stream/ivrit_ai_eval_d1.csv CHANGED
results/amazon-transcribe-stream/ivrit_ai_eval_whatsapp.csv CHANGED
results/elevenlabs-scribe-v1/ivrit_ai_eval_d1.csv CHANGED
results/elevenlabs-scribe-v1/ivrit_ai_eval_whatsapp.csv CHANGED
results/faster-whisper-ivrit-ai-v2-d3-e3/ivrit_ai_eval_d1.csv CHANGED
results/faster-whisper-ivrit-ai-v2-d3-e3/ivrit_ai_eval_whatsapp.csv ADDED
results/faster-whisper-ivrit-ai-v2-d4/ivrit_ai_eval_d1.csv CHANGED
results/faster-whisper-ivrit-ai-v2-d4/ivrit_ai_eval_whatsapp.csv ADDED
results/faster-whisper-large-v2/ivrit_ai_eval_d1.csv CHANGED
results/faster-whisper-large-v2/ivrit_ai_eval_whatsapp.csv CHANGED
results/faster-whisper-large-v3-turbo/ivrit_ai_eval_d1.csv CHANGED
results/faster-whisper-large-v3-turbo/ivrit_ai_eval_whatsapp.csv CHANGED
results/faster-whisper-large-v3/ivrit_ai_eval_d1.csv CHANGED
results/faster-whisper-large-v3/ivrit_ai_eval_whatsapp.csv CHANGED
results/google-speech/ivrit_ai_eval_d1.csv CHANGED
results/google-speech/ivrit_ai_eval_whatsapp.csv CHANGED
results/ivrit-ai-whisper-large-v3-20250513/common_voice_17.csv ADDED
results/ivrit-ai-whisper-large-v3-20250513/fleurs.csv ADDED
results/ivrit-ai-whisper-large-v3-20250513/hebrew_speech_kan.csv ADDED
results/ivrit-ai-whisper-large-v3-20250513/ivrit_ai_eval_d1.csv ADDED
results/ivrit-ai-whisper-large-v3-20250513/ivrit_ai_eval_whatsapp.csv ADDED
results/ivrit-ai-whisper-large-v3-20250513/saspeech.csv ADDED
results/ivrit-ai-whisper-large-v3-ct2-20250209/ivrit_ai_eval_d1.csv CHANGED
results/ivrit-ai-whisper-large-v3-ct2-20250209/ivrit_ai_eval_whatsapp.csv ADDED
results/ivrit-ai-whisper-large-v3-ct2-20250403/ivrit_ai_eval_d1.csv CHANGED
results/ivrit-ai-whisper-large-v3-ct2-20250403/ivrit_ai_eval_whatsapp.csv CHANGED
results/ivrit-ai-whisper-large-v3-turbo-20250513/common_voice_17.csv ADDED
results/ivrit-ai-whisper-large-v3-turbo-20250513/fleurs.csv ADDED
results/ivrit-ai-whisper-large-v3-turbo-20250513/hebrew_speech_kan.csv ADDED
results/ivrit-ai-whisper-large-v3-turbo-20250513/ivrit_ai_eval_d1.csv ADDED
results/ivrit-ai-whisper-large-v3-turbo-20250513/ivrit_ai_eval_whatsapp.csv ADDED
results/ivrit-ai-whisper-large-v3-turbo-20250513/saspeech.csv ADDED
results/ivrit-ai-whisper-large-v3-turbo-ct2-20250209/ivrit_ai_eval_d1.csv CHANGED
results/ivrit-ai-whisper-large-v3-turbo-ct2-20250209/ivrit_ai_eval_whatsapp.csv ADDED
results/ivrit-ai-whisper-large-v3-turbo-ct2-20250403/ivrit_ai_eval_d1.csv CHANGED
results/ivrit-ai-whisper-large-v3-turbo-ct2-20250403/ivrit_ai_eval_whatsapp.csv CHANGED
results/openai-gpt-4o-mini-transcribe/ivrit_ai_eval_d1.csv CHANGED
results/openai-gpt-4o-mini-transcribe/ivrit_ai_eval_whatsapp.csv CHANGED
results/openai-gpt-4o-transcribe/ivrit_ai_eval_d1.csv CHANGED
results/openai-gpt-4o-transcribe/ivrit_ai_eval_whatsapp.csv CHANGED
src/about.py CHANGED
@@ -47,6 +47,11 @@ The following datasets are used in our evaluation:
  - **Domain**: Manual transcription of a single podcast episode featuring an informal conversation between two speakers (male and female). Audio is segmented into approximately 5-minute chunks.
  - **Source**: Part of the ivrit.ai corpus. Selected episode has been manually transcribed to golden standard quality to serve as a high-quality evaluation benchmark.
  
+ ### [ivrit-ai/eval-whatsapp](https://huggingface.co/datasets/ivrit-ai/eval-whatsapp)
+ - **Size**: 1:10 hours
+ - **Domain**: Freestyle WhatsApp recordings made by volunteers.
+ - **Source**: ivrit.ai volunteers. Manually transcribed by an expert.
+
  ### [SASpeech](https://huggingface.co/datasets/upai-inc/saspeech)
  - **Size**: 4 hours (manually corrected portion of the corpus)
  - **Domain**: Economic and political podcast content, containing both read speech and conversational segments. Segments are several seconds in length.