Yehor Smoliakov
AI & ML interests
Speech-to-Text, Text-to-Speech, Voice over Internet Protocol
Recent Activity
updated a Space 13 days ago: Yehor/evaluate-asr-outputs
updated a Space 15 days ago: Yehor/pdf-generator-gradio
published a Space 15 days ago: Yehor/text-mesanenet
Organizations
Yehor's activity

posted an update · 17 days ago
Esoteric practices: inference models in PHP!
Repository: https://github.com/egorsmkv/speech-to-text-using-php

reacted to leonardlin's post · 23 days ago
Happy to announce the release of Shisa V2, the latest generation of our bilingual Japanese-English language models. After hundreds of ablations and months of work, we're releasing some of the strongest open Japanese models at 7B, 8B, 12B, 14B, 32B and 70B! Full announcement here: https://shisa.ai/posts/shisa-v2/ or visit the Shisa V2 HF collection:
shisa-ai/shisa-v2-67fc98ecaf940ad6c49f5689

replied to their post · 25 days ago
Also, tested it on an A100 with TensorRT:
https://colab.research.google.com/drive/1-agoo5ll-hWEecWQAtO1FM39sqavJxph?usp=sharing
The results are not so clear-cut, but it works with the base_rfdetr_fp16.onnx model and gives ~10 ms/img.
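For reference, a minimal Python sketch of how such a per-image latency number can be measured. The `bench` helper, warmup and iteration counts are my own; the commented onnxruntime lines are only an assumed usage with an exported base_rfdetr_fp16.onnx, not the notebook's actual code:

```python
import time
from statistics import mean

def bench(fn, warmup: int = 5, iters: int = 50) -> float:
    """Average latency of fn() in milliseconds, after a warmup."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000)
    return mean(times)

# With onnxruntime (not imported here), you would time a session like:
#   sess = ort.InferenceSession("base_rfdetr_fp16.onnx",
#                               providers=["CUDAExecutionProvider"])
#   ms = bench(lambda: sess.run(None, {"input": image_batch}))
print(f"{bench(lambda: sum(range(10_000))):.3f} ms")
```

Averaging over many runs after a warmup matters: the first few inferences include CUDA context and kernel-compilation overhead.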

posted an update · 26 days ago
I have made a Rust project integrating the latest state-of-the-art model for object detection; it outperforms YOLO!
Check it out: https://github.com/egorsmkv/rf-detr-usls

replied to their post · 30 days ago
This program does what the datasets library does: when you push a dataset created with the audiofolder script, datasets converts it to Parquet and shards it internally.
So you can use audios-to-dataset instead if you need faster speeds than datasets provides.

posted an update · 30 days ago
Convert your audio data to Parquet/DuckDB files at blazing speed!
Repository with pre-built binaries: https://github.com/crs-org/audios-to-dataset

replied to their post · about 1 month ago
My channel in Telegram: https://t.me/doing_something

posted an update · about 1 month ago
Create spectrograms using Rust!
I slightly improved a nice project that creates spectrograms, and built binaries for different platforms using cross-rs, which I mentioned earlier in my channel.
Repo: https://github.com/crs-org/sonogram
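The underlying computation is a short-time FFT over windowed frames. A minimal Python/NumPy sketch of that idea; the function name and the window/hop defaults are my own, not sonogram's API:

```python
import numpy as np

def spectrogram(signal: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude spectrogram via a Hann-windowed short-time FFT;
    returns shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    frames = [
        signal[i : i + n_fft] * window
        for i in range(0, len(signal) - n_fft + 1, hop)
    ]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

# A 440 Hz tone sampled at 8 kHz should peak near bin 440/8000*256 ≈ 14.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape, spec.mean(axis=0).argmax())
```

Rendering the result as an image is then just a matter of mapping log-magnitudes to a color palette.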


posted an update · about 1 month ago
Added more pre-built executables to extract-audio, which I released recently.
See my previous post: https://huggingface.co/posts/Yehor/654118712490771
Repository: https://github.com/crs-org/extract-audio

posted an update · about 1 month ago
Made a simple Python script to generate an Argilla project for audio annotation from a dataset:
https://github.com/egorsmkv/argilla-audio-annotation

posted an update · about 1 month ago
Are you interested in different runtimes for AI models?
Check out IREE (iree.dev); it converts models to MLIR and then executes them on different platforms.
I have tested it in Rust on CPU and CUDA: https://github.com/egorsmkv/eerie-yolo11


posted an update · about 1 month ago
Extract audio datasets with Rust at blazing speed!
With this tool you can extract audio files from a Parquet or Arrow file generated by the Hugging Face datasets library.
Repository: https://github.com/egorsmkv/extract-audio

posted an update · about 2 months ago
If you spend a lot of time in Telegram, use this bot to monitor the state of your ML lab:
https://github.com/egorsmkv/gpu-state-tgbot
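Such a bot boils down to polling nvidia-smi and formatting the result. A Python sketch of that core; the query fields are real nvidia-smi options, but the helper names and output format are my own, and the Telegram side is omitted:

```python
import subprocess

QUERY = "index,name,utilization.gpu,memory.used,memory.total"

def parse_gpu_state(csv_text: str) -> list[dict]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, util, used, total = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "util": f"{util}%",
            "memory": f"{used}/{total} MiB",
        })
    return gpus

def gpu_state() -> list[dict]:
    """Query the local GPUs (requires nvidia-smi on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_state(result.stdout)
```

A bot would call `gpu_state()` on a timer or on command and post the formatted dicts as a message.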

reacted to eliebak's post · about 2 months ago
Google just dropped an exciting technical report for the brand-new Gemma3 model! Here are my personal notes highlighting the most intriguing architectural innovations, design choices, and insights from this release:
1) Architecture choices:
> No more soft-capping, replaced by QK-Norm
> Both pre AND post norm
> Wider MLP than Qwen2.5, ~ same depth
> SWA with 5:1 and 1024 (very small, and a cool ablation in the paper!)
> No MLA to save KV cache, SWA does the job!
2) Long context
> Only increase the RoPE base in the global layers (to 1M)
> Confirmation that it's harder to do long context for smol models, no 128k for the 1B
> Pretrained with 32k context? seems very high
> No YaRN nor Llama3-like RoPE extension
3) Distillation
> Only keep the first 256 logits for the teacher
> Ablation on the teacher gap (tl;dr: you need some "patience" to see that using a small teacher is better)
> On-policy distillation, yeah (by @agarwl_ et al); not sure if the teacher gap behaves the same here, curious if someone has more info?
4) Others
> Checkpoints with QAT, that's very cool
> RL using an improved version of BOND; WARM/WARP are a good excuse to look at @ramealexandre's papers
> Only uses ZeRO-3, no TP/PP if I understand correctly?
> Training budget relatively similar to Gemma 2

reacted to tomaarsen's post · about 2 months ago
An assembly of 18 European companies, labs, and universities has banded together to launch EuroBERT! It's a state-of-the-art multilingual encoder for 15 European languages, designed to be finetuned for retrieval, classification, etc.
15 languages: English, French, German, Spanish, Chinese, Italian, Russian, Polish, Portuguese, Japanese, Vietnamese, Dutch, Arabic, Turkish, Hindi
3 model sizes: 210M, 610M, and 2.1B parameters - very, very useful sizes in my opinion
Sequence length of 8192 tokens! Nice to see these higher sequence lengths for encoders becoming more common.
Architecture based on Llama, but with bi-directional (non-causal) attention to turn it into an encoder. Flash Attention 2 is supported.
A new Pareto frontier (stronger *and* smaller) for multilingual encoder models
Evaluated against mDeBERTa, mGTE, XLM-RoBERTa for retrieval, classification, and regression (after finetuning for each task separately): EuroBERT punches way above its weight.
Detailed paper with all details, incl. data: FineWeb for English and CulturaX for multilingual data, The Stack v2 and Proof-Pile-2 for code.
Check out the release blogpost here: https://huggingface.co/blog/EuroBERT/release
* EuroBERT/EuroBERT-210m
* EuroBERT/EuroBERT-610m
* EuroBERT/EuroBERT-2.1B
The next step is for researchers to build upon the 3 EuroBERT base models and publish strong retrieval, zero-shot classification, etc. models for all to use. I'm very much looking forward to it!

replied to their post · about 2 months ago
Also, the Q&A dataset:
https://huggingface.co/datasets/ua-l/questions-with-answers