Yehor Smoliakov
AI & ML interests
Speech-to-Text, Text-to-Speech, Voice over Internet Protocol
Recent Activity
updated a Space 13 days ago: Yehor/evaluate-asr-outputs
updated a Space 15 days ago: Yehor/pdf-generator-gradio
published a Space 15 days ago: Yehor/text-mesanenet
Organizations
Yehor's activity

posted an update · 17 days ago
Esoteric practices: inference models in PHP!
Repository: https://github.com/egorsmkv/speech-to-text-using-php

reacted to leonardlin's post · 23 days ago
Happy to announce the release of Shisa V2, the latest generation of our bilingual Japanese-English language models. After hundreds of ablations and months of work, we're releasing some of the strongest open Japanese models at 7B, 8B, 12B, 14B, 32B and 70B! Full announcement here: https://shisa.ai/posts/shisa-v2/ or visit the Shisa V2 HF collection:
shisa-ai/shisa-v2-67fc98ecaf940ad6c49f5689

replied to their post · 25 days ago
Also, tested it on an A100 with TensorRT:
https://colab.research.google.com/drive/1-agoo5ll-hWEecWQAtO1FM39sqavJxph?usp=sharing
The results are not so clear-cut, but it works with the base_rfdetr_fp16.onnx model and gives ~10 ms/img.
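For reference, a minimal Python sketch of how such a per-image latency number can be measured. The `bench` helper, warmup and iteration counts are my own; the commented onnxruntime lines are only an assumed usage with an exported base_rfdetr_fp16.onnx, not the notebook's actual code:

```python
import time
from statistics import mean

def bench(fn, warmup: int = 5, iters: int = 50) -> float:
    """Average latency of fn() in milliseconds, after a warmup."""
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append((time.perf_counter() - t0) * 1000)
    return mean(times)

# With onnxruntime (not imported here), you would time a session like:
#   sess = ort.InferenceSession("base_rfdetr_fp16.onnx",
#                               providers=["CUDAExecutionProvider"])
#   ms = bench(lambda: sess.run(None, {"input": image_batch}))
print(f"{bench(lambda: sum(range(10_000))):.3f} ms")
```

Averaging over many runs after a warmup matters: the first few inferences include CUDA context and kernel-compilation overhead.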

posted an update · 26 days ago
I have made a Rust project integrating the latest state-of-the-art model for object detection; it outperforms YOLO!
Check it out: https://github.com/egorsmkv/rf-detr-usls

replied to their post · 30 days ago
This program does what the datasets library does: when you push a dataset created with the audiofolder script, datasets converts it to Parquet and shards it internally.
So you can use audios-to-dataset instead if you need faster speeds than datasets provides.

posted an update · 30 days ago
Convert your audio data to Parquet/DuckDB files at blazing speed!
Repository with pre-built binaries: https://github.com/crs-org/audios-to-dataset

replied to their post · about 1 month ago
My channel in Telegram: https://t.me/doing_something

posted an update · about 1 month ago
Create spectrograms using Rust!
I slightly improved a nice project that creates spectrograms, and built binaries for different platforms using cross-rs, which I mentioned earlier in my channel.
Repo: https://github.com/crs-org/sonogram
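The underlying computation is a short-time FFT over windowed frames. A minimal Python/NumPy sketch of that idea; the function name and the window/hop defaults are my own, not sonogram's API:

```python
import numpy as np

def spectrogram(signal: np.ndarray, n_fft: int = 256, hop: int = 128) -> np.ndarray:
    """Magnitude spectrogram via a Hann-windowed short-time FFT;
    returns shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    frames = [
        signal[i : i + n_fft] * window
        for i in range(0, len(signal) - n_fft + 1, hop)
    ]
    return np.abs(np.fft.rfft(np.stack(frames), axis=1))

# A 440 Hz tone sampled at 8 kHz should peak near bin 440/8000*256 ≈ 14.
sr = 8000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape, spec.mean(axis=0).argmax())
```

Rendering the result as an image is then just a matter of mapping log-magnitudes to a color palette.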


posted an update · about 1 month ago
Added more pre-built executables to extract-audio, which I released recently.
See my previous post: https://huggingface.co/posts/Yehor/654118712490771
Repository: https://github.com/crs-org/extract-audio

posted an update · about 1 month ago
Made a simple Python script to generate an Argilla project for audio annotation from a dataset:
https://github.com/egorsmkv/argilla-audio-annotation

posted an update · about 1 month ago
Are you interested in different runtimes for AI models?
Check out IREE (iree.dev); it converts models to MLIR and then executes them on different platforms.
I have tested it in Rust on CPU and CUDA: https://github.com/egorsmkv/eerie-yolo11


posted an update · about 1 month ago
Extract audio datasets with Rust at blazing speed!
With this tool you can extract audio files from a Parquet or Arrow file generated by the Hugging Face datasets library.
Repository: https://github.com/egorsmkv/extract-audio

posted an update · about 2 months ago
If you spend a lot of time in Telegram, use this bot to monitor the state of your ML lab:
https://github.com/egorsmkv/gpu-state-tgbot
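Such a bot boils down to polling nvidia-smi and formatting the result. A Python sketch of that core; the query fields are real nvidia-smi options, but the helper names and output format are my own, and the Telegram side is omitted:

```python
import subprocess

QUERY = "index,name,utilization.gpu,memory.used,memory.total"

def parse_gpu_state(csv_text: str) -> list[dict]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, util, used, total = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "util": f"{util}%",
            "memory": f"{used}/{total} MiB",
        })
    return gpus

def gpu_state() -> list[dict]:
    """Query the local GPUs (requires nvidia-smi on PATH)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_state(result.stdout)
```

A bot would call `gpu_state()` on a timer or on command and post the formatted dicts as a message.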

reacted to eliebak's post · about 2 months ago
Google just dropped an exciting technical report for the brand-new Gemma3 model! Here are my personal notes highlighting the most intriguing architectural innovations, design choices, and insights from this release:
1) Architecture choices:
> No more soft-capping, replaced by QK-Norm
> Both pre AND post norm
> Wider MLP than Qwen2.5, ~ same depth
> SWA with 5:1 and 1024 (very small, and a cool ablation in the paper!)
> No MLA to save KV cache, SWA does the job!
2) Long context
> Only increase the RoPE base in the global layers (to 1M)
> Confirmation that it's harder to do long context for smol models, no 128k for the 1B
> Pretrained with 32k context? seems very high
> No YaRN nor Llama3-like RoPE extension
3) Distillation
> Only keep the first 256 logits for the teacher
> Ablation on the teacher gap (tl;dr: you need some "patience" to see that using a small teacher is better)
> On-policy distillation, yeah (by @agarwl_ et al); not sure if the teacher gap behaves the same here, curious if someone has more info?
4) Others
> Checkpoints with QAT, that's very cool
> RL using an improved version of BOND; WARM/WARP are a good excuse to look at @ramealexandre's papers
> Only uses ZeRO-3, no TP/PP if I understand correctly?
> Training budget relatively similar to Gemma 2

reacted to tomaarsen's post · about 2 months ago
An assembly of 18 European companies, labs, and universities has banded together to launch EuroBERT! It's a state-of-the-art multilingual encoder for 15 European languages, designed to be finetuned for retrieval, classification, etc.
15 languages: English, French, German, Spanish, Chinese, Italian, Russian, Polish, Portuguese, Japanese, Vietnamese, Dutch, Arabic, Turkish, Hindi
3 model sizes: 210M, 610M, and 2.1B parameters - very, very useful sizes in my opinion
Sequence length of 8192 tokens! Nice to see these higher sequence lengths for encoders becoming more common.
Architecture based on Llama, but with bi-directional (non-causal) attention to turn it into an encoder. Flash Attention 2 is supported.
A new Pareto frontier (stronger *and* smaller) for multilingual encoder models
Evaluated against mDeBERTa, mGTE, XLM-RoBERTa for retrieval, classification, and regression (after finetuning for each task separately): EuroBERT punches way above its weight.
Detailed paper with all details, incl. data: FineWeb for English and CulturaX for multilingual data, The Stack v2 and Proof-Pile-2 for code.
Check out the release blogpost here: https://huggingface.co/blog/EuroBERT/release
* EuroBERT/EuroBERT-210m
* EuroBERT/EuroBERT-610m
* EuroBERT/EuroBERT-2.1B
The next step is for researchers to build upon the 3 EuroBERT base models and publish strong retrieval, zero-shot classification, etc. models for all to use. I'm very much looking forward to it!

replied to their post · about 2 months ago
Also, the Q&A dataset:
https://huggingface.co/datasets/ua-l/questions-with-answers