Rasmus Aagaard

rasgaard

https://rasgaard.github.io/

AI & ML interests

Interested in using LLMs in products, evaluating those products, and small models

Recent Activity

liked a Space about 20 hours ago

mhenrichsen/tts

updated a model 2 days ago

rasgaard/whisper-tiny.da

published a model 4 days ago

rasgaard/whisper-tiny.da

rasgaard's activity

liked a Space about 20 hours ago

syv.ai TTS

🚀

Try syv.ai TTS

updated a model 2 days ago

rasgaard/whisper-tiny.da

Automatic Speech Recognition • Updated 2 days ago • 53

published a model 4 days ago

rasgaard/whisper-tiny.da

Automatic Speech Recognition • Updated 2 days ago • 53

New activity in CoRal-project/roest-wav2vec2-315m-v1 9 days ago

Convert to ONNX

#1 opened 19 days ago by PierreMesure

reacted to Xenova's post with 🔥 9 days ago

Post

5346

Introducing the ONNX model explorer: Browse, search, and visualize neural networks directly in your browser. 🤯 A great tool for anyone studying Machine Learning! We're also releasing the entire dataset of graphs so you can use them in your own projects! 🤗

Check it out! 👇
Demo: onnx-community/model-explorer
Dataset: onnx-community/model-explorer
Source code: https://github.com/xenova/model-explorer

upvoted a collection 27 days ago

Orpheus Multilingual Research Release

Collection

Beta Release of multilingual models. • 12 items • Updated 28 days ago • 77

updated a model 27 days ago

rasgaard/orpheus-3b-coral-tts

Updated 27 days ago • 43

published a model about 1 month ago

rasgaard/orpheus-3b-coral-tts

Updated 27 days ago • 43

updated a dataset about 1 month ago

rasgaard/coral-tts-orpheus-tokenized

Viewer • Updated Apr 4 • 18.9k • 36

published a dataset about 1 month ago

rasgaard/coral-tts-orpheus-tokenized

Viewer • Updated Apr 4 • 18.9k • 36

liked a model about 2 months ago

mistralai/Mistral-Small-3.1-24B-Base-2503

Updated Mar 19 • 8.06k • 208

published an article 2 months ago

Article

Scaling Expert judgment with Large Language Models (LLM-as-a-Judge) • Feb 28

liked a model 4 months ago

hexgrad/Kokoro-82M

Text-to-Speech • Updated 28 days ago • 1.87M • 4.23k

reacted to davanstrien's post with 🤗 4 months ago

Post

3086

Introducing scandi-fine-web-cleaner (davanstrien/scandi-fine-web-cleaner), the first model trained on FineWeb-C community annotations!

FineWeb2 is a massive multilingual dataset for pre-training language models. Like any web-scale dataset, it contains low-quality content. How can we improve it?

Over the past months, an amazing community of 400+ annotators has been labelling content quality (using Argilla) across 23 languages through the FineWeb-C initiative.

Today, I'm happy to share the first classifier trained on this data.

🔍 What we've built:

- A lightweight classifier that efficiently removes low-quality content
- 90%+ precision demonstrated on Danish & Swedish
- Can process the 43M+ documents in Danish FineWeb2 with minimal compute

🌍 Why this matters: The approach can be reproduced for any of the 23 languages in FineWeb-C (data-is-better-together/fineweb-c). We can improve training data quality at scale without massive compute resources by starting with community annotations and training small, efficient classifiers.

Want to build a classifier for your language? Check out the full blog post with code examples and implementation details: https://danielvanstrien.xyz/posts/2025/FineWeb-c/scandinavian-content-filtering-fineweb.html