Mistral-AI-Game-Jam (Mistral AI Game Jam)

theo-michel

updated a Space about 10 hours ago

43

Shyguy's Wingman

💘

Help the Shy Guy finally talk to Jessica

MikeDoes

posted an update 9 days ago

Post

1493

PII-Masking-1M Final Day (7/7)! 🚀 Today, we unveil 5 NEW Enterprise PII (E-PII) Dataset PREVIEWS!

Standard PII tools often miss sensitive *business* data. That's why we built E-PII previews for the data that powers your operations and compliance needs.

Get a first look (representing 100,000 samples each!) into datasets designed for real-world enterprise security across these categories:

🏥 **PHI Preview:** For Healthcare Data
💳 **PFI Preview:** For Financial Data
🏢 **PWI Preview:** For Workplace Data
💻 **PDI Preview:** For Digital Activity Data
📍 **PLI Preview:** For Location Data

That wraps up our 7-day #PIIMasking1M announcement series! HUGE thanks for following along and for your engagement.
Explore ALL our releases, including these E-PII previews, in the Ai4Privacy Hugging Face Collection & show some love ❤️ if you find them useful!
🔗 Visit the Collection: https://huggingface.co/ai4privacy

Let's keep building safer AI, together!

MrDragonFox

posted an update 17 days ago

Post

2436

as a few of you know - i am working on a rather more elaborate tts that can produce more interesting sounds in the context of rp

early sneak peek is here -

MrDragonFox/mOrpheus_3B-1Base_early_preview-v1-25000

it's based on orpheus - but really the model is irrelevant as i focus mostly on data augmentation / prep / pipelining - it's just the way to show progress

should be able to express fine even in a sfw context

probably the last release for a few weeks as i go back to the data pipeline and improve there ..

in the meantime, please do test and report problems or enjoyable generations you found - we have a growing discord community and i love to see what you get out of that early release !

(a small colab is provided on the model page if you don't have the gpu to run that yourself)

MrDragonFox

posted an update 25 days ago

Post

3567

yet another audio dataset, pre-classified for events + audio aesthetics

this time for german - 680h sampled from emilia yodas

timestamps for asr training or other fancier things available as nc in the raw repo

MrDragonFox/DE_Emilia_Yodas_680h

cc by 4.0 as by emilia yodas

raw events / transcriptions are cc by NC 4.0

MrDragonFox/DE_Emilia_Yodas_680h_raw_timestamps

in the coming days i should push about 600h english + some japanese too, in the same format

nouamanetazi

authored a paper about 1 month ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published Apr 7 • 180

MikeDoes

posted an update about 1 month ago

Post

1066

I need your help! Please vote for Ai4Privacy to do a demo at The first Global Open Source AI Conference for developers!
https://glosaic.org/demo-voting#:~:text=Founder%20@-,AI4Privacy

MrDragonFox

posted an update about 1 month ago

Post

2105

did a small emotive classified test dataset for all the tts tuners out there

MrDragonFox/Elise

3h total mit - single speaker voice

dataset is a copy of an existing one, i just added emotional tags across 1200 samples - should be good enough to test if emotional tags stick in your finetune


MikeDoes

posted an update about 1 month ago

Post

2782

🚀 We are quite excited to announce the Ai4Privacy Python library! 🎉

pip install ai4privacy to anonymize short English text with OpenPII Masking 500k labels

📊 Day 5/7 of PII Masking 1M announcements complete! ⏰

MikeDoes

posted an update about 2 months ago

Post

3056

🌟 Day 4: Two Models, One Privacy Mission! 🌟

The PII-Masking-1M series rolls on with two gems:

Categorical: ai4privacy/llama-ai4privacy-multilingual-categorical-anonymiser-openpii
Redaction: ai4privacy/llama-ai4privacy-multilingual-anonymiser-openpii
Join us in protecting data everywhere!

#AI #Privacy #OpenSource #Multilingual

MikeDoes

posted an update about 2 months ago

Post

1723

📊 99%+ PII Masking Precision in English Straight to Your Browser! 🚀

ai4privacy/general-english-anonymiser-openpii-500k

Hard Facts:
🖥️ Runs in-browser—blazing fast, no server latency
👐 Open-source, MIT-licensed (even for commercial use)
📈 Full metrics on Hugging Face dataset and model pages

Day 3 out of 7 of PII-Masking-1M Announcements Complete!
*Accuracies reported from the new OpenPII-500k dataset

#DataPrivacy #AI #OpenSource

MikeDoes

posted an update about 2 months ago

Post

2099

#PII Masking Tech that does not **** around!

We are happy to release the OpenPII English Anonymiser, the most powerful open-source tool for redacting sensitive info from English text.

Built by fine-tuning ModernBERT on 5.7 million+ PII examples, it’s clocking 99%+ accuracy across emails, dates, social numbers, and more!

Why it’s a big deal:
✅ Top-tier precision: 100% for passport numbers, 99.96% for emails*.
✅ Totally free: MIT license for personal or commercial use.
✅ No secrets: Full metrics shared on Hugging Face.

#AI #OpenSource #DataSecurity @huggingface

Day 2 out of 7 of PII-Masking-1M Announcements Complete!

*Accuracies reported from the new OpenPII-500k dataset

ai4privacy/llama-ai4privacy-english-anonymiser-openpii

MikeDoes

posted an update about 2 months ago

Post

2705

🚀 Ai4Privacy Team is excited to unveil PII-Masking-1M, our most significant release yet! 🎉

This publication series 📦 includes datasets 📊, models 🤖, and applications ⚙️ to advance PII masking with AI systems 🛡️

Starting on Monday with daily posts at 7 PM CET ⏰

Tonic

posted an update 2 months ago

Post

1441

🙋🏻‍♂️Hey there folks,

Did you know that you can use ModernBERT to detect model hallucinations ?

Check out the Demo : Tonic/hallucination-test

See here for Medical Context Demo : MultiTransformer/tonic-discharge-guard

check out the model from KRLabs : KRLabsOrg/lettucedect-large-modernbert-en-v1

and the library they kindly open sourced for it : https://github.com/KRLabsOrg/LettuceDetect

👆🏻if you like this topic please contribute code upstream 🚀


Tonic

posted an update 2 months ago

Post

787

Powered by KRLabsOrg/lettucedect-large-modernbert-en-v1 from KRLabsOrg.

Detect hallucinations in answers based on context and questions using ModernBERT with 8192-token context support!

### Model Details
- **Model Name**: [lettucedect-large-modernbert-en-v1](KRLabsOrg/lettucedect-large-modernbert-en-v1)
- **Organization**: [KRLabsOrg](KRLabsOrg)
- **Github**: [https://github.com/KRLabsOrg/LettuceDetect](https://github.com/KRLabsOrg/LettuceDetect)
- **Architecture**: ModernBERT (Large) with extended context support up to 8192 tokens
- **Task**: Token Classification / Hallucination Detection
- **Training Dataset**: [RagTruth](wandb/RAGTruth-processed)
- **Language**: English
- **Capabilities**: Detects hallucinated spans in answers, provides confidence scores, and calculates average confidence across detected spans.

LettuceDetect excels at processing long documents to determine if an answer aligns with the provided context, making it a powerful tool for ensuring factual accuracy.
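The span-level output described above (hallucinated spans, a confidence per span, and an average across spans) can be illustrated with a minimal Python sketch. This is not the LettuceDetect API: the function names, the `Span` type, and the 0.5 threshold are hypothetical, and the input is assumed to be one hallucination probability per answer token from some token classifier.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    start: int         # index of the first flagged token
    end: int           # index one past the last flagged token
    confidence: float  # mean hallucination probability over the span


def merge_flagged_tokens(probs: List[float], threshold: float = 0.5) -> List[Span]:
    """Group consecutive tokens whose probability meets the threshold into spans.

    `probs` and `threshold` are illustrative assumptions, not library parameters.
    """
    spans: List[Span] = []
    run_start = None
    for i, p in enumerate(probs):
        if p >= threshold:
            if run_start is None:
                run_start = i  # a new flagged run begins here
        elif run_start is not None:
            run = probs[run_start:i]
            spans.append(Span(run_start, i, sum(run) / len(run)))
            run_start = None
    if run_start is not None:  # close a run that reaches the last token
        run = probs[run_start:]
        spans.append(Span(run_start, len(probs), sum(run) / len(run)))
    return spans


def average_confidence(spans: List[Span]) -> float:
    """Mean of the per-span confidences; 0.0 when nothing was flagged."""
    return sum(s.confidence for s in spans) / len(spans) if spans else 0.0


if __name__ == "__main__":
    token_probs = [0.1, 0.9, 0.7, 0.2, 0.6]  # made-up example scores
    found = merge_flagged_tokens(token_probs)
    print(found, average_confidence(found))
```

A real detector would also map token indices back to character offsets in the answer; that step is omitted here.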

ngxson

posted an update 2 months ago

Post

3596

A comprehensive matrix for which format you should use.

Read more on my blog post: https://huggingface.co/blog/ngxson/common-ai-model-formats

| Hardware        | GGUF      | PyTorch                | Safetensors              | ONNX  |
|-----------------|-----------|------------------------|--------------------------|-------|
| CPU             | ✅ (best) | 🟡                      | 🟡                       | ✅    |
| GPU             | ✅        | ✅                      | ✅                       | ✅    |
| Mobile          | ✅        | 🟡 (via executorch)     | ❌                       | ✅    |
| Apple silicon   | ✅        | 🟡                      | ✅ (via MLX framework)   | ✅    |
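As a rough illustration, the matrix above can be encoded as a lookup table and queried by hardware target. The sketch below is in Python; the `SUPPORT` dictionary and `fully_supported` helper are hypothetical names, and reading ✅ as "yes", 🟡 as "partial", and ❌ as "no" is an assumption about what the symbols mean.

```python
from typing import Dict, List

# Transcription of the matrix above; support levels are an assumed
# reading of the symbols (yes = ✅, partial = 🟡, no = ❌).
SUPPORT: Dict[str, Dict[str, str]] = {
    "CPU": {"GGUF": "yes", "PyTorch": "partial", "Safetensors": "partial", "ONNX": "yes"},
    "GPU": {"GGUF": "yes", "PyTorch": "yes", "Safetensors": "yes", "ONNX": "yes"},
    "Mobile": {"GGUF": "yes", "PyTorch": "partial", "Safetensors": "no", "ONNX": "yes"},
    "Apple silicon": {"GGUF": "yes", "PyTorch": "partial", "Safetensors": "yes", "ONNX": "yes"},
}


def fully_supported(hardware: str) -> List[str]:
    """Return the formats marked as fully supported for the given hardware row."""
    return [fmt for fmt, level in SUPPORT[hardware].items() if level == "yes"]


if __name__ == "__main__":
    for hw in SUPPORT:
        print(hw, "->", fully_supported(hw))
```

The table's extra notes ("best", "via executorch", "via MLX framework") are dropped in this encoding, so the blog post remains the place to check the details.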


ngxson

authored a paper 3 months ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4 • 229

Tonic

posted an update 3 months ago

Post

2408

🙋🏻‍♂️hey there folks ,

Goedel's Theorem Prover is now being demo'ed on huggingface : Tonic/Math

give it a try !

Tonic

posted an update 3 months ago

Post

2995

🙋🏻‍♂️ Hey there folks ,

our team made a game during the @mistral-game-jam and we're trying to win the community award !

try our game out and drop us a ❤️ like basically to vote for us !

Mistral-AI-Game-Jam/TextToSurvive

hope you like it !

ngxson

posted an update 4 months ago

Post

1073

Fun fact: you can get any DeepSeek-R1-Qwen **abliterated** by using one of these LoRA adapters (GGUF available!)

ngxson/extracted-lora-mergekit-677d5c3eea0b6a7661201846

Tonic

posted an update 4 months ago

Post

1919

🙋🏻‍♂️ Hey there folks ,

Facebook AI just released JASCO models that make music stems .

you can try it out here : Tonic/audiocraft

hope you like it

Team members 98
