Mistral AI Game Jam


AI & ML interests

None defined yet.

Recent Activity

theo-michel updated a Space about 10 hours ago
Mistral-AI-Game-Jam/shyguys_2
theo-michel updated a Space about 1 month ago
Mistral-AI-Game-Jam/shyguys_2

Mistral-AI-Game-Jam's activity

MikeDoes posted an update 9 days ago
PII-Masking-1M Final Day (7/7)! 🚀 Today, we unveil 5 NEW Enterprise PII (E-PII) Dataset PREVIEWS!

Standard PII tools often miss sensitive *business* data. That's why we built E-PII previews for the data that powers your operations and compliance needs.

Get a first look (each preview representing 100,000 samples!) at datasets designed for real-world enterprise security across these categories:

🏥 **PHI Preview:** For Healthcare Data
💳 **PFI Preview:** For Financial Data
🏢 **PWI Preview:** For Workplace Data
💻 **PDI Preview:** For Digital Activity Data
📍 **PLI Preview:** For Location Data

That wraps up our #PIIMasking1M 7-day announcement series! HUGE thanks for following along and for your engagement.
Explore ALL our releases, including these E-PII previews, in the Ai4Privacy Hugging Face Collection & show some love ❤️ if you find them useful!
🔗 Visit the Collection: https://huggingface.co/ai4privacy

Let's keep building safer AI, together!
MrDragonFox posted an update 17 days ago
as a few of you know - i am working on a rather more elaborate tts that can produce more interesting sounds in the context of rp

early sneak peek is here -

MrDragonFox/mOrpheus_3B-1Base_early_preview-v1-25000

it's based on orpheus - but really the model is irrelevant as i focus mostly on data augmentation / prep / pipelining - it's just the way to show progress

should be able to express fine even in an sfw context

probably the last release for a few weeks as i go back to the data pipeline and improve there ..

in the meantime, please do test and report problems or enjoyable generations you found - we have a growing discord community and i love to see what you get out of that early release !

(a small colab is provided on the model page if you don't have the gpu to run that yourself)
MrDragonFox posted an update 25 days ago
yet another audio dataset, pre-classified for events + audio aesthetics

this time for german - 680h sampled from emilia yodas

timestamps for asr training or other fancier things are available under nc in the raw repo

MrDragonFox/DE_Emilia_Yodas_680h

cc by 4.0 as per emilia yodas

raw events / transcriptions are cc by nc 4.0

MrDragonFox/DE_Emilia_Yodas_680h_raw_timestamps

in the coming days i should push about 600h english + some japanese too, same format
MrDragonFox posted an update about 1 month ago
did a small emotive classified test dataset for all the tts tuners out there

MrDragonFox/Elise

3h total, mit licensed - single speaker voice

the dataset is a copy of an existing one, just with emotional tags added over 1200 samples - should be good enough to test if emotional tags stick in your finetune
MikeDoes posted an update about 1 month ago
🚀 We are quite excited to announce the Ai4Privacy Python library! 🎉

pip install ai4privacy to anonymize short English text with OpenPII Masking 500k labels

📊 Day 5/7 of PII Masking 1M announcements complete! ⏰
MikeDoes posted an update about 2 months ago
📊 99%+ PII Masking Precision in English Straight to Your Browser! 🚀

ai4privacy/general-english-anonymiser-openpii-500k

Hard Facts:
🖥️ Runs in-browser: blazing fast, no server latency
👍 Open-source, MIT-licensed (even for commercial use)
📈 Full metrics on Hugging Face dataset and model pages

Day 3 out of 7 of PII-Masking-1M Announcements Complete!
*Accuracies reported from the new OpenPII-500k dataset

#DataPrivacy #AI #OpenSource
MikeDoes posted an update about 2 months ago
#PII Masking Tech that does not **** around!

We are happy to release the OpenPII English Anonymiser, the most powerful open-source tool for redacting sensitive info from English text.

A ModernBERT fine-tuned on 5.7 million+ PII examples, it's clocking 99%+ accuracy across emails, dates, social numbers, and more!

Why it's a big deal:
✅ Top-tier precision: 100% for passport numbers, 99.96% for emails*.
✅ Totally free: MIT license for personal or commercial use.
✅ No secrets: Full metrics shared on Hugging Face.

#AI #OpenSource #DataSecurity @huggingface

Day 2 out of 7 of PII-Masking-1M Announcements Complete!

*Accuracies reported from the new OpenPII-500k dataset

ai4privacy/llama-ai4privacy-english-anonymiser-openpii
MikeDoes posted an update about 2 months ago
🚀 The Ai4Privacy Team is excited to unveil PII-Masking-1M, our most significant release yet! 🎉

This publication series 📦 includes datasets 📊, models 🤖, and applications ⚙️ to advance PII masking with AI systems 🛡️

Starting on Monday with daily posts at 7 PM CET ⏰
Tonic posted an update 2 months ago
🙋🏻‍♂️ Hey there folks,

Did you know that you can use ModernBERT to detect model hallucinations?

Check out the demo: Tonic/hallucination-test

See here for the medical context demo: MultiTransformer/tonic-discharge-guard

Check out the model from KRLabs: KRLabsOrg/lettucedect-large-modernbert-en-v1

and the library they kindly open-sourced for it: https://github.com/KRLabsOrg/LettuceDetect

👆🏻 If you like this topic, please contribute code upstream 🚀
Tonic posted an update 2 months ago
Powered by KRLabsOrg/lettucedect-large-modernbert-en-v1 from KRLabsOrg.

Detect hallucinations in answers based on context and questions using ModernBERT with 8192-token context support!

### Model Details
- **Model Name**: [lettucedect-large-modernbert-en-v1](KRLabsOrg/lettucedect-large-modernbert-en-v1)
- **Organization**: [KRLabsOrg](KRLabsOrg)
- **GitHub**: [https://github.com/KRLabsOrg/LettuceDetect](https://github.com/KRLabsOrg/LettuceDetect)
- **Architecture**: ModernBERT (Large) with extended context support up to 8192 tokens
- **Task**: Token Classification / Hallucination Detection
- **Training Dataset**: [RAGTruth](wandb/RAGTruth-processed)
- **Language**: English
- **Capabilities**: Detects hallucinated spans in answers, provides confidence scores, and calculates average confidence across detected spans.

LettuceDetect excels at processing long documents to determine if an answer aligns with the provided context, making it a powerful tool for ensuring factual accuracy.
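
The capabilities listed above mention confidence scores per detected span and an average confidence across spans. A minimal sketch of that aggregation step is shown here. It assumes each detected span is a dict with `start`, `end`, and `confidence` keys; that shape, and the helper names `average_confidence` and `flagged_text`, are hypothetical and chosen for illustration only, since the post does not specify the library's actual output format.

```python
from typing import List, Optional, Sequence, TypedDict


class Span(TypedDict):
    """Hypothetical shape for one detected hallucinated span (an assumption)."""
    start: int         # character offset where the span begins in the answer
    end: int           # character offset where the span ends (exclusive)
    confidence: float  # model confidence that the span is hallucinated


def average_confidence(spans: Sequence[Span]) -> Optional[float]:
    """Return the mean confidence over all spans, or None if nothing was flagged."""
    if not spans:
        return None
    return sum(span["confidence"] for span in spans) / len(spans)


def flagged_text(answer: str, spans: Sequence[Span]) -> List[str]:
    """Slice the answer at each span's offsets to recover the flagged substrings."""
    return [answer[span["start"]:span["end"]] for span in spans]


if __name__ == "__main__":
    # Made-up example data, not real model output.
    answer = "Paris is in Spain."
    spans: List[Span] = [{"start": 12, "end": 17, "confidence": 0.9}]
    print(flagged_text(answer, spans))
    print(average_confidence(spans))
```

Returning `None` for an empty span list keeps "nothing was flagged" distinct from "flagged with zero confidence", which a plain `0.0` would blur.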
ngxson posted an update 2 months ago
A comprehensive matrix for which format you should use.

Read more in my blog post: https://huggingface.co/blog/ngxson/common-ai-model-formats

| Hardware      | GGUF      | PyTorch             | Safetensors            | ONNX |
|---------------|-----------|---------------------|------------------------|------|
| CPU           | ✅ (best) | 🟡                  | 🟡                     | ✅   |
| GPU           | ✅        | ✅                  | ✅                     | ✅   |
| Mobile        | ✅        | 🟡 (via executorch) | ❌                     | ✅   |
| Apple silicon | ✅        | 🟡                  | ✅ (via MLX framework) | ✅   |
Tonic posted an update 3 months ago
🙋🏻‍♂️ Hey there folks,

Goedel's Theorem Prover is now being demoed on Hugging Face: Tonic/Math

give it a try!
Tonic posted an update 3 months ago
🙋🏻‍♂️ Hey there folks,

our team made a game during the @mistral-game-jam and we're trying to win the community award!

try our game out and drop us a ❤️ like, which is basically a vote for us!

Mistral-AI-Game-Jam/TextToSurvive

hope you like it!
Tonic posted an update 4 months ago
🙋🏻‍♂️ Hey there folks,

Facebook AI just released JASCO models that make music stems.

you can try it out here: Tonic/audiocraft

hope you like it