Dataset-Tools (Dataset Tools)

posted an update 1 day ago

Post

1517

Well, here’s the updated version with the 20,000+ entry sampled dataset for Watermark Filter Content Moderation models incl. [Food25, Weather, Watermark, Marathi/Hindi Sign Language Detection], post-trained from the base models: sigLip2 patch16 224 — now with mixed aspect ratios for better performance and reduced misclassification. 🔥

Models :
➮ Watermark-Detection : prithivMLmods/Watermark-Detection-SigLIP2
⌨︎ Watermark Detection & Batch Image Processing Experimentals, Colab Notebook : https://colab.research.google.com/drive/1mlQrSsSjkGimUt0VyRi3SoWMv8OMyvw3?usp=drive_link
➮ Weather-Image-Classification : prithivMLmods/Weather-Image-Classification
➮ TurkishFoods-25 : prithivMLmods/TurkishFoods-25
➮ Marathi-Sign-Language-Detection : prithivMLmods/Marathi-Sign-Language-Detection
➮ Hindi-Sign-Language-Detection : prithivMLmods/Hindi-Sign-Language-Detection

Datasets :
Watermark : qwertyforce/scenery_watermarks
Weather : prithivMLmods/WeatherNet-05-18039
Turkish Foods 25 : yunusserhat/TurkishFoods-25
Marathi Sign Language : VinayHajare/Marathi-Sign-Language
Hindi Sign Language : Vedant3907/Hindi-Sign-Language-Dataset

Collection : prithivMLmods/content-filters-siglip2-vit-68197e3357d4de18fb3b4d2b

prithivMLmods

posted an update 5 days ago

Post

974

The new versions of Midjourney Mix adapters have been dropped in stranger zone hf. These adapters excel in studio lighting portraits and painterly styles, trained using the style of strangerzonehf/Flux-Midjourney-Mix2-LoRA. They leverage 24-bit colored synthetic images generated form midjourney v6 to achieve high-quality image reproducibility and support adaptable aspect ratios, using Flux.1 as the base model. 🥳

Models [ ⌗ ]

> Flux-Midjourney-Painterly-LoRA : strangerzonehf/Flux-Midjourney-Painterly-LoRA
> Flux-Midjourney-Studio-LoRA : strangerzonehf/Flux-Midjourney-Studio-LoRA

> Collection : strangerzonehf/midjourney-mix-3-ft-flux1-dev-68165d58a2a08025852d63f3

> Space : prithivMLmods/FLUX-LoRA-DLC2

The best dimensions and inference settings for optimal results are as follows: A resolution of 1280 x 832 with a 3:2 aspect ratio is recommended for the best quality, while 1024 x 1024 with a 1:1 aspect ratio serves as the default option. For inference, the recommended number of steps ranges between 30 and 35 to achieve optimal output.

fdaudens

posted an update 6 days ago

Post

2901

Forget everything you know about transcription models - NVIDIA's parakeet-tdt-0.6b-v2 changed the game for me!

Just tested it with Steve Jobs' Stanford speech and was speechless (pun intended). The video isn’t sped up.

3 things that floored me:
- Transcription took just 10 seconds for a 15-min file
- Got a CSV with perfect timestamps, punctuation & capitalization
- Stunning accuracy (correctly captured "Reed College" and other specifics)

NVIDIA also released a demo where you can click any transcribed segment to play it instantly.

The improvement is significant: number 1 on the ASR Leaderboard, 6% error rate (best in class) with complete commercial freedom (cc-by-4.0 license).

Time to update those Whisper pipelines! H/t @Steveeeeeeen for the finding!

Model: nvidia/parakeet-tdt-0.6b-v2
Demo: nvidia/parakeet-tdt-0.6b-v2
ASR Leaderboard: hf-audio/open_asr_leaderboard

1 reply

·

fdaudens

posted an update 7 days ago

Post

452

I just gave my chatbots a massive upgrade: they can now generate audio from text, modify images — you name it. Here’s how:

The Gradio team shipped MCP support. That means you can plug any AI app built with it into Claude or Cursor using the Model Context Protocol (MCP) — think of it like a USB port for LLMs.

I put it to the test:
- Whipped up a quick text-to-speech app with Kokoro on HF (with an LLM riding shotgun, naturally)
- Added "mcp_server=True" in the code
- Connected it to Claude

Now I can generate audio from any text. The possibilities are next-level: you can potentially plug any of the 500K+ AI apps on Hugging Face to your favorite LLM.

Is this the new UI for AI?

- My tts app (feel free to use/duplicate it): fdaudens/kokoro-mcp
- Blog post: https://huggingface.co/blog/gradio-mcp

alielfilali01

posted an update 7 days ago

Post

444

Great efforts from @AtlasIA folks to adapt text2image models (ghibli style) for Moroccan Context

Read the blog is here : https://huggingface.co/blog/atlasia/creating-your-custom-ghibli-text-to-image-model

fdaudens

posted an update 8 days ago

Post

1713

Want to know which AI models are least likely to hallucinate — and how to keep yours from spiking hallucinations by 20%?

A new benchmark called Phare, by Giskard, tested leading models across multiple languages, revealing three key findings:

1️⃣ Popular models aren't necessarily factual. Some models ranking highest in user satisfaction benchmarks like LMArena are actually more prone to hallucination.

2️⃣ The way you ask matters - a lot. When users present claims confidently ("My teacher said..."), models are 15% less likely to correct misinformation vs. neutral framing ("I heard...").

3️⃣ Telling models to "be concise" can increase hallucination by up to 20%.

What's also cool is that the full dataset is public - use them to test your own models or dive deeper into the results! H/t @davidberenstein1957 for the link.

- Study: https://www.giskard.ai/knowledge/good-answers-are-not-necessarily-factual-answers-an-analysis-of-hallucination-in-leading-llms
- Leaderboard: https://phare.giskard.ai/
- Dataset: giskardai/phare

prithivMLmods

posted an update 8 days ago

Post

1796

Dropping downstream tasks using newly initialized parameters and weights supports domain-specific image classification post-training, based on the SigLIP-2 models: Patch-16/224, Patch-16/256, and Patch-32/256. For more details, please refer to the respective model cards : 🤗

+ watermark detection : prithivMLmods/Watermark-Detection-SigLIP2
+ resisc45 : prithivMLmods/RESISC45-SigLIP2
+ pacs dg : prithivMLmods/PACS-DG-SigLIP2
+ 3d printed or not : prithivMLmods/3D-Printed-Or-Not-SigLIP2
+ formula or text : prithivMLmods/Formula-Text-Detection

Categorizing Un-Safe Content :
- explicit content patch16 256 : prithivMLmods/siglip2-x256-explicit-content
- explicit content patch32 256 : prithivMLmods/siglip2-x256p32-explicit-content

Collection :
> SigLIP2 Content Filters 042025 Final : https://huggingface.co/collections/prithivMLmods/siglip2-content-filters-04202-final-680fe4aa1a9d589bf2c915ff
> SigLIP2 : google/siglip2-67b5dcef38c175486e240107
> SigLIP2 Multilingual Vision-Language Encoders : https://arxiv.org/pdf/2502.14786

prithivMLmods

posted an update 13 days ago

Post

2201

Bringing out style-intermixing adapters for Flux.Dev, including Aura Glow, Fallen Ink Art, Cardboard Paper Arts, Black & White Expressions, and Glitter Gem Touch. For more details, visit the model card of the LoRA. 🥳

╰┈➤Demo : prithivMLmods/FLUX-LoRA-DLC2 & prithivMLmods/FLUX-LoRA-DLC

╰┈➤ Adapters :
+ Aura Glow : strangerzonehf/2DAura-Flux
+ Fallen Ink Art : strangerzonehf/FallenArt-Flux
+ Black & White Expressions : strangerzonehf/BnW-Expressions-Flux
+ Glitter Gem Touch : strangerzonehf/Gem-Touch-LoRA-Flux
+ Cardboard Paper Arts v1 : strangerzonehf/Flux-Cardboard-Art-LoRA
+ Cardboard Paper Arts v2 : strangerzonehf/Cardboard-v2-Flux

╰┈➤ Pages :
- Repository Page :

strangerzonehf
- Collection : strangerzonehf/mixer-adp-042025-68095c365d9d1072c8d860be
- Flux Ultimate LoRA Collection : strangerzonehf/Flux-Ultimate-LoRA-Collection
- By prithivMLmods : @prithivMLmods

The best dimensions and inference settings for optimal results are as follows: A resolution of 1280 x 832 with a 3:2 aspect ratio is recommended for the best quality, while 1024 x 1024 with a 1:1 aspect ratio serves as the default option. For inference, the recommended number of steps ranges between 30 and 35 to achieve optimal output.

prithivMLmods

posted an update 14 days ago

Post

1190

Dropping the domain-specific downstream image classification content moderation models, including the anime image type classification, GeoSceneNet, indoor-outdoor scene classification, and black-and-white vs. colored image classification models, along with the datasets. 🔥

╰┈➤Models :
+ GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet
+ IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet
+ B&W vs Colored : prithivMLmods/BnW-vs-Colored-Detection
+ Anime Image Type : prithivMLmods/Anime-Classification-v1.0
+ Multilabel Portrait : prithivMLmods/Multilabel-Portrait-SigLIP2

╰┈➤Datasets :
- GeoSceneNet : prithivMLmods/Multilabel-GeoSceneNet-16K
- IndoorOutdoorNet : prithivMLmods/IndoorOutdoorNet-20K
- BnW vs Colored : prithivMLmods/BnW-vs-Colored-10K
- Multilabel Portrait : prithivMLmods/Multilabel-Portrait-18K

╰┈➤Collections :
> Multilabel Image Classification Datasets : prithivMLmods/multilabel-image-classification-datasets-6809aa64637f45d4c47fa6ca
> Model Collection : prithivMLmods/siglip2-content-filters-models-v2-68053a958c42ef17a3a3f4d1

Note: The anime scene type dataset is not mentioned in the list because it is private and only accessible to members of the DeepGHS organization.

For raw ZIP files or more information about the datasets, visit: https://www.kaggle.com/prithivsakthiur/datasets

1 reply

·

fdaudens

posted an update 15 days ago

Post

2121

@thomwolf and @m-ric teaming up as the perfect instructor duo for DeepLearning.ai’s new course: Building Code Agents with Hugging Face smolagents!

https://www.deeplearning.ai/short-courses/building-code-agents-with-hugging-face-smolagents/

davanstrien

posted an update 15 days ago

Post

1948

Came across a very nice submission from @marcodsn for the reasoning datasets competition (https://huggingface.co/blog/bespokelabs/reasoning-datasets-competition).

The dataset distils reasoning chains from arXiv research papers in biology and economics. Some nice features of the dataset:

- Extracts both the logical structure AND researcher intuition from academic papers
- Adopts the persona of researchers "before experiments" to capture exploratory thinking
- Provides multi-short and single-long reasoning formats with token budgets - Shows 7.2% improvement on MMLU-Pro Economics when fine-tuning a 3B model

It's created using the Curator framework with plans to scale across more scientific domains and incorporate multi-modal reasoning with charts and mathematics.

I personally am very excited about datasets like this, which involve creativity in their creation and don't just rely on $$$ to produce a big dataset with little novelty.

Dataset can be found here: marcodsn/academic-chains (give it a like!)

prithivMLmods

posted an update 20 days ago

Post

2830

Dropping an entire collection of Style Intermixing Adapters on StrangerZone HF — including Realism, Anime, Sketch, Texture-Rich 3D Experimentals, Automotive Concept Images, and LoRA models based on Flux.1, SD 3.5 Turbo/Large, Stable Diffusion XL 🎨

╰┈➤Collection :
➜ sketch : strangerzonehf/sketch-fav-675ba869c7ceaec7e652ee1c
➜ sketch2 : strangerzonehf/q-series-sketch-678e3503bf3a661758429717
➜ automotive : strangerzonehf/automotive-3d-675bb31a491d8c264d45d843
➜ texture 3d : strangerzonehf/flux-3dxl-engine-674833c14a001d5b1fdb5139
➜ super 3d : strangerzonehf/super-3d-engine-6743231d69f496df97addd2b
➜ style mix : strangerzonehf/mixer-engine-673582c9c5939d8aa5bf9533
➜ realism : strangerzonehf/realism-engine-67343495b6daf0fbdb904cc1

╰┈➤The Entire Collection :
➜ flux.1 : prithivMLmods/flux-lora-collections-66dd5908be2206cfaa8519be
➜ flux-ultimate-lora-collection : strangerzonehf/Flux-Ultimate-LoRA-Collection
➜ sd 3.5 large / turbo : prithivMLmods/sd-35-large-lora-671b39d7bc2e7f71a446b163
➜ sdxl : prithivMLmods/sdxl-dev-models-667803a6d5ac75b59110e527

╰┈➤Pages :
➜ page 1:

strangerzonehf
➜ page 2: @prithivMLmods
➜ demo : prithivMLmods/FLUX-LoRA-DLC

.🤗

1024m

authored a paper 21 days ago

Robust and Fine-Grained Detection of AI Generated Texts

Paper • 2504.11952 • Published 22 days ago • 11

fdaudens

posted an update 22 days ago

Post

1561

Just tested something this morning that feels kind of game-changing for how we publish, discover, and consume news with AI: connecting Claude directly to the New York Times through MCP.

Picture this: You ask Claude about a topic, and it instantly pulls verified and trusted NYT content — no more guessing if the info is accurate.

The cool part? Publishers stay in control of what they share via API, and users get fast, reliable access through the AI tools they already use. Instead of scraping random stuff off the web, we get a future where publishers actively shape how their journalism shows up in AI.

It’s still a bit technical to set up right now, but this could get super simple soon — like installing apps on your phone, but for your chatbot. And you keep the brand connection, too.

Not saying it solves everything, but it’s definitely a new way to distribute content — and maybe even find some fresh value in the middle of this whole news + AI shakeup. Early movers will have a head start.

Curious what folks think — could MCPs be a real opportunity for journalism?

1 reply

·

prithivMLmods

posted an update 22 days ago

Post

2548

Try out the demo for Multimodal OCR featuring the implementation of models including RolmOCR and Qwen2VL OCR. The use case showcases image-text-to-text conversion and video understanding support for the RolmOCR model ! 🚀

🤗Multimodal OCR Space : prithivMLmods/Multimodal-OCR

📦The models implemented in this Space are:
+ Qwen2VL OCR : prithivMLmods/Qwen2-VL-OCR-2B-Instruct [ or ]
+ Qwen2VL OCR2 : prithivMLmods/Qwen2-VL-OCR2-2B-Instruct
+ RolmOCR : reducto/RolmOCR

Qwen2VL OCR supports only image-text-to-text in the space.

1024m

authored a paper 23 days ago

Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance

Paper • 2504.09753 • Published 24 days ago • 4

fdaudens

posted an update 27 days ago

Post

2138

Want AI that truly understands your country's culture? Public institutions are sitting on the next AI revolution - and here's the practical guide to unlock it.

I've had fascinating conversations recently about sovereign AI, with people trying to solve this recurring question: "How do we build AI that truly understands our culture?"

This guide by @evijit and @yjernite brings lots of insights about this question. It's not just about throwing data at models. It's about partnering cultural expertise with tech infrastructure in ways we're just starting to figure out.

An example? The National Library of Norway already has 150+ AI models on Hugging Face. They're not just digitizing books - they're building AI that thinks in Norwegian, understands Norwegian values, and serves Norwegian citizens.

This is sovereign AI in practice: technology that understands your culture, values, and languages.

Especially loved the practical examples on how to do this:
- Real examples from museums, libraries, and government agencies
- How to convert complex documents (PDFs, PowerPoints) into ML-ready formats
- Code templates for processing public data
- Technical recipes for sharing datasets on open platforms

The stakes? Citizens' ability to leverage their collective digital intelligence.

The technology is ready. The infrastructure exists. The guide shows exactly how to use it. What's needed is your cultural expertise to shape these tools.

Check it out: https://huggingface.co/blog/evijit/public-org-data-ai

P.s.: Building cool projects in a public institution? Share them in the comments for others to learn from!

fdaudens

posted an update 28 days ago

Post

2832

Do chatbots lie about Céline Dion? We now have answers, not speculation.

Ai2 just released OLMoTrace and it's a game-changer for transparency. You can literally see where an AI's responses come from in its training data - in real time.

The demo shows results about Céline. So I tried it out myself! Watch what happens in the video.

For journalists, researchers studying hallucinations and anyone who needs to trust their AI, this is like getting X-ray vision into AI systems. When the model made claims, I could instantly verify them against original sources. When it hallucinated, I could see why.

You can finally 1) understand how LLMs actually work and 2) verify if what they're saying is true. No more blind trust.

This pushes the open data movement to the next level.

👉 Blog post: https://allenai.org/blog/olmotrace
👉 Paper: https://www.datocms-assets.com/64837/1743890415-olmotrace.pdf

P.S.: A word of caution: never use a chatbot as a knowledge base. It's not Google. Better use it with a connection to the internet.

1 reply

·

1024m

authored a paper 28 days ago

Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

Paper • 2504.07072 • Published 29 days ago • 8

fdaudens

posted an update 28 days ago

Post

4109

🎨 Designers, meet OmniSVG! This new model helps you create professional vector graphics from text/images, generate editable SVGs from icons to detailed characters, convert rasters to vectors, maintain style consistency with references, and integrate into your workflow.

@OmniSVG

2 replies

·

Dataset Tools

AI & ML interests

Recent Activity

Dataset-Tools's activity

Robust and Fine-Grained Detection of AI Generated Texts

Improving Multilingual Capabilities with Cultural and Local Knowledge in Large Language Models While Enhancing Native Performance

Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

AI & ML interests

Recent Activity

Team members 37

Dataset-Tools's activity