nyuuzyou

AI & ML interests

None yet

Recent Activity

Organizations

Social Post Explorers · AI Starter Pack

nyuuzyou's activity

replied to their post about 21 hours ago

Sorry, I replied to the wrong comment, or it was deleted. The reply was meant for a different, aggressive comment.

reacted to DawnC's post with 🔥 about 22 hours ago
I'm excited to introduce VisionScout, an interactive vision tool that makes computer vision both accessible and powerful! 👀🔍

What can VisionScout do right now?
🖼️ Upload any image and detect 80 different object types using YOLOv8.
🔄 Instantly switch between Nano, Medium, and XLarge models depending on your speed vs. accuracy needs.
🎯 Filter specific classes (people, vehicles, animals, etc.) to focus only on what matters to you.
📊 View detailed statistics about detected objects, confidence levels, and spatial distribution.
🎨 Enjoy a clean, intuitive interface with responsive design and enhanced visualizations.

What's next?
I'm working on exciting updates:
- Support for more models
- Video processing and object tracking across frames
- Faster real-time detection
- Improved mobile responsiveness

The goal is to build a complete but user-friendly vision toolkit for both beginners and advanced users.

Try it yourself! 🚀
DawnC/VisionScout

I'd love to hear your feedback: what features would you find most useful? Any specific use cases you'd love to see supported?

Give it a try and let me know your thoughts in the comments! Stay tuned for future updates.

#ComputerVision #ObjectDetection #YOLO #MachineLearning #TechForLife
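
For a sense of what the detection flow described in this post looks like in code, here is a minimal sketch built on the ultralytics package. It is an illustration only, not DawnC's actual implementation; the model file names, input image, and confidence threshold are assumptions.

```python
# Minimal YOLOv8 detection sketch (an assumption, not DawnC's actual code).
from ultralytics import YOLO

# yolov8n.pt / yolov8m.pt / yolov8x.pt mirror the Nano/Medium/XLarge trade-off.
model = YOLO("yolov8n.pt")

# Hypothetical input image and confidence threshold.
results = model("street.jpg", conf=0.25)

for result in results:
    for box in result.boxes:
        name = model.names[int(box.cls)]
        print(f"{name}: conf={float(box.conf):.2f}, box={box.xyxy[0].tolist()}")
```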
replied to their post about 23 hours ago
posted an update 1 day ago
🖼️ SVGRepo Icons Dataset - nyuuzyou/svgrepo

Collection of 217,510 Scalable Vector Graphics (SVG) icons featuring:

- Sourced from SVGRepo.com across diverse categories & styles
- Includes metadata: title, tags, source collection, and specific license
- Contains minified SVG markup for direct use or processing
- Organized into splits based on individual icon license (e.g., MIT, CC0, Apache)
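
To poke at the data, a minimal loading sketch with the 🤗 datasets library is shown below; the split name and field names are assumptions based on the description above, so check the dataset card for the exact schema.

```python
from datasets import load_dataset

# "mit" is an assumed license-based split name; see the dataset card for the real split list.
icons = load_dataset("nyuuzyou/svgrepo", split="mit")

example = icons[0]
print(sorted(example))                    # field names (title, tags, license, svg markup, ...)
print(str(example.get("svg", ""))[:200])  # "svg" is an assumed column name
```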
replied to their post 3 days ago
reacted to JingzeShi's post with 🚀🚀 4 days ago
@SmallDoge SmallTalks (SmallDoge/SmallTalks) is a synthetic dataset designed for supervised fine-tuning of language models. The dataset covers a variety of conversational content, including daily conversations, tool usage, Python programming, encyclopedia Q&A, exam problem-solving, logical reasoning, and more. Each task is provided in both English and Chinese versions.
reacted to aiqtech's post with 🔥 8 days ago
🌐 AI Token Visualization Tool with Perfect Multilingual Support

Hello! Today I'm introducing my Token Visualization Tool with comprehensive multilingual support. This web-based application allows you to see how various Large Language Models (LLMs) tokenize text.

aiqtech/LLM-Token-Visual

✨ Key Features

🤖 Multiple LLM Tokenizers: Support for Llama 4, Mistral, Gemma, Deepseek, QWQ, BERT, and more
🔄 Custom Model Support: Use any tokenizer available on HuggingFace
📊 Detailed Token Statistics: Analyze total tokens, unique tokens, compression ratio, and more
🌈 Visual Token Representation: Each token assigned a unique color for visual distinction
📂 File Analysis Support: Upload and analyze large files

🌏 Powerful Multilingual Support
The most significant advantage of this tool is its perfect support for all languages:

📝 Asian languages including Korean, Chinese, and Japanese fully supported
🔤 RTL (right-to-left) languages like Arabic and Hebrew supported
🈺 Special characters and emoji tokenization visualization
🧩 Compare tokenization differences between languages
💬 Mixed multilingual text processing analysis

🚀 How It Works

Select your desired tokenizer model (predefined or HuggingFace model ID)
Input multilingual text or upload a file for analysis
Click 'Analyze Text' to see the tokenized results
Visually understand how the model breaks down various languages with color-coded tokens

💡 Benefits of Multilingual Processing
Understanding multilingual text tokenization patterns helps you:

Optimize prompts that mix multiple languages
Compare token efficiency across languages (e.g., English vs. Korean vs. Chinese token usage)
Predict token usage for internationalization (i18n) applications
Optimize costs for multilingual AI services

🛠️ Technology Stack

Backend: Flask (Python)
Frontend: HTML, CSS, JavaScript (jQuery)
Tokenizers: 🤗 Transformers library
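
The statistics the tool reports (total tokens, unique tokens, compression ratio) are straightforward to reproduce locally. Here is a minimal sketch with 🤗 Transformers; it is an illustration rather than the Space's own code, and "compression ratio" is assumed here to mean input characters per token.

```python
from transformers import AutoTokenizer

def token_stats(model_id: str, text: str) -> dict:
    """Tokenize `text` with any Hub tokenizer and report simple statistics."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokens = tokenizer.tokenize(text)
    return {
        "total_tokens": len(tokens),
        "unique_tokens": len(set(tokens)),
        # Assumed definition of compression ratio: input characters per token.
        "chars_per_token": len(text) / max(len(tokens), 1),
    }

# Compare token efficiency across languages, as suggested above.
for sample in ["Hello, world!", "안녕하세요, 세계!", "你好，世界！"]:
    print(sample, token_stats("bert-base-multilingual-cased", sample))
```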
reacted to openfree's post with 🔥 9 days ago
🧠 ThinkFlow: The Revolutionary Platform That Gives LLMs the Power to Think 🚀

Hello AI community! We're excited to introduce you to ThinkFlow, an innovative service that transforms how language models solve problems. 🎉
VIDraft/ThinkFlow-llama

✨ What is ThinkFlow?
ThinkFlow is a groundbreaking platform that automatically applies step-by-step reasoning capabilities to existing LLM models without any modifications. It makes complex problem-solving transparent, allowing you to witness the model's thought process in real-time.

🔍 Key Features

Reasoning Without Model Modifications: Add step-by-step reasoning while utilizing existing LLMs as they are ⚙️
Visualized Thinking Process: See exactly how the model analyzes and solves problems 👁️
Before & After Comparison: Compare standard responses with reasoning-enhanced outputs in real-time 📊
Improved Accuracy: Deliver more accurate solutions for complex math and logic problems 📈
Educational Value: Teach students systematic approaches to problem-solving 👨‍🏫
User-Friendly Interface: Intuitive and easy-to-use UI for seamless experience 🖥️

💡 What Problems Can It Solve?
ThinkFlow is particularly effective for various domains including:

Complex mathematical problems 🧮
Logic puzzles 🧩
Questions requiring multi-step reasoning 🤔
Scientific analysis challenges 🔬
Complex decision-making processes 📝

👨‍💻 Technical Details
ThinkFlow is built on the meta-llama/Llama-3.1-8B-Instruct model and uses carefully designed prompt chains to guide the model through step-by-step thinking. Each reasoning step builds upon the results of previous steps, culminating in a comprehensive final answer.
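
As a rough illustration of the prompt-chaining approach described above, here is a minimal, hypothetical sketch in which each reasoning step's output is appended to the context for the next step. It is not the actual ThinkFlow code; the step prompts, generation settings, and use of the 🤗 Transformers pipeline are assumptions.

```python
from transformers import pipeline

# Hypothetical prompt chain, not the actual ThinkFlow implementation.
generator = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")

question = "A train travels 120 km in 1.5 hours. What is its average speed?"
steps = [
    "Restate the problem and list the known quantities.",
    "Choose an approach and explain why it applies.",
    "Carry out the calculation step by step.",
    "State the final answer and sanity-check it.",
]

context = f"Problem: {question}"
for instruction in steps:
    messages = [{"role": "user", "content": f"{context}\n\nNext step: {instruction}"}]
    output = generator(messages, max_new_tokens=256)[0]["generated_text"]
    reply = output[-1]["content"]  # last chat message is the model's reply
    context += f"\n\n### {instruction}\n{reply}"

print(context)
```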

💬 Join Our Community!
If you have questions or suggestions about ThinkFlow, join our Discord community: https://discord.gg/openfreeai
Let's build better AI reasoning experiences together! 💪

#AI #LLM #ReasoningAI #ThinkFlow #HuggingFace #OpenSource #AIEducation
posted an update 9 days ago
🦅 SmolLM2-Eagle Collection - nyuuzyou/smollm2-eagle-680263bf97f0c7e6bbe4936b

Collection of fine-tuned bilingual language models featuring:
- Models in three parameter sizes (135M, 360M, and 1.7B), based on HuggingFaceTB's SmolLM2 models
- Both standard and GGUF formats for flexible deployment in llama.cpp and Ollama
- Fine-tuned on nyuuzyou/EagleSFT dataset (536,231 Russian-English QA pairs derived from 739k+ real user queries)
- Experimental Russian language capabilities while maintaining English performance
- Limited Russian capabilities due to SFT-only approach without Russian pre-training
- Environmental impact: ~19.75 kg CO2eq

This collection provides compact models for research on bilingual language capabilities, resource-constrained environments, and educational applications. Not recommended for production use due to experimental nature and inherent limitations. Available under Apache 2.0 license.
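
A minimal sketch of trying one of the standard-format models with 🤗 Transformers is shown below; the exact repository name is an assumption, so check the collection page for the real model IDs (the GGUF variants are meant for llama.cpp or Ollama instead).

```python
from transformers import pipeline

# Assumed repository name inside the collection; verify the exact model IDs there.
chat = pipeline("text-generation", model="nyuuzyou/SmolLM2-360M-Eagle")

messages = [{"role": "user", "content": "Переведи на английский: «Спасибо за помощь!»"}]
print(chat(messages, max_new_tokens=64)[0]["generated_text"][-1]["content"])
```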
reacted to philschmid's post with 🔥 10 days ago
Gemini 2.5 Flash is here! We're excited to launch our first hybrid reasoning Gemini model. In 2.5 Flash, developers can turn thinking off.

**TL;DR:**
- 🧠 Controllable "thinking" with a thinking budget of up to 24k tokens
- 🌌 1 million token multimodal input context for text, image, video, audio, and PDF
- 🛠️ Function calling, structured output, Google Search & code execution
- 🏦 $0.15 per 1M input tokens; $0.60 (thinking off) or $3.50 (thinking on) per 1M output tokens (thinking tokens are billed as output tokens)
- 💡 Knowledge cutoff of January 2025
- 🚀 Rate limits: free tier 10 RPM, 500 requests/day
- 🏅 Outperforms 2.0 Flash on every benchmark

Try it ⬇️
https://aistudio.google.com/prompts/new_chat?model=gemini-2.5-flash-preview-04-17
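
A minimal sketch of controlling the thinking budget from Python with the google-genai SDK follows; treat the field names as assumptions and check the official docs for the current API surface.

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",
    contents="Explain in two sentences why the sky is blue.",
    config=types.GenerateContentConfig(
        # thinking_budget=0 turns thinking off; a positive value (up to ~24k) enables it.
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)
```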
posted an update 11 days ago
🦅 EagleSFT Dataset - nyuuzyou/EagleSFT

Collection of 536,231 question-answer pairs featuring:

- Human-posed questions and machine-generated responses for SFT
- Bilingual content in Russian and English with linked IDs
- Derived from 739k+ real user queries, primarily educational topics
- Includes unique IDs and machine-generated category labels

This dataset provides a resource for supervised fine-tuning (SFT) of large language models, cross-lingual research, and understanding model responses to diverse user prompts. Released to the public domain under CC0 1.0 license.
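
Since the Russian and English records share linked IDs, they can be paired for cross-lingual experiments. The sketch below illustrates the idea; the configuration names and column names are assumptions, so consult the dataset card for the real schema.

```python
from datasets import load_dataset

# Assumed configuration names ("ru", "en") and columns ("id", "question").
ru = load_dataset("nyuuzyou/EagleSFT", "ru", split="train")
en = load_dataset("nyuuzyou/EagleSFT", "en", split="train")

# Pair a few records across languages via the shared IDs (small slice for illustration).
en_by_id = {row["id"]: row for row in en.select(range(1_000))}
for row in ru.select(range(1_000)):
    match = en_by_id.get(row["id"])
    if match:
        print(row["question"], "<->", match["question"])
        break
```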
reacted to neph1's post with 👍 12 days ago
replied to merve's post 13 days ago

Multimodal reasoning is the way to go, especially with open source, congrats to the Moonshot team

posted an update 15 days ago
🇷🇺 Russian Forum Messages Dataset - nyuuzyou/ruforum

Collection of approximately 58 million Russian forum messages featuring:

- Complete message content from Russian online forums spanning 2010-2025
- Comprehensive metadata including unique message IDs and timestamps
- Full text content preserving original user discussions and interactions
- Monolingual dataset focused exclusively on Russian language content

This dataset offers a unique textual archive of Russian online conversations suitable for text generation, sentiment analysis, and language modeling research. Released to the public domain under CC0 1.0 license.
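
With roughly 58 million messages, streaming is usually the practical way to explore the data before committing to a full download. A minimal sketch follows; the field names are assumptions, so check the dataset card.

```python
from datasets import load_dataset

# Stream instead of downloading ~58M messages up front; field names are assumed.
forum = load_dataset("nyuuzyou/ruforum", split="train", streaming=True)

for i, message in enumerate(forum):
    print(message)  # expect something like an id, a timestamp, and the message text
    if i == 2:
        break
```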
replied to piper2024's post 15 days ago

Hello, Hugging Face uses git (or the backwards-compatible Xet storage) for storing files. When you upload a new version of a file, git doesn't overwrite the old one. Instead, it stores both versions: the new version becomes the current one, while the old version remains accessible in your history. That is why repositories grow over time.

replied to piper2024's post 16 days ago

@piper2024 Yes. Open the repository you want to delete files from, go to Settings, and under “Storage Usage” click “List LFS files”. There you can select multiple files and delete them at once.
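
For anyone who would rather script this than click through the UI, recent versions of huggingface_hub expose the same operation. The sketch below is illustrative and assumes a sufficiently new library version and a hypothetical repository; double-check the docs before running it, since LFS deletion is irreversible.

```python
from huggingface_hub import HfApi

api = HfApi(token="hf_...")       # placeholder token with write access
repo_id = "username/my-dataset"   # hypothetical repository

# List LFS files and permanently delete the ones that are no longer needed.
lfs_files = api.list_lfs_files(repo_id=repo_id, repo_type="dataset")
to_delete = [f for f in lfs_files if f.filename.endswith(".bin")]  # example filter
api.permanently_delete_lfs_files(repo_id=repo_id, lfs_files=to_delete, repo_type="dataset")
```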

replied to piper2024's post 16 days ago
replied to etemiz's post 17 days ago

It's long been my view that LMArena isn't a fully reliable measure of real-world LLM performance. I suspect many users might click somewhat randomly, perhaps favoring answers based on superficial qualities like length, formatting, or speed, rather than deeper assessment.

Since all the Arena dialogues are publicly available on Hugging Face, a crowdsourced evaluation system utilizing that data seems like it could be quite valuable. It would also be interesting to see more development in automated evaluation systems, perhaps along the lines of "Arena-Hard-Auto" (though keeping such systems updated and robust is a challenge). However, building an effective automated evaluator would likely require training a specialized model on a large corpus, because I'm fairly certain that using a current powerful model like GPT-4-Turbo (or any other) for evaluation would introduce bias, favoring responses that align with its own style.