Collection of 217,510 Scalable Vector Graphics (SVG) icons featuring:
- Sourced from SVGRepo.com across diverse categories & styles - Includes metadata: title, tags, source collection, and specific license - Contains minified SVG markup for direct use or processing - Organized into splits based on individual icon license (e.g., MIT, CC0, Apache)
Collection of fine-tuned bilingual language models featuring: - Models in three parameter sizes: 135M, 360M, and 1.7B based on HuggingFaceTB's SmolLM2 models - Both standard and GGUF formats for flexible deployment in llama.cpp and Ollama - Fine-tuned on nyuuzyou/EagleSFT dataset (536,231 Russian-English QA pairs derived from 739k+ real user queries) - Experimental Russian language capabilities while maintaining English performance - Limited Russian capabilities due to SFT-only approach without Russian pre-training - Environmental impact: ~19.75 kg CO2eq
This collection provides compact models for research on bilingual language capabilities, resource-constrained environments, and educational applications. Not recommended for production use due to experimental nature and inherent limitations. Available under Apache 2.0 license.
Collection of 536,231 question-answer pairs featuring:
- Human-posed questions and machine-generated responses for SFT - Bilingual content in Russian and English with linked IDs - Derived from 739k+ real user queries, primarily educational topics - Includes unique IDs and machine-generated category labels
This dataset provides a resource for supervised fine-tuning (SFT) of large language models, cross-lingual research, and understanding model responses to diverse user prompts. Released to the public domain under CC0 1.0 license.