✨ M2-Base: 3.5TB web data (EN/ZH), with LLM-augmented content, Apache 2.0 ✨ M2-CoT: 4.2TB of auto-synthesized CoT reasoning data ✨ M2-Extra: domain-specific knowledge
DeepSeek, Alibaba, Skywork, Xiaomi, ByteDance… And those are just some of the companies from the Chinese community that released open models in April 🤯
🎬 Video > MAGI-1 by SandAI > SkyReels-A2 & SkyReels-V2 by Skywork > Wan2.1-FLF2V by Alibaba-Wan
🎨 Image > HiDream-I1 by Vivago AI > Kimi-VL by Moonshot AI > InstantCharacter by InstantX & Tencent-Hunyuan > Step1X-Edit by StepFun > EasyControl by Shanghai Jiaotong University
🧠 Reasoning > MiMo by Xiaomi > Skywork-R1V 2.0 by Skywork > ChatTS by ByteDance > Kimina by Moonshot AI & Numina > GLM-Z1 by Zhipu AI > Skywork OR1 by Skywork > Kimi-VL-Thinking by Moonshot AI
🔊 Audio > Kimi-Audio by Moonshot AI > IndexTTS by BiliBili > MegaTTS3 by ByteDance > Dolphin by DataOceanAI
🔢 Math > DeepSeek Prover V2 by DeepSeek
🌍 LLM > Qwen by Alibaba-Qwen > InternVL3 by Shanghai AI Lab > Ernie4.5 (demo) by Baidu
📊 Dataset > PHYBench by Eureka-Lab > ChildMandarin & Seniortalk by BAAI
I am fascinated by models learning from prompts and rewards - no example answers needed, unlike in Supervised Fine-Tuning.
After the DeepSeek boom, everyone is trying GRPO with GSM8K or the Countdown Game...
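For context, the core idea behind GRPO can be sketched in a few lines: for each prompt you sample a group of completions, and each completion's advantage is its reward relative to the group. This is an illustrative sketch only (clipping, KL terms, and the policy update are omitted), not DeepSeek's implementation:

```python
# Illustrative sketch of GRPO's group-relative advantage (not DeepSeek's code).
# For one prompt, sample a group of completions, score each with a reward
# function, and normalize each reward within the group.

def group_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each completion = (reward - group mean) / group std."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5 or 1.0  # avoid dividing by zero when all rewards tie
    return [(r - mean) / std for r in rewards]

# Completions that beat the group average get a positive advantage and are
# reinforced; below-average ones are discouraged.
print(group_advantages([1.0, 0.0, 0.0, 1.0]))  # → [1.0, -1.0, -1.0, 1.0]
```

No value model is needed: the group itself serves as the baseline, which is part of why GRPO is so approachable for small experiments.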
I wanted a different challenge, like 𝘁𝗲𝗮𝗰𝗵𝗶𝗻𝗴 𝗮 𝗺𝗼𝗱𝗲𝗹 𝘁𝗼 𝗰𝗿𝗲𝗮𝘁𝗲 𝗮 𝘀𝗰𝗵𝗲𝗱𝘂𝗹𝗲 𝗳𝗿𝗼𝗺 𝗮 𝗹𝗶𝘀𝘁 𝗼𝗳 𝗲𝘃𝗲𝗻𝘁𝘀 𝗮𝗻𝗱 𝗽𝗿𝗶𝗼𝗿𝗶𝘁𝗶𝗲𝘀.
Choosing an original problem forced me to: 🤔 Think about the problem setting 🧬 Generate data 🤏 Choose the right base model 🏆 Design reward functions (and experience reward hacking firsthand) 🔄 Run multiple rounds of training, hoping that my model would learn something.
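To give a feel for the reward-design step, here is a hypothetical sketch for a schedule-from-events task (not the actual functions I used): a verifiable reward that checks the generated schedule has no overlapping events and covers the priority events. Verifiable checks like these are harder to game than "longer schedules score higher"-style rubrics, which invite reward hacking via filler events:

```python
# Hypothetical reward function for the scheduling task (illustrative only).
# A schedule is a list of (start_hour, end_hour, event_name) tuples.

def schedule_reward(schedule, priority_events):
    """Reward = fraction of priority events covered, zeroed if events overlap."""
    # Penalize overlaps: sort by start time and compare adjacent pairs.
    ordered = sorted(schedule)
    for (s1, e1, _), (s2, e2, _) in zip(ordered, ordered[1:]):
        if s2 < e1:  # next event starts before the previous one ends
            return 0.0
    scheduled = {name for _, _, name in schedule}
    covered = sum(1 for ev in priority_events if ev in scheduled)
    return covered / len(priority_events)

demo = [(9, 10, "standup"), (10, 12, "deep work"), (13, 14, "1:1")]
print(schedule_reward(demo, ["standup", "1:1"]))  # → 1.0
```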
Kimi-Audio 🚀🎧 an OPEN audio foundation model released by Moonshot AI moonshotai/Kimi-Audio-7B-Instruct ✨ 7B ✨ 13M+ hours of pretraining data ✨ Novel hybrid input architecture ✨ Universal audio capabilities (ASR, AQA, AAC, SER, SEC/ASC, end-to-end conversation)
🔥 Announcing FLUX-Juiced: The Fastest Image Generation Endpoint (2.6x faster)!
Optimisations are widely applied and can reduce inference time, but their impact on quality often remains unclear. So we decided to challenge the status quo and create our own optimised version of FLUX.1 [dev], called FLUX-juiced.
it's based on Orpheus - but really, the model is irrelevant, as I focus mostly on data augmentation / prep / pipelining - it's just the way to show progress
it should be able to express itself fine even in an SFW context
probably the last release for a few weeks, as I go back to the data pipeline and improve things there…
in the meantime, please do test it and report problems or enjoyable generations you find - we have a growing Discord community, and I love to see what you get out of this early release!
(a small Colab is provided on the model page if you don't have the GPU to run it yourself)
RealHarm: A Collection of Real-World Language Model Application Failures
I'm David from Giskard, and we work on securing your agents. Today, we are launching RealHarm: a dataset of real-world problematic interactions with AI agents, drawn from publicly reported incidents.
✨ Skywork-OR1-Math-7B > Optimized for math reasoning ✨ Skywork-OR1-7B-preview > Excels in math & coding ✨ Skywork-OR1-32B-preview > Matches DeepSeek-R1 on math (AIME24/25) and coding (LiveCodeBench)
Released under the Apache 2.0 license 🥳 Final version coming in 2 weeks!