💡 DICE - a sail Collection

sail 's Collections

🚀 Active PRM

🌾Oat-Zero: Understanding R1-Zero-Like Training

🔱 Sailor2 Language Models

🧬 RegMix: Data Mixture as Regression

📈 Scaling Laws with Vocabulary

⚓️ Sailor Language Models

💡 DICE

updated Jul 28, 2024

Self-alignment with DPO Implicit Rewards