COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published Apr 7 • 44
meta-llama/Llama-4-Scout-17B-16E-Instruct Image-Text-to-Text • Updated 30 days ago • 829k • 879
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback Paper • 2503.22230 • Published Mar 28 • 44