Zhaolin Gao
GitBag
AI & ML interests
Reinforcement Learning from Human Feedback
Recent Activity
updated a dataset about 2 hours ago: GitBag/open_r1_mar2_round_1_tokenized_DeepSeek-R1-Distill-Qwen-1.5B
updated a model about 4 hours ago: GitBag/block-q-sharp_ds-distilled-qwen-1.5b-ppo-kl-1e-4-ec-0.001-14336_critic
updated a model about 4 hours ago: GitBag/block-q-sharp_ds-distilled-qwen-1.5b-ppo-kl-1e-4-ec-0.001-14336_actor