The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters • 2.56k
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional at agent tasks, long-context, and thinking • 6 items • Updated 26 days ago • 66
nvidia/Llama-Nemotron-Post-Training-Dataset Updated 11 days ago • 3.91M • 11.3k • 466
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 144
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 103
Tulu 3 Datasets Collection All datasets released with Tulu 3: state-of-the-art open post-training recipes. • 33 items • Updated 8 days ago • 80
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens Paper • 2410.13863 • Published Oct 17, 2024 • 38
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF Text Generation • Updated 25 days ago • 20k • 2.04k
Law of the Weakest Link: Cross Capabilities of Large Language Models Paper • 2409.19951 • Published Sep 30, 2024 • 55