NAS question

#9
by khoantap - opened

Thank you for the great work! Would you mind sharing the number of GPUs used for NAS to come up with the 49B architecture? A rough estimate would be great!

NVIDIA org

Thank you for your comment! The NAS part of our algorithm is not very compute intensive. Scoring each block requires only forward passes, which for this model amounts to something on the order of 300 GPU hours. The heavier parts are local block distillation and global knowledge distillation. More details on the general approach can be found here:
https://arxiv.org/abs/2411.19146
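
Since the question was about GPU count, here is a rough way to turn the ~300 GPU-hour scoring figure into wall-clock time. This is only a back-of-the-envelope sketch: the GPU counts below are hypothetical (the actual number used is not stated in this thread), the helper `wall_clock_hours` is an illustrative name, and it assumes perfect linear scaling, which real jobs rarely achieve.

```python
def wall_clock_hours(gpu_hours: float, num_gpus: int) -> float:
    """Idealized wall-clock hours, assuming perfect linear scaling across GPUs."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return gpu_hours / num_gpus


# Order-of-magnitude figure quoted in the reply above (block scoring only;
# it does not include local block distillation or global knowledge distillation).
SCORING_GPU_HOURS = 300

if __name__ == "__main__":
    # Hypothetical cluster sizes, chosen purely for illustration.
    for n in (8, 64, 256):
        hours = wall_clock_hours(SCORING_GPU_HOURS, n)
        print(f"{n:>3} GPUs -> ~{hours:.1f} wall-clock hours")
```

The same budget can therefore correspond to very different GPU counts depending on how long the scoring job was allowed to run.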

This is very helpful. Thank you

khoantap changed discussion status to closed