NAS question
#9
by
khoantap
- opened
Thank you for the great work! Would you mind sharing the number of GPUs used for NAS to come up with the 49B architecture? A rough estimate would be great!
Thank you for your comment! The NAS part of our algorithm is not very compute-intensive. Scoring each block requires only forward passes, which for this model amounts to something on the order of 300 GPU hours. The heavier parts are local block distillation and global knowledge distillation. More details on the general approach can be found here:
https://arxiv.org/abs/2411.19146
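Since the question was about GPU count: a GPU-hour figure does not pin down the number of GPUs by itself, only the product of GPUs and time. As a rough sketch, the conversion could look like the following. It assumes perfect linear scaling across GPUs (an idealization), and the cluster sizes are hypothetical examples, not numbers from this thread.

```python
def wall_clock_hours(gpu_hours: float, num_gpus: int) -> float:
    """Idealized wall-clock time, assuming the work splits evenly across GPUs."""
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    return gpu_hours / num_gpus


# Order-of-magnitude figure quoted above for the block-scoring step only.
SCORING_GPU_HOURS = 300

# Hypothetical cluster sizes chosen for illustration (assumption, not from the thread).
for n in (8, 64, 256):
    hours = wall_clock_hours(SCORING_GPU_HOURS, n)
    print(f"{n} GPUs -> roughly {hours:.1f} wall-clock hours")
```

This covers the scoring step only; the distillation stages mentioned above are described as heavier and are not included in the 300 GPU-hour figure.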
This is very helpful. Thank you!