What are the estimates based on?
The estimates seem to be off for a lot of these models, especially the smaller ones. I've run many of these models on both iOS and Android devices, and if some of them took 2% of the battery on each run, there's absolutely no way I would have had a phone with any battery charge left at NeurIPS. Is the repo for the Space available anywhere?
Good comment! Energy consumption is highly correlated with the hardware used. For Qwen 2.5 7B, the estimate is a real measurement taken on the GPU (an NVIDIA L4) through NVML. For the other models it's just an estimate assuming similar hardware, so improvements to the estimates can absolutely be made. The repo used for deploying Qwen 2.5 7B is here: https://github.com/JulienDelavande/text-generation-inference. It is based on the TGI repo for deploying LLMs at scale.
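For anyone curious about the measurement side: NVML reports instantaneous power draw rather than energy, so energy has to be integrated over the duration of a generation. A minimal sketch of that integration step (the `energy_joules` helper and the sample format are illustrative, not taken from the linked repo; in a real setup each sample would come from `pynvml.nvmlDeviceGetPowerUsage`):

```python
def energy_joules(samples):
    """Trapezoidal integration of (timestamp_s, power_w) samples into joules.

    NVML gives power in milliwatts at a point in time; sampling it at a
    fixed interval during inference and integrating yields the energy
    consumed by that run.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (t1 - t0) * (p0 + p1) / 2.0
    return total


# Example: a constant 100 W draw sampled once per second for 10 s.
samples = [(t, 100.0) for t in range(11)]
joules = energy_joules(samples)
print(joules)           # 1000.0 J
print(joules / 3600.0)  # ~0.278 Wh
```

Per-request energy like this is what lets the Space attach a number to each generation; extrapolating the same figure to a phone SoC is where the estimates get rough, since mobile hardware draws far less power than an L4.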