$MODE=bfloat16 model weights with serialized version (5,1)
#11 opened by RyanL22
We support lite-inference for multiple GPU architectures, primarily in two modes.
MODE=torchscript: All GPUs with PyTorch 2.2+. Inference at float32; slower, but closest to the original model performance.
MODE=bfloat16: Optimized mode for A100 GPUs with PyTorch 2.3. Uses FlashAttention for accelerated inference. Coming soon!
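For reference, a minimal sketch of what the two quoted modes amount to in plain PyTorch. The checkpoint filename and input shape are placeholders, and bfloat16 autocast is used here only as a stand-in: the actual sapiens-lite bfloat16 checkpoints are separate exported artifacts, not autocast wrappers.

```python
import torch

# Hypothetical local filename for the published TorchScript checkpoint.
CKPT = "sapiens_1b_pose.torchscript.pt"

model = torch.jit.load(CKPT, map_location="cuda").eval()

# Placeholder input; adjust the shape to what the actual model expects.
x = torch.randn(1, 3, 1024, 768, device="cuda")

with torch.inference_mode():
    # MODE=torchscript: plain float32 inference, works on any supported GPU.
    out_fp32 = model(x)

    # MODE=bfloat16: the same forward pass under bfloat16 autocast,
    # which is where A100-class GPUs (and FlashAttention) pay off.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        out_bf16 = model(x)
```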
facebook/sapiens-pose-1b-bfloat16 and facebook/sapiens-seg-1b-bfloat16 have a serialized file format version of (5,1), which corresponds to PyTorch 2.1.x. Do you have any plans to upload new model weights with a (8,2) or (7,3) serialized version under PyTorch 2.3.x? Thanks in advance!
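For anyone wanting to check this locally: PyTorch checkpoints saved in the zip-based format carry version records inside the archive, so one way to inspect what a given file was serialized with is to read those records directly. A minimal sketch using only the standard library; the filename is hypothetical, and whether the (5,1) pair above maps one-to-one onto these records is an assumption.

```python
import zipfile

def version_records(path: str) -> dict[str, str]:
    """Collect every 'version'-named record inside a PyTorch zip archive."""
    with zipfile.ZipFile(path) as zf:
        return {
            name: zf.read(name).decode().strip()
            for name in zf.namelist()
            if name.split("/")[-1] == "version"  # e.g. '<root>/version'
        }

# Hypothetical filename; point this at the downloaded checkpoint.
print(version_records("sapiens_pose_1b_bfloat16.pt"))
```

If the files are ordinary TorchScript archives, loading them under a newer PyTorch and re-saving with torch.jit.save would rewrite them in whatever format version that PyTorch produces, which may be a local workaround while waiting for official re-uploads.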
RyanL22 changed discussion status to closed