Trouble Loading Evo2 40B Model on 2x A100 GPUs

#1
by RuiHu - opened

Hello guys,

I tried deploying both the 7B and 40B pretrained Evo2 models on my machine, using 2 A100 GPUs.

The 7B model runs pretty well on a single GPU, achieving around 80–90% accuracy. However, the 40B model behaved oddly, as shown in the attached screenshot.

[Image]

I suspect the issue might be related to merging the two shards (part 0 & 1 .pt files).

Here’s the command I used:
CUDA_VISIBLE_DEVICES=2,3 torchrun --nproc_per_node=1 ./test/test_evo2.py --model_name evo2_40b

I also adjusted the following configs:
use_fp8_input_projections: False

model_parallel_size: 4
pipe_parallel_size: 4
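As a rough sanity check on the settings above: in Megatron-style training frameworks, the number of launched processes has to be a multiple of model_parallel_size × pipe_parallel_size. Whether Evo2's inference path actually reads these two settings is not established in this thread, so treat the sketch below as an assumption-laden illustration. `parallel_layout_fits` is a hypothetical helper, not part of any Evo2 API.

```python
def parallel_layout_fits(world_size, model_parallel_size, pipe_parallel_size):
    """Return True if world_size ranks can be split into whole
    (model-parallel x pipeline-parallel) groups.

    Assumption: Megatron-style layout, where one model replica occupies
    model_parallel_size * pipe_parallel_size ranks.
    """
    ranks_per_replica = model_parallel_size * pipe_parallel_size
    return world_size >= ranks_per_replica and world_size % ranks_per_replica == 0


if __name__ == "__main__":
    # Values taken from the post: torchrun --nproc_per_node=1 launches one
    # process, while the config asks for 4-way model and 4-way pipeline parallelism.
    print(parallel_layout_fits(1, 4, 4))  # False
```

Under that assumption, the posted config would need 16 ranks, while the command launches one process on two visible GPUs.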

Has anyone successfully loaded and run the 40B model yet?

Thanks!


Arc Institute org

Hello Rui. Thanks for your question. Turning off FP8 is known to cause issues for the 40B model, which is what you are seeing. Running the 40B model correctly requires an FP8-compatible GPU.
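For anyone wanting to check their own hardware, here is a minimal sketch using PyTorch. It assumes the usual rule that FP8 tensor cores are available from CUDA compute capability 8.9 (Ada) and 9.0 (Hopper) upward; the A100 reports 8.0. `supports_fp8` and `visible_gpu_report` are hypothetical helper names, not part of Evo2.

```python
def supports_fp8(capability):
    """capability is a (major, minor) pair, as returned by
    torch.cuda.get_device_capability().

    Assumption: hardware FP8 needs compute capability >= 8.9.
    """
    return tuple(capability) >= (8, 9)


def visible_gpu_report():
    """List (name, capability, fp8_ok) for each visible CUDA device.

    Returns an empty list if PyTorch or CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    report = []
    for index in range(torch.cuda.device_count()):
        cap = torch.cuda.get_device_capability(index)
        report.append((torch.cuda.get_device_name(index), cap, supports_fp8(cap)))
    return report


if __name__ == "__main__":
    print(supports_fp8((8, 0)))  # A100-class capability: False
    print(supports_fp8((9, 0)))  # H100-class capability: True
    for name, cap, ok in visible_gpu_report():
        print(name, cap, "FP8 ok" if ok else "no FP8")
```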

Hello Garyk,

Thank you for your response. May I ask whether the 40B model's performance issues arise because a large proportion of its parameters are degraded when the FP8 weights are loaded on the A100, while the 7B model contains relatively fewer FP8 layers, making the problem less noticeable?

Arc Institute org

Thanks RuiHu. Not necessarily: the 1B model also appears to be sensitive to FP8.

Sorry for the delayed response, and thanks for this info, it sounds interesting. I’m wondering whether the variation in sensitivity across different model sizes is due to differences in their architectures.
Also, may I ask which dataset or criteria you used to train the function score, and the reasoning behind that choice? I noticed that there are multiple approaches to scoring single nucleotide variant (SNV) mutations, and my results vary significantly across different versions when I run SNV mutation tests.
Apologies for the naive questions. I am brand new to genomics and still learning.
