size mismatch when loading pretrained model from huggingface
#3 by Anunay-epfl - opened
Hi, I was trying to load this model with:

```python
model_name = "state-spaces/mamba2-2.7b"
teacher_model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=dtype)
```
I am getting a size mismatch error; it seems the config on Hugging Face does not match the pretrained weights being fetched.
Versions listed below:
- mamba==2.2.2
- causal_conv1d (built from source)
- flash-attn==2.6.3
- peft==0.12.0
- huggingface-hub==0.24.5
- deepspeed==0.12.2
- trl==0.8.6
- transformers==4.43.1
- triton==2.1.0
- base image: nvcr.io/nvidia/pytorch:23.10-py3
I tried the same on Google Colab with no extra installation, just the basic imports below, and the same error occurred:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba2-2.7b")
```
link: https://colab.research.google.com/drive/1f6UEE--ApFTELpMKNDdBN-wEZ7IXwXU5?usp=sharing
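In case it helps narrow things down, a small helper can diff the checkpoint's tensor shapes against the shapes the freshly initialized model expects. This is a minimal sketch: the parameter name and shapes below are made-up stand-ins; in practice you would build the two dicts from the downloaded state dict and from the model constructed from the Hub config.

```python
def find_shape_mismatches(checkpoint_shapes, expected_shapes):
    """Return (name, checkpoint_shape, expected_shape) for every parameter
    present in both dicts whose shapes disagree."""
    return [
        (name, ckpt_shape, expected_shapes[name])
        for name, ckpt_shape in checkpoint_shapes.items()
        if name in expected_shapes and ckpt_shape != expected_shapes[name]
    ]

# Made-up stand-ins; in practice these could be built with e.g.
#   {k: tuple(v.shape) for k, v in torch.load(ckpt_path, map_location="cpu").items()}
# and the same comprehension over model.state_dict().
checkpoint = {"backbone.embedding.weight": (50288, 2560)}
expected = {"backbone.embedding.weight": (50277, 2560)}
print(find_shape_mismatches(checkpoint, expected))
# → [('backbone.embedding.weight', (50288, 2560), (50277, 2560))]
```

Listing the mismatching tensor names this way would at least show whether the disagreement is only in the embedding/vocab dimension or spread across the whole state dict.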
Thanks for the help; there are not many discussions about this online.