
Size mismatch when loading pretrained model from Hugging Face

#3 by Anunay-epfl

Hi, I was trying to load this model with:

import torch
from transformers import AutoModelForCausalLM

model_name = "state-spaces/mamba2-2.7b"
dtype = torch.bfloat16  # or torch.float16
teacher_model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=dtype)

I am getting a size mismatch error; it seems the config on Hugging Face does not match the pretrained weights it is fetching.
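To compare the two sides directly, here is a minimal sketch that prints the config transformers builds the model from and the tensor shapes actually stored in the checkpoint (it assumes the repo ships its weights as pytorch_model.bin; adjust the filename if the repo uses safetensors):

import json
import torch
from huggingface_hub import hf_hub_download

repo = "state-spaces/mamba2-2.7b"

# The config that from_pretrained uses to build the model skeleton
with open(hf_hub_download(repo, "config.json")) as f:
    print(json.load(f))

# The shapes actually stored in the checkpoint (filename is an assumption)
state_dict = torch.load(
    hf_hub_download(repo, "pytorch_model.bin"), map_location="cpu")
for name, tensor in list(state_dict.items())[:10]:
    print(name, tuple(tensor.shape))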
Versions listed below:

mamba-ssm==2.2.2
causal_conv1d (built from source)
flash-attn==2.6.3
peft==0.12.0
huggingface-hub==0.24.5
deepspeed==0.12.2
trl==0.8.6
transformers==4.43.1
triton==2.1.0

base image: nvcr.io/nvidia/pytorch:23.10-py3

I tried the same on Google Colab with no extra installation, just the minimal snippet below, and the same error occurred:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("state-spaces/mamba2-2.7b")

link: https://colab.research.google.com/drive/1f6UEE--ApFTELpMKNDdBN-wEZ7IXwXU5?usp=sharing
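For reference, from_pretrained also accepts ignore_mismatched_sizes=True, which logs every size-mismatched tensor instead of raising. The mismatched weights are randomly re-initialized, so this only helps to see which layers disagree; it is not a fix:

from transformers import AutoModelForCausalLM

# Logs each mismatched tensor instead of raising; those weights are
# randomly re-initialized, so treat the result as a diagnostic only.
model = AutoModelForCausalLM.from_pretrained(
    "state-spaces/mamba2-2.7b", ignore_mismatched_sizes=True)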

Thanks for the help; there aren't many discussions about this online.
