nikravan/glm-4vq · Sample example does not work.

I am very hardly trying to run your sample code.
At first I had a problem with the dependencies but I solved it following your advice on this forum.
Now I got two different problems. First when I run the sample code I got

$ python test2.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:02<00:00, 2.42it/s]
/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/transformers/generation/utils.py:1659: UserWarning: You are calling .generate() with the input_ids being on a device type different than your model's device. input_ids is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put input_ids to the correct device by calling for example input_ids = input_ids.to('cuda') before running .generate().
warnings.warn(
Traceback (most recent call last):
File "/home/me_l/TestOCR/test2.py", line 36, in
outputs = model.generate(**inputs, **gen_kwargs)
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1758, in generate
result = self._sample(
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2397, in _sample
outputs = self(
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/accelerate/hooks.py", line 175, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/me_l/.cache/huggingface/modules/transformers_modules/nikravan/glm-4vq/e441477369dc88ad0ab225d9cd69db0291e2dc7b/modeling_chatglm.py", line 1017, in forward
transformer_outputs = self.transformer(
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me_l/.cache/huggingface/modules/transformers_modules/nikravan/glm-4vq/e441477369dc88ad0ab225d9cd69db0291e2dc7b/modeling_chatglm.py", line 866, in forward
new_input_embeds.append(torch.cat(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)

So I changed the device_map="auto" to device_map={"": 0} and then deleted that completely
Now I got a different error

could you help me how to run this code?