Sample code does not work.
I am trying very hard to run your sample code.
At first I had a problem with the dependencies, but I solved it by following your advice on this forum.
Now I am running into two different problems. First, when I run the sample code I get:
$ python test2.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|████████████████████████| 5/5 [00:02<00:00, 2.42it/s]
/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/transformers/generation/utils.py:1659: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cpu, whereas the model is on cuda. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example `input_ids = input_ids.to('cuda')` before running `.generate()`.
  warnings.warn(
Traceback (most recent call last):
File "/home/me_l/TestOCR/test2.py", line 36, in
outputs = model.generate(**inputs, **gen_kwargs)
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/transformers/generation/utils.py", line 1758, in generate
result = self._sample(
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/transformers/generation/utils.py", line 2397, in _sample
outputs = self(
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/accelerate/hooks.py", line 175, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/me_l/.cache/huggingface/modules/transformers_modules/nikravan/glm-4vq/e441477369dc88ad0ab225d9cd69db0291e2dc7b/modeling_chatglm.py", line 1017, in forward
transformer_outputs = self.transformer(
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
File "/home/me_l/.cache/huggingface/modules/transformers_modules/nikravan/glm-4vq/e441477369dc88ad0ab225d9cd69db0291e2dc7b/modeling_chatglm.py", line 866, in forward
new_input_embeds.append(torch.cat(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)
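For reference, here is roughly what my test2.py looks like. It is adapted from the sample code on the model card, so please treat the exact prompt, file names, and generation settings as placeholders for my local setup:

import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "nikravan/glm-4vq"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(  # line 10 in the traceback
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map="auto",  # the setting I changed later, see below
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

image = Image.open("test.jpg").convert("RGB")  # my own test image
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "image": image, "content": "describe this image"}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
)
# note: I do not move `inputs` to the GPU here, which explains the
# UserWarning about `input_ids` being on cpu in the log above

gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)  # line 36 in the traceback
    outputs = outputs[:, inputs["input_ids"].shape[1]:]
    print(tokenizer.decode(outputs[0]))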
So I changed device_map="auto" to device_map={"": 0}, and then also tried removing the device_map argument completely.
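Concretely, only the device_map argument changed from the sketch above:

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map={"": 0},  # pin every module to the first GPU (I also tried omitting this line)
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)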
Either way, I now get a different error:
$ python test2.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Loading checkpoint shards: 100%|████████████████████████| 5/5 [00:02<00:00, 2.28it/s]
Traceback (most recent call last):
File "/home/me_l/TestOCR/test2.py", line 10, in
model = AutoModelForCausalLM.from_pretrained(
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 558, in from_pretrained
return model_class.from_pretrained(
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3820, in from_pretrained
dispatch_model(model, **device_map_kwargs)
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/accelerate/big_modeling.py", line 502, in dispatch_model
model.to(device)
File "/home/me_l/TestOCR/SecondVenv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2702, in to
raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
Could you help me figure out how to run this code?
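In case it is relevant: one workaround I am considering, but have not tested yet, is hiding the second GPU so that device_map="auto" can only place weights on a single device, e.g. at the very top of test2.py:

import os
# must run before torch initializes CUDA, i.e. before `import torch`
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

(The same can be done from the shell with CUDA_VISIBLE_DEVICES=0 python test2.py.) Is that the intended way to run the sample on a multi-GPU machine, or is there a proper fix?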