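The pirate-style reply below is the kind of assistant message you get back when you pass a chat straight to a text-generation pipeline. A minimal sketch of the call that would produce a response like it looks roughly as follows (the checkpoint, system prompt, and generation settings here are illustrative assumptions):

```python
from transformers import pipeline

# Sketch only: the checkpoint and settings are assumptions for illustration.
pipe = pipeline("text-generation", "HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "system", "content": "You are a friendly chatbot who always responds in the style of a pirate"},
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
# Passing a list of message dicts lets the pipeline apply the chat template for us;
# the last element of the generated text is the new assistant turn.
print(pipe(messages, max_new_tokens=128)[0]["generated_text"][-1])
```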
```text
{'role': 'assistant', 'content': "Matey, I'm afraid I must inform ye that humans cannot eat helicopters. Helicopters are not food, they are flying machines. Food is meant to be eaten, like a hearty plate o' grog, a savory bowl o' stew, or a delicious loaf o' bread. But helicopters, they be for transportin' and movin' around, not for eatin'. So, I'd say none, me hearties. None at all."}
```

The pipeline will take care of all the details of tokenization and calling `apply_chat_template` for you - once the model has a chat template, all you need to do is initialize the pipeline and pass it the list of messages!

## What are "generation prompts"?

You may have noticed that the `apply_chat_template` method has an `add_generation_prompt` argument. This argument tells the template to add tokens that indicate the start of a bot response. For example, consider the following chat:

```python
messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
    {"role": "user", "content": "Can I ask a question?"}
]
```

Here's what this will look like without a generation prompt, using the ChatML template we saw in the Zephyr example:

```python
tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
"""<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
"""
```

And here's what it looks like with a generation prompt:

```python
tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
"""<|im_start|>user
Hi there!<|im_end|>
<|im_start|>assistant
Nice to meet you!<|im_end|>
<|im_start|>user
Can I ask a question?<|im_end|>
<|im_start|>assistant
"""
```

Note that this time, we've added the tokens that indicate the start of a bot response. This ensures that when the model generates text, it will write a bot response instead of doing something unexpected, like continuing the user's message. Remember, chat models are still just language models - they're trained to continue text, and chat is just a special kind of text to them! You need to guide them with appropriate control tokens, so they know what they're supposed to be doing.

Not all models require generation prompts. Some models, like BlenderBot and LLaMA, don't have any special tokens before bot responses. In these cases, the `add_generation_prompt` argument will have no effect. The exact effect that `add_generation_prompt` has will depend on the template being used.

## Can I use chat templates in training?

Yes! We recommend that you apply the chat template as a preprocessing step for your dataset. After this, you can simply continue like any other language model training task. When training, you should usually set `add_generation_prompt=False`, because the added tokens to prompt an assistant response will not be helpful during training.
Let's see an example:

```python
from transformers import AutoTokenizer
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

chat1 = [
    {"role": "user", "content": "Which is bigger, the moon or the sun?"},
    {"role": "assistant", "content": "The sun."}
]
chat2 = [
    {"role": "user", "content": "Which is bigger, a virus or a bacterium?"},
    {"role": "assistant", "content": "A bacterium."}
]

dataset = Dataset.from_dict({"chat": [chat1, chat2]})
dataset = dataset.map(lambda x: {"formatted_chat": tokenizer.apply_chat_template(x["chat"], tokenize=False, add_generation_prompt=False)})
print(dataset['formatted_chat'][0])
```

And we get:

```text
<|user|>
Which is bigger, the moon or the sun?
<|assistant|>
The sun.
```
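From here, training proceeds like any other causal language modelling task. Continuing from the example above, one possible next preprocessing step is to tokenize the `formatted_chat` column before handing the dataset to a trainer; the `max_length` value and the choice to drop the original columns below are assumptions, not part of the example above:

```python
# Sketch of a follow-up step: tokenize the formatted chats for training.
def tokenize_chats(batch):
    # max_length is an assumed value; pick one that fits your model and hardware.
    return tokenizer(batch["formatted_chat"], truncation=True, max_length=2048)

tokenized_dataset = dataset.map(
    tokenize_chats,
    batched=True,
    remove_columns=dataset.column_names,  # keep only the tokenizer outputs
)
print(tokenized_dataset[0].keys())  # e.g. dict_keys(['input_ids', 'attention_mask'])
```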