# Templates for Chat Models

## Introduction

An increasingly common use case for LLMs is chat. In a chat context, rather than continuing a single string of text (as is the case with a standard language model), the model instead continues a conversation that consists of one or more messages, each of which includes a role, like "user" or "assistant", as well as message text.

Much like tokenization, different models expect very different input formats for chat. This is the reason we added chat templates as a feature. Chat templates are part of the tokenizer. They specify how to convert conversations, represented as lists of messages, into a single tokenizable string in the format that the model expects.

Let's make this concrete with a quick example using the BlenderBot model. BlenderBot has an extremely simple default template, which mostly just adds whitespace between rounds of dialogue:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

tokenizer.apply_chat_template(chat, tokenize=False)
```

```text
" Hello, how are you? I'm doing great. How can I help you today? I'd like to show off how chat templating works!"
```

Notice how the entire chat is condensed into a single string. If we use `tokenize=True`, which is the default setting, that string will also be tokenized for us. To see a more complex template in action, though, let's use the `mistralai/Mistral-7B-Instruct-v0.1` model.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

tokenizer.apply_chat_template(chat, tokenize=False)
```

```text
"[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today? [INST] I'd like to show off how chat templating works! [/INST]"
```

Note that this time, the tokenizer has added the control tokens `[INST]` and `[/INST]` to indicate the start and end of user messages (but not assistant messages!). Mistral-instruct was trained with these tokens, but BlenderBot was not.

## How do I use chat templates?

As you can see in the example above, chat templates are easy to use. Simply build a list of messages, with `role` and `content` keys, and then pass it to the [`~PreTrainedTokenizer.apply_chat_template`] method. Once you do that, you'll get output that's ready to go! When using chat templates as input for model generation, it's also a good idea to use `add_generation_prompt=True` to add a generation prompt.
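To make it clearer what the generation prompt actually adds, here is a minimal sketch comparing the formatted output with and without it. It reuses the Zephyr tokenizer from the example below; the toy message and the exact tokens appended are illustrative and will differ between models:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

# A toy single-message conversation, just for illustration
messages = [{"role": "user", "content": "Hi there!"}]

# Without a generation prompt, the formatted string ends after the user turn
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False))

# With a generation prompt, the template appends the tokens that mark the start
# of an assistant reply (an "<|assistant|>" header in Zephyr's case), nudging the
# model to respond as the assistant rather than continuing the user's message
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```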
Here's an example of preparing input for `model.generate()`, using the Zephyr assistant model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # You may want to use bfloat16 and/or move to GPU here

messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenizer.decode(tokenized_chat[0]))
```

This will yield a string in the input format that Zephyr expects:

```text
<|system|>
You are a friendly chatbot who always responds in the style of a pirate
<|user|>
How many helicopters can a human eat in one sitting?
<|assistant|>
```
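From here, generation itself is just a call to `model.generate()` with the tokenized chat. The following is a minimal sketch continuing the example above; the `max_new_tokens` value is an arbitrary illustrative choice:

```python
# Continuing from the Zephyr example above: generate a reply from the prepared input
outputs = model.generate(tokenized_chat, max_new_tokens=128)

# The decoded output contains the formatted prompt followed by the model's reply
print(tokenizer.decode(outputs[0]))
```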