# Templates for Chat Models

## Introduction

An increasingly common use case for LLMs is chat. In a chat context, rather than continuing a single string of text (as is the case with a standard language model), the model instead continues a conversation that consists of one or more messages, each of which includes a role, like "user" or "assistant", as well as message text.

Much like tokenization, different models expect very different input formats for chat. This is the reason we added chat templates as a feature. Chat templates are part of the tokenizer. They specify how to convert conversations, represented as lists of messages, into a single tokenizable string in the format that the model expects.

Let's make this concrete with a quick example using the BlenderBot model. BlenderBot has an extremely simple default template, which mostly just adds whitespace between rounds of dialogue:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

tokenizer.apply_chat_template(chat, tokenize=False)
```

```text
" Hello, how are you? I'm doing great. How can I help you today? I'd like to show off how chat templating works!"
```

Notice how the entire chat is condensed into a single string. If we use `tokenize=True`, which is the default setting, that string will also be tokenized for us. To see a more complex template in action, though, let's use the `mistralai/Mistral-7B-Instruct-v0.1` model.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

tokenizer.apply_chat_template(chat, tokenize=False)
```

```text
"[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today? [INST] I'd like to show off how chat templating works! [/INST]"
```

Note that this time, the tokenizer has added the control tokens `[INST]` and `[/INST]` to indicate the start and end of user messages (but not assistant messages!). Mistral-instruct was trained with these tokens, but BlenderBot was not.

## How do I use chat templates?

As you can see in the example above, chat templates are easy to use. Simply build a list of messages, with `role` and `content` keys, and then pass it to the [`~PreTrainedTokenizer.apply_chat_template`] method. Once you do that, you'll get output that's ready to go! When using chat templates as input for model generation, it's also a good idea to use `add_generation_prompt=True` to add a generation prompt.
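To make it clearer what the generation prompt actually adds, here is a minimal sketch comparing the formatted output with and without it. It reuses the Zephyr tokenizer from the example below; the toy message and the exact tokens appended are illustrative and will differ between models:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

# A toy single-message conversation, just for illustration
messages = [{"role": "user", "content": "Hi there!"}]

# Without a generation prompt, the formatted string ends after the user turn
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False))

# With a generation prompt, the template appends the tokens that mark the start
# of an assistant reply (an "<|assistant|>" header in Zephyr's case), nudging the
# model to respond as the assistant rather than continuing the user's message
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```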
Here's an example of preparing input for `model.generate()`, using the Zephyr assistant model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "HuggingFaceH4/zephyr-7b-beta"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)  # You may want to use bfloat16 and/or move to GPU here

messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
]
tokenized_chat = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
print(tokenizer.decode(tokenized_chat[0]))
```

This will yield a string in the input format that Zephyr expects:

```text
<|system|>
You are a friendly chatbot who always responds in the style of a pirate
<|user|>
How many helicopters can a human eat in one sitting?
<|assistant|>
```
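From here, generation itself is just a call to `model.generate()` with the tokenized chat. The following is a minimal sketch continuing the example above; the `max_new_tokens` value is an arbitrary illustrative choice:

```python
# Continuing from the Zephyr example above: generate a reply from the prepared input
outputs = model.generate(tokenized_chat, max_new_tokens=128)

# The decoded output contains the formatted prompt followed by the model's reply
print(tokenizer.decode(outputs[0]))
```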