Not all of the tool-calling features shown above are used by all models. Some use tool call IDs, others simply use the function name and match tool calls to results using the ordering, and there are several models that use neither and only issue one tool call at a time to avoid confusion. If you want your code to be compatible across as many models as possible, we recommend structuring your tool calls like we've shown here, and returning tool results in the order that they were issued by the model. The chat templates on each model should handle the rest.

## Understanding tool schemas

Each function you pass to the `tools` argument of `apply_chat_template` is converted into a JSON schema. These schemas are then passed to the model chat template. In other words, tool-use models do not see your functions directly, and they never see the actual code inside them. What they care about is the function definitions and the arguments they need to pass to them - they care about what the tools do and how to use them, not how they work! It is up to you to read their outputs, detect if they have requested to use a tool, pass their arguments to the tool function, and return the response in the chat.

Generating JSON schemas to pass to the template should be automatic and invisible as long as your functions follow the specification above, but if you encounter problems, or you simply want more control over the conversion, you can handle the conversion manually. Here is an example of a manual schema conversion.

```python
from transformers.utils import get_json_schema

def multiply(a: float, b: float):
    """
    A function that multiplies two numbers

    Args:
        a: The first number to multiply
        b: The second number to multiply
    """
    return a * b

schema = get_json_schema(multiply)
print(schema)
```

This will yield:

```json
{
  "type": "function",
  "function": {
    "name": "multiply",
    "description": "A function that multiplies two numbers",
    "parameters": {
      "type": "object",
      "properties": {
        "a": {
          "type": "number",
          "description": "The first number to multiply"
        },
        "b": {
          "type": "number",
          "description": "The second number to multiply"
        }
      },
      "required": ["a", "b"]
    }
  }
}
```

If you wish, you can edit these schemas, or even write them from scratch yourself without using `get_json_schema` at all. JSON schemas can be passed directly to the `tools` argument of `apply_chat_template` - this gives you a lot of power to define precise schemas for more complex functions. Be careful, though - the more complex your schemas, the more likely the model is to get confused when dealing with them! We recommend simple function signatures where possible, keeping arguments (and especially complex, nested arguments) to a minimum.
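As a concrete illustration of editing a generated schema, here is a short sketch that starts from the `multiply` schema produced by `get_json_schema`, tweaks a couple of descriptions by hand, and passes the result straight to the template. The specific edits are purely illustrative, and `tokenizer` and `messages` are assumed to be defined as in the earlier examples:

```python
from transformers.utils import get_json_schema

def multiply(a: float, b: float):
    """
    A function that multiplies two numbers

    Args:
        a: The first number to multiply
        b: The second number to multiply
    """
    return a * b

# Start from the automatically generated schema...
schema = get_json_schema(multiply)

# ...then edit it by hand. These particular edits are just examples.
schema["function"]["description"] = "Multiply two numbers and return the product."
schema["function"]["parameters"]["properties"]["a"]["description"] = "The first factor"

# Hand-edited schemas are passed to `tools` exactly like plain functions
model_input = tokenizer.apply_chat_template(messages, tools=[schema])
```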
Here is an example of defining schemas by hand, and passing them directly to `apply_chat_template`:

```python
# A simple function that takes no arguments
current_time = {
    "type": "function",
    "function": {
        "name": "current_time",
        "description": "Get the current local time as a string.",
        "parameters": {
            "type": "object",
            "properties": {}
        }
    }
}

# A more complete function that takes two numerical arguments
multiply = {
    "type": "function",
    "function": {
        "name": "multiply",
        "description": "A function that multiplies two numbers",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {
                    "type": "number",
                    "description": "The first number to multiply"
                },
                "b": {
                    "type": "number",
                    "description": "The second number to multiply"
                }
            },
            "required": ["a", "b"]
        }
    }
}

model_input = tokenizer.apply_chat_template(
    messages,
    tools=[current_time, multiply]
)
```

## Advanced: Retrieval-augmented generation

"Retrieval-augmented generation" or "RAG" LLMs can search a corpus of documents for information before responding to a query. This allows models to vastly expand their knowledge base beyond their limited context size. Our recommendation for RAG models is that their template should accept a `documents` argument. This should be a list of documents, where each "document" is a single dict with `title` and `contents` keys, both of which are strings. Because this format is much simpler than the JSON schemas used for tools, no helper functions are necessary.

Here's an example of a RAG template in action:

```python
document1 = {
    "title": "The Moon: Our Age-Old Foe",
    "contents": "Man has always dreamed of destroying the moon. In this essay, I shall"
}

document2 = {
    "title": "The Sun: Our Age-Old Friend",
    "contents": "Although often underappreciated, the sun provides several notable benefits"
}

model_input = tokenizer.apply_chat_template(
    messages,
    documents=[document1, document2]
)
```

## Advanced: How do chat templates work?

The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the default template for that model class is used instead. Let's take a look at the template for `BlenderBot`:

```python
>>> from transformers import AutoTokenizer
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")

>>> tokenizer.default_chat_template
"{% for message in messages %}{% if message['role'] == 'user' %}{{ ' ' }}{% endif %}{{ message['content'] }}{% if not loop.last %}{{ '  ' }}{% endif %}{% endfor %}{{ eos_token }}"
```

That's kind of intimidating. Let's clean it up a little to make it more readable. In the process, though, we also make sure that the newlines and indentation we add don't end up being included in the template output - see the tip on trimming whitespace below!

```
{%- for message in messages %}
    {%- if message['role'] == 'user' %}
        {{- ' ' }}
    {%- endif %}
    {{- message['content'] }}
    {%- if not loop.last %}
        {{- '  ' }}
    {%- endif %}
{%- endfor %}
{{- eos_token }}
```

If you've never seen one of these before, this is a Jinja template. Jinja is a templating language that allows you to write simple code that generates text. In many ways, the code and syntax resembles Python. In pure Python, this template would look something like this:

```python
for idx, message in enumerate(messages):
    if message['role'] == 'user':
        print(' ')
    print(message['content'])
    if not idx == len(messages) - 1:  # Check for the last message in the conversation
        print('  ')
print(eos_token)
```

Effectively, the template does three things:
1. For each message, if the message is a user message, add a blank space before it, otherwise print nothing.
2. Add the message content.
3. If the message is not the last message, add two spaces after it. After the final message, print the EOS token.
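To make that concrete, here is a small sketch of what the rendered string looks like for a short conversation, using the `tokenizer` loaded above. The exact output in the comment is approximate and assumes the checkpoint's EOS token is `</s>`:

```python
messages = [
    {"role": "user", "content": "Hi there!"},
    {"role": "assistant", "content": "Nice to meet you!"},
]

# tokenize=False returns the formatted string instead of token IDs
rendered = tokenizer.apply_chat_template(messages, tokenize=False)
print(rendered)
# Roughly: " Hi there!  Nice to meet you!</s>"
# (a leading space before the user message, two spaces between messages,
#  and the EOS token appended after the final message)
```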
This is a pretty simple template - it doesn't add any control tokens, and it doesn't support "system" messages, which are a common way to give the model directives about how it should behave in the subsequent conversation. But Jinja gives you a lot of flexibility to do those things! Let's see a Jinja template that can format inputs similarly to the way LLaMA formats them (note that the real LLaMA template includes handling for default system messages and slightly different system message handling in general - don't use this one in your actual code!)

```
{%- for message in messages %}
    {%- if message['role'] == 'user' %}
        {{- bos_token + '[INST] ' + message['content'] + ' [/INST]' }}
    {%- elif message['role'] == 'system' %}
        {{- '<<SYS>>\\n' + message['content'] + '\\n<</SYS>>\\n\\n' }}
    {%- elif message['role'] == 'assistant' %}
        {{- ' ' + message['content'] + ' ' + eos_token }}
    {%- endif %}
{%- endfor %}
```

Hopefully if you stare at this for a little bit you can see what this template is doing - it adds specific tokens based on the "role" of each message, which represents who sent it. User, assistant and system messages are clearly distinguishable to the model because of the tokens they're wrapped in.

## Advanced: Adding and editing chat templates

### How do I create a chat template?

Simple, just write a Jinja template and set `tokenizer.chat_template`. You may find it easier to start with an existing template from another model and simply edit it for your needs! For example, we could take the LLaMA template above and add "[ASST]" and "[/ASST]" to assistant messages:

```
{%- for message in messages %}
    {%- if message['role'] == 'user' %}
        {{- bos_token + '[INST] ' + message['content'].strip() + ' [/INST]' }}
    {%- elif message['role'] == 'system' %}
        {{- '<<SYS>>\\n' + message['content'].strip() + '\\n<</SYS>>\\n\\n' }}
    {%- elif message['role'] == 'assistant' %}
        {{- '[ASST] ' + message['content'] + ' [/ASST]' + eos_token }}
    {%- endif %}
{%- endfor %}
```

Now, simply set the `tokenizer.chat_template` attribute. Next time you use [`~PreTrainedTokenizer.apply_chat_template`], it will use your new template! This attribute will be saved in the `tokenizer_config.json` file, so you can use [`~utils.PushToHubMixin.push_to_hub`] to upload your new template to the Hub and make sure everyone's using the right template for your model!

```python
template = tokenizer.chat_template
template = template.replace("SYS", "SYSTEM")  # Change the system token
tokenizer.chat_template = template  # Set the new template
tokenizer.push_to_hub("model_name")  # Upload your new template to the Hub!
```

The method [`~PreTrainedTokenizer.apply_chat_template`] which uses your chat template is called by the [`TextGenerationPipeline`] class, so once you set the correct chat template, your model will automatically become compatible with [`TextGenerationPipeline`].

If you're fine-tuning a model for chat, in addition to setting a chat template, you should probably add any new chat control tokens as special tokens in the tokenizer. Special tokens are never split, ensuring that your control tokens are always handled as single tokens rather than being tokenized in pieces. You should also set the tokenizer's `eos_token` attribute to the token that marks the end of assistant generations in your template. This will ensure that text generation tools can correctly figure out when to stop generating text.
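As a rough sketch of that last point, assuming the `[ASST]` template above and that `tokenizer` and `model` are the tokenizer and model you are fine-tuning, the setup might look like this (the token names are illustrative - adapt them to whatever control tokens your own template uses):

```python
# Register the new chat control tokens as special tokens so the tokenizer
# never splits them into pieces
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["[INST]", "[/INST]", "[ASST]", "[/ASST]"]}
)

# If the vocabulary grew, resize the model's embeddings to match
model.resize_token_embeddings(len(tokenizer))

# If your template ended assistant turns with a custom token instead of the
# existing EOS token, you would also point eos_token at it, e.g.:
# tokenizer.eos_token = "[/ASST]"
```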