Advanced: Retrieval-augmented generation
"Retrieval-augmented generation" or "RAG" LLMs can search a corpus of documents for information before responding
to a query. This allows models to vastly expand their knowledge base beyond their limited context size. Our 
recommendation for RAG models is that their template
should accept a documents argument. This should be a list of documents, where each "document"
is a single dict with title and contents keys, both of which are strings. Because this format is much simpler
than the JSON schemas used for tools, no helper functions are necessary.
Here's an example of a RAG template in action:
```python
document1 = {
    "title": "The Moon: Our Age-Old Foe",
    "contents": "Man has always dreamed of destroying the moon. In this essay, I shall"
}

document2 = {
    "title": "The Sun: Our Age-Old Friend",
    "contents": "Although often underappreciated, the sun provides several notable benefits"
}

model_input = tokenizer.apply_chat_template(
    messages,
    documents=[document1, document2]
)
```

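Since the expected format is just a list of dicts with string `title` and `contents` keys, it's easy to check your inputs before passing them along. Here's a minimal sketch of such a check; `validate_documents` is a hypothetical helper for illustration, not part of `transformers`:

```python
def validate_documents(documents):
    """Check that each document is a dict with string 'title' and 'contents' keys."""
    for i, doc in enumerate(documents):
        if not isinstance(doc, dict):
            raise TypeError(f"Document {i} must be a dict, got {type(doc).__name__}")
        for key in ("title", "contents"):
            if not isinstance(doc.get(key), str):
                raise TypeError(f"Document {i} must have a string {key!r} key")


# Passes silently for well-formed documents:
validate_documents([
    {"title": "The Moon: Our Age-Old Foe", "contents": "Man has always dreamed..."},
])
```

Raising early like this gives a clearer error than whatever the template would produce when it encounters a malformed document.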
Advanced: How do chat templates work?
The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the
default template for that model class is used instead. Let's take a look at the template for BlenderBot:
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
tokenizer.default_chat_template
"{% for message in messages %}{% if message['role'] == 'user' %}{{ ' ' }}{% endif %}{{ message['content'] }}{% if not loop.last %}{{ '  ' }}{% endif %}{% endfor %}{{ eos_token }}"
```