# How to Create a Chatbot with Gradio

Tags: NLP, TEXT, CHAT

## Introduction

Chatbots are a popular application of large language models. Using `gradio`, you can easily build a demo of your chatbot model and share it with your users, or try it yourself using an intuitive chatbot UI.

This tutorial uses `gr.ChatInterface()`, a high-level abstraction that allows you to create your chatbot UI quickly, often with a single line of code. The chatbot interface that we create will look something like this:

$demo_chatinterface_streaming_echo

We'll start with a couple of simple examples, and then show how to use `gr.ChatInterface()` with real language models from several popular APIs and libraries, including `langchain`, `openai`, and Hugging Face.

**Prerequisites**: please make sure you are using the **latest version** of Gradio:

```bash
$ pip install --upgrade gradio
```

## Defining a chat function

When working with `gr.ChatInterface()`, the first thing you should do is define your chat function. Your chat function should take two arguments: `message` and then `history` (the arguments can be named anything, but they must be in this order).

- `message`: a `str` representing the user's input.
- `history`: a `list` of `list` representing the conversation up until that point. Each inner list consists of two `str` representing a pair: `[user input, bot response]`.

Your function should return a single string response, which is the bot's response to the particular user input `message`. Your function can take into account the `history` of messages, as well as the current message.
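
For instance, a minimal chat function has the following shape (the function name and response here are purely illustrative):

```python
def chat_fn(message, history):
    # history is a list of [user input, bot response] pairs, e.g.
    # [["Hi!", "Hello, how can I help?"], ["What's 2 + 2?", "4"]]
    return f"You've sent {len(history) + 1} messages so far."
```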

Let's take a look at a few examples.

## Example: a chatbot that responds yes or no

Let's write a chat function that responds `Yes` or `No` randomly.

Here's our chat function:

```python
import random

def random_response(message, history):
    return random.choice(["Yes", "No"])
```

Now, we can plug this into `gr.ChatInterface()` and call the `.launch()` method to create the web interface:

```python
import gradio as gr

gr.ChatInterface(random_response).launch()
```

That's it! Here's our running demo; try it out:

$demo_chatinterface_random_response

## Another example using the user's input and history

Of course, the previous example was very simplistic: it didn't even take user input or the previous history into account! Here's another simple example showing how to incorporate a user's input as well as the history.

```python
import gradio as gr

def alternatingly_agree(message, history):
    if len(history) % 2 == 0:
        return f"Yes, I do think that '{message}'"
    else:
        return "I don't think so"

gr.ChatInterface(alternatingly_agree).launch()
```

## Streaming chatbots

If you use `yield` in your chat function to generate a sequence of responses, you'll end up with a streaming chatbot. It's that simple!

```python
import time
import gradio as gr

def slow_echo(message, history):
    for i in range(len(message)):
        time.sleep(0.3)
        yield "You typed: " + message[: i + 1]

gr.ChatInterface(slow_echo).launch()
```

Notice that we've [enabled queuing](/guides/key-features#queuing), which is required to use generator functions. While the response is streaming, the "Submit" button turns into a "Stop" button that can be used to stop the generator function. You can customize the appearance and behavior of the "Stop" button using the `stop_btn` parameter, as in the sketch below.
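
For example, here's a minimal sketch that relabels the stop button (reusing the `slow_echo` function from above):

```python
gr.ChatInterface(slow_echo, stop_btn="Stop generating").launch()
```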

## Customizing your chatbot

If you're familiar with Gradio's `Interface` class, `gr.ChatInterface` includes many of the same arguments that you can use to customize the look and feel of your chatbot. For example, you can:

- add a title and description above your chatbot using the `title` and `description` arguments.
- add a theme or custom CSS using the `theme` and `css` arguments, respectively.
- add `examples` and even enable `cache_examples`, which makes it easier for users to try out your demo.
- change the text of, or disable, each of the buttons that appear in the chatbot interface: `submit_btn`, `retry_btn`, `undo_btn`, `clear_btn`.

If you want to customize the `gr.Chatbot` or `gr.Textbox` that compose the `ChatInterface`, you can pass in your own chatbot or textbox components as well. Here's an example of how we can use these parameters:

```python
import gradio as gr

def yes_man(message, history):
    if message.endswith("?"):
        return "Yes"
    else:
        return "Ask me anything!"

gr.ChatInterface(
    yes_man,
    chatbot=gr.Chatbot(height=300),
    textbox=gr.Textbox(placeholder="Ask me a yes or no question", container=False, scale=7),
    title="Yes Man",
    description="Ask Yes Man any question",
    theme="soft",
    examples=["Hello", "Am I cool?", "Are tomatoes vegetables?"],
    cache_examples=True,
    retry_btn=None,
    undo_btn="Delete Previous",
    clear_btn="Clear",
).launch()
```

## Additional Inputs

You may want to add additional parameters to your chatbot and expose them to your users through the chatbot UI. For example, suppose you want to add a textbox for a system prompt, or a slider that sets the number of tokens in the chatbot's response. The `ChatInterface` class supports an `additional_inputs` parameter which can be used to add additional input components.

The `additional_inputs` parameter accepts a component or a list of components. You can pass the component instances directly, or use their string shortcuts (e.g. `"textbox"` instead of `gr.Textbox()`). If you pass in component instances, and they have _not_ already been rendered, then the components will appear underneath the chatbot (and any examples) within a `gr.Accordion()`. You can set the label of this accordion using the `additional_inputs_accordion_name` parameter.
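
As a quick sketch of how these parameters fit together (the chat function, system prompt, and accordion label here are arbitrary):

```python
import gradio as gr

def echo_with_prompt(message, history, system_prompt):
    # The extra argument maps to the component in additional_inputs
    return f"({system_prompt}) You said: {message}"

gr.ChatInterface(
    echo_with_prompt,
    additional_inputs=[gr.Textbox("You are a helpful AI.", label="System Prompt")],
    additional_inputs_accordion_name="Chat Settings",
).launch()
```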

Here's a complete example:

$code_chatinterface_system_prompt

If the components you pass into `additional_inputs` have already been rendered in a parent `gr.Blocks()`, then they will _not_ be re-rendered in the accordion. This provides flexibility in deciding where to lay out the input components. In the example below, we position the `gr.Textbox()` on top of the chatbot UI, while keeping the slider underneath.

```python
import gradio as gr
import time

def echo(message, history, system_prompt, tokens):
    response = f"System prompt: {system_prompt}\n Message: {message}."
    for i in range(min(len(response), int(tokens))):
        time.sleep(0.05)
        yield response[: i + 1]

with gr.Blocks() as demo:
    system_prompt = gr.Textbox("You are helpful AI.", label="System Prompt")
    slider = gr.Slider(10, 100, render=False)

    gr.ChatInterface(
        echo, additional_inputs=[system_prompt, slider]
    )

demo.launch()
```

If you need to create something even more custom, then it's best to construct the chatbot UI using the low-level `gr.Blocks()` API. We have [a dedicated guide for that here](/guides/creating-a-custom-chatbot-with-blocks).
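
As a taste of the `gr.Blocks()` approach, here's a minimal sketch of an echo chatbot built directly from a `gr.Chatbot` and a `gr.Textbox` (the `respond` function is just a stand-in for a real model):

```python
import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot()
    msg = gr.Textbox()

    def respond(message, chat_history):
        # Append the [user message, bot response] pair, then clear the textbox
        chat_history.append((message, "You typed: " + message))
        return "", chat_history

    msg.submit(respond, [msg, chatbot], [msg, chatbot])

demo.launch()
```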

## Using your chatbot via an API

Once you've built your Gradio chatbot and are hosting it on [Hugging Face Spaces](https://hf.space) or somewhere else, you can query it with a simple API at the `/chat` endpoint. The endpoint just expects the user's message (and potentially additional inputs, if you have set any using the `additional_inputs` parameter) and will return the response, internally keeping track of the messages sent so far.

[](https://github.com/gradio-app/gradio/assets/1778297/7b10d6db-6476-4e2e-bebd-ecda802c3b8f)

To use the endpoint, you should use either the [Gradio Python Client](/guides/getting-started-with-the-python-client) or the [Gradio JS client](/guides/getting-started-with-the-js-client).
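
For instance, here's a minimal sketch using the Python client (the Space URL below is a hypothetical placeholder):

```python
from gradio_client import Client

client = Client("https://your-username-your-chatbot.hf.space")  # hypothetical Space URL
result = client.predict("Hello!", api_name="/chat")
print(result)
```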

## A `langchain` example

Now, let's actually use `gr.ChatInterface` with some real large language models. We'll start by using `langchain` on top of `openai` to build a general-purpose streaming chatbot application in about 20 lines of code. You'll need to have an OpenAI key for this example (keep reading for the free, open-source equivalent!).

```python
import os

from langchain.chat_models import ChatOpenAI
from langchain.schema import AIMessage, HumanMessage
import gradio as gr

os.environ["OPENAI_API_KEY"] = "sk-..."  # Replace with your key

llm = ChatOpenAI(temperature=1.0, model='gpt-3.5-turbo-0613')

def predict(message, history):
    history_langchain_format = []
    for human, ai in history:
        history_langchain_format.append(HumanMessage(content=human))
        history_langchain_format.append(AIMessage(content=ai))
    history_langchain_format.append(HumanMessage(content=message))
    gpt_response = llm(history_langchain_format)
    return gpt_response.content

gr.ChatInterface(predict).launch()
```

## A streaming example using `openai`

Of course, we could also use the `openai` library directly. Here's a similar example, but this time with streaming results as well:

```python
import openai
import gradio as gr

openai.api_key = "sk-..."  # Replace with your key

def predict(message, history):
    history_openai_format = []
    for human, assistant in history:
        history_openai_format.append({"role": "user", "content": human})
        history_openai_format.append({"role": "assistant", "content": assistant})
    history_openai_format.append({"role": "user", "content": message})

    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=history_openai_format,
        temperature=1.0,
        stream=True
    )

    partial_message = ""
    for chunk in response:
        if len(chunk['choices'][0]['delta']) != 0:
            partial_message = partial_message + chunk['choices'][0]['delta']['content']
            yield partial_message

gr.ChatInterface(predict).launch()
```

## Example using a local, open-source LLM with Hugging Face

Of course, in many cases you want to run a chatbot locally. Here's the equivalent example using Together's RedPajama model from Hugging Face (this requires you to have a GPU with CUDA).

```python
import gradio as gr
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, StoppingCriteria, StoppingCriteriaList, TextIteratorStreamer
from threading import Thread

tokenizer = AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-3B-v1")
model = AutoModelForCausalLM.from_pretrained("togethercomputer/RedPajama-INCITE-Chat-3B-v1", torch_dtype=torch.float16)
model = model.to('cuda:0')

class StopOnTokens(StoppingCriteria):
    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        stop_ids = [29, 0]
        for stop_id in stop_ids:
            if input_ids[0][-1] == stop_id:
                return True
        return False

def predict(message, history):
    history_transformer_format = history + [[message, ""]]
    stop = StopOnTokens()

    # Format the conversation in the <human>/<bot> format the model expects
    messages = "".join(["".join(["\n<human>:" + item[0], "\n<bot>:" + item[1]])
                        for item in history_transformer_format])

    model_inputs = tokenizer([messages], return_tensors="pt").to("cuda")
    streamer = TextIteratorStreamer(tokenizer, timeout=10., skip_prompt=True, skip_special_tokens=True)
    generate_kwargs = dict(
        model_inputs,
        streamer=streamer,
        max_new_tokens=1024,
        do_sample=True,
        top_p=0.95,
        top_k=1000,
        temperature=1.0,
        num_beams=1,
        stopping_criteria=StoppingCriteriaList([stop])
    )
    # Run generation in a background thread so tokens can be streamed as they arrive
    t = Thread(target=model.generate, kwargs=generate_kwargs)
    t.start()

    partial_message = ""
    for new_token in streamer:
        if new_token != '<':
            partial_message += new_token
            yield partial_message

gr.ChatInterface(predict).launch()
```

With those examples, you should be all set to create your own Gradio chatbot demos soon! For building even more custom chatbot applications, check out [a dedicated guide](/guides/creating-a-custom-chatbot-with-blocks) using the low-level `gr.Blocks()` API.