# Voice_Commands / app.py
import gradio as gr
import openai

# constants.py is expected to define OPENAI_API_KEY, MODEL, id2label,
# text_respnses, and resoponses (spellings as used below).
from constants import *

openai.api_key = OPENAI_API_KEY
title = "Car Seats Voice Commands"
description = """
This is a demo for controlling car seats with voice commands. On the left is the inputs section,
and on the right you'll find the outputs. For input you have two choices, **Voice** and **Text**:
use **Voice** if you want an experience closer to the final product, or use **Text** if you just want to test the command model.
For output you get the **transcription** (please check that it's accurate), the **command** (so you
know which command the system detected), and the robot voice (again, use this for a more realistic experience).

**Features**: you can activate or deactivate each of the following
- Heated Seats
- Cooled Seats
- Massage Seats

Examples:
- **Direct commands**: try saying something like "Activate heated seats" or "Turn off massage seats"
- **Indirect commands**: try "My back is cold", "No heating is needed anymore" or "I'm stressed today"
"""
article = """
This demo processes commands in two steps: a transcription phase followed by a
command-classification phase. Transcription uses the OpenAI Whisper model; classification
uses the OpenAI **text-davinci-003** model with a prompt-engineered instruction
(an earlier version used an **ada** model fine-tuned on car-seat commands instead).
"""

# Legacy implementation (fine-tuned classification model), kept for reference:
# def get_command(command, model, id2label):
#     """
#     Get the classification output from the OpenAI API.
#     """
#     completion = openai.Completion.create(
#         model=model, prompt=f"{command}->", max_tokens=1, temperature=0
#     )
#     id = int(completion["choices"][0]["text"].strip())
#     result = id2label[id] if id in id2label else "unknown"
#     return result

def get_command(command, model, id2label):
    """
    Get the classification output from the OpenAI API.

    Note: the `model` argument is currently unused; the prompt below is
    written for text-davinci-003.
    """
    prompt = f"""
We want to control the seats of a car, which has features to cool, heat, or massage a seat. The user said "{command}". Which feature should we use to ensure the user's comfort? Give just the number of the feature.

Mapping:
1: "massage_seats_on"
2: "massage_seats_off"
3: "heated_seats_on"
4: "heated_seats_off"
5: "cooled_seats_on"
6: "cooled_seats_off"

Command_Code:
"""

    completion = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=2, temperature=0
    )

    try:
        id = int(completion["choices"][0]["text"].strip())
    except ValueError:
        # The model replied with something other than a number.
        return "unknown"

    return id2label[id] if id in id2label else "unknown"
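
# A minimal sketch (an assumption, not read from constants.py itself) of the
# id2label mapping get_command expects, mirroring the numbered list in the
# prompt above:
#
#     id2label = {
#         1: "massage_seats_on",
#         2: "massage_seats_off",
#         3: "heated_seats_on",
#         4: "heated_seats_off",
#         5: "cooled_seats_on",
#         6: "cooled_seats_off",
#     }
#
# With such a mapping, a reply of "3" from the model would make
# get_command return "heated_seats_on"; any non-numeric or out-of-range
# reply falls back to "unknown".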

def transcribe(audio, text):
    """
    If text is provided, classify it directly; otherwise transcribe
    the audio first, then classify the transcription.
    """
    if text:
        result = get_command(text, MODEL, id2label)
        return "Text provided by the user", text_respnses[result], None

    # Get the text transcription of the recorded audio.
    with open(audio, "rb") as audio_file:
        transcription = openai.Audio.transcribe("whisper-1", audio_file, language="en")
    transcription = transcription["text"]

    result = get_command(transcription, MODEL, id2label)
    # Guard against commands with no registered audio response.
    audio_res = resoponses[result]() if result in resoponses else None

    return transcription, text_respnses[result], audio_res

if __name__ == "__main__":
    gr.Interface(
        fn=transcribe,
        inputs=[
            gr.Audio(label="", source="microphone", type="filepath"),
            gr.Textbox(label="If you prefer, type your command (more accurate)"),
        ],
        outputs=[
            gr.Textbox(
                label="Input Transcription (Please check that this matches what you've said)"
            ),
            gr.Textbox(label="Machine Response (Text Version)"),
            gr.Audio(label="Machine Response (Audio Version)"),
        ],
        allow_flagging="auto",
        title=title,
        description=description,
        article=article,
    ).launch()