openai_v2v_python

An extension that integrates OpenAI's next-generation multimodal AI into your application, providing configurable AI-driven features such as conversational agents, task automation, and tool integration.

Features

  • OpenAI Multimodal Integration: Use GPT multimodal models for voice-to-voice and text processing.
  • Configurable: Easily customize API keys, model settings, prompts, temperature, etc.
  • Async Queue Processing: Supports real-time message processing with task cancellation and prioritization.
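To illustrate what queue processing with cancellation and prioritization can look like, here is a minimal sketch built on `asyncio.PriorityQueue`. It is not the extension's actual implementation: the class `MessageProcessor`, the priority constants, and the `handled` list are hypothetical names chosen for this example.

```python
import asyncio
import itertools

# Hypothetical priority levels; lower numbers are handled first.
PRIORITY_HIGH = 0
PRIORITY_NORMAL = 1


class MessageProcessor:
    """Handle queued messages one at a time, highest priority first."""

    def __init__(self):
        self._queue = asyncio.PriorityQueue()
        # Monotonic counter: keeps FIFO order among equal priorities and
        # stops the queue from ever comparing the messages themselves.
        self._counter = itertools.count()
        self._current = None  # task for the message being handled now
        self.handled = []     # record of completed messages (for the demo)

    async def put(self, message, priority=PRIORITY_NORMAL):
        await self._queue.put((priority, next(self._counter), message))

    async def flush(self):
        """Drop all pending messages and cancel the in-flight one, if any."""
        while not self._queue.empty():
            self._queue.get_nowait()
            self._queue.task_done()
        if self._current is not None and not self._current.done():
            self._current.cancel()

    async def _handle(self, message):
        await asyncio.sleep(0.01)  # stand-in for real work, e.g. a model call
        self.handled.append(message)

    async def run(self):
        while True:
            _, _, message = await self._queue.get()
            self._current = asyncio.create_task(self._handle(message))
            # asyncio.wait returns normally even if the task was cancelled,
            # so a flush never kills the worker loop itself.
            await asyncio.wait([self._current])
            self._queue.task_done()
```

A caller would start `run()` as a background task, enqueue messages with `put()`, and call `flush()` when the user interrupts, which maps naturally onto the `flush` command described under Command Out.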

API

Refer to the API definition in manifest.json and the default values in property.json.

| Property | Type | Description |
| --- | --- | --- |
| api_key | string | API key for authenticating with OpenAI |
| temperature | float64 | Sampling temperature; higher values mean more randomness |
| model | string | Model identifier (e.g., gpt-4o-realtime-preview) |
| max_tokens | int64 | Maximum number of tokens to generate |
| system_message | string | Default system message to send to the model |
| voice | string | Voice the OpenAI model speaks with, such as alloy, echo, or shimmer |
| server_vad | bool | Flag to enable or disable OpenAI's server-side voice activity detection (VAD) |
| language | string | Language the OpenAI model responds in, such as en-US or zh-CN |
| dump | bool | Flag to enable or disable audio dumps for debugging purposes |
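As a rough illustration of how these properties might be loaded and type-checked in Python, the sketch below mirrors the documented names and types in a dataclass. The class `RealtimeConfig`, the function `load_config`, and every default value are assumptions made for this example; the real defaults live in property.json.

```python
from dataclasses import dataclass, fields


@dataclass
class RealtimeConfig:
    # Field names and types follow the property list above.
    # All default values here are placeholders, not the extension's defaults.
    api_key: str = ""
    temperature: float = 0.8
    model: str = ""
    max_tokens: int = 1024
    system_message: str = ""
    voice: str = "alloy"
    server_vad: bool = True
    language: str = "en-US"
    dump: bool = False


def load_config(raw: dict) -> RealtimeConfig:
    """Build a config from a property dict, ignoring unknown keys."""
    cfg = RealtimeConfig()
    for f in fields(cfg):
        if f.name not in raw:
            continue
        value = raw[f.name]
        # JSON has a single number type, so accept ints for float fields.
        if f.type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        # bool is a subclass of int in Python; reject it for non-bool fields.
        wrong_bool = f.type is not bool and isinstance(value, bool)
        if wrong_bool or not isinstance(value, f.type):
            raise TypeError(f"{f.name} must be {f.type.__name__}")
        setattr(cfg, f.name, value)
    if not cfg.api_key:
        raise ValueError("api_key is required")
    return cfg
```

For example, `load_config({"api_key": "xxx", "voice": "echo"})` returns a config whose `voice` is `"echo"` and whose other fields keep the placeholder defaults.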

Data Out:

| Name | Property | Type | Description |
| --- | --- | --- | --- |
| text_data | text | string | Outgoing text data |

Command Out:

| Name | Description |
| --- | --- |
| flush | Response after flushing the current state |

Audio Frame In:

| Name | Description |
| --- | --- |
| pcm_frame | Audio frame input for voice processing |

Audio Frame Out:

| Name | Description |
| --- | --- |
| pcm_frame | Audio frame output after voice processing |

Azure Support

This extension also supports Azure OpenAI Service. The property settings are as follows:

{
    "base_uri": "wss://xxx.openai.azure.com",
    "path": "/openai/realtime?api-version=xxx&deployment=xxx",
    "api_key": "xxx",
    "model": "gpt-4o-realtime-preview",
    "vendor": "azure"
}
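The settings above suggest that the WebSocket endpoint is formed by joining `base_uri` and `path`. The sketch below shows one way a client could do that and pick the authentication header by vendor. The helper names `build_realtime_url` and `build_auth_headers` are hypothetical, and the header choice (`api-key` for Azure, a bearer token otherwise) reflects the usual conventions of the two services rather than anything confirmed about this extension's code.

```python
def build_realtime_url(props: dict) -> str:
    """Join base_uri and path into one WebSocket URL (assumed behavior)."""
    base = props["base_uri"].rstrip("/")
    path = props["path"]
    if not path.startswith("/"):
        path = "/" + path
    return base + path


def build_auth_headers(props: dict) -> dict:
    """Pick the auth header by vendor (assumed convention, see lead-in)."""
    if props.get("vendor") == "azure":
        return {"api-key": props["api_key"]}
    return {"Authorization": f"Bearer {props['api_key']}"}


# Placeholder values copied from the settings above; "xxx" stays elided.
azure_props = {
    "base_uri": "wss://xxx.openai.azure.com",
    "path": "/openai/realtime?api-version=xxx&deployment=xxx",
    "api_key": "xxx",
    "model": "gpt-4o-realtime-preview",
    "vendor": "azure",
}

url = build_realtime_url(azure_props)
headers = build_auth_headers(azure_props)
```

Keeping the query string inside `path` means the API version and deployment name can change without touching code, which is presumably why the settings split the URL this way.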