3v324v23's picture
Зафиксирована рабочая версия TEN-Agent для HuggingFace Space
87337b1

glm_v2v_python

An extension for integrating GLM's Multimodal AI into your application, providing configurable AI-driven features such as conversational agents, task automation, and tool integration.

Features

  • GLM Multimodal Integration: Leverage GLM Multimodal models for voice to voice as well as text processing.
  • Configurable: Easily customize API keys, model settings, prompts, temperature, etc.
  • Async Queue Processing: Supports real-time message processing with task cancellation and prioritization.

API

Refer to api definition in [manifest.json] and default values in property.json.

Property Type Description
api_key string API key for authenticating with OpenAI
max_tokens int64 Maximum number of tokens to generate
prompt string Default system message to send to the model
server_vad bool Flag to enable or disable server vad of OpenAI
dump bool Flag to enable or disable audio dump for debugging purpose

Data Out:

Name Property Type Description
text_data text string Outgoing text data

Command Out:

Name Description
flush Response after flushing the current state

Audio Frame In:

Name Description
pcm_frame Audio frame input for voice processing

Audio Frame Out:

Name Description
pcm_frame Audio frame output after voice processing