glm_v2v_python
An extension for integrating GLM's Multimodal AI into your application, providing configurable AI-driven features such as conversational agents, task automation, and tool integration.
Features
- GLM Multimodal Integration: Leverage GLM multimodal models for voice-to-voice as well as text processing.
- Configurable: Easily customize API keys, model settings, prompts, temperature, etc.
- Async Queue Processing: Supports real-time message processing with task cancellation and prioritization (see the sketch below).
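
The async queue behavior can be pictured with the minimal sketch below. It is illustrative only: the class and method names (QueueItem, MessageProcessor, flush, _handle) are hypothetical and do not reflect the extension's internal API; the sketch simply shows one way to combine an asyncio priority queue with cancellation of the in-flight task when a flush request arrives.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional

# Illustrative only: QueueItem, MessageProcessor, and _handle are
# hypothetical names, not the extension's internal API.

@dataclass(order=True)
class QueueItem:
    priority: int                        # lower value = handled earlier
    payload: Any = field(compare=False)  # excluded from priority ordering

class MessageProcessor:
    def __init__(self) -> None:
        self._queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
        self._current: Optional[asyncio.Task] = None

    async def put(self, payload: Any, priority: int = 10) -> None:
        await self._queue.put(QueueItem(priority, payload))

    def flush(self) -> None:
        # Cancel the in-flight task and drop anything still queued.
        if self._current and not self._current.done():
            self._current.cancel()
        while not self._queue.empty():
            self._queue.get_nowait()

    async def run(self) -> None:
        while True:
            item = await self._queue.get()
            self._current = asyncio.create_task(self._handle(item.payload))
            try:
                await self._current
            except asyncio.CancelledError:
                pass  # the item was flushed while being processed

    async def _handle(self, payload: Any) -> None:
        # Placeholder for the actual call to the GLM realtime API.
        await asyncio.sleep(0)
```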
API
Refer to the `api` definition in `manifest.json` and the default values in `property.json`.
| Property | Type | Description |
|---|---|---|
| api_key | string | API key for authenticating with the GLM service |
| max_tokens | int64 | Maximum number of tokens to generate |
| prompt | string | Default system message to send to the model |
| server_vad | bool | Flag to enable or disable server-side voice activity detection (VAD) |
| dump | bool | Flag to enable or disable audio dumping for debugging purposes |
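
For reference, a property.json carrying these settings might look like the sketch below. The values are illustrative placeholders, not the defaults shipped with the extension.

```json
{
  "api_key": "<your-glm-api-key>",
  "max_tokens": 1024,
  "prompt": "You are a helpful voice assistant.",
  "server_vad": true,
  "dump": false
}
```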
Data Out:
| Name | Property | Type | Description |
|---|---|---|---|
| text_data | text | string | Outgoing text data |
Command Out:
| Name | Description |
|---|---|
| flush | Response after flushing the current state |
Audio Frame In:
| Name | Description |
|---|---|
| pcm_frame | Audio frame input for voice processing |
Audio Frame Out:
| Name | Description |
|---|---|
| pcm_frame | Audio frame output after voice processing |
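
To illustrate how the frames and data above fit together, here is a hypothetical handler sketch: an incoming pcm_frame is forwarded to the model, and the model's streamed replies are emitted as text_data and outgoing pcm_frame. The names (ModelSession, VoiceToVoiceHandler, emit_text, emit_audio) are placeholders and do not reflect the extension's actual implementation.

```python
import asyncio
from typing import AsyncIterator, Callable, Tuple

# Hypothetical illustration of the frame/data flow described in the tables
# above. ModelSession, VoiceToVoiceHandler, emit_text, and emit_audio are
# placeholder names, not the extension's real API.

class ModelSession:
    async def send_audio(self, pcm: bytes) -> None:
        """Stream caller audio to the GLM realtime endpoint (stubbed out)."""

    async def responses(self) -> AsyncIterator[Tuple[str, bytes]]:
        """Yield (transcript_chunk, synthesized_pcm) pairs as they arrive."""
        if False:  # stub: a real session would yield streamed results here
            yield "", b""

class VoiceToVoiceHandler:
    def __init__(self, session: ModelSession) -> None:
        self._session = session

    async def on_pcm_frame(self, pcm: bytes) -> None:
        # "Audio Frame In": forward raw PCM from the caller to the model.
        await self._session.send_audio(pcm)

    async def pump_responses(
        self,
        emit_text: Callable[[str], None],
        emit_audio: Callable[[bytes], None],
    ) -> None:
        # "Data Out" / "Audio Frame Out": publish the model's replies.
        async for text_chunk, pcm_chunk in self._session.responses():
            if text_chunk:
                emit_text(text_chunk)    # maps to the text_data output
            if pcm_chunk:
                emit_audio(pcm_chunk)    # maps to the outgoing pcm_frame
```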