gemini_v2v_python

An extension that integrates Gemini's multimodal models into your application. It provides configurable AI-driven features such as conversational agents, task automation, and tool integration.

Features

  • Gemini Multimodal Integration: Use Gemini multimodal models for voice-to-voice and text processing.
  • Configurable: Easily customize API keys, model settings, prompts, temperature, etc.
  • Async Queue Processing: Supports real-time message processing with task cancellation and prioritization.
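
To illustrate what priority-ordered processing with cancellation can look like, here is a minimal sketch built on `asyncio.PriorityQueue`. It is not the extension's actual implementation: `PriorityTaskQueue`, its methods, and the `demo` coroutine are hypothetical names chosen for this example. Lower numbers run first, and `flush()` drops pending work and cancels whatever is running.

```python
import asyncio
import itertools
from typing import Any, Callable, List, Optional


class PriorityTaskQueue:
    """Hypothetical helper: runs queued coroutines one at a time, lowest priority number first."""

    def __init__(self) -> None:
        self._queue: asyncio.PriorityQueue = asyncio.PriorityQueue()
        # The counter breaks ties so equal priorities stay in FIFO order
        # and the callables themselves are never compared.
        self._counter = itertools.count()
        self._current: Optional[asyncio.Task] = None

    async def put(self, priority: int, coro_fn: Callable[..., Any], *args: Any) -> None:
        await self._queue.put((priority, next(self._counter), coro_fn, args))

    def flush(self) -> None:
        # Drop everything that has not started yet.
        while not self._queue.empty():
            self._queue.get_nowait()
            self._queue.task_done()
        # Cancel the item that is currently running, if any.
        if self._current is not None:
            self._current.cancel()

    async def join(self) -> None:
        await self._queue.join()

    async def run(self) -> None:
        while True:
            _, _, coro_fn, args = await self._queue.get()
            self._current = asyncio.ensure_future(coro_fn(*args))
            # asyncio.wait returns normally even if the item was cancelled,
            # so a flush() does not stop the worker loop.
            await asyncio.wait([self._current])
            self._queue.task_done()


async def demo() -> List[str]:
    results: List[str] = []
    started = asyncio.Event()

    async def say(text: str) -> None:
        results.append(text)

    async def long_reply() -> None:
        started.set()
        await asyncio.sleep(60)
        results.append("long reply")  # not reached in this demo: flush() cancels it

    q = PriorityTaskQueue()
    await q.put(5, say, "low")
    await q.put(1, say, "high")
    await q.put(9, long_reply)
    worker = asyncio.ensure_future(q.run())

    await started.wait()          # "high" and "low" are done, long_reply is running
    await q.put(5, say, "stale")  # still pending when flush() is called
    q.flush()
    await q.put(5, say, "after flush")
    await q.join()

    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return results


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['high', 'low', 'after flush']
```

Using `asyncio.wait` rather than awaiting the task directly is a deliberate choice here: cancelling the running item does not raise inside the worker, while cancelling the worker itself still stops the loop cleanly.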

API

Refer to the API definition in manifest.json and the default values in property.json.

| Property | Type | Description |
| --- | --- | --- |
| api_key | string | API key for authenticating with Gemini |
| temperature | float32 | Sampling temperature; higher values mean more randomness |
| model | string | Identifier of the Gemini model to use |
| max_tokens | int32 | Maximum number of tokens to generate |
| system_message | string | Default system message to send to the model |
| voice | string | Voice that the Gemini model uses, such as Puck, Charon, Kore, etc. |
| server_vad | bool | Flag to enable or disable server-side voice activity detection (VAD) for Gemini |
| language | string | Language that the Gemini model responds in, such as en-US, zh-CN, etc. |
| dump | bool | Flag to enable or disable audio dump for debugging purposes |
| base_uri | string | Base URI for connecting to the Gemini service |
| audio_out | bool | Flag to enable or disable audio output |
| input_transcript | bool | Flag to enable or disable input transcript processing |
| sample_rate | int32 | Sample rate for audio processing, in Hz |
| stream_id | int32 | Stream ID for identifying audio streams |
| greeting | string | Greeting message for the initial interaction |
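
As a rough illustration of how these properties could be loaded and checked before use, the sketch below mirrors the table in a Python dataclass. The class name `GeminiV2VConfig`, the validation rules, and every default value are assumptions made for this example; the real defaults live in property.json.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class GeminiV2VConfig:
    # Field names follow the property table; all defaults are assumed placeholders.
    api_key: str = ""
    temperature: float = 0.5
    model: str = ""
    max_tokens: int = 1024
    system_message: str = ""
    voice: str = ""
    server_vad: bool = True
    language: str = "en-US"
    dump: bool = False
    base_uri: str = ""
    audio_out: bool = True
    input_transcript: bool = True
    sample_rate: int = 24000
    stream_id: int = 0
    greeting: str = ""

    @classmethod
    def from_dict(cls, props: Dict[str, Any]) -> "GeminiV2VConfig":
        # Reject misspelled or unsupported keys instead of silently ignoring them.
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(props) - known)
        if unknown:
            raise ValueError(f"unknown properties: {unknown}")
        cfg = cls(**props)
        cfg.validate()
        return cfg

    def validate(self) -> None:
        if not self.api_key:
            raise ValueError("api_key must not be empty")
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if self.sample_rate <= 0:
            raise ValueError("sample_rate must be positive")


if __name__ == "__main__":
    cfg = GeminiV2VConfig.from_dict({"api_key": "YOUR_KEY", "temperature": 0.7})
    print(cfg.temperature, cfg.server_vad)
```

Failing fast on unknown keys is one possible design; a more permissive loader could log and skip them instead.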

Data Out

| Name | Property | Type | Description |
| --- | --- | --- | --- |
| text_data | text | string | Outgoing text data |
| append | text | string | Additional text appended to the output |

Command Out

| Name | Description |
| --- | --- |
| flush | Response after flushing the current state |
| tool_call | Invokes a tool with specific arguments |
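
One way a receiver of `tool_call` might route the call is sketched below. The payload shape (a tool name plus a JSON-encoded argument string) and the `ToolRegistry` class are assumptions for illustration only; check manifest.json for the actual command properties.

```python
import json
from typing import Any, Callable, Dict


class ToolRegistry:
    """Hypothetical registry that maps tool names to plain Python callables."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self._tools[name] = fn

    def call(self, name: str, arguments_json: str) -> str:
        # Assumed payload: the arguments arrive as a JSON object string.
        if name not in self._tools:
            return json.dumps({"error": f"unknown tool: {name}"})
        try:
            kwargs = json.loads(arguments_json or "{}")
            result = self._tools[name](**kwargs)
        except (TypeError, ValueError) as exc:
            # Covers malformed JSON and arguments that do not match the tool signature.
            return json.dumps({"error": str(exc)})
        return json.dumps({"result": result})


if __name__ == "__main__":
    registry = ToolRegistry()
    registry.register("add", lambda a, b: a + b)
    print(registry.call("add", '{"a": 2, "b": 3}'))  # {"result": 5}
    print(registry.call("missing", "{}"))
```

Returning errors as JSON rather than raising keeps the result easy to hand back to the model as a tool response.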

Audio Frame In

| Name | Description |
| --- | --- |
| pcm_frame | Audio frame input for voice processing |
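
For a sense of how raw audio relates to the `sample_rate` property, the sketch below splits a PCM byte buffer into fixed-duration frames. It assumes 16-bit mono PCM and 10 ms frames, neither of which is stated in this README, and `split_pcm` is a hypothetical helper rather than part of the extension.

```python
from typing import List, Tuple


def split_pcm(
    data: bytes,
    sample_rate: int,
    frame_ms: int = 10,         # assumed frame duration
    channels: int = 1,          # assumed mono
    bytes_per_sample: int = 2,  # assumed 16-bit samples
) -> Tuple[List[bytes], bytes]:
    """Split raw PCM into equal frames; return (frames, leftover bytes)."""
    samples_per_frame = sample_rate * frame_ms // 1000
    frame_size = samples_per_frame * channels * bytes_per_sample
    if frame_size <= 0:
        raise ValueError("sample_rate and frame_ms must yield a positive frame size")
    usable = len(data) - len(data) % frame_size
    frames = [data[i:i + frame_size] for i in range(0, usable, frame_size)]
    # The leftover can be prepended to the next buffer by the caller.
    return frames, data[usable:]


if __name__ == "__main__":
    one_second = bytes(16000 * 2)  # 1 s of silence at 16 kHz, 16-bit mono
    frames, rest = split_pcm(one_second, sample_rate=16000)
    print(len(frames), len(frames[0]), len(rest))  # 100 320 0
```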

Video Frame In

| Name | Description |
| --- | --- |
| video_frame | Video frame input for processing |

Audio Frame Out

| Name | Description |
| --- | --- |
| pcm_frame | Audio frame output after voice processing |