3v324v23's picture
Зафиксирована рабочая версия TEN-Agent для HuggingFace Space
87337b1

Amazon Bedrock LLM Extension

Configurations

You can config this extension by providing following environments:

Env Required Default Notes
AWS_REGION No us-east-1 The Region of Amazon Bedrock service you want to use.
AWS_ACCESS_KEY_ID No - Access Key of your IAM User, make sure you've set proper permissions to invoke Bedrock models and gain models access in Bedrock. Will use default credentials provider if not provided. Check document.
AWS_SECRET_ACCESS_KEY No - Secret Key of your IAM User, make sure you've set proper permissions to invoke Bedrock models and gain models access in Bedrock. Will use default credentials provider if not provided. Check document.
AWS_BEDROCK_MODEL No Nova (https://docs.aws.amazon.com/nova/latest/userguide/what-is-nova.html) Bedrock model id, check docuement.

Features

  • Real-time video and audio interaction similar to Gemini 2.0
  • Audio recognition using TEN framework's STT plugin
  • Text-to-speech conversion using TEN framework's TTS plugin
  • Integration with AWS Bedrock's Nova model
  • Smart input truncation logic
  • Multi-language support

Requirements

  • Python 3.9+
  • AWS account with Bedrock access
  • TEN framework with STT and TTS plugins
  • Dependencies listed in requirements.txt

Installation

  1. Install dependencies:
pip install -r requirements.txt
  1. Configure AWS credentials:
  • Set up AWS credentials with Bedrock access
  • Update the api_key in configuration

Configuration

The extension can be configured through manifest.json properties:

  • base_uri: Bedrock API endpoint
  • region: AWS region for Bedrock
  • aws_access_key_id: AWS access key ID
  • aws_secret_access_key: AWS secret access key
  • model_id: Bedrock Nova model ID
  • language: Language code for STT/TTS
  • See manifest.json for full configuration options

Input Truncation Logic

The extension implements smart input truncation:

  1. Duration-based truncation:

    • Automatically truncates input exceeding 30 seconds
  2. Silence-based truncation:

    • Triggers when silence exceeds 2 seconds
  3. Manual truncation:

    • Supports user-initiated truncation

Architecture

  1. Audio Processing:

    • Uses TEN framework's STT plugin for audio recognition
    • Buffers and processes audio in real-time
    • Provides intermediate and final transcripts
  2. Nova Model Integration:

    • Combines transcribed text with video input
    • Sends to Bedrock's Nova model for processing
    • Handles responses and error conditions
  3. Speech Synthesis:

    • Converts Nova model responses to speech
    • Uses TEN framework's TTS plugin
    • Synchronizes with video output

API Usage

Commands

  1. Flush Command:
cmd = Cmd.create("flush")
await ten_env.send_cmd(cmd)
  1. User Events:
# User joined
cmd = Cmd.create("on_user_joined")
await ten_env.send_cmd(cmd)

# User left
cmd = Cmd.create("on_user_left")
await ten_env.send_cmd(cmd)

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request