agents/ten_packages/extension/bedrock_llm_python/README.md

Amazon Bedrock LLM Extension

Configurations

You can config this extension by providing following environments:

Env	Required	Default	Notes
AWS_REGION	No	us-east-1	The Region of Amazon Bedrock service you want to use.
AWS_ACCESS_KEY_ID	No	-	Access Key of your IAM User, make sure you've set proper permissions to invoke Bedrock models and gain models access in Bedrock. Will use default credentials provider if not provided. Check document.
AWS_SECRET_ACCESS_KEY	No	-	Secret Key of your IAM User, make sure you've set proper permissions to invoke Bedrock models and gain models access in Bedrock. Will use default credentials provider if not provided. Check document.
AWS_BEDROCK_MODEL	No	Nova (https://docs.aws.amazon.com/nova/latest/userguide/what-is-nova.html)	Bedrock model id, check docuement.

Features

Real-time video and audio interaction similar to Gemini 2.0
Audio recognition using TEN framework's STT plugin
Text-to-speech conversion using TEN framework's TTS plugin
Integration with AWS Bedrock's Nova model
Smart input truncation logic
Multi-language support

Requirements

Python 3.9+
AWS account with Bedrock access
TEN framework with STT and TTS plugins
Dependencies listed in requirements.txt

Installation

Install dependencies:

pip install -r requirements.txt

Configure AWS credentials:

Set up AWS credentials with Bedrock access
Update the api_key in configuration

Configuration

The extension can be configured through manifest.json properties:

base_uri: Bedrock API endpoint
region: AWS region for Bedrock
aws_access_key_id: AWS access key ID
aws_secret_access_key: AWS secret access key
model_id: Bedrock Nova model ID
language: Language code for STT/TTS
See manifest.json for full configuration options

Input Truncation Logic

The extension implements smart input truncation:

Duration-based truncation:
- Automatically truncates input exceeding 30 seconds
Silence-based truncation:
- Triggers when silence exceeds 2 seconds
Manual truncation:
- Supports user-initiated truncation

Architecture

Audio Processing:
- Uses TEN framework's STT plugin for audio recognition
- Buffers and processes audio in real-time
- Provides intermediate and final transcripts
Nova Model Integration:
- Combines transcribed text with video input
- Sends to Bedrock's Nova model for processing
- Handles responses and error conditions
Speech Synthesis:
- Converts Nova model responses to speech
- Uses TEN framework's TTS plugin
- Synchronizes with video output

API Usage

Commands

Flush Command:

cmd = Cmd.create("flush")
await ten_env.send_cmd(cmd)

User Events:

# User joined
cmd = Cmd.create("on_user_joined")
await ten_env.send_cmd(cmd)

# User left
cmd = Cmd.create("on_user_left")
await ten_env.send_cmd(cmd)

Contributing

Fork the repository
Create a feature branch
Submit a pull request