Amazon Bedrock LLM Extension
Configurations
You can config this extension by providing following environments:
Env | Required | Default | Notes |
---|---|---|---|
AWS_REGION | No | us-east-1 | The Region of Amazon Bedrock service you want to use. |
AWS_ACCESS_KEY_ID | No | - | Access Key of your IAM User, make sure you've set proper permissions to invoke Bedrock models and gain models access in Bedrock. Will use default credentials provider if not provided. Check document. |
AWS_SECRET_ACCESS_KEY | No | - | Secret Key of your IAM User, make sure you've set proper permissions to invoke Bedrock models and gain models access in Bedrock. Will use default credentials provider if not provided. Check document. |
AWS_BEDROCK_MODEL | No | Nova (https://docs.aws.amazon.com/nova/latest/userguide/what-is-nova.html) | Bedrock model id, check docuement. |
Features
- Real-time video and audio interaction similar to Gemini 2.0
- Audio recognition using TEN framework's STT plugin
- Text-to-speech conversion using TEN framework's TTS plugin
- Integration with AWS Bedrock's Nova model
- Smart input truncation logic
- Multi-language support
Requirements
- Python 3.9+
- AWS account with Bedrock access
- TEN framework with STT and TTS plugins
- Dependencies listed in requirements.txt
Installation
- Install dependencies:
pip install -r requirements.txt
- Configure AWS credentials:
- Set up AWS credentials with Bedrock access
- Update the api_key in configuration
Configuration
The extension can be configured through manifest.json properties:
base_uri
: Bedrock API endpointregion
: AWS region for Bedrockaws_access_key_id
: AWS access key IDaws_secret_access_key
: AWS secret access keymodel_id
: Bedrock Nova model IDlanguage
: Language code for STT/TTS- See manifest.json for full configuration options
Input Truncation Logic
The extension implements smart input truncation:
Duration-based truncation:
- Automatically truncates input exceeding 30 seconds
Silence-based truncation:
- Triggers when silence exceeds 2 seconds
Manual truncation:
- Supports user-initiated truncation
Architecture
Audio Processing:
- Uses TEN framework's STT plugin for audio recognition
- Buffers and processes audio in real-time
- Provides intermediate and final transcripts
Nova Model Integration:
- Combines transcribed text with video input
- Sends to Bedrock's Nova model for processing
- Handles responses and error conditions
Speech Synthesis:
- Converts Nova model responses to speech
- Uses TEN framework's TTS plugin
- Synchronizes with video output
API Usage
Commands
- Flush Command:
cmd = Cmd.create("flush")
await ten_env.send_cmd(cmd)
- User Events:
# User joined
cmd = Cmd.create("on_user_joined")
await ten_env.send_cmd(cmd)
# User left
cmd = Cmd.create("on_user_left")
await ten_env.send_cmd(cmd)
Contributing
- Fork the repository
- Create a feature branch
- Submit a pull request