---
title: Qwen2.5 Omni 7B Demo
emoji: 🏆
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: 5.23.1
app_file: app.py
pinned: false
license: mit
short_description: A Space exploring omni-modal capabilities
---

# Qwen2.5-Omni Multimodal Chat Demo

This Space demonstrates Qwen2.5-Omni, an end-to-end multimodal model that perceives text, images, audio, and video, and generates both text and natural speech.

## Features

- **Omni-modal Understanding**: Process text, images, audio, and video inputs
- **Multimodal Responses**: Generate both text and natural speech outputs
- **Real-time Interaction**: Stream responses as they're generated
- **Customizable Voice**: Choose between male and female voice outputs

## How to Use

1. **Text Input**: Type your message in the text box and click "Send Text"
2. **Multimodal Input**: 
   - Upload images, audio files, or videos
   - Optionally add accompanying text
   - Click "Send Multimodal Input"
3. **Voice Settings**: 
   - Toggle audio output on/off
   - Select preferred voice type
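Under the hood, the multimodal input step above amounts to packing the uploaded media and optional text into a single chat turn. The sketch below shows one plausible way to do this, using the typed-content list format common to Qwen's multimodal chat templates; the helper name `build_turn` is hypothetical, not part of this demo's actual code.

```python
def build_turn(text=None, image=None, audio=None, video=None):
    """Pack optional text and media into one user message.

    Each item is a dict with a "type" key, matching the typed-content
    convention used by Qwen multimodal chat templates (an assumption here).
    """
    content = []
    if image is not None:
        content.append({"type": "image", "image": image})
    if audio is not None:
        content.append({"type": "audio", "audio": audio})
    if video is not None:
        content.append({"type": "video", "video": video})
    if text:
        content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}

# Example: an image upload with accompanying text becomes one two-item turn.
turn = build_turn(text="Describe what you see", image="scene.jpg")
```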

## Examples

Try these interactions:
- Upload an image and ask "Describe what you see"
- Upload an audio clip and ask "What is being said here?"
- Upload a video and ask "What's happening in this video?"
- Ask complex questions like "Explain quantum computing in simple terms"

## Technical Details

This demo uses:
- Qwen2.5-Omni-7B model
- FlashAttention-2 for accelerated inference
- Gradio for the interactive interface
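For reference, a minimal sketch of how such a Space might load the model with FlashAttention-2 enabled. The class and processor names follow the transformers Qwen2.5-Omni integration and may differ across transformers versions; treat them as assumptions rather than a copy of this demo's `app.py`.

```python
# Hedged sketch: loading Qwen2.5-Omni-7B with FlashAttention-2 via transformers.
# Requires the flash-attn package and a compatible GPU; class names are assumed
# from the transformers Qwen2.5-Omni integration.
MODEL_ID = "Qwen/Qwen2.5-Omni-7B"

def load_model():
    import torch
    from transformers import (
        Qwen2_5OmniForConditionalGeneration,
        Qwen2_5OmniProcessor,
    )

    model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",  # FlashAttention-2 backend
        device_map="auto",
    )
    processor = Qwen2_5OmniProcessor.from_pretrained(MODEL_ID)
    return model, processor
```

The Gradio interface then streams generated text (and, when audio output is enabled, synthesized speech) back to the browser.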