---
title: InternVL2.5 Dual Image Analyzer
emoji: 🖼️
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: 3.10
app_file: app.py
pinned: false
license: mit
---

# InternVL2.5 Dual Image Analyzer

This Hugging Face Space demonstrates InternVL2.5, a vision-language model that answers natural-language prompts about images. You can upload one or two images, analyze them with the same prompt, and compare the results side by side.

## Features

- Upload one or two images for detailed analysis
- Uses the InternVL2.5-8B model for high-quality image understanding
- Handles a range of image aspect ratios and formats
- Supports multiple GPUs for efficient processing
- Offers a selection of preset prompts and accepts custom queries
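
One common way vision-language pipelines cope with varied aspect ratios is to split each image into a grid of fixed-size tiles whose shape roughly matches the image. The sketch below illustrates that idea in simplified form. It is an assumption about the preprocessing, not code taken from `app.py`, and the function names and the `max_tiles` default are hypothetical.

```python
from typing import List, Tuple


def candidate_grids(max_tiles: int) -> List[Tuple[int, int]]:
    """Return all (cols, rows) grids using at most max_tiles tiles, smallest first."""
    grids = [
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if cols * rows <= max_tiles
    ]
    return sorted(grids, key=lambda g: g[0] * g[1])


def closest_grid(width: int, height: int, max_tiles: int = 12) -> Tuple[int, int]:
    """Pick the grid whose aspect ratio is nearest to the image's aspect ratio.

    Ties go to the grid with the fewest tiles, because candidates are
    sorted by tile count and min() keeps the first best match.
    """
    target = width / height
    return min(candidate_grids(max_tiles), key=lambda g: abs(g[0] / g[1] - target))


print(closest_grid(2000, 1000))  # wide image
print(closest_grid(1000, 3000))  # tall image
```

A wide 2000x1000 image maps to a 2x1 grid and a tall 1000x3000 image to a 1x3 grid, so neither is squashed into a square before the model sees it.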

## Usage

1. Upload one or two images using the upload buttons.
2. Select a prompt from the dropdown or enter your own.
3. Click "Analyze Images" to process the images.
4. View the detailed analysis for each image.

To compare two images, use the prompt "Compare these images and describe the differences."
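
The usage flow above can be sketched in plain Python. This is a minimal illustration, not the code in `app.py`: the names `PRESET_PROMPTS`, `resolve_prompt`, and `analyze_images` are hypothetical, the first preset prompt is invented for the example, and `describe` stands in for whatever function actually calls the model.

```python
from typing import Callable, Dict, Optional

# Hypothetical presets; only the comparison prompt appears in this README.
PRESET_PROMPTS = [
    "Describe this image in detail.",
    "Compare these images and describe the differences.",
]


def resolve_prompt(selected: Optional[str], custom: Optional[str]) -> str:
    """Prefer a non-empty custom prompt, otherwise use the dropdown choice."""
    if custom and custom.strip():
        return custom.strip()
    if selected:
        return selected
    return PRESET_PROMPTS[0]


def analyze_images(
    image1: object,
    image2: object,
    prompt: str,
    describe: Callable[[object, str], str],
) -> Dict[str, str]:
    """Run `describe` on each uploaded image and key the results by slot."""
    results = {}
    for slot, image in (("image1", image1), ("image2", image2)):
        if image is not None:
            results[slot] = describe(image, prompt)
    if not results:
        raise ValueError("Upload at least one image before analyzing.")
    return results


# Stand-in for the model call, so the sketch runs without a GPU.
fake_describe = lambda image, prompt: f"[{prompt}] {image}"
prompt = resolve_prompt(PRESET_PROMPTS[0], "")
print(analyze_images("cat.png", None, prompt, fake_describe))
```

Keeping prompt resolution separate from the model call makes the one-image and two-image cases share the same path.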

## Requirements

- Python 3.8 or higher
- PyTorch
- Transformers 4.35.2 or higher
- Pillow
- Matplotlib
- Accelerate
- Bitsandbytes
- Safetensors
- Gradio (for the web interface)

## Hardware Requirements

This application runs a vision-language model, which requires:

- A CUDA-capable GPU with at least 8GB VRAM
- 8GB+ system RAM
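
As a rough sanity check on these figures, the memory taken by the weights alone can be estimated from the parameter count and the bits stored per parameter. The sketch below assumes about 8 billion parameters, as the "8B" suffix suggests. The exact count may differ, and the helper name is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: roughly 8 billion parameters.
NUM_PARAMS = 8e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")
# 16-bit weights: ~14.9 GiB
# 8-bit weights: ~7.5 GiB
# 4-bit weights: ~3.7 GiB
```

Activations and CUDA overhead come on top of the weights, so 8GB is best read as a lower bound when running with 8-bit quantization.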

## Deployment Options

### 1. Hugging Face Spaces (Recommended)

This repository is ready to be deployed on Hugging Face Spaces.

**Steps:**

1. Create a new Space on [Hugging Face Spaces](https://huggingface.co/spaces)
2. Select "Docker" as the Space SDK
3. Link this GitHub repository
4. Select a GPU (T4 or better is recommended)
5. Create the Space

The application will deploy automatically with the Gradio UI frontend.

### 2. AWS SageMaker

For production deployment on AWS SageMaker:

1. Package the application using the provided Dockerfile
2. Upload the Docker image to Amazon ECR
3. Create a SageMaker Model using the ECR image
4. Deploy an endpoint with an instance type such as ml.g4dn.xlarge
5. Set up API Gateway for HTTP access (optional)

Detailed AWS instructions can be found in the `docs/aws_deployment.md` file.
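
Steps 3 and 4 can be sketched with `boto3`. This is an illustration under stated assumptions, not the procedure from `docs/aws_deployment.md`: the resource names, the `build_sagemaker_requests` and `deploy` helpers, and the example image URI and role ARN are all hypothetical placeholders.

```python
from typing import Any, Dict


def build_sagemaker_requests(
    image_uri: str,
    role_arn: str,
    name: str = "internvl-analyzer",  # hypothetical resource name
    instance_type: str = "ml.g4dn.xlarge",
) -> Dict[str, Dict[str, Any]]:
    """Build the request payloads for the model, endpoint config, and endpoint."""
    config_name = f"{name}-config"
    return {
        "model": {
            "ModelName": name,
            "PrimaryContainer": {"Image": image_uri},
            "ExecutionRoleArn": role_arn,
        },
        "endpoint_config": {
            "EndpointConfigName": config_name,
            "ProductionVariants": [
                {
                    "VariantName": "AllTraffic",
                    "ModelName": name,
                    "InitialInstanceCount": 1,
                    "InstanceType": instance_type,
                }
            ],
        },
        "endpoint": {
            "EndpointName": f"{name}-endpoint",
            "EndpointConfigName": config_name,
        },
    }


def deploy(requests: Dict[str, Dict[str, Any]]) -> None:
    """Send the payloads to SageMaker (needs AWS credentials and boto3)."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("sagemaker")
    client.create_model(**requests["model"])
    client.create_endpoint_config(**requests["endpoint_config"])
    client.create_endpoint(**requests["endpoint"])


# Placeholder values; substitute your own account, region, and role.
payloads = build_sagemaker_requests(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/internvl:latest",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
)
print(payloads["endpoint_config"]["ProductionVariants"][0]["InstanceType"])
```

Note that SageMaker real-time endpoints expect the container to answer `/ping` and `/invocations` on port 8080, so an image built around the Gradio UI may need an adapter before it passes health checks.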

### 3. Azure Machine Learning

For Azure deployment:

1. Create an Azure ML workspace
2. Register the model on Azure ML
3. Create an inference configuration
4. Deploy to AKS or ACI with a GPU-enabled instance

Detailed Azure instructions can be found in the `docs/azure_deployment.md` file.

## How It Works

The application uses InternVL2.5, a multimodal model that can understand and describe images in detail.

The script:

1. Loads the model with 8-bit quantization to reduce memory requirements
2. Processes the images with the selected prompt
3. Formats and displays the results
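
The 8-bit loading step can be sketched with the Transformers and Bitsandbytes APIs. This is a minimal sketch under assumptions rather than the code in `app.py`: the Hub ID `OpenGVLab/InternVL2_5-8B`, the helper names, and the use of `device_map="auto"` to spread layers across available GPUs are all assumptions.

```python
from typing import Any, Dict

# Assumed Hub ID for the 8B checkpoint; adjust to the model you actually use.
MODEL_ID = "OpenGVLab/InternVL2_5-8B"


def base_load_kwargs() -> Dict[str, Any]:
    """Keyword arguments shared by quantized and unquantized loading."""
    return {
        "trust_remote_code": True,   # the checkpoint ships custom model code
        "device_map": "auto",        # let Accelerate place layers on available GPUs
        "low_cpu_mem_usage": True,
    }


def load_model(model_id: str = MODEL_ID, load_in_8bit: bool = True):
    """Load the model and tokenizer, optionally with 8-bit weights."""
    # Heavy imports live here so the helper above runs without a GPU stack.
    import torch
    from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

    kwargs = base_load_kwargs()
    if load_in_8bit:
        kwargs["quantization_config"] = BitsAndBytesConfig(load_in_8bit=True)
    else:
        kwargs["torch_dtype"] = torch.bfloat16

    model = AutoModel.from_pretrained(model_id, **kwargs).eval()
    tokenizer = AutoTokenizer.from_pretrained(
        model_id, trust_remote_code=True, use_fast=False
    )
    return model, tokenizer


print(base_load_kwargs())
```

Calling `load_model()` downloads the checkpoint on first use and needs a CUDA GPU for the Bitsandbytes path; passing `load_in_8bit=False` trades memory for unquantized bfloat16 weights.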

## Repository Structure

- `app.py` - Gradio UI for the web interface
- `Dockerfile` - For containerized deployment
- `requirements.txt` - Python dependencies
- `data_temp/` - Sample images for testing

## Local Development

1. Install the required packages:

```
pip install -r requirements.txt
```

2. Run the Gradio UI:

```
python app.py
```

3. Visit `http://localhost:7860` in your browser
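
To confirm from a script that the local server is responding, a small standard-library check is enough. The helper name `is_up` is hypothetical, and the default URL is the one given in step 3.

```python
import http.client
import urllib.request


def is_up(url: str = "http://localhost:7860", timeout: float = 2.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeouts, and HTTP errors all count as "not up".
        return False


print("Gradio UI reachable:", is_up())
```

The model can take a while to load, so a `False` result right after `python app.py` may only mean the server has not finished starting.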

## Example Output

```
Processing image: data_temp/page_2.png
Loading model...
Generating descriptions...

==== Image Description Results (InternVL2.5) ====

Basic Description:
The image shows a webpage or document with text content organized in multiple columns.

Detailed Description:
The image displays a structured document or webpage with multiple sections of text organized in a grid layout. The content appears to be technical or educational in nature, with what looks like headings and paragraphs of text. The color scheme is primarily black text on a white background, creating a clean, professional appearance. There appear to be multiple columns of information, possibly representing different topics or categories. The layout suggests this might be documentation, a reference guide, or an educational resource related to technical content.

Technical Analysis:
This appears to be a screenshot of a digital document or webpage. The image quality is good with clear text rendering, suggesting it was captured at an appropriate resolution. The image uses a standard document layout with what appears to be a grid or multi-column structure. The screenshot has been taken of what seems to be a text-heavy interface with minimal graphics, consistent with technical documentation or reference materials.
```

Note: Actual descriptions vary with the image content and may be more detailed than this example.