Space: mknolan/internvl25-image-analyzer-working

mknolan committed on Mar 21

Commit 1f8c48b (verified) · 1 parent: 562ded1

Upload README.md with huggingface_hub

Files changed (1): README.md +135 -5

@@ -1,10 +1,140 @@
 ---
-title: Internvl25 Image Analyzer Working
-emoji: 📚
-colorFrom: yellow
-colorTo: yellow
 sdk: docker
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: InternVL2.5 Dual Image Analyzer
+emoji: 🖼️
+colorFrom: blue
+colorTo: purple
 sdk: docker
+sdk_version: 3.10
+app_file: app.py
 pinned: false
+license: mit
 ---
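The new metadata pairs `sdk: docker` with `sdk_version` and `app_file`. In the Spaces configuration reference, `sdk_version` is documented for Gradio and Streamlit Spaces and `app_file` for Gradio, Streamlit, or static Spaces. A Docker Space is instead driven by its Dockerfile and `app_port`, which defaults to 7860. A YAML parser also reads an unquoted `3.10` as the number 3.1. The checker below is a hypothetical sketch. `read_front_matter` and `docker_ignored_keys` are illustrative names, and the parser only handles flat `key: value` lines, not full YAML.

```python
from __future__ import annotations

# Keys the Spaces config reference documents for Gradio/Streamlit (or static) Spaces.
DOCKER_UNUSED_KEYS = {"sdk_version", "app_file"}

def read_front_matter(readme_text: str) -> dict[str, str]:
    """Naive parser for flat 'key: value' front matter; not a full YAML parser."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the metadata block
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

def docker_ignored_keys(meta: dict[str, str]) -> list[str]:
    """Return keys that a Docker Space does not use, sorted for stable output."""
    if meta.get("sdk") != "docker":
        return []
    return sorted(key for key in meta if key in DOCKER_UNUSED_KEYS)

README = """---
title: InternVL2.5 Dual Image Analyzer
sdk: docker
sdk_version: 3.10
app_file: app.py
pinned: false
---
# InternVL2.5 Dual Image Analyzer
"""

print(docker_ignored_keys(read_front_matter(README)))  # ['app_file', 'sdk_version']
```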
+# InternVL2.5 Dual Image Analyzer
+This Hugging Face Space demonstrates InternVL2.5, an open-source vision-language model from OpenGVLab.
+You can upload one or two images, analyze them in a single run, and compare the results side by side.
+## Features
+- Upload one or two images for detailed analysis
+- Uses the InternVL2.5-8B model for high-quality image understanding
+- Handles images of various aspect ratios and formats
+- Multi-GPU support for efficient processing
+- Provides a selection of prompts or allows custom queries
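Handling various aspect ratios probably refers to InternVL's dynamic-resolution preprocessing. The reference code on the InternVL2.5 model card cuts each image into 448×448 tiles, using the tile grid whose shape is closest to the image's aspect ratio (up to 12 tiles by default, plus a thumbnail when more than one tile is used). This README does not show how `app.py` preprocesses images, so treat the sketch below as an assumption modeled on that reference code. `candidate_grids` and `choose_tile_grid` are hypothetical names, and the sketch covers only the grid choice, not the resizing or cropping.

```python
from __future__ import annotations

def candidate_grids(min_tiles: int = 1, max_tiles: int = 12) -> list[tuple[int, int]]:
    """All (cols, rows) grids whose tile count lies in [min_tiles, max_tiles]."""
    grids = {
        (cols, rows)
        for cols in range(1, max_tiles + 1)
        for rows in range(1, max_tiles + 1)
        if min_tiles <= cols * rows <= max_tiles
    }
    # Smallest grids first; ties broken by (cols, rows) so the order is deterministic.
    return sorted(grids, key=lambda g: (g[0] * g[1], g))

def choose_tile_grid(width: int, height: int, tile: int = 448,
                     max_tiles: int = 12) -> tuple[int, int]:
    """Pick the grid whose cols/rows ratio is closest to the image's aspect ratio."""
    aspect = width / height
    best, best_diff = (1, 1), float("inf")
    for cols, rows in candidate_grids(1, max_tiles):
        diff = abs(aspect - cols / rows)
        if diff < best_diff:
            best, best_diff = (cols, rows), diff
        elif diff == best_diff and width * height > 0.5 * tile * tile * cols * rows:
            # Equal fit: move to the larger grid only if the image has enough pixels for it.
            best = (cols, rows)
    return best

# The image would then be resized to (tile * cols, tile * rows) and cut into cols * rows tiles.
print(choose_tile_grid(896, 448))    # (2, 1)
print(choose_tile_grid(1792, 1792))  # (3, 3)
```

The tie rule prevents a small image from being stretched onto a large grid just because the shapes match equally well.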
+## Usage
+1. Upload one or two images using the upload buttons
+2. Select a prompt from the dropdown or enter your own
+3. Click "Analyze Images" to process the images
+4. View the detailed analysis for each image
+For comparing two images, use the prompt "Compare these images and describe the differences."
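To send both images to the model in a single query, the multi-image example on the InternVL2.5 model card puts an `Image-N: <image>` line in front of the question for each image. It passes the per-image tile counts to `model.chat` through `num_patches_list`. This README does not say whether `app.py` does this or analyzes each image separately. The hypothetical helper below only builds the question string.

```python
def build_multi_image_question(num_images: int, prompt: str) -> str:
    """Prefix the user prompt with one 'Image-N: <image>' placeholder line per image."""
    if num_images < 1:
        raise ValueError("at least one image is required")
    if num_images == 1:
        # Single-image examples on the model card use a bare '<image>' placeholder.
        return f"<image>\n{prompt}"
    labels = "".join(f"Image-{i}: <image>\n" for i in range(1, num_images + 1))
    return labels + prompt

print(build_multi_image_question(2, "Compare these images and describe the differences."))
```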
+## Requirements
+- Python 3.8 or higher
+- PyTorch
+- Transformers (version 4.35.2+)
+- Pillow
+- Matplotlib
+- Accelerate
+- Bitsandbytes
+- Safetensors
+- Gradio for the web interface
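One way to confirm that an environment matches the list above is to read installed versions with `importlib.metadata`. It is part of the standard library from Python 3.8, the stated minimum. The script below is a hypothetical sketch. Only Transformers has a pinned minimum in this README. `version_tuple` compares leading numeric components only, so it does not implement full PEP 440 ordering (pre-releases, local versions).

```python
from __future__ import annotations

import re
from importlib import metadata  # standard library since Python 3.8

PACKAGES = ["torch", "transformers", "Pillow", "matplotlib", "accelerate",
            "bitsandbytes", "safetensors", "gradio"]
MINIMUMS = {"transformers": "4.35.2"}  # only pinned version in the Requirements list

def version_tuple(version: str) -> tuple[int, ...]:
    """Leading numeric release components, e.g. '4.37.2.dev0' -> (4, 37, 2)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()

def check_environment() -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for name in PACKAGES:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        minimum = MINIMUMS.get(name)
        if minimum and version_tuple(installed) < version_tuple(minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems

if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```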
+## Hardware Requirements
+This application runs a vision-language model that requires:
+- A CUDA-capable GPU with at least 8GB VRAM
+- 8GB+ system RAM
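To sanity-check the 8GB VRAM figure, estimate the memory for the weights alone: parameter count times bytes per parameter. The sketch assumes about 8 billion parameters, taken from the "8B" in the model name. It ignores activations, the generation cache, the CUDA context, and any modules bitsandbytes leaves unquantized, so actual usage will be higher.

```python
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "4-bit": 0.5}

def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory for model weights only, in GiB (2**30 bytes); excludes all runtime overhead."""
    return num_params * bytes_per_param / 2**30

for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name:>5}: {weight_memory_gib(8e9, nbytes):.1f} GiB")
# bf16: 14.9 GiB, int8: 7.5 GiB, 4-bit: 3.7 GiB
```

With 8-bit weights the estimate is already about 7.5 GiB before runtime overhead. An 8GB card is therefore tight, and a 16GB T4 leaves more headroom.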
+## Deployment Options
+### 1. Hugging Face Spaces (Recommended)
+This repository is ready to be deployed on Hugging Face Spaces.
+**Steps:**
+1. Create a new Space on [Hugging Face Spaces](https://huggingface.co/spaces)
+2. Select "Docker" as the Space SDK
+3. Push this repository's files to the Space's Git repository (a GitHub repository can be kept in sync with a GitHub Action)
+4. Select a GPU (T4 or better is recommended)
+5. Create the Space
+The application will automatically deploy with the Gradio UI frontend.
+### 2. AWS SageMaker
+For production deployment on AWS SageMaker:
+1. Package the application using the provided Dockerfile
+2. Upload the Docker image to Amazon ECR
+3. Create a SageMaker Model using the ECR image
+4. Deploy an endpoint with a GPU instance type such as ml.g4dn.xlarge
+5. Set up API Gateway for HTTP access (optional)
+Detailed AWS instructions can be found in the `docs/aws_deployment.md` file.
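Once the endpoint is running, a client can call it with `invoke_endpoint` on boto3's `sagemaker-runtime` client. The endpoint name and the JSON request shape below are hypothetical (a base64 `image` field plus a `prompt` field). They must match whatever inference handler the container implements. SageMaker expects a custom container to answer `GET /ping` and `POST /invocations` on port 8080, which a plain Gradio app does not do on its own.

```python
import base64
import json

def build_payload(image_bytes: bytes, prompt: str) -> bytes:
    """Hypothetical request schema: base64-encoded image plus the prompt text."""
    body = {"image": base64.b64encode(image_bytes).decode("ascii"), "prompt": prompt}
    return json.dumps(body).encode("utf-8")

def analyze_remote(endpoint_name: str, image_path: str, prompt: str) -> dict:
    """Send one image to a deployed SageMaker endpoint and decode its JSON reply."""
    import boto3  # imported lazily so build_payload works without AWS dependencies

    with open(image_path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=payload,
    )
    return json.loads(response["Body"].read())

# Example (needs AWS credentials and a deployed endpoint; the name is hypothetical):
# result = analyze_remote("internvl25-analyzer", "data_temp/page_2.png",
#                         "Describe this image in detail.")
```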
+### 3. Azure Machine Learning
+For Azure deployment:
+1. Create an Azure ML workspace
+2. Register the model on Azure ML
+3. Create an inference configuration
+4. Deploy to AKS or ACI with a GPU-enabled instance
+Detailed Azure instructions can be found in the `docs/azure_deployment.md` file.
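Azure ML web service deployments, including AKS and ACI, run an entry (scoring) script. The script defines `init()`, called once when the service starts, and `run(raw_data)`, called for each request. The sketch below shows only that shape. The JSON fields are a hypothetical schema, and `load_model` is a placeholder standing in for the real InternVL2.5 loading and generation code.

```python
import base64
import json
import os

model = None  # set once by init()

def load_model(model_dir: str):
    """Placeholder for the real InternVL2.5 loading code (hypothetical)."""
    return lambda image_bytes, prompt: f"{len(image_bytes)}-byte image, prompt: {prompt}"

def init() -> None:
    """Called once by Azure ML when the deployment starts."""
    global model
    # Azure ML points AZUREML_MODEL_DIR at the registered model's files.
    model = load_model(os.getenv("AZUREML_MODEL_DIR", "."))

def run(raw_data: str) -> str:
    """Called per request with the raw request body."""
    request = json.loads(raw_data)  # hypothetical schema: {"image": base64, "prompt": str}
    image_bytes = base64.b64decode(request["image"])
    return json.dumps({"description": model(image_bytes, request["prompt"])})
```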
+## How It Works
+The application uses InternVL2.5-8B, a multimodal model that generates text descriptions of images in response to a prompt.
+At a high level, the application:
+1. Loads the model with 8-bit quantization to reduce memory requirements
+2. Processes each image with the selected prompt
+3. Formats and displays the results
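The sketch below loads the model in 8-bit with Transformers and bitsandbytes and spreads it over the available GPUs with Accelerate's `device_map="auto"`. It follows the loading pattern on the InternVL2.5 model card: `trust_remote_code=True`, and a `model.chat` method supplied by the model's own code. The checkpoint id `OpenGVLab/InternVL2_5-8B` and the generation settings are assumptions, and this README does not show the actual code in `app.py`. The model card's own multi-GPU recipe builds a custom `device_map`, so treat `"auto"` as a starting point.

```python
def load_internvl_8bit(path: str = "OpenGVLab/InternVL2_5-8B"):
    # Heavy imports stay inside the function so this module imports without a GPU stack.
    import torch
    from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

    model = AutoModel.from_pretrained(
        path,
        torch_dtype=torch.bfloat16,  # dtype for modules that are not quantized
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        low_cpu_mem_usage=True,
        device_map="auto",           # let Accelerate place layers across GPUs
        trust_remote_code=True,      # InternVL ships its own modeling code
    ).eval()
    tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)
    return model, tokenizer

def describe(model, tokenizer, pixel_values, question: str, max_new_tokens: int = 512) -> str:
    # `model.chat` comes from InternVL's remote code; pixel_values are the preprocessed
    # image tiles (a bfloat16 tensor on the model's device in the model card examples).
    generation_config = dict(max_new_tokens=max_new_tokens, do_sample=False)
    return model.chat(tokenizer, pixel_values, question, generation_config)
```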
+## Repository Structure
+- `app.py` - Gradio web interface
+- `Dockerfile` - For containerized deployment
+- `requirements.txt` - Python dependencies
+- `data_temp/` - Sample images for testing
+## Local Development
+1. Install the required packages:
+   ```
+   pip install -r requirements.txt
+   ```
+2. Run the Gradio UI:
+   ```
+   python app.py
+   ```
+3. Visit `http://localhost:7860` in your browser
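The Usage steps describe two image inputs, a prompt dropdown that also accepts custom text, an "Analyze Images" button, and one result per image. That flow can be sketched with Gradio Blocks. This is not the actual `app.py`. The first preset prompt is hypothetical (the second is the comparison prompt from Usage), and `analyze_pair` takes a `describe` callable so the real model call can be plugged in. Binding to `0.0.0.0:7860` makes the app reachable from inside a Docker container and matches the default port for Docker Spaces.

```python
from __future__ import annotations

from typing import Callable

PRESET_PROMPTS = [
    "Describe this image in detail.",                      # hypothetical preset
    "Compare these images and describe the differences.",  # from the Usage section
]

def analyze_pair(image_a, image_b, prompt: str,
                 describe: Callable[[object, str], str]) -> tuple[str, str]:
    """Run `describe` on each uploaded image; a missing upload yields a placeholder."""
    results = [describe(image, prompt) if image is not None else "No image uploaded."
               for image in (image_a, image_b)]
    return results[0], results[1]

def build_demo(describe: Callable[[object, str], str]):
    import gradio as gr  # imported here so analyze_pair works without Gradio installed

    with gr.Blocks(title="InternVL2.5 Dual Image Analyzer") as demo:
        with gr.Row():
            image_a = gr.Image(type="pil", label="Image 1")
            image_b = gr.Image(type="pil", label="Image 2")
        prompt = gr.Dropdown(choices=PRESET_PROMPTS, value=PRESET_PROMPTS[0],
                             allow_custom_value=True, label="Prompt")
        button = gr.Button("Analyze Images")
        with gr.Row():
            out_a = gr.Textbox(label="Analysis of image 1")
            out_b = gr.Textbox(label="Analysis of image 2")
        button.click(lambda a, b, p: analyze_pair(a, b, p, describe),
                     inputs=[image_a, image_b, prompt], outputs=[out_a, out_b])
    return demo

if __name__ == "__main__":
    # Placeholder describe function that reports the PIL image size; swap in the model call.
    demo = build_demo(lambda image, prompt: f"{image.size[0]}x{image.size[1]} image: {prompt}")
    demo.launch(server_name="0.0.0.0", server_port=7860)
```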
+## Example Output
+```
+Processing image: data_temp/page_2.png
+Loading model...
+Generating descriptions...
+==== Image Description Results (InternVL2.5) ====
+Basic Description:
+The image shows a webpage or document with text content organized in multiple columns.
+Detailed Description:
+The image displays a structured document or webpage with multiple sections of text organized in a grid layout. The content appears to be technical or educational in nature, with what looks like headings and paragraphs of text. The color scheme is primarily black text on a white background, creating a clean, professional appearance. There appear to be multiple columns of information, possibly representing different topics or categories. The layout suggests this might be documentation, a reference guide, or an educational resource related to technical content.
+Technical Analysis:
+This appears to be a screenshot of a digital document or webpage. The image quality is good with clear text rendering, suggesting it was captured at an appropriate resolution. The image uses a standard document layout with what appears to be a grid or multi-column structure. The screenshot has been taken of what seems to be a text-heavy interface with minimal graphics, consistent with technical documentation or reference materials.
+```
+Note: Actual descriptions will vary based on the specific image content and may be more detailed than this example.
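The example output prints three labelled sections after a `====` banner. If that text has to be consumed by a program, for instance to fill separate UI fields, a small parser can split it on the section labels. The labels are copied from the example. The function name, and the assumption that each label sits on its own line ending in a colon, are illustrative.

```python
from __future__ import annotations

SECTION_LABELS = ("Basic Description", "Detailed Description", "Technical Analysis")

def parse_results(text: str) -> dict[str, str]:
    """Map each 'Label:' heading line to the joined text lines that follow it."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        label = stripped[:-1] if stripped.endswith(":") else None
        if label in SECTION_LABELS:
            current = label
            sections[current] = []
        elif current is not None and stripped:
            # Lines before the first label (progress messages, banner) are skipped.
            sections[current].append(stripped)
    return {label: " ".join(lines) for label, lines in sections.items()}
```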