mknolan committed on
Commit 1f8c48b · verified · 1 Parent(s): 562ded1

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +135 -5
README.md CHANGED
@@ -1,10 +1,140 @@
1
  ---
2
- title: Internvl25 Image Analyzer Working
3
- emoji: 📚
4
- colorFrom: yellow
5
- colorTo: yellow
6
  sdk: docker
7
  pinned: false
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: InternVL2.5 Dual Image Analyzer
3
+ emoji: 🖼️
4
+ colorFrom: blue
5
+ colorTo: purple
6
  sdk: docker
7
+ sdk_version: 3.10
8
+ app_file: app.py
9
  pinned: false
10
+ license: mit
11
  ---
12
 
13
+ # InternVL2.5 Dual Image Analyzer
14
+
15
+ This Hugging Face Space demonstrates InternVL2.5, a vision-language model that answers natural-language questions about images.
16
+ It allows you to upload and analyze two images simultaneously, comparing the results side by side.
17
+
18
+ ## Features
19
+
20
+ - Upload one or two images for detailed analysis
21
+ - Uses the InternVL2.5-8B model for high-quality image understanding
22
+ - Handles a range of image aspect ratios and file formats
23
+ - Multi-GPU support for efficient processing
24
+ - Provides a selection of prompts or allows custom queries
25
+
26
+ ## Usage
27
+
28
+ 1. Upload one or two images using the upload buttons
29
+ 2. Select a prompt from the dropdown or enter your own
30
+ 3. Click "Analyze Images" to process the images
31
+ 4. View the detailed analysis for each image
32
+
33
+ For comparing two images, use the prompt "Compare these images and describe the differences."
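
Step 2 of the usage flow (pick a preset or type your own prompt) could be resolved along the lines of the sketch below. The helper name `resolve_prompt` and the first preset are made up for illustration and are not taken from `app.py`; only the comparison prompt comes from this README.

```python
from typing import Optional

# Illustrative preset list; only the second entry appears in the README.
PRESET_PROMPTS = [
    "Describe this image in detail.",
    "Compare these images and describe the differences.",
]


def resolve_prompt(selected: Optional[str], custom: Optional[str]) -> str:
    """Prefer a non-empty custom prompt, otherwise fall back to a preset."""
    if custom and custom.strip():
        return custom.strip()
    if selected in PRESET_PROMPTS:
        return selected
    # Nothing usable was supplied: default to the first preset.
    return PRESET_PROMPTS[0]
```

Giving the custom text priority means a user who types a question does not also have to clear the dropdown.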
34
+
35
+ ## Requirements
36
+
37
+ - Python 3.8 or higher
38
+ - PyTorch
39
+ - Transformers (version 4.35.2+)
40
+ - Pillow
41
+ - Matplotlib
42
+ - Accelerate
43
+ - Bitsandbytes
44
+ - Safetensors
45
+ - Gradio for the web interface
46
+
47
+ ## Hardware Requirements
48
+
49
+ This application uses a vision-language model which requires:
50
+ - A CUDA-capable GPU with at least 8GB VRAM
51
+ - 8GB+ system RAM
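
To put the VRAM figure in context, here is a rough back-of-the-envelope estimate of the memory taken by the model weights alone. It assumes that "8B" means roughly 8 billion parameters and ignores activations, image tensors, and any layers kept in higher precision, so treat it as a lower bound rather than a measurement.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 8e9  # assumption: "8B" taken as about 8 billion parameters

for bits in (16, 8):
    print(f"{bits}-bit weights: ~{weight_memory_gib(N_PARAMS, bits):.1f} GiB")
# 16-bit weights: ~14.9 GiB
# 8-bit weights: ~7.5 GiB
```

Under these assumptions the 8-bit weights already come close to 8 GB, so a GPU with more headroom is likely to be more comfortable in practice.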
52
+
53
+ ## Deployment Options
54
+
55
+ ### 1. Hugging Face Spaces (Recommended)
56
+
57
+ This repository is ready to be deployed on Hugging Face Spaces.
58
+
59
+ **Steps:**
60
+ 1. Create a new Space on [Hugging Face Spaces](https://huggingface.co/spaces)
61
+ 2. Select "Docker" as the Space SDK
62
+ 3. Link this GitHub repository
63
+ 4. Select a GPU (T4 or better is recommended)
64
+ 5. Create the Space
65
+
66
+ The application will automatically deploy with the Gradio UI frontend.
67
+
68
+ ### 2. AWS SageMaker
69
+
70
+ For production deployment on AWS SageMaker:
71
+
72
+ 1. Package the application using the provided Dockerfile
73
+ 2. Upload the Docker image to Amazon ECR
74
+ 3. Create a SageMaker Model using the ECR image
75
+ 4. Deploy an endpoint with an instance type like ml.g4dn.xlarge
76
+ 5. Set up API Gateway for HTTP access (optional)
77
+
78
+ Detailed AWS instructions can be found in the `docs/aws_deployment.md` file.
79
+
80
+ ### 3. Azure Machine Learning
81
+
82
+ For Azure deployment:
83
+
84
+ 1. Create an Azure ML workspace
85
+ 2. Register the model on Azure ML
86
+ 3. Create an inference configuration
87
+ 4. Deploy to AKS or ACI with a GPU-enabled instance
88
+
89
+ Detailed Azure instructions can be found in the `docs/azure_deployment.md` file.
90
+
91
+ ## How It Works
92
+
93
+ The application uses InternVL2.5, a multimodal vision-language model that generates detailed natural-language descriptions of images.
94
+
95
+ The script:
96
+ 1. Processes the images with the selected prompt
97
+ 2. Uses 8-bit quantization to reduce memory requirements
98
+ 3. Formats and displays the results
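
The 8-bit loading step might look roughly like the sketch below. It is not copied from `app.py`: the Hub repository id and the function name `load_model_8bit` are assumptions, and the call needs a CUDA GPU plus the packages listed under Requirements.

```python
MODEL_ID = "OpenGVLab/InternVL2_5-8B"  # assumed Hub repository id


def load_model_8bit(model_id: str = MODEL_ID):
    """Load the model and tokenizer with 8-bit weights (needs a CUDA GPU)."""
    # Imported lazily so this module can be imported without the heavy deps.
    import torch
    from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig

    model = AutoModel.from_pretrained(
        model_id,
        quantization_config=BitsAndBytesConfig(load_in_8bit=True),
        torch_dtype=torch.bfloat16,  # dtype for modules that stay unquantized
        low_cpu_mem_usage=True,
        trust_remote_code=True,      # the model ships custom modeling code
        device_map="auto",           # spread layers across available GPUs
    ).eval()
    tokenizer = AutoTokenizer.from_pretrained(
        model_id, trust_remote_code=True, use_fast=False
    )
    return model, tokenizer
```

`device_map="auto"` is one way to get the multi-GPU behaviour mentioned under Features; whether the application uses it is not stated here.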
99
+
100
+ ## Repository Structure
101
+
102
+ - `app.py` - Gradio UI for web interface
103
+ - `Dockerfile` - For containerized deployment
104
+ - `requirements.txt` - Python dependencies
105
+ - `data_temp/` - Sample images for testing
106
+
107
+ ## Local Development
108
+
109
+ 1. Install the required packages:
110
+ ```
111
+ pip install -r requirements.txt
112
+ ```
113
+
114
+ 2. Run the Gradio UI:
115
+ ```
116
+ python app.py
117
+ ```
118
+
119
+ 3. Visit `http://localhost:7860` in your browser
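
If the page does not load, a quick check that something is listening on the port can help separate a startup failure from a browser problem. This is a small standalone sketch; the helper name `is_listening` is illustrative and not part of the repository.

```python
import socket


def is_listening(host: str = "localhost", port: int = 7860, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("UI reachable:", is_listening())
```

Note that the model can take a while to load, so the port may only open some time after `python app.py` starts.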
120
+
121
+ ## Example Output
122
+
123
+ ```
124
+ Processing image: data_temp/page_2.png
125
+ Loading model...
126
+ Generating descriptions...
127
+
128
+ ==== Image Description Results (InternVL2.5) ====
129
+
130
+ Basic Description:
131
+ The image shows a webpage or document with text content organized in multiple columns.
132
+
133
+ Detailed Description:
134
+ The image displays a structured document or webpage with multiple sections of text organized in a grid layout. The content appears to be technical or educational in nature, with what looks like headings and paragraphs of text. The color scheme is primarily black text on a white background, creating a clean, professional appearance. There appear to be multiple columns of information, possibly representing different topics or categories. The layout suggests this might be documentation, a reference guide, or an educational resource related to technical content.
135
+
136
+ Technical Analysis:
137
+ This appears to be a screenshot of a digital document or webpage. The image quality is good with clear text rendering, suggesting it was captured at an appropriate resolution. The image uses a standard document layout with what appears to be a grid or multi-column structure. The screenshot has been taken of what seems to be a text-heavy interface with minimal graphics, consistent with technical documentation or reference materials.
138
+ ```
139
+
140
+ Note: Actual descriptions will vary based on the specific image content and may be more detailed than this example.