metadata
title: InternVL2.5 Image Analyzer
emoji: 🖼️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 3.50.0
app_file: app.py
pinned: false

InternVL2.5 Image Analyzer

This Hugging Face Space demonstrates InternVL2.5, a multimodal model that analyzes images and answers questions about them.

Features

  • Upload your own images for analysis
  • Choose from predefined prompts or create your own
  • Detailed image understanding and description
  • Text recognition in images
  • Visual reasoning capabilities

Model Details

This Space uses InternVL2.5-8B, a multimodal large language model (MLLM) with approximately 8.1 billion parameters. The model was developed by OpenGVLab and targets a range of visual understanding tasks, including image description, text recognition, and visual reasoning.

Architecture

InternVL2.5 combines a vision encoder (based on the InternViT architecture) with a language model, allowing it to process both visual and textual information.

Example Prompts

Here are some prompts you can try:

  1. Describe this image in detail.
  2. What can you tell me about this image?
  3. Is there any text in this image? If so, can you read it?
  4. What is the main subject of this image?
  5. What emotions or feelings does this image convey?
  6. Describe the composition and visual elements of this image.
  7. Summarize what you see in this image in one paragraph.
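
The choice between a predefined prompt and a custom one can be sketched in plain Python. This is a minimal illustration, not necessarily how `app.py` does it: the prompt list is copied from this README, while `EXAMPLE_PROMPTS` and `resolve_prompt` are hypothetical names introduced here.

```python
from typing import Optional

# Prompt texts copied from the list above. The constant name is an assumption.
EXAMPLE_PROMPTS = [
    "Describe this image in detail.",
    "What can you tell me about this image?",
    "Is there any text in this image? If so, can you read it?",
    "What is the main subject of this image?",
    "What emotions or feelings does this image convey?",
    "Describe the composition and visual elements of this image.",
    "Summarize what you see in this image in one paragraph.",
]


def resolve_prompt(selected: Optional[str], custom: Optional[str]) -> str:
    """Hypothetical helper: pick the prompt that is sent to the model.

    A non-blank custom prompt wins; otherwise a known dropdown selection
    is used; otherwise the first example prompt is the fallback.
    """
    if custom and custom.strip():
        return custom.strip()
    if selected in EXAMPLE_PROMPTS:
        return selected
    return EXAMPLE_PROMPTS[0]
```

Letting the custom text take precedence means a user who types their own question does not also have to clear the dropdown.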

Usage

  1. Upload an image using the file uploader
  2. Select a prompt from the dropdown or write your own
  3. Click "Submit" to get the analysis
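
The three steps above could be wired together roughly as follows. This is a sketch under stated assumptions rather than the actual contents of `app.py`: `analyze`, `build_demo`, and the `run_model` callable are hypothetical names, and `run_model` stands in for whatever code calls InternVL2.5.

```python
from typing import Callable, List


def analyze(image, prompt: str, run_model: Callable[[object, str], str]) -> str:
    """Hypothetical handler: validate the inputs, then delegate to the model."""
    if image is None:
        return "Please upload an image first."
    prompt = (prompt or "").strip()
    if not prompt:
        return "Please select or enter a prompt."
    return run_model(image, prompt)


def build_demo(run_model: Callable[[object, str], str], prompts: List[str]):
    """Build a Gradio interface: image upload, prompt dropdown, custom prompt box."""
    # Imported lazily so analyze() can be used and tested without Gradio installed.
    import gradio as gr

    def handler(image, selected, custom):
        # A typed custom prompt takes precedence over the dropdown selection.
        return analyze(image, custom or selected, run_model)

    return gr.Interface(
        fn=handler,
        inputs=[
            gr.Image(type="pil", label="Image"),
            gr.Dropdown(choices=prompts, value=prompts[0], label="Prompt"),
            gr.Textbox(label="Custom prompt (optional)"),
        ],
        outputs=gr.Textbox(label="Analysis"),
        title="InternVL2.5 Image Analyzer",
    )
```

Calling `build_demo(run_model, prompts).launch()` would start the app. Passing the model in as a callable keeps the input checks testable with a stub in place of the 8B model.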

Credits

This application uses the InternVL2.5 model by OpenGVLab. For more information about the model, check out:

License

The InternVL2.5 model is licensed under the MIT License.