---
title: Multi-Modal AI Demo
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.20.1
app_file: app.py
pinned: false
---

# Multi-Modal AI Demo

This project demonstrates multi-modal AI capabilities using pretrained Hugging Face models. The application provides the following features:

1. **Image Captioning**: Generate descriptive captions for images
2. **Visual Question Answering**: Answer questions about the content of images
3. **Sentiment Analysis**: Analyze the sentiment of text inputs

## Requirements

- Python 3.8+
- Dependencies listed in `requirements.txt`

## Local Installation

To run this project locally:

1. Clone this repository
2. Install the dependencies:

   ```
   pip install -r requirements.txt
   ```

3. Run the application:

   ```
   python app.py
   ```

Then open your browser and navigate to the URL shown in the terminal (typically http://127.0.0.1:7860).

## Deploying to Hugging Face Spaces

This project is configured for direct deployment to Hugging Face Spaces. The core files needed for deployment are:

- `app.py` - Main application file
- `model_utils.py` - Utility functions for model operations
- `requirements.txt` - Project dependencies
- `README.md` - This documentation file, including the Spaces configuration in its front matter

## Models Used

This demo uses the following pretrained models from Hugging Face:

- Image Captioning: `nlpconnect/vit-gpt2-image-captioning`
- Visual Question Answering: `nlpconnect/vit-gpt2-image-captioning` (simplified)
- Sentiment Analysis: `distilbert-base-uncased-finetuned-sst-2-english`
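
## Example Usage

As a quick illustration of how these models can be driven, the sketch below loads them with the `transformers` `pipeline` API. This is a minimal, hypothetical example of the loading step, not the exact contents of `app.py` or `model_utils.py`:

```python
# Minimal sketch: loading the pretrained models via the transformers
# pipeline API. Illustrative only; the actual app may structure this
# differently (e.g. inside model_utils.py).
from transformers import pipeline

# Image captioning (the demo also reuses this model, in simplified
# form, for visual question answering)
captioner = pipeline(
    "image-to-text", model="nlpconnect/vit-gpt2-image-captioning"
)

# Sentiment analysis
sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Example calls ("example.jpg" is a placeholder image path)
print(captioner("example.jpg")[0]["generated_text"])
print(sentiment("I love this demo!")[0])  # e.g. {'label': 'POSITIVE', 'score': ...}
```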
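
Wiring such a pipeline into a Gradio UI takes only a few lines. The following is a minimal sketch of the captioning feature (an assumption about the structure, not the actual `app.py`):

```python
# Minimal Gradio sketch for the image-captioning feature; hypothetical,
# not the actual app.py.
import gradio as gr
from transformers import pipeline

captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

def caption(image_path: str) -> str:
    # Return the first generated caption for the uploaded image
    return captioner(image_path)[0]["generated_text"]

demo = gr.Interface(
    fn=caption,
    inputs=gr.Image(type="filepath", label="Image"),
    outputs=gr.Textbox(label="Caption"),
)

if __name__ == "__main__":
    demo.launch()  # serves on http://127.0.0.1:7860 by default
```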