---
title: RagBenchCapstone10
emoji: 📉
colorFrom: green
colorTo: yellow
sdk: gradio
sdk_version: 5.16.0
app_file: app.py
pinned: false
short_description: RagBench Dataset development by Saiteja
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# RAG Benchmark Evaluation System
## Overview
This project implements a Retrieval-Augmented Generation (RAG) system for evaluating different language models and reranking strategies. It provides a user-friendly interface for querying documents and analyzing the performance of various models.
## Features
- Multiple LLM support (LLaMA 3.3, Mistral 7B)
- Various reranking models:
  - MS MARCO MiniLM
  - MS MARCO TinyBERT
  - MonoT5 Base
  - MonoT5 Small
  - MonoT5 3B
- Vector similarity search using Milvus
- Automatic document chunking and retrieval
- Performance metrics calculation
- Interactive Gradio interface
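
The chunking step listed above is not described in detail here. As a rough illustration only, the sketch below splits text into fixed-size, overlapping word windows. The function name `chunk_text` and the `chunk_size` and `overlap` parameters are hypothetical and not taken from this project's code, which may chunk by tokens or sentences instead.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in at least one chunk.
    (Hypothetical helper; sizes are illustrative defaults.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks
```

Each chunk would then be embedded and stored in the vector database. A larger overlap costs more storage but reduces the chance of splitting an answer across two chunks.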
## Prerequisites
- Python 3.8+
- CUDA-compatible GPU (optional, for faster processing)
## Installation
1. Clone the repository:
   ```bash
   git clone https://github.com/yourusername/rag-benchmark.git
   cd rag-benchmark
   ```
2. Install dependencies:
   - `pip install -r requirements.txt`
3. Configure the models:
   - Create a `models` directory and add your language model files.
   - Create a `rerankers` directory and add your reranking model files.
4. Run the application:
   - `python app.py`
## Usage
1. Start the application with `python app.py`.
2. Access the web interface at `http://localhost:7860`
3. Enter your question and select:
   - LLM Model (LLaMA 3.3 or Mistral 7B)
   - Reranking Model (MS MARCO or MonoT5 variants)
4. Click "Evaluate Model" to get results
## Metrics
The system calculates several performance metrics:
- RMSE Context Relevance
- RMSE Context Utilization
- AUCROC Adherence
- Processing Time
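
To make the metric names concrete, here is a dependency-free sketch of RMSE and AUCROC (area under the ROC curve, often written AUROC). It assumes that context relevance and context utilization are continuous scores compared against reference annotations, and that adherence is a binary label scored by a predicted probability. The function names are illustrative; the project itself may compute these with scikit-learn, whose `roc_auc_score` returns the same quantity as `auroc` below.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and reference scores."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_error = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared_error / len(predicted))


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Lower RMSE is better (0 means the predictions match the references exactly), while higher AUCROC is better (0.5 is chance level, 1.0 is a perfect ranking). The pairwise loop is quadratic in the number of examples, which is fine for a sketch but slow for large evaluation sets.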
## Reranking Models Comparison
### MS MARCO Models
- **MiniLM**: Fast and efficient, good general performance
- **TinyBERT**: Lightweight, slightly lower accuracy but faster
### MonoT5 Models
- **Small**: Compact and fast, suitable for limited resources
- **Base**: Balanced performance and speed
- **3B**: Highest accuracy, requires more computational resources
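
All of the rerankers above follow the same pattern: score each (query, document) pair, then keep the highest-scoring documents. The sketch below shows that pattern with a pluggable scoring function. The names `rerank` and `overlap_score` are hypothetical, and `overlap_score` is a toy word-overlap scorer standing in for a real model so the example runs without downloading weights.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and return the top_k by score."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def overlap_score(query: str, document: str) -> float:
    """Toy stand-in scorer: fraction of query words present in the document.

    A real reranker (MS MARCO cross-encoder or MonoT5) would replace this.
    """
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)
```

For example, `rerank("milvus reranking", docs, overlap_score, top_k=2)` returns the two documents sharing the most words with the query. With sentence-transformers, a cross-encoder's `predict` method over a list of (query, document) pairs could supply the scores instead, though how this project wires in its models is not shown here.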
## Error Handling
- Automatic fallback to fewer documents if token limits are exceeded
- Graceful handling of API timeouts
- Comprehensive error logging
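
The exact fallback logic is not documented here. One plausible reading, sketched below, is to drop the lowest-ranked documents until the combined context fits a token budget. The names `fit_documents` and `whitespace_tokens` are hypothetical, and the whitespace counter is only a stand-in for a real tokenizer such as tiktoken.

```python
from typing import Callable, List


def fit_documents(
    documents: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Drop trailing documents until the total token count fits the budget.

    Assumes `documents` is ordered best-first (for example, after
    reranking), so the least relevant documents are removed first.
    """
    kept = list(documents)
    while kept and sum(count_tokens(doc) for doc in kept) > max_tokens:
        kept.pop()  # remove the lowest-ranked remaining document
    return kept


def whitespace_tokens(text: str) -> int:
    """Stand-in token counter: counts whitespace-separated words."""
    return len(text.split())
```

If even the top-ranked document exceeds the budget, this sketch returns an empty list. A real implementation might truncate that document instead of discarding it.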
## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## Dependencies
- gradio
- torch
- transformers
- sentence-transformers
- pymilvus
- numpy
- pandas
- scikit-learn
- tiktoken
- groq
- huggingface_hub
## License
[Your License Here]
## Acknowledgments
- RAGBench dataset
- Hugging Face Transformers
- Milvus Vector Database
- Groq API