	---
	title: RagBenchCapstone10
	emoji: 📉
	colorFrom: green
	colorTo: yellow
	sdk: gradio
	sdk_version: 5.16.0
	app_file: app.py
	pinned: false
	short_description: RagBench Dataset development by Saiteja
	---

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

	# RAG Benchmark Evaluation System

	## Overview

	This project implements a Retrieval-Augmented Generation (RAG) system for evaluating different language models and reranking strategies. It provides a user-friendly interface for querying documents and analyzing the performance of various models.

	## Features

	- Multiple LLM support (LLaMA 3.3, Mistral 7B)
	- Various reranking models:
	  - MS MARCO MiniLM
	  - MS MARCO TinyBERT
	  - MonoT5 Base
	  - MonoT5 Small
	  - MonoT5 3B
	- Vector similarity search using Milvus
	- Automatic document chunking and retrieval
	- Performance metrics calculation
	- Interactive Gradio interface
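
The chunking step in the list above can be pictured with a minimal sketch. The README does not say how the project splits documents, so the word-based sliding window, the function name `chunk_words`, and the `chunk_size` and `overlap` parameters are all assumptions made for illustration.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into overlapping chunks of at most chunk_size words.

    Hypothetical helper: the project's real chunking strategy is not documented.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1):
        print(chunk)
```

Overlapping windows keep a sentence that straddles a boundary retrievable from at least one chunk, at the cost of storing some words twice.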

	## Prerequisites

	- Python 3.8+
	- CUDA-compatible GPU (optional, for faster processing)

	## Installation

	1. Clone the repository:

	```bash
	git clone https://github.com/yourusername/rag-benchmark.git
	cd rag-benchmark
	```

	2. Install dependencies:

	- `pip install -r requirements.txt`

	3. Configure the models:

	- Create a `models` directory and add your language model files.
	- Create a `rerankers` directory and add your reranking model files.

	4. Run the application:

	- `python app.py`

	## Usage

	1. Start the application with `python app.py`.

	2. Access the web interface at `http://localhost:7860`

	3. Enter your question and select:

	- LLM Model (LLaMA 3.3 or Mistral 7B)
	- Reranking Model (MS MARCO or MonoT5 variants)

	4. Click "Evaluate Model" to get results

	## Metrics

	The system calculates several performance metrics:

	- RMSE Context Relevance
	- RMSE Context Utilization
	- AUC-ROC Adherence
	- Processing Time
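
As a rough illustration of the first three metrics, the sketch below computes RMSE and AUC-ROC in plain Python. It assumes that the RMSE metrics compare predicted scores against ground-truth scores and that adherence is a binary label scored by the model; the README does not confirm either, and the function names are hypothetical.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predicted and ground-truth scores."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Labels are assumed to be 0 or 1.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(rmse([0.5, 0.5], [1.0, 0.0]))
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic, which is fine for a sketch. Since scikit-learn is already a dependency, `sklearn.metrics.roc_auc_score` computes the same quantity for larger evaluations.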

	## Reranking Models Comparison

	### MS MARCO Models

	- MiniLM: Fast and efficient, good general performance
	- TinyBERT: Lightweight, slightly lower accuracy but faster

	### MonoT5 Models

	- Small: Compact and fast, suitable for limited resources
	- Base: Balanced performance and speed
	- 3B: Highest accuracy, requires more computational resources
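
Whichever model is chosen, reranking follows the same pattern: score each retrieved document against the query, then keep the best ones. The sketch below shows that pattern with a pluggable scoring function. `rerank`, `overlap_score`, and `top_k` are hypothetical names, and the word-overlap scorer is only a toy stand-in for a real MS MARCO or MonoT5 model.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and return the top_k, best first."""
    scored = [(doc, score_fn(query, doc)) for doc in documents]
    # sorted() is stable, so documents with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


def overlap_score(query: str, document: str) -> float:
    """Toy scorer: fraction of query words that also appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & doc_words) / len(query_words)


if __name__ == "__main__":
    docs = [
        "milvus stores vectors",
        "gradio builds interfaces",
        "milvus vector search with reranking",
    ]
    for doc, score in rerank("milvus vector search", docs, overlap_score, top_k=2):
        print(f"{score:.2f}  {doc}")
```

Because the scorer is passed in as a function, a heavier model can replace the toy scorer without changing the ranking logic, which is one way to compare the speed and accuracy trade-offs listed above.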

	## Error Handling

	- Automatic fallback to fewer documents if token limits are exceeded
	- Graceful handling of API timeouts
	- Comprehensive error logging
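
The first item, falling back to fewer documents, might look like the sketch below. It assumes the documents arrive ranked best first and that the lowest-ranked ones are dropped until the context fits. The name `fit_documents` and the whitespace token counter are hypothetical placeholders, not the project's actual code.

```python
from typing import Callable, List


def whitespace_tokens(text: str) -> int:
    """Crude token count (assumption): one token per whitespace-separated word."""
    return len(text.split())


def fit_documents(
    documents: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = whitespace_tokens,
) -> List[str]:
    """Drop the lowest-ranked documents until the total fits within max_tokens.

    Documents are assumed to be ordered best first.
    """
    kept = list(documents)  # copy so the caller's list is left untouched
    while kept and sum(count_tokens(doc) for doc in kept) > max_tokens:
        kept.pop()  # remove the last (lowest-ranked) document
    return kept


if __name__ == "__main__":
    ranked = ["one two three", "four five", "six seven eight nine"]
    print(fit_documents(ranked, max_tokens=5))
```

A real tokenizer can be passed as `count_tokens`. With tiktoken, for example, `lambda text: len(encoding.encode(text))` would work for an `encoding` obtained from `tiktoken.get_encoding`.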

	## Contributing

	1. Fork the repository
	2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
	3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
	4. Push to the branch (`git push origin feature/AmazingFeature`)
	5. Open a Pull Request

	## Dependencies

	- gradio
	- torch
	- transformers
	- sentence-transformers
	- pymilvus
	- numpy
	- pandas
	- scikit-learn
	- tiktoken
	- groq
	- huggingface_hub

	## License

	[Your License Here]

	## Acknowledgments

	- RAGBench dataset
	- Hugging Face Transformers
	- Milvus Vector Database
	- Groq API