
	# AskVeracity Configuration Guide

	This document describes how to set up and configure the AskVeracity fact-checking and misinformation detection system.

	## Prerequisites

	Before setting up AskVeracity, ensure you have:

	- Python 3.8 or higher
	- pip (Python package installer)
	- Git (for cloning the repository)
	- API keys for external services

	## Installation

	### Local Development

	1. Clone the repository:
	```bash
	git clone https://github.com/yourusername/askveracity.git
	cd askveracity
	```

	2. Install the required dependencies:
	```bash
	pip install -r requirements.txt
	```

	3. Download the required spaCy model:
	```bash
	python -m spacy download en_core_web_sm
	```

	## API Key Configuration

	AskVeracity requires several API keys to access external services. You have two options for configuring these keys:

	### Option 1: Using Streamlit Secrets (Recommended for Local Development)

	1. Create a `.streamlit` directory if it doesn't exist:
	```bash
	mkdir -p .streamlit
	```

	2. Create a `secrets.toml` file from the example template:
	```bash
	cp .streamlit/secrets.toml.example .streamlit/secrets.toml
	```

	3. Edit the `.streamlit/secrets.toml` file with your API keys:
	```toml
	OPENAI_API_KEY = "your_openai_api_key"
	NEWS_API_KEY = "your_news_api_key"
	FACTCHECK_API_KEY = "your_factcheck_api_key"
	```

	### Option 2: Using Environment Variables

	1. Create a `.env` file in the root directory:
	```bash
	touch .env
	```

	2. Add your API keys to the `.env` file:
	```
	OPENAI_API_KEY=your_openai_api_key
	NEWS_API_KEY=your_news_api_key
	FACTCHECK_API_KEY=your_factcheck_api_key
	```

	3. Load the environment variables:
	```python
	# In Python
	from dotenv import load_dotenv
	load_dotenv()
	```

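	Or in your terminal:
	```bash
	# Unix/Linux/macOS: export every variable defined in .env to the current shell
	# (a plain `source .env` sets the variables but does not export them to the app)
	set -a
	source .env
	set +a

	# Windows (also works on other platforms)
	# Install python-dotenv[cli] and run
	dotenv run streamlit run app.py
	```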
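
The two options can be combined in a single lookup helper. The following is a minimal sketch, not the project's actual key handling: `get_api_key` is a hypothetical name, the `secrets` argument stands in for Streamlit's `st.secrets` mapping, and giving secrets precedence over environment variables is an assumption.

```python
import os
from typing import Mapping, Optional


def get_api_key(name: str, secrets: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the key called `name`, or None if it is not configured.

    `secrets` stands in for Streamlit's `st.secrets`; it is passed in
    explicitly so this sketch runs without Streamlit installed.
    """
    # Option 1: a secrets mapping, if provided, takes precedence (assumption).
    if secrets is not None and name in secrets:
        return secrets[name]
    # Option 2: fall back to the process environment (.env or exported variables).
    return os.environ.get(name)
```

In a Streamlit app, this could be called as `get_api_key("OPENAI_API_KEY", st.secrets)`.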

	## Required API Keys

	AskVeracity uses the following external APIs:

	1. OpenAI API (Required)
	   - Used for claim extraction, classification, and explanation generation
	   - Get an API key from [OpenAI's website](https://platform.openai.com/)

	2. News API (Optional but recommended)
	   - Used for retrieving news article evidence
	   - Get an API key from [NewsAPI.org](https://newsapi.org/)

	3. Google Fact Check Tools API (Optional but recommended)
	   - Used for retrieving fact-checking evidence
	   - Get an API key from [Google Fact Check Tools API](https://developers.google.com/fact-check/tools/api)
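
A startup check can report which keys are missing before any request is made. This sketch is illustrative only: the key names and the required/optional split come from the list above, while `find_missing_keys` is a hypothetical helper that is not part of AskVeracity.

```python
import os
from typing import Dict, List, Mapping

# Key names from this guide; the split mirrors the list above.
REQUIRED_KEYS = ["OPENAI_API_KEY"]
OPTIONAL_KEYS = ["NEWS_API_KEY", "FACTCHECK_API_KEY"]


def find_missing_keys(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Group unset or blank key names by whether they are required or optional."""
    def missing(names: List[str]) -> List[str]:
        return [n for n in names if not env.get(n, "").strip()]

    return {"required": missing(REQUIRED_KEYS), "optional": missing(OPTIONAL_KEYS)}


report = find_missing_keys(os.environ)
if report["required"]:
    print("Missing required keys:", ", ".join(report["required"]))
if report["optional"]:
    print("Missing optional keys:", ", ".join(report["optional"]))
```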

	## Configuration Files

	### config.py

	The main configuration file is `config.py`, which contains:

	- API key handling
	- Rate limiting configuration
	- Error backoff settings
	- RSS feed settings

	Important configuration sections in `config.py`:

	```python
	# Rate limiting configuration
	RATE_LIMITS = {
	    # api_name: {"requests": max_requests, "period": period_in_seconds}
	    "newsapi": {"requests": 100, "period": 3600},  # 100 requests per hour
	    "factcheck": {"requests": 1000, "period": 86400},  # 1000 requests per day
	    "semantic_scholar": {"requests": 10, "period": 300},  # 10 requests per 5 minutes
	    "wikidata": {"requests": 60, "period": 60},  # 60 requests per minute
	    "wikipedia": {"requests": 200, "period": 60},  # 200 requests per minute
	    "rss": {"requests": 300, "period": 3600},  # 300 RSS requests per hour
	}

	# Error backoff settings
	ERROR_BACKOFF = {
	    "max_retries": 5,
	    "initial_backoff": 1,  # seconds
	    "backoff_factor": 2,  # exponential backoff
	}

	# RSS feed settings
	RSS_SETTINGS = {
	    "max_feeds_per_request": 10,  # Maximum number of feeds to try per request
	    "max_age_days": 3,  # Maximum age of RSS items to consider
	    "timeout_seconds": 5,  # Timeout for RSS feed requests
	    "max_workers": 5,  # Number of parallel workers for fetching feeds
	}
	```
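
To show how the `ERROR_BACKOFF` values might translate into retry delays, here is a minimal sketch. It assumes the delay before retry `n` is `initial_backoff * backoff_factor ** n` and that `max_retries` counts retries after the first attempt. The helpers `backoff_delays` and `call_with_backoff` are hypothetical names and may not match how AskVeracity applies these settings.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Values copied from the ERROR_BACKOFF block above.
ERROR_BACKOFF = {"max_retries": 5, "initial_backoff": 1, "backoff_factor": 2}


def backoff_delays(settings: dict) -> List[float]:
    """Delay in seconds before each retry (assumed formula, see lead-in)."""
    return [
        settings["initial_backoff"] * settings["backoff_factor"] ** attempt
        for attempt in range(settings["max_retries"])
    ]


def call_with_backoff(func: Callable[[], T], settings: dict = ERROR_BACKOFF,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, sleeping for a growing delay after each failure."""
    delays = backoff_delays(settings)
    for attempt in range(len(delays) + 1):
        try:
            return func()
        except Exception:
            if attempt == len(delays):
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With the defaults above, `backoff_delays(ERROR_BACKOFF)` gives `[1, 2, 4, 8, 16]`, so a call that keeps failing waits 31 seconds in total before the final error is raised.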

	### Category-Specific RSS Feeds

	Category-specific RSS feeds are defined in `modules/category_detection.py`. These feeds are used to prioritize sources based on the detected claim category:

	```python
	CATEGORY_SPECIFIC_FEEDS = {
	    "ai": [
	        "https://www.artificialintelligence-news.com/feed/",
	        "https://openai.com/news/rss.xml",
	        # Additional AI-specific feeds
	    ],
	    "science": [
	        "https://www.science.org/rss/news_current.xml",
	        "https://www.nature.com/nature.rss",
	        # Additional science feeds
	    ],
	    # Additional categories
	}
	```
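
One way such a mapping could be used to prioritize sources is sketched below. This is an assumption about the approach, not the code in `modules/category_detection.py`: `prioritized_feeds` and its `general_feeds` parameter are hypothetical, and the mapping is a trimmed copy of the one above.

```python
from typing import Dict, List

# Trimmed copy of the mapping shown above.
CATEGORY_SPECIFIC_FEEDS: Dict[str, List[str]] = {
    "ai": [
        "https://www.artificialintelligence-news.com/feed/",
        "https://openai.com/news/rss.xml",
    ],
    "science": [
        "https://www.science.org/rss/news_current.xml",
        "https://www.nature.com/nature.rss",
    ],
}


def prioritized_feeds(category: str, general_feeds: List[str]) -> List[str]:
    """Put the category's feeds first, then the general feeds, without duplicates."""
    ordered = CATEGORY_SPECIFIC_FEEDS.get(category, []) + general_feeds
    # dict.fromkeys keeps the first occurrence of each URL in order.
    return list(dict.fromkeys(ordered))
```

A category without an entry simply falls back to the general feed list.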

	## Hugging Face Spaces Deployment

	### Setting Up a Space

	1. Create a new Space on Hugging Face:
	   - Go to https://huggingface.co/spaces
	   - Click "Create new Space"
	   - Select "Streamlit" as the SDK
	   - Choose the hardware tier (use the default 16 GB RAM)

	2. Upload the project files:
	   - You can upload files directly through the Hugging Face web interface
	   - Alternatively, use Git to push to the Hugging Face repository
	   - Make sure to include all necessary files, including `requirements.txt`

	### Setting Up Secrets

	1. Add API keys as secrets:
	   - Go to the "Settings" tab of your Space
	   - Navigate to the "Repository secrets" section
	   - Add your API keys:
	     - `OPENAI_API_KEY`
	     - `NEWS_API_KEY`
	     - `FACTCHECK_API_KEY`

	### Configuring the Space

	Edit the metadata in the `README.md` file:

	```yaml
	---
	title: Askveracity
	emoji: 📉
	colorFrom: blue
	colorTo: pink
	sdk: streamlit
	sdk_version: 1.44.1
	app_file: app.py
	pinned: false
	license: mit
	short_description: Fact-checking and misinformation detection tool.
	---
	```

	## Custom Configuration

	### Adjusting Rate Limits

	You can adjust the rate limits in `config.py` based on your API subscription levels:

	```python
	# Update for higher tier News API subscription
	RATE_LIMITS["newsapi"] = {"requests": 500, "period": 3600} # 500 requests per hour
	```
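
To illustrate what the `requests` and `period` values mean in practice, the sketch below enforces them with a sliding window. It is an assumption about how such limits could be applied, not AskVeracity's own limiter: `SlidingWindowLimiter` is a hypothetical class, and the `clock` parameter exists only so the behavior can be tested without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict

# Excerpt of RATE_LIMITS using the higher-tier value shown above.
RATE_LIMITS = {"newsapi": {"requests": 500, "period": 3600}}


class SlidingWindowLimiter:
    """Allow at most `requests` calls per `period` seconds for each API name."""

    def __init__(self, limits: Dict[str, Dict[str, int]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limits = limits
        self._clock = clock
        self._calls: Dict[str, Deque[float]] = {}

    def allow(self, api_name: str) -> bool:
        """Record the call and return True if under the limit, else return False."""
        limit = self._limits[api_name]
        now = self._clock()
        calls = self._calls.setdefault(api_name, deque())
        # Drop timestamps that have aged out of the window.
        while calls and now - calls[0] >= limit["period"]:
            calls.popleft()
        if len(calls) >= limit["requests"]:
            return False
        calls.append(now)
        return True


limiter = SlidingWindowLimiter(RATE_LIMITS)
if limiter.allow("newsapi"):
    pass  # safe to send the request here
```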

	### Modifying RSS Feeds

	The list of RSS feeds can be found in `modules/rss_feed.py` and category-specific feeds in `modules/category_detection.py`. You can add or remove feeds as needed.

	### Performance Evaluation

	The system includes a performance evaluation script `evaluate_performance.py` that:

	1. Runs the fact-checking system on a predefined set of test claims
	2. Calculates accuracy, safety rate, processing time, and confidence metrics
	3. Generates visualization charts in the `results/` directory
	4. Saves detailed results to `results/performance_results.json`

	To run the performance evaluation:

	```bash
	python evaluate_performance.py [--limit N] [--output FILE]
	```

	- `--limit N`: Limit evaluation to first N claims (default: all)
	- `--output FILE`: Save results to FILE (default: performance_results.json)
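
The two options could be declared with `argparse` roughly as follows. This is a sketch based only on the flags and defaults documented above; `build_parser` and `select_claims` are hypothetical helpers, and the real `evaluate_performance.py` may be structured differently.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the --limit and --output options described in this guide."""
    parser = argparse.ArgumentParser(prog="evaluate_performance.py")
    parser.add_argument("--limit", type=int, default=None, metavar="N",
                        help="evaluate only the first N claims (default: all)")
    parser.add_argument("--output", default="performance_results.json", metavar="FILE",
                        help="file to save results to")
    return parser


def select_claims(claims: List[str], limit: Optional[int]) -> List[str]:
    """Return the first `limit` claims, or every claim when no limit is given."""
    return claims if limit is None else claims[:limit]
```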

	## Running the Application

	Start the Streamlit app:

	```bash
	streamlit run app.py
	```

	The application will be available at http://localhost:8501 by default.

	## Troubleshooting

	### API Key Issues

	If you encounter API key errors:

	1. Verify that your API keys are set correctly
	2. Check the logs for specific error messages
	3. Make sure API keys are not expired or rate-limited

	### Model Loading Errors

	If the spaCy model fails to load:

	```bash
	# Reinstall the model
	python -m spacy download en_core_web_sm --force-reinstall
	```

	### Rate Limiting

	If you encounter rate limiting issues:

	1. Lower the request counts in `RATE_LIMITS` in `config.py` so the app stays within each provider's quota
	2. Increase the backoff parameters in `ERROR_BACKOFF`
	3. Subscribe to higher API tiers if available

	### Memory Issues

	If the application crashes due to memory issues:

	1. Reduce the number of parallel workers in `RSS_SETTINGS`
	2. Limit the maximum number of evidence items processed
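
Both steps can be sketched together. The example assumes feeds are fetched with a thread pool sized by `RSS_SETTINGS["max_workers"]`. `MAX_EVIDENCE_ITEMS`, `gather_evidence`, and the `fetch` callable are hypothetical and do not correspond to settings or functions in `config.py`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

RSS_SETTINGS = {"max_workers": 2}  # lowered from the guide's default of 5
MAX_EVIDENCE_ITEMS = 10            # hypothetical cap, not a setting from config.py


def gather_evidence(feed_urls: List[str], fetch: Callable[[str], List[str]]) -> List[str]:
    """Fetch feeds with a bounded thread pool and keep at most MAX_EVIDENCE_ITEMS."""
    evidence: List[str] = []
    with ThreadPoolExecutor(max_workers=RSS_SETTINGS["max_workers"]) as pool:
        # map() yields results in input order, so the output is deterministic.
        for items in pool.map(fetch, feed_urls):
            evidence.extend(items)
            if len(evidence) >= MAX_EVIDENCE_ITEMS:
                break  # stop collecting once the cap is reached
    return evidence[:MAX_EVIDENCE_ITEMS]
```

Fewer workers means fewer feeds held in memory at once, at the cost of slower retrieval.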

	## Performance Optimization

	For better performance:

	1. Upgrade to a higher-tier OpenAI model for improved accuracy
	2. Increase the number of parallel workers for evidence retrieval
	3. Add more relevant RSS feeds to improve evidence gathering