# AskVeracity Configuration Guide
This document describes how to set up and configure the AskVeracity fact-checking and misinformation detection system.
## Prerequisites
Before setting up AskVeracity, ensure you have:
- Python 3.8 or higher
- pip (Python package installer)
- Git (for cloning the repository)
- API keys for external services
## Installation
### Local Development
1. Clone the repository:
```bash
git clone https://github.com/yourusername/askveracity.git
cd askveracity
```
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Download the required spaCy model:
```bash
python -m spacy download en_core_web_sm
```
## API Key Configuration
AskVeracity requires several API keys to access external services. You have two options for configuring these keys:
### Option 1: Using Streamlit Secrets (Recommended for Local Development)
1. Create a `.streamlit` directory if it doesn't exist:
```bash
mkdir -p .streamlit
```
2. Create a `secrets.toml` file:
```bash
cp .streamlit/secrets.toml.example .streamlit/secrets.toml
```
3. Edit the `.streamlit/secrets.toml` file with your API keys:
```toml
OPENAI_API_KEY = "your_openai_api_key"
NEWS_API_KEY = "your_news_api_key"
FACTCHECK_API_KEY = "your_factcheck_api_key"
```
### Option 2: Using Environment Variables
1. Create a `.env` file in the root directory:
```bash
touch .env
```
2. Add your API keys to the `.env` file:
```
OPENAI_API_KEY=your_openai_api_key
NEWS_API_KEY=your_news_api_key
FACTCHECK_API_KEY=your_factcheck_api_key
```
3. Load the environment variables:
```python
# In Python
from dotenv import load_dotenv
load_dotenv()
```
Or in your terminal:
```bash
# Unix/Linux/macOS: export every variable defined in .env to child processes
set -a; source .env; set +a
# Windows
# Install python-dotenv[cli] and run
dotenv run streamlit run app.py
```
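Both options end up exposing the same three key names, so application code can read them through a single lookup. The sketch below is an assumption about how such a lookup could be written, not a copy of what `config.py` does: `get_api_key` is a hypothetical helper, and its `secrets` argument stands in for `st.secrets` so the example runs without Streamlit installed.

```python
import os
from typing import Mapping, Optional


def get_api_key(name: str,
                secrets: Optional[Mapping[str, str]] = None,
                env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the key called `name`, or None if it is not configured.

    Hypothetical helper: environment variables are checked first, then the
    secrets mapping (pass `st.secrets` here inside a Streamlit app).
    """
    env = os.environ if env is None else env
    for source in (env, secrets or {}):
        value = source.get(name)
        # Treat empty or whitespace-only values as "not set".
        if value and value.strip():
            return value.strip()
    return None
```

Inside the app this might be called as `get_api_key("OPENAI_API_KEY", secrets=st.secrets)`. Checking the environment first is a design choice made for this sketch; reversing the order of the two sources would work equally well.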
## Required API Keys
AskVeracity uses the following external APIs:
1. **OpenAI API** (Required)
   - Used for claim extraction, classification, and explanation generation
   - Get an API key from [OpenAI's website](https://platform.openai.com/)
2. **News API** (Optional but recommended)
   - Used for retrieving news article evidence
   - Get an API key from [NewsAPI.org](https://newsapi.org/)
3. **Google Fact Check Tools API** (Optional but recommended)
   - Used for retrieving fact-checking evidence
   - Get an API key from [Google Fact Check Tools API](https://developers.google.com/fact-check/tools/api)
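Because only the OpenAI key is required, a startup check can fail fast on a missing required key and merely warn about the optional ones. The following is a minimal sketch under that assumption; `check_api_keys`, `REQUIRED_KEYS`, and `OPTIONAL_KEYS` are hypothetical names that do not come from the project.

```python
from typing import Dict, List, Mapping

# Key names as documented above; the grouping into two lists is an assumption
# of this sketch, not something defined in config.py.
REQUIRED_KEYS = ["OPENAI_API_KEY"]
OPTIONAL_KEYS = ["NEWS_API_KEY", "FACTCHECK_API_KEY"]


def check_api_keys(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """Report which required and optional keys are missing or empty in `env`."""
    return {
        "required": [key for key in REQUIRED_KEYS if not env.get(key)],
        "optional": [key for key in OPTIONAL_KEYS if not env.get(key)],
    }
```

A caller could pass `os.environ`, stop when the `"required"` list is non-empty, and log a warning for each entry in `"optional"`.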
## Configuration Files
### config.py
The main configuration file is `config.py`, which contains:
- API key handling
- Rate limiting configuration
- Error backoff settings
- RSS feed settings
Important configuration sections in `config.py`:
```python
# Rate limiting configuration
RATE_LIMITS = {
    # api_name: {"requests": max_requests, "period": period_in_seconds}
    "newsapi": {"requests": 100, "period": 3600},  # 100 requests per hour
    "factcheck": {"requests": 1000, "period": 86400},  # 1000 requests per day
    "semantic_scholar": {"requests": 10, "period": 300},  # 10 requests per 5 minutes
    "wikidata": {"requests": 60, "period": 60},  # 60 requests per minute
    "wikipedia": {"requests": 200, "period": 60},  # 200 requests per minute
    "rss": {"requests": 300, "period": 3600}  # 300 RSS requests per hour
}

# Error backoff settings
ERROR_BACKOFF = {
    "max_retries": 5,
    "initial_backoff": 1,  # seconds
    "backoff_factor": 2,  # exponential backoff
}

# RSS feed settings
RSS_SETTINGS = {
    "max_feeds_per_request": 10,  # Maximum number of feeds to try per request
    "max_age_days": 3,  # Maximum age of RSS items to consider
    "timeout_seconds": 5,  # Timeout for RSS feed requests
    "max_workers": 5  # Number of parallel workers for fetching feeds
}
```
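To make the `{"requests": ..., "period": ...}` shape concrete, here is one way such a table could drive a sliding-window limiter. This is an illustrative sketch only: `SlidingWindowLimiter` is a hypothetical class, and the project's own enforcement logic may work differently.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict

# Two entries copied from the RATE_LIMITS table for illustration.
RATE_LIMITS = {
    "newsapi": {"requests": 100, "period": 3600},
    "wikidata": {"requests": 60, "period": 60},
}


class SlidingWindowLimiter:
    """Allow at most `requests` calls per `period` seconds for each API name."""

    def __init__(self, limits: Dict[str, Dict[str, int]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limits = limits
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._calls: Dict[str, Deque[float]] = {name: deque() for name in limits}

    def allow(self, api_name: str) -> bool:
        """Return True and record the call if `api_name` is under its limit."""
        limit = self._limits[api_name]
        now = self._clock()
        calls = self._calls[api_name]
        # Drop timestamps that have fallen out of the current window.
        while calls and now - calls[0] >= limit["period"]:
            calls.popleft()
        if len(calls) >= limit["requests"]:
            return False
        calls.append(now)
        return True
```

A caller would create `SlidingWindowLimiter(RATE_LIMITS)` once and check `limiter.allow("newsapi")` before each request, skipping or delaying the request when it returns `False`.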
### Category-Specific RSS Feeds
Category-specific RSS feeds are defined in `modules/category_detection.py`. These feeds are used to prioritize sources based on the detected claim category:
```python
CATEGORY_SPECIFIC_FEEDS = {
    "ai": [
        "https://www.artificialintelligence-news.com/feed/",
        "https://openai.com/news/rss.xml",
        # Additional AI-specific feeds
    ],
    "science": [
        "https://www.science.org/rss/news_current.xml",
        "https://www.nature.com/nature.rss",
        # Additional science feeds
    ],
    # Additional categories
}
```
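Prioritizing sources by category can be pictured as placing the category's feeds ahead of the general list and then capping the total. The sketch below assumes that behavior; `prioritized_feeds` and its `general_feeds` argument are hypothetical, and the default cap of 10 simply mirrors `max_feeds_per_request` from `RSS_SETTINGS`.

```python
from typing import Dict, List, Optional

# Only the feeds shown in the excerpt above; the real mapping has more entries.
CATEGORY_SPECIFIC_FEEDS: Dict[str, List[str]] = {
    "ai": [
        "https://www.artificialintelligence-news.com/feed/",
        "https://openai.com/news/rss.xml",
    ],
    "science": [
        "https://www.science.org/rss/news_current.xml",
        "https://www.nature.com/nature.rss",
    ],
}


def prioritized_feeds(category: Optional[str], general_feeds: List[str],
                      max_feeds: int = 10) -> List[str]:
    """Return category feeds first, then general feeds, without duplicates."""
    ordered = CATEGORY_SPECIFIC_FEEDS.get(category or "", []) + general_feeds
    unique = list(dict.fromkeys(ordered))  # de-duplicates while keeping first-seen order
    return unique[:max_feeds]
```

An unknown or missing category falls back to the general list unchanged, so the caller does not need a special case for claims that match no category.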
## Hugging Face Spaces Deployment
### Setting Up a Space
1. Create a new Space on Hugging Face:
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Select "Streamlit" as the SDK
   - Choose the hardware tier (use the default 16GB RAM)
2. Upload the project files:
   - You can upload files directly through the Hugging Face web interface
   - Alternatively, use Git to push to the Hugging Face repository
   - Make sure to include all necessary files, including `requirements.txt`
### Setting Up Secrets
1. Add API keys as secrets:
   - Go to the "Settings" tab of your Space
   - Navigate to the "Repository secrets" section
   - Add your API keys:
     - `OPENAI_API_KEY`
     - `NEWS_API_KEY`
     - `FACTCHECK_API_KEY`
### Configuring the Space
Edit the metadata in the `README.md` file:
```yaml
---
title: Askveracity
emoji: 📉
colorFrom: blue
colorTo: pink
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: mit
short_description: Fact-checking and misinformation detection tool.
---
```
## Custom Configuration
### Adjusting Rate Limits
You can adjust the rate limits in `config.py` based on your API subscription levels:
```python
# Update for higher tier News API subscription
RATE_LIMITS["newsapi"] = {"requests": 500, "period": 3600} # 500 requests per hour
```
### Modifying RSS Feeds
The RSS feed list is defined in `modules/rss_feed.py`, and the category-specific feeds are defined in `modules/category_detection.py`. You can add or remove feeds in either file as needed.
### Performance Evaluation
The system includes a performance evaluation script `evaluate_performance.py` that:
1. Runs the fact-checking system on a predefined set of test claims
2. Calculates accuracy, safety rate, processing time, and confidence metrics
3. Generates visualization charts in the `results/` directory
4. Saves detailed results to `results/performance_results.json`
To run the performance evaluation:
```bash
python evaluate_performance.py [--limit N] [--output FILE]
```
- `--limit N`: Limit evaluation to first N claims (default: all)
- `--output FILE`: Save results to FILE (default: performance_results.json)
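For readers extending the script, the two flags could be declared with `argparse` roughly as follows. This is a sketch based only on the flag descriptions above, not the script's actual source; `parse_args` and `select_claims` are hypothetical names.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the documented --limit and --output flags."""
    parser = argparse.ArgumentParser(description="Evaluate fact-checking performance.")
    parser.add_argument("--limit", type=int, default=None,
                        help="Limit evaluation to the first N claims (default: all)")
    parser.add_argument("--output", default="performance_results.json",
                        help="File to save results to")
    return parser.parse_args(argv)


def select_claims(claims: List[str], limit: Optional[int]) -> List[str]:
    """Return the first `limit` claims, or all of them when no limit is given."""
    return claims if limit is None else claims[:limit]
```

Using `None` as the default for `--limit` keeps "no limit" distinct from any numeric value, which avoids a magic number such as `-1`.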
## Running the Application
Start the Streamlit app:
```bash
streamlit run app.py
```
The application will be available at http://localhost:8501 by default.
## Troubleshooting
### API Key Issues
If you encounter API key errors:
1. Verify that your API keys are set correctly
2. Check the logs for specific error messages
3. Make sure API keys are not expired or rate-limited
### Model Loading Errors
If the spaCy model fails to load:
```bash
# Reinstall the model
python -m spacy download en_core_web_sm --force
```
### Rate Limiting
If you encounter rate limiting issues:
1. Lower the `requests` values for the affected APIs in `RATE_LIMITS` in `config.py`
2. Increase the backoff parameters in `ERROR_BACKOFF`
3. Subscribe to higher API tiers if available
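To see what raising the `ERROR_BACKOFF` parameters does, it helps to compute the retry schedule they imply. The sketch below assumes the conventional reading of those three settings, where the delay before retry number `attempt` is `initial_backoff * backoff_factor ** attempt`. `backoff_delays` and `call_with_backoff` are hypothetical helpers, and the project's own retry logic may differ.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

# Default values copied from config.py as shown earlier in this guide.
ERROR_BACKOFF = {"max_retries": 5, "initial_backoff": 1, "backoff_factor": 2}


def backoff_delays(settings: dict) -> List[float]:
    """Delay in seconds before each retry: initial_backoff * backoff_factor ** attempt."""
    return [settings["initial_backoff"] * settings["backoff_factor"] ** attempt
            for attempt in range(settings["max_retries"])]


def call_with_backoff(func: Callable[[], T], settings: dict = ERROR_BACKOFF,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `func`, sleeping between failed attempts; re-raise once retries run out."""
    for delay in backoff_delays(settings):
        try:
            return func()
        except Exception:
            sleep(delay)
    return func()  # final attempt; any exception propagates to the caller
```

With the defaults, `backoff_delays(ERROR_BACKOFF)` gives waits of 1, 2, 4, 8, and 16 seconds, so a persistently failing call is abandoned after about 31 seconds of waiting. Doubling `initial_backoff` doubles every delay in that schedule.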
### Memory Issues
If the application crashes due to memory issues:
1. Reduce the number of parallel workers in `RSS_SETTINGS`
2. Limit the maximum number of evidence items processed
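The worker count matters for memory because each worker may hold a downloaded feed at the same time. The sketch below shows one way a bounded pool could honor `max_workers`, `max_feeds_per_request`, and `timeout_seconds`. It is an assumption-laden illustration: `fetch_feeds` is a hypothetical function, and `fetch` is a caller-supplied callable (for example a thin wrapper around an HTTP client), not a function from the project.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List

# A reduced worker count, as suggested above; other values mirror config.py.
RSS_SETTINGS = {"max_feeds_per_request": 10, "timeout_seconds": 5, "max_workers": 2}


def fetch_feeds(urls: List[str], fetch: Callable[[str, float], str],
                settings: dict = RSS_SETTINGS) -> Dict[str, str]:
    """Fetch up to `max_feeds_per_request` URLs using at most `max_workers` threads."""
    selected = urls[:settings["max_feeds_per_request"]]
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=settings["max_workers"]) as pool:
        futures = {pool.submit(fetch, url, settings["timeout_seconds"]): url
                   for url in selected}
        for future in as_completed(futures):
            try:
                results[futures[future]] = future.result()
            except Exception:
                # A failed or timed-out feed is skipped rather than aborting the batch.
                continue
    return results
```

Lowering `max_workers` trades throughput for a smaller number of in-flight downloads, which is the lever the first troubleshooting step refers to.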
## Performance Optimization
For better performance:
1. Upgrade to a higher-tier OpenAI model for improved accuracy
2. Increase the number of parallel workers for evidence retrieval
3. Add more relevant RSS feeds to improve evidence gathering