
AskVeracity Configuration Guide

This document describes how to set up and configure the AskVeracity fact-checking and misinformation detection system.

Prerequisites

Before setting up AskVeracity, ensure you have:

  • Python 3.8 or higher
  • pip (Python package installer)
  • Git (for cloning the repository)
  • API keys for external services

Installation

Local Development

  1. Clone the repository:

    git clone https://github.com/yourusername/askveracity.git
    cd askveracity
    
  2. Install the required dependencies:

    pip install -r requirements.txt
    
  3. Download the required spaCy model:

    python -m spacy download en_core_web_sm
    

API Key Configuration

AskVeracity requires several API keys to access external services. You have two options for configuring these keys:

Option 1: Using Streamlit Secrets (Recommended for Local Development)

  1. Create a .streamlit directory if it doesn't exist:

    mkdir -p .streamlit
    
  2. Create a secrets.toml file:

    cp .streamlit/secrets.toml.example .streamlit/secrets.toml
    
  3. Edit the .streamlit/secrets.toml file with your API keys:

    OPENAI_API_KEY = "your_openai_api_key"
    NEWS_API_KEY = "your_news_api_key"
    FACTCHECK_API_KEY = "your_factcheck_api_key"
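
For reference, a lookup that prefers Streamlit secrets and falls back to environment variables can look like the sketch below. The helper name get_api_key is illustrative; the actual key handling lives in config.py (see Configuration Files) and may differ in detail.

import os

import streamlit as st


def get_api_key(name: str):
    """Illustrative lookup: prefer Streamlit secrets, fall back to the environment."""
    try:
        # st.secrets behaves like a read-only mapping backed by secrets.toml
        if name in st.secrets:
            return st.secrets[name]
    except Exception:
        # No secrets.toml available (e.g. running outside Streamlit)
        pass
    return os.environ.get(name)


OPENAI_API_KEY = get_api_key("OPENAI_API_KEY")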
    

Option 2: Using Environment Variables

  1. Create a .env file in the root directory:

    touch .env
    
  2. Add your API keys to the .env file:

    OPENAI_API_KEY=your_openai_api_key
    NEWS_API_KEY=your_news_api_key
    FACTCHECK_API_KEY=your_factcheck_api_key
    
  3. Load the environment variables:

    # In Python
    from dotenv import load_dotenv
    load_dotenv()
    

    Or in your terminal:

    # Unix/Linux/macOS: export every variable in .env into the current shell
    set -a && source .env && set +a
    
    # Windows
    # Install python-dotenv[cli] and run
    dotenv run streamlit run app.py
    

Required API Keys

AskVeracity uses the following external APIs:

  1. OpenAI API (Required)

    • Used for claim extraction, classification, and explanation generation
    • Get an API key from OpenAI's website
  2. News API (Optional but recommended)

    • Used for retrieving news article evidence
    • Get an API key from NewsAPI.org
  3. Google Fact Check Tools API (Optional but recommended)

    • Used for retrieving existing fact-checks related to the claim
    • Get an API key from the Google Cloud Console (with the Fact Check Tools API enabled for your project)

Configuration Files

config.py

The main configuration file is config.py, which contains:

  • API key handling
  • Rate limiting configuration
  • Error backoff settings
  • RSS feed settings

Important configuration sections in config.py:

# Rate limiting configuration
RATE_LIMITS = {
    # api_name: {"requests": max_requests, "period": period_in_seconds}
    "newsapi": {"requests": 100, "period": 3600},  # 100 requests per hour
    "factcheck": {"requests": 1000, "period": 86400},  # 1000 requests per day
    "semantic_scholar": {"requests": 10, "period": 300},  # 10 requests per 5 minutes
    "wikidata": {"requests": 60, "period": 60},  # 60 requests per minute
    "wikipedia": {"requests": 200, "period": 60},  # 200 requests per minute
    "rss": {"requests": 300, "period": 3600}  # 300 RSS requests per hour
}

# Error backoff settings
ERROR_BACKOFF = {
    "max_retries": 5,
    "initial_backoff": 1,  # seconds
    "backoff_factor": 2,  # exponential backoff
}

# RSS feed settings
RSS_SETTINGS = {
    "max_feeds_per_request": 10,  # Maximum number of feeds to try per request
    "max_age_days": 3,            # Maximum age of RSS items to consider
    "timeout_seconds": 5,         # Timeout for RSS feed requests
    "max_workers": 5              # Number of parallel workers for fetching feeds
}
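
To illustrate how the backoff settings fit together, here is a minimal retry sketch driven by ERROR_BACKOFF. The real retry behavior is implemented in the evidence retrieval code and may differ in detail.

import time

from config import ERROR_BACKOFF


def call_with_backoff(func, *args, **kwargs):
    """Illustrative wrapper: retry func with exponential backoff on failure."""
    delay = ERROR_BACKOFF["initial_backoff"]
    for attempt in range(ERROR_BACKOFF["max_retries"]):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == ERROR_BACKOFF["max_retries"] - 1:
                raise  # out of retries, surface the error
            time.sleep(delay)
            delay *= ERROR_BACKOFF["backoff_factor"]  # 1s, 2s, 4s, ... with factor 2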

Category-Specific RSS Feeds

Category-specific RSS feeds are defined in modules/category_detection.py. These feeds are used to prioritize sources based on the detected claim category:

CATEGORY_SPECIFIC_FEEDS = {
    "ai": [
        "https://www.artificialintelligence-news.com/feed/",
        "https://openai.com/news/rss.xml",
        # Additional AI-specific feeds
    ],
    "science": [
        "https://www.science.org/rss/news_current.xml",
        "https://www.nature.com/nature.rss",
        # Additional science feeds
    ],
    # Additional categories
}
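
As a sketch of that prioritization, category-specific feeds can simply be placed ahead of the general feed list before retrieval. The helper name and the fallback argument below are hypothetical.

from modules.category_detection import CATEGORY_SPECIFIC_FEEDS


def prioritized_feeds(category, general_feeds):
    """Hypothetical helper: try category-specific feeds before the general list."""
    specific = CATEGORY_SPECIFIC_FEEDS.get(category, [])
    # Preserve order while removing duplicate URLs
    return list(dict.fromkeys(specific + list(general_feeds)))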

Hugging Face Spaces Deployment

Setting Up a Space

  1. Create a new Space on Hugging Face:

    • Choose Streamlit as the Space SDK (matching the sdk field in the README metadata below)
    • Pick a name, license, and visibility for the Space

  2. Upload the project files:

    • You can upload files directly through the Hugging Face web interface
    • Alternatively, use Git to push to the Hugging Face repository
    • Make sure to include all required files, including requirements.txt

Setting Up Secrets

  1. Add API keys as secrets:
    • Go to the "Settings" tab of your Space
    • Navigate to the "Repository secrets" section
    • Add your API keys:
      • OPENAI_API_KEY
      • NEWS_API_KEY
      • FACTCHECK_API_KEY
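
Hugging Face Spaces exposes repository secrets to the running application as environment variables, so the same environment-variable lookup used locally also works in the Space:

import os

# Repository secrets configured in the Space settings are available
# to the app as environment variables at runtime.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
NEWS_API_KEY = os.environ.get("NEWS_API_KEY")
FACTCHECK_API_KEY = os.environ.get("FACTCHECK_API_KEY")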

Configuring the Space

Edit the metadata in the README.md file:

---
title: Askveracity
emoji: 📉
colorFrom: blue
colorTo: pink
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: mit
short_description: Fact-checking and misinformation detection tool.
---

Custom Configuration

Adjusting Rate Limits

You can adjust the rate limits in config.py based on your API subscription levels:

# Update for higher tier News API subscription
RATE_LIMITS["newsapi"] = {"requests": 500, "period": 3600}  # 500 requests per hour

Modifying RSS Feeds

The list of RSS feeds can be found in modules/rss_feed.py and category-specific feeds in modules/category_detection.py. You can add or remove feeds as needed.
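
For example, adding a feed to an existing category means appending its URL to the corresponding list in modules/category_detection.py; the added URL below is only a placeholder.

CATEGORY_SPECIFIC_FEEDS = {
    "science": [
        "https://www.science.org/rss/news_current.xml",
        "https://www.nature.com/nature.rss",
        "https://example.com/your-science-feed.xml",  # newly added placeholder feed
    ],
    # ...other categories unchanged
}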

Performance Evaluation

The system includes a performance evaluation script evaluate_performance.py that:

  1. Runs the fact-checking system on a predefined set of test claims
  2. Calculates accuracy, safety rate, processing time, and confidence metrics
  3. Generates visualization charts in the results/ directory
  4. Saves detailed results to results/performance_results.json

To run the performance evaluation:

python evaluate_performance.py [--limit N] [--output FILE]

  • --limit N: Limit evaluation to the first N claims (default: all)
  • --output FILE: Save results to FILE (default: performance_results.json)
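
After a run, the saved JSON can be inspected directly. The exact keys depend on the script's output format, so treat this as a sketch and adjust the path if you passed --output.

import json

# Default results file written by evaluate_performance.py (see above)
with open("results/performance_results.json") as f:
    results = json.load(f)

# Top-level structure depends on the script's output format
print(list(results))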

Running the Application

Start the Streamlit app:

streamlit run app.py

The application will be available at http://localhost:8501 by default.

Troubleshooting

API Key Issues

If you encounter API key errors:

  1. Verify that your API keys are set correctly
  2. Check the logs for specific error messages
  3. Make sure API keys are not expired or rate-limited

Model Loading Errors

If the spaCy model fails to load, reinstall it:

# Force a clean reinstall of the model (extra arguments are passed through to pip)
python -m spacy download en_core_web_sm --force-reinstall

Rate Limiting

If you encounter rate limiting issues:

  1. Reduce the number of requests by adjusting RATE_LIMITS in config.py
  2. Increase the backoff parameters in ERROR_BACKOFF
  3. Subscribe to higher API tiers if available

Memory Issues

If the application crashes due to memory issues:

  1. Reduce the number of parallel workers in RSS_SETTINGS
  2. Limit the maximum number of evidence items processed
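
For example, a lighter-weight configuration in config.py might look like this (values are illustrative):

# Reduce parallelism and per-request work to lower peak memory usage
RSS_SETTINGS["max_workers"] = 2             # fewer parallel feed fetches
RSS_SETTINGS["max_feeds_per_request"] = 5   # fewer feeds tried per request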

Performance Optimization

For better performance:

  1. Upgrade to a higher-tier OpenAI model for improved accuracy
  2. Increase the number of parallel workers for evidence retrieval
  3. Add more relevant RSS feeds to improve evidence gathering