
AskVeracity Configuration Guide

This document describes how to set up and configure the AskVeracity fact-checking and misinformation detection system.

Prerequisites

Before setting up AskVeracity, ensure you have:

  • Python 3.8 or higher
  • pip (Python package installer)
  • Git (for cloning the repository)
  • API keys for external services

Installation

Local Development

  1. Clone the repository:

    git clone https://github.com/yourusername/askveracity.git
    cd askveracity
    
  2. Install the required dependencies:

    pip install -r requirements.txt
    
  3. Download the required spaCy model:

    python -m spacy download en_core_web_sm
    

API Key Configuration

AskVeracity requires several API keys to access external services. You have two options for configuring these keys:

Option 1: Using Streamlit Secrets (Recommended for Local Development)

  1. Create a .streamlit directory if it doesn't exist:

    mkdir -p .streamlit
    
  2. Create a secrets.toml file:

    cp .streamlit/secrets.toml.example .streamlit/secrets.toml
    
  3. Edit the .streamlit/secrets.toml file with your API keys:

    OPENAI_API_KEY = "your_openai_api_key"
    NEWS_API_KEY = "your_news_api_key"
    FACTCHECK_API_KEY = "your_factcheck_api_key"
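
For reference, a lookup that prefers Streamlit secrets and falls back to environment variables can look like the sketch below. The helper name get_api_key is illustrative; the actual key handling lives in config.py (see Configuration Files) and may differ in detail.

import os

import streamlit as st


def get_api_key(name: str):
    """Illustrative lookup: prefer Streamlit secrets, fall back to the environment."""
    try:
        # st.secrets behaves like a read-only mapping backed by secrets.toml
        if name in st.secrets:
            return st.secrets[name]
    except Exception:
        # No secrets.toml available (e.g. running outside Streamlit)
        pass
    return os.environ.get(name)


OPENAI_API_KEY = get_api_key("OPENAI_API_KEY")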
    

Option 2: Using Environment Variables

  1. Create a .env file in the root directory:

    touch .env
    
  2. Add your API keys to the .env file:

    OPENAI_API_KEY=your_openai_api_key
    NEWS_API_KEY=your_news_api_key
    FACTCHECK_API_KEY=your_factcheck_api_key
    
  3. Load the environment variables:

    # In Python
    from dotenv import load_dotenv
    load_dotenv()
    

    Or in your terminal:

    # Unix/Linux/macOS: export every variable in .env into the current shell
    set -a && source .env && set +a
    
    # Windows
    # Install python-dotenv[cli] and run
    dotenv run streamlit run app.py
    

Required API Keys

AskVeracity uses the following external APIs:

  1. OpenAI API (Required)

    • Used for claim extraction, classification, and explanation generation
    • Get an API key from OpenAI's website
  2. News API (Optional but recommended)

    • Used for retrieving news article evidence
    • Get an API key from NewsAPI.org
  3. Google Fact Check Tools API (Optional but recommended)

    • Used for retrieving existing fact-checks related to the claim
    • Get an API key from the Google Cloud Console (with the Fact Check Tools API enabled for your project)

Configuration Files

config.py

The main configuration file is config.py, which contains:

  • API key handling
  • Rate limiting configuration
  • Error backoff settings
  • RSS feed settings

Important configuration sections in config.py:

# Rate limiting configuration
RATE_LIMITS = {
    # api_name: {"requests": max_requests, "period": period_in_seconds}
    "newsapi": {"requests": 100, "period": 3600},  # 100 requests per hour
    "factcheck": {"requests": 1000, "period": 86400},  # 1000 requests per day
    "semantic_scholar": {"requests": 10, "period": 300},  # 10 requests per 5 minutes
    "wikidata": {"requests": 60, "period": 60},  # 60 requests per minute
    "wikipedia": {"requests": 200, "period": 60},  # 200 requests per minute
    "rss": {"requests": 300, "period": 3600}  # 300 RSS requests per hour
}

# Error backoff settings
ERROR_BACKOFF = {
    "max_retries": 5,
    "initial_backoff": 1,  # seconds
    "backoff_factor": 2,  # exponential backoff
}

# RSS feed settings
RSS_SETTINGS = {
    "max_feeds_per_request": 10,  # Maximum number of feeds to try per request
    "max_age_days": 3,            # Maximum age of RSS items to consider
    "timeout_seconds": 5,         # Timeout for RSS feed requests
    "max_workers": 5              # Number of parallel workers for fetching feeds
}
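
To illustrate how the backoff settings fit together, here is a minimal retry sketch driven by ERROR_BACKOFF. The real retry behavior is implemented in the evidence retrieval code and may differ in detail.

import time

from config import ERROR_BACKOFF


def call_with_backoff(func, *args, **kwargs):
    """Illustrative wrapper: retry func with exponential backoff on failure."""
    delay = ERROR_BACKOFF["initial_backoff"]
    for attempt in range(ERROR_BACKOFF["max_retries"]):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == ERROR_BACKOFF["max_retries"] - 1:
                raise  # out of retries, surface the error
            time.sleep(delay)
            delay *= ERROR_BACKOFF["backoff_factor"]  # 1s, 2s, 4s, ... with factor 2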

Category-Specific RSS Feeds

Category-specific RSS feeds are defined in modules/category_detection.py. These feeds are used to prioritize sources based on the detected claim category:

CATEGORY_SPECIFIC_FEEDS = {
    "ai": [
        "https://www.artificialintelligence-news.com/feed/",
        "https://openai.com/news/rss.xml",
        # Additional AI-specific feeds
    ],
    "science": [
        "https://www.science.org/rss/news_current.xml",
        "https://www.nature.com/nature.rss",
        # Additional science feeds
    ],
    # Additional categories
}
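
As a sketch of that prioritization, category-specific feeds can simply be placed ahead of the general feed list before retrieval. The helper name and the fallback argument below are hypothetical.

from modules.category_detection import CATEGORY_SPECIFIC_FEEDS


def prioritized_feeds(category, general_feeds):
    """Hypothetical helper: try category-specific feeds before the general list."""
    specific = CATEGORY_SPECIFIC_FEEDS.get(category, [])
    # Preserve order while removing duplicate URLs
    return list(dict.fromkeys(specific + list(general_feeds)))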

Hugging Face Spaces Deployment

Setting Up a Space

  1. Create a new Space on Hugging Face:

    • Choose Streamlit as the Space SDK (matching the sdk field in the README metadata below)
    • Pick a name, license, and visibility for the Space

  2. Upload the project files:

    • You can upload files directly through the Hugging Face web interface
    • Alternatively, use Git to push to the Hugging Face repository
    • Make sure to include all required files, including requirements.txt

Setting Up Secrets

  1. Add API keys as secrets:
    • Go to the "Settings" tab of your Space
    • Navigate to the "Repository secrets" section
    • Add your API keys:
      • OPENAI_API_KEY
      • NEWS_API_KEY
      • FACTCHECK_API_KEY
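
Hugging Face Spaces exposes repository secrets to the running application as environment variables, so the same environment-variable lookup used locally also works in the Space:

import os

# Repository secrets configured in the Space settings are available
# to the app as environment variables at runtime.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
NEWS_API_KEY = os.environ.get("NEWS_API_KEY")
FACTCHECK_API_KEY = os.environ.get("FACTCHECK_API_KEY")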

Configuring the Space

Edit the metadata in the README.md file:

---
title: Askveracity
emoji: 📉
colorFrom: blue
colorTo: pink
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: mit
short_description: Fact-checking and misinformation detection tool.
---

Custom Configuration

Adjusting Rate Limits

You can adjust the rate limits in config.py based on your API subscription levels:

# Update for higher tier News API subscription
RATE_LIMITS["newsapi"] = {"requests": 500, "period": 3600}  # 500 requests per hour

Modifying RSS Feeds

The list of RSS feeds can be found in modules/rss_feed.py and category-specific feeds in modules/category_detection.py. You can add or remove feeds as needed.
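
For example, adding a feed to an existing category means appending its URL to the corresponding list in modules/category_detection.py; the added URL below is only a placeholder.

CATEGORY_SPECIFIC_FEEDS = {
    "science": [
        "https://www.science.org/rss/news_current.xml",
        "https://www.nature.com/nature.rss",
        "https://example.com/your-science-feed.xml",  # newly added placeholder feed
    ],
    # ...other categories unchanged
}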

Performance Evaluation

The system includes a performance evaluation script evaluate_performance.py that:

  1. Runs the fact-checking system on a predefined set of test claims
  2. Calculates accuracy, safety rate, processing time, and confidence metrics
  3. Generates visualization charts in the results/ directory
  4. Saves detailed results to results/performance_results.json

To run the performance evaluation:

python evaluate_performance.py [--limit N] [--output FILE]

  • --limit N: Limit evaluation to the first N claims (default: all)
  • --output FILE: Save results to FILE (default: performance_results.json)
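
After a run, the saved JSON can be inspected directly. The exact keys depend on the script's output format, so treat this as a sketch and adjust the path if you passed --output.

import json

# Default results file written by evaluate_performance.py (see above)
with open("results/performance_results.json") as f:
    results = json.load(f)

# Top-level structure depends on the script's output format
print(list(results))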

Running the Application

Start the Streamlit app:

streamlit run app.py

The application will be available at http://localhost:8501 by default.

Troubleshooting

API Key Issues

If you encounter API key errors:

  1. Verify that your API keys are set correctly
  2. Check the logs for specific error messages
  3. Make sure API keys are not expired or rate-limited

Model Loading Errors

If the spaCy model fails to load, reinstall it:

# Force a clean reinstall of the model (extra arguments are passed through to pip)
python -m spacy download en_core_web_sm --force-reinstall

Rate Limiting

If you encounter rate limiting issues:

  1. Reduce the number of requests by adjusting RATE_LIMITS in config.py
  2. Increase the backoff parameters in ERROR_BACKOFF
  3. Subscribe to higher API tiers if available

Memory Issues

If the application crashes due to memory issues:

  1. Reduce the number of parallel workers in RSS_SETTINGS
  2. Limit the maximum number of evidence items processed
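
For example, a lighter-weight configuration in config.py might look like this (values are illustrative):

# Reduce parallelism and per-request work to lower peak memory usage
RSS_SETTINGS["max_workers"] = 2             # fewer parallel feed fetches
RSS_SETTINGS["max_feeds_per_request"] = 5   # fewer feeds tried per request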

Performance Optimization

For better performance:

  1. Upgrade to a higher-tier OpenAI model for improved accuracy
  2. Increase the number of parallel workers for evidence retrieval
  3. Add more relevant RSS feeds to improve evidence gathering