# AskVeracity Configuration Guide

This document describes how to set up and configure the AskVeracity fact-checking and misinformation detection system.

## Prerequisites

Before setting up AskVeracity, ensure you have:

- Python 3.8 or higher
- pip (Python package installer)
- Git (for cloning the repository)
- API keys for external services

## Installation

### Local Development

1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/askveracity.git
   cd askveracity
   ```

2. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Download the required spaCy model:

   ```bash
   python -m spacy download en_core_web_sm
   ```

## API Key Configuration

AskVeracity requires several API keys to access external services. You have two options for configuring these keys:

### Option 1: Using Streamlit Secrets (Recommended for Local Development)

1. Create a `.streamlit` directory if it doesn't exist:

   ```bash
   mkdir -p .streamlit
   ```

2. Create a `secrets.toml` file:

   ```bash
   cp .streamlit/secrets.toml.example .streamlit/secrets.toml
   ```

3. Edit the `.streamlit/secrets.toml` file with your API keys:

   ```toml
   OPENAI_API_KEY = "your_openai_api_key"
   NEWS_API_KEY = "your_news_api_key"
   FACTCHECK_API_KEY = "your_factcheck_api_key"
   ```

### Option 2: Using Environment Variables

1. Create a `.env` file in the root directory:

   ```bash
   touch .env
   ```

2. Add your API keys to the `.env` file:

   ```
   OPENAI_API_KEY=your_openai_api_key
   NEWS_API_KEY=your_news_api_key
   FACTCHECK_API_KEY=your_factcheck_api_key
   ```

3. Load the environment variables:

   ```python
   # In Python
   from dotenv import load_dotenv
   load_dotenv()
   ```

   Or in your terminal:

   ```bash
   # Unix/Linux/macOS
   source .env

   # Windows
   # Install python-dotenv[cli] and run:
   dotenv run streamlit run app.py
   ```

## Required API Keys

AskVeracity uses the following external APIs:

1. **OpenAI API** (Required)
   - Used for claim extraction, classification, and explanation generation
   - Get an API key from [OpenAI's website](https://platform.openai.com/)

2. **News API** (Optional but recommended)
   - Used for retrieving news article evidence
   - Get an API key from [NewsAPI.org](https://newsapi.org/)

3. **Google Fact Check Tools API** (Optional but recommended)
   - Used for retrieving fact-checking evidence
   - Get an API key from [Google Fact Check Tools API](https://developers.google.com/fact-check/tools/api)
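Whichever option you choose, the application resolves keys at startup. As a minimal sketch of how that resolution might look — the `get_api_key` helper below is hypothetical, not the project's actual code (key handling lives in `config.py`) — Streamlit secrets are checked first, with environment variables as the fallback:

```python
import os
from typing import Optional

import streamlit as st


def get_api_key(name: str) -> Optional[str]:
    """Hypothetical helper: look up a key in Streamlit secrets,
    falling back to environment variables if it isn't there."""
    try:
        if name in st.secrets:
            return st.secrets[name]
    except FileNotFoundError:
        # No .streamlit/secrets.toml present; fall through to the environment
        pass
    return os.environ.get(name)


OPENAI_API_KEY = get_api_key("OPENAI_API_KEY")
```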
## Configuration Files

### config.py

The main configuration file is `config.py`, which contains:

- API key handling
- Rate limiting configuration
- Error backoff settings
- RSS feed settings

Important configuration sections in `config.py`:

```python
# Rate limiting configuration
RATE_LIMITS = {
    # api_name: {"requests": max_requests, "period": period_in_seconds}
    "newsapi": {"requests": 100, "period": 3600},         # 100 requests per hour
    "factcheck": {"requests": 1000, "period": 86400},     # 1000 requests per day
    "semantic_scholar": {"requests": 10, "period": 300},  # 10 requests per 5 minutes
    "wikidata": {"requests": 60, "period": 60},           # 60 requests per minute
    "wikipedia": {"requests": 200, "period": 60},         # 200 requests per minute
    "rss": {"requests": 300, "period": 3600}              # 300 RSS requests per hour
}

# Error backoff settings
ERROR_BACKOFF = {
    "max_retries": 5,
    "initial_backoff": 1,  # seconds
    "backoff_factor": 2,   # exponential backoff
}

# RSS feed settings
RSS_SETTINGS = {
    "max_feeds_per_request": 10,  # Maximum number of feeds to try per request
    "max_age_days": 3,            # Maximum age of RSS items to consider
    "timeout_seconds": 5,         # Timeout for RSS feed requests
    "max_workers": 5              # Number of parallel workers for fetching feeds
}
```
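To illustrate how the `ERROR_BACKOFF` settings might drive a retry loop, here is a minimal sketch (the `fetch_with_backoff` helper is hypothetical, not the project's actual retry code): each failed request waits `initial_backoff` seconds, multiplying the wait by `backoff_factor` until `max_retries` attempts are exhausted.

```python
import time

import requests

ERROR_BACKOFF = {
    "max_retries": 5,
    "initial_backoff": 1,  # seconds
    "backoff_factor": 2,   # exponential backoff
}


def fetch_with_backoff(url: str, **kwargs) -> requests.Response:
    """Hypothetical helper: GET a URL, retrying with exponential backoff."""
    delay = ERROR_BACKOFF["initial_backoff"]
    for attempt in range(ERROR_BACKOFF["max_retries"]):
        try:
            response = requests.get(url, timeout=10, **kwargs)
            response.raise_for_status()
            return response
        except requests.RequestException:
            if attempt == ERROR_BACKOFF["max_retries"] - 1:
                raise  # out of retries; surface the last error
            time.sleep(delay)
            delay *= ERROR_BACKOFF["backoff_factor"]
```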
### Category-Specific RSS Feeds

Category-specific RSS feeds are defined in `modules/category_detection.py`. These feeds are used to prioritize sources based on the detected claim category:

```python
CATEGORY_SPECIFIC_FEEDS = {
    "ai": [
        "https://www.artificialintelligence-news.com/feed/",
        "https://openai.com/news/rss.xml",
        # Additional AI-specific feeds
    ],
    "science": [
        "https://www.science.org/rss/news_current.xml",
        "https://www.nature.com/nature.rss",
        # Additional science feeds
    ],
    # Additional categories
}
```

## Hugging Face Spaces Deployment

### Setting Up a Space

1. Create a new Space on Hugging Face:
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Select "Streamlit" as the SDK
   - Choose the hardware tier (use the default 16GB RAM)

2. Upload the project files:
   - You can upload files directly through the Hugging Face web interface
   - Alternatively, use Git to push to the Hugging Face repository
   - Make sure to include all necessary files, including `requirements.txt`

### Setting Up Secrets

1. Add API keys as secrets:
   - Go to the "Settings" tab of your Space
   - Navigate to the "Repository secrets" section
   - Add your API keys:
     - `OPENAI_API_KEY`
     - `NEWS_API_KEY`
     - `FACTCHECK_API_KEY`

### Configuring the Space

Edit the metadata in the `README.md` file:

```yaml
---
title: Askveracity
emoji: 📉
colorFrom: blue
colorTo: pink
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: mit
short_description: Fact-checking and misinformation detection tool.
---
```

## Custom Configuration

### Adjusting Rate Limits

You can adjust the rate limits in `config.py` based on your API subscription levels:

```python
# Update for a higher-tier News API subscription
RATE_LIMITS["newsapi"] = {"requests": 500, "period": 3600}  # 500 requests per hour
```

### Modifying RSS Feeds

The list of RSS feeds can be found in `modules/rss_feed.py`, and category-specific feeds in `modules/category_detection.py`. You can add or remove feeds as needed.

### Performance Evaluation

The system includes a performance evaluation script, `evaluate_performance.py`, which:

1. Runs the fact-checking system on a predefined set of test claims
2. Calculates accuracy, safety rate, processing time, and confidence metrics
3. Generates visualization charts in the `results/` directory
4. Saves detailed results to `results/performance_results.json`

To run the performance evaluation:

```bash
python evaluate_performance.py [--limit N] [--output FILE]
```

- `--limit N`: Limit evaluation to the first N claims (default: all)
- `--output FILE`: Save results to FILE (default: `performance_results.json`)

## Running the Application

Start the Streamlit app:

```bash
streamlit run app.py
```

The application will be available at http://localhost:8501 by default.

## Troubleshooting

### API Key Issues

If you encounter API key errors:

1. Verify that your API keys are set correctly
2. Check the logs for specific error messages
3. Make sure API keys are not expired or rate-limited

### Model Loading Errors

If the spaCy model fails to load:

```bash
# Reinstall the model (extra arguments are passed through to pip)
python -m spacy download en_core_web_sm --force-reinstall
```

### Rate Limiting

If you encounter rate limiting issues:

1. Reduce the number of requests by adjusting `RATE_LIMITS` in `config.py`
2. Increase the backoff parameters in `ERROR_BACKOFF`
3. Subscribe to higher API tiers if available

### Memory Issues

If the application crashes due to memory issues:

1. Reduce the number of parallel workers in `RSS_SETTINGS`
2. Limit the maximum number of evidence items processed

## Performance Optimization

For better performance:

1. Upgrade to a higher-tier OpenAI model for improved accuracy
2. Increase the number of parallel workers for evidence retrieval (see the sketch below)
3. Add more relevant RSS feeds to improve evidence gathering
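As a rough illustration of how the `max_workers` setting governs parallel evidence retrieval, here is a minimal sketch of fetching several RSS feeds concurrently. The `fetch_all_feeds` helper and the socket-level timeout are assumptions for illustration, not the project's actual retrieval code; the feed URLs are taken from the `CATEGORY_SPECIFIC_FEEDS` example above.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, as_completed

import feedparser

RSS_SETTINGS = {"max_workers": 5, "timeout_seconds": 5}

# Example feeds from CATEGORY_SPECIFIC_FEEDS above
FEED_URLS = [
    "https://www.science.org/rss/news_current.xml",
    "https://www.nature.com/nature.rss",
]

# feedparser fetches over urllib, which honors the socket default timeout
socket.setdefaulttimeout(RSS_SETTINGS["timeout_seconds"])


def fetch_all_feeds(urls):
    """Hypothetical helper: parse feeds in parallel, keyed by URL."""
    results = {}
    with ThreadPoolExecutor(max_workers=RSS_SETTINGS["max_workers"]) as pool:
        futures = {pool.submit(feedparser.parse, url): url for url in urls}
        for future in as_completed(futures):
            results[futures[future]] = future.result().entries
    return results


if __name__ == "__main__":
    for url, entries in fetch_all_feeds(FEED_URLS).items():
        print(url, len(entries))
```

Raising `max_workers` speeds up retrieval at the cost of memory and bandwidth, which is why the memory troubleshooting steps above suggest lowering it.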