AskVeracity Configuration Guide
This document describes how to set up and configure the AskVeracity fact-checking and misinformation detection system.
Prerequisites
Before setting up AskVeracity, ensure you have:
- Python 3.8 or higher
- pip (Python package installer)
- Git (for cloning the repository)
- API keys for external services
Installation
Local Development
Clone the repository:
git clone https://github.com/yourusername/askveracity.git
cd askveracity
Install the required dependencies:
pip install -r requirements.txt
Download the required spaCy model:
python -m spacy download en_core_web_sm
API Key Configuration
AskVeracity requires several API keys to access external services. You have two options for configuring these keys:
Option 1: Using Streamlit Secrets (Recommended for Local Development)
Create a .streamlit directory if it doesn't exist:
mkdir -p .streamlit
Create a secrets.toml file:
cp .streamlit/secrets.toml.example .streamlit/secrets.toml
Edit the .streamlit/secrets.toml file with your API keys:
OPENAI_API_KEY = "your_openai_api_key"
NEWS_API_KEY = "your_news_api_key"
FACTCHECK_API_KEY = "your_factcheck_api_key"
Option 2: Using Environment Variables
Create a .env file in the root directory:
touch .env
Add your API keys to the .env file:
OPENAI_API_KEY=your_openai_api_key
NEWS_API_KEY=your_news_api_key
FACTCHECK_API_KEY=your_factcheck_api_key
Load the environment variables:
# In Python
from dotenv import load_dotenv
load_dotenv()
Or in your terminal:
# Unix/Linux/macOS (set -a exports the variables so that child processes such as Streamlit can read them)
set -a; source .env; set +a
# Windows
# Install python-dotenv[cli] and run
dotenv run -- streamlit run app.py
Required API Keys
AskVeracity uses the following external APIs:
OpenAI API (Required)
- Used for claim extraction, classification, and explanation generation
- Get an API key from OpenAI's website
News API (Optional but recommended)
- Used for retrieving news article evidence
- Get an API key from NewsAPI.org
Google Fact Check Tools API (Optional but recommended)
- Used for retrieving fact-checking evidence
- Get an API key from Google Fact Check Tools API
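The required/optional split above can be checked at startup. The following is a minimal sketch, assuming the keys are exposed as environment variables (as in Option 2). `check_api_keys` is a hypothetical helper for illustration and is not part of AskVeracity:

```python
import os

# Key names come from this guide; the required/optional split follows the list above.
REQUIRED_KEYS = ["OPENAI_API_KEY"]
OPTIONAL_KEYS = ["NEWS_API_KEY", "FACTCHECK_API_KEY"]


def check_api_keys(env=None):
    """Return (missing_required, missing_optional) for the given mapping.

    Hypothetical helper: a key counts as missing if it is absent or empty.
    """
    env = os.environ if env is None else env
    missing_required = [k for k in REQUIRED_KEYS if not env.get(k)]
    missing_optional = [k for k in OPTIONAL_KEYS if not env.get(k)]
    return missing_required, missing_optional


# Example with an explicit mapping instead of the real environment.
example_env = {"OPENAI_API_KEY": "your_openai_api_key", "NEWS_API_KEY": ""}
missing_required, missing_optional = check_api_keys(example_env)
print(missing_required)  # []
print(missing_optional)  # ['NEWS_API_KEY', 'FACTCHECK_API_KEY']
```

Calling `check_api_keys()` with no argument inspects `os.environ`, so it should run after `load_dotenv()` if you rely on a .env file.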
Configuration Files
config.py
The main configuration file is config.py, which contains:
- API key handling
- Rate limiting configuration
- Error backoff settings
- RSS feed settings
Important configuration sections in config.py:
# Rate limiting configuration
RATE_LIMITS = {
    # api_name: {"requests": max_requests, "period": period_in_seconds}
    "newsapi": {"requests": 100, "period": 3600},  # 100 requests per hour
    "factcheck": {"requests": 1000, "period": 86400},  # 1000 requests per day
    "semantic_scholar": {"requests": 10, "period": 300},  # 10 requests per 5 minutes
    "wikidata": {"requests": 60, "period": 60},  # 60 requests per minute
    "wikipedia": {"requests": 200, "period": 60},  # 200 requests per minute
    "rss": {"requests": 300, "period": 3600},  # 300 RSS requests per hour
}

# Error backoff settings
ERROR_BACKOFF = {
    "max_retries": 5,
    "initial_backoff": 1,  # seconds
    "backoff_factor": 2,  # exponential backoff
}

# RSS feed settings
RSS_SETTINGS = {
    "max_feeds_per_request": 10,  # Maximum number of feeds to try per request
    "max_age_days": 3,  # Maximum age of RSS items to consider
    "timeout_seconds": 5,  # Timeout for RSS feed requests
    "max_workers": 5,  # Number of parallel workers for fetching feeds
}
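To illustrate what the ERROR_BACKOFF values imply, the sketch below computes the wait before each retry, assuming the delay is initial_backoff multiplied by backoff_factor raised to the retry number. That formula and the helper names `backoff_delays` and `call_with_backoff` are assumptions for illustration; the project's own retry logic may differ.

```python
import time

# Values copied from the ERROR_BACKOFF section of config.py shown above.
ERROR_BACKOFF = {"max_retries": 5, "initial_backoff": 1, "backoff_factor": 2}


def backoff_delays(settings=ERROR_BACKOFF):
    """Seconds to wait before each retry (assumed exponential schedule)."""
    return [
        settings["initial_backoff"] * settings["backoff_factor"] ** attempt
        for attempt in range(settings["max_retries"])
    ]


def call_with_backoff(func, settings=ERROR_BACKOFF, sleep=time.sleep):
    """Call func(); after each failure wait, then retry up to max_retries times."""
    for delay in backoff_delays(settings):
        try:
            return func()
        except Exception:
            sleep(delay)
    return func()  # final retry; any exception propagates to the caller


print(backoff_delays())  # [1, 2, 4, 8, 16]
```

With the defaults, a request that keeps failing waits 31 seconds in total before the last attempt. Raising initial_backoff or backoff_factor lengthens that schedule.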
Category-Specific RSS Feeds
Category-specific RSS feeds are defined in modules/category_detection.py. These feeds are used to prioritize sources based on the detected claim category:
CATEGORY_SPECIFIC_FEEDS = {
    "ai": [
        "https://www.artificialintelligence-news.com/feed/",
        "https://openai.com/news/rss.xml",
        # Additional AI-specific feeds
    ],
    "science": [
        "https://www.science.org/rss/news_current.xml",
        "https://www.nature.com/nature.rss",
        # Additional science feeds
    ],
    # Additional categories
}
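One way such prioritization could work is to place the category's feeds ahead of a general feed list, drop duplicates, and cap the result. This is a sketch only: `prioritized_feeds`, its `max_feeds` parameter, and the example.com URL are hypothetical, and the actual selection logic in modules/category_detection.py may differ.

```python
# Feed URLs copied from the excerpt above; the real dictionary has more entries.
CATEGORY_SPECIFIC_FEEDS = {
    "ai": [
        "https://www.artificialintelligence-news.com/feed/",
        "https://openai.com/news/rss.xml",
    ],
    "science": [
        "https://www.science.org/rss/news_current.xml",
        "https://www.nature.com/nature.rss",
    ],
}


def prioritized_feeds(category, general_feeds, max_feeds=10):
    """Return category feeds first, then general feeds, de-duplicated and capped."""
    ordered = CATEGORY_SPECIFIC_FEEDS.get(category, []) + list(general_feeds)
    unique = list(dict.fromkeys(ordered))  # keeps first occurrence, preserves order
    return unique[:max_feeds]


# Hypothetical general feed list; one entry duplicates a category feed.
general = ["https://example.com/general.xml", "https://openai.com/news/rss.xml"]
for url in prioritized_feeds("ai", general):
    print(url)
```

An unknown category falls back to the general feeds alone, since `dict.get` returns an empty list in that case.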
Hugging Face Spaces Deployment
Setting Up a Space
Create a new Space on Hugging Face:
- Go to https://huggingface.co/spaces
- Click "Create new Space"
- Select "Streamlit" as the SDK
- Choose the hardware tier (use the default 16GB RAM)
Upload the project files:
- You can upload files directly through the Hugging Face web interface
- Alternatively, use Git to push to the Hugging Face repository
- Make sure to include all necessary files including requirements.txt
Setting Up Secrets
Add API keys as secrets:
- Go to the "Settings" tab of your Space
- Navigate to the "Repository secrets" section
- Add your API keys:
  - OPENAI_API_KEY
  - NEWS_API_KEY
  - FACTCHECK_API_KEY
Configuring the Space
Edit the metadata in the README.md file:
---
title: Askveracity
emoji: π
colorFrom: blue
colorTo: pink
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: mit
short_description: Fact-checking and misinformation detection tool.
---
Custom Configuration
Adjusting Rate Limits
You can adjust the rate limits in config.py based on your API subscription levels:
# Update for higher tier News API subscription
RATE_LIMITS["newsapi"] = {"requests": 500, "period": 3600} # 500 requests per hour
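To show how a {"requests", "period"} pair can be enforced, here is a minimal sliding-window limiter. It is a sketch under the assumption that at most `requests` calls are allowed in any window of `period` seconds; the `SlidingWindowLimiter` class is hypothetical and is not the limiter AskVeracity ships.

```python
import time
from collections import deque

# Same shape as the RATE_LIMITS entries in config.py.
RATE_LIMITS = {"newsapi": {"requests": 500, "period": 3600}}


class SlidingWindowLimiter:
    """Allow at most `requests` calls within any `period`-second window."""

    def __init__(self, requests, period, clock=time.monotonic):
        self.requests = requests
        self.period = period
        self.clock = clock
        self.timestamps = deque()  # times of the calls still inside the window

    def allow(self):
        now = self.clock()
        # Discard calls that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.period:
            self.timestamps.popleft()
        if len(self.timestamps) < self.requests:
            self.timestamps.append(now)
            return True
        return False


limiter = SlidingWindowLimiter(**RATE_LIMITS["newsapi"])
if limiter.allow():
    print("request permitted")
```

The `clock` parameter exists only so the limiter can be exercised with a fake clock in tests rather than waiting for real time to pass.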
Modifying RSS Feeds
The list of RSS feeds can be found in modules/rss_feed.py and category-specific feeds in modules/category_detection.py. You can add or remove feeds as needed.
Performance Evaluation
The system includes a performance evaluation script, evaluate_performance.py, that:
- Runs the fact-checking system on a predefined set of test claims
- Calculates accuracy, safety rate, processing time, and confidence metrics
- Generates visualization charts in the results/ directory
- Saves detailed results to results/performance_results.json
To run the performance evaluation:
python evaluate_performance.py [--limit N] [--output FILE]
- --limit N: Limit evaluation to the first N claims (default: all)
- --output FILE: Save results to FILE (default: performance_results.json)
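For readers who want to script around these options, the sketch below shows how a command line with the same two flags could be parsed using argparse. It mirrors only the flags and defaults documented above; `build_parser` and the sample claim list are hypothetical, and the real evaluate_performance.py may be structured differently.

```python
import argparse


def build_parser():
    """Parser with the two options documented for the evaluation script."""
    parser = argparse.ArgumentParser(description="Evaluate fact-checking performance")
    parser.add_argument("--limit", type=int, default=None, metavar="N",
                        help="limit evaluation to the first N claims (default: all)")
    parser.add_argument("--output", default="performance_results.json", metavar="FILE",
                        help="save results to FILE")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--limit", "2"])

claims = ["claim A", "claim B", "claim C"]  # placeholder test claims
selected = claims[:args.limit]  # a limit of None selects every claim
print(selected)      # ['claim A', 'claim B']
print(args.output)   # performance_results.json
```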
Running the Application
Start the Streamlit app:
streamlit run app.py
The application will be available at http://localhost:8501 by default.
Troubleshooting
API Key Issues
If you encounter API key errors:
- Verify that your API keys are set correctly
- Check the logs for specific error messages
- Make sure API keys are not expired or rate-limited
Model Loading Errors
If the spaCy model fails to load:
# Reinstall the model
python -m spacy download en_core_web_sm --force
Rate Limiting
If you encounter rate limiting issues:
- Reduce the number of requests by adjusting RATE_LIMITS in config.py
- Increase the backoff parameters in ERROR_BACKOFF
- Subscribe to higher API tiers if available
Memory Issues
If the application crashes due to memory issues:
- Reduce the number of parallel workers in RSS_SETTINGS
- Limit the maximum number of evidence items processed
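The effect of the worker count can be seen in a small thread-pool sketch: max_workers bounds how many feeds are fetched, and held in memory, at the same time. This assumes feeds are fetched with a thread pool; `fetch_all` and the stand-in `fake_fetch` function are hypothetical and do not reflect the project's actual fetching code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Only the worker count matters for this sketch; a lower value uses less memory.
RSS_SETTINGS = {"max_workers": 2}


def fetch_all(urls, fetch_one, settings=RSS_SETTINGS):
    """Fetch every URL with at most max_workers threads; skip feeds that fail."""
    results = {}
    with ThreadPoolExecutor(max_workers=settings["max_workers"]) as pool:
        futures = {pool.submit(fetch_one, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception:
                continue  # a failing feed should not abort the whole batch
    return results


def fake_fetch(url):
    """Stand-in for a real network fetch so the example runs offline."""
    return f"content of {url}"


feeds = ["https://example.com/a.xml", "https://example.com/b.xml"]
print(sorted(fetch_all(feeds, fake_fetch)))
```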
Performance Optimization
For better performance:
- Upgrade to a higher-tier OpenAI model for improved accuracy
- Increase the number of parallel workers for evidence retrieval
- Add more relevant RSS feeds to improve evidence gathering