# AskVeracity Configuration Guide
This document describes how to set up and configure the AskVeracity fact-checking and misinformation detection system.
## Prerequisites
Before setting up AskVeracity, ensure you have:
- Python 3.8 or higher
- pip (Python package installer)
- Git (for cloning the repository)
- API keys for external services
## Installation
### Local Development
1. Clone the repository:
```bash
git clone https://github.com/yourusername/askveracity.git
cd askveracity
```
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Download the required spaCy model:
```bash
python -m spacy download en_core_web_sm
```
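To confirm the model installed correctly, you can run a quick optional check in Python; this is only a sanity test and not part of the application itself:

```python
import spacy

# Raises OSError if en_core_web_sm is not installed
nlp = spacy.load("en_core_web_sm")

# Should print recognized entities such as (Paris, France)
print(nlp("Paris is the capital of France.").ents)
```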
## API Key Configuration
AskVeracity requires several API keys to access external services. You have two options for configuring these keys:
### Option 1: Using Streamlit Secrets (Recommended for Local Development)
1. Create a `.streamlit` directory if it doesn't exist:
```bash
mkdir -p .streamlit
```
2. Create a `secrets.toml` file:
```bash
cp .streamlit/secrets.toml.example .streamlit/secrets.toml
```
3. Edit the `.streamlit/secrets.toml` file with your API keys:
```toml
OPENAI_API_KEY = "your_openai_api_key"
NEWS_API_KEY = "your_news_api_key"
FACTCHECK_API_KEY = "your_factcheck_api_key"
```
### Option 2: Using Environment Variables
1. Create a `.env` file in the root directory:
```bash
touch .env
```
2. Add your API keys to the `.env` file:
```
OPENAI_API_KEY=your_openai_api_key
NEWS_API_KEY=your_news_api_key
FACTCHECK_API_KEY=your_factcheck_api_key
```
3. Load the environment variables:
```python
# In Python
from dotenv import load_dotenv
load_dotenv()
```
Or in your terminal:
```bash
# Unix/Linux/macOS: `set -a` exports the sourced variables to child processes
set -a && source .env && set +a

# Windows (or any platform): install python-dotenv[cli] and run
dotenv run streamlit run app.py
```
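Whichever option you use, the application needs to resolve the keys at runtime. The sketch below is illustrative only (it is not necessarily how `config.py` implements key handling, and `get_api_key` is a hypothetical helper name): it checks Streamlit secrets first and falls back to environment variables.

```python
import os

import streamlit as st


def get_api_key(name):
    """Return the key from Streamlit secrets, else from the environment."""
    try:
        return st.secrets[name]
    except (KeyError, FileNotFoundError):
        # No secrets.toml was found, or the key is not defined there
        return os.getenv(name)


openai_key = get_api_key("OPENAI_API_KEY")
```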
## Required API Keys
AskVeracity uses the following external APIs:
1. **OpenAI API** (Required)
- Used for claim extraction, classification, and explanation generation
- Get an API key from [OpenAI's website](https://platform.openai.com/)
2. **News API** (Optional but recommended)
- Used for retrieving news article evidence
- Get an API key from [NewsAPI.org](https://newsapi.org/)
3. **Google Fact Check Tools API** (Optional but recommended)
- Used for retrieving fact-checking evidence
- Get an API key from [Google Fact Check Tools API](https://developers.google.com/fact-check/tools/api)
## Configuration Files
### config.py
The main configuration file is `config.py`, which contains:
- API key handling
- Rate limiting configuration
- Error backoff settings
- RSS feed settings
Important configuration sections in `config.py`:
```python
# Rate limiting configuration
RATE_LIMITS = {
    # api_name: {"requests": max_requests, "period": period_in_seconds}
    "newsapi": {"requests": 100, "period": 3600},          # 100 requests per hour
    "factcheck": {"requests": 1000, "period": 86400},      # 1000 requests per day
    "semantic_scholar": {"requests": 10, "period": 300},   # 10 requests per 5 minutes
    "wikidata": {"requests": 60, "period": 60},            # 60 requests per minute
    "wikipedia": {"requests": 200, "period": 60},          # 200 requests per minute
    "rss": {"requests": 300, "period": 3600}               # 300 RSS requests per hour
}

# Error backoff settings
ERROR_BACKOFF = {
    "max_retries": 5,
    "initial_backoff": 1,   # seconds
    "backoff_factor": 2,    # exponential backoff
}

# RSS feed settings
RSS_SETTINGS = {
    "max_feeds_per_request": 10,  # Maximum number of feeds to try per request
    "max_age_days": 3,            # Maximum age of RSS items to consider
    "timeout_seconds": 5,         # Timeout for RSS feed requests
    "max_workers": 5              # Number of parallel workers for fetching feeds
}
```
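As an illustration of how the `ERROR_BACKOFF` values translate into retry behavior, here is a minimal sketch (not the project's actual implementation; `call_with_backoff` is a hypothetical helper, and the import assumes `config.py` is on the Python path):

```python
import time

from config import ERROR_BACKOFF


def call_with_backoff(func, *args, **kwargs):
    """Call func, retrying with exponential backoff on failure."""
    delay = ERROR_BACKOFF["initial_backoff"]
    for attempt in range(ERROR_BACKOFF["max_retries"]):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == ERROR_BACKOFF["max_retries"] - 1:
                raise  # give up after the final retry
            time.sleep(delay)
            delay *= ERROR_BACKOFF["backoff_factor"]
```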
### Category-Specific RSS Feeds
Category-specific RSS feeds are defined in `modules/category_detection.py`. These feeds are used to prioritize sources based on the detected claim category:
```python
CATEGORY_SPECIFIC_FEEDS = {
    "ai": [
        "https://www.artificialintelligence-news.com/feed/",
        "https://openai.com/news/rss.xml",
        # Additional AI-specific feeds
    ],
    "science": [
        "https://www.science.org/rss/news_current.xml",
        "https://www.nature.com/nature.rss",
        # Additional science feeds
    ],
    # Additional categories
}
```
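For illustration, this dictionary lends itself to a simple lookup pattern once a claim's category has been detected. The helper below is hypothetical (the project's actual feed-selection logic may differ); it only shows how category feeds can be prioritized ahead of general ones:

```python
# Hypothetical helper: prioritize category-specific feeds over general feeds
from modules.category_detection import CATEGORY_SPECIFIC_FEEDS


def feeds_for_claim(category, general_feeds):
    """Return category-specific feeds first, followed by the general feeds."""
    category_feeds = CATEGORY_SPECIFIC_FEEDS.get(category, [])
    seen = set()
    ordered = []
    for url in category_feeds + list(general_feeds):
        # Preserve order and drop duplicate URLs
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered
```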
## Hugging Face Spaces Deployment
### Setting Up a Space
1. Create a new Space on Hugging Face:
- Go to https://huggingface.co/spaces
- Click "Create new Space"
- Select "Streamlit" as the SDK
- Choose the hardware tier (use the default 16GB RAM)
2. Upload the project files:
- You can upload files directly through the Hugging Face web interface
- Alternatively, use Git to push to the Hugging Face repository
- Make sure to include all required files, including `requirements.txt`
### Setting Up Secrets
1. Add API keys as secrets:
- Go to the "Settings" tab of your Space
- Navigate to the "Repository secrets" section
- Add your API keys:
- `OPENAI_API_KEY`
- `NEWS_API_KEY`
- `FACTCHECK_API_KEY`
### Configuring the Space
Edit the metadata in the `README.md` file:
```yaml
---
title: Askveracity
emoji: π
colorFrom: blue
colorTo: pink
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: mit
short_description: Fact-checking and misinformation detection tool.
---
```
## Custom Configuration
### Adjusting Rate Limits
You can adjust the rate limits in `config.py` based on your API subscription levels:
```python
# Update for higher tier News API subscription
RATE_LIMITS["newsapi"] = {"requests": 500, "period": 3600} # 500 requests per hour
```
### Modifying RSS Feeds
The list of RSS feeds can be found in `modules/rss_feed.py` and category-specific feeds in `modules/category_detection.py`. You can add or remove feeds as needed.
### Performance Evaluation
The system includes a performance evaluation script `evaluate_performance.py` that:
1. Runs the fact-checking system on a predefined set of test claims
2. Calculates accuracy, safety rate, processing time, and confidence metrics
3. Generates visualization charts in the `results/` directory
4. Saves detailed results to `results/performance_results.json`
To run the performance evaluation:
```bash
python evaluate_performance.py [--limit N] [--output FILE]
```
- `--limit N`: Limit evaluation to first N claims (default: all)
- `--output FILE`: Save results to FILE (default: performance_results.json)
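After a run, the saved results file can be inspected directly. The snippet below simply loads and pretty-prints the JSON and makes no assumptions about its exact schema:

```python
import json
from pprint import pprint

# Load and pretty-print whatever the evaluation run saved
with open("results/performance_results.json") as f:
    pprint(json.load(f))
```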
## Running the Application
Start the Streamlit app:
```bash
streamlit run app.py
```
The application will be available at http://localhost:8501 by default.
## Troubleshooting
### API Key Issues
If you encounter API key errors:
1. Verify that your API keys are set correctly (a quick check is shown below)
2. Check the logs for specific error messages
3. Make sure your API keys have not expired or been rate-limited
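A quick diagnostic sketch that lists which of the expected keys are visible to the process (it only checks environment variables, not Streamlit secrets):

```python
import os

# Print which of the expected keys are present in the environment
for key in ("OPENAI_API_KEY", "NEWS_API_KEY", "FACTCHECK_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")
```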
### Model Loading Errors
If the spaCy model fails to load:
```bash
# Reinstall the model
python -m spacy download en_core_web_sm --force
```
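In code, a defensive loading pattern can download the model on first use if it is missing. This is a sketch, not taken from the project's source:

```python
import spacy


def load_model(name="en_core_web_sm"):
    """Load the spaCy model, downloading it first if it is not installed."""
    try:
        return spacy.load(name)
    except OSError:
        # Model not found locally; fetch it and try again
        from spacy.cli import download
        download(name)
        return spacy.load(name)
```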
### Rate Limiting
If you encounter rate limiting issues:
1. Reduce the number of requests by adjusting `RATE_LIMITS` in `config.py`
2. Increase the backoff parameters in `ERROR_BACKOFF`
3. Subscribe to higher API tiers if available
### Memory Issues
If the application crashes due to memory issues:
1. Reduce the number of parallel workers in `RSS_SETTINGS` (see the example below)
2. Limit the maximum number of evidence items processed
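For example, lowering the worker count and the number of feeds fetched per request in `config.py` reduces peak memory use; the values below are illustrative:

```python
# Lower memory pressure by fetching fewer feeds at once (values are examples)
RSS_SETTINGS["max_workers"] = 2
RSS_SETTINGS["max_feeds_per_request"] = 5
```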
## Performance Optimization
For better performance:
1. Upgrade to a higher-tier OpenAI model for improved accuracy
2. Increase the number of parallel workers for evidence retrieval
3. Add more relevant RSS feeds to improve evidence gathering |