# AskVeracity Configuration Guide
This document describes how to set up and configure the AskVeracity fact-checking and misinformation detection system.
## Prerequisites
Before setting up AskVeracity, ensure you have:
- Python 3.8 or higher
- pip (Python package installer)
- Git (for cloning the repository)
- API keys for external services
## Installation
### Local Development
1. Clone the repository:
```bash
git clone https://github.com/yourusername/askveracity.git
cd askveracity
```
2. Install the required dependencies:
```bash
pip install -r requirements.txt
```
3. Download the required spaCy model:
```bash
python -m spacy download en_core_web_sm
```
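To confirm the model installed correctly, you can run a quick optional check in Python; this is only a sanity test and not part of the application itself:

```python
import spacy

# Raises OSError if en_core_web_sm is not installed
nlp = spacy.load("en_core_web_sm")

# Should print recognized entities such as (Paris, France)
print(nlp("Paris is the capital of France.").ents)
```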
## API Key Configuration
AskVeracity requires several API keys to access external services. You have two options for configuring these keys:
### Option 1: Using Streamlit Secrets (Recommended for Local Development)
1. Create a `.streamlit` directory if it doesn't exist:
```bash
mkdir -p .streamlit
```
2. Create a `secrets.toml` file:
```bash
cp .streamlit/secrets.toml.example .streamlit/secrets.toml
```
3. Edit the `.streamlit/secrets.toml` file with your API keys:
```toml
OPENAI_API_KEY = "your_openai_api_key"
NEWS_API_KEY = "your_news_api_key"
FACTCHECK_API_KEY = "your_factcheck_api_key"
```
### Option 2: Using Environment Variables
1. Create a `.env` file in the root directory:
```bash
touch .env
```
2. Add your API keys to the `.env` file:
```
OPENAI_API_KEY=your_openai_api_key
NEWS_API_KEY=your_news_api_key
FACTCHECK_API_KEY=your_factcheck_api_key
```
3. Load the environment variables:
```python
# In Python
from dotenv import load_dotenv
load_dotenv()
```
Or in your terminal:
```bash
# Unix/Linux/macOS: `set -a` exports the sourced variables to child processes
set -a && source .env && set +a

# Windows (or any platform): install python-dotenv[cli] and run
dotenv run streamlit run app.py
```
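Whichever option you use, the application needs to resolve the keys at runtime. The sketch below is illustrative only (it is not necessarily how `config.py` implements key handling, and `get_api_key` is a hypothetical helper name): it checks Streamlit secrets first and falls back to environment variables.

```python
import os

import streamlit as st


def get_api_key(name):
    """Return the key from Streamlit secrets, else from the environment."""
    try:
        return st.secrets[name]
    except (KeyError, FileNotFoundError):
        # No secrets.toml was found, or the key is not defined there
        return os.getenv(name)


openai_key = get_api_key("OPENAI_API_KEY")
```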
## Required API Keys
AskVeracity uses the following external APIs:
1. **OpenAI API** (Required)
- Used for claim extraction, classification, and explanation generation
- Get an API key from [OpenAI's website](https://platform.openai.com/)
2. **News API** (Optional but recommended)
- Used for retrieving news article evidence
- Get an API key from [NewsAPI.org](https://newsapi.org/)
3. **Google Fact Check Tools API** (Optional but recommended)
- Used for retrieving fact-checking evidence
- Get an API key from [Google Fact Check Tools API](https://developers.google.com/fact-check/tools/api)
## Configuration Files
### config.py
The main configuration file is `config.py`, which contains:
- API key handling
- Rate limiting configuration
- Error backoff settings
- RSS feed settings
Important configuration sections in `config.py`:
```python
# Rate limiting configuration
RATE_LIMITS = {
    # api_name: {"requests": max_requests, "period": period_in_seconds}
    "newsapi": {"requests": 100, "period": 3600},          # 100 requests per hour
    "factcheck": {"requests": 1000, "period": 86400},      # 1000 requests per day
    "semantic_scholar": {"requests": 10, "period": 300},   # 10 requests per 5 minutes
    "wikidata": {"requests": 60, "period": 60},            # 60 requests per minute
    "wikipedia": {"requests": 200, "period": 60},          # 200 requests per minute
    "rss": {"requests": 300, "period": 3600}               # 300 RSS requests per hour
}

# Error backoff settings
ERROR_BACKOFF = {
    "max_retries": 5,
    "initial_backoff": 1,   # seconds
    "backoff_factor": 2,    # exponential backoff
}

# RSS feed settings
RSS_SETTINGS = {
    "max_feeds_per_request": 10,  # Maximum number of feeds to try per request
    "max_age_days": 3,            # Maximum age of RSS items to consider
    "timeout_seconds": 5,         # Timeout for RSS feed requests
    "max_workers": 5              # Number of parallel workers for fetching feeds
}
```
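As an illustration of how the `ERROR_BACKOFF` values translate into retry behavior, here is a minimal sketch (not the project's actual implementation; `call_with_backoff` is a hypothetical helper, and the import assumes `config.py` is on the Python path):

```python
import time

from config import ERROR_BACKOFF


def call_with_backoff(func, *args, **kwargs):
    """Call func, retrying with exponential backoff on failure."""
    delay = ERROR_BACKOFF["initial_backoff"]
    for attempt in range(ERROR_BACKOFF["max_retries"]):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == ERROR_BACKOFF["max_retries"] - 1:
                raise  # give up after the final retry
            time.sleep(delay)
            delay *= ERROR_BACKOFF["backoff_factor"]
```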
### Category-Specific RSS Feeds
Category-specific RSS feeds are defined in `modules/category_detection.py`. These feeds are used to prioritize sources based on the detected claim category:
```python
CATEGORY_SPECIFIC_FEEDS = {
    "ai": [
        "https://www.artificialintelligence-news.com/feed/",
        "https://openai.com/news/rss.xml",
        # Additional AI-specific feeds
    ],
    "science": [
        "https://www.science.org/rss/news_current.xml",
        "https://www.nature.com/nature.rss",
        # Additional science feeds
    ],
    # Additional categories
}
```
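For illustration, this dictionary lends itself to a simple lookup pattern once a claim's category has been detected. The helper below is hypothetical (the project's actual feed-selection logic may differ); it only shows how category feeds can be prioritized ahead of general ones:

```python
# Hypothetical helper: prioritize category-specific feeds over general feeds
from modules.category_detection import CATEGORY_SPECIFIC_FEEDS


def feeds_for_claim(category, general_feeds):
    """Return category-specific feeds first, followed by the general feeds."""
    category_feeds = CATEGORY_SPECIFIC_FEEDS.get(category, [])
    seen = set()
    ordered = []
    for url in category_feeds + list(general_feeds):
        # Preserve order and drop duplicate URLs
        if url not in seen:
            seen.add(url)
            ordered.append(url)
    return ordered
```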
## Hugging Face Spaces Deployment
### Setting Up a Space
1. Create a new Space on Hugging Face:
- Go to https://huggingface.co/spaces
- Click "Create new Space"
- Select "Streamlit" as the SDK
- Choose the hardware tier (use the default 16GB RAM)
2. Upload the project files:
- You can upload files directly through the Hugging Face web interface
- Alternatively, use Git to push to the Hugging Face repository
- Make sure to include all required files, including `requirements.txt`
### Setting Up Secrets
1. Add API keys as secrets:
- Go to the "Settings" tab of your Space
- Navigate to the "Repository secrets" section
- Add your API keys:
- `OPENAI_API_KEY`
- `NEWS_API_KEY`
- `FACTCHECK_API_KEY`
### Configuring the Space
Edit the metadata in the `README.md` file:
```yaml
---
title: Askveracity
emoji: π
colorFrom: blue
colorTo: pink
sdk: streamlit
sdk_version: 1.44.1
app_file: app.py
pinned: false
license: mit
short_description: Fact-checking and misinformation detection tool.
---
```
## Custom Configuration
### Adjusting Rate Limits
You can adjust the rate limits in `config.py` based on your API subscription levels:
```python
# Update for higher tier News API subscription
RATE_LIMITS["newsapi"] = {"requests": 500, "period": 3600} # 500 requests per hour
```
### Modifying RSS Feeds
The list of RSS feeds can be found in `modules/rss_feed.py` and category-specific feeds in `modules/category_detection.py`. You can add or remove feeds as needed.
### Performance Evaluation
The system includes a performance evaluation script `evaluate_performance.py` that:
1. Runs the fact-checking system on a predefined set of test claims
2. Calculates accuracy, safety rate, processing time, and confidence metrics
3. Generates visualization charts in the `results/` directory
4. Saves detailed results to `results/performance_results.json`
To run the performance evaluation:
```bash
python evaluate_performance.py [--limit N] [--output FILE]
```
- `--limit N`: Limit evaluation to first N claims (default: all)
- `--output FILE`: Save results to FILE (default: performance_results.json)
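After a run, the saved results file can be inspected directly. The snippet below simply loads and pretty-prints the JSON and makes no assumptions about its exact schema:

```python
import json
from pprint import pprint

# Load and pretty-print whatever the evaluation run saved
with open("results/performance_results.json") as f:
    pprint(json.load(f))
```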
## Running the Application
Start the Streamlit app:
```bash
streamlit run app.py
```
The application will be available at http://localhost:8501 by default.
## Troubleshooting
### API Key Issues
If you encounter API key errors:
1. Verify that your API keys are set correctly (a quick check is shown below)
2. Check the logs for specific error messages
3. Make sure your API keys have not expired or been rate-limited
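A quick diagnostic sketch that lists which of the expected keys are visible to the process (it only checks environment variables, not Streamlit secrets):

```python
import os

# Print which of the expected keys are present in the environment
for key in ("OPENAI_API_KEY", "NEWS_API_KEY", "FACTCHECK_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'MISSING'}")
```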
### Model Loading Errors
If the spaCy model fails to load:
```bash
# Reinstall the model
python -m spacy download en_core_web_sm --force
```
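In code, a defensive loading pattern can download the model on first use if it is missing. This is a sketch, not taken from the project's source:

```python
import spacy


def load_model(name="en_core_web_sm"):
    """Load the spaCy model, downloading it first if it is not installed."""
    try:
        return spacy.load(name)
    except OSError:
        # Model not found locally; fetch it and try again
        from spacy.cli import download
        download(name)
        return spacy.load(name)
```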
### Rate Limiting
If you encounter rate limiting issues:
1. Reduce the number of requests by adjusting `RATE_LIMITS` in `config.py`
2. Increase the backoff parameters in `ERROR_BACKOFF`
3. Subscribe to higher API tiers if available
### Memory Issues
If the application crashes due to memory issues:
1. Reduce the number of parallel workers in `RSS_SETTINGS` (see the example below)
2. Limit the maximum number of evidence items processed
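For example, lowering the worker count and the number of feeds fetched per request in `config.py` reduces peak memory use; the values below are illustrative:

```python
# Lower memory pressure by fetching fewer feeds at once (values are examples)
RSS_SETTINGS["max_workers"] = 2
RSS_SETTINGS["max_feeds_per_request"] = 5
```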
## Performance Optimization
For better performance:
1. Upgrade to a higher-tier OpenAI model for improved accuracy
2. Increase the number of parallel workers for evidence retrieval
3. Add more relevant RSS feeds to improve evidence gathering |