# Data Handling in AskVeracity
This document explains how data flows through the AskVeracity fact-checking and misinformation detection system, from user input to final verification results.
## Data Flow Overview
```
User Input → Claim Extraction → Category Detection → Evidence Retrieval → Evidence Analysis → Classification → Explanation → Result Display
```
## User Input Processing
### Input Sanitization and Extraction
1. **Input Acceptance:** The system accepts user input as free-form text through the Streamlit interface.
2. **Claim Extraction** (`modules/claim_extraction.py`; see the sketch after this list):
- For concise inputs (<30 words), the system preserves the input as-is
- For longer texts, an LLM extracts the main factual claim
- Validation ensures the extraction doesn't add information not present in the original
- Entity preservation is verified using spaCy's NER
3. **Claim Shortening:**
- For evidence retrieval, claims are shortened to preserve key entities and context
- Preserves entity mentions, key nouns, titles, country references, and negation contexts
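A minimal sketch of the length-based routing and entity-preservation check in step 2 above. The names `WORD_THRESHOLD` and `extract_claim_llm` are illustrative assumptions, not the module's actual API:

```python
import spacy

WORD_THRESHOLD = 30  # concise inputs below this are kept verbatim
nlp = spacy.load("en_core_web_sm")

def extract_claim_llm(text: str) -> str:
    """Placeholder for the LLM call that extracts the main factual claim."""
    return text

def extract_claim(user_input: str) -> str:
    if len(user_input.split()) < WORD_THRESHOLD:
        return user_input  # concise input: preserve as-is
    candidate = extract_claim_llm(user_input)
    # Verify the extraction adds no entities absent from the original
    original = {ent.text for ent in nlp(user_input).ents}
    extracted = {ent.text for ent in nlp(candidate).ents}
    if not extracted <= original:
        return user_input  # entity check failed; fall back to the input
    return candidate
```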
## Evidence Retrieval and Processing
### Multi-source Evidence Gathering
Evidence is collected from multiple sources in parallel (`modules/evidence_retrieval.py`); a sketch of the parallel fan-out follows the list below:
1. **Category Detection** (`modules/category_detection.py`):
- Detects the claim category (ai, science, technology, politics, business, world, sports, entertainment)
- Prioritizes sources based on category
- Category assignment is based purely on keyword matching; no category receives preferential weighting
2. **Wikipedia** evidence:
- Search Wikipedia API for relevant articles
- Extract introductory paragraphs
- Processes up to 3 top search results in parallel
3. **Wikidata** evidence:
- SPARQL queries for structured data
- Entity extraction with descriptions
4. **News API** evidence:
- Retrieval from NewsAPI.org with date filtering
- Prioritizes recent articles
- Extracts titles, descriptions, and content snippets
5. **RSS Feed** evidence (`modules/rss_feed.py`):
- Parallel retrieval from multiple RSS feeds
- Category-specific feed selection
- Relevance and recency scoring
6. **ClaimReview** evidence:
- Google's Fact Check Tools API integration
- Retrieves fact-checks from fact-checking organizations
- Includes ratings and publisher information
7. **Scholarly** evidence:
- OpenAlex API for academic sources
- Extracts titles, abstracts, and publication dates
8. **Category Fallback** mechanism:
- For AI claims, falls back to technology RSS feeds when evidence is insufficient
- For other categories, falls back to default RSS feeds
- Ensures robust evidence retrieval across related domains
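The fan-out across sources can be pictured with a thread pool, as in this hedged sketch; the per-source retrievers are placeholders standing in for the calls described above, not the module's real signatures:

```python
from concurrent.futures import ThreadPoolExecutor

def wikipedia_evidence(claim: str) -> list[str]:
    return []  # would search the Wikipedia API and extract intro paragraphs

def news_api_evidence(claim: str) -> list[str]:
    return []  # would query NewsAPI.org with date filtering

def rss_feed_evidence(claim: str) -> list[str]:
    return []  # would fetch category-specific RSS feeds

SOURCES = [wikipedia_evidence, news_api_evidence, rss_feed_evidence]

def retrieve_evidence(claim: str) -> list[str]:
    evidence: list[str] = []
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        # Each retriever runs concurrently; results are flattened in order
        for items in pool.map(lambda fetch: fetch(claim), SOURCES):
            evidence.extend(items)
    return evidence
```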
### Evidence Preprocessing
Each evidence item is standardized to a consistent format:
```
Title: [title], Source: [source], Date: [date], URL: [url], Content: [content snippet]
```
Length limits are applied to reduce token usage:
- Content snippets are limited to ~1000 characters
- Evidence items are truncated while maintaining context
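A sketch of the standardization and truncation above, using the ~1000-character cap from the text (the function name is an assumption):

```python
MAX_SNIPPET_CHARS = 1000  # approximate cap from the text

def format_evidence(title: str, source: str, date: str,
                    url: str, content: str) -> str:
    snippet = content[:MAX_SNIPPET_CHARS]  # truncate to limit token usage
    return (f"Title: {title}, Source: {source}, Date: {date}, "
            f"URL: {url}, Content: {snippet}")
```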
## Evidence Analysis and Relevance Ranking
### Relevance Assessment
Evidence is analyzed and scored for relevance:
1. **Component Extraction:**
- Extract entities, verbs, and keywords from the claim
- Use NLP processing to identify key claim components
2. **Entity and Verb Matching:**
- Match entities from claim to evidence (case-sensitive and case-insensitive)
- Match verbs from claim to evidence
- Score based on matches (entity matches weighted higher than verb matches)
3. **Temporal Relevance:**
- Detection of temporal indicators in claims
- Date-based filtering for time-sensitive claims
- Adjusts evidence retrieval window based on claim temporal context
4. **Scoring Formula:**
```
final_score = (entity_matches * 3.0) + (verb_matches * 2.0)
```
If no entity or verb matches, fall back to keyword matching:
```
final_score = keyword_matches * 1.0
```
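The two formulas translate directly into a small function; the match counts are assumed to be computed upstream from the claim components:

```python
def relevance_score(entity_matches: int, verb_matches: int,
                    keyword_matches: int) -> float:
    # Entity matches weigh more than verb matches, per the formula above
    if entity_matches or verb_matches:
        return entity_matches * 3.0 + verb_matches * 2.0
    return keyword_matches * 1.0  # keyword fallback
```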
### Evidence Selection
The system selects the most relevant evidence:
1. **Relevance Sorting:**
- Evidence items sorted by relevance score (descending)
- Top 10 most relevant items selected
2. **Handling No Evidence:**
- If no evidence is found, a placeholder is returned
- Ensures graceful handling of edge cases
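A sketch of the selection step above, including the no-evidence path (the placeholder wording is an assumption):

```python
def select_evidence(scored: list[tuple[str, float]]) -> list[str]:
    if not scored:
        return ["No relevant evidence found."]  # placeholder; wording assumed
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [evidence for evidence, _ in ranked[:10]]  # top 10 by score
```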
## Truth Classification
### Evidence Classification (`modules/classification.py`)
Each evidence item is classified individually:
1. **LLM Classification** (see the sketch after this list):
- Each evidence item is analyzed by an LLM
- Classification categories: support, contradict, insufficient
- Confidence score (0-100) assigned to each classification
- Structured output parsing with fallback mechanisms
2. **Tense Normalization:**
- Normalizes verb tenses in claims to ensure consistent classification
- Converts present simple and perfect forms to past tense equivalents
- Preserves semantic equivalence across tense variations
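A hedged sketch of the per-item classification in step 1 above, with a regex fallback when structured parsing fails. The prompt and output format are assumptions, and the LLM call is a placeholder:

```python
import re

LABELS = ("support", "contradict", "insufficient")

def call_llm(prompt: str) -> str:
    """Placeholder for the LLM call; assumed to return 'label|confidence'."""
    return "insufficient|50"

def classify_item(claim: str, evidence: str) -> tuple[str, int]:
    raw = call_llm(f"Does this evidence support, contradict, or provide "
                   f"insufficient information for the claim?\n"
                   f"Claim: {claim}\nEvidence: {evidence}")
    match = re.match(r"(support|contradict|insufficient)\|(\d{1,3})", raw)
    if match:
        return match.group(1), int(match.group(2))
    return "insufficient", 0  # fallback for unparseable output
```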
### Verdict Aggregation
Evidence classifications are aggregated to determine the final verdict:
1. **Weighted Aggregation:**
- 55% weight for count of support/contradict items
- 45% weight for quality (confidence) of support/contradict items
2. **Confidence Calculation:**
- Formula: `1.0 - (min_score / max_score)`
- Higher confidence for consistent evidence
- Lower confidence for mixed or insufficient evidence
3. **Final Verdict Categories:**
- "True (Based on Evidence)"
- "False (Based on Evidence)"
- "Uncertain"
## Explanation Generation
### Explanation Creation (`modules/explanation.py`)
Human-readable explanations are generated based on the verdict:
1. **Template Selection:**
- Different prompts for true, false, and uncertain verdicts
- Special handling for claims containing negation
2. **Confidence Communication:**
- Translation of confidence scores to descriptive language
- Clear communication of certainty/uncertainty
3. **Very Low Confidence Handling:**
- Special explanations for verdicts with very low confidence (<10%)
- Strong recommendations to verify with authoritative sources
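A minimal sketch of the template routing and the below-10% low-confidence path described above; the template strings are illustrative, not the module's actual prompts:

```python
TEMPLATES = {
    "True (Based on Evidence)": "Explain why the evidence supports: {claim}",
    "False (Based on Evidence)": "Explain why the evidence contradicts: {claim}",
    "Uncertain": "Explain why the evidence is inconclusive for: {claim}",
}

def build_explanation_prompt(verdict: str, claim: str,
                             confidence: float) -> str:
    prompt = TEMPLATES[verdict].format(claim=claim)
    if confidence < 0.10:  # very low confidence threshold from the text
        prompt += ("\nStress the very low confidence and strongly "
                   "recommend verifying with authoritative sources.")
    return prompt
```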
## Result Presentation
Results are presented in the Streamlit UI with multiple components:
1. **Verdict Display:**
- Color-coded verdict (green for true, red for false, gray for uncertain)
- Confidence percentage
- Explanation text
2. **Evidence Presentation:**
- Tabbed interface for different evidence views, with source URLs where available
- Supporting and contradicting evidence tabs
- Source distribution summary
3. **Input Guidance:**
- Tips for claim formatting
- Guidance for time-sensitive claims
- Suggestions for verb tense based on claim age
4. **Processing Insights:**
- Processing time
- AI reasoning steps
- Source distribution statistics
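The color-coded verdict display can be pictured with a few Streamlit calls; the colors follow the description above, while the widget choices are assumptions:

```python
import streamlit as st

COLORS = {
    "True (Based on Evidence)": "green",
    "False (Based on Evidence)": "red",
    "Uncertain": "gray",
}

def show_result(verdict: str, confidence: float, explanation: str) -> None:
    st.markdown(f":{COLORS[verdict]}[**{verdict}**]")  # color-coded verdict
    st.write(f"Confidence: {confidence:.0%}")
    st.write(explanation)
```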
## Data Persistence and Privacy
AskVeracity prioritizes user privacy:
1. **No Data Storage:**
- User claims are not stored persistently
- Results are maintained only in session state
- No user data is collected or retained
2. **Session Management:**
- Session state in Streamlit manages current user interaction
- Session is cleared when starting a new verification
3. **API Interaction:**
- External API calls are governed by the providers' respective privacy policies
- OpenAI API usage follows their data handling practices
4. **Caching:**
- Model caching for performance
- Resource cleanup on application termination
## Performance Tracking
The system includes a performance tracking utility (`utils/performance.py`):
1. **Metrics Tracked:**
- Claims processed count
- Evidence retrieval success rates
- Processing times
- Confidence scores
- Source types used
- Temporal relevance
2. **Usage:**
- Performance metrics are logged during processing
- Summary of select metrics available in the final result
- Used for system optimization
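A minimal tracker in the spirit of `utils/performance.py`; the attribute and method names are assumptions:

```python
import time

class PerformanceTracker:
    def __init__(self) -> None:
        self.claims_processed = 0
        self.processing_times: list[float] = []
        self.source_counts: dict[str, int] = {}

    def log_claim(self, started_at: float, sources: list[str]) -> None:
        self.claims_processed += 1
        self.processing_times.append(time.time() - started_at)
        for source in sources:  # tally source types used
            self.source_counts[source] = self.source_counts.get(source, 0) + 1
```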
## Performance Evaluation
The system includes a performance evaluation script (`evaluate_performance.py`):
1. **Test Claims:**
- Predefined set of test claims with known ground truth labels
- Claims categorized as "True", "False", or "Uncertain"
2. **Metrics:**
- Overall accuracy: Percentage of claims correctly classified according to ground truth
- Safety rate: Percentage of claims either correctly classified or safely categorized as "Uncertain" rather than making an incorrect assertion
- Per-class accuracy and safety rates
- Average processing time
- Average confidence score
- Classification distributions
3. **Visualization:**
- Charts for accuracy by classification type
- Charts for safety rate by classification type
- Processing time by classification type
- Confidence scores by classification type
4. **Results Storage:**
- Detailed results saved to JSON file
- Visualization charts saved as PNG files
- All results stored in the `results/` directory
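The two headline metrics reduce to a few lines; `evaluate_performance.py` also computes them per class, which this sketch omits:

```python
def headline_metrics(predictions: list[str],
                     ground_truth: list[str]) -> dict[str, float]:
    pairs = list(zip(predictions, ground_truth))
    correct = sum(p == g for p, g in pairs)
    # "Uncertain" counts as safe: no incorrect assertion was made
    safe = sum(p == g or p == "Uncertain" for p, g in pairs)
    n = len(pairs)
    return {"accuracy": correct / n, "safety_rate": safe / n}
```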
## Error Handling and Resilience
The system implements robust error handling:
1. **API Error Handling** (`utils/api_utils.py`; see the sketch after this list):
- Decorator-based error handling
- Exponential backoff for retries
- Rate limiting respecting API constraints
2. **Safe JSON Parsing:**
- Defensive parsing of API responses
- Fallback mechanisms for invalid responses
3. **Graceful Degradation:**
- Multiple fallback strategies
- Core functionality preservation even when some sources fail
4. **Fallback Mechanisms:**
- Fallback verdict generation when the truth classifier is not invoked
- Fallback explanation generation when the explanation generator is not invoked
- Ensures complete results even with partial component failures
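A hedged sketch of decorator-based retries with exponential backoff, as in step 1 above; the decorator name and defaults are assumptions, not the `utils/api_utils.py` API:

```python
import functools
import time

def with_retries(max_attempts: int = 3, base_delay: float = 1.0):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of retries; surface the error
                    time.sleep(base_delay * 2 ** attempt)  # exponential backoff
        return wrapper
    return decorator
```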