Data Handling in AskVeracity
This document explains how data flows through the AskVeracity fact-checking and misinformation detection system, from user input to final verification results.
Data Flow Overview
User Input → Claim Extraction → Category Detection → Evidence Retrieval → Evidence Analysis → Classification → Explanation → Result Display
User Input Processing
Input Sanitization and Extraction
Input Acceptance: The system accepts user input as free-form text through the Streamlit interface.
Claim Extraction (modules/claim_extraction.py):
- For concise inputs (<30 words), the system preserves the input as-is
- For longer texts, an LLM extracts the main factual claim
- Validation ensures the extraction doesn't add information not present in the original
- Entity preservation is verified using spaCy's NER
Claim Shortening:
- For evidence retrieval, claims are shortened to preserve key entities and context
- Preserves entity mentions, key nouns, titles, country references, and negation contexts
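The sketch below illustrates this flow, assuming spaCy's en_core_web_sm model; the helper names and the llm_extract callable are hypothetical, and the real logic lives in modules/claim_extraction.py.

```python
# Sketch: verify an extracted claim does not drop entities from the original input.
# Assumes spaCy's en_core_web_sm model; function names are illustrative only.
import spacy

nlp = spacy.load("en_core_web_sm")

def entities_preserved(original: str, extracted: str) -> bool:
    """Return True if every named entity in the original text
    also appears (case-insensitively) in the extracted claim."""
    original_ents = {ent.text.lower() for ent in nlp(original).ents}
    extracted_lower = extracted.lower()
    return all(ent in extracted_lower for ent in original_ents)

def extract_claim(user_input: str, llm_extract) -> str:
    """Keep short inputs as-is; otherwise ask the LLM and validate the result."""
    if len(user_input.split()) < 30:           # concise inputs are preserved
        return user_input
    candidate = llm_extract(user_input)        # LLM pulls out the main factual claim
    # Fall back to the original text if the extraction loses entities.
    return candidate if entities_preserved(user_input, candidate) else user_input
```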
Evidence Retrieval and Processing
Multi-source Evidence Gathering
Evidence is collected from multiple sources in parallel (modules/evidence_retrieval.py):
Category Detection (modules/category_detection.py):
- Detects the claim category (ai, science, technology, politics, business, world, sports, entertainment)
- Prioritizes sources based on category
- No category receives preferential weighting; assignment is based purely on keyword matching
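A minimal sketch of keyword-based category detection; the keyword lists and function name here are illustrative placeholders rather than the actual contents of modules/category_detection.py.

```python
# Sketch: keyword-matching category detection (illustrative keyword lists).
CATEGORY_KEYWORDS = {
    "ai":         ["artificial intelligence", "llm", "neural network", "chatgpt"],
    "science":    ["study", "researchers", "experiment", "physics"],
    "technology": ["software", "smartphone", "startup", "chip"],
    "politics":   ["election", "senate", "parliament", "president"],
    "sports":     ["match", "tournament", "league", "goal"],
}

def detect_category(claim: str, default: str = "world") -> str:
    """Return the category whose keywords appear most often in the claim."""
    text = claim.lower()
    scores = {
        category: sum(text.count(kw) for kw in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```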
Wikipedia evidence:
- Search Wikipedia API for relevant articles
- Extract introductory paragraphs
- Process in parallel for up to 3 top search results
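A rough sketch of this retrieval pattern using the public MediaWiki API and a thread pool; error handling and snippet truncation are omitted, and the real module's behavior may differ in detail.

```python
# Sketch: search Wikipedia and fetch intro extracts for the top results in parallel.
# Uses the public MediaWiki API; error handling is trimmed for brevity.
from concurrent.futures import ThreadPoolExecutor
import requests

API = "https://en.wikipedia.org/w/api.php"

def search_titles(query: str, limit: int = 3) -> list[str]:
    params = {"action": "query", "list": "search", "srsearch": query,
              "srlimit": limit, "format": "json"}
    hits = requests.get(API, params=params, timeout=10).json()["query"]["search"]
    return [h["title"] for h in hits]

def fetch_intro(title: str) -> str:
    params = {"action": "query", "prop": "extracts", "exintro": True,
              "explaintext": True, "titles": title, "format": "json"}
    pages = requests.get(API, params=params, timeout=10).json()["query"]["pages"]
    return next(iter(pages.values())).get("extract", "")

def wikipedia_evidence(claim: str) -> list[str]:
    titles = search_titles(claim)
    with ThreadPoolExecutor(max_workers=3) as pool:   # top 3 results in parallel
        return list(pool.map(fetch_intro, titles))
```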
Wikidata evidence:
- SPARQL queries for structured data
- Entity extraction with descriptions
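A minimal sketch of a Wikidata lookup against the public SPARQL endpoint; the query shape is illustrative, while the actual module builds queries from entities extracted from the claim.

```python
# Sketch: query Wikidata's SPARQL endpoint for an entity and its description.
# The query shape is illustrative; the real module builds queries from claim entities.
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

def wikidata_evidence(entity_label: str) -> list[dict]:
    query = f"""
    SELECT ?item ?itemLabel ?itemDescription WHERE {{
      ?item rdfs:label "{entity_label}"@en .
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }} LIMIT 5
    """
    resp = requests.get(SPARQL_ENDPOINT,
                        params={"query": query, "format": "json"},
                        headers={"User-Agent": "AskVeracity-sketch/0.1"},
                        timeout=15)
    bindings = resp.json()["results"]["bindings"]
    return [{"item": b["item"]["value"],
             "label": b.get("itemLabel", {}).get("value", ""),
             "description": b.get("itemDescription", {}).get("value", "")}
            for b in bindings]
```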
News API evidence:
- Retrieval from NewsAPI.org with date filtering
- Prioritizes recent articles
- Extracts titles, descriptions, and content snippets
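A hedged sketch of NewsAPI retrieval with date filtering; the 14-day window, page size, and field selection are assumptions, and an API key is required.

```python
# Sketch: retrieve recent articles from NewsAPI.org with a date filter.
# Requires an API key; field names follow NewsAPI's documented response format.
from datetime import date, timedelta
import requests

def newsapi_evidence(query: str, api_key: str, days_back: int = 14) -> list[dict]:
    params = {
        "q": query,
        "from": (date.today() - timedelta(days=days_back)).isoformat(),
        "sortBy": "publishedAt",      # prioritize recent articles
        "pageSize": 10,
        "apiKey": api_key,
    }
    articles = requests.get("https://newsapi.org/v2/everything",
                            params=params, timeout=10).json().get("articles", [])
    return [{"title": a.get("title"),
             "description": a.get("description"),
             "content": a.get("content")} for a in articles]
```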
RSS Feed evidence (modules/rss_feed.py):
- Parallel retrieval from multiple RSS feeds
- Category-specific feed selection
- Relevance and recency scoring
ClaimReview evidence:
- Google's Fact Check Tools API integration
- Retrieves fact-checks from fact-checking organizations
- Includes ratings and publisher information
Scholarly evidence:
- OpenAlex API for academic sources
- Extracts titles, abstracts, and publication dates
Category Fallback mechanism:
- For AI claims, RSS feed retrieval falls back to technology sources when evidence is insufficient
- For other categories, falls back to default RSS feeds
- Ensures robust evidence retrieval across related domains
Evidence Preprocessing
Each evidence item is standardized to a consistent format:
Title: [title], Source: [source], Date: [date], URL: [url], Content: [content snippet]
Length limits are applied to reduce token usage:
- Content snippets are limited to ~1000 characters
- Evidence items are truncated while maintaining context
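A minimal sketch of this standardization step, assuming evidence items arrive as dictionaries with the fields shown.

```python
# Sketch: normalize a raw evidence item into the standard string format
# and truncate the content snippet to roughly 1000 characters.
def format_evidence(item: dict, max_content_chars: int = 1000) -> str:
    content = (item.get("content") or "")[:max_content_chars]
    return (f"Title: {item.get('title', 'N/A')}, "
            f"Source: {item.get('source', 'N/A')}, "
            f"Date: {item.get('date', 'N/A')}, "
            f"URL: {item.get('url', 'N/A')}, "
            f"Content: {content}")
```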
Evidence Analysis and Relevance Ranking
Relevance Assessment
Evidence is analyzed and scored for relevance:
Component Extraction:
- Extract entities, verbs, and keywords from the claim
- Use NLP processing to identify key claim components
Entity and Verb Matching:
- Match entities from claim to evidence (case-sensitive and case-insensitive)
- Match verbs from claim to evidence
- Score based on matches (entity matches weighted higher than verb matches)
Temporal Relevance:
- Detection of temporal indicators in claims
- Date-based filtering for time-sensitive claims
- Adjusts evidence retrieval window based on claim temporal context
Scoring Formula:
final_score = (entity_matches * 3.0) + (verb_matches * 2.0)
If no entity or verb matches, fall back to keyword matching:
final_score = keyword_matches * 1.0
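A direct translation of the scoring logic above into code; entity, verb, and keyword extraction are assumed to happen upstream (e.g., with spaCy).

```python
# Sketch: score evidence relevance using the weights described above.
# Entity/verb/keyword extraction is assumed to happen upstream (e.g. with spaCy).
def relevance_score(evidence_text: str, entities: list[str],
                    verbs: list[str], keywords: list[str]) -> float:
    text = evidence_text.lower()
    entity_matches = sum(1 for e in entities if e.lower() in text)
    verb_matches = sum(1 for v in verbs if v.lower() in text)
    if entity_matches or verb_matches:
        return entity_matches * 3.0 + verb_matches * 2.0
    # Fallback: plain keyword matching when no entities or verbs match.
    return sum(1 for k in keywords if k.lower() in text) * 1.0
```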
Evidence Selection
The system selects the most relevant evidence:
Relevance Sorting:
- Evidence items sorted by relevance score (descending)
- Top 10 most relevant items selected
Handling No Evidence:
- If no evidence is found, a placeholder is returned
- Ensures graceful handling of edge cases
Truth Classification
Evidence Classification (modules/classification.py)
Each evidence item is classified individually:
LLM Classification:
- Each evidence item is analyzed by an LLM
- Classification categories: support, contradict, insufficient
- Confidence score (0-100) assigned to each classification
- Structured output parsing with fallback mechanisms
Tense Normalization:
- Normalizes verb tenses in claims to ensure consistent classification
- Converts present simple and perfect forms to past tense equivalents
- Preserves semantic equivalence across tense variations
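A simplified sketch of per-item classification with defensive output parsing; the prompt wording and the llm callable are illustrative, and the real module uses structured output parsers with additional fallbacks.

```python
# Sketch: classify a single evidence item against the claim with an LLM.
# The prompt wording and the `llm` callable are illustrative.
import json

LABELS = {"support", "contradict", "insufficient"}

def classify_evidence(claim: str, evidence: str, llm) -> dict:
    prompt = (
        "Classify whether the evidence supports or contradicts the claim.\n"
        f"Claim: {claim}\nEvidence: {evidence}\n"
        'Reply as JSON: {"label": "support|contradict|insufficient", "confidence": 0-100}'
    )
    raw = llm(prompt)
    try:
        result = json.loads(raw)
        if result.get("label") in LABELS:
            return {"label": result["label"],
                    "confidence": float(result.get("confidence", 0))}
    except (json.JSONDecodeError, TypeError, ValueError):
        pass
    # Fallback when the response cannot be parsed.
    return {"label": "insufficient", "confidence": 0.0}
```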
Verdict Aggregation
Evidence classifications are aggregated to determine the final verdict:
Weighted Aggregation:
- 55% weight for count of support/contradict items
- 45% weight for quality (confidence) of support/contradict items
Confidence Calculation:
- Formula: confidence = 1.0 - (min_score / max_score)
- Higher confidence for consistent evidence
- Lower confidence for mixed or insufficient evidence
Final Verdict Categories:
- "True (Based on Evidence)"
- "False (Based on Evidence)"
- "Uncertain"
Explanation Generation
Explanation Creation (modules/explanation.py)
Human-readable explanations are generated based on the verdict:
Template Selection:
- Different prompts for true, false, and uncertain verdicts
- Special handling for claims containing negation
Confidence Communication:
- Translation of confidence scores to descriptive language
- Clear communication of certainty/uncertainty
Very Low Confidence Handling:
- Special explanations for verdicts with very low confidence (<10%)
- Strong recommendations to verify with authoritative sources
Result Presentation
Results are presented in the Streamlit UI with multiple components:
Verdict Display:
- Color-coded verdict (green for true, red for false, gray for uncertain)
- Confidence percentage
- Explanation text
Evidence Presentation:
- Tabbed interface for different evidence views with URLs if available
- Supporting and contradicting evidence tabs
- Source distribution summary
Input Guidance:
- Tips for claim formatting
- Guidance for time-sensitive claims
- Suggestions for verb tense based on claim age
Processing Insights:
- Processing time
- AI reasoning steps
- Source distribution statistics
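A minimal Streamlit sketch of the color-coded verdict and tabbed evidence display; widget choices and layout are illustrative, not the exact app code.

```python
# Sketch: Streamlit display of the verdict, color-coded as described above.
import streamlit as st

def show_result(verdict: str, confidence: float, explanation: str,
                supporting: list[str], contradicting: list[str]) -> None:
    if verdict.startswith("True"):
        st.success(f"{verdict} (confidence {confidence:.0%})")     # green
    elif verdict.startswith("False"):
        st.error(f"{verdict} (confidence {confidence:.0%})")       # red
    else:
        st.info(f"{verdict} (confidence {confidence:.0%})")        # neutral / uncertain
    st.write(explanation)

    support_tab, contradict_tab = st.tabs(["Supporting evidence", "Contradicting evidence"])
    with support_tab:
        for item in supporting:
            st.markdown(f"- {item}")
    with contradict_tab:
        for item in contradicting:
            st.markdown(f"- {item}")
```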
Data Persistence and Privacy
AskVeracity prioritizes user privacy:
No Data Storage:
- User claims are not stored persistently
- Results are maintained only in session state
- No user data is collected or retained
Session Management:
- Session state in Streamlit manages current user interaction
- Session is cleared when starting a new verification
API Interaction:
- External API calls are subject to the respective providers' privacy policies
- OpenAI API usage follows their data handling practices
Caching:
- Model caching for performance
- Resource cleanup on application termination
Performance Tracking
The system includes a performance tracking utility (utils/performance.py):
Metrics Tracked:
- Claims processed count
- Evidence retrieval success rates
- Processing times
- Confidence scores
- Source types used
- Temporal relevance
Usage:
- Performance metrics are logged during processing
- Summary of select metrics available in the final result
- Used for system optimization
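A minimal sketch of such a tracker; attribute and method names are illustrative rather than those of utils/performance.py.

```python
# Sketch: a minimal performance tracker in the spirit of utils/performance.py.
from collections import Counter

class PerformanceTracker:
    def __init__(self):
        self.claims_processed = 0
        self.processing_times = []
        self.confidence_scores = []
        self.source_counts = Counter()

    def record(self, elapsed: float, confidence: float, sources: list[str]) -> None:
        self.claims_processed += 1
        self.processing_times.append(elapsed)
        self.confidence_scores.append(confidence)
        self.source_counts.update(sources)

    def summary(self) -> dict:
        n = max(self.claims_processed, 1)
        return {
            "claims_processed": self.claims_processed,
            "avg_processing_time": sum(self.processing_times) / n,
            "avg_confidence": sum(self.confidence_scores) / n,
            "source_distribution": dict(self.source_counts),
        }
```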
Performance Evaluation
The system includes a performance evaluation script (evaluate_performance.py):
Test Claims:
- Predefined set of test claims with known ground truth labels
- Claims categorized as "True", "False", or "Uncertain"
Metrics:
- Overall accuracy: Percentage of claims correctly classified according to ground truth
- Safety rate: Percentage of claims either correctly classified or safely categorized as "Uncertain" rather than making an incorrect assertion
- Per-class accuracy and safety rates
- Average processing time
- Average confidence score
- Classification distributions
Visualization:
- Charts for accuracy by classification type
- Charts for safety rate by classification type
- Processing time by classification type
- Confidence scores by classification type
Results Storage:
- Detailed results saved to JSON file
- Visualization charts saved as PNG files
- All results stored in the results/ directory
Error Handling and Resilience
The system implements robust error handling:
API Error Handling (utils/api_utils.py):
- Decorator-based error handling
- Exponential backoff for retries
- Rate limiting respecting API constraints
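A hedged sketch of a decorator-based retry with exponential backoff in the spirit of utils/api_utils.py; retry counts, delays, and the example function are illustrative.

```python
# Sketch: decorator-based retry with exponential backoff (illustrative values).
import functools
import time
import requests

def with_retries(max_attempts: int = 3, base_delay: float = 1.0):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise                                     # give up after the last attempt
                    time.sleep(base_delay * (2 ** attempt))       # exponential backoff
        return wrapper
    return decorator

@with_retries(max_attempts=3)
def fetch_url(url: str) -> str:
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text
```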
Safe JSON Parsing:
- Defensive parsing of API responses
- Fallback mechanisms for invalid responses
Graceful Degradation:
- Multiple fallback strategies
- Core functionality preservation even when some sources fail
Fallback Mechanisms:
- Fallback for truth classification when the classifier is not called
- Fallback for explanation generation when the explanation generator is not called
- Ensures complete results even with partial component failures