# Data Handling in AskVeracity

This document explains how data flows through the AskVeracity fact-checking and misinformation detection system, from user input to final verification results.

## Data Flow Overview

```
User Input → Claim Extraction → Category Detection → Evidence Retrieval → Evidence Analysis → Classification → Explanation → Result Display
```

## User Input Processing

### Input Sanitization and Extraction

1. **Input Acceptance:** The system accepts user input as free-form text through the Streamlit interface.
2. **Claim Extraction** (`modules/claim_extraction.py`):
   - For concise inputs (fewer than 30 words), the system preserves the input as-is
   - For longer texts, an LLM extracts the main factual claim
   - Validation ensures the extraction does not add information that is absent from the original input
   - Entity preservation is verified using spaCy's NER
3. **Claim Shortening** (see the sketch after this list):
   - For evidence retrieval, claims are shortened while preserving key entities and context
   - Preserves entity mentions, key nouns, titles, country references, and negation contexts
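
A minimal sketch of the shortening step in item 3, using spaCy to retain entities, key nouns, and negations; the actual heuristics in `modules/claim_extraction.py` are more involved, and the function below is an illustrative simplification:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def shorten_claim(claim: str, max_tokens: int = 30) -> str:
    """Trim a long claim while keeping entity mentions, key nouns/verbs, and negations."""
    doc = nlp(claim)
    if len(doc) <= max_tokens:
        return claim  # concise claims are preserved as-is
    keep = set()
    for ent in doc.ents:                      # preserve entity mentions
        keep.update(tok.i for tok in ent)
    for tok in doc:
        if tok.pos_ in {"NOUN", "PROPN", "VERB"} or tok.dep_ == "neg":
            keep.add(tok.i)                   # preserve key nouns, verbs, negation markers
    return " ".join(tok.text for tok in doc if tok.i in keep)
```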

## Evidence Retrieval and Processing

### Multi-source Evidence Gathering

Evidence is collected from multiple sources in parallel (`modules/evidence_retrieval.py`):

1. **Category Detection** (`modules/category_detection.py`):
   - Detects the claim category (ai, science, technology, politics, business, world, sports, entertainment)
   - Prioritizes sources based on the detected category
   - No category receives preferential weighting; assignment is based purely on keyword matching
2. **Wikipedia** evidence:
   - Searches the Wikipedia API for relevant articles
   - Extracts introductory paragraphs
   - Processes up to the top 3 search results in parallel
3. **Wikidata** evidence:
   - SPARQL queries for structured data
   - Entity extraction with descriptions
4. **News API** evidence:
   - Retrieval from NewsAPI.org with date filtering
   - Prioritizes recent articles
   - Extracts titles, descriptions, and content snippets
5. **RSS Feed** evidence (`modules/rss_feed.py`):
   - Parallel retrieval from multiple RSS feeds
   - Category-specific feed selection
   - Relevance and recency scoring
6. **ClaimReview** evidence:
   - Google's Fact Check Tools API integration
   - Retrieves fact-checks from fact-checking organizations
   - Includes ratings and publisher information
7. **Scholarly** evidence:
   - OpenAlex API for academic sources
   - Extracts titles, abstracts, and publication dates
8. **Category Fallback** mechanism:
   - For AI claims, RSS retrieval falls back to technology feeds when evidence is insufficient
   - For other categories, falls back to the default RSS feeds
   - Ensures robust evidence retrieval across related domains
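
As an illustration of the parallel gathering described above, here is a minimal sketch using Python's `concurrent.futures`. The retriever callables passed in are hypothetical stand-ins for the source-specific functions in `modules/evidence_retrieval.py`, not the project's actual signatures:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def gather_evidence(claim: str, retrievers: dict) -> list:
    """Run all source-specific retrievers in parallel and merge their results.

    `retrievers` maps a source name (e.g. "wikipedia", "news_api") to a callable
    that takes the claim and returns a list of evidence items.
    """
    evidence = []
    with ThreadPoolExecutor(max_workers=len(retrievers)) as pool:
        futures = {pool.submit(fn, claim): name for name, fn in retrievers.items()}
        for future in as_completed(futures):
            source = futures[future]
            try:
                evidence.extend(future.result())
            except Exception as exc:
                # A failing source should not abort the whole retrieval step
                print(f"{source} retrieval failed: {exc}")
    return evidence
```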

### Evidence Preprocessing

Each evidence item is standardized to a consistent format:

```
Title: [title], Source: [source], Date: [date], URL: [url], Content: [content snippet]
```

Length limits are applied to reduce token usage:

- Content snippets are limited to ~1000 characters
- Evidence items are truncated while maintaining context
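
A minimal sketch of this standardization step, assuming an evidence item is a plain dictionary; the field names are illustrative rather than the project's exact schema:

```python
MAX_SNIPPET_CHARS = 1000  # approximate limit described above

def format_evidence_item(item: dict) -> str:
    """Render one evidence item in the standardized string format."""
    snippet = (item.get("content") or "")[:MAX_SNIPPET_CHARS]
    return (
        f"Title: {item.get('title', '')}, "
        f"Source: {item.get('source', '')}, "
        f"Date: {item.get('date', '')}, "
        f"URL: {item.get('url', '')}, "
        f"Content: {snippet}"
    )
```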

## Evidence Analysis and Relevance Ranking

### Relevance Assessment

Evidence is analyzed and scored for relevance:

1. **Component Extraction:**
   - Extracts entities, verbs, and keywords from the claim
   - Uses NLP processing to identify key claim components
2. **Entity and Verb Matching:**
   - Matches entities from the claim to the evidence (case-sensitive and case-insensitive)
   - Matches verbs from the claim to the evidence
   - Scores based on matches (entity matches are weighted higher than verb matches)
3. **Temporal Relevance:**
   - Detects temporal indicators in claims
   - Applies date-based filtering for time-sensitive claims
   - Adjusts the evidence retrieval window based on the claim's temporal context
4. **Scoring Formula:**

   ```
   final_score = (entity_matches * 3.0) + (verb_matches * 2.0)
   ```

   If there are no entity or verb matches, the system falls back to keyword matching:

   ```
   final_score = keyword_matches * 1.0
   ```
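
A minimal sketch of this scoring logic using the weights given above; the match-counting helper shown here is a simplified substring check rather than the project's actual NLP matching:

```python
def count_matches(terms: list[str], text: str) -> int:
    """Count how many of the given terms appear in the evidence text."""
    lowered = text.lower()
    return sum(1 for term in terms if term.lower() in lowered)

def relevance_score(entities: list[str], verbs: list[str],
                    keywords: list[str], evidence_text: str) -> float:
    """Score evidence relevance: entities weigh 3.0, verbs 2.0, keywords 1.0 (fallback)."""
    entity_matches = count_matches(entities, evidence_text)
    verb_matches = count_matches(verbs, evidence_text)
    if entity_matches or verb_matches:
        return entity_matches * 3.0 + verb_matches * 2.0
    # Fallback: keyword matching only
    return count_matches(keywords, evidence_text) * 1.0
```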

### Evidence Selection

The system selects the most relevant evidence:

1. **Relevance Sorting:**
   - Evidence items are sorted by relevance score (descending)
   - The top 10 most relevant items are selected
2. **Handling No Evidence:**
   - If no evidence is found, a placeholder is returned
   - Ensures graceful handling of edge cases
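
A brief sketch of the selection step under the same simplified representation, with scored items as `(score, item)` pairs; the placeholder text is illustrative:

```python
def select_top_evidence(scored_items: list[tuple[float, str]], limit: int = 10) -> list[str]:
    """Keep the `limit` highest-scoring evidence items, or fall back to a placeholder."""
    if not scored_items:
        return ["No relevant evidence found for this claim."]  # placeholder for edge cases
    ranked = sorted(scored_items, key=lambda pair: pair[0], reverse=True)
    return [item for _, item in ranked[:limit]]
```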

## Truth Classification

### Evidence Classification (`modules/classification.py`)

Each evidence item is classified individually:

1. **LLM Classification:**
   - Each evidence item is analyzed by an LLM
   - Classification categories: support, contradict, insufficient
   - A confidence score (0-100) is assigned to each classification
   - Structured output parsing with fallback mechanisms
2. **Tense Normalization:**
   - Normalizes verb tenses in claims to ensure consistent classification
   - Converts present simple and perfect forms to past-tense equivalents
   - Preserves semantic equivalence across tense variations
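
A minimal sketch of per-item classification with defensive output parsing; `call_llm` is a hypothetical stand-in for the module's LLM client, and the prompt and regex are illustrative:

```python
import re

LABELS = {"support", "contradict", "insufficient"}

def classify_evidence(claim: str, evidence: str, call_llm) -> tuple[str, int]:
    """Ask the LLM for a label and a 0-100 confidence, with a safe fallback."""
    prompt = (
        f"Claim: {claim}\nEvidence: {evidence}\n"
        "Answer in the form: <label> (support/contradict/insufficient), confidence: <0-100>"
    )
    reply = call_llm(prompt)
    match = re.search(r"(support|contradict|insufficient).*?(\d{1,3})",
                      reply, re.IGNORECASE | re.DOTALL)
    if not match:
        # Fallback when the output cannot be parsed as structured text
        return "insufficient", 0
    label = match.group(1).lower()
    confidence = min(int(match.group(2)), 100)
    return (label if label in LABELS else "insufficient"), confidence
```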

### Verdict Aggregation

Evidence classifications are aggregated to determine the final verdict:

1. **Weighted Aggregation:**
   - 55% weight for the count of support/contradict items
   - 45% weight for the quality (confidence) of support/contradict items
2. **Confidence Calculation:**
   - Formula: `1.0 - (min_score / max_score)`
   - Higher confidence for consistent evidence
   - Lower confidence for mixed or insufficient evidence
3. **Final Verdict Categories:**
   - "True (Based on Evidence)"
   - "False (Based on Evidence)"
   - "Uncertain"

## Explanation Generation

### Explanation Creation (`modules/explanation.py`)

Human-readable explanations are generated based on the verdict:

1. **Template Selection:**
   - Different prompts for true, false, and uncertain verdicts
   - Special handling for claims containing negation
2. **Confidence Communication:**
   - Translation of confidence scores into descriptive language
   - Clear communication of certainty/uncertainty
3. **Very Low Confidence Handling:**
   - Special explanations for verdicts with very low confidence (<10%)
   - Strong recommendations to verify with authoritative sources
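
A brief sketch of how verdict-dependent prompt selection and low-confidence handling might look; the template text is illustrative and not taken from `modules/explanation.py`:

```python
PROMPTS = {
    "True (Based on Evidence)": "Explain why the evidence supports the claim: {claim}",
    "False (Based on Evidence)": "Explain why the evidence contradicts the claim: {claim}",
    "Uncertain": "Explain why the evidence is inconclusive for the claim: {claim}",
}

def build_explanation_prompt(verdict: str, claim: str, confidence: float) -> str:
    """Pick a verdict-specific template and flag very low confidence."""
    prompt = PROMPTS.get(verdict, PROMPTS["Uncertain"]).format(claim=claim)
    if confidence < 0.10:
        # Very low confidence: steer users toward authoritative sources
        prompt += " Emphasize the low confidence and recommend verifying with authoritative sources."
    return prompt
```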

## Result Presentation

Results are presented in the Streamlit UI with multiple components:

1. **Verdict Display:**
   - Color-coded verdict (green for true, red for false, gray for uncertain)
   - Confidence percentage
   - Explanation text
2. **Evidence Presentation:**
   - Tabbed interface for different evidence views, with URLs where available
   - Supporting and contradicting evidence tabs
   - Source distribution summary
3. **Input Guidance:**
   - Tips for claim formatting
   - Guidance for time-sensitive claims
   - Suggestions for verb tense based on claim age
4. **Processing Insights:**
   - Processing time
   - AI reasoning steps
   - Source distribution statistics
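
A minimal Streamlit sketch of the color-coded verdict display and evidence tabs; the layout and parameter names are illustrative, not the app's actual code:

```python
import streamlit as st

def render_result(verdict: str, confidence: float, explanation: str,
                  supporting: list[str], contradicting: list[str]) -> None:
    """Show a color-coded verdict, confidence, explanation, and evidence tabs."""
    if verdict.startswith("True"):
        st.success(f"{verdict} (confidence {confidence:.0%})")
    elif verdict.startswith("False"):
        st.error(f"{verdict} (confidence {confidence:.0%})")
    else:
        st.info(f"{verdict} (confidence {confidence:.0%})")

    st.write(explanation)

    support_tab, contradict_tab = st.tabs(["Supporting evidence", "Contradicting evidence"])
    with support_tab:
        for item in supporting:
            st.markdown(f"- {item}")
    with contradict_tab:
        for item in contradicting:
            st.markdown(f"- {item}")
```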

## Data Persistence and Privacy

AskVeracity prioritizes user privacy:

1. **No Data Storage:**
   - User claims are not stored persistently
   - Results are maintained only in session state
   - No user data is collected or retained
2. **Session Management:**
   - Streamlit session state manages the current user interaction
   - The session is cleared when a new verification starts (see the sketch after this list)
3. **API Interaction:**
   - External API calls are governed by the respective providers' privacy policies
   - OpenAI API usage follows OpenAI's data handling practices
4. **Caching:**
   - Model caching for performance
   - Resource cleanup on application termination
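
A minimal sketch of the session-handling pattern described in item 2, using Streamlit's `st.session_state`; the key names are hypothetical:

```python
import streamlit as st

def reset_session() -> None:
    """Clear per-claim state so a new verification starts from a clean slate."""
    for key in ("claim", "evidence", "verdict", "explanation"):
        st.session_state.pop(key, None)

# Example: clear previous results when the user submits a new claim
if st.button("Verify new claim"):
    reset_session()
```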

## Performance Tracking

The system includes a performance tracking utility (`utils/performance.py`):

1. **Metrics Tracked:**
   - Claims processed count
   - Evidence retrieval success rates
   - Processing times
   - Confidence scores
   - Source types used
   - Temporal relevance
2. **Usage:**
   - Performance metrics are logged during processing
   - A summary of selected metrics is included in the final result
   - Used for system optimization
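
A minimal sketch of a tracker along the lines of `utils/performance.py`; the class and method names here are hypothetical:

```python
import time
from collections import defaultdict

class PerformanceTracker:
    """Accumulate simple timing and outcome metrics for processed claims."""

    def __init__(self) -> None:
        self.claims_processed = 0
        self.processing_times: list[float] = []
        self.confidence_scores: list[float] = []
        self.source_counts: dict[str, int] = defaultdict(int)

    def record_claim(self, started_at: float, confidence: float, sources: list[str]) -> None:
        """Log one processed claim: elapsed time, confidence, and source types used."""
        self.claims_processed += 1
        self.processing_times.append(time.time() - started_at)
        self.confidence_scores.append(confidence)
        for source in sources:
            self.source_counts[source] += 1

    def summary(self) -> dict:
        """Return the aggregate metrics reported with the final result."""
        average = lambda xs: sum(xs) / len(xs) if xs else 0.0
        return {
            "claims_processed": self.claims_processed,
            "avg_processing_time": average(self.processing_times),
            "avg_confidence": average(self.confidence_scores),
            "source_distribution": dict(self.source_counts),
        }
```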

## Performance Evaluation

The system includes a performance evaluation script (`evaluate_performance.py`):

1. **Test Claims:**
   - A predefined set of test claims with known ground-truth labels
   - Claims categorized as "True", "False", or "Uncertain"
2. **Metrics:**
   - Overall accuracy: the percentage of claims classified correctly according to the ground truth
   - Safety rate: the percentage of claims either classified correctly or safely categorized as "Uncertain" rather than asserted incorrectly
   - Per-class accuracy and safety rates
   - Average processing time
   - Average confidence score
   - Classification distributions
3. **Visualization:**
   - Charts for accuracy by classification type
   - Charts for safety rate by classification type
   - Processing time by classification type
   - Confidence scores by classification type
4. **Results Storage:**
   - Detailed results saved to a JSON file
   - Visualization charts saved as PNG files
   - All results stored in the `results/` directory
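
A minimal sketch of the accuracy and safety-rate computation as defined above; `results` is assumed to be a list of `(predicted, ground_truth)` label pairs:

```python
def evaluate(results: list[tuple[str, str]]) -> dict:
    """Compute overall accuracy and safety rate from (predicted, ground_truth) pairs."""
    total = len(results)
    if total == 0:
        return {"accuracy": 0.0, "safety_rate": 0.0}
    correct = sum(1 for predicted, truth in results if predicted == truth)
    # "Safe" means correct, or a cautious "Uncertain" instead of a wrong assertion
    safe = sum(1 for predicted, truth in results
               if predicted == truth or predicted == "Uncertain")
    return {"accuracy": correct / total, "safety_rate": safe / total}
```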

## Error Handling and Resilience

The system implements robust error handling:

1. **API Error Handling** (`utils/api_utils.py`):
   - Decorator-based error handling
   - Exponential backoff for retries
   - Rate limiting that respects API constraints
2. **Safe JSON Parsing:**
   - Defensive parsing of API responses
   - Fallback mechanisms for invalid responses
3. **Graceful Degradation:**
   - Multiple fallback strategies
   - Core functionality is preserved even when some sources fail
4. **Fallback Mechanisms:**
   - Fallback truth classification when the classifier is not invoked
   - Fallback explanation generation when the explanation generator is not invoked
   - Ensures complete results even with partial component failures
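
A minimal sketch of a decorator combining error handling with exponential backoff, in the spirit of `utils/api_utils.py`; the decorator name and parameters are hypothetical:

```python
import functools
import time

def with_retries(max_attempts: int = 3, base_delay: float = 1.0):
    """Retry a flaky API call with exponential backoff between attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of retries: surface the error to the caller
                    time.sleep(base_delay * (2 ** attempt))  # exponential backoff
        return wrapper
    return decorator

@with_retries(max_attempts=3)
def fetch_news(query: str) -> dict:
    ...  # hypothetical API call wrapped with retry behavior
```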