# Data Handling in AskVeracity

This document explains how data flows through the AskVeracity fact-checking and misinformation detection system, from user input to final verification results.

## Data Flow Overview

```
User Input β†’ Claim Extraction β†’ Category Detection β†’ Evidence Retrieval β†’ Evidence Analysis β†’ Classification β†’ Explanation β†’ Result Display
```

## User Input Processing

### Input Sanitization and Extraction

1. **Input Acceptance:** The system accepts user input as free-form text through the Streamlit interface.

2. **Claim Extraction** (`modules/claim_extraction.py`):
   - For concise inputs (<30 words), the system preserves the input as-is
   - For longer texts, an LLM extracts the main factual claim
   - Validation ensures the extraction doesn't introduce information absent from the original
   - Entity preservation is verified using spaCy's NER (see the sketch after this list)

3. **Claim Shortening:**
   - For evidence retrieval, claims are shortened to preserve key entities and context
   - Preserves entity mentions, key nouns, titles, country references, and negation contexts
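
The sketch below illustrates the length-based extraction decision and the entity-preservation check described in step 2. The function names and the exact validation logic are illustrative, not the actual `modules/claim_extraction.py` API; only the 30-word threshold comes from the text above.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed
WORD_LIMIT = 30  # concise inputs below this are preserved as-is

def entities_preserved(original: str, extracted: str) -> bool:
    """Check that the extraction introduces no entities absent from the original."""
    original_ents = {ent.text.lower() for ent in nlp(original).ents}
    extracted_ents = {ent.text.lower() for ent in nlp(extracted).ents}
    return extracted_ents <= original_ents

def extract_claim(user_input: str, llm_extract) -> str:
    text = user_input.strip()
    if len(text.split()) < WORD_LIMIT:
        return text                    # concise input: preserve as-is
    candidate = llm_extract(text)      # LLM extracts the main factual claim
    if entities_preserved(text, candidate):
        return candidate
    return text                        # reject extractions that add information
```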

## Evidence Retrieval and Processing

### Multi-source Evidence Gathering

Evidence is collected from multiple sources in parallel (`modules/evidence_retrieval.py`), as sketched after the list below:

1. **Category Detection** (`modules/category_detection.py`):
   - Detects the claim category (ai, science, technology, politics, business, world, sports, entertainment)
   - Prioritizes sources based on category
   - No category receives preferential weighting; assignment is based purely on keyword matching

2. **Wikipedia** evidence:
   - Searches the Wikipedia API for relevant articles
   - Extracts introductory paragraphs
   - Processes up to the top 3 search results in parallel

3. **Wikidata** evidence:
   - SPARQL queries for structured data
   - Entity extraction with descriptions

4. **News API** evidence:
   - Retrieval from NewsAPI.org with date filtering
   - Prioritizes recent articles
   - Extracts titles, descriptions, and content snippets

5. **RSS Feed** evidence (`modules/rss_feed.py`):
   - Parallel retrieval from multiple RSS feeds
   - Category-specific feed selection
   - Relevance and recency scoring

6. **ClaimReview** evidence:
   - Google's Fact Check Tools API integration
   - Retrieves fact-checks from fact-checking organizations
   - Includes ratings and publisher information

7. **Scholarly** evidence:
   - OpenAlex API for academic sources
   - Extracts titles, abstracts, and publication dates

8. **Category Fallback** mechanism:
   - For AI claims, falls back to technology RSS feeds when evidence is insufficient
   - For other categories, falls back to the default RSS feeds
   - Ensures robust evidence retrieval across related domains
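
As a rough illustration of the parallel gathering, the hypothetical `gather_evidence` below fans a claim out to per-source retriever functions. The retriever names and error policy are assumptions, not the actual `modules/evidence_retrieval.py` implementation.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def gather_evidence(claim: str, retrievers: dict) -> list:
    """Run each source-specific retriever concurrently and pool the results."""
    evidence = []
    with ThreadPoolExecutor(max_workers=max(len(retrievers), 1)) as pool:
        futures = [pool.submit(fn, claim) for fn in retrievers.values()]
        for future in as_completed(futures):
            try:
                evidence.extend(future.result())
            except Exception:
                # A failing source must not abort retrieval as a whole
                # (see "Graceful Degradation" below).
                continue
    return evidence

# Hypothetical wiring:
# evidence = gather_evidence(claim, {"wikipedia": get_wikipedia_evidence,
#                                    "news": get_news_evidence})
```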

### Evidence Preprocessing

Each evidence item is standardized to a consistent format:
```
Title: [title], Source: [source], Date: [date], URL: [url], Content: [content snippet]
```

Length limits are applied to reduce token usage:
- Content snippets are limited to ~1000 characters
- Evidence items are truncated while maintaining context
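
A minimal sketch of this standardization step, assuming evidence items arrive as dictionaries; the field names mirror the format above, but the function itself is illustrative:

```python
MAX_SNIPPET_CHARS = 1000  # ~1,000-character limit noted above

def format_evidence(item: dict) -> str:
    """Render one evidence item in the standardized single-line format."""
    snippet = (item.get("content") or "")[:MAX_SNIPPET_CHARS]
    return (f"Title: {item.get('title', '')}, Source: {item.get('source', '')}, "
            f"Date: {item.get('date', '')}, URL: {item.get('url', '')}, "
            f"Content: {snippet}")
```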

## Evidence Analysis and Relevance Ranking

### Relevance Assessment

Evidence is analyzed and scored for relevance:

1. **Component Extraction:**
   - Extract entities, verbs, and keywords from the claim
   - Use NLP processing to identify key claim components

2. **Entity and Verb Matching:**
   - Match entities from the claim to the evidence (both case-sensitive and case-insensitive)
   - Match verbs from the claim to the evidence
   - Score based on matches, with entity matches weighted higher than verb matches

3. **Temporal Relevance:**
   - Detection of temporal indicators in claims
   - Date-based filtering for time-sensitive claims
   - Adjusts evidence retrieval window based on claim temporal context

4. **Scoring Formula:**
   ```
   final_score = (entity_matches * 3.0) + (verb_matches * 2.0)
   ```
   If no entity or verb matches, fall back to keyword matching:
   ```
   final_score = keyword_matches * 1.0
   ```
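
Expressed as code, the scoring formula above might look like the following sketch; the match counting is assumed to happen upstream and is not shown:

```python
def relevance_score(entity_matches: int, verb_matches: int,
                    keyword_matches: int) -> float:
    """Apply the weighted formula, falling back to keywords when needed."""
    if entity_matches or verb_matches:
        return entity_matches * 3.0 + verb_matches * 2.0
    return keyword_matches * 1.0  # fallback when no entity/verb matches
```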

### Evidence Selection

The system selects the most relevant evidence:

1. **Relevance Sorting:**
   - Evidence items sorted by relevance score (descending)
   - Top 10 most relevant items selected

2. **Handling No Evidence:**
   - If no evidence is found, a placeholder is returned
   - Ensures graceful handling of edge cases
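
A minimal sketch of this selection logic, including the no-evidence placeholder; the placeholder text is hypothetical:

```python
TOP_K = 10

def select_evidence(scored: list) -> list:
    """scored: (score, evidence_text) pairs; returns the top items by score."""
    if not scored:
        return ["No relevant evidence found."]  # hypothetical placeholder text
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:TOP_K]]
```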

## Truth Classification

### Evidence Classification (`modules/classification.py`)

Each evidence item is classified individually:

1. **LLM Classification:**
   - Each evidence item is analyzed by an LLM
   - Classification categories: support, contradict, insufficient
   - A confidence score (0-100) is assigned to each classification
   - Structured output parsing with fallback mechanisms (sketched after this list)

2. **Tense Normalization:**
   - Normalizes verb tenses in claims to ensure consistent classification
   - Converts present simple and perfect forms to past tense equivalents
   - Preserves semantic equivalence across tense variations
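
The sketch below shows one way the structured-output parsing with a fallback could work. The expected JSON shape and field names are assumptions, not the system's actual response format.

```python
import json
import re

LABELS = {"support", "contradict", "insufficient"}

def parse_classification(raw: str) -> tuple:
    """Parse an LLM response into a (label, confidence) pair, defensively."""
    try:
        data = json.loads(raw)
        label = str(data.get("label", "insufficient")).lower()
        confidence = int(data.get("confidence", 0))
    except (json.JSONDecodeError, TypeError, ValueError, AttributeError):
        # Fallback: scan free text for a known label and a 0-100 number.
        label_match = re.search(r"support|contradict|insufficient", raw.lower())
        score_match = re.search(r"\b(\d{1,3})\b", raw)
        label = label_match.group(0) if label_match else "insufficient"
        confidence = int(score_match.group(1)) if score_match else 0
    if label not in LABELS:
        label = "insufficient"
    return label, max(0, min(confidence, 100))
```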

### Verdict Aggregation

Evidence classifications are aggregated to determine the final verdict, as sketched after the list below:

1. **Weighted Aggregation:**
   - 55% weight for count of support/contradict items
   - 45% weight for quality (confidence) of support/contradict items

2. **Confidence Calculation:**
   - Formula: `1.0 - (min_score / max_score)`
   - Higher confidence for consistent evidence
   - Lower confidence for mixed or insufficient evidence

3. **Final Verdict Categories:**
   - "True (Based on Evidence)"
   - "False (Based on Evidence)"
   - "Uncertain"

## Explanation Generation

### Explanation Creation (`modules/explanation.py`)

Human-readable explanations are generated based on the verdict:

1. **Template Selection:**
   - Different prompts for true, false, and uncertain verdicts
   - Special handling for claims containing negation

2. **Confidence Communication:**
   - Translation of confidence scores to descriptive language
   - Clear communication of certainty/uncertainty

3. **Very Low Confidence Handling:**
   - Special explanations for verdicts with very low confidence (<10%)
   - Strong recommendations to verify with authoritative sources
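
A small sketch of how confidence might be translated into descriptive language, covering items 2 and 3 above. Only the <10% "very low confidence" threshold comes from the text; the other band is an assumption.

```python
def describe_confidence(confidence: float) -> str:
    """Translate a 0-1 confidence score into descriptive language."""
    if confidence < 0.10:
        # Very low confidence: strongly recommend independent verification.
        return ("This verdict has very low confidence; please verify the claim "
                "with authoritative sources.")
    if confidence < 0.50:  # assumed band, not from the source
        return "The evidence is mixed, so this verdict is tentative."
    return "The evidence is fairly consistent, supporting this verdict."
```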

## Result Presentation

Results are presented in the Streamlit UI with multiple components:

1. **Verdict Display:**
   - Color-coded verdict (green for true, red for false, gray for uncertain)
   - Confidence percentage
   - Explanation text

2. **Evidence Presentation:**
   - Tabbed interface for different evidence views, with source URLs where available
   - Supporting and contradicting evidence tabs
   - Source distribution summary

3. **Input Guidance:**
   - Tips for claim formatting
   - Guidance for time-sensitive claims
   - Suggestions for verb tense based on claim age

4. **Processing Insights:**
   - Processing time
   - AI reasoning steps
   - Source distribution statistics

## Data Persistence and Privacy

AskVeracity prioritizes user privacy:

1. **No Data Storage:**
   - User claims are not stored persistently
   - Results are maintained only in session state
   - No user data is collected or retained

2. **Session Management:**
   - Session state in Streamlit manages current user interaction
   - Session is cleared when starting a new verification

3. **API Interaction:**
   - External API calls are subject to the respective providers' privacy policies
   - OpenAI API usage follows OpenAI's data handling practices

4. **Caching:**
   - Model caching for performance
   - Resource cleanup on application termination

## Performance Tracking

The system includes a performance tracking utility (`utils/performance.py`):

1. **Metrics Tracked:**
   - Claims processed count
   - Evidence retrieval success rates
   - Processing times
   - Confidence scores
   - Source types used
   - Temporal relevance

2. **Usage:**
   - Performance metrics are logged during processing
   - Summary of select metrics available in the final result
   - Used for system optimization
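
A minimal in-memory tracker sketch covering a few of the metrics listed above; the actual `utils/performance.py` interface may differ.

```python
import time

class PerformanceTracker:
    """Illustrative metrics tracker; not the actual utility's API."""
    def __init__(self):
        self.claims_processed = 0
        self.processing_times = []
        self.confidence_scores = []
        self.source_types = {}

    def record(self, started_at: float, confidence: float, sources: list):
        self.claims_processed += 1
        self.processing_times.append(time.time() - started_at)
        self.confidence_scores.append(confidence)
        for source in sources:
            self.source_types[source] = self.source_types.get(source, 0) + 1

    def summary(self) -> dict:
        n = max(self.claims_processed, 1)
        return {
            "claims_processed": self.claims_processed,
            "avg_processing_time": sum(self.processing_times) / n,
            "avg_confidence": sum(self.confidence_scores) / n,
            "source_types": dict(self.source_types),
        }
```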

## Performance Evaluation

The system includes a performance evaluation script (`evaluate_performance.py`):

1. **Test Claims:**
   - Predefined set of test claims with known ground truth labels
   - Claims categorized as "True", "False", or "Uncertain"

2. **Metrics:**
   - Overall accuracy: Percentage of claims correctly classified according to ground truth
   - Safety rate: Percentage of claims either correctly classified or safely categorized as "Uncertain" rather than asserted incorrectly (computed in the sketch after this list)
   - Per-class accuracy and safety rates
   - Average processing time
   - Average confidence score
   - Classification distributions

3. **Visualization:**
   - Charts for accuracy by classification type
   - Charts for safety rate by classification type
   - Processing time by classification type
   - Confidence scores by classification type

4. **Results Storage:**
   - Detailed results saved to JSON file
   - Visualization charts saved as PNG files
   - All results stored in the `results/` directory
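
The accuracy and safety-rate definitions in item 2 translate directly into code; this sketch assumes verdicts arrive as plain label strings:

```python
def evaluate(predictions: list, ground_truth: list) -> dict:
    """Compute overall accuracy and safety rate over paired label lists."""
    total = max(len(ground_truth), 1)
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    # A prediction is "safe" if correct, or if it avoids an incorrect
    # assertion by falling back to "Uncertain".
    safe = sum(p == t or p == "Uncertain"
               for p, t in zip(predictions, ground_truth))
    return {"accuracy": correct / total, "safety_rate": safe / total}
```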

## Error Handling and Resilience

The system implements robust error handling:

1. **API Error Handling** (`utils/api_utils.py`):
   - Decorator-based error handling (sketched after this list)
   - Exponential backoff for retries
   - Rate limiting that respects API constraints

2. **Safe JSON Parsing:**
   - Defensive parsing of API responses
   - Fallback mechanisms for invalid responses

3. **Graceful Degradation:**
   - Multiple fallback strategies
   - Core functionality preservation even when some sources fail

4. **Fallback Mechanisms:**
   - Fallback truth classification when the classifier is not invoked
   - Fallback explanation generation when the explanation generator is not invoked
   - Ensures complete results even when individual components fail
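
The sketch below illustrates decorator-based retries with exponential backoff and defensive JSON parsing in the spirit of items 1 and 2; the names, retry counts, and delays are assumptions, not the actual `utils/api_utils.py` API.

```python
import functools
import json
import time

def with_retries(max_attempts: int = 3, base_delay: float = 1.0):
    """Decorator: retry the wrapped call with exponential backoff."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts - 1:
                        raise  # out of retries: surface the error
                    time.sleep(base_delay * (2 ** attempt))  # exponential backoff
        return wrapper
    return decorator

def safe_json_parse(raw: str, default=None):
    """Return parsed JSON, or a default instead of raising on bad input."""
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return default
```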