ankanghosh committed · verified
Commit c95116f · 1 Parent(s): 5e23c82

Update README.md

Files changed (1)
  1. README.md +174 -111

README.md CHANGED
@@ -1,5 +1,5 @@
  ---
- title: Askveracity
  emoji: 📉
  colorFrom: blue
  colorTo: pink
@@ -11,67 +11,95 @@ license: mit
  short_description: Fact-checking and misinformation detection tool.
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
-
  # AskVeracity: Fact Checking System

- A streamlined web application that analyzes claims to determine their truthfulness through evidence gathering and analysis.

  ## Overview

- This application uses an agentic AI approach to verify factual claims through a combination of NLP techniques and large language models.

  The AI agent:
  1. Uses a ReAct (Reasoning + Acting) methodology to analyze claims
- 2. Dynamically gathers evidence from multiple sources (Wikipedia, News APIs, RSS feeds, fact-checking sites)
- 3. Intelligently decides which tools to use and in what order based on the claim's category
- 4. Classifies the truthfulness of claims using the collected evidence
  5. Provides transparency into its reasoning process
- 6. Generates clear explanations for its verdict with confidence scores
-
- ## Features
-
- - **Claim Extraction**: Identifies and focuses on the primary factual claim
- - **Category Detection**: Determines the claim's category to optimize evidence retrieval
- - **Multi-source Evidence**: Gathers evidence from Wikipedia, news articles, academic sources, and fact-checking sites
- - **Semantic Analysis**: Analyzes evidence relevance using advanced NLP techniques
- - **Transparent Classification**: Provides clear verdicts with confidence scores
- - **Detailed Explanations**: Generates human-readable explanations for verdicts
- - **Interactive UI**: Easy-to-use Streamlit interface with evidence exploration
-
- ## Project Structure

  ```
  askveracity/
  │
- ├── app.py                       # Main Streamlit application
- ├── agent.py                     # LangGraph agent implementation
- ├── config.py                    # Configuration and API keys
- ├── requirements.txt             # Dependencies for the application
- ├── .streamlit/                  # Streamlit configuration
- │   ├── config.toml              # UI theme configuration
- │   └── secrets.toml.example     # Example secrets file (do not commit actual secrets)
- ├── utils/
- │   ├── __init__.py
- │   ├── api_utils.py             # API rate limiting and error handling
- │   ├── performance.py           # Performance tracking utilities
- │   └── models.py                # Model initialization functions
- ├── modules/
- │   ├── __init__.py
- │   ├── claim_extraction.py      # Claim extraction functionality
- │   ├── evidence_retrieval.py    # Evidence gathering from various sources
- │   ├── classification.py        # Truth classification logic
- │   ├── explanation.py           # Explanation generation
- │   ├── rss_feed.py              # RSS feed evidence retrieval
- │   ├── semantic_analysis.py     # Relevance analysis for evidence
- │   └── category_detection.py    # Claim category detection
- ├── data/
- │   └── source_credibility.json  # Source credibility data
- └── tests/
-     ├── __init__.py
-     └── test_claim_extraction.py # Unit tests for claim extraction
  ```

  ## Setup and Installation

  ### Local Development
@@ -87,17 +115,16 @@ askveracity/
  pip install -r requirements.txt
  ```

- 3. Set up your API keys:
-
- You have two options:

  **Option 1: Using Streamlit secrets (recommended for local development)**

- - Copy the example secrets file to create your own:
-   ```
-   cp .streamlit/secrets.toml.example .streamlit/secrets.toml
-   ```
- - Edit `.streamlit/secrets.toml` and add your API keys:
  ```toml
  OPENAI_API_KEY = "your_openai_api_key"
  NEWS_API_KEY = "your_news_api_key"
@@ -106,50 +133,27 @@ askveracity/

  **Option 2: Using environment variables**

- Create a `.env` file in the root directory with the following content:
-   ```
-   OPENAI_API_KEY=your_openai_api_key
-   NEWS_API_KEY=your_news_api_key
-   FACTCHECK_API_KEY=your_factcheck_api_key
-   ```

- 4. When using environment variables, load them:
-
- At the start of your Python script or in your terminal:
- ```python
- # In Python
- from dotenv import load_dotenv
- load_dotenv()
  ```
-
- Or in your terminal before running the app:
- ```bash
- # Unix/Linux/MacOS
- source .env
-
- # Windows
- # Install python-dotenv[cli] and run
- dotenv run streamlit run app.py
  ```

- ### Running the Application
-
- Launch the Streamlit app by running:
- ```
- streamlit run app.py
- ```
-
  ### Deploying to Hugging Face Spaces

- 1. Fork this repository to your GitHub account
- 2. Create a new Space on Hugging Face:
  - Go to https://huggingface.co/spaces
  - Click "Create new Space"
  - Select "Streamlit" as the SDK
- - Choose "From GitHub" as the source
- - Connect to your GitHub repository

- 3. Add the required API keys as secrets:
  - Go to the "Settings" tab of your Space
  - Navigate to the "Repository secrets" section
  - Add the following secrets:
@@ -157,38 +161,97 @@ streamlit run app.py
  - `NEWS_API_KEY`
  - `FACTCHECK_API_KEY`

- 4. Your Space will automatically deploy with the changes

- ## Rate Limiting and API Considerations

- The application implements intelligent rate limiting for API calls to:
- - Wikipedia
- - WikiData
- - News API
- - Google FactCheck Tools
- - RSS feeds

- The system includes exponential backoff for retries and optimized API usage to work within free API tiers. Rate limits can be configured in the `config.py` file.

- ## Best Practices for Claim Verification

- For optimal results with AskVeracity:
- - Keep claims short and precise
- - Include key details in your claim
- - Phrase claims as direct statements rather than questions
- - Be specific about who said what, when relevant

- ## Development Notes

- ### UI Differences Between Environments

- When developing locally versus deploying to Hugging Face Spaces, you may notice visual differences in certain UI elements:

- - **Button styling**: Buttons may appear in different colors (blue/purple locally vs. coral/orange on HF Spaces)
- - This is due to Hugging Face Spaces applying its own theme based on the `colorFrom` and `colorTo` values in the configuration

- These differences are cosmetic only and don't affect functionality. We've chosen to maintain the default Hugging Face styling for the deployed version.

  ## License

- This project is licensed under the [MIT License](./LICENSE), allowing free use, modification, and distribution with proper attribution.

  ---
+ title: AskVeracity
  emoji: 📉
  colorFrom: blue
  colorTo: pink

  short_description: Fact-checking and misinformation detection tool.
  ---

  # AskVeracity: Fact Checking System

+ [![Open in Spaces](https://img.shields.io/badge/🤗-Open%20in%20Spaces-blue.svg)](https://huggingface.co/spaces/ankanghosh/askveracity)
+ [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+
+ A streamlined web application that analyzes claims to determine their truthfulness through evidence gathering and analysis, supporting efforts in misinformation detection.
+
+ <p align="center">
+   <img src="docs/assets/app_screenshot.png" alt="Application Screenshot" width="800"/>
+ </p>

  ## Overview

+ AskVeracity is an agentic AI system that verifies factual claims through a combination of NLP techniques and large language models. The system gathers and analyzes evidence from multiple sources to provide transparent and explainable verdicts.

  The AI agent:
  1. Uses a ReAct (Reasoning + Acting) methodology to analyze claims
+ 2. Dynamically gathers evidence from multiple sources, prioritized by claim category
+ 3. Applies semantic analysis to determine evidence relevance
+ 4. Classifies the truthfulness of claims with confidence scores
  5. Provides transparency into its reasoning process
+ 6. Generates clear explanations for its verdict
+
+ ## Key Features
+
+ - **Intelligent Claim Extraction:** Extracts and focuses on the primary factual claim
+ - **Category Detection:** Automatically identifies claim categories for optimized evidence retrieval
+ - **Multi-source Evidence Gathering:** Collects evidence from:
+   - Wikipedia and Wikidata
+   - News articles
+   - Academic sources via OpenAlex
+   - Fact-checking websites
+   - Category-specific RSS feeds
+ - **Enhanced Entity Matching:** Uses improved entity and verb matching for accurate evidence relevance assessment
+ - **Category-Specific Fallbacks:** Ensures robust evidence retrieval with domain-appropriate fallbacks
+ - **Transparent Classification:** Provides clear verdicts with confidence scores
+ - **Safety-First Classification:** Prioritizes avoiding incorrect assertions when evidence is insufficient
+ - **Detailed Explanations:** Generates human-readable explanations for verdicts
+ - **Interactive UI:** Easy-to-use Streamlit interface with evidence exploration options
+ - **Claim Formatting Guidance:** Helps users format claims optimally for better results
+
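+ The "Enhanced Entity Matching" feature above can be pictured with a short, hypothetical sketch. It assumes the spaCy `en_core_web_sm` model from the setup steps; the function name and weights below are illustrative, not the project's actual code:
+
+ ```python
+ # Illustrative entity/verb overlap scoring; the real logic lives in the
+ # project's modules and may differ substantially.
+ import spacy
+
+ nlp = spacy.load("en_core_web_sm")  # model installed during setup
+
+ def relevance_score(claim: str, evidence: str) -> float:
+     """Score evidence against a claim by shared entities and verbs (assumed weights)."""
+     claim_doc, evidence_doc = nlp(claim), nlp(evidence)
+     claim_ents = {ent.text.lower() for ent in claim_doc.ents}
+     evidence_ents = {ent.text.lower() for ent in evidence_doc.ents}
+     claim_verbs = {tok.lemma_ for tok in claim_doc if tok.pos_ == "VERB"}
+     evidence_verbs = {tok.lemma_ for tok in evidence_doc if tok.pos_ == "VERB"}
+     # Entities weighted above verbs (assumption); higher scores mean more relevant
+     return 2.0 * len(claim_ents & evidence_ents) + 1.0 * len(claim_verbs & evidence_verbs)
+ ```
+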
+ ## System Architecture
+
+ AskVeracity is built with a modular architecture:

  ```
  askveracity/
  │
+ ├── agent.py                     # LangGraph agent implementation
+ ├── app.py                       # Main Streamlit application
+ ├── config.py                    # Configuration and API keys
+ ├── evaluate_performance.py      # Performance evaluation script
+ │
+ ├── modules/                     # Core functionality modules
+ │   ├── claim_extraction.py      # Claim extraction functionality
+ │   ├── evidence_retrieval.py    # Evidence gathering from various sources
+ │   ├── classification.py        # Truth classification logic
+ │   ├── explanation.py           # Explanation generation
+ │   ├── rss_feed.py              # RSS feed evidence retrieval
+ │   └── category_detection.py    # Claim category detection
+ │
+ ├── utils/                       # Utility functions
+ │   ├── api_utils.py             # API rate limiting and error handling
+ │   ├── performance.py           # Performance tracking utilities
+ │   └── models.py                # Model initialization functions
+ │
+ ├── results/                     # Performance evaluation results
+ │   ├── performance_results.json # Evaluation metrics
+ │   └── *.png                    # Performance visualization charts
+ │
+ └── docs/                        # Documentation
+     ├── assets/                  # Images and other media
+     │   └── app_screenshot.png   # Application screenshot
+     ├── architecture.md          # System design and component interactions
+     ├── configuration.md         # Setup and environment configuration
+     ├── data-handling.md         # Data processing and flow
+     └── changelog.md             # Version history
  ```

+ ## Claim Verification Process
+
+ 1. **Claim Extraction:** The system extracts the main factual claim from user input
+ 2. **Category Detection:** The claim is categorized (AI, science, technology, politics, business, world, sports, entertainment)
+ 3. **Evidence Retrieval:** Evidence is gathered from multiple sources with category-specific prioritization
+ 4. **Evidence Analysis:** Evidence relevance is assessed using entity and verb matching
+ 5. **Classification:** A weighted evaluation determines the verdict with confidence score
+ 6. **Explanation Generation:** A human-readable explanation is generated
+ 7. **Result Presentation:** Results are presented with detailed evidence exploration options
+
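+ Under the hood, these steps map onto the repository's modules. As a rough, hypothetical sketch of how they could be wired together (the function names below are illustrative, not the project's actual APIs):
+
+ ```python
+ # Illustrative pipeline only; the actual functions in modules/*.py and the
+ # LangGraph agent in agent.py may be organized quite differently.
+ from modules import claim_extraction, category_detection, evidence_retrieval
+ from modules import classification, explanation
+
+ def verify(user_input: str) -> dict:
+     claim = claim_extraction.extract_claim(user_input)                   # step 1 (assumed name)
+     category = category_detection.detect_category(claim)                 # step 2 (assumed name)
+     evidence = evidence_retrieval.gather_evidence(claim, category)       # steps 3-4 (assumed name)
+     verdict, confidence = classification.classify(claim, evidence)       # step 5 (assumed name)
+     summary = explanation.explain(claim, verdict, confidence, evidence)  # step 6 (assumed name)
+     return {"verdict": verdict, "confidence": confidence, "explanation": summary}
+ ```
+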
  ## Setup and Installation

  ### Local Development

  pip install -r requirements.txt
  ```

+ 3. Download the required spaCy model:
+    ```
+    python -m spacy download en_core_web_sm
+    ```

+ 4. Set up your API keys:
+
  **Option 1: Using Streamlit secrets (recommended for local development)**

+ - Create a `.streamlit/secrets.toml` file with your API keys:
  ```toml
  OPENAI_API_KEY = "your_openai_api_key"
  NEWS_API_KEY = "your_news_api_key"

  **Option 2: Using environment variables**

+ - Set environment variables directly or create a `.env` file:
+   ```
+   OPENAI_API_KEY=your_openai_api_key
+   NEWS_API_KEY=your_news_api_key
+   FACTCHECK_API_KEY=your_factcheck_api_key
+   ```

+ 5. Run the application:
  ```
+ streamlit run app.py
  ```
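+
+    If you use a `.env` file, make sure its values are loaded into the environment before the app starts. A minimal sketch using the python-dotenv package (an assumption here; it is not necessarily listed in `requirements.txt`):
+    ```python
+    # Hypothetical: load variables from .env into the process environment
+    # before the app reads its API keys (requires python-dotenv).
+    from dotenv import load_dotenv
+    load_dotenv()
+    ```
+    Alternatively, with `python-dotenv[cli]` installed, `dotenv run -- streamlit run app.py` launches the app with the `.env` values applied.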

  ### Deploying to Hugging Face Spaces

+ 1. Create a new Space on Hugging Face:
  - Go to https://huggingface.co/spaces
  - Click "Create new Space"
  - Select "Streamlit" as the SDK
+ - Choose the hardware tier (recommended: 16GB RAM)

+ 2. Add the required API keys as secrets:
  - Go to the "Settings" tab of your Space
  - Navigate to the "Repository secrets" section
  - Add the following secrets:

  - `NEWS_API_KEY`
  - `FACTCHECK_API_KEY`

+ 3. Push your code to the Hugging Face repository or upload files directly through the web interface
+
+ ## Configuration Options
+
+ The system includes several configuration options in `config.py`:
+
+ 1. **API Rate Limits:** Controls request rates to external APIs
+    ```python
+    RATE_LIMITS = {
+        "newsapi": {"requests": 100, "period": 3600},      # 100 requests per hour
+        "factcheck": {"requests": 1000, "period": 86400},  # 1000 requests per day
+        # Other API limits...
+    }
+    ```
+
+ 2. **Error Handling:** Configures retry behavior for API errors
+    ```python
+    ERROR_BACKOFF = {
+        "max_retries": 5,
+        "initial_backoff": 1,  # seconds
+        "backoff_factor": 2,   # exponential backoff
+    }
+    ```
+
+ 3. **RSS Feed Settings:** Customizes RSS feed handling
+    ```python
+    RSS_SETTINGS = {
+        "max_feeds_per_request": 10,
+        "max_age_days": 3,
+        "timeout_seconds": 5,
+        "max_workers": 5
+    }
+    ```
+
+ 4. **Category-Specific RSS Feeds:** Defined in `modules/category_detection.py` for optimized evidence retrieval
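+
+ Together, `RATE_LIMITS` and `ERROR_BACKOFF` drive the throttling and retry behavior in `utils/api_utils.py`. A simplified sketch of what exponential-backoff retries might look like (illustrative only, not the actual implementation):
+
+ ```python
+ # Illustrative retry helper built on the ERROR_BACKOFF values above;
+ # the real logic lives in utils/api_utils.py and may differ.
+ import time
+ import requests
+
+ from config import ERROR_BACKOFF  # assumed import path
+
+ def fetch_with_backoff(url, params=None, backoff=ERROR_BACKOFF):
+     delay = backoff["initial_backoff"]
+     for attempt in range(backoff["max_retries"]):
+         try:
+             response = requests.get(url, params=params, timeout=10)
+             response.raise_for_status()
+             return response.json()
+         except requests.RequestException:
+             if attempt == backoff["max_retries"] - 1:
+                 raise
+             time.sleep(delay)
+             delay *= backoff["backoff_factor"]  # exponential backoff
+ ```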

+ ## Performance Evaluation and Development

+ The system includes a performance evaluation script that tests the fact-checking capabilities using predefined claims:

+ ```bash
+ python evaluate_performance.py [--limit N] [--output FILE]
+ ```
+
+ The evaluation measures:
+ - **Accuracy:** How often the system correctly classifies claims
+ - **Safety Rate:** How often the system avoids making incorrect assertions
+ - **Processing Time:** Average time to process claims
+ - **Confidence Scores:** Average confidence in verdicts

+ Detailed results and visualizations are saved to the `results/` directory. These results are not tracked in the repository because they vary with:
+ - The evolving nature of available evidence
+ - News sources constantly updating and deprioritizing older content
+ - Changes in the recency and relevance of test claims

+ Developers should update the claims in `evaluate_performance.py` to use fresh, relevant examples and rerun the evaluation script to generate current performance metrics. This keeps performance evaluations relevant in a rapidly changing information landscape.

+ ## Recent Improvements

+ - **Safety Rate Metric:** Added a metric that measures how often the system avoids making incorrect assertions
+ - **Refined Relevance Scoring:** Weighted scoring based on entity and verb matching, with a keyword fallback, for more accurate evidence relevance assessment during classification
+ - **Enhanced Evidence Relevance:** Improved entity and verb matching with weighted scoring prioritization, and increased evidence gathering from 5 to 10 items
+ - **Streamlined Architecture:** Removed source credibility and semantic analysis complexity for improved maintainability
+ - **Category-Specific Fallbacks:** AI claims fall back to technology sources; other categories fall back to default RSS feeds
+ - **OpenAlex Integration:** Replaced Semantic Scholar with OpenAlex for academic evidence
+ - **Improved User Experience:** Enhanced claim processing and result presentation
+ - **Better Robustness:** Improved handling of specialized topics and novel terms
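+
+ As a small, hypothetical illustration of the category-specific fallback behavior (the real feed mapping is defined in `modules/category_detection.py`; the names and URLs below are placeholders):
+
+ ```python
+ # Placeholder feed mapping; see modules/category_detection.py for the real one.
+ CATEGORY_RSS_FEEDS = {
+     "technology": ["https://example.com/technology.xml"],
+     "science": ["https://example.com/science.xml"],
+     # ... other categories ...
+ }
+ DEFAULT_RSS_FEEDS = ["https://example.com/top-stories.xml"]
+
+ def fallback_feeds(category: str) -> list[str]:
+     """Used when category-specific retrieval returns too little evidence."""
+     if category == "ai":
+         # AI claims fall back to technology sources
+         return CATEGORY_RSS_FEEDS.get("technology", DEFAULT_RSS_FEEDS)
+     # Other categories fall back to the default RSS feeds
+     return DEFAULT_RSS_FEEDS
+ ```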
231
 
232
+ ## Limitations
233
 
234
+ AskVeracity has several limitations to be aware of:
 
235
 
236
+ - Performance is best for widely-reported news and information published within the last 48 hours
237
+ - The system evaluates claims based on current evidence - claims that were true in the past may be judged differently if circumstances have changed
238
+ - Technical or very specialized claims may receive "Uncertain" verdicts if insufficient evidence is found
239
+ - Non-English claims have limited support
240
+ - The system is designed to indicate uncertainty when evidence is insufficient
241
+ - Results can vary based on available evidence and LLM behavior
242
 
243
  ## License
244
 
245
+ This project is licensed under the [MIT License](./LICENSE), allowing free use, modification, and distribution with proper attribution.
246
+
247
+ ## Blog and Additional Resources
248
+ Read our detailed blog post about the project: [AskVeracity: An Agentic Fact-Checking System for Misinformation Detection](https://researchguy.in/anveshak-spirituality-qa-bridging-faith-and-intelligence/)
249
+
250
+ ## Acknowledgements
251
+ - Built with [LangGraph](https://github.com/langchain-ai/langgraph) and [Streamlit](https://streamlit.io/)
252
+ - Uses OpenAI's API for language model capabilities
253
+ - Leverages open data sources including Wikipedia, Wikidata, and various RSS feeds
254
+
255
+ ## Contact
256
+
257
+ For questions, feedback, or suggestions, please contact us at [email protected].