giuseppericcio committed on
Commit f708b4d · 1 Parent(s): a7b9b89

Config app

Files changed (3)
  1. .gitattributes +1 -0
  2. .gitignore +2 -0
  3. README.md +85 -2
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *data/* filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,2 @@
+ .venv/
+ .streamlit/
README.md CHANGED
@@ -4,11 +4,94 @@ emoji: πŸƒ
  colorFrom: yellow
  colorTo: red
  sdk: streamlit
- sdk_version: 1.44.1
+ sdk_version: 1.42.0
  app_file: app.py
  pinned: false
  license: cc-by-nc-4.0
  short_description: '🩺🔍 CER Demo: Fact-Checking Biomedical Claims.'
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # 🩺 CER Demo: *Fact-Checking Biomedical Claims*
+
+ Welcome to the demo of the *CER (Combining Evidence and Reasoning)* system for fact-checking biomedical claims. This tool combines PubMed, one of the leading biomedical knowledge bases, with Large Language Models (LLMs) to verify the accuracy of claims, generate justifications, and provide reliable classifications.
+
+ ## 🎥 Demo (or GIF)
+ [Watch our demo]() to see how CER supports biomedical fact-checking and enhances the transparency of scientific recommendations!
+
+ ## 📊 Data Sources
+ We use the following data sources for training and evaluating the system:
+
+ - **[PubMed](https://pubmed.ncbi.nlm.nih.gov/)**: A biomedical database containing over 20 million abstracts.
+ - **HealthFC**: 750 biomedical claims curated by *Vladika et al. (2024)*.
+ - **BioASQ-7b**: 745 claims from the BioASQ Challenge (*Nentidis et al., 2020*).
+ - **SciFact**: 1.4k expert-annotated scientific claims (*Wadden et al., 2020*).
+
+ ## 🛠 Technologies Used
+ - **Python**: Core programming language.
+ - **FAISS Indexing**: Efficient similarity search over biomedical abstracts.
+ - [**Meta-Llama-3.1-405B-Instruct**](https://huggingface.co/hugging-quants/Meta-Llama-3.1-405B-Instruct-AWQ-INT4) (AWQ INT4-quantized): Language model for generating justifications.
+ - **PubMedBERT**: Classifier for claim evaluation.
+ - **Streamlit**: Interactive user interface.
+
+ The system is designed to run both on lightweight setups (Intel i7 CPU, 16 GB RAM) and on GPU-equipped environments (e.g., NVIDIA Tesla T4) for heavier workloads on large datasets.
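The README lists FAISS for retrieving abstracts but does not say which index type or embeddings are used. As an illustration only, the pure-Python sketch below computes what an exact, flat inner-product index (the kind FAISS exposes as `IndexFlatIP`) returns: the k stored vectors with the highest dot product against a query vector. The `flat_ip_search` helper and the 3-dimensional toy "abstract embeddings" are hypothetical; FAISS performs the same exhaustive search in optimized native code, and its approximate index types trade some exactness for speed.

```python
import heapq
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def flat_ip_search(
    index: Sequence[Sequence[float]], query: Sequence[float], k: int
) -> list[tuple[int, float]]:
    """Score every stored vector exhaustively (no approximation) and return
    the top-k (id, score) pairs, highest inner product first."""
    scored = ((i, dot(vec, query)) for i, vec in enumerate(index))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical 3-dimensional "abstract embeddings" (ids 0, 1, 2).
abstracts = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.7, 0.0, 0.7],
]
query = [1.0, 0.0, 0.5]
print(flat_ip_search(abstracts, query, k=2))
```

With this toy data, abstract 2 (score about 1.05) ranks first and abstract 0 (score 0.9) second.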
+
+ ## 🔬 Methodological Workflow
+ CER follows a structured workflow in three main phases:
+
+ 1. **Evidence Retrieval**: Relevant abstracts are retrieved from PubMed using a BM25 retrieval engine.
+ 2. **Justification Generation**: The LLM generates explanations based on the retrieved abstracts.
+ 3. **Claim Classification**: The classifier labels each claim as true, false, or "not enough evidence."
+
+ ![Methodology](./Methodology.png)
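The Evidence Retrieval phase names BM25 but not its parameters or preprocessing. A minimal Okapi BM25 scorer is sketched below in pure Python, assuming lower-cased alphanumeric tokens, the common defaults k1 = 1.5 and b = 0.75 (Lucene uses k1 = 1.2), and the non-negative IDF variant ln(1 + (N - df + 0.5) / (df + 0.5)). The `BM25` class and the three toy abstracts are illustrative and are not taken from the CER code base.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep alphanumeric runs (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Minimal Okapi BM25 index over an in-memory list of abstracts."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(d) for d in docs]
        self.tf = [Counter(toks) for toks in self.docs]  # term counts per abstract
        self.doc_len = [len(toks) for toks in self.docs]
        self.avgdl = sum(self.doc_len) / len(self.docs)
        n = len(self.docs)
        # Document frequency: number of abstracts containing each term.
        df = Counter(term for toks in self.docs for term in set(toks))
        # Non-negative IDF variant: ln(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {
            term: math.log(1 + (n - freq + 0.5) / (freq + 0.5))
            for term, freq in df.items()
        }

    def score(self, query: str, i: int) -> float:
        """BM25 score of abstract i for the query."""
        # Length normalisation: longer-than-average abstracts are penalised.
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[i] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f:
                total += self.idf[term] * f * (self.k1 + 1) / (f + norm)
        return total

    def top_k(self, query: str, k: int = 3) -> list[tuple[int, float]]:
        """(index, score) pairs of the k best-matching abstracts, best first."""
        scores = [(i, self.score(query, i)) for i in range(len(self.docs))]
        return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


abstracts = [
    "Vitamin D supplementation and fracture risk in older adults with osteoporosis.",
    "Statin therapy lowers LDL cholesterol in patients with hypercholesterolemia.",
    "Calcium and vitamin D intake in postmenopausal women: bone density outcomes.",
]
bm25 = BM25(abstracts)
for idx, score in bm25.top_k("Vitamin D reduces the risk of osteoporosis.", k=2):
    print(idx, round(score, 3))
```

For the example claim, abstract 0 (vitamin D, risk, and osteoporosis all match) outranks abstract 2 (only vitamin D matches), while the statin abstract scores zero.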
+
+ ## 🌟 Key Features
+ - **Zero-Shot and Fine-Tuned Classification**: Provides reliable fact-checking without the need for extensive task-specific labeled data.
+ - **Robustness Across Datasets**: Fine-tuning improves model performance even when the training and test sets come from different datasets.
+ - **Efficient Retrieval**: Uses a sparse (BM25) retriever for fast and accurate evidence extraction from PubMed.
+ - **Transparency**: Generates a justification for each claim's classification, making every verdict interpretable.
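The README names three verdicts but not how the classifier's raw outputs become one of them. A common approach for a three-way classification head is a numerically stable softmax followed by an arg-max; the sketch below assumes the model emits one logit per label in the hypothetical order (true, false, not enough evidence), and the logits and helper names are made up for illustration.

```python
import math

# Hypothetical label order for a three-way classification head.
LABELS = ("true", "false", "not enough evidence")


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verdict(logits: list[float]) -> tuple[str, float]:
    """Return the most probable label and its probability."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, p = verdict([2.1, -0.3, 0.4])  # made-up logits
print(label, round(p, 3))
```

Reporting the probability alongside the label lets the interface show how confident the classifier is, independently of the generated justification.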
+
+ ## 🚀 Getting Started
+ Follow these steps to run the CER demo locally:
+
+ ### Prerequisites
+ - **Python 3.9+**
+ - Required libraries: install them (preferably inside the virtual environment described under "Running the Application") with:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Running the Application
+ 1. **Clone the repository**:
+ ```bash
+ git clone https://github.com/picuslab/CER-Fact-Checking.git
+ cd CER-Fact-Checking
+ ```
+ 2. **Create and activate a virtual environment** (named `.venv`, which the repository's `.gitignore` excludes):
+ ```bash
+ python -m venv .venv
+ source .venv/bin/activate  # On Windows use `.venv\Scripts\activate`
+ ```
+ 3. **Run the Streamlit application**:
+ ```bash
+ streamlit run app.py
+ ```
+ Open your browser and go to `http://localhost:8501` to interact with the application.
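When the launch is scripted (for example in CI), it can help to wait until the server from step 3 actually answers before running checks against it. A minimal standard-library sketch, assuming Streamlit's default port 8501 and treating an HTTP 200 from the root URL as "ready"; `wait_for_app` and its timeout values are hypothetical names chosen for this example.

```python
import time
import urllib.error
import urllib.request


def wait_for_app(url: str = "http://localhost:8501", timeout: float = 30.0,
                 interval: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet: connection refused, timeout, HTTP error, ...
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Start `streamlit run app.py` in the background, then call `wait_for_app()`; it returns `False` if nothing answers within 30 seconds.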
+
+ ### Submitting Claims
+ Enter a biomedical claim, for example:
+ ```
+ "Vitamin D reduces the risk of osteoporosis."
+ ```
+ The app then displays each stage in turn: evidence retrieval, justification generation, and classification.
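The three stages the app displays can be pictured as a small pipeline in which each stage consumes only the previous stage's output. The sketch below is illustrative only: `retrieve`, `justify`, and `classify` are hypothetical stand-ins for the PubMed retriever, the LLM, and the PubMedBERT classifier, passed in as plain functions so the data flow is explicit. Feeding the claim plus the generated justification to the classifier is an assumption; the README does not specify the classifier's inputs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FactCheckResult:
    claim: str
    evidence: list[str]
    justification: str
    label: str


def fact_check(
    claim: str,
    retrieve: Callable[[str, int], list[str]],
    justify: Callable[[str, list[str]], str],
    classify: Callable[[str, str], str],
    k: int = 3,
) -> FactCheckResult:
    """Run retrieval -> justification -> classification for one claim."""
    evidence = retrieve(claim, k)              # phase 1: top-k abstracts
    justification = justify(claim, evidence)   # phase 2: explanation text
    label = classify(claim, justification)     # phase 3: verdict (assumed inputs)
    return FactCheckResult(claim, evidence, justification, label)


# Toy stand-ins so the sketch runs end to end (all hypothetical).
CORPUS = [
    "Vitamin D supplementation reduced fracture risk in osteoporosis.",
    "Statins lower LDL cholesterol.",
]


def toy_retrieve(claim: str, k: int) -> list[str]:
    """Rank corpus entries by naive word overlap with the claim."""
    words = set(claim.lower().split())
    ranked = sorted(CORPUS, key=lambda doc: -len(words & set(doc.lower().split())))
    return ranked[:k]


def toy_justify(claim: str, evidence: list[str]) -> str:
    return f"Based on {len(evidence)} abstract(s): {evidence[0]}"


def toy_classify(claim: str, justification: str) -> str:
    return "true" if "reduced" in justification else "not enough evidence"


result = fact_check("Vitamin D reduces the risk of osteoporosis.",
                    toy_retrieve, toy_justify, toy_classify, k=1)
print(result.label, "|", result.justification)
```

Replacing a stand-in with a real component (for instance a BM25 retriever) leaves `fact_check` unchanged, which keeps the explanation step separate from the prediction step.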
+
+ ## 📈 Conclusions
+ CER demonstrates how combining LLMs with evidence retrieval can improve the reliability of medical fact-checking. Fine-tuning LLMs proves to be a powerful strategy for improving fact-checking accuracy, even across different datasets. Separating prediction from explanation improves transparency and helps reduce bias.
+
+ ## ⚖ Ethical Considerations
+ **CER** is a decision-support tool, not a substitute for professional medical advice. All recommendations must be validated by authorized healthcare providers. This demo uses anonymized data for illustrative purposes.
+
+ ## 🙏 Acknowledgments
+ Special thanks to the dataset creators, library developers, and the research team for their contributions to this project.
+
+ 👨‍💻 This project was developed by Mariano Barone, Antonio Romano, Giuseppe Riccio, Marco Postiglione, and Vincenzo Moscato at the *University of Naples Federico II*.