Spaces: mabil/NORUS2


mabil committed 17 days ago

Commit

af53f00

1 Parent(s): 0767e59

Uploaded NORUS app files

Files changed (20)

.DS_Store +0 -0
Dockerfile +27 -0
README.md +81 -8
app.py +216 -0
app_setup.sh +6 -0
index.html +0 -19
models/.DS_Store +0 -0
models/__pycache__/similarity_model.cpython-313.pyc +0 -0
models/similarity_model.py +32 -0
requirements.txt +14 -0
start_local.sh +8 -0
static/.DS_Store +0 -0
static/css/.DS_Store +0 -0
static/css/style.css +221 -0
static/js/.DS_Store +0 -0
static/js/script.js +102 -0
style.css +0 -28
templates/.DS_Store +0 -0
templates/NORUS.html +180 -0
templates/app.py +134 -0

.DS_Store ADDED

Binary file (10.2 kB).

Dockerfile ADDED

	@@ -0,0 +1,27 @@

+FROM python:3.9
+# 1. Create a non-root user (required by Hugging Face)
+RUN useradd -m -u 1000 user
+USER user
+ENV PATH="/home/user/.local/bin:$PATH"
+# 2. Set the working directory
+WORKDIR /app
+# 3. Copy requirements and install packages
+COPY --chown=user requirements.txt .
+RUN pip install --no-cache-dir --upgrade -r requirements.txt
+# 4. Copy and run the NLTK setup script
+COPY --chown=user app_setup.sh .
+RUN chmod +x app_setup.sh && ./app_setup.sh
+# 5. Copy the rest of the app
+COPY --chown=user . .
+# 6. Set the NLTK data path
+ENV NLTK_DATA="/home/user/nltk_data"
+# 7. Start the app
+CMD ["python", "app.py"]

README.md CHANGED

@@ -1,11 +1,84 @@
 ---
-title: NORUS2
-emoji: 🏆
-colorFrom: gray
-colorTo: red
-sdk: static
-pinned: false
-short_description: Neural ORiginality Understanding System
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Norus Tool
+emoji: 🔥
+colorFrom: green
+colorTo: purple
+sdk: docker
+app_file: app.py
+pinned: true
 ---
+# NORUS Tool 🧠📄
+[![🤗 Hugging Face Spaces](https://img.shields.io/badge/🤗-HuggingFace_Spaces-blue)](https://huggingface.co/spaces/mabil/norus-tool)
+**NORUS** (Novelty and Originality Recognition Utility System) is an AI-based tool for the semantic analysis of scientific articles in PDF format. It compares them against local articles or articles published on PubMed. The tool reports **semantic similarity**, **token overlap**, and a composite index called **OUI (Originality & Uniqueness Index)**.
+## 🚀 Main features
+- ✅ Upload a PDF to analyze
+- 📂 Compare against local PDFs or PubMed articles
+- 🤖 Extract semantic embeddings with SciBERT
+- 📊 Compute:
+  - Semantic similarity (cosine similarity)
+  - Textual overlap (token overlap)
+  - OUI index (originality and novelty)
+- 📈 Interactive visualization of the results with Chart.js
+## 🧪 OUI - Originality & Uniqueness Index
+```math
+OUI = 1 - (α × semantic_similarity + β × token_overlap)
+```
+- α = 0.7 → penalizes semantic similarity
+- β = 0.3 → penalizes literal repetition
+- The OUI measures **how original a document is**, in both content and form.
+## 🧱 Architecture
+- `Flask` as the web backend
+- `pdfplumber` for extracting text from PDFs
+- `nltk` for linguistic preprocessing
+- `sentence-transformers` with the `allenai/scibert_scivocab_uncased` model
+- `requests` for interfacing with PubMed
+## 📂 Project structure
+```
+.
+├── app.py
+├── Dockerfile
+├── requirements.txt
+├── static/
+├── templates/
+├── uploads/
+└── README.md
+```
+## ▶️ Running locally
+To run locally:
+1. Make sure you have Python 3.9+
+2. Install the dependencies:
+```bash
+pip install -r requirements.txt
+```
+3. Start the app:
+```bash
+python app.py
+```
+Open your browser at `http://localhost:7860`
+## 📡 Deploying on Hugging Face Spaces
+You can upload this project as a Docker-based Space on Hugging Face. The `Dockerfile` is already configured.
+---
+🧠 Developed by Marina Bilotta – Computational Chemistry & AI Research
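
The OUI formula in the README takes similarity and overlap as fractions in [0, 1], while `calculate_oui` in `app.py` works with percentages. The following is a minimal sketch, assuming α + β = 1, of how the two forms agree. The function names `oui_readme` and `oui_percent` and the sample inputs are hypothetical, and the clamp to [0, 100] follows `templates/app.py` rather than `app.py`.

```python
def oui_readme(similarity: float, token_overlap: float,
               alpha: float = 0.7, beta: float = 0.3) -> float:
    """README form: OUI = 1 - (alpha * sim + beta * overlap), inputs in [0, 1]."""
    return 1.0 - (alpha * similarity + beta * token_overlap)


def oui_percent(similarity_pct: float, overlap_pct: float,
                alpha: float = 0.7, beta: float = 0.3) -> float:
    """Percent form: alpha * (1 - sim/100) + beta * (1 - overlap/100), scaled to 0-100."""
    raw = alpha * (1 - similarity_pct / 100) + beta * (1 - overlap_pct / 100)
    # Clamp so that a negative cosine similarity cannot push the index above 100.
    return round(max(0.0, min(raw * 100, 100.0)), 2)


if __name__ == "__main__":
    # Hypothetical inputs: 80% semantic similarity, 40% token overlap.
    fraction_form = oui_readme(0.80, 0.40)
    percent_form = oui_percent(80.0, 40.0)
    print(f"README form: {fraction_form:.2f}, percent form: {percent_form:.2f}")
```

Because α + β = 1, `alpha * (1 - s) + beta * (1 - t)` expands to `1 - (alpha * s + beta * t)`, so the two functions differ only in scale and rounding.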

app.py ADDED

	@@ -0,0 +1,216 @@

+import os
+import requests
+import pdfplumber
+from flask import Flask, render_template, request, redirect, url_for, flash, send_file
+from werkzeug.utils import secure_filename
+from sentence_transformers import SentenceTransformer, util
+from transformers import AutoTokenizer
+from fpdf import FPDF
+from collections import Counter
+from io import BytesIO
+tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
+app = Flask(__name__)
+app.secret_key = os.environ.get("SECRET_KEY", "NORUS_secretkey_05")
+app.config["UPLOAD_FOLDER"] = "uploads"
+os.makedirs(app.config["UPLOAD_FOLDER"], exist_ok=True)
+model = SentenceTransformer("allenai/scibert_scivocab_uncased")
+last_results = []
+last_common_keywords = []
+def extract_pdf_text(pdf_path):
+    text = ""
+    try:
+        with pdfplumber.open(pdf_path) as pdf:
+            for page in pdf.pages:
+                text += page.extract_text() or " "
+    except Exception as e:
+        print(f"Text extraction error: {e}")
+    return text.lower().strip()
+def preprocess_text(text):
+    tokens = tokenizer.tokenize(text.lower())
+    tokens = [token for token in tokens if len(token) > 3 and token.isalpha()]
+    return tokens
+def calculate_token_overlap(text1, text2):
+    tokens1 = set(text1.split())
+    tokens2 = set(text2.split())
+    overlap = len(tokens1 & tokens2)
+    return round((overlap / max(len(tokens1), 1)) * 100, 2)
+def calculate_oui(similarity, token_overlap, alpha=0.7, beta=0.3):
+    oui = alpha * (1 - similarity / 100) + beta * (1 - token_overlap / 100)
+    result = round(oui * 100, 2)
+    return 0.0 if result == -0.0 else result
+def validate_document(pdf_path, comparison_sources, method="local", titles=None):
+    pdf_text = extract_pdf_text(pdf_path)
+    pdf_tokens = preprocess_text(pdf_text)
+    results = []
+    all_keywords = []
+    # Encode the uploaded PDF once instead of once per comparison document
+    pdf_embedding = model.encode(pdf_text, convert_to_tensor=True)
+    for i, doc in enumerate(comparison_sources):
+        doc_text = extract_pdf_text(doc) if method == "local" else doc
+        doc_tokens = preprocess_text(doc_text)
+        similarity = util.pytorch_cos_sim(
+            pdf_embedding,
+            model.encode(doc_text, convert_to_tensor=True)
+        ).item() * 100
+        token_overlap = calculate_token_overlap(" ".join(pdf_tokens), " ".join(doc_tokens))
+        oui = calculate_oui(similarity, token_overlap)
+        title = titles[i] if titles and i < len(titles) else os.path.basename(doc) if method == "local" else "Unknown Title"
+        common_keywords = list(set(pdf_tokens) & set(doc_tokens))[:5]
+        all_keywords.extend(common_keywords)
+        results.append({
+            "title": title,
+            "similarity": round(similarity, 2),
+            "token_overlap": round(token_overlap, 2),
+            "oui": round(oui, 2)
+        })
+    global last_results, last_common_keywords
+    last_results = results
+    last_common_keywords = Counter(all_keywords).most_common(10)
+    return results
+def fetch_pubmed_details(article_id):
+    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
+    params = {"db": "pubmed", "id": article_id, "retmode": "xml"}
+    try:
+        response = requests.get(base_url, params=params, timeout=30)
+        response.raise_for_status()
+        import xml.etree.ElementTree as ET
+        root = ET.fromstring(response.text)
+        title_elem = root.find(".//ArticleTitle")
+        abstract_elem = root.find(".//AbstractText")
+        # itertext() also collects text inside nested markup such as <i> or <sub>
+        title = "".join(title_elem.itertext()).strip() if title_elem is not None else "No Title"
+        abstract = "".join(abstract_elem.itertext()).strip() if abstract_elem is not None else "No Abstract"
+        keywords = root.findall(".//Keyword")
+        keyword_text = " ".join([kw.text for kw in keywords if kw.text]) if keywords else ""
+        return title, f"{abstract} {keyword_text}"
+    except Exception as e:
+        print(f"Error fetching abstract: {e}")
+        return "No Title", "No Abstract"
+def fetch_pubmed(query, year_start, year_end, max_results=10):
+    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
+    params = {
+        "db": "pubmed",
+        "term": f"{query} AND ({year_start}[PDAT] : {year_end}[PDAT])",
+        "retmax": max_results,
+        "retmode": "json",
+        "sort": "relevance"  # sort results by relevance
+    }
+    try:
+        response = requests.get(base_url, params=params, timeout=30)
+        response.raise_for_status()
+        id_list = response.json().get("esearchresult", {}).get("idlist", [])
+        return id_list
+    except Exception as e:
+        print(f"PubMed fetch error: {e}")
+        return []
+@app.route("/")
+def index():
+    return render_template("NORUS.html")
+@app.route("/validate", methods=["POST"])
+def validate():
+    pdf_file = request.files.get("pdf_file")
+    analysis_type = request.form.get("analysis_type")
+    query = request.form.get("query", "").strip()
+    if not pdf_file:
+        flash("Please upload a valid PDF file.", "error")
+        return redirect(url_for("index"))
+    filename = secure_filename(pdf_file.filename)
+    pdf_path = os.path.join(app.config["UPLOAD_FOLDER"], filename)
+    pdf_file.save(pdf_path)
+    if analysis_type == "local":
+        comparison_files = request.files.getlist("comparison_files")
+        saved_paths = []
+        for file in comparison_files:
+            if file and file.filename.lower().endswith(".pdf"):
+                fname = secure_filename(file.filename)
+                path = os.path.join(app.config["UPLOAD_FOLDER"], fname)
+                file.save(path)
+                saved_paths.append(path)
+        if not saved_paths:
+            flash("No comparison files were uploaded.", "error")
+            return redirect(url_for("index"))
+        results = validate_document(pdf_path, saved_paths, method="local")
+    else:
+        year_start = request.form.get("year_start", "2000")
+        year_end = request.form.get("year_end", "2025")
+        num_articles = int(request.form.get("num_articles", "10"))
+        pubmed_ids = fetch_pubmed(query, year_start, year_end, num_articles)
+        if not pubmed_ids:
+            flash("No PubMed articles found for this search.", "error")
+            return redirect(url_for("index"))
+        pubmed_results = [fetch_pubmed_details(id_) for id_ in pubmed_ids]
+        pubmed_texts = [r[1] for r in pubmed_results]
+        pubmed_titles = [r[0] for r in pubmed_results]
+        results = validate_document(pdf_path, pubmed_texts, method="pubmed", titles=pubmed_titles)
+    return render_template("NORUS.html", results=results, keywords=last_common_keywords)
+@app.route("/download_report", methods=["POST"])
+def download_report():
+    if not last_results:
+        flash("No results to export.", "error")
+        return redirect(url_for("index"))
+    pdf = FPDF()
+    pdf.add_page()
+    pdf.set_font("Arial", "B", 16)
+    pdf.cell(0, 10, "NORUS Tool - Analysis Report", ln=True, align="C")
+    pdf.ln(10)
+    pdf.set_font('Arial', '', 12)
+    pdf.multi_cell(0, 10, "OUI index = alpha(1 - sim/100) + beta(1 - overlap/100), with alpha = 0.7 and beta = 0.3.\nLower OUI values indicate greater semantic and textual similarity.")
+    pdf.ln(5)
+    pdf.set_font("Arial", "B", 12)
+    pdf.cell(90, 10, "Title", 1)
+    pdf.cell(30, 10, "Sim %", 1)
+    pdf.cell(30, 10, "Overlap %", 1)
+    pdf.cell(30, 10, "OUI", 1)
+    pdf.ln()
+    pdf.set_font("Arial", "", 11)
+    for res in last_results:
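+        title = res["title"][:40] + "..." if len(res["title"]) > 43 else res["title"]
+        # The built-in Arial font is limited to Latin-1; replace characters outside it
+        title = title.encode("latin-1", "replace").decode("latin-1")
+        pdf.cell(90, 10, title, 1)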
+        pdf.cell(30, 10, str(res["similarity"]), 1)
+        pdf.cell(30, 10, str(res["token_overlap"]), 1)
+        pdf.cell(30, 10, str(res["oui"]), 1)
+        pdf.ln()
+    if last_common_keywords:
+        pdf.ln(6)
+        pdf.set_font("Arial", "B", 12)
+        pdf.cell(0, 10, "Common keywords:", ln=True)
+        pdf.set_font("Arial", "", 11)
+        for kw, count in last_common_keywords:
+            pdf.cell(0, 10, f"- {kw} ({count})", ln=True)
+    pdf.set_y(-20)
+    pdf.set_font("Arial", "I", 9)
+    pdf.cell(0, 10, "© 2025 NORUS Tool", 0, 0, "C")
+    output_path = os.path.join(app.config["UPLOAD_FOLDER"], "NORUS_Report.pdf")
+    pdf.output(output_path, 'F')
+    return send_file(output_path, as_attachment=True)
+if __name__ == "__main__":
+    # Keep the Werkzeug debugger off when binding to all interfaces
+    app.run(debug=False, host="0.0.0.0", port=7860)
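
`fetch_pubmed_details` reads `ArticleTitle` and the first `AbstractText` element from the efetch XML. Below is a minimal offline sketch of that parsing step, assuming a hand-written sample record. `SAMPLE_XML` is hypothetical and not a real PubMed response, and `element_text` and `parse_title_and_abstract` are hypothetical helpers. The sketch uses `itertext()` so that titles with inline markup stay whole, and it joins every `AbstractText` section instead of keeping only the first.

```python
import xml.etree.ElementTree as ET

# Hypothetical record shaped like an efetch response, for illustration only.
SAMPLE_XML = """<PubmedArticleSet><PubmedArticle><MedlineCitation><Article>
<ArticleTitle>Binding of <i>E. coli</i> proteins</ArticleTitle>
<Abstract>
<AbstractText Label="BACKGROUND">First part.</AbstractText>
<AbstractText Label="RESULTS">Second part.</AbstractText>
</Abstract>
</Article></MedlineCitation></PubmedArticle></PubmedArticleSet>"""


def element_text(elem: ET.Element) -> str:
    """Return all text inside an element, including text of nested tags."""
    return "".join(elem.itertext()).strip()


def parse_title_and_abstract(xml_text: str) -> tuple:
    """Extract the title and the full (possibly multi-section) abstract."""
    root = ET.fromstring(xml_text)
    title_elem = root.find(".//ArticleTitle")
    title = element_text(title_elem) if title_elem is not None else "No Title"
    sections = [element_text(part) for part in root.findall(".//AbstractText")]
    abstract = " ".join(s for s in sections if s) or "No Abstract"
    return title, abstract


if __name__ == "__main__":
    title, abstract = parse_title_and_abstract(SAMPLE_XML)
    print(title)
    print(abstract)
```

Reading `.text` alone on the sample title would stop at the first nested tag, which is why the sketch walks the whole element.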

app_setup.sh ADDED

	@@ -0,0 +1,6 @@

+#!/bin/bash
+echo ">>> Setup NLTK resources..."
+mkdir -p /home/user/nltk_data
+python3 -m nltk.downloader -d /home/user/nltk_data punkt stopwords wordnet

index.html DELETED

@@ -1,19 +0,0 @@
-<!doctype html>
-<html>
-	<head>
-		<meta charset="utf-8" />
-		<meta name="viewport" content="width=device-width" />
-		<title>My static Space</title>
-		<link rel="stylesheet" href="style.css" />
-	</head>
-	<body>
-		<div class="card">
-			<h1>Welcome to your static Space!</h1>
-			<p>You can modify this app directly by editing <i>index.html</i> in the Files and versions tab.</p>
-			<p>
-				Also don't forget to check the
-				<a href="https://huggingface.co/docs/hub/spaces" target="_blank">Spaces documentation</a>.
-			</p>
-		</div>
-	</body>
-</html>

models/.DS_Store ADDED

Binary file (6.15 kB).

models/__pycache__/similarity_model.cpython-313.pyc ADDED

Binary file (762 Bytes).

models/similarity_model.py ADDED

	@@ -0,0 +1,32 @@

+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.metrics.pairwise import cosine_similarity
+def compute_similarity(text1, text2):
+    """
+    Compute the cosine similarity between two texts using TF-IDF.
+    Parameters:
+        text1 (str): First text.
+        text2 (str): Second text.
+    Returns:
+        float: Similarity value (0 to 1), or None if the computation fails.
+    """
+    try:
+        # Check that neither text is empty
+        if not text1.strip() or not text2.strip():
+            raise ValueError("One or both texts are empty.")
+        # Vectorize with TF-IDF
+        vectorizer = TfidfVectorizer(stop_words='english')
+        tfidf_matrix = vectorizer.fit_transform([text1, text2])
+        # Compute the cosine similarity
+        similarity_matrix = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:2])
+        return similarity_matrix[0][0]  # Return the similarity value
+    except Exception as e:
+        print(f"Error while computing similarity: {e}")
+        return None
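
To show what the cosine step in `compute_similarity` measures, here is a dependency-free sketch. It is a simplification: it compares raw term counts rather than TF-IDF weights and removes no stop words, so its values will differ from the scikit-learn version. The function name `cosine_from_counts` is hypothetical.

```python
import math
from collections import Counter


def cosine_from_counts(text1: str, text2: str) -> float:
    """Cosine similarity between raw term-count vectors (no IDF weighting)."""
    counts1 = Counter(text1.lower().split())
    counts2 = Counter(text2.lower().split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(counts1[term] * counts2[term] for term in counts1.keys() & counts2.keys())
    norm1 = math.sqrt(sum(c * c for c in counts1.values()))
    norm2 = math.sqrt(sum(c * c for c in counts2.values()))
    if norm1 == 0 or norm2 == 0:
        # An empty text has no direction; report zero similarity instead of failing.
        return 0.0
    return dot / (norm1 * norm2)


if __name__ == "__main__":
    print(cosine_from_counts("ligand binding energy", "ligand binding affinity"))
```

Unlike `compute_similarity`, which returns `None` on empty input, this sketch returns `0.0`, so callers can use the result in arithmetic without a `None` check.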

requirements.txt ADDED

	@@ -0,0 +1,14 @@

+fpdf
+flask
+pdfplumber
+nltk
+sentence-transformers
+scikit-learn
+pandas
+reportlab
+matplotlib
+requests
+keybert
+torch
+transformers
+spacy

start_local.sh ADDED

	@@ -0,0 +1,8 @@

+#!/bin/bash
+echo "⚙️ Starting the NORUS environment..."
+python3 -m venv venv
+source venv/bin/activate
+pip install --upgrade pip
+pip install -r requirements.txt
+echo "✅ Environment ready. Starting the Flask server..."
+python app.py

static/.DS_Store ADDED

Binary file (8.2 kB).

static/css/.DS_Store ADDED

Binary file (6.15 kB).

static/css/style.css ADDED

	@@ -0,0 +1,221 @@

+/* Base reset */
+html, body {
+    height: 100%;
+    margin: 0;
+    padding: 0;
+    overflow-y: auto;
+}
+/* Body */
+body {
+    font-family: Arial, sans-serif;
+    background-color: #f8f8f8;
+    color: #333;
+    display: flex;
+    flex-direction: column;
+    min-height: 100vh;
+}
+/* Header */
+header {
+    background-color: rgba(42, 77, 111, 0.9);
+    color: #fff;
+    padding: 20px;
+    text-align: center;
+}
+header h1 {
+    margin-bottom: 10px;
+    font-size: 2.2em;
+}
+header p {
+    font-size: 1.2em;
+}
+/* Logo */
+#logo {
+    display: block;
+    margin: 0 auto;
+    max-width: 200px;
+    height: auto;
+    cursor: pointer;
+    transition: transform 0.3s ease;
+}
+#logo:hover {
+    transform: scale(1.2);
+}
+/* Main form */
+form {
+    margin: 20px auto;
+    width: 90%;
+    max-width: 800px;
+    padding: 25px;
+    background-color: #fff;
+    border-radius: 12px;
+    box-shadow: 0 5px 15px rgba(0, 0, 0, 0.1);
+}
+label {
+    display: block;
+    font-size: 1.05em;
+    margin: 12px 0 5px;
+    font-weight: bold;
+    color: #2a4d6f;
+}
+input[type="file"],
+input[type="text"],
+input[type="number"],
+select {
+    width: 100%;
+    padding: 10px;
+    margin-bottom: 15px;
+    border: 1px solid #ccc;
+    border-radius: 6px;
+    font-size: 1em;
+    box-sizing: border-box;
+}
+/* File input */
+input[type="file"]::file-selector-button {
+    padding: 6px 12px;
+    margin-right: 10px;
+    background-color: #2a4d6f;
+    color: white;
+    border: none;
+    border-radius: 5px;
+    cursor: pointer;
+    transition: background-color 0.3s;
+}
+input[type="file"]::file-selector-button:hover {
+    background-color: #1a3d56;
+}
+/* Buttons */
+button {
+    width: 100%;
+    background-color: #2a4d6f;
+    color: #fff;
+    padding: 12px;
+    border: none;
+    border-radius: 6px;
+    font-size: 1.1em;
+    cursor: pointer;
+    transition: background-color 0.3s, transform 0.2s;
+    margin-top: 10px;
+}
+button:hover {
+    background-color: #1a3d56;
+    transform: scale(1.02);
+}
+/* Results */
+.results {
+    padding: 25px;
+    background-color: #fff;
+    margin: 30px auto;
+    border-radius: 12px;
+    box-shadow: 0 5px 15px rgba(0, 0, 0, 0.1);
+    max-width: 1000px;
+    overflow-x: auto;
+}
+/* Tables */
+table {
+    width: 100%;
+    border-collapse: collapse;
+    margin-top: 25px;
+}
+th {
+    background-color: #2a4d6f;
+    color: #fff;
+    padding: 12px;
+    text-align: left;
+}
+td {
+    padding: 12px;
+    border-bottom: 1px solid #ddd;
+    background-color: #f9f9f9;
+}
+table tr:hover {
+    background-color: #eef3f7;
+}
+table th, table td {
+    font-size: 1em;
+    word-wrap: break-word;
+}
+/* Chart */
+#chart-container {
+    width: 100%;
+    max-width: 1000px;
+    height: 500px;
+    margin: 40px auto;
+}
+canvas {
+    width: 100% !important;
+    height: 100% !important;
+    display: block;
+}
+/* Progress bar */
+#progress-container {
+    width: 100%;
+    background-color: #e0e0e0;
+    border-radius: 20px;
+    overflow: hidden;
+    margin-top: 20px;
+}
+#progress-bar {
+    height: 20px;
+    width: 0;
+    background: linear-gradient(90deg, #4caf50 0%, #8bc34a 100%);
+    text-align: center;
+    line-height: 20px;
+    color: white;
+    font-weight: bold;
+    transition: width 0.4s ease;
+}
+/* At 100%, the bar turns blue */
+#progress-bar.complete {
+    background: linear-gradient(90deg, #2196f3 0%, #21cbf3 100%);
+}
+/* Footer */
+footer {
+    background-color: #2a4d6f;
+    color: #fff;
+    text-align: center;
+    padding: 15px;
+    width: 100%;
+    font-size: 1em;
+    margin-top: auto;
+}
+/* Responsive layout */
+@media screen and (max-width: 600px) {
+    form, .results {
+        width: 95%;
+        padding: 15px;
+    }
+    header h1 {
+        font-size: 1.5em;
+    }
+    header p {
+        font-size: 1em;
+    }
+}

static/js/.DS_Store ADDED

Binary file (6.15 kB).

static/js/script.js ADDED

	@@ -0,0 +1,102 @@

+document.addEventListener("DOMContentLoaded", function () {
+    const logoLink = document.getElementById("logo-link");
+    if (logoLink) {
+        logoLink.addEventListener("click", function () {
+            const logo = document.getElementById("logo");
+            logo.style.transform = "scale(1.5)";
+            setTimeout(() => {
+                logo.style.transform = "scale(1)";
+            }, 500);
+        });
+    }
+    function startProgress() {
+        const progressBar = document.getElementById("progress-bar");
+        const progressContainer = document.getElementById("progress-container");
+        const analyzeBtn = document.querySelector("button[type='submit']");
+        if (progressBar && progressContainer && analyzeBtn) {
+            progressContainer.style.display = "block";
+            analyzeBtn.disabled = true;
+            analyzeBtn.textContent = "⏳ Analysis in progress...";
+            let width = 0;
+            const totalTime = 180000;  // 3 minutes
+            const intervalTime = totalTime / 100;
+            const interval = setInterval(() => {
+                if (width >= 100) {
+                    clearInterval(interval);
+                    progressBar.textContent = "100%";
+                    setTimeout(() => {
+                        progressContainer.style.display = "none";
+                        progressBar.style.width = "0%";
+                        progressBar.textContent = "0%";
+                        analyzeBtn.disabled = false;
+                        analyzeBtn.textContent = "Analyze";
+                    }, 1000);
+                } else {
+                    width += 1;
+                    progressBar.style.width = width + "%";
+                    progressBar.textContent = width + "%";
+                }
+            }, intervalTime);
+            // safety fallback
+            setTimeout(() => {
+                analyzeBtn.disabled = false;
+                analyzeBtn.textContent = "Analyze";
+                progressContainer.style.display = "none";
+                progressBar.style.width = "0%";
+                progressBar.textContent = "0%";
+            }, totalTime + 3000);
+        }
+    }
+    window.startProgress = startProgress;
+    const analysisForm = document.getElementById("analysisForm");
+    if (analysisForm) {
+        analysisForm.addEventListener("submit", function () {
+            startProgress();
+        });
+    }
+    const analysisType = document.getElementById("analysis_type");
+    if (analysisType) {
+        analysisType.addEventListener("change", function () {
+            document.getElementById("pubmed-options").style.display =
+                this.value === "pubmed" ? "block" : "none";
+            document.getElementById("local-options").style.display =
+                this.value === "local" ? "block" : "none";
+        });
+        analysisType.dispatchEvent(new Event("change"));
+    }
+    const fileInput = document.getElementById("pdf_file");
+    if (fileInput) {
+        fileInput.addEventListener("change", function () {
+            const fileLabel = document.querySelector('label[for="pdf_file"]');
+            if (fileInput.files.length > 0 && fileLabel) {
+                fileLabel.textContent = `Main PDF selected: ${fileInput.files[0].name}`;
+            }
+        });
+    }
+    const comparisonInput = document.getElementById("comparison_files");
+    if (comparisonInput) {
+        comparisonInput.addEventListener("change", function () {
+            const label = document.querySelector('label[for="comparison_files"]');
+            if (comparisonInput.files.length > 0 && label) {
+                label.textContent = `${comparisonInput.files.length} comparison files selected`;
+            }
+        });
+    }
+    const flashMessages = document.querySelectorAll(".error");
+    if (flashMessages.length > 0) {
+        setTimeout(() => {
+            flashMessages.forEach(message => message.remove());
+        }, 5000);
+    }
+});

style.css DELETED

@@ -1,28 +0,0 @@
-body {
-	padding: 2rem;
-	font-family: -apple-system, BlinkMacSystemFont, "Arial", sans-serif;
-}
-h1 {
-	font-size: 16px;
-	margin-top: 0;
-}
-p {
-	color: rgb(107, 114, 128);
-	font-size: 15px;
-	margin-bottom: 10px;
-	margin-top: 5px;
-}
-.card {
-	max-width: 620px;
-	margin: 0 auto;
-	padding: 16px;
-	border: 1px solid lightgray;
-	border-radius: 16px;
-}
-.card p:last-child {
-	margin-bottom: 0;
-}

templates/.DS_Store ADDED

Binary file (6.15 kB).

templates/NORUS.html ADDED

	@@ -0,0 +1,180 @@

+<!DOCTYPE html>
+<html lang="en">
+<head>
+  <meta charset="UTF-8" />
+  <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
+  <title>NORUS Tool</title>
+  <link rel="stylesheet" href="{{ url_for('static', filename='css/style.css') }}">
+  <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
+  <script src="{{ url_for('static', filename='js/script.js') }}"></script>
+</head>
+<body>
+<header>
+  <div style="text-align: center; margin-top: 20px;">
+    <a href="#" id="logo-link">
+      <img id="logo" src="https://i.imgur.com/MT5Sl9h.png" alt="NORUS Logo" style="width: 150px;" />
+    </a>
+  </div>
+  <h1>NORUS Tool</h1>
+  <p>Analyze your PDF and discover originality and similarity</p>
+</header>
+<main>
+  <form id="analysisForm" action="/validate" method="POST" enctype="multipart/form-data">
+    <label for="analysis_type">Choose Analysis Type:</label>
+    <select name="analysis_type" id="analysis_type" required>
+      <option value="local">Local Comparison</option>
+      <option value="pubmed">PubMed Search</option>
+    </select>
+    <div id="pubmed-options" style="display: none;">
+      <label for="query">PubMed Query:</label>
+      <input type="text" name="query" id="query" />
+      <label for="year_start">Start Year:</label>
+      <input type="number" name="year_start" id="year_start" min="1900" max="2025" value="2000" />
+      <label for="year_end">End Year:</label>
+      <input type="number" name="year_end" id="year_end" min="1900" max="2025" value="2025" />
+      <label for="num_articles">Number of Articles:</label>
+      <input type="number" name="num_articles" id="num_articles" min="1" value="10" />
+    </div>
+    <div id="local-options" style="display: none;">
+      <label for="comparison_files">Upload comparison PDFs (select multiple):</label>
+      <input type="file" name="comparison_files" id="comparison_files" multiple />
+    </div>
+    <label for="pdf_file">Upload your main PDF:</label>
+    <input type="file" name="pdf_file" id="pdf_file" required />
+    <button type="submit">Analyze</button>
+  </form>
+  <div id="progress-container" style="display: none;">
+    <p style="text-align: center;">⏳ Analysis in progress...</p>
+    <div id="progress-bar">0%</div>
+  </div>
+  {% if results %}
+  <section>
+    <h2>Analysis Results</h2>
+    <table>
+      <thead>
+        <tr>
+          <th>Title</th>
+          <th>Semantic Similarity (%)</th>
+          <th>Token Overlap (%)</th>
+          <th>OUI (Originality &amp; Uniqueness Index)</th>
+        </tr>
+      </thead>
+      <tbody>
+        {% for result in results %}
+        <tr>
+          <td style="max-width: 400px; word-wrap: break-word;">{{ result.title }}</td>
+          <td>{{ "%.2f"|format(result.similarity) }}</td>
+          <td>{{ "%.2f"|format(result.token_overlap) }}</td>
+          <td>{{ "%.2f"|format(result.oui) }}</td>
+        </tr>
+        {% endfor %}
+      </tbody>
+    </table>
+    {% if keywords %}
+    <div class="results" style="text-align: center; margin-top: 30px;">
+      <h3>🔑 Common Keywords</h3>
+      <p>
+        {% for kw, count in keywords %}
+          <span style="margin: 5px; font-weight: bold;">{{ kw }} ({{ count }})</span>
+        {% endfor %}
+      </p>
+    </div>
+    {% endif %}
+    <form action="/download_report" method="post" style="text-align: center; margin-top: 30px;">
+      <button type="submit">📄 Download PDF Report</button>
+    </form>
+    <div id="chart-container" style="margin-top: 50px;">
+      <canvas id="similarityChart"></canvas>
+    </div>
+  </section>
+  {% endif %}
+</main>
+<footer><p>&copy; 2025 NORUS Tool. All rights reserved.</p></footer>
+<script>
+  document.addEventListener("DOMContentLoaded", function() {
+    const analysisType = document.getElementById("analysis_type");
+    const pubmedOptions = document.getElementById("pubmed-options");
+    const localOptions = document.getElementById("local-options");
+    function toggleOptions() {
+      if (analysisType.value === "pubmed") {
+        pubmedOptions.style.display = "block";
+        localOptions.style.display = "none";
+      } else {
+        pubmedOptions.style.display = "none";
+        localOptions.style.display = "block";
+      }
+    }
+    analysisType.addEventListener("change", toggleOptions);
+    toggleOptions();
+  });
+</script>
+{% if results %}
+<script>
+  new Chart(document.getElementById('similarityChart'), {
+    type: 'bar',
+    data: {
+      labels: {{ results | map(attribute='title') | list | tojson }},
+      datasets: [
+        {
+          label: 'Semantic Similarity (%)',
+          data: {{ results | map(attribute='similarity') | list | safe }},
+          backgroundColor: 'rgba(54, 162, 235, 0.7)',
+          borderColor: 'rgba(54, 162, 235, 1)',
+          borderWidth: 1
+        },
+        {
+          label: 'Token Overlap (%)',
+          data: {{ results | map(attribute='token_overlap') | list | safe }},
+          backgroundColor: 'rgba(255, 159, 64, 0.7)',
+          borderColor: 'rgba(255, 159, 64, 1)',
+          borderWidth: 1
+        },
+        {
+          label: 'OUI (%)',
+          data: {{ results | map(attribute='oui') | list | safe }},
+          backgroundColor: 'rgba(153, 102, 255, 0.7)',
+          borderColor: 'rgba(153, 102, 255, 1)',
+          borderWidth: 1
+        }
+      ]
+    },
+    options: {
+      responsive: true,
+      plugins: {
+        legend: { position: 'top' },
+        tooltip: { mode: 'index', intersect: false }
+      },
+      scales: {
+        y: { beginAtZero: true },
+        x: {
+          ticks: {
+            autoSkip: false,
+            maxRotation: 45,
+            minRotation: 45
+          }
+        }
+      }
+    }
+  });
+</script>
+{% endif %}
+</body>
+</html>

templates/app.py ADDED

	@@ -0,0 +1,134 @@

+import os
+import requests
+import pdfplumber
+import numpy as np
+from flask import Flask, render_template, request, redirect, url_for, flash
+from werkzeug.utils import secure_filename
+from sentence_transformers import SentenceTransformer, util
+import nltk
+from nltk.stem import WordNetLemmatizer, PorterStemmer
+from nltk.tokenize import word_tokenize
+from nltk.corpus import stopwords
+nltk.download("punkt")
+nltk.download("wordnet")
+nltk.download("stopwords")
+lemmatizer = WordNetLemmatizer()
+stemmer = PorterStemmer()
+stop_words = set(stopwords.words("english"))
+app = Flask(__name__)
+app.config["UPLOAD_FOLDER"] = "uploads"
+os.makedirs(app.config["UPLOAD_FOLDER"], exist_ok=True)
+model = SentenceTransformer("allenai/scibert_scivocab_uncased")
+def extract_pdf_text(pdf_path):
+    text = ""
+    try:
+        with pdfplumber.open(pdf_path) as pdf:
+            for page in pdf.pages:
+                text += page.extract_text() or " "
+    except Exception as e:
+        print(f"Text extraction error: {e}")
+    return text.lower().strip()
+def preprocess_text(text):
+    text = text.lower()
+    words = word_tokenize(text)
+    words = [stemmer.stem(lemmatizer.lemmatize(w)) for w in words if w.isalnum() and w not in stop_words]
+    return " ".join(words)
+def calculate_token_overlap(text1, text2):
+    tokens1 = set(text1.split())
+    tokens2 = set(text2.split())
+    overlap = len(tokens1 & tokens2)
+    return round((overlap / max(len(tokens1), 1)) * 100, 2)
+def calculate_oui(similarity, token_overlap, alpha=0.7, beta=0.3):
+    oui = alpha * (1 - similarity / 100) + beta * (1 - token_overlap / 100)
+    return round(max(0, min(oui * 100, 100)), 2)
+def validate_document(pdf_path, comparison_sources, method="local", titles=None):
+    pdf_text = extract_pdf_text(pdf_path)
+    results = []
+    for i, doc in enumerate(comparison_sources):
+        doc_text = extract_pdf_text(doc) if method == "local" else doc
+        similarity = util.pytorch_cos_sim(
+            model.encode(pdf_text, convert_to_tensor=True),
+            model.encode(doc_text, convert_to_tensor=True)
+        ).item() * 100
+        token_overlap = calculate_token_overlap(pdf_text, doc_text)
+        oui = calculate_oui(similarity, token_overlap)
+        title = titles[i] if titles and i < len(titles) else os.path.basename(doc) if method == "local" else "Unknown Title"
+        results.append({"title": title, "similarity": round(similarity, 2), "token_overlap": round(token_overlap, 2), "oui": round(oui, 2)})
+    return results
+def fetch_pubmed_details(article_id):
+    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
+    params = {"db": "pubmed", "id": article_id, "retmode": "xml"}
+    try:
+        response = requests.get(base_url, params=params)
+        response.raise_for_status()
+        import xml.etree.ElementTree as ET
+        root = ET.fromstring(response.text)
+        title = root.find(".//ArticleTitle").text if root.find(".//ArticleTitle") is not None else "No Title"
+        abstract = root.find(".//AbstractText").text if root.find(".//AbstractText") is not None else "No Abstract"
+        keywords = root.findall(".//Keyword")
+        keyword_text = " ".join([kw.text for kw in keywords if kw.text]) if keywords else "No Keywords"
+        print(f"\n🔍 ARTICLE RETRIEVED\n📖 Title: {title}\n📝 Abstract: {abstract[:500]}...\n🔑 Keywords: {keyword_text}\n")
+        return title, f"{abstract} {keyword_text}"
+    except requests.exceptions.RequestException as e:
+        print(f"Error fetching abstract: {e}")
+        return "No Title", "No Abstract"
+def fetch_pubmed(query, year_start, year_end, max_results=10):
+    base_url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
+    params = {"db": "pubmed", "term": f"{query} AND ({year_start}[PDAT] : {year_end}[PDAT])", "retmax": max_results, "retmode": "json"}
+    try:
+        response = requests.get(base_url, params=params)
+        response.raise_for_status()
+        return response.json().get("esearchresult", {}).get("idlist", [])
+    except requests.exceptions.RequestException as e:
+        print(f"Error fetching PubMed articles: {e}")
+        return []
+@app.route("/")
+def index():
+    return render_template("NORUS.html")
+@app.route("/validate", methods=["POST"])
+def validate():
+    pdf_file = request.files.get("pdf_file")
+    analysis_type = request.form.get("analysis_type")
+    local_dir = request.form.get("local_directory", "").strip()
+    query = request.form.get("query", "").strip()
+    if not pdf_file:
+        flash("Please upload a valid PDF file.", "error")
+        return redirect(url_for("index"))
+    filename = secure_filename(pdf_file.filename)
+    pdf_path = os.path.join(app.config["UPLOAD_FOLDER"], filename)
+    pdf_file.save(pdf_path)
+    results = []
+    if analysis_type == "local":
+        if not os.path.isdir(local_dir):
+            flash("Please select a valid directory.", "error")
+            return redirect(url_for("index"))
+        comparison_files = [os.path.join(local_dir, f) for f in os.listdir(local_dir) if f.endswith(".pdf")]
+        if not comparison_files:
+            flash("The directory contains no PDF files.", "error")
+            return redirect(url_for("index"))
+        results = validate_document(pdf_path, comparison_files, method="local")
+    elif analysis_type == "pubmed":
+        year_start = request.form.get("year_start", "2000")
+        year_end = request.form.get("year_end", "2025")
+        num_articles = int(request.form.get("num_articles", "10"))
+        pubmed_ids = fetch_pubmed(query, year_start, year_end, num_articles)
+        pubmed_results = [fetch_pubmed_details(article_id) for article_id in pubmed_ids]
+        results = validate_document(pdf_path, [result[1] for result in pubmed_results], method="pubmed", titles=[result[0] for result in pubmed_results])
+    return render_template("NORUS.html", results=results)
+if __name__ == "__main__":
+    app.run(debug=True, port=7860)
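
`calculate_token_overlap` divides the shared tokens by the size of the first document's token set only, so the score depends on which document is passed first. The sketch below illustrates that asymmetry with hypothetical token sets. The Jaccard variant is not part of this commit and appears only as a symmetric point of comparison; the names `token_overlap_pct` and `jaccard_pct` are hypothetical.

```python
def token_overlap_pct(tokens_a: set, tokens_b: set) -> float:
    """Share of tokens_a also present in tokens_b, mirroring calculate_token_overlap."""
    return round(len(tokens_a & tokens_b) / max(len(tokens_a), 1) * 100, 2)


def jaccard_pct(tokens_a: set, tokens_b: set) -> float:
    """Symmetric alternative: shared tokens divided by the union of both sets."""
    union = tokens_a | tokens_b
    return round(len(tokens_a & tokens_b) / max(len(union), 1) * 100, 2)


if __name__ == "__main__":
    # Hypothetical token sets: a short document fully contained in a longer one.
    short_doc = {"protein", "binding", "ligand"}
    long_doc = {"protein", "binding", "ligand", "docking", "solvent", "kinase"}
    print(token_overlap_pct(short_doc, long_doc))  # 100.0
    print(token_overlap_pct(long_doc, short_doc))  # 50.0
    print(jaccard_pct(short_doc, long_doc))        # 50.0
```

With the uploaded PDF as the first argument, a short manuscript compared against a long reference can reach 100% overlap even though the reference contains much more material.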