Model Card for ddosdub/DualEncoderModernBERT

This binary classification model combines ModernBERT and SBERT embeddings to detect whether a piece of evidence supports a given claim (evidence detection). It is a deep learning approach built on the transformer architecture.

Model Details

Model Description

This model uses a dual embedding approach that combines contextualized embeddings from ModernBERT-base with sentence embeddings from SBERT (all-MiniLM-L6-v2). The model first processes claim-evidence pairs through both embedding models, then concatenates the embeddings and passes them through a classifier to predict whether the evidence supports the claim.

The model is fine-tuned using QLoRA (Quantized Low-Rank Adaptation) with 4-bit quantization and flash-attention for efficient training and inference.

Text preprocessing removes reference tags, normalizes accented characters using unidecode, cleans up irregular spacing around punctuation, and normalizes whitespace. To address class imbalance, the minority (positive) class was augmented using synonym replacement.

Developed by: Dhruv Sharma and Tuan Chuong Goh
Model type: Supervised
Language(s) (NLP): English
License: cc-by-4.0
Finetuned from model: ModernBERT-base and SBERT (all-MiniLM-L6-v2)

Model Sources

Repository: https://github.com/chuongg3/NLU-EvidenceDetection
Paper or documentation: https://huggingface.co/answerdotai/ModernBERT-base

Uses

Direct Use

This model can be directly used for evidence detection tasks, where the goal is to determine whether a given piece of evidence supports a specific claim. It processes claim-evidence pairs and outputs a binary classification result.

Downstream Use

The model can be integrated into fact-checking systems, academic research tools, or information verification applications. It can also serve as a component in larger natural language understanding pipelines for tasks requiring evidence assessment.

Out-of-Scope Use

This model is not designed to:

Process non-English text
Handle multi-class classification beyond binary evidence detection
Serve as a standalone fact-checker without human oversight
Generate text or provide explanations for its decisions

Bias, Risks, and Limitations

The model converts probabilities to binary predictions using a decision threshold of 0.5433, which was tuned on validation data; a different threshold may be more appropriate for data from other domains. The 4-bit quantization may introduce some precision loss compared to full-precision models, although the reported development-set metrics suggest the impact on model quality is small. The original dataset was class-imbalanced, which was addressed by augmenting the positive class.
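
As a rough illustration of how such a threshold can be applied and tuned, the sketch below sweeps candidate thresholds over held-out scores and keeps the one with the best F1. The selection criterion (positive-class F1), the `>=` comparison, and the helper names `to_label`, `f1_at`, and `best_threshold` are assumptions made for this example; the card only states that 0.5433 was determined on validation data.

```python
THRESHOLD = 0.5433  # decision threshold reported in this card


def to_label(prob, threshold=THRESHOLD):
    """Map a positive-class probability to a binary label (1 = supports)."""
    # Assumption: probabilities at or above the threshold count as positive.
    return int(prob >= threshold)


def f1_at(probs, labels, threshold):
    """Positive-class F1 obtained when cutting the scores at `threshold`."""
    preds = [to_label(p, threshold) for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(probs, labels, candidates):
    """Return the candidate threshold with the highest F1 (first one wins ties)."""
    return max(candidates, key=lambda t: f1_at(probs, labels, t))
```

In practice the candidates would be a fine grid over (0, 1) evaluated on validation scores, and the chosen value would then be frozen before evaluating on any other split.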

Recommendations

Users (both direct and downstream) should be aware that:

The model works best with properly preprocessed text inputs
Performance may vary across different domains or types of claims
The model should be used as a decision support tool rather than the sole arbiter of evidence validity
Regular evaluation on new data is recommended to monitor potential performance drift

How to Get Started with the Model

Use the code below to get started with the model:

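from transformers import AutoModel, AutoTokenizer
from sentence_transformers import SentenceTransformer
import torch

# Load the base encoders
modernbert_tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
# AutoModel returns hidden states; no classification head is needed to extract embeddings
modernbert_model = AutoModel.from_pretrained("answerdotai/ModernBERT-base")
modernbert_model.eval()
sbert_model = SentenceTransformer("all-MiniLM-L6-v2")

# Load the fine-tuned model
# Replace with actual path when available
# Assumption: the file stores a full pickled module, which requires weights_only=False
# on torch >= 2.6; only load checkpoints you trust
model = torch.load("path/to/h25471ds-m19364tg-ED", weights_only=False)
model.eval()

THRESHOLD = 0.5433  # decision threshold reported in this card

# Process input
def predict(claim, evidence):
    # Preprocess text
    # ... preprocessing code here ...

    # Get the ModernBERT [CLS] embedding for the claim-evidence pair
    inputs = modernbert_tokenizer(claim, evidence, return_tensors="pt", truncation=True, max_length=8192)
    with torch.no_grad():
        modernbert_output = modernbert_model(**inputs)
    cls_embedding = modernbert_output.last_hidden_state[:, 0]

    # Get SBERT embeddings (384 dimensions each) as CPU tensors of shape (1, 384)
    sbert_claim = torch.tensor(sbert_model.encode(claim)).unsqueeze(0)
    sbert_evidence = torch.tensor(sbert_model.encode(evidence)).unsqueeze(0)

    # Combine embeddings and predict
    # Assumption: the fine-tuned model accepts the concatenated features and returns
    # a single logit; adapt this call to the interface defined in the repository
    features = torch.cat([cls_embedding, sbert_claim, sbert_evidence], dim=-1)
    with torch.no_grad():
        probability = torch.sigmoid(model(features)).item()

    # 1 = evidence supports the claim, 0 = it does not
    return int(probability >= THRESHOLD)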

Training Details

Training Data

The training data consists of claim-evidence pairs labelled for evidence detection. To address class imbalance, the minority (positive) class was augmented using synonym replacement.
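
The augmentation step can be pictured with the minimal sketch below. It is not the authors' implementation: the synonym table passed in is a hypothetical hand-written dictionary, the function names are invented for this example, and augmenting only the evidence text (not the claim) is an assumption.

```python
import random


def synonym_replace(text, synonyms, n=1, rng=None):
    """Return a copy of `text` with up to `n` words swapped for a synonym."""
    rng = rng or random.Random()
    words = text.split()
    # Positions whose lowercase form has at least one known synonym.
    candidates = [i for i, w in enumerate(words) if synonyms.get(w.lower())]
    for i in rng.sample(candidates, min(n, len(candidates))):
        words[i] = rng.choice(synonyms[words[i].lower()])
    return " ".join(words)


def augment_positives(pairs, synonyms, rng=None):
    """Append one augmented copy of every positive (label == 1) pair."""
    augmented = list(pairs)
    for claim, evidence, label in pairs:
        if label == 1:
            # Assumption: only the evidence text is perturbed.
            new_evidence = synonym_replace(evidence, synonyms, rng=rng)
            augmented.append((claim, new_evidence, label))
    return augmented
```

Passing a seeded `random.Random` as `rng` keeps the augmented set reproducible across runs.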

Training Procedure

Preprocessing

The preprocessing pipeline includes:

Removing reference tags like [REF], [REF, REF]
Normalizing accented characters using unidecode
Cleaning up irregular spacing around punctuation
Normalizing whitespace
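
The four steps above can be sketched as a single cleaning function. The regular expressions and the name `clean_text` are assumptions made for this example, not the repository's code; if `unidecode` is not installed, the sketch falls back to a standard-library approximation that handles accented Latin characters only.

```python
import re
import unicodedata

try:
    from unidecode import unidecode  # library named in this card
except ImportError:
    def unidecode(text):
        """Stdlib fallback: strip combining marks, then drop remaining non-ASCII."""
        decomposed = unicodedata.normalize("NFKD", text)
        return decomposed.encode("ascii", "ignore").decode("ascii")

# Matches [REF] as well as comma-separated runs such as [REF, REF].
REF_TAG = re.compile(r"\[REF(?:\s*,\s*REF)*\]")
# Whitespace that appears directly before a punctuation mark.
SPACE_BEFORE_PUNCT = re.compile(r"\s+([,.;:!?])")
WHITESPACE = re.compile(r"\s+")


def clean_text(text):
    """Apply the four preprocessing steps in the order listed in the card."""
    text = REF_TAG.sub(" ", text)               # 1. remove reference tags
    text = unidecode(text)                      # 2. normalize accented characters
    text = SPACE_BEFORE_PUNCT.sub(r"\1", text)  # 3. fix spacing around punctuation
    return WHITESPACE.sub(" ", text).strip()    # 4. normalize whitespace
```

The same function should be applied to both the claim and the evidence at training and inference time so that the model sees consistently formatted input.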

Training Hyperparameters

Training regime: 4-bit (nf4) quantization with QLoRA
learning_rate: 0.0002643238333834569
batch_size: 64
num_epochs: 5
weight_decay: 0.048207625326781293
warmup_ratio: 0.19552784843595056
gradient_accumulation_steps: 4
lora_r: 56
lora_alpha: 40
lora_dropout: 0.07644825534662132
classifier_dropout: 0.2659719581055393
classifier_hidden_size: 768
max_length: 8192
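
To make the relationships between these values explicit, the sketch below collects them in a small dataclass and derives two quantities from them. The class name `TrainingConfig` and its helpers are hypothetical; treating `batch_size` as the per-step batch on a single device is an assumption, while the `alpha / r` scaling is the standard LoRA formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values copied from the hyperparameter list in this card.
    learning_rate: float = 0.0002643238333834569
    batch_size: int = 64
    num_epochs: int = 5
    weight_decay: float = 0.048207625326781293
    warmup_ratio: float = 0.19552784843595056
    gradient_accumulation_steps: int = 4
    lora_r: int = 56
    lora_alpha: int = 40
    lora_dropout: float = 0.07644825534662132
    classifier_dropout: float = 0.2659719581055393
    classifier_hidden_size: int = 768
    max_length: int = 8192

    @property
    def effective_batch_size(self):
        # Assumption: batch_size is the per-step batch on a single device.
        return self.batch_size * self.gradient_accumulation_steps

    @property
    def lora_scaling(self):
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_r

    def lora_kwargs(self):
        # Keyword arguments in the form accepted by peft.LoraConfig.
        return {
            "r": self.lora_r,
            "lora_alpha": self.lora_alpha,
            "lora_dropout": self.lora_dropout,
        }


config = TrainingConfig()
```

Under these assumptions each optimizer step sees 64 × 4 = 256 examples, and the LoRA update is scaled by 40 / 56 ≈ 0.714.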

Speeds, Sizes, Times

Model size: The base ModernBERT model is loaded in 4-bit quantization
SBERT embeddings dimension: 384
Memory footprint: Reduced due to 4-bit quantization and parameter-efficient fine-tuning

Evaluation

Testing Data, Factors & Metrics

Testing Data

Development set with claim-evidence pairs for evidence detection.

Factors

The evaluation focused on the model's ability to correctly classify evidence as supporting or not supporting claims across various domains and claim types.

Metrics

The following metrics were used to evaluate model performance:

Accuracy: Proportion of correct predictions
Precision: Proportion of positive identifications that were actually correct
Recall: Proportion of actual positives that were identified correctly
F1-Score: Harmonic mean of precision and recall
Matthews Correlation Coefficient: Correlation coefficient between observed and predicted binary classifications
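
For reference, the metrics above can be computed from the four confusion-matrix counts as in the sketch below. It is written in plain Python for illustration (the function name `binary_metrics` is an assumption) and reports positive-class precision, recall, and F1; the macro and weighted figures in the Results section average the per-class values instead.

```python
import math


def binary_metrics(y_true, y_pred):
    """Accuracy, positive-class precision/recall/F1, and MCC for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is empty.
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": safe_div(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": safe_div(tp * tn - fp * fn, mcc_den),
    }
```

Unlike accuracy, MCC uses all four counts, which makes it a useful complement on imbalanced data such as this task's.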

Results

Summary

Accuracy: 0.87377657779278
Macro Precision: 0.83764094620994
Macro Recall: 0.86135532021442
Macro F1-Score: 0.84790707217937
Weighted Precision: 0.88028808321627
Weighted Recall: 0.87377657779278
Weighted F1-Score: 0.87591472842040
Matthews Correlation Coefficient: 0.69859387983347

The model achieved a Macro F1-score of 0.848 (84.8%) and an accuracy of 0.874 (87.4%) on the development set.

Environmental Impact

Hardware Type: CUDA-compatible GPU with T4 (Turing) architecture or newer
Hours used: Not specified
Cloud Provider: Not specified
Compute Region: Not specified
Carbon Emitted: Not calculated, but the use of 4-bit quantization and QLoRA significantly reduces the computational requirements compared to full-precision fine-tuning

Technical Specifications

Model Architecture and Objective

The model combines ModernBERT's contextual understanding with SBERT's semantic similarity capabilities. It first extracts the [CLS] token embedding from ModernBERT, then concatenates it with SBERT embeddings before passing through the classification layers.
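
A minimal PyTorch sketch of this fusion step is shown below. It is an illustration under stated assumptions, not the released architecture: the class name, the two-layer MLP with ReLU, the single output logit, and the choice to concatenate separate claim and evidence SBERT vectors are all assumptions. The dimensions (768 for ModernBERT-base, 384 for SBERT), the hidden size, and the dropout rate come from this card.

```python
import torch
from torch import nn


class DualEmbeddingClassifier(nn.Module):
    """Classifier head over concatenated ModernBERT [CLS] and SBERT embeddings."""

    def __init__(self, cls_dim=768, sbert_dim=384, hidden_size=768,
                 dropout=0.2659719581055393):
        super().__init__()
        # Assumption: one SBERT vector each for the claim and the evidence.
        in_dim = cls_dim + 2 * sbert_dim
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_size),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, 1),  # single logit for "supports"
        )

    def forward(self, cls_emb, sbert_claim, sbert_evidence):
        # Concatenate along the feature axis: (batch, 768 + 384 + 384).
        features = torch.cat([cls_emb, sbert_claim, sbert_evidence], dim=-1)
        return self.net(features).squeeze(-1)


# Shape check with random tensors standing in for real embeddings.
head = DualEmbeddingClassifier().eval()
with torch.no_grad():
    logits = head(torch.randn(4, 768), torch.randn(4, 384), torch.randn(4, 384))
probabilities = torch.sigmoid(logits)
```

Applying a sigmoid to the logit yields a probability that can then be compared against the decision threshold reported in this card.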

Compute Infrastructure

Hardware

RAM: at least 16 GB
Storage: at least 2 GB
GPU: CUDA-compatible GPU with T4 (Turing) architecture or newer
Training requirements: T4 or newer GPU architecture to support flash-attention
Inference requirements: Can be performed on less powerful GPUs with 4-bit quantization

Software

torch: 2.6.0+cu126
transformers
peft: 0.15.1 (for QLoRA implementation)
bitsandbytes (for 4-bit quantization)
flash-attn (for efficient attention computation)
sentence-transformers
scikit-learn
numpy
pandas
unidecode (for text normalization)
re (Python standard library, for text cleaning)

More Information

The model combines the strengths of ModernBERT's long context understanding with SBERT's semantic similarity capabilities. The use of QLoRA and 4-bit quantization enables efficient fine-tuning with significantly reduced memory requirements compared to full-precision fine-tuning. Flash-attention provides computational speedups during training and inference on compatible hardware.

The hyperparameters listed under Training Hyperparameters were selected through a systematic search.

Important references:

QLoRA: Efficient Finetuning of Quantized LLMs (2023) - https://arxiv.org/abs/2305.14314
Hugging Face 4-bit Transformers with bitsandbytes - https://huggingface.co/blog/4bit-transformers-bitsandbytes
PEFT: Parameter-Efficient Fine-Tuning Documentation - https://huggingface.co/docs/peft/en/index

Model Card Contact

For inquiries about this model, please contact the authors through the GitHub repository: https://github.com/chuongg3/NLU-EvidenceDetection
