Model Card for ddosdub/DualEncoderModernBERT

This is a binary classification model that combines ModernBERT and SBERT embeddings to detect whether a piece of evidence supports a given claim (evidence detection). It is a deep learning approach built on the transformer architecture.

Model Details

Model Description

This model uses a dual embedding approach that combines contextualized embeddings from ModernBERT-base with sentence embeddings from SBERT (all-MiniLM-L6-v2). The model first processes claim-evidence pairs through both embedding models, then concatenates the embeddings and passes them through a classifier to predict whether the evidence supports the claim.

The model is fine-tuned using QLoRA (Quantized Low-Rank Adaptation) with 4-bit quantization and flash-attention for efficient training and inference.

Text preprocessing includes removing reference tags, normalizing accented characters using unidecode, cleaning up irregular spacing around punctuation, and normalizing whitespace. To address class imbalance, the minority (positive) class was augmented using synonym replacement.

  • Developed by: Dhruv Sharma and Tuan Chuong Goh
  • Model type: Supervised
  • Language(s) (NLP): English
  • License: cc-by-4.0
  • Finetuned from model: ModernBERT-base and SBERT (all-MiniLM-L6-v2)

Model Sources

Uses

Direct Use

This model can be directly used for evidence detection tasks, where the goal is to determine whether a given piece of evidence supports a specific claim. It processes claim-evidence pairs and outputs a binary classification result.

Downstream Use

The model can be integrated into fact-checking systems, academic research tools, or information verification applications. It can also serve as a component in larger natural language understanding pipelines for tasks requiring evidence assessment.

Out-of-Scope Use

This model is not designed to:

  • Process non-English text
  • Handle multi-class classification beyond binary evidence detection
  • Serve as a standalone fact-checker without human oversight
  • Generate text or provide explanations for its decisions

Bias, Risks, and Limitations

The model converts predicted probabilities to binary labels with a decision threshold of 0.5433, chosen on validation data. The 4-bit quantization may introduce some precision loss compared to a full-precision model, although the reported metrics suggest the impact on model quality is small. The original dataset was class-imbalanced, which was addressed by augmenting the positive class.
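
To make the thresholding step concrete, the following is a minimal sketch of how a decision threshold could be chosen on validation data and then applied. The selection criterion (maximizing F1 on the positive class) and the helper names `f1_at`, `best_threshold`, and `to_label` are assumptions for illustration; this card does not state how 0.5433 was found.

```python
from typing import Sequence


def f1_at(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1 of the positive class when probabilities >= threshold are labelled 1."""
    preds = [int(p >= threshold) for p in probs]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(probs: Sequence[float], labels: Sequence[int], steps: int = 1000) -> float:
    """Sweep evenly spaced thresholds in [0, 1]; keep the first one with the highest F1."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda t: f1_at(probs, labels, t))


def to_label(prob: float, threshold: float = 0.5433) -> int:
    """Apply the threshold reported in this card: 1 = supports, 0 = does not."""
    return int(prob >= threshold)
```

A threshold tuned this way depends on the class balance of the validation set, so it is worth re-tuning when the model is applied to a different data distribution.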

Recommendations

Users (both direct and downstream) should be aware that:

  • The model works best with properly preprocessed text inputs
  • Performance may vary across different domains or types of claims
  • The model should be used as a decision support tool rather than the sole arbiter of evidence validity
  • Regular evaluation on new data is recommended to monitor potential performance drift

How to Get Started with the Model

Use the code below to get started with the model:

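import torch
from sentence_transformers import SentenceTransformer
from transformers import AutoModel, AutoTokenizer

# Decision threshold reported in this card (selected on validation data)
THRESHOLD = 0.5433

# Load the base encoders
modernbert_tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
# AutoModel exposes hidden states, so the [CLS] embedding can be read directly
modernbert_model = AutoModel.from_pretrained("answerdotai/ModernBERT-base").eval()
sbert_model = SentenceTransformer("all-MiniLM-L6-v2")

# Load the fine-tuned model
# Replace with actual path when available
# Assumption: the file holds a pickled torch.nn.Module; torch >= 2.6 then needs weights_only=False
model = torch.load("path/to/h25471ds-m19364tg-ED", map_location="cpu", weights_only=False)
model.eval()

# Process input
def predict(claim, evidence):
    # Preprocess text
    # ... preprocessing code here ...

    # Get the ModernBERT [CLS] embedding for the claim-evidence pair
    inputs = modernbert_tokenizer(claim, evidence, truncation=True, max_length=8192, return_tensors="pt")
    with torch.no_grad():
        cls_embedding = modernbert_model(**inputs).last_hidden_state[:, 0]

    # Get SBERT embeddings (384 dimensions each) as CPU tensors of shape (1, 384)
    sbert_claim = torch.from_numpy(sbert_model.encode(claim)).unsqueeze(0)
    sbert_evidence = torch.from_numpy(sbert_model.encode(evidence)).unsqueeze(0)

    # Combine embeddings and predict
    # Assumption: `model` takes the concatenated vector and returns one logit;
    # adapt this step to the actual checkpoint interface
    features = torch.cat([cls_embedding, sbert_claim, sbert_evidence], dim=-1)
    with torch.no_grad():
        probability = torch.sigmoid(model(features)).item()

    # 1 = evidence supports the claim, 0 = it does not
    prediction = int(probability >= THRESHOLD)
    return prediction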

Training Details

Training Data

Training data consisted of claim-evidence pairs for evidence detection. To address class imbalance, the minority (positive) class was augmented using synonym replacement.
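
As an illustration of synonym-replacement augmentation, here is a minimal sketch. The synonym table, the replacement probability, and the function name `synonym_replace` are hypothetical; this card does not say which synonym source or replacement rate was used.

```python
import random

# Hypothetical synonym table for illustration only
SYNONYMS = {
    "shows": ["demonstrates", "indicates"],
    "increase": ["rise", "growth"],
    "study": ["analysis"],
}


def synonym_replace(text: str, prob: float = 0.3, seed: int = 0) -> str:
    """Replace each word that has a known synonym with probability `prob`."""
    rng = random.Random(seed)  # seeded so augmented examples are reproducible
    out = []
    for word in text.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < prob:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


# Augment only the positive (minority) class, keeping the original label
positive_evidence = "The study shows an increase"
augmented = synonym_replace(positive_evidence, prob=1.0)
```

Each augmented sentence would be paired with the same claim and the same positive label as the original, which raises the share of positive examples without collecting new data.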

Training Procedure

Preprocessing

The preprocessing pipeline includes:

  1. Removing reference tags like [REF], [REF, REF]
  2. Normalizing accented characters using unidecode
  3. Cleaning up irregular spacing around punctuation
  4. Normalizing whitespace
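
The four steps above can be sketched as follows. The regular expressions and the function name `preprocess` are assumptions, since this card does not list the exact patterns; the standard-library fallback for `unidecode` is also an assumption, added only so the sketch runs without the third-party package.

```python
import re
import unicodedata

try:
    from unidecode import unidecode  # the library named in this card
except ImportError:
    def unidecode(text: str) -> str:
        # Stdlib fallback (assumption): drop combining marks after NFKD decomposition
        decomposed = unicodedata.normalize("NFKD", text)
        return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical pattern for tags such as [REF] or [REF, REF]
REF_TAG = re.compile(r"\[\s*REF(?:\s*,\s*REF)*\s*\]")


def preprocess(text: str) -> str:
    text = REF_TAG.sub(" ", text)                 # 1. remove reference tags
    text = unidecode(text)                        # 2. normalize accented characters
    text = re.sub(r"\s+([.,;:!?])", r"\1", text)  # 3. remove spaces before punctuation
    text = re.sub(r"\s+", " ", text).strip()      # 4. collapse repeated whitespace
    return text


cleaned = preprocess("The  café opened in 1990 [REF, REF] , say critics .")
```

Applying the same function to both the claim and the evidence at inference time keeps inputs consistent with what the model saw during training.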

Training Hyperparameters

  • Training regime: 4-bit (nf4) quantization with QLoRA
  • learning_rate: 0.0002643238333834569
  • batch_size: 64
  • num_epochs: 5
  • weight_decay: 0.048207625326781293
  • warmup_ratio: 0.19552784843595056
  • gradient_accumulation_steps: 4
  • lora_r: 56
  • lora_alpha: 40
  • lora_dropout: 0.07644825534662132
  • classifier_dropout: 0.2659719581055393
  • classifier_hidden_size: 768
  • max_length: 8192
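
A few derived quantities follow from these values and may help when reproducing the run. The sketch below assumes that batch_size is the per-step batch before gradient accumulation, that standard LoRA scaling (lora_alpha / lora_r) applies, and that warmup steps are rounded up; the dataset size passed to `schedule` is hypothetical.

```python
import math

# Values copied from the hyperparameter list above
batch_size = 64
gradient_accumulation_steps = 4
num_epochs = 5
warmup_ratio = 0.19552784843595056
lora_r = 56
lora_alpha = 40

# Examples contributing to each optimizer update (assumption: single device)
effective_batch_size = batch_size * gradient_accumulation_steps

# Standard LoRA scales the low-rank update by alpha / r
lora_scaling = lora_alpha / lora_r


def schedule(num_examples: int) -> tuple[int, int]:
    """Approximate (total optimizer steps, warmup steps) for a dataset size."""
    steps_per_epoch = math.ceil(num_examples / effective_batch_size)
    total_steps = steps_per_epoch * num_epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_steps, warmup_steps


# Hypothetical training set of 25,600 pairs
total_steps, warmup_steps = schedule(25_600)
```

Under these assumptions each optimizer update sees 256 examples and the LoRA update is scaled by roughly 0.71, so the adapter contribution is slightly damped relative to the common alpha = r setting.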

Speeds, Sizes, Times

  • Model size: The base ModernBERT model is loaded in 4-bit quantization
  • SBERT embeddings dimension: 384
  • Memory footprint: Reduced due to 4-bit quantization and parameter-efficient fine-tuning

Evaluation

Testing Data, Factors & Metrics

Testing Data

Development set with claim-evidence pairs for evidence detection.

Factors

The evaluation focused on the model's ability to correctly classify evidence as supporting or not supporting claims across various domains and claim types.

Metrics

The following metrics were used to evaluate model performance:

  • Accuracy: Proportion of correct predictions
  • Precision: Proportion of positive identifications that were actually correct
  • Recall: Proportion of actual positives that were identified correctly
  • F1-Score: Harmonic mean of precision and recall
  • Matthews Correlation Coefficient: Correlation coefficient between observed and predicted binary classifications
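
For reference, these metrics can all be computed from the four cells of a binary confusion matrix. The sketch below is illustrative rather than the evaluation script used for this card; the function names and the example counts are hypothetical.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, positive-class precision/recall/F1, and MCC from confusion counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def macro_f1(tp: int, fp: int, fn: int, tn: int) -> float:
    """Unweighted mean of the F1 scores of the positive and negative classes."""
    positive = binary_metrics(tp, fp, fn, tn)["f1"]
    # Swapping the roles of the classes gives the negative-class F1
    negative = binary_metrics(tn, fn, fp, tp)["f1"]
    return (positive + negative) / 2


# Hypothetical counts, not results from this model
example = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

Macro averages weight both classes equally, while weighted averages weight each class by its number of examples, which is why the two sets of figures differ on an imbalanced development set.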

Results

Summary

  • Accuracy: 0.87377657779278
  • Macro Precision: 0.83764094620994
  • Macro Recall: 0.86135532021442
  • Macro F1-Score: 0.84790707217937
  • Weighted Precision: 0.88028808321627
  • Weighted Recall: 0.87377657779278
  • Weighted F1-Score: 0.87591472842040
  • Matthews Correlation Coefficient: 0.69859387983347

The model achieved a Macro F1-score of 0.848 (84.8%) and an accuracy of 0.874 (87.4%) on the development set.

Environmental Impact

  • Hardware Type: CUDA-compatible GPU with T4 (Turing) architecture or newer
  • Hours used: Not specified
  • Cloud Provider: Not specified
  • Compute Region: Not specified
  • Carbon Emitted: Not calculated, but the use of 4-bit quantization and QLoRA significantly reduces the computational requirements compared to full-precision fine-tuning

Technical Specifications

Model Architecture and Objective

The model combines ModernBERT's contextual understanding with SBERT's semantic similarity capabilities. It first extracts the [CLS] token embedding from ModernBERT, then concatenates it with SBERT embeddings before passing through the classification layers.
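
A minimal PyTorch sketch of such a classification head is shown below. The dimensions (768 for the ModernBERT-base [CLS] embedding, 384 for SBERT), the hidden size, and the dropout rate come from this card. The class name `DualEmbeddingClassifier`, the GELU activation, the single-logit output, and the use of separate SBERT vectors for claim and evidence are assumptions; the card does not specify them.

```python
import torch
from torch import nn


class DualEmbeddingClassifier(nn.Module):
    """Hypothetical head: concatenate embeddings, then a two-layer classifier."""

    def __init__(self, cls_dim: int = 768, sbert_dim: int = 384,
                 hidden_size: int = 768, dropout: float = 0.2659719581055393):
        super().__init__()
        # Assumption: one SBERT vector for the claim and one for the evidence
        input_dim = cls_dim + 2 * sbert_dim
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_size),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_size, 1),  # single logit for the "supports" class
        )

    def forward(self, cls_embedding, sbert_claim, sbert_evidence):
        features = torch.cat([cls_embedding, sbert_claim, sbert_evidence], dim=-1)
        return self.net(features).squeeze(-1)


head = DualEmbeddingClassifier().eval()
with torch.no_grad():
    # Random tensors stand in for real embeddings of a batch of two pairs
    logits = head(torch.randn(2, 768), torch.randn(2, 384), torch.randn(2, 384))
    probabilities = torch.sigmoid(logits)
```

Keeping the SBERT vectors outside the ModernBERT encoder means only the ModernBERT adapter weights and this small head need gradients, which fits the parameter-efficient setup described in this card.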

Compute Infrastructure

Hardware

  • RAM: at least 16 GB
  • Storage: at least 2 GB
  • GPU: CUDA-compatible GPU with T4 (Turing) architecture or newer
  • Training requirements: T4 or newer GPU architecture to support flash-attention
  • Inference requirements: Can be performed on less powerful GPUs with 4-bit quantization

Software

  • torch: 2.6.0+cu126
  • transformers
  • peft: 0.15.1 (for the QLoRA implementation)
  • bitsandbytes (for 4-bit quantization)
  • flash-attn (for efficient attention computation)
  • sentence-transformers
  • scikit-learn
  • numpy
  • pandas
  • unidecode (for text normalization)
  • re, from the Python standard library (for text cleaning)

More Information

The model combines the strengths of ModernBERT's long context understanding with SBERT's semantic similarity capabilities. The use of QLoRA and 4-bit quantization enables efficient fine-tuning with significantly reduced memory requirements compared to full-precision fine-tuning. Flash-attention provides computational speedups during training and inference on compatible hardware.

Hyperparameters were selected through a systematic search.

Model Card Contact

For inquiries about this model, please contact through the GitHub repository: https://github.com/chuongg3/NLU-EvidenceDetection
