add full app and model initial test
Files changed:
- README.md +68 -11
- app.py +105 -4
- classifiers.py +141 -0
- config.yaml +12 -0
- inference.py +79 -0
- models.py +172 -0
- requirements.txt +18 -0
- src/config.yaml +46 -0
- train_utils.py +156 -0
- upload_to_hf.py +110 -0
README.md
CHANGED
@@ -1,14 +1,71 @@
---
language: en
tags:
- sentiment-analysis
- modernbert
- imdb
datasets:
- imdb
metrics:
- accuracy
- f1
---

# ModernBERT IMDb Sentiment Analysis Model

## Model Description
Fine-tuned ModernBERT model for sentiment analysis on IMDb movie reviews. Achieves 95.75% accuracy on the test set.

## Usage
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("voxmenthe/modernbert-imdb-sentiment")
tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")

# Input processing
inputs = tokenizer("This movie was fantastic!", return_tensors="pt")
outputs = model(**inputs)

# Get the predicted class
predicted_class_id = outputs.logits.argmax().item()

# Convert class ID to label
predicted_label = model.config.id2label[predicted_class_id]
print(f"Predicted label: {predicted_label}")
```

## Model Card

### Model Details
- **Model Name**: ModernBERT IMDb Sentiment Analysis
- **Base Model**: answerdotai/ModernBERT-base
- **Task**: Sentiment Analysis
- **Dataset**: IMDb Movie Reviews
- **Training Epochs**: 5

### Model Performance
- **Test Accuracy**: 95.75%
- **Test F1 Score**: 95.75%

### Model Architecture
- **Base Model**: answerdotai/ModernBERT-base
- **Task-Specific Head**: ClassifierHead (from `classifiers.py`)
- **Number of Labels**: 2 (Positive, Negative)

### Model Inference
- **Input Format**: Text (single review)
- **Output Format**: Predicted sentiment label (Positive or Negative)

### Model Version
- **Version**: 1.0
- **Date**: 2025-05-07

### Model License
- **License**: MIT License

### Model Contact
- **Contact**: [email protected]

### Model Citation
- **Citation**: voxmenthe/modernbert-imdb-sentiment
app.py
CHANGED
@@ -1,7 +1,108 @@
```python
import gradio as gr
from inference import SentimentInference
import os
from datasets import load_dataset
import random

# --- Initialize Sentiment Model ---
CONFIG_PATH = os.path.join(os.path.dirname(__file__), "config.yaml")
if not os.path.exists(CONFIG_PATH):
    CONFIG_PATH = "config.yaml"
    if not os.path.exists(CONFIG_PATH):
        raise FileNotFoundError(
            f"Configuration file not found. Tried {os.path.join(os.path.dirname(__file__), 'config.yaml')} and {CONFIG_PATH}. "
            f"Ensure 'config.yaml' exists and is accessible."
        )

print(f"Loading model with config: {CONFIG_PATH}")
try:
    sentiment_inferer = SentimentInference(config_path=CONFIG_PATH)
    print("Sentiment model loaded successfully.")
except Exception as e:
    print(f"Error loading sentiment model: {e}")
    sentiment_inferer = None

# --- Load IMDB Dataset ---
print("Loading IMDB dataset for samples...")
try:
    imdb_dataset = load_dataset("imdb", split="test")
    print("IMDB dataset loaded successfully.")
except Exception as e:
    print(f"Failed to load IMDB dataset: {e}. Sample loading will be disabled.")
    imdb_dataset = None

def load_random_imdb_sample():
    """Loads a random sample text from the IMDB dataset."""
    if imdb_dataset is None:
        return "IMDB dataset not available. Cannot load sample.", None
    random_index = random.randint(0, len(imdb_dataset) - 1)
    sample = imdb_dataset[random_index]
    return sample["text"], sample["label"]

def predict_sentiment(text_input, true_label_state):
    """Predicts sentiment for the given text_input."""
    if sentiment_inferer is None:
        return "Error: Sentiment model could not be loaded. Please check the logs.", true_label_state

    if not text_input or not text_input.strip():
        return "Please enter some text for analysis.", true_label_state

    try:
        prediction = sentiment_inferer.predict(text_input)
        sentiment = prediction['sentiment']

        # Convert numerical label to text if available
        true_sentiment = None
        if true_label_state is not None:
            true_sentiment = "positive" if true_label_state == 1 else "negative"

        result = f"Predicted Sentiment: {sentiment.capitalize()}"
        if true_sentiment:
            result += f"\nTrue IMDB Label: {true_sentiment.capitalize()}"

        return result, None  # Reset true label state after display

    except Exception as e:
        print(f"Error during prediction: {e}")
        return f"Error during prediction: {str(e)}", true_label_state

# --- Gradio Interface ---
with gr.Blocks() as demo:
    true_label = gr.State()

    gr.Markdown("## IMDb Sentiment Analyzer")
    gr.Markdown("Enter a movie review to classify its sentiment as Positive or Negative, or load a random sample from the IMDb dataset.")

    with gr.Row():
        input_textbox = gr.Textbox(lines=7, placeholder="Enter movie review here...", label="Movie Review", scale=3)
        output_text = gr.Text(label="Analysis Result", scale=1)

    with gr.Row():
        submit_button = gr.Button("Analyze Sentiment")
        load_sample_button = gr.Button("Load Random IMDB Sample")

    gr.Examples(
        examples=[
            ["This movie was absolutely fantastic! The acting was superb and the plot was gripping."],
            ["I was really disappointed with this film. It was boring and the story made no sense."],
            ["An average movie, had some good parts but overall quite forgettable."],
            ["Wow so I don't think I've ever seen a movie quite like that. The plot was... interesting, and the acting was, well, hmm."]
        ],
        inputs=input_textbox
    )

    # Wire actions
    submit_button.click(
        fn=predict_sentiment,
        inputs=[input_textbox, true_label],
        outputs=[output_text, true_label]
    )
    load_sample_button.click(
        fn=load_random_imdb_sample,
        inputs=None,
        outputs=[input_textbox, true_label]
    )

if __name__ == '__main__':
    print("Launching Gradio interface...")
    demo.launch(share=False)
```
classifiers.py
ADDED
@@ -0,0 +1,141 @@
```python
from torch import nn
import torch


class ClassifierHead(nn.Module):
    """Basically a fancy MLP: 3-layer classifier head with GELU, LayerNorm, and Skip Connections."""
    def __init__(self, hidden_size, num_labels, dropout_prob):
        super().__init__()
        # Layer 1
        self.dense1 = nn.Linear(hidden_size, hidden_size)
        self.norm1 = nn.LayerNorm(hidden_size)
        self.activation = nn.GELU()
        self.dropout1 = nn.Dropout(dropout_prob)

        # Layer 2
        self.dense2 = nn.Linear(hidden_size, hidden_size)
        self.norm2 = nn.LayerNorm(hidden_size)
        self.dropout2 = nn.Dropout(dropout_prob)

        # Output Layer
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, features):
        # Layer 1
        identity1 = features
        x = self.norm1(features)
        x = self.dense1(x)
        x = self.activation(x)
        x = self.dropout1(x)
        x = x + identity1  # skip connection

        # Layer 2
        identity2 = x
        x = self.norm2(x)
        x = self.dense2(x)
        x = self.activation(x)
        x = self.dropout2(x)
        x = x + identity2  # skip connection

        # Output Layer
        logits = self.out_proj(x)
        return logits


class ConcatClassifierHead(nn.Module):
    """
    An enhanced classifier head designed for concatenated CLS + Mean Pooling input.
    Includes an initial projection layer before the standard enhanced block.
    """
    def __init__(self, input_size, hidden_size, num_labels, dropout_prob):
        super().__init__()
        # Initial projection from concatenated size (2*hidden) down to hidden_size
        self.initial_projection = nn.Linear(input_size, hidden_size)
        self.initial_norm = nn.LayerNorm(hidden_size)  # Norm after projection
        self.initial_activation = nn.GELU()
        self.initial_dropout = nn.Dropout(dropout_prob)

        # Layer 1
        self.dense1 = nn.Linear(hidden_size, hidden_size)
        self.norm1 = nn.LayerNorm(hidden_size)
        self.activation = nn.GELU()
        self.dropout1 = nn.Dropout(dropout_prob)

        # Layer 2
        self.dense2 = nn.Linear(hidden_size, hidden_size)
        self.norm2 = nn.LayerNorm(hidden_size)
        self.dropout2 = nn.Dropout(dropout_prob)

        # Output Layer
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, features):
        # Initial Projection Step
        x = self.initial_projection(features)
        x = self.initial_norm(x)
        x = self.initial_activation(x)
        x = self.initial_dropout(x)
        # x should now be of shape (batch_size, hidden_size)

        # Layer 1 + Skip
        identity1 = x  # Skip connection starts after initial projection
        x_res = self.norm1(x)
        x_res = self.dense1(x_res)
        x_res = self.activation(x_res)
        x_res = self.dropout1(x_res)
        x = x + x_res  # skip connection

        # Layer 2 + Skip
        identity2 = x
        x_res = self.norm2(x)
        x_res = self.dense2(x_res)
        x_res = self.activation(x_res)
        x_res = self.dropout2(x_res)
        x = x + x_res  # skip connection

        # Output Layer
        logits = self.out_proj(x)
        return logits


# ExpansionClassifierHead currently not used
class ExpansionClassifierHead(nn.Module):
    """
    A classifier head using FFN-style expansion (input -> 4*hidden -> hidden -> labels).
    Takes concatenated CLS + Mean Pooled features as input.
    """
    def __init__(self, input_size, hidden_size, num_labels, dropout_prob):
        super().__init__()
        intermediate_size = hidden_size * 4  # FFN expansion factor

        # Layer 1 (Expansion)
        self.norm1 = nn.LayerNorm(input_size)
        self.dense1 = nn.Linear(input_size, intermediate_size)
        self.activation = nn.GELU()
        self.dropout1 = nn.Dropout(dropout_prob)

        # Layer 2 (Projection back down)
        self.norm2 = nn.LayerNorm(intermediate_size)
        self.dense2 = nn.Linear(intermediate_size, hidden_size)
        # Activation and Dropout applied after projection
        self.dropout2 = nn.Dropout(dropout_prob)

        # Output Layer
        self.out_proj = nn.Linear(hidden_size, num_labels)

    def forward(self, features):
        # Layer 1
        x = self.norm1(features)
        x = self.dense1(x)
        x = self.activation(x)
        x = self.dropout1(x)

        # Layer 2
        x = self.norm2(x)
        x = self.dense2(x)
        x = self.activation(x)
        x = self.dropout2(x)

        # Output Layer
        logits = self.out_proj(x)
        return logits
```
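A quick shape check of the two heads used by the model; this is a sketch, not part of the commit, and the values `hidden_size=768` and `num_labels=1` are assumptions matching ModernBERT-base and the single-logit setup in `inference.py` below.

```python
import torch
from classifiers import ClassifierHead, ConcatClassifierHead

hidden_size, num_labels, batch = 768, 1, 4  # assumed values for illustration
cls_head = ClassifierHead(hidden_size, num_labels, dropout_prob=0.1)
concat_head = ConcatClassifierHead(input_size=2 * hidden_size, hidden_size=hidden_size,
                                   num_labels=num_labels, dropout_prob=0.1)

pooled = torch.randn(batch, hidden_size)              # e.g. mean-pooled features
concatenated = torch.randn(batch, 2 * hidden_size)    # e.g. CLS + mean pooling concatenated

print(cls_head(pooled).shape)           # torch.Size([4, 1])
print(concat_head(concatenated).shape)  # torch.Size([4, 1])
```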
config.yaml
ADDED
@@ -0,0 +1,12 @@
```yaml
model:
  name: "voxmenthe/modernbert-imdb-sentiment"
  output_dir: "checkpoints"
  max_length: 880 # 256
  dropout: 0.1
  pooling_strategy: "mean" # Current default, change as needed

inference:
  # Default path, can be overridden
  model_path: "checkpoints/mean_epoch5_0.9575acc_0.9575f1.pt"
  # Using the same max_length as training for consistency
  max_length: 880 # 256
```
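As a minimal sketch of how this file is consumed (the `SentimentInference` class in `inference.py` below does the same thing in full):

```python
import yaml

with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

base_model_name = cfg["model"]["name"]            # "voxmenthe/modernbert-imdb-sentiment"
checkpoint_path = cfg["inference"]["model_path"]  # "checkpoints/mean_epoch5_0.9575acc_0.9575f1.pt"
max_length = cfg["inference"]["max_length"]       # 880
```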
inference.py
ADDED
@@ -0,0 +1,79 @@
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from models import ModernBertForSentiment
from transformers import ModernBertConfig
from typing import Dict, Any
import yaml
import os


class SentimentInference:
    def __init__(self, config_path: str = "config.yaml"):
        """Load configuration and initialize model and tokenizer."""
        with open(config_path, 'r') as f:
            config = yaml.safe_load(f)

        model_cfg = config.get('model', {})
        inference_cfg = config.get('inference', {})

        # Path to the .pt model weights file
        model_weights_path = inference_cfg.get('model_path',
                                               os.path.join(model_cfg.get('output_dir', 'checkpoints'), 'best_model.pt'))

        # Base model name from config (e.g., 'answerdotai/ModernBERT-base')
        # This will be used for loading both tokenizer and base BERT config from Hugging Face Hub
        base_model_name = model_cfg.get('name', 'answerdotai/ModernBERT-base')

        self.max_length = inference_cfg.get('max_length', model_cfg.get('max_length', 256))

        # Load tokenizer from the base model name (e.g., from Hugging Face Hub)
        print(f"Loading tokenizer from: {base_model_name}")
        self.tokenizer = AutoTokenizer.from_pretrained(base_model_name)

        # Load base BERT config from the base model name
        print(f"Loading ModernBertConfig from: {base_model_name}")
        bert_config = ModernBertConfig.from_pretrained(base_model_name)

        # --- Apply any necessary overrides from your config to the loaded bert_config ---
        # For example, if your ModernBertForSentiment expects specific config values beyond the base BERT model.
        # Your current ModernBertForSentiment takes the entire config object, which might implicitly carry these.
        # However, explicitly setting them on bert_config loaded from HF is safer if they are architecturally relevant.
        bert_config.classifier_dropout = model_cfg.get('dropout', bert_config.classifier_dropout)  # Example
        # Ensure num_labels is set if your inference model needs it (usually for HF pipeline, less so for manual predict)
        # bert_config.num_labels = model_cfg.get('num_labels', 1)  # Typically 1 for binary sentiment regression-style output

        # It's also important that pooling_strategy and num_weighted_layers are set on the config object
        # that ModernBertForSentiment receives, as it uses these to build its layers.
        # These are usually fine-tuning specific, not part of the base HF config, so they should come from your model_cfg.
        bert_config.pooling_strategy = model_cfg.get('pooling_strategy', 'cls')
        bert_config.num_weighted_layers = model_cfg.get('num_weighted_layers', 4)
        bert_config.loss_function = model_cfg.get('loss_function', {'name': 'SentimentWeightedLoss', 'params': {}})  # Needed by model init
        # Ensure num_labels is explicitly set for the model's classifier head
        bert_config.num_labels = 1  # For sentiment (positive/negative) often treated as 1 logit output

        print("Instantiating ModernBertForSentiment model structure...")
        self.model = ModernBertForSentiment(bert_config)

        print(f"Loading model weights from local checkpoint: {model_weights_path}")
        # Load the entire checkpoint dictionary first
        checkpoint = torch.load(model_weights_path, map_location=torch.device('cpu'))

        # Extract the model_state_dict from the checkpoint
        # This handles the case where the checkpoint saves more than just the model weights (e.g., optimizer state, epoch)
        if 'model_state_dict' in checkpoint:
            model_state_to_load = checkpoint['model_state_dict']
        else:
            # If the checkpoint is just the state_dict itself (older format or different saving convention)
            model_state_to_load = checkpoint

        self.model.load_state_dict(model_state_to_load)
        self.model.eval()
        print("Model loaded successfully.")

    def predict(self, text: str) -> Dict[str, Any]:
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, max_length=self.max_length)
        with torch.no_grad():
            outputs = self.model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])
        logits = outputs["logits"]
        prob = torch.sigmoid(logits).item()
        return {"sentiment": "positive" if prob > 0.5 else "negative", "confidence": prob}
```
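A minimal usage sketch, assuming `config.yaml` and the local checkpoint it points to are present:

```python
from inference import SentimentInference

inferer = SentimentInference(config_path="config.yaml")
result = inferer.predict("This movie was fantastic!")
print(result)  # e.g. {'sentiment': 'positive', 'confidence': 0.98}
```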
models.py
ADDED
@@ -0,0 +1,172 @@
```python
from transformers import ModernBertModel, ModernBertPreTrainedModel
from transformers.modeling_outputs import SequenceClassifierOutput
from torch import nn
import torch
from train_utils import SentimentWeightedLoss, SentimentFocalLoss
import torch.nn.functional as F

from classifiers import ClassifierHead, ConcatClassifierHead


class ModernBertForSentiment(ModernBertPreTrainedModel):
    """ModernBERT encoder with a dynamically configurable classification head and pooling strategy."""

    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.bert = ModernBertModel(config)  # Base BERT model, config may have output_hidden_states=True

        # Store pooling strategy from config
        self.pooling_strategy = getattr(config, 'pooling_strategy', 'mean')
        self.num_weighted_layers = getattr(config, 'num_weighted_layers', 4)

        if self.pooling_strategy in ['weighted_layer', 'cls_weighted_concat'] and not config.output_hidden_states:
            # This check is more of an assertion; train.py should set output_hidden_states=True
            raise ValueError(
                "output_hidden_states must be True in BertConfig for weighted_layer pooling."
            )

        # Initialize weights for weighted layer pooling
        if self.pooling_strategy in ['weighted_layer', 'cls_weighted_concat']:
            # num_weighted_layers specifies how many *top* layers of BERT to use.
            # If num_weighted_layers is e.g. 4, we use the last 4 layers.
            self.layer_weights = nn.Parameter(torch.ones(self.num_weighted_layers) / self.num_weighted_layers)

        # Determine classifier input size and choose head
        classifier_input_size = config.hidden_size
        if self.pooling_strategy in ['cls_mean_concat', 'cls_weighted_concat']:
            classifier_input_size = config.hidden_size * 2

        # Dropout for features fed into the classifier head
        classifier_dropout_prob = (
            config.classifier_dropout if config.classifier_dropout is not None else config.hidden_dropout_prob
        )
        self.features_dropout = nn.Dropout(classifier_dropout_prob)

        # Select the appropriate classifier head based on input feature dimension
        if classifier_input_size == config.hidden_size:
            self.classifier = ClassifierHead(
                hidden_size=config.hidden_size,  # input_size for ClassifierHead is just hidden_size
                num_labels=config.num_labels,
                dropout_prob=classifier_dropout_prob
            )
        elif classifier_input_size == config.hidden_size * 2:
            self.classifier = ConcatClassifierHead(
                input_size=config.hidden_size * 2,
                hidden_size=config.hidden_size,  # Internal hidden size of the head
                num_labels=config.num_labels,
                dropout_prob=classifier_dropout_prob
            )
        else:
            # This case should ideally not be reached with current strategies
            raise ValueError(f"Unexpected classifier_input_size: {classifier_input_size}")

        # Initialize loss function based on config
        loss_config = getattr(config, 'loss_function', {'name': 'SentimentWeightedLoss', 'params': {}})
        loss_name = loss_config.get('name', 'SentimentWeightedLoss')
        loss_params = loss_config.get('params', {})

        if loss_name == "SentimentWeightedLoss":
            self.loss_fct = SentimentWeightedLoss()  # SentimentWeightedLoss takes no arguments
        elif loss_name == "SentimentFocalLoss":
            # Ensure only relevant params are passed, or that loss_params is structured correctly for SentimentFocalLoss
            # For SentimentFocalLoss, expected params are 'gamma_focal' and 'label_smoothing_epsilon'
            self.loss_fct = SentimentFocalLoss(**loss_params)
        else:
            raise ValueError(f"Unsupported loss function: {loss_name}")

        self.post_init()  # Initialize weights and apply final processing

    def _mean_pool(self, last_hidden_state, attention_mask):
        if attention_mask is None:
            attention_mask = torch.ones_like(last_hidden_state[:, :, 0])  # Assuming first dim of last hidden state is token ids
        input_mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
        sum_embeddings = torch.sum(last_hidden_state * input_mask_expanded, 1)
        sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
        return sum_embeddings / sum_mask

    def _weighted_layer_pool(self, all_hidden_states):
        # all_hidden_states includes embeddings + output of each layer.
        # We want the outputs of the last num_weighted_layers.
        # Example: 12 layers -> all_hidden_states have 13 items (embeddings + 12 layers)
        # num_weighted_layers = 4 -> use layers 9, 10, 11, 12 (indices -4, -3, -2, -1)
        layers_to_weigh = torch.stack(all_hidden_states[-self.num_weighted_layers:], dim=0)
        # layers_to_weigh shape: (num_weighted_layers, batch_size, sequence_length, hidden_size)

        # Normalize weights to sum to 1 (softmax or simple division)
        normalized_weights = F.softmax(self.layer_weights, dim=-1)

        # Weighted sum across layers
        # Reshape weights for broadcasting: (num_weighted_layers, 1, 1, 1)
        weighted_hidden_states = layers_to_weigh * normalized_weights.view(-1, 1, 1, 1)
        weighted_sum_hidden_states = torch.sum(weighted_hidden_states, dim=0)
        # weighted_sum_hidden_states shape: (batch_size, sequence_length, hidden_size)

        # Pool the result (e.g., take [CLS] token of this weighted sum)
        return weighted_sum_hidden_states[:, 0]  # Return CLS token of the weighted sum

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        labels=None,
        lengths=None,
        return_dict=None,
        **kwargs
    ):
        return_dict = return_dict if return_dict is not None else self.config.use_return_dict

        bert_outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            return_dict=return_dict,
            output_hidden_states=self.config.output_hidden_states  # Controlled by train.py
        )

        last_hidden_state = bert_outputs[0]  # Or bert_outputs.last_hidden_state
        pooled_features = None

        if self.pooling_strategy == 'cls':
            pooled_features = last_hidden_state[:, 0]  # CLS token
        elif self.pooling_strategy == 'mean':
            pooled_features = self._mean_pool(last_hidden_state, attention_mask)
        elif self.pooling_strategy == 'cls_mean_concat':
            cls_output = last_hidden_state[:, 0]
            mean_output = self._mean_pool(last_hidden_state, attention_mask)
            pooled_features = torch.cat((cls_output, mean_output), dim=1)
        elif self.pooling_strategy == 'weighted_layer':
            if not self.config.output_hidden_states or bert_outputs.hidden_states is None:
                raise ValueError("Weighted layer pooling requires output_hidden_states=True and hidden_states in BERT output.")
            all_hidden_states = bert_outputs.hidden_states
            pooled_features = self._weighted_layer_pool(all_hidden_states)
        elif self.pooling_strategy == 'cls_weighted_concat':
            if not self.config.output_hidden_states or bert_outputs.hidden_states is None:
                raise ValueError("Weighted layer pooling requires output_hidden_states=True and hidden_states in BERT output.")
            cls_output = last_hidden_state[:, 0]
            all_hidden_states = bert_outputs.hidden_states
            weighted_output = self._weighted_layer_pool(all_hidden_states)
            pooled_features = torch.cat((cls_output, weighted_output), dim=1)
        else:
            raise ValueError(f"Unknown pooling_strategy: {self.pooling_strategy}")

        pooled_features = self.features_dropout(pooled_features)
        logits = self.classifier(pooled_features)

        loss = None
        if labels is not None:
            if lengths is None:
                raise ValueError("lengths must be provided when labels are specified for loss calculation.")
            loss = self.loss_fct(logits.squeeze(-1), labels, lengths)

        if not return_dict:
            # Ensure 'outputs' from BERT is appropriately handled. If it's a tuple:
            bert_model_outputs = bert_outputs[1:] if isinstance(bert_outputs, tuple) else (bert_outputs.hidden_states, bert_outputs.attentions)
            output = (logits,) + bert_model_outputs
            return ((loss,) + output) if loss is not None else output

        return SequenceClassifierOutput(
            loss=loss,
            logits=logits,
            hidden_states=bert_outputs.hidden_states,
            attentions=bert_outputs.attentions,
        )
```
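A sketch of building the model standalone, mirroring how `inference.py` attaches the fine-tuning-specific attributes to a `ModernBertConfig` before instantiation; the attribute values here are assumptions taken from `config.yaml`:

```python
from transformers import ModernBertConfig
from models import ModernBertForSentiment

config = ModernBertConfig.from_pretrained("answerdotai/ModernBERT-base")
config.pooling_strategy = "mean"        # or "cls", "cls_mean_concat", "weighted_layer", "cls_weighted_concat"
config.num_weighted_layers = 4          # only used by the weighted_layer strategies
config.classifier_dropout = 0.1
config.num_labels = 1                   # single sigmoid logit, as in inference.py
config.loss_function = {"name": "SentimentWeightedLoss", "params": {}}

model = ModernBertForSentiment(config)  # head is randomly initialized until a checkpoint is loaded
```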
requirements.txt
ADDED
@@ -0,0 +1,18 @@
```
gradio
ipykernel
ipywidgets
tqdm
kagglehub
transformers>=4.51.3,<5.0.0
torch>=2.7.0,<2.8.0
datasets>=2.16.1,<2.17.0
markdown>=3.7.0,<4.0.0
matplotlib>=3.9.0,<4.0.0
notebook>=7.2.0,<8.0.0
numpy>=2.1.0,<3.0.0
pandas>=2.2.0,<3.0.0
python-json-logger>=2.0.7,<3.0.0
requests>=2.27.1,<3.0.0
scikit-learn>=1.5.0
seaborn>=0.13.0
weasyprint
```
src/config.yaml
ADDED
@@ -0,0 +1,46 @@
```yaml
model:
  name: "voxmenthe/modernbert-imdb-sentiment"
  loss_function:
    name: "SentimentWeightedLoss" # Options: "SentimentWeightedLoss", "SentimentFocalLoss"
    # Parameters for the chosen loss function.
    # For SentimentFocalLoss, common params are:
    # gamma_focal: 1.0 # (e.g., 2.0 for standard, -2.0 for reversed, 0 for none)
    # label_smoothing_epsilon: 0.05 # (e.g., 0.0 to 0.1)
    # For SentimentWeightedLoss, params is empty:
    params:
      gamma_focal: 1.0
      label_smoothing_epsilon: 0.05
  output_dir: "checkpoints"
  max_length: 880 # 256
  dropout: 0.1
  # --- Pooling Strategy --- #
  # Options: "cls", "mean", "cls_mean_concat", "weighted_layer", "cls_weighted_concat"
  # "cls" uses just the [CLS] token for classification
  # "mean" uses mean pooling over final hidden states for classification
  # "cls_mean_concat" uses both [CLS] and mean pooling over final hidden states for classification
  # "weighted_layer" uses a weighted combination of the final hidden states from the top N layers for classification
  # "cls_weighted_concat" uses a weighted combination of the final hidden states from the top N layers and the [CLS] token for classification

  pooling_strategy: "mean" # Current default, change as needed

  num_weighted_layers: 6 # Number of top BERT layers to use for 'weighted_layer' strategies (e.g., 1 to 12 for BERT-base)

data:
  # No specific data paths needed as we use HF datasets at the moment

training:
  epochs: 6
  batch_size: 16
  lr: 1e-5 # 1e-5 # 2.0e-5
  weight_decay_rate: 0.02 # 0.01
  resume_from_checkpoint: "" # "checkpoints/mean_epoch2_0.9361acc_0.9355f1.pt" # Path to checkpoint file, or empty to not resume

inference:
  # Default path, can be overridden
  model_path: "checkpoints/mean_epoch5_0.9575acc_0.9575f1.pt"
  # Using the same max_length as training for consistency
  max_length: 880 # 256


# "answerdotai/ModernBERT-base"
# "answerdotai/ModernBERT-large"
```
train_utils.py
ADDED
@@ -0,0 +1,156 @@
```python
import math
from torch import nn
import torch
import torch.nn.functional as F


class SentimentWeightedLoss(nn.Module):
    """BCEWithLogits + dynamic weighting.

    We weight each sample by:
      • length_weight: sqrt(num_tokens) / sqrt(max_tokens)
      • confidence_weight: |sigmoid(logits) - 0.5| (higher confidence ⇒ larger weight)

    The two weights are combined multiplicatively then normalized.
    """

    def __init__(self):
        super().__init__()
        # Initialize BCE loss without reduction, since we're applying per-sample weights
        self.bce = nn.BCEWithLogitsLoss(reduction="none")
        self.min_len_weight_sqrt = 0.1  # Minimum length weight

    def forward(self, logits, targets, lengths):
        base_loss = self.bce(logits.view(-1), targets.float())  # shape [B]

        prob = torch.sigmoid(logits.view(-1))
        confidence_weight = (prob - 0.5).abs() * 2  # ∈ [0,1]

        if lengths.numel() == 0:
            # Handle empty batch: return 0.0 loss or mean of base_loss if it's also empty (becomes nan then)
            # If base_loss on empty input is empty tensor, mean is nan. So return 0.0 is safer.
            return torch.tensor(0.0, device=logits.device, requires_grad=logits.requires_grad)

        length_weight = torch.sqrt(lengths.float()) / math.sqrt(lengths.max().item())
        length_weight = length_weight.clamp(self.min_len_weight_sqrt, 1.0)  # Clamp to avoid extreme weights

        weights = confidence_weight * length_weight
        weights = weights / (weights.mean() + 1e-8)  # normalize so E[w]=1
        return (base_loss * weights).mean()


class SentimentFocalLoss(nn.Module):
    """
    This loss function incorporates:
    1. Base BCEWithLogitsLoss.
    2. Label Smoothing.
    3. Focal Loss modulation to focus more on hard examples (can be reversed to focus on easy examples).
    4. Sample weighting based on review length.
    5. Sample weighting based on prediction confidence.

    The final loss for each sample is calculated roughly as:
    Loss_sample = FocalModulator(pt, gamma) * BCE(logits, smoothed_targets) * NormalizedExternalWeight
    NormalizedExternalWeight = (ConfidenceWeight * LengthWeight) / Mean(ConfidenceWeight * LengthWeight)
    """

    def __init__(self, gamma_focal: float = 0.1, label_smoothing_epsilon: float = 0.05):
        """
        Args:
            gamma_focal (float): Gamma parameter for Focal Loss.
                - If gamma_focal > 0 (e.g., 2.0), applies standard Focal Loss,
                  down-weighting easy examples (focus on hard examples).
                - If gamma_focal < 0 (e.g., -2.0), applies a reversed Focal Loss,
                  down-weighting hard examples (focus on easy examples by up-weighting pt).
                - If gamma_focal = 0, no Focal Loss modulation is applied.
            label_smoothing_epsilon (float): Epsilon for label smoothing. (0.0 <= epsilon < 1.0)
                - If 0.0, no label smoothing is applied. Converts hard labels (0, 1)
                  to soft labels (epsilon, 1-epsilon).
        """
        super().__init__()
        if not (0.0 <= label_smoothing_epsilon < 1.0):
            raise ValueError("label_smoothing_epsilon must be between 0.0 and <1.0.")

        self.gamma_focal = gamma_focal
        self.label_smoothing_epsilon = label_smoothing_epsilon
        # Initialize BCE loss without reduction, since we're applying per-sample weights
        self.bce_loss_no_reduction = nn.BCEWithLogitsLoss(reduction="none")

    def forward(self, logits: torch.Tensor, targets: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        """
        Computes the custom loss.

        Args:
            logits (torch.Tensor): Raw logits from the model. Expected shape [B] or [B, 1].
            targets (torch.Tensor): Ground truth labels (0 or 1). Expected shape [B] or [B, 1].
            lengths (torch.Tensor): Number of tokens in each review. Expected shape [B].

        Returns:
            torch.Tensor: The computed scalar loss.
        """
        B = logits.size(0)
        if B == 0:  # Handle empty batch case
            return torch.tensor(0.0, device=logits.device, requires_grad=True)

        logits_flat = logits.view(-1)
        original_targets_flat = targets.view(-1).float()  # Ensure targets are float

        # 1. Label Smoothing
        if self.label_smoothing_epsilon > 0:
            # Smooth 1 to (1 - epsilon), and 0 to epsilon
            targets_for_bce = original_targets_flat * (1.0 - self.label_smoothing_epsilon) + \
                              (1.0 - original_targets_flat) * self.label_smoothing_epsilon
        else:
            targets_for_bce = original_targets_flat

        # 2. Calculate Base BCE loss terms (using potentially smoothed targets)
        base_bce_loss_terms = self.bce_loss_no_reduction(logits_flat, targets_for_bce)

        # 3. Focal Loss Modulation Component
        # For the focal modulator, 'pt' is the probability assigned by the model to the *original* ground truth class.
        probs = torch.sigmoid(logits_flat)
        # pt: probability of the original true class
        pt = torch.where(original_targets_flat.bool(), probs, 1.0 - probs)

        focal_modulator = torch.ones_like(pt)  # Default to 1 (no modulation if gamma_focal is 0)
        if self.gamma_focal > 0:  # Standard Focal Loss: (1-pt)^gamma. Focus on hard examples (pt is small).
            focal_modulator = (1.0 - pt + 1e-8).pow(self.gamma_focal)  # Epsilon for stability if pt is 1
        elif self.gamma_focal < 0:  # Reversed Focal: (pt)^|gamma|. Focus on easy examples (pt is large).
            focal_modulator = (pt + 1e-8).pow(abs(self.gamma_focal))  # Epsilon for stability if pt is 0

        modulated_loss_terms = focal_modulator * base_bce_loss_terms

        # 4. Confidence Weighting (based on how far probability is from 0.5)
        # Uses the same `probs` calculated for focal `pt`.
        confidence_w = (probs - 0.5).abs() * 2.0  # Scales to range [0, 1]

        # 5. Length Weighting (longer reviews potentially weighted more)
        lengths_flat = lengths.view(-1).float()
        max_len_in_batch = lengths_flat.max().item()

        if max_len_in_batch == 0:  # Edge case: if all reviews in batch have 0 length
            length_w = torch.ones_like(lengths_flat)
        else:
            # Normalize by sqrt of max length in the current batch. Add epsilon for stability.
            length_w = torch.sqrt(lengths_flat) / (math.sqrt(max_len_in_batch) + 1e-8)
        length_w = torch.clamp(length_w, 0.0, 1.0)  # Ensure weights are capped at 1

        # 6. Combine External Weights (Confidence and Length)
        # These weights are applied ON TOP of the focal-modulated loss terms.
        external_weights = confidence_w * length_w

        # Normalize these combined external_weights so their mean is approximately 1.
        # This prevents the weighting scheme from drastically changing the overall loss magnitude.
        if external_weights.sum() > 1e-8:  # Avoid division by zero if all weights are zero
            normalized_external_weights = external_weights / (external_weights.mean() + 1e-8)
        else:  # If all external weights are zero, use ones to not nullify the loss.
            normalized_external_weights = torch.ones_like(external_weights)

        # 7. Apply Normalized External Weights to the (Focal) Modulated Loss Terms
        final_loss_terms_per_sample = modulated_loss_terms * normalized_external_weights

        # 8. Final Reduction: Mean of the per-sample losses
        loss = final_loss_terms_per_sample.mean()

        return loss
```
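Both losses take `(logits, targets, lengths)`, matching the call `self.loss_fct(logits.squeeze(-1), labels, lengths)` in `models.py`. A quick sketch with dummy data:

```python
import torch
from train_utils import SentimentWeightedLoss, SentimentFocalLoss

logits = torch.randn(8)                      # one raw logit per review
labels = torch.randint(0, 2, (8,)).float()   # 0 = negative, 1 = positive
lengths = torch.randint(10, 400, (8,))       # token counts per review

print(SentimentWeightedLoss()(logits, labels, lengths))
print(SentimentFocalLoss(gamma_focal=1.0, label_smoothing_epsilon=0.05)(logits, labels, lengths))
```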
upload_to_hf.py
ADDED
@@ -0,0 +1,110 @@
```python
from huggingface_hub import HfApi, upload_folder, create_repo
from transformers import AutoTokenizer, AutoConfig
import os
import shutil
import tempfile

# --- Configuration ---
HUGGING_FACE_USERNAME = "voxmenthe"  # Your Hugging Face username
MODEL_NAME_ON_HF = "modernbert-imdb-sentiment"  # The name of the model on Hugging Face
REPO_ID = f"{HUGGING_FACE_USERNAME}/{MODEL_NAME_ON_HF}"

# Original base model from which the tokenizer and initial config were derived
ORIGINAL_BASE_MODEL_NAME = "answerdotai/ModernBERT-base"

# Local path to your fine-tuned model checkpoint
LOCAL_MODEL_CHECKPOINT_DIR = "checkpoints"
FINE_TUNED_MODEL_FILENAME = "mean_epoch5_0.9575acc_0.9575f1.pt"  # Your best checkpoint
# If your fine-tuned model is just a .pt file, ensure you also have a config.json for ModernBertForSentiment
# For simplicity, we'll re-save the config from the fine-tuned model structure if possible, or from original base.

# Files from your project to include (e.g., custom model code, inference script)
# The user has moved these to the root directory.
PROJECT_FILES_TO_UPLOAD = [
    "config.yaml",
    "inference.py",
    "models.py",
    "train_utils.py",
    "classifiers.py",
    "README.md"
]

def upload_model_and_tokenizer():
    api = HfApi()

    # Create the repository on Hugging Face Hub (if it doesn't exist)
    print(f"Creating repository {REPO_ID} on Hugging Face Hub...")
    create_repo(repo_id=REPO_ID, repo_type="model", exist_ok=True)

    # Create a temporary directory to gather all files for upload
    with tempfile.TemporaryDirectory() as tmp_upload_dir:
        print(f"Created temporary directory for upload: {tmp_upload_dir}")

        # 1. Save tokenizer files from the ORIGINAL_BASE_MODEL_NAME
        print(f"Saving tokenizer from {ORIGINAL_BASE_MODEL_NAME} to {tmp_upload_dir}...")
        try:
            tokenizer = AutoTokenizer.from_pretrained(ORIGINAL_BASE_MODEL_NAME)
            tokenizer.save_pretrained(tmp_upload_dir)
            print("Tokenizer files saved.")
        except Exception as e:
            print(f"Error saving tokenizer from {ORIGINAL_BASE_MODEL_NAME}: {e}")
            print("Please ensure this model name is correct and accessible.")
            return

        # 2. Save base model config.json (architecture) from ORIGINAL_BASE_MODEL_NAME
        # This is crucial for AutoModelForSequenceClassification.from_pretrained(REPO_ID) to work.
        print(f"Saving model config.json from {ORIGINAL_BASE_MODEL_NAME} to {tmp_upload_dir}...")
        try:
            config = AutoConfig.from_pretrained(ORIGINAL_BASE_MODEL_NAME)
            # If your fine-tuned ModernBertForSentiment has specific architectural changes in its config
            # that are NOT automatically handled by loading the state_dict (e.g. num_labels if not standard),
            # you might need to update 'config' here before saving.
            # For now, we assume the base config is sufficient or your model's state_dict handles it.
            config.save_pretrained(tmp_upload_dir)
            print("Model config.json saved.")
        except Exception as e:
            print(f"Error saving config.json from {ORIGINAL_BASE_MODEL_NAME}: {e}")
            return

        # 3. Copy fine-tuned model checkpoint to temporary directory
        # The fine-tuned weights should be named 'pytorch_model.bin' or 'model.safetensors' for HF to auto-load.
        # Or, your config.json in the repo should point to the custom name.
        # For simplicity, we'll rename it to HF standard name of pytorch_model.bin.
        local_checkpoint_path = os.path.join(LOCAL_MODEL_CHECKPOINT_DIR, FINE_TUNED_MODEL_FILENAME)
        if os.path.exists(local_checkpoint_path):
            hf_model_path = os.path.join(tmp_upload_dir, "pytorch_model.bin")
            shutil.copyfile(local_checkpoint_path, hf_model_path)
            print(f"Copied fine-tuned model {FINE_TUNED_MODEL_FILENAME} to {hf_model_path}.")
        else:
            print(f"Error: Fine-tuned model checkpoint {local_checkpoint_path} not found.")
            return

        # 4. Copy other project files
        for project_file in PROJECT_FILES_TO_UPLOAD:
            local_project_file_path = project_file  # Files are now at the root
            if os.path.exists(local_project_file_path):
                shutil.copy(local_project_file_path, os.path.join(tmp_upload_dir, os.path.basename(project_file)))
                print(f"Copied project file {project_file} to {tmp_upload_dir}.")
            else:
                print(f"Warning: Project file {project_file} not found at {local_project_file_path}.")

        # 5. Upload the contents of the temporary directory
        print(f"Uploading all files from {tmp_upload_dir} to {REPO_ID}...")
        try:
            upload_folder(
                folder_path=tmp_upload_dir,
                repo_id=REPO_ID,
                repo_type="model",
                commit_message=f"Upload fine-tuned model, tokenizer, and supporting files for {MODEL_NAME_ON_HF}"
            )
            print("All files uploaded successfully!")
        except Exception as e:
            print(f"Error uploading folder to Hugging Face Hub: {e}")

if __name__ == "__main__":
    # Make sure you are logged in to Hugging Face CLI:
    # Run `huggingface-cli login` or `huggingface-cli login --token YOUR_HF_WRITE_TOKEN` in your terminal first.
    print("Starting upload process...")
    print(f"Target Hugging Face Repo ID: {REPO_ID}")
    print("Ensure you have run 'huggingface-cli login' with a write token.")
    upload_model_and_tokenizer()
```
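The script assumes `huggingface-cli login` has already been run with a write token. A possible alternative is to authenticate programmatically before invoking it; the `HF_TOKEN` environment variable in this sketch is an assumption, not something the script defines:

```python
import os
from huggingface_hub import login

login(token=os.environ["HF_TOKEN"])  # assumes a write token exported as HF_TOKEN

from upload_to_hf import upload_model_and_tokenizer
upload_model_and_tokenizer()
```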