xlmr-large-classifier-pinocchio_it_tra1-eng - MT/HT Classifier

This model is a fine-tuned version of FacebookAI/xlm-roberta-large that classifies a text as Machine Translated (MT) or Human Translated (HT). When both classes come from human translators, the two labels instead stand for HT1 and HT2.

Training data:

Train: 1490 examples (745 per label)
Validation: 164 examples (82 per label)
Test: 214 examples (107 per label)

Results on the held-out test set:

Accuracy: 0.9065
F1-Score: 0.9099
Precision: 0.8783
Recall: 0.9439
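
As a quick sanity check, the reported figures can be cross-checked with plain arithmetic. The sketch below assumes label 1 is treated as the positive class, which this card does not state. The confusion-matrix counts are inferred from the rounded metrics and the test-set size, not taken from the original evaluation run.

```python
# Reported test-set metrics and per-label test size, copied from this card.
precision, recall = 0.8783, 0.9439
n_per_label = 107

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Inferred counts (assumption: label 1 is the positive class).
tp = round(recall * n_per_label)   # true positives
fp = round(tp / precision) - tp    # false positives
fn = n_per_label - tp              # false negatives
tn = n_per_label - fp              # true negatives
accuracy = (tp + tn) / (2 * n_per_label)

print(f"F1={f1:.4f} accuracy={accuracy:.4f} TP={tp} FP={fp} FN={fn} TN={tn}")
```

Under that assumption, the recomputed F1 and accuracy round to the values listed above, so the four reported metrics are mutually consistent.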

Label mapping

Label MT: 0

Label PE: 1 (text from the human translator, referred to as HT above)

Info

Upload date: 2025-04-30 00:00

Usage

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "DanielSc4/xlmr-large-classifier-pinocchio_it_tra1-eng"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Run on GPU when available; eval() disables dropout for inference.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device).eval()

inp = tokenizer("This is a test", return_tensors="pt").to(device)

# No gradients are needed for prediction.
with torch.no_grad():
    out = model(**inp)

logits = out.logits
probs = logits.softmax(dim=-1)
pred = probs.argmax(dim=-1).item()
print(f"Predicted class: {pred}")  # 0 for MT, 1 for PE
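
To turn the raw output into a readable label with a confidence score, something like the following might be used. It is a minimal sketch in pure Python: `ID2LABEL` and `decode` are hypothetical helpers, not part of the model or of transformers, and the logits in the example call are made up. With the snippet above, the input could be obtained with `out.logits[0].tolist()`.

```python
import math

# Label ids as listed in this card's label mapping.
ID2LABEL = {0: "MT", 1: "PE"}

def decode(logits):
    """Return (label, probability) for one example's list of two logits."""
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class.
    pred = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[pred], probs[pred]

# Made-up logits for illustration only.
label, prob = decode([-1.2, 2.3])
print(label, round(prob, 3))
```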