	---
	license: mit
	language:
	- en
	metrics:
	- accuracy
	base_model:
	- google-bert/bert-base-uncased
	pipeline_tag: text-classification
	tags:
	- text-classification
	- spam
	- english
	---
	# BERT-base-uncased fine-tuned to classify spam SMS

	See GitHub for the evaluation result logs: https://github.com/fzn0x/bert-sms-classification

	This is my second Natural Language Processing (NLP) project: a bert-base-uncased model fine-tuned to classify SMS messages as spam or ham. It is a large improvement over https://github.com/fzn0x/bert-indonesian-english-hate-comments.

	How to use this model:

	```py
	from transformers import BertTokenizer, BertForSequenceClassification
	import torch

	# Load the fine-tuned tokenizer and model from the Hugging Face Hub
	tokenizer = BertTokenizer.from_pretrained('fzn0x/bert-spam-classification-model')
	model = BertForSequenceClassification.from_pretrained('fzn0x/bert-spam-classification-model')

	# Use the GPU when one is available, and switch to inference mode
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model.to(device)
	model.eval()

	def model_predict(text: str) -> str:
	    """Classify a single message as 'SPAM' or 'HAM'."""
	    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True).to(device)
	    with torch.no_grad():
	        outputs = model(**inputs)
	    logits = outputs.logits
	    prediction = torch.argmax(logits, dim=1).item()
	    # Label index 1 means spam; anything else is treated as ham
	    return 'SPAM' if prediction == 1 else 'HAM'

	def predict():
	    text = "Hello, do you know with this crypto you can be rich? contact us in 88888"
	    predicted_label = model_predict(text)
	    print(f"1. Predicted class: {predicted_label}") # EXPECT: SPAM

	    text = "Help me richard!"
	    predicted_label = model_predict(text)
	    print(f"2. Predicted class: {predicted_label}") # EXPECT: HAM

	    text = "You can buy loopstation for 100$, try buyloopstation.com"
	    predicted_label = model_predict(text)
	    print(f"3. Predicted class: {predicted_label}") # EXPECT: SPAM

	    text = "Mate, I try to contact your phone, where are you?"
	    predicted_label = model_predict(text)
	    print(f"4. Predicted class: {predicted_label}") # EXPECT: HAM

	if __name__ == "__main__":
	    predict()
	```
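
The usage example above returns only a hard label from `argmax`. If you also want a rough confidence score, one option is to turn the two logits into probabilities with a softmax. The sketch below is not part of this repository: `softmax`, `label_with_confidence`, and the `spam_threshold` parameter are hypothetical names, the logits are made-up numbers rather than real model output, and it assumes the same label order as the example (index 0 = HAM, index 1 = SPAM).

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_with_confidence(logits, spam_threshold=0.5):
    """Return (label, spam_probability) for one message.

    Assumes index 0 = HAM and index 1 = SPAM, as in the usage example.
    spam_threshold is a hypothetical knob, not something the model defines.
    """
    spam_prob = softmax(logits)[1]
    label = "SPAM" if spam_prob > spam_threshold else "HAM"
    return label, spam_prob

# Made-up logits, for illustration only (not real model output)
print(label_with_confidence([2.0, -1.0]))
print(label_with_confidence([-0.5, 1.5]))
print(label_with_confidence([-0.5, 1.5], spam_threshold=0.9))
```

With the real model, the input would be something like `outputs.logits[0].tolist()` from `model_predict`. At the default threshold of 0.5 this gives the same label as `argmax` for two classes; raising the threshold trades missed spam for fewer false alarms. `torch.softmax(logits, dim=1)` computes the same probabilities directly on the tensor.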

	## 📚 Citations

	If you use this repository or its ideas, please cite the works listed below. Full BibTeX entries are in [`citations.bib`](./citations.bib).

	- Wolf et al., Transformers: State-of-the-Art Natural Language Processing, EMNLP 2020. [ACL Anthology](https://www.aclweb.org/anthology/2020.emnlp-demos.6)
	- Pedregosa et al., Scikit-learn: Machine Learning in Python, JMLR 2011.
	- Almeida & Gómez Hidalgo, SMS Spam Collection v.1, UCI Machine Learning Repository (2011). [Kaggle Link](https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset)

	## 🧠 Credits and Libraries Used

	- [Hugging Face Transformers](https://github.com/huggingface/transformers) – model, tokenizer, and training utilities
	- [scikit-learn](https://scikit-learn.org/stable/) – metrics and preprocessing
	- Logging silencing inspired by Hugging Face GitHub discussions
	- Dataset from [UCI SMS Spam Collection](https://www.kaggle.com/datasets/uciml/sms-spam-collection-dataset)
	- Inspiration from [Kaggle Notebook by Suyash Khare](https://www.kaggle.com/code/suyashkhare/naive-bayes)