# Hugging Face Space: dragonities/Toxic_Detection, file app.py (commit 8415dcf, "Initial commit for Toxic Detection project")

raw

history blame

13.8 kB

# -*- coding: utf-8 -*-
"""ai-portfolio.ipynb

Automatically generated by Colab.

Original file is located at
https://colab.research.google.com/drive/1XN71Q8R5ctujwjQB0XsGHB7KBp4hP6wR

# Project: Portfolio - Final Project

Instructions for Students:

Please carefully follow these steps to complete and submit your assignment:

1. Completing the Assignment: You are required to work on and complete all tasks in the provided assignment. Be disciplined and ensure that you thoroughly engage with each task.

2. Creating a Google Drive Folder: If you don't already have a folder for collecting assignments, you must create a new folder in your Google Drive. This will be a repository for all your completed assignment files, helping you keep your work organized and easy to access.

3. Uploading Completed Assignment: Upon completion of your assignment, make sure to upload all necessary files, including code, reports, and related documents, into the created Google Drive folder. Save this link in the 'Student Identity' section and also provide it as the last parameter in the `submit` function that has been provided.

4. Sharing Folder Link: You're required to share the link to your assignment Google Drive folder. This is crucial for the submission and evaluation of your assignment.

5. Setting Permission to Public: Please make sure your Google Drive folder is set to public. This allows your instructor to access your solutions and assess your work correctly.

Adhering to these procedures will facilitate a smooth assignment process for you and the reviewers.

Description:

Welcome to your final portfolio project assignment for AI Bootcamp. This is your chance to put all the skills and knowledge you've learned throughout the bootcamp into action by creating a real-world AI application.

You have the freedom to create any application or model, be it text-based, image-based, voice-based, or multimodal.

To get you started, here are some ideas:

1. Sentiment Analysis Application: Develop an application that can determine the sentiment (positive, negative, neutral) of text data such as reviews or social media posts. You can use Natural Language Processing (NLP) libraries like NLTK or TextBlob, or more advanced pre-trained models from the Hugging Face transformers library, for your sentiment analysis model.

2. Chatbot: Design a chatbot serving a specific purpose, such as customer service for a certain industry, a personal fitness coach, or a study helper. Libraries like ChatterBot or Dialogflow can assist in designing conversational agents.

3. Predictive Text Application: Develop a model that suggests the next word or sentence, similar to predictive text on smartphone keyboards. You could use the Hugging Face transformers library, which includes pre-trained models like GPT-2.

4. Image Classification Application: Create a model to distinguish between different types of flowers or fruits. For this type of image classification task, pre-trained models like ResNet or VGG from PyTorch or TensorFlow can be utilized.

5. News Article Classifier: Develop a text classification model that categorizes news articles into predefined categories. NLTK, spaCy, and scikit-learn are valuable libraries for text pre-processing, feature extraction, and building classification models.

6. Recommendation System: Create a simplified recommendation system, for instance a book or movie recommender based on user preferences. Python's Surprise library can assist in building effective recommendation systems.

7. Plant Disease Detection: Develop a model to identify diseases in plants using leaf images. This project requires a good understanding of convolutional neural networks (CNNs) and image processing. PyTorch, TensorFlow, and OpenCV are all great tools to use.

8. Facial Expression Recognition: Develop a model to classify human facial expressions. This involves complex feature extraction and classification algorithms. You might want to leverage deep learning libraries like TensorFlow or PyTorch, along with OpenCV for processing facial images.

9. Chest X-Ray Interpretation: Develop a model to detect abnormalities in chest X-ray images. This task may require an understanding of specific features in such images. Again, TensorFlow and PyTorch for deep learning, and libraries like scikit-image or PIL for image processing, could be of use.

10. Food Classification: Develop a model to classify a variety of foods, such as local Indonesian dishes. Pre-trained models like ResNet or VGG from PyTorch or TensorFlow can be a good starting point.

11. Traffic Sign Recognition: Design a model to recognize different traffic signs. This project has real-world applicability in self-driving car technology. Once more, you might utilize PyTorch or TensorFlow for the deep learning aspect, and OpenCV for image processing tasks.

Submission:

Please upload both your model and application to Hugging Face or your own GitHub account for submission.

Presentation:

You are required to create a presentation to showcase your project, including the following details:

- The objective of your model.
- A comprehensive description of your model.
- The specific metrics used to measure your model's effectiveness.
- A brief overview of the dataset used, including its source, pre-processing steps, and any insights.
- An explanation of the methodology used in developing the model.
- A discussion of the challenges faced, how they were handled, and what you learned from them.
- Suggestions for potential future improvements to the model.
- A functioning link to a demo of your model in action.

Grading:

Submissions will be manually graded, with a select few given the opportunity to present their projects in front of a panel of judges. This will provide valuable feedback, further enhancing your project and expanding your knowledge base.

Remember, consistent practice is the key to mastering these concepts. Apply your knowledge, ask questions when in doubt, and above all, enjoy the process. Best of luck to you all!
"""


# Commented out IPython magic to ensure Python compatibility.
# %pip install rggrader


"""## Submit Notebook"""

import random
import string

import nltk
from nltk.corpus import wordnet
from transformers import pipeline

# Download the WordNet resources
nltk.download("wordnet")
nltk.download("omw-1.4")  # Open Multilingual WordNet, for multilingual antonym support

# Load GPT-2 to generate replacement words
text_generator = pipeline("text-generation", model="gpt2")

# Load the pretrained hate speech (toxicity) detection model
hate_speech_classifier = pipeline("text-classification", model="unitary/toxic-bert")

# Confidence threshold for flagging text as toxic
CONFIDENCE_THRESHOLD = 0.5
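
"""The toxicity rule used in `replace_toxic_words` and `detect_and_paraphrase_with_ban` checks only the top prediction: its label must contain "toxic" and its score must reach `CONFIDENCE_THRESHOLD`. Below is a minimal sketch of that rule as one reusable function, assuming the `[{"label": ..., "score": ...}]` output shape this file already indexes into. `is_toxic_prediction` is a hypothetical helper name, and the sample predictions are illustrative values, not real model scores.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def is_toxic_prediction(result: List[Prediction], threshold: float = 0.5) -> bool:
    # Hypothetical helper: applies the app's rule to the top prediction only,
    # i.e. the label contains "toxic" (case-insensitive) and score >= threshold.
    if not result:
        return False
    top = result[0]
    label = str(top["label"]).lower()
    score = float(top["score"])
    return "toxic" in label and score >= threshold


# Illustrative inputs in the pipeline's output shape (not real model scores)
print(is_toxic_prediction([{"label": "toxic", "score": 0.93}]))   # True
print(is_toxic_prediction([{"label": "toxic", "score": 0.02}]))   # False
print(is_toxic_prediction([{"label": "insult", "score": 0.80}]))  # False
```

Because the check is a substring test, a label such as `severe_toxic` also counts as toxic, while a label such as `insult` does not, even at a high score.
"""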

# Initialize the toxic-submission counter.
# Note: this dict lives at module level, so the count is shared by all users of the app.
toxic_counter = {"count": 0}

# File path for storing the negative-to-positive word mapping
filepath = "extended_negative_to_positive_words.txt"

# Positive words used as a fallback replacement
positive_words = ["kind", "friendly", "smart", "brilliant", "amazing", "wonderful", "great", "excellent"]

# Find an antonym using WordNet
def find_opposite(word):
    antonyms = []
    for syn in wordnet.synsets(word):
        for lemma in syn.lemmas():
            if lemma.antonyms():  # Check whether this lemma has an antonym
                antonyms.append(lemma.antonyms()[0].name())
    # Return the first antonym found (in WordNet's synset order), or None
    return antonyms[0] if antonyms else None
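
"""`find_opposite` returns the first antonym it meets while scanning synsets in the order WordNet returns them and, inside each synset, its lemmas in order. The dependency-free sketch below mirrors that search order over a hypothetical in-memory lexicon, so the "first match wins" behavior can be checked without downloading corpora. `LEXICON` and `find_opposite_sketch` are illustrative names, and the lexicon entries are invented, not WordNet data.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon standing in for WordNet: word -> synsets in lookup order,
# each synset a list of (lemma_name, antonym_names) pairs. Entries are invented.
Synset = List[Tuple[str, List[str]]]
LEXICON: Dict[str, List[Synset]] = {
    "bad": [[("bad", ["good"])], [("bad", ["good"]), ("badness", ["goodness"])]],
    "dumb": [[("dumb", [])], [("dense", []), ("dumb", ["bright"])]],
    "awful": [[("awful", [])]],
}


def find_opposite_sketch(word: str, lexicon: Dict[str, List[Synset]] = LEXICON) -> Optional[str]:
    # Same search order as find_opposite: synsets first, then lemmas within each
    # synset. find_opposite collects every match and returns antonyms[0], which
    # equals returning the first match found, as done here.
    for synset in lexicon.get(word, []):
        for _lemma_name, antonyms in synset:
            if antonyms:
                return antonyms[0]
    return None


print(find_opposite_sketch("bad"))    # good
print(find_opposite_sketch("dumb"))   # bright
print(find_opposite_sketch("awful"))  # None
```

A `None` result is what makes `replace_toxic_words` fall through to `generate_random_antonym` via its `or` expression.
"""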

# Generate a replacement word at random with GPT-2
def generate_random_antonym(word):
    prompt = f"Generate a random positive word to replace the toxic word '{word}':"
    try:
        response = text_generator(prompt, max_new_tokens=5, truncation=True, num_return_sequences=1)
        generated_text = response[0]["generated_text"]
        # Take the first word after the last colon (the generated text echoes the prompt)
        candidates = generated_text.split(":")[-1].strip().split()
        # Accept the candidate only if it consists solely of letters
        if candidates and candidates[0].isalpha():
            return candidates[0]
        return random.choice(positive_words)
    except Exception as e:
        print(f"Error in generating random antonym for '{word}': {e}")
        # Fall back to a random positive word
        return random.choice(positive_words)
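
"""The parsing step in `generate_random_antonym` relies on the text-generation pipeline returning the prompt followed by its continuation (the default behavior), so the candidate word is whatever follows the last colon. The sketch below isolates that step so it can be tested without a model. `extract_replacement` is a hypothetical helper, and the continuations passed to it are made-up strings, not real GPT-2 output.

```python
import random
from typing import Sequence


def extract_replacement(generated_text: str, fallback_words: Sequence[str], rng: random.Random) -> str:
    # Hypothetical helper mirroring generate_random_antonym's parsing step.
    # The prompt ends with ':', so the model's continuation follows the last ':'.
    tokens = generated_text.split(":")[-1].strip().split()
    # Accept the first token only if it is purely alphabetic; otherwise use a fallback.
    if tokens and tokens[0].isalpha():
        return tokens[0]
    return rng.choice(list(fallback_words))


rng = random.Random(0)
prompt = "Generate a random positive word to replace the toxic word 'idiot':"
print(extract_replacement(prompt + " Genius and more", ["kind"], rng))  # Genius
print(extract_replacement(prompt + " Genius, really", ["kind"], rng))   # kind
print(extract_replacement(prompt, ["kind"], rng))                       # kind
```

The second call shows a side effect of the `isalpha` check: a usable word with trailing punctuation is rejected and replaced by a fallback. Note also that `isalpha` only validates characters; nothing here guarantees the generated word is actually positive.
"""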

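# Load the negative-to-positive mapping from a file
def load_neg_to_pos_map(filepath):
    neg_to_pos_map = {}
    try:
        with open(filepath, "r", encoding="utf-8") as file:
            for line_number, line in enumerate(file, start=1):
                if line.strip():  # Skip empty lines
                    parts = line.strip().split(":")
                    if len(parts) == 2:  # Make sure the line has the form "negative : positive"
                        neg, pos = parts
                        neg_to_pos_map[neg.strip().lower()] = pos.strip()
                    else:
                        print(f"Warning: Invalid format on line {line_number}: {line.strip()}")
    except FileNotFoundError:
        # No mapping file yet: start empty (update_neg_to_pos_file creates it on first write)
        print(f"Warning: mapping file '{filepath}' not found; starting with an empty map.")
    return neg_to_pos_map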

# Append a new entry to the mapping file (the file is created if it does not exist)
def update_neg_to_pos_file(filepath, word, opposite_word):
    with open(filepath, "a", encoding="utf-8") as file:
        file.write(f"{word} : {opposite_word}\n")
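
"""`update_neg_to_pos_file` and `load_neg_to_pos_map` share a one-pair-per-line `word : replacement` format. A minimal round-trip sketch of that format, using a temporary file; `append_pair` and `read_pairs` are hypothetical stand-ins that follow the same writing and parsing rules (split on `:`, require exactly two parts, strip whitespace, lowercase the key).

```python
import os
import tempfile
from typing import Dict


def append_pair(path: str, word: str, replacement: str) -> None:
    # Same line format that update_neg_to_pos_file writes: "word : replacement"
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{word} : {replacement}\n")


def read_pairs(path: str) -> Dict[str, str]:
    # Same parsing rule as load_neg_to_pos_map; a missing file gives an empty map.
    mapping: Dict[str, str] = {}
    if not os.path.exists(path):
        return mapping
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(":")
            if len(parts) == 2:
                neg, pos = parts
                mapping[neg.strip().lower()] = pos.strip()
    return mapping


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pairs.txt")
    append_pair(path, "stupid", "smart")
    append_pair(path, "Ugly", "beautiful")
    append_pair(path, "bad", "good: very")  # contains ':' so it is skipped on reload
    print(read_pairs(path))  # {'stupid': 'smart', 'ugly': 'beautiful'}
```

Because the separator is a bare `:`, a replacement containing a colon is written without error but dropped on the next load (`load_neg_to_pos_map` reports it as an invalid line).
"""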

# Replace toxic words in the text
def replace_toxic_words(text, neg_to_pos_map, filepath="extended_negative_to_positive_words.txt"):
    words = text.split()
    replaced_words = []
    updates = []
    unresolved = []  # Never filled: every toxic word gets at least a fallback replacement

    for word in words:
        # Strip punctuation; keep the original-case core for substitution
        core_word = word.strip(string.punctuation)
        clean_word = core_word.lower()
        if not clean_word:
            # Pure punctuation (e.g. "--"): nothing to classify
            replaced_words.append(word)
            continue

        # Use the model to detect whether this single word is toxic
        result = hate_speech_classifier(clean_word)
        label = result[0]["label"]
        confidence = result[0]["score"]

        if "toxic" in label.lower() and confidence >= CONFIDENCE_THRESHOLD:
            # The word is toxic: reuse a stored replacement if there is one
            if clean_word in neg_to_pos_map:
                replacement = neg_to_pos_map[clean_word]
            else:
                # Look up an antonym, or generate one at random
                replacement = find_opposite(clean_word) or generate_random_antonym(clean_word)
                if not (replacement and replacement.isalpha()):  # Validate the replacement
                    # If that fails, fall back to a random positive word
                    replacement = random.choice(positive_words)
                neg_to_pos_map[clean_word] = replacement
                update_neg_to_pos_file(filepath, clean_word, replacement)
                updates.append((clean_word, replacement))
            # Replace the original-case core so surrounding punctuation survives;
            # replacing the lowercased form would miss capitalized words like "Stupid!"
            replaced_words.append(word.replace(core_word, replacement, 1))
        else:
            # Non-toxic words are kept unchanged
            replaced_words.append(word)

    return " ".join(replaced_words), updates, unresolved
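
"""The substitution step in `replace_toxic_words` can be exercised without loading any model. In the sketch below, `substitute_words` is a hypothetical helper in which `is_toxic` stands in for the per-word classifier and `mapping` for `neg_to_pos_map`. It looks words up by a lowercased key but replaces the original-case core, so casing differences and surrounding punctuation are handled.

```python
import string
from typing import Callable, Dict, List, Tuple


def substitute_words(
    text: str, is_toxic: Callable[[str], bool], mapping: Dict[str, str]
) -> Tuple[str, List[Tuple[str, str]]]:
    # Hypothetical, model-free sketch of the substitution step in replace_toxic_words.
    out: List[str] = []
    applied: List[Tuple[str, str]] = []
    for token in text.split():
        core = token.strip(string.punctuation)  # original casing, punctuation removed
        key = core.lower()                      # lowercase lookup key
        if key and is_toxic(key) and key in mapping:
            # Replace only the core so leading/trailing punctuation is preserved
            out.append(token.replace(core, mapping[key], 1))
            applied.append((key, mapping[key]))
        else:
            out.append(token)
    return " ".join(out), applied


def flag(word: str) -> bool:
    # Stand-in classifier for the example
    return word in {"stupid", "idiot"}


print(substitute_words("You are STUPID!  (idiot)", flag, {"stupid": "smart", "idiot": "genius"}))
# ('You are smart! (genius)', [('stupid', 'smart'), ('idiot', 'genius')])
```

Calling `word.replace` with the lowercased form instead would leave `STUPID!` unchanged. Note also that `text.split()` followed by `" ".join` collapses runs of whitespace (the double space above), as `replace_toxic_words` does.
"""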

# Detect toxic content and rewrite it, banning repeat offenders
def detect_and_paraphrase_with_ban(text, neg_to_pos_map, filepath="extended_negative_to_positive_words.txt"):
    # Check whether the user has already been banned. toxic_counter is module-level,
    # so the strike count is shared across sessions and a page refresh does not reset it.
    if toxic_counter["count"] >= 3:
        return "You have been banned for submitting toxic content multiple times. Please refresh to try again."

    # Classify the whole input text
    result = hate_speech_classifier(text)
    label = result[0]["label"]
    confidence = result[0]["score"]

    detection_info = f"Detection: {label} (Confidence: {confidence:.2f})\n"

    # If the text is detected as toxic
    if "toxic" in label.lower() and confidence >= CONFIDENCE_THRESHOLD:
        toxic_counter["count"] += 1
        detection_info += "Detected toxic content. Rewriting...\n"

        # The third toxic submission triggers the ban immediately
        if toxic_counter["count"] >= 3:
            return "You have been banned for submitting toxic content multiple times. Please refresh to try again."

        # Replace the toxic words
        rewritten_text, updates, unresolved = replace_toxic_words(text, neg_to_pos_map, filepath)

        # Log the changes and any unresolved words
        if updates:
            detection_info += "Updates made:\n" + "\n".join(
                [f"- '{word}' updated with antonym '{opposite}'" for word, opposite in updates]
            ) + "\n"
        if unresolved:
            detection_info += "Unresolved words (no antonyms found): " + ", ".join(unresolved) + "\n"

        return detection_info + f"Rewritten Text: {rewritten_text}"
    else:
        detection_info += "Content is not toxic or confidence is too low.\n"
        return detection_info + f"Original Text: {text}"
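
"""`toxic_counter` is a module-level dict, so every visitor of the running app shares one strike count. A minimal sketch of tracking strikes per session instead, written as a pure function over a small state dict; `register_submission` and `MAX_STRIKES` are hypothetical names. In Gradio, a per-session value like this is commonly kept in a `gr.State` component passed as an extra input and output of the click handler; that wiring is an assumption and is not shown here.

```python
from typing import Dict, Tuple

MAX_STRIKES = 3  # same limit that detect_and_paraphrase_with_ban hard-codes


def register_submission(state: Dict[str, int], is_toxic: bool) -> Tuple[Dict[str, int], bool]:
    # Returns (new_state, banned) without mutating `state`, so each session can
    # carry its own copy instead of sharing one module-level counter.
    strikes = state.get("strikes", 0)
    if strikes >= MAX_STRIKES:  # already banned
        return dict(state), True
    if is_toxic:
        strikes += 1
    # Reaching the limit bans immediately, matching the app's third-strike behavior
    return {**state, "strikes": strikes}, strikes >= MAX_STRIKES


alice: Dict[str, int] = {}
bob: Dict[str, int] = {}
for toxic in (True, True, True):
    alice, banned = register_submission(alice, toxic)
bob, bob_banned = register_submission(bob, True)
print(alice, banned)    # {'strikes': 3} True
print(bob, bob_banned)  # {'strikes': 1} False
```

As in the app, non-toxic submissions neither add nor remove strikes; only the storage location of the count changes.
"""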

# Load the negative-to-positive map
neg_to_pos_map = load_neg_to_pos_map(filepath)

import gradio as gr

# Gradio callback
def detect_and_rewrite_chatbot(input_text):
    global neg_to_pos_map
    # Reload the mapping file if the in-memory map is empty
    if not neg_to_pos_map:
        neg_to_pos_map = load_neg_to_pos_map(filepath)
    return detect_and_paraphrase_with_ban(input_text, neg_to_pos_map, filepath)

# Build the Gradio interface
with gr.Blocks() as chatbot_interface:
    gr.Markdown("## Toxicity Detection")
    with gr.Row():
        input_text = gr.Textbox(label="Input Text", placeholder="Type something...", lines=2)
        output_text = gr.Textbox(label="Output Text", interactive=False)
    submit_button = gr.Button("Submit")
    submit_button.click(detect_and_rewrite_chatbot, inputs=input_text, outputs=output_text)

# Launch Gradio
if __name__ == "__main__":
    chatbot_interface.launch()