# Toxic_Detection / app.py
# -*- coding: utf-8 -*-
"""ai-portfolio.ipynb
Automatically generated by Colab.
Original file is located at
https://colab.research.google.com/drive/1XN71Q8R5ctujwjQB0XsGHB7KBp4hP6wR
# Project: Portfolio - Final Project
**Instructions for Students:**
Please carefully follow these steps to complete and submit your assignment:
1. **Completing the Assignment**: You are required to work on and complete all tasks in the provided assignment. Be disciplined and ensure that you thoroughly engage with each task.
2. **Creating a Google Drive Folder**: If you don't already have a folder for collecting assignments, create a new one in your Google Drive. It will serve as a repository for all your completed assignment files, helping you keep your work organized and easy to access.
3. **Uploading Completed Assignment**: Upon completion of your assignment, upload all necessary files, including code, reports, and related documents, into that Google Drive folder. Save the folder link in the 'Student Identity' section and also provide it as the last parameter of the provided `submit` function.
4. **Sharing Folder Link**: You're required to share the link to your assignment Google Drive folder. This is crucial for the submission and evaluation of your assignment.
5. **Setting Permission to Public**: Please make sure your **Google Drive folder is set to public**. This allows your instructor to access your solutions and assess your work correctly.
Adhering to these procedures will facilitate a smooth assignment process for you and the reviewers.
**Description:**
Welcome to your final portfolio project assignment for AI Bootcamp. This is your chance to put all the skills and knowledge you've learned throughout the bootcamp into action by creating a real-world AI application.
You have the freedom to create any application or model, be it text-based or image-based or even voice-based or multimodal.
To get you started, here are some ideas:
1. **Sentiment Analysis Application:** Develop an application that can determine sentiment (positive, negative, neutral) from text data like reviews or social media posts. You can use Natural Language Processing (NLP) libraries like NLTK or TextBlob, or more advanced pre-trained models from the transformers library by Hugging Face, for your sentiment analysis model.
2. **Chatbot:** Design a chatbot serving a specific purpose such as customer service for a certain industry, a personal fitness coach, or a study helper. Libraries like ChatterBot or Dialogflow can assist in designing conversational agents.
3. **Predictive Text Application:** Develop a model that suggests the next word or sentence similar to predictive text on smartphone keyboards. You could use the transformers library by Hugging Face, which includes pre-trained models like GPT-2.
4. **Image Classification Application:** Create a model to distinguish between different types of flowers or fruits. For this type of image classification task, pre-trained models like ResNet or VGG from PyTorch or TensorFlow can be utilized.
5. **News Article Classifier:** Develop a text classification model that categorizes news articles into predefined categories. NLTK, SpaCy, and sklearn are valuable libraries for text pre-processing, feature extraction, and building classification models.
6. **Recommendation System:** Create a simplified recommendation system. For instance, a book or movie recommender based on user preferences. Python's Surprise library can assist in building effective recommendation systems.
7. **Plant Disease Detection:** Develop a model to identify diseases in plants using leaf images. This project requires a good understanding of convolutional neural networks (CNNs) and image processing. PyTorch, TensorFlow, and OpenCV are all great tools to use.
8. **Facial Expression Recognition:** Develop a model to classify human facial expressions. This involves complex feature extraction and classification algorithms. You might want to leverage deep learning libraries like TensorFlow or PyTorch, along with OpenCV for processing facial images.
9. **Chest X-Ray Interpretation:** Develop a model to detect abnormalities in chest X-ray images. This task may require understanding of specific features in such images. Again, TensorFlow and PyTorch for deep learning, and libraries like SciKit-Image or PIL for image processing, could be of use.
10. **Food Classification:** Develop a model to classify a variety of foods such as local Indonesian food. Pre-trained models like ResNet or VGG from PyTorch or TensorFlow can be a good starting point.
11. **Traffic Sign Recognition:** Design a model to recognize different traffic signs. This project has real-world applicability in self-driving car technology. Once more, you might utilize PyTorch or TensorFlow for the deep learning aspect, and OpenCV for image processing tasks.
**Submission:**
Please upload both your model and application to Hugging Face or your own GitHub account for submission.
**Presentation:**
You are required to create a presentation to showcase your project, including the following details:
- The objective of your model.
- A comprehensive description of your model.
- The specific metrics used to measure your model's effectiveness.
- A brief overview of the dataset used, including its source, pre-processing steps, and any insights.
- An explanation of the methodology used in developing the model.
- A discussion on challenges faced, how they were handled, and your learnings from those.
- Suggestions for potential future improvements to the model.
- A functioning link to a demo of your model in action.
**Grading:**
Submissions will be manually graded, with a select few given the opportunity to present their projects in front of a panel of judges. This will provide valuable feedback, further enhancing your project and expanding your knowledge base.
Remember, consistent practice is the key to mastering these concepts. Apply your knowledge, ask questions when in doubt, and above all, enjoy the process. Best of luck to you all!
"""
# Commented out IPython magic to ensure Python compatibility.
# %pip install rggrader
"""## Working Space"""
import nltk
nltk.download('wordnet')
nltk.download('omw-1.4')  # Supports multilingual antonyms
"""## Submit Notebook"""
import random
from transformers import pipeline
import string
from nltk.corpus import wordnet
import nltk
# Download the WordNet resources
nltk.download("wordnet")
nltk.download("omw-1.4")
# Load GPT-2 to generate replacement words
text_generator = pipeline("text-generation", model="gpt2")
# Load pretrained hate speech detection model
hate_speech_classifier = pipeline("text-classification", model="unitary/toxic-bert")
# Confidence threshold for detecting toxicity
CONFIDENCE_THRESHOLD = 0.5
# Initialize toxic counter
toxic_counter = {"count": 0}
# File path for storing the negative-to-positive mapping
filepath = "extended_negative_to_positive_words.txt"
# List of positive words used as a fallback
positive_words = ["kind", "friendly", "smart", "brilliant", "amazing", "wonderful", "great", "excellent"]
# Find an antonym for a word using WordNet
def find_opposite(word):
    antonyms = []
    for syn in wordnet.synsets(word):
        for lemma in syn.lemmas():
            if lemma.antonyms():  # Check whether this lemma has an antonym
                antonyms.append(lemma.antonyms()[0].name())
    return antonyms[0] if antonyms else None
# Generate a random replacement word using GPT-2
def generate_random_antonym(word):
    prompt = f"Generate a random positive word to replace the toxic word '{word}':"
    try:
        response = text_generator(prompt, max_new_tokens=5, truncation=True, num_return_sequences=1)
        generated_text = response[0]['generated_text']
        # Take the first word of the text that follows the last colon
        random_antonym = generated_text.split(":")[-1].strip().split()[0]
        # Accept the result only if it is purely alphabetic
        if random_antonym.isalpha():
            return random_antonym
        else:
            return random.choice(positive_words)
    except Exception as e:
        print(f"Error in generating random antonym for '{word}': {e}")
        # Fall back to a random positive word
        return random.choice(positive_words)
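"""
A minimal sketch of a stricter way to pick the replacement word, offered as an assumption-laden variant rather than the method used above. By default the text-generation pipeline returns the prompt followed by the continuation, so this version removes the echoed prompt first and then scans for the first purely alphabetic token. The names `first_alpha_word` and `FALLBACK_WORDS` are hypothetical and do not exist elsewhere in this file; the generated text is passed in as a plain string so the sketch runs without loading GPT-2.

```python
import random
import string

# Hypothetical fallback list, playing the role of positive_words above
FALLBACK_WORDS = ["kind", "friendly", "smart"]


def first_alpha_word(generated_text, prompt, fallback_words=FALLBACK_WORDS, rng=random):
    # Drop the echoed prompt so that only the continuation is inspected
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text
    for token in continuation.split():
        # Strip surrounding punctuation such as quotes or commas
        candidate = token.strip(string.punctuation)
        if candidate.isalpha():
            return candidate.lower()
    # Nothing usable was generated: fall back to a random positive word
    return rng.choice(list(fallback_words))
```

Scanning every token, instead of only the first one, avoids falling back when the continuation merely starts with a quote mark or a digit.
"""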
# Load the negative-to-positive mapping from a file
def load_neg_to_pos_map(filepath):
    neg_to_pos_map = {}
    with open(filepath, "r") as file:
        for line_number, line in enumerate(file, start=1):
            if line.strip():  # Skip empty lines
                parts = line.strip().split(":")
                if len(parts) == 2:  # Make sure the format is correct
                    neg, pos = parts
                    neg_to_pos_map[neg.strip().lower()] = pos.strip()
                else:
                    print(f"Warning: Invalid format on line {line_number}: {line.strip()}")
    return neg_to_pos_map
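"""
The `negative : positive` line format read by the loader can be illustrated without touching the file system. The sketch below is an assumption-based variant, not the loader itself: `parse_mapping_lines` is a hypothetical helper that takes any iterable of lines, splits each on the first colon only, and reports the numbers of the lines it skipped instead of printing warnings.

```python
def parse_mapping_lines(lines):
    mapping = {}
    skipped = []
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # Blank lines are ignored, not reported
        # Split on the first colon only
        neg, sep, pos = line.partition(":")
        neg, pos = neg.strip().lower(), pos.strip()
        if sep and neg and pos:
            mapping[neg] = pos
        else:
            skipped.append(line_number)
    return mapping, skipped
```

Given the lines `bad : good`, an empty line, `Ugly:pretty`, and `no separator`, the helper maps `bad` to `good` and `ugly` to `pretty`, and reports line 4 as skipped.
"""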
# Append a new entry to the mapping file
def update_neg_to_pos_file(filepath, word, opposite_word):
    with open(filepath, "a") as file:
        file.write(f"{word} : {opposite_word}\n")
# Replace toxic words in the text
def replace_toxic_words(text, neg_to_pos_map, filepath="extended_negative_to_positive_words.txt"):
    words = text.split()
    replaced_words = []
    updates = []
    unresolved = []  # Never populated: a fallback word is always chosen below
    for word in words:
        # Strip surrounding punctuation; keep the original casing for the replacement step
        stripped_word = word.strip(string.punctuation)
        clean_word = stripped_word.lower()
        if not clean_word:
            # Punctuation-only token: nothing to classify or replace
            replaced_words.append(word)
            continue
        # Use the model to detect toxicity
        result = hate_speech_classifier(clean_word)
        label = result[0]['label']
        confidence = result[0]['score']
        if "toxic" in label.lower() and confidence >= CONFIDENCE_THRESHOLD:
            # The word is toxic: check whether a replacement is already known
            if clean_word in neg_to_pos_map:
                replacement = neg_to_pos_map[clean_word]
                # Replace the original-cased form; the lowercased clean_word may not occur in word
                replaced_word = word.replace(stripped_word, replacement)
                replaced_words.append(replaced_word)
            else:
                # Look up an antonym or generate a replacement
                antonym = find_opposite(clean_word) or generate_random_antonym(clean_word)
                if antonym and antonym.isalpha():  # Validate the replacement
                    neg_to_pos_map[clean_word] = antonym
                    update_neg_to_pos_file(filepath, clean_word, antonym)
                    updates.append((clean_word, antonym))
                    replaced_word = word.replace(stripped_word, antonym)
                    replaced_words.append(replaced_word)
                else:
                    # On failure, fall back to a random positive word
                    fallback_word = random.choice(positive_words)
                    neg_to_pos_map[clean_word] = fallback_word
                    update_neg_to_pos_file(filepath, clean_word, fallback_word)
                    updates.append((clean_word, fallback_word))
                    replaced_word = word.replace(stripped_word, fallback_word)
                    replaced_words.append(replaced_word)
        else:
            # Non-toxic words are kept as they are
            replaced_words.append(word)
    return " ".join(replaced_words), updates, unresolved
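"""
Replacing a word inside a token that carries punctuation and capitalization is easy to get subtly wrong, so here is a minimal standalone sketch of that one step. It is an illustration under stated assumptions, not part of the pipeline above: `swap_core` is a hypothetical helper that keeps the punctuation around the word and carries the casing of the original over to the replacement.

```python
import string


def swap_core(token, replacement):
    # The core is the token without its leading and trailing punctuation
    core = token.strip(string.punctuation)
    if not core:
        return token  # Punctuation-only token: leave it untouched
    start = token.find(core)
    prefix, suffix = token[:start], token[start + len(core):]
    # Carry the casing of the original word over to the replacement
    if core.isupper() and len(core) > 1:
        replacement = replacement.upper()
    elif core[0].isupper():
        replacement = replacement.capitalize()
    return prefix + replacement + suffix
```

For example, `swap_core("Stupid!", "clever")` returns `Clever!`, while a token made only of punctuation is returned unchanged.
"""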
# Detect toxic text and rewrite it, banning repeat offenders
def detect_and_paraphrase_with_ban(text, neg_to_pos_map, filepath="extended_negative_to_positive_words.txt"):
    # Check whether the user is already banned
    if toxic_counter["count"] >= 3:
        return "You have been banned for submitting toxic content multiple times. Please refresh to try again."
    # Detect toxic content
    result = hate_speech_classifier(text)
    label = result[0]['label']
    confidence = result[0]['score']
    detection_info = f"Detection: {label} (Confidence: {confidence:.2f})\n"
    # The text was detected as toxic
    if "toxic" in label.lower() and confidence >= CONFIDENCE_THRESHOLD:
        toxic_counter["count"] += 1
        detection_info += "Detected toxic content. Rewriting...\n"
        if toxic_counter["count"] >= 3:
            return "You have been banned for submitting toxic content multiple times. Please refresh to try again."
        # Replace the toxic words
        rewritten_text, updates, unresolved = replace_toxic_words(text, neg_to_pos_map, filepath)
        # Log the changes and any unresolved words
        if updates:
            detection_info += "Updates made:\n" + "\n".join(
                [f"- '{word}' updated with antonym '{opposite}'" for word, opposite in updates]
            ) + "\n"
        if unresolved:
            detection_info += "Unresolved words (no antonyms found): " + ", ".join(unresolved) + "\n"
        return detection_info + f"Rewritten Text: {rewritten_text}"
    else:
        detection_info += "Content is not toxic or confidence is too low.\n"
        return detection_info + f"Original Text: {text}"
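"""
Because `toxic_counter` is a module-level dictionary, every visitor served by the same process shares one strike count. The sketch below shows the same three-strike rule with the count held in an object that could be created once per user. It is a hypothetical illustration: `StrikeTracker` and `moderate` are made-up names, and the classifier is passed in as a plain function so the example runs without loading a model.

```python
from dataclasses import dataclass


@dataclass
class StrikeTracker:
    limit: int = 3  # Number of toxic submissions that triggers the ban
    count: int = 0

    def is_banned(self):
        return self.count >= self.limit

    def record(self, toxic):
        # Count the strike, then report whether the limit is now reached
        if toxic and not self.is_banned():
            self.count += 1
        return self.is_banned()


def moderate(text, tracker, is_toxic):
    # is_toxic is any callable returning True for toxic text (a stand-in for the model)
    if tracker.is_banned():
        return "banned"
    toxic = is_toxic(text)
    if tracker.record(toxic):
        return "banned"
    return "rewrite" if toxic else "pass"
```

As in the function above, the third toxic submission is answered with the ban itself rather than a rewrite. In a Gradio app, one such tracker per session could plausibly be kept in a `gr.State` component, though that wiring is an assumption and is not shown here.
"""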
# Load the negative-to-positive map
neg_to_pos_map = load_neg_to_pos_map(filepath)
import gradio as gr
# Callback for Gradio
def detect_and_rewrite_chatbot(input_text):
    global neg_to_pos_map
    # Reload the mapping if it is empty
    if not neg_to_pos_map:
        neg_to_pos_map = load_neg_to_pos_map(filepath)
    return detect_and_paraphrase_with_ban(input_text, neg_to_pos_map, filepath)
# Build the Gradio interface
with gr.Blocks() as chatbot_interface:
    gr.Markdown("## Toxicity Detection")
    with gr.Row():
        input_text = gr.Textbox(label="Input Text", placeholder="Type something...", lines=2)
        output_text = gr.Textbox(label="Output Text", interactive=False)
    submit_button = gr.Button("Submit")
    submit_button.click(detect_and_rewrite_chatbot, inputs=input_text, outputs=output_text)
# Launch Gradio
if __name__ == "__main__":
    chatbot_interface.launch()