LogoCleaner: Logo Detection and Removal Models
This repository contains the required models for the LogoCleaner software, developed as a course project for COMP4432 at the Department of Computing, The Hong Kong Polytechnic University. The implementation code and detailed usage instructions can be found in the LogoCleaner GitHub Repository.
Model Details
Model Description
This repository contains two key model weights:
sam_vit_b_01ec64.pth: The Segment Anything Model (SAM) with ViT-B backbone developed by Meta. SAM is a powerful foundation model for image segmentation tasks that can identify objects in images based on prompts.
best_model.pth: A custom selector module that works on top of the SAM model, specifically trained by our team to identify and select logo regions in images.
Together, these models form the backbone of the LogoCleaner application, which can automatically detect and remove logos from images.
Model Architecture
- SAM (ViT-B): A vision transformer-based architecture that serves as a powerful segmentation foundation model.
- Selector Module: A custom neural network that takes SAM's outputs and specializes in logo identification.
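As a rough illustration of how these two pieces could fit together, the hypothetical sketch below (all names are ours, not part of this repository's code) ranks SAM-style mask proposals by a selector confidence score and keeps the best one:

```python
import numpy as np

def select_logo_mask(candidate_masks, logo_scores):
    """Pick the candidate mask with the highest logo score.

    candidate_masks: (N, H, W) boolean array of SAM mask proposals.
    logo_scores: (N,) array of selector confidences in [0, 1].
    """
    best = int(np.argmax(logo_scores))
    return candidate_masks[best]

# Toy example: three 2x2 mask proposals, the second one scores highest.
masks = np.array([
    [[1, 0], [0, 0]],
    [[0, 1], [0, 1]],
    [[1, 1], [1, 1]],
], dtype=bool)
scores = np.array([0.1, 0.9, 0.3])
print(select_logo_mask(masks, scores).astype(int).tolist())  # [[0, 1], [0, 1]]
```

In the real application the scores come from the trained selector network rather than a fixed array, but the selection step is conceptually this simple.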
Intended Uses & Limitations
Intended Uses
- Automatic logo detection in images
- Logo removal and inpainting for privacy or copyright reasons
- Educational purposes for computer vision and image processing
Limitations
- Performance may vary depending on logo complexity and image quality
- The model works best with clear, distinct logos rather than heavily stylized or distorted ones
Training Data
The selector module was trained on the FlickrLogos-32 dataset (released at ICMR 2011 and updated in ICML 2017), which contains real-world photos of brand logos and is intended for evaluating logo retrieval and multi-class logo detection/recognition systems.
Note that although the dataset is open-source and well known, we cannot link to it directly: its owners require a short (informal) email request before granting access. Our team sent such a request and obtained the original dataset. We apologize for the inconvenience.
Training Procedure
We trained on a single RTX 4090 graphics card for 50 epochs (the selector module converges quickly, at around epoch 45). The training source code is available in our GitHub Repository.
Evaluation Results
We train with the classic fusion loss (BCE + Dice) and our selector outperforms a classic U-Net baseline by 2% in Dice score. For details, please refer to the report in our GitHub Repository.
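The fusion loss itself is a weighted sum of binary cross-entropy and Dice loss. The NumPy sketch below is an illustrative implementation of that standard formulation (the function name and weighting are ours; the actual training code is in the GitHub Repository):

```python
import numpy as np

def bce_dice_loss(pred, target, eps=1e-7, dice_weight=0.5):
    """Fused BCE + Dice loss over probability maps in [0, 1].

    pred, target: arrays of the same shape; target is the binary mask.
    """
    pred = np.clip(pred, eps, 1 - eps)  # avoid log(0)
    bce = -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))
    intersection = np.sum(pred * target)
    dice = 1 - (2 * intersection + eps) / (np.sum(pred) + np.sum(target) + eps)
    return (1 - dice_weight) * bce + dice_weight * dice

# A perfect prediction yields a near-zero loss.
target = np.array([1.0, 0.0, 1.0, 0.0])
print(bce_dice_loss(target, target) < 0.01)  # True
```

The BCE term gives dense per-pixel gradients while the Dice term directly optimizes region overlap, which is why this combination is a common default for segmentation.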
Usage
Direct Download
Both model files can be downloaded directly from this repository's files section.
Using the Hugging Face Hub
```python
from huggingface_hub import hf_hub_download

# Download the SAM model
sam_path = hf_hub_download(
    repo_id="PeterDAI/LogoCleaner",
    filename="sam_vit_b_01ec64.pth",
)

# Download the selector model
selector_path = hf_hub_download(
    repo_id="PeterDAI/LogoCleaner",
    filename="best_model.pth",
)
```
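After downloading the weights, the full pipeline runs SAM plus the selector to obtain a logo mask and then removes the masked region. The sketch below (our own illustrative names, not part of this repository's code) shows only that final masking step, with a flat fill standing in for real inpainting:

```python
import numpy as np

def remove_logo(image, mask, fill_value=255):
    """Blank out pixels inside the logo mask with a constant fill.

    image: (H, W, C) uint8 array; mask: (H, W) boolean logo mask.
    A production pipeline would inpaint the region instead
    (e.g. with OpenCV's cv2.inpaint) rather than flat-fill it.
    """
    cleaned = image.copy()
    cleaned[mask] = fill_value  # boolean mask selects positions across all channels
    return cleaned

# Toy 2x2 RGB image where the top-left pixel is "the logo".
image = np.zeros((2, 2, 3), dtype=np.uint8)
mask = np.array([[True, False], [False, False]])
cleaned = remove_logo(image, mask)
print(cleaned[0, 0].tolist())  # [255, 255, 255]
```

The original image is left untouched because the function copies before writing, which keeps the detect/remove steps composable.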
Contact
If you are the professor or a TA assessing our project and encounter any problems, feel free to contact us at this email.