---
license: mit
language:
- en
pipeline_tag: image-segmentation
---

# LogoCleaner: Logo Detection and Removal Models

This repository contains the required models for the LogoCleaner Software, developed as a course project for **COMP4432** at the Department of Computing, The Hong Kong Polytechnic University.

The implementation code and detailed usage instructions can be found at [LogoCleaner GitHub Repository](https://github.com/hiteacherIamhumble/LogoCleaner).

## Model Details

### Model Description

This repository contains two key model weights:

1. **sam_vit_b_01ec64.pth**: The Segment Anything Model (SAM) with ViT-B backbone developed by Meta. SAM is a powerful foundation model for image segmentation tasks that can identify objects in images based on prompts.
2. **best_model.pth**: A custom selector module that works on top of the SAM model, specifically trained by our team to identify and select logo regions in images.

Together, these models form the backbone of the LogoCleaner application, which can automatically detect and remove logos from images.

### Model Architecture

- **SAM (ViT-B)**: A vision transformer-based architecture that serves as a powerful segmentation foundation model.
- **Selector Module**: A custom neural network that takes SAM's outputs and specializes in logo identification.

## Intended Uses & Limitations

### Intended Uses

- Automatic logo detection in images
- Logo removal and inpainting for privacy or copyright reasons
- Educational purposes for computer vision and image processing

### Limitations

- Performance may vary depending on logo complexity and image quality
- The model works best with clear, distinct logos rather than heavily stylized or distorted ones

## Training Data

The selector module was trained on the [FlickrLogos-32](https://www.uni-augsburg.de/en/fakultaet/fai/informatik/prof/mmc/research/datensatze/flickrlogos/) dataset, released in ICMR 2011 and updated in ICML 2017. It contains photos showing brand logos and is meant for the evaluation of logo retrieval and multi-class logo detection/recognition systems on real-world images.

Although the dataset is open-source and well known, we cannot provide a direct link to it: the owner requires an (informal) email request to [request_flickrlogos@informatik.uni-augsburg.de](mailto:request_flickrlogos@informatik.uni-augsburg.de) before sharing the data. Our team obtained the original dataset the same way, and we apologize for the inconvenience.

## Training Procedure

We trained on a single RTX 4090 graphics card for 50 epochs (the selector module converges quickly, at around epoch 45). The training source code is available in our [GitHub Repository](https://github.com/hiteacherIamhumble/LogoCleaner).

## Evaluation Results

We evaluate with the classic fusion loss (BCE plus Dice loss). Our model outperforms the classic U-Net by 2% on the Dice loss. For details, please refer to the report in our [GitHub Repository](https://github.com/hiteacherIamhumble/LogoCleaner).

## Usage

### Direct Download

Both model files can be downloaded directly from this repository's files section.

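For scripted setups without the `huggingface_hub` package, a direct download can also be done with the Python standard library. The following is a minimal sketch, not part of the LogoCleaner code: it assumes the files live on the repository's default `main` revision and uses the Hub's usual `resolve` URL pattern. The names `resolve_url`, `download_models`, and `MODEL_FILES` are illustrative only.

```python
import os
from typing import List
from urllib.parse import quote
from urllib.request import urlretrieve

REPO_ID = "PeterDAI/LogoCleaner"
MODEL_FILES = ("sam_vit_b_01ec64.pth", "best_model.pth")


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build the direct-download ("resolve") URL for one file in a Hub repo.

    Assumption: the file is stored on the given revision ("main" by default).
    """
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


def download_models(target_dir: str = ".") -> List[str]:
    """Download both weight files into target_dir and return the local paths."""
    os.makedirs(target_dir, exist_ok=True)
    local_paths = []
    for name in MODEL_FILES:
        destination = os.path.join(target_dir, name)
        # urlretrieve streams the response body to disk and follows redirects.
        urlretrieve(resolve_url(REPO_ID, name), destination)
        local_paths.append(destination)
    return local_paths


if __name__ == "__main__":
    # Only print the URLs here; call download_models() to actually fetch the files.
    for name in MODEL_FILES:
        print(resolve_url(REPO_ID, name))
```

Calling `download_models("weights")` would place both files in a local `weights` directory. Unlike `hf_hub_download`, this sketch does no caching or resume handling, so the Hub client remains the more robust option.
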
### Using the Hugging Face Hub

```python
from huggingface_hub import hf_hub_download

# Download the SAM model
sam_path = hf_hub_download(
    repo_id="PeterDAI/LogoCleaner",
    filename="sam_vit_b_01ec64.pth"
)

# Download the selector model
selector_path = hf_hub_download(
    repo_id="PeterDAI/LogoCleaner",
    filename="best_model.pth"
)
```

## Contact

If you are the professor or TA assessing our project and encounter any problem, feel free to contact us by email at [22097845d@connect.polyu.hk](mailto:22097845d@connect.polyu.hk).