🫠 SMILE


Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation

![case.png](case.png)


News 📢

  • [2023.09.30] We now provide the code and our trained BLIP checkpoints for quick deployment and easy reproduction. The previous demonstration code is available in demonstrative.md.
  • [2023.06.26] We provide demonstration code showing how to implement SMILE in your codebase, including pseudocode, a BLIP version, and a transformers version.

Demo

We are building online demos. Please stay tuned.

Usage

```shell
git clone https://github.com/yuezih/SMILE
cd SMILE/BLIP
```

Installation

```shell
pip install -r requirements.txt
```

The code has been tested on PyTorch 2.0.0.

Data Preparation

The data configs are in `SMILE/BLIP/configs/caption_coco.yaml`.

  • Set `image_root` to your MSCOCO image root.
  • MSCOCO annotation files will be automatically downloaded.
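The relevant portion of the config might look like the sketch below. Only `image_root` is confirmed by this README; the other key names and values are assumptions based on typical BLIP configs, so check the shipped file:

```yaml
# Illustrative fragment of SMILE/BLIP/configs/caption_coco.yaml.
# Only image_root is confirmed by this README; other keys are assumptions.
image_root: '/path/to/coco/images/'  # set this to your MSCOCO image root
ann_root: 'annotation'               # annotation files are downloaded here automatically
```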

Checkpoints

The pre-trained and MLE-finetuned checkpoints are available at the original BLIP repo.

We provide our two checkpoints finetuned on MSCOCO with SMILE:

  • `blip_smile_base.pth`: the vanilla SMILE-optimized BLIP.
  • `blip_mle_smile_base.pth`: BLIP finetuned with a 0.01:0.99 mix of MLE and SMILE, striking a compromise between descriptiveness and accuracy.
| Method | Download | Cap. Len. | Lex. Div. | R@1 | R@5 | CLIPScore | PPL |
| --- | --- | --- | --- | --- | --- | --- | --- |
| `blip_smile_base.pth` | OneDrive | 22.3 | 4.5 | 10.0 | 24.5 | 75.0 | 95.6 |
| `blip_mle_smile_base.pth` | OneDrive | 19.8 | 3.6 | 10.9 | 25.1 | 76.2 | 79.4 |

Set the checkpoint path in `SMILE/BLIP/configs/caption_coco.yaml`.
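Pointing the config at a downloaded checkpoint might look like the fragment below; the `pretrained` key name is an assumption, so verify it against the shipped config:

```yaml
# Illustrative fragment; the key name is an assumption based on common BLIP configs
pretrained: '/path/to/blip_smile_base.pth'
```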

Training & Inference

```shell
bash scripts/train.sh
bash scripts/eval.sh
```

Kind reminders:

  • Please use `transformers==4.15.0` rather than a higher version.
  • For `torch<=2.0.0`, replace `torchrun` with `python -m torch.distributed.run` in the training and inference scripts.
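The second reminder can be automated. The helper below is a hedged sketch (the function name is ours, not part of the repo's scripts): it picks the launch command from a torch version string, using `python -m torch.distributed.run` for versions at or below 2.0.0 and `torchrun` otherwise.

```shell
# Sketch: choose the distributed launcher per the reminder above.
# choose_launcher is a hypothetical helper, not part of the SMILE scripts.
choose_launcher() {
  ver="$1"  # e.g. "$(python -c 'import torch; print(torch.__version__)')"
  # True when $ver sorts at or before 2.0.0 in version order
  if printf '%s\n2.0.0\n' "$ver" | sort -V | head -n1 | grep -qx "$ver"; then
    echo "python -m torch.distributed.run"
  else
    echo "torchrun"
  fi
}

choose_launcher "1.13.1"  # -> python -m torch.distributed.run
choose_launcher "2.1.0"   # -> torchrun
```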

Citation

If you find this repo helpful for your research, please consider citing our paper:

@misc{yue2023learning,
      title={Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation}, 
      author={Zihao Yue and Anwen Hu and Liang Zhang and Qin Jin},
      year={2023},
      eprint={2306.13460},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Acknowledgement

Our work relies on resources from BLIP and HuggingFace transformers. Many thanks to them for their amazing efforts.