<div>
  <h2 align="center">
    🫠 SMILE
  </h2>
</div>

<p align="center">
  <a >
    <img alt="Issues" src="https://img.shields.io/github/issues/yuezih/SMILE?color=blueviolet" />
  </a>
  <a >
    <img alt="Forks" src="https://img.shields.io/github/forks/yuezih/SMILE?color=orange" />
  </a>
  <a >
    <img alt="Stars" src="https://img.shields.io/github/stars/yuezih/SMILE?color=ff69b4" />
  </a>
  <br />
</p>
[Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation](https://arxiv.org/abs/2306.13460)

---
## News 📢
- [2023.09.30] We now provide the code and our trained BLIP checkpoints for quick deployment and easy reproduction. The earlier demonstration code is now available at [demonstrative.md](./assets/demonstrative.md).
- [2023.06.26] We provide demonstration code showing how to implement SMILE in your own codebase, including pseudocode, a [BLIP](https://github.com/salesforce/BLIP) version, and a [transformers](https://github.com/huggingface/transformers) version. A rough sketch of the core idea is shown below.
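
For orientation only, here is a minimal PyTorch sketch of the "semipermeable" idea: the target token keeps its usual likelihood-raising gradient, while the non-target logits are detached so the loss does not actively suppress other (possibly valid) words. The function name and exact formulation are illustrative, not the repo's code; see the paper and [demonstrative.md](./assets/demonstrative.md) for the official implementations.

```python
import torch
import torch.nn.functional as F

def smile_loss_sketch(logits, targets, ignore_index=-100):
    # logits: (batch, seq_len, vocab_size); targets: (batch, seq_len) token ids
    vocab_size = logits.size(-1)
    # Boolean mask marking each position's target token in the vocabulary dimension.
    target_mask = F.one_hot(targets.clamp(min=0), num_classes=vocab_size).bool()
    # Keep gradients only for the target logit; detach all other logits so the
    # softmax denominator does not push competing words down.
    semi_logits = torch.where(target_mask, logits, logits.detach())
    return F.cross_entropy(
        semi_logits.view(-1, vocab_size), targets.view(-1), ignore_index=ignore_index
    )
```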
## Demo
We are building online demos. Please stay tuned.
## Usage
Clone the repository and enter the BLIP directory:
```
git clone https://github.com/yuezih/SMILE
cd SMILE/BLIP
```
### Installation
```
pip install -r requirements.txt
```
The code has been tested on PyTorch 2.0.0.
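
Optionally, a quick check that your environment matches the tested setup (the distributed training scripts expect at least one GPU):

```python
import torch

print("torch:", torch.__version__)                    # tested with 2.0.0
print("CUDA available:", torch.cuda.is_available())   # training scripts expect GPUs
```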
### Data Preparation
The data configs are in `SMILE/BLIP/configs/caption_coco.yaml`.
- Set `image_root` to your local MSCOCO image root (a quick check is shown below).
- The MSCOCO annotation files will be downloaded automatically.
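
An optional sanity check, run from `SMILE/BLIP`, that the config points at your images. Only the `image_root` key is assumed here; the other entries follow the repo's `caption_coco.yaml`, and PyYAML is assumed to be available (`pip install pyyaml` if it is not):

```python
import os
import yaml

with open("configs/caption_coco.yaml") as f:
    config = yaml.safe_load(f)

image_root = config["image_root"]
print("image_root:", image_root)
print("exists:", os.path.isdir(image_root))
```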
### Checkpoints
The pre-trained and MLE-finetuned checkpoints are available at the [original BLIP repo](https://github.com/salesforce/BLIP).

We provide two checkpoints finetuned on MSCOCO with SMILE:
- `blip_smile_base.pth`: the vanilla SMILE-optimized BLIP.
- `blip_mle_smile_base.pth`: BLIP finetuned with MLE+SMILE (weighted 0.01:0.99), which strikes a compromise between descriptiveness and accuracy.

| Method | Download | Cap. Len. | Lex. Div. | R@1 | R@5 | CLIPScore | PPL |
| :- | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
| `blip_smile_base.pth` | [OneDrive](https://1drv.ms/u/s!AocXJ7uKxt6XcsGzBZ4XKoZWKJY?e=BW7fJK) | 22.3 | 4.5 | 10.0 | 24.5 | 75.0 | 95.6 |
| `blip_mle_smile_base.pth` | [OneDrive](https://1drv.ms/u/s!AocXJ7uKxt6Xc85rDJCdunDI0jU?e=eDpAGG) | 19.8 | 3.6 | **10.9** | **25.1** | 76.2 | 79.4 |

Set the checkpoint path in `SMILE/BLIP/configs/caption_coco.yaml`.
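
You can also verify that a downloaded checkpoint loads before launching a run. The snippet assumes the usual BLIP layout where the weights sit under a `model` key; adjust the path or key if your file differs:

```python
import torch

ckpt = torch.load("blip_smile_base.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt
print(len(state_dict), "tensors, e.g.", list(state_dict)[:3])
```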
### Training & Inference
Training:
```
bash scripts/train.sh
```
Inference:
```
bash scripts/eval.sh
```
Kind reminders:
- Please use `transformers==4.15.0` rather than a newer version (a quick check is shown below).
- For `torch<=2.0.0`, replace `torchrun` with `python -m torch.distributed.run` in the training and inference scripts.
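
If in doubt, the `transformers` pin from the reminders above can be checked before launching:

```python
import transformers

# The repo expects transformers 4.15.0; newer versions may not be compatible with this code.
assert transformers.__version__ == "4.15.0", transformers.__version__
```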
## Citation
If you find this repo helpful for your research, please consider citing our paper:
```bibtex
@misc{yue2023learning,
  title={Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation},
  author={Zihao Yue and Anwen Hu and Liang Zhang and Qin Jin},
  year={2023},
  eprint={2306.13460},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
## Acknowledgement
Our work relies on resources from [BLIP](https://github.com/salesforce/BLIP) and [HuggingFace transformers](https://github.com/huggingface/transformers). Many thanks to them for their amazing efforts.