<div>
<h2 align="center">
🫠 SMILE
</h2>
</div>
<p align="center">
  <img alt="Issues" src="https://img.shields.io/github/issues/yuezih/SMILE?color=blueviolet" />
  <img alt="Forks" src="https://img.shields.io/github/forks/yuezih/SMILE?color=orange" />
  <img alt="Stars" src="https://img.shields.io/github/stars/yuezih/SMILE?color=ff69b4" />
  <br />
</p>
[Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation](https://arxiv.org/abs/2306.13460)
![case.png](./assets/case.png)
---
## News 📢
- [2023.09.30] We now provide the code and our trained BLIP checkpoints for quick deployment and easy reproduction. The previous demonstration code is available at [demonstrative.md](./assets/demonstrative.md).
- [2023.06.26] We provide demonstration code showing how to implement SMILE in your codebase, including pseudocode, a [BLIP](https://github.com/salesforce/BLIP) version, and a [transformers](https://github.com/huggingface/transformers) version.
## Demo
We are building online demos. Please stay tuned.
## Usage
```bash
git clone https://github.com/yuezih/SMILE
cd SMILE/BLIP
```
### Installation
```bash
pip install -r requirements.txt
```
The code has been tested on PyTorch 2.0.0.
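Note that the code expects `transformers==4.15.0` (see the reminders under Training & Inference); if `requirements.txt` did not pin it in your environment, it can be installed explicitly:
```bash
# Pin transformers to the version the code has been tested with.
pip install transformers==4.15.0
```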
### Data Preparation
The data configs are in `SMILE/BLIP/configs/caption_coco.yaml`.
- Set `image_root` to your local MSCOCO image directory (see the sketch below).
- MSCOCO annotation files will be automatically downloaded.
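A minimal sketch of that edit, assuming the config keeps BLIP's `image_root` key (the path below is a placeholder; check `configs/caption_coco.yaml` for the exact field name):
```bash
# Point image_root at your local MSCOCO image directory (placeholder path).
sed -i "s#^image_root:.*#image_root: '/path/to/coco/images/'#" configs/caption_coco.yaml
```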
### Checkpoints
The pre-trained and MLE-finetuned checkpoints are available at the [original BLIP repo](https://github.com/salesforce/BLIP).
We provide two BLIP checkpoints finetuned on MSCOCO with SMILE:
- `blip_smile_base.pth`: The vanilla SMILE-optimized BLIP.
- `blip_mle_smile_base.pth`: BLIP finetuned with a weighted mix of MLE and SMILE (0.01:0.99), striking a compromise between descriptiveness and accuracy.
Method|Download|Caption Length|Lexical Diversity|R@1|R@5|CLIPScore|PPL
:-|:-:|:-:|:-:|:-:|:-:|:-:|:-:
`blip_smile_base.pth`|[OneDrive](https://1drv.ms/u/s!AocXJ7uKxt6XcsGzBZ4XKoZWKJY?e=BW7fJK)|22.3|4.5|10.0|24.5|75.0|95.6
`blip_mle_smile_base.pth`|[OneDrive](https://1drv.ms/u/s!AocXJ7uKxt6Xc85rDJCdunDI0jU?e=eDpAGG)|19.8|3.6|**10.9**|**25.1**|76.2|79.4
Set the checkpoint path in `SMILE/BLIP/configs/caption_coco.yaml`.
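For example, assuming the config uses BLIP's `pretrained` key to select the checkpoint to load (key name and path are placeholders; verify them against the shipped config):
```bash
# Point the config at a downloaded SMILE checkpoint (placeholder path).
sed -i "s#^pretrained:.*#pretrained: '/path/to/blip_smile_base.pth'#" configs/caption_coco.yaml
```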
### Training & Inference
Training:
```bash
bash scripts/train.sh
```
Inference:
```bash
bash scripts/eval.sh
```
Kind reminders:
- Please use `transformers==4.15.0` rather than a higher version.
- For `torch<=2.0.0`, replace `torchrun` with `python -m torch.distributed.run` in the training and inference scripts (see the example below).
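A sketch of that launcher substitution (the script name and arguments are placeholders for whatever `scripts/train.sh` and `scripts/eval.sh` actually invoke):
```bash
# With newer torch, the scripts launch via torchrun, e.g.:
#   torchrun --nproc_per_node=8 train_caption.py
# With torch <= 2.0.0, the equivalent invocation is:
python -m torch.distributed.run --nproc_per_node=8 train_caption.py
```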
## Citation
If you find this repo helpful for your research, please consider citing our paper:
```bibtex
@misc{yue2023learning,
  title={Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation},
  author={Zihao Yue and Anwen Hu and Liang Zhang and Qin Jin},
  year={2023},
  eprint={2306.13460},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
## Acknowledgement
Our work relies on resources from [BLIP](https://github.com/salesforce/BLIP) and [HuggingFace transformers](https://github.com/huggingface/transformers). Many thanks to them for their amazing efforts.