---
library_name: transformers
license: apache-2.0
datasets:
- Jialuo21/Science-T2I-Trainset
base_model:
- laion/CLIP-ViT-H-14-laion2B-s32B-b79K
---

# SciScore

SciScore is fine-tuned from the base model [CLIP-H](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) on the [Science-T2I](https://huggingface.co/datasets/Jialuo21/Science-T2I-Trainset) dataset. It takes an implicit prompt and a generated image as input and outputs a score that represents the scientific alignment between them.

## Resources
- [Website](https://jialuo-li.github.io/Science-T2I-Web/)
- [arXiv: Paper](https://arxiv.org/abs/2504.13129)
- [GitHub: Code](https://github.com/Jialuo-Li/Science-T2I)
- [Huggingface: Science-T2I-S&C Benchmark](https://huggingface.co/collections/Jialuo21/science-t2i-67d3bfe43253da2bc7cfaf06)
- [Huggingface: Science-T2I Trainset](https://huggingface.co/datasets/Jialuo21/Science-T2I-Trainset)

## Quick Start

```
from transformers import AutoProcessor, AutoModel
from PIL import Image
import torch

# Use the GPU when one is available; otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
processor_name_or_path = "Jialuo21/SciScore"
model_pretrained_name_or_path = "Jialuo21/SciScore"

processor = AutoProcessor.from_pretrained(processor_name_or_path)
model = AutoModel.from_pretrained(model_pretrained_name_or_path).eval().to(device)

def calc_probs(prompt, images):
    # Preprocess the candidate images.
    image_inputs = processor(
        images=images,
        padding=True,
        truncation=True,
        max_length=77,
        return_tensors="pt",
    ).to(device)

    # Tokenize the prompt.
    text_inputs = processor(
        text=prompt,
        padding=True,
        truncation=True,
        max_length=77,
        return_tensors="pt",
    ).to(device)

    with torch.no_grad():
        # Embed images and text, then L2-normalize each embedding.
        image_embs = model.get_image_features(**image_inputs)
        image_embs = image_embs / torch.norm(image_embs, dim=-1, keepdim=True)

        text_embs = model.get_text_features(**text_inputs)
        text_embs = text_embs / torch.norm(text_embs, dim=-1, keepdim=True)

        # Scaled cosine similarity between the prompt and every image.
        scores = model.logit_scale.exp() * (text_embs @ image_embs.T)[0]

        # Softmax over the images: the probabilities are relative to the candidates passed in.
        probs = torch.softmax(scores, dim=-1)

    return probs.cpu().tolist()

pil_images = [Image.open("./examples/camera_1.png"), Image.open("./examples/camera_2.png")]
prompt = "A camera screen without electricity sits beside the window, realistic."
print(calc_probs(prompt, pil_images))
```

## Citation

```
@misc{li2025sciencet2iaddressingscientificillusions,
      title={Science-T2I: Addressing Scientific Illusions in Image Synthesis},
      author={Jialuo Li and Wenhao Chai and Xingyu Fu and Haiyang Xu and Saining Xie},
      year={2025},
      eprint={2504.13129},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.13129},
}
```
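
## Scoring Sketch

The scoring step inside `calc_probs` above (normalize the embeddings, take scaled cosine similarities, apply a softmax across the candidate images) can be illustrated without loading the model. The following is a minimal pure-Python sketch under stated assumptions: the 3-dimensional vectors and the `logit_scale` value of 100.0 are hypothetical toy values, not outputs or parameters of SciScore, and the helper names `normalize` and `preference_probs` are introduced here for illustration only.

```python
import math

def normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def preference_probs(text_emb, image_embs, logit_scale):
    """Softmax over scaled cosine similarities between one text and several images."""
    t = normalize(text_emb)
    scores = []
    for img in image_embs:
        v = normalize(img)
        # Dot product of unit vectors is the cosine similarity.
        scores.append(logit_scale * sum(a * b for a, b in zip(t, v)))
    # Subtract the max score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings (assumed values): the first image points roughly
# in the same direction as the text, the second does not.
text_emb = [1.0, 0.0, 0.0]
image_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]]
probs = preference_probs(text_emb, image_embs, logit_scale=100.0)
print(probs)
```

Because the softmax is taken over the images supplied in one call, the returned values sum to 1 and express a preference among those candidates rather than an absolute score for any single image.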