Given a photo of a face, this model generates a text description of it. Be careful: the descriptions can be unflattering.

Based on the GIT-Base-COCO image-to-text model, fine-tuned on Face2Text.

How to use:

from transformers import AutoProcessor, AutoModelForCausalLM, AutoTokenizer
import cv2

DEVICE = 'cpu'  # 'cpu' or 'cuda'
IMG_PATH = 'face.png'

processor = AutoProcessor.from_pretrained('microsoft/git-base-coco')
model = AutoModelForCausalLM.from_pretrained('mtanti/face-describer')
tokeniser = AutoTokenizer.from_pretrained('microsoft/git-base-coco')
model.eval()
model.to(DEVICE)

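# OpenCV returns None instead of raising when the file cannot be read.
img = cv2.imread(IMG_PATH)
if img is None:
    raise FileNotFoundError(IMG_PATH)

# OpenCV loads images as BGR; reverse the channel axis to get RGB.
tensor_img = processor(
    images=[img[:, :, ::-1]],
    return_tensors='pt',
)['pixel_values'].to(DEVICE)

# Sampling is enabled, so the description can differ between runs.
desc = tokeniser.decode(
    model.generate(pixel_values=tensor_img, max_length=100, repetition_penalty=1.05, do_sample=True)[0, :],
    skip_special_tokens=True,
)
print(desc)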
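
Because the snippet above calls `generate` with `do_sample=True`, each run can produce a different description. The following is a minimal sketch, not part of the model's API, of a helper that draws several descriptions for the same image. The names `describe_face` and `num_descriptions` are hypothetical. It assumes the `processor`, `model` and `tokeniser` objects have been loaded as shown above and that the image has already been converted to RGB.

```python
def describe_face(image_rgb, processor, model, tokeniser,
                  device='cpu', num_descriptions=3):
    """Return a list of sampled descriptions for one RGB face image.

    Hypothetical helper: the processor, model and tokeniser are passed in
    rather than loaded here, so nothing is downloaded by this function.
    """
    # Preprocess once; the same pixel values are reused for every sample.
    pixel_values = processor(
        images=[image_rgb],
        return_tensors='pt',
    )['pixel_values'].to(device)

    descriptions = []
    for _ in range(num_descriptions):
        # Same generation settings as in the usage snippet above.
        token_ids = model.generate(
            pixel_values=pixel_values,
            max_length=100,
            repetition_penalty=1.05,
            do_sample=True,
        )[0]
        descriptions.append(
            tokeniser.decode(token_ids, skip_special_tokens=True)
        )
    return descriptions
```

With the objects from the usage snippet, a call might look like `describe_face(img[:, :, ::-1], processor, model, tokeniser, device=DEVICE)`. Drawing a few samples and picking one by hand is a simple way to cope with the variation that sampling introduces.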