---
license: apache-2.0
language:
- fr
- en
pipeline_tag: zero-shot-object-detection
library_name: transformers
base_model:
- omlab/omdet-turbo-swin-tiny-hf
tags:
- endpoints-template
---

# Fork of [omlab/omdet-turbo-swin-tiny-hf](https://huggingface.co/omlab/omdet-turbo-swin-tiny-hf) for a `zero-shot-object-detection` Inference Endpoint

This repository implements a `custom` task for `zero-shot-object-detection` for 🤗 Inference Endpoints. The code for the customized handler is in [handler.py](https://huggingface.co/Blueway/inference-endpoint-for-omdet-turbo-swin-tiny-hf/blob/main/handler.py).

To deploy this model as an Inference Endpoint, select `Custom` as the task so that the `handler.py` file is used. The repository contains a `requirements.txt` that installs the `timm` library.

### Expected request payload

```json
{
  "inputs": {
    "image": "/9j/4AAQSkZJRgABAQEBLAEsAAD/2wBDAAMCAgICAgMC....", // base64-encoded image bytes
    "candidates": ["broken curb", "broken road", "broken road sign", "broken sidewalk"]
  }
}
```

Below is an example of how to run a request using Python and `requests`.

## Run Request

```python
import base64
import json
from typing import List

import requests as r

ENDPOINT_URL = ""
HF_TOKEN = ""


def predict(path_to_image: str = None, candidates: List[str] = None):
    with open(path_to_image, "rb") as i:
        b64 = base64.b64encode(i.read())
    payload = {"inputs": {"image": b64.decode("utf-8"), "candidates": candidates}}
    response = r.post(
        ENDPOINT_URL, headers={"Authorization": f"Bearer {HF_TOKEN}"}, json=payload
    )
    return response.json()


prediction = predict(
    path_to_image="image/brokencurb.jpg",
    candidates=["broken curb", "broken road", "broken road sign", "broken sidewalk"],
)
print(json.dumps(prediction, indent=2))
```

Expected output:

```json
{
  "boxes": [
    [1.919342041015625, 231.1556396484375, 1011.4019775390625, 680.3773193359375],
    [610.9949951171875, 397.6180419921875, 1019.9259033203125, 510.8144226074219],
    [1.919342041015625, 231.1556396484375, 1011.4019775390625, 680.3773193359375],
    [786.1240234375, 68.618896484375, 916.1265869140625, 225.0513458251953]
  ],
  "scores": [
    0.4329715967178345,
    0.4215811491012573,
    0.3389397859573364,
    0.3133399784564972
  ],
  "candidates": [
    "broken sidewalk",
    "broken road sign",
    "broken road",
    "broken road sign"
  ]
}
```

Each box is structured as `[x_min, y_min, x_max, y_max]` in pixel coordinates.
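The `boxes`, `scores`, and `candidates` lists are parallel: the i-th entry of each describes the same detection. If you only want to keep detections above a confidence threshold before plotting, a small helper like the sketch below can be used (the `filter_predictions` function and the `0.4` threshold are illustrative, not part of the endpoint).

```python
def filter_predictions(prediction: dict, score_threshold: float = 0.4) -> dict:
    """Keep only detections whose score is at least `score_threshold`.

    The three lists in the endpoint response are parallel, so they are
    filtered together.
    """
    kept = [
        (box, score, label)
        for box, score, label in zip(
            prediction["boxes"], prediction["scores"], prediction["candidates"]
        )
        if score >= score_threshold
    ]
    boxes, scores, candidates = zip(*kept) if kept else ([], [], [])
    return {
        "boxes": list(boxes),
        "scores": list(scores),
        "candidates": list(candidates),
    }


filtered = filter_predictions(prediction, score_threshold=0.4)
print(json.dumps(filtered, indent=2))
```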
## Visualize result

*Input image*
To visualize the result of the request, you can use the following code:

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

prediction = predict(
    path_to_image="image/cat_and_remote.jpg",
    candidates=["cat", "remote", "pot hole"],
)

with open("image/cat_and_remote.jpg", "rb") as i:
    image = plt.imread(i)

# Plot the image
fig, ax = plt.subplots(1)
ax.imshow(image)

for score, class_name, box in zip(
    prediction["scores"], prediction["candidates"], prediction["boxes"]
):
    # Create a rectangle patch for the bounding box
    rect = patches.Rectangle(
        (int(box[0]), int(box[1])),
        int(box[2] - box[0]),
        int(box[3] - box[1]),
        linewidth=1,
        edgecolor="r",
        facecolor="none",
    )
    # Add the patch to the Axes
    ax.add_patch(rect)
    # Label the box with the score and the matched candidate
    ax.text(
        int(box[0]),
        int(box[1]),
        f"{round(score, 2)} {class_name}",
        color="white",
        fontsize=6,
        bbox=dict(facecolor="red", alpha=0.5),
    )

plt.savefig("image_result/cat_and_remote_with_bboxes_zero_shot.jpeg")
```

**Result**
*Output image with predicted bounding boxes*
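## Handler sketch

For reference, custom handlers for Inference Endpoints implement the `EndpointHandler` contract: an `__init__(self, path)` that loads the model from the repository files and a `__call__(self, data)` that receives the request payload. The sketch below outlines what a handler for this task could look like. It is an illustrative assumption, not the exact `handler.py` shipped in this repository, and the OmDet-Turbo post-processing argument names (`classes`, `score_threshold`, `nms_threshold`) follow the transformers documentation for this model and may differ between library versions.

```python
import base64
from io import BytesIO
from typing import Any, Dict

import torch
from PIL import Image
from transformers import AutoProcessor, OmDetTurboForObjectDetection


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points to the repository files downloaded by the endpoint.
        self.processor = AutoProcessor.from_pretrained(path)
        self.model = OmDetTurboForObjectDetection.from_pretrained(path)
        self.model.eval()

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = data.get("inputs", data)
        # Decode the base64-encoded image and read the candidate labels.
        image = Image.open(BytesIO(base64.b64decode(inputs["image"]))).convert("RGB")
        candidates = inputs["candidates"]

        model_inputs = self.processor(image, text=candidates, return_tensors="pt")
        with torch.no_grad():
            outputs = self.model(**model_inputs)

        # NOTE: argument names may differ across transformers versions
        # ("classes" was later renamed to "text_labels").
        results = self.processor.post_process_grounded_object_detection(
            outputs,
            classes=candidates,
            target_sizes=[image.size[::-1]],
            score_threshold=0.3,
            nms_threshold=0.3,
        )[0]

        return {
            "boxes": results["boxes"].tolist(),
            "scores": results["scores"].tolist(),
            "candidates": results["classes"],
        }
```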
## Credits

This adaptation for Hugging Face Inference Endpoints was inspired by [@philschmid](https://huggingface.co/philschmid)'s work on [philschmid/clip-zero-shot-image-classification](https://huggingface.co/philschmid/clip-zero-shot-image-classification).