---
license: apache-2.0
datasets:
- TNILab/DomainNet_FL
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- Multisource-121-DomainNet
---

# **Multisource-121-DomainNet**
> **Multisource-121-DomainNet** is an image classification model fine-tuned from the **google/siglip2-base-patch16-224** vision-language encoder for single-label classification. It uses the **SiglipForImageClassification** architecture to assign each image to one of 121 categories.

- *Moment Matching for Multi-Source Domain Adaptation*: https://arxiv.org/pdf/1812.01754
- *SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features*: https://arxiv.org/pdf/2502.14786

```text
Classification Report:
class precision recall f1-score support
barn 0.7483 0.8370 0.7902 270
baseball_bat 0.9197 0.9333 0.9265 270
basket 0.8302 0.8148 0.8224 270
beach 0.7059 0.7556 0.7299 270
bear 0.7500 0.7444 0.7472 270
beard 0.5496 0.5741 0.5616 270
bee 0.9004 0.9037 0.9020 270
bird 0.7352 0.7815 0.7576 270
blueberry 0.7230 0.7926 0.7562 270
bowtie 0.8726 0.8370 0.8544 270
bracelet 0.7328 0.7111 0.7218 270
brain 0.8925 0.9222 0.9071 270
bread 0.5573 0.6667 0.6071 270
broccoli 0.9200 0.7667 0.8364 270
bus 0.8442 0.8630 0.8535 270
butterfly 0.9321 0.9148 0.9234 270
circle 0.6038 0.8185 0.6950 270
cloud 0.8201 0.8444 0.8321 270
cruise_ship 0.8545 0.8481 0.8513 270
dolphin 0.8286 0.8593 0.8436 270
dumbbell 0.8705 0.8963 0.8832 270
elephant 0.8598 0.8630 0.8614 270
eye 0.8603 0.8667 0.8635 270
eyeglasses 0.8425 0.7926 0.8168 270
feather 0.8413 0.7852 0.8123 270
fish 0.8169 0.8593 0.8375 270
flower 0.7973 0.8741 0.8339 270
foot 0.8152 0.8333 0.8242 270
frog 0.9270 0.8000 0.8588 270
giraffe 0.9026 0.8926 0.8976 270
goatee 0.5171 0.5037 0.5103 270
golf_club 0.6466 0.6778 0.6618 270
grapes 0.8731 0.8407 0.8566 270
grass 0.7359 0.6296 0.6786 270
guitar 0.8386 0.8852 0.8613 270
hamburger 0.8535 0.8630 0.8582 270
hand 0.7824 0.6926 0.7348 270
hat 0.7333 0.7741 0.7532 270
headphones 0.8971 0.9037 0.9004 270
helicopter 0.8992 0.8259 0.8610 270
hexagon 0.9113 0.8370 0.8726 270
hockey_stick 0.8419 0.8481 0.8450 270
horse 0.8081 0.8889 0.8466 270
hourglass 0.9161 0.9296 0.9228 270
house 0.7524 0.8778 0.8103 270
ice_cream 0.8821 0.8593 0.8705 270
jacket 0.8621 0.7407 0.7968 270
ladder 0.7051 0.8148 0.7560 270
leg 0.5916 0.5741 0.5827 270
lipstick 0.8889 0.8000 0.8421 270
megaphone 0.8710 0.9000 0.8852 270
monkey 0.8370 0.8556 0.8462 270
moon 0.8527 0.8148 0.8333 270
mushroom 0.8774 0.8481 0.8625 270
necklace 0.8670 0.7481 0.8032 270
owl 0.9179 0.9111 0.9145 270
panda 0.9490 0.8963 0.9219 270
pear 0.8832 0.8963 0.8897 270
peas 0.7743 0.8259 0.7993 270
penguin 0.8618 0.8778 0.8697 270
pig 0.6767 0.8296 0.7454 270
pillow 0.7359 0.6296 0.6786 270
pineapple 0.9213 0.9111 0.9162 270
pizza 0.9173 0.9444 0.9307 270
pool 0.6717 0.6593 0.6654 270
popsicle 0.7390 0.8074 0.7717 270
rabbit 0.8345 0.8778 0.8556 270
rhinoceros 0.9219 0.9185 0.9202 270
rifle 0.9256 0.8296 0.8750 270
river 0.6067 0.7370 0.6656 270
sailboat 0.8606 0.9148 0.8869 270
sandwich 0.7638 0.7667 0.7652 270
sea_turtle 0.8794 0.9185 0.8986 270
shark 0.8114 0.8444 0.8276 270
shoe 0.8097 0.8667 0.8372 270
skyscraper 0.7727 0.8185 0.7950 270
snorkel 0.8238 0.6926 0.7525 270
snowman 0.8736 0.8444 0.8588 270
soccer_ball 0.9395 0.8630 0.8996 270
speedboat 0.7649 0.7593 0.7621 270
spider 0.9212 0.8222 0.8689 270
spoon 0.8165 0.8074 0.8119 270
square 0.4669 0.6259 0.5348 270
squirrel 0.8394 0.7741 0.8054 270
stethoscope 0.8566 0.8630 0.8598 270
strawberry 0.8629 0.7926 0.8263 270
streetlight 0.5000 0.6852 0.5781 270
submarine 0.6850 0.6926 0.6888 270
suitcase 0.8259 0.7556 0.7892 270
sun 0.8082 0.6556 0.7239 270
sweater 0.5912 0.6963 0.6395 270
sword 0.8258 0.8074 0.8165 270
table 0.5502 0.5481 0.5492 270
teapot 0.9019 0.8852 0.8935 270
teddy-bear 0.7906 0.8111 0.8007 270
telephone 0.7836 0.7778 0.7807 270
tent 0.7579 0.7074 0.7318 270
The_Eiffel_Tower 0.8633 0.8889 0.8759 270
The_Great_Wall_of_China 0.8893 0.8333 0.8604 270
The_Mona_Lisa 0.8152 0.9148 0.8621 270
tiger 0.8577 0.8259 0.8415 270
toaster 0.6788 0.6889 0.6838 270
tooth 0.8807 0.7926 0.8343 270
tornado 0.7530 0.7000 0.7255 270
tractor 0.9372 0.8296 0.8802 270
train 0.7692 0.7407 0.7547 270
tree 0.7639 0.8148 0.7885 270
triangle 0.8852 0.8000 0.8405 270
trombone 0.6653 0.5963 0.6289 270
truck 0.7049 0.7963 0.7478 270
trumpet 0.7463 0.5667 0.6442 270
umbrella 0.9144 0.8704 0.8918 270
vase 0.8148 0.7333 0.7719 270
violin 0.8966 0.7704 0.8287 270
watermelon 0.7970 0.8000 0.7985 270
whale 0.7769 0.6963 0.7344 270
windmill 0.8963 0.8963 0.8963 270
wine_glass 0.8996 0.8630 0.8809 270
yoga 0.7406 0.8037 0.7709 270
zebra 0.9144 0.7519 0.8252 270
zigzag 0.6502 0.6333 0.6417 270
accuracy 0.7995 32670
macro avg 0.8052 0.7995 0.8006 32670
weighted avg 0.8052 0.7995 0.8006 32670
```
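
The F1 column in the report is the harmonic mean of the precision and recall columns. As a quick sanity check, the sketch below recomputes a few rows. The helper name `f1_score` and the hand-copied `rows` dictionary are illustrative only, and because the reported precision and recall are rounded to four decimals, a recomputed value may differ from the report in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) copied by hand from the report above
rows = {
    "barn": (0.7483, 0.8370, 0.7902),
    "baseball_bat": (0.9197, 0.9333, 0.9265),
    "square": (0.4669, 0.6259, 0.5348),
}

for name, (p, r, reported) in rows.items():
    print(f"{name:>14}: recomputed={f1_score(p, r):.4f} reported={reported:.4f}")

# Every class has support 270, so the total support is 121 * 270
total_support = 121 * 270
print("total support:", total_support)
```

Because every class has the same support, the macro-averaged recall coincides with overall accuracy, which is why both rows show 0.7995.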
The model categorizes images into the following 121 classes:
- **Class 0:** "barn"
- **Class 1:** "baseball_bat"
- **Class 2:** "basket"
- **Class 3:** "beach"
- **Class 4:** "bear"
- **Class 5:** "beard"
- **Class 6:** "bee"
- **Class 7:** "bird"
- **Class 8:** "blueberry"
- **Class 9:** "bowtie"
- **Class 10:** "bracelet"
- **Class 11:** "brain"
- **Class 12:** "bread"
- **Class 13:** "broccoli"
- **Class 14:** "bus"
- **Class 15:** "butterfly"
- **Class 16:** "circle"
- **Class 17:** "cloud"
- **Class 18:** "cruise_ship"
- **Class 19:** "dolphin"
- **Class 20:** "dumbbell"
- **Class 21:** "elephant"
- **Class 22:** "eye"
- **Class 23:** "eyeglasses"
- **Class 24:** "feather"
- **Class 25:** "fish"
- **Class 26:** "flower"
- **Class 27:** "foot"
- **Class 28:** "frog"
- **Class 29:** "giraffe"
- **Class 30:** "goatee"
- **Class 31:** "golf_club"
- **Class 32:** "grapes"
- **Class 33:** "grass"
- **Class 34:** "guitar"
- **Class 35:** "hamburger"
- **Class 36:** "hand"
- **Class 37:** "hat"
- **Class 38:** "headphones"
- **Class 39:** "helicopter"
- **Class 40:** "hexagon"
- **Class 41:** "hockey_stick"
- **Class 42:** "horse"
- **Class 43:** "hourglass"
- **Class 44:** "house"
- **Class 45:** "ice_cream"
- **Class 46:** "jacket"
- **Class 47:** "ladder"
- **Class 48:** "leg"
- **Class 49:** "lipstick"
- **Class 50:** "megaphone"
- **Class 51:** "monkey"
- **Class 52:** "moon"
- **Class 53:** "mushroom"
- **Class 54:** "necklace"
- **Class 55:** "owl"
- **Class 56:** "panda"
- **Class 57:** "pear"
- **Class 58:** "peas"
- **Class 59:** "penguin"
- **Class 60:** "pig"
- **Class 61:** "pillow"
- **Class 62:** "pineapple"
- **Class 63:** "pizza"
- **Class 64:** "pool"
- **Class 65:** "popsicle"
- **Class 66:** "rabbit"
- **Class 67:** "rhinoceros"
- **Class 68:** "rifle"
- **Class 69:** "river"
- **Class 70:** "sailboat"
- **Class 71:** "sandwich"
- **Class 72:** "sea_turtle"
- **Class 73:** "shark"
- **Class 74:** "shoe"
- **Class 75:** "skyscraper"
- **Class 76:** "snorkel"
- **Class 77:** "snowman"
- **Class 78:** "soccer_ball"
- **Class 79:** "speedboat"
- **Class 80:** "spider"
- **Class 81:** "spoon"
- **Class 82:** "square"
- **Class 83:** "squirrel"
- **Class 84:** "stethoscope"
- **Class 85:** "strawberry"
- **Class 86:** "streetlight"
- **Class 87:** "submarine"
- **Class 88:** "suitcase"
- **Class 89:** "sun"
- **Class 90:** "sweater"
- **Class 91:** "sword"
- **Class 92:** "table"
- **Class 93:** "teapot"
- **Class 94:** "teddy-bear"
- **Class 95:** "telephone"
- **Class 96:** "tent"
- **Class 97:** "The_Eiffel_Tower"
- **Class 98:** "The_Great_Wall_of_China"
- **Class 99:** "The_Mona_Lisa"
- **Class 100:** "tiger"
- **Class 101:** "toaster"
- **Class 102:** "tooth"
- **Class 103:** "tornado"
- **Class 104:** "tractor"
- **Class 105:** "train"
- **Class 106:** "tree"
- **Class 107:** "triangle"
- **Class 108:** "trombone"
- **Class 109:** "truck"
- **Class 110:** "trumpet"
- **Class 111:** "umbrella"
- **Class 112:** "vase"
- **Class 113:** "violin"
- **Class 114:** "watermelon"
- **Class 115:** "whale"
- **Class 116:** "windmill"
- **Class 117:** "wine_glass"
- **Class 118:** "yoga"
- **Class 119:** "zebra"
- **Class 120:** "zigzag"
# **Run with Transformers🤗**
```python
!pip install -q transformers torch pillow gradio
```
```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Multisource-121-DomainNet"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)


def multisource_classification(image):
    """Predicts the domain category for an input image."""
    # Convert the input numpy array to a PIL Image and ensure it is in RGB format
    image = Image.fromarray(image).convert("RGB")

    # Process the image and convert it to model inputs
    inputs = processor(images=image, return_tensors="pt")

    # Get model predictions without gradient calculations
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits

    # Convert logits to probabilities using softmax
    probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Mapping from class indices to domain labels
    labels = {
        "0": "barn", "1": "baseball_bat", "2": "basket", "3": "beach", "4": "bear",
        "5": "beard", "6": "bee", "7": "bird", "8": "blueberry", "9": "bowtie",
        "10": "bracelet", "11": "brain", "12": "bread", "13": "broccoli", "14": "bus",
        "15": "butterfly", "16": "circle", "17": "cloud", "18": "cruise_ship", "19": "dolphin",
        "20": "dumbbell", "21": "elephant", "22": "eye", "23": "eyeglasses", "24": "feather",
        "25": "fish", "26": "flower", "27": "foot", "28": "frog", "29": "giraffe",
        "30": "goatee", "31": "golf_club", "32": "grapes", "33": "grass", "34": "guitar",
        "35": "hamburger", "36": "hand", "37": "hat", "38": "headphones", "39": "helicopter",
        "40": "hexagon", "41": "hockey_stick", "42": "horse", "43": "hourglass", "44": "house",
        "45": "ice_cream", "46": "jacket", "47": "ladder", "48": "leg", "49": "lipstick",
        "50": "megaphone", "51": "monkey", "52": "moon", "53": "mushroom", "54": "necklace",
        "55": "owl", "56": "panda", "57": "pear", "58": "peas", "59": "penguin",
        "60": "pig", "61": "pillow", "62": "pineapple", "63": "pizza", "64": "pool",
        "65": "popsicle", "66": "rabbit", "67": "rhinoceros", "68": "rifle", "69": "river",
        "70": "sailboat", "71": "sandwich", "72": "sea_turtle", "73": "shark", "74": "shoe",
        "75": "skyscraper", "76": "snorkel", "77": "snowman", "78": "soccer_ball", "79": "speedboat",
        "80": "spider", "81": "spoon", "82": "square", "83": "squirrel", "84": "stethoscope",
        "85": "strawberry", "86": "streetlight", "87": "submarine", "88": "suitcase", "89": "sun",
        "90": "sweater", "91": "sword", "92": "table", "93": "teapot", "94": "teddy-bear",
        "95": "telephone", "96": "tent", "97": "The_Eiffel_Tower", "98": "The_Great_Wall_of_China",
        "99": "The_Mona_Lisa", "100": "tiger", "101": "toaster", "102": "tooth", "103": "tornado",
        "104": "tractor", "105": "train", "106": "tree", "107": "triangle", "108": "trombone",
        "109": "truck", "110": "trumpet", "111": "umbrella", "112": "vase", "113": "violin",
        "114": "watermelon", "115": "whale", "116": "windmill", "117": "wine_glass", "118": "yoga",
        "119": "zebra", "120": "zigzag"
    }

    # Create a dictionary mapping each label to its corresponding probability (rounded)
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}
    return predictions


# Create Gradio interface
iface = gr.Interface(
    fn=multisource_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="Multisource-121-DomainNet Classification",
    description="Upload an image to classify it into one of 121 domain categories."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```
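
The function above returns a score for every one of the 121 classes. When only the most likely labels are needed outside of Gradio, the returned dictionary can be reduced to its top entries. This is a minimal sketch: `top_k_predictions` is a hypothetical helper, not part of the model or of Transformers, and the scores in the example are made up for illustration.

```python
def top_k_predictions(predictions: dict[str, float], k: int = 5) -> list[tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Made-up scores in the same {label: probability} shape the function above returns
example = {"barn": 0.02, "house": 0.71, "tent": 0.05, "windmill": 0.22}
print(top_k_predictions(example, k=2))
```

Sorting the full dictionary is cheap at 121 entries, so a plain `sorted` call is enough here and no tensor operations are needed.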
---
# **Intended Use:**
The **Multisource-121-DomainNet** model is designed for multi-source image classification: it assigns an image to one of 121 categories covering objects, scenes, and landmarks. Potential use cases include:
- **Cross-Domain Image Analysis:** Enabling robust classification across a wide range of visual domains.
- **Multimedia Retrieval:** Assisting in content organization and retrieval in multimedia databases.
- **Computer Vision Research:** Serving as a benchmark for evaluating domain adaptation and transfer learning techniques.
- **Interactive Applications:** Enhancing user interfaces with diverse, real-time image recognition capabilities.