timm / vit_so400m_patch14_siglip_378.webli_ft_in1k
Image Classification · timm · PyTorch · Safetensors · Transformers

Commit a9b986e (verified) · committed by rwightman · 1 Parent(s): 1da852b

Files changed (4):
  1. README.md +148 -0
  2. config.json +33 -0
  3. model.safetensors +3 -0
  4. pytorch_model.bin +3 -0
README.md ADDED
@@ -0,0 +1,148 @@
+ ---
+ tags:
+ - image-classification
+ - timm
+ library_name: timm
+ license: apache-2.0
+ datasets:
+ - imagenet-1k
+ - webli
+ ---
+ # Model card for vit_so400m_patch14_siglip_378.webli_ft_in1k
+
+ A Vision Transformer (ViT) image encoder, pretrained on WebLI with the SigLIP (sigmoid loss) contrastive objective and fine-tuned on ImageNet-1k in `timm`.
+
+ ## Model Details
+ - **Model Type:** Image classification / feature backbone
+ - **Model Stats:**
+   - Params (M): 429.4
+   - GMACs: 335.4
+   - Activations (M): 452.9
+   - Image size: 378 x 378
+ - **Papers:**
+   - Sigmoid Loss for Language Image Pre-Training: https://arxiv.org/abs/2303.15343
+   - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: https://arxiv.org/abs/2010.11929v2
+ - **Pretrain Dataset:** WebLI
+ - **Original:** https://github.com/google-research/big_vision
+ - **Dataset:** ImageNet-1k
+
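+ As a quick sanity check, the stats above can be reproduced from the model itself; a minimal sketch (exact figures may vary slightly across `timm` versions):
+
+ ```python
+ import timm
+
+ # instantiate the architecture only; weights are not needed to count parameters
+ model = timm.create_model('vit_so400m_patch14_siglip_378.webli_ft_in1k', pretrained=False)
+ print(f'Params (M): {sum(p.numel() for p in model.parameters()) / 1e6:.1f}')  # ~429.4
+ print(timm.data.resolve_model_data_config(model)['input_size'])  # (3, 378, 378)
+ ```
+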
+ ## Model Usage
+ ### Image Classification
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+ import torch
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model('vit_so400m_patch14_siglip_378.webli_ft_in1k', pretrained=True)
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ top5_probabilities, top5_class_indices = torch.topk(output.softmax(dim=1) * 100, k=5)
+ ```
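+
+ The top-5 indices are ImageNet-1k class ids. A minimal sketch for turning them into readable labels, continuing from the snippet above (the label-file URL is an assumption, a commonly mirrored copy of the class names, not part of this repo):
+
+ ```python
+ # label file URL is an assumption: a commonly mirrored list of the 1000 class names
+ labels = urlopen(
+     'https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt'
+ ).read().decode().splitlines()
+ for prob, idx in zip(top5_probabilities[0], top5_class_indices[0]):
+     print(f'{labels[idx]}: {prob:.2f}%')
+ ```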
+
+ ### Feature Map Extraction
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'vit_so400m_patch14_siglip_378.webli_ft_in1k',
+     pretrained=True,
+     features_only=True,
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # unsqueeze single image into batch of 1
+
+ for o in output:
+     # print shape of each feature map in output
+     # e.g.:
+     # torch.Size([1, 1152, 27, 27])
+     # torch.Size([1, 1152, 27, 27])
+     # torch.Size([1, 1152, 27, 27])
+     print(o.shape)
+ ```
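+
+ Each map is a 27 x 27 grid with 1152 channels (378 / 14 = 27, one position per 14 x 14 patch). `features_only` also accepts `out_indices` to select which blocks' feature maps are returned; a small sketch (the chosen indices are illustrative, not from this card):
+
+ ```python
+ # take feature maps from only the last two blocks (illustrative choice)
+ model = timm.create_model(
+     'vit_so400m_patch14_siglip_378.webli_ft_in1k',
+     pretrained=True,
+     features_only=True,
+     out_indices=(-2, -1),
+ )
+ ```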
+
+ ### Image Embeddings
+ ```python
+ from urllib.request import urlopen
+ from PIL import Image
+ import timm
+
+ img = Image.open(urlopen(
+     'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
+ ))
+
+ model = timm.create_model(
+     'vit_so400m_patch14_siglip_378.webli_ft_in1k',
+     pretrained=True,
+     num_classes=0,  # remove classifier nn.Linear
+ )
+ model = model.eval()
+
+ # get model specific transforms (normalization, resize)
+ data_config = timm.data.resolve_model_data_config(model)
+ transforms = timm.data.create_transform(**data_config, is_training=False)
+
+ output = model(transforms(img).unsqueeze(0))  # output is (batch_size, num_features) shaped tensor
+
+ # or equivalently (without needing to set num_classes=0)
+ output = model.forward_features(transforms(img).unsqueeze(0))
+ # output is unpooled, a (1, 729, 1152) shaped tensor
+
+ output = model.forward_head(output, pre_logits=True)
+ # output is a (1, num_features) shaped tensor
+ ```
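+
+ The 729 unpooled tokens are the 27 x 27 patch grid; `forward_head(..., pre_logits=True)` pools them with this model's attention-pooling head (`"global_pool": "map"` in the config) into a single 1152-dim embedding. A minimal usage sketch comparing two images, continuing from the snippet above (`img2` is a hypothetical second image, not part of this card):
+
+ ```python
+ import torch.nn.functional as F
+
+ # img2: a hypothetical second PIL image, loaded the same way as img above
+ emb1 = model(transforms(img).unsqueeze(0))
+ emb2 = model(transforms(img2).unsqueeze(0))
+ print(F.cosine_similarity(emb1, emb2))  # 1-element tensor; 1.0 = identical direction
+ ```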
+
+ ## Model Comparison
+ Explore the dataset and runtime metrics of this model in timm [model results](https://github.com/huggingface/pytorch-image-models/tree/main/results).
+
+ ## Citation
+ ```bibtex
+ @inproceedings{zhai2023sigmoid,
+   title={Sigmoid loss for language image pre-training},
+   author={Zhai, Xiaohua and Mustafa, Basil and Kolesnikov, Alexander and Beyer, Lucas},
+   booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
+   pages={11975--11986},
+   year={2023}
+ }
+ ```
+ ```bibtex
+ @article{dosovitskiy2020vit,
+   title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
+   author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
+   journal={ICLR},
+   year={2021}
+ }
+ ```
+ ```bibtex
+ @misc{rw2019timm,
+   author = {Ross Wightman},
+   title = {PyTorch Image Models},
+   year = {2019},
+   publisher = {GitHub},
+   journal = {GitHub repository},
+   doi = {10.5281/zenodo.4414861},
+   howpublished = {\url{https://github.com/huggingface/pytorch-image-models}}
+ }
+ ```
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+     "architecture": "vit_so400m_patch14_siglip_378",
+     "num_classes": 1000,
+     "num_features": 1152,
+     "global_pool": "map",
+     "pretrained_cfg": {
+         "tag": "webli_ft_in1k",
+         "custom_load": false,
+         "input_size": [
+             3,
+             378,
+             378
+         ],
+         "fixed_input_size": true,
+         "interpolation": "bicubic",
+         "crop_pct": 1.0,
+         "crop_mode": "squash",
+         "mean": [
+             0.5,
+             0.5,
+             0.5
+         ],
+         "std": [
+             0.5,
+             0.5,
+             0.5
+         ],
+         "num_classes": 1000,
+         "pool_size": null,
+         "first_conv": "patch_embed.proj",
+         "classifier": "head"
+     }
+ }
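
The `pretrained_cfg` block above is what `timm.data.resolve_model_data_config` reads to build the eval transform (378 x 378 input squashed to size, bicubic interpolation, mean/std 0.5); a quick sketch to inspect the resolved values:

```python
import timm

model = timm.create_model('vit_so400m_patch14_siglip_378.webli_ft_in1k', pretrained=False)
print(timm.data.resolve_model_data_config(model))
# expect: input_size (3, 378, 378), interpolation 'bicubic',
# mean/std (0.5, 0.5, 0.5), crop_pct 1.0, crop_mode 'squash'
```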
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1aa26b75cd67e6498a1b84a81a6956363b33d2e850f5442876f2d5915a85bf29
+ size 1717547296
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8d9d320a9ae6212dd39ec06f04bce62bbecf89a57e31a2c07722bc9340951fc1
+ size 1717643562
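
Both weight files are Git LFS pointers; the tensors themselves live in LFS storage. To load the safetensors weights directly rather than through `timm.create_model`, a minimal sketch (the repo id is assumed from the page header):

```python
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# repo id assumed from the page header: timm/<model name>
path = hf_hub_download('timm/vit_so400m_patch14_siglip_378.webli_ft_in1k', 'model.safetensors')
state_dict = load_file(path)  # dict of tensor name -> torch.Tensor
print(len(state_dict), 'tensors')
```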