jameslahm committed (verified)
Commit c02e37b · 1 Parent(s): 2fa8b28

Update README.md

Files changed (1): README.md (+132 -5)
README.md CHANGED

Removed from the previous revision:
- `library_tag: timm` (model card frontmatter)
- `Code: https://github.com/THU-MIG/lsnet`

Updated README.md:

---
license: mit
tags:
- image-classification
- timm
pipeline_tag: image-classification
---

Paper: https://arxiv.org/abs/2503.23135

Code: https://github.com/jameslahm/lsnet

## Usage

```python
import timm
import torch
from PIL import Image
import requests
from timm.data import resolve_data_config, create_transform

# Load the model
model = timm.create_model(
    'hf_hub:jameslahm/lsnet_b',
    pretrained=True
)
model.eval()

# Load and transform image
# Example using a URL:
url = 'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
img = Image.open(requests.get(url, stream=True).raw)

config = resolve_data_config({}, model=model)
transform = create_transform(**config)
input_tensor = transform(img).unsqueeze(0)  # transform and add batch dimension

# Make prediction
with torch.no_grad():
    output = model(input_tensor)
probabilities = torch.nn.functional.softmax(output[0], dim=0)

# Get top 5 predictions
top5_prob, top5_catid = torch.topk(probabilities, 5)
# Assuming you have imagenet labels list 'imagenet_labels'
# for i in range(top5_prob.size(0)):
#     print(imagenet_labels[top5_catid[i]], top5_prob[i].item())
```
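
The snippet above leaves `imagenet_labels` as a placeholder. One commonly used source for human-readable ImageNet-1K class names is the plain-text labels file published with the PyTorch Hub examples; using it is an assumption of this sketch, not part of the model card, and any equivalent 1000-entry labels list works the same way.

```python
import requests

# Assumed label source: one class name per line, 1000 lines for ImageNet-1K.
LABELS_URL = 'https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt'
imagenet_labels = requests.get(LABELS_URL).text.strip().split('\n')

# Print the top-5 predictions from the example above.
for i in range(top5_prob.size(0)):
    print(imagenet_labels[top5_catid[i]], top5_prob[i].item())
```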

## Citation
If our code or models help your work, please cite our paper:
```bibtex
@misc{wang2025lsnetlargefocussmall,
      title={LSNet: See Large, Focus Small},
      author={Ao Wang and Hui Chen and Zijia Lin and Jungong Han and Guiguang Ding},
      year={2025},
      eprint={2503.23135},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.23135},
}
```

# [LSNet: See Large, Focus Small](https://arxiv.org/abs/2503.23135)

Official PyTorch implementation of **LSNet**. CVPR 2025.

<p align="center">
<img src="https://raw.githubusercontent.com/THU-MIG/lsnet/refs/heads/master/figures/throughput.svg" width=60%> <br>
Models are trained on ImageNet-1K and the throughput is tested on an Nvidia RTX 3090.
</p>

[LSNet: See Large, Focus Small](https://arxiv.org/abs/2503.23135).\
Ao Wang, Hui Chen, Zijia Lin, Jungong Han, and Guiguang Ding\
[![arXiv](https://img.shields.io/badge/arXiv-2503.23135-b31b1b.svg)](https://arxiv.org/abs/2503.23135) [![Hugging Face Models](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Models-blue)](https://huggingface.co/jameslahm/lsnet/tree/main) [![Hugging Face Collection](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Collection-blue)](https://huggingface.co/collections/jameslahm/lsnet-67ebec0ab4e220e7918d9565)

We introduce LSNet, a new family of lightweight vision models inspired by the dynamic heteroscale capability of the human visual system, i.e., "See Large, Focus Small". LSNet achieves state-of-the-art performance and efficiency trade-offs across various vision tasks.

<details>
<summary>
<font size="+1">Abstract</font>
</summary>
Vision network designs, including Convolutional Neural Networks and Vision Transformers, have significantly advanced the field of computer vision. Yet, their complex computations pose challenges for practical deployments, particularly in real-time applications. To tackle this issue, researchers have explored various lightweight and efficient network designs. However, existing lightweight models predominantly leverage self-attention mechanisms and convolutions for token mixing. This dependence brings limitations in effectiveness and efficiency in the perception and aggregation processes of lightweight networks, hindering the balance between performance and efficiency under limited computational budgets. In this paper, we draw inspiration from the dynamic heteroscale vision ability inherent in the efficient human vision system and propose a "See Large, Focus Small" strategy for lightweight vision network design. We introduce LS (<b>L</b>arge-<b>S</b>mall) convolution, which combines large-kernel perception and small-kernel aggregation. It can efficiently capture a wide range of perceptual information and achieve precise feature aggregation for dynamic and complex visual representations, thus enabling proficient processing of visual information. Based on LS convolution, we present LSNet, a new family of lightweight models. Extensive experiments demonstrate that LSNet achieves superior performance and efficiency over existing lightweight networks in various vision tasks.
</details>
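
The abstract's central idea, pairing large-kernel perception with small-kernel aggregation, can be pictured with a rough sketch. The block below is not the paper's LS convolution; it is a simplified illustration under assumptions (an invented `LargeSmallSketch` module, a depthwise large-kernel convolution for "see large" and a plain small-kernel convolution for "focus small"). For the actual operator, see the code repository.

```python
import torch
import torch.nn as nn

class LargeSmallSketch(nn.Module):
    """Illustrative only: large-kernel depthwise 'perception' followed by
    small-kernel 'aggregation'. Not the official LS convolution."""
    def __init__(self, dim: int, large_kernel: int = 7, small_kernel: int = 3):
        super().__init__()
        # See large: depthwise conv covering a wide receptive field
        self.perceive = nn.Conv2d(dim, dim, large_kernel,
                                  padding=large_kernel // 2, groups=dim)
        # Focus small: aggregate features locally with a small kernel
        self.aggregate = nn.Conv2d(dim, dim, small_kernel,
                                   padding=small_kernel // 2)

    def forward(self, x):
        return self.aggregate(self.perceive(x))

x = torch.randn(1, 64, 56, 56)
print(LargeSmallSketch(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```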

## Classification on ImageNet-1K

### Models
- \* denotes the results with distillation.
- The throughput (images/s) is tested on an Nvidia RTX 3090 using [speed.py](./speed.py).

| Model | Top-1 (%) | Params | FLOPs | Throughput (img/s) | Ckpt | Log |
|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
| LSNet-T | 74.9 / 76.1* | 11.4M | 0.3G | 14708 | [T](https://huggingface.co/jameslahm/lsnet/blob/main/lsnet_t.pth) / [T*](https://huggingface.co/jameslahm/lsnet/blob/main/lsnet_t_distill.pth) | [T](logs/lsnet_t.log) / [T*](logs/lsnet_t_distill.log) |
| LSNet-S | 77.8 / 79.0* | 16.1M | 0.5G | 9023 | [S](https://huggingface.co/jameslahm/lsnet/blob/main/lsnet_s.pth) / [S*](https://huggingface.co/jameslahm/lsnet/blob/main/lsnet_s_distill.pth) | [S](logs/lsnet_s.log) / [S*](logs/lsnet_s_distill.log) |
| LSNet-B | 80.3 / 81.6* | 23.2M | 1.3G | 3996 | [B](https://huggingface.co/jameslahm/lsnet/blob/main/lsnet_b.pth) / [B*](https://huggingface.co/jameslahm/lsnet/blob/main/lsnet_b_distill.pth) | [B](logs/lsnet_b.log) / [B*](logs/lsnet_b_distill.log) |

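The repository's [speed.py](./speed.py) is not reproduced here, but a throughput number of this kind is typically measured with a simple timed forward-pass loop. The sketch below is a generic illustration only; the batch size, warm-up count, and iteration count are assumptions, not the repository's settings.

```python
import time
import timm
import torch

# Assumed benchmark settings; speed.py may use different ones.
batch_size, warmup, iters = 256, 10, 50
device = 'cuda' if torch.cuda.is_available() else 'cpu'

model = timm.create_model('hf_hub:jameslahm/lsnet_t', pretrained=True).to(device).eval()
x = torch.randn(batch_size, 3, 224, 224, device=device)

with torch.no_grad():
    for _ in range(warmup):  # warm-up passes excluded from timing
        model(x)
    if device == 'cuda':
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        model(x)
    if device == 'cuda':
        torch.cuda.synchronize()

print(f'throughput: {batch_size * iters / (time.time() - start):.0f} images/s')
```
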
## ImageNet

### Prerequisites
A `conda` virtual environment is recommended.
```bash
conda create -n lsnet python=3.8
conda activate lsnet
pip install -r requirements.txt
```

### Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The training and validation data are expected to be in the `train` and `val` folders respectively:
```
/path/to/imagenet/
|-- train
|-- val
```

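As a quick sanity check of this layout (not part of the repository's tooling), the two folders can be opened with torchvision's `ImageFolder`, which expects exactly this class-subfolder structure:

```python
from torchvision.datasets import ImageFolder

# Adjust to where ImageNet was extracted.
for split in ('train', 'val'):
    ds = ImageFolder(f'/path/to/imagenet/{split}')
    print(split, len(ds), 'images,', len(ds.classes), 'classes')
```
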
### Training
To train LSNet-T on an 8-GPU machine:
```bash
python -m torch.distributed.launch --nproc_per_node=8 --master_port 12345 --use_env main.py --model lsnet_t --data-path ~/imagenet --dist-eval
# For training with distillation, please add `--distillation-type hard`
# For LSNet-B, please add `--weight-decay 0.05`
```

### Testing
```bash
python main.py --eval --model lsnet_t --resume ./pretrain/lsnet_t.pth --data-path ~/imagenet
```
Models can also be downloaded automatically from the 🤗 Hub as shown below.
```python
import timm

# Pick a variant: lsnet_t, lsnet_t_distill, lsnet_s, lsnet_s_distill, lsnet_b, lsnet_b_distill
variant = 'lsnet_t'
model = timm.create_model(
    f'hf_hub:jameslahm/{variant}',
    pretrained=True
)
```

## Downstream Tasks
[Object Detection and Instance Segmentation](https://github.com/THU-MIG/lsnet/blob/master/detection/README.md)<br>
[Semantic Segmentation](https://github.com/THU-MIG/lsnet/blob/master/segmentation/README.md)<br>
[Robustness Evaluation](https://github.com/THU-MIG/lsnet/blob/master/README_robustness.md)

## Acknowledgement

The classification (ImageNet) codebase is partly built upon [EfficientViT](https://github.com/microsoft/Cream/tree/main/EfficientViT), [LeViT](https://github.com/facebookresearch/LeViT), [PoolFormer](https://github.com/sail-sg/poolformer), and [EfficientFormer](https://github.com/snap-research/EfficientFormer).

The detection and segmentation pipeline is from [MMCV](https://github.com/open-mmlab/mmcv) ([MMDetection](https://github.com/open-mmlab/mmdetection) and [MMSegmentation](https://github.com/open-mmlab/mmsegmentation)).

Thanks for the great implementations!