soumyaprabhamaiti committed
Commit b6bb21d · 1 Parent(s): 8fdfa17

Update readme

Files changed (3):
  1. README.md +189 -3
  2. readme_images/unet.png +0 -0
  3. readme_images/webapp.png +0 -0
README.md CHANGED
@@ -1,11 +1,197 @@
  ---
  title: Pet Image Segmentation using PyTorch
- emoji: 🌖
+ emoji: 😻
  colorFrom: blue
  colorTo: pink
  sdk: gradio
  sdk_version: 5.4.0
  app_file: run_webapp.py
- pinned: false
+ pinned: true
  license: mit
- ---
+ short_description: Segments pet image into foreground, background & boundary
+ ---
+ # Pet Image Segmentation using PyTorch
+
+ [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/soumyaprabhamaiti/pet-image-segmentation-pytorch)
+
+ This project segments pet images into three classes (background, pet, and boundary) using a [U-Net](https://arxiv.org/abs/1505.04597) model implemented in PyTorch. The model is trained on the [Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/), and the web app for inference is deployed using [Gradio](https://gradio.app/).
+
+ ## Webapp Demo
+
+ The deployed version of this project can be accessed at [Hugging Face Spaces](https://huggingface.co/spaces/soumyaprabhamaiti/pet-image-segmentation-pytorch). Segmentation of a sample image is shown below:
+ ![Segmentation on a sample image](readme_images/webapp.png)
+
+ ## Installing Locally
+
+ 1. Clone the repository:
+ ```
+ git clone https://github.com/soumya-prabha-maiti/pet-image-segmentation-pytorch.git
+ ```
+
+ 2. Navigate to the project folder:
+ ```
+ cd pet-image-segmentation-pytorch
+ ```
+
+ 3. Create and activate a virtual environment:
+ ```
+ python -m venv env
+ source env/bin/activate  # On Windows use `env\Scripts\activate`
+ ```
+
+ 4. Install the required libraries:
+ ```
+ pip install -r requirements.txt
+ ```
+
+ 5. Run the application:
+ ```
+ python run_webapp.py
+ ```
+
+ ## Dataset
+
+ The [Oxford-IIIT Pet Dataset](https://www.robots.ox.ac.uk/~vgg/data/pets/) contains 37 categories of pets, with roughly 200 images per category. The images vary widely in scale, pose, and lighting. Every image has an associated ground-truth annotation of breed, head ROI, and pixel-level trimap segmentation. In this project, the dataset is obtained via Torchvision.
+
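+ As a minimal sketch of how Torchvision exposes this dataset (the split and transforms used in this project may differ):
+
+ ```
+ from torchvision import datasets
+
+ # target_types="segmentation" returns the trimap mask (pet / background /
+ # boundary) as a PIL image alongside each photo.
+ dataset = datasets.OxfordIIITPet(
+     root="data",
+     split="trainval",
+     target_types="segmentation",
+     download=True,
+ )
+ image, trimap = dataset[0]
+ ```
+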
+ ## Model Architecture
+
+ The segmentation model uses the U-Net architecture. The basic architecture of the U-Net model is shown below:
+ ![U-Net Architecture](readme_images/unet.png)
+
+ A U-Net consists of an encoder and a decoder. The encoder is a series of convolutional layers that extract features from the input image; the decoder is a series of transposed convolutional layers that upsample the features back to the original image size. Skip connections link the two halves: feature maps from each encoder stage are concatenated with the corresponding feature maps in the decoder, which helps the decoder recover the spatial information lost during encoding.
+
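+ As an illustrative sketch of this encoder-decoder pattern (not the project's exact code; the `DoubleConv` name here only mirrors the `DoubleConvOriginal` blocks in the summary below):
+
+ ```
+ import torch
+ import torch.nn as nn
+
+ class DoubleConv(nn.Module):
+     """Two 3x3 convolutions, each followed by BatchNorm and ReLU."""
+     def __init__(self, in_ch, out_ch):
+         super().__init__()
+         self.block = nn.Sequential(
+             nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False),
+             nn.BatchNorm2d(out_ch),
+             nn.ReLU(inplace=True),
+             nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
+             nn.BatchNorm2d(out_ch),
+             nn.ReLU(inplace=True),
+         )
+
+     def forward(self, x):
+         return self.block(x)
+
+ # One level of the U: encode, downsample, then upsample and fuse via the skip.
+ enc, pool = DoubleConv(3, 16), nn.MaxPool2d(2)
+ bottleneck = DoubleConv(16, 32)
+ up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
+ dec = DoubleConv(32, 16)  # 32 channels = 16 (upsampled) + 16 (skip)
+
+ x = torch.randn(1, 3, 128, 128)
+ skip = enc(x)                                 # [1, 16, 128, 128]
+ mid = bottleneck(pool(skip))                  # [1, 32, 64, 64]
+ out = dec(torch.cat([up(mid), skip], dim=1))  # [1, 16, 128, 128]
+ ```
+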
+ <details>
+ <summary>Detailed architecture of the U-Net model used in this project</summary>
+
+ ```
+ ==========================================================================================
+ Layer (type:depth-idx)                    Output Shape              Param #
+ ==========================================================================================
+ UNet                                      [16, 3, 128, 128]         --
+ ├─ModuleList: 1-9                         --                        (recursive)
+ │ └─DoubleConvOriginal: 2-1               [16, 16, 128, 128]        --
+ │ │ └─Sequential: 3-1                     [16, 16, 128, 128]        --
+ │ │ │ └─Conv2d: 4-1                       [16, 16, 128, 128]        432
+ │ │ │ └─BatchNorm2d: 4-2                  [16, 16, 128, 128]        32
+ │ │ │ └─ReLU: 4-3                         [16, 16, 128, 128]        --
+ │ │ │ └─Conv2d: 4-4                       [16, 16, 128, 128]        2,304
+ │ │ │ └─BatchNorm2d: 4-5                  [16, 16, 128, 128]        32
+ │ │ │ └─ReLU: 4-6                         [16, 16, 128, 128]        --
+ ├─MaxPool2d: 1-2                          [16, 16, 64, 64]          --
+ ├─ModuleList: 1-9                         --                        (recursive)
+ │ └─DoubleConvOriginal: 2-2               [16, 32, 64, 64]          --
+ │ │ └─Sequential: 3-2                     [16, 32, 64, 64]          --
+ │ │ │ └─Conv2d: 4-7                       [16, 32, 64, 64]          4,608
+ │ │ │ └─BatchNorm2d: 4-8                  [16, 32, 64, 64]          64
+ │ │ │ └─ReLU: 4-9                         [16, 32, 64, 64]          --
+ │ │ │ └─Conv2d: 4-10                      [16, 32, 64, 64]          9,216
+ │ │ │ └─BatchNorm2d: 4-11                 [16, 32, 64, 64]          64
+ │ │ │ └─ReLU: 4-12                        [16, 32, 64, 64]          --
+ ├─MaxPool2d: 1-4                          [16, 32, 32, 32]          --
+ ├─ModuleList: 1-9                         --                        (recursive)
+ │ └─DoubleConvOriginal: 2-3               [16, 64, 32, 32]          --
+ │ │ └─Sequential: 3-3                     [16, 64, 32, 32]          --
+ │ │ │ └─Conv2d: 4-13                      [16, 64, 32, 32]          18,432
+ │ │ │ └─BatchNorm2d: 4-14                 [16, 64, 32, 32]          128
+ │ │ │ └─ReLU: 4-15                        [16, 64, 32, 32]          --
+ │ │ │ └─Conv2d: 4-16                      [16, 64, 32, 32]          36,864
+ │ │ │ └─BatchNorm2d: 4-17                 [16, 64, 32, 32]          128
+ │ │ │ └─ReLU: 4-18                        [16, 64, 32, 32]          --
+ ├─MaxPool2d: 1-6                          [16, 64, 16, 16]          --
+ ├─ModuleList: 1-9                         --                        (recursive)
+ │ └─DoubleConvOriginal: 2-4               [16, 128, 16, 16]         --
+ │ │ └─Sequential: 3-4                     [16, 128, 16, 16]         --
+ │ │ │ └─Conv2d: 4-19                      [16, 128, 16, 16]         73,728
+ │ │ │ └─BatchNorm2d: 4-20                 [16, 128, 16, 16]         256
+ │ │ │ └─ReLU: 4-21                        [16, 128, 16, 16]         --
+ │ │ │ └─Conv2d: 4-22                      [16, 128, 16, 16]         147,456
+ │ │ │ └─BatchNorm2d: 4-23                 [16, 128, 16, 16]         256
+ │ │ │ └─ReLU: 4-24                        [16, 128, 16, 16]         --
+ ├─MaxPool2d: 1-8                          [16, 128, 8, 8]           --
+ ├─ModuleList: 1-9                         --                        (recursive)
+ │ └─DoubleConvOriginal: 2-5               [16, 256, 8, 8]           --
+ │ │ └─Sequential: 3-5                     [16, 256, 8, 8]           --
+ │ │ │ └─Conv2d: 4-25                      [16, 256, 8, 8]           294,912
+ │ │ │ └─BatchNorm2d: 4-26                 [16, 256, 8, 8]           512
+ │ │ │ └─ReLU: 4-27                        [16, 256, 8, 8]           --
+ │ │ │ └─Conv2d: 4-28                      [16, 256, 8, 8]           589,824
+ │ │ │ └─BatchNorm2d: 4-29                 [16, 256, 8, 8]           512
+ │ │ │ └─ReLU: 4-30                        [16, 256, 8, 8]           --
+ ├─MaxPool2d: 1-10                         [16, 256, 4, 4]           --
+ ├─DoubleConvOriginal: 1-11                [16, 512, 4, 4]           --
+ │ └─Sequential: 2-6                       [16, 512, 4, 4]           --
+ │ │ └─Conv2d: 3-6                         [16, 512, 4, 4]           1,179,648
+ │ │ └─BatchNorm2d: 3-7                    [16, 512, 4, 4]           1,024
+ │ │ └─ReLU: 3-8                           [16, 512, 4, 4]           --
+ │ │ └─Conv2d: 3-9                         [16, 512, 4, 4]           2,359,296
+ │ │ └─BatchNorm2d: 3-10                   [16, 512, 4, 4]           1,024
+ │ │ └─ReLU: 3-11                          [16, 512, 4, 4]           --
+ ├─ModuleList: 1-12                        --                        --
+ │ └─ConvTranspose2d: 2-7                  [16, 256, 8, 8]           524,544
+ │ └─DoubleConvOriginal: 2-8               [16, 256, 8, 8]           --
+ │ │ └─Sequential: 3-12                    [16, 256, 8, 8]           --
+ │ │ │ └─Conv2d: 4-31                      [16, 256, 8, 8]           1,179,648
+ │ │ │ └─BatchNorm2d: 4-32                 [16, 256, 8, 8]           512
+ │ │ │ └─ReLU: 4-33                        [16, 256, 8, 8]           --
+ │ │ │ └─Conv2d: 4-34                      [16, 256, 8, 8]           589,824
+ │ │ │ └─BatchNorm2d: 4-35                 [16, 256, 8, 8]           512
+ │ │ │ └─ReLU: 4-36                        [16, 256, 8, 8]           --
+ │ └─ConvTranspose2d: 2-9                  [16, 128, 16, 16]         131,200
+ │ └─DoubleConvOriginal: 2-10              [16, 128, 16, 16]         --
+ │ │ └─Sequential: 3-13                    [16, 128, 16, 16]         --
+ │ │ │ └─Conv2d: 4-37                      [16, 128, 16, 16]         294,912
+ │ │ │ └─BatchNorm2d: 4-38                 [16, 128, 16, 16]         256
+ │ │ │ └─ReLU: 4-39                        [16, 128, 16, 16]         --
+ │ │ │ └─Conv2d: 4-40                      [16, 128, 16, 16]         147,456
+ │ │ │ └─BatchNorm2d: 4-41                 [16, 128, 16, 16]         256
+ │ │ │ └─ReLU: 4-42                        [16, 128, 16, 16]         --
+ │ └─ConvTranspose2d: 2-11                 [16, 64, 32, 32]          32,832
+ │ └─DoubleConvOriginal: 2-12              [16, 64, 32, 32]          --
+ │ │ └─Sequential: 3-14                    [16, 64, 32, 32]          --
+ │ │ │ └─Conv2d: 4-43                      [16, 64, 32, 32]          73,728
+ │ │ │ └─BatchNorm2d: 4-44                 [16, 64, 32, 32]          128
+ │ │ │ └─ReLU: 4-45                        [16, 64, 32, 32]          --
+ │ │ │ └─Conv2d: 4-46                      [16, 64, 32, 32]          36,864
+ │ │ │ └─BatchNorm2d: 4-47                 [16, 64, 32, 32]          128
+ │ │ │ └─ReLU: 4-48                        [16, 64, 32, 32]          --
+ │ └─ConvTranspose2d: 2-13                 [16, 32, 64, 64]          8,224
+ │ └─DoubleConvOriginal: 2-14              [16, 32, 64, 64]          --
+ │ │ └─Sequential: 3-15                    [16, 32, 64, 64]          --
+ │ │ │ └─Conv2d: 4-49                      [16, 32, 64, 64]          18,432
+ │ │ │ └─BatchNorm2d: 4-50                 [16, 32, 64, 64]          64
+ │ │ │ └─ReLU: 4-51                        [16, 32, 64, 64]          --
+ │ │ │ └─Conv2d: 4-52                      [16, 32, 64, 64]          9,216
+ │ │ │ └─BatchNorm2d: 4-53                 [16, 32, 64, 64]          64
+ │ │ │ └─ReLU: 4-54                        [16, 32, 64, 64]          --
+ │ └─ConvTranspose2d: 2-15                 [16, 16, 128, 128]        2,064
+ │ └─DoubleConvOriginal: 2-16              [16, 16, 128, 128]        --
+ │ │ └─Sequential: 3-16                    [16, 16, 128, 128]        --
+ │ │ │ └─Conv2d: 4-55                      [16, 16, 128, 128]        4,608
+ │ │ │ └─BatchNorm2d: 4-56                 [16, 16, 128, 128]        32
+ │ │ │ └─ReLU: 4-57                        [16, 16, 128, 128]        --
+ │ │ │ └─Conv2d: 4-58                      [16, 16, 128, 128]        2,304
+ │ │ │ └─BatchNorm2d: 4-59                 [16, 16, 128, 128]        32
+ │ │ │ └─ReLU: 4-60                        [16, 16, 128, 128]        --
+ ├─Conv2d: 1-13                            [16, 3, 128, 128]         51
+ ==========================================================================================
+ Total params: 7,778,643
+ Trainable params: 7,778,643
+ Non-trainable params: 0
+ Total mult-adds (Units.GIGABYTES): 17.01
+ ==========================================================================================
+ Input size (MB): 3.15
+ Forward/backward pass size (MB): 595.59
+ Params size (MB): 31.11
+ Estimated Total Size (MB): 629.85
+ ==========================================================================================
+ ```
+
+ </details>
+
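+ The table above is in the style of torchinfo output. Assuming a torchinfo dependency (not stated in this README), a similar summary can be produced like this:
+
+ ```
+ import torch.nn as nn
+ from torchinfo import summary
+
+ # Stand-in model for illustration; in the project this would be the UNet instance.
+ model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
+ summary(model, input_size=(16, 3, 128, 128))
+ ```
+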
+ ## Libraries Used
+
+ The following libraries were used in this project:
+
+ - PyTorch + PyTorch Lightning: to build and train the segmentation model.
+ - Gradio: to create the user interface for the segmentation app (see the sketch after this list).
+
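+ As a minimal sketch of the Gradio pattern (the actual interface in run_webapp.py wires in the trained U-Net and may use different components):
+
+ ```
+ import gradio as gr
+ import numpy as np
+
+ def segment(image: np.ndarray) -> np.ndarray:
+     # Placeholder: the real app would run the U-Net here and color the
+     # predicted background / pet / boundary classes.
+     return image
+
+ demo = gr.Interface(fn=segment, inputs=gr.Image(), outputs=gr.Image())
+ demo.launch()
+ ```
+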
+ ## License
+
+ This project is licensed under the [MIT License](LICENSE).
readme_images/unet.png ADDED
readme_images/webapp.png ADDED