Moditha24 committed on
Commit 2d15db2 · verified · 1 Parent(s): 89ec615

Upload 4 files

Files changed (5)
  1. .gitattributes +1 -0
  2. README (2).md +52 -0
  3. config.py +248 -0
  4. gitattributes +31 -0
  5. pytorch_model.pth +3 -0
.gitattributes ADDED
@@ -0,0 +1 @@
+ pytorch_model.pth filter=lfs diff=lfs merge=lfs -text
README (2).md ADDED
@@ -0,0 +1,52 @@
+ ---
+ tags:
+ - object-detection
+ - vision
+ library_name: mask_rcnn
+ datasets:
+ - coco
+
+ ---
+
+
+ # Mask R-CNN
+
+ ## Model description
+
+ Mask R-CNN extends Faster R-CNN by adding a branch that predicts an object mask in parallel with the existing branch for bounding-box recognition. The model localizes objects at the pixel level rather than only with bounding boxes, since Faster R-CNN was not designed for pixel-to-pixel alignment between network inputs and outputs.
+
+ *This model is based on the pretrained model from [OpenMMLab](https://github.com/open-mmlab/mmdetection).*
+
+ ![MMDetection](https://user-images.githubusercontent.com/40661020/143967081-c2552bed-9af2-46c4-ae44-5b3b74e5679f.png)
+
+ ### More information on the model and dataset
+
+ #### The model
+ Mask R-CNN tackles instance segmentation, which combines object detection and semantic segmentation. For object detection, Mask R-CNN uses an architecture similar to Faster R-CNN, while for semantic segmentation it uses a Fully Convolutional Network (FCN).
+ The FCN is added on top of the Faster R-CNN features to generate a mask segmentation output, in parallel with the classification and bounding-box regression branches of the Faster R-CNN model. Building on Fast R-CNN's Region of Interest (RoI) Pooling, Mask R-CNN adds a refinement called RoIAlign, which removes the quantization loss and misalignment of RoI Pooling; the aligned features lead to improved results, as the sketch below illustrates.
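+
+ A minimal sketch of the RoIAlign operation using `torchvision.ops.roi_align` (the feature map and box below are illustrative, not taken from this model):
+
+ ```python
+ import torch
+ from torchvision.ops import roi_align
+
+ # A single 256-channel feature map, e.g. an FPN level at stride 16.
+ feats = torch.randn(1, 256, 50, 50)
+
+ # One RoI as (batch_index, x1, y1, x2, y2) in image coordinates;
+ # spatial_scale maps image coordinates onto this feature map.
+ rois = torch.tensor([[0.0, 64.0, 64.0, 192.0, 192.0]])
+
+ # 7x7 output per RoI (matching the bbox head); sampling_ratio=0 lets
+ # RoIAlign choose the number of bilinear sampling points adaptively.
+ pooled = roi_align(feats, rois, output_size=7, spatial_scale=1.0 / 16,
+                    sampling_ratio=0)
+ print(pooled.shape)  # torch.Size([1, 256, 7, 7])
+ ```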
28
+
29
+
30
+ #### Datasets
31
+ [COCO Datasets](https://cocodataset.org/#home)
32
+
33
+ ## Training Procedure
34
+ Please [read the paper](https://arxiv.org/pdf/1703.06870.pdf) for more information on training, or check OpenMMLab [repository](https://github.com/open-mmlab/mmdetection/tree/master/configs/mask_rcnn)
35
+
36
+ The model architecture is divided into two parts:
37
+ - Region proposal network (RPN) to propose candidate object bounding boxes.
38
+ - Binary mask classifier to generate a mask for every class
39
+
40
+ #### Technical Summary.
41
+ - Mask R-CNN is quite similar to the structure of faster R-CNN.
42
+ - Outputs a binary mask for each Region of Interest.
43
+ - Applies bounding-box classification and regression in parallel, simplifying the original R-CNN's multi-stage pipeline.
44
+ - The network architectures utilized are called ResNet and ResNeXt. The depth can be either 50 or 101
45
+
46
+ #### Results Summary
47
+ - Instance Segmentation: Based on the COCO dataset, Mask R-CNN outperforms all categories compared to MNC and FCIS, which are state-of-the-art models.
48
+ - Bounding Box Detection: Mask R-CNN outperforms the base variants of all previous state-of-the-art models, including the COCO 2016 Detection Challenge winner.
49
+
50
+ ## Intended uses & limitations
51
+ The identification of object relationships and the context of objects in a picture are both aided by image segmentation. Some of the applications include face recognition, number plate recognition, and satellite image analysis. With great model generality, Mask RCNN can be extended to human pose estimation; it can be used to estimate on-site approaching live traffic to aid autonomous driving.
52
+
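+ A minimal inference sketch with the MMDetection API, assuming `mmdet` 2.x and a matching `mmcv-full` are installed and this repository's `config.py` and `pytorch_model.pth` have been downloaded locally (`demo.jpg` is a placeholder image path):
+
+ ```python
+ from mmdet.apis import init_detector, inference_detector
+
+ # Files uploaded in this commit; adjust the paths to your local copies.
+ config_file = 'config.py'
+ checkpoint_file = 'pytorch_model.pth'
+
+ # Build Mask R-CNN from the config and load the checkpoint.
+ model = init_detector(config_file, checkpoint_file, device='cuda:0')
+
+ # Run inference; for a model with a mask head the result is a
+ # (bbox_results, segm_results) pair, one entry per COCO class.
+ result = inference_detector(model, 'demo.jpg')
+
+ # Draw boxes and masks on the image and save it.
+ model.show_result('demo.jpg', result, out_file='result.jpg')
+ ```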
config.py ADDED
@@ -0,0 +1,248 @@
+ # model settings
+ model = dict(
+     type='MaskRCNN',
+     backbone=dict(
+         type='ResNeXt',
+         depth=101,
+         num_stages=4,
+         out_indices=(0, 1, 2, 3),
+         frozen_stages=1,
+         norm_cfg=dict(type='BN', requires_grad=True),
+         norm_eval=True,
+         style='pytorch',
+         init_cfg=dict(
+             type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'),
+         groups=64,
+         base_width=4),
+     neck=dict(
+         type='FPN',
+         in_channels=[256, 512, 1024, 2048],
+         out_channels=256,
+         num_outs=5),
+     rpn_head=dict(
+         type='RPNHead',
+         in_channels=256,
+         feat_channels=256,
+         anchor_generator=dict(
+             type='AnchorGenerator',
+             scales=[8],
+             ratios=[0.5, 1.0, 2.0],
+             strides=[4, 8, 16, 32, 64]),
+         bbox_coder=dict(
+             type='DeltaXYWHBBoxCoder',
+             target_means=[0.0, 0.0, 0.0, 0.0],
+             target_stds=[1.0, 1.0, 1.0, 1.0]),
+         loss_cls=dict(
+             type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
+         loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
+     roi_head=dict(
+         type='StandardRoIHead',
+         bbox_roi_extractor=dict(
+             type='SingleRoIExtractor',
+             roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
+             out_channels=256,
+             featmap_strides=[4, 8, 16, 32]),
+         bbox_head=dict(
+             type='Shared2FCBBoxHead',
+             in_channels=256,
+             fc_out_channels=1024,
+             roi_feat_size=7,
+             num_classes=80,
+             bbox_coder=dict(
+                 type='DeltaXYWHBBoxCoder',
+                 target_means=[0.0, 0.0, 0.0, 0.0],
+                 target_stds=[0.1, 0.1, 0.2, 0.2]),
+             reg_class_agnostic=False,
+             loss_cls=dict(
+                 type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
+             loss_bbox=dict(type='L1Loss', loss_weight=1.0)),
+         mask_roi_extractor=dict(
+             type='SingleRoIExtractor',
+             roi_layer=dict(type='RoIAlign', output_size=14, sampling_ratio=0),
+             out_channels=256,
+             featmap_strides=[4, 8, 16, 32]),
+         mask_head=dict(
+             type='FCNMaskHead',
+             num_convs=4,
+             in_channels=256,
+             conv_out_channels=256,
+             num_classes=80,
+             loss_mask=dict(
+                 type='CrossEntropyLoss', use_mask=True, loss_weight=1.0))),
+     # training and testing settings
+     train_cfg=dict(
+         rpn=dict(
+             assigner=dict(
+                 type='MaxIoUAssigner',
+                 pos_iou_thr=0.7,
+                 neg_iou_thr=0.3,
+                 min_pos_iou=0.3,
+                 match_low_quality=True,
+                 ignore_iof_thr=-1),
+             sampler=dict(
+                 type='RandomSampler',
+                 num=256,
+                 pos_fraction=0.5,
+                 neg_pos_ub=-1,
+                 add_gt_as_proposals=False),
+             allowed_border=-1,
+             pos_weight=-1,
+             debug=False),
+         rpn_proposal=dict(
+             nms_pre=2000,
+             max_per_img=1000,
+             nms=dict(type='nms', iou_threshold=0.7),
+             min_bbox_size=0),
+         rcnn=dict(
+             assigner=dict(
+                 type='MaxIoUAssigner',
+                 pos_iou_thr=0.5,
+                 neg_iou_thr=0.5,
+                 min_pos_iou=0.5,
+                 match_low_quality=True,
+                 ignore_iof_thr=-1),
+             sampler=dict(
+                 type='RandomSampler',
+                 num=512,
+                 pos_fraction=0.25,
+                 neg_pos_ub=-1,
+                 add_gt_as_proposals=True),
+             mask_size=28,
+             pos_weight=-1,
+             debug=False)),
+     test_cfg=dict(
+         rpn=dict(
+             nms_pre=1000,
+             max_per_img=1000,
+             nms=dict(type='nms', iou_threshold=0.7),
+             min_bbox_size=0),
+         rcnn=dict(
+             score_thr=0.05,
+             nms=dict(type='nms', iou_threshold=0.5),
+             max_per_img=100,
+             mask_thr_binary=0.5)))
+ # dataset settings
+ dataset_type = 'CocoDataset'
+ data_root = 'data/coco/'
+ img_norm_cfg = dict(
+     mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+ train_pipeline = [
+     dict(type='LoadImageFromFile'),
+     dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
+     dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
+     dict(type='RandomFlip', flip_ratio=0.5),
+     dict(
+         type='Normalize',
+         mean=[123.675, 116.28, 103.53],
+         std=[58.395, 57.12, 57.375],
+         to_rgb=True),
+     dict(type='Pad', size_divisor=32),
+     dict(type='DefaultFormatBundle'),
+     dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
+ ]
+ test_pipeline = [
+     dict(type='LoadImageFromFile'),
+     dict(
+         type='MultiScaleFlipAug',
+         img_scale=(1333, 800),
+         flip=False,
+         transforms=[
+             dict(type='Resize', keep_ratio=True),
+             dict(type='RandomFlip'),
+             dict(
+                 type='Normalize',
+                 mean=[123.675, 116.28, 103.53],
+                 std=[58.395, 57.12, 57.375],
+                 to_rgb=True),
+             dict(type='Pad', size_divisor=32),
+             dict(type='ImageToTensor', keys=['img']),
+             dict(type='Collect', keys=['img'])
+         ])
+ ]
+ data = dict(
+     samples_per_gpu=2,
+     workers_per_gpu=2,
+     train=dict(
+         type='CocoDataset',
+         ann_file='data/coco/annotations/instances_train2017.json',
+         img_prefix='data/coco/train2017/',
+         pipeline=[
+             dict(type='LoadImageFromFile'),
+             dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
+             dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
+             dict(type='RandomFlip', flip_ratio=0.5),
+             dict(
+                 type='Normalize',
+                 mean=[123.675, 116.28, 103.53],
+                 std=[58.395, 57.12, 57.375],
+                 to_rgb=True),
+             dict(type='Pad', size_divisor=32),
+             dict(type='DefaultFormatBundle'),
+             dict(
+                 type='Collect',
+                 keys=['img', 'gt_bboxes', 'gt_labels', 'gt_masks'])
+         ]),
+     val=dict(
+         type='CocoDataset',
+         ann_file='data/coco/annotations/instances_val2017.json',
+         img_prefix='data/coco/val2017/',
+         pipeline=[
+             dict(type='LoadImageFromFile'),
+             dict(
+                 type='MultiScaleFlipAug',
+                 img_scale=(1333, 800),
+                 flip=False,
+                 transforms=[
+                     dict(type='Resize', keep_ratio=True),
+                     dict(type='RandomFlip'),
+                     dict(
+                         type='Normalize',
+                         mean=[123.675, 116.28, 103.53],
+                         std=[58.395, 57.12, 57.375],
+                         to_rgb=True),
+                     dict(type='Pad', size_divisor=32),
+                     dict(type='ImageToTensor', keys=['img']),
+                     dict(type='Collect', keys=['img'])
+                 ])
+         ]),
+     test=dict(
+         type='CocoDataset',
+         ann_file='data/coco/annotations/instances_val2017.json',
+         img_prefix='data/coco/val2017/',
+         pipeline=[
+             dict(type='LoadImageFromFile'),
+             dict(
+                 type='MultiScaleFlipAug',
+                 img_scale=(1333, 800),
+                 flip=False,
+                 transforms=[
+                     dict(type='Resize', keep_ratio=True),
+                     dict(type='RandomFlip'),
+                     dict(
+                         type='Normalize',
+                         mean=[123.675, 116.28, 103.53],
+                         std=[58.395, 57.12, 57.375],
+                         to_rgb=True),
+                     dict(type='Pad', size_divisor=32),
+                     dict(type='ImageToTensor', keys=['img']),
+                     dict(type='Collect', keys=['img'])
+                 ])
+         ]))
+ evaluation = dict(metric=['bbox', 'segm'])
+ # optimizer and learning-rate schedule (24 epochs, step decay at 16 and 22)
+ optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
+ optimizer_config = dict(grad_clip=None)
+ lr_config = dict(
+     policy='step',
+     warmup='linear',
+     warmup_iters=500,
+     warmup_ratio=0.001,
+     step=[16, 22])
+ runner = dict(type='EpochBasedRunner', max_epochs=24)
+ # runtime settings
+ checkpoint_config = dict(interval=1)
+ log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
+ custom_hooks = [dict(type='NumClassCheckHook')]
+ dist_params = dict(backend='nccl')
+ log_level = 'INFO'
+ load_from = None
+ resume_from = None
+ workflow = [('train', 1)]
+ opencv_num_threads = 0
+ mp_start_method = 'fork'
+ auto_scale_lr = dict(enable=False, base_batch_size=16)
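
This is a standard MMDetection-style config file. A minimal sketch of loading and overriding it with `mmcv`'s `Config` API (mmcv < 2.0; the batch-size and work-directory overrides are illustrative):

```python
from mmcv import Config

# Parse the Python config into a nested, attribute-accessible dict.
cfg = Config.fromfile('config.py')
print(cfg.model.type)            # MaskRCNN
print(cfg.model.backbone.type)   # ResNeXt

# Override settings before building a model or launching training,
# e.g. a smaller per-GPU batch size and a custom work directory.
cfg.data.samples_per_gpu = 1
cfg.work_dir = './work_dirs/mask_rcnn_x101_64x4d_fpn'
```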
gitattributes ADDED
@@ -0,0 +1,31 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zstandard filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
pytorch_model.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:39d6f70cff1219b012d2f0663d0eeb8958f179ea083b67d8f5ed6a99d77d38e2
+ size 410109324
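
The file committed here is a Git LFS pointer; the actual ~410 MB checkpoint lives in LFS storage and is fetched on `git lfs pull` or via the Hub download tooling. A minimal sketch of inspecting the downloaded weights with PyTorch (whether the file is a bare state dict or an MMDetection-style checkpoint wrapping a `state_dict` key is an assumption to verify):

```python
import torch

# Load on CPU; no GPU is needed just to inspect the tensors.
ckpt = torch.load('pytorch_model.pth', map_location='cpu')

# MMDetection checkpoints usually wrap the weights in 'state_dict';
# fall back to treating the whole object as the state dict.
state_dict = ckpt.get('state_dict', ckpt)

# Print a few parameter names and shapes.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```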