#### Grad-CAM visualization of any VisionEncoderDecoder model

# Step 1: Open the /pytorch_grad_cam folder and make sure that `__init__.py` imports every CAM variant by its class name, not just by its Python file. For example:

    from pytorch_grad_cam.grad_cam import GradCAM

This is needed because the main script (Grad_CAM_Visualization.py) imports every class directly.


# Step 2: Open the main Grad-CAM script, Grad_CAM_Visualization.py, and edit the following function to match your model:

    def reshape_transform(tensor, height=14, width=14):
        # Drop the first (class) token, then fold the patch tokens back into a 2D grid.
        result = tensor[:, 1:, :].reshape(tensor.size(0),
                                          height, width, tensor.size(2))
        # Move the channel dimension first: (batch, channels, height, width).
        result = result.transpose(2, 3).transpose(1, 2)
        return result

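Here the tensor being reshaped holds 150,528 values, which matches the target shape [1, 14, 14, 768] because 1 x 14 x 14 x 768 = 150,528. Set height and width so that this product equals the size of your model's tensor.

## If the shapes do not match, the error message looks like this: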

              RuntimeError: shape '[1, 16, 16, 768]' is invalid for input of size 150528
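
The arithmetic behind this error can be checked before editing reshape_transform. The sketch below is not part of the repository: `infer_patch_grid` is a hypothetical helper, and it assumes a square patch grid, a hidden size of 768, and a batch size of 1, as in the example above. It takes the "input of size" number from the error message and returns the height and width that would fit.

```python
import math


def infer_patch_grid(num_elements, hidden_size=768, batch_size=1):
    """Return (height, width) of the square patch grid that fits num_elements.

    num_elements is the size reported in the RuntimeError, i.e. the number of
    values left once the first token has been dropped.
    Hypothetical helper: assumes a square grid of patch tokens.
    """
    tokens, remainder = divmod(num_elements, batch_size * hidden_size)
    if remainder:
        raise ValueError("num_elements is not a multiple of batch_size * hidden_size")
    side = math.isqrt(tokens)
    if side * side != tokens:
        raise ValueError(f"{tokens} patch tokens do not form a square grid")
    return side, side


print(infer_patch_grid(150528))  # (14, 14), since 150528 = 1 * 14 * 14 * 768
```

A 16 x 16 grid would need 1 x 16 x 16 x 768 = 196,608 values, which is why the shape in the error message above is rejected.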



# Step 3: Choose your desired model from (DeIT_Base16_Pretrained with ImageNet, Customized VisionTransformer, Dino_Base16_Pretrained with ImageNet, My customized DeiT-CXR model, My customized EfficientNet model, and ## VisionEncoderDecoder Model)



# Step 4: Open the base_cam.py file and go to the "forward" function of the BaseCAM class.
Add the extra line "outputs = outputs.pooler_output" for the ## VisionEncoderDecoder Model, because Grad-CAM needs the pooler_output tensor from the model's output rather than the whole output object. Follow the comment in the code as well.
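
The step above adds the line unconditionally, which means it has to be removed again for the other models. As a hedged alternative, a guarded version could be sketched as follows. `unwrap_pooler_output` is a hypothetical helper, not something in base_cam.py, and the `SimpleNamespace` object only stands in for a real encoder output that would hold tensors.

```python
from types import SimpleNamespace


def unwrap_pooler_output(outputs):
    """Return outputs.pooler_output when present, otherwise outputs unchanged.

    Hypothetical helper: lets one code path serve both models that return an
    output object and models that return a plain tensor.
    """
    pooled = getattr(outputs, "pooler_output", None)
    return outputs if pooled is None else pooled


# Stand-in for the encoder's output object; a real one would hold tensors.
fake_outputs = SimpleNamespace(pooler_output=[[0.1, 0.9]])
print(unwrap_pooler_output(fake_outputs))  # [[0.1, 0.9]]
print(unwrap_pooler_output([[0.3, 0.7]]))  # [[0.3, 0.7]], plain outputs pass through
```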


# Step 5: Then follow the comments in Grad_CAM_Visualization.py:
              use model.encoder instead of model for the ## VisionEncoderDecoder Model

              use different target_layers for different models

              target_layers = [model.encoder.encoder.layer[-1].layernorm_before] for the ## VisionEncoderDecoder Model


# Step 6: Change the image_path and output_path accordingly

# Step 7: Run the script:

    python Grad_CAM_Visualization.py --use-cuda --image-path "directory/image_path" --method "any grad-cam method defined in the code"
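
For orientation, the three flags in this command could be parsed with argparse roughly as shown below. This is only a sketch: the parser in Grad_CAM_Visualization.py may differ, `build_parser` is a hypothetical name, and the "gradcam" default and method value are assumed placeholders rather than names confirmed by the script.

```python
import argparse


def build_parser():
    """Build a parser for the flags used in the run command (sketch only)."""
    parser = argparse.ArgumentParser(description="Grad-CAM visualization (sketch)")
    parser.add_argument("--use-cuda", action="store_true",
                        help="run on the GPU when one is available")
    parser.add_argument("--image-path", type=str, required=True,
                        help="path to the input image")
    # The default below is an assumption; use a method name defined in the script.
    parser.add_argument("--method", type=str, default="gradcam",
                        help="name of a CAM method defined in the script")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--use-cuda", "--image-path", "directory/image_path", "--method", "gradcam"]
)
print(args.use_cuda, args.image_path, args.method)
```

Note that argparse turns the dashes in "--use-cuda" and "--image-path" into underscores, so the values are read as args.use_cuda and args.image_path.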