Aviroy committed on
Commit d5b9ea2 · verified · 1 parent: 65419ca

Rename README.md to Update README.md

Files changed (2)
  1. README.md +0 -22
  2. Update README.md +187 -0
README.md DELETED
@@ -1,22 +0,0 @@
- ---
- license: apache-2.0
- language:
- - en
- metrics:
- - accuracy
- base_model:
- - microsoft/resnet-50
- - timm/vgg19.tv_in1k
- - google/vit-base-patch16-224
- - xai-org/grok-1
- pipeline_tag: image-classification
- tags:
- - Ocular-Toxoplasmosis(FundusImages)
- - Retinal-images(Diabetics,Cataract,Gulocoma,Healthy)
- - Pytorch
- - Transformers
- - Image-Classification
- - Image_feature_extraction
- - Grad-CAM
- - XAI-Visualization
- ---
Update README.md ADDED
@@ -0,0 +1,187 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ metrics:
+ - accuracy
+ base_model:
+ - microsoft/resnet-50
+ - timm/vgg19.tv_in1k
+ - google/vit-base-patch16-224
+ - xai-org/grok-1
+ pipeline_tag: image-classification
+ tags:
+ - Ocular-Toxoplasmosis(FundusImages)
+ - Retinal-images(Diabetics,Cataract,Gulocoma,Healthy)
+ - Pytorch
+ - Transformers
+ - Image-Classification
+ - Image_feature_extraction
+ - Grad-CAM
+ - XAI-Visualization
+ ---
+
+ # Model Card: ROYXAI [Vision Transformer + VGG19 + ResNet50 Ensemble with Grad-CAM]
+
+ ## Model Description
+ This model is an ensemble of three deep learning architectures: **Vision Transformer (ViT), VGG19, and ResNet50**. The ensemble is intended to improve classification performance on fundus and retinal images of ocular diseases. The model also integrates **Grad-CAM** visualization, which highlights the image regions that most influence a prediction and so makes the output easier to interpret.
+
+ ## Model Details
+ - **Model Name**: ROYXAI
+ - **Developed by**: Avishek Roy Sparsho
+ - **Framework**: PyTorch
+ - **Ensemble Method**: Bagging
+ - **Backbone Models**: Vision Transformer, VGG19, ResNet50
+ - **Target Task**: Medical Image Classification
+ - **Supported Classes**:
+   - OT (ocular toxoplasmosis)
+   - Healthy
+   - SC_diabetes
+   - SC_cataract
+   - SC_glucoma
+
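The card does not show how the outputs of the three backbones are combined. One common aggregation step for an ensemble like this is to average the per-model class probabilities and pick the most likely class. The sketch below illustrates that idea in plain Python; the function names and the example logits are made up for illustration, and the actual `ensemble_model` may combine its members differently.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def average_ensemble(per_model_logits):
    """Average class probabilities across models (assumed aggregation rule)."""
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    return [sum(p[c] for p in probs) / n_models for c in range(len(probs[0]))]

# Hypothetical logits from ViT, VGG19, and ResNet50 for one image (5 classes)
example_logits = [
    [2.0, 0.1, 0.3, 0.2, 0.1],
    [1.5, 0.4, 0.2, 0.9, 0.1],
    [0.2, 0.1, 0.3, 1.1, 0.2],
]
avg_probs = average_ensemble(example_logits)
predicted_index = max(range(len(avg_probs)), key=avg_probs.__getitem__)
```

Averaging probabilities rather than raw logits keeps one overconfident backbone from dominating the vote, which is one reason this rule is often chosen for mixed CNN and transformer ensembles.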
+ ## Dataset
+ - **Dataset Name**: Custom Ocular Disease and Secondary Complications Dataset
+ - **Dataset Source**: Private Dataset (Medical Images)
+ - **Dataset Structure**: Images stored in folders named after their class labels
+ - **Preprocessing**:
+   - Resized images to 224x224 pixels
+   - Normalized using ImageNet mean and standard deviation
+
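To make the normalization step concrete, here is a small worked example in plain Python. It applies the same ImageNet mean and standard deviation that appear in the usage snippet of this README to a single 8-bit RGB pixel; the helper name `normalize_pixel` is hypothetical and only mirrors what `ToTensor` followed by `Normalize` does per channel.

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

# A mid-gray pixel: 128 / 255 is about 0.502, slightly above the red-channel mean
mid_gray = normalize_pixel((128, 128, 128))
```

For the red channel this gives roughly (0.502 - 0.485) / 0.229, or about 0.074, so mid-gray lands close to zero after normalization.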
+ ## Model Performance
+ - **Accuracy**: 98% on the test dataset
+ - **Precision/Recall/F1-score**: Evaluated alongside accuracy; per-class values are not listed in this card
+ - **Overfitting Prevention**: Implemented **data augmentation, dropout, and weight regularization**
+
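For readers who want to reproduce the per-class metrics named above on their own predictions, the following is a minimal sketch of how precision, recall, and F1 are computed for one class. The function name and the toy labels are illustrative only; they are not outputs of this model.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, f1) for one class, treating it as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy labels for illustration only; these are not results from the model
y_true = ["OT", "OT", "Healthy", "SC_cataract", "Healthy", "OT"]
y_pred = ["OT", "Healthy", "Healthy", "SC_cataract", "Healthy", "OT"]
ot_precision, ot_recall, ot_f1 = per_class_metrics(y_true, y_pred, "OT")
```

In a diagnostic setting, recall per disease class is usually the number to watch, since a missed case (false negative) tends to cost more than a false alarm.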
+ ## Installation and Usage
+ ### Clone the Repository
+ ```bash
+ git clone https://huggingface.co/Aviroy/ROYXAI
+ cd ROYXAI
+ ```
+
+ ### Install Dependencies
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### Training the Model
+ To train the model from scratch, run:
+ ```bash
+ python train.py --epochs 50 --batch_size 32
+ ```
+
+ ### Load Pretrained Model
+ To use the trained model directly:
+ ```python
+ import torch
+ from PIL import Image
+ import torchvision.transforms as transforms
+ from model import ensemble_model  # Load the trained ensemble model
+
+ # Define image transformations
+ transform = transforms.Compose([
+     transforms.Resize((224, 224)),
+     transforms.ToTensor(),
+     transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
+ ])
+
+ # Load and preprocess an image
+ image_path = "path/to/image.jpg"
+ image = Image.open(image_path).convert('RGB')
+ image = transform(image).unsqueeze(0).to('cuda' if torch.cuda.is_available() else 'cpu')
+
+ # Perform inference with the model on the same device as the input
+ ensemble_model.to(image.device).eval()
+ with torch.no_grad():
+     output = ensemble_model(image)
+     predicted_class = torch.argmax(output, dim=1).item()
+
+ # Print classification result
+ print("Predicted Class:", predicted_class)
+ ```
+
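The inference snippet prints an integer class index rather than a label. How indices map to labels depends on how the training data was loaded, which the card does not state. If the class folders were read with torchvision's `ImageFolder`, which assigns indices to folder names in sorted order, the mapping would look like the sketch below. That ordering is an assumption, and `label_for` is a hypothetical helper.

```python
# Class labels from the "Supported Classes" list
SUPPORTED_CLASSES = ["OT", "Healthy", "SC_diabetes", "SC_cataract", "SC_glucoma"]

# Assumption: indices follow sorted folder names, as torchvision's ImageFolder does
CLASS_NAMES = sorted(SUPPORTED_CLASSES)

def label_for(index):
    """Translate a predicted class index into its label."""
    return CLASS_NAMES[index]
```

With this in place, `print("Predicted Class:", label_for(predicted_class))` would report a label instead of a number. Check the mapping against the training code before relying on it.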
+ ## Grad-CAM Visualization
+ ### Visualizing Class Activation Maps for Interpretability
+ #### Vision Transformer (ViT)
+ ```python
+ from visualization import visualize_gradcam_vit  # Function for ViT Grad-CAM
+
+ # Generate Grad-CAM visualization (reuses image and predicted_class from the inference example)
+ overlay = visualize_gradcam_vit(ensemble_model.models[0], image, target_class=predicted_class)
+
+ # Display the Grad-CAM output
+ import matplotlib.pyplot as plt
+ plt.imshow(overlay)
+ plt.axis('off')
+ plt.title("Grad-CAM for Vision Transformer")
+ plt.show()
+ ```
+
+ #### ResNet50
+ ```python
+ from visualization import visualize_gradcam  # General Grad-CAM function
+
+ # Generate Grad-CAM visualization for ResNet50
+ overlay = visualize_gradcam(ensemble_model.models[2], image, target_class=predicted_class)
+
+ # Display the Grad-CAM output
+ import matplotlib.pyplot as plt
+ plt.imshow(overlay)
+ plt.axis('off')
+ plt.title("Grad-CAM for ResNet50")
+ plt.show()
+ ```
+
+ #### VGG19
+ ```python
+ from visualization import visualize_gradcam  # General Grad-CAM function
+
+ # Generate Grad-CAM visualization for VGG19
+ overlay = visualize_gradcam(ensemble_model.models[1], image, target_class=predicted_class)
+
+ # Display the Grad-CAM output
+ import matplotlib.pyplot as plt
+ plt.imshow(overlay)
+ plt.axis('off')
+ plt.title("Grad-CAM for VGG19")
+ plt.show()
+ ```
+
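The `visualization` module itself is not shown in this commit. As a rough guide to what a function like `visualize_gradcam` computes before overlaying the result on the image, the sketch below implements the standard Grad-CAM arithmetic on plain nested lists: each channel is weighted by the spatial mean of its gradients, the weighted maps are summed, negative values are clipped, and the result is scaled to [0, 1]. The `grad_cam` function and the tiny inputs are hypothetical; the repository's implementation may differ, especially for the ViT, where patch tokens usually have to be reshaped into a grid first.

```python
def grad_cam(activations, gradients):
    """Compute a normalized Grad-CAM heatmap.

    activations, gradients: nested lists shaped [channels][height][width],
    taken from one convolutional layer for one target class.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial mean of that channel's gradients
    weights = [
        sum(sum(row) for row in grad) / (height * width) for grad in gradients
    ]
    # Weighted sum of activation maps, followed by ReLU
    cam = [
        [
            max(0.0, sum(w * act[y][x] for w, act in zip(weights, activations)))
            for x in range(width)
        ]
        for y in range(height)
    ]
    # Scale to [0, 1] so the map can be overlaid on the image
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam

# Tiny hypothetical 2-channel, 2x2 feature map and its gradients
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
```

In this toy case the second channel has negative gradients, so the locations it activates are suppressed by the ReLU and only the first channel's locations remain in the heatmap.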
+ ## Training Configuration
+ - **Optimizer**: Adam with weight decay
+ - **Learning Rate Scheduler**: Cosine Annealing LR
+ - **Loss Function**: Cross-Entropy Loss
+ - **Batch Size**: 32
+ - **Training Epochs**: 20
+ - **Hardware Used**: 2x NVIDIA T4 GPUs, NVIDIA P100 GPU, Apple M1 chip
+
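To illustrate what the cosine annealing schedule does over the 20 epochs listed above, the sketch below evaluates its closed form (without restarts) in plain Python. The base learning rate of `1e-3` is a hypothetical value, since the card does not state one, and `cosine_annealing_lr` is an illustrative helper rather than code from the repository.

```python
import math

def cosine_annealing_lr(epoch, base_lr, total_epochs, min_lr=0.0):
    """Closed-form cosine annealing schedule without restarts."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + cosine)

TOTAL_EPOCHS = 20   # from the Training Configuration list
BASE_LR = 1e-3      # hypothetical; the card does not state the learning rate
schedule = [
    cosine_annealing_lr(epoch, BASE_LR, TOTAL_EPOCHS)
    for epoch in range(TOTAL_EPOCHS + 1)
]
```

The learning rate starts at the base value, passes through half of it at epoch 10, and decays smoothly to the minimum at epoch 20.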
+ ## Limitations & Considerations
+ - This model was trained on a single private dataset and may not generalize well to other medical image datasets without fine-tuning.
+ - It is **not a substitute for professional medical diagnosis**.
+ - The Vision Transformer model is computationally expensive compared to CNNs.
+
+ ## Citation
+ If you use this model in your research, please cite:
+ ```
+ @article{Sparsho2025,
+   author = {Avishek Roy Sparsho},
+   title = {ROYXAI Model For Proper Visualization of Classified Medical Image},
+   journal = {Medical AI Research},
+   year = {2025}
+ }
+ ```
+
+ ## Acknowledgments
+ Special thanks to the open-source community and Kaggle for providing medical datasets for deep learning research.
+
+ ## License
+ This model is released under the **Apache 2.0 License**. Use it responsibly.
+
+
+
+
+
+
+
+