Model Card for Model ID
This model predicts, from a fixed set of inputs, whether a person has diabetes or is at risk of developing it.
Model Description
The model takes 21 input features and predicts whether a person has diabetes or is at risk of developing it. It is designed to work within form-based applications, i.e. software applications that collect user input.
NOTE: This model is meant as an assistive tool and must NOT be used directly to produce the final verdict on a person's or patient's condition. Its predictions are meant to prompt further evaluation.
- Developed by: DeepNeural
- Model type: Tabular Classifier
- Language(s): English
- License: MIT
Model Inputs
Variable Name | Type | Description | Accepted Values |
---|---|---|---|
HighBP | Binary | Does the patient have high blood pressure? | 0 = no, 1 = yes |
HighChol | Binary | Does the patient have high cholesterol? | 0 = no, 1 = yes |
CholCheck | Binary | Has the patient had a cholesterol check in 5 years? | 0 = no, 1 = yes |
BMI | Integer | Body Mass Index | Numeric value |
Smoker | Binary | Does the patient smoke (at least 5 packs)? | 0 = no, 1 = yes |
Stroke | Binary | Has the patient suffered from a stroke? | 0 = no, 1 = yes |
HeartDiseaseAttack | Binary | Coronary heart disease or myocardial infarction? | 0 = no, 1 = yes |
PhysActivity | Binary | Physical activity in the past 30 days? | 0 = no, 1 = yes |
Fruits | Binary | Does the patient consume one or more fruits per day? | 0 = no, 1 = yes |
Veggies | Binary | Does the patient consume vegetables one or more times per day? | 0 = no, 1 = yes |
HvyAlcoholConsump | Binary | Heavy drinker (14 drinks per week for men, 7 for women)? | 0 = no, 1 = yes |
AnyHealthcare | Binary | Does the patient have healthcare coverage? | 0 = no, 1 = yes |
NoDocbcCost | Binary | Difficulty reaching a doctor due to cost in the past 12 months? | 0 = no, 1 = yes |
GenHlth | Integer | How good is the patient's general health? | 1 = excellent, 2 = very good, 3 = good, 4 = fair, 5 = poor |
MenHlth | Integer | Days in the past 30 when mental health was not good? | Scale 1-30 |
PhysHlth | Integer | Days in the past 30 when physical health was poor? | Scale 1-30 |
DiffWalk | Binary | Does the patient have difficulty walking? | 0 = no, 1 = yes |
Sex | Binary | What is the patient's sex? | 0 = female, 1 = male |
Age | Integer | What is the patient's age? | 1 = 18-24, 9 = 60-64, 13 = 80 or older |
Education | Integer | Maximum education reached | 1 = never attended school, 2 = grades 1-8, 3 = grades 9-11, 4 = grade 12 or GED, 5 = college (1-3 years), 6 = college (4+ years) |
Income | Integer | Income level | 1 = less than $10,000, 5 = less than $35,000, 8 = $75,000 or more |
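Because the model is meant to sit behind a form, it can help to check each submission against the table above before building the model input. The following is a minimal sketch, not part of the released model: the function name `validate_form` and the dictionary-based form are hypothetical, and accepting 0 for MenHlth and PhysHlth (to mean "no such days") is an assumption.

```python
# Allowed ranges for the non-binary fields, read off the Model Inputs table.
# Assumption: 0 is accepted for MenHlth and PhysHlth to mean "no such days".
INTEGER_RANGES = {
    "GenHlth": (1, 5),
    "MenHlth": (0, 30),
    "PhysHlth": (0, 30),
    "Age": (1, 13),
    "Education": (1, 6),
    "Income": (1, 8),
}

# Fields that only accept 0 or 1 according to the table.
BINARY_FIELDS = {
    "HighBP", "HighChol", "CholCheck", "Smoker", "Stroke",
    "HeartDiseaseAttack", "PhysActivity", "Fruits", "Veggies",
    "HvyAlcoholConsump", "AnyHealthcare", "NoDocbcCost", "DiffWalk", "Sex",
}


def validate_form(form):
    """Return a list of problems; an empty list means the form is usable."""
    problems = []
    for name in sorted(BINARY_FIELDS):
        if form.get(name) not in (0, 1):
            problems.append(f"{name} must be 0 or 1")
    for name, (low, high) in INTEGER_RANGES.items():
        value = form.get(name)
        if not isinstance(value, int) or not low <= value <= high:
            problems.append(f"{name} must be an integer from {low} to {high}")
    bmi = form.get("BMI")
    if not isinstance(bmi, int) or bmi <= 0:
        problems.append("BMI must be a positive integer")
    return problems
```

Returning a list of messages rather than raising on the first error lets a form display every problem to the user at once.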
Model Sources
Uses
This model is primarily designed for Data Scientists, Software Engineers and Machine Learning Engineers who are developing diabetes-related software applications for healthcare institutions, ranging from hospitals to clinics. It is also suitable for educational use in academia, in studies where diabetes risk analysis is a priority.
Foreseeable users of the software applications developed with this model include doctors and nurses, with respect to their patients.
Bias, Risks, and Limitations
Please be advised that our model was adjusted to place greater emphasis on the minority class (a positive result) in order to correct for an imbalanced dataset. As a result, the model works better with real-life data, where the minority class is the more important one (see the results for metrics). The model may still misclassify some cases, so users are advised to remember that it is meant as an assistive tool that aids faster diagnostics.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More research is needed before further recommendations can be made. The model will continue to undergo improvements and testing to address the limitations mentioned in the previous section.
How to Get Started with the Model
To use this model, follow the steps below, which show how it can be loaded directly into an application. Because it was built with the Scikit-Learn machine learning library, the model has been saved as a .joblib file. Copy the following code into your Python environment.
Install Joblib and Scikit-Learn
!pip install joblib scikit-learn
Load the model after installation
import joblib
my_model = joblib.load('diabetes_health_indicators_classifier_v1.joblib')
Make predictions (Binary or Probability)
my_model.predict(X_test)
# For probability-based outputs
my_model.predict_proba(X_test)
NOTE: This model requires input data in a 2-dimensional format (a Pandas DataFrame) that includes the column names, since the model is intended for form-based applications.
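As a rough illustration of the note above, a single form submission can be turned into a one-row, 2-dimensional input with named columns. This is a sketch under assumptions: the helper `form_to_frame` and the example values are hypothetical, and the column names and order are copied from the Model Inputs table on the assumption that they match what the saved model was trained on.

```python
import pandas as pd

# Column names copied from the Model Inputs table, in table order.
# Assumption: the saved model expects exactly these names in this order.
FEATURES = [
    "HighBP", "HighChol", "CholCheck", "BMI", "Smoker", "Stroke",
    "HeartDiseaseAttack", "PhysActivity", "Fruits", "Veggies",
    "HvyAlcoholConsump", "AnyHealthcare", "NoDocbcCost", "GenHlth",
    "MenHlth", "PhysHlth", "DiffWalk", "Sex", "Age", "Education", "Income",
]


def form_to_frame(form):
    """Build a one-row DataFrame with the 21 named columns from a form dict."""
    missing = [name for name in FEATURES if name not in form]
    if missing:
        raise ValueError(f"missing form fields: {missing}")
    return pd.DataFrame([[form[name] for name in FEATURES]], columns=FEATURES)


# Hypothetical form submission, for illustration only.
example_form = {
    "HighBP": 1, "HighChol": 1, "CholCheck": 1, "BMI": 28, "Smoker": 0,
    "Stroke": 0, "HeartDiseaseAttack": 0, "PhysActivity": 1, "Fruits": 1,
    "Veggies": 1, "HvyAlcoholConsump": 0, "AnyHealthcare": 1,
    "NoDocbcCost": 0, "GenHlth": 3, "MenHlth": 2, "PhysHlth": 5,
    "DiffWalk": 0, "Sex": 1, "Age": 9, "Education": 4, "Income": 5,
}
X_new = form_to_frame(example_form)  # shape (1, 21)
```

`X_new` can then be passed to `my_model.predict(X_new)`. Note that for a scikit-learn `SGDClassifier`, `predict_proba` is only available when the model was trained with a probabilistic loss such as `log_loss` or `modified_huber`; this card does not state which loss was used.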
Metrics
We evaluated several Machine Learning models on our dataset, namely logistic regression, Stochastic Gradient Descent, and Support Vector Machines. In every case, the models were evaluated on new (unseen) test data. All three models performed well, with promising accuracy, recall, and AUC scores, which we treated as the most trustworthy scores because our dataset was originally imbalanced. We therefore applied multiple types of imbalance adjustments to place greater emphasis on the minority class, which is the more important class. After adjusting the dataset, we retrained all of our models before drawing a conclusion. The best performing model, after hyperparameter tuning, was the SGDClassifier model. The primary metrics used were accuracy, recall, AUC, precision and F1-score. Please refer to the results section for the scores.
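This card does not say which imbalance adjustments were applied. As one common option, offered here only as an illustration, each class can be weighted inversely to its frequency, which is the heuristic behind scikit-learn's `class_weight="balanced"` setting. The helper name and the label list below are made up for the example.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up labels: 8 negatives and 2 positives (the minority class).
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.625, 1: 2.5}
```

With these weights, an error on a positive example costs four times as much as an error on a negative one during training, which pushes the classifier toward higher recall on the minority class.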
Results (best and final scores after correcting the class imbalance)
- Accuracy: 73%
- Precision: 31%
- Recall: 78%
- AUC: 75%
- F1-Score: 44%
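The F1-score is the harmonic mean of precision and recall, so the reported value can be checked against the other two figures. A short sketch, using only the rounded percentages listed in this section (the function name is illustrative):

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded precision and recall from the results above.
f1 = f1_from(0.31, 0.78)
print(f"{f1:.2f}")  # 0.44
```

The result agrees with the reported F1-score of 44%. It also shows how the low precision pulls the F1-score down despite the high recall.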
Environmental Impact
- Hardware Type: T4 (for training)
- Hours used: < 20hr
- Cloud Provider: Google Cloud
- Compute Region: Europe
- Carbon Emitted: 1.02