Model Description

This model is designed to predict whether a patient being screened has a cancerous tumor, based on factors related to breast shape, texture, smoothness, etc. It takes a total of 30 input features and is designed to work within form-based applications, i.e. software applications that require user input.

NOTE: This model is meant as an assistive tool and must NOT be used directly to produce the final verdict on a patient's condition. Its predictions are meant to prompt further evaluation.

  • Developed by: DeepNeural
  • Model type: Tabular Classifier
  • Language(s): English
  • License: MIT

Model Inputs

Variable Name       Type        Description
radius1             Continuous  radius (mean of distances from center to points on the perimeter)
texture1            Continuous  texture (standard deviation of gray-scale values)
perimeter1          Continuous  perimeter
area1               Continuous  area
smoothness1         Continuous  smoothness (local variation in radius lengths)
compactness1        Continuous  compactness (perimeter^2 / area - 1.0)
concavity1          Continuous  concavity (severity of concave portions of the contour)
concave_points1     Continuous  concave points (number of concave portions of the contour)
symmetry1           Continuous  symmetry
fractal_dimension1  Continuous  fractal dimension ("coastline approximation" - 1)
radius2             Continuous
texture2            Continuous
perimeter2          Continuous
area2               Continuous
smoothness2         Continuous
compactness2        Continuous
concavity2          Continuous
concave_points2     Continuous
symmetry2           Continuous
fractal_dimension2  Continuous
radius3             Continuous
texture3            Continuous
perimeter3          Continuous
area3               Continuous
smoothness3         Continuous
compactness3        Continuous
concavity3          Continuous
concave_points3     Continuous
symmetry3           Continuous
fractal_dimension3  Continuous
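
The 30 input names follow a regular pattern: ten base measurements, each repeated with the suffix 1, 2, or 3. The sketch below shows one way a form-based application might generate the expected names and check a submitted record before it reaches the model. `BASE_FEATURES`, `FEATURE_NAMES`, and `validate_form_input` are hypothetical names introduced for illustration, and the column order is assumed to match the table order; neither is confirmed by the model itself.

```python
# Hypothetical helper names; only the column names come from the inputs table.
BASE_FEATURES = [
    "radius", "texture", "perimeter", "area", "smoothness",
    "compactness", "concavity", "concave_points", "symmetry",
    "fractal_dimension",
]

# Assumed order: all "1" columns, then all "2" columns, then all "3" columns.
FEATURE_NAMES = [f"{name}{suffix}" for suffix in (1, 2, 3) for name in BASE_FEATURES]


def validate_form_input(form):
    """Return the form values as floats in table order, or raise ValueError."""
    missing = [name for name in FEATURE_NAMES if name not in form]
    if missing:
        raise ValueError(f"missing features: {missing}")
    unexpected = sorted(set(form) - set(FEATURE_NAMES))
    if unexpected:
        raise ValueError(f"unexpected features: {unexpected}")
    # float() also raises ValueError for non-numeric text such as "abc".
    return [float(form[name]) for name in FEATURE_NAMES]
```

Rejecting incomplete or mistyped records at the form layer keeps malformed input from being silently passed to the classifier.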

Model Sources

Uses

This model is primarily intended for Data Scientists, Software Engineers, and Machine Learning Engineers who are interested in developing predictive breast cancer software applications for healthcare institutions, ranging from hospitals to clinics. It is also intended for educational use in academia, in studies where breast cancer risk analysis is a priority.

Foreseeable users of software applications built with this model include doctors and nurses (with respect to their patients).

Bias, Risks, and Limitations

Please be advised that our model was trained on a specific dataset for breast cancer prediction. Although it has a high level of accuracy and precision, misclassifications can still occur.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More research is needed for further recommendations. The model will continuously undergo improvements and testing to address the limitations mentioned in the previous section. It is further advised that this model be used as an assistive tool in diagnostic procedures.

How to Get Started with the Model

To use this model, refer to the steps below, which show how it can be loaded directly into an application. Because the model was built with the Scikit-Learn Machine Learning library, it has been saved as a .joblib file. Copy the following code into your Python environment.

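  1. Install Joblib (the leading "!" is notebook syntax; omit it in a regular shell). Scikit-Learn must also be installed for the model to load.

    !pip install joblib
    
  2. Load the model after installation

    import joblib

    my_model = joblib.load('breast_cancer_classifier_model_v1.joblib')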
    
  3. Make predictions (Binary or Probability)

    my_model.predict(X_test)
    
    # For probability-based outputs
    
    my_model.predict_proba(X_test)
    

NOTE: This model requires input data in a 2-dimensional format (a Pandas DataFrame, not a Series, which is 1-dimensional) with the column names listed under Model Inputs, since the model is intended for form-based applications.
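
As a minimal sketch of the note above, a single form submission can be turned into a one-row DataFrame before calling the model. `predict_from_form` and its parameters are hypothetical names introduced here; `model` stands for any object exposing `predict` and `predict_proba`, such as the estimator returned by `joblib.load`, and `feature_names` is assumed to be the list of 30 column names in the order the model expects.

```python
import pandas as pd


def predict_from_form(model, form, feature_names):
    """Build a one-row DataFrame from form values and query the model.

    Hypothetical helper: `form` maps column names to submitted values.
    """
    # A list holding one dict yields a 2-D frame with a single row.
    # Selecting feature_names fixes the column order and raises KeyError
    # if any expected column is absent from the form.
    frame = pd.DataFrame([form])[feature_names].astype(float)
    label = model.predict(frame)[0]
    probabilities = model.predict_proba(frame)[0]
    return label, probabilities
```

With the loaded model this might be called as `predict_from_form(my_model, form_values, feature_names)`, where `form_values` is the dictionary collected from the application's form.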

Metrics

We evaluated several Machine Learning models on our dataset, namely Logistic Regression, Stochastic Gradient Descent, and Support Vector Machines. After performing hyperparameter tuning on the Logistic Regression model, we opted to prioritize that model for our metrics calculations. The metrics used were accuracy, precision, recall, F1-score, and AUC. The results for our model can be seen in the 'Results' section.
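
For readers unfamiliar with these metrics, the sketch below spells out the standard definitions of accuracy, precision, recall, and F1-score for binary labels. It is an illustration only, not the evaluation code used for this model: `classification_metrics` is a hypothetical name, the choice of 1 as the positive class is an assumption, and AUC is omitted because it requires predicted scores rather than hard labels.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels (illustrative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

In a screening context recall deserves particular attention, since a false negative corresponds to a missed positive case.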

Results (best and final scores after fixing class imbalance issues)

  • Accuracy: 94%
  • Precision: 100%
  • Recall: 84%
  • AUC: 92%
  • F1-Score: 91%
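
As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the reported precision and recall. The snippet below is a small worked calculation using only those two published figures.

```python
precision = 1.00  # reported precision (100%)
recall = 0.84     # reported recall (84%)

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"{f1:.3f}")
```

The result rounds to 91%, which agrees with the reported F1-score.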

Environmental Impact

  • Hardware Type: T4 (for training)
  • Hours used: < 20hr
  • Cloud Provider: Google Cloud
  • Compute Region: Europe
  • Carbon Emitted: 1.02