Breast Cancer Prediction

Machine Learning Project

K-Nearest Neighbors Model

A K-Nearest Neighbors classifier trained on the "Breast Cancer Wisconsin (Diagnostic) Data Set" achieved its best accuracy score, 97.18%, when using all 30 features.

The optimal number of neighbors was 9. This tuning parameter was selected using 10-fold cross-validation.
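The tuning step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn, uses the copy of the Wisconsin Diagnostic data bundled as `load_breast_cancer`, and the train/test split, the feature scaling, and the 1 to 30 search range for the number of neighbors are all assumptions. It is not expected to reproduce the 97.18% figure exactly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled copy of the Breast Cancer Wisconsin (Diagnostic) data: 569 rows, 30 features.
X, y = load_breast_cancer(return_X_y=True)

# Assumption: hold out 25% of the rows for a final accuracy check.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumption: standardize features first, since KNN is distance based.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Assumption: search n_neighbors from 1 to 30 with 10-fold cross-validation.
param_grid = {"kneighborsclassifier__n_neighbors": list(range(1, 31))}
search = GridSearchCV(pipeline, param_grid, cv=10, scoring="accuracy")
search.fit(X_train, y_train)

best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
test_accuracy = search.score(X_test, y_test)

print(f"best n_neighbors: {best_k}")
print(f"mean 10-fold CV accuracy: {search.best_score_:.4f}")
print(f"held-out accuracy: {test_accuracy:.4f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the cross-validation scores are not influenced by the validation rows.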

KNN Model with Full 30 Features

KNN Model with 7 BestFeature Selected Features

KNN Model with 7 Correlation Selected Features

Observations:

1. The accuracy of the model with 7 correlation selected features was lower than that of the other two models.

2. The model with 7 BestFeature selected features was better at predicting Benign while the model with 7 correlation selected features was equally good at predicting Benign and Malignant.
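A comparison like the one behind these observations could be set up along the following lines. This is only a sketch under stated assumptions: the source does not say how either 7-feature subset was chosen, so here "BestFeature" is assumed to mean scikit-learn's `SelectKBest` with the chi-squared score, and "correlation selected" is assumed to mean the 7 features with the largest absolute Pearson correlation with the diagnosis. The helper `per_class_recall` and the train/test split are hypothetical, and the numbers it prints will not necessarily match the results reported above.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X, y = data.data, data.target  # in this copy: 0 = malignant, 1 = benign

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumption: "BestFeature" = SelectKBest with the chi-squared score
# (valid here because every feature is non-negative).
kbest_idx = SelectKBest(chi2, k=7).fit(X_train, y_train).get_support(indices=True)

# Assumption: "correlation selected" = top 7 features by |Pearson r| with the target.
abs_corr = np.array(
    [abs(np.corrcoef(X_train[:, j], y_train)[0, 1]) for j in range(X_train.shape[1])]
)
corr_idx = np.sort(np.argsort(abs_corr)[-7:])


def per_class_recall(feature_idx):
    """Fit KNN (k=9) on the given feature columns and return recall per class."""
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=9))
    model.fit(X_train[:, feature_idx], y_train)
    pred = model.predict(X_test[:, feature_idx])
    malignant, benign = recall_score(y_test, pred, average=None, labels=[0, 1])
    return {"malignant": float(malignant), "benign": float(benign)}


results = {
    "full 30 features": per_class_recall(np.arange(X.shape[1])),
    "7 SelectKBest features": per_class_recall(kbest_idx),
    "7 correlation features": per_class_recall(corr_idx),
}

for name, recalls in results.items():
    print(f"{name}: {recalls}")
```

Reporting recall separately for Malignant and Benign, rather than accuracy alone, is what makes it possible to see whether a model favors one class over the other.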

Northwestern Data Visualization Final Project