Mammograms are not 100% accurate in identifying breast cancer. Better methods are needed to predict breast cancer without the need for surgical biopsies. This study evaluated the accuracy of breast cancer prediction using the k-nearest neighbor (k-NN) classifier algorithm.
The breast cancer dataset (569 records and 32 attributes) was obtained from the University of California, Irvine (UCI) Machine Learning Repository. A supervised machine learning technique, k-NN, was applied to patient characteristics including tumor features (radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry and fractal dimension) to detect whether a mass was malignant or benign. The data were split into a training dataset containing the first 469 observations, used to build the k-NN model, and a testing dataset containing the remaining observations, used to simulate new patients. The features were normalized to rescale them to a standard range of values. The initial choice was k = 21, approximately the square root of the 469 patients in the training dataset. Alternative k-values (k = 1, 5, 11, 15, 21, 27) were also tested to optimize model performance. The analysis was conducted using the “class” package in R (v3.6.2).
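The steps described above (rescaling, an ordered train/test split, a square-root starting value for k, and a majority vote among the nearest neighbors) can be sketched as follows. This is a minimal pure-Python illustration, not the authors' R code: the function names are hypothetical, min-max scaling is an assumption (the abstract says only that features were rescaled to a standard range), Euclidean distance is assumed, and the six feature rows are made-up toy values rather than UCI records.

```python
import math
from collections import Counter


def min_max_normalize(rows):
    """Rescale every feature column to [0, 1] (assumed normalization method)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        # A constant column carries no information, so map it to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k):
    """Label a query by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance (assumed)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def initial_k(n_train):
    """Starting value for k: the integer part of the square root of the training size."""
    return math.isqrt(n_train)


# Toy, made-up rows (NOT the UCI data): [radius, texture]
rows = [[10.0, 15.0], [11.0, 14.0], [12.0, 16.0],
        [20.0, 25.0], [21.0, 27.0], [22.0, 26.0]]
labels = ["B", "B", "B", "M", "M", "M"]  # B = benign, M = malignant

scaled = min_max_normalize(rows)
train_x, train_y = scaled[:5], labels[:5]  # first observations build the model
test_x = scaled[5:]                        # remaining observations act as new patients
predictions = [knn_predict(train_x, train_y, q, k=3) for q in test_x]
```

With 469 training observations, `initial_k(469)` gives 21, matching the starting value reported above. The toy example uses k = 3 only because it has five training rows.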
In 100 simulations, the k-NN algorithm achieved 98% accuracy; that is, only 2 of 100 masses (2 percent) were incorrectly classified. The choice of k = 21 appeared more accurate than the other values tested, as it produced the fewest incorrect identifications of cancerous masses.
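The accuracy figure and the selection rule for k reduce to simple arithmetic, sketched below. The helper names are hypothetical, and the abstract does not report per-k error counts, so `best_k` is shown without data; only the 2-in-100 figure comes from the text.

```python
def accuracy(n_total, n_errors):
    """Fraction of cases classified correctly."""
    return (n_total - n_errors) / n_total


def best_k(errors_by_k):
    """Return the k with the fewest misclassifications (smallest k wins ties)."""
    return min(sorted(errors_by_k), key=lambda k: errors_by_k[k])


# Figures stated in the abstract: 2 misclassified masses out of 100.
reported = accuracy(100, 2)
```

Here `reported` evaluates to 0.98, the 98% accuracy quoted above. `best_k` expects a mapping from each candidate k to its number of misclassified test cases.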
A supervised machine learning algorithm was shown to be capable of tackling extremely complex tasks, such as the identification of cancerous masses, with reasonable accuracy. This type of analysis could be an important resource for the early detection and treatment of cancerous tumors.
© 2020 Published by Elsevier Inc.