Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN Classifier

Authors

  • Emad Majeed Hameed, Department of Computer Science, Gujarat University, Ahmedabad, India
  • Hardik Joshi, Department of Computer Science, Gujarat University, Ahmedabad, India, https://orcid.org/0000-0002-0943-6383

DOI:

https://doi.org/10.51173/jt.v6i3.2587

Keywords:

Diabetes, Prediction, Feature Selection, KNN

Abstract

Diabetes is a widespread illness and a major health concern worldwide, which motivates the exploration of advanced predictive techniques for its early diagnosis. This paper addresses diabetes prediction using the K-Nearest Neighbors (KNN) classifier, a widely used machine learning algorithm. Most previous studies only investigated the optimal value of k in the KNN algorithm and did not address the best distance measure, either on its own or together with the optimal k, as a way to improve diabetes prediction. This study investigates the optimal value of k and the optimal distance measure simultaneously to improve the performance of KNN in predicting diabetes. Using the Pima Indians Diabetes (PIMA) dataset, the study examines how different parameters, especially the value of k and the distance metric, affect classifier performance. Experiments that combined different values of k with various distance measures yielded insights into maximizing the classifier's accuracy. The results show that the choice of distance measure strongly affects classification accuracy, and that selecting an appropriate k helps avoid overfitting and underfitting, a property of robust models for diabetes prediction. The best performance, 80.5%, was achieved with k = 35 and the Euclidean distance measure.
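As a reference for the two parameters the abstract varies, the KNN decision rule and several commonly used distance measures can be written as follows. Here x is a query record with n features, x_i a training record with label y_i, and N_k(x) the indices of the k training records nearest to x under the chosen distance. The abstract names only the Euclidean distance; the Manhattan, Minkowski and Chebyshev measures are listed as typical candidates, not as a confirmed account of the measures the study compared.

```latex
\begin{aligned}
d_{E}(\mathbf{x}, \mathbf{x}_i) &= \sqrt{\sum_{j=1}^{n} (x_j - x_{ij})^2} \\
d_{M}(\mathbf{x}, \mathbf{x}_i) &= \sum_{j=1}^{n} \lvert x_j - x_{ij} \rvert \\
d_{p}(\mathbf{x}, \mathbf{x}_i) &= \Big( \sum_{j=1}^{n} \lvert x_j - x_{ij} \rvert^{p} \Big)^{1/p}, \quad p \ge 1 \\
d_{C}(\mathbf{x}, \mathbf{x}_i) &= \max_{1 \le j \le n} \lvert x_j - x_{ij} \rvert \\
\hat{y}(\mathbf{x}) &= \arg\max_{c} \sum_{i \in N_k(\mathbf{x})} \mathbb{1}[\, y_i = c \,]
\end{aligned}
```

The Minkowski distance reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2, and approaches the Chebyshev distance as p grows, so these measures differ only in how strongly they weight the largest per-feature differences.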

The KNN performance for different k values using the Euclidean distance measure
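The joint search over k and distance measure described in the abstract can be sketched in plain Python as below. This is a minimal illustration, not the authors' code: the helper names (`knn_predict`, `grid_search`, `make_toy_data`), the three metrics, and the synthetic two-class data are all assumptions, and the PIMA dataset itself is not loaded. For brevity each (k, metric) pair is scored on a single hold-out split; choosing parameters on the same split used for reporting gives an optimistic estimate, so a cross-validated search is the safer choice in practice.

```python
import math
import random
from collections import Counter

# Candidate distance measures (assumption: the abstract names only Euclidean).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

METRICS = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}

def knn_predict(train_X, train_y, query, k, dist):
    """Majority vote among the k training points nearest to `query` under `dist`."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # Counter keeps first-seen order and most_common(1) returns the first maximal
    # entry, so a tied vote goes to the class whose closest member is nearest.
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k, dist):
    hits = sum(knn_predict(train_X, train_y, x, k, dist) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)

def grid_search(train_X, train_y, test_X, test_y, ks, metrics=METRICS):
    """Score every (k, metric) pair jointly and return the best pair and all scores."""
    results = {}
    for name, dist in metrics.items():
        for k in ks:
            results[(k, name)] = accuracy(train_X, train_y, test_X, test_y, k, dist)
    best = max(results, key=results.get)
    return best, results[best], results

def make_toy_data(n_per_class, seed):
    """Hypothetical two-class Gaussian blobs standing in for real patient records."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, (0.0, 0.0)), (1, (6.0, 6.0))):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            y.append(label)
    return X, y

if __name__ == "__main__":
    train_X, train_y = make_toy_data(40, seed=1)
    test_X, test_y = make_toy_data(20, seed=2)
    # Odd k only, so a two-class vote can never tie; the range includes k = 35.
    (k, metric), acc, _ = grid_search(train_X, train_y, test_X, test_y, ks=range(1, 40, 2))
    print(f"best k={k}, metric={metric}, accuracy={acc:.3f}")
```

Restricting the search to odd k rules out vote ties in a two-class problem such as diabetic versus non-diabetic. Because KNN compares raw feature differences, features measured on very different scales are usually normalized before the search; the abstract does not state whether the study did so.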

Published

2024-09-30

How to Cite

Hameed, E. M., & Joshi, H. (2024). Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN Classifier. Journal of Techniques, 6(3), 19–25. https://doi.org/10.51173/jt.v6i3.2587

Issue

Vol. 6 No. 3 (2024)

Section

Engineering (Miscellaneous): Computer Engineering