Impact of Data Normalization on K-Nearest Neighbor Classification Performance: A Case Study on Date Fruit Dataset
DOI: https://doi.org/10.64479/iarci.v1i2.61
Keywords: K-Nearest Neighbor, Data Normalization, Distance-Based Classification, Weighted KNN
Abstract
Data normalization is a crucial preprocessing step for distance-based classification algorithms such as K-Nearest Neighbor (KNN), as differences in feature scales can significantly distort distance calculations and degrade classification accuracy. This study investigates the impact of data normalization on KNN classification performance using the Date Fruit Dataset as a case study. Three preprocessing scenarios are evaluated: raw data without normalization, Min–Max normalization, and Z-score standardization. In addition, standard KNN is compared with distance-weighted KNN to assess the contribution of distance weighting under each preprocessing condition. Experiments are conducted using stratified 10-fold cross-validation, and model performance is evaluated using mean accuracy and its standard deviation. The statistical significance of performance differences is examined with a paired t-test, and a sensitivity analysis is performed to assess the effect of varying the number of nearest neighbors (k). The results show that data normalization leads to a substantial improvement in classification performance over raw data. Z-score standardization achieves the highest and most stable accuracy, followed by Min–Max normalization. Distance-weighted KNN consistently produces slightly higher accuracy than standard KNN; however, the improvement is not statistically significant once the data are normalized. The sensitivity analysis indicates that normalization yields a wider and more stable range of optimal k values. These findings demonstrate that data normalization plays a more dominant role than distance weighting in improving KNN performance. The study provides empirical evidence that proper preprocessing is essential for reliable KNN-based classification and establishes a robust baseline for further enhancements such as feature weighting and metaheuristic optimization.
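As a rough illustration of the evaluation protocol described in the abstract, the sketch below implements the three preprocessing scenarios and both KNN variants with scikit-learn. It is a minimal sketch, not the authors' code: the Date Fruit Dataset is replaced by a synthetic stand-in with the dataset's commonly cited dimensions (898 samples, 34 features, 7 classes), and k is fixed at 5 for brevity.

import numpy as np
from scipy import stats
from sklearn.datasets import make_classification  # synthetic stand-in for the Date Fruit data
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Stand-in data; replace with the actual Date Fruit features X and labels y.
X, y = make_classification(n_samples=898, n_features=34, n_informative=20,
                           n_classes=7, n_clusters_per_class=1, random_state=0)

# Fixed fold assignment so per-fold scores are paired across all scenarios.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

scenarios = {
    "raw":     None,
    "min-max": MinMaxScaler(),    # x' = (x - min) / (max - min)
    "z-score": StandardScaler(),  # x' = (x - mean) / std
}

results = {}
for name, scaler in scenarios.items():
    for weights in ("uniform", "distance"):  # standard vs distance-weighted KNN
        knn = KNeighborsClassifier(n_neighbors=5, weights=weights)
        model = knn if scaler is None else make_pipeline(scaler, knn)
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        results[(name, weights)] = scores
        print(f"{name:8s} {weights:8s} acc={scores.mean():.4f} +/- {scores.std():.4f}")

# Paired t-test on per-fold accuracies: standard vs weighted KNN under Z-score.
t, p = stats.ttest_rel(results[("z-score", "uniform")], results[("z-score", "distance")])
print(f"paired t-test (z-score, uniform vs distance): t={t:.3f}, p={p:.3f}")

# A sensitivity analysis over k would repeat the loop for, e.g., k in range(1, 32, 2).

Placing the scaler inside the pipeline matters: it is fitted on each training fold only and applied to the corresponding test fold, avoiding data leakage across cross-validation splits.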