COMPARATIVE ANALYSIS AND EXPERIMENTAL EVALUATION OF ALGORITHMS FOR RECOVERING MISSING (NAN) VALUES IN INFORMATION SYSTEM DATA
Keywords:
NaN values, missing data, imputation algorithms, KNN imputation, MICE method, data analysis, machine learning, information systems.
Abstract
This article investigates the problem of identifying and recovering missing (NaN) values in
information system data and examines how such values affect analytical results. Starting from a complete
dataset, missing values were introduced artificially at several proportions and then
restored using statistical and machine-learning-based imputation methods. The effectiveness of each algorithm
was evaluated with error metrics computed by comparing the imputed values against the original ground-truth
values. The results show how the efficiency of each method depends on the structure of
the data and provide a methodological basis for selecting suitable approaches in the intelligent analysis of
information system data. The findings help improve data quality and increase
the reliability of analytical processes.
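The evaluation protocol described in the abstract (remove known values, impute them, score against the ground truth) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the synthetic three-column dataset, the missing-value proportions, the completely-at-random masking, the use of column-mean imputation as the statistical baseline, and RMSE as the error metric. All function names are hypothetical.

```python
import math
import random


def make_complete_data(n_rows=200, seed=0):
    """Build a small synthetic complete dataset (hypothetical stand-in for real data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x = rng.gauss(0.0, 1.0)
        # Column 2 is correlated with column 1; column 3 is independent.
        rows.append([x, 2.0 * x + rng.gauss(0.0, 0.5), rng.gauss(5.0, 2.0)])
    return rows


def mask_at_random(data, proportion, seed=0):
    """Return a copy with about `proportion` of cells set to NaN, plus the masked positions."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(data)) for j in range(len(data[0]))]
    masked = rng.sample(cells, int(round(proportion * len(cells))))
    corrupted = [row[:] for row in data]
    for i, j in masked:
        corrupted[i][j] = float("nan")
    return corrupted, masked


def impute_column_mean(data):
    """Replace each NaN with the mean of the observed values in its column.

    Assumes every column keeps at least one observed value.
    """
    means = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if not math.isnan(row[j])]
        means.append(sum(observed) / len(observed))
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(row)]
            for row in data]


def rmse_on_masked(truth, imputed, masked):
    """RMSE computed only over the cells that were artificially removed."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in masked]
    return math.sqrt(sum(squared) / len(squared))


def evaluate(proportions=(0.05, 0.10, 0.20, 0.30)):
    """Run the mask, impute, score loop once per missing-value proportion."""
    truth = make_complete_data()
    results = {}
    for p in proportions:
        corrupted, masked = mask_at_random(truth, p, seed=1)
        imputed = impute_column_mean(corrupted)
        results[p] = rmse_on_masked(truth, imputed, masked)
    return results


if __name__ == "__main__":
    for proportion, error in evaluate().items():
        print(f"missing={proportion:.0%}  RMSE={error:.3f}")
```

To compare several algorithms under the same protocol, `impute_column_mean` could be swapped for another imputer while the masking and scoring steps stay unchanged. For example, scikit-learn provides `KNNImputer` and the MICE-inspired `IterativeImputer`; whether the study used these implementations is not stated in the abstract.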