Benchmarking Robust Machine Learning Models Under Data Imperfections in Real-World Data Science Scenarios

Marlindawati and Muhammad, Azhar and Esha, Sabir (2026) Benchmarking Robust Machine Learning Models Under Data Imperfections in Real-World Data Science Scenarios. Journal of Data Science, 2026 (03). pp. 38-59. ISSN 2805-5160

jods2026_03.pdf - Published Version
Available under License Creative Commons Attribution.
Official URL: http://ipublishing.intimal.edu.my/jods.html

Abstract

Machine learning systems deployed in real-world environments frequently encounter data imperfections such as noise, missing values, class imbalance, and distribution shifts. Despite substantial progress in model development, most evaluation protocols rely on clean benchmark datasets, creating a gap between laboratory performance and operational reliability. Existing robustness studies often focus on isolated perturbation types or single model families and lack a unified benchmarking framework. This study proposes a structured and reproducible benchmarking methodology to systematically evaluate model robustness under controlled data degradation scenarios. Multiple classical machine learning algorithms and deep learning models were assessed across diverse benchmark datasets. Controlled perturbations, including feature noise, label corruption, missingness mechanisms, imbalance ratios, and covariate shifts, were introduced at progressive levels. Performance was evaluated using predictive metrics, robustness degradation rate (RDR), and computational efficiency, with statistical validation across repeated experimental runs. Results indicate that ensemble-based methods consistently achieved the strongest robustness, maintaining degradation rates below 10% under moderate noise and imbalance conditions. Deep neural networks demonstrated superior clean-data accuracy but experienced sharper degradation under structured corruption and distribution shifts. Mitigation strategies such as regularization and resampling reduced degradation by 5–12% under moderate perturbations but showed limited effectiveness under extreme conditions. The findings demonstrate that robustness is multidimensional and depends on the alignment between a model's inductive bias and the type of data imperfection. The proposed benchmarking framework provides practical guidance for selecting machine learning models suited to imperfect data environments, advancing reliable and deployment-ready AI systems.
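The abstract's benchmarking protocol (a robustness degradation rate computed from clean vs. perturbed scores, with perturbations injected at progressive levels) can be sketched as follows. Note that the paper does not publish its exact RDR formula or noise protocol; the relative-degradation definition and the standard-deviation-scaled Gaussian feature noise below are assumptions chosen for illustration.

```python
import numpy as np

def robustness_degradation_rate(clean_score, perturbed_score):
    """Relative drop in a predictive metric under perturbation.

    ASSUMPTION: the paper does not state the exact RDR formula;
    this relative-degradation form, (clean - perturbed) / clean,
    is one plausible definition.
    """
    return (clean_score - perturbed_score) / clean_score

def add_feature_noise(X, noise_level, seed=None):
    """Inject zero-mean Gaussian noise scaled by each feature's
    standard deviation (a common perturbation protocol; the paper's
    exact noise model is not specified).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    scale = noise_level * X.std(axis=0)
    return X + rng.normal(0.0, 1.0, size=X.shape) * scale

# Example: sweep progressive perturbation levels, as the study does,
# and record RDR at each level for a fixed (hypothetical) model score.
levels = [0.0, 0.1, 0.2, 0.4]
clean_accuracy = 0.90
# Hypothetical perturbed accuracies observed at each noise level:
perturbed = [0.90, 0.87, 0.83, 0.72]
rdr_curve = [robustness_degradation_rate(clean_accuracy, p) for p in perturbed]
```

Under this definition, an RDR below 0.10 at a given perturbation level corresponds to the "degradation rates below 10%" threshold the abstract uses to characterize robust (ensemble-based) models.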

Item Type: Article
Uncontrolled Keywords: Robust Machine Learning; Data Quality; Benchmarking; Model Evaluation; Real-World Data
Subjects: Q Science > Q Science (General)
Q Science > QA Mathematics > QA75 Electronic computers. Computer science
Q Science > QA Mathematics > QA76 Computer software
Depositing User: Unnamed user with email masilah.mansor@newinti.edu.my
Date Deposited: 26 Feb 2026 05:23
Last Modified: 26 Feb 2026 05:23
URI: http://eprints.intimal.edu.my/id/eprint/2300
