Automated Feature Engineering Using Meta-Learning for Efficient and Generalizable Data Science Pipelines

Helda, Yudhiastuti and Shafiq, Hussain and Irfa, Shabbir (2026) Automated Feature Engineering Using Meta-Learning for Efficient and Generalizable Data Science Pipelines. Journal of Data Science, 2026 (04). pp. 60-79. ISSN 2805-5160

	Text: jods2026_04.pdf (304kB), Published Version, available under License Creative Commons Attribution.
	Text: 854 (42kB), Published Version, available under License Creative Commons Attribution.

Official URL: http://ipublishing.intimal.edu.my/jods.html

Abstract

Feature engineering remains one of the most time-intensive and expertise-dependent stages in machine learning pipelines, often limiting scalability and reproducibility. Despite advances in automated machine learning, existing systems largely emphasize model and hyperparameter optimization while leaving feature construction partially manual and task-specific. This reveals a critical research gap: the absence of a transferable, experience-driven mechanism capable of generalizing feature engineering knowledge across heterogeneous datasets. To address this limitation, this study proposes a meta-learning–based automated feature engineering framework that models transformation selection as a learnable mapping between dataset meta-characteristics and transformation utility. The framework constructs a reusable meta-knowledge layer trained on historical task–transformation–performance relationships and applies ranked transformation strategies to unseen datasets under computational constraints. Experiments conducted on diverse classification and regression datasets demonstrate that the proposed approach achieves up to 4.2% improvement in F1-score and 8.3% reduction in RMSE compared to raw-feature baselines, while maintaining performance comparable to or exceeding manually engineered pipelines. In addition, development time is reduced by up to 55%, and search complexity decreases by approximately 60% through ranking-based pruning. These findings confirm that feature engineering can be formalized as a transferable meta-learning problem, enabling scalable, efficient, and generalizable data science workflows. The study advances the automation of representation construction and supports the integration of intelligent meta-knowledge reuse in next-generation AutoML systems.
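The core idea in the abstract, scoring candidate transformations from dataset meta-characteristics and keeping only the top-ranked ones, can be illustrated with a small sketch. This is not the authors' implementation. The meta-features, the transformation names, the history records, and the distance-weighted scoring rule are all hypothetical choices made for illustration, and the gain values are made-up placeholders rather than results from the paper.

```python
import math

# Hypothetical meta-knowledge layer: one record per
# (dataset meta-features, transformation, observed performance gain).
# Assumed meta-features: (log10 of row count, fraction of numeric columns, mean absolute skew).
# All numbers below are illustrative placeholders, not results from the study.
HISTORY = [
    ((3.0, 1.0, 2.5), "log", 0.040),
    ((3.0, 1.0, 2.5), "standardize", 0.010),
    ((3.0, 1.0, 2.5), "one_hot", 0.000),
    ((4.0, 0.3, 0.2), "log", 0.000),
    ((4.0, 0.3, 0.2), "standardize", 0.005),
    ((4.0, 0.3, 0.2), "one_hot", 0.030),
]


def predict_utility(meta, transform, history=HISTORY):
    """Estimate a transformation's utility on a new dataset as the
    distance-weighted average of its gains on past datasets."""
    weighted_sum = 0.0
    weight_total = 0.0
    for past_meta, name, gain in history:
        if name != transform:
            continue
        # Closer past datasets (in meta-feature space) get larger weights.
        weight = 1.0 / (1.0 + math.dist(meta, past_meta))
        weighted_sum += weight * gain
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 0.0


def rank_transformations(meta, candidates, budget):
    """Rank candidates by predicted utility and keep the top `budget`;
    the rest are pruned without ever being evaluated."""
    ranked = sorted(candidates, key=lambda t: predict_utility(meta, t), reverse=True)
    return ranked[:budget]


# An unseen dataset that is mostly numeric and strongly skewed (assumed values).
new_dataset_meta = (3.1, 0.9, 2.0)
shortlist = rank_transformations(
    new_dataset_meta, ["log", "standardize", "one_hot"], budget=2
)
print(shortlist)
```

In this sketch the `budget` parameter stands in for the computational constraint: only the shortlisted transformations would be applied and validated on the new dataset, which is where a ranking-based approach saves search effort compared with trying every candidate.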

Item Type:	Article
Uncontrolled Keywords:	Automated Machine Learning; Feature Engineering; Meta-Learning; Data Pipelines; AutoML
Subjects:	Q Science > Q Science (General); Q Science > QA Mathematics > QA75 Electronic computers. Computer science; Q Science > QA Mathematics > QA76 Computer software
Depositing User:	Unnamed user with email masilah.mansor@newinti.edu.my
Date Deposited:	26 Feb 2026 07:15
Last Modified:	26 Feb 2026 07:15
URI:	http://eprints.intimal.edu.my/id/eprint/2301
