Federated Learning for Privacy-Preserving Data Science: Performance, Efficiency, and Scalability Analysis

Authors

  • Nirwana, Universitas Bina Darma, Palembang, Indonesia
  • Muhammad Azhar, Hong Kong Shue Yan University, Hong Kong SAR, China
  • Mehwish Usman, University of Agriculture, Faisalabad, Pakistan

DOI:

https://doi.org/10.61453/jods.v20260105

Keywords:

Federated Learning, Privacy-Preserving AI, Distributed Data Science, Scalability, Secure Analytics

Abstract

The rapid growth of distributed and privacy-sensitive data environments has intensified the need for collaborative machine learning approaches that preserve confidentiality without sacrificing performance. Traditional centralized learning requires data aggregation, creating regulatory, ethical, and security risks. Although federated learning (FL) addresses this limitation by enabling decentralized training, existing implementations suffer from performance degradation under non-IID data distributions, unstable convergence, and high communication overhead. Moreover, many studies focus primarily on accuracy comparisons without systematically evaluating scalability and efficiency trade-offs. This study proposes an Adaptive Federated Learning (AFL) framework that integrates divergence-aware aggregation and intelligent client selection to enhance convergence stability and communication efficiency in heterogeneous environments. A comprehensive experimental evaluation was conducted across IID and non-IID data partitions, varying participation rates, and communication constraints. Performance was assessed using predictive accuracy, F1-score, convergence rounds, communication volume, and scalability metrics, with comparisons against centralized learning and standard FedAvg. The results show that, compared to FedAvg under highly non-IID settings, AFL improves accuracy by up to 5.3% and macro F1-score by 6.5%, while reducing convergence rounds by approximately 23% and communication overhead by up to 28%. Statistical analysis confirms that the performance gains are significant (p < 0.01). The findings indicate that adaptive orchestration mechanisms substantially enhance federated robustness without compromising privacy advantages. This research provides a system-level evaluation framework for privacy-preserving distributed learning and offers actionable guidance for deploying scalable federated systems in healthcare, finance, and other data-sensitive domains.
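
The abstract contrasts standard FedAvg with divergence-aware aggregation but does not give the aggregation rule itself. The following is a minimal Python sketch of the general idea only, not the AFL method: FedAvg weights each client by its local sample count, while the hypothetical variant additionally down-weights clients whose parameters lie far from the current global model. The function names, the L2 distance measure, the exponential weighting, and the `temperature` parameter are all assumptions made for illustration.

```python
import math


def fedavg(client_weights, client_sizes):
    """Standard FedAvg: average client parameter vectors,
    weighting each client by its local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def divergence_aware_aggregate(global_weights, client_weights, client_sizes,
                               temperature=1.0):
    """Hypothetical divergence-aware rule (an assumption, not the paper's):
    scale each client's sample-count weight by exp(-d / temperature),
    where d is the L2 distance from the client to the global model."""
    scores = []
    for w, n in zip(client_weights, client_sizes):
        d = math.sqrt(sum((wi - gi) ** 2 for wi, gi in zip(w, global_weights)))
        scores.append(n * math.exp(-d / temperature))
    total = sum(scores)
    dim = len(global_weights)
    return [
        sum(w[i] * s for w, s in zip(client_weights, scores)) / total
        for i in range(dim)
    ]


# Toy round: two similar clients and one strongly divergent client.
global_model = [0.0, 0.0]
clients = [[1.0, 1.0], [1.0, 1.0], [10.0, 10.0]]
sizes = [100, 100, 100]

plain = fedavg(clients, sizes)
adaptive = divergence_aware_aggregate(global_model, clients, sizes)
print(plain, adaptive)
```

In this toy round the plain average is pulled toward the divergent client, while the distance-weighted average stays close to the two agreeing clients. Whether such down-weighting helps or hurts in practice depends on whether divergence reflects noise or legitimately different local data, which is the kind of trade-off the study evaluates.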

Published

2026-02-27

How to Cite

Nirwana, Azhar, M., & Usman, M. (2026). Federated Learning for Privacy-Preserving Data Science: Performance, Efficiency, and Scalability Analysis. Journal of Data Science, 2026(1), 81–101. https://doi.org/10.61453/jods.v20260105