Federated Learning for Privacy-Preserving Data Science: Performance, Efficiency, and Scalability Analysis
DOI:
https://doi.org/10.61453/jods.v20260105
Keywords:
Federated Learning, Privacy-Preserving AI, Distributed Data Science, Scalability, Secure Analytics
Abstract
The rapid growth of distributed and privacy-sensitive data environments has intensified the need for collaborative machine learning approaches that preserve confidentiality without sacrificing performance. Traditional centralized learning requires data aggregation, creating regulatory, ethical, and security risks. Although federated learning (FL) addresses this limitation by enabling decentralized training, existing implementations suffer from performance degradation under non-IID data distributions, unstable convergence, and high communication overhead. Moreover, many studies focus primarily on accuracy comparisons without systematically evaluating scalability and efficiency trade-offs. This study proposes an Adaptive Federated Learning (AFL) framework that integrates divergence-aware aggregation and intelligent client selection to enhance convergence stability and communication efficiency in heterogeneous environments. A comprehensive experimental evaluation was conducted across IID and non-IID data partitions, varying participation rates, and communication constraints. Performance was assessed using predictive accuracy, F1-score, convergence rounds, communication volume, and scalability metrics, with comparisons against centralized learning and standard FedAvg. Results demonstrate that AFL improves accuracy by up to 5.3% and macro F1-score by 6.5% under highly non-IID settings compared to FedAvg, while reducing convergence rounds by approximately 23% and communication overhead by up to 28%. Statistical analysis confirms significant performance gains (p < 0.01). The findings indicate that adaptive orchestration mechanisms substantially enhance federated robustness without compromising privacy advantages. This research provides a system-level evaluation framework for privacy-preserving distributed learning and offers actionable guidance for deploying scalable federated systems in healthcare, finance, and other data-sensitive domains.
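The abstract names divergence-aware aggregation but does not specify how divergence is measured or applied. The following is only an illustrative sketch, not the authors' AFL method. It assumes each client update is a flat list of floats, that divergence is the L2 distance from the sample-weighted mean update, and that a hypothetical parameter `beta` controls how strongly divergent clients are down-weighted. The function names `fedavg` and `divergence_aware_aggregate` are likewise invented for illustration.

```python
import math


def fedavg(updates, sizes):
    """Standard FedAvg: average client vectors weighted by local sample count."""
    total = sum(sizes)
    dim = len(updates[0])
    return [sum(n * u[i] for u, n in zip(updates, sizes)) / total
            for i in range(dim)]


def divergence_aware_aggregate(updates, sizes, beta=1.0):
    """Hypothetical variant (an assumption, not the paper's rule):
    scale each client's FedAvg weight by exp(-beta * d_k), where d_k is the
    L2 distance between that client's update and the FedAvg mean."""
    center = fedavg(updates, sizes)
    dists = [math.dist(u, center) for u in updates]
    raw = [n * math.exp(-beta * d) for n, d in zip(sizes, dists)]
    total = sum(raw)
    weights = [r / total for r in raw]  # normalised so the weights sum to 1
    dim = len(center)
    aggregate = [sum(w * u[i] for w, u in zip(weights, updates))
                 for i in range(dim)]
    return aggregate, weights


if __name__ == "__main__":
    # Two similar clients and one outlier, all with equal sample counts.
    updates = [[1.0, 1.0], [1.2, 0.8], [5.0, -3.0]]
    sizes = [100, 100, 100]
    aggregate, weights = divergence_aware_aggregate(updates, sizes, beta=1.0)
    print("weights:", weights)
    print("aggregate:", aggregate)
```

With `beta=0` the sketch reduces to plain FedAvg. Larger values of `beta` shift the aggregate toward the clients whose updates agree with the majority, which is one plausible way to damp the drift that non-IID partitions introduce.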
License
Copyright (c) 2026 Journal of Data Science

This work is licensed under a Creative Commons Attribution 4.0 International License.