Comparison of Logistic Regression, Random Forest, SVM, KNN Algorithm for Water Quality Classification Based on Contaminant Parameters

Authors

  • Teguh Sutanto, Universitas Islam Kalimantan Muhammad Arsyad Al-Banjari, Indonesia
  • Muhammad Rafli Aditya, Universitas Islam Kalimantan Muhammad Arsyad Al-Banjari, Indonesia
  • Haldi Budiman, Universitas Islam Kalimantan Muhammad Arsyad Al-Banjari, Indonesia
  • M. Rezqy Noor Ridha, Universitas Islam Kalimantan Muhammad Arsyad Al-Banjari, Indonesia
  • Usman Syapotro, Universitas Islam Kalimantan Muhammad Arsyad Al-Banjari, Indonesia
  • Noor Azijah, Universitas Islam Kalimantan Muhammad Arsyad Al-Banjari, Indonesia

Keywords:

Water Quality, Logistic Regression, Random Forest, Support Vector Machine (SVM), K-Nearest Neighbors (KNN)

Abstract

This study compares four machine learning algorithms, Logistic Regression, Random Forest, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), for water quality classification based on contaminant parameters. The purpose of the study is to evaluate and compare the accuracy of these algorithms. The methodology comprises data collection, preprocessing, and algorithm implementation, with evaluation by cross-validation. The results show that a Stacking ensemble with a Gradient Boosting meta-learner produced the highest accuracy, 96.00%, outperforming each of the individual algorithms. Random Forest achieved 95.75% accuracy, followed by SVM with 93.25%, while Logistic Regression and KNN each achieved 90.19%. These findings indicate that Stacking with Gradient Boosting gave the best performance in water quality classification among the models compared, although its margin over Random Forest was small (0.25 percentage points). This research provides new insights into the application of machine learning algorithms to water quality management and offers guidance for algorithm selection.
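The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses a synthetic dataset from make_classification as a stand-in for the contaminant measurements, and picks illustrative hyperparameters and 5-fold cross-validation, none of which are stated in the abstract. The short model names (logreg, rf, svm, knn, stacking) are hypothetical labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in data; the study's water quality dataset
# is not reproduced here.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6, random_state=0
)

# The four base classifiers named in the abstract. Scaling is added for the
# distance- and margin-based models; hyperparameters are illustrative only.
base_models = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "rf": RandomForestClassifier(n_estimators=50, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Stacking: the base models' cross-validated predictions become the input
# features of a Gradient Boosting meta-learner.
stacking = StackingClassifier(
    estimators=list(base_models.items()),
    final_estimator=GradientBoostingClassifier(random_state=0),
    cv=5,
)

models = {**base_models, "stacking": stacking}

# Mean accuracy over 5-fold cross-validation for every model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.4f}")
```

Because the data here are synthetic, the printed accuracies will not match the figures reported in the abstract. The sketch only shows how the five models can be evaluated under one cross-validation protocol so that their accuracies are directly comparable.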

Published

2024-11-26

How to Cite

Sutanto, T., Aditya, M. R., Budiman, H., Noor Ridha, M., Syapotro, U., & Azijah, N. (2024). Comparison of Logistic Regression, Random Forest, SVM, KNN Algorithm for Water Quality Classification Based on Contaminant Parameters. Journal of Data Science, 2024. Retrieved from https://iuojs.intimal.edu.my/index.php/jods/article/view/588