A Comparative Study of K-means Clustering Algorithms Using Euclidean and Manhattan Distance for Climate Data

Section: Research Paper
Published: May 1, 2025
Pages: 47–58

Abstract

The K-means clustering algorithms (Random, K-means++, Canopy, and Farthest First) are unsupervised machine learning techniques that group data points by similarity. This study examined the effects of clustering algorithm and distance metric on the analysis of climate data from meteorological stations in the Kurdistan Region of Iraq (2020–2022). An 8-attribute dataset with 1,095 cases was clustered using the Random, K-means++, Canopy, and Farthest First methods, each evaluated with the Euclidean and Manhattan distance metrics in WEKA, a versatile and accessible open-source tool for machine learning and data mining that features a user-friendly interface, a wide range of algorithms, robust pre-processing and visualization tools, and cross-platform compatibility. Focusing on efficiency and on reducing within-cluster variation, the results showed that with Euclidean distance all algorithms formed two clusters: Canopy required the most iterations and Farthest First the fewest; K-means++ was the fastest and Canopy the slowest; within-cluster sum of squared errors (WCSS) values were similar, with Random and Canopy scoring lowest. With Manhattan distance, all algorithms again formed two clusters: Canopy had the most iterations, Farthest First the fewest iterations and the fastest runtime, while Random was the slowest; WCSS differences were negligible, with Random, Canopy, and Farthest First performing best. Graphs illustrate the differences in cluster distribution, iterations, execution time, and WCSS. Overall, Euclidean distance produced the lowest WCSS, and interactive maps revealed clearer cluster distributions for most attributes than Manhattan distance.
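The comparison described in the abstract can be illustrated outside WEKA with a minimal plain-Python sketch: the only change between the two experimental conditions is the distance function used in the assignment step, while WCSS serves as the quality measure. This is an illustrative implementation of Lloyd's K-means with random initialization, not the paper's actual WEKA workflow (which additionally covers K-means++, Canopy, and Farthest First initialization, and in which SimpleKMeans uses coordinate-wise medians rather than means when Manhattan distance is selected); all function names below are assumptions for the example.

```python
import math
import random

def kmeans(points, k, metric="euclidean", max_iter=100, seed=0):
    """Lloyd's K-means with a pluggable distance metric.

    Assignment uses the chosen metric; centroid update is the
    coordinate-wise mean in both cases (a simplification -- WEKA's
    SimpleKMeans switches to medians under Manhattan distance)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization
    if metric == "euclidean":
        dist = math.dist
    else:  # Manhattan (city-block) distance
        dist = lambda a, b: sum(abs(x - y) for x, y in zip(a, b))
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: recompute centroids; keep old one if cluster is empty.
        new = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    return centroids, clusters

def wcss(centroids, clusters):
    """Within-cluster sum of squared (Euclidean) errors."""
    return sum(math.dist(p, c) ** 2
               for c, cl in zip(centroids, clusters) for p in cl)
```

Running both metrics on the same data and comparing `wcss(...)` mirrors the paper's evaluation: cluster counts, iteration behavior, and WCSS can then be tabulated per metric.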


How to Cite

Ahmed Hamad, B. (2025). A Comparative Study of K-means Clustering Algorithms Using Euclidean and Manhattan Distance for Climate Data. IRAQI JOURNAL OF STATISTICAL SCIENCES, 22(1), 47–58. https://doi.org/10.33899/iqjoss.2025.187754