№2, 2022

AN ADAPTIVE k-MEDIANS CLUSTERING ALGORITHM

Adil M. Bagirov, Sona Taheri, Burak Ordin

A new version of the k-medians algorithm, the adaptive k-medians algorithm, is introduced to solve clustering problems with the similarity measure defined using the L1-norm. The proposed algorithm first computes the median of the whole data set and takes it as the center of the first cluster. To solve the k-clustering problem (k ≥ 2), we use the solution of the (k-1)-clustering problem and formulate an auxiliary clustering problem to generate a set of starting points for the k-th cluster center. The k-medians algorithm is then applied, starting from the previous (k-1) cluster centers together with each point from the set of starting points, to solve the k-clustering problem. The solution with the least value of the clustering function is accepted as the solution to the k-clustering problem. We evaluate the performance of the adaptive k-medians algorithm and compare it with similar clustering algorithms using 8 real-world data sets (pp. 3-15).
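The Python fragment below is a minimal sketch of the incremental scheme described in the abstract, not the authors' implementation: the first center is the coordinate-wise median of the whole data set, and for each subsequent k the auxiliary problem is approximated by evaluating every data point as a candidate new center and keeping the best few as starting points. All names (l1_assign, k_medians, adaptive_k_medians, n_candidates) are illustrative assumptions, and the auxiliary step is a greedy simplification of the procedure developed in the paper.

import numpy as np

def l1_assign(X, centers):
    # Manhattan distance from every point to every center: (n_points, n_centers)
    dists = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1), dists.min(axis=1)

def k_medians(X, centers, max_iter=100):
    # Standard k-medians iterations started from the given centers.
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        labels, _ = l1_assign(X, centers)
        new_centers = centers.copy()
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:
                # the coordinate-wise median minimizes the L1 clustering function
                new_centers[j] = np.median(members, axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    _, dmin = l1_assign(X, centers)
    return centers, dmin.sum()   # centers and clustering function value

def adaptive_k_medians(X, k, n_candidates=5):
    # Build k centers incrementally, adding one center at a time.
    X = np.asarray(X, dtype=float)
    centers = np.median(X, axis=0)[None, :]   # first center: median of the whole set
    for _ in range(2, k + 1):
        _, dmin = l1_assign(X, centers)
        # Simplified auxiliary step: evaluate every data point as a candidate
        # new center and keep the n_candidates best ones as starting points.
        gains = np.array([np.minimum(dmin, np.abs(X - x).sum(axis=1)).sum() for x in X])
        starts = X[np.argsort(gains)[:n_candidates]]
        best_centers, best_value = None, np.inf
        for s in starts:
            cand, value = k_medians(X, np.vstack([centers, s]))
            if value < best_value:
                best_centers, best_value = cand, value
        centers = best_centers
    return centers

For a data matrix X of shape (n_points, n_features), a call such as adaptive_k_medians(X, 5) would return five L1 cluster centers, with the quality of each intermediate solution measured by the sum of L1 distances from the points to their closest centers.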

Keywords: Cluster analysis, k-medians algorithm, Adaptive clustering
DOI: 10.25045/jpit.v13.i2.01
References
  • Aggarwal C, Hinneburg A, Keim D. (2001). On the surprising behavior of distance metrics in high dimensional space. In: ICDT ’01 Proceedings of the 8th International Conference on Database Theory (pp.420–434).
  • Bagirov A, Karmitsa N, Taheri S. (2020). Partitional Clustering via Nonsmooth Optimization. Springer, Cham.
  • Bagirov A, Mohebi E. (2016). An algorithm for clustering using L1-norm based on hyperbolic smoothing technique. Computational Intelligence, 32, 439–457.
  • Bagirov A, Ordin B, Mohebi E. (2020). An incremental nonsmooth optimization algorithm for clustering using L1 and L∞ norms. Journal of Industrial and Management Optimization, 16(6), 2757-2779.
  • Bagirov A, Taheri S. (2017). A DC optimization algorithm for clustering problems with L1-norm. Iranian Journal of Operations Research, 8(2), 2–24.
  • Bagirov A, Yearwood J. (2006). A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems. European Journal of Operational Research, 170(2), 578–596.
  • Bagirov A. (2008). Modified global k-means algorithm for minimum sum-of-squares clustering problems. Pattern Recognition, 41(10), 3192–3199.
  • Bai L, Liang J, Sui C, Dang C. (2013). Fast global k-means clustering based on local geometrical information. Information Sciences, 245, 168–180.
  • Ben-David S. (2020). Computational feasibility of clustering under clusterability assumptions. https://arxiv.org/abs/1501.00437
  • Burkardt J. (2021). https://people.sc.fsu.edu/~jburkardt/
  • Carmichael J, Sneath P. (1969). Taxometric maps. Systematic Zoology, 18, 402–415.
  • Castellanos A, Cigarrán J, García-Serrano A. (2017). Formal concept analysis for topic detection: A clustering quality experimental analysis. Information Systems, 66, 24-42.
  • Dai Q, Xiong Z, Xie J, Wang X, Zhang Y, Shang J. (2019). A novel clustering algorithm based on the natural reverse nearest neighbor structure. Information Systems, 84, 1-16.
  • Davies D, Bouldin D. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(2), 224-227.
  • Dhillon I.S., Fan J., Guan Y. (2001). Efficient clustering of very large document collections. In: Grossman R.L., Kamath C., Kegelmeyer P., Kumar V., Namburu R.R. (eds) Data Mining for Scientific and Engineering Applications. Massive Computing, vol. 2, pp. 357-381.
  • Dua D, Graff C. (2019). UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences, http://archive.ics.uci.edu/ml
  • Friggstad Z, Khodamoradi K, Rezapour M, Salavatipour M. (2019). Approximation schemes for clustering with outliers. ACM Transactions on Algorithms, 15(2), 1-26.
  • Hanilci C, Ertas F. (2011). Comparison of the impact of some Minkowski metrics on VQ/GMM based speaker recognition. Computers and Electrical Engineering, 37, 41-56.
  • Jain A. (2010). Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8), 651-666.
  • Jajuga K. (1987). A clustering method based on the L1-norm. Computational Statistics & Data Analysis, 5(4), 357-371.
  • Karmitsa N, Bagirov A, Taheri S. (2017). New diagonal bundle method for clustering problems in large data sets. European Journal of Operational Research, 263(2), 367-379.
  • Karmitsa N, Bagirov A, Taheri S. (2018). Clustering in large data sets with the limited memory bundle method. Pattern Recognition, 83, 245–259.
  • Khachay M, Khachay D. (2019). Attainable accuracy guarantee for the k-medians clustering in [0,1]. Optimization Letters, 13, 1837-1853.
  • Kumar A, Sabharwal Y, Sen S. (2010). Linear-time approximation schemes for clustering problems in any dimensions. Journal of the ACM, 57(2), Article 5.
  • Lai J, Huang TJ. (2010). Fast global k-means clustering using cluster membership and inequality. Pattern Recognition, 43(5), 1954-1963.
  • Likas A, Vlassis N, Verbeek J. (2003). The global k-means clustering algorithm. Pattern Recognition, 36(2), 451-461.
  • Ordin B, Bagirov A. (2015). A heuristic algorithm for solving the minimum sum-of-squares clustering problems. Journal of Global Optimization, 61(2), 341-361.
  • Reinelt G. (1991). TSPLIB - A Traveling Salesman Problem Library. ORSA Journal on Computing, 3, 319-350.
  • Rousseeuw P. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65.
  • Sabo K, Scitovski R, Vazler I. (2013). One-dimensional center-based L1-clustering method. Optimization Letters, 7(1), 5-22.
  • de Souza R, de Carvalho F. (2004). Clustering of interval data based on city-block distances. Pattern Recognition Letters, 25, 353-365.
  • Späth H. (1976). Algorithm 30: L1 cluster analysis. Computing, 16(4), 379-387.
  • Xavier A, Xavier V. (2011). Solving the minimum sum-of-squares clustering problem by hyperbolic smoothing and partition into boundary and gravitational regions. Pattern Recognition, 44(1), 70-77.
  • Zhang J, Peng L, Zhao X, Kuruoglu E. (2012). Robust data clustering by learning multimetric Lq-norm distances. Expert Systems with Applications, 39, 335-349.