AN ADAPTIVE k-MEDIANS CLUSTERING ALGORITHM

Adil M. Bagirov, Sona Taheri, Burak Ordin

doi:http://doi.org/10.25045/jpit.v13.i2.01

About the Journal Editorial Board For authors Publication ethics Archive Abstracting & Indexing Contact us

№2, 2022

AN ADAPTIVE k-MEDIANS CLUSTERING ALGORITHM

Adil M. Bagirov, Sona Taheri, Burak Ordin

a.bagirov@federation.edu.au; sona.taheri@rmit.edu.au; burak.ordin@ege.edu.tr

A new version of the k-medians algorithm, the adaptive k-medians algorithm, is introduced to solve clustering problems with the similarity measure defined using the L₁-norm. The proposed algorithm first calculates the center of the whole data set as its median. To solve the k-clustering problem (k-1), we formulate the auxiliary clustering problem to generate a set of starting points for the k-th cluster center. Then, the k-medians algorithm is applied starting from the previous (k-1) cluster centers and each point from the set of starting points to solve the k-clustering problem. A solution with the least value of the clustering function is accepted as the solution to the k-clustering problem. We evaluate the performance of the adaptive k-medians algorithm and compare it with other concurrent clustering algorithms using 8 real-world data sets (pp.3-15).

Keywords: Cluster analysis, k-medians algorithm, Adaptive clustering

DOI:

http://doi.org/10.25045/jpit.v13.i2.01

View article(2060)

References

Aggarwal C, Hinneburg A, Keim D. (2001). On the surprising behavior of distance metrics in high dimensional space. In: ICDT ’01 Proceedings of the 8th International Conference on Database Theory (pp.420–434).
Bagirov A, Karmitsa N, Taheri S. (2020). Partitional Clustering via Nonsmooth Optimization. Springer, Cham.
Bagirov A, Mohebi E. (2016). An algorithm for clustering using L1-norm based on hyperbolic smoothing technique. Computational Intelligence, 32, 439–57.
Bagirov A, Ordin B, Mohebi E. (2020). An incremental nonsmooth optimization algorithm for clustering using L1 and l∞ norms. Journal of Industrial and Management Optimization, 16(6), 2757-2779.
Bagirov A, Taheri S. (2017). A DC optimization algorithm for clustering problems with L1-norm. Iranian Journal of Operations Research, 8(2), 2–24.
Bagirov A, Yearwood J. (2006). A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems. European Journal of Operational Research,170(2), 578–596.
Bagirov A. (2008). Modified global k-means algorithm for minimum sum-of-squares clustering problems. Pattern Recognition 41(10), 3192–3199.
Bai L, Liang J, Sui C, , Dang C. (2013). Fast global k-means clustering based on local geometrical information. Information Sciences, 245, 168–80.
Ben-David S. (2020). Computational feasibility of clustering under clusterability assumptions. 2020. https://arxiv.org/abs/1501.00437
Burkardt J. (2021) https://people.sc.fsu.edu/~jburkardt/
Carmichael J, Sneath P. (1969). Taxometric maps. Systematic Zoology, 18, 402–415.
Castellanos A, Cigarran J, Garcıa-Serrano A. (2017). Formal concept analysis for topic detection: A clustering quality experimental analysis. Information Systems, 66, 24-42.
Dai Q, Xiong Z, Xie J, Wang X, Zhang Y, Shang J. (2019). A novel clustering algorithm based on the natural reverse nearest neighbor structure. Information Systems, 84, 1-16.
Davies D, Bouldin D. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1(2), 224-227.
Dhillon, I.S., Fan, J., Guan, Y. (2001). Efficient clustering of very large document collections. In: Grossman, R.L., Kamath, C., Kegelmeyer, P., Kumar, V., Namburu, R.R. (eds) Data Mining for Scientific and Engineering Applications. Massive Computing, vol.2., pp.357-381.
Dua D, Graff C. (2019). UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences, http: //archive.ics.uci.edu/ml
Friggstad Z, Khodamoradi K, Rezapour M, Salavatipour M. (2019). Approximation schemes for clustering with outliers. ACM Transactions on Algorithms,15(2), 1-26.
Hanilci C, Ertas F. (2011). Comparison of the impact of some Minkowski metrics on vq/gmm based speaker recognition. Computers and Electrical Engineering, 37, 41-56.
Jain A. (2010). Data clustering: 50 years beyond k-means. Pattern Recognition Letters, 31(8), 651-666.
Jajuga K. (1987). A clustering method based on the L1-norm. Computational Statistics & Data Analysis, 5(4), 357-371.
Karmitsa N, Bagirov A, Taheri S. (2017). New diagonal bundle method for clustering problems in large data sets. European Journal of Operational Research, 263(2), 367-379.
Karmitsa N, Bagirov A, Taheri S. (2018). Clustering in large data sets with the limited memory bundle method. Pattern Recognition, 83, 245–259.
Khachay M, Khachay D. (2019). Attainable accuracy guarantee for the k-medians clustering in [0,1]. Optimization Letters, 13, 1837-1853.
Kumar A, Sabharwal Y, Sen S. (2010). Linear-time approximation schemes for clustering problems in any dimensions. Journal of the ACM, 57(2), Article 5.
Lai J, Huang TJ. (2010). Fast global k-means clustering using cluster membership and inequality. Pattern Recognition, 43(5), 1954-1963.
Likas A, Vlassis N, Verbeek J. (2003). The global k-means clustering algorithm. Pattern Recognition, 36(2), 451-461.
Ordin B, Bagirov A. (2015). A heuristic algorithm for solving the minimum sum-ofsquares clustering problems. Journal of Global Optimization, 61(2), 341-361.
Reinelt G. (1991). TSP-LIB-A Traveling Salesman Library. ORSA Journal on Computing, 3, 319-350.
Rousseeuw P. (1987). Silhouettes: Graphical aid to interpret and validate of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65.
Sabo K, Scitovski R, Vazler I. (2013). One-dimensional center-based L1-clustering method. Optimization Letters, 7(1), 5-22.
de Souza R, de Carvalho F. (2004). Clustering of interval data based on city-block distances. Pattern Recognition Letters, 25, 353-365.
Spath H. (1976). Algorithm 30: L1 cluster analysis. Computing, 16(4), 379-387.
Xavier A, Xavier V. (2011). Solving the minimum sum-of-squares clustering problem by hyperbolic smoothing and partition into boundary and gravitational regions. Pattern Recognition, 44(1), 70-77.
Zhang J, Peng L, Zhao X, Kuruoglu E. (2012). Robust data clustering by learning multimetric Lq-norm distances. Expert Systems with Applications, 39, 335-349.