№1, 2023
COMPARATIVE ANALYSIS OF K-MEANS AND FUZZY C-MEANS ALGORITHMS ON DEMOGRAPHIC DATA USING THE PCA METHOD
Demography, which covers processes such as birth, death, natural increase, improvement in employment and living standards, and migration, occupies a unique place in the global processes of the modern era. This article therefore applies clustering algorithms, which are regarded as a data mining technology for demographic data. Experiments on demographic data are performed with the k-means and fuzzy c-means clustering algorithms in the Python programming language. The PCA method is used to reduce the dimensionality of the data and obtain more effective results. The Silhouette, Calinski-Harabasz, and Davies-Bouldin indices, together with CPU time, are used to evaluate the quality of the algorithms. The experimental results show that applying the PCA method makes it possible to achieve effective results with the k-means and fuzzy c-means clustering algorithms in demographic data analysis (pp. 15-22).
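The workflow summarized in the abstract (PCA, then k-means and fuzzy c-means, then three validity indices plus CPU time) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the abstract does not name the libraries or settings used, so everything here is written directly in NumPy, and the synthetic three-group data set, the choice of 3 clusters, 2 principal components, and fuzzifier `m = 2.0` are all assumptions made for the example.

```python
import time
import numpy as np

def pca(X, n_components):
    """Project centered data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def _dist(A, B):
    """Euclidean distance from every row of A to every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations; returns hard labels and centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _dist(X, centers).argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return _dist(X, centers).argmin(axis=1), centers

def fuzzy_cmeans(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means; returns argmax labels, centers, memberships U."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # each row of U sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        inv = np.fmax(_dist(X, centers), 1e-12) ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U.argmax(axis=1), centers, U

def silhouette(X, labels):
    """Mean silhouette coefficient (range [-1, 1], higher is better)."""
    D, ks, s = _dist(X, X), np.unique(labels), np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # a singleton cluster scores 0
        a = D[i, own].sum() / (own.sum() - 1)
        b = min(D[i, labels == k].mean() for k in ks if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

def calinski_harabasz(X, labels):
    """Between/within dispersion ratio (higher is better)."""
    n, ks, mean = len(X), np.unique(labels), X.mean(axis=0)
    between = within = 0.0
    for k in ks:
        pts = X[labels == k]
        c = pts.mean(axis=0)
        between += len(pts) * ((c - mean) ** 2).sum()
        within += ((pts - c) ** 2).sum()
    return between * (n - len(ks)) / (within * (len(ks) - 1))

def davies_bouldin(X, labels):
    """Mean worst-case cluster similarity (lower is better)."""
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    scat = np.array([np.linalg.norm(X[labels == k] - cents[i], axis=1).mean()
                     for i, k in enumerate(ks)])
    sep = _dist(cents, cents)
    np.fill_diagonal(sep, np.inf)  # exclude self-comparison
    return ((scat[:, None] + scat[None, :]) / sep).max(axis=1).mean()

# Synthetic stand-in for demographic indicators (hypothetical data).
rng = np.random.default_rng(42)
group_means = np.array([[0] * 6, [8] * 6, [-8, 8, -8, 8, -8, 8]], dtype=float)
X = np.vstack([rng.normal(mu, 1.0, size=(50, 6)) for mu in group_means])

Z = pca(X, n_components=2)
results = {}
for name, algo in [("k-means", kmeans), ("fuzzy c-means", fuzzy_cmeans)]:
    start = time.process_time()
    labels = algo(Z, 3)[0]
    cpu = time.process_time() - start
    results[name] = (silhouette(Z, labels), calinski_harabasz(Z, labels),
                     davies_bouldin(Z, labels), cpu)
    print(name, "sil=%.3f CH=%.1f DB=%.3f cpu=%.4fs" % results[name])
```

Higher Silhouette and Calinski-Harabasz values and lower Davies-Bouldin values indicate more compact, better separated clusters. CPU time is measured with `time.process_time()`, so it covers only the clustering call and not the index computation.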