COMPARATIVE ANALYSIS OF K-MEANS AND FUZZY C-MEANS ALGORITHMS ON DEMOGRAPHIC DATA USING THE PCA METHOD

Eltun Y. Ahmadov

doi:http://doi.org/10.25045/jpit.v14.i1.03


№1, 2023

eltunehmedov95@gmail.com

Demography, which covers processes such as birth, death, natural increase, improvement of employment and living standards of the population, and migration, occupies a unique place in the global processes of the modern era. This article applies clustering algorithms as a data mining technique for demographic data. Experiments on demographic data are carried out in the Python programming language using the k-means and fuzzy c-means clustering algorithms. The PCA method is used to reduce the dimensionality of the data and obtain more effective results. The Silhouette, Calinski-Harabasz, and Davies-Bouldin indices, together with CPU time, are used to evaluate the quality of the algorithms. The experiments show that applying the PCA method makes it possible to obtain effective results with the k-means and fuzzy c-means clustering algorithms in demographic data analysis (pp. 15-22).

Keywords: Data mining, demography, clustering, k-means, fuzzy c-means, PCA
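
The workflow summarized in the abstract (PCA for dimensionality reduction, then k-means and fuzzy c-means, scored with the Silhouette, Calinski-Harabasz, and Davies-Bouldin indices and CPU time) can be illustrated with a minimal Python sketch. This is not the author's code. The synthetic data from make_blobs stands in for the demographic dataset, which is not reproduced here. The standardization step, the two principal components, the three clusters, the fuzzifier m = 2, and the helper names fuzzy_c_means, evaluate, and run_experiment are all assumptions made for illustration. scikit-learn has no fuzzy c-means estimator, so a plain NumPy version of the standard update rules is used.

```python
import time

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)
from sklearn.preprocessing import StandardScaler


def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=150, tol=1e-5, seed=0):
    """Plain NumPy fuzzy c-means (hypothetical helper).

    Returns (centers, U), where U[i, k] is the membership of point i in cluster k.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row is normalised to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Centers are membership-weighted means of the points.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every point to every center, floored to avoid division by zero.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)
        # Membership update: proportional to d^(-2/(m-1)), normalised per point.
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def evaluate(X, labels):
    """Compute the three internal validity indices named in the abstract."""
    return {
        "silhouette": silhouette_score(X, labels),
        "calinski_harabasz": calinski_harabasz_score(X, labels),
        "davies_bouldin": davies_bouldin_score(X, labels),
    }


def run_experiment(n_clusters=3, n_components=2, seed=0):
    # ASSUMPTION: synthetic blobs replace the real demographic table.
    X_raw, _ = make_blobs(n_samples=300, n_features=8,
                          centers=n_clusters, random_state=seed)
    # ASSUMPTION: features are standardised before PCA.
    X_scaled = StandardScaler().fit_transform(X_raw)
    X = PCA(n_components=n_components).fit_transform(X_scaled)

    results = {}

    # k-means: time only the clustering step, then score the hard labels.
    start = time.process_time()
    km_labels = KMeans(n_clusters=n_clusters, n_init=10,
                       random_state=seed).fit_predict(X)
    km_cpu = time.process_time() - start
    results["k-means"] = {**evaluate(X, km_labels), "cpu_seconds": km_cpu}

    # Fuzzy c-means: harden memberships with argmax so the same indices apply.
    start = time.process_time()
    _, U = fuzzy_c_means(X, n_clusters, seed=seed)
    fcm_cpu = time.process_time() - start
    fcm_labels = U.argmax(axis=1)
    results["fuzzy c-means"] = {**evaluate(X, fcm_labels), "cpu_seconds": fcm_cpu}
    return results


if __name__ == "__main__":
    for name, scores in run_experiment().items():
        print(name, {k: round(float(v), 4) for k, v in scores.items()})
```

Higher Silhouette and Calinski-Harabasz values and lower Davies-Bouldin values indicate more compact, better separated clusters. Hardening the fuzzy memberships with argmax is one simple way to score both algorithms with the same indices; it discards the degree-of-membership information that distinguishes fuzzy c-means from k-means.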

