№1, 2023


Eltun Y. Ahmadov

Demography, which covers processes such as birth, death, natural increase, improvement of employment and living standards of the population, and migration, occupies an important place among the global processes of the modern era. This article applies clustering algorithms as a data mining technology for demographic data. Experiments on demographic data are performed with the k-means and fuzzy c-means clustering algorithms in the Python programming language. Principal component analysis (PCA) is used to reduce the dimensionality of the data and obtain more effective results. The Silhouette, Calinski-Harabasz, and Davies-Bouldin indices, together with CPU time, are used to evaluate the quality of the clustering. The experimental results show that applying PCA before k-means and fuzzy c-means clustering can produce effective results in demographic data analysis (pp. 15-22).

Keywords: Data mining, demography, clustering, k-means, fuzzy c-means, PCA
DOI: 10.25045/jpit.v14.i1.03
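
The pipeline summarized in the abstract (PCA, then clustering, then a validity index and CPU time) can be illustrated with a minimal sketch. This is not the author's code: the data are synthetic, the helper names `pca_reduce`, `fuzzy_c_means`, and `davies_bouldin` are hypothetical, the fuzzifier `m = 2` and the choice of three clusters are assumptions, and only fuzzy c-means and the Davies-Bouldin index are shown. Memberships are hardened with `argmax` purely so that a hard-label index can be computed.

```python
import time
import numpy as np


def pca_reduce(X, n_components=2):
    """Project standardized data onto its top principal components via SVD."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns
    _, _, vt = np.linalg.svd(Xs, full_matrices=False)
    return Xs @ vt[:n_components].T


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Minimal fuzzy c-means; returns (centers, memberships U of shape (n, c))."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every point to every center; epsilon avoids division by zero
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def davies_bouldin(X, labels):
    """Davies-Bouldin index for hard labels (lower means better separation)."""
    ids = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ids])
    scatter = np.array([
        np.linalg.norm(X[labels == k] - cents[i], axis=1).mean()
        for i, k in enumerate(ids)
    ])
    sep = np.linalg.norm(cents[:, None, :] - cents[None, :, :], axis=2)
    np.fill_diagonal(sep, np.inf)  # exclude each cluster's comparison with itself
    ratios = (scatter[:, None] + scatter[None, :]) / sep
    return ratios.max(axis=1).mean()


rng = np.random.default_rng(42)
# Synthetic stand-in for a demographic table: 3 groups of 50 rows, 5 indicators each
X = np.vstack([rng.normal(loc=mu, scale=0.5, size=(50, 5)) for mu in (0.0, 3.0, 6.0)])

start = time.process_time()  # CPU time, not wall-clock time
Z = pca_reduce(X, n_components=2)
centers, U = fuzzy_c_means(Z, c=3)
cpu_seconds = time.process_time() - start

labels = U.argmax(axis=1)  # harden fuzzy memberships for the validity index
db = davies_bouldin(Z, labels)
print(f"Davies-Bouldin: {db:.3f}, CPU time: {cpu_seconds:.4f} s")
```

The same reduced matrix `Z` could be passed to a k-means implementation and to Silhouette or Calinski-Harabasz routines to compare the two algorithms on equal footing, which is the kind of comparison the abstract describes.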
