№1, 2021

DETECTION OF FAKE PROFILES IN SOCIAL NETWORKS WITH THE APPLICATION OF CLUSTERING METHODS

Khayala V. Ahmadova

Social networks have millions of active users because they offer opportunities to make new friends, read the news, get useful information, and have fun. Such a large user base also creates conditions for malicious activities, such as manipulating people, spreading various types of challenges, and discrediting people or organizations. For these purposes, fake profiles that operate in groups, such as troll profiles, Sybil accounts, sockpuppets, and bot accounts, are widely used. When classification algorithms are used to detect fake profiles, problems arise, such as the need for labeled data and the time spent classifying many profiles. This article uses the k-means, Gaussian mixture, agglomerative clustering, and spectral clustering algorithms to group fake profiles on social networks. Since clustering algorithms perform worse than classification methods in detecting fake profiles, the article examines on which data the clustering methods give better results. The algorithms are applied to open-access datasets containing profile-based data. In a performance evaluation using metrics such as the adjusted Rand index, homogeneity, and completeness, the agglomerative clustering algorithm shows better results than the other clustering algorithms applied (pp.83–94).
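
The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the article's code: the function names and the toy labels are assumptions made for illustration, and the formulas follow the standard definitions of the adjusted Rand index, homogeneity, and completeness (scikit-learn provides equivalents as adjusted_rand_score, homogeneity_score, and completeness_score).

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(true_labels, cluster_labels):
    """Chance-corrected pair-counting agreement between two labelings."""
    n = len(true_labels)
    if n < 2:
        return 1.0
    joint = Counter(zip(true_labels, cluster_labels))
    same_both = sum(comb(c, 2) for c in joint.values())
    same_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_pred = sum(comb(c, 2) for c in Counter(cluster_labels).values())
    expected = same_true * same_pred / comb(n, 2)
    maximum = (same_true + same_pred) / 2
    if maximum == expected:  # degenerate case: both labelings are trivial
        return 1.0
    return (same_both - expected) / (maximum - expected)


def _entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """Conditional entropy H(target | given) in nats."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum(c / n * log(c / given_counts[g]) for (_, g), c in joint.items())


def homogeneity(true_labels, cluster_labels):
    """1.0 when every cluster contains members of a single class only."""
    h_class = _entropy(true_labels)
    if h_class == 0:
        return 1.0
    return 1.0 - _conditional_entropy(true_labels, cluster_labels) / h_class


def completeness(true_labels, cluster_labels):
    """1.0 when all members of each class fall into the same cluster."""
    return homogeneity(cluster_labels, true_labels)


if __name__ == "__main__":
    # Hypothetical toy data: 1 = fake profile, 0 = genuine profile.
    truth = [1, 1, 1, 0, 0, 0]
    clusters = [0, 0, 1, 1, 1, 1]  # output of some clustering algorithm
    print("ARI:", adjusted_rand_index(truth, clusters))
    print("homogeneity:", homogeneity(truth, clusters))
    print("completeness:", completeness(truth, clusters))
```

All three scores compare cluster assignments with ground-truth labels and do not depend on how the clusters are numbered, so they can be used to compare k-means, Gaussian mixture, agglomerative, and spectral clustering on the same labeled dataset.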

Keywords: Fake profile, clustering, k-means, agglomerative clustering, feature

References
  • Security threats we face while using social media.
    https://novalisit.com/security-threats-we-face-while-using-social-media/
  • İmamverdiyev Y.N., Əhmədova X. Sosial şəbəkələrdə saxta profillərin aşkarlanması metodları haqqında / İnformasiya təhlükəsizliyinin aktual multidissiplinar elmi-praktiki problemləri V respublika konfransının əsərləri, 2019, s.45–48.
  • İmamverdiyev Y.N. Sosial media və təhlükəsizlik problemləri / İnformasiya təhlükəsizliyinin multidissiplinar problemləri üzrə II respublika elmi-praktiki konfransı, 2015, s.189–192.
  • Şıxəliyev R.H. Sosial şəbəkələrdə təhlükəsizlik problemləri // İnformasiya cəmiyyəti problemləri, 2016, №2, s.80–88.
  • Van Der Walt E., Eloff J. Using machine learning to detect fake identities: Bots vs humans // IEEE Access, 2018, vol.6, pp.6540–6549.
  • Wani M.A., Jabin S. A sneak into the devil's colony - fake profiles in online social networks // arXiv preprint arXiv:1705.09929, 2017.
  • Boshmaf Y., Muslukhov I., Beznosov K., & Ripeanu M. Design and analysis of a social botnet // Computer Networks, 2013, vol.57, no.2, pp.556–578.
  • Ferrara E., Varol O., Davis C., Menczer F., & Flammini A. The rise of social bots // Communications of the ACM, 2016, vol. 59, no.7, pp.96–104.
  • Silva S.S., Silva R.M., Pinto R.C., & Salles R.M. Botnets: A survey // Computer Networks, 2013, vol. 57, no. 2, pp.378–403.
  • Al-Qurishi M., Al-Rakhami M., Alamri A., Alrubaian M., Rahman S.M.M., & Hossain M.S. Sybil defense techniques in online social networks: a survey // IEEE Access, 2017, vol.5, pp.1200–1219.
  • Douceur J.R. The sybil attack // International Workshop on Peer-to-Peer Systems, 2002, pp.251–260.
  • Yamak Z., Saunier J., Vercouter L. Detection of multiple identity manipulation in collaborative projects / Proceedings of the 25th International Conference Companion on World Wide Web, 2016, pp.955–960.
  • Im J., Chandrasekharan E., Sargent J., Lighthammer P., et al. Still out there: Modeling and identifying russian troll accounts on twitter / 12th ACM Conference on Web Science, 2020, pp.1–10.
  • Albayati M.B., Altamimi A.M. Identifying Fake Facebook Profiles Using Data Mining Techniques // Journal of ICT Research and Applications, 2019, vol.13, no.2, pp.107–117.
  • Xiao C., Freeman D.M., Hwa T. Detecting clusters of fake accounts in online social networks // Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security, 2015, pp.91–101.
  • Zarei K., Farahbakhsh R., Crespi N. How impersonators exploit Instagram to generate fake engagement? // arXiv preprint arXiv:2002.07173, 2020.
  • Alowibdi J.S., Buy U.A., Philip S.Y., Ghani S., & Mokbel M. Deception detection in Twitter // Social Network Analysis and Mining, 2015, vol.5, no.1, Article number 32, 13 p.
  • Bilge A., Ozdemir Z., Polat H. A novel shilling attack detection method // Procedia Computer Science, 2014, vol.31, pp.165–174.
  • Miller Z., Dickinson B., Deitrick W., Hu W., & Wang A.H. Twitter spammer detection using data stream clustering // Information Sciences, 2014, vol.260, pp.64–73.
  • Hu F., Li Z., Yang C., & Jiang Y. A graph-based approach to detecting tourist movement patterns using social media data // Cartography and Geographic Information Science, 2019, vol.46, no.4, pp.368–382.
  • Luo F., Cao G., Mulligan K., & Li X. Explore spatiotemporal and demographic characteristics of human mobility via Twitter: A case study of Chicago // Applied Geography, 2016, vol.70, pp.11–25.
  • Yamak Z., Saunier J., Vercouter L. SocksCatch: Automatic detection and grouping of sockpuppets in social media // Knowledge-Based Systems, 2018, vol.149, pp.124–142.
  • Stringhini G., Mourlanne P., Jacob G., Egele M., Kruegel C., & Vigna G. EvilCohort: Detecting communities of malicious accounts on online services / 24th USENIX Security Symposium, 2015, pp.563–578.
  • Ganjaliyev F. New method for community detection in social networks extracted from the Web / Proc. of the 4th International Conference “Problems of Cybernetics and Informatics” (PCI), 2012, pp.1–2.
  • Alguliev R.M., Aliguliyev R.M., Ganjaliyev F.S. Partition clustering-based method for detecting community structures in weighted social networks // International Journal of Information Processing and Management, 2013, vol.4, no.2, pp.60–72.
  • Cao Q., Yang X., Yu J., & Palow C. Uncovering large groups of active malicious accounts in online social networks / Proc. of the 2014 ACM SIGSAC Conference on Computer and Communications Security, 2014, pp.477–488.
  • Wani M.A., Jabin S. Mutual clustering coefficient-based suspicious-link detection approach for online social networks // Journal of King Saud University – Computer and Information Sciences, 2018.
  • Wang A.H. Don't follow me: Spam detection in twitter / International Conference on Security and Cryptography (SECRYPT), IEEE, 2010, pp.1–10.
  • Kwak H., Lee C., Park H., & Moon S. What is Twitter, a social network or a news media? / Proceedings of the 19th International Conference on World Wide Web, 2010, pp.591–600.
  • Zheng X., Zeng Z., Chen Z., Yu Y., & Rong C. Detecting spammers on social networks // Neurocomputing, 2015, vol.159, pp.27–34.
  • Yang C., Harkreader R., Gu G. Empirical evaluation and new design for fighting evolving twitter spammers // IEEE Transactions on Information Forensics and Security, 2013, vol.8, no.8, pp.1280–1293.
  • Wei F., Nguyen U. T. Twitter bot detection using bidirectional long short-term memory neural networks and word embeddings / Proc. of the 1st IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA), 2019, pp.101–109.
  • Harshitkgupta / Fake-Profile-Detection-using-ML.
    https://github.com/harshitkgupta/Fake-Profile-Detection-using-ML/tree/master/data
  • Instagram fake spammer genuine account.
    https://www.kaggle.com/free4ever1/instagram-fake-spammer-genuine-accounts
  • Fcakyon / instafake-dataset.
    https://github.com/fcakyon/instafake-dataset/tree/master/data/fake-v1.0
  • Radheysm / Fake-Profile-Detection. 
    https://github.com/radheysm/Fake-Profile-Detection
  • KhayalaAhmadova / Fake_profile_clustering.
    https://github.com/KhayalaAhmadova/Fake_profile_clustering
  • Rank.
    https://orange3.readthedocs.io/projects/orange-visual-programming/en/latest/widgets/data/rank.html
  • Principal Component Analysis and k-means clustering to visualize a high dimensional dataset. https://medium.com/@dmitriy.kavyazin/principal-component-analysis-and-k-means-clustering-to-visualize-a-high-dimensional-dataset-577b2a7a5fe2
  • https://scikit-learn.org/stable/modules/clustering.html#clustering
  • Clustering performance evaluation.
    https://scikit-learn.org/stable/modules/clustering.html#clustering-performance-evaluation