
№1, 2025

EXPERIMENTAL COMPARISON OF WORD EMBEDDING MODELS FOR FAKE NEWS DETECTION

Jalal Mehdiyev

Detecting fake news in text data is a challenging and increasingly relevant task in today's world of generative AI, which allows third parties to produce fake news on a single request. Word embeddings have been shown to be effective representations for solving classification problems. The purpose of this study and its experiments is to identify the strengths and weaknesses of each embedding model for the classification task of detecting falsified data. We consider three categories: traditional models (TF-IDF, LSA), predictive models (Word2Vec, GloVe, FastText), and contextualized models (BERT). The evaluation is carried out on three datasets: LIAR, ISOT, and COVID-19. To ensure a fair comparison, a single SVM classifier is used for all models. The models are compared on the Accuracy, F1-score, and CPU time metrics. The results of the study will help researchers select an efficient algorithm (pp. 24-34).

Keywords: Fake news detection, Word embedding models, Contextualized embedding models, Support Vector Machine
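
The evaluation protocol summarized in the abstract can be illustrated with a minimal sketch: each embedding model only replaces the feature-extraction step, while the classifier (a single SVM) and the metrics (Accuracy, F1-score, CPU time) stay fixed. The snippet below is a hedged illustration, not the paper's actual code; the toy texts, labels, and hyperparameters are assumptions, and only the traditional representations (TF-IDF and LSA via scikit-learn) are shown, with predictive (Word2Vec, GloVe, FastText) or contextualized (BERT) vectors plugging into the same SVM in the same way.

```python
# Minimal sketch of the comparison protocol: build document vectors with an
# embedding model, train one shared SVM, and record Accuracy, F1, and CPU time.
# All data and hyperparameters below are illustrative assumptions.
import time

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-in for a train/test split of LIAR, ISOT, or COVID-19.
train_texts = ["vaccine secretly contains chips", "official report confirms data",
               "celebrity revealed to be an alien", "study published in journal"]
train_labels = [1, 0, 1, 0]          # 1 = fake, 0 = real
test_texts = ["aliens run the government", "agency releases annual report"]
test_labels = [1, 0]

# Two traditional representations; other embedding models would be swapped in
# here while the downstream classifier and metrics remain unchanged.
embedders = {
    "TF-IDF": TfidfVectorizer(),
    "LSA": make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2)),
}

for name, embedder in embedders.items():
    start = time.process_time()                    # CPU time, not wall-clock time
    X_train = embedder.fit_transform(train_texts)  # build document vectors
    X_test = embedder.transform(test_texts)
    clf = LinearSVC().fit(X_train, train_labels)   # single shared SVM classifier
    preds = clf.predict(X_test)
    cpu_seconds = time.process_time() - start
    print(f"{name}: acc={accuracy_score(test_labels, preds):.2f} "
          f"f1={f1_score(test_labels, preds):.2f} cpu={cpu_seconds:.3f}s")
```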