
№1, 2025

EXPERIMENTAL COMPARISON OF WORD EMBEDDING MODELS FOR FAKE NEWS DETECTION

Jalal Mehdiyev

Detecting fake news in text data is a challenging and increasingly relevant task in today's world of generative AI, which allows third parties to produce fake news on a single request. Word embeddings have been shown to be effective representations for solving classification problems. The purpose of this study and its experiments is to identify the strengths and weaknesses of each embedding model for the classification task of detecting falsified data. We consider three categories: traditional models (TF-IDF, LSA), predictive models (Word2Vec, GloVe, FastText), and contextualized models (BERT). The evaluation is carried out on three datasets: LIAR, ISOT, and COVID-19. To ensure a fair comparison, a single SVM classifier is used for all models. The models are compared on the Accuracy, F1-score, and CPU time metrics. The results of the study will help researchers select an efficient algorithm (pp. 24-34).

Keywords: Fake news detection, Word embedding models, Contextualized embedding models, Support Vector Machine
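
The evaluation protocol summarized in the abstract can be illustrated with a minimal sketch: each embedding model only replaces the feature-extraction step, while the classifier (a single SVM) and the metrics (Accuracy, F1-score, CPU time) stay fixed. The snippet below is a hedged illustration, not the paper's actual code; the toy texts, labels, and hyperparameters are assumptions, and only the traditional representations (TF-IDF and LSA via scikit-learn) are shown, with predictive (Word2Vec, GloVe, FastText) or contextualized (BERT) vectors plugging into the same SVM in the same way.

```python
# Minimal sketch of the comparison protocol: build document vectors with an
# embedding model, train one shared SVM, and record Accuracy, F1, and CPU time.
# All data and hyperparameters below are illustrative assumptions.
import time

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-in for a train/test split of LIAR, ISOT, or COVID-19.
train_texts = ["vaccine secretly contains chips", "official report confirms data",
               "celebrity revealed to be an alien", "study published in journal"]
train_labels = [1, 0, 1, 0]          # 1 = fake, 0 = real
test_texts = ["aliens run the government", "agency releases annual report"]
test_labels = [1, 0]

# Two traditional representations; other embedding models would be swapped in
# here while the downstream classifier and metrics remain unchanged.
embedders = {
    "TF-IDF": TfidfVectorizer(),
    "LSA": make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2)),
}

for name, embedder in embedders.items():
    start = time.process_time()                    # CPU time, not wall-clock time
    X_train = embedder.fit_transform(train_texts)  # build document vectors
    X_test = embedder.transform(test_texts)
    clf = LinearSVC().fit(X_train, train_labels)   # single shared SVM classifier
    preds = clf.predict(X_test)
    cpu_seconds = time.process_time() - start
    print(f"{name}: acc={accuracy_score(test_labels, preds):.2f} "
          f"f1={f1_score(test_labels, preds):.2f} cpu={cpu_seconds:.3f}s")
```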