№2, 2022
INVESTIGATION OF CLUSTERING AND CLASSIFICATION METHODS FOR INTELLECTUAL ANALYSIS OF LOG FILES
The spread of information technology into every area of life has been accompanied by a rise in cybercrime. In modern industrial control systems and cyber-physical systems, log files are essential for detecting cyber incidents and for identifying and preventing threats and anomalies. However, the large volume of log files these systems generate greatly complicates the extraction of useful information from them, which highlights the need for intellectual analysis of log files. To this end, this article explores a number of clustering and classification methods and algorithms for the intellectual analysis of log files. Of these, the K-means, CURE, EM, kNN, Naive Bayes and DT (decision tree) algorithms are selected; their working principles are explained, and the application of each algorithm to the KDD CUP 99 data set is studied and compared (pp. 48-60).