AN EXPERIMENTAL STUDY OF MACHINE LEARNING METHODS FOR ANOMALY DETECTION - Problems of Information Technology

AZERBAIJAN NATIONAL ACADEMY OF SCIENCES

No. 1, 2022

AN EXPERIMENTAL STUDY OF MACHINE LEARNING METHODS FOR ANOMALY DETECTION

Makrufa Sh. Hajirahimova, Leyla R. Yusifova

The widespread use of computer networks has recently led to a rise in network threats and attacks. Existing security systems and tools are insufficient for detecting attacks on network infrastructure; moreover, they are considered outdated for storing and analyzing big network-traffic data in terms of volume, velocity, and variety. Detecting anomalies in network-traffic data is one of the most important tasks in ensuring network security and one of the main directions of scientific research. Although a considerable amount of research has been carried out on anomaly detection in network traffic, more accurate detection models still need to be developed. The article analyzes several machine learning algorithms used for anomaly detection. The applicability of machine learning algorithms to detecting anomalies, specifically DoS attacks, in computer network traffic data was investigated experimentally on the WEKA software platform. To improve the performance of classification algorithms, an ensemble model consisting of several classification algorithms is proposed. The effectiveness of the proposed method was analyzed using the NSL-KDD dataset. The proposed approach achieved higher anomaly-detection accuracy than the individual classification algorithms working separately (pp. 9-21).

Keywords: big data, anomaly, DoS attack, IDS, machine learning, classification ensemble
DOI: 10.25045/jpit.v13.i1.02
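
The abstract does not say how the ensemble combines its member classifiers, and the authors ran their experiments in WEKA rather than in code like this. As a rough illustration only, the sketch below assumes simple majority voting. The three rule-based functions are hypothetical stand-ins for trained classifiers: their feature names follow NSL-KDD column names, but the thresholds are invented for the example.

```python
from collections import Counter
from typing import Callable, Sequence

# A classifier here is any function mapping a feature record to a label.
Classifier = Callable[[dict], str]


def majority_vote(classifiers: Sequence[Classifier], record: dict) -> str:
    """Return the label predicted by the largest number of base classifiers."""
    votes = [clf(record) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical stand-ins for trained models (thresholds are made up).
def by_count(r: dict) -> str:
    return "dos" if r["count"] > 100 else "normal"


def by_serror(r: dict) -> str:
    return "dos" if r["serror_rate"] > 0.5 else "normal"


def by_bytes(r: dict) -> str:
    return "dos" if r["src_bytes"] == 0 else "normal"


ensemble = [by_count, by_serror, by_bytes]

# Two classifiers flag this record, so the ensemble labels it "dos"
# even though by_bytes alone would call it "normal".
flood_like = {"count": 250, "serror_rate": 0.9, "src_bytes": 120}
label = majority_vote(ensemble, flood_like)
```

With an odd number of members and two classes, a vote cannot tie. The point of the sketch is that a single member's mistake can be outvoted, which is one common reason an ensemble may be more accurate than its members taken separately.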