№1, 2022
EXPERIMENTAL STUDY OF MACHINE LEARNING METHODS IN ANOMALY DETECTION
Recently, the widespread use of computer networks has led to an increase in network threats and attacks. Existing security systems and devices are insufficient for detecting intruders' attacks on network infrastructure, and they are considered outdated for storing and analyzing network traffic data that is large in size, speed, and diversity. Detecting anomalies in network traffic data is one of the most important issues in providing network security. In this paper, we investigate the possibility of using machine learning algorithms on the WEKA software platform to detect anomalies, specifically DoS attacks, in computer network traffic data. An ensemble model consisting of several unsupervised classification algorithms is proposed to increase the efficiency of the classification algorithms. The effectiveness of the proposed model was studied using the NSL-KDD dataset. The proposed approach showed higher accuracy in detecting anomalies than the individual classification algorithms did separately (pp. 9-19).
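The abstract does not say how the ensemble combines the outputs of its member algorithms. As a minimal sketch only, assuming simple majority voting (one common combination rule, not necessarily the one used in the paper), the idea can be illustrated as follows. The three detectors and their thresholds are hypothetical stand-ins for trained classifiers; the field names `count`, `serror_rate`, and `src_bytes` follow the NSL-KDD feature naming.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

Record = Dict[str, float]
Detector = Callable[[Record], str]  # each detector returns "normal" or "anomaly"


def high_connection_count(record: Record) -> str:
    # Hypothetical rule: unusually many connections to the same host.
    return "anomaly" if record["count"] > 100 else "normal"


def high_syn_error_rate(record: Record) -> str:
    # Hypothetical rule: most connections ended with SYN errors.
    return "anomaly" if record["serror_rate"] > 0.5 else "normal"


def empty_payload(record: Record) -> str:
    # Hypothetical rule: the source sent no data bytes.
    return "anomaly" if record["src_bytes"] == 0 else "normal"


def majority_vote(detectors: Sequence[Detector], record: Record) -> str:
    """Label a record with the class chosen by most detectors.

    Ties are resolved as "anomaly" (an assumption made for this sketch).
    """
    votes = Counter(detector(record) for detector in detectors)
    return "anomaly" if votes["anomaly"] >= votes["normal"] else "normal"


# Assumed ensemble members; real members would be trained models.
DETECTORS = [high_connection_count, high_syn_error_rate, empty_payload]

flood_like = {"count": 250, "serror_rate": 0.9, "src_bytes": 0}
ordinary = {"count": 3, "serror_rate": 0.0, "src_bytes": 512}

print(majority_vote(DETECTORS, flood_like))
print(majority_vote(DETECTORS, ordinary))
```

A single detector can be fooled by one unusual feature value, but the vote flags a record only when most members agree. This is the general motivation for comparing an ensemble against its members taken separately.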