№1, 2022

EXPERIMENTAL STUDY OF MACHINE LEARNING METHODS IN ANOMALY DETECTION

Makrufa Sh. Hajirahimova, Leyla R. Yusifova

Recently, the widespread use of computer networks has led to an increase in network threats and attacks. Existing security systems and devices are insufficient for detecting intruders' attacks on network infrastructure, and they are considered outdated for storing and analyzing network traffic data that is large in terms of size, speed, and diversity. Detecting anomalies in network traffic data is one of the most important issues in ensuring network security. In this paper, we investigate the possibility of using machine learning algorithms on the WEKA software platform to detect anomalies, namely DoS attacks, in computer network traffic data. An ensemble model consisting of several unsupervised classification algorithms is proposed to increase the efficiency of classification. The effectiveness of the proposed model was studied using the NSL-KDD dataset. The proposed approach achieved higher accuracy in detecting anomalies than the individual classification algorithms applied separately (pp.9-19).
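The abstract does not state how the members of the ensemble are combined. One common rule is plurality (majority) voting, which WEKA offers through its `Vote` meta-classifier. The minimal sketch below assumes that rule and operates on label lists already produced by three hypothetical classifiers. The function names, labels, and toy data are illustrative assumptions; they are not taken from the paper or from NSL-KDD.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine label predictions from several classifiers by plurality vote.

    predictions[i][j] is the label that classifier i assigns to sample j.
    On a tie, Counter.most_common returns the label encountered first, i.e.
    the tied label voted for by the earliest-listed classifier.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same samples")
    combined = []
    for j in range(n_samples):
        votes = Counter(p[j] for p in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy labels: "dos" = DoS attack, "normal" = benign traffic.
    y_true = ["normal", "dos", "dos", "normal", "dos"]
    members = {
        "clf_a": ["normal", "dos", "normal", "normal", "normal"],
        "clf_b": ["dos", "dos", "dos", "normal", "normal"],
        "clf_c": ["normal", "normal", "dos", "dos", "dos"],
    }
    for name, preds in members.items():
        print(name, accuracy(y_true, preds))  # each prints 0.6
    ensemble = majority_vote(list(members.values()))
    print("ensemble", accuracy(y_true, ensemble))  # prints 0.8
```

On this toy input each hypothetical classifier is correct on 3 of 5 samples, while the vote is correct on 4 of 5, because the members make their mistakes on different samples. This only illustrates the mechanism and says nothing about the results reported in the paper.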

Keywords: Big data, Anomaly, DoS attacks, IDS, Machine learning, Ensemble classification
References
  • Abdulhammed R., Faezipour M., Abuzneid A., and AbuMallouh A. (2019). Deep and machine learning approaches for anomaly-based intrusion detection of imbalanced network traffic. IEEE Sensors Letters, 3(1), pp. 1-4. https://doi.org/10.1109/LSENS.2018.2879990
  • Agarwal B., Mittal N. (2012). Hybrid Approach for Detection of Anomaly Network Traffic using Data Mining Techniques. Procedia Technology, 6. pp. 996-1003. http://dx.doi.org/10.1016/j.protcy.2012.10.121
  • Aggarwal C. C., Yu P. S. (2005). An effective and efficient algorithm for high-dimensional outlier detection. VLDB J. 14(2), pp. 211–221. https://doi.org/10.1007/s00778-004-0125-5
  • Agrawal S., Agrawal J. (2015). Survey on anomaly detection using data mining techniques. Procedia Computer Science, 60, pp. 708–713. https://doi.org/10.1016/j.procs.2015.08.220
  • Akbar S., Nageswara R. K., Chandulal J. A. (2010). Intrusion detection system methodologies based on data analysis. International Journal of Computer Applications. 5(2), pp. 10–20. http://dx.doi.org/10.5120/892-1266
  • Akoglu L., Tong H., Koutra D. (2015). Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery, 29(3), pp. 626–688. https://doi.org/10.1007/s10618-014-0365-y
  • Alguliyev R., Aliguliyev R., İmamverdiyev Y. N., Sukhostat L. (2018). Weighted Clustering for Anomaly Detection in Big Data. Statistics, Optimization & Information Computing, 6(2), pp. 178–188.  https://doi.org/10.19139/soic.v6i2.404
  • Alguliyev R. M., Aliguliyev R. M., Imamverdiyev Y. N. and Sukhostat L. V. (2017). An anomaly detection based on optimization. International Journal of Intelligent Systems and Applications, 9(12), pp. 87-96. DOI: 10.5815/ijisa.2017.12.08
  • Aliguliyev R. M., Hajirahimova M. Sh. (2019). Classification Ensemble Based Anomaly Detection in Network Traffic. Review of Computer Engineering Research, vol. 6(1), pp. 12-23. DOI:10.18488/journal.76.2019.61.12.23
  • Almeida V. A., Doneda D., & de Souza Abreu J. (2017). Cyberwarfare and Digital Governance. IEEE Internet Computing. 21(2), pp. 68-71. https://doi.org/10.1109/MIC.2017.23
  • Antal B. and Hajdu A. (2014). An ensemble-based system for automatic screening of diabetic retinopathy. Knowl.-Based Syst., vol. 60, pp. 20–27. https://doi.org/10.1016/j.knosys.2013.12.023
  • Ariyaluran R. A. H., et al. (2019). Real-time big data processing for anomaly detection: A Survey. International Journal of Information Management, vol.45, pp. 289-307. https://doi.org/10.1016/j.ijinfomgt.2018.08.006
  • Bellman R. (2013). Dynamic programming. Chelmsford: Courier Corporation. 
  • Breiman, L. (2001). Random forests. Machine Learning. 45(1), pp. 5–32. https://doi.org/10.1023/A:1010933404324
  • Buczak A. L., & Guven E. (2016). A survey of data mining and machine learning methods for cyber security intrusion detection. IEEE Communications Surveys & Tutorials, 18(2), pp. 1153-1176. https://doi.org/10.1109/comst.2015.2494502
  • Camacho J., Macia-Fernandez G., Diaz-Verdejo J., Garcia-Teodoro P. (2014). Tackling the big data 4 Vs for anomaly detection. In: Computer communications workshops (INFOCOM WKSHPS), 2014 IEEE conference on. IEEE. pp. 500–505. https://doi.org/10.1177/1550147720921309
  • Catania C. A., Bromberg F., García Garino C. (2012). An autonomous labeling approach to support vector machines algorithms for network traffic anomaly detection. Expert Systems with Applications, 39(2), pp. 1822–1829. https://doi.org/10.1016/j.eswa.2011.08.068
  • Chandola V., Banerjee A., Kumar V. (2009). Anomaly detection: a survey. ACM Computing Surveys, 41(3), pp. 71-97. https://doi.org/10.1145/1541880.1541882
  • Chaudhary K., Yadav J., & Mallick B. (2012). A review of fraud detection techniques: Credit card. International Journal of Computer Applications, 45(1), pp. 39–44. DOI: 10.5120/6748-8991
  • Dash M., & Ng W. (2010). Outlier detection in transactional data. Intelligent Data Analysis, 14(3), pp. 283–298. DOI: 10.3233/ida-2010-0422
  • Denning D. E. (1987). An Intrusion-Detection Model. IEEE Transactions on Software Engineering, 13(2), pp. 222–232. https://doi.org/10.1109/TSE.1987.232894
  • Farid D. M., Harbi N., and Rahman M. Z. (2010). Combining naive Bayes and decision tree for adaptive intrusion detection. International Journal of Network Security & Its Applications (IJNSA), 2(2), pp. 1-12. DOI: 10.5121/ijnsa.2010.2202
  • Dhanabal L., Shantharajah S. P. (2015). A study on NSL-KDD dataset for intrusion detection system based on classification algorithms. International Journal of Advanced Research in Computer and Communication Engineering, 9(4), pp. 446–452. http://dx.doi.org/10.4236/jcc.2016.44008
  • Dua S., Du X. (2011). Data mining and machine learning in cybersecurity. Boca Raton, FL, CRC Press, 256 p. https://doi.org/10.1201/b10867
  • Erfani S. M., Rajasegarar S., Karunasekera S., Leckie C. (2016). High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognition, 58, pp. 121–134. https://doi.org/10.1016/j.patcog.2016.03.028
  • Farnaaz, N., & Jabbar, M. (2016). Random Forest Modeling for Network Intrusion Detection System. Procedia Computer Science, 89, pp. 213-217. https://doi.org/10.1016/j.procs.2016.06.047
  • Fawcett T. (2006). An Introduction to ROC Analysis. Pattern Recognition Letters, 27(8), pp. 861–874. https://doi.org/10.1016/j.patrec.2005.10.010
  • Fujimaki R., Yairi T., Machida K. (2005). An approach to spacecraft anomaly detection problem using kernel feature space. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining. ACM Press, New York, NY, USA, pp. 401–410. https://doi.org/10.1145/1081870.1081917
  • Garofalo M. (2017). Big data analytics for Flow-based anomaly detection in high-speed networks. PhD Thesis. http://dx.doi.org/10.6093/UNINA/FEDOA/11617
  • Global - VNI Complete Forecast Highlights https://www.cisco.com/c/dam/m/en_us/solutions/service-provider/vni-forecast-highlights/pdf/Global_2021_Forecast_Highlights.pdf
  • Gogoi P., Bhattacharyya D. K., Borah B., and Kalita J. K. (2011). A survey of outlier detection methods in network anomaly identification. The Computer Journal, 54(4), pp. 570-588. https://doi.org/10.1093/comjnl/bxr026
  • Goldstein M., Uchida S. (2016). A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS ONE, 11(4), e0152173. https://doi.org/10.1371/journal.pone.0152173
  • Gu G., Fogla P., Dagon D., Lee W., Škorić B. (2006). Measuring intrusion detection capability: An information-theoretic approach. Proceedings of the ACM Symposium on Information, Computer and Communications Security, pp. 90–101. https://doi.org/10.1145/1128817.1128834
  • Gupta M., Gao J., Aggarwal C. C., Han J. (2014). Outlier detection for temporal data: a survey. IEEE Transactions on Knowledge and Data Engineering, 26(9), pp. 2250–2267. https://doi.org/10.1109/TKDE.2013.184
  • Hajirahimova M. Sh. (2016). Big data technologies and information security challenges. Problems of Information Technology, №1, pp. 41–46.
  • He H., Wang J., Graco W., and Hawkins S. (1997). Application of neural networks to detection of medical fraud. Expert Systems with Applications 13(4), pp. 329–336. https://doi.org/10.1016/S0957-4174(97)00045-6
  • Heydari A. et al. (2015). Detection of review spam: a survey. Expert Systems with Applications, 42(7), pp. 3634–3642. https://doi.org/10.1016/j.eswa.2014.12.029
  • Hodge V., Austin J. (2004). A survey of outlier detection methodologies. Artificial Intelligence Review, 22(2), pp. 85–126. https://doi.org/10.1007/s10462-004-4304-y
  • Holz T. (2008). Security measurements and metrics for networks. Lecture Notes in Computer Science, vol. 4909, pp. 157–165. http://dx.doi.org/10.4236/ijcns.2013.61004
  • Husejinović A. (2020). Credit card fraud detection using naive Bayesian and C4.5 decision tree classifiers. Periodicals of Engineering and Natural Sciences 8(1), pp. 1-5. http://pen.ius.edu.ba
  • Xuan S., Liu G., Li Z., Zheng L., Wang S., & Jiang C. (2018). Random forest for credit card fraud detection. IEEE 15th International Conference on Networking, Sensing and Control (ICNSC). https://doi.org/10.1109/ICNSC.2018.8361343
  • Imamverdiyev Y., Abdullayeva F. (2018). Deep Learning Method for Denial of Service Attack Detection Based on Restricted Boltzmann Machine. Big Data, 6(2), pp. 159-169. https://doi.org/10.1089/big.2018.0023
  • Huang J., Ling C. X. (2005). Using AUC and Accuracy in Evaluating Learning Algorithms. IEEE Transactions on Knowledge and Data Engineering, 17(3), pp. 299-310. https://doi.org/10.1109/TKDE.2005.50
  • KDD data set, 1999. http://kdd.ics.uci.edu/databases/kddcup99
  • Kim G., Lee S., and Kim S. (2014). A novel hybrid intrusion detection method integrating anomaly detection with misuse detection. Expert Syst. Appl., 41(4), pp. 1690-1700. https://doi.org/10.1016/j.eswa.2013.08.066
  • Kumari R., Sheetanshu, Singh M. K., Jha R. & Singh N. K. (2016). Anomaly detection in network traffic using K-mean clustering. 3rd International Conference on Recent Advances in Information Technology (RAIT). https://doi.org/10.1109/RAIT.2016.7507933
  • Lee W., Stolfo S. J., Mok K. W. (2000). Adaptive intrusion detection: A data mining approach. Artificial Intelligence Review, 14(6), pp. 533-567. https://doi.org/10.1023/A:1006624031083
  • Marnerides A. K., Spachos P., Chatzimisios P., and Mauthe A. U. (2015). Malware detection in the cloud under Ensemble Empirical Mode Decomposition. In 2015 Int. Conf. Comput. Netw. Commun. IEEE., pp. 82–88. https://doi.org/10.4018/IJESMA.2018070104
  • McHugh J. (2000). Testing Intrusion detection systems: a critique of the 1998 and 1999 DARPA intrusion detection system evaluations as performed by Lincoln Laboratory. ACM Transactions on Information and System Security, 3(4), pp. 262–294. https://doi.org/10.1145/382912.382923
  • Mukherjee B., Heberlein L. T., & Levitt K. (1994). Network intrusion detection. IEEE Network, 8, pp. 26–41. https://doi.org/10.1007/978-0-387-33112-6_8
  • Münz G., Li S., Carle G. (2007). Traffic anomaly detection using k-means clustering. In: GI/ITG Workshop MMBnet. pp. 13-14. CiteSeerX: 10.1.1.323.6870
  • Nassif A. B. et al. (2021). Machine Learning for Anomaly Detection: A Systematic Review. IEEE Access, vol. 9, pp. 78658–78700. https://doi.org/10.1109/access.2021.3083060
  • NSL-KDD data set for network-based intrusion detection systems [Electronic resource]. 2017. Access mode: http://nsl.cs.unb.ca/NSL-KDD/ 
  • Patcha A., Park J. M. (2007). An overview of anomaly detection techniques: existing solutions and latest technological trends. Computer Networks, 51(12), pp. 3448–3470. https://doi.org/10.1016/j.comnet.2007.02.001
  • Phua C., Lee V., Smith-Miles K., & Gayler R. (2010). A comprehensive survey of data mining-based fraud detection research. Computing Research Repository (CoRR). https://arxiv.org/ct?url=https%3A%2F%2Fdx.doi.org%2F10.1016%2Fj.chb.2012.01.002&v=67e7929e
  • Raguseo E. (2018). Big data technologies: An empirical investigation on their adoption, benefits and risks for companies. International Journal of Information Management, 38(1), pp. 187-195. https://doi.org/10.1016/j.ijinfomgt.2017.07.008 
  • Rehman M. H., Liew C. S., Abbas A., Jayaraman P. P., Wah T. Y., & Khan S. U. (2016). Big data reduction methods: a survey. Data Science and Engineering, 1(4), pp. 265-284. https://doi.org/10.1007/s41019-016-0022-0
  • Samuel A. L. (1959). Some studies in Machine Learning using the game of checkers. IBM Journal of Research and Development, 3(3), pp. 210–229. https://doi.org/10.1147/rd.33.0210
  • Saneja B., Rani R. (2017). An efficient approach for outlier detection in big sensor data of health care. International Journal of Communication Systems, 30(17), pp. 1-10. https://doi.org/10.1002/dac.3352
  • Schlegl T., Seeböck P., Waldstein S. M., Schmidt-Erfurth U., and Langs G. (2017). Unsupervised Anomaly Detection With Generative Adversarial Networks to Guide Marker Discovery. International Conference on Information Processing in Medical Imaging, pp. 146-157. https://doi.org/10.1007/978-3-319-59050-9_12
  • Shon T., Moon J. (2007). A hybrid machine learning approach to network anomaly detection. Information Sciences, 177(18), pp. 3799-3821. https://doi.org/10.1016/j.ins.2007.03.025
  • Thudumu S., Branch P., Jin J., Singh J. (2020). A comprehensive survey of anomaly detection techniques for high dimensional big data. Journal of Big Data, 7(42), pp. 1-30. https://doi.org/10.1186/s40537-020-00320-x
  • Tsai C. F., Hsu Y. F., Lin C. Y., Lin W. Y. (2009). Intrusion detection by machine learning: A review. Expert Syst. Appl., 36(10), pp. 11994-12000. https://doi.org/10.1016/j.eswa.2009.05.029
  • Varian I. (2020). IMRT (Intensity Modulated Radiation Therapy). 26 June 2020. https://patient.varian.com/en/treatments/radiation-therapy/treatment-techniques
  • Wang C., Zhao Z., Gong L., Zhu L., Liu Z., & Cheng X. (2018). A Distributed Anomaly Detection System for In-Vehicle Network Using HTM. IEEE ACCESS, 6, pp. 9091-9098. https://doi.org/10.3390/s20143934
  • Wang L. & Jones R. (2017). Big data analytics for network intrusion detection: A survey. International Journal of Networks and Communications, 7(1), pp. 24-31 doi: 10.5923/j.ijnc.20170701.03
  • Wauters, M., & Vanhoucke, M. (2017). A Nearest Neighbour extension to project duration forecasting with Artificial Intelligence. European Journal of Operational Research, 259(3), pp. 1097-1111. https://doi.org/10.1016/j.ejor.2016.11.018
  • Wei Y. et al. (2019). MSD-Kmeans: A Novel Algorithm for Efficient Detection of Global and Local Outliers,  pp. 1-12. https://doi.org/10.1145/3459930.3469523
  • Yang T. et al. (2016). Improve the Prediction Accuracy of Naive Bayes Classifier with Association Rule Mining. IEEE 2nd International Conference on Big Data Security on Cloud, IEEE International Conference on High Performance, and Smart Computing, IEEE International Conference on Intelligent Data and Security, pp. 129-133. https://doi.org/10.1109/BigDataSecurity-HPSC-IDS.2016.38
  • Zhang J., Zulkernine M., and Haque A. (2008). Random-Forests-Based Network Intrusion Detection Systems. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 38(5), pp. 649–659.  https://doi.org/10.1109/TSMCC.2008.923876 
  • Zhang M. L., Zhou Z. H. (2005). A k-nearest neighbor based algorithm for multi-label classification. Proceedings of the International Conference on Granular Computing, pp. 718–721. https://doi.org/10.1109/GRC.2005.1547385
  • Zhou Z.-H. (2012). Ensemble Methods: Foundations and Algorithms. CRC Press, 234 p. https://doi.org/10.1201/b12207