№2, 2023

SOFTWARE DEFECT PREDICTION USING THE MACHINE LEARNING METHODS

Tamilla A. Bayramova

Reliability is one of the main indicators of software quality, and defects introduced during development directly affect it. Accurate defect prediction helps software engineers ensure the reliability of software systems and allocate testing resources properly. Building ensemble methods that combine several classification methods is a prominent direction in research on error prediction in software modules. This paper proposes a defect detection method based on ensemble learning. Datasets obtained from the PROMISE software engineering repository and GitHub are used to detect defects. Experiments are conducted using the Weka software. Prediction performance is evaluated using the F-measure and ROC area. The experiments show that the defect detection accuracy of the proposed method is higher than that of the individual machine learning methods (pp. 23-31).

Keywords: Random Forest, Naive Bayes, Bagging, Boosting, Ensemble, Software defect prediction