Software Defect Prediction through Neural Network and Feature Selections



Introduction
The volume of the global software business in 2020 was estimated at $389.86 billion, and it is expected to exceed $428.84 billion by 2028 [1]. Software defects have a huge financial impact on the software industry: software failures caused about $1.7 trillion in losses in 2017 [2]. The Mars Climate Orbiter, Mariner 1, and Ariane 5 are examples of expensive software failures, which cost $193 million, $18.5 million, and $8 million, respectively. However, testing and quality assurance costs can reach up to 23% of the total software cost [3]. Moreover, fixing software faults during development costs 100 times less than fixing the software after deployment [3]. Different terms are used to describe software failures, such as fault, defect, bug, problem, error, and incident. Developers tend to use the term defect when the software is in a really severe, and possibly even dangerous, condition [4]. Indeed, a software defect refers to any fault that occurs during the development of software.
Software defect problems are addressed by inspectors who meet to inspect the software; later, the programmer and moderator make the changes [4]. Inspectors need to spend time, effort, and money to detect software defects. In addition, the accuracy of detecting software defects can be affected by numerous factors, one of which is human error. Hence, human error can make the process of determining software defects more complicated and a high-risk task. Therefore, an additional opinion is needed to help inspectors detect software defects more accurately. Software defect prediction refers to the process of predicting defective code and specifying the area of the defective code using a classifier [5]. For the last few decades, software defect prediction has been the subject of interest for many researchers. Therefore, many automatic software defect prediction models have been proposed, which let inspectors and programmers spend more time and money on enhancing the model than on predicting the software defects. In addition, predictions and classifications made by humans face many challenges, such as human error and low performance, so most of the prediction and classification process is now performed through automated systems. Artificial neural networks (ANNs) are the most common method used by researchers for prediction. ANNs find their way into different real-world applications such as medical diagnosis, including heart disease, Wisconsin breast cancer, thyroid disease, and Pima Indian diabetes [6]. ANNs have also been applied in agriculture, such as plant leaf disease [7], predicting the mass of fruits [8], and olive oil harvesting time [9]; weather forecasting [10]; language recognition [11]; and many other applications.
The whole data set can be used for training and testing the neural network. However, not all the features in the data set have the same importance for predicting the software defect. For instance, in high-dimensionality problems, features with less importance can cause the generalization performance of the ANN to deteriorate [12]. Therefore, feature selection tools can be used to determine the features with high importance and to eliminate the features with less importance [13]. Moreover, different tools can be used to select features, such as discriminant analysis (DA), principal component analysis (PCA), decision trees (DT), and multilayer perceptrons (MLP). Feature selection has also found many applications in solving real-world problems, such as landslide prediction [14,15], medical applications [16,17], the food industry [18,19], face recognition [20,21], and many others [22-24]. The aim of this study can be achieved by answering the following research questions: (1) How does the RBF neural network perform when used to predict software defects for fourteen NASA data sets? (2) How does the RBF neural network perform when used to predict software defects for the fourteen NASA data sets after eliminating nonrelated features? (3) How does the RBF neural network perform compared to previous methods used to predict software defects for the fourteen NASA data sets? (4) How does the RBF neural network perform compared to previous methods used to predict software defects for the fourteen NASA data sets after eliminating nonrelated features?
Section 1 introduces this study. Section 2 summarizes previous work focused on prediction methods that used the same data sets used in this study to predict software defects. Section 3 introduces the data sets and evaluation methods. Section 4 explains the experimental setup, and the results are discussed in Section 5.

Related Work
A framework to predict defective code areas is introduced in [5].
The proposed system consists of two steps: feature selection, using the abstract syntax tree (AST) method, and prediction, using a convolutional neural network.
The system was trained and tested using seven data sets imported from the PROMISE repository, and its performance was verified using three evaluation metrics: precision, recall, and F-measure.
The system was trained and tested twice, first with all features and then with the features selected by AST; it achieved better performance when trained and tested with the AST-selected features.
Software defect prediction using a Bayesian neural network was proposed in [25]. Nine data sets consisting of nine features, obtained from the PROMISE data repository, were used to train the Bayesian neural network. Training was performed in two phases: in the first phase, the data were used without reducing the number of features, while in the second phase, the number of features was reduced to five using two feature selection methods, namely, Relief (ReliefFAttributeEval with the Ranker search method) and the SubsetEval attribute evaluator with the best-first search method. The area under the curve (AUC) was used as the performance evaluation metric. The Bayesian neural network trained on data after feature selection showed better performance than the one trained with all features.
A deep belief network (DBN) was used to predict whether a defect exists in the software. The DBN performance was evaluated using precision, recall, and F1-measure. A benchmark of ten Java data sets obtained from the PROMISE repository was used in [26] to predict software defects. In addition, abstract syntax trees (ASTs) were used as a feature selection tool to select the most important factors that can improve software defect prediction. As expected, the DBN trained on the data after selecting the most important features achieved better results.
Two classifiers, random forest (RF) and logistic regression (LR), together with standard data sets obtained from the PROMISE repository, were used in [27] to predict software defects. The data sets contain ten features that contribute to determining the location of the defect in the code. The model proposed in [27] reduced the number of features using abstract syntax trees before feeding the data into the RF and LR classifiers. The results were evaluated using four performance metrics: precision, recall, F-measure, and AUC. The evaluation demonstrates that the results achieved with feature selection have better accuracy.
Four classifiers, random forest (RF), Naïve Bayes, RPart, and support vector machine (SVM), were used to predict software defects in [28], with the aim of specifying the best classifier among the four. For this purpose, eighteen data sets were used: twelve obtained from the NASA data set, three from open-source data sets, and three from commercial data sets. The confusion matrix was used for performance evaluation. The study concludes that each classifier has its own strengths and weaknesses when classifying certain data, and that the strength of a classifier depends on the data type.
A study from 2018 introduced a model to predict software defects based on feature selection by Dependent Naïve Bayes (FDNB) and a Naïve Bayes neural network. The proposed system was trained and tested using eight NASA data sets, each consisting of thirty-eight features, some of which are believed not to have a strong relation with software defect prediction. Cross-validation with a confusion matrix was used to evaluate the performance of the proposed model. The system achieved better performance when it was trained with the selected features [29].
Software defects can be determined by using some features associated with the software, some of which have a strong relationship with the class of the software (defect or nondefect). The authors of [30] introduced a method to investigate the relation between the features and whether the software is defective. The study used eleven data sets with between twenty-one and thirty-nine features (every data set has a different number of features). The similarity measure (SM) method was used to select the features with the strongest relation to the software defect. The strongest features were tested using the k-nearest neighbor (KNN) model, and the performance of KNN was evaluated using AUC. The results show that KNN achieved better results when it used the selected features rather than all features.
Since software defect prediction has been a subject of interest for the last few decades, a comparison between different software defect prediction models was introduced in [31]. The study used four classifiers from previous works: bagging, support vector machine (SVM), decision tree (DS), and random forest (RF). The performance of each classifier was evaluated using precision, recall, F-score, TPR, and FPR. In this study, feature selection was not considered. The results show that RF had the best performance for most data sets.
A conventional neural network was used to predict the number of software defects; two data sets obtained from the NASA repository were used for the training stage, and two different data sets were used for testing. The performance of the proposed classifier was evaluated using the mean square error, which showed the high performance of this model.
In addition to the previously listed work, different studies have been carried out to improve software defect prediction methodology: [32] introduced a framework for software defect prediction, the authors of [33] proposed a methodology to handle class imbalance in software defect learning, and a study by [34] investigated the performance of cost-sensitive classification in predicting software defects.

Materials and Methods
The four main steps in this study are collecting the data set, processing it, feature selection, and software defect prediction, as shown in Figure 1.

Data Set.
The NASA data set is one of the most common benchmark data sets widely used for software defect prediction analysis [35-39]. In this research, fourteen NASA data sets were used to train and test the proposed method. The data sets were obtained from different resources, including the PROMISE repository http://openscience.us, https://data.mendeley.com/datasets/923xvkk5mm/1, and http://promise.site.uottawa.ca/SERepository/datasets-page.html. The number of attributes, the number of all samples, the number of defective samples, the number of nondefective samples, the number of missing attributes, and the defect rate for every data set used in this study are summarized in Table 1. The defect rate is obtained by dividing the number of defective samples by the total number of samples. For instance, for the JM1 data set, the number of attributes can be seen in column two (22), the number of all samples in column three (10885), the number of defective samples in column four (2106), and the number of nondefective samples in column five, while columns six and seven show the number of missing attributes (0) and the defect rate (19.35), respectively.
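As a minimal illustration of the defect-rate column in Table 1, the following Python sketch (the function name is hypothetical, not from the paper) reproduces the JM1 figure:

```python
def defect_rate(defective, total):
    """Return the defect rate as a percentage of all samples."""
    return 100.0 * defective / total

# JM1 figures from Table 1: 2106 defective samples out of 10885 in total.
rate = defect_rate(2106, 10885)  # approximately 19.35
```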

Data Set Preparation and Analysis.
The data sets in this study have a different range of values for every attribute. Table 2 shows random attribute values; for example, the MC1 data set has values between 0.09 and 256. This variance of values can affect the neural network performance. Therefore, all data sets were normalized before being passed to the radial basis function neural network, i.e., each attribute value in every data set was normalized between 0 and 1. Also, the true (defective software) and false (nondefective) class values were converted to one for true and zero for false.
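A minimal sketch of the per-attribute min-max normalization described above, in plain Python (the helper name is an assumption, not from the paper):

```python
def min_max_normalize(values):
    """Scale a list of attribute values to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant attribute: map every value to 0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Example with the MC1-like value range mentioned in the text (0.09 to 256).
scaled = min_max_normalize([0.09, 12.5, 256.0])
```

Each attribute column of a data set would be passed through this function independently before training.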
Neural network performance relies highly on the data set used in the training stage, i.e., a neural network trained with a data set that represents all classes will achieve high performance [11]. Therefore, dividing the data set for training and testing the neural network has been a subject of interest for many previous studies [40]. Moreover, there is no standard for dividing the data set for training and testing purposes. Some scholars use 80% of the data set for training and the remaining 20% for testing and validation, while others use the well-known k-fold cross-validation method (k is the number of folds). The latter is based on dividing the data set randomly into a specific number of folds (k), each carrying the same number of samples. Training and testing the neural network using k-fold cross-validation is performed by holding one of the folds for testing and using the rest of the folds (k−1) for training. This procedure is repeated k times, so each of the k folds passes through training k−1 times and through testing once [41]. In this study, the RBF neural network was trained and tested using 5:1 cross-validation, where one fold was used for testing and four folds for training.
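The 5-fold procedure described above can be sketched as follows. This is a plain-Python illustration, not the paper's MATLAB implementation, and the function names are hypothetical:

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k roughly equal folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validation_splits(n_samples, k=5):
    """Yield (train, test) index lists: each fold is tested once
    and used for training in the remaining k-1 rounds."""
    folds = k_fold_indices(n_samples, k)
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

splits = list(cross_validation_splits(20, k=5))
```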

Background on Feature Selection.
The size of the data used to train the neural network can affect its performance. Therefore, feature selection is widely used to choose the most important data for training. In this paper, the correlation-based feature selection method, previously used in [42,43], was applied to reduce the amount of data by choosing the most important features.
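As a hedged illustration only: the paper applies WEKA's correlation-based method, while the sketch below approximates the idea with the absolute Pearson correlation between each feature column and the class label, keeping the top 50% of features as the study does. The function names are assumptions:

```python
def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def select_top_half(feature_columns, labels):
    """Rank features by |correlation with the class label| and keep the top 50%,
    returning the kept column indices in ascending order."""
    scores = [(abs(pearson(col, labels)), i)
              for i, col in enumerate(feature_columns)]
    scores.sort(reverse=True)
    keep = max(1, len(feature_columns) // 2)
    return sorted(i for _, i in scores[:keep])
```

For example, given four feature columns where the first two track the label and the last two do not, the selector keeps indices 0 and 1.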

Radial Base Function Neural Network.
For this study, the well-known radial basis function (RBF) neural network was used to predict software defects. The RBF neural network has been used in different applications such as medical applications [44], image processing [45], and agriculture [46]. The RBF network is a feedforward neural network consisting of three layers: an input layer, a hidden layer, and an output layer, as shown in Figure 2. The layers in the RBF neural network are fully connected to each other. Figure 2 shows the RBF neural network, where X_p is the input data set; each input represents one feature in the data set. The weights between the i-th input and hidden unit h of the RBF hidden layer are the input weights U_{i,h}.
Equation (1) shows the input to the RBF hidden layer, where Y_{p,h} is the input to hidden unit h of the RBF (hidden) layer:

Y_{p,h} = Σ_i U_{i,h} X_{p,i}. (1)

Equation (2) shows the computation in the hidden layer, where φ(·) denotes the RBF activation function; RBF networks use different activation functions, the Gaussian being the most common:

φ_h = exp(−‖Y_{p,h} − C_h‖² / (2σ_h²)), (2)

where C_h is the center of the RBF, σ_h is the width of the RBF, and ‖·‖ is the Euclidean norm. The RBF output, O_p, can be calculated as

O_p = w_0 + Σ_h w_h φ_h, (3)

where w_h denotes the weight the output layer receives from hidden unit h, and w_0 denotes the bias weight.
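A minimal sketch of the RBF forward pass described above, with Gaussian hidden units (centers C_h, widths σ_h) feeding a linear output with bias. For simplicity this assumes the hidden units operate directly on the input vector (input weights folded in); the function name is hypothetical:

```python
import math

def rbf_forward(x, centers, widths, w, w0):
    """Forward pass of a Gaussian RBF network.

    phi_h = exp(-||x - C_h||^2 / (2 * sigma_h^2))   # hidden activation
    O     = w0 + sum_h w_h * phi_h                  # linear output
    """
    out = w0
    for c, sigma, wh in zip(centers, widths, w):
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        out += wh * math.exp(-dist_sq / (2.0 * sigma ** 2))
    return out
```

At a center, the corresponding Gaussian activation is exactly 1, so a one-unit network with weight 2.0 and bias 0.5 outputs 2.5 there; far from every center the output decays to the bias.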

Evaluation Methods.
In previous research, many models were used to predict software defects, and different performance indicators were used to evaluate them, such as precision [3], recall [10], and specificity [30]. The performance indicators used in this study include precision, recall, specificity (SPE), and accuracy (ACC). The abbreviations used in this study are as follows:

(i) TP is the number of true positives, i.e., correct predictions of positive (defective) samples
(ii) FP is the number of false positives, i.e., incorrect predictions of positive samples
(iii) TN is the number of true negatives, i.e., correct predictions of negative (nondefective) samples
(iv) FN is the number of false negatives, i.e., incorrect predictions of negative samples

Precision represents the correctness of the classification. It is calculated by dividing the number of samples correctly predicted as defective by the total number of samples predicted as defective:

Precision = TP / (TP + FP).

Recall represents the defect detection rate. It is calculated by dividing the number of samples correctly predicted as defective by the total number of modules that are actually defective:

Recall = TP / (TP + FN).
Accuracy is calculated as the ratio between the total number of correct predictions of defective and nondefective samples and the total number of predictions:

ACC = (TP + TN) / (TP + TN + FP + FN).
F-measure considers recall and precision as equally important:

F-measure = 2 × Precision × Recall / (Precision + Recall).
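The indicators above can be computed from the confusion-matrix counts as in this short sketch (function name hypothetical; specificity TN/(TN+FP) is included since the study lists SPE among its indicators):

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the evaluation indicators used in this study
    from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                      # defect detection rate
    specificity = tn / (tn + fp)                 # SPE
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # ACC
    f_measure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "accuracy": accuracy,
            "f_measure": f_measure}

m = classification_metrics(tp=8, fp=2, tn=85, fn=5)
```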

Experimental Setup
In this research, the radial basis function neural network was used to predict software defects in fourteen data sets. The experiments were performed for every data set in two stages: first, training and testing the RBF using all the features in the data set; second, choosing the best features for every data set using the WEKA software and then training and testing the RBF using the selected features. Moreover, the k-fold cross-validation method was used to divide the data randomly before it was used to train the RBF. Precision, recall, SPE, and accuracy were used to evaluate the results achieved by the RBF, and the RBF performance was compared with the results reported in previous work. The RBF neural network was implemented using MATLAB R2014b toolboxes.

Results and Discussion
In this section, the research questions are answered and the results analyzed. Table 3 carries the answer to the first research question: "How does the RBF neural network perform when it is used to predict the software defect for fourteen NASA data sets?" Table 3 summarizes the performance of the RBF neural network for predicting defects in the fourteen data sets before applying the feature selection technique. It shows the RBF neural network performance results using the prediction evaluation indicators accuracy, F-measure, precision, and recall in the training and testing stages. Recall is the third performance indicator used in this study. As shown in Table 3, 100% is the highest recall achieved by the RBF, for MC1, and 76.82% is the lowest, for MC2. Furthermore, the RBF predicted the software defects in CM1, MW1, PC1, PC2, PC3, and PC4 with recall between 90.09% and 99.98%; for the rest of the data sets, recall ranged between 80.82% and 89.91%.

Testing Stage.
Since neural networks typically achieve better performance in the training stage than in the testing stage, the RBF neural network achieved its highest and lowest testing accuracy, 98.42% and 67.29%, for the PC2 and MC2 data sets, respectively. Besides PC2, the RBF obtained accuracy over 90% for five data sets: CM1, MC1, PC1, PC3, and PC4. The prediction accuracy for JM1, KC1, KC4, and MW1 was between 80.05% and 86.89%, while the RBF prediction performance for the rest of the data sets was lower than 78.25%. The best F-measure performance was 0.99, for PC1, MC1, and PC2. CM1, MW1, and PC3 were predicted with a performance over 0.90, and JM1, KC1, and KC3 were predicted with a performance over 0.80, while the lowest F-measures were obtained for KC2 and MC2. For the recall performance indicator, MC1 with 99.41% and MC2 with 77.13% represent the highest and lowest recall, respectively. Eight data sets were predicted with a performance over 90.0%: CM1, KC1, MC1, MW1, PC1, PC2, PC3, and PC4; the rest of the data sets were predicted with a performance over 80.0%. The last software prediction indicator used in the testing stage was precision. The highest precision obtained was 99.41%, for the MC1 data set, while the lowest was 70.59%, for MC2. Six data sets were predicted with precision over 90.0%: CM1, MC1, PC1, PC2, PC3, and PC4. In addition, five data sets were predicted with precision over 80.0%: JM1, KC1, KC3, KC4, and MW1. KC2 and PC5 achieved 79.82% and 78.01%, respectively.
This section answers the second research question of this study: "How does the RBF neural network perform when it is used to predict the software defect for fourteen NASA data sets after eliminating nonrelated features?" Table 4 shows the number of features selected using the correlation-based method. In this study, only the top 50% of the features ranked by the correlation-based method were chosen. Table 5 shows the RBF neural network performance after applying the correlation-based feature selection method. The following sections discuss the RBF software prediction performance in both the training and testing stages.

Training Stage.
During the training stage, the RBF neural network attained 100% accuracy in predicting the software defects for the MC1, PC1, and PC2 data sets. In addition, the RBF accuracy ranged between 90.71% and 96.49% for CM1, MW1, PC3, and PC4. The rest of the data sets obtained accuracy between 81.89% and 87.83%, except for MC2 with 71.72%. An F-measure of 1.00 was obtained for MC1, PC1, and PC2, while CM1, MW1, and PC3 obtained 0.99, and JM1 and PC4 obtained 0.90. The lowest F-measure for software defect prediction, 0.79, was obtained for MC2; the rest of the data sets ranged between 0.84 and 0.89. The highest recall achieved by the RBF neural network was 100%, for the three data sets MC1, PC1, and PC2. CM1, JM1, MW1, PC3, and PC4 obtained recall over 90.00%, while the RBF predicted the software defects for KC1, KC2, KC3, KC4, and PC5 with recall between 84.67% and 86.89%. For precision, as Table 5 shows, the RBF neural network attained its highest values for PC1 and PC2, directly followed by 99.79% for MC1, 97.96% for PC4, 97.10% for CM1, and 95.39% and 91.28% for the PC3 and MW1 data sets, respectively. The rest of the data sets ranged between 83.67% and 88.87%; the lowest precision was 72.27%, for MC2.

Testing Stage.
The testing results obtained by the RBF neural network using the data sets after feature selection are listed under the testing results part of Table 5.
The highest software defect prediction accuracy was 99.80%, for the PC2 data set, followed with a slight difference by 99.01% for the MC1 data set. PC1 achieved 98.99%, PC4 94.44%, PC3 94.11%, and CM1 93.99%. The lowest accuracy, 70.18%, was achieved for the MC2 data set; the rest of the data sets obtained accuracy between 79.02% and 88.90%. The highest F-measure obtained by the RBF neural network was 0.99, for the three data sets MC1, PC1, and PC2, while the lowest was 0.76, for the MC2 data set. CM1, MW1, and PC3 achieved results between 0.95 and 0.97; the F-measure results for the rest of the data sets were below 0.88. For recall, the RBF predicted the software defects with 100% for MC1 and PC1, 99.98% and 96.23% for PC2 and PC3, respectively, 95.17% for PC4, and 92.09% for MW1; recall for the rest of the data sets was below 89.00%. Precision obtained by the RBF neural network on the testing data varied from 100%, obtained for MC1 and PC2, down to 73.27% as the lowest result, for MC2. Precision over 90.00% was obtained for the data sets CM1, MC1, MW1, PC2, PC3, and PC4, while the precision for the two data sets KC3 and MC2 was 79.30% and 73.27%, respectively. The rest of the data sets obtained results between 80.89% and 85.29%. Finally, it is worth mentioning that this is the only study reporting the precision and recall values.

Comparing RBF Performance with Previous Methods.
The third research question in this study is as follows: how does the RBF neural network perform when it is compared to previous methods used to predict software defects for the same data sets? This section compares the software defect prediction results obtained by the RBF neural network for the fourteen data sets before feature selection with the methods mentioned in previous studies. It is worth mentioning that the reason behind eliminating the two evaluation metrics, precision and recall, is that not many studies report results obtained with them. Furthermore, the results for every data set are compared separately with the results of the previous methods. Table 6 presents the results for the CM1 data set, with eight classification methods listed. It can be seen that the highest F-measure, 0.96, was obtained by the RBF used in this study, followed by RBF with 0.95 and SVM with 0.95 from the study [47]. Since the RBF neural network is a black box, the reason behind the difference between the RBF result in this study and that in [34] cannot be explained. The lowest result, 0.78, was obtained by the SVM proposed in [18]. Meanwhile, the best accuracy (93.99%) was obtained by this study, followed by SVM with 91.97% and RBF with 90.33%; the lowest accuracy, 75.00%, was obtained by SVM. Table 7 presents the results for the JM1 data set. The F-measure obtained by the RBF in this study was 0.87, while the highest F-measure, 0.90, was obtained by MLP, SVM, and RBF. Naive Bayes, RF, DS, and SVM obtained 0.89, 0.76, 0.71, and 0.71, respectively. The highest accuracy, 89.97%, was achieved by MLP, followed by 84.87% achieved by the RBF proposed in this study. The lowest accuracy, 69.00%, was obtained by the SVM proposed in [18]. SVM [34] and RBF [34] achieved 81.43% and 81.73%, respectively, and the rest of the methods achieved results lower than 80.00%.
The performance of the RBF neural network for KC1 is presented in Table 8. The highest F-measure was 0.92, obtained by MLP, SVM, and RBF [34]. Naive Bayes obtained 0.90, followed by 0.83 obtained by the RBF proposed in this study. The highest accuracy, 85.51%, was obtained using MLP, while an accuracy of 83.25% was obtained using the RBF neural network proposed in this study. MLP [3] obtained the lowest accuracy, 79.46%. Naive Bayes, SVM, and RBF [34] obtained accuracies of 82.10%, 84.47%, and 84.99%, respectively. The results for the KC2 data set are given in Table 9, where seven previous method results are compared with the RBF result obtained in this study. Naive Bayes [34], MLP [34], SVM [34], and RBF [34] obtained the same F-measure, 0.90, while RF, DS, and SVM obtained 0.82, 0.78, and 0.80, respectively. Accuracy was the second evaluation metric used to compare the RBF neural network in this study: 84.78% was the highest accuracy, achieved by Naive Bayes, followed by 83.64% and 83.63% obtained by MLP [34] and RBF [34]. The KC2 software defect prediction accuracy was 82.34% and 82.00% for SVM [34] and RF [18], respectively, and MLP [3] predicted the software defects for the KC2 data set with 79.64%. Table 10 presents the results for the KC3 data set. The best F-measure, 0.95, was obtained using SVM [34], followed by MLP with 0.94, Naive Bayes with 0.91, and RBF [34] with 0.90, while the lowest F-measure, 0.71, was obtained by RF. DS, SVM [18], and LR obtained 0.71, 0.78, and 0.78, respectively, and the RBF used in this research obtained 0.85. Table 10 also shows the accuracy results, where 90.04% and 90.02% were the best accuracies, obtained by MLP and SVM [34], respectively. 89.78%, 86.17%, and 82.80% were the accuracies obtained by RBF [34], Naive Bayes, and RF. SVM [18], DS, and the RBF in this research obtained 79.01%, 78.01%, and 78.25%, respectively. LR obtained 77.01%, while the J star method obtained the lowest accuracy among the methods mentioned in Table 10.
The results for the KC4 data set were compared with three previous methods: SVM, RPart, and RF. The best previous F-measure, 0.82, was obtained by RPart, followed by RF with 0.80 and SVM with 0.79, while the RBF proposed in this study achieved the best overall F-measure, 0.86. It can be noticed from Table 11 that accuracy was not reported for the previous methods, while the RBF obtained 83.18%. Table 12 summarizes the results of previous work and the RBF for the MC1 data set. The highest and lowest F-measures were 0.99, for the RBF in this research, and 0.10, for MLP, SVM [34], and RBF [34]. Naive Bayes and RF obtained the same F-measure, 0.97; DS obtained 0.95; and SVM and LR [18] obtained 0.88. On the other hand, J star obtained 100% accuracy, while MLP and RBF [34] obtained the second and third best accuracies with 99.40% and 99.26%, respectively. SVM [34] obtained 99.26%, RF 97%, and DS 94%, while SVM [18] and LR obtained the lowest results, 81%, and J48 obtained 88.60%.
Seven classification methods for the MC2 data set are listed in Table 13. The best F-measure, 0.82, was obtained by Naive Bayes and SVM [34], followed by 0.81 for RBF [34], while MLP and the RBF in this research obtained 0.78 and 0.76, respectively. The results for the MW1 data set are given in Table 14. The best F-measure, 0.96, was obtained by two algorithms, SVM [34] and RBF [34]; MLP [34] and the RBF in this research obtained 0.95, and Naive Bayes 0.90. KNN and DT obtained the lowest F-measures, 0.44 and 0.16, respectively. The best accuracy for predicting software defects in the MW1 data set was 92.19%, for SVM [34], followed by 91.99% and 91.09% for RBF [34] and MLP, respectively; RBF [36] obtained 89.33%, and the RBF in this research 88.90%. DT and KNN obtained the same accuracy, 86.66%, and the lowest accuracy, 83.63%, was obtained by Naive Bayes. The PC1 data set results are given in Table 15. The RBF in this research obtained the best F-measure, 0.99, while SVM [34] obtained the lowest, 0.07. The two algorithms MLP and Naive Bayes both obtained 0.11; RF, 0.91; DS, 0.88; LR, 0.85; DT, 0.50; KNN, 0.28; and RBF [36], 0.15. The RBF in this research also obtained the best accuracy, 98.99%, while the lowest, 30.09%, was obtained by MLP. The other accuracies were 94.60% for KNN, 93.13% for RBF [34] and DT, 93.09% for SVM [34], 91.00% for RF, 88.07% for Naive Bayes, 87.61% for J star, 87.00% for DS, 81.00% for LR, and 79.00% for SVM [18]. Table 16 shows the results for the PC2 data set. The F-measure obtained by MLP, SVM, and RBF [34] is 1.00, and 0.99 for Naive Bayes and the RBF in this research. The highest software defect prediction accuracies obtained were 99.80% for the RBF in this research, 99.59% for SVM, 99.58% for RBF [34], and 99.52% for MLP; in addition, DT obtained 97.69%. Table 19 shows the results for the PC5 data set, where the best F-measure, 0.99, was obtained by MLP, SVM [34], and RBF [34].
Naive Bayes obtained 0.98, and the RBF in this research obtained 0.80, followed by RF, DS, and SVM [18]. The fourth research question in this study is as follows: "How does the RBF neural network perform when it is compared to previous methods that are used to predict the software defect for fourteen NASA data sets after eliminating the nonrelated features?" Table 20 reflects the answer to question four. As seen in Table 20, it can be noticed that this study is the first to report software defect prediction for the KC4 data set. In order to compare the results obtained in this study with the results obtained in previous work, two parameters were selected: F-measure and accuracy. Nevertheless, few studies report F-measure results, especially when feature selection techniques are performed before predicting the software defects. Table 20 shows that the best F-measure, 0.95, was obtained by the RBF in this research and by SVM, followed by DT and MLP with 0.94, while the best accuracy, 93.99%, was obtained by the RBF in this research, followed by 90.69% obtained by SVM. MLP [36], Boost-RF, Bag-RF-FS, DT, and MLP [38] obtained accuracies between 89.32% and 89.97%. In addition, RF obtained 84.60%, and J star obtained the lowest accuracy, 81.69%, among all listed classification methods.
The results of the nine prediction methods used to predict the software defect in the KC1 data set are listed in Table 20. MLP [38], SVM [38], and DT obtained the best F-measures with 0.92, 0.91, and 0.91, respectively. RBF (this research) obtained 0.83, and MLP [36] obtained the lowest F-measure, 0.43. J star and DT obtained the best accuracies of 85.97% and 85.68%, respectively. The lowest accuracy, 71.08%, was obtained by J48; other algorithms, such as RF [29], are also listed in Table 20. For the KC3 data set, the highest F-measure, 0.96, was attained by three algorithms, MLP [36], DT, and SVM [38], while the lowest, 0.28, was obtained by MLP; RBF (this research) obtained 0.85. The highest and lowest accuracies, 93.50% and 77.58%, were attained using MLP [38] and Bag-RF-FS, respectively. SVM [38] and DT obtained 93.28% and 93.11%, respectively, while MLP obtained 82.75%; J star and RF, 80.50% each; and Boost-RF, 79.13%. RBF (this research) obtained 78.25%. The result of software defect prediction for the KC4 data set, obtained using the RBF neural network after feature selection, can be seen in Table 20: the F-measure is 0.86 and the accuracy is 83.18%. Moreover, KC4 has not been a subject of interest for researchers; therefore, no previous work was found.
Different algorithms were used to predict the software defect for the MC1 data set, namely Boost-RF, Bag-RF-FS, MLP, and RBF (this research). The latter is the only one for which the F-measure was reported, and it was 0.99. The RBF neural network also attained the highest accuracy with 99.01%, while the remaining algorithms obtained 97.06%. The same algorithms used with the MC1 data set were used with the MC2 data set. The F-measure obtained using RBF (this research) is the highest with 0.76, while the lowest values are 0.33 for Boost-RF and 0.36 for Bag-RF-FS. MLP obtained the best accuracy (97.60%) and RBF (this research) obtained 70.18%, while Boost-RF and Bag-RF-FS obtained 64.86% and 62.16%, respectively. For MW1, the highest F-measure was 0.95 and the lowest was 0.40, obtained using RBF (this research) and MLP, respectively. Moreover, 92.00% was the best accuracy, obtained using MLP, followed by 89.33% obtained using Boost-RF and Bag-RF-FS; RBF (this research) obtained 88.90%. The results of software defect prediction for the PC1 data set are summarized in Table 20. The best and lowest F-measures, 0.99 and 0.42, were obtained using RBF (this research) and MLP [38], respectively, while DT, MLP [36], and SVM obtained 0.96. RBF obtained the highest accuracy, 98.99%, followed by MLP [36] (96.56%) and Boost-RF and Bag-RF-FS with 96.07%. DT and MLP [36] obtained 93.58% and 93.59%, respectively; SVM obtained 93.10%; and J star and RF obtained the lowest accuracy (91.17%).
The results of four algorithms for PC2 are given in Table 20. RBF (this research), with 0.99, is the only method for which the F-measure was reported. In addition, the highest accuracy, 99.80%, was obtained using RBF, and the lowest, 97.23%, using Boost-RF; meanwhile, Bag-RF-FS and MLP obtained 97.69%. The best F-measure for the PC3 data set was 0.97, using RBF (this research), and the lowest was 0.14, using MLP. Boost-RF and Bag-RF-FS obtained an accuracy of 87.34%, while MLP obtained the lowest accuracy (85.12%). The lowest and highest F-measures for the PC4 data set were obtained using MLP and RBF (this research), respectively. The highest accuracy for PC4 was obtained using RBF (this research) with 94.44%, while MLP obtained the lowest with 88.97%; Boost-RF obtained 91.60%, and Bag-RF-FS obtained 90.81%. The last data set in this study is PC5, where MLP obtained the lowest F-measure (0.24) and RBF (this research) obtained the best. The highest accuracy, 79.02%, was obtained using RBF (this research), and the lowest, 74.80%, using MLP; Boost-RF obtained 75.78%, and Bag-RF-FS obtained 76.96%.
As seen in Table 20 and discussed in this section, the RBF neural network with feature selection proposed in this research has shown better F-measure results than the previous methods for the data sets CM1, JM1, KC2, KC4, MC1, MC2, MW1, PC1, PC2, PC3, PC4, and PC5. However, it has shown lower F-measure results than previous methods for the two data sets KC1 and KC3.

Conclusion
Software defects are one of the major causes of software failure. This study introduced a model to predict the software defect. The model consists of the correlation-based method for feature selection and the RBF neural network for prediction. The study focused on predicting the software defect in fourteen well-known NASA data sets, namely, CM1, JM1, KC1, KC2, KC3, KC4, MC1, MC2, MW1, PC1, PC2, PC3, PC4, and PC5. The RBF neural network was used to test the fourteen data sets in two stages: first, using the data sets before performing feature selection; second, after the correlation-based method was used to select the important features. K-fold cross-validation was performed to divide each data set into the parts used to train and test the neural network. Furthermore, precision, recall, F-measure, and accuracy were used to evaluate the RBF neural network's performance. The results obtained in this study were compared with those obtained by other methods. The RBF neural network has shown better performance when compared with other methods on data sets such as CM1, KC4, MC1, PC1, PC2, PC3, PC4, and PC5. The main goal of the study was to introduce a model that can detect the software defect with high accuracy; therefore, the four questions mentioned earlier in this study were answered. The limitation of the proposed method is that it could not predict the software defect with higher performance for the KC1, KC2, KC3, MC1, and MW1 data sets than some previous methods do. Future work should focus on these data sets.
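The pipeline summarized above (correlation-based feature selection, an RBF network, k-fold cross-validation) can be sketched as follows. This is a minimal illustration under assumed details, not the study's implementation: the correlation threshold is replaced by a top-k filter, the RBF centers are chosen at random rather than by any clustering step the original model may use, and all sizes and the kernel width are illustrative.

```python
# Sketch of a defect-prediction pipeline: correlation-based feature
# selection + RBF network + k-fold cross-validation. Illustrative only;
# hyperparameters (n_centers, gamma, number of kept features) are assumed.
import numpy as np

def select_features(X, y, k=5):
    """Keep the k features with the highest |Pearson correlation| to the label."""
    corr = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    return np.argsort(corr)[::-1][:k]

class RBFNet:
    """RBF network: Gaussian hidden units at random training points,
    linear output layer fitted by regularized least squares."""
    def __init__(self, n_centers=20, gamma=0.1, seed=0):
        self.n_centers, self.gamma, self.seed = n_centers, gamma, seed

    def _phi(self, X):
        # Squared distances from every sample to every center.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-self.gamma * d2)

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        idx = rng.choice(len(X), size=min(self.n_centers, len(X)), replace=False)
        self.centers = X[idx]
        P = self._phi(X)
        # Ridge-regularized least squares for the output weights.
        self.w = np.linalg.solve(P.T @ P + 1e-6 * np.eye(P.shape[1]), P.T @ y)
        return self

    def predict(self, X):
        return (self._phi(X) @ self.w > 0.5).astype(int)

def kfold_accuracy(X, y, k=5, seed=0):
    """Mean test accuracy over k folds; feature selection is refit per fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        cols = select_features(X[train], y[train], k=5)
        model = RBFNet().fit(X[train][:, cols], y[train])
        scores.append((model.predict(X[test][:, cols]) == y[test]).mean())
    return float(np.mean(scores))
```

Note that the feature filter is refit inside each fold, so the selected columns never see the held-out samples; applying selection on the full data before splitting would leak label information into the evaluation.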
In conclusion, the proposed method in this study can be used to detect software defects.

Conflicts of Interest
The author declares no conflicts of interest.