Accurate estimation of software development effort is essential for effective management and control of software development projects. Many software effort estimation methods have been proposed in the literature including computational intelligence models. However, none of the existing models proved to be suitable under all circumstances; that is, their performance varies from one dataset to another. The goal of an ensemble model is to manage each of its individual models’ strengths and weaknesses automatically, leading to the best possible decision being taken overall. In this paper, we have developed different homogeneous and heterogeneous ensembles of optimized hybrid computational intelligence models for software development effort estimation. Different linear and nonlinear combiners have been used to combine the base hybrid learners. We have conducted an empirical study to evaluate and compare the performance of these ensembles using five popular datasets. The results confirm that individual models are not reliable as their performance is inconsistent and unstable across different datasets. Although none of the ensemble models was consistently the best, many of them were frequently among the best models for each dataset. The homogeneous ensemble of support vector regression (SVR), with the nonlinear combiner adaptive neurofuzzy inference systems-subtractive clustering (ANFIS-SC), was the best model when considering the average rank of each model across the five datasets.
Software development effort estimation is one of the core tasks in software project management. It is defined as “the process of predicting the effort required to develop a software system” [
In this paper, we have developed different homogeneous and heterogeneous ensembles of optimized hybrid computational intelligence models for software development effort estimation. Different linear and nonlinear combiners have been used to combine the base learners. We have conducted an empirical study to evaluate and compare the performance of these ensembles using five popular datasets. The rest of this paper is organized as follows. Section
Software effort estimation methods can be grouped into three general approaches [
Algorithmic models represent the relationship between characteristic(s) of a software project, usually software size, and its development effort. These models are parametric in nature with a formula of standard form that is parameterized from historical data. Examples of such models include constructive cost model (COCOMO) [
Computational intelligence models, in recent years, have been widely applied to software effort estimation. Examples include neural networks [
Recently, a few research studies have investigated the use of homogeneous ensemble models for software effort estimation. Braga et al. [
A few other studies have recently investigated the use of heterogeneous ensemble models for software effort estimation. Kocaguneli et al. [
This paper differs from the above related works on the use of ensemble models for software effort estimation in several aspects. This paper investigates and compares both homogeneous and heterogeneous ensembles of hybrid computational intelligence models. Furthermore, in addition to simple linear combiners, this paper investigates and compares several nonlinear combiners. A comparison between this paper and related works is provided in Table
Comparison of related works on the use of ensemble models for software effort estimation.
Study | Ensemble type | Base learner(s) | Combination rule(s) | Number of datasets |
---|---|---|---|---|
Braga et al. [ | Homogeneous | Linear regression | Linear (averaging) | 1 |
Braga et al. [ | Homogeneous | MLP | Linear (averaging) | 1 |
Braga et al. [ | Homogeneous | M5P regression trees | Linear (averaging) | 1 |
Braga et al. [ | Homogeneous | M5P model trees | Linear (averaging) | 1 |
Braga et al. [ | Homogeneous | SVR | Linear (averaging) | 1 |
Kultur et al. [ | Homogeneous | MLP | Nonlinear (average of largest cluster obtained using adaptive resonance theory (ART) algorithm) | 5 |
Minku and Yao [ | Homogeneous | MLP | Linear (averaging) | 18 |
Minku and Yao [ | Homogeneous | RBF | Linear (averaging) | 18 |
Minku and Yao [ | Homogeneous | Regression Trees | Linear (averaging) | 18 |
Kocaguneli et al. [ | Heterogeneous | Gaussian process, MLP, RBF, SMOReg, SVMReg, IBk, LWL, additive regression, bagging with decision tree, RandomSubSpace, DecisionStump, M5P, ConjunctiveRule, DecisionTable | Linear (averaging) | 3 |
Kocaguneli et al. [ | Heterogeneous | ABE0-1NN, ABE0-5NN, SWReg, CART (yes), CART (no), NNet, LReg, PCR, PLSR | Linear (mean, median, inverse-ranked weighted mean (IRWM)) | 20 |
Elish [ | Heterogeneous | MLP, RBF, RT, KNN, SVR | Linear (median) | 5 |
Homogeneous (bagging) | MLP | Linear (averaging, weighted averaging) and nonlinear (MLP |
||
This paper | Homogeneous |
SVR | 5 | |
Homogeneous |
ANFIS | |||
Heterogeneous | MLP, SVR, ANFIS |
A hybrid computational intelligence (HCI) model combines at least two computational intelligence (CI) techniques. For example, the combination of an artificial neural network (ANN) with a fuzzy inference system (FIS) results in a hybrid neurofuzzy system. HCI models are defined as any effective combination of CI techniques, in a sequential or parallel manner, that performs better than the individual CI techniques [
In this paper, we combined nonlinear (categorical) principal component analysis (PCA) with the different CI models to obtain HCI models. PCA was first introduced by Pearson in 1901 and has become a popular tool in data analysis. PCA finds the directions in which a cloud of data points is stretched most. Its objective is to reduce dimensionality while preserving as much of the variability of the high-dimensional data as possible: the data are mapped to a lower dimensional space in such a way that the variance of the low-dimensional representation is maximized. The basic idea behind using PCA for feature selection prior to regression is to select variables according to the magnitude (from largest to smallest in absolute value) of their coefficients. In the proposed ensemble models, PCA replaces more or less correlated variables with uncorrelated combinations (projections) of the original variables, and it is used for dimension reduction and variable selection based on the resulting variable loadings.
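The projection step described above can be illustrated with a minimal sketch. It uses ordinary linear PCA computed with NumPy's singular value decomposition, which is an assumption made for brevity: the study itself uses nonlinear (categorical) PCA, and the function name `pca_project` and the toy data are hypothetical.

```python
import numpy as np

def pca_project(X, n_components):
    """Project the rows of X onto the leading principal components.

    Ordinary linear PCA, used here only as an illustration; it is not
    the nonlinear (categorical) PCA variant used in the study.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                        # center each feature
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                 # loadings of the kept components
    explained_var = (s ** 2) / (X.shape[0] - 1)    # variance along each direction
    scores = Xc @ components.T                     # uncorrelated projections
    return scores, components, explained_var[:n_components]

# Toy data (hypothetical): two strongly correlated features.
X = np.array([[2.0, 1.9], [4.0, 4.1], [6.0, 6.2], [8.0, 7.8]])
scores, components, var = pca_project(X, n_components=1)
```

Because the two toy features are almost collinear, a single component retains nearly all of the variance, which is the situation in which replacing correlated variables by their projections is most useful.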
An ensemble model employs a group of multiple learning algorithms and combines their outputs acting as a single decision maker. Figure
Ensemble model structure.
The formula for weighted average method is
A heterogeneous ensemble consists of members that have different base learning algorithms. We developed one heterogeneous ensemble model made up of PCA-based CI models of type MLP, SVR, and ANFIS. First, the input was provided to the MLP. The training data that were poorly predicted by the MLP were then used to train the SVR, and the training data that were poorly predicted by the SVR were in turn used to train the ANFIS. In this way, the members become diverse because each one is trained on a different training set.
A homogeneous ensemble consists of members that share a single type of base learning algorithm. In this case, the ensemble members can differ in their structure. We developed three homogeneous ensemble models, each consisting of three PCA-based CI models of a single type: MLP, SVR, or ANFIS. Moreover, in the proposed ensembles, we optimized the model parameters using an evolutionary algorithm based on the genetic algorithm (GA). To improve the efficiency of the PCA approach, the GA was used to select the features that increase performance in both the training phase and the test phase. The GA selects the most important features to improve running time and accuracy, while PCA is used for feature extraction.
Table
Investigated homogeneous and heterogeneous ensemble models.
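Ensemble type | Base learner(s) | Combination type | Combination rule | Abbreviation |
---|---|---|---|---|
Homogeneous | MLP | Linear | Averaging | HM-MLP-[Avg] |
Homogeneous | MLP | Linear | Weighted averaging | HM-MLP-[WtAvg] |
Homogeneous | MLP | Nonlinear | MLP | HM-MLP-[MLP] |
Homogeneous | MLP | Nonlinear | SVR | HM-MLP-[SVR] |
Homogeneous | MLP | Nonlinear | FIS-FCM | HM-MLP-[FIS-FCM] |
Homogeneous | MLP | Nonlinear | FIS-SC | HM-MLP-[FIS-SC] |
Homogeneous | MLP | Nonlinear | ANFIS-FCM | HM-MLP-[ANFIS-FCM] |
Homogeneous | MLP | Nonlinear | ANFIS-SC | HM-MLP-[ANFIS-SC] |
Homogeneous | SVR | Linear | Averaging | HM-SVR-[Avg] |
Homogeneous | SVR | Linear | Weighted averaging | HM-SVR-[WtAvg] |
Homogeneous | SVR | Nonlinear | MLP | HM-SVR-[MLP] |
Homogeneous | SVR | Nonlinear | SVR | HM-SVR-[SVR] |
Homogeneous | SVR | Nonlinear | FIS-FCM | HM-SVR-[FIS-FCM] |
Homogeneous | SVR | Nonlinear | FIS-SC | HM-SVR-[FIS-SC] |
Homogeneous | SVR | Nonlinear | ANFIS-FCM | HM-SVR-[ANFIS-FCM] |
Homogeneous | SVR | Nonlinear | ANFIS-SC | HM-SVR-[ANFIS-SC] |
Homogeneous | ANFIS | Linear | Averaging | HM-ANFIS-[Avg] |
Homogeneous | ANFIS | Linear | Weighted averaging | HM-ANFIS-[WtAvg] |
Homogeneous | ANFIS | Nonlinear | MLP | HM-ANFIS-[MLP] |
Homogeneous | ANFIS | Nonlinear | SVR | HM-ANFIS-[SVR] |
Homogeneous | ANFIS | Nonlinear | FIS-FCM | HM-ANFIS-[FIS-FCM] |
Homogeneous | ANFIS | Nonlinear | FIS-SC | HM-ANFIS-[FIS-SC] |
Homogeneous | ANFIS | Nonlinear | ANFIS-FCM | HM-ANFIS-[ANFIS-FCM] |
Homogeneous | ANFIS | Nonlinear | ANFIS-SC | HM-ANFIS-[ANFIS-SC] |
Heterogeneous | MLP, SVR, and ANFIS | Linear | Averaging | HT-(MLP, SVR, ANFIS)-[Avg] |
Heterogeneous | MLP, SVR, and ANFIS | Linear | Weighted averaging | HT-(MLP, SVR, ANFIS)-[WtAvg] |
Heterogeneous | MLP, SVR, and ANFIS | Nonlinear | MLP | HT-(MLP, SVR, ANFIS)-[MLP] |
Heterogeneous | MLP, SVR, and ANFIS | Nonlinear | SVR | HT-(MLP, SVR, ANFIS)-[SVR] |
Heterogeneous | MLP, SVR, and ANFIS | Nonlinear | FIS-FCM | HT-(MLP, SVR, ANFIS)-[FIS-FCM] |
Heterogeneous | MLP, SVR, and ANFIS | Nonlinear | FIS-SC | HT-(MLP, SVR, ANFIS)-[FIS-SC] |
Heterogeneous | MLP, SVR, and ANFIS | Nonlinear | ANFIS-FCM | HT-(MLP, SVR, ANFIS)-[ANFIS-FCM] |
Heterogeneous | MLP, SVR, and ANFIS | Nonlinear | ANFIS-SC | HT-(MLP, SVR, ANFIS)-[ANFIS-SC] |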
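The two linear combination rules in the table can be sketched as follows. This is a minimal illustration, not the study's implementation: the weighting scheme (weights inversely proportional to each member's validation error) is an assumption, since the weight formula is not available in the text, and the function names and numbers are hypothetical.

```python
import numpy as np

def average_combiner(member_preds):
    """Plain averaging: every ensemble member gets the same weight."""
    return np.mean(np.asarray(member_preds, dtype=float), axis=0)

def weighted_average_combiner(member_preds, val_errors):
    """Weighted averaging of member predictions.

    ASSUMPTION: weights are inversely proportional to each member's
    validation error and normalized to sum to 1. The source does not
    state how its weights are derived.
    """
    inv = 1.0 / (np.asarray(val_errors, dtype=float) + 1e-12)
    weights = inv / inv.sum()
    return weights @ np.asarray(member_preds, dtype=float)

# Hypothetical effort predictions of three members for two projects.
preds = [[100.0, 200.0], [120.0, 180.0], [110.0, 220.0]]
val_errors = [0.5, 1.0, 1.0]            # hypothetical validation errors

avg = average_combiner(preds)
wavg = weighted_average_combiner(preds, val_errors)
```

With these numbers the first member has half the error of the others, so it receives weight 0.5 and the other two receive 0.25 each. The nonlinear rules in the table replace this fixed formula with a trained model (MLP, SVR, FIS, or ANFIS) that takes the member outputs as its inputs.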
First, we divided each dataset into a training set and a testing set: about 80% of the observations were used for training and the remaining 20% for testing. The ensemble members were trained on 80% of the training set, and the rest of the training set was used for model validation. After the first run of the algorithm, each subsequent run selected the same amount of actual training data (80% of the whole training set), drawn from the observations that were poorly predicted by the CI model in the previous run.
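The data handling described above can be sketched as follows. The sketch makes several assumptions that the text does not confirm: the split is random, "poorly predicted" is measured by relative error, and the helper names `split_indices` and `hardest_samples` are hypothetical.

```python
import numpy as np

def split_indices(n, train_frac=0.8, seed=0):
    """Randomly split n observation indices into two disjoint parts."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]

def hardest_samples(y_true, y_pred, k):
    """Indices of the k observations with the largest relative error.

    ASSUMPTION: relative error is the criterion for 'poorly predicted'.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rel_err = np.abs(y_true - y_pred) / np.abs(y_true)
    return np.argsort(rel_err)[::-1][:k]

# Outer split: about 80% training, 20% testing (77 observations as in Desharnais).
train_idx, test_idx = split_indices(77)
# Inner split: 80% of the training set for fitting, the rest for validation.
fit_pos, val_pos = split_indices(len(train_idx), seed=1)

# Hypothetical actual and predicted efforts for four training observations.
hard = hardest_samples([100, 200, 300, 400], [150, 210, 290, 100], k=2)
```

In a sequential scheme, the indices returned by `hardest_samples` would form the training data of the next member, so that later members concentrate on the observations the earlier ones handled badly.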
We conducted an empirical study to evaluate and compare the performance of the homogeneous and heterogeneous ensemble models under investigation in estimating software development effort. This section discusses the conducted empirical study and its results.
Five well-known datasets were used in this empirical study. These datasets, which are described next, have been widely used in the literature. A summary of their characteristics is provided in Table
Characteristics of datasets.
Dataset | No. of observations | No. of features | No. of numerical features | No. of categorical features |
---|---|---|---|---|
Albrecht | 24 | 6 | 6 | 0 |
Miyazaki | 48 | 7 | 7 | 0 |
Maxwell | 62 | 25 | 3 | 22 |
COCOMO | 63 | 16 | 16 | 0 |
Desharnais | 77 | 8 | 7 | 1 |
Albrecht dataset [
Miyazaki dataset [
Maxwell dataset [
COCOMO dataset [
Desharnais dataset [
In order to assess and compare the different estimation models, three performance evaluation metrics were considered. The first metric is mean magnitude of relative error (MMRE), which is calculated as follows:
The third performance metric is a recently proposed evaluation function (EF) [
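The evaluation metrics can be sketched in code under their commonly used definitions. MMRE and PRED(25) follow the standard formulas from the effort estimation literature. The EF expression is an inference, not a formula stated in the available text: the EF values reported in the result tables are consistent with PRED(25) divided by (1 + MMRE), with both expressed in percent. All function names are hypothetical.

```python
import numpy as np

def mre(actual, predicted):
    """Magnitude of relative error for each project."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(actual - predicted) / actual

def mmre(actual, predicted):
    """Mean magnitude of relative error, in percent (lower is better)."""
    return 100.0 * mre(actual, predicted).mean()

def pred(actual, predicted, level=25):
    """Percentage of projects whose MRE is at most level% (higher is better)."""
    return 100.0 * np.mean(mre(actual, predicted) <= level / 100.0)

def ef(mmre_pct, pred_pct):
    """Evaluation function combining both metrics (higher is better).

    ASSUMPTION: inferred from the reported tables, e.g. MMRE = 64.60 and
    PRED(25) = 40.0 give 40.0 / 65.60, which rounds to the tabulated 0.61.
    """
    return pred_pct / (1.0 + mmre_pct)

# Hypothetical actual and estimated efforts for three projects.
actual = [100.0, 200.0, 400.0]
estimated = [110.0, 150.0, 100.0]
m = mmre(actual, estimated)
p = pred(actual, estimated)
score = ef(m, p)
```

Here the relative errors are 0.10, 0.25, and 0.75, so two of the three projects fall within the 25% level. A single metric that rewards high PRED(25) and penalizes high MMRE makes it possible to rank models that disagree on the two separate metrics.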
This section evaluates whether the hybridization of an individual model improves its estimation performance. If so, we will use the hybrid version of it in the development of the ensemble models; otherwise we use it as it is. In other words, the performance of the individual SVR model was compared to the hybrid PCA-SVR model and the hybrid PCA-GA-SVR model. The individual MLP and ANFIS models were also compared against their hybrid versions. Table
Performance of individual and hybrid models based on EF metric.
Model | Version | Albrecht | Miyazaki | Maxwell | COCOMO | Desharnais |
---|---|---|---|---|---|---|
SVR | Individual | 0.243 | 0.274 | 0.326 | 0.010 | |
SVR | PCA | | 0.503 | 0.478 | | 0.258 |
SVR | PCA-GA | 0.000 | | | 0.070 | 0.062 |
MLP | Individual | | 0.299 | | 0.092 | |
MLP | PCA | 0.300 | | 0.234 | | 0.227 |
MLP | PCA-GA | 0.245 | 0.154 | 0.193 | 0.017 | 0.070 |
ANFIS | Individual | | 0.538 | 0.087 | 0.043 | |
ANFIS | PCA | 0.139 | | | | 0.000 |
ANFIS | PCA-GA | 0.000 | 0.203 | 0.059 | 0.007 | 0.000 |
Performance of individual and hybrid models based on EF metric using Albrecht dataset.
Performance of individual and hybrid models based on EF metric using Miyazaki dataset.
Performance of individual and hybrid models based on EF metric using Maxwell dataset.
Performance of individual and hybrid models based on EF metric using COCOMO dataset.
Performance of individual and hybrid models based on EF metric using Desharnais dataset.
In Albrecht dataset, as observed from Table
In Miyazaki dataset, as observed from Table
In Maxwell dataset, as observed from Table
In COCOMO dataset, as observed from Table
In Desharnais dataset, as observed from Table
This section evaluates and compares the estimation performance of the homogeneous and heterogeneous ensemble models under investigation. Tables
Models’ performance using Albrecht dataset.
Model type | Model | MMRE | PRED(25) | EF |
---|---|---|---|---|
Individual | MLP | 64.60 | 40.0 | 0.61 |
Individual | SVR | 81.40 | 20.0 | 0.24 |
Individual | ANFIS | 78.61 | 20.0 | 0.25 |
Homogeneous ensemble | HM-MLP-[Avg] | 32.99 | 20.0 | 0.59 |
Homogeneous ensemble | HM-MLP-[WtAvg] | 44.13 | 40.0 | 0.89 |
Homogeneous ensemble | HM-MLP-[MLP] | 54.23 | 40.0 | 0.72 |
Homogeneous ensemble | HM-MLP-[SVR] | 74.38 | 40.0 | 0.53 |
Homogeneous ensemble | HM-MLP-[FIS-FCM] | 61.43 | 20.0 | 0.32 |
Homogeneous ensemble | HM-MLP-[FIS-SC] | 43.87 | 40.0 | 0.89 |
Homogeneous ensemble | HM-MLP-[ANFIS-FCM] | 72.52 | 40.0 | 0.54 |
Homogeneous ensemble | HM-MLP-[ANFIS-SC] | 63.04 | 20.0 | 0.31 |
Homogeneous ensemble | HM-SVR-[Avg] | 67.65 | 20.0 | 0.29 |
Homogeneous ensemble | HM-SVR-[WtAvg] | 68.02 | 20.0 | 0.29 |
Homogeneous ensemble | HM-SVR-[MLP] | 69.30 | 20.0 | 0.28 |
Homogeneous ensemble | HM-SVR-[SVR] | 74.25 | 40.0 | 0.53 |
Homogeneous ensemble | HM-SVR-[FIS-FCM] | 77.52 | 20.0 | 0.25 |
Homogeneous ensemble | HM-SVR-[FIS-SC] | 94.58 | 0.0 | 0.00 |
Homogeneous ensemble | HM-SVR-[ANFIS-FCM] | 48.63 | 60.0 | 1.21 |
Homogeneous ensemble | HM-SVR-[ANFIS-SC] | 38.20 | 60.0 | 1.53 |
Homogeneous ensemble | HM-ANFIS-[Avg] | 75.88 | 0.0 | 0.00 |
Homogeneous ensemble | HM-ANFIS-[WtAvg] | 69.43 | 20.0 | 0.28 |
Homogeneous ensemble | HM-ANFIS-[MLP] | 68.16 | 40.0 | 0.58 |
Homogeneous ensemble | HM-ANFIS-[SVR] | 80.83 | 40.0 | 0.49 |
Homogeneous ensemble | HM-ANFIS-[FIS-FCM] | 67.24 | 20.0 | 0.29 |
Homogeneous ensemble | HM-ANFIS-[FIS-SC] | 108.83 | 0.0 | 0.00 |
Homogeneous ensemble | HM-ANFIS-[ANFIS-FCM] | 67.30 | 20.0 | 0.29 |
Homogeneous ensemble | HM-ANFIS-[ANFIS-SC] | 136.01 | 0.0 | 0.00 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[Avg] | 84.79 | 0.0 | 0.00 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[WtAvg] | 73.30 | 20.0 | 0.27 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[MLP] | 92.03 | 20.0 | 0.21 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[SVR] | 87.70 | 20.0 | 0.23 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[FIS-FCM] | 106.82 | 0.0 | 0.00 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[FIS-SC] | 86.77 | 0.0 | 0.00 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[ANFIS-FCM] | 60.59 | 20.0 | 0.32 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[ANFIS-SC] | 58.82 | 20.0 | 0.33 |
Models’ performance using Miyazaki dataset.
Model type | Model | MMRE | PRED(25) | EF |
---|---|---|---|---|
Individual | MLP | 65.86 | 20.0 | 0.30 |
Individual | SVR | 108.37 | 30.0 | 0.27 |
Individual | ANFIS | 36.18 | 20.0 | 0.54 |
Homogeneous ensemble | HM-MLP-[Avg] | 176.62 | 20.0 | 0.11 |
Homogeneous ensemble | HM-MLP-[WtAvg] | 176.60 | 20.0 | 0.11 |
Homogeneous ensemble | HM-MLP-[MLP] | 69.23 | 30.0 | 0.43 |
Homogeneous ensemble | HM-MLP-[SVR] | 107.68 | 30.0 | 0.28 |
Homogeneous ensemble | HM-MLP-[FIS-FCM] | 317.26 | 10.0 | 0.03 |
Homogeneous ensemble | HM-MLP-[FIS-SC] | 82.92 | 40.0 | 0.48 |
Homogeneous ensemble | HM-MLP-[ANFIS-FCM] | 56.14 | 40.0 | 0.70 |
Homogeneous ensemble | HM-MLP-[ANFIS-SC] | 95.54 | 10.0 | 0.10 |
Homogeneous ensemble | HM-SVR-[Avg] | 48.05 | 30.0 | 0.61 |
Homogeneous ensemble | HM-SVR-[WtAvg] | 49.49 | 30.0 | 0.59 |
Homogeneous ensemble | HM-SVR-[MLP] | 93.79 | 40.0 | 0.42 |
Homogeneous ensemble | HM-SVR-[SVR] | 105.06 | 20.0 | 0.19 |
Homogeneous ensemble | HM-SVR-[FIS-FCM] | 85.07 | 40.0 | 0.46 |
Homogeneous ensemble | HM-SVR-[FIS-SC] | 39.89 | 40.0 | 0.98 |
Homogeneous ensemble | HM-SVR-[ANFIS-FCM] | 106.68 | 30.0 | 0.28 |
Homogeneous ensemble | HM-SVR-[ANFIS-SC] | 35.49 | 60.0 | 1.64 |
Homogeneous ensemble | HM-ANFIS-[Avg] | 30.74 | 30.0 | 0.95 |
Homogeneous ensemble | HM-ANFIS-[WtAvg] | 30.74 | 30.0 | 0.95 |
Homogeneous ensemble | HM-ANFIS-[MLP] | 103.44 | 30.0 | 0.29 |
Homogeneous ensemble | HM-ANFIS-[SVR] | 114.64 | 40.0 | 0.35 |
Homogeneous ensemble | HM-ANFIS-[FIS-FCM] | 138.52 | 20.0 | 0.14 |
Homogeneous ensemble | HM-ANFIS-[FIS-SC] | 171.95 | 20.0 | 0.12 |
Homogeneous ensemble | HM-ANFIS-[ANFIS-FCM] | 101.49 | 70.0 | 0.68 |
Homogeneous ensemble | HM-ANFIS-[ANFIS-SC] | 118.47 | 10.0 | 0.08 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[Avg] | 58.29 | 50.0 | 0.84 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[WtAvg] | 50.84 | 50.0 | 0.96 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[MLP] | 83.82 | 60.0 | 0.71 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[SVR] | 114.18 | 30.0 | 0.26 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[FIS-FCM] | 130.00 | 20.0 | 0.15 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[FIS-SC] | 78.71 | 30.0 | 0.38 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[ANFIS-FCM] | 149.92 | 20.0 | 0.13 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[ANFIS-SC] | 64.95 | 30.0 | 0.45 |
Models’ performance using Maxwell dataset.
Model type | Model | MMRE | PRED(25) | EF |
---|---|---|---|---|
Individual | MLP | 74.80 | 25.0 | 0.33 |
Individual | SVR | 101.31 | 33.3 | 0.33 |
Individual | ANFIS | 190.49 | 16.7 | 0.09 |
Homogeneous ensemble | HM-MLP-[Avg] | 53.24 | 25.0 | 0.46 |
Homogeneous ensemble | HM-MLP-[WtAvg] | 66.03 | 16.7 | 0.25 |
Homogeneous ensemble | HM-MLP-[MLP] | 109.23 | 8.3 | 0.08 |
Homogeneous ensemble | HM-MLP-[SVR] | 141.49 | 16.7 | 0.12 |
Homogeneous ensemble | HM-MLP-[FIS-FCM] | 82.56 | 16.7 | 0.20 |
Homogeneous ensemble | HM-MLP-[FIS-SC] | 64.36 | 25.0 | 0.38 |
Homogeneous ensemble | HM-MLP-[ANFIS-FCM] | 2969.17 | 33.3 | 0.01 |
Homogeneous ensemble | HM-MLP-[ANFIS-SC] | 60.53 | 33.3 | 0.54 |
Homogeneous ensemble | HM-SVR-[Avg] | 77.74 | 25.0 | 0.32 |
Homogeneous ensemble | HM-SVR-[WtAvg] | 100.87 | 16.7 | 0.16 |
Homogeneous ensemble | HM-SVR-[MLP] | 85.26 | 33.3 | 0.39 |
Homogeneous ensemble | HM-SVR-[SVR] | 139.33 | 16.7 | 0.12 |
Homogeneous ensemble | HM-SVR-[FIS-FCM] | 106.43 | 16.7 | 0.16 |
Homogeneous ensemble | HM-SVR-[FIS-SC] | 78.92 | 25.0 | 0.31 |
Homogeneous ensemble | HM-SVR-[ANFIS-FCM] | 43183.76 | 0.0 | 0.00 |
Homogeneous ensemble | HM-SVR-[ANFIS-SC] | 66.46 | 25.0 | 0.37 |
Homogeneous ensemble | HM-ANFIS-[Avg] | 64.06 | 25.0 | 0.38 |
Homogeneous ensemble | HM-ANFIS-[WtAvg] | 65.88 | 25.0 | 0.37 |
Homogeneous ensemble | HM-ANFIS-[MLP] | 92.25 | 8.3 | 0.09 |
Homogeneous ensemble | HM-ANFIS-[SVR] | 139.34 | 16.7 | 0.12 |
Homogeneous ensemble | HM-ANFIS-[FIS-FCM] | 119.37 | 8.3 | 0.07 |
Homogeneous ensemble | HM-ANFIS-[FIS-SC] | 75.25 | 25.0 | 0.33 |
Homogeneous ensemble | HM-ANFIS-[ANFIS-FCM] | 419245.17 | 8.3 | 0.00 |
Homogeneous ensemble | HM-ANFIS-[ANFIS-SC] | 59.89 | 25.0 | 0.41 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[Avg] | 63.31 | 16.7 | 0.26 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[WtAvg] | 63.79 | 16.7 | 0.26 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[MLP] | 96.19 | 8.3 | 0.09 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[SVR] | 167.08 | 16.7 | 0.10 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[FIS-FCM] | 114.27 | 0.0 | 0.00 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[FIS-SC] | 62.34 | 25.0 | 0.39 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[ANFIS-FCM] | 312667.24 | 8.3 | 0.00 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[ANFIS-SC] | 59.49 | 25.0 | 0.41 |
Models’ performance using COCOMO dataset.
Model type | Model | MMRE | PRED(25) | EF |
---|---|---|---|---|
Individual | MLP | 250.77 | 23.1 | 0.09 |
Individual | SVR | 803.96 | 7.7 | 0.01 |
Individual | ANFIS | 532.33 | 23.1 | 0.04 |
Homogeneous ensemble | HM-MLP-[Avg] | 68.53 | 23.1 | 0.33 |
Homogeneous ensemble | HM-MLP-[WtAvg] | 68.62 | 23.1 | 0.33 |
Homogeneous ensemble | HM-MLP-[MLP] | 71.57 | 23.1 | 0.32 |
Homogeneous ensemble | HM-MLP-[SVR] | 466.60 | 0.0 | 0.00 |
Homogeneous ensemble | HM-MLP-[FIS-FCM] | 432.81 | 23.1 | 0.05 |
Homogeneous ensemble | HM-MLP-[FIS-SC] | 102.49 | 0.0 | 0.00 |
Homogeneous ensemble | HM-MLP-[ANFIS-FCM] | 98.98 | 15.4 | 0.15 |
Homogeneous ensemble | HM-MLP-[ANFIS-SC] | 307.08 | 15.4 | 0.05 |
Homogeneous ensemble | HM-SVR-[Avg] | 131.79 | 7.7 | 0.06 |
Homogeneous ensemble | HM-SVR-[WtAvg] | 131.34 | 7.7 | 0.06 |
Homogeneous ensemble | HM-SVR-[MLP] | 71.57 | 23.1 | 0.32 |
Homogeneous ensemble | HM-SVR-[SVR] | 515.51 | 0.0 | 0.00 |
Homogeneous ensemble | HM-SVR-[FIS-FCM] | 1471.43 | 0.0 | 0.00 |
Homogeneous ensemble | HM-SVR-[FIS-SC] | 102.49 | 0.0 | 0.00 |
Homogeneous ensemble | HM-SVR-[ANFIS-FCM] | 305.68 | 23.1 | 0.08 |
Homogeneous ensemble | HM-SVR-[ANFIS-SC] | 131.46 | 23.1 | 0.17 |
Homogeneous ensemble | HM-ANFIS-[Avg] | 165.57 | 15.4 | 0.09 |
Homogeneous ensemble | HM-ANFIS-[WtAvg] | 165.57 | 15.4 | 0.09 |
Homogeneous ensemble | HM-ANFIS-[MLP] | 1059.11 | 7.7 | 0.01 |
Homogeneous ensemble | HM-ANFIS-[SVR] | 480.86 | 7.7 | 0.02 |
Homogeneous ensemble | HM-ANFIS-[FIS-FCM] | 2804.98 | 0.0 | 0.00 |
Homogeneous ensemble | HM-ANFIS-[FIS-SC] | 147249731.42 | 0.0 | 0.00 |
Homogeneous ensemble | HM-ANFIS-[ANFIS-FCM] | 587.09 | 15.4 | 0.03 |
Homogeneous ensemble | HM-ANFIS-[ANFIS-SC] | 177.90 | 30.8 | 0.17 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[Avg] | 109.79 | 7.7 | 0.07 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[WtAvg] | 83.81 | 15.4 | 0.18 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[MLP] | 881.83 | 0.0 | 0.00 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[SVR] | 479.36 | 7.7 | 0.02 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[FIS-FCM] | 2359.23 | 0.0 | 0.00 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[FIS-SC] | 478644.93 | 7.7 | 0.00 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[ANFIS-FCM] | 468.18 | 30.8 | 0.07 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[ANFIS-SC] | 135.16 | 30.8 | 0.23 |
Models’ performance using Desharnais dataset.
Model type | Model | MMRE | PRED(25) | EF |
---|---|---|---|---|
Individual | MLP | 78.24 | 20.0 | 0.25 |
Individual | SVR | 73.83 | 33.3 | 0.45 |
Individual | ANFIS | 105.70 | 20.0 | 0.19 |
Homogeneous ensemble | HM-MLP-[Avg] | 68.68 | 26.7 | 0.38 |
Homogeneous ensemble | HM-MLP-[WtAvg] | 69.34 | 26.7 | 0.38 |
Homogeneous ensemble | HM-MLP-[MLP] | 81.79 | 20.0 | 0.24 |
Homogeneous ensemble | HM-MLP-[SVR] | 90.79 | 26.7 | 0.29 |
Homogeneous ensemble | HM-MLP-[FIS-FCM] | 78.94 | 26.7 | 0.33 |
Homogeneous ensemble | HM-MLP-[FIS-SC] | 62.21 | 26.7 | 0.42 |
Homogeneous ensemble | HM-MLP-[ANFIS-FCM] | 62.84 | 26.7 | 0.42 |
Homogeneous ensemble | HM-MLP-[ANFIS-SC] | 62.05 | 26.7 | 0.42 |
Homogeneous ensemble | HM-SVR-[Avg] | 59.01 | 40.0 | 0.67 |
Homogeneous ensemble | HM-SVR-[WtAvg] | 59.03 | 40.0 | 0.67 |
Homogeneous ensemble | HM-SVR-[MLP] | 83.97 | 20.0 | 0.24 |
Homogeneous ensemble | HM-SVR-[SVR] | 93.03 | 26.7 | 0.28 |
Homogeneous ensemble | HM-SVR-[FIS-FCM] | 97.45 | 13.3 | 0.14 |
Homogeneous ensemble | HM-SVR-[FIS-SC] | 82.08 | 20.0 | 0.24 |
Homogeneous ensemble | HM-SVR-[ANFIS-FCM] | 81.97 | 20.0 | 0.24 |
Homogeneous ensemble | HM-SVR-[ANFIS-SC] | 76.57 | 26.7 | 0.34 |
Homogeneous ensemble | HM-ANFIS-[Avg] | 115.32 | 20.0 | 0.17 |
Homogeneous ensemble | HM-ANFIS-[WtAvg] | 105.88 | 20.0 | 0.19 |
Homogeneous ensemble | HM-ANFIS-[MLP] | 107.53 | 26.7 | 0.25 |
Homogeneous ensemble | HM-ANFIS-[SVR] | 90.80 | 26.7 | 0.29 |
Homogeneous ensemble | HM-ANFIS-[FIS-FCM] | 104.29 | 13.3 | 0.13 |
Homogeneous ensemble | HM-ANFIS-[FIS-SC] | 92.21 | 6.7 | 0.07 |
Homogeneous ensemble | HM-ANFIS-[ANFIS-FCM] | 4850.60 | 13.3 | 0.00 |
Homogeneous ensemble | HM-ANFIS-[ANFIS-SC] | 91.54 | 13.3 | 0.14 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[Avg] | 66.13 | 33.3 | 0.50 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[WtAvg] | 66.86 | 26.7 | 0.39 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[MLP] | 79.30 | 33.3 | 0.42 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[SVR] | 92.49 | 26.7 | 0.29 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[FIS-FCM] | 85.29 | 20.0 | 0.23 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[FIS-SC] | 63.60 | 20.0 | 0.31 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[ANFIS-FCM] | 76.14 | 40.0 | 0.52 |
Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[ANFIS-SC] | 68.20 | 20.0 | 0.29 |
Histograms of models’ EF metric using Albrecht dataset.
MMRE versus PRED(25) by each model using Albrecht dataset.
Histograms of models’ EF metric using Miyazaki dataset.
MMRE versus PRED(25) by each model using Miyazaki dataset.
Histograms of models’ EF metric using Maxwell dataset.
MMRE versus PRED(25) by each model using Maxwell dataset.
Histograms of models’ EF metric using COCOMO dataset.
MMRE versus PRED(25) by each model using COCOMO dataset.
Histograms of models’ EF metric using Desharnais dataset.
MMRE versus PRED(25) by each model using Desharnais dataset.
In the Albrecht dataset, among the individual models, the MLP model achieved the best performance in terms of MMRE, PRED(25), and EF. By comparing the performance of the homogeneous ensembles of MLP, it can be observed that HM-MLP-[WtAvg] and HM-MLP-[FIS-SC] were the best models. Moreover, only three homogeneous ensembles of MLP (i.e., HM-MLP-[WtAvg], HM-MLP-[FIS-SC], and HM-MLP-[MLP]) achieved better EF than the individual MLP model, whereas the other ensembles of MLP were worse than it. By comparing the performance of the homogeneous ensembles of SVR, it can be noticed that HM-SVR-[ANFIS-SC] was the best, followed by HM-SVR-[ANFIS-FCM]. Furthermore, all homogeneous ensembles of SVR except HM-SVR-[FIS-SC] improved the performance of the individual SVR model in terms of EF. By comparing the performance of the homogeneous ensembles of ANFIS, it can be observed that HM-ANFIS-[MLP] was the best among them in terms of EF. In addition, only three homogeneous ensembles of ANFIS (i.e., HM-ANFIS-[Avg], HM-ANFIS-[FIS-SC], and HM-ANFIS-[ANFIS-SC]) achieved worse EF than the individual ANFIS model, whereas the other ensembles of ANFIS were better than it.
Among the heterogeneous ensemble models, HT-(MLP, SVR, ANFIS)-[ANFIS-FCM] and HT-(MLP, SVR, ANFIS)-[ANFIS-SC] achieved relatively better performance than the other heterogeneous ensembles. It is interesting to observe that the individual MLP model performed better than all the heterogeneous ensembles. The distribution of the top 10 models, in terms of EF, is as follows: 1 individual model (MLP), 5 ensembles of MLP, 3 ensembles of SVR, and 1 ensemble of ANFIS. None of the heterogeneous ensembles was among the top 10 models.
In the Miyazaki dataset, among the individual models, the ANFIS model achieved the best performance in terms of MMRE and EF, and the SVR model was the best in terms of PRED(25). By comparing the performance of the homogeneous ensembles of MLP, it can be observed that the HM-MLP-[ANFIS-FCM] model was the best. Moreover, only three homogeneous ensembles of MLP (i.e., HM-MLP-[MLP], HM-MLP-[FIS-SC], and HM-MLP-[ANFIS-FCM]) achieved better EF than the individual MLP model, whereas the other ensembles of MLP were worse than it. By comparing the performance of the homogeneous ensembles of SVR, it can be noticed that HM-SVR-[ANFIS-SC] was the best. Furthermore, all homogeneous ensembles of SVR except HM-SVR-[SVR] improved the performance of the individual SVR model in terms of EF. By comparing the performance of the homogeneous ensembles of ANFIS, it can be observed that HM-ANFIS-[Avg] and HM-ANFIS-[WtAvg] were the best models among them in terms of EF. In addition, only three homogeneous ensembles of ANFIS (i.e., HM-ANFIS-[Avg], HM-ANFIS-[WtAvg], and HM-ANFIS-[ANFIS-FCM]) achieved better EF than the individual ANFIS model, whereas the other ensembles of ANFIS were worse than it.
Among the heterogeneous ensemble models, HT-(MLP, SVR, ANFIS)-[Avg], HT-(MLP, SVR, ANFIS)-[WtAvg], and HT-(MLP, SVR, ANFIS)-[MLP] achieved relatively better performance than the other heterogeneous ensembles. The distribution of the top 10 models, in terms of EF, is as follows: 1 ensemble of MLP, 3 ensembles of SVR, 3 ensembles of ANFIS, and 3 heterogeneous ensembles. None of the individual models was among the top 10 models.
In the Maxwell dataset, among the individual models, the MLP model achieved the best performance in terms of MMRE, whereas the SVR model was the best in terms of PRED(25). Both models achieved the best EF value. By comparing the performance of the homogeneous ensembles of MLP, it can be observed that HM-MLP-[ANFIS-SC] was the best model based on the PRED(25) and EF metrics. Moreover, only three homogeneous ensembles of MLP (i.e., HM-MLP-[Avg], HM-MLP-[FIS-SC], and HM-MLP-[ANFIS-SC]) achieved better EF than the individual MLP model, whereas the other ensembles of MLP were worse than it. By comparing the performance of the homogeneous ensembles of SVR, it can be noticed that HM-SVR-[MLP] was the best, followed by HM-SVR-[ANFIS-SC], in terms of EF. Furthermore, all other homogeneous ensembles of SVR performed worse than the individual SVR model in terms of EF. By comparing the performance of the homogeneous ensembles of ANFIS, it can be observed that HM-ANFIS-[ANFIS-SC] was the best among them in terms of MMRE, PRED(25), and EF. In addition, only two homogeneous ensembles of ANFIS (i.e., HM-ANFIS-[FIS-FCM] and HM-ANFIS-[ANFIS-FCM]) achieved worse EF than the individual ANFIS model, whereas the other ensembles of ANFIS were better than it.
Among the heterogeneous ensemble models, HT-(MLP, SVR, ANFIS)-[FIS-SC] and HT-(MLP, SVR, ANFIS)-[ANFIS-SC] achieved relatively better performance than the other heterogeneous ensembles. The distribution of the top 10 models, in terms of EF, is as follows: 3 ensembles of MLP, 2 ensembles of SVR, 3 ensembles of ANFIS, and 2 heterogeneous ensembles. None of the individual models was among the top 10 models.
In the COCOMO dataset, among the individual models, the MLP model achieved the best performance in terms of MMRE, PRED(25), and EF. By comparing the performance of the homogeneous ensembles of MLP, it can be observed that HM-MLP-[Avg], HM-MLP-[WtAvg], and HM-MLP-[MLP] were the best models. Moreover, only these three homogeneous ensembles of MLP, in addition to HM-MLP-[ANFIS-FCM], achieved better EF than the individual MLP model. The other ensembles of MLP were worse than it. By comparing the performance of the homogeneous ensembles of SVR, it can be noticed that HM-SVR-[MLP] was the best. Furthermore, only three homogeneous ensembles of SVR (i.e., HM-SVR-[SVR], HM-SVR-[FIS-FCM], and HM-SVR-[FIS-SC]) achieved worse EF than the individual SVR model, whereas the other ensembles of SVR were better than it. By comparing the performance of the homogeneous ensembles of ANFIS, it can be observed that HM-ANFIS-[ANFIS-SC] was the best among them in terms of EF. In addition, only three homogeneous ensembles of ANFIS (i.e., HM-ANFIS-[Avg], HM-ANFIS-[WtAvg], and HM-ANFIS-[ANFIS-SC]) achieved better EF than the individual ANFIS model, whereas the other ensembles of ANFIS were worse than it.
Among the heterogeneous ensemble models, HT-(MLP, SVR, ANFIS)-[WtAvg] and HT-(MLP, SVR, ANFIS)-[ANFIS-SC] achieved relatively better performance than the other heterogeneous ensembles. The distribution of the top 10 models, in terms of EF, is as follows: 4 ensembles of MLP, 2 ensembles of SVR, 2 ensembles of ANFIS, and 2 heterogeneous ensembles. None of the individual models was among the top 10 models.
In the Desharnais dataset, among the individual models, the SVR model achieved the best performance in terms of MMRE, PRED(25), and EF. By comparing the performance of the homogeneous ensembles of MLP, it can be observed that HM-MLP-[FIS-SC], HM-MLP-[ANFIS-FCM], and HM-MLP-[ANFIS-SC] were the best models based on the EF metric. Moreover, all homogeneous ensembles of MLP except HM-MLP-[MLP] improved the performance of the individual MLP model in terms of EF. By comparing the performance of the homogeneous ensembles of SVR, it can be noticed that HM-SVR-[Avg] and HM-SVR-[WtAvg] were the best. Furthermore, none of the other homogeneous ensembles of SVR improved the performance of the individual SVR model in terms of EF; they all performed worse than it. By comparing the performance of the homogeneous ensembles of ANFIS, it can be observed that HM-ANFIS-[SVR] was the best among them in terms of MMRE, PRED(25), and EF. In addition, only two homogeneous ensembles of ANFIS (i.e., HM-ANFIS-[MLP] and HM-ANFIS-[SVR]) achieved better EF than the individual ANFIS model, whereas the other ensembles of ANFIS were worse than it.
Among the heterogeneous ensemble models, HT-(MLP, SVR, ANFIS)-[Avg] and HT-(MLP, SVR, ANFIS)-[ANFIS-FCM] achieved relatively better performance than the other heterogeneous ensembles. The distribution of the top 10 models, in terms of EF, is as follows: 1 individual model (SVR), 3 ensembles of MLP, 2 ensembles of SVR, and 4 heterogeneous ensembles. None of the ensembles of ANFIS was among the top 10 models.
The ranking of the models' performance on the EF metric, in each dataset, is provided in the table of rankings. None of the individual models was among the top 10 models in three datasets (Miyazaki, Maxwell, and COCOMO). In the Albrecht dataset, the MLP model was ranked 6th, and in the Desharnais dataset, the SVR model was ranked 5th. The HM-MLP-[ANFIS-FCM] and HM-SVR-[ANFIS-SC] models occurred most frequently among the top 10 models: each was among the top 10 in four of the five datasets. Four models were among the top 10 in three of the five datasets: HM-MLP-[Avg], HM-MLP-[FIS-SC], HM-ANFIS-[WtAvg], and HT-(MLP, SVR, ANFIS)-[WtAvg]. Nine models were never among the top 10 in any dataset: ANFIS, HM-MLP-[SVR], HM-MLP-[FIS-FCM], HM-SVR-[FIS-FCM], HM-ANFIS-[SVR], HM-ANFIS-[FIS-FCM], HM-ANFIS-[FIS-SC], HT-(MLP, SVR, ANFIS)-[SVR], and HT-(MLP, SVR, ANFIS)-[FIS-FCM].
At least one of the homogeneous ensembles of MLP was among the top 10 models in each dataset. In addition, at least two of the homogeneous ensembles of SVR were among the top 10 models in each dataset. None of the homogeneous ensembles of ANFIS was among the top 10 models in the Desharnais dataset, but at least one of them was among the top 10 in each of the other datasets. None of the heterogeneous ensembles was among the top 10 models in the Albrecht dataset, but at least two of them were among the top 10 in each of the other datasets.
None of the ensemble models with the nonlinear combiner [FIS-FCM] performed well: they were not among the top 10 models in any dataset. Likewise, none of the ensemble models with the nonlinear combiner [SVR], except HM-SVR-[SVR], was among the top 10 models in any dataset. The HM-SVR-[SVR] model was ranked 10th in the Albrecht dataset and was not among the top 10 models in the other four datasets.
By considering the average rank of each model across the datasets, the best five models are
Ranking of models based on EF metric (top 10 models are highlighted in bold; a cell reading "top 10" is a highlighted entry whose exact rank is not shown).

| Model type | Model | Albrecht | Miyazaki | Maxwell | COCOMO | Desharnais |
|---|---|---|---|---|---|---|
| Individual | MLP | **6** | 20 | 11 | 12 | 21 |
| | SVR | 26 | 24 | 13 | 24 | **5** |
| | ANFIS | 25 | 12 | 27 | 20 | 28 |
| Homogeneous ensemble | HM-MLP-[Avg] | **top 10** | 32 | **top 10** | **top 10** | 11 |
| | HM-MLP-[WtAvg] | **top 10** | 31 | 18 | **top 10** | 12 |
| | HM-MLP-[MLP] | **top 10** | 16 | 29 | **top 10** | 23 |
| | HM-MLP-[SVR] | 11 | 23 | 24 | 27 | 16 |
| | HM-MLP-[FIS-FCM] | 15 | 35 | 19 | 18 | 14 |
| | HM-MLP-[FIS-SC] | **top 10** | 13 | **top 10** | 27 | **top 10** |
| | HM-MLP-[ANFIS-FCM] | **top 10** | **top 10** | 31 | **top 10** | **top 10** |
| | HM-MLP-[ANFIS-SC] | 16 | 33 | **top 10** | 19 | **top 10** |
| Homogeneous ensemble | HM-SVR-[Avg] | 19 | **top 10** | 14 | 17 | **top 10** |
| | HM-SVR-[WtAvg] | 20 | 11 | 20 | 16 | **top 10** |
| | HM-SVR-[MLP] | 21 | 17 | **top 10** | **top 10** | 26 |
| | HM-SVR-[SVR] | **10** | 26 | 22 | 27 | 20 |
| | HM-SVR-[FIS-FCM] | 24 | 14 | 21 | 27 | 32 |
| | HM-SVR-[FIS-SC] | 29 | **top 10** | 15 | 27 | 25 |
| | HM-SVR-[ANFIS-FCM] | **top 10** | 22 | 34 | 13 | 24 |
| | HM-SVR-[ANFIS-SC] | **top 10** | **top 10** | **top 10** | **top 10** | 13 |
| Homogeneous ensemble | HM-ANFIS-[Avg] | 29 | **top 10** | **top 10** | 11 | 30 |
| | HM-ANFIS-[WtAvg] | 22 | **top 10** | **top 10** | **top 10** | 29 |
| | HM-ANFIS-[MLP] | **top 10** | 21 | 26 | 25 | 22 |
| | HM-ANFIS-[SVR] | 12 | 19 | 23 | 23 | 17 |
| | HM-ANFIS-[FIS-FCM] | 17 | 28 | 30 | 27 | 33 |
| | HM-ANFIS-[FIS-SC] | 29 | 30 | 12 | 27 | 34 |
| | HM-ANFIS-[ANFIS-FCM] | 18 | **top 10** | 33 | 21 | 35 |
| | HM-ANFIS-[ANFIS-SC] | 29 | 34 | **top 10** | **top 10** | 31 |
| Heterogeneous ensemble | HT-(MLP, SVR, ANFIS)-[Avg] | 29 | **top 10** | 16 | 14 | **top 10** |
| | HT-(MLP, SVR, ANFIS)-[WtAvg] | 23 | **top 10** | 17 | **top 10** | **top 10** |
| | HT-(MLP, SVR, ANFIS)-[MLP] | 28 | **top 10** | 28 | 27 | **top 10** |
| | HT-(MLP, SVR, ANFIS)-[SVR] | 27 | 25 | 25 | 22 | 19 |
| | HT-(MLP, SVR, ANFIS)-[FIS-FCM] | 29 | 27 | 34 | 27 | 27 |
| | HT-(MLP, SVR, ANFIS)-[FIS-SC] | 29 | 18 | **top 10** | 26 | 15 |
| | HT-(MLP, SVR, ANFIS)-[ANFIS-FCM] | 14 | 29 | 32 | 15 | **top 10** |
| | HT-(MLP, SVR, ANFIS)-[ANFIS-SC] | 13 | 15 | **top 10** | **top 10** | 18 |
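The average-rank criterion used to compare models across datasets can be illustrated with a short sketch. It assumes the average is a plain arithmetic mean of the five per-dataset ranks (the exact averaging procedure is not spelled out here) and uses only rows of the ranking table whose five ranks are all reported; `FULL_ROWS`, `average_ranks`, and `ordering` are illustrative names.

```python
from statistics import mean
from typing import Dict, List

# EF-metric ranks in the order Albrecht, Miyazaki, Maxwell, COCOMO, Desharnais,
# copied from table rows in which all five ranks are reported.
FULL_ROWS: Dict[str, List[int]] = {
    "ANFIS": [25, 12, 27, 20, 28],
    "HM-MLP-[SVR]": [11, 23, 24, 27, 16],
    "HM-ANFIS-[SVR]": [12, 19, 23, 23, 17],
    "HT-(MLP, SVR, ANFIS)-[FIS-FCM]": [29, 27, 34, 27, 27],
}


def average_ranks(rows: Dict[str, List[int]]) -> Dict[str, float]:
    """Arithmetic mean of each model's ranks across datasets (an assumption)."""
    return {model: mean(ranks) for model, ranks in rows.items()}


avg = average_ranks(FULL_ROWS)
# A lower average rank means better overall standing.
ordering = sorted(avg, key=avg.get)

for model in ordering:
    print(f"{model}: {avg[model]:.1f}")
```

Among these four rows, HM-ANFIS-[SVR] has the lowest average rank and the heterogeneous [FIS-FCM] ensemble the highest. This ordering covers only the subset shown, not the full set of 35 models.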
This paper evaluated and compared different homogeneous and heterogeneous ensembles of optimized hybrid computational intelligence models for software development effort estimation, using different linear and nonlinear combiners. We conducted an empirical study to evaluate and compare the performance of these ensembles on five popular datasets. The main findings are as follows. The results confirm that individual models are not reliable, as their performance is inconsistent and unstable across datasets. Although none of the ensemble models was consistently the best, many of them were frequently among the best models for each dataset. The homogeneous ensemble of SVR with the nonlinear combiner ANFIS-SC was the best model when considering the average rank of each model across the five datasets. Based on the empirical results, we do not recommend using the following models, as none of them was among the top 10 models in any of the five datasets: ANFIS, HM-MLP-[SVR], HM-MLP-[FIS-FCM], HM-SVR-[FIS-FCM], HM-ANFIS-[SVR], HM-ANFIS-[FIS-FCM], HM-ANFIS-[FIS-SC], HT-(MLP, SVR, ANFIS)-[SVR], and HT-(MLP, SVR, ANFIS)-[FIS-FCM]. Moreover, we do not recommend using the nonlinear combiner FIS-FCM.
This paper has contributed empirically based insights into the application of different homogeneous and heterogeneous ensemble models to software development effort estimation. Future work includes the assessment of ensembles of other computational intelligence models with different combination rules. Homogeneous ensembles of models other than MLP, SVR, and ANFIS can be developed and evaluated. In addition, heterogeneous ensembles with different base models can be explored, and various linear and nonlinear combination rules can be evaluated in these ensembles. An interesting direction for future research is to investigate what might be the best set of base models for heterogeneous ensembles. Another direction is to apply ensemble models to other software engineering prediction problems, including classification and regression problems in fault and changeability prediction.
The authors would like to acknowledge the support provided by the Deanship of Scientific Research at King Fahd University of Petroleum & Minerals (KFUPM) under Research Grant FT111007. They would like to thank the anonymous reviewers for their insightful comments and suggestions and also Mr. Yasser Khan and Dr. Mohamed El-Attar for their feedback.