Prediction model for object-oriented software development effort estimation using a one hidden layer feed forward neural network with genetic algorithm

The budget computation for software development depends on the prediction of software development effort and schedule, which can be predicted precisely on the basis of past software project data sets. In this paper, a model for object-oriented software development effort estimation using a one hidden layer feed forward neural network (OHFNN) has been developed. The model has been further optimized with a genetic algorithm, taking the weight vectors obtained from the OHFNN as the initial population for the genetic algorithm. Convergence has been obtained by minimizing the sum of squared errors over the input vectors, and the optimal weight vector has been determined to predict software development effort. The model has been empirically validated on the PROMISE software engineering repository data set. The model is more accurate than the well-established constructive cost model (COCOMO).


Introduction
The COCOMO model is the most popular model for software effort estimation. It has been validated on a large data set of projects from TRW's software production system (SPS) in California, USA. The structure of the model depends on the type of project being handled: organic, semidetached, or embedded. The model structure is

$$\text{Effort} = a\,(\text{KLOC})^{b},$$

where $a$ and $b$ are domain-specific parameters. For predicting software development effort, $a$ and $b$ have been adjusted on past data sets of various projects. In COCOMO II, five scale factors generalize and replace the effects of the development mode.

Fifteen parameters affect the effort of software development: analyst capability (ACAP), programmer capability (PCAP), application experience (AEXP), modern programming practices (MODP), use of software tools (TOOL), virtual machine experience (VEXP), language experience (LEXP), schedule constraint (SCED), main memory constraint (STOR), database size (DATA), CPU time constraint (TIME), turnaround time (TURN), virtual machine volatility (VIRT), product complexity (CPLX), and required software reliability (RELY). KLOC is estimated directly or computed from a function point analysis, and the effort adjustment factor is the product of the fifteen effort multipliers:

$$\text{Effort (person-months)} = a\,(\text{KLOC})^{b} \times (EM_1 \times EM_2 \times \cdots \times EM_{15}).$$
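The effort equation above can be evaluated directly. The sketch below is a minimal illustration, assuming the textbook intermediate COCOMO coefficients from Boehm (1981) for each development mode; these are not values fitted in this paper, and the function name `cocomo_effort` is hypothetical.

```python
import math

# Textbook intermediate COCOMO coefficients (Boehm, 1981): mode -> (a, b).
# Assumption: these are standard published values, not parameters fitted here.
COEFFICIENTS = {
    "organic": (3.2, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}


def cocomo_effort(kloc, mode, effort_multipliers):
    """Return effort in person-months: a * KLOC**b * (EM1 * ... * EM15)."""
    if len(effort_multipliers) != 15:
        raise ValueError("intermediate COCOMO uses fifteen effort multipliers")
    a, b = COEFFICIENTS[mode]
    eaf = math.prod(effort_multipliers)  # effort adjustment factor
    return a * kloc ** b * eaf
```

With all fifteen multipliers at their nominal value 1.0, `cocomo_effort(32, "embedded", [1.0] * 15)` gives about 2.8 × 64 = 179.2 person-months; any multiplier above 1.0 scales the estimate up proportionally.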
The proposed prediction model predicts software development effort from sixteen independent parameters: the fifteen cost drivers listed above and KLOC. The past data set has been obtained from the PROMISE repository. All sixteen parameters form the input vector of a one hidden layer feed forward neural network. Through back propagation with gradient descent training, a mapping between input vectors and output vectors has been established by minimizing the sum of squared errors at the output layer. The optimal weight vector obtained from this network is used to predict the software development effort of another PROMISE data set of software projects. The optimal weight vectors obtained from the neural network are then used as the initial population in the GA tool to minimize the root mean square error.
The remainder of the paper is organized as follows. Section 2 reviews related work. Section 3 presents the mathematical model of the neural network approach to effort prediction. Section 4 briefly introduces the genetic algorithm. Section 5 gives the implementation details of the prediction model. Section 6 presents the results and discussion. Section 7 gives the conclusions drawn from the results and the future scope of the research.

Related Work
Yadav and Singh obtained the optimal OHFNN 16-19-1 structure for predicting software development effort, with a best root mean square error of 0.00149074 at learning rate 1.01 and momentum 0.7 in one million epochs [1]. Yadav and Singh also modified COCOMO II by introducing additional parameters for predicting software development effort [2]. Kumar et al. used the mean of the squared distributed error as the fitness function for measuring the performance of a multilayer feed forward neural network in terms of accuracy and epochs [3]. Kumar et al. proposed a model using particle swarm optimization (PSO) to tune the parameters of the basic COCOMO model, considering only the KLOC parameter, to predict software development effort accurately [4]. Praynlin and Latha confirmed that the back propagation algorithm is more efficient than a recurrent neural network [5]. Kumar et al. used real-coded genetic algorithms and the fuzzy lambda-tau methodology for reliability analysis of a waste clean-up manipulator [6]. Shrivastava and Singh evaluated the performance of a feed forward neural network on handwritten English alphabets with three algorithms: back propagation, an evolutionary algorithm, and a hybrid evolutionary algorithm [7]. Sheta and Al-Afeef developed a genetic programming model using lines of code and methodology that predicts software development effort more precisely than other models [8]. Sheta proposed a modified version of the COCOMO model using genetic algorithms (GAs) to explore the effect of the adopted development methodology on effort computation [9].
Reddy and Raju used a single-layer feed forward neural network with the back propagation learning algorithm and the resilient back propagation algorithm to predict software development effort accurately [10]. Reddy and Raju also proposed a multilayer feed forward neural network trained with back propagation, iteratively processing a set of training samples and comparing the network's prediction with the actual effort [11]. Singh and Dhaka analyzed the performance of the back propagation algorithm in feed forward neural networks with changing training patterns and momentum term [12]. Tian and Noore used a genetic algorithm to globally optimize the number of delayed input neurons and the number of hidden-layer neurons in the neural network architecture [13]. Jun and Lee used a quasi-optimal case-selective neural network for software development effort estimation; they adopted the beam search technique and devised a case-set selection algorithm to find the quasi-optimal model from a hierarchy of reduced neural network models [14]. Burgess and Lefley evaluated the potential of genetic programming (GP) in software effort estimation against existing approaches, in terms of accuracy and ease of use [15]. Shukla presented a genetically trained neural network model for predicting software development effort [16]. Khoshgoftaar et al. used a neural network as a tool for predicting the number of software development faults [17].

Here, the principal components are linear combinations of the sixteen independent parameters $x_1, x_2, \ldots, x_{16}$:

$$PC_k = \sum_{i=1}^{16} w_{ki}\,x_i ,$$

where $w_{ki}$ are the coefficients of the $k$th component. The data set is preprocessed as part of the principal component analysis. These sixteen project attributes are correlated with the development effort. In the ANN, the weights are adjusted iteratively to minimize the difference between the desired output and the actual output.
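The paper does not report the component coefficients $w_{ki}$. As a minimal sketch, assuming standard PCA on the correlation matrix of the sixteen attributes (the function name `principal_components` is hypothetical), each component score is a linear combination of the standardized attributes with weights taken from an eigenvector:

```python
import numpy as np


def principal_components(X):
    """Return (scores, variances, weights) for the rows of X (projects x attributes).

    Assumes no attribute column is constant (otherwise standardization divides
    by zero). Column k of `weights` holds the coefficients w_k1..w_kn of the
    k-th component; components are ordered by decreasing variance.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each attribute
    corr = np.corrcoef(Z, rowvar=False)           # correlation matrix (n x n)
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]             # largest variance first
    weights = eigvecs[:, order]
    scores = Z @ weights                          # PC_k = sum_i w_ki * z_i
    return scores, eigvals[order], weights
```

The variance of each score column equals its eigenvalue, and the variances sum to the number of attributes (16 here), so keeping only the leading components retains the directions that carry most of the variation in the data set.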
Network learning means finding a vector of connection weights that minimizes the sum of squared errors on the training data set. The different network architectures have been trained with the standard error back propagation algorithm at different learning rates and momentum values, and reaching the minimum sum of squared errors is used as the training stopping criterion.
Multilayer networks are more powerful than single-layer networks. The designer first chooses an appropriate transfer function; the weight and bias parameters are then adjusted by a learning rule to minimize the difference between the desired and actual output. The log-sigmoid function accepts any input between minus and plus infinity and produces an output in the range 0 to 1:

$$f(n) = \frac{1}{1 + e^{-n}}.$$

The log-sigmoid transfer function is commonly used in multilayer networks trained with the back propagation algorithm because it is differentiable.
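The differentiability that makes log-sigmoid suitable for back propagation shows up in a convenient form: its derivative can be written using only the function's own output, $f'(n) = f(n)\,(1 - f(n))$. A minimal NumPy sketch (function names are illustrative):

```python
import numpy as np


def logsig(n):
    """Log-sigmoid transfer function: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))


def logsig_derivative(n):
    """Slope of log-sigmoid, computed from its output: f'(n) = f(n) * (1 - f(n))."""
    s = logsig(n)
    return s * (1.0 - s)
```

For example, `logsig(0.0)` is 0.5 and the slope there is 0.25, its maximum; the slope shrinks toward zero for large positive or negative inputs.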
Advances in Software Engineering 3

Effort Prediction Model Using One Hidden Layer Feed Forward Neural Network (OHFNN)
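OHFNN with the gradient descent back propagation learning method has been used in this model to estimate object-oriented software development effort. Consider the input vector $X = (x_1, x_2, \ldots, x_n)$, where $n = 16$, and the output vector $Y = (y_1, y_2, \ldots, y_p)$. The neural network is trained on input-output vector mappings. $T = \{(X_k, D_k)\}_{k=1}^{Q}$ is the set of training vector pairs, with $X_k \in \mathbb{R}^n$ and $D_k \in \mathbb{R}^p$, where $n = 16$, $p = 1$, and $Q = 40$. For input $X_k$ the network generates the output signal vector $\mathcal{S}(Y_k)$, where $Y_k$ is the vector of activations of the output layer neurons.

The error at the $k$th training pair $(X_k, D_k)$ is

$$E_k = D_k - \mathcal{S}(Y_k), \quad E_k = (e_1^k, \ldots, e_p^k)^T = \left(d_1^k - \mathcal{S}(y_1^k), \ldots, d_p^k - \mathcal{S}(y_p^k)\right)^T. \qquad (8)$$

The squared error is the sum of squares of the individual output errors $e_j^k$:

$$\varepsilon_k = \sum_{j=1}^{p} \left(e_j^k\right)^2,$$

and the mean square error (mse) is computed over the entire training set $T$:

$$\varepsilon = \frac{1}{Q}\sum_{k=1}^{Q} \varepsilon_k .$$

The weights between the hidden and output layers are updated as

$$w_{hj}^{k+1} = w_{hj}^{k} + \Delta w_{hj}^{k},$$

and the weights between the input and hidden layers as

$$w_{ih}^{k+1} = w_{ih}^{k} + \Delta w_{ih}^{k},$$

where $\Delta w_{hj}^{k}$ and $\Delta w_{ih}^{k}$ are the weight changes computed in the previous step. In gradient descent these changes follow the negative gradient of the error, scaled by the learning rate $\eta$:

$$\Delta w_{hj}^{k} = -\eta\,\frac{\partial \varepsilon_k}{\partial w_{hj}^{k}}, \qquad \Delta w_{ih}^{k} = -\eta\,\frac{\partial \varepsilon_k}{\partial w_{ih}^{k}}.$$

Momentum $\alpha$ is introduced into back propagation by adding a fraction of the previous weight change:

$$\Delta w_{hj}^{k} = -\eta\,\frac{\partial \varepsilon_k}{\partial w_{hj}^{k}} + \alpha\,\Delta w_{hj}^{k-1}, \qquad \Delta w_{ih}^{k} = -\eta\,\frac{\partial \varepsilon_k}{\partial w_{ih}^{k}} + \alpha\,\Delta w_{ih}^{k-1}.$$

Back propagation propagates changes backward to where they can do the most good. Let $o_j$ be the output of node $j$ and $\beta_j$ the benefit of changing it. The change in $\beta_j$ should be proportional to $o_l(1 - o_l)$, the slope of the threshold function at a downstream node $l$. It should also be proportional to $w_{j\to l}$, the weight on the link connecting node $j$ to node $l$. Summing over all nodes $l$ in the next layer, $\beta_j = \sum_l w_{j\to l}\,o_l(1 - o_l)\,\beta_l$. At the output layer, the benefit is given by the error at the output node: $\beta_z = d_z - o_z$. A rate parameter $r$ controls the learning rate, so the change in $w_{i\to j}$ is proportional to $r$:

$$\Delta w_{i\to j} = r\,o_i\,o_j(1 - o_j)\,\beta_j,$$

with $\beta_j = \sum_l w_{j\to l}\,o_l(1 - o_l)\,\beta_l$ for nodes in hidden layers and $\beta_z = d_z - o_z$ for nodes in the output layer. The output of the network is compared with the desired output; if it deviates, the difference between the actual and desired output is propagated back from the output layer to the previous layer to modify the strengths (weights) of the connections.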
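The update rules above can be sketched as online back propagation with momentum for an $n$-hidden-1 network. This is a plain NumPy illustration, not MATLAB's `traingdm` implementation used by the authors. It assumes that inputs and effort are scaled to [0, 1], that log-sigmoid is used in both layers, and that the bias weights sit in the last row of each weight matrix. The function names and the default learning rate, momentum, and epoch count are illustrative, not values from the paper.

```python
import numpy as np


def logsig(n):
    """Log-sigmoid transfer function used in both layers."""
    return 1.0 / (1.0 + np.exp(-n))


def predict(X, W1, W2):
    """Forward pass of the one-hidden-layer network for all rows of X."""
    ones = np.ones((len(X), 1))
    H = logsig(np.hstack([X, ones]) @ W1)              # hidden activations
    return logsig(np.hstack([H, ones]) @ W2).ravel()


def train_ohfnn(X, d, hidden=19, rate=0.2, momentum=0.5, epochs=300, seed=0):
    """Online back propagation with momentum; returns (W1, W2, mse)."""
    rng = np.random.default_rng(seed)
    Q, n = X.shape
    W1 = rng.uniform(-0.5, 0.5, size=(n + 1, hidden))  # input(+bias) -> hidden
    W2 = rng.uniform(-0.5, 0.5, size=(hidden + 1, 1))  # hidden(+bias) -> output
    dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)    # previous weight changes
    for _ in range(epochs):
        for k in rng.permutation(Q):                   # one training pair at a time
            x = np.append(X[k], 1.0)
            h = np.append(logsig(x @ W1), 1.0)
            o = logsig(h @ W2)                         # output, shape (1,)
            beta_out = d[k] - o                        # beta_z = d_z - o_z
            grad_out = o * (1.0 - o) * beta_out        # o_z (1 - o_z) beta_z
            # beta_j = sum_l w_{j->l} o_l (1 - o_l) beta_l (bias row excluded)
            beta_hid = W2[:-1] @ grad_out
            grad_hid = h[:-1] * (1.0 - h[:-1]) * beta_hid
            # Delta w_{i->j} = r o_i o_j (1 - o_j) beta_j, plus momentum term
            dW2 = rate * np.outer(h, grad_out) + momentum * dW2
            dW1 = rate * np.outer(x, grad_hid) + momentum * dW1
            W2 += dW2
            W1 += dW1
    mse = float(np.mean((d - predict(X, W1, W2)) ** 2))
    return W1, W2, mse
```

For a 16-19-1 network the returned matrices are 17×19 and 20×1, which matches the 343 weights described in the implementation section. Passing `epochs=0` returns the mse of the untrained initial weights, which gives a baseline for comparison.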

Genetic Algorithm (GA)
GA is an evolutionary technique generally used to solve optimization problems, and it is regarded as a global optimizer. It is based on the principles of evolution and inheritance. A GA system maintains a population of potential solutions, a selection process based on the fitness of individuals, and a set of biologically inspired operators. GA has both a local search operator, crossover, and a global search operator, mutation. In evolutionary theory, only the fit individuals in a population are likely to survive and generate offspring, and their biological traits are inherited by the next generations. In a large search space, GA performs much better than conventional search and optimization techniques because of its parallelism and the random search implemented by its recombination operators. The following three steps are followed when GA is used to solve a given problem [18].
(1) Randomly create an initial population of potential solutions to the given problem.
(2) For each generation, evaluate the fitness of every individual and create the next generation using the selection, crossover, and mutation operators; repeat until a termination condition is satisfied.
(3) Take the individual with the best fitness value as the optimum solution.
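The three steps can be seen end to end on a toy problem. The sketch below is illustrative only. It assumes bit-string chromosomes and the OneMax objective (maximize the number of 1 bits), with binary tournament selection, one-point crossover, and bit-flip mutation; a fixed number of generations serves as the termination condition. The paper itself uses real-valued coding in MATLAB's GA Tool, and all names here are hypothetical.

```python
import random


def tournament(population, fitness, rng):
    """Binary tournament selection: keep the fitter of two random individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def genetic_algorithm(fitness, length, pop_size=30, generations=60,
                      crossover_rate=0.9, mutation_rate=0.02, seed=1):
    """Maximize `fitness` over bit strings following the three GA steps."""
    rng = random.Random(seed)
    # Step 1: create a random initial population.
    population = [[rng.randint(0, 1) for _ in range(length)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    # Step 2: evolve for a fixed number of generations (termination condition).
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, length)          # one-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):                      # bit-flip mutation
                offspring.append([1 - g if rng.random() < mutation_rate else g
                                  for g in child])
        population = offspring[:pop_size]
        best = max(population + [best], key=fitness)
    # Step 3: the individual with the best fitness is the solution.
    return best
```

Calling `genetic_algorithm(sum, 20)` uses the number of ones as the fitness, so the returned string should be all or nearly all 1 bits.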
A brief description of the operators used in GA and of some basic terminology follows.

Selection. This operator selects fit individuals from the population for reproduction to generate offspring in the next generation. Selection is based on fitness value.
Crossover. This operator generates offspring from each pair of individuals. Each individual contributes a portion of its genetic information to each offspring.
Mutation. This operator randomly changes a tiny amount of genetic information in each offspring.
Chromosome. The complete genetic description of an individual is called its chromosome. It is a collection of basic features called genes.
Gene. This is a single feature within a chromosome. A gene may take any of several values called alleles.
Allele. Allele is a particular value that may be taken by a gene.

Population. A number of chromosomes form a single population.
Objective Function. This is the function to be minimized or maximized according to a certain criterion.
Fitness. This is a measure of how well a parameter set performs.
Schema. This is a collection of genes in a chromosome having certain specified values.
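To connect these terms to code, the sketch below treats a chromosome as a list of real-valued genes, each gene's current value being its allele. It shows one common choice for each operator: tournament selection, arithmetic crossover, and Gaussian mutation. These are assumptions for illustration; the paper does not state which operator variants its GA Tool configuration used. Fitness here is an error to be minimized, so lower is better.

```python
import random


def tournament_select(population, fitness, rng, k=3):
    """Selection: the chromosome with the lowest error among k random picks."""
    return min(rng.sample(population, k), key=fitness)


def arithmetic_crossover(parent1, parent2, rng):
    """Crossover: each child gene is a convex combination of the parents' genes."""
    lam = rng.random()
    child1 = [lam * a + (1 - lam) * b for a, b in zip(parent1, parent2)]
    child2 = [(1 - lam) * a + lam * b for a, b in zip(parent1, parent2)]
    return child1, child2


def gaussian_mutation(chromosome, rng, rate=0.1, sigma=0.05):
    """Mutation: add small Gaussian noise to each gene with probability `rate`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g
            for g in chromosome]
```

Arithmetic crossover keeps every child gene between the two parent alleles, so it mainly exploits the region already sampled. Mutation occasionally moves genes outside that range, which supplies exploration.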
The functioning of GA can be viewed as a balanced combination of exploring new regions of the search space and exploiting already sampled regions. The performance of GA depends on choosing the right control parameters, such as the crossover and mutation probabilities and the population size. The chromosome-level representation is called the genotype; it holds all the information necessary to construct an organism. The organism itself is called the phenotype [16].

Implementation Details of OHFNN Prediction Model Using GA
An OHFNN with 16 input neurons, 19 hidden-layer neurons to develop the input-output mapping, and 1 output neuron to predict development effort in person-months has been used. GA has been used to optimize the weights of OHFNN 16-19-1 so as to minimize the mean squared error over a PROMISE training data set [19]. Here OHFNN 16-19-1 is called the phenotype, and the string of its weights is called the genotype. The genotype is a data structure that represents information about the phenotype and is encoded for use in GA. The input layer has 16 neurons plus one bias, the hidden layer has 19 neurons plus one bias, and the output layer has 1 neuron, so the input-to-hidden weight matrix is 17×19 and the hidden-to-output weight matrix is 20×1. OHFNN 16-19-1 therefore has 17×19 + 20×1 = 343 weights connecting its layers, and these 343 weights are encoded in a chromosomal string. In this optimization problem the fitness function is the root mean square error (rmse):

$$\mathrm{rmse} = \sqrt{\frac{1}{Q}\sum_{k=1}^{Q}\left(d_k - \hat{d}_k\right)^2},$$

where $d_k$ and $\hat{d}_k$ are the actual and predicted efforts of the $k$th of the $Q$ training projects. A real-valued coding scheme has been used to form the string. The predictor has been developed on an Intel Core 2 Duo CPU at 2.10 GHz with 2 GB RAM and 32-bit Windows 7, using the NNtool and GA Tool of MATLAB. First, the four best weight vectors are obtained by training OHFNN 16-19-1 on the PROMISE data set; these four weight vectors are the solutions in the initial population of GA. Excellent results can be obtained with the GA control parameters given in Table 1.
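The genotype and fitness described above can be sketched in NumPy. This is a minimal sketch under stated assumptions. The two weight matrices are flattened into one 343-gene real-valued string, with bias weights in the last row of each matrix. The fitness is the rmse of the 16-19-1 network with log-sigmoid units. The initial population starts with the trained weight vectors. The paper does not say how the remaining individuals of the population were created, so `seeded_population` pads them with small Gaussian perturbations of the trained vectors; that scheme, and all names below, are hypothetical.

```python
import numpy as np

N_IN, N_HID, N_OUT = 16, 19, 1
SHAPE1 = (N_IN + 1, N_HID)    # 17 x 19: input(+bias) -> hidden
SHAPE2 = (N_HID + 1, N_OUT)   # 20 x 1: hidden(+bias) -> output
N_GENES = SHAPE1[0] * SHAPE1[1] + SHAPE2[0] * SHAPE2[1]   # 323 + 20 = 343


def encode(W1, W2):
    """Genotype: flatten both weight matrices into one real-valued chromosome."""
    return np.concatenate([W1.ravel(), W2.ravel()])


def decode(chromosome):
    """Rebuild the two weight matrices of the phenotype from the chromosome."""
    split = SHAPE1[0] * SHAPE1[1]
    return chromosome[:split].reshape(SHAPE1), chromosome[split:].reshape(SHAPE2)


def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))


def rmse_fitness(chromosome, X, d):
    """Fitness to minimize: rmse of the 16-19-1 network over the training set."""
    W1, W2 = decode(chromosome)
    ones = np.ones((len(X), 1))
    H = logsig(np.hstack([X, ones]) @ W1)
    y = logsig(np.hstack([H, ones]) @ W2).ravel()
    return float(np.sqrt(np.mean((d - y) ** 2)))


def seeded_population(trained, size, sigma=0.01, seed=0):
    """Trained weight vectors first, then (hypothetical) perturbed copies up to `size`."""
    rng = np.random.default_rng(seed)
    pop = [np.asarray(c, dtype=float) for c in trained]
    while len(pop) < size:
        base = pop[rng.integers(len(trained))]
        pop.append(base + rng.normal(0.0, sigma, size=N_GENES))
    return np.array(pop)
```

Because `decode` simply inverts `encode`, any GA that works on flat real-valued vectors can search the weight space directly. Each candidate is scored by running the network forward once over the training set.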

Results and Discussion
By varying the number of hidden-layer neurons, OHFNN 16-19-1 was found to be the optimal architecture for both the traingdm (gradient descent with momentum) and traingda (gradient descent with adaptive learning rate) training functions of NNtool. The performance graphs of OHFNN 16-19-1 with traingda and with traingdm are shown in Figures 1 and 2, respectively. The best validation performance of OHFNN 16-19-1 with traingda is 0.0088132 at epoch 2173, and with traingdm it is 0.012839 at epoch 100,900. During the analysis it was found that the development effort of some projects was not predicted precisely, so the proposed algorithm was modified to obtain better results in all cases. In [1] the best root mean square error is 0.00149074 for OHFNN 16-19-1 at learning rate 1.01 and momentum 0.7. Gradient descent does not guarantee that the root mean square error obtained is the global minimum. To explore the problem in a global search space, GA has been used to optimize the fitness function, which is written in terms of the root mean square error. The weight vectors obtained after training the neural network are used as input to the fitness function. Using the selection, crossover, and mutation operators of GA, the root mean square error can be further reduced: it is 0.0014602 after 500 generations with a population size of 10000, as shown in Figure 3. The GA control parameters used are given in Table 1.

Conclusion and Future Scope
In this research work, through a large number of simulations, the OHFNN 16-19-1 architecture has been obtained to predict development effort accurately using GA. The performance of the OHFNN-GA prediction model depends not only on the network architecture and the training algorithm but also on the crossover and mutation probabilities and the population size. In this study, OHFNN 16-19-1 has been fixed for both training algorithms to provide a common platform for comparing performance. In the future, other neural networks such as the radial basis function (RBF) network could be combined with GA in the prediction model. A bidirectional associative memory (BAM) with GA could also be used for better results. Two hidden layer feed forward neural networks (THFNN) with GA can be used to reduce the rmse further. Particle swarm optimization (PSO) can be combined with neural network architectures to predict object-oriented software development effort precisely. Other attributes of object-oriented software can also be predicted with this model. This nonparametric model can establish the relationship between the input vector and the output vector through the weight vector.