Estimation is an important part of software engineering projects, and the ability to produce accurate effort estimates affects key economic processes, including budgeting, bid proposals, and deciding the execution boundaries of a project. This paper explores the interrelationship among different dimensions of software projects, namely, project size, effort, and effort-influencing factors. The study aims to provide better effort estimates by tuning the parameters of a modified COCOMO, using a binary genetic algorithm as a novel optimization algorithm. The significance of the 15 cost drivers is shown by their impact on the MMRE of the efforts estimated for the original 63 NASA projects. The proposed method produces tuned values of the cost drivers, which are effective enough to improve the productivity of the projects. Prediction at different levels of MRE reflects the percentage of projects estimated with the desired accuracy. Furthermore, the model is validated on two datasets, the COCOMO 81 based NASA 63 and NASA 93 datasets, on which it shows better estimation accuracy than COCOMO.
Estimation is an important part of software engineering projects, and the ability to produce accurate effort estimates has an impact on key economic processes, including budgeting and bid proposals and deciding the execution boundaries of the project [
Over the past few years, software development effort has been found to be one of the worst-estimated attributes. Significant over- or underestimates can be very expensive for a company, and the competitiveness of a software company depends heavily on the ability of its project managers to predict accurately, in advance, the effort required to develop software systems [
Many model structures have evolved in the literature; they model the relationship between software effort, developed lines of code (DLOC), and influencing factors:
Building such a relationship as a function helps project managers to accurately allocate the available resources for the project [
Recently, the use of search-based methods has been suggested to address the software development effort estimation problem [
The whole paper is organized in 7 sections. Section
Software development effort estimates are likely to be highly inaccurate and systematically overoptimistic because of cognitive effects such as the valence effect of prediction, anchoring, and the planning fallacy. Empirical evidence suggests that the problem is caused, to some extent, by irrelevant and misleading information present in the estimation material, for example, information about the client’s budget [
Empirical software estimation models are mainly based on cost drivers and scale factors. These models are unstable with respect to the values of the cost drivers and scale factors, which makes the effort estimate highly sensitive to them. In addition, most of the models depend on the size of the project, and a small change in size leads to a proportionate change in effort. Misjudged cost drivers add further noise to the data. For example, misjudging the personnel capability cost driver in COCOMO between “very high” and “very low” results in a 300% increase in effort. Similarly, in SEER-SEM, changing the security requirements value from “low” to “high” results in a 400% increase in effort. In PRICE-S, a small change in the value of the productivity factor causes a 20% change in effort [
Many software estimation models have been proposed by various researchers and can be categorized according to their basic formulation schemes: analogy based estimation schemes [
Among these diverse models, the algorithmic models COCOMO, SLIM, SEER-SEM, and function point (FP) analysis are the most popular in practice in the empirical category [
The limitations in algorithmic models have led to the exploration of nonalgorithmic models which are soft computing based [
The original Constructive Cost Model, abbreviated as COCOMO, was first published by Dr. Barry Boehm in 1981. The word “constructive” indicates that the model is open: its complexity can be easily understood, and it shows exactly why it gives the estimates it does. Among the many efforts made to improve estimation since the inception of software development techniques, COCOMO is the best documented and most transparent, and it reflects the software development practices of those days. The main focus in COCOMO is on estimating the influence of 15 cost drivers on the development effort. The model does not support project management in estimating the size of the software. COCOMO was derived from a database of 63 projects executed between 1964 and 1979 by the American company TRW Systems Inc. The projects considered differed strongly in application type, size, and programming language [
Boehm introduced three levels of the estimation model: basic, intermediate, and detailed. The basic COCOMO 81 is a single-valued, static model that provides an approximate estimate of software development effort and cost as a function of program size expressed in thousands of delivered source instructions (KDSI). The intermediate COCOMO 81 describes software development effort as a function of program size in LOC and a set of fifteen effort multipliers known as “cost drivers.” These cost drivers incorporate subjective assessments of product, project, personnel, and hardware attributes. The advanced or detailed COCOMO 81 reduces the margin of error in the final estimate by combining all characteristics of the intermediate version with an assessment of each cost driver’s impact on each step of the software engineering process, such as analysis and design.
COCOMO assumes that the effort grows more than linearly with software size. For some multipliers, the rating must be increased to decrease the effort; for others, the rating must be decreased to decrease the effort; that is,
So the following equation can be represented as:
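For reference, the standard intermediate COCOMO 81 relationship is sketched below. It is assumed, not confirmed, to be the form intended here; the symbols $a$, $b$, and $EM_i$ are notation chosen for this sketch, with $a$ and $b$ depending on the project mode and $EM_i$ denoting the value of the $i$-th cost driver.

```latex
\begin{align}
E_{\text{nominal}} &= a \cdot (\text{KDSI})^{b}, \qquad b > 1, \\
E &= E_{\text{nominal}} \cdot \text{EAF}, \qquad
\text{EAF} = \prod_{i=1}^{15} EM_i .
\end{align}
```

Because $b > 1$, effort grows faster than linearly with size; a multiplier $EM_i < 1$ lowers the estimate and a multiplier $EM_i > 1$ raises it.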
In contrast to the algorithmic models, the nonalgorithmic models proposed since the 1990s are based on computational intelligence, analytical comparisons, and inference for project cost estimation. They can model the complex set of relationships between the dependent variables (cost, effort) and the independent variables (cost drivers) collected earlier in the project lifecycle, and they can learn from historical project data. Using nonalgorithmic models requires information about previous projects that are similar to the project under estimation; usually, the estimation is carried out by analyzing these historical datasets. Many software researchers have turned to new nonalgorithmic approaches based on soft computing, that is, artificial neural networks, fuzzy logic, and evolutionary algorithms. These methods are widely used for estimation, and a large number of papers about their usage have been published in recent years [
Evolutionary computational methods are generally used in software engineering methodologies such as test case generation [
A genetic algorithm starts with a randomly generated initial population, a set of candidate solutions represented by chromosomes. The algorithm then generates a sequence of new populations. At each iteration, it uses the individuals of the current generation to create the next generation. To create the new population, the algorithm performs the following steps.
(i) Score each member of the current population by computing its fitness value.
(ii) Scale the raw fitness scores to convert them into a more usable range of values.
(iii) Select the good individuals, called parents, based on their fitness values.
(iv) Select a few individuals of the current population with the lowest fitness values as elite; because the fitness is minimized, these are the best individuals, and they pass directly to the next generation (elitism).
(v) Produce offspring from the parents, either by mutating a single parent or by combining the chromosomes of a pair of parents with the crossover operator.
(vi) Update the current population with the offspring to form the new generation.
The algorithm terminates when any one of the stopping criteria is reached, that is, the number of generations or the desired fitness value.
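The steps above can be illustrated with a minimal binary genetic algorithm in Python. This is a sketch under stated assumptions, not the implementation used in this work: the function names, the 12-bit encoding, the search interval, and all parameter values are hypothetical, and tournament selection is used in place of explicit fitness scaling because it depends only on the ranking of the scores.

```python
import random

def decode(bits, low, high):
    """Map a list of 0/1 genes to a real number in [low, high]."""
    as_int = int("".join(map(str, bits)), 2)
    return low + (high - low) * as_int / (2 ** len(bits) - 1)

def binary_ga(fitness, n_bits=12, low=0.5, high=2.0, pop_size=20,
              generations=40, elite=2, p_cross=0.8, p_mut=0.02, seed=0):
    """Minimise fitness(x) for x in [low, high]; return (best_x, best_fitness)."""
    rng = random.Random(seed)
    # Random initial population of binary chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

    def score(chrom):
        return fitness(decode(chrom, low, high))

    def tournament():
        # Pick two distinct individuals and keep the one with lower fitness.
        a, b = rng.sample(pop, 2)
        return a if score(a) <= score(b) else b

    for _ in range(generations):          # stopping criterion: generation count
        pop.sort(key=score)               # lowest (best) fitness first
        next_pop = [c[:] for c in pop[:elite]]   # elitism
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_cross:    # single-point crossover of two parents
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:                         # otherwise copy a single parent
                child = p1[:]
            # Bit-flip mutation, applied gene by gene.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            next_pop.append(child)
        pop = next_pop                    # offspring replace the old population

    best = min(pop, key=score)
    x = decode(best, low, high)
    return x, fitness(x)

# Toy usage: search for the value in [0.5, 2.0] closest to 1.2.
best_x, best_fit = binary_ga(lambda x: (x - 1.2) ** 2)
```

Because the elite individuals are copied unchanged, the best fitness in the population never gets worse from one generation to the next.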
(i) (ii) Input the 15 cost drivers, KLOC, Actual Effort for NASA projects. (iii) [LOOP] (iv) [END OF LOOP] (v) [END]
In submodel 2, the influenced MMRE is calculated for each of the 15 cost drivers. The influenced MMRE shows the effect of each cost driver on the estimated effort, expressed in person-months. For this, we take sample data with 18 input parameters, that is, the 15 cost drivers, the mode, the source lines, and the actual effort. The estimated efforts are calculated for the sample data while nullifying the effect of one cost driver at a time. These efforts are compared with the actual efforts in the sample data to calculate the influenced MMRE for each cost driver. The difference between the influenced MMRE and the original MMRE is recorded in a list along with the driver.
In submodel 3, the list of differences between the influenced MMRE and the original MMRE is sorted in descending order of the difference to give the order of significance of the drivers. This order is named Sig.
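Submodels 2 and 3 can be sketched as follows. The sketch assumes that "nullifying" a cost driver means treating its multiplier as the nominal value 1.0 and that the recorded difference is the influenced MMRE minus the original MMRE; both are interpretations, and the field names (`nominal_effort`, `actual`, `drivers`) and the sample values are hypothetical.

```python
from math import prod

def mmre(actuals, estimates):
    """Mean magnitude of relative error over paired actual/estimated efforts."""
    return sum(abs(a - e) / a for a, e in zip(actuals, estimates)) / len(actuals)

def estimate(project, skip=None):
    """Nominal effort times the product of the cost-driver multipliers.
    The driver named by `skip` is left out, i.e. treated as nominal (1.0)."""
    eaf = prod(v for name, v in project["drivers"].items() if name != skip)
    return project["nominal_effort"] * eaf

def rank_drivers(projects, driver_names):
    """Return (driver, influenced MMRE - original MMRE), largest difference first."""
    actuals = [p["actual"] for p in projects]
    original = mmre(actuals, [estimate(p) for p in projects])
    diffs = []
    for name in driver_names:
        influenced = mmre(actuals, [estimate(p, skip=name) for p in projects])
        diffs.append((name, influenced - original))
    return sorted(diffs, key=lambda item: item[1], reverse=True)

# Made-up sample data with two drivers only, for illustration.
projects = [
    {"nominal_effort": 100.0, "actual": 90.0,
     "drivers": {"acap": 0.86, "cplx": 1.15}},
    {"nominal_effort": 50.0, "actual": 70.0,
     "drivers": {"acap": 1.19, "cplx": 1.30}},
]
sig = rank_drivers(projects, ["acap", "cplx"])
```

In this toy data, removing `acap` worsens the MMRE far more than removing `cplx`, so `acap` is ranked first.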
In submodel 4, we try to minimize the MMRE by updating the values of the cost drivers, in the order of their significance, with the help of the genetic algorithm. This is done by selecting the projects that fall in a particular rating category of a cost driver and then applying the genetic algorithm operators. The results are evaluated using the MMRE as the fitness function. If the MMRE is reduced, the cost driver value for that rating is updated; otherwise, the candidate value is discarded. The reduced MMRE is recorded as Mmod and is used as the reference MMRE for the remaining cost drivers.
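The accept-or-discard logic of submodel 4 can be sketched as below. This is an illustration only: the `propose` callback is a hypothetical stand-in for the genetic algorithm operators, the MMRE is recomputed over all projects (only projects with the affected rating change their estimate), and the data layout and names are assumptions.

```python
def mmre(actuals, estimates):
    """Mean magnitude of relative error."""
    return sum(abs(a - e) / a for a, e in zip(actuals, estimates)) / len(actuals)

def tune(table, order, projects, propose):
    """Greedy tuning of cost-driver values.
    table:   {(driver, rating): multiplier}
    order:   drivers in order of significance
    propose: callable(current_value) -> candidate value (stand-in for the GA)
    Returns the tuned table and the final MMRE (Mmod)."""
    def total_mmre(tbl):
        estimates = []
        for p in projects:
            eaf = 1.0
            for driver, rating in p["ratings"].items():
                eaf *= tbl[(driver, rating)]
            estimates.append(p["nominal_effort"] * eaf)
        return mmre([p["actual"] for p in projects], estimates)

    m_mod = total_mmre(table)
    for driver in order:
        for key in [k for k in table if k[0] == driver]:
            candidate = dict(table)              # copy, so a rejected value is dropped
            candidate[key] = propose(table[key])
            m_new = total_mmre(candidate)
            if m_new < m_mod:                    # keep the value only if MMRE improves
                table, m_mod = candidate, m_new
    return table, m_mod

# Made-up data: both projects are overestimated at the nominal acap rating.
projects = [
    {"nominal_effort": 100.0, "actual": 80.0, "ratings": {"acap": "nominal"}},
    {"nominal_effort": 50.0, "actual": 40.0, "ratings": {"acap": "nominal"}},
]
start = {("acap", "nominal"): 1.0, ("acap", "high"): 0.86}
tuned, m_mod = tune(start, ["acap"], projects, propose=lambda v: v * 0.9)
```

Here the candidate for the nominal rating is accepted because it halves the MMRE, while the candidate for the high rating is discarded because no project uses that rating and the MMRE does not improve.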
Software cost estimation models need to be evaluated quantitatively in terms of estimation accuracy to improve the modeling process, so rules or measurements must be provided for model assessment. A measurement of accuracy defines how close the estimated result is to its actual value. Because software cost estimates play a significant role in delivering software projects, researchers most widely use one evaluation criterion, the mean magnitude of relative error (MMRE), to assess the quality of prediction systems. MMRE is usually computed by following standard evaluation processes such as cross-validation [
COCOMO computes effort on the basis of source lines of code. In intermediate COCOMO, Boehm added 15 predictor variables, called cost drivers, which calibrate the nominal effort of a project to the actual project environment. A value is assigned to each cost driver according to the properties of the specific software project. The numerical values of the 15 cost drivers are multiplied together to obtain the effort adjustment factor (EAF).
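A minimal sketch of this calculation follows. It assumes the published intermediate COCOMO 81 coefficients for the three project modes, which may differ from those of the modified model used here; the function names and the sample project are hypothetical.

```python
from math import prod

# Published intermediate COCOMO 81 coefficients (a, b) per development mode.
# Assumption: the modified model in this work may use different values.
MODE_COEFFICIENTS = {
    "organic": (3.2, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}

def effort_adjustment_factor(multipliers):
    """Multiply the cost-driver values together to obtain the EAF."""
    return prod(multipliers.values())

def cocomo_effort(kloc, mode, multipliers):
    """Estimated effort in person-months: a * KLOC**b * EAF."""
    a, b = MODE_COEFFICIENTS[mode]
    return a * kloc ** b * effort_adjustment_factor(multipliers)

# Hypothetical project: 10 KLOC, organic mode, three non-nominal ratings.
# Drivers that are not listed are nominal (1.0) and do not change the product.
ratings = {"acap": 0.86, "cplx": 1.15, "rely": 1.15}
eaf = effort_adjustment_factor(ratings)
effort = cocomo_effort(10.0, "organic", ratings)
```

Keeping the EAF as a separate function makes it easy to swap in a tuned cost-driver table without touching the size-based part of the estimate.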
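Performance of estimation methods is usually evaluated by several ratio-based accuracy metrics, including RE (relative error), MRE (magnitude of relative error), and MMRE (mean magnitude of relative error), which are computed as follows:
$$\mathrm{RE}_i = \frac{\text{estimate}_i - \text{actual}_i}{\text{actual}_i}, \qquad \mathrm{MRE}_i = \frac{\left|\text{estimate}_i - \text{actual}_i\right|}{\text{actual}_i}, \qquad \mathrm{MMRE} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{MRE}_i,$$
where $N$ is the number of projects.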
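Another parameter used to evaluate the performance of an estimation method is PRED (percentage of prediction), which is determined as
$$\mathrm{PRED}(l) = \frac{k}{N} \times 100,$$
where $N$ is the total number of projects and $k$ is the number of projects whose MRE is less than or equal to $l$ percent.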
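The accuracy measures named above can be sketched in Python as follows. The sketch uses the commonly used definitions (MRE as |actual − estimated| / actual, MMRE as its mean, and PRED(l) as the percentage of projects with an MRE of at most l percent); the function names and the sample numbers are illustrative assumptions, not data from the NASA projects.

```python
def mre(actual, estimated):
    """Magnitude of relative error for one project (as a fraction)."""
    return abs(actual - estimated) / actual

def mmre(actuals, estimates):
    """Mean magnitude of relative error over all projects."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(errors) / len(errors)

def pred(actuals, estimates, level):
    """Percentage of projects whose MRE is at most `level` percent."""
    threshold = level / 100
    hits = sum(1 for a, e in zip(actuals, estimates) if mre(a, e) <= threshold)
    return 100.0 * hits / len(actuals)

# Made-up efforts in person-months, for illustration only.
actual = [100.0, 200.0, 50.0, 80.0]
estimated = [105.0, 150.0, 50.0, 100.0]

overall = mmre(actual, estimated)
within_10 = pred(actual, estimated, 10)
within_30 = pred(actual, estimated, 30)
```

A lower MMRE and a higher PRED both indicate a more accurate model; PRED is less sensitive to a few very large errors because it only counts projects within the threshold.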
Experiments were carried out on the COCOMO 81 based dataset of 63 projects used by NASA, and various calculations were performed on it. A further 93 NASA projects from different centers, covering the years 1971 to 1987, were collected by Jairus Hihn, JPL, NASA, Manager of the SQIP Measurement and Benchmarking Element. The proposed model is validated on these datasets, which are among the most analyzed datasets. The independent variable used is “adjusted delivered source instructions,” which takes into account the variation of effort when adapting software. COCOMO is built upon these data points by introducing many factors in the form of multipliers.
These datasets include 156 historical projects with 17 effort drivers and one dependent variable of the software development effort.
Cost drivers play a vital role in estimation of the efforts and cost to be incurred. They show characteristics of software development that influence effort in carrying out a certain project. Cost drivers are selected based on the arguments that they have a linear effect on effort. COCOMO cost drivers are the basis for the analysis of proposed algorithm. Table
COCOMO cost drivers.
Cost drivers | Very low | Low | Nominal | High | Very high | Extra high |
---|---|---|---|---|---|---|
acap | 1.46 | 1.19 | 1 | 0.86 | 0.71 | |
pcap | 1.42 | 1.17 | 1 | 0.86 | 0.7 | |
aexp | 1.29 | 1.13 | 1 | 0.91 | 0.82 | |
modp | 1.24 | 1.1 | 1 | 0.91 | 0.82 | |
tool | 1.24 | 1.1 | 1 | 0.91 | 0.83 | |
vexp | 1.21 | 1.1 | 1 | 0.9 | | |
lexp | 1.14 | 1.07 | 1 | 0.95 | | |
sced | 1.23 | 1.08 | 1 | 1.04 | 1.1 | |
stor | | | 1 | 1.06 | 1.21 | 1.56 |
data | | 0.94 | 1 | 1.08 | 1.16 | |
time | | | 1 | 1.11 | 1.3 | 1.66 |
turn | | 0.87 | 1 | 1.07 | 1.15 | |
virt | | 0.87 | 1 | 1.15 | 1.3 | |
cplx | 0.7 | 0.85 | 1 | 1.15 | 1.3 | 1.65 |
rely | 0.75 | 0.88 | 1 | 1.15 | 1.4 | |
The significance of the 15 cost drivers is shown by their impact on the MMRE of the efforts estimated for the original 63 NASA projects. The significant occurrences of the 15 cost drivers are calculated by applying step 1 to step 4 and are shown in Table
Significant occurrences of cost drivers.
Rank | Cost driver |
---|---|
1 | acap |
2 | pcap |
3 | aexp |
4 | rely |
5 | virt |
6 | vexp |
7 | time |
8 | modp |
9 | cplx |
10 | data |
11 | tool |
12 | sced |
13 | lexp |
14 | turn |
15 | stor |
The occurrence of each cost driver has a linear relationship with the MMRE calculated between the actual efforts and the efforts estimated with COCOMO. In Figure
Relationship between MMRE and cost drivers.
Once significant occurrences of the cost drivers are found, the sequence of cost drivers is used to produce tuned values for different ratings of various cost drivers. Step 5 and step 6 are used to generate the new values of available cost drivers. Table
Proposed algorithm based cost drivers.
Cost drivers | Very low | Low | Nominal | High | Very high | Extra high |
---|---|---|---|---|---|---|
acap | 1.46 | 1.19 | 0.9 | 0.86 | 0.71 | |
pcap | 1.42 | 0.9 | 1 | 0.86 | 0.7 | |
aexp | 1.29 | 1.4 | 1 | 0.91 | 0.82 | |
modp | 1.38 | 0.92 | 1 | 0.91 | 0.82 | |
tool | 1.24 | 1.1 | 0.99 | 0.93 | 0.83 | |
vexp | 1.38 | 1.03 | 1 | 0.9 | | |
lexp | 1.14 | 1.08 | 0.9 | 0.95 | | |
sced | 1.23 | 1.08 | 0.99 | 1.04 | 1.1 | |
stor | | | 1 | 1.06 | 1.19 | 1.38 |
data | | 1.03 | 0.9 | 1.06 | 1.38 | |
time | | | 0.9 | 1.11 | 1.3 | 1.66 |
turn | | 0.97 | 0.92 | 1.03 | 0.9 | |
virt | | 0.87 | 1 | 1.15 | 1.3 | |
cplx | 0.7 | 0.85 | 1.11 | 1.15 | 1.16 | 1.65 |
rely | 0.75 | 0.88 | 1 | 1.25 | 1.4 | |
The proposed algorithm is validated with two different datasets of NASA projects. According to the evaluation criteria, the efforts estimated by the proposed method differ only marginally from the actual project efforts, in comparison with the COCOMO-generated efforts, as shown in Figure
Comparison of MRE for NASA 63 projects.
Comparison of MRE for NASA 93 projects.
Comparison of productivity for NASA 63 projects.
A comparison is made between proposed method and other estimation methods by MMRE in Table
The MMRE for two different methods.
MMRE for 63 datasets: COCOMO versus actual | MMRE for 63 datasets: proposed method versus actual | MMRE for 93 datasets: COCOMO versus actual | MMRE for 93 datasets: proposed method versus actual |
---|---|---|---|
0.36 | 0.27 | 0.59 | 0.56 |
Essentially, we want to measure useful functionality produced per time unit. Productivity is another measurement of effectiveness of the model. It is a measure of the rate or ratio at which individual software developers involved in software development produce software and associated documentation.
Higher productivity reflects better quality achievement in project development. The proposed method has a productivity of 0.29, which is close to the actual productivity of 0.27. Compared with the actual productivity, the productivity of the proposed method is 7 percent higher and that of COCOMO is 9 percent lower (Table
The productivity of various approaches.
Measure | COCOMO | Proposed method | Actual |
---|---|---|---|
Productivity | 0.25 | 0.30 | 0.28 |
Difference from actual | 0.03 | 0.02 | |
Tables
MMRE of NASA 63 projects for various project modes.
Project mode | No. of projects (63) | MMRE for proposed method | MMRE for COCOMO |
---|---|---|---|
Embedded | 27 | 0.29 | 0.39 |
Organic | 25 | 0.28 | 0.37 |
Semidetached | 11 | 0.22 | 0.23 |
MMRE of NASA 93 projects for various project modes.
Project mode | No. of projects (93) | MMRE for proposed method | MMRE for COCOMO |
---|---|---|---|
Embedded | 21 | 0.72 | 0.82 |
Organic | 3 | 0.8 | 0.88 |
Semidetached | 69 | 0.51 | 0.51 |
Description of projects on application basis.
Type of application | No. of projects | MMRE COCOMO | MMRE proposed method |
---|---|---|---|
Application_ground | 2 | 0.28 | 0.25 |
Avionics | 11 | 0.95 | 0.80 |
Avionics monitoring | 30 | 0.66 | 0.55 |
Batch data processing | 2 | 0.08 | 0.12 |
Communications | 1 | 0.18 | 0.05 |
Data capture | 3 | 0.09 | 0.07 |
Launch processing | 1 | 0.32 | 0.46 |
Mission planning | 20 | 0.38 | 0.34 |
Monitor_control | 8 | 0.20 | 0.50 |
Operating system | 4 | 3.82 | 3.63 |
Real data processing | 3 | 0.12 | 0.06 |
Science | 2 | 0.18 | 0.41 |
Simulation | 4 | 0.17 | 0.29 |
Utility | 2 | 0.12 | 0.31 |
PRED was calculated with the two separate approaches and Table
Pred calculation at different values for both the models.
Dataset | COCOMO PRED(10) | COCOMO PRED(20) | COCOMO PRED(30) | Proposed method PRED(10) | Proposed method PRED(20) | Proposed method PRED(30) |
---|---|---|---|---|---|---|
Percentage of 63 NASA datasets | 23.81 | 39.68 | 57.14 | 25.4 | 42.86 | 61.91 |
The work carried out in this paper explores the interrelationship among different dimensions of data-driven software projects, namely, project size and effort. The results above demonstrate that applying the proposed method to software effort estimation is a feasible approach for addressing the uncertainty and ambiguity in software effort drivers. The order of significance of the cost drivers has a significant impact on the overall effort estimate. Small adjustments to the COCOMO cost drivers bring significant improvements in the quality criteria applied to the proposed approach. The proposed method produces tuned values of the cost drivers, which are effective enough to improve the productivity of the projects. Prediction at different levels of MRE reflects the percentage of projects estimated with the desired accuracy. Furthermore, the model is validated on two datasets, the COCOMO 81 based NASA 63 and NASA 93 datasets, on which it shows better estimation accuracy than COCOMO. Applying the proposed algorithm to other problems in the software engineering field can be explored in the future.
The authors certify that there is no actual or potential conflict of interests in relation to this paper. The American Company TRW Systems Inc. has been referred to as the company where Barry W. Boehm, the developer of COCOMO, worked.