Cost estimates are essential for the success of construction projects. Neural networks, as tools of artificial intelligence, offer significant potential in this field. Applying neural networks, however, requires dedicated studies because of the specifics of different kinds of facilities. This paper proposes an approach, based on neural networks, to estimating the construction costs of sports fields. The general applicability of artificial neural networks to the formulated cost estimation problem is investigated. The applicability of multilayer perceptron networks is confirmed by the results of the initial training of a set of various artificial neural networks. Moreover, one network was tailored for mapping the relationship between the total cost of construction works and the selected cost predictors which are characteristic of sports fields. Its prediction quality and accuracy were assessed positively. The research results legitimize the proposed approach.
The results presented in this paper are part of broader research, in which the authors participate, aimed at developing tools for fast cost estimates dedicated to the construction industry. The main aim of this paper is to present the results of the investigations on the applicability of artificial neural networks (ANNs) to the problem of estimating the total cost of construction works in the case of sports fields as specific facilities. The authors propose herein a new approach based on ANNs for estimating construction costs of sports fields.
Cost estimation is a key issue in construction projects. Both underestimation and overestimation of costs may lead to a failure of a construction project. The use of different tools and techniques in the whole project life cycle should provide information about costs to the participants of the project and support a complex decision-making process. In general, cost estimating methods can be classified as follows [
- Qualitative cost estimating:
  - Cost estimating based on heuristic methods
  - Cost estimating based on expert judgments
- Quantitative cost estimating:
  - Cost estimating based on statistical methods
  - Cost estimating based on parametric methods
  - Cost estimating based on nonparametric methods
  - Cost estimating based on analogous/comparative methods
  - Cost estimating based on analytical methods
The expectations of the construction industry are to shorten the time necessary to predict costs, whilst on the other hand, the estimates must be reliable and accurate enough. There are worldwide publications in which the authors report the research results which respond to these expectations. The examples of the use of a regression analysis (based on both parametric and nonparametric methods) are as follows: application of multivariate regression to predict accuracy of cost estimates on the early stage of construction projects [
Artificial neural networks (ANNs) can be defined as mathematical structures and their implementations (both hardware and software), whose mode of action is based on and inspired by nervous systems observed in nature. In other words, ANNs are tools of artificial intelligence which have the ability to model data relationships with no need to assume a priori the equations or formulas which bind the variables. The networks come in a wide variety depending on their structures, way of processing signals, and applications. The theory in this subject is widely presented in the literature (e.g., [
- Applicability in regression problems where the relationships between the dependent and many independent variables are difficult to investigate
- Ability to gain knowledge in the automated training process
- Ability to build and store the knowledge on the basis of the collected training patterns (real-life examples)
- Ability of knowledge generalization; predictions can be made for the data which have not been presented to the ANNs during a training process
Some examples of ANN applications reported for a range of cost estimating and cost analyses in construction are replication of past cost trends in highway construction and estimation of future costs trends in this field in the state of Louisiana, USA [
It needs to be emphasized that, despite a number of publications reporting research projects on the use of artificial neural networks in cost analyses and cost estimation in construction, each of the problems is specific and unique. Each such problem requires an individual approach and investigation due to distinct conditions, determinants, and factors that influence the costs of construction projects. An individual approach to cost estimation in construction is needed primarily because of the specificity of the facilities, including sports fields. The costs of a sports field are significant not only at the construction stage but also later, in terms of its maintenance. The decisions made about the size, functionality, and quality are crucial for the future use and operational management of sports fields. A successful investigation of the applicability of ANNs to this problem will make it possible to propose a new approach to estimating the construction cost of sports fields. The new approach, based on the advantages offered by neural networks, will allow predicting the total construction cost of sports fields much faster than with traditional methods; moreover, it will make it possible to check many variants and their influence on the cost in a very short time.
The general aim of the research was to develop a model that supports the process of estimating construction costs of sports fields. The authors decided to investigate implementation of ANNs for the purpose of mapping multidimensional space of cost predictors into a one-dimensional space of construction costs. In a formal notation, the problem can be defined generally as follows:
where
In the statistical sense, the problem comes down to solving a regression problem and estimating a relationship
A general framework of the adopted research strategy is depicted in Figure
Scheme of the research framework (source: own study).
Sports fields are facilities for which some types of works are usually repeated during the construction stage. The main types of works that can be listed are geodetic surveying, earthworks (topsoil stripping, trenching, compacting of the natural subgrade, etc.), works on subgrade preparation for the sports field surface, works on the sports field surface (usually surfaces are either natural or synthetic grass), assembly of fixtures and in-ground furnishings (e.g., football/handball goals, basketball goal systems, volleyball or tennis posts and nets), works on fencing and ball-net installation, minor road works and works on sidewalks, and landscape works and arranging green areas around the sports fields.
In the course of the research, a number of completed projects on sports fields in Poland were investigated. Both the fields dedicated to one discipline and multifunctional fields were taken into account. The facilities subject to the analysis differed in size of the playing area, arranged area for communication, arranged green area, and fencing. The surfaces were of two types: either natural or synthetic grass. It must be stressed here that the quality expectations for surfaces varied significantly and played an important role in the construction costs. The completed facilities are located all over Poland both in the urban areas (in cities of different sizes) and outside the urban areas (in the villages).
As the problem was formally expressed and the assumption about the use of ANNs was made, the authors focused on the analyses which allowed them to preselect cost predictors. The preselection was preceded by studying both technical and cost aspects of the construction projects on sports fields. This stage of the research allowed collecting the necessary background knowledge about the nature of sports fields as specific construction objects with their characteristic elements, range and sequence of construction works which must be completed, and the clients’ quality expectations.
In the next step, 129 construction projects on sports fields that were completed in Poland in recent years were investigated. For the purposes of fast cost analysis, the authors preselected the following data to be the variables of the sought-for relationship:
- Total cost of construction works as a dependent variable
- Playing area of a sports field, location of a facility, number of sport functions, the type of the playing field’s surface (natural or artificial), quality standard of the playing field’s surface, ball stop net’s surface, arranged area for communication, fencing’s length, and arranged greenery area as independent variables
The criteria for this preselection were the availability of the data in the investigated tender documents and keeping the developed model simple enough that a potential client could formulate expectations about the sports field to be ordered by specifying values for the potential cost predictors at an early stage of the project.
Most of the mentioned variables were of a quantitative type. In the case of the location of the facility, type of the playing field’s surface, and quality standard of the playing field’s surface, only descriptive information was available; the three variables were of the categorical type. Table
Characteristics of dependent variables and independent variables considered initially to be used in the course of a regression analysis (source: own study).
Description of the variable | Variable type | Values |
---|---|---|
Total cost of construction works | Quantitative | Cost given in thousands of PLN |
Playing area of the sports field | Quantitative | Surface area measured in m2 |
Location of the facility | Categorical | Urban area (big cities, medium cities, small cities) or outside the urban area (villages) |
Number of sport functions | Quantitative | Number of sports that can be played in the field |
Type of the playing field surface | Categorical | Natural or artificial grass |
Quality standard of the playing field’s surface | Categorical | Quality standard assessed according to the available information in the tender documentation |
Ball stop net’s surface | Quantitative | Surface area measured in m2 |
Arranged area for communication | Quantitative | Surface area measured in m2 |
Fencing length | Quantitative | Length measured in m |
Arranged green area | Quantitative | Surface area measured in m2 |
The next step included data collection and scaling categorical variables. In the case of three of the variables (namely, location of the facility, type of the playing field surface, and quality standard of the playing field’s surface), categorical values were replaced by numerical values. The studies of the problem, analyses of the number of completed construction projects on sports fields, and, especially, the analyses of construction works costs brought the conclusions of how the categorical values of the three variables are associated with the costs of construction works. Table
General relationship between the three categorical variables and cost (source: own study).
Location of a facility | Type of the playing field surface | Quality standard of the playing field’s surface | Cost |
---|---|---|---|
Urban area, big cities | Artificial grass | High demands | Higher |
Urban area, medium cities | | Moderate demands | |
Urban area, small cities | Natural grass | | Lower |
Outside the urban area, villages | | Low demands | |
Categorical values for the location of the facility were replaced by numerical values as follows: urban area: 0.9 for big cities (population over 100,000), 0.66 for medium cities (population between 20,000 and 100,000), and 0.33 for small cities (population below 20,000); outside the urban area: 0.1 for villages. In the case of the type of the playing field surface, artificial grass was assigned the value of 0.9 and natural grass the value of 0.1. Finally, depending on the client’s expectations and specifications available in the tender documents, the descriptions of the demands for quality standard of the playing field’s surface took values from the range between 0.1 and 0.9.
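The replacement of categorical values described above can be sketched in a few lines of Python. This is only an illustration: the function names and the `is_urban`/`population` arguments are hypothetical, and the treatment of the exact boundary values (20,000 and 100,000 inhabitants) as medium cities is an assumption read from the wording of the thresholds.

```python
def encode_location(is_urban: bool, population: int = 0) -> float:
    """Map the location category to the numerical value quoted in the text."""
    if not is_urban:
        return 0.1   # outside the urban area (villages)
    if population > 100_000:
        return 0.9   # big city
    if population >= 20_000:
        return 0.66  # medium city (boundary handling is an assumption)
    return 0.33      # small city


def encode_surface(surface: str) -> float:
    """Map the playing field surface type to its numerical value."""
    codes = {"artificial": 0.9, "natural": 0.1}
    return codes[surface.strip().lower()]


if __name__ == "__main__":
    print(encode_location(True, 250_000), encode_surface("Artificial"))
    print(encode_location(False), encode_surface("natural"))
```

The quality standard is not covered by the sketch because it was assessed case by case from the tender documents and only its range (0.1 to 0.9) is given.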
The studies of tender documents for public construction projects where completion of sports fields was the subject matter of the contract allowed for collecting the data for 129 projects. The data were collected for projects completed in the last four years all over Poland. The collected information was ordered in the database. After the analysis of outliers, the authors decided to reject some extreme cases for which the total construction cost was unusually high or unusually low. After the elimination of outliers, the data for 115 projects remained.
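The rule used to identify outliers is not stated here. Purely as an illustration of how unusually high or low total costs could be screened, the sketch below applies the common interquartile range (IQR) rule; the rule itself, the multiplier `k=1.5`, and the record layout with a `total_cost` key are assumptions, not the authors' procedure.

```python
import statistics


def drop_cost_outliers(records, key="total_cost", k=1.5):
    """Keep only records whose cost lies within [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule is a swapped-in, generic technique (assumption).
    """
    costs = [r[key] for r in records]
    q1, _, q3 = statistics.quantiles(costs, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in records if low <= r[key] <= high]


if __name__ == "__main__":
    # Hypothetical costs in thousands of PLN; the last one is extreme.
    sample = [{"total_cost": c} for c in (100, 110, 120, 130, 140, 150, 160, 5000)]
    print(len(drop_cost_outliers(sample)))
```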
Further analysis included the investigation of the significance of correlations between the dependent variable and all of the initially considered independent variables, preselected cost predictors. The significance of correlations for
Significance of correlations between the variables (source: own study).
Preselected cost predictors | Is the correlation between the dependent variable, total cost of construction works, and independent variable significant for | Variable’s symbol (for accepted variables only) |
---|---|---|
Playing area of the sports field | Yes | |
Location of the sports ground | No | - |
Number of sport functions | No | - |
Type of the playing field surface | Yes | |
Quality standard of the playing field’s surface | Yes | |
Ball stop net’s surface | Yes | |
Arranged area for communication | Yes | |
Fencing length | Yes | |
Arranged greenery area | Yes | |
As the correlations for two of the preselected cost predictors (namely, location of the sports ground and the number of sport functions) appeared to be insignificant, these predictors were rejected and no longer taken into account.
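A significance check of this kind can be sketched as follows. The sketch computes Pearson's correlation coefficient and an approximate two-sided p-value from the Fisher z-transformation; this approximation, the function names, and the significance level `alpha=0.05` are assumptions made for illustration, and the test actually used may differ.

```python
import math
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_significance(x, y, alpha=0.05):
    """Return (r, approximate p-value, significant?) for H0: no correlation.

    Uses the Fisher z approximation (requires more than 3 observations);
    alpha is an assumed significance level.
    """
    n = len(x)
    r = pearson_r(x, y)
    r_safe = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r_safe) * math.sqrt(n - 3)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return r, p, p < alpha


if __name__ == "__main__":
    # Total cost and playing area from the exemplary database records.
    cost = [565.8, 1359, 427.6, 489.5, 323.0, 181.3, 1972.3, 161.1, 250.0, 800.5]
    area = [968, 3292, 1860, 1860, 800, 800, 4131, 1650, 1470, 1104]
    print(correlation_significance(area, cost))
```

A predictor would be kept only if the returned flag is true, which mirrors the accept/reject decision recorded in the table above.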
Table
Exemplary records of the database including training patterns (source: own study).
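Record no. | Total cost (thousand PLN) | Playing area (m2) | Type of surface | Quality standard | Ball stop net’s surface (m2) | Arranged area for communication (m2) | Fencing length (m) | Arranged greenery area (m2) |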
---|---|---|---|---|---|---|---|---|
5 | 565.8 | 968 | 0.9 | 0.6 | 0.0 | 196.8 | 602.5 | 0.0 |
13 | 1359 | 3292 | 0.9 | 0.5 | 600.0 | 2142.0 | 0.0 | 0.0 |
23 | 427.6 | 1860 | 0.9 | 0.3 | 1116.0 | 78.0 | 37.5 | 1000.0 |
37 | 489.5 | 1860 | 0.1 | 0.3 | 240.0 | 100.0 | 0.0 | 0.0 |
46 | 323.0 | 800 | 0.9 | 0.5 | 192.0 | 181.8 | 139.7 | 0.0 |
59 | 181.3 | 800 | 0.1 | 0.3 | 72.0 | 0.0 | 0.0 | 400.0 |
67 | 1972.3 | 4131 | 0.9 | 0.5 | 396.0 | 1096.0 | 207.1 | 2586.4 |
82 | 161.1 | 1650 | 0.1 | 0.1 | 300.0 | 0.0 | 0.0 | 0.0 |
94 | 250.0 | 1470 | 0.1 | 0.2 | 344.5 | 93.9 | 38.5 | 0.0 |
101 | 800.5 | 1104 | 0.9 | 0.8 | 295.9 | 1469.1 | 93.3 | 374.6 |
Table
Descriptive statistics for the models’ variables (source: own study).
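Role | Variable | Average value | Minimum value | Maximum value | Standard deviation |
---|---|---|---|---|---|
Dependent variable | Total cost of construction works (thousand PLN) | 457.56 | 33.30 | 2592.50 | 373.14 |
Independent variables | Playing area of the sports field (m2) | 1333.79 | 275.00 | 5600.00 | 788.49 |
| Type of the playing field surface | 0.59 | 0.10 | 0.90 | 0.27 |
| Quality standard of the playing field’s surface | 0.49 | 0.10 | 0.90 | 0.16 |
| Ball stop net’s surface (m2) | 300.26 | 0.00 | 2212.00 | 345.27 |
| Arranged area for communication (m2) | 193.16 | 0.00 | 2142.00 | 325.84 |
| Fencing length (m) | 105.24 | 0.00 | 602.50 | 121.13 |
| Arranged greenery area (m2) | 324.29 | 0.00 | 3000.00 | 603.26 |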
It is noteworthy that minimum value, namely, 0.00, for the variables
The database records (whose number equalled 115) were used as training patterns
The values of the variables were scaled automatically before and after each ANN’s training. This was due to the functionality of the ANN software simulator used in the course of the research. The variables were scaled linearly to the range of values appropriate for the activation functions employed in a given ANN. The results presented further in the paper, especially the ANNs’ training errors, are given as original, unscaled values.
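Linear scaling of this kind, together with the inverse transformation needed to report results in original units, can be sketched as below. The helper name and the target ranges are assumptions; the exact ranges applied by the simulator for each activation function are not given here.

```python
def fit_linear_scaler(values, target=(0.0, 1.0)):
    """Build scale/unscale functions mapping [min, max] of values onto target."""
    lo, hi = min(values), max(values)
    t_lo, t_hi = target
    span = hi - lo
    if span == 0:
        raise ValueError("cannot scale a constant variable")

    def scale(v):
        return t_lo + (v - lo) * (t_hi - t_lo) / span

    def unscale(s):
        return lo + (s - t_lo) * span / (t_hi - t_lo)

    return scale, unscale


if __name__ == "__main__":
    # Minimum, average and maximum playing area from the descriptive statistics.
    scale, unscale = fit_linear_scaler([275.0, 1333.79, 5600.0])
    s = scale(1333.79)
    print(s, unscale(s))
```

A target of `(-1.0, 1.0)` could be passed instead for an activation function with a symmetric output range, such as the hyperbolic tangent.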
After the selection of independent variables, a formal notation of the relationship in the statistical sense can be given as follows:
The aim of this stage of the research, namely, the initial training of ANNs, was to assess their general applicability to the problem and to decide whether or not to continue the research. A variety of feedforward ANNs were trained in the automatic mode. The overall number of networks equalled 200; the authors took into account 100 multilayer perceptron (MLP) networks and 100 radial basis function (RBF) networks as the types appropriate for the regression analysis and suitable for the formulated problem.
The main criteria to assess the applicability of the ANNs were the quality of predictions made by trained networks and the errors. The measure for quality of predictions was Pearson’s correlation coefficient
In the case of RBF networks, both the quality of predictions and the errors were so unsatisfactory that the authors decided to focus on the MLP networks only. The results for the MLP networks were satisfactory; they are presented synthetically below in Figure
- From the database of training patterns, learning (
- The overall number of available training patterns was divided into the three subsets in relation:
- For each drawing, 10 different networks were trained.
- The networks varied in the number of neurons in the hidden layer,
- Distinct activation functions, such as linear, sigmoid, hyperbolic tangent, and exponential, were applied in the neurons of the hidden and output layers.
Quality and errors of ANNs after the initial training phase: (a) scatter diagram of Pearson’s correlation coefficients and (b) scatter diagram of errors
Learning and validating, that is,
From (
The overall results, in terms of the quality of predictions and errors, are presented in Figures
Quality and errors of ANNs after the initial training phase: (a) scatter diagram of Pearson’s correlation coefficients and (b) scatter diagram of errors
Part (a) of Figures
Table
Summary of the initial training of ANNs for MLP networks (source: own study).
Subset | Correlation coefficient: average | Correlation coefficient: standard deviation | | Error: average | Error: standard deviation |
---|---|---|---|---|---|
| 0.901 | 0.240 | 82.1% | 145.5 | 111.7 |
| 0.879 | 0.228 | 81.0% | 108.8 | 57.8 |
| 0.881 | 0.233 | 80.5% | 126.9 | 83.3 |
As can be seen both in Figures
This stage of the research confirmed the general applicability of ANNs to the investigated problem. The decision was to continue the research. Moreover, it allowed choosing a group of MLP networks to be trained in the next stage.
With respect to the initial training results, a group of 5 networks was chosen for the closing phase of the research. The details including networks’ structures and activation functions are given in Table
Details of the selected ANNs for further training (source: own study).
ANN | Number of neurons in the hidden layer | Activation function hidden layer | Activation function output layer | Training algorithm |
---|---|---|---|---|
| 2 | Exponential | Linear | BFGS |
| 2 | Exponential | Hyperbolic tangent | BFGS |
| 3 | Exponential | Linear | BFGS |
| 5 | Sigmoid | Linear | BFGS |
| 5 | Hyperbolic tangent | Exponential | BFGS |
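To make the structures listed in the table more concrete, the sketch below computes the output of a single-hidden-layer MLP for one input pattern. It assumes seven inputs (the accepted cost predictors, already scaled) and mirrors the row with 5 sigmoid hidden neurons and a linear output; the weights are random placeholders, not trained values, and the exponential activation is left out because its exact definition depends on the software used.

```python
import math
import random

# Activation functions named in the table (exponential omitted, see lead-in).
ACTIVATIONS = {
    "linear": lambda a: a,
    "sigmoid": lambda a: 1.0 / (1.0 + math.exp(-a)),
    "tanh": math.tanh,
}


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out,
                hidden_act="sigmoid", out_act="linear"):
    """Output of a one-hidden-layer MLP with a single output neuron."""
    f_hidden = ACTIVATIONS[hidden_act]
    f_out = ACTIVATIONS[out_act]
    hidden = [
        f_hidden(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return f_out(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


if __name__ == "__main__":
    rng = random.Random(0)
    n_inputs, n_hidden = 7, 5  # 7 predictors, 5 hidden neurons (assumed layout)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    x = [0.2, 0.9, 0.5, 0.1, 0.1, 0.2, 0.0]  # hypothetical scaled pattern
    print(mlp_forward(x, w_hidden, b_hidden, w_out, b_out=0.0))
```

Training (for example with the BFGS algorithm named in the table) would adjust the weights and biases; the sketch covers only the forward computation.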
Assumptions for the networks training and testing were different than in the initial stage. From the set of 115 training patterns, the testing subset,
The remaining patterns have been involved in the 10-fold cross-validation of the networks (cf. [
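The mechanics of a 10-fold cross-validation can be sketched as follows: the patterns are shuffled once, split into 10 folds, and each fold serves once as the validating subset while the other nine form the learning subset. The number of patterns (100), the seed, and the function name are placeholders, since the size of the set left after separating the testing subset is not given here.

```python
import random


def k_fold_indices(n_patterns, k=10, seed=0):
    """Yield (learning, validating) index lists for k-fold cross-validation."""
    indices = list(range(n_patterns))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal folds
    for i in range(k):
        validating = folds[i]
        learning = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield learning, validating


if __name__ == "__main__":
    # 100 is a hypothetical number of patterns.
    for learning, validating in k_fold_indices(100):
        print(len(learning), len(validating))
```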
The performance of ANNs was assessed in general in the light of correlation between real-life and predicted values,
Results of the selected ANNs training (source: own study).
ANN | | | ||||
---|---|---|---|---|---|---|
| | | | | | |
| ||||||
Max | 0.992 | 0.997 | 0.997 | 92.043 | 111.521 | 68.006 |
Average | 0.985 | 0.982 | 0.993 | 66.416 | 63.681 | 41.286 |
Min | 0.973 | 0.948 | 0.985 | 49.572 | 20.138 | 26.656 |
Standard deviation | 0.005 | 0.015 | 0.004 | 11.620 | 24.243 | 12.338 |
| ||||||
Max | 0.994 | 0.994 | 0.991 | 150.544 | 376.102 | 79.538 |
Average | 0.978 | 0.979 | 0.983 | 73.351 | 90.999 | 57.609 |
Min | 0.933 | 0.933 | 0.979 | 43.473 | 40.899 | 35.816 |
Standard deviation | 0.022 | 0.018 | 0.003 | 31.521 | 95.430 | 13.082 |
| ||||||
Max | 0.994 | 0.994 | 0.991 | 150.544 | 376.102 | 79.538 |
Average | 0.978 | 0.979 | 0.983 | 73.351 | 90.999 | 57.609 |
Min | 0.933 | 0.933 | 0.979 | 43.473 | 40.899 | 35.816 |
Standard deviation | 0.022 | 0.018 | 0.003 | 31.521 | 95.430 | 13.082 |
| ||||||
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| ||||||
Max | 0.996 | 0.995 | 0.996 | 129.738 | 211.452 | 83.939 |
Average | 0.979 | 0.980 | 0.980 | 77.525 | 72.796 | 50.383 |
Min | 0.946 | 0.957 | 0.952 | 32.191 | 41.891 | 27.691 |
Standard deviation | 0.013 | 0.013 | 0.014 | 27.375 | 49.569 | 18.072 |
Figure
Scatter plot of
Two more criteria, relating to the accuracy of cost estimation, were also specified for assessment of the selected network
Table
| | | |
---|---|---|---|
MAPE | |||
Max | 18.76% | 25.13% | 21.20% |
Average | 12.68% | 14.43% | 9.97% |
Min | 7.49% | 5.54% | 5.60% |
Standard deviation | 3.49% | 5.66% | 4.44% |
| |||
Max | 115.83% | 104.72% | 93.42% |
Average | 65.47% | 61.55% | 46.67% |
Min | 34.34% | 9.76% | 11.87% |
Standard deviation | 21.35% | 27.59% | 17.65% |
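For reference, the mean absolute percentage error (MAPE) reported in the table can be computed as in the sketch below, using its standard definition; the function names and the sample values are hypothetical. The per-pattern errors are returned separately so that their maximum can be inspected as well.

```python
def absolute_percentage_errors(actual, predicted):
    """Absolute percentage error of each prediction, in percent."""
    return [abs(a - p) / abs(a) * 100.0 for a, p in zip(actual, predicted)]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = absolute_percentage_errors(actual, predicted)
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical real-life and predicted costs in thousands of PLN.
    real = [100.0, 200.0, 400.0]
    pred = [110.0, 180.0, 400.0]
    print(mape(real, pred), max(absolute_percentage_errors(real, pred)))
```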
A thorough analysis of the selected network, namely,
In this paper, the authors presented their investigations on the applicability of ANNs to the problem of estimating the total cost of construction works for sports fields. The research allowed the authors to confirm assumptions about the general applicability of MLP networks as a tool with the potential to map the relationship between the total cost of construction works and selected cost predictors which are characteristic of sports fields. The RBF networks, on the other hand, proved unsuitable for this particular cost estimation problem.
Apart from the general conclusions about the applicability of ANNs, one type of network tailored for this problem was selected from a broad set of various MLP networks. The analysis of the results indicates a satisfactory performance of the selected network in terms of correlations between the real cost and cost predictions. The level of the
- Estimating cost of construction works for a couple of variants of sports fields in a very short time
- Supporting the decisions made by the client being aware of the range of the cost estimation accuracy
The obtained results encourage continuation of the investigations which will aim to improve the model, especially to lower the
The authors declare that there are no conflicts of interest regarding the publication of this paper.
The authors use the STATISTICA software in their research; therefore some of the presented assumptions for neural networks training are made due to the functionality of the tool.