Reduced Multivariate Polynomial Model for Manufacturing Costs Estimation of Piping Elements

This paper discusses the development and evaluation of an estimation model of manufacturing costs of piping elements through the application of a Reduced Multivariate Polynomial (RMP). The model allows obtaining accurate estimations even when sufficient and adequate information is not available. This situation typically occurs in the early stages of the design process of industrial products. The experimental evaluations show that the approach is capable, with low complexity, of reducing uncertainties and predicting costs with significant precision. Comparisons with a neural network also showed that the RMP performs better on a set of classical performance measures, with lower complexity and higher accuracy.


Introduction
Low-volume production is usually performed on order; in this case, the producer must establish the price and the delivery deadline when the contract is signed. Therefore, even in the earliest stages of offer preparation, precise data on economic aspects are needed. In that scenario, the cost estimation has to be made using very sparse data and normally without performing the process planning tasks. Thus, engineers have to estimate costs considering only geometrical aspects, using data from CAD files or from simple sketches. In some respects, the estimation of raw material costs can be considered a straightforward task. Normally, the raw material computations are based on weight and on past prices or quotations requested from the raw materials' suppliers. Other important drivers, such as direct labor hours, are somewhat harder to estimate. That leads to a more exact way of estimating the production times, separating clearly the set-up times and the unit times in each of the manufacturing operations.
Despite the level of precision that can be attained in the estimation of both aforementioned drivers, many other factors or drivers remain unconsidered. This may bring distortion and variability to the estimations. Therefore, there is a need for a decision tool that can precisely estimate the manufacturing costs, incorporating all the factors that influence total manufacturing costs, that is, indirect labor, set-up operations, tooling, fixturing, and others.
From the product life cycle point of view, initial cost estimates help engineers to reach a consensus on the design, that is, the materials, the most viable processes, and the structural aspects of the product. Since there is a reduced set of defined and specific characteristics in the earlier design phases, designers are prone to committing errors. Therefore, it is very desirable that strategic product development or design change decisions be based on quantitative analysis instead of presumptions.
Additionally, over the course of time, detailed information can be collected and used to determine the product costs. Today, despite the variety of existing approaches, all of them use IT support in some way to perform the cost estimation. Recently, there has been considerable interest in developing approaches and tools for early cost estimation. In 2004, Curran et al. [1] published an exhaustive review of systematic cost modeling and its applications to the aero- and astronautic industry. There are three principal quantitative techniques used in cost approximation: analogy-based techniques, parametric methods, and analytical models. The latest toolset for cost approximation has been developed using artificial intelligence (AI) techniques. The aim of using AI for cost estimates is to mimic an expert's reasoning in defining variables, their relevance, and the extent of the latter in the entire expenditure for product fabrication. Currently, the most used AI approaches for approximation are case-based reasoning and artificial neural networks (ANNs). Case-based reasoning is closely related to the analogy-based method. Nevertheless, the biggest benefit of AI is data interpolation, archiving, recycling, and recording; it can accommodate the little that is known about a proposed product for its modeling, as with novel development projects, for example. Graham and Smith [2] introduced their case-based estimator (CBE), comprising one dependent and five independent variables with a small case base. In that work, a study was done to measure the efficacy of two retrieval mechanisms, a standard mathematical model and a variation on the ID3 decision tree-generating algorithm, to determine their consistency.
A different case-based program for forecasting the final cost of stamping goods from sheet metal was proposed in [3]. There was a strong correlation between the assessments given by the intelligent system and those made by a seasoned specialist. Shehab and Abdalla [4] suggested an adaptable system that utilizes past databases for predicting fabrication costs. It is able to define the needed materials, as well as the means and metrics for manufacture, given certain input variables. In addition, it gives a pricing estimate for every stage of development, from proposal to production. Ficko et al. [5] show that neural networks have less variance for cost forecasts compared with traditional techniques based on linear and nonlinear relationships between the dependent variables and regressors. Cavalieri et al. [6] compared two separate methods, parametric and ANN, in their approximation of per-unit production costs for a novel kind of brake disk. Verlinden et al. [7] contrasted different techniques for cost evaluation of sheet metal fabrication processes: regressions and ANNs. Even though the differences in variance were slight, the ANN generally performed more accurately. De Cos et al. [8] made a comparison between applying a nonparametric regression formula and an ANN technique in approximating the costs of metal elements to be used in the aerospace industry. Hybrid AI techniques have also been developed to perform cost estimation. Che [9] proposed one such hybrid technique. Specifically, genetic-based programs ran simulations to define completely or almost completely efficient cost factors. Further, ANNs were used to budget nonintuitive production costs of a nonlinear nature. Deng and Yeh [10] propose an ANN for projecting costs of plastic injection molding, which is trained using a particle-swarm-based approach. Kim and Han [11] and Deng and Yeh [12] show the ability of least squares support vector machines (LS-SVM) to make accurate cost estimations in the manufacturing of airframe wing-box structures. They compared its estimation performance with that of back-propagation neural networks (BPN) and statistical response surface methodology (RSM). More recently, Rui et al. [13] presented ten regression models developed to estimate pipeline construction component costs for different compressor station capacities in different locations. Hart et al. [14] used principal component analysis (PCA) to identify the physical parameters that demonstrate the highest correlation to the cost. The set of principal variables was regressed using the Kriging method.
The model was used to analyze sets of cost data available in the literature and compare them with results from a neural network-based analysis and with a cost regression model. Further, a case study addressing the fabrication of a submarine pressure hull was developed in order to illustrate the new method.
In this paper, we present a cost estimation model of pipe manufacturing based on a Reduced Multivariate Polynomial. To the best of our knowledge, no other research work has tackled the manufacturing cost estimation of piping elements using a Reduced Multivariate Polynomial. More precisely, the contributions of this paper are the following.
(i) Even when enough and adequate information is not available in the early stages of the design process, our approach allows for fairly accurate prediction.
(ii) We show experimentally that Reduced Multivariate Polynomials are capable of reducing uncertainties in order to predict costs with higher precision, while performing well with a significantly lower model complexity.
(iii) Last, but not least, our approach outperforms ANN, the most used, state-of-the-art technique in cost estimation projects.
The rest of this paper is organized as follows. Section 2 defines piping manufacturing. Section 3 proposes the cost estimation models. Section 4 presents the results from a comprehensive experimental evaluation of our proposal, and in Section 5 we discuss the results. Finally, Section 6 concludes.

Piping Manufacture
Piping is defined as any tube-like structure or hollow cylindrical shape intended for material conveyance. The conveyed materials include liquids, slush, fine particles, and ground bulk. Piping manufacture relates to the way in which particular sections of piping can be fabricated. Every manufactured pipe section is referred to as a "joint" or a "length" (see Figure 1). To speed construction, these are normally dispatched from the factory as "double joints," which are two preemptively welded lengths. The majority of pipe welds are without seams, that is, a longitudinal weld; for bigger dimensions, however, a spiral weld is not unheard of. Many materials are used in pipe fabrication, but the most viable have been carbon steel, stainless steel, and other steel alloys. Customary welds include TIG, MIG, and SAW. To approximate costs, usual techniques utilize past data. These data allow researchers to create in-depth budget forecasts. Rui et al. [15] investigated cost overrun of pipeline projects. The authors detected that the cost error of underestimated pipeline construction components is generally larger than that of overestimated pipeline construction components, except for total cost. Results of the [...]
To estimate correlations between cost and several key factors, three techniques are widely used: specialist approximations (nonobjective), account assessments, and statistical methods (regression). These techniques, however, usually have too little reliability; therefore, their cost analyses should generally not be treated as practical.

Cost Estimation Models
In this section, the Reduced Multivariate Polynomial and the Multilayer Perceptron are introduced.

Reduced Multivariate Polynomial.
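To reduce the number of parameters in a full multivariate polynomial model, [16] proposes a reduced multivariate polynomial (RMP) defined as

$$f_{\mathrm{RM}}(\boldsymbol{\alpha}, \mathbf{x}) = \alpha_0 + \sum_{k=1}^{r} \sum_{j=1}^{l} \alpha_{kj} x_j^{k} + \sum_{j=1}^{r} \alpha_{rl+j} (x_1 + x_2 + \cdots + x_l)^{j} + \sum_{j=2}^{r} (\boldsymbol{\alpha}_j^{T} \cdot \mathbf{x}) (x_1 + x_2 + \cdots + x_l)^{j-1},$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_l)$ represents the vector of $l$ independent variables, $r$ is the polynomial degree, $(\boldsymbol{\alpha}_j^{T} \cdot \mathbf{x})$ represents the dot product given as $(\alpha_{j1} x_1 + \alpha_{j2} x_2 + \cdots + \alpha_{jl} x_l)$, and the $K$ value denotes the number of parameters of the model and is given by $K = 1 + r + l(2r - 1)$. Using matrix notation, the previous equation is given as

$$\mathbf{y} = \mathbf{P} \boldsymbol{\alpha},$$

where $\mathbf{P}$ is the $N \times K$ regressor matrix and $N$ represents the number of observed samples.
In order to estimate the linear parameters $\boldsymbol{\alpha} = (\alpha_1, \alpha_2, \ldots, \alpha_K)$, the robust regression method based on the iteratively reweighted least-squares algorithm with the Talwar weighting function [17,18] is used. In the first iteration, the parameters $\{\alpha_k\}$ are estimated using the ordinary linear least-squares technique. That is,

$$\hat{\boldsymbol{\alpha}}(1) = (\mathbf{P}^{T} \mathbf{P})^{-1} \mathbf{P}^{T} \mathbf{y},$$

where $(\cdot)^{-1}$ is the Moore-Penrose pseudoinverse matrix [19]. At subsequent iterations, the parameters $\hat{\boldsymbol{\alpha}}(t + 1)$ are recomputed using the weighted linear least-squares algorithm and are obtained using the following equations:

$$\hat{\boldsymbol{\alpha}}(t + 1) = (\mathbf{P}^{T} \mathbf{W}(t) \mathbf{P})^{-1} \mathbf{P}^{T} \mathbf{W}(t) \mathbf{y}, \qquad \mathbf{W}(t) = \operatorname{diag}(w_1, \ldots, w_N),$$

$$w_i = \begin{cases} 1, & |u_i| < 1, \\ 0, & \text{otherwise}, \end{cases} \qquad u_i = \frac{e_i}{c \, s \sqrt{1 - h_i}}, \qquad s = \frac{\mathrm{MAD}}{0.6745},$$

where $e_i$ are the residuals of the previous iteration, $c$ is a tuning constant, $h_i$ are the leverage values from a least-squares fit, and MAD is the median absolute deviation of the residuals from their median. Finally, the procedure continues until the parameter estimates converge within a specified tolerance.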
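The calibration procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names `rmp_design_matrix` and `talwar_irls`, the column layout of the regressor matrix (which follows the reduced model of [16] with $K = 1 + r + l(2r - 1)$ columns), the tuning constant 2.795, the scale estimate MAD/0.6745, the singular-value cutoff, and the toy data are all assumptions made for the example.

```python
import numpy as np

def rmp_design_matrix(X, r):
    """Regressor matrix with K = 1 + r + l*(2r - 1) columns (assumed layout)."""
    X = np.asarray(X, dtype=float)
    n, l = X.shape
    s = X.sum(axis=1, keepdims=True)           # (x_1 + ... + x_l) per sample
    cols = [np.ones((n, 1))]                   # constant term
    cols += [X ** k for k in range(1, r + 1)]  # x_j^k terms: r*l columns
    cols += [s ** j for j in range(1, r + 1)]  # (sum x)^j terms: r columns
    cols += [X * s ** (j - 1) for j in range(2, r + 1)]  # (r-1)*l columns
    return np.hstack(cols)

def talwar_irls(P, y, tune=2.795, tol=1e-8, max_iter=50, rcond=1e-10):
    """Iteratively reweighted least squares with 0/1 (Talwar) weights."""
    y = np.asarray(y, dtype=float)
    # First iteration: ordinary least squares (minimum-norm solution,
    # equivalent to applying the pseudoinverse).
    alpha = np.linalg.lstsq(P, y, rcond=rcond)[0]
    # Leverage values h_i from the least-squares fit (diagonal of the hat matrix).
    U, sv, _ = np.linalg.svd(P, full_matrices=False)
    rank = int(np.sum(sv > rcond * sv[0]))
    h = np.clip(np.sum(U[:, :rank] ** 2, axis=1), 0.0, 0.9999)
    for _ in range(max_iter):
        res = y - P @ alpha
        mad = np.median(np.abs(res - np.median(res)))
        scale = max(mad / 0.6745, 1e-12)
        u = res / (tune * scale * np.sqrt(1.0 - h))
        w = (np.abs(u) < 1.0).astype(float)    # Talwar: keep or discard a sample
        sw = np.sqrt(w)
        new_alpha = np.linalg.lstsq(P * sw[:, None], y * sw, rcond=rcond)[0]
        converged = np.max(np.abs(new_alpha - alpha)) <= tol * max(1.0, np.max(np.abs(alpha)))
        alpha = new_alpha
        if converged:
            break
    return alpha

# Hypothetical toy data: a linear "cost" with one gross outlier at sample 5.
i = np.arange(30)
X_demo = np.column_stack([i, (2 * i) % 5]).astype(float)
y_demo = 1.0 + 2.0 * X_demo[:, 0] + 3.0 * X_demo[:, 1] + 0.01 * np.sin(i)
y_demo[5] += 100.0
P_demo = rmp_design_matrix(X_demo, r=2)
alpha_demo = talwar_irls(P_demo, y_demo)
```

Because the RMP regressors are partly collinear (for example, the sum of the inputs is a linear combination of the inputs themselves), the sketch uses a minimum-norm least-squares solver rather than an explicit matrix inverse.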

MLP Neural Networks.
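The cost data set ($y$) is estimated by using an MLP neural network model given as

$$\hat{y} = \sum_{j=1}^{N_h} b_j \, \phi_j(V_j \cdot \mathbf{x}),$$

where $N_h$ is the number of hidden nodes, $b$ represents the linear output parameters, $\mathbf{x}$ denotes the explicative vector containing $l$ values, $V_j$ denotes the nonlinear parameters, and $\phi_j(\cdot)$ are hidden activation functions, which are derived as

$$\phi(z) = \frac{1}{1 + e^{-z}}.$$

The parameters $\mathbf{w}$ of the network are updated with the Levenberg-Marquardt (LM) algorithm according to the following equation:

$$\mathbf{w}(t + 1) = \mathbf{w}(t) - (\mathbf{J}^{T} \mathbf{J} + \mu \mathbf{I})^{-1} \mathbf{J}^{T} \mathbf{e},$$

where $\mathbf{J}$ represents the Jacobian matrix of the error vector evaluated in $\mathbf{w}(t)$, $e_i = y_i - \hat{y}_i$ is the error of the MLP neural network for the $i$th pattern, $\mathbf{I}$ denotes the identity matrix, and the parameter $\mu$ is increased or decreased at each step of the LM algorithm.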
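To make the role of the damping parameter concrete, the following Python sketch trains a small one-hidden-layer sigmoid network with a Levenberg-Marquardt style update. It is only an illustration under assumptions: the names `mlp_output` and `lm_train`, the finite-difference Jacobian, the factor-of-ten damping schedule, the constant input column used as a bias, and the toy data are hypothetical choices, not details taken from the experiments.

```python
import numpy as np

def mlp_output(w, X, n_hidden):
    """Sigmoid hidden layer and linear output; w packs V (n_hidden x n_in), then b."""
    n_in = X.shape[1]
    V = w[: n_hidden * n_in].reshape(n_hidden, n_in)
    b = w[n_hidden * n_in:]
    hidden = 1.0 / (1.0 + np.exp(-(X @ V.T)))
    return hidden @ b

def lm_train(X, y, n_hidden, n_iter=50, mu=1e-2, seed=0, step=1e-6):
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.5, size=n_hidden * X.shape[1] + n_hidden)

    def error(weights):
        return y - mlp_output(weights, X, n_hidden)

    e = error(w)
    history = [float(e @ e)]                   # sum of squared errors per iteration
    for _ in range(n_iter):
        # Finite-difference Jacobian of the error vector with respect to w.
        J = np.empty((len(y), len(w)))
        for k in range(len(w)):
            w_k = w.copy()
            w_k[k] += step
            J[:, k] = (error(w_k) - e) / step
        delta = np.linalg.solve(J.T @ J + mu * np.eye(len(w)), J.T @ e)
        w_new = w - delta
        e_new = error(w_new)
        if e_new @ e_new < e @ e:
            # Step accepted: move and reduce the damping (closer to Gauss-Newton).
            w, e, mu = w_new, e_new, max(mu / 10.0, 1e-8)
        else:
            # Step rejected: keep w and increase the damping (closer to gradient descent).
            mu *= 10.0
        history.append(float(e @ e))
    return w, history

# Hypothetical toy data; the column of ones acts as a bias input.
X_demo = np.column_stack([np.linspace(0.0, 1.0, 20), np.ones(20)])
y_demo = np.sin(2.0 * X_demo[:, 0])
w_fit, history = lm_train(X_demo, y_demo, n_hidden=3)
```

Since a step is only accepted when it lowers the squared error, the recorded `history` never increases, which is the behavior the increase/decrease rule for the damping parameter is meant to guarantee.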

Performance Metrics.
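In this paper, we use five criteria of accuracy to compare the estimation capabilities of the evaluated methods during the test phase. In what follows, $y_i$ is the actual cost, $\hat{y}_i$ is the estimated cost, and $\bar{y}$ is the mean of the actual costs. The first criterion is the root mean square error (RMSE) given as

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}.$$

The second criterion is the mean absolute error (MAE) given as

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|.$$

The third criterion is the mean absolute percentage error (MAPE) given as

$$\mathrm{MAPE} = \frac{100}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|.$$

The fourth criterion is the coefficient of determination ($R^2$) given as

$$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}.$$

The fifth criterion is the generalized cross-validation (GCV), which depends on the fit error, on $N$, the number of samples, and on $K$, the number of parameters of the model, which are defined in Section 3.1 for RMP and Section 3.2 for MLP.
In order for the goodness of fit to be acceptable, the value of $R^2$ must be close to 1 and the values of RMSE, MAE, and MAPE must approach 0.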
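The five criteria can be computed with a few lines of Python, as sketched below. The function name `cost_metrics` and the toy values are hypothetical. In particular, the GCV expression used here, the mean squared error divided by $(1 - K/N)^2$, is the common textbook form and is an assumption; it may differ from the expression used in the experiments.

```python
import math

def cost_metrics(actual, predicted, n_params):
    """RMSE, MAE, MAPE (%), R^2, and an assumed textbook form of GCV."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 / n * sum(abs(e / a) for e, a in zip(errors, actual)),
        "R2": 1.0 - sse / ss_tot,
        # Assumption: GCV = MSE / (1 - K/N)^2, which penalizes models with many parameters.
        "GCV": mse / (1.0 - n_params / n) ** 2,
    }

# Hypothetical toy values (not data from the study).
metrics = cost_metrics(
    actual=[100.0, 200.0, 300.0, 400.0],
    predicted=[110.0, 190.0, 310.0, 390.0],
    n_params=2,
)
```

With this form, two models with the same error are ranked by their number of parameters, which is the tradeoff between accuracy and complexity that the GCV is used for in the model selection steps.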

Methodology
The main hypothesis of this research is that by using Reduced Multivariate Polynomials, it is possible to develop cost estimation models that perform better (by being more accurate) than cost estimation techniques based on artificial neural networks. Through experimentation and using actual data, these models will be used to provide answers to the following research questions.
(i) Is it possible to estimate manufacturing costs, with a very small set of product characteristics?
(ii) If there is insufficient product information, is it a barrier to effective product cost estimation?
The first step was the identification of the significant variables. The identification process consisted in identifying the variables most correlated with the dependent variable, that is, the cost. The analysis of the resulting Pearson coefficients allowed distinguishing the least correlated variables from the most correlated ones. The determination coefficient is widely used in the cost estimation literature because it measures the correlation between the model output and the observed (actual) cost. Table 1 is arranged in descending order of the Pearson coefficients obtained for each variable.
Initially, we select the most significant variables from those shown in Table 1. The main goal of the characteristics selection phase is to find the minimum number of explicative variables that are strongly correlated with the cost. A threshold value of 60% for the Pearson coefficient was defined. This is important because as the number of explicative variables decreases, the model complexity also decreases. In Table 1, it can be observed that the variables weight and welding type are highly significant, with Pearson coefficients greater than 75%.
The elimination of the last two variables, cavities and class, seems obvious, since their lack of connection with the total manufacturing cost is quite apparent.
As can be seen in Table 1, diameter and difficulty variables showed similar Pearson coefficients.
The Pearson coefficients of the two aforementioned variables (diameter and difficulty) fluctuate around 70%, which allows concluding that it would be enough to incorporate one of them into the model to obtain almost the same benefit in terms of additional accuracy. In other words, introducing both variables adds redundancy to the model. Therefore, it is possible to select one variable without losing accuracy while keeping the model complexity low.
To perform this selection, we used the GCV as a criterion, which allows measuring the tradeoff between accuracy and model complexity. Therefore, aiming at reducing the complexity of the model, the third significant variable was selected between these two variables, as shown in the next section.
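The correlation screening step described in this section can be sketched in Python as follows. The helper names `pearson` and `rank_variables`, the use of the absolute value of the coefficient, and the toy data are assumptions made for illustration; only the 60% threshold comes from the text.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rank_variables(candidates, cost, threshold=0.60):
    """Rank candidates by |r| in descending order and keep those above the threshold."""
    ranking = sorted(
        ((name, pearson(values, cost)) for name, values in candidates.items()),
        key=lambda item: abs(item[1]),
        reverse=True,
    )
    selected = [name for name, r in ranking if abs(r) >= threshold]
    return ranking, selected

# Hypothetical toy data (not the values of Table 1).
cost = [10.0, 20.0, 30.0, 40.0, 50.0]
candidates = {
    "cavities": [1.0, 2.0, 1.0, 2.0, 1.0],
    "weight": [1.0, 2.0, 3.0, 4.0, 5.0],
}
ranking, selected = rank_variables(candidates, cost)
```

In this toy case the weight is perfectly correlated with the cost and is kept, while the uncorrelated variable falls below the threshold and is discarded.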

Reduced Multivariate Polynomial.
To identify the third most significant variable to be incorporated into the model, and to determine the polynomial degree $r$, the GCV defined in the section on performance metrics was used. The GCV is used in the model calibration phase because it allows a tradeoff between accuracy and the number of parameters of the estimation model.
Figure 3 shows the relation between the GCV and the polynomial degree, considering either diameter or difficulty as the third significant variable.
As can be observed in Figure 3, incorporating the diameter of the part yields a better cost estimation. In addition, the same figure shows that an eighth-degree polynomial offers the best compromise between complexity and precision in cost estimations. The selected configuration will be referred to as RMP (3,8), where 3 represents the number of independent variables and 8 represents the polynomial degree.
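As a worked check of the size of the selected configuration, and assuming the parameter count $K = 1 + r + l(2r - 1)$ of the reduced model with $l$ independent variables and polynomial degree $r$, the RMP (3,8) has

$$K = 1 + r + l(2r - 1) = 1 + 8 + 3\,(2 \cdot 8 - 1) = 1 + 8 + 45 = 54$$

linear parameters.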

Multilayer Perceptron.
A series of alternative configurations for cost estimation neural networks was implemented, tested, and compared. In a number of experiments, neural networks were first trained and then applied to the set of training data. The metric used in this comparison was the GCV defined in Section 3.3. The accuracy of the obtained results and other indicators of performance were explored. Thus, rather than being derived theoretically, the neural network model was determined empirically. The best structure was selected after testing 20 ANN configurations with one hidden layer, different numbers of nodes in the hidden layer, and different numbers of epochs or iterations in the training phase. Each configuration was run 30 times with 500 iterations. The results are shown in Figure 4. In that figure, it can be observed that the best compromise between accuracy and complexity is reached with 7 nodes in the hidden layer. The selected configuration will be referred to as MLP (3,7,1), where 3 represents the number of input nodes, 7 the number of nodes in the hidden layer, and 1 the number of nodes in the output layer.

Result Discussion
Actual data from a real manufacturer of piping elements were used to test the developed models. In this case, it was presupposed that a company wanted to estimate the manufacturing cost of a new piping element to be used in fluid transport projects by the mining industry. The experiment assumed that the manufacturing costs of all similar elements follow the same single cost function, so that costs are completely determined by the product attributes known at the time of cost estimation. Two types of models were defined for the comparison: (i) models based on a Reduced Multivariate Polynomial and (ii) models based on sigmoid neural networks (MLP-LS) trained with the Levenberg-Marquardt algorithm.
To measure the actual capacity of the proposed model versus the artificial neural network and to determine the generalization capacity and statistical strength, the test data group (corresponding to 20% of the initial sample) was used. The following paragraphs show the results obtained by both models. Figure 6 depicts the residuals versus the predicted costs obtained by the RMP. It can be observed that a large fraction (over 98%) of the tested cases are acceptable, with residuals ranging from −20% to 20%.
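The share of test cases whose residuals fall inside the ±20% band can be computed as in the following Python sketch. The definition of the relative residual as 100·(actual − predicted)/actual, the function names, and the toy values are assumptions made for illustration.

```python
def relative_residuals(actual, predicted):
    """Residuals as a percentage of the actual cost (assumed definition)."""
    return [100.0 * (a - p) / a for a, p in zip(actual, predicted)]

def share_within(residuals_pct, band=20.0):
    """Percentage of cases whose residual lies in [-band, +band]."""
    inside = sum(1 for r in residuals_pct if -band <= r <= band)
    return 100.0 * inside / len(residuals_pct)

# Hypothetical toy values (not data from the study): the last case misses the band.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 250.0]
residuals_pct = relative_residuals(actual, predicted)
share = share_within(residuals_pct)
```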

Results with RMP.
Figure 7 presents the scatterplot of actual versus predicted costs for the RMP (3,8). As can be observed, most of the points lie very close to the line, which indicates a strong prediction capability (when a perfect prediction occurs, all points lie on this line). In addition, this scatterplot provides the equation of the regression line between predicted and actual values. In this equation, the closer the intercept is to 0 and the closer the line slope is to 1, the better the estimation can be considered. Note that in the case of RMP (3,8), the correlation coefficient ($R$) is equal to 0.99434.

Results with MLP (3,7,1).
Figure 8 plots the comparison between simulated and real values of the testing set using the MLP (3,7,1). Figure 9 presents the residuals versus the predicted costs obtained by the MLP (3,7,1). It can be observed that a large fraction (92%) of the tested cases have residuals ranging from −20% to 20%.
Figure 10 shows the scatterplot of actual versus predicted costs for the MLP (3,7,1). As with the RMP, a large number of the points lie very close to the line, which indicates a strong prediction capability. Note that the correlation coefficient ($R$) is equal to 0.99235 in this case. Table 2 shows the results for both models. It can be observed that the best results were achieved by the RMP, which shows a better performance for all presented metrics, with the largest gains in model precision and generalization capability.
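The regression line between predicted and actual values used in the scatterplots can be obtained as in the Python sketch below. The function name `regression_line` and the toy values are hypothetical, and regressing the predicted costs on the actual costs (rather than the reverse) is an assumption about how the line is drawn.

```python
import math

def regression_line(actual, predicted):
    """Slope, intercept, and correlation of the line predicted = slope * actual + intercept."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    sxx = sum((a - ma) ** 2 for a in actual)
    syy = sum((p - mp) ** 2 for p in predicted)
    sxy = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    slope = sxy / sxx
    intercept = mp - slope * ma
    corr = sxy / math.sqrt(sxx * syy)
    return slope, intercept, corr

# Hypothetical toy values (not data from the study).
actual_costs = [100.0, 200.0, 300.0, 400.0]
perfect = regression_line(actual_costs, actual_costs)
shifted = regression_line(actual_costs, [a + 10.0 for a in actual_costs])
```

A perfect prediction gives a slope of 1, an intercept of 0, and a correlation of 1. A constant bias, as in the second call, leaves the slope and the correlation at 1 but moves the intercept away from 0, which is why both the slope and the intercept are inspected together with the correlation coefficient.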

Conclusions
A model based on a Reduced Multivariate Polynomial for cost estimation of piping elements manufacturing has been proposed. The model is based on nonlinear independent variables and linear coefficients, which allows obtaining the model parameters through a robust regression method based on the iteratively reweighted least-squares algorithm.
The main advantages of the proposed strategy are as follows.
The proposed model offers a lower complexity in terms of the number of parameters and a higher accuracy in terms of average relative error. Another advantage of the proposed model is that its complexity increases linearly as its number of independent variables increases, whereas in MLP-based models, increasing the number of variables leads to a polynomial increase in complexity. Besides, to calibrate a neural network with mixed weights (linear and nonlinear), an iterative nonlinear algorithm (Hessian based) is needed. In contrast, an RMP model requires only an iterative linear algorithm for its calibration. This contributes greatly to reducing complexity during the model calibration process while maintaining a high precision in manufacturing cost estimation.
For comparison purposes, an ANN multilayer perceptron was applied to the same estimation problem. Both the neural network and the polynomial model proved capable of reducing uncertainties related to the cost estimation of piping elements in early stages of their manufacture. The first model, RMP, showed a better performance in terms of accuracy and generalization capacity.
Therefore, the proposed approach is a strong option for solving the complex nonlinear mapping involved in predicting the manufacturing cost.

Figure 4: Average of the MAPE in 30 runs with 500 iterations versus the number of nodes in the hidden layer.

Figure 5 plots the comparison between simulated and real values (sorted) of the testing set using the Reduced Multivariate Polynomial.

Figure 5: Real and simulated (obtained by the RMP) values using the testing set.

Table 1: Correlation coefficients in descending order.