Modeling of Throughput in Production Lines Using Response Surface Methodology and Artificial Neural Networks

The problem of assigning buffers in a production line to obtain an optimum production rate is an NP-hard combinatorial problem known as the Buffer Allocation Problem. It is of great importance for designers of production systems because of the costs involved in terms of space requirements. In this work, the relationship among the number of buffer slots, the number of workstations, and the production rate is studied. Response surface methodology and artificial neural networks were used to develop predictive models to find optimal throughput values. A total of 360 production rate values for different numbers of buffer slots and workstations were used to obtain a fourth-order mathematical model and an artificial neural network with four hidden layers. Both models perform well in predicting the throughput, although the artificial neural network model shows a better fit (R = 1.0000) than the response surface methodology (R = 0.9996). Moreover, the artificial neural network produces better predictions for data not used in the construction of the models. Finally, this study can be used as a guide to forecast the maximum or near-maximum throughput of production lines, taking into account the buffer size and the number of machines in the line.


Introduction
The Buffer Allocation Problem (BAP) can be defined in three different ways by means of different objective functions, as presented in [1]. These objective functions are expressed in terms of maximizing the production rate of a production line, minimizing the buffer size needed to obtain a fixed throughput rate, and minimizing the average work-in-process inventory. The mathematical models of these versions of the problem are given in the following paragraphs [2].

Problem BAP-A (The Dual Problem).
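In this problem we assume that there are $K$ machines and $K - 1$ storage areas with $N$ total integer buffer slots to be allocated. Possible solutions have the form $\mathbf{n} = (n_1, n_2, \ldots, n_{K-1})$. The objective is to maximize the throughput of the production line subject to a fixed quantity of buffer slots ($N$) distributed among the $K$ machines. The problem may be denoted as follows:

$$\max f(\mathbf{n}) = \max f(n_1, \ldots, n_{K-1}) \quad \text{s.t.} \quad \sum_{i=1}^{K-1} n_i = N, \quad n_i \ge 0 \text{ and integer}. \tag{1}$$

Problem BAP-B (The Primal Model).
This problem focuses on minimizing the total number of buffer slots to be allocated between the machines, given a minimum throughput $f_0$. The problem may be stated as follows:

$$\min \sum_{i=1}^{K-1} n_i \quad \text{s.t.} \quad f(\mathbf{n}) \ge f_0, \quad n_i \ge 0 \text{ and integer}. \tag{2}$$

The third version of the problem minimizes the average work-in-process (WIP) inventory for a desired throughput:

$$\min Q(\mathbf{n}) \quad \text{s.t.} \quad f(\mathbf{n}) \ge f_0, \tag{3}$$

where $Q(\mathbf{n}) = Q(n_1, \ldots, n_{K-1})$ indicates the average WIP inventory as a function of the buffer size vector, and $f_0$ is the desired throughput.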
Metaheuristic methods are widely employed for solving combinatorial optimization problems, and they have been applied successfully to many production line problems for which exact methods cannot find optimum solutions in a short amount of time as the problem size increases. Our objective is to study the relationship among the production rate, the buffer size, and the number of machines in production lines. We then propose the use of an RSM or ANN model that reproduces the behavior of the algorithms used to calculate the throughput of a serial production line, in order to obtain similar results in a shorter time from the number of workstations $K$ and the total buffer space $N$.
The remainder of this paper is organized as follows. In Section 2, we present previous work related to this topic. In Section 3, we describe the basic concepts of RSM and ANNs, the fitting techniques applied in this study. In Section 4, we show the steps in the development of both models and the numerical experiments performed. In Section 5, we give some results and compare the performance of the RSM and ANN models obtained. Finally, Section 6 concludes this paper.

Related Work
The BAP has been studied for over 50 years, and numerous articles have been published. The first study associated with this subject was carried out by Koenigsberg [3], who presented an analysis and a review of the problems associated with the effective functioning of production systems.
The solution methods for the BAP fall into two broad groups: generative and evolutionary methods. These methods are combined in a closed configuration. Generative methods focus on the search for the size of the temporary storage that gives the best performance of the system. The simplest algorithm performs a complete enumeration of the allocations for the line. However, it is only applicable to small systems, since the total number of feasible solutions grows exponentially. Thus, for large systems it is impossible to search through the whole solution space. In recent years, many search methods and metaheuristics have been adapted by researchers to solve the combinatorial problem of sizing the temporary storage.
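To make this growth concrete, the short sketch below counts and enumerates the feasible allocations of a fixed number of slots among the buffer locations of a serial line, assuming that empty buffers are allowed. The function names `count_allocations` and `enumerate_allocations` are illustrative and do not come from the cited works.

```python
from itertools import combinations
from math import comb


def count_allocations(num_machines: int, total_slots: int) -> int:
    """Number of ways to split total_slots among num_machines - 1 buffers."""
    locations = num_machines - 1
    # Stars and bars: compositions of total_slots into `locations` parts >= 0.
    return comb(total_slots + locations - 1, locations - 1)


def enumerate_allocations(num_machines: int, total_slots: int):
    """Yield every allocation whose entries sum to total_slots."""
    locations = num_machines - 1
    positions = total_slots + locations - 1
    # Choose where the (locations - 1) separators sit among all positions.
    for bars in combinations(range(positions), locations - 1):
        previous = -1
        allocation = []
        for bar in bars:
            allocation.append(bar - previous - 1)
            previous = bar
        allocation.append(positions - 1 - previous)
        yield tuple(allocation)
```

For example, a line with 5 machines and 10 slots already has 286 feasible allocations, and each of them would still require a throughput evaluation.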
In [4], a two-stage heuristic algorithm is developed to solve the problem of minimizing the total temporary storage. However, this method cannot always find the optimal solution, and it does not converge in all cases. Seong et al. adopted the concept of a pseudogradient, together with a projection method, to solve the problem of maximizing the production rate of a production line [5]. A year later, Seong et al. employed a gradient method to maximize the production rate for a problem with exponential processing times [6]. Other researchers also employed gradients, such as Gershwin and Schor, who addressed the problem of minimizing the total temporary storage capacity based on the observation that, when the production rates are expanded to first order, the problem may be formulated as an integer linear program [7].
Search algorithms are intended to cope with the exponential explosion of the number of solution vectors. In this case, some algorithms apply a primal-dual approach to minimize the total temporary storage, subject to a constraint on the production rate. The primal problem is to reduce the total size of temporary storage for a given production rate, while the dual formulation maximizes the throughput of the production line subject to a limit on the total storage space. Vouros and Papadopoulos studied the maximization of the benefits of the production line through a nonlinear method that is fast and precise [8]. Nevertheless, the limitation of the production rate is not taken into account. The authors based their research on a system called ASBA2, which contains a knowledge-based system. This system determines the optimal storage capacity plans, with the objective of maximizing the performance of the production lines. In order to validate the ASBA2 results, the authors executed an exact algorithm to calculate the production rate and compared the results.
In [9], Nahas et al. used a local search heuristic to obtain the allocation plan for a given number of buffer slots. Aksoy and Gupta developed a near-optimal buffer allocation plan (NOBAP) for a cellular remanufacturing system with a certain number of buffer slots [10]. The algorithm that Aksoy and Gupta proposed uses an open queueing network and is based on the decomposition principle and the expansion methodology.
Traditional search methods have two main disadvantages. The first is that the traditional search sometimes cannot escape local optimal solutions in the search for the global optimal solution. The second is that, with these approximate methods, it is difficult to observe how small changes in the buffer size affect the system.
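The first limitation can be seen in a minimal hill-climbing sketch over buffer allocations. The neighborhood (moving one slot from one buffer to another) and the evaluator argument are assumptions made for illustration; they are not the heuristics of the cited works, and `toy_throughput` is a made-up concave stand-in, not a throughput model of a production line.

```python
from typing import Callable, Sequence, Tuple

Allocation = Tuple[int, ...]


def local_search(start: Sequence[int],
                 evaluate: Callable[[Allocation], float]) -> Allocation:
    """Hill climbing: move one slot between buffers while the value improves."""
    current = tuple(start)
    best_value = evaluate(current)
    improved = True
    while improved:
        improved = False
        for src in range(len(current)):
            for dst in range(len(current)):
                if src == dst or current[src] == 0:
                    continue
                candidate = list(current)
                candidate[src] -= 1
                candidate[dst] += 1
                value = evaluate(tuple(candidate))
                if value > best_value:
                    # Accept the first improving neighbor and keep scanning.
                    current, best_value = tuple(candidate), value
                    improved = True
    return current


def toy_throughput(allocation: Allocation) -> float:
    """Hypothetical concave evaluator that rewards balanced allocations."""
    return sum(slots ** 0.5 for slots in allocation)


result = local_search((6, 0, 0), toy_throughput)
```

With this concave toy evaluator the search reaches the balanced allocation. With an evaluator that has several local optima, the same loop stops at the first allocation that no single-slot move improves, which is the behavior described above.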
Metaheuristics are search methods that use strategies to guide the search process and explore the search space. The aim is to find optimal or near-optimal solutions. The algorithms are approximate and generally nondeterministic. The typical methods applied in this area include Tabu Search [11][12][13], Simulated Annealing [14], Genetic Algorithms [15], and Ant Colony Optimization [16]. In order to search the solution space more effectively, a recent tendency is to hybridize metaheuristics with other methods such as nested partitions [17], the Branch and Bound method [18], and local search [15]. These hybrid search methods have an advantage over the traditional ones because they can escape local optimal solutions in the search for the global optimum. The main disadvantage of these methods is that they can only be applied to solve specific problems.
On the other hand, dynamic programming [19,20], artificial neural networks [21], genetic programming [22], and immune algorithm systems [23] have been applied with success to the solution of the BAP in production lines. Moreover, some studies use diverse experimental designs for the evaluation of solutions to the BAP [24][25][26][27][28].
Two kinds of methods are mainly used to evaluate the results produced by the strategies described above: analytical and simulation methods. Exact analytical methods give results based on queueing theory, which are difficult to obtain and are available only for short production lines. The approximate evaluation methods are decomposition, aggregation, and expansion. These are the most widely used methods for solving the BAP; in particular, the decomposition method is used by many researchers [7,16,29]. The main idea of this method is to decompose the original model into a set of smaller subsystems that are much easier to solve. The main advantages of the decomposition method are its computational efficiency and its precision in finding a solution. However, its disadvantage is that it can be applied only under the assumption that the production rates are deterministic or follow an exponential distribution and that the machine failure and repair rates follow geometric or exponential distributions.
The expansion method builds on queueing theory. It is a general expansion method that is used under certain assumptions and can handle general service times. This method was applied to solve series, merge, and split topologies of production lines with finite buffers [30,31].
The aggregation method has also been used to evaluate solutions of the BAP [18,32]. It was applied to evaluate the performance of the buffer allocation in production lines. The idea of this method is to define two stations and a buffer as a subline, which is then replaced by a single equivalent station. Next, this station is combined with a buffer and a station of the original line to form a new subline of two stations and one buffer. This new subline is aggregated into a new equivalent station. This process is repeated until all the stations have been aggregated.
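The folding procedure described above can be sketched as follows for reliable exponential stations. This is only a rough illustration under stated assumptions: the two-station subline is evaluated with the standard closed-form result for a finite-capacity M/M/1 queue (first station never starved, blocking after service), and the equivalent station is assumed to be exponential with a rate equal to the subline throughput. The aggregation rules used in [18,32] may differ, and the function names are hypothetical.

```python
from typing import Sequence


def two_station_throughput(mu1: float, mu2: float, buffer_slots: int) -> float:
    """Throughput of two exponential stations separated by a finite buffer.

    The subline behaves like an M/M/1 queue with capacity buffer_slots + 2:
    the buffer, the job in service at station 2, and a blocked job at station 1.
    """
    capacity = buffer_slots + 2
    rho = mu1 / mu2
    if abs(rho - 1.0) < 1e-12:
        return mu2 * capacity / (capacity + 1)
    return mu2 * rho * (1 - rho ** capacity) / (1 - rho ** (capacity + 1))


def aggregate_line(rates: Sequence[float], buffers: Sequence[int]) -> float:
    """Fold a serial line into one equivalent station, from left to right."""
    if len(buffers) != len(rates) - 1:
        raise ValueError("need exactly one buffer between consecutive stations")
    equivalent = rates[0]
    for rate, slots in zip(rates[1:], buffers):
        # Assumption: the equivalent station is exponential with a rate equal
        # to the throughput of the two-station subline it replaces.
        equivalent = two_station_throughput(equivalent, rate, slots)
    return equivalent
```

For two balanced stations with unit rates and no buffer the sketch returns 2/3, which is exact. For longer lines the result is only an approximation, because the equivalent station is not truly exponential.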
On the other hand, simulation offers many advantages over analytical methods for modeling realistic and complex systems. Nevertheless, the development of simulation models is a time-consuming task. Simulation is more suitable for analyzing production line problems at a detailed level, when mathematical analysis cannot be applied.
Several research works [15,21] can be cited as applications of simulation models to search for solutions of the BAP. The study of transfer lines without restrictions in the stations and with finite intermediate buffers was performed by Hillier and So in 1991 and by Hillier et al. in 1993 [33,34]. In these papers, the stations have exponential processing times, and both articles evaluate all buffer allocations completely in order to minimize the space and maximize the throughput. The authors employed an exact method to determine the production rate. The difficulty with this approach is that exact solutions can be applied only to small systems.
ANNs have been applied to model the behavior of many systems with good acceptance. In medicine, ANNs were used in a study to identify patients at high risk of dying after suffering an acute myocardial infarction [35]. Furthermore, Pesko et al. present a comparative study of ANNs and support vector machines for estimating the cost and duration of urban road construction [36].
Tsadiras et al. propose the application of an ANN in the development of a decision support system, in order to assist production line designers in making decisions related to buffer distribution in reliable production lines. The proposed ANN contains one hidden layer with 10 hidden neurons [37].
In this research work, a response surface model and an artificial neural network are proposed to represent the relationship among the number of buffer slots, the number of workstations, and the throughput of the production line. Furthermore, the ANN models were created with different combinations of neurons and with 1, 2, 3, and 4 hidden layers, which provides a wider range of possibilities than a single hidden layer of neurons.

Data Fitting Techniques
In this paper, two techniques have been applied to obtain models representing the relationship between the throughput of a production line and the parameters $N$ (number of buffer slots) and $K$ (number of machines in the production line).

Response Surface Methodology.
Response surface methodology (RSM) is a group of mathematical and statistical techniques used to describe the relationship between a response of interest, $y$, and a number of associated control (or input) variables denoted by $x_1, x_2, \ldots, x_k$ [38]. The most extensive applications of RSM are in cases where several input variables potentially influence some performance measure or quality characteristic of a process. This performance measure or quality characteristic is called the response. The objectives of RSM include the determination of the variable settings for which the mean response is optimized and the estimation of the response surface. In general, such a relationship is unknown but can be approximated by a low-degree polynomial model of the form

$$y = \mathbf{f}'(\mathbf{x})\,\boldsymbol{\beta} + \epsilon, \tag{4}$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_k)'$, $\mathbf{f}(\mathbf{x})$ is a vector function of $p$ elements, $\boldsymbol{\beta}$ is a vector of $p$ unknown constant coefficients referred to as parameters, and $\epsilon$ is a random experimental error assumed to have a zero mean. RSM has been widely applied to optimize processes in environmental studies, for example in the modeling and analysis of water and wastewater treatment processes [39].
Some stages in the application of RSM are [40]:
(1) the selection of the independent variables with major effects on the system through screening studies;
(2) the choice of the experimental design and carrying out the experiments according to the selected experimental matrix;
(3) the mathematical and statistical treatment of the obtained experimental data through the fit of a polynomial function;
(4) the evaluation of the fitness of the model;
(5) the verification of the necessity and possibility of performing a displacement in the direction of the optimal region;
(6) obtaining the optimum values for each studied variable.
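As a worked illustration of the polynomial model (standard RSM material, not specific to this study), consider two inputs and a second-degree model written as $y = \mathbf{f}'(\mathbf{x})\boldsymbol{\beta} + \epsilon$. The vector function and the ordinary least squares estimate of the parameters then take the form below. Here $\mathbf{X}$ denotes the matrix whose rows are $\mathbf{f}'(\mathbf{x})$ evaluated at each experimental run and $\mathbf{y}$ the vector of observed responses; both symbols are notation introduced for this example.

```latex
\mathbf{f}(\mathbf{x}) = \left(1,\; x_1,\; x_2,\; x_1 x_2,\; x_1^2,\; x_2^2\right)', \qquad p = 6,
\qquad
\hat{\boldsymbol{\beta}} = \left(\mathbf{X}'\mathbf{X}\right)^{-1}\mathbf{X}'\mathbf{y}.
```

The estimate exists when $\mathbf{X}'\mathbf{X}$ is nonsingular, which requires at least as many distinct experimental runs as parameters.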

Artificial Neural Network.
An artificial neural network (ANN) consists of a set of processing units, also called neurons, that are connected with each other. It can be described as a directed graph in which each node is a neuron with a transfer function. A neuron is generally a nonlinear element with multiple inputs and a single output. The architecture of a neural network is determined by all the connections in the network and the transfer functions of the neurons [41].
The backpropagation algorithm proposed in [42] is the most popular algorithm to train ANNs. Moreover, advanced methods such as Marquardt [43][44][45], quasi-Newton [46], and conjugate gradient algorithms [47,48] are also very popular. Because of their application in dynamic environments, these classic learning methods have to be modified to fulfill three important requirements, all of them in accordance with the learning process:
(i) the capacity to work in online mode;
(ii) the capacity to adjust its control parameters;
(iii) the capacity to adapt its structures.

Numerical Experiments
In this work, two sets of numerical experiments were considered. The first one consists of 360 experiments taken from [2], where $K = 3, 4, \ldots, 20$ and $N = 1, 2, \ldots, 20$ (Figure 1); it was used for the development of the RSM and ANN models. The second one consists of 55 experiments (Table 1); these data were used to validate the obtained models.
In the latter case, the experiments are classified in three categories: small, medium, and large lines (Table 2). In all cases, reliable, exponential, and balanced lines were considered, with equal mean service rates $\mu_i = \mu = 1$.

Statistical Analysis for Response Surface Methodology.
As mentioned above, the first dataset, composed of 360 numerical experiments, was employed for the construction of the RSM model. Two numerical factors, $K$ (number of machines) and $N$ (buffer size), were considered as the independent variables in the experimental design. In addition, a fourth-order mathematical model was chosen for modeling the relationship among $K$, $N$, and the throughput, because it is sufficiently complex to approximate the main features of the system. The data were analyzed by multiple regression using the least squares method to fit a fourth-order polynomial in the coded independent variables, where Th is the throughput response, $\beta_0$ is the intercept, and $\beta_1, \ldots, \beta_{13}$ are the regression coefficients. For this task, a specialized Matlab toolbox was used to carry out the regression analysis.
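The regression step can be sketched in a few lines of plain Python. This is a minimal illustration, not the Matlab toolbox procedure used in the study, and it rests on several assumptions: the grid is taken as 3 to 20 machines by 1 to 20 buffer slots, the factors are coded linearly onto [-1, 1], the full fourth-order basis with 15 monomials is kept (the text reports 14 coefficients, and the omitted term is not recoverable here), and the response is a synthetic stand-in rather than the throughput data of the study. All function names are hypothetical.

```python
from itertools import product


def quartic_terms(x: float, y: float) -> list:
    """All monomials x**i * y**j with i + j <= 4 (15 terms, intercept first)."""
    return [x ** (total - j) * y ** j
            for total in range(5) for j in range(total + 1)]


def code_variable(value: float, low: float, high: float) -> float:
    """Linearly map a factor from [low, high] onto the coded range [-1, 1]."""
    return 2.0 * (value - low) / (high - low) - 1.0


def solve_linear_system(a: list, b: list) -> list:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(rows: list, targets: list) -> list:
    """Least squares coefficients from the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Assumed grid: 3..20 machines by 1..20 buffer slots (360 combinations).
grid = list(product(range(3, 21), range(1, 21)))
# Synthetic stand-in response (NOT the throughput data of the study).
throughput = [1.0 - 1.0 / (2.0 + n / k) for k, n in grid]

rows = [quartic_terms(code_variable(k, 3, 20), code_variable(n, 1, 20))
        for k, n in grid]
beta = fit_least_squares(rows, throughput)
predicted = [sum(b * t for b, t in zip(beta, row)) for row in rows]
mse = sum((p - t) ** 2 for p, t in zip(predicted, throughput)) / len(grid)
```

Coding the factors before building the monomials keeps the normal equations reasonably well conditioned, which matters for a fourth-order basis.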

Artificial Neural Network Development.
A Matlab script was used to generate the ANN models with the Matlab Neural Network toolbox, starting with a three-layer model in which the first layer (input layer) contains two neurons ($K$ and $N$). The number of hidden layers was varied from 1 to 4, and the output layer, composed of one neuron, represents the throughput of the production line.
Each hidden layer was composed of 5, 8, 10, 12, or 15 neurons. Thus, a total of 780 ANN configurations were created, and for each of them 50 iterations were executed in order to find the best model for every configuration. After the 780 best ANNs were generated and saved in a .mat file, another Matlab script was implemented to find the ANN with the lowest error and a coefficient of correlation closest to 1.
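The count of 780 is consistent with choosing one of the five layer sizes independently for every hidden layer and summing over depths 1 to 4. The sketch below enumerates the configurations under that assumption; it is written in Python for illustration and is not the Matlab script of the study.

```python
from itertools import product

NEURON_OPTIONS = (5, 8, 10, 12, 15)
MAX_HIDDEN_LAYERS = 4

# Every ordered choice of layer sizes for 1, 2, 3, and 4 hidden layers:
# 5 + 5**2 + 5**3 + 5**4 configurations in total.
configurations = [sizes
                  for depth in range(1, MAX_HIDDEN_LAYERS + 1)
                  for sizes in product(NEURON_OPTIONS, repeat=depth)]

print(len(configurations))  # prints 780
```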
The obtained ANNs were trained with a Levenberg-Marquardt backpropagation algorithm [49].
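For reference, the Levenberg-Marquardt weight update in its standard textbook form is shown below. The symbols are introduced here for illustration and are not taken from the source: $\mathbf{w}_t$ is the weight vector at iteration $t$, $\mathbf{e}$ is the vector of network errors, $\mathbf{J}$ is the Jacobian of $\mathbf{e}$ with respect to the weights, and $\lambda > 0$ is the damping parameter.

```latex
\mathbf{w}_{t+1} = \mathbf{w}_t - \left(\mathbf{J}^{\top}\mathbf{J} + \lambda \mathbf{I}\right)^{-1} \mathbf{J}^{\top}\mathbf{e}
```

A large $\lambda$ makes the step resemble gradient descent with a small step size, whereas a small $\lambda$ makes it approach the Gauss-Newton step.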
The design of the ANN is as follows:
(i) Two input variables, $K$ and $N$.
(ii) One output variable, the throughput, which is the maximum or near-maximum performance of the production line.
(iii) One to four hidden layers, which employ the hyperbolic tangent sigmoid transfer function as the activation function in every hidden layer:

$$f(x) = \tanh(\alpha x) = \frac{e^{\alpha x} - e^{-\alpha x}}{e^{\alpha x} + e^{-\alpha x}}, \tag{6}$$

where $\alpha$ is a slope parameter.
(iv) An output layer of one neuron, which employs the linear transfer function:

$$f(x) = \alpha x, \tag{7}$$

where $\alpha$ is a slope parameter.
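The design listed above corresponds to a feedforward pass of the following kind. This is a minimal sketch in plain Python rather than the Matlab toolbox model: the weights are random and untrained, the slope parameter is taken as 1, the input values are arbitrary, and the layer sizes 8, 8, 10, and 10 are borrowed from the best network reported in the results. The names `init_layer` and `forward` are hypothetical.

```python
import math
import random
from typing import Sequence


def init_layer(n_in: int, n_out: int, rng: random.Random) -> tuple:
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(inputs: Sequence[float], layers: list) -> float:
    """Propagate the two inputs through tanh hidden layers and a linear output."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        sums = [sum(w * a for w, a in zip(row, activations)) + b
                for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        # Hidden layers use tanh; the single output neuron stays linear.
        activations = sums if is_output else [math.tanh(s) for s in sums]
    return activations[0]


rng = random.Random(0)
sizes = [2, 8, 8, 10, 10, 1]  # two inputs, four hidden layers, one output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
estimate = forward([10.0, 5.0], layers)  # untrained, so the value is meaningless
```

In practice the inputs would be scaled before training, and the weights would be fitted with a training algorithm such as Levenberg-Marquardt instead of being left at their random initial values.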
All the experiments were run on a desktop PC with the following specifications: Intel(R) Core(TM) i5-2450M CPU @ 2.50 GHz, 4.00 GB RAM.

Validation.
The mean squared error (MSE) and the coefficient of correlation ($R$) were used as indicators to measure the performance of the RSM model obtained and of every ANN generated. These indicators are defined as follows.
(a) The Mean Squared Error (MSE) [50]:

$$\mathrm{MSE} = \frac{1}{m} \sum_{i=1}^{m} \left(\mathrm{Target}_i - \mathrm{Output}_i\right)^2, \tag{8}$$

where $m$ is the number of input data to the ANN and RSM, $\mathrm{Target}_i$ is the target output value of the ANN and RSM for $\mathrm{Input}_i$, and $\mathrm{Output}_i$ is the value predicted by the ANN and RSM for $\mathrm{Input}_i$.
(b) The Coefficient of Correlation [50]. It measures the correlation between the Output and Target values. It is defined as

$$R = \frac{\mathrm{Cov}(\mathrm{Target}, \mathrm{Output})}{\sigma_{\mathrm{Target}}\,\sigma_{\mathrm{Output}}}, \tag{9}$$

where $\mathrm{Cov}(\mathrm{Target}, \mathrm{Output})$ is the covariance between the Target values and the ANN and RSM Output values, and $\sigma_{\mathrm{Target}}$ and $\sigma_{\mathrm{Output}}$ are the standard deviations of Target and Output, respectively. The maximum value $R = 1$ is reached when the linear relationship between the target and output values is perfect, whereas $R = 0$ means that no linear correlation exists between the output and target values.
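Both indicators are straightforward to compute. The following sketch uses plain Python with hypothetical function names; the sample values in the usage lines are made up for illustration and are not data from the study.

```python
import math
from typing import Sequence


def mean_squared_error(target: Sequence[float],
                       output: Sequence[float]) -> float:
    """Average of the squared differences between targets and predictions."""
    return sum((t - o) ** 2 for t, o in zip(target, output)) / len(target)


def correlation_coefficient(target: Sequence[float],
                            output: Sequence[float]) -> float:
    """Covariance of target and output divided by both standard deviations."""
    n = len(target)
    mean_t = sum(target) / n
    mean_o = sum(output) / n
    # The common 1/n factors of covariance and variances cancel in the ratio.
    cov = sum((t - mean_t) * (o - mean_o) for t, o in zip(target, output))
    var_t = sum((t - mean_t) ** 2 for t in target)
    var_o = sum((o - mean_o) ** 2 for o in output)
    return cov / math.sqrt(var_t * var_o)


target = [0.50, 0.60, 0.70]                      # made-up values
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # prints 2.0
r_same = correlation_coefficient(target, target)
r_reversed = correlation_coefficient(target, target[::-1])
```

Identical series give a coefficient of 1 and a reversed series gives -1 (up to rounding), so a negative value indicates predictions that tend to move in the opposite direction to the targets.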

Results
In the present study, the data used to develop the RSM and ANN models were gathered from a total of 360 experiments, varying the values of $K$ and $N$.
The regression model, in terms of the actual values, that describes the throughput is presented in (10). The results indicate that this fourth-order model can be used to navigate the design space. The value of $R$ for the model was 0.999603176, and the MSE = 1.22796 × 10 where Th is the throughput, $K$ is the number of machines, and $N$ is the buffer size.
The predictions of the model were validated through the correlation between the experimental data and the predicted throughput (Figure 2).
On the other hand, the ANN with the minimum error was composed of four hidden layers with 8, 8, 10, and 10 neurons, respectively (8 : 8 : 10 : 10). It is shown in Figure 3.
Figure 4 shows the correlation between experimental values and predicted throughput in production lines for the training, validation, and test datasets. Most of the data lie on the bisector or in its vicinity, which represents a proper correlation between experimental data and predicted outputs. Figure 4 thus indicates the closeness between the experimental data and the results predicted by the ANN. The maximum error and MSE are 0.34% and $1.150 \times 10^{-7}$, respectively. Furthermore, a second dataset was used to test the effectiveness of the RSM and ANN models in forecasting the throughput of a production line from the $K$ and $N$ values. Table 3 shows the experimental and predicted values, as well as the errors of the RSM and ANN models in the throughput forecasting. The experimental data and the outputs predicted by the ANN model fit each other well. In contrast, the RSM model cannot adequately predict the numerical experiments of the second dataset.
The values predicted by both models were close to the experimental data of the first dataset. However, the ANN showed a better performance with the second dataset (Table 4).
In the case of the ANN model, the correlation between experimental data and predicted throughput is illustrated in Figure 5. It can be observed that there is an appropriate correlation between experimental values and predicted data. The maximum error and MSE obtained from this dataset are 2.76% and $3.57 \times 10^{-5}$, respectively.

Conclusions and Further Research
The Buffer Allocation Problem is a combinatorial problem that requires a large amount of computational time for medium and large lines. It is therefore necessary to have a model that can predict the production line throughput in a short time. In this work, the relationship among the number of buffer slots ($N$), the number of workstations ($K$), and the throughput in production lines was accurately modeled with RSM and ANNs. In order to study this relationship, a total of 360 experimental data points were used in the construction of both models.
A fourth-order mathematical model was obtained by applying RSM, with a coefficient of correlation $R = 0.9996$. On the other hand, 780 ANN models were created in order to obtain the ANN with a coefficient of correlation closest to 1, with 1 to 4 hidden layers and 5, 8, 10, 12, or 15 neurons in each layer. The ANN with the best performance has 4 hidden layers, with 8, 8, 10, and 10 neurons, respectively, and obtains a coefficient of correlation $R = 1$.
Both models perform well on the initial 360 experimental data points; however, for a second dataset of 55 experimental data points not considered in the model creation, the ANN shows a higher performance ($R = 0.99875$) than the equation obtained by RSM ($R = -0.22536$).
These results show that the ANN model provides a good fit and it can represent accurately the behavior of the throughput in production lines with different sizes, even for large lines.
As further work, this ANN model can be used to find optimal or near-optimal values of the throughput for a minimal number of buffer slots between the machines in the system, in order to minimize the total buffer size.

Figure 1: Numerical experiments used for training and testing the ANN.

Figure 4: Performance indicator $R$ for the ANN model.

Figure 5: $R$ value for predicted throughput with the ANN on the second dataset.

Figure 6: Comparison of RSM and ANN models.

Table 1: Sets of numerical experiments used to validate the ANN.

Table 2: Classification of lines.

Table 3: RSM and ANN errors for the 55 data samples.

Table 4: Comparison of errors from RSM and ANN models.