An Approach to Integrating Tactical Decision-Making in Industrial Maintenance Balance Scorecards Using Principal Components Analysis and Machine Learning

The uncertainty of demand has led production systems to become increasingly complex; this can affect the availability of the machines and thus their maintenance. Therefore, it is necessary to adequately manage the information that facilitates decision-making. This paper presents a system for making decisions related to the design of customized maintenance plans in a production plant. This paper addresses this tactical goal and aims to provide greater knowledge and better predictions by projecting reliable behavior in the medium-term, integrating this new functionality into classic Balance Scorecards, and making it possible to extend their current measuring function to a new aptitude: predicting evolution based on historical data. In the proposed Custom Balance Scorecard design, an exploratory data phase is integrated with another analysis and prediction phase using Principal Component Analysis algorithms and Machine Learning that uses Artificial Neural Network algorithms. This new extension allows better control over the maintenance function of an industrial plant in the medium-term with a yearly horizon taken over monthly intervals which allows the measurement of the indicators of strategic productive areas and the discovery of hidden behavior patterns in work orders. In addition, this extension enables the prediction of indicator outcomes such as overall equipment efficiency and mean time to failure.


Introduction
In business and engineering, decision-making approaches and models are developed in response to the uncertainty of technological and demand conditions. In business, it is possible to identify a strategic [1] or operational [2] approach or a more particularly focused approach for suppliers [3]; in the engineering field, it is possible to identify cases regarding manufacturing conditions [4], product design [5], or aspects relating to civil engineering [6]. In the context of the current market, in which delivery times are continually reduced and, more importantly, responses to orders are increasingly immediate, the production response in the industrial environment is faster, and quality and time affect both the complexity and the flexibility of the system [7]. Considering that the capacity of the machines is limited, we consider those productive areas with identified bottlenecks as strategic productive areas of the factory. Using this capacity as an invariant value, the system attempts to maintain the maximum availability of the machines that comprise a strategic productive area. Moreover, if continuous production occurs in this context, for example, in papermaking, downtime caused by damage is irrecoverable.
Another characteristic of the market context is a wider range of products, resulting in the transformation of manufacturing from mass production to flexibility; in the latter case, this versatility leads to greater wear and fatigue on machines because of the high rate of change in the configuration, potentially resulting in a loss of reliability. This finding means that it is necessary to consider more extreme measures in terms of both the prediction and the anticipation of failure. Thus, predictive maintenance engineering has developed and perfected technologies for condition monitoring and predicting failures before breakage occurs [8][9][10]. Although this approach is more operational and requires more resources and investments than following the scheme [11], it cannot be established in an entire strategic productive area without critical equipment, facilities, or machine parts. To respond to this problem, a methodology has been considered for a productive area designated as strategic that offers knowledge extraction and the prediction of availability indicators. Thus, the maintenance department can provide a timely response with minimal resources to maintain the required reliability.
In the maintenance field, when decision-making concerns long-term strategies or policies, it is convenient to consider fuzzy uncertainty. Thus, the literature review carried out by Mardani et al. [12] on fuzzy multiple criteria decision-making techniques found that, in maintenance environments, the fuzzy approach is used within a long-term strategic framework, for example, in the selection of the maintenance strategy [13][14][15] or the maintenance policy [16]. This could be extended to project [17,18] or civil engineering [19] environments, where there is more uncertainty because of the different conditions of each event. However, in industrial environments with a continuous process such as papermaking, where the same machines are used in manufacturing despite product variety, the risks in the predictions are lower.
The integration of Principal Component Analysis (PCA) and Machine Learning (ML) techniques can facilitate decision-making in these environments. PCA is a very efficient method for finding the attributes that are most influential in explaining the variation of a data set characterized by many explanatory variables in many records [20]. This algorithm is used extensively in the literature, particularly for predictive maintenance, as a method of reducing dimensions [21]. According to Alpaydin [22], ML is a branch of artificial intelligence whose goal is to programmatically automate a computer's learning process, similar to how humans and animals naturally learn through experience; the algorithms of ML directly employ the data without previously establishing an equation as a model. In addition, these algorithms improve their efficiency as the quantity of data used as examples increases during the learning. ML finds natural patterns in the data and helps make better decisions and establish predictions. Because of its versatility, ML has been used in many fields, including construction [23]. However, PCA and ML techniques are not habitually combined. The grouping of data facilitated by PCA allows a better interpretation of complex systems such as those in which ML is applied; this interpretability is considered a characteristic of achievement through ML methods [24].
This work consists of a segment of a global and modular framework for Maintenance Decision Support Systems [25], whose general objective is to propose a system that assists an expert in decision-making to design customized maintenance programs in a productive plant [26]. This system begins with the alignment of the company's strategic objectives, followed by the tactical and operational maintenance.
This paper addresses that tactical goal and has the objective of providing better knowledge and predictions by projecting reliability behavior in a medium-term future (yearly horizon taken over monthly intervals), integrating this new functionality into the classic Balance Scorecard (BSC) and making it possible to extend its current function of measuring the current situation to a new aptitude: predicting evolution based on historical data [27]. For this objective, techniques such as PCA and ML are used.

Methodology
In the proposed Custom Balance Scorecard design, Matlab© [28] is used to integrate an exploratory data phase that uses PCA algorithms with another phase of discovery and prediction that uses ML, specifically Artificial Neural Network (ANN) algorithms. The initial data used to evaluate the results were obtained from the records of a productive area composed of two main papermaking machines, coded in the Computerized Maintenance Management System (CMMS) as M1 and M2, respectively. The data have been divided into two parts. The first part, used in the exploratory phase, reflects the maintenance work orders received in the productive area in one year. The other part is used in the analysis phase, in which production values and machine responses are represented as efficiency variables and failure times; this part also covers a period of one year. Because of the continuous improvement process that characterizes the papermaking industry, the maintenance function's influence on productive efficiency and sustainability is more sensitive than in other types of industrial plants [29][30][31]; therefore, this study focuses on two indicators: overall equipment efficiency (OEE) and mean time to failure (MTTF) [32].
The PCA algorithm has been used in the exploratory phase. In the analysis phase using ML techniques, ANN is used for its versatility, since it provides algorithms for both supervised and unsupervised learning, and for its suitable behavior compared with other ML techniques that are used for prediction [33]. In unsupervised learning, two types of algorithms are used to extract knowledge of the data structure through clustering: hierarchical clustering and a Self-Organizing Map (SOM) neural network. Both algorithms identify groups of individuals with similar behaviors from individual data and have been used effectively both to identify the stages of wear in industrial environments [34] and to characterize the energy in electrical supply networks [23]. Hierarchical clustering makes it possible to show the natural grouping structure of the data as a function of the metric that is set as a criterion of proximity, whereas SOM decomposes the data into a set number of groups. Supervised learning will use ANN regression algorithms because of their suitable predictive behavior for machine maintenance variables [35].
In its management system, the production plant presents a clear division between maintenance and production, which also applies to its databases; therefore, there is no single database in which all the information can be accessed jointly. Because of this, maintenance and production data have to be accessed separately, so we have two distinct tables: dataWO, Figure 1(a), corresponding to the maintenance database, and dataWOF, Figure 1(b), corresponding to the production database. Both tables will be defined in more depth later; nevertheless, to clarify the following two phases, dataWO will contain the input data for the PCA and clustering algorithms corresponding to the unsupervised learning technique, while dataWOF will serve as input information for the regression algorithm according to the supervised learning technique. The separation of the maintenance and production departments at the level of database management leads to separate analyses and the use of different techniques in the knowledge extraction process.

Exploratory Data Phase.
There is a first preparatory data step in which the starting data correspond to the work orders (WOs) that the papermaking machines, M1 and M2, have received during a calendar year. These data have been extracted from a CMMS database; Figure 1(a) shows the treatment of these data once they have been imported; they are defined in the table dataWO. After a prior filtering, the data obtained present 46 attributes and 1080 instances of WOs. The work order is a document that in its original format presents 46 fields that represent the 46 original attributes (see Table 1). These can be grouped into descriptive fields of problem and resolution, with free text of alphanumeric type, and categorical variables of numeric type that describe the kind of work order, such as order type, requester, repair shop, repair type, urgency type, asset condition, and implication of failure. The numerical categorical variables that hold classes are order status, homogeneous family, section, installation, type of fixed assets, type of work, and operative sequences. The remaining qualitative variables are of date type and record the dates and times of the request, the programming of the intervention, and its completion. Although it would be permissible to perform a PCA on the data for all 46 attributes, it is applied only to the 7 attributes of numerical type (4 associated costs, namely, total, order, parts, and workforce costs; 2 repair times; and the number of operators), which are shown in Table 1 as input variables for the PCA. The rest of the original variables are discarded because they provide context information: they document the problem and its physical location, so they serve as prefilter variables for the location and situation of the maintenance intervention.
In the exploratory data phase, the statistical technique of PCA has been used to reduce the data dimension and find the principal axes that best represent the variation of the data. These axes are orthogonal to each other and are calculated using a linear change of basis, choosing a new coordinate system for the original data set in which the largest variance of the data set is captured on the first axis (called the first component), the second-largest variance on the second axis, and so on. This methodology reduces to a problem of eigenvalues and eigenvectors on the covariance matrix of the data, and the dimensionality of the data is reduced to those axes that make the most substantial contribution to its overall variance; therefore, as many principal axes are used as are needed for their sum to represent approximately 80% of the variation of the original data [20].
PCA starts from a data set tabulated such that each row represents an observation, instance, or individual and each column represents an attribute or variable. Consider that a data set consisting of $n$ observations with $k$ attributes is available. In matrix notation, we will express it as $\tilde{A}_{(n,k)}$, where $\tilde{A}$ is the matrix representing the table with the coefficients $(a_{ij})$ as the $i$th observation of the $j$th variable; hence, the matrix of observations $\tilde{A}$ is formed by $k$ vectors of variables, $A_j$, sorted by columns, and each vector has $n$ components corresponding to its $n$ observations, as shown in

$$\tilde{A}_{(n,k)} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nk} \end{pmatrix} = \begin{pmatrix} A_1 & A_2 & \cdots & A_k \end{pmatrix}. \tag{1}$$

To reduce the number of variables, one must find another vector subspace that is aligned with those vector components that involve more variation, and one must form a basis in which these components are represented in an orthogonal, that is, linearly independent, system. This problem is reduced to finding a vector space whose vectors, $V$, represent the variation of the data, that is, a system in which (2) is satisfied:

$$\tilde{A}\,V = \lambda\,V. \tag{2}$$

However, in this case, the variation is not reduced but is used to find the principal components and axes, that is, the eigenvalues and eigenvectors of the data. In accordance with this philosophy, we will attempt to find those components and principal axes that explain the maximum variation of the data. Thus, instead of matrix $\tilde{A}$, its covariance matrix is used (with the columns of $\tilde{A}$ centered on their means):

$$C = \operatorname{cov}(\tilde{A}) = \frac{1}{n-1}\,\tilde{A}^{T}\tilde{A}, \tag{3}$$

$$C\,V_i = \lambda_i\,V_i, \qquad i = 1, \dots, k. \tag{4}$$

Once the principal components, $\lambda_i$, are obtained along with the principal axes, $V_i$, they together explain the variation of the data; they are ordered as in a Pareto diagram, and only the set of $p$ components that explains at least 80% of the variation in the data is selected. Thus, a reduction in the dimensions of the data from $k$ original variables to $p < k$ variables is obtained. In general, the matrix of observations projected onto the principal axes, $\tilde{Y}$, contains $n$ observations of the $k$ variables that are obtained by (5), where $P$ is the matrix formed by columns with the eigenvectors $V_i$ obtained from (4):

$$\tilde{Y}_{(n,k)} = \tilde{A}_{(n,k)} \cdot P_{(k,k)}. \tag{5}$$

This transformation expresses the original data in axes that coincide with the natural variation. One aspect to be considered in this analysis is that this transformation is linear; therefore, it is not suitable for representing nonlinear problems. In cases of nonlinearity, it is advisable to use ML clustering algorithms, as will be observed later.
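The exploratory procedure described above (standardize the data, diagonalize the covariance matrix, order the components as in a Pareto diagram, and keep those that reach about 80% of the variation) can be sketched in a few lines. This is only an illustrative sketch: the study itself was carried out in Matlab, whereas Python with NumPy is assumed here, the function name `pca_reduce` is hypothetical, and the data are synthetic stand-ins rather than the plant's work orders.

```python
import numpy as np

def pca_reduce(a, threshold=0.80):
    """Project data onto the fewest principal axes explaining >= threshold of the variance."""
    # Standardize each column to mean 0 and standard deviation 1.
    z = (a - a.mean(axis=0)) / a.std(axis=0, ddof=1)
    # Covariance matrix of the standardized data (k x k).
    c = np.cov(z, rowvar=False)
    # eigh handles symmetric matrices; reorder the eigenpairs from largest to smallest.
    eigvals, eigvecs = np.linalg.eigh(c)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Pareto-style cumulative share of explained variance.
    explained = np.cumsum(eigvals) / eigvals.sum()
    p = min(int(np.searchsorted(explained, threshold)) + 1, len(eigvals))
    # Projection of the observations onto the p retained axes.
    y = z @ eigvecs[:, :p]
    return y, eigvals, explained, p

# Synthetic stand-in data (NOT the plant data): 3 independent columns
# plus one column that is a linear combination of two others.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
a = np.column_stack([base[:, 0], base[:, 1], base[:, 2], base[:, 0] + base[:, 1]])

y, eigvals, explained, p = pca_reduce(a)
print("retained components:", p)
print("cumulative explained variance:", np.round(explained, 3))
```

In this synthetic example, the linearly dependent column shows up as an eigenvalue that is numerically zero, which is how redundant input variables reveal themselves in the analysis.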

Phase of Analysis through Machine Learning.
In this phase, the preparatory data step uses as input data, in addition to the previous data, the production values and their responses as efficiency variables and failure times for an operational year for both papermaking machines (M1 and M2). Data are extracted and grouped from two databases: maintenance and production. The data obtained present 35 attributes and 12 instances, one per month, for each machine (identified as M1 and M2). Figure 1(b) shows the treatment of these data once they have been imported; they are defined in the table dataWOF. The manufacturing report is a document that in its original format presents 35 fields that represent the original attributes (see Table 2). These can be grouped into fields of alphanumeric type that identify the machine in question and numerical categorical variables that record the calendar month. The remaining attributes are of numeric type and record the values of time, cost, interventions, and production. For each papermaking machine, 3 predictor variables associated with production parameters, shown in Table 2 (daily production in tons of paper per day, average paper weight in grams per square meter, and average speed in meters per second), are selected, with the objective of obtaining two simultaneous predicted responses (OEE, as the percentage of machine utilization, and MTTF). These responses, which evaluate the aptitude of the maintenance function applied to the productive area, are provided by both machines. The three predictor variables are those that, from experience, best characterize production. Although a PCA could be performed as in the exploratory phase for dataWO, it was not considered on this occasion because of the few instances, 12, available for each machine. Since PCA is a statistical analysis, these few instances are considered insufficient to carry it out, so at this point the selection is based on criteria drawn from experience. However, as more production data and more instances are obtained, a PCA can be performed on all or some of the productive attributes of Table 2 to reduce the dimension and select those that have the most influence on the variation of the data.
ML is divided into two techniques [27]: supervised learning, which is training a model on known input and output data to predict future outputs, and unsupervised learning, which is finding hidden patterns and intrinsic structures in the input data. Figure 2 provides an illustration of ML. For each technique, different algorithms can be used, and the most suitable one is chosen by trial and error.
Supervised learning uses classification and regression techniques to develop predictive models. The difference between these techniques is that classification predicts responses in discrete or categorical variables, whereas regression predicts responses in a continuous variable [36]. Unsupervised learning uses the clustering technique commonly used in exploratory data analysis to find hidden patterns as clusters in data.
Among the algorithms of ML (Support Vector Machine, Discriminant Analysis, Naive Bayes, Nearest Neighbor, Decision Trees, K-means, Hierarchical, Gaussian Mixture, Hidden Markov Model, and ANN), we will use algorithms modeled with ANN for their versatility for both unsupervised techniques (clustering) and supervised techniques (regression). The former groups the input data to recognize patterns and define the natural groups present in the data; for the latter, according to Rumelhart et al. [37], the back-propagation training process of ANN is used as the supervised learning technique for regression. The ANN backpropagation training process is divided into two stages: forward and backward propagation. A network configuration consisting of multilayer perceptrons, as shown in Figure 3, is used, with an activation function whose output range is [0, 1] for an input $x$, which is expressed by

$$f(x) = \frac{1}{1 + e^{-x}}. \tag{6}$$

In the forward propagation stage, we select an input data set for training $(x_1, x_2, \dots, x_{n-1})$ and apply it to the network to obtain the outputs $y_k$. For each neuron $j$ of the hidden layer, the value of each nucleus $net_j$ is given by

$$net_j = \sum_{i} w_{ji}\,x_i, \tag{7}$$

where each input value of the previous layer $x_i$ is weighted by $w_{ji}$, and the output of the hidden layer is expressed by

$$y_j = f(net_j). \tag{8}$$

This is a nucleus activation function and is performed iteratively for each layer until the final output, $y_k$, as in

$$y_k = f\Big(\sum_{j} w_{kj}\,y_j\Big). \tag{9}$$

The backward propagation stage consists of measuring the error committed as the difference between the calculated value, $y_k$, and the real value, $d_k$. We recalculate the weights $w$, attempting to minimize the error in the reverse direction, first obtaining the new weights of the output layer, $w_{kj}^{new}$, based on the old ones, $w_{kj}$,

$$w_{kj}^{new} = w_{kj} + \eta\,\delta_k\,y_j, \qquad \delta_k = (d_k - y_k)\,y_k\,(1 - y_k), \tag{10}$$

and later the new weights of the hidden layer, $w_{ji}^{new}$,

$$w_{ji}^{new} = w_{ji} + \eta\,\delta_j\,x_i, \tag{11}$$

where $\eta$ is the learning rate and $\delta_j$ is obtained by applying (9) on the following equation:

$$\delta_j = y_j\,(1 - y_j) \sum_{k} \delta_k\,w_{kj}. \tag{12}$$

This process is repeated for the $N$ observations until a predetermined acceptable value of the error is achieved, usually using the mean square error (MSE), which is defined in (13):

$$\mathrm{MSE} = \frac{1}{N} \sum_{m=1}^{N} (d_m - y_m)^2. \tag{13}$$

To ensure a rapid convergence of the iterative method, mathematical optimization methods are usually used; in this case, we use Bayesian Regularization [38].
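The two stages of backpropagation described above can be illustrated with a small one-hidden-layer perceptron. The sketch below is an assumption-laden illustration, not the network used in the study: it is written in Python with NumPy rather than Matlab, it uses plain batch gradient descent instead of Bayesian Regularization, the names `train_mlp` and `predict`, the hidden-layer size, and the learning rate `eta` are all hypothetical, and the data are synthetic.

```python
import numpy as np

def sigmoid(x):
    """Activation function with output range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def train_mlp(x, d, hidden=5, eta=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer perceptron by batch backpropagation.

    Returns the weights and the MSE recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    n, n_in = x.shape
    n_out = d.shape[1]
    w1 = rng.normal(scale=0.5, size=(n_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.5, size=(hidden, n_out))
    b2 = np.zeros(n_out)
    history = []
    for _ in range(epochs):
        # Forward propagation: hidden outputs, then network outputs.
        h = sigmoid(x @ w1 + b1)
        y = sigmoid(h @ w2 + b2)
        err = d - y
        history.append(float(np.mean(err ** 2)))
        # Backward propagation: output deltas first, then hidden deltas.
        delta_out = err * y * (1.0 - y)
        delta_hid = (delta_out @ w2.T) * h * (1.0 - h)
        w2 += eta * h.T @ delta_out / n
        b2 += eta * delta_out.mean(axis=0)
        w1 += eta * x.T @ delta_hid / n
        b1 += eta * delta_hid.mean(axis=0)
    return (w1, b1, w2, b2), history

def predict(params, x):
    """Forward pass with trained weights."""
    w1, b1, w2, b2 = params
    return sigmoid(sigmoid(x @ w1 + b1) @ w2 + b2)

# Synthetic data: 3 predictors and 2 simultaneous targets scaled into (0, 1).
rng = np.random.default_rng(1)
x = rng.uniform(size=(40, 3))
d = np.column_stack([0.3 + 0.4 * x[:, 0], 0.8 - 0.5 * x[:, 1] * x[:, 2]])

params, history = train_mlp(x, d)
print("MSE at first and last epoch:", history[0], history[-1])
```

Because the output activation is bounded, targets such as a percentage or a time in hours would need to be scaled into that range before training and rescaled afterwards; that preprocessing step is an assumption of this sketch and is not described in the text.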

Results
It is possible to integrate new functionalities into a custom control panel of the industrial plant. In this case, predictive analysis was added for the expected availability response of a productive area that is considered the main core of the industrial plant; thus, it is possible to anticipate the information. Knowing the future availability of both machines in the medium term (at monthly intervals) allows the maintenance department to correct possible deviations that are out of tolerance before they occur, improving its response. In the first phase, which is exploratory, we use PCA to discover the smallest number of dimensions that explain the variation of the data. Applying the PCA to the set of WOs of productive area 2 (composed of M1 and M2), principal components or axes, PCi, are found; these are sufficient to explain the variation of the original data contained in the WOs of the productive area. As a result of the PCA, 5 components are identified that would explain 100% of the variation; therefore, 2 linearly dependent vectors are detected among the 7 input variables, reducing the original dimension by two. Of these 5 principal components, the first 3 account for 78.6% of the variation in the data. This work focuses on the number of interventions, costs, and maintenance times and represents the results of the PCA on the first 3 principal components. Figure 4(a) shows that the first three principal components represent 78.6% of the variation of the data, which is why the data are represented using these three components as principal axes; this is visualized in Figure 4(b), with projections of the original data on the three principal components. The data were previously normalized to mean 0 and standard deviation 1.
Table 3 shows the values obtained in the PCA of the maintenance metrics, where the projections are obtained on the three principal components of the maintenance numerical variables selected from Table 1 (total cost, order cost, parts cost, workforce cost, estimated repair time, repair time, and number of operators). Of the seven maintenance variables studied, those that gain the most relevance on the principal axes are, from greatest to least, order cost, total cost, and part cost; this ranking is obtained by computing the Euclidean modulus, or norm, of each variable across the three principal axes, which is shown in the last column of Table 3. The four remaining variables have a more or less similar amplitude, so they are considered of equal importance.
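The ranking criterion described above, the Euclidean norm of each variable's coordinates on the three principal axes, amounts to a short calculation. The sketch below uses made-up loading values purely for illustration; they are not the values of Table 3, and the helper name `rank_by_norm` is hypothetical.

```python
import math

# Hypothetical coordinates of each variable on PC1, PC2, PC3
# (illustrative numbers only, NOT the values reported in Table 3).
loadings = {
    "order cost": (0.60, 0.30, 0.20),
    "total cost": (0.50, 0.30, 0.10),
    "part cost": (0.40, 0.20, 0.20),
    "repair time": (0.10, 0.30, 0.10),
}

def rank_by_norm(loadings):
    """Sort variables by the Euclidean norm of their coordinates, largest first."""
    norms = {
        name: math.sqrt(sum(c * c for c in coords))
        for name, coords in loadings.items()
    }
    return sorted(norms.items(), key=lambda item: item[1], reverse=True)

ranking = rank_by_norm(loadings)
for name, norm in ranking:
    print(f"{name}: {norm:.3f}")
```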
In the second phase, ML, the clustering technique is used to discover patterns hidden in the data, such as the natural grouping. For this technique, from the data stored in the dataWO table shown in Figure 1(a), only two attributes are used: total cost and repair time. Both attributes can be key indicators for the maintenance department, and the repair time can also be key for the production department because of downtime. Therefore, once their influence on the principal components is known, it is important to examine their relationship in more depth.
For the clustering technique, two algorithms have been used. The first, hierarchical clustering, allows the creation of a dendrogram, which is a tree diagram that shows the number of natural groups, or clusters, depending on the distance criterion that is fixed between data. By setting a distance value on the ordinate axis, the tree is cut by a horizontal line that intersects the dendrogram at as many points as there are natural groups. In this case, Figure 5(a) shows that, for Euclidean mean distances of 6000 to 7000, the tree presents two natural groups; from 3000 to 5500, it presents three groups; from 2000 to 3000, it presents four groups; and below 1000, the number of groups increases considerably. Because of this compression, a value of 900 is used; pruning the tree at this value yields 7 natural groups, which are illustrated in different colors in Figure 5(b). As can be seen, the hierarchical clustering technique gives an overview of the number of clusters that can be obtained as a function of the chosen distance value. Several distance metrics are available, and a custom metric can even be defined; in this case, the Euclidean distance has been used.
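Cutting a dendrogram at a chosen distance is equivalent to merging clusters until the closest pair is farther apart than that distance. The following is a minimal pure-Python sketch of this idea; average linkage is assumed here because the text speaks of Euclidean mean distances, the function name `average_linkage` is hypothetical, and the (cost, time) points are invented for illustration.

```python
import math

def average_linkage(points, cutoff):
    """Agglomerative clustering with average linkage.

    Repeatedly merges the two closest clusters and stops when the smallest
    average inter-cluster Euclidean distance exceeds `cutoff`.
    Returns a list of clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def dist(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        d, i, j = min(
            (dist(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Invented (total cost in EUR, repair time in h) pairs, for illustration only.
points = [(100, 1.0), (120, 1.5), (150, 2.0), (2000, 10.0), (2100, 11.0), (5000, 30.0)]

groups_fine = average_linkage(points, cutoff=900)
groups_coarse = average_linkage(points, cutoff=2500)
print(len(groups_fine), "groups at cutoff 900:", groups_fine)
print(len(groups_coarse), "groups at cutoff 2500:", groups_coarse)
```

Raising the cutoff merges more groups, which mirrors how moving the horizontal line up the dendrogram reduces the number of natural groups. Note that, with unscaled variables, the cost axis dominates the Euclidean distance.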
The clustering technique is used again with a second algorithm, an SOM ANN, on the subset of total cost and repair time data, the variables chosen to reflect the costs and times of the plant's maintenance interventions. An SOM, or Kohonen network, consists of a competitive layer that can classify a set of vector data with any number of dimensions into as many classes as the layer has neurons [39][40][41]. Neurons are arranged in a two-dimensional topology of the data set. The trained network with 2 variables and 1080 input data is shown in Figure 5(c); its two-dimensional topology with data impact is shown in Figure 5(d).
The network is configured in 2 dimensions, 2 × 4, discovering a pattern of 8 natural groups in the data; these are distributed with a clear linear relationship between them. In addition, there are discrepant data that have no linear relationship and reveal an unconventional repair; this extraordinary repair was carried out on one of the machines and was not cataloged as a normal repair. This finding reveals an error in the introduction of the information in the CMMS. This event was also revealed by the green dot (single group) of the hierarchical clustering figure (see Figure 5(b)). Figure 6(a) shows the original data in the two variables (cost, time), and Figure 6(b) shows the 7 natural groups and the linear relationship between them. This figure also shows that group 8 has no linear relationship, as previously discussed. Under the maintenance approach, the distribution of the groups can be interpreted as follows: the first four groups, for costs between 0 and 1000 €, are very close to each other, whereas the centroids at 2000 €, 3000 €, and 5000 € are farther apart. From this, it is inferred that the majority of the interventions cost less than 1000 €, with a smaller number of interventions at intervals of 1000 €.
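The competitive layer of a self-organizing map with a 2 × 4 topology can be sketched as follows. This is a simplified illustration under several assumptions: it is written in Python with NumPy rather than with the Matlab toolbox used in the study, the decay schedules for the learning rate and neighborhood width are arbitrary choices, the names `train_som` and `assign` are hypothetical, and the input is synthetic two-variable data.

```python
import numpy as np

def train_som(data, rows=2, cols=4, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small self-organizing map and return one weight vector per neuron."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of the neurons in the two-dimensional topology.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize the weights from randomly chosen samples.
    picks = rng.choice(len(data), size=rows * cols, replace=False)
    weights = data[picks].astype(float)
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for idx in rng.permutation(len(data)):
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 0.1     # neighborhood shrinks, never zero
            x = data[idx]
            # Best matching unit: the neuron whose weights are closest to the sample.
            bmu = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
            grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            influence = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the winner and its grid neighbors toward the sample.
            weights += lr * influence[:, None] * (x - weights)
            step += 1
    return weights

def assign(data, weights):
    """Index of the closest neuron (class) for every sample."""
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(d, axis=1)

# Synthetic, already standardized two-variable data (stand-in for cost and time).
rng = np.random.default_rng(2)
cost = rng.uniform(0.0, 1.0, size=120)
data = np.column_stack([cost, 0.8 * cost + rng.normal(scale=0.05, size=120)])

weights = train_som(data)
labels = assign(data, weights)
print("samples per neuron:", np.bincount(labels, minlength=8))
```

Each of the 8 neurons defines one class, so the number of groups is fixed in advance by the grid size, in contrast to the distance cutoff of hierarchical clustering.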
Finally, the regression technique enables prediction of the future availability values of both main papermaking machines (M1 and M2) using the OEE indicator of each papermaking machine and its MTTF, that is, the average runtime before failure. In addition, these values are calculated simultaneously in the trained ANN model. The neural network is trained with input data (predictors) that combine the three production variables and with the two target output variables, which are the overall efficiency of each machine (OEE) and the mean time to failure (MTTF), measured for the 12 months of the year for each machine. For this technique, from the data stored in the dataWOF table shown in Figure 1(b), only three production attributes are used: daily production in tons of paper per day, average paper weight in grams per square meter, and average speed in meters per second.
For the papermaking machines M1 and M2, with 3 input variables, a 10-layer feed-forward network with hidden neurons and 2 layers of linear output neurons can fit multidimensional mapping problems arbitrarily well, given consistent data and sufficient neurons in its hidden layer. The network will be trained with 70% of the data using the backpropagation algorithm with Bayesian Regularization; 15% of the data will be used for validation, and the remaining 15% will be used for testing. In short, this network can predict the maintenance behavior of the production area of the plant in availability terms (efficiency and operating times) when the model is fed with the predictor values of production. As more instances of input/output data are introduced, the network will be retrained and will become more reliable, since it will be adjusted with a greater number of real examples; the greater quantity of data will therefore lead to the acquisition of more experience and knowledge. In maintenance terms, it is necessary to make predictions of availability and operating times according to three characteristic values of production, and the model established in this way makes it possible to predict the OEE and MTTF, which show a nonlinear behavior over time, as shown in Figure 9. In this sense, for M1, the OEE oscillates around an average value of 94% with a low dispersion of ±0.66%, whereas the MTTF has a mean of 81.07 h with a high dispersion of ±25.82 h. For M2, the OEE has an average value of 95.52% with a low dispersion of ±1.12%, whereas the MTTF has a mean of 118.03 h with a very high dispersion of ±59.35 h. It is concluded that the regression model can accurately predict oscillations of both high and low amplitude over time.
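To make the data partition concrete, the sketch below divides the 12 monthly records of one machine into a 70% training subset, a 15% validation subset, and the remainder. It is a hypothetical illustration in plain Python: the helper `split_indices`, the random seed, and the rounding rule are assumptions, and Matlab's own data-division routines may assign the records differently.

```python
import random

def split_indices(n, train=0.70, validation=0.15, seed=0):
    """Shuffle 0..n-1 and cut it into training, validation, and remaining index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train * n)
    n_val = round(validation * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

# 12 monthly observations per machine, as in the dataWOF table.
train_idx, val_idx, rest_idx = split_indices(12)
print(len(train_idx), len(val_idx), len(rest_idx))
```

With only 12 records, this partition leaves 8 observations for training and 2 for each of the other subsets, which illustrates how little data is available to check the fit.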
It is noted that the fit is acceptable for training, as indicated by the values of the global adjustment regression coefficient R shown in Figure 7(b) for both machines.
For machine M1, discrepant values are observed in the validation fit for month 10 for both the OEE and MTTF indicators; there are two possible reasons for this. First, overadjustment (overfitting) can occur when the data have not been prepared well and contain erroneous or poorly conditioned input information. Second, there may be a low number of observations or minimal historical information. In this paper, it has been verified that the input data for the ANN were not poorly conditioned; therefore, the overadjustment problem is discarded, and the few data (i.e., the few observations) available are considered the main cause of the discordance of the inputs for validation. The network was fed with few data for training and validation because no more data were available, for reasons of good performance in the industrial plant; more data would undoubtedly improve the learning of the network and therefore its efficiency and accuracy. However, this fact highlights another very interesting aspect of the ANN: the network is easily adaptable and configurable given a low number of observations. Here, this adaptability makes it possible to accurately predict 11 of the 12 monthly values; thus, there is a 91.67% probability of success in this case.
For machine M2, there is a nearly total adjustment for the OEE indicator but not for the MTTF indicator, for which there is an evident discrepancy in the prediction in month 6; as in M1, the minimal data (observations) used for the validation yield the same precision as for M1.
Another relevant aspect of the result is the acceptable precision in the prediction of availability indicators, which are based exclusively on time, by simply using three productive variables as predictors (daily paper mass, paper surface density, and machine speed). In addition, the two output variables, the OEE and MTTF indicators to be predicted, are calculated simultaneously, a fact that reflects an additional value of this type of network and greater computational efficiency, since the prediction of more than one objective variable is obtained in a single simulation.

Conclusions
A PCA-ML model has been developed such that it can be integrated into scorecards with a traditional focus, BSC, thus including a tactical definition of longer-term strategic approaches such as a scorecard based on BSC. This new extension allows better control over the maintenance function of an industrial plant in the medium term, with a monthly interval, in such a manner that allows the measurement of certain indicators of those productive areas that were previously considered strategic. This model of PCA and an ML algorithm using ANN can be integrated very easily into any traditional control panel by converting the developed source code to packages of different programming languages and including them in a library to be used as a function in a spreadsheet or a standalone executable application. In addition, at the control panel, this model is provided with ML to discover structures and behavior patterns that are relatively hidden in WOs. By utilizing a clustering technique, natural groups are determined in the cost variables and maintenance workforce; in addition, predictions about the availability of the productive area are made through the indicators OEE and MTTF. Thus, the scorecard model has been validated on a paper production plant. As possible future work, this methodology could be applied to civil engineering, in this case applying fuzzy uncertainty because of the particular characteristics of this sector.

Table 1: Original attributes of the maintenance work order.

Table 2: Original attributes of the manufacturing report (production database).