Reliability Estimation under Scarcity of Data: A Comparison of Three Approaches

especially for equipment characterized by little available data. The outcomes of this research will assist maintenance engineers and asset managers in choosing the optimal approach to conduct reliability analysis, whether sufficient or limited data are observed.


Introduction
Several hazardous substances are handled inside process plants; therefore, unforeseen events could produce fires, explosions, and chemical releases that could generate enormous financial loss and injuries or deaths of nearby employees and civilians [1]. Among the potential causes, asset failure is often regarded as the primary source of the aforementioned dangerous phenomena [2]; hence, the equipment involved in process industries should be adequately maintained to guarantee appropriate standards of safety and reliability, while generating a profit from the operations.
Over the past decades, safety and reliability requirements have progressively increased [3], leading to a significant deployment of resources in maintenance activities [4]. This fundamental vision has resulted in the development of many maintenance policies such as reliability-centered maintenance (RCM) [5][6][7][8][9][10][11], risk-based maintenance (RBM) [12][13][14][15][16], and condition-based maintenance (CBM) [8][9][10][11]. Within the development of a CBM plan, failure prognosis is of prominent importance; thus, there is an ongoing effort on condition monitoring and calculation of the remaining useful life [17][18][19]. Zeng and Zio [18] developed a dynamic risk assessment framework based on a Bayesian model to update the reliability of safety barriers as soon as new data are collected. After updating the reliabilities, the risk indexes are determined through an event tree (ET). Another relevant work by Chen et al. [19] proposes an integration between neuro-fuzzy systems (NFSs) and a Bayesian algorithm to predict the evolution of the operating condition of a given system. The approach is tested on two case studies, and the results reveal a greater accuracy than other conventional predictors (e.g., recurrent neural networks, NFSs, and recurrent NFSs).
To implement a proper maintenance plan based on preventive actions, a crucial task is the estimation of the probabilities of failure. During this phase, the accuracy of the prediction is essential since overestimating the failure rates could lead to greater resource consumption. By contrast, an underestimation may delay the maintenance actions, resulting in a riskier state of the operations. Consequently, a great deal of research has been conducted to provide accurate estimation procedures adopting different tools such as fault tree analysis (FTA) [20], probability graph paper [21], support vector machines [22], FORM (first-order reliability method) [23], SORM (second-order reliability method) [24], and Bayesian networks (BNs) [25]. To highlight the advantages and limitations of the estimation methodologies, many researchers have also focused their efforts on comparing the results arising from the application of distinct approaches [26][27][28]. Musleh and Helu [27] applied Bayesian inference, MLE, and LSE to censored data samples. The results of this study pointed out the Bayesian estimator as the best in terms of bias, mean squared error, and Pitman nearness probability. A more recent work by BahooToroody et al. [28] presented a comparison between the MLE and the HBM in case of perfect repair and minimal repair. The authors tested the two approaches on an NGRMS, proving that the Bayesian inference provides more precision in failure modelling than the MLE. The proposed frameworks operate under the condition of sufficient available data, while the challenges arising from sparse and limited data have not been addressed.
Within probability and reliability applications, data scarcity is regarded as one of the main issues. Indeed, a lack of available data increases the uncertainties related to the estimation process [29], sometimes causing the inability to find the probability distributions [30]. Sparse data can be generated by many sources, such as the rarity of an event, limited knowledge, missing data, and improper data collection. Moreover, the implementation of maintenance strategies also contributes to reducing the available data, since maintenance actions are performed to prevent failures from happening [31]. Quite recently, a BN-based quantitative risk assessment methodology was developed by Yang et al. [32]. In this work, a BN along with precursors is adopted to cope with data scarcity, while the consequences are evaluated through loss functions. The classic Bayesian approaches can partially compensate for limited data by incorporating prior knowledge and expert judgments. However, under the primary assumption of the work, they neglected the effect of source-to-source variability of failure data in the process model [33]. To overcome this limitation and simultaneously deal with sparse data, the HBM, along with precursor data, has been extensively exploited by many academics [34][35][36][37][38]. Li et al. [39] integrated the BN and the HBM for a dynamic risk assessment of a submerged pipeline. In their study, the classical BN is used to model the conditional dependencies among primary events, while the hierarchical approach is developed for predicting the probabilities associated with the basic events and the safety barriers.
The proposed methodology can be updated as soon as new information becomes available, by including it in the prior distribution characterizing the HBM.
In recent years, the adoption of the HBM has spread to a broader audience thanks to advances in open-source Markov chain Monte Carlo (MCMC) sampling software such as OpenBUGS [40]. Examples of applications include RBM planning [41,42], condition monitoring [43,44], and probabilistic risk assessment [45,46]. Recently, Abaei et al. [47] presented an HBM-based methodology able to predict the probability of failure of a tidal energy converter, assuming a homogeneous Poisson process (HPP) for the failure modelling.
Despite all the ongoing efforts, there is still a need for a sound tool able to deal with the uncertainties arising from the lack of data in maintenance applications. Indeed, while the literature provides many comparison studies among distinct statistical methodologies when sufficient data are observed, less interest has been devoted to the comparison of estimation tools under the assumption of limited available data. To this end, the main objective of this paper is to provide a comparison between the Bayesian inference and two classic estimation approaches (i.e., MLE and LSE) in the event of data scarcity arising from frequent preventive maintenance actions. The methods are evaluated based on their accuracy in estimating the failure rate of the components belonging to an NGRMS, which is chosen as a case study.

Hierarchical Bayesian Modelling.
The first step required to conduct a statistical inference is collecting "Data," which are defined as the observed values of a given process. Next, "Information" is obtained by manipulating, evaluating, and organizing "Data." The process of gathering "Information" leads to acquiring "Knowledge," which is subsequently exploited to perform "Inference" [48]. As stated by El-Gheriani et al. [29], the HBM carries out the inference task through Bayes' theorem, which relies on the proportionality between the posterior distribution π1(θ|x) and the product of the likelihood function f(x|θ) and the prior distribution π0(θ):

π1(θ|x) ∝ f(x|θ) π0(θ).

The prior distribution is usually named informative when it conveys relevant information about the unknown parameter of interest (θ), while it is regarded as noninformative when little or no information about θ is considered [49]. It is worth mentioning that the HBM owes its name to the adoption of a multistage or hierarchical prior distribution [50], given by [51]

π0(θ) = ∫ π1(θ|φ) π2(φ) dφ,

where φ is a vector whose components are called hyperparameters, while π2(φ) is the hyperprior distribution, representing the uncertainty of φ. Finally, π1(θ|φ) is referred to as the first-stage prior distribution, which captures the variability of θ given a certain value of φ.
where y_i is the ith observation, while prd_i(θ) is the model prediction associated with the ith observation. The remainder of the paper is organized as follows: Section 2 describes the steps of the proposed study. Section 3 illustrates the implementation of the methodologies on the NGRMS, while Section 4 provides the discussion of the results. At last, in Section 5, conclusions are presented.
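Because the Poisson likelihood used later in this paper admits a conjugate gamma prior, the Bayes update sketched above can be written in closed form. The following minimal Python sketch is an illustration only (the prior values are assumptions, not the paper's; the observed counts mirror source 1 of the case study, 3 failures in 44,300 hours):

```python
# Conjugate gamma-Poisson update: Bayes' theorem pi1 ∝ f * pi0 in closed form.
# The prior hyperparameters below are illustrative assumptions.

def gamma_poisson_update(alpha, beta, failures, exposure_hours):
    """Posterior hyperparameters of a gamma(alpha, beta) prior on a
    Poisson rate, after observing `failures` events over `exposure_hours`."""
    return alpha + failures, beta + exposure_hours

# Weakly informative prior with mean alpha/beta = 1e-5 failures per hour.
alpha1, beta1 = gamma_poisson_update(1.0, 1.0e5, failures=3, exposure_hours=44_300)
posterior_mean = alpha1 / beta1      # point estimate of the failure rate
print(posterior_mean)                # about 2.77e-05 per hour
```

In the full HBM the hyperparameters themselves carry a hyperprior and are sampled by MCMC rather than fixed, as described in Section 2.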

Methodology
Within the reliability analysis process, the exploitation of different estimation tools could lead to distinct results, which may affect the adopted maintenance strategy. To this end, the main goal of this paper is to investigate the application of three estimation methodologies when few data are available, focusing mainly on the comparison between the Bayesian approach and the classic approaches (i.e., MLE and LSE). A brief overview of the framework is represented in Figure 1.
The first step (1) of the methodology is to collect failure data generated by the considered process. Since the majority of industrial equipment undergoes substantial preventive maintenance, both Times To Failure (TTFs) and Censored Times To Failure (CTTFs) are taken into account for the study (1.1); moreover, the number of failures observed during a specified time span is also considered (1.2). CTTFs arise when preventive maintenance is performed or a given component survives longer than the exposure time.
During the second phase (2), the failure model is specified. In the present work, the HPP is adopted for modelling the failure behaviour of the considered devices.
The HPP describes a scenario where the interarrival times between failures are independent and identically distributed according to an exponential distribution. Due to the frequent preventive measures that completely restore the life of the active components, the assumption of a constant failure rate (i.e., a failure rate independent of time) is regarded as appropriate for this study.
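The HPP assumption can be illustrated with a short Python sketch (the rate and horizon below are assumed, for illustration only): interarrival times are drawn i.i.d. from an exponential distribution, and the final unfinished interval is right-censored at the end of the observation window, which is one of the ways CTTFs arise.

```python
import random

def simulate_hpp_ttfs(rate_per_hour, horizon_hours, seed=42):
    """Draw the failure history of a homogeneous Poisson process:
    i.i.d. exponential interarrival gaps at a constant rate, with the
    last (unfinished) interval right-censored at the horizon."""
    rng = random.Random(seed)
    elapsed, ttfs = 0.0, []
    while True:
        gap = rng.expovariate(rate_per_hour)
        if elapsed + gap > horizon_hours:
            break                      # censored interval: no failure observed
        elapsed += gap
        ttfs.append(gap)
    return ttfs

# Hypothetical example: a regulator-like rate over ten years of operation.
sample = simulate_hpp_ttfs(1.64e-5, 10 * 8760)
print(len(sample))                     # number of failures in the window
```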
Next, the third part (3) consists of selecting the desired estimation tool, which is used to compute the failure rate of each apparatus (4).
Three estimation methodologies are evaluated in this paper: (i) HBM (3.1), (ii) MLE (3.2), and (iii) LSE (3.3). The results arising from the different methods are then compared (5) to point out the most accurate and precise estimator. A special focus on the comparison between the more recent Bayesian inference and the classic approaches is presented.

Hierarchical Bayesian Modelling.
Assuming an HPP for the failure events of a given system, the number of failures x experienced during a timeframe t can be obtained via the Poisson distribution

f(x|λ) = (λt)^x e^(−λt) / x!,

where λ is the intensity characterizing the Poisson process, i.e., the unknown parameter of interest. As suggested by Siu and Kelly [49], the first-stage prior representing the variability of λ among different sources should be a gamma distribution, given by

π1(λ|α, β) = β^α λ^(α−1) e^(−βλ) / Γ(α),

where α and β are the hyperparameters, which are considered independent before any observations are included in the analysis [48]. After choosing the likelihood and the prior distributions, MCMC simulations are performed to determine the posterior distributions of the hyperparameters. As a result, the posterior distribution of λ is also obtained through the sampling procedure.
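The sampling scheme just described can be sketched in pure Python as a Metropolis-within-Gibbs loop: the source-specific rates have a conjugate gamma full conditional, while the hyperparameters take a random-walk Metropolis step. This is a didactic sketch under assumed priors and tuning constants, not the OpenBUGS setup used in the case study:

```python
import math
import random

def hbm_poisson_gamma(data, n_iter=4000, burn=1000, seed=1):
    """Metropolis-within-Gibbs sketch of the hierarchical model
        x_i ~ Poisson(lambda_i * t_i),   lambda_i ~ Gamma(alpha, beta).
    Hyperpriors are taken flat on log(alpha) and log(beta) (an assumption),
    so the multiplicative random-walk proposal's Jacobian cancels and the
    acceptance ratio reduces to the likelihood ratio of the lambdas.
    `data` is a list of (failures, exposure_hours) pairs, one per source."""
    rng = random.Random(seed)
    alpha, beta = 1.0, 1.0e4                      # arbitrary starting point
    lam = [(x + 0.5) / t for x, t in data]        # crude initial rates

    def log_lik(a, b):
        # sum over sources of log Gamma(lam_i | shape a, rate b)
        return sum(a * math.log(b) - math.lgamma(a)
                   + (a - 1.0) * math.log(l) - b * l for l in lam)

    rate_draws = []
    for it in range(n_iter):
        # Gibbs step: lambda_i | alpha, beta, x_i ~ Gamma(alpha + x_i, beta + t_i)
        lam = [rng.gammavariate(alpha + x, 1.0 / (beta + t)) for x, t in data]
        # Metropolis step: multiplicative random walk on the hyperparameters
        a_new = alpha * math.exp(rng.gauss(0.0, 0.3))
        b_new = beta * math.exp(rng.gauss(0.0, 0.3))
        if math.log(rng.random()) < log_lik(a_new, b_new) - log_lik(alpha, beta):
            alpha, beta = a_new, b_new
        if it >= burn:
            rate_draws.append(sum(lam) / len(lam))
    return sum(rate_draws) / len(rate_draws)      # posterior-mean failure rate

# Hypothetical sources (failures, exposure hours), e.g. three simulation runs.
rate = hbm_poisson_gamma([(3, 44_300), (0, 87_600), (1, 60_000)])
print(rate)
```

Note that `random.gammavariate` takes a scale parameter, hence the `1.0 / (beta + t)` conversion from the rate parameterization used in the model.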

Maximum Likelihood Estimation.
Under the hypothesis of HPP, the probability distribution of the failure interarrival times, denoted by T, is exponential:

f(t) = λ e^(−λt),    F(t) = 1 − e^(−λt),

where λ represents the rate of arrival. As previously discussed, the MLE determines the parameters of the considered distribution by maximising the likelihood function, which for the exponential distribution is expressed by

L(λ) = ∏ λ e^(−λt_i) = λ^n e^(−λnT̄),

where n indicates the total number of failures, while T̄ is the mean of the interarrival times. The estimator of λ is then found via [28,53]

λ̂ = n / Σ t_i = 1/T̄.


Least Square Estimation.

Let the interarrival time of failures follow a negative exponential distribution, described by the equations above. The LSE estimates the unknown parameters by defining a straight line that minimizes the sum of squared distances between the observed data and the line itself. Therefore, the exponential distribution should be rewritten in linear form. After applying the logarithm to both sides of 1 − F(t) = e^(−λt) and some simplification, the following equation is obtained:

t = −(1/λ) ln[1 − F(t)],

which represents a straight line y = ax + b with a = −1/λ and b = 0, while y = t and x = ln[1 − F(t)]. Let t = (t_1, t_2, ..., t_n) be a sample of TTFs; the estimate of λ is found by minimizing the SSE

SSE = Σ (t_i + (1/λ) ln[1 − F(t_i)])²,

where t_i stands for the ith observed TTF, while F(t_i) is replaced by the median rank, expressed by [54]

F(t_i) ≈ (i − 0.3)/(n + 0.4).
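Both estimators fit in a few lines of Python. This is a minimal sketch, not the Minitab implementation used later in the paper; the Bernard median-rank approximation is assumed for F(t_i), and the censoring treatment in the MLE (exposure added to the denominator, no failure counted) is the standard one for exponential data:

```python
import math

def mle_exponential_rate(ttfs, censored=()):
    """MLE of the exponential rate: the number of observed failures
    divided by the total time on test, where right-censored intervals
    (CTTFs) add exposure but no failure count."""
    return len(ttfs) / (sum(ttfs) + sum(censored))

def lse_exponential_rate(ttfs):
    """Median-rank regression: fit t = a * ln[1 - F(t)] through the
    origin, with F(t_i) taken as the Bernard median rank
    (i - 0.3) / (n + 0.4); the slope a estimates -1/lambda."""
    ts = sorted(ttfs)
    n = len(ts)
    xs = [math.log(1.0 - (i - 0.3) / (n + 0.4)) for i in range(1, n + 1)]
    slope = sum(x * t for x, t in zip(xs, ts)) / sum(x * x for x in xs)
    return -1.0 / slope

print(mle_exponential_rate([100.0, 50.0, 150.0]))   # 0.01 failures per hour
print(lse_exponential_rate([100.0, 50.0, 150.0]))   # close to the MLE value
```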

Application of the Methodology to NGRMS
To show a practical application of the three approaches and compare their results, an NGRMS (Figure 2) is chosen as a case study. A generic NGRMS is divided into four groups and twelve main components, listed in Table 1. The natural gas distribution network is a complex infrastructure formed by pipes and apparatuses able to withstand high pressure values. Thus, before distributing the methane to the final users, the gas pressure must be reduced to be suitable for the various utilities. To fulfill this task, NGRMSs are usually installed along with additional subsequent pressure reduction units. The core of the plant is the reduction group, in which the pressure regulator and the pilot are tasked with reducing the pressure. During standard conditions, the pressure is decreased by varying the cross-sectional flow area of the pressure regulator, while the downstream pilot is activated just in case a faster or more accurate pressure reduction is required. The solid and liquid impurities that could be present in the gas flow are removed by the filter, which is located upstream of the pressure regulator. Since decreasing the pressure is always accompanied by a temperature reduction, the gas must be heated before entering the pressure regulator to avoid the formation of ice. To this end, a water flow is preheated by a boiler and subsequently sent to an exchanger in which the methane gas flows. At last, the measuring group evaluates the natural gas's most relevant parameters (e.g., pressure, temperature, and mass flow), while the odorization group is required to add a precise quantity of odorizer, usually tetrahydrothiophene (THT), to the gas flow.

Data Collection.
To implement the approaches, the operations of a real-life NGRMS were reproduced through the AnyLogic simulation software (developed by The AnyLogic Company, http://www.xjtek.com), focusing on the stochastic failure generation process. The developed model features an NGRMS located in Tuscany near Arezzo and a maintenance centre in Prato (Figure 3), served by two maintenance teams available 24/7. The first maintenance squad is tasked with preventive actions, while the second one is in charge of corrective actions. Agent-based modelling and fifteen simulation runs were adopted for this study. From each run, the TTFs, the CTTFs, and the number of failures were extracted to conduct the subsequent analysis (step 1 of Figure 1). It is worth mentioning that the failure rates adopted for the simulation were chosen based on real experience; therefore, from now on, they are considered as the "real" parameters. As a result, these failure rates are also exploited as reference values to compare the precision of the three estimation approaches.
For the sake of conciseness, the estimation methods will be presented in detail for the pressure regulator; however, a summary of the results obtained for all the considered components will be discussed later in this study. Table 2 shows the observed number of failures and the number of preventive maintenance actions arising from each simulation run for the pressure regulator. Each simulation run is considered as a different source of the failure rate of the components for the HBM. After this brief introduction, step 4 of Figure 1 will be implemented in the next sections.

Hierarchical Bayesian Modelling.
The BN illustrated in Figure 4 was adopted to predict the posterior distribution of λ. The aforementioned number of failures observed in each source (Table 2) is denoted by X_i, while λ_i refers to the failure rate of the ith run. The calculation of the posterior distribution in the Bayesian framework is carried out by MCMC simulation. Three chains, each starting from a distinct point in the parameter space, were used to ensure convergence. The sampling from the likelihood and the prior distribution was conducted with 10^5 iterations for each chain, preceded by 1,000 burn-in iterations. The estimated posterior distributions of α and β, along with their respective mean values, are shown in Figure 5. Furthermore, the correlation between the two parameters is represented in Figure 6.
The MCMC sampling process revealed a mean value of α equal to 0.3863, with a 95% credible interval of (0.1156, 0.9065), while the mean of the posterior predicted distribution for β is 1.68E+4 hours, with a 95% credible interval of (2.12E+3, 4.52E+4).
The caterpillar plot representing the 95% credible interval for the failure rate of each source is illustrated in Figure 7. As shown in Figure 7, the computed mean value (red vertical line) of the posterior predictive distribution for λ is 1.68E−05 (per hour). Table 3 reports a summary of the posterior distribution of every λ_i.
Due to the different number of failures and distinct exposure times observed in each source, the mean failure rate varies significantly from source to source. For instance, the first source is characterized by the highest mean failure rate of 5.69E−05 (per hour) since it experienced the highest number of failures (3) in the shortest timespan (44,300 hours). By contrast, the tenth pressure regulator exhibits the lowest mean failure rate, equal to 3.59E−06 (per hour), because no failure was detected for a decade.
The source-to-source variability is incorporated within the aforementioned mean value of 1.68E−05 (per hour).
Considering an exponential distribution for the interarrival time of failures, the MTTF is given by the reciprocal of the failure rate, MTTF = 1/λ (equation (15)). Following equation (15), the MTTF of the pressure regulator is estimated: the average time between two subsequent failure states is 59,524 hours (about six and a half years).
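The MTTF computation above amounts to a single division; the following one-liner merely ties the reported numbers together:

```python
lam_hbm = 1.68e-5            # posterior mean failure rate (per hour)
mttf_hbm = 1.0 / lam_hbm     # equation (15): MTTF = 1/lambda under the HPP
print(round(mttf_hbm))       # 59524 hours, i.e., about six and a half years
```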

Maximum Likelihood Estimation.
To perform this step of the analysis, the statistical software Minitab was exploited. Minitab allows considering both the TTFs and the CTTFs for the estimation of λ. The MLE application provides a failure rate of 1.22E−05 (per hour), which corresponds to an MTTF equal to 82,060 hours (more than nine years).
The resulting probability density function is reported in Figure 8.

Least Square Estimation.
As in the previous step, Minitab was adopted for the LSE as well. The intensity of the Poisson process estimated by the LSE method is slightly higher than the rate calculated via the MLE. The calculation yielded a λ equal to 1.28E−05 (per hour), equivalent to an MTTF of 78,219 hours (slightly less than nine years). The exponential probability density function of the failure interarrival time corresponding to the estimated failure rate is illustrated in Figure 9.

Discussion
In this section, step 5 of Figure 1 is presented. As described in the previous section, applying the three approaches to the same input data produced three different values for the failure rate of the pressure regulator.
The HBM yields a failure rate of λ_HBM = 1.68E−05, which results in an MTTF equal to 59,524 hours. On the other hand, the MTTFs calculated by the MLE and LSE approaches are much higher than the Bayesian one. Indeed, the MLE and the LSE quantified an average time between two subsequent failures of 82,060 and 78,219 hours, respectively. The real failure rate (i.e., the one adopted during the simulation process) is λ_REAL = 1.64E−05, corresponding to an MTTF of 60,882 hours. Accordingly, the calculation revealed that the Bayesian estimator is considerably more accurate and precise than the other ones. Indeed, the HBM underestimates the real value by about 1,300 hours (about 54 days), while the other estimates are more than two years longer than MTTF_REAL.
A summary of the Bayesian results and the other approaches is listed in Tables 4 and 5. The comparison will be discussed further for each group in the following sections.
The estimated MTTFs are transformed into dimensionless values by dividing them by the average time between two consecutive failures adopted for the simulation (i.e., MTTF_REAL). The results are shown in Table 6 and Figure 10.

Reduction Group.
To illustrate the differences arising from the three estimation methodologies, the cumulative distribution functions (CDFs) of each approach were developed for the reduction devices (Figures 11 and 12). As depicted by Tables 4-6, the HBM provided the most accurate estimation of the failure rate of the pilot, while the MLE and the LSE of the MTTF are, respectively, 129 days longer and 154 days shorter than the real value. Considering the filter, the difference between MTTF_REAL and MTTF_HBM is just 17 hours, while both the MLE and the LSE overestimate the average time between two consecutive failures. The MLE yields an MTTF which is 20 days longer than MTTF_REAL, whereas the mean time span between two failure states is estimated by the LSE as 25 days longer than the real one.

Measuring Group.
For the calculator, the HBM yields a posterior mean interarrival time of failure equal to 68,027 hours, while the MLE and the LSE of the MTTF are estimated at 88,825 and 80,837 hours, respectively. Given the real value of 73,233 hours, the HBM is the most accurate estimation tool once again. The Bayesian approach manifested its advantages over the other methodologies for the PTG as well. Indeed, the difference between MTTF_HBM and MTTF_REAL is 128 hours (about five days), while both the MLE and the LSE overestimate the real mean time between two contiguous failures by approximately 1,000 hours (41 days). On the contrary, the application of the LSE emerged as the most precise for both the meter and the RCS. However, the Bayesian inference demonstrated greater accuracy than the MLE for these two devices. The MTTF_HBM of the meter is just 14 days longer than the MTTF estimated by the LSE, while the discrepancy between the Bayesian prediction and the LSE estimator for the RCS is equal to 12 days. Both the HBM and the LSE showed an estimation error of about 5,000 hours (208 days) and 1,500 hours (62 days) for the meter and the RCS, respectively. The CDFs of the measuring components are represented in Figures 13 and 14.


Odorization Group.

The CDFs built for the THT tank and THT pipe are illustrated in Figure 15.
The Bayesian approach proved its higher performance also for the components belonging to the odorization group. The MTTF_HBM of the THT tank is estimated at 98,039 hours, which is about 5,500 hours (about 230 days) shorter than MTTF_REAL. The MLE model resulted in an average interarrival time of 13.9 years, while the LSE yields an MTTF of 12.1 years. Compared to the real value, the ML and the LSE estimators showed a gap of about 30,000 hours (more than three years) and 13,000 hours (about 1.5 years), respectively. For the THT pipe, a similar scenario is seen. Indeed, both the ML and LS estimates of the MTTF are about 30,000 hours longer than the real average time expected before experiencing a failure. By contrast, the HBM predicted a posterior mean value of 128,370 hours, which is close to the real mean interarrival time to failure of 132,363 hours.

Preheating Group.
The water pipe is the component associated with the worst estimations due to extreme data scarcity (Figures 16 and 17). The HBM yields a posterior MTTF of 182,149 hours, which is five years longer than MTTF_REAL. The MLE and the LSE also overestimated the real average time between two consecutive failures, by 12 and 10 years, respectively. Considering the boiler, the MLE estimator is the most accurate, with a discrepancy of just 300 hours (almost 13 days) compared to MTTF_REAL. Nevertheless, the application of the HBM is more precise than the LSE. At last, the Bayesian inference emerged as the best estimator for the pump, presenting a gap of 100 hours (four days) with respect to the mean interarrival time of failure adopted for the simulation. On the other side, overestimations of 58 and 99 days are observed, respectively, for the MLE and the LSE of the MTTF related to the pump.

Discussion: Maintenance Application.
The HBM has proven itself as the most reliable estimator under limited data, which concerns particularly the pressure regulator, the calculator, the THT tank, the THT pipe, and the water pipe. Indeed, these components are characterized by a longer MTTF than the other apparatuses; therefore, fewer failures are observed during the same time interval. The Bayesian predictions of the failure parameter for the aforementioned devices show a better precision than the other estimation methodologies. Moreover, the accuracy shown by the HBM is also higher than that of the other approaches for most of the devices. To demonstrate these statements, the root mean square error (RMSE) is calculated for each method, as shown by equation (18):

RMSE = sqrt[(1/n) Σ (MTTF_i − MTTF_REAL,i)²],

where n denotes the number of components, while MTTF_i is the estimated average time between two consecutive failures for the ith device. At last, MTTF_REAL,i is the mean interarrival time between failures adopted for the ith equipment during the simulation. The RMSE computed for the HBM through equation (18) is equal to 13,892 hours, while the RMSE of the MLE and LSE is estimated, respectively, at 34,420 and 27,761 hours. Accordingly, the exploitation of the Bayesian method will result in a much safer maintenance strategy without overlooking economic aspects, by avoiding premature maintenance actions.
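Equation (18) amounts to a one-line computation; a minimal Python sketch follows (the two sample MTTFs are taken from the pressure regulator and calculator values reported above, purely for illustration):

```python
import math

def rmse(mttf_estimates, mttf_real):
    """Root mean square error over the component MTTFs, equation (18)."""
    n = len(mttf_estimates)
    return math.sqrt(sum((e - r) ** 2
                         for e, r in zip(mttf_estimates, mttf_real)) / n)

# Illustrative two-component example (hours): HBM estimates vs real values.
print(rmse([59_524.0, 68_027.0], [60_882.0, 73_233.0]))
```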

Conclusions
Any maintenance policy is deeply affected by the preceding failure rate estimation process, which often suffers from limited data. Thus, one of the most significant challenges associated with reliability analysis is selecting a proper estimation approach capable of producing accurate and precise results in case of limited data. To this end, the application of three estimation tools is investigated in this paper, with a particular focus on the comparison between the Bayesian inference and two common estimation methodologies: the MLE and the LSE. The three analyses were tested on twelve components of an NGRMS, whose operations were simulated to extract failure data (i.e., TTFs, CTTFs, and the number of failures). Under the assumption of HPP, the results highlighted a greater accuracy of the HBM, which emerged as the most precise estimator for nine devices out of twelve. The advantages of the Bayesian estimator are especially evident in the event of data shortage, associated with the devices with greater MTTF. Indeed, the lack of data is partially compensated by the HBM through the consideration of source-to-source variability, which is disregarded by the MLE and the LSE. On the other side, the MLE and LSE precision improves for the equipment characterized by more available data, up to taking the upper hand over the Bayesian inference for the meter, the RCS, and the boiler. However, the discrepancy between the Bayesian predictions and the other estimations for these components is negligible since almost no difference can be seen from their respective CDFs. Considering all the above, adopting a Bayesian approach for the reliability analysis will help to deal with sparse data, resulting in a more efficient and effective maintenance plan. Further developments can include the application of weakly informative priors to the Bayesian model to incorporate some prior knowledge into the estimation framework.

Figure 1 :
Figure 1: Flowchart representing the steps of the presented study.

Figure 4 :
Figure 4: Developed HBM for estimating the failure rate of each device.

Figure 3 :
Figure 3: Map of the location of the simulated case.

Figure 5 :Figure 6 :
Figure 5: The posterior probability density function for alpha (on the left) and beta (on the right).

Figure 7 :
Figure 7: Predicted 95% credible interval for the failure rate of the pressure regulator in each source. The black dots are the posterior means of each source, while the red line represents the average of the posterior means.

Figure 8 :
Figure 8: Interarrival time of failure distribution for the pressure regulator obtained via MLE.

Figure 9 :
Figure 9: Interarrival time of failure distribution for the pressure regulator obtained via LSE.

Figure 16 :Figure 17 :
Figure 16: Developed CDFs for the water pipe and the boiler.

Table 2 :
Number of failures, preventive actions, and exposure time for the pressure regulator in each source (simulation run).

Table 3 :
Statistical properties of the failure rate for each source of the pressure regulator.

Table 4 :
Real failure rate and failure rates estimated through the three approaches.

Table 5 :
Real MTTF and MTTFs estimated through the three approaches.

Table 6 :
Dimensionless mean time to failure (D-MTTF) for each estimation approach.

Figure 10: Developed dotplot of the D-MTTFs for each methodology. The black dashed line represents the real MTTF.