Reliability-centered maintenance: analyzing failure in harvest sugarcane machine using some generalizations of the Weibull distribution

In this study we considered five generalizations of the standard Weibull distribution to describe the lifetime of two important components of harvest sugarcane machines. The harvesters considered in the analysis does the harvest of an average of 20 tons of sugarcane per hour and their malfunction may lead to major losses, therefore, an effective maintenance approach is of main interesting for cost savings. For the considered distributions, the mathematical background is presented. Maximum likelihood is used for parameter estimation. Further, different discrimination procedures were used to obtain the best fit for each component. At the end, we propose a maintenance scheduling for the components of the harvesters using predictive analysis.


Introduction
The arrival of the sugarcane culture in Brazil has had a significant impact on the national economy, which led the country to become the largest producer in the world [4]. Its sub-products are used in the food and chemical industries, as well as in electricity generation and fuel production. Mechanized harvesting is one of the most important stages in the sugar and ethanol mills since it must provide the raw material with quality, time and competitive costs for later processing. Among the used machines in the mechanized harvest, the harvesters stand out for having a large number of corrective stops, given the functionality in such extreme environmental conditions. In addition, its operation is in a regime of 24 hours on the workdays, impacting on fatigue and wear of their parts. During operation, the harvester removes an average of 20 tons of sugarcane per hour and its malfunction may lead to major losses, therefore, an effective maintenance approach is of main interesting [15].
Reliability-centered maintenance consists of determining the most effective maintenance approach [12]. This process was firstly developed in the aviation industry for deciding what maintenance work is needed to keep aircraft airborne, driven by the need to improve reliability while containing the cost of maintenance [19]. Reliability analysis, can be used to estimate time-related parameters to the next machine stop [9], providing information to manage and control of the preventive maintenance of harvesters which could create a production improvement and has potential for cost savings.
In reliability, common procedures are usually based on the assumption that the data follows a Weibull distribution. Introduced by Waloddi Weibull [21] this distribution has convenient mathematical properties and its physiological failure process arise in many areas (see, Manton and Yashin [10]). However, this distribution cannot be used to describe data with non-monotone hazard function (bathtub, upside-down bathtub, to list a few). to overcome this problem many generalizations of the standard Weibull distribution have been proposed. For instance, Lai [8] reviewed more than 20 generalizations of the Weibull distribution.
In this paper, we considered five important generalized Weibull distributions with three parameters to describe the lifetime of two important components of the harvest sugarcane machines. The distributions considered are the Gamma-Weibull distribution [20], generalized Weibull (GW) distribution [14], exponentiated Weibull (EW) distribution [13], Marshall-Olkin Weibull (MOW) distribution [11] and the extended Poisson-Weibull (EPW) distribution [17]. For each distribution, the mathematical background is presented and the parameters estimators are presented using the maximum likelihood estimators. Further, different discrimination procedures are used to obtain the best fit for each component. At the end, we propose a maintenance scheduling for the components of the harvesters using predictive analysis.
This work is divided as follows. Section 2 presents the literature review related the survival models adopted. Sections 3 exposes the data collection, and empirical analysis, as well as, carry out the predictive analysis based on the parametric models. Finally, in Section 4, we present some final remarks related the contribution this study.

Theoretical Background
In this section, we present the statistical background on the adopted distributions and its parameter estimation procedures. The following distributions are considered: Gamma-Weibull, generalized Weibull, exponentiated Weibull, Marshall-Olkin-Weibull and Marshall-Olkin-Weibull. Their choice is based on their flexibility to accommodate lifetime dataset with hazard functions with different shapes, for instance, constant, increasing, decreasing, bathtub and upside-down bathtub.

The Gamma-Weibull distribution
Introduced by Stacy [20] the Gamma-Weibull distribution with three parameters is a flexible model for reliability data due to its ability to accommodates various forms of the hazard function. This distribution is also know as generalized gamma (GG) distribution as also generalize the two parameter gamma distribution, hereafter, we will refer this model as GG distribution to avoid confusion with the GW distribution. A random variable has GG distribution if its PDF is given by where α > 0, φ > 0 and µ > 0. The mean and variance of GG is given by Some relevant distributions are special cases such as the Weibull distribution (when φ = 1), the distribution Gamma (α = 1), Log-Normal (case limit when φ → ∞) and the Generalized Normal distribution (α = 2). As an example, the Generalized Normal distribution is also a distribution that includes several distributions known as, half- , Maxwell-Boltzmann (φ = 3/2) e chi (φ = k/2, k = 1, 2, . . .). The cumulative distribution function (CDF) is given by where γ[y, x] = x 0 w y−1 e −w dw is the lower incomplete gamma function. The survival function is where Γ[y, x] = ∞ x w y−1 e −w dw is the upper incomplete gamma function. The hazard function of the GG distribution is This model is very flexible to describe lifetime data since it has the hazard function with constant, increasing, decreasing, bathtub and upside-down bathtub hazard rate.
For parameter estimation, let T 1 , . . . , T n be a random sample of size n, where T ∼ GG(α, µ, φ). Then, the likelihood function related to the PDF (1) is given by The log-likelihood is given by Setting the partial derivatives ∂ ∂α log(l(φ, µ, α; t)), ∂ ∂µ log(l(φ, µ, α; t)) and ∂ ∂φ log(l(φ, µ, α; t)) equal to 0, we obtain the following maximum likelihood estimatorsμ the solution provides the maximum likelihood estimates (see, for instance, Ramos et. al [16,18], Achcar et al. [1]). Under mild conditions that in some cases are not fulfill the estimators become unbiased for large samples and asymptotically efficient. Moreover, such estimators have asymptotically normal joint distribution given by where I(θ) the Fisher information matrix is

The generalized Weibull distribution
Introduced by Mudholkar et al. [14] the generalized Weibull distribution has PDF given by where λ ∈ R, φ > 0 and α > 0. The CDF and the survival function are respectively given by The hazard function of the GW distribution is This model is very flexible to describe lifetime data since it has the hazard function with constant, increasing, decreasing, bathtub and upside-down bathtub hazard rate. The quantile function of the GW distribution has closed form and is given by For parameter estimation, let T 1 , . . . , T n be a random sample of size n, where T ∼ GW(α, µ, φ). Then, the likelihood function related to the PDF (4) is given by The log-likelihood is given by Setting the partial derivatives equal to 0, we obtain the maximum likelihood estimators. Here, we following Mudholkar et al. [14] which consider the direct maximization of (7). Under mild conditions the obtained estimators are consistent and efficient with an asymptotically normal joint distribution given bŷ where I(Θ) is the 3×3 Fisher information matrix associated to the vector of parameters Θ) and I ij (Θ) is the Fisher information elements in i and j given by Since it is the the Fisher information matrix does not have closed-form expression for some terms, an alternative is to consider the observed information matrix, where the terms is given by Hereafter, we considered the same approach to obtain the confidence intervals for the parameters from other distributions.

The exponentiated Weibull distribution
Introduced by Mudholkar et al. [13] the exponentiated Weibull distribution with PDF given by where σ > 0, φ > 0 and α > 0. The exponentiated Weibull distribution includes the Weibull distribution (φ = 1) and the exponentiated exponential distribution (α = 1). The survival function is given by The hazard function of the GG distribution is This model is very flexible to describe lifetime data since it has the hazard function with constant, increasing, decreasing, bathtub and upside-down bathtub hazard rate. Additionally, the quantile function of the EW distribution has closed form and is given by The k-th moment of the EW distribution is given by The proof of this equality is presented by Choudhury [3].
For parameter estimation, let T 1 , . . . , T n be a random sample of size n, where T ∼ EW(σ, φ, α). Then, the likelihood function related to the PDF (8) is given by (10) The log-likelihood is given by Setting the partial derivatives ∂ ∂σ l(σ, φ, α; t), ∂ ∂φ l(σ, φ, α; t) and ∂ ∂α l(σ, φ, α; t) equal to 0, we obtain the following maximum likelihood estimators Marshall and Olkin [11] introduced a new procedure for adding a new parameter into a family of distribution. In this case, the authors applied such procedure in the Weibull distribution. The obtained PDF of the MOW distribution is given by where λ > 0, α > 0 and γ > 0. Cordeiro and Lemonte [5] derived many properties and the parameter estimators for the MOW distribution, the following results were obtained from the cited work. The survival function is given by This model is very flexible to describe lifetime data since it has the hazard function with constant, increasing, decreasing, bathtub and upside-down bathtub hazard rate. Additionally, the quantile function of the MOW distribution has closed form and is given by For parameter estimation, let T 1 , . . . , T n be a random sample of size n, where T ∼ MOW(λ, α, γ). Then, the likelihood function related to the PDF (11) is given by The log-likelihood is given by l(λ, α, γ; t) = n log(α) + n log(γ) + n log(λ) Setting the partial derivatives ∂ ∂λ l(λ, α, γ; t), ∂ ∂α l(λ, α, γ; t) and ∂ ∂γ l(λ, α, γ; t) equal to 0, we obtain the following maximum likelihood estimators for more details, see Cordeiro and Lemonte [5].

The extended Poisson-Weibull distribution
Ramos et al. [17] introduced the extended Poisson-Weibull (EPW) distribution as a generalization of Weibull-Poisson distribution (see Hemmati et al. [7]) where its PDF is given by where λ ∈ R * , β > 0 and α > 0. The survival function is given by The hazard function of the GG distribution is This model is very flexible to describe lifetime data since it has the hazard function with constant, increasing, decreasing, bathtub and upside-down bathtub hazard rate. Additionally, the quantile function of the EPW distribution has closed form and is given by For parameter estimation, let T 1 , . . . , T n be a random sample of size n, where T ∼ EPW(λ, α, β). Then, the likelihood function related to the PDF (8) is given by The log-likelihood is given by Setting the partial derivatives ∂ ∂λ l(λ, α, β; t), ∂ ∂α l(λ, α, β; t) and ∂ ∂β l(λ, α, β; t) equal to 0, we obtain the following maximum likelihood estimators

Goodness of fit
Firstly, in order to verify the behavior of the empirical data the TTT-plot (total time on test) was considered (Barlow and Campo [2]). The TTT-plot is obtained through the plot of [r/n, G(r/n)] where and t (i) , i = 1, · · · , n is the ordered data. For data with concave (convex) curve , the hazard function has increasing (decreasing) shape. In the case of the behavior starts convex and then becomes concave (concave and then convex) the hazard function has bathtub (inverse bathtub) shape. The goodness of fit is checked considering the Kolmogorov-Smirnov (KS) test. This procedure is based on the KS statistic D n = sup |F n (t) − F (t; φ, λ, α)|, where sup t is the supremum of the set of distances, F n (t) is the empirical distribution function and F (t; α, β, λ) is c.d.f. A hypothesis test is conducted at the 5% level of significance to test whether or not the data comes from F (t; α, β, λ). In this case, the null hypothesis is rejected if the returned p-value is smaller than 0.05.
The following discrimination criterion methods were adopted: Akaike information criteria (AIC) and the corrected AIC (AICc) computed respectively by AIC = −2l(θ; t) + 2k and AICc = AIC + 2 k (k + 1)(n − k − 1) −1 , where k is the number of parameters to be fitted andθ is MLEs of θ. For a set of candidate models for t, the best one provides the minimum values.

Data collection and Empirical Analysis
The dataset came from two sources: a manual stop system, which brings the history of revisions and corrective stops of two sugarcane harvesters; and data from the onboard computers of the harvesters, which provide information on the operation of the machine. The data were collected from January 2015 to August 2017, a period corresponding to 2.5 harvests (crops), i.e., a period of thirty months of activity.

Empirical Analysis
Firstly, considering all the stops and their reasons, records of the performance of the predictive maintenance is needed to be observed. In total, 1347 stops were observed, which 186 were preventative and 1161 corrective stops. Thus, it is possible to observe the superior amount of unplanned stops, thus questioning the effectiveness of preventive maintenance. Table 1 shows the failure among the harvests, considering both machines analysis. The Pricker and Transmission from each machine were selected given their complexity in the maintenance. Figure 1 describes the number of failures per year divided by harvest, considering their temporal sparsity, by which items analyzed in this report, correspond to 18% of the stops. It is possible to notice a difference in the machines' behaviour, both machines appear to be equally affected by the problems of Transmission and Pricker, but the machine B is more affected by problems with the Pricker. Further, reliability models were individually adjusted, and thereby compared, as described in the next section.

Preventive maintenance
In this section, we discuss a parametric approach in order to perform a predictive analysis for the lifetime of the components. Table 2 presents a high defect rate after a short repair time as well, compromising the cost of the production. The experiment considered a total period of 30 months, as said before. Then operating equipment had three off-seasons, these periods were not included in the dataset. The equipment was only observed during the time of its active operation.  1  1  1  1  1  1  1  1  2  2  2  2  2  3  3  3  3  3  4  4  4   From the TTT-plot we observed that the proposed data has unimodal hazard rate, which implies that all the proposed models may be used to describe the proposed dataset. Table 3 presents the results of AIC, and AICc in order to discriminate the best fit. Among the proposed models the Exponentiated Weibull distribution has superior goodness of fit since the AIC and AICc returned smaller values. Therefore, using the Exponentiated Weibull distribution we computed the maximum likelihood estimates and the predictive value for 25% (see Table 4). Hereafter, as we considered the quantile function to obtain the predictive value, the confidence intervals related to this estimate were obtained from bootstrap technique (see Efron and Tibshirani [6]). From Table 4 we observe that the predictive maintenance should be done in approximately 3 days after the last failure with confidence interval between 2 and 4 days.

Pricker from machine B
A similar behavior is observed for the Pricker in the machine B, shown in Table 5 presenting a high defect rate as well. The approach was maintained considering only the time during its active operation.    From the TTT-plot we observed that the proposed data has unimodal hazard rate, which implies that all the proposed models may be used to describe the proposed dataset. Table 6 presents the results of AIC, and AICc in order to discriminate the best fit with the EW distribution. In this case, we observe the similarity between both machines for this component. Thus, the maximum likelihood estimates for the EW distribuiton were computed as well as the predictive value for 25%. Table 7 presents the MLEs, Standard deviations and 95% credibility intervals for α, θ, σ and y * related to the EW distribution.  Table 7 results suggest that predictive maintenance should be done in approximately 3 days, considering a point estimation, or given a 95% credibility interval would be between 2 to 4 days approximated. Thereby, Pricker among machines showed no difference performance so ever. Table 8 shows that more than 50% of the defect rate appears until 8 days right after its repair for the Transmission for the machine A.  From the TTT-plot we observed that the proposed data has also fulfill the hazard rate shape preposterous to different generalizations of the Weibull distribution used. Table 9 presents the results of AIC, and AICc in order to discriminate the best fit.

Transmission from machine A
As can be seen from the Table 9 the GW distribution has the p-value of the KS test smaller than 0.05, therefore, is not a possible candidate to fit the data. Overall, the GG distribution has a better fit since has the smaller AIC and AICc. Therefore, we computed the maximum likelihood estimates and the predictive value for 25% using the GG distribution. Table 10 presents the MLEs, Standard deviations and 95% credibility intervals for φ, µ, α and y * related to the GG distribution. Table 10 results suggest that predictive maintenance should be done in approximately 3 days, considering a point estimation, or given a 95% credibility interval   would be between 2 to 4 days approximated.

Transmission from machine B
Comparing to the others equipments, the transmission from the machine B presented smaller number of occurrence. Table 11 shows the sparsity of the dataset related to the sugarcane harvester's transmission B. Once again, Figure 5 presents the TTT-plot, as well as, the survival function fitted by different generalizations of the Weibull distribution, considering the transmission from machine B. From the TTT-plot we observed that the proposed data may be fitted by all the proposed models since has unimodal hazard rate. Table 12 presents the results of AIC, and AICc in order to discriminate the best fit. As shown in Table 12 the EWP distribution has the minimum AIC and AICc. Therefore, we computed its respectively the maximum likelihood estimates and the predictive value for 25%. Table 13 presents the MLEs, Standard deviations and 95% credibility intervals for α, θ, σ and y * related to the GG distribution.  Table 13 results suggest that predictive maintenance should be done in approximately 7 days, considering a point estimation, or given a 95% credibility interval would be between 4 to 10 days approximated.

Final Remarks
In this study, we considered different distributions to describe the lifetime of harvest sugarcane machine components. The harvesters stand out for having a large number of corrective stops, given the functionality in such extreme environmental conditions. However, these harvesters does not have an effective preventive maintenance policy which affects its working time schedule. To overcome this problem, we presented a predictive analysis using probability models based on its percentiles aiming to incorporate intelligence into maintenance planning.
The Weibull distribution is a popular model that can be used to describe a wide range of problems, however, it can not be used to describe data with non-monotone hazard rate. Thus, many generalizations of the Weibull distribution have been proposed to overcome this problem. Since the proposed datasets have non-monotone hazard rate, we considered some flexible generalizations such as the Gamma-Weibull, the generalized Weibull, the Exponentiated Weibull, Marshall-Olkin Weibull and the Extended Poisson Weibull distribution. For the proposed distributions, some mathematical functions were discussed as well as the parameter estimators under the maximum likelihood approach.
The proposed distributions were used to fit the datasets using maximum likelihood estimators. The exponential Weibull presented a superior fit for both machines considering the pricker component, in these cases we concluded a predictive maintenance should be done in approximately 3 days. On the other hand, for the transmission component, the distributions that presented better fit were respectively the Gamma-Weibull distribution and the extended Poisson Weibull for the machine A and B, where a predictive maintenance should be done respectively in 3 and 7 days after the last failure.
Further work should be considered beyond the adjusted models adding to the structure of recurrent event data, implement them, and analyzed their forecast accuracy. This approach should be implemented as an APP, helping the maintenance section in their individualised scheduling distributions.

Disclosure statement
No potential conflict of interest was reported by the author(s)