AParameter EstimationMethod of ShockModel Constructedwith Phase-Type Distribution on the Condition of Interval Data

-e phase-type distribution (also known as PH distribution) has mathematical properties of denseness and closure in calculation and is, therefore, widely used in shock model constructions describing occurrence time of a shock or its damage. However, in the case of samples with only interval data, modeling with PH distribution will cause decoupling issues in parameter estimation. Aiming at this problem, an approximate parameter estimation method based on building PH distribution with dynamic order is proposed. Firstly, the shock model established by PH distribution and the likelihood function under samples with only interval data are briefly introduced. -en, the principle and steps of the method are introduced in detail, and the derivation processes of some related formulas are also given. Finally, the performance of the algorithm is illustrated by a case with three different types of distributions.


Introduction
e shock model is usually used in the study of a system's reliability. e model was first proposed by Esary et al. [1] to describe time intervals between adjacent shocks and its damage by random distributions, as shown in Figure 1. When the cumulative damage (blue line) caused by the shock sequence (green lines) exceeds a certain threshold (red line), the system fails.
In the shock model, the occurrence time of shocks is usually expressed as a Poisson process [2,3]; that is, the time interval of adjacent shocks satisfies the exponential distribution, and the one shock damage is also assumed to satisfy the same cluster. Of course, other forms of distributions [4][5][6], such as gamma distribution, log-normal distribution, and Weibull distribution, also have been used in constructing shock models. While these distribution clusters enhance the model's adaptability, it also increases the difficulties in estimating their parameters.
Observed sample data are required to obtain the model's parameters. When facing point sample data, their form of log-likelihood function is usually a sum form of its distribution with the logarithmic operator in the front, that is to say, the parameters are easy to be decoupled and methods of MLE (maximum likelihood estimation), including various types of Newton iterative methods [7], can be used for estimating parameters and obtaining a result with a high accuracy. When the sample data are interval data, the likelihood function often contains a large number of convolution operations since the data are incomplete. At this time, exponential distribution is often assumed to simplify the calculation, while it may cause the model to be inconsistent with the actual situation. Of course, some Monte Carlo methods [8] have also been proposed to solve the computational problems caused by convolution, but the estimation accuracy of the methods is usually not controllable.
Phase-type distribution (known as PH distribution) is a kind of probability distribution clusters based on continuous-time Markov process [9]. Suppose a continuous-time Markov process X(t), t ≥ 0 { } with k + 1 states, wherein the states (1, ..., k) are transient states, the state k + 1 is an absorbing state, and X(t) represents the state of the random process at moment t.
en, the infinitesimal generator matrix of the Markov process can be expressed as Q � D 0 d 1 0 0 , wherein D 0 is a k-dimensional invertible matrix and d 1 is a k-dimensional column vector, satisfying where e ⇀ is a column vector and full elements of 1 with corresponding dimensional. Based on the above Markov process, a k-order continuous PH distribution (α, D 0 ) can be defined. A random variable T is distributed as an absorption time taken to transition from any initial state (i � 1, ..., k) to the absorbing state (k + 1), wherein α � α 1 ... α k represents a row vector of probability distribution of each initial phase, D 0 is called the phase-type generator, and d 1 represents a column vector of absorption rate in which the respective phase transfers to the absorbing state. e cumulative distribution function (CDF) of the PH distribution is e probability density function (PDF) of the PH distribution is e PH distribution is dense in the positive abscissa axis; that is, it can fit any type of probability distributions with the random variable located on the interval [0, ∞). In addition, the PH distribution also has the feature of closure in convolution operations, which greatly simplifies the calculation. Due to the above properties, it is widely used in shock modeling. Neuts and Bhattacharjee [10] first used PH distribution in shock model construction. Ochoa et al. [11,12] used PH distribution to characterize time intervals of adjacent shocks and one shock damage to analyze reliability of systems. ese works are constantly expanding the adaptability of shock models, but it can also be seen that the PH distribution is currently mainly used in shock model constructions under the samples with point data or partial interval data.
ere are three general parameter estimation methods for PH distribution [13]: maximum likelihood method, Monte Carlo method, and EM method. Among them, the maximum likelihood method, including various types of Newton iterative methods, is often used to solve some specific types of simple PH distributions; the Monte Carlo method is mainly used for specific forms of PH distribution with point data; the expectation maximization method (also known as the EM method) is the most widely used method for parameter estimation; however, it is mainly used in PH distribution with point data [14] or partial interval data [15]. If only interval data can be observed, such as only censored data and interval failure data can be obtained, the model constructed with PH distribution will face the decoupling problem when performing parameter estimation. At this time, the methods described previously cannot be used directly.
Aiming at this problem, an approximate method for parameter estimation of the shock model constructed with PH distribution is proposed. By giving an estimated error and constructing PH distribution with dynamic order, the parameters of the shock model with only interval data samples can be estimated. e paper is organized as follows: in Section 2, the shock model constructed by PH distribution and the likelihood function with only interval data samples are briefly introduced. Section 3 reconstructs the process statistics based on the framework of the traditional EM method [14], explains the approximation principle in detail, derives the approximate estimated expression of the model parameters, and also gives the total steps of the entire algorithm. Section 4 tests the performance of the algorithm through a case with three different types of distributions; Section 5 summarizes the method proposed in this paper and briefly introduces the follow-up work.

Shock Model Based on PH Distribution. Some assumptions:
(1) e failure threshold of a component is X th , and the censored time is t (c) . (2) e shock damage X n > 0 caused by the nth (n � 1, 2, ...) shock is an iid random variable, and the corresponding PDF is f(x), x > 0, and the CDF is F(x), x > 0. e cumulative damage caused by the number of shock n is X * n � n i�1 X i , let X * 0 � 0, and the corresponding CDF is F n (x), x > 0, and the PDF is f n (x), x > 0.
(3) e occurrence time of the nth (n � 1, 2, ...) shock is T n > 0, let T 0 � 0, and the corresponding CDF is G n (t), t > 0, and the PDF is g n (t), t > 0. e time interval ΔT n � T n − T n−1 is also an iid random variable, and the corresponding PDF is g Δ (t), t > 0, and the CDF is G Δ (x), t > 0. (4) e occurrence time of a shock is independent of its damage and also independent of the cumulative damage that has been caused by shock sequence. Let G * n (t) � P(T n−1 ≤ t, T n > t), t > 0, represent the probability that the occurrence time of the nth shock exceed t for the first time. Based on the previous assumptions, the reliability at moment t is where N(t) represents the number of shocks that the component receives at the moment t and X − th represents the left limit of the threshold. e following describes the specific forms of the previous parameters by PH distributions: (1) Assume that the one shock damage X n satisfies υ order (υ � 1, 2, ...) PH distribution (β, R 0 ), wherein β � β 1 ... β υ represents the probability distribution vector of initial phases, which satisfies υ i�1 β i � 1, and R 0 � e ⇀ represent the column vector of absorption rate. en, (2) e cumulative damage X * n caused the number of shock n � 1, 2, ... satisfies the PH distribution (c n , Y (0) n ), wherein c n � β 0 ... 0 1,υn represents the distribution probability vector of initial phases present the probability that the cumulative damage is no less than x after the nth shock for the first time.
represents a column vector that the ((n − 1)υ + 1)th to the (υm)th elements are 1 and others are 0, with corresponding order. e distribution of the adjacent shock time interval ΔT n and shock time T n is (α, D 0 ) (m order) and (μ n , M (0) n ). e assumption forms of these two distributions are similar to the shock damage distributions described previously.

Likelihood Function of Interval Data.
Assume that the samples are independent of each other and the data can only be obtained at the censored time t (c) . e information of observed sample data is as follows: (1) e total number of the samples is S > 0 (2) e failure mode set of the samples is δ � δ 1 ... δ S , wherein δ s � 0 represents the censored sample s � 1, 2, ..., S and δ s � 1 represents the failure sample s (3) e number of shocks received by the samples is N � n 1 ... n S Let θ � θ x , θ t represent all parameters to be estimated, where θ x � β, R 0 represents the parameter set of the distribution of the one shock damage and θ t � α, D 0 represents the parameter set of the distribution of the time interval. Based on the assumption that the occurrence time of the shock is independent of its damage, the likelihood function can be expressed separately as shown in Table 1.
In summary, the overall likelihood function based on the sample data is It can be seen in Table 1 that parameters in equations with the integral cannot be decoupled easily with the general methods proposed in Section 1. Combined with the assumed model above, observation data, and likelihood function proposed in this section, a specific method on estimating parameters θ is introduced as follows.

Principles, Steps, and Derivation of the Method
It is not difficult to see from Table 1 that forms of the likelihood functions are essentially the same, so only the estimating process of parameters θ x is described in detail here, which can also be used for estimating θ t . According to the main steps of the EM method [14], it is necessary to construct a complete data likelihood function.

Process Statistics Construction.
Four variables and an indicator function are defined in the following: (1) d (s) n,k represents the phase of the sample s � 1, 2, ..., S after the kth phase transition in the nth (n � 1, 2, ...) shock damage period; (2) b (s) n,k represents the damage caused by phase d (s) n,k ; (3) n (s) n represents the number of phase transitions of the sample s � 1, 2, . . . , S during the nth (n � 1, 2, . . .)

Mathematical Problems in Engineering
shock damage period, including the times when the damage period changes; s represents the number of additional shocks that may cause failure if the sample s(δ s � 0) is censored with shock number n s . It can be seen that N (h) s > 0 is a random variable and n (h) s represents its specific sample value; whether the phase of the sample s � 1, 2, . . . , S is i � 1, 2, . . . , υ after the kth phase transition in the nth (n � 1, 2, . . .) shock damage period.
With the above variables and function, four process statistics can be constructed as follows: Table 2.
(3) e set of statistics M (s) n,ij represents the total number of phase transition times of the sample s � 1, 2, . . . , S when the phase is transferred from i � 1, 2, . . . , υ to j � 1, 2, . . . , υ(i ≠ j) during the nth (n � 1, 2, . . .) shock damage period, which does not include the times of the phase transition when the damage period changes. e values of M (s) n,ij are shown in Table 4.  Table 5.
represents the set of process statistics of the sample s � 1, 2, . . . , S.

Parameter Estimation Expression.
With the above process statistics, the likelihood function under the complete data can be expressed as follows: wherein, the complete data likelihood function of the censored sample (δ s � 0) is e complete data likelihood function of the failure sample (δ s � 1) is A log-likelihood function can be obtained from equation (5): According to the EM method, the posterior distribution of process statistics H is Assuming that the parameter values θ 0 have been obtained by previous iteration, the expectation of posterior distribution of process statistics H in equation (8) by equation (9) is Mathematical Problems in Engineering where the expectation of the log-likelihood function with the censored sample (δ s � 0) data is And, the expectation of the log-likelihood function with the failure sample (δ s � 1) data is Mathematical Problems in Engineering 5 e parameters θ x to be estimated satisfy the following constraints: (1) e initial phase probability satisfies υ i�1 β i � 1 (2) For any phasei � 1, 2, . . . , υ, υ j�1 r (0) ij + r (1) i � 0 With the MLE method, the parameters θ x can be expressed in the following equations: e general steps of the EM method are Step E: solving the expectation sets of posterior distribution of the process statistics proposed in Section 3.1: Step M: bringing the result obtained in step E into equation (13) to obtain an iteration value of parameters θ x .
It is not difficult to see that the set of process statistics constructed in Section 3.2 is infinite, and expectations of the posterior distribution of the process statistics cannot be obtained at this time, so an approximate method is introduced in the following sections.

Approximate Principle.
Assume that the sample s is censored after the n s th (n s � 1, 2, . . .) shock, and the probability of failure after another n (h) s th (n (h) s � 1, 2, . . .) shock is P(X * n s +n (h) en, the survival probability of the sample after the n s th shock can be expressed as Let Δ(n) � P(X * n s < X th ) − n n (h) s �1 P(X * n s +n (h) s −1 < X th , X * n s +n (h) s ≥ X th ), then it is not difficult to see that for any determined PH distribution (β, R 0 ) and failure thresh-oldX th , lim n⟶∞ Δ(n) � 0 is constantly satisfied.
at is, given any positive real number R > 0, there must exist at least one n (Δ) s � 1, 2, . . . such that for any n > n (Δ) s , Δ(n) ≤ R is constantly satisfied. Now, assume that R is sufficiently small and n (Δ) s satisfies Δ(n (Δ) s ) ≤ R and Δ(n (Δ) s − 1) > R, then equation (15) can be approximated by a function of finite summation: n s +n (Δ) s X th e ⇀ (Σ) n s +1,n s +n (Δ) s , (16) where e ⇀ (Σ) m,n represents a column vector that the element from (m − 1)υ + 1 to nυ is 1 and the others are 0, with corresponding order.
Based on the above description, the parameters θ x can be approximated as

Expectations of Process Statistics with Posterior Distribution.
After approximation, the expectations of process statistics with posterior distribution can be obtained referring to the method in [14,15] by making some improvements. e results are given directly as follows: (1) Censored sample (δ s � 0): (1) e expectations of B (s) n,i with posterior distribution: ① When n � 1, where n (T) s � n s + n (Δ) s . ② When n � 2, 3, . . . , n s + 1, ① When n � 1,

3.5.
Steps of the Method. It can be seen from the previous description that it is necessary to determine the value of the parameter R first. After that, the corresponding number of shocks n (T) according to the parameter values θ 0 obtained by the current iteration can be determined, so that the survival probability that the number of additional shocks exceeds n (Δ) s � n (T) − n s can be suffered for most censored samples (δ s � 0) is less than R.
If the number of shocks of a censored sample satisfies n s ≥ n (T) , it is not difficult to see that the survival probability of such samples satisfies erefore, it can be assumed that if the sample suffers one more shock, then it will inevitably fail, which means n (T) s � n s + 1. However, for the other censored samples, the original value n (T) remains unchanged.
Taking the process of parameter estimation of θ x in damage distribution (β, R 0 ) as an example, the specific steps of the method based on the shock model and interval sample data described in Section 2 are as follows: Step 1: given the initial value θ 0 to the distribution (β, R 0 ) and ε > 0 as the end condition of iteration, determine the approximate threshold R > 0 Step 2: calculate the corresponding n (T) by θ 0 and R Step 3: calculate the value of the log-likelihood function ln L x (θ 0 ; X th , N, δ) according to θ 0 Step 4: use the E-step according to the distribution parameter θ n−1 obtained in the (n − 1)th iteration and the formulas in Section 3.4 to obtain expectation E θ n−1 (H | X th , N) of the process statistic set H with posterior distribution Note that during the processes, for any censored sample (δ s � 0), ①When n s < n (T) , n (T) does not change ② When n s ≥ n (T) , there is n (T) s � n s + 1 Step 5: bring E θ n−1 (H | X th , N) into equation (17) to obtain a new iteration value θ n Step 6: calculate the corresponding n (T) by θ n and R Step 7: calculate the value of the log-likelihood function ln L x (θ n ; X th , N, δ) of the sample according to θ n Step 8: compare the value of Δ � |ln L x (θ n ; X th , N, δ) − ln L x (θ n−1 ; X th , N, δ)| with ε: ① When Δ > ε, let n � n + 1 and repeat steps 4-7 ② When Δ ≤ ε, end the iteration and return the estimated value θ ⌢ x � θ n e parameter θ t of time interval distribution (α, D 0 ) can also be obtained based on the above steps. For censored samples (δ s � 0), note that since the last shock time T n s satisfies T n s < t (c) , while for failure samples (δ s � 1), the cumulative damage X * n s caused by shocks at the time of failure satisfies X * n s ≥ X th , it is necessary to consider that the number of shocks should be n s + 1 when estimating θ t , which is similar to n s when estimating θ x .

A Case
In this section, the performance of the parameter estimation method is tested by a case with only interval data samples. e case includes three types of nonexponential distributions (gamma distribution, log-normal distribution, and Weibull distribution) which are commonly used in shock model constructions to generate sample data. e proposed method is used to estimate the parameters of the corresponding PH distributions, and the results are compared with the above three distributions to analyze its performance.

Model Assumption
(1) Assume that the sample size is S � 100, the failure threshold is X th � 30, and the censored time is t (c) � 70. Sample data can only be obtained at the censored time.
(2) Assume that the end condition of iteration is ε � 0.0001 and the approximate threshold is R � 0.0001. (3) ere are two shock models: (1) Model 1: ① e interval of shock time satisfies gamma distribution with the shape parameter of 3 and scale parameter of 2 ② e one shock damage satisfies log-normal distribution with a log-mean value of 0.5 and logvariance value of 1 (3) Model 2: ① e interval of shock time satisfies gamma distribution with the shape parameter of 3 and scale parameter of 2 ② e one shock damage satisfies Weibull distribution with the shape parameter of 3 and scale parameter of 2

Parameter Estimation and Result
Analysis. Based on the method, the initial value, including type and order of PH distribution, should be given first. How to select type and order of a PH distribution can be referred to [16], and each type of PH distribution is described in detail in [17]. e estimation results of parameters of the time interval distributions are only shown in the first model since the distributions in both models are the same.

Parameter Estimation of Time Interval Distribution.
e distribution of the time interval is fitted by Erlang-3 distribution with 3-order. e initial value of the PH distribution is as follows: e simulation time is about 5 h, and the parameter estimation results based on the simulation data are as follows: e estimated PDF of the PH distribution and the original gamma distribution are both shown in Figure 2, and the related statistics are shown in Table 6.
It can be seen from the results shown in Figure 2 and Table 6 that the estimated PH distribution fits well with the original gamma distribution based on the interval sample data. When the shape parameter of gamma distribution is an integer, the random variable can be regarded as the sum of the same number of iid random variables from exponential distribution with the parameter equal to the scale parameter of the gamma distribution, so the initial distribution form can be easily determined as Erlang-3 distribution in this case. At this time, the parameter estimation of the PH distribution is an unbiased estimation, and the estimated errors are entirely derived from the randomness of the samples.

Parameter Estimation of Damage Distribution (Log-Normal Damage).
e distribution of the shock damage is fitted by Coxian distribution with 3-order. e initial value of the PH distribution is given as follows: e simulation time is about 14 h, and the parameter estimation results based on the simulation data are as follows: Mathematical Problems in Engineering e estimated PDF of the PH distribution and the original log-normal distribution are both shown in Figure 3, and the related statistics are shown in Table 7.
It can be seen from Figure 3 that the fitting result of the estimated PH distribution is not much different from the original log-normal distribution, but it also can be seen from Table 7 that the deviation of the statistics is larger than the deviation in Table 6.
is is because the parameter estimation result of the PH distribution is affected by the randomness of the sample, the initial type of the PH distribution, and the initial value of the parameter.

Parameter Estimation of Damage Distribution (Weibull Damage).
e distribution of the shock damage is fitted by Coxian distribution with 4-order, and the initial value of the PH distribution is given as follows: e simulation time is about 17 h, and the parameter estimation results based on the simulation data are as follows: e estimated PDF of the PH distribution and the original Weibull distribution are both shown in Figure 4, and the relevant statistics are shown in Table 8.
It can be found that if the initial value of the first phase in the absorption rate vector is not set to 0, the PDF of the PH distribution obtained by parameter estimation at point 0 in abscissa is not 0. e reason for this is that the EM method always obtains the iterative parameter value which can maximize the likelihood function that means the PH distribution obtained under the nonzero absorption rate of the first phase has a larger likelihood value, but the result does not match the actual situation. erefore, the initial value of the absorption rate of the first phase should be set to 0.     It can be seen from the case that under the interval data, the PH distributions based on the estimation results can fit well with the original distributions mentioned above. erefore, through the approximate method proposed in this paper, the PH distribution can be used for shock model construction under only interval sample data, which not only improves the integral problem cause by the original distributions, but also can obtain the distribution form with a higher degree of fitting.
During the process of the above case, it is found that, in the early stage of the iteration, the value of the log-likelihood function which is calculated with the relative parameter iterative value θ t obtained by the interval failure data and θ x obtained by the censored sample data, is very small; that is, the difference between the two partial log-likelihood functions through two adjacent iterative loops does not affect the end condition of the iteration. However, it can be seen from equation (17) that performing parameter estimation with these two types of samples needs to calculate a large number of posterior distribution expectation of process statistics. In order to improve the efficiency of the proposed method, some improvements are given here with the estimation of parameters θ x as an example. When performing the estimation of parameters θ x , a threshold ε x > 0 is set for the censored samples which satisfies ε x ≈ ε. When S s�1,δ s �0 ln( X th 0 c n s exp(Y (0) n s x)y (1) n s dx) ≥ ε x , the censored sample data are considered in the parameter estimation. When S s�1,δ s �0 ln( X th 0 c n s exp(Y (0) n s x)y (1) n s dx) < ε x , the parameters are estimated only by the interval failure data. e estimation of parameters θ t in distribution (α, D 0 ) is similar when dealing with interval failure samples. is improvement is only applicable to the case where the sample set contains both censored samples (δ s � 0) and interval failure samples (δ s � 1).

Conclusion
In this paper, an approximate parameter estimation method based on constructing PH distribution with dynamic orders is proposed to solve the decoupling problems in parameter estimation of the shock model when facing the samples with only interval data. Its performance is tested through a simulation case from which can be seen that the method proposed in this paper can obtain well-fitted parameter estimation results using only interval sample data.
ere are also limitations in the application of this method. For example, it is only applicable to the shock problem under a single stress source which means the shock time interval and the shock damage need to be subjected to the condition of iid. If there exists multistress sources which means the shock interval and shock damage may satisfy different types of distributions, the methods presented in this paper will face some identifiability problems. Later work will focus on solving these.

Data Availability
e data used to support the findings of this study are available from the corresponding author upon request.