Fault Sample Generation for Virtual Testability Demonstration Test Subject to Minimal Maintenance and Scheduled Replacement

Virtual testability demonstration tests place new requirements on fault sample generation. First, the fault occurrence process is described using stochastic process theory, and it is shown that a fault occurrence process subject to minimal repair is a nonhomogeneous Poisson process (NHPP). Second, the interarrival time distribution function of the next fault event is derived and three typical parametric NHPP models are discussed. Third, a fault sample generation procedure is put forward under the assumptions of minimal maintenance and scheduled replacement, which yields the fault modes and their occurrence times for specified conditions and a specified time period. Finally, an antenna driving subsystem of an automatic pointing and tracking platform is taken as a case study to illustrate the proposed method. The results indicate that both the size and the structure of the fault samples generated by the proposed method are reasonable, and that the method is well suited to virtual testability demonstration tests.


Introduction
At present, testability tests rely on two basic methods: the fault injection test and the field test. Both are physical tests. A field test often takes a long time to yield enough original fault samples, so fault injection is commonly applied to accelerate testability demonstration [1][2][3][4].
However, application results indicate that testability demonstration tests based on fault injection have two unavoidable problems [1][2][3][4][5]. First, a large number of fault injection tests leads to high cost. Second, some faults are not allowed to be injected because of their destructive effects, and some faults cannot be injected effectively because the available fault injection means are limited. These two shortcomings lead to an unreasonable fault sample structure and low confidence.
Fault sample selection aims to determine an appropriate sample size and a reasonable fault sample structure, that is, to select representative fault samples [2][3][4][5][6]. On the one hand, given the limits on test cost and time, the fault sample size needs to be as small as possible. On the other hand, to improve the accuracy and precision of the demonstration results, the fault sample size needs to be as large as possible. These two requirements conflict [1][2][3][4].
Many recent studies attach importance to virtual tests. A virtual test can simulate the process of a real test and obtain test results efficiently. Compared with a physical test, it can therefore decrease the test cost and risk and shorten the test period. According to recent studies, modeling and simulation of large-scale systems remain difficult, while modeling and simulation of small-scale systems are feasible under present technical conditions [1][7][8][9].
As mentioned above, virtual tests have many advantages, such as high efficiency, a short test period, and low cost. Because the fault sample size of a virtual testability test is almost unlimited, it overcomes some deficiencies of the physical testability test. Thus, fault sample generation in a virtual testability test differs from fault sample selection in a physical testability test.
The combination of minimal maintenance and scheduled replacement is the main maintenance mode for many systems. The occurrences of faults are nonhomogeneous because faults occur randomly and are repairable. On the basis of the Monte Carlo method, Zhao et al. proposed a fault sample generation method in which faults follow the exponential distribution [4]. Considering various types of life distribution and assuming perfect maintenance, Zhang et al. proposed a fault sample generation method based on the renewal process [1].
The nonhomogeneous Poisson process has a clear physical meaning and theoretical basis. It is widely applied to system reliability analysis, the calculation of reliability indices, and reliability growth tests.
This paper discusses the fault occurrence process and describes it by an NHPP. A fault sample simulation method suitable for virtual testability demonstration is proposed. The main idea of the method is to obtain the size and composition of the fault sample from a fault statistical model and statistical simulation. The purpose is to obtain a realization of the fault occurrence process within the specified time and conditions, which is called fault sample simulation in this paper.

Description of Fault Occurrence Process
Let $N(t)$ be the total number of faults up to time $t$. Under the single fault assumption, two or more faults occurring at the same time $t$ are ignored. So $N(t)$ has the following properties: (1) $N(t) \ge 0$; (2) $N(0) = 0$; (3) $N(t)$ is integer valued; (4) the process has independent increments; (5) $P\{N(t+h) - N(t) = 1\} = \lambda(t)h + o(h)$ and $P\{N(t+h) - N(t) \ge 2\} = o(h)$. According to the definition of a counting process, the fault occurrence process $\{N(t), t \ge 0\}$ is a Poisson process with the parameter $\lambda(t)$ [10,11]. The parameter $\lambda(t)$ is also called the intensity function, which describes the intensity level of fault occurrence. If $\lambda(t)$ is a constant, $\{N(t), t \ge 0\}$ is a homogeneous Poisson process (HPP). Otherwise, it is a nonhomogeneous Poisson process. The nonhomogeneous Poisson process is the generalized form of the homogeneous Poisson process; equivalently, the homogeneous Poisson process is a special case of the nonhomogeneous Poisson process [11].
Let $m(t)$ denote the mean number of faults in the interval $(0, t]$;
$$m(t) = \int_0^t \lambda(u)\,du. \tag{1}$$
$m(t)$ is also called the cumulative number of occurrences of failures. Thus,
$$P\{N(t+s) - N(t) = n\} = \frac{[m(t+s) - m(t)]^n}{n!}\, e^{-[m(t+s) - m(t)]}, \quad n = 0, 1, 2, \ldots, \tag{2}$$
where
$$m(t+s) - m(t) = \int_t^{t+s} \lambda(u)\,du. \tag{3}$$
Consider a repairable system that is put into operation at time $t = 0$. The first fault event of the system will occur at time $S_1$, the second at time $S_2$, and so on. We thus get a sequence of fault times $S_1, S_2, \ldots$. Let $X_i$ be the time between the $(i-1)$th fault event and the $i$th fault event for $i = 1, 2, 3, \ldots$, where $S_0$ is taken to be zero. $X_i$ is called the interarrival time, and $\{X_i, i = 1, 2, 3, \ldots\}$ is called the sequence of fault interarrival times. The fault occurrence process is illustrated in Figure 1.
If the fault occurrence process is an HPP with rate $\lambda$, the fault interarrival times are independent and exponentially distributed with the same parameter $\lambda$. If each failed component is replaced or restored to an "as good as new" condition and its lifetime is exponentially distributed, the fault occurrence process may be an HPP. However, these conditions are hard to meet in practice, and the rate of occurrence of faults may vary with time.
It is important to note that some fault occurrence processes do not have stationary increments. The rate of occurrence of faults varies with time rather than being a constant. This means that failures may be more or less likely to occur at certain times than at others, and hence the interarrival times are generally neither independent nor identically distributed [11][12][13][14][15].
The NHPP is a generalization of the HPP and has the HPP as a special case. It is often used to model repairable systems that are subject to a minimal repair strategy with negligible repair time. Minimal repair means that a failed system is restored just back to a functioning state. After a minimal repair, the system continues as if nothing had happened. The likelihood of a system fault is the same immediately before and after a fault.
Consider a system consisting of many components. Suppose that a component fails and causes a system failure, and that this component is immediately replaced by a component of the same type, thus causing negligible system downtime. Since only a small fraction of the system is replaced, it seems natural to assume that the system's reliability after the repair is essentially the same as immediately before the failure. In other words, the assumption of minimal repair is a realistic approximation. The minimal repair assumption is therefore often applicable, and the NHPP may be accepted as a realistic model [11][12][13][14][15].
A schematic diagram of the fault detection process is shown in Figure 2. Let $Y_i$ $(i = 1, 2, \ldots)$ be the interval time between adjacent fault detections. The variables $Y_i$ $(i = 1, 2, \ldots)$ are influenced by the fault occurrence process and the testability plan. The number of fault detections and the interval time between fault detections are random. The number of detected faults is random, too.
Generally, the observed value of the fault detection rate changes over the specified time period $(0, t]$. The formula is
$$\gamma_{FD} = \frac{N_D(t)}{N(t)}, \tag{4}$$
where $t$ is the time variable, $N_D(t)$ is the number of detected faults up to time $t$, and $N(t)$ is the total number of faults up to time $t$. As discussed above, $N_D(t)$ and $N(t)$ are stochastic. Thus, the observed value of the fault detection rate $\gamma_{FD}$ is random, too.
Note that if a fault event occurs at time $t$, then, independent of what has occurred prior to $t$, the additional time until the next fault event has the distribution $F_t$. Let $\{N(t) - N(t-h) = 1\}$ $(h \to 0)$ denote that a fault event occurs at time $t$. This event is recorded as $A$. Let $T_\Delta$ be the interval time between $A$ and the next fault event. Then, the event $\{T_\Delta < x\}$ is equivalent to at least one fault event occurring in $(t, t+x)$; that is, $\{N(t+x) - N(t) \ge 1\}$. The interarrival time distribution function $F_t(x)$ of the next fault event after time $t$ is
$$F_t(x) = P\{T_\Delta < x \mid A\} = P\{N(t+x) - N(t) \ge 1 \mid N(t) - N(t-h) = 1\}. \tag{5}$$

Fault Sample Generation
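According to the independent increments assumption for fault events, (5) can be simplified as
$$F_t(x) = 1 - P\{N(t+x) - N(t) = 0\} = 1 - e^{-[m(t+x) - m(t)]}. \tag{6}$$
According to (1) and (3), it is obtained that
$$F_t(x) = 1 - \exp\left(-\int_t^{t+x} \lambda(u)\,du\right). \tag{7}$$
We can now simulate the fault event times $S_1, S_2, \ldots$ by generating $S_1$ from the distribution $F_0$. Then, we generate $S_2$ by adding $S_1$ to a generated value from the distribution $F_{S_1}$. We generate $S_3$ by adding $S_2$ to a generated value from the distribution $F_{S_2}$, and so on [16].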
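The sequential scheme just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the constant rate of 0.002 faults per hour, and the 7500-hour horizon are assumptions chosen for the demonstration, and the HPP is used only because its inverse distribution function is the simplest one to write down.

```python
import math
import random


def simulate_fault_times(inverse_cdf, horizon, rng):
    """Sequentially generate fault times S1 <= S2 <= ... that fall within the horizon.

    inverse_cdf(t, u) must return the interarrival time x that solves F_t(x) = u,
    where t is the time of the most recent fault (or 0 at the start).
    """
    times = []
    t = 0.0
    while True:
        u = rng.random()            # uniform random number on [0, 1)
        t += inverse_cdf(t, u)      # next fault time = last fault time + sampled gap
        if t > horizon:
            return times
        times.append(t)


def hpp_inverse_cdf(t, u, rate=0.002):
    # Constant ROCOF (assumed value): the gap is exponential and independent of t.
    return -math.log(1.0 - u) / rate


rng = random.Random(1)
faults = simulate_fault_times(hpp_inverse_cdf, horizon=7500.0, rng=rng)
```

Any NHPP model can be plugged in by passing a different `inverse_cdf`, which is the only model-specific piece of the scheme.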

Parametric NHPP Models of Fault Occurrence Process.
The key to fault event simulation is the distribution function and its inverse function. If the fault occurrence process is described by an NHPP, it is uniquely determined by the rate of occurrence of faults $\lambda(t)$. Fault occurrence processes are usually classified into the linear model, the power law model, and the log-linear model according to the shape of $\lambda(t)$ [11]. The three models can be expressed in the common form
$$\lambda(t) = \lambda_0\, g(t; \vartheta), \tag{8}$$
where $\lambda_0$ is a common multiplier and $g(t; \vartheta)$ determines the shape of $\lambda(t)$.
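In the linear model, the ROCOF of the NHPP is defined as
$$\lambda(t) = \lambda_0 (1 + \alpha t), \quad \lambda_0 > 0.$$
The interarrival time distribution function $F_t(x)$ of the next fault event after time $t$ is
$$F_t(x) = 1 - \exp\left\{-\lambda_0\left[(1 + \alpha t)x + \frac{\alpha x^2}{2}\right]\right\}.$$
The inverse function of $F_t(x)$ is
$$F_t^{-1}(u) = \frac{-(1 + \alpha t) + \sqrt{(1 + \alpha t)^2 - 2\alpha \ln(1-u)/\lambda_0}}{\alpha}, \quad \alpha \ne 0.$$
A repairable system modeled by the linear model is deteriorating if $\alpha > 0$ and improving if $\alpha < 0$. When $\alpha = 0$, the linear model reduces to an HPP. When $\alpha < 0$, $\lambda(t)$ will sooner or later become less than zero.
The model can only be used in time intervals where $\lambda(t) > 0$. The linear model is often used to describe the fault occurrence process in the random failure period.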
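As a quick check on the linear model, the sketch below inverts the interarrival distribution and verifies the round trip numerically. It assumes the parameterization $\lambda(t) = \lambda_0(1 + \alpha t)$; the function names and the numerical values are illustrative assumptions, not taken from the paper.

```python
import math


def linear_cdf(x, t, lam0, alpha):
    """F_t(x) for the assumed ROCOF lam0 * (1 + alpha * t)."""
    return 1.0 - math.exp(-lam0 * ((1.0 + alpha * t) * x + 0.5 * alpha * x * x))


def linear_inverse_cdf(u, t, lam0, alpha):
    """Solve F_t(x) = u for the smallest x >= 0 (root of a quadratic in x)."""
    c = -math.log(1.0 - u) / lam0      # required cumulative intensity divided by lam0
    b = 1.0 + alpha * t
    if alpha == 0.0:
        return c                        # HPP case: exponential interarrival time
    disc = b * b + 2.0 * alpha * c
    if disc < 0.0:
        # Only possible for alpha < 0: the ROCOF reaches zero before the next fault.
        raise ValueError("no next fault inside the interval where the ROCOF is positive")
    return (-b + math.sqrt(disc)) / alpha


# Hypothetical parameters, for illustration only.
x = linear_inverse_cdf(0.3, t=100.0, lam0=0.001, alpha=0.0005)
round_trip = linear_cdf(x, t=100.0, lam0=0.001, alpha=0.0005)
```

The `ValueError` branch reflects the restriction noted above: for $\alpha < 0$ the model is only meaningful while $\lambda(t) > 0$.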
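In the power law model, the ROCOF of the NHPP is defined as
$$\lambda(t) = \lambda_0 \beta t^{\beta - 1}, \quad \lambda_0 > 0,\ \beta > 0.$$
Thus,
$$F_t(x) = 1 - \exp\left\{-\lambda_0\left[(t + x)^{\beta} - t^{\beta}\right]\right\}, \qquad F_t^{-1}(u) = \left[t^{\beta} - \frac{\ln(1-u)}{\lambda_0}\right]^{1/\beta} - t.$$
A repairable system modeled by the power law model is improving if $0 < \beta < 1$ and deteriorating if $\beta > 1$. If $\beta = 1$, the model reduces to an HPP. This NHPP is sometimes referred to as a Weibull process, since the ROCOF has the same functional form as the failure rate function of the Weibull distribution. The power law model is often used to describe the fault occurrence process of electromechanical systems and in reliability growth modeling.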
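The power law case can be checked the same way. The sketch assumes the common parameterization $\lambda(t) = \lambda_0 \beta t^{\beta-1}$, for which the mean number of faults is $\lambda_0 t^{\beta}$; function names and parameter values are hypothetical.

```python
import math


def power_law_cdf(x, t, lam0, beta):
    """F_t(x) for the assumed ROCOF lam0 * beta * t**(beta - 1)."""
    return 1.0 - math.exp(-lam0 * ((t + x) ** beta - t ** beta))


def power_law_inverse_cdf(u, t, lam0, beta):
    """Closed-form solution of F_t(x) = u."""
    return (t ** beta - math.log(1.0 - u) / lam0) ** (1.0 / beta) - t


# Hypothetical parameters: beta > 1 models a deteriorating system.
gap = power_law_inverse_cdf(0.6, t=500.0, lam0=0.0001, beta=1.4)
check = power_law_cdf(gap, t=500.0, lam0=0.0001, beta=1.4)

# With beta = 1 the gap no longer depends on t and is exponential with rate lam0.
hpp_gap = power_law_inverse_cdf(0.6, t=500.0, lam0=0.001, beta=1.0)
```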
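In the log-linear model, the ROCOF of the NHPP is defined by
$$\lambda(t) = \lambda_0 e^{\beta t}, \quad \lambda_0 > 0.$$
Thus,
$$F_t(x) = 1 - \exp\left[-\frac{\lambda_0}{\beta} e^{\beta t}\left(e^{\beta x} - 1\right)\right], \qquad F_t^{-1}(u) = \frac{1}{\beta} \ln\left[1 - \frac{\beta \ln(1-u)}{\lambda_0 e^{\beta t}}\right].$$
A repairable system modeled by the log-linear model is improving if $\beta < 0$ and deteriorating if $\beta > 0$. When $\beta = 0$, the log-linear model reduces to an HPP. The log-linear model is often used to describe the fault occurrence process of electronic systems.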
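For the log-linear model, a short sketch under the assumed form $\lambda(t) = \lambda_0 e^{\beta t}$ with $\beta \ne 0$ is given below. The function names and the numbers are illustrative assumptions. Note that for $\beta < 0$ the total expected number of future faults is finite, so the inverse may not exist for large $u$; the code raises an error in that case rather than returning a value.

```python
import math


def log_linear_cdf(x, t, lam0, beta):
    """F_t(x) for the assumed ROCOF lam0 * exp(beta * t), beta != 0."""
    return 1.0 - math.exp(-(lam0 / beta) * math.exp(beta * t) * math.expm1(beta * x))


def log_linear_inverse_cdf(u, t, lam0, beta):
    """Closed-form solution of F_t(x) = u."""
    arg = 1.0 - beta * math.log(1.0 - u) / (lam0 * math.exp(beta * t))
    if arg <= 0.0:
        # Only possible for beta < 0: the improving system never fails again.
        raise ValueError("no finite interarrival time for this u")
    return math.log(arg) / beta


# Hypothetical parameters: beta > 0 models a deteriorating system.
gap = log_linear_inverse_cdf(0.25, t=1000.0, lam0=0.0005, beta=0.0002)
check = log_linear_cdf(gap, t=1000.0, lam0=0.0005, beta=0.0002)
```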
The cumulative number of faults and the occurrence time of each fault under specified conditions and over a specified time period can be obtained from reliability tests and other trials. An appropriate parametric NHPP model is then selected according to these failure statistics. Model selection and parameter estimation methods are not discussed in this paper.

Fault Sample Simulation.
In this paper, we assume that the repair or maintenance time is negligible and the corrective maintenance is minimal maintenance or repair, that is, the maintenance action which restores the part to the failure rate it had when it failed.The part after repair is as bad as old.
If fault events occur before the scheduled replacement, the part will be processed by breakdown maintenance.If no fault event occurs before scheduled replacement, the part should be replaced by a brand new one regardless of its health condition when it meets the replacement requirement.
Statistical simulation method is also known as random simulation method, random sampling method, or statistical test method.It can effectively solve uncertainty problems and complex computing problems.For example, Monte Carlo method is widely applied in financial engineering, statistical physics, computational mathematics, reliability engineering, and other fields [17,18].
The flow chart of fault sample simulation is shown in Figure 3. $T_r$ is the scheduled interval replacement time. $F_t(x)$ is the interarrival time distribution function of the next fault event after time $t$. $U_i$ $(i = 1, 2, \ldots)$ are uniform $(0, 1)$ random variables. $x_i$ $(i = 1, 2, \ldots)$ are the simulation results of the interarrival times of fault events. $T[j]$ is the cumulative working time of the $j$th part. The initial value of $T[j]$ is zero for $j = 1, 2, 3, \ldots$. $n$ is the cumulative number of fault events. $S_n$ is the occurrence time of the $n$th fault event. $T$ is the cumulative working time of the parts. $T^*$ is the specified statistical time.
The basic steps of fault sample simulation are as follows.
Step 1. Determine the parameters of the NHPP and set the interval replacement time $T_r$.
Step 3. Solve the interarrival time distribution function $F_t(x)$ of the next fault event after time $t$.
Step 5. Generate the random number $U_i$.
Step 6. Calculate the interarrival time $x_i$ based on the direct sampling method, $x_i = F_t^{-1}(U_i)$.
Step 9. Obtain fault samples based on the probability proportional to size (PPS) sampling method. The selection probability of each fault mode is set to be proportional to its occurrence percentage ratio (OPR).
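The procedure can be sketched end to end as follows. This is one possible reading of the steps, under several assumptions that are not stated in this form in the text: the ROCOF is log-linear, $\lambda_0 e^{\beta t}$ with $\beta > 0$; the age of a part resets to zero at each scheduled replacement; minimal repair leaves the age unchanged; and the fault modes with their OPR weights are hypothetical placeholders. All function and variable names are illustrative.

```python
import math
import random


def next_gap(age, u, lam0, beta):
    """Inverse of F_t(x) for an assumed log-linear ROCOF lam0 * exp(beta * t), beta > 0."""
    return math.log(1.0 - beta * math.log(1.0 - u) / (lam0 * math.exp(beta * age))) / beta


def generate_fault_sample(lam0, beta, t_replace, t_total, modes, rng):
    """Return [(cumulative working time, fault mode), ...] up to t_total."""
    names = list(modes)
    weights = [modes[name] for name in names]
    sample = []
    elapsed = 0.0                       # working time of parts already replaced
    while elapsed < t_total:
        age = 0.0                       # a newly installed part is as good as new
        while True:
            age += next_gap(age, rng.random(), lam0, beta)
            if age >= t_replace or elapsed + age > t_total:
                break                   # no further fault before replacement or end of test
            # Minimal repair: the part keeps its age, so the ROCOF is unchanged.
            mode = rng.choices(names, weights=weights, k=1)[0]   # PPS draw by OPR
            sample.append((elapsed + age, mode))
        elapsed += t_replace            # scheduled replacement by a new part
    return sample


# Hypothetical fault modes and OPR values, for illustration only.
modes = {"mode_A": 0.5, "mode_B": 0.3, "mode_C": 0.2}
sample = generate_fault_sample(0.00054, 0.00022, 7500.0, 22500.0, modes, random.Random(7))
```

Each call returns one fault sample; repeating the call with different seeds gives the groups of samples needed for statistical comparison.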

Examples
An automatic pointing and tracking platform can isolate the movement of moving vehicles, such as cars, ships, and aircraft. It can automatically track the target and maintain stable communication. The stable tracking platform consists of multiple subsystems. We take the antenna driving subsystem as an example to carry out experiments. The lifetime of the automatic pointing and tracking platform is 15 years. The average working time is 1500 hours per year. The lifetime of the antenna driving subsystem is 7500 working hours, so the antenna driving subsystems are replaced by new ones every 5 years. Within the subsystem's life cycle, breakdown maintenance is carried out when it fails, under the assumption of minimal repair with negligible repair time. At the end of their life cycle, the subsystems are replaced by new ones, which corresponds to perfect repair. As the platform is new equipment, few failure statistics covering its full life cycle are available. Antenna driving subsystems of the same type had previously been on trial for 5 years. We collected credible failure and maintenance statistics of these subsystems over their single life cycle. The statistics contain 56 complete sets of trial data. The fault modes of the antenna driving subsystem and their occurrence percentage ratios are shown in Table 1. The fault occurrence process of the antenna driving subsystem in its single life cycle is an NHPP. The parameters of the ROCOF were estimated by the maximum likelihood method [11]. We observed an NHPP in $(0, T^*)$ in which faults occurred at times $S_1, S_2, \ldots$.
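The estimate $\hat{\beta}$ is found by solving the likelihood equation, and $\hat{\lambda}_0$ is then obtained from $\hat{\beta}$. The result is $\lambda(t) = 0.00054\,e^{0.00022t}$. The interarrival time distribution function $F_t(x)$ of the next fault event after time $t$ of the antenna driving subsystem is
$$F_t(x) = 1 - \exp\left[-\frac{0.00054}{0.00022}\, e^{0.00022t}\left(e^{0.00022x} - 1\right)\right].$$
The inverse function of $F_t(x)$ is
$$F_t^{-1}(u) = \frac{1}{0.00022} \ln\left[1 - \frac{0.00022 \ln(1-u)}{0.00054\, e^{0.00022t}}\right].$$
It is assumed that the specified statistical time of the testability demonstration is 15 years. The fault samples are generated by simulating the fault occurrence process with the proposed method, which yields the fault modes and their occurrence times. One simulated fault sample is shown in Table 2.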
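A rough sanity check on the fitted model is the expected number of faults it implies. The sketch below assumes the fitted ROCOF is the log-linear form $\lambda(t) = 0.00054\,e^{0.00022t}$ (with $t$ in working hours), so that the mean number of faults is $m(t) = (\lambda_0/\beta)(e^{\beta t} - 1)$, and that the age resets at each scheduled replacement. The variable names are illustrative.

```python
import math

lam0, beta = 0.00054, 0.00022      # fitted values, log-linear form assumed
life_cycle = 7500.0                # working hours per subsystem life cycle
n_cycles = 3                       # 15 years * 1500 h/year = 22500 h = 3 life cycles


def mean_faults(t):
    """m(t) = integral of lam0 * exp(beta * u) from 0 to t."""
    return (lam0 / beta) * math.expm1(beta * t)


per_cycle = mean_faults(life_cycle)   # roughly 10.3 faults per life cycle
total = n_cycles * per_cycle          # roughly 31 faults over the 15-year period
```

Under these assumptions a simulated 15-year fault sample should contain about 31 faults on average, with Poisson scatter around that value.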
The cumulative number of subsystem faults is shown in Figure 4. The abscissa represents the cumulative working time, and the ordinate represents the cumulative number of faults. The cumulative number of faults increases by one each time a fault occurs.
We run the statistical simulation 1000 times. The specified statistical simulation time is a single life cycle of the antenna driving subsystem. In this way, 1000 groups of fault samples are generated automatically by simulation. The numbers of faults in the fault samples are random variables. We compare some statistics of the actual samples and the simulated samples to examine the effectiveness of the proposed method. The comparison is shown in Table 3. Let $k_i$ denote the number of faults of the $i$th fault sample. The sample mean is
$$\bar{k} = \frac{1}{M} \sum_{i=1}^{M} k_i,$$
where $M$ is the sample size. The sample variance is
$$S^2 = \frac{1}{M-1} \sum_{i=1}^{M} \left(k_i - \bar{k}\right)^2.$$
The second-order origin moment is
$$A_2 = \frac{1}{M} \sum_{i=1}^{M} k_i^2.$$
The sample values are arranged in increasing order so as to meet
$$k_{(1)} \le k_{(2)} \le \cdots \le k_{(M)}.$$
The sample median is
$$\tilde{k} = \begin{cases} k_{((M+1)/2)}, & M \text{ is an odd number}, \\ \dfrac{1}{2}\left(k_{(M/2)} + k_{(M/2+1)}\right), & M \text{ is an even number}. \end{cases} \tag{23}$$
The percentage ratios of the fault modes are also calculated and compared in Table 3. The comparison shows that the statistics of the simulation results are nearly consistent with those of the actual fault samples, and that the composition of the simulated samples is reasonable. The results show that the proposed method is feasible and effective. The random fault samples generated by statistical simulation can be applied to virtual testability demonstration tests.
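The statistics used in the comparison are straightforward to compute. The sketch below uses Python's standard `statistics` module and assumes the sample variance is the unbiased form with divisor $M - 1$; the fault counts are hypothetical values for illustration, not data from the paper.

```python
import statistics


def summarize(counts):
    """Return the sample mean, variance, second-order origin moment, and median."""
    m = len(counts)
    return {
        "mean": statistics.fmean(counts),
        "variance": statistics.variance(counts),          # divisor M - 1 (assumed)
        "second_moment": sum(c * c for c in counts) / m,   # (1/M) * sum of squares
        "median": statistics.median(counts),               # averages the two middle values if M is even
    }


# Hypothetical numbers of faults in six simulated fault samples.
counts = [9, 11, 10, 12, 8, 10]
stats = summarize(counts)
```

Applying `summarize` to both the actual and the simulated fault counts gives the pairs of values needed for a side-by-side comparison.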

Conclusion
(1) The analysis shows that fault sample generation in a virtual testability test differs from fault sample selection in a physical testability test.
(2) In the case of minimal repair and scheduled replacement, the fault occurrence process can be described by NHPP theory. A fault sample generation approach for virtual testability demonstration tests is proposed.
(3) As some assumptions are eliminated, the size and structure of the fault samples simulated by the proposed method are reasonable. Experimental results show that the proposed method is feasible and effective. It can also be applied to virtual maintainability tests and to the design of integrated logistics support schemes.

$N(t)$: The total number of faults up to time $t$
$N_D(t)$: The number of detected faults up to time $t$
$\gamma_{FD}$: Fault detection rate
$n$: Fault sample size
$m(t)$: Mean number of faults in the interval $(0, t]$
$\lambda(t)$: The rate of occurrence of faults at time $t$
$Y_i$ $(i = 1, 2, \ldots)$: The interval time of adjacent fault detection
$A$: The event denoting one occurring fault
$T_\Delta$: The interval time between $A$ and the next fault event
$F_t(x)$: Interval time distribution function of the next fault event after time $t$
$g(t; \vartheta)$: The function that determines the shape of $\lambda(t)$
$T_r$: The scheduled interval replacement time
$U_i$ $(i = 1, 2, \ldots)$: Random variables having the uniform distribution in $(0, 1)$

Figure 2: Schematic diagram of fault detection process.

Step 7. If $T[j] + x_i < T_r$, the part has broken down before the scheduled replacement; the cumulative working time of this part is $T[j] = T[j] + x_i$. The cumulative number of fault events increases by 1; that is, $n = n + 1$. Then, record the occurrence time $S_n$ of this fault, set the current part age $t = T[j]$, and update the cumulative working time $T$ of the parts. If $T[j] + x_i \ge T_r$, the part is good until the scheduled replacement time. The part is as good as new after replacement. Then, update $T[j]$ and $T$, reset $t = 0$, and move on to the next part, $j = j + 1$.
$\lambda(t)$ is called the rate of occurrence of faults (ROCOF) at time $t$. It can be regarded as the mean number of faults per time unit at time $t$. If $\lambda(t)$ is a constant, $\{N(t), t \ge 0\}$ is an HPP.
3.1. The Fault Events Simulation. Let $S_1, S_2, \ldots$ denote the successive fault event times of such a fault occurrence process. As these random variables are clearly dependent, we generate them in sequence, starting with $S_1$ and using the generated value of $S_1$ to generate $S_2$, and so on.

Table 1: Fault modes and their occurrence percentage ratio.

Table 2: A simulation result of fault sample.

Table 3: The comparison of sample statistics.