This paper studies the predictive maintenance (PM) problem of a single equipment system. The equipment is assumed to pass through deteriorating quality states as it operates, producing multiple yield levels that serve as the system's observation states. We cast the equipment deterioration as a discrete-state, continuous-time semi-Markov decision process (SMDP) and solve it in a reinforcement learning (RL) framework with a strategy-based method. The goal is to maximize the system average reward rate (SARR) and generate the optimal maintenance action for each observation state; the PM time can then be produced by a simulation method. To demonstrate the advantage of the proposed method, we compare it against the standard sequential preventive maintenance algorithm with unequal time intervals on the SARR objective, and the results show that the proposed method outperforms the sequential algorithm. Finally, a sensitivity analysis of several parameters on the PM time is given.

In real production systems, equipment deterioration with use, age, and other causes is almost universal. If no maintenance is performed, a failure or severe malfunction will eventually occur, and operating the equipment in a deteriorated state typically raises production cost and lowers product quality. An effective maintenance policy is therefore essential in industrial practice. Periodic or age-based preventive maintenance often leads to inadequate maintenance or overmaintenance; overmaintenance interferes unnecessarily with production, decreasing production efficiency and increasing production cost. Condition-based maintenance instead decides whether maintenance should be performed according to the current system state [

There are few theoretical and practical studies on PM in the strict sense compared with condition-based maintenance [

Moreover, in real industrial systems, such as semiconductor production and precision instruments, the deteriorating equipment states are closely related to the quality levels of the products [

We therefore argue that it is of great significance to make maintenance decisions using quality inspection data, which keeps costs down and meets the needs of industrial production management. Studies that jointly consider production, maintenance, and quality remain relatively scarce, and no effective solution method has been established in the existing literature. We attempt to solve the equipment maintenance problem arising in production practice. Since the deteriorating equipment states cannot be observed directly, the abundant real-time quality inspection information is used as implicit information. A discrete-state continuous-time SMDP with many yield stages is introduced to describe the equipment deterioration process, and, reflecting realistic conditions, the production and maintenance times are random variables following general distributions. A strategy-iteration-based RL method is put forward to obtain the optimal strategy for the model. Furthermore, the future maintenance time corresponding to each observed state can be produced by a simulation method under the fixed maintenance strategy, and the influence of the main technical parameters on the system's optimization goal is analyzed. Finally, the advantages of the proposed RL method in such a dynamic environment are demonstrated against the sequential preventive maintenance algorithm with unequal time intervals.

This paper investigates deteriorating equipment that has multiple discrete states. It is assumed that the equipment condition is directly reflected by condition monitoring measures such as the yield level. A single type of product is produced, and each processed product is immediately inspected and classified as qualified or unqualified; the inspection time and inspection cost are assumed to be zero. Owing to faults of the inspection equipment, the proficiency of the inspection workers, and other causes, product quality inspection is subject to errors, which fall into two types [

Type I error: a false detection, occurring with probability _{1} and incurring the cost _{e1}. The parameter _{e1} includes the production cost per unit product and other related costs.

Type II error: a missed detection, occurring with probability _{2} and incurring the cost _{e2}. The parameter _{e2} includes the production cost per unit product and other possible costs, such as costs arising from quality and safety issues, which can far exceed the production cost.

In addition, under accurate inspection, the profit of producing a qualified product is _{d}.
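The per-product inspection economics described above can be sketched as follows. This is a minimal illustration: the function and parameter names (`p1`, `p2`, `profit_d`, `cost_e1`, `cost_e2`) are ours, since the paper's symbols did not survive extraction, and the handling of a correctly rejected unqualified product is deliberately omitted.

```python
import random

def inspect_product(is_qualified, p1, p2, rng=random):
    """Simulate an imperfect inspection of one product.

    A qualified product is wrongly rejected with probability p1 (Type I
    error); an unqualified product is wrongly accepted with probability
    p2 (Type II error). Returns the inspector's verdict.
    """
    if is_qualified:
        return "reject" if rng.random() < p1 else "accept"
    return "accept" if rng.random() < p2 else "reject"

def expected_product_reward(yield_rate, p1, p2, profit_d, cost_e1, cost_e2):
    """Expected reward per product under imperfect inspection.

    yield_rate: probability the product is actually qualified
    profit_d:   profit of a correctly accepted qualified product
    cost_e1:    loss of a Type I error (false detection)
    cost_e2:    loss of a Type II error (missed detection)
    """
    return (yield_rate * ((1 - p1) * profit_d - p1 * cost_e1)
            - (1 - yield_rate) * p2 * cost_e2)
```

As the yield level (the fraction of actually qualified products) deteriorates, the expected reward per product drops, which is what couples the equipment state to the reward signal in the SMDP.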

The sequential decision-making problem under uncertain conditions can be solved by analyzing the Markov process. Many studies related to this issue can be found in stochastic dynamic programming and other related literature [

We employ a discrete-state continuous-time SMDP model to represent the deteriorating process of the single equipment system, as shown in Figure. Since the quality state _{kl} cannot be obtained directly, the inspection information is used as the observation; after a maintenance action the system moves to an improved state (e.g., from _{21} back to _{11}), and then another updating subcycle is initiated.

The model for SMDP.

In general, as the equipment's condition worsens, the sojourn time in each quality state shortens. This paper therefore assumes that the sojourn time _{kl} under each yield level follows a gamma distribution Γ(_{kl}, _{kl}) whose mean decreases from level to level, that is, _{k,l+1} = _{s} _{kl} (0 < _{s} < 1), where _{s} is the decrease factor of sojourn time. Meanwhile, the stochastic malfunction time interval is also assumed to follow a gamma distribution.
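Under these assumptions, the level-dependent sojourn times might be sampled as below. This is an illustrative sketch: shrinking the gamma scale parameter by the decrease factor is one way to realize the geometric decrease of the mean, not necessarily the paper's exact parameterization.

```python
import random

def sojourn_times(shape, scale, beta_s, n_levels, rng=random):
    """Sample one sojourn time per yield level.

    The mean sojourn time shrinks geometrically from level to level:
    each level's mean is beta_s (0 < beta_s < 1) times the previous
    one. Here the scale parameter is shrunk so the gamma shape (and
    hence the coefficient of variation) is preserved.
    """
    times = []
    for _ in range(n_levels):
        times.append(rng.gammavariate(shape, scale))
        scale *= beta_s  # next level dwells beta_s times as long on average
    return times
```

With, say, `shape=10`, `scale=0.1`, and `beta_s=0.8`, the expected sojourn times of four successive levels would be 1.0, 0.8, 0.64, and 0.512 time units.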

Model-free RL comprises two classes of algorithms: value-iteration-based and strategy-iteration-based. For SMDP problems, however, the value-iteration-based algorithm is unsuitable, mainly because it cannot guarantee that average-reward SMDP problems converge to the optimal solution [

The RL technique approaches the optimal strategy of the SMDP model through strategy iteration: it learns the mapping from environment state to action through trial and error so as to maximize the long-run SARR.
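The long-run SARR objective of an average-reward SMDP can be written as follows (the notation is ours, since the original symbols did not survive extraction):

```latex
\rho(\pi) \;=\; \lim_{N\to\infty}
\frac{\mathbb{E}_{\pi}\!\left[\sum_{n=0}^{N-1} r(s_n, a_n)\right]}
     {\mathbb{E}_{\pi}\!\left[\sum_{n=0}^{N-1} \tau(s_n, a_n)\right]},
```

where \(r(s_n, a_n)\) is the immediate reward and \(\tau(s_n, a_n)\) the random sojourn time of the \(n\)-th decision epoch under strategy \(\pi\). Dividing by the accumulated sojourn time, rather than the number of epochs, is what distinguishes the SMDP criterion from the ordinary average-reward MDP criterion.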

The Q-P learning algorithm can accurately solve SMDP problems under the average-cumulative-reward criterion. In each decision cycle, the action value of the current state, _{j}(·), is updated.

_{max} is a large positive integer; _{0} is the initial value of the learning rate, set here to _{0} = 0.1. It should be noted that the value of _{0} has a certain influence on the final convergence of the RL algorithm; see [

_{e1} is defined as the loss of Type I error

_{d} is defined as the production cost per unit

_{e2} is defined as the loss of Type II error

_{R} is defined as the loss of major repair

_{M} is defined as the loss of minor maintenance

The current strategy of the Q-P learning algorithm is

Step 1: Initialization

Initialize the maintenance strategy, the maximum iteration times _{max}, and the maximum updating times of the strategy evaluation _{max}; initialize the learning rate parameters.

According to the known maintenance strategy

Step 2: Strategy Evaluation

Initialize the current state, the total simulation time _{f}, and the cumulative state transition time _{c}.

Choose the greedy action with probability 1 − _{n}; otherwise, choose a random action with probability _{n}.

Simulate the decision action

Update state

When the major repair is performed, the corresponding immediate reward and state transition time are obtained and the action value is updated. If the number of updates reaches _{max}, jump to Step 3 (i); otherwise, jump to Step 2 (ii).

Update the visit factors and the learning rate _{n}, and then jump to Step 2 (ii).

Step 3: Strategy Improvement

If the number of strategy improvements reaches _{max}, stop the learning process; otherwise return to Step 2 and continue learning.

According to the action values, the improved strategy ^{∗} is obtained by using the following equation:

Block diagram of maintenance strategy.
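Steps 1-3 above can be sketched in code as follows. This is a minimal, illustrative strategy-iteration loop with an epsilon-greedy evaluation phase; the environment interface, the parameter names, and the relative-value update rule are our assumptions, since the paper's exact equations were lost in extraction.

```python
import random
from collections import defaultdict

def qp_learning(simulate, states, actions, i_max=15, n_max=10_000,
                alpha0=0.1, epsilon=0.1, rng=random):
    """Sketch of a Q-P-style strategy-iteration loop for an average-reward SMDP.

    simulate(state, action) -> (next_state, reward, sojourn_time)
    is the SMDP environment model supplied by the caller.
    """
    policy = {s: actions[0] for s in states}          # Step 1: initial strategy
    for _ in range(i_max):                            # outer strategy iterations
        q = defaultdict(float)                        # action values
        visits = defaultdict(int)                     # visit factors
        total_r, total_t = 0.0, 1e-9                  # running SARR estimate
        s = states[0]
        for _ in range(n_max):                        # Step 2: strategy evaluation
            # epsilon-greedy action selection around the current strategy
            a = policy[s] if rng.random() > epsilon else rng.choice(actions)
            s2, r, tau = simulate(s, a)
            visits[(s, a)] += 1
            alpha = alpha0 / visits[(s, a)]           # decaying learning rate
            rho = total_r / total_t                   # current average reward rate
            # relative-value update for the average-reward SMDP criterion
            target = r - rho * tau + max(q[(s2, b)] for b in actions)
            q[(s, a)] += alpha * (target - q[(s, a)])
            if a == policy[s]:                        # update SARR on on-policy steps
                total_r += r
                total_t += tau
            s = s2
        # Step 3: strategy improvement (greedy with respect to q)
        policy = {st: max(actions, key=lambda b: q[(st, b)]) for st in states}
    return policy
```

The key SMDP-specific term is `r - rho * tau`: the reward of each transition is penalized by the average reward rate times the sojourn time, so the learned values are relative to the SARR rather than to a discounted horizon.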

In Section above, the optimal maintenance strategy ^{∗} of the deteriorating equipment is obtained by the proposed method. In this section, the optimal strategy ^{∗} and the equipment deterioration model are used to estimate the future maintenance time corresponding to each observation state _{i}. First, a one-dimensional vector _{d} of unqualified-product counts and a one-dimensional vector _{t} of production times are defined; these record the accumulated number of unqualified products and the elapsed production time. Under ^{∗}, the action of each newly reached state is obtained until the equipment performs a maintenance action. The vector _{d} records the states visited from the start of production to the maintenance action, and the vector _{t} directly yields the maintenance time point of each state.

The detailed process for obtaining the PM time is shown in Figure. Under the fixed strategy ^{∗}, the quality state and the production process are random in the simulation. The maintenance policy is applied repeatedly to the deterioration model, a sample of the PM time for each observation state _{i} is produced, and its mean value is taken as the estimate.

Block diagram for PM.
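The simulation loop for estimating the PM time under a fixed strategy might look like this. It is an illustrative sketch: `simulate_step` and the action labels are hypothetical stand-ins for the paper's deterioration model.

```python
def estimate_pm_time(policy, simulate_step, init_state, n_runs=1000):
    """Monte-Carlo estimate of the preventive-maintenance time under a
    fixed maintenance strategy.

    simulate_step(state) -> (next_state, production_time) advances the
    deterioration model by one produced product; policy[state] is
    "produce", "MM" (minor maintenance), or "MR" (major repair).
    """
    pm_times = []
    for _ in range(n_runs):
        state, clock = init_state, 0.0
        # keep producing until the strategy triggers a maintenance action
        while policy.get(state, "produce") == "produce":
            state, dt = simulate_step(state)
            clock += dt
        pm_times.append(clock)            # time at which maintenance is triggered
    return sum(pm_times) / len(pm_times)  # mean PM time over the runs
```

Running this once per observation state (with `init_state` set to that state) yields the state-dependent PM times whose sensitivity is analyzed later in the paper.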

The maintenance action is imperfect; that is, after maintenance the quality state of the equipment improves and the yield level rises, but the equipment is not restored to an as-good-as-new state. To what extent, then, is the equipment restored after maintenance? This section explains the process through the change of yield level before and after maintenance, following the ideas of Zhu et al. [

_{k} is the degradation factor in the equation, defined as an age degradation factor with a value between 0 and 1; _{k} represents the time interval of the corresponding maintenance cycle.
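One common age-reduction (virtual-age) formulation of imperfect maintenance, consistent with a degradation factor between 0 and 1, is sketched below; whether this matches the paper's exact equation cannot be verified from the surviving text, so treat it as an assumed illustration.

```python
def virtual_age_after_maintenance(virtual_age, interval, b_k):
    """Virtual age of the equipment after the k-th imperfect maintenance.

    Age-reduction model: maintenance removes only part of the accumulated
    age, so the equipment is better than just before maintenance but
    worse than new.

    virtual_age: virtual age just before this maintenance
    interval:    operating time since the previous maintenance
    b_k:         age degradation factor in (0, 1); b_k close to 0 means
                 nearly "as good as new", close to 1 nearly "as bad as old"
    """
    return b_k * (virtual_age + interval)
```

Because each maintenance multiplies the accumulated age by `b_k` rather than resetting it to zero, successive PM cycles start from a progressively higher virtual age, which is consistent with the paper's observation that the PM time shrinks as the number of maintenance actions grows.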

According to the problem description and the deteriorating-equipment model of this paper, the relevant parameters are given in Table. The maximum iteration times _{max} = 15; the maximum updating times of the strategy evaluation _{max} = 10000; the visit factor is updated as described above, and the evaluation stops when _{c} ≥ _{f} or the update limit is reached.

Numerical study parameters.

| Production time per unit product | MM time | MR time | Stochastic breakdown time | Yield level limit | Yield level |
|---|---|---|---|---|---|
| Γ (10, 0.1) | Γ (20, 0.5) | Γ (100, 0.2) |  | 0.6 | 4 |

| _{k} | _{k} | _{s} | _{1} | _{2} | _{e1} | _{e2} | _{d} | _{M} | _{R} |  |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.2 | 0.9 | 0.8 | 0.05 | 0.05 | 30 | 60 | 90 | 40 | 100 | 300 |

The method proposed in this paper is adopted for learning, and the learning results are shown in Figure

The SARRs for single equipment learned by Q-P method versus the sequential preventive maintenance method.

The sojourn time _{kl} of each state is related to the decrease factor of sojourn time _{s}: the smaller _{s} is, the greater the change of _{kl}, and the PM time changes correspondingly. As shown in Figure, the PM time increases as _{s} decreases, and the long-run expected SARR increases as well; for example, when _{s} decreases from 1 to 0.6, the expected SARR changes from 20.6 to 30.

Impact of decrease factor of sojourn time.

Figure shows the impact when the Type II error probability _{2} increases from 0 to 0.1: the increase of _{2} reduces the long-run expected SARR, which decreases from 31.8 to 30.6. At the same time, the PM time is not sensitive to the change of _{2}, since the cost _{e2} of a Type II error is comparatively small in this setting. A similar observation holds as the Type I error probability _{1} increases: because _{e1} is comparatively small, growth of _{1} mainly results in a reduction of the long-run expected SARR.

Impact of Type II error.

_{f} refers to the cost of wrongly identifying a qualified product as an unqualified product. From Figure, the PM time decreases as _{f} increases; this is because an increase in _{f} leads to a decrease in the long-run expected SARR. Meanwhile, Figure shows that the PM time is not very sensitive to _{f}, which is caused by the very small false-detection probability assumed in this paper.

Impact of the cost _{f}.

_{n} is the cost of wrongly identifying an unqualified product as a qualified product. As shown in Figure, the PM time decreases as _{n} increases, because the long-run expected SARR decreases as _{n} increases. Figure also shows that the PM time is not very sensitive to _{n}, which is caused by the very small missed-detection probability assumed in this paper.

Impact of the cost _{n}.

The coefficient _{y} denotes the initial quality deterioration rate; its impact is shown in Figure

Impact of initial quality deterioration rate _{y}.

In this paper, we propose a PM method for a single piece of deteriorating equipment with multiple yield-related quality levels. It is assumed that the yield stage is coupled with the equipment quality state and that a stochastic breakdown can occur in addition to quality failure; moreover, the equipment cannot return to normal operating condition without repair. Two decision actions, MM and MR, are available in each observation state: preventive maintenance is the MM action, which can be performed in a deteriorating quality state, while MR is forced in the failure state. A discrete-state continuous-time SMDP model is proposed to represent the deterioration process, and the Q-P method in the RL framework is used to solve it. Given product quality inspection data with certain detection errors, the optimal maintenance action for each observed state is produced so as to maximize the long-run expected SARR, and the PM time is obtained by a simulation method.

The simulation examples show that the proposed method can solve the PM problem of equipment in a dynamic environment and that it outperforms the standard sequential preventive maintenance method with unequal time intervals. The change of the maintenance action rule is further shown, which does not progress uniformly with the number of maintenance actions and the unqualified rate. It can also be observed that the PM time depends on the observed state: it decreases as the total number of products produced increases and, for a given total, decreases monotonically as the number of unqualified products increases. An increase in the number of maintenance actions also shortens the PM time. In addition, the influence of the main parameters on the optimization goal is investigated.

The relevant data of calculation used to support the findings of this study are included within the article.

The authors declare that they have no conflicts of interest regarding the publication of this paper.

This work was supported by the Natural Science Foundation of Liaoning Province under Grant 20180550746 and the National Science Foundation of China under Grant 61901283.