This paper discusses the predictive maintenance (PM) problem of a single equipment system. It is assumed that the equipment has deteriorating quality states as it operates, resulting in multiple yield levels represented as system observation states. We cast the equipment deterioration as discrete-state and continuous-time semi-Markov decision process (SMDP) model and solve the SMDP problem in reinforcement learning (RL) framework using the strategy-based method. In doing so, the goal is to maximize the system average reward rate (SARR) and generate the optimal maintenance strategy for given observation states. Further, the PM time is capable of being produced by a simulation method. In order to prove the advantage of our proposed method, we introduce the standard sequential preventive maintenance algorithm with unequal time interval. Our proposed method is compared with the sequential preventive maintenance algorithm in a test objective of SARR, and the results tell us that our proposed method can outperform the sequential preventive maintenance algorithm. In the end, the sensitivity analysis of some parameters on the PM time is given.
In real production system, equipment deterioration is almost universal with use, age, and other causes. If the maintenance is not performed, eventually the failure or severe malfunction can occur. Operating the equipment in a deteriorating state often brings about higher production cost and lower product quality. Therefore, an effective maintenance policy is very essential in industrial practice. The periodic or age-based preventive maintenance strategy often leads to inadequate maintenance or over maintenance, in which over maintenance will cause unnecessary interference to production, resulting in the decreased production efficiency and increased production cost. The aim of condition-based maintenance is to see if the maintenance decision should be performed according to the current system state [
There are few theoretical and practical researches on PM in a strict sense compared with condition-based maintenance [
Moreover, in real industrial systems, such as semiconductor production and precision instruments, the deteriorating equipment states are closely related to the quality levels of the products [
Therefore, we claim that it is of great significance to make maintenance decisions by taking quality inspection data into account, which is able to keep the costs down and meet the needs of industrial production management. There are relatively few studies on the factors of production, maintenance, and quality, and no effective methods have been found to find a solution in the existing literature. We attempt to solve the equipment maintenance problem in production practice. Since the deteriorating equipment states cannot be directly observed, a large amount of real-time quality inspection information can be used as the implicit information. A discrete-state continuous-time SMDP with a large number of yield stages is induced to describe the equipment deterioration process. However, it is worth noting that the production and maintenance time are random variables that follow general distributions based on realistic considerations. A strategy iteration-based RL method is put forward to guarantee the optimal strategy solution to the model. Furthermore, the future maintenance time corresponding to each observed state can be produced by a simulation method based on the fixed maintenance strategy, and the influences of the main technical parameters on the optimization goal of the system are analyzed. And finally, the advantages of our proposed RL method for solving such a dynamic environment problem are revealed compared with the sequential preventive maintenance algorithm with unequal time interval.
This paper investigates deteriorating equipment that has multiple discrete states. Assume that the equipment condition can be directly reflected by the condition monitoring measures such as the yield levels. A single type of product is produced, and each processed product is immediately inspected to identify an unqualified product or a qualified product. The inspection time and inspection cost are assumed to be zero. Due to the fault of the inspection equipment or the proficiency of the inspection workers and other reasons, there are certain inspection errors in product quality inspection. The inspection errors are mainly divided into two types [ Type I error: that is the false detection with a probability Type II error: that is the missed detection with a probability In addition, through the accurate inspection, the profit of producing a qualified product is
The sequential decision-making problem under uncertain conditions can be solved by analyzing the Markov process. A large number of researches related to this issue can be found in stochastic dynamic programming and other related literature [
We employ a discrete-state continuous-time SMDP model to present the deteriorating process of the single equipment system, as shown in Figure
The model for SMDP.
In general, the equipment in a production system deteriorates as its condition is getting worse, which will lead to the result of the shorter sojourn time in each quality state. Therefore, this paper assumes that the sojourn time
The model-free RL is divided into two algorithms, including value iteration-based algorithm and strategy iteration-based algorithm, respectively. Nevertheless, if it is used to solve SMDP problems, the value iteration-based RL algorithm is not suitable, mainly because this algorithm cannot guarantee that the average reward SMDP problems produce the optimal solution [
The RL technology approaches the optimal strategy in the SMDP model through strategy iteration and learns the mapping from environment state to behavior through trial and error, so as to maximize the cumulative SARR
The Q-P learning algorithm can accurately solve the SMDP problems based on average cumulative rewards. In each decision cycle, the current state
The current strategy of Q-P learning algorithm is Step 1: Initialization Initialize the maintenance strategy According to the known maintenance strategy Step 2: Strategy Evaluation Initialize the current state Choose the greedy action Simulate the decision action Update state When the major repair is performed, the corresponding immediate reward and state transition time are obtained, and the action value Update the visit factors Step 3: Strategy Improvement Let According to the action value
Block diagram of maintenance strategy.
In Section
The detailed process for obtaining the PM time is shown in Figure
Block diagram for PM.
The maintenance action is imperfect; that is, after maintenance, the quality state of the equipment will be improved, and the yield level also will be improved, but the equipment will not be restored to a new state. So, to what extent will the equipment be restored after the maintenance? This section mainly explains this process through the change of yield level before maintenance and after maintenance. Referring to the ideas of Zhu et al. [
According to the problem description and the modeling description of the deteriorating equipment in this paper, the relevant parameters are assumed and given in Table
Numerical study parameters.
Production time per unit product | MM time | MR time | Stochastic breakdown time | Yield level limit | Yield level | |||||
---|---|---|---|---|---|---|---|---|---|---|
Γ (10, 0.1) | Γ (20, 0.5) | Γ (100, 0.2) | 0.6 | 4 | ||||||
0.2 | 0.9 | 0.8 | 0.05 | 0.05 | 30 | 60 | 90 | 40 | 100 | 300 |
The method proposed in this paper is adopted for learning and the learning results are shown in Figure
The SARRs for single equipment learned by Q-P method versus the sequential preventive maintenance method.
The sojourn time
Impact of decrease factor of sojourn time.
Figure
Impact of Type II error.
Impact of the cost
Impact of the cost
The coefficient
Impact of initial quality deterioration rate
In this paper, we propose a PM method for single deteriorating equipment having multiple yield quality problems. It is assumed that the yield stage is coupled with the equipment quality state, and a stochastic breakdown can also occur besides the quality failure. Moreover, the equipment cannot return to normal operating condition without repair. We assume that there are two decision actions including MM and MR in each observation state. The preventive maintenance is MM, which can be performed in a deteriorating quality state, while the MR is forced to be implemented in a failure state. A discrete-state continuous-time SMDP model is proposed to present the deterioration process of the equipment. The Q-P method in the RL framework is utilized to solve the SMDP model. Given the product quality inspection data with certain detection errors, the optimal maintenance strategy based on each observed state is produced by taking into account the goal of maximizing the long-run expected SARR. The PM time is capable of being achieved by a simulation method.
Through the simulation examples, it is proved that the proposed method adopted in this paper is capable of solving the PM problems of the equipment under dynamic environment. The experimental results also prove that the proposed method can outperform the standard sequential preventive maintenance method with unequal time interval. The change of maintenance action rules is further shown, which is not progressive with the increase of maintenance times and unqualified rate. It can also be observed that the PM time depends on the observed state, and it decreases as the total number of products produced increases and also decreases monotonically as the number of unqualified products increases for a given total number of products produced. Moreover, an increase in the number of maintenance times will also cause a decrease in the PM time. In addition, the influences of the main parameters on the optimization goal are also investigated.
The relevant data of calculation used to support the findings of this study are included within the article.
The authors declare that they have no conflicts of interest regarding the publication of this paper.
This work was supported by the Natural Science Foundation of Liaoning Province under Grant 20180550746 and the National Science Foundation of China under Grant 61901283.