Abandon Policies for Two Types of Multiattempt Missions

Systems that perform critical missions are often affected by internal degradation until they reach a failure state. For safety-critical systems, failure can have serious consequences, so the survival of the system has a higher priority than the completion of the task. The task can be suspended at an appropriate time and a rescue procedure initiated to reduce the risk of system failure. When the task is important, the system can attempt it again after an abandoned attempt and a completed rescue, which improves the probability of task completion. This study extends the existing research on multiple task abandon strategies by proposing degradation-based multicriteria mission abandon policies that consider multiple attempts and two types of task success criteria. In each attempt, the task is abandoned dynamically based on the degradation level and the time in the mission. Under the dynamic abandon policies, mission reliability and systems survivability are evaluated using a recursive method. The optimal abandon thresholds are investigated numerically.


Introduction
Existing systems reliability models mainly focus on the ability of a system to perform certain functions under given operating conditions and at a specific time. The probability of task success, that is, the probability of completing a specific task under certain conditions, is an important index for task systems [1][2][3]. However, in engineering practice, when system failure causes serious consequences or huge losses, the survival of the system may be more important than completing the task. Therefore, for safety-critical systems, the survival probability can be improved by terminating the task, which effectively reduces the risk of casualties and huge economic losses. When a certain condition is met, the task can be suspended and a safety rescue program started to maintain the system [4]. For example, when an aircraft in operation suffers a certain amount of lightning impact, it can immediately stop the mission and carry out a rescue to avoid aircraft damage and loss of life. The mission abandon strategy has been shown to be effective in enhancing the survivability of many engineering systems. Levitin et al. [5] proposed two important indicators to evaluate the reliability of systems with the possibility of task suspension: task reliability and system survivability.
Task reliability refers to the probability that the task will be completed within a specific time, and systems survivability refers to the probability that the system will survive the task without any catastrophic failure. By balancing these two key indicators, the abandonment threshold of tasks can be optimized to minimize the total operating cost.
Due to its great application value, research on task abandon strategies and their optimization and extension has attracted increasing attention. Since Myers [6] proposed the task suspension strategy in his pioneering paper, many models have been developed to study the impact of the task suspension strategy on the system operation process [7]. According to the failure mechanisms and mission characteristics of a system, the task abandon conditions are designed from various perspectives to balance the two reliability indexes, task reliability and systems survivability. The failure risk of safety-critical systems comes mainly from internal degradation and from an external impact environment. For systems with internal degradation, Qiu et al. [8] studied a two-stage failure process comprising a normal stage and a defective stage and considered two abandonment strategies, based respectively on the level and on the duration of degradation in the defective stage. The former abandons the task when the degradation level of the system exceeds a threshold, and the latter abandons the task when the duration of the defective stage exceeds a threshold. The task reliability and systems survivability under the two suspension strategies were also evaluated. Zhao et al. [9,10] investigated optimal condition-based task abandon policies. Yang et al. [11] designed risk control policies for mission-critical systems through abandon decision-making that integrates health and age conditions.
Research on task abandon strategies in an impact environment has recently attracted increasing attention. Different task suspension strategies have been proposed on the basis of different system structures and suspension conditions. Levitin et al. [4] designed a mission abandon strategy based on the number of shocks when studying the reliability of two-state systems in an impact environment and proposed an abandon strategy for systems that execute a specific task in a random environment. When the cumulative or consecutive number of effective impacts reaches the threshold, the task is abandoned and a rescue is carried out. Cha et al. [12] took the number of minor repairs as the decision parameter of the abandonment strategy for some repairable systems to balance task reliability and system survivability. Wu et al. [13] studied the optimal task abandon strategy of balanced systems, in which the number of failed or forcibly closed components in any sector is the decision parameter. In addition to binary-state systems, multistate systems have received considerable attention. Levitin et al. [14] studied the task abandon strategy of multistate systems running in a random impact environment modeled by renewal processes and the nonhomogeneous Poisson process, where the task abandon criterion is the number of shocks experienced by the system. Wang et al. [15] proposed a task abandon strategy for balanced systems with multiple multistate components, characterized the system operation process using the Markov process method, and proposed two competing abandon criteria, namely, the maximum state difference of the components and the number of damaged components. Wu et al. [13] studied the optimal mission abandon policy for k-out-of-n: F balanced systems.
Most previous studies allowed only one attempt to complete the task; that is, once the task is abandoned and the rescue succeeds, the system no longer performs the task. However, when it is important to complete a task and there are no strict restrictions on time and resources, the system can try to complete the task multiple times [16]. In real-world applications, this property is called time redundancy, which can enhance task reliability significantly and consequently influence the decision-making process. Levitin et al. [17] first proposed the task abandon strategy under multiple attempts. After a successful rescue, the system can try to complete the task again. The trade-off between mission success probability and system survival probability is also discussed, and an engineering case of a UAV is given to demonstrate the effectiveness of the strategy. Based on previous research, researchers further assumed that the probability of mission success is a function of the number of attempts successfully completed, and that the probabilities of attempt success and rescue success depend on the number of shocks suffered by the system in the corresponding mission stage. In addition, the trade-off between overall mission success probability and system failure probability is discussed, and the corresponding optimization model is given. In the multiple task suspension strategy for multistate repairable systems, the system is repaired to a perfect state after each rescue, and the repair time depends on the state of the system before repair. The goal of the optimization model is to maximize the probability that a system running in a random environment completes the task within a certain time. There is little research on multiple attempts and task abandon strategies for multicomponent systems. Levitin et al. [18] considered systems in which multiple identical components execute a task in parallel, that is, each component can complete the task independently. When any component completes the task, the task is successful. In this model, the decision variable is the number of impacts on a single component, which is related to the number of failed components and the number of attempts. In addition, the trade-off between mission success probability and the expected number of lost parts is discussed, and an optimization model aimed at minimizing the overall operating cost is proposed.
Despite the significant theoretical advancement in abandon modeling, the degradation-based multicriteria abandon policy for multiattempt missions has not been explored. Taking the degradation and mission characteristics into account when making abandon decisions should lead to more effective and beneficial abandonment policies. To advance the state of the art of abandon modeling, this study proposes, for the first time, a degradation-based multicriteria task suspension strategy and gives the corresponding dynamic suspension criteria according to the degradation level and the time in mission. In each attempt, if the degradation level reaches the threshold and the corresponding time in mission is less than the time threshold, then the task is abandoned and the rescue starts. If the rescue is successful, the next attempt starts with new degradation and time thresholds, and this continues until the maximum number of attempts is reached before the allowable time, the task is completed, or the system fails. In addition, two types of task success criteria are considered, based on the continuous operating time and the cumulative operating time, respectively. Finally, the corresponding strategy optimization model is proposed, and a numerical example and a sensitivity analysis of key parameters are presented for an engineering example, a cloud computing system. The rest of this paper is structured as follows. In Section 2, we model the monotone degradation process and develop dynamic task abandon policies under two types of task success criteria. The task success probability and system survivability are evaluated under the two types of task success criteria in Sections 3 and 4, respectively. Section 5 optimizes the proposed abandon thresholds. In Section 6, we illustrate the results with a case study. We conclude the research in Section 7, discussing the conclusions and future research directions.
Journal of Mathematics

Deterioration Modeling.
The deterioration process of the considered system is denoted by {Z(t), t ≥ 0} and has monotonically increasing degradation paths, reflecting increasing physical deterioration processes such as wear and crack growth. The most commonly used stochastic process model for monotonically increasing degradation in existing studies is the gamma process. The gamma process and its extensions have been extensively studied in degradation modeling and have been shown to be useful in analyzing degradation data (e.g., [19]). However, there are many applications (e.g., [20]) in which gamma processes do not fit the data at all. Alternatively, the inverse Gaussian process can be used to model degradation processes that possess a monotone degradation path.
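This study assumes that {Z(t), t ≥ 0} follows the inverse Gaussian process thanks to its nice mathematical properties and physical implications. According to the property of the inverse Gaussian process, Z(t) has independent increments following the inverse Gaussian distribution. To be specific, for s < t, the degradation increment in the time interval (s, t] satisfies

$$Z(t) - Z(s) \sim \mathrm{IG}\big(\Lambda(t) - \Lambda(s),\; \eta\,[\Lambda(t) - \Lambda(s)]^{2}\big), \tag{1}$$

where η is the volatility parameter and Λ(t) is a monotonically increasing function with Λ(0) = 0. For simplicity, this study assumes Λ(t) to be linear with Λ(t) = t. Then Z(t) follows the inverse Gaussian distribution IG(t, ηt²) with probability density function (PDF)

$$f_{Z(t)}(z) = \sqrt{\frac{\eta t^{2}}{2\pi z^{3}}}\,\exp\!\left(-\frac{\eta\,(z - t)^{2}}{2z}\right), \quad z > 0, \tag{2}$$

and cumulative distribution function (CDF)

$$F_{Z(t)}(z) = \Phi\!\left(\sqrt{\frac{\eta}{z}}\,(z - t)\right) + e^{2\eta t}\,\Phi\!\left(-\sqrt{\frac{\eta}{z}}\,(z + t)\right), \tag{3}$$

where Φ(•) is the standard normal CDF. System failure occurs once the level of deterioration exceeds a predetermined threshold ℓ. Consequently, the failure time T can be defined as the first hitting time of the deterioration process Z(t) with respect to the failure threshold ℓ. Because the degradation path is monotone, P(T ≤ t) = P(Z(t) ≥ ℓ), so the CDF of T is

$$F_{T}(t) = \Phi\!\left(\sqrt{\frac{\eta}{\ell}}\,(t - \ell)\right) - e^{2\eta t}\,\Phi\!\left(-\sqrt{\frac{\eta}{\ell}}\,(t + \ell)\right), \tag{4}$$

and the PDF of T is

$$f_{T}(t) = 2\sqrt{\frac{\eta}{\ell}}\;\phi\!\left(\sqrt{\frac{\eta}{\ell}}\,(t - \ell)\right) - 2\eta\, e^{2\eta t}\,\Phi\!\left(-\sqrt{\frac{\eta}{\ell}}\,(t + \ell)\right), \tag{5}$$

where ϕ(•) is the standard normal PDF.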
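As a numerical illustration of the degradation model, the following minimal Python sketch evaluates the distribution of Z(t) and of the first hitting time for Λ(t) = t, using the standard closed form of the inverse Gaussian CDF. The function names are hypothetical and chosen for this example only; the parameter values ℓ = 15 and η = 0.9 in the usage lines are those of the case study.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, written with erfc so far tails stay accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def degradation_cdf(z: float, t: float, eta: float) -> float:
    """P(Z(t) <= z) when Z(t) ~ IG(t, eta * t**2), i.e. Lambda(t) = t."""
    if z <= 0.0:
        return 0.0
    if t <= 0.0:
        return 1.0  # no degradation has accumulated at time 0
    a = math.sqrt(eta / z)
    upper_tail = std_normal_cdf(-a * (z + t))
    # exp(2*eta*t) can overflow on its own, so it is combined with the
    # tiny tail probability in log space.
    if upper_tail > 0.0:
        second = math.exp(2.0 * eta * t + math.log(upper_tail))
    else:
        second = 0.0
    return min(1.0, std_normal_cdf(a * (z - t)) + second)


def first_passage_cdf(t: float, level: float, eta: float) -> float:
    """P(T <= t), where T is the first time Z(t) reaches `level`.

    For a monotone path, T <= t is the same event as Z(t) >= level.
    """
    return 1.0 - degradation_cdf(level, t, eta)


if __name__ == "__main__":
    # Failure-time CDF for the case-study parameters (l = 15, eta = 0.9).
    for hours in (5.0, 10.0, 15.0, 20.0, 30.0):
        print(hours, round(first_passage_cdf(hours, level=15.0, eta=0.9), 4))
```

Passing an abandon threshold g_k instead of ℓ as `level` gives the distribution of the abandon instant in the same way.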

Multicriteria Abandon Policies.
The considered system is required to operate for a duration of τ̂ by the required deadline τ to complete the task. The maximum number of attempts is K. We consider the following two common types of task success criteria. (a) Task success criteria I: the continuous operating time should exceed a threshold τ̂ (τ̂ < τ); (b) Task success criteria II: the cumulative operating time should exceed a threshold τ̂ (τ̂ < τ).
Systems survivability (SS) is measured through the probability that no catastrophic failure occurs during task execution. To enhance the SS of the considered system, a task can be abandoned at an inspection if the deterioration level is larger than a specified level, and a rescue procedure taking a duration of φ(t) is then started, where t is the time elapsed in the attempt. Let ε be the time after which completing the task takes less time than the rescue procedure, namely, φ(t) + t > τ̂ for all t > ε, where τ̂ is the required task duration. Thus, for t > ε, the remaining task takes less time than the rescue procedure, and the task will not be aborted.
At each attempt, the abandon decision is controlled through thresholds on the degradation level and on the time in the mission. To be specific, in the kth attempt, the thresholds for degradation and time in mission are denoted by g_k and t_k, respectively. Let T(g_k) be the random time from the beginning of the kth attempt to the abandon instant if threshold g_k is used, which is the first passage time of Z(t) with respect to the threshold g_k. If T(g_k) is less than t_k, then the mission is aborted; otherwise, the mission continues. The abandonment decision is thus made through the abandonment function A(T(g_k) | g_k, t_k), which takes the value 1 if the mission is abandoned and the value 0 if the system continues the mission. According to the abandon policy, A(T(g_k) | g_k, t_k) can be expressed as

$$A\big(T(g_k) \mid g_k, t_k\big) = I\big(T(g_k) < t_k\big), \tag{6}$$

where I(•) denotes the indicator function of an event. By (4), the distribution function of T(g_k), F_{T(g_k)}(t), is given as

$$F_{T(g_k)}(t) = \Phi\!\left(\sqrt{\frac{\eta}{g_k}}\,(t - g_k)\right) - e^{2\eta t}\,\Phi\!\left(-\sqrt{\frac{\eta}{g_k}}\,(t + g_k)\right). \tag{7}$$
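The abandon rule can be illustrated on a recorded degradation path. The sketch below is only an illustration under stated assumptions: the path is a list of (time, level) readings taken at discrete instants, whereas the study assumes continuous inspection, and the function names and sample readings are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Reading = Tuple[float, float]  # (time in the attempt, degradation level)


def first_passage_time(path: Sequence[Reading], g_k: float) -> Optional[float]:
    """First recorded time at which the degradation level reaches g_k.

    Returns None if the threshold is never reached on the recorded path.
    """
    for time, level in path:
        if level >= g_k:
            return time
    return None


def abandon_indicator(path: Sequence[Reading], g_k: float, t_k: float) -> int:
    """1 if the attempt is abandoned (threshold g_k reached before t_k), else 0."""
    hit = first_passage_time(path, g_k)
    return 1 if hit is not None and hit < t_k else 0


if __name__ == "__main__":
    # Hypothetical readings, not data from the study.
    readings = [(0.0, 0.0), (2.0, 1.5), (4.0, 3.2), (6.0, 5.1), (8.0, 7.4)]
    print(abandon_indicator(readings, g_k=5.0, t_k=7.0))  # threshold hit at t = 6 < 7
    print(abandon_indicator(readings, g_k=5.0, t_k=6.0))  # hit at t = 6 is not before 6
```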

Performance Evaluation under Task Success Criteria I
In this section, mission reliability and systems survivability under task success criteria I are derived under the proposed dynamic abandon policies. Because multiple attempts are allowed, we use an event transition-based numerical algorithm to evaluate mission reliability and systems survivability.

Mission Reliability under Task Success Criteria I.
Task success criteria I require that the system operate continuously for a duration larger than a threshold τ̂ (τ̂ < τ). Let S_k be the random remaining time for the execution of the task before the kth attempt, and let α_k(s | g_k, t_k) be the PDF of S_k given the abandon thresholds g_k and t_k. A new system starts operating at time 0 with the remaining task execution time τ, which, by the definition of α_k(s | g_k, t_k), fixes the probability mass function of S_0. Given the abandon threshold g_{k−1} in the (k − 1)th attempt, the elapsed time of the (k − 1)th attempt is T(g_{k−1}) + φ(T(g_{k−1})), so the remaining time for task execution at the beginning of the kth attempt is S_k = S_{k−1} − T(g_{k−1}) − φ(T(g_{k−1})). Given that the remaining time before the kth attempt is s, and that the mission was aborted at time t and survived the rescue procedure, the remaining time before the (k − 1)th attempt is s + t + φ(t). Thus the probability density function α_k(s | g_k, t_k) can be obtained recursively from α_{k−1}. Note that the inverse Gaussian process has stationary increments. Using (5), f_{T(g_{k−1})}(t) is obtained by replacing the failure threshold ℓ with g_{k−1}, which then gives α_k(s | g_k, t_k) recursively.

Under task success criteria I, if the task is completed at the kth attempt before time τ, then the remaining mission execution time before the kth attempt exceeds the task duration τ̂ and the system survives the kth attempt; that is, the rescue initiation time in the kth attempt, T(g_k), is larger than ε and the system lifetime T is greater than the task duration τ̂, i.e., T(g_k) > ε and T > τ̂. The probability of task success at the kth attempt under task success criteria I is then

$$R_{I,k}(g_k, t_k) = \int_{\hat{\tau}}^{\tau} P\big(T(g_k) > \max(\varepsilon, t_k),\; T > \hat{\tau}\big)\, \alpha_k(t \mid g_k, t_k)\, dt. \tag{14}$$

In accordance with the proposed multicriteria abandon policy, the factor P(T(g_k) > max(ε, t_k), T > τ̂) is the probability that the mission is not aborted at the kth attempt and the system survives the mission. By the independent-increment property of the IG process, the degradation increment over any later interval is independent of the degradation accumulated so far. Using the inverse Gaussian distribution function of the degradation increment in (4), this probability can be evaluated as in (16), which in turn gives the probability that the task is completed at the kth attempt under task success criteria I. Since the events that the mission is completed at different attempts are mutually exclusive, the law of total probability gives the mission reliability under task success criteria I, as a function of the abandon thresholds, as the sum of these probabilities over the attempts.

Systems Survivability under Task Success Criteria I.
The system survives if it finishes either the task or the rescue process. Therefore, the survivability of the system equals the sum of the mission reliability and the probability of rescue success. When the system survives after k attempts before time τ, it follows that (1) the time remaining before the kth attempt is larger than τ̂; (2) the task is aborted at the kth attempt and the rescue procedure succeeds, i.e., T(g_k) < ε and T(g_k) + φ(T(g_k)) < T; and (3) the remaining time for the mission after the kth rescue is smaller than τ̂, so that the mission is not attempted further, i.e., s − (T(g_k) + φ(T(g_k))) < τ̂, where s is the time remaining before the kth attempt. These three conditions define the survival probability of the system after k attempts in (20). By the independent and stationary increments of the inverse Gaussian process, and given the remaining mission execution time s before the kth attempt, the probability that the system survives k attempts is expressed as in (21), where ψ satisfies s − ψ − φ(ψ) = τ̂. According to (11) and (21), and by Eq. (10), the probability in (20) that the system survives after k attempts under task success criteria I can then be evaluated. Since the events that the system survives after different numbers of attempts are mutually exclusive, the systems survivability under task success criteria I is obtained by the law of total probability.
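For a single attempt (K = 1) under task success criteria I, mission reliability and systems survivability can be cross-checked by simulation. The sketch below uses plain Monte Carlo rather than the recursive method of this study, and it rests on several assumptions that are not taken from the paper: the path is simulated on a time grid of width `dt`, degradation continues at the same rate during the rescue, and the attempt is abandoned only when the threshold g is reached before min(t_thr, eps). All function and parameter names are hypothetical; the default parameter values are those of the case study.

```python
import math
import random


def sample_ig(mu: float, lam: float, rng: random.Random) -> float:
    """Draw from IG(mu, lam) using the Michael-Schucany-Haas transformation."""
    y = rng.gauss(0.0, 1.0) ** 2
    root = math.sqrt(4.0 * mu * lam * y + (mu * y) ** 2)
    x = mu + mu * mu * y / (2.0 * lam) - (mu / (2.0 * lam)) * root
    x = max(x, 1e-300)  # guard against round-off producing a non-positive value
    return x if rng.random() <= mu / (mu + x) else mu * mu / x


def simulate_single_attempt(g, t_thr, *, fail_level=15.0, eta=0.9,
                            task_duration=18.0, eps=12.0,
                            rescue=lambda t: 0.5 * t,
                            dt=0.1, n_paths=2000, seed=1):
    """Estimate (mission reliability, systems survivability) for K = 1."""
    rng = random.Random(seed)
    completed = rescued = 0
    limit = min(t_thr, eps)  # assumption: no abandonment at or after this time
    for _ in range(n_paths):
        z, t = 0.0, 0.0
        stop_at, aborted = task_duration, False
        while t < stop_at - 1e-12:
            # Increment over dt is IG(dt, eta * dt**2) because Lambda(t) = t.
            z += sample_ig(dt, eta * dt * dt, rng)
            t += dt
            if z >= fail_level:
                break  # catastrophic failure before the task or rescue ends
            if not aborted and z >= g and t < limit:
                aborted = True
                stop_at = t + rescue(t)  # rescue starts at the abandon instant
        else:
            # Loop ended without failure: either the task or the rescue finished.
            if aborted:
                rescued += 1
            else:
                completed += 1
    return completed / n_paths, (completed + rescued) / n_paths


if __name__ == "__main__":
    reliability, survivability = simulate_single_attempt(g=8.0, t_thr=10.0)
    print(reliability, survivability)
```

Setting `g` equal to the failure level disables abandonment, in which case the two estimates coincide.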

Performance Evaluation under Task Success Criteria II
This section derives mission reliability and system survivability under task success criteria II considering dynamic abandon policies. As in the derivation under task success criteria I, a recursive method is adopted to evaluate the mission reliability and systems survivability.

Mission Reliability under Task Success Criteria II.
Under task success criteria II, the cumulative operating time should be larger than a threshold τ̂ (τ̂ < τ). Let S_k and U_k be, respectively, the random remaining time for mission execution and the cumulative operating time before the kth attempt. Let α_k(s, u | g_k, t_k) be the joint PDF of S_k and U_k given the abandon thresholds g_k and t_k. Initially, a new system starts operating with remaining task execution time τ and cumulative operating time 0 before the first attempt, which, by the definition of α_k(s, u | g_k, t_k), fixes the joint probability mass function of S_0 and U_0. Let α_{k−1}(s, u | g_{k−1}, t_{k−1}) be the joint probability density function of the remaining time and the mission time before the (k − 1)th attempt. Given the remaining time s and the time in mission u before the kth attempt, and given that the mission was aborted at time t, the remaining mission execution time and the cumulative operating time before the (k − 1)th attempt are s + t + φ(t) and u − t, respectively. Thus α_k(s, u | g_k, t_k) can be obtained recursively, based on the distribution of the inverse Gaussian process and the PDF of its first passage time. Under task success criteria II, if the task succeeds after k attempts by time τ, then the following conditions are met: (1) the remaining mission execution time before the kth attempt is greater than the time required to complete the mission; (2) the cumulative mission time exceeds τ̂ after the kth attempt.
There are two possible cases for task success. In Case 1, the cumulative mission time exceeds τ̂ before the abandon time T(g_k). In Case 2, the cumulative mission time is shorter than τ̂ at the abandon time T(g_k) but exceeds τ̂ before system failure occurs. The probability of task success after the kth attempt under task success criteria II, given in (28) as a function of the abandon thresholds, therefore consists of two parts. The first part of (28) is the probability of mission completion before the abandon threshold is reached, which is obtained using the first passage time of the inverse Gaussian process. The second part of (28) is the probability that the mission is completed after the abandon threshold is reached. Based on the expressions in (29) and (30), the probability that the task is completed at the kth attempt under task success criteria II can be derived. Since the events that the task succeeds at different attempts are mutually exclusive, the law of total probability gives the mission reliability under task success criteria II as a function of the abandon thresholds.

Systems Survivability under Task Success Criteria II.
Because multiple attempts are allowed, if the system survives the mission after k attempts and no further attempt is made within time τ, then we can conclude that (1) the task is aborted at the kth attempt and the rescue procedure succeeds, i.e., T(g_k) < ε and T(g_k) + φ(T(g_k)) < T; (2) the remaining time for mission execution after the kth rescue process is less than the remaining required task time, i.e., s − φ(T(g_k)) < τ̂ − u.
Using the joint probability density function of S_k and U_k, the probability that the system survives after k attempts under task success criteria II can be written down. Using the distribution function of the degradation increment in Eq. (1) and the stationary and independent increment property, this probability can be expressed in terms of ψ, where ψ satisfies s − φ(ψ) = τ̂ − u. Based on the distribution of the inverse Gaussian process and using (35), the probability that the system survives after k attempts under task success criteria II can be evaluated. In a similar manner, the systems survivability under task success criteria II can be obtained by the law of total probability.

Optimizing the Abandon Thresholds
The mission reliability is increasing in the abandon thresholds, whereas the survival probability of the system is decreasing in the abandon thresholds because of the increased risk of failure during task execution. Thus, we should investigate the optimal task abandon thresholds to balance the trade-off between task success probability and systems survivability.
This study uses the commonly used cost criterion to establish the optimization problem. The cost in the optimization model includes the task failure cost and the system failure cost. Denote by c_m and c_s the task failure cost and the system failure cost, respectively. Using the expressions for task reliability and systems survivability, the expected total economic loss during task execution can be expressed for task success criteria I and, analogously, for task success criteria II. Since the calculation of task reliability and systems survivability involves recursive functions, we employ numerical methods to obtain their values, and the optimal solution can then be obtained using heuristic algorithms such as the genetic algorithm. To evaluate the cost of the developed abandon policies, we compare the cost-effectiveness of different policies in a numerical example.
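The cost criterion can be sketched as follows. This is an illustration under an explicit assumption: the expected loss is taken to have the common form c_m(1 − R) + c_s(1 − S), which is not confirmed by the text above. The search is an exhaustive scan over a small candidate set, swapped in for the heuristic algorithms mentioned in the text, and the (R, S) pairs in the usage lines are placeholders rather than results of this study.

```python
from typing import Dict, Tuple


def expected_cost(reliability: float, survivability: float,
                  c_m: float = 300.0, c_s: float = 1500.0) -> float:
    """Assumed loss: c_m * P(task fails) + c_s * P(system fails)."""
    return c_m * (1.0 - reliability) + c_s * (1.0 - survivability)


def best_threshold(candidates: Dict[float, Tuple[float, float]],
                   c_m: float = 300.0, c_s: float = 1500.0) -> float:
    """Return the candidate threshold with the smallest assumed expected cost.

    `candidates` maps a threshold to a precomputed (reliability,
    survivability) pair, e.g. from numerical integration or simulation.
    """
    return min(candidates,
               key=lambda g: expected_cost(*candidates[g], c_m=c_m, c_s=c_s))


if __name__ == "__main__":
    # Placeholder (R, S) values for three hypothetical degradation thresholds.
    table = {6.0: (0.70, 0.99), 9.0: (0.85, 0.96), 12.0: (0.90, 0.88)}
    for g, (r, s) in table.items():
        print(g, expected_cost(r, s))
    print("lowest-cost threshold:", best_threshold(table))
```

The default costs 300 and 1500 are the task failure and system failure costs used in the case study.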

Background.
This section applies the developed abandon strategies to cloud computing systems, that is, systems consisting of hardware and software resources that perform data processing, computing, and storage tasks and that have a wide range of application scenarios. The cloud computing system discussed in this study consists of multiple virtual machines residing on different servers, which jointly perform certain computing tasks. When the degradation level of the cloud computing system exceeds a critical level, the system fails and data are destroyed. To characterize the monotone degradation behavior, the degradation process is modeled by a homogeneous inverse Gaussian process with ℓ = 15 and η = 0.9. Assume that the allowable time to perform the computing task is 40 hours and that the time for a single computing task is 18 hours. When the system degradation in an attempt reaches a threshold, the computing task is suspended and a rescue is carried out whose duration at time t is φ(t) = 0.5t. The maximum abandon time ε in each attempt is then 12 hours, since 12 + 0.5 × 12 = 18. When the time in the task is greater than 12 hours, completing the task takes less time than the rescue, so the task is not suspended after 12 hours. In this section, the mission reliability and system survivability of the cloud computing system are evaluated using numerical integration. Then the optimal maintenance and abandon thresholds under the dynamic policy are studied.
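The value ε = 12 can be reproduced by solving t + φ(t) = 18 numerically. The sketch below uses bisection and assumes that t + φ(t) is nondecreasing in t; the function name is hypothetical.

```python
from typing import Callable


def max_abandon_time(task_duration: float,
                     rescue_time: Callable[[float], float],
                     tol: float = 1e-9) -> float:
    """Smallest t in [0, task_duration] with t + rescue_time(t) >= task_duration.

    Assumes t + rescue_time(t) is nondecreasing, so bisection applies.
    """
    lo, hi = 0.0, task_duration
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid + rescue_time(mid) >= task_duration:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    # Case-study values: 18-hour task, rescue duration 0.5 * t.
    eps = max_abandon_time(18.0, lambda t: 0.5 * t)
    print(round(eps, 6))  # 12.0
```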

Optimal Abandon Policies.
We consider optimal task abandon policies under different task success criteria. This section investigates how the optimal solution varies with the allowable time and the task duration. The costs of a task failure and a system failure are assumed to be 300 and 1500, respectively. Figure 1 shows the mission reliability and system survivability as functions of the time threshold and the degradation threshold under a single attempt. It can be observed that the mission reliability increases with the degradation threshold while system survivability decreases with it. In contrast, the mission reliability decreases with the time threshold while system survivability increases with it. Figure 2 shows the expected total cost under a single attempt. Table 1 shows how the optimal abandon decisions under task success criteria I vary with the maximal allowable time and the task duration. It shows that, given a fixed task duration and number of attempts, the degradation-based abandon threshold is nondecreasing in the allowable time. One possible explanation is that when the allowable time is small, the task should be abandoned earlier in the first several attempts to save time for the rescue procedure and future attempts. As the allowable time increases, it becomes optimal to delay task abandonment, since more time is available for the rescue procedure and the subsequent task execution. Similarly, Table 1 shows that, given a fixed number of attempts, the abandon threshold decreases when the duration of the task increases. When the mission duration is small, the task should be abandoned at a later stage to improve mission reliability. As the task duration increases, it becomes optimal to abandon the task earlier to improve systems survivability. For a fixed deadline and task duration, the abandon threshold decreases as the number of task attempts increases. One explanation is that the task should be abandoned earlier to save time for rescue and to improve systems survivability in later attempts.
We can observe that, given a fixed allowable time, task duration, and number of allowed attempts, the optimal time threshold is nonincreasing over successive attempts, which reduces the total cost. To be specific, in the first few attempts it is optimal to set a larger time threshold to improve mission reliability, while in later attempts it is optimal to reduce the time threshold to improve systems survivability. Given a fixed task duration and number of attempts, the time threshold is nondecreasing in the allowable time because of the increased mission reliability. Table 2 shows how the optimal abandon decisions under task success criteria II vary with the deadline and the task duration. Comparing Tables 1 and 2, the optimal degradation-based abandon threshold is lower under task success criteria II than under task success criteria I, since completed work accumulates across attempts under criteria II; it is therefore optimal to abandon earlier. The time-based abandon threshold under task success criteria I is also larger than that under task success criteria II. Because the work completed in different attempts accumulates under task success criteria II, mission reliability is higher, and consequently the task can be aborted earlier.

Conclusion
Based on the practical engineering background of the system operation process and the characteristics of tasks, this article investigates the multiple task abandon strategy under two types of task success criteria. Multiple attempts can be made until the mission succeeds within an allowable time. In each attempt, the abandon decision is made based on the degradation level and the time in the mission. When the system degrades to the predetermined threshold, the mission stops and a rescue is carried out. If the rescue is successful, the system returns to the perfect state and starts the next attempt, and this continues until the task is completed, the system fails, or the maximum number of attempts allowed is reached. A recursive formula is used to characterize the system state transition process, and the task reliability and system survivability under multiple attempts are derived. Finally, numerical results are presented for the engineering case of a cloud computing system. Several future research directions are worth investigating. First, the systems considered in this paper are subject to internal degradation. Future research can consider systems operating in an impact environment, which is also an important factor influencing the risk of safety-critical systems. Second, this paper assumes that the system is restored to a perfect state after a successful rescue. However, in practical engineering cases, the system may be only partially repaired after rescue and cannot be restored as new. The case of an imperfect state after rescue is worth investigating. Third, the current study assumes continuous inspection, which may be costly in practice. The case of periodic inspection and the corresponding optimization of the abandon policy are worth investigating. Finally, this study focuses on the optimization of the abandon policy; the joint optimization of the abandon and rescue problems is another research direction.

Data Availability
All data used to support the findings of the study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.