Health Indicator for Predictive Maintenance Based on Fuzzy Cognitive Maps, Grey Wolf, and K-Nearest Neighbors Algorithms

An essential step in the implementation of predictive maintenance is the analysis of the health state of production equipment, which provides company managers with performance and degradation indicators that help predict component condition. In this paper, a supervised approach for health indicator calculation is proposed, combining the Grey Wolf Optimisation method, a Swarm Intelligence algorithm, with Fuzzy Cognitive Maps. The k-nearest neighbors algorithm is used to predict the Remaining Useful Life of an item since, in addition to being simple, it produces good results in a large number of domains. The approach aims to solve a problem that frequently occurs in interpolation procedures: approximating an unknown function by a function belonging to a chosen class. The proposed algorithm allows maintenance managers to distinguish different degradation profiles in depth, yielding a more precise estimate of the Remaining Useful Life of an item as well as an in-depth understanding of the degradation process. Specifically, to show its suitability for predictive maintenance, a dataset on NASA aircraft engines has been used, and the results have been compared with those obtained with a neural network approach. The results highlight that all of the degradation profiles obtained with the proposed approach are modelled in more detail, allowing different situations to be clearly distinguished. Moreover, the physical core speed and the corrected fan speed have been identified as the main critical factors in engine degradation.


Introduction
Although predictive maintenance practices have existed for many years, they have only recently become widely accessible, thanks to emerging Industry 4.0 technologies that provide increasingly reliable and affordable smart systems [1]. Their advantages include a 3-5% increase in machine life, maintenance cost reductions of up to 40%, and returns on investment of up to 10 times [2].
One of the most relevant steps in the prediction process is the choice of the best approach for assessing item behaviour, such as a data-driven or a model-driven approach [3]. In particular, according to the platform developed by Patel et al. [4] for applying Industry 4.0 principles to industrial practice, the data-analytic layer is crucial to understanding how a plant functions. Moreover, if properly designed, it allows users to identify hidden relations among the data provided by the application layer [5]. It is also true that, according to the "no free lunch" theorems, no standard procedure for predictive maintenance exists; instead, the procedure that best suits the reality under analysis must be chosen [6]. In any case, regardless of the adopted process, an accurate and optimal prediction requires gathering and appropriately analysing large amounts of data within a time frame [7,8], which raises the problem of identifying the most accurate health indicators. The health of a system can be defined as the deviation or degradation of an item's behaviour from its regular operating performance [9]. The calculation of a suitable health indicator (HI) is fundamental to establishing a link between the deviation or degradation of an item and its Remaining Useful Life (RUL).
Thus, an accurate HI is key to a more precise prediction tool and guarantees its reproducibility [10,11]. This is why many researchers focus on this issue, with approaches ranging from supervised and unsupervised algorithms [12,13] to physical [14] and virtual [15] HIs. HI assessment requires the monitored parameters, provided by the physical data from sensors, to be transformed into information represented as indicators. The potential benefits include not only a reduction in the quantity of data examined but also the maximisation of the useful information content [16].
This paper is set in this context. An approach for HI definition and calculation is proposed, combining the Grey Wolf Optimisation (GWO) approach, which belongs to the family of Swarm Intelligence algorithms, with Fuzzy Cognitive Maps (FCMs). The k-nearest neighbors algorithm is then used to predict the RUL of the items. Compared with previous studies in the literature, the proposed approach does not require knowledge of the gradients of the cost and constraint functions, guaranteeing reliable and robust performance as well as easy implementation. Moreover, it ensures extreme flexibility and adaptability to a given domain. It also allows an in-depth understanding of a specific issue, thanks to the possibility of symbolically representing the relationships among all the variables involved.
To present the methodology and analyse its performance, the rest of the paper is organised as follows. Section 2 briefly describes the literature analysed on Swarm Intelligence algorithms and FCM applications to predictive maintenance. Section 3, divided into three subsections, explains FCM theory and how the GWO algorithm works, and then describes the steps of the proposed algorithm, highlighting its benefits. Section 4 shows the results obtained using a NASA dataset on RUL prediction for aircraft engines and compares them with those of an Artificial Neural Network approach. Conclusions are drawn in Section 5.

Literature Review
As mentioned, research on predictive maintenance has grown in recent years due to the development of Industry 4.0 technologies. Hence, to gather the most relevant contributions dealing with maintenance in general and with FCM and Swarm Intelligence applications in particular, a systematic approach has been adopted. The Scopus scientific database was selected, considering only papers with full text available in English. All articles were read to assess their relevance and pertinence to the theme developed in this study. Table 1 reports the combinations of keywords selected, the number of papers retrieved by Scopus, and the number chosen for this literature review.
In recent literature, several contributions deal with the development of HIs aimed at predicting the need for maintenance interventions. For example, some authors propose dashboards for monitoring equipment health status in the semiconductor manufacturing industry [17,18], while others focus on structural vibration analysis [19] and RUL prediction [20,21]. Various techniques and methodologies can be found in the literature: for instance, Baraldi et al. [22] develop a differential evolution-based multiobjective model to define the health status of the system and adopt maintenance strategies; other authors, instead, apply artificial neural networks [23] or genetic algorithms [24] to model the health status of the system.
To the best of the authors' knowledge, there are no scientific papers dealing with predictive maintenance through the joint application of FCMs and Swarm Intelligence (SI) approaches. However, contributions applying SI methods alone can be found. Li et al. [25], for example, applied a multiclass relevance vector machine, optimised through the SI dragonfly algorithm, to predict the failures of a diesel engine. Other SI applications in the maintenance field can be found in the existing literature: for example, Zheng et al. [26] use particle swarm optimisation to predict the performance degradation of aeroengines, considering aspects such as fuel consumption, rotor vibration, and thrust loss. A similar perspective is adopted by Hu et al. [27], who diagnose gearbox failures through particle swarm optimisation and the kernel extreme learning machine, and by Zhao and Liu [28], who solved the same class of problems through rough set theory. Several further SI applications in the maintenance field focus instead on maintenance scheduling [29-33].
Going into detail regarding the GWO algorithm, some applications in the maintenance field can be found in the literature, the majority of which focus on the cost efficiency of maintenance processes. For example, the GWO has been applied to optimise the design and maintenance of photovoltaic power plants [34] or to minimise the maintenance costs of heat and power systems [35-38]. Kumar et al. [39] focus on both the reliability and the costs of a Space Shuttle through the implementation of a multiobjective GWO. Dalla Vedova et al. [40], instead, compare different algorithms, including the GWO, for the RUL estimation of an aircraft actuator, while Abdelghafar et al. [41] optimise a support vector machine through the GWO to improve the detection of satellite sensor failures. Some works focus on scheduling through the GWO algorithm: it can be applied to job shop and maintenance scheduling problems [42] as well as to blocking flow shop scheduling with fuzzy processing times and dynamic maintenance strategies [43,44].
FCMs have proved to be useful tools for supporting decision-making processes in the maintenance field. For instance, they can be applied to verify the impact of maintenance activities on a building's energy efficiency [37] or to identify the factors affecting human reliability during maintenance operations [45]. According to Gupta and Gandhi [46], data coming from maintenance work orders can be used to detect possible areas of improvement in component design. Dynamic risk modelling is also performed through FCMs: in Lopez and Salmeron's study [47], an FCM is built to assess risk during the enterprise resource planning of maintenance processes, while Jamshidi et al. [48] use one to study the critical factors related to maintenance outsourcing. Damage detection can also be performed through FCMs. For instance, Senniappan et al. [49] propose an FCM-based application for the early detection of damage in the supporting elements of civil structures, modelling both the knowledge obtained from domain experts and the existing literature. Lee et al. [50], instead, use rule-based FCMs built on experts' knowledge and experience to identify the factors accelerating the deterioration of rubber components, in order to predict maintenance timing and structure a diagnostic process. In the work of Azadeh et al. [51], the FCM is used to assess which cognitive and temporal factors have the greatest impact on the execution of maintenance interventions. Similarly, maintenance errors can be analysed through FCMs to highlight the most critical and repetitive ones and to recommend modifications to the maintenance process or training [52]. Zhang et al. [53], instead, develop a robot dedicated to live maintenance whose behaviour is predicted through an FCM.
According to the existing contributions, there is no evidence of the joint implementation of FCMs and GWO, even though both methodologies have been successfully applied in the maintenance field. Among the benefits of the GWO, its ability to work in a dynamic environment is one of the most useful in this field. In parallel, FCMs are useful for the qualitative simulation of a modelled system. To sum up, the joint implementation of the two techniques proposed in this research offers flexibility and adaptability, as well as reliable and robust performance.

The Research Approach
The general scheme of a predictive maintenance procedure proposed in this work is shown in Figure 1 and described below.
(i) Preprocessing Data. Preprocessing is the preparation of the dataset for analysis and includes all the steps of dataset preparation. In this part of the process, it is essential to extract as much information and as many indications as possible from the dataset.
(ii) Features' Extraction. In this step, variables are selected and/or the amount of data to be processed is reduced, while ensuring an accurate and complete description of the original dataset.
(iii) Splitting Data. This is an analytical step aimed at understanding how best to train the machine learning system. Indeed, machine learning systems involve two parts: the first is training, which, as the name indicates, teaches the system how to act. After this step, the system is ready to apply what it has learned and to verify whether the training was successful; this is done through scoring, or testing. Given the significance of these two steps, it is of fundamental importance to understand the best way to divide the available data into the right proportions.
(iv) Health Indicator Modelling. The sensor readings, reworked in the previous steps, are combined into a single parameter, called the health indicator, which is used to predict the adverse event.
(v) RUL Prediction. In this work, the RUL of the equipment is predicted through the K-nearest neighbors classifier [54] and Weibull fitting [55].
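The Splitting Data step can be illustrated with a minimal sketch. It assumes, as our own choice rather than something stated in the text, that the data are split by equipment unit instead of by row, so that all samples of one unit end up on the same side of the split; the function name `split_by_unit`, the 80/20 ratio, and the fixed seed are hypothetical.

```python
import random


def split_by_unit(unit_ids, train_fraction=0.8, seed=0):
    """Split equipment IDs (not individual rows) into training and test groups.

    Keeping every sample of a unit on one side of the split prevents the
    degradation history of a test unit from leaking into training.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    ids = sorted(set(unit_ids))
    rng = random.Random(seed)          # seeded for a reproducible split
    rng.shuffle(ids)
    n_train = int(round(train_fraction * len(ids)))
    return sorted(ids[:n_train]), sorted(ids[n_train:])


# Hypothetical example: ten engines numbered 1..10, split 80/20.
train_ids, test_ids = split_by_unit(range(1, 11))
print("train:", train_ids)
print("test:", test_ids)
```

Splitting on unit IDs mirrors the structure of run-to-failure datasets, where each unit contributes a whole degradation trajectory rather than independent rows.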
The core activities of this work are HI definition and RUL prediction. The innovative methodology proposed to carry out these activities is described in depth in Section 3.1.

Figure 1: General scheme of the predictive maintenance procedure (data acquisition, preprocessing data, features' extraction, splitting dataset, health indicator modelling, RUL prediction).

The proposed approach to HI modelling combines an FCM with a Swarm Intelligence algorithm, namely the GWO. Before describing the proposed approach, the FCM and GWO theories are briefly described in the following two subsections.

The FCMs' Modelling.
A cognitive map (CM) can be thought of as a concept map reflecting mental processing, comprising collected information and several cognitive abstractions, individually filtered, about physical phenomena and experiences [56]. Cognitive maps are visual representations of an individual's mental model constructs, analogous to concept maps for representing human reasoning and knowledge or beliefs [7]. Thus, when a generic problem is considered, a panel of experts is formed for its in-depth analysis, since different individuals may approach the same question differently. Drawing on their areas of expertise and using fuzzy logic, the experts model a collective FCM by identifying the concepts of the problem and the relationships among them. In particular, the N concepts are the key FCM elements and stand for the main characteristics of the abstract mental model of any complex system [57]. Once the concepts are identified, the experts are asked to assign to the W matrix a numerical value $w_{ij}$ (the weight of the relation between the ith and jth concepts), which represents the influence of concept $C_i$ on concept $C_j$. According to equation (1), $w_{ij}$ ranges in [-1, 1]. Specifically, $w_{ij} = 0$ indicates no causality between the concepts, $w_{ij} > 0$ indicates positive causality ($C_j$ increases as $C_i$ increases, or $C_j$ decreases as $C_i$ decreases), and $w_{ij} < 0$ indicates negative causality ($C_j$ decreases as $C_i$ increases, or $C_j$ increases as $C_i$ decreases):
$$ w_{ij} \in [-1, 1], \quad i, j = 1, \dots, N \quad (1) $$
Although many studies exist on the dynamic representation of an FCM, expert opinions are generally aggregated into the collective weight matrix using the SUM method [58]. The overall linguistic weight is then converted into a numerical value using the centre of gravity (COG) defuzzification method [59]. Some examples are presented by Bevilacqua et al. [7,60,61] and Stylios et al. [62], where a unique credibility value is assigned to each expert and a threshold function is used in the aggregation.
A modification of the approach mentioned above was proposed by Stylios and Groumpos [63,64], who introduced a corrective factor for evaluating the experts' credibility. However, this approach does not take into account the fact that, in a complex multidisciplinary problem, most experts have in-depth knowledge of only part of the problem rather than the entire issue [65].
Once the total weight matrix, W, has been designed, it is possible to analyse the system behaviour through simulations. Thus, if $A_i$ denotes the instantaneous value of concept $C_i$, its evolution over time can be evaluated by computing the influence of the related concepts $C_j$ on $C_i$ according to
$$ A_i^{k+1} = f\left( A_i^{k} + \sum_{j=1,\, j \neq i}^{N} w_{ji} A_j^{k} \right) \quad (2) $$
where $A_i^{k+1}$ is the value of concept $C_i$ at simulation step $k+1$, $A_j^{k}$ is the value of concept $C_j$ at simulation step $k$, $w_{ji}$ is the weight of the interconnection from concept $C_j$ to concept $C_i$, and $f$ is an appropriate threshold function used to map the concept value monotonically into a normalised range [66]. Other equations can be used in place of equation (2), as suggested by Mazzuto et al. [67] and Osoba and Kosko [68]. An important topic in FCM analysis is the evaluation of indirect and total causal effects (Axelrod, 1976), knowledge of which allows an in-depth analysis of the map. The indirect effect $I_k$ of concept $C_i$ on concept $C_j$ can be defined as shown in equation (3), where $P_k$ denotes the kth path from $C_i$ to $C_j$:
$$ I_k(C_i, C_j) = \min \left\{ w_{pq} : (p, q) \in P_k(C_i, C_j) \right\} \quad (3) $$
that is, $I_k$ is the minimum weight along a single path between the ith and jth concepts. The total causal effect $T(C_i, C_j)$ (equation (4)) is the maximum of the indirect effects of concept $C_i$ on concept $C_j$:
$$ T(C_i, C_j) = \max_k I_k(C_i, C_j) \quad (4) $$
According to Bevilacqua et al. [7], equation (3) can be described using the "weak link in the chain" metaphor. Indeed, a concatenation of concepts can be seen as a chain in which each weight $w_{ij}$ is the strength of a link. If the chain contains a weak link, it cannot be considered a "resistant chain," and its total strength equals the strength of the weakest link. Therefore, once the strength of each chain has been derived by equation (3), equation (4) identifies the most resistant chain. Finally, the chain strength highlights the relevance of the first concept in the concatenations affecting the top event.
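A minimal NumPy sketch of the two FCM operations described above: iterating the concept update of equation (2) until the concept values stop changing, and computing the total causal effect as the strongest of the weakest links over all paths. It assumes a sigmoid threshold function with slope 1 and an update form with self-memory ($A_i^k$ plus the weighted inputs), with `W[i, j]` storing the influence of $C_i$ on $C_j$. The function names and the three-concept example map are hypothetical.

```python
import numpy as np


def sigmoid(x, lam=1.0):
    """Threshold function f mapping activations into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-lam * x))


def fcm_simulate(W, A0, max_iter=100, tol=1e-6, lam=1.0):
    """Iterate A_i(k+1) = f(A_i(k) + sum_{j != i} w_ji * A_j(k)) until convergence.

    W[i, j] is the influence of concept i on concept j, so the input reaching
    concept i is column i of W (without the diagonal self-loop).
    """
    W = np.asarray(W, dtype=float)
    A = np.asarray(A0, dtype=float)
    W_off = W - np.diag(np.diag(W))          # enforce j != i
    for _ in range(max_iter):
        A_next = sigmoid(A + W_off.T @ A, lam)
        if np.max(np.abs(A_next - A)) < tol:
            return A_next
        A = A_next
    return A                                  # no convergence within max_iter


def total_causal_effect(W, src, dst):
    """T(C_src, C_dst): max over simple paths of the weakest weight on each path."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    best = None

    def dfs(node, visited, weakest):
        nonlocal best
        if node == dst:
            best = weakest if best is None else max(best, weakest)
            return
        for nxt in range(n):
            if W[node, nxt] != 0 and nxt not in visited:
                dfs(nxt, visited | {nxt}, min(weakest, W[node, nxt]))

    dfs(src, {src}, np.inf)
    return best                                # None if C_dst is unreachable


# Hypothetical three-concept map: C0 -> C1 (0.8), C1 -> C2 (0.5), C0 -> C2 (0.3).
W = np.array([[0.0, 0.8, 0.3],
              [0.0, 0.0, 0.5],
              [0.0, 0.0, 0.0]])
A_star = fcm_simulate(W, [0.4, 0.0, 0.0])
print("steady state:", np.round(A_star, 4))
print("T(C0, C2):", total_causal_effect(W, 0, 2))   # weakest links 0.3 vs 0.5 -> 0.5
```

The exhaustive path search is only practical for small maps such as the one above; for the handful of concepts used in an HI model it keeps the "weakest link" logic explicit.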
In the proposed approach, the concepts of the FCM represent the working conditions of the component to be analysed, the signals of the sensors installed on the component, and the HI of the component. The FCM thus identifies the relationships among all the concepts involved in matrix form, and this matrix is then used to calculate the health indicator for RUL prediction.

The GWO Algorithm.
Mirjalili et al. [69] introduced the GWO, which mimics the leadership hierarchy and hunting mechanism of grey wolf packs in the wild. The algorithm divides the agents (grey wolves) into four hierarchical categories, called alpha (α), beta (β), delta (δ), and omega (ω), in descending order.
Each level of the hierarchy plays a different role in finding solutions, which in this case correspond to the prey. The leaders of the pack are the alpha wolves. The alpha is primarily responsible for decisions about hunting, where to sleep, and so on. The alpha wolf is the dominant one, and the pack must follow its orders. The beta wolves form the second level of the hierarchy.
They are subordinate wolves that help the alpha in decision-making and other pack activities. Moreover, a beta wolf must respect the alpha but also commands the other lower-level wolves.
The lowest-ranking grey wolf is the omega. The ω plays the role of scapegoat, and it helps to satisfy the entire pack and maintain the dominance structure. Omega wolves must always submit to all the other dominant wolves. The omega may not seem to be an essential individual in the pack, but the entire pack faces internal struggles and problems if the omega is lost. A wolf that is not an alpha, beta, or omega is called a subordinate (or delta in some references). Delta wolves must submit to alphas and betas, but they dominate omegas. Scouts and hunters, for example, belong to this category. Scouts are responsible for guarding the boundaries of the territory and warning the pack in case of danger, while hunters help the alphas and betas to hunt prey and provide food.
In the GWO design, the social hierarchy of wolves is modelled mathematically by considering α the most suitable (optimal) solution. Consequently, the second- and third-best solutions are named β and δ, respectively. The remaining candidate solutions are the ω ones.
In the GWO algorithm, the α, β, and δ wolves impose the rules of hunting, and the ω ones follow them. In particular, the hunt consists of three main phases: (i) searching for and chasing the prey, (ii) surrounding and harassing the prey until it stops moving, and, finally, (iii) attacking the prey.
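After spotting the possible prey, the wolves begin to surround it and then move on to the attack. Equations (5) and (6) model the encircling behaviour mathematically:
$$ \vec{D} = \left| \vec{C} \cdot \vec{x}_p(t) - \vec{x}(t) \right| \quad (5) $$
$$ \vec{x}(t+1) = \vec{x}_p(t) - \vec{A} \cdot \vec{D} \quad (6) $$
where $\vec{D}$ represents the difference between the positions of the prey and the predator, t denotes the current iteration, $\vec{x}_p$ specifies the location of the prey, and $\vec{x}$ indicates the location of the wolf. Equations (7) and (8) are used to calculate the coefficient vectors $\vec{A}$ and $\vec{C}$:
$$ \vec{A} = 2\vec{a} \cdot \vec{r}_1 - \vec{a} \quad (7) $$
$$ \vec{C} = 2\vec{r}_2 \quad (8) $$
where the components of $\vec{a}$ decrease linearly from 2 to 0 over the iterations, and $\vec{r}_1$ and $\vec{r}_2$ are random vectors with components in [0, 1] that allow the wolves to reach any position between the points illustrated in Figure 2.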
As shown in Figure 2(a), a wolf in position (X, Y) can update its location according to the position of the prey (X*, Y*); the same holds in 3D space (Figure 2(b)) and, in general, in an n-dimensional space.
To simulate the hunting behaviour of wolves mathematically, it is assumed that alpha (the best candidate solution), beta, and delta have better knowledge of the potential location of the prey. Therefore, the three best solutions obtained so far are retained, and the other search agents (omega wolves) are obliged to update their positions according to the positions of these best search agents [70].
As mentioned above, the wolves end the hunt by attacking the prey when it stops moving. If $|A| < 1$, the wolves begin the attack phase by moving towards the prey. In the search (exploration) phase, instead, the wolves look for prey mainly based on the positions of alpha, beta, and delta, moving away from each other to explore different locations of the prey (solutions). In this case, the vector $\vec{A}$ takes values greater than 1 or less than −1, forcing the search agent to diverge from the prey. This emphasises exploration and allows the GWO algorithm to search globally for better prey. Thus, once the α, β, and δ wolves have been identified, the positions of all pack members are updated according to
$$ \vec{D}_\alpha = \left| \vec{C}_1 \cdot \vec{x}_\alpha - \vec{x} \right|, \quad \vec{D}_\beta = \left| \vec{C}_2 \cdot \vec{x}_\beta - \vec{x} \right|, \quad \vec{D}_\delta = \left| \vec{C}_3 \cdot \vec{x}_\delta - \vec{x} \right| \quad (9) $$
$$ \vec{x}(t+1) = \frac{\vec{x}_1 + \vec{x}_2 + \vec{x}_3}{3}, \quad \vec{x}_1 = \vec{x}_\alpha - \vec{A}_1 \cdot \vec{D}_\alpha, \quad \vec{x}_2 = \vec{x}_\beta - \vec{A}_2 \cdot \vec{D}_\beta, \quad \vec{x}_3 = \vec{x}_\delta - \vec{A}_3 \cdot \vec{D}_\delta \quad (10) $$
Figure 3 describes the steps to implement the GWO according to these equations. The GWO has the advantage of having few parameters to initialise and of being flexible, so it can be adapted to various practical engineering problems. Indeed, only the number of wolves in the pack (nPop) and the maximum number of iterations (MaxIt) must be initialised. In Figure 3, Iter is the current iteration. Moreover, the GWO is easy to implement, and its hierarchical structure guarantees high accuracy in the solution.
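The following NumPy sketch implements the GWO loop built on equations (5) to (8) and on the update guided by the α, β, and δ wolves. It minimises a generic cost over a box [lb, ub]; only nPop and MaxIt correspond to parameters named in the text (here `n_pop` and `max_it`), while the function name `gwo_minimise`, the cascading update of the three leaders, the clipping to the bounds, and the sphere test function are illustrative assumptions.

```python
import numpy as np


def gwo_minimise(cost, dim, lb, ub, n_pop=20, max_it=200, seed=0):
    """Grey Wolf Optimiser sketch: returns (best position, best cost)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_pop, dim))               # initial pack
    leaders = [(np.inf, np.zeros(dim)) for _ in range(3)]     # alpha, beta, delta

    for it in range(max_it):
        # Evaluate every wolf and keep the three best solutions found so far.
        for x in X:
            f = cost(x)
            if f < leaders[0][0]:
                leaders = [(f, x.copy()), leaders[0], leaders[1]]
            elif f < leaders[1][0]:
                leaders = [leaders[0], (f, x.copy()), leaders[1]]
            elif f < leaders[2][0]:
                leaders = [leaders[0], leaders[1], (f, x.copy())]

        a = 2.0 - 2.0 * it / max_it        # decreases linearly from 2 towards 0
        X_new = np.zeros_like(X)
        for _, x_lead in leaders:           # alpha, beta, delta in turn
            r1 = rng.random((n_pop, dim))
            r2 = rng.random((n_pop, dim))
            A = 2.0 * a * r1 - a            # equation (7): |A| > 1 explores, |A| < 1 attacks
            C = 2.0 * r2                    # equation (8)
            D = np.abs(C * x_lead - X)      # distance to this leader, as in equation (5)
            X_new += x_lead - A * D         # guided move, as in equation (6)
        X = np.clip(X_new / 3.0, lb, ub)    # average of the three guided moves

    return leaders[0][1], leaders[0][0]


# Usage: minimise the 5-dimensional sphere function sum(x_i^2).
best_x, best_f = gwo_minimise(lambda x: float(np.sum(x ** 2)), dim=5, lb=-10.0, ub=10.0)
print("best cost:", best_f)
```

Averaging the three leader-guided moves is what lets the ω wolves follow α, β, and δ together rather than the single best wolf alone.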
Although recently introduced, the GWO has been used in various fields of application. Das et al. [71] used the GWO to optimise the parameters of a PID controller for the speed control of a DC motor system. Komaki and Kayvanfar [72] applied the GWO to determine the optimal machining and assembly sequence minimising the completion time; compared with other methods, the GWO provided better performance. Nguyen et al. [73] used a multiobjective GWO to solve the node localisation problem in a wireless sensor network. Various constraints were considered in the localisation model, including limits on the spatial distance between nodes and topology restrictions. The simulation results show significant improvements in localisation accuracy and rate of convergence to the optimal solution compared with those obtained with other methods. Song et al. [74] used the GWO to estimate the parameters of Rayleigh waves (a type of elastic surface wave).
However, research and development activities for this algorithm are still at an early stage [75]. As previously stated, the GWO has a strong exploration capacity, which helps it avoid premature convergence to local optima. Although this feature may slow down convergence, it is what led us to use the GWO to define the $w_{ij}$ values of the FCM.

The K-Nearest Neighbors Algorithm.
The k-nearest neighbors (KNN) algorithm is a simple, easy-to-implement supervised machine learning algorithm used to solve both classification and regression problems. It works on the similarity of features: the closer an instance is to a data point, the more similar KNN considers them [76].
Once the HIs have been defined for each training unit, they can be used as models representing the degradation profile from normal functioning to disruption. At this point, a set of models $M_i$ (with $i = 1, \dots, N$, where N is the number of items in the training dataset) is available and can be used to predict the RUL. Therefore, to find the most similar element, it is necessary to measure the distance between each model $M_i$ and $Y = \{y_1, y_2, \dots, y_r\}$, which represents the HI of the test unit obtained through consecutive observations. The distance is calculated as the Euclidean distance (depending on the problem under examination) or as the mean value of the absolute residual (used in the proposed approach), as described by equations (11) and (12), respectively. Thus, the smaller the distance, the greater the similarity between the data point and the instance to be predicted:
$$ d_E(y_i, y_l) = \sqrt{ \sum_{t=1}^{r} \left( x_{i,t} - x_{l,t} \right)^2 } \quad (11) $$
$$ d_{MAR}(y_i, y_l) = \frac{1}{r} \sum_{t=1}^{r} \left| x_{i,t} - x_{l,t} \right| \quad (12) $$
where $y_i$ is the ith training model and $y_l$ is the lth testing one, composed of the components $x_{i,t}$ and $x_{l,t}$, respectively. The calculated distances are then used as the argument to evaluate the similarity weight, $sw_{i,l}$, between the testing HI and each of the training ones. Once the similarity weights between the testing unit and the training units have been obtained, they can be ranked in descending order to identify the number of similar units, SU, which depends on N, the number of training units, and on k, an initial fixed value. In particular, a small k restricts the prediction region, making the classifier "more blind" to the general distribution. Conversely, a large k reduces the impact of the variance caused by random errors but runs the risk of ignoring small details that might be relevant. In the proposed approach, k is initially set to 50. Thus, given the similar units, a Weibull distribution can be fitted to their end-of-life times, updated to account for the number of test samples already observed, to estimate the RUL.
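A minimal sketch of the similarity-based RUL step, under several assumptions that the text does not fix: the training and test HIs are aligned from their first sample, the distance is the mean absolute residual of equation (12) over the r observed test samples, the similarity weight is taken as exp(−d²) (a hypothetical choice, since the paper's similarity formula is not reproduced here), SU = min(k, N), and the Weibull fit is replaced by the median of the neighbours' remaining lives to keep the example dependency-free. `predict_rul` and the toy linear HIs are hypothetical.

```python
import numpy as np


def mean_abs_residual(train_hi, test_hi):
    """Mean absolute residual between a test HI and the first r samples of a training HI."""
    r = len(test_hi)
    return float(np.mean(np.abs(np.asarray(train_hi[:r]) - np.asarray(test_hi))))


def predict_rul(train_his, test_hi, k=50):
    """Return (RUL estimate, number of similar units used)."""
    r = len(test_hi)
    candidates = []
    for hi in train_his:
        if len(hi) < r:                  # a unit that failed earlier cannot be matched
            continue
        d = mean_abs_residual(hi, test_hi)
        sw = np.exp(-d ** 2)             # hypothetical similarity weight in (0, 1]
        candidates.append((sw, len(hi) - r))   # cycles that unit still had after r samples
    if not candidates:
        raise ValueError("no training unit is long enough to compare with")
    candidates.sort(key=lambda c: c[0], reverse=True)   # most similar first
    su = min(k, len(candidates))
    remaining = [life for _, life in candidates[:su]]
    return float(np.median(remaining)), su   # median stands in for the Weibull fit


# Toy HIs degrading linearly from 1 to 0 over 100, 60 and 30 cycles.
train_his = [np.linspace(1.0, 0.0, n) for n in (100, 60, 30)]
test_hi = np.linspace(1.0, 0.0, 100)[:20]       # first 20 cycles of a 100-cycle unit
print(predict_rul(train_his, test_hi, k=1))     # (80.0, 1)
print(predict_rul(train_his, test_hi, k=3))     # (40.0, 3)
```

The example also shows the trade-off on k discussed above: with k = 1 the estimate follows the single closest unit, while k = 3 averages in units whose profiles are much less similar.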
Figure 4 shows the framework used to estimate the aircraft engine HIs and, subsequently, to predict the RUL of each engine. The proposed algorithm can be classified among condition monitoring techniques. It is a general framework and can be applied to any equipment. Both the training and the testing datasets are composed of the sensor readings of the items considered.

The Procedure for HI Assessment.
In the "Time Indicator Modelling" phase, a lifetime indicator (LTI) is defined. The number of samples of each piece of equipment (which corresponds to the number of rows of its dataset) represents its life duration. The main idea of the LTI is to model a degradation profile by considering that, at the beginning of sampling, an item has the maximum reliability value (equal to 1) and, when the disruption occurs, the minimum one (equal to 0). The first value of the LTI is therefore equal to 1 and the last one to 0. With $DUR_m$ denoting the dataset length for the mth equipment, each element of $TI'_m$ indicates the number of cycles remaining before the disruption at time t, and its normalised value $TI''_m$ gives $LTI_m$:
$$ TI'_m(t) = DUR_m - t, \quad t = 1, \dots, DUR_m \quad (16) $$
$$ LTI_m(t) = TI''_m(t) = \frac{TI'_m(t)}{\max_t TI'_m(t)} \quad (17) $$
Hence, each value of $LTI_m$ decreases from 1 to 0. $LTI_m$ represents the parameter to be estimated and used in the algorithm for HI estimation. Table 2 shows an example of LTI calculation.
The "FCM Modelling" phase is the core of the proposed approach to identify the HI. Figure 5 describes the iterative phase for the HI calculation, adapting the general GWO algorithm shown in Figure 3. In particular, the GWO algorithm is used to define the weights of the relations between the ith and jth concepts (the $w_{ij}$ values) of the FCM matrix. The concepts of the FCM represent the working conditions of the equipment to be analysed, the signals of the sensors installed on the equipment, and, as the last concept, the HI. Since the purpose of the approach is to estimate the HI using FCM theory, the number of concepts (NC) equals the number of variables in the reduced dataset plus one for the HI (the algorithm output). This means that if the reduced dataset has n variables, NC = n + 1.
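The lifetime indicator can be computed in a few lines. This sketch assumes, consistently with the Table 2 example (a machine breaking after 10 sampling cycles gives TI' = 9, 8, ..., 0), that TI' counts the cycles remaining after each sample and that the normalisation divides by its maximum, so that LTI runs from 1 to 0; `lifetime_indicator` is a hypothetical name.

```python
import numpy as np


def lifetime_indicator(dur):
    """Return (TI', LTI) for a unit whose dataset has `dur` samples (rows)."""
    if dur < 2:
        raise ValueError("at least two samples are needed to normalise the indicator")
    t = np.arange(1, dur + 1)           # sample index t = 1, ..., DUR
    ti = dur - t                        # cycles remaining to failure: DUR-1, ..., 0
    lti = ti / ti.max()                 # 1 at the first sample, 0 at failure
    return ti, lti


# Table 2 example: machine 1 breaks after 10 sampling cycles.
ti, lti = lifetime_indicator(10)
print(ti)                   # [9 8 7 6 5 4 3 2 1 0]
print(np.round(lti, 3))
```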
The iterative phase shown in Figure 5 is executed for each piece of equipment in the training dataset. In each iteration, the final α wolf position is taken as the temporary $FCM_m$ and used as the initial FCM for the next one. When the last piece of equipment has been analysed, the corresponding FCM is considered the optimal solution. The GWO algorithm is used to define the $w_{ij}$ values of the FCM from the reduced dataset $RD_m$ of each piece of equipment, in which the term f identifies the instant at which the fault occurred, n is the number of main reduced variables, and m identifies the equipment considered. The positions of the pack members, obtained through the GWO application for a specific device, are given in the form
$$ FCM^{iter}_{p,m} = \begin{bmatrix} w_{11} & \cdots & w_{1,NC} \\ \vdots & \ddots & \vdots \\ w_{NC,1} & \cdots & w_{NC,NC} \end{bmatrix} $$

Mathematical Problems in Engineering
where p is the pth member of the pack ($p = 1, 2, \dots, nPop$), iter is the current iteration with $iter \le MaxIt$, and NC is the number of FCM concepts defined before. Thus, $FCM^{iter}_{p,m}$ is the position of the pth pack member at the current iteration for the mth equipment.
Figure 5 highlights the presence of two main "for" loops. The first (external) loop runs over the available pieces of equipment (M), and the second (internal) loop runs over the GWO iterations, up to the maximum iteration number (MaxIt).
At each iteration of the external loop (whose index corresponds to the equipment number in the dataset), the LTI of a specific item is used as the benchmark for calculating the positional cost (at the first iteration, $LTI_1$ is used). This means that the external loop aims to identify the best $FCM_m$ for the mth equipment. In the internal loop, i.e., the GWO application, all the pack members take a position within the domain space and update it at each inner iteration. At the end of the inner loop, the best position for the item considered ($FCM_m$) is identified. The obtained $FCM_m$ is then used as the initial position for identifying $FCM_{m+1}$ (as long as m < M), to improve its accuracy.
At initialisation, $FCM^{0}_{1}$ can be defined randomly if there is no knowledge of the equipment involved or if a panel of experts cannot be established; otherwise, it can be modelled by an expert panel, as described for the classical FCM design approach, according to the experience of each professional involved.
As mentioned before, the fitness cost value of each $FCM^{iter}_{p,m}$ is evaluated through equation (2). This step is the most critical in the whole algorithm, as highlighted by Mazzuto and Stylios [77]. Indeed, to calculate the cost associated with $FCM^{iter}_{p,m}$, it is necessary to consider all of the samples that make up $RD_m$ and $LTI_m$. More precisely, since $RD_m$ and $LTI_m$ both have f samples (as described above), equation (2) has to be applied f times. Besides, since the number of iterations (k) of equation (2) depends on convergence or on a fixed number of repetitions (FCMiter), the fitness cost evaluation requires $f \cdot FCMiter$ iterations. Considering nPop wolves and a maximum

Table 2: Example of LTI calculation.
Step description | Equation application
Suppose machine 1 breaks after 10 sampling cycles; it follows that $DUR_1 = 10$. | According to equation (16), $TI'_1 = [9\ 8\ 7\ 6\ 5\ 4\ 3\ 2\ 1\ 0]$.
According to equation (17), the relative normalised value | $TI''_1$

Considering the pth pack member, once its position ($FCM^{iter}_{p,m}$) is defined, its positional cost has to be calculated through equation (2) to identify the best position. The main idea is to consider each sample j in the reduced dataset of a specific item ($RD_m$) as the initial array $A^0$ at the specified iteration, according to
$$ A^{0,j}_m = \left[ Var^m_{1,j}, Var^m_{2,j}, \dots, Var^m_{n,j}, 0 \right] $$
The last value is null because the output (the HI) is included in the set of FCM concepts but is the variable to be estimated.
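When the application of equation (1) converges, the array $A^{*,j}_{m}$ contains the convergent values of the input variables together with the estimated output, where $Var^{*,m}_{1,j}$ is the convergent value of the variable $Var^{m}_{1,j}$ (and likewise for the other variables) and $HI^{*,j}_{m}$ is the estimated output for the sample at time j for the mth engine.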
Once all of the samples in $RD_m$ have been processed, a final output array $HI^{*}_{p,m}$ for the pth pack member will be available, as given by equation (23). Thus, the estimated output $HI^{*}_{p,m}$ and the corresponding $LTI_m$ can be used to calculate the fitness cost value $C^{iter}_{p,m}$ for the pth pack member, the mth item, and iteration iter, using the root mean squared error:

$$C^{iter}_{p,m} = \sqrt{\frac{1}{f}\sum_{j=1}^{f}\left(HI^{*,j}_{m} - LTI^{j}_{m}\right)^{2}}$$

The root mean squared error has been chosen because it efficiently describes how concentrated the data are around the line of best fit [78].
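The positional cost evaluation can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the common FCM update with self-memory, $A^{k+1} = f(A^k + A^k W)$ with W[i, j] the weight from concept i to concept j, whereas equation (2) of the paper may use a different form. The tanh threshold with slope 1, the cap of 50 iterations, and the stopping threshold of $10^{-3}$ are taken from Section 4.1. `fcm_infer` and `fitness_cost` are hypothetical names.

```python
import numpy as np

def fcm_infer(a0, W, slope=1.0, max_iter=50, tol=1e-3):
    """Iterate the FCM from the initial activation array a0 until convergence.

    Assumed update rule (with self-memory): a <- tanh(slope * (a + a @ W)),
    where W[i, j] is the causal weight from concept i to concept j.
    Stops when no concept changes by more than `tol` or after `max_iter` steps.
    """
    a = np.asarray(a0, dtype=float)
    for _ in range(max_iter):
        a_next = np.tanh(slope * (a + a @ W))
        if np.max(np.abs(a_next - a)) < tol:
            return a_next
        a = a_next
    return a

def fitness_cost(W, samples, lti, **kw):
    """RMSE between the FCM-estimated HI and the target LTI for one item.

    samples: (f, n) array of reduced sensor readings (RD_m);
    lti: (f,) target values (LTI_m). The HI is the last concept.
    """
    samples = np.asarray(samples, dtype=float)
    hi = np.empty(len(samples))
    for j, x in enumerate(samples):
        a0 = np.append(x, 0.0)               # A^0: sensor values plus a null HI
        hi[j] = fcm_infer(a0, W, **kw)[-1]   # converged HI for sample j
    cost = float(np.sqrt(np.mean((hi - np.asarray(lti, dtype=float)) ** 2)))
    return cost, hi
```

Inside the GWO, a wolf's flattened position would be reshaped into the (n + 1) x (n + 1) matrix W, with the last row kept at zero because the HI is an output concept; each cost call then performs at most f x 50 FCM iterations, matching the f · FCMiter count above.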
Finally, once the optimal FCM for the HI identification has been found, it is used to calculate the HIs in the "Training Health Indicator definitions" and "Testing Health Indicator definitions" phases.

Research Approach Application
To explain the proposed approach and to test its accuracy for HI modelling, the Turbofan Engine Degradation Simulation Dataset has been used. It is available online on the NASA repository website (https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/, last access July 21, 2020). The aircraft gas turbine engine has an integrated control system, which consists of a fan-speed controller and a set of controllers and limiters. In particular, it includes three high-limit regulators aimed at preventing the engine from exceeding its designed parameters [79].
Several categories of signals, including temperature, pressure, speed, and air ratio, can be used to monitor the condition of the aircraft gas turbine engine. The dataset contains readings from 21 sensors installed on different components of the aircraft engine, which allow its health condition to be monitored (see Figure 6). An excerpt of the dataset is shown in Table 3; for a complete view of the dataset, refer to Saxena et al. [79] and Xu et al. [80]. The training dataset consists of readings from 249 engines (61249 rows and 26 columns in total), while the testing dataset consists of data from 248 engines (41214 rows and 26 columns in total). The approach has been evaluated using MATLAB 2019 on an Intel Core i7-6700HQ CPU @ 2.60 GHz. Given the similarity between Artificial Neural Networks (ANNs) and FCMs, the results of the proposed approach have been compared to those obtained with an ANN to evaluate the performance of the two approaches. To obtain comparable results, the initial dataset has been standardised according to the working conditions and then reduced through trendability analysis [21], which guarantees an impartial comparison because both approaches work on the same data. The reduced dataset has been used as input for both the proposed approach and the ANN. More specifically, according to Figure 6, the reduced set contains 8 sensors, namely 2, 3, 4, 8, 9, 11, 13, and 17 (see Table 4). These sensors become the concepts of the FCM and the inputs of the ANN. Table 5 shows the nomenclature used for each sensor, the corresponding FCM concept, and the corresponding ANN input.
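The preprocessing described above can be sketched as follows, under two assumptions: the operating regime of each row is already known (in C-MAPSS it is usually recovered from the three operational-setting columns), and trendability follows one common definition, the smallest absolute correlation between a sensor and time across all units. The function names and the selection threshold are illustrative, not the authors' code or the exact procedure of [21].

```python
import numpy as np

def standardise_by_regime(X, regimes):
    """Z-score every sensor separately within each operating regime."""
    X = np.asarray(X, dtype=float)
    regimes = np.asarray(regimes)
    Z = np.empty_like(X)
    for r in np.unique(regimes):
        rows = regimes == r
        mu = X[rows].mean(axis=0)
        sd = X[rows].std(axis=0)
        sd[sd == 0] = 1.0                 # constant sensors would divide by zero
        Z[rows] = (X[rows] - mu) / sd
    return Z

def trendability(Z, unit_ids):
    """Per-sensor score: min over units of |corr(sensor, cycle index)|.

    Rows of each unit are assumed to be in time order.
    """
    Z = np.asarray(Z, dtype=float)
    unit_ids = np.asarray(unit_ids)
    scores = np.ones(Z.shape[1])
    for u in np.unique(unit_ids):
        block = Z[unit_ids == u]
        t = np.arange(len(block), dtype=float)
        tc = t - t.mean()
        bc = block - block.mean(axis=0)
        den = np.sqrt(np.sum(tc ** 2) * np.sum(bc ** 2, axis=0))
        # Pearson correlation per sensor; flat sensors get a correlation of 0.
        corr = np.divide(tc @ bc, den, out=np.zeros_like(den), where=den > 0)
        scores = np.minimum(scores, np.abs(corr))
    return scores
```

Sensors whose score exceeds a user-chosen threshold would be kept, e.g. `np.flatnonzero(trendability(Z, units) >= 0.3)`, where 0.3 is only a placeholder value.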

4.1. The Proposed Approach Results.
Once the training dataset has been reduced, the corresponding LTI array has been calculated for each engine according to equations (16)-(18) (see Figure 7); it is used as the output for the positional cost definition in the proposed approach.
As mentioned in Section 3.1.3, the proposed approach can be initialised either with an FCM designed from the experience of an expert panel or with a random matrix to be iteratively corrected. Since no experts with know-how on aircraft engines were available for the examined case study, a random initial FCM has been adopted.
Since the training dataset is composed of 249 engines, the entire process has been carried out for 249 iterations, during each of which the FCM obtained in the previous iteration is corrected. Figure 8 shows the convergence curves during the algorithm iterations and highlights the final value of the last curve, which corresponds to the minimum root mean square error of 1.6117. Concerning the application of equation (2), the hyperbolic tangent has been chosen as the threshold function f(·), with a slope factor equal to 1. The maximum number of iterations for the positional cost calculation has been fixed at 50, and an additional threshold value of $10^{-3}$ has been defined to potentially stop the algorithm early. Because of the large number of samples in the dataset, the training time required to identify the final FCM was 15 minutes. Table 6 shows the final $w_{ij}$ values among the concepts of the optimal FCM and the output of the proposed algorithm.
The last row contains only null values because the last concept, C9 (the HI), is an output concept.
[Table residue, sensor nomenclature (index, symbol, description): 1, T2, total temperature at fan inlet; 2, T24, total temperature at LPC outlet; 3, T30, total temperature at HPC outlet; 4, T50, total temperature at LPT outlet; 5, P2, pressure at fan inlet; 6, P15, total pressure in bypass-duct; 7, P30, total pressure at HPC outlet; followed by the FCM concepts C1 to C8 and the ANN inputs I1 to I8.]

Analysing Table 6, it is possible to highlight the presence of some low values (less than 0.1). The final FCM could be filtered so that these values are considered null, to simplify the HI calculation. However, such a filtering phase adds a delay to the entire process, since the user has to define the filter threshold value properly through specific algorithms, which increases the iteration time. Since the set of FCM concepts in the examined case study is relatively small, the additional filtering phase has been neglected.
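The optional filtering phase discussed above amounts to a hard threshold on the weight matrix, as in the minimal sketch below. It is an illustration only: the 0.1 cut-off is the "low value" level mentioned in the text, the small matrix is made-up data, and `prune_weights` is a hypothetical helper; the authors skipped this step.

```python
import numpy as np

def prune_weights(W, threshold=0.1):
    """Set every causal weight with |w_ij| < threshold to zero."""
    W = np.asarray(W, dtype=float)
    return np.where(np.abs(W) < threshold, 0.0, W)

W = np.array([[0.0, 0.05, -0.62],
              [0.3, 0.0, -0.08],
              [0.0, 0.0, 0.0]])   # last row: output concept, no outgoing edges
W_pruned = prune_weights(W)
print(int(np.count_nonzero(W)), int(np.count_nonzero(W_pruned)))  # prints: 4 2
```

A fixed cut-off is the simplest choice; choosing it automatically, as the text notes, would require an extra tuning procedure.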
The final FCM can be represented graphically to show all the connections among concepts, as shown in Figure 9.
Once the final FCM has been obtained, the strength of the concepts involved can be analysed so as to identify the […]

Table 7 shows the total effects matrix, where the (i, j) value represents how much the concept in row i affects the concept in column j. The most significant influence is the relationship between concept C7 (sensor number 13) and C1 (sensor number 2), with a strength value equal to 1. This relation means that C7, the corrected fan speed, is the main cause of the increase in C1, the total temperature at the LPC outlet.
Focusing on the degradation profile, i.e., the HI concept C9, concept C7, jointly with concept C5 (sensor number 9, physical core speed), has the most significant weight, with a strength value equal to -0.998. This means that an increase in fan speed is a major cause of aircraft engine degradation.
Analysing in depth all the critical paths (Table 8) that start from each concept of the FCM and end in concept C9, it is evident that the influence of C7 on C9 is not direct: C7 affects C9 indirectly through its influence on C5. Thus, C5 can be considered the most relevant cause of aircraft engine degradation, which could be an important consideration for designing a proper maintenance plan.
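The text does not spell out how Tables 7 and 8 were computed. One common formulation, from Kosko's original work on FCMs, takes the indirect effect of a path as its weakest link (the minimum absolute weight along it) and the total effect as the strongest such path. The sketch below follows that formulation and additionally signs each path by the product of its edge signs, which is our assumption. Paths are enumerated by depth-first search, which is feasible for a map of nine concepts.

```python
import numpy as np

def simple_paths(W, src, dst):
    """Yield every cycle-free path src -> ... -> dst along non-zero weights (DFS)."""
    n = W.shape[0]
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        for nxt in range(n):
            if W[node, nxt] == 0 or nxt in path:
                continue
            if nxt == dst:
                yield path + [nxt]
            else:
                stack.append((nxt, path + [nxt]))

def indirect_effect(W, path):
    """Weakest link on the path, signed by the product of the edge signs."""
    weights = [W[a, b] for a, b in zip(path[:-1], path[1:])]
    magnitude = min(abs(w) for w in weights)
    sign = float(np.prod(np.sign(weights)))
    return sign * magnitude

def total_effect(W, src, dst):
    """Strongest indirect effect over all paths, and the path that realises it."""
    W = np.asarray(W, dtype=float)
    best, best_path = 0.0, None
    for path in simple_paths(W, src, dst):
        effect = indirect_effect(W, path)
        if abs(effect) > abs(best):
            best, best_path = effect, path
    return best, best_path

# Toy map: a weak direct link 0 -> 2 and a strong route through concept 1.
W = np.zeros((3, 3))
W[0, 1], W[1, 2], W[0, 2] = 0.9, -0.8, 0.2
effect, path = total_effect(W, 0, 2)   # -0.8 via [0, 1, 2]
```

In the toy map the indirect route dominates the direct link, which is the same kind of reading as C7 acting on C9 through C5; calling `total_effect` for every pair (i, j) fills a matrix analogous to Table 7.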
Once the FCM has been analysed, it can be used to calculate the HIs for each engine in the training dataset (the first 50 HIs are shown in Figure 10) and also for the testing dataset. In practical terms, the HI shape provides maintenance managers with the real RUL value.

The Comparison between the Proposed Approach and the ANN.
Results of the proposed approach have been compared to those obtained using an Artificial Neural Network. ANNs were chosen for the comparison because this methodology is one of the most widely used in the literature for HI evaluation, and because of its similarity with the FCM method. For the ANN, the best results have been obtained with a two-layer network of ten neurons per layer. Using the Levenberg-Marquardt method [81] and the mean squared error (MSE) as the performance indicator, Figure 11 shows that the lowest MSE, with a value between $10^{-2}$ and $10^{-1}$, is reached at the 60th epoch. The LTI arrays shown in Figure 7 have been used to train the ANN to obtain the HI estimation, as reported in Figure 12.
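For reference, an ANN baseline trained with Levenberg-Marquardt can be sketched as below. Several assumptions apply: "two-layer network of ten neurons per layer" is read as two hidden tanh layers of ten neurons each with a linear output, SciPy's `least_squares(method="lm")` (a MINPACK implementation) stands in for the MATLAB training routine, and the initialisation scale and evaluation budget are illustrative. MINPACK's method needs at least as many training samples as weights (211 here).

```python
import numpy as np
from scipy.optimize import least_squares

# Assumed architecture: 8 sensor inputs -> 10 tanh -> 10 tanh -> 1 linear output (HI)
LAYERS = (8, 10, 10, 1)
N_PARAMS = sum(n_in * n_out + n_out for n_in, n_out in zip(LAYERS[:-1], LAYERS[1:]))

def forward(theta, X):
    """Evaluate the network for the flat parameter vector theta."""
    h, k = np.asarray(X, dtype=float), 0
    for layer, (n_in, n_out) in enumerate(zip(LAYERS[:-1], LAYERS[1:])):
        W = theta[k:k + n_in * n_out].reshape(n_in, n_out)
        k += n_in * n_out
        b = theta[k:k + n_out]
        k += n_out
        h = h @ W + b
        if layer < len(LAYERS) - 2:   # hidden layers use tanh, output stays linear
            h = np.tanh(h)
    return h.ravel()

def train_lm(X, y, seed=0, n_jacobians=20):
    """Fit all weights by Levenberg-Marquardt on the residuals forward(theta, X) - y."""
    theta0 = np.random.default_rng(seed).normal(scale=0.3, size=N_PARAMS)
    # With a finite-difference Jacobian, the 'lm' budget also counts Jacobian evaluations.
    res = least_squares(lambda th: forward(th, X) - y, theta0, method="lm",
                        max_nfev=n_jacobians * (N_PARAMS + 1))
    return res.x, theta0
```

Here `y` would hold the LTI targets of Figure 7 and `X` the eight reduced sensor readings; the training MSE plays the role of the performance indicator in Figure 11.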
The HIs defined for the engines in the testing dataset have been used for RUL prediction with the k-nearest neighbors algorithm. Table 9 shows an excerpt of the results, with engines listed in ascending order of the FCM percentage error (%err FCM). The second column is the real RUL value of each machine (values provided by NASA). The RUL values estimated by FCM and ANN (columns 3 and 4, respectively) show that the proposed approach performs better than the ANN. Figures 13 and 14 […] involved curves allow the variability range of a likely prediction to be reduced. This is a typical problem of ANNs that has been overcome through the proposed method. The HIs defined using the proposed approach are more accurate and, in addition, the algorithm provides a significant discrimination of all the considered aircraft engines (see Figure 15). Thus, small variations in the sensor readings define quite distinct degradation profiles.
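The text does not detail how the k-nearest neighbors step matches HI curves. A common similarity-based scheme, sketched below under that assumption, slides the truncated test HI along each run-to-failure training HI, keeps the best-matching window, takes the cycles remaining after that window as a candidate RUL, and averages the candidates of the k closest training engines. The %err formula is also an assumed definition; `knn_rul` and `percentage_error` are hypothetical names.

```python
import numpy as np

def knn_rul(test_hi, train_his, k=5):
    """Similarity-based RUL: average remaining life of the k best-matching training HIs."""
    test_hi = np.asarray(test_hi, dtype=float)
    L = len(test_hi)
    candidates = []  # (distance, remaining cycles) for each usable training engine
    for h in train_his:
        h = np.asarray(h, dtype=float)
        if len(h) < L:
            continue  # a training curve shorter than the test curve cannot match it
        best = None
        for s in range(len(h) - L + 1):  # slide the test curve along the training curve
            d = float(np.sqrt(np.mean((h[s:s + L] - test_hi) ** 2)))
            if best is None or d < best[0]:
                best = (d, len(h) - (s + L))
        candidates.append(best)
    if not candidates:
        raise ValueError("no training HI is at least as long as the test HI")
    candidates.sort(key=lambda c: c[0])
    return float(np.mean([rul for _, rul in candidates[:k]]))

def percentage_error(rul_est, rul_true):
    """Assumed definition of %err: absolute error relative to the true RUL."""
    return abs(rul_est - rul_true) / rul_true * 100.0
```

More distinct degradation profiles mean fewer training curves lie close to a given test curve, so the k neighbours agree more and the averaged RUL varies less.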

Discussion
The proposed approach can operate in a dynamic environment, with no significant difference in the operation of the algorithm in steady-state or dynamic mode, guaranteeing reliable and robust performance together with easy implementation.
At the same time, the approach requires particular attention from users in defining all the involved parameters, such as the size of the dataset, the number of agents used to find the final FCM, and the threshold values. Indeed, as discussed, the total number of iterations, and therefore the total computational time needed to calculate the HIs, depends on them. However, the analysed case study has shown that this limitation can be overcome by reducing the dataset before the algorithm initialisation, so as to minimise the number of variables involved.

Probably the most significant advantage of swarm intelligence is its ability to operate in a dynamic environment. The swarm can continuously track the optimum even in rapidly evolving optimisation problems. In principle, there is no significant difference in the operation of the algorithm in steady-state or dynamic mode [82]. Moreover, these algorithms do not require knowledge of, for example, the gradients of the cost and constraint functions. They guarantee reliable and robust performance together with easy implementation [83]. Specifically, for the GWO, the most important advantage is the small number of parameters needed for its implementation and adjustment [84,85].
At the same time, among the several advantages of FCMs, the most important are their extreme flexibility and adaptability to a given domain, which allow qualitative simulation of a system once it has been constructed. Furthermore, FCMs represent knowledge symbolically, converting the relations between the elements of a mental landscape so as to assess the impact of these elements [86,87]. FCMs offer additional benefits, including the use of fuzzy logic. Indeed, fuzzy set theory allows the incorporation of uncertainty due to sparse and imprecise information [88]. A fuzzy value is a fuzzy representation of a specific property when it is not precisely known [89]. Fuzzy set theory and fuzzy numbers are mainly used to quantify the degree to which a property can be associated with an object, which must not be confused with the concept of probability. Indeed, the causality among concepts is considered a certainty since, unlike in structural equation models and/or Bayesian networks, the concept of causality is not used to identify or find relationships between factors [90].

Conclusion
In this paper, an innovative supervised approach that combines a Swarm Intelligence algorithm, the GWO, with FCMs is proposed for HI analysis and calculation. This approach allows maintenance managers to predict the RUL of items through the k-nearest neighbors algorithm and to gain an in-depth understanding of the degradation process thanks to the analysis of the main paths of concepts that affect the HI. To enhance operating reliability and reduce maintenance costs, an integrated fault diagnosis and prognosis framework that analyses the machinery degradation process is necessary.
In the proposed approach, the working conditions of the engines and the signals of the sensors installed on them become the concepts of the FCM, while the GWO, a Swarm Intelligence algorithm, has been used to define the connection weights among these concepts and the HI concept.
A dataset provided by NASA containing aircraft engine data has been used to test the proposed approach. This case study underlines a crucial aspect: compared with neural networks, the proposed algorithm models all of the degradation profiles in a more detailed manner, which allows one to significantly distinguish different situations without imposing any specific mathematical function. As a result, fewer profiles are considered similar to the case in question and, consequently, the RUL estimate is more precise. Moreover, the analysis of the final FCM identified the physical core speed and the corrected fan speed as the main critical factors in engine degradation.
Furthermore, the FCM approach allows the user to analyse intuitively the relationships between the variables involved and thus gain a greater understanding of the degradation process, which is not possible with an ANN. Indeed, in an ANN, the variables involved are the inputs of the system and the concatenation of neurons carries no meaning for understanding the process. In an FCM, on the contrary, the variables are simultaneously inputs and "neurons," so their concatenation gives more information about the process. The performance of the proposed approach has been demonstrated using a NASA dataset, but the approach can also be applied to fault diagnosis and prognosis of other equipment. A wide range of experiments will be performed in our next research step to investigate the robustness of the proposed method. At the same time, it is evident that the proposed approach could rely not only on feature reduction but also on the selection of the most useful items for the training phase. Indeed, considering all the variables involved in the algorithm application (number of wolves, maximum number of GWO iterations, thresholds, etc.), the total number of iterations needed to identify the optimal FCM can be very large and time-consuming. For this reason, a crucial further development is the design of a preliminary step, to be applied after the feature extraction step, that selects these items.

Data Availability
To explain the proposed approach and to test its accuracy for Health Indicator modelling, the Turbofan Engine Degradation Simulation Data Set has been used. It is available online on the NASA repository website (https://ti.arc.nasa.gov/tech/dash/groups/pcoe/prognostic-data-repository/, last access July 21, 2020).

Conflicts of Interest
The authors declare that they have no conflicts of interest.