New Multifeature Information Health Index (MIHI) Based on a Quasi-Orthogonal Sparse Algorithm for Bearing Degradation Monitoring

Data-driven intelligent prognostic health management (PHM) systems have been widely investigated for analysing signals from defective bearings. These systems can provide precise information for condition monitoring and diagnosis. However, existing PHM systems cannot identify the degradation trend and the current fault type simultaneously. Because different fault types affect the mechanical system differently, the corresponding maintenance strategies also differ, and choosing the maintenance strategy according to the developing fault type can reduce the maintenance cost of equipment operation. Therefore, a multifeature information health index (MIHI) is needed to trace the degradation trends of bearings with various fault types simultaneously. This paper reports a new quasi-orthogonal sparse projection algorithm that maps the degradation feature vector sets (such as spectra) of each fault type onto mutually orthogonal, approximately straight lines in space. The MIHI is built from the spectrum of the currently measured point and is transformed by the quasi-orthogonal sparse projection algorithm to trace the various bearing degradation trends and recognize the fault type simultaneously. A case study on bearing degradation data demonstrates that this approach is effective in assessing the degradation trends of different fault types.


Introduction
With the progress of science and technology, building an effective intelligent prognostic health management (PHM) system has become necessary for critical components. Engineers deploy various types of sensors to detect the health condition of a single component. However, how to process sensor information effectively and thus support the intelligent health management of equipment has yet to be extensively researched. Remaining useful life (RUL) estimation is one of the key factors in asset condition-based maintenance, prognostics, and health management [1]. The basic principle of RUL estimation is the construction of a health index [2], and an appropriate health index can effectively improve the accuracy of RUL prediction. To detect incipient faults and provide effective information to RUL estimation and prognostic models, the health index acts as a bridge connecting the measured sensor signals with the prognosis of the asset's health condition. Statistical indicators such as the root mean square (RMS) and the kurtosis of the measured vibration signal [3] are used to indicate asset conditions for RUL estimation and prognostics. Some studies also utilize time-frequency features, such as waveform entropy [4]. Many dimensionality reduction methods have been favored by researchers; for example, kernel principal component analysis is used in [5] and isometric feature mapping in [6]. In recent studies, researchers have developed effective indicators, e.g., multiscale fuzzy entropy [7], Kullback-Leibler divergence combined with Gaussian process regression [8], and neural networks that construct a health index to monitor the equipment condition directly [9, 10]. The above health index building methods are used in prognostics, and most of them map the information from various sensor signals or feature extractors into a single informative indicator.
This process is called data fusion, which has three categories: feature-level fusion, decision-level fusion, and data-level fusion [11]. Feature-level fusion methods rely on prior knowledge of the degradation mechanism and physical models to analyse the input data. Goebel [12] utilized a feature-level fusion model consisting of principal component analysis, filtering, smoothing, normalization, and log-transformation to feed information to an RUL estimation model and predict the breakage of paper webs at the wet end of a paper-making machine. Ma [13] reported a multiple-view feature fusion method to predict lithium-ion battery RUL. Decision-level methods fuse high-level decisions made from individual sensor data and do not rely on raw signal feature extraction. Niu et al. [14] combined the advantages of wavelet analysis of transient signals with decision-level fusion techniques to improve the accuracy of fault diagnosis. Wei [15] proposed a decision-level data fusion method that maps individual sensor signals into reliable data, improving the quality control system in additive manufacturing and the RUL estimation of aircraft engines.
In contrast to feature- and decision-level fusion methods, which focus on physical meaning, data-level fusion methods focus on mining embedded features suited to the task from the raw data. In RUL prediction and condition monitoring, data-level fusion methods usually build the health index around properties that the researchers expect to improve RUL estimation. Data-level fusion methods are suitable for complex systems because an effective model that fuses the various signals is hard to build for such systems, whereas data-level fusion can monitor the state of the machinery according to the requirements of the monitoring task, which gives it stronger versatility. In summary, data-level fusion concentrates on the properties required by the task. Some scholars have investigated the properties of the health index. Liu [16] proposed two properties: first, the health index must have good monotonicity; second, the variance of the failure threshold across multiple experiments must be minimized. Chehade [17] proposed another property, separability: the greater the difference in the health index between two observations, the more valid and reliable the health index. In [2], the development of the health index focused on monotonicity and separability, and the problem of solving the transform matrix was converted into a convex optimization problem. Yan [18] focused on locating informative frequency bands and determining the optimal fault frequency band for health indication. Some researchers combined health index building with statistical analysis; for example, Kim [19] proposed a linear multisensor information fusion method to build a health index and derived the best linear unbiased estimator of the fusion coefficients. Other researchers studied Kalman-filter-based methods for fusing sensor signals (e.g., with a Markov model [20]).
To deal with sensor selection in multisensor settings, Liu [21] studied the quality of the signal provided by each sensor and proposed a signal-to-noise ratio metric to combine various sensor signals, develop a health index, and monitor asset degradation. However, in current studies of bearing health monitoring, sparsity is rarely considered when building a health index. The lack of sparse terms leads to overfitting and reduces the generalization ability of the health index. In data-driven model building, sparse penalty terms are used to improve model reusability and avoid overfitting. Sparse models built via the lasso (L1 regularization) can achieve interpretable feature selection [22], and the lasso has been widely investigated in biology and medicine. For features with adjacent relevance, the fused lasso was proposed to make feature weights both sparse and smooth [23].
Because high-dimensional features are difficult to compute with and observe, some scholars use a health index to monitor the operating condition of equipment intuitively and conduct prognostic health management. The degradation trends of different fault types vary, and information on the evolution direction of the current fault can further improve the accuracy of RUL estimation. Therefore, this study aims to provide a method for constructing a health index that indicates the degradation trends of various fault types simultaneously from fused features. The contributions of this study can be summarized as follows:
(1) This work proposes an orthogonality proposition for developing a health index. This proposition focuses on expanding the discrepancy between the degradations of different fault types, which is the basic idea of the multifeature information health index (MIHI).
(2) The MIHI uses a quasi-orthogonal sparse projection algorithm to convert the spectral features of the current measurement point into a low-dimensional vector, which can simultaneously represent the bearing degradation trends of multiple fault types.
(3) Weight sparsity and weight difference sparsity terms are added to the MIHI objective function to improve its general applicability.
(4) The optimization problem of the proposed quasi-orthogonal sparse projection algorithm is a constrained nonconvex problem, and a fast iterative solving algorithm is given.
Property 1. Monotonicity: once an initial fault occurs, the trend of the degradation signals should be monotonic [16].

Property 2. Sensitivity: the health index should be sensitive to bearing components that generate abnormal defects; that is, the health index should separate the normal and abnormal health conditions of bearings [24].
Property 3. Orthogonality: the jth component of the health index should respond only to the degradation of the jth fault type, so that the degradation data of the other fault types remain close to the normal-state level of the jth fault type.
Property 4. Weight sparsity: to prevent overfitting, the weights of features with low correlation to Properties 1-3 should be small, ideally exactly 0 so that feature selection is achieved.
Property 5. Weight difference sparsity: when rotating machinery operates under load, the working frequencies fluctuate slightly (especially the bearing ball pass frequency). Weight difference sparsity makes the projection matrix more flexible in dealing with these fluctuating frequencies.
The principle of MIHI is to extract spectral characteristics that can track the degradation trend and distinguish different fault types from the degradation datasets of various faults. Because the bearing fault impulse signal is pseudo-periodic (the impulse period fluctuates randomly around a mean value), the fault characteristic frequency fluctuates within a small interval. The fused lasso term makes the MIHI weight matrix focus on the frequency band around the mean characteristic frequency rather than on a single characteristic frequency line, which helps prevent overfitting. In summary, the quasi-orthogonal sparse projection algorithm for building a sparse multifeature information health index is expected to detect bearing fault types and to evaluate the bearing degradation process monotonically and sensitively at the same time.
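To make the effect of the two sparsity terms concrete, the following NumPy sketch compares a plain L1 (lasso) penalty with the fused-lasso difference penalty on two hypothetical weight vectors: one concentrated on a single spectral line and one spread over a narrow band of adjacent bins around the same nominal frequency. The vector length, bin positions, and function names are illustrative assumptions, not values from the paper.

```python
import numpy as np

def lasso_penalty(w):
    """L1 norm of the weights: the weight sparsity term (Property 4)."""
    return float(np.sum(np.abs(w)))

def fused_penalty(w):
    """L1 norm of adjacent weight differences: the weight difference sparsity term (Property 5)."""
    return float(np.sum(np.abs(np.diff(w))))

m = 32  # hypothetical spectrum length

# Weights concentrated on one spectral line (bin 15).
spike = np.zeros(m)
spike[15] = 4.0

# Weights spread evenly over a narrow band (bins 14-17) around the same frequency.
band = np.zeros(m)
band[14:18] = 1.0

print(lasso_penalty(spike), lasso_penalty(band))   # 4.0 4.0
print(fused_penalty(spike), fused_penalty(band))   # 8.0 2.0
```

Both vectors have the same L1 norm, so the lasso term alone cannot prefer either one; the difference term is four times smaller for the band, which is why it steers the weights toward a frequency band rather than an isolated line.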
The proposed quasi-orthogonal projection algorithm is based on the traditional linear fitting method, and the health index is defined as

$$\mathrm{MIHI}_i = W f_i, \tag{1}$$

where $W \in \mathbb{R}^{k\times m}$ is the projection matrix, $m$ denotes the dimension of the spectrum feature $f_i$ of the $i$th observation, $k$ is the number of fault types, and $\mathrm{MIHI}_i \in \mathbb{R}^{k\times 1}$. Figure 1 shows an example of equation (1).
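Equation (1) is a plain matrix-vector product. In the minimal NumPy sketch below, a random placeholder W and random placeholder spectra stand in for a fitted projection matrix and measured data (the epoch count of 100 is arbitrary); stacking the spectra of all epochs as columns evaluates equation (1) for every epoch at once.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m, n_j = 3, 512, 100                       # fault types, spectrum length, epochs (illustrative)
W = rng.standard_normal((k, m))               # placeholder projection matrix
F_j = np.abs(rng.standard_normal((m, n_j)))   # placeholder spectra, one column per epoch

mihi_0 = W @ F_j[:, 0]    # equation (1) for the first epoch: shape (k,)
MIHI_j = W @ F_j          # equation (1) for all epochs at once: shape (k, n_j)
print(mihi_0.shape, MIHI_j.shape)   # (3,) (3, 100)
```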
To handle the monitoring of various fault types, the proposed algorithm requires degradation datasets of the different fault types as prior knowledge. We denote the degradation data of the $j$th fault type as $F_j \in \mathbb{R}^{m\times n_j}$, where $n_j$ is the number of observation epochs of the $j$th fault type degradation, and the feature vector of the $i$th observation epoch of these data as $f_{j,i}$.
In addition, the projection matrix can be written row-wise as $W = [w_1^T, w_2^T, \ldots, w_k^T]^T$, and its weights are calculated row by row. Solving for the $j$th row $w_j$ of the projection matrix is formulated as the optimization problem in equation (2), where $\alpha$ balances sensitivity and monotonicity (Properties 1 and 2), $\beta$ is the sparsity penalty parameter (Property 4), and $\gamma$ is the weight difference sparsity penalty parameter (Property 5). $\mu_j$ denotes the mean vector of the normal-state observation matrix $N_j = [f_{j,1}, f_{j,2}, \ldots, f_{j,n_{nor}}]$, where $n_{nor}$ is the number of initial observation epochs (the bearing signals can generally be assumed to be normal in the first few epochs); $n_{nor}$ corresponds to Property 3. Then, $D^2_{sensitive}(w_j)$ is calculated by equation (3). To ensure that the health index sensitively indicates abnormal conditions, $D^2_{sensitive}(w_j)$ should be as large as possible.

$D^2_{monotonousness}(w_j)$ corresponds to Property 1. To reduce computational complexity, the monotonicity of the health index is controlled through $D^2_{sensitive}(w_j)$ and $D^2_{monotonousness}(w_j)$. $D^2_{monotonousness}(w_j)$ evaluates the stability of the health index between two adjacent observation epochs and is calculated by equation (4). On its own, $D^2_{monotonousness}(w_j)$ assesses only stability, not monotonicity; however, enlarging $D^2_{sensitive}(w_j)$ while shrinking $D^2_{monotonousness}(w_j)$ effectively approximates monotonicity.

To achieve orthogonality, the difference between the health index on the data of the other fault types and the normal health state of the current fault type should be reduced. $D^2_{orthogonality}(w_j)$ is calculated by equation (5); it assesses the difference between the health index values of the degradation data of the other fault types and those of the normal-condition data of the $j$th fault type. To realize orthogonality, $D^2_{orthogonality}(w_j)$ should be as small as possible.

Computational Intelligence and Neuroscience
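The explicit forms of equations (3)-(5) are not reproduced in this text. The sketch below assumes sums of squared differences that match the verbal definitions: sensitivity as the squared deviation of $w_j f_{j,i}$ from the normal-state level $w_j \mu_j$, monotonousness as the squared change between adjacent epochs, and orthogonality as the squared deviation of the other fault types' health index from $w_j \mu_j$. Under these assumed forms each term is a quadratic form $w S w^T$, which the hypothetical helper `scatter_matrices` builds; all function names are illustrative.

```python
import numpy as np

def normal_mean(F, n_nor):
    """mu_j: mean spectrum of the first n_nor (assumed normal) epochs of F (m x n)."""
    return F[:, :n_nor].mean(axis=1)

def d2_sensitive(w, F_j, mu_j):
    # Assumed form: squared deviation of w f_{j,i} from the normal level w mu_j.
    return float(np.sum((w @ F_j - w @ mu_j) ** 2))

def d2_monotonous(w, F_j):
    # Assumed form: squared change of the index between adjacent epochs.
    return float(np.sum(np.diff(w @ F_j) ** 2))

def d2_orthogonal(w, F_list, j, mu_j):
    # Assumed form: squared deviation of the j-th index on the OTHER fault
    # types' data from the j-th fault type's normal level w mu_j.
    return float(sum(np.sum((w @ F - w @ mu_j) ** 2)
                     for l, F in enumerate(F_list) if l != j))

def scatter_matrices(F_list, j, n_nor):
    """Matrices S such that each assumed D^2 term equals w @ S @ w."""
    F_j = F_list[j]
    mu = normal_mean(F_j, n_nor)[:, None]
    S_sens = (F_j - mu) @ (F_j - mu).T
    dF = np.diff(F_j, axis=1)                 # adjacent-epoch differences
    S_mono = dF @ dF.T
    S_orth = sum((F - mu) @ (F - mu).T for l, F in enumerate(F_list) if l != j)
    return S_sens, S_mono, S_orth

# Check on random placeholder spectra (3 fault types, m = 16, n_nor = 10).
rng = np.random.default_rng(1)
F_list = [np.abs(rng.standard_normal((16, n))) for n in (40, 30, 50)]
w = rng.standard_normal(16)
S_sens, S_mono, S_orth = scatter_matrices(F_list, j=0, n_nor=10)
mu_0 = normal_mean(F_list[0], 10)
print(np.isclose(w @ S_sens @ w, d2_sensitive(w, F_list[0], mu_0)))  # True
```

Under these assumptions, the quadratic-form rewriting is what lets the three terms be combined into a Fisher-style ratio of two matrices.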
To reduce computational complexity, $D^2_{sensitive}(w_j)$, $D^2_{monotonousness}(w_j)$, and $D^2_{orthogonality}(w_j)$ are simplified in equations (6)-(8). Substituting equations (6)-(8) into equation (2) and following the form of Fisher's discriminant ratio, equation (2) is rewritten as equation (9), where $S^j_{subject} = S^j_{orthogonality} + \alpha^2 S^j_{monotonousness}$. The penalty vector $\sigma_j = (\sigma_{j,1}, \sigma_{j,2}, \ldots, \sigma_{j,m})$ imposes a larger penalty on the features that cause fluctuations in MIHI monotonicity and orthogonality, and it is calculated by equation (10). Lastly, depending on whether the health index of the $j$th fault type trends upward or downward, $w_j$ is multiplied by 1 or -1 so that the health index has an increasing trend.

Figure 2 shows the solving flowchart of the proposed quasi-orthogonal sparse projection algorithm. First, the historical data of the various fault types are preprocessed, transforming all observation signals into the spectrum features $[F_1, F_2, \ldots, F_k]$. Second, the parameters $\alpha$, $\beta$, $\gamma$, and $n_{nor}$ are set, and each row of the weight matrix $W$ is solved in turn.
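For intuition about the Fisher-ratio form, consider the special case $\beta = \gamma = 0$, assuming the remaining part of equation (9) is then the plain ratio $w S^j_{sensitive} w^T / (w S^j_{subject} w^T)$. Maximizing such a ratio is a generalized eigenvalue problem. The sketch below solves it with NumPy via Cholesky whitening and then applies the ±1 sign rule for an increasing trend, using a least-squares slope as one possible trend test. It is not the paper's sparse solver; the matrices are random placeholders, and the function names and the small `ridge` term are hypothetical.

```python
import numpy as np

def fisher_direction(S_sens, S_subject, ridge=1e-8):
    """Maximize (w S_sens w^T) / (w S_subject w^T); beta = gamma = 0 case.

    Solves S_sens v = lam * S_subject v by Cholesky whitening of S_subject.
    `ridge` is a small hypothetical regularizer keeping S_subject positive definite.
    """
    m = S_subject.shape[0]
    L = np.linalg.cholesky(S_subject + ridge * np.eye(m))  # S_subject ~= L L^T
    L_inv = np.linalg.inv(L)
    C = L_inv @ S_sens @ L_inv.T            # whitened symmetric matrix
    _, vecs = np.linalg.eigh(C)             # eigenvalues in ascending order
    w = L_inv.T @ vecs[:, -1]               # map the top eigenvector back to w
    return w / np.linalg.norm(w)

def make_increasing(w, F_j):
    """Multiply w by 1 or -1 so the j-th health index w @ F_j trends upward."""
    hi = w @ F_j
    slope = np.polyfit(np.arange(hi.size), hi, 1)[0]  # least-squares trend
    return w if slope >= 0 else -w

def rayleigh(w, S_sens, S_subject):
    return float(w @ S_sens @ w) / float(w @ S_subject @ w)

# Placeholder positive definite matrices standing in for equations (6)-(8).
rng = np.random.default_rng(2)
m, alpha = 8, 1.0
X_sens, X_orth, X_mono = (rng.standard_normal((m, n)) for n in (20, 30, 30))
S_sens = X_sens @ X_sens.T
S_subject = X_orth @ X_orth.T + alpha**2 * (X_mono @ X_mono.T)  # S_orth + alpha^2 S_mono

w_star = fisher_direction(S_sens, S_subject)
F_j = np.abs(rng.standard_normal((m, 60))).cumsum(axis=1)  # placeholder spectra
w_star = make_increasing(w_star, F_j)
```

The sign flip does not change the ratio, because both numerator and denominator are quadratic in $w_j$; it only fixes the orientation of the curve.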

Solution Detail of the Weight Vector.
In this section, the solution of equation (9) is provided. In general, equation (9) cannot be solved with convex optimization tools; following [25], a minorization-maximization algorithm is used to rewrite it. The first step is to construct an iterative form of equation (9), given in equation (11). The result $\hat{d}$ of equation (11) is then used in the second step, which builds a transfer matrix $R \in \mathbb{R}^{(m-1)\times m}$ defined in equation (12). Substituting equation (12) into equation (11) yields equation (13), where $\circ$ denotes the Hadamard product. Then, the augmented Lagrange function of equation (13) is built in equation (14). Using the linearized alternating direction method [26], the iteration of the augmented Lagrange function (14) can be split into the three subequations of equation (15), where $l$ is the approximation parameter. To ensure the convergence of equation (15), $l$ must be chosen to satisfy the convergence condition of the linearized alternating direction method [26]. A soft-threshold algorithm yields the closed-form solution of equation (15), given in equation (16).

Figure 2: Solving the weight matrix with the quasi-orthogonal sparse projection algorithm (set $\alpha$, $\beta$, $\gamma$, $n_{nor}$, and $j = 1$; solve the rows of $W$ in turn; obtain the weight matrix $W$).
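Two building blocks of this solver can be sketched in NumPy under stated assumptions. The transfer matrix $R$ is assumed here to be the first-order difference operator, which matches its stated size $(m-1)\times m$ and the fused lasso of [23]. The soft-threshold function is the standard closed-form minimizer of an L1-penalized least-squares step; the exact arguments it receives in equation (16) are not reproduced here. Function names are hypothetical.

```python
import numpy as np

def difference_matrix(m):
    """Assumed transfer matrix R of shape (m - 1, m): (R @ w)[q] = w[q + 1] - w[q]."""
    R = np.zeros((m - 1, m))
    idx = np.arange(m - 1)
    R[idx, idx] = -1.0
    R[idx, idx + 1] = 1.0
    return R

def soft_threshold(x, t):
    """Closed-form minimizer of 0.5 * ||z - x||^2 + sum(t * |z|).

    t may be a scalar or an array of per-feature thresholds (NumPy broadcasting).
    """
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

w = np.array([0.0, 1.0, 1.0, 3.0, 3.0])
R = difference_matrix(w.size)
print(R @ w)                          # [1. 0. 2. 0.]
print(np.abs(R @ w).sum())            # 3.0  (fused-lasso penalty of w)
print(soft_threshold(np.array([-2.0, 0.3, 1.5]), 1.0))
```

Entries whose magnitude is below the threshold are set exactly to zero, which is how the L1-type terms produce sparse weights and sparse weight differences.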

Data of Illustrative Example.
In this illustrative example, bearing run-to-failure data are studied. The bearing fault datasets cover three fault types: inner race fault, cage fault, and outer race fault. The run-to-failure data from XJTU were measured under an 11 kN load, a rotating speed of 2250 rpm, and a sampling frequency of 25.6 kHz. The data files of bearings 2_1, 2_3, and 2_5, listed in Table 1, are used to calculate the weight matrix.
In each observation epoch, the signal collected by the sensor contains more than 20000 samples; thus, the FFT spectrum can have more than 10000 dimensions. To reduce computational complexity, only a 512-dimensional spectrum is generated by the FFT in this case and used as the input to the quasi-orthogonal sparse projection algorithm. Figure 3 shows the performance degradation assessment of MIHI for monitoring the different bearing faults. Here, the balance parameter $\alpha$ is set to 1, $n_{nor}$ is set to 10, $\beta = 2\times10^{-4}$, and $\gamma = 2\times10^{-5}$. The blue line denotes the condition-monitoring health index of the inner race fault, the red line represents cage fault monitoring, and the green line shows the degradation trend of the outer race fault. To facilitate observation, the average MIHI value of the first 50 files is subtracted from each fault type's monitoring curve.

Figure 4 shows time-domain features that can indicate the occurrence of an incipient bearing fault. To monitor the bearing health state, time-domain features such as the standard deviation and kurtosis are used to quantify the bearing run-to-failure data. In Figure 4, dataset bearing 2_5 is used to illustrate the advantage of the MIHI for bearing health monitoring. The time-domain features do not show a monotonic trend, which is unfavorable for bearing degradation assessment and prognostics. In contrast, the MIHI monitoring curve not only has a strongly monotonic trend but also indicates the incipient bearing fault.
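The text does not specify how the 512-dimensional spectrum is formed. One plausible approach, assumed here purely for illustration, averages the magnitude of 1024-point FFTs over non-overlapping segments of each record and keeps the first 512 bins (bin width 25.6 kHz / 1024 = 25 Hz). The second helper applies the baseline subtraction described above (mean of the first 50 files). The record length of 32768 samples and the function names are hypothetical.

```python
import numpy as np

def spectrum_512(x, n_fft=1024):
    """Hypothetical 512-bin amplitude spectrum of one record x.

    Averages |rfft| over non-overlapping n_fft-sample segments (any leftover
    samples are dropped) and keeps the first n_fft // 2 bins, discarding Nyquist.
    """
    n_seg = x.size // n_fft
    segs = x[: n_seg * n_fft].reshape(n_seg, n_fft)
    amp = np.abs(np.fft.rfft(segs, axis=1)) / n_fft   # (n_seg, n_fft // 2 + 1)
    return amp.mean(axis=0)[: n_fft // 2]

def subtract_baseline(mihi, n_base=50):
    """Subtract each curve's mean over its first n_base epochs (mihi is k x n)."""
    return mihi - mihi[:, :n_base].mean(axis=1, keepdims=True)

fs = 25_600                                  # sampling frequency from the text (Hz)
t = np.arange(32_768) / fs                   # hypothetical record length
x = np.sin(2 * np.pi * 1000.0 * t)           # 1 kHz test tone
s = spectrum_512(x)
print(s.shape, int(np.argmax(s)))            # (512,) 40  (1000 Hz / 25 Hz per bin)
```

Averaging over segments trades frequency resolution for a lower-variance spectrum; a different `n_fft` would change both the bin width and the input dimension $m$.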

Results and Analysis.
To further illustrate the advantages of the proposed method, the natural variability of the proposed MIHI and its use in fault detection and incipient fault diagnosis are also examined. First, the MIHI values at observation epochs 1-50 in the normal stage are used as the historical normal dataset. Second, whether the normal-state dataset follows a Gaussian distribution is checked. At a significance level of 5%, the normal-state MIHI datasets from bearings 2_1, 2_3, and 2_5 all satisfy the normality condition; therefore, the Gaussian assumption for the normal stage is accepted. Lastly, the three-sigma rule is used to detect bearing abnormality, and the resulting statistical threshold serves as an early-warning baseline for fault detection and for identifying the beginning of degradation. The proposed MIHI monitoring curve family and the corresponding incipient fault thresholds are plotted in Figure 5. Combined with the statistical threshold, the proposed MIHI enables incipient bearing fault diagnosis and continuous monitoring of the bearing degradation process.
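The three-sigma rule can be sketched in NumPy as follows, assuming the MIHI curve has already been sign-adjusted to increase, so only the upper limit (mean plus three standard deviations of the first 50 normal-stage epochs) is used. The sample standard deviation (`ddof=1`), the toy curve, and the helper names are illustrative choices; the normality test performed beforehand is not reproduced.

```python
import numpy as np

def three_sigma_threshold(hi, n_normal=50):
    """Upper alarm limit mean + 3 * std of the first n_normal (normal-stage) epochs."""
    base = np.asarray(hi[:n_normal], dtype=float)
    return base.mean() + 3.0 * base.std(ddof=1)

def first_alarm(hi, threshold, n_normal=50):
    """Index of the first post-baseline epoch above threshold, or None."""
    above = np.flatnonzero(np.asarray(hi[n_normal:]) > threshold)
    return int(above[0]) + n_normal if above.size else None

# Toy curve: 50 stable epochs followed by a linear degradation ramp.
hi = np.concatenate([np.tile([0.1, -0.1], 25), np.linspace(0.0, 1.0, 50)])
thr = three_sigma_threshold(hi)
print(first_alarm(hi, thr))   # 65
```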

Hyperparameter Analysis.
The main hyperparameters of the proposed algorithm, $\alpha$, $\beta$, $\gamma$, and $n_{nor}$, govern monotonicity, weight sparsity, weight difference sparsity, and orthogonality, respectively. These four hyperparameters are chosen empirically. In this section, the effect of the hyperparameters on the final MIHI is studied, and suggestions for hyperparameter selection are given.
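First, a function is built to evaluate the monotonicity of the MIHI condition-monitoring curve, and a second function evaluates the orthogonality of the MIHI curve family. Both are computed from $\mathrm{MIHI}_j = W F_j = W[f_{j,1}, f_{j,2}, \ldots, f_{j,n_j}] \in \mathbb{R}^{k\times n_j}$, where $\mathrm{MIHI}^j_{j,:}$ denotes the $j$th row of $\mathrm{MIHI}_j$. The smaller the orthogonality value, the better the orthogonality of the MIHI curve family.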
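The monotonicity function of this section is not reproduced in the text. A widely used choice in the prognostics literature, assumed here purely for illustration, counts increasing and decreasing steps of the curve and normalizes their difference to [0, 1]. The helper `row_monotonicity` applies it to row $j$ of $\mathrm{MIHI}_j = W F_j$; both names are hypothetical.

```python
import numpy as np

def monotonicity(hi):
    """Assumed score in [0, 1]: |#increasing steps - #decreasing steps| / (n - 1)."""
    d = np.diff(np.asarray(hi, dtype=float))
    return abs(int(np.sum(d > 0)) - int(np.sum(d < 0))) / d.size

def row_monotonicity(W, F_j, j):
    """Score row j of MIHI_j = W @ F_j, i.e., the curve meant to track fault type j."""
    return monotonicity((W @ F_j)[j])

print(monotonicity([1, 2, 3, 4]))     # 1.0
print(monotonicity([1, 2, 3, 2]))     # 0.3333333333333333
```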
In this illustrative example, there are three fault types (inner race, cage, and outer race); thus, $k = 3$. Assume that the first fault degradation dataset is bearing 2_1 in Table 1, which consists of 491 files; thus, $n_1 = 491$. The spectra of the first file's signal in bearings 2_1, 2_3, and 2_5 are $f_{1,1}$, $f_{2,1}$, and $f_{3,1}$, respectively. Our experiments show that even when $n_{nor}$ is set to a small value ($n_{nor} = 10$), the quasi-orthogonal sparse projection algorithm still gives the MIHI curve family good orthogonality; therefore, the selection of $n_{nor}$ is not studied further in this section. The resulting heatmaps of monotonicity are shown in Figure 6, and those of orthogonality are shown in Figure 7. Four $\alpha$ values are studied, i.e., 0, 0.4, 1, and 2; in each heatmap, eight $\beta$ values and eight $\gamma$ values are studied, giving 64 combinations. Figures 6 and 7 not only indicate the effect of the hyperparameters $\alpha$, $\beta$, and $\gamma$ on the monotonicity and orthogonality of the MIHI monitoring curve family but also offer a reference for selecting them. The hyperparameter $\alpha$ determines the monotonicity of the MIHI monitoring curve: as shown in Figure 6, the larger $\alpha$ is, the more monotonic the MIHI monitoring curve becomes. However, Figure 7 indicates that a large $\alpha$ decreases the orthogonality of the MIHI monitoring curve family. Considering all heatmaps in Figures 6 and 7, $\alpha = 1$ is selected.
When Properties 1-3 of the MIHI curve family are satisfied, the sparser the weight matrix $W$ is, the better overfitting is avoided. The heatmaps in Figures 6 and 7 show that $\beta$ should be no larger than 0.0002 and $\gamma/\beta$ no larger than 0.1; otherwise, the weight matrix $W$ becomes too sparse to map the spectrum features into a MIHI monitoring curve family with good monotonicity and orthogonality.
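The heatmap study can be organized as a simple grid search. In the sketch below, the four $\alpha$ values are those of the text, while the eight-value $\beta$ and $\gamma$ grids are hypothetical log-spaced placeholders, and `score_fn` is a hypothetical callback that would fit $W$ and score the resulting MIHI curve family. The range check encodes the selection rule for $\beta$ and $\gamma/\beta$, taken inclusively because the values used in the illustrative example ($\beta = 2\times10^{-4}$, $\gamma = 2\times10^{-5}$) lie on the boundary.

```python
import itertools
import numpy as np

ALPHAS = [0.0, 0.4, 1.0, 2.0]        # alpha values studied in the text
BETAS = np.logspace(-6, -2, 8)       # hypothetical 8-value beta grid
GAMMAS = np.logspace(-7, -3, 8)      # hypothetical 8-value gamma grid

def build_heatmaps(score_fn):
    """Return {alpha: 8 x 8 array of scores}, rows = beta, columns = gamma.

    score_fn(alpha, beta, gamma) -> float is a hypothetical callback that would
    fit W with these hyperparameters and score the resulting MIHI curve family.
    """
    maps = {}
    for a in ALPHAS:
        H = np.empty((len(BETAS), len(GAMMAS)))
        for (p, b), (q, g) in itertools.product(enumerate(BETAS), enumerate(GAMMAS)):
            H[p, q] = score_fn(a, b, g)
        maps[a] = H
    return maps

def in_recommended_region(beta, gamma, tol=1e-9):
    """beta <= 2e-4 and gamma / beta <= 0.1, inclusive up to a relative tol."""
    return beta <= 2e-4 * (1 + tol) and gamma <= 0.1 * beta * (1 + tol)

maps = build_heatmaps(lambda a, b, g: a + np.log10(b) - np.log10(g))  # toy score
print(len(maps), maps[1.0].shape)        # 4 (8, 8)
print(in_recommended_region(2e-4, 2e-5)) # True
```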
We illustrate the effect of the hyperparameters $\beta$ and $\gamma$ in Figure 8. Compared with the MIHI curve family obtained with $\beta = 2\times10^{-4}$ and $\gamma = 2\times10^{-5}$ (Figure 8(b)), the curve family obtained with $\beta = \gamma = 0$ (Figure 8(a)) has stronger monotonicity and orthogonality, and its weight matrix fits all spectrum components. However, bearing fault information is not distributed over all frequency bands, and the weight matrix should focus only on the fault-related spectrum components. The weight matrix calculated with $\beta = \gamma = 0$ also fits substantial noise, which reduces the generality of MIHI. As shown in Figure 8(c), if the weight matrix is too sparse, the MIHI monitoring curve family cannot satisfy Properties 1, 2, and 3. Thus, we recommend using hyperparameter heatmaps to select suitable hyperparameters when applying the proposed method.

Conclusions
This study proposed a method for building a MIHI to trace the degradation trends of bearings with various fault types. The proposed method is a linear transform algorithm that maps high-dimensional observation features into a low-dimensional MIHI indicating the degradation trends of the bearing's various fault types at the same time. Inspired by orthogonal vectors, we proposed using orthogonality to develop a health index for monitoring bearing degradation trends. The algorithm also introduces weight sparsity and weight difference sparsity to avoid overfitting. The proposed algorithm has explicit and simple mathematical expressions, and the calculation does not rely on complex optimization algorithms. Therefore, the method is suitable for situations with high-dimensional observation features.

Nomenclature
$k$: Number of fault types
$m$: Dimension of the spectrum feature
$\alpha$: MIHI balance parameter
$\beta$: MIHI sparsity penalty parameter
$\gamma$: MIHI weight difference sparsity penalty parameter
$n_{nor}$: Number of normal-state epochs of the bearing
$\eta$: Lagrangian penalty parameter used in the solution
$l$: Approximation parameter used in the solution
$w_{j,q}$: Entry in row $j$ and column $q$ of $W$
$W \in \mathbb{R}^{k\times m}$: Projection matrix of MIHI
$w_j \in \mathbb{R}^{1\times m}$: $j$th row of $W$
$F_j \in \mathbb{R}^{m\times n_j}$: Spectrum feature matrix of the $j$th fault type degradation data
$f_{j,i} \in \mathbb{R}^{m\times 1}$: Spectrum feature of the $i$th observation epoch of the $j$th fault type
$N_j \in \mathbb{R}^{m\times n_{nor}}$: Normal-state observation matrix of the $j$th fault type
$\mu_j \in \mathbb{R}^{m\times 1}$: Mean vector of $N_j$
$\sigma_j \in \mathbb{R}^{1\times m}$: Feature penalty vector of the $j$th row of the projection matrix
$A \in \mathbb{R}^{m\times m}$, $b \in \mathbb{R}^{1\times(m-1)}$, $c \in \mathbb{R}^{m\times 1}$, $d \in \mathbb{R}^{1\times m}$, $e \in \mathbb{R}^{1\times(m-1)}$, $Q \in \mathbb{R}^{m\times m}$, $G \in \mathbb{R}^{m\times m}$: Intermediate variables in the solving process
Data Availability
The data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that they have no conflicts of interest.