A Vibrational Signal Fault Diagnosis Rule Extraction Method Based on DST-ACI Discriminant Criterion



Introduction
The health condition of mechanical equipment can be evaluated by analyzing its vibration signals. Since a vibration signal is a vector composed of a large number of vibration amplitudes, it is difficult to observe the information hidden in it directly. Therefore, researchers usually preprocess the signal first and extract signal features that can be used for fault diagnosis. Then, based on these quantifiable features, the signal type and the health condition of the equipment are further discriminated. These features come from researchers' previous research experience and describe the volatility, periodicity, energy distribution, or waveform characteristics of the signal from different angles. After feature extraction, the relationship between the signal feature values and the fault type can be used to construct fault diagnosis rules of the form "when the value of a signal feature lies within a certain range interval, the equipment has a certain fault." Association rule mining and frequent pattern mining are important techniques for finding relationships between different events. Frequent pattern mining was first proposed by Agrawal et al. [1]. This technique first finds the frequent itemsets whose support is greater than the support threshold. Then, inferences are generated from the frequent itemsets, and the inferences whose confidence is greater than the confidence threshold are saved as association rules. Subsequently, Agrawal and Srikant proposed the well-known Apriori, AprioriTid, and AprioriHybrid algorithms [2]. Since then, researchers have developed a variety of improved Apriori-based algorithms. Park et al. proposed the DHP algorithm based on hashing [3]. Savasere et al. proposed the partition algorithm, which improves Apriori by partitioning the database [4,5]. Bayardo and Agrawal introduced random sampling into the Apriori algorithm [6].
By mining the frequent co-occurrence relationships between entities, researchers can study the implicit connections between seemingly unrelated entities in more depth [7][8][9]. The knowledge obtained can then be put into industrial or commercial applications [10][11][12]. Therefore, the two techniques have been widely used in many fields such as disease diagnosis, traffic management, weather forecasting, software testing, and commercial marketing [13][14][15][16][17][18][19].
Following the traditional association rule mining method based on the support-confidence-lift (SCL) discriminant framework, a fault diagnosis rule extraction method based on the same framework can be constructed. The first step is data preprocessing, which consists of feature extraction, feature state coding, and transactional dataset generation. Several nonrepeated feature state codes form a feature pattern, which represents the signals whose feature values all fall within the range intervals of the corresponding feature state codes. A feature pattern is judged to be a frequent feature pattern only when its support is greater than the support threshold. On this basis, an inference containing a frequent feature pattern is judged to be a correct fault diagnosis rule only when its confidence and lift are both greater than the set thresholds. In the SCL framework, support evaluates whether there are enough samples to support the conclusion, confidence evaluates the reliability of the inference, and lift evaluates the correlation between the pattern and the fault.
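The SCL framework described above can be illustrated with a minimal Python sketch. It assumes that each transaction is stored as a set of feature state codes together with its fault label; the function names `scl_metrics` and `is_scl_rule` and toy codes such as "A1" are hypothetical. Support is taken here as the fraction of transactions containing the pattern, and lift as confidence divided by the prior probability of the fault.

```python
from typing import FrozenSet, List, Tuple

# A transaction: (set of feature state codes, fault label); hypothetical layout.
Transaction = Tuple[FrozenSet[str], str]


def scl_metrics(transactions: List[Transaction], pattern: FrozenSet[str], fault: str):
    """Return (support, confidence, lift) of the inference pattern -> fault."""
    n = len(transactions)
    count_x = sum(1 for items, _ in transactions if pattern <= items)
    count_f = sum(1 for _, label in transactions if label == fault)
    count_xf = sum(1 for items, label in transactions
                   if pattern <= items and label == fault)
    support = count_x / n                                   # support of the pattern
    confidence = count_xf / count_x if count_x else 0.0     # P(fault | pattern)
    lift = confidence / (count_f / n) if count_f else 0.0   # confidence / P(fault)
    return support, confidence, lift


def is_scl_rule(metrics, min_support, min_confidence, min_lift):
    """Constant-threshold SCL check: every metric must exceed its threshold."""
    support, confidence, lift = metrics
    return support > min_support and confidence > min_confidence and lift > min_lift


data = [
    (frozenset({"A1", "B2"}), "imbalance"),
    (frozenset({"A1", "B2"}), "imbalance"),
    (frozenset({"A1", "B1"}), "misalignment"),
    (frozenset({"A2", "B2"}), "misalignment"),
]
m = scl_metrics(data, frozenset({"A1", "B2"}), "imbalance")
print(m)                              # (0.5, 1.0, 2.0)
print(is_scl_rule(m, 0.3, 0.8, 1.0))  # True
```

Here the pattern {A1, B2} always co-occurs with the imbalance label, so its confidence is 1.0 and its lift is 2.0, that is, the pattern doubles the probability of the imbalance fault relative to its base rate.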
To collect the vibration signals of the equipment, vibration acceleration sensors are installed on it. In general, key locations close to the core components of the equipment, or vulnerable points that are prone to damage and aging, are selected as sensor measurement points. However, in some environments, because of the design constraints of the industrial site or of the equipment specifications, the space around the equipment is small and the equipment is densely arranged. As a result, there is no room for the signal transmission cables and sensor protective covers on the casing or near the fault source. In this case, engineers can only install the sensors on the base or machine foot, where the sensors do not take up space and are easy to mount and dismount.
At present, few scholars in the research literature on vibration-based fault diagnosis specifically study the sensor installation location. In the related literature, the sensors are all installed at the key positions mentioned above [20][21][22][23][24][25][26][27][28][29][30]. For example, to monitor bearings, sensors are often installed in the radial and vertical directions of the bearing seat; to monitor shaft alignment, sensors are often installed near the coupling. Almost no researchers place measuring points specifically on the machine feet. Furthermore, in many standard datasets commonly used in fault diagnosis, the sensors were installed at conventional measuring points rather than on the machine feet [31][32][33][34][35][36]. In addition, the literature that considers severe background noise interference only introduces application scenarios with severe noise and does not address the specific sensor placement [37][38][39][40][41][42][43]. Therefore, there is almost no research on fault diagnosis based on machine foot vibration signals.
This special engineering application scenario brings the following three inconveniences.
(1) In different measurements, the values of the same feature calculated from signals under the same working condition differ. Because of noise interference, these values are distributed over a range interval whose width is random. When the sensor is placed on the machine foot, the measuring position is far from the fault source, so the noise interference is more serious. In the data preprocessing stage, the continuous signal feature values must be mapped to unique codes representing different range intervals. If the range intervals are divided by traditional methods such as equal-width or equal-density discretization, signal feature values of different fault types may be lumped into one category and mapped to the same feature state code. As a result, many transactions in which the fault type does not match the feature state codes will be generated, and the credibility of the extracted rules will be reduced.
(2) Because the machine foot vibration signal suffers from strong noise interference, the noise features and the fault features that reflect the fault characteristics are mixed together. Under these circumstances, it is very likely that redundant patterns with no reference value will be extracted. A redundant pattern is a pattern that contains a large number of noise feature states. Noise features are widely distributed among the features of all types of signals. Therefore, if a pattern contains many noise feature states, its support will be larger, and the traditional algorithm will misjudge this frequently occurring pattern as a frequent feature pattern, although in fact its value is low.
(3) In the traditional frameworks, lift is introduced to further evaluate the correlation between the feature pattern and the fault. Lift is a metric that describes how much a pattern raises the occurrence probability of a fault. By introducing lift, the so-called redundant rules (the contradictory "mutual exclusion rules") can be prevented from being extracted. To further eliminate the influence of accidental factors, especially when the background noise interference at the machine foot measurement position is strong, the user often hopes to extract fault diagnosis rules with a higher lift.
For the above reasons, this paper proposes a fault diagnosis rule extraction method based on the Dynamic Support Threshold and Association Coefficient Interestingness (DST-ACI) discriminant criterion for signals from machine foot measuring points. The main innovations of the proposed method are as follows.
(1) To divide the range of each feature into intervals, and to map densely clustered feature values that may represent the same fault into the same range interval as far as possible, a feature state coding method based on K-means clustering is proposed. This method uses K-means clustering to group densely clustered values of the same feature into one cluster, and each cluster represents a subrange of the feature value range. Feature values assigned to the same cluster receive the feature state code representing that cluster. The transactional dataset is then generated from the feature state coding results.
This method takes into account the imbalance of the signal feature value distribution. Closely clustered feature values are mapped to the same range interval as far as possible, so that their feature state codes are also the same as far as possible. In this way, the match between the feature states and the fault type in each generated transaction is improved, which in turn improves the credibility of the extracted fault diagnosis rules.
(2) To reduce the number of extracted redundant patterns, a frequent feature pattern mining method based on the dynamic support threshold (DST) discriminant criterion is proposed. During frequent feature pattern mining, the DST of each candidate feature pattern is adjusted dynamically according to the occurrence frequencies of the feature states in the pattern.
Since noise features are widely distributed in all types of signals, the DST of a pattern that contains many noise feature states will be large. Therefore, redundant patterns containing many noise feature states are filtered out to a certain extent. On the other hand, if a pattern contains many fault feature states, whose occurrence frequencies are relatively low, its DST becomes smaller. With this discriminant criterion, feature patterns that occur relatively rarely but whose feature states co-occur strongly can be extracted as far as possible.
(3) To meet users' need to further improve the credibility of the extracted rules, a fault diagnosis rule extraction method based on the association coefficient interestingness (ACI) discriminant criterion is proposed. This method introduces a new metric, the association coefficient interestingness (ACI), through which the user can intuitively judge the correlation between a feature pattern and a fault. In addition, users can set the interestingness threshold between 0 and 1 to keep only the fault diagnosis rules with higher lift, thereby further improving the credibility of the extracted rules.
Furthermore, the concept of the minimum lift function is introduced to measure the reliability of different fault diagnosis rule discriminant criteria. It is proved mathematically that, under the same threshold conditions, the minimum lift function of the ACI-based criterion is higher than those of two existing criteria. Moreover, the user can raise the minimum lift function to any desired level by adjusting the association coefficient in ACI. Therefore, by adjusting the association coefficient, the ACI-based criterion can always be given a larger minimum lift function than any other criterion. The fourth section mainly introduces the algorithm flow for extracting fault diagnosis rules. The logical and technical relationship between the three innovations is shown in Figure 1.

Feature State Coding Method Based on K-Means Clustering.
This chapter describes in detail the whole data preprocessing process using the cluster-based feature state coding method. The first part introduces the commonly used signal features and their calculation. The second part introduces the feature state coding method based on K-means clustering, in which the K-means algorithm divides the range of each feature into intervals and the feature values falling in the same interval are mapped to the feature state code representing that interval. The third part introduces the process of generating the total fault transactional dataset from the clustering results. The three parts of this chapter and the logical relationship between them are shown in Figure 2.
The first step of feature state coding is to extract the feature parameter values of each sample signal. Signal features reflect different characteristics of the signal [44][45][46][47][48][49]. The commonly used signal features are shown in Table 1.
Suppose the range of a feature parameter $P_f^n(m)$ is $\Psi_m$ and it needs to be divided into $L_m$ range intervals $\Psi_m^l$; then
$$\Psi_m = \bigcup_{l=1}^{L_m} \Psi_m^l, \qquad \Psi_m^{i} \cap \Psi_m^{j} = \varnothing \quad (i \neq j).$$
In traditional methods, equal-width or equal-density discretization is often used for coding. Equal-width discretization divides the feature value range into several intervals of equal length, whereas equal-density discretization keeps the number of feature values in each interval the same. The feature values falling in the same range interval are then uniformly mapped to the unique code representing that interval. These two coding methods suit situations where the data are uniformly distributed and change continuously or gently. However, if the feature values of the sample signals are imbalanced, these methods are likely to place feature values that belong to different faults into the same range interval.
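The two traditional discretization methods can be contrasted with a short NumPy sketch. It is illustrative only; the helper names and the toy values are hypothetical. A single outlying value stretches the range, so equal-width discretization places five of the six values in one interval, while equal-density discretization splits them evenly regardless of where the natural gaps in the data lie.

```python
import numpy as np


def equal_width_codes(values, n_bins):
    """Map each value to one of n_bins intervals of equal length."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Use only the interior edges so that the maximum falls in the last interval
    return np.digitize(values, edges[1:-1])


def equal_density_codes(values, n_bins):
    """Map each value to one of n_bins intervals holding roughly equal counts."""
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))
    return np.digitize(values, edges[1:-1])


v = [0.0, 0.1, 0.2, 0.9, 1.0, 10.0]
print(equal_width_codes(v, 2).tolist())    # [0, 0, 0, 0, 0, 1]
print(equal_density_codes(v, 2).tolist())  # [0, 0, 0, 1, 1, 1]
```

Neither rule looks at where the values actually cluster, which is the weakness the K-means-based coding is meant to address.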
In the industrial scenario studied in this paper, the signal collected at the machine foot measuring point suffers from more serious noise interference. Therefore, the distribution of the feature values is more imbalanced, and feature values that do not belong to the same fault are likely to be classified into one category.
The fault type and the feature state codes will then mismatch in many transactions, which seriously affects the credibility of the extracted fault diagnosis rules.
For this reason, this paper uses the K-means clustering algorithm to find the optimal division of each feature range.
For the m-th signal feature $P_m$, the flowchart of the feature state coding algorithm based on K-means clustering is shown in Figure 3. In this paper, the number of clusters k equals the number of fault types to be classified, and each cluster center is the average of the feature values of the samples assigned to that cluster.
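The clustering loop for a single feature (assign each value to its nearest center, move each center to the mean of its cluster, stop when nothing is reassigned) can be sketched in Python as follows. The sketch assumes one-dimensional feature values, initial centers at evenly spaced quantiles, and codes renumbered so that code 0 is the lowest value range; these choices, the function names, and the code format "P<m>_<cluster>" are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np


def kmeans_1d(values, k, max_iter=100):
    """1-D K-means: returns (cluster code per value, sorted cluster centers)."""
    x = np.asarray(values, dtype=float)
    # Assumed initialization: centers at evenly spaced quantiles of the data
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    labels = None
    for _ in range(max_iter):
        # Step 1: assign every feature value to its nearest cluster center
        new_labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # no value was reassigned, so the centers no longer move
        labels = new_labels
        # Step 2: move each center to the mean of its members (empty clusters stay put)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean()
    # Renumber clusters so that codes increase with the center value
    order = np.argsort(centers)
    rank = np.empty(k, dtype=int)
    rank[order] = np.arange(k)
    return rank[labels], centers[order]


def feature_state_codes(values, k, m):
    """Hypothetical code format 'P<m>_<cluster>' for the m-th feature."""
    codes, _ = kmeans_1d(values, k)
    return [f"P{m}_{c}" for c in codes]


values = [0.10, 0.12, 0.11, 0.50, 0.52, 0.90, 0.95]
print(feature_state_codes(values, k=3, m=10))
# ['P10_0', 'P10_0', 'P10_0', 'P10_1', 'P10_1', 'P10_2', 'P10_2']
```

Unlike the fixed-width or fixed-count rules, the interval boundaries here follow the gaps between groups of values, so the three tight groups receive three distinct codes.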
It should be noted that no feature coding method is completely accurate, and feature values belonging to different faults may still be placed in the same range interval. However, if a coding method fully considers the distribution of the feature values, the fault type and the feature state codes in the same transaction will match better, and the fault diagnosis rules extracted with this coding method will generally be more accurate. Overall, the metrics of the extracted rules will be better than those obtained with discretization methods that ignore the feature value distribution. Therefore, the performance of different discretization methods can be compared through the overall level of the metrics of the extracted rules.
Finally, all the feature values $P_f^n(m)$ of the different sample signals are divided into k parts.
This paper takes the output as the final optimal range interval division, and the values of P

Frequent Feature Pattern Mining Method Based on the Dynamic Support Threshold Discriminant Criterion.
After the total fault transactional dataset is generated, frequent feature patterns can be mined. This chapter is divided into two sections. The first section introduces the concept and calculation of the dynamic support threshold; then, the frequent feature pattern mining method based on the DST discriminant criterion and its advantages are analyzed. The second section proves mathematically that the DST criterion has the downward closure property.
Table 1: Commonly used signal features. Recoverable entries include the root mean square, the shape factor $P_{11} = P_{10}/P_5$, the crest factor $P_{12} = P_1/P_{10}$, the impulse factor $P_{13} = P_1/P_5$, the square root amplitude, $P_{15} = P_1/P_{14}$, the coefficient of variation, and the main frequency band position. In the table, N is the number of sampling points, $x_i$ is the amplitude of the i-th sampling point, k is the number of frequency-domain sequence points, $f_k$ is the k-th frequency value, and $y_k$ is the spectral value corresponding to $f_k$.
Shock and Vibration 5
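The ratio-type features listed in Table 1 can be computed from a sampled signal as sketched below in Python. Only the ratios $P_{11}$, $P_{12}$, $P_{13}$, and $P_{15}$ are taken from the table; treating $P_1$ as the peak value, $P_5$ as the mean absolute value, $P_{10}$ as the root mean square, and $P_{14}$ as the square root amplitude follows the usual textbook definitions and is an assumption here, as is the function name.

```python
import numpy as np


def ratio_features(x):
    """Dimensionless time-domain features; P1, P5, P10, P14 use assumed standard definitions."""
    x = np.asarray(x, dtype=float)
    p1 = np.max(np.abs(x))                   # peak value (assumed meaning of P1)
    p5 = np.mean(np.abs(x))                  # mean absolute value (assumed meaning of P5)
    p10 = np.sqrt(np.mean(x ** 2))           # root mean square
    p14 = np.mean(np.sqrt(np.abs(x))) ** 2   # square root amplitude
    return {
        "P11": p10 / p5,   # shape factor
        "P12": p1 / p10,   # crest factor
        "P13": p1 / p5,    # impulse factor
        "P15": p1 / p14,   # P1 / P14 as listed in Table 1
    }


t = np.arange(1000) / 1000.0        # 1 s sampled at 1 kHz
sine = np.sin(2 * np.pi * 5 * t)    # five full periods
print(round(ratio_features(sine)["P12"], 4))  # 1.4142, the crest factor of a sine
```

Because these features are ratios, they do not depend on the absolute signal amplitude, which is one reason they are popular for comparing signals recorded under different gains.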

Frequent Feature Pattern Mining Based on DST Discriminant Criterion.
The dynamic support threshold DST(X) of pattern X is determined by two thresholds: the minimum support threshold $\theta_{\min}$ set by the user and the average threshold $\theta_{avg}(X)$ of pattern X. The two thresholds are calculated as follows:
$$\theta_{\min} = \alpha \cdot \mathrm{count}(T), \qquad \theta_{avg}(X) = \beta \cdot \mathrm{avg}(X),$$
where count(T) is the number of transactions in the total fault transactional dataset T, avg(X) is the mean occurrence frequency of all elements in pattern X, $0 < \alpha < 1$ is the support coefficient, and $0 < \beta < 1$ is the average coefficient. Pattern X is judged to be a frequent feature pattern only when its support satisfies
$$\mathrm{count}(X) \geq \mathrm{DST}(X),$$
where count(X) is the occurrence frequency of feature pattern X in dataset T. The traditional constant-support criterion emphasizes the "absolute occurrence frequency" of a feature pattern, whereas the DST-based criterion emphasizes the "relative co-occurrence frequency" of the feature states in a pattern. The DST-based criterion has the following advantages:
(1) The support coefficient α specifies the minimum proportion of transactions containing X among all transactions in the total fault transactional dataset T. Therefore, the minimum support threshold $\theta_{\min}$ ensures that the occurrence frequency of frequent feature patterns meets the bottom line of users' expectations.
(2) The average threshold $\theta_{avg}(X)$ is adjusted dynamically according to the occurrence frequency of each item in pattern X. If the items in X generally occur frequently, avg(X) and $\theta_{avg}(X)$ increase; otherwise, they decrease. If most of the feature states in a candidate feature pattern come from noise, they appear frequently in the total fault transactional dataset, so the DST of this candidate pattern increases and more redundant patterns are filtered out. Conversely, when most of the feature states in a candidate pattern come from fault signals, they occur less frequently in the total fault transactional dataset than noise feature states do, so the DST of this pattern is relatively low.
In other words, the average threshold $\theta_{avg}(X)$ ensures that the feature states in X co-occur strongly: they tend to occur together and rarely occur separately.
(3) The DST-based discriminant criterion has the downward closure property, so an Apriori-like algorithm can be used to mine frequent feature patterns. The detailed proof is given in Section 2.2.2.
Figure 3: Flowchart of the feature state coding algorithm based on K-means clustering (input includes the number of clusters k; Step 1: compute the distance between each sample's feature value and each cluster center, and assign the value to the nearest center; Step 2: take the centroid of the feature values in each cluster as the new cluster center; repeat with u = u + 1 while feature values are reassigned or cluster centers change; otherwise, output the clustering results).
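A minimal Python sketch of the DST check follows. The combination rule DST(X) = max(θ_min, θ_avg(X)) is an assumption of this sketch, chosen so that θ_min acts as a floor as described in advantage (1); the toy transactions, the item names "N", "F1", "F2", "G", and the function names are hypothetical. Item "N" stands for a noise feature state that appears in every transaction.

```python
from collections import Counter


def item_counts(transactions):
    """Occurrence frequency of each feature state code in the dataset T."""
    return Counter(item for t in transactions for item in set(t))


def dst(pattern, counts, n_transactions, alpha, beta):
    theta_min = alpha * n_transactions                      # user-set floor
    avg = sum(counts[i] for i in pattern) / len(pattern)    # avg(X)
    theta_avg = beta * avg                                  # average threshold
    return max(theta_min, theta_avg)  # ASSUMED way of combining the two thresholds


def is_frequent(pattern, transactions, counts, alpha, beta):
    count_x = sum(1 for t in transactions if set(pattern) <= set(t))
    return count_x >= dst(pattern, counts, len(transactions), alpha, beta)


T = [{"N", "F1", "F2"}] * 4 + [{"N", "G"}] * 6
counts = item_counts(T)
for pattern in [{"F1", "F2"}, {"N", "G"}, {"N", "F1"}]:
    print(sorted(pattern), is_frequent(pattern, T, counts, alpha=0.2, beta=0.8))
```

Although {N, G} occurs six times and {F1, F2} only four times, the sketch rejects the noise-dominated pattern because its average threshold 0.8 × 8 = 6.4 exceeds its count, while the fault pattern passes against a threshold of 0.8 × 4 = 3.2. A constant support threshold of 2 would have accepted both.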

Downward Closure Property Proof of DST-Based Discriminant Criterion
Theorem 1. The DST-based discriminant criterion has the downward closure property: for patterns $Y \subseteq X$, if Y is not a frequent feature pattern, then X is not a frequent feature pattern either.

Proof.
Suppose that $Y \subseteq X$. A feature pattern containing k feature states is called a feature k-pattern. Suppose that $Y = \{x_1, x_2, \ldots, x_k\}$ is a feature k-pattern. Generating high-dimensional patterns from low-dimensional ones is a process of repeatedly adding new items to the patterns. Therefore, to generate the high-dimensional superpattern X from the low-dimensional pattern Y, first add an item $x_{Y_1-Y}$ to Y to generate the candidate feature (k + 1)-pattern $Y_1 = Y \cup \{x_{Y_1-Y}\}$.
In the feature state coding process, the feature states have been sorted and recoded according to their occurrence frequencies. In addition, when two frequent feature k-patterns are joined to generate a candidate feature (k + 1)-pattern, only a feature state with a larger code may become a suffix of a feature state with a smaller code. Therefore, the occurrence frequency $\mathrm{count}(x_j)$ of any item $x_j$ in pattern Y is not greater than the occurrence frequency $\mathrm{count}(x_{Y_1-Y})$ of the newly added item, so the following inequality holds:
$$\sum_{j=1}^{k} \mathrm{count}(x_j) \leq k \cdot \mathrm{count}(x_{Y_1-Y}).$$
Adding $k\sum_{j=1}^{k}\mathrm{count}(x_j)$ to both sides and dividing by $k(k+1)$, the inequality becomes
$$\mathrm{avg}(Y) = \frac{1}{k}\sum_{j=1}^{k} \mathrm{count}(x_j) \leq \frac{1}{k+1}\left(\sum_{j=1}^{k} \mathrm{count}(x_j) + \mathrm{count}(x_{Y_1-Y})\right) = \mathrm{avg}(Y_1).$$
Therefore, the average thresholds of patterns Y and $Y_1$ satisfy
$$\theta_{avg}(Y) = \beta\,\mathrm{avg}(Y) \leq \beta\,\mathrm{avg}(Y_1) = \theta_{avg}(Y_1).$$
By repeatedly adding a single item to pattern Y, the higher-dimensional superpatterns $Y_r$ (r = 1, 2, ..., n) are generated one after another until pattern X is obtained. By the same argument, the occurrence frequency of each newly added feature state code satisfies the recurrence inequality
$$\mathrm{count}(x_{Y_r-Y_{r-1}}) \geq \mathrm{count}(x) \quad \forall x \in Y_{r-1}, \qquad Y_0 = Y,\; Y_n = X,$$
where $\mathrm{count}(x_{A-B})$ represents the occurrence frequency of the new item added to pattern B to generate pattern A. Therefore, the following conclusion holds:
$$\theta_{avg}(Y) \leq \theta_{avg}(Y_1) \leq \cdots \leq \theta_{avg}(X).$$
Since $\theta_{\min}$ is a constant minimum support threshold set by the user, it follows that $\mathrm{DST}(Y) \leq \mathrm{DST}(X)$. Finally, because every transaction containing X also contains Y, if Y is not a frequent feature pattern, then
$$\mathrm{count}(X) \leq \mathrm{count}(Y) < \mathrm{DST}(Y) \leq \mathrm{DST}(X),$$
so X is not a frequent feature pattern either. In summary, the DST-based discriminant criterion has the downward closure property.
This chapter includes three subsections. First, a function called the minimum lift function is introduced to evaluate the minimum credibility of the fault diagnosis rules extracted by different correlation discriminant criteria. Second, a new metric for evaluating the correlation between patterns and faults, called the association coefficient interestingness (hereinafter ACI), is proposed.
This metric maps the range of lift to the interval (−1, 1), allowing users to set an interestingness threshold that meets their expectations to further filter out more reliable fault diagnosis rules. Third, it is proved mathematically that, given the same threshold, the minimum lift function of the ACI-based discriminant criterion is greater than those of the two existing discriminant criteria; therefore, the extracted fault diagnosis rules are more credible.

Minimum Lift Function. Suppose that the correlation discriminant criterion between the pattern X and the fault f is $\mathrm{interest}(X \longrightarrow f) \geq \theta_I$, where interest(X ⟶ f) is a metric function measuring the correlation between pattern X and fault f, called the interestingness function, and $0 < \theta_I < 1$ is the interestingness threshold.
When the interestingness of rule $X_L \longrightarrow f_L$ is exactly equal to $\theta_I$, this rule has the lowest interestingness among all the rules that can be extracted by the interestingness discriminant criterion. Therefore, $X_L \longrightarrow f_L$ can be considered the least credible of all the rules that can be extracted. The lift of rule $X_L \longrightarrow f_L$ can be calculated as
$$\mathrm{lift}(X_L \longrightarrow f_L) = \frac{\mathrm{confidence}(X_L \longrightarrow f_L)}{P(f_L)} = \frac{\mathrm{count}(X_L \cup f_L)\cdot \mathrm{count}(T)}{\mathrm{count}(X_L)\cdot \mathrm{count}(f_L)}.$$
It is easy to prove that when lift(X ⟶ f) itself is used as interest(X ⟶ f), then $\mathrm{lift}(X_L \longrightarrow f_L) = \theta_I$. Bao and Zhang each proposed an improved interestingness discriminant criterion [50,51]. For these two criteria, $\mathrm{lift}(X_L \longrightarrow f_L)$ is the same and can be written as a single function of $\theta_I$. In general, the lift of the least credible rule $X_L \longrightarrow f_L$ is a function of the interestingness threshold $\theta_I$. Therefore, the following definition is made: when the discriminant criterion $\mathrm{interest}(X \longrightarrow f) \geq \theta_I$ is used to determine the correlation between pattern X and fault f, the lift of the rule $X_L \longrightarrow f_L$ with the smallest interestingness among all extractable rules is defined as the minimum lift function $\mathrm{MLF}(\theta_I)$ of the criterion:
$$\mathrm{MLF}(\theta_I) = \mathrm{lift}(X_L \longrightarrow f_L).$$
As explained above, lift measures how much pattern X raises the occurrence probability of fault f. Under the same threshold, the greater the $\mathrm{MLF}(\theta_I)$ of a discriminant criterion, the greater the lift of the least reliable fault diagnosis rule that can be extracted with it. Therefore, the rules extracted with this criterion will generally have higher lift and, overall, higher credibility.
Existing research shows that, under the same threshold condition, the minimum lift functions of the two discriminant criteria in [50,51] are greater than that of the lift-based criterion.

Fault Diagnosis Rule Extraction Based on ACI Discriminant Criterion.
The Association Coefficient Interestingness (ACI) proposed in this paper is defined on the basis of lift(X ⟶ f), where p > 1 is the association coefficient and p ≡ 1 (mod 2), that is, p is an odd integer.
Only when the confidence and the ACI of inference X ⟶ f meet the following two conditions can X ⟶ f be determined to be a fault diagnosis rule:
$$\mathrm{confidence}(X \longrightarrow f) \geq \theta_C, \qquad \mathrm{ACI}(X \longrightarrow f) \geq \theta_I,$$
where $\theta_I$ is the minimum interestingness threshold and $\theta_C$ is the minimum confidence threshold. This discriminant method has the following advantages:
(1) ACI(X ⟶ f) has the same monotonicity as lift(X ⟶ f). In addition, the correlation between feature pattern X and fault f can be judged directly from the sign of ACI(X ⟶ f); the conclusion is compatible with lift(X ⟶ f), and the judgment is intuitive and simple. When ACI(X ⟶ f) > 0, lift(X ⟶ f) > 1, and pattern X and fault f are positively correlated. When ACI(X ⟶ f) = 0, lift(X ⟶ f) = 1, and X and f are independent of each other. When ACI(X ⟶ f) < 0, lift(X ⟶ f) < 1, and X and f are negatively correlated.
(2) ACI maps lift to the range (−1, 1). Therefore, the user can set an interestingness threshold in (−1, 1) to further filter out fault diagnosis rules with higher credibility, which removes the difficulty of setting a proper threshold for the unbounded lift.
(3) Under the same threshold condition, the $\mathrm{MLF}(\theta_I)$ of the ACI-based discriminant criterion is greater than those of the criteria proposed in [50,51]. The user can adjust the association coefficient in ACI to change the rate of change of ACI and raise the $\mathrm{MLF}(\theta_I)$ of the ACI-based criterion to a sufficiently high level according to actual needs. The next section gives the proof of this conclusion.

The Property of ACI and Its Proof
Conclusion 1. Under the same threshold conditions, the $\mathrm{MLF}(\theta_I)$ of the ACI-based discriminant criterion is larger than those of the criteria proposed in [50,51].
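Proof. When pattern X and fault f are positively correlated, lift(X ⟶ f) > 1, and the following inequality holds:
$$\mathrm{ACI}(X \longrightarrow f) > 0.$$
Suppose there is a fault diagnosis rule $X_L \longrightarrow f_L$ whose ACI is exactly equal to $\theta_I$, that is,
$$\mathrm{ACI}(X_L \longrightarrow f_L) = \theta_I.$$
Solving this equation for $\mathrm{lift}(X_L \longrightarrow f_L)$ gives the $\mathrm{MLF}(\theta_I)$ of the ACI-based criterion. If $0 < \theta_I < 1$ and p ≡ 1 (mod 2), then $0 < \sqrt[p]{\theta_I} < 1$. Construct a function g(x) such that the $\mathrm{MLF}(\theta_I)$ of the ACI-based criterion equals $g(\sqrt[p]{\theta_I})$ and that of the criteria in [50,51] equals $g(\theta_I)$. On $x \in (0, 1)$, g(x) is monotonically increasing; therefore, if $0 < x_1 < x_2 < 1$, then $g(x_1) < g(x_2)$. When p > 1, $\sqrt[p]{\theta_I} > \theta_I$, and because g(x) is monotonically increasing, $g(\sqrt[p]{\theta_I}) > g(\theta_I)$. Therefore, the following inequality is obtained:
$$\mathrm{MLF}_{\mathrm{ACI}}(\theta_I) = g(\sqrt[p]{\theta_I}) > g(\theta_I) = \mathrm{MLF}_{[50,51]}(\theta_I).$$
It can be seen that the minimum lift function $\mathrm{MLF}(\theta_I)$ of the ACI-based criterion proposed in this paper is larger than those of the criteria proposed in [50,51].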
In addition, the greater the association coefficient p, the greater the minimum lift function $\mathrm{MLF}(\theta_I)$. If $p \longrightarrow +\infty$, then $\sqrt[p]{\theta_I} \longrightarrow 1$, and the following conclusion holds:
$$\lim_{p \to +\infty} \mathrm{MLF}(\theta_I) = +\infty.$$
Therefore, the rate of change of ACI and the minimum lift function $\mathrm{MLF}(\theta_I)$ can be adjusted through the association coefficient p according to users' expectations, so that the credibility of the extracted rules can be raised to any desired level. In other words, by adjusting the association coefficient, the ACI-based criterion can always be given a larger minimum lift function than any other criterion.
This advantage guarantees the flexibility and versatility of the ACI-based criterion.
In addition, because p is odd (p ≡ 1 (mod 2)), the monotonicity and the sign of ACI(X ⟶ f) do not change for different association coefficients p.
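The mechanism behind Conclusion 1 can be illustrated in Python with a hypothetical stand-in for ACI. The sketch assumes the mapping h(lift) = (lift − 1)/(lift + 1), which sends lift to (−1, 1) and is positive exactly when lift > 1, and raises it to an odd power p. This is not the paper's ACI formula. Solving h(lift)^p = θ_I gives the least credible lift g(θ_I^{1/p}) with g(x) = (1 + x)/(1 − x), which is increasing on (0, 1); p = 1 serves only as a baseline in this illustration.

```python
def aci_standin(lift, p):
    """Hypothetical ACI-like score: ((lift - 1) / (lift + 1)) ** p with odd p."""
    if p < 1 or p % 2 == 0:
        raise ValueError("p must be an odd positive integer")
    return ((lift - 1.0) / (lift + 1.0)) ** p


def mlf_standin(theta, p):
    """Lift of the least credible rule, i.e. the lift at which aci_standin == theta."""
    r = theta ** (1.0 / p)         # p-th root of the threshold, 0 < r < 1
    return (1.0 + r) / (1.0 - r)   # g(r), increasing on (0, 1)


theta = 0.2
for p in (1, 3, 5, 7):
    print(p, round(mlf_standin(theta, p), 3))
```

With the threshold fixed at 0.2, the printed minimum lift grows with p, and because p is odd the sign of `aci_standin` always follows lift − 1, in line with the properties stated above.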

Fault Diagnosis Rule Extraction Algorithm Based on DST-ACI Discriminant Criterion.
Combining the contents of the above three subsections, inference X ⟶ f is judged to be a correct fault diagnosis rule only when it meets the following three conditions:
$$\mathrm{count}(X) \geq \mathrm{DST}(X), \qquad \mathrm{confidence}(X \longrightarrow f) \geq \theta_C, \qquad \mathrm{ACI}(X \longrightarrow f) \geq \theta_I.$$
If inference X ⟶ f is judged to be a correct fault diagnosis rule, the fault diagnosis rule "if feature pattern X appears, the equipment fault type is determined to be f" is established. The algorithm used in this paper is referred to as the DST-ACI-apriori algorithm; its detailed process is shown in Figure 4.
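An end-to-end Python sketch of the three conditions, following the join and prune steps of an Apriori-like search, is given below. It rests on two assumptions: DST(X) = max(α·count(T), β·avg(X)), and an interestingness function supplied by the caller in place of ACI (the example uses the hypothetical stand-in (lift − 1)/(lift + 1)). Items are recoded in ascending order of occurrence frequency, as in the proof of Theorem 1. Function and variable names are illustrative.

```python
from collections import Counter
from itertools import combinations


def dst_aci_apriori(transactions, labels, alpha, beta, min_conf, min_interest, interest):
    """Return (frequent feature patterns, rules).

    transactions: list of sets of feature state codes; labels: fault type per transaction.
    interest: callable mapping a lift value to an interestingness score (stand-in for ACI).
    """
    T = [frozenset(t) for t in transactions]
    n = len(T)
    counts = Counter(item for t in T for item in t)
    # Recode items in ascending order of occurrence frequency (ties broken by name)
    order = sorted(counts, key=lambda i: (counts[i], i))
    rank = {item: r for r, item in enumerate(order)}

    def support(pattern):
        return sum(1 for t in T if set(pattern) <= t)

    def dst(pattern):
        avg = sum(counts[i] for i in pattern) / len(pattern)
        return max(alpha * n, beta * avg)  # ASSUMED combination of the two thresholds

    # Frequent 1-patterns, stored as tuples ordered by item code
    level = [(i,) for i in order if counts[i] >= dst((i,))]
    frequent = list(level)
    while level:
        candidates = []
        for a, b in combinations(level, 2):
            # Join step: the first k-1 items must match (always true when k = 1)
            if a[:-1] == b[:-1]:
                candidates.append(tuple(sorted(set(a) | set(b), key=rank.get)))
        # Prune step: keep candidates whose support reaches their DST
        level = [c for c in candidates if support(c) >= dst(c)]
        frequent.extend(level)

    # Rule extraction: confidence and interestingness thresholds
    fault_counts = Counter(labels)
    rules = []
    for pattern in frequent:
        covered = [lab for t, lab in zip(T, labels) if set(pattern) <= t]
        for fault, count_xf in Counter(covered).items():
            confidence = count_xf / len(covered)
            lift = confidence / (fault_counts[fault] / n)
            if confidence >= min_conf and interest(lift) >= min_interest:
                rules.append((pattern, fault, round(confidence, 3), round(lift, 3)))
    return frequent, rules


T = [{"N", "F1", "F2"}] * 4 + [{"N", "G"}] * 6
y = ["outer_ring"] * 4 + ["imbalance"] * 6
frequent, rules = dst_aci_apriori(T, y, alpha=0.2, beta=0.8, min_conf=0.8,
                                  min_interest=0.2,
                                  interest=lambda lift: (lift - 1) / (lift + 1))
for rule in rules:
    print(rule)
```

In this toy run the noise item "N" survives as a frequent 1-pattern but yields no rule, because it appears under both fault labels and fails the confidence test, while the pattern {F1, F2} is kept and mapped to the outer ring fault.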

Experiments
This chapter includes four sections. The first section introduces the fault simulation experiment equipment and the parts used to simulate the faults. The second section introduces the data format of the collected vibration signals and the parameter values used in the algorithm. The third section presents the experimental results in detail. In the fourth section, using the same experimental data as input, the performance of the DST-ACI method is compared with that of other existing fault diagnosis rule extraction methods.

Fault Simulation Experiment Equipment.
The fault simulation experiment platform is shown in Figure 5. It is composed of a drive motor, a conveyor belt, a rotating shaft, a shafting aluminum plate, a bearing seat, a movable base, and other parts. Three magnetic vibration acceleration sensors are used to collect the signals. The sensitivity of each acceleration sensor is 100 mV/g, the frequency response range is 0.1 Hz∼10 kHz, and the measuring range is −10 g∼+10 g. The data acquisition equipment used in the experiment is shown in Figure 6. It has a 4-core 1.91 GHz CPU, a 1 gigabit Ethernet port, and 4 GB of RAM.
The acceleration monitoring module of the data acquisition equipment has a 4-channel ICP sensor input, 24-bit precision, and a sampling rate selectable from 2.56 kHz to 256 kHz. The computer used to save the data is shown in Figure 7.
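Converting a recorded sensor voltage to acceleration with the stated sensitivity is simple arithmetic, sketched below in Python. The function name is hypothetical, and the range check and the use of standard gravity (9.80665 m/s² per g) are illustrative choices; with 100 mV/g, the full −10 g∼+10 g range corresponds to an output of −1 V∼+1 V.

```python
SENSITIVITY_V_PER_G = 0.100   # 100 mV/g, as specified for the acceleration sensor
RANGE_G = 10.0                # measuring range of -10 g to +10 g
STANDARD_GRAVITY = 9.80665    # m/s^2 per g


def volts_to_acceleration(volts):
    """Convert a sensor output voltage to acceleration in g and in m/s^2."""
    accel_g = volts / SENSITIVITY_V_PER_G
    if abs(accel_g) > RANGE_G:
        raise ValueError(f"{accel_g:.2f} g is outside the ±{RANGE_G:.0f} g measuring range")
    return accel_g, accel_g * STANDARD_GRAVITY


print(volts_to_acceleration(0.25))   # 0.25 V corresponds to 2.5 g
```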
In the experiment, four typical faults, namely, rotor imbalance, shafting misalignment, bearing inner ring fault, and bearing outer ring fault, are simulated by different fault simulation methods.

Imbalance Fault Simulation Experiment.
A threaded hole is provided on the shafting aluminum plate of the fault simulation experiment platform. An imbalanced counterweight bolt is screwed into this threaded hole to simulate the rotor imbalance fault. The imbalanced counterweight bolt and its installation method are shown in Figure 8, and the sensor installation positions are shown in Figure 9. No.1 and No.2 measuring points are in direct contact with the shafting aluminum plate and the rotating shaft, so the signals collected at these two positions are clearer. No.3 measuring point is located on the base of the experiment platform and has no direct contact with the shafting aluminum plate or the rotating shaft. It can therefore be used to reproduce the data acquisition conditions when the sensor is placed on the machine foot. For this reason, the data used in this paper come from the sensor installed at No.3 measuring point.

[Figure 4 (flowchart of the DST-ACI-apriori algorithm). Step 1 (candidate generation): if k = 1, combine the frequent feature k-patterns in pairs to generate candidate feature (k+1)-patterns; if k > 1, check whether the first (k − 1) items of two frequent feature k-patterns are the same, and if so, take their union and sort the items in ascending order of item code to obtain a candidate feature (k+1)-pattern. Step 2 (prune): traverse all candidate feature (k+1)-patterns and save a candidate X as a frequent feature (k+1)-pattern if its support is greater than its DST. If no new frequent feature patterns are obtained, end and output the frequent feature patterns; otherwise set k = k + 1 and repeat. Step 3 (fault diagnosis rule extraction): traverse all frequent feature j-patterns X and check every inference X ⟶ f; if its confidence is greater than the minimum confidence threshold and its ACI is greater than the minimum interestingness threshold, X ⟶ f is determined to be a correct fault diagnosis rule.]
In the following fault simulation experiments, the two ideal sensor installation positions are denoted No.1 and No.2 measuring points, and the position of the sensor that collects the experimental data is denoted No.3. The location and number of each measuring point are shown in the corresponding figure.

Misalignment Fault Simulation Experiment.
After the shafting is aligned, the front mandrel screw of the experiment platform is kept stationary while the rear mandrel screw is screwed in on one side. This pushes one corner of the movable base slightly out of position and thereby causes a shafting misalignment fault. The mandrel screw is shown in Figure 10, and the sensor installation positions are shown in Figure 11.

Bearing Inner Ring Fault Simulation Experiment.
A normal bearing is installed in the bearing seat at the end of the experiment platform, and it can be replaced with an inner ring fault bearing. On the fault bearing, a wear groove is machined along the axial direction on the contact surface between the inner ring and the balls to simulate inner ring damage. The faulty inner ring bearing used in the experiment is shown in Figure 12, and the sensor installation positions are shown in Figure 13. For the inner ring fault bearing, the inner diameter of the inner ring is 2.62 cm, the outer diameter of the inner ring is 3.11 cm, the ball diameter is 0.50 cm, the inner diameter of the outer ring is 4.31 cm, and the outer diameter of the outer ring is 5.20 cm. The wear groove machined on the inner ring is 2 mm wide and 1 mm deep.

Bearing Outer Ring Fault Simulation Experiment.
The normal bearing in the bearing seat at the end of the experiment platform can also be replaced with an outer ring fault bearing. On this bearing, wear grooves are machined along the axial direction on the contact surface between the outer ring and the balls to simulate outer ring damage. The faulty outer ring bearing used in the experiment is shown in Figure 14.
The inner ring and outer ring dimensions of the outer ring fault bearing are the same as those of the inner ring fault bearing. The wear groove is located on the contact surface between the outer ring and the balls. The sensor installation positions in the outer ring fault simulation experiment are the same as in the inner ring fault simulation experiment.

Experiment Parameters.
In each fault simulation experiment, the sampling frequency is 25,600 Hz and 320,000 acceleration amplitude data points are collected, in units of m/s². Since four types of faults are simulated, the total sample signal dataset contains 1,280,000 acceleration amplitude data points. The sampling points of each fault simulation experiment are divided into groups of 2000 points, so the collected signal sequence of each experiment is divided into 160 signal subsequences. The 23 signal features listed in Table 1 are extracted from the collected vibration signals. Therefore, each fault simulation experiment generates 160 transactions, and each transaction contains 23 feature state codes and 1 fault type code. The parameter settings of the algorithm are as follows: association coefficient $p = 3$, minimum support threshold $\theta_{\min} = 100$, minimum confidence threshold $\theta_C = 0.7$, minimum interestingness threshold $\theta_I = 0.2$, and average coefficient $\beta = 0.6$.
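The segmentation into transactions can be sketched with NumPy. This is an illustration under assumptions, not the authors' pipeline. The 2000-point window and the 320,000-point recording length come from the text. The three features below (RMS, peak, kurtosis) are a hypothetical choice, because the 23 features of Table 1 are not listed in this section. A synthetic Gaussian signal stands in for a real recording.

```python
import numpy as np

WINDOW = 2000  # points per signal subsequence, as stated in the text


def split_windows(signal: np.ndarray, window: int = WINDOW) -> np.ndarray:
    """Reshape a 1-D signal into non-overlapping windows (a partial tail is dropped)."""
    n_windows = len(signal) // window
    return signal[: n_windows * window].reshape(n_windows, window)


def example_features(windows: np.ndarray) -> np.ndarray:
    """Three illustrative per-window features (hypothetical subset of Table 1)."""
    rms = np.sqrt(np.mean(windows ** 2, axis=1))
    peak = np.max(np.abs(windows), axis=1)
    centered = windows - windows.mean(axis=1, keepdims=True)
    kurtosis = np.mean(centered ** 4, axis=1) / np.mean(centered ** 2, axis=1) ** 2
    return np.column_stack([rms, peak, kurtosis])


# Usage with a synthetic stand-in for one 320,000-point recording
rng = np.random.default_rng(0)
signal = rng.standard_normal(320_000)
features = example_features(split_windows(signal))
print(features.shape)  # (160, 3): one row per transaction
```

Each row would then be coded into feature states and paired with the fault type code. Across the four experiments, this gives 4 × 160 = 640 transactions.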

Experimental Results.
The data collected in the above fault simulation experiments are input into the DST-ACI-apriori algorithm, which extracts the fault diagnosis rules that meet the DST-ACI criterion.

Results of Frequent Feature Pattern Mining.
The frequent feature pattern mining results are summarized in Table 2. A frequent feature k-pattern contains k feature state codes. The support boxplot of the frequent feature k-patterns for each k is shown in Figure 15.
In Figure 15, the x-axis is the k value of the frequent feature k-pattern, and the y-axis is the support. According to Table 2 and the boxplot, the following conclusions can be drawn: (1) When k = 1, some frequent feature patterns have a support close to 400, whereas when k > 1, the support of every frequent feature pattern is below 300. A 1-pattern contains only one feature state code, so its support is generally greater than that of patterns with larger k, and outliers with very large support are more likely to occur. Inspection of the results shows that only one pattern has a support greater than 400 (namely 401), and one other pattern has a support close to 400 (namely 392). Apart from these two frequent feature 1-patterns, the support of every other pattern is below 300. Because each fault type contributes only 160 transactions, a support above 2 × 160 = 320 means that the feature states in these two patterns must appear in the transactional datasets of at least three fault types. Since these two feature states are spread across multiple fault types, they can be regarded as noise feature states with low reference value. Further inspection shows that neither of these two feature states appears in the frequent feature 2-patterns or in any higher-dimensional frequent feature pattern. This indicates that the frequent feature pattern mining method based on the DST criterion can effectively filter out noise feature states.
(2) When k = 1, the support of the frequent feature 1-patterns is highly dispersed. A possible reason is that frequent feature 1-patterns are discriminated by their support alone, with no additional restriction, so any pattern whose support exceeds the support threshold is judged to be a frequent feature 1-pattern. Therefore, when k = 1, many different kinds of frequent feature 1-patterns are obtained, and the dispersion of their support is difficult to limit.
(3) As k increases, the support dispersion of the frequent feature patterns gradually decreases. On the one hand, as k becomes larger, fewer frequent feature patterns are mined and the support differences among patterns of the same dimension become smaller, which concentrates the support distribution. On the other hand, for larger k, the DST criterion checks not only the "absolute occurrence frequency" of the feature states but also the "relative co-occurrence frequency" among the feature states in the same pattern. Therefore, the high-dimensional frequent feature patterns no longer contain markedly "unsocial" noise feature states like those found among the frequent feature 1-patterns.
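The step in conclusion (1) from "support above 320" to "at least three fault types" is a pigeonhole argument. Each fault type contributes only 160 transactions, so a feature state found in s transactions must occur in at least ⌈s/160⌉ fault-type datasets. The small check below reproduces this with the supports quoted in the text. The function name is illustrative.

```python
import math

TRANSACTIONS_PER_FAULT = 160  # 320,000 points / 2000 points per subsequence


def min_fault_types(support: int, per_type: int = TRANSACTIONS_PER_FAULT) -> int:
    """Smallest number of fault-type datasets that can jointly contain `support` transactions."""
    return math.ceil(support / per_type)


for s in (401, 392, 300):
    print(s, min_fault_types(s))  # 401 and 392 need 3 fault types; 300 needs only 2
```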

Result of Fault Diagnosis Rule Extraction.
A total of 175 fault diagnosis rules are extracted. The number of rules and their maximum, minimum, and average confidence, lift, and ACI are shown in Table 3.
According to Table 3, the confidence of every fault diagnosis rule extracted with the DST-ACI discriminant criterion is higher than 0.98, with a maximum of 1. The lift and ACI of the extracted rules also remain high. Confidence is essentially the accuracy of a rule: the proportion of transactions containing X that also carry the fault type f. This result shows that the fault diagnosis rules extracted by the DST-ACI discriminant criterion are highly credible. It also shows that the mined frequent feature patterns are strongly and positively correlated with the faults, and that the conclusions are reliable and accurate. The metrics of the diagnosis rule with the highest confidence are shown in Table 4.
In Table 4, the numbers before the arrow are feature state codes, and the number after the arrow (10000 or greater) is the fault type code. Here, 10000 represents the shafting misalignment fault, 20000 the rotor imbalance fault, 30000 the bearing inner ring fault, and 40000 the bearing outer ring fault. Further analysis of all the extracted rules shows that the shafting misalignment fault has the most associated rules. It is followed by the bearing inner ring fault and then the rotor imbalance fault, while the bearing outer ring fault has the fewest. The confidence, lift, and ACI boxplots of the fault diagnosis rules for each k are shown in Figures 16-18.
In Figures 16-18, the x-axis represents the k value of the frequent feature k-pattern, and the y-axis represents the confidence, lift, and ACI, respectively. The following conclusions can be drawn: (1) The three metrics, confidence, lift, and ACI, have similar distributions. By definition, the lift of X ⟶ f equals its confidence divided by the proportion of transactions that carry fault f, and this proportion depends only on the numbers of transactions in the fault transactional dataset. Once the dataset has been generated, lift is therefore the confidence multiplied by a constant. Here the constant is 640/160 = 4, since each of the four fault types contributes 160 of the 640 transactions. Lift is thus proportional to confidence, and the two metrics are distributed similarly. In addition, ACI is a function of lift, so ACI and lift are also distributed similarly.
(2) The dispersion of confidence, lift, and ACI of the extracted rules is largest when k = 1 and smallest when k = 7. When k = 1, there are more frequent feature 1-patterns, so more fault diagnosis rules can be extracted and the metrics differ considerably between rules. Once k reaches a certain level, the number of frequent feature k-patterns drops significantly. The larger k is, the fewer fault diagnosis rules can be extracted and the smaller the metric differences between rules become.

Shock and Vibration 13
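The proportionality between lift and confidence in conclusion (1) can be checked numerically with exact fractions. The sketch uses the standard definitions confidence = s(X ∪ f)/s(X) and lift = confidence/(n_f/n), with n = 640 and n_f = 160 from the experiment setup. The supports 52 and 51 are hypothetical and not taken from the paper. They were chosen only because they give confidence 0.98077 and lift 3.9231, the pair reported for k = 8 in the coding-method comparison; the actual supports behind those figures are not given.

```python
from fractions import Fraction

N_TRANSACTIONS = 640  # 4 fault types x 160 transactions
N_PER_FAULT = 160     # transactions carrying any one fault type code


def confidence_and_lift(s_xf: int, s_x: int, n_f: int = N_PER_FAULT, n: int = N_TRANSACTIONS):
    """Return (confidence, lift) of X -> f as exact fractions."""
    conf = Fraction(s_xf, s_x)      # s(X and f) / s(X)
    lift = conf / Fraction(n_f, n)  # confidence divided by P(f)
    return conf, lift


# Hypothetical rule: X appears in 52 transactions, 51 of which carry fault f
conf, lift = confidence_and_lift(51, 52)
print(round(float(conf), 5), round(float(lift), 4), lift / conf)  # 0.98077 3.9231 4
```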

Performance Comparison Experiment.
In this section, with the same experimental data as input, the performance of the DST-ACI method is compared with that of other existing fault diagnosis rule extraction methods. The comparison covers different feature state coding methods, different frequent feature pattern discriminant criteria, and different fault diagnosis rule discriminant criteria.

Performance Comparison of Different Feature State Coding Methods.
This paper proposes a feature state coding method based on K-means clustering. Two other commonly used coding methods are equal-width discretization coding and equal-density discretization coding. To test the performance differences, three transactional datasets are generated from the same collected signal data using these three methods. The three transactional datasets are then used separately to mine frequent feature patterns and extract fault diagnosis rules. The experiment uses the SCL framework, and the thresholds are the same as those in Section 3.2.
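The three coding schemes can be contrasted on a single feature with the NumPy sketch below. It rests on several assumptions that are not stated in this section. Equal-density discretization is read as equal-frequency (quantile) binning. The number of states per feature is a free parameter. The 1-D K-means uses plain Lloyd iterations started at quantiles, with states numbered by ascending centroid. The mapping from state indices to the paper's feature state codes is not shown.

```python
import numpy as np


def equal_width_codes(x: np.ndarray, n_states: int) -> np.ndarray:
    """Split [min, max] into n_states intervals of equal width."""
    edges = np.linspace(x.min(), x.max(), n_states + 1)[1:-1]  # interior edges only
    return np.digitize(x, edges)  # state indices 0 .. n_states-1


def equal_density_codes(x: np.ndarray, n_states: int) -> np.ndarray:
    """Split at quantiles so each state holds roughly the same number of values."""
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_states + 1)[1:-1])
    return np.digitize(x, edges)


def kmeans_codes(x: np.ndarray, n_states: int, n_iter: int = 100) -> np.ndarray:
    """1-D K-means (Lloyd's iterations); states are numbered by ascending centroid."""
    centers = np.quantile(x, np.linspace(0.0, 1.0, n_states + 2)[1:-1])  # spread-out start
    for _ in range(n_iter):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        new_centers = np.array([
            x[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(n_states)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    rank = np.empty(n_states, dtype=int)
    rank[np.argsort(centers)] = np.arange(n_states)  # rank of each centroid
    return rank[labels]


# Feature values with three natural groups
x = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 9.8, 9.9, 10.0])
width2, density3, kmeans3 = equal_width_codes(x, 2), equal_density_codes(x, 3), kmeans_codes(x, 3)
```

Equal-width and equal-density coding place boundaries regardless of where the values cluster. K-means places them between the groups it finds. This is one plausible reading of why the text reports a better match between feature values and fault types for K-means coding.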
e frequent feature pattern mining results are shown in Tables 5-7.
To show the performance differences among the methods more clearly, the key data in the tables are shown in Figure 19.
According to these tables and Figure 19, the following conclusions can be drawn: (1) With the equal-density discretization coding method, a total of 109 frequent feature patterns are mined, with k at most 5. With the equal-width discretization coding method, 100 frequent feature patterns are mined, with k at most 4. With the K-means clustering coding method, 625 frequent feature patterns are mined, with k at most 8. Under the same constraints, the K-means clustering coding method therefore mines more frequent feature patterns that meet the requirements.
(2) When k > 5, the two discretization coding methods no longer yield any frequent feature patterns, while the K-means clustering method still does. In addition, the maximum, minimum, and average support of the frequent feature patterns it mines remain at a relatively high level.
(3) Overall, the K-means clustering coding method mines about 6 times as many frequent feature patterns as each of the other two coding methods. Most of its metric values stay at a level similar to those of the two discretization coding methods, and some are clearly better. The frequent feature patterns mined with K-means clustering coding thus preserve the accuracy and reliability of the conclusions while substantially increasing the number of results. This also indicates that the transactions generated by this method are more accurate than those generated by the other two methods. It further indicates that K-means clustering coding better matches the relationship between feature values and fault types. The fault diagnosis rule extraction results are shown in Tables 8-10.
In order to show the performance difference of different methods more clearly, the key data in the table are shown in Figures 20 and 21.
According to these tables and Figures 20 and 21, the following conclusions can be drawn: (1) With the equal-density discretization coding method, a total of 102 rules are extracted, with k at most 5. With the equal-width discretization coding method, 29 rules are extracted, with k at most 4. With the K-means clustering coding method, 490 rules are extracted, with k at most 8. The K-means clustering coding method thus extracts the largest number of fault diagnosis rules, and its rules also contain the largest number of feature states. This shows that the K-means clustering coding method can extract more usable fault diagnosis rules while ensuring that the extracted rules satisfy the threshold constraints.
(2) The maximum confidence and maximum lift of the fault diagnosis rules extracted with the K-means clustering coding method are no lower than those of the equal-density discretization method. Its minimum confidence, average confidence, minimum lift, and average lift are all significantly greater. The transactional datasets generated with K-means clustering coding are therefore more accurate than those generated with equal-density discretization coding, so more reliable and credible fault diagnosis rules can be extracted.
(3) When k > 5, the two discretization coding methods can no longer extract fault diagnosis rules, while the K-means clustering coding method still can. In addition, for k < 8, the maximum confidence of the extracted rules reaches 1 and the maximum lift reaches 4. For k = 8, the confidence reaches 0.98077 and the lift reaches 3.9231. The transactions generated with K-means clustering coding therefore yield not only more fault diagnosis rules but also rules with higher metric values. This shows that the K-means clustering coding method has better usability and reliability.
(4) In general, the K-means clustering coding method extracts roughly 5 times as many fault diagnosis rules as the equal-density discretization coding method and 17 times as many as the equal-width discretization coding method. Despite this larger number, most of the metric values remain at a level similar to those of the other two [...]

Performance Comparison of Different Frequent Feature Pattern Discriminant Criteria.
[...] The results are shown in Tables 11 and 12. To show the performance differences among the methods more clearly, the key data in the tables are shown in Figure 22.
According to Tables 11 and 12 and Figure 22, the following conclusions can be drawn: (1) With the constant support threshold discriminant criterion, a total of 625 frequent feature patterns are mined, with k at most 8. With the DST criterion, 461 frequent feature patterns are mined, with k at most 7.
(2) Under the same support threshold, the DST-based discriminant criterion mines about three quarters (461/625 ≈ 0.74) as many frequent feature patterns as the constant support threshold discriminant criterion. However, both the minimum support and the average support of the frequent feature patterns mined with the DST-based criterion are greater than those obtained with the constant support threshold criterion.
The maximum support of the frequent feature patterns mined with the DST criterion is the same as that obtained with the constant support threshold criterion.
This shows that, in addition to testing the support of a pattern, the DST-based discriminant criterion further tests the "relatively frequent co-occurrence" among the feature states in the pattern. Therefore, it extracts patterns whose feature states co-occur more strongly and whose support is higher.

Performance Comparison of Different Fault Diagnosis Rule Discriminant Criteria.
To test the performance of different fault diagnosis rule extraction frameworks, the three frameworks are applied separately to the same collected signal dataset. The methods used for comparison are the confidence-lift-based method and the confidence-improint-based method [50]. The thresholds are the same as those in Section 3.2. The results are shown in [...]. To show the performance differences among the methods more clearly, the key data in the tables are shown in Figures 23 and 24.
According to these tables and Figures 23 and 24, the following conclusions can be drawn: (1) The confidence-lift framework extracts a total of 326 fault diagnosis rules, with k at most 7. The confidence-improint framework also extracts 326 fault diagnosis rules, with k at most 7. The confidence-ACI framework extracts 175 fault diagnosis rules, with k at most 7.
(2) The rules extracted with the confidence-lift framework are exactly the same as those extracted with the confidence-improint framework, which means that the improint-based discriminant criterion has no effect here. When the transactional dataset contains many transactions, the lift of an inference is generally large. If a small interestingness threshold is set in the confidence-improint framework, the improint of almost all inferences exceeds the threshold. Under this condition, the improint-based criterion degenerates into the lift-based criterion and loses its meaning.
(3) For every value of k, the maximum confidence and maximum lift of the fault diagnosis rules extracted with the confidence-ACI framework are no lower than those of the other two frameworks. Their minimum confidence, average confidence, minimum lift, and average lift are all greater. This result shows that the feature states and fault types in the rules extracted with the confidence-ACI framework are generally more strongly positively correlated.

Conclusions and Future Works
This paper proposes a vibrational signal fault diagnosis rule extraction method for machine foot vibration signals, based on the DST-ACI discriminant criterion. The proposed method includes three innovations. The first is a feature state coding method based on K-means clustering. The second is a frequent feature pattern mining method based on the dynamic support threshold (DST) discriminant criterion. The third is a fault diagnosis rule extraction method based on the association coefficient interestingness (ACI) discriminant criterion. Fault simulation experiments were carried out on a fault simulation experiment platform, and the performance of the proposed method was tested using the collected data. The results show that the confidence of all fault diagnosis rules extracted with the DST-ACI discriminant criterion is higher than 0.98, with a maximum of 1. Overall, the support, confidence, and ACI values of the proposed method are higher than those of the six existing methods compared.
At present, in research on vibration-based fault diagnosis, the sensors that monitor the equipment are usually installed at key locations that are sensitive to vibration or close to the fault source. The literature on fault diagnosis under strong noise does not address the noise interference caused by the sensor installation position. Fault diagnosis research based on machine foot vibration signals is still in its infancy, with little supporting literature or research results. Therefore, many research directions remain to be explored.
First, the machine foot measuring point is far from the fault source, so the fault signal is attenuated during transmission and the noise interference is stronger. The structure and dynamic characteristics of the mechanical equipment can therefore be explored with modeling techniques such as the finite element method [52]. On this basis, the fault mechanism and the signal transmission process can be studied to further develop fault diagnosis methods for machine foot signals.
Second, in the early stage of an equipment fault, generally only one type of fault occurs first. As operating time increases, the coupling between the faulty parts and other parts can lead to secondary faults. Signals generated by such compound faults are more complex than those generated by a single type of fault [53][54][55]. Methods that use machine foot vibration signals to diagnose compound faults should therefore be studied in the future.
Third, deep learning has been widely used in many fields, and using neural networks to distinguish equipment faults has become a valuable research direction [56,57]. A neural network can classify complex input data by constructing a high-dimensional classifier.
Convolutional neural networks, which are widely used in deep learning, can even extract high-dimensional features automatically [58][59][60]. Therefore, fault diagnosis of machine foot vibration signals based on deep learning will also become an important research topic [61-69].

Data Availability
The signal data used to support the findings of this study were supplied by Guo Cheng under license and so cannot be made freely available. Requests for access to these data should be made to Guo Cheng, 510098438@qq.com.

Conflicts of Interest
All the authors declare that they have no conflicts of interest.