A Diagnosis Method for Rotation Machinery Faults Based on Dimensionless Indexes Combined with K-Nearest Neighbor Algorithm

It is difficult to distinguish the dimensionless indexes of normal petrochemical rotating machinery from those of equipment with complex faults. When the conflict between pieces of evidence is too large, the diagnosis becomes uncertain. This paper presents a diagnosis method for rotating machinery faults based on dimensionless indexes combined with the K-nearest neighbor (KNN) algorithm. The method uses the KNN algorithm and an evidence fusion formula to process fuzzy, incomplete, and accurate data. It converts the signals from petrochemical rotating machinery sensors into reliability measures using dimensionless indexes and the KNN algorithm. The input information is then integrated by an evidence synthesis formula to obtain the final data, from which the type of fault is decided. The experimental results show that the proposed method can integrate data to provide a more reliable and reasonable result, thereby reducing the decision risk.


Introduction
Large rotating machinery and equipment (such as steam turbines, rotary bearings, fans, and compressors) are key equipment in petroleum, chemical, metallurgy, machinery manufacturing, aerospace, and other important engineering fields. Such equipment requires high safety and reliability [1], so the study of fault diagnosis methods for these types of equipment has been a hot topic in this field. Vibration monitoring signals contain a great deal of nonlinear, random, and nonergodic information, which complicates fault signal analysis when rotating machinery malfunctions [2]. The time-domain vibration signal is the most basic and original signal. References [1,2] extracted failure characteristics directly from the time-domain signal and used them for fault diagnosis, showing that preserving the basic characteristics of the signal is very beneficial. References [2][3][4] used the probability density function of the vibration signal to derive dimensional indexes (the mean and root mean square values, etc.) and dimensionless indexes (waveform index, margin index, pulse index, etc.) in the amplitude domain. In practice, although a dimensional index is sensitive to the fault characteristics, its value increases as the fault develops. In addition, dimensional indexes are easily affected by changes in the working conditions (load, speed, etc.), which reduces their performance as diagnostic measures [3]. By contrast, the dimensionless indexes are not sensitive to disturbances of the vibration monitoring signal, and their performance is stable. In particular, these dimensionless indexes are sensitive to neither the amplitude nor the frequency of the signal; that is, they have little relationship to the particular working conditions of the machine [1][2][3]. Dimensionless indexes have therefore been widely used in the fault diagnosis of rotating machinery. Among the dimensionless indexes, the pulse index and the kurtosis index are more sensitive to impact-type faults, especially in the early stage of a fault, when large-amplitude pulses are still few. The kurtosis index and the pulse index rise faster than the other parameters, so their fault ranges are wider and it is difficult to determine the fault type [3,4].
Solving the above problems requires an effective method which can process uncertain information rationally, systematically, and flexibly [5]. Evidence theory can effectively express and deal with uncertain and imprecise information [6]. However, in an actual information fusion system, interference from the natural environment or human disturbances often leads to conflicting reports by the sensors [7]. Traditional D-S evidence theory cannot effectively deal with the integration of conflicting information. When evidence conflict exists between global or local information, using the D-S combination rules for fusion leads to results which are contrary to the real value [1,8]. Therefore, achieving effective integration when a high degree of conflict exists between the evidence is an urgent problem. At present, there are three major ways of treating evidence conflict [7]. (1) The first method improves the D-S combination rules: when there is a high degree of conflict between the evidence, using the D-S combination rules directly for fusion often produces unreasonable results, so the rules themselves are modified. Methods of this kind have been presented by Yager [9], Dubois and Prade [10], and Feng et al. [7]. (2) The second method is based on modifying the original sources of evidence [8,11]: the conflicting evidence is preprocessed first, and then the evidence combination rules are used for fusing the evidence. (3) The third method is to modify the model from the sources based on a known model.
These three methods can solve some practical application problems. Which method is used depends on the actual situation and the need. In the absence of effective evaluation criteria for combination rules, it is difficult to determine which combination rule is "the most excellent." Smets et al. put forward the TBM model and held that the belief arising from conflict caused by the incompleteness of the recognition framework should be assigned to the empty set. Murphy provided an evidence average combination method. This method is based on the modified model and converges faster. Its deficiency is that it uses the unfused evidence in a simple way, without considering the interconnectedness between the pieces of evidence. Reference [12] used a modified Euclidean distance to determine the correlation between the evidence and to obtain the weight of each piece of evidence before modifying the original evidence and finally making the fusion decisions. Reference [13] pointed out that the conflict coefficient alone is not sufficient to measure the conflict between the evidence: the conflict is also related to the pignistic probability distance, so these two factors should be used together to measure the degree of conflict. Reference [7] presented a conflicting interval evidence fusion method based on an evidence similarity measure. This method defines probability conversion rules for the extended pignistic probability, converts the interval evidence into interval pignistic probabilities, uses the fuzzy interval to normalize the Euclidean distance, and obtains the similarity between interval pignistic probabilities, thus determining the similarity matrix between any two pieces of evidence and obtaining the confidence of the interval evidence. Different situations of evidence conflict are classified and discussed according to the size of the two values. However, Reference [13] did not consider the distance between the bodies of evidence, so the conflict representation model is still not comprehensive. More importantly, there is no analysis of the conflict and of the relationships leading to it, and no further approach for processing the conflict has been developed.
Given the above problems, we propose an evidence synthesis method based on dimensionless indexes combined with KNN to improve the reliability and the rationality of the results. This paper is organized as follows. Sections 2, 3, and 4 introduce the problem statement, theory, and rotating machinery fault diagnosis experiment, respectively. Finally, a conclusion is drawn in Section 5.

Problem Statement
Vibration monitoring signals from rotating machinery in the event of a fault contain nonlinear, random, and nonergodic information, which makes fault signal analysis very difficult. Although a dimensional index is sensitive to the fault characteristics, its value increases as the fault develops, and it is easily affected by changes in the working conditions (load, speed, etc.), resulting in unstable performance. A dimensionless index is not sensitive to disturbances of the vibration monitoring signal, and its performance is stable. Among the dimensionless indexes, the pulse index and the kurtosis index are more sensitive to impact-type faults, especially in the early stage of a fault: pulses are still few and the other parameter values do not increase significantly, while the kurtosis index and pulse index rise quickly. Because these two indexes are so sensitive in the early stage of a fault in rotating machinery, their fault interval ranges widen, and the faults become difficult to distinguish.
Definition 1 (see [2,3]). A dimensionless index is the ratio of two quantities with the same dimension. When describing a particular system, it has a certain physical meaning. In fault diagnosis, the dimensionless parameter index is defined as

$$\zeta_x = \frac{\left[\int_{-\infty}^{+\infty} |x|^{l}\, p(x)\, dx\right]^{1/l}}{\left[\int_{-\infty}^{+\infty} |x|^{m}\, p(x)\, dx\right]^{1/m}}, \tag{1}$$

where $x$ is the vibration amplitude and $p(x)$ is the probability density function of the vibration amplitude. Multiple historical monitoring data of a single fault can be calculated using this function.
Our Aim. We attempted to answer the question of how to obtain reliable and reasonable integrated results when the fault diagnosis based on the rotating machinery fault signal is uncertain.
In this paper, we addressed the diagnosis of rotating machinery faults for large petrochemical enterprises. Sensors collected many kinds of fault data online in real time during mechanical operation, and the distance between these data and the known training samples was calculated using the KNN algorithm. After obtaining the distance between the test samples and the known training samples, we took the reciprocal of the distance as the probability that the tested sample belongs to the class of that training sample. We then fused the evidence using the D-S evidence theory synthesis method to make a final decision about the fault. The specific flow is shown in Figure 1.
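The reciprocal-distance step described above can be illustrated with a minimal sketch. The function name `distances_to_bpa`, the fault labels, and the small constant `eps` are assumptions introduced here for illustration; the normalization so that the values sum to 1 follows the experimental description later in the paper.

```python
def distances_to_bpa(distances):
    """Map {fault: distance} to {fault: probability} via normalised reciprocals.

    A smaller distance to a fault class yields a larger probability.
    """
    eps = 1e-12  # assumption: avoids division by zero when a distance is exactly 0
    inverse = {fault: 1.0 / (d + eps) for fault, d in distances.items()}
    total = sum(inverse.values())
    return {fault: value / total for fault, value in inverse.items()}


# Hypothetical distances from one test sample to three fault classes.
example = distances_to_bpa({"wear": 1.0, "outer_crack": 2.0, "inner_crack": 4.0})
```

In this hypothetical example the closest class, `wear`, receives the largest share of the probability mass, and the three values sum to 1.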

Calculation of Dimensionless Indexes and Determination of the Fault Zone.
In this paper, we processed the vibration monitoring signals using the method of dimensionless calculation [14].
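Hypothesis 1 (see [1,2,5]). Under Definition 1, let $E(\cdot)$ denote the expectation with respect to the probability density function $p(x)$. When $l = 2$, $m = 1$, the waveform index $S_f$ is defined as

$$S_f = \frac{\sqrt{E(|x|^{2})}}{E(|x|)} = \frac{X_{\mathrm{rms}}}{|\bar{X}|}. \tag{2}$$

Similarly, (1) when $l \to \infty$, $m = 1$, the pulse index $I_f$ is defined as

$$I_f = \lim_{l \to \infty} \frac{\left[E(|x|^{l})\right]^{1/l}}{E(|x|)} = \frac{X_{\max}}{|\bar{X}|}; \tag{3}$$

(2) when $l \to \infty$, $m = 1/2$, the margin index $CL_f$ is defined as

$$CL_f = \lim_{l \to \infty} \frac{\left[E(|x|^{l})\right]^{1/l}}{\left[E(|x|^{1/2})\right]^{2}} = \frac{X_{\max}}{X_r}; \tag{4}$$

(3) when $l \to \infty$, $m = 2$, the peak index $C_f$ is defined as

$$C_f = \lim_{l \to \infty} \frac{\left[E(|x|^{l})\right]^{1/l}}{\sqrt{E(|x|^{2})}} = \frac{X_{\max}}{X_{\mathrm{rms}}}; \tag{5}$$

(4) the kurtosis index $K_v$ is defined from the fourth-order moment as

$$K_v = \frac{E(x^{4})}{\left[E(x^{2})\right]^{2}} = \frac{\beta}{X_{\mathrm{rms}}^{4}}. \tag{6}$$

Here $X_{\mathrm{rms}} = \sqrt{E(x^{2})}$ is the root mean square value, $|\bar{X}| = E(|x|)$ is the mean absolute value, $X_r = [E(|x|^{1/2})]^{2}$ is the root amplitude, $X_{\max}$ is the peak amplitude, and $\beta = E(x^{4})$.

A dimensionless index is the ratio of two quantities with the same dimension. In this paper, the indexes are computed from the probability density function of the monitoring signal. Because each index is a ratio, it is not affected by the magnitude of the signal and depends only weakly on the sensitivity of the vibration detector and the gain of the amplifier, so an uncalibrated monitoring system can be used in actual equipment fault diagnosis [1,14].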
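As a hedged illustration of how the five indexes can be estimated from sampled data, the sketch below replaces each expectation by a sample average and the limit $l \to \infty$ by the largest absolute amplitude. It uses the standard textbook forms of the waveform, pulse, margin, peak, and kurtosis indexes; the function name and the dictionary keys are assumptions made for this example, not names from the paper.

```python
import math


def dimensionless_indexes(signal):
    """Estimate the five dimensionless indexes from a list of sampled amplitudes."""
    n = len(signal)
    abs_x = [abs(v) for v in signal]
    mean_abs = sum(abs_x) / n                                # estimate of E|x|
    rms = math.sqrt(sum(v * v for v in signal) / n)          # sqrt of E x^2
    root_amp = (sum(math.sqrt(a) for a in abs_x) / n) ** 2   # (E|x|^(1/2))^2
    peak = max(abs_x)                                        # l -> infinity limit
    fourth = sum(v ** 4 for v in signal) / n                 # estimate of E x^4
    return {
        "waveform": rms / mean_abs,
        "pulse": peak / mean_abs,
        "margin": peak / root_amp,
        "peak": peak / rms,
        "kurtosis": fourth / rms ** 4,
    }


# One period of a sine wave, and the same wave with seven times the amplitude.
sine = [math.sin(2 * math.pi * k / 1000) for k in range(1000)]
idx = dimensionless_indexes(sine)
idx_scaled = dimensionless_indexes([7.0 * v for v in sine])
```

Because every index is a ratio of quantities with the same dimension, `idx` and `idx_scaled` agree up to rounding, which is the amplitude independence that the text attributes to dimensionless indexes.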

Mathematical Problems in Engineering
To use the dimensionless index in the study of fault diagnosis, we began with petrochemical core units. We collected data online in real time and calculated the dimensionless index parameters of the rotating unit in the normal state and in each fault state. Then we calculated the maximum value and minimum value of each dimensionless index for each of the core units in the normal state and in all kinds of fault states.
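The determination of the fault zone described above amounts to taking, for each state and each index, the minimum and maximum over the training records. A minimal sketch follows; the data layout, the function names, and the numeric values are hypothetical and chosen only to show the idea, including a helper that measures how much two zones overlap.

```python
def fault_zones(training):
    """training: {fault_name: [{index_name: value, ...}, ...]}.

    Returns {fault_name: {index_name: (min_value, max_value)}}.
    """
    zones = {}
    for fault, records in training.items():
        names = records[0].keys()
        zones[fault] = {
            name: (min(r[name] for r in records), max(r[name] for r in records))
            for name in names
        }
    return zones


def overlap(zone_a, zone_b):
    """Length of the overlap between two (min, max) ranges, 0.0 if disjoint."""
    return max(0.0, min(zone_a[1], zone_b[1]) - max(zone_a[0], zone_b[0]))


# Hypothetical kurtosis values for two fault states.
zones = fault_zones({
    "wear": [{"kurtosis": 3.0}, {"kurtosis": 5.0}],
    "outer_crack": [{"kurtosis": 4.0}, {"kurtosis": 6.0}],
})
shared = overlap(zones["wear"]["kurtosis"], zones["outer_crack"]["kurtosis"])
```

A large overlap between the zones of two faults under the same index is what makes that index alone unable to separate them.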
Hypothesis 2. Under a single fault, $n$ monitoring data $x_1, x_2, \ldots, x_n$ of the vibration signal $x$ were collected, and $n$ is relatively large.

Conclusion 1.
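Under the conditions of Definition 1, Hypothesis 1, and Hypothesis 2, the expectations in the dimensionless index can be approximated by sample averages over the $n$ monitoring data $x_1, x_2, \ldots, x_n$:

$$E(|x|^{l}) \approx \frac{1}{n} \sum_{i=1}^{n} |x_i|^{l}.$$

So, with the exponents $l$ and $m$ of Definition 1, the dimensionless index is approximately

$$\zeta_x \approx \frac{\left[\frac{1}{n} \sum_{i=1}^{n} |x_i|^{l}\right]^{1/l}}{\left[\frac{1}{n} \sum_{i=1}^{n} |x_i|^{m}\right]^{1/m}}.$$

D-S evidence theory provides useful evidence combination rules to fuse and update evidence information in order to solve the problem of processing uncertain information.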
Evidence synthesis is the core of evidence theory. It fuses independent evidence information coming from different information sources in order to produce more reliable evidence information. However, D-S evidence synthesis is limited to different degrees in practical applications, especially when the evidence is highly or fully in conflict. In these cases D-S evidence synthesis loses efficacy, and so researchers in China and abroad have proposed many improvements from different perspectives. At present, fault diagnosis technology in China is widely used in military, aerospace, chemical, shipbuilding, and other fields. There are many theories and methods of fault diagnosis, and evidential reasoning is of great significance in fault research. It involves uncertain information processing, the effective integration of information, determination of the credibility of the fault indicators, formation, and decision-making. In this paper, we use the idea of evidence theory combined with the dimensionless index to solve such uncertainty problems. Through multifeature fusion recognition analysis, we improve the recognition performance and accuracy of fault diagnosis using effective, appropriate diagnostic methods and determine the root cause of failure quickly [9].
In a large unit, sensors can be installed in different parts of the equipment to monitor it. The information from the sensors provides the fault information from each monitored part and forms a body of evidence. Different bodies of evidence correspond to different belief functions. By analyzing the belief functions, we can obtain the corresponding belief values and fuse the belief functions using the D-S combination rules to finally determine the fault.
(1) Basic Probability Assignment. In the recognition framework $\Theta$, the basic probability assignment (BPA) is a function $m: 2^{\Theta} \to [0, 1]$, called the mass function. It satisfies

$$m(\emptyset) = 0, \qquad \sum_{A \subseteq \Theta} m(A) = 1,$$

where any $A$ with $m(A) > 0$ is called a focal element of $m$.
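(2) Trust Function. The trust function is also known as the belief function. In the recognition framework $\Theta$, based on the BPA $m$, the trust function of $A$ is defined as

$$\mathrm{Bel}(A) = \sum_{B \subseteq A} m(B).$$

(3) Likelihood Function. The likelihood function is also known as the plausibility function. In the recognition framework $\Theta$, based on the BPA $m$, the likelihood function of $A$ is defined as

$$\mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B) = 1 - \mathrm{Bel}(\bar{A}).$$

(4) Confidence Zone. In evidence theory, for a hypothesis $A$ in the recognition framework $\Theta$, the interval $[\mathrm{Bel}(A), \mathrm{Pl}(A)]$ is called the confidence zone of $A$.

For $n$ bodies of evidence with mass functions $m_1, m_2, \ldots, m_n$ on $\Theta$, the combined mass of a nonempty $A$ is

$$m(A) = \frac{1}{1 - K} \sum_{A_1 \cap \cdots \cap A_n = A} \; \prod_{i=1}^{n} m_i(A_i), \qquad K = \sum_{A_1 \cap \cdots \cap A_n = \emptyset} \; \prod_{i=1}^{n} m_i(A_i). \tag{13}$$

This equation is the classic synthesis formula of D-S evidence theory, where the size of $K$ represents the conflict between all the evidence and $1/(1 - K)$ is the normalization factor. The role of $1 - K$ is to avoid assigning nonzero probability values to the empty set in the process of evidence synthesis [16].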
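The trust (belief) function, the likelihood (plausibility) function, and the confidence zone can be made concrete with a short sketch. Subsets of the recognition framework are represented as `frozenset` objects; the mass values and fault names below are hypothetical.

```python
def belief(mass, a):
    """Bel(A): total mass of the focal elements contained in A."""
    return sum(v for b, v in mass.items() if b <= a)


def plausibility(mass, a):
    """Pl(A): total mass of the focal elements that intersect A."""
    return sum(v for b, v in mass.items() if b & a)


# Hypothetical mass function over three fault hypotheses.
frame = frozenset({"wear", "outer_crack", "inner_crack"})
mass = {
    frozenset({"wear"}): 0.5,
    frozenset({"outer_crack"}): 0.2,
    frame: 0.3,  # mass left on the whole framework (ignorance)
}
a = frozenset({"wear"})
interval = (belief(mass, a), plausibility(mass, a))  # confidence zone of A
```

For this example the confidence zone of `wear` is approximately [0.5, 0.8]: 0.5 is committed to `wear` exactly, and a further 0.3 on the whole framework does not contradict it.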
In the classical D-S evidence theory synthesis formula, especially in the case of complete conflict (i.e., $K = 1$), the results obtained from (13) above are usually not consistent with the actual situation and the formula loses efficacy. Researchers therefore began to modify the method on the basis of the original formula. There are two main ways in which it can be modified [8].
(1) Based on Modification Rules [9,17]. The key to improving synthesis results is how to manage conflict. The new synthesis rules need to determine efficiently how to allocate the conflict, and this problem in turn contains two subproblems: to which subsets the conflict should be reassigned and, once the subsets are determined, in what proportion the conflict should be allocated.
(2) Based on Evidence Source Modification [17]. This presumes that the D-S synthesis rules of evidence theory are not themselves wrong. When the evidence is highly conflicting, the evidence should be pretreated first, and then the D-S evidence theory synthesis rules should be used. For evidence sources that are highly conflicting and unreliable, we can use the discount factor and other methods [18] to process the evidence source without modifying the synthesis rule.
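The loss of efficacy under high conflict that motivates both modification routes above can be seen in a small sketch of the classic two-source D-S combination rule. The function name, the fault labels, and the mass values are assumptions made for illustration; the example is the well-known high-conflict case in which two sources strongly support different hypotheses.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with the classic D-S rule.

    Returns (combined mass function, conflict K).
    """
    joint = {}
    conflict = 0.0
    for (a, va), (b, vb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + va * vb
        else:
            conflict += va * vb  # mass that would fall on the empty set
    if 1.0 - conflict < 1e-12:
        raise ValueError("complete conflict (K = 1): the rule is undefined")
    return {s: v / (1.0 - conflict) for s, v in joint.items()}, conflict


# Hypothetical sources: each is almost certain of a different fault.
m1 = {frozenset({"wear"}): 0.99, frozenset({"bent_axle"}): 0.01}
m2 = {frozenset({"outer_crack"}): 0.99, frozenset({"bent_axle"}): 0.01}
fused, k = dempster_combine(m1, m2)
```

Here K is about 0.9999, and after normalization nearly all the mass is assigned to `bent_axle`, a hypothesis that both sources considered very unlikely. This is the kind of counterintuitive result that the modified rules and the evidence-source modifications try to avoid.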

Improved D-S Algorithm.
Since the classic D-S theory cannot manage conflict effectively, the results of the D-S evidence synthesis rule differ from the actual situation when the evidence is highly conflicting. Many researchers in China have proposed modifications to D-S evidence theory. Ye et al. proposed an evidence combination method based on the weight coefficients and the conflict probability distribution [16]. After the weighting coefficients for each piece of evidence are calculated, the following evidence combination is used for information fusion. The steps are as follows [16].
(1) Allocate the probability values to the propositions in the framework according to the evidence provided by each evidence source and establish the weight vector of the evidence sources, $W = (w_1, w_2, \ldots, w_n)$.

(2) Let $w_{\max} = \max(w_1, w_2, \ldots, w_n)$; the relative weight vector is then $W^{*} = (w_1, w_2, \ldots, w_n)/w_{\max}$, which determines the "discount rate" of the basic probability assignment values of each piece of evidence, $\alpha_i = w_i / w_{\max}$. The discount rate is used to adjust the basic probability assignment values of all the propositions in each recognition framework [16].

(3) Substituting the adjusted probability values of all the propositions into the formula of [11] gives a new synthesis formula.

The new synthesis formula fully considers the importance of the fused evidence that comes from different data and makes the synthesis results more realistic. Besides the improved D-S evidence combination formula detailed above, there are many other methods, such as the D-S evidence combination formula based on credibility proposed by Sun et al. [19] and the effective evidence theory synthesis formula proposed by Li et al. [11]. These methods not only reduce the conflict effectively but also make the synthesis results realistic.
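The weighting step above can be sketched as follows. The relative weight of each source is used as its discount rate; how the discounted mass is redistributed is an assumption here: the sketch uses the classical Shafer-style discounting, which moves the removed mass to the whole recognition framework, and this may differ in detail from the adjustment used in [16]. Function names and numeric values are hypothetical.

```python
def discount(mass, alpha, frame):
    """Scale every mass by alpha and move the remainder to the whole frame.

    Assumption: Shafer-style discounting, not necessarily the rule of [16].
    """
    out = {s: alpha * v for s, v in mass.items() if s != frame}
    out[frame] = 1.0 - sum(out.values())
    return out


def discount_by_weights(masses, weights, frame):
    """Discount each mass function by its relative weight w_i / w_max."""
    w_max = max(weights)
    return [discount(m, w / w_max, frame) for m, w in zip(masses, weights)]


# Two hypothetical sources; the second is considered half as reliable.
frame = frozenset({"wear", "outer_crack"})
source = {frozenset({"wear"}): 0.8, frozenset({"outer_crack"}): 0.2}
adjusted = discount_by_weights([source, source], [1.0, 0.5], frame)
```

The most reliable source keeps its masses unchanged, while the less reliable one keeps only half of each mass and assigns the rest to the whole framework, so it contributes less to any later fusion.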

Improved D-S Algorithm Application in the Rotating Machinery Fault Diagnosis.
We have implemented rotating machinery fault diagnosis for large petrochemical enterprises. Sensors collect all kinds of fault data online in real time during mechanical operation, and the distance between these data and the known training samples is calculated using the KNN algorithm. After the distance between the tested samples and the known training samples is obtained, the reciprocal of the distance is taken as the probability that the tested sample belongs to the class of that training sample. The evidence is then fused using the D-S evidence theory synthesis method to make a final decision; the specific flow is shown in Figure 1. The implementation steps are as follows.
Step 1. Fault data are collected from the petrochemical rotating machinery in real time.
Step 2. Based on the collected fault data, the dimensionless indexes are calculated and the fault zone (the maximum and minimum range in 10 indices) is set up. Use (2), (3), (4), (5), and (6) to calculate the fault ranges of the waveform index, peak index, pulse index, margin index, and kurtosis index.
Step 3. According to the KNN algorithm, the K nearest fault points are found and their distribution is derived.
Step 4. Use the improved D-S algorithm in Section 3.3.2 to calculate the degree of conflict, from which the conflict vector is obtained.
Step 6. Normalize the conflict vector and calculate its entropy. Meanwhile, the corresponding weighting values can be calculated based on (18).
Step 7. Use (23) to correct the D-S fusion data.
Step 8. Make final decisions after correction.

Rotating Machinery Fault Diagnosis Experiment
This experiment was conducted on the large rotating machinery fault diagnosis experiment platform of the Guangdong province key laboratory of petrochemical equipment fault diagnosis. The real-time data collection for the various fault types at Guangdong University of Petrochemical Technology (GDUPT) is shown in Figure 2. There are five causes of bearing fault in petrochemical rotary sets: bearing wear, bearing outer crack, bearing inner crack, bent axle, and lack of bearing. The vibration acceleration signal of the rotating machinery during operation was measured, and a linear operation was applied to obtain the waveform index $S_f$, peak index $C_f$, pulse index $I_f$, margin index $CL_f$, and kurtosis index $K_v$ for each kind of fault.
In order to make the experimental data more accurate, we collected 1024 fault points for every kind of fault and used them as the training samples. The five indexes of the training samples were obtained by linear operations, and the minimum and maximum values of the five indexes were selected to determine the range of each index, as shown in Table 1. Table 1 shows the sensitivity of the various indexes: the waveform index, whose range is very small, is the least sensitive, whereas the sensitivity of the margin index to the jamming signal is much higher. In addition, under the same dimensionless index, the ranges of the five kinds of faults overlap significantly; that is, they are highly conflicting. For example, for bearing outer cracks and bearing inner cracks, the value ranges of the five kinds of indexes are generally low. As an example, we randomly chose one group from all the data acquired in real time, a bearing crack value of 3950, and used all 1024 collected external bearing crack data to produce an array in which, for example, element 5 equals 1 and element 12 equals 5. The data value of 3950 was then subjected to a linear operation, and the resulting fault data values were used in the KNN algorithm. First, the middle value of each of the five dimensionless index ranges was taken as the central value of that range, and the distance from the fault value to each central value was calculated, giving 25 groups of distance values. The distances were then converted to probability values: we directly took the reciprocals of the 25 groups of distance values as their corresponding probability values. The guiding idea is that the closer a test sample is to a training sample, the higher the probability that it shares the category of that training sample. In order to satisfy the basic probability assignment condition (the probability values under each index must sum to 1), the probability values were normalized within each index, and the results are shown in Figure 3. Figure 3 lists the fault probability values under the five indexes. Each index provides fault probability values for the five kinds of faults: bearing wear, bearing outer crack, bearing inner crack, bent axle, and lack of bearing. We treat each index as a basic probability distribution function, which is also called a body of evidence. Five sets of evidence were formed by the KNN algorithm, and the information from the five sets was fused using D-S evidence theory. We used the classic D-S evidence theory and various improved versions of it for the information fusion, and the results are shown in Figure 4. From Figure 4, we can see that the classic D-S evidence theory does not handle this evidence well; in particular, because it treats all the evidence as equally important, it can even lead to a wrong conclusion in this situation [20]. For this reason, we used the improved D-S evidence theory, which adds different weight coefficients to different evidence. The three methods in Figure 4 are D-S evidence theory synthesis methods based on weight coefficients. Compared with the classical D-S evidence theory, these three synthesis methods increase the reliability and rationality of the synthesis results when the evidence is highly conflicting. The tested data, however, were from an external bearing crack, and even with the improved D-S evidence theory the correct diagnosis of the fault was still not obtained.

We can see from Figure 4 that, in the various sources of evidence, the probability value for the external bearing crack fault is not the largest. In other words, before the evidence is fused, the individual sources of evidence do not point to the bearing outer crack as the fault, so the final fusion results are also incorrect.

Conclusion
There are some problems in identifying complex faults in petrochemical rotating machinery. First, the corresponding zone of each dimensionless index is difficult to determine. Second, when the data are transferred from the site to a remote server, they are disturbed by various factors which cause transmission errors. As a result, the calculated dimensionless indexes of rotating machinery faults fluctuate considerably, which makes correct fault diagnosis difficult. In this paper, we used an evidence synthesis diagnosis method that combines dimensionless indexes with KNN to diagnose rotating machinery faults and to make the fusion result more reasonable and reliable. The increased reliability of the results will reduce the risk of decisions based on incorrect information.

Figure 1: Flowchart of rotating machinery fault evidence synthesis diagnosis.

Figure 2: Rotating machinery fault diagnosis real experiment condition.(a) The developed real test bed.(b) Fault diagnosis rotating machinery test bed.(c) Normal bearing.(d) Wearing ball bearing.(e) Outer ring crack bearing.(f) Inner ring crack bearing.(g) Bend shaft.(h) Lacking ball bearing.

[Figure: probability value for the five types of bearing failure in petrochemical rotary sets. Series: D-S evidence theory synthetic formula, direct weighted synthetic formula, synthesis formula of [16], and the synthesis formula of this paper.]
Cover and Hart proposed the K-nearest neighbor algorithm (KNN) in 1968 [15]. The idea behind the algorithm is to calculate the distance between the tested sample and the known training samples based on a distance function, select the K nearest training samples, and classify the unknown sample according to those K nearest samples. This method is widely used in fault diagnosis, text classification, data mining, machine learning, pattern recognition, image processing, and other domains. In this paper, the fault samples are distributed among a number of fault classes.

Classic D-S Algorithm.
D-S evidence theory is an uncertainty reasoning method, also known as belief function theory. It is widely used in intelligent data processing, information fusion, expert systems, data mining, fault diagnosis, target identification, decision analysis, and other domains.
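Returning to the K-nearest neighbor rule described above, a minimal sketch of the plain majority-vote form of the classifier is given below. It assumes Euclidean distance and hypothetical two-dimensional feature vectors; the paper itself converts distances into probabilities instead of taking a vote, so this shows only the basic idea of KNN.

```python
import math
from collections import Counter


def knn_classify(sample, training, k):
    """Return the majority label among the k training points nearest to sample.

    training is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set with two fault classes.
training = [
    ((1.0, 1.0), "wear"),
    ((1.1, 0.9), "wear"),
    ((0.9, 1.2), "wear"),
    ((5.0, 5.0), "outer_crack"),
    ((5.2, 4.9), "outer_crack"),
]
label = knn_classify((1.0, 1.1), training, k=3)
```

With `k=3`, the three nearest neighbors of the test point all carry the label `wear`, so that label is returned.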