Real-world fault samples are unevenly distributed, and the dimension reduction performed by the locally linear embedding (LLE) algorithm is easily affected by the choice of neighboring points. To address both problems, an improved LLE algorithm based on homogenization distance (HLLE) is developed. By replacing the traditional Euclidean distance with the homogenization distance, the method makes the overall distribution of sample points more homogeneous and reduces the influence of the neighboring points, which helps in choosing effective neighbors for constructing the weight matrix used in dimension reduction. Because the improvement in fault recognition obtained with HLLE is limited and unstable, the paper further proposes a supervised variant, the locally linear embedding algorithm of supervision and homogenization distance (SHLLE). On top of the homogenization distance, supervised learning adds the category information of the sample points, so that points of the same category are gathered and points of different categories are scattered. This effectively improves fault diagnosis performance while maintaining stability. The methods were compared in a rotor system fault diagnosis simulation experiment, and the results show that SHLLE has superior fault recognition performance.

With ongoing modernization, the structure of mechanical equipment has become increasingly sophisticated, while its degree of automation and range of functions keep growing. In the fault diagnosis of rotating machinery, the more sophisticated the monitoring and control system, the larger the number of sensors. Moreover, the data that represent the state characteristics in real space are complex. With so many variables, the data describing the running status of the equipment become more complex, and the abstracted state data are high-dimensional. Faced with failure samples that are high-dimensional and diverse, traditional linear methods have great limitations. Nonlinear manifold learning, by contrast, has developed rapidly since it was first proposed in the journal Science in 2000. Given a huge amount of complicated and changeable high-dimensional observation data, these methods extend data analysis and state decision-making from the original Euclidean space to the manifold, making it possible to identify key information accurately and to extract the essential characteristics and internal rules of the signal, which are then analyzed and judged for fault diagnosis. Classic manifold learning methods such as isometric feature mapping (ISOMAP) [

The LLE algorithm uses Euclidean distance for local linear fitting and represents the global nonlinear structure through these local linear fits. LLE is a manifold learning algorithm based on local weight matrices, and it assumes that the observed data set lies on, or approximately on, a low-dimensional manifold embedded in the high-dimensional space. The basic idea is that each data point in the high-dimensional space can be best reconstructed from its neighbors by a set of weights, and that these weights carry the local geometry of the manifold from the high-dimensional space to the low-dimensional space. LLE assumes that the constructed weight matrix preserves the essential characteristics of the local neighborhood; that is, the weights keep the same geometric properties of the local neighborhood regardless of scaling or rotation.

The basic steps (described as Figure

Steps of locally linear embedding algorithm.

Set the neighbor points (

Calculate the Euclidean distance between any two points

In the neighbor area,

Calculate

In the LLE algorithm, a nonlinear data set is divided into local patches with linear structure, which allows effective dimension reduction of nonlinear data. The main parameters are the number of neighbor points and the embedding dimension. The least squares problem is transformed into an eigenvalue problem during the calculation, which reduces the amount of computation. In general, LLE has the advantages of few undetermined parameters, a globally optimal analytical solution, low computational complexity, and a direct geometric meaning.
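As a rough illustration of the three stages summarized above (neighbor selection, weight reconstruction, and eigen-decomposition), the following is a minimal NumPy sketch of standard LLE. The function name `lle`, the parameters `k` (number of neighbors) and `d` (embedding dimension), and the regularization term `reg` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def lle(X, k, d, reg=1e-3):
    """Minimal LLE sketch. X has shape (n, D); returns an (n, d) embedding."""
    n = X.shape[0]

    # Stage 1: pairwise Euclidean distances and the k nearest neighbors of each point.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Stage 2: weights that best reconstruct each point from its neighbors,
    # constrained so that the weights of each point sum to one.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[neighbors[i]] - X[i]            # neighbors centered on x_i, shape (k, D)
        G = Z @ Z.T                           # local Gram matrix, shape (k, k)
        # Regularization (an assumption of this sketch) keeps G invertible when k > D.
        G += (reg * np.trace(G) + 1e-12) * np.eye(k)
        w = np.linalg.solve(G, np.ones(k))
        W[i, neighbors[i]] = w / w.sum()

    # Stage 3: the embedding is given by the eigenvectors of M = (I - W)^T (I - W)
    # with the smallest eigenvalues, skipping the first (near-constant) one.
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    _, eigvecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    return eigvecs[:, 1:d + 1]
```

Calling `lle(X, k=6, d=2)` on an array of feature vectors returns two-dimensional coordinates; the quality of the result depends strongly on `k`, which is the sensitivity the following sections address.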

The original LLE algorithm has difficulty dealing with the complicated data samples found in engineering practice. Because LLE assumes that the sample points are densely, continuously, and uniformly sampled on the manifold, using Euclidean distance directly in the high-dimensional space to select local neighborhoods may not truly reflect the intrinsic structure of the manifold. When the manifold is curled or folded, so that two manifold surfaces lie a short distance apart, the reconstruction process assigns nearby points from different surfaces to the same local neighborhood, which distorts the manifold structure. In addition, the number of neighbor points (

The purpose of the homogenization distance is to shrink the distances between sample points in relatively sparse areas and to enlarge them in relatively dense areas by computing an improved distance between sample points. The overall distribution of sample points then tends to be homogeneous, which helps classification of the sample set: the change of distance makes the categories more separable and reduces the influence of the neighbor points on dimension reduction. Figure

Effect diagram of distance measurement change.
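To make the idea concrete, the sketch below shows one possible form of such a distance. It is an assumption of this example, not a formula recovered from the text: the Euclidean distance between two points is divided by the geometric mean of each point's average distance to all other points, so that distances shrink where points are sparse and grow where they are dense. The function name `homogenization_distance` is hypothetical.

```python
import numpy as np

def homogenization_distance(X):
    """Assumed form: D_h[i, j] = ||x_i - x_j|| / sqrt(M_i * M_j),
    where M_i is the mean Euclidean distance from x_i to the other points."""
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))      # plain Euclidean distances (numerator)
    mean_d = D.sum(axis=1) / (n - 1)          # large in sparse areas, small in dense ones
    return D / np.sqrt(np.outer(mean_d, mean_d))
```

Under this assumed form the numerator is still the Euclidean distance; only the density-dependent denominator changes the relative spacing of the points.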

The numerator, the original Euclidean distance, remains constant in the homogenization distance formula (

Sparse area:

Dense area:

Thereinto,

From formula (

Locally linear embedding is an efficient manifold method for nonlinear dimension reduction, but it is in essence an unsupervised method. Because it makes inadequate use of the category information of the samples, classification accuracy suffers, and the method does not achieve an optimal effect when applied to tasks such as classification. Therefore, de Ridder et al. [

From formula (

By changing the distance, the locally linear embedding algorithm of homogenization distance makes the categories more separable. However, the improvement in classification performance is limited, and the algorithm is still unsupervised. To raise the fault recognition rate and the stability of the method, a supervised learning mechanism is introduced into HLLE, giving the locally linear embedding algorithm of supervision and homogenization distance (SHLLE). On the basis of the homogenization distance, supervised learning adds the category information of the sample points, so that samples of the same fault category are gathered and samples of different fault categories are scattered. The main steps of the method are as follows.

Set neighbor points

Calculate the homogenization distance. For a given data set

Add sample information and select local neighbors. According to formula

Calculate the reconstruction weight matrix. Calculate the locally optimal reconstruction weights of the sample points so that the reconstruction error is minimized. That is, acquire the optimal solution

Calculate low-dimensional embedding matrix
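The neighbor selection step with category information can be sketched as follows. The sketch assumes the supervised adjustment commonly attributed to de Ridder et al., in which the distance between points of different classes is increased by a fraction `alpha` of the largest distance; whether the paper uses exactly this form is an assumption. The function name `supervised_neighbors` and the parameter `alpha` are illustrative.

```python
import numpy as np

def supervised_neighbors(D, labels, k, alpha=0.5):
    """Select k neighbors per point from a distance matrix D after adding
    a class-dependent penalty (assumed form: D + alpha * max(D) * Lambda,
    where Lambda[i, j] is 1 for different classes and 0 otherwise)."""
    labels = np.asarray(labels)
    different = labels[:, None] != labels[None, :]
    D_sup = D + alpha * D.max() * different   # push other-class points away
    np.fill_diagonal(D_sup, np.inf)           # exclude each point from its own neighbors
    return np.argsort(D_sup, axis=1)[:, :k]
```

In SHLLE the input `D` would be the homogenization distance matrix rather than the Euclidean one; the returned neighbor indices then feed the usual weight reconstruction and embedding steps. With `alpha=0` the selection reduces to the unsupervised case.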

The comprehensive fault simulation test bed from Spectra Quest (USA) is used as the experimental platform. Specifically, the fault simulation experiment system is composed of the Spectra Quest integrated fault simulation test bench and a PULSE data acquisition system. As shown in Figure

Rotor system fault simulation experiment.

Four rotor system states (normal, unbalanced, misaligned, and loose pedestal) were simulated at running speeds of 10 Hz, 20 Hz, and 30 Hz. Vibration signals were acquired with a total of six acceleration sensors, mounted in three directions at the two bearing seats, yielding a total of 144 groups of signal data.

After merging and reconstructing the raw signal data, and through analysis and comparison, data set 1 was formed from the fault feature vectors representing the normal, misalignment, and unbalanced states. Data set 2 was formed from the fault feature vectors representing the loose, misalignment, and unbalanced faults. The two data sets were then filtered, and 8 time domain parameters were extracted to form the original feature space, which can be rolled into two

The classification and fault diagnosis results of LLE, HLLE, SLLE, and SHLLE are analyzed and compared as follows.

As shown in Figures

Fault recognition effect of data 1. Panels: LLE, HLLE, SLLE, SHLLE.

Fault recognition effect of data 2. Panels: LLE, HLLE, SLLE, SHLLE.

In Figure

In Figure

A two-dimensional map of the data set produced by each algorithm cannot fully reflect fault recognition performance, because some methods can still classify and identify the faults when the number of neighbors is small. Therefore, a comparative analysis of how the recognition rate changes with the number of neighbor points is given below.

Figure

Recognition rate based on LLE, HLLE, SLLE, and SHLLE. Panels: Data 1, Data 2.

Figure

The SHLLE algorithm still performs best on the recognition rate of loose, misalignment, and unbalanced faults: it is the first to reach 100% and remains stable after a single fluctuation when

In conclusion, fault diagnosis based on the SHLLE algorithm performs better than that based on the other LLE algorithms. The SLLE algorithm is suitable for fault identification when the neighborhood is slightly larger, and it is relatively stable. However, the SHLLE algorithm performs best, and its fault identification is more stable than that of the others.

This paper studies fault diagnosis of rotating machinery. To obtain a better recognition effect, LLE of homogenization distance (HLLE) and LLE of supervision and homogenization distance (SHLLE) are proposed. Their validity for fault diagnosis was verified in a rotor system fault simulation experiment, and the following conclusions are reached.

In the two-dimensional maps produced by each algorithm for the two data sets, unbalanced and misalignment faults overlap when the LLE, HLLE, and SLLE methods are used. SHLLE, in contrast, has a strong advantage and is more effective in fault classification.

In the plots of recognition rate against the number of neighbor points for the two data sets, the SLLE algorithm is suitable for fault diagnosis when the number of neighbor points is slightly larger. SHLLE, however, outperforms the other LLE algorithms in fault diagnosis, and its fault identification is also more stable.

The authors declare that there is no conflict of interests regarding the publication of this paper.

Financial support from the National Natural Science Foundation of China (51175170), the Industrial Cultivation Program of Scientific and Technological Achievements in Higher Educational Institutions of Hunan Province (10CY008), and the Aid Program for Science and Technology Innovative Research Team in Higher Educational Institutions of Hunan Province is gratefully acknowledged.