Identification of rotary machines excitation forces using wavelet transform and neural networks

Unbalance and asynchronous forces acting on a flexible rotor are characterized by their positions, amplitudes, frequencies and phases, using the measured vibration responses of the rotor. The dynamic model of the rotary machine is a neural network trained with measured vibration signals previously decomposed by wavelets. A typical compaction ratio of 2048:4 is achieved in this application, owing to the stationary nature of the measured vibration signals and the shape of the chosen wavelet function. The Matching Pursuit procedure, coupled to a modified Simulated Annealing optimization algorithm, is used to decompose the vibration signals. The performance of several neural networks with different input data sets is analyzed to define the best network architecture, in the sense of achieving successful training and minimum identification error with maximum probability of giving the correct answers. The experiments are conducted on a vertical rotor with three rigid discs mounted on a flexible shaft supported by two flexible bearings. The vibration responses are measured at the bearings and at the discs. A methodology to balance flexible rotors based on the proposed identification methodology is also presented.


Introduction
Vibratory mechanical systems always present nonlinearities that are responsible for the differences between experimental responses and those obtained through simulations of any adopted linear model. To solve the inverse problem associated with the identification of the excitation forces of a dynamic system, a neural network model represents a viable alternative. It is robust and capable of representing any nonlinear system, but its application is restricted to the closed domain of inputs and outputs defined during the training of the neural network [5].
The identification of excitation forces of linear mechanical systems has been studied recently. Steffen [10] reconstructed the excitation forces of a vibratory system, represented by a finite element model, using orthogonal functions. Rade [7] applied a deconvolution technique to identify the excitation sources of a mechanical system previously characterized by modal analysis. Santos [9] studied the identification of defects in a ball bearing by applying a neural network model trained with vibration signals preprocessed by the wavelet transform, using the Morlet function as the mother wavelet.
Data compaction using the wavelet transform has been applied to signal transmission and to pattern recognition. This technique is very effective at reducing redundant information in signals that are used as inputs to neural networks [3].
In this paper, unbalance and asynchronous forces acting on a flexible rotor are characterized by their positions, amplitudes, frequencies and phases. The system dynamic model is a neural network trained with measured vibration signals previously decomposed by wavelets. A typical compaction ratio of 2048:4 is achieved in this application, owing to the stationary nature of the measured vibration signals and the shape of the chosen wavelet function.
The performance of different neural networks with different input data sets is analyzed to define the best network architecture, in the sense of achieving successful training and minimum identification error with maximum probability of giving the correct answers.
A methodology to balance flexible rotors based on the proposed identification methodology is presented. For the flexible rotor used in the experimental apparatus, the unbalance is predominant at the rigid disc stations. The applied methodology was able to identify the position of the unbalanced discs and the unbalance magnitudes and phase angles relative to an angular reference frame. The rotor remains balanced independently of its angular velocity, since the correction masses are installed in opposition to the unbalance excitation forces.

Fundamentals of the decomposition of vibration signals using wavelets
Several authors, such as Lepore [3], have used wavelet functions to compress vibration signals and to reduce data redundancy. The signals processed by wavelets were used as the training data set of a neural network applied to recognize fault patterns of a mechanical system. The design of the neural network resulted in an optimized architecture with a reduced number of neurons, as a consequence of the choice of a suitable wavelet function combined with the ability of the wavelet transform to compact the input data and reduce its redundant information. The effort of the neural network training process was significantly reduced, and its performance and precision were also improved, when compared to other architectures trained with input signals not preprocessed by wavelets.
Vibration signals measured on mechanical systems frequently contain noise and redundant information. The direct application of these signals, expressed in the time or frequency domain, to a neural network is not viable, as shown by Oliveira [5]. He proposed a methodology to compact the data using a statistical approach based on the analysis of the covariance matrix constructed with the input signals.
The shape of the mother wavelet function plays an important role in representing precisely the response of a vibrating mechanical system. The use of general waveforms can produce a good result mathematically, but the association of the physical system properties with the wavelet parameters is very difficult to achieve. Lepore and Santos [1] applied wavelets to identify the modal parameters of a highly damped mechanical system with closely spaced natural frequencies. The adopted mother wavelet function was similar to the system impulse response.
The choice of the correct wavelet function, among a wide set of available functions, determines which signal patterns can be represented by a finite linear combination of wavelets. Consequently, the compaction level, the minimum required number of wavelets, the precise representation of the vibration signal, and the identification of physical parameter values depend strongly on the selected wavelet function.
Using the wavelet transform it is possible to identify special patterns contained in vibration signals. The resulting compacted data can be used as inputs to neural networks of reduced architecture, designed to classify and identify excitation forces acting on a rotary machine.
The identification of unbalance and asynchronous excitation forces applied to a flexible rotor is analyzed. The unbalance is located at several discrete positions along the shaft, and the asynchronous forces are always applied at the rotor bearings. With these two classes of excitation forces, steady vibration responses can be measured at any location of the rotor, and they are normally periodic signals containing multiple harmonic components of the rotor angular velocity. To analyze this type of signal, the wavelet function must use as variable parameters the frequency, f, an exponential decay coefficient, ξ, and a phase angle, φ.
The proposed mother wavelet, presented by Eq. (1), represents an orthogonal and orthonormal set of functions, and can be used to represent the rotor vibration signals uniquely [9].
To decompose the vibration signals in terms of the atoms defined by Eq. (1), the Matching Pursuit algorithm proposed by Mallat and Zhang [4] is coupled to the Simulated Annealing optimization algorithm modified by Lepore and Santos [2]. This technique provides good convergence to the optimal values of the wavelet design parameters and is insensitive to local minima. To refine the solution around the optimum, a classic steepest descent algorithm is applied at the final stage of the solution. The signal components are successively extracted using Eq. (2), where < , > indicates the inner product, n is the iteration step and R f denotes the decomposed signal and its residue.
The design variables f, ξ and φ, which characterize the mother wavelet function ψ_{f,ξ,φ}, are determined by the proposed optimization procedure applied to the objective function defined by Eq. (3).
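The greedy extraction step of Matching Pursuit can be illustrated with a minimal sketch. It is written under stated assumptions and is not the authors' implementation: the atom is taken to be a unit-norm damped sinusoid in f, ξ and φ (an assumed stand-in, since the exact form of Eq. (1) is not reproduced in this text), a coarse grid search replaces the modified Simulated Annealing search, and all function names and the synthetic signal are hypothetical.

```python
import numpy as np

def atom(t, f, xi, phi):
    """Unit-norm damped sinusoid; an assumed stand-in for the atom of Eq. (1)."""
    g = np.exp(-xi * t) * np.cos(2.0 * np.pi * f * t + phi)
    return g / np.linalg.norm(g)

def matching_pursuit(signal, t, freqs, decays, phases, n_atoms):
    """Greedy decomposition: at each step select the atom with the largest
    inner product with the current residue, then subtract its projection."""
    residue = np.asarray(signal, dtype=float).copy()
    found = []
    for _ in range(n_atoms):
        best = None
        # Grid search used here only for illustration (the paper uses a
        # modified Simulated Annealing search followed by steepest descent).
        for f in freqs:
            for xi in decays:
                for phi in phases:
                    g = atom(t, f, xi, phi)
                    c = float(np.dot(residue, g))  # inner product <residue, atom>
                    if best is None or abs(c) > abs(best[0]):
                        best = (c, f, xi, phi, g)
        c, f, xi, phi, g = best
        residue = residue - c * g                  # new residue
        found.append((c, f, xi, phi))
    return found, residue

# Synthetic stationary signal: a 15 Hz component plus a weaker 40 Hz component.
fs, n = 1024.0, 2048
t = np.arange(n) / fs
x = 2.0 * np.cos(2 * np.pi * 15 * t) + 0.5 * np.cos(2 * np.pi * 40 * t + np.pi / 2)

freqs = np.arange(5.0, 60.0, 5.0)
decays = [0.0]
phases = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
atoms, res = matching_pursuit(x, t, freqs, decays, phases, n_atoms=2)
```

In this synthetic case two atoms, each described by an amplitude and the three parameters f, ξ and φ, capture essentially all of the signal energy, which is the mechanism behind the compaction discussed in the text.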

Neural network basic concepts and the Back Propagation training algorithm
A neural network can be considered as a set of nonlinear equations with memory capacity. It can only be used to interpolate results on a closed domain limited by the set of data used in its training. The basic processing unit of a neural network is the neuron, whose activation function produces an active or non-active condition at its output, depending on the pattern applied at its input. The activation functions normally present nonlinear behavior, such as the step, the sigmoid and the ramp functions. The neural network synapses are weighting values applied to the inputs of each neuron, defining paths for the propagation of information through the network. The set of synapse values obtained after the training process provides the memory capacity of the neural network [5].
The number of layers, the type of connections between neurons and the way that data flows through the network characterize the neural network architecture. For unidirectional or feed-forward neural networks, the data are presented to the input layer and flow through the intermediate or invisible (hidden) layers, where they are processed, producing the desired answers at the output layer. In a neural network designed with the feed-forward topology, the neurons of the same layer do not exchange information and do not receive data from neurons of the forward layers.
The definition of the optimal neural network architecture is an inverse problem that is difficult to solve. An insufficient number of neurons at the invisible layers promotes a state of neural paralysis, in such a way that the neural network cannot represent the information or cannot distinguish between different events. The overfitting problem occurs when an excessive number of neurons is adopted, and the effort of the training process is significantly increased. The optimal design of the neural network architecture is not discussed here.
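The feed-forward data flow described above can be sketched as follows. This is an illustrative example only: the layer sizes, the random weights, the bias terms and the sigmoid activation are assumptions chosen for the sketch, not the networks trained in this work.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Propagate an input vector layer by layer. Each layer only receives
    the outputs of the previous layer (no lateral or backward connections)."""
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Hypothetical network: 3 inputs, one invisible layer of 3 neurons, 3 outputs.
rng = np.random.default_rng(1)
sizes = [3, 3, 3]
weights = [rng.normal(size=(sizes[k + 1], sizes[k])) for k in range(len(sizes) - 1)]
biases = [rng.normal(size=sizes[k + 1]) for k in range(len(sizes) - 1)]

y = forward([0.2, -0.1, 0.7], weights, biases)
```

The weight matrices play the role of the synapses: after training, their values are the only place where the network stores what it has learned.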

The back-propagation algorithm
Rumelhart [8] presents an algorithm to adjust the synapse weights from the input through the invisible layers up to the output layer. The error of each invisible layer is calculated by back-propagating the error from the output layer. For this reason, the method is known as the back-propagation learning rule. This algorithm can be considered a generalization of the delta rule, used on multilayer neural networks with nonlinear activation functions [5].
The training process consists of adjusting the matrix of synapse weights using sets of known inputs and outputs. This problem can be solved by an optimization technique. The changes in the weight values are proportional to the residual error at each layer. The constant of proportionality is known as the learning rate. If a fraction of the gradient of the error is preserved from one iteration to the next, an inertia factor is included in the optimization process. A correct choice of the inertia factor helps the optimization algorithm to escape from a local minimum of the objective function.
In this paper an initial weight matrix is calculated by the Simulated Annealing method, modified by Lepore and Santos [2]. This procedure overcomes the numerical ill-conditioning problems and can also cope with the very large number of local minima of the error function.
As defined by Eq. (4), the error function is the average of the errors calculated for each vector, or experiment, that composes the training data set.
The adopted optimization procedure permits the use of a higher learning rate and lower values of the inertia factor when compared to those used by conventional optimization techniques. Reducing the inertia factor and increasing the learning rate provide a better convergence rate and a reduction of the final error of the training process [2].
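The roles of the learning rate and the inertia factor can be illustrated with a minimal sketch of a single weight update. It is a generic gradient descent step with an inertia (momentum) term applied to a toy quadratic error, under assumed parameter values; it is not the training code used in this work, and the function name is hypothetical.

```python
import numpy as np

def inertia_step(w, grad, prev_dw, learning_rate, inertia):
    """One weight update: a step along the negative error gradient plus a
    fraction (the inertia factor) of the previous update."""
    dw = -learning_rate * grad + inertia * prev_dw
    return w + dw, dw

# Toy error function E(w) = 0.5 * ||w - target||^2, whose gradient is w - target.
target = np.array([1.0, -2.0])
w = np.zeros(2)
dw = np.zeros(2)
for _ in range(200):
    w, dw = inertia_step(w, w - target, dw, learning_rate=0.1, inertia=0.5)
```

With the inertia factor set to zero the update reduces to plain gradient descent; a nonzero value carries part of the previous step forward, which is what allows the search to pass through shallow local minima.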

The modified simulated annealing algorithm
The proposed optimization procedure includes fundamental modifications to the classic Simulated Annealing algorithm with respect to the sampling strategy of the design variables. The Metropolis criterion is also modified to improve the convergence of the objective function to the global minimum [2]. With these modifications, the algorithm adjusts itself to the characteristics of the error function associated with the training of the neural network. This property is very useful to the training process, since the shape of the error function is always modified when an existing experiment is removed and a new one is included in the training data set. The error function also changes when the neural network topology is modified, since neurons are included in or removed from the invisible layers. This occurs when the neural network architecture is to be optimized.
The following strategy is implemented. The initial size of the training data set [X] is chosen. It is characterized by a fixed number of n experiments (x_i, i = 1, ..., n). This set is used to estimate the statistical correlation index [r] between each normalized design variable and the error function (f_i). The calculated r_i values are used as the standard deviation of the variation that is applied to the corresponding design variable during the search procedure.
The correlation index is determined using the covariance matrix [6] as stated by Eqs (5) and (6), where r is the correlation coefficient between the design variables and the cost function.
The correlation index is similar to the sensitivity analysis used in gradient-based optimization procedures. When the value of r_i ∈ [0, 1] approaches zero there is practically no correlation between the ith design variable and the error function. This occurs if the variable is close to its optimum value. This property promotes the refinement of the solution, since the perturbation of the ith variable is reduced when r_i is close to zero. For each new value of the design variable a new sample of the objective function is calculated. To be included in the sample set, the value of f must be close to the mean value of all objective functions calculated in the previous iteration steps.
To fulfill this requirement, a new criterion similar to the Metropolis criterion is adopted. It is assumed that the objective function variations follow a Gaussian probability density function, as represented by Eq. (7).
With this procedure the global optimum is achieved with high probability, even if the objective function presents several local minima. This is the case of the error function given by Eq. (4). If the standard deviation grows in a region containing the minimum, the mean value attracts the new perturbations of the design variables, reducing the probability of accepting a design located outside that region.
Finally, when the last n samples are in the region of the minimum, the resulting reduction of the standard deviation forces the algorithm to escape from the local minimum, since the value of the correlation index increases to one. This can represent a critical situation because the global optimum is not precisely found. To improve the precision after the global optimum region is reached by the Simulated Annealing procedure, the final optimization stage applies a conventional steepest descent method to refine the solution.
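The use of the correlation index as a perturbation scale can be sketched as follows. This is a hedged illustration: the absolute Pearson correlation computed with `numpy.corrcoef` is assumed to play the role of the index of Eqs (5) and (6), which are not reproduced in this text, and the function names and the synthetic objective are hypothetical.

```python
import numpy as np

def correlation_indices(X, f):
    """Absolute correlation between each design variable (a column of X)
    and the objective values f, with one sample per row."""
    r = np.empty(X.shape[1])
    for i in range(X.shape[1]):
        r[i] = abs(np.corrcoef(X[:, i], f)[0, 1])
    return r

def perturb(x, r, rng):
    """Gaussian perturbation whose standard deviation for variable i is r_i."""
    return x + rng.normal(0.0, 1.0, size=x.shape) * r

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 2))          # 200 samples, 2 design variables
f = 3.0 * X[:, 0] + 0.01 * rng.normal(size=200)    # objective depends on variable 0 only

r = correlation_indices(X, f)
x_new = perturb(X[-1], r, rng)
```

In this example the first variable is strongly correlated with the objective and is therefore perturbed widely, while the second is almost uncorrelated and is perturbed only slightly, which mirrors the refinement behavior described above.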

Case study and results
The experimental apparatus, represented in Figs 1 and 2, is used to validate the proposed methodology to detect and quantify the unbalance of the discs and the asynchronous forces applied to the bearings.
A finite element model of the complete system is used to design the rotor, the suspension of the bearing housings, and the external supporting structure. As a result, the main structure does not have natural frequencies coincident with the first three critical speeds of the rotor. This model was validated by modal analysis.
Each shaft ball bearing is mounted on a rigid housing that is connected to the external frame by two mutually perpendicular sets of four parallel spring blades. The adopted configuration for the bearing suspension uncouples the vibrations in the x and y directions.
The rigid discs are mounted on the shaft by means of conical sleeves. This solution allows several rotor geometric configurations to be tested with minimum design modifications, since the discs can be easily positioned along the shaft.
An AC induction motor, controlled by a frequency inverter, is used to drive the rotor, so that the angular speed can be adjusted with a resolution of 1 Hz. This inverter operates with a pulse modulation frequency of 16 kHz, and all electric cables are shielded and properly grounded to reduce electromagnetic induction on the measuring instruments. A very flexible link between the motor and the rotor shafts is used to reduce the transmission of lateral vibrations. Inductive proximity sensors, with a sensitivity of 1.0 Volt/mm, measure the vibration signals directly at the rotor discs.
The vibrations at the bearings are measured by piezoelectric accelerometers. The acceleration is converted to displacement by analog integration, with an overall sensitivity of 1 Volt/mm.
The proximity sensors are probes 1 to 3, and the accelerometers are probes 4 and 5, as indicated in Fig. 2. The angular position and velocity of the rotor are measured from two sources of TTL pulses generated by optical encoders. The first source provides one pulse per revolution, which is used to trigger the acquisition system and is the reference for the phase measurements. The second TTL source, which generates 120 pulses per revolution, is used to position the unbalance and the corrective masses at the discs with a resolution of three degrees.
All analog signals are sent to the HP 36650, which is a simultaneous eight-channel data acquisition system. The sampling frequency is set to 1024 Hz, and 2048 points are acquired per sample. The digitized time domain data are transferred to an HP725i workstation.
The unbalance excitations are generated by adding known masses to the discs.
Two electromagnetic exciters are used to apply the asynchronous excitation at the bearings. The upper bearing exciter acts perpendicular to the plane that contains the proximity probes, and the lower bearing exciter applies forces parallel to the proximity probes. Piezoelectric force sensors with a sensitivity of 100 mV/N measure the applied forces.
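The acquisition settings imply a few derived quantities that are useful when reading the results. The short calculation below is only an arithmetic sketch based on the values stated in the text (1024 Hz sampling, 2048 points, 120 encoder pulses per revolution, and the 15 Hz rotor speed used in the experiments); the variable names are illustrative.

```python
fs_hz = 1024           # sampling frequency
n_points = 2048        # points acquired per sample (record)
rotor_hz = 15          # rotor speed used in the experiments
pulses_per_rev = 120   # second encoder source

record_s = n_points / fs_hz              # duration of one record
df_hz = fs_hz / n_points                 # frequency resolution of a DFT of the record
nyquist_hz = fs_hz / 2                   # highest frequency that can be represented
revs_per_record = rotor_hz * record_s    # shaft revolutions contained in one record
deg_per_pulse = 360 / pulses_per_rev     # angular resolution for mass positioning
```

Each record therefore lasts 2 s and contains a whole number of shaft revolutions at 15 Hz, which is consistent with treating the measured signals as stationary.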

Design of the experiments
To generate the input data for the classification and identification neural networks, a set of experiments is conducted using the testing apparatus shown in Figs 1 and 2. They include different combinations of amplitude, phase, frequency and location of the excitations applied to the rotor, according to the following description:
- The angular velocity of the rotor is kept constant at 15 Hz in all experiments. This value is between the second and the third critical speed of the rotor.
- Sixteen unbalance masses varying from 3.5 grams up to 13.
A subset of experiments, collected from the global database, is not used in the training process of the neural network, but is reserved to evaluate the performance of the previously trained neural network. The performance is measured by the mean square error (Ep), defined as the percentage error of the neural network outputs with respect to the correct values, obtained with the reserved group. Small values of the mean square error indicate better neural network performance.
The experiments are organized as the columns of the general matrix [P], as shown in Eq. (8). A column contains the wavelet parameters grouped by measuring channel (chan_i, i = 1, ..., N). The same organization of [P] is used as input to all neural networks designed to solve the classification and the identification problems.
An example that shows the capacity of the wavelet functions to produce compacted data and to remove noise from the signals is presented in Fig. 3. This signal is randomly selected from the asynchronous excitation training set, and is represented by only three wavelet atoms. As the mother wavelet used in this analysis has four parameters (the amplitude, the phase, the frequency and the damping factor), 1024 data points of the signal can be represented by 12 wavelet parameters without losing any important information. The wavelet decomposition retained at least 92% of the RMS energy of the measured vibration signal.
The force identification problem was solved in two steps. A first neural network is designed to classify the force type, and two other neural networks are designed to quantify the force parameters associated with each type of excitation.
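The column-wise organization of the input matrix can be sketched as follows. The layout is an assumption made for illustration (one atom of four parameters per channel, in the order amplitude, phase, frequency, damping), the numerical values are invented, and the helper names are hypothetical.

```python
import numpy as np

def experiment_column(channel_params):
    """Concatenate the wavelet parameters of every channel into one column,
    keeping the parameters of each channel grouped together."""
    return np.concatenate([np.asarray(p, dtype=float).ravel() for p in channel_params])

def build_P(experiments):
    """Stack the experiments as the columns of the input matrix P."""
    return np.column_stack([experiment_column(e) for e in experiments])

# Two illustrative experiments, two channels each, one atom per channel:
# [amplitude, phase, frequency, damping]
exp1 = [[1.0, 0.00, 15.0, 0.0], [0.5, 1.57, 15.0, 0.0]]
exp2 = [[0.8, 0.30, 22.0, 0.0], [0.4, 1.20, 22.0, 0.0]]

P = build_P([exp1, exp2])   # shape: (channels * parameters, experiments)
```

Keeping the same row order for every experiment is what allows one matrix to feed both the classification and the identification networks.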

The classification neural network
Two groups of classification networks are designed to accept measurements from two combinations of measuring channels: (a) signals measured by probes 3 and 5, located at one disc and at the lower bearing, and (b) signals from probes 4 and 5, located at the bearings. The three neurons at the output layer distinguish the following conditions: only unbalance, only asynchronous, or both excitations exist in the analyzed signals.
Four architectures were studied, differing in the number of neurons at the invisible layer. This approach makes it possible to analyze the effect of the size of the invisible layer on the performance of the neural network, and also to determine its sensitivity to the location of the measuring points.
All neural networks were trained with a superset of inputs obtained from the combination of 60% of the unbalance and asynchronous excitation experiments, including their extremes. The remaining experiments are used to validate the network performance.
Tables 1 and 2 present the results for the two groups of classification networks. To complete the training process, a mean square error equal to 0.001 was imposed on all tested architectures. Different numbers of iterations are necessary in the training process to achieve the desired error, independently of the network architecture. This can be explained by the fact that the Simulated Annealing optimization technique is a non-deterministic procedure and depends strongly on the shape of the objective function. The networks of group (b), operating with the vibration signals measured at the bearings, produced better results than those of group (a), but the differences in classification performance are lower than 2%.
Reduced architectures, such as 3×3×3, could be successfully trained and present good classification results. This is a direct consequence of the proposed methodology, which includes preprocessing the input vibration signals by the wavelet transform and using the modified Simulated Annealing during the training process. An important fact is that this network architecture could not be trained using only a classical optimization algorithm based on the steepest descent method.
When the two types of forces are simultaneously applied to the rotor, an amplitude modulation effect appears in the signal, with the carrier frequency equal to the angular speed of the rotor. The performance of the classification neural network is not affected by this modulation effect, which introduces a series of harmonic frequencies in the measured signals.

Identification of the asynchronous forces applied to the bearings
A neural network processes the vibration signals generated by asynchronous excitations applied to the bearings and identifies the amplitude, the frequency and the point of application of these forces. The experimental signals used as the network input data are obtained by the procedures described in item 5.1. All tested architectures of the identification network use four neurons at the input and output layers. Several input signal combinations are studied, but only those that produced the best results are presented in Table 3. The output layer gives four answers about the localization, the amplitudes and the frequencies of the asynchronous excitation forces.
To evaluate the performance of these neural networks the following criteria are adopted:
- the number of iterations required to achieve the desired mean square error in the training process;
- the percentage of correct localization of the force; and
- the combined error (Ep) of the amplitude and frequency estimates.
The neural network with 12 neurons at the invisible layer, trained with input signals measured from all probes, presents the best performance.
Using only two probes and 8 neurons at the invisible layer, the influence of the measuring points on the network performance can be analyzed by looking at the last three rows of Table 3. For the network trained with probes 4 and 5, located at the bearings, the localization quality, the frequency error, and the combined frequency-amplitude error are very close to those of the best neural network; the amplitude error, however, is doubled.
These results agree with the physical interpretation of the dynamic behavior of the rotor. Probes at the bearings receive more information about the asynchronous forces than the probes placed at the discs.
Some other neural networks are trained with the signals measured by only one probe, installed at the lower bearing or at the upper bearing, representing an extremely unfavorable situation. The performance of these networks, not shown in Table 3, is poor when the rotor is excited by forces applied simultaneously to both bearings. This behavior is due to the strong influence of the force on the vibration measured at the same bearing.
The worst network performance was obtained using probes 1 and 3, positioned at the discs. This can be explained by the fact that several natural vibration modes of the flexible rotor present nodes at the bearings.

Identification of the unbalance forces applied at the discs
A neural network processes the signals generated by unbalance excitations applied to the three discs and identifies the mass magnitudes, their location along the shaft and their angular positions. The ex[...] 4, which contains network architectures with 5 and 10 neurons at the input layer and with 5 up to 25 neurons at the invisible layer. The influence of the number of inputs is analyzed by selecting signals from all five probes and from sets of two-probe combinations only.
The mean square error of the training process was set to 0.01 for all network architectures of Table 4. Other networks with a larger number of neurons at the invisible layer were tested, but presented overfitting problems, resulting in incorrect answers even when the mean square error of the training process was set to less than 0.001.
To evaluate the performance of these neural networks the following criteria were adopted:
- the number of iterations required to achieve the desired mean square error in the training process;
- the percentage of correct localization of the unbalance; and
- the combined error (Ep) of the unbalance magnitudes and phase angles.
The networks trained with signals measured by the five probes presented better performance than those trained with only two probes. Measurements made by two probes at the bearings gave better results than those made with probes located exclusively at the discs.
Additionally, the architecture 10 × 5 × 5 always produced good results independently of the number of measuring channels applied as inputs in the training process.
The worst case occurred for probes 1 and 3 with the 5 × 10 × 5 architecture, which produced a combined magnitude and phase error Ep = 1.25E-1 and 72% of correct localization of the unbalanced disc. All other architectures presented in Table 4 have a validation error of the order of 10^-2, which is very low. This indicates the precision of the neural network in identifying the unbalance forces applied to the flexible rotor.
The proposed methodology can be used to balance flexible rotors with concentrated inertias discretely distributed along the shaft. Provided that the neural network which models the rotary machine is previously available, the correct identification of the unbalance permits the correction masses to be placed in opposition to the unbalance masses. Consequently, the rotor remains balanced independently of its angular speed, since the unbalance excitations are individually canceled.
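Placing a correction mass in opposition to an identified unbalance amounts to a simple vector cancellation, sketched below. The example assumes that the correction mass is installed on the same disc and at the same radius as the identified unbalance, and it snaps the angle to the three-degree positioning resolution mentioned in the description of the encoder; the function names and the numerical values are hypothetical.

```python
import cmath
import math

def snap(angle_deg, step_deg=3.0):
    """Round an angle to the nearest available mounting position."""
    return (round(angle_deg / step_deg) * step_deg) % 360.0

def correction(mass_g, angle_deg):
    """Correction mass at the same radius, diametrically opposite the unbalance."""
    return mass_g, snap(angle_deg + 180.0)

def residual(mass_g, angle_deg, corr_mass_g, corr_angle_deg):
    """Magnitude of the vector sum of unbalance and correction (in grams,
    assuming a common radius)."""
    u = cmath.rect(mass_g, math.radians(angle_deg))
    c = cmath.rect(corr_mass_g, math.radians(corr_angle_deg))
    return abs(u + c)

# Identified unbalance (illustrative values): 3.5 g at 45 degrees.
m_c, a_c = correction(3.5, 45.0)
left_over = residual(3.5, 45.0, m_c, a_c)
```

Because the cancellation is done disc by disc, the result does not depend on the rotor speed, which is the property claimed for the balancing methodology. Any error in the identified magnitude or angle shows up directly as a nonzero residual.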

Conclusions
Neural networks with reduced architecture and improved performance can be successfully trained when the input data is previously compressed by the wavelet transform. The signal compression performed by an adequate wavelet function naturally increases the signal-to-noise ratio and retrieves only the desired information from the signal. Consequently, analog or digital filters can be eliminated from the preprocessing stage of the signals used as the input database of a neural network.
The design of the optimal neural network architecture is an open problem. The results obtained in this research indicate that a reduced topology is always better, since it can be easily trained and generally provides good performance. The use of the wavelet transform to compress the input data of the network is fundamental to fulfill this task.
The training process of neural networks is improved by the application of the modified Simulated Annealing algorithm to the optimization of the network error function. This algorithm proved to be efficient, robust and less sensitive to the presence of local minima. Therefore, the use of Simulated Annealing combined with a steepest descent algorithm at the final stage of the optimization procedure is a reliable approach to find the global minimum of objective functions that present a large number of local minima. The final results are obtained with good precision and without large [...]
The neural network identification methodology is efficient and robust in identifying excitation forces in rotary machines without any prior knowledge about the number of forces, but it requires the existence of a previously trained network.
The identification of the unbalance presented surprisingly good results, since the precision obtained for the magnitudes and their angular positions was high for almost all tested architectures. The correct localization of the balancing planes depends on the number and position of the sensors that measure the vibrations used as inputs to the neural network. Evidently, the sensors should not be positioned close to nodes of the predominant modes of the flexible rotor at its operating speed. The best measuring points can be determined by an experimental modal analysis of the actual rotary machine or by simulation of a computational model.

Fig. 3. Sample of a measured signal and its decomposition by wavelets.

Table 1
Training and validation results for the group (a) classification neural networks

Table 4
Training and validation results for the unbalance identification neural networks