An Approach to Intelligent Fault Diagnosis of Cryocooler Using Time-Frequency Image and CNN

Cryocoolers play an essential role in the field of infrared remote sensing. The linear compressor is the power component of the cryocooler, and a fault in it directly affects the normal operation and performance of the detector. Therefore, the intelligent fault diagnosis of the linear compressor is of great significance. An intelligent fault diagnosis method based on time-frequency images and a convolutional neural network (CNN) is proposed to diagnose piston-cylinder friction, mass imbalance, and plate spring distortion in the linear compressor. First, the wavelet transform time-frequency analysis method is used to generate the time-frequency image of each vibration signal. The CNN is then used to automatically extract features from the time-frequency images and classify the fault modes. The results of simulation experiments show that the method can identify several fault modes of the linear compressor with 95% accuracy.


Introduction
The linear compressor uses a linear motor drive, plate spring support, and gap seal technology to meet its long-life and high-reliability requirements [1][2][3]. The piston, plate spring, and coil are the main moving parts of the linear compressor; they form its core and are the parts most vulnerable to failure. Identifying and locating faults accurately and quickly is therefore very important for preventing failure of the linear compressor [4][5][6].
At present, the fault diagnosis of mechanical equipment focuses mainly on rotary machinery, and comparatively little work addresses mechanical fault diagnosis of reciprocating compressors. Vibration analysis is widely used in the field of mechanical equipment fault diagnosis. The traditional fault diagnosis methods mainly use signal processing methods such as the fast Fourier transform (FFT) and the short-time Fourier transform (STFT) to extract fault features. Intelligent fault diagnosis is a newly developing direction of mechanical fault diagnosis, in which the artificial neural network and the support vector machine are the two most popular methods. For example, Jamadar et al. extracted 24-dimensional characteristic parameters to describe the working state of the bearing and adopted a BP neural network to classify various faults. Ali used two new features and 17 characteristic parameters to judge the health state of the bearing and then established an artificial neural network (ANN) to identify fault types. Chen et al. proposed a feature extraction and selection method and applied an ANN to the diagnosis of fault severity. However, the construction of such feature vectors is affected by the uncertainty and bias of domain experts.
In this context, deep learning comes into play. Its main benefit is that it can learn nonlinear representations of the raw signal at higher levels of abstraction and complexity without a human engineer guiding the learning. Since 2015, deep learning has been successfully applied to the diagnosis or classification of vibration signals of mechanical equipment [7]. Wang et al. proposed using the wavelet scale image as the input of a CNN and used a series of 32 × 32 images to detect faults in a group of vibration data. Li et al. studied the effect of noise in the raw signal on CNN training. The time-frequency image obtained by the short-time Fourier transform is used as the input layer of the CNN. For gearbox vibration data, Chen et al. fed traditional features, arranged as a feature vector, into a CNN composed of a convolutional layer and a pooling layer.
A diagnosis method based on time-frequency images and a convolutional neural network is proposed for linear compressor faults. The main structure consists of two convolution layers and two pooling layers. The time-frequency image produced by the continuous wavelet transform is used as the image input of the convolutional neural network. This method is selected because it provides a suitable representation of complex, high-dimensional data without additional feature extraction [8][9][10]. The rest of this article is organized as follows. Section 1 provides an overview of deep learning and CNN. Section 2 gives a brief overview of the basic principles of the time-frequency analysis method of continuous wavelet transform. Section 3 outlines the CNN architecture built to complete the diagnostic task of fault detection. Section 4 summarizes the results of applying this method to the measured linear compressor fault dataset.

Convolutional Neural Network
In the CNN structure, a region of neurons in the input layer is connected to a single neuron in the hidden layer; this region is called the local receptive field. Every neuron in the hidden layer uses the same weights over its local receptive field and the same bias. Unlike traditional neural networks, a CNN therefore shares weights and biases across the whole input layer and hidden layer. The mathematical expression of weight-sharing filtering at position (j, k) is shown in the following formula:

$$b + \sum_{l=0}^{n-1} \sum_{m=0}^{n-1} W_{l,m}\, a_{j+l,k+m} \tag{1}$$

where $W_{l,m}$ represents the shared weight, $b$ represents the bias, $a_{j+l,k+m}$ is the input activation value at a certain position, and $n$ is the size of the filter. Each convolutional layer is usually followed by a pooling layer, which increases the robustness of the network while reducing the input size and the number of network parameters of the next layer. The network structure proposed in this paper uses maximum pooling to extract features. For the convolution layer, if there are M feature maps as input and N filters, then the j-th output map $x_j^l$ of layer l can be calculated according to the following formula:

$$x_j^l = f\left(\sum_{i=1}^{M} x_i^{l-1} * k_{ij}^l + b_j^l\right), \quad j = 1, \ldots, N \tag{2}$$

where $f$ represents the activation function; $*$ denotes convolution; $x_i^{l-1}$ is the i-th input feature map; $k_{ij}^l$ is the j-th filter kernel connected to the i-th input map; and $b_j^l$ represents the bias corresponding to the j-th filter.
Thus, N feature maps are obtained as the output. Assuming the filter size is s × s, formula (3) gives the total number of parameters of the convolution layer:

$$N \times (M \times s \times s + 1) \tag{3}$$

The convolution operation is shown in Figure 1. After the convolution operation and the addition of the corresponding bias, an activation function is used to calculate the output map. The commonly used activation functions of neural networks include the logistic function, the hyperbolic tangent function, and the rectified linear unit (ReLU) function, as shown in equations (4)-(6):

$$f(x) = \frac{1}{1 + e^{-x}} \tag{4}$$

$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{5}$$

$$f(x) = \max(0, x) \tag{6}$$

There are two types of pooling operations: maximum pooling and average pooling. Maximum pooling takes the maximum value of a local region in the feature map, and average pooling takes the average value of that region. The operation process of pooling is shown in Figure 2.
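The convolution, activation, and pooling steps described above can be sketched in a few lines of pure Python. This is a minimal illustration, not the implementation used in the paper: it assumes stride 1 and no padding for the convolution and non-overlapping pooling regions, and the names `conv2d_valid`, `pool2d`, `image`, and `kernel` are hypothetical.

```python
import math

def logistic(x):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)

# math.tanh provides the hyperbolic tangent activation.
ACTIVATIONS = {"logistic": logistic, "tanh": math.tanh, "relu": relu}

def conv2d_valid(image, kernel, bias=0.0, activation=relu):
    """Slide one shared n x n kernel over the image (stride 1, no padding assumed)."""
    n = len(kernel)
    rows = len(image) - n + 1
    cols = len(image[0]) - n + 1
    out = []
    for j in range(rows):
        row = []
        for k in range(cols):
            # The same weights and the same bias are reused at every position (j, k).
            total = bias
            for l in range(n):
                for m in range(n):
                    total += kernel[l][m] * image[j + l][k + m]
            row.append(activation(total))
        out.append(row)
    return out

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling over size x size regions."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, size):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, size):
            region = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(region) if mode == "max" else sum(region) / len(region))
        pooled.append(row)
    return pooled

# Toy 5 x 5 input and 2 x 2 kernel (illustrative values only).
image = [[1, 2, 0, 1, 3],
         [0, 1, 2, 3, 1],
         [2, 0, 1, 2, 0],
         [1, 3, 0, 1, 2],
         [0, 1, 2, 0, 1]]
kernel = [[1, 0],
          [0, 1]]

feature_map = conv2d_valid(image, kernel, bias=-2.0)   # 4 x 4 map after ReLU
max_pooled = pool2d(feature_map, 2, "max")             # 2 x 2
avg_pooled = pool2d(feature_map, 2, "avg")             # 2 x 2
```

With a 5 × 5 input and a 2 × 2 kernel, the feature map is 4 × 4 and each pooled map is 2 × 2, which shows how pooling halves the spatial size passed to the next layer.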
After features have been extracted from the input image by multiple convolutional and pooling layers, the resulting feature maps are fed to the fully connected layer for further feature extraction. In the whole convolutional neural network structure, softmax is used as the classifier to realize the classification of fault signals.
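As a minimal sketch of the softmax classifier mentioned above, the following function turns the raw scores of the output layer into class probabilities. The six scores are made-up values for illustration, and subtracting the maximum score is a standard numerical-stability step rather than something stated in the paper.

```python
import math

def softmax(scores):
    """Convert raw output scores into probabilities that sum to 1."""
    shift = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical output scores for the six compressor states.
scores = [2.0, 0.5, 0.1, -1.0, 0.0, 1.2]
probs = softmax(scores)
predicted = probs.index(max(probs))          # index of the most probable state
```

The predicted class is simply the index with the largest probability, which is also the index with the largest raw score.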

Time-Frequency Analysis Method
The time-frequency transform represents the joint distribution information of the signal in time and frequency at the same time. Because the fault vibration signal of the linear compressor is nonstationary, the traditional frequency analysis method is unable to obtain fault characteristic information comprehensively. The time-frequency analysis method has a good effect on nonstationary signal processing. The commonly used time-frequency analysis methods are wavelet transform, short-time Fourier transform, S-transform, and so on [11][12][13][14].

Wavelet Transformation.
The CWT of signal x(t) can be realized by the convolution of x(t) with the complex conjugate of a set of wavelets, whose expression is as follows:

$$W(\alpha, b) = \frac{1}{\sqrt{\alpha}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{\alpha}\right) dt$$

where $\alpha$ and $b$ represent the expansion and translation factors, respectively, and $\psi^{*}(\cdot)$ is the complex conjugate of the scaled and translated wavelet function $\psi(\cdot)$. According to the definition of the wavelet transform:

$$\int_{-\infty}^{+\infty} |x(t)|^{2}\, dt = \frac{1}{C_{\psi}} \int_{0}^{+\infty} \frac{ds}{s^{2}} \int_{-\infty}^{+\infty} |wt(s, \tau)|^{2}\, d\tau$$

where $s$ and $\tau$ denote the scale and the translation (the same roles as $\alpha$ and $b$ above); $C_{\psi}$ is a constant; $|wt(s, \tau)|^{2} / (C_{\psi} s^{2})$ can be regarded as the energy density function of the time-scale plane; and $|wt(s, \tau)|^{2} \Delta s \Delta\tau / (C_{\psi} s^{2})$ represents the total energy in the region with scale interval $\Delta s$ and time interval $\Delta\tau$ centered on the point $(s, \tau)$. The difference between the CWT and the Fourier transform is that in the CWT, the wavelet family replaces the sines and cosines of the Fourier transform as the basis functions. Because the wavelet family contains two parameters (expansion factor $\alpha$ and translation factor $b$), a signal is projected onto a two-dimensional time-scale plane rather than onto the one-dimensional frequency axis of the Fourier transform. The wavelet coefficient $W(\alpha, b)$ measures the similarity between the signal x(t) and the analysis wavelet $\psi(t)$ at a series of scales defined by the parameter $\alpha$ and at time positions defined by the parameter $b$. The above formula indicates that wavelet analysis belongs to time-frequency analysis, or more appropriately, time-scale analysis, which can reflect the time-frequency information of signals. Therefore, it is widely used in the field of nonstationary signal analysis and mechanical equipment fault diagnosis.
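To make the CWT definition concrete, the following pure-Python sketch evaluates the transform by direct summation for a short test signal. It is an illustration under several assumptions that are not taken from the paper: a complex Morlet wavelet with centre frequency `OMEGA0 = 6`, the approximate scale-to-frequency relation f = ω0 / (2πα), a wavelet support truncated at four scale lengths, and a synthetic 50 Hz sine instead of measured vibration data.

```python
import cmath
import math

OMEGA0 = 6.0  # Morlet centre frequency in rad; a common default, assumed here

def morlet(t):
    """Complex Morlet wavelet psi(t)."""
    return math.pi ** -0.25 * cmath.exp(1j * OMEGA0 * t) * math.exp(-0.5 * t * t)

def cwt_magnitude(signal, fs, freqs):
    """Return |W(alpha, b)|: one row per frequency, one column per sample."""
    dt = 1.0 / fs
    rows = []
    for f in freqs:
        alpha = OMEGA0 / (2.0 * math.pi * f)       # scale matched to frequency f
        half = int(math.ceil(4.0 * alpha * fs))    # wavelet is negligible beyond this
        row = []
        for b in range(len(signal)):
            acc = 0j
            lo = max(0, b - half)
            hi = min(len(signal), b + half + 1)
            for n in range(lo, hi):
                # x(t) times the conjugate of the scaled, translated wavelet
                acc += signal[n] * morlet((n - b) * dt / alpha).conjugate()
            row.append(abs(acc) * dt / math.sqrt(alpha))
        rows.append(row)
    return rows

fs = 1024                                           # sampling frequency in Hz
signal = [math.sin(2 * math.pi * 50 * n / fs) for n in range(256)]
freqs = [25.0, 50.0, 100.0]
scalogram = cwt_magnitude(signal, fs, freqs)

mid = len(signal) // 2
strongest = max(range(len(freqs)), key=lambda i: scalogram[i][mid])
```

For the 50 Hz test tone, the row analysed at 50 Hz carries the largest magnitude in the middle of the record, which is the behaviour the time-frequency image relies on. In practice a library routine would normally replace the explicit loops.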

Time-Frequency Analysis of Linear Compressor Based on Continuous Wavelet Transform.
The input of the convolutional neural network structure proposed in this paper is a two-dimensional grayscale image. Therefore, it is necessary to first convert the original vibration signal into a time-frequency diagram for each of the six common states of the linear compressor: normal state, mass imbalance of the rotor (slight), mass imbalance of the rotor (moderate), mass imbalance of the rotor (severe), dynamic and static friction, and plate spring distortion.
The corresponding time-frequency graphs, obtained by continuous wavelet transform, are shown in Figure 3.

Data Preprocessing.
The method proposed in this paper needs to preprocess the original data and convert them into image format. The input of the CNN should be in the form of an m × n × k matrix. In the field of image processing, the value of k is usually 3, representing the three channels of a color image. In order to simplify the calculation, this paper adopts the grayscale image as the input, so the value of k is 1. This paper uses the time-frequency analysis method of wavelet transform, and the specific operation is shown in Figure 4.
First, the original signal is converted into a time-frequency graph by wavelet transform. A color picture carries more information than recognition requires, and the information in the grayscale image is sufficient, so converting the image to grayscale speeds up computation. MATLAB was then used to convert the time-frequency diagram into a grayscale image. To reduce the computational load and facilitate CNN training, the image size was compressed to 32 × 32.
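The preprocessing chain can be sketched as follows. The paper performs this step in MATLAB and does not state how the image is rescaled, so this Python version is only an illustration: it assumes min-max scaling to 256 grey levels and shrinking by block averaging, and the input matrix `magnitudes` is synthetic.

```python
def to_grayscale(matrix):
    """Min-max scale a 2D list of magnitudes to integer grey levels 0..255."""
    lo = min(min(row) for row in matrix)
    hi = max(max(row) for row in matrix)
    span = (hi - lo) or 1.0                 # avoid division by zero for a flat image
    return [[round(255 * (v - lo) / span) for v in row] for row in matrix]

def resize_mean(image, out_rows=32, out_cols=32):
    """Shrink an image by averaging the pixels that fall into each output cell."""
    in_rows, in_cols = len(image), len(image[0])
    out = []
    for r in range(out_rows):
        r0 = r * in_rows // out_rows
        r1 = max(r0 + 1, (r + 1) * in_rows // out_rows)
        row = []
        for c in range(out_cols):
            c0 = c * in_cols // out_cols
            c1 = max(c0 + 1, (c + 1) * in_cols // out_cols)
            cell = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            row.append(round(sum(cell) / len(cell)))
        out.append(row)
    return out

# Synthetic 64 x 128 stand-in for a time-frequency magnitude matrix.
magnitudes = [[(r * c) % 17 / 16.0 for c in range(128)] for r in range(64)]

gray = to_grayscale(magnitudes)
small = resize_mean(gray)                   # 32 x 32 single-channel image, k = 1
```

The result is a 32 × 32 matrix of grey levels, matching the m × n × 1 input form described above.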

Proposed CNN Structure.
Because the amount of data and the available computing power are limited, the network we designed is deliberately simple. As shown in Figure 5, C1 is the first convolutional layer, with 6 convolution kernels of size 5 × 5. P1 is the first pooling layer, which adopts maximum pooling with a pooling area of 2 × 2. C2 is the second convolutional layer, with 12 convolution kernels of size 5 × 5. P2 is the second pooling layer, which also adopts maximum pooling with a pooling area of 2 × 2. F is a fully connected layer with 120 nodes, and softmax is the output layer that contains six classes. A softmax classifier was used, and dropout was applied in the fully connected layer (p was set to 0.4). The model is trained by mini-batch gradient descent with a batch size of 50 and a learning rate of 0.01, so as to minimize the cross-entropy loss.
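The layer sizes implied by this structure can be checked with simple arithmetic. The sketch below assumes a 32 × 32 single-channel input, stride-1 convolutions without padding, and non-overlapping 2 × 2 pooling; the padding and stride are not stated in the text, so the resulting counts are indicative only.

```python
def conv_out(size, kernel):
    """Output width of a stride-1 convolution without padding (assumed)."""
    return size - kernel + 1

def conv_params(in_maps, out_maps, kernel):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_maps + 1) * out_maps

size, maps = 32, 1            # 32 x 32 grayscale input
layers = []
for name, kernel, filters in [("C1", 5, 6), ("C2", 5, 12)]:
    params = conv_params(maps, filters, kernel)
    size, maps = conv_out(size, kernel), filters
    layers.append((name, size, maps, params))
    size //= 2                # 2 x 2 max pooling halves each side
    layers.append(("P" + name[1], size, maps, 0))

flat = size * size * maps     # inputs to the fully connected layer F
fc_params = flat * 120 + 120  # F: 120 nodes
out_params = 120 * 6 + 6      # softmax output: six classes
total = sum(p for _, _, _, p in layers) + fc_params + out_params
```

Under these assumptions the feature maps shrink from 28 × 28 (C1) to 14 × 14 (P1), 10 × 10 (C2), and 5 × 5 (P2), and most of the roughly 39,000 parameters sit in the fully connected layer.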

Fault Data Acquisition and Analysis.
In order to verify the effectiveness of the proposed method, the experimental data in this paper were collected from the simulation test platform for fault diagnosis of small cryogenic refrigerators designed by Shanghai Institute of Technology, Chinese Academy of Sciences, as shown in Figure 6. The fault test used an acceleration sensor and data acquisition software of mpichina company in Germany. The sampling frequency is 1024 Hz, and the sampling time is 2 s, that is, 2048 points are sampled at one time. The vibration data of the refrigerator were measured under six states: normal state, mass imbalance of the rotor (slight), mass imbalance of the rotor (moderate), mass imbalance of the rotor (severe), dynamic and static friction, and plate spring distortion, as shown in Table 1. In each test, only a single fault is present while all other parts are normal. The mass imbalance fault of the rotor is due to machining errors and prolonged wear. To reproduce it, 20 g, 40 g, and 60 g mass blocks are added to the piston connecting rod at one end of the linear compressor, which unbalances the masses of the motors on the two sides of the linear compressor. The dynamic and static rubbing fault arises because the gap between the piston and the cylinder is small, so the two can come into contact. The clearance between the piston and the cylinder is usually only 10 to 20 microns, so when the cylinder is skewed or debris is present, friction occurs between the piston and the cylinder. This fault is reproduced by changing the inclination angle between the piston and the cylinder.
For the plate spring distortion fault, when the single motor plate spring is installed, the nut on the outer ring of the plate spring is loosened, the center nut is tightened with a torque wrench to twist the plate spring, and the nut on the outer ring is then locked. The fault diagnosis process of the linear compressor is shown in Figure 7. For each state, 1000 samples are collected, and each signal is 2 s long. 70% of the data are randomly selected as training samples, and the remaining 30% are used as test samples.
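The dataset bookkeeping described above can be sketched as follows. The text does not say whether the random 70/30 split is drawn per state or over the pooled data, so this example assumes a single shuffle over all samples with a fixed seed; the state names and the `(label, sample_id)` layout are hypothetical.

```python
import random

STATES = ["normal", "imbalance_slight", "imbalance_moderate",
          "imbalance_severe", "friction", "spring_distortion"]
SAMPLES_PER_STATE = 1000
FS_HZ, DURATION_S = 1024, 2
points_per_sample = FS_HZ * DURATION_S          # samples in one 2 s record

# Each entry is (label, sample_id); the signals themselves are not loaded here.
dataset = [(label, i)
           for label in range(len(STATES))
           for i in range(SAMPLES_PER_STATE)]

rng = random.Random(0)                          # fixed seed (assumed) for a repeatable split
rng.shuffle(dataset)

cut = len(dataset) * 7 // 10                    # 70% for training
train, test = dataset[:cut], dataset[cut:]
```

With six states and 1000 samples each, this yields 4200 training samples and 1800 test samples of 2048 points each.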

Parameter Selection of CNN.
The selection of appropriate parameters is very important for the training of a CNN, and the optimal parameters differ between sample sets. Finding the optimal parameters for the dataset at hand is therefore an important step in training the CNN.

Learning Rate.
In the process of CNN training, stochastic gradient descent (SGD) is selected as the optimizer, and the learning rate is a very important parameter, which directly affects the updating of weights and the convergence of the error [15]. Therefore, choosing an appropriate learning rate is crucial to improve the efficiency of network training. A large learning rate at the beginning speeds up training and helps avoid getting stuck in local minima; a smaller learning rate is then used so that the model can converge. In the process of CNN training, the optimal learning rate was selected by comparing the loss and accuracy under different learning rates. The results are shown in Table 2.
According to Table 2 and Figure 8, the training accuracy and test accuracy first increase with the learning rate, reach their maximum at a learning rate of 0.001, and decrease as the learning rate increases further. Therefore, a learning rate of 0.001 is adopted in this paper.
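The way a learning rate that is too small or too large hurts training can be illustrated on a toy problem. The sketch below runs plain gradient descent on the loss L(w) = w², which is not the paper's CNN loss; the candidate rates and the resulting best value are specific to this toy example and say nothing about the value of 0.001 reported above.

```python
def final_loss(learning_rate, steps=50, w0=1.0):
    """Run plain gradient descent on the toy loss L(w) = w**2."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w                 # dL/dw
        w -= learning_rate * grad      # weight update scaled by the learning rate
    return w * w

# Hypothetical candidate learning rates for the sweep.
candidates = [0.0001, 0.001, 0.01, 0.1, 1.1]
losses = {lr: final_loss(lr) for lr in candidates}
best = min(losses, key=losses.get)     # rate with the lowest loss after 50 steps
```

Very small rates barely move the weight in 50 steps, while a rate above 1 makes the updates overshoot and diverge, so the lowest final loss is reached at an intermediate value. The same sweep-and-compare procedure is what the comparison in Table 2 carries out on the real network.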

Batch Size.
In the training of a CNN, the large amount of sample data and the limited computer configuration mean that all the samples cannot be used for network training at the same time. Therefore, the samples are usually divided into medium-sized blocks, and the size of such a block is called the batch size. In this experiment, we trained the CNN with different batch sizes while keeping all other parameters the same. Losses, accuracies, and time costs are shown in Table 3.
Figures 9 and 10 show that the smaller the batch size, the higher the prediction accuracy and the longer the training time. At a batch size of 10, the accuracy is only slightly lower than with smaller batch sizes, while the training time is much shorter. For batch sizes greater than 10, the prediction accuracy decreases greatly, but the training time does not decrease significantly. A batch size of 10 therefore balances prediction accuracy and training time for the CNN structure proposed in this paper. The grayscale images are input into the proposed CNN structure with a learning rate of 0.001 and a batch size of 10, and the final predicted results are shown in Figure 11. From left to right, the abscissa shows: normal state, mass imbalance of the rotor (slight), mass imbalance of the rotor (moderate), mass imbalance of the rotor (severe), dynamic and static friction, and plate spring distortion.
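One reason smaller batches take longer is that they require more weight updates per pass over the data. The sketch below counts the updates per epoch for a few batch sizes, assuming 4200 training samples (70% of 6 × 1000); the helper `minibatches` and the list of batch sizes are illustrative and are not the values in Table 3.

```python
import random

def minibatches(indices, batch_size, rng):
    """Shuffle sample indices and yield them in consecutive batches."""
    order = list(indices)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]

n_train = 4200                         # 70% of 6 x 1000 samples
rng = random.Random(0)                 # fixed seed (assumed) for repeatability

# Number of weight updates needed to see every training sample once.
updates_per_epoch = {
    bs: sum(1 for _ in minibatches(range(n_train), bs, rng))
    for bs in (10, 50, 100)
}
```

A batch size of 10 needs 420 updates per epoch, compared with 84 for a batch size of 50 and 42 for a batch size of 100.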
Using the CNN structure and the optimized parameters, the recognition rate of the six states of the linear compressor is increased to 95.5%.

Conclusion
The following conclusions can be drawn from the above analysis. First, using the CWT to transform the original vibration signal into a time-frequency diagram yields richer fault information, avoids the tedious manual extraction and selection of fault features, and simplifies the fault diagnosis procedure. Second, the optimized structural parameters of the CNN improve the fault identification accuracy to 95.5% and shorten the training time.
The validity of the method is verified by the measured fault data of the linear compressor.
Future work will include more experimental tests to further understand the limitations of the CWT and CNN methods [16][17][18], especially for more complex failures, such as multiple failures that occur simultaneously and are coupled with each other. Determining the optimal parameters also remains a problem for further study: with a deeper network structure or a completely different fault, the corresponding optimal parameters may change.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.
Computational Intelligence and Neuroscience