Wheelset-Bearing Fault Detection Using Adaptive Convolution Sparse Representation

Wheelset bearings are crucial mechanical components of high-speed trains. Wheelset-bearing fault detection is of great significance to ensure the safety of high-speed train service. Convolution sparse representations (CSRs) provide an excellent framework for extracting impulse responses induced by bearing faults. However, the performance of CSR on extracting impulse responses is fairly sensitive to inappropriate selection of method-related parameters, and a convolution model for representing the impulse responses has not been discussed. In view of these two unsolved problems, a convolutional representation model of the impulse response series is developed. A novel fault detection method, named adaptive CSR (ACSR), is then proposed based on combinations of CSR and methods for estimating three parameters related to CSR. Finally, the effectiveness of the proposed ACSR method is validated via simulation, bench testing, and a real-life running test employing a high-speed train.


Introduction
Wheelset bearings are crucial mechanical components of high-speed trains, and their major roles are to transform the rotational motion of wheelsets into the linear motion of high-speed trains, transmit driving motor torques, and bear the vertical loads of frames and car bodies. During the long-term running process of a high-speed train, the complex dynamic actions on wheelset bearings inevitably lead to the initiation and further extension of wheelset-bearing faults and finally endanger train operational safety [1]. Therefore, it is of great significance to detect wheelset-bearing faults to ensure the safety of high-speed train service.
Vibration-based analyses, as feasible and effective tools for the detection of wheelset-bearing faults, can provide fruitful feature information regarding the working status of a monitored bearing [2]. Once a defect appears on the surface of a bearing component, a series of impulse responses induced by the defect will be generated as the wheelset rotates. However, when the defect enters and leaves the bearing zone, the amplitudes of those impulse responses are modulated. When bearing rollers slip, the repetition frequencies of the impulse responses are modulated [3]. Under certain conditions, impulse response series with different resonance frequencies are excited [4]. In addition, measurement noise and strong wheel-rail interference submerge or pollute the weak impulse responses [5], so the spectra of the fault signals are smeared. Together, these problems make the detection of wheelset-bearing faults complex and difficult.
To effectively resolve those difficulties in detecting wheelset-bearing faults, many advanced signal processing methods have been proposed, which primarily include the filter-based high-frequency resonance technique [6], short-time Fourier transform (STFT) [7], Wigner-Ville distribution (WVD) [8], empirical mode decomposition (EMD) [9], wavelet transform (WT) [10,11], and compressive sensing (CS) [12,13]. However, the inconvenient selection of centre frequency and bandwidth hampers the wide application of the filter-based high-frequency resonance technique [14]. STFT is inherently unsuited to analysing time-varying signals [15]. The cross terms of WVD on multicomponent signals cause unexpected interference when detecting faults [16]. As a result, EMD, WT, and CS have been widely applied in the field of rotational machine fault detection and have become the leading algorithms for fault detection. EMD, as an adaptive signal processing method, is suitable for analysing nonlinear and nonstationary signals and can be used to decompose analysed signals into sets of intrinsic mode functions (IMFs) and residuals [17]. Hence, EMD has been employed quite successfully in the fields of fault diagnosis, failure detection, damage identification, and health monitoring [18]. However, EMD suffers from the lack of a theoretical foundation, and the definition of its IMF is still controversial [19]. In addition, measurement noise and sampling errors easily result in the incorrect placement of signal extremes, and EMD-sifting processes based on the envelopes determined by incorrect extremes inevitably generate inaccurate or even erroneous IMFs. Thus, EMD performs poorly when used for fault detection in low signal-to-noise ratio (SNR) or weak transient situations [20].
An IMF could include several-mode resonance responses induced by bearing faults because of EMD mode mixtures, and the single-mode resonance responses induced by the bearing faults could be divided into the different IMFs because of the mode break. Mode problems caused by using EMD adversely affect fault detection performance [21]. To alleviate mode problems, variants of EMD (such as EEMD [22], BEMD [23], CEMD [24], and MEMD [25]) have been developed. However, there is still no theory that guarantees mode problems will be avoided.
WT can provide time-frequency information on the impulse response induced by bearing faults. The continuous wavelet transform (CWT) and discrete wavelet transform (DWT) methods have been successfully applied to fault detection for rotational machinery [26] given their multiresolution merits. The redundant coefficients and huge computational costs of CWT restrict its wide application in practical engineering. With its fast iterative algorithm, the DWT has gained fruitful application in mechanical fault detection [27]. To improve the decomposition performance of WT for high-frequency bands containing rich fault modulation information, the wavelet packet transform was proposed [28]. However, the decomposition quality of a DWT heavily depends on the selection of the mother wavelet [29]. The shift-variance characteristic of most DWTs causes the impulse responses to be distorted [30]. The fixed dyadic frequency partitioning of DWTs easily generates scale mixtures and scale breaks [31]. The low oscillation of the wavelet basis in a DWT weakens its ability to sparsely represent impulses with highly oscillatory characteristics [32]. To better match vibration signals containing multimode resonance responses induced by composite faults, the multiwavelet packet was proposed [20]. To make WTs shift invariant, the dual-tree complex wavelet [30] and higher-density dyadic wavelet transform [33] were developed. To improve the flexibility of the frequency partitioning of WTs, the overcomplete rational dilation discrete wavelet transform was proposed [31]. To adjust WT oscillations, the wavelet transform with tunable Q-factor [32] and ensemble superwavelet transform [34] were proposed. Nevertheless, each of these variants addresses only one shortcoming, and no single WT offers comprehensive performance (such as self-adaption, shift invariance, flexible partitioning of frequency bands, and tunable oscillations).
Sparse representation mainly consists of sparse coding and dictionary design. Sparse coding models the analysed signals as linear combinations of atoms in a redundant dictionary. Dictionary design is, as much as possible, adapted to the features of the vibration signals to match the high-level structures of the impulse responses embedded in the vibration signals. Sparse representation is widely employed and has yielded state-of-the-art results in multiple fields of machine learning, neuroscience, signal processing, image and audio processing, classification, and statistics [35,36]. In terms of sparse coding, its exact resolution is usually an NP-hard problem [37]. Thus, pursuit algorithms are considered instead, mainly including matching pursuit [38] and basis pursuit [39]. For dictionary design, the methods for constructing dictionaries include those for explicit and implicit dictionaries. An implicit dictionary is a dictionary that is directly inferred from input data using machine learning techniques, including regular dictionary learning [40] and shift-invariant dictionary learning (SIDL) [41,42]. By virtue of the excellent performance of sparse representation for representing signals (such as flexibility, sparsity, and superresolution), sparse representation-based fault detection has become increasingly popular in the field of mechanical fault detection. Sparse representation based on matching pursuit and explicit dictionaries has been used to extract impulse responses induced by rotational machine faults for gear and bearing fault detection [43]. Sparse representation, through the combination of basis pursuit and explicit dictionaries, is employed to capture impulse patterns for rotating machine fault diagnosis [44-46]. Dictionary learning has advantages and potential for mining high-level structures embedded in signals. A nonlocal sparse model based on regular dictionary learning has been proposed [47].
SIDL can obtain single-value and jointly optimized results over the entirety of vibration signals, unlike regular dictionary learning. Sparse representation based on pursuit algorithms and SIDL can be exploited to extract impulse responses submerged in the vibration signals of rotating machine systems [41,42,48,49]. Those findings show that the fault detection performance of sparse representation is superior to those of EMD and WT. In addition, group sparsity has been applied to bearing fault detection [50] but requires prior knowledge of the impulsive periods.
Convolution sparse representation is another name for SIDL-based sparse representation. A new approach based on the alternating direction method of multipliers (ADMM) was proposed in 2016 [51] and is called CSR (convolution sparse representation based on the alternating direction method of multipliers) in this paper. CSR realizes not only SIDL but also shift-invariant sparse coding (SISC) of vibration signals. The strategy of interleaved optimization between SIDL and SISC, rather than the alternating optimization of SIDL and basis pursuit in traditional SIDL-based sparse representation, leads to higher computational efficiency and more accurate convolution sparse representations [52]. CSR has been applied to the detection of faults in wheelset bearings [5]. Although CSR has obtained satisfactory fault detection results, a convolution sparse representation framework or model for representing impulse response series has still not been discussed, and its fault detection performance is sensitive to inappropriate selections of method-related parameters. In view of these two unsolved problems, a convolutional representation model of impulse response series induced by bearing faults is proposed. A novel fault detection method, which is named adaptive CSR (ACSR), is then proposed in this paper.
This paper is organized as follows. The convolutional representation model for characterizing impulse response series induced by bearing defects is proposed in Section 2. Section 3 introduces the basic theory of CSR and discusses CSR-related parameters. A novel fault detection method, ACSR, is proposed in Section 4. A simulation-based verification of ACSR is conducted in Section 5. An experimental validation of ACSR is performed in Section 6. Section 7 concludes the paper.

Convolutional Representation Model of Impulse Response Series
When there is a defect on the surface of a wheelset-bearing component, an impulse response series (IRS) will be generated as the wheelset rotates. An impulse response caused by the defect can be modelled as the impulse response of a single-degree-of-freedom mass-spring-damper system [3], given as equation (1). Hence, the IRS with a fault-characteristic frequency of $T_p^{-1}$ can be represented as in equation (2) [3], where $A_m$ is the amplitude of the $m$th impulse response, $u(t)$ is a unit step function, $T_p$ is the time period corresponding to the fault-characteristic frequency, $\beta$ is the structural damping coefficient, $f_r$ is the excited resonance frequency, $\tau_i$ represents the effects of the random slippage of the rollers and is the $i$th realization of a zero-mean, uniformly distributed random variable with a standard deviation of $0.01T_p$ to $0.02T_p$, and $s(t)$ is the IRS with $M$ impulse responses.
Because such vibrations are often measured using an accelerometer, the measured vibration signal can be described in an acceleration format [3], as given by equations (3) and (4). According to equations (2) and (3), the acceleration version of the IRS, $a(t)$, can be expressed as equation (5). The impulse response $d(t)$ is defined in equation (6), and the time-location coefficients $x(t)$ are defined in equation (7). The acceleration version of the IRS in equation (5) can then be modelled as the convolution of the defined impulse response $d(t)$ and the associated time-location coefficients $x(t)$:
$$a(t) = d(t) * x(t). \qquad (8)$$
If the measured vibration signals contain $C$ kinds of impulse responses with different resonance frequencies, the measured IRS can be modelled as
$$a(t) = \sum_{c=1}^{C} d_c(t) * x_c(t) + N, \qquad (9)$$
where $N$ denotes the measurement noise, $d_c(t)$ is the $c$th impulse response type, and $x_c(t)$ denotes the time-location coefficients related to $d_c(t)$. The convolution representation model of the IRS in equation (9) clearly represents both the dynamic interaction between the defects of wheelset-bearing components and their mating surfaces and the resulting vibration characteristics.
To illustrate the physical meaning of the convolution representation model, an example is shown in Figure 1.
As Figure 1 illustrates, the convolutional representation model is well suited to representing the impulse response series caused by bearing faults. If a technique can directly infer the impulse responses $d_c(t)$ and time-location coefficients $x_c(t)$ from the measured IRS in Figure 1(e), the information needed for detecting bearing faults can be obtained.
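The convolution model above can be tried out numerically. The following is a minimal sketch, not the authors' simulation code: it assumes a damped-sinusoid form for the impulse response $d(t)$ and an independent uniform slip for each impulse, and the function names and parameter values are hypothetical.

```python
import numpy as np

def impulse_response(P, fs, f_r, beta):
    """Damped sinusoid of P samples (an assumed form for d(t))."""
    t = np.arange(P) / fs
    return np.exp(-beta * t) * np.sin(2 * np.pi * f_r * t)

def simulate_irs(n, fs, T_p, d, jitter=0.01, noise_std=0.0, seed=0):
    """Return a = d * x + noise and the spike train x.

    x holds one unit spike per fault period T_p (seconds); each spike is
    shifted by a uniform random slip in [-jitter*T_p, +jitter*T_p].
    """
    rng = np.random.default_rng(seed)
    P = len(d)
    x = np.zeros(n - P + 1)              # time-location coefficients
    m = 0
    while True:
        t_m = m * T_p + rng.uniform(-jitter, jitter) * T_p
        idx = int(round(t_m * fs))
        if idx >= len(x):
            break
        if idx >= 0:
            x[idx] = 1.0
        m += 1
    a = np.convolve(d, x)                # full convolution has length n
    return a + noise_std * rng.standard_normal(n), x

if __name__ == "__main__":
    d = impulse_response(P=32, fs=10_000, f_r=2500.0, beta=800.0)
    a, x = simulate_irs(n=1024, fs=10_000, T_p=0.01, d=d, noise_std=0.1)
    print(a.shape, int(x.sum()))
```

With several kernel types, the same call can be repeated per type and the results summed, which mirrors the sum over $c$ in the multi-type model.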

Basic Theory of CSR
Convolutional sparse representation provides a framework for inferring the impulse responses and time-location coefficients from the measured vibration signals $a(t)$ in equation (9) and is expressed as [51]
$$\arg\min_{\{d_c\},\{x_{k,c}\}} \; \frac{1}{2}\sum_{k=1}^{K}\Big\| \sum_{c=1}^{C} d_c(t) * x_{k,c}(t) - a_k(t) \Big\|_2^2 + \lambda \sum_{k=1}^{K}\sum_{c=1}^{C} \big\| x_{k,c}(t) \big\|_1 \quad \text{s.t.} \; \| d_c(t) \|_2 = 1, \qquad (10)$$
where $a_k(t) \in \mathbb{R}^{n}$ denotes the $k$th set of analysed signals with length $n$, $d_c(t)$ denotes the different kinds of impulse responses, $x_{k,c}(t) \in \mathbb{R}^{n-P+1}$ denotes the time-location coefficients associated with the analysed signal $a_k(t)$ and impulse response $d_c(t)$, $P$ is the length of the impulse response, $\lambda \in \mathbb{R}^{+}$ is a regularization parameter, and the constraint on the norms of the impulse responses $d_c(t)$ avoids the scaling ambiguity between the impulse responses and time-location coefficients.
The solution methods for the optimization problem in equation (10) include alternating optimization and interleaved optimization. Alternating optimization methods solve for the impulse responses and time-location coefficients using different optimization techniques, e.g., feature-sign search [52] or fast iterative shrinkage thresholding [53] is used to optimize the time-location coefficients, and the Lagrange multiplier method [48] is employed to optimize the impulse responses. After many calculation steps spent updating the time-location coefficients, the method switches to updating the impulse responses, and the two optimization processes are executed alternately until convergence. In contrast, interleaved optimization updates the impulse responses and time-location coefficients within a single calculation step using the same optimization method [51]. ADMM-based CSR is an interleaved approach to optimizing the convolution sparse representation in equation (10). Interleaved optimization has much higher optimization efficiency and obtains even more accurate results than alternating optimization [51]. ADMM-based interleaved optimization includes shift-invariant sparse coding and shift-invariant dictionary learning in one optimization calculation step.
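To make the roles of the data-fit term, the sparsity term, and the norm constraint concrete, the sketch below evaluates a convolutional sparse coding objective for a single signal. It assumes the standard form (half the squared reconstruction error plus $\lambda$ times the $\ell_1$ norm of the coefficients, with unit-norm kernels); the function names are hypothetical, and no ADMM solver is implemented here.

```python
import numpy as np

def normalise(kernels):
    """Rescale each kernel to unit l2 norm (removes the scaling ambiguity)."""
    return [d / np.linalg.norm(d) for d in kernels]

def csr_objective(a, kernels, coeffs, lam):
    """0.5 * ||sum_c d_c * x_c - a||_2^2 + lam * sum_c ||x_c||_1.

    a       : signal of length n
    kernels : list of C arrays of length P
    coeffs  : list of C arrays of length n - P + 1
    """
    recon = sum(np.convolve(d, x) for d, x in zip(kernels, coeffs))
    residual = recon - a
    sparsity = sum(np.abs(x).sum() for x in coeffs)
    return 0.5 * float(np.dot(residual, residual)) + lam * float(sparsity)

if __name__ == "__main__":
    d = normalise([np.array([3.0, 4.0])])[0]
    x = np.zeros(5)
    x[2] = 2.0
    a = np.convolve(d, x)          # noiseless signal built from d and x
    print(csr_objective(a, [d], [x], lam=5.0))
```

A larger `lam` makes every nonzero coefficient more expensive, which is why the extracted series becomes sparser as the regularization parameter grows.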
After SIDL and SISC are performed on a set of signals, the learned impulse responses and the inferred time-location coefficients are obtained (equations (11) and (12)). Therefore, the IRSs embedded in the measured signals are extracted by convolving each learned impulse response with its time-location coefficients. Although convolution sparse representation provides a framework for extracting IRSs caused by wheelset-bearing faults, and ADMM can effectively and efficiently solve CSR-related optimization problems, practical research has discovered that inappropriate selections of CSR-related parameters adversely influence the extraction of IRSs. CSR-related parameters can be divided into two categories: the boundary condition-related parameters listed in Table 1 and the signal feature-related parameters listed in Table 2.
Considering the Fourier transform used in CSR and the sampling frequency, the length of a single signal set $n$ and the number of analysed signal sets $K$ can be set to 1024 and 8, respectively. The convergence thresholds for CSR (primal residual of SISC $r_x$, dual residual of SISC $s_x$, primal residual of SIDL $r_d$, and dual residual of SIDL $s_d$) are set beforehand to 0.001.
However, four parameters related to signal features cannot be set beforehand and should be adaptively tuned because different measured signals contain different types of impulse responses. The types of impulse responses are closely related to the orders of the excited resonance frequencies induced by the wheelset bearing. Impulse responses with different resonance frequencies and damping coefficients naturally have different lengths. The CSR-related regularization parameter reflects the sparsity of the impulse responses and is tightly related to the rotational speed and geometry parameters of the wheelset bearing. The selection of a suitable ADMM-related penalty parameter $\rho$ is critical to obtaining a good convergence rate. There are two strategies for selecting $\rho$: the increasing parameter scheme [52] and the adaptive method [54]. Owing to its good convergence performance, the adaptive method is employed in this paper and can be described as [54]
$$\rho^{(j+1)} = \begin{cases} \tau\rho^{(j)}, & \|r^{(j)}\|_2 > \mu\|s^{(j)}\|_2, \\ \tau^{-1}\rho^{(j)}, & \|s^{(j)}\|_2 > \mu\|r^{(j)}\|_2, \\ \rho^{(j)}, & \text{otherwise}, \end{cases}$$
where $r$ and $s$ are the residuals of the primal and dual variables, respectively, in the optimization process of equations (14) and (15). $\tau$ and $\mu$ are constants, the typical values of which in [52] are $\tau = 2$ and $\mu = 10$. Thus, selecting the ADMM-related parameter $\rho$ reduces to determining the initial value $\rho^{(0)}$. According to the test in [52], $\rho^{(0)} = \xi\lambda$. For this paper, $\xi$ was set to 100. Next, the other three parameters will be discussed in detail. To illustrate the influence of different regularization parameters, the IRSs for different values of $\lambda$ were extracted and are shown in Figure 2.
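The adaptive penalty rule can be sketched in a few lines. This is a minimal illustration of the widely used residual-balancing scheme, assuming that it is the rule referred to above; the function names are hypothetical, and the defaults follow the values quoted in the text ($\tau = 2$, $\mu = 10$, $\xi = 100$).

```python
def initial_penalty(lam, xi=100.0):
    """Initial ADMM penalty rho^(0) = xi * lambda."""
    return xi * lam

def update_penalty(rho, primal_res, dual_res, tau=2.0, mu=10.0):
    """Residual balancing: keep primal and dual residual norms within a
    factor mu of each other by scaling rho up or down by tau."""
    if primal_res > mu * dual_res:
        return rho * tau      # primal residual too large: raise the penalty
    if dual_res > mu * primal_res:
        return rho / tau      # dual residual too large: lower the penalty
    return rho

if __name__ == "__main__":
    rho = initial_penalty(lam=5.0)
    print(rho, update_penalty(rho, primal_res=1.0, dual_res=0.05))
```

In an ADMM loop, `update_penalty` would be called once per iteration with the current residual norms; any scaled dual variable must be rescaled whenever `rho` changes.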

The Proposed ACSR
A novel fault detection method named adaptive CSR (ACSR) is proposed in this paper. The procedure of the proposed method is shown in Figure 3. It mainly contains four steps:
(1) Estimating the types of impulse responses $C$
(2) Estimating the length of impulse responses $P$
(3) Estimating the regularization parameter $\lambda$
(4) Extracting IRS using ACSR with optimal parameters

Estimating the Types of Impulse Responses.
According to the physical definition of an impulse response in equation (6), impulse responses are a function of the resonance frequency $\omega_r$ and damping coefficient $\beta$. The resonance frequency can be used to identify different types of impulse responses [55]. Thus, the dominant frequencies of the impulse responses inferred from the measured vibration signals can be used to estimate the number of types of impulse responses $C$. When CSR is executed on the analysed signals with $C = c$, $c$ impulse responses are obtained. Their amplitude-frequency spectra are obtained by taking the Fourier transform of each of the $c$ learned impulse responses. The frequency at which each spectrum reaches its maximal amplitude is the extracted main (dominant) frequency. Therefore, the real value of $C$ can be estimated by repeatedly executing CSR, starting with $C = 2$, $P = 32$, and $\lambda = 5$. If the difference between any two dominant frequencies, $\Delta f_C$, is less than $f_s P^{-1}$ (the frequency resolution of the Fourier transform; $f_s$ denotes the sampling frequency and is 10000 Hz, $P$ is set to 32 in this paper, and $f_s P^{-1}$ is equal to 312.5 Hz), there are fewer than $C$ types of impulse responses. Then, $C = C - 1$, CSR is executed again to learn $(C - 1)$ types of impulse responses, and the resulting dominant-frequency differences $\Delta f_{C-1}$ are calculated; this is repeated until the differences between all pairs of dominant frequencies are no less than the frequency resolution. Otherwise, $C = C + 1$, CSR is executed again to learn $(C + 1)$ types of impulse responses, and their dominant frequencies are calculated; this is repeated until at least one frequency difference $\Delta f_{C+1}$ is less than the frequency resolution, at which point $C = C - 1$ (in this paper, the initial value of $C$ is set to 2). The final value of $C$ is output as the estimated number of types of impulse responses embedded in the measured vibration signals.
In Section 5, the detailed procedures for calculating the main frequencies when estimating C are shown in Figures 4 and 5 for Case 1 and Case 2, respectively.
According to the above-discussed rules for estimating the types of impulse responses $C$, in Case 1, when $C$ was initialized to 2, the difference $\Delta f_2$ between the two main frequencies $f_1 = 2508$ Hz and $f_2 = 2439$ Hz in Figure 4 was equal to 69 Hz and was less than the frequency resolution. Thus, $C = C - 1$. The number of types of impulse responses embedded in the simulation signals of Case 1 was therefore 1, which was identical to the simulation setting. Similarly, there were two types of impulse response for Case 2, i.e., there were two IRSs for Case 2.
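The search over $C$ described in this subsection can be sketched as follows. The dictionary-learning step is represented by a caller-supplied function `learn_kernels(signal, C, P, lam)`, which is a hypothetical placeholder for a CSR solver and not a real library call; `c_max` is an added safeguard that the text does not mention.

```python
import itertools
import numpy as np

def dominant_frequency(kernel, fs):
    """Frequency (Hz) of the largest peak in the kernel's amplitude spectrum."""
    spectrum = np.abs(np.fft.rfft(kernel))
    freqs = np.fft.rfftfreq(len(kernel), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

def all_distinct(kernels, fs):
    """True if every pair of dominant frequencies differs by at least fs / P."""
    resolution = fs / len(kernels[0])
    f = [dominant_frequency(k, fs) for k in kernels]
    return all(abs(fa - fb) >= resolution
               for fa, fb in itertools.combinations(f, 2))

def estimate_num_types(signal, learn_kernels, fs, P=32, lam=5.0,
                       c_init=2, c_max=8):
    """Shrink or grow C until the learned kernels are just distinguishable."""
    C = c_init
    if all_distinct(learn_kernels(signal, C, P, lam), fs):
        # Grow C while one more kernel still yields distinct frequencies.
        while C < c_max and all_distinct(learn_kernels(signal, C + 1, P, lam), fs):
            C += 1
    else:
        # Shrink C until all dominant frequencies are distinct.
        while C > 1:
            C -= 1
            if all_distinct(learn_kernels(signal, C, P, lam), fs):
                break
    return C
```

With $f_s = 10000$ Hz and $P = 32$, the threshold `fs / len(kernel)` evaluates to 312.5 Hz, so two kernels with dominant frequencies 69 Hz apart would be treated as the same type.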

Estimating the Lengths of the Impulse Responses.
Although the full time-domain waveform of an impulse response is contaminated by measured noise, the high amplitude zone of an impulse response should have a larger SNR than the low amplitude zone in Figure 6.
If only two extreme values $y(t_1)$ and $y(t_2)$ can be precisely obtained in Figure 6, the two parameters describing an impulse response, the resonance frequency $f_r$ and damping coefficient $\beta$, can be computed indirectly as [55]
$$f_r = \frac{1}{t_n}, \qquad \beta = \frac{1}{2\pi}\ln\frac{y(t_1)}{y(t_2)},$$
where $t_n$ is the time difference between the two extreme values. Therefore, the extracted information on an impulse response should contain at least two extreme values, i.e., $N_e \ge 2$.
Ideally, when an impulse response contains exactly two extreme values and its length is $P$, the time difference between the two extreme values is $t_n = t_2 - t_1 = P f_s^{-1}$. The resonance frequency $f_r$ associated with the two extreme values can then be computed as $f_r = t_n^{-1} = f_s P^{-1}$. Therefore, the resonance frequency of the learned impulse response should satisfy $f_{\min} \le f_r \le 0.5 f_s$. As a result, a possible length $P$ should satisfy $P \ge f_s f_r^{-1}$, where $f_{\min}$ is the minimum resonance frequency and is considered to be 300 Hz in this paper. Higher vibration frequencies require shorter kernel functions. When the resonance frequency is equal to $0.5 f_s$, the kernel length $P$ only needs to be larger than 2. When the resonance frequency is 350 Hz ($f_s = 10$ kHz), $P$ must exceed 28.5. Therefore, $P$ is initially set to 32 in this paper. If the learned kernel function does not contain two complete maximum values, the length of the impulse responses $P$ should be increased to represent the lower resonance frequency. To illustrate the influence of different $P$ on extracting IRSs, impulse responses with different $P$ were learned and are shown in Figure 7, and the resulting IRSs were extracted and are shown in Figure 8. Based on a careful analysis of Figure 8, it is reasonable to initially set $P$ to 32 in this paper.
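The arithmetic in this subsection can be checked with a short sketch. It assumes the two-extreme relations annotated for Figure 6 ($f_r = 1/t_n$ and $\beta = \ln(y(t_1)/y(t_2))/2\pi$) and the bound $P \ge f_s/f_r$; the function names are hypothetical.

```python
import math

def resonance_and_damping(t1, y1, t2, y2):
    """Estimate f_r and beta from two successive extreme values.

    t1, t2 : times of the two extremes in seconds (t2 > t1)
    y1, y2 : their amplitudes (y1 > y2 > 0)
    """
    t_n = t2 - t1
    f_r = 1.0 / t_n
    beta = math.log(y1 / y2) / (2.0 * math.pi)
    return f_r, beta

def min_kernel_length(fs, f_r):
    """Smallest kernel length (in samples) spanning one period of f_r."""
    return fs / f_r

if __name__ == "__main__":
    print(min_kernel_length(10_000, 350.0))    # roughly 28.6 samples
    print(min_kernel_length(10_000, 5_000.0))  # 2 samples at 0.5 * fs
```

For the 350 Hz example in the text, the bound is about 28.6 samples, which is why an initial length of 32 is sufficient there; a lower resonance frequency would call for a longer kernel.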

Estimating the Regularization Parameter.
In Figure 2, it can be seen that the IRSs extracted with different regularization parameters had different envelope spectrum kurtoses and contained different noise levels. The number of impulse responses decreased with increasing regularization parameter. Conversely, as the regularization parameter decreased, the noise contained in the extracted IRS increased. Therefore, it is critical to determine a rational regularization parameter value suitable for the analysed vibration signals. If the quality of the extracted IRS can be measured by an index, automatic parameter selection is feasible.

Envelope spectrum kurtosis, as an effective index of the impulsive feature distribution, can be used to evaluate the information capacity of bearing faults [46]. As the number of periodic impulse responses increases, the envelope spectrum kurtosis of the extracted IRS becomes larger. However, once the number of extracted impulse responses exceeds the actual value, the kurtosis of the envelope spectrum decreases with further decreases in the regularization parameter $\lambda$. Therefore, the variation of the envelope spectrum kurtosis reflects the impulsive feature distribution produced by different values of the regularization parameter, and the maximum kurtosis of the envelope spectrum points to the desired value of the regularization parameter. The envelope spectrum kurtosis vs. $\lambda$ curves for Case 1 and Case 2 in Section 5 were calculated and are shown in Figures 9 and 10, respectively.
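The selection rule can be sketched as follows. The envelope is taken here as the magnitude of the analytic signal, computed with an FFT-based Hilbert transform, and the kurtosis is the ordinary fourth standardized moment of the envelope spectrum; both are assumptions about the exact definition used. `select_lambda` takes already extracted IRSs keyed by $\lambda$, so no CSR solver is involved, and all names are hypothetical.

```python
import numpy as np

def envelope_spectrum(x):
    """Amplitude spectrum of the mean-removed Hilbert envelope of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)                  # weights that build the analytic signal
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    envelope = np.abs(np.fft.ifft(X * h))
    envelope -= envelope.mean()
    return np.abs(np.fft.rfft(envelope)) / n

def kurtosis(v):
    """Fourth standardized moment (equals 3 for Gaussian data)."""
    c = np.asarray(v, dtype=float)
    c = c - c.mean()
    return float(np.mean(c ** 4) / np.mean(c ** 2) ** 2)

def select_lambda(irs_by_lambda):
    """Return the lambda whose IRS has the largest envelope spectrum kurtosis."""
    scores = {lam: kurtosis(envelope_spectrum(irs))
              for lam, irs in irs_by_lambda.items()}
    return max(scores, key=scores.get), scores
```

A cleanly periodic impulse train concentrates its envelope spectrum in a few harmonic lines and therefore scores far higher than a noise-dominated series, which is the behaviour the selection rule relies on.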

Extracting IRSs Using ACSR with Optimal Parameters.
After estimating the three parameters, an adaptive version of CSR, called ACSR, with the optimal parameters $(C_o, P_o, \lambda_o)$ was obtained. The IRS can then be adaptively extracted by ACSR. Its concrete steps are as follows:
(1) Learning the $C$ types of impulse responses from the partitioned signals $a_k$, $k \in [1, K]$, using equations (11) and (17) with the optimal parameters $(C_o, P_o, \lambda_o)$
(2) Inferring the time-location coefficients of the measured signals $a$ using the sparse representation based on the learned impulse responses, using equation (12) with the optimal parameters $(C_o, P_o, \lambda_o)$
[Figure 6: an impulse response with extreme values $y(t_1)$ and $y(t_2)$ separated by $t_n$; $f_r = 1/t_n$, $\beta = \ln(y(t_1)/y(t_2))/2\pi$]

Simulation Validation
To illustrate the effectiveness of the proposed method, two classes of simulation signals with different resonance frequencies are introduced in this section.

Case 1: Simulation Signals with One Type of Impulse Response.
The simulation signals with one type of impulse response contained one IRS. The IRS can be realized by instantiating the related parameters in equation (9). The parameters are listed in Table 3.
The simulated IRS at an SNR of −10 dB is shown in Figure 11. According to Figure 4 and the rules for determining the number of types of impulse responses, there should be one type of impulse response ($C = 1$). When $P$ was initially set to 32, the number of maximum values of the learned impulse response shown in Figure 12 was more than 2. Therefore, the length of the kernel function in Case 1, $P$, was set to 32. Finally, the envelope spectrum kurtosis vs. $\lambda$ is shown in Figure 9, and the optimal regularization parameter $\lambda_o$ was 8. The learned impulse response and the IRS extracted using ACSR are shown in Figure 12. The Fourier and Hilbert envelope spectra of the extracted IRS are shown in Figure 13.
To illustrate the effectiveness of the proposed IRS extraction method, two well-known fault detection methods, spectral kurtosis [56] and EEMD [18], were used to analyse the same signal in Figure 11(a), and the obtained results are shown in Figures 14 and 15, respectively (to save space, only the first intrinsic mode function (IMF), which was much more impulsive, is shown).
By comparing the time-domain waveform of the extracted IRS in Figure 12(b) with Figures 14(a) and 15(a), it can be seen that the proposed ACSR was able to extract the IRS containing 19 impulse responses from the noisy simulation signals. However, the other two comparative methods failed to extract distinct IRSs, and there was strong noise between adjacent impulse responses. This shows that the proposed ACSR can characterize the kinematic process of the bearing faults. By comparing the envelope spectra in Figures 13(b), 14(b), and 15(b), it can be seen that both the amplitude and the number of harmonics of the fault-characteristic frequency obtained with ACSR were larger than those obtained with the other two methods. In addition, the fault-characteristic frequency harmonics obtained by EEMD and spectral kurtosis were obscured by some uncorrelated spectral lines, indicating that the proposed ACSR had good performance when extracting IRSs caused by wheelset bearings.

Case 2: Simulation Signals with Two Types of Impulse Responses.
The simulation signals with two types of impulse responses contained two IRSs. Each of the IRSs can be individually realized by instantiating the related parameters in equation (9). The parameters are listed in Tables 4 and 5. The simulated IRSs at an SNR of −1 dB are shown in Figure 16(a).
The Fourier and envelope spectra of the simulated signals are shown in Figures 16(b) and 16(c), respectively. The fault frequency $f_1$ was obtained, but the fault frequency $f_2$ could hardly be seen in the envelope spectra in Figure 16(c). The proposed ACSR was used to process the simulation signals in Figure 16(a). When $C$ was set to 2, two main frequencies can be seen in Figure 5: $f_1$ was equal to 1195 Hz and $f_2$ was equal to 3000 Hz. The difference between the two main frequencies was 1805 Hz, which was greater than the frequency resolution of 312.5 Hz. According to the rule for estimating the number of impulse responses, $C = C + 1$ and was set to 3, and three main frequencies can be seen in Figure 5. Following the same rule, $C$ was finally estimated to be 2, i.e., there were two types of impulse responses. When $P$ was initially set to 32, the number of maximum values of each learned impulse response in Figure 17 exceeded 2. Therefore, $P$ was set to 32 for Case 2. The curves of the envelope spectrum kurtosis vs. different values of the regularization parameter are shown in Figure 10. The regularization parameters $\lambda_{1o}$ and $\lambda_{2o}$ for impulse responses 1 and 2 were 7.5 and 5.5, respectively. The IRSs extracted using ACSR are shown in Figure 17. The Fourier and envelope spectra of IRS1 and IRS2 are shown in Figure 18.
To illustrate the advantage of the proposed ACSR, two comparative methods (spectral kurtosis and EEMD) were employed to analyse the simulation signals in Figure 16(a), and the results are shown in Figures 19 and 20, respectively (the first two IMFs are shown due to their higher impulsiveness). On the one hand, by comparing the extracted time-domain waveforms in Figures 17(b) and 17(d), 19(a), and 20(a) and 20(c), it can be seen that the proposed ACSR can clearly extract IRSs with almost no noise between two adjacent impulse responses and can separate the two IRSs. This is advantageous for characterizing the actions of wheelset-bearing faults. On the other hand, from the envelope spectra in Figures 18(b) and 18(d), 19(b), and 20(b) and 20(d), the amplitudes and numbers of harmonics of the fault-characteristic frequencies of the IRSs extracted using the proposed ACSR were larger than those for the other two comparative methods, and the envelope spectra of the IRSs extracted using the proposed ACSR were almost free of noise, indicating that the fault detection performance of ACSR was good.

Experimental Verification
To further test the effectiveness and fault detection performance of the proposed ACSR for practical vibration signals, bench and running tests, respectively, were carried out. e wheelset-bearing test bench used for the practical test of the proposed method is shown in Figure 21.
e test bench consisted of a motor, driving wheel, loading device, wheelset, and axle box. e motor delivered driving power at different motor speeds. e driving power was conveyed to the driving wheel through rubber belts. e traction power of the driving wheel was then transmitted to the wheelset. Faults on the outer and inner races, which are   shown in Figure 22, were introduced to the roller bearing installed in the axle box (see Figure 23(b) for the axle box and the accelerometer mounted on it). Figure 23(a) also shows a photo of the test bench. e fault bearing parameters are listed in Table 6. e vibration signals collected from the wheelsetbearing system test bench are shown in Figure 24 when the outer race of the wheelset bearing had the faults shown in Figure 22. e rotational frequency of the wheelset, f o , was 15.4 Hz and corresponded to the running speed of 150 kmh − 1 . Its sampling frequency was 10 kHz. In the envelope spectra of the measured signals, there was only the basic frequency of the inner race fault, f BPFI , but there were no spectral lines about the outer-race fault-characteristic frequency f BPFI . f BPFI and f BPFO are expressed as follows: where N b is the roller number, P d is the pitch diameter, B d is the roller diameter, ϕ is the contact angle, and f o is the rotational speed of the wheelset. According to the parameters listed in Table 6, the outer-race fault-characteristic frequency f BPFO was 124.9 Hz, and the inner-race faultcharacteristic frequency, f BPFI , was 168.1 Hz. e proposed ACSR was used to analyse the measured vibration signals in Figure 24(a). e main frequencies for estimating the types of impulse responses embedded in measured vibration signals were calculated and are shown in Figure 25.
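The race fault frequencies can be computed with the standard kinematic formulas, sketched below under the assumption that these are the expressions intended in the text. The geometry used in the demonstration (19 rollers, a 27 mm roller diameter, a 180 mm pitch diameter, and a 10 degree contact angle) is hypothetical and is not taken from Table 6.

```python
import math

def _ratio(roller_d, pitch_d, contact_angle_deg):
    """(B_d / P_d) * cos(phi), shared by both race formulas."""
    return roller_d / pitch_d * math.cos(math.radians(contact_angle_deg))

def bpfo(n_rollers, f_o, roller_d, pitch_d, contact_angle_deg):
    """Ball-pass frequency of the outer race (Hz)."""
    return 0.5 * n_rollers * f_o * (1.0 - _ratio(roller_d, pitch_d, contact_angle_deg))

def bpfi(n_rollers, f_o, roller_d, pitch_d, contact_angle_deg):
    """Ball-pass frequency of the inner race (Hz)."""
    return 0.5 * n_rollers * f_o * (1.0 + _ratio(roller_d, pitch_d, contact_angle_deg))

if __name__ == "__main__":
    # Hypothetical geometry, for illustration only.
    args = dict(n_rollers=19, f_o=15.4, roller_d=27.0, pitch_d=180.0,
                contact_angle_deg=10.0)
    print(round(bpfo(**args), 1), round(bpfi(**args), 1))
```

A quick sanity check on any pair of values is that the two frequencies sum to $N_b f_o$, since the geometry term cancels.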
According to the rule for estimating the types of impulse responses, there should have been 2 types of impulse responses. When $P$ was initially set to 32, the number of extreme points of each learned impulse response in Figures 26(a) and 26(c) was larger than 2. Therefore, $P$ was set to 32. The curve of the envelope spectra kurtosis with different values of the regularization parameters is shown in Figure 27. The two optimal regularization parameters related to the impulse responses $d_1$ and $d_2$ were $\lambda_{1o} = 19.5$ and $\lambda_{2o} = 20.5$, respectively. The learned impulse responses and extracted IRSs are shown in Figure 26. The envelope spectra of IRS1 and IRS2 are shown in Figures 28 and 29, respectively. As seen in

Real-Life Running Test in a High-Speed Train.
To test the fault detection performance of ACSR under the practical running conditions of a high-speed train, an accelerometer was installed on the axle box cover of a high-speed train, as shown in Figure 32. The tested wheelset bearing used in the high-speed train was identical to that used in the bench test, and its parameters are listed in Table 6. With a defect on a roller of the wheelset bearing, the vibration acceleration signals were collected from the axle box cover of the running high-speed train at a speed of 200 km h$^{-1}$ (the corresponding rotational frequency of the wheelset was 20.6 Hz) and are shown in Figure 33. The sampling frequency of the measured vibration signals was 10 kHz. The roller-characteristic frequency $f_{\mathrm{BSF}}$ was 134.6 Hz. The roller-characteristic frequency is defined as

$$f_{\mathrm{BSF}} = \frac{P_d}{B_d} f_o \left[1 - \left(\frac{B_d}{P_d}\cos\phi\right)^2\right],$$

where $P_d$ is the pitch diameter, $B_d$ is the roller diameter, $\phi$ is the contact angle, and $f_o$ is the rotational frequency of the wheelset. As seen in Figure 33(c), the roller-characteristic frequency and its harmonics could not be found in the envelope spectra of the original measured signals, so the fault could not be detected from them directly.
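The envelope spectrum referred to above can be sketched as follows. This is a minimal NumPy illustration on a synthetic signal, not the authors' processing chain: the impulse train, the 2 kHz resonance, the decay rate, the noise level, and the helper name `envelope_spectrum` are all assumptions. Only the 10 kHz sampling frequency and the 134.6 Hz roller-characteristic frequency are taken from the text.

```python
import numpy as np

def envelope_spectrum(x, fs):
    """Amplitude spectrum of the Hilbert envelope of a real signal x."""
    n = len(x)
    spectrum = np.fft.fft(x)
    # Analytic signal: keep DC (and Nyquist), double positive bins, zero negatives.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    envelope = np.abs(np.fft.ifft(spectrum * weights))
    envelope = envelope - envelope.mean()   # drop DC so fault lines stand out
    amp = np.abs(np.fft.rfft(envelope)) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, amp

fs = 10_000        # sampling frequency from the text (Hz)
f_fault = 134.6    # roller-characteristic frequency from the text (Hz)

# Hypothetical impulse response: 2 kHz resonance with exponential decay.
tau = np.arange(64) / fs
kernel = np.exp(-800.0 * tau) * np.sin(2 * np.pi * 2000.0 * tau)

# One second of impulses repeating at the fault frequency, plus mild noise.
impulses = np.zeros(fs)
impulses[np.round(np.arange(0, fs, fs / f_fault)).astype(int)] = 1.0
signal = np.convolve(impulses, kernel)[:fs]
signal += 0.05 * np.random.default_rng(0).standard_normal(fs)

freqs, amp = envelope_spectrum(signal, fs)
band = (freqs > 50) & (freqs < 200)         # search around the fundamental only
peak = freqs[band][np.argmax(amp[band])]
print(f"dominant envelope line: {peak:.1f} Hz")
```

With one second of data the frequency resolution is 1 Hz, so the dominant line falls on the bin nearest to 134.6 Hz. In a measured signal dominated by wheel-rail interference and noise, this line can be buried, which is the situation described for Figure 33(c).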
ACSR was therefore used to process the measured signals in Figure 33(a). The main frequencies for estimating the types of impulse responses were calculated and are shown in Figure 34. According to the rule for estimating the optimal number of types of impulse responses, $C$ was equal to 2. In the learned impulse responses in Figure 35, the number of maximum extreme values for each type of learned impulse response exceeded 2. The length of the kernel function was therefore set to 32. The curves for selecting the optimal values of the regularization parameter were calculated and are shown in Figure 36. The regularization parameter related to IRS1, $\lambda_{1o}$, was 20, and that related to IRS2, $\lambda_{2o}$, was 13. The learned impulse responses and extracted IRSs obtained using ACSR are shown in Figure 35. The Fourier and envelope spectra of the extracted IRSs are shown in Figure 37. In Figure 35(d), the vibration behaviour as the faulty roller alternately entered and left the bearing zone (BZ) and the nonbearing zone can be distinctly seen. The fault-characteristic frequency of the roller and its harmonics were extracted for detecting the wheelset-bearing fault, as shown in Figure 37(d). The rotational frequency $f_o$ of the wheelset and its harmonics can be observed in Figure 37(b). This vibrational feature indicated that there should have been a defect on the wheel tread. When the tested wheelset was carefully checked, the wheel tread was found to have the defect shown in Figure 38.
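The convolutional view behind these results, in which each IRS is an impulse response convolved with sparse time-location coefficients, can be illustrated with a small synthetic sketch. The kernel shapes, resonance frequencies, decay rates, and helper names below are assumptions for illustration, and assigning 20.6 Hz to the first type and 134.6 Hz to the second is likewise an illustrative choice. Only $C = 2$, the kernel length $P = 32$, the 10 kHz sampling frequency, and the two repetition frequencies come from the text. The sketch covers signal synthesis only, not the dictionary learning or sparse coding steps of ACSR.

```python
import numpy as np

FS = 10_000   # sampling frequency (Hz), from the text
P = 32        # kernel length, from the text

def damped_kernel(f_res, decay, length=P, fs=FS):
    """Hypothetical impulse response: unit-norm exponentially damped sinusoid."""
    tau = np.arange(length) / fs
    d = np.exp(-decay * tau) * np.sin(2 * np.pi * f_res * tau)
    return d / np.linalg.norm(d)

def sparse_coefficients(f_rep, n, fs=FS, amplitude=1.0):
    """Sparse time-location coefficients: one nonzero entry per fault impact."""
    x = np.zeros(n)
    idx = np.round(np.arange(0, n, fs / f_rep)).astype(int)
    x[idx[idx < n]] = amplitude
    return x

n = FS  # one second of samples

# C = 2 types of impulse responses (resonances and decays are assumed values).
kernels = [damped_kernel(1500.0, 600.0), damped_kernel(3000.0, 900.0)]
# Repetition at the wheelset rotational frequency and at the roller frequency.
coeffs = [sparse_coefficients(20.6, n), sparse_coefficients(134.6, n)]

# Each IRS is the convolution d_c * x_c; the modelled signal sums the C types.
irs = [np.convolve(x, d)[:n] for d, x in zip(kernels, coeffs)]
signal = irs[0] + irs[1]

for c, x in enumerate(coeffs, start=1):
    print(f"IRS{c}: {int(np.count_nonzero(x))} impacts in 1 s")
```

Because the coefficients are nonzero only at impact instants, isolating the two IRSs amounts to recovering two kernels and two sparse coefficient vectors from their sum, which is the role CSR plays in the proposed method.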
To compare the performance of the proposed ACSR with that of the well-known spectra kurtosis and EEMD methods, these two comparative methods were used to analyse the measured vibration signals in Figure 33(a). The analysed results are shown in Figures 39 and 40. As seen in Figure 39(a), the extracted IRSs were distorted. From Figure 39(b), it can be seen that the spectra kurtosis found only the fundamental of the roller-fault-characteristic frequency. In Figures 40(a) and 40(c), there was strong noise in the two impulse responses. As a result, the fault action of the roller could not be characterized. As shown in Figures 40(b) and 40(d), only the fundamental of the roller-fault-characteristic frequency could be detected from the envelope spectra of the IMFs. These results indicate that the proposed ACSR performed better when detecting wheelset-bearing faults under the running conditions of the high-speed train.

Conclusion
In this paper, a novel fault detection method named ACSR was proposed: (1) A convolutional representation model for characterizing impulse response series was developed. An IRS induced by bearing faults can be described as the convolution of an impulse response and the resulting time-location coefficients. (2) ACSR was proposed based on the combination of CSR and methods for estimating three parameters (the number of types of impulse responses C, the length of the impulse response P, and the regularization parameter related to the convolution sparse representation λ). (3) The capacity of ACSR for characterizing and detecting bearing faults was validated by simulation, bench testing, and a real-life running test. Compared to spectra kurtosis and EEMD, ACSR can not only extract full IRSs but also isolate IRSs induced by multiple faults in a wheelset-bearing system. Therefore, ACSR's performance for detecting wheelset-bearing faults is superior to that of the two comparative methods.
In addition, ACSR provides a route for estimating the states of wheel treads using the measured vibration signals of axle boxes. This will be further investigated in the future. Using the MATLAB R2014a platform on a Dell Inspiron 14 laptop computer, the mean calculation time of SIDL and SISC for the tested signals in Figure 33(a) was approximately 52 seconds. The real-time performance of ACSR therefore needs to be improved.

Data Availability
The data used to support the findings of the study have not been made available because they are confidential.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.