Tool Wear Monitoring with Vibration Signals Based on Short-Time Fourier Transform and Deep Convolutional Neural Network in Milling

Tool wear monitoring is essential in precision manufacturing to improve surface quality, increase machining efficiency, and reduce manufacturing cost. Tool wear is reflected in signals that can be measured during automated machining. However, features are still extracted and optimized manually, which lowers monitoring efficiency and increases prediction error as the volume of collected data grows. To address these problems, this paper proposes a tool wear monitoring method for milling that uses vibration signals and is based on the short-time Fourier transform (STFT) and a deep convolutional neural network (DCNN). First, an image representation of the acquired vibration signals is obtained with STFT. Then, a DCNN model is designed to establish the relationship between the resulting time-frequency maps and tool wear, performing adaptive feature extraction and automatic tool wear prediction. The method is demonstrated on three tool wear experimental datasets collected with three-flute ball nose tungsten carbide cutters on a high-speed CNC machine under dry milling. The experimental results show that the proposed method is more accurate and relatively more reliable than the compared methods.


Introduction
In the metal cutting process, tool wear seriously reduces workpiece quality, lowers machining efficiency, and increases manufacturing costs; thus, tool wear monitoring is increasingly important in precision machining [1]. To monitor tool wear online without interrupting the machining process, researchers have collected measurable signals (such as cutting force, vibration, and acoustic emission) that reflect real-time tool wear [2].
As data are generally collected faster than diagnosticians can analyze them directly [3], there is an urgent need for artificial intelligence methods of tool wear monitoring that can efficiently analyze massive data and automatically obtain accurate results. Wang et al. adopted dimension reduction methods to select 54 features extracted from three-directional force and vibration signals and applied support vector regression (SVR) to predict tool wear [4]. Hong et al. employed wavelet packet transform and Fisher's linear discriminant for tool wear monitoring [5]. Kong et al. used integrated radial basis function-based kernel principal component analysis (KPCA) to fuse 48 features extracted from the three orthogonal cutting forces and constructed a Gaussian process regression-based tool wear prediction model [6]. Yu et al. selected the root mean square of the vibration signal and developed a weighted hidden Markov model for tool wear monitoring [7]. Aliustaoglu et al. selected four statistical parameters calculated from each of the force, vibration, and machine sound signals and fed them to a fuzzy inference system for tool wear condition monitoring [8]. Ghosh et al. employed feature space filtering to find the most informative features extracted from the sensors and developed a neural network-based fusion model for tool wear monitoring [9]. Wu et al. used correlation analysis, monotonicity, and autocorrelation to optimize 66 features extracted from multisensor signals, fused the features through an adaptive neuro-fuzzy inference system, and predicted the remaining useful life (RUL) of machining tools by polynomial curve fitting [10].
However, because the shallow network structures of these methods restrict their capacity to learn complex nonlinear relationships and handle big datasets, researchers have to extract and optimize features from the measured signals manually, relying heavily on prior knowledge of dimension reduction techniques and data fusion methods. Manual feature selection often discards potentially useful features, which makes it difficult to improve the accuracy of tool wear monitoring. Moreover, these complex feature selection and fusion steps are often computationally costly, which may ultimately restrict the use of such methods in real-time monitoring.
Recently, the powerful data mining capability of deep learning [11] has attracted researchers in many fields and helped them solve many challenges in the machine learning domain [12]. Zhao et al. designed a convolutional bidirectional long short-term memory network for tool condition monitoring in the milling process [13]. Aghazadeh et al. employed a convolutional neural network for tool wear estimation and improved the prediction accuracy [14]. Deutsch et al. proposed a deep learning-based method for RUL prediction of rotating components [15]. Wu et al. proposed using a vanilla long short-term memory network for RUL prediction of aircraft gas turbine engines [16]. Although these methods are effective to a certain extent, features are still selected manually for particular problems, which can easily lose information that may be useful for other problems; this not only reduces prediction accuracy but also limits the extension of these methods to other applications.
Given the notable achievements of deep convolutional neural networks (DCNNs) in image classification and recognition on big datasets [17], some researchers have converted raw signals into images and proposed new diagnostic and prognostic methods based on DCNN. Fu et al. converted vibration signals into images and fed them into a convolutional neural network model, obtaining better overall performance in the drilling process [18]. Park et al. proposed a convolutional neural network-based method to inspect nonpatterned welding defects on the surface of engine transmissions, whose effectiveness was confirmed by experimental studies [19]. However, these studies used DCNNs only as classifiers for state diagnosis, so they cannot provide real-time continuous prediction of tool wear.
For prognostics, Li et al. used a time window to convert the raw signal into images and input them into a DCNN, which improved the prognostic accuracy of RUL estimation on an engine dataset [20]. Babu et al. converted time series data into images and proposed a novel DCNN-based regression approach for estimating the RUL, which was more efficient and accurate than existing shallow regression models on two public datasets [21]. Therefore, converting the acquired signals into images with an appropriate method and feeding them into a DCNN is a feasible prediction strategy. Nevertheless, the datasets used in these methods are small, so the data mining capabilities of these deep networks on big datasets were not exploited. Furthermore, such approaches are still rarely applied to online continuous prediction of tool wear.
To enhance prediction performance, this paper presents a tool wear monitoring method that first obtains an image representation of the collected vibration signals based on STFT and then establishes the relationship between the resulting time-frequency maps and real-time tool flank wear with the designed DCNN model. This paper is organized as follows: Section 2 explains the proposed method, Section 3 describes the experimental setup, Section 4 validates the proposed model and discusses the results, and Section 5 concludes the paper.

Tool Wear Monitoring Methodology
To make effective use of vibration signals for tool wear monitoring, this paper proposes a method based on STFT and DCNN that first converts the raw signal into images using STFT and then builds the prediction model based on DCNN. As shown in Figure 1, the proposed method involves two processes, offline modeling and online prediction, and includes data acquisition, image representation based on STFT, model design based on DCNN, and tool wear prediction.
In the offline modeling process, the vibration signals are acquired by accelerometers installed on the CNC machine; meanwhile, the actual tool flank wear width in the milling process is measured with a microscope and used as the target labels. Then, the acquired vibration signals are converted by STFT into time-frequency maps, which form the training and validation datasets. Next, the tool wear monitoring model is designed based on DCNN, taking the training and validation datasets and the target labels as input. Finally, after network training and performance evaluation, an optimized model is obtained for real-time tool wear monitoring.
In the online prediction process, the signals are acquired by accelerometers and converted by STFT into time-frequency maps as the testing datasets, which are input to the optimized DCNN model to predict tool wear. Therefore, through offline modeling and online prediction, the proposed method achieves real-time tool wear monitoring in the milling process.

Image Representation Based on STFT.
Vibration is considered one of the signals best suited to field acquisition because it is convenient to collect and installing the sensors hardly affects the machining operation [22]. Thus, the proposed method uses the vibration signal to predict tool wear in the milling process. In addition, the vibration signal is, as a whole, nonstationary and time-varying [22]. As one of the most frequently used time-frequency analysis methods [23], STFT is widely used in vibration signal processing. Therefore, this paper first uses STFT to visualize the vibration signal collected by the accelerometer. The main steps of image representation are shown in Figure 2(a).
First, determine the parameters for imaging the raw vibration signals based on STFT, as shown in Figure 2(b): the number of samples used for each image (NS), the frame length (FL), the ratio of the overlap between adjacent frames to the frame length (RO), and the type of window (TW).
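The four parameters fix the size of each time-frequency map. The arithmetic can be sketched as below, assuming (this is not stated in the text) that frames start at the first sample, only full frames are kept, and the DFT length equals FL so that a real signal yields a one-sided spectrum. The helper name `frame_layout` is hypothetical.

```python
def frame_layout(ns: int, fl: int, ro: float):
    """Return (hop, n_frames, n_bins) for one NS-sample block.

    Assumptions: frames start at sample 0, only full frames are kept,
    and the DFT length equals the frame length FL.
    """
    hop = round(fl * (1.0 - ro))      # samples between the starts of adjacent frames
    if hop < 1 or fl > ns:
        raise ValueError("need 0 <= RO < 1 and FL <= NS")
    n_frames = (ns - fl) // hop + 1   # width of the map (time axis)
    n_bins = fl // 2 + 1              # height of the map (0 Hz up to Nyquist)
    return hop, n_frames, n_bins


# Values eventually selected in this study: NS = 1024, FL = 256, RO = 7/8
print(frame_layout(1024, 256, 7 / 8))  # (32, 25, 129)
```

Under these assumptions each block becomes a 129 × 25 map before it is resized to the DCNN input dimension; a higher RO shortens the hop and adds time columns without adding frequency rows.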
Then, calculate the STFT of the sampled signal. The discrete STFT is calculated as follows [24]:

$$\mathrm{STFT}(n,k) = \sum_{m=0}^{N-1} x(m)\, w(n-m)\, e^{-j 2\pi k m / N}, \tag{1}$$

where n is the time variable, k is the frequency variable, x(m) is the sampled signal, w(n − m) is the window function, and N equals NS. Next, calculate the power spectral density (PSD); the resulting two-dimensional time-frequency matrix is displayed as a pseudocolor map. The PSD function is defined as follows [24]:

$$P(n,k) = \mathrm{STFT}(n,k) \times \mathrm{conj}\big(\mathrm{STFT}(n,k)\big) = \big\lvert \mathrm{STFT}(n,k) \big\rvert^{2}. \tag{2}$$

The resulting time-frequency map of the vibration signal is shown in Figure 2(c). It expresses the comprehensive and dynamic information of the raw vibration signal and provides an effective image representation for real-time tool wear monitoring. Finally, the time-frequency map is converted to match the input layer dimension of the designed DCNN model, and the pixels of the converted map are written into the datasets as the input of the DCNN model.

Mathematical Problems in Engineering
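The imaging steps can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: it assumes a symmetric Hamming window, a DFT length equal to FL, frames starting at the first sample, and a log-scaled, min-max normalized map. The target size passed to `resize_nearest` is hypothetical because the DCNN input dimension is not given in this section. A direct DFT is used for readability; an FFT would be used in practice.

```python
import cmath
import math


def hamming(fl):
    # Symmetric Hamming window: w(m) = 0.54 - 0.46 cos(2*pi*m / (FL - 1))
    return [0.54 - 0.46 * math.cos(2 * math.pi * m / (fl - 1)) for m in range(fl)]


def stft_psd(x, fl=256, ro=7 / 8):
    """Return the one-sided PSD map psd[k][t] = STFT * conj(STFT) of block x."""
    hop = round(fl * (1 - ro))
    w = hamming(fl)
    n_frames = (len(x) - fl) // hop + 1
    n_bins = fl // 2 + 1
    psd = [[0.0] * n_frames for _ in range(n_bins)]
    for t in range(n_frames):
        frame = [x[t * hop + m] * w[m] for m in range(fl)]  # windowed frame
        for k in range(n_bins):
            s = sum(v * cmath.exp(-2j * math.pi * k * m / fl) for m, v in enumerate(frame))
            psd[k][t] = (s * s.conjugate()).real
    return psd


def to_unit_range(psd, eps=1e-12):
    """Convert to dB and min-max normalize to [0, 1] pixel values (assumed scaling)."""
    db = [[10 * math.log10(p + eps) for p in row] for row in psd]
    lo, hi = min(map(min, db)), max(map(max, db))
    span = (hi - lo) or 1.0
    return [[(v - lo) / span for v in row] for row in db]


def resize_nearest(img, rows, cols):
    """Nearest-neighbour resize to a hypothetical DCNN input size rows x cols."""
    r0, c0 = len(img), len(img[0])
    return [[img[i * r0 // rows][j * c0 // cols] for j in range(cols)] for i in range(rows)]
```

For one NS-sample block, `resize_nearest(to_unit_range(stft_psd(block)), 64, 64)` would produce a 64 × 64 pixel matrix; the 64 × 64 size is only an example.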

Model Designing Based on DCNN.
DCNNs are widely used to learn complex nonlinear relationships because of their strong data mining and feature extraction abilities. Therefore, in this paper, the real-time tool wear monitoring model for milling is designed based on DCNN. As shown in Figure 3, the designed DCNN model consists of one input layer, three convolutional layers, three pooling layers, one fully connected layer, and one output layer. Concretely, the input layer is the pixel matrix obtained by STFT-based signal imaging, the alternating convolutional and pooling layers perform adaptive feature extraction from the input data, and the output layer has three neurons to monitor the wear of the cutter's three flutes. The algorithms for each layer of the DCNN model are described separately as follows.
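The layer stack determines how far the map shrinks before the fully connected layer. Kernel sizes, padding, pooling windows, channel counts, and the input size are not given in this section, so the configuration below is purely hypothetical: a 64 × 64 input, 5 × 5 convolutions with padding 2, and 2 × 2 max pooling with stride 2. The sketch only shows the standard output-size arithmetic for three convolution-pooling stages feeding a three-neuron output.

```python
def out_size(size, kernel, stride=1, pad=0):
    # Standard output-size formula for convolution and pooling windows
    return (size + 2 * pad - kernel) // stride + 1


def trace_shapes(input_hw, stages):
    """Propagate (height, width) through alternating convolution/pooling stages.

    Each stage is ((conv_kernel, conv_pad), (pool_kernel, pool_stride)).
    """
    h, w = input_hw
    shapes = [(h, w)]
    for (ck, cpad), (pk, ps) in stages:
        h, w = out_size(h, ck, 1, cpad), out_size(w, ck, 1, cpad)  # convolution
        h, w = out_size(h, pk, ps), out_size(w, pk, ps)            # max pooling
        shapes.append((h, w))
    return shapes


# Hypothetical configuration: three conv (5x5, pad 2) + max-pool (2x2, stride 2) stages
STAGES = [((5, 2), (2, 2))] * 3
shapes = trace_shapes((64, 64), STAGES)
print(shapes)  # [(64, 64), (32, 32), (16, 16), (8, 8)]

N_CHANNELS = 32  # hypothetical number of feature maps in the last convolutional layer
fc_inputs = shapes[-1][0] * shapes[-1][1] * N_CHANNELS  # flattened input to the fully connected layer
N_OUTPUTS = 3    # one output neuron per flute, as in the described model
```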
The convolutional layer convolves local regions of the input maps with filter kernels, and an activation unit then generates the output in the forward propagation process. The output of a convolutional layer can be regarded as a feature map obtained by adaptive feature extraction from the input image representation. The forward propagation of the convolutional layer is calculated as follows:

$$X_j^l = f\Big(\sum_{i \in M_j} X_i^{l-1} * K_{ij}^l + b_j^l\Big),$$

where $X_j^l$ is the jth element (feature map) of the lth layer, $X_i^{l-1}$ is the ith element of the (l − 1)th layer, $K_{ij}^l$ is the lth-layer kernel indexed by (i, j), $M_j$ denotes the set of input maps connected to the jth output map, $b_j^l$ is the jth element of the lth-layer bias, and f(·) is the ReLU function. The backpropagation algorithm [25] is used to update the parameters of the convolutional layer during training. Taking L as the loss function, the gradients are calculated as follows:

$$\frac{\partial L}{\partial b_j^l} = \sum_{u,v} \big(\delta_j^l\big)_{uv}, \qquad \frac{\partial L}{\partial K_{ij}^l} = \sum_{u,v} \big(\delta_j^l\big)_{uv} \big(P_i^{l-1}\big)_{uv},$$

where $(P_i^{l-1})_{uv}$ is the patch in $X_i^{l-1}$ that was multiplied elementwise by $K_{ij}^l$ during convolution to calculate the element at (u, v) in $X_j^l$, and $\delta_j^l$ is the jth element of the lth-layer sensitivities.
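For a single input map and a single kernel, the forward pass and the two gradient sums can be written out directly. This minimal sketch uses "valid" cross-correlation, which is how most deep learning libraries implement the convolution operator (true convolution flips the kernel). The function names are illustrative, and `delta` stands for the sensitivity map, that is, the derivative of the loss with respect to the pre-activation.

```python
def conv2d_valid(x, k, b):
    """Pre-activation u: x cross-correlated with kernel k, plus bias b ('valid' positions)."""
    kh, kw = len(k), len(k[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[u + p][v + q] * k[p][q] for p in range(kh) for q in range(kw)) + b
             for v in range(ow)] for u in range(oh)]


def relu(m):
    # Activation f(.) applied elementwise
    return [[max(0.0, a) for a in row] for row in m]


def conv_grads(x, delta, kh, kw):
    """Gradients of the loss with respect to the bias and the kernel.

    dL/db       = sum over (u, v) of delta[u][v]
    dL/dK[p][q] = sum over (u, v) of delta[u][v] * x[u + p][v + q]
    where x[u + p][v + q] is entry (p, q) of the patch that produced output (u, v).
    """
    db = sum(map(sum, delta))
    dk = [[sum(delta[u][v] * x[u + p][v + q]
               for u in range(len(delta)) for v in range(len(delta[0])))
           for q in range(kw)] for p in range(kh)]
    return dk, db
```

If the loss were simply the sum of all ReLU outputs and every pre-activation were positive, `delta` would be all ones and the kernel gradient would reduce to the sum of the input patches, which makes the sketch easy to check with finite differences.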
Although the convolutional layer significantly reduces the number of connections between layers, it does not substantially reduce the number of neurons in each convolutional output.
Thus, a pooling layer follows each convolutional layer to significantly reduce the dimension of the feature maps, help avoid overfitting, and preserve the scale invariance of the extracted features [26]. The pooling layer in the forward propagation process is computed as follows:

$$X_j^l = f\big(\beta_j^l \,\mathrm{down}\big(X_j^{l-1}\big) + b_j^l\big),$$

where down(·) denotes the maximum pooling function, $\beta_j^l$ is a multiplicative bias given to the jth output of the lth layer, $b_j^l$ is the corresponding additive bias, and f(·) is the activation function.
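Max pooling with a multiplicative coefficient β and an additive bias can be sketched as follows. The sketch assumes non-overlapping 2 × 2 windows and a ReLU activation after scaling; both are assumptions, since the pooling window is not specified here. During backpropagation, the sensitivity of each pooled output is routed only to the position that held the maximum, and the gradients of the two pooling parameters are plain sums.

```python
def down_max(x, s=2):
    """Non-overlapping s x s max pooling: the down(.) operator."""
    return [[max(x[i + p][j + q] for p in range(s) for q in range(s))
             for j in range(0, len(x[0]) - s + 1, s)]
            for i in range(0, len(x) - s + 1, s)]


def pool_layer(x, beta, b, s=2):
    """Forward pass: f(beta * down(x) + b), with f = ReLU (assumed)."""
    return [[max(0.0, beta * a + b) for a in row] for row in down_max(x, s)]


def pool_grads(x, delta, s=2):
    """dL/dbeta = sum(delta * down(x)) and dL/db = sum(delta)."""
    d = down_max(x, s)
    dbeta = sum(dv * a for drow, arow in zip(delta, d) for dv, a in zip(drow, arow))
    db = sum(map(sum, delta))
    return dbeta, db


def pool_backward(x, delta, beta, s=2):
    """Route beta * delta back to the arg-max position of each pooling window."""
    dx = [[0.0] * len(x[0]) for _ in x]
    offsets = [(p, q) for p in range(s) for q in range(s)]
    for i, drow in enumerate(delta):
        for j, dv in enumerate(drow):
            p, q = max(offsets, key=lambda o: x[i * s + o[0]][j * s + o[1]])
            dx[i * s + p][j * s + q] += beta * dv
    return dx
```

Because only the maximum of each window receives a gradient, the other positions of the previous feature map get zero sensitivity from that window.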
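Correspondingly, the parameters of the pooling layer are updated as follows:

$$\frac{\partial L}{\partial b_j^l} = \sum_{u,v} \big(\delta_j^l\big)_{uv}, \qquad \frac{\partial L}{\partial \beta_j^l} = \sum_{u,v} \big(\delta_j^l \circ d_j^l\big)_{uv}, \qquad d_j^l = \mathrm{down}\big(X_j^{l-1}\big),$$

where ∘ denotes elementwise multiplication and $\delta_j^l$ is the jth sensitivity map of the lth layer. In the fully connected layer, every neuron is connected to all elements of the previous layer's feature maps, and its output is as follows:

$$X^l = \sigma\big(W^l X^{l-1} + b^l\big),$$

where σ(·) is the sigmoid function, $W^l$ is the lth-layer weight, and $b^l$ is the lth-layer bias. Similarly, the backpropagation of the fully connected layer is calculated as follows:

$$\delta^l = \big(W^{l+1}\big)^{\mathsf T} \delta^{l+1} \circ \sigma'\big(W^l X^{l-1} + b^l\big), \qquad \frac{\partial L}{\partial W^l} = \delta^l \big(X^{l-1}\big)^{\mathsf T}, \qquad \frac{\partial L}{\partial b^l} = \delta^l.$$

The Euclidean distance between the measured flank wear width $y$ and the predicted flank wear width $\hat{y}$ is used as the loss function to train the DCNN model for tool wear monitoring:

$$L = \frac{1}{2N} \sum_{n=1}^{N} \big\lVert \hat{y}_n - y_n \big\rVert_2^2,$$

where N denotes the number of time-frequency maps in the training dataset.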
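The fully connected computations and the Euclidean loss reduce to a few lines. In this sketch the weight matrix is a list of rows, `proj` stands for the back-propagated term (the transposed next-layer weights times the next-layer sensitivity), and the loss uses a 1/(2N) scaling. That scaling is an assumption; other scalings only rescale the gradient. Each prediction is a vector of three flank wear values, one per flute.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fc_forward(W, x, b):
    """X^l = sigmoid(W^l X^{l-1} + b^l), with W given as a list of rows."""
    return [sigmoid(sum(wi * xi for wi, xi in zip(row, x)) + bi) for row, bi in zip(W, b)]


def fc_backward(x_prev, x_out, proj):
    """delta = proj * sigmoid'(u), where sigmoid'(u) = X(1 - X);
    dL/dW = delta x_prev^T and dL/db = delta."""
    delta = [g * xo * (1.0 - xo) for g, xo in zip(proj, x_out)]
    dW = [[d * xp for xp in x_prev] for d in delta]
    return dW, delta


def euclidean_loss(y_pred, y_true):
    """L = 1/(2N) * sum_n ||y_hat_n - y_n||^2 and its gradient with respect to y_hat."""
    n = len(y_pred)
    loss = sum((p - t) ** 2 for yp, yt in zip(y_pred, y_true) for p, t in zip(yp, yt)) / (2 * n)
    grad = [[(p - t) / n for p, t in zip(yp, yt)] for yp, yt in zip(y_pred, y_true)]
    return loss, grad
```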

Experimental Setup
To experimentally validate the effectiveness of the proposed tool wear monitoring method in milling operations, experimental datasets obtained from a high-speed milling machine under dry cutting conditions were used [27,28]. The tool used in the machining tests was a three-flute ball nose tungsten carbide cutter, and the workpiece material was stainless steel (HRC52). In addition, the workpiece had been preprocessed to remove the original skin layer containing hard particles [28]. Three Kistler accelerometers were mounted on the workpiece in the x, y, and z directions to monitor the real-time vibration of the machine tool. A DAQ NI PCI1200 acquired the vibration signals in the three directions at a continuous sampling frequency of 50 kHz during the tool wear tests. These data are stored and processed on a computer platform and then used to train and test the designed DCNN model for real-time tool wear monitoring. After each cutting pass, the actual flank wear width of each cutting edge is measured offline with a LEICA MZ12 microscope and used as the target label for training the DCNN model. The relevant parameters of the milling operation are shown in Table 1.
In the experiments, three milling cutters (C1, C4, and C6) with actual flank wear data were taken as the experimental datasets. Each cutter dataset contains 300 complete data files, corresponding to 300 milling operations. The vibration signals in each data file are divided into non-overlapping training, validation, and testing data, and the training, validation, and testing datasets are then obtained by image representation of these signals. The resulting experimental datasets are described in Table 2. The designed DCNN model was run on the Alibaba ECS platform with an 8-core CPU and an NVIDIA Tesla P100 GPU under Ubuntu 16.04, using CUDA 8 to accelerate computation. With NS = 1024, each training epoch takes less than 28 s, and predicting a single testing sample takes only 0.008 s. Hence, the proposed method is efficient enough for intelligent real-time tool wear monitoring with big data.
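The per-file split can be sketched for a single channel as follows. The non-overlapping NS-sample segmentation, the contiguous train/validation/test assignment, and the 70/15 ratios (with testing taking the remainder) are assumptions for illustration. The actual dataset sizes are those in Table 2. Every block from one data file inherits that file's three measured flank wear values.

```python
def segment(signal, ns=1024):
    """Cut one channel of a data file into non-overlapping NS-sample blocks."""
    return [signal[i:i + ns] for i in range(0, len(signal) - ns + 1, ns)]


def split_file(signal, wear, ns=1024, ratios=(0.7, 0.15)):
    """Split one file's blocks into disjoint train/validation/test lists.

    `wear` holds the flank wear of the three flutes measured after this cut.
    The (train, validation) ratios are hypothetical; testing gets the remainder.
    """
    blocks = segment(signal, ns)
    n_tr = int(len(blocks) * ratios[0])
    n_va = int(len(blocks) * ratios[1])
    parts = (blocks[:n_tr], blocks[n_tr:n_tr + n_va], blocks[n_tr + n_va:])
    return tuple([(blk, wear) for blk in part] for part in parts)
```

Because the number of blocks is roughly `len(signal) // NS`, doubling NS roughly halves the number of time-frequency maps. This matches the trend noted for Table 2, where a larger NS gives a smaller dataset.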

Performance Evaluation.
To quantitatively evaluate the effectiveness of the proposed tool wear monitoring method, two measures are used: mean absolute error (MAE) and root mean squared error (RMSE). The MAE is the average absolute error between the predicted tool flank wear width $\hat{y}$ and the actual tool flank wear width $y$ during milling:

$$\mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} \big\lvert \hat{y}_n - y_n \big\rvert,$$

where N is the number of time-frequency maps in the dataset used. The RMSE is the root mean square error between the predicted tool flank wear width $\hat{y}$ and the actual tool flank wear width $y$ during milling:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \big(\hat{y}_n - y_n\big)^2},$$

where N again represents the number of time-frequency maps in the dataset used.
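Both measures can be computed directly. The sketch below works on a flat list of predictions; for three flutes, the values could be flattened or evaluated per flute, which the text does not specify. The sample numbers are illustrative, not measured data. RMSE is never smaller than MAE, and the gap widens when a few predictions have large errors.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted flank wear."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted flank wear."""
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


measured = [100.0, 110.0, 120.0, 130.0]   # illustrative values in micrometres
predicted = [101.0, 108.0, 121.0, 130.0]
print(mae(measured, predicted), rmse(measured, predicted))  # MAE = 1.0, RMSE = sqrt(1.5) ≈ 1.2247
```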

Influence of Image Representation.
Because the image representation is a time-frequency visualization of the vibration signals, its parameters affect the quality of the spectrogram. Appropriate parameters better express the two-dimensional time-frequency information of the signal, which helps the proposed real-time milling tool wear monitoring method perform well. To this end, the influence of the image representation parameters (NS, FL, RO, and TW) on the prediction accuracy of the proposed method is examined through a series of experiments. The parameter values that minimize the above performance measures on the testing datasets are taken as appropriate for the method. In addition, to eliminate interference from the parameters of the DCNN model, the control variable (one-factor-at-a-time) technique is used to find the optimal image representation parameters. Concretely, when the parameters of image representation are studied, … Figure 4. Table 2 shows that, for raw signals of the same length, a larger NS yields a smaller dataset, so less time is needed to generate the dataset and train the model. As shown in Figure 4, when NS is larger or smaller than 1024, both the MAE and RMSE of the predicted tool wear are larger. Considering both data processing time and monitoring accuracy, this paper chooses 1024 as the NS parameter.
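The control variable procedure amounts to a one-at-a-time sweep: each parameter is varied over its candidates while the others stay at their current best values. In the minimal sketch below, `evaluate` is a hypothetical stand-in for generating the maps, training the DCNN, and returning a test error such as MAE. The toy error surface and the NS candidates are made up for illustration; the FL and RO candidates are those listed in this section. A one-at-a-time search does not explore interactions between parameters.

```python
def one_at_a_time(grid, baseline, evaluate):
    """Tune parameters one at a time, keeping the others fixed.

    grid:     {name: [candidate values]}, swept in insertion order
    baseline: {name: starting value}
    evaluate: hypothetical callable params -> error (lower is better)
    """
    best = dict(baseline)
    for name, candidates in grid.items():
        errors = {}
        for value in candidates:
            trial = dict(best)
            trial[name] = value          # vary only this parameter
            errors[value] = evaluate(trial)
        best[name] = min(errors, key=errors.get)
    return best


# Toy error surface (not measured data) with its minimum at the values chosen in the text
toy = lambda p: abs(p["NS"] - 1024) / 1024 + abs(p["FL"] - 256) / 256 + abs(p["RO"] - 7 / 8)
grid = {"NS": [256, 512, 1024, 2048],               # hypothetical NS candidates
        "FL": [32, 64, 128, 256, 512],
        "RO": [1/16, 1/8, 1/4, 1/2, 3/4, 7/8, 15/16]}
chosen = one_at_a_time(grid, {"NS": 512, "FL": 64, "RO": 0.5}, toy)
print(chosen)  # {'NS': 1024, 'FL': 256, 'RO': 0.875}
```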

Frame Length.
In the experiment, the candidate values of FL are 32, 64, 128, 256, and 512. The corresponding MAE and RMSE of the predicted tool wear are shown in Figure 5. When FL is larger or smaller than 256, the prediction errors increase. Thus, 256 is selected as the FL parameter.
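One plausible reading of the FL result is the usual STFT trade-off: a longer frame gives finer frequency bins but blurs changes in time, and a shorter frame does the opposite. With the 50 kHz sampling rate used in the experiments, the arithmetic is sketched below, assuming the DFT length equals FL and RO = 7/8.

```python
FS = 50_000.0  # sampling frequency used in the experiments (Hz)


def resolution(fl, ro, fs=FS):
    """Return (Hz per frequency bin, frame duration in ms, frame step in ms)."""
    hop = fl * (1.0 - ro)            # samples between adjacent frames
    return fs / fl, 1e3 * fl / fs, 1e3 * hop / fs


for fl in (32, 64, 128, 256, 512):   # candidate FL values from the text
    df, frame_ms, step_ms = resolution(fl, 7 / 8)
    print(f"FL={fl:4d}: {df:8.1f} Hz per bin, {frame_ms:6.2f} ms frame, {step_ms:5.2f} ms step")
```

At the selected FL = 256, each frequency bin spans about 195 Hz and each frame covers 5.12 ms.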

Ratio of Overlap.
In the experiment, the candidate values of RO are 1/16, 1/8, 1/4, 1/2, 3/4, 7/8, and 15/16. The corresponding MAE and RMSE of the predicted tool wear are shown in Figure 6. The monitoring error decreases as RO increases, but when RO is larger than 7/8, the monitoring error increases to some extent. Therefore, this paper chooses 7/8 as the RO parameter.

Type of Window.
The candidate window functions for TW in this experiment are the commonly used Bartlett, Blackman, Bohman, Chebyshev, Gaussian, Hamming, Hanning, Kaiser, rectangular, and triangular windows. The corresponding MAE and RMSE of the predicted tool wear are shown in Figure 7. Among these window functions, the Hamming window generally gives small MAE and RMSE for all three milling cutters. Therefore, the Hamming window is selected as the TW parameter.
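For reference, the Hamming window and a few of the other candidates can be generated as below, using symmetric definitions. The Hamming window keeps small nonzero endpoints (0.08) and has a lower highest sidelobe than the Hann window (roughly −43 dB versus −31 dB). This may help explain its behavior here, although the text does not establish the reason. Some toolboxes define "Hanning" without the zero endpoints, so results can differ slightly between implementations.

```python
import math


def window(name, n):
    """Symmetric window of length n >= 2 for a few of the TW candidates."""
    r = range(n)
    c = [math.cos(2 * math.pi * i / (n - 1)) for i in r]
    if name == "rectangular":
        return [1.0] * n
    if name == "hamming":
        return [0.54 - 0.46 * ci for ci in c]
    if name == "hann":  # "Hanning" in the text, zero-endpoint form
        return [0.5 - 0.5 * ci for ci in c]
    if name == "blackman":
        return [0.42 - 0.5 * ci + 0.08 * math.cos(4 * math.pi * i / (n - 1))
                for i, ci in zip(r, c)]
    raise ValueError(f"window not covered by this sketch: {name}")
```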

Influence of the Designed Model.
The appropriate DCNN parameters vary with the milling dataset, and tuning them to find the optimal values for the relevant dataset is a crucial part of the training process. Appropriate DCNN parameters help the proposed tool wear monitoring method achieve good performance. The parameter values that minimize the above performance measures on the testing datasets are taken as appropriate for the method. In addition, to eliminate interference from the image representation parameters, the control variable (one-factor-at-a-time) technique is used to find the optimal DCNN parameters. Concretely, when the DCNN parameters are studied, the image representation parameters are fixed at the optimal values found in Section 4.2. In addition, when one DCNN parameter is analyzed, the other DCNN parameters are also set to their optimal values.

Type of Gradient.
Gradient descent optimization algorithms are commonly employed to train convolutional networks. However, their advantages and disadvantages are difficult to explain theoretically, and they are often used as black-box optimizers [29]. Therefore, this paper compares popular gradient descent algorithms (SGD, Adagrad, Adadelta, Adam, RMSprop, and Nesterov) [29] and selects the type of gradient descent optimization algorithm (TG) that gives the proposed model good performance. As shown in Figure 8, TG significantly influences the monitoring errors, and the Nesterov algorithm yields the lowest error. Hence, the Nesterov optimization algorithm is selected to optimize the proposed DCNN model.
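The selected Nesterov method can be sketched in its look-ahead form, where the gradient is evaluated at the point the momentum is about to carry the weights to. This plain-Python sketch on a toy quadratic is illustrative only. Deep learning frameworks typically implement an algebraically rearranged form, and the learning rate and momentum below are not the paper's settings.

```python
def nesterov(grad, w, lr=0.1, momentum=0.9, steps=100):
    """Nesterov accelerated gradient, look-ahead form:
    v <- momentum * v - lr * grad(w + momentum * v);  w <- w + v
    """
    w = list(w)
    v = [0.0] * len(w)
    for _ in range(steps):
        look = [wi + momentum * vi for wi, vi in zip(w, v)]   # look-ahead point
        g = grad(look)
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return w


# Toy problem: minimize f(w) = sum(w_i^2), whose gradient is 2w
w_final = nesterov(lambda w: [2.0 * wi for wi in w], [5.0, -3.0], steps=200)
```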

Batch Sizes.
Because of the conflict between massive data and limited computing resources, it is difficult to compute the backpropagation gradient over the entire dataset and update all network parameters at once during DCNN training. It is generally preferable to divide a massive dataset into batches of appropriate size [30]. The batch size (BS) is therefore a key variable in neural network training, and different BSs are used to train the DCNN model in this experiment. The training time is shown in Figure 9, and the MAE and RMSE of the predicted tool wear under different BSs are shown in Figure 10.
As shown in Figures 9 and 10, a smaller BS gives smaller MAE and RMSE of the predicted tool wear but a longer per-epoch training time. When BS is larger (especially above 100), the per-epoch training time drops noticeably and then stabilizes, but the MAE and RMSE increase significantly. A BS of 100 therefore maintains the monitoring accuracy of the DCNN model while reducing its training time, and this paper chooses 100 as the BS parameter.
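A common way to implement mini-batch training is to shuffle the training maps once per epoch and slice them into batches of BS. The number of parameter updates per epoch is then the ceiling of N/BS, so a larger BS means fewer but larger updates per epoch. This is one reason per-epoch time tends to fall as BS grows. In the minimal sketch below, the dataset size of 1050 and the seed handling are hypothetical.

```python
import random


def minibatches(n_samples, batch_size=100, seed=0):
    """Yield shuffled index batches that cover every training map once per epoch."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    for start in range(0, n_samples, batch_size):
        yield idx[start:start + batch_size]


batches = list(minibatches(1050, batch_size=100))
print(len(batches), len(batches[-1]))  # 11 50
```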

Learning Rate.
In network training, the gradient descent optimization algorithm is applied for error backpropagation. The learning rate (LR) is another key factor: it influences both the weight updates and network convergence, so selecting a suitable LR is crucial for the performance of the DCNN model. In this experiment, the DCNN model is trained with different LRs, and the MAE and RMSE of the predicted tool wear under different LRs are shown in Figure 11.
As can be seen from Figure 11, when LR is smaller than 0.03 or larger than 0.4, the monitoring error is large, whereas moderate LRs give small errors. Hence, this experiment selects 0.4 as the LR parameter.

Number of Epochs.
In addition, the number of epochs (NE) is important for the proposed DCNN model in the training process, and it impacts the error convergence of the DCNN model. In the experiment, different NEs are employed for training the constructed model, and the corresponding MAE and RMSE measures of predicted tool wear under different NEs are shown in Figure 12.
Figure 12 shows that when NE is too small, the network is insufficiently trained and the monitoring error is larger. As NE grows, training becomes more sufficient and the monitoring error decreases continuously. However, when NE becomes large (especially above 250), training of the DCNN model tends toward overfitting and the monitoring error remains generally stable. Therefore, this experiment chooses 250 as the NE parameter.

Comparison with Other Methods.
Through a series of optimization experiments, the optimal parameters of the proposed tool wear monitoring method were obtained, as shown in Table 3. The corresponding predicted tool flank wear for the three datasets is shown in Figure 13.
To demonstrate its performance and advantages, the proposed tool wear monitoring method is compared with other advanced methods on the same publicly available datasets. First, shallow models (SVR, PSO-SVR, LSSVR, and PSO-LSSVR) are applied to tool wear monitoring. In addition, the deep models (CNN, LSTM, and LSTM-CNN) reported in [31] are used for comparison. The comparison results are shown in Table 4.
As shown in Table 4, the mean MAE of the predicted flank wear over the three milling cutters is 1.3 μm and the mean RMSE is 1.9082 μm, the lowest values among all compared methods. In addition, the variance of MAE is 0.372 μm and the variance of RMSE is 0.6568 μm, which are also the lowest among the compared methods. Thus, the proposed method yields predicted tool wear with both low deviation and small dispersion. In conclusion, the proposed tool wear monitoring method can effectively and correctly establish the complex mapping between the acquired signals and tool wear in real-time milling, and thus achieves higher tool wear monitoring accuracy.

Conclusions
In this paper, we propose a tool wear monitoring method based on STFT and DCNN for milling operations. The method uses STFT time-frequency maps as the image representation of vibration signals and establishes the nonlinear relationship between these images and the tool flank wear width with the constructed DCNN model. Three cutter datasets measured on a high-speed CNC milling machine under dry cutting were used to verify the effectiveness of the proposed method. The experimental results show that the proposed method is more accurate and relatively more reliable than previous methods. Concretely, the following conclusions can be drawn:
(1) Using STFT time-frequency maps as the input of the constructed DCNN model is effective for online tool wear monitoring, and the image representation affects the performance of the method. When NS is 1024, FL is 256, RO is 7/8, and TW is the Hamming window, the generated time-frequency map is optimal for real-time tool wear monitoring.
(2) The constructed DCNN model can readily learn the complex nonlinear relationship between the image representation of vibration signals and milling tool wear, and the DCNN model has an important influence on the monitoring accuracy of tool wear. When TG is Nesterov, BS is 100, LR is 0.4, and NE is 250, the designed DCNN model has better overall performance; correspondingly, the MAE is 1.3000 ± 0.3720 μm and the RMSE is 1.9082 ± 0.6568 μm.
(3) Compared with other intelligent tool wear prediction methods, the proposed method adaptively extracts features from the collected vibration signals and effectively monitors tool wear in real time, relying less on the diagnosticians' prior knowledge of feature selection and fusion techniques. Therefore, the proposed method can readily be extended to other applications and may provide guidance for real-time tool wear monitoring in other cutting processes, such as turning and drilling. It may also be attractive in other fields, such as motors, turbine engines, and rolling bearings.
In future work, we will study multisignal fusion techniques and tool wear monitoring under different working conditions (especially actual production conditions) to improve the performance of the whole method.

Data Availability
No data were used to support this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.