Intensive Cold-Air Invasion Detection and Classification with Deep Learning in Complicated Meteorological Systems



Introduction
China, located in the east of the Eurasian continent and adjacent to the Northwest Pacific, is strongly influenced by the Asian monsoon system, which originates from the thermal contrast between the ocean (the Pacific and Indian Oceans) and the land (the Asian continent) [1]. The climate of China exhibits multiscale variability, from diurnal to decadal [2], owing to complicated interactions among various atmospheric circulation systems, including the western Pacific subtropical high, the South Asia high, the midlatitude upper-level jet, blocking highs, and typhoons, and to multisource modulation by ENSO [3], Indian Ocean sea surface temperature [4], snow cover over the Tibetan Plateau [5], and sea ice in the polar regions [6].
In winter, China is frequently affected by cold-air (CA) processes, especially in the northeast and northwest, which can cause huge economic losses and serious health threats. In January 2008, most areas of southern China suffered an extreme cold spell accompanied by heavy precipitation and snowfall [7,8], which placed severe pressure on transportation and energy supply. Because this cold wave coincided with the Spring Festival travel season, many people were stranded in railway stations or airports for several days and could not return home.
Against the background of global warming, subtropical extreme cold events are increasing rather than decreasing, because the westerly jet has weakened as the temperature gradient between the polar and tropical regions has decreased [9,10]; CA events have therefore become a hot topic in current climate research [11]. Broadly, CA studies fall into three fields: case studies, synoptic dynamics studies, and climate dynamics studies. Case analyses mainly focus on the temporal-spatial characteristics and associated physical mechanisms of particular extreme CA events [12][13][14]. From the viewpoint of synoptic dynamics, the intensity, persistence, and spatial coverage of CA events depend on complicated nonlinear interactions among different circulation systems [15][16][17]. Climatic diagnostic analyses and numerical sensitivity experiments help explain the climatological background of CA events and the role of critical external forcings [18][19][20][21]. In terms of spatial impact, CA events can also be separated into national and regional processes; the former cover larger areas and have stronger influences on economic and social development. Because large-scale CA events have occurred frequently since 2000, meteorologists' interest in national CA processes has grown in recent years.
Although CA research has made great progress in the past decades, and the role of subseasonal processes in triggering strong cold waves has been investigated in depth, how to identify the route of CA invasion and the associated intensive temperature reduction (ITR) remains unclear. The tracks of most moving synoptic systems are hard to detect with simple statistical methods; even for a single-entity system such as a typhoon, determining the exact path is very difficult [22]. For the march of CA and ITR, each air particle has its own path, so the composite pathway of CA reflects the statistical characteristics of all particles, which is invisible and arduous to calculate. In traditional CA and ITR monitoring operations, the route is usually judged subjectively by forecasters [23], which is imprecise. To improve the monitoring and diagnosis of CA events and large-scale ITR, an objective method for identifying CA and ITR paths is urgently needed. The trajectory of ITR reflects the influence of CA and is closely associated with meteorological disaster prevention, and large-scale ITR, especially national ITR (NITR), usually causes extremely serious damage. This paper therefore focuses on the objective recognition of NITR paths.
In the past ten years, artificial intelligence (AI) technology has made great progress in computer vision [24], language processing [25,26], machine translation [27], medical imaging [28], robotics [29], and biological information control [30], especially medical diagnosis [31]. For example, it performs well in unmanned driving [32] and achieves higher recognition accuracy than humans in image and voice recognition. Machine learning, the core of AI, is the main method of implementing it: a collection of algorithms that allow computers to learn automatically by analyzing large sets of sample data, deriving rules, and then using these rules to classify or predict new data. It has therefore triggered a historic revolution in many fields [33,34]. In conventional meteorological research, low-temperature forecasting combines numerical prediction products with statistical theory [35][36][37]. The rise of AI makes it possible to apply deep learning to the forecasting of meteorological elements, and improving forecast accuracy in this way has become a core research problem. As an extension of machine learning, deep learning is currently used in meteorology mostly for, and largely limited to, image recognition. How to better apply it to intelligent seamless grid weather forecasting is an urgent problem. Compared with the past practice of establishing forecast equations point by point, deep learning can directly forecast the entire element field, which not only corrects the output of numerical models but also accounts for the spatial continuity of the elements. It therefore has a considerable advantage in developing objective grid-point forecasting techniques and is more in line with forecasters' reasoning.
Deep learning models, including convolutional neural networks (CNNs) [38], recurrent neural networks (RNNs) [39], deep neural networks (DNNs), and gated recurrent units (GRUs) [40], are mainly used to extract and recognize image features of meteorological element fields. Long short-term memory (LSTM) [41] networks are particularly suitable for predicting and analyzing big-data time series and are being continuously improved for meteorological applications.
To detect the exact track of NITR events and classify them reasonably, the Faster R-CNN target detection architecture is used to address the uncertainty of moving paths, changeable coverage, and high complexity. On this basis, an improved recognition method combining Faster R-CNN and SVM is proposed, in which SVM performs NITR path classification to enhance classification confidence. The improved Faster R-CNN model is then used to identify, classify, and locate the paths of NITR events. The experimental results show that, compared with the original algorithm, the improved Faster R-CNN algorithm greatly improves path identification, especially for mixed directions and incompletely developed processes. Overall, the amended Faster R-CNN algorithm offers fast computation, high recognition accuracy, good robustness, and good generalization in the practical detection of NITR pathways. The remainder of this paper is organized as follows. Section 2 describes data processing, the overall architecture, and the Faster R-CNN and SVM model for NITR recognition and classification, including the Faster R-CNN network, network training, and the recognition and classification method. Section 3 presents the experimental environment and the method used to evaluate and analyze the performance of the improved Faster R-CNN and SVM model. Section 4 concludes the paper.

Methodology
In this section, we describe our model in detail. First, we introduce the data used in this paper and the division of the source regions of NITR events. Next, we present the overall structure of the Faster R-CNN model. Finally, the improved Faster R-CNN model for NITR recognition and classification is described.

Information Center, the surface observational data of 1995 stations were processed by simple time series investigation, neighbour interpolation, and outlier detection, and station datasets of intensive temperature-reduction processes were generated. Figure 1 compares the original and revised data of the national conventional station dataset from 1961 to 2018. The invalid values and missing measurements in Figures 1(a), 1(c), and 1(e) are corrected by these methods, and the revised results shown in Figures 1(b), 1(d), and 1(f) are used as the original dataset of this paper.

According to numerous studies of large-scale ITR over China induced by heavy cold-air processes, NITR events mainly originate from dense cold-air invasion (CAI) from three source regions: northwest, north, and northeast of China. Although all southward movement of CA is related to the negative phase of the Arctic Oscillation, each path is dominated by a separate circulation system. The northwest pathway is usually controlled by the Siberian High, and the northeast route is linked to the activity of the northeast cold vortex and the Okhotsk High. The north path can be attributed to the interaction and competition between the Siberian High and the northeast cold vortex. Therefore, three source regions of CAI are preassigned: the northwest region (73°E-95°E), the north region (95°E-115°E), and the northeast region (115°E-135°E) (see Figure 2).
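A minimal sketch of the per-station quality control follows. It assumes the input is one station's daily minimum-temperature series with NaN for missing days, that "neighbour interpolation" means linear interpolation between neighbouring valid days, and that outliers are flagged by a simple z-score test. The helper name `quality_control`, the plausible-range bounds, and the threshold are hypothetical, not values from the paper.

```python
import numpy as np


def quality_control(tmin, valid_range=(-60.0, 50.0), z_thresh=4.0):
    """Clean one station's daily minimum-temperature series (hypothetical helper).

    Returns the cleaned series and a boolean mask of the values that were replaced.
    """
    x = np.asarray(tmin, dtype=float).copy()
    lo, hi = valid_range

    # Missing (NaN) or physically implausible values are treated as invalid.
    bad = np.isnan(x) | (x < lo) | (x > hi)

    # Simple outlier test: z-score computed from the valid values only.
    mean, std = x[~bad].mean(), x[~bad].std()
    if std > 0:
        bad |= np.abs(x - mean) > z_thresh * std

    # Fill every flagged day by linear interpolation between neighbouring valid days.
    days = np.arange(x.size)
    x[bad] = np.interp(days[bad], days[~bad], x[~bad])
    return x, bad


series = [-5.0, -6.0, np.nan, -8.0, 999.0, -7.0]
cleaned, replaced = quality_control(series)
print(cleaned)
print(replaced)
```

Here the missing day and the 999 code are both filled from their neighbours; a spatial neighbour check against surrounding stations would be a natural complement but is not shown.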
Considering that compound pathways may exist, there are seven types of NITR routes: the single northwest (NW), north (N), and northeast (NE) paths and the composite northwest + north (NW + N), northeast + north (NE + N), northwest + northeast (NW + NE), and northwest + north + northeast (NW + N + NE) paths.
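One way to encode the three source regions and the seven path types, together with the numeric labels 1-7 used at the SVM stage, is sketched below. It assumes each detected cold-air intrusion is summarized by a single entry longitude; that summary, the half-open longitude bands, and the names `region_of` and `path_label` are illustrative assumptions.

```python
# Longitude bands (degrees east) of the three preassigned source regions.
REGIONS = {"NW": (73.0, 95.0), "N": (95.0, 115.0), "NE": (115.0, 135.0)}

# Seven NITR path types mapped to their class labels.
PATH_LABELS = {
    frozenset({"NW"}): 1,
    frozenset({"N"}): 2,
    frozenset({"NE"}): 3,
    frozenset({"NW", "N"}): 4,
    frozenset({"NE", "N"}): 5,
    frozenset({"NW", "NE"}): 6,
    frozenset({"NW", "N", "NE"}): 7,
}


def region_of(lon):
    """Assign an entry longitude to a source region (bands are half-open, 135E inclusive)."""
    for name, (lo, hi) in REGIONS.items():
        if lo <= lon < hi or (name == "NE" and lon == hi):
            return name
    raise ValueError(f"longitude {lon} lies outside 73E-135E")


def path_label(entry_lons):
    """Map the entry longitudes of one NITR event to its path-type label (1-7)."""
    return PATH_LABELS[frozenset(region_of(lon) for lon in entry_lons)]


print(path_label([80.0, 120.0]))  # prints 6 (NW + NE)
```

Using a `frozenset` as the key makes the label independent of the order in which the intrusions are listed.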

Faster R-CNN Network.
Convolutional neural networks (CNNs) have been widely used in many fields, such as target detection and speech recognition. The region-based convolutional neural network (R-CNN), proposed by Ross Girshick in 2014 [42], also performs well and has developed rapidly. R-CNN is a classic algorithm and a basic method for image recognition using region proposals, and on its basis two further algorithms, Fast R-CNN and Faster R-CNN, were proposed.
In general, the R-CNN algorithm can be divided into four steps: (1) candidate region generation, (2) feature extraction, (3) category judgment, and (4) location refinement. First, a large number of candidate regions are generated by a visual grouping method such as selective search, and high-dimensional feature vectors of these regions are then computed by convolution with a CNN. These feature vectors are passed to classifiers such as logistic regression or Softmax regression. Each candidate region receives an object score, its overlap with the ground-truth box is measured by the intersection over union (IoU), and the candidate box is refined to achieve object recognition and localization.
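The IoU used when matching candidate boxes to ground truth can be computed as below. This is a minimal sketch assuming boxes are given as corner coordinates `(x1, y1, x2, y2)` in continuous pixel units.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # overlap 25, union 175
```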
Compared with traditional target detection algorithms, which use a sliding window to examine all possible regions in turn, R-CNN first extracts a set of candidate regions that are more likely to contain objects and then extracts CNN features only from these candidates for judgment. This greatly reduces the computation of subsequent feature vectors and handles the scale problem better. The CNN is implemented with GPU parallel computing, which improves computing speed and efficiency. In addition, the bounding-box regression step improves the accuracy of target localization.
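Bounding-box regression is usually expressed with the center-offset and log-scale parameterization introduced with R-CNN. The sketch below encodes a target box relative to a reference (candidate or anchor) box and decodes predicted offsets back into a box; the function names and the corner-coordinate convention are assumptions for illustration.

```python
import math


def _center_size(box):
    """Convert (x1, y1, x2, y2) to center x, center y, width, height."""
    w, h = box[2] - box[0], box[3] - box[1]
    return box[0] + 0.5 * w, box[1] + 0.5 * h, w, h


def encode(box, ref):
    """Regression targets (tx, ty, tw, th) of `box` relative to reference box `ref`."""
    x, y, w, h = _center_size(box)
    xr, yr, wr, hr = _center_size(ref)
    return ((x - xr) / wr, (y - yr) / hr, math.log(w / wr), math.log(h / hr))


def decode(deltas, ref):
    """Apply predicted (tx, ty, tw, th) to `ref` and return the refined (x1, y1, x2, y2)."""
    tx, ty, tw, th = deltas
    xr, yr, wr, hr = _center_size(ref)
    x, y = xr + tx * wr, yr + ty * hr
    w, h = wr * math.exp(tw), hr * math.exp(th)
    return (x - 0.5 * w, y - 0.5 * h, x + 0.5 * w, y + 0.5 * h)


ref = (0.0, 0.0, 100.0, 100.0)
target = (10.0, 20.0, 70.0, 100.0)
print(encode(target, ref))
print(decode(encode(target, ref), ref))
```

Normalizing offsets by the reference size and using log-ratios for width and height keeps the targets comparable across boxes of very different sizes.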

2.2.2. National Intensive Temperature-Reduction Recognition.
Although R-CNN has become a typical algorithm in image recognition, its bottleneck is the long time needed to generate region proposals in the first step. Faster R-CNN was developed to remove this defect. It introduces a region proposal network (RPN), a fully convolutional network that simultaneously predicts object boxes and objectness scores at each position of the input image, so that high-quality region proposals are generated efficiently. The RPN replaces earlier methods such as selective search and EdgeBoxes. Because it shares the convolutional features of the whole image with the detection network, generating region proposals adds almost no computational cost. Therefore, Faster R-CNN is used for NITR recognition in this paper, and a support vector machine (SVM) is adopted to classify the type of NITR.
(1) Network Training. The Faster R-CNN algorithm consists of the RPN and the Fast R-CNN detection network. In this paper, a pretrained ZFNet [43] is used to initialize both the RPN and the detection network. The typical structure of ZFNet is shown in Figure 3. The training process used in this paper is as follows.
(1) Pretraining CNN. The typical ZFNet consists of five convolutional layers and two fully connected (FC) layers. Pooling layers follow the convolutional layers, and the filter size and stride differ slightly from layer to layer. The last convolutional layer (Layer 5) of ZFNet outputs 256 feature maps, and the fully connected Layer 6 combines all features of the 256 channels into a single 4096-dimensional feature vector. Because different types of images differ greatly in their deep features, a classifier applied to the feature vectors output by Layers 5, 6, and 7 can produce image recognition results.
(2) Training RPN. The pretrained ZFNet is used to initialize the RPN, and a small convolutional layer with a specific function (Layer 6) is added after the original convolutional Layer 5. This layer convolves the feature map output by Layer 5 in a sliding-window manner; the windows are squares or rectangles with an overlap ratio of 0.5. At each position, nine anchor windows with three fixed scales and three aspect ratios (1:1, 1:2, and 2:1) are considered. The output of Layer 6 feeds two sibling fully connected layers, a box-classification layer and a box-regression layer. The classification layer outputs 2 × 9 scores, the probabilities that each of the nine windows belongs to the target or the background, and the regression layer outputs 4 × 9 translation and scaling parameters.
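The nine anchors per feature-map position can be generated as follows. Because the paper does not state its anchor scales, this sketch assumes the settings of the original Faster R-CNN paper: anchor areas of 128², 256², and 512² pixels, aspect ratios 1:2, 1:1, and 2:1, and a feature stride of 16 for ZFNet. It also prints the per-position output sizes of the two sibling layers.

```python
import numpy as np


def make_anchors(feat_h, feat_w, stride=16, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return an array of shape (feat_h * feat_w * 9, 4) with anchors as (x1, y1, x2, y2)."""
    base = []
    for s in scales:
        for r in ratios:  # r = height / width; the area stays s * s
            w, h = s / np.sqrt(r), s * np.sqrt(r)
            base.append((-w / 2, -h / 2, w / 2, h / 2))
    base = np.array(base)  # (9, 4), centered at the origin

    # Image-space centers of every feature-map cell.
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    cx, cy = (xs.ravel() + 0.5) * stride, (ys.ravel() + 0.5) * stride
    shifts = np.stack([cx, cy, cx, cy], axis=1)  # (H * W, 4)

    # Broadcast the 9 base anchors onto every center, grouped by position.
    return (shifts[:, None, :] + base[None, :, :]).reshape(-1, 4)


k = 9
anchors = make_anchors(feat_h=2, feat_w=3)
print(anchors.shape)  # (54, 4)
print("cls outputs per position:", 2 * k, "reg outputs per position:", 4 * k)
```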
(3) Training Faster R-CNN Detection Network. Similarly, ZFNet is used to initialize the detection network, and the region proposals obtained from the RPN serve as its input. Features are extracted by five convolutional layers, and the feature maps are compressed by the corresponding pooling layers to give 256-channel feature maps. These are then passed through fully connected Layers 6 and 7 and finally classified by the SVM. In this manner, the network determines whether an intensive temperature-reduction type is present in each proposal box and where it is located. The samples are used for training and fine-tuning over many rounds, and the connection weight matrices are updated by error backpropagation. Finally, a detection network suitable for NITR recognition is obtained.

(4) RPN and Faster R-CNN Sharing Convolution Layer.
After the above training processes, the two networks are still independent, so the convolutional layers must be shared so that the same features serve both region proposal generation and target detection. The procedure is as follows: (a) ZFNet is used to train the RPN independently; (b) the Faster R-CNN detection network is trained with the region proposals and network parameters generated by the RPN in (a); (c) the detection network parameters are used to reinitialize the RPN. At this stage, the learning rate of the convolutional layers shared by the RPN and the Faster R-CNN network must be set to 0, so that these layers are not updated and only the layers unique to the RPN are retrained. The RPN and the Faster R-CNN detection network then share all common convolutional layers, which streamlines region proposal and effectively reduces the running time of the algorithm.
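The alternating schedule can be summarized as a table of per-stage learning rates, where a rate of 0 freezes a parameter group. This is a minimal sketch: the group names `shared_conv`, `rpn_head`, and `det_head`, the base learning rate, and the final stage (d), which fine-tunes only the detector's own layers as in the original four-step Faster R-CNN training, are assumptions not stated in this paper. A toy SGD update shows that a zero learning rate leaves the shared layers untouched.

```python
BASE_LR = 0.001  # illustrative value

# Per-stage learning rates for each parameter group; groups absent from a stage are not trained.
STAGES = [
    ("a: train RPN from pretrained ZFNet", {"shared_conv": BASE_LR, "rpn_head": BASE_LR}),
    ("b: train detector on RPN proposals", {"shared_conv": BASE_LR, "det_head": BASE_LR}),
    ("c: re-init RPN from detector, freeze shared conv", {"shared_conv": 0.0, "rpn_head": BASE_LR}),
    ("d: fine-tune detector head, shared conv frozen", {"shared_conv": 0.0, "det_head": BASE_LR}),
]


def sgd_step(params, grads, lrs):
    """Plain SGD on lists of scalar weights; a missing or zero learning rate freezes a group."""
    return {
        name: [w - lrs.get(name, 0.0) * g for w, g in zip(weights, grads[name])]
        for name, weights in params.items()
    }


params = {"shared_conv": [0.5, -0.2], "rpn_head": [0.1], "det_head": [0.3]}
grads = {"shared_conv": [1.0, 1.0], "rpn_head": [1.0], "det_head": [1.0]}

for stage_name, lrs in STAGES:
    params = sgd_step(params, grads, lrs)
    print(stage_name, params["shared_conv"])
```

The printed shared weights change in stages a and b and stay fixed in stages c and d, which is the property that lets the RPN and the detector end up with identical convolutional layers.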
(2) Classification Method. SVM is a machine learning method based on statistical learning theory. By minimizing the structural risk, it minimizes both the empirical risk and the confidence interval, so that good statistical rules can be obtained even when the number of samples is small. Compared with traditional pattern recognition methods, SVM has strong generalization ability and guarantees a global optimum. The core idea of the SVM algorithm is to find the optimal separating hyperplane, the one with the maximum margin between classes, that meets the classification requirements.
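For reference, the standard soft-margin formulation behind the maximum-margin idea is given below. The paper does not write out its exact formulation, so this is the textbook form, with slack variables ξ_i, penalty parameter C, a feature map φ induced by the kernel K, and dual coefficients α_i.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;
  \frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{s.t.} \quad
  y_i \bigl( \mathbf{w}^{\top} \phi(\mathbf{x}_i) + b \bigr) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0, \; i = 1, \dots, N

f(\mathbf{x}) = \operatorname{sign}\Bigl( \sum_{i=1}^{N} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b \Bigr)
```

Minimizing ‖w‖² maximizes the geometric margin 2/‖w‖, while C trades margin width against training errors. The problem is convex, which is why the optimum is global, and α_i is nonzero only for the support vectors.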
In reality, most classification problems are nonlinear, and the NITR path recognition in this paper is also nonlinear. A nonlinear problem can be transformed into a linear problem in a high-dimensional space, in which the optimal or generalized optimal classification surface can be obtained. A kernel function implicitly maps the linearly nonseparable low-dimensional space to a linearly separable high-dimensional space. Common kernel functions are the polynomial, radial basis function (RBF), and sigmoid kernels. In this paper, the RBF kernel is used in the NITR pathway recognition algorithm:

$$K(\mathbf{x}, \mathbf{y}) = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{y} \rVert^{2}}{2\sigma^{2}}\right),$$

where σ is the kernel parameter and x and y are feature vectors. NITR pathway recognition is a multiclass problem: the training samples must be divided into seven categories, namely, NW (label 1), N (label 2), NE (label 3), NW + N (label 4), NE + N (label 5), NW + NE (label 6), and NW + N + NE (label 7). With one binary classifier per category, seven SVM classifiers are needed in total.
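The RBF kernel and the one-vs-rest decision over the seven classes can be sketched in NumPy as below. It assumes the common form K(x, y) = exp(−‖x − y‖²/(2σ²)) and a one-vs-rest scheme in which classifier k separates label k from all others and the label with the highest decision score wins; `rbf_kernel` and `one_vs_rest_predict` are illustrative names.

```python
import numpy as np


def rbf_kernel(X, Y, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||X[i] - Y[j]||^2 / (2 * sigma^2))."""
    X, Y = np.atleast_2d(X).astype(float), np.atleast_2d(Y).astype(float)
    # Squared distances via ||x||^2 + ||y||^2 - 2 x.y, clipped to avoid tiny negatives.
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def one_vs_rest_predict(scores):
    """scores: (n_samples, 7) decision values, column k from the classifier for label k + 1."""
    return np.argmax(np.asarray(scores), axis=1) + 1


K = rbf_kernel([[0.0, 0.0]], [[0.0, 0.0], [1.0, 1.0]], sigma=1.0)
print(K)  # first entry is 1, second equals exp(-1)

scores = [[-0.9, -0.4, 0.7, -1.2, -0.3, -0.8, -1.0]]
print(one_vs_rest_predict(scores))  # [3], i.e. NE
```

A larger σ makes the kernel smoother (distant samples still look similar), while a smaller σ makes the decision boundary more local; in practice σ is tuned alongside C.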
In practice, the SVM is trained and used for classification in the following steps: (1) features are extracted from the images to be classified; (2) the feature vectors are normalized by a simple linear method so that features with large ranges do not dominate and features with small ranges are not ignored; (3) the RBF kernel is selected as the kernel function; (4) the parameter C is selected by cross-validation; (5) the SVM classification model is trained on the training set with the optimal parameters; (6) the trained SVM classifies the output feature vectors: the output feature matrix is multiplied by the SVM weight matrix to obtain the score of each region proposal box, which gives the NITR path type within that box.
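Steps (2) to (5) map naturally onto scikit-learn, used here only as an illustration, since the paper does not name its SVM library. The sketch assumes min-max scaling as the "simple linear" normalization, wraps an RBF `SVC` in `OneVsRestClassifier` to obtain one binary classifier per path type, and selects C by 5-fold cross-validation. Synthetic blobs stand in for the CNN feature vectors, so the printed numbers say nothing about NITR data.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def build_path_classifier(features, labels):
    """Min-max scaling + one-vs-rest RBF SVMs, with C chosen by 5-fold cross-validation."""
    pipe = make_pipeline(MinMaxScaler(), OneVsRestClassifier(SVC(kernel="rbf", gamma="scale")))
    grid = {"onevsrestclassifier__estimator__C": [0.1, 1.0, 10.0, 100.0]}
    search = GridSearchCV(pipe, grid, cv=5)
    search.fit(features, labels)
    return search


# Stand-in data: 350 feature vectors in 16 dimensions, 7 classes labelled 1-7.
X, y = make_blobs(n_samples=350, centers=7, n_features=16, random_state=0)
y = y + 1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

search = build_path_classifier(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
print(len(search.best_estimator_[-1].estimators_))  # one binary SVM per class
```

On its own, `SVC` handles multiclass problems with a one-vs-one scheme (21 classifiers for 7 classes); the `OneVsRestClassifier` wrapper is what yields the seven classifiers described in the text.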
(3) Recognition and Classification Method. As the network training process and SVM classification method above show, the two networks used by Faster R-CNN for recognition share the convolutional layers. The whole recognition process therefore needs only one series of convolution operations, which makes recognition efficient and removes the time-consuming region proposal step. In addition, SVM is used as the final classifier to minimize the empirical risk and the confidence interval, which yields good statistical rules even when the number of samples is small. The structure of the Faster R-CNN and SVM model is shown in Figure 4. First, the proposed model extracts features from the dataset with the convolutional layers. Second, the feature map from the convolutional layers enters the RPN, which generates a large number of region proposal boxes on the feature map; at each position of the feature map, nine candidate windows with fixed scales and aspect ratios are considered. Third, non-maximum suppression is applied to the RPN-generated proposals, and the 200 highest-scoring boxes are retained. Fourth, the Faster R-CNN recognition network extracts feature vectors from the image within the proposal boxes, passes them through the fully connected layers, and inputs them into the SVM classifier to compute the score of each proposal box. Finally, the Faster R-CNN recognition network refines the proposal boxes by regression.
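The third step, non-maximum suppression followed by keeping the 200 best proposals, can be sketched in NumPy as follows. The IoU threshold of 0.7 is the value used for RPN proposals in the original Faster R-CNN paper and is an assumption here, since the paper does not state its threshold.

```python
import numpy as np


def nms(boxes, scores, iou_thresh=0.7, top_k=200):
    """Greedy non-maximum suppression; returns indices of at most top_k kept boxes."""
    boxes = np.asarray(boxes, dtype=float)
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = np.argsort(np.asarray(scores, dtype=float))[::-1]  # highest score first

    keep = []
    while order.size > 0 and len(keep) < top_k:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Overlap of the current best box with every remaining box.
        w = np.clip(np.minimum(x2[best], x2[rest]) - np.maximum(x1[best], x1[rest]), 0, None)
        h = np.clip(np.minimum(y2[best], y2[rest]) - np.maximum(y1[best], y1[rest]), 0, None)
        inter = w * h
        overlap = inter / (areas[best] + areas[rest] - inter)
        order = rest[overlap <= iou_thresh]  # drop boxes that overlap the kept one too much
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))           # [0, 2]
print(nms(boxes, scores, top_k=1))  # [0]
```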

Experiments and Analysis
To examine the performance and effectiveness of NITR identification based on the Faster R-CNN model, the deep learning experiments are implemented in Python 3.7.
The experiments run on an Intel Core i5 CPU at 2.30 GHz with 8 GB of memory under 64-bit Windows 10. The proposed hybrid Faster R-CNN model uses the following default parameter settings: 64 filters in the first convolutional layer, a filter size of 3 in the first convolutional layer, a pooling size of 2, and a dropout rate of 0.46. The experimental process of the Faster R-CNN model includes data acquisition, data preprocessing, feature importance assessment, model training, model testing, and model evaluation. Data preprocessing includes data normalization, training set partition, test set construction, and time series construction. After preprocessing, the training data are used to build the model: the network weights are adjusted by the optimization function to minimize the loss function until the number of iterations reaches the set value. The trained model is then applied to the test set, and its performance is measured by the average precision (AP), the mean average precision (mAP), and other evaluation indicators.

Dataset.
How to describe the climatic characteristics of NITR events, including cooling amplitude and coverage, is an important issue in NITR path identification. In a national standard published by the China Meteorological Administration, the change in daily minimum temperature is chosen to reflect the intensity of heavy CA or cold wave processes. Thus, the time series of daily minimum temperature at 1995 national conventional stations is selected here as the original dataset. Table 1 shows the classification of station ITR according to the Chinese national standard GB/T 20484-2017. Based on this standard, the station data are gridded by linear interpolation and rendered as images; 497 NITR processes from 1961 to 2018 are obtained, with a total of 3434 target images. Each image is 800 × 800 pixels and stored in JPG format.
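Gridding scattered station values by linear interpolation can be done with SciPy's `griddata`, which interpolates linearly on a Delaunay triangulation of the stations. A minimal sketch, assuming stations are given by longitude and latitude and that grid cells outside the stations' convex hull are left as NaN; the helper name and the example coordinates are hypothetical, and rendering the grid to an 800 × 800 JPG (for example with Matplotlib) is omitted.

```python
import numpy as np
from scipy.interpolate import griddata


def grid_station_values(lons, lats, values, lon_grid, lat_grid):
    """Linearly interpolate station values onto a regular lat/lon grid (rows = latitude)."""
    points = np.column_stack([lons, lats])
    glon, glat = np.meshgrid(lon_grid, lat_grid)
    return griddata(points, values, (glon, glat), method="linear")


# Five hypothetical stations carrying a temperature drop that varies linearly in space.
lons = np.array([100.0, 110.0, 100.0, 110.0, 105.0])
lats = np.array([30.0, 30.0, 40.0, 40.0, 35.0])
drops = 6.0 + 0.3 * (lons - 100.0) + 0.5 * (lats - 30.0)

lon_grid = np.array([101.0, 105.0, 109.0])
lat_grid = np.array([31.0, 35.0, 39.0])
grid = grid_station_values(lons, lats, drops, lon_grid, lat_grid)
print(grid.shape)  # (3, 3)
```

Because the example field is linear, piecewise-linear interpolation reproduces it exactly inside the stations' hull, which makes the sketch easy to check.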
After strict selection, 2800 suitable images remain; they are annotated with the LabelImg tool, and the location of each NITR is recorded. According to the preassigned types of NITR paths, the dataset is divided into seven kinds of NITR processes: NW, N, NE, NW + N, NE + N, NW + NE, and NW + N + NE, as shown in Figure 5. The dataset is split in the ratio 8:1:1 into a training set (80%), a validation set (10%), and a test set (10%). These three subsets are mutually exclusive and are used for training, parameter optimization, and performance evaluation of the target detection model, respectively.

Figure 3: The typical structure of ZFNet.

Complexity
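The 8:1:1 split can be reproduced with a seeded shuffle; the file names and seed below are placeholders. With 2800 images it yields 2240, 280, and 280 images, matching the 2240 training images reported for network training.

```python
import random


def split_dataset(items, ratios=(0.8, 0.1, 0.1), seed=42):
    """Shuffle items reproducibly and split them into train/validation/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * ratios[0])
    n_val = round(len(items) * ratios[1])
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


images = [f"nitr_{i:04d}.jpg" for i in range(2800)]  # hypothetical file names
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 2240 280 280
```

Since 497 processes produced 3434 images, several images come from the same NITR process; splitting by process rather than by image would keep closely related frames out of the test set. The text does not say which the authors did.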

Model Training.
To improve the training speed and convergence of the target model, the ImageNet dataset is first preprocessed by converting the training and validation data into TFRecord format, and training is then started from the TFRecord data files. The critical pretraining parameters are as follows: the batch size is 64; images are scaled to 224 × 224 pixels; there are 85 training epochs of 10000 iterations each, for a total of 850000 iterations; the momentum factor is 0.9; the weight decay coefficient is 0.0001; and the initial learning rate is 0.01. The learning rate decays according to a piecewise constant schedule to a final value of 0.00001, and stochastic gradient descent is used to train the target detection model. During network training on the 2240 images of the NITR training set, the momentum factor is 0.9, the weight decay coefficient is 0.0005, the initial learning rate is 0.0001, the learning rate decay rate is 0.9, and the total number of iterations is 6000.
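The pretraining optimizer can be sketched as a piecewise-constant learning-rate schedule plus an SGD-with-momentum update that includes weight decay. Only the start and end rates (0.01 and 0.00001), the 850000 total iterations, the momentum of 0.9, and the weight decay of 0.0001 come from the text; the decay boundaries and the factor-of-10 steps are hypothetical.

```python
def piecewise_constant_lr(step, boundaries=(300_000, 600_000, 800_000),
                          values=(1e-2, 1e-3, 1e-4, 1e-5)):
    """Learning rate for a given iteration; values[i] applies until boundaries[i]."""
    for boundary, value in zip(boundaries, values):
        if step < boundary:
            return value
    return values[-1]


def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9, weight_decay=1e-4):
    """One SGD update with momentum and L2 weight decay for a scalar weight."""
    grad = grad + weight_decay * w  # weight decay adds lambda * w to the gradient
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity


for step in (0, 299_999, 300_000, 849_999):
    print(step, piecewise_constant_lr(step))

w, v = 1.0, 0.0
w, v = sgd_momentum_step(w, grad=0.5, velocity=v, lr=piecewise_constant_lr(0))
print(w, v)
```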

Evaluation.
To objectively evaluate the generalization ability of the NITR path type recognition model, AP and mAP are used to measure the deviation between observed and predicted values. In this application, NITR events are positive samples and the corresponding backgrounds are negative samples. Precision (P) is the ratio of the number of strong cooling paths correctly detected by the model to the total number of predicted strong cooling paths; it measures the ability to recognize positive samples. Recall (R) is the ratio of the number of correctly identified strong cooling paths of a given type in the test set to the total number of such paths; it measures the coverage of positive samples. AP combines precision and recall: it is the area between the precision-recall curve and the coordinate axes and measures the recognition effect of the model, with larger values indicating better recognition of strong cooling paths. mAP, the mean of AP over all categories, reflects the overall accuracy of multicategory strong cooling path identification; again, larger values indicate higher accuracy.
(1) AP shows classifier performance more intuitively and is defined as

$$\mathrm{AP} = \int_{0}^{1} p(r)\,\mathrm{d}r,$$

where p(r) is precision as a function of recall r. The area between this curve and the coordinate axis is the average precision.
(2) Precision and recall are calculated as

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN},$$

where P is the precision and R is the recall. TP is the number of true positives: samples that are positive in the observations and are also judged positive by the recognition model, so the prediction is correct. FP is the number of false positives: samples that are actually negative but are judged positive by the model, so the prediction is wrong. FN is the number of false negatives: samples that are actually positive but are judged negative by the model, so they are mistakenly missed.

(3) mAP is defined as

$$\mathrm{mAP} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{AP}_i,$$

where mAP is the mean average precision, AP_i is the average precision of the i-th target type, and n is the number of target types to be detected. There are 7 detection targets in this paper.
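The three metrics can be computed as below. AP is evaluated with the all-point interpolation used in PASCAL VOC (precision is made non-increasing, then summed over recall steps), which is one common way to approximate the integral of p(r); the paper does not say which approximation it uses. The inputs for one class are detection confidences, a flag marking each detection as a true positive, and the number of ground-truth paths of that class.

```python
import numpy as np


def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP) and R = TP / (TP + FN), returning 0 when a denominator is 0."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r


def average_precision(scores, is_tp, n_positives):
    """All-point interpolated AP for one class."""
    order = np.argsort(np.asarray(scores, dtype=float))[::-1]  # most confident first
    hits = np.asarray(is_tp, dtype=float)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(1.0 - hits)
    recall = tp / n_positives
    precision = tp / (tp + fp)

    # Pad the curve, then make precision non-increasing from right to left.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum rectangle areas wherever recall increases.
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))


def mean_average_precision(ap_per_class):
    """mAP = (1/n) * sum of AP_i over the n target types."""
    return sum(ap_per_class) / len(ap_per_class)


print(precision_recall(tp=8, fp=2, fn=2))  # (0.8, 0.8)
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0], 2))  # 5/6
```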

Performance and Analysis.
Faster R-CNN and R-FCN models are trained on the same training samples, and the performance of the trained models is compared with that of the new model proposed in this paper. Table 2 compares the performance of the three models. In terms of accuracy, the average accuracy of our …

In many NITR events, both mixed and single NITR paths appear, which makes the identification of severe cooling paths difficult. To test recognition accuracy for multiple intense cooling routes in different directions and for incompletely developed processes, 160 samples are randomly selected for each condition and input into Faster R-CNN, R-FCN, and our model, respectively, for recognition testing.
The results are shown in Table 3. The average precision of our model in recognizing mixed NITR paths and single NITR paths in different directions is 86.3% and 87.6%, respectively, and the average over the two cases is 86.95%. All three indices are higher than those of the Faster R-CNN and R-FCN models. This is because our model uses convolution kernels of different sizes and therefore has a strong multiscale feature extraction ability, and the FPN (feature pyramid network) unit fuses feature information at different scales to strengthen the representation of target characteristics. Consequently, strong cooling paths in different directions and incompletely developed processes can be identified effectively even when semantic information is lost on the feature map. Figures 6 and 7 show the recognition results on single and mixed NITR paths under moderately strong cooling conditions: Figure 6 shows an example of a single NITR path, and Figure 7 shows the performance of all models on mixed NITR paths. Our model accurately detects the types of NITR paths in these different situations.

Conclusion and Future Work
Building on the development of deep learning, this paper proposes an improved method for recognizing and classifying national ITR paths in China based on the Faster R-CNN and SVM model in complicated meteorological systems; the method offers fast computation and high recognition accuracy and improves the recognition performance for NITR paths. First, quality control of the original dataset of strong cooling processes is carried out by data filtering. Then, based on the Chinese national standard GB/T 20484-2017, the station data are gridded by linear interpolation and rendered as images, yielding 497 NITR processes from 1961 to 2018. Because the regularization parameters of the Softmax classifier make the computed probabilities only approximate, SVM is used for path classification; it obtains better results when the number of samples is small, guarantees a global optimum, and improves the reliability of classification. The experimental results show that, compared with the other models, the Faster R-CNN and SVM model needs only 85 MB of storage and recognizes 11.1 f/s, which effectively reduces the network scale and significantly improves recognition speed. In addition, the mAP of the new model is 86.5%, which is 1.7% and 1.4% higher than those of Faster R-CNN and R-FCN, respectively. The model also generalizes well to both mixed and single NITR paths. Therefore, the improved Faster R-CNN model provides a new method for NITR path recognition in meteorological applications.
In the future, with the development of deep learning and cloud computing, we will study methods for migrating the computing tasks of the model in this paper to edge devices, drawing on mobile edge computing [44], privacy-aware deployment of machine learning applications [45], and dynamic resource allocation [46]. We will then try to use the idle computing power of edge devices to relieve the computing pressure on cloud servers and improve computing efficiency.

Data Availability
The data and models used during the study are available from the corresponding author upon request.