A Novel Method for Regional NO2 Concentration Prediction Using Discrete Wavelet Transform and an LSTM Network

Accurate prediction of urban NO2 concentration is essential for effective control of air pollution. This paper takes the NO2 concentration in Tianjin as the research object and constructs a prediction model based on the Discrete Wavelet Transform and a Long Short-Term Memory network (DWT-LSTM) for predicting the daily average NO2 concentration. Five major atmospheric pollutants, key meteorological data, and historical NO2 data were selected as the input indexes to predict the NO2 concentration of the next day. First, the input data were decomposed by the Discrete Wavelet Transform to increase the data dimension. Next, the LSTM network model was used to learn the features of the decomposed data. Finally, Support Vector Regression (SVR), the Gated Recurrent Unit (GRU), and a single LSTM model were selected as comparison models, and the performance of each was evaluated by the Mean Absolute Percentage Error (MAPE). The results show that the DWT-LSTM model constructed in this paper can improve the accuracy and generalization ability of data mining by decomposing the input data into multiple components. Compared with the other three methods, the model structure is more suitable for predicting NO2 concentration in Tianjin.


Introduction
With the development of urbanization and industrialization, per capita energy consumption increases year by year. In addition to natural sources such as dust storms, bush fires, and volcanic eruptions, increased nitrogen dioxide (NO2) from vehicle exhaust and boiler exhaust has become one of the major environmental problems facing most countries in the world [1]. In China, the number of cities meeting air quality standards and the average number of days with good air quality across 338 cities have both increased year by year recently. However, in the Beijing-Tianjin-Hebei region and surrounding areas, the concentrations of the six air pollutants (PM2.5, PM10, O3, SO2, NO2, and CO), with the exception of ozone (O3), decreased the least. The same is true for Tianjin. In addition, the annual average concentration of NO2 in Tianjin in 2018 was 47 micrograms per cubic meter, exceeding the national annual average concentration standard (40 micrograms per cubic meter). NO2 has become the most important pollutant affecting the air quality of Tianjin. All of the data above were obtained from the website http://www.mee.gov.cn/hjzl/zghjzkgb/lnzghjzkgb/.
In fact, nitrogen dioxide dissolves in water in the air to form acids, which may lead to acid rain, and reacts under ultraviolet radiation to form photochemical smog. In addition, human exposure to NO2 at different concentrations may lead to lung function damage of different degrees, seriously affecting industrial production and social activities [2]. Therefore, the effective detection and accurate prediction of NO2 and the establishment of a highly accurate and stable prediction model can provide an early warning of air pollution emergencies and guide the release of NO2 control measures and public health protection work. The existing research on the prediction of atmospheric pollutant concentration can be roughly divided into three categories. The first category is the deterministic method based on the physical and chemical change model of the atmosphere [3][4][5].
The second category is the use of computational methods based on regression and neural networks. The third category is the optimal combination model based on the second category. Deterministic approaches do not need a large amount of historical data, but they require complete knowledge of pollution sources, timely emission quantities, the main chemical reactions of the gaseous pollutants, and spatiotemporal physical transformation processes. The second and third types of computational methods usually require a large amount of historical measurement data under various meteorological conditions. Wang et al. [6] used the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) to study a serious pollution incident in Beijing in December 2016. The accompanying sensitivity analysis in that paper could capture the influence of emission sources on the concentration of target pollutants in different regions and different time periods, which provided a good reference for formulating effective emission reduction measures and regional air pollution prevention and control. However, due to the limited resolution of existing models and other issues, the findings for the pollution event studied cannot readily be transferred to pollution events of other seasons and types. Baykara et al. [7] applied CMAQ (version 5.2), based on the heating activity data of local residents in Istanbul, to explore the influence of emissions from the residential heating sector on the level of environmental particulate matter. They concluded that winter was the time when the residential heating sector mainly affected regional air quality. A possible reason for this was the increase in coal burning, which produces sulfur dioxide emissions but not other man-made emissions.
The data-driven model mainly conducts a statistical analysis of air quality data and related factors to obtain scientific conclusions. In terms of linear correlation analysis, scholars have put forward a series of methods, such as geographically weighted regression (GWR), geographically and temporally weighted regression (GTWR), and land use regression (LUR). Hinojosa-Baliño et al. [8] used meteorological, demographic, geographical, and social data and combined a geographic information system (GIS) with LUR to generate the prediction model and spatial distribution of PM2.5 air pollution. Alahmadi et al. [9] used a local GWR model in the GIS environment to describe and quantify the contribution of transportation sector emissions to the NO2 concentration in the Red Sea, but the limitation is the unavailability of some data. Further, Mirzaei et al. [10] used the GTWR model to study the spatial-temporal variability between PM2.5 concentration at ground monitoring stations and satellite aerosol optical depth (AOD) data. However, in the warm season, there were defects in the retrieval algorithm when detecting low values of particulate matter, which led to a decrease in the prediction accuracy of the model and thus affected the simulation output. The application of machine learning algorithms, such as random forests, support vector machines (SVMs), and artificial neural networks (ANNs), takes the nonlinear relationship into consideration and improves the accuracy of prediction [11][12][13]. Masih [14] employed an integrated data mining tool that used random forests to predict the concentration of nitrogen dioxide in the atmosphere, taking the emission inventory and meteorological parameter monitoring data set as input prediction factors, and compared it with M5P and SVM, demonstrating the superiority of the model. Liu et al. [15] used support vector regression (SVR) to make a collaborative prediction of the Chinese urban air quality index (AQI).
Experiments showed that when there was a strong interaction and correlation between air quality characteristic attributes and the air quality index, the MAPE (Mean Absolute Percentage Error) value of the multicity multidimensional regression model decreased. Cabaneros et al. [16] applied a hybrid artificial neural network to the prediction of urban road NO2. Mishra and Goyal [17] developed an NO2 concentration prediction model based on an artificial intelligence neuro-fuzzy model. However, these prediction models cannot capture both long-term and short-term characteristics, so Long Short-Term Memory (LSTM) is often used to predict air quality and pollutant concentration with time series characteristics. Li et al. [18] used LSTM layers to automatically extract inherent useful features of atmospheric pollutant data to predict the PM2.5 concentration in Beijing in the next hour. Following this, Reddy et al. [19] extended the prediction from a single time step to the next 5 to 10 hours based on the time series data of pollution and meteorological information of Beijing.
However, the above models may suffer from insufficient accuracy due to potential convergence to local minima and overfitting [20]. In recent years, with the development of artificial intelligence and big data analysis, hybrid methods based on various information processing methods and deep learning methods have been widely used [21][22][23][24][25][26]. Kordestani and Samadi used a distributed neural network and a Bayesian algorithm to predict the remaining service life of Multifunctional Spoilers (MFS). Taking LJ2000 series fighter data as samples, they evaluated the hybrid prediction method by relative accuracy and found that the prediction effect of the distributed structure was better [27]. Rezamand et al. constructed a hybrid prediction method based on real-time Supervisory Control and Data Acquisition (SCADA) and vibration signals to predict the Remaining Useful Life (RUL) of wind turbine bearings, made an empirical analysis of the hybrid model, and concluded that the prediction accuracy of this method was higher than that of the Bayesian algorithm [28]. Chen, Zhang, and Vachtsevanos proposed a prediction method for machine health condition based on Neural-Fuzzy Systems (NFSs) and a Bayesian algorithm. Two examples, a cracked bearing plate and a faulty bearing, were used to verify the effectiveness of the hybrid prediction method. The experimental results show that the hybrid method can effectively predict the running condition of the machine [29]. Bai et al. [30] proposed a long short-term memory neural network (E-LSTM) to predict hourly PM2.5 concentration, adding empirical mode decomposition (EMD) to the LSTM foundation and effectively improving the prediction accuracy. Zhao et al. [31] proposed a data-driven model called the LSTM-FC neural network, which uses historical air quality data, meteorological data, and weather forecast data to predict PM2.5 pollution over 48 hours at a particular air quality monitoring station.
Other researchers, such as Pak et al. [32], proposed that a hybrid model combining a convolutional neural network (CNN) with long short-term memory (CNN-LSTM) has better seasonal stability and prediction performance than the single LSTM model. Wu and Lin [33] developed a hybrid model called VMD-SE-LSTM, which applies the VMD (Variational Mode Decomposition) technique to decompose the AQI data, employs SE (Sample Entropy) to recombine these components, and trains each recombined subsequence with an LSTM neural network. This model not only improves the precision but also has good generalization ability.
Although data decomposition and deep learning models have been combined in existing studies, the combination of the wavelet transform and a deep learning method has not been applied to the prediction of air pollutant concentration. The novelty of this study is to take Tianjin as an example and build a DWT-LSTM combined model to predict the future NO2 concentration of the city. In addition, an empirical analysis of the NO2 concentration in Tianjin shows that the prediction effect of the DWT-LSTM model constructed in this paper is better than that of the SVR, GRU, and single LSTM models. At the same time, the prediction results can provide early warning for air pollution emergencies in Tianjin and guide the introduction of NO2 control measures and public health protection.
The above research is of some use for the prediction of NO2 concentration in the regional atmosphere. In this study, a data processing method based on discrete wavelet decomposition combined with an LSTM deep learning algorithm achieves relatively accurate prediction of NO2 concentration in Tianjin. The main contributions of this study are as follows: (1) the use of wavelet decomposition to carry out dimensional processing of data, optimize input variables, and improve the prediction accuracy of the LSTM model; (2) the development of the DWT-LSTM prediction model; (3) a comparison of the traditional LSTM prediction results and the wavelet-LSTM results with the actual data, verifying that wavelet decomposition of the data can improve the prediction accuracy and stability of the LSTM model.

Research Area and Data
The geographical area of this study is Tianjin, China, located in the North China Plain (117°10′E, 39°10′N; Figure 1). As of July 1, 2019, the Tianjin environmental air quality monitoring network has been established, covering the central urban area, the four districts around the city, the Binhai New Area, and other districts. Each point monitors six regular air pollutants: PM10, PM2.5, SO2, NO2, CO, and O3. Since 2013, the Tianjin environmental air quality GIS platform has been used to release environmental air quality information for all monitoring points in Tianjin to the public. There are now 16 national control stations and 11 municipal control stations, for a total of 27 monitoring stations.
We collected the daily average data of PM2.5, PM10, NO2, SO2, O3, and CO from January 1, 2014, to June 30, 2019 (2007 days) in Tianjin and filled the missing values of each air pollutant with the most recent available data. In addition, we downloaded meteorological observation data from the Chinese meteorological website platform established by the China Meteorological Administration (CMA), including wind speed, temperature, and weather conditions. The output variable of the experiment is the daily average concentration of NO2 in Tianjin, which is shown in Figure 2. In Figure 2(e), it can be seen that the NO2 concentration exhibits obvious periodicity, namely, high concentrations in winter and low concentrations in summer. Figures 2(a)-2(d) show the concentrations of PM2.5, PM10, SO2, and CO, respectively, over the same period. It can be seen that these four pollutants and NO2 exhibit the same periodic variation. Figure 2(f) shows the O3 concentration, where it can be seen that O3 has periodic changes opposite to those of the other five pollutants. This may be because low-altitude O3 forms readily and tends to exceed the standard in high-temperature seasons [34]. The wind force was quantified according to the method of Bai et al. [35]. We graded the weather conditions according to how good or bad they were and quantified the weather indicators. The data (five pollutants, temperature, weather, wind data, and historical NO2) from January 2, 2014, to May 26, 2018, were used for training. The data from May 27, 2018, to June 30, 2019, were used for testing, also in conjunction with the NO2 historical data.
Statistical descriptions of the six pollutants are given in Table 1, where O3 is the 8-hour average concentration.

Long Short-Term Memory.
The LSTM neural network is a popular recurrent neural network algorithm, which was first proposed by Hochreiter and Schmidhuber to improve the memory of long (static) and short (cyclic) dynamic features of time series [36]. Similar to the traditional recurrent neural network model, this approach models time data by mining the recurrent connections between neurons and the internal connections between time series data. However, unlike traditional recurrent neural network models, it has a unique neuron structure called a "memory unit." The hidden layer of an LSTM network constructed by this approach can store time information of any length to obtain a more accurate time series model. The memory unit structure of the LSTM network is shown in Figure 3 [37]. Fixed-length windows of the time series are generated and input into the LSTM network. Multiple LSTMs can be stacked to learn more complex patterns of sequential information [38]. The memory module consists of an input gate, a forget gate, an output gate, and a loop unit. Its core idea is to control the switch of each gate by a nonlinear function, to protect and control the state of the memory unit, and so to control the increase or decrease of information [39]. Therefore, the key of an LSTM network is to store data information through the state of the storage unit for a long time. In general, the output value of each of the three gates lies between 0 and 1, and the sigmoid function is used to determine how much information can be input to the memory location. In the main formulas (equations (1)-(6)), ∘ represents the Hadamard product and tanh is used as the activation function. x_t, c_t, and h_t are the input, storage unit state, and output of the LSTM at time t, respectively, while i_t, o_t, and f_t are the function values of the input gate, output gate, and forget gate, respectively. c′_t is the input modulation gate, which determines how much new information can be received.
σ(·) is the sigmoid function; R, W, and U are weight matrices; and the subscripts i, f, c, and o denote the input gate, the forget gate, the cell state, and the output gate, respectively. The superscript of a weight matrix W indicates the two variables connected by the matrix. For example, W_i is the weight matrix between the input and the input gate, and b is the bias of the gate.
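The gate mechanism described above can be illustrated with a minimal sketch of one LSTM time step. It assumes the standard LSTM formulation (sigmoid gates, a tanh candidate state, and Hadamard products), which may differ in notation from equations (1)-(6); the names `lstm_step`, `gate`, and `affine`, and the convention that each gate owns an input weight matrix `W`, a recurrent weight matrix `U`, and a bias `b`, are illustrative assumptions rather than the authors' implementation.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h):
    """Compute W x + U h + b for matrices stored as lists of rows."""
    return [
        sum(w * v for w, v in zip(w_row, x))
        + sum(u * v for u, v in zip(u_row, h))
        + bias
        for w_row, u_row, bias in zip(W, U, b)
    ]


def gate(params, name, x, h, activation):
    """Apply one gate: activation(W x + U h + b) with that gate's weights."""
    W, U, b = params[name]
    return [activation(z) for z in affine(W, U, b, x, h)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step of a standard LSTM cell (assumed formulation).

    params maps "i", "f", "o", "c" to a (W, U, b) triple.
    """
    i_t = gate(params, "i", x_t, h_prev, sigmoid)       # input gate
    f_t = gate(params, "f", x_t, h_prev, sigmoid)       # forget gate
    o_t = gate(params, "o", x_t, h_prev, sigmoid)       # output gate
    c_cand = gate(params, "c", x_t, h_prev, math.tanh)  # candidate state c'_t
    # c_t = f_t ∘ c_{t-1} + i_t ∘ c'_t
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]
    # h_t = o_t ∘ tanh(c_t)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate state is 0, so the new cell state is simply half of the previous one. This makes the role of the forget gate easy to check by hand.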

Discrete Wavelet Transform.
Because the concentration of NO2 in the atmosphere is related to the other five major pollutants, wind force, weather conditions, temperature difference, and other factors, the daily average NO2 concentration series is nonstationary, volatile, and time-ordered. These factors have different effects on NO2 concentration. The input signal contains various frequency components, and the low-frequency and high-frequency components contribute differently to the dynamic characteristics of the data. If the components of these different frequencies can be learned by independent LSTMs, the performance of data mining will improve. Therefore, the divide-and-conquer strategy requires that the original data be decomposed into low-frequency and high-frequency signals through an appropriate decomposition algorithm. In this paper, a discrete wavelet transform is used to decompose the original input data. It uses the time scale function to analyze the data, which gives the wavelet transform multiscale resolution and time-shift characteristics. The scaling operation makes it possible to observe signals at different scales. Therefore, a wavelet transform is very suitable for dealing with nonstationary time series, including air pollutant data.
Figure 3: Structure of an LSTM neural network.
Assuming that x(t) is square integrable, x(t) can be expanded under the wavelet basis function. This operation is called the continuous wavelet transform of x(t). The wavelet basis function is defined in equation (7), and the continuous wavelet transform of x(t) is given in equation (8). In equations (7) and (8), ψ(t) can be considered as the mother wavelet function, a is the scale parameter, and b is the time center parameter. When a and b change continuously, the whole transformation process is called a continuous wavelet transform. However, in practical applications, continuous transformation greatly increases the computational complexity, application cost, and implementation difficulty, so it is usually replaced by a small-step discrete wavelet transform (DWT) [40].
A discrete wavelet transform makes the application of a wavelet transform easy to realize. The exponential discretization of parameters a and b reduces the computational complexity and avoids the information redundancy brought by a continuous wavelet transform.
The discrete wavelet transform of x(t) is defined in equation (9), in which a and b are discrete, with $a = a_0^{j}$ and $b = k a_0^{j} b_0$. The discrete wavelet transform is implemented with the Mallat algorithm proposed in 1988 [41], which is in effect a signal decomposition method. For the multiresolution characteristics of wavelets, the variable j is used to determine the resolution at different scales. Specifically, the main outline of the original signal is observed on a large scale, and the detailed information of the original signal is observed on a small scale. Finally, as j gradually increases, the decomposition yields one approximation signal a_n (i.e., the low-frequency component) and n detail signals (i.e., the high-frequency components) d_n, d_{n−1}, d_{n−2}, ..., d_1, where n needs to be set manually [42]. The original signal and the two kinds of subsignals satisfy the following formula:
$$x(t) = a_n + d_n + d_{n-1} + \cdots + d_1.$$
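For reference, the standard textbook forms of the wavelet basis function, the continuous wavelet transform, and the discrete wavelet transform are written out below. These are assumed to correspond to equations (7)-(9); the symbols $a_0$, $b_0$, $j$, and $k$ follow the usual exponential discretization and are an assumption, as is the notation $W_x$ for the transform.

```latex
% Wavelet basis function generated from the mother wavelet \psi(t)
\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}}\,\psi\!\left(\frac{t-b}{a}\right),
\qquad a, b \in \mathbb{R},\ a \neq 0

% Continuous wavelet transform of a square-integrable signal x(t)
W_x(a,b) = \frac{1}{\sqrt{|a|}} \int_{-\infty}^{+\infty}
           x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)\mathrm{d}t

% Discrete wavelet transform with a = a_0^{j}, b = k a_0^{j} b_0
W_x(j,k) = a_0^{-j/2} \int_{-\infty}^{+\infty}
           x(t)\,\psi^{*}\!\left(a_0^{-j}t - k b_0\right)\mathrm{d}t,
\qquad j, k \in \mathbb{Z}
```

Substituting $a = a_0^{j}$ and $b = k a_0^{j} b_0$ into the continuous form gives $(t-b)/a = a_0^{-j}t - k b_0$, which is how the third expression follows from the second. The common dyadic choice is $a_0 = 2$ and $b_0 = 1$.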
The schematic diagram of the discrete wavelet transform decomposition is shown in Figure 4.
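The additive decomposition into one approximation signal and n detail signals can be checked numerically with a small sketch. It uses the Haar wavelet, the simplest member of the Daubechies family, as a stand-in because the order of the Daubechies wavelet used in the study is not stated; the function names `decompose` and `component` are hypothetical, and the signal length is assumed to be divisible by 2 to the power of the number of levels.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(signal):
    """One Mallat analysis step: pairwise sums (low-pass) and differences (high-pass)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse_step(approx, detail):
    """Invert one analysis step, doubling the length of the signal."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def decompose(signal, levels):
    """Return coefficient lists ordered as [a_n, d_n, d_{n-1}, ..., d_1]."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx, details = list(signal), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return [approx] + details[::-1]


def component(coeffs, index):
    """Rebuild the full-length sub-signal that comes from coeffs[index] alone."""
    kept = [c if i == index else [0.0] * len(c) for i, c in enumerate(coeffs)]
    current = kept[0]
    for detail in kept[1:]:
        current = haar_inverse_step(current, detail)
    return current


# Three-level decomposition of a toy series of eight values
series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = decompose(series, levels=3)
sub_signals = [component(coeffs, k) for k in range(len(coeffs))]
rebuilt = [sum(values) for values in zip(*sub_signals)]
```

Because the transform is linear, the four full-length sub-signals (one low-frequency and three high-frequency) add back up to the original series, and each of them could be supplied to the network as a separate input channel.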

Overview of DWT-LSTM.
The LSTM neural network model is used to identify data patterns, and wavelet decomposition is used to decompose the input data. The prediction model combining the wavelet transform and the LSTM neural network consists of the following stages:
Step 1: normalize the original data set M_1, M_2, ..., M_n to obtain the experimental data set D_1, D_2, ..., D_n;
Step 2: decompose the set D_1, D_2, ..., D_n through m layers of wavelet decomposition to obtain the high-dimensional input information set;
Step 3: use this input set to train the LSTM model; through repeated data training and data testing, adjust the parameters and obtain the optimal prediction model f(X_i), as shown in Figure 5;
Step 4: obtain the predicted value f(X′_{t+1}) of the concentration of air pollutants in stage t + 1 by applying the prediction model obtained above to the input vector X_{t+1} obtained in stage t + 1.
As the purpose of this study is to predict the daily average concentration of NO2 in Tianjin, the input index includes two kinds of data: pollutant concentrations and meteorological factors. First, the meteorological factor information is quantified for further processing. The quantified data are then integrated with the other numeric data. Abnormal fluctuations in the integrated data, however, would seriously affect the prediction ability.
The consolidated data are normalized using the Min-Max method [43]:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$
where x is an original value, x_min and x_max are the minimum and maximum of the corresponding variable, and x′ is the normalized value. The normalized data are used as the original signal for wavelet decomposition. As shown in Figure 6, one group of low-frequency subsignals and three groups of high-frequency subsignals of each original data series were selected as input data and used for training and validation by LSTM neural networks. The best prediction model is obtained by adjusting the parameters and the structure of the model.
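The normalization step and the construction of fixed-length input windows can be sketched as follows. The helper names `min_max_scale` and `make_windows` are hypothetical, and the window convention (each sample holds `time_step` consecutive days and the target is the value on the following day) is an assumption consistent with next-day prediction, not a documented detail of the study.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def make_windows(features, target, time_step=3):
    """Build supervised samples from daily data.

    features: list of per-day feature vectors (already normalized/decomposed).
    target:   list of daily target values, aligned with `features`.
    Returns (X, y), where X[k] is `time_step` consecutive days of features
    and y[k] is the target on the day immediately after that window.
    """
    X, y = [], []
    for end in range(time_step, len(features)):
        X.append(features[end - time_step:end])
        y.append(target[end])
    return X, y


# Toy usage: five days, one feature per day
scaled = min_max_scale([10.0, 20.0, 30.0])
X, y = make_windows([[0], [1], [2], [3], [4]], [0, 10, 20, 30, 40], time_step=3)
```

In practice the minimum and maximum would normally be taken from the training period only and then reused for the test period, so that no information from the test data leaks into the scaling; the source does not say which convention was followed.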

Model Parameters and Performance Indicator.
The prediction model proposed in this paper was implemented using Python 2.7 and Matlab 2017a in a Linux system environment.
Figure 4: Three-layer wavelet decomposition.
The DWT-LSTM model adopts 3-layer wavelet decomposition and the Daubechies (DB) wavelet basis function. In network parameter setting, the primary parameters of LSTM include learning rate, max epochs, batch size, number of hidden layers, and time step. In the best model, the learning rate is 0.0001, the max epoch is 500, the number of hidden layers is 32, and the time step is 3. The relevant parameters in the model are selected to minimize the mean absolute percentage error (MAPE) [44], which is an important indicator of prediction accuracy in the statistical field and is widely used in the prediction of air pollutant concentrations. The MAPE index was used to measure the error of the prediction algorithm and compare it with other algorithms. It considers not only the error between the predicted value and the true value but also the ratio between the error and the true value [45]. The following equation gives the calculation of MAPE:
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i^{*} - y_i}{y_i^{*}}\right|,$$
where y*_i is the observed NO2 concentration, y_i is the predicted NO2 concentration, and n is the number of detected samples.
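A short sketch of the MAPE calculation may make the role of the error-to-true-value ratio concrete. The function name `mape` is hypothetical, and the observed values are assumed to be nonzero, which holds for daily mean concentrations above zero.

```python
def mape(observed, predicted):
    """Mean absolute percentage error, in percent.

    observed:  true values y*_i (assumed nonzero)
    predicted: model outputs y_i, same length as `observed`
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    ratios = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Two toy days: errors of 5 on 50 and 4 on 40 are both 10% of the true value
example = mape([50.0, 40.0], [45.0, 44.0])
```

Because each error is divided by the observed value, the same absolute error weighs more heavily on low-concentration days than on high-concentration days.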

Data Description.
This paper selects six air pollutants (PM2.5, PM10, NO2, SO2, O3, and CO) and three meteorological observation factors (wind speed, maximum temperature, and minimum temperature) as input indexes. In order to evaluate the accuracy of the NO2 concentration prediction model, the index data from January 1, 2014, to June 30, 2019, with a total of 2007 points, were selected in this paper. The original data sample was divided into two data sets: 80% of the original data (1606 data points) were used as the training sample, and the remaining 20% of the original data (401 data points) were used as the test sample to evaluate the prediction performance of the model.
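The sample counts quoted above can be reproduced with a brief sketch of a chronological 80/20 split. The helper name `chronological_split` and the use of rounding to fix the training size are assumptions; the split keeps the time order, so that every test day follows every training day.

```python
from datetime import date, timedelta


def chronological_split(samples, train_fraction=0.8):
    """Split time-ordered samples into an earlier training part and a later test part."""
    n_train = round(len(samples) * train_fraction)
    return samples[:n_train], samples[n_train:]


# One entry per calendar day of the study period, both end dates included
start, end = date(2014, 1, 1), date(2019, 6, 30)
days = [start + timedelta(days=k) for k in range((end - start).days + 1)]
train_days, test_days = chronological_split(days)
```

A random shuffle before splitting would not be appropriate here, because it would let the model see days that lie in the future of some test samples.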

Results of the Wavelet Transform.
The LSTM method is suitable for time series prediction as it has good prediction performance. In addition, the LSTM model can effectively represent the nonlinear relationship between the input vector and the prediction target through its nonlinear gating and activation functions. Appropriate high-dimensional input vectors can describe the information in features more effectively and accurately and express the meaning of the data. Therefore, the prediction performance depends largely on the choice of input vector in model design. In this study, when the LSTM model is used to predict pollutant concentration, a structural transformation is applied to the input variables to obtain a new set of input variables, in order to make the prediction results more accurate and stable. By using wavelet decomposition, the data are promoted from one-dimensional data to high-dimensional data, which more fully represents the trend of data change and improves the prediction accuracy. In this study, wavelet decomposition is based on the Daubechies wavelet basis function, whose characteristics make it suitable for feature selection. Due to its inherent orthogonality, the Daubechies wavelet is widely applicable and shows good performance in analyzing applied time series data. Using the Matlab tool, the low-frequency approximate information and high-frequency information obtained by wavelet decomposition are taken as a new input vector group of the LSTM model to form a new prediction data set of the six kinds of air pollutants (PM10, PM2.5, NO2, SO2, O3, and CO). The transformation results are provided in Figure 7, which shows the high-frequency information group and the low-frequency information group.
The wavelet decomposition transforms the concentration time series data of the three input characteristic variables to generate high-dimensional input vectors, which effectively increases the amount of information represented by the data and significantly improves the prediction stability of the model.

Result of Prediction.
We compared the performance of the proposed DWT-LSTM model with that of SVR, GRU, and the single LSTM model and trained and tested these models with the same training and test sets used for the DWT-LSTM model. In order to evaluate the effectiveness of this method, we added two indexes: root mean square error (RMSE) and mean absolute error (MAE). These indicators can be expressed as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i^{*} - y_i\right)^{2}}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i^{*} - y_i\right|,$$
where y*_i and y_i are the observed and predicted NO2 concentrations and n is the number of samples. Figure 8 intuitively shows the experimental results of the four prediction models. Through visual analysis, it can be seen that the prediction curve of the SVR model is relatively flat, that it has difficulty predicting the fluctuation of the data accurately, and that it presents a fluctuation trend opposite to the target value in some time periods. GRU, a variant of the LSTM, combines the forget and input gates into a single update gate, merges the cell state and hidden state, and makes some other changes. Although it has better performance in some experiments [46], its performance in this experiment is not as good as that of the single traditional LSTM model; in particular, it cannot predict outliers well. The LSTM model performs well regarding outliers (maxima and minima). For example, the prediction accuracy of the single LSTM model is better than that of the other three models on the two maxima of day 185 and day 206 and the two minima of day 155 and day 259. In order to objectively evaluate the performance of the four models, we calculated the evaluation indexes of the predicted results according to the above formulas, and the results are shown in Table 2. The evaluation results show that the performance of the DWT-LSTM model is better than that of the other three models. Although its performance on outliers is not as good as that of the single LSTM model, its overall prediction accuracy is the highest.
The values of MAE and RMSE can explain the above phenomena. Both are greater for the single LSTM model than for the DWT-LSTM model, which indicates that the predicted values of the LSTM model fluctuate more widely, so they are also more likely to approximate the real value when abnormal values occur. The MAE and RMSE of the DWT-LSTM model, by contrast, are relatively small, so fewer of its predictions lie far from the real values. Its overall accuracy is higher than that of the other models, and it can more effectively predict the change of the concentration of NO2. The model may therefore also be effective for prediction in other areas or for other pollutants and can more effectively guide the prevention and control of air pollution.
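The two additional indicators can be computed with the following sketch. The function names `mae` and `rmse` are hypothetical; the definitions are the usual ones, with the error taken as observed minus predicted.

```python
import math


def mae(observed, predicted):
    """Mean absolute error: the average size of the prediction errors."""
    errors = [abs(o - p) for o, p in zip(observed, predicted)]
    return sum(errors) / len(errors)


def rmse(observed, predicted):
    """Root mean square error: squares each error before averaging."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Two toy days with errors of 3 and -4
obs, pred = [50.0, 40.0], [47.0, 44.0]
toy_mae = mae(obs, pred)    # (3 + 4) / 2
toy_rmse = rmse(obs, pred)  # sqrt((9 + 16) / 2)
```

RMSE is never smaller than MAE, and the gap between them widens when a few predictions miss by a large margin. Comparing the two therefore gives a rough indication of how much of a model's error comes from poorly predicted extreme days.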

Analysis of Influencing Factors.
In order to explore the relationship between various factors and NO2 concentration, we changed the input indexes and conducted a series of experiments: (1) To investigate whether the weather conditions, wind force, and temperature difference are related to the concentration of NO2, we eliminated the meteorological indexes and used only NO2 historical data and the other 5 pollutants as input indicators for fitting. The results showed that the MAPE increased from 11.58% to 13.54%, indicating that the meteorological indexes are correlated with the concentration of air pollutants. Then, on the basis of the above experiment, we successively added the weather condition indicator, temperature difference, and wind force, and the MAPEs were 13.63%, 13.61%, and 11.53%. This indicates that, among the three meteorological factors, wind force has the largest effect on NO2 concentration, while weather condition and temperature difference do not have much effect. (2) In order to explore the relationship between pollutant concentration and NO2 concentration, we first made a forecast with the historical data of NO2 as a single input, and the MAPE was 16.61%. This shows that the historical data alone cannot effectively predict the future NO2 concentration. We therefore explored which major air pollutants have the greatest impact on future NO2. On the basis of the historical data of NO2, we successively add PM2. showed an overall downward trend. The seasonal periodic change rule remained unchanged, and the peak value in winter also showed a downward trend every year. This shows that Tianjin has paid more attention to the ecological environment during the 13th Five-Year Plan period, and its specific work has achieved results.
(1) Since 2011, Tianjin has been working on the development of new energy vehicles and has issued a series of related documents. By 2015, energy-saving and new energy vehicles in the city had increased to a total of more than 60,000. In 2017 and 2018, this increased to 79,000 vehicles. The promotion of new energy vehicles resulted in 36% of the total transport buses being new energy buses, bringing the total that has been put into operation to 3670. By the end of April 2019, the number of new energy vehicles in the city had reached 125,000. As a result, the coal consumption and oil consumption in Tianjin also tended to decline, and the concentration of NO2 in the air decreased accordingly. (2) During this period, Tianjin carried out the optimization and upgrading of traditional industries and forced the closure of a series of enterprises with serious pollution emissions, especially internal combustion engine production enterprises and nonferrous and ferrous metal smelters. On the basis of the traditional manufacturing industry, the industrial structure has been adjusted, focusing on the development of high-end equipment, new-generation information technology, aerospace, new energy vehicles, new materials, biomedicine, new energy, energy conservation and environmental protection, modern petrochemical, modern metallurgy, and ten other industries, while paying attention to the development of the service industry. Therefore, NOx emissions are reduced by reducing emission sources. (3) We can clearly find that the concentration of NO2 in spring and winter is higher than that in summer and autumn in Tianjin. This phenomenon may be caused by the low temperature in spring and winter, which is not conducive to the diffusion of NO2 and leads to an increase in concentration. For Tianjin, this may be related to the cold weather in winter and spring, which requires a large amount of coal burning and some natural gas to heat the city.
Fossil fuels such as coal produce a large amount of nitrogen dioxide, leading to an increase in the total NO 2 content. In addition, although fireworks are banned in Tianjin, fireworks are still set off in rural areas due to insufficient supervision, leading to an increase in the concentration of NO 2 in the city.
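The input-index experiments described earlier are compared by MAPE, and the conclusions also quote MAE and RMSE. As a minimal sketch, the three metrics can be written as below using their standard definitions. The helper `mape_change` and the dictionary layout are hypothetical and only restate the MAPE values quoted in the text relative to the 13.54% configuration without meteorological indexes. The text does not say whether the three meteorological indicators were added one at a time or cumulatively, so the differences should be read as descriptive only.

```python
import math


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent; y_true must not contain zeros."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean Absolute Error, in the units of the data."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the units of the data."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# MAPE values (%) quoted in the text. The key names are illustrative labels,
# not identifiers used by the authors.
BASELINE_NO_METEOROLOGY = 13.54  # NO2 history + other 5 pollutants only
reported_mape = {
    "weather condition": 13.63,
    "temperature difference": 13.61,
    "wind force": 11.53,
}


def mape_change(reported, baseline):
    """Change in MAPE (percentage points) versus the baseline; negative means lower error."""
    return {name: round(value - baseline, 2) for name, value in reported.items()}


changes = mape_change(reported_mape, BASELINE_NO_METEOROLOGY)
# Indicator whose addition is associated with the largest drop in MAPE.
most_helpful = min(changes, key=changes.get)
```

Because MAPE divides by the observed value, it is undefined for zero observations and weights errors on low-concentration days more heavily than MAE or RMSE do, which is one reason to report all three.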

Cause of Prediction Deviation.
In 2012, the Ministry of Environmental Protection and the General Administration of Quality Supervision, Inspection, and Quarantine jointly issued the Ambient Air Quality Standards (GB 3095-2012), which have been implemented nationwide since January 1, 2016. The new standard counts pollutants in a slightly different way from the previous one, so the values are not fully consistent across the statistical range. In addition, the accumulated historical data are insufficient: high-concentration values from heavy pollution days are scarce, and the limited training data bring great uncertainty to the prediction of high values. On the other hand, DWT-LSTM is a data-based statistical model, which mainly relies on the empirical relationship between meteorological parameters and historical monitoring data and does not consider atmospheric chemical transformation, changes in pollution source emissions, or regional transmission processes. NO2 reacts photochemically with O3 and acts as a catalyst in the air to convert O2 into O3. Most of the NO produced by human activities comes from the combustion of fossil fuels in, for example, automobiles, airplanes, internal combustion engines, and industrial kilns. It also comes from the production and use of nitric acid, for example in nitrogenous fertilizer plants, organic intermediate plants, and nonferrous and ferrous metal smelting plants. NO reacts with the oxygen in the atmosphere, generating NO2. Therefore, NO2 concentration is associated with industrial structure adjustment and changes in pollution emissions. At the same time, external sources also contribute significantly to pollutant concentration during heavy pollution periods.

Conclusions
In this study, a combined prediction model was established based on discrete wavelet decomposition and a neural network method to predict NO2 concentration in Tianjin. The conclusions are as follows:
(1) The combined prediction model uses wavelets to decompose the time series data of air pollutant concentration and takes both the low-frequency and high-frequency data obtained after decomposition as input variables, thus increasing data dimensionality. By representing the pollutant concentration time series data at different frequencies, the characteristics of the data can be better described.
(2) The prediction model was built using an LSTM neural network, a high-dimensional nonlinear learning algorithm. When applied to the prediction of Tianjin NO2 concentration, the performance was not as good as that of a single LSTM model, but the overall prediction accuracy was the highest. However, due to the low dimensionality of pollutant concentration time series data, the representation of the information is incomplete, affecting the ability of the prediction model to generalize.
(3) The DWT-LSTM neural network method can be used to accurately predict air pollutant concentration. Compared with the single LSTM model, the MAPE decreased from 17.85% to 11.58%; the MAE and RMSE increased to 4.3377 and 5.9291, respectively.
(4) The practical significance of using a statistical prediction method for urban development and social activities is demonstrated by discussing the reasons underpinning the prediction deviations and the change in NO2 concentration.
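The decomposition step summarized in conclusion (1) can be illustrated with a small sketch. This section does not state the wavelet basis or the decomposition level, so the one-level Haar transform below is an assumption chosen for brevity, and the function names (`haar_dwt`, `haar_idwt`, `component_series`) are hypothetical. In practice a library such as PyWavelets would more likely be used; the pure-Python version only shows how one series becomes a low-frequency and a high-frequency input column.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal):
    """One-level Haar DWT: returns (approximation, detail) coefficient lists.

    Odd-length input is padded by repeating the last sample (an assumption).
    """
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the (padded) series from both coefficient lists."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def component_series(signal):
    """Split a series into low- and high-frequency series of the original length.

    Each component is reconstructed with the other coefficient set zeroed,
    so low[i] + high[i] equals signal[i].
    """
    n = len(signal)
    approx, detail = haar_dwt(signal)
    zeros = [0.0] * len(approx)
    low = haar_idwt(approx, zeros)[:n]
    high = haar_idwt(zeros, detail)[:n]
    return low, high


# Illustrative daily values only, not data from the study.
series = [40.0, 44.0, 50.0, 46.0]
low, high = component_series(series)
# One input column becomes two, which is how the decomposition raises dimensionality.
feature_rows = list(zip(low, high))
```

Reconstructing each component back to the original length keeps every day aligned with its other input indexes, so the two series can be appended as extra columns before the data are passed to the network.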
In the process of data modeling, the limitations of the hardware used in the experiments meant that the data could not be fully analyzed and the structure of the neural network could not be fully explored, leading to a less complex model design. Future research will increase the dimension of data collection and carry out further experiments and explorations in an improved hardware environment.

Data Availability
Raw data used to support the results of this study are included in the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.