A Novel Hybrid Method to Predict PM2.5 Concentration Based on the SWT-QPSO-LSTM Hybrid Model

PM2.5 concentration is an important indicator to measure air quality. Its value is affected by meteorological factors and air pollutants, so it has the characteristics of nonlinearity, irregularity, and uncertainty. To accurately predict PM2.5 concentration, this paper proposes a hybrid prediction system based on the Synchrosqueezing Wavelet Transform (SWT) method, Quantum Particle Swarm Optimization (QPSO) algorithm, and Long Short-Term Memory (LSTM) model. First, the original data are denoised by the SWT method and taken as the input of the prediction model. Then, the main parameters of the LSTM model are optimized by global search based on the QPSO algorithm, which solves the problems of slow convergence and local extremum of traditional parameter training algorithms. Finally, the PM2.5 daily concentration data of Chengdu, Shijiazhuang, Shenyang, and Wuhan are predicted by the proposed SWT-QPSO-LSTM model, and the prediction results are compared with those of single prediction models and hybrid prediction models. The experimental results show that the proposed model achieves higher prediction precision and lower prediction error than other models.


Introduction
The problem of air pollution is increasingly significant as industrialization and urbanization progress, and it has attracted attention worldwide. PM2.5 (fine particles with an aerodynamic diameter ≤2.5 μm) mainly comes from urban traffic exhaust and the exhaust gas generated by industrial activities. As the primary component of atmospheric particulate pollution, PM2.5 is the main cause of haze. Epidemiological studies at home and abroad show that long-term exposure to high concentrations of PM2.5 can raise blood pressure and cause arrhythmia, stroke, myocardial infarction, atherosclerosis, and other diseases, seriously threatening human health [1]. PM2.5 also has adverse effects on climate: it leads to abnormal rainfall, exacerbates the greenhouse effect, reduces traffic visibility, and increases the traffic accident rate, thus hindering the development of the transportation industry and the regional economy [2]. Therefore, gathering PM2.5 concentration data in real time is critical for managing air pollution, preventing cardiovascular and cerebrovascular diseases, and improving regional climate, ecological balance, and regional economic development. Because of the serious negative impact of PM2.5 on people's lives, PM2.5 concentration prediction is receiving growing attention and is considered an important and effective means of alleviating that impact.
PM2.5 concentration prediction methods fall into two types: physical models and data-driven models. A physical model simulates the diffusion, propagation, and elimination of PM2.5. The advantage of this approach is that it can illustrate how the pollutant propagates and transforms, which allows for better knowledge of PM2.5 pollution [3]. Typical physical models include the multiscale air quality model CMAQ developed by the U.S. Environmental Protection Agency (USEPA), the mesoscale weather prediction model chemical module WRF-Chem developed by the National Center for Atmospheric Research (NCAR) and the National Oceanic and Atmospheric Administration (NOAA), and the China Unified Atmospheric Chemical Environment (CUACE/Haze-fog) developed by the China Meteorological Administration. Because the diffusion mechanism of PM2.5 and other pollutants is very complex, the accuracy of a physical model is affected by many meteorological elements, such as temperature, humidity, wind direction, and season. A physical model is also often sensitive to the initial and boundary conditions and has high uncertainty. In addition, the time-consuming simulation process and the high calculation cost greatly limit its application. In comparison, a data-driven model adopts statistical and machine learning methods to predict the PM2.5 concentration by mining the historical information and related influencing factors of PM2.5 concentration recorded at meteorological observation stations. This approach can obtain high-accuracy prediction results. Data-driven air quality prediction models can be divided into three types: statistical prediction models, machine learning prediction models, and hybrid prediction models. The common statistical prediction models for PM2.5 concentration are the ARIMA model, the grey prediction model (GM), the Markov model, and ANFIS [4][5][6][7].
However, the statistical prediction models are relatively simple, and their prediction accuracy is relatively low. The prediction models based on machine learning mainly include the Artificial Neural Network (ANN), Support Vector Machine (SVM), Support Vector Regression (SVR), and Least Squares Support Vector Machine (LS-SVM) [8][9][10][11]. However, SVM, SVR, and LS-SVM place high demands on parameter selection and do not scale well to large data sets. ANN needs a lot of data for training and is prone to overfitting. The lack of a single appropriate method has encouraged the growth of hybrid models. To increase prediction performance, a hybrid model integrates data preprocessing with various optimization techniques. A decomposition algorithm is commonly used in data preprocessing to reduce noise in the original data and extract its effective features. Through the decomposition algorithm, the original data can be transformed into a form suitable for machine learning model training and prediction. The decomposition algorithms used in PM2.5 concentration prediction mainly include empirical mode decomposition (EMD), variational mode decomposition (VMD), and wavelet decomposition [12][13][14]. The EMD and VMD algorithms lack a rigorous mathematical basis, so they suffer from problems such as spurious components or mode mixing. The wavelet packet decomposition algorithm cannot reliably and accurately represent nonstationary signals. An optimization algorithm is usually adopted to increase the accuracy of model prediction by optimizing the initial values and weights of machine learning models. Common optimization algorithms include the genetic algorithm, particle swarm optimization, and the grey wolf algorithm [15]. Some studies combine an optimization algorithm with a machine learning prediction model to predict PM2.5 concentration. For example, Niu et al. [16] combined complementary ensemble empirical mode decomposition (CEEMD), SVR, and the Grey Wolf Optimizer (GWO) to predict PM2.5 concentration. Sun and Sun [17] constructed a hybrid model that combines principal component analysis, LS-SVM, and the Cuckoo algorithm to predict PM2.5 concentration. Cheng et al. [18] reported that the prediction accuracy of hybrid models such as Wavelet-ANN, Wavelet-ARIMA, and Wavelet-SVM for PM2.5 is significantly higher than that of ARIMA, ANN, and SVM.
In recent years, deep learning methods have achieved great success in numerous fields, such as image classification, natural language processing, and time series prediction. Some scholars have begun to use deep learning models to predict air quality, and the most successful prediction model is Long Short-Term Memory (LSTM). LSTM retains both long-term and short-term memory of time series data, so the prediction methods based on this model have good generalization ability and fault tolerance. Chang et al. [19] found that the aggregated LSTM model achieves a better prediction effect for PM2.5 concentration than SVR, GBTR (Gradient Boosted Tree Regression), LSTM, and other models. Dhakal et al. [20] used Kathmandu Valley as a case study and found that the LSTM model predicted PM2.5 better than the SARIMA model. Some scholars construct hybrid models that combine data preprocessing algorithms with LSTM models to predict PM2.5 concentrations. Yan et al. [21] proposed a hybrid model combining LSTM with the stationary wavelet transform, which is superior to SVR, LSTM, and the convolutional neural network combined with Long Short-Term Memory (CNN-LSTM). Zhang et al. [22] combined VMD and bidirectional LSTM (BiLSTM) to predict PM2.5 concentration; the VMD algorithm is better than the EMD algorithm in preprocessing, and BiLSTM has obvious advantages over SVR, MLP, and LSTM in prediction. Jin et al. [23] constructed a hybrid model that combines the wavelet transform and nested LSTM to forecast PM2.5 concentration; the results show that the proposed method outperforms other models such as Decision Tree, Random Forest, MLP, EMD-LSTM, and VMD-LSTM. However, the above-mentioned papers do not use evolutionary algorithms to optimize the hyperparameters of the LSTM model, which may affect the prediction results of the models.
In this paper, a new hybrid model is developed to predict PM2.5 concentration in China using the Synchrosqueezing Wavelet Transform (SWT), Long Short-Term Memory (LSTM), and Quantum Particle Swarm Optimization (QPSO). The contributions of this work are summarized as follows:
(1) In data preprocessing, the SWT is used to decompose and reconstruct the data. As a classical time-frequency analysis method, the wavelet transform has a solid mathematical basis, and it can accurately express the local time-frequency properties of signals. SWT is developed on the basis of wavelet transform theory. Compared with other wavelet transforms, SWT reassigns energy only in the frequency direction and preserves the temporal resolution of the signal, which improves the overall time-frequency resolution without blurring instantaneous points. These features make the SWT method suitable for reconstructing PM2.5 data.
(2) QPSO is applied to optimize the LSTM model. The selection of LSTM parameters affects the prediction accuracy of PM2.5. Based on the PSO algorithm and the principles of quantum mechanics, QPSO makes the motion state of particles follow random rules so that the search range of the particles can be expanded to the whole feasible solution space. As a result, the global optimal solution can be found faster and more accurately.
The experimental results indicate that the proposed hybrid prediction model can improve the PM2.5 concentration prediction effect.

Principle of Synchrosqueezing Wavelet Transform.
Traditional wavelet analysis is widely used in signal denoising because of its good time-frequency and multiresolution characteristics. SWT is a time-frequency reassignment method proposed by Daubechies et al. [24], which is a further development of the wavelet transform (WT). Based on the continuous wavelet transform (CWT), this method sums and regroups wavelet coefficients with the same instantaneous frequency; that is, the wavelet coefficients near the central frequency are squeezed to reduce the smearing in the scale direction. Meanwhile, the time-frequency curve is thinned so that there is no cross term, which can significantly improve the time-frequency resolution and allows the signal to be reconstructed without distortion. Using the SWT method, the energy in the time-scale plane is redistributed according to the modulus of each element, and the plane is transformed into a time-frequency plane through a special mapping. This method achieves good results in various applications, such as climate analysis, mechanical fault diagnosis, signal denoising, civil engineering structures, harmonic and interharmonic detection, and seismic signal extraction.
SWT is based on the continuous wavelet transform (CWT), and the phase of the wavelet coefficients in the frequency domain is not affected by scaling. Applying the CWT to a signal sequence $s(t)$ gives the wavelet coefficients

$$W_s(a, b; \phi) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} s(t)\, \overline{\phi\left(\frac{t-b}{a}\right)}\, dt, \tag{1}$$

where $\overline{\phi((t-b)/a)}$ is the complex conjugate of the wavelet function; $a > 0$ is the scale parameter (also known as the stretching parameter); $b \in \mathbb{R}$ is the translation parameter; and the wavelet basis function is

$$\phi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \phi\left(\frac{t-b}{a}\right). \tag{2}$$

According to equations (1) and (2), the wavelet transform can be regarded as the convolution of the signal with the scaled and translated wavelet function, and the resulting coefficients are arranged in the time-scale plane.
In practice, the collected signal sequence usually contains various kinds of noise and artifacts introduced by human factors. When the modulus $|W_s|$ of a wavelet coefficient obtained by the wavelet transform is close to 0, the phase of $W_s$ is relatively unstable. Therefore, for a threshold $c$, every coefficient with $|W_s| \le c$ is filtered out. The threshold for each layer of wavelet coefficients can be obtained adaptively from the noise level of the coefficients by equation (3), where $n$ is the number of sampling points of the original signal, $\sigma_\eta^2$ is the noise variance, and $W_s(a_{1:n_v}, b)$ denotes the wavelet coefficients at the scales $a_1$ to $a_{n_v}$.
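The thresholding step can be sketched in a few lines. The sketch below assumes the widely used universal threshold, $c = \sqrt{2 \ln n}\,\sigma_\eta$, with $\sigma_\eta$ estimated from the median absolute deviation (MAD) of the finest-scale coefficient moduli; this specific estimator and the function names `universal_threshold` and `hard_threshold` are assumptions made for illustration and are not taken from the paper.

```python
import math
import statistics


def universal_threshold(finest_scale_coeffs, n):
    """Return sqrt(2 ln n) * sigma, with sigma estimated by the MAD (assumed estimator)."""
    magnitudes = [abs(w) for w in finest_scale_coeffs]
    med = statistics.median(magnitudes)
    mad = statistics.median([abs(m - med) for m in magnitudes])
    sigma = 1.4826 * mad  # scales the MAD to a standard deviation for Gaussian noise
    return math.sqrt(2.0 * math.log(n)) * sigma


def hard_threshold(coeffs, c):
    """Zero every coefficient with |W| <= c and keep the others unchanged."""
    return [w if abs(w) > c else 0j for w in coeffs]


# Toy coefficients: one strong component (3+4j) among small, noise-like values.
coeffs = [0.1 + 0j, -0.2j, 0.15, 3 + 4j, 0.05]
c = universal_threshold(coeffs, n=len(coeffs))
kept = hard_threshold(coeffs, c)
```

In a real pipeline the coefficients would come from the CWT of the PM2.5 series, and only the surviving coefficients would enter the instantaneous-frequency calculation.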
In fact, the obtained wavelet coefficients spectrum always diffuses in the scale direction, and the focusing performance is unsatisfactory.
This makes the time-frequency diagram blurred. However, the phase of the wavelet coefficients is not affected by scaling, so the instantaneous frequency $\omega_s(a, b)$ can be calculated from the wavelet coefficients $W_s(a, b; \phi)$ wherever $|W_s| > c$:

$$\omega_s(a, b) = -\frac{i}{W_s(a, b; \phi)}\, \frac{\partial W_s(a, b; \phi)}{\partial b}. \tag{4}$$

Through the calculation of equation (4), the time-frequency redistribution is realized, and the time-scale plane is transformed into a time-frequency plane. This is the basic idea of the synchrosqueezing wavelet transform. Based on this, the values in the neighborhood interval $[\omega_l - \tfrac{1}{2}\Delta\omega,\ \omega_l + \tfrac{1}{2}\Delta\omega]$ of any frequency $\omega_l$ can be squeezed to $\omega_l$. The synchrosqueezing transform can be expressed as

$$T_s(\omega_l, b) = \frac{1}{\Delta\omega} \sum_{a_k:\ |\omega_s(a_k, b) - \omega_l| \le \Delta\omega/2} W_s(a_k, b; \phi)\, a_k^{-3/2}\, \Delta a_k, \tag{5}$$

where $a_k$ is the discrete scale and $k$ is the scale index. When the signal is discrete, the scale step $\Delta a_k$ and the frequency step $\Delta\omega$ take discrete values:

$$\Delta a_k = a_k - a_{k-1}, \qquad \Delta\omega = \omega_l - \omega_{l-1}. \tag{6}$$

Computational Intelligence and Neuroscience

SWT is a time-frequency reassignment algorithm. Unlike earlier reassignment algorithms, SWT not only increases the time-frequency resolution but also allows the characteristic segments of the signal to be reconstructed from the time-frequency diagram.
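Thus, SWT is invertible, and its inverse transform (ISWT) can be written as

$$s(b) = \mathrm{Re}\left[ C_\phi^{-1} \sum_{l} T_s(\omega_l, b)\, \Delta\omega \right], \qquad C_\phi = \frac{1}{2} \int_0^{\infty} \hat{\phi}^{*}(\xi)\, \frac{d\xi}{\xi}. \tag{7}$$

Signal reconstruction can be realized by equation (7), where $\hat{\phi}^{*}(\xi)$ is the complex conjugate of the Fourier transform of the basic wavelet function.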
To sum up, the extraction of signal sequences based on SWT consists of the following three steps:
(1) Apply the CWT to the signal s(t) to obtain the wavelet coefficients W_s(a, b; ϕ).
(2) Compute the instantaneous frequency from the wavelet coefficients with |W_s| > c, and obtain the synchrosqueezed coefficients T_s(ω_l, b) by synchrosqueezing.
(3) Use the retained coefficients for the synchrosqueezing transform and its inverse to reconstruct the signal and realize signal extraction.
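A short worked example may help to see why the phase derivative in step (2) recovers the instantaneous frequency. The derivation below is a standard illustration for a pure harmonic, not part of the paper's experiments; it assumes an analytic wavelet ($\hat{\phi}(\xi) = 0$ for $\xi \le 0$), $\omega_0 > 0$, and the Fourier convention $\hat{s}(\xi) = \int s(t)\, e^{-i \xi t}\, dt$.

```latex
s(t) = A\cos(\omega_0 t)
\;\Longrightarrow\;
W_s(a,b;\phi)
  = \frac{1}{2\pi}\int \hat{s}(\xi)\,\sqrt{a}\;\overline{\hat{\phi}(a\xi)}\;e^{i b \xi}\,d\xi
  = \frac{A}{2}\,\sqrt{a}\;\overline{\hat{\phi}(a\omega_0)}\;e^{i b \omega_0},
\\[4pt]
\frac{\partial W_s}{\partial b} = i\,\omega_0\, W_s
\;\Longrightarrow\;
-\,\frac{i}{W_s}\,\frac{\partial W_s}{\partial b} = \omega_0 .
```

The coefficients are spread over many scales $a$ through the factor $\overline{\hat{\phi}(a\omega_0)}$, yet the estimated frequency equals $\omega_0$ at every scale, which is why squeezing them onto the line $\omega = \omega_0$ sharpens the time-frequency picture.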

Principle of Long Short-Term Memory Neural Network.
A recurrent neural network (RNN) is often used to handle sequential data. An RNN is a network containing loops, which allows information to persist: each module of the network passes a message on to the next module. However, when the gap between the position to be predicted and the relevant information grows beyond a certain extent, an RNN loses its learning ability because it cannot connect such distant information. LSTM is a special RNN proposed by Hochreiter and Schmidhuber [25], which alleviates the vanishing and exploding gradient problems in long-sequence training and performs better than a conventional RNN on longer sequences. Unlike the standard RNN, which has only one transmission state ($h_t$) and a single tanh layer, LSTM has two transmission states ($C_t$ and $h_t$), and its repeating module includes four layers that interact in a special way. In addition, LSTM adds a "processor" called the cell, which judges whether the input information is useful. There are three gates in a cell: the input gate, the forget gate, and the output gate. These gates protect and control the cell state. When information enters the LSTM network, the cell determines whether it is useful according to learned rules; information that conforms to the rules is kept, and the rest is discarded through the forget gate. LSTM processing is mainly divided into three steps. The first step discards information from the cell state through the forget gate. The gate reads $h_{t-1}$ and $x_t$ and assigns each entry of the cell state $C_{t-1}$ a value between 0 and 1, where 1 indicates "completely retained" and 0 indicates "completely discarded." The forget gate output is

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right).$$

The second step determines the new information to be stored in the cell state.
The sigmoid layer, called the "input gate layer," decides which values need to be updated. The tanh layer creates a vector of new candidate values $\tilde{C}_t$. The formulas for the input gate are as follows:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right).$$

The old cell state $C_{t-1}$ is then updated to $C_t$, where $f_t$ is the forget gate output from the first step:

$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t.$$

The third step decides the final output value based on the cell state. The output gate determines the information output by the cell, and its formula is as follows:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t \odot \tanh(C_t).$$

Here, $W$ is the weight matrix of each gate, and $b$ is the bias.
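The three steps can be traced in code. The following is a minimal sketch of a single LSTM time step in plain Python, written to mirror the standard gate equations; the function names and the `params` layout (a dict mapping each gate to a `(W, b)` pair) are choices made here for illustration and say nothing about the implementation used in the experiments.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W @ v + b for a list-of-rows matrix W and vectors v, b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params maps 'f', 'i', 'c', 'o' to a (W, b) pair."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["f"][0], z, params["f"][1])]  # forget gate
    i_t = [sigmoid(a) for a in affine(params["i"][0], z, params["i"][1])]  # input gate
    c_hat = [math.tanh(a) for a in affine(params["c"][0], z, params["c"][1])]  # candidates
    o_t = [sigmoid(a) for a in affine(params["o"][0], z, params["o"][1])]  # output gate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]  # new cell state
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # new hidden state
    return h_t, c_t


# Hidden size 1, input size 1, all weights and biases zero:
# every gate outputs 0.5 and the candidate is 0, so the cell state is halved.
zero_params = {g: ([[0.0, 0.0]], [0.0]) for g in "fico"}
h_t, c_t = lstm_step([1.0], [0.0], [2.0], zero_params)
```

With zero weights the example isolates the gating arithmetic: the forget gate keeps half of the old cell state and the output gate exposes half of its tanh.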

Principle of Quantum Particle Swarm Optimization.
QPSO is an improved PSO algorithm proposed by Sun et al. [26] to handle the problems of premature convergence and local extremum of standard PSO algorithm.
In the PSO algorithm, particles have two attributes: position (X) and velocity (V). In a D-dimensional solution space, the state of particle i is expressed as

$$X_i = (x_{i1}, x_{i2}, \ldots, x_{iD}), \qquad V_i = (v_{i1}, v_{i2}, \ldots, v_{iD}).$$

A fitness function is evaluated to measure the quality of each particle. Each time a particle is updated, its fitness needs to be recalculated. The best fitness values are used to update the individual best position $P_{best} = (p_{i1}, p_{i2}, \ldots, p_{iD})$ and the group best position $p_{gbest} = (p_{g1}, p_{g2}, \ldots, p_{gD})$. The update of the velocity and position of a particle at a given time depends on the state of the particle at the previous time.
Thus, the velocity and position of particles at any time are not random, and the particle search cannot cover the whole solution space and easily falls into a local extremum. That is, the PSO algorithm is not a globally convergent algorithm, and the final solution obtained may be a local optimum rather than the global optimum. As its applications have expanded, the PSO algorithm has shown problems such as premature convergence and dimension explosion.
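For reference, the dependence on the previous state can be made explicit with the textbook form of the PSO update. The inertia weight $w$, the acceleration coefficients $c_1, c_2$, and the random numbers $r_1, r_2 \in [0, 1]$ are standard symbols introduced here for illustration; they do not appear in the text above.

```latex
v_{id}^{t+1} = w\, v_{id}^{t}
             + c_1 r_1 \left(p_{id}^{t} - x_{id}^{t}\right)
             + c_2 r_2 \left(p_{gd}^{t} - x_{id}^{t}\right),
\qquad
x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1}
```

Each new position is the old position plus a bounded velocity, so a particle can only move along a trajectory anchored to where it was, which is the restriction that QPSO removes.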
Following the principles of quantum mechanics, the QPSO algorithm uses the quantum behavior of particles for optimization. Specifically, the particles follow the random rule of quantum motion, and the current motion state of a particle is no longer determined by its state at the previous moment. Thus, the particle search scope can be extended to the whole feasible solution space, and the global optimum can be obtained faster and more reliably. The QPSO algorithm improves the iterative behavior and optimization strategy of particle swarm optimization, and it has the advantages of fewer parameters, high efficiency, and global search ability.
When a particle follows the rules of quantum motion, its velocity and position are uncertain. In this case, the motion state of the particle is expressed by the probability density function (PDF) of the particle appearing at a certain point in the solution space. In our study, the PDF is given by the square of the wave function, and the exact position of the particle is simulated by the Monte Carlo method. The update formula of the particle position is

$$X_i^{t+1} = P_i^t \pm \beta \left| m_{best}^t - X_i^t \right| \ln\left(\frac{1}{\mu}\right),$$

where $P_i^t = (P_{i1}^t, P_{i2}^t, \ldots, P_{iD}^t)$ is the random point toward which the particle moves in the feasible space; $\beta$ is the contraction-expansion coefficient; $\mu$ is a random number in the range [0, 1]; and $m_{best}$ is the mean of the historical best positions of the particles, computed as

$$m_{best}^t = \frac{1}{N} \sum_{i=1}^{N} P_{best\,i}^t,$$

where $N$ is the total number of particles in the population, $D$ is the dimension of the particles, and $P_{best\,i}^t$ is the individual best position of the i-th particle at the current iteration.
The individual best position of each particle interacts with the global best position of the swarm during particle motion, and $P_i$ is constantly updated. The iterative equation of the i-th particle is

$$P_i^t = \varphi\, P_{best\,i}^t + (1 - \varphi)\, p_{gbest}^t,$$

where $p_{gbest}^t$ is the historical best position of the group and $\varphi$ is a random number in [0, 1]. The procedure of the QPSO algorithm is illustrated in Algorithm 1.
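The update rules above can be combined into a compact loop. The following is a minimal sketch of QPSO in plain Python, not a reproduction of Algorithm 1: the linear decrease of β from 1.0 to 0.5, the per-dimension sampling of φ and μ, the clipping of positions to the search bounds, and the sphere test function are all assumptions made for illustration.

```python
import math
import random


def qpso(fitness, dim, bounds, n_particles=20, n_iter=200, seed=0):
    """Minimize `fitness` over [lo, hi]^dim with a basic QPSO loop."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    pbest = [x[:] for x in X]
    pbest_fit = [fitness(x) for x in X]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    history = [gbest_fit]
    for t in range(n_iter):
        beta = 1.0 - 0.5 * t / n_iter  # assumed schedule: 1.0 -> 0.5
        # mbest: mean of the individual best positions
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            for d in range(dim):
                phi = rng.random()
                mu = 1.0 - rng.random()  # lies in (0, 1], so log(1/mu) is finite
                attractor = phi * pbest[i][d] + (1.0 - phi) * gbest[d]
                step = beta * abs(mbest[d] - X[i][d]) * math.log(1.0 / mu)
                x_new = attractor + step if rng.random() < 0.5 else attractor - step
                X[i][d] = min(max(x_new, lo), hi)  # keep the particle feasible
            fit = fitness(X[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = X[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = X[i][:], fit
        history.append(gbest_fit)
    return gbest, gbest_fit, history


def sphere(x):
    return sum(v * v for v in x)


best_x, best_fit, history = qpso(sphere, dim=3, bounds=(-5.0, 5.0))
```

Note that no velocity is stored: each new position is drawn around the attractor, with a spread proportional to the distance from `mbest`, which is how the particle can land anywhere in the feasible space.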

Structure of the Prediction Model.
The PM2.5 concentration prediction model studied in this paper consists of the following four components:
(a) Data Selection. PM2.5 is the main component of smog particles. People exposed to high concentrations of PM2.5 for a long time suffer from an increased risk of respiratory, cardiovascular, and cerebrovascular diseases and even cancer. Therefore, the PM2.5 daily concentration data of a given area are taken as the experimental data in this paper.
(b) Data Preprocessing. There are some problems in the collected data, such as missing values, anomalies, and inconsistent dimensions. Therefore, missing-value processing and standardization need to be performed on the experimental data before simulation analysis. In the experiment, the SWT method is used for data preprocessing.
(c) Neural Network. A traditional neural network easily falls into a local extremum. Its convergence is also slow, and its prediction precision needs to be improved. This paper uses QPSO to optimize the parameters of the LSTM neural network to construct a PM2.5 concentration prediction model.
(d) Analysis of Prediction Results. In this paper, the prediction precision of the simulation experiment is analyzed by calculating the values of RMSE, MAE, and MAPE.
The structure of the prediction model is illustrated in Figure 1.

SWT-QPSO-LSTM Hybrid Prediction Model.
The key parameters in the SWT-QPSO-LSTM hybrid prediction model include the numbers of neurons in the LSTM (L1 and L2), the learning rate (ε), and the number of training iterations (k). These four parameters are taken as the dimensions of each particle, and the LSTM model is adjusted and optimized by the QPSO algorithm. The overall process of the algorithm design and optimization is as follows:
(a) Standardize the PM2.5 historical concentration data processed by the SWT method. Because the LSTM neural network model is sensitive to the scale of the input data, a large data scale degrades the training of the model. The daily PM2.5 concentration data of the first 36 months are taken as the training set, and the data of the following 12 months are taken as the testing set. The data can be standardized by the following equation:

$$y = (x - x_{mean}) \cdot \frac{y_{std}}{x_{std}} + y_{mean},$$

where the default value of $y_{mean}$ is 0 and the default value of $y_{std}$ is 1; $x$ is the original PM2.5 concentration data; $x_{mean}$ is the mean of the original data; $x_{std}$ is the standard deviation of the original data; and $y$ is the standardized PM2.5 concentration data with a mean of 0 and a standard deviation of 1.
(b) Initialize the particle swarm parameters, including the population size, the learning rate, the maximum number of iterations, and the value range of particle position and velocity.
(c) Based on experience, select the initial values of L1, L2, ε, and k to establish the LSTM model. Then, the model is trained on the training set, and its predictions are compared with the testing set. Finally, a set of hyperparameters is obtained by searching for the minimum prediction error of the LSTM. This paper takes the mean square error (MSE) as the fitness function, which is calculated as follows:

$$MSE = \frac{1}{M} \sum_{m=1}^{M} \left(y_m - \hat{y}_m\right)^2,$$

where $M$ is the number of samples, and $y_m$ and $\hat{y}_m$ represent the real value and predicted value of the m-th sample, respectively.
(d) The global best position gbest and the individual best positions pbest are determined from the initial fitness values of the particles, and they are regarded as the historical best positions. The velocities and positions of the particles are updated. Then, the fitness of each particle is recalculated and compared with the individual and global best solutions, which are replaced whenever a better value is found.
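Steps (a) and (c) rely on two small computations, standardization and the MSE fitness, which can be sketched as follows. The function names are illustrative, and the use of the sample standard deviation (`statistics.stdev`) is an assumption; the population standard deviation is an equally common choice.

```python
import statistics


def standardize(x, y_mean=0.0, y_std=1.0):
    """Map x to mean y_mean and standard deviation y_std; also return the statistics of x."""
    x_mean = statistics.mean(x)
    x_std = statistics.stdev(x)  # sample standard deviation (assumption)
    y = [(v - x_mean) * (y_std / x_std) + y_mean for v in x]
    return y, x_mean, x_std


def destandardize(y, x_mean, x_std, y_mean=0.0, y_std=1.0):
    """Invert standardize() so predictions can be reported in the original unit."""
    return [(v - y_mean) * (x_std / y_std) + x_mean for v in y]


def mse(y_true, y_pred):
    """Mean square error, used as the particle fitness."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


raw = [10.0, 20.0, 30.0, 40.0]
scaled, raw_mean, raw_std = standardize(raw)
restored = destandardize(scaled, raw_mean, raw_std)
fitness_value = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

One design point worth keeping in mind: computing `x_mean` and `x_std` on the training set only, and reusing them for the testing set, avoids leaking test information into the model.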
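The prediction precision is evaluated by the RMSE, MAE, and MAPE:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}, \qquad MAE = \frac{1}{n} \sum_{i=1}^{n} \left|y_i - \hat{y}_i\right|, \qquad MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left|\frac{y_i - \hat{y}_i}{y_i}\right|,$$

where $n$ is the number of test samples, and $y_i$ and $\hat{y}_i$ are, respectively, the true value and predicted value at time $i$.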
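The three evaluation indexes are straightforward to compute. The sketch below uses their standard definitions, with MAPE expressed in percent; the function names are illustrative, and the code assumes that no true value is zero, which holds for positive concentration data.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no true value is zero."""
    return 100.0 * sum(abs((a - b) / a) for a, b in zip(y_true, y_pred)) / len(y_true)


observed = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
scores = (rmse(observed, predicted), mae(observed, predicted), mape(observed, predicted))
```

RMSE and MAE are in μg/m³, while MAPE is scale-free, which makes it the easier index to compare across cities with very different pollution levels.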

Simulation Analysis
Algorithm 1: QPSO algorithm.

Chengdu, the largest city in southwest China, has a more reasonable industrial structure and better air quality, and its air quality index ranks near the top of the 168 key cities in China. Wuhan, an important city in central China, has seen significant improvement in its air quality with the transformation of its industrial structure; its air quality index now ranks in the upper middle of China's 168 key cities. Shenyang is an important heavy industrial city whose long-term economic development approach has led to rising resource and energy consumption and an increasingly acute air pollution problem; its air quality index is in the lower range of the 168 key cities in China. Shijiazhuang is located in the core of the Beijing-Tianjin-Hebei city cluster; it has poor air quality due to its special geographical location and unreasonable industrial layout, and its air quality index ranks near the bottom of the 168 key cities in China. If the constructed model can predict the air quality of these four cities well, the model can be considered effective.

This paper uses the daily PM2.5 concentration data from January 1, 2018, to March 31, 2021, as the case study data to determine the effectiveness of the constructed model. The data set can be obtained from the public air quality query website (https://www.aqistudy.cn/historydata/). The PM2.5 concentration unit is μg/m³, and there are 1,186 data points. The analysis revealed missing values in the data, so interpolation was used to fill them. The data are divided into training and test sets at a ratio of 7:3: the training data are used to fit the model, and the test data are used to evaluate the performance of the constructed model. Their descriptive statistics are shown in Table 1.
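The data preparation described above can be sketched as follows. The text does not say which interpolation method was used or how the 7:3 split was drawn, so linear interpolation and a chronological split (the first 70% of days for training) are assumptions made here; the function names are illustrative.

```python
def interpolate_missing(series):
    """Fill None entries by linear interpolation; gaps at the edges copy the nearest value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out


def chronological_split(series, train_ratio=0.7):
    """Split a time-ordered series into (train, test) without shuffling."""
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]


filled = interpolate_missing([None, 10.0, None, None, 40.0, None])
train, test = chronological_split(list(range(1186)))
```

With 1,186 daily values, this split gives 830 training points and 356 test points. Keeping the order intact matters for forecasting, because shuffling would let the model see days that come after the ones it is asked to predict.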

Experimental Process.
This section is divided into two parts: data preprocessing and hybrid model prediction.

Data Preprocessing Results after SWT.
In this section, SWT is used to reduce the noise of the PM2.5 concentration data. Figure 2 shows the SWT results for the original data: Figure 2(a) presents the PM2.5 concentration data before and after denoising, Figure 2(b) presents the SWT time-frequency diagram of the original data before and after noise reduction, and Figure 2(c) presents the FFT spectrum of the original data before and after noise reduction. It can be seen from Figures 2(b) and 2(c) that the noise at the low frequency of SWT is well suppressed, and the frequency component is compressed.
This indicates that SWT can suppress random noise and extract the characteristic frequency. Based on this, the noise interference in the original sample data is reduced by the SWT method to achieve more accurate prediction results.

QPSO-LSTM Prediction Model Based on the SWT Method.
To increase the prediction precision of the LSTM model, the QPSO algorithm is adopted to optimize the parameters of the LSTM model built on the SWT data preprocessing method. Because four parameters of the LSTM model are optimized, the optimization dimension is set to 4. To help the parameter optimization converge, a large population is used: the population size is set to 100. The value range of the learning rate is set to [0.001, 0.01].
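A practical detail of this setup is that QPSO searches a continuous space, while three of the four parameters are integers. One way to bridge the gap is to clip and round each particle before training, as in the sketch below. Only the learning-rate range comes from the text; the ranges for L1, L2, and k in `SEARCH_BOUNDS`, as well as the names `decode_particle` and `SEARCH_BOUNDS`, are hypothetical.

```python
# Search ranges in the order (L1, L2, learning rate, k).
# Only the learning-rate range [0.001, 0.01] is given in the text;
# the other three ranges are hypothetical placeholders.
SEARCH_BOUNDS = [(1, 200), (1, 200), (0.001, 0.01), (1, 100)]


def decode_particle(position, bounds=SEARCH_BOUNDS):
    """Clip a 4-D particle to its bounds and round the integer-valued parameters."""
    names = ("L1", "L2", "lr", "k")
    clipped = [min(max(p, lo), hi) for p, (lo, hi) in zip(position, bounds)]
    params = dict(zip(names, clipped))
    for key in ("L1", "L2", "k"):
        params[key] = int(round(params[key]))
    return params


inside = decode_particle([195.4, 60.2, 0.0075506, 89.0])
outside = decode_particle([500.0, -3.0, 0.5, 42.6])
```

The fitness of a particle would then be the MSE of an LSTM trained with the decoded parameters, so each fitness evaluation costs one full training run; this is the main reason the population size and iteration count dominate the running time.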
Taking the PM2.5 concentration data of Chengdu as an example, the variation of the particle fitness with the number of iterations of the SWT-QPSO-LSTM model is shown in Figure 3. The particle fitness finally stabilizes at 0.0070347. The variation of the maximum number of optimization iterations (K), the learning rate (ε), and the numbers of hidden layer neurons (L1 and L2) with the number of model iterations is shown in Figure 4. All four parameters converge; the final parameter optimization result of the SWT-QPSO-LSTM prediction model is K = 89, ε = 0.0075506, L1 = 195, and L2 = 60. The results of parameter optimization for the other models are shown in Table 2.
To verify the effectiveness of the SWT-QPSO-LSTM hybrid model proposed in this paper, the hybrid model is compared with the single prediction model SWT-LSTM and the hybrid prediction model SWT-PSO-LSTM. In addition, to show that the SWT data processing method improves the accuracy of PM2.5 prediction, the proposed model is compared with the QPSO-LSTM model without SWT processing and with the QPSO-LSTM prediction model using the CEEMD data processing method. CEEMD is a decomposition method that is widely used in processing time-series data. Table 3 shows the evaluation indexes of the five prediction models on the PM2.5 concentration data of Chengdu, Wuhan, Shenyang, and Shijiazhuang. Data preprocessing has a large impact on the prediction results: the models with SWT or CEEMD preprocessing predict significantly better than the models without data preprocessing, and SWT preprocessing yields smaller prediction errors than CEEMD. The LSTM models whose parameters are optimized by a swarm intelligence algorithm predict better, and among them the LSTM model optimized by the QPSO algorithm predicts slightly better than the one optimized by PSO. These results indicate that the SWT-QPSO-LSTM hybrid prediction model proposed in this paper achieves a smaller prediction error than the other prediction models, which verifies the feasibility and effectiveness of the established model.

Conclusion
Aiming at the prediction of PM2.5 concentration, this paper proposes a hybrid prediction model based on the LSTM neural network and the QPSO optimization algorithm. The data used in this study are preprocessed by the SWT method, and the resulting data are input into the QPSO-LSTM model for simulation and prediction. The prediction results of the proposed model are analyzed, and the model is compared with other hybrid models. The results show that the proposed method is more feasible and effective than the other methods. The specific conclusions are as follows:
(1) Using the SWT method to preprocess nonstationary signals realizes signal reconstruction, which improves the accuracy of subsequent data prediction.
(2) Using the QPSO optimization algorithm to optimize the key parameters of the LSTM model effectively avoids the impact of manual parameter setting on the prediction performance.
(3) The proposed SWT-QPSO-LSTM hybrid prediction model achieves better prediction performance than the other models compared.
The model proposed in this paper provides a theoretical basis for air pollution control; however, there are still some limitations. Balancing the prediction accuracy and the complexity of the model has always been a difficult task. The proposed model has obvious advantages in prediction accuracy, but the time spent on training the deep model also deserves attention. In the future, we will consider embedding the model in a parallel framework such as Apache Spark to improve its running efficiency. In addition, meteorological factors are not considered in this paper; in future work, we will add more influencing factors and use a spatiotemporal deep learning neural network model to improve PM2.5 prediction accuracy.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.