Difference Equation Model-Based PM2.5 Prediction considering the Spatiotemporal Propagation: A Case Study of Bohai Rim Region, China

Accurate reporting and prediction of PM2.5 concentration are very important for improving public health. In this article, we use a spectral clustering algorithm to cluster 44 cities in the Bohai Rim Region. On this basis, we propose a special difference equation model, especially the use of nonlinear diffusion equations to characterize the temporal and spatial dynamic characteristics of PM2.5 propagation between and within clusters for real-time prediction. For example, through the analysis of PM2.5 concentration data for 92 consecutive days in the Bohai Rim Region, and according to different accuracy definitions, the average prediction accuracy of the difference equation model in all city clusters is 97% or 90%. The mean absolute error (MAE) of the forecast data for each urban agglomeration is within 7 μg/m³. The experimental results show that the difference equation model can effectively reduce the prediction time, improve the prediction accuracy, and provide decision support for local air pollution early warning and urban comprehensive management.


Introduction
PM2.5 refers to particulate matter with a diameter less than or equal to 2.5 μm in the atmosphere, also known as fine particulate matter. Although PM2.5 makes up only a small fraction of the atmosphere, it has a significant impact on air quality and visibility. Studies have shown that PM2.5 is a major cause of a variety of respiratory diseases [1]. Therefore, accurate prediction of PM2.5 not only helps monitor the effect of existing governance measures but also gives direction for future air governance and can tell the public the best times to travel.
There are many studies on PM2.5 prediction [2,3]. Each method deals with the problem from a different perspective. Among them, statistical methods and satellite remote sensing techniques are the most widely used. The statistical method is an empirical prediction method. Common statistical methods include linear regression models [4,5], neural networks [3], and nonlinear regression models [6,7].
Although the statistical method is convenient and simple to operate, it requires a large amount of data to be collected in advance, and the data processing is slow. Although satellite remote sensing techniques [2] offer wide coverage over long periods, the equipment cost is high, and they are not suitable for long-term prediction in a small area.
Due to the development of applied mathematics, the use of equation models to study the propagation laws and development trends of atmospheric pollutants such as PM2.5 has become an extremely important subject in biomathematics research. Many mature studies have appeared in recent years [8-10]. For example, Wang et al. [11] first established a partial differential equation model based on space-time dimensions to predict PM2.5, and then in 2020, they predicted PM2.5 based on a data-driven ordinary differential equation model [12]. However, these models are all differential equation models, and one of the most important assumptions for using differential equation models is the continuity of time. In reality, the collected data are all discrete, so a differential equation model introduces certain errors. The difference equation is the discretization of the differential equation, so the difference equation model is also a powerful tool for predicting data.
This article uses the difference equation model to predict and analyze PM2.5.
This work aims to explore large-scale (between urban areas) air pollution migration and make further predictions. Specifically, we build a specific difference equation model based on the network and clustering of 44 cities in the urban agglomeration in the Bohai Rim Region, combined with local emissions and global diffusion, which is used to describe the temporal and spatial dynamic propagation process of PM2.5 in the region. This model requires no large-scale calculations. At the same time, the simulation results show that the model not only has good predictive ability but also can provide policy insights to a certain extent, providing a more scientific theoretical basis for controlling air pollution in the Bohai Rim Region.
This article mainly uses multisource data to make short-term forecasts of PM2.5. This study uses data on geographic distance between cities, wind direction, wind speed, and PM2.5 concentration. Figure 1 shows the framework of this research. The main content of this paper is organized as follows: Section 2 introduces the research area, clusters it according to geographical distance, wind direction, wind speed, and other conditions, and constructs the difference equation model of PM2.5 spatiotemporal propagation; Section 3 gives the prediction results of PM2.5 and the error analysis of those results; Section 4 gives a summary discussion of this article and some thoughts on later work.

Study Area.
The Bohai Rim Region refers to the Bohai Rim coastal economic belt dominated by the Liaodong Peninsula, Shandong Peninsula, and Beijing-Tianjin-Hebei.
This area accounts for about 13.31% of the country's land area and 22.2% of the total population. At present, the concentration of PM2.5 is extremely high in densely populated urban agglomerations such as Beijing, Tianjin, and Hebei. Therefore, urban agglomerations in the Bohai Rim Region are facing serious PM2.5 problems. Figure 2 shows the study area. The red markers on the map represent the 216 major air monitoring stations in the area, and the black markers represent all prefecture-level cities in the area.

Fine Particulate Matter (PM2.5) Data.
The city clusters in the Bohai Rim Region in this study include 44 prefecture-level cities in 5 provinces and municipalities: Beijing, Tianjin, Liaoning, Hebei, and Shandong. The research data cover 92 days of PM2.5 concentration data from July 1, 2020, to September 30, 2020. The average daily PM2.5 level of each cluster was calculated from the daily PM2.5 levels of all cities in the cluster. Specifically, we calculate the average PM2.5 concentration of each prefecture-level city from the data collected at the 216 monitoring stations and then calculate the average PM2.5 concentration of each city cluster from the concentrations of its prefecture-level cities. All research data come from the National Urban Air Quality Real-Time Release Platform of China Environmental Monitoring Station. The original PM2.5 concentrations of each city are normalized to a discrete level value 1, 2, ..., 6 according to the Ambient Air Quality Standards (GB3095-1996) of China, where PM2.5 concentrations are divided into 0-35, 36-75, 76-115, 116-150, 151-250, and greater than 250 μg/m³. These concentration ranges are assigned levels 1 to 6, describing air quality that is good, mild, moderate, severe, highly severe, and seriously severe, respectively.
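The two preprocessing steps described here (averaging stations into cities and cities into clusters, then mapping a concentration to a level) can be sketched as follows. This is only an illustration: the function names, the city names, and the readings are hypothetical, and treating each range's upper bound as inclusive for fractional values is an assumption, since the ranges above are stated only for whole numbers.

```python
from bisect import bisect_left
from statistics import mean

# Upper bounds (in μg/m³) of levels 1-5; anything above 250 is level 6.
LEVEL_UPPER_BOUNDS = [35, 75, 115, 150, 250]


def pm25_level(concentration):
    """Map a PM2.5 concentration to a discrete level from 1 to 6."""
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    # Assumption: each upper bound is inclusive, so 35.5 falls in level 2.
    return bisect_left(LEVEL_UPPER_BOUNDS, concentration) + 1


def cluster_daily_mean(station_readings_by_city):
    """Average the stations within each city, then average the city means."""
    city_means = [mean(readings) for readings in station_readings_by_city.values()]
    return mean(city_means)


# Hypothetical readings for one day in a two-city cluster.
readings = {"city_a": [30.0, 40.0], "city_b": [50.0]}
cluster_mean = cluster_daily_mean(readings)
cluster_level = pm25_level(cluster_mean)
```

Averaging in two stages keeps a city with many stations from dominating its cluster, which matches the city-then-cluster order described in the text.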

Clustering and Embedding.
In the study of regional transport of PM2.5, we divided the 44 cities in the Bohai Rim Region into four city clusters so that we could conveniently put forward a specific difference equation model to describe the transmission process of PM2.5 within and between clusters. The motif M_8 in Figure 3 reflects the movement of PM2.5 from the pollution source to the target in the PM2.5 city network. We use M_8 as the basic module of the complex network and use the high-order spectral clustering algorithm in [13] to divide the urban agglomeration in the Bohai Rim Region into four clusters, as shown in Figure 4. For related work on high-order spectral clustering, see [14].
As mentioned above, we cluster the 44 cities in the Bohai Rim Region into 4 disjoint sets, U_1, U_2, U_3, U_4, through the high-order spectral clustering algorithm, and we order U_i (i = 1, 2, 3, 4) in a meaningful way [11]. For a general clustering partition, the spatial arrangement of U_i can be based on specific modeling goals and on social or geographical characteristics of the underlying network. In [15,16], the level of democracy, diaspora size, international economic relations, and geographical proximity are used to order U_i. In [17], friendship hops are used to define a distance metric, and U_x is then embedded at location x, with the x-axis serving as the social distance. For PM2.5, however, meteorological conditions are the most important factor affecting concentration. Therefore, in this study, we sorted these sets according to wind direction. From July to September, the prevailing wind direction in the Bohai Rim is southerly, so the four city clusters are projected from south to north on the y-axis of the Cartesian coordinate system and named U_1 to U_4 according to their geographic locations, as shown in Figure 4.

Model.
As shown in Figure 5, the pollution sources in each city cluster have a greater impact on that cluster itself (local emission), and different city clusters also influence each other through factors such as air flow (global transport). Therefore, this paper proposes a difference equation model with time and space factors to describe the dynamic propagation process of PM2.5. For a city cluster, factories, cars, etc., in the cluster generate a large amount of PM2.5. The generation and dissipation of PM2.5 within the cluster can be regarded as local emission. When PM2.5 in a city cluster spreads to one or more other clusters along with airflow and other factors, it can be regarded as global transport.
Discrete Dynamics in Nature and Society
In the following, we propose a nonlinear difference equation-based model that abstracts PM2.5 transport into two processes: local emission and global transport (Figure 5). Local emission reflects the diffusion within a cluster and the underlying network structure and is directly related to the cluster. Global transport is the spread of PM2.5 between clusters due to airflow and other factors, usually manifested as a more or less random walk. This approach will extend our analysis of difference equation modeling results.
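To make the split into global transport and local emission concrete, the sketch below advances the cluster concentrations by one time step. It is an assumption-laden illustration, not the authors' exact discretization: the transport term is taken to be a nearest-neighbour second difference scaled by a per-cluster coefficient D[x] with no-flux boundaries, the local term is the generalized logistic form r u^α (1 − u/K)^β, and the function name `step` and all numeric values are hypothetical.

```python
def step(u, D, r, K, alpha, beta):
    """Advance cluster concentrations u[0..n-1] by one time step.

    Assumed update for each cluster x (clusters ordered south to north):
        u_next[x] = u[x]
                    + D[x] * (u[x-1] - 2*u[x] + u[x+1])        # global transport
                    + r * u[x]**alpha * (1 - u[x]/K)**beta     # local emission
    Values are assumed non-negative; boundary clusters see a copy of
    themselves as the missing neighbour (no-flux boundary).
    """
    n = len(u)
    u_next = []
    for x in range(n):
        left = u[x - 1] if x > 0 else u[x]
        right = u[x + 1] if x < n - 1 else u[x]
        transport = D[x] * (left - 2.0 * u[x] + right)
        # Clamp at zero so a non-integer beta never sees a negative base.
        saturation = max(0.0, 1.0 - u[x] / K)
        local = r * (u[x] ** alpha) * (saturation ** beta)
        u_next.append(u[x] + transport + local)
    return u_next


# Hypothetical levels for four clusters and hypothetical parameters.
levels_today = [1.2, 1.8, 2.5, 2.0]
levels_tomorrow = step(levels_today, D=[0.1, 0.2, 0.2, 0.1],
                       r=0.05, K=6.0, alpha=1.0, beta=1.0)
```

With r = 0 a spatially uniform field is left unchanged, which is a quick sanity check that the transport term only moves PM2.5 between clusters and does not create it.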
The difference equation model is described as follows, where the bracketed term [… t))] represents the regional transport (global transport) of PM2.5 between different clusters, where
(1) D(x) describes the PM2.5 transport ability of the cluster at location x. Different city clusters have different PM2.5 transport capabilities, so a piecewise function is used to represent D(x); the value on each segment needs to be determined according to the actual situation.
(iv) r(t) u(x, t)^α [1 − (u(x, t)/K)]^β represents the spread process (local process) within a cluster, where α and β are real numbers greater than 0. This mathematical expression has been used to describe and predict the dynamics of various populations, such as the growth of bacteria and tumors [18].
(1) r(t) is the growth rate at time t in the local process. It depicts PM2.5 dissipation under externally changing factors such as wind or certain other atmospheric conditions [19,20]. r(t) is therefore given a parametric form whose optimal values are determined by the actual data we collected.
(2) K is the carrying capacity of the system (the maximum possible value of u at a given location x).
(v) φ(x) is the initial function (the PM2.5 concentration at time t = 1 is taken to be φ(x)), which must always be ≥ 0.

Prediction accuracy is measured in two ways:
(i) MAIA (Mean Absolute Increment Accuracy), which is proposed in this paper based on practical significance, is the mean of AIA over the test data set, where n is the number of sample points in the test data set and AIA evaluates the absolute accuracy at each sample point. There are six PM2.5 concentration levels in total, from level one to level six, and AIA describes the absolute accuracy in terms of level length [11].
(ii) MRA (Mean Relative Accuracy) is the mean of the relative accuracy over the test data set, where n is the number of sample points in the test data set.
Error Definition.
The experimental results are evaluated using three criteria: the mean absolute error (MAE), the root mean square error (RMSE), and the mean absolute percentage error (MAPE). Here y_t denotes the actual value of PM2.5 concentration, \hat{y}_t is the predicted value of PM2.5 concentration, and n is the number of sample points. The criteria are computed as follows:
(i) MAE (Mean Absolute Error): MAE is the average of the absolute error, defined as
\[ \mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| y_t - \hat{y}_t \right| \]
The smaller the value of MAE, the better the accuracy of the model; MAE directly reflects the actual size of the prediction error.
(ii) RMSE (Root Mean Square Error): RMSE measures the deviation between the predicted value and the actual value and is more sensitive to outliers. RMSE is defined as
\[ \mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{t=1}^{n} \left( y_t - \hat{y}_t \right)^2 } \]
(iii) MAPE (Mean Absolute Percentage Error): MAPE measures the relative error between the predicted value and the actual value. MAPE is defined as
\[ \mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \]
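The three error criteria can be computed in a few lines. The sketch below uses the standard textbook definitions of MAE, RMSE, and MAPE; the function names and the sample values are hypothetical, and MAPE is reported in percent under the assumption that no actual value is zero.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error; squaring makes it more sensitive to outliers."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted concentrations (μg/m³) for three days.
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]
errors = (mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

For these sample values the absolute errors are 2, 2, and 0, so MAE is 4/3, RMSE is the square root of 8/3, and MAPE is 10%.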

Results
Using the difference equation model proposed in this paper, we verify the real-time prediction performance of the model against the actual concentration, concentration level, and absolute error of PM2.5 in the Bohai Rim Region over 92 consecutive days, from July 1, 2020, to September 30, 2020.

Prediction Accuracy.
After collecting PM2.5 data from 44 cities in the Bohai Rim Region, we proposed the following prediction process. First, we normalized the data to reduce experimental errors and calculated the daily average concentration and corresponding concentration level of each cluster. We used the first day's data to construct the initial data and then used a three-day training data set to predict the PM2.5 concentration level on the fourth day. That is, we used days 1-3, 2-4, 3-5, and so on as training data, predicted the data on the 4th, 5th, 6th, and subsequent days accordingly, and recorded the prediction accuracy of all 4 regions on each predicted day. Specifically, we take the forecasting process on day 4 as an example. The data from the first day are used to build the initial function. Next, we calculated the concentration change data from day 1 to day 3 and used these data to calculate the parameters in the model through the lsqcurvefit function in Matlab. Finally, we used the obtained parameters to predict the data on day 4.
Figure 6 shows the predicted results of PM2.5 concentration levels in four city clusters from July 1, 2020, to September 30, 2020. The figure shows that the difference equation model can effectively predict the PM2.5 concentration level, and the obtained prediction curve (represented by the blue line) is roughly the same as the actual curve (represented by the red line). Therefore, the difference equation model in this article provides an accurate estimation of the PM2.5 concentration level, and the predicted trend is basically consistent with the actual trend.
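The rolling three-day scheme can be sketched as below. The window bookkeeping follows the description in the text; the fitting step, however, is a deliberately simple stand-in (the mean daily increment over the window) and not the lsqcurvefit-based parameter estimation of the difference equation model. All function names are hypothetical.

```python
def rolling_windows(n_days, window=3):
    """Return (training_days, target_day) pairs using 1-based day numbers."""
    return [(list(range(start, start + window)), start + window)
            for start in range(1, n_days - window + 1)]


def mean_increment_fit(values):
    """Stand-in 'fit': average daily change over the training window."""
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    return sum(steps) / len(steps)


def rolling_forecast(series, window=3):
    """Forecast every day after the first `window` days from the preceding window."""
    forecasts = {}
    for train_days, target_day in rolling_windows(len(series), window):
        train = [series[day - 1] for day in train_days]  # day d is series[d - 1]
        increment = mean_increment_fit(train)
        forecasts[target_day] = train[-1] + increment
    return forecasts


# For the 92-day study period, days 4 through 92 are predicted.
windows = rolling_windows(92)
```

Refitting on each three-day window keeps the parameters local in time, at the cost of fitting on very few points; swapping `mean_increment_fit` for a nonlinear least-squares fit of the model parameters would leave the window logic unchanged.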
According to the definition of accuracy, we divide it into mean relative accuracy and mean absolute increment accuracy. In this study, the mean absolute increment accuracy measures how precisely the PM2.5 concentration level range is predicted, while the mean relative accuracy measures how precisely the PM2.5 concentration value is predicted. The prediction accuracy of each city cluster over the 92 days from July 1, 2020, to September 30, 2020, is shown in Figures 7 and 8. Under the MAIA definition, most of the "×" marks in Figure 7 are located above the horizontal line 0.9, which means that the accuracy on most days in each cluster is higher than 90%, and the average accuracy of all 4 clusters is higher than 95%. The "+" marks in Figure 8 are not as dense as the marks in Figure 7. However, the "+" marks in Figure 8 are mostly above the horizontal line 0.85, which shows that the relative accuracy of each cluster is higher than 80%. Figures 7 and 8 together show that the MAIA and MRA of all clusters are higher than 97% and 90%, respectively.
Figure 9 shows a line graph of the predicted and actual values of PM2.5 concentration of cluster 1 on the right and the histogram of the actual error of cluster 1 on the left. As can be seen from the line graph in Figure 9, the actual value curve and the predicted value curve coincide closely. As can be seen from the error histogram, the proportion of days with an error within 5 units is 46.94%, and the proportion of days with an error within 10 units is 84.78%. This shows that the model has good predictive performance. The prediction line graphs and error histograms of the remaining clusters are shown in Figure 10. Figure 11 shows the prediction error histogram of the model, which supports the effectiveness and stability of the developed model.
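Proportions such as those quoted from the error histogram can be obtained by counting the days whose absolute error does not exceed a threshold. The sketch below reads the reported proportions as cumulative (error at most 5 or at most 10 units), which is an interpretation; the function name and the error values are hypothetical.

```python
def share_within(errors, threshold):
    """Fraction of days whose absolute error is at most `threshold`."""
    return sum(1 for e in errors if abs(e) <= threshold) / len(errors)


# Hypothetical signed daily errors (predicted minus actual, in μg/m³).
daily_errors = [1.0, -4.0, 6.0, 12.0]
within_5 = share_within(daily_errors, 5)
within_10 = share_within(daily_errors, 10)
```

Because the thresholds nest, the share within 10 units can never be smaller than the share within 5 units, consistent with the two proportions reported for cluster 1.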
It can be seen from Figure 11 that the MAE, RMSE, and MAPE values of each city cluster are relatively small. In summary, the overall performance of the difference equation model is relatively good. It not only predicts the concentration of PM2.5 accurately but also produces stable predictions.

Discussion
This paper adopts a prediction method that combines a difference equation model with a spectral clustering algorithm. Mainly through the analysis of weather conditions such as wind speed and wind direction in a region, a spectral clustering algorithm is used to divide the region into several city clusters. By analyzing the relationship between PM2.5 in these city clusters, a specific difference equation model is established to describe the global and local propagation process of PM2.5, thereby predicting the PM2.5 concentration of each cluster. In our tests, the difference equation model based on the spectral clustering algorithm achieves high prediction accuracy, and the predicted PM2.5 values are basically the same as the actual observations. The model has practical value and can help people plan their travel, but certain defects remain that need to be improved in application.
Studies have shown that the PM2.5 concentration is also affected by weather factors (such as temperature, humidity, wind speed, and precipitation) and other air pollutant indicators (such as CO, NO, and SO2). Rainfall in particular has a huge impact on PM2.5 concentration. Therefore, the next step will be to add more weather factors and other pollutant index data to improve the prediction accuracy of PM2.5 concentration.

Data Availability
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study.