Spatial-Temporal Characteristics Analysis of Air Pollution in Hubei Province Based on Functional Data Clustering

In recent years, China has attached great importance to ecological and environmental protection, upholding the concept that clear waters and lush mountains are valuable assets. To explore the characteristics of air pollution in Hubei Province, the air quality index (AQI) is studied by cluster analysis of functional data. First, the dimension of the AQI change curves is reduced by functional principal component analysis. Second, the basis coefficients of the principal components are clustered by K-means. Finally, the characteristics of air pollution in Hubei Province are analyzed in the time and space dimensions. The results show that the 13 prefecture-level administrative regions in Hubei Province can be divided into four types of areas according to air quality changes, with pollution decreasing from north to south and a significantly similar pattern in the time dimension, with significant seasonal characteristics.


Introduction
With the rapid development of the economy, urbanization, and industrialization, energy consumption has increased rapidly and environmental pollution has become increasingly serious. As living standards improve, people are more and more concerned about their living environment, and air quality has become one of the focuses of environmental protection. In the face of serious environmental pollution, ecosystem degradation, and resource constraints, the Chinese government attaches great importance to ecological and environmental protection and adheres to the concept that clear waters and lush mountains are valuable assets. Green development has been incorporated into the new development concept, and pollution prevention and control has been included in the three key battles. As a major province in central China, Hubei Province has always attached great importance to air pollution prevention and control and in recent years has launched a campaign for that purpose. In light of the frequent ozone pollution in summer and the frequent heavy pollution weather in autumn and winter, Hubei Province has focused on coordinated control of ozone and PM2.5, implemented air pollution prevention and control measures, and worked to meet its air quality improvement targets.
In the past 10 years, severe haze pollution has occurred frequently in many big cities in China, mostly in autumn and winter, and has shown regional characteristics. Most cities of Hubei Province have often been plagued by haze pollution, which has closed highways and caused respiratory problems among residents [1][2][3][4][5][6]. Several studies have been carried out on air pollution in Hubei Province [7][8][9][10]. In general, the variation of air pollutants, including PM2.5, NO2, and O3, is highly dependent on their emissions and weather conditions [11][12][13]. Based on air monitoring data from 17 cities in Hubei Province from 2004 to 2013, the spatial and temporal evolution characteristics and main influencing factors of urban ambient air quality were analyzed [14]. A population-based study in Wuhan, China, was conducted in a cohort of 95,911 live births during a two-year period from 2011 to 2013 [15].
The above studies have thoroughly analyzed the characteristics and influencing factors of air pollution and meteorological factors in Hubei Province, but their use of statistical methods is limited. In the real-time ambient air quality monitoring records of Hubei Province, the data at a given monitoring point are updated continuously over time and therefore have functional characteristics [16][17][18]. We can treat the observations over one cycle (a day or a year) as a curve and treat that curve as a single element of an abstract space (such as a Hilbert space) for functional data analysis. In recent years, functional data analysis has been one of the hot spots in statistical research and has been widely used in the study of air quality problems [19][20][21][22]. However, very few studies have used functional data analysis to study air quality in Hubei Province. The main contribution of this study is to transfer the techniques of multivariate principal component analysis to functional data analysis, use them for dimensionality reduction of the air pollution data of Hubei Province, and cluster the 13 prefecture-level administrative regions with the K-means method.
In this paper, according to the functional characteristics of the ambient air quality monitoring data of Hubei Province in 2020 and 2021, curves are constructed with an annual cycle, and each curve is regarded as a single element of an abstract space (such as a Hilbert space) for functional data analysis. This paper is organized as follows. In Section 2, the research data and the functional data analysis methods are introduced. In Section 3, an empirical study is carried out. In Section 4, we give conclusions and discussion.

Materials and Methods
2.1. Research Data. By the end of 2021, Hubei Province had jurisdiction over 13 prefecture-level administrative regions, namely, Wuhan, Huangshi city, Xiangyang city, Jingzhou city, Yichang city, Shiyan city, Xiaogan city, Jingmen city, Ezhou city, Huanggang city, Xianning city, Suizhou city, and Enshi state. There are 103 county-level administrative regions, of which Xiantao, Qianjiang, Tianmen, and Shennongjia are directly administered by Hubei Province. Because the statistical data of Xiantao, Qianjiang, Tianmen, and Shennongjia are missing from the public information, and the economic volume and area of these four regions are small, the 13 prefecture-level administrative regions of Hubei Province (12 prefecture-level cities and Enshi state) are selected as the research objects, and their daily air quality index data for 2020 and 2021 are collated and analyzed. The data source is the historical air quality index (AQI) data on the website of the Department of Ecology and Environmental Protection of Hubei Province, China. Some missing data come from the China Air Quality Online Monitoring and Analysis Platform (https://www.aqistudy.cn/), which is authoritative.

General Steps for Functional Data Analysis. A functional data analysis generally includes the following steps: (1) Collect and sort the original data. (2) Select appropriate basis functions to convert the discrete time series data into functional data. (3) Take derivatives of the resulting smooth curve to find the dynamic change law of the data. For example, a phase-plane plot of the function can be obtained from the first and second derivatives of the curve, and the growth trend of the function and the exchange between kinetic energy and potential energy can be analyzed. (4) Describe the characteristics of the data, including the mean function, covariance function, and correlation coefficient function. (5) Finally, analyze the functional data, for example by functional principal component analysis, functional canonical correlation analysis, or functional linear regression.
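Steps (2) and (3) can be illustrated with a minimal Python sketch. It assumes NumPy, an ordinary least-squares fit without a roughness penalty, and a synthetic daily series rather than real AQI data. The function names `fourier_design` and `smooth_series` are hypothetical and are introduced only for this illustration.

```python
import numpy as np

def fourier_design(t, n_basis, period):
    """Fourier design matrix and its first derivative evaluated at times t.

    Columns: 1, sin(w t), cos(w t), sin(2 w t), cos(2 w t), ...
    n_basis must be odd (one constant plus sine/cosine pairs).
    """
    if n_basis % 2 == 0:
        raise ValueError("n_basis must be odd")
    t = np.asarray(t, dtype=float)
    w = 2.0 * np.pi / period
    phi = [np.ones_like(t)]
    dphi = [np.zeros_like(t)]
    for k in range(1, (n_basis - 1) // 2 + 1):
        phi += [np.sin(k * w * t), np.cos(k * w * t)]
        dphi += [k * w * np.cos(k * w * t), -k * w * np.sin(k * w * t)]
    return np.column_stack(phi), np.column_stack(dphi)

def smooth_series(t, y, n_basis, period):
    """Least-squares basis coefficients of the discrete series y."""
    phi, _ = fourier_design(t, n_basis, period)
    c, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return c

# Synthetic daily series (NOT real AQI data): high in winter, low in summer.
rng = np.random.default_rng(0)
days = np.arange(365)
y = 80 + 30 * np.cos(2 * np.pi * days / 365) + rng.normal(0, 5, size=365)

c = smooth_series(days, y, n_basis=33, period=365)
phi, dphi = fourier_design(days, 33, 365)
fitted = phi @ c      # smooth curve x(t) on the daily grid
velocity = dphi @ c   # first derivative x'(t) on the daily grid
```

Because the derivative of each Fourier basis function is known in closed form, the derivative curve is obtained from the same coefficient vector `c` without numerical differencing.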

Clustering Method for Functional Principal Component Analysis. Cluster analysis is one of the important data analysis methods in statistics [23,24]. It can be used as an independent tool to obtain the data distribution and observe the characteristics of each cluster, and it can also focus on specific clusters for further analysis. Cluster analysis automatically groups the sample observations, without prior knowledge, according to the characteristics of the data and their natural "degree of affinity," so that individuals within a group are highly similar in structure while individuals in different groups are dissimilar. In recent years, clustering methods for functional data have gradually matured. They fall mainly into three categories: traditional clustering methods (such as K-means and hierarchical clustering) applied after dimensionality reduction, nonparametric methods using special distances or curve differences, and model-based clustering methods.
For the above monitoring data, the clustering method based on functional principal component analysis is selected in this paper. Functional data record, for each observed object, the values of the same variable at many points of a certain range (such as a time interval). If each time point is treated as a separate variable, as in multivariate data, the number of variables becomes very large and we face the "curse of dimensionality." In this case, the technique of multivariate principal component analysis can be carried over to functional data analysis, which gives functional principal component analysis. Functional principal component analysis (FPCA) is a generalization of traditional PCA.

Functional Principal Component Analysis. Let $x_i(s)$ $(s \in T)$, $i = 1, 2, \cdots, n$, be square-integrable functions on the interval $T$. Each $x_i(s)$ is integrated over the interval into a comprehensive variable:
$$\xi_i = \int_T \beta(s)\, x_i(s)\, ds, \quad i = 1, 2, \cdots, n, \tag{1}$$
where $\beta(s)$ is the weight function.

Wireless Communications and Mobile Computing
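After processing such as curve standardization, the corresponding variance of the comprehensive variable $\xi_i = \int_T \beta(s)\, x_i(s)\, ds$ is
$$\operatorname{Var}(\xi) = \frac{1}{n-1}\sum_{i=1}^{n} \xi_i^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(\int_T \beta(s)\, x_i(s)\, ds\right)^2. \tag{2}$$
The covariance of $x(t)$, $x(s)$ is
$$\nu(s,t) = \frac{1}{n-1}\sum_{i=1}^{n} x_i(s)\, x_i(t). \tag{3}$$
The first principal component meets the following conditions:
$$\max_{\beta_1}\ \frac{1}{n-1}\sum_{i=1}^{n}\left(\int_T \beta_1(s)\, x_i(s)\, ds\right)^2 \quad \text{s.t.}\quad \|\beta_1\|^2 = \int_T \beta_1(s)^2\, ds = 1. \tag{4}$$
Subject to its weight function being orthogonal to the weight functions of the previous $(k-1)$ principal components, the $k$-th principal component solves the same optimization problem, namely,
$$\max_{\beta_k}\ \frac{1}{n-1}\sum_{i=1}^{n}\left(\int_T \beta_k(s)\, x_i(s)\, ds\right)^2 \quad \text{s.t.}\quad \|\beta_k\|^2 = 1,\quad \int_T \beta_k(s)\, \beta_m(s)\, ds = 0,\ m = 1, \cdots, k-1. \tag{5}$$
This leads to the characteristic equation
$$\int_T \nu(s,t)\, \beta(t)\, dt = \lambda\, \beta(s). \tag{6}$$
Let $V\beta(s) = \int_T \nu(s,t)\, \beta(t)\, dt$; then, we get $V\beta(s) = \lambda\, \beta(s)$ and solve for the characteristic functions and eigenvalues.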

Estimation of the Model.
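Let $\{x_1, x_2, \cdots, x_n\}$ be the observed values of a set of samples $\{X_1, X_2, \cdots, X_n\}$; then, for all $s, t \in T$, the estimates of the mean function $\mu(t)$ and the covariance function $\nu(s,t)$ can be expressed as
$$\hat{\mu}(t) = \frac{1}{n}\sum_{i=1}^{n} x_i(t), \tag{7}$$
$$\hat{\nu}(s,t) = \frac{1}{n-1}\sum_{i=1}^{n}\bigl(x_i(s) - \hat{\mu}(s)\bigr)\bigl(x_i(t) - \hat{\mu}(t)\bigr). \tag{8}$$
Let $\Phi(t) = [\phi_1(t), \phi_2(t), \cdots, \phi_K(t)]'$ be the vector formed by the given basis functions and $c_i = [c_{i1}, c_{i2}, \cdots, c_{iK}]'$ be the vector of basis expansion coefficients; then, $x_i(t)$ can be expressed as
$$x_i(t) = \sum_{k=1}^{K} c_{ik}\, \phi_k(t) = c_i'\, \Phi(t). \tag{9}$$
The basis functions should be chosen to match the properties of the functions being estimated. Generally speaking, there are two types of basis functions; the first is fixed basis functions, among which the common ones are the Fourier basis and the B-spline basis.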
In this paper, the Fourier basis is selected, and $x_i(t)$ is centered (that is, the mean function $\hat{\mu}(t)$ is subtracted). For simplicity, the centered curve is still denoted as $x_i(t)$.
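Let $C = [c_1, c_2, \cdots, c_n]'$ denote the coefficient matrix of $n$ rows and $K$ columns; then, the resulting family of curves can be written in vector form as follows:
$$x(t) = C\, \Phi(t), \tag{10}$$
where $x(t) = [x_1(t), x_2(t), \cdots, x_n(t)]'$. Let $N = n - 1$; the estimated value $\hat{\nu}(s,t)$ of the covariance function is
$$\hat{\nu}(s,t) = N^{-1}\, \Phi(s)'\, C' C\, \Phi(t). \tag{11}$$
Let the characteristic function be expanded in the same basis as $\beta(s) = \Phi(s)' b$, where $b$ is the parameter vector to be estimated. Substituting $\hat{\nu}(s,t)$ for $\nu(s,t)$, Equation (6) can be replaced by
$$\int_T \hat{\nu}(s,t)\, \beta(t)\, dt = \lambda\, \beta(s). \tag{12}$$
Then, we substitute Equation (11) into the left end of Equation (12) and obtain
$$\int_T \hat{\nu}(s,t)\, \beta(t)\, dt = N^{-1}\, \Phi(s)'\, C' C \int_T \Phi(t)\, \beta(t)\, dt. \tag{13}$$
We substitute the expansion $\beta_j(s) = \Phi(s)' b_j$ and Equation (13) into Equation (12) and then obtain
$$N^{-1}\, \Phi(s)'\, C' C\, W b_j = \lambda_j\, \Phi(s)'\, b_j, \tag{14}$$
where $W = \int_T \Phi(t)\, \Phi(t)'\, dt$. Since this holds for all $s \in T$, it reduces to the matrix equation $N^{-1} C' C\, W b_j = \lambda_j b_j$. The characteristic function $\beta_j(s)$ is obtained by solving the eigenvalue problem of this matrix equation. Note that the condition $b_j' W b_j = 1$ (that is, $\|\beta_j\| = 1$) is satisfied.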
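The matrix eigenvalue problem can be sketched numerically as follows. This is a minimal illustration, assuming NumPy and assuming the matrix equation takes the standard form $N^{-1} C' C W b = \lambda b$ with the normalization $b' W b = 1$, where $W$ is the Gram matrix of the basis. The function name `fpca_from_coefficients` and the random test matrices are hypothetical; the Cholesky factorization is one common way to symmetrize the problem and is not taken from the paper.

```python
import numpy as np

def fpca_from_coefficients(C, W, n_components):
    """Solve N^{-1} C'C W b = lambda b subject to b'Wb = 1.

    C : (n, K) basis coefficients of the centered curves
    W : (K, K) Gram matrix, W[j, k] = integral of phi_j(t) * phi_k(t) dt
    Returns eigenvalues (descending), B of shape (K, n_components), and scores.
    """
    n = C.shape[0]
    L = np.linalg.cholesky(W)              # W = L L'
    A = L.T @ (C.T @ C / (n - 1)) @ L      # symmetric form, unknown u = L'b
    eigvals, U = np.linalg.eigh(A)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, U = eigvals[order], U[:, order]
    B = np.linalg.solve(L.T, U)            # b = (L')^{-1} u, so b'Wb = u'u = 1
    scores = C @ W @ B                     # integral of x_i(t) * beta_j(t) dt
    return eigvals, B, scores

# Random stand-in data (not the AQI coefficients): 13 curves, 5 basis functions.
rng = np.random.default_rng(1)
C = rng.normal(size=(13, 5))
C -= C.mean(axis=0)                        # centering = subtracting the mean function
M = rng.normal(size=(5, 5))
W = M @ M.T + 5 * np.eye(5)                # arbitrary symmetric positive definite Gram matrix
eigvals, B, scores = fpca_from_coefficients(C, W, n_components=2)
```

For an orthonormal basis, $W$ is the identity matrix and the problem reduces to an ordinary eigendecomposition of $N^{-1} C' C$.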

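Principal Component Basis of the Family of Air Quality Curves. If $\{f_1, f_2, \cdots, f_L\}$ is a set of orthogonal basis functions, each sample curve $x_i(t)$ can be approximately expressed as
$$\hat{x}_i(t) = \sum_{m=1}^{L} \alpha_{im}\, f_m(t), \tag{15}$$
where $\alpha_{im}$ is the coefficient of $x_i(t)$ on the basis function $f_m(t)$.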

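The global approximation criterion is to minimize the objective function PCASSE (PCA sum of squares due to error), as shown in
$$\mathrm{PCASSE} = \sum_{i=1}^{n} \|x_i - \hat{x}_i\|^2 = \sum_{i=1}^{n} \int_T \bigl[x_i(t) - \hat{x}_i(t)\bigr]^2\, dt, \tag{16}$$
where $\hat{x}_i(t) = \sum_{m=1}^{L} \alpha_{im}\, f_m(t)$ is the approximation of $x_i(t)$ in the orthogonal basis.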

Empirical Study
3.1. Acquisition of Data. Air quality evaluation involves several major pollutants, including fine particulate matter (PM2.5), inhalable particulate matter (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), ozone (O3), and carbon monoxide (CO), and the urban AQI is determined by the maximum of the air quality subindices of the individual pollutants. On the basis of the above theory, in order to study the variation characteristics of air quality in Hubei Province, we select the AQI data of the 13 prefecture-level administrative regions of Hubei Province as the original data for analysis, namely Wuhan, Huangshi, Shiyan, Yichang, Xiangyang, Ezhou, Jingmen, Xiaogan, Jingzhou, Huanggang, Xianning, Suizhou, and Enshi state; the time span is January 1, 2020, to December 31, 2021. AQI data are essentially discrete points rather than function curves, but over time they gradually show functional features, as shown in Figures 1 and 2. It is therefore suitable to regard the data over a relatively long period of time as curves, so the AQI meets the requirements of functional data.
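The rule that the AQI is the maximum of the single-pollutant subindices can be written as a short Python sketch. The subindex values below are hypothetical and are not measured data, and the function name `overall_aqi` and the label "dominant pollutant" are introduced here only for illustration.

```python
def overall_aqi(sub_indices):
    """Return (AQI, dominant pollutant) from a dict of single-pollutant subindices."""
    if not sub_indices:
        raise ValueError("at least one subindex is required")
    # The pollutant whose subindex is largest determines the overall AQI.
    pollutant = max(sub_indices, key=sub_indices.get)
    return sub_indices[pollutant], pollutant

# Hypothetical subindex values for one city-day (not measured data).
iaqi = {"PM2.5": 112, "PM10": 85, "SO2": 12, "NO2": 48, "O3": 63, "CO": 20}
aqi, dominant = overall_aqi(iaqi)
```

With these illustrative values, the PM2.5 subindex is the largest and therefore sets the AQI for that day.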

Data Fitting. In this paper, a set of appropriate basis functions is selected, and the target curve is approximated by a linear combination of the basis functions. From the perspective of functional data, the AQI of the 13 prefecture-level regions is analyzed, and the AQI trend chart of 2021 is drawn. The AQI is low in the middle of the year and high at both ends, with a certain symmetry. The Fourier basis and the B-spline basis are the common fixed bases; the former applies to periodic functional data, while the latter applies to aperiodic functional data. Because the AQI curves are approximately periodic, the Fourier basis is adopted in this paper, and the number of basis functions is set to 33. The AQI curve of each city is then treated as one functional observation.
Next, we use derivatives to characterize the change of AQI. Figure 3(a) shows the first derivative of the 2021 AQI curves, and Figure 3(b) shows the second derivative. The sign of the first derivative indicates whether the AQI is rising or falling: if the first derivative is greater than zero, the AQI is going up, and if it is less than zero, the AQI is going down. The second derivative measures how quickly that rate of rise or fall is itself changing. Figure 3 shows that the rise and fall of AQI in the 13 prefecture-level regions are generally consistent. The first derivative is more often below zero after the end of July, indicating that the AQI of each city began to decline after the end of July 2021; it began to rise again from September and reached its peak in December and January of the next year. However, Figure 3(b) shows that the rate of rise and fall differs among cities.

Functional Principal Component Analysis.
In order to further analyze the change dynamics of AQI over time, we conduct the analysis through functional principal components. The original data of the 13 prefecture-level regions have been converted to functional form. The first four principal components are extracted from the curve families of the air quality changes of the 13 prefecture-level administrative regions of Hubei Province in 2020 and 2021; their cumulative variance contribution rates are 91.48% and 92.34%, respectively, so they represent the vast majority of the information in the curves. The weight functions (characteristic functions) corresponding to the first four principal components in the functional PCA of AQI in 2021 are shown in Figure 4. Then, we can calculate the four principal component scores of AQI in the 13 prefecture-level regions to further analyze their variation characteristics.
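The cumulative variance contribution rate is the running sum of the eigenvalues divided by their total. The Python sketch below shows the calculation, assuming NumPy. The eigenvalues and the 90% threshold are hypothetical, chosen only to illustrate the arithmetic; the paper reports the resulting rates but not the eigenvalues or a threshold. The function name `n_components_for` is also hypothetical.

```python
import numpy as np

def n_components_for(eigenvalues, threshold):
    """Smallest number of components whose cumulative share reaches threshold."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(lam) / lam.sum()
    # searchsorted gives the index of the first cumulative share >= threshold.
    k = int(np.searchsorted(cumulative, threshold) + 1)
    return k, cumulative

# Hypothetical eigenvalues, chosen only to illustrate the calculation.
eigenvalues = [50.0, 25.0, 12.0, 5.0, 4.0, 2.0, 1.0, 1.0]
k, cumulative = n_components_for(eigenvalues, threshold=0.90)
```

With these illustrative eigenvalues, the cumulative shares are 0.50, 0.75, 0.87, and 0.92 for the first four components, so four components are the fewest that exceed the assumed 90% threshold.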

K-Means Clustering Analysis of Principal Component Basis Coefficients. In this paper, the principal component basis coefficients $\alpha_{im}$ are used for K-means clustering. Let K denote the number of clusters. With K = 3 for both 2020 and 2021, the clustering results for 2020 are as follows: the first kind includes Wuhan, Huangshi, Yichang, Ezhou, Jingzhou, Xiaogan, Huanggang, and Suizhou; the second kind includes Shiyan, Xianning, and Enshi; the third kind includes Xiangyang and Jingmen. The clustering results for 2021 are as follows: the first kind includes Wuhan, Huangshi, Yichang, Ezhou, Jingzhou, Xiaogan, Huanggang, Xianning, and Suizhou; the second kind includes Shiyan and Enshi; the third kind includes Xiangyang and Jingmen. The classification results of the two years are roughly the same, indicating that the air pollution characteristics of the 13 prefecture-level regions did not change much between 2020 and 2021. We then take K = 4, so that the air quality curves of the 13 prefecture-level regions are divided into four kinds; the specific clustering results are shown in Figure 5.
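Clustering the principal component coefficients with K-means can be sketched as follows. This is a minimal illustration, assuming NumPy, a plain Lloyd iteration, and a deterministic farthest-point initialization chosen here for reproducibility; the paper does not state which implementation or initialization was used. The score matrix is synthetic (13 rows, 4 columns, four well-separated groups) and does not reproduce the AQI results. The function names are hypothetical.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the point
    farthest from the centers chosen so far."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    return np.array(centers)

def kmeans(X, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centers)."""
    centers = farthest_point_init(X, k)
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic scores (NOT the paper's coefficients): four groups of sizes 5, 3, 3, 2.
rng = np.random.default_rng(42)
group_centers = np.array(
    [[0, 0, 0, 0], [10, 0, 0, 0], [0, 10, 0, 0], [0, 0, 10, 0]], dtype=float
)
true_groups = np.repeat(np.arange(4), [5, 3, 3, 2])
scores = group_centers[true_groups] + rng.normal(scale=0.3, size=(13, 4))
labels, centers = kmeans(scores, k=4)
```

Each row of `scores` plays the role of one region's vector of principal component coefficients, and `labels` assigns that region to one of the four kinds.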

Spatial and Temporal Distribution Characteristics of Air Pollution. According to the clustering results, we take Wuhan, Yichang, Xiangyang, and Enshi as typical regions of the four categories and draw their AQI change curves, which are shown in Figure 6. The monthly AQI statistics of the four cities in 2020 and 2021 are also used to analyze the temporal distribution characteristics of air pollution in Hubei Province.
Hubei Province is located in the climate transition zone between north and south. It has a subtropical monsoon climate with four distinct seasons, cold winters and hot summers, warm springs and cool autumns, and rain and heat in the same season. Precipitation varies markedly between years: the wettest year (2020, 1708 mm) had about twice the precipitation of the driest year (1966, 862 mm), and the provincial average annual precipitation in 2021 was 1212 mm, close to the average level. Precipitation also varies significantly from season to season, with more in summer and less in winter; it is mainly concentrated from May to September, which accounts for about 63% of the annual total. The plum rain period (from mid-June to mid-July) has the most rainfall and the highest intensity.
As can be seen from Figure 6, the AQI in Hubei Province changed in a consistent way in 2020 and 2021, with obvious similarity in its time distribution, periodicity, and seasonal variation. Overall, the AQI is highest in winter, followed by autumn and spring, and lowest in summer. The main pollution problems are ozone pollution in summer and heavy pollution weather in autumn and winter. The AQI values in January and December of each year are larger and reach the annual peak, while the AQI values in July and August are the lowest of the whole year. From November, the AQI fluctuates significantly and peaks again in December, so that the curve as a whole is low in the middle and high on both sides. The precipitation in Hubei Province is concentrated from March to July each year, which coincides with the time when the air quality is better.
Compared with 2020, the amplitude of fluctuation in 2021 increased. Specifically, the AQI curve of Enshi state, which has the best air quality, was relatively stable, with AQI values below 100 throughout the year and air quality grades of good. The AQI values of the other three cities in January and December are generally greater than 100 and less than 150, corresponding to mild pollution, while the AQI values in the other months are lower than 100, corresponding to good air quality. Among them, the AQI ... also an important base of the high-end manufacturing industry in China. The automobile industry is the leading industry in Xiangyang; its scale ranks ninth among the top ten automobile industry cities in China and first among prefecture-level administrative regions. Xiangyang has a north subtropical monsoon climate and a special geographical position: it is located at the southern edge of the Nanxiang Basin, at the end of the basin's ventilation corridor, on the only route by which northern haze moves southward. The northern smog therefore has a direct impact on the city's heavily polluted weather. The air quality index of Xiangyang is thus slightly higher than that of other cities in Hubei Province, which is related to both its industrial base and its special geographical location. Enshi state is an area inhabited by ethnic minorities in China. It has unique climate conditions, a vigorously developing tourism industry, and an AQI that is good all year round. All these are in line with the actual situation.

Conclusion and Discussion
The temporal and spatial distribution characteristics of air pollution in Hubei Province are related, first, to natural factors and geographical location. On the map, air quality generally improves gradually from north to south, which is related to geographical and environmental factors and to human activities. There is sufficient rainfall in the rainy season in the south, which can effectively reduce the content of air pollutants. Hubei Province is a climate transition zone between north and south. Except for alpine areas, most of it has a subtropical monsoon climate, with four distinct seasons, abundant precipitation, cold winters, and hot summers. In autumn and winter, the weather becomes cooler, the temperature drops in the morning and evening, and haze weather increases, which is not conducive to the diffusion of pollutants. In spring and summer, the reduction of air pollution depends on meteorological factors such as stronger wind and frequent precipitation. Second, air pollution is closely related to industrial emissions, coal-fired emissions, motor vehicle emissions, population size, and other human activities. The air quality curves of all prefecture-level administrative regions in Hubei Province fluctuate synchronously. The air pollutant concentration is lower in the months with concentrated precipitation. The plains and basins are affected by pollution factors such as industrial emissions and motor vehicle emissions, and their pollution is more serious than that of the mountainous and hilly regions. The classification of four typical areas of air pollution in Hubei Province helps to determine the hot spots and sources of pollution, to formulate targeted pollution control strategies, and then to explore a regional cooperative management mode of air pollution in Hubei Province.
Heavily polluted cities such as Xiangyang and Jingmen are home to a large number of highly polluting industries such as automobile manufacturing, oil refining and petrochemicals, and cement. Industrial energy consumption is still dominated by coal, and air pollution is mainly related to industrial emissions. Therefore, we should actively increase investment in pollution control, adjust the industrial structure, and take a new road to industrialization.
The air quality changes of the 13 prefecture-level administrative regions in Hubei Province are correlated to some extent on the time scale, so functional correlation analysis can be carried out on the air quality change curves of each prefecture-level administrative region.

Data Availability
The simulation experiment data used to support the findings of this study are available from the corresponding author upon request.