Change Analysis of Spring Vegetation Green-Up Date in Qinba Mountains under the Support of Spatiotemporal Data Cube

In recent decades, global and local vegetation phenology has undergone significant changes due to the combination of climate change and human activities. Current researches have revealed the temporal and spatial distribution of vegetation phenology in large scale by using remote sensing data. However, researches on spatiotemporal differentiation of remote sensing phenology and its changes are limited which involves high-dimensional data processing and analysing. A new data model based on data cube technologies was proposed in the paper to efficiently organize remote sensing phenology and related reanalysis data in different scales. The multidimensional aggregation functions in the data cube promote the rapid discovery of the spatiotemporal differentiation of phenology. The exploratory analysis methods were extended to the data cube to mine the change characteristics of the long-term phenology and its influencing factors. Based on this method, the case study explored that the spring phenology of Qinba Mountains has a strong dependence on the topography, and the temperature plays a leading role in the vegetation green-up date distribution of the high-altitude areas while human activities dominate the low-altitude areas. The response of green-up trend slope seems to be the most sensitive at an altitude of about 2000 meters. This research provided a new approach for analysing phenology phenomena and its changes in Qinba Mountains that had the same reference value for other regional phenology studies.


Introduction
The phenology is an important indicator reflecting the status of vegetation growth as it responds to the ecological environment correspondingly [1]. Meanwhile, vegetation phenology has impact on the food supply, animal activities, and human health, further affecting the carbon budget and material recycling process at a global scale [2]. Therefore, the phenology observation and analysis are a critical way to understand the changes of the environment. In recent years, studies have found vegetation phenology changed significantly since the last century due to dual influence of global climate change and human activities. Several analyses have revealed the earlier green-up dates and delayed dormancy. Although the researches have shown no trends in spring and autumn phe-nology during the global warming hiatus (1998)(1999)(2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012) [3], the green-up date of some climate-sensitive areas still had an obvious advancement, such as Qinba Mountains. As the transitional belt between the north and south climate zones of China, this area shows complex topography and spatiotemporal heterogeneity of climate. More than six cities are distributed in this area. It is essential to understand the spatiotemporal differentiation of the vegetation phenology and related influencing factors in Qinba Mountains.
Nowadays, the network of site-based phenology observation has been established in some regions/countries [4][5][6]. The long-term FLUXNET measurements were analysed to present no widespread advancement of vegetation green-up date within the warming hiatus [3]. The phenology modelling methods based on the in situ observation of specific plants were also proposed to predict the trends with the climate changes [7]. The in situ observation data has high quality but limited site numbers. In Qinba Mountains, complex topography and rich biodiversity lead to great spatial heterogeneity of phenology distribution and its changes [8]. The observation data of limited monitoring sites cannot describe the spatial distribution of the phenology in the entire research area properly.
Remote sensing data provided long-term observations for large-scale land surface phenology research. The NDVI (normalized difference vegetation index)/EVI (enhanced vegetation index) dataset was used widely to find SOS (start of the growing season), EOS (end of the growing season), and LOS (length of the growing season) accordingly with the global climate change [9]. Various available remote sensing data source and related phenology parameter extraction methods established a good scientific foundation for the further analyses of the change characteristic [10][11][12]. The remote-sensed phenology map described the considerable spatial differentiations, especially between urban and rural areas, which suggested the potential influence of an urban heat island caused by human activities [13,14]. Land cover change and vegetation type were also considered to examine the differences of phenology change rate in the northern high-latitude areas [11]. In Qinba Mountains, several studies showed the overall spatial distribution of the spring phenology has an advanced trend [15,16]. The spatial pattern was consistent with the local hydrothermal conditions and has strong topographic dependence in the area [17,18].
The complex topography and special geolocation of Qinba Mountains lead to various types of climatic conditions with differentiations on several aspects, such as longitude, latitude, altitude, and slope. Therefore, it is necessary to explore the pattern of the phenology distribution in the area from multiple aspects. Although phenology change in Qinba Mountains draws much attention, studies of comprehensive change analysis from multiple dimensions are limited. Current studies by using remote sensing data mainly focused on the parameter extraction, overall spatial patterns, and trends of phenology. Such an analysis involves spatial and temporal unified computing and high-dimensional data processing of remote sensing data. A traditional remote sensing data model is a layer-based stack that is not efficient for multidimensional map algebra. In addition, the inconsistency of multivariate data scales poses great challenges to this research. Currently, data cube technologies become more and more popular for remote sensing data processing [19,20]. Data cube organizes gridded data through multidimensional arrays and is indexed by coordinates. Therefore, its dimensional independence and array operations help to achieve high-performance computing. The data cube can handle large-scale data efficiently if combined with an array database and parallel procedures [21,22]. Based on this idea, the EO Data Cube provides comprehensive platform supports from data acquisition to analysis services [23,24]. This paper built a tidy data model under the support of data cube technologies. Based on this model, the change analysis of vegetation green-up date along multiple dimensions has been investigated to discover more spatiotemporal differentiation.
The natural exploratory analysis method in the data cube has been used to understand climate-phenology-human interactions in the area. This research provided a new approach for analysing phenology phenomena and changes in Qinba Mountains that had the same reference value for other regional phenology studies.
The paper is organized as follows: Section 2 describes the data sources and methodology. Section 3 presents the results and analysis works based on the data cube. Section 4 and Section 5 finish with discussion and conclusions, respectively.  Figure 1). Its western part is connected to Qinghai-Tibet Plateau, and the east reaches the North China Plain. The landscape of mountains is dominant in the area with several basins, such as Hanzhong, Ankang, and Shandan, distributed in the middle. Due to the diverse hydrothermal conditions and the significant vertical changes, there are three climate types from the south to the north in Qinba Mountains: north subtropical climate, subtropical-warm temperate transitional climate, and warm temperate climates. The research area within the black boundary as shown in Figure 1 is covered by geographical regions of 80 counties with a total area of 230,000 km 2 and a maximum elevation of over 4000 m.

Data and Methodology
We utilized the long-term vegetation green-up products in the period of 1981-2016 from NASA MEaSUREs Project which fused measurements from different satellite missions and sensors: AVHRR (Advanced Very High Resolution Radiometer), MODIS (Moderate Resolution Imaging Spectroradiometer), and VIIRS (Visible infrared Imaging Radiometer). The products had been validated with the ground observations from US national phenology network (US-NPN) at a climate modelling grid (CMG, 0.05 DEG) resolution (https://vip.arizona.edu).
SRTM (Shuttle Radar Topography Mission) digital elevation data (DEM) was resampled to 5 km resolution by the nearest-neighborhood methods that were then used to generate slope and aspect data with the same resolution. Land use data in the year 2015 was derived from 500 m land cover products provided by Climate Change Initiative (CCI) of the European Space Agency (https://www.esa-landcover-cci .org). All the land use types in raw data were reclassified into six groups: settlement place, cropland, forest cover, other natural vegetation, water body, and bare land. The meteorological data are provided by the Meteorological Science Data Sharing Service Network of China (http://data.cma.cn), including the daily average temperature and precipitation from 43 national meteorological stations within the 40 km buffer zone of the study area. Daily meteorological data were interpolated to raster data at 5 km resolution by Ausplin4.2 tools using thin plate smoothing splines and DEM data. All the data were projected onto Albers equal-area conic projection along with a WGS 1984 coordinate system, which is 2 Journal of Sensors considered a standard projection and coordinate system throughout our study.

Data Cube Model.
A data cube has been applied in serval domains for multidimensional data managing and processing [25]. Therefore, there are no standards for the model definition and operations. Normally, a data cube can be seen as the specified container of multidimensional arrays with spatial and temporal dimensions at least. Certainly, it is able to contain more numbers of arbitrary dimensions to be a hypercube. The data cube used in this paper can be defined as a tuple of three elements: <D, C, A, M>, where D represents the dimension set including space and time, and the space can be expressed as one or two dimensions (see Figure 2). The combination of all the dimensions is used to locate a fixed point-named cell in the cube. A is the attribute associated with the cube, such as name or projection information.
One cell stores a series of measurements (M in the tuple) from various data sources, such as temperature, precipitation, and phenology data. There are three types of measurements in the cube: numeric, proportional, and typological metrics. The numeric metric means a single value, such as temperature. The proportional metric is strutted as a dictionary recording the proportion of multiple categories in the cell, such as land covers. If one cell is covered by one type of land use, we employ specified integer to code instead, namely, typological metric. In the tuple, C means the coordinate set, and each coordinate must keep the same size with some of the dimension members. A coordinate is used to describe the dimension from special perspective. For example, we define DEM as a coordinate to label every cell in the space by elevation. That means different labelling or indexing rules can be established by coordinates beyond dimensions. It is worth noting that the data of the coordinate can be converted to or from the dimension and measurement that facilitates realizing different kinds of operation, such as slice, aggregate, and pivot.
All of data modelling work is based on the Xarray which is a python library designed for multidimensional array computing [26]. The object-oriented cube implementation includes DimensionSet, CoordinateSet, Measurement-Set, and AttributeSet. Both Measurement and Dimension are the types of DataArray object that support 1D time series, 2D array, or array of arbitrary dimensions. The Coordinate object is inherited from Dimension objects; thus, it can be transformed to Dimension. The data cube inherits the mathematical capabilities of multidimensional arrays, and the spatial and temporal functions are able to execute simultaneously to generate new cubes. For instance, Cube ′ = α Cube + β, where Cube is the input data and α/β is the linear  3 Journal of Sensors parameter. The proposed cube model is cached in memory, and NetCDF (Network Common Data Form) file is supported to store the big cube data in the disk. The cube keeps the file handle open and lazily loads its contents from on-disk datasets to improve the efficiency of memory utilization.

Spatiotemporal Pattern Analysis by Cube Aggregation.
Due to the high dimensionality and large amount of data cube, the spatiotemporal pattern cannot be recognized visually. Therefore, the data cube needs to be aggregated according to a certain scale on one or several dimensions to reduce dimensionality. Then the summarized maps/charts that conform to the human visual cognition would help to visually analyse the spatiotemporal patterns of metrics along different perspective/dimensions.
There are only space and time dimension in the proposed data cube; thus, the cube aggregations can be categorized into three types: temporal aggregation, spatial aggregation, and spatiotemporal aggregation (see Figure 3). The temporal aggregation summarized the cube into time bins, such as monthly average precipitation. The spatial aggregation divides the space into several zones according to values or geographical boundaries and then aggregates the data within the zones. The spatiotemporal aggregation generates spacetime zones for further statics, which is useful when analysing the impact of climate on phenology. For example, the corresponding cumulative rainfall on the cell, where each SOS is located, can be calculated since the time bins for cumulative rainfall on these cells are different. The cube aggregations can be described by Zonal function in multidimensional map algebra proposed by Mennis [27] as follows: where zone is the aggregation zones of specific dimensions and f indicates the aggregate functions, such as the mean/sum/variance. Based on the data cube model, the Zonal function is realized through three steps: (1) convert the data of coordinate for aggregation to the members of cube dimension, (2) construct the bins of the resultant dimension for aggregation, and (3) group the data of cube by bins and then carry out aggregate functions for each bin.

Trend
Analysis in the Cube. The Mann-Kendall method was widely used to examine the trend because it does not assume a specific distribution for the data and is not disturbed by outliers [28,29]. We utilized the method to test the trend for the time-series values in each bin of the cube. The formulas are as follows: where x j is the jth value in the current bin and n indicates the number of the bin's item. The trend for each bin time series is recorded as a z-score and a p value. The Theil-Sen method can be calculated for each bin to estimate the slope of the Mann-Kendall trend as follows [30]:  Figure 2: UML diagram of the data cube model. 4 Journal of Sensors Based on the result of the data trend analysis, the Getis-Ord Gi * statistic was employed to identify trends (hot spot) in the cube that had passed the significance test [31]. The formulas are as follows: where x j is the value of the cell in spatiotemporal bin j, w i,j is the spatiotemporal weight between bin i and bin j, and n indicates the numbers of bins with spatiotemporal proximity to bin i. Once Getis-Ord Gi * statistic was completed for each bin, the hot spot trends were tested using the Mann-Kendall method. The results of the data trends and hot spot trends were combined to find new, intensifying, diminishing, and sporadic hot and cold spots in the cube. ArcGIS 10.5 is utilized to carry out host spot analysis. The proposed data cube supports reading or writing NetCDF data that can be directly piped to ArcGIS hot spot analysis tool by using ArcPy (python library for ArcGIS).

Results and Analysis
3.1. Data Cube Building. Two data cubes were generated: the SOS data cube and climate data cube and the basic structures of them are shown (see Table 1).
There are two types of measures in the SOS data cube. One of them is SOS represented by days of year (DOY), and a 5 km grid is used as a spatial dimension. Another one is land use data that are aggregated into the same grid. Due to the use of map projection, the latitude information and longitude information are regarded as coordi-nates in the cube to facilitate further analysis based on geographical locations.
Time series climate data in the period of 13149 days are packaged into the climate data cube where two types of aggregate rules are built to generate new measurements (temperature and precipitation) for the SOS data cube. The first one is yearly aggregation by aggregating data within yearlength bins. The other one is aggregating data within the time bins, in the form of (SOS, SOS + N month], based on the SOS for each location. We denote the average value aggregated within one month before SOS as 1 m Lag and the summarized value aggregated within one month before SOS as Σ1 m Lag. The aggregated climate data are injected into the SOS data cube as new measurements.

Spatiotemporal Pattern Analysis. The topography of
Qinba Mountains is diverse with multidimensional characteristics, and the climate also varies in different dimensions, such as latitude, longitude, elevation, and slope. These differences may lead to a high degree of complexity, diversity, and heterogeneity in the phenology of Qinba Mountains. Figure 4 presents the spatial distribution of average SOS between year 1981 and 2016 that indicates obvious consistency with the distribution of the topography shown in Figure 1. The cities and residential areas are distributed in the valleys or basins which are located in the northern, southern, eastern, and central parts of the area. Therefore, the impact intensity of human activities on the environment should have a strong correlation with the topographical distribution. The spatiotemporal patterns of SOS were analysed separately according to the three topographic factors as follows.
The SOS varying with increased elevation was calculated by aggregation SOS into elevation bins as shown in Figure 5(a). It appears that SOS becomes earlier while elevation increases. In the range below the elevation of 700 m, SOS increases greatly at first, then decreases a little. The possible reason is that human activities are more frequent in this zone, and the SOS gradually returns to the natural level at around 700 m. Temperature becomes the major controlling factor in phenology at high altitudes. When it reaches 3500 m, the upward trend of SOS is gradually flattening. The time-   Figure 5(b) illustrates these changes more clearly, and it also indicates the SOS gets advancement year by year for the same elevation bins.
Slope is one of the important topographic factors that directly affects soil type, soil moisture, and nutrients and thus lead to different vegetation types. According to Figure 6(a), SOS increased significantly in the zone with a slope of 0°-5°w hile growing slowly in 5°-25°. When it reaches 25°, the growth rate becomes obvious. Figure 6(b) illustrates these changes in a 2D space. The tests demonstrated that the slope has a preventive effect on vegetation green-up.
The aspect is able to redistribute the sunlight, energy, and rainfall in a local environment. We define two aspect bins The SOS data cube was aggregated along both the aspect and elevation dimensions (Figure 7). The south-north slope differentiation of SOS is not obvious at an altitude of 1500 m or less. The SOS gap between the south slope and the north slope appears within the zone of (1500 m, 2000 m], but becomes smaller gradually until year 2000. In areas above 2000 m elevation, there have been significant SOS differences between the north and south slopes. The result reveals that the higher the altitude, the more obvious the south-north slope differentiation of SOS. However, the differentiation will gradually disappear with the global warming trend.

Trend
Analysis. The Mann-Kendall trend test was performed on the SOS data cube for the period from 1981 to   Figure 8. It can be seen that SOS has an advance trend in most areas, except for the areas where human activities are more frequent. In order to examine the response sensitivity of the SOS slope to different altitudes, we aggregated the trend slopes into different elevation bins (see Figure 9). The result shows that there is no significant change of the slope in the lowaltitude area. When the elevation increases over 600 m, the negative slope gradually increased. At around 2000 m, the response of slope seems to be the most sensitive. It would decrease as the altitude continues to rise. The test demonstrates that SOS changes in high-altitude areas are more sensitive than those in low-altitude areas, and the effects of human activities on low-altitude areas on SOS changes are diverse.
The slope analysed above reflects the overall trend of SOS. In order to identify trends and investigate the change process, the Getis-Ord Gi * statistic method is used in the SOS data cube (hot spot analysis). As shown in Figure 10, the result has been categorized into eight clusters: persistent cold spot, intensified cold spot, diminishing cold spot, consecutive cold spot, oscillating cold spot, oscillating hot spot, diminishing hot spot, and persistent hot spot.
A persistent cold spot represents a statistically significant cold spot for 90% of the time-step intervals with no discernible trend. These types of spots are concentrated in the urban and small town area where the human activities are always frequent for a long time. The area is surrounded by intensified cold spot, diminishing cold spot, and constitutive cold spot. The intensified cold spot located in rural areas means the SOS is getting earlier while diminishing cold spot indicates the delaying trend. A consecutive cold spot describes a location with a single uninterrupted run of statistically significant cold spot bins in the final time-step intervals. The above different phenomena may be caused by different land use patterns by human in the region, such as bringing in exotic trees for city greening or planting different crops.
The area covered by oscillating cold spots is at medium and low altitudes, surrounding the residential areas. For this 7 Journal of Sensors type of area, there has been a statistically significant hot spot in the past, but a statistically significant cold spot recently. The result indicates this area may be affected by the superposition of climate change and human activities, so the SOS trend fluctuates greatly. Most of the high-altitude areas are covered by diminishing hot spots, and the edge area is distributed with an oscillating hot spot. Nonchanging hot spots only occupy a small part of the western region.

Discussions
Temperature and precipitation are direct factors affecting vegetation phenology. In mountainous areas, topography would cause differences in soil nutrients, soil water content, local temperature, and even rainfall distribution. Therefore, the spatial distribution would be affected by topography indirectly as shown in previous analysis. Based on the climate data cube and SOS data cube, this section gave exploratory analysis of potential influence factors by quantitative means, taking the data of year 2015 as an example.
Based on the least squares method, the linear relationships between temperature/rainfall/elevation/slope/aspect and SOS were calculated, respectively. The results are shown in Table 2 where the time bins of temperature are the month of SOS (0 m Lag), one month before SOS (1 m Lag), and the year of SOS (yearly). The time bins of precipitation are the month of SOS (0 m Lag), one month before SOS (Σ1 m Lag), two months before SOS (Σ2 m Lag), and the year of SOS (yearly). It can be obtained that the temperature of SOS's month and the accumulated precipitation before two months have a higher correlation with SOS. In addition, the correlation coefficient of elevation is the largest among all factors.
The least squares linear regression model and a geographically weighted regression model were established based on the four independent variables (temperature of 0 m Lag, precipitation of Σ2 m Lag, slope, and aspect), resulting in the adjusted R 2 of 0.73 and 0.85, respectively. Since the temperature uses elevation as the covariate during the interpolation processing, it has a large collinearity with elevation, and then, we remove the elevation from independent variables.

Journal of Sensors
Geographically weighted regression considers the nonstationarity of space and thus has a better overall fitting effect than linear regression as shown in Figure 11(a). However, the results in the southern and northern areas are not ideal that may be related to the vegetation disturbance caused by human activities in these areas. We masked the area where a forest land accounted for less than 70% and performed the geographically weighted regression again. The adjusted R 2 reaches 0.91, indicating that this model is suitable for explaining the distribution of SOS in the forest land for the region    Figure 11(b)). The test further illustrates the climate of mountains has a strong dependence on the topography, and the temperature plays a leading role in the spring phenology distribution of the region. For other areas, the land use/cover should be combined to measure the intensity of human activities for further attribution analysis in the future work.

Conclusions
The above analysis reveals the spatiotemporal differentiation and change characteristics of the spring phenology distribution in Qinba Mountains. The spatiotemporal data cube provides a good support for the whole analysis process in terms of data structure, multidimensional analysis, and exploratory analysis. Compared with traditional methods, this approach helps to explore hidden spatiotemporal pattern and its impacting factors. It provides a new perspective on studying phenology, which can be a complement to the current methods and techniques. In addition, the proposed data cube has the potential to connect to more exploratory analysis tools and spatial statistical algorithms. The topography factors including elevation, slope, and aspect were considered in the paper. However, the mass elevation effect [32] and plant species are also the critical influence factors to the local climate, which have not been analysed here. The future work will include these factors in the study as well as land use/cover. In the absence of ground observation validation, the exploratory analysis of the relative change and trend in spatiotemporal distribution by using remote sensing data still has great advantages. For regional analysis, higher resolution remote sensing data of phenology is required.

Data Availability
The dataset of daily average temperature and precipitation used in this paper is publicly available from the Meteorological Science Data Sharing Service Network of China (http://data.cma.cn). The vegetation green-up products in the period of 1981-2016 are also provided freely by Vegetation Index and Phenology Lab, University of Arizona, USA (https://vip.arizona.edu).

Conflicts of Interest
The authors declare that they have no competing interests.