Seasonal Multifactor Modelling of Weighted-Mean Temperature for Ground-Based GNSS Meteorology in Hunan, China

1 Hunan Provincial Key Laboratory of Clean Coal Resources Utilization and Mine Environmental Protection, Hunan University of Science and Technology, Xiangtan 411201, China
2 SPACE Research Centre, School of Science, RMIT University, Melbourne, VIC 3000, Australia
3 Hunan Meteorological Observatory, Changsha 410118, China
4 School of Environment Science and Spatial Informatics, China University of Mining and Technology, Xuzhou 221116, China


Introduction
Atmospheric water vapor is a minor constituent of the atmosphere by mass and is concentrated in the lower atmosphere, but it is one of the main driving forces of weather changes and atmospheric circulation [1]. A large amount of water vapor accumulating over an area for a period of time may lead to thunderstorms and other weather disasters. The dynamic variation trend of water vapor is therefore an important factor for climate prediction and weather forecasting [2]. However, it is not practical to observe water vapor at a high spatiotemporal resolution with traditional meteorological sensors (e.g., radiosonde balloons and water vapor radiometers) because of their high operational costs and sparse sampling; for example, most radiosonde balloons are launched only twice per day [3]. It is therefore difficult to trace the dynamic variation of water vapor in small- and medium-scale weather systems in a timely and accurate manner [4][5][6][7], especially for the detection and prediction of sudden rainstorms.
Nowadays, Global Navigation Satellite Systems (GNSS) have heralded a new era for the retrieval of atmospheric precipitable water vapor (PWV) owing to their 24-hour availability, global coverage, high accuracy, high spatiotemporal resolution, and low cost [7][8][9][10]. The atmospheric parameter directly estimated from the GNSS measurement is the zenith tropospheric delay (ZTD) of the GNSS signal. The ZTD comprises the zenith hydrostatic delay (ZHD) and the zenith wet delay (ZWD). PWV can be converted from the ZWD together with other atmospheric variables. Although the GNSS-derived ZTD can be directly assimilated into numerical weather prediction (NWP) [4,5], PWV has the potential to be used for studies of severe weather [11][12][13][14] and climate change [15,16]. Previous studies [13,14,17] have shown that most severe rainfall events occurred in the descending part of the PWV time series over a station after a long ascending period. Rain is likely after a steep ascent followed by a sudden descent in PWV. Moreover, Benevides et al. [18] suggested that the reliability and accuracy of severe weather forecasts could be improved by analyzing 2D or 3D variation in PWV fields with the aid of other meteorological data [13][18][19][20].
GNSS-derived PWV is obtained by multiplying the ZWD by a conversion factor $\Pi$ [21,22], which is a function of the weighted-mean temperature ($T_m$) over the station [8]. Therefore, the accuracy of $T_m$ affects the accuracy of the resultant PWV [23,24]. The most accurate method for obtaining $T_m$ is to use both temperature and humidity profiles from radiosonde data [8,25]. However, most GNSS stations have no colocated radiosonde station, because radiosondes have high operational costs and a low spatiotemporal resolution. The commonly used alternative is to derive $T_m$ from the surface temperature ($T_s$) through a $T_m$-$T_s$ relationship [26][27][28][29], for example, the empirical Bevis $T_m$ model (BTM), $T_m = 0.72\,T_s + 70.2$, established in 1992 mainly for real-time applications. The BTM was derived from the profiles of vapor pressure and dew point temperature from 8718 observations over 13 midlatitude radiosonde stations (27°N to 65°N) in North America over a 2-year period. If $T_s$ observed by a meteorological sensor is applied, the BTM can achieve a good accuracy. However, owing to the rapid spatiotemporal variation in atmospheric wet profiles, the $T_m$-$T_s$ relationship may vary with location and season, and the BTM has been found to perform unevenly across the globe. In China, for example, the systematic bias of the BTM relative to radiosonde-derived $T_m$ is generally above 4 K, with an extreme value of 8 K in some regions or seasons [30,31]. Under extreme weather conditions with large amounts of water vapor, the BTM's error can result in an error of several millimeters in the resultant PWV [23].
Researchers have studied the $T_m$-$T_s$ numerical relationship using linear or nonlinear regression methods based on local radiosonde observations. Many regional $T_m$ models (RTMs) have been established all over the world [31][32][33][34][35]. Some RTMs have been established in China [36][37][38][39][40][41][42][43][44][45]; for example, the RTM established by Liu et al. [37] outperformed the BTM in the Hong Kong region. Li and Mao [36] verified the good accuracy of the $T_m$-$T_s$ relationship using radiosonde data from the Beijing observatory and obtained monthly coefficients of the RTM for use in eastern China. Lv et al. [38] obtained an RTM for the Chengdu region using radiosonde data. Yu and Liu [39] found that the accuracy of $T_m$ derived from the BTM was correlated with altitude. The accuracies of all these RTMs were found to be better than that of the BTM.
Different from the above one-factor (i.e., $T_s$) models, some researchers established multifactor RTMs by adding air pressure ($P_s$) and water vapor pressure ($e_s$) as new variables in the model [40,44,45]. Gong [40] analyzed the relationship between $T_m$ and each of the three above-mentioned factors based on 123 radiosonde stations all over China during 2008-2011 and established both one-factor and multifactor linear RTMs for different climate regions and seasons. He found that the multifactor RTM slightly outperformed the one-factor RTM. Nevertheless, Wang et al. [45] claimed no significant difference between one-factor and multifactor (e.g., $T_s$ and $e_s$) linear regression results in Hong Kong.
Since 2011, the Hunan Continuously Operating Reference Stations (HNCORS) network has been established by the Meteorological Bureau and National Land Agency of Hunan Province, China. It consists of 93 CORS stations covering the whole Hunan region [46], and the meteorological applications of the HNCORS have been put on the agenda. If local radiosonde data are available, a new RTM may be developed, and it may perform better than the BTM. There are three radiosonde stations (Changsha, Huaihua, and Chenzhou) in the Hunan area, and a three-year period (2012-2014) of radiosonde data recorded at these stations was available. Thus in this study we used the data from the period 2012-2013 to establish a new RTM, and its accuracy was evaluated using the following two years' (i.e., 2013-2014) radiosonde-derived $T_m$ (as the truth). This may provide the foundation for the applications of ground-based GNSS meteorology in the Hunan region.
The outline of this paper is as follows. The methodology for obtaining $T_m$ from radiosonde data and the relationship between $T_m$ and other meteorological factors are analyzed in the second section, followed by the establishment of several multifactor RTMs. In the third section, several sets of $T_m$ derived from the BTM, a one-factor RTM, a two-factor RTM, a three-factor RTM, and seasonal two-factor RTMs are compared against radiosonde-derived $T_m$ (as the truth) to assess their performance. Conclusions are given in the last section.

Methods and Materials
2.1. Obtaining $T_m$. The constant, Bevis formula, numerical integration, and approximate integration methods can be used to obtain $T_m$. The required factors, the difficulty of implementation, and the accuracy level of these four methods are compared in Table 1.
Among these methods, the constant method is the simplest but has the lowest accuracy; the Bevis formula is the most widely used, especially for real-time applications, but its accuracy may vary with location and season; the approximate integration needs the temperature lapse rate and the vapor pressure decline rate, which cannot be obtained easily, and usually leads to a low accuracy; the numerical integration is the most accurate method and is also easy to implement [37]. Hence the numerical integration is adopted in this study. Its mathematical expression is [47]

$$T_m = \frac{\int (e/T)\,dh}{\int (e/T^2)\,dh} \approx \frac{\sum_i (e_i/T_i)\,\Delta h_i}{\sum_i (e_i/T_i^2)\,\Delta h_i}, \tag{1}$$

where $i$ is the $i$th pressure level and $e_i$, $T_i$, and $\Delta h_i$ are the partial pressure (in hPa) of water vapor, atmospheric temperature (in Kelvin), and thickness (in m) of the layer, respectively.
In fact, vapor pressure is a measure of the amount of moisture in air. Technically, it is the pressure of water vapor above a water surface. It is computed from the dew point temperature $T_d$, where $T_d$ is in Celsius, using Equation (3), which is given by the World Meteorological Organization in 2008 [48].
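The layer-wise summation in (1) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names are hypothetical, each layer value is assumed to be the mean of its two bounding levels, and the vapor pressure is computed from the dew point with a Magnus-type formula using the coefficients 17.62 and 243.12 of the WMO (2008) guide.

```python
import math


def vapor_pressure_hpa(dew_point_c):
    """Vapor pressure (hPa) from dew point (Celsius), Magnus-type formula.

    Assumption: coefficients 17.62 and 243.12 as in the WMO (2008) guide.
    """
    return 6.112 * math.exp(17.62 * dew_point_c / (243.12 + dew_point_c))


def weighted_mean_temperature(heights_m, temps_k, dew_points_c):
    """Approximate T_m = sum(e/T * dh) / sum(e/T^2 * dh) over profile layers.

    Levels must be ordered from the surface upward. Layer values are taken
    as the mean of the two bounding levels (an assumption of this sketch).
    """
    numerator = 0.0
    denominator = 0.0
    for i in range(len(heights_m) - 1):
        dh = heights_m[i + 1] - heights_m[i]
        t_layer = 0.5 * (temps_k[i] + temps_k[i + 1])
        e_layer = 0.5 * (vapor_pressure_hpa(dew_points_c[i])
                         + vapor_pressure_hpa(dew_points_c[i + 1]))
        numerator += e_layer / t_layer * dh
        denominator += e_layer / t_layer ** 2 * dh
    return numerator / denominator


# Hypothetical three-level profile, for illustration only.
tm_example = weighted_mean_temperature(
    [0.0, 1000.0, 2000.0], [290.0, 284.0, 278.0], [15.0, 8.0, 0.0])
```

Because the moist lower layers dominate both sums, the resulting $T_m$ lies between the coldest and warmest layer temperatures and is weighted toward the near-surface values.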

Obtaining PWV from GNSS-ZWD.
Generally, the ZTD of GNSS signals can be estimated using undifferenced precise point positioning (PPP) or differential strategies. In this study, undifferenced PPP is adopted [14]. The ZTD is usually divided into two parts, the ZWD and the zenith hydrostatic delay (ZHD); about 90% of the ZTD is induced by dry air in the atmosphere. The ZWD is mainly caused by atmospheric water vapor, which varies rapidly in both space and time, so if the ZWD is obtained from an empirical model, its error is at the level of 10%-20% of the ZWD value.
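The ZHD can be calculated using the most commonly used Saastamoinen model, as expressed below [49]:

$$\mathrm{ZHD} = \frac{0.0022768\,P_s}{1 - 0.00266\cos(2\varphi) - 0.00028\,h_s}, \tag{4}$$

where the ZHD is in meters and $P_s$, $\varphi$, and $h_s$ are the surface pressure (hPa), geographic latitude, and altitude (km) of the station, respectively.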
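As an illustration of the Saastamoinen model, the following Python sketch evaluates the ZHD from surface pressure, latitude, and station altitude. The function name is hypothetical, and the constants are those of the widely used form of the model (ZHD in meters, pressure in hPa, altitude in km); the input values are illustrative only.

```python
import math


def saastamoinen_zhd_m(pressure_hpa, latitude_deg, height_km):
    """Zenith hydrostatic delay (m) from the commonly used Saastamoinen form."""
    # Gravity correction term depending on latitude and station altitude.
    f = (1.0
         - 0.00266 * math.cos(2.0 * math.radians(latitude_deg))
         - 0.00028 * height_km)
    return 0.0022768 * pressure_hpa / f


# Illustrative inputs: 1000 hPa at 28 degrees latitude and 0.1 km altitude.
zhd_example = saastamoinen_zhd_m(1000.0, 28.0, 0.1)
```

With this scaling, a pressure error of 0.5 hPa changes the ZHD by roughly 1 mm, which is consistent with the millimeter-level accuracy quoted for the model.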
The accuracy of surface pressure measured by meteorological sensors is generally 0.2-0.5 hPa, so the accuracy of the ZHD calculated from the Saastamoinen model can reach the millimeter level. The accuracy of the GNSS-PPP-derived ZTD can also be at the millimeter level; thus the accuracy of the ZWD, calculated by ZWD = ZTD − ZHD, is at the millimeter level as well [50].
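The ZWD can be converted to PWV by the following formulas:

$$\mathrm{PWV} = \Pi \cdot \mathrm{ZWD}, \qquad \Pi = \frac{10^{6}}{\rho_w R_v \left(k_3/T_m + k_2'\right)}, \qquad k_2' = k_2 - m\,k_1,$$

where $\Pi$ is the conversion factor; $\rho_w$ is the density of liquid water (1 g/cm$^3$); $R_v$ is the specific gas constant for water vapor (461 J/K/kg); and $m$ is the ratio of the molar mass of water vapor to that of dry air. All quantities must be expressed in consistent units when $\Pi$ is evaluated. The values of the three physical constants are $k_1 = 77.60 \pm 0.05$ K/mb, $k_2 = 70.4 \pm 2.2$ K/mb, and $k_3 = (3.739 \pm 0.012) \times 10^5$ K$^2$/mb, and the value $k_2' = 22.1 \pm 2.2$ K/mb set by Bevis et al. [8] is most commonly used.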
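The conversion can be illustrated with a short Python sketch. The function names are hypothetical; the sketch assumes the commonly used form of the conversion factor, $10^6 / [\rho_w R_v (k_3/T_m + k_2')]$, with the refractivity constants converted from K/mb to K/Pa so that all quantities are in SI units.

```python
def conversion_factor(tm_k, k2_prime_k_per_mb=22.1, k3_k2_per_mb=3.739e5):
    """Dimensionless factor that maps ZWD to PWV for a given T_m (K)."""
    rho_w = 1000.0  # density of liquid water, kg/m^3 (1 g/cm^3)
    r_v = 461.0     # specific gas constant of water vapor, J/(kg K)
    # 1 mb = 100 Pa, so constants per mb are divided by 100 to get per Pa.
    k2_prime = k2_prime_k_per_mb / 100.0
    k3 = k3_k2_per_mb / 100.0
    return 1.0e6 / (rho_w * r_v * (k3 / tm_k + k2_prime))


def pwv_from_zwd(zwd_mm, tm_k):
    """PWV (mm) from ZWD (mm) and weighted-mean temperature (K)."""
    return conversion_factor(tm_k) * zwd_mm


# Illustrative values: T_m = 280 K and ZWD = 200 mm.
factor_example = conversion_factor(280.0)
pwv_example = pwv_from_zwd(200.0, 280.0)
```

For $T_m$ values typical of the lower troposphere the factor is close to 0.16, so a ZWD of 200 mm corresponds to a PWV of roughly 32 mm, and the factor increases slowly with $T_m$.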

Data Source.
In this study, radiosonde data collected twice per day from balloon-borne instrument platforms with radio transmission during the period 2012-2014 at the aforementioned three radiosonde stations (Changsha, Huaihua, and Chenzhou) in the Hunan region (Figure 1) were used to calculate the time series of $T_m$ over the three stations using (1). The time series of $T_m$ (two $T_m$ values per day) and the corresponding meteorological factors (e.g., $T_s$, $e_s$, and $P_s$) from the three stations during the period 2012-2013 were used to establish multifactor RTMs, and the $T_m$ values resulting from the radiosonde data in the period 2013-2014 were used to assess the performance of the new RTMs.
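The overall performance of the new multifactor RTMs was measured by the bias and RMS of the $T_m$ time series at each station, as defined below:

$$\mathrm{bias} = \frac{1}{N}\sum_{i=1}^{N}\left(T_{m,i}^{\mathrm{RTM}} - T_{m,i}^{\mathrm{true}}\right), \qquad \mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(T_{m,i}^{\mathrm{RTM}} - T_{m,i}^{\mathrm{true}}\right)^{2}},$$

where $T_m^{\mathrm{RTM}}$ is the new RTM-derived (or predicted) $T_m$, $T_m^{\mathrm{true}}$ is the radiosonde-derived $T_m$, and $N$ is the number of samples.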
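The two statistics are straightforward to compute. The following Python sketch uses a hypothetical function name and made-up sample values, purely to illustrate the definitions of the bias and RMS of predicted values relative to the truth.

```python
import math


def bias_and_rms(predicted, truth):
    """Return (bias, RMS) of the differences predicted - truth."""
    diffs = [p - t for p, t in zip(predicted, truth)]
    n = len(diffs)
    bias = sum(diffs) / n
    rms = math.sqrt(sum(d * d for d in diffs) / n)
    return bias, rms


# Made-up T_m values in Kelvin, for illustration only.
example_bias, example_rms = bias_and_rms(
    [281.0, 279.0, 283.5], [280.0, 280.0, 283.0])
```

A bias near zero with a nonzero RMS, as in this toy sample, indicates that the model has little systematic offset while individual predictions still scatter around the truth.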

Analysis of
Suppose that there exists a factor $x_i$ that can be expressed by a linear combination of the other factors, as expressed in (7); then $x_1, x_2, \ldots, x_n$ are fully collinear. Otherwise, there is no collinearity among $x_1, x_2, \ldots, x_n$.
Collinearity can be a problem for parameter estimation because it inflates the variance of regression parameters and hence potentially leads to the wrong identification of relevant predictors in a regression model. It is a severe problem when a model is trained on data from one region or time and used for prediction in another with a different or unknown structure of collinearity. Parameter estimates may be unstable, which inflates the standard errors of the estimates and consequently biases inference statistics.
In this study, linear regression was used to establish multifactor RTMs, so it must be tested whether pairwise linear correlations exist among the three variables $T_s$, $e_s$, and $P_s$. Hence, the following tolerance value (Tol) was used to test the collinearity between any two of them before the modelling of $T_m$ was performed:

$$\mathrm{Tol} = 1 - R^{2},$$

where $R^2$ is the squared correlation between the two factors. As a matter of experience, a threshold value of 0.1 for Tol is often adopted [51]. If the Tol value exceeds 0.1, there is no collinearity problem between the two factors involved.
From the results shown in Figure 2, we can obtain the following tolerance values among $T_s$, $e_s$, and $P_s$: $R^2_{T_s\text{-}e_s} = (0.9 \times 0.88)^2 = 0.73$ and $\mathrm{Tol}_{T_s\text{-}e_s} = 1 - 0.73 = 0.27$. We can see that all the tolerance values are much larger than 0.1, meaning there are no collinearity problems among $T_s$, $e_s$, and $P_s$. Thus, we can use all three factors in the modelling of the RTMs.
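The tolerance test can be sketched in Python as follows. This is a generic illustration with hypothetical function names and made-up sample values; it computes the Pearson correlation between two predictor series directly and applies the 0.1 threshold, and it is not a reproduction of the paper's own tolerance figures.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tolerance(x, y):
    """Tol = 1 - R^2 for a pair of predictors."""
    return 1.0 - pearson_r(x, y) ** 2


def is_collinear(x, y, threshold=0.1):
    """Flag a pair whose tolerance does not exceed the threshold."""
    return tolerance(x, y) <= threshold


# Made-up surface temperature (K) and vapor pressure (hPa) samples.
ts_sample = [275.0, 283.0, 290.0, 298.0, 301.0]
es_sample = [9.0, 6.0, 20.0, 14.0, 30.0]
tol_example = tolerance(ts_sample, es_sample)
```

A pair of predictors that is an exact linear function of each other gives a tolerance of zero and is flagged, whereas moderately correlated predictors pass the test.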

Results and Discussion
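3.1. One-Factor RTM
3.1.1. Establishing RTM. The two-dimensional linear fitting method for the one-factor RTM has the same expression as the BTM; that is, $T_m = a\,T_s + b$. The radiosonde-derived $T_m$ and $T_s$ from all the aforementioned three stations during the period of 2012-2013 were used in the following observation equation matrix:

$$\begin{bmatrix} T_{m,1} \\ T_{m,2} \\ \vdots \\ T_{m,n} \end{bmatrix} = \begin{bmatrix} T_{s,1} & 1 \\ T_{s,2} & 1 \\ \vdots & \vdots \\ T_{s,n} & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} + v,$$

where $v$ is the residual vector and coefficients $a$ and $b$ were estimated using the least squares estimation method.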
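For a single predictor, the least squares solution has a closed form. The Python sketch below uses a hypothetical function name and synthetic data generated from the BTM coefficients, so it only demonstrates that the estimator recovers a known line; it does not reproduce the coefficients of the RTM.

```python
def fit_one_factor(ts_values, tm_values):
    """Least squares estimate of (a, b) in T_m = a * T_s + b."""
    n = len(ts_values)
    mean_ts = sum(ts_values) / n
    mean_tm = sum(tm_values) / n
    sxx = sum((x - mean_ts) ** 2 for x in ts_values)
    sxy = sum((x - mean_ts) * (y - mean_tm)
              for x, y in zip(ts_values, tm_values))
    a = sxy / sxx
    b = mean_tm - a * mean_ts
    return a, b


# Synthetic, noise-free data built from the BTM line (illustration only).
ts_synthetic = [270.0, 280.0, 290.0, 300.0]
tm_synthetic = [0.72 * ts + 70.2 for ts in ts_synthetic]
a_hat, b_hat = fit_one_factor(ts_synthetic, tm_synthetic)
```

With real radiosonde data the residual vector is nonzero, and the estimated coefficients depend on the region and period covered by the samples.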
3.1.2. Accuracy of the RTM and BTM. The statistical histograms of the differences of the BTM and the RTM from the radiosonde-derived $T_m$ during the period of 2013-2014 at the three stations are shown in Figure 3. The differences of the BTM are mainly in the range of about 0 to 4 K (Figure 3(a)), while the differences of the RTM are mainly in the range of about −3 to 4 K (Figure 3(b)). This indicates that the systematic difference, which reflects the accuracy, of the RTM result is much smaller than that of the BTM in the Hunan region. The left panels in Figure 4 show the three $T_m$ time series (2013-2014) resulting from the radiosonde (truth), the BTM, and the RTM at each of the three stations, and the right panels show the differences of the two models' results from the radiosonde-derived $T_m$.
The statistical results for the bias and RMS of the above $T_m$ time series at each station are listed in Table 2, in which the last row is the mean of the three stations' results. Both the bias and RMS of the RTM results are significantly smaller than those of the BTM counterparts at all three stations. The last row indicates that the overall improvements of the RTM over the BTM are 88% (in bias) and 28% (in RMS).

Multifactor RTMs.
According to the analysis in Section 2.3.2, $T_m$ has a very high positive correlation with $T_s$ and $e_s$ and a weak negative correlation with $P_s$. Therefore, in this section, multifactor regression is used to establish two-factor and three-factor RTMs, and their performances are compared to that of the one-factor RTM.

Establishing RTMs.
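The multiple linear fitting method was adopted to model the multifactor RTMs. The radiosonde-derived $T_m$, $T_s$, $e_s$, and $P_s$ from all the aforementioned three stations in 2012-2013 were used in the following observation equation system:

$$\begin{bmatrix} T_{m,1} \\ T_{m,2} \\ \vdots \\ T_{m,n} \end{bmatrix} = \begin{bmatrix} T_{s,1} & e_{s,1} & P_{s,1} & 1 \\ T_{s,2} & e_{s,2} & P_{s,2} & 1 \\ \vdots & \vdots & \vdots & \vdots \\ T_{s,n} & e_{s,n} & P_{s,n} & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} + v, \tag{10}$$

where $v$ is the residual vector and coefficients $a$, $b$, $c$, and $d$ were estimated using the least squares method. If only $T_s$ and $e_s$ are taken into consideration in (10), a two-factor RTM is obtained using observations from the three radiosonde stations; when all three factors are taken into consideration, the result is a three-factor RTM.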
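A multifactor fit of this kind can be sketched in pure Python by forming and solving the normal equations. The function names are hypothetical and the data are synthetic, generated from made-up coefficients, so the example only shows that the estimator recovers a known model. Solving the normal equations by Gaussian elimination is a simplification chosen to keep the sketch self-contained; a numerical library with a QR-based or SVD-based solver would be the usual choice in practice.

```python
def solve_linear_system(matrix, rhs):
    """Solve a square system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_multifactor(predictor_rows, tm_values):
    """Least squares coefficients for T_m = sum(c_k * x_k) + intercept.

    predictor_rows holds one tuple per sample, e.g. (T_s, e_s) or
    (T_s, e_s, P_s). The intercept is returned as the last element.
    """
    design = [list(row) + [1.0] for row in predictor_rows]
    m = len(design[0])
    normal = [[sum(r[i] * r[j] for r in design) for j in range(m)]
              for i in range(m)]
    rhs = [sum(r[i] * y for r, y in zip(design, tm_values))
           for i in range(m)]
    return solve_linear_system(normal, rhs)


# Synthetic two-factor samples (T_s in K, e_s in hPa) from made-up coefficients.
rows = [(270.0, 5.0), (280.0, 12.0), (290.0, 20.0), (300.0, 35.0), (285.0, 8.0)]
tm_synthetic = [0.5 * ts + 0.3 * es + 130.0 for ts, es in rows]
coefficients = fit_multifactor(rows, tm_synthetic)
```

Dropping a column from the predictor tuples yields the one-factor or two-factor case without changing the estimator.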

Accuracy of Multifactor RTMs.
Figure 5 shows the statistical histograms of the differences of the above two-factor and three-factor RTMs from the radiosonde-derived $T_m$ (as the truth). We can see that the differences of the two RTMs are mostly in the range of about −3 K to 3 K. As shown in Figure 6 (left panels), the time series of predicted $T_m$ from the two RTMs agree very well with the truth. The time series of difference values in the right panels of Figure 6 show a similar variation trend for both models, and the differences in winter (Dec.-Feb.) are larger than those in summer (Jun.-Aug.).
The statistical results of the above $T_m$ time series are listed in Table 3, in which the bias and RMS of the one-factor, two-factor, and three-factor RTMs are compared with each other. The last row is the mean of the model results of the three stations. We can see that the biases of the three RTMs are all near zero; the two-factor and three-factor RTMs show a similar performance in terms of RMS, and both are better than the one-factor RTM, with an improvement of 7% in RMS. In practical applications, the selection of an optimal RTM depends on which meteorological data are available at the stations.

Seasonal Two-Factor RTMs.
As shown in Figure 6 (right panels), the accuracy of both the two-factor and three-factor RTMs shows a correlation with season. Because the two RTMs perform similarly, only the two-factor model was adopted for the investigation of the performance of seasonal RTMs in this section. The time series of $T_m$, $T_s$, and $e_s$ from the same three stations and the same period were divided into four seasons for establishing four seasonal two-factor RTMs. The $T_m$ values predicted from the seasonal RTMs for the period 2013-2014 were compared to those of the yearly two-factor RTM established in Section 3.

Establishing RTMs.
Similar to Section 3.2, the multiple linear regression method was used to obtain the seasonal two-factor RTMs. The radiosonde-derived $T_m$, surface temperatures $T_s$, and water vapor pressures $e_s$ from all three stations in the period 2012-2013 were used in the following observation equation:

$$\begin{bmatrix} T_{m,1} \\ T_{m,2} \\ \vdots \\ T_{m,n} \end{bmatrix} = \begin{bmatrix} T_{s,1} & e_{s,1} & 1 \\ T_{s,2} & e_{s,2} & 1 \\ \vdots & \vdots & \vdots \\ T_{s,n} & e_{s,n} & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} + v,$$

where $v$ is the residual vector and the coefficients $a$, $b$, and $c$ were estimated using the least squares estimation. The four seasonal two-factor RTMs obtained are listed in Table 4. The statistical results of the above $T_m$ time series are listed in Table 5. It can be seen that both the biases and RMSs of all the seasonal RTMs are smaller than those of the yearly RTM, especially the biases, meaning that the seasonal RTMs outperform the yearly RTM. In terms of RMS the gain is modest: the four seasonal two-factor RTMs reduce the RMS of the yearly two-factor RTM by 3%, 10%, 2%, and 3%.
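Establishing seasonal models amounts to partitioning the samples by season and then fitting each group separately with the same two-factor regression. The Python sketch below shows only the partitioning step. The names are hypothetical and the sample values are made up; winter (Dec.-Feb.) and summer (Jun.-Aug.) follow the definitions used in the text, while spring (Mar.-May) and autumn (Sep.-Nov.) are assumed here.

```python
from collections import defaultdict

# Month-to-season mapping; spring and autumn boundaries are an assumption.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def split_by_season(samples):
    """Group (month, T_s, e_s, T_m) samples into per-season lists.

    Each returned list holds (T_s, e_s, T_m) tuples ready for a
    separate two-factor regression.
    """
    groups = defaultdict(list)
    for month, ts, es, tm in samples:
        groups[SEASON_BY_MONTH[month]].append((ts, es, tm))
    return dict(groups)


# Made-up samples, for illustration only.
samples = [
    (1, 278.0, 7.0, 268.0),
    (4, 291.0, 15.0, 279.0),
    (7, 302.0, 31.0, 288.0),
    (10, 293.0, 17.0, 280.0),
    (12, 280.0, 8.0, 269.5),
]
seasonal_groups = split_by_season(samples)
```

Each seasonal group then yields its own coefficient triple, which is why four separate coefficient sets result from a single training period.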

Comparison of Conversion Factor and GNSS-PWV Resulting from Two $T_m$ Models

Advances in Meteorology
The conversion factors $\Pi$ resulting from the BTM-derived and seasonal two-factor RTM-derived $T_m$ during the two-year period of 2013-2014 were calculated and compared against their reference (truth) values, which result from radiosonde-derived $T_m$; the monthly statistical results are listed in Table 6. The statistical result for each month listed in the table is based on the same month in both years. We can see that the monthly mean of the BTM-derived $\Pi$ ranges from 0.1536 (Jan.) to 0.1630 (Jul.) and that of the seasonal two-factor RTM-derived $\Pi$ ranges from 0.1529 (Jan.) to 0.1613 (Jul.). The last row is the mean of all the monthly means over the two-year period. The mean bias of the BTM-derived $\Pi$ over the two years is 0.0012 and that of the seasonal two-factor RTM is 0.0001; the corresponding mean RMSs are 0.0016 and 0.0011. This means that, over the two-year period, the seasonal two-factor RTM reduces the mean bias by 92% and the mean RMS by 31% compared with the BTM.
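The quoted percentages follow from the relative reduction of each statistic. As a quick check, the Python sketch below recomputes them from the mean bias and RMS values given above; the function name is hypothetical, and the improvement is assumed to be defined as the reduction relative to the BTM value.

```python
def improvement_percent(reference, candidate):
    """Relative reduction (%) of |candidate| with respect to |reference|."""
    return (1.0 - abs(candidate) / abs(reference)) * 100.0


# Mean bias and RMS of the conversion factor over 2013-2014, from the text.
bias_improvement = improvement_percent(0.0012, 0.0001)
rms_improvement = improvement_percent(0.0016, 0.0011)
```

The two values evaluate to about 91.7% and 31.25%, which round to the 92% and 31% reported.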

Comparison of GNSS-PWV Resulting from Two-Model-Derived $T_m$. The Chenzhou CORS station (named CZSQ) has a colocated radiosonde station at a horizontal distance of about 10 m (but some data are missing owing to instrument or data transmission failures of the CORS network during 2015). Its GNSS-ZTD is calculated using the PPP strategy mentioned in Section 2.2. The ZHD is determined using (4) with pressure values from a pressure sensor mounted at CZSQ. The two sets of PWVs converted from the GNSS-ZTDs using the BTM-derived and the seasonal two-factor RTM-derived $T_m$ are compared against the true PWV (resulting from radiosonde data twice daily) in 2015, as shown in Figure 8. Table 7 compares the bias and RMS of the two sets of PWVs (against their truth). Relative to the BTM, the bias and RMS of the PWV obtained with the seasonal two-factor RTM are improved by 37% and 12%, respectively. The PWV improvement is smaller than the improvements in $T_m$ and in the conversion factor $\Pi$. The most likely reason is the large amount of missing PWV data at the CZSQ station caused by instrument or data transmission failures of the Hunan CORS network.

Conclusion
In this study, several new RTMs were established using radiosonde data in the period of 2012-2013 from three stations in the Hunan region. Numerical integration and least squares estimation were adopted to obtain the $T_m$ time series at the three stations and the coefficients of the regression models, respectively. The RTMs include a yearly one-factor RTM, a yearly two-factor RTM, a yearly three-factor RTM, and four seasonal two-factor RTMs. These RTMs were validated by comparing the $T_m$ time series predicted from the RTMs for the period of 2013-2014 against the same period's radiosonde-derived $T_m$. Results showed that the yearly one-factor RTM outperformed the BTM, with improvements of 88% and 28% in bias and RMS, respectively. The two-factor and three-factor RTMs showed similar accuracy, and both were better than the one-factor RTM, with an improvement of 7% in RMS. The four seasonal two-factor RTMs slightly outperformed the yearly two-factor RTM, with improvements of 3%, 10%, 2%, and 3% in the RMS of the four seasons. The conversion factors resulting from the seasonal two-factor RTM are improved by 92% in mean bias and 31% in mean RMS. The bias and RMS of the PWV resulting from the seasonal two-factor RTM are improved by 37% and 12%, respectively. Therefore, the seasonal two-factor RTMs are recommended for the research and applications of GNSS meteorology in the Hunan region.

Figure 1: Location of the three radiosonde stations in Hunan region, China.

Figure 2: Scatter plots of correlations between $T_m$ and $T_s$ (a); $T_m$ and $e_s$ (b); and $T_m$ and $P_s$ (c).

Figure 3: Statistical histograms of the accuracies of the BTM (a) and one-factor RTM (b) during the period of 2013-2014 at the three stations.

Figure 4: Comparison of the three $T_m$ time series (left) and the differences of the BTM and RTM results from the $T_m$ truth (right) at the three stations: (a) Changsha, (b) Huaihua, and (c) Chenzhou.

Figure 5: Statistical histograms of the multifactor RTM results at the three stations: (a) two-factor RTM and (b) three-factor RTM.

Figure 6: Comparison of the three $T_m$ time series (left) and the differences of the two-factor and three-factor RTMs from the radiosonde-derived $T_m$ truth (right) at the three stations: (a) Changsha, (b) Huaihua, and (c) Chenzhou.

Figure 7: Comparison of the $T_m$ time series predicted using the four seasonal two-factor RTMs, the yearly RTM, and the $T_m$ truth (left panels) and the differences of the seasonal and yearly RTMs from the truth (right panels) in (a) spring, (b) summer, (c) autumn, and (d) winter.

Figure 8: (a) Comparison between PWVs (twice daily) resulting from the BTM-derived and seasonal two-factor RTM-derived $T_m$ against the truth during 2015. (b) Accuracy of the two sets of PWVs.

Table 1: Comparison of four typical methods for obtaining $T_m$. (Table body not recovered. Surviving entries: the required factors include the temperature lapse rate, the vapor pressure decline rate, the surface temperature $T_s$, the gas constant, and the gravitational acceleration; one method is rated "Hardest" in difficulty and "Low" in accuracy.)

When air reaches the saturated condition, the water vapor in the air will condense, and the dew point temperature is the same as the air temperature at this time. Therefore, the vapor pressure under the condition of saturation is the saturated vapor pressure ($e_s$), which is a function of the dew point temperature ($T_d$) expressed by [15,40]

$$e_s = 6.112 \exp\left(\frac{17.62\,T_d}{243.12 + T_d}\right). \tag{3}$$

Multiple Meteorological Factors. In this part, the correlations between $T_m$ and each of the three meteorological factors $T_s$, $e_s$, and $P_s$ are investigated. The scatter plots and their correlation coefficients are shown in Figure 2. The linear regression analysis shows that the three correlation coefficients are 0.90, 0.88, and −0.55. Therefore, $T_m$ has a very high positive correlation with $T_s$ and $e_s$, while it has a weak negative correlation with $P_s$.

Collinearity Test. Collinearity, also called multicollinearity, is a phenomenon in which two or more factors in a regression model are highly correlated. It refers to the nonindependence of the predictor factors, usually in a regression-type analysis. For a set of factors $(x_1, x_2, \ldots, x_n)$, there exist coefficients $(k_1, k_2, \ldots, k_n)$, not all zero, that make the following equation hold [51]:

$$k_1 x_1 + k_2 x_2 + \cdots + k_n x_n = 0.$$

Table 2: Comparison of the biases and RMSs of $T_m$ derived from the BTM and one-factor RTM against the $T_m$ truth at the three stations.

Table 3: Comparison of the biases and RMSs of the differences of one-factor, two-factor, and three-factor RTM-derived $T_m$ from the $T_m$ truth at the three stations.

Table 4: Four seasonal two-factor RTMs during the period of 2012-2013.

Table 5: Comparison of the biases and RMSs of the seasonal and yearly two-factor RTM-derived $T_m$ in four seasons.

Table 6: Bias and RMS of the conversion factor ($\Pi$) resulting from the BTM and seasonal two-factor RTMs against the truth of $\Pi$ during 2013-2014.

Table 7: Bias and RMS of the PWVs resulting from the BTM-derived and seasonal two-factor RTM-derived $T_m$ at the Chenzhou station (mm).