Linear Regression Estimation Methods for Inferring Standard Values of Snow Load in Small Sample Situations

The aim of this paper is to establish a new method for inferring standard values of snow load in small-sample situations. Because meteorological data are incomplete in some areas, engineers often need to infer standard values of snow load from small samples. However, the point estimation methods of classical statistics adopted so far do not account for statistical uncertainty, so their results are always on the aggressive (unconservative) side. To overcome this shortcoming, linear regression estimation methods for inferring standard values of snow load in small-sample situations are proposed. They use the least squares method and are based on the principles of best linear unbiased estimation and invariant estimation of the parameters and quantiles of the minimum type I distribution. The methods cover two cases, no parameter information and a known coefficient of variation, and prediction formulas for the snow load standard values are given for each. Numerical tables of the relevant coefficients are established through numerical integration and Monte Carlo simulation, so the inference formulas can be applied directly. Theoretical analysis and examples show that indirect point estimation methods always give results that are too small when used to infer standard values of snow load from small samples. The linear regression estimation method is suitable for inferring standard values of snow load from small samples and gives more reasonable results.
When the linear regression estimation method is applied in practice, an upper limit for the coefficient of variation can be set from experience even if the coefficient itself is unknown. Estimates are then made both with no parameter information and with the known coefficient of variation, and the smaller of the two is taken as the final estimate. The method can be extended to the statistical inference of standard values of other variable loads, such as wind load and floor load.


Introduction
Snow load is one of the main loads on buildings, and the inference of its standard value is the basis for structural design and evaluation methods. At present, the standard value of snow load is generally inferred by fitting the annual maximum snow pressure with the maximum type I (Gumbel) distribution [1]. The standard value at a given guarantee rate is then obtained from the estimated distribution parameters, using the relation between representative values and distribution parameters. Methods for estimating the distribution parameters include the moment method and the maximum likelihood method [2][3][4][5][6][7][8][9][10][11][12][13]. These are point estimation methods and are mainly suitable for large samples. They require sufficient meteorological data as statistical samples, and at least 30 years of data are generally considered necessary [14,15]. In engineering practice, however, it is sometimes necessary to infer the standard values of snow load when data are insufficient, and the actual sample size is often very limited. Statistical analysis is then done mostly with small samples, which significantly reduces the accuracy of parameter estimation. If current point estimation methods are still used in such small-sample cases, the inferred result is often too low because statistical uncertainty is ignored. A more reasonable choice is to use a small-sample inference method [16][17][18][19].
For dead loads, the Standard for appraisal of reliability of civil buildings provides a small-sample method for inferring the standard value [20]. However, snow load usually obeys the maximum type I distribution, which differs from the probability distribution of dead load, so the same method cannot be used to infer the standard values of snow load. A method for estimating the parameters and quantiles of the maximum type I distribution is proposed in [21]. It lays a theoretical foundation for small-sample inference of the standard values of snow load without parameter information, so an inference formula for that case can be established. In some cases, however, the coefficient of variation of the snow load distribution is known, or its upper limit can be set from experience. Using this additional information significantly reduces the statistical uncertainty in the estimation process and gives more favorable extrapolation results under the same conditions. It is therefore necessary to study how to infer the standard values of snow load from small samples when the coefficient of variation is known, which provides a better option for small-sample inference of snow load standard values.
Based on the above analysis, this paper first establishes the probability distribution model of snow load. The standard value of snow load is usually expressed as a quantile of the distribution. The type I maximum and type I minimum distributions belong to the same family of extreme value distributions and can be converted into each other. Therefore, a linear regression estimation method for the standard value of snow load under small-sample conditions is proposed. It applies the least squares method on the basis of the best linear unbiased estimation and invariant estimation principles for the parameters of the minimum type I distribution [22][23][24][25][26][27][28][29][30][31]. Inference formulas for the standard values of snow load are given for both the case of no parameter information and the case of a known coefficient of variation. Numerical tables of the relevant coefficients are established through numerical integration and Monte Carlo simulation to facilitate the application of the inference formulas. Conclusions and suggestions are given by comparing the results with those of traditional large-sample methods. The method can be extended to the statistical inference of standard values of other variable loads, such as wind load and floor load.

Probability Distribution Model of Snow Load
A probability distribution model of snow load is established by using a stationary binomial random process [32].
It is assumed that the design reference period of the building structure is T and that [0, T] is divided into r equal time intervals. The average time for the snow load to change once is therefore τ = T/r. The probability that the load occurs in each interval is p, and the probability distribution function of the load within each interval is F_Q(x). The random variables x in different intervals are mutually independent. Establishing the snow load model requires identifying these three key parameters (τ, p, and F_Q(x)).

The Unified Standard for Reliability Design of Building Structures uses a limit state design method based on probability theory, namely a first-order second-moment method that accounts for the probability distribution types of the basic variables. Because the basic variables are treated as random variables, the random process of the load must be converted into a random variable [33]. Replacing the random process with the random variable at an arbitrary point in time would be unsafe. Instead, the maximum load $Q_T = \max_{0 \le t \le T} Q_t$ within the design reference period is used for statistical analysis. The basic steps for converting the random process into the maximum load within T years are as follows:

Step 1: establish the probability distribution function of the load in an arbitrary interval τ:
$$F_\tau(x) = P[Q(t) \le x,\ t \in \tau] = p\,F_i(x) + (1 - p) = 1 - p\,[1 - F_i(x)],$$
where F_i(x) is the load distribution function at time point i.
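Step 2: establish the probability distribution function of Q_T. Because the loads in the r intervals are independent,
$$F_T(x) = \prod_{j=1}^{r} F_\tau(x) = \{1 - p\,[1 - F_i(x)]\}^{r}.$$
Let m be the average number of snow load occurrences in T years; then $m = pr$. When p = 1, m = r and
$$F_T(x) = [F_i(x)]^{r} = [F_i(x)]^{m}.$$
When p < 1, the approximation $e^{-y} \approx 1 - y$ can be used, because $[1 - F_i(x)]$ is sufficiently small for the large values of x of interest. Then
$$F_T(x) \approx \{\exp[-p(1 - F_i(x))]\}^{r} = \{\exp[-(1 - F_i(x))]\}^{pr} \approx [F_i(x)]^{m}.$$
Hence,
$$F_T(x) \approx [F_i(x)]^{m}.$$
The probability distribution function F_T(x) of the maximum load Q_T in the design reference period is therefore the m-th power of the distribution function F_i(x) of the load at an arbitrary time. F_i(x) can be obtained from statistics [34].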
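The approximation in Step 2 can be checked numerically. The sketch below compares the binomial-process expression {1 − p[1 − F_i(x)]}^r with [F_i(x)]^m, m = pr, for a standard maximum type I F_i. The values p = 0.6 and r = 50 are hypothetical and chosen only for illustration. The sketch also checks the max-stability property of the type I distribution: raising F to the power m only shifts the location parameter by α ln m.

```python
import math

def gumbel_cdf(x, mu=0.0, alpha=1.0):
    """Maximum type I distribution function F(x) = exp{-exp[-(x - mu)/alpha]}."""
    return math.exp(-math.exp(-(x - mu) / alpha))

def binomial_process_cdf(x, p, r, cdf):
    """F_T(x) = {1 - p[1 - F_i(x)]}^r for r independent intervals, occurrence probability p."""
    return (1.0 - p * (1.0 - cdf(x))) ** r

def power_approx_cdf(x, p, r, cdf):
    """Approximation F_T(x) ~ [F_i(x)]^m with m = p * r."""
    return cdf(x) ** (p * r)

# Hypothetical parameters: r = 50 intervals, occurrence probability p = 0.6 (m = 30).
p, r = 0.6, 50
for x in (2.0, 4.0, 6.0):
    exact = binomial_process_cdf(x, p, r, gumbel_cdf)
    approx = power_approx_cdf(x, p, r, gumbel_cdf)
    print(f"x = {x}: exact = {exact:.5f}, approx = {approx:.5f}")

# Max-stability: [F(x)]^m is again type I, with location shifted by alpha * ln(m) (alpha = 1 here).
m = p * r
print(gumbel_cdf(5.0) ** m, gumbel_cdf(5.0, mu=math.log(m)))
```

The two columns agree closely in the upper tail, which is the range that matters for standard values. They drift apart at smaller x, where 1 − F_i(x) is no longer small.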
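Step 3: the probability distribution of snow load is fitted with the maximum type I distribution. Its distribution function and probability density function are, respectively,
$$F(x) = \exp\left\{-\exp\left[-\frac{x-\mu}{\alpha}\right]\right\},$$
$$f(x) = \frac{1}{\alpha}\exp\left[-\frac{x-\mu}{\alpha}\right]\exp\left\{-\exp\left[-\frac{x-\mu}{\alpha}\right]\right\},$$
where α and μ are the scale and location parameters of the distribution.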

In the Condition of Unknown Parameter Information.
The value of snow load at any time point obeys the maximum type I distribution, with probability density function
$$f(x) = \frac{1}{\alpha}\exp\left[-\frac{x-\mu}{\alpha} - \exp\left(-\frac{x-\mu}{\alpha}\right)\right],$$
where μ and α are the distribution parameters, −∞ < μ < ∞ and 0 < α < ∞. The standard value of snow load, denoted x_k, is usually expressed as the lower p-quantile of the random variable X:
$$P(X \le x_k) = p,$$
where p is the guarantee rate of the standard value x_k.
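The definitions above can be sketched directly. The helper names are hypothetical. The closed-form quantile x_k = μ − α ln(−ln p) follows from setting F(x_k) = p in the type I distribution function. The last helper checks that −X is minimum type I with location −μ and scale α, so −x_k is its upper p-quantile.

```python
import math

def gumbel_max_cdf(x, mu, alpha):
    """F(x) = exp{-exp[-(x - mu)/alpha]} of the maximum type I distribution."""
    return math.exp(-math.exp(-(x - mu) / alpha))

def gumbel_max_pdf(x, mu, alpha):
    """f(x) = (1/alpha) exp[-(x - mu)/alpha - exp(-(x - mu)/alpha)]."""
    z = (x - mu) / alpha
    return math.exp(-z - math.exp(-z)) / alpha

def standard_value(mu, alpha, p):
    """Lower p-quantile x_k solving F(x_k) = p: x_k = mu - alpha * ln(-ln p)."""
    return mu - alpha * math.log(-math.log(p))

def gumbel_min_sf(y, loc, alpha):
    """P(Y > y) for a minimum type I variable with location loc and scale alpha."""
    return math.exp(-math.exp((y - loc) / alpha))

# Hypothetical parameters, for illustration only.
mu, alpha, p = 0.1, 0.04, 0.98
x_k = standard_value(mu, alpha, p)
print(x_k, gumbel_max_cdf(x_k, mu, alpha))
# -X is minimum type I with location -mu, so -x_k is its upper p-quantile.
print(gumbel_min_sf(-x_k, -mu, alpha))
```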
It is assumed that n samples of X are arranged in ascending order, X_(1), X_(2), …, X_(n), with observed values x_(1), x_(2), …, x_(n), respectively. Let $Y' = -X$. Then Y′ obeys the minimum type I distribution with parameters −μ and α. Its order statistics and upper p-quantile are
$$Y'_{(i)} = -X_{(n+1-i)}, \qquad y'_p = -x_k.$$
When inferring x_k, the characteristic value of a variable action, the upper limit estimate is usually selected as the result. When the load effect is favorable to the structure, the lower limit estimate should be selected instead, but this rarely occurs in the most unfavorable combinations of action effects [35]. The upper and lower limit estimates of x_k can be inferred from the lower and upper limit estimates of y′_p, because
$$x_{k,U} = -y'_{p,L}, \qquad x_{k,L} = -y'_{p,U}.$$
Combining these with the estimates of the parameters −μ and α of Y′ gives the upper and lower limit estimates of x_k, the characteristic value of the variable action. The coefficients D_I(n, n, j) and C_I(n, n, j) in these estimates can be read directly from the numerical tables.
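The tabulated coefficients D_I(n, n, j) and C_I(n, n, j) and the factor v_{p,C} are not reproduced here. The sketch below illustrates one standard construction and is an assumption, not the procedure of [21]. It computes Lloyd's generalized least squares (best linear unbiased) estimates of μ and α directly from the ordered sample of X, estimating the order-statistic means and covariances by Monte Carlo instead of numerical integration. It then forms an upper limit from the simulated C-quantile of the pivot (y_p − μ̂_0)/α̂_0, where y_p = −ln(−ln p). All function names and sample values are hypothetical.

```python
import numpy as np

def std_order_stat_moments(n, n_sim=100_000, seed=0):
    """Monte Carlo means m_i and covariances v_ij of standard maximum type I order statistics."""
    rng = np.random.default_rng(seed)
    z = np.sort(rng.gumbel(size=(n_sim, n)), axis=1)
    return z.mean(axis=0), np.cov(z, rowvar=False)

def blue_weights(m, V):
    """Lloyd's GLS (BLUE) weights W such that (mu_hat, alpha_hat) = W @ x_sorted."""
    A = np.column_stack([np.ones_like(m), m])       # E[X_(i)] = mu + alpha * m_i
    Vinv_A = np.linalg.solve(V, A)                  # V^{-1} A
    return np.linalg.solve(A.T @ Vinv_A, Vinv_A.T)  # (A' V^-1 A)^{-1} A' V^-1

def pivot_quantile(W, p, C, n_sim=50_000, seed=1):
    """C-quantile of (y_p - mu0_hat) / alpha0_hat under the standard distribution."""
    rng = np.random.default_rng(seed)
    z = np.sort(rng.gumbel(size=(n_sim, W.shape[1])), axis=1)
    mu0, a0 = W @ z.T                               # estimates from standard samples
    y_p = -np.log(-np.log(p))                       # standard p-quantile
    return float(np.quantile((y_p - mu0) / a0, C))

# Hypothetical ordered sample (NOT the Table 3 data).
x = np.sort(np.array([0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.25, 0.30, 0.40]))
m, V = std_order_stat_moments(len(x))
W = blue_weights(m, V)
mu_hat, alpha_hat = W @ x
v = pivot_quantile(W, p=0.98, C=0.75)
x_k_upper = mu_hat + v * alpha_hat                  # upper limit estimate at confidence C
print(mu_hat, alpha_hat, v, x_k_upper)
```

The construction relies on equivariance. Because W reproduces μ and α exactly on the expected order statistics, the pivot's distribution does not depend on the unknown parameters. This is the same invariance idea that makes tabulated factors possible.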
Existing numerical tables give v_{p,C} and v_{p,1−C} only for p = 0.90, 0.95, and 0.99. For other guarantee rates, these values usually cannot be read directly from the tables [36]. This paper therefore proposes a new approximate method that uses the existing tables to infer v_{p,C} and v_{p,1−C}.
Let p_0 denote a guarantee rate covered by the existing numerical tables, and denote the lower p_0-quantile of X by x_k0, so that P(X ≤ x_k0) = p_0. An approximate relation is then assumed between the estimates for p and those for p_0. From the correspondence between x_k,U and x_k0,U and between x_k,L and x_k0,L, v_{p,C} and v_{p,1−C} are obtained from the tabulated values at p_0. In practical applications, p_0 should be chosen close to p. Calculations show that for n = 5 to 25, p ≥ 0.90, and C = 0.6 to 0.95, the errors of v_{p,C} and v_{p,1−C} lie within −0.017 to 0.017 and −0.031 to 0.036, respectively.
This method is more convenient and accurate than interpolation.
Mathematical Problems in Engineering

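Linear Unbiased Estimation of Distribution Parameters.
It is assumed that the value of snow load at any time point is X and that the coefficient of variation δ_X of X is known. The distribution parameters are then
$$\alpha = \frac{\sqrt{6}}{\pi}\sigma_X = \frac{\sqrt{6}}{\pi}\delta_X\mu_X, \qquad \mu = \mu_X - C_E\,\alpha,$$
where μ_X, σ_X, and δ_X are the mean, standard deviation, and coefficient of variation of X, respectively, and C_E is the Euler constant. Consequently, the ratio
$$c = \frac{\alpha}{\mu} = \frac{\sqrt{6}\,\delta_X/\pi}{1 - C_E\sqrt{6}\,\delta_X/\pi}$$
depends only on δ_X. The samples of X are arranged in ascending order, X_(1) ≤ X_(2) ≤ … ≤ X_(n). The probability density function of X_(i) and the joint probability density function of X_(i) and X_(j) follow from the general formulas for order statistics. Let $Z = (X - \mu)/\alpha$. Then Z obeys the standard maximum type I distribution, and its order statistics are $Z_{(i)} = (X_{(i)} - \mu)/\alpha$. The probability density function of Z_(i) and the joint probability density function of Z_(i) and Z_(j) (i < j) are
$$f_{(i)}(z) = \frac{n!}{(i-1)!\,(n-i)!}\,G(z)^{i-1}\,[1 - G(z)]^{n-i}\,g(z),$$
$$f_{(i,j)}(z_i, z_j) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\,G(z_i)^{i-1}\,[G(z_j) - G(z_i)]^{j-i-1}\,[1 - G(z_j)]^{n-j}\,g(z_i)\,g(z_j), \quad z_i < z_j,$$
where $G(z) = \exp(-e^{-z})$ and $g(z) = \exp(-z - e^{-z})$. The mean and variance of Z_(i) and the covariance of Z_(i) and Z_(j) are
$$\mu_i = \int z\,f_{(i)}(z)\,dz, \qquad v_{ii} = \int (z - \mu_i)^2 f_{(i)}(z)\,dz, \qquad v_{ij} = \iint_{z_i < z_j} (z_i - \mu_i)(z_j - \mu_j)\,f_{(i,j)}(z_i, z_j)\,dz_i\,dz_j.$$
The mean of X_(i) is then $E(X_{(i)}) = \mu + \alpha\mu_i = \mu(1 + c\mu_i)$. The covariance matrix of Z_(1), Z_(2), …, Z_(n) and its inverse are denoted $V = (v_{ij})$ and $V^{-1}$, respectively. Following the least squares method of parameter estimation [36], the weighted sum of squares is taken as
$$Q = [\mathbf{X} - \mu(\mathbf{1} + c\boldsymbol{\mu}_Z)]^{T}\,V^{-1}\,[\mathbf{X} - \mu(\mathbf{1} + c\boldsymbol{\mu}_Z)],$$
where $\mathbf{X} = (X_{(1)}, \ldots, X_{(n)})^T$ and $\boldsymbol{\mu}_Z = (\mu_1, \ldots, \mu_n)^T$. When the coefficient of variation δ_X is known, let dQ/dμ = 0.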
When the coefficient of variation δ_X is known, the least squares estimate of the unknown parameter μ is
$$\mu^{*} = \sum_{j=1}^{n} D(n, n, j, \delta_X)\,X_{(j)}, \qquad D(n, n, j, \delta_X) = \frac{[V^{-1}(\mathbf{1} + c\boldsymbol{\mu}_Z)]_j}{(\mathbf{1} + c\boldsymbol{\mu}_Z)^{T}V^{-1}(\mathbf{1} + c\boldsymbol{\mu}_Z)}. \tag{36}$$
Since
$$X_{(j)} = \mu + \alpha Z_{(j)} = \mu(1 + cZ_{(j)}), \quad j = 1, 2, \ldots, n, \tag{37}$$
the mean of μ* is
$$E(\mu^{*}) = \mu\sum_{j=1}^{n} D(n, n, j, \delta_X)(1 + c\mu_j) = \mu;$$
that is, μ* is a linear unbiased estimator of μ [37]. If x_(1), x_(2), …, x_(n) are the observed values of X_(1), X_(2), …, X_(n), the linear unbiased estimate is
$$\mu^{*} = \sum_{j=1}^{n} D(n, n, j, \delta_X)\,x_{(j)}. \tag{39}$$
The corresponding coefficient D(n, n, j) in the existing linear unbiased estimate [37] does not account for the coefficient of variation; the coefficient D(n, n, j, δ_X) does. Analytical expressions for equations (28) to (30) are difficult to obtain, so μ_i, v_ii, and v_ij must be determined by numerical integration. D(n, n, j, δ_X) can then be determined from (36). The values for n = 10 are listed in Table 1; the others are omitted.
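The weights in (36) can be sketched as follows. This assumes the generalized least squares form D = V⁻¹b/(bᵀV⁻¹b) with b_j = 1 + cμ_j, and c = α/μ taken from the moment relations. μ_i and v_ij are estimated here by Monte Carlo rather than by the numerical integration used in the paper, so the output only approximates Table 1. Function names are hypothetical.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler constant C_E

def c_from_cov(delta):
    """c = alpha / mu, using alpha = sqrt(6)*delta*mu_X/pi and mu = mu_X - C_E*alpha."""
    k = math.sqrt(6.0) * delta / math.pi
    return k / (1.0 - EULER_GAMMA * k)

def known_cov_weights(n, delta, n_sim=100_000, seed=0):
    """Approximate D(n, n, j, delta) with Monte Carlo order-statistic moments."""
    rng = np.random.default_rng(seed)
    z = np.sort(rng.gumbel(size=(n_sim, n)), axis=1)
    m, V = z.mean(axis=0), np.cov(z, rowvar=False)  # mu_i and v_ij
    b = 1.0 + c_from_cov(delta) * m                 # E[X_(j)] = mu * b_j
    w = np.linalg.solve(V, b)                       # V^{-1} b
    return w / (b @ w), b                           # D_j, normalized so that D @ b = 1

D, b = known_cov_weights(10, 0.4)
print(np.round(D, 4))

# Unbiasedness check: simulate ordered samples with mu = 1 and average mu* = D @ x_sorted.
rng = np.random.default_rng(42)
c = c_from_cov(0.4)
x_sorted = 1.0 + c * np.sort(rng.gumbel(size=(20_000, 10)), axis=1)
print(np.mean(x_sorted @ D))
```

The normalization D·b = 1 is exactly the unbiasedness condition E(μ*) = μ. The simulated mean of μ* should therefore be close to 1, up to Monte Carlo error.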

Interval Estimation of the Quantile.
For the standard value of snow load, a relatively large quantile is usually selected. Under small-sample conditions, the upper limit of its interval estimate should be used as the estimate. Let
$$P(X \le x_p) = p,$$
where p is the guarantee probability, which is relatively large. When the coefficient of variation δ_X is known, it follows from (35) and (37) that
$$x_p = \mu - \alpha\ln(-\ln p) = \mu\,[1 - c\ln(-\ln p)].$$
Let U = μ/μ*. Since
$$U = \frac{\mu}{\mu^{*}} = \frac{1}{\sum_{j=1}^{n} D(n, n, j, \delta_X)\,(1 + cZ_{(j)})}, \tag{41}$$
the distribution of U does not depend on the distribution parameters μ and α. Based on this distribution, the upper limit of μ in the interval estimate is $\mu_U = u(n, n, \delta_X, C)\,\mu^{*}$. Here u(n, n, δ_X, C) is the lower C-quantile of U, that is, P(U ≤ u) = C, and C is the confidence degree, which is relatively large. The upper limit estimate of x_p is then
$$x_{p,U} = u(n, n, \delta_X, C)\,[1 - c\ln(-\ln p)]\,\mu^{*}. \tag{42}$$
u(n, n, δ_X, C) is practically impossible to determine analytically, so Monte Carlo simulation is used. In each simulation, random numbers following the standard maximum type I distribution are generated and sorted to give a set of sample values Z_(1), Z_(2), …, Z_(n). A sample value of U is then obtained from equation (41). With a sufficient number of simulations, any quantile u(n, n, δ_X, C) can be obtained statistically; here, 50,000 simulations are used. Table 2 lists part of the numerical tables for n = 10, 15, and 20.
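The Monte Carlo procedure above can be sketched as follows. It assumes the least squares weights D(n, n, j, δ_X) of (36) in generalized least squares form, and U = 1/Σ_j D(n, n, j, δ_X)(1 + cZ_(j)), which follows from X_(j) = μ(1 + cZ_(j)). The weights are recomputed from simulated order-statistic moments, so the results carry Monte Carlo error and are not the paper's Table 2. The helper names are hypothetical. For n = 10, δ_X = 0.4, and C = 0.75, the output can be compared with the value u(10, 10, 0.4, 0.75) = 1.096 used in the example.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler constant C_E

def c_from_cov(delta):
    """c = alpha / mu implied by a coefficient of variation delta."""
    k = math.sqrt(6.0) * delta / math.pi
    return k / (1.0 - EULER_GAMMA * k)

def gls_weights(n, c, rng, n_sim=100_000):
    """D_j = [V^-1 b]_j / (b' V^-1 b), b_j = 1 + c*mu_j, from simulated order statistics."""
    z = np.sort(rng.gumbel(size=(n_sim, n)), axis=1)
    b = 1.0 + c * z.mean(axis=0)
    w = np.linalg.solve(np.cov(z, rowvar=False), b)
    return w / (b @ w)

def u_quantile(n, delta, C, n_sim=50_000, seed=0):
    """Lower C-quantile u(n, n, delta, C) of U = 1 / sum_j D_j (1 + c Z_(j))."""
    rng = np.random.default_rng(seed)
    c = c_from_cov(delta)
    D = gls_weights(n, c, rng)
    z = np.sort(rng.gumbel(size=(n_sim, n)), axis=1)  # n_sim sorted standard samples
    U = 1.0 / ((1.0 + c * z) @ D)                     # one value of U per simulation
    return float(np.quantile(U, C))

def upper_standard_value(mu_star, u, delta, p):
    """x_{p,U} = u * [1 - c ln(-ln p)] * mu_star."""
    return u * (1.0 - c_from_cov(delta) * math.log(-math.log(p))) * mu_star

u_sim = u_quantile(10, 0.4, 0.75)
print(u_sim)
```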

How to Choose the Confidence Degree in Engineering Practice
The confidence degree C represents the level of trust in the inferred results and directly affects them. The higher the confidence degree, the higher the upper limit estimate and the lower the lower limit estimate. Generally speaking, the greater the variability of the random variable X, the wider the range of the inferred distribution parameters and quantiles. Moreover, the influence of C on the upper and lower limit estimates grows as C increases: the upper limit estimate rises, and the lower limit estimate falls, increasingly quickly. When the variability of X is large, a high confidence degree would make the inferred results too conservative, so a relatively low confidence degree should be selected. Conversely, when the variability of X is small, a relatively high confidence degree should be selected to avoid overly aggressive results.
The national standard of the People's Republic of China, Standard for appraisal of reliability of civil buildings (GB50292-1999), gives suggestions on selecting the confidence degree when inferring the standard value of material strength: C = 0.90 for steel, C = 0.75 for concrete, and C = 0.60 for masonry. C = 0.95 is suggested for inferring the standard value of permanent action. The coefficients of variation of the permanent action and of the strengths of steel, concrete, and masonry are 0.07, 0.06 to 0.10, 0.16 to 0.23, and 0.20 to 0.24, respectively [20]. The confidence degrees in the national standard thus conform to the rule above. Snow load is highly variable [38], so according to this rule a relatively low confidence degree should be selected. However, C should not be less than 0.5; otherwise, the lower limit estimate would be higher than the upper limit estimate. With C = 0.5 and all other conditions equal, the linear regression estimation method gives a result only slightly higher than that of the moment method, that is, close to the result obtained without considering statistical uncertainty. When C is not large, changes in C have little influence on the inferred results [39]. A slightly higher confidence degree can therefore be selected, which accounts more fully for statistical uncertainty and avoids overly aggressive results. In this paper, we suggest selecting the confidence degree C = 0.75 to infer the standard values of snow load; the example in Section 5 illustrates this choice.

Examples
In this section, the proposed linear regression estimation method is used to estimate the standard values of snow load. The data are the actual snowfall data from 1989 to 2008 in Shuyang County [40], listed in Table 3.
We select the guarantee rate p = 1 − 1/50 = 0.98 and the confidence degree C = 0.75 for the interval estimate.
The moment method is first used to estimate the standard value of snow load. For the maximum likelihood method, the parameter estimation equations are derived from the likelihood function. The parameters are then estimated iteratively, and the standard value of snow load is calculated from them. Finally, the linear regression estimation method for the case of unknown parameter information (equations (13), (14), and (18)) is applied, using the values of D_I(10, 10, j) and C_I(10, 10, j) listed in Table 4 and v_{0.99,0.75} = 6.23.
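For reference, the two classical point estimators can be sketched as follows. The sketch uses the standard moment relations α = √6 s/π and μ = x̄ − C_E α, where taking s as the sample standard deviation is an assumption. It also uses the standard type I likelihood equations, solved here by bisection rather than by whichever iteration the authors used. The function names and data are hypothetical; the Table 3 data are not reproduced.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649015329  # Euler constant C_E

def gumbel_quantile(mu, alpha, p):
    """Standard value x_k = mu - alpha * ln(-ln p)."""
    return mu - alpha * math.log(-math.log(p))

def moment_fit(x):
    """Moment estimates: alpha = sqrt(6) s / pi, mu = mean - C_E * alpha."""
    x = np.asarray(x, dtype=float)
    alpha = math.sqrt(6.0) * x.std(ddof=1) / math.pi
    return float(x.mean() - EULER_GAMMA * alpha), float(alpha)

def mle_fit(x, iters=200):
    """ML estimates from the type I likelihood equations; alpha is found by bisection."""
    x = np.asarray(x, dtype=float)
    xmin, xbar, s = x.min(), x.mean(), x.std(ddof=1)

    def score(a):
        # alpha = mean(x) - sum(x w)/sum(w), w = exp(-x/a); shifted by xmin to avoid underflow
        w = np.exp(-(x - xmin) / a)
        return xbar - (w @ x) / w.sum() - a

    lo, hi = 1e-6 * s, 10.0 * s        # score(lo) > 0 > score(hi); score is decreasing
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    mu = xmin - alpha * math.log(np.mean(np.exp(-(x - xmin) / alpha)))
    return float(mu), float(alpha)

# Hypothetical annual maxima (NOT the Table 3 data).
data = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.25, 0.30, 0.40]
for name, fit in (("moment", moment_fit), ("MLE", mle_fit)):
    mu_hat, alpha_hat = fit(data)
    print(name, mu_hat, alpha_hat, gumbel_quantile(mu_hat, alpha_hat, 0.98))
```

Both are point estimators. Neither widens the result to reflect sampling uncertainty, which is why the text finds them on the aggressive side for small n.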
For the case of a known coefficient of variation, δ_X = 0.4 is selected so that the methods can be compared under the same conditions. The moment method and the maximum likelihood method (with the parameters estimated iteratively) give the corresponding standard values of snow load. Using the linear regression estimation method for a known coefficient of variation (equations (39) and (42)),
$$\mu^{*} = \sum_{j=1}^{n} D(n, n, j, \delta_X)\,x_{(j)} = 0.120,$$
where u(10, 10, 0.4, 0.75) = 1.096 and the values of D(10, 10, j, 0.4) are shown in Table 1. Table 5 shows the calculation process.
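As a worked check of the final step, the sketch below evaluates x_{p,U} = u(n, n, δ_X, C)[1 − c ln(−ln p)]μ* with the values quoted in the example: δ_X = 0.4, p = 0.98, μ* = 0.120, and u = 1.096. It takes c = α/μ from the moment relations α = √6 δ_X μ_X/π and μ = μ_X − C_E α. This form of the formula is a reconstruction; the paper's own result is reported in Table 6.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler constant C_E

delta_x, p = 0.4, 0.98          # coefficient of variation and guarantee rate of the example
mu_star, u = 0.120, 1.096       # mu* and u(10, 10, 0.4, 0.75) quoted in the text

k = math.sqrt(6.0) * delta_x / math.pi   # alpha / mu_X from the moment relations
c = k / (1.0 - EULER_GAMMA * k)          # c = alpha / mu, since mu = mu_X - C_E * alpha
x_pU = u * (1.0 - c * math.log(-math.log(p))) * mu_star
print(f"c = {c:.4f}, x_p,U = {x_pU:.4f}")
```

With these inputs, c is about 0.380 and x_{p,U} is about 0.327, in the units of the data in Table 3.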
To facilitate comparison, Table 6 lists the statistical extrapolation results of the different estimation methods for different sample sizes, both when the coefficient of variation is known and when no parameter information is available.
A comparative analysis of the results in Table 6 shows the following:
(1) Whether or not the coefficient of variation is known, the maximum likelihood method gives the smallest estimates, followed by the moment method. These two classical statistical methods do not account for statistical uncertainty, so their results are always on the aggressive side.
(2) Whether or not the coefficient of variation is known, the relative differences between the results of the linear regression method and those of the classical statistical methods (the moment method and the maximum likelihood method) gradually decrease as the sample size increases. This is because the influence of statistical uncertainty decreases as the sample size grows.
(3) For the linear regression method, the estimates with a known coefficient of variation are smaller at every sample size. The additional information significantly reduces the uncertainty of the statistic U in the estimation process.
Owing to space limitations, only three sample sizes are compared here. Calculating and comparing the extrapolated standard values of snow load for other sample sizes leads to the same conclusions.

Conclusions
(1) When current statistical inference methods are used to infer the standard value of snow load from small samples, the results are always on the aggressive side because these methods do not account for statistical uncertainty.
(2) The linear regression estimation method presented in this paper reduces the influence of statistical uncertainty for small samples. It is applicable to the statistical inference of standard values of snow load under small-sample conditions and gives more reasonable results, covering both the case of no parameter information and the case of a known coefficient of variation.
(3) We suggest selecting the confidence degree C = 0.75 to infer the standard value of snow load.
(4) In practical application, even if the coefficient of variation is unknown, its upper limit can be set based on experience. Estimates are then made both with no parameter information and with the known coefficient of variation, and the smaller of the two is taken as the final estimate.
The research in this paper can provide a theoretical basis for adjusting the standard values of snow load.

Data Availability
The authors state that the key data used in the calculations are listed in the article. Other, nonkey detailed data can be obtained from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.