Bayesian Estimation of Partially Linear Additive Spatial Autoregressive Models with P-Splines

In this paper, we aim to develop a partially linear additive spatial autoregressive model (PLASARM), which is a generalization of the partially linear additive model and spatial autoregressive model. It can be used to simultaneously evaluate the linear and nonlinear eﬀects of the covariates on the response for spatial data. To estimate the unknown parameters and approximate nonparametric functions by Bayesian P-splines, we develop a Bayesian Markov Chain Monte Carlo approach to estimate the PLASARM and design a Gibbs sampler to explore the joint posterior distributions of unknown parameters. Furthermore, we illustrate the performance of the proposed model and estimation method by a simulation study and analysis of Chinese housing price data.


Introduction
Spatial econometrics is a subfield of econometrics dealing with spatial interaction effects among geographical units Elhorst [1]; it has been widely applied in many research fields such as economics, geography, and environmental science. e traditional spatial parametric autoregressive models have received a lot of attention from theoretical econometricians and applied researchers, for example, Anselin [2], Case [4], Cressie [3], LeSage [4], LeSage [5], Anselin and Bera [6], Lee [7], Lee [8], Kakamu and Wago [9], LeSage and Pace [10], Kazar and Celik [11], and Piribauer and Crespo Cuaresma [12] among others. However, the spatial parametric autoregressive models are highly sensitive to model misspecification; they may not be adequate in many complex situations. Indeed, it is confirmed that some economic relationships in space exhibit highly nonlinear shapes [13][14][15][16]. Neglecting the underlying nonlinear relationship in spatial parametric autoregressive models frequently leads to inconsistent estimation of the parameters.
Although spatial nonparametric autoregressive models can be used to improve the performance of the spatial parametric autoregressive models, it is unavoidable to bear the risk of misspecifying the link function. However, when the dimension of the explanatory variables is high, a fully nonparametric spatial autoregressive model is too hard to be explored because of the so-called "curse of dimensionality" [17]. Several nonparametric dimension-reduction techniques have been developed to overcome the problem, for instance, single-index model [18,19], additive model [20], and varying-coefficient model [21], among others. us, semiparametric spatial autoregressive models have been proposed for dealing with spatial data. To capture the underlying relationships between the response variables and their associated covariates, semiparametric spatial autoregressive models have gained much attention in the literature of statistics and econometrics. For example, Su and Jin [22] discussed the quasi-likelihood estimator for the semiparametric partially linear spatial autoregressive model; Su [23] proposed semiparametric generalized method of moment (GMM) estimation of the semiparametric spatial autoregressive model; Sun et al. [24] developed a profile likelihood estimator for the semiparametric spatial dynamic model; Chen et al. [25] presented a two-step Bayesian approach for the semiparametric spatial autoregressive model; Wei and Sun [26] considered the semiparametric GMM estimation of a spatial model with space-varying coefficients; Hoshino [27] proposed a semiparametric series GMM estimator for the semiparametric spatial autoregressive model; Krisztin [28] studied a novel Bayesian semiparametric estimation for the penalized spline spatial autoregressive model; and Wei et al. [29] considered profile quasi-maximum likelihood method to estimate the partially linear varying-coefficient spatial autoregressive model. Sun and Wu [30] discussed the GMM to estimate the partially linear single-index spatial autoregressive model. Cheng et al. [31] developed the GMM estimation of the partially linear singleindex spatial autoregressive model. Du et al. [32] investigated a GMM estimator for PLASARM. Hu et al. [33] presented a new partial functional linear spatial autoregressive model.
In recent years, semiparametric models have become an important research topic between statistics and econometrics due to the explanation of parametric modeling and the flexibility of nonparametric modeling. Among the various semiparametric models, the most popular ones are perhaps the partially linear additive models. ey contain both linear and nonlinear additive components. As they can not only provide more flexible models than the stringent linear models but also mitigate the "curse of dimensionality" phenomenon encountered in nonparametric regression. erefore, the partially linear additive models provide a good balance between the flexibility of the additive models and the interpretation of the partially linear models. A lot of authors have developed various methods to analyze such models: local linear method [34], spline estimation [35][36][37], variable selection [38][39][40][41], quantile regression [42][43][44][45], and so on. But most of this work has not been applied to analyze spatial data. It is a distinctive challenge facing analysts of spatial data to characterize certain flexible functional forms that try to account for the latent nonlinearity of PLASARM. Combining the partially linear additive models with the spatial autoregressive models, we develop a Bayesian P-splines method and design a Gibbs sampler to explore the joint posterior distributions, along with a Markov chain Monte Carlo algorithm, to estimate the unknown parameters and approximate the nonparametric functions of PLASARM. e rest of the article proceeds as follows: In Section 2, we present PLASARM for spatial dependent data and discuss its identifiability conditions and then obtain the likelihood functions by approximating the link function with a Bayesian P-spline method. In Section 3, we specify the prior distributions, derive the full conditional posteriors of the unknown parameters, and describe the detailed sampling algorithms in order to provide a Bayesian P-splines method for analysis. e applicability and practicality of the proposed model and estimation method are demonstrated through a simulation study and a real dataset in Section 4. In Section 5, we conclude the paper with a summary.

Model.
e partially linear additive spatial autoregressive model is defined as follows: where y i is the i-th observation of response variable, x i � (x i1 , . . . , x iq ) T and z i � (z i1 , . . . , z ip ) T are the i-th observations of q and p dimensional covariate vectors, respectively, w il is a specified constant spatial weight, g j (·) (j � 1, . . . , p) are unknown univariate nonparametric functions, ρ is an unknown spatial parameter reflecting spatial autocorrelation between neighbors with stability condition |ρ| < 1, α � (α 1 , . . . , α q ) T is a q × 1 vector of the unknown parameters, and ε i 's are mutually independent and identically distributed normal random variables with zero mean and variance σ 2 . To ensure model identifiability of the nonparametric functions, it is often assumed that the condition n i�1 g j (z ij ) � 0 for j � 1, . . . , p.

Bayesian P-Splines and Likelihood.
In analyzing the proposed PLASARMs defined by (1), modeling the smooth nonparametric function is an important issue.
Because of the advantages of the Bayesian approach and the nice features of Bayesian P-splines [46], we will concentrate on applying the Bayesian P-splines approach. We intend to approximate unknown function g j (·) in (1) by B-splines [47]. For j � 1, . . . , p, the unknown function g j (·) is a polynomial spline of degree m j with k j order interior knots ξ j � (ξ j1 , . . . , ξ jk j ) T with a j < ξ j1 < · · · < ξ jk j < b j , that is, where K j � 1 + m j + k j is the number of splines determined by the number of knots, B j (z j ) � (B j1 (z j ), . . . , B jK j (z j )) T is a K j × 1 vector of spline basis that is determined by the knot vector ξ j , β j � (β j1 , . . . , β jK j ) T is a K j × 1 vector of spline coefficients, and In practice, a choice of K j in the range of 10 to 30 provides the flexibility of fitting. To prevent overfitting due to using a relatively large number of knots, Eilers and Marx [48] proposed applying a difference penalty on coefficients of adjacent B-splines. Let 1 n � (1, . . . , 1) T and B j be an n × K j matrix with B T j (z ij ) as its i-th row. To achieve identification, we set n i�1 K j l�1 B jl (z ij )β jl � 0, which is equivalent to 1 T n B j β j � 0. Denote Q j � 1 T n B j ; then, the constraint becomes Q j β j � 0.

Bayesian Estimation
We develop a Bayesian P-splines method with the Gibbs sampler to estimate the proposed model in this section. We begin with the specification of the prior distributions, then the derivations of the full conditional posteriors of all of the unknown parameters, and the narration of the detailed sampling scheme.

Priors.
In order to develop a Bayesian P-splines method to estimate the model and eschew the use of improper to prevent improper joint posterior, we specify appropriate prior distributions for all the unknown parameters which include regression coefficients θ � (α T , β T ) T , variance σ 2 , and spatial parameter ρ.
Firstly, we set a hierarchical prior for α, which consists of a conjugate normal prior, and an inverse-gamma prior, where r τ α0 and s 2 τ α0 are prespecified hyperparameters. Secondly, to identify the unspecified smooth function g j (·) for j � 1, . . . , p, the identifiability constraint Q j β j � 0 should be imposed on β j . Under this constraint, we choose the random walk prior, for β j , where I · { } is the indicator function, d is the order of the random walk, M β j is the penalty matrix that equals To enhance the flexibility and robustness of the method, we further put an inverse-gamma prior, on τ j , where r τ β j 0 and s 2 τ β j 0 are prespecified hyperparameters. In addition, the following conjugate prior distribution of σ 2 is assigned as where r 0 and s 2 0 are prespecified hyperparameters. roughout this article, we set r 0 � s 2 0 � 1 to obtain a Cauchy distribution of σ 2 and use r τ α0 � r τ β j 0 � 1 and s 2 τ α0 � s 2 τ β j 0 � 0.005 to obtain a highly dispersed inverse-gamma prior for τ j , j � 0, 1, . . . , p.
Finally, the spatial autocorrelation coefficient ρ is given a uniform prior ρ ∼ U(λ min − 1 , λ max − 1 ), where λ min and λ max are the minimum and maximum eigenvalues of the standardized spatial weight matrix W, respectively. In other words, e joint prior of all unknown parameters is given by where τ � (τ 0 , τ 1 , . . . , τ p ). Note that we have treated hyperparameter τ as a parameter vector for computational convenience.

e Full Conditional Posterior Distributions of the Parameters.
In this subsection, we present the full conditional posterior distributions of all parameters that are used in the Gibbs sampler algorithm and describe the detailed sampling method.
It follows from likelihood function (6) and priors (14) that, given the remaining unknown quantities, the conditional posterior distribution of spatial autocorrelation coefficient ρ is Because it is not a standard density function, we cannot directly do simulations from (15). We prefer the Metropolis-Hastings algorithm [49,50] to overcome the difficulty: generate a candidate ρ * from a truncated Cauchy distribution with location ρ and scale σ ρ on interval (λ min − 1 , λ max − 1 ), where σ ρ acts as a tuning parameter, and accept ρ * with probability given the factor It is easy to see from likelihood function (6) and priors (14) that, given the spatial autocorrelation coefficient ρ and the hyperparameter τ, the joint posterior of all the parameters is proportional to 4 Mathematical Problems in Engineering We can see from (18) that the method of composition [51] can be used to draw σ 2 from the conditional inversegamma posterior, and to generate θ from the conditional normal posterior, In addition, to achieve identification, the constraint Q j β j � 0 should be imposed on β j . According to Panagiotelis and Smith [52], sampling β j from (20) is equivalent to sampling It is obvious that the hyperparameter τ j for j � 0, 1, . . . , p is mutually independent in the posterior. e full conditional posterior of τ j is an inverse-gamma distribution with density which can be simulated directly from (22) and (23)

Numerical Illustration
A simulation study demonstrates that the proposed model and methodology perform satisfactorily in the finite sample.
We apply the proposed model to analyze a real dataset.

4.1.
Simulation. Consider the following model: where the covariate vector x i � (x i1 , x i2 ) T follows a bivariate normal distribution with mean 0 and covariance vector, and both z i1 and z i2 are independent; they follow uniform distributions on (− 1, 1) and (0, 1), respectively. e link functions g 1 (z) � sin(πz) and e true values of parameters are assumed as α � (1, − 1) T and σ 2 � 0.25, respectively. For comparison, we consider three different cases of spatial parameters ρ 1 � 0.25, ρ 2 � 0.5, and ρ 3 � 0.75, which represent from weak to strong spatial dependence of the response. Two kinds of spatial weight matrices W are used: one is Rook weight matrix [2] with n units, and the other is Case weight matrix [53]  According to the above design, we conduct each simulation with 500 replications. For j � 1, . . . , p, we applied a natural cubic spline in which the knots with K j � 22 are placed at an equally spaced interval of the predictor variables, used second-order random walk penalties, and assigned hyperparameters (r 0 , s 2 0 , r τ α0 , s 2 τ α0 , r τ β j 0 , s 2 τ β j 0 ) � (1, 1, 1, 0.005, 1, 0.005) in our computation. e second-order random walk penalties are used for the Bayesian Mathematical Problems in Engineering P-splines to approximate the unknown smooth functions. e initial states of the Markov chain are chosen as follows: the unknown quantities are drawn from the respective prior distributions. By incrementally increasing or decreasing, the tuning parameter σ ρ is used such that the resultant acceptable rate for the parameter is around 25%.
We can generate 6000 sampled values and delete the first 2000 values as a burn-in period for each of the replications by the proposed Gibbs sampler. In order to check the convergence of the proposed MCMC algorithm, we run five times with different starting values by the proposed Gibbs sampler for an arbitrarily selected replication. e sampled traces of parts of the unknown quantities for replication with (r, m) � (80, 5) and ρ � 0.5 are displayed in Figure 1, where the unknown quantities include model parameters and fitted function on grid points. It is evident that the five parallel sequences mix reasonably well. e other cases have similar performance. Based on the five parallel sequences, we further calculate the "potential scale reduction factor" � � R [54] for each model parameter and fitted function on 10 selected grid points. All the values of � � R against the iteration numbers are displayed in Figure 2 (the case of the spatial parameter ρ � 0.5). We observe that 2000 burn-in iterations have converged as all the values of � � R were less than 1.2. Based on 4000 sampled values, we then compute the means, the standard errors, and the 2.5 th and 97.5 th percentiles of the posterior distributions for all unknown quantities.
We evaluate the performance of the estimated functions by the mean absolute deviation errors (MADE): at 101 fixed grid points z ji 101 i�1 that are equally spaced chosen from interval [a j , b j ]. To assess the accuracy of the estimates for the remaining parameters, we calculate the corresponding means across 500 replications for the posterior mean (Mean), bias (Bias), standard error (SE), and 2.5 th and 97.5 th percentiles of the parameters (the 95% posterior credible intervals, 95% CI). We also compute the standard derivations (SD) of the estimated posterior means to compare them with the means of the estimated posterior standard errors.  Table 1 summarizes the estimation results. It is observed that all of the means of the estimates are very close to the respective true values, and the average values of the standard errors are close to the corresponding standard derivations, which indicate that the parameter estimates and the standard errors are accurate. is indicates that the proposed estimators of the parametric model perform better with increasing sample size. By comparing the estimates for ρ under the same sample sizes, we find the estimates of ρ with the Case weight matrix are much better than those with the Rook weight matrix. e possible main reason lies in the fact that the Rook weight matrix is more complicated than the Case weight matrix. Finally, the bigger the sample size under the same spatial complexity, the more accurate the estimates are. We have also repeated the aforementioned experiences with different starting values, and the results are similar. is indicates that the proposed Gibbs sampler works well. e median and SD of MADE values of the link functions are reported in Table 2. It shows that the median and SD of Input: Samples: (y i , x i , z i ) i�1,...,n . Initialization: Initialize the MCMC algorithm in iteration t � 0 with Θ (0) , where all unknown quantities are drawn from the respective prior distributions. MCMC iterations: For t � 1, 2, 3, . . ., given the current state Θ (t− 1) successively, resample Θ (t) from p (Θ|y, x, z). e detailed Gibbs sampler cycles are outlined as follows: p(θ|y, x, z, ρ (t− 1) , σ 2(t− 1) , τ (t− 1) ), and adjust β (t) according to (21); (iv) Draw τ (t) from p(τ|θ (t− 1) , σ 2(t− 1) ), which is replaced by the following two steps:

Mathematical Problems in Engineering
MADE values of the link functions decrease along with increasing the sample size, which indicates that the estimates of the unknown functions are convergent. e spatial correlation coefficients also have little effect on the nonparametric estimates. e estimated functions, together with the 95% pointwise posterior credible intervals, are depicted in Figure 4 from a typical sample under the spatial dependence ρ � 0.5 when the sample size is n � 100 and n � 400, respectively. e typical sample is selected in such a way that its MADE value is equal to the median in the 500 replications. is indicates that the proposed estimators of nonparametric functions are performed better with increasing sample size. It took about 30 seconds for n � 100 and 60 seconds for n � 400 to run each replication using a PC with Intel (R) Core (TM) i7-8750H 2.20 GHz, respectively. Computer code is available upon request from the authors.

A Real Data Example.
As an illustration of our proposed model and method with a real example, we analyze housing prices data in Chinese cities. e data are obtained from the China City Statistical Yearbook 2013 [56] and the China Statistical Yearbook for Regional Economy 2013 [57]. ey cover 286 cities at/above the prefecture level (except for the cities in Taiwan, Hong Kong, and Macau). In our real data analysis, we consider that the dependent variable y is each city's average selling price of residential houses (denoted by HP).
ere are three covariates: (1) population density (PD); (2) per capita disposable income of urban households (INC); and (3) loan-to-GDP ratio (MON), which indicates the degree of monetary policy whether ease or tightness.
is motivates us to consider the following partially linear additive spatial autoregressive model: With regard to the choice of the weight matrix, according to the practice in Sun et al. [24], we use the Euclidean distance in terms of any two houses to calculate the spatial weight matrix W. We use longitude and latitude to represent the location, which is denoted as s i � (Lon i , Lat i ). e spatial weight w il is We set as where the transformation is used for transferring the asymmetric distribution of PD to nearly uniform distribution on (0, 1). e other covariates variables of INC and MON are transformed so that the marginal distribution is approximately N(0, 1). ese do not alter the model but facilitate implementation. For this dataset, we adopted natural cubic P-splines, used second-order random walk penalties, and assigned (λ, r 0 , s 2 0 , r τ α0 , s 2 τ α0 , r τ β j 0 , s 2 τ β j 0 ) � (2, 1, 1, 1, 0.005, 1, 0.005). e tuning parameter σ ρ is used such that the acceptable rate for updating ρ is around 25%.
We run five times with different initial states by the proposed Gibbs sampler and generate 10000 sampled values following a burn-in of 20000 iterations in each  Mathematical Problems in Engineering replication. Figure 5 plots the traces of parts of the unknown quantities. It can be seen that the five parallel sequences mix very well. Based on the five parallel sequences, the "potential scale reduction factor" � � R is calculated and plotted in Figure 6, from which we observe that the Markov chain has converged within 20000 burnin iterations.
e estimated parameters together with their standard errors and 95% posterior credible intervals are reported in Table 3. e estimate of the spatial parameter is ρ � 0.615 with the 0.034  standard deviation which indicates there exists a strong and positive spatial relationship among the response. We observe that the covariate PD has significant effects on the response. e regression coefficient of PD is α � 0.258 > 0, in which the population density has a positive effect on the housing price. e estimated functions, together with the 95% pointwise posterior credible intervals, are depicted in Figure 7, which look like two nonlinear functions with an upward trend. e curves show that g 1 (z 1 ) and g 2 (z 2 ) have a local maximum of 0.764 at around z 1 � 3.335 and 0.272 at around

Summary
Spatial data are often encountered in economics, geography, and environmental science and can be analyzed by the spatial autoregressive models. To reduce the high risk of misspecification of the traditional spatial autoregressive models and avoid some serious drawbacks of fully nonparametric models, we have developed a PLASARM for spatial data, which combines the partially linear additive model and spatial autoregressive model. In this paper, we have considered a Bayesian MCMC approach with P-splines to analyze the proposed model and designed a Gibbs sampler to explore the joint posterior distributions based on the Bayesian P-splines technique. A simulation study demonstrates that the proposed model and methodology perform satisfactorily in the finite sample. e results have shown that they are reliable for recovering the true parameters and nonparametric components. We have further applied the proposed model and methodology to analyze a real dataset. Various extensions can be considered in the future. We can pursue the case where the covariates are high dimension in the model. e proposed models can be generalized in several ways. Although we focus on a PLASARM to assess the effects of the covariates on the response, the other semiparametric models such as single-index spatial autoregressive models and varying-coefficient spatial autoregressive models can also be considered. In addition, it is straightforward to generalize the model and methodology to handle more complicated data types, for example, mixed discrete, continuous, and ordinal data [55]. We leave these topics for future research.
Data Availability e data are obtained from the China City Statistical Yearbook 2013 [56] and the China Statistical Yearbook for Regional Economy 2013 [57]. e data can be made available from the corresponding author or obtained from the above Yearbook.

Conflicts of Interest
e authors declare that they have no conflicts of interest.