Bayesian Adaptive Lasso for the Partial Functional Linear Spatial Autoregressive Model

This study introduces a partial functional linear spatial autoregressive model that can explore the relationship between a scalar, spatially dependent response variable and predictors comprising both multiple scalar covariates and a functional covariate. By approximating the functional coefficient through its Karhunen–Loève representation, we propose a Bayesian adaptive Lasso method to simultaneously estimate the unknown parameters and select the important covariates in the model; the method is implemented by combining the Gibbs sampler with the Metropolis–Hastings algorithm. Simulation studies show that the proposed Bayesian method performs well.


Introduction
With the rapid development of electronic technology and the emergence of advanced measurement tools, large amounts of data can be collected and stored cheaply. In particular, scientists often confront data providing information about curves, surfaces, or anything else varying over a continuum, such as economic indices, nuclear magnetic resonance data, weather data, and stock data. This type of data structure is called functional data and has received an increasing level of attention in various applied fields, including financial engineering, environmental science, medicine, brain imaging, and public health. Therefore, many useful functional regression models have been proposed, and there has been a large amount of work on functional data analysis; see [1][2][3][4][5], among others. In particular, it often occurs in data modeling that a response is related to predictive variables comprising scalar covariates and a functional covariate, which leads to the partial functional linear model (PFLM). Recently, much research has been devoted to the PFLM. For example, Shin [6] established the asymptotic normality of the estimated regression coefficients and the convergence rate of the estimated functional coefficient in a partial functional linear regression model. Building on the results of Shin [6], Lu et al. [7] introduced a functional partially quantile regression model and derived the large-sample properties of the proposed estimators. Yu et al. [8] investigated hypothesis testing for the parametric component in partial functional linear regression. For incomplete data, Zhou and Peng [4] discussed a functional partial linear model when some responses are missing at random. Yu et al. [9] studied a single-index partially functional linear regression model and obtained the convergence rates and asymptotic normality of the resulting estimators.
To sum up, most of the above references assume that all individuals in functional regression models are mutually independent and that the response variables have no spatial dependence structure. To our knowledge, relatively few studies in functional data analysis have considered spatially dependent response variables.
It is well known that a large number of models and methods exist for dealing with spatially dependent variables. One useful approach is the spatial autoregressive model, which has attracted great interest from both statisticians and econometricians. For example, Liu et al. [10] investigated variable selection in the spatial autoregressive model with independent and identically distributed errors based on a penalized quasi-maximum likelihood method. Xie et al. [11] considered variable selection in spatial autoregressive models with a diverging number of parameters. Based on B-spline approximation and instrumental variables, Du et al. [12] proposed an estimation method for partially linear additive spatial autoregressive models and obtained the convergence rate and asymptotic normality of the resulting estimators. Su and Jin [13] developed statistical inference for partially linear spatial autoregressive models using kernel estimation and the quasi-maximum likelihood method. However, the abovementioned papers focused mainly on settings in which the predictor variables are real valued.
Meanwhile, owing to the recent dramatic evolution of advanced computational technologies, many subject areas based on numerical approximation have developed rapidly, such as [14,15]. One of the biggest beneficiaries of these advances is Bayesian statistics. Specifically, Bayesian models allow one to incorporate an appropriate level of uncertainty; thus, Bayesian methods for various statistical models have received substantial attention in recent years. For example, based on a variance modeling technique and spline approximation, Xu and Zhang [16] discussed Bayesian inference for semiparametric joint mean and variance models. Combining the Gibbs sampler with the Metropolis–Hastings algorithm, Tang et al. [17] investigated Bayesian estimation and Bayesian local influence analysis of transformation linear mixed models in which the random effects follow an unknown distribution. Pfarrhofer and Piribauer [18] proposed two shrinkage priors for Bayesian variable selection in high-dimensional spatial autoregressive models. Wang and Tang [19] considered Bayesian inference for a quantile regression model in the presence of nonignorable missing covariates. Zhao et al. [20] studied Bayesian composite quantile regression with an adaptive group Lasso penalty. However, to the best of our knowledge, little work has been done on Bayesian analysis of the functional linear spatial autoregressive model; in particular, Bayesian variable selection for this model has rarely been studied. In addition, spatial dependence is a common data feature in fields where functional data arise, such as meteorology, econometrics, oceanography, and environmental science. Therefore, research on the functional linear spatial autoregressive model has a strong applied background.
Hence, in this study, we introduce a Bayesian adaptive Lasso approach for the partial functional linear spatial autoregressive model based on a hybrid algorithm combining the Gibbs sampler and the Metropolis–Hastings algorithm, which, to the best of our knowledge, is the first such attempt for this model. The outline of this study is as follows. In Section 2, we introduce the partial functional linear spatial autoregressive model and then give the likelihood function by approximating the functional coefficient via its Karhunen–Loève representation. In Section 3, to develop a Bayesian adaptive Lasso procedure, we specify the prior distributions, derive the full conditional distributions of the unknown parameters, and then describe the detailed sampling algorithm combining the Gibbs sampler and the Metropolis–Hastings algorithm. To illustrate the proposed methodology, results from simulation studies are presented in Section 4. The study concludes with a brief discussion in Section 5.

Model and Likelihood
2.1. Model. In this study, we consider the following partial functional linear spatial autoregressive model:

Y_i = ρ ∑_{j=1}^n w_{ij} Y_j + Z_i^T θ + ∫_0^1 X_i(t) β(t) dt + ε_i,  i = 1, . . . , n,  (1)

where Y_i is a real-valued spatially dependent response variable, Z_i is a p-dimensional random vector, and X_i(t) is a zero-mean, second-order stochastic process belonging to L²([0, 1]). In addition, ρ is the spatial autoregressive parameter and w_{ij} is the (i, j)th element of a given n × n spatial weights matrix W with zero diagonal elements, that is, w_{ii} = 0 for all i. Besides, θ = (θ_1, θ_2, . . . , θ_p)^T is a p-dimensional vector of unknown parameters, β(t) is an unknown square-integrable function on [0, 1], and the ε_i are mutually independent and identically distributed normal random variables with zero mean and variance σ².
Next, for convenience, we use matrix and vector notation to represent the variables and the model. Denote Y = (Y_1, Y_2, . . . , Y_n)^T, Z = (Z_1, Z_2, . . . , Z_n)^T, X(t) = (X_1(t), X_2(t), . . . , X_n(t))^T, and ε = (ε_1, ε_2, . . . , ε_n)^T. Then, model (1) can be written as

Y = ρWY + Zθ + ∫_0^1 X(t)β(t)dt + ε,  (2)

where the random errors ε ∼ N(0, σ²I_n) and I_n is an n × n identity matrix.
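For intuition, model (2) can be simulated from its reduced form Y = (I_n − ρW)^{−1}(Zθ + f + ε), provided I_n − ρW is nonsingular. The sketch below is purely illustrative: the random row-normalized W and the stand-in f for the functional term ∫_0^1 X_i(t)β(t)dt are our own assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 150, 8

# Hypothetical spatial weights matrix: zero diagonal, rows normalized to sum to 1.
W = rng.random((n, n))
np.fill_diagonal(W, 0.0)
W = W / W.sum(axis=1, keepdims=True)

rho, sigma2 = 0.5, 0.25
theta = np.array([1.0, -1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
Z = rng.standard_normal((n, p))
f = rng.standard_normal(n)          # stand-in for the functional term in model (2)
eps = rng.normal(0.0, np.sqrt(sigma2), n)

# Reduced form: Y = (I_n - rho * W)^{-1} (Z theta + f + eps).
A = np.eye(n) - rho * W
Y = np.linalg.solve(A, Z @ theta + f + eps)
```

Since W here is row-stochastic and |ρ| < 1, the matrix I_n − ρW is guaranteed nonsingular, so the linear solve is well defined.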

2.2. Likelihood. The functional part of the model involves infinite-dimensional variables and unknown parameters, so we first transform the infinite-dimensional problem into a finite-dimensional one. Specifically, define the covariance function of the functional variable X and its empirical version, respectively, as

K(s, t) = Cov(X(s), X(t)),  K̂(s, t) = (1/n) ∑_{i=1}^n X_i(s)X_i(t).

A linear operator is defined by the covariance function K, mapping a function f to Kf given by

(Kf)(s) = ∫_0^1 K(s, t)f(t)dt.

It is assumed that the linear operator with kernel K is positive definite. By Mercer's theorem, the spectral decompositions of the covariance functions K and K̂ can be expressed as

K(s, t) = ∑_{j=1}^∞ λ_j φ_j(s)φ_j(t),  K̂(s, t) = ∑_{j=1}^∞ λ̂_j φ̂_j(s)φ̂_j(t),

respectively, where λ_1 > λ_2 > · · · > 0 and λ̂_1 ≥ λ̂_2 ≥ · · · ≥ 0 are the ordered eigenvalues of the linear operators associated with the kernels K and K̂, and φ_j and φ̂_j are the corresponding orthonormal eigenfunctions. Obviously, the sequences {φ_j} and {φ̂_j} each form an orthonormal basis of L²([0, 1]). Thus, according to the Karhunen–Loève representation, we have the expansions

X_i(t) = ∑_{j=1}^∞ ξ_ij φ_j(t),  β(t) = ∑_{j=1}^∞ c_j φ_j(t),  (3)

where the ξ_ij = ⟨X_i, φ_j⟩ are uncorrelated random variables with zero mean and variance E[ξ_ij²] = λ_j, c_j = ⟨β, φ_j⟩, and ⟨·, ·⟩ denotes the inner product. More details can be found in Ramsay and Silverman [21]. Combining (2) with (3), we rewrite model (2) as

Y = ρWY + Zθ + ∑_{j=1}^∞ c_j ⟨φ_j, X⟩ + ε.  (4)

Thus, the model in (4) can be well approximated as

Y = ρWY + Zθ + ∑_{j=1}^m c_j ⟨φ_j, X⟩ + ε,  (5)

where the truncation parameter m ≤ n is the cut-off level and generally diverges with n. Replacing φ_j with the corresponding estimator φ̂_j for j = 1, . . . , m, we can rewrite model (5) as

Y = ρWY + Zθ + Uc + ε,  (6)

where U = (⟨φ̂_j, X⟩)_{j=1,...,m} and c = (c_1, . . . , c_m)^T. Then, we can obtain the likelihood function of model (6):

L(θ, c, ρ, σ² | Y, Z, X) ∝ |A| (σ²)^{−n/2} exp(−e^T e/(2σ²)),  (7)

where e = AY − Zθ − Uc and A = I_n − ρW.
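The empirical Karhunen–Loève step above can be sketched numerically, assuming each curve is observed on an equally spaced grid so that inner products are approximated by Riemann sums; the grid size, number of simulated terms, and truncation level m below are illustrative choices of ours, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T = 200, 101                      # n curves observed on a grid of T points in [0, 1]
t = np.linspace(0.0, 1.0, T)

# Simulated functional sample: X_i(t) = sum_j xi_ij * phi_j(t), truncated at 20 terms.
J = 20
lam = ((np.arange(1, J + 1) - 0.5) * np.pi) ** (-2.0)
phi = np.sqrt(2.0) * np.sin((np.arange(1, J + 1)[:, None] - 0.5) * np.pi * t[None, :])
xi = rng.standard_normal((n, J)) * np.sqrt(lam)
X = xi @ phi                         # n x T matrix of discretized curves

# Empirical covariance K_hat(s, t) = (1/n) sum_i X_i(s) X_i(t) on the grid, and its
# spectral decomposition; the grid spacing h turns matrix products into integrals.
h = t[1] - t[0]
Khat = X.T @ X / n
eigval, eigvec = np.linalg.eigh(Khat * h)          # ascending eigenvalues
order = np.argsort(eigval)[::-1]
lam_hat = eigval[order]                            # estimated eigenvalues
phi_hat = eigvec[:, order].T / np.sqrt(h)          # rows: estimated eigenfunctions

# FPC scores U_ij = <X_i, phi_hat_j>, approximated by a Riemann sum.
m = 4
U = X @ phi_hat[:m].T * h
```

In practice m would be chosen by an explained-variance criterion, as in Section 4, rather than fixed in advance.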

Priors. In this study, a Bayesian approach is implemented to estimate the unknown parameters θ, c, ρ, and σ². To carry this out, we must specify prior distributions for the unknown parameters of the model. First, we specify the prior of θ, for which we consider a Bayesian adaptive Lasso approach, the Bayesian version of the adaptive Lasso method. Specifically, following the idea of the Bayesian adaptive Lasso, the hierarchical priors for θ are taken as

θ_j | η_j² ∼ N(0, η_j²),  η_j² | λ_j² ∼ Exp(λ_j²/2),  j = 1, . . . , p.  (8)

The above hierarchical priors lead to the following conditionally independent double-exponential prior density for θ:

π(θ | λ) = ∏_{j=1}^p (λ_j/2) exp(−λ_j|θ_j|),  (9)

which shows that θ_j | λ_j ∼ DE(λ_j), with density (λ_j/2)exp(−λ_j|θ_j|), for j = 1, . . . , p, where DE(λ_j) denotes the double-exponential distribution. According to Bayes' theorem, the posterior density of θ satisfies

p(θ | Y, Z, X, c, ρ, σ², λ) ∝ exp(−‖AY − Zθ − Uc‖²/(2σ²) − ∑_{j=1}^p λ_j|θ_j|),

which implies that the posterior mode of θ is exactly the adaptive Lasso estimator of θ. Generally, a large λ_j corresponds to an unimportant variable, while a small λ_j corresponds to an important variable; thus, we can perform variable selection through the values of the λ_j. In particular, if all the λ_j are equal, the above procedure reduces to the well-known Bayesian Lasso method. To enable a fully Bayesian analysis, in this study we use the hierarchical Bayesian approach to select the tuning parameter λ_j by treating λ_j as a random variable. Note that the tuning parameter λ_j is positive, so it is natural to consider a Gamma prior. Here, we place the Gamma prior on λ_j² rather than λ_j for convenience of sampling. Specifically, the Gamma prior for λ_j² is

λ_j² ∼ Gamma(c_1, d_1),  (10)

where c_1 and d_1 are predefined hyperparameters. In addition, the priors of the parameters c and ρ are chosen as c ∼ N(c_0, B_c) and ρ ∼ U(−1, 1), where the hyperparameters c_0 and B_c are assumed to be a known vector and matrix.
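The hierarchical prior can be sanity-checked via its scale-mixture representation: drawing η_j² from an exponential distribution with rate λ_j²/2 and then θ_j | η_j² ∼ N(0, η_j²) should reproduce the DE(λ_j) marginal. A small Monte Carlo sketch with an arbitrary illustrative λ:

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.0
N = 200_000

# Scale mixture: eta2 ~ Exp(rate = lam^2/2); NumPy parameterizes by scale = 2/lam^2.
# Then theta | eta2 ~ N(0, eta2).
eta2 = rng.exponential(scale=2.0 / lam**2, size=N)
theta = rng.normal(0.0, np.sqrt(eta2))

# Direct double-exponential (Laplace) draws with density (lam/2) * exp(-lam * |x|).
laplace = rng.laplace(0.0, 1.0 / lam, size=N)
```

Both samples should share the Laplace moments Var(θ_j) = 2/λ² and E|θ_j| = 1/λ, which is easy to verify empirically.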
Besides, σ² is taken to be inverse Gamma (denoted IG(c_0, d_0)) with density

π(σ²) = (d_0^{c_0}/Γ(c_0)) (σ²)^{−(c_0+1)} exp(−d_0/σ²),  (11)

where c_0 and d_0 are known positive numbers. In this study, we focus on the prior distributions listed above, namely, the normal, inverse Gamma, and uniform distributions; however, the proposed procedure can easily be modified to accommodate other prior distributions. Thus, the joint prior of all the unknown parameters is given by

π(θ, c, ρ, σ², η², λ²) = π(σ²)π(c)π(ρ) ∏_{j=1}^p π(θ_j | η_j²)π(η_j² | λ_j²)π(λ_j²).  (12)

Posterior Inference. Let ψ = (θ, c, ρ, σ²); we need to estimate the unknown parameters in ψ. Based on the likelihood function (7) and the priors in (12), the joint posterior distribution p(ψ | Y, Z, X) of the unknown parameters is given by

p(ψ | Y, Z, X) ∝ L(θ, c, ρ, σ² | Y, Z, X) π(θ, c, ρ, σ², η², λ²).  (13)

To make Bayesian inference based on the joint posterior distribution (13), we first derive the full conditional distributions of the unknown parameters and then construct a Gibbs sampler with embedded Metropolis–Hastings steps to generate posterior samples from these full conditional distributions, as follows.
(v) Full conditional distribution of σ²:

σ² | Y, Z, X, θ, c, ρ ∼ IG(a_{σ²}, b_{σ²}),

where a_{σ²} = n/2 + c_0 and b_{σ²} = d_0 + e^T e/2, with e = AY − Zθ − Uc.

Consequently, based on the conditional posterior distributions (14)–(20), we can construct an efficient MCMC-based sampling algorithm for generating posterior samples, which is summarized in Algorithm 1. The conditional posterior distributions (14)–(19) are familiar distributions, such as the inverse Gaussian, inverse Gamma, Gamma, and normal distributions, so it is fast and convenient to generate posterior samples from them. However, the conditional posterior distribution in (20) has a nonstandard and rather complicated density function, and it is hard to draw observations from it directly. We therefore use the well-known Metropolis–Hastings algorithm to overcome this difficulty. First, we choose a random-walk normal proposal with variance σ²_ρ, where σ²_ρ is chosen such that the average acceptance rate is roughly between 0.25 and 0.45 (Gelman et al. [22]). Then, the Metropolis–Hastings algorithm is implemented as follows: at the (l + 1)th iteration, with current value ρ^(l), a new candidate ρ* is generated from N(ρ^(l), σ²_ρ) and is accepted with probability

min{1, p(ρ* | Y, Z, X, θ, c, σ²) / p(ρ^(l) | Y, Z, X, θ, c, σ²)}.

Consequently, the posterior samples obtained from Algorithm 1 converge to the joint posterior distribution (13) after the burn-in period.
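The Metropolis–Hastings update for ρ described above can be sketched as follows; `log_cond` is a toy stand-in for the log of the full conditional in (20) (not the paper's density), and in practice the proposal standard deviation would be tuned until the acceptance rate falls in the 0.25–0.45 range.

```python
import numpy as np

def mh_step_rho(rho_curr, log_post, sigma_rho, rng):
    """One random-walk Metropolis-Hastings update for the spatial parameter rho.

    log_post returns the log full conditional of rho up to an additive constant;
    candidates outside the support (-1, 1) are rejected outright."""
    rho_prop = rng.normal(rho_curr, sigma_rho)
    if not (-1.0 < rho_prop < 1.0):
        return rho_curr, False
    log_alpha = log_post(rho_prop) - log_post(rho_curr)   # log acceptance ratio
    if np.log(rng.random()) < log_alpha:
        return rho_prop, True
    return rho_curr, False

# Toy demonstration with a stand-in log conditional centered at 0.3.
rng = np.random.default_rng(0)
log_cond = lambda r: -0.5 * ((r - 0.3) / 0.1) ** 2
chain, accepts = [0.0], 0
for _ in range(5000):
    rho_new, accepted = mh_step_rho(chain[-1], log_cond, 0.3, rng)
    chain.append(rho_new)
    accepts += accepted
```

With a well-tuned `sigma_rho`, the chain mean should settle near the mode of the stand-in conditional.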

Simulation Study
In this section, we investigate the performance of the proposed model and Bayesian estimation method via Monte Carlo simulation. The datasets are generated from the following partial functional linear spatial autoregressive model:

Y_i = ρ ∑_{j=1}^n w_{ij} Y_j + Z_i^T θ + ∫_0^1 X_i(t)β(t)dt + ε_i,

where Z follows a multivariate normal distribution N(0, Σ_z) with (Σ_z)_{i,j} = 0.5^{|i−j|} for i, j = 1, 2, . . . , 8, and θ = (1, −1, 0, 0, 1, 0, 0, 0)^T. For a fuller comparison, we consider two values σ² = 0.25 and 1, which represent strong and weak signal-to-noise ratios. Furthermore, we take the spatial parameter ρ = 0.5, 0, −0.5, describing different spatial dependence strengths and directions. Similar to Xie et al. [11], the weight matrix is set to W = I_R ⊗ H_q with H_q = (l_q l_q^T − I_q)/(q − 1), where l_q is a q-dimensional vector of ones and ⊗ denotes the Kronecker product. For the functional part, we take the same form as Shin [6]; the functional principal component scores ξ_j are independent normal with mean zero and variance λ_j = ((j − 0.5)π)^{−2}, and φ_j(t) = √2 sin((j − 0.5)πt). In the simulation, we consider noninformative hyperparameter values for the unknown parameters θ, c, ρ, σ²: c_0 = 0_m, B_c = I_m, c_0 = d_0 = 0.01, c_1 = 1, and d_1 = 0.1, where 0_m is an m-dimensional vector of zeros. In addition, we choose R to be 50 and 100 and q to be 3; thus, the sample size n = R × q equals 150 and 300. Throughout our numerical studies, we choose the truncation parameter m such that the first m functional principal component scores explain at least 90% of the total variability of the functional predictor X(t).
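The block-diagonal weight matrix W = I_R ⊗ H_q used in the simulations is straightforward to construct; a short sketch (the helper name is ours):

```python
import numpy as np

def block_weight_matrix(R, q):
    """W = I_R kron H_q with H_q = (l_q l_q^T - I_q)/(q - 1): R groups of q units,
    each unit equally weighted by the other q - 1 members of its own group."""
    l = np.ones((q, 1))
    H = (l @ l.T - np.eye(q)) / (q - 1)
    return np.kron(np.eye(R), H)

W = block_weight_matrix(50, 3)   # n = R * q = 150
```

Each row of W sums to 1 and the diagonal is zero, matching the requirements on W in Section 2.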
Based on the generated datasets and the above simulation settings, we use the MCMC-based sampling algorithm in Algorithm 1 to obtain the Bayesian estimates of the unknown parameters over 100 replications. To investigate the convergence of the proposed MCMC algorithm, we compute the EPSR (estimated potential scale reduction) values [23] for a number of test runs on the basis of three parallel chains started from three different sets of initial values. We find that, in all test runs, the EPSR values are close to 1 and less than 1.2 after discarding the first 3,000 iterations as the burn-in period. Hence, we collect M = 2,000 observations after 3,000 iterations to produce the Bayesian estimates for each replication; posterior summaries of the parameters are listed in Tables 1–3. In addition, the accuracy of the functional coefficient estimator is assessed by the square root of average squared errors (RASE), defined as

RASE = ( (1/N) ∑_{s=1}^N (β̂(t_s) − β(t_s))² )^{1/2},

where the t_s, s = 1, 2, . . . , N, are the grid points at which the functional coefficient estimator β̂(t) is evaluated, and N = 200 is used in our simulation. The means and standard errors (SE) of RASE for the functional component under the different cases are reported in Table 4. Moreover, to assess the accuracy of the estimated function β̂(t) directly, we plot the true functional coefficient β(t) against its estimate under the different cases; these functional curve fits under different spatial parameters are shown in Figures 1–4.
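The RASE criterion can be coded directly; in the sketch below, `beta_hat` is a hypothetical estimate (the true curve shifted by a constant), used only to exercise the function.

```python
import numpy as np

def rase(beta_hat, beta_true, t_grid):
    """RASE = sqrt((1/N) * sum_s (beta_hat(t_s) - beta_true(t_s))^2)."""
    return float(np.sqrt(np.mean((beta_hat(t_grid) - beta_true(t_grid)) ** 2)))

t_grid = np.linspace(0.0, 1.0, 200)                      # N = 200 grid points
beta_true = lambda t: np.sqrt(2.0) * np.sin(0.5 * np.pi * t)
beta_hat = lambda t: beta_true(t) + 0.01                 # hypothetical estimate
err = rase(beta_hat, beta_true, t_grid)
```

For a constant shift of 0.01, the RASE is exactly 0.01, which makes the function easy to unit-test.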
In Tables 1–3, "Bias" denotes the difference between the true value and the average of the Bayesian estimates of the parameters over the 100 replications, "SD" denotes the standard deviation of the Bayesian estimates, and "PP" denotes the proportion of the 100 replications in which a parameter was identified as zero, using the criterion that a parameter is identified as zero if its 95% confidence interval contains zero. From careful inspection of Tables 1–4, several conclusions can be drawn. (1) Under all the considered settings, the performance of the proposed Bayesian method is quite satisfactory in the sense that the values of Bias and SD are reasonably small; furthermore, the Bayesian estimates improve as the sample size increases. (2) As expected, the performance of the Bayesian estimation is similar under different spatial parameters. (3) As the variance σ² becomes smaller, the Bayesian estimation method performs better. (4) As predicted, the proposed Bayesian method identifies the correct model in most cases, because the PP values corresponding to the important covariates are all close to zero, which agrees with what was found in Table 4. To sum up, all the above findings show that the proposed Bayesian estimation procedure and MCMC-based sampling algorithm can recover the true structure of the partial functional linear spatial autoregressive model well.

Input: initial values ψ^(0) = (θ^(0), c^(0), ρ^(0), σ^{2(0)}) and the number of iterations J of the sampling algorithm.
Output: a sequence of posterior samples (ψ^(1), ψ^(2), . . . , ψ^(J)).
MCMC iterations (the detailed Gibbs sampler cycles are outlined as follows):
For k ← 1 to J do:
(1) Sample η_j^{−2} | Y, Z, X, θ, c, ρ, σ² ∼ Inverse Gaussian(μ′, λ′);
(6) Sample ρ | Y, Z, X, θ, c, σ² from (20) based on the Metropolis–Hastings algorithm.
End
ALGORITHM 1: An MCMC-based sampling algorithm for ψ = (θ, c, ρ, σ²).

Conclusion and Discussion
In this study, we propose a partial functional linear spatial autoregressive model that can explore the relationship between a scalar, spatially dependent response variable and predictors comprising both multiple scalar covariates and a functional covariate. Based on functional principal component analysis, a Bayesian adaptive Lasso approach is developed to analyse the model by combining the Gibbs sampler with the Metropolis–Hastings algorithm. Simulation studies demonstrate the efficiency of the proposed Bayesian method; the results indicate that the developed Bayesian variable selection method is highly efficient and computationally convenient.
Furthermore, several interesting extensions can be considered in the future. (i) Compared with classical mean regression, quantile regression is more robust to nonnormal errors and outliers; therefore, using quantile regression techniques to analyse functional data with spatial dependence warrants future investigation. (ii) Missing data analysis has always been a hot topic in statistics; therefore, it is also worth studying the current model when covariates are missing under different missingness mechanisms. We leave these topics for future research.

Data Availability
No data were used to support this study, and computer code is available from the corresponding author upon request.

Conflicts of Interest
The authors declare no conflicts of interest.