Automatic Variable Selection for Partially Linear Functional Additive Model and Its Application to the Tecator Data Set

We introduce a new partially linear functional additive model and consider the problem of variable selection for this model. Based on the functional principal component method and centered spline basis function approximation, we propose a new variable selection procedure using the smooth-threshold estimating equation (SEE). The proposed procedure automatically eliminates inactive predictors by setting the corresponding parameters to zero and simultaneously estimates the nonzero regression coefficients by solving the SEE. The approach avoids solving a convex optimization problem and is flexible and easy to implement. We establish the asymptotic properties of the resulting estimators under some regularity conditions and apply the proposed procedure to analyze a real data set, the Tecator data set.


Introduction
Functional data may be viewed as realizations of an observed stochastic process, and they are commonly encountered in many fields of the applied sciences, such as econometrics, biomedical studies, and physics experiments. The Tecator data set was collected by the Tecator company and is publicly available at http://lib.stat.cmu.edu/datasets/tecator. It consists of 215 meat samples. The measurements were made with a spectrometer, the Tecator Infratec Food and Feed Analyzer, and the spectral curves were recorded at wavelengths ranging from 850 nm to 1050 nm. For each meat sample, the data consist of a 100-channel spectrum of absorbances together with the moisture (water), fat, and protein contents. These three contents, measured in percentages, are determined by analytical chemistry. Our aim is to predict the fat content of a meat sample. In this paper, we propose a new partially linear functional additive model and apply the SEE procedure to analyze the Tecator data set.
With the development of computer technology, much progress has been made in developing methodologies for analyzing functional data; see, for example, Ramsay and Silverman [1], Cardot, Ferraty, and Sarda [2], Lian and Li [3], Fan, James, and Radchenko [4], Feng and Xue [5], Yu, Zhang, and Du [6], and Zhou, Du, and Sun [7], among others. Regression models play a major role in functional data analysis. The most widely used regression model is the functional linear model
$$Y = \int_0^1 X(t)\beta(t)\,dt + \varepsilon, \qquad (1)$$
where $Y$ is a scalar response, the functional predictor $X(t)$ is a smooth and square-integrable random function defined, for simplicity, on the compact domain $\mathcal{T} = [0, 1]$, $\beta(t)$ is the square-integrable regression parameter function, and $\varepsilon$ is a random error independent of $X(t)$. A commonly adopted approach for fitting model (1) is basis expansion; that is, $X(t) = \mu(t) + \sum_{k=1}^{\infty} \xi_k \phi_k(t)$, where $\mu(t) = E\{X(t)\}$. Model (1) is then transformed into a linear form with coefficients $\beta_k$:
$$E(Y \mid X) = \beta_0 + \sum_{k=1}^{\infty} \beta_k \xi_k,$$
where $\beta_0 = \int_0^1 \mu(t)\beta(t)\,dt$ and $\beta_k = \int_0^1 \beta(t)\phi_k(t)\,dt$. The basis function set $\{\phi_k(t)\}$ can be either predetermined (e.g., Fourier, wavelet, or B-spline bases) or data-driven. One convenient data-driven choice is the eigenbasis of the autocovariance operator of $X(t)$, in which case the random coefficients $\{\xi_k\}$ are called the functional principal component (FPC) scores. The FPC scores have zero means and variances equal to the corresponding eigenvalues $\{\lambda_k, k = 1, 2, \cdots\}$. We focus on the FPC representation of functional regression throughout this paper.
2 Mathematical Problems in Engineering
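To illustrate the claim that FPC scores have zero means and variances equal to the eigenvalues, here is a minimal NumPy sketch of empirical FPCA for curves observed on a common, equally spaced grid. It is an assumption-laden illustration, not the estimation procedure of this paper: the covariance operator is discretized with a simple Riemann sum, the function name `fpca_scores` is hypothetical, and the curves are simulated rather than taken from the Tecator spectra.

```python
import numpy as np

def fpca_scores(curves, t, n_components):
    """Empirical FPCA for curves observed on a common, equally spaced grid t.

    curves: (n, m) array, one discretized curve X_i(t) per row.
    Returns the mean curve, eigenvalues, eigenfunctions and FPC scores.
    """
    dt = t[1] - t[0]                                # Riemann-sum quadrature weight
    mu = curves.mean(axis=0)                        # estimate of mu(t) = E{X(t)}
    centered = curves - mu
    cov = centered.T @ centered / len(curves)       # sample autocovariance on the grid
    evals, evecs = np.linalg.eigh(cov * dt)         # discretized covariance operator
    order = np.argsort(evals)[::-1][:n_components]  # largest eigenvalues first
    lam = evals[order]                              # eigenvalues lambda_k
    phi = evecs[:, order] / np.sqrt(dt)             # eigenfunctions, int phi_k^2 dt = 1
    xi = centered @ phi * dt                        # scores xi_k = int (X - mu) phi_k dt
    return mu, lam, phi, xi

# Hypothetical simulated curves (not the Tecator spectra).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
basis = np.vstack([np.sqrt(2) * np.sin(np.pi * t), np.sqrt(2) * np.cos(np.pi * t)])
curves = 1.0 + (rng.normal(size=(200, 2)) * [2.0, 1.0]) @ basis
mu, lam, phi, xi = fpca_scores(curves, t, n_components=2)
print(np.round(lam, 2), np.round(xi.var(axis=0), 2))
```

With this discretization, the sample variance of each score column (with divisor $n$) equals the corresponding eigenvalue exactly, and the score means vanish because the curves are centered first.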
Müller and Yao [8] relaxed the linearity assumption and proposed the functional additive model (FAM), which provides a more widely applicable and flexible framework for functional regression. In the case of a scalar response, the linear structure is replaced by a sum of nonlinear functional components; that is,
$$E(Y \mid X) = \beta_0 + \sum_{k=1}^{\infty} f_k(\xi_k), \qquad (2)$$
where $f_k(\xi_k)$ are unknown smooth functions. In Müller and Yao [8], model (2) was fitted by estimating $\{\xi_k\}$ through functional principal component analysis and $\{f_k\}$ through local polynomial smoothing. Zhu, Yao, and Zhang [9] proposed a new regularization framework for the structure estimation of FAM in the context of reproducing kernel Hilbert spaces. The selection was achieved by penalized least squares with a penalty that encourages a sparse structure of the additive components, and the rate of convergence was investigated. However, in many real-world problems, it is common to collect information on a large number of nonfunctional predictors as well. How to incorporate scalar predictors into functional regression and how to perform model selection are therefore important issues. In this paper, we combine the linear model with the functional additive model and introduce a new partially linear functional additive model (PLFAM).
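As a hedged illustration of the local polynomial step mentioned above, the sketch below smooths a response against a single score with a local linear estimator and a Gaussian kernel. The kernel, the fixed bandwidth, the function name `local_linear`, and the simulated score are all assumptions for illustration; this is not a reproduction of Müller and Yao's implementation or bandwidth selection.

```python
import numpy as np

def local_linear(x, y, x0, h):
    """Local linear estimate of E(Y | X = u) at each u in x0 (Gaussian kernel, bandwidth h)."""
    est = np.empty(len(x0))
    for i, u in enumerate(x0):
        w = np.exp(-0.5 * ((x - u) / h) ** 2)          # kernel weights around u
        D = np.column_stack([np.ones_like(x), x - u])  # local design: intercept and slope
        WD = D * w[:, None]
        coef = np.linalg.solve(D.T @ WD, WD.T @ y)     # weighted least squares
        est[i] = coef[0]                               # intercept = fitted value at u
    return est

# Hypothetical example: smooth Y against one simulated score.
rng = np.random.default_rng(1)
xi1 = rng.uniform(-2.0, 2.0, size=400)
y = np.sin(xi1) + 0.2 * rng.normal(size=400)
grid = np.linspace(-1.5, 1.5, 7)
print(np.round(local_linear(xi1, y, grid, h=0.3), 2))
```

The local linear fit is used here instead of a local constant fit because it has smaller bias near the edges of the score range.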
Traditional additive models with real-valued covariates were studied in Stone [10], Wang and Yang [11], Huang, Horowitz, and Wei [12], and Zhao and Xue [13]. When the explanatory variables are functional, Ferraty and Vieu [14] used a two-step procedure to estimate an additive model with two functional predictors. Fan, James, and Radchenko [4] proposed a new penalized least squares method for fitting the nonlinear functional additive model; it can efficiently fit high-dimensional functional models while simultaneously performing variable selection to identify the relevant predictors. Febrero-Bande and Gonzalez-Manteiga [15] extended the ideas of generalized additive models for multivariate data to functional covariates. Their algorithm is a modified version of the local scoring and backfitting algorithm that allows nonparametric estimation of the link function.
Over the past decades, variable selection has received substantial attention and has become a very important topic in regression analysis. Generally speaking, most variable selection procedures are based on penalized estimation with penalty functions such as the Lasso penalty [16], the SCAD penalty [17], and the adaptive Lasso [18]. However, these penalty functions are singular at zero, so the corresponding penalized estimation procedures require solving convex optimization problems, which adds to the computational burden. To overcome this problem, Ueki [19] developed a new variable selection procedure, the smooth-threshold estimating equation, which automatically eliminates irrelevant parameters by setting them to zero. The method has since been applied successfully to a large class of models. For example, Lai, Wang, and Lian [20] explored generalized estimating equation (GEE) estimation and smooth-threshold generalized estimating equation (SGEE) variable selection for the single-index model with clustered data; Li et al. [21] considered SGEE variable selection for the generalized linear model with longitudinal data; and Tian, Xue, and Xu [22] proposed smooth-threshold estimating equation variable selection for varying coefficient models with longitudinal data.
As is well known, functional regression models have been widely applied to engineering problems. For example, Escabias, Aguilera, and Valderrama [23] used functional logistic regression to address an environmental problem: estimating the risk of drought in a specific zone from the time evolution of temperatures. Sonja, Branimir, and Dražen [24] studied tool wear in the milling process and predicted its behaviour using functional data analysis (FDA) methodology. Pokhrel and Tsokos [25] applied functional data analysis techniques to model age-specific brain cancer mortality trends and to forecast entire age-specific functions using exponential smoothing state-space models.
In this article, we propose a new functional regression model, consider the variable selection problem for this model, and apply the proposed procedure to analyze the Tecator data set. Motivated by the idea in Ueki [19] and based on functional principal component analysis and centered spline basis function approximation, we propose an automatic variable selection procedure using the smooth-threshold estimating equation. The procedure automatically eliminates the irrelevant parameters in the model while estimating the nonzero regression coefficients. It can be implemented without solving any convex optimization problem, which reduces the computational burden; it is flexible and easy to implement; and it enjoys desirable properties, including the oracle property. Finally, the proposed method is applied to the Tecator data set, and the results support the validity of the partially linear functional additive model and the SEE method.
The rest of this paper is organized as follows. In Section 2, we propose a variable selection procedure for the PLFAM and study its asymptotic properties under some regularity conditions. In Section 3, we describe the computation of the estimators and the choice of the tuning parameters. In Section 4, we apply the proposed method to the Tecator data set. Concluding remarks are presented in Section 5. The technical proofs of all asymptotic results are provided in the Appendix.

Smooth-Threshold Estimating Equation.
For the convenience of model regularization, we restrict, without loss of generality, the predictor variables to take values in $[0, 1]$. This is achieved by transforming the FPC scores through a cumulative distribution function (CDF) $\Psi: \mathbb{R} \to [0, 1]$ for all $\{\xi_k\}$. We take $\Psi(\cdot, \lambda_k)$ to be the normal CDF with zero mean and variance $\lambda_k$. It is easy to see that if the $\xi_k$ are normally distributed, the normal CDF yields transformed variables that are uniformly distributed on $[0, 1]$.
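The uniformity claim is the probability integral transform. A minimal sketch, assuming the scores are exactly $N(0, \lambda_k)$ (simulated here) and using the standard library `math.erf` to evaluate the normal CDF; the function name `transform_scores` is hypothetical.

```python
import math
import numpy as np

def transform_scores(xi, lam):
    """zeta_k = Psi(xi_k, lambda_k): the N(0, lambda_k) CDF applied column-wise."""
    z = np.asarray(xi, dtype=float) / np.sqrt(lam)       # standardize each score column
    std_normal_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    return std_normal_cdf(z)

# Hypothetical normal scores with variances lambda = (4, 1).
rng = np.random.default_rng(3)
lam = np.array([4.0, 1.0])
xi = rng.normal(size=(20000, 2)) * np.sqrt(lam)
zeta = transform_scores(xi, lam)
print(np.round(zeta.mean(axis=0), 3), np.round(zeta.var(axis=0), 4))  # compare with 1/2 and 1/12
```

If the scores are not normal, the transformed values still lie in $[0, 1]$ but are no longer exactly uniform.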
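Denote the transformed variables of $\{\xi_k\}$ by $\{\zeta_k\}$, i.e., $\zeta_k = \Psi(\xi_k, \lambda_k)$, and let $\zeta_{i,\infty} = (\zeta_{i1}, \zeta_{i2}, \cdots)^T$. We propose the following partially linear functional additive model:
$$Y_i = Z_i^T\beta + \sum_{k=1}^{\infty} f_k(\zeta_{ik}) + \varepsilon_i, \qquad (3)$$
where $\varepsilon_i$ is an independent error with zero mean and variance $\sigma^2$, $\beta$ is a $p \times 1$ vector of unknown regression coefficients, $Z_i$ is the corresponding $p$-dimensional vector of scalar covariates, and $f_k(\cdot)$ are unknown smooth functions. To ensure identifiability, we assume that $E f_k(\zeta_k) = 0$. In this paper, we assume that the underlying true PLFAM has a sparse structure; this assumption is critical in the context of functional data analysis. It means that the number of important functional additive components that contribute to the response is finite, although these components are not necessarily the leading terms. In particular, we denote by $\mathcal{I}$ the index set of the important FPC scores and assume that $|\mathcal{I}| < \infty$, where $|\cdot|$ denotes the cardinality of a set. In other words, there is a sufficiently large $K$ such that $\mathcal{I} \subseteq \{1, \cdots, K\}$, which implies that $f_k \equiv 0$ whenever $k > K$. Model (3) is thus equivalent to
$$Y_i = Z_i^T\beta + \sum_{k=1}^{K} f_k(\zeta_{ik}) + \varepsilon_i. \qquad (4)$$
We replace $f_k(\zeta_k)$ with its basis function approximation. Let $B(\zeta_k) = (B_1(\zeta_k), \cdots, B_L(\zeta_k))^T$ be the centered B-spline basis functions of order $M$, where $L = N + M$ and $N$ is the number of interior knots. Thus, $f_k(\zeta_k)$ can be approximated by
$$f_k(\zeta_k) \approx B(\zeta_k)^T\gamma_k. \qquad (5)$$
Substituting this into model (4), we have
$$Y_i \approx Z_i^T\beta + \sum_{k=1}^{K} B(\zeta_{ik})^T\gamma_k + \varepsilon_i. \qquad (6)$$
Let $W_i = (Z_i^T, B(\zeta_{i1})^T, \cdots, B(\zeta_{iK})^T)^T$, $\gamma = (\gamma_1^T, \cdots, \gamma_K^T)^T$, and $\theta = (\beta^T, \gamma^T)^T$, so that model (6) can be written as
$$Y_i \approx W_i^T\theta + \varepsilon_i, \qquad (7)$$
and let $U(\theta)$ denote the estimating function for $\theta$ based on model (7). Motivated by the idea of Ueki [19], we propose the following smooth-threshold estimating equation:
$$(I_s - \Delta)\,U(\theta) + \Delta\,\theta = 0, \qquad (8)$$
where $s = p + KL$, $I_s$ is the $s$-dimensional identity matrix, and $\Delta$ is the diagonal matrix $\Delta = \operatorname{diag}(\Delta_1, \Delta_2)$, with $\Delta_1 = \operatorname{diag}(\delta_{11}, \cdots, \delta_{1p})$ and $\Delta_2 = \operatorname{diag}(\delta_{21} I_L, \cdots, \delta_{2K} I_L)$, whose entries $\delta_{1j}$ and $\delta_{2k}$ take values in $[0, 1]$. Note that $\delta_{1j} = 1$ reduces (8) to $\beta_j = 0$ for $j = 1, \cdots, p$, and $\delta_{2k} = 1$ reduces it to $\gamma_k = 0$, that is, $f_k(\cdot) = 0$, for $k = 1, \cdots, K$. Therefore, (8) can yield a sparse solution. Unfortunately, we cannot obtain the estimator of $\theta$ directly by solving the smooth-threshold estimating equation (8), because (8) involves the unknown quantities $\delta_{1j}$ and $\delta_{2k}$, which need to be chosen by data-driven criteria.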
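To make the construction concrete, here is a minimal NumPy sketch of the pieces in (5) to (8), under several loudly stated assumptions. It builds an order-$M$ B-spline basis on $[0, 1]$ by the Cox-de Boor recursion with equally spaced interior knots and centers each column by its sample mean (our assumption about how "centered" is implemented). The threshold rule $\delta = \min\{1, \lambda/\|\tilde\theta\|^{1+\tau}\}$, computed from the initial estimate per coordinate of $\beta$ and per block $\gamma_k$, is assumed from Ueki [19], since the exact formula is not given above. On coordinates with $\delta < 1$, the sketch solves the ridge-type system $(W^TW + G)\theta = W^TY$ with $G = \operatorname{diag}\{\delta/(1-\delta)\}$, i.e., it assumes the sign convention for $U(\theta)$ under which the $\Delta\theta$ term acts as shrinkage. Function names are hypothetical and the data are simulated, not the Tecator data.

```python
import numpy as np

def bspline_basis(x, n_interior, order=4):
    """B-spline basis of the given order on [0, 1] (Cox-de Boor recursion) with
    equally spaced interior knots; returns an (n, L) matrix, L = n_interior + order."""
    x = np.asarray(x, dtype=float)
    inner = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(order), inner, np.ones(order)])
    B = np.zeros((len(x), len(knots) - 1))               # order-1 (indicator) basis
    for j in range(len(knots) - 1):
        if knots[j] < knots[j + 1]:
            B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    B[x == 1.0, np.flatnonzero(knots[:-1] < knots[1:])[-1]] = 1.0   # right endpoint
    for m in range(2, order + 1):                        # raise the order step by step
        Bm = np.zeros((len(x), len(knots) - m))
        for j in range(len(knots) - m):
            d1 = knots[j + m - 1] - knots[j]
            d2 = knots[j + m] - knots[j + 1]
            if d1 > 0:
                Bm[:, j] += (x - knots[j]) / d1 * B[:, j]
            if d2 > 0:
                Bm[:, j] += (knots[j + m] - x) / d2 * B[:, j + 1]
        B = Bm
    return B

def see_fit(W, y, groups, lam, tau=1.0):
    """Solve a smooth-threshold estimating equation for a linear working model.
    groups[j] labels column j: each beta_j is its own group, and the L columns
    of gamma_k share one label, so delta_{2k} acts on the whole block."""
    theta0 = np.linalg.lstsq(W, y, rcond=None)[0]        # initial estimate (min-norm)
    delta = np.empty(W.shape[1])
    for g in np.unique(groups):
        idx = groups == g
        size = np.linalg.norm(theta0[idx])               # |beta_j| or ||gamma_k||
        delta[idx] = min(1.0, lam / size ** (1.0 + tau))  # assumed Ueki-type threshold
    theta = np.zeros(W.shape[1])
    keep = delta < 1.0                                   # delta = 1 forces a zero
    if keep.any():
        Wk = W[:, keep]
        G = np.diag(delta[keep] / (1.0 - delta[keep]))   # ridge-type weights (assumed sign)
        theta[keep] = np.linalg.solve(Wk.T @ Wk + G, Wk.T @ y)
    return theta

# Hypothetical simulated data: beta_2 = 0 and f_2 = 0 are inactive.
rng = np.random.default_rng(4)
n, p, K, n_interior, order = 800, 3, 3, 3, 4
L = n_interior + order
Z = rng.normal(size=(n, p))
zeta = rng.uniform(size=(n, K))                          # stands in for transformed FPC scores
y = (Z @ np.array([2.0, 0.0, -1.5]) + np.sin(2 * np.pi * zeta[:, 0])
     + np.cos(2 * np.pi * zeta[:, 2]) + 0.2 * rng.normal(size=n))
blocks = [bspline_basis(zeta[:, k], n_interior, order) for k in range(K)]
W = np.hstack([Z] + [B - B.mean(axis=0) for B in blocks])   # empirically centered basis
groups = np.concatenate([np.arange(p), np.repeat(p + np.arange(K), L)])
theta = see_fit(W, y, groups, lam=1.0)
beta_hat, gamma_hat = theta[:p], theta[p:].reshape(K, L)
print(np.round(beta_hat, 2), np.linalg.norm(gamma_hat, axis=1).round(2))
```

Because the centered columns of each block sum to zero, $W$ is rank deficient; the sketch therefore uses the minimum-norm least-squares solution as the initial estimate, while the positive weights in $G$ keep the final linear system invertible.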

Asymptotic Properties.
We first introduce some notation. Let $f_0(\cdot)$ and $\beta_0$ denote the true values of $f(\cdot)$ and $\beta$, respectively, and let $\gamma_0$ be the spline coefficient vector from the spline approximation to $f_0(\cdot)$. Denote $\theta_0 = (\beta_0^T, \gamma_0^T)^T$, $\mathcal{A}_{10} = \{\ldots\}$. For convenience and simplicity, let $C$ denote a positive constant that may differ at each appearance throughout this paper. We list some regularity conditions that are used in this paper.
The following theorem gives the consistency of our proposed estimators.

Theorem 2. Under the regularity conditions of Theorem 1, one has
Next, we show that the estimators of the nonzero coefficients of the parametric components have the same asymptotic distribution as those based on the correct model.

Issues in Practical Implementation
3.1. Computational Algorithm. Since the transformed FPC scores $\{\zeta_k\}$ cannot be observed, we first need to estimate the FPC scores before estimating and selecting $f_0(\cdot)$. In what follows, we describe the algorithm that implements the estimation procedure.
Step 2. Calculate the initial estimate $\hat{\theta}^{(0)}$ of $\theta$ by solving the estimating equation $U(\theta) = 0$.
Step 3. Choose the two tuning parameters by the BIC-type criterion described in the next subsection.
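As a hedged illustration of Steps 2 and 3, the sketch below computes the initial estimate by least squares and grid-searches the threshold level (called `lam` here) and the exponent `tau`. It assumes a common BIC-type form, $\log(\mathrm{RSS}/n) + \mathrm{df}\cdot\log(n)/n$ with df the number of nonzero estimates, which may differ from the criterion actually used. It also assumes Ueki-type thresholds $\delta_j = \min\{1, \mathrm{lam}/|\tilde\theta_j|^{1+\tau}\}$ and a ridge-type solve on the retained coordinates. To stay short it uses a plain linear working model rather than the full spline design; `see_linear` and `bic_select` are hypothetical names.

```python
import numpy as np

def see_linear(X, y, lam, tau=1.0):
    """SEE fit for a linear working model with assumed Ueki-type thresholds."""
    theta0 = np.linalg.lstsq(X, y, rcond=None)[0]          # Step 2: solve U(theta) = 0
    delta = np.minimum(1.0, lam / np.abs(theta0) ** (1.0 + tau))
    theta = np.zeros(X.shape[1])
    keep = delta < 1.0                                      # delta_j = 1 forces theta_j = 0
    if keep.any():
        Xk = X[:, keep]
        G = np.diag(delta[keep] / (1.0 - delta[keep]))      # ridge-type weights (assumed sign)
        theta[keep] = np.linalg.solve(Xk.T @ Xk + G, Xk.T @ y)
    return theta

def bic_select(X, y, lam_grid, tau_grid=(1.0,)):
    """Return (bic, lam, tau, theta) minimizing log(RSS/n) + df * log(n) / n (assumed form)."""
    n = len(y)
    best = None
    for tau in tau_grid:
        for lam in lam_grid:
            theta = see_linear(X, y, lam, tau)
            rss = np.sum((y - X @ theta) ** 2)
            df = np.count_nonzero(theta)
            bic = np.log(rss / n) + df * np.log(n) / n
            if best is None or bic < best[0]:
                best = (bic, lam, tau, theta)
    return best

# Hypothetical simulated data (not the Tecator data): predictors 0, 1, 4 are active.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 6))
y = X @ np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0]) + rng.normal(size=1000)
bic, lam, tau, theta = bic_select(X, y, np.logspace(-3, 1, 30), tau_grid=(0.5, 1.0, 2.0))
print(lam, tau, np.flatnonzero(theta))
```

A grid search is used because each candidate fit costs only one least-squares solve, so exhaustive evaluation over a modest grid is cheap.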

Application to Real Data
We demonstrate the effectiveness of the proposed method by applying it to the Tecator data set. The data set contains 215 samples; each sample consists of finely chopped pure meat with different moisture, fat, and protein contents, which are measured in percentages and determined by analytical chemistry. The functional covariate $X(t)$ for each meat sample consists of a 100-channel spectrum of absorbances recorded on a Tecator Infratec Food and Feed Analyzer working in the wavelength range 850-1050 nm by the near-infrared transmission (NIT) principle. In this analysis, $Y$ is the fat content, $\{\xi_k\}$ are the functional principal component (FPC) scores of $X(t)$, $\zeta_k = \Psi(\xi_k, \lambda_k)$, and we denote the protein and moisture contents by $Z_1$ and $Z_2$, respectively. Many models and algorithms have been proposed to predict the fat content of a meat sample; see, for example, Aneiros-Pérez and Vieu [26]. In this paper, we fit the data with the following partially linear functional additive model:
$$Y_i = Z_{i1}\beta_1 + Z_{i2}\beta_2 + \sum_{k=1}^{K} f_k(\zeta_{ik}) + \varepsilon_i, \quad i = 1, \cdots, 215. \qquad (16)$$
To compare the performance of different models, the sample is divided into two data sets: the training sample $S_1 = \{1, \cdots, 165\}$ is used to estimate the parameters and the nonparametric functions, and the testing sample $S_2 = \{166, \cdots, 215\}$ is used to assess the quality of prediction through the mean squared error of prediction (PMSE),
$$\mathrm{PMSE} = \frac{1}{50}\sum_{i \in S_2} (Y_i - \hat{Y}_i)^2,$$
where $\hat{Y}_i$ is the predicted fat content of sample $i$. In this section, we consider the variable selection problem for model (16). We apply the proposed smooth-threshold estimating equation approach to eliminate the irrelevant parameters in the model while estimating the nonzero regression coefficients. The steps are as follows.
Step 2. Calculate the initial estimate $\hat{\theta}^{(0)}$ of $\theta$ by solving the estimating equation $U(\theta) = 0$.
Step 3. Choose the two tuning parameters by the BIC-type criterion.
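The training/testing split and the PMSE are simple to compute; a minimal sketch follows. The 1-based sample sets $S_1 = \{1, \cdots, 165\}$ and $S_2 = \{166, \cdots, 215\}$ become 0-based array indices. The data below are random placeholders standing in for $(Z_1, Z_2)$ and the fat content, not the Tecator measurements, and the least-squares fit is only a stand-in for whatever fitted model produces the predictions; `split_indices` and `pmse` are hypothetical names.

```python
import numpy as np

def split_indices(n_total=215, n_train=165):
    """0-based indices of S1 = {1, ..., 165} (training) and S2 = {166, ..., 215} (testing)."""
    idx = np.arange(n_total)
    return idx[:n_train], idx[n_train:]

def pmse(y_true, y_pred):
    """Mean squared error of prediction over the testing sample."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Hypothetical placeholder data (not the Tecator measurements).
rng = np.random.default_rng(5)
Z = rng.normal(size=(215, 2))
fat = Z @ np.array([1.0, -0.5]) + 0.1 * rng.normal(size=215)
train, test = split_indices()
coef = np.linalg.lstsq(Z[train], fat[train], rcond=None)[0]   # stand-in fitted model
print(len(train), len(test), round(pmse(fat[test], Z[test] @ coef), 4))
```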
(1) The scalar explanatory variables $Z_1$ and $Z_2$ can be used to improve prediction accuracy, which is consistent with the conclusion of Aneiros-Pérez and Vieu [26].
(2) Compared with these additive semifunctional models, the PLFAM has a much smaller PMSE; that is, the proposed PLFAM performs better than the other models.
(3) The PMSE of the PLFAM fitted by the SEE method is smaller than the PMSEs obtained with the SCAD-penalized and Lasso-penalized methods.
These conclusions confirm the validity of the proposed PLFAM and SEE method.

Concluding Remarks
This article develops an SEE procedure for automatic variable selection in the partially linear functional additive model. The proposed procedure identifies significant variables in the parametric and nonparametric components simultaneously: it automatically eliminates irrelevant parameters by setting them to zero while estimating the nonzero regression coefficients. Notably, the procedure avoids solving a convex optimization problem, and the resulting estimator enjoys the oracle property. The application to the Tecator data set confirms the validity of the proposed PLFAM and the SEE method.