Integral Least-Squares Inferences for Semiparametric Models with Functional Data

Inference for semiparametric models with functional data is investigated. We propose an integral least-squares technique for estimating the parametric components and study the asymptotic normality of the resulting integral least-squares estimator. For the nonparametric components, a local integral least-squares estimation method is proposed, and the asymptotic normality of the resulting estimator is also established. Based on these results, confidence intervals for the parametric and nonparametric components are constructed. Finally, simulation studies and a real data analysis are undertaken to assess the finite-sample performance of the proposed estimation method.


Introduction
In the recent literature, there has been increased interest in regression modeling for functional data, where both the predictor and the response are random functions. Compared with discrete multivariate analysis, functional data analysis can take into account the smoothness of the high-dimensional covariates and can suggest new approaches to problems that have not been explored before. Examples of functional data can be found in application fields such as biomedicine, economics, and archaeology (see Ramsay and Silverman [1]). Furthermore, the statistical analysis of regression models with functional data has been considered by many authors. For example, Ramsay and Silverman [2] studied the linear regression model with functional data. Ait-Saïdi et al. [3] proposed a cross-validated estimation procedure for the single-functional index model. Ferraty et al. [4] and Chen et al. [5] considered inference for single and multiple index functional regression models by using the functional projection pursuit regression technique. In addition, Ferraty and Vieu [6] and Rachdi and Vieu [7] considered nonparametric regression modeling for functional data. More work on functional data analysis can be found in [8-10], among others.
However, the functional linear model, which assumes a linear relationship between the response and the covariates, may be too restrictive. The semiparametric model with functional data is therefore a useful extension of both functional linear regression models and functional nonparametric regression models. More specifically, let $Y(t)$, $X(t)$, and $Z(t)$ be continuous random functions of the index $t$; then the semiparametric regression model with functional data has the following structure:
$$Y(t) = X(t)^{T}\beta + Z(t)^{T}\theta(t) + \varepsilon(t),$$
where $Y(t)$ is the response variable, $X(t)$ is the $p \times 1$ covariate vector, $Z(t)$ is the $q \times 1$ covariate vector, $\beta = (\beta_1, \ldots, \beta_p)^{T}$ is a vector of unknown parameters, $\theta(t) = (\theta_1(t), \ldots, \theta_q(t))^{T}$ is a vector of unknown functions of $t$, and $\varepsilon(t)$ is a zero-mean stochastic process. Here, without loss of generality, we assume that the index $t$ ranges over a nondegenerate compact interval. [...] and extend the application literature of the classical least-squares technique. More specifically, we propose an integral least-squares method for estimating the parametric components and the nonparametric components. Furthermore, the asymptotic normality of the integral profile least-squares estimators is studied. Simulation studies and a real data application indicate that the proposed method is workable.
The rest of this paper is organized as follows. In Section 2, we introduce the integral least-squares based estimation procedure for the parametric components and the nonparametric components. The asymptotic distributions of these estimators are also derived under some regularity conditions. In Section 3, some simulations and a real data analysis are carried out to assess the performance of the proposed estimation method. The technical proofs of all asymptotic results are provided in the Appendix.

Estimation and Asymptotic Distributions
For given $\beta$, applying the integral least-squares method, we obtain the weighted local integral least-squares estimator of $\{\theta_1(t), \ldots, \theta_q(t)\}$ by minimizing the kernel-weighted criterion (4), where $K_h(\cdot) = h^{-1}K(\cdot/h)$, $K(\cdot)$ is a kernel function, $h$ is a bandwidth, and $Z_{ik}(t)$ denotes the $k$th component of $Z_i(t)$. In matrix notation, the solution to (4) is given by (6), where $I_q$ is the $q \times q$ identity matrix and $0_q$ is the $q \times q$ zero matrix. Substituting (6) into (2), a simple calculation yields the linear model (7). Applying the integral least-squares technique to linear model (7), we obtain the integral least-squares estimator of $\beta$, say $\hat{\beta}$, by minimizing the corresponding integrated squared residuals. If the matrix $\hat{\Gamma}$ is invertible, $\hat{\beta}$ is available in closed form.
Theorem 1 establishes the asymptotic normality of $\hat{\beta}$, with asymptotic covariance matrix $\Sigma$. In order to construct the confidence interval of $\beta$ by Theorem 1, we give the estimator of $\Sigma$, say $\hat{\Sigma} = \hat{\Gamma}^{-1}\hat{B}(\hat{\beta})\hat{\Gamma}^{-1}$, where $\hat{\Gamma}$ is defined in (11). Invoking $\|\hat{\beta} - \beta\| = O_p(n^{-1/2})$, with an argument similar to that of Lemma A.6, we can prove that $\hat{\Sigma}$ is a consistent estimator of $\Sigma$. Thus, by Theorem 1, we obtain the limiting result (17), where $I_p$ is the identity matrix of order $p$. Therefore, the confidence region of $\beta$ can be constructed by using (17).
Furthermore, substituting $\hat{\beta}$ into (6), we obtain the integral least-squares estimator $\hat{\theta}(t)$ of $\theta(t)$. The asymptotic normality of $\hat{\theta}(t)$ is established in a separate theorem.
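The two-step procedure described in this section can be sketched numerically as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the curves are assumed to be observed on a common, equally spaced grid, so each integral over $t$ is approximated by a grid sum (the constant grid spacing cancels in the normal equations); the local fit is taken to be local linear, in line with the $(I_q, 0_q)$ selection in (6); and the function names `local_linear` and `profile_fit`, as well as the array layout, are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2)_+."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def local_linear(t0, t, R, Z, h):
    """Kernel-weighted local linear fit at the point t0 (assumed form of the local step).

    R: (n, m, k) responses on the grid t; Z: (n, m, q) covariates.
    Returns the (q, k) intercept block, i.e. the (I_q, 0_q) part of the solution.
    """
    n, m, q = Z.shape
    w = np.tile(epanechnikov((t - t0) / h) / h, n)   # K_h(t_j - t0), one copy per curve
    D = np.concatenate([Z, Z * (t - t0)[None, :, None]], axis=2).reshape(n * m, 2 * q)
    A = D.T @ (w[:, None] * D)
    b = D.T @ (w[:, None] * R.reshape(n * m, -1))
    return np.linalg.solve(A, b)[:q]

def profile_fit(t, Y, X, Z, h):
    """Profile estimates of (beta, theta) on an equally spaced grid t.

    Y: (n, m), X: (n, m, p), Z: (n, m, q). Grid sums stand in for integrals.
    """
    p = X.shape[2]
    R = np.concatenate([Y[:, :, None], X], axis=2)   # smooth Y and every column of X
    fitted = np.empty_like(R)
    for j, t0 in enumerate(t):
        fitted[:, j, :] = Z[:, j, :] @ local_linear(t0, t, R, Z, h)
    resid = R - fitted                               # partial residuals of Y and X
    beta = np.linalg.lstsq(resid[:, :, 1:].reshape(-1, p),
                           resid[:, :, 0].reshape(-1), rcond=None)[0]
    partial = (Y - X @ beta)[:, :, None]             # plug beta back into the local step
    theta = np.stack([local_linear(t0, t, partial, Z, h)[:, 0] for t0 in t])
    return beta, theta                               # shapes (p,) and (m, q)
```

Calling `profile_fit(t, Y, X, Z, h)` returns the parametric estimate and the estimated coefficient functions evaluated on the grid. On an unequally spaced grid, the kernel weights would additionally need quadrature weights.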

Numerical Results
In this section, we conduct several simulation experiments to illustrate the finite-sample performance of the proposed method and analyze a real data set for further illustration.

Simulation Studies.
To evaluate the performance of the proposed method, we consider the semiparametric model with a scalar parametric covariate $X(t)$ and a scalar nonparametric covariate $Z(t)$, where $\beta = 2.5$ and $\theta(t) = \sin(2\pi t)$. To perform the simulation, we generated samples of size $n = 30$, $50$, and $100$, respectively. The covariates $X(t)$ and $Z(t)$ are generated from a model driven by four random coefficients, whose distributions have parameters $(5, 7)$, $(0, 1.5)$, $(0, 5)$, and $(0, 1)$, respectively. We use the Epanechnikov kernel function $K(u) = 0.75(1 - u^2)_+$ and use the cross-validation method to determine the bandwidth $h$. Let $\hat{\theta}_{-i}(\cdot)$ and $\hat{\beta}_{-i}$ be the integral least-squares estimators of $\theta(\cdot)$ and $\beta$, respectively, computed with all of the measurements except the $i$th observation. The integral least-squares cross-validation function is defined in (25), and the cross-validation bandwidth is the one that minimizes (25) over $h$.
For the parametric component $\beta$, the average and standard deviation of the estimator $\hat{\beta}$, based on 1000 simulations, are reported in Table 1. In addition, the average length and coverage probability of the confidence interval for $\beta$, with a nominal level $1 - \alpha = 95\%$, are computed from the 1000 simulation runs. The results are also summarized in Table 1.
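The kernel and the leave-one-out bandwidth selection described above can be sketched as follows. This is a hedged illustration rather than the exact criterion (25), which is not reproduced here: the score is assumed to be the average over curves of the integrated squared prediction error, the integral is approximated by the trapezoidal rule, and `fit` is a hypothetical callable with signature `fit(t, Y, X, Z, h)` returning `(beta, theta)` with `theta` evaluated on the grid.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2)_+."""
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)

def scaled_kernel(u, h):
    """Rescaled kernel K_h(u) = K(u / h) / h."""
    return epanechnikov(u / h) / h

def integrate(f, t):
    """Trapezoidal approximation of the integral of f over the grid t."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

def cv_score(h, t, Y, X, Z, fit):
    """Leave-one-curve-out integrated squared prediction error (assumed form).

    Y: (n, m), X: (n, m, p), Z: (n, m, q); fit is a user-supplied estimator.
    """
    n = Y.shape[0]
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i                      # drop the i-th curve
        beta, theta = fit(t, Y[keep], X[keep], Z[keep], h)
        pred = X[i] @ beta + np.sum(Z[i] * theta, axis=1)
        total += integrate((Y[i] - pred) ** 2, t)
    return total / n

def select_bandwidth(candidates, t, Y, X, Z, fit):
    """Return the candidate bandwidth with the smallest cross-validation score."""
    scores = [cv_score(h, t, Y, X, Z, fit) for h in candidates]
    return candidates[int(np.argmin(scores))]
```

Searching over a finite candidate list keeps the sketch simple; the normalization by $n$ does not affect which bandwidth is selected.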
For the nonparametric component $\theta(t)$, the average pointwise confidence intervals, based on 1000 simulations, with a nominal level $1 - \alpha = 95\%$ are presented in Figure 1, and the corresponding coverage probabilities are presented in Figure 2.
Table 1 shows that, for the parametric component, our method yields short confidence intervals, and the corresponding coverage probabilities are close to the nominal level. Figures 1 and 2 show that the average interval length decreases as the sample size increases, while the corresponding coverage probability increases. In addition, we can see that, for the nonparametric component, the proposed estimation method works well except at boundary points.
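The summary quantities discussed above (average, standard deviation, average interval length, and coverage probability) can be computed from the simulation output along the following lines. This is a minimal sketch under assumptions: each run is assumed to supply an estimate and a standard error (for example, the square root of a diagonal entry of $\hat{\Sigma}/n$), the interval is the usual normal-approximation interval, and the function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np

def wald_interval(estimate, std_error, level=0.95):
    """Normal-approximation interval: estimate -/+ z * std_error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error

def coverage_summary(estimates, std_errors, truth, level=0.95):
    """Summarize repeated simulation runs for one scalar parameter."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    lower, upper = wald_interval(est, se, level)
    covered = (lower <= truth) & (truth <= upper)    # does each interval contain the truth?
    return {
        "mean": float(np.mean(est)),
        "sd": float(np.std(est, ddof=1)),
        "avg_length": float(np.mean(upper - lower)),
        "coverage": float(np.mean(covered)),
    }
```

Applying `coverage_summary` at each grid point of $t$ gives pointwise coverage probabilities for the nonparametric component in the same way.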

Application to Spectrometric Curves Data.
In this section, we present an application of the proposed estimation method to spectrometric curves data. The data come from a quality control problem in the food industry. The data set concerns samples of finely chopped meat; each food sample contains finely chopped pure meat with different fat, protein, and moisture (water) contents. The sample size of this data set is $n = 240$, and, for each food sample, the functional data consist of a 100-channel spectrum of absorbances, recorded on the Tecator Infratec Food and Feed Analyzer working in the wavelength range 850-1050 nm by the near infrared transmission (NIT) principle. Because of the fineness of the grid, we can consider each subject as a continuous curve. Thus, each spectrometric analysis can be summarized by a continuous curve giving the observed absorbance as a function of the wavelength. More details of the data can be found in Ferraty and Vieu [16].
The aim is to find the relationship between the percentage of fat content $Y(t)$ and the corresponding percentages of protein content $X_1(t)$ and moisture content $X_2(t)$, together with the spectrometric curve. The results obtained by Aneiros-Pérez and Vieu [11] indicate that there is a strong linear relationship between the fat content and the protein and moisture contents, while the spectrometric curve has a functional effect on the fat content. Hence, we consider a semiparametric model in which the protein and moisture contents enter linearly, with coefficients $\beta_1$ and $\beta_2$, and the spectrometric curve enters through the nonparametric component $\theta(\cdot)$. We computed the estimators of the parametric components $\beta_1$ and $\beta_2$ and the nonparametric component $\theta(\cdot)$ by using the proposed integral least-squares method. The results for the parametric components are reported in Table 2, and the results for the nonparametric component are reported in Figure 3, where the solid curve is the estimator of $\theta(\cdot)$ and the dashed curves are the pointwise confidence interval of $\theta(\cdot)$. From Table 2, we can see that there is a significant negative relationship between the fat content and the protein and moisture contents. In addition, Figure 3 indicates that the baseline function $\theta(\cdot)$ does vary over the spectrometric curve. This finding broadly agrees with what was found by Aneiros-Pérez and Vieu [11].

Proof of Theorems
For convenience and simplicity, let $c$ denote a positive constant which may take a different value at each appearance throughout this paper. Before we prove our main theorems, we list some regularity conditions which are used in this paper.

Table 1: The estimators and the 95% confidence intervals for $\beta$.

Table 2: The estimators and the 95% confidence intervals for $\beta$.