Empirical Likelihood for Generalized Functional-Coefficient Regression Models with Multiple Smoothing Variables under Right Censoring Data

Empirical likelihood, a nonparametric approach, has many desirable properties for constructing confidence regions. The purpose of this article is to apply the empirical likelihood method to generalized functional-coefficient regression models with multiple smoothing variables when the response is subject to random right censoring. Coefficient functions with multiple smoothing variables can accommodate various nonlinear interaction effects between covariates. The empirical log-likelihood ratio of the unknown parameter is constructed and shown to have a standard chi-squared limiting distribution at the true parameter, from which a confidence region for the unknown parameter can be constructed. Simulation studies indicate that the empirical likelihood method performs better than a normal approximation-based approach for constructing the confidence region.


Introduction
In studying the relationship between a response and a set of predictors (regressors), the mean response is often assumed to be a linear function of the regressors. In recent years, there has been increasing interest in semiparametric regression modeling for analysing high-dimensional data. Semiparametric models are often employed in regression analysis because they strike a good balance between flexibility and fidelity. Among them, the semiparametric functional-coefficient regression model has been studied extensively because of its direct connection to the classical linear model. The semiparametric functional-coefficient regression model with multiple smoothing variables has the following form:

$$Y = X^{T}\beta + \sum_{j=1}^{d} Z_j\,\alpha_j(U_j) + \epsilon = X^{T}\beta + Z^{T}\alpha(U) + \epsilon, \qquad (1)$$

where Y is the response; $X = (X_1, \ldots, X_p)^T$, $Z = (Z_1, \ldots, Z_d)^T$, and $U = (U_1, \ldots, U_d)^T$ are predictors; $\beta = (\beta_1, \ldots, \beta_p)^T$ is a p-dimensional vector of unknown parameters; $\alpha(U) = (\alpha_1(U_1), \ldots, \alpha_d(U_d))^T$ is a d-dimensional vector of unspecified smooth coefficient functions; and $\epsilon$ is the model error.
Model (1) belongs to the class of semiparametric functional-coefficient regression models and includes many useful models as special cases. Most previous work assumes that all coefficient functions share the same smoothing variable. For instance, when all coefficient functions share the same smoothing variable and $\alpha(\cdot) \equiv$ constant, model (1) reduces to the usual linear regression model. When $d = 1$ and $Z \equiv 1$, model (1) reduces to the well-known partially linear regression model, which was first introduced by Engle et al. [1] to study the influence of weather on electricity demand and was further studied by Chen [2], Härdle et al. [3], and Liang and Fan [4], among others. When all coefficient functions share the same smoothing variable, model (1) becomes the varying-coefficient partially linear model, which has also been widely studied in the literature; see, for instance, Zhang et al. [5], Li et al. [6], Fan and Huang [7], and Huang and Zhang [8]. When the coefficient functions have different smoothing variables, Ip et al. [9] used a generalized likelihood ratio test to test coefficient functions in functional-coefficient regression models; Zhang and Li [10] discussed functional-coefficient regression models with different smoothing variables in different coefficient functions and defined integrated estimates of the coefficient functions by marginal integration; Zhang and Li [11] introduced averaged estimation of the coefficient functions in functional-coefficient regression models with different smoothing variables; Zhang and Li [12] proposed a procedure for estimating the coefficient functions in functional-coefficient regression models with different smoothing variables in different coefficient functions; and Yang and Lee [13] discussed semiparametric efficient estimation for generalized functional-coefficient regression models with multiple smoothing variables.
However, the works mentioned above all assume that the data are fully observed. In practice, censored data are common. For example, in medical research, a study may be terminated, or subjects may leave it for various reasons, before the response is observed; this is referred to as random right censoring. Techniques developed for complete data cannot be applied directly to censored data. Specifically, we consider the following generalized functional-coefficient regression model with multiple smoothing variables:

$$E\{\phi(Y) \mid X, Z, U\} = X^{T}\beta + \sum_{j=1}^{d} Z_j\,\alpha_j(U_j), \qquad (2)$$

where Y is subject to random right censoring and $\phi(\cdot)$ is a known function, introduced to cover various quantities of interest rather than to deal with the censoring. For example, it may be the identity function or $I(\cdot \le t)$, which yield the conditional mean and the conditional probability $P(Y \le t \mid X, Z, U)$, respectively. When the response is subject to random right censoring, Buckley and James [14] gave a method for estimating the parameters of the linear regression model; Bravo [15] considered estimation and inference in semiparametric varying-coefficient partially linear models; and, for semiparametric varying-coefficient models with different smoothing variables, Yang [16] applied a mean-preserving transformation to construct estimators of the unknown parameters and functions and established their asymptotic normality. It is well known that the empirical likelihood (EL) method proposed by Owen [17] has many advantages over the normal approximation (NA) method and bootstrap methods for constructing confidence intervals. The most appealing features are that confidence regions constructed by the EL method do not require estimating the covariance of the estimator, and that their shape and orientation are determined entirely by the data.
In addition, the confidence region derived by the NA-based method is symmetric by construction, which may be inadequate when the underlying distribution is asymmetric. More discussion of the advantages of the EL method over existing methods can be found in the monograph of Owen [18]. Over the past two decades, many authors have applied the EL approach to various regression models, such as the partially linear model with right-censored data [19], the partially linear single-index model [20], semiparametric varying-coefficient partially linear regression models [21], the longitudinal partially linear model with α-mixing errors [22], the high-dimensional partially linear varying-coefficient model with measurement errors [23], and the linear quantile regression model with covariates missing at random [24].
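To make the symmetry point concrete, the following Python sketch checks membership in a Wald-type NA region for a 2-dimensional parameter. It assumes the NA region has the usual form $\{\beta : (\hat\beta - \beta)^T \hat V^{-1}(\hat\beta - \beta) \le \chi^2_2(1-\alpha)\}$ for some covariance estimate $\hat V$; the paper's exact NA construction is not given in this section, and the function name `wald_region_contains` is hypothetical. Because the statistic depends on β only through $\hat\beta - \beta$, the region is symmetric about $\hat\beta$ by construction.

```python
import numpy as np


def wald_region_contains(beta, beta_hat, cov_hat, level=0.95):
    """True if beta lies in the Wald-type NA region for a 2-dimensional parameter.

    Region: (beta_hat - beta)^T cov_hat^{-1} (beta_hat - beta) <= chi2_2(level).
    The quantile -2*log(1 - level) is exact only for 2 degrees of freedom.
    """
    diff = np.asarray(beta_hat, dtype=float) - np.asarray(beta, dtype=float)
    if diff.size != 2:
        raise ValueError("closed-form chi-square quantile assumes p = 2")
    stat = diff @ np.linalg.solve(cov_hat, diff)  # quadratic form, no explicit inverse
    return bool(stat <= -2.0 * np.log(1.0 - level))
```

The $\chi^2_2$ quantile is written in closed form because a chi-squared variable with 2 degrees of freedom is exponential with mean 2; other values of p need a general quantile routine. Reflecting any candidate β through $\hat\beta$ never changes the answer, which is exactly the predetermined symmetry discussed above.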
In this article, we use the EL method to study generalized functional-coefficient regression models with multiple smoothing variables when the response is subject to random right censoring. Coefficient functions with different smoothing variables can accommodate various nonlinear interaction effects between the covariates in the model. The empirical log-likelihood ratio for the unknown parameter is constructed using the smooth backfitting technique and is shown to satisfy a nonparametric version of Wilks' theorem. The rest of this article is organized as follows. In Section 2, we propose the empirical log-likelihood ratio statistic for the unknown parameter and give the main result. Simulation studies and a real data analysis are presented in Section 3. The assumptions and the proof of the main result are relegated to the Appendix.

Introduction for Right Censored Semiparametric Model.
Let $W = (X^T, Z^T, U^T)^T$ denote the covariates and C a censoring variable. We assume that $(W_i, Y_i)$, $i = 1, 2, \ldots, n$, is a random sample of $(W, Y)$ from model (2). Under random right censoring, instead of $Y_i$ we observe $(T_i, \delta_i)$, a random sample of the pair $(T, \delta)$, where $T = \min(Y, C)$ is the observed survival time and $\delta = I(Y \le C)$ is the censoring indicator, which equals 1 if $Y \le C$ and 0 otherwise. Since model (2) cannot be estimated directly from the censored responses, the following mean-preserving transformation is considered:

$$Y_G = \frac{\delta\,\phi(T)}{1 - G(T-)}, \qquad (3)$$

where G is the distribution function of the censoring variable C. In (3), $Y_G$ is zero when the response is censored and equals $\phi(Y)/\{1 - G(Y-)\}$ when the response is observed. Moreover, under the assumptions given below, noting that $E(\delta \mid Y, W) = P(C \ge Y \mid Y, W) = 1 - G(Y-)$ when C is independent of $(Y, W)$, we have

$$E(Y_G \mid W) = E\left[\frac{\phi(Y)}{1 - G(Y-)}\,E(\delta \mid Y, W)\,\middle|\, W\right] = E\{\phi(Y) \mid W\}. \qquad (4)$$

Although $Y_G$ depends on the choice of ϕ, i.e., $Y_G = Y_G(\phi)$, we omit this dependence throughout the paper for notational convenience. Note also that (4) holds for any ϕ.
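The transformation can be sketched in Python as follows. In practice G is unknown; this sketch assumes (this section does not say so) that G is replaced by the Kaplan-Meier estimator of the censoring distribution, computed by treating the censored observations as the events. The helper names are hypothetical, and tied observation times are assumed away.

```python
import numpy as np


def censoring_survival_left(T, delta):
    """Kaplan-Meier estimate of 1 - G(t-) at each observed time T_i.

    G is the distribution function of the censoring variable C, so censored
    observations (delta == 0) play the role of events. Assumes no tied times.
    """
    T = np.asarray(T, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = T.size
    order = np.argsort(T, kind="stable")
    at_risk = n - np.arange(n)                       # number of T_j >= T_(i)
    factor = 1.0 - (1 - delta[order]) / at_risk      # KM factor for C at T_(i)
    surv = np.cumprod(factor)                        # 1 - G(T_(i))
    surv_left = np.concatenate(([1.0], surv[:-1]))   # left limit 1 - G(T_(i)-)
    out = np.empty(n)
    out[order] = surv_left                           # back to the input order
    return out


def mean_preserving_transform(T, delta, phi=lambda t: t):
    """Y_G = delta * phi(T) / (1 - G_hat(T-)); zero for censored observations."""
    T = np.asarray(T, dtype=float)
    delta = np.asarray(delta, dtype=int)
    surv = censoring_survival_left(T, delta)
    return np.where(delta == 1, phi(T) / surv, 0.0)
```

With simulated data (for example, normal Y and a uniform C independent of Y), the sample mean of the transformed responses should be close to the sample mean of the fully observed Y, which illustrates (4) with ϕ the identity. The default `phi` can be swapped for any other known function, since (4) holds for any ϕ.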

Empirical Likelihood for Right Censored Data.
In this section, we apply the EL method to construct confidence regions for the unknown parameter in the generalized functional-coefficient regression model (2) with multiple smoothing variables. For a given β, α(·) can be estimated by applying a smoothing technique to the response $Y_{G,i} - X_i^T\beta$. Here, we use the smooth backfitting technique studied in Lee et al. [25] and Yang [16] and denote the resulting estimator by $\hat\alpha_G(\cdot;\beta)$. Let $\hat\alpha^{ll}_{G,j}(\cdot;\beta)$, $j = 1, \ldots, d$, satisfy the following system of integral equations:

Note that $\hat\alpha^{ll}_G(\cdot;\beta)$ estimates not only the coefficient functions but also their derivatives. Thus, only the first components of $\hat\alpha^{ll}_{G,j}(\cdot;\beta)$ $(j = 1, \ldots, d)$ constitute $\hat\alpha_G(\cdot;\beta) = (\hat\alpha_{G,1}(\cdot;\beta), \ldots, \hat\alpha_{G,d}(\cdot;\beta))^T$.
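Smooth backfitting solves the integral equations for all d components jointly, which is too long to reproduce here. The Python sketch below only illustrates the local linear ingredient for a single coefficient function: fitting $R_i \approx Z_i\{a + b(U_i - u_0)\}$ with Epanechnikov kernel weights returns both $a \approx \alpha(u_0)$ and $b \approx \alpha'(u_0)$, and only the first component is kept as the coefficient estimate. The helper `local_linear_coefficient` is hypothetical and is not the paper's smooth backfitting estimator; R plays the role of a response such as $Y_{G,i} - X_i^T\beta$ when d = 1.

```python
import numpy as np


def epanechnikov(u):
    """K(u) = 0.75 (1 - u^2) for |u| <= 1 and 0 otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def local_linear_coefficient(u0, U, Z, R, h):
    """Kernel-weighted fit of R_i ~ Z_i * {a + b (U_i - u0)}.

    Returns (a, b), estimating (alpha(u0), alpha'(u0)) for ONE coefficient
    function. Hypothetical helper, not the smooth backfitting estimator.
    """
    U, Z, R = (np.asarray(v, dtype=float) for v in (U, Z, R))
    d = U - u0
    w = epanechnikov(d / h) / h                  # K_h(U_i - u0)
    D = np.column_stack([Z, Z * d])              # local design for (a, b)
    DtW = D.T * w                                # D^T W without forming diag(w)
    a, b = np.linalg.solve(DtW @ D, DtW @ R)     # weighted least squares
    return a, b                                  # keep a; b is the derivative
```

When α is linear the local linear fit is exact, so for $\alpha(u) = 1 + 2u$ and noise-free data the call at $u_0 = 0.5$ recovers $a = 2$ and $b = 2$ up to rounding. In a backfitting scheme, a fit of this kind would be applied to partial residuals, one component at a time.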
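An auxiliary random variable $\eta_i(\beta)$ is introduced as follows:

Similar to Owen [26], an empirical log-likelihood ratio can be defined as

$$l_n(\beta) = -2\max\left\{\sum_{i=1}^{n}\log(np_i) : p_i \ge 0,\ \sum_{i=1}^{n}p_i = 1,\ \sum_{i=1}^{n}p_i\,\eta_i(\beta) = 0\right\}.$$

For any given β, $l_n(\beta)$ is uniquely defined provided that 0 lies inside the convex hull of the points $\{\eta_1(\beta), \ldots, \eta_n(\beta)\}$, which we assume. By the Lagrange multiplier method, we obtain $p_i = 1/[n\{1 + \lambda^T\eta_i(\beta)\}]$, and $l_n(\beta)$ can be expressed as

$$l_n(\beta) = 2\sum_{i=1}^{n}\log\{1 + \lambda^T\eta_i(\beta)\},$$

where λ is the solution of

$$\frac{1}{n}\sum_{i=1}^{n}\frac{\eta_i(\beta)}{1 + \lambda^T\eta_i(\beta)} = 0.$$

Theorem 1. Suppose that conditions (C1)-(C11) in the Appendix hold. If $\beta_0$ is the true value of the parameter β, then

$$l_n(\beta_0) \xrightarrow{d} \chi^2_p,$$

where $\xrightarrow{d}$ denotes convergence in distribution and $\chi^2_p$ is a standard chi-squared random variable with p degrees of freedom.

Consequently, an approximate $1-\alpha$ confidence region for β is $\{\beta : l_n(\beta) \le \chi^2_p(1-\alpha)\}$, where $\chi^2_p(1-\alpha)$ is the $1-\alpha$ quantile of the $\chi^2_p$ distribution.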
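The computation of $l_n(\beta)$ can be sketched in Python by maximising the concave function $\sum_i \log\{1 + \lambda^T\eta_i(\beta)\}$ with damped Newton steps; its maximiser is the λ that solves $\sum_i \eta_i(\beta)/\{1 + \lambda^T\eta_i(\beta)\} = 0$. Because the definition of $\eta_i(\beta)$ is not reproduced in this section, the sketch uses a hypothetical stand-in, $\eta_i(\beta) = X_i(Y_i - X_i^T\beta)$ from ordinary linear regression, purely for illustration. All function names are hypothetical.

```python
import numpy as np


def el_log_ratio(eta, tol=1e-10, max_iter=100):
    """Return (l_n, lam) with l_n = 2 * sum_i log(1 + lam^T eta_i).

    lam maximises the concave sum_i log(1 + lam^T eta_i), so it solves
    sum_i eta_i / (1 + lam^T eta_i) = 0. Rows of `eta` are eta_1, ..., eta_n;
    0 is assumed to lie inside their convex hull.
    """
    eta = np.asarray(eta, dtype=float)
    if eta.ndim == 1:
        eta = eta[:, None]
    n, p = eta.shape
    lam = np.zeros(p)
    for _ in range(max_iter):
        w = 1.0 + eta @ lam
        grad = eta.T @ (1.0 / w)                    # sum_i eta_i / w_i
        if np.max(np.abs(grad)) < tol * n:
            break
        info = (eta / w[:, None] ** 2).T @ eta      # sum_i eta_i eta_i^T / w_i^2
        step = np.linalg.solve(info, grad)          # Newton ascent direction
        f_old = np.sum(np.log(w))
        t = 1.0
        while t > 1e-12:
            cand = lam + t * step
            w_cand = 1.0 + eta @ cand
            # keep every implied p_i = 1 / (n w_i) in (0, 1) and do not decrease f
            if np.all(w_cand > 1.0 / n) and \
                    np.sum(np.log(w_cand)) >= f_old - 1e-12 * max(1.0, abs(f_old)):
                lam = cand
                break
            t *= 0.5                                # step halving
        else:
            break                                   # no admissible step: stop
    w = 1.0 + eta @ lam
    return 2.0 * np.sum(np.log(w)), lam


def in_el_region_p2(eta, level=0.95):
    """l_n(beta) <= chi2_2(level); -2 log(1 - level) is exact only for p = 2."""
    return bool(el_log_ratio(eta)[0] <= -2.0 * np.log(1.0 - level))


def eta_linear_toy(beta, X, Y):
    """HYPOTHETICAL stand-in for eta_i(beta): X_i (Y_i - X_i^T beta)."""
    return X * (Y - X @ beta)[:, None]
```

At the solution, the implied weights $p_i = 1/\{n(1 + \lambda^T\eta_i)\}$ sum to one, as the Lagrange argument requires. Screening a grid of β values with `in_el_region_p2` traces the region $\{\beta : l_n(\beta) \le \chi^2_2(0.95)\}$; for p other than 2, `scipy.stats.chi2.ppf(level, df=p)` supplies the quantile.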

Simulation Study
In this section, we conduct simulations to assess the performance of the proposed empirical likelihood inference procedure. Throughout this section, the kernel function is the Epanechnikov kernel $K(u) = 0.75(1 - u^2)$ for $|u| \le 1$ (and 0 otherwise), and the bandwidth h is selected by cross-validation. We compare the proposed EL method with the normal approximation (NA) method on simulated examples and a real data example.
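The kernel and a bandwidth search can be sketched in Python as follows. This section does not specify the cross-validation criterion used with the backfitting estimator, so the sketch assumes ordinary leave-one-out cross-validation for a univariate local linear smoother, using the standard shortcut that the leave-one-out residual equals $(y_i - \hat y_i)/(1 - L_{ii})$ for a local weighted least squares fit. Function names are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Kernel from the text: K(u) = 0.75 (1 - u^2) for |u| <= 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)


def local_linear_smoother_matrix(x, h):
    """Matrix L such that L @ y gives univariate local linear fits at each x_i."""
    x = np.asarray(x, dtype=float)
    D = x[None, :] - x[:, None]                 # D[i, j] = x_j - x_i
    K = epanechnikov(D / h)
    S0 = K.sum(axis=1)
    S1 = (K * D).sum(axis=1)
    S2 = (K * D ** 2).sum(axis=1)
    denom = S0 * S2 - S1 ** 2                   # zero if a window has too few points
    return K * (S2[:, None] - D * S1[:, None]) / denom[:, None]


def loo_cv_bandwidth(x, y, grid):
    """Return the h in `grid` minimising leave-one-out CV, plus all CV scores."""
    y = np.asarray(y, dtype=float)
    scores = []
    for h in grid:
        with np.errstate(divide="ignore", invalid="ignore"):
            L = local_linear_smoother_matrix(x, h)
            # exact leave-one-out residuals for a local weighted least squares fit
            resid = (y - L @ y) / (1.0 - np.diag(L))
            score = np.mean(resid ** 2)
        scores.append(score if np.isfinite(score) else np.inf)
    best = int(np.argmin(scores))
    return grid[best], scores
```

Each row of the smoother matrix sums to one and reproduces linear functions exactly, which is the defining property of local linear weights. Bandwidths for which some window contains too few points get an infinite score instead of raising an error.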
For random right censoring, we take $C \sim U(-6, 6) + c_0$, where $c_0$ is a constant that controls the censoring percentage. In our simulations, $c_0 = 2.5$, 4.0, and 6.6 yield censoring rates of approximately 30%, 20%, and 10%, respectively. We take $\phi(t) = tI(t \le t_0)$ to transform the response, where $t_0$ is set to the largest uncensored observation. The sample size is n = 100, 200, and 300. The coverage probabilities (CPs) and average lengths (ALs) of the EL and NA methods are computed from 1000 replicates, for both normal and uniform model errors. The CPs and ALs at the nominal level $1 - \alpha = 0.95$ are reported in Table 1. Further, the confidence regions of $(\beta_1, \beta_2)$ at the nominal level $1 - \alpha = 0.95$ are shown in Figures 1 and 2 for the normal and uniform errors, respectively. Table 1 indicates that the EL method outperforms the NA method: in almost all settings, it gives coverage probabilities closer to the nominal level and shorter average lengths. Both methods deteriorate as the censoring rate increases or the sample size decreases. Figures 1 and 2 likewise show that the EL method performs better than the NA method across settings, and that the confidence regions grow as the censoring rate increases under both error distributions. In summary, we recommend the proposed EL method for constructing confidence regions for the unknown parameter in generalized functional-coefficient regression models with multiple smoothing variables under right-censored data.

Discrete Dynamics in Nature and Society
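The censoring design can be sketched in Python. Since the simulated regression model is not reproduced in this section, the sketch assumes that a sample of responses Y is supplied by the user. For $C \sim U(-6, 6) + c_0$ independent of Y, $P(C < y) = \min\{1, \max\{0, (y - c_0 + 6)/12\}\}$, so $c_0$ can be calibrated to a target censoring rate by bisection; the truncated transformation $\phi(t) = tI(t \le t_0)$ is included as well. The helper names are hypothetical, and the calibrated values will differ from 2.5, 4.0, and 6.6 unless the paper's model is used to generate Y.

```python
import numpy as np


def expected_censoring_rate(y_sample, c0):
    """P(C < Y) for C ~ U(-6, 6) + c0 independent of Y, averaged over y_sample."""
    y = np.asarray(y_sample, dtype=float)
    # P(C < y) = (y - (c0 - 6)) / 12, clipped to [0, 1]
    return float(np.mean(np.clip((y - (c0 - 6.0)) / 12.0, 0.0, 1.0)))


def calibrate_c0(y_sample, target, iters=100):
    """Bisection for the shift c0 that gives censoring rate `target`."""
    y = np.asarray(y_sample, dtype=float)
    lo, hi = y.min() - 6.0, y.max() + 6.0        # rate is 1 at lo and 0 at hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_censoring_rate(y, mid) > target:
            lo = mid                             # too much censoring: move C right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def truncated_phi(T, delta):
    """phi(t) = t * I(t <= t0), with t0 the largest uncensored observed time."""
    T = np.asarray(T, dtype=float)
    t0 = T[np.asarray(delta) == 1].max()
    return np.where(T <= t0, T, 0.0), t0
```

The censoring rate is computed exactly given the Y sample, so no Monte Carlo draws of C are needed during calibration; for example, with all responses equal to 0, a 30% rate requires $c_0 = 2.4$.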

Real Data Analysis.
In this subsection, we apply the proposed empirical likelihood method to the lung cancer data from the Veterans' Administration lung cancer trial [27]. In this trial, male patients with inoperable lung cancer were randomly assigned to either a standard or a test chemotherapy, and 137 patients entered follow-up at the start of the trial. Eight variables were recorded for each patient: Survival in Days (denoted Y), Status (1 = dead, 0 = censored; denoted δ), Treatment Indicator (1 = standard, 2 = test; denoted $X_1$), Cell Type (1 = squamous, 2 = small cell, 3 = adeno, 4 = large; denoted $X_2$), Age (denoted $U_1$), Karnofsky Score (denoted $U_2$), Months from Diagnosis (denoted $Z_1$), and Prior Therapy (0 = no, 10 = yes; denoted $Z_2$). We fit model (2) to the data and use the proposed EL method and the NA method to construct confidence regions for $(\beta_1, \beta_2)$. The censoring rate of the data is about 6.57%. The profile least squares estimate of β is $\hat\beta = (0.1587, 0.6003)^T$. The EL-based and NA-based confidence regions for $(\beta_1, \beta_2)$ at nominal levels $1 - \alpha = 0.95$ and 0.90 are shown in Figure 3. Figure 3 shows that the confidence regions from both the EL and NA approaches contain the profile least squares estimate. Furthermore, the EL-based confidence regions are smaller than the NA-based ones in each scenario, and the regions at nominal level $1 - \alpha = 0.95$ are larger than those at $1 - \alpha = 0.90$. The EL confidence regions also indicate that Cell Type is an important factor in explaining Survival in Days.
The confidence regions contain points with $\beta_1 = 0$ (the coefficient of the Treatment Indicator), which indicates that the Treatment Indicator is not a significant factor; that is, there is no significant difference between the two chemotherapies in treating lung cancer.

Proof of the Main Results
Here, Π(H) denotes the projection onto the space H. To prove the main result, we first list the regularity conditions used in this paper. These conditions are mild and are also assumed in [16].
Proof. Observe that converges in probability to Γ. This comes from the fact that and this can be proved from the standard theory of smooth backfitting. Thus, We will prove that $A_{2,n}$ and $A_{3,n}$ are $o_p(1)$. As for $A_{2,n}$, observe that the conditional mean of $A_{2,n}$ given $X_i, Z_i, U_i$, $i = 1, \ldots, n$, is zero, and the conditional variance of the kth component of $A_{2,n}$ is given by This converges to zero in probability from the fact that sup