Statistical Inference for the Heteroscedastic Partially Linear Varying-Coefficient Errors-in-Variables Model with Missing Censoring Indicators

Key Laboratory of Advanced Theory and Application in Statistics and Data Science MOE, School of Statistics, East China Normal University, Shanghai 200062, China; College of Economics and Management, Shanghai Maritime University, Shanghai 201306, China; School of Mathematics, Hefei University of Technology, Hefei 230009, China; School of Mathematics and Statistics, Huangshan University, Huangshan 245041, China


Introduction
In regression analysis, flexible and refined statistical regression models have long been widely applied in both theoretical study and practical applications. The main results for parametric and nonparametric regression models are rather mature. Semiparametric regression models reduce the high risk of misspecification associated with parametric regression models and avoid the "curse of dimensionality" of nonparametric regression models. Thanks to these advantages, semiparametric regression models have received considerable attention from statisticians. Semiparametric regression models take various forms. In particular, the partially linear varying-coefficient errors-in-variables (PLVCEV) model, a typical example, was introduced by You and Chen [1] and has the following form:

V = X^⊤β + Z^⊤α(U) + ε, W = X + e, (1)

where V is the response variable, X ∈ R^p and Z ∈ R^q are the covariates, U is the index variable of the coefficient functions, W is the observed surrogate of X, β = (β_1, ..., β_p)^⊤ is a p-dimensional vector of unknown parameters, α(·) = (α_1(·), ..., α_q(·))^⊤ is an unknown q-dimensional vector of coefficient functions, and ε is the random error. The measurement error e is independent of (X, Z, U) with mean zero and covariance matrix Σ_e. In order to identify the model, Σ_e is assumed to be known.
As a general and flexible semiparametric model, model (1) includes a variety of models of interest. When X is observed exactly, model (1) reduces to the PLVC model [2,3]. When Z = 1, q = 1, and X is observed exactly, model (1) reduces to the partially linear regression model [4]. When X is observed exactly and α(·) is a constant vector, model (1) becomes a linear regression model. When q = 1 and Z = 1, model (1) reduces to the partially linear EV model [5]. For model (1), You and Chen [1] proposed estimators of the parametric and nonparametric components and showed their asymptotic properties. Liu and Liang [6] established the asymptotic normality of the jackknife estimator of the error variance and the standard chi-squared limiting distribution of the jackknife empirical log-likelihood statistic. Fan et al. [7] established penalized profile least squares estimation of the parametric and nonparametric components of the model. The literature mentioned above assumed that the random errors are homoscedastic, which means that the random error ε is independent of (X, Z, U). However, in many practical applications, the error variance may change with the covariates. Heteroscedastic error models have therefore attracted the attention of many scholars. For example, You et al. [8] considered the estimation of the parametric and nonparametric parts of partially linear regression models with heteroscedastic errors. Fan et al. [9] constructed confidence regions of the parameter for the heteroscedastic PLVCEV model based on the empirical likelihood method. Shen et al. [10] discussed estimation and inference for the PLVC model with heteroscedastic errors. Xu and Duan [11] extended the results of Shen et al. [10] to efficient estimation for the PLVCEV model with heteroscedastic errors. The works above assumed that the responses are observed completely. However, in many practical fields, especially in biomedical studies and survival analysis, the response cannot be completely observed because of censoring.
Huang and Huang [12,13] constructed confidence regions of the parameters for the varying-coefficient single-index model and the partially linear single-index EV model, respectively, by the empirical likelihood method under censored data. The aforementioned results require that the censoring indicators always be observed. However, the censoring indicators may not be observed completely. For example, determining whether the death of an individual is attributable to the cause of interest may require information that was not gathered or was lost for various reasons [14]. In this paper, we assume that the censoring indicators are missing at random (MAR), which is a common and reasonable assumption in statistical analysis with missing data [15]. There are many works related to missing censoring indicators. For example, Wang and Dinse [16] and Li and Wang [17] proposed weighted least squares estimators of the unknown parameter and proved their asymptotic normality for the linear regression model. Shen and Liang [18] discussed estimation and variable selection for the PLVC quantile regression model. Wang et al. [19] considered composite quantile regression for the linear regression model. However, no literature has focused on the estimation and confidence regions of heteroscedastic error models with right-censored data when the censoring indicators are MAR.
In this paper, we consider modified profile least squares (PLS) estimators of the unknown parameter and local linear estimators of the coefficient function. Besides point estimation, we are also interested in interval estimation by the empirical likelihood (EL) method. First introduced by Owen [20], EL is a very effective method for constructing confidence regions and enjoys many nice properties compared with normal approximation-based methods and the bootstrap approach. Thanks to these advantages, there is a large literature on EL methods. For instance, Fan et al. [21] considered penalized EL for the high-dimensional PLVCEV model. Wang and Drton [22] established estimation for linear structural equation models with dependent errors based on the EL method. Fan et al. [23] discussed weighted EL for the heteroscedastic varying-coefficient partially nonlinear model with missing data. Zou et al. [24] considered EL inference for the partially linear single-index EV model with missing censoring indicators.
It is worth pointing out that studying the PLVCEV model with heteroscedastic errors under MAR censoring indicators is novel and interesting. Thus, we consider estimation based on the modified PLS method and confidence regions based on EL inference. The main aims of this paper are as follows: (1) define a class of modified PLS estimators of the parameter and local linear estimators of the coefficient function based on regression calibration, imputation, and inverse probability weighted approaches, and prove the asymptotic normality of the proposed estimators; (2) construct reweighted estimators of the parameter and coefficient function based on estimators of the error variance function, and establish the asymptotic properties of the proposed estimators; (3) establish the asymptotic standard chi-squared distribution of the empirical log-likelihood ratio functions, construct the confidence regions for the parameter, and derive the asymptotic distribution of the corresponding maximum EL estimators. Finally, a simulation study and a real data analysis are conducted to demonstrate the finite sample performance of the proposed procedures. The rest of this paper is organized as follows. In Section 2, we construct modified PLS estimators of the parameter and local linear estimators of the coefficient function. In Section 3, we propose empirical log-likelihood ratio statistics and maximum EL estimators. The main results are shown in Section 4. Section 5 presents the simulation and real data analysis. Section 6 gives some conclusions. The proofs of the main results are given in the Appendix.

Methodology
Suppose that (V_i, X_i, Z_i, U_i), i = 1, ..., n, is a sample from model (1), that is,

V_i = X_i^⊤β + Z_i^⊤α(U_i) + ε_i, W_i = X_i + e_i,

where W_i is the observed error-prone version of X_i, and the model error ε_i satisfies Var(ε_i | X_i, Z_i, U_i) = σ^2(U_i), which is an unknown function of U_i representing the heteroscedastic error. In practical applications, the response V_i may be right censored for various reasons. Let C_i be the censoring time with distribution function (df) G(·). One can only observe Y_i = min(V_i, C_i) with df H and the censoring indicator δ_i = I(V_i ≤ C_i). Define the missing indicator ς_i, which is 0 if δ_i is missing and 1 otherwise. Throughout this article, we assume that V_i is independent of C_i and that δ_i is MAR, which implies that ς_i and δ_i are conditionally independent given the observed variables T_i.
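The observation scheme described above can be made concrete with a small simulation. The following is a minimal sketch, not the design used in this paper: the scalar covariates, the value of β, the coefficient function sin(2πu), the variance function, the censoring distribution, and the logistic missingness probability are all hypothetical choices made only to illustrate how Y, δ, and ς relate to each other.

```python
import numpy as np

def simulate_observed_data(n, rng):
    """Draw one sample with right censoring and MAR censoring indicators.

    All numerical settings below are illustrative assumptions.
    """
    x = rng.normal(size=n)                         # true covariate X (never observed)
    z = rng.normal(size=n)                         # covariate Z
    u = rng.uniform(size=n)                        # index variable U
    beta, sigma_e = 1.0, 0.2                       # assumed parameter and error scale
    alpha = np.sin(2.0 * np.pi * u)                # assumed coefficient function
    eps = np.sqrt(0.25 + 0.5 * u) * rng.normal(size=n)  # Var(eps | U) = sigma^2(U)
    v = x * beta + z * alpha + eps                 # latent response V
    w = x + sigma_e * rng.normal(size=n)           # surrogate W = X + e
    c = rng.normal(loc=1.5, scale=1.0, size=n)     # censoring time C, independent of V
    y = np.minimum(v, c)                           # observed Y = min(V, C)
    delta = (v <= c).astype(float)                 # censoring indicator I(V <= C)
    # MAR: the chance of observing delta depends on Y only, not on delta itself.
    p_obs = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * y)))
    varsigma = (rng.uniform(size=n) < p_obs).astype(int)
    delta_obs = np.where(varsigma == 1, delta, np.nan)  # NaN marks a missing indicator
    return {"y": y, "w": w, "z": z, "u": u,
            "delta": delta_obs, "varsigma": varsigma}
```

Only (Y, W, Z, U, ς) and the non-missing values of δ are returned, which mirrors what an analyst would actually have in hand.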

Modified Profile Least Squares Estimation.
The local linear regression technique is employed to estimate the coefficient function α(·). If α(·) has a continuous second derivative at the point u_0, then for u in a small neighborhood of u_0 one can approximate α(·) by the Taylor expansion α_j(u) ≈ α_j(u_0) + α_j′(u_0)(u − u_0) ≡ a_j + b_j(u − u_0), j = 1, ..., q, where α_j′(u) = ∂α_j(u)/∂u. Then, (a_j, b_j) can be estimated by minimizing the following objective function: where K(·) is a kernel function, and 0 < h_1n → 0 is a bandwidth sequence. Because of the missing indicators, some δ_i cannot be observed. Therefore, model (2) cannot be applied directly. One can replace δ_i with its conditional expectation m(T_i) = E[δ_i | T_i]. Thus, (a_j, b_j) can be defined as the minimizer of However, in practice, the function m(·) is usually unknown. One can use parametric or nonparametric methods to estimate m(·). However, when the covariates are high-dimensional, nonparametric estimation may suffer from the "curse of dimensionality." Hence, throughout this paper, we assume that m(·) follows a parametric model m(T) = m(T, θ), where θ is an unknown parameter vector. Following Wang and Dinse [16], the estimator θ_n of θ can be obtained by maximizing the following likelihood function: Let Hence, (a_j, b_j) can be estimated by minimizing the following objective function: where G_n(Y_i) is the estimator of G(·), which is defined by which is the Nadaraya-Watson estimator of μ(y) = E[δ_i | Y_i = y] with the kernel function L(·) and bandwidth sequence 0 < a_n → 0.
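The local linear step can be illustrated in isolation. The sketch below is a simplified stand-in, not the estimator of this section: it assumes the response is fully observed and the covariates are measured without error, so it only shows how (a_j, b_j) arise as a kernel-weighted least squares solution at a point u_0. The function names `epanechnikov` and `local_linear_coef` are hypothetical.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel K(t) = (3/4)(1 - t^2) on [-1, 1]."""
    t = np.asarray(t, dtype=float)
    return 0.75 * (1.0 - t**2) * (np.abs(t) <= 1.0)

def local_linear_coef(u0, u, z, y, h):
    """Local linear fit of y = z^T alpha(u) + noise at the point u0.

    u: (n,), z: (n, q), y: (n,). Returns (a_hat, b_hat), each of shape (q,),
    where a_hat estimates alpha(u0) and b_hat estimates alpha'(u0).
    """
    d = u - u0
    k = epanechnikov(d / h) / h                  # kernel weights K_h(U_i - u0)
    design = np.hstack([z, z * d[:, None]])      # columns: Z_i and Z_i (U_i - u0)
    sw = np.sqrt(k)[:, None]                     # weighted LS via square-root weights
    coef, *_ = np.linalg.lstsq(design * sw, y * sw[:, 0], rcond=None)
    q = z.shape[1]
    return coef[:q], coef[q:]
```

When α(·) is exactly linear and there is no noise, the fit reproduces α(u_0) and α′(u_0) exactly, which is a convenient sanity check for an implementation.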
For notational simplicity, Thus, the local linear regression estimator of α(·) is defined as follows: Here the estimator of π(y) = E[ς | Y = y] is a nonparametric estimator with kernel function Ω(·) and bandwidth sequence 0 < b_n → 0. Hence, (a_j, b_j) can be estimated by minimizing the following objective function: If β is known, one can obtain the local linear estimator of the coefficient function by Substituting (19) into the original model and eliminating the bias produced by the measurement error, we can get the following modified PLS estimator of β based on the inverse probability weighted method: Hence, the local linear regression estimator of α(·) is defined as follows:
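The nonparametric estimators of μ(y) = E[δ | Y = y] and π(y) = E[ς | Y = y] mentioned above are kernel regressions of a binary variable on Y. A minimal sketch of a Nadaraya-Watson estimator is given below, assuming the usual ratio form (kernel-weighted sum of the responses divided by the sum of the kernel weights); the function names are hypothetical, and the uniform kernel is the one the simulation section uses for Ω(·).

```python
import numpy as np

def uniform_kernel(t):
    """Uniform kernel (1/2) I(|t| <= 1)."""
    return 0.5 * (np.abs(np.asarray(t, dtype=float)) <= 1.0)

def nadaraya_watson(y0, y, r, bandwidth, kernel):
    """Kernel regression estimate of E[r | Y = y0] at each point of y0.

    y: (n,) observed times, r: (n,) responses (for example the missing
    indicators), y0: scalar or (m,) evaluation points.
    """
    y0 = np.atleast_1d(np.asarray(y0, dtype=float))
    weights = kernel((y0[:, None] - y[None, :]) / bandwidth)   # shape (m, n)
    denom = weights.sum(axis=1)
    numer = weights @ r
    # The estimate is undefined where no observation falls in the window.
    safe = np.where(denom > 0, denom, 1.0)
    return np.where(denom > 0, numer / safe, np.nan)
```

Passing the missing indicators as `r` gives an estimate of π(·); passing the observed censoring indicators (restricted to subjects with ς = 1) gives a complete-case estimate of μ(·). Estimates near zero would make inverse probability weights unstable, which is why a lower bound on π(·) is typically assumed.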

Estimation for Error Variance.
In order to improve the estimation of the parametric and nonparametric parts, we construct local linear estimators of the error variance function σ^2(·) in this subsection. Note that By minimizing the following objective function with respect to μ_1, the local linear regression estimator of σ^2(u) based on the regression calibration method is defined by where the weight function W^c_ni(·) is defined by Note that By minimizing the following objective function with respect to μ_2, the local linear regression estimator of σ^2(u) based on the imputation method is defined by where the weight function W^m_ni(·) is defined by Note that By minimizing the following objective function with respect to μ_3, the local linear regression estimator of σ^2(u) based on the inverse probability weighted method is defined by where the weight function W^w_ni(·) is defined by

Reweighted Estimation.
In this subsection, we construct the reweighted estimators of the parametric and nonparametric parts based on the error variance estimator σ^2_c(u) given in (23). By minimizing the following objective function, we get the reweighted estimator of β based on the inverse probability weighted method: Thus, the reweighted estimator of the coefficient function α(u) is defined by

Empirical Likelihood
The confidence regions of the parameter can be constructed from the asymptotic distributions in Theorems 1 and 4. However, the estimation of the asymptotic covariance is quite complicated. In this section, we employ the EL method to construct confidence regions for β, which avoids estimating the complicated covariance.

Regression Calibration Empirical Likelihood.
We introduce the following auxiliary random vector based on the regression calibration method: Thus, we define the empirical log-likelihood ratio function as follows: The optimal value of p_i satisfying (46) is given by By the Lagrange multiplier method, the corresponding empirical log-likelihood ratio function is represented as By maximizing −l_c(β), we can obtain a maximum EL estimator β_cel of β with the regression calibration method.

Imputation Empirical Likelihood.
We introduce the following auxiliary random vector based on the imputation method: Hence, we define the empirical log-likelihood ratio function as follows: The optimal value of p_i satisfying (49) is given by By the Lagrange multiplier method, the corresponding empirical log-likelihood ratio function is By maximizing −l_m(β), we can obtain a maximum EL estimator β_mel of β with the imputation method.

Inverse Probability Weighted Empirical Likelihood.
We introduce the following auxiliary random vector based on the inverse probability weighted method: Then, we define the empirical log-likelihood ratio function as follows: The optimal value of p_i satisfying (52) is given by By the Lagrange multiplier method, the corresponding empirical log-likelihood ratio function is represented as By maximizing −l_w(β), we can obtain a maximum EL estimator β_wel of β with the inverse probability weighted method.
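The three constructions in this section share the same computational core: given the auxiliary vectors evaluated at a candidate β, find the Lagrange multiplier and evaluate the log-likelihood ratio. The sketch below assumes the standard Owen-type form, in which the multiplier λ solves Σ_i η_i/(1 + λ^⊤η_i) = 0, the optimal weights are p_i = 1/{n(1 + λ^⊤η_i)}, and the statistic is 2 Σ_i log(1 + λ^⊤η_i). The symbol η_i and the function name `el_log_ratio` are placeholders; how the auxiliary vectors are built for each of the three methods is not shown here.

```python
import numpy as np

def el_log_ratio(eta, max_iter=100, tol=1e-10):
    """Return (l, lam) with l = 2 * sum(log(1 + lam^T eta_i)).

    eta: (n,) or (n, p) array of auxiliary vectors at a fixed beta.
    Uses a damped Newton iteration on the concave dual problem.
    """
    eta = np.asarray(eta, dtype=float)
    if eta.ndim == 1:
        eta = eta[:, None]
    lam = np.zeros(eta.shape[1])

    def objective(l):
        d = 1.0 + eta @ l
        # Infeasible multipliers (non-positive weights) are rejected.
        return -np.inf if np.any(d <= 0.0) else np.sum(np.log(d))

    for _ in range(max_iter):
        d = 1.0 + eta @ lam
        score = (eta / d[:, None]).sum(axis=0)       # gradient of the objective
        if np.linalg.norm(score) < tol:
            break
        hess = -(eta / d[:, None] ** 2).T @ eta      # negative definite Hessian
        step = np.linalg.solve(hess, -score)         # Newton direction
        t = 1.0
        # Halve the step until it is feasible and does not decrease the objective.
        while objective(lam + t * step) < objective(lam) - 1e-12 and t > 1e-12:
            t *= 0.5
        lam = lam + t * step
    return 2.0 * objective(lam), lam
```

If zero lies outside the convex hull of the η_i, the dual problem is unbounded and the iteration does not converge; a practical implementation would detect that case and report an infinite statistic.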

Main Results
For convenience and simplicity, we use C_0, C_1, ... and c_0, c_1, ... generically to represent positive constants, which may take different values at each appearance. In order to prove the main results, we give a set of assumptions that are used in the following theorems:
(C1) The random variable U has bounded support 𝒰, and its density function g(·) is Lipschitz continuous and bounded away from zero on its support.
The variance function σ^2(·) is uniformly bounded, has a continuous second-order derivative, and is bounded away from zero.
(C5) The kernel K(·) is a symmetric density function with compact support [−1, 1], is Lipschitz continuous, and satisfies
(C7) The kernel functions Ω(·) and L(·) are bounded with compact supports, and n a_n → ∞, n a_n^2 → 0, n b_n → ∞, and n b_n^2 → 0.
(C8) π(·) and μ(·) have bounded derivatives of order 1, and there exists c > 0 such that inf
Following Li and Wang [17], one can get which, together with assumption (C9), gives
The asymptotic properties of the proposed estimators are shown in the following theorems.

Theorem 6. Suppose that assumptions (C1)-(C10) are satisfied. If β is the true value of the parameter, then l_n(β) converges in distribution to χ^2_1,
where l_n(β) denotes one of l_c(β), l_m(β), and l_w(β), and χ^2_1 is a standard chi-squared random variable with 1 degree of freedom.

Discrete Dynamics in Nature and Society
Remark 2.
(a) From Theorems 1 and 4, the asymptotic variance of the reweighted estimator of β is not greater than that of the modified profile LS estimator of β; that is, the difference between the two asymptotic covariance matrices is a positive semidefinite matrix. The asymptotic variance of the reweighted estimator β_m is smaller than that of β_w and larger than that of β_c, which indicates that β_c performs the best and β_w performs the worst. The same conclusion holds for the modified PLS estimators β_c, β_m, β_w.
(b) From Theorems 2 and 5, the local polynomial estimator α_n(·) and the reweighted estimator of α(·) have the same asymptotic distribution, which reflects the characteristic of local regression in nonparametric models.
(c) From Theorem 6, the 100(1 − τ)% EL confidence region for β can be established as I_τ = {β: l_n(β) ≤ c_τ}, where c_τ satisfies P(χ^2_1 ≤ c_τ) = 1 − τ.
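Inverting the EL statistic into a confidence region, as in part (c), amounts to collecting the parameter values whose statistic does not exceed a chi-squared quantile. The sketch below shows this for a deliberately simple stand-in problem, a scalar mean with auxiliary values η_i(b) = x_i − b, rather than for the auxiliary vectors of this paper; the function names and the grid search are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import chi2

def el_stat_scalar(eta):
    """EL ratio statistic 2 * sum(log(1 + lam * eta_i)) for scalar eta_i."""
    eta = np.asarray(eta, dtype=float)
    lo, hi = eta.min(), eta.max()
    if lo >= 0.0 or hi <= 0.0:
        return np.inf                       # zero is outside the convex hull
    # Feasible multipliers keep 1 + lam * eta_i > 0 for every i.
    a, b = -1.0 / hi, -1.0 / lo
    pad = 1e-10 * (b - a)
    score = lambda lam: np.sum(eta / (1.0 + lam * eta))
    lam = brentq(score, a + pad, b - pad)   # score is decreasing on (a, b)
    return 2.0 * np.sum(np.log1p(lam * eta))

def el_confidence_interval(x, grid, tau=0.05):
    """Grid points b with EL statistic below the chi-squared(1) quantile."""
    cutoff = chi2.ppf(1.0 - tau, df=1)
    keep = np.array([el_stat_scalar(x - b) <= cutoff for b in grid])
    inside = grid[keep]
    return inside.min(), inside.max()
```

For example, `el_confidence_interval(x, np.linspace(-1.0, 1.0, 401))` returns the endpoints of the retained grid points for a sample `x`. No covariance estimate is needed at any point, which is the practical appeal of the EL region over the normal approximation.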

Simulation
In this section, we carry out numerical simulations to investigate the finite sample behavior of the proposed estimators. We compare the performance of the estimators based on the regression calibration method (RC), imputation method (IM), and inverse probability weighted method (IPW), and their corresponding reweighted estimators (R-RC, R-IM, R-IPW). Besides, we compare the EL method with the normal approximation (NA) approach in terms of coverage probabilities (CP) and average interval lengths (AL) under different settings. We also give a real data analysis. The kernel functions are taken as K(u) = (3/4)(1 − u^2)I(|u| ≤ 1), L(u) = (15/16)(1 − u^2)^2 I(|u| ≤ 1), and Ω(u) = (1/2)I(|u| ≤ 1). The bandwidths h_1n, h_2n, and h_3n are taken to be equal and are selected by leave-one-out cross-validation. The following simulation is based on 500 replications. The sample size n is chosen to be 100 and 400, respectively.
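The three kernels above are the Epanechnikov, quartic (biweight), and uniform kernels. A direct transcription is sketched below; the function names are hypothetical, and each function vanishes outside [−1, 1] and integrates to one.

```python
import numpy as np

def kernel_K(u):
    """Epanechnikov kernel: (3/4)(1 - u^2) on |u| <= 1."""
    u = np.asarray(u, dtype=float)
    return 0.75 * (1.0 - u**2) * (np.abs(u) <= 1.0)

def kernel_L(u):
    """Quartic (biweight) kernel: (15/16)(1 - u^2)^2 on |u| <= 1."""
    u = np.asarray(u, dtype=float)
    return (15.0 / 16.0) * (1.0 - u**2) ** 2 * (np.abs(u) <= 1.0)

def kernel_Omega(u):
    """Uniform kernel: 1/2 on |u| <= 1."""
    u = np.asarray(u, dtype=float)
    return 0.5 * (np.abs(u) <= 1.0)
```

In use, a kernel is rescaled by its bandwidth, for example `kernel_K((U - u0) / h) / h` for the local linear weights.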
In the first simulation, we study the finite sample performance of the proposed modified PLS estimators and reweighted estimators of β based on the mean squared error (MSE) defined as and the global mean squared error (GMSE) of α_n(u) defined as where {u_j: j = 1, 2, ..., n_grid} is a sequence of grid points. In addition, we present QQ-plots of the reweighted estimator α_w(u) of α(u) under different settings in Figures 1 and 2. In the second simulation, we plot the curves of the proposed estimators σ^2_c(u), σ^2_m(u), and σ^2_w(u) under different settings in Figures 3 and 4. In the third simulation, we consider the CP and AL of the confidence regions for β based on the EL method (CPE, ALE) and the NA method (CPN, ALN) with nominal level 0.95 under different settings in Table 2.
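The performance measures used in the simulations can be computed as follows. This is a sketch under assumed definitions: MSE is taken as the average squared deviation of the estimate from the true β across replications, GMSE as the average squared deviation of the estimated coefficient function from the truth across grid points, CP as the fraction of intervals containing the true value, and AL as the average interval width. The function names are hypothetical.

```python
import numpy as np

def mse(beta_hats, beta_true):
    """Average squared error of scalar estimates across replications."""
    beta_hats = np.asarray(beta_hats, dtype=float)
    return float(np.mean((beta_hats - beta_true) ** 2))

def gmse(alpha_hat_on_grid, alpha_true_on_grid):
    """Average squared error of a fitted curve across grid points."""
    diff = np.asarray(alpha_hat_on_grid, dtype=float) - np.asarray(
        alpha_true_on_grid, dtype=float)
    return float(np.mean(diff ** 2))

def coverage_and_length(lower, upper, beta_true):
    """Coverage probability and average length of interval estimates."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    covered = (lower <= beta_true) & (beta_true <= upper)
    return float(np.mean(covered)), float(np.mean(upper - lower))
```

With 500 replications, `lower` and `upper` would each hold 500 interval endpoints from either the EL or the NA method.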
From Tables 2-4 and Figures 1-4, it can be seen that
(1) In Tables 3 and 4, the MSE and GMSE of the reweighted estimators are smaller than those of the modified PLS estimators under the same setting. The results of the IM estimators are smaller than those of the IPW estimators and larger than those of the RC estimators. The results increase as the measurement error, the heteroscedasticity error, and the CR and/or MR increase. The results decrease as the sample size increases. These results imply that the reweighted estimators perform better than the modified PLS estimators. The RC method performs best and the IPW method performs worst, which confirms the theoretical results.
(2) In Table 2, the CP of the reweighted estimators is larger than that of the modified PLS estimators. The CP of the RC method is the smallest, and that of the IPW method is the largest under the same settings. The CP decreases as the heteroscedasticity error and the CR and/or MR increase. The results increase as the sample size increases. The CP based on the EL method is smaller than that of the NA method. The AL behaves in the opposite way.
(3) In Figures 1 and 2, the fit improves as the heteroscedasticity error and measurement error decrease, and worsens as the CR and/or MR increase.
(4) In Figures 3 and 4, the proposed estimators of the error variance perform better as the measurement error, the heteroscedasticity error, and the CR and/or MR decrease. The estimator σ^2_c(·) performs the best, and σ^2_w(·) performs the worst under the same settings.

A Real Data Analysis.
In the real data analysis, we illustrate the methodology via an application to a dataset from a breast cancer clinical trial [25]. This clinical trial was conducted by the Eastern Cooperative Oncology Group to evaluate tamoxifen as a treatment for stage II breast cancer among elderly women older than 65. There were 169 elderly women participating in the trial, and we focus on the 79 women who died by the end of the trial. Unfortunately, the information on the cause of death is incomplete. Among them, 44 women died from breast cancer, 17 died from other known causes, and 18 died from unknown causes. Let the censoring indicator δ show whether the death was caused by breast cancer, and let the missing indicator ς show whether the cause of death was known.
The dataset contains four covariates: whether the patient accepted the treatment (1, tamoxifen; 0, placebo), denoted as X_1; whether the estrogen receptor status was positive (1, yes; 0, unknown), denoted as X_2; whether there were four or more positive axillary lymph nodes (1, yes; 0, no), denoted as Z; and whether the primary tumor was 3 cm or larger (1, yes; 0, no), denoted as U. Then, we employ the following model to fit the data, where Y is the logarithm of the time to death due to breast cancer, which is censored, and the censoring indicator is MAR. The heteroscedastic error follows the form Var(ε | X, Z, U) = σ^2(U_i). For the purpose of comparison, we compute both the mean squared prediction error (MSE) and the mean absolute deviation (MAD), where Ŷ_i is the fitted value of Y_i. The values of MSE and MAD based on the different methods are given in Table 5. In addition, the estimated curves of α(u) based on the RC, IM, and IPW methods are reported in Figure 5.
From Table 5 and Figure 5, it can be seen that (1) in Table 5, the estimators of β_1 are positive, which indicates that patients who died of breast cancer may have lived longer if they had received the treatment. Among these estimators, the MSE and MAD based on the R-RC method are the smallest, which confirms the conclusions in Theorems 1 and 4. (2) In Figure 5, the primary tumor size of the patients is mainly from 0 to 3 cm. The survival time decreases markedly as the tumor size increases.

Conclusion
In this paper, we consider estimation based on the modified PLS method and confidence regions based on EL inference for the PLVCEV model with heteroscedastic errors under MAR censoring indicators. Asymptotic properties of the proposed estimators are established, and the confidence regions of the parameter are constructed. In addition, a simulation study and a real data analysis are conducted to illustrate the proposed method.
Xu and Duan [11] established efficient estimation for the varying-coefficient heteroscedastic partially linear model with additive errors, but their results are confined to completely observed responses. Statistical inference for the heteroscedastic PLVCEV model under right-censored data with MAR censoring indicators is a novel and challenging topic.
Proof. Following the proof of Theorem 1 in Wang and Ng [28], one can get the proof of Lemma A.4. To save space, we omit the details here.
Under assumption (C10), we have D_11 = o_p(1). Under assumption (C5), we have S_i ε = Z_i^⊤ 1_q O_p(log n/nh_1n). On applying assumption (C10), one can get (A.9) Recalling Remark 1 and from the results in Theorem 3, it is easy to prove Under the missing mechanism and similar to the proof of D_1, it is easy to prove that D_2l = o_p(1) for l = 1, 2, 3, 4. Hence, we have D_2 = o_p(1). Consider (A.11) From Lemma A.4, we have