Mixed Spline Smoothing and Kernel Estimator in Biresponse Nonparametric Regression

Mixed estimators in nonparametric regression have been developed in models with one response. /e biresponse cases with different patterns among predictor variables that tend to be mixed estimators are often encountered. /erefore, in this article, we propose a biresponse nonparametric regression model with mixed spline smoothing and kernel estimators. /is mixed estimator is suitable for modeling biresponse data with several patterns (response vs. predictors) that tend to change at certain subintervals such as the spline smoothing pattern, and other patterns that tend to be random are commonly modeled using kernel regression. /e mixed estimator is obtained through two-stage estimation, i.e., penalized weighted least square (PWLS) and weighted least square (WLS). Furthermore, the proposed biresponse modeling with mixed estimators is validated using simulation data. /is estimator is also applied to the percentage of the poor population and human development index data. /e results show that the proposed model can be appropriately implemented and gives satisfactory results.


Introduction
One of the most popular statistical methods often used for prediction is regression analysis. Regression analysis is commonly used to determine the functional relationship between independent variables (predictors) and dependent variables (responses) [1]. Functional relationships between predictor variables and response variables can have clear or unknown patterns; if these relationships have unknown patterns, the appropriate type of regression analysis is nonparametric regression [2]. In the nonparametric regression, the regression curve is assumed to be smooth. is regression has high flexibility because the data can drive to estimate its own regression curve without subjectivity from the researcher [3]. Researchers have proposed methods for estimating nonparametric regression functions such as spline, kernel, and Fourier series functions. e spline nonparametric regression has been developed by Eubank [3], Becher et al. [4], and Wang et al. [5]. Hall and Huang [6], Okumura and Naito [7], Du et al. [8], Chamidah and Saifudin [9], and Erçelik and Nadar [10] developed kernel nonparametric regression. Bilodeau [11] and Amato et al. [12] estimated the nonparametric regression function with the Fourier series function.
In applying nonparametric regression modeling, researchers sometimes assume that each predictor variable has the same pattern. Although there are often real cases with different patterns between the response and each predictor, if the researcher still insists on applying one type of estimator to all predictor variables, the estimation results can be inaccurate and produce a large error. Researchers have begun to develop nonparametric regression with a mixed estimator, including Hidayat et al. [13], Mariati et al. [14], and Octavanny et al. [15]. ese mixed estimators are formed by referring to the idea of semiparametric regression. e semiparametric regression model is an additive regression model that consists of a parametric component and a nonparametric component; mixed estimators, the additive model concept in semiparametric regression is adapted by modification using two different nonparametric components. However, these mixed estimators only use one response variable, even though some biresponse cases also have different patterns among the predictor variables. At present, nonparametric regression studies with mixed estimators for biresponse cases have never been developed. erefore, this study develops a new theory about the mixed estimator of spline smoothing and kernel in biresponse nonparametric regression. is mixed estimator is the development of the mixed estimator proposed by Hidayat et al. [13] that the kernel estimator is considered to be fixed, while the kernel function in this paper is estimated. In addition, this new mixed estimator can be applied for biresponse cases. is mixed estimator is obtained through a two-stage estimation.
e first stage of estimation uses the penalized weighted least square (PWLS) to obtain the spline smoothing component, followed by the second stage that employs the weighted least-square (WLS) estimation method to estimate the kernel component. e spline smoothing estimator is very dependent on the smoothing parameter, while the kernel estimator is very dependent on the bandwidth parameter. ese smoothing and bandwidth parameters are tuning parameters. e optimal value of these parameters will produce the best regression model. In nonparametric and semiparametric regression, there are several methods to determine the optimal parameter value to obtain the best regression model. Some of the popular methods are the cross-validation (CV) and generalized cross-validation (GCV) methods. e CV method is a method of selecting the best model based on the best predictive ability from all the different datasets. is method is widely used, but the calculation of this method will become more complex as the number of datasets increases. In addition, for partial linear models including mixed estimator models, the one-out crossover method tends to be time consuming even for moderate sample sizes [19]. Craven and Wahba [20] modified the CV method to make the calculation simpler, and the result of this modification is called the GCV method. is method is widely used by researchers because it has several advantages. e advantages of the GCV method include the following: simple and efficient in calculation, invariant to transformation, and does not require variant information. is method also has the advantage of optimal asymptotic properties over other methods [21,22]. Some researchers develop specific GCV methods according to the model in their research such as the GCV for semiparametric ridge regression with kernel smoothing [23][24][25]; also, several types of GCV were developed for uniresponse nonparametric regression with mixed estimators including the mixed estimator of spline smoothing and kernel [13], mixed estimator of spline smoothing and Fourier series [14], and mixed estimator of truncated spline and fourier series for longitudinal data [15]. erefore, in this study, the determination of the best model was carried out using the GCV method which was developed specifically for the mixed estimator of spline smoothing and kernel in biresponse nonparametric regression.
Next, the proposed mixed estimator is applied to the simulation data. e formula for generating data contains two different functions to represent two different patterns of predictor variables. is estimator is also implemented to model to the percentage of the poor population and human development index data in Papua Province, Indonesia. Empirical results indicate that the proposed mixed estimator performs very well for modeling the data with two different patterns. One pattern (response vs. predictors) tends to change at certain subintervals, and another pattern appears to be random, which are commonly modeled using kernel regression. e rest of this paper is organized as follows: In Section 2, we present the materials and methods about the two-stage estimation method, i.e., the PWLS and followed by the WLS. e proposed mixed spline smoothing and kernel estimator in biresponse nonparametric regression is explained in Section 3.1.
e selection of smoothing and bandwidth parameters using generalized cross validation (GCV) is described in Section 3.2. e simulation study and real data analysis are conducted to illustrate the performance of the proposed biresponse mixed estimator in Sections 3.3 and 3.4. e conclusions and further research are presented in the last section.

Materials and Methods
e paired data (y 1i , y 2i , x i , t i ) are given, where some predictor variables have a pattern that changes at certain subintervals, and the remaining predictors have a typically random pattern.
is work adopts the idea of the semiparametric regression model developed by Green and Yandell [16], which employed an additive model to combine parametric and nonparametric models. us, the additive model for nonparametric biresponse regression with two different estimators can be formulated as follows: where μ hi (x i , t i ) is a regression curve. In the biresponse cases, if y hi and y h′i are in pairs, then there is a correlation between the error of h-response and error of h'-response (h ≠ h ′ ). Error correlation between responses can be defined as Each regression curve μ hi (x i , t i ) is assumed to be unknown and additive so that it can be written as If equation (2) is substituted into equation (1), then equation (1) can be expressed in the following vector form: (3) e component of the regression curve g is approximated by the spline smoothing function, which is assumed to be smooth and contained in the Sobolev space g ∈ W m 2 [a, b], while the component of the regression curve f is approximated by the kernel function. Mixed spline smoothing and kernel estimator in biresponse nonparametric regression can be obtained through the twostage estimation method. e first stage is performed by estimating the spline smoothing component using the PWLS method, and the second stage is estimating the kernel component using the WLS method. To estimate the spline smoothing part, equation (3) is modified to the following form: where z � y − f. Estimation of the spline smoothing component g can be performed by applying PWLS optimization to equation (5), and then, the estimation results in the first stage are substituted into equation (3).
e estimation of the kernel component f is obtained in the second stage estimation using WLS optimization in equation (6). Furthermore, the results of the two-stage estimations are substituted into equation (3) to obtain the mixed spline smoothing and kernel estimator in biresponse nonparametric regression.
where K α (t 0 ) is a weighting matrix for the kernel estimator and M − 1 in equations (5) and (6) is a weighting matrix for the biresponse nonparametric regression like in the work of Wang et al. [5].
Proof. If g h ; h � 1, 2 is a function lying in Hilbert space H, the H space can be decomposed into a direct sum of two spaces H 0 and H 1 where H � H 0 ⊕ H 1 and H 0 ⊥ H 1 . If τ h1 , τ h2 , . . . , τ hm is the basis in H 0 and β h1 , β h2 , . . . , β hn is the basis in H 1 , according to Wahba [22], for each function g h ∈ H with u h ∈ H 0 and v h ∈ H 1 , we obtain where International Journal of Mathematics and Mathematical Sciences Equation (10) is a limited linear function in H space and g h ∈ H; therefore, equation (10) can be stated as and using inner product properties, equation (12) can be written as and for each response with all observations (i � 1, 2, 3, . . . , n), we can obtain the following vector g h : where erefore, for all responses (h � 1, 2), we can obtain the function form of the spline smoothing component in the biresponse regression curve as follows: where with . . , n is approached by the Taylor series with t around t 0 . Proof.
e form of the regression function f is derived from the component f h (t i ) in equation (2). e form of the function is unknown and approached using the kernel estimator. e function f h (t i ) for h � 1, 2 can be approached by the Taylor series with t around t 0 as follows [9]: , then equation (20) can be stated as where t ∈ (t 0 − α, t 0 + α), t 0 is the predictor value for prediction. e kernel estimator can be obtained when the polynomial order m h � 0; h � 1, 2. erefore, the function form for each response involving all observations can be stated as en, we can obtain the function form of component f in biresponse nonparametric regression as follows: where e biresponse nonparametric regression model is given in equation (1), where each component of the regression curve is additive as stated in equation (2). e function form of the g component is presented in Lemma 1, and the function form of the f component is presented in Lemma 2. Using PWLS in the first-stage estimation and WLS in the second-stage estimation, the mixed spline smoothing and kernel estimator in biresponse nonparametric regression is obtained as follows:
Proof. e first-stage estimation on the mixed spline smoothing and kernel estimator in biresponse nonparametric regression is performed by estimating the spline smoothing component using the PWLS in equation (5). e penalty component 2 h (x)] 2 dx in equation (5) can be obtained through the following decomposition [26] where P 1 is the orthogonal projection of g h to H 1 in H space. By substituting equation (26) into the penalty component, we obtain � c T ΛVc, [26], (27) and furthermore, the PWLS optimization in equation (5) can be written in matrix notation as follows: e solution for the PWLS optimization can be obtained from the partial derivative Q(c, d) by c and d. e partial derivative Q(c, d) by c results as follows: e partial derivative of Q(c, d) by d gives the result and by substituting equation (30) into equation (31), we can obtain considering that R � M − 1 V + 2nΛ so matrix V can be stated as V � M(R − 2nΛ); then, we can modify Equation (33) is substituted into equation (32), and then, we solve it and get the following result: Furthermore, by substituting equation (34) to equation (30), we obtain c and d are substituted into the function form of the spline smoothing component in equation (16), and then, the following spline smoothing estimator component in the biresponse nonparametric regression model is obtained: where Because z � y − f, the first-stage estimation results can be stated as and remember that z � g + ε; then, the model in equation (4) can be written as In the second stage of estimation, the function f as the kernel component on a mixed spline smoothing and kernel estimator in biresponse nonparametric regression is estimated using the WLS method with the following formula: where M − 1 is a weighting matrix for biresponse nonparametric regression and K α (t 0 ) is a weighting for the kernel estimator with the following structure: where By substituting the results of the first-stage estimation (38) and the function form f (Lemma 2) into the model of the mixed estimator (equation (3)), the error of this mixed estimator model can be written as follows: A(λ, α)y + A(λ, α) Furthermore, by supposing D � M − 1 K α (t 0 ) and substituting equation (42) e solution for the WLS optimization can be obtained from the partial derivative Q(t 0 ) by ω 0 (t 0 ). e optimization result is obtained as follows: A(λ, α))y.
(44) erefore, the estimation for kernel estimator component f can be written as

Selection of Smoothing and Bandwidth
Parameters. e best model for the biresponse mixed spline smoothing and kernel estimator depends on the optimal smoothing parameters (λ opt ) and optimal bandwidth parameters (α opt ), where λ and α are tuning parameters. ese optimal parameters can be obtained from the model with the smallest generalized cross-validation (GCV) value, see [13][14][15][23][24][25]. e GCV criteria are one of the methods to determine the best model in nonparametric regression [20]. e GCV formula for the mixed spline smoothing and kernel estimator in biresponse nonparametric regression (48) can be stated as follows: H(λ, α))y‖ 2 (2n) −1 trace(I − H(λ, α)) 2 . (49)

Simulation Study.
In this simulation, the mixed spline smoothing and kernel estimator in biresponse nonparametric regression, as formulated in equation (48), is applied to the simulation data. e simulation data are generated from a formula that contains two different functions, i.e., a polynomial function and an exponential function, to present two different patterns for each predictor. e polynomial function is used to generate spline smoothing-like patterns, and the exponential function is used to generate patterns in the data such as kernel. Using two response variables and two predictor variables, the formula for generating data is defined as follows: with 8 International Journal of Mathematics and Mathematical Sciences e predictors are generated from x i ∼ U(0, 2) and t i ∼ U(0, 2) with the sample size n � 100, and the random errors ε hi are generated from bivariate normal distributions with μ 1 � 0, μ 2 � 0, σ 2 1 � 0.5, σ 2 2 � 0.6, and ρ � 0.6. e scatterplots of the simulated data are shown in Figure 1. It can be seen that the pattern between y 1 , y 2 against x tends to change at certain subintervals such as the spline smoothing pattern, while the scatterplot between y 1 , y 2 against t tends to have a random pattern that is commonly modeled with kernel regression.
In this simulation, the Gaussian kernel was employed. Based on the empirical results from the two-stage estimation, we obtain combination values of the smoothing parameters (λ 1 , λ 2 ) and bandwidth parameters (α 1 , α 2 ) around the optimal values (Table 1). e exhibited results are a few of all combinations due to the limited space. e best model is chosen based on the smallest GCV value resulted from optimal smoothing parameters λ 1(opt) � 0.0002413 and λ 2(opt) � 0.0000899 along with optimal bandwidth parameters α 1(opt) � 2.396 and α 2(opt) � 2.416.
is model produces the lowest GCV � 3.239 with R 2 � 99.91% and RMSE � 0.1045. e result of modeling simulation data using the mixed spline smoothing and kernel estimator is compared with modeling using either a spline smoothing estimator or kernel estimator only. e results of these models are presented in Table 2. From Table 2, we can find out the best model that gives the smallest GCV value is the model with the mixed spline smoothing and kernel estimator. Besides, this model has the largest R 2 and the lowest MSE value. e plot between the estimation results and the original simulation data is presented in Figure 2, where the estimated values (red triangles) are very close to the original data (black squares). us, the proposed model and estimation procedure can be used to make a prediction correctly. Furthermore, on the left side of Figure 3, the surface plots are formed using equation (50), which is the equation for generating simulation data, whereas on the right side of Figure 3, there are two surface plots for each response which are formed from equation (48) where its parameters are estimated using two-stage estimation, i.e., the PWLS and WLS. e two sides of Figure 3 show that the plots appear to have similar surface shape. is evidence indicates that the estimation procedure proposed in equation (48) can be used appropriately to estimate the function generated from the simulation.

Data Application.
e biresponse mixed spline smoothing and kernel estimator proposed in this paper is applied to regress the percentage of the poor population (PPP), as the first response, and human development index (HDI), as the second response, on several predictors. ese two response variables are important because they are indicators of the success level of a country's development. e adoption of biresponse modeling for the two variables considers the initial study that there is a negative correlation between PPP and HDI. If the PPP in a region is getting lower, the HDI in that region will be higher [27,28].
Several variables that typically affect the two response variables are the gross regional domestic product (GRDP) and the population growth rate. Some researchers have pointed out several factors that can affect the PPP and HDI, including Grubaugh [29], who stated the variables that influence the growth of HDI in developing countries are population, population growth, and the initial level of the gross domestic product (GDP). Meanwhile, Mallick and Ghani [30] found that high population growth is the cause of poverty in Pakistan. While in North Sumatra Province, Indonesia, the GRDP and education level to university have a positive and significant influence in reducing the PPP [31]. Based on Malthus's theory, poverty is considered as the impact of high population growth rates [32]. Also, additional life support needs are considered slower than population growth. A high increase in population growth will have an impact on decreasing the quality of natural resources and reducing the opportunity for people to access life-support facilities. is situation can reduce the quality of human life, and people will be challenged to live in prosperity. Rapid economic growth is one way to alleviate poverty [33]. e GDP or GRDP has a close relationship with economic growth because the economic growth of a region is related to an increase in production or an increase in income per capita. Besides, if the GRDP is higher, the income per capita in the region will increase and have an impact on increasing the ability of the community to meet their needs and improve their quality of life. e application of this mixed estimator model on the PPP and HDI in this study is made after the authors conducted a preliminary study. Based on information from the initial research, it is known that the GRDP has a changing pattern at certain subintervals such as the spline pattern. In contrast, the population growth rate has a random pattern that is usually modeled by the kernel. erefore, the data of PPP and HDI of Papua Province in 2017 are used as response variables, while the predictor variables are the GRDP and the population growth rate. e data were obtained from Statistics Indonesia (Badan Pusat Statistik-BPS), Papua Province, with 29 regencies/cities as the observation unit. e biresponse mixed spline smoothing and kernel estimator in equation (12) was applied to the data. is modeling produces the minimum GCV value of 62.75, the R 2 of 96.54%, and the RMSE of 3.166. Based on the R 2 value, it is known that the model can describe the relationship between predictor variables and response up to 96.54%.
is finding shows that the biresponse mixed spline smoothing and kernel estimator is suitable for modeling the PPP and HDI in Papua Province. Also, the bar chart in Figures 4 and 5 show that the estimated values of PPP and HDI in Papua Province are close to their actual values.
Furthermore, to determine the predictive ability of this modeling, we use the model that has been obtained from the data in 2017 to predict the PPP and HDI values of Papua Province in 2018 and 2019. is prediction is carried out by applying the model that has been obtained to the GRDP and the    is 5.1658% or the level of accuracy is 94.8342%. ese MAPE values are less than 10%, which indicates that the biresponse nonparametric regression model with mixed spline smoothing and kernel estimators has a good predictive ability to predict PPP and HDI in Papua Province.

Conclusions
is paper presents the biresponse nonparametric regression model with mixed spline smoothing and kernel estimators. is new mixed estimator is obtained through two-stage estimation, i.e., the first stage using the PWLS to obtain the spline smoothing component, followed by the second stage that employs the WLS to estimate the kernel component. is mixed estimator is formed to handle the different data patterns between each predictor in the biresponse case, so this estimator can provide better estimation results. Selection of the best model for the proposed estimator is carried out by selecting a model that produces a minimum GCV value. e simulation results show the biresponse mixed spline smoothing and kernel estimator provides better results compared to the biresponse spline smoothing or biresponse kernel estimator. Furthermore, this proposed estimator can be appropriately applied to model the percentage of the poor population (PPP) and human development index (HDI) in Papua Province and gives satisfactory results. e limitation of this study is we only use one predictor variable for each component of the estimators, both spline smoothing and kernel estimators. For future work, this biresponse mixed estimator can be developed with more than one predictor for each estimator component. Apart from this limitation, this study is useful for our insight into mixed estimators in biresponse nonparametric regression.
Data Availability e data in this article are available in the BPS of the Papua Province repository (https://papua.bps.go.id).

Conflicts of Interest
e authors declare that there are no conflicts of interest in this paper.