A New Mixed Estimator in Nonparametric Regression for Longitudinal Data

We introduce a new method for estimating the nonparametric regression curve for longitudinal data. This method combines two estimators: truncated spline and Fourier series. The estimation is completed by minimizing the penalized weighted least squares and weighted least squares criteria. This paper also provides the properties of the new mixed estimator, which is biased and linear in the observations. The best model is selected using the smallest value of generalized cross-validation. The performance of the new method is demonstrated by a simulation study with a variety of time points. Then, the proposed approach is applied to a stroke patient dataset. The results show that simulated data and real data yield consistent findings.


Introduction
Nonparametric regression is a statistical method used when the form of the regression curve is unknown. The strength of this method is its great flexibility, since the data are used to find the form of the estimated regression curve without being influenced by subjective judgements [1]. Commonly used estimators include spline, Fourier series, kernel, and polynomial estimators. The truncated spline is a function in which the behaviour of the curve changes in certain subintervals. The spline is one of the popular estimators in nonparametric regression because it has an excellent visual interpretation. Montoya et al. [2] conducted a simulation study to compare knot selection methods in a penalized regression spline model. The Fourier series is widely used to describe curves that follow sine and cosine waves. This estimator is commonly used when the data exhibit periodicity. Bilodeau [3] estimated additive components with functions consisting of truncated Fourier cosine series, using penalized least squares to obtain the coefficients. Furthermore, Tripena [4] developed a Fourier series estimator for bi-response nonparametric regression. Gong and Gao [5] discussed a nonparametric kernel estimation of tax policy's impact on the demand for private health insurance in Australia.
A single estimator is commonly used in nonparametric regression, but this does not rule out developing a mixed estimator. There are many cases where each predictor variable has a different pattern. Therefore, applying only a single estimator can make the estimation of the regression model incorrect and produce a large error. Some of the previous studies mentioned above are limited in that they can only be used for cross-sectional data. To overcome this limitation, a longitudinal data model has been developed. The use of longitudinal data has increased in recent years because it can be applied in various fields. Longitudinal data are data obtained from observations of independent subjects, where each subject is observed repeatedly over a certain period. This has the advantage of making it possible to observe changes over time [6].
In [7], the mixed estimator is limited in the sample size it can treat. The present study extends the use of the mixed truncated spline and Fourier series (MTSFS) model to larger sample sizes and various time point designs. Some properties of the new mixed estimator are also provided. We use generalized cross-validation (GCV) to determine the best model from various knots, oscillations, and smoothing parameters. A simulation study and real data are employed to demonstrate the performance of the proposed method. The case study covers the factors that affect the Glasgow Coma Scale (GCS) of stroke patients, i.e., body temperature (BT) and pulse rate (PR). The rest of this paper is organized as follows. Section 2 introduces the materials and methods used in this study. Section 3 consists of five subsections: the theory of the new mixed estimator, its properties, the selection of the best model, the simulation study, and a case study. The details of the new mixed estimator are presented in Section 3.1, followed by its properties in Section 3.2. Section 3.3 presents how to select the optimum knot point, oscillation parameter, and smoothing parameter to obtain the best model. The results of a simulation study of the proposed method are presented in Section 3.4, followed by an application to real data in Section 3.5. Section 4 concludes the paper.

Materials and Methods
Suppose $y_{it}$ is the response variable and $x_{it}$ and $z_{it}$ are the corresponding predictor variables, with a sample of $n$ subjects ($i = 1, 2, \ldots, n$), each subject having $T$ observations ($t = 1, 2, \ldots, T$).
The relation between the response and the predictors in the nonparametric regression model for longitudinal data is

$$y_{it} = \mu(x_{it}, z_{it}) + \varepsilon_{it}, \tag{1}$$

where $\mu$ is the regression curve and $\varepsilon_{it}$ is a random error. Assume that the form of the regression curve $\mu$ is unknown and additive, so that

$$\mu(x_{it}, z_{it}) = \sum_{j=1}^{p} f_{ji}(x_{jit}) + \sum_{k=1}^{q} g_{ki}(z_{kit}), \tag{2}$$

where $\sum_{j=1}^{p} f_{ji}(x_{jit})$ is the truncated spline component and $\sum_{k=1}^{q} g_{ki}(z_{kit})$ is the Fourier series component. The function $f_{ji}$, $j = 1, 2, \ldots, p$, is approximated using truncated spline functions, and $g_{ki}$, $k = 1, 2, \ldots, q$, is approximated using a Fourier series. The estimator of $\mu$ is obtained through a two-step optimization, i.e., penalized weighted least squares (PWLS) and weighted least squares (WLS). The results are summarized as follows.

Lemma 1. If the Fourier series component in equation (2) is given by $\sum_{k=1}^{q} g_{ki}(z_{kit})$, then the goodness of fit can be formulated as follows:
where $N = n \times T$, $W$ is the $nT \times nT$ weighting matrix, $y^{*} = y - f$, and

Proof. The function $g_{ki}$ is assumed to be unknown and contained in the space $C(0, \pi)$. The function is approximated using a Fourier series with a trend, modified from Bilodeau [3]:
Equation (5) can be written in matrix form
where $Z$ is an $nT \times (2 + H)nq$ matrix and $c$ is a $(2 + H)nq$ vector. The nonparametric model in equation (2) can be rewritten as
Then, the goodness of fit for equation (7) can be written as
The model in equation (8) can be written in matrix form: □

Lemma 2. If the penalty component is given,
where $c = (c_1, c_2, \ldots, c_n)'$,

Proof. Regarding equation (5), we define
Consequently,
Let us,
The penalty in equation (16) can be rewritten in matrix form

Regarding the goodness of fit in Lemma 1 and the penalty component in Lemma 2, we obtain the PWLS optimization as
Equation (18) can be rewritten in the form
where

Proof. The optimization in equation (18) can be written as
Equation (21) can be rewritten in the form
We obtain
The optimization in (23) is completed by taking the partial derivative of $Q(c)$ with respect to $c$ and setting it equal to zero, i.e.,
giving the result
By substituting (25) into (6), we obtain
The nonparametric regression model in equation (7) can be written as
where

If the truncated spline component in equation (2) is

Proof. The function $f_{ji}$ is a linear truncated spline function with $s$ knots for each $x_j$, $j = 1, 2, \ldots, p$:
where
Equation (29) can be rewritten in matrix form
where $X =$

The MTSFS model for longitudinal data in equation (1) can be written in the form (32). By substituting (26) into (32), we obtain
To obtain the estimator of the truncated spline component, equation (33)
We obtain
The optimization is completed by setting the partial derivative of $Q$ with respect to the truncated spline parameter vector equal to zero, that is,
giving the result
By substituting the resulting truncated spline parameter estimate into equation (30), we obtain
The Fourier series parameter estimate is obtained by substituting (43) into (25):
As a consequence,
By substituting the estimated truncated spline and Fourier series parameters into the MTSFS model for longitudinal data, we obtain
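The two-step idea described above (PWLS for the Fourier series part, then WLS for the truncated spline part) can be illustrated with a minimal NumPy sketch for a single subject. It is not the authors' implementation. The function names, the symbols `gamma` (spline coefficients) and `c` (Fourier coefficients), the basis layout $[z, 1/2, \cos z, \ldots, \cos Hz]$, the diagonal penalty with weights $h^4$, and the scaling $N\lambda$ are all assumptions chosen to be consistent with a Bilodeau-type Fourier series with trend and a linear truncated spline.

```python
import numpy as np

def truncated_spline_basis(x, knots):
    """Linear truncated spline basis: [x, (x - k_1)_+, ..., (x - k_s)_+]."""
    x = np.asarray(x, dtype=float)
    cols = [x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def fourier_trend_basis(z, H):
    """Fourier series with trend: [z, 1/2, cos(z), ..., cos(Hz)], i.e. 2 + H columns."""
    z = np.asarray(z, dtype=float)
    cols = [z, np.full_like(z, 0.5)] + [np.cos(h * z) for h in range(1, H + 1)]
    return np.column_stack(cols)

def fourier_penalty(H):
    """Assumed roughness penalty: weight h^4 on cos(hz) terms, 0 on trend and constant."""
    return np.diag([0.0, 0.0] + [float(h) ** 4 for h in range(1, H + 1)])

def two_step_fit(y, X, Z, W, D, lam):
    """Step 1 (PWLS) profiles out the Fourier part; step 2 (WLS) estimates the spline part.

    Returns (gamma, c, hat), where hat @ y gives the fitted values, so the
    estimator is linear in the observations.
    """
    y = np.asarray(y, dtype=float)
    N = y.size
    I = np.eye(N)
    # Step 1: for a fixed spline part, c = A (y - X gamma) minimizes
    # N^{-1} (y* - Z c)' W (y* - Z c) + lam * c' D c, with y* = y - X gamma.
    A = np.linalg.solve(Z.T @ W @ Z + N * lam * D, Z.T @ W)
    S = Z @ A            # smoother matrix of the Fourier part
    R = I - S
    # Step 2: weighted least squares on the residualized model R y = R X gamma + error.
    Xr = R @ X
    B = np.linalg.solve(Xr.T @ W @ Xr, Xr.T @ W @ R)   # gamma = B y
    gamma = B @ y
    c = A @ (y - X @ gamma)
    hat = X @ B + S @ (I - X @ B)
    return gamma, c, hat
```

With `lam = 0` and noise-free data generated from the two bases, the sketch recovers the generating coefficients, which is a quick sanity check before applying a positive smoothing parameter. For several subjects, one simple option under the same assumptions is to apply the function subject by subject.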

The Properties of the MTSFS Model for Longitudinal Data.
This section provides the properties of the MTSFS model for longitudinal data, i.e., it is biased and linear in the observations. The mixed estimator is biased, as proved by

The Selection of the Optimal Number of Knot Points, Oscillation Parameter, and Smoothing Parameter.
The MTSFS model depends strongly on the number of knot points, the oscillation parameter, and the smoothing parameter. The best model is obtained by using the optimal values of these parameters. In semiparametric and nonparametric regression, there are several methods to obtain the best regression model. One of the popular methods is generalized cross-validation (GCV).
This method was developed to overcome the computational complexity of ordinary cross-validation (CV). GCV has several advantages: it is simple and efficient to compute, invariant to transformation, and does not require knowledge of the variance. In addition, GCV has better asymptotic properties than other methods [9,10].
In this study, the GCV value is the criterion used to determine the best model from a variety of knots, oscillations, and smoothing parameters. The best model is the one with the lowest GCV value. The modified GCV method of the MTSFS model for longitudinal data is stated as follows.
The optimal number of knot points, oscillation parameter, and smoothing parameter are obtained by minimizing $\mathrm{GCV}(k, h, \lambda)$.
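As an illustration of the criterion, the sketch below computes the standard (unweighted) GCV score for any estimator that is linear in the observations, i.e. whose fitted values are `hat @ y`. The function name `gcv` and the unweighted form $N^{-1}\lVert (I - A)y \rVert^2 / (N^{-1}\operatorname{tr}(I - A))^2$ are assumptions. The modified GCV used in the paper may include the weight matrix.

```python
import numpy as np

def gcv(y, hat):
    """Standard GCV score: mean squared residual over (trace(I - hat) / N)^2."""
    y = np.asarray(y, dtype=float)
    N = y.size
    resid = y - hat @ y
    mse = resid @ resid / N
    denom = (np.trace(np.eye(N) - hat) / N) ** 2
    return mse / denom
```

For example, with `y = [1, 2, 3]` and the hat matrix of the sample mean (all entries 1/3), the mean squared residual is 2/3 and the denominator is (2/3)^2, so the score is 1.5.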

Simulation Study.
This section applies the MTSFS model to simulated data to assess the performance of the resulting estimators. The simulation was done with two predictor variables, sample size $n = 20$, and a varied number of time points, $T = 5, 10, 15$. We considered 20 models for each subject generated from a formula that contains two different functions representing the truncated spline and Fourier series patterns. A polynomial function is used to represent the truncated spline, while trigonometric functions are used to represent the Fourier series. The predictors are generated from $U(0, 1)$, and the random errors $\varepsilon_{it}$ are generated from a multivariate normal distribution. The weight matrix is user-specified [6], and in this paper, we use $W = N^{-1} I$ so that all measurements are treated equally. In this study, we use two numbers of knots ($K = 1$ and $K = 2$) and several oscillation parameters ($H = 1, 2, 3, 4$). The simulation results based on GCV are presented in Table 1.
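A data-generating setup of this kind can be sketched as follows. The specific functions (a piecewise-linear term with a break at 0.5 and a single cosine term), the error standard deviation `sigma`, and the diagonal covariance are hypothetical choices made only for illustration. The generating formulas used in the paper are not given in this section.

```python
import numpy as np

def simulate_longitudinal(n=20, T=5, sigma=0.5, seed=0):
    """Simulate n subjects with T time points each; returns arrays of shape (n, T)."""
    rng = np.random.default_rng(seed)
    # Predictors drawn from U(0, 1), as in the simulation design.
    x = rng.uniform(0.0, 1.0, size=(n, T))
    z = rng.uniform(0.0, 1.0, size=(n, T))
    # Hypothetical component with a change of slope at 0.5 (truncated spline pattern).
    f = 2.0 * x + 3.0 * np.maximum(x - 0.5, 0.0)
    # Hypothetical periodic component (Fourier series pattern).
    g = np.cos(2.0 * np.pi * z)
    # Errors from a multivariate normal; the diagonal covariance is an assumption.
    eps = rng.multivariate_normal(np.zeros(T), sigma ** 2 * np.eye(T), size=n)
    y = f + g + eps
    return x, z, y
```

Calling `simulate_longitudinal(n=20, T=T)` for `T` in `(5, 10, 15)` gives one dataset per time point design.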
Based on Table 1, for the model with $T = 5$, the smallest GCV value occurs at oscillation parameter $H = 3$ for both $K = 1$ and $K = 2$. The same holds at $T = 10$ and $T = 15$: the smallest GCV occurs at $H = 3$ for both numbers of knots.
In general, the larger the number of time points, the smaller the resulting GCV value. Furthermore, the greater the number of knot points, the larger the GCV. The results also show that a larger oscillation parameter does not guarantee a larger or a smaller GCV value. It is therefore necessary to choose the oscillation parameter that produces the smallest GCV.
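Because the GCV does not change monotonically with the oscillation parameter, the selection amounts to a search over candidate values. The sketch below is a simplified version of that search: it considers only the Fourier series part, uses an unpenalized least squares fit, and reuses the assumed basis $[z, 1/2, \cos z, \ldots, \cos Hz]$. The function names are hypothetical, and the paper's search additionally runs over the number of knots and the smoothing parameter.

```python
import numpy as np

def fourier_trend_basis(z, H):
    """Assumed Fourier series with trend basis: [z, 1/2, cos(z), ..., cos(Hz)]."""
    z = np.asarray(z, dtype=float)
    cols = [z, np.full_like(z, 0.5)] + [np.cos(h * z) for h in range(1, H + 1)]
    return np.column_stack(cols)

def gcv_for_oscillation(y, z, H):
    """GCV of an unpenalized least squares fit with oscillation parameter H."""
    y = np.asarray(y, dtype=float)
    N = y.size
    Z = fourier_trend_basis(z, H)
    hat = Z @ np.linalg.pinv(Z)          # least squares hat matrix
    resid = y - hat @ y
    return (resid @ resid / N) / (np.trace(np.eye(N) - hat) / N) ** 2

def select_oscillation(y, z, candidates=(1, 2, 3, 4)):
    """Return the candidate H with the smallest GCV and all scores."""
    scores = {H: gcv_for_oscillation(y, z, H) for H in candidates}
    best = min(scores, key=scores.get)
    return best, scores
```

The returned `scores` dictionary makes it easy to tabulate the GCV per candidate, in the spirit of Table 1.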

Application to Real Data.
The MTSFS model for longitudinal data was applied to the stroke patient dataset. These data were taken after an initial study of GCS in stroke patients and the factors that influence it. The pattern of the relations between the predictors and the response followed the characteristics of a truncated spline and a Fourier series.
One predictor has the form of a truncated spline, changing in certain subintervals. The other predictor has the form of a Fourier series, with a repeating pattern.
Stroke is a noncommunicable disease. The number of people who suffer from it continues to increase worldwide. In 2013, stroke was the second leading cause of death globally (11.8% of all deaths) after ischemic heart disease (14.8% of all deaths). In addition, stroke is the third leading cause of disability, accounting for 4.5% of all causes of disability [11]. Based on the Global Burden of Disease (GBD) Study 2016, the estimated global lifetime risk of stroke for those aged 25 years or older almost reached 25% [12]. The global prevalence of stroke in 2017 was 104.2 million people. In addition, age-standardized stroke prevalence rates were highest in Eastern Europe, North Africa, the Middle East, and Central and East Asia. Several countries in Europe, North Africa, and Central Asia have the highest rates of stroke mortality. Indonesia is one of the countries with the highest death rate due to stroke. According to the Indonesia Basic Health Survey 2007, stroke was the leading cause of death (15.4%) [13]. The prevalence of stroke in Indonesia in 2013 was 7% and increased to 10.9%, according to the Indonesia Basic Health Survey 2018. Furthermore, the stroke prevalence for those aged 15 years or older was 10.85%.
Stroke patients often experience head injuries due to falls. Trauma or head injury requires supervision to ensure further medical treatment. GCS was initially used to assess the level of consciousness after head injury and is now used for both acute and trauma patients. According to Champion [14], trauma severity assessment is used to measure the severity of the injury and describe the severity of the patient's case. An injury to the head will activate the immune response and the release of interleukins by activating white blood cells and increasing the temperature. Injury to the hypothalamus also increases interference with body temperature regulation [15]. Fever is generally defined as an increase in body temperature above average and has been identified as one of the causes of worsening head injury [16]. Based on previous studies, the pulse was found to be dysfunctional in patients with severe brain injury. Changes in pulse rate are associated with increased mortality in brain injury patients. Therefore, changes in the pulse rates of patients with severe brain injury should be carefully observed [17].
This article uses GCS as the response variable for 20 stroke patients ($n = 20$) with 14 daily measurements ($T = 14$) for each subject. The predictor variables are body temperature (BT) and pulse rate (PR). The partial relationship between GCS and each predictor variable is illustrated in Figure 1. The plot in Figure 1 shows a different pattern for each predictor. For this reason, we propose the MTSFS model for longitudinal data. Body temperature will be approximated by a truncated spline estimator, while the pulse rate will be approximated by a Fourier series estimator.
Based on Table 2, some scenarios were performed to compare the effectiveness of the proposed model, i.e., using a single estimator and the mixed estimator. By using the GCV formula in Section 3.3, we selected the best model according to the minimum GCV criterion. As shown in Table 2, modeling GCS in stroke patients produces the smallest GCV value of 45.3517, achieved when the number of knots is $K = 1$ and the oscillation parameter is $H = 3$ with $\lambda = 10$. This study revealed that the MTSFS model is better than using a single estimator.
This model yields an RMSE of 2.43. The important result is that the number of knots and the oscillation parameter of the best model in the simulation study are in line with those for the case study. Similar to the simulation study, the best model had one knot and the oscillation parameter $H$ equal to three. Figure 2 presents a comparison between the response variable (red line) and fitted values (blue line) using the

Conclusions
This article presented a new mixed estimator for nonparametric regression models for longitudinal data, combining the truncated spline estimator and the Fourier series estimator to obtain better estimation results when the predictors have different data patterns. The new two-step method for estimating the parameters uses penalized weighted least squares and weighted least squares.
In both the simulation study and the case study, the best model was selected by using the minimum generalized cross-validation (GCV). A higher number of knots or a larger oscillation parameter does not necessarily improve the GCV. Therefore, several combinations of knots and oscillations had to be tried to determine the best model. Interestingly, the number of knots and oscillations of the best model is the same for the simulation study as for the case study with real data: one knot and three oscillations. The proposed estimator is a useful alternative for estimating nonparametric regression curves for longitudinal data. The presented results suggest that future studies can use different weighting matrices and more knots, so that researchers can compare the results to improve the model performance. Another possible line of further research is simulation studies using other functions to evaluate the performance of the proposed model.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.