Nonparametric Regression Model for Longitudinal Data with Mixed Truncated Spline and Fourier Series

Existing nonparametric regression models typically apply a single estimator to all predictors. This study develops a mixed truncated spline and Fourier series model in nonparametric regression for longitudinal data. The mixed estimator is obtained through a two-stage estimation consisting of penalized weighted least squares (PWLS) and weighted least squares (WLS) optimization. To demonstrate the performance of the proposed method, a simulation study and a real-data case study are provided. The results of the simulation and the case study are consistent.


Introduction
Regression analysis is aimed at modeling the association between the predictor and the response. If the form of the regression curve is unknown, nonparametric regression is used [1]. However, if the form of the regression curve is known, parametric regression can be applied [2]. Nonparametric regression is highly flexible because the data are expected to determine the form of the estimated regression curve without being influenced by the researcher's subjectivity [3]. Several nonparametric estimators have been studied, such as kernel, spline [4][5][6][7], and Fourier series [8].
A spline estimator, which has an excellent ability to handle data that change at certain subintervals [9], has been obtained using penalized least squares optimization [10] and a Bayesian approach [11]. A spline estimator can be applied to cross-sectional data as well as longitudinal data. Additionally, nonparametric regression for longitudinal data has been addressed using the kernel estimator [12,13], generalized spline regression [14], and the mixed-effects model [7]. The Fourier series, which is useful for explaining curves that show sine and cosine waves, is generally used if the data pattern is unknown and tends to repeat.
A considerable amount of research has used only one estimator for all predictors. However, because each predictor can have a different pattern, a mixed estimator has been proposed. Recently, Sudiarsa et al. [15] studied a mixed estimator of the truncated spline and Fourier series. Their study discussed only cross-sectional data, so it did not obtain a model for each subject. Consequently, their approach cannot be used to investigate how the response behaves over time.
Although some research has been carried out on a mixed estimator, no studies have explored multisubject data so far. This paper addresses that gap by proposing a mixed estimator of the truncated spline and Fourier series in nonparametric regression for longitudinal data and applying it to simulated data and a case study.
This study is organized as follows. We briefly explain the materials and methods used in our study in Section 2. Section 3 consists of three subsections: the developed theory, simulation study, and case study. We present the developed nonparametric regression theory for longitudinal data with a mixed estimator of the truncated spline and Fourier series with two-stage estimation in Section 3.1. In Section 3.2, we conduct a simulation study based on the developed theory to assess the proposed estimator's behavior. To illustrate the applicability of the model, we use a dataset of patients with pulmonary tuberculosis in Section 3.3. Section 4 presents the conclusion.

Materials and Methods
Longitudinal data consist of n independent subjects with T observations for each subject. Consider paired data $(x_{1it}, \ldots, x_{pit}, z_{1it}, \ldots, z_{qit}, y_{it})$, $i = 1, \ldots, n$, $t = 1, \ldots, T$, with p predictors $x$ and q predictors $z$. The relationship between $x_{1it}, \ldots, x_{pit}, z_{1it}, \ldots, z_{qit}$ and $y_{it}$ follows a nonparametric regression model for longitudinal data in which each regression curve is additive, so that the model can be expressed as
$$y_{it} = \sum_{j=1}^{p} f_{ji}(x_{jit}) + \sum_{k=1}^{q} g_{ki}(z_{kit}) + \varepsilon_{it},$$
where $\sum_{j=1}^{p} f_{ji}(x_{jit})$ is the truncated spline component, $\sum_{k=1}^{q} g_{ki}(z_{kit})$ is the Fourier series component, and $\varepsilon_{it}$ is the random error.
This study's first objective is to obtain the mixed estimator of the truncated spline and Fourier series in nonparametric regression for longitudinal data. To achieve this, we propose a two-stage estimation method. The first stage estimates the Fourier series component using the penalized weighted least squares (PWLS) method. The second stage estimates the truncated spline component using the weighted least squares (WLS) method. For the second objective, a simulation study, we generate functions that have the characteristics of the truncated spline and the Fourier series. For the third objective, we apply the developed theory to a dataset of patients with pulmonary tuberculosis.

Mixed Model of Truncated Spline and Fourier Series with Two-Stage Estimation.
Lemmas and theorems are used to obtain a nonparametric regression model for longitudinal data with a mixed estimator. The regression curve component that is approximated by the Fourier series estimator is presented in Lemma 1, and the penalty component for the Fourier series function is presented in Lemma 2. Following the PWLS form in Lemma 3, we estimate the Fourier series component by using PWLS in Theorem 4. The regression curve component that is approximated by the truncated spline estimator is presented in Lemma 5, and we estimate the truncated spline component using the WLS method in Theorem 6. The results are summarized as follows.

Lemma 1. If $g_{ki}(z_{kit})$ is approximated by the Fourier series function, then the goodness of fit is
$$N^{-1}(y - f - Zc)' W (y - f - Zc),$$
where $y$ is the response vector, $f$ is the truncated spline component, $Zc$ is the Fourier series component, $N = n \times T$, and $W$ is the $nT \times nT$ weighting matrix.
Proof. The regression curve $g_{ki}(z_{kit})$ has an unknown shape and is contained in the space of continuous functions $C(0, \pi)$. The component $g_{ki}(z_{kit})$, $k = 1, 2, \ldots, q$, in Equation (2) is approximated by the Fourier series function with a trend line, Equation (4), whose coefficients are collected in the $(2 + H)nq \times 1$ vector $c$. To estimate the Fourier series component, the nonparametric regression model in Equation (1) is written as Equation (14) and then in matrix form. A goodness of fit for the model is then formed, with $N = n \times T$. If the function $g_{ki}(z_{kit})$ is approximated by a Fourier series function as in Equation (4), then the goodness of fit can be presented in the form
$$N^{-1}(y - f - Zc)' W (y - f - Zc),$$
with $W$ as a weighting matrix for the regression of longitudinal data.
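As a rough illustration of the Fourier series component, the sketch below builds the design columns for one subject and one predictor. It assumes the cosine Fourier series with a linear trend that is common in this literature, $g(z) = bz + \tfrac{1}{2}a_0 + \sum_{h=1}^{H} a_h \cos(hz)$, which agrees with the $2 + H$ coefficients per subject and predictor implied by the length of $c$. The function name `fourier_basis` and the example grid are hypothetical.

```python
import numpy as np

def fourier_basis(z, H):
    """Columns [z, 1/2, cos(z), ..., cos(H z)] for one subject and one predictor.

    Assumed form: trend term, half-constant term, and H cosine terms,
    giving 2 + H coefficients in total.
    """
    z = np.asarray(z, dtype=float)
    cols = [z, np.full_like(z, 0.5)]
    cols += [np.cos(h * z) for h in range(1, H + 1)]
    return np.column_stack(cols)

# Hypothetical example: 13 observations on (0, pi), two oscillation terms.
z = np.linspace(0.0, np.pi, 13)
Z_i = fourier_basis(z, H=2)
print(Z_i.shape)
```

Because every subject has its own coefficients, the full matrix $Z$ would stack such blocks per subject; that stacking is not shown here.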

Abstract and Applied Analysis

Lemma 2. If the Fourier series is given, then the penalty component is $c' D(\lambda) c$.

Proof. The penalty component in the PWLS optimization is obtained from the Fourier series function in Equation (4). To simplify the derivation, two quantities A and B are defined; the value of A is obtained first, and then the value of B. Based on Equations (22) and (23), the result is written for each subject $i = 1, 2, \ldots, n$. Thus, the penalty component can be expressed in matrix form as $c' D(\lambda) c$.

Lemma 3. If the goodness of fit component is presented in Lemma 1 and the penalty component is given by Lemma 2, then the PWLS is
$$\min_{c} \left\{ N^{-1}(y - f - Zc)' W (y - f - Zc) + c' D(\lambda) c \right\}.$$

Proof. In general, PWLS is defined as the sum of the goodness of fit and the penalty component. Combining Lemmas 1 and 2 gives the matrix form stated above. The next step is to obtain the Fourier series estimator in nonparametric regression for longitudinal data, which is derived in Theorem 4.

Theorem 4. The mixed estimator that minimizes the PWLS in Lemma 3 is $\hat{g}_{(k,h,\lambda)}(x, z) = L y^{*}$, with $y^{*} = y - f$ and $L = Z[Z'WZ + N D(\lambda)]^{-1} Z'W$.

Proof. The first estimation step in the mixed estimator of the truncated spline and Fourier series in the nonparametric regression model for longitudinal data is performed by estimating the Fourier series component with the PWLS method. The PWLS in Equation (29) can be written in matrix form as
$$Q(c) = N^{-1}(y^{*} - Zc)' W (y^{*} - Zc) + c' D(\lambda) c.$$
To complete the optimization, the partial derivative of $Q(c)$ with respect to $c$ is set equal to zero, which gives
$$\hat{c} = [Z'WZ + N D(\lambda)]^{-1} Z'W y^{*}.$$
By substituting $\hat{c}$ into Equation (9), we get
$$\hat{g}_{(k,h,\lambda)}(x, z) = Z\hat{c} = Z[Z'WZ + N D(\lambda)]^{-1} Z'W y^{*}.$$
So, for the model in Equation (15), $\hat{g}_{(k,h,\lambda)}(x, z) = L y^{*}$ with $y^{*} = y - f$ and $L = Z[Z'WZ + N D(\lambda)]^{-1} Z'W$.
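The first-stage estimator can be sketched numerically as follows. This is a minimal illustration of $\hat{c} = [Z'WZ + N D(\lambda)]^{-1} Z'W y^{*}$ and $L = Z[Z'WZ + N D(\lambda)]^{-1} Z'W$ only. The function name `pwls_fourier`, the random design, the identity weighting matrix, and the diagonal penalty matrix are placeholders chosen for the example and are not the structures used in the paper.

```python
import numpy as np

def pwls_fourier(Z, W, D, y_star):
    """Solve min_c N^{-1}(y* - Zc)'W(y* - Zc) + c'Dc.

    Returns the coefficient estimate c_hat and the smoother matrix L,
    so that the fitted Fourier component is L @ y_star.
    """
    N = Z.shape[0]
    ZtW = Z.T @ W
    G = ZtW @ Z + N * D                # Z'WZ + N D(lambda)
    c_hat = np.linalg.solve(G, ZtW @ y_star)
    L = Z @ np.linalg.solve(G, ZtW)    # Z [Z'WZ + N D]^{-1} Z'W
    return c_hat, L

# Hypothetical inputs, for illustration only.
rng = np.random.default_rng(0)
N, m = 26, 4
Z = rng.normal(size=(N, m))            # stand-in for the Fourier design matrix
W = np.eye(N)                          # assumed weighting matrix
D = 0.1 * np.eye(m)                    # placeholder penalty matrix
y_star = rng.normal(size=N)            # stand-in for y - f

c_hat, L = pwls_fourier(Z, W, D, y_star)
g_hat = L @ y_star                     # fitted Fourier series component
```

Using `np.linalg.solve` rather than an explicit inverse is a numerical convenience; the result is the same estimator.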
Lemma 5. If $f_{ji}(x_{jit})$ is approximated with the truncated spline function, then the WLS is
$$\min_{\gamma} \left\{ [(I - L)y - (I - L)M\gamma]' W [(I - L)y - (I - L)M\gamma] \right\},$$
where $W$ is an $nT \times nT$ weighting matrix.

Proof. $f_{ji}(x_{jit})$, $j = 1, 2, \ldots, p$, is the truncated spline component. The regression curve component $f$ is a linear truncated spline function, Equation (38), built from truncated functions. The case in which $f$ involves one predictor is written first and, by using Equation (41), expressed in matrix form as $f = M\gamma$, where $M$ is an $nT \times (1 + s)np$ matrix and $\gamma$ is a $(1 + s)np \times 1$ vector. The mixed model of nonparametric regression for longitudinal data in Equation (1) can be written in matrix form as $y = f + g + \varepsilon$. By substituting the Fourier series estimator $\hat{g} = L(y - f)$ from Equation (36), we get
$$(I - L)y = (I - L)f + \varepsilon.$$
Writing the truncated spline component as $f = M\gamma$, we obtain
$$(I - L)y = (I - L)M\gamma + \varepsilon.$$
If the function $f_{ji}(x_{jit})$ is approximated by the truncated spline function as in Equation (38), then $\varepsilon = (I - L)y - (I - L)M\gamma$. Therefore, the WLS is the minimization of $\varepsilon' W \varepsilon$ over $\gamma$, where $W$ is a weighting matrix for the regression of longitudinal data. Next, the truncated spline estimator in the nonparametric regression for longitudinal data is derived in Theorem 6.
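To make the truncated spline component concrete, the sketch below builds the basis columns for one subject and one predictor. It assumes a linear truncated spline of the form $\beta x + \sum_{u=1}^{s} \delta_u (x - K_u)_+$ with $(x - K)_+ = \max(x - K, 0)$, which has $1 + s$ coefficients and so agrees with the dimension of $\gamma$. The symbols $\beta$, $\delta_u$, $K_u$ and the function name `truncated_linear_basis` are assumptions made for this illustration.

```python
import numpy as np

def truncated_linear_basis(x, knots):
    """Columns [x, (x - K_1)_+, ..., (x - K_s)_+] for one subject and predictor.

    No intercept column is included, matching the assumed 1 + s coefficients.
    """
    x = np.asarray(x, dtype=float)
    cols = [x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

# Hypothetical example with a single knot at 0.5.
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
M_i = truncated_linear_basis(x, knots=[0.5])
print(M_i)
```

The second column is zero up to the knot and grows linearly after it, which is what lets the fitted curve change slope at the knot.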

Theorem 6. If paired data that follow the nonparametric regression model for longitudinal data are given, then the mixed estimator that minimizes the WLS in Lemma 5 is obtained from the estimator $\hat{\gamma}$ derived below.

Proof. The second estimation stage of the mixed estimator of the truncated spline and Fourier series in the nonparametric regression model for longitudinal data is performed using the WLS method. The estimator is obtained by completing the WLS optimization: the partial derivative of $Q(\gamma)$ with respect to $\gamma$ is set equal to zero. The result is
$$\hat{\gamma} = [M'J]^{-1} M'K y,$$
where $J = (2I - L')WLM - WM$ and $K = (L' - I)W(I - L)$.
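The second stage can be sketched numerically as follows. The code minimizes the WLS criterion of Lemma 5 directly, using $A = (I - L)'W(I - L)$ and $\hat{\gamma} = (M'AM)^{-1} M'A y$, which coincides with the expression in $J$ and $K$ when $WL$ is symmetric. All inputs are hypothetical: random design matrices, an identity weighting matrix, and a zero penalty so that noise-free data are recovered exactly.

```python
import numpy as np

def smoother_L(Z, W, D):
    """L = Z [Z'WZ + N D]^{-1} Z'W from the first (PWLS) stage."""
    N = Z.shape[0]
    ZtW = Z.T @ W
    return Z @ np.linalg.solve(ZtW @ Z + N * D, ZtW)

def wls_gamma(M, L, W, y):
    """Minimize [(I-L)(y - M gamma)]' W [(I-L)(y - M gamma)] over gamma."""
    R = np.eye(len(y)) - L
    A = R.T @ W @ R
    return np.linalg.solve(M.T @ A @ M, M.T @ A @ y)

# Hypothetical noise-free example.
rng = np.random.default_rng(1)
N = 26
M = rng.normal(size=(N, 2))        # stand-in for the truncated spline design
Z = rng.normal(size=(N, 4))        # stand-in for the Fourier design
W = np.eye(N)                      # assumed weighting matrix
D = np.zeros((4, 4))               # zero penalty, for exact recovery only
gamma_true = np.array([1.5, -2.0])
c_true = np.array([0.3, 1.0, -0.7, 0.2])
y = M @ gamma_true + Z @ c_true

L = smoother_L(Z, W, D)
gamma_hat = wls_gamma(M, L, W, y)
f_hat = M @ gamma_hat              # truncated spline estimate
g_hat = L @ (y - f_hat)            # Fourier series estimate
```

With a nonzero penalty the Fourier fit is shrunk, so exact recovery should not be expected outside this toy setting.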
Substituting $\hat{\gamma}$ into the truncated spline component in Equation (45) gives the truncated spline estimator $\hat{f} = M\hat{\gamma}$. So, the Fourier series estimator is $\hat{g} = L(y - \hat{f})$, and the mixed estimator is the sum $\hat{f} + \hat{g}$.

The mixed estimator depends on the knot points, the oscillation parameter, and the smoothing parameter. To obtain the best model, it is essential to select the optimum values of these parameters. One criterion for doing so is the generalized cross-validation (GCV) method [11]. The GCV function of the nonparametric regression model for longitudinal data is evaluated for each candidate combination, and the optimum knot point, oscillation parameter, and smoothing parameter are obtained by minimizing it, as presented in Equation (60).
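As a sketch of how such a criterion is evaluated, the code below computes the commonly used GCV form $\mathrm{GCV} = N^{-1}\lVert (I - A)y \rVert^2 / [N^{-1}\operatorname{tr}(I - A)]^2$ for a linear smoother $\hat{y} = Ay$. This generic, unweighted form is an assumption and may differ from the GCV function used in this paper. The function name `gcv` and the toy hat matrix are hypothetical.

```python
import numpy as np

def gcv(hat, y):
    """Generalized cross-validation score for a linear smoother y_hat = hat @ y."""
    N = len(y)
    resid = y - hat @ y
    mse = resid @ resid / N
    denom = (np.trace(np.eye(N) - hat) / N) ** 2
    return mse / denom

# Toy example: the smoother that fits the sample mean.
y = np.array([1.0, 2.0, 3.0, 4.0])
hat = np.full((4, 4), 0.25)
score = gcv(hat, y)
print(score)
```

In a parameter search, the score would be computed for each candidate combination of knots, oscillation parameter, and smoothing parameter, and the combination with the smallest score kept.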

Simulation Study.
To demonstrate the performance of the proposed method, we used a single sample size of n = 10 subjects with t = 13 observations each. For the simulation study, we considered ten models, one for each subject. The models are generated from a formula that contains two different functions to represent the truncated spline and Fourier series patterns. A polynomial function is used to represent the truncated spline, while a trigonometric function is used to represent the Fourier series. Additionally, $x_{it}$ and $z_{it}$ are generated from the $U(0, 1)$ distribution, and the random errors $\varepsilon_{it}$ are generated from a multivariate normal distribution.
Using ten subjects and two predictors, the formula for generated data is stated as follows: The simulation study is applied based on these models, as shown in Table 1. Figure 1 illustrates the partial relationship between the response and each predictor variable. It can be seen that the relationship between predictor x and the response for each subject tends to change at certain subintervals, which is suitable for the truncated spline estimator. The relationship between z and the response for each subject has a repetitive pattern with a particular trend line, which is suitable for the Fourier series estimator.
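The data-generating setup described above can be sketched as follows. Only the design is taken from the text: n = 10 subjects, 13 observations each, predictors drawn from $U(0, 1)$, and multivariate normal errors. The functions `f_true` and `g_true`, the knot at 0.5, the error covariance, and the seed are hypothetical stand-ins, since the generating formula of the study is not reproduced in this section.

```python
import numpy as np

rng = np.random.default_rng(2020)   # arbitrary seed
n, T = 10, 13                       # subjects and observations per subject

x = rng.uniform(0.0, 1.0, size=(n, T))
z = rng.uniform(0.0, 1.0, size=(n, T))

def f_true(x, knot=0.5):
    """Hypothetical piecewise-linear curve that changes slope at the knot."""
    return 2.0 * x - 3.0 * np.maximum(x - knot, 0.0)

def g_true(z):
    """Hypothetical repeating pattern with a linear trend."""
    return 0.5 * z + np.cos(2.0 * np.pi * z)

# Assumed error covariance within a subject: independent, variance 0.05.
cov = 0.05 * np.eye(T)
eps = rng.multivariate_normal(np.zeros(T), cov, size=n)

y = f_true(x) + g_true(z) + eps     # one row per subject
print(y.shape)
```

A non-diagonal `cov` would introduce correlation between the repeated measurements of a subject, which is the usual motivation for a weighting matrix in longitudinal models.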
Wu and Zhang [7] stated that a regression's performance strongly depends on good knot locations and a good choice of the number of knots. In general, the number of knots is smaller than the sample size n. Considering the scatterplot of the simulated data and computational convenience, our study uses up to three knots (K = 1, 2, 3) and up to three oscillation parameters (H = 1, 2, 3). To choose the optimum parameters, we use the minimum GCV criterion. Table 2 provides a summary of the GCV for varying knots and oscillations. What is remarkable

Case Study.
After conducting the simulation, we applied the proposed model to a real case to confirm the simulation results. The data were obtained from a study of patients with pulmonary tuberculosis conducted by Fernandes and Solimun [16]. Pulmonary tuberculosis is a contagious disease caused by Mycobacterium tuberculosis, which can attack various organs, particularly the lungs. This disease is typical among women in their productive years (ages 15-50 years). The World Health Organization (WHO) declared tuberculosis a global emergency in 1992 [17]. A WHO report in 2013 stated that there were 8.6 million tuberculosis cases in 2012, of which 40% were in Southeast Asia. In a further report, Indonesia was noted as the country with the second-largest number of cases, 2.8 million, in 2015. This study's dataset consists of four patients (n = 4), one for each stage of the radiological image of the thorax: minimal lesion, moderately advanced, far advanced, and KP Miller. The response (y) is the suPAR level, in mL units, observed every two weeks over six months of treatment (t = 13). The predictor variables are the erythrocyte sedimentation rate (x), in mm/hour, and the body mass index (z), in kg/m².
The partial relationship between the suPAR level and each predictor variable for each subject is presented in Figure 2. The data patterns of the four subjects changed over the six months of observation, with measurements taken every two weeks. The plot in Figure 2 shows a different pattern for each predictor. For this reason, we propose a nonparametric regression approach based on a mixed estimator for longitudinal data. The erythrocyte sedimentation rate (x) is approximated by a truncated spline estimator, while the body mass index (z) is approximated by a Fourier series estimator.
In this case study, as in the simulation study, up to three knot points and three oscillation parameters were used. From the various knot and oscillation parameter combinations, we obtained the GCV values listed in Table 3. Interestingly, the minimum GCV in this table is achieved with one oscillation parameter and one knot point, the same conditions as in the simulation study. However, it has a different smoothing parameter, λ = 1.25. This model provides a GCV value of 0.239 with an RMSE of 0.197. The knot point locations and the parameter estimates for each subject are presented in Tables 4 and 5, respectively. Based on the optimal knot points in Table 4 and the parameter estimates for each subject in Table 5, the nonparametric regression model based on a mixed estimator for longitudinal data can be written as follows: (1) Model estimation for subject minimal lesion:

Conclusions
Based on the simulation study and the case study, we selected the best model by using the minimum GCV. A larger number of knot points or oscillation parameters does not necessarily produce a higher GCV, and vice versa. Therefore, we tried several combinations of knot points and oscillation parameters to choose the best model. The result of the case study of patients with pulmonary tuberculosis is similar to that of the simulation study: in both, the best model uses one knot point and one oscillation parameter, with different values of λ. It can be concluded that the simulation study supports the results of the case study. A limitation of this study is that it does not investigate other sample sizes. Consequently, we cannot compare the performance of the developed theory across sample sizes. Despite this limitation, the study adds to our understanding of the mixed estimator in nonparametric regression for longitudinal data.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.