Improvement of the Nonparametric Estimation of Functional Stationary Time Series Using Yeo-Johnson Transformation with Application to Temperature Curves

In this article, Box-Cox and Yeo-Johnson transformation models are applied to two time series datasets of monthly temperature averages to improve forecasting ability. An application algorithm is proposed that transforms the positive original responses using the first model and the stationary responses using the second model in order to improve the nonparametric estimation of the functional time series. The Box-Cox model improved the nonparametric estimation for the original data, but the results became somewhat confusing after attempting to make the transformed response variable stationary in the mean, whereas the functional time series predictions were more accurate when the stationary datasets were transformed with the Yeo-Johnson model.


Introduction
Forecasting the future is the main function of time series analysis. Proceeding from this idea, researchers have developed several techniques that aim to improve forecast accuracy by treating the time series as a stochastic process. A functional data analytic approach, the so-called stochastic forecast [1], allows the observations to be handled as a function [2], free of the conditions of parametric and fully nonparametric modeling. Handling the observations in this way makes the time series sequential, so that it can be separated into successive time periods [3]. Thus, the dimension of the time series is reduced with a limited loss of information [4], and the data are represented as a linear combination of a few carefully selected functions instead of being treated in their original form as a single vector of values [2]; that is, the structure of the time series data is processed and transformed in line with the structure of regression models. Shang (2019) showed that, when the observations in a dataset are time dependent, the principal component method may lead to erroneous estimates. The two authors therefore believe that this problem may be exacerbated in some time series data, especially those characterized by seasonal changes. Moreover, it has become known in practical applications that time series are rarely stationary and that seasonal changes, trend, and dependence on external factors are the rule, not the exception [5]. For this reason, data transformation has become part of the traditional parametric and nonparametric analysis of complex time series.
In this article, the two authors use the Yeo-Johnson transformation to improve the nonparametric estimation of the functional time series. Using both approaches, transformation and functional analysis, without considering the modeling conditions is an attempt to focus the goal of the analysis and the efficiency criterion on forecasting ability.
The rest of the article is organized as follows. The Box-Cox and Yeo-Johnson transformations are presented in the next section. The third section contains the formulation of the problem and the proposed application methodology. The practical examples are included in the fourth section, while the fifth section contains some conclusions.

Box-Cox and Yeo-Johnson Transformations
Box and Cox [6] suggested the Box-Cox transformation (BCT) methodology in regression models to reduce anomalies in the data, reduce nonlinearity, and achieve normality of the random errors. The methodology assumes, for any response variable $Z > 0$ and $\lambda \in \mathbb{R}$, the transformed variable $\Psi(Z) = (Z^{\lambda} - 1)/\lambda$ when $\lambda \neq 0$ and $\Psi(Z) = \ln Z$ when $\lambda = 0$. When $\lambda = 1$, the data are analyzed in their original scale, whereas the case $\lambda = 0$ corresponds to the natural logarithmic transformation of the data. BCT is based on the assumption that the transformed response is normal; the probability density function of the original response is then defined by a "backward transformation" using the change-of-variables technique.
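The two cases of the BCT and its inverse can be written down directly. The following is a minimal Python sketch for illustration only (the analysis in this article was carried out in R); the function names `box_cox` and `box_cox_inverse` are hypothetical and do not come from the original work.

```python
import math

def box_cox(z, lam):
    """Box-Cox transform of a strictly positive value z with power parameter lam."""
    if z <= 0:
        raise ValueError("the Box-Cox transformation requires z > 0")
    if lam == 0:
        return math.log(z)            # limiting case: natural logarithm
    return (z ** lam - 1.0) / lam     # general power case

def box_cox_inverse(y, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(y)
    return (lam * y + 1.0) ** (1.0 / lam)
```

With $\lambda = 1$ the function returns $Z - 1$, that is, the original scale up to a shift of one unit, which is why this case is treated as the untransformed analysis.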
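Yeo and Johnson [7] generalized the BCT to include both negative and positive values in datasets. They used a smoothness condition to combine the transformations for positive and negative observations, thus obtaining a one-parameter transformation family [8]. For $Z \in \mathbb{R}$, the Yeo-Johnson transformation (YJT) is given by
$$\Psi(Z) = \begin{cases} ((Z + 1)^{\lambda} - 1)/\lambda, & \lambda \neq 0 \text{ and } Z \geq 0, \\ \ln(Z + 1), & \lambda = 0 \text{ and } Z \geq 0, \\ -((-Z + 1)^{2 - \lambda} - 1)/(2 - \lambda), & \lambda \neq 2 \text{ and } Z < 0, \\ -\ln(-Z + 1), & \lambda = 2 \text{ and } Z < 0. \end{cases} \qquad (1)$$
This transformation is appropriate for correcting both left and right skew when $\lambda > 1$ and $\lambda < 1$, respectively, while the linear relationship is achieved when $\lambda = 1$ [9]. Also, the YJT can hold the properties of the log-mean standardization after the inverse transformation since $\Psi(Z)$ is invertible [10].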
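Because the YJT is invertible, forecasts made on the transformed scale can be mapped back to the original scale. The Python sketch below illustrates the four branches of the transformation and the corresponding inverse; the function names are hypothetical, and the inverse is derived here by solving each branch for $Z$, not taken from the article.

```python
import math

def yeo_johnson(z, lam):
    """Yeo-Johnson transform of a real value z with power parameter lam."""
    if z >= 0:
        if lam == 0:
            return math.log(z + 1.0)
        return ((z + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log(-z + 1.0)
    return -((-z + 1.0) ** (2.0 - lam) - 1.0) / (2.0 - lam)

def yeo_johnson_inverse(y, lam):
    """Inverse transform; the sign of y selects the branch, as the sign of z does above."""
    if y >= 0:
        if lam == 0:
            return math.exp(y) - 1.0
        return (lam * y + 1.0) ** (1.0 / lam) - 1.0
    if lam == 2:
        return 1.0 - math.exp(-y)
    return 1.0 - (1.0 - (2.0 - lam) * y) ** (1.0 / (2.0 - lam))
```

The transformed value always has the same sign as the original value, which is what allows the inverse to choose its branch from the sign of its argument.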
In 1970, Box and Jenkins recommended for the first time the use of power transformation in ARIMA models [11]. Since then, many authors have taken up this topic and made numerous proposals on mathematical and applied aspects of time series. Some of them pointed to failures in practical cases, for example, the limited success of the normality assumption for the transformed data, while noting that transformation could lead to noticeable improvements in the simplicity of the data models and the accuracy of the estimates [12], especially in models with skewed variables [10]. Cook and Olive [13] and Atkinson [8] also point out that the estimation of the transformation parameter can be particularly sensitive to outliers. In some practical cases of time series, the BCT may not lead to an improvement in forecasting performance [11], or, as Chen and Lee [14] say, it does not consistently produce superior forecasts.
Problems in practical applications occur for two reasons. The first is the difficulty of obtaining an optimum value of the transformation parameter such that, at the same time, the transformed data fit the assumed distribution and the model errors are minimal. The second is that the transformations change the nature of the relationships between the variables of the model, which may lead to a lack of balance between the efficiency of statistical inference and the ability to interpret the sizes of the variables' influence [15].

Formulation of the Problem
Let us consider a univariate time series $\{Z_t, t \in \mathbb{R}\}$, and redivide the time series sample into $(p - 1)$ statistical samples of size $n = N - s - p + 1$. This division allows the time series to be redefined as functional data $\{(X_i, Y_i)\}_{i=1,\dots,n}$ such that the variation trends between the time periods of the series are diagnosed through functional analysis tools [1]. Thus, the relationship can be described as a standard regression model
$$Y_i = m(X_i) + \varepsilon_i, \qquad (2)$$
where $m(X)$ is the smooth functional data and $\varepsilon$ is a sequence of independent, identically distributed functional white noise such that $E(\varepsilon \mid X) = 0$. $X_1, X_2, \dots, X_n$ are identically distributed as the functional random variable $X$. In order to characterize the relationship between the response $Y$ and the functional variable $X$, assume that $N = n\tau$ for some $n \in \mathbb{N}$ and some $\tau > 0$. We then get a statistical sample of curves $X_i = \{Z(t), (i - 1)\tau < t \leq i\tau\}$ of size $(n - 1)$ and the responses $Y_i = Z(i\tau + s)$, $i = 1, \dots, n - 1$ [16,17]. The usual nonparametric estimation of the functional relation has several advantages: it can be very well adapted to local features of time series data [18] and is robust to functional form misspecification [19]. The kernel regression estimator of $m(X)$ is evaluated at a given function $X$ by
$$\hat{m}(X) = \frac{\sum_{i=1}^{n-1} Y_i \, K\left(h^{-1} d(X, X_i)\right)}{\sum_{i=1}^{n-1} K\left(h^{-1} d(X, X_i)\right)}, \qquad (3)$$
where $K$ is a kernel function, $h$ (depending on $n$) is a positive real bandwidth, and $d(X, X_i)$ denotes any semimetric (index of proximity) between the observed curves. The authors suggest several ways to obtain the estimate in equation (3), including the kernel regression estimator, functional conditional quantiles, and the conditional mode. A number of useful exploratory methods can be used to measure the closeness (proximity) between the curves of the functional variables in a reduced dimensional space. Ferraty and Vieu [16] refer to at least three families of semimetrics to measure $d(X, X_i)$, for example, the functional principal component analysis (FPCA), in which the proximity is measured by the square root of the quantity $\int (X_i(t) - X_j(t))^2 \, dt$. There is another measure based on the second derivative, in which the proximity is measured by the square root of the quantity $\int (X_i''(t) - X_j''(t))^2 \, dt$ (see … (1986), Ferraty and Vieu [16], and Febrero-Bande and Oviedo de la Fuente (2012)).
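The construction of the curves and the kernel regression estimator described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' R implementation: the Gaussian kernel, the discretized $L^2$ semimetric (a plain sum of squared differences standing in for the integral), and all function names are choices made here for the example.

```python
import math

def make_curves(z, tau, s):
    """Cut the series Z(1..N) into curves X_i and responses Y_i = Z(i*tau + s).

    z is a 0-based list, so Z(t) is z[t - 1]; i runs over 1, ..., n - 1.
    """
    n = len(z) // tau
    curves = [z[(i - 1) * tau: i * tau] for i in range(1, n)]
    responses = [z[i * tau + s - 1] for i in range(1, n)]
    return curves, responses

def l2_semimetric(x1, x2):
    """Square root of the summed squared differences between two curves."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def kernel_estimate(x_new, curves, responses, h):
    """Kernel-weighted average of the responses (Gaussian kernel assumed)."""
    weights = [math.exp(-0.5 * (l2_semimetric(x_new, x) / h) ** 2) for x in curves]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights vanished; increase the bandwidth h")
    return sum(w * y for w, y in zip(weights, responses)) / total
```

Curves that are close to the new curve in the chosen semimetric receive large weights, so their responses dominate the estimate; the bandwidth $h$ controls how quickly the weights decay with distance.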
Regarding the kernel estimator (3), Wand et al. [20] indicated that it does not work well when the data are asymmetric; likewise, standard PCA may not be a suitable technique when the data distribution is skewed or contains outliers [21]. Therefore, power transformation is considered one of the important alternatives for improving the efficiency of nonparametric estimation of functional data (for more details, see [12,22,23]).
Most transformation approaches share a common analytical path: they choose a power transformation model and propose an algorithm for estimating the power parameters in parallel with the mechanisms for estimating the traditional parameters of the model. There are two common directions for estimating the power parameters. The first is the parametric direction, in which the power parameters are estimated under statistical modeling assumptions. The most important methodology in this direction is the Box-Cox transformation (BCT), which improves the efficiency of the multiple linear regression model under the normality assumption for the transformed response [6]. Wand et al. [20] used the same Box-Cox methodology to improve the efficiency of density estimation under the assumption of certain distributions for the transformed variable (see also [24,25,26]).
The second direction is the nonparametric estimation of the power parameters without any assumptions about the response and error distributions, or what might be called the model-independent approach [27] (see also [28,29]). In this direction, the power parameters can be estimated according to decision rules such as minimizing or maximizing some indicator of model efficiency.

Application Methodology
It is known that power transformations are important for making a time series stationary in the variance, while differencing is useful for making it stationary in the mean. Generally, neither approach can substitute for the other, although power transformations can sometimes make the time series stationary. Because the BCT applies only to positive responses, it has to be applied to the original data as a first stage, and the differences are then calculated to achieve stationarity of the time series. As a result, the variance stabilization obtained from the power transformation is affected by the differencing process. In this regard, Dittmann and Granger [30] indicate that, for every nonstationary process, the polynomial transformations are also nonstationary and have a stochastic trend in mean and in variance. To overcome these problems, the authors believe that the YJT is appropriate for improving forecastability, because it can be used to stabilize the variance of a stationary time series. Also, estimating the power parameter according to a decision rule of the kind referred to above is appropriate as long as the issue is related to nonparametric functional analysis.
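The ordering constraint discussed above can be seen in a small Python sketch: differencing a positive series generally produces negative values, so a transformation restricted to positive responses can no longer be applied afterwards. The temperature values below are made up for illustration, and the helper names are hypothetical.

```python
def difference(z, lag=1):
    """Differences z[t] - z[t - lag]; lag equal to the period removes a seasonal pattern."""
    return [z[t] - z[t - lag] for t in range(lag, len(z))]

def box_cox_applicable(z):
    """The Box-Cox transformation needs strictly positive values."""
    return all(v > 0 for v in z)

# Made-up series with a period of four observations and a slow upward drift.
temps = [10.0, 14.0, 20.0, 15.0, 11.0, 15.0, 21.0, 16.0]

seasonal_diff = difference(temps, lag=4)   # seasonal differences
first_diff = difference(temps, lag=1)      # first-order differences

print(box_cox_applicable(temps))       # positive original data
print(box_cox_applicable(first_diff))  # differenced data contain negative values
```

A transformation defined on the whole real line, such as the YJT, does not have this restriction and can therefore be applied after the series has been differenced.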
The application methodology consists of estimating the smooth functional data $m(X)$ in the regression equation (2) by the kernel estimator in equation (3) after transforming the time series dataset. The BCT was applied to the original time series dataset, while the YJT was applied to the stationary time series dataset. So, the statistical sample of curves was redefined by the expression
$$\Psi_{\lambda}(X_i) = \{\Psi_{\lambda}(Z(t)), (i - 1)\tau < t \leq i\tau\} \qquad (4)$$
and the response by the expression
$$\Psi_{\lambda}(Y_i) = \Psi_{\lambda}(Z(i\tau + s)), \qquad (5)$$
where $i = 1, \dots, n - 1$ and $\Psi_{\lambda}$ represents a data transformation function with power parameter $\lambda$.
For each transformation model, the decision rule adopted for selecting the optimal estimate of the power parameter $\lambda$ is to take the value that gives the lowest mean square of the forecasting errors for the last curve of the functional variable, according to the equation
$$\mathrm{MSE}(X_n) = \frac{1}{\tau} \sum_{j=1}^{\tau} (\hat{Z}_j - Z_j)^2,$$
where $\hat{Z}_j$ and $Z_j$ are the $j$-th estimated and real values in the last curve. The $\hat{Z}_j$ values are computed from the inverses of the BCT and YJT, or what we might call the retransformation from the transformed data metric to the original metric.
So, the application algorithm of the BCT and YJT models and the nonparametric estimation of the transformed functional time series were as follows:
(1) Fix $\tau$ to define expressions (4) and (5).
(2) Remove the seasonality patterns by taking differences to make the time series stationary.
(4) For each $\lambda \in \Lambda$, BCT is used to transform the original time series $Z(t)$ and YJT is used to transform the stationary time series of $k$ differences $\Delta^k Z(t)$ to get the two explanatory functional matrices $\Psi_{\lambda}(X) = [\Psi_{\lambda}(Z)]_{n \times \tau}$ and $\Psi_{\lambda}(X) = [\Psi_{\lambda}(\Delta^k Z)]_{n \times \tau}$ (for more details about organizing the matrix files in the R program, see [16,31]).
(5) Evaluate the explanatory function estimation of the relationship $\Psi_{\lambda}(Y_i) = m(\Psi_{\lambda}(X)) + \varepsilon$ according to the following kernel estimator:
$$\hat{m}(\Psi_{\lambda}(X)) = \frac{\sum_{i=1}^{n-1} \Psi_{\lambda}(Y_i) \, K\left(h^{-1} d(\Psi_{\lambda}(X), \Psi_{\lambda}(X_i))\right)}{\sum_{i=1}^{n-1} K\left(h^{-1} d(\Psi_{\lambda}(X), \Psi_{\lambda}(X_i))\right)}.$$
The optimal value $\lambda^*$ of the power parameter $\lambda$ is the one that minimizes the $\mathrm{MSE}(X_n)$ of the last functional variable.
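The selection of $\lambda^*$ by minimizing the forecasting error of the last curve can be sketched in Python as follows. This is a simplified illustration, not the authors' R program: it assumes the input series is already stationary and measures the error on that scale, it uses a Gaussian kernel with a discretized $L^2$ semimetric, and it sets the bandwidth to the median distance between the predictor curve and the training curves. All of these choices, and all function names, are assumptions made for the example.

```python
import math

def yeo_johnson(z, lam):
    """Yeo-Johnson transform of a real value z."""
    if z >= 0:
        return math.log(z + 1.0) if lam == 0 else ((z + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log(1.0 - z)
    return -((1.0 - z) ** (2.0 - lam) - 1.0) / (2.0 - lam)

def yeo_johnson_inverse(y, lam):
    """Retransformation from the transformed metric to the original metric."""
    if y >= 0:
        return math.exp(y) - 1.0 if lam == 0 else (lam * y + 1.0) ** (1.0 / lam) - 1.0
    if lam == 2:
        return 1.0 - math.exp(-y)
    return 1.0 - (1.0 - (2.0 - lam) * y) ** (1.0 / (2.0 - lam))

def distance(x1, x2):
    """Discretized L2 semimetric between two curves."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def last_curve_mse(z, tau, lam):
    """Mean squared forecasting error of the last curve for one value of lam.

    Needs at least three complete curves: training pairs are (curve i, curve i + 1),
    and the next-to-last curve is the predictor of the last one.
    """
    w = [yeo_johnson(v, lam) for v in z]
    n = len(w) // tau
    curves = [w[i * tau:(i + 1) * tau] for i in range(n)]
    train_x, x_last = curves[:n - 2], curves[n - 2]
    dists = [distance(x_last, x) for x in train_x]
    h = sorted(dists)[len(dists) // 2] or 1.0   # median distance; 1.0 if it is zero
    weights = [math.exp(-0.5 * (d / h) ** 2) for d in dists]
    total = sum(weights)
    sq_errors = []
    for s in range(tau):                         # one forecast per point of the last curve
        fitted = sum(wt * curves[i + 1][s] for i, wt in enumerate(weights)) / total
        forecast = yeo_johnson_inverse(fitted, lam)
        sq_errors.append((forecast - z[(n - 1) * tau + s]) ** 2)
    return sum(sq_errors) / tau

def select_lambda(z, tau, grid):
    """Return the grid value of lam with the smallest last-curve MSE."""
    return min(grid, key=lambda lam: last_curve_mse(z, tau, lam))
```

A call such as `select_lambda(series, tau=12, grid=[0.0, 0.5, 1.0, 1.5, 2.0])` would return the grid value with the smallest error for a monthly series; the grid itself is a hypothetical choice, since the set $\Lambda$ is not specified here.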

Applications
The two transformation models, BCT and YJT, were applied to two time series examples of monthly temperature averages, and an R program was used to analyze the data. The first time series (TSN) has 200 observations for Nineveh City in Iraq, covering the eight rainy months of every year. We take the monthly averages of the meteorological station of Nineveh for the period 1976 to 2000 (Figure 1(a)). The second (TST) has 300 observations for Tunisia, covering all months of the period 1991 to 2015. The data can be found at https://climateknowledgeportal.worldbank.org (Figure 1(b)). It was found that the two time series are not stationary, as is clearly demonstrated by the values of the autocorrelation functions (ACF) lying outside the confidence limits in Figure 2.
By applying the BCT model to the two time series according to the five-step algorithm suggested in Section 4, we obtained the results shown in Table 1.
As expected, the results in Table 1 show that the estimate of the MSE decreased when using BCT compared with its value from the analysis of the original data ($\lambda = 1$).
As for the attempt to make the original and transformed time series stationary by first-order differences, the MSE estimate increased in the two transformed series and decreased in the original series. These confusing results were overcome when the YJT model was used according to the same five-step algorithm: more accurate predictions with smaller errors were obtained compared with the error estimates obtained by using the BCT model (Table 2). Figure 3 shows the plots of the original and predicted values for the latest curve (25th year) after smoothing the data using the YJT model for the two time series.

Conclusions
It is important to note that the optimum power parameters λ * for both transformation models are significantly different even though YJT represents the extended version of the BCT model. The authors believe that this difference and the amount of displacement in the original data generated by both models were due to the use of a nonparametric estimation method to choose the optimal power parameter as an alternative to the parametric method for the hypothesis of normality of transformed response, in addition to the differences in the level of homogeneity between stationary and nonstationary time series datasets.
The application methodology in this article demonstrates that YJT could be a successful alternative to BCT for improving the nonparametric estimation of the functional time series. Also, the nonparametric estimation of the power parameters, which is not restricted by the conditions of a probability distribution, provides the researcher with wide options for ensuring the accuracy of the prediction.

Data Availability
The datasets supporting the conclusions of this article are included in the article.