Likelihood Inference of Nonlinear Models Based on a Class of Flexible Skewed Distributions



Introduction
In statistical modeling, the random error is commonly assumed to be normally distributed. This assumption may lack robustness against departures from normality and/or outliers and may lead to misleading inferential results [1,2]. In recent years, there has been increasing interest in developing more flexible parametric families that can fit real data as closely as possible when the data exhibit substantial nonnormal characteristics such as skewness and heavy tails. In a variety of applications, one popular option is to modify a symmetric probability density function of a variable, thereby introducing skewness. An important advantage of this approach over other approaches to robustness is its explicit statement of the probabilistic setting, which leads to a clear interpretation of the results [3]. Following this idea, the skew-normal (SN) distribution was first introduced by [4], the skew-t (ST) distribution by [5], and the skew-t-normal (STN) distribution by [6]; extensions to the multivariate case were studied by [7,8], among others. Since then, several authors have tried to extend these results to more general forms of skew-symmetric distributions. Among these we mention [9], which proposed a general framework of distributions called the flexible skew-symmetric (FSS) distribution. As pointed out by [10], this family is flexible enough that, under different submodel settings, the FSS distribution includes several known distributions, such as the SN and ST, as special cases.
However, in many practical applications it is not rare to encounter multimodality, sometimes with an irregular shape, and in this case all the distributions mentioned above are insufficient to describe the multimodal feature of the data. One solution to this problem is to use finite mixture models. In [11], the authors worked with a mixture model whose component densities belong to the STN distribution and developed a computationally feasible EM-type algorithm for calculating the maximum likelihood (ML) estimates of the parameters. Unfortunately, although that methodology is useful for analyzing multimodal asymmetric data, it suffers from the problem of "model identification" because the number of parameters to be estimated is usually large. Therefore, in this paper, we deal with a new extension of the class of FSS distributions, referred to as the flexible skew-t-normal (FSTN) distribution. This new distribution is proposed within the general framework of the FSS distributions in combination with the definition of the STN distribution. In practical applications, it can regulate the density in a more flexible way to offer robustness, and it is an appealing option for jointly accommodating skewness, heavy tails, and multimodality.
On the other hand, nonlinear regression models are widely applied in economics, engineering, biomedical research, and other fields, where a nonlinear function of unknown parameters is used to explain or investigate the nonlinear relationship of the random phenomena under study. More recently, several authors have used a class of skewed distributions in the context of nonlinear regression models, and some valuable results have been obtained. For example, [12] developed robust estimation and local influence analysis for regression models with the SMSN distribution. From the Bayesian point of view, [13] considered Bayesian estimation and case influence diagnostics for nonlinear regression models with SMSN distributions. More related literature can be found in [14-17]. Generally speaking, for fitting nonlinear regression models with skewed distributions, a popular approach is to consider the hierarchical representation of a variable with a specific distribution, in which the postulated distribution is expressed as several conditional distributions of simpler forms such as the normal, Student's t, and Gamma distributions. Based on that representation, the EM algorithm or a Bayesian hierarchical approach can then be implemented effectively for model estimation and statistical inference.
In this paper, our aim is to develop an approach to likelihood inference for nonlinear regression models under the FSTN assumption. As there is no stochastic representation for the FSTN distribution, the methods cited above are unavailable for our problem, and an alternative is to return to the original Newton-Raphson iterative procedure for model estimation. Under the nonlinear regression paradigm, the accuracy of the estimates is affected by the strength of nonlinearity, and the corresponding confidence intervals and hypothesis tests require the assumption of normality of the estimators or of the distribution, which is too restrictive. Besides, in many practical applications we are interested in a proper subset of the parameters rather than in all of them. Taking all these factors into account, in this paper we focus on the parameters of interest and propose a modified Newton-Raphson iterative algorithm for calculating the ML estimates based on the profile likelihood. Furthermore, confidence intervals and hypothesis tests for the parameters of interest are also considered. We conduct an application and a simulation study to compare algorithm effectiveness and distribution robustness for the nonlinear regression model in terms of fitting performance and model selection. The results from the numerical examples illustrate the usefulness and the superiority of our methodology.
The remainder of this paper is organized as follows. In Section 2, we briefly discuss the FSS and FSTN distributions. In Section 3, we present the likelihood inference, including the first- and second-order derivatives as well as the standard Newton-Raphson iterative formula. In Section 4, we introduce profile inference for our proposed model, where confidence estimation and hypothesis testing are also presented. Section 5 gives numerical examples using both simulated and real data to illustrate the performance of the proposed methodology. Finally, some concluding remarks are given in Section 6.

Models and Notation
Skewed distributions such as the SN, ST, and STN are plausible choices for modeling skewness and/or heavy tails underlying the observations. In practice, however, it is not rare to encounter multimodality, sometimes with an even more irregular shape, and in this case the aforementioned distributions become insufficient. In this paper, adopting a sufficiently flexible class of distributions, we consider one of these extensions, referred to as the family of flexible skew-symmetric (FSS) distributions, which was introduced by [9] with a density function of the following type:

$$f(y) = 2\sigma^{-1} f_0(z)\, G\{P_K(z)\}, \qquad (1)$$

where $f_0$ and $G$ are a symmetric univariate density and distribution function, respectively, that is, $f_0(z) = f_0(-z)$ and $G(-z) = 1 - G(z)$; $P_K(z) = \alpha_1 z + \alpha_3 z^3 + \cdots + \alpha_{2K-1} z^{2K-1}$ is an odd polynomial of degree $2K-1$ (i.e., a polynomial including only terms of odd degree); $z = \sigma^{-1}(y - \mu)$; $\mu \in R$ is the location parameter; $\sigma > 0$ is the scale parameter; and $\alpha = (\alpha_1, \alpha_3, \ldots, \alpha_{2K-1})$ with $\alpha_1, \alpha_3, \ldots, \alpha_{2K-1} \in R$ are the shape parameters.
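As a minimal sketch of the density in (1), the following Python code evaluates an FSS density for user-supplied $f_0$ and $G$. The function names (`fss_pdf`, `odd_polynomial`, `trapezoid`) and the standard normal defaults are illustrative assumptions and do not come from [9]; the numerical integral is only a sanity check that the skewing factor leaves the total mass equal to one.

```python
import math
from typing import Callable, Sequence


def std_normal_pdf(z: float) -> float:
    """Standard normal density, used here as a default choice of f_0."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF via erfc, used here as a default choice of G."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def odd_polynomial(z: float, alpha: Sequence[float]) -> float:
    """P_K(z) = alpha[0]*z + alpha[1]*z**3 + ... (odd powers only)."""
    return sum(a * z ** (2 * k + 1) for k, a in enumerate(alpha))


def fss_pdf(
    y: float,
    mu: float,
    sigma: float,
    alpha: Sequence[float],
    f0: Callable[[float], float] = std_normal_pdf,
    G: Callable[[float], float] = std_normal_cdf,
) -> float:
    """Density 2/sigma * f0(z) * G(P_K(z)) with z = (y - mu)/sigma."""
    z = (y - mu) / sigma
    return 2.0 / sigma * f0(z) * G(odd_polynomial(z, alpha))


def trapezoid(f: Callable[[float], float], a: float, b: float, n: int) -> float:
    """Composite trapezoidal rule with n equal subintervals on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


if __name__ == "__main__":
    # Hypothetical setting: K = 2 with alpha_1 = 1 and alpha_3 = -1.
    mass = trapezoid(lambda y: fss_pdf(y, 1.0, 1.5, (1.0, -1.0)), -20.0, 22.0, 20000)
    print(f"numerical total mass: {mass:.6f}")
```

With all shape parameters equal to zero the skewing factor is $G(0) = 1/2$, so the sketch returns the symmetric base density scaled by $\sigma^{-1}$.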
In general, the density function of the STN distribution can be represented as $2\sigma^{-1} t(z; \nu)\Phi(\lambda z)$, where $t$ and $\Phi$ denote the univariate standard Student's t density function and the univariate standard normal distribution function, respectively, and $\nu$ is the degrees of freedom. The skewness is regulated by the shape parameter $\lambda$, and the tail thickness of the distribution is controlled by $\nu$. Commonly, in comparison with the SN distribution, the STN distribution exhibits markedly heavy tails when $\nu \le 10$.
In this paper, we work with one version of (1), defined as follows. Let $f_0(z) = t(z; \nu)$ and $G(z) = \Phi(z)$, where $\Phi(z)$ is defined as before and $t(\cdot\,; \nu)$, given in (2), is the density function of the univariate Student's t distribution with location 0, scale $\sigma$, and $\nu$ degrees of freedom. This extension of (1) is referred to as the flexible skew-t-normal (FSTN) distribution, denoted by $\mathrm{FSTN}(\mu, \sigma, \alpha, \nu)$.
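For reference, the FSTN density implied by this construction can be written out as follows. This is a sketch that assumes $t(z;\nu)$ is the standard Student's t density evaluated at the standardized residual $z$, with the scale entering through the factor $\sigma^{-1}$; the exact displayed form of (2) is not reproduced in this text.

```latex
t(z;\nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}
                {\Gamma\!\left(\frac{\nu}{2}\right)\sqrt{\pi\nu}}
           \left(1 + \frac{z^{2}}{\nu}\right)^{-\frac{\nu+1}{2}},
\qquad
f(y;\mu,\sigma,\alpha,\nu) = \frac{2}{\sigma}\, t(z;\nu)\,
      \Phi\{P_K(z)\},
\qquad z = \frac{y-\mu}{\sigma}.
```

Under this form, setting $K = 1$ with $\alpha_1 = \lambda$ recovers the STN density $2\sigma^{-1} t(z;\nu)\Phi(\lambda z)$.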
Note that the FSTN distribution is proposed within the general framework of the FSS distribution combined with the definition of the STN distribution and, as a consequence, it shares analogous features with these two distributions. Nevertheless, the FSTN distribution presents some interesting and peculiar features and is able to regulate the density in a more flexible way. In particular, besides modeling skewness and tail thickness, the FSTN distribution allows for multimodality, depending on the specific setting of $\alpha$. For the purpose of comparison and illustration, we assume $f_0(z) = \phi(z)$ and $G(z) = \Phi(z)$ in the FSS distribution, which is denoted by FSN, and then we set $K = 2$ in $P_K(z)$, that is, $P_K(z) = \alpha_1 z + \alpha_3 z^3$ in STN distribution; for this case, different selections of $\alpha_1$ and $\alpha_3$ determine whether the density is unimodal or bimodal. Moreover, the same assumption for $K$ and $\alpha$ is made in the FSTN distribution.
Figure 1 displays the density functions of the FSN, STN, and FSTN distributions in four different situations, namely, $\alpha_1 = 1$, $\alpha_3 = 0$, and $\nu = 10$; $\alpha_1 = \alpha_3 = 1$ and $\nu = 6$; $\alpha_1 = 1$, $\alpha_3 = -1$, and $\nu = 4$; and $\alpha_1 = -1$, $\alpha_3 = 1$, and $\nu = 4$, with $\mu = 1$ and $\sigma = 1.5$ in all cases. From Figure 1, we can see how these three densities change with different combinations of $\alpha$ and $\nu$. For instance, in Figure 1(a), FSN, STN, and FSTN appear to be very close, while STN and FSTN are slightly heavier tailed. In Figure 1(b), both FSN and FSTN are unimodal when $\alpha_1$ and $\alpha_3$ have the same sign, and the ranking for the degree of skewness is STN, FSTN, and FSN in turn; that is, the STN and FSTN distributions have thicker tails than the FSN distribution. With opposite signs of $\alpha_1$ and $\alpha_3$, both the FSN and FSTN distributions are bimodal and highly skewed in Figures 1(c) and 1(d); moreover, in the same direction, FSTN has thicker tails than the FSN distribution. Our proposed FSTN distribution can be treated as a compromise between the FSS distribution and the STN distribution. It allows for a wider range of tail behavior than the FSS distribution, and it is able to accommodate multimodality, which cannot be described by the STN distribution. From the applied viewpoint, the FSTN distribution is an appealing option that can be expected to yield robust inferential results in the presence of outlying observations.
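The unimodal versus bimodal behavior described for Figure 1 can be explored numerically. The sketch below assumes the FSTN density $2\sigma^{-1} t(z;\nu)\Phi(\alpha_1 z + \alpha_3 z^3)$ with the standard Student's t density for $t$, and counts strict local maxima on a grid; `fstn_pdf`, `count_modes`, and the grid settings are hypothetical names and choices made for illustration only.

```python
import math
from typing import Sequence


def student_t_pdf(z: float, nu: float) -> float:
    """Standard Student's t density with nu degrees of freedom."""
    log_c = (
        math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(math.pi * nu)
    )
    return math.exp(log_c - 0.5 * (nu + 1.0) * math.log1p(z * z / nu))


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF; erfc keeps accuracy in the lower tail."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def fstn_pdf(y: float, mu: float, sigma: float,
             alpha: Sequence[float], nu: float) -> float:
    """Assumed FSTN density: 2/sigma * t(z; nu) * Phi(P_K(z))."""
    z = (y - mu) / sigma
    p = sum(a * z ** (2 * k + 1) for k, a in enumerate(alpha))
    return 2.0 / sigma * student_t_pdf(z, nu) * std_normal_cdf(p)


def count_modes(alpha: Sequence[float], nu: float, mu: float = 1.0,
                sigma: float = 1.5, lo: float = -15.0, hi: float = 15.0,
                n: int = 3001) -> int:
    """Count strict local maxima of the density on an equally spaced grid."""
    ys = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    d = [fstn_pdf(y, mu, sigma, alpha, nu) for y in ys]
    return sum(1 for i in range(1, n - 1) if d[i] > d[i - 1] and d[i] > d[i + 1])


if __name__ == "__main__":
    # The four (alpha_1, alpha_3, nu) settings listed for Figure 1.
    for alpha, nu in [((1.0, 0.0), 10.0), ((1.0, 1.0), 6.0),
                      ((1.0, -1.0), 4.0), ((-1.0, 1.0), 4.0)]:
        print(alpha, nu, count_modes(alpha, nu))
```

A grid search of this kind only detects modes that are wider than the grid spacing, so it is a rough diagnostic rather than a proof of modality.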

Likelihood Inference
Consider $n$ independent observations satisfying a nonlinear regression model $y_i \sim \mathrm{FSTN}(\mu_i, \sigma, \alpha, \nu)$ with $\mu_i = \eta(x_i, \beta)$ for $i = 1, 2, \ldots, n$. Here, $x_i$ is a $q$-dimensional vector and $\beta$ is a $p \times 1$ vector of parameters. Also, let $X = (x_1, \ldots, x_n)^{T}$ be the $n \times q$ design matrix; $\eta(\cdot, \cdot)$ is a known twice differentiable function. Then, the corresponding log-likelihood for the parameter $\theta = (\beta, \sigma, \alpha, \nu)$ is given by (3), and its first-order derivatives by (4), where $\mu_i = \eta(x_i, \beta)$ and $\psi(x) = d[\log \Gamma(x)]/dx$. The corresponding second-order derivatives of (3) are given by (5). Let $U(\theta) = (U_\beta^{T}(\theta), U_\sigma(\theta), U_\nu(\theta), U_\alpha^{T}(\theta))^{T}$ be the gradient or score vector and $H(\theta)$ the Hessian matrix composed of the above second-order derivatives. To obtain the ML estimate of $\theta$, the Newton-Raphson iteration algorithm is defined by (6).

Note that the above iterative procedure is an unpartitioned algorithm; that is, all the parameters, including the nonlinear regression coefficients $\beta$, the scale parameter $\sigma$, the shape parameter $\alpha$, and the tail thickness parameter $\nu$, are estimated simultaneously. For our problem, at least two difficulties may be encountered with (6). First, once the number of parameters to be estimated becomes large, the computational burden becomes heavy and the estimation error may be unacceptable. Second, when the strength of nonlinearity of the link function $\eta(\cdot, \cdot)$ changes, the iterative process may become unstable or even fail to converge, leading to poor estimation results. Moreover, in practical problems we are usually interested in a proper subset of the parameters rather than in the whole parameter set. To improve the efficiency of the algorithm and to facilitate statistical inference for nonlinear models with the FSTN distribution, we put forward the following profile likelihood method based on (3) and (6).
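To make the log-likelihood concrete, the following sketch sums the log of the assumed FSTN density $2\sigma^{-1} t(z_i;\nu)\Phi\{P_K(z_i)\}$ over the observations, with $z_i$ the standardized residual of observation $i$. The function `fstn_loglik`, the exponential mean function `exp_growth`, and the toy data are hypothetical and serve only to illustrate the structure of the computation; they are not the models or data used in the paper.

```python
import math
from typing import Callable, Sequence


def log_std_normal_cdf(x: float) -> float:
    """log Phi(x); returns -inf if Phi(x) underflows to zero."""
    p = 0.5 * math.erfc(-x / math.sqrt(2.0))
    return math.log(p) if p > 0.0 else float("-inf")


def fstn_loglik(
    beta: Sequence[float],
    sigma: float,
    alpha: Sequence[float],
    nu: float,
    xs: Sequence[float],
    ys: Sequence[float],
    mean_fn: Callable[[float, Sequence[float]], float],
) -> float:
    """Log-likelihood under the assumed density 2/sigma * t(z; nu) * Phi(P_K(z))."""
    # Terms that do not depend on the individual observation.
    const = (
        math.log(2.0)
        - math.log(sigma)
        + math.lgamma((nu + 1.0) / 2.0)
        - math.lgamma(nu / 2.0)
        - 0.5 * math.log(math.pi * nu)
    )
    total = len(ys) * const
    for x, y in zip(xs, ys):
        z = (y - mean_fn(x, beta)) / sigma          # standardized residual
        p = sum(a * z ** (2 * k + 1) for k, a in enumerate(alpha))
        total += -0.5 * (nu + 1.0) * math.log1p(z * z / nu)
        total += log_std_normal_cdf(p)
    return total


def exp_growth(x: float, beta: Sequence[float]) -> float:
    """Hypothetical nonlinear mean function: beta_0 * exp(beta_1 * x)."""
    return beta[0] * math.exp(beta[1] * x)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [2.1, 2.6, 3.7, 4.8]  # made-up values for illustration
    print(fstn_loglik((2.0, 0.3), 1.5, (1.0, -1.0), 4.0, xs, ys, exp_growth))
```

A function of this form could be passed to a generic optimizer, or differentiated to obtain the score vector and Hessian needed by a Newton-Raphson step.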
In what follows, we focus on the estimation of $\beta$ based on the profile likelihood method [18]. First, suppose $\beta$ is known and rewrite the original likelihood function (3) as (7), where the notation $\ell_\beta(\gamma)$ denotes that $\beta$ is fixed but $\gamma$ varies. For each $\beta$, the estimate of $\gamma$ is obtained from (8). Then, to estimate $\beta$, we evaluate the maximum value of $\ell_\beta(\gamma)$ over $\gamma$ and obtain (9), where $\ell(\beta, \hat{\gamma}_\beta)$ and $\hat{\beta}$ are referred to as the profile likelihood function and the profile ML estimate, respectively.
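In generic notation, the two-stage maximization described above can be summarized as follows. This is a sketch of the standard profile likelihood construction, assuming that $\gamma$ collects the parameters other than $\beta$ (for example, $\gamma = (\sigma, \alpha, \nu)$); the exact displayed forms of (7) to (9) may differ.

```latex
\ell_{\beta}(\gamma) = \ell(\beta, \gamma), \qquad
\hat{\gamma}_{\beta} = \arg\max_{\gamma}\, \ell_{\beta}(\gamma), \qquad
\hat{\beta} = \arg\max_{\beta}\, \ell(\beta, \hat{\gamma}_{\beta}).
```

The pair $(\hat{\beta}, \hat{\gamma}_{\hat{\beta}})$ obtained this way coincides with the joint maximizer of $\ell(\beta, \gamma)$ whenever that maximizer exists and is unique.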
Following [19], we define the profile Newton-Raphson iteration formula as in (10), where [...], and all the matrices and vectors on the right-hand side of (10) are evaluated at $\hat{\beta}^{(m)}$ and $\hat{\gamma}^{(m)}$. Note that, in both (6) and (10), the strength of nonlinearity of the link function $\eta(\cdot, \cdot)$ is reflected to a large extent by $\partial^2 \mu_i / \partial\beta\, \partial\beta^{T}$. Therefore, by examining the expression of $\partial^2 \ell(\theta) / \partial\beta\, \partial\beta^{T}$, we find that when the element of [...] The iteration formulas for the corresponding modified (starred) quantities, including $\hat{\beta}^{*}$, can then be obtained just as before. The above estimation procedure is referred to as the profile modified Newton-Raphson iteration algorithm.
The asymptotic covariance matrix of the ML estimates for the profile likelihood can be evaluated by inverting the expected information matrix; however, the latter does not have a closed-form expression. The observed information matrix $J(\beta) = -\partial^2 \ell(\beta, \hat{\gamma}_\beta) / \partial\beta\, \partial\beta^{T}$ can be used as a replacement; it is estimated by $J_{11} - J_{12}(J_{22})^{-1} J_{21}$, where $J_{11}$, $J_{12}$, and $J_{22}$ can be obtained in the same way as above.
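The expression $J_{11} - J_{12}(J_{22})^{-1}J_{21}$ is the Schur complement of the nuisance block in the full observed information matrix. As a sketch, assuming $J$ denotes the negative Hessian of $\ell(\theta)$ partitioned conformably with $(\beta, \gamma)$, the standard partitioned-inverse identity gives:

```latex
J = -\frac{\partial^{2}\ell(\theta)}{\partial\theta\,\partial\theta^{T}}
  = \begin{pmatrix} J_{11} & J_{12} \\ J_{21} & J_{22} \end{pmatrix},
\qquad
\left( J_{11} - J_{12} J_{22}^{-1} J_{21} \right)^{-1}
  = \left( J^{-1} \right)_{11}.
```

In other words, inverting the profile information for $\beta$ yields the same matrix as taking the $\beta$ block of the inverse of the full observed information, provided $J$ and $J_{22}$ are invertible.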
The choice of the initial values plays an important role in nonlinear regression fitting; in this paper, the specific steps [...] outperform the STN distribution by producing estimates with lower bias and higher AIC proportion; furthermore, the PNR and MPNR methods generally perform better than the traditional NR method.
To study the consistency of the ML estimate, we focus on the situation in which the true distribution of the random error is Case (II) and the fitting distribution is also FSTN. For this case, two estimation algorithms, PNR and MPNR, are adopted for parameter estimation, and samples of different sizes ($n = 50$, 100, and 200) are generated from Models (1) and (2). We compute the 95% confidence interval for the parameters of interest and the mean square error (MSE) for each model, where $\mathrm{MSE}(\hat{\beta}) = (1/500) \sum_{r=1}^{500} \| \hat{\beta}^{(r)} - \beta \|^2$. The lengths of the 95% confidence intervals and the MSE results are summarized in Table 2. From Table 2 we can see that, as expected, the length of the confidence interval and the MSE tend to decrease as the sample size increases.
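The MSE criterion above averages the squared Euclidean distance between each replicate estimate and the true parameter vector. A minimal sketch follows; the function name `mse` and the toy numbers are illustrative assumptions and are not results from the simulation study.

```python
from typing import Sequence


def mse(estimates: Sequence[Sequence[float]], truth: Sequence[float]) -> float:
    """Mean over replications of the squared distance ||beta_hat_r - beta||^2."""
    total = 0.0
    for est in estimates:
        total += sum((e - t) ** 2 for e, t in zip(est, truth))
    return total / len(estimates)


if __name__ == "__main__":
    # Two made-up replicate estimates of a two-dimensional parameter.
    replicates = [[1.0, 2.0], [3.0, 2.0]]
    print(mse(replicates, [2.0, 2.0]))
```

In the study described above the average would run over 500 replications for each sample size.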
Tables 1 and 2 show that, in general, the FSTN distribution is more robust and flexible than the other skewed alternatives in modeling data with skewness, heavy tails, and multimodality, and that the MPNR method improves the accuracy of model estimation in the context of nonlinear regression with this new distribution.

Conclusion
We have proposed a new skewed distribution based on the general FSS distribution framework, called the FSTN distribution, which can jointly accommodate multimodality, asymmetry, and heavy tails and thus offers greater flexibility than its SN and STN counterparts. Moreover, we have developed a profile modified Newton-Raphson iterative algorithm for estimating the parameters of interest of a nonlinear model with the FSTN distribution, and interval estimation and hypothesis testing in a profile likelihood paradigm have also been considered.
Numerical studies reveal that, in the context of nonlinear regression analysis, if the true distribution is STN or FSN whereas the fitting distribution is FSTN, the estimation results are not influenced by this misspecification of the distributional assumption. However, once the true distribution is FSTN while the fitting distribution is STN, the estimation results appear to be somewhat disappointing, which shows the robustness of the FSTN distribution. In general, the combination of the FSTN distribution with the MPNR method improves the accuracy of estimation in nonlinear regression.
So far the present methodology is limited to complete data analysis; extensions of this paper include a missing data version as well as a Bayesian analysis of this model, which will be reported in another paper.

Table 1: Results of the simulation study.

Table 2: Length of the 95% confidence interval for the parameters of interest and MSE of the specified model.