Some New Robust Estimators for Circular Logistic Regression Model with Applications on Meteorological and Ecological Data

Maximum likelihood estimation (MLE) is often used to estimate the parameters of the circular logistic regression model because of its efficiency under a parametric model. However, evidence has shown that the classical MLE is severely affected by the presence of outliers. This article discusses the effect of outliers on circular logistic regression and extends four robust estimators, namely, the Mallows, Schweppe, Bianco and Yohai (BY), and weighted BY estimators, to the circular logistic regression model. These estimators have been successfully used in linear logistic regression models for the same purpose. The four proposed robust estimators are compared with the classical MLE through simulation studies. They demonstrate satisfactory finite-sample performance in the presence of misclassification errors and leverage points. Meteorological and ecological datasets are analyzed for illustration.


Introduction
Circular data arise whenever the values of a random variable can be represented on the circumference of the unit circle. Such data are measured as angles with values between 0 and 2π or between 0° and 360°, for example, wind directions and animal navigation; the values of any periodic phenomenon, such as times on a 24-hour clock or days of the year, can also be converted to circular data [1]. The modeling of the relationship between circular variables is called "circular regression," and it is classified into three main classes, namely, circular-circular, circular-linear, and linear-circular regression models [2]. Circular regression models are widely applied in many fields. Several regression models have been proposed to predict a continuous circular variable from other circular or linear predictors [3][4][5][6].
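As a small illustration of how periodic measurements can be converted to circular data, the following Python sketch maps clock times to angles and computes a mean direction. The function names `time_to_angle` and `circular_mean` are illustrative only and do not come from the cited references.

```python
import math

def time_to_angle(hours, period=24.0):
    """Map a time on a `period`-hour clock to an angle in radians in [0, 2*pi)."""
    return 2.0 * math.pi * (hours % period) / period

def circular_mean(angles):
    """Mean direction (radians, in [0, 2*pi)) of a sample of angles."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c) % (2.0 * math.pi)

# 23:00 and 01:00 are two hours apart on the clock. Their mean direction is
# midnight, whereas the arithmetic mean of 23 and 1 would be 12 (noon).
angles = [time_to_angle(23.0), time_to_angle(1.0)]
mean_hours = circular_mean(angles) * 24.0 / (2.0 * math.pi)
```

Because midnight corresponds to both 0 and 24 on the clock, `mean_hours` may come out numerically close to either end of the interval.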
Logistic regression analysis is a useful statistical tool that analyses the relationship between a binary response and a predictor. The theory of logistic regression is well developed [7]. Daffaie and Khan [8] proposed the circular logistic model to predict a binary variable from a circular variable, for example, modeling rainfall (yes, no) from wind direction or a fatal road accident (yes, no) from the time of the accident. The existence of outliers is a common problem in regression analysis. For the linear logistic model, Feser and Pia [9] showed that maximum likelihood estimation can be influenced by outliers. Croux et al. [10] found that the most dangerous outliers, termed "bad leverage points," are misclassified observations that are outlying in the design space of the predictor variables.
Circular logistic regression is also subject to outliers, as shown in [11], where the authors proposed an outlier detection procedure based on the penalized maximum likelihood and applied it to different real datasets.
Several robust estimators that are less affected by outliers have been proposed in the literature to improve the estimation performance in linear logistic regression models [12]. The authors in [13] introduced weights depending on the response and covariates and proposed the Mallows-type estimator. This estimator was analyzed in depth by Carroll and Pederson [14], who used the Mahalanobis distance to reduce the weights in terms of leverage, and Bianco and Yohai [15] proposed the BY method.
Since the published work has considered only the detection of outliers in the circular logistic regression model, this article attempts to overcome the problem of outliers in this model by extending some robust estimators from the classical linear logistic case to the circular logistic case. The rest of this paper is organized as follows. Section 2 reviews the formulation of the circular logistic regression model and its parameter estimation via MLE. Section 3 presents the types of outliers in circular logistic regression and derives the proposed robust estimators for the circular logistic regression model. Section 4 discusses the effect of outliers on the circular logistic estimators by computing the influence functions. Section 5 investigates the performance of the considered robust estimators. Section 6 applies the proposed estimators to meteorological and ecological datasets. Section 7 provides the conclusion.

Model Formulation
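A circular logistic regression describes the relationship between a binary response and circular predictors. It shows potential for various applications in the environmental sciences. The authors in [8] assumed that binomial observations with a probability of success $p_i = v_i/n_i$, for $i = 1, \ldots, k$, depend on a circular random variable $u_i$, and the proposed model is given as follows:
$$\operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i} = \beta_0 + \beta^{\star}\cos(u_i - u_0), \tag{1}$$
where $\beta_0$ is the value of the logit (log odds) when $(u_i - u_0) = 90°$ and $u_0$ is the angle where the logit reaches its highest value. Let $\beta^{\star} = \sqrt{\beta_1^2 + \beta_2^2}$ and $u_0 = \arctan(\beta_2/\beta_1)$; then equation (1) can be written as
$$\operatorname{logit}(p_i) = \beta_0 + \beta_1\cos u_i + \beta_2\sin u_i. \tag{2}$$
Suppose that binomial data of the form $v_i$ successes out of $n_i$ trials are observations from a binomial distribution; the likelihood function is then given by
$$L(\beta) = \prod_{i=1}^{k}\binom{n_i}{v_i}\,p_i^{v_i}(1 - p_i)^{n_i - v_i}. \tag{3}$$
Let $\eta_i = \beta_0 + \beta_1\cos u_i + \beta_2\sin u_i$, and by using the exponential function, we obtain
$$p_i = \frac{e^{\eta_i}}{1 + e^{\eta_i}}. \tag{4}$$
The maximum likelihood estimation (MLE) is classically used for parameter estimation and is defined by an objective function as
$$\hat{\beta}_{\mathrm{MLE}} = \arg\max_{\beta}\,\ell(\beta), \tag{5}$$
where
$$\ell(\beta) = \log L(\beta) = \sum_{i=1}^{k}\left[\log\binom{n_i}{v_i} + v_i\eta_i - n_i\log\left(1 + e^{\eta_i}\right)\right]. \tag{6}$$
The maximum likelihood equations are given as follows:
$$\frac{\partial \ell}{\partial \beta_0} = \sum_{i=1}^{k}(v_i - n_i p_i) = 0, \tag{7}$$
$$\frac{\partial \ell}{\partial \beta_1} = \sum_{i=1}^{k}(v_i - n_i p_i)\cos u_i = 0, \tag{8}$$
$$\frac{\partial \ell}{\partial \beta_2} = \sum_{i=1}^{k}(v_i - n_i p_i)\sin u_i = 0. \tag{9}$$
These equations are solved iteratively by using the Newton-Raphson method. Recently, Abuzaid and ElShekh Ahmed [11] used the penalized maximum likelihood estimator (PMLE) to identify outliers in the circular logistic regression model and investigated its performance via simulation. The following section discusses some possible robust estimators for the circular logistic regression model.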
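The Newton-Raphson iteration mentioned above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names, the small linear solver, and the grouped example data (12 directions with 100 trials each, success counts rounded from a model with coefficients (0, 2, 2)) are all assumptions made for the example.

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_circular_logistic(u, successes, trials, n_iter=50, tol=1e-10):
    """Newton-Raphson for logit(p_i) = b0 + b1*cos(u_i) + b2*sin(u_i)."""
    beta = [0.0, 0.0, 0.0]
    Z = [(1.0, math.cos(a), math.sin(a)) for a in u]
    for _ in range(n_iter):
        score = [0.0, 0.0, 0.0]
        info = [[0.0] * 3 for _ in range(3)]
        for z, y, n in zip(Z, successes, trials):
            eta = beta[0] * z[0] + beta[1] * z[1] + beta[2] * z[2]
            p = 1.0 / (1.0 + math.exp(-eta))
            for j in range(3):
                score[j] += (y - n * p) * z[j]                   # likelihood equations
                for k in range(3):
                    info[j][k] += n * p * (1.0 - p) * z[j] * z[k]  # Fisher information
        step = solve3(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Hypothetical grouped data: 12 equally spaced directions, 100 trials each,
# with success counts rounded from the model with beta = (0, 2, 2).
true_beta = (0.0, 2.0, 2.0)
u = [2.0 * math.pi * i / 12 for i in range(12)]
trials = [100] * 12
successes = [
    round(100 / (1.0 + math.exp(-(true_beta[0]
                                  + true_beta[1] * math.cos(a)
                                  + true_beta[2] * math.sin(a)))))
    for a in u
]
beta_hat = fit_circular_logistic(u, successes, trials)
```

With these artificial counts, `beta_hat` should land close to the generating coefficients; the sketch has no safeguard against separation, where the iteration would diverge.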

Outliers in Circular Logistic Regression.
This section distinguishes different cases of outlying observations in circular logistic regression, where outliers might occur in the dependent variable, the independent variable, or both.
For binary data, all values of $v$ are either 0 or 1. Hence, an error in the $v_i$ direction can only occur as a transposition of 0 to 1 or vice versa. This type of outlier is known as a residual outlier or misclassification-type error [16,17].
A leverage outlier or leverage point occurs when the circular observation at position $d$ (e.g., $u_d$) is contaminated as follows: where $u_d^{\star}$ is the value after contamination and $c$ is the degree of contamination in the range $0 \le c \le 1$. A leverage point is considered a good leverage point when the response agrees with the model, that is, $v_i = 1$ with a large value of $P(v_i = 1 \mid u_i)$ or $v_i = 0$ with a small value of $P(v_i = 1 \mid u_i)$. It is a bad leverage point when the response disagrees with the model, that is, $v_i = 0$ with a large value of $P(v_i = 1 \mid u_i)$ or $v_i = 1$ with a small value of $P(v_i = 1 \mid u_i)$. Abuzaid and ElShekh Ahmed [11] considered the misclassification-type error outlier in their simulation study without any investigation of leverage point detection.
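The two contamination mechanisms can be sketched as follows. The response flip follows the description above directly. The angular shift $u^{\star} = (u + c\pi) \bmod 2\pi$ is an assumption made for this example, chosen because it is a common convention for contaminating circular observations; it is not confirmed by the text. The function names are hypothetical.

```python
import math
import random

TWO_PI = 2.0 * math.pi

def flip_responses(v, proportion, rng):
    """Misclassification-type error: flip a random subset of binary responses."""
    idx = rng.sample(range(len(v)), int(round(proportion * len(v))))
    flipped = list(v)
    for i in idx:
        flipped[i] = 1 - flipped[i]
    return flipped, sorted(idx)

def shift_angles(u, proportion, c, rng):
    """Leverage-type contamination, ASSUMED form: u* = (u + c*pi) mod 2*pi."""
    idx = rng.sample(range(len(u)), int(round(proportion * len(u))))
    shifted = list(u)
    for i in idx:
        shifted[i] = (shifted[i] + c * math.pi) % TWO_PI
    return shifted, sorted(idx)

rng = random.Random(0)
u = [rng.vonmisesvariate(math.radians(60.0), 2.0) for _ in range(200)]
v = [rng.randint(0, 1) for _ in range(200)]   # placeholder responses for the demo
v_bad, flipped_idx = flip_responses(v, 0.10, rng)
u_bad, shifted_idx = shift_angles(u, 0.10, 0.5, rng)
```

Applying both functions to the same observations would produce points that are outlying in both the response and the predictor.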
Alternatively, robust estimators are the common methods used for handling the problem of outliers in logistic models. The following sections present four robust estimators for the parameters of the circular logistic model.

Circular Mallows Class.
The proposed estimator (MALLOWS$_C$) extends the Mallows class in [18] to the circular logistic model by weighting the maximum likelihood estimator.
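Assume $F$ is a continuous and increasing distribution function and is given by
$$F(\eta_i) = \frac{1}{1 + e^{-\eta_i}}.$$
Then, the partial derivatives in (7)-(9) become
$$\sum_{i=1}^{k}\left(v_i - n_i F(\eta_i)\right) = 0, \qquad \sum_{i=1}^{k}\left(v_i - n_i F(\eta_i)\right)\cos u_i = 0, \qquad \sum_{i=1}^{k}\left(v_i - n_i F(\eta_i)\right)\sin u_i = 0,$$
respectively. The robust estimates for the circular logistic regression model in equation (10) are given by the solution of $\beta$ obtained by
$$\sum_{i=1}^{k} w_i\left(v_i - n_i F(\eta_i) - c(\cdot)\right) = 0, \tag{12}$$
$$\sum_{i=1}^{k} w_i\left(v_i - n_i F(\eta_i) - c(\cdot)\right)\cos u_i = 0, \tag{13}$$
$$\sum_{i=1}^{k} w_i\left(v_i - n_i F(\eta_i) - c(\cdot)\right)\sin u_i = 0, \tag{14}$$
where $w_i$ are the weights that may depend on $u_i$, $v_i$, or both and $c(\cdot)$ is a correction function needed to ensure consistency. If $w_i = 1$ and $c(\cdot) = 0$, then equations (12)-(14) give the usual circular logistic regression estimate. If $w_i = w(u_i, \eta_i)$ and $c(\cdot) = 0$, then the weights depend only on $u_i$.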
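To make the role of the weights and the correction term concrete, the sketch below evaluates the left-hand sides of weighted estimating equations of this kind for binary data ($n_i = 1$). The function `weighted_score`, its arguments, and the toy data are hypothetical; the sketch only evaluates the equations and does not solve them.

```python
import math

def logistic(eta):
    """Logistic distribution function F(eta)."""
    return 1.0 / (1.0 + math.exp(-eta))

def weighted_score(beta, u, v, w=None, c=None):
    """Evaluate sum_i w_i * (v_i - F(eta_i) - c_i) * z_i with z_i = (1, cos u_i, sin u_i).

    w: optional list of weights (defaults to all ones).
    c: optional correction function c(u_i, eta_i) (defaults to zero).
    """
    w = [1.0] * len(u) if w is None else w
    score = [0.0, 0.0, 0.0]
    for ui, vi, wi in zip(u, v, w):
        z = (1.0, math.cos(ui), math.sin(ui))
        eta = sum(b * zj for b, zj in zip(beta, z))
        corr = 0.0 if c is None else c(ui, eta)
        resid = vi - logistic(eta) - corr
        for j in range(3):
            score[j] += wi * resid * z[j]
    return score

# Toy binary data: with unit weights and no correction this is the ML score.
u = [0.0, math.pi / 2.0, math.pi]
v = [1, 1, 0]
ml_score = weighted_score([0.0, 0.0, 0.0], u, v)
# Giving the third observation zero weight removes its contribution entirely.
down_weighted = weighted_score([0.0, 0.0, 0.0], u, v, w=[1.0, 1.0, 0.0])
```

A robust fit would search for the coefficients at which all three components vanish, for example with the same Newton-type iteration used for the MLE.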

Circular Schweppe Class.
Stefanski [19] stated that $w_i$ is a robust Mahalanobis distance for the $u_i$ vector that depends on the covariance matrix of the regression model, which is given by where $W$ is a diagonal matrix with $w_{ii} = p_i(1 - p_i)$ and $p_i$ is the probability of $v_i = 1$; then, the Mahalanobis distance is given by If $w_i = w(u_i, \eta_i, v_i)$, then the estimator is the same as the linear logistic Schweppe class proposed in [13]. Here, $w_i$ depends on $v_i$ and the circular $u_i$. This estimator is called the circular conditionally unbiased bounded influence function (CUBIF$_C$) estimator.

Circular BY Estimators.
Bianco and Yohai [15] proposed the BY method for the linear logistic model; in this section, it is extended to the circular case and referred to as BY$_C$.
Let the MLE be obtained by minimizing the deviance, where By replacing the deviance function in (17) with a function $\rho$, the robust BY$_C$ estimator is defined by where $\rho$ is a bounded, differentiable, and nondecreasing function defined in [15] and given by where $m$ is a positive number, $\rho'(u) = \psi(u)$, and

Mathematical Problems in Engineering

Circular WBY Estimators.
The BY estimator is extended by including weights that reduce the influence of outliers in the $u_i$ space. This weighted BY (WBY) estimator is extended to the circular logistic regression model and defined as where the weights $w(u_i)$ are computed using the minimum covariance determinant (MCD) estimator as a decreasing function of the robust Mahalanobis distances (RD$_i$), and given by (see [20])
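The idea of replacing the deviance by a bounded function can be illustrated with a short sketch. It assumes, without confirmation from the text, the form commonly attributed to Bianco and Yohai, $\rho_m(d) = d - d^2/(2m)$ for $d \le m$ and $m/2$ otherwise, which is bounded, differentiable, and nondecreasing for $d \ge 0$. The default $m = 0.5$ and the function names are choices made for the example, and the bias-correction term of the full estimator is deliberately omitted.

```python
import math

def deviance_component(eta, v):
    """Negative log-likelihood contribution of one binary observation.

    Equals log(1 + exp(-eta)) when v == 1 and log(1 + exp(eta)) when v == 0.
    """
    s = -eta if v == 1 else eta
    return math.log1p(math.exp(s))

def rho(d, m=0.5):
    """ASSUMED bounded loss: d - d^2 / (2m) for d <= m, and m / 2 otherwise."""
    if d <= m:
        return d - d * d / (2.0 * m)
    return m / 2.0

def psi(d, m=0.5):
    """Derivative of rho: 1 - d / m for d <= m, and 0 otherwise."""
    return 1.0 - d / m if d <= m else 0.0

# A well-fitted observation (eta = 3, v = 1) keeps a small loss, while a
# misclassified one (eta = 3, v = 0) has its contribution capped at m / 2.
loss_good = rho(deviance_component(3.0, 1))
loss_bad = rho(deviance_component(3.0, 0))
```

Because `psi` is zero beyond `m`, observations with a very large deviance no longer pull on the estimating equations, which is the mechanism behind the robustness to misclassified responses.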

Influence Function
Suppose there are two source populations of k-dimensional circular variables, both following von Mises distributions with different means but the same concentration parameter, $\kappa$. The circular variable $u_i$ can arise from one of these populations where $i = 1, \ldots, k$ and $p_1 + p_0 = 1$. Let the binary variable $v_i$ indicate the source population of the corresponding $u_i$, then Let the joint distribution of $(u_i, v_i)$ be denoted by $H$ and $T$ be an estimator of the circular logistic parameters. Then, the influence function [21] is defined as If $T$ = MLE, as shown in equations (7)-(9), then the influence function is unbounded in both the $u_i$ and $v_i$ spaces. Specifically, a small amount of contamination in the training data due to the presence of a possible outlier in $u_i$ or $v_i$ strongly affects the MLE, as shown in the simulation section.
Suppose $T$ = MALLOWS$_C$, where the weights depend on the robust distance of observation $u_i$, and the robust distance RD$_i$ is equal to the Mahalanobis distance of $u_i$ to the center of the data cloud. This condition reduces the influence of outlying observations in the $u_i$ space. Thus, If $T$ = CUBIF$_C$ or $T$ = WBY$_C$, which adds a weight to BY$_C$, then a fully bounded influence is obtained.
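The two-population setup at the start of this section leads directly to the circular logistic form. The following worked derivation is a sketch under simplifying assumptions that are not stated in the text: $u$ is a single angle, $u \mid v = j$ follows a von Mises distribution with mean $\mu_j$ and common concentration $\kappa$, and $p_1$, $p_0$ are the prior probabilities of the two populations. The last line recalls the standard definition of the influence function at a point $(u, v)$, with $\Delta_{(u,v)}$ denoting a point mass.

```latex
\begin{align*}
f_j(u) &= \frac{\exp\{\kappa\cos(u - \mu_j)\}}{2\pi I_0(\kappa)}, \qquad j = 0, 1, \\
\operatorname{logit} P(v = 1 \mid u)
  &= \log\frac{p_1 f_1(u)}{p_0 f_0(u)}
   = \log\frac{p_1}{p_0} + \kappa\{\cos(u - \mu_1) - \cos(u - \mu_0)\} \\
  &= \beta_0 + \beta_1\cos u + \beta_2\sin u, \\
\beta_0 &= \log\frac{p_1}{p_0}, \qquad
\beta_1 = \kappa(\cos\mu_1 - \cos\mu_0), \qquad
\beta_2 = \kappa(\sin\mu_1 - \sin\mu_0), \\
\operatorname{IF}\big((u, v); T, H\big)
  &= \lim_{\varepsilon \downarrow 0}
     \frac{T\big((1 - \varepsilon)H + \varepsilon\Delta_{(u, v)}\big) - T(H)}{\varepsilon}.
\end{align*}
```

The normalizing constant $2\pi I_0(\kappa)$ cancels in the ratio because both populations share the same concentration parameter, which is why the log odds reduce to a linear function of $\cos u$ and $\sin u$.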

Settings.
This simulation aims to compare the robustness of the proposed robust estimators and the classical MLE. The independent circular variable is generated from the von Mises distribution with mean 60° and concentration parameter $\kappa = 1, 2, 6, 10,$ and $15$, with sample sizes $N = 200, 250,$ and $300$; large sample sizes are chosen to avoid separation problems. The true parameter values are $\beta = (0, 2, 2)$. The simulation study covers a variety of situations. Initially, the data are simulated without contamination.
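One replication of the clean-data setting can be sketched in Python with the standard library. This is an illustration of the described design, not the authors' R code: the function name `simulate_circular_logistic`, the seed, and the use of `random.vonmisesvariate` in place of the R package are assumptions of the example.

```python
import math
import random

def simulate_circular_logistic(n, beta, mu_deg=60.0, kappa=2.0, seed=None):
    """Draw n angles from a von Mises distribution and binary responses from
    the model logit(p) = beta[0] + beta[1]*cos(u) + beta[2]*sin(u)."""
    rng = random.Random(seed)
    mu = math.radians(mu_deg)
    u, v = [], []
    for _ in range(n):
        angle = rng.vonmisesvariate(mu, kappa)   # angle in radians
        eta = beta[0] + beta[1] * math.cos(angle) + beta[2] * math.sin(angle)
        p = 1.0 / (1.0 + math.exp(-eta))
        u.append(angle)
        v.append(1 if rng.random() < p else 0)
    return u, v

# One clean sample with N = 200, kappa = 2, and the true values beta = (0, 2, 2).
u, v = simulate_circular_logistic(200, (0.0, 2.0, 2.0), kappa=2.0, seed=42)
```

Repeating the call with different seeds gives the independent replications over which the estimators are compared.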
The robustness of all estimators under contaminated data is examined in three different ways, each applied to 5%, 10%, 20%, 30%, and 40% of the original data. First, the chosen proportion of the responses $v$ is selected at random and changed from 0 to 1 or from 1 to 0; this process constitutes the misclassification-type error. Second, the same proportions of $u$ are contaminated with $c = 0.5$ to produce good leverage points. Finally, the same proportions are considered, and the generated data are contaminated with the two types of outliers simultaneously; this process constitutes bad leverage points.
Each simulation includes 1000 replications. The performance of the estimators is evaluated based on the bias and the median squared error (MSE) for each parameter, which are defined as follows: A good estimator has bias and MSE that are relatively small or close to zero. The simulation used the standard "Robust" package in R to obtain the estimators and the "CircStats" package to generate the circular variable $u$.
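The two performance measures can be computed per parameter as in the sketch below. Since the defining formulas are not reproduced here, the sketch assumes the usual forms: the bias is the mean of the estimates over replications minus the true value, and the median squared error is the median of the squared deviations from the true value. The function name and the three toy replications are hypothetical.

```python
import statistics

def bias_and_mse(estimates, true_beta):
    """Per-parameter bias and median squared error over replications.

    ASSUMED definitions: bias_j = mean(est_j) - beta_j and
    MSE_j = median((est_j - beta_j)^2).
    """
    bias, mse = [], []
    for j, truth in enumerate(true_beta):
        column = [est[j] for est in estimates]
        bias.append(statistics.mean(column) - truth)
        mse.append(statistics.median([(b - truth) ** 2 for b in column]))
    return bias, mse

# Three toy replications of (beta0, beta1, beta2) around the true (0, 2, 2).
estimates = [(0.1, 2.0, 1.9), (-0.1, 2.2, 2.1), (0.0, 1.8, 2.3)]
bias, mse = bias_and_mse(estimates, (0.0, 2.0, 2.0))
```

Using the median rather than the mean of the squared errors keeps a few divergent replications from dominating the summary.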

Results.
The bias and MSE of the five estimators are shown in Tables 1 to 7. Table 1 shows the results for uncontaminated data (i.e., clean data), where the biases and MSEs of all five estimators are fairly close to each other. However, the MSE of MALLOWS$_C$ is larger than the others for large concentration parameters ($\kappa = 6, 10,$ and $15$). Hence, MALLOWS$_C$ performs worse than the other estimators in this situation. Tables 2 and 3 show the results for data with misclassification errors. The bias and MSE of the MLE are already affected by 5% misclassification-type error. The results suggest that the MLE becomes biased with 5% contamination, CUBIF$_C$ with 20% contamination, and BY$_C$ with 30% contamination. MALLOWS$_C$ and WBY$_C$ remain robust and are considered the best methods in this setting.
As shown in Tables 4 and 5, only small differences are found between the classical MLE and the robust methods when the data are contaminated with good leverage points, and the biases and MSEs of all methods are relatively small. Hence, good leverage points do not noticeably affect the estimates.
Bad leverage points change the performance of the estimators considerably. As shown in Tables 6 and 7, the CUBIF$_C$ estimator can only withstand up to 5%
Figure 1 shows the circular plot of wind direction data in Hebron. Figure 2 suggests that no outlier is found in the data (clean dataset). Abuzaid and ElShekh Ahmed [11] did not identify any outliers with their PMLE method and attributed this to the low concentration parameter of the wind direction ($\kappa = 0.883$). Table 8 shows the estimated parameters and estimated SE for the different estimators. As observed in Table 8, the estimates from all five estimators are fairly close to each other. Thus, the model of the relation between the rainfall as a binary response and
Figure 3 shows the circular plot of leaf inclination angles, and Figure 4 shows the scatter plot of the data. Six observations far from the mass of directions were identified as outliers by Abuzaid and ElShekh Ahmed [11]. Table 9 presents the parameter estimates, SE, and p values for the various robust estimators, where the CUBIF$_C$ and MALLOWS$_C$ estimates are reasonably close to the MLE. However, BY$_C$ and WBY$_C$ have the smallest SE values.
Therefore, the BY$_C$ and WBY$_C$ estimators give the best results for this dataset. Thus, the model of the relation between the vegetation canopy ($v$) as a binary response and the leaf inclination angles ($u$) as a circular predictor of the considered dataset is given by
$$\operatorname{logit}(p_i) = \log\frac{p_i}{1 - p_i} = -6.8431 + 6.5681\cos u_i + 4.5951\sin u_i.$$
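The fitted coefficients can be read back in terms of the amplitude and peak direction used in the model formulation. The short sketch below does this for the reported values; the variable and function names are illustrative, and `atan2` is used in place of the plain arctangent so that the quadrant is resolved automatically.

```python
import math

# Coefficients of the fitted model reported above.
b0, b1, b2 = -6.8431, 6.5681, 4.5951

def predicted_probability(u_deg):
    """P(v = 1 | u) under the fitted model, with the angle u given in degrees."""
    a = math.radians(u_deg)
    eta = b0 + b1 * math.cos(a) + b2 * math.sin(a)
    return 1.0 / (1.0 + math.exp(-eta))

beta_star = math.hypot(b1, b2)              # amplitude sqrt(b1^2 + b2^2)
u0_deg = math.degrees(math.atan2(b2, b1))   # angle at which the logit peaks
p_max = predicted_probability(u0_deg)       # largest fitted probability
p_opposite = predicted_probability(u0_deg + 180.0)
```

Under this fit the probability is highest for angles near `u0_deg` and falls off quickly in the opposite direction, where the logit equals `b0 - beta_star`.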

Conclusion
This paper proposed robust estimators for the circular logistic regression model. We compared the performance of the MLE and the proposed robust estimators for the circular logistic model under clean and contaminated datasets. The findings indicate that the MLE tends to be biased in the presence of misclassification errors and bad leverage points. The proposed robust estimators perform well in estimating the parameters of such models. Some robust estimators, such as BY$_C$ and WBY$_C$, show superiority over the others depending on the type of contamination.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.