The Arcsine Exponentiated-X Family: Validation and Insurance Application

In this paper, we propose a family of heavy tailed distributions, called the arcsine exponentiated-X (ASE-X) family, by incorporating a trigonometric function. Based on the proposed approach, a three-parameter extension of the Weibull distribution called the arcsine exponentiated-Weibull (ASE-W) distribution is studied in detail. Maximum likelihood is used to estimate the model parameters, and its performance is evaluated by two simulation studies. Actuarial measures including Value at Risk and Tail Value at Risk are derived for the ASE-W distribution. Furthermore, a numerical study of these measures is conducted, showing that the proposed ASE-W distribution has a heavier tail than the baseline Weibull distribution. These actuarial measures are also estimated from real insurance claims data for the ASE-W and other competing distributions. The usefulness and flexibility of the proposed model are illustrated by analyzing a real-life heavy tailed insurance claims dataset. We construct a modified chi-squared goodness-of-fit test based on the Nikulin–Rao–Robson statistic to verify the validity of the proposed ASE-W model. The modified test shows that the ASE-W model is a good candidate for analyzing heavy tailed insurance claims data.


Introduction
Heavy tailed distributions play a significant role in modeling data in applied sciences, particularly in risk management, banking, economics, finance, and actuarial sciences. However, the quality of such analyses depends primarily on the assumed probability model of the phenomenon under consideration. Insurance datasets in particular are usually positive [1], right-skewed [2], unimodal [3], and heavy tailed [4]. Right-skewed data may be adequately modeled by skewed distributions [5]. Therefore, a number of unimodal positively skewed parametric distributions have been employed to model such datasets [6,7]. Heavy tailed distributions are those whose right tail probabilities are heavier than those of the exponential distribution, that is,
$$\lim_{x\to\infty} e^{\lambda x}\left[1-F(x)\right]=\infty \quad \text{for all } \lambda>0, \tag{1}$$
where F(x) is the cdf of a baseline distribution. More information can be found in Resnick [8] and Beirlant et al. [9]. Dutta and Perry [10] performed an empirical analysis of loss distributions to estimate risk via different approaches. They rejected the exponential, gamma, and Weibull models because of their poor results and concluded that a model with a sufficiently flexible structure is needed. These results encouraged researchers to propose new flexible models that fit data more accurately. Therefore, a number of approaches have been proposed to obtain new distributions with heavier tails than the exponential distribution, such as (i) the transformation method [11,12], (ii) composition of two or more distributions [13], (iii) compounding of distributions [14,15], and (iv) finite mixtures of distributions [16,17]. These approaches are very useful for deriving new flexible distributions; however, they still have some deficiencies. For example, (i) the transformation approach is simple to apply, but inference becomes difficult and much computational work is required to derive the distributional characteristics [18].
(ii) The composition approach combines two or more distributions using fixed or a priori known mixing weights, and hence it can be very restrictive [19]. To overcome this problem, Scollnik [20] used unrestricted mixing weights. (iii) The density obtained by the compounding approach may not always have a closed-form expression, which makes estimation more cumbersome [21]. (iv) Finite mixture models provide very flexible distributions that can also capture, for instance, multimodality of the underlying distribution. The price to pay for this greater flexibility is more complicated and computationally challenging inference [22].
To overcome the problems associated with these methods, many authors have proposed new families of distributions; see, for example, Al-Mofleh [23], Jamal and Nasir [24], Nasir et al. [25], Ahmad et al. [26], Afify et al. [27], Cordeiro et al. [28], Ahmad et al. [29], and Afify and Alizadeh [30], among many others. Bringing flexibility to existing distributions by adding parameters is therefore both a desirable feature and an active research topic.
In this regard, Mudholkar and Srivastava [31] introduced the exponentiated family of distributions by adding a shape parameter to obtain more flexible versions of existing distributions. A random variable X is said to follow the exponentiated family if its cumulative distribution function (cdf) is given by
$$G(x;a,\xi)=F(x;\xi)^{a},\quad x\in\mathbb{R}, \tag{2}$$
where F(x; ξ) is the cdf of the baseline distribution depending on the parameter vector ξ and a > 0 is an additional shape parameter. Using equation (2), exponentiated versions of many existing distributions have been proposed in the literature. Furthermore, Cordeiro and de Castro [32] proposed another approach, known as the Kumaraswamy-generalized (Ku-G) family, by adding two additional shape parameters. The cdf of the Ku-G family is
$$G(x;a,b,\xi)=1-\left[1-F(x;\xi)^{a}\right]^{b},\quad a,b>0. \tag{3}$$
From equation (3), it is clear that, for b = 1, the Ku-G family reduces to the exponentiated family. For contributions based on equation (3), we refer to Ahmad et al. [33], Mead and Afify [34], Afify et al. [35], and Mansour et al. [36].
In this paper, we enrich distribution theory by introducing the heavy tailed arcsine exponentiated-X (ASE-X) family of distributions. A random variable X belongs to the proposed ASE-X family if its cdf is
$$G(x;a,\xi)=\frac{2}{\pi}\arcsin\!\left[F(x;\xi)^{a}\right],\quad x\in\mathbb{R},\ a>0, \tag{4}$$
where F(x; ξ) is the baseline cdf with parameter vector ξ and a is an additional shape parameter. The probability density function (pdf) corresponding to equation (4) is given by
$$g(x;a,\xi)=\frac{2a}{\pi}\,\frac{f(x;\xi)\,F(x;\xi)^{a-1}}{\sqrt{1-F(x;\xi)^{2a}}}, \tag{5}$$
where f(x; ξ) is the baseline pdf. The new pdf is most tractable when F(x; ξ) and f(x; ξ) have simple analytical expressions. Henceforth, a random variable X with pdf (5) is denoted by X ∼ ASE-X(a, ξ). The key motivations for using the ASE-X family in practice are the following: (i) to improve the characteristics and flexibility of existing distributions, since the special models of this family can provide left-skewed, right-skewed, unimodal, reversed J-shaped, and symmetric densities, as well as decreasing, increasing, bathtub, upside-down bathtub, and reversed-J hazard rates (see Figures 1 and 2); (ii) adding a single parameter in this simple and convenient way yields extended heavy tailed distributions that are very useful for modeling data from the insurance field (see Sections 6 and 7); (iii) the special submodels of this family have closed-form cdf and hazard rate function (hrf), so they can be used to analyze censored datasets; and (iv) the special cases of the ASE-X family can model heavy tailed actuarial datasets better than existing competing models (see Sections 6 and 7).
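Equations (4) and (5) can be checked numerically for any baseline. The following is a minimal Python sketch, assuming SciPy is available: the helper `ase_x` (a hypothetical name, not from the paper) wraps a frozen `scipy.stats` distribution as the baseline F and returns the ASE-X cdf and pdf. The exponential baseline with rate 2 and a = 1.5 are illustrative choices only. The pdf is evaluated through the baseline survival function so that 1 − F^{2a} stays accurate far in the right tail, where F rounds to 1 in floating point.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad


def ase_x(baseline, a):
    """Build the ASE-X cdf and pdf from a frozen scipy.stats baseline (hypothetical helper).

    Assumes G(x) = (2/pi) * arcsin(F(x)**a) with a > 0, and its derivative.
    """
    if a <= 0:
        raise ValueError("shape parameter a must be positive")

    def cdf(x):
        return (2.0 / np.pi) * np.arcsin(baseline.cdf(x) ** a)

    def pdf(x):
        # Use S = 1 - F so that 1 - F**(2a) = -expm1(2a log F) keeps its precision in the tail.
        S = baseline.sf(x)
        with np.errstate(divide="ignore", invalid="ignore"):
            log_F = np.log1p(-S)
            one_minus_F2a = -np.expm1(2.0 * a * log_F)
            g = (2.0 * a / np.pi) * baseline.pdf(x) * np.exp((a - 1.0) * log_F) / np.sqrt(one_minus_F2a)
        # 0/0 only arises where both f and S have underflowed to zero; the density is zero there.
        return np.where(np.isnan(g), 0.0, g)

    return cdf, pdf


# ASE-exponential example: exponential baseline with rate 2 (scale 0.5) and a = 1.5
cdf, pdf = ase_x(stats.expon(scale=0.5), a=1.5)
total, _ = quad(pdf, 0.0, np.inf)
mass_01, _ = quad(pdf, 0.0, 1.0)
print(total)               # close to 1
print(mass_01, cdf(1.0))   # the integrated pdf and the cdf agree
```

Because only `cdf`, `sf`, and `pdf` of the baseline are used, another continuous `scipy.stats` distribution can be substituted to explore further special cases.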
Using the new cdf in equation (4), a number of new flexible distributions can be obtained. Some new models based on the ASE-X approach are presented in Table 1. The survival function (sf) and hrf of the proposed family are, respectively, given by
$$S(x;a,\xi)=1-\frac{2}{\pi}\arcsin\!\left[F(x;\xi)^{a}\right]\quad\text{and}\quad h(x;a,\xi)=\frac{2a\,f(x;\xi)\,F(x;\xi)^{a-1}}{\sqrt{1-F(x;\xi)^{2a}}\,\left\{\pi-2\arcsin\!\left[F(x;\xi)^{a}\right]\right\}}. \tag{6}$$
The paper is outlined as follows. In Section 2, we define the ASE-W distribution and present some plots of its density and hazard functions. We provide some mathematical properties of the ASE-X family in Section 3. The maximum likelihood estimators (MLEs) of the model parameters are obtained in Section 4. Two Monte Carlo simulation studies assessing the performance of the MLEs are discussed in Section 5. In Section 6, we derive two important risk measures, value at risk and tail value at risk, for the ASE-W distribution and perform a simulation study showing that the ASE-W distribution has a heavier tail than the baseline Weibull distribution. In Section 7, the ASE-W distribution is applied to real heavy tailed insurance claims data to illustrate its potential. Furthermore, the value at risk and tail value at risk measures are estimated for all competing models based on the insurance claims data. A modified goodness-of-fit test based on the Nikulin-Rao-Robson statistic is presented in Section 8. Finally, in Section 9, we provide some concluding remarks.

The ASE-W Distribution
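In this section, we introduce the ASE-W distribution and investigate the behavior of its density and hazard functions for selected values of the parameters. Consider the cdf of the two-parameter Weibull distribution,
$$F(x;\alpha,c)=1-e^{-c x^{\alpha}},\quad x>0,\ \alpha,c>0.$$
Then, a random variable X is said to follow the ASE-W distribution if its cdf takes the form
$$G(x;a,\alpha,c)=\frac{2}{\pi}\arcsin\!\left[\left(1-e^{-c x^{\alpha}}\right)^{a}\right],\quad x>0. \tag{7}$$
The pdf associated with equation (7) has the form
$$g(x;a,\alpha,c)=\frac{2a\alpha c}{\pi}\,\frac{x^{\alpha-1}e^{-c x^{\alpha}}\left(1-e^{-c x^{\alpha}}\right)^{a-1}}{\sqrt{1-\left(1-e^{-c x^{\alpha}}\right)^{2a}}},\quad x>0. \tag{8}$$
For α = 1, the ASE-W distribution reduces to the ASE-exponential distribution with parameter c, and for α = 2, it reduces to the ASE-Rayleigh distribution with parameter c.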
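The ASE-W cdf, pdf, survival function, and hazard rate can be coded directly from their closed forms. In this sketch the function names and the parameter values α = 1.6, c = 0.6, a = 1.9 are illustrative assumptions. It rewrites 1 − G(x) = (2/π) arccos(u^a) as (4/π) arcsin(√((1 − u^a)/2)), with u = 1 − e^{−cx^α}, to avoid cancellation when the hazard rate is evaluated in the right tail, and it checks that the pdf agrees with a finite-difference derivative of the cdf.

```python
import numpy as np


def _log_weibull_cdf(x, alpha, c):
    """log(1 - exp(-t)) with t = c * x**alpha, accurate for both small and large t."""
    t = c * np.asarray(x, dtype=float) ** alpha
    with np.errstate(divide="ignore"):
        return np.where(t < np.log(2.0), np.log(-np.expm1(-t)), np.log1p(-np.exp(-t)))


def asew_cdf(x, a, alpha, c):
    """ASE-W cdf: G(x) = (2/pi) arcsin(u^a), u = 1 - exp(-c x^alpha)."""
    return (2.0 / np.pi) * np.arcsin(np.exp(a * _log_weibull_cdf(x, alpha, c)))


def asew_sf(x, a, alpha, c):
    """1 - G(x) = (4/pi) arcsin(sqrt((1 - u^a) / 2)), free of cancellation near G = 1."""
    one_minus_ua = -np.expm1(a * _log_weibull_cdf(x, alpha, c))
    return (4.0 / np.pi) * np.arcsin(np.sqrt(one_minus_ua / 2.0))


def asew_pdf(x, a, alpha, c):
    """ASE-W pdf, with 1 - u^(2a) computed as -expm1(2a log u)."""
    x = np.asarray(x, dtype=float)
    log_u = _log_weibull_cdf(x, alpha, c)
    one_minus_u2a = -np.expm1(2.0 * a * log_u)
    return ((2.0 * a * alpha * c / np.pi) * x ** (alpha - 1.0) * np.exp(-c * x**alpha)
            * np.exp((a - 1.0) * log_u) / np.sqrt(one_minus_u2a))


def asew_hrf(x, a, alpha, c):
    """Hazard rate h(x) = g(x) / [1 - G(x)]."""
    return asew_pdf(x, a, alpha, c) / asew_sf(x, a, alpha, c)


a, alpha, c = 1.9, 1.6, 0.6                       # illustrative parameter values
x = np.linspace(0.1, 5.0, 50)
step = 1e-6
num_pdf = (asew_cdf(x + step, a, alpha, c) - asew_cdf(x - step, a, alpha, c)) / (2.0 * step)
max_err = np.max(np.abs(num_pdf - asew_pdf(x, a, alpha, c)))
total = asew_cdf(x, a, alpha, c) + asew_sf(x, a, alpha, c)
print(max_err)                                    # tiny: the pdf matches d/dx of the cdf
print(asew_hrf(np.array([0.5, 1.0, 2.0, 4.0]), a, alpha, c))
```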
For different values of the model parameters, plots of the pdf and hrf of the ASE-W distribution are sketched in Figures 1 and 2.

Basic Mathematical Properties
In this section, some statistical properties of the ASE-X family are derived.

Quantile Function.
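Let X be an ASE-X random variable with pdf (5). The quantile function (qf) of X, say Q(u), is obtained by inverting equation (4):
$$Q(u)=F^{-1}\!\left(\left[\sin\left(\frac{\pi u}{2}\right)\right]^{1/a};\xi\right), \tag{9}$$
where u has the uniform distribution on the interval (0, 1). From equation (9), the ASE-X family has a closed-form quantile function whenever the baseline quantile function F^{-1} does, which makes generating random numbers very simple. The qf of the ASE-W model follows as
$$Q(u)=\left\{-\frac{1}{c}\log\left[1-\left(\sin\frac{\pi u}{2}\right)^{1/a}\right]\right\}^{1/\alpha}. \tag{10}$$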
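The ASE-W quantile function translates directly into an inverse-transform sampler. In this Python sketch the function names are hypothetical and the parameter values α = 1.6, c = 0.6, a = 1.9 are illustrative. It evaluates Q(u), checks that G(Q(u)) recovers u, and draws a large sample whose empirical median matches Q(0.5). Writing log sin(πu/2) as log(1 − 2 sin²(π(1 − u)/4)) keeps the computation accurate as u → 1; this is an implementation choice, not something stated in the paper.

```python
import numpy as np


def asew_quantile(u, a, alpha, c):
    """ASE-W qf: Q(u) = {-(1/c) log[1 - sin(pi u/2)^(1/a)]}^(1/alpha)."""
    u = np.asarray(u, dtype=float)
    # log sin(pi u/2) = log(1 - 2 sin^2(pi (1 - u) / 4)), accurate near u = 1
    log_s = np.log1p(-2.0 * np.sin(np.pi * (1.0 - u) / 4.0) ** 2)
    baseline_sf = -np.expm1(log_s / a)            # 1 - sin(pi u/2)^(1/a)
    return (-np.log(baseline_sf) / c) ** (1.0 / alpha)


def asew_cdf(x, a, alpha, c):
    """ASE-W cdf: (2/pi) arcsin[(1 - exp(-c x^alpha))^a]."""
    return (2.0 / np.pi) * np.arcsin((-np.expm1(-c * np.asarray(x) ** alpha)) ** a)


def asew_rvs(n, a, alpha, c, rng):
    """Inverse-transform sampling: X = Q(U) with U ~ Uniform(0, 1)."""
    return asew_quantile(rng.uniform(size=n), a, alpha, c)


a, alpha, c = 1.9, 1.6, 0.6
u = np.array([0.1, 0.5, 0.9, 0.99])
x = asew_quantile(u, a, alpha, c)
print(asew_cdf(x, a, alpha, c))                    # reproduces u up to rounding
sample = asew_rvs(100_000, a, alpha, c, np.random.default_rng(2024))
print(np.mean(sample <= x[1]))                     # close to 0.5, since x[1] is the median
```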

Moments.
Moments play an essential role in statistical analysis. They help to capture important features and characteristics of a distribution (e.g., central tendency, dispersion, skewness, and kurtosis).
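The rth moment of the ASE-X family is
$$\mu_r'=E\left(X^{r}\right)=\int_{-\infty}^{\infty}x^{r}g(x;a,\xi)\,dx. \tag{11}$$
Substituting equation (5) in equation (11), we obtain
$$\mu_r'=\frac{2a}{\pi}\int_{-\infty}^{\infty}x^{r}f(x;\xi)F(x;\xi)^{a-1}\left[1-F(x;\xi)^{2a}\right]^{-1/2}dx. \tag{12}$$
Using the binomial series, we have, for |y| < 1,
$$\left(1-y^{2}\right)^{-1/2}=\sum_{k=0}^{\infty}\binom{2k}{k}\frac{y^{2k}}{4^{k}}. \tag{13}$$
By replacing y with F(x; ξ)^a in equation (13), we obtain
$$\left[1-F(x;\xi)^{2a}\right]^{-1/2}=\sum_{k=0}^{\infty}\binom{2k}{k}\frac{F(x;\xi)^{2ak}}{4^{k}}. \tag{14}$$
By inserting equation (14) in equation (12), we obtain
$$\mu_r'=\frac{2a}{\pi}\sum_{k=0}^{\infty}\binom{2k}{k}\frac{1}{4^{k}}\int_{-\infty}^{\infty}x^{r}f(x;\xi)F(x;\xi)^{a(2k+1)-1}dx. \tag{15}$$
The moment generating function of the ASE-X class has the form
$$M_X(t)=\frac{2a}{\pi}\sum_{k=0}^{\infty}\binom{2k}{k}\frac{1}{4^{k}}\int_{-\infty}^{\infty}e^{tx}f(x;\xi)F(x;\xi)^{a(2k+1)-1}dx. \tag{16}$$
The effects of different values of the parameters α and a on the mean, variance, skewness, and kurtosis of the ASE-W distribution with c = 1 are illustrated in Figures 3 and 4.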
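As a numerical cross-check of the moments that does not truncate the series in (15), the sketch below computes μ'_r for the ASE-W model in two independent ways: by adaptive quadrature of x^r g(x), and as ∫_0^1 Q(u)^r du after the substitution x = Q(u). It then derives the mean, variance, skewness, and kurtosis. This is a hedged illustration rather than the authors' procedure; the parameter values are illustrative, with c = 1 as in Figures 3 and 4.

```python
import math
from scipy.integrate import quad


def asew_pdf(x, a, alpha, c):
    """ASE-W density, evaluated on the log scale for numerical stability."""
    if x <= 0.0:
        return 0.0
    t = c * x**alpha
    if t == 0.0 or t > 700.0:      # underflow at the origin, or density below ~1e-150
        return 0.0
    # log u with u = 1 - exp(-t), accurate for both small and large t
    log_u = math.log(-math.expm1(-t)) if t < math.log(2.0) else math.log1p(-math.exp(-t))
    one_minus_u2a = -math.expm1(2.0 * a * log_u)
    return ((2.0 * a * alpha * c / math.pi) * x ** (alpha - 1.0) * math.exp(-t)
            * math.exp((a - 1.0) * log_u) / math.sqrt(one_minus_u2a))


def asew_quantile(u, a, alpha, c):
    """ASE-W qf, with 1 - sin(pi u/2)^(1/a) formed without cancellation near u = 1."""
    if u <= 0.0:
        return 0.0
    if u >= 1.0:
        return math.inf
    log_s = math.log1p(-2.0 * math.sin(math.pi * (1.0 - u) / 4.0) ** 2)   # log sin(pi u / 2)
    return (-math.log(-math.expm1(log_s / a)) / c) ** (1.0 / alpha)


def raw_moment(r, a, alpha, c):
    """mu'_r = int_0^inf x^r g(x) dx by adaptive quadrature."""
    return quad(lambda x: x**r * asew_pdf(x, a, alpha, c), 0.0, math.inf, limit=200)[0]


def raw_moment_via_quantile(r, a, alpha, c):
    """The same moment written as int_0^1 Q(u)^r du."""
    return quad(lambda u: asew_quantile(u, a, alpha, c) ** r, 0.0, 1.0, limit=200)[0]


def shape_summary(a, alpha, c):
    """Mean, variance, skewness, and kurtosis from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(r, a, alpha, c) for r in range(1, 5))
    var = m2 - m1**2
    skew = (m3 - 3.0 * m1 * m2 + 2.0 * m1**3) / var**1.5
    kurt = (m4 - 4.0 * m1 * m3 + 6.0 * m1**2 * m2 - 3.0 * m1**4) / var**2
    return m1, var, skew, kurt


a, alpha, c = 1.9, 1.6, 1.0        # illustrative values with c = 1
for r in (1, 2, 3):
    print(r, raw_moment(r, a, alpha, c), raw_moment_via_quantile(r, a, alpha, c))
print(shape_summary(a, alpha, c))  # mean, variance, skewness, kurtosis
```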

Maximum Likelihood Estimation
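In this section, we consider the estimation of the unknown parameters of the ASE-X distribution from complete samples only via maximum likelihood. Let X_1, X_2, ..., X_n be a random sample from the ASE-X family with observed values x_1, x_2, ..., x_n. The log-likelihood function for the parameter vector Θ = (a, ξ^T)^T is
$$\ell(\Theta)=n\log\left(\frac{2a}{\pi}\right)+\sum_{i=1}^{n}\log f(x_i;\xi)+(a-1)\sum_{i=1}^{n}\log F(x_i;\xi)-\frac{1}{2}\sum_{i=1}^{n}\log\left[1-F(x_i;\xi)^{2a}\right]. \tag{17}$$
The log-likelihood function can be maximized either directly, using R (AdequacyModel package), SAS (PROC NLMIXED), or Ox (sub-routine MaxBFGS), or by solving the nonlinear likelihood equations obtained by differentiating equation (17):
$$
\begin{aligned}
\frac{\partial\ell}{\partial a}&=\frac{n}{a}+\sum_{i=1}^{n}\log F(x_i;\xi)+\sum_{i=1}^{n}\frac{F(x_i;\xi)^{2a}\log F(x_i;\xi)}{1-F(x_i;\xi)^{2a}}=0,\\
\frac{\partial\ell}{\partial\xi_k}&=\sum_{i=1}^{n}\frac{\partial f(x_i;\xi)/\partial\xi_k}{f(x_i;\xi)}+(a-1)\sum_{i=1}^{n}\frac{\partial F(x_i;\xi)/\partial\xi_k}{F(x_i;\xi)}+a\sum_{i=1}^{n}\frac{F(x_i;\xi)^{2a-1}\,\partial F(x_i;\xi)/\partial\xi_k}{1-F(x_i;\xi)^{2a}}=0.
\end{aligned} \tag{18}
$$
The log-likelihood function for the ASE-W model reduces to
$$\ell(\alpha,c,a)=n\log\left(\frac{2a\alpha c}{\pi}\right)+(\alpha-1)\sum_{i=1}^{n}\log x_i-c\sum_{i=1}^{n}x_i^{\alpha}+(a-1)\sum_{i=1}^{n}\log u_i-\frac{1}{2}\sum_{i=1}^{n}\log\left(1-u_i^{2a}\right), \tag{19}$$
where u_i = 1 − e^{−c x_i^α}. The nonlinear likelihood equations are obtained by differentiating the last equation as follows, where all sums run over i = 1, ..., n:
$$
\begin{aligned}
\frac{\partial\ell}{\partial\alpha}&=\frac{n}{\alpha}+\sum\log x_i-c\sum x_i^{\alpha}\log x_i+(a-1)c\sum\frac{x_i^{\alpha}\log(x_i)\,e^{-c x_i^{\alpha}}}{u_i}+ac\sum\frac{x_i^{\alpha}\log(x_i)\,e^{-c x_i^{\alpha}}u_i^{2a-1}}{1-u_i^{2a}}=0,\\
\frac{\partial\ell}{\partial c}&=\frac{n}{c}-\sum x_i^{\alpha}+(a-1)\sum\frac{x_i^{\alpha}e^{-c x_i^{\alpha}}}{u_i}+a\sum\frac{x_i^{\alpha}e^{-c x_i^{\alpha}}u_i^{2a-1}}{1-u_i^{2a}}=0,\\
\frac{\partial\ell}{\partial a}&=\frac{n}{a}+\sum\log u_i+\sum\frac{u_i^{2a}\log u_i}{1-u_i^{2a}}=0.
\end{aligned} \tag{20}
$$
Complexity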
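Hand-derived likelihood equations are easy to get wrong, so a quick consistency check is useful. This Python sketch codes the ASE-W log-likelihood and its analytic score for θ = (α, c, a), with u_i = 1 − e^{−c x_i^α}, and compares the score with central finite differences of the log-likelihood on a simulated sample. The function names, seed, sample size, and evaluation point are arbitrary assumptions for illustration.

```python
import numpy as np


def asew_loglik(theta, x):
    """ASE-W log-likelihood, theta = (alpha, c, a)."""
    alpha, c, a = theta
    u = -np.expm1(-c * x**alpha)                      # u_i = 1 - exp(-c x_i^alpha)
    return (x.size * np.log(2.0 * a * alpha * c / np.pi) + (alpha - 1.0) * np.sum(np.log(x))
            - c * np.sum(x**alpha) + (a - 1.0) * np.sum(np.log(u))
            - 0.5 * np.sum(np.log1p(-u ** (2.0 * a))))


def asew_score(theta, x):
    """Analytic partial derivatives (d/d alpha, d/d c, d/d a) of the log-likelihood."""
    alpha, c, a = theta
    n = x.size
    xa, lx = x**alpha, np.log(x)
    w = np.exp(-c * xa)                               # e^{-c x_i^alpha}
    u = -np.expm1(-c * xa)
    one_minus_u2a = -np.expm1(2.0 * a * np.log(u))   # 1 - u^(2a)
    ratio = u ** (2.0 * a - 1.0) / one_minus_u2a     # u^(2a-1) / (1 - u^(2a))
    d_alpha = (n / alpha + lx.sum() - c * np.sum(xa * lx)
               + (a - 1.0) * c * np.sum(xa * lx * w / u) + a * c * np.sum(xa * lx * w * ratio))
    d_c = n / c - xa.sum() + (a - 1.0) * np.sum(xa * w / u) + a * np.sum(xa * w * ratio)
    d_a = n / a + np.sum(np.log(u)) + np.sum(u ** (2.0 * a) * np.log(u) / one_minus_u2a)
    return np.array([d_alpha, d_c, d_a])


def numerical_gradient(f, theta, h=1e-6):
    """Central finite differences, used only as an independent check."""
    theta = np.asarray(theta, dtype=float)
    grad = np.empty_like(theta)
    for k in range(theta.size):
        e = np.zeros_like(theta)
        e[k] = h
        grad[k] = (f(theta + e) - f(theta - e)) / (2.0 * h)
    return grad


rng = np.random.default_rng(3)
a, alpha, c = 1.9, 1.6, 0.6
u0 = rng.uniform(size=200)
x = (-np.log1p(-np.sin(np.pi * u0 / 2.0) ** (1.0 / a)) / c) ** (1.0 / alpha)   # inverse-cdf sample
theta = np.array([1.4, 0.8, 1.5])                    # an arbitrary evaluation point
analytic = asew_score(theta, x)
numeric = numerical_gradient(lambda t: asew_loglik(t, x), theta)
print(analytic)
print(numeric)
```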

Monte Carlo Simulation Study.
In this section, we perform a comprehensive simulation study to assess the behavior of the MLEs of the ASE-W parameters. Random numbers are generated via the inverse cdf. The inversion and the MLEs are computed using the R function optim() with method L-BFGS-B. We generate N = 1000 samples of sizes n = 25, 100, 300, 600, 900, and 1000 from the ASE-W distribution with selected true parameter values. For each parameter combination and sample size, we empirically calculate the mean, bias, and mean square error (MSE) of the MLEs.
Coverage probabilities (CPs) of the 95% confidence intervals (CIs) are also calculated. The simulation results are provided in Tables 2 and 3. Based on these results, the MLEs behave as expected: the MSEs and the estimated biases decrease as n increases. Furthermore, the mean estimates tend to the true parameter values as n increases, which supports the consistency of the MLEs.
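A scaled-down version of this simulation can be sketched in Python. It mirrors the described steps (inverse-cdf generation, L-BFGS-B maximization, then mean, bias, and MSE for each sample size), but it uses SciPy's `minimize(method="L-BFGS-B")` in place of R's `optim()`, only 100 replications, two sample sizes, and the true values α = 1.6, c = 0.6, a = 1.9 taken from the BB study below. The box bounds, starting point, and penalty value are assumptions, and coverage probabilities are not computed.

```python
import numpy as np
from scipy.optimize import minimize


def asew_quantile(u, a, alpha, c):
    """ASE-W qf; log sin(pi u/2) = log1p(-2 sin^2(pi (1-u)/4)) avoids cancellation near u = 1."""
    log_s = np.log1p(-2.0 * np.sin(np.pi * (1.0 - u) / 4.0) ** 2)
    return (-np.log(-np.expm1(log_s / a)) / c) ** (1.0 / alpha)


def neg_loglik(theta, x):
    """Minus the ASE-W log-likelihood, theta = (alpha, c, a)."""
    alpha, c, a = theta
    u = -np.expm1(-c * x**alpha)
    with np.errstate(divide="ignore", invalid="ignore"):
        ll = (x.size * np.log(2.0 * a * alpha * c / np.pi) + (alpha - 1.0) * np.log(x).sum()
              - c * (x**alpha).sum() + (a - 1.0) * np.log(u).sum()
              - 0.5 * np.log1p(-u ** (2.0 * a)).sum())
    return -ll if np.isfinite(ll) else 1e10          # large penalty for numerically invalid points


def fit_asew(x, start=(1.0, 1.0, 1.0)):
    """MLE by L-BFGS-B with box constraints keeping all parameters positive."""
    res = minimize(neg_loglik, np.asarray(start), args=(x,), method="L-BFGS-B",
                   bounds=[(1e-3, 50.0)] * 3)
    return res.x


def simulate(true, sizes, n_rep, seed=1):
    """Mean, bias, and MSE of the MLEs over n_rep samples for each sample size."""
    rng = np.random.default_rng(seed)
    true = np.asarray(true, dtype=float)
    alpha, c, a = true
    table = {}
    for n in sizes:
        est = np.array([fit_asew(asew_quantile(rng.uniform(size=n), a, alpha, c))
                        for _ in range(n_rep)])
        table[n] = {"mean": est.mean(axis=0),
                    "bias": est.mean(axis=0) - true,
                    "mse": ((est - true) ** 2).mean(axis=0)}
    return table


results = simulate(true=(1.6, 0.6, 1.9), sizes=(50, 200), n_rep=100)
for n, row in results.items():
    print(n, np.round(row["mean"], 3), np.round(row["bias"], 3), np.round(row["mse"], 3))
```

Increasing `n_rep` and the list of sample sizes brings the sketch closer to the design described above, at a proportional cost in run time.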

Simulations Using the Barzilai-Borwein Algorithm.
In this section, we provide the results of a simulation study for the ASE-W distribution using the Barzilai-Borwein (BB) algorithm [37]. Initial values for the parameters (α = 1.6, c = 0.6, and a = 1.9) are selected, and random samples of sizes n = 50, 100, 200, and 400 are generated. Each experiment is repeated 10,000 times, and the averages of the simulated MLEs (α̂, ĉ, â) along with their MSEs are calculated. The simulation results are provided in Table 4.
From the simulation results in Table 4, we can see that the maximum likelihood estimates of the ASE-W parameters converge as n increases. A graphical display of the maximum likelihood estimates of the ASE-W parameters is provided in Figure 5.
From Figure 5, the estimation errors of all ASE-W parameter estimates decrease at least as fast as n^{−1/2}. Therefore, we conclude that the MLEs of the ASE-W parameters are √n-consistent.

Actuarial Measures
One of the most important tasks of financial and actuarial institutions is to evaluate the exposure to market risk in a portfolio of instruments, which arises from changes in underlying variables such as equity prices, interest rates, or exchange rates. In this section, we derive two important risk measures, value at risk (VaR) and tail value at risk (TVaR), for the ASE-W distribution; these measures play a crucial role in portfolio optimization under uncertainty.

Value at Risk.
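The VaR is widely used by practitioners as a standard financial market risk measure. It is also called the quantile premium principle or quantile risk measure. The VaR is always specified with a given degree of confidence, say q (typically 90%, 95%, or 99%), and it represents the loss in portfolio value that will be equaled or exceeded only (1 − q) × 100 percent of the time. The VaR of a random variable X is the qth quantile of its cdf [38]. Hence, using equation (10), the VaR of the ASE-W distribution is
$$\mathrm{VaR}_q=\left\{-\frac{1}{c}\log\left[1-\left(\sin\frac{\pi q}{2}\right)^{1/a}\right]\right\}^{1/\alpha}. \tag{21}$$

Tail Value at Risk.
Another important measure is TVaR, which quantifies the expected loss given that the loss exceeds the VaR at level q. If X follows the ASE-W distribution, then its TVaR is defined as
$$\mathrm{TVaR}_q=E\left(X\mid X>\mathrm{VaR}_q\right)=\frac{1}{1-q}\int_{\mathrm{VaR}_q}^{\infty}x\,g(x;a,\alpha,c)\,dx. \tag{22}$$
Substituting equation (8) in equation (22), we obtain
$$\mathrm{TVaR}_q=\frac{2a\alpha c}{\pi(1-q)}\int_{\mathrm{VaR}_q}^{\infty}\frac{x^{\alpha}e^{-c x^{\alpha}}\left(1-e^{-c x^{\alpha}}\right)^{a-1}}{\sqrt{1-\left(1-e^{-c x^{\alpha}}\right)^{2a}}}\,dx. \tag{23}$$
Finally, expanding the square root term with the series in equation (14) and (1 − e^{−cx^α})^{a(2k+1)−1} with the generalized binomial series, the TVaR of the ASE-W model takes the form
$$\mathrm{TVaR}_q=\frac{2a}{\pi(1-q)\,c^{1/\alpha}}\sum_{k=0}^{\infty}\sum_{j=0}^{\infty}\binom{2k}{k}\frac{(-1)^{j}}{4^{k}}\binom{a(2k+1)-1}{j}\frac{\Gamma\!\left(1+\frac{1}{\alpha},\,(j+1)\,c\,\mathrm{VaR}_q^{\alpha}\right)}{(j+1)^{1+1/\alpha}}, \tag{24}$$
where Γ(s, z) = ∫_z^∞ t^{s−1} e^{−t} dt denotes the upper incomplete gamma function.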
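VaR and TVaR can also be evaluated numerically without summing a double series. The sketch below assumes SciPy and uses illustrative parameter values (not fitted estimates). It computes VaR_q from the ASE-W quantile function and TVaR_q through the identity TVaR_q = (1/(1 − q)) ∫_q^1 Q(u) du, which holds for continuous distributions. For comparison, it also returns the closed-form VaR and TVaR of the baseline Weibull with the same α and c.

```python
import math
from scipy.integrate import quad
from scipy.special import gamma, gammaincc


def asew_quantile(u, a, alpha, c):
    """ASE-W qf; 1 - sin(pi u/2)^(1/a) is formed without cancellation near u = 1."""
    if u >= 1.0:
        return math.inf
    log_s = math.log1p(-2.0 * math.sin(math.pi * (1.0 - u) / 4.0) ** 2)   # log sin(pi u / 2)
    return (-math.log(-math.expm1(log_s / a)) / c) ** (1.0 / alpha)


def asew_var(q, a, alpha, c):
    """VaR_q is the q-th quantile of the ASE-W distribution."""
    return asew_quantile(q, a, alpha, c)


def asew_tvar(q, a, alpha, c):
    """TVaR_q = E[X | X > VaR_q] = (1/(1-q)) * int_q^1 Q(u) du for a continuous X."""
    integral, _ = quad(lambda u: asew_quantile(u, a, alpha, c), q, 1.0, limit=200)
    return integral / (1.0 - q)


def weibull_var_tvar(q, alpha, c):
    """Closed forms for the baseline Weibull F(x) = 1 - exp(-c x^alpha)."""
    var = (-math.log1p(-q) / c) ** (1.0 / alpha)
    s = 1.0 + 1.0 / alpha
    # E[X; X > v] = c^(-1/alpha) Gamma(1 + 1/alpha, c v^alpha); gammaincc is the regularized upper gamma
    partial_mean = c ** (-1.0 / alpha) * gammaincc(s, c * var**alpha) * gamma(s)
    return var, partial_mean / (1.0 - q)


a, alpha, c = 1.9, 1.6, 0.6        # illustrative values, not fitted estimates
for q in (0.90, 0.95, 0.99):
    print(q, asew_var(q, a, alpha, c), asew_tvar(q, a, alpha, c), weibull_var_tvar(q, alpha, c))
```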

Numerical Study of the Actuarial Measures.
In this section, we provide numerical results for the VaR and TVaR of the Weibull and ASE-W distributions for different sets of parameters. The process is described below:
(i) Random samples of size n = 150 are generated from the Weibull and ASE-W distributions, and the parameters are estimated via the maximum likelihood method.
(ii) This is repeated 1000 times to calculate the VaR and TVaR of the two distributions.
The simulation results of the VaR and TVaR for the Weibull and ASE-W models, obtained for selected values of their parameters, are provided in Tables 5 and 6 and depicted graphically in Figures 6 and 7, respectively. A model with higher values of VaR and TVaR is said to have a heavier tail. The results in Tables 5 and 6 and the plots in Figures 6 and 7 show that the proposed ASE-W model has higher values of these risk measures than the Weibull model. Hence, the proposed ASE-W model has a heavier tail than the Weibull distribution and can be used effectively to model heavy tailed insurance data.
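The comparison above is numerical. A short asymptotic sketch, assuming the ASE-W cdf G(x) = (2/π) arcsin[(1 − e^{−cx^α})^a], indicates why the ASE-W tail is heavier than that of the baseline Weibull with the same α and c. Write ε = e^{−cx^α} = 1 − F(x) for the baseline survival function, which tends to 0 as x → ∞. The steps use 1 − (2/π) arcsin y = (2/π) arccos y, arccos y = 2 arcsin √((1 − y)/2), (1 − ε)^a = 1 − aε + O(ε²), and arcsin z ∼ z as z → 0:

```latex
\begin{aligned}
1-G(x) &= \frac{2}{\pi}\arccos\!\left[(1-\varepsilon)^{a}\right]
        = \frac{4}{\pi}\arcsin\sqrt{\frac{1-(1-\varepsilon)^{a}}{2}} \\
       &\sim \frac{4}{\pi}\sqrt{\frac{a\varepsilon}{2}}
        = \frac{2\sqrt{2a}}{\pi}\,e^{-c x^{\alpha}/2}, \qquad x \to \infty, \\
\frac{1-G(x)}{1-F(x)} &\sim \frac{2\sqrt{2a}}{\pi}\,e^{c x^{\alpha}/2} \longrightarrow \infty .
\end{aligned}
```

In words, the ASE-W survival function decays like a Weibull tail with c replaced by c/2, so for every a, α, c > 0 it eventually dominates the baseline Weibull tail. This is in line with the larger VaR and TVaR values reported in Tables 5 and 6.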

Modeling Heavy Tailed Insurance Claims Data
In this section, we demonstrate the flexibility of the ASE-W distribution using heavy tailed insurance claims data. Furthermore, we calculate the actuarial measures of the competing distributions, which include (2) the EE distribution, (3) the EW distribution, (4) the EL distribution, (5) the Ku-W distribution, (6) the BW distribution, and (7) the NWB-XII distribution. The competing models can be compared using some discrimination measures. Table 7 gives the MLEs and their standard errors. The analytical measures are provided in Tables 8 and 9. The results in these tables indicate that the ASE-W distribution provides better fits than the other competing models and could be chosen as an adequate model for the heavy tailed insurance claims data. Figure 8 displays the fitted pdf and cdf of the proposed distribution, which show that the ASE-W model fits the right-skewed heavy tailed data very well. The probability-probability (PP) plot and the Kaplan-Meier survival plot are sketched in Figure 9.
Figure 6: Graphical sketching of the VaR and TVaR using the results in Table 5.
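The list of discrimination measures is not reproduced in the text above. As a hedged sketch, the Python code below computes criteria commonly used for this purpose (AIC, the small-sample corrected AIC, BIC, and HQIC) from a maximized log-likelihood, plus a Kolmogorov-Smirnov statistic via `scipy.stats.kstest` with the ASE-W cdf. Which criteria the paper actually reports is an assumption here, and the simulated sample and parameter values are hypothetical. The KS p-value is only exact when the parameters are not estimated from the same data.

```python
import math
import numpy as np
from scipy import stats


def information_criteria(loglik, k, n):
    """Discrimination measures from a maximized log-likelihood (k parameters, n observations)."""
    return {
        "AIC": -2.0 * loglik + 2.0 * k,
        "AICc": -2.0 * loglik + 2.0 * k * n / (n - k - 1.0),   # corrected AIC, often labelled CAIC
        "BIC": -2.0 * loglik + k * math.log(n),
        "HQIC": -2.0 * loglik + 2.0 * k * math.log(math.log(n)),
    }


def asew_cdf(x, a, alpha, c):
    return (2.0 / np.pi) * np.arcsin((-np.expm1(-c * x**alpha)) ** a)


def asew_logpdf(x, a, alpha, c):
    u = -np.expm1(-c * x**alpha)
    return (np.log(2.0 * a * alpha * c / np.pi) + (alpha - 1.0) * np.log(x) - c * x**alpha
            + (a - 1.0) * np.log(u) - 0.5 * np.log1p(-u ** (2.0 * a)))


rng = np.random.default_rng(7)
a, alpha, c = 1.9, 1.6, 0.6                          # hypothetical parameter values
u = rng.uniform(size=300)
x = (-np.log1p(-np.sin(np.pi * u / 2.0) ** (1.0 / a)) / c) ** (1.0 / alpha)   # inverse-cdf sample
ll = asew_logpdf(x, a, alpha, c).sum()
crit = information_criteria(ll, k=3, n=x.size)
ks = stats.kstest(x, asew_cdf, args=(a, alpha, c))   # sup distance between the ECDF and the model cdf
print(crit)
print(ks.statistic, ks.pvalue)
```

Lower values of each criterion favor a model; in practice `ll` would be the maximized log-likelihood of each fitted competitor.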

Estimation of VaR and TVaR Measures Using the Insurance Claims Data.
In this section, we compute the VaR and TVaR measures of the ASE-W and other competing distributions using the parameter estimates obtained from the insurance claims data. The numerical results for all fitted distributions are reported in Table 10.
The results in Table 10 are displayed graphically in Figure 10.
As mentioned earlier, a distribution with higher values of the risk measures is said to have a heavier tail. The values in Table 10 and Figure 10 show that the ASE-W distribution has the highest VaR and TVaR among all competing models, indicating that it has a heavier tail than its competitors for the insurance claims data.
Figure 10: Graphical sketching of the VaR and TVaR using the results in Table 10 for insurance claims data.

Validation of the ASE-W Distribution
Goodness-of-fit tests indicate whether it is reasonable to assume that a random sample comes from a specific distribution. Statistical techniques often rely on observations from a population with a distribution of a specific form, so selecting a suitable model is of great importance in all types of statistical analysis. For this purpose, many goodness-of-fit tests have been proposed. Nikulin [39,40] proposed a modification of the standard Pearson chi-squared test for continuous distributions. Rao and Robson [41] obtained the same result for the exponential family, and this statistic was later adopted by other researchers under the name Rao-Robson-Nikulin (RRN) test.
In this section, we use a further goodness-of-fit test, based on the NRR statistic, to show the validity of the ASE-W distribution for heavy tailed insurance data and its utility in insurance and financial sciences.

Nikulin-Rao-Robson Test Statistic.
So far, a number of methods have been proposed in the literature to verify the adequacy and goodness-of-fit of statistical models. Since the 1970s, researchers have shown deep interest in proposing new modifications of goodness-of-fit tests. In this regard, Nikulin [42] and Rao and Robson [41] independently proposed a modification of the Pearson statistic for complete data, known as the Nikulin-Rao-Robson (NRR) statistic. To test the hypothesis
$$H_0:\ P\left(X_i\le x\right)=G(x;\xi),\quad x\in\mathbb{R},$$
where ξ = (ξ_1, ..., ξ_s)^T represents the vector of unknown parameters, the NRR statistic is denoted by Y² and is defined as follows.
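Suppose the observations X_1, X_2, ..., X_n are grouped into r mutually disjoint subintervals I_1, I_2, ..., I_r:
$$I_j=(a_{j-1},a_j],\quad j=1,\dots,r,\qquad a_0=-\infty,\ a_r=+\infty.$$
The limits a_j of the intervals I_j are chosen such that the cell probabilities
$$p_j(\xi)=\int_{a_{j-1}}^{a_j}g(x;\xi)\,dx,\quad j=1,\dots,r,$$
are equal, that is, a_j = G^{-1}(j/r), j = 1, ..., r − 1, so that p_j = 1/r. If ν = (ν_1, ..., ν_r)^T is the vector of frequencies obtained by grouping the data into these intervals, the NRR statistic is given by
$$Y_n^2(\hat\xi_n)=X_n^2(\hat\xi_n)+\frac{1}{n}\,L^{T}(\hat\xi_n)\left[I_n(\hat\xi_n)-J(\hat\xi_n)\right]^{-1}L(\hat\xi_n),$$
where
$$X_n^2(\hat\xi_n)=\sum_{j=1}^{r}\frac{\left[\nu_j-np_j(\hat\xi_n)\right]^2}{np_j(\hat\xi_n)},\qquad L_k(\xi)=\sum_{j=1}^{r}\frac{\nu_j}{p_j(\xi)}\frac{\partial p_j(\xi)}{\partial\xi_k},\quad k=1,\dots,s,$$
L(ξ) = (L_1(ξ), ..., L_s(ξ))^T, and J(ξ) is the information matrix for the grouped data, defined by
$$J(\xi)=B^{T}(\xi)B(\xi),\qquad B(\xi)=\left[\frac{1}{\sqrt{p_j(\xi)}}\frac{\partial p_j(\xi)}{\partial\xi_k}\right]_{r\times s},\quad j=1,\dots,r,\ k=1,\dots,s.$$
Here, I_n(ξ̂_n) represents the estimated Fisher information matrix per observation and ξ̂_n is the MLE of the parameter vector. The Y² statistic asymptotically follows a chi-squared distribution with r − 1 degrees of freedom.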
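The NRR statistic can be sketched in Python for the ASE-W model. This is an illustrative implementation under stated assumptions, not the authors' code (which used R). It uses equiprobable cells from the fitted quantile function, computes the derivatives ∂p_j/∂ξ_k by central differences with the cell limits held fixed, and takes I_n(ξ̂_n) as the observed information per observation from a numerical Hessian. The helper names, step sizes, simulated data, and r = 10 are hypothetical choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2


def asew_cdf(x, theta):
    """ASE-W cdf with theta = (alpha, c, a)."""
    alpha, c, a = theta
    return (2.0 / np.pi) * np.arcsin((-np.expm1(-c * np.asarray(x) ** alpha)) ** a)


def asew_quantile(u, theta):
    alpha, c, a = theta
    log_s = np.log1p(-2.0 * np.sin(np.pi * (1.0 - np.asarray(u)) / 4.0) ** 2)   # log sin(pi u/2)
    return (-np.log(-np.expm1(log_s / a)) / c) ** (1.0 / alpha)


def asew_loglik(theta, x):
    alpha, c, a = theta
    u = -np.expm1(-c * x**alpha)
    with np.errstate(divide="ignore", invalid="ignore"):
        ll = np.sum(np.log(2 * a * alpha * c / np.pi) + (alpha - 1) * np.log(x)
                    - c * x**alpha + (a - 1) * np.log(u) - 0.5 * np.log1p(-u ** (2 * a)))
    return ll if np.isfinite(ll) else -1e10


def hessian(f, theta, h=1e-4):
    """Central-difference Hessian of a scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    E = np.eye(theta.size) * h
    H = np.empty((theta.size, theta.size))
    for i in range(theta.size):
        for j in range(theta.size):
            H[i, j] = (f(theta + E[i] + E[j]) - f(theta + E[i] - E[j])
                       - f(theta - E[i] + E[j]) + f(theta - E[i] - E[j])) / (4.0 * h * h)
    return H


def nrr_statistic(x, theta_hat, r):
    """Nikulin-Rao-Robson Y^2 for the ASE-W model with r equiprobable cells."""
    x = np.asarray(x, dtype=float)
    theta_hat = np.asarray(theta_hat, dtype=float)
    n, s = x.size, theta_hat.size
    limits = asew_quantile(np.arange(1, r) / r, theta_hat)     # a_1, ..., a_{r-1}, held fixed
    # 0-based cell index j with a_{j-1} < x <= a_j
    nu = np.bincount(np.searchsorted(limits, x, side="left"), minlength=r).astype(float)

    def cell_probs(theta):
        G = np.concatenate(([0.0], asew_cdf(limits, theta), [1.0]))
        return np.diff(G)

    p = cell_probs(theta_hat)                                    # 1/r up to rounding
    step = 1e-6
    dp = np.column_stack([(cell_probs(theta_hat + step * e) - cell_probs(theta_hat - step * e))
                          / (2.0 * step) for e in np.eye(s)])    # r x s matrix dp_j/dtheta_k
    B = dp / np.sqrt(p)[:, None]
    J = B.T @ B                                                  # grouped-data information
    I = -hessian(lambda t: asew_loglik(t, x), theta_hat) / n     # observed information per observation
    L = (nu / p) @ dp                                            # L_k = sum_j nu_j / p_j * dp_j/dtheta_k
    X2 = np.sum((nu - n * p) ** 2 / (n * p))                     # Pearson part
    return X2 + L @ np.linalg.solve(I - J, L) / n


rng = np.random.default_rng(11)
true = (1.6, 0.6, 1.9)                                           # (alpha, c, a): illustrative values
x = asew_quantile(rng.uniform(size=500), true)
fit = minimize(lambda t: -asew_loglik(t, x), x0=[1.0, 1.0, 1.0],
               method="L-BFGS-B", bounds=[(1e-3, 50.0)] * 3)
r = 10
y2 = nrr_statistic(x, fit.x, r)
crit = chi2.ppf(0.95, df=r - 1)
print(fit.x, y2, crit, "reject H0" if y2 > crit else "do not reject H0")
```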

Modified Chi-Squared Test for the ASE-W Distribution.
A modified chi-squared goodness-of-fit test is constructed by adapting the Y² statistic developed in the previous section to verify whether a sample X = (X_1, X_2, ..., X_n)^T is distributed according to the ASE-W model, P(X_i ≤ x) = G_{ASE-W}(x; ξ), with unknown parameters ξ = (α, c, a)^T.
The MLEs ξ̂_n of the unknown parameters of the ASE-W distribution are computed using the insurance claims data. Since the limiting distribution of the Y² statistic does not depend on the parameters, we can use the estimated Fisher information matrix I_n(ξ̂_n).
To test the null hypothesis H_0 that the insurance claims data come from the ASE-W distribution, we use the Y² statistic. To conduct the analysis, we use the BB algorithm in R to compute the maximum likelihood estimates, given by α̂ = … The value of the NRR statistic is Y² = 26.524781, whereas the critical value is χ²_{0.05}(23 − 1) = 33.92444. Since Y² is less than the critical value, H_0 is not rejected at the 5% significance level, and we conclude that the ASE-W model provides an adequate fit to the insurance claims data.
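The reported decision can be reproduced from the two numbers given. A minimal check with SciPy's chi-squared distribution, assuming r = 23 cells as implied by χ²_{0.05}(23 − 1), recomputes the critical value and an approximate p-value for the reported Y².

```python
from scipy.stats import chi2

y2 = 26.524781          # NRR statistic reported for the insurance claims data
r = 23                  # number of cells implied by the critical value chi^2_{0.05}(23 - 1)
crit = chi2.ppf(1.0 - 0.05, df=r - 1)
p_value = chi2.sf(y2, df=r - 1)
print(round(crit, 5), round(p_value, 3), "reject H0" if y2 > crit else "do not reject H0")
```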

Simulation Study of the ASE-W Distribution Using the Y² Statistic.
To test the null hypothesis H_0 that the sample comes from the ASE-W model, we calculate Y² for 10,000 simulated samples of sizes n = 50, 100, 200, and 400. For different significance levels (ε = 0.01, 0.02, 0.05, 0.1), we calculate the proportion of nonrejections of the null hypothesis, i.e., of samples with Y² ≤ χ²_ε(r − 1). The corresponding empirical and theoretical levels are presented in Table 11. As can be seen, the empirical levels are very close to the corresponding theoretical levels. Thus, we conclude that the proposed test attains its nominal levels and is well suited to the ASE-W distribution.

Simulated Distribution of the Y² Statistic for the ASE-W Model.
The Y² statistic asymptotically follows a chi-squared distribution with k = r − 1 degrees of freedom. To demonstrate this, we simulate Y²(ξ̂_n) N = 10,000 times under the null hypothesis H_0 for different parameter values with r = 10 intervals, and we plot the chi-squared density with k = r − 1 = 9 degrees of freedom for visual comparison. The histograms of the Y² statistic versus the chi-squared distribution with k = 9 degrees of freedom are presented in Figures 11 and 12.
From Figures 11 and 12, we observe that, for different parameter values and different numbers of equiprobable grouping intervals, the distribution of Y² follows the limiting chi-squared distribution with k degrees of freedom within the statistical error of simulation. Therefore, the limiting distribution of the generalized chi-squared Y² statistic for the ASE-W model is distribution free.

Concluding Remarks
In this paper, we use a trigonometric function to introduce a new family of heavy tailed distributions called the arcsine exponentiated-X (ASE-X) family. We define a special submodel called the ASE-Weibull (ASE-W) distribution, which provides better fits to heavy tailed insurance data than the competing models considered. Maximum likelihood is used to estimate the ASE-W parameters, and the performance of the maximum likelihood estimators is assessed through simulations based on the inversion method and the Barzilai-Borwein algorithm. We derive two important risk measures, value at risk and tail value at risk, for the ASE-W distribution and perform a simulation study showing that the ASE-W distribution has a heavier tail than the baseline Weibull distribution. A heavy tailed insurance dataset is analyzed, showing that the ASE-W distribution provides better fits than some other competing models. Furthermore, the value at risk and tail value at risk measures are estimated for all competing models based on the insurance claims data, showing that the ASE-W distribution performs better than its competitors. We also construct a modified chi-squared goodness-of-fit test for the ASE-W distribution, based on the NRR statistic, to show its validity in modeling financial data. The special cases in Table 1 can be studied in future work. Furthermore, different classical and Bayesian methods can be employed to estimate the unknown parameters of these special submodels.

Data Availability
This work is mainly a methodological development and has been applied to secondary insurance data; if required, the data will be provided.

Conflicts of Interest
The authors declare that they have no conflicts of interest.