On Modeling the Earthquake Insurance Data via a New Member of the T-X Family

Heavy-tailed distributions play an important role in modeling data in actuarial and financial sciences. In this article, a new method is suggested to define new distributions suitable for modeling data with a heavy right tail. The proposed method may be named as the Z-family of distributions. For illustrative purposes, a special submodel of the proposed family, called the Z-Weibull distribution, is considered in detail to model data with a heavy right tail. The method of maximum likelihood estimation is adopted to estimate the model parameters. A brief Monte Carlo simulation study for evaluating the maximum likelihood estimators is done. Furthermore, some actuarial measures such as value at risk and tail value at risk are calculated. A simulation study based on these actuarial measures is also done. An application of the Z-Weibull model to the earthquake insurance data is presented. Based on the analyses, we observed that the proposed distribution can be used quite effectively in modeling heavy-tailed data in insurance sciences and other related fields. Finally, Bayesian analysis and performance of Gibbs sampling for the earthquake data have also been carried out.


Introduction
In a number of applied areas such as finance and actuarial sciences, data sets are most often positive, and the respective distribution is unimodal hump-shaped and skewed to right having heavier tails as compared to the well-known classical distributions. ese distributions are not much flexible to adequately model such types of heavy-tailed data sets. For example, (i) the Pareto distribution, which is frequently used to model financial data sets, does not provide a reasonable fit for many applications, for example, if we are interested in modeling only especially moderate-to-large losses altogether, then in such cases, the Pareto distribution may not be a suitable choice to use [1], and (ii) the Weibull model is capable of catering the behavior of small losses very closely, but, unfortunately, fails to provide an adequate fit to the large losses [2]. In such circumstances, the utilization of the heavy-tailed models may be a good choice to apply. For positive data, heavy-tailed distributions are those whose right-tail probabilities are greater than the exponential one [3], that is, where F(x; ξ) is the cumulative distribution function (cdf ) depending on the parameter vector ξ ∈ R. Due to the usefulness and flexibility of the heavy-tailed models in financial and actuarial practice, actuaries are always intended to propose new statistical distributions. erefore, serious attempts have been made to propose new statistical models and are still growing rapidly. e new contribution is made via different approaches such as (i) transformation of variables, (ii) composition of two or more distributions, (iii) compounding of distributions, and (iv) finite mixture of distributions, see Ahmad et al. [4]. use as they are defined on R. In such circumstances, the transformation of variable approach, especially, the exponential transformation, has demonstrated to be considerable; for details, see Azzalini et al. [9]. Bagnato and Punzo [10] showed that the transformation method of introducing new distributions is easy to use; however, most often, the inferences become complicated.
Another useful method of proposing new versatile heavy-tailed distributions, which provide the best fit to the heavy-tailed losses, is the methodology of composition, see Paula et al. [11], Klugman et al. [12], Nadarajah and Bakar [13] and Bakar et al. [14]. However, it ought to be noted that the new statistical models introduced via this approach involve more than three parameters inflicting difficulties in estimating the parameters, and more computational efforts are needed.
Another approach of introducing new distributions to cater data modeling adequately with unimodality is compounding of distributions, see Punzo et al. [15] and Mazza and Punzo [16]. Unfortunately, the density function of the distributions obtained by this approach might not have a closed-form expression that makes the estimation more complicated as shown in Punzo et al. [15]. e method of finite mixture models is another prominent approach to obtain new, very flexible models which are able to capture, for example, multimodality of the distribution under consideration, see Bernardi et al. [17], Miljkovic and Grün [18], and Punzo et al. [15]. No doubt, the distributions obtained via this approach are much flexible, but the inferences become more complicated and computationally challenging.
Furthermore, Dutta and Perry [19] performed an empirical analysis of loss distributions, and risk was estimated by different approaches such as exploratory data analysis and other empirical approaches. ese authors rejected the idea of using the exponential, gamma, and Weibull models in modeling insurance losses due to the poor results. ey concluded that one would need to use a model that is flexible enough in its structure. is motivated the researchers to search for more flexible models offering greater accuracy in fitting the heavy-tailed data.
Hence, bringing flexibility to a model by introducing additional parameter(s) is a desirable feature [20][21][22][23][24]. In a number of recent papers, serious attempts have been made to introduce a variety of new heavy-tailed distributions, see Ahmad et al. [25] and Ahmad et al. [26]. Due to the importance of statistical distributions in financial science, a new family of distributions, called the Z-family, is introduced. e proposed family is introduced via the T-X family approach [27]. To illustrate the usefulness of the proposed method, a three-parameter submodel, called the Z-Weibull distribution, is taken and studied in detail. e proposed distribution provides a better description of the earthquake insurance data with possibly heavy tails than the available (i) two-parameter distributions such as Weibull, Burr-XII (B-XII), and generalized exponential (GE), (ii) three-parameter Weibull-claim (W-claim) and exponentiated Lomax (EL), and (iii) four-parameter beta-Weibull (BW) distributions, and possibly many others. e rest of the article is structured in the following way. e proposed method is introduced in Section 2. e Z-Weibull model is considered in Section 3, and the shapes of its probability density function (pdf ) are investigated in the same section. Estimation of parameters is discussed in Section 4. In the same section, a detailed Monte Carlo simulation study is conducted. Actuarial measures of the proposed method along with a simulation study are provided in Section 5. Distribution fit to the earthquake insurance data set is discussed in Section 6. Bayesian analysis as well as Gibbs sampling procedure for the real data set is discussed in Section 7. Future frame work is discussed in Section 8. Finally, some concluding remarks are presented in Section 9.

Development of the Z-Family
Let p(t) be the pdf of a random variable, say T, where T ∈ [n 1 , n 2 ] for − ∞ ≤ n 1 , n 2 ≤ ∞, and let W[F(x; ξ)] be a function of F(x; ξ) of a random variable, say X, depending on the vector parameter ξ satisfying the conditions given in the following: Alzaatreh et al. [27] introduced a general method for generating new families of distributions called the T-X family which is defined by where W[F(x; ξ)] satisfies the conditions mentioned above. e probability density function (pdf ) corresponding to (2) is given by Deploying the T-X proposal, several new classes of distributions have been introduced in the literature [28]. Let X have the exponential distribution with the pdf given by Using θ � 1 in (4), we get On setting W[F(x; ξ)] � − log (1 − F(x; ξ))/(β F(x;ξ) ) and p(t) � e − t in (2), we define the cdf of the Z-family of distributions by where F(x; ξ) is the baseline cdf. e expression in (6) represents a wide family of univariate continuous 2 Computational Intelligence and Neuroscience distributions. Clearly, when β � 1, the cdf of the proposed family derived in (6) becomes identical to the baseline cdf. e pdf corresponding to (6) is given by e survival function (sf ) and hazard rate function (hrf ) corresponding to (6) are, respectively, given by Due to induction of the extra parameter, the Z-family provides greater distributional flexibility. e key motivations for using the Z-family in the practice are as follows: (i) A very useful and simple method of introducing an additional parameter to generalize the existing distributions (ii) To improve the characteristics and flexibility of the existing models (iii) To introduce new distributions having closed form of cdf and sf, as well as hrf (iv) To extend the existing distributions by introducing only one parameter, rather than adding two or more parameters (v) To provide the best fit to heavy-tailed insurance data sets (vi) To provide better fits than other modified models having same or higher number of parameters

The Z-Weibull Distribution
Most of the extended forms of distributions are introduced for one of the following aims: (i) an extension of the existing model to improve its characteristics, (ii) to obtain new distribution having a heavy right tail, and (iii) to introduce a model whose empirical fit is good to data. Here, we discuss the Z-Weibull distribution that can possess at least one of these aims. e Weibull random variable has the cdf and pdf given by F(x; ξ) � 1 − e − cx α and f(x; ξ) � θcx α− 1 e − cx α , respectively, where ξ � (α, c). en, the cdf of the Z-Weibull distribution has the following form: e corresponding density is given by e sf, hrf, and reversed hazard rate function (rhrf ) of the proposed model are given by respectively. Different plots for the pdf of the Z-Weibull distribution for selected parameter values are given in Figure 1

Estimation and Monte Carlo Simulation Study
Several approaches to estimate the model parameter have been introduced in the literature, but the maximum likelihood estimation method is the most commonly employed. e maximum likelihood estimators (MLEs) enjoy several desirable properties and can be used for constructing confidence intervals and regions and also in test statistics. e normal approximation for MLEs in large samples can be easily handled either analytically or numerically. So, we estimate the parameters of the Z-family of distributions from complete samples via the maximum likelihood estimation method. Furthermore, we perform a comprehensive Monte Carlo simulation study to evaluate the performance of the MLEs.

Maximum Likelihood Estimation.
In this section, we obtain the MLEs of the model parameters of the Z-family of distributions from complete samples only. Let x 1 , x 2 , . . . , x n be the observed values from the Z-family of distributions with parameters β and ξ. e total log-likelihood function corresponding to (7) is given by e partial derivatives of (12) are Computational Intelligence and Neuroscience Setting (z/zβ)ℓ(β, ξ) and (z/zξ)ℓ(β, ξ) equal to zero and solving numerically these expressions simultaneously yield the MLEs of (β, ξ).

Monte Carlo Simulation Study.
is section offers a comprehensive simulation study to assess the behavior of the MLEs. e Z-family is easily simulated by inverting (6) as follows: if U has a uniform U (0,1) distribution, then the nonlinear equation is Expression (14) can be used to simulate any special subcase of the Z-family. Here, we consider the Z-Weibull distribution to assess the behavior of the MLEs of the proposed method. We simulate the Z-Weibull distribution for two sets of parameters (set 1: α � 0.9, β � 0.3, and c � 0.5 and set 2: α � 1.3, β � 0.8, and c � 1). e simulation is performed via statistical software R through the library (rootSolve) command mle. e number of Monte Carlo replications made was 750 times. For maximizing the log-likelihood function, we use the method � "L-BFGS-B" algorithm with optim(). e evaluation of the estimators was performed via the following quantities for each sample size: the empirical mean squared errors (MSEs) are calculated using the R package from the Monte Carlo replications. e MLEs are determined for each piece of simulated data, say, (α i , c i , β i ) for i � 1, 2, . . . , 750, and the biases and MSEs are computed, respectively, by for w � α, c, β. We consider the sample sizes of n � 25, 100, 200, 300, 400, 500, 600, 700, 750. e empirical results are given in Tables 1 and 2. Corresponding to Tables 1  and 2, the simulation results are graphically displayed in Figure 1: Plots for the density function of the Z-Weibull distribution.   Tables 1 and 2 and Figures 2 and 3,  the following results are concluded: (i) Biases for all parameters are positive (ii) e parameters tend to be stable (iii) Estimated biases decrease when the sample size n increases (iv) Estimated MSEs decay toward zero when the sample size n increases

Actuarial Measures
One of the most important tasks of actuarial science institutions is to evaluate the exposure to market risk in a portfolio of instruments, which arise from changes in underlying variables such as prices of equity, interest rates, or exchange rates. In this section, we calculate some important risk measures such as value at risk (VaR) and tail value at risk (TVaR) for the proposed distribution, which play a crucial role in portfolio optimization under uncertainty.

Value at Risk.
In the context of actuarial sciences, the VaR is widely used by practitioners as a standard financial market risk measure. It is also known as the quantile risk measure or quantile premium principle. e VaR is always specified with a given degree of confidence, say q (typically 90%, 95%, or 99%), and represents the percentage loss in the portfolio value that will be equaled or exceeded only X percent of the time. VaR of a random variable X is the q th quantile of its cdf, see Artzner [29]. If X follows the proposed method, then the VaR of X is where t is the solution of β t (1 − q) + t − 1 � 0.

Tail Value at Risk.
Another important measure is TVaR, also known as conditional tail expectation (CTE) or tail conditional expectation (TCE), which is used to quantify the expected value of the loss given that an event outside a given probability level has occurred. Let X follow the Z-Weibull distribution; then, TVaR of X is derived as Using (10) in (17), we get Recall the definition of incomplete gamma function in the form Γ(α, x) � ∞ x t α− 1 e − t dt, so from (18), we get

Numerical Study of the Risk
Measures. n this section, we provide numerical study of the VaR and TVaR for the Weibull and Z-Weibull distributions for different sets of parameters. e process is described as follows: (1) Random samples of size n � 100 are generated from the Weibull and Z-Weibull models (2) e parameters have been estimated via the maximum likelihood method  Tables 3 and 4 show that the proposed Z-Weibull model has higher values of the risk measures than the traditional Weibull distribution. e simulation results are graphically displayed in Figures 4 and 5, which show that the proposed model has a heavier tail than the Weibull distribution.

Practical Illustration via the Earthquake
Insurance Data e main applications of the heavy tail models are the socalled extreme value theory or insurance loss phenomena. In this section, we consider heavy-tailed earthquake insurance data to illustrate the usefulness of the Z-Weibull model. e data are reported by the "National Centers for Environmental Information" available at https://ngdc.noaa.gov/hazard/earthqk.shtml. We compare the goodness-of-fit results of the Z-Weibull distribution with the other well-known heavy-tailed distributions.
e distribution functions of the competitive models are as follows: (i) W-claim: (ii) Weibull: (iii) B-XII: (iv) GE: (v) EL: (vi) BW: Next, we consider certain analytical measures in order to verify which distribution fits better the considered data. ese analytical measures include (i) discrimination measures, such as Akaike information criterion (AIC), Bayesian information criterion (BIC), Hannan-Quinn information criterion (HQIC), and consistent Akaike information criterion (CAIC), and (ii) two other goodness-of-fit measures including Anderson-Darling (AD) test statistic and − 2ℓ. e discrimination measures are given as follows.
(vii) Akaike information criterion is given by (viii) Consistent Akaike information criterion is given by (ix) Bayesian information criterion is given by

(x) Hannan-Quinn information criterion is given by
where ℓ denotes the log-likelihood function, k is the number of model parameters, and n is the sample size.  Computational Intelligence and Neuroscience 9 (xi) e AD test statistic is given by where n is the sample size and x i is the i th observation in the sample, calculated when the data are sorted in the ascending order.
All the computations are carried out using the optim() R-function with the argument method � "BFGS" (see Appendix). A model with lowest values for these measures could be chosen as the best model to fit the data. e values of MLEs of the parameters along with standard errors in parenthesis are presented in Table 5, whereas the discrimination measures are displayed in Table 6. e AD statistic and − 2ℓ are provided in Table 7. Based on the considered data set, we have observed that the Z-Weibull distribution is the best fitted model among the above considered models.
As we can see, the results show that the Z-Weibull distribution provides a better fit than the other competitors. Hence, the proposed model can be used as a best  Table 3.  Table 4. candidate model for modeling insurance data sets. Furthermore, in support of Tables 6 and 7, the estimated cdf and pdf are plotted in Figure 6. e Kaplan-Meier survival plot and PP plot are sketched in Figure 7, whereas the QQ plot of the proposed distribution and box plot of the earthquake data are presented in Figure 8. From the estimated pdf in Figure 6, it is clear that the proposed distribution provides an adequate fit to the heavy-tailed earthquake data. From Figures 6 and 7, we can easily detect that the proposed distribution fits the estimated cdf and Kaplan-Meier survival plots very closely. e PP and QQ plots which serve as a tool for graphical display of the analytical measures show that the Z-Weibull distribution provides the best fit to real data. Finally, the box plot ( Figure 8) of the data is graphical evidence that the data possess tail skewed to the right. Furthermore, using the earthquake insurance data, we obtained the values of the Kolmogorov-Smirnov (KS) statistic of the proposed and other competing models. en, we applied the parametric bootstrap technique [30] and bootstrapped the p value for all the competing models.
e KS statistic and the corresponding bootstrapped p value are provided in Table 8. Based on the results provided in Table 8, we conclude that the proposed model is the best candidate model among the competing distributions for modeling the insurance claim data.

Bayesian Estimation
Bayesian inference procedure has been taken into consideration by many statistical researchers, especially those in the field of survival analysis and reliability engineering. In this section, complete sample data are analyzed through the Bayesian point of view. We assume that the parameters α, c, and β of the Z-Weibull distribution have independent prior distributions as follows: Gamma(a, b), where a, b, c, d, e, and f are positive. More about choosing gamma priors, refer Kundu and Howlader [31], S. Dey and T. Dey [32], Dey et al. [33], and Dey et al. [34]. Hence, the joint prior density function is formulated as follows: In the Bayesian estimation, the actual value of the parameter may be adversely affected by the loss when choosing an estimator. is loss can be measured by a function of the parameter and the corresponding estimator. Five well-   known loss functions and associated Bayesian estimators and corresponding posterior risk are presented in Table 9. For more details, see Calabria and Pulcini [35] and Dey et al. [36]. Next, we provide the posterior probability distribution for a complete data set. We define the function φ as e joint posterior distribution in terms of a given likelihood function L(data) and joint prior distribution π(α, c, β) is defined as π * (α, c, β | data)απ(α, c, β)L(data).
Hence, the joint posterior density of parameters α, c, and β for complete sample data is obtained by combining the likelihood function and joint prior density (32). erefore, the joint posterior density function is given by where Moreover, the marginal posterior density of α, c, and β, assuming that Θ � (α, c, β), is given by where i, j, k � 1, 2, 3, i ≠ j ≠ k, and also Θ i is the i-th member of vector Θ. From (35) and (37), it is clear that there is no closed form for the Bayesian estimators under the five loss functions described in Table 8. erefore, we use the MCMC procedure based on 10,000 replicates to compute Bayesian estimators.
Because of intractable integrals associated with joint posterior and marginal posterior distributions, one needs to use numerical software to solve the integral equations numerically. e two most popular MCMC methods are the Metropolis-Hastings algorithm [37,38] and the Gibbs sampling [39]. Gibbs sampling is a special case of the Metropolis-Hastings algorithm which generates a Markov chain by sampling from the full set of conditional distributions. Often, Bayesian inference requires computing intractable integrals to generate posterior samples. In practice, simulations related to Gibbs sampling are conducted through special software WinBUGS. WinBUGS software was developed in 1997 to simulate data of complex posterior distributions, where analytical or numerical integration techniques cannot be applied. One may also use OpenBUGS software, which is an open-source version of WinBUGS.
Using Gibbs sampling, we obtain samples from the joint posterior distribution and then use OpenBUGS software to carry out the Bayesian analysis. e process is described as follows: (i) Gibbs sampling technique is used to generate posterior samples (ii) 10,000 replicates are made to compute the Bayesian estimators via OpenBUGS software (iii) e idea of Congdon [40] was to implement and choose a � b � c � d � e � f � 0.0001 as we do not have any prior information about hyperparameters e corresponding Bayesian estimates and posterior risk are provided in Table 10. Table 11 provides 95% credible and HPD intervals for each parameter of the Z-Weibull distribution. Moreover, we provide the posterior summary plots in  ese plots confirm that the sampling process is of the prime quality, and the convergence does occur.

Discussion and Future Framework
Statistical decision theory addresses the state of uncertainty and provides a rational framework for dealing with problems of actuarial and financial decision-making. e insurance data sets are generally skewed to the right and heavy-tailed. e traditional distributions are not flexible enough to counter complex forms of data such as insurance science data.
Due to the importance of statistical distributions in actuarial sciences, a number of papers have been appeared in the literature aiming to improve the characteristics of the existing distributions. Although this has been achieved, unfortunately, the numbers of parameters have been increased, and the estimation of parameters and derivation of mathematical problems become complicated.
To provide a better description of the insurance science data, therefore, in this study, an attempt has been made to introduce a new family of statistical distributions aiming to increase the flexibility of the existing distributions. A special submodel of the proposed family offers the best fitting to the heavy-tailed insurance science data. e maximum likelihood method is adopted to estimate the model parameters, and a comprehensive Monte Carlo simulation study is done to evaluate the behavior of the estimators.
To show the usefulness of the proposed method in insurance sciences, a real-life application of the earthquake insurance data is discussed. Analyzing the data set, it showed that the proposed model performs much better than the other competitive distributions.
From the above discussion, it is obvious that the researchers are always in search of new flexible distributions.
erefore, to bring further flexibility in the proposed model, we suggest to introduce its extended versions. e proposed method can further be extended by introducing a shape parameter to the model.
(i) A random variable X is said to follow the extended version of the Z-family if its cdf is given by where a is the additional shape parameter. For a � 1, expression (38) reduces to (6). e new proposal may be named as the exponentiated Z-family. For the illustrative purposes, one may consider its special subcase, called the exponentiated Z-Weibull (EZ-Weibull) distribution, defined by the cdf Due to the introduction of the additional shape parameter, the suggested extension may be much flexible in modeling data in insurance sciences and other related fields. (ii) Another extension of the Z-family is given by where θ is the additional shape parameter. For θ � 1, expression (40) reduces to (6). e model defined in (40) may be named as the extended Z-family.
where a and θ are the additional shape parameters. Clearly, for a � 1, expression (41) reduces to (40), and for θ � 1, expression (41) reduces to (38). However, for a � θ � 1, expression (41) reduces to (6). e model introduced in (41) may be named as the extended exponentiated Z-family. (iv) Another generalized version of the new extended Z-family can be introduced via (42)

Concluding Remarks
A variety of methods for proposing new heavy-tailed distributions have been developed to model data related to financial and actuarial sciences. We carried out this area of research further and introduced a new heavy-tailed distribution family. Some distributional properties are derived, and the method of maximum likelihood estimation is discussed to estimate the model parameters. In addition to distributional properties, some actuarial properties are also derived. Based on the actuarial measures, a comprehensive simulation study is conducted. We focused our concentration on a three-parameter special model called the Z-Weibull distribution. To prove the potential and usefulness of the Z-Weibull distribution, earthquake insurance data are analyzed, and its comparison is made with the other well-known distributions. While analyzing the earthquake insurance data, it is observed that the proposed model performs better than the other competitive models. Bayesian analysis using the earthquake data is also provided. Finally, some new extensions are also suggested which may further improve the characteristics of the proposed family.

Data Availability
is work is mainly a methodological development and has been applied on secondary data related to the earthquake insurance data, but if required, data will be provided.

Disclosure
is article is drafted from the PhD work of the first author (Zubair Ahmad).

Conflicts of Interest
e authors declare no conflicts of interest.