On the Usefulness of the Logarithmic Skew Normal Distribution for Describing Claims Size Data

In this paper, the three-parameter skew lognormal distribution is proposed to model actuarial loss data. This distribution yields a satisfactory fit to empirical data over the whole range of the empirical distribution as compared to other distributions used in the actuarial statistics literature. To the best of our knowledge, this distribution has not been used in an insurance context, and it might be suitable for computing reinsurance premiums in situations where the right tail of the empirical distribution plays an important role. Furthermore, a regression model can be simply derived to explain the response variable as a function of a set of explanatory variables.


Introduction
The modelling of large claims is a topic of major importance in general insurance and reinsurance, particularly in the field of mathematical risk theory; see, for instance, McNeil [1]; Beirlant and Teugels [2]; Beirlant et al. [3]; and Embrechts et al. [4]. For a comprehensive study of reinsurance, see the recent book of Albrecher et al. [5]. Our interest is to find simple statistical distributions appropriate for modelling both small and medium-size losses with a high frequency and large losses with a low frequency. This is an issue of particular interest in the context of reinsurance and premium calculation principles. In this sense, the classical Pareto distribution [6][7][8] has traditionally been considered a suitable claims size distribution in relation to rating problems. Concerning this, the single-parameter Pareto distribution not only has nice statistical properties but also provides a good description of the random behaviour of large losses (i.e., the right tail of the distribution). In particular, when calculating deductibles and excess-of-loss levels of reinsurance, the simple Pareto distribution has proved to be convenient (see, for example, [9, 10]). As an alternative to the classical Pareto distribution, other models have recently been introduced in the actuarial literature by Sarabia et al. [11][12][13] and Ghitany et al. [14].
On the other hand, there exist many situations where the empirical data show slight or marked asymmetry.
This is frequently the case, for example, with actuarial and financial data, which also have heavy tails reflecting the existence of extreme values. These two features imply that the data cannot be adequately modelled by the Gaussian or normal distribution.
Let g and G, respectively, be the probability density function (pdf) and the cumulative distribution function (cdf) of a symmetric distribution. A random variable Z is said to have a skew distribution if its pdf is given by

f_Z(z) = 2 g(z) G(cz), z ∈ R. (1)

This family of distributions has been widely studied as an extension of the normal distribution via a shape parameter, c, that accounts for the skewness of the model. When g(·) and G(·) are replaced in (1) by ϕ(z) and Φ(z), i.e., the standard normal density and distribution function, respectively, the resulting model is called the skew normal distribution.
In this paper, special attention is paid to the generalized skew normal density introduced in Henze [15] and also studied by Arnold and Beaver [16]. Its pdf is given by

f_Z(z) = [Φ(c₀/√(1 + c₁²))]⁻¹ ϕ(z) Φ(c₀ + c₁z), z ∈ R. (2)

For multivariate extensions, see, for instance, Azzalini and Valle [17], Azzalini and Capitanio [18], and Arnold and Beaver [16]. For an exhaustive and comprehensive study of the skew normal distribution, see the recent book of Azzalini [19].
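As a quick numerical sanity check, the skew construction described above integrates to one for any shape parameter. A minimal sketch in Python, assuming the standard Azzalini form f_Z(z) = 2g(z)G(cz) with g = ϕ and G = Φ:

```python
# Sanity check of the skew construction: assuming the Azzalini form
# f_Z(z) = 2 g(z) G(c z), the density integrates to one for any shape
# parameter c. Here g = phi and G = Phi (the skew normal case).
from scipy.integrate import quad
from scipy.stats import norm

def skew_normal_pdf(z, c):
    """Skew normal density: 2 * phi(z) * Phi(c * z)."""
    return 2.0 * norm.pdf(z) * norm.cdf(c * z)

for c in (-3.0, 0.0, 1.5, 10.0):
    total, _ = quad(skew_normal_pdf, -20.0, 20.0, args=(c,))
    print(f"c = {c:5.1f}: integral = {total:.6f}")
```

Note that c = 0 recovers the symmetric base density, since Φ(0) = 1/2 cancels the factor 2.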
This paper is organized as follows. In Section 2, the proposed distribution is studied and some interesting properties are given. Numerical applications are provided in Section 3, and conclusions are drawn in Section 4.

Modelling the Size of Losses
The basic skew lognormal distribution has been studied by Lin and Stoyanov [20] (see also [19], Chap. 2, [21][22][23]). Nevertheless, in this paper, we pay special attention to the distribution arising from exponentiation of (2) in the following sense: if X is a random variable with density (2), X ∼ f_Z, we consider the new random variable Y = exp(X). Also, by taking c₀ = c₁ = λ and using a linear transformation, we allow for more flexible location and scale parameters in our model. Thus, we get the pdf given by

f_{μ,σ,λ}(x) = [Φ(λ/√(1 + λ²)) σx]⁻¹ ϕ((log x − μ)/σ) Φ(λ(1 + (log x − μ)/σ)), x > 0. (3)

Here, λ ∈ R, μ ∈ R, and σ > 0. Observe that when λ = 0, expression (3) reduces to the classical lognormal distribution. It can be seen that the parameter λ regulates the shape of the distribution. Furthermore, Lin and Stoyanov [20] established that the distribution has heavy tails, and therefore it is a suitable distribution to be incorporated into the wide catalogue of heavy-tailed models (see [5]). Additionally, Lin and Stoyanov [20] provided a stochastic representation of this distribution and determined that it can be obtained as the product of two independent random variables, one of which is lognormal and the other a logarithmic half-normal. Simple calculations show that density (3) also admits an alternative representation. It is straightforward to show that the distribution is unimodal, with the mode satisfying the equation

λ ϕ(λ(1 + z))/Φ(λ(1 + z)) = σ + z, where z = (log x − μ)/σ.

The mean value and the second-order moment of the random variable X with density (3), X ∼ LSN(μ, σ, λ), are given by

E[X] = exp(μ + σ²/2) Φ(λ(1 + σ)/√(1 + λ²))/Φ(λ/√(1 + λ²)),
E[X²] = exp(2μ + 2σ²) Φ(λ(1 + 2σ)/√(1 + λ²))/Φ(λ/√(1 + λ²)).

Let F_{μ,σ,λ}(x) = Pr[X ≤ x] denote the cdf of X ∼ LSN(μ, σ, λ). By directly applying result B.21 in Azzalini [19], it is not difficult to observe that this cdf can be expressed in closed form in terms of Owen's function whenever λ ≠ 0 and x > 0. Here, T(h, a) represents Owen's function (see [24]), given by

T(h, a) = (1/2π) ∫₀^a exp(−h²(1 + t²)/2)/(1 + t²) dt.

If λ = 0, we recover the lognormal case, and T(h, 0) = 0. The latter function can be easily computed using the Mathematica software package. The acceptance-rejection method of simulation can be used to generate random variates from (3).
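Besides Mathematica, Owen's T function can be evaluated by direct quadrature of its defining integral T(h, a) = (1/2π) ∫₀^a exp(−h²(1 + t²)/2)/(1 + t²) dt; SciPy also ships an implementation, scipy.special.owens_t, which the sketch below uses as a cross-check:

```python
# Owen's T function T(h, a) = (1/(2*pi)) * int_0^a exp(-h^2(1+t^2)/2)/(1+t^2) dt,
# evaluated by direct quadrature and cross-checked against SciPy's
# built-in implementation.
import numpy as np
from scipy.integrate import quad
from scipy.special import owens_t

def owens_t_quad(h, a):
    integrand = lambda t: np.exp(-0.5 * h * h * (1.0 + t * t)) / (1.0 + t * t)
    val, _ = quad(integrand, 0.0, a)
    return val / (2.0 * np.pi)

print(owens_t_quad(1.0, 0.0))   # T(h, 0) = 0
print(owens_t_quad(0.0, 1.0))   # arctan(1)/(2*pi) = 1/8 = 0.125
print(abs(owens_t_quad(1.0, 2.0) - owens_t(1.0, 2.0)))
```

The second line uses the known closed form T(0, a) = arctan(a)/(2π) as an independent check of the quadrature.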
We begin by simulating a value from a lognormal distribution with parameters μ and σ. Then, having chosen this lognormal distribution as the proposal to simulate from, we define a constant c in the following way:

c = sup_{x>0} f_{μ,σ,λ}(x)/g(x) = 1/Φ(λ/√(1 + λ²)),

where f_{μ,σ,λ}(x) is given by (3) and g(x) is the lognormal proposal density. The algorithm for simulating a random variate from the LSN distribution is as follows: (1) Generate a random variate from the lognormal distribution, x₁. (2) Generate a random number u₁. (3) If u₁ ≤ f_{μ,σ,λ}(x₁)/(c g(x₁)), then set the simulated value from (3) equal to x₁. Otherwise, return to step 1.
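The acceptance-rejection scheme above can be sketched in Python. The density form assumed here, f(x) ∝ (σx)⁻¹ϕ(z)Φ(λ(1 + z)) with z = (log x − μ)/σ, follows the construction described earlier in this section; with a lognormal(μ, σ) proposal the acceptance check simplifies, since the normalizing constants cancel in f/(c·g):

```python
# Acceptance-rejection sampler for the LSN distribution, assuming the
# density f(x) ~ (1/(sigma*x)) * phi(z) * Phi(lam*(1+z)), z = (log x - mu)/sigma.
# With a lognormal(mu, sigma) proposal g, the ratio f/(c*g) reduces to
# Phi(lam*(1+z)); the expected acceptance rate is 1/c = Phi(lam/sqrt(1+lam^2)).
import numpy as np
from scipy.stats import norm

def rlsn(n, mu, sigma, lam, seed=None):
    rng = np.random.default_rng(seed)
    out = np.empty(n)
    filled = 0
    while filled < n:
        m = n - filled
        # Step 1: proposals from the lognormal(mu, sigma) distribution.
        x1 = np.exp(mu + sigma * rng.standard_normal(m))
        # Step 2: uniform random numbers.
        u1 = rng.random(m)
        # Step 3: accept x1 when u1 <= f(x1)/(c*g(x1)) = Phi(lam*(1+z)).
        z = (np.log(x1) - mu) / sigma
        accepted = x1[u1 <= norm.cdf(lam * (1.0 + z))]
        out[filled:filled + accepted.size] = accepted
        filled += accepted.size
    return out

draws = rlsn(10000, mu=0.0, sigma=1.0, lam=1.0, seed=42)
print(draws.min() > 0.0, draws.mean())
```

With λ = 0 the acceptance probability is constant (Φ(0) = 1/2), so the sampler simply thins the proposals and returns plain lognormal variates, as expected from (3).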
By writing μ as

μ = log θ − σ²/2 − log[Φ(λ(1 + σ)/√(1 + λ²))/Φ(λ/√(1 + λ²))],

pdf (3) has mean value equal to θ > 0 and variance given by θ²ψ²(σ, λ), where

ψ²(σ, λ) = exp(σ²) Φ(λ(1 + 2σ)/√(1 + λ²)) Φ(λ/√(1 + λ²))/Φ(λ(1 + σ)/√(1 + λ²))² − 1.

Therefore, a regression model can be derived from the LSN distribution. In this model, θ is the mean of the response variable, and ψ²(σ, λ) can be interpreted as a dispersion parameter in such a way that, for fixed θ, the larger the value of ψ²(σ, λ), the larger the variance of X.
Let us now consider that the random variable X_i is related to a vector of k covariates y_i = (y_{1i}, . . . , y_{ki})′ associated with the ith observation, where i = 1, 2, . . . , n.
This vector of observable features is related to θ_i through a log link function, and β = (β₁, . . . , β_k)′ is a vector of regression coefficients with β_j ∈ R for j = 1, . . . , k. The regression model takes the form

log θ_i = y_i′β, i.e., θ_i = exp(y_i′β).

Observe that the log link ensures that θ_i falls within the interval (0, ∞).
On the other hand, it is sometimes desirable to deal with a positive lower bound in the range of X by shifting this random variable by α > 0, and therefore a shifted version of the previous distribution is obtained. In this case, it is convenient to work with the resulting pdf, denoted here by (15). Figure 1 shows pdf (15) compared with the Pareto and PAT (Pareto ArcTan) distributions (this generalization of the Pareto distribution was proposed by Gómez-Déniz and Calderín-Ojeda [13]), all of them with the same mean and a threshold of α = 1. It is discernible that the tail of the LSN density is heavier than those of the other models.
By choosing μ appropriately, it is guaranteed that (15) has mean equal to θ > α. In this case, for the associated regression model, we will use the link function

θ_i = α + exp(y_i′β),

which guarantees that θ_i falls within the interval (α, ∞). It is already known (see Lemma 2 in [25]) that Φ(x) ∼ ϕ(x)/|x| as x → −∞. Then, it is straightforward to obtain the asymptotic behaviour of the factor Φ(λ(1 + (log x − μ)/σ)) when λ < 0 and x grows. This can be used to easily obtain asymptotic values of the pdf given in (15); in fact, an explicit approximation for the right tail follows for λ < 0.

Empirical Illustrations
In this section, the versatility of the proposed skew lognormal model, as compared with the Pareto distribution and the Pareto ArcTan distribution [13], is tested using three datasets. The first dataset is danishuni, available in the R package CASdatasets; it was collected at Copenhagen Reinsurance and comprises 2167 fire losses over the period 1980 to 1990, adjusted for inflation to reflect 1985 values (see Table 1).
Parameter estimation for all the models considered in this paper has been carried out by the method of maximum likelihood using Mathematica v.12.0® and also confirmed using WinRATS v.7.0. Both the method of moments and maximum likelihood are suitable for estimating the vector of parameters of the distribution from sample observations, as shown in Appendices A and B. Codes are available from the authors upon request. For details about the software, see Ruskeepaa [26] and Brooks [27].
Since closed-form expressions are not available for the maximum likelihood estimates and attainment of the global maximum of the log-likelihood function is not guaranteed, it is advisable to use several seed points as starting values. It is also prudent to use different optimization methods, such as Newton-Raphson and Broyden-Fletcher-Goldfarb-Shanno (BFGS) (an iterative method for solving unconstrained nonlinear optimization problems; see Broyden [28,29] and Shanno [30]), and to check that the same solution is obtained from each of them. The standard errors of the estimates can be calculated by inverting the Hessian matrix. In this sense, both Mathematica and WinRATS have at least two methods to obtain it. The first is to retrieve the standard errors from the Cholesky factors (this package is available on the web upon request). The second, faster, is to obtain them by finite differentiation. Furthermore, the WinRATS package also offers the possibility of directly computing the maximum of the log-likelihood while providing the elements of the Fisher information matrix. In fact, for the examples considered in this section, these two packages were used to quickly compute the maximum likelihood estimates. Commands for fitting the skew normal and log-skew normal distributions are also available in Stata (see [31]).
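The multi-start strategy described above can be sketched as follows (in Python rather than Mathematica or WinRATS; the log-density is the form assumed in Section 2, and the starting points and simulated data are illustrative choices, not the authors' settings):

```python
# Multi-start maximum likelihood for the LSN model: several seed points
# and two optimizers, keeping the best fit. The log-density is the form
# assumed in Section 2; starting values are illustrative only.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def lsn_logpdf(x, mu, sigma, lam):
    z = (np.log(x) - mu) / sigma
    return (norm.logpdf(z) - np.log(sigma * x)
            + norm.logcdf(lam * (1.0 + z))
            - norm.logcdf(lam / np.sqrt(1.0 + lam ** 2)))

def fit_lsn(x, starts):
    def nll(theta):
        mu, log_sigma, lam = theta  # log-parametrize sigma to keep it > 0
        return -np.sum(lsn_logpdf(x, mu, np.exp(log_sigma), lam))
    best = None
    for theta0 in starts:
        for method in ("Nelder-Mead", "BFGS"):
            res = minimize(nll, np.asarray(theta0, dtype=float), method=method)
            if best is None or res.fun < best.fun:
                best = res
    mu, log_sigma, lam = best.x
    return mu, np.exp(log_sigma), lam, best.fun

rng = np.random.default_rng(1)
data = np.exp(0.5 + 0.8 * rng.standard_normal(2000))  # lognormal: true lam = 0
mu_hat, sigma_hat, lam_hat, nll_hat = fit_lsn(
    data, starts=[(0.0, 0.0, 0.0), (1.0, -0.5, 1.0), (-1.0, 0.5, -1.0)])
print(mu_hat, sigma_hat, lam_hat, nll_hat)
```

Agreement of the solutions across starting points and methods is the practical check that a global maximum has been found; the inverse Hessian of the negative log-likelihood at the optimum then yields the standard errors.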
Parameter estimates and their standard errors (in brackets), the negative of the maximum of the log-likelihood function, and the AIC for the three datasets considered are shown in Tables 2 and 3. Model assessment is based on the following information criteria. The negative log-likelihood (NLL) is calculated by taking the negative of the log-likelihood evaluated at the ML estimates; the Akaike information criterion (AIC) is computed as twice the NLL plus twice the number of estimated parameters; the consistent Akaike information criterion (CAIC) is a corrected version of the AIC proposed by Bozdogan [32] to overcome the AIC's tendency to overestimate the complexity of the underlying model, since the AIC lacks the asymptotic property of consistency. To calculate the CAIC, a correction factor based on the sample size is used to compensate for the overestimating nature of the AIC.
The CAIC is defined as twice the NLL plus k(1 + ln(n)), where k is the number of free parameters and n is the sample size. Note that a model with a lower value of these statistics is preferred to one with a higher value. All these results are shown in Table 2. Furthermore, we also include the Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests to express the fit of the model to the data in terms of the distribution function. For these statistics, smaller values indicate a better fit of the model to the data. Note that they not only provide a way to measure the fit in terms of distribution functions but also allow us to perform hypothesis testing for model selection purposes. The p values of the test statistics, computed by the Monte Carlo method using 1000 simulations, are shown in brackets. An extremely small p value leads to a confident rejection of the null hypothesis that the data come from the proposed model. It can be seen that the LSN distribution is not rejected at the usual significance levels for either test.
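The information criteria used here are straightforward to compute from the NLL; a minimal sketch (the numbers are illustrative, not values from the tables):

```python
# Model-selection statistics as defined above: AIC = 2*NLL + 2k and
# CAIC = 2*NLL + k*(1 + ln n), where k is the number of free parameters
# and n the sample size. Lower values are preferred.
import math

def aic(nll, k):
    return 2.0 * nll + 2.0 * k

def caic(nll, k, n):
    return 2.0 * nll + k * (1.0 + math.log(n))

# Illustrative example: NLL = 5000 at the ML estimates, three parameters,
# n = 2167 observations.
print(aic(5000.0, 3))        # 10006.0
print(caic(5000.0, 3, 2167))
```

Since ln(n) > 1 for any realistic sample size, the CAIC always penalizes extra parameters more heavily than the AIC.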
Histograms of the data with superimposed fitted densities are given in Figure 2. As can be seen, the proposed distribution provides a good fit to the empirical data.
In Table 4, the LSN distribution is fitted to the Wisconsin hospital costs dataset. From left to right, the parameter estimates, standard errors (SE), and the corresponding p values calculated from the t-Wald statistic are displayed. In addition, AIC and BIC values for each model are provided in the last two rows of the table. The estimates of the regression coefficients associated with the explanatory variables, number of beds and income, are not significant at the usual nominal level.
Finally, in order to choose a model that provides an acceptable description of the loss process for the Danish and Norwegian datasets, we should verify that the population limited expected values, computed numerically by

L(x) = E[min(X, x)], (22)

are close to the empirical ones. As is well known, (22) is the expected amount per claim retained by the insured on a policy with a fixed amount deductible of x. In this case, the empirical limited expected value function was calculated as E_n(x) = (1/n) Σ_{i=1}^n min(x_i, x). Obviously, when x tends to infinity, L(x) and E_n(x) approach E(X) and the sample mean, respectively. Table 5 exhibits the limited expected value for different values of the policy limit x for the hospital costs dataset. It is observed that the values obtained from the LSN distribution adhere closely to the empirical limited expected values, more closely than those obtained from the Pareto and PAT distributions. Similarly, in Figure 3, the empirical and fitted limited expected value functions for this dataset and also for the Danish and Norwegian data are shown.

For this reason, the dataset danishuni is randomly partitioned into two disjoint subsets of different sizes. The first subset (dataset A) is used for fitting the models, whereas the second subset (dataset B) is used for graphing the quantile-quantile plots of the randomized quantile residuals to check for normality. The residuals for the out-of-sample datasets are exhibited in Figure 4 for different in-sample sizes, i.e., sizes of dataset A. For comparison purposes, we have also plotted the residuals for the whole dataset (top-left graph) with sample size 2167. A perfect alignment with the 45° line implies the residuals are normally distributed. It is observable that the residuals for the LSN distribution adhere reasonably well to the line throughout the residual distribution.
However, for all different in-sample sizes considered, the model tends to overestimate the lower quantiles and underestimate the upper quantiles. In general, the predictive power of the model is acceptable.
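The empirical limited expected value function E_n(x) = (1/n) Σ_i min(x_i, x) used in the comparison above can be sketched as follows (the loss values are made-up illustrative data, not from the datasets):

```python
# Empirical limited expected value E_n(x) = (1/n) * sum_i min(x_i, x);
# it is nondecreasing in the policy limit x and approaches the sample
# mean as x grows. The losses below are made-up illustrative values.
import numpy as np

def empirical_lev(losses, x):
    losses = np.asarray(losses, dtype=float)
    return float(np.minimum(losses, x).mean())

losses = [1.2, 3.5, 0.8, 10.0, 2.1]
for limit in (1.0, 5.0, 1e9):
    print(limit, empirical_lev(losses, limit))  # increases toward the mean
```

Its population counterpart L(x) requires the fitted cdf and is computed numerically; comparing the two curves across policy limits is the model check carried out in Table 5 and Figure 3.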

Conclusions and Extensions
In this paper, the use of the skew lognormal distribution has been proposed as a suitable model for describing claims size empirical data. To the best of our knowledge, this probabilistic family has not previously been considered in the actuarial literature. The advantages of this distribution are twofold. On the one hand, it has heavy tails, which makes it an interesting model for describing severity data; on the other hand, the distribution can be rewritten to allow for the incorporation of predictors to explain a response variable. Finally, it is worth pointing out that the distribution introduced in this paper could be extended to other areas of risk theory. For example, the LSN distribution could be employed as the secondary distribution (the distribution of the claims size) in the compound collective risk model. In particular, this distribution could be used as an approximation of the distribution of the total claims size instead of the traditionally considered normal distribution. Furthermore, the calculation of reinsurance premiums based on this distribution is an issue that deserves to be studied.