A Nonparametric Operational Risk Modeling Approach Based on Cornish-Fisher Expansion

It is generally accepted that the choice of severity distribution in the loss distribution approach has a significant effect on the estimate of operational risk capital. However, the commonly used parametric approaches, which rely on a predefined distributional assumption, may not fit the severity distribution accurately. The objective of this paper is to propose a nonparametric operational risk modeling approach based on the Cornish-Fisher expansion. In this approach, severity samples are generated by the Cornish-Fisher expansion and then used in a Monte Carlo simulation to sketch the annual operational loss distribution. In the experiment, the proposed approach is employed to calculate the operational risk capital charge for the Chinese banking industry as a whole. To the best of our knowledge, the experimental dataset is the most comprehensive operational risk dataset in China. The results show that the proposed approach is able to use the information of high-order moments and might be more effective and stable than the commonly used parametric approach.


Introduction
Operational loss is an important source of bank risk. Take the Barings Bank incident as an example: Nick Leeson's unauthorized trading activities at the Singapore office resulted in a total loss of US$1.4 billion and eventually led to the sudden bankruptcy of Barings Bank. Nowadays, researchers, practitioners, and regulatory institutions are fully aware of the importance of operational risk. The Basel Committee on Banking Supervision (BCBS) formally defines it as the risk of loss resulting from inadequate or failed internal processes, people, and systems or from external events [1]. In addition, under the Basel II Accord, operational risk is covered under Pillar 1, alongside credit risk and market risk [2].
BCBS also introduces three approaches to the quantification of operational risk, in a continuum of increasing sophistication and risk sensitivity: the basic indicator approach (BIA), the standardized approach (SA), and the advanced measurement approach (AMA) [3]. By allowing banks to develop their own models for assessing the regulatory capital that covers their yearly operational risk at a confidence level of 99.9%, AMA, the most sophisticated option, has sparked intense discussion in the financial industry [4]. Many quantitative AMAs for measuring operational risk have been proposed, such as the internal measurement approach, the loss distribution approach, the scorecard approach, and the extreme value theory based approach [1,5].
Among the eligible variants of AMA, the loss distribution approach (LDA) is by far the most popular methodology [5,6]. It was initially developed in the actuarial industry and then introduced to operational risk modeling by Frachot et al. [7]. LDA separately estimates the frequency distribution and the severity distribution of operational losses and then combines them by convolution to derive the annual operational loss distribution. The output of LDA is a full characterization of the distribution of annual operational losses, which contains all relevant information for computing the regulatory capital charge [6]. From a statistical viewpoint, LDA is most accurate when it uses the exact distributions of loss frequency and severity. In other words, choosing the right frequency and severity distributions is the key to LDA.
In LDA models, the choice of severity distribution is usually considered to have a more pronounced effect on capital than the choice of frequency distribution [4,8]. An improper choice of severity distribution might significantly distort the results and thus lead to overestimation or underestimation of the capital charge. For this reason, very few studies work on the choice of frequency distribution. Some researchers simply employ the Poisson distribution for its ease of use. Others go a step further and discuss whether the negative binomial distribution or the geometric distribution better fits the frequency [6]. By contrast, the vast majority of researchers and practitioners concentrate on estimating the loss severity distribution.
Parametric approaches have so far dominated the estimation of the severity distribution. They assume a particular distribution for severity and then estimate its parameters by moment estimation, maximum likelihood estimation, and so on [8]. The lognormal distribution was the first to be used to model severity. Afterwards, the Weibull and exponential distributions also came into wide use [9]. These distributions usually fit "high-frequency and low-severity" losses well, that is, the middle part of the severity distribution. Their major flaw, however, is that they fit "low-frequency and high-severity" losses poorly, that is, the tail of the severity distribution. To address this flaw, some heavy-tailed distributions have been employed, for example, the g-and-h distribution and the α-stable distribution [10,11]. Extreme value theory has also been applied in this area, using an extreme value distribution, such as the Pareto distribution, to model severity [5,12,13]. An extreme value distribution fits only the tail, so a threshold is needed to determine where the tail starts. An improper threshold might severely affect the fitting results. However, most studies rely on subjective judgment to determine the threshold, because there is no widely recognized objective method so far.
Unlike the parametric approach, the nonparametric approach draws loss amounts at random from the loss data to perform a simulation, without assuming any particular severity distribution [8]. This kind of approach has so far received relatively limited attention. The simplest and most classical nonparametric approach is bootstrap resampling. In this approach, frequency samples and severity samples are drawn directly from the raw data and then combined to obtain the yearly loss. This approach is based only on historical data and cannot extrapolate beyond the observed losses. Another approach is based on the maximum entropy principle. In this approach, the most suitable distribution is regarded as the one that is closest to the uniform distribution (that is, has maximum entropy) while satisfying some known statistical constraints [14].
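For comparison with the proposed approach, the bootstrap resampling scheme described above can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy; the function name `bootstrap_annual_losses` and its inputs (historical annual event counts and historical loss amounts) are hypothetical. The sketch also makes the extrapolation limit concrete: no simulated annual loss can exceed the largest observed count times the largest observed loss.

```python
import numpy as np


def bootstrap_annual_losses(observed_counts, observed_losses, n_sims, seed=0):
    """Bootstrap LDA sketch: resample frequencies and severities from raw data.

    observed_counts: historical number of loss events per year (hypothetical input).
    observed_losses: historical individual loss amounts (hypothetical input).
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(observed_counts, dtype=int)
    losses = np.asarray(observed_losses, dtype=float)
    annual = np.empty(n_sims)
    for k in range(n_sims):
        m = rng.choice(counts)  # frequency drawn from the historical counts
        # severities drawn with replacement from the historical losses
        annual[k] = rng.choice(losses, size=m, replace=True).sum()
    return annual


# Usage with toy data: 3 observed years and 3 observed loss amounts.
annual = bootstrap_annual_losses([2, 3, 5], [1.0, 10.0, 100.0], n_sims=1000)
```

Every simulated year is a sum of historical values, so the tail of the simulated distribution is capped by the data, which is exactly the weakness noted above.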
Under different severity distribution assumptions, the results from LDA differ greatly from each other [9]. Therefore, an improper distributional assumption will result in significant bias from the true capital charge. However, the commonly used distributions either cannot fit both the body and the tail well or require a threshold that cannot be determined objectively. Because it relies on fewer assumptions, the nonparametric approach is more robust than the parametric approach. The parametric approach, on the other hand, usually has the merit of being easy to use and understand.
The objective of this paper is to propose a nonparametric operational risk modeling approach. This approach uses the Cornish-Fisher expansion to estimate severities within the LDA framework. The Cornish-Fisher expansion is a well-known mathematical expansion that approximates the quantiles of a random variable based on its first few cumulants or moments [15]. This approach does not need a predetermined distribution to fit the loss severity of operational risk. In addition, it partly alleviates the data sparseness problem of operational risk, because only the cumulants or moments of loss severity, rather than the whole severity distribution, are required in the expansion. In this approach, severity samples are generated by using the standard normal distribution and the Cornish-Fisher expansion. These samples are then used in a Monte Carlo simulation to sketch the annual operational loss distribution. In the experiment, based on what is, to the best of our knowledge, the most comprehensive operational risk dataset in China, the proposed approach is employed to calculate the operational risk capital charge of the Chinese banking industry as a whole.
The rest of this paper is organized as follows. Section 2 presents the proposed approach. Section 3 employs the approach to calculate the operational risk capital charge for the Chinese banking industry as a whole. Section 4 summarizes the conclusions.

The Proposed Cornish-Fisher Expansion Based LDA Approach
In this section, the proposed nonparametric approach using the Cornish-Fisher expansion within the LDA framework is described in detail. First, the Cornish-Fisher expansion is introduced. Then the concept and specific steps of the proposed approach are presented.

Cornish-Fisher Expansion.
The Cornish-Fisher expansion, first proposed by Cornish and Fisher [16], has often been applied in finance to calculate value at risk (VaR).
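It is a formula for approximating the quantiles of a random variable based only on its first few cumulants. Assume that there is a random variable $X$ whose mean $\mu$ is equal to 0 and whose standard deviation $\sigma$ is equal to 1. Let $\kappa_r$ ($r = 1, 2, \ldots$) denote the $r$th order cumulant of $X$, defined through the cumulant generating function:

$$\ln E\left[e^{tX}\right] = \sum_{r=1}^{\infty} \kappa_r \frac{t^r}{r!}. \tag{1}$$

Then it can be proven that

$$F_X(x) = \exp\left[\sum_{r=3}^{\infty} \frac{\kappa_r (-D)^r}{r!}\right] \Phi(x), \tag{2}$$

where $F_X(x)$ denotes the cumulative distribution function of $X$, $D$ denotes the differential operator, and $\Phi(x)$ denotes the standard normal distribution:

$$\Phi(x) = \int_{-\infty}^{x} \phi(t)\,dt, \qquad (-D)^r \Phi(x) = -\mathrm{He}_{r-1}(x)\,\phi(x), \quad r \ge 1, \tag{3}$$

where $\phi(x)$ denotes the probability density function of the standard normal distribution and $\mathrm{He}_r(x)$ denotes the $r$th order Chebyshev-Hermite polynomial:

$$\mathrm{He}_r(x) = (-1)^r e^{x^2/2} \frac{d^r}{dx^r} e^{-x^2/2}, \qquad \mathrm{He}_0(x) = 1,\ \mathrm{He}_1(x) = x,\ \mathrm{He}_2(x) = x^2 - 1,\ \mathrm{He}_3(x) = x^3 - 3x. \tag{4}$$

Next, according to (2) to (4), we obtain

$$F_X(x) = \Phi(x) - \phi(x)\left[\frac{\kappa_3}{6}\mathrm{He}_2(x) + \frac{\kappa_4}{24}\mathrm{He}_3(x) + \frac{\kappa_3^2}{72}\mathrm{He}_5(x)\right] + \cdots. \tag{5}$$

Equation (5) is the so-called Edgeworth expansion, from which the Cornish-Fisher expansion can be deduced as follows. Assume that $x_\alpha$ and $v_\alpha$ are the $\alpha$ quantiles ($0 < \alpha < 1$) of $F_X(x)$ and $\Phi(x)$, respectively. Then the Taylor expansion of $\Phi(v_\alpha)$ at $x_\alpha$ is

$$\Phi(v_\alpha) = \Phi(x_\alpha) + \sum_{n=1}^{\infty} \frac{(v_\alpha - x_\alpha)^n}{n!} D^n \Phi(x_\alpha). \tag{6}$$

According to the definition of quantile, we have

$$F_X(x_\alpha) = \Phi(v_\alpha) = \alpha. \tag{7}$$

Finally, by combining (5), (6), and (7), $x_\alpha$ can be expressed in terms of $v_\alpha$ as

$$x_\alpha = v_\alpha + \frac{(v_\alpha^2 - 1)\kappa_3}{6} + \frac{(v_\alpha^3 - 3v_\alpha)\kappa_4}{24} - \frac{(2v_\alpha^3 - 5v_\alpha)\kappa_3^2}{36} + \frac{(v_\alpha^4 - 6v_\alpha^2 + 3)\kappa_5}{120} - \frac{(v_\alpha^4 - 5v_\alpha^2 + 2)\kappa_3\kappa_4}{24} + \frac{(12v_\alpha^4 - 53v_\alpha^2 + 17)\kappa_3^3}{324} + \cdots. \tag{8}$$

Equation (8) is the Cornish-Fisher expansion of $x_\alpha$ in terms of $v_\alpha$. It is valid only if $X$ has a mean of 0 and a standard deviation of 1. However, it can still be used for other variables after normalizing them with their mean and standard deviation: if $Y$ has mean $\mu$ and standard deviation $\sigma$, its $\alpha$ quantile is approximated by $y_\alpha = \mu + \sigma x_\alpha$, where $x_\alpha$ is computed from the cumulants of $(Y - \mu)/\sigma$.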
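Equation (8) translates directly into code. The sketch below, assuming NumPy, maps standard normal draws $v$ to approximate quantiles of a standardized variable. The grouping of terms into "orders" (order 1: normal only; order 2: adds the $\kappa_3$ term; order 3: adds the $\kappa_4$ and $\kappa_3^2$ terms; order 4: adds the $\kappa_5$, $\kappa_3\kappa_4$, and $\kappa_3^3$ terms) is our reading of how the expansion is truncated in the experiment, and the function name `cornish_fisher` is hypothetical.

```python
import numpy as np


def cornish_fisher(v, k3, k4=0.0, k5=0.0, order=4):
    """Approximate quantiles of a standardized variable via equation (8).

    v: standard normal quantile(s) or draws.
    k3, k4, k5: cumulants of the standardized variable (kappa_1 = 0, kappa_2 = 1).
    order: truncation level; the term grouping per order is an assumption.
    """
    v = np.asarray(v, dtype=float)
    x = v.copy()  # order 1: only kappa_1 and kappa_2 are used, so x = v
    if order >= 2:
        x += (v**2 - 1) * k3 / 6
    if order >= 3:
        x += (v**3 - 3 * v) * k4 / 24 - (2 * v**3 - 5 * v) * k3**2 / 36
    if order >= 4:
        x += ((v**4 - 6 * v**2 + 3) * k5 / 120
              - (v**4 - 5 * v**2 + 2) * k3 * k4 / 24
              + (12 * v**4 - 53 * v**2 + 17) * k3**3 / 324)
    return x
```

For example, with $\kappa_3 = 0.6$ the order-2 expansion moves the median from $v = 0$ to $x = -0.1$, since $(0 - 1)\times 0.6/6 = -0.1$. Because the truncated series is a polynomial in $v$, it need not be monotone for large $|v|$ when the higher cumulants are large.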

The Framework of the Proposed Approach.
LDA is a technique that first estimates, separately, a frequency distribution for the occurrence of operational losses and a severity distribution for the economic impact of individual losses. Then, to obtain the total distribution of operational losses, these two distributions are combined through the $N$-fold convolution of the severity distribution with itself, where $N$ is a random variable that follows the frequency distribution [6]. For an exhaustive introduction to LDA, see Frachot et al. [7].
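In symbols, and under the standard LDA assumption (not stated explicitly above) that the individual severities $X_1, X_2, \ldots$ are independent and identically distributed with distribution $F_X$ and independent of the frequency $N$, the annual loss and its distribution can be written as

$$L = \sum_{j=1}^{N} X_j, \qquad F_L(l) = \sum_{k=0}^{\infty} P(N = k)\, F_X^{*k}(l),$$

where $F_X^{*k}$ denotes the $k$-fold convolution of $F_X$ with itself and $F_X^{*0}$ is a point mass at zero. This infinite mixture of convolutions is what rarely admits a closed form.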
Because the multiple convolutions are usually analytically complex and do not lend themselves to closed-form formulas, Monte Carlo simulation is commonly used to derive the final annual distribution of operational losses. The procedure of Monte Carlo simulation based LDA is as follows: (1) determine the loss frequency and loss severity distributions; (2) generate a number $m$ from the frequency distribution; (3) generate $m$ samples from the severity distribution; (4) sum the $m$ samples to calculate the annual loss; (5) repeat (2) to (4) $K$ times to obtain $K$ annual losses; (6) calculate VaR from the $K$ annual losses.
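A minimal sketch of steps (1) to (6), assuming NumPy. The function names and the Poisson/lognormal choice in the usage lines are hypothetical placeholders, not the distributions fitted later in the paper. VaR is taken as the smallest simulated loss $l$ whose empirical exceedance probability is at most $1 - \alpha$, which matches the VaR definition used in Step 7 of the proposed procedure.

```python
import math

import numpy as np


def simulate_annual_losses(sample_frequency, sample_severity, n_sims, seed=0):
    """Monte Carlo LDA: steps (2)-(4), repeated n_sims times (step (5))."""
    rng = np.random.default_rng(seed)
    annual = np.empty(n_sims)
    for k in range(n_sims):
        m = int(sample_frequency(rng))          # step (2): number of losses
        severities = sample_severity(rng, m)    # step (3): m severities
        annual[k] = float(np.sum(severities))   # step (4): annual loss
    return annual


def empirical_var(annual_losses, alpha):
    """Step (6): smallest l with empirical P(L > l) <= 1 - alpha."""
    s = np.sort(np.asarray(annual_losses, dtype=float))
    # rank of the ceil(alpha*K)-th order statistic; the tiny tolerance guards
    # against floating-point products such as 0.07 * 100 = 7.000000000000001
    rank = max(math.ceil(alpha * len(s) - 1e-9), 1)
    return float(s[rank - 1])


# Usage with a hypothetical Poisson(20) frequency and lognormal(5, 2) severity:
losses = simulate_annual_losses(
    lambda rng: rng.poisson(20),
    lambda rng, m: rng.lognormal(mean=5.0, sigma=2.0, size=m),
    n_sims=10_000,
)
var_999 = empirical_var(losses, 0.999)
```

Passing the samplers as functions keeps the simulation loop independent of the distributional choices, so the same loop serves any fitted frequency and severity model.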
As shown in (8), the Cornish-Fisher expansion expresses a random variable in terms of a variable from the standard normal distribution. Therefore, in this paper, we use the Cornish-Fisher expansion to generate the loss severity samples in the Monte Carlo simulation. Samples are first generated from the standard normal distribution and then transformed into loss severity samples by the Cornish-Fisher expansion. The transformation requires the cumulants of the loss severity. Compared with the original LDA approach, this approach does not need a predetermined distribution to fit loss severity; only the cumulants of loss severity are required.
The framework of the proposed approach is shown in Figure 1. For frequency, samples are still drawn from a fitted loss frequency distribution. For severity, samples are first generated from the standard normal distribution and then transformed into severities by the Cornish-Fisher expansion. After the Monte Carlo simulation, the annual operational loss distribution is obtained. Pioneered by J. P. Morgan, VaR has become a standard measure of financial risk [17,18], so we also use VaR here to measure the magnitude of operational risk.

The Procedure of the Proposed Approach.
As shown in Figure 2, the whole procedure of the proposed approach consists of 3 stages and 7 steps in total. Stage 1 prepares for the simulation: the frequency distribution $G$ is determined and the cumulants of severity are calculated. Stage 2 is the Monte Carlo simulation process. First, a number $m$ is randomly generated from the frequency distribution. Then $m$ samples of loss severity are generated by using the standard normal distribution and the Cornish-Fisher expansion. Next, these samples are summed to calculate the annual loss. Finally, this process is repeated a certain number of times to generate an empirical distribution of annual loss. Stage 3 calculates the VaR from the empirical distribution.
Figure 1: The framework of the proposed operational risk modeling approach.

Assume that there are $n$ operational risk observations and that the number of Monte Carlo simulations is $K$. The operational risk severities are denoted as $y_1, y_2, \ldots, y_n$, and the simulated severities are denoted as $\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_m$. The detailed steps of the proposed approach are presented as follows.

Stage 1 Preparation
Step 1. Determine the best-fitting distribution $G$ for frequency.
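Step 2. Calculate the cumulants of severities.
Step 2.1. Normalize the severities and calculate their moments:

$$x_i = \frac{y_i - \mu}{\sigma}, \qquad M_r = \frac{1}{n}\sum_{i=1}^{n} x_i^r, \quad r = 1, 2, \ldots, 5, \tag{9}$$

where $\mu$ denotes the mean and $\sigma$ denotes the standard deviation of $y_i$.
Step 2.2. Calculate the cumulants from the moments:

$$\begin{aligned}
\kappa_1 &= M_1,\\
\kappa_2 &= M_2 - M_1^2,\\
\kappa_3 &= M_3 - 3M_2M_1 + 2M_1^3,\\
\kappa_4 &= M_4 - 4M_3M_1 - 3M_2^2 + 12M_2M_1^2 - 6M_1^4,\\
\kappa_5 &= M_5 - 5M_4M_1 - 10M_3M_2 + 20M_3M_1^2 + 30M_2^2M_1 - 60M_2M_1^3 + 24M_1^5.
\end{aligned} \tag{10}$$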
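Step 2 can be sketched as follows, assuming NumPy. The population standard deviation (divisor $n$) is an assumption, since the text does not say which estimator is used; with it, the normalized data have $M_1 = 0$ and $M_2 = 1$ up to rounding, so $\kappa_1 = 0$ and $\kappa_2 = 1$, as (8) requires. The function names are hypothetical. In the experiment, $y_i$ would be the log-transformed losses.

```python
import numpy as np


def cumulants_from_moments(M):
    """Equation (10): cumulants kappa_1..kappa_5 from raw moments M_1..M_5."""
    M1, M2, M3, M4, M5 = M
    k1 = M1
    k2 = M2 - M1**2
    k3 = M3 - 3 * M2 * M1 + 2 * M1**3
    k4 = M4 - 4 * M3 * M1 - 3 * M2**2 + 12 * M2 * M1**2 - 6 * M1**4
    k5 = (M5 - 5 * M4 * M1 - 10 * M3 * M2 + 20 * M3 * M1**2
          + 30 * M2**2 * M1 - 60 * M2 * M1**3 + 24 * M1**5)
    return [k1, k2, k3, k4, k5]


def severity_cumulants(y):
    """Step 2: normalize severities (eq. 9) and return (mu, sigma, cumulants)."""
    y = np.asarray(y, dtype=float)
    mu, sigma = y.mean(), y.std()                   # population std (ddof=0), assumed
    x = (y - mu) / sigma                            # normalized severities x_i
    M = [float(np.mean(x**r)) for r in range(1, 6)]  # moments M_1..M_5
    return float(mu), float(sigma), cumulants_from_moments(M)
```

As a sanity check, the exponential distribution with rate 1 has raw moments $M_r = r!$ and cumulants $\kappa_r = (r-1)!$, so `cumulants_from_moments([1, 2, 6, 24, 120])` returns `[1, 1, 2, 6, 24]`.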

Stage 2 Simulation
Step 3. Generate a number $m$ randomly from the frequency distribution $G$.
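Step 4. Generate $m$ samples of severity.
Step 4.1. Generate $m$ samples $v_1, v_2, \ldots, v_m$ from the standard normal distribution.
Step 4.2. Transform $v_1, v_2, \ldots, v_m$ to samples $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_m$ by the Cornish-Fisher expansion (8), with $v_\alpha$ replaced by $v_j$ and the cumulants taken from Step 2, and map them back to the severity scale:

$$\begin{aligned}
\hat{x}_j &= v_j + \frac{(v_j^2 - 1)\kappa_3}{6} + \frac{(v_j^3 - 3v_j)\kappa_4}{24} - \frac{(2v_j^3 - 5v_j)\kappa_3^2}{36} + \cdots,\\
\hat{y}_j &= \mu + \sigma \hat{x}_j, \qquad j = 1, 2, \ldots, m.
\end{aligned}$$

Step 5. Calculate the total annual loss $L$ by

$$L = \sum_{j=1}^{m} \hat{y}_j.$$

Step 6. Repeat Step 3 to Step 5 $K$ times to obtain $K$ annual losses.

Stage 3 VaR Calculation
Step 7. Calculate VaR at confidence level $\alpha$:

$$\mathrm{VaR}_\alpha = \inf\{\, l : P(L > l) \le 1 - \alpha \,\},$$

where VaR is the smallest number $l$ such that the probability that the loss $L$ exceeds $l$ is not larger than $1 - \alpha$.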
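Putting Steps 3 to 7 together gives the sketch below, assuming NumPy. It repeats a compact Cornish-Fisher transform so that it runs on its own, with the same assumed grouping of terms into orders. Several inputs are hypothetical: the negative binomial parameters `nb_n` and `nb_p` (NumPy's parameterization: number of failures before `nb_n` successes with success probability `nb_p`) stand in for the fitted values in Table 2, and the `log_severity` flag, which exponentiates the simulated values, reflects the log transformation used in the experiment rather than the general procedure above. The loop over years is vectorized: all severities are drawn at once and summed per simulated year with `np.bincount`.

```python
import math

import numpy as np


def cornish_fisher(v, k3, k4, k5, order):
    """Equation (8) truncated at the given order (term grouping assumed)."""
    x = np.array(v, dtype=float)
    if order >= 2:
        x += (v**2 - 1) * k3 / 6
    if order >= 3:
        x += (v**3 - 3 * v) * k4 / 24 - (2 * v**3 - 5 * v) * k3**2 / 36
    if order >= 4:
        x += ((v**4 - 6 * v**2 + 3) * k5 / 120
              - (v**4 - 5 * v**2 + 2) * k3 * k4 / 24
              + (12 * v**4 - 53 * v**2 + 17) * k3**3 / 324)
    return x


def cf_lda_var(mu, sigma, kappas, nb_n, nb_p, order=4, n_sims=100_000,
               alpha=0.999, log_severity=True, seed=0):
    """Steps 3-7 of the proposed procedure (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    k3, k4, k5 = kappas
    counts = rng.negative_binomial(nb_n, nb_p, size=n_sims)  # Step 3, all years
    v = rng.standard_normal(counts.sum())                    # Step 4.1
    x_hat = cornish_fisher(v, k3, k4, k5, order)             # Step 4.2
    y_hat = mu + sigma * x_hat                               # back to data scale
    if log_severity:                                         # experiment: log scale
        y_hat = np.exp(y_hat)
    year = np.repeat(np.arange(n_sims), counts)              # year of each draw
    annual = np.bincount(year, weights=y_hat, minlength=n_sims)  # Steps 5-6
    s = np.sort(annual)                                      # Step 7
    rank = max(math.ceil(alpha * n_sims - 1e-9), 1)
    return float(s[rank - 1])


# mu and sigma are the log-severity mean and standard deviation reported in the
# text; kappa_3 = 0.15 and kappa_4 = 2.18 - 3 = -0.82 are roughly implied by
# Panel B of Table 1; kappa_5 = 0 and the negative binomial parameters are
# placeholders. n_sims is reduced from the paper's 100000 to limit memory use.
var_999 = cf_lda_var(5.32, 3.31, (0.15, -0.82, 0.0), nb_n=5, nb_p=0.05,
                     order=3, n_sims=20_000)
```

Drawing all severities in one call trades memory for speed; with a large mean frequency and 100000 simulations, a year-by-year loop may be preferable.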

Experiment
In this section, the proposed approach is employed to calculate the operational risk capital charge for the Chinese banking industry as a whole, based on what is, to the best of our knowledge, the most comprehensive operational risk dataset in China. First, the dataset and its statistical characteristics are introduced. Then the experimental results on the dataset are presented.
Data Description.
Today, many financial institutions have started collecting data on their own operational loss experience, but it will take some time before the size and quality of most institutions' databases allow reliable estimation of the model parameters [19]. This data sparseness problem is even worse in Chinese banking. Therefore, in addition to our work on measurement approaches, our laboratory has also paid great attention to data collection. We have established an operational risk database of Chinese banking spanning 1994 to 2012, with a total of 2132 records. Each record is manually searched, labelled, and sorted out from public sources, such as newspapers, the internet, and court documents, and includes the loss event description, start time, end time, disclosure time, business line type, loss event type, loss amount in CNY, banks involved, location, and key persons involved. As far as we know, this is the most comprehensive operational risk dataset of Chinese banking.
In this experiment, the end time and the loss amount are extracted from the database. The summary statistics of operational loss severity are shown in Panel A of Table 1. The range and standard deviation are very large, which means that operational loss severities vary widely. The skewness is 8.51, much larger than 0, indicating that the distribution is highly right-skewed. The kurtosis is 91.63, much larger than 3, which means that the distribution has an extremely sharp peak and heavy tails. These statistical characteristics are highly consistent with the widely recognized "leptokurtosis and fat tail" feature of operational risk.
The closer the unknown distribution is to the standard normal distribution, the more accurate the Cornish-Fisher expansion is. Therefore, we take the natural logarithm of the loss severity to bring the distribution closer to normality. The summary statistics of the log-transformed data are shown in Panel B of Table 1. The central tendency of the log-transformed distribution is significantly improved. The skewness is 0.15, very close to 0, which means that the distribution is almost symmetric, with a slight right skew. The kurtosis also decreases significantly, to 2.18, which is slightly smaller than 3. In summary, the logarithmic loss severity is much closer to the normal distribution than the original data.
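The skewness and kurtosis in Table 1, and the effect of the log transformation, can be illustrated with the following sketch. It assumes NumPy and population (divisor $n$) moment estimators, which the text does not specify. The lognormal sample is synthetic and only shows the qualitative pattern; it is not the paper's data.

```python
import numpy as np


def skewness_kurtosis(x):
    """Sample skewness and (non-excess) kurtosis; a normal distribution gives 0 and 3."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**3)), float(np.mean(z**4))


rng = np.random.default_rng(1)
raw = rng.lognormal(mean=5.0, sigma=2.0, size=200_000)  # synthetic right-skewed losses
print(skewness_kurtosis(raw))          # large skewness and kurtosis
print(skewness_kurtosis(np.log(raw)))  # close to (0, 3) after the log transform
```

For standardized data these two statistics are exactly $\kappa_3$ and $\kappa_4 + 3$, which is why a log transform that brings them near $(0, 3)$ leaves the Cornish-Fisher correction terms small.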

Experiment Results.
In this section, the results of the proposed approach on the operational risk dataset are presented. First, we find a proper discrete distribution for frequency. The Poisson, negative binomial, and geometric distributions are three commonly used distributions in operational risk modeling [1,6,20]. Among them, the Poisson distribution is by far the most commonly used. In a large majority of studies, the Poisson distribution is used directly without any test [4,7], mainly because it is easy to use and because the frequency distribution is believed to have a smaller effect on capital. This hasty use of the Poisson distribution is questionable. In this study, a goodness-of-fit test is used to decide which distribution to use, rather than simply following the majority. The Kolmogorov-Smirnov goodness-of-fit test (KS test) is widely used to test whether an empirical distribution is consistent with a theoretical distribution, so we use it to find the distribution that fits the frequency best. The results of the KS test and the parameters estimated by maximum likelihood are shown in Table 2.
In the KS test, a larger $p$ value means weaker evidence against the hypothesis that the data follow the theoretical distribution, that is, a better fit. The significance level is generally set at 5%. Table 2 shows that the $p$ values of the Poisson, negative binomial, and geometric distributions are 0.00, 0.43, and 0.10, respectively. It is noteworthy that the most frequently used distribution, the Poisson distribution, is strongly rejected by the test. The negative binomial and geometric distributions pass the test with $p$ values larger than 5%. Moreover, the $p$ value of the negative binomial distribution is the largest, which means that the negative binomial distribution fits the frequency best. Therefore, in this experiment, the negative binomial distribution is used to describe the frequency distribution.
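The comparison in Table 2 can be sketched as follows, assuming NumPy and Python's `math` module. The sketch computes only the KS distance $D = \max_k |F_n(k) - F(k)|$; the $p$ values in Table 2 come from the distribution of $D$ (for example, via `scipy.stats.kstest`), and with a discrete distribution and estimated parameters the standard KS $p$ value is only approximate. As a labeled swap, the negative binomial here is fitted by the method of moments rather than the maximum likelihood estimation used in the paper. All function names are hypothetical.

```python
import math

import numpy as np


def poisson_cdf(k_max, lam):
    logpmf = [k * math.log(lam) - lam - math.lgamma(k + 1) for k in range(k_max + 1)]
    return np.cumsum(np.exp(logpmf))


def geometric_cdf(k_max, p):
    # number of failures before the first success: P(k) = p (1 - p)^k
    return np.cumsum([p * (1 - p) ** k for k in range(k_max + 1)])


def nbinom_cdf(k_max, r, p):
    # number of failures before the r-th success (same convention as NumPy)
    logpmf = [math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
              + r * math.log(p) + k * math.log(1 - p) for k in range(k_max + 1)]
    return np.cumsum(np.exp(logpmf))


def ks_statistics(counts):
    """KS distance D = max |F_n(k) - F(k)| for three fitted frequency models."""
    counts = np.asarray(counts, dtype=int)
    k_max = int(counts.max())
    # both CDFs are step functions that jump only at integers, so the supremum
    # over all real x is attained at one of the points 0, 1, ..., k_max
    ecdf = np.searchsorted(np.sort(counts), np.arange(k_max + 1), side="right") / len(counts)
    mean, var = counts.mean(), counts.var()
    fits = {
        "poisson": poisson_cdf(k_max, mean),                  # MLE: lambda = mean
        "geometric": geometric_cdf(k_max, 1 / (1 + mean)),    # MLE for p
        # method of moments (requires var > mean), not the paper's MLE
        "negative_binomial": nbinom_cdf(k_max, mean**2 / (var - mean), mean / var),
    }
    return {name: float(np.max(np.abs(ecdf - cdf))) for name, cdf in fits.items()}


# Usage on synthetic overdispersed counts (not the paper's data):
rng = np.random.default_rng(0)
d = ks_statistics(rng.negative_binomial(5, 0.2, size=3000))
```

On overdispersed counts like these, the Poisson fit, whose variance is forced to equal its mean, yields a much larger $D$ than the negative binomial fit, mirroring the pattern reported in Table 2.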
We then normalize the logarithmic operational loss severity by its mean 5.32 and standard deviation 3.31. The moments and cumulants of the normalized severity are shown in Table 3, where $M_1$ to $M_5$ denote the first to fifth moments. Based on these moments, the cumulants $\kappa_1$ to $\kappa_5$ are calculated by using (10) and then used in the Cornish-Fisher expansion.
The larger the number of simulations, the more accurate the results and the longer the computational time. To balance simulation accuracy and time cost, we follow other studies and set the number of simulations to 100000 [5,18]. Generally, the capital requirement is set to protect against losses over one year at the 99.9% level, because this is roughly equivalent to the default risk of an A-rated corporate bond [7]. The Basel Committee on Banking Supervision also recommends 99.9% as a proper confidence level. Therefore, the VaR at the 99.9% confidence level is calculated and shown in Table 4.
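With $K$ simulated annual losses sorted as $L_{(1)} \le L_{(2)} \le \cdots \le L_{(K)}$, the VaR definition of Step 7 applied to the empirical distribution selects a single order statistic. For the values used here, $K = 100000$ and $\alpha = 99.9\%$:

$$\widehat{\mathrm{VaR}}_\alpha = L_{(\lceil \alpha K \rceil)}, \qquad \lceil 0.999 \times 100000 \rceil = 99900.$$

The 99.9% VaR is therefore the 99900th smallest simulated loss, and only the 100 largest simulated years lie above it, so the tail estimate rests on roughly 100 simulated observations.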
Table 4 shows that the VaR at 99.9% ranges from 67 to 13290 billion CNY. The order of the Cornish-Fisher expansion dramatically affects the magnitude of VaR. Moreover, as the order increases, the VaR becomes relatively stable: when the order is three or larger, VaR converges to about 82 billion CNY. Generally, the higher the order of the Cornish-Fisher expansion, the more accurate the results are expected to be, because a higher order uses more of the cumulant information. The one-order Cornish-Fisher expansion uses only $\kappa_1$ and $\kappa_2$, so the simulated log-severity is normal and the severity is effectively lognormal; the VaR turns out to be 3380 billion CNY. The two-order expansion adds $\kappa_3$, and the VaR increases to 13290 billion CNY. When $\kappa_4$ is added in the three-order expansion, the VaR drops drastically to 84 billion CNY. Higher-order expansions also yield a VaR of about 82 billion CNY.
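One reading of the order convention in this paragraph, consistent with the cumulants each order is said to add, is the following truncation of (8), writing $v$ for a standard normal draw:

$$\begin{aligned}
\text{order 1: } & \hat{x} = v,\\
\text{order 2: } & \hat{x} = v + \tfrac{(v^2-1)\kappa_3}{6},\\
\text{order 3: } & \hat{x} = v + \tfrac{(v^2-1)\kappa_3}{6} + \tfrac{(v^3-3v)\kappa_4}{24} - \tfrac{(2v^3-5v)\kappa_3^2}{36},\\
\text{order 4: } & \hat{x} = (\text{order 3}) + \tfrac{(v^4-6v^2+3)\kappa_5}{120} - \tfrac{(v^4-5v^2+2)\kappa_3\kappa_4}{24} + \tfrac{(12v^4-53v^2+17)\kappa_3^3}{324}.
\end{aligned}$$

With the Panel B statistics as rough guides ($\kappa_3 \approx 0.15$ and $\kappa_4 \approx 2.18 - 3 = -0.82$), the $\kappa_3$ term raises upper quantiles (for $|v| > 1$) while the $\kappa_4$ term lowers them (for $v > \sqrt{3}$), which is consistent with the direction of the VaR changes from order 1 to order 3.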
Among the parametric distributions, the lognormal distribution is undoubtedly the most frequently used for modeling operational risk severity [1]. The parameters of the lognormal distribution are the mean and standard deviation of the log-severity; in other words, the lognormal distribution is determined only by the first and second moments. Nevertheless, higher-order moments also contain useful information, and the Cornish-Fisher results in Table 4 show that they may have significant effects on the results. Because the Cornish-Fisher expansion can use not only the mean and standard deviation but also higher-order moments, we believe that the proposed approach can estimate the operational risk capital charge more effectively.
In our previously published book [14], the proposed approach yielded an operational risk capital charge of 31 billion CNY for 2007. The dataset used in that book contains only operational risk records before 2006. After years of effort, the dataset has been largely extended, and the new dataset spans 1994 to 2012. Using the new dataset, this study concludes that the operational risk capital charge is 82 billion CNY for 2013. The total assets of Chinese banking financial institutions in 2007 were 53116 billion CNY. These institutions have developed very fast in recent years: by November 2013, their total assets had increased to 145330 billion CNY, about 2.74 times the 2007 level. Thus, the ratio of capital charge to total assets calculated from the old dataset is nearly consistent with the ratio calculated from the new dataset. In addition, the statistical characteristics of the new dataset and the old dataset are very similar. Therefore, the dataset is believed to be stable and to authentically reveal the operational risk of Chinese banking.
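The consistency claim can be checked directly with the figures quoted in this paragraph:

$$\frac{145330}{53116} \approx 2.74, \qquad \frac{31}{53116} \approx 0.058\%, \qquad \frac{82}{145330} \approx 0.056\%.$$

The capital charge is therefore about 0.06% of total assets in both years, and it grew by a factor of $82/31 \approx 2.65$, close to the asset growth factor of 2.74.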

Conclusion
In this paper, a nonparametric operational risk modeling approach based on the Cornish-Fisher expansion and the loss distribution approach is proposed. This approach does not need to assume a severity distribution beforehand; only the cumulants or moments of the severities are required in the Monte Carlo simulation process. In the experiment, based on what is, to the best of our knowledge, the most comprehensive operational risk dataset in China, the proposed approach is employed to calculate the operational risk capital charge for the Chinese banking industry as a whole.
The experiment shows that the resulting VaR values range from 67 billion CNY to 13290 billion CNY. The expansions with low-order moments lead to large VaR values of 3390 and 13290 billion CNY. When higher-order moments, that is, the fourth and fifth moments, are added to the expansion, VaR converges to around 82 billion CNY. The widely used lognormal distribution only uses the information of the first and second moments, whereas the proposed approach can also include the information of higher-order moments. Therefore, the proposed approach is expected to model operational risk more effectively.

Figure 2: The procedure of the proposed operational risk modeling approach.

Table 1: Summary statistics of loss severity (Panel A) and logarithmic loss severity (Panel B).

Table 2: Estimated parameters and goodness-of-fit test results of the frequency distributions. For the negative binomial distribution, the parameters are the specified number of failures and the probability of success in each trial; the Poisson and geometric distributions each have a single parameter.

Table 3: Estimated parameters for the Cornish-Fisher expansion. $M_1$ to $M_5$ denote the first to fifth moments; $\kappa_1$ to $\kappa_5$ denote the first to fifth cumulants.