DEPENDENT HIDDEN MARKOV MODEL OF CREDIT QUALITY
International Journal of Stochastic Analysis

We propose a dependent hidden Markov model of credit quality. We suppose that the "true" credit quality is not observed directly but only through noisy observations given by posted credit ratings. The model is formulated in discrete time with a Markov chain observed in martingale noise, where "noise" terms of the state and observation processes are possibly dependent. The model provides estimates for the state of the Markov chain governing the evolution of the credit rating process and the parameters of the model, where the latter are estimated using the EM algorithm. The dependent dynamics allow for the so-called "rating momentum" discussed in the credit literature and also provide a convenient test of independence between the state and observation dynamics.


Introduction
Credit ratings summarise a range of qualitative and quantitative information about the creditworthiness of debt issuers and are therefore a convenient signal of the credit quality of the debtor. The estimation of credit quality transition matrices is at the core of credit risk measures, with applications to pricing and portfolio risk management. In view of pending regulations regarding the calculation of capital requirements for banks, there is renewed interest in the efficiency of credit ratings as indicators of credit quality and in models of their dynamics (Basel Committee on Banking Supervision [1]).
In the study of credit quality dynamics, it is convenient to assume that the credit rating process is a time-homogeneous Markov chain, with past changes in credit quality characterised by a transition matrix. The assumptions of time homogeneity and Markovian behaviour of the rating process have been challenged by some empirical studies; see, for example, Bangia et al. [2] or Lando and Skødeberg [3]. In particular, it has been proposed that ratings exhibit "rating momentum" or "drift," where a rating change in response to a change in credit quality does not fully reflect that change in credit quality. As pointed out by Löffler [4, 5], these violations of information efficiency could be the result of some of the agencies' rating policies, namely, rating through the cycle and avoiding rating reversals.
In recent years, a number of modelling alternatives have been suggested to address departures from the Markov assumption. In Frydman and Schuermann [6], a mixture of two independent continuous-time homogeneous Markov chains is proposed for the ratings migration process, so that the future distribution of a firm's ratings depends not only on its current rating but also on the past history of ratings.
In this paper we follow the hidden Markov model (HMM) approach taken in Korolkiewicz and Elliott [10] and assume that the "true" credit quality evolution can be described by a Markov chain, but we do not observe this Markov chain directly. Rather, it is hidden in "noisy" observations represented by posted credit ratings. The model is formulated in discrete time, with a Markov chain of "true" credit quality observed in martingale noise. However, we suppose that the noise terms of the signal and observation processes are not independent, which allows for the presence of "rating momentum" in posted credit ratings. Application of such dependent hidden Markov model dynamics to modelling credit quality appears to be new. We employ hidden Markov filtering and estimation techniques described in Elliott et al. [11] and use the filter-based EM (Expectation Maximization) algorithm to estimate the parameters of the model. By construction, parameters are revised as new information is obtained, and so the resulting filters are adaptive and "self-tuning."
The paper is organized as follows. In Section 2 we describe a hidden Markov model (HMM) of credit quality and in Section 3 the dependent dynamics. Recursive filters are given in Section 4 and the parameter estimation procedure is described in Section 5. Section 6 provides an implementation example.

Dynamics of the Markov Chain and Observations
Here we briefly describe a hidden Markov model as given in Chapter 2 of Elliott et al. [11]. Formally, a discrete-time, finite-state, time-homogeneous Markov chain is a stochastic process $\{X_k\}$ with state space $S = \{1, 2, \ldots, N\}$ and transition matrix $A = (a_{ji})_{1 \le i, j \le N}$. Without loss of generality, we can assume that the elements of $S$ are identified with the standard unit vectors $\{e_1, e_2, \ldots, e_N\}$, $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)' \in R^N$. Write $F_k = \sigma\{X_0, X_1, \ldots, X_k\}$, so that the filtration $\{F_k\}$ models all possible histories of $X$. The relationship between the state of the process at time $k$ and its state at time $k+1$ is then given by
$$E[X_{k+1} \mid F_k] = E[X_{k+1} \mid X_k] = A X_k,$$
so that $X_{k+1} = A X_k + V_{k+1}$, where $V_{k+1}$ is a martingale increment with $E[V_{k+1} \mid F_k] = 0 \in R^N$.
Suppose we do not observe $X$ directly. Rather, we observe a process $Y$ such that
$$Y_{k+1} = c(X_k, \omega_{k+1}),$$
where $c$ is a function with values in a finite set and $\{\omega_k\}$ is a sequence of i.i.d. random variables independent of $X$. The random variables $\{\omega_k\}$ represent the noise present in the system. Suppose the range of $c$ consists of $M$ points, which are identified with the unit vectors $\{f_1, f_2, \ldots, f_M\}$, $f_j = (0, \ldots, 0, 1, 0, \ldots, 0)' \in R^M$. Write
$$F_k = \sigma\{X_0, \ldots, X_k\}, \quad \mathcal{Y}_k = \sigma\{Y_0, \ldots, Y_k\}, \quad G_k = \sigma\{X_0, \ldots, X_k, Y_0, \ldots, Y_k\}. \quad (2.3)$$
These increasing families of σ-fields are filtrations representing possible histories of the state process $X$, the observation process $Y$, and both processes $X$, $Y$. Write
$$c_{ji} = P(Y_{k+1} = f_j \mid X_k = e_i)$$
for the probability of observing a state $f_j$ when the signal process is in fact in state $e_i$, and $C = (c_{ji})$. Then, it can be shown that
$$Y_{k+1} = C X_k + W_{k+1},$$
where $W$ is a martingale increment with $E[W_{k+1} \mid G_k] = 0 \in R^M$. In our context, the process $Y$ represents posted credit ratings and $X$ "true" credit quality. For reasons which will become apparent in the next section, we assume one-period delay between $X$ and $Y$.
In summary, the model for the Markov chain X hidden in martingale noise is as follows.

Hidden Markov Model (HMM)
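Under a probability measure $P$,
$$X_{k+1} = A X_k + V_{k+1} \quad \text{(state equation, "true" credit quality)},$$
$$Y_{k+1} = C X_k + W_{k+1} \quad \text{(observation equation, posted rating)}. \quad (2.5)$$
$A$ and $C$ are matrices of transition probabilities whose entries are nonnegative, with each column summing to one. $V_k$ and $W_k$ are martingale increments satisfying $E[V_{k+1} \mid F_k] = 0$ and $E[W_{k+1} \mid G_k] = 0$. Parameters of this model are $a_{ji}$, $1 \le i, j \le N$, and $c_{ji}$, $1 \le i \le N$, $1 \le j \le M$.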
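As a rough illustration of the state dynamics, the following Python sketch represents states as unit vectors, simulates a short path, and checks that the increment $V_{k+1} = X_{k+1} - A X_k$ has zero conditional mean. It assumes the usual representation $X_{k+1} = A X_k + V_{k+1}$ with a column-stochastic matrix; the three-state matrix `A` and all function names are hypothetical and are not estimates from this paper.

```python
import random

# Hypothetical column-stochastic matrix: A[j][i] = P(next = e_j | current = e_i).
A = [
    [0.90, 0.10, 0.00],
    [0.08, 0.80, 0.00],
    [0.02, 0.10, 1.00],
]
N = len(A)


def unit(i, n=N):
    """Standard unit vector e_i in R^n (0-based index)."""
    return [1.0 if k == i else 0.0 for k in range(n)]


def mat_vec(M, x):
    """Matrix-vector product M x for nested lists."""
    return [sum(row[i] * x[i] for i in range(len(x))) for row in M]


def step(x, rng):
    """Draw X_{k+1} given X_k = x by sampling from the distribution A x."""
    probs = mat_vec(A, x)
    j = rng.choices(range(N), weights=probs)[0]
    return unit(j)


def conditional_mean_increment(i):
    """E[X_{k+1} - A X_k | X_k = e_i], computed exactly from column i of A."""
    ax = mat_vec(A, unit(i))
    mean_next = [sum(A[j][i] * unit(j)[k] for j in range(N)) for k in range(N)]
    return [m - a for m, a in zip(mean_next, ax)]


rng = random.Random(0)
path = [unit(0)]
for _ in range(10):
    path.append(step(path[-1], rng))
```

Each element of `path` is a unit vector, and `conditional_mean_increment(i)` returns the zero vector for every state, which is the martingale-increment property in this toy setting.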

Dependent Dynamics
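The situation considered in this section is that of a hidden Markov model (HMM) for which the "noise" terms in the state and observation processes are possibly dependent.
The dynamics of the state process $X$ and the observation process $Y$ are as given in Section 2. However, the noise terms $V_k$ and $W_k$ are not independent. Instead, we suppose that the joint distribution of $Y_{k+1}$ and $X_{k+1}$ given $X_k$ is described by the probabilities
$$s_{rji} = P(Y_{k+1} = f_r, X_{k+1} = e_j \mid X_k = e_i) = E[\langle Y_{k+1}, f_r \rangle \langle X_{k+1}, e_j \rangle \mid X_k = e_i],$$
$1 \le r \le M$, $1 \le i, j \le N$, where $\langle \cdot, \cdot \rangle$ denotes the scalar product in $R^M$ and $R^N$, respectively. Write $S = (s_{rji})$ and $\gamma_{rji} = s_{rji}/a_{ji}$, so that $\gamma_{rji} = P(Y_{k+1} = f_r \mid X_{k+1} = e_j, X_k = e_i)$.
In summary, the model is now as follows.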

Dependent Hidden Markov Model (Dependent HMM)
Under a probability measure P ,

3.3
$A$ and $C$ are matrices of transition probabilities whose entries are nonnegative, with each column summing to one. $V_k$ and $W_k$ are martingale increments satisfying $E[V_{k+1} \mid F_k] = 0$ and $E[W_{k+1} \mid G_k] = 0$. Parameters of this model are the entries of the matrices $A$, $C$, and $S$.
We are in a situation analogous to the dependent hidden Markov model case discussed in Chapter 2, Section 10 of Elliott et al. [11]. The difference is that we are assuming dynamics where the observation $Y_k$ depends on both $X_k$ and $X_{k-1}$. In other words, we suppose that the current credit rating contains information about both current and previous credit quality, thus allowing for the situation where a rating does not immediately reflect all available information about credit quality, as indicated by a number of empirical studies (see, e.g., Lando and Skødeberg [3]). Put differently, in this model the state $X_k$ and the observation $Y_k$ jointly depend on $X_{k-1}$, which means that, in addition to the previous period's credit quality, knowledge of the current credit rating carries information about current credit quality. Moreover, the probabilities $\gamma_{rji}$ provide the distribution of the next period's credit rating given both current and next period's credit quality, thus allowing us to capture "rating momentum" or "rating drift."
In the following sections we will present estimates for the state of the Markov chain $X$, the number of jumps from one state to another, the occupation time of $X$ in any state, the number of transitions from a state of $X$ to a particular state of the observation process $Y$, and the number of joint transitions of $X$ and $Y$. We will then use the filter-based EM (expectation maximization) algorithm, as described in Elliott et al. [11], to obtain optimal estimates of the model parameters, making the model adaptive or "self-tuning."
Note that if the noise terms in the state $X$ and observation $Y$ are independent, the joint probabilities $s_{rji}$ factorise into the product of an observation probability and a transition probability. Hence, if the noise terms are independent,
$$s_{rji} = c_{rj} a_{ji} \quad (3.7)$$
for $1 \le r \le M$, $1 \le i, j \le N$. Consequently, a test of independence is to check whether the parameter estimates satisfy
$$\hat{s}_{rji} = \hat{c}_{rj} \hat{a}_{ji}. \quad (3.8)$$
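The condition in (3.7) can be checked entry by entry once estimates are available. The Python sketch below is a minimal illustration under the assumption that the matrices are stored as nested lists `A[j][i]`, `C[r][j]`, and `S[r][j][i]`; the storage layout, function names, and the two-state example values are hypothetical and not taken from the paper.

```python
def independence_products(A, C):
    """Return P[r][j][i] = c_rj * a_ji, the value of s_rji under independence."""
    M, N = len(C), len(A)
    return [[[C[r][j] * A[j][i] for i in range(N)] for j in range(N)] for r in range(M)]


def max_abs_deviation(S, A, C):
    """Largest |s_rji - c_rj * a_ji| over all r, j, i."""
    P = independence_products(A, C)
    return max(
        abs(S[r][j][i] - P[r][j][i])
        for r in range(len(C))
        for j in range(len(A))
        for i in range(len(A))
    )


# Hypothetical 2-state, 2-rating example (not estimates from the paper).
A = [[0.9, 0.2], [0.1, 0.8]]
C = [[0.7, 0.4], [0.3, 0.6]]
S_indep = independence_products(A, C)  # independent by construction
deviation = max_abs_deviation(S_indep, A, C)
```

A deviation close to zero is consistent with independence; how large a deviation counts as a departure is a statistical question that this sketch does not address.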

Recursive Filter
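Following Elliott et al. [11], suppose that we work under a reference probability measure $\bar{P}$. Suppose we observe $Y_0, \ldots, Y_k$, and we wish to estimate $X_0, \ldots, X_k$. The best mean-square estimate of $X_k$ given $\mathcal{Y}_k = \sigma\{Y_0, \ldots, Y_k\}$ is the conditional expectation $E[X_k \mid \mathcal{Y}_k]$, which is obtained by normalising an unnormalised conditional expectation $q_k$, that is, $E[X_k \mid \mathcal{Y}_k] = q_k / \langle q_k, \mathbf{1} \rangle$ with $\mathbf{1} = (1, 1, \ldots, 1)'$. Hence, to estimate $E[X_k \mid \mathcal{Y}_k]$ we need to know the dynamics of $q$. Using the methods of Elliott et al. [11], a recursive formula for $q_{k+1}$ in terms of $q_k$ and the new observation $Y_{k+1}$ is obtained.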
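To illustrate how a recursive filter of this kind operates, the following Python sketch runs a generic forward recursion. It assumes that $s_{rji} = P(Y_{k+1} = f_r, X_{k+1} = e_j \mid X_k = e_i)$, so that when rating $f_r$ is observed the unnormalised vector is updated by $q_{k+1}(j) = \sum_i s_{rji}\, q_k(i)$; any constant factor coming from the reference measure is dropped because it cancels on normalisation. This is not necessarily the exact recursion of Elliott et al. [11], and the parameter values and function names are hypothetical.

```python
def filter_step(q, S, r):
    """One unnormalised update: q_next[j] = sum_i S[r][j][i] * q[i] for observed rating r."""
    N = len(q)
    return [sum(S[r][j][i] * q[i] for i in range(N)) for j in range(N)]


def normalise(q):
    """Conditional state distribution q / <q, 1>."""
    total = sum(q)
    return [v / total for v in q]


def run_filter(q0, S, ratings):
    """Return the normalised filter after each observed rating index in `ratings`."""
    q, out = list(q0), []
    for r in ratings:
        q = filter_step(q, S, r)
        q = normalise(q)  # rescaling each step avoids underflow and leaves the estimate unchanged
        out.append(q)
    return out


# Hypothetical 2-state, 2-rating parameters (independent case s_rji = c_rj * a_ji).
A = [[0.9, 0.2], [0.1, 0.8]]
C = [[0.7, 0.4], [0.3, 0.6]]
S = [[[C[r][j] * A[j][i] for i in range(2)] for j in range(2)] for r in range(2)]
estimates = run_filter([0.5, 0.5], S, [0, 0, 1])
```

Each entry of `estimates` is a probability vector over the hidden states, playing the role of $E[X_k \mid \mathcal{Y}_k]$ in this toy example.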

Parameter Estimates
To estimate the parameters of the model, that is, the matrices $A$, $C$, and $S$, we need estimates of the following processes: $J^{ij}_k$, $O^i_k$, $T^{ir}_k$, and $L^{ijr}_k$.
These processes are interpreted as follows. $J^{ij}_k$ is the number of jumps of $X$ from state $e_i$ to state $e_j$ up to time $k$. $O^i_k$ is the amount of time, up to time $k - 1$, that $X$ has spent in state $e_i$. $T^{ir}_k$ is the number of transitions, up to time $k$, from state $e_i$ to observation $f_r$. $L^{ijr}_k$ is the number of jumps of $X$ from state $e_i$ to state $e_j$, up to time $k$, while $Y$ was in state $f_r$.
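The interpretation of these four processes can be made concrete by computing them for a fully observed pair of paths. The Python sketch below does this under an illustrative timing assumption: step $n$ pairs $X_{n-1}$ with $X_n$ and with $Y_n$. That pairing, the function name, and the short example paths are assumptions made for illustration only; in the model itself $X$ is hidden and these quantities are estimated rather than counted.

```python
from collections import Counter


def path_counts(xs, ys):
    """Counts from fully observed paths xs[0..k], ys[0..k] of state and rating indices.

    Timing assumption (illustrative): step n pairs X_{n-1} with X_n and Y_n.
    """
    J, O, T, L = Counter(), Counter(), Counter(), Counter()
    for n in range(1, len(xs)):
        i, j, r = xs[n - 1], xs[n], ys[n]
        J[(i, j)] += 1      # jump of X from e_i to e_j
        O[i] += 1           # time spent in e_i up to time k - 1
        T[(i, r)] += 1      # state e_i followed by observation f_r
        L[(i, j, r)] += 1   # joint move e_i -> e_j with rating f_r
    return J, O, T, L


# Hypothetical paths over two states and two ratings.
xs = [0, 0, 1, 1, 0]
ys = [0, 0, 0, 1, 1]
J, O, T, L = path_counts(xs, ys)
```

By construction, summing `J[(i, j)]` over `j` recovers `O[i]`, and the same holds for `T` summed over ratings and `L` summed over both indices.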

Note that
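Consider first the jump process $\{J^{ij}_k\}$. We wish to estimate $J^{ij}_k$ given the observations $Y_0, \ldots, Y_k$. As in the case of the filter for the state $X$ described in Section 4, the best mean-square estimate is the conditional expectation $E[J^{ij}_k \mid \mathcal{Y}_k]$, $\mathcal{Y}_k = \sigma\{Y_0, \ldots, Y_k\}$, which is obtained by normalising an unnormalised estimate $\sigma(J^{ij})_k$. We wish to know how $\sigma(J^{ij})_k$ is updated as time passes and new information arrives. However, as noted in Elliott et al. [11], we work with the vector process $\sigma(J^{ij}X)_k$, from which the quantity of interest is recovered as
$$\sigma(J^{ij})_k = \langle \sigma(J^{ij}X)_k, \mathbf{1} \rangle, \quad \mathbf{1} = (1, 1, \ldots, 1)'. \quad (5.3)$$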
Similarly, we consider the best mean-square estimates of $O^i_k$, $T^{ir}_k$, and $L^{ijr}_k$ given $\mathcal{Y}_k = \sigma\{Y_0, \ldots, Y_k\}$:

5.4
Recursive formulae for the processes are as follows:

5.6
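As in the case of the number of jumps of the state process $X$, the quantities of interest $\sigma(O^i)_k$, $\sigma(T^{ir})_k$, and $\sigma(L^{ijr})_k$ are obtained by taking inner products with $\mathbf{1} = (1, 1, \ldots, 1)'$:
$$\sigma(O^i)_k = \langle \sigma(O^i X)_k, \mathbf{1} \rangle, \quad \sigma(T^{ir})_k = \langle \sigma(T^{ir} X)_k, \mathbf{1} \rangle, \quad \sigma(L^{ijr})_k = \langle \sigma(L^{ijr} X)_k, \mathbf{1} \rangle. \quad (5.7)$$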
The model is determined by the parameters
$$\theta = \{a_{ji},\ 1 \le i, j \le N;\ c_{ji},\ 1 \le i \le N,\ 1 \le j \le M;\ s_{rji},\ 1 \le r \le M,\ 1 \le i, j \le N\}. \quad (5.8)$$
We want to determine a new set of parameters $\hat{\theta} = \{\hat{a}_{ji}, \hat{c}_{ji}, \hat{s}_{rji}\}$ given the arrival of new information embedded in the values of the observation process $Y$. This requires maximum likelihood estimation. As in [11], we proceed by using the filter-based EM (Expectation Maximization) algorithm, which retains the well-established statistical properties of the EM algorithm while reducing memory costs and thus allowing for faster computation (see, e.g., Krishnamurthy and Chung [12]).
Consider first the parameter $a_{ji}$. Suppose that, under measure $P_{\theta}$, $X$ is a Markov chain with transition matrix $A = (a_{ji})$. We define a new probability measure $P_{\hat{\theta}}$ such that, under $P_{\hat{\theta}}$, $X$ is a Markov chain with transition matrix $\hat{A} = (\hat{a}_{ji})$. To this end, put
$$\Lambda_k = \prod_{l=1}^{k} \sum_{r,s=1}^{N} \frac{\hat{a}_{sr}}{a_{sr}} \langle X_l, e_s \rangle \langle X_{l-1}, e_r \rangle. \quad (5.10)$$
In case $a_{ji} = 0$, take $\hat{a}_{ji} = 0$ and set the corresponding ratio in (5.10) equal to 1.
Define $P_{\hat{\theta}}$ by setting $dP_{\hat{\theta}}/dP_{\theta} \mid_{F_k} = \Lambda_k$. It can then be shown that, under $P_{\hat{\theta}}$, $X$ is a Markov chain with transition matrix $\hat{A} = (\hat{a}_{ji})$. Moreover, given the observations up to time $k$, $\{Y_0, Y_1, \ldots, Y_k\}$, and given the parameter set $\theta = \{a_{ji},\ 1 \le i, j \le N;\ c_{ji},\ 1 \le i \le N,\ 1 \le j \le M\}$, the EM estimates $\hat{a}_{ji}$ are given by

5.11
Consider now the parameter $c_{ji}$. Suppose that, under measure $P_{\theta}$, the observation process $Y$ has the dynamics of Section 2, where $C = (c_{ji})$. We define a new probability measure $P_{\hat{\theta}}$ as follows. Put

5.13
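In case $c_{ji} = 0$, take $\hat{c}_{ji} = 0$ and $\hat{c}_{sr}/c_{sr} = 1$. Define $P_{\hat{\theta}}$ by setting $dP_{\hat{\theta}}/dP_{\theta} \mid_{G_k} = \Lambda_k$. Again it can be shown that, under $P_{\hat{\theta}}$, the observation dynamics hold with $C$ replaced by $\hat{C} = (\hat{c}_{ji})$. Moreover, given the observations up to time $k$, $\{Y_0, Y_1, \ldots, Y_k\}$, and given the parameter set $\theta = \{a_{ji},\ 1 \le i, j \le N;\ c_{ji},\ 1 \le i \le N,\ 1 \le j \le M\}$, the EM estimates $\hat{c}_{ji}$ are given by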

5.15
Finally, consider the parameter $s_{rji}$. A new probability measure $P_{\hat{\theta}}$ is defined by putting

5.16
In case $s_{rji} = 0$, take $\hat{s}_{rji} = 0$ and $\hat{s}_{rji}/s_{rji} = 1$. Define $P_{\hat{\theta}}$ by setting

5.17
Given the observations up to time $k$, $\{Y_0, Y_1, \ldots, Y_k\}$, and given the parameter set $\theta$, the EM estimates $\hat{s}_{rji}$ are then given by

5.18
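A minimal Python sketch of the final normalisation step is given below. It assumes the ratio form familiar from filter-based EM for hidden Markov models, in which each updated parameter is an estimated count divided by the estimated occupation time of the originating state. This ratio form, the function name, the dictionary layout, and the example numbers are assumptions made for illustration; in practice the inputs would be the filtered quantities $\sigma(J^{ij})_k$, $\sigma(O^i)_k$, $\sigma(T^{ir})_k$, and $\sigma(L^{ijr})_k$ rather than exact counts.

```python
def em_update(J, O, T, L, N, M):
    """Assumed ratio-form updates: a_ji = J_ij / O_i, c_ri = T_ir / O_i, s_rji = L_ijr / O_i.

    J, O, T, L map index tuples to (estimated) counts; states with O_i = 0 keep zero entries.
    """
    A = [[0.0] * N for _ in range(N)]
    C = [[0.0] * N for _ in range(M)]
    S = [[[0.0] * N for _ in range(N)] for _ in range(M)]
    for i in range(N):
        occ = O.get(i, 0.0)
        if occ == 0:
            continue
        for j in range(N):
            A[j][i] = J.get((i, j), 0.0) / occ
            for r in range(M):
                S[r][j][i] = L.get((i, j, r), 0.0) / occ
        for r in range(M):
            C[r][i] = T.get((i, r), 0.0) / occ
    return A, C, S


# Hypothetical count estimates for two states and two ratings.
J = {(0, 0): 1.0, (0, 1): 1.0, (1, 1): 1.0, (1, 0): 1.0}
O = {0: 2.0, 1: 2.0}
T = {(0, 0): 2.0, (1, 1): 2.0}
L = {(0, 0, 0): 1.0, (0, 1, 0): 1.0, (1, 1, 1): 1.0, (1, 0, 1): 1.0}
A, C, S = em_update(J, O, T, L, N=2, M=2)
```

Dividing by the occupation time guarantees that each column of the updated `A` and `C`, and each slice of `S` over $(r, j)$, sums to one whenever the inputs are mutually consistent.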

Implementation Example
The dependent hidden Markov model (Dependent HMM) described in the previous sections was applied to a dataset of Standard & Poor's credit ratings. A description of the data and the implementation results are given below.
We have a total of 19,515 firm-years in our sample. However, only 34% of those observations correspond to one of the eight Standard & Poor's rating labels in a given year. The remaining 66% of observations represent the so-called NR (not rated) status. As discussed in the literature, transitions to NR may be due to several reasons, such as expiration of the debt, calling of the debt, or the issuer deciding to bypass an agency rating (see, e.g., Bangia et al. [2]). Unfortunately, details of individual transitions to NR are not known.

Data Description
Excluding NR, approximately 85% of the remaining ratings are in categories A down to B. The median rating is BB, the highest non-investment-grade rating.Approximately 1% of the observed ratings are AAA and 2% are defaults.The most common rating is B, two rating categories above default, which accounts for 25.5% of the observations.

Implementation Results
Since individual firms generally experience few rating changes, and changes that do occur are to neighbouring categories, we apply the Dependent HMM algorithm to an aggregate of firms in the dataset in order to allow for more observed transitions between rating categories and make inferences possible. Specifically, we follow the filter-based cohort approach adopted in Korolkiewicz and Elliott [10], and instead of estimating the distribution and parameters for the Markov chain $X^l_k$ for each firm $l$, we estimate the distribution and parameters for $\sum_{l=1}^{L} X^l_k$, given the additivity of all stochastic processes discussed in Sections 4 and 5.
Given the fairly large number of parameters to be estimated compared to the number of rating transitions in the dataset, we have reclassified all firms in the sample as IG (investment grade), SG (speculative grade), D, or NR and then applied the Dependent HMM algorithm.
Recall from Section 3 that, given estimates of matrices $A$ and $C$, our Dependent HMM also provides the distribution of posted credit ratings at time $k+1$ given "true" credit quality at times $k$ and $k+1$, namely, estimates of the conditional probabilities $\gamma_{rji} = P(Y_{k+1} = f_r \mid X_{k+1} = e_j, X_k = e_i)$. To illustrate, consider a borrower with investment-grade "true" credit quality at times $k$ and $k+1$. The probability that this borrower is assigned to a speculative-grade rating class is $P(Y_{k+1} = SG \mid X_{k+1} = IG, X_k = IG)$, which, given our model parameter estimates, is $\hat{s}_{SG,IG,IG} / \hat{a}_{IG,IG} = 0.007/0.408 = 0.017$. Similarly, for a borrower whose "true" credit quality improves from SG to IG, the probability of being assigned to an IG rating class is $P(Y_{k+1} = IG \mid X_{k+1} = IG, X_k = SG)$, which we would estimate to be $0.007/0.018 = 0.389$. These estimates again suggest that rating agencies may be somewhat reluctant to downgrade (upgrade) borrowers from (to) investment grade, thus introducing a degree of "rating momentum."
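The two conditional probabilities quoted above follow directly from the ratio $\gamma_{rji} = s_{rji}/a_{ji}$. As a small check of the arithmetic, the Python fragment below recomputes them from the values given in the text; the function and variable names are illustrative.

```python
def gamma(s_rji, a_ji):
    """Conditional rating probability gamma_rji = s_rji / a_ji."""
    return s_rji / a_ji


# Values quoted in the text.
p_sg_given_ig_ig = gamma(0.007, 0.408)  # rated SG while credit quality stays IG
p_ig_given_sg_ig = gamma(0.007, 0.018)  # rated IG when credit quality moves SG -> IG

print(round(p_sg_given_ig_ig, 3), round(p_ig_given_sg_ig, 3))  # prints 0.017 0.389
```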

Test of Independence
Recall that the Dependent HMM allows the "noise" terms in the state and observation processes to be possibly dependent. As indicated in Section 3, a convenient test of independence is to check whether the estimated parameters of the model satisfy $\hat{s}_{rji} = \hat{c}_{rj} \hat{a}_{ji}$.
Given our estimates of matrices $A$ and $C$, the products $\hat{c}_{rj} \hat{a}_{ji}$ were calculated and then compared to the corresponding entries of the estimated matrix $S$ using linear regression. The regression results are given in Table 2. As indicated by the high F-statistic (4728.10) and high $R^2$ value (98.71%), the fitted regression model is significant. The slope estimate is very close to one, with a low standard error and a P value of 0.000, while the intercept estimate is very close to zero and not significant (P value of 0.91). These regression results suggest no major departures from independence, which seems to agree with the findings in Kiefer and Larson [14] that indicate the Markov assumption, implicit in most credit risk models, does not seem to be "too wrong" for typical forecast horizons. However, longer rating histories may be necessary to verify these results.
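The regression comparison can be sketched in Python as follows. The sketch assumes that the entries of $S$ are regressed on the products $c_{rj} a_{ji}$ with an intercept, which is one natural reading of the procedure; the parameter values are hypothetical, with a small artificial departure from independence, and are not the estimates behind Table 2.

```python
def ols(x, y):
    """Least-squares fit y = intercept + slope * x; returns (intercept, slope, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return intercept, slope, r_squared


def flatten_pairs(S, A, C):
    """Pair each product c_rj * a_ji with the matching entry s_rji."""
    xs, ys = [], []
    for r in range(len(C)):
        for j in range(len(A)):
            for i in range(len(A)):
                xs.append(C[r][j] * A[j][i])
                ys.append(S[r][j][i])
    return xs, ys


# Hypothetical 2-state, 2-rating parameters.
A = [[0.9, 0.2], [0.1, 0.8]]
C = [[0.7, 0.4], [0.3, 0.6]]
S = [[[C[r][j] * A[j][i] for i in range(2)] for j in range(2)] for r in range(2)]
S[0][0][0] += 0.01  # small artificial departure from independence
S[1][0][0] -= 0.01  # keeps the sum over (r, j) equal to one
xs, ys = flatten_pairs(S, A, C)
intercept, slope, r2 = ols(xs, ys)
```

Under exact independence the fit would return a slope of one and an intercept of zero, so the distance of the fitted values from those benchmarks gives an informal measure of dependence.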

Conclusion
We have proposed a Dependent Hidden Markov Model for the evolution of credit quality in discrete time with a Markov chain observed in martingale noise. We have applied the estimation techniques of hidden Markov models from Elliott et al. [11] to obtain the best estimate of the Markov chain representing "true" credit quality and estimates of the parameters. The estimation procedure was repeated to ensure that the model and estimates improved with each iteration. The model was applied to a dataset of Standard & Poor's issuer ratings, and our preliminary results agree with some qualitative observations made in the literature regarding credit rating systems but also indicate no significant dependence in the dynamics of the "state" (credit quality) and "observation" (credit rating) processes.
Wendin and McNeil [7] suppose that credit ratings are subject to both observed and unobserved systematic risk. Rating transition patterns (e.g., rating momentum) are captured within the context of a generalised linear mixed model (GLMM) that is estimated using Bayesian techniques. Stefanescu et al. [8] propose a Bayesian hierarchical framework, based on Markov Chain Monte Carlo (MCMC) techniques, to model non-Markovian dynamics in ratings migrations. In Wozabal and Hochreiter [9], a coupled Markov chain model is introduced to model dependency among rating migrations of issuers.
In the setting of Section 4, define a new probability measure $P$ by putting $dP/d\bar{P} \mid_{G_k} = \Lambda_k$. Then, under $P$, $X$ remains a Markov chain with transition matrix $A$. However, $\bar{P}$ is a much easier measure under which to work, and conditional expectations under $P$ are expressed in terms of conditional expectations under $\bar{P}$ using Bayes' Theorem as described in Elliott et al. [11].
Our analysis takes advantage of the Standard & Poor's COMPUSTAT database, which contains rating histories for 1,301 obligors over the period 1985-1999 (Standard & Poor's [13]). The universe of obligors is mainly large US and Canadian corporate institutions. The obligors include industrials, utilities, insurance companies, banks and other financial institutions, and real-estate companies. The COMPUSTAT database provides annual ratings. Every year each of the rated obligors is assigned to one of Standard & Poor's 7 rating categories, ranging from AAA (highest rating) to CCC (lowest rating), or to D (payment in default) or the NR (not rated) state.