Mortality models often have inbuilt identification issues that challenge the statistician. The statistician can choose to work with well-defined, freely varying parameters, derived as maximal invariants in this paper, or with ad hoc identified parameters, which at first glance seem more intuitive but can introduce a number of unnecessary challenges. In this paper we describe the methodological advantages of using the maximal invariant parametrisation, and we go through the extra methodological challenges a statistician has to deal with when insisting on working with ad hoc identifications. These challenges are broadly similar in frequentist and in Bayesian setups. We also go through a number of examples from the literature where ad hoc identifications have been preferred in the statistical analyses.

Mortality models are commonly used in a wide range of fields such as actuarial sciences, epidemiology, and sociology. They are often used in important decisions such as how to deal with unisex legislation in the pension industry; see Ornelas et al. [

A simple example is the age-period model for an age-period array of mortality rates. It is well known that the levels of the age and period effects cannot be determined from the likelihood, which reflects the overparametrisation of the model. When the estimated age and period effects are treated as time series and subjected to plotting and extrapolation, our approach ensures that the statistical analysis is the same for two researchers identifying the above model in two different ways. Whereas this issue is relatively simple for the age-period model, identification becomes more tricky for complicated models such as the age-period-cohort model and the model of Lee and Carter [

Mortality models are built as a combination of age, period, and cohort effects, but the likelihood only varies with a surjective function of these time effects. The time effects can be divided into two parts: one part that moves the likelihood function and another part that does not induce any variation in it. We will argue that all inferences and forecasts should be concerned primarily with the part of the parameter that moves the likelihood function. This does not preclude the researcher from working with the time effects, but it places some limitations on what can be done. This is important because the motivation and the intuition of mortality models typically originate in the time effects. For instance, in the context of an age-period-cohort model, linear trends cannot be identified, so time series plots of the time effects need to be invariant to linear trends, and extrapolations of time effects must preserve the arbitrary linear trend in the time effects. This applies regardless of whether the identification issue is dealt with in a frequentist manner or by Bayesian methods.
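The invariance to linear trends can be checked numerically. The following is a minimal sketch, assuming an age-period array with additive predictor `mu[i, j] = alpha[i] + beta[j] + gamma[k] + delta` and cohort index `k = j - i + (I - 1)`; the function name `apc_predictor`, the 0-based indexing, and all numerical values are illustrative assumptions rather than notation from the paper. It adds arbitrary levels and a common linear trend to the time effects and confirms that the predictor and the double differences of the time effects are unchanged, whereas the time effects themselves are not.

```python
import numpy as np

def apc_predictor(alpha, beta, gamma, delta):
    """Additive age-period-cohort predictor on an I x J age-period array.

    Assumed (illustrative) convention: 0-based age index i, period index j,
    and cohort index k = j - i + (I - 1), so that k runs over 0..I+J-2.
    """
    I, J = len(alpha), len(beta)
    i = np.arange(I)[:, None]
    j = np.arange(J)[None, :]
    k = j - i + (I - 1)
    return alpha[:, None] + beta[None, :] + gamma[k] + delta

rng = np.random.default_rng(0)
I, J = 5, 4
K = I + J - 1
alpha = rng.normal(size=I)
beta = rng.normal(size=J)
gamma = rng.normal(size=K)
delta = 0.3

# Arbitrary levels a, b, c and an arbitrary linear trend with slope d.
a, b, c, d = 1.0, -2.0, 0.5, 0.7
alpha2 = alpha + a + d * np.arange(I)
beta2 = beta + b - d * np.arange(J)
gamma2 = gamma + c + d * np.arange(K)
delta2 = delta - a - b - c - d * (I - 1)

mu = apc_predictor(alpha, beta, gamma, delta)
mu2 = apc_predictor(alpha2, beta2, gamma2, delta2)

# The predictor, and hence the likelihood, cannot tell the two apart.
predictor_unchanged = np.allclose(mu, mu2)

# Double differences of the time effects are invariant to the transformation.
double_diffs_unchanged = all(
    np.allclose(np.diff(x, n=2), np.diff(y, n=2))
    for x, y in [(alpha, alpha2), (beta, beta2), (gamma, gamma2)]
)

# The time effects themselves differ, so their levels and slopes are arbitrary.
time_effects_changed = not np.allclose(alpha, alpha2)

print(predictor_unchanged, double_diffs_unchanged, time_effects_changed)
```

In this sketch a plot of `alpha` and a plot of `alpha2` would look different although both describe the same fitted model, which is why plots and extrapolations should only rely on features, such as double differences, that are unaffected by the transformation.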

To formalise the discussion slightly, we return to the age-period example. Denote the predictor for the age-period data array by

Once an ad hoc identification of

Indeed, with many extrapolation methods forecasts will be invariant to the choice of

We will start by analysing linearly parametrised models at a rather general level. We do this with two aspects in mind. First, we need to step back to a point in the analysis before ad hoc identification is made. Secondly, we also want to avoid the discussion of how to choose

Subsequently, we will consider the age-period-cohort model in detail, both for one- and two-sample situations. Using the general results it becomes easier to see that a number of popular methods inadvertently include features that are not invariant to ad hoc identification. These include the “intrinsic estimator” advocated by Yang et al. [

Throughout the paper our concern rests exclusively with the identification problem and the consequences of ad hoc identification for estimation, plots, inference, and forecasting. In practice, important additional concerns are how to choose appropriate models and forecasting methods. We would like to refer to Girosi and King [

Section

In this section we present the identification problem in a linear framework. The problem is solved by analysing the mapping from the original time effect to the predictor which, in turn, leads to standard statistical analysis. In Section

In Section

The analysis of the linearly parametrised model involves projections on linear or affine spaces and on their orthogonal complements. It is therefore convenient to introduce the following notation. A matrix

Think of the time effect

Consider a data vector

The model for the predictor

The parameter space for the likelihood function and therefore for the statistical model is given by the range of variation for the predictor

The identification problem of mortality models arises when the mapping from the time effect space

When analysing the mapping from our intuitively preferred parametrisation

The first method is to find a basis

Alternatively, the identification problem can be expressed through an invariance argument. This argument relates to the parametrisation but resembles the classical invariance argument for reduction of data; see Cox and Hinkley [

In applications it can be difficult to find a basis

It is useful to note that in the choices of

The statistical model parametrised with the maximal invariant parameter

Suppose the likelihood is drawn from a generalized linear model based on an exponential family. Then the model is actually a regular exponential family where the maximal invariant parameter

The maximal invariant parameter

Hypotheses are easily formulated and analysed when using the maximal invariant parametrisation. An affine hypothesis that restricts

Most often the objective of a mortality study is to forecast the future mortality. In the linear context,

It is usually easy to extend the design

Ad hoc identified time effects can be extrapolated in a similar way; see Section

The introduction of the canonical parameter shows that the likelihood, in Bayesian notation, is of the form

In contrast, introducing a prior on ad hoc identified parameters gives various difficulties. Only parts of the prior are updated by the likelihood, so that it becomes unclear which information arises from the data and which information arises from the ad hoc identification. Moreover, avoidable arbitrariness is introduced in the forecast; see Section

In Section

In Section

In this section the time effect parametrisation is considered. An identification scheme has to be introduced when working with the time effects. This may rest on mathematical convenience or it may be chosen for a particular purpose given the substantive context. We therefore call it ad hoc identification. Here we consider a simple identification scheme but turn to a more common two-step identification scheme in Section

Once the canonical parameter

A linear ad hoc identification of

It is perhaps interesting to note that despite the linear parametrisation the ad hoc identification need not be done in a linear fashion as in (

The fit of the model is unaffected by the ad hoc identification. Indeed the fit is measured in terms of the estimate of the predictor

As an illustration of estimation in the presence of ad hoc identification consider a normal likelihood. Different, but equivalent, expressions can be found depending on the parametrisation. The likelihood of the predictor

The likelihood (
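That the fit does not depend on the ad hoc identification can be illustrated with a small least squares computation. The sketch below assumes an age-period model with predictor `alpha[i] + beta[j]` and a normal likelihood, so that maximum likelihood reduces to least squares; the helper `fit_with_constraint`, the simulated data, and the two constraints (first age effect set to zero, or first period effect set to zero) are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J = 4, 3

# Design for the predictor alpha[i] + beta[j]: I + J columns, one too many.
X = np.zeros((I * J, I + J))
for i in range(I):
    for j in range(J):
        X[i * J + j, i] = 1.0        # age effect alpha[i]
        X[i * J + j, I + j] = 1.0    # period effect beta[j]
y = rng.normal(size=I * J)           # simulated data, for illustration only

def fit_with_constraint(X, y, drop):
    """Least squares with one time effect fixed at zero (its column dropped)."""
    keep = [col for col in range(X.shape[1]) if col != drop]
    coef, *_ = np.linalg.lstsq(X[:, keep], y, rcond=None)
    theta = np.zeros(X.shape[1])
    theta[keep] = coef
    return theta

theta_a = fit_with_constraint(X, y, drop=0)   # identification: alpha[0] = 0
theta_b = fit_with_constraint(X, y, drop=I)   # identification: beta[0] = 0

rank = np.linalg.matrix_rank(X)                        # I + J - 1
same_fit = np.allclose(X @ theta_a, X @ theta_b)       # fitted predictor agrees
different_effects = not np.allclose(theta_a, theta_b)  # time effects do not

print(rank, same_fit, different_effects)
```

Both constraints span the same column space, so the fitted predictor and the residual sum of squares coincide; only the reported time effects differ, by a constant that is added to the age effects and subtracted from the period effects.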

It is common to ad hoc identify parameters in a step-wise fashion. In the first step the time effect parameter is only partially constrained. The full identification then follows in a second step. An example is given in Section

The first step constraints are affine of the type

Suppose

The partial ad hoc identification by (

In the following we will look closer at the consequences of working with the ad hoc identified time effect parameter

In the mortality model (

Estimates of the time effects are constructed by combining an estimate of

Attempts to give intrinsic meaning to

Adding confidence bands to a plot of

Finally, it may be of interest to analyse the estimated

Having formulated the model in terms of time effects it may be of interest to test the hypothesis that one of these time effects is absent. No identification issues arise when the hypothesis is formulated as a restriction on the canonical parameter

Affine hypotheses on the time effect are of the form

Forecasts can be made by extrapolating the ad hoc identified time effects

Following the linear approach outlined in Section

In contrast, these considerations are redundant when working with the canonical parameter,

Mortality analysis is often carried out using either Bayesian methods or random effects methods. The mortality model is then altered through the introduction of a prior distribution on the parameters. One might think that the identification problems become less of an issue or even disappear. This is not the case, since the Bayesian method and the random effects method are based on the mortality likelihood, which only depends on the time effect
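A conjugate normal example gives a concrete impression of why a prior does not remove the identification problem. The sketch below assumes an age-period predictor `alpha[i] + beta[j]`, a normal likelihood with known variance `sigma2`, and an independent standard normal prior on all time effects; these choices, the simulated data, and the variable names are assumptions made for illustration and are not taken from the paper. Under this prior the unidentified direction is a priori independent of the rest, and its posterior mean and variance coincide with the prior mean and variance, whereas an identified contrast is updated by the data.

```python
import numpy as np

rng = np.random.default_rng(2)
I, J = 4, 3

# Age-period design: predictor alpha[i] + beta[j].
X = np.zeros((I * J, I + J))
for i in range(I):
    for j in range(J):
        X[i * J + j, i] = 1.0
        X[i * J + j, I + j] = 1.0
y = rng.normal(size=I * J)   # simulated data, for illustration only
sigma2 = 1.0                 # assumed known observation variance

# Assumed prior: theta ~ N(0, identity) on the stacked time effects.
prior_mean = np.zeros(I + J)
prior_cov = np.eye(I + J)
prior_prec = np.linalg.inv(prior_cov)

# Standard conjugate update for a normal linear model.
post_prec = X.T @ X / sigma2 + prior_prec
post_cov = np.linalg.inv(post_prec)
post_mean = post_cov @ (X.T @ y / sigma2 + prior_prec @ prior_mean)

# Unidentified direction: add a constant to alpha, subtract it from beta.
v = np.concatenate([np.ones(I), -np.ones(J)])
v /= np.linalg.norm(v)
prior_var_v = v @ prior_cov @ v
post_var_v = v @ post_cov @ v
post_mean_v = v @ post_mean

# Identified contrast alpha[1] - alpha[0], normalised to unit length.
u = np.zeros(I + J)
u[1], u[0] = 1.0, -1.0
u /= np.linalg.norm(u)
post_var_u = u @ post_cov @ u

print(prior_var_v, post_var_v, post_mean_v, post_var_u)
```

The posterior variance along `v` stays at the prior value, so anything reported about that direction reflects the prior alone, while the variance of the identified contrast `u` shrinks. With a prior that correlates the two parts, the marginal posterior along `v` would change, but only through its prior dependence on the identified part.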

In Section

For Bayesian and random effects models we formulate a likelihood and a prior. Thus, consider a likelihood

Suppose the likelihood satisfies (

the predictive distribution does not depend on the conditional prior for

the posterior satisfies

the posterior means satisfy

Theorem

Due to Theorem

Bayesian forecasts involve integrating an extrapolative distribution. This can be done in two ways, either working exclusively with the identified, maximal invariant parameter

Consider first the case where only the maximal invariant parameter

Consider now forecasts involving the full time effect

Suppose that the likelihood satisfies (

To summarise, the identification issues surrounding Bayesian analysis are similar to those outlined in the previous sections. Examples of the problems that can arise are discussed in Sections

It is common to combine mortality models with a random effects approach, which effectively forms a new model. An example is given in Section

The random effect models are typically constructed as follows. Suppose the density of the data

In mortality modelling it is common to go one step further and estimate the time effects

We will now apply the theoretical considerations to analyse the age-period-cohort model. The methodological literature on this model is large and the consequences of the above theory are wide ranging.

In Section

The implementation of the canonical parameter depends on the type of data array. In Section

Here the age-period model is set up and a quite general identification result is reported.

Consider data

The statistical model is defined by the assumption that the variables

The model (

A first clue for the canonical parametrisation is given by Fienberg and Mason [

Illustration of interpretation of

Kuang et al. [

Let

the parametrisation of

Theorem

In itself this theorem does not tell how to express the predictor

The link between the canonical parameter

Age-cohort data arrays are rectangular in the age and cohort indices and given by

Age-cohort arrays are in particular used for reserving in general insurance. In that situation, only the triangle

The age-period-cohort model for the age-cohort arrays is parametrised by

The design matrix linking the canonical parameter

The identity (

The design matrix now follows from the identity (

The identification relies on Theorem

Let

the parametrisation of

Theorem

An age-period data array is rectangular in the age and period indices and given by

Age-period arrays are commonly used in epidemiology, in mortality analysis, and in sociology. The analysis of the identification issue is largely similar to that of age-cohort arrays. However, the representation of the predictor

The age-period-cohort model for the age-period arrays is parametrised by

Let

the parametrisation of

The group of transformations in (

An age-period data array is rectangular in the age and cohort indices and given by

The age-period-cohort model for the age-cohort arrays is parametrised by

Let

the parametrisation of

It is often of interest to test for the absence of the period effect. An application to the analysis of asbestos-related mortality can be found in Miranda et al. [

The hypothesis is that

The identification problem simplifies to a question of determining the levels of

The age-cohort model can also be formulated as a hypothesis on the maximal invariant

There is a large literature seeking to identify the original time effects

For the age-period-cohort model it is popular to impose ad hoc identifications in two steps of the type discussed in Section

A common first step ad hoc identification is to require that

The constraint (

The “intrinsic” estimator is a popular estimator in the sociology literature; see Yang et al. [

The “intrinsic” estimator is defined in two steps. In the first step, the levels are identified by the ad hoc constraint (

We can analyse these steps using the developed framework. The first step identifies the levels by the ad hoc constraint (

In the second step the linear trend is ad hoc identified through a time effect parameter of the form (

The “intrinsic” estimator is ad hoc identified through the choices

The “intrinsic” parameter is an injective mapping of the canonical parameter

Theorem

Forecasting of future mortality rates involves an extrapolation of the time parameters. In Section

In the context of an age-period data array

Identification plays a role when extrapolating the estimates obtained on the data array

Consider the predictor

the extrapolation method for period and cohort effects is linear trend-preserving:

functions

To illustrate the use of Theorem

Kuang, Nielsen, and Nielsen [

A Bayesian ad hoc identification using a dynamic prior does not solve the identification problem as discussed in Section

The Berzuini-Clayton suggestion is to ad hoc identify the model (

We will analyse the Berzuini-Clayton model as applied to an age-period data array

We get a hyper-parameter

In the presentation of the posterior Berzuini and Clayton are careful only to consider the double differences

The extrapolative method is based on double differences so it only depends on

In summary, it appears that the Berzuini-Clayton analysis depends on the

It is instructive to consider functional form restrictions on the time effects. Such hypotheses can be analysed using the results outlined in Section

This restriction on the time effect can be analysed by writing it in the form

A quadratic polynomial has constant second order derivative. Therefore the restriction (
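The discrete analogue of this statement is that the double differences of a quadratic sequence are constant. The sketch below uses hypothetical coefficients `a0`, `a1`, `a2` for a quadratic age effect and is meant only as a numerical check of that fact, together with the observation that an added linear trend leaves the double differences untouched.

```python
import numpy as np

ages = np.arange(10)
a0, a1, a2 = 1.5, -0.4, 0.05          # hypothetical quadratic coefficients
alpha = a0 + a1 * ages + a2 * ages**2

# Second-order differences of a quadratic are constant and equal to 2 * a2.
second_diff = np.diff(alpha, n=2)

# Adding an arbitrary level and linear trend does not change them.
alpha_shifted = alpha + 3.0 - 1.2 * ages
second_diff_shifted = np.diff(alpha_shifted, n=2)

print(second_diff, second_diff_shifted)
```

Only the quadratic coefficient enters the double differences, so in this sketch the level `a0` and slope `a1` play the same role as the arbitrary level and linear trend of the time effect.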

If the constraint is imposed directly on the canonical parameter, the restricted model is a regular exponential family with the advantages outlined in Section

In some cases a random effects approach can be used to get an overview of the many parameters of the age-period model. When applied to the time effects this implies an ad hoc identification. An example is the “hierarchical age-period cohort regression model” by Yang and Land [

From (

The random effects likelihood is constructed in three steps. First, we have the usual age-period-cohort likelihood

When confronted with two samples for women and for men it may be of interest to apply the age-period-cohort model (

The unrestricted two-sample model is simply analysed as two copies of the one sample model of Section

An application of the unrestricted two-sample model can be found in Cairns et al. [

The two-sample model allows cross-sample restrictions to be added on the parameters. As an example we consider the hypothesis of common period parameters.

Working with the canonical parameter the hypothesis is

The same result arises when writing the hypothesis in terms of time effects so that

The restriction has an interesting implication for the interpretation of the involved double differences. For the unrestricted model it was found that only plain double differences, such as

The analysis of Riebler and Held [

The apparent difference comes about because Riebler and Held follow a step-wise identification approach along the lines of Sections

The identification in the first step implies that

Some additional issues arise when looking at models with nonlinear parametrisations. A prominent example is the mortality model proposed by Lee and Carter [

We analyse the Lee-Carter model in Section

The mortality model proposed by Lee and Carter [

Lee and Carter pointed towards two identification issues of the model. If

We start by finding the predictor space

The next step is to analyse the time effect space

Let

the parametrisation of

Theorem

It is interesting to compare the properties of the spaces

The maximum likelihood estimator for

Consider a situation where the data array is of age-period form so

For a normal age-period array parametrised by (

Thus,

The ad hoc identification (

The parameter space

Investigating whether the time effects are present amounts to estimating the rank of

The consistency of this step-wise procedure is discussed in a cointegration context by Johansen [

The rank deficiency issue is typically not encountered in a standard Lee-Carter analysis. The reason is that the analysis is typically applied to data where there is a marked improvement in mortality rates over time. A Lee-Carter analysis could, however, run into trouble if it were applied to data without a strong calendar effect. The issue becomes more pertinent when extending the Lee-Carter model with a cohort component such as

The purpose of the Lee-Carter model is usually to forecast future mortality. This issue is considered for the model with parameter space

Let

The default forecast method in the literature is a random walk with a drift, which was the preferred forecast of Lee and Carter [
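The behaviour of the random walk with drift under re-identification can be checked numerically. The sketch below assumes the usual Lee-Carter form `alpha[i] + beta[i] * kappa[j]` for the predictor and uses the point forecast "last value plus horizon times average increment"; the function `rw_drift_forecast`, the simulated parameters, and the constants `c` and `d` of the re-identification are illustrative assumptions. Two researchers who scale and shift the period index differently obtain different extrapolated indices but the same forecast of the predictor.

```python
import numpy as np

def rw_drift_forecast(kappa, horizon):
    """Point forecast of a random walk with drift.

    The drift is estimated by the average increment over the sample, so the
    forecast is the last value plus h times that average, h = 1..horizon.
    """
    drift = (kappa[-1] - kappa[0]) / (len(kappa) - 1)
    return kappa[-1] + drift * np.arange(1, horizon + 1)

rng = np.random.default_rng(3)
I, J, H = 5, 8, 3
alpha = rng.normal(size=I)                        # illustrative age levels
beta = rng.uniform(0.5, 1.5, size=I)              # illustrative age loadings
kappa = np.cumsum(rng.normal(-0.5, 0.2, size=J))  # illustrative period index

# A second identification: rescale kappa by c and shift it by d,
# compensating in beta and alpha so that the predictor is unchanged.
c, d = 2.5, -4.0
kappa2 = c * kappa + d
beta2 = beta / c
alpha2 = alpha - beta * d / c

fit1 = alpha[:, None] + beta[:, None] * kappa[None, :]
fit2 = alpha2[:, None] + beta2[:, None] * kappa2[None, :]

fc1 = alpha[:, None] + beta[:, None] * rw_drift_forecast(kappa, H)[None, :]
fc2 = alpha2[:, None] + beta2[:, None] * rw_drift_forecast(kappa2, H)[None, :]

same_fit = np.allclose(fit1, fit2)
same_forecast = np.allclose(fc1, fc2)
different_index = not np.allclose(kappa, kappa2)

print(same_fit, same_forecast, different_index)
```

The invariance holds here because this point forecast is an affine-equivariant function of the period index. An extrapolation method without that property, for instance one that shrinks the index towards zero, would give forecasts that depend on the chosen identification.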

An alternative approach to forecasting would consider the predictor of the model for a particular age group, say

A Bayesian model with dynamic specification of the prior has been suggested by Pedroza [

Pedroza presents posteriors for

We now turn to applications of the Lee-Carter model in two-sample problems. Suppose two samples are for women and men. One approach would be to fit separate Lee-Carter models to the two datasets. These Lee-Carter models are of the form

Let

For one sample the standard forecasting technique appears to be the random walk with a drift as in (

There are two fixes to this problem. The first solution is to work directly with the mortality predictors

Ad hoc identification is intimately linked to interpretation, inference, numerical analysis, and forecasting. The ad hoc identification will often introduce an arbitrary element in the statistical analysis, whether it is based on frequentist or Bayesian methods. This arbitrary element is entirely avoidable and is, in our view, best avoided unless there is a clear substantive motivation for ad hoc identification. For decades there has been a debate over how best to ad hoc identify mortality models. Our proposal is to bypass this discussion by analysing the surjective mapping between the unidentified time effect parameter and the predictor of the model and then deducing a maximal invariant parametrisation. In our experience there are typically two substantial benefits. First, it simplifies estimation and other statistical computations, which is what we have focused on here. Secondly, and perhaps more importantly, it helps to focus on the substantive question that gives rise to the analysis in the first place.

The issue of dealing with two time scales also occurs in other statistical models, such as the Cox regression model; see Cabrera et al. [

Consider

(i)

Consider

Since

Since

(i) With the likelihood (

(ii) By Bayes formula and the likelihood (

(iii) The posterior means are

Consider the expressions in (

Similar to the proof of Kuang et al. [

Recall

We find a vector

We show that for any invertible matrix

To analyse the properties of

This is a generalisation of the proof of Kuang et al. [

(i) Recall the group

(i) Equation (

(ii) Equation (

(iii) The decomposition

Rewrite the trace term using the identity