A Quantitative Comparison of Multiple Population Mortality Model on Some East Asian Countries and Regions

This paper reviews the progress of multiple population mortality models and the defects in their parameter estimation, and proposes an effective method to improve model performance. We set up a multiple population group, using data from mainland China, Hong Kong (China), and Japan, to test fitting and forecasting performance. Using the TSWLS and TSSVD methods in a multiple population stochastic mortality model has advantages in fitting performance and robustness. In addition, the forecast mortality ratio between any two populations converges to a fixed constant within a certain time period, which is consistent with the regularities of human biological characteristics.


Introduction
With the demographic dividend gradually disappearing worldwide, it is common for the elderly to have fewer children. In the future, this will seriously change the age structure of the population and interfere with the formulation of certain national strategic policies. Additionally, rising life expectancy and population ageing will deliver shocks of varying size to national pension systems, commercial insurance companies, and families, with a negative impact on a country's economic development. Therefore, it is beneficial to take countermeasures in advance to help economic entities, using scientific methods to forecast population mortality and to assess the impact of longevity risk reasonably. Research on mortality forecasting has developed from single population models to multiple population models. Among them, some classical methods, such as the Lee-Carter model, the APC model, and the CBD model, have been verified for stability and represent the frontier of research on stochastic mortality models. The stochastic mortality model for a single population group was first proposed by Lee and Carter [1]. The Lee-Carter model assumes that logarithmic mortality is composed of independent age and period effects. It has few parameters, a simple fitting process, and robust forecasting results, and it has been widely used by scholars all over the world. The Lee-Carter model applies a two-stage method to estimate parameters. In the first stage, the ordinary least squares (OLS) method, the maximum likelihood estimation (MLE) method, or the singular value decomposition (SVD) method is used to estimate the static parameters. In the second stage, the dynamic parameters are fitted by a time series model [2]. Scholars have made many improvements to the Lee-Carter model, including improvements to the parameter estimation method [3] and to the model hypotheses [4].
Renshaw et al. [5] took the cohort effect of population mortality into consideration and further expanded the model into the age-period-cohort (APC) model. Although the APC model has long been used in medicine, the idea of modelling stochastic mortality originated from the Lee-Carter model. Furthermore, because population exposures at old ages are relatively small, the Lee-Carter model and the APC model are not well suited to the mortality of the elderly. However, the two-factor CBD model has been found to cope with this problem [6]. The CBD model with cohort effects is a prominent choice [7] for fitting the mortality of the elderly when both the Bayesian information criterion (BIC) and the robustness of the parameters are considered. With the continuous development of population mortality models, an increasing number of shortcomings of the single population stochastic mortality model have been exposed, and over long horizons its mortality forecasts may cross or diverge unreasonably [8,9]. Mortality modelling is systematic work, and considering only a single population group causes the mortality of different populations to violate human biological laws over time. Therefore, it is necessary to improve long-term forecasting performance by combining two or more populations in one model, which can give the mortality model better goodness of fit and forecasting performance.
Carter and Lee [10] proposed the first multiple population mortality model, the Joint-k model, which assumes that the mortality of multiple populations shares a common period effect factor and that the gap among population groups is reflected only in the individual age effect factor. Li and Lee [11] extended the age effect factor to a common factor based on the Joint-k model and put forward the Li-Lee model, an augmented common factor model, which has common age and period effect factors plus additional age and period effect factors that represent the mortality of a single population. Li and Hardy [12] showed that there is a cointegration relationship in the trend of the period effect factor among multiple populations and established a linear time-effect factor model called the cointegrated Lee-Carter model. Kleinow [13] proposed a common age effect (CAE) model, using common principal component analysis to estimate the parameters. The model is simple and has better fitting performance. Enchev et al. [14] compared the above models using the MLE method to select the model with the best fitting performance. According to the results, the CAE model and the Li-Lee model are more suitable for modelling multiple population mortality, but there is a convergence problem in parameter estimation. Li and Liu [15] built a logistic two-population mortality projection model for the mortality at ages 80 to 100 of both sexes, applied this model and its extensions to high-quality old-age mortality data of Belgium, Sweden, Switzerland, and the UK, and obtained decent model performance in both mortality fitting and forecasting. Tsai and Zhang [16] proposed a nonparametric method to forecast the mortality of multiple populations of the United States, the United Kingdom, and Japan.
Current research shows that the multiple population stochastic mortality model has become the research frontier. There have been studies on the quantitative comparison of different types of multiple population mortality models, but the quantitative comparison of parameter estimation methods is still blank. In addition, most of the data used to test multiple population mortality models come from European countries in the Human Mortality Database, and few studies use data from statistical institutions in developing countries. As a country with a large population and rapid economic development, mainland China has a rapidly decreasing mortality rate, but there are few studies on its mortality forecasting based on the multiple population model. It is meaningful for mainland China to build a multiple population mortality model with neighbouring countries or regions, which not only have lower mortality levels and higher-quality mortality data but also have similar genetic backgrounds and living habits that can affect mortality. Therefore, this paper selects East Asian countries and regions, including Hong Kong (China) and Japan, to build a multiple population model with mainland China and to compare the methods of parameter estimation quantitatively. The structure of this paper is as follows. Section 2 explains the rationale of the methods of parameter estimation for the multiple population mortality model. Next, Section 3 describes the data features and research scheme. Following this, Section 4 inspects the fitting performance of the proposed methods. Section 5 examines the forecasting performance among the three population groups. Finally, Section 6 concludes the paper.

Lee-Carter Model.
Lee and Carter [1] propose a logarithmic, linear stochastic mortality model as follows:
$$\ln m(x,t) = \alpha(x) + \beta(x)\,k(t) + \varepsilon(x,t),$$
in which m(x, t) is the crude mortality rate at age x in year t for a single population, and α(x) and β(x) are parameters that depend on age x: α(x) is the average of the logarithmic mortality over all years, and β(x) is the age effect parameter, which stands for the slope of the logarithmic mortality at age x with respect to k(t). The period effect parameter k(t) represents the level of the logarithmic mortality in year t, and ε(x, t) is an i.i.d. normal error term. There are three types of methods to estimate the parameters in the Lee-Carter model: the OLS, SVD, and MLE methods.
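As an illustration of the SVD route mentioned above, the following minimal Python sketch fits the Lee-Carter decomposition to a matrix of logarithmic mortality rates. The function name `fit_lee_carter`, the array layout (ages in rows, years in columns), the normalisation (age effects summing to one, period effects summing to zero), and the synthetic input are assumptions made for illustration; they are not taken from the original study.

```python
import numpy as np

def fit_lee_carter(log_m):
    """Fit ln m(x, t) = alpha(x) + beta(x) * k(t) by SVD.

    log_m: 2-D array of log mortality rates, shape (ages, years).
    Returns alpha (ages,), beta (ages,), k (years,).
    """
    log_m = np.asarray(log_m, dtype=float)
    alpha = log_m.mean(axis=1)              # alpha(x): average log mortality over years
    centred = log_m - alpha[:, None]        # remove the age profile
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    beta = u[:, 0]                          # leading left singular vector
    k = s[0] * vt[0]                        # leading right singular vector, scaled
    scale = beta.sum()                      # assumed nonzero for real mortality data
    return alpha, beta / scale, k * scale   # now sum_x beta(x) = 1

# Synthetic, noise-free example (hypothetical numbers, not real mortality data).
alpha_true = np.linspace(-7.0, -2.0, 6)
beta_true = np.array([0.25, 0.20, 0.15, 0.15, 0.15, 0.10])   # sums to 1
k_true = np.linspace(5.0, -5.0, 11)                          # sums to 0
log_m = alpha_true[:, None] + np.outer(beta_true, k_true)

alpha_hat, beta_hat, k_hat = fit_lee_carter(log_m)
print(np.allclose(beta_hat, beta_true), np.allclose(k_hat, k_true))
```

Because each row of the centred matrix sums to zero, the recovered k(t) sums to zero without an extra adjustment step.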

Li-Lee Model.
The Lee-Carter model requires high-quality death data and can only be used to forecast the mortality of a single population. To make up for these defects, Li and Lee [11] proposed a mortality model from the perspective of multiple populations as follows:
$$\ln m(x,t,i) = \alpha(x,i) + B(x)\,K(t) + \beta(x,i)\,k(t,i) + \varepsilon(x,t,i),$$
in which m(x, t, i) is the crude mortality rate at age x in year t in population i and α(x, i) is the average of the logarithmic mortality at age x over all years in population i. The common age effect parameter B(x) represents the slope of the logarithmic mortality at age x for all population groups, K(t) is the common period effect parameter, which describes the logarithmic mortality in year t for all population groups, the specific age effect parameter β(x, i) is the slope of the logarithmic mortality at age x in population i, k(t, i) is the specific period effect parameter, which describes the logarithmic mortality in year t in population i, and ε(x, t, i) is an i.i.d. normal error term. As with the Lee-Carter model, three methods are available to estimate the parameters of the Li-Lee model: SVD, OLS, and MLE. Li and Lee used the SVD method to estimate the parameters of the multiple population mortality model. However, experiments show that the SVD method is not suitable for certain variant forms of the model. Enchev et al. [14] used the MLE method, which applies to almost all types of multiple population mortality models, to estimate the parameters. Nevertheless, multiple population mortality models have more parameters, and during the Newton-Raphson iterations the algorithm may converge to a local optimum that is mistaken for the optimal solution.
To solve the problems above, we propose two methods to estimate the parameters of the Li-Lee model: two-step weighted least squares (TSWLS) and two-step singular-value decomposition (TSSVD).
In what follows, we impose constraints on the parameters that allow us to solve the identifiability issues. The constraints are as follows:
$$\sum_{x} B(x) = 1, \qquad \sum_{x} \beta(x,i) = 1, \qquad \sum_{t} K(t) = 0, \qquad \sum_{t} k(t,i) = 0, \qquad i = 1, \ldots, r,$$
in which the common age parameters are normalised to sum to unity, and the specific age parameters also sum to unity for every population i. Furthermore, the common period parameters sum to zero, and the specific period parameters sum to zero for each population i as well. The steps of the TSWLS and TSSVD methods are as follows:
Step 1: using the Lee-Carter model, estimate the parameters α(x, i), B(x), and K(t) for the combined dataset of all populations. Both methods, TSWLS and TSSVD, share the same Step 1. Based on $\sum_{x} B(x) = 1$, the estimate of K(t) is obtained, in which $t = t_L, \ldots, t_U$, $x = x_L, \ldots, x_U$, and $i = 1, \ldots, r$; $w_i$ is the weight of group i, with $\sum_{i=1}^{r} w_i = 1$; in this article, the weight of every group [...]
Step 2: using the Li-Lee model, estimate the specific parameters β(x, i) and k(t, i), in which the estimated α(x, i), B(x), and K(t) are fixed as constant values.
The above steps constitute the parameter estimation process of the TSWLS method; in the TSSVD method, the estimates are obtained directly by singular-value decomposition.
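The two-step idea can be sketched in Python as follows. This is one plausible reading of a two-step SVD estimation, not necessarily the exact procedure of the paper: the weighted pooling of the centred log rates in Step 1, the equal default weights, the function names `leading_factor` and `fit_li_lee_two_step`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def leading_factor(matrix):
    """Leading rank-1 term b(x) * k(t) of a matrix, scaled so that sum_x b(x) = 1."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    b, k = u[:, 0], s[0] * vt[0]
    scale = b.sum()                      # assumed nonzero
    return b / scale, k * scale

def fit_li_lee_two_step(log_m, weights=None):
    """log_m: array of log mortality rates, shape (populations, ages, years)."""
    log_m = np.asarray(log_m, dtype=float)
    r, n_ages, n_years = log_m.shape
    if weights is None:
        weights = np.full(r, 1.0 / r)    # assumption: equal weights summing to 1
    alpha = log_m.mean(axis=2)           # alpha(x, i): time average per population
    centred = log_m - alpha[:, :, None]

    # Step 1: common factors B(x), K(t) from the weighted pooled surface.
    pooled = np.tensordot(weights, centred, axes=1)   # shape (ages, years)
    B, K = leading_factor(pooled)

    # Step 2: specific factors from each residual, with alpha, B, K held fixed.
    common = np.outer(B, K)
    beta = np.empty((r, n_ages))
    k = np.empty((r, n_years))
    for i in range(r):
        beta[i], k[i] = leading_factor(centred[i] - common)
    return alpha, B, K, beta, k

# Synthetic two-population example (hypothetical numbers, not real mortality data).
B_true = np.array([0.30, 0.25, 0.20, 0.15, 0.10])
K_true = np.linspace(8.0, -8.0, 9)
beta_true = np.array([0.10, 0.10, 0.20, 0.30, 0.30])
k1_true = np.array([1.0, -1.0, 1.0, -1.0, 0.0, 1.0, -1.0, 1.0, -1.0])
alpha_true = np.stack([np.linspace(-7.0, -3.0, 5), np.linspace(-7.5, -3.5, 5)])
log_m = np.stack([
    alpha_true[0][:, None] + np.outer(B_true, K_true) + np.outer(beta_true, k1_true),
    alpha_true[1][:, None] + np.outer(B_true, K_true) - np.outer(beta_true, k1_true),
])
alpha_hat, B_hat, K_hat, beta_hat, k_hat = fit_li_lee_two_step(log_m)
```

By construction the estimates satisfy the four identifiability constraints, since every centred row sums to zero over time and each age vector is rescaled to sum to one.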

Mortality Graduation Method.
Although the Li-Lee model, as Kang et al. [17] reveal, is one of the classical multiple population mortality models, fluctuations in the crude mortality data impair its fitting performance. Therefore, graduating the crude mortality data before fitting the model is necessary. In this paper, we apply the two-dimensional beta kernel density method, which guarantees reasonable smoothing results in both the age and time dimensions, to graduate the crude mortality [18]. The two-dimensional beta kernel density method requires three steps, as follows:
Step 1: modelling the two-dimensional beta kernel density.
(1) Define the distribution of the number of deaths:
$$d(x,y) \sim \mathrm{Bin}\big(e(x,y),\, q(x,y)\big),$$
in which x is the age, y is the year, d(x, y) is the number of deaths at age x in year y, which follows the binomial distribution, e(x, y) is the exposure, and q(x, y) is the actual mortality.
(2) Define the two-dimensional beta kernel density function, in which Z is a two-dimensional random variable taking values from $a_Z$ to $b_Z$, $c_Z = b_Z - a_Z$, and $h_Z$ is the bandwidth. The kernel is then standardized.
(3) Estimate the crude mortality, in which $\dot{q}(x,y)$ is defined as the crude mortality corresponding to q(x, y), x ∈ X and y ∈ Y.
Mathematical Problems in Engineering
Step 2: setting the adaptive bandwidth. In the bandwidth adaptation formula, $h_Z$ stands for the global bandwidth factor and s, with s ∈ [0, 1], is a sensitivity parameter that acts as the local bandwidth factor. The reliability function $l_Z(z)$ determines the numerical range of the local bandwidth factor and, at the same time, is limited by the local bandwidth factor so that extreme values do not occur. Mazza and Punzo used the variation coefficient (VC) to measure $l_Z(z)$.
Step 3: selection of the sensitivity parameter. In two-dimensional beta kernel graduation, the bandwidth is selected by minimizing the cross-validation (CV) statistic, in which $res[\dot{q}(z), q_{-z}(z)]$ is the residual at z and $\dot{q}(z)$ is the crude mortality at z. Furthermore, the proportional difference form of the residual is commonly used in mortality graduation.
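To make the graduation idea concrete, the following Python sketch smooths a crude mortality surface with a product of two discrete beta kernels, one over age and one over year. It is a simplified illustration under stated assumptions: the kernel form (a discrete beta kernel whose mode sits at the point being estimated), the fixed rather than adaptive bandwidths, the function names `beta_kernel_weights` and `graduate_2d`, and the synthetic data are all assumed here and may differ from the formulas used in the paper.

```python
import numpy as np

def beta_kernel_weights(n, h):
    """Normalised discrete beta kernel weights on the grid 0, ..., n-1.

    Row j holds the weights used to estimate the value at grid point j.
    Assumed kernel: (x + 1/2)^((j + 1/2)/(h n)) * (n - 1/2 - x)^((n - 1/2 - j)/(h n)).
    """
    grid = np.arange(n)
    log_left = np.log(grid + 0.5)            # log distance from the lower boundary
    log_right = np.log(n - 0.5 - grid)       # log distance from the upper boundary
    mode = grid[:, None]
    log_w = ((mode + 0.5) * log_left[None, :]
             + (n - 0.5 - mode) * log_right[None, :]) / (h * n)
    log_w -= log_w.max(axis=1, keepdims=True)    # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum(axis=1, keepdims=True)

def graduate_2d(crude, h_age, h_year):
    """Smooth a (ages, years) surface with a product kernel; bandwidths are fixed."""
    crude = np.asarray(crude, dtype=float)
    w_age = beta_kernel_weights(crude.shape[0], h_age)
    w_year = beta_kernel_weights(crude.shape[1], h_year)
    return w_age @ crude @ w_year.T

# Hypothetical noisy surface: 20 ages by 10 years.
rng = np.random.default_rng(0)
smooth = np.exp(np.linspace(-7.0, -3.0, 20))[:, None] * np.linspace(1.0, 0.8, 10)[None, :]
crude = smooth * np.exp(rng.normal(0.0, 0.2, size=smooth.shape))
graduated = graduate_2d(crude, h_age=0.02, h_year=0.05)
```

Smaller bandwidths concentrate the weights around the point being estimated, while larger ones average over more neighbouring ages and years; the adaptive, reliability-driven bandwidth of Step 2 is deliberately left out of this sketch.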

Fitting and Forecasting.
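Before forecasting mortality out of sample, it is necessary to test the fitting performance. The mean absolute percentage error (MAPE) is frequently used [19]; its expression is
$$\mathrm{MAPE} = \frac{1}{n}\sum_{x,t}\left|\frac{\hat{m}(x,t,i) - m(x,t,i)}{m(x,t,i)}\right| \times 100\%,$$
in which $\hat{m}(x,t,i)$ is the fitted mortality and n is the number of age-year cells. Forecasting mortality requires modelling the time-effect factors first. The Li-Lee model includes both the common time-effect factor, K(t), and the specific time-effect factor, k(t, i). They are modelled as
$$K(t) = d + K(t-1) + e(t), \qquad k(t,i) = r_0(i) + r_1(i)\,k(t-1,i) + e(t,i),$$
in which K(t) is a random walk with drift d and k(t, i) is a first-order autoregressive process with coefficients $r_0(i)$ and $r_1(i)$; e(t) and e(t, i) are error terms. On this basis, we get the following mortality prediction formula:
$$\ln \hat{m}(x,t,i) = \alpha(x,i) + B(x)\,\hat{K}(t) + \beta(x,i)\,\hat{k}(t,i).$$
The forecasting results not only reflect the trend of mortality decline but also ensure the long-term consistency of mortality among different population groups.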
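The fitting measure and the two time series models described above can be sketched in Python as follows. The function names, the estimation of the drift from the first and last observations, the least-squares fit of the autoregression, and the toy numbers are assumptions made for illustration; only central (mean) forecasts are produced and no prediction intervals are computed.

```python
import numpy as np

def mape(actual, fitted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return 100.0 * np.mean(np.abs(fitted - actual) / actual)

def forecast_common(K, horizon):
    """Central forecast of a random walk with drift fitted to K(t)."""
    K = np.asarray(K, dtype=float)
    drift = (K[-1] - K[0]) / (len(K) - 1)        # average yearly change
    return K[-1] + drift * np.arange(1, horizon + 1)

def forecast_specific(k, horizon):
    """Central forecast of an AR(1) process k(t) = r0 + r1 * k(t-1) + e(t)."""
    k = np.asarray(k, dtype=float)
    r1, r0 = np.polyfit(k[:-1], k[1:], 1)        # slope first, then intercept
    out = np.empty(horizon)
    previous = k[-1]
    for step in range(horizon):
        previous = r0 + r1 * previous
        out[step] = previous
    return out

# Toy inputs (hypothetical numbers, not results from the paper).
fit_error = mape([1.0, 2.0], [1.1, 1.8])
K_future = forecast_common([0.0, -1.0, -2.0], horizon=2)
k_future = forecast_specific([8.0, 4.0, 2.0, 1.0], horizon=2)
```

When the fitted coefficient satisfies |r1| < 1, the central forecast of k(t, i) settles at r0 / (1 - r1). The common term B(x)K(t) cancels in the log ratio of two populations, so the forecast mortality ratio at a given age tends to a constant, which is the long-term consistency property referred to above.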

Data and Research Scheme
We perform our analysis on data obtained from the public resources of the National Bureau of Statistics of China (BSC) and the Human Mortality Database (HMD). The typical dataset consists of the numbers of deaths and the central exposure.
The ages considered range from 0 to 100 years (101 consecutive ages in total), and the period ranges from 1994 to 2014 (21 years in total).
To compare the advantages of the methods used in this paper, we make the following design choices: (1) due to the lower number of deaths, we select male mortality as the research sample; (2) to keep up with the standard of the HMD data, we use the two-dimensional beta kernel density method to graduate the mortality of mainland China; (3) we select data from the East Asian countries and regions of mainland China, Hong Kong (China), and Japan to compare our method with the Lee-Carter model; and (4) we test the robustness of our method by replacing the crude mortality data with the graduated mortality.

Quantitative Comparison of Fitting Performance
In this paper, we make a horizontal comparison both between models, namely the multiple population stochastic mortality model and the single population Lee-Carter model, and between methods, namely the TSWLS and TSSVD methods within the Li-Lee model. Because most research on mortality modelling for China uses crude data, this paper first applies crude mortality data in the analysis. Also, the MLE method cannot always converge to a stable value in the Li-Lee model, so it is not included in our study. Based on the methods in Section 2, we calculate the MAPE values. Table 1 shows the MAPE values of the three methods in mainland China, Hong Kong (China), and Japan, in which a smaller MAPE value indicates better fitting performance. In general, the Li-Lee model with the TSSVD method has the lowest MAPE value, 20.22%, over all the years from 1995 to 2014, while the Lee-Carter model has the highest, 21.18%. Therefore, we can infer that the fitting performance of the multiple population mortality model under both parameter estimation methods is better than that of the single population mortality model. Next, we analyse the fitting performance for the different population groups. For mainland China, the Lee-Carter model obtains the lowest MAPE value over all the years from 1995 to 2014, while the Li-Lee model under the TSWLS method has the highest. However, in most of the five-year periods, the Li-Lee model fits better than the Lee-Carter model. For example, the Li-Lee model with the TSSVD method performs best in 1995∼2000 and 2006∼2010. By contrast, the Lee-Carter model shows a surprisingly low MAPE value only in 2011∼2014, which gives it the best fitting performance over the whole historical period.
In Hong Kong (China), the Li-Lee model with the TSSVD method displays the lowest MAPE value between 1995 and 2014 in almost every period; only in 1995∼2000 is its value slightly higher than that of the TSWLS method. As for Japan, the last population group, the Li-Lee model with the TSSVD method performs best in every period. In terms of numerical values, Japan's MAPE ranges from 5% to 9%, while that of mainland China ranges from 27% to 49% and that of Hong Kong (China) from 18% to 25%. From the above analysis, we draw the preliminary conclusion that the smoother the data, the better the fitting performance of the mortality model. We then use the two-dimensional beta kernel density method, which aims to derive mortality with higher smoothness in the period dimension, to graduate the data of mainland China. Although the mortality from the HMD is graduated, the method is confined to the age dimension and ignores the impact of the period trend. To test the robustness of the estimation and to examine whether smoother population mortality data improve the fitting performance of the model, we apply the two-dimensional smoothing method to the population mortality of mainland China. Table 2 shows the MAPE values of the three methods in mainland China, Hong Kong (China), and Japan, respectively. From Table 2, we notice that fitting the mortality model to the graduated data has a considerable positive impact for mainland China, reducing the MAPE range from 27%∼49% to 9%∼15%, but has less effect on the fitting performance for Japan and Hong Kong (China). This means that smoother data can significantly improve the fit of the Li-Lee model, with both methods, and of the Lee-Carter model. At the same time, we observe that the three methods show a stable estimation effect in all populations.
Consistent with Table 1, the Li-Lee model with the TSSVD method obtains the lowest MAPE value, while the Lee-Carter model has the highest. Consequently, model fitting performance can be improved by using graduated mortality data and solving the multiple population mortality model with the TSSVD method, especially for mortality forecasting in mainland China.

Quantitative Comparison of Forecasting Performance
According to equation (11), we obtain the out-of-sample mortality values from 2015 to 2050. In this paper, we take the 30-year-old population as an example and display the values in Table 3. All three methods show a decreasing trend of population mortality in the future, but they differ in detail. From the perspective of different population groups, the mortality of mainland China under the Lee-Carter model declines rapidly from 0.97‰ in 2015 to 0.28‰ in 2050, while the forecasts of the other two methods are very close to each other, declining from 1.00‰ in 2015 to 0.50‰ in 2050. For Hong Kong (China), the forecasts of all methods are similar, except that the Li-Lee model with TSWLS gives slightly higher values. As in mainland China, the future mortality values for Japan are close under the two methods of the Li-Lee model, but the Lee-Carter model gives higher forecasts. The reason is that the multiple population mortality model is systematic: it considers the interrelationship between different population groups and keeps the expected results reasonable in the long term. Because the single population Lee-Carter model assumes that mortality decreases at a fixed constant rate, the forecast mortality of different population groups will cross or deviate abnormally in the long term. Therefore, the multiple population mortality model is more suitable than the single population mortality model for long-term mortality forecasting and effectively improves the reliability of the predicted values.
Next, we use three figures to illustrate the forecasting performance. From the perspective of different methods, we further analyse the rationality of the population mortality models. Figure 1 shows the trend of mortality in the different populations under the Lee-Carter model in 2015-2050. We can find two unreasonable problems in Figure 1. First, the mortality curve of mainland China intersects that of Japan in 2038 and stays below it in the following years. The reason is that the Lee-Carter model only considers the speed of mortality decline of a single population in the historical data. Because mortality in mainland China starts from a higher level, it declines faster than mortality in Japan, which has already passed the historical stage of rapid mortality decline. If the mortality rates of the two population groups continue to fall at their current speeds, life expectancy in mainland China will soon surpass that in Japan, which contradicts the biological laws of human beings. Second, the gap between the mortality curves of Hong Kong (China) and Japan gradually widens over time in a trumpet shape, which is also attributable to the defects in the hypothesis of a single population mortality model. Figure 2 shows the mortality forecasts based on the Li-Lee model with the TSWLS method; the above problems of the single population model can be fixed by the multiple population model. The mortality of the three groups shows a consistent trend from 2015 to 2050 in Figure 2, similar to the historical experience of decline. Mortality in mainland China remains higher than in the other two populations, and the gap is narrowing, but there will be no crossing in the future.
We can tell from Figure 2 that the mortality gap between Hong Kong (China) and Japan is relatively small and that the curves even overlap for a short period, which does not accord with the current mortality relationship between the two populations.
How this issue can be addressed is shown in Figure 3. Compared with Figure 2, the TSSVD method has more advantages for forecasting the mortality of the two population groups in Hong Kong (China) and Japan, as it reflects a mortality gap that decreases over time without crossing in the short term. Overall, we conclude that the multiple population model is better than the single population model in terms of mortality forecasting, and the TSSVD method is more suitable for the multiple population model than the TSWLS method.
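The crossing check discussed for the figures can be automated with a few lines of Python. The sketch below is illustrative only: the function name `first_crossing` and the two forecast series are hypothetical and do not reproduce the values behind Figures 1 to 3.

```python
import numpy as np

def first_crossing(years, rate_a, rate_b):
    """Return the first year in which rate_a - rate_b changes sign, or None."""
    diff = np.asarray(rate_a, dtype=float) - np.asarray(rate_b, dtype=float)
    start_sign = np.sign(diff[0])
    if start_sign == 0:
        return int(years[0])                     # the curves already touch at the start
    flipped = np.nonzero(np.sign(diff) == -start_sign)[0]
    return int(years[flipped[0]]) if flipped.size else None

# Hypothetical forecasts (per mille): population A falls faster than population B.
years = np.arange(2015, 2021)
rate_a = np.array([1.00, 0.90, 0.80, 0.70, 0.60, 0.50])
rate_b = np.array([0.80, 0.78, 0.76, 0.74, 0.72, 0.70])
crossing_year = first_crossing(years, rate_a, rate_b)
```

A result of `None` over the whole forecast horizon corresponds to the desirable case described above, in which the gap between two populations may narrow but the curves do not intersect.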

Conclusion
This paper reviews the progress of the multiple population stochastic mortality model, identifies some defects in parameter estimation, and proposes an effective method to improve the parameter estimation. We set up a multiple population group, using data from mainland China, Hong Kong (China), and Japan, to test fitting and forecasting performance, and draw the following conclusions: (1) For parameter estimation, the TSWLS and TSSVD methods in a multiple population stochastic mortality model avoid problems such as the maximum likelihood estimation failing to converge or converging to a local optimum because of the larger number of parameters. The two methods are simple and easy to understand and are practical tools for applying the multiple population mortality model. (2) For fitting performance, the multiple population mortality model is robust under the TSWLS and TSSVD methods. Additionally, the fitting performance can be significantly improved with graduated data, which illustrates that the multiple population mortality model is more suitable for [...]

Data Availability
The data used to support the findings of this study are from the Human Mortality Database and the China National Bureau of Statistics.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.