Frequentist and Bayesian Regression Approaches for Determining Risk Factors of Child Mortality in Ghana

Background: Child mortality is a global health problem. The United Nations' 2018 report on levels and trends in child mortality indicated that under-five mortality is one of the major public health problems in Ghana, with a rate of 60 deaths per 1000 live births. To further mitigate this problem and to achieve target 2 of United Nations Sustainable Development Goal (SDG) 3, it is important to identify the drivers of under-five mortality. Methods: In this study, we investigated the effects of selected risk factors on child mortality using data from the 2014 Ghana Demographic and Health Survey. We modelled the relationship between child mortality and the risk factors using a logistic regression model under the frequentist and Bayesian frameworks. We used the Metropolis-Hastings Algorithm to simulate parameter estimates from the posterior distributions, and statistical analyses were carried out using STATA version 14.1. Results: Results from the frequentist framework are in line with those from the Bayesian framework. The results showed an increased risk of death among children who were delivered by caesarean section and reduced relative odds of death among children who were of average or large size at birth and among children whose mothers have formal education. Conclusions: There is a need for improved health facilities to provide better health care for mothers and children. Education should, among other things, emphasise the need for mothers to go for regular check-ups during the antenatal and postnatal periods for improved mother and child health.


Background
One of the indicators that measure national development is child mortality [1,2]. The world made remarkable progress in child survival in the past few decades, and millions of children have better survival chances than in 1990-5. It is known that 1 in 26 children died before reaching age five in 2018, compared to 1 in 11 in 1990. Moreover, progress in reducing child mortality accelerated in the 2000-2018 period compared with the 1990s, with the annual rate of reduction in the global under-five mortality rate increasing from 2.0 per cent in 1990-2000 to 3.8 per cent in 2000-2018. Despite the global progress in reducing child mortality over the past few decades, an estimated 5.3 million children under age five died in 2018, and roughly half of those deaths occurred in sub-Saharan Africa [3].
The global under-five mortality rate declined by 59 per cent, from 93 deaths per 1000 live births in 1990 to 39 in 2018. Despite this considerable progress, improving child survival remains a matter of urgent concern. In 2018 alone, roughly 15,000 under-five deaths occurred every day, an intolerably high number of largely preventable child deaths [3,4]. With the end of the MDG era, the international community agreed on a new framework, the Sustainable Development Goals (SDGs), in which the target is to end preventable deaths of newborns and children under 5 years of age, with all countries aiming to reduce under-five mortality to at least as low as 25 per 1000 live births. A total of 120 Member States have already met the SDG target on under-five mortality, and 21 countries are expected to meet the target by 2030 if the current trends continue [3].
Disparities exist in this reduction across countries. According to estimates from the Global Burden of Disease (GBD) 2017 SDG Collaborators, many countries are on track for achieving the target of at least 25 deaths per 1000 live births by 2030. However, about 31 countries/territories would need to achieve annual rates of decline from 2015 to 2030 that are two to ten times higher than what was recorded for 1990-2015 in order to achieve this goal [3,5].
Despite the substantial decline in global under-five mortality, the rates remain high in sub-Saharan Africa, where many countries in the region, including Ghana, failed to meet Goal 4 of the Millennium Development Goals (MDGs), which aimed at a two-thirds reduction in the under-five mortality rate by 2015. In 2015, the under-five mortality rate in sub-Saharan Africa was 79 deaths per 1000 live births compared to the global rate of 41 deaths per 1000 live births in the same year [3,4]. The under-five mortality rate in Ghana is still high, at 60 deaths per 1000 live births in 2014, which fell short of the target set in the Ghana Under-five Child Health Policy 2007-2015 of reducing under-five mortality to 40 deaths per 1000 live births by 2015 [6]. In Ghana, several national policies, strategies, and interventions were launched to improve and promote the health of Ghanaian children, notable among them being the Child Health Policy 2007-2015, the Community-based Health Planning and Services (CHPS) policy, and National Health Insurance (e.g., free maternal delivery services, free treatment of children aged below 18 years) [6,7]. Despite all these initiatives, under-five mortality in Ghana is still high. A study conducted in 2016 among 46 African countries reported that Ghana is among the 8 countries that are making very little progress towards the reduction in under-five mortality [4,8]. This calls for further examination of factors that may be militating against the expected reduction in under-five mortality in the country. Aheto [4] modeled the relationship between under-five mortality and risk factors using univariate and multivariate logistic regression models (under the frequentist framework). In this paper, we modeled the relationship between child mortality and the risk factors using the logistic regression model under the frequentist [9][10][11][12][13][14][15][16][17] and Bayesian [18,19] frameworks.
We used the Metropolis-Hastings Algorithm to simulate parameter estimates from the posterior distributions, and statistical analyses were carried out using STATA version 14.1.
This paper is divided into 4 main sections. We have given the background of the study in Section 1. In Section 2, we introduce the study setting, source of data, response variable, risk factors, and statistical methods for the child mortality data. Section 3 presents the results of the statistical analyses using the data. Section 4 discusses the findings, and the paper closes with the conclusions.

Methods
In this section, we introduce the study setting and the source of data. We also introduce the response variable of interest (child alive or dead) as well as some potential predictors of the status of the response variable. This is followed by a discussion of the statistical approaches used in this study.

Study Setting and Data Source.
This study was conducted in Ghana. The study used child mortality data on all the then ten regions in Ghana. The data were obtained from the Ghana Demographic and Health Survey for 2014. This study is cross-sectional, where the response variable of interest and its associated risk factors were measured at a single time point. In this study, we focused on individual child mortality data. The data used consist of 5868 children death records. Using these data, we categorized children into two groups; that is, those who were alive and those who were dead at the time of the survey. Also, we considered data on factors that are likely to determine whether a child is alive or dead. Ethical approval and consent to participate statements can be found at http://dhsprogram.com/What-We-Do/Protecting-the-Privacy-of-DHS-Survey-Respondents.cfm, approved by the ICF International Institutional Review Board (IRB). We will now introduce the outcome variable and the risk factors of child mortality.

Outcome Variable.
In this study, the outcome variable of interest is child mortality status (which was coded 0 if a child is alive or 1 if a child is dead).

The Risk Factors or Predictors of Child Death Status.
The status of the response variable, introduced in the previous section, depends on certain risk factors (also called predictors). The risk factors used in this study are presented in Table 1.
The risk factors are gender (coded 0 if a child is female or 1 if a child is male), place of delivery (coded 0 if a child was delivered at a private or other health facility or 1 if at a government health facility), caesarean section (CS) (coded 0 if a child was delivered normally or 1 if a child was delivered through CS), size of a child at birth (coded 0 if small, 1 if average, or 2 if large), wealth index (coded 0 if poor, 1 if average, or 2 if rich), geographical location (coded 0 if rural or 1 if urban), and mother's educational status (coded 0 if no formal education or 1 if formal education). Table 1 presents the percentage distribution of the background characteristics.
The percentage distributions of the variables in Table 1 showed that approximately 5% of the children died. A slightly higher proportion (52%) of the children were male, approximately 63% of the children were delivered at government health facilities, and 10% were delivered through caesarean section. Approximately 50% of the children were large in size at birth and 33% were average in size at birth, and a large proportion (54%) of the children were from poor households. The majority (60%) of the children live in rural areas, and more than 65% of the mothers have a formal education.

Statistical Analysis.
First, we used the chi-square test statistic [20][21][22][23] to investigate whether there is a significant association between the response (death status) and the predictors shown in Table 1. In our further analyses, we used logistic regression [9][10][11][12][13][14][15][16][17] under the frequentist framework to establish the relationship and to estimate the effects of the predictor variables on child mortality (death status). We then considered the logistic regression model under the Bayesian framework [18,19] and compared the results from the two frameworks. The logistic regression model is of the form

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p, \qquad (1)$$

where $X_1, \cdots, X_p$ are the predictors of child mortality, $\beta_1, \cdots, \beta_p$ are the coefficients of regression, and $\beta_0$ is the intercept. The coefficients of regression represent the magnitude and direction of the effects of the predictors on the response. In matrix notation, $X$ is the design matrix of the predictors of the dichotomous response variable $y_i$, and $\beta$ is the vector of the regression coefficients. The $p$ on the left-hand side of equation (1) represents the probability that a child has died, and $p/(1-p)$ is the odds of death. This implies that each $\beta_j$ is the log odds ratio of death among children who are exposed to the corresponding predictor relative to those who are not exposed to it [11]. This statistical method was applied to the child mortality data using STATA version 14.1 software [25][26][27][28] for the estimation of the regression coefficients.
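To make the link between a fitted coefficient and an odds ratio concrete, the following is a minimal Python sketch, not the STATA routine used in this study. It fits a logistic regression with an intercept and one binary predictor by Newton-Raphson and exponentiates the slope. The counts (30 deaths among 100 exposed children and 10 among 100 unexposed) and the function name `fit_logistic` are hypothetical and serve only as an illustration.

```python
import math


def fit_logistic(x, y, n_iter=25):
    """Maximum-likelihood fit of logit(p) = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # entries of the information matrix X'WX
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            r = yi - p
            w = p * (1.0 - p)
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve (X'WX) d = gradient for the 2x2 system
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: x = 1 for exposed children, y = 1 for a death.
x = [1] * 100 + [0] * 100
y = [1] * 30 + [0] * 70 + [1] * 10 + [0] * 90

b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}, odds ratio = {odds_ratio:.3f}")
```

With a single binary predictor, the fitted odds ratio equals the cross-product ratio of the 2x2 table, here (30 × 90)/(70 × 10) ≈ 3.86, which gives a quick check on the fit.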

Bayesian Regression Model.
Bayesian inference for logistic regression follows the usual pattern of all Bayesian analyses. That is, we need to specify the likelihood function for the data and the prior distribution for the parameters in the proposed regression model. Thereafter, we multiply the data likelihood function and the prior distribution for the parameters to derive the posterior density, from which all statistical inferences are drawn.
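The probability of success or death among the children varies from one child to another, depending on their risk factors. Hence, the likelihood function for the child mortality data follows a binomial distribution. Let $y = (y_1, y_2, \cdots, y_n)$ be $n$ independent binomial or binary random variables with the probability mass function defined as

$$P(Y_i = y_i) = \binom{m_i}{y_i}\,\theta_i^{y_i}\,(1-\theta_i)^{m_i - y_i}, \qquad (2)$$

with $\theta_i = F(X_i^{\prime}\beta)$, where $F$ is a cumulative distribution function, $X_i$ is the $p$-dimensional vector of covariates for the $i$th child, $\beta$ is the vector of unknown regression coefficients, and $m_i$ is the number of observations for the $i$th child. The logit link function was obtained by setting

$$\theta_i = \frac{\exp(X_i^{\prime}\beta)}{1+\exp(X_i^{\prime}\beta)}. \qquad (3)$$

It follows from equation (2) that the likelihood function of binomial data can be defined as

$$L(\beta \mid y) = \prod_{i=1}^{n}\binom{m_i}{y_i}\,\theta_i^{y_i}\,(1-\theta_i)^{m_i - y_i}. \qquad (4)$$

From expression (3), we can show that

$$1-\theta_i = \frac{1}{1+\exp(X_i^{\prime}\beta)}. \qquad (5)$$

It follows from expression (5) that the likelihood is

$$L(\beta \mid y) = \prod_{i=1}^{n}\binom{m_i}{y_i}\left[\frac{\exp(X_i^{\prime}\beta)}{1+\exp(X_i^{\prime}\beta)}\right]^{y_i}\left[\frac{1}{1+\exp(X_i^{\prime}\beta)}\right]^{m_i - y_i}. \qquad (6)$$

In expression (6), the unknown parameters are $\beta_0, \beta_1, \cdots, \beta_p$. So, we need to specify a prior distribution for these unknowns. We used the most common priors for logistic regression parameters, which are of the form $\beta_j \sim N(\mu_j, \sigma_j^2)$. The prior density of each parameter can be expressed explicitly as

$$\pi(\beta_j) = \frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left\{-\frac{(\beta_j-\mu_j)^2}{2\sigma_j^2}\right\}, \qquad j = 0, 1, \cdots, p. \qquad (7)$$

The most common choice for $\mu_j$ is zero, and $\sigma_j$ is usually chosen to be large enough to be considered noninformative.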
As stated earlier, the posterior distribution is obtained by multiplying the prior distribution over all parameters by the full likelihood function as

$$\pi(\beta \mid y) \propto L(\beta \mid y)\prod_{j=0}^{p}\pi(\beta_j). \qquad (8)$$

It follows from expression (8) that

$$\pi(\beta \mid y) \propto \prod_{i=1}^{n}\binom{m_i}{y_i}\left[\frac{\exp(X_i^{\prime}\beta)}{1+\exp(X_i^{\prime}\beta)}\right]^{y_i}\left[\frac{1}{1+\exp(X_i^{\prime}\beta)}\right]^{m_i - y_i}\times\prod_{j=0}^{p}\frac{1}{\sqrt{2\pi}\,\sigma_j}\exp\left\{-\frac{(\beta_j-\mu_j)^2}{2\sigma_j^2}\right\}. \qquad (9)$$

Expression (9) has no closed form. Even if it had a closed form, we would have to carry out multiple integrations over the parameters to obtain the marginal distribution of each coefficient. In this study, we use the Metropolis-Hastings Algorithm, with the bayesmh command in STATA, to simulate parameter estimates from the posterior distributions. Note that these parameter estimates are subject to Monte Carlo error, which is difficult to quantify. We therefore ran a very long chain for each model: convergence was reached at 500000 iterations, after a burn-in period of 1000 and thinning that retained every 99th element of the chain.
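The sampling scheme described above can be sketched in a few lines of Python. This is a minimal random-walk Metropolis-Hastings illustration under stated assumptions, not a reproduction of STATA's bayesmh: it uses one predictor plus an intercept, binary outcomes (each $m_i = 1$), independent $N(0, 10^2)$ priors, a fixed proposal step, and synthetic data. The function names, step size, chain length, and thinning interval are all hypothetical choices made to keep the example fast.

```python
import math
import random


def log_posterior(beta, xs, ys, prior_sd=10.0):
    """Log of (Bernoulli likelihood x N(0, prior_sd^2) priors), up to a constant."""
    b0, b1 = beta
    lp = -(b0 * b0 + b1 * b1) / (2.0 * prior_sd ** 2)
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        # y * eta - log(1 + exp(eta)), written to avoid overflow
        lp += y * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return lp


def metropolis_hastings(xs, ys, n_iter, burn_in, thin, step=0.25, seed=1):
    """Random-walk Metropolis-Hastings with burn-in and thinning."""
    rng = random.Random(seed)
    beta = [0.0, 0.0]
    current = log_posterior(beta, xs, ys)
    kept, accepted = [], 0
    for it in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, xs, ys)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
            accepted += 1
        # Discard the burn-in draws, then keep every `thin`-th draw.
        if it >= burn_in and (it - burn_in) % thin == 0:
            kept.append(tuple(beta))
    return kept, accepted / n_iter


# Synthetic data (assumption): true intercept -1.0 and true slope 1.5.
data_rng = random.Random(0)
xs = [data_rng.uniform(-2.0, 2.0) for _ in range(200)]
ys = [1 if data_rng.random() < 1.0 / (1.0 + math.exp(1.0 - 1.5 * x)) else 0
      for x in xs]

draws, acc_rate = metropolis_hastings(xs, ys, n_iter=4000, burn_in=1000, thin=10)
slope_mean = sum(d[1] for d in draws) / len(draws)
print(f"kept {len(draws)} draws, acceptance rate {acc_rate:.2f}")
print(f"posterior mean odds ratio ~ {math.exp(slope_mean):.2f}")
```

Thinning reduces autocorrelation between retained draws at the cost of discarding samples, which is why a long run is needed when only every 99th element is kept.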

Results
Results from the Chi-Square Test Statistics.
Table 2 presents the results of the chi-square test of association between the outcome variable (death status) and the potential predictors of death status. The purpose of this exercise is to identify variables that are more likely to predict the status of the outcome variable.
Although the wealth index, geographical location, and gender variables were not statistically significant (at 5% level of significance), they were included in the multiple logistic regression model to further assess their effects on child mortality and to decide whether to retain them in the model.
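For readers who want to see the mechanics of the test of association, the following is a minimal Python sketch of the Pearson chi-square statistic for a 2x2 table, without continuity correction. The counts are hypothetical and are not taken from Table 2, and the function name `chi_square_2x2` is an illustrative choice. Predictors with three categories, such as size at birth, would need a 2x3 table with two degrees of freedom, which this sketch does not cover.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: rows = exposed / unexposed, columns = dead / alive.
stat, p_value = chi_square_2x2([[30, 70], [10, 90]])
print(f"chi-square = {stat:.2f}, p-value = {p_value:.5f}")
```

A p-value below 0.05 would flag the predictor as associated with death status at the 5% level of significance used in this study.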

Results from the Logistic Regression Model.
Our results showed that the wealth index (middle income, p value = 0.743; rich, p value = 0.610) and geographical location (p value = 0.475) were not statistically significant (at 5% level of significance), and hence, they were removed from the multiple logistic regression model. The best-fitting model includes the gender, delivery place, caesarean section, child size at birth, and mother's education variables. These results are shown in the second column of Table 3. The unadjusted odds ratios (in the first column of Table 3) showed that gender, delivery place, and caesarean section are not statistically significant (at 5% level of significance). However, caesarean section becomes significant (at 5% level of significance) under the adjusted odds ratios (in the second column of Table 3).

BioMed Research International
For both the unadjusted and adjusted odds ratios, there is approximately 20% increased relative odds of death among male children compared to female children. However, these increases are not statistically significant (at 5% level of significance). The unadjusted odds ratio showed 21% reduced relative odds (OR = 0.792, 95% CI = 0.623, 1.006) of death among children who were delivered at government health facilities compared to private and other health facilities, whereas the adjusted odds ratio showed 20% reduced relative odds (OR = 0.804, 95% CI = 0.624, 1.036). These reductions are not statistically significant (at 5% level of significance).
The unadjusted odds ratio showed a 1.3-fold increase in the relative odds (OR = 1.314, 95% CI = 0.924, 1.868) of death among children who were delivered through caesarean section. However, this increase is not statistically significant (at 5% level of significance). After adjusting for other risk factors of child mortality, there is a 1.4-fold increase in the relative odds (OR = 1.449, 95% CI = 1.005, 2.089) of death among children who were delivered through caesarean section compared to those who were born through normal delivery. This increase is statistically significant at 5% level of significance.
With the unadjusted odds ratios, we observed 22% reduced relative odds (OR = 0.780, 95% CI = 0.600, 1.014) of death among children whose sizes are average at birth compared to those whose sizes are small at birth and 23% reduced relative odds (OR = 0.774, 95% CI = 0.610, 0.983) among children whose sizes are large at birth compared to those whose sizes are small at birth. When we adjusted for other risk factors of death, the relative odds of death were reduced by 50% (OR = 0.498, 95% CI = 0.362, 0.684) among children whose sizes are average at birth and by 49% (OR = 0.513, 95% CI = 0.384, 0.685) among children whose sizes are large at birth, in each case compared to those whose sizes are small at birth. The unadjusted odds ratio results showed 25% reduced relative odds (OR = 0.748, 95% CI = 0.588, 0.952) of death among children whose mothers have formal education relative to those whose mothers have no formal education, and 23% reduced relative odds (OR = 0.766, 95% CI = 0.596, 0.984) when we adjust for other risk factors of child mortality. These reductions were found to be statistically significant.
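The percentages quoted above follow directly from the odds ratios: the percentage change in the odds is (OR − 1) × 100, and an estimate is significant at the 5% level when its 95% confidence interval excludes 1. The short Python sketch below applies both rules to four of the odds ratios reported in this section. The function names are illustrative assumptions.

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds relative to the reference group."""
    return (odds_ratio - 1.0) * 100.0


def ci_excludes_one(lower, upper):
    """True when a 95% CI for an odds ratio does not contain 1."""
    return lower > 1.0 or upper < 1.0


# (OR, lower 95% limit, upper 95% limit) as reported in the text above.
reported = {
    "government facility (unadjusted)": (0.792, 0.623, 1.006),
    "caesarean section (unadjusted)": (1.314, 0.924, 1.868),
    "caesarean section (adjusted)": (1.449, 1.005, 2.089),
    "mother's education (adjusted)": (0.766, 0.596, 0.984),
}

for label, (odds_ratio, lower, upper) in reported.items():
    change = percent_change_in_odds(odds_ratio)
    verdict = "significant" if ci_excludes_one(lower, upper) else "not significant"
    print(f"{label}: {change:+.1f}% change in odds, {verdict} at 5%")
```

The adjusted caesarean section interval (1.005, 2.089) only just excludes 1, which is why that effect is significant after adjustment but not before.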

Results from the Bayesian Logistic Regression Model.
The results from fitting the Bayesian logistic regression model to the child mortality data are shown in Table 4. Parameter estimation was carried out using Markov Chain Monte Carlo (MCMC) via the Metropolis-Hastings Algorithm. Convergence of the MCMC was reached at 500000 iterations, after a burn-in period of 1000 samples and thinning that retained every 99th element of the chain. Gelman and Rubin's convergence diagnostics for the parameters $\beta_1$ (gender) and $\beta_2$ (place of delivery) are presented in Figure 1. Gelman and Rubin's convergence diagnostics for the parameters $\beta_3$ (caesarean section), $\beta_4$ (large child size at birth), $\beta_5$ (average child size at birth), and $\beta_6$ (mother's education) are presented in Figure 2. The trace, normality, and autocorrelation plots of these parameters showed that the Markov chain has converged.
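As background on the diagnostic named above, the following is a minimal Python sketch of the basic Gelman and Rubin potential scale reduction factor, which compares between-chain and within-chain variance across several chains for one parameter. It omits the degrees-of-freedom correction that some software applies, the toy chains are hypothetical, and the function name `gelman_rubin` is an illustrative choice.

```python
import math


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    m = len(chains)       # number of chains
    n = len(chains[0])    # draws per chain
    means = [sum(chain) / n for chain in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    within = sum(
        sum((value - mu) ** 2 for value in chain) / (n - 1)
        for chain, mu in zip(chains, means)
    ) / m
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Toy chains (assumption): the first pair agrees, the second pair does not.
agreeing = gelman_rubin([[1, 2, 3, 4], [1, 2, 3, 4]])
disagreeing = gelman_rubin([[1, 2, 3, 4], [11, 12, 13, 14]])
print(f"agreeing chains: {agreeing:.3f}, disagreeing chains: {disagreeing:.3f}")
```

Values close to 1 suggest that the chains are sampling from the same distribution, while values well above 1 indicate that they have not yet mixed.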
The statistical inferences under the Bayesian paradigm are similar to those under the frequentist paradigm. Our results revealed that children who are males are at 21% increased relative odds (OR = 1.206, 95% CI = 0.950, 1.537) of death compared to females, and children who were [...] small size children at birth. The results also showed that children whose mothers have formal education are at 23% reduced relative odds (OR = 0.766, 95% CI = 0.589, 0.986) of death compared to children whose mothers have no formal education.
Using the adjusted odds ratios (Tables 3 and 4), we observed that average or large size at birth and maternal formal education are significantly associated with reduced child mortality, whereas caesarean section is significantly associated with increased child mortality.

Discussion
This paper investigated the effects of various risk factors on child mortality in Ghana. The child mortality status in this study was a dichotomous variable coded as 1 if a child was dead and 0 if a child was alive. The study used child mortality records from the 2014 Ghana Demographic and Health Survey (2014 GDHS). The data used for the analyses in this paper consist of 5884 complete cases [29][30][31] with 7 potential risk factors of child mortality. Analyses in this paper were carried out using the STATA software. We assessed the effects of the various risk factors on child mortality using a logistic regression model under the frequentist and the Bayesian frameworks. We built our regression models by first using a chi-square test statistic to determine potential predictors of child mortality.
We observed that 289 (4.91%) of the 5,884 children died. Risk factors such as caesarean section, size of the child at birth, and mother's educational status were found to be significantly associated with child mortality. These study findings are in line with the literature on child mortality in sub-Saharan Africa and developing countries. Child mortality is significantly higher among children who were delivered through caesarean section relative to those who were born through normal delivery. This is expected because the literature [32][33][34] on child mortality and morbidity revealed that caesarean section is associated with an increased risk of child mortality. A possible explanation is that maintenance of body temperature, glycaemia, and pulmonary respiration in children born through caesarean section are delayed or affected [33]. Also, the development of a child's immune system is affected in children born through caesarean section [33].

Figure 2: Gelman and Rubin's convergence diagnostics for caesarean section, large child size, average child size, and mother education parameter estimates $\beta_3$, $\beta_4$, $\beta_5$, and $\beta_6$, respectively.
There is a significant reduction in the risk of death among children whose sizes are average or large relative to those whose sizes are small at birth. This indicates that the diet of pregnant women must be balanced and that their health status must be regularly checked and improved so that children are born healthy and of normal size. A balanced diet for mothers will improve the nutritional status and health outcomes of the child, which will in turn lead to children of normal size, since studies [35][36][37][38] have shown that small child size is associated with increased risk of malnutrition.
The significant reduction in death among children whose mothers have education relative to those whose mothers have no education indicates a need for improved maternal education, which is likely to reduce mortality among children. This finding is in line with the findings of various authors [4,[39][40][41] on child mortality. Maternal education is expected to reduce child mortality because various authors [38,39] revealed that improved maternal education is beneficial to mothers and their children as well as to society. In particular, mothers exposed to education are likely to utilize maternal health care services, such as antenatal care and postnatal care, for healthy mothers and children. Previous studies [37,39,41,42] revealed that maternal education could offer mothers the opportunity to take part in making decisions that will improve child health outcomes and the utilization of maternal and child care services.
Our results also revealed a reduction in the risk of child mortality among female children relative to male children, although this difference was not statistically significant. This finding agrees with that of Aheto [4], and the reason may be that female children are "more likely to develop early fetal lung maturity which will be protective of respiratory diseases unlike their male counterparts" [40,43].

Conclusion
Although there is a slight difference in terms of estimates from the frequentist and Bayesian regression models, statistical inferences from the Bayesian logistic regression model agreed with those of the frequentist logistic regression model. Children who are small at birth are more likely to die; there is higher child mortality among children who were delivered through caesarean section and lower mortality among children whose mothers have formal education.
Based on the study findings, we recommend that the government and the health authorities concerned improve health facilities to provide better health care for mothers and children. With such improvement of health facilities, caesarean sections will be safer, which is likely to significantly decrease the risk of child mortality due to caesarean section. Also, health authorities should intensify the education of mothers and families on the importance of using maternal health care services during pregnancy and after delivery. Education should, among other things, emphasize the need for mothers to go for regular check-ups during the antenatal and postnatal periods for improved mother and child health. To promote normal size at birth, we recommend that the diet of mothers be improved during antenatal care and postnatal care, which is likely to produce healthier children of normal size and thereby has the potential to reduce child mortality.
The first limitation of the study is that the authors were unable to adjust for potential risk factors that were not collected during the data collection process, since the authors used secondary data. Second, the study could not utilize most risk factors because most of their values were missing [29][30][31]. Third, because this study received no funding, the authors were unable to raise funds for the collection of data on potential risk factors that were not considered during the survey, or that were considered during the survey but could not be used because almost all their values were missing.
This study considered only complete data (data without missing values) and hence could not take into account several other important determinants of child mortality because they have missing values or were not considered during the survey. Future studies should consider the use of primary data to allow for the collection of data on these variables and other important variables that were not considered during the survey. Also, future studies should employ sophisticated data management approaches for the imputation of missing data values to obtain a complete dataset [29] before statistical models are applied. Future studies can also consider models that account for missing values in the modelling process.