Comparative Analysis of Some Structural Equation Model Estimation Methods with Application to Coronary Heart Disease Risk

This study compared a ridge maximum likelihood estimator to the Yuan and Chan (2008) ridge maximum likelihood, maximum likelihood, unweighted least squares, generalized least squares, and asymptotic distribution-free estimators in fitting six models that show relationships in some noncommunicable diseases. Uncontrolled hypertension has been shown to be a leading cause of coronary heart disease, kidney dysfunction, and other negative health outcomes. It poses equal danger when asymptomatic and undetected. Research has also shown that it tends to coexist with diabetes mellitus (DM), with the presence of DM doubling the risk of hypertension. The study assessed the effect of obesity, type II diabetes, and hypertension on coronary risk, and also the existence of a converse relationship, with structural equation modelling (SEM). The results showed that the two ridge estimators did better than the other estimators. Nonconvergence occurred for most of the models for the asymptotic distribution-free estimator and the unweighted least squares estimator, whilst the generalized least squares estimator had one nonconvergence of results. Other estimators provided competing outputs, but the unweighted least squares estimator reported unreliable results, such as a large chi-square test statistic and root mean square error of approximation, for Model 3. The maximum likelihood family of estimators did better than others, like the asymptotic distribution-free estimator, in terms of overall model fit and parameter estimation. Also, the study found that an increase in obesity could result in a significant increase in both hypertension and coronary risk. Diastolic blood pressure and diabetes have significant converse effects on each other. This implies that those who are hypertensive can develop diabetes and vice versa.


Introduction
Structural equation modelling (SEM) reduces several manifest variables to a few related latent factors by explaining the covariance structure in the observed manifest variables using a combination of confirmatory factor analysis and path modelling in which the manifest relationships are hypothesized [1]. Comparative analysis of SEM estimation methods in applications is not commonplace because methods such as the traditional maximum likelihood estimator (MLE) and generalized least squares estimator (GLSE) have proven robust and high performing [2]. However, they are constrained by the normality assumption, unlike the asymptotic distribution-free estimator (ADFE) and unweighted least squares estimator (ULSE), which do not require the normality assumption, whilst ADFE is considered robust [3]. The MLE was shown to perform better than ADFE, GLSE, and ULSE although ADFE was developed as a robust estimator [4,5]. In preventing matrix singularity leading to nonconvergence due to small sample sizes in SEM, the ridge maximum likelihood estimators have been shown to perform better than MLE [6]. Yuan and Chan [6] developed the ridge estimator (RMLE_a), which adds the ratio of the number of manifest variables to the sample size to the diagonals of the covariance matrix. This method did better than the traditional MLE but uses little information from the data. A follow-up to this estimator is a ridge maximum likelihood estimator (RMLE_h) proposed in a companion paper, which includes more information from the data by adding a constant to the diagonals of the covariance matrix. These estimators will be compared based on fit indices from modelled real-life data on the relationship between hypertension, diabetes, obesity, and coronary risk. The study also assessed the effect of obesity, type II diabetes, and hypertension on coronary risk and the converse effect of diabetes on hypertension.
Hypertension is a silent killer due to its ability to cause heart failure, stroke, kidney dysfunction, and other conditions without showing symptoms [7]. Studies of hypertension and diabetes showed that they frequently coexist, that people with diabetes are twice as likely to develop hypertension as those without it, and that both conditions have similar causes [8,9]. Most studies showed that hypertension results in diabetes, but few considered the converse effect. Obesity is the outcome when a person accumulates extra weight greater than or equal to 20% of total body fat [10], which could be harmful to the person's well-being. Obesity or body fat measures such as body mass index (BMI), waist circumference (WC), and waist-to-hip ratio (WHR) are proven risk factors for type II diabetes and cardiovascular diseases [11][12][13]. BMI is usually used as the major measure of obesity and overweight. However, according to Jacobsen and Aars [14], WC, as an abdominal obesity measure, could be used to measure obesity as it is able to give more information regarding diseases that result from excess weight [15]. In a study by Dagan et al. [16], it was stated that although BMI is commonly used, it does not reflect body shape, and in addition, even though both measures were endorsed by the American Heart Association, BMI is still mostly used for adiposity. According to Kurniawan et al. [17], visceral fat (VF), body fat percentage, WC, BMI, and body weight measures have been used in measuring obesity. Using these measures together could help to measure obesity as a theoretical variable and to better assess obesity in humans.
The SEM approach, involving model representation and estimation methods for the model parameters, is presented in Section 2. Section 3 discusses the theoretical background to the relationships in the dataset and results of the SEM application to the data, whilst Section 4 presents the results and discussion and Section 5 concludes the study with a recommendation for policy implementation.

Structural Equation Model.
The SEM has a structural part represented in equation (1) as follows:

$$\eta = \beta\eta + \Gamma\xi + \zeta. \tag{1}$$

The manifest variables are used to measure the exogenous (ξ) and endogenous (η) theoretical variables in equation (1), which are represented, respectively, in measurement models (2) and (3) as follows:

$$X = \Lambda_x\xi + \delta, \tag{2}$$

$$Y = \Lambda_y\eta + \varepsilon, \tag{3}$$

where η contains the endogenous latent variables, ξ contains the exogenous latent variables, β contains the coefficients of the η variables, ζ contains the random disturbances or errors associated with the structural model, Γ is the matrix of the coefficients of the exogenous latent variables, Λ_x and Λ_y are the matrices of factor loadings relating X to ξ and Y to η, ε and δ are the random errors associated with the measurement models for determining, respectively, the endogenous and exogenous latent variables, and X and Y are the independent and dependent manifest variables [3].

Estimation of Model Parameters.
Estimating the model parameters (θ) in equations (1)-(3), we seek to minimize the discrepancy between S and Σ(θ) using a fitting function F(θ) [3,18]. Using the MLE by Jöreskog [19], we have

$$F_{ML}(\theta) = \log|\Sigma(\theta)| + \mathrm{tr}\left(S\Sigma(\theta)^{-1}\right) - \log|S| - k, \tag{4}$$

where k is the number of manifest variables in the structural equation model, and S and Σ(θ) are the sample and implied covariance matrices, respectively [20]. Equation (4) is nonlinear, so iterative processes are employed in the minimization [2]. The MLE performs well when the data follow a normal distribution but breaks down with varying degrees of outliers. Nonconvergence of results does occur with MLE when the sample size is small. Yuan and Chan [6] proposed a ridge maximum likelihood estimator which models S_a = S + aI instead of S, where a is a constant derived as a = k/N and N is the sample size. Since a does not take much information from the data [21], another constant was suggested in RMLE_h. Therefore, instead of modelling S_a as in RMLE_a, S_h = S + hI was modelled in RMLE_h, where h = k(1 + α)/N, α = 1/(1 + d̄²), and d_i (i = 1, 2, ..., k) are the eigenvalues with mean d̄. The other estimators, namely, ULSE, GLSE, and ADFE, are computed by equations (5)-(7). The ULSE is not based on the normality assumption but requires similar scales of measurement for all manifest variables. The ULSE is computed by the following equation:

$$F_{ULS}(\theta) = \frac{1}{2}\,\mathrm{tr}\left[\left(S - \Sigma(\theta)\right)^{2}\right], \tag{5}$$

whilst GLSE is given by

$$F_{GLS}(\theta) = \frac{1}{2}\,\mathrm{tr}\left[\left(I - \Sigma(\theta)S^{-1}\right)^{2}\right], \tag{6}$$

where tr(·) is the trace of the matrix. The GLSE computes the discrepancy function by minimizing the weighted difference between the sample covariance matrix (S) and the model implied covariance matrix Σ(θ), using the same assumptions underlying the MLE. The ADFE, which is in the family of weighted least squares (WLS), was proposed by Browne [22] to resist the effect of nonnormality in data for covariance structure models. It is computed by the following equation:

$$F_{ADF}(\theta) = \left(s - \sigma(\theta)\right)^{\top} W^{-1} \left(s - \sigma(\theta)\right), \tag{7}$$

where s and σ are the column vectors of the nonduplicate elements of the sample and implied covariance matrices, respectively, and W is a positive definite matrix of size k* × k*, with k* = k(k + 1)/2 [23].
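As a minimal sketch of how the two ridge constants differ, the following Python fragment computes a = k/N and h = k(1 + α)/N and adds a constant to the diagonal of a covariance matrix. It assumes that α is read as 1/(1 + d̄²), with d̄ the mean eigenvalue, and that the eigenvalues are supplied by the caller; the function names and the numerical inputs are hypothetical and only for illustration.

```python
def ridge_constants(eigenvalues, n_obs):
    """Return (a, h) for the ridge adjustments S + aI and S + hI.

    Assumes alpha = 1 / (1 + d_bar**2), where d_bar is the mean eigenvalue.
    """
    k = len(eigenvalues)              # number of manifest variables
    d_bar = sum(eigenvalues) / k      # mean of the eigenvalues
    a = k / n_obs                     # constant a = k/N
    alpha = 1.0 / (1.0 + d_bar ** 2)  # data-dependent part of h
    h = k * (1.0 + alpha) / n_obs     # constant h = k(1 + alpha)/N
    return a, h


def add_ridge(cov, constant):
    """Add a constant to the diagonal of a square matrix given as nested lists."""
    return [[value + constant if i == j else value
             for j, value in enumerate(row)]
            for i, row in enumerate(cov)]


# Hypothetical eigenvalues and sample size, not taken from the study data.
a, h = ridge_constants([2.0, 1.0, 0.6, 0.4], n_obs=100)
s_h = add_ridge([[1.0, 0.3], [0.3, 2.0]], h)
```

Under this reading, 0 < α ≤ 1, so h always lies between a and 2a; the data enter h only through the mean eigenvalue.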
The lavaan package in R was used to produce the results of the model fit and path coefficients, whilst the DiagrammeR package was used to report the path diagrams.

Model Adequacy Test.
The SEM, unlike the linear models, adopts fit indices [24]. In this study, we considered the absolute, relative, and parsimonious fit indices. The absolute fit indices are used in the omnibus test, which is usually undertaken in SEM to test whether Σ = Σ(θ) or not, where Σ is the covariance matrix for the population, which is estimated using the sample covariance matrix S [25]. The test statistic is distributed as χ² with df degrees of freedom, and nonsignificance of this test implies that the discrepancy between these two covariance matrices is not significant.
The chi-square test tests the hypothesis H0: Σ = Σ(θ) versus H1: Σ ≠ Σ(θ), that is, that all the residuals are zero, with a test statistic which follows the chi-square distribution with degrees of freedom df = k(k + 1)/2 − t, where t is the number of parameters to be estimated. With a large sample size the chi-square statistic tends to reject the null hypothesis, whereas the test statistic from a small sample lacks power; as a result, the relative chi-square (χ²/df) was developed by Satorra and Bentler [26] such that 2 ≤ χ²/df ≤ 5. The goodness-of-fit index (GFI) assesses the amount of variance and covariance in the sample covariance matrix that is predicted by Σ(θ) [27], and it is affected by sample size. The GFI is computed by the following equation:

$$\mathrm{GFI} = 1 - \frac{\mathrm{tr}\left[\left(\Sigma(\hat{\theta})^{-1}S - I\right)^{2}\right]}{\mathrm{tr}\left[\left(\Sigma(\hat{\theta})^{-1}S\right)^{2}\right]},$$

which usually falls between 0 and 1 and is considered desirable if it is at least 0.95 [28]. The adjusted goodness-of-fit index (AGFI) by Jöreskog and Sörbom [27] adjusts the GFI for model complexity using the degrees of freedom. Like the GFI, the AGFI falls within 0 and 1 and is also sensitive to the sample size [28]. It is calculated by the following equation:

$$\mathrm{AGFI} = 1 - \frac{k(k+1)}{2\,df}\left(1 - \mathrm{GFI}\right).$$

The root mean square residual (RMR) by Jöreskog and Sörbom [27] is the square root of the average squared residual between the elements of the sample covariance matrix and the predicted covariance matrix [29]. The RMR is computed by the following equation:

$$\mathrm{RMR} = \sqrt{\frac{2\sum_{i=1}^{k}\sum_{j=1}^{i}\left(s_{ij} - \sigma_{ij}\right)^{2}}{k(k+1)}},$$

where k = n + m, which is the total number of exogenous and endogenous variables. Generally, RMR assumes values from 0 to 1 (0 ≤ RMR ≤ 1), but RMR ≤ 0.05 is preferred. When the observed variables are measured on different scales, the RMR is difficult to interpret, and hence the standardized root mean square residual (SRMR) was developed for easier and more meaningful interpretation [28,29]. The SRMR is computed by the following equation:

$$\mathrm{SRMR} = \sqrt{\frac{2\sum_{i=1}^{k}\sum_{j=1}^{i}\left[\left(s_{ij} - \sigma_{ij}\right)/\sqrt{s_{ii}s_{jj}}\right]^{2}}{k(k+1)}},$$

where s_ij and σ_ij are the elements of the covariance matrix of the sample data and the implied covariance matrix, respectively. The SRMR takes values from 0 to 1, and the lower the value of SRMR, the better.
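The residual-based indices can be illustrated with a short Python sketch. It assumes the common textbook forms, in which the squared residuals over the k(k + 1)/2 nonduplicate elements are averaged and, for the SRMR, each residual is first divided by the square root of s_ii s_jj; software packages may standardize slightly differently. The function names and the 2 × 2 matrices are hypothetical and are not taken from the study data.

```python
import math


def model_df(k, t):
    """Degrees of freedom: k(k + 1)/2 nonduplicate moments minus t parameters."""
    return k * (k + 1) // 2 - t


def rmr(sample_cov, implied_cov):
    """Root mean square residual over the lower triangle, diagonal included."""
    k = len(sample_cov)
    total = 0.0
    for i in range(k):
        for j in range(i + 1):
            total += (sample_cov[i][j] - implied_cov[i][j]) ** 2
    return math.sqrt(2.0 * total / (k * (k + 1)))


def srmr(sample_cov, implied_cov):
    """RMR computed on residuals scaled by sqrt(s_ii * s_jj)."""
    k = len(sample_cov)
    total = 0.0
    for i in range(k):
        for j in range(i + 1):
            scale = math.sqrt(sample_cov[i][i] * sample_cov[j][j])
            total += ((sample_cov[i][j] - implied_cov[i][j]) / scale) ** 2
    return math.sqrt(2.0 * total / (k * (k + 1)))


# Hypothetical sample and implied covariance matrices.
S = [[4.0, 2.0], [2.0, 9.0]]
Sigma = [[4.0, 1.4], [1.4, 9.0]]
rmr_value = rmr(S, Sigma)    # depends on the measurement scale
srmr_value = srmr(S, Sigma)  # scale free
```

In this toy case the only nonzero residual is the covariance (0.6), so the RMR reflects the raw scale of the variables while the SRMR expresses the same misfit in correlation units.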
The root mean square error of approximation (RMSEA) by Steiger and Lind [30] is among the fit indices which are used to assess the fit of the model to the data and are classified as badness-of-fit indices. The RMSEA is computed by the following equation:

$$\mathrm{RMSEA} = \sqrt{\frac{\max\left(\chi^{2} - df,\ 0\right)}{df\,(N - 1)}}.$$

Several studies consider a model a close fit if RMSEA < 0.05, an average fit if 0.05 ≤ RMSEA ≤ 0.08, neither a good nor a bad fit if 0.08 < RMSEA ≤ 0.1, and a poor fit if RMSEA > 0.1 [28]. The relative fit indices usually compare the chi-square statistic for the hypothesized model with that of the baseline model [29]. The values for "normed" or scaled fit indices should fall between 0 and 1 inclusive. However, the nonnormed fit indices sometimes assume values less than 0 or more than 1. A recently agreed cutoff point for a good model fit based on relative fit indices is a value greater than or equal to 0.95 [31]. The comparative fit index (CFI) by Bentler [32] is used when comparing hypothesized and baseline models and is computed by the following equation:

$$\mathrm{CFI} = 1 - \frac{\max\left(\chi^{2}_{h} - df_{h},\ 0\right)}{\max\left(\chi^{2}_{b} - df_{b},\ \chi^{2}_{h} - df_{h},\ 0\right)},$$

where χ²_b and χ²_h are the chi-square test statistics of the baseline and the hypothesized models, respectively, with corresponding degrees of freedom df_b and df_h. The CFI value is between 0 and 1, is less affected by sample size, and has an acceptable value of greater than or equal to 0.95 [32].
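The RMSEA cutoffs and the CFI can be sketched in Python as follows. The sketch assumes the widely used forms RMSEA = sqrt(max(χ² − df, 0)/(df(N − 1))) and CFI = 1 − max(χ²_h − df_h, 0)/max(χ²_b − df_b, χ²_h − df_h, 0); some software divides by N rather than N − 1. The chi-square values below are hypothetical, and only the sample size of 113 comes from the study.

```python
import math


def rmsea(chi2, df, n_obs):
    """Root mean square error of approximation, using N - 1 in the denominator."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))


def rmsea_label(value):
    """Verbal label following the cutoffs quoted in the text."""
    if value < 0.05:
        return "close fit"
    if value <= 0.08:
        return "average fit"
    if value <= 0.10:
        return "neither good nor bad fit"
    return "poor fit"


def cfi(chi2_h, df_h, chi2_b, df_b):
    """Comparative fit index from hypothesized (h) and baseline (b) models."""
    numerator = max(chi2_h - df_h, 0.0)
    denominator = max(chi2_b - df_b, chi2_h - df_h, 0.0)
    return 1.0 if denominator == 0.0 else 1.0 - numerator / denominator


# Hypothetical chi-square statistics; N = 113 as in the study.
rmsea_value = rmsea(chi2=30.0, df=20, n_obs=113)
label = rmsea_label(rmsea_value)
cfi_value = cfi(chi2_h=30.0, df_h=20, chi2_b=400.0, df_b=28)
```

Truncating the noncentrality estimates at zero is what keeps the CFI within 0 and 1, in contrast to the nonnormed indices.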
In rescaling the chi-square to lie between 0 (no fit) and 1 (exact fit), the normed fit index (NFI) by Bentler and Bonett [33] is used and computed by the following equation:

$$\mathrm{NFI} = \frac{F_{b} - F_{h}}{F_{b}},$$

where the fitting function value of the baseline model is given as F_b and that of the hypothesized model is F_h. This fit index responds to sample size, and an acceptable value should be greater than or equal to 0.95 [31]. The relative fit index (RFI) proposed by Bollen [34] is intended to reduce the effect of sample size on the NFI by Bentler and Bonett [33]. It is computed by the following equation:

$$\mathrm{RFI} = 1 - \frac{F_{h}/df_{h}}{F_{b}/df_{b}}.$$

The Tucker and Lewis fit index (TLI) [35], also known as the nonnormed fit index (NNFI) [33], was developed to reduce the effect of sample size, but it sometimes reports values not within 0 and 1 inclusive. The TLI is computed by the following equation:

$$\mathrm{TLI} = \frac{\chi^{2}_{b}/df_{b} - \chi^{2}_{h}/df_{h}}{\chi^{2}_{b}/df_{b} - 1}.$$

McDonald and Marsh [36] and Bentler [32] proposed the relative noncentrality index (RNI) for assessing model fit, which is less affected by sample size but not bounded by 0 and 1. RNI is computed by the following equation:

$$\mathrm{RNI} = \frac{\left(\chi^{2}_{b} - df_{b}\right) - \left(\chi^{2}_{h} - df_{h}\right)}{\chi^{2}_{b} - df_{b}}.$$

The Bollen incremental fit index (IFI) is one of the fit indices which are not affected by the sample size [37]. Many studies showed that some of the fit indices are influenced by sample size in such a way that larger sample sizes appear to lead to fit indices with larger values [28,31]. IFI is computed by the following equation:

$$\mathrm{IFI} = \frac{\chi^{2}_{b} - \chi^{2}_{h}}{\chi^{2}_{b} - df_{h}}.$$

To penalize for model complexity, the normed and goodness-of-fit indices were adjusted for the loss of degrees of freedom due to the estimation of more parameters. As a result, Mulaik et al. [38] and James et al. [39] developed the parsimonious normed fit index and the parsimonious goodness-of-fit index, respectively. These fit indices usually assume values close to 0.5 [28]. The parsimonious normed fit index (PNFI) is given by the following equation:

$$\mathrm{PNFI} = \frac{df_{h}}{df_{b}}\,\mathrm{NFI},$$

whilst the parsimonious goodness-of-fit index (PGFI) is computed by the following equation:

$$\mathrm{PGFI} = \frac{2\,df_{h}}{k(k+1)}\,\mathrm{GFI}.$$
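For illustration, the incremental and parsimony indices can be computed together from the chi-square statistics of the baseline and hypothesized models. The Python sketch below assumes the standard chi-square forms of NFI, RFI, TLI, RNI, IFI, and PNFI; because the chi-square statistic is proportional to the fitting function value, the chi-square and fitting-function forms of the NFI and RFI give the same result. The function name and the input values are hypothetical.

```python
def incremental_fit(chi2_h, df_h, chi2_b, df_b):
    """Relative and parsimony fit indices from hypothesized (h) and baseline (b) models."""
    ratio_h = chi2_h / df_h  # relative chi-square of the hypothesized model
    ratio_b = chi2_b / df_b  # relative chi-square of the baseline model
    nfi = (chi2_b - chi2_h) / chi2_b
    return {
        "NFI": nfi,
        "RFI": 1.0 - ratio_h / ratio_b,
        "TLI": (ratio_b - ratio_h) / (ratio_b - 1.0),
        "RNI": ((chi2_b - df_b) - (chi2_h - df_h)) / (chi2_b - df_b),
        "IFI": (chi2_b - chi2_h) / (chi2_b - df_h),
        "PNFI": (df_h / df_b) * nfi,  # NFI penalized by the df ratio
    }


# Hypothetical chi-square statistics, not taken from the study tables.
indices = incremental_fit(chi2_h=30.0, df_h=20, chi2_b=400.0, df_b=28)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the NFI (0.925) falls below the 0.95 cutoff while the TLI, RNI, and IFI exceed it, which shows how the indices can disagree for the same model.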

Study Data.
In order to compare the estimators, we used the diastolic and systolic blood pressure to measure a theoretical variable called hypertension [12]. Obesity and age are risk factors for hypertension and diabetes, which are cardiovascular disease risk factors [10,40]. These conditions contribute the most to illness, disability, and mortality. Mortality due to cardiovascular diseases is very common. Coronary risk is an index used to assess the risk of heart disease. The direct effects of age, obesity, hypertension, fasting blood sugar, and postprandial glucose on the coronary risk were assessed (see the conceptual models in Figure 1). Because diastolic and systolic blood pressures contribute differently to hypertension, four additional models were fitted measuring hypertension with manifest variables, with a focus on the converse relationship between hypertension and diabetes.
We used a real-life dataset collected and used by Lokpo et al. [41], from the Ghana Prison Service, Ho in the Volta Region of Ghana, after the Research Ethics Committee (REC) of the University of Health and Allied Sciences gave clearance ("ERC/UHAS-REC A.4 [188] 18-19"). Other ethical considerations were followed, including informed consent. Three variables in Table 1, fasting blood sugar, postprandial glucose, and coronary risk, had a few missing values, which were imputed using their respective medians. Sampling adequacy of the study data was assessed using the Kaiser-Meyer-Olkin test. It reported a value of 0.7786, which implies that the dataset had good sampling adequacy and could be used for factor analysis. Also, Bartlett's test showed that the variables are correlated (p value < 0.0001). The dataset deviated slightly from multivariate normality; however, most of the estimators in this study can handle nonnormality to some level.
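The median imputation step can be sketched in a few lines of Python. The sketch assumes that missing values are coded as None and that each variable is imputed separately with the median of its observed values; the function name and the readings are hypothetical and are not taken from the study data.

```python
from statistics import median


def impute_median(values):
    """Replace missing entries (None) with the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Hypothetical fasting blood sugar readings with two missing values.
fasting_blood_sugar = [5.1, None, 6.3, 4.8, None, 5.6]
imputed = impute_median(fasting_blood_sugar)
```

The median is less sensitive to extreme readings than the mean, which is a common reason for preferring it when only a few values are missing.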

Results and Discussion
In order to compare the ridge maximum likelihood estimator (RMLE_h) with other estimators using the above real-life data, the SEM as presented in the previous section was applied to the required manifest variables to establish the effect of obesity, diabetes, and hypertension on coronary disease using sample data of size 113. The models also assessed the effects of obesity and diabetes on hypertension as well as of obesity and hypertension on diabetes. Table 2 shows the correlation matrix of the data, and the model data fit indicators are reported in Table 3. The path coefficients and types of effects are reported in Tables 4-9, while path model results are presented in Figures 2-7. In all, six models were fitted. Model 1 contains latent hypertension as a risk factor for type II diabetes, and Model 2 models the effect of type II diabetes on latent hypertension. Model 3 models the effect of manifest diastolic blood pressure on type II diabetes, and Model 4 looks at the effect of manifest systolic blood pressure on type II diabetes. Lastly, Model 5 models the effect of type II diabetes on manifest diastolic blood pressure, whilst Model 6 accounts for the effect of type II diabetes on manifest systolic blood pressure.
These relationships are hypothesized based on the literature [12,42,43]. The results as presented in Table 3 [...] unknown p value. The relative chi-square is also less than 2 ((χ²/df) < 2) and falls within the accepted interval. Also, the RMSEA reported nonsignificant values (p value > 0.05) for all methods that converged successfully, implying a good model. All the estimation methods reported good and acceptable SRMR except the ADFE, which reported a poor SRMR of 0.095. Other fit indices which also showed that the hypothesized models were fitted appropriately include the absolute fit index GFI; the relative fit indices CFI, TLI, RNI, NFI, and BFI (IFI); and the parsimonious fit indices AGFI, PNFI, and PGFI. Generally, the ridge maximum likelihood estimators performed better than other [...] These results show that RMLE_h and RMLE_a reported the best fit indices used to assess the model data fit for the SEM approach. The RNI and CFI have been shown to coincide in some cases because of algebraic conditions they share [44]. In this study, they were the same for all estimators.
The ADFE and ULSE are not good estimators for the hypothesized models in this study, whilst GLSE reported fit indices which do not fall within the acceptable ranges, especially CFI, TLI, RNI, and NFI, for all models except Model 3. The obesity latent exogenous variable was determined using the manifest variables BMI, WC, hip circumference (HC), and VF. All the manifest variables for measuring obesity fitted correctly and were significantly positively related to the theoretical exogenous variable (obesity). The latent hypertension was also significantly determined using diastolic and systolic blood pressure. The measurement of the latent hypertension agrees with the results of Yousefi et al. [12], where systolic blood pressure contributed more than the diastolic blood pressure, but disagrees with that of Broström et al. [43].
Some of the hypothesized paths were significant. The theoretical hypertension in this study had no effect on diabetes (Figure 2 and Table 4). The exogenous latent obesity has a main effect of 0.215 on the endogenous latent hypertension.
The exogenous latent obesity also showed a positive relationship with coronary risk, with a significant direct effect of 0.218 and a nonsignificant indirect effect of 0.022. The direct effect of exogenous latent obesity on postprandial glucose is 0.08 (Figure 2). The relationship between latent blood pressure and latent obesity in this study (both Model 1: Table 4 and Figure 2, and Model 2: Table 5 and Figure 3) is consistent with the results of Yousefi et al. [12]. They reported that obesity had a strong positive effect on latent hypertension, which implies that reducing weight will reduce hypertension. From the first model (Figure 2), the blood pressure measured as a theoretical variable had no effect on type II diabetes. Like the first model, the second model shows that type II diabetes also does not have a significant direct or indirect effect on the hypertension endogenous latent variable for this study (Figure 3 and Table 5). However, when blood pressure was measured with manifest diastolic blood pressure in Model 3 (Figure 4 and Table 6), the blood pressure had a significant effect on type II diabetes. The manifest systolic blood pressure has no significant effect on type II diabetes (Figure 5 and Table 7). Moreover, it revealed that obesity affects systolic and diastolic blood pressure, which means reducing obesity could reduce both hypertension and type II diabetes. Also, in Model 5 (Figure 6 and Table 8), type II diabetes has a significant effect on diastolic blood pressure. This implies that an increase in type II diabetes could lead to high diastolic blood pressure. Type II diabetes does not have an effect on systolic blood pressure, whilst obesity shows a significant effect on systolic blood pressure, as in Model 6 (Figure 7 and Table 9). Obesity has a positive effect on coronary risk in all models. Generally, age has positive total effects on obesity, hypertension, and coronary risk.

Conclusion
This study compared a ridge estimator with others in modelling the relationship between obesity, type II diabetes, and hypertension on coronary risk controlling for age, as well as the converse effects of type II diabetes on hypertension. The RMLE_h and RMLE_a did better than other estimators like the maximum likelihood, generalized least squares, unweighted least squares, and asymptotic distribution-free estimators. The ULSE and ADFE reported nonconverged and unreliable model coefficients. Aside from the results of the ULSE and ADFE, the other estimators reported very similar model coefficients. The obesity latent exogenous variable was significantly measured using waist circumference, hip circumference, body mass index, and visceral fat. All obesity manifest measures are positively related to the exogenous latent variable (obesity). Also, the diastolic and systolic blood pressure are significant determinants of blood pressure.
The study found that an increase in obesity could result in a significant increase in both hypertension and coronary risk. Type II diabetes has a converse effect on blood pressure. Latent blood pressure and obesity do not have significant relationships with diabetes. However, type II diabetes has a significant positive effect on manifest diastolic blood pressure. The manifest diastolic blood pressure also has a significant effect on type II diabetes. Therefore, this calls for holistic public healthcare policies to reduce both conditions under noncommunicable diseases, as reducing one at a time may not be effective.

Limitation.
In order to make an inference concerning the relationships between obesity, diabetes, hypertension, and coronary risk for the population of Ghana, a large dataset covering the whole country is needed. However, the dataset used for the empirical analysis in this study covered only one of the regions of the country. Hence, the study is unable to generalize these findings to the whole country. Also, although we controlled for age in assessing the direct and indirect effects of obesity on diabetes, hypertension, and coronary risk, there may be other confounding variables which were not measured for this study, and hence, further study is required.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.