Sensitivity and Specificity Improvement in Abdominal Obesity Diagnosis Using Cluster Analysis during Waist Circumference Cut-Off Point Selection

Introduction. The purpose of this study was to analyze the influence of metabolic phenotypes during the construction of ROC curves for waist circumference (WC) cutpoint selection. Materials and Methods. A total of 1,902 subjects of both genders were selected from the Maracaibo City Metabolic Syndrome Prevalence Study database. Two-Step Cluster Analysis (TSCA) was applied to select metabolically healthy and sick men and women. ROC curves were constructed to determine WC cutoff points by gender. Results. Through TSCA, metabolic phenotype predictive variables were selected: HOMA2-IR and HOMA2-βcell for women and HOMA2-IR, HOMA2-βcell, and TAG for men. Subjects were classified as healthy normal weight, metabolically obese normal weight, healthy and metabolically disturbed overweight, and healthy and metabolically disturbed obese. Final WC cutpoints were 91.50 cm for women (93.4% sensitivity, 93.7% specificity) and 98.15 cm for men (96% sensitivity, 99.5% specificity). Conclusions. Using TSCA to select the groups for ROC curve construction proved to be an important tool, aiding in the detection of metabolically obese normal weight (MONW) and metabolically healthy obese (MHO) subjects, who cannot be identified with WC alone. The resulting WC cutpoints were <91.00 cm for women and <98.00 cm for men. Furthermore, anthropometry is insufficient to determine healthiness, and biochemical analysis is needed to properly filter subjects during classification.


Introduction
Obesity is emerging as an important health issue in Venezuela, particularly in urban areas, paradoxically coexisting with undernutrition [1]. The rising prevalence of overweight and obesity around the world shares a direct correlation with the increasing occurrence of obesity-related comorbidities such as high blood pressure (HBP), metabolic syndrome (MS), dyslipidemia, type 2 diabetes mellitus (T2DM), and cardiovascular disease (CVD) [2]. Several pathophysiological aspects have been proposed to explain the close relationship between these diseases, including the degree of adiposity and anatomic fat localization [1][2][3].
Currently, it is accepted that the majority of individuals who have obesity progressively develop insulin resistance, beta cell failure, and lastly T2DM, a biological continuum that is undeniably complicated and intricate [2,3]. However, approximately 10-25% of obese individuals are metabolically healthy, most likely because of preserved insulin sensitivity, which is probably attributable to genetic factors [4]. On the other hand, visceral adipose tissue inflammation, ectopic fat deposition, and adipose tissue dysfunction have been proposed as an etiologic triumvirate that mediates insulin resistance in human obesity independently of total body fat mass [3]. Furthermore, it has been reported that around 10-15% of lean subjects may exhibit insulin resistance and other metabolic disturbances like dyslipidemia, dysglycemia, and HBP [5]. This landscape suggests the existence of four well-defined phenotypes in human beings according to body composition and metabolic status: (a) healthy normal weight (HNW), (b) metabolically obese normal weight (MONW), (c) metabolically disturbed obese (MDO), and (d) metabolically healthy obese (MHO) [6][7][8].
It has been highlighted that the proposed waist circumference (WC) cut-off points for Latin America, as well as other parts of the world, have relatively low areas under the curve (AUC) and therefore relatively low sensitivities and specificities during ROC curve construction [9,10] when using traditional criteria to classify subjects as healthy or sick, such as the presence of two or more components of the MS criteria as reported by Hara et al. [11]. Recently, our group built ROC curves for WC cut-off point selection using 2 or more positive MS components to differentiate between healthy and sick individuals, rendering values of 90.25 cm for women (68.4% sensitivity, 65.8% specificity) and 95.15 cm for men (71.1% sensitivity, 67.4% specificity) [12].
Nevertheless, it has been suggested that unusual metabolic phenotypes such as MONW and MHO could influence the accuracy of obesity-centered studies due to difficulties in subject characterization [6][7][8], and this setback extends to the sensitivity and specificity obtained during WC cut-off point selection for obesity diagnosis. The biological traits of these uncommon phenotypes [6][7][8][13] result in uncharacteristic grouping of metabolic components, which could be difficult to predict. Therefore, the proof of concept is that detecting these phenotypes before WC selection could improve the accuracy of the selected WC cutpoints and their future application in epidemiological studies.
The detection of MONW and MHO prior to any cut-off point selection method cannot rely on common markers such as WC because they can be misleading, due to the uniqueness of such phenotypes [7,8]. In this context, the advantage of applying data mining techniques (like cluster analysis) is that they allow the spontaneous grouping of individuals according to the behavior of metabolic and anthropometric variables, superseding the discriminating capacity of internationally appointed WC cut-off points and other preestablished criteria for metabolic alterations. Since these phenotypes do not behave in the same manner as the common ones, they could be considered as "noise" during the construction of ROC curves and might affect the sensitivity or specificity of the selected cutpoints for WC. Thus, the ability to identify and filter them from the ROC construction process is not merely optional but necessary.
Taking all this information into consideration, the purpose of this investigation was to identify subjects with unusual metabolic phenotypes and, afterwards, evaluate their influence during the construction of ROC curves for WC cutoff point selection.

Subject Selection.
The Maracaibo City Metabolic Syndrome Prevalence Study (MMSPS) [14] was a cross-sectional research study undertaken in the city of Maracaibo, Venezuela, whose purpose was the identification and analysis of MS and cardiovascular risk factors in the adult population of Maracaibo, the second largest city in Venezuela with 2.750.00 inhabitants. The methodology and randomization during sampling were published elsewhere [14]. Currently, there are 2,230 subjects enrolled [14], out of which 1,902 were selected after excluding those individuals whose serum insulin levels were not determined and those diagnosed with diabetes mellitus; the latter were excluded because pharmacological treatment of these patients would modify the variables used in this research. The study was approved by the Bioethics Committee of the Endocrine and Metabolic Diseases Research Center, University of Zulia, and all participants signed a written consent before being interviewed and physically examined by a trained team.

Clinical Evaluation.
Blood pressure was assessed with the auscultatory technique, and HBP classification was made using the criteria proposed in the VII Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure [15]. Mean Arterial Pressure (MAP) [16] was calculated using the equation MAP = Diastolic Pressure + (Systolic Pressure − Diastolic Pressure)/3, expressed in mmHg. Obesity was classified applying the WHO criteria [17] based on the BMI value. Weight was assessed using a digital scale (Tanita, TBF-310 GS Body Composition Analyzer, Tokyo, Japan), while height was obtained with a calibrated rod, with the patients shoeless and wearing light clothing. WC was measured using calibrated measuring tape in accordance with the anatomical landmarks proposed by the USA National Institutes of Health protocol [18].
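As a worked illustration of the MAP calculation (diastolic pressure plus one third of the pulse pressure), a minimal sketch follows. The function name and the 120/80 mmHg reading are hypothetical and do not come from the study data.

```python
def mean_arterial_pressure(systolic: float, diastolic: float) -> float:
    """Return MAP in mmHg: diastolic pressure plus one third of the pulse pressure."""
    if systolic < diastolic:
        raise ValueError("systolic pressure must not be below diastolic pressure")
    return diastolic + (systolic - diastolic) / 3.0


# Hypothetical reading of 120/80 mmHg (illustrative only)
map_example = mean_arterial_pressure(120, 80)
print(round(map_example, 1))  # 93.3
```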

Biochemical Analyses.
Fasting levels of glucose, cholesterol, triglycerides (TAG), HDL-C, and hs-CRP were determined using an automated computer analyzer (Human Gesellschaft für Biochemica und Diagnostica mbH). LDL and VLDL levels were calculated applying the Friedewald formulas [19]. When TAG levels were over 400 mg/dL, measurement was done using lipoprotein electrophoresis and optical densitometry (BioRad GS-800 densitometer, USA). Insulin was determined using an ultrasensitive ELISA method (DRG Instruments GmbH, Germany, International DRG Division, Inc). The MS diagnosis was done using the IDF/NHLBI/AHA-2009 consensus criteria [20].
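The Friedewald estimates can be sketched as follows, assuming the standard form in mg/dL (VLDL-C = TAG/5 and LDL-C = total cholesterol − HDL-C − TAG/5). The function name and the example lipid values are hypothetical; the 400 mg/dL guard mirrors the threshold above which the study switched to direct measurement.

```python
from typing import Tuple


def friedewald(total_chol: float, hdl_c: float, tag: float) -> Tuple[float, float]:
    """Return (LDL-C, VLDL-C) in mg/dL from fasting lipid values in mg/dL."""
    if tag > 400:
        # Above this level the estimate is not used; a direct measurement is needed.
        raise ValueError("TAG over 400 mg/dL: Friedewald estimate not applicable")
    vldl_c = tag / 5.0
    ldl_c = total_chol - hdl_c - vldl_c
    return ldl_c, vldl_c


# Hypothetical lipid profile (illustrative only)
ldl, vldl = friedewald(total_chol=200, hdl_c=50, tag=150)
print(ldl, vldl)  # 120.0 30.0
```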

Insulin Sensitivity.
This was assessed by the Homeostasis Model Assessment (HOMA2-IR) calculator, which is available at http://www.dtu.ox.ac.uk/homacalculator/index.php from the Oxford Centre for Diabetes, Endocrinology and Metabolism. Using the ROC curve construction technique, our research team determined that the optimal cutpoint for HOMA2-IR for our population is 2.00 [21].

Statistical Analysis.
Database construction and cluster analysis were done using the Statistical Package for the Social Sciences (SPSS) v19 for Windows (IBM Inc., Chicago, IL), while the ROC curves were constructed using the R Project for Statistical Computing, available at http://www.r-project.org/. Normal distribution of continuous variables was assessed using Geary's test; for normally distributed variables, the results were expressed as arithmetic mean ± SD (standard deviation). Variables without normal distribution were logarithmically transformed, and normal distribution was subsequently corroborated. The differences between arithmetic means were assessed using Student's t-test (when two groups were compared) or one-way ANOVA (when three or more groups were compared). Qualitative variables were expressed as absolute and relative frequencies.

Cluster Analysis Protocol.
Prior to the Two-Step Cluster Analysis, all individuals were classified according to BMI as Normal Weight, Overweight, or Obese. The obese subjects were initially evaluated both as a single group (Obese, BMI ≥ 30 kg/m²) and according to the WHO classification (Class I, Class II, and Class III) [17]; since the results showed similar behavior between them, we decided to use the single Obesity category because it allowed us to evaluate the subjects more clearly. Each BMI category was submitted independently to the cluster analysis, categorizing the subjects as metabolically healthy or sick; see Figure 1. The metabolic variables evaluated as possible metabolic predictors based on their physiological function and biological plausibility were MAP, TAG, total cholesterol, HDL-C, HOMA2-IR, HOMA2-βcell, HOMA2-S, fasting blood glucose, non-HDL cholesterol, TAG/HDL-C index, and hs-CRP; WC was excluded because it was the assessed dependent variable. The predictive strength of these variables was analyzed in accordance with cluster ability and quality, ranging from 0.0 to 1.0. The best metabolic predictive variables selected were (a) HOMA2-IR and HOMA2-βcell for normal weight women; (b) HOMA2-IR, HOMA2-βcell, and TAG for normal weight men; (c) HOMA2-IR and HOMA2-βcell for overweight women; (d) HOMA2-IR, HOMA2-βcell, and TAG for overweight men; and (e) HOMA2-IR for male and female obese patients (Table 1).
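The BMI stratification that precedes clustering can be sketched as below. The thresholds follow the WHO classes; the "underweight" branch is an assumption, since the text does not state how subjects with BMI under 18.5 kg/m² were handled, and the subject records are hypothetical.

```python
def bmi_category(weight_kg: float, height_m: float) -> str:
    """Classify BMI (kg/m^2) using WHO thresholds."""
    bmi = weight_kg / height_m ** 2
    if bmi < 18.5:
        return "underweight"  # assumption: handling of these subjects is not described
    if bmi < 25.0:
        return "normal weight"
    if bmi < 30.0:
        return "overweight"
    return "obese"


# Hypothetical subjects: (id, weight in kg, height in m)
subjects = [("s1", 70.0, 1.75), ("s2", 85.0, 1.75), ("s3", 100.0, 1.75)]

# Each BMI stratum would then be clustered independently.
strata = {}
for sid, weight, height in subjects:
    strata.setdefault(bmi_category(weight, height), []).append(sid)
print(strata)
```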
Table 1 note: The cells with "*" indicate the "healthy" clusters for each phenotype. The cells without "*" represent the "sick" clusters of persons. HOMA2-IR: Homeostasis Model Assessment-2 for Insulin Resistance; HOMA2-βcell: Homeostasis Model Assessment-2 for Pancreatic β-Cell Function; TAG: triglycerides, expressed in mg/dL.
The Two-Step Cluster Analysis for SPSS was conducted in two phases [22]: during the first step (called "precluster"), the subjects are divided into several small subclusters. Then, the obtained subclusters are grouped into a preferred number of clusters; if the desired number of clusters is unknown, the SPSS Two-Step Cluster Component will find the proper number of clusters automatically. Once the program analyzed the subclusters and the characteristics of each BMI category (as described previously), the subjects were categorized in 6 phenotypes: HNW, MONW, healthy and metabolically disturbed overweight, MDO, and MHO.
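The two-phase idea can be illustrated with a deliberately simplified stand-in. This is not the SPSS procedure: it assumes Euclidean distance, a fixed distance threshold for the precluster pass, centroid-based merging, and a fixed final number of clusters instead of automatic selection. All function names and data points are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def precluster(points, threshold):
    """Step 1: a point joins the nearest subcluster whose centroid lies within
    `threshold`; otherwise it seeds a new subcluster. Returns lists of indices."""
    subclusters = []
    for idx, p in enumerate(points):
        best, best_d = None, threshold
        for sc in subclusters:
            d = math.dist(p, centroid([points[i] for i in sc]))
            if d <= best_d:
                best, best_d = sc, d
        if best is None:
            subclusters.append([idx])
        else:
            best.append(idx)
    return subclusters


def merge_subclusters(points, subclusters, k):
    """Step 2: repeatedly merge the two subclusters with the closest centroids
    until only `k` clusters remain."""
    clusters = [list(sc) for sc in subclusters]
    while len(clusters) > k:
        cents = [centroid([points[i] for i in c]) for c in clusters]
        pairs = [(math.dist(cents[a], cents[b]), a, b)
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        _, a, b = min(pairs)
        clusters[a].extend(clusters.pop(b))
    return clusters


# Hypothetical standardized values of two metabolic predictors
data = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (3.0, 3.1),
        (3.2, 2.9), (2.9, 3.0), (1.1, 1.0)]
small = precluster(data, threshold=0.25)
final = merge_subclusters(data, small, k=2)
print(len(small), [sorted(c) for c in final])
```

With real data the variables would need to be standardized first, and the threshold and number of clusters would have to be chosen rather than fixed by hand.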

Cluster Quality Measures.
To evaluate the quality of the resulting clusters, the cohesion, separation, and silhouette coefficient were calculated [23][24][25]. The silhouette coefficient [26] combines the ideas of cohesion (the closeness of related objects within a cluster) and separation (the distance between objects in different clusters); it compares the average distance from an object to the other members of its own cluster with the average distance to the members of the nearest other cluster, and values between 0.5 and 1 indicate the best results [23,26]. Clusters with high cohesion are preferred because they indicate good quality clustering, demonstrated by high silhouette values and truly clustered variables [23][24][25][26].
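A minimal sketch of the silhouette coefficient follows, using the usual definition s = (b − a)/max(a, b), where a is the mean distance to the other members of the object's own cluster and b is the mean distance to the members of the nearest other cluster. Euclidean distance, the function name, and the data are assumptions for illustration; at least two clusters are required.

```python
import math


def silhouette_scores(points, labels):
    """Per-object silhouette values; assumes at least two distinct labels."""
    scores = []
    for i, p in enumerate(points):
        dists = {}  # label -> distances from p to every other object with that label
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        if not own:
            scores.append(0.0)  # singleton cluster: silhouette is taken as 0
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


# Hypothetical standardized values forming two compact groups
data = [(1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (3.0, 3.1), (3.2, 2.9), (2.9, 3.0)]
labels = ["healthy", "healthy", "healthy", "sick", "sick", "sick"]
scores = silhouette_scores(data, labels)
mean_silhouette = sum(scores) / len(scores)
print(mean_silhouette > 0.5)  # True
```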

Cross-Validation Technique.
Cluster validation aims to confirm the clustering solution and to estimate the accuracy of the resulting prediction model [27]. This method requires the division of the data into two groups: one to train (training dataset) and the other to validate (testing dataset) [27,28]. The process requires several rounds of partitioning and cross-validation, where all the analyses are performed on the training set and then validated in the testing set [27,28]; agreement was assessed by Cohen's kappa coefficient.
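Cohen's kappa compares the observed agreement between two label assignments with the agreement expected by chance, κ = (p_o − p_e)/(1 − p_e). The sketch below assumes the cluster labels from both assignments have already been matched to the same names; the function name and the labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from the marginal frequencies of each label
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical assignments of 10 validation subjects by two clustering routes
direct = ["healthy"] * 5 + ["sick"] * 5
from_training = ["healthy"] * 4 + ["sick"] * 6
kappa = cohens_kappa(direct, from_training)
print(round(kappa, 3))  # 0.8
```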

ROC Curves Construction.
The Receiver Operating Characteristic (ROC) [29] curves were used to analyze the predictive validity and to determine optimal cut-off values for WC following a series of exclusion steps (Figure 2). AUCs were compared with DeLong's test [30]. Several indexes were calculated to assess the optimal cut-off point on the curve, such as the Youden Index, the distance of the point closest to (0,1) on the ROC curve, and the Positive Likelihood Ratio [31]. Nevertheless, sensitivity was prioritized over specificity when selecting WC cut-off points.
Table 2. For anthropometric, biochemical, and blood pressure results, see Table 3.
Cluster quality was assessed with the silhouette coefficient, which was >0.5 for every cluster, meaning that all clusters were classified as good models. Next, cross-validation was performed, dividing the subjects into two groups: S1 and S2. The S1 group was used as the training set, where centroid-based clustering was calculated using the steps shown in Figure 1. The S2 group was used as the validating set, where clusters were obtained using two methods: (a) the normal clustering process (S2-clusters) and (b) clustering based on centroids and distances obtained from S1 (S2 clusters according to S1). All the resulting S2-derived clusters were compared using Cohen's kappa, resulting in κ = 0.902; p < 0.00001 (Table 4).

Curves Constructed with the Overall Population (All 6 Groups).
We sought to find an appropriate cut-off point for this population sample, applying the 6 phenotypes previously described in a stepwise manner. In Figure 3, ROC curves for men and women are shown. In Figure 3(a), the selected cut-off point for women was 91.25 cm, with an AUC of 0.768, sensitivity of 73.3%, and a specificity of 68.5% (Table 7). In the next panel, Figure 3(b), the chosen cut-off point for men was 98.15 cm, with an AUC of 0.786, 74.8% sensitivity, and 69.7% specificity.
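The cut-off selection indexes named in the ROC methods (Youden Index, distance to the (0,1) corner, and Positive Likelihood Ratio) and the AUC can be sketched as follows. This is a minimal illustration on hypothetical WC values, not the R workflow used in the study; it assumes a subject is called positive when WC is at or above the candidate cut-off.

```python
import math


def roc_points(values, is_sick):
    """Return (cutoff, sensitivity, specificity) for every observed value,
    calling a subject positive when value >= cutoff."""
    pos = sum(is_sick)
    neg = len(is_sick) - pos
    rows = []
    for c in sorted(set(values)):
        tp = sum(1 for v, s in zip(values, is_sick) if s and v >= c)
        tn = sum(1 for v, s in zip(values, is_sick) if not s and v < c)
        rows.append((c, tp / pos, tn / neg))
    return rows


def best_by_youden(rows):
    """Cut-off maximizing J = sensitivity + specificity - 1."""
    return max(rows, key=lambda r: r[1] + r[2] - 1)


def best_by_distance(rows):
    """Cut-off whose ROC point lies closest to the (0,1) corner."""
    return min(rows, key=lambda r: math.hypot(1 - r[1], 1 - r[2]))


def positive_lr(sens, spec):
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    return math.inf if spec == 1 else sens / (1 - spec)


def auc(values, is_sick):
    """Probability that a sick subject has a larger value than a healthy one
    (ties count one half)."""
    sick = [v for v, s in zip(values, is_sick) if s]
    healthy = [v for v, s in zip(values, is_sick) if not s]
    wins = sum((a > b) + 0.5 * (a == b) for a in sick for b in healthy)
    return wins / (len(sick) * len(healthy))


# Hypothetical WC values (cm) and metabolic status (True = sick)
wc = [78, 82, 85, 88, 90, 93, 92, 95, 99, 104]
sick = [False, False, False, False, True, False, True, True, True, True]

rows = roc_points(wc, sick)
cutoff, sens, spec = best_by_youden(rows)
print(cutoff, sens, spec)                   # 90 1.0 0.8
print(best_by_distance(rows)[0])            # 90
print(round(positive_lr(sens, spec), 2))    # 5.0
print(round(auc(wc, sick), 2))              # 0.92
```

When the indexes disagree, a rule such as favoring sensitivity, as stated in the methods, would be needed to pick among the candidates.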

ROC Curves Construction without MONW and MHO Groups (4 Groups).
The following ROC curves were built without the "anomalous signals" derived from the atypical phenotypes, MONW and MHO. In Figure 4(a), the women's ROC curve is depicted, with a selected cut-off point of 91.50 cm, showing an AUC of 0.890, 80.1% sensitivity, and 79.3% specificity. In the following panel, Figure 4(b), the selected cut-off point for men was 98.15 cm, with an AUC of 0.919, sensitivity of 83.8%, and a specificity of 82.3% (Table 7).

ROC Curves Construction without Overweight Groups (2 Groups).
The final ROC curves were built without the Overweight groups, leaving only the HNW and the MDO. In Figure 5(a), the women's ROC curve is shown, with a chosen cut-off point of 91.50 cm, characterized by an AUC of 0.982, sensitivity of 93.4%, and a specificity of 93.7%. In Figure 5(b), the men's cut-off point was 98.15 cm, with an AUC of 0.998, sensitivity of 96%, and a specificity of 99.5%; see Table 7. Figure 6 shows all the constructed ROC curves and their DeLong results. Finally, Table 8 shows the metabolic variables of the subjects categorized with the WC cut-off points obtained in this investigation, with significant differences between obese and nonobese subjects in every variable except HOMA2-βcell in the women's group.

Discussion
It is imperative to determine accurate WC cut-off values in order to diagnose abdominal obesity, given the progressive and fast rise in the worldwide prevalence of this disease. This growing epidemic has been a driving force for the development of improved diagnostic tests to aid physicians in their daily practice to diagnose abdominal obesity associated with metabolic disorders [31]. The search for ethnic-specific values for anthropometric measures requires the application of several techniques, ROC curves being one of the tools available to ascertain an appropriate cut-off point [32]. The ROC curve approach to determining suitable cut-off points for WC has been extensively used [9][10][11], especially in populations that are not properly classified in the latest MS criteria by the IDF/NHLBI/AHA-2009 due to lack of sufficient population-specific data for the WC variable [20]. As opposed to the more widespread methodology in these studies [9], the involvement of data mining techniques (cluster analysis) enhances the selection of healthy and sick subjects for the construction of ROC curves because it uses neither predetermined variables nor arbitrary cut-off points to decide; instead, it allows the program to group the individuals according to their biological characteristics and spontaneous tendencies [22]. This improvement in subject classification yields cut-off points with superior sensitivity and specificity, which is the ultimate goal in surveys such as ours.
Several studies have suggested that the WC cut-off proposed by the IDF/NHLBI/AHA-2009 consensus seemed to be invalid for certain ethnicities, particularly the Hispanic groups in Latin America [9]. Aschner et al. [10] published their WC cut-off points based on ROC curves using visceral fat area (≤100 cm²) as the independent variable, with resulting cut-off values of 94 cm for men (89.9% sensitivity and 80.2% specificity) and 90-92 cm for women (78.9%-72.9% sensitivity and 67.6%-74.5% specificity). However, Aschner's research relies on visceral fat area to find an optimal WC cut-off value for detecting subjects at risk of abdominal obesity. The cut-off point of 100 cm² was calculated for a Japanese population [33], using metabolic criteria cut-offs that are now considered outdated (e.g., fasting glucose >110 mg/dL). Moreover, Latin Americans are phenotypically and genetically different from Asians [34], which hinders the possibility of properly extrapolating results from their group onto ours. Despite these shortcomings, this cut-off has been used in several studies as a standard. Currently, Latin America also needs cut-off values concerning visceral fat, especially when ethnic minority groups are included, such as the Amerindians and Afro-Descendants.
The Two-Step Cluster Analysis approach enhances the sorting of subjects, allowing for better grouping and evaluation according to coalescent biochemical and anthropometric variables and eliminating the bias observed with predetermined variables and cut-off points. Following this reasoning, 6 phenotypes were constructed: Healthy Normal-Weight, MONW, Healthy and Metabolically Disturbed Overweight, MHO, and Metabolically Disturbed Obese. Each group has a distinct cardiometabolic profile, and these profiles have been widely described in the last decade [5][6][7][8][13]. Evidently, the MONW and MHO are exceptions to the traditionally described rules, where first glance examination of an obese or lean patient would automatically classify them as sick or healthy, respectively. Using the selected parameters according to BMI, a proper classification is possible, as demonstrated by the enhancement of sensitivity, specificity, and AUC for abdominal circumference in these groups.
ROC curve programs allow the determination of true positive and negative cases by providing cut-off points and their corresponding AUC, sensitivity, and specificity; nevertheless, this feature depends on an appropriate sorting of the sample, and its accuracy is confirmed by comparing the curves before and after selection. Eliminating noise during the filtering of information is of paramount importance, since noise behaves as phantom signals that derail the evaluation towards inaccurate values. Exclusion of the MONW and MHO categories prevents misleading data from being used to determine a cut-off point, rendering enough sensitivity and specificity to identify subjects at risk. It is imperative that physicians embrace the advantages offered by both techniques in order to determine valid cut-off points in ethnic-based studies concerning metabolic variables, which are biological and thus display a continuous behavior.
The other groups that were excluded were the Overweight individuals. The definition of overweight lies between normalcy and obesity, between 25.00 and 29.99 kg/m². This position gives the category a "transition" quality, based on the possibility of either reducing weight and achieving normal weight or gaining weight and reaching obesity levels [35,36]. Moreover, this also suggests that weight is a continuous biological factor and that the arbitrary classification of overweight is a transition phase in the natural history of obesity [36]. Therefore, since overweight subjects are considered "in transition," they were removed from the final WC ROC curve construction. It is necessary to emphasize three facts: WC alone cannot recognize MONW or MHO subjects, anthropometry is insufficient to determine healthiness, and information from other variables is needed for the filtering process. These facts open a new window of opportunity to investigate new strategies that can help identify peculiar phenotypes, such as the use of somatotypes constructed with local anthropometric and biochemical data, facilitating the identification of MONW and MHO subjects.
The chosen cut-off point for the women's group was 91.25 cm, very similar to that reported by Herrera et al. [9], suggesting that the females in the sample tend to have higher WC values than the ones set previously both by ATPIII [37] (<88 cm) and IDF/NHLBI/AHA-2009 (<80 cm). This finding is probably explained by differences regarding height, fat distribution, and genetic background [38]. It is noteworthy that, even after the groups were filtered by removing MONW, MHO, and overweight individuals, the WC cut-off point remained practically the same, but sensitivity and specificity improved significantly, showing that this approach offers a better way to scrutinize metabolically heterogeneous groups. Women appear to have higher WC cut-offs, perhaps due to displaying a central fat distribution despite having femoral-gluteal fat distribution tendencies. Regarding males, a similar trend was observed, this time with a selected cut-off of 98.15 cm, which lies between the cut-off points proposed in IDF/NHLBI/AHA-2009 (<90 cm) and ATPIII (<102 cm).
We have previously published the prevalence of obesity in the city of Maracaibo [1], reporting that the overall prevalence of abdominal obesity using IDF/NHLBI/AHA-2009 criteria [20] was 74.2%, while using the ATPIII criteria [37] rendered a prevalence of 51.7%. Using the cut-off points proposed in this research, the overall abdominal obesity prevalence in the complete sample of MMSPS (n = 2,230) is 35.6% (n = 794). Thus, the new WC cut-off point reduces the alarming 74.2% obtained with the harmonizing criteria and offers better information to design strategies for primary and secondary prevention.
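The reported prevalence can be checked directly from the counts given above (794 of 2,230 subjects). The sketch below also shows how sex-specific cut-offs might be applied; it assumes that abdominal obesity is flagged when WC is at or above 91.00 cm for women and 98.00 cm for men, which is our reading of the proposed "<91.00 cm" and "<98.00 cm" limits. The names and the four-subject sample are hypothetical.

```python
# Assumed reading of the proposed cut-offs: obese when WC >= threshold (cm)
WC_CUTOFF_CM = {"F": 91.0, "M": 98.0}


def has_abdominal_obesity(sex: str, wc_cm: float) -> bool:
    """Flag abdominal obesity using the sex-specific WC cut-off."""
    return wc_cm >= WC_CUTOFF_CM[sex]


def prevalence_percent(cases: int, total: int) -> float:
    """Prevalence expressed as a percentage of the sample."""
    return 100.0 * cases / total


# Hypothetical subjects: (sex, WC in cm)
sample = [("F", 89.0), ("F", 93.5), ("M", 97.0), ("M", 101.2)]
flags = [has_abdominal_obesity(sex, wc) for sex, wc in sample]
print(flags)  # [False, True, False, True]

# Counts reported for the complete MMSPS sample
print(round(prevalence_percent(794, 2230), 1))  # 35.6
```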
Lastly, we address two important limitations of this investigation. The first is the absence of imaging study confirmation such as visceral fat measurement; this branch of the MMSPS study is currently underway. Second, we used BMI as a method of diagnosis and categorization instead of other obesity diagnostic tools like DEXA for body composition [39] due to lack of resources for such an endeavor. It has been reported that BMI has limitations in regard to adiposity diagnosis, especially in intermediate ranges of BMI [40]. However, the BMI cut-off point of ≥30 kg/m² should not be easily dismissed, since it has been associated with high specificity and positive predictive value for diagnosing obesity in both sexes [40] and with a strong association with other entities such as arterial hypertension [41], diabetes mellitus [42], stroke [43], premature death [44], and several types of cancer [45].
In conclusion, we propose WC cut-off points of <91.00 cm for women and <98.00 cm for men, both providing excellent sensitivity and specificity for the diagnosis of abdominal obesity. The need for ethnic-specific WC cut-off points is paramount, especially since there is an association between WC and mortality prediction [46,47]. The application of statistical methods that allow the filtering and gathering of accurate information, like cluster analysis and ROC curve construction, will help ensure the production of accurate cut-off points that can be applied in large prospective trials.