Exploratory Factor Analysis for Validating Traditional Chinese Syndrome Patterns of Chronic Atrophic Gastritis

Background. Traditional Chinese medicine (TCM) has long been used to treat chronic atrophic gastritis (CAG). The aim of the present study was to evaluate the TCM syndrome characteristics of CAG and its core pathogenesis in order to help optimize treatment strategies. Methods. This study was based on a participant survey conducted in 4 hospitals in China. Patients diagnosed with CAG were recruited by simple random sampling. Exploratory factor analysis (EFA) was used to extract syndrome patterns. Results. The extracted common factors were assigned to six syndrome patterns: qi deficiency, qi stagnation, blood stasis, phlegm turbidity, heat, and yang deficiency. In the overall distribution of syndrome patterns, the frequencies of qi deficiency, qi stagnation, blood stasis, phlegm turbidity, and heat (76.7%–84.2%) were higher than that of yang deficiency (42.5%). In the distribution of primary syndrome patterns, the frequencies of qi deficiency, qi stagnation, phlegm turbidity, heat, and yang deficiency (15.8%–20.8%) were higher than that of blood stasis (8.3%). Conclusions. The core pathogenesis of CAG is a combination of qi deficiency, qi stagnation, blood stasis, phlegm turbidity, heat, and yang deficiency. Therefore, herbal prescriptions for CAG should include herbs that regulate qi, activate blood, resolve turbidity, clear heat, remove toxin, and warm yang.


Background
Chronic atrophic gastritis (CAG) is an inflammatory disease of the stomach with various etiologies [1][2][3]. Typical symptoms, when present, include epigastric pain, fullness, belching, anorexia, and other nonspecific symptoms [3]. CAG can lead to mucosal atrophy, intestinal metaplasia (IM), and gastric epithelial dysplasia (GED), also known as intraepithelial neoplasia, which is defined as the precancerous stage of gastric carcinoma [1,4,5]. Global cancer statistics for 2012 estimated 951,600 new cases of stomach cancer worldwide [6]. The progression from chronic gastritis to gastric cancer is a typical disease model of uncontrolled inflammation leading to malignant transformation [7][8][9][10]. Active treatment of CAG can arrest further malignant transformation and thus help prevent gastric cancer [11,12].
Traditional Chinese medicine (TCM) has a long history in treating gastritis. Numerous basic and clinical studies have demonstrated that Chinese medicine can effectively treat CAG, including resolving mucosal inflammation and reversing glandular atrophy, as well as inhibiting or even reversing intestinal metaplasia and gastric epithelial dysplasia [13][14][15][16][17][18][19][20][21][22][23][24]. Modern TCM treatment of CAG remains based on the time-honored core principle of syndrome pattern differentiation, which identifies and treats the root of illness. When differentiating a patient's syndrome pattern, the TCM practitioner systematically collects comprehensive information about the presenting signs and symptoms using the four diagnostic methods: looking, listening/smelling, asking, and palpating. The collected information is then evaluated according to TCM theory and clinical experience to identify the physical condition and the nature of pathologic changes at the current stage of the disease. Treatment is then applied in accordance with the conclusions drawn from the pattern differentiation process.
Factor analysis (FA) is a statistical method for reducing data, that is, removing redundancy among variables, and for detecting the structure (relationships) among the measured variables. FA has been applied in TCM to develop usable dimensional taxonomies, in which large numbers of observed variables are remodeled as linear combinations of a smaller number of underlying factors. There are two modes of FA. Exploratory factor analysis (EFA) is a data-driven tool that generates solutions for developing theories; its aim is to explore the relationships among variables without a specific hypothesis or an a priori fixed number of factors. Confirmatory factor analysis (CFA) is a method for theory testing that requires the researcher to have substantive knowledge and a firm idea about the number of factors that will be encountered during analysis. In the field of TCM, EFA is increasingly used for data mining of measured variables such as the clinical information obtained during syndrome pattern differentiation. Through EFA, the distributional features of syndromes can be ascertained.
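To make the idea of observed variables as linear combinations of a few underlying factors concrete, the following minimal Python/NumPy sketch simulates data from an orthogonal common factor model, x = Λf + ε, and checks that the sample covariance approaches the model-implied covariance ΛΛᵀ + Ψ. The loading matrix `LAMBDA`, the unique variances `PSI`, and the helper names are hypothetical illustrations, not values or code from this study.

```python
import numpy as np

# HYPOTHETICAL loading matrix: 6 observed indicators (rows), 2 latent factors (columns).
LAMBDA = np.array([
    [0.8, 0.0],
    [0.7, 0.1],
    [0.6, 0.0],
    [0.0, 0.8],
    [0.1, 0.7],
    [0.0, 0.6],
])
# Unique variances chosen so that every indicator has unit total variance.
PSI = 1.0 - np.sum(LAMBDA ** 2, axis=1)

def simulate(n, seed=0):
    """Draw n observations from x = LAMBDA f + e, f ~ N(0, I), e ~ N(0, diag(PSI))."""
    rng = np.random.default_rng(seed)
    f = rng.standard_normal((n, LAMBDA.shape[1]))                  # latent factors
    e = rng.standard_normal((n, LAMBDA.shape[0])) * np.sqrt(PSI)   # unique noise
    return f @ LAMBDA.T + e

def implied_cov():
    """Covariance implied by the orthogonal factor model: LAMBDA LAMBDA^T + diag(PSI)."""
    return LAMBDA @ LAMBDA.T + np.diag(PSI)

X = simulate(100_000)
sample_cov = np.cov(X, rowvar=False)
print(np.round(sample_cov - implied_cov(), 2))   # entries should all be close to 0
```

EFA works in the reverse direction: starting from the observed correlation matrix, it estimates a loading matrix that reproduces the shared part of the correlations.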
Although various studies have demonstrated that TCM can effectively treat CAG [13][14][15][16][17][18][19][20][21][22][23][24], there has been a lack of consensus across studies on syndrome differentiation, treatment strategy, and the composition of herbal prescriptions [25][26][27]. This is likely due to the complexity of CAG pathogenesis in TCM. Literature reviews have confirmed the lack of agreement on CAG syndrome features [28][29][30][31][32][33], with only a few studies focusing specifically on CAG. This situation is not conducive to standardizing syndrome differentiation or improving the treatment efficacy of CAG. The present study applied EFA to explore the syndrome pattern features and core TCM pathogenesis of CAG, in order to provide evidence for establishing TCM treatment principles and standardizing syndrome differentiation for chronic atrophic gastritis.

Inclusion and Exclusion Criteria.
Inclusion criteria were as follows: (1) meeting the diagnostic criteria of CAG, with detailed medical records and diagnostic reports; (2) age 20–75 years; (3) willingness to participate in the investigation and sign informed consent; and (4) ability to complete the clinical observation questionnaire and respond to investigator queries.
Exclusion criteria were as follows: (1) unclear diagnosis or not meeting the diagnostic criteria of CAG; (2) not meeting the age criterion; (3) cognitive difficulty such that the four diagnostic methods could not be completed accurately; and (4) other digestive diseases or neurologic, circulatory, respiratory, or endocrine diseases.

Case Report Form Content and Administration.
Content of the CAG case report form (CRF) was based on literature research, expert advice, and standard Chinese guidelines [34][35][36][37]. CRF content included general information (name, sex, and age), disease data (chief complaint, history of present illness, past medical history, family medical history, and endoscopy results), information from the four diagnostic methods (symptoms, physical signs, tongue appearance, and pulse reading), Western medicine diagnosis, and TCM diagnosis.
Disease history and results of the four diagnostic methods were collected and recorded. A record was deemed acceptable if all general information except address and contact details was filled in and the information on CAG and the four diagnostic methods was complete.
The study was carried out under strict quality control. All investigators specialized in TCM or integrated Chinese and Western medicine and were trained in the standard operating procedures of the study. Each participant was examined and followed up by at least two resident physicians or graduate students, who filled in the CRFs. At least two senior staff physicians supervised the interview sessions to ensure the consistency and authenticity of data collection and thereby reduce measurement bias.

Data Analysis.
Frequency analysis was performed on the data collected with the four diagnostic methods. Exploratory factor analysis (EFA) was then used to extract syndrome elements (Figure 1). All statistical analyses in this study were

Characteristics of Participants.
A total of 135 CRFs were distributed, and 131 were completed and returned (return rate, 97%). After questionnaires with incomplete information were excluded, 120 forms (that is, 120 patients) were deemed eligible for the study, giving an acceptance rate of 92%. Of these participants, 62 were male and 58 were female; their mean age was 52.56 years (standard deviation, 12.45).
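The reported rates can be checked with simple arithmetic. The short Python sketch below assumes, as the figures imply, that the return rate is returned/distributed and the acceptance rate is eligible/returned; the text does not state the denominators explicitly.

```python
# Counts reported in the text.
distributed, returned, eligible = 135, 131, 120
male, female = 62, 58

return_rate = returned / distributed   # 131/135, about 0.970
acceptance_rate = eligible / returned  # 120/131, about 0.916 (ASSUMED denominator)

print(f"return rate: {return_rate:.1%}, acceptance rate: {acceptance_rate:.1%}")
assert male + female == eligible       # the sex breakdown covers all eligible patients
```

Rounded to whole percentages, these give the 97% and 92% quoted above; using the 135 distributed forms as the denominator would instead give about 89%.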

Distribution Characteristics of Results from the Four Diagnostic Methods.
Information from the four diagnostic methods was collected, and the 53 entries with a frequency of ≥10% were tabulated by distribution frequency (Table 1). The data were then preanalyzed, and the 40 diagnostic variables with a frequency of occurrence above 21% were ultimately selected to judge the suitability of the data for EFA and to determine the number of common factors to be extracted in the formal analysis.
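The two-stage frequency screen (≥10% for tabulation, >21% for EFA) can be sketched as follows. The sketch assumes that the four-diagnostic data are coded as a binary patient-by-variable matrix (1 = present); the helper `frequency_filter`, the symptom names, and the 10-patient matrix are hypothetical.

```python
import numpy as np

def frequency_filter(X, names, threshold, strict=False):
    """Keep binary diagnostic variables whose occurrence frequency meets the threshold.

    X is a patients x variables 0/1 array; strict=True uses '>' instead of '>='.
    Returns the kept names and the frequency of every variable.
    """
    freq = X.mean(axis=0)                        # proportion of patients with each variable
    keep = freq > threshold if strict else freq >= threshold
    return [name for name, k in zip(names, keep) if k], freq

# HYPOTHETICAL 10-patient example (1 = variable present)
names = ["epigastric_fullness", "bitter_taste", "belching", "cold_limbs"]
X = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
])
tabulated, freq = frequency_filter(X, names, 0.10)             # stage 1: >= 10%
for_efa, _ = frequency_filter(X, names, 0.21, strict=True)     # stage 2: > 21%
print(freq, tabulated, for_efa)
```

Here the frequencies are 0.8, 0.2, 0.3, and 0.0, so three variables are tabulated but only two pass the stricter EFA screen.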

Suitability Test.
The Kaiser-Meyer-Olkin (KMO) measure and Bartlett's test of sphericity were used to evaluate the suitability of the collected diagnostic variables for EFA. The KMO measure compares the simple correlations between variables with their partial correlations; a KMO value > 0.5 indicates that the variables are suitable for EFA, and the closer the value is to 1, the more common variance the variables share. Bartlett's test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix, that is, that the variables are uncorrelated. Only when this hypothesis is rejected (P < 0.05), indicating that the variables are not independent, can they be used for EFA.
In this study, the KMO value was 0.549 > 0.5 (Table 2), indicating a certain degree of shared variance among the variables, such that EFA could be carried out. The approximate chi-square value of Bartlett's test of sphericity was 1589.24 (P < 0.001), so the hypothesis that the variables are independent was rejected; the variables were significantly correlated and could be subjected to EFA.
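Both suitability checks can be computed directly from a data matrix. The NumPy sketch below uses the standard formulas: KMO = Σr² / (Σr² + Σq²) over the off-diagonal simple correlations r and partial correlations q, and Bartlett's χ² = −[n − 1 − (2p + 5)/6] · ln|R| with p(p − 1)/2 degrees of freedom. It runs on simulated data, not the study data, and the function names are illustrative; the P value would come from the chi-square survival function (for example, `scipy.stats.chi2.sf`), which is omitted to keep the sketch NumPy-only.

```python
import numpy as np

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy for an n x p data matrix."""
    R = np.corrcoef(X, rowvar=False)
    inv_R = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(scale, scale)   # anti-image (partial) correlations
    off = ~np.eye(R.shape[0], dtype=bool)       # mask selecting off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    q2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + q2)

def bartlett_sphericity(X):
    """Bartlett's chi-square statistic and degrees of freedom for H0: R = I."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)             # log|R|, numerically stable
    stat = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return stat, df

# SIMULATED demo: 5 indicators of one latent factor vs. 5 independent variables
rng = np.random.default_rng(1)
f = rng.standard_normal((1000, 1))
X_corr = 0.7 * f + np.sqrt(1 - 0.49) * rng.standard_normal((1000, 5))
X_indep = rng.standard_normal((1000, 5))

kmo_corr = kmo(X_corr)
chi2_corr, df_corr = bartlett_sphericity(X_corr)
chi2_indep, df_indep = bartlett_sphericity(X_indep)
print(round(kmo_corr, 3), round(chi2_corr, 1), df_corr, round(chi2_indep, 1))
```

For the correlated data the statistic is far above its 10 degrees of freedom, whereas for independent data it stays near 10. With the study's 40 variables, the test would have 40 × 39/2 = 780 degrees of freedom.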

Assessment of Common Factors.
Principal component analysis was applied to extract common factors. The characteristic root (eigenvalue) of a common factor indexes its influence, that is, how much of the information in the original variables that factor explains. The variance contribution rate of a factor is the proportion of the total variance that it explains; the larger the value, the more comprehensive the information the factor carries. The cumulative variance contribution rate is the sum of the variance contribution rates of the first N common factors and represents the proportion of the information in all variables that is covered by those N factors. Eigenvalues and variance contribution rates of the common factors are shown in Table 3. The eigenvalues of the first 15 common factors were greater than 1, and their cumulative variance contribution rate reached 70.795%, indicating good overall interpretability of the data.
The scree plot (Figure 2) displays the eigenvalue of each common factor, with the factor number on the x-axis and the eigenvalue on the y-axis. The points for the first 15 common factors fell steeply, whereas the eigenvalues of the last 25 common factors were small, as shown by the leveling off of the curve. Thus, 15 common factors were extracted in the formal calculation.
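Eigenvalue extraction, variance contribution rates, and the eigenvalue > 1 rule can be sketched as follows. The sketch assumes principal-component extraction from the correlation matrix, as described above, and uses simulated two-factor data, so the numbers are illustrative only. Plotting the returned eigenvalues against their index gives the scree plot.

```python
import numpy as np

def eigen_summary(X):
    """Eigenvalues, variance contribution rates (%), cumulative rates (%), and the
    number of factors retained by the eigenvalue > 1 (Kaiser) rule."""
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]        # eigvalsh is ascending; reverse it
    contrib = 100 * eigvals / eigvals.sum()      # eigenvalues sum to p for a correlation matrix
    cumulative = np.cumsum(contrib)
    n_keep = int(np.sum(eigvals > 1.0))
    return eigvals, contrib, cumulative, n_keep

# SIMULATED data: 6 variables driven by 2 factors, 3 indicators per factor
rng = np.random.default_rng(2)
F = rng.standard_normal((2000, 2))
L = np.kron(np.eye(2), np.full((3, 1), 0.7))
X = F @ L.T + np.sqrt(1 - 0.49) * rng.standard_normal((2000, 6))

eigvals, contrib, cumulative, n_keep = eigen_summary(X)
print(np.round(eigvals, 2), np.round(cumulative, 1), n_keep)
```

In this toy example two eigenvalues lie near 1.98 and the rest near 0.51, so the rule retains two factors; in the study, the same rule retained 15 of 40.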

Factor Rotation and Transformation.
Factor rotation was performed so that the absolute loading of each diagnostic variable on each new common factor polarized toward 0 or 1. This displays more clearly which variables load on each common factor while keeping each variable's communality (the sum of its squared loadings in the corresponding row of the loading matrix) unchanged, and it allows a more reasonable interpretation of the extracted common factors. Principal component analysis was used to extract 15 common factors, and varimax rotation was applied. Rotation converged after 20 iterations, and the rotated factor loading matrix is shown in Table 4. In Table 4, each factor loading is the coefficient that reflects the closeness between a common factor and a variable; for an orthogonal rotation such as varimax, it is in essence the correlation coefficient between the two. A positive loading indicates a positive correlation between the diagnostic variable and the factor, and a negative loading indicates a negative correlation; the larger the absolute loading, the stronger the correlation. Diagnostic variables with positive loadings of 0.25 or greater were assigned to the corresponding factors.
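Varimax rotation followed by the ≥0.25 positive-loading rule might look like the sketch below. It uses Kaiser's classic pairwise algorithm, which rotates each pair of factors by the closed-form angle φ satisfying tan 4φ = (D − 2AB/p) / (C − (A² − B²)/p), with optional Kaiser row normalization. The demo rotates a deliberately mixed, hypothetical loading matrix; this is not the software used in the study, and statistical packages may differ in iteration scheme, convergence criteria, and sign conventions.

```python
import numpy as np

def varimax(A, normalize=True, max_sweeps=50, tol=1e-10):
    """Orthogonal varimax rotation of a p x k loading matrix (Kaiser's pairwise method)."""
    L = np.array(A, dtype=float)
    p, k = L.shape
    h = np.sqrt(np.sum(L ** 2, axis=1))           # square roots of communalities
    if normalize:
        L = L / h[:, None]                        # Kaiser row normalization
    for _ in range(max_sweeps):
        largest = 0.0
        for j in range(k - 1):
            for m in range(j + 1, k):
                x, y = L[:, j], L[:, m]
                u, v = x ** 2 - y ** 2, 2 * x * y
                a, b = u.sum(), v.sum()
                c, d = np.sum(u ** 2 - v ** 2), 2 * np.sum(u * v)
                phi = 0.25 * np.arctan2(d - 2 * a * b / p, c - (a ** 2 - b ** 2) / p)
                cs, sn = np.cos(phi), np.sin(phi)
                L[:, j], L[:, m] = x * cs + y * sn, -x * sn + y * cs   # rotate the pair
                largest = max(largest, abs(phi))
        if largest < tol:                         # stop when no pair needs rotating
            break
    if normalize:
        L = L * h[:, None]                        # undo the normalization
    signs = np.sign(L.sum(axis=0))                # make each column's loadings mostly positive
    signs[signs == 0] = 1
    return L * signs

def assign_variables(loadings, names, threshold=0.25):
    """Map each factor index to the variables whose positive loading is >= threshold."""
    return {j: [names[i] for i in range(loadings.shape[0]) if loadings[i, j] >= threshold]
            for j in range(loadings.shape[1])}

# HYPOTHETICAL demo: a simple structure deliberately mixed by a 30-degree rotation
simple = np.kron(np.eye(2), np.full((3, 1), 0.7))
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
unrotated = simple @ mix
rotated = varimax(unrotated)
groups = assign_variables(rotated, ["v1", "v2", "v3", "v4", "v5", "v6"])
print(np.round(rotated, 3), groups)
```

The rotation recovers the simple structure while leaving each row's communality (0.49) unchanged, which is the property described above.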

Common Factor Extraction.
Through consultation with TCM gastroenterology experts throughout China, the diagnostic variables from the four diagnostic methods, together with the corresponding nature and location of disease, were assigned to the 15 common factors (F) (Table 5).

Extraction of Syndrome Patterns.
Next, the 15 common factors were combined according to syndrome pattern and organ location. Ultimately, six syndrome patterns were established (Table 6).

Distribution of Syndrome Patterns.
For each participant, common factor scores were calculated from the factor score coefficients, and the scores of the six syndrome patterns were derived from them to estimate which syndrome patterns the participant presented. The distribution of all syndrome patterns is shown in Figure 3.
For each participant, the syndrome pattern with the highest score after rotation was taken as the primary syndrome pattern. The distribution of primary syndrome patterns is shown in Figure 4.
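Factor scores and the primary-pattern rule can be sketched as follows. The sketch assumes regression (Thomson) factor scores, computed from the score coefficient matrix W = R⁻¹Λ. Because the text does not specify how factor scores were aggregated into pattern scores, `primary_pattern` hypothetically sums the scores of the factors assigned to each pattern before taking the highest-scoring pattern as primary. For simplicity the demo uses the generating loadings of simulated data rather than estimated ones.

```python
import numpy as np

def factor_scores(X, loadings):
    """Regression (Thomson) factor scores: standardized data times R^-1 times loadings."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    W = np.linalg.solve(R, loadings)    # factor score coefficient matrix, p x k
    return Z @ W

def primary_pattern(scores, factor_to_pattern, patterns):
    """HYPOTHETICAL aggregation: a pattern's score is the sum of its factors' scores;
    the primary pattern is the highest-scoring one for each participant."""
    pattern_scores = np.zeros((scores.shape[0], len(patterns)))
    for j, pattern in enumerate(factor_to_pattern):
        pattern_scores[:, patterns.index(pattern)] += scores[:, j]
    return [patterns[i] for i in np.argmax(pattern_scores, axis=1)]

# SIMULATED data: 8 variables, 4 indicators per factor
rng = np.random.default_rng(3)
n = 2000
F = rng.standard_normal((n, 2))
true_L = np.kron(np.eye(2), np.full((4, 1), 0.7))
X = F @ true_L.T + np.sqrt(1 - 0.49) * rng.standard_normal((n, 8))
scores = factor_scores(X, true_L)

# HYPOTHETICAL factor-to-pattern mapping for three factors and two patterns
example = primary_pattern(np.array([[1.0, -0.5, 0.2], [-1.0, 0.3, 0.4]]),
                          ["qi deficiency", "heat", "qi deficiency"],
                          ["qi deficiency", "heat"])
print(np.corrcoef(scores[:, 0], F[:, 0])[0, 1], example)
```

In the toy mapping, the first participant's qi deficiency score is 1.0 + 0.2 = 1.2 and the second participant's heat score (0.3) exceeds their qi deficiency score (−0.6), so the primary patterns are qi deficiency and heat, respectively.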

Discussion
EFA is a multivariate statistical method that explains the correlations among the original variables in terms of a small number of latent variables. The fundamental concept is to project high-dimensional information onto a lower-dimensional space and to explore internal structure and essential characteristics through dimensionality reduction. In this study, common factors were extracted by principal component analysis. Factor rotation then maximizes the variance of the squared loadings while keeping each variable's communality constant. Finally, the complex information in the original data set is summarized by a limited number of unmeasurable latent variables (common factors) that describe most of the information in the variables. On this basis, the relationship by which the original measured variables are governed by a small number of independent factors is explored, and the nature of the original variables is clarified as a linear combination of common factors [38][39][40][41][42].
The holistic theory on which TCM is based implies complex multivariate, nonlinear, and multicollinear relationships among the variables of the four diagnostic methods. In this study, which used EFA to assess the TCM syndrome characteristics of CAG, the variables from the four diagnostic methods were the original variables and were high-dimensional. Syndrome patterns and other syndrome information were extracted from the common factors (specific combinations of diagnostic variables), which capture the dominant correlations among the diagnostic variables, thereby achieving dimensionality reduction and eliminating multicollinearity. After frequency analysis, only diagnostic variables detected with high frequency were retained, to rule out noise. EFA was then applied to extract syndrome patterns, organ location, and other common factors related to the four diagnostic methods' information. The information carried by each common factor was appraised on the basis of professional knowledge to extract the syndrome patterns, and the distributions of all syndrome patterns and of the primary syndrome patterns were determined. In total, the 15 extracted common factors comprised 6 syndrome patterns (qi deficiency, qi stagnation, blood stasis, phlegm turbidity, heat, and yang deficiency) as well as disease locations in the liver, spleen, and stomach.
In TCM, the syndrome pattern is the basic pathogenetic unit for evaluating a disease [43,44]. Extraction of the aforementioned 6 syndrome patterns suggests that they form the core pathogenesis of CAG and should thus be the fundamental diagnostic elements taken into account during differentiation of CAG. It follows that, when formulating a prescription to treat CAG, the 6 strategies that should be considered depending on the presenting syndrome pattern are regulating qi (tonification of qi and moving of qi), activating blood, resolving turbidity, clearing heat, removing toxin, and warming yang. Furthermore, target organs of treatment should be the liver, spleen, and stomach.
In this study, the overall distribution of syndrome patterns showed that blood stasis occurred at a frequency similar to that of qi deficiency, qi stagnation, phlegm turbidity, and heat, whereas yang deficiency occurred significantly less often than these patterns. However, as a primary syndrome pattern, blood stasis was detected significantly less often than qi deficiency, qi stagnation, phlegm turbidity, heat, and yang deficiency. From these two seemingly contradictory results, it can be inferred that blood stasis, like qi deficiency, qi stagnation, phlegm turbidity, and heat, is widely distributed as a fundamental syndrome pattern and is frequently detected, even in patients whose blood stasis is mild. For this reason, blood stasis had the lowest frequency of all syndrome patterns as the primary syndrome pattern. This suggests that prescriptions for CAG should combine tonifying and moving qi, activating blood and resolving stasis, clearing phlegm to resolve turbidity, and clearing heat and removing toxin, but that herbs that activate blood to resolve stasis should be prescribed in lesser amounts.
Interestingly, analysis of all syndrome patterns showed that frequency of yang deficiency was significantly lower than that of qi deficiency, qi stagnation, phlegm turbidity, and heat. EFA of primary syndrome patterns indicated that distribution frequency of yang deficiency was similar to that of qi deficiency, qi stagnation, phlegm turbidity, and heat, and was significantly higher than that of blood stasis. Thus, unlike qi deficiency, qi stagnation, phlegm turbidity, and heat, yang deficiency was not widely detected in CAG patients as a fundamental syndrome pattern. Although overall distribution was narrow, once yang deficiency took hold in the body, its presence was intense and detectable, thus occupying a considerable proportion of the primary syndrome distribution. Therefore, in formulating a prescription, in the early stage of yang deficiency type CAG, herbs that tonify yang should receive less prominence than herbs that focus on tonifying and regulating qi, activating blood, resolving phlegm turbidity, clearing heat, and removing toxin. If yang deficiency persists, yang-tonifying herbs should be used to some extent.
Previous investigations have been inconsistent regarding the TCM syndrome patterns and disease sites of CAG [28][29][30][31][32]. For example, the conclusion of Gan and colleagues [28] regarding disease site (stomach, spleen, and liver) concurred with our findings, but their primary syndrome patterns of CAG (combined deficiency and excess, and combined cold and heat) differed from the six patterns identified here. Literature reviews have also confirmed the lack of agreement among TCM authors regarding the disease site and primary syndrome patterns of CAG [30,31], reflecting the complexity of both the nature and progression of CAG and the differentiation of its syndrome patterns. Our study therefore applied EFA in an attempt to describe the network of relationships among the four diagnostic variables and thereby identify the main syndrome patterns and disease sites of CAG.
This study has some limitations. Selection bias may exist because all data were derived from participants in hospitals in only two cities and the sample was relatively small; the data are therefore unlikely to represent the four diagnostic variables of CAG patients in the rest of China. Future large-sample multicenter studies are required to verify the conclusions of the present study. In terms of data mining methodology, each variable or common factor in EFA can be included in only one category, so multiple correspondences between variables and categories cannot be established. To some extent, this may prevent an accurate description of the internal properties or external relevance of TCM variables from a multidimensional and multilinear perspective. Moreover, because multicollinearity is eliminated by reducing dimensionality, some information may be lost through this so-called dimension effect. It should be noted that, although Bartlett's test of sphericity (approximate chi-square 1589.243, P < 0.001) and the KMO value (0.549 > 0.5) indicated that EFA was acceptable, the KMO value was only modestly above the threshold; the results therefore need to be validated with supervised data mining methods.

Conclusions
EFA is a valuable methodology for developing usable dimensional taxonomies in TCM, in which observed syndrome-related variables are remodeled as linear combinations of underlying factors. Application of EFA can provide evidence for treatment principles and standardization of syndrome differentiation of chronic atrophic gastritis. The results of this study indicate that the core pathogenesis of CAG is a combination of qi deficiency, qi stagnation, blood stasis, phlegm turbidity, heat, and yang deficiency. TCM treatment of CAG should therefore focus on regulating qi, activating blood, resolving turbidity, clearing heat and removing toxin, and warming yang.