Phenotyping Chronic Pelvic Pain Based on Latent Class Modeling of Physical Examination

Introduction. Defining clinical phenotypes based on physical examination is required for clarifying heterogeneous disorders such as chronic pelvic pain (CPP). The objective of this study was to determine the number of classes within 4 examinable regions and then establish threshold and optimal exam criteria for the classes discovered. Methods. A total of 476 patients meeting the criteria for CPP were examined using pain pressure threshold (PPT) algometry and standardized numeric rating scale (NRS) pain ratings at 30 distinct sites over 4 pelvic regions. Exploratory factor analysis, latent profile analysis, and receiver operating characteristic (ROC) curves were then used to identify classes, optimal examination points, and threshold scores. Results. Latent profile analysis produced two classes for each region: high and low pain groups. The optimal examination sites (and high pain minimum thresholds) were as follows: abdominal wall region, the pair at the midabdomen (PPT threshold depression of > 2); vulvar vestibule region, the 10:00 position (NRS > 2); pelvic floor region, the puborectalis (combined NRS > 6); vaginal apex region, the uterosacral ligaments (combined NRS > 8). Conclusion. Physical examination scores of patients with CPP are best categorized into two classes: high pain and low pain. Standardization of the physical examination in CPP provides both researchers and general gynecologists with a validated technique.


Introduction
Establishing phenotypes for clinical conditions is a fundamental step in the development of diagnostic criteria, which are required for coherent research and evidence based clinical care [1]. From the categorization of fetal heart rate patterns to the description of pelvic organ prolapse, a validated nomenclature allows an apples-to-apples comparison of research studies and also lets clinicians translate research findings into practice by clearly describing a clinical condition in terms of objective findings.
Chronic pelvic pain is an area of gynecology sorely in need of evidence based phenotypes [2]. The current phenotyping approaches are primarily symptom based and limited to urologic pain [3]. The challenges in this field are many and varied [4]. Since pain is subjective, an easily replicated standardized examination becomes even more important. How to perform the exam, which points to examine, and where to set thresholds between incidental pain and significant pain are all problems faced by clinicians on a daily basis [5]. Patients also are frustrated by a lack of uniformity in describing their condition and are hindered by incomplete evaluations [6]. Phenotyping patients into similar groups can be used clinically to assess prognosis, evaluate potential treatments, or suggest further diagnostic evaluation.
Addressing these concerns requires a substantial sample of patients, an extensive standardized physical examination, fundamental inclusion criteria, and some technique for data reduction. One such method is known as latent class analysis. In this approach, a large data set is mathematically examined to find related classes of patients hidden (latent) within the data structure. The analysis is stepwise, beginning with an assurance that the columns (in this case physical examination locations) are all appropriately measuring a single construct (i.e., they are unidimensional). Once this is done, the values of the examination sites are evaluated using multiple measures to determine how many latent classes exist in the data set. The number of latent classes present (e.g., 2 classes: high and low; 3 classes such as high, low, and intermediate; or 4 classes: minimal, mild, moderate, and severe) is determined based on several mathematical criteria. Once the number of classes is known, thresholds for class allocation can be created, and then the original larger number of evaluation sites can be reduced to only the most pertinent locations. These can then be used to establish research or clinical phenotypes.
The objective of this study was to apply latent class modeling to the physical examination of patients with chronic pelvic pain, with the ultimate goal of defining clinical phenotypes in these patients.

Methods
A total of 476 female patients were evaluated following referral to the Pelvic Pain Specialty Center at Summa Health System according to Institutional Review Board-approved protocol 07048. All patients met the American College of Obstetricians and Gynecologists definition for CPP and were evaluated in a standardized manner similar to that suggested by the International Pelvic Pain Society (http://www.pelvicpain.org/resources/handpform.aspx). Patients underwent a structured history including a clinical interview by a board certified psychiatrist and a complete physical examination by a board certified gynecologist, including semiquantitative pelvic pain testing. This was done across multiple sites of the pelvis, including the pelvic abdominal wall, the vulvar vestibule, the pelvic floor, and the vaginal vault.
Patients were placed in the dorsal lithotomy position and pain on the abdominal wall was evaluated with pain pressure threshold (PPT) algometry according to previously described protocols [7]. A pressure algometer (Wagner Instruments, Greenwich, CT, model FPK 10) with a 1 cm² tip was applied perpendicular to the abdominal wall, at a rate of approximately 1 kgf/s, at 14 sites of the lower anterior abdominal wall. Pressure was steadily applied until patients reported a change from pressure to pain. Evaluations were done based on PPT suppression, calculated as the patient's threshold value subtracted from the maximum value of 3 kgf applied.
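As a worked illustration of the PPT suppression score described above, the following minimal sketch computes the maximum applied force minus the patient's threshold. The function name and the handling of patients who report no pain at the maximum force are assumptions made for illustration and are not taken from the study protocol.

```python
MAX_FORCE_KGF = 3.0  # maximum force applied in the protocol described above


def ppt_suppression(threshold_kgf: float, max_force_kgf: float = MAX_FORCE_KGF) -> float:
    """Return PPT suppression: maximum applied force minus the patient's threshold.

    Assumption (not stated in the source): a patient who reports no pain at the
    maximum force is recorded at the maximum, giving a suppression of 0.
    """
    if threshold_kgf < 0:
        raise ValueError("threshold must be non-negative")
    capped = min(threshold_kgf, max_force_kgf)
    return max_force_kgf - capped


# Pain first reported at 0.5 kgf corresponds to the maximum score of 2.5.
print(ppt_suppression(0.5))
```

A higher suppression score therefore indicates a more tender site, which is the direction used throughout the class descriptions.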
Pelvic floor pain testing was done using a lubricated, gloved single finger administered by a trained examiner applying 1 kg/cm² of force to the central point of each area [8]. Before testing, the examiner reviewed the pressure needed to apply 1 kg/cm² by training on the pressure algometer. The absence of pain was rated as zero; otherwise pain was rated from one to ten, with ten being coached as "the worst pain imaginable." The pelvic floor muscles were palpated in order, starting with the puborectalis, palpated in the middle of its body at the 4 and 10 o'clock positions from the introitus. The pubococcygeus-iliococcygeus complex was palpated approximately 2 cm dorsal to this position, in the midbody of the muscle. The obturator internus was provoked by instructing the patient to adduct her flexed leg against resistance while the muscle body was palpated.
The vulvar skin was tested using a cotton tipped applicator with just enough force to indent the mucosa. This was done in 6 locations in order starting at the 12:00 position and progressing clockwise in the vestibule at the 2, 4, 6, 8, and 10:00 positions. For the vaginal vault, gentle posterior-lateral traction was applied over the uterosacral ligaments bilaterally, and then anterior-lateral traction was applied to the adnexa. Pain from these sites was also recorded on the 0-10 numeric rating scale.
Statistical analysis was conducted in three steps. First, an exploratory factor analysis (EFA) was conducted using the 30 examination sites noted above to test the unidimensionality of each pelvic region, where it was expected that sites from each region would load onto one of four factors representing each region. Geomin rotation, an oblique rotation method that permits factors to correlate, was used as pain within one region is expected to be related to pain in another region [9]. The number of factors extracted was determined from a scree plot of factor eigenvalues to identify the "breakpoint" where the curve flattens out [10,11]. Evidence of unidimensionality within each region was established by statistically significant item factor loadings with standardized values greater than 0.35 and without substantial cross loadings on other regions. Next, a latent profile analysis (LPA), a type of latent class analysis in which the class indicators are continuous variables like the examination site pain measures used in this study, was used to classify patients into groups with similar patterns of pain within each pelvic region, using the sites that had loaded onto that region in the EFA [12]. The number of classes for each region was assessed using multiple statistical fit criteria, but primarily determined by the Bayesian Information Criterion (BIC) [11]. Entropy was used as an indicator of how well subjects can be differentiated between classes [13]. Both EFA and LCA were conducted with Mplus version 7 [14]. Last, receiver operating characteristic (ROC) curves were calculated to establish the area under the curve (AUC) for each examination site, which was used to identify the best threshold value for each site and compare the predictive performance of the sites within each region.
Bootstrapping with 10,000 samples was used to calculate the confidence intervals of sensitivity and specificity at each threshold value for each site, and to statistically compare ROC curves for each site using the "pROC" package from the R statistical program.
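The ROC step described above can be sketched in a few lines. This is an illustrative sketch only: the study used the pROC package in R, and the source does not state which rule defined the "best" threshold, so the use of Youden's J here is an assumption. The scores and class labels are toy values invented for the example and are not study data; bootstrapped confidence intervals are omitted.

```python
def roc_points(scores, labels):
    """Return (cutoff, sensitivity, specificity) for each observed cutoff.

    A patient is called "high pain" when score >= cutoff; labels use 1 for the
    high pain class and 0 for the low pain class.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        points.append((cutoff, tp / pos, tn / neg))
    return points


def auc(scores, labels):
    """AUC as the probability that a high pain patient outscores a low pain one.

    Ties between a high pain and a low pain score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_cutoff(scores, labels):
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1 (assumed rule)."""
    return max(roc_points(scores, labels), key=lambda t: t[1] + t[2] - 1)[0]


# Toy data (invented): summed NRS scores and latent class membership.
toy_scores = [0, 1, 2, 4, 3, 5, 6, 8, 7]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(auc(toy_scores, toy_labels), best_cutoff(toy_scores, toy_labels))
```

In practice each site (or site pair) would be scored this way against the class assigned by the latent profile analysis, and the sites within a region compared by AUC.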

Results
Patients tolerated the evaluation well, with no more than 5% missing data points. Within a given region the combination of missing data from multiple exam sites slightly reduced the number of subjects for each LCA. Demographic characteristics of this population with CPP are shown in Table 1. Exploratory factor analysis of the tested sites is shown in Table 2(a). The EFA with four factors showed the appropriate factor loadings of the examination sites on the four hypothesized areas of abdominal wall, vulva, pelvic floor, and vaginal vault as demonstrated in Table 2(b). Table 3 demonstrates the results for multiple methods (log-likelihood, BIC, entropy, and smallest class) for determining the number of latent classes present. Based on the results of Table 2, these are divided into the tests of the abdominal wall sites (Table 3(a)), the vulvar sites (Table 3(b)), the pelvic floor sites (Table 3(c)), and the internal vaginal vault sites (Table 3(d)). For each region, a two-class solution provides an optimized classification scheme based on all the different parameters. The two latent classes for each region determined in Table 3 are depicted graphically in Figures 1-4. Classification of pain pressure threshold suppression of the abdominal wall sites is depicted in Figure 1. Class 1 represents patients with low levels of threshold suppression (i.e., these patients have very low pain in the abdominal wall) and Class 2 includes only patients with a high degree of pain pressure threshold reduction (i.e., they are very tender to touch). On this scale a score of 0 means that the patient can tolerate 3 kg/cm² force applied to the abdomen without reporting any pain, while the maximum score (2.5) means that the patient reports a sensation of pain (rather than pressure) with only 0.5 kg/cm² force applied at that point (the equivalent of light touch).
Thus patients with low thresholds have greater pain sensitivity (Class 2) and patients with high PPTs do not report a feeling of pain until a substantial pressure is applied (Class 1).
Classification of the numeric ratings for pain with light touch to the vulvar vestibule is shown in Figure 2. Class 1 represents patients with little or no pain, and Class 2 represents patients reporting high levels of pain to light touch. Figure 3 demonstrates the classification of patients based on pain scores in the pelvic floor muscles. Class 1 patients report low levels of pain on palpation, and Class 2 patients report high levels of pain. Classification of patients according to palpation of the vaginal vault is shown in Figure 4. Class 1 patients report low levels of pain on palpation, and Class 2 patients report high levels of pain.
In Table 4 the examination sites (paired) are compared to the 2 class solution using receiver operating characteristic curve analysis to determine the site most predictive of class membership. Table 4(a) demonstrates results for the abdominal wall sites, with the left and right middle abdomen having the greatest area under the curve (AUC), and a summed pain pressure threshold suppression of 2 or more required for inclusion in the high pain class (Class 2). In this analysis, any combination of values at these two sites that sums to 2 or more (e.g., 0.5 on the right and 1.5 on the left) would result in that patient being included in the high pain class. Testing sites on the lateral abdominal wall (between the iliac crest and lower costal margin) and the inguinal ligaments are significantly worse than the best pair (left and right middle abdomen) for assignment of patients into a high or low pain class. Table 4(b) demonstrates the results of ROC testing based on the two class solution for the vulvar pain sites. A report of pain of 2 or more at the 10:00 position in the vestibule is enough to include the patient in the high pain class. Examination at the 12:00, 2:00, and 4:00 positions is significantly less accurate at assigning patients into the two class solution. Table 4(c) demonstrates the results of an evaluation of the pelvic floor muscle sites. A summed report of pain of 6 or more at the left and right puborectalis classifies a patient into the high pain class. These examination sites are all statistically equivalent to each other, but the summed report of pain required for classification as high pain is higher at the obturator internus: 10 or more (out of a maximum of 20 for any sum). Classification of patients based on testing the uterosacral ligaments or adnexal tenderness is demonstrated in Table 4(d).

Conclusion
This approach to phenotyping represents a combination of quantitative and semiquantitative methods suitable for both research and clinical application. The mathematical approach taken here is a stepwise method, assuring that the examination sites in the different regions tested actually measure one construct (unidimensionality: Table 2), determining how many latent classes are present in the data (based on multiple statistical fit criteria: Table 3), applying the classification scheme to the physical exam data (Figures 1-4), and then determining the optimal examination sites and thresholds for class assignment (ROC analysis: Table 4).
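The class-enumeration step in this sequence relies on two standard quantities, the BIC and the relative entropy of the posterior class probabilities. The sketch below shows how each is computed from the output of a fitted mixture model. The log-likelihoods and parameter counts in the example are hypothetical placeholders, not values from Table 3, and the helper names are invented for illustration; the study itself obtained these statistics from Mplus.

```python
import math


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: -2*logL + k*ln(n); lower is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 mean well-separated classes.

    posteriors: one list of class-membership probabilities per patient,
    with at least two classes.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = -sum(p * math.log(p) for row in posteriors for p in row if p > 0)
    return 1.0 - uncertainty / (n * math.log(k))


# Hypothetical fits: classes -> (log-likelihood, number of free parameters).
hypothetical_fits = {1: (-5200.0, 28), 2: (-4800.0, 43), 3: (-4790.0, 58)}
n_patients = 476

bic_by_classes = {k: bic(ll, p, n_patients) for k, (ll, p) in hypothetical_fits.items()}
best_k = min(bic_by_classes, key=bic_by_classes.get)
print(best_k, round(bic_by_classes[best_k], 1))
```

With these placeholder inputs the two-class model has the lowest BIC because the small gain in log-likelihood from a third class does not offset the penalty for its additional parameters.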
Quantification of the pelvic pain examination was performed according to previously published protocols using little or no instrumentation. This approach has the significant advantage that it can then be widely applied to clinical practice or deployed to multiple research sites with a limited equipment cost. A wide range of alternative techniques are available, using thimble algometers or other custom crafted devices [15]. Further research may demonstrate these to be excellent research tools, but until they are widely available these techniques have limited applicability in creating clinical phenotypes for use by typical clinicians.
One benefit of this approach is to highlight the complexity of the pelvic pain evaluation. Chronic pelvic pain is a significant problem for a substantial proportion of our patients and demands more than just a cursory bimanual examination. Although many pain diagnoses may be present which are not easily determined on physical examination, at a minimum this study indicates that pain in the abdominal wall, vulvar vestibule, pelvic floor, and uterosacral ligaments should be evaluated separately.
Setting thresholds and standardizing the examination are vital milestones for phenotyping clinical conditions. Chronic pelvic pain is an excellent example of complex disease, with multiple purported risk factors [16], heterogeneity within the tissues [17], heterogeneity within an individual diagnosis [18], and the complexity of multiple diagnoses potentially present in a single individual patient [19]. Determining useful sites to examine, the number of classifications existing at each examination point, and the thresholds for assigning classification are all vital steps toward developing evidence based phenotypes for use in research and clinical practice.
The results reported here offer some unique insights into the structure of pain related diagnoses. Although many scales rate pain along a continuum (such as the PROMIS approach [20], or the VAS pain scale), all four of the regions evaluated here produced two latent classes: high pain and low pain. There was no a priori determination of how many classes might exist hidden in the data. Since all patients met ACOG criteria for CPP and were being seen in a referral center, it was felt possible that only one class would be found (everyone would have significant pain). Similarly any number of classes could be postulated: low, medium, and high pain (3 classes), or 5 classes in a Likert style scale. Based on these results, future studies of a general clinical population should report physical examination pain as none, low, or high pain.
This study has a number of methodological limitations which must be acknowledged and which will need to be addressed by further study. The examination was conducted by a single clinician with experience in evaluation of chronic pelvic pain. Although this technique has largely been replicated in other studies using multiple examiners at multiple institutions [21], these studies were also performed by clinicians with a special interest in CPP. The generalizability of these results to screening populations of patients without CPP or to other examiners will similarly require further study. Because this was a study of female chronic pelvic pain, it is unknown to what extent these conclusions can be transferred to men with chronic pain.
Other limitations include the analysis of a finite number of examination sites. A wide range of other sites to test exist in the pelvis, which include the adductors, pyriformis, and coccygeus among others. Inclusion of other examination sites has the potential to provide better correlation with the different classes; however, Table 4 demonstrates that there are limited (though sometimes statistically significant) differences between the sites tested here. Based on this finding, the addition of other sites in similar regions of the pelvis is unlikely to alter the fundamental findings of this study but may produce alternative examination sites with different thresholds.
The evaluation of pelvic pain in this study followed a prescribed protocol that was applied to all patients. This approach has the advantage of producing a complete data set, but the disadvantage of introducing bias based on the order of testing. This is most prominently displayed in evaluation of vulvar pain. In this study the order of examination began at the 12:00 position and then proceeded in a clockwise manner to end at the 10:00 position. Since pain can be worsened through sensitization due to previous stimulation (examination) of nearby sites, it is possible that the increased AUC at the 10:00 position is an artifact of the order of examination. Further research with a randomized vulvar testing scheme may reveal a different threshold or point for assignment to the high pain class.
This study represents an effort to produce a statistically sound approach to phenotyping CPP arising from the abdominal wall, vulva, pelvic floor, and vault; it is not designed to evaluate any of the other myriad disorders associated with CPP. In particular pain arising from the uterus, which may be due to a number of conditions including fibroids, endometriosis, or adenomyosis, is excluded. Evaluation of pain of this type demands histologic correlation and is beyond the scope of this study. Similarly pain from the bladder (which may be associated with interstitial cystitis or merely a bladder infection) and pain from the rectum (which may be due to irritable bowel syndrome, diverticular disease, or hemorrhoids) are also excluded intentionally from this evaluation. The rationale for this is explored in a separate latent class analysis of these pelvic floor locations [8].
This study is clearly a beginning rather than an end in itself. With a phenotyping methodology available, many future possibilities exist. In particular, a physical exam based phenotype can be applied to clinically defined disorders such as endometriosis and interstitial cystitis to determine the different subpopulations within these diagnoses. A more precise classification scheme in chronic pain states has the potential, with future research, to assist clinicians with an optimized selection of treatments and can provide patients with a clearer prognosis based on the results of standardized outcomes.

Clinical Implications.
Based on these results, four CPP phenotypes can be defined.
Pain in the anterior abdominal-pelvic wall can be separated into two classes, with high pain defined when PPT depression in the left and right midwall (measured halfway between the level of the umbilicus and the inguinal canal at the lateral border of the rectus muscle) equals a sum of 2 or more. Other abdominal wall locations can also be used, including the midline, upper, or lower lateral rectus borders, but with different thresholds. The lateral abdominal wall or inguinal ligaments are not as accurate in classifying patients based on abdominal-pelvic wall pain.
Pain on the vulva can be separated into two classes, with high pain defined as a reported pain of 2 or more with gentle mucosal indentation of the vestibule at the 10:00 position. The 6:00 and 8:00 positions can also be used, with different thresholds. The 12:00, 2:00, and 4:00 positions are not as accurate in classifying patients into high and low vulvar pain groups.
Pain in the pelvic floor can be separated into two classes, with high pain defined based on a report of pain on palpation of the left and right puborectalis with a sum of reported pain of 6 or more. The obturator internus and iliococcygeus can also be used to classify patients but with different thresholds.
Pain in the vaginal vault can be separated into two classes, with high pain defined based on a report of pain on palpation of the left and right uterosacral ligaments with a sum of reported pain of 8 or more. Pain in the adnexa is not as accurate in classifying patients into high and low pain groups.
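The four decision rules above can be written as a short classification sketch. The cutoffs follow the "or more" wording of this section; the data structure, field names, and function name are hypothetical and were chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PelvicExam:
    """Scores from the four optimal sites or site pairs (field names are hypothetical)."""
    mid_abdomen_left_ppt_suppression: float   # kgf, 0 to 2.5
    mid_abdomen_right_ppt_suppression: float  # kgf, 0 to 2.5
    vestibule_10_oclock_nrs: int              # 0 to 10
    puborectalis_left_nrs: int                # 0 to 10
    puborectalis_right_nrs: int               # 0 to 10
    uterosacral_left_nrs: int                 # 0 to 10
    uterosacral_right_nrs: int                # 0 to 10


def classify(exam: PelvicExam) -> dict:
    """Assign each region to the high or low pain class using the stated cutoffs."""
    def label(score, cutoff):
        return "high" if score >= cutoff else "low"

    return {
        "abdominal_wall": label(
            exam.mid_abdomen_left_ppt_suppression
            + exam.mid_abdomen_right_ppt_suppression, 2),
        "vulvar_vestibule": label(exam.vestibule_10_oclock_nrs, 2),
        "pelvic_floor": label(
            exam.puborectalis_left_nrs + exam.puborectalis_right_nrs, 6),
        "vaginal_vault": label(
            exam.uterosacral_left_nrs + exam.uterosacral_right_nrs, 8),
    }


# Example: 1.5 on the left and 0.5 on the right mid abdomen sums to 2 (high pain).
example = PelvicExam(1.5, 0.5, 1, 3, 3, 4, 3)
print(classify(example))
```

Each region is classified independently, so a single patient can fall into the high pain class in some regions and the low pain class in others.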

Conflict of Interests
No author has a commercial relationship with any aspect of this paper.