Accuracy of five different diagnostic techniques in mild-to-moderate pelvic inflammatory disease.

OBJECTIVE
To evaluate the clinical diagnosis of pelvic inflammatory disease (PID) compared with the diagnosis of PID made by laparoscopy, endometrial biopsy, transvaginal ultrasound, and cervical and endometrial cultures.


STUDY DESIGN
A diagnostic performance test study was carried out by cross-sectional analysis in 61 women. A group presenting PID (n = 31) was compared with a group (n = 30) presenting another cause for non-specific lower abdominal pain (NSLAP). Diagnosis provided by an evaluated method was compared with a standard diagnosis (by surgical findings, histopathology, and microbiology). The pathologist was unaware of the visual findings and presumptive diagnoses given by other methods.


RESULTS
All clinical and laboratory PID criteria showed low discrimination capacity. Adnexal tenderness showed the greatest sensitivity. Clinical diagnosis had 87% sensitivity, while laparoscopy had 81% sensitivity and 100% specificity; transvaginal ultrasound had 30% sensitivity and 67% specificity; and endometrial culture had 83% sensitivity and 26% specificity.


CONCLUSIONS
Clinical criteria represent the best diagnostic method for discriminating PID. Laparoscopy showed the best specificity and is thus useful in cases with an atypical clinical course for ruling out abdominal pain caused by another factor. The other diagnostic methods might have limited use.

In addition to complications like peritonitis, sepsis, and death, PID may lead to consequences such as infertility, ectopic pregnancy, and chronic pelvic pain, especially in AIDS patients [4,5]. Such complications and consequences are related to disease severity and time of diagnosis; early diagnosis is thus essential for diminishing the disease's impact [6].
PID is part of the non-specific lower abdominal pain syndrome (NSLAP) in fertile women, its symptoms representing an especially difficult challenge given the anatomical relationship between the female upper genital tract, the menstrual cycle, and pregnancy.
Clinical diagnosis is the physician's basic tool in the emergency room for patients presenting with acute abdominal pain of no clear cause. Laparoscopy has been considered the gold standard for PID diagnosis [7,8], but its sensitivity varies with the stage of the illness: it is less sensitive in the mild form, where diagnostic criteria are less objective [9]. It is not suitable for all patients because it is expensive, and it is not free of risk because it is invasive.
PID diagnosis is based on clinical criteria proposed in 1983 and modified in 1991 [10]. These criteria have been only partially evaluated. Existing studies have had difficulty selecting a suitable control population, because ethical concerns arise when an invasive gold standard such as laparoscopy is performed on healthy women or on a population with a low probability of disease. The same problem has affected the evaluation of laparoscopy, ultrasound, and endometrial biopsy [11-13]. These techniques have not been adequately evaluated for this purpose in our setting.
The purpose of this study was therefore to evaluate the accuracy of clinical and laboratory criteria, and the performance of laparoscopy, endometrial biopsy, endometrial culture, and transvaginal ultrasound, in patients with mild-to-moderate PID drawn from a population presenting with NSLAP.

SUBJECTS AND METHODS
A diagnostic-performance test study [14] was carried out by cross-sectional analysis of the patient population participating in a study evaluating the effectiveness of laparoscopy in patients with NSLAP. Briefly, a randomized clinical trial was designed to compare the effectiveness of laparoscopy and conventional diagnosis, based on close clinical observation and paraclinical tests, in patients with NSLAP [1]. Patients with NSLAP were allocated to the early laparoscopy group or the conventional diagnosis group using a computer-generated random table.
The conventional diagnosis method was defined as that based on continuous clinical assessment and laboratory tests. It may have included surgical intervention such as precision laparotomy carried out by the Instituto Materno Infantil (IMI) emergency team. PID was diagnosed by the presence of at least two of Hager's main criteria [15]. Appendicitis was diagnosed on visualizing an appendix with signs of swelling or necrosis. Ectopic pregnancy was diagnosed by ultrasonography and serial human chorionic gonadotropin (hCG) determination, or by laparotomy. Ovarian cyst was confirmed by the presence of a cystic mass in the ovary at laparotomy. A healthy pelvis was diagnosed when no alterations were found in the pelvic organs at laparotomy.
The laparoscopic diagnostic method was defined as direct visualization of the abdominopelvic cavity through the abdominal wall with a lens, carried out by the IMI laparoscopy team, which is qualified to perform third-level laparoscopy and has experience in diagnosing lower abdominal pain [3]. Laparoscopic diagnosis of PID followed the criteria of Hager and co-workers [15]. Unruptured ectopic pregnancy was diagnosed by the presence of a bluish mass in the tube, whether or not associated with hemorrhagic material in the cul-de-sac [16]. Appendicitis and ovarian cyst diagnoses were based on visualization of the changes described in the previous paragraph. A healthy pelvis was diagnosed when no alterations were found.
Histopathologic and microbiologic tests were performed for all patients; pathologists with wide experience in gynecological disorders read the resulting histopathology. In every case the pathologists were unaware of the results of laparoscopy or laparotomy and of the possible etiological diagnoses of pain.
Cases were considered to be PID when patients fulfilled at least two of the following criteria:
(c) Cultures positive for Neisseria gonorrhoeae or Chlamydia trachomatis in the cervix or endometrium, or the presence of aerobic or anaerobic bacteria, N. gonorrhoeae, mycoplasma, or ureaplasma in the endometrium [13,19].
(d) Acute or chronic salpingitis on pathology.
A case was considered to be non-PID when a diagnosis of appendicitis, ectopic pregnancy, complicated ovarian cyst, or endometriosis was reached on the basis of visual findings and pathological criteria. Forty-nine patients then underwent visual examination (Figure 1): 19 patients in the PID group and 30 in the non-PID group underwent laparoscopy or laparotomy. Exclusion criteria were signs of generalized peritonitis, prior intestinal surgery, shock, chronic pelvic pain, viable intrauterine pregnancy, weight over 100 kg, and current or previous psychiatric problems. Patients were considered to have a healthy pelvis when visual examination of the pelvis revealed no abnormal finding, the endometrial biopsy was negative for PID, and cultures were negative, with pain disappearing after the procedure.
[Figure 1: Patients with NSLAP (n = 110); conventional diagnostic method (n = 55); diagnostic laparoscopy method (n = 55); patients with diagnosis according to current gold standard (n = 29 and n = 32).]
The IMI emergency team made the diagnosis of the underlying pathology. Diagnoses in the first 6 hours following admission and 48 hours later were taken into account as clinical diagnoses.
The proposal and written consent procedures were approved by the Universidad Nacional's Ethics Committee and the Instituto Materno Infantil's Institutional Review Board.
Before the study began, the sample size was calculated on the basis that the difference between sensitivities can be used to estimate sample size in diagnostic accuracy studies [20]. EPI-INFO 6.0 statistical software was used for the calculation. Assuming that the gold standard has 99% sensitivity and the clinical method to be compared has 70% sensitivity, 30 patients per group are required at a 95% confidence level, with the probability of a type II error set at 20% (80% power), using the χ² test and a 1:1 ratio between subjects and controls.
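The sample-size reasoning above can be checked with a short calculation. The sketch below is an assumption about the method, not a reproduction of EPI-INFO 6.0, whose exact formula is not stated in the text: it uses the standard normal-approximation formula for comparing two independent proportions, with an optional Fleiss continuity correction. The function name `n_per_group` and its parameters are illustrative.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80, continuity=True):
    """Patients per group needed to compare two proportions (1:1 ratio).

    Assumed method: normal-approximation formula for two independent
    proportions, optionally with the Fleiss continuity correction.
    The source only states that EPI-INFO 6.0 was used.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - type II error
    p_bar = (p1 + p2) / 2
    diff = abs(p1 - p2)
    n = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2 / diff ** 2
    if continuity:
        # Fleiss continuity correction applied to the uncorrected estimate
        n = n / 4 * (1 + math.sqrt(1 + 4 / (n * diff))) ** 2
    return math.ceil(n)


# Sensitivities assumed in the text: 99% (gold standard) vs 70% (clinical method)
print(n_per_group(0.99, 0.70))
print(n_per_group(0.99, 0.70, continuity=False))
```

Under these assumptions the continuity-corrected estimate comes out at 30 patients per group, in line with the figure stated in the text; the uncorrected estimate is smaller.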

Sample processing and transport
Endocervical and endometrial samples were taken from all patients for detection of N. gonorrhoeae, C. trachomatis, and mycoplasma. Endometrial samples were also taken by Pipelle (Unimar) for histopathology and culturing of these and other aerobic and anaerobic pathogens. The exocervix was washed with saline solution prior to sample taking. A sample from the peritoneum was taken when PID was suspected during laparoscopy or laparotomy.
Samples for N. gonorrhoeae were also obtained from the rectum and the bottom of the cul-de-sac when indicated. Gram staining was done and samples were transported in Thayer-Martin medium in a 5% CO2 atmosphere, using the candle method. Identification and confirmatory tests were carried out, as well as β-lactamase tests on all isolates. Samples for C. trachomatis detection were transported refrigerated in RPMI medium with fetal bovine serum and antibiotics, cultured in McCoy cells, and stained with Lugol's iodine. Samples for mycoplasma detection were transported refrigerated in PPLO medium with horse serum and antibiotics and then cultured on PPLO agar with horse serum. The N. gonorrhoeae, C. trachomatis, and mycoplasma samples were processed in the National Institute of Health's microbiology laboratory. The IMI microbiology laboratory processed endometrial samples for aerobic and anaerobic pathogens directly from the Pipelle on blood agar and anaerobic blood agar, respectively. A VDRL test was performed on all patients; positive cases were confirmed with the TPHA test.

Histopathology of the endometrium
Samples were fixed in 10% buffered formalin and embedded in paraffin for staining with hematoxylin and eosin. At least five sections were evaluated, quantifying the number of plasma cells per ×120 field and glandular and stromal neutrophils per ×400 field.
Transvaginal sonographic findings suggestive of PID included: fluid in the pouch of Douglas; thickened, fluid-filled fallopian tubes; multicystic ovaries; or an adnexal mass [21]. Ultrasound was done by the sonography team.

Statistical analysis
The groups' baseline sociodemographic, clinical, and laboratory variables are described below. Continuous variables were compared using Student's t-test or the Mann-Whitney U test, according to normality. Association between categorical variables was evaluated with the χ² test. Sensitivity and specificity were calculated for each clinical and laboratory criterion and for each method evaluated for PID diagnosis, using STATA 6.0 software. A PID prediction model was built by stepwise multiple logistic regression, with an entry probability of 0.15 and an exit probability of 0.2 [22]. Candidate variables were those with the best discriminatory capacity in univariable analysis, taking into account the best clinical and laboratory indicators. The model's predictive capacity was evaluated through sensitivity, specificity, positive and negative predictive values, and the ROC curve.
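For readers less familiar with these measures, the following minimal sketch shows how they are derived from a 2×2 table that crosses a test result with the standard diagnosis. The function name `diagnostic_metrics` is illustrative, and the counts are hypothetical; they are not taken from the study's tables.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Operating characteristics of a test from a 2x2 table.

    tp/fn: diseased patients with a positive/negative test;
    fp/tn: non-diseased patients with a positive/negative test.
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
        "prevalence": (tp + fn) / total,
    }


# Hypothetical counts for illustration only
metrics = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity depend only on the columns of the table (diseased and non-diseased), whereas predictive values and accuracy also depend on the prevalence in the sample.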

RESULTS
A diagnosis was reached according to the previously described criteria in 61 (55%) of the 110 patients who presented with NSLAP between January 1998 and February 2000. The mean age of the group studied was 28.5 years, and 12% of patients were younger than 20. Twenty percent were using an intrauterine device (IUD) and 8% had a history of PID. Patients with PID were younger than those in the control group, a difference that takes on greater clinical importance when age is categorized as older or younger than 20. Current and past use of an IUD and the presence of sexually transmitted disease (STD) were more frequent in the PID group. None of the other characteristics showed relevant differences between the two groups (Table 1).
Patients' diagnoses are shown in Table 2. The frequency of appendicitis was very low for the type of population studied. The prevalence of PID was 51%.
Evaluated in isolation, the clinical criteria showed low sensitivity, with the exception of adnexal tenderness. Purulent endocervical secretion, neutrophil count greater than 80%, and abdominal rebound pain showed the best specificity. Only one patient presented with a temperature greater than 38°C (Table 3).
Regarding the operating characteristics of the diagnostic methods for PID, clinical examination on admission showed the greatest sensitivity but was not very specific; its accuracy was just 69%. The specificity of this method improved with time, eventually reaching 100%. However, a clear diagnosis was achieved by this method in only 13 of 16 patients (81%) within 48 hours of admission, with the method being used in hospitals as previously defined. Laparoscopy also had optimum specificity; however, it gave some false negatives, for a global accuracy of 91%.
[Table 2: Exact diagnosis in patients having non-specific acute lower abdominal pain.]
Anaerobic bacteria (odds ratio, OR: 4.87; 95% confidence interval, CI: 0.83-50.2) and mycoplasma (OR 5.7; 95% CI: 1.01-58.2) were isolated more often from the endometrium of PID patients (Table 5); the interval for anaerobic bacteria includes 1, so only the association with mycoplasma reaches conventional statistical significance. Three patients with mycoplasma in the endometrium also presented with syphilis. Another patient, diagnosed with syphilis, presented an ectopic pregnancy as a complication of an underlying PID. When only isolation of anaerobic bacteria or N. gonorrhoeae in the endometrium was considered a sign of PID, sensitivity was 32% and specificity rose to 90%.
The best prediction model for PID in patients presenting with NSLAP involved the following variables: age younger than 20 years; current use of an IUD; purulent endocervical secretion; and a negative pregnancy test. The area under the ROC curve was 0.83. The model had 69% sensitivity, 88% specificity, 86% positive predictive value, 74% negative predictive value, and 79% accuracy at 50% prevalence (Table 6). Operating characteristics can be seen in Figure 2.
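As an aid to interpreting the area under the ROC curve, the sketch below computes it as the proportion of (PID, non-PID) patient pairs in which the PID patient receives the higher predicted probability, with ties counted as one half. This is a generic illustration and not the STATA procedure used in the study; the function name `roc_auc` and the scores are hypothetical.

```python
def roc_auc(scores_pid, scores_non_pid):
    """Area under the ROC curve via pairwise ranking (ties count one half)."""
    wins = 0.0
    for s_pos in scores_pid:
        for s_neg in scores_non_pid:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_pid) * len(scores_non_pid))


# Hypothetical predicted probabilities, for illustration only
pid = [0.9, 0.8, 0.4]
non_pid = [0.7, 0.3, 0.2]
print(round(roc_auc(pid, non_pid), 2))
```

Read this way, an area of 0.5 corresponds to a model that ranks patients no better than chance, and an area of 1.0 to a model that ranks every PID patient above every non-PID patient.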

DISCUSSION
The search for safe interventions that take available resources into account when establishing a diagnosis is a priority for safe patient care in institutions. This becomes more pressing given the high cost of health care that relies on cutting-edge technology and the limited health-service resources available for attending patients. NSLAP is a good scenario for evaluating different diagnostic techniques because of the high degree of uncertainty faced by clinical diagnosis. This uncertainty has led to the diagnostic methods studied here being used without suitable prior evaluation, either in terms of effectiveness (understood as the appropriate use of an intervention in a given situation [1,23]) or in terms of diagnostic accuracy.
Evaluation of diagnostic criteria has been limited by the difficulty of finding a control group against which the operating characteristics of clinical signs can be contrasted. Most studies have been carried out on patients submitted to laparoscopy with clinical suspicion of PID, comparing patients in whom the diagnosis was confirmed (true positive result) with those in whom a different pathology was found (false positive result). They have also described the behavior of diagnostic criteria in patients with a confirmed PID diagnosis, so that the number of successes (true positive results) and mistakes (false negative results) can be estimated for a clinical sign or a laboratory result [11,24]. One study evaluated overall clinical diagnosis of PID in patients with acute pelvic pain, but did not reliably evaluate the diagnostic criteria separately [10].
Given the natural history of PID, the best diagnostic test for real clinical practice in our emergency rooms would be one with high sensitivity, in order to prevent sequelae. We need a test that gives as few false negative results as possible (to avoid the real danger of infertility resulting from late treatment). The consequences of this pathology can thus be reduced by obtaining the most complete information possible, as quickly as possible.
Adnexal tenderness has recently been described as having the best sensitivity [25], whereas endocervical secretion has previously been reported as having the best specificity [17]. The performance of clinical criteria in this study may have been influenced by the population studied, in which lower abdominal pain symptoms were less clear. It should be recalled that sensitivity and specificity are affected by the spectrum of disease [26]. Concerning the operating characteristics of clinical diagnosis, Sellors [10] found 52% sensitivity and 85% specificity. This difference can be explained by the use of fallopian tube biopsy as the gold standard in that study. In our study this may have produced a differential misclassification effect, mainly affecting the group of patients with PID, as they would have been classified as false positives, thus reducing specificity.
Even though laparoscopy is considered to be the gold standard for PID diagnosis, it gave a considerable number of false negatives in our study. When visual diagnosis is compared with biopsy of the fimbria, sensitivity can fall to 65% [10]. Its high specificity supports its use as an adjunct to the clinical picture in difficult cases.
Endometrial biopsy has been reported as a key element in PID diagnosis. Its sensitivity and specificity range from 75 to 92% and from 67 to 87%, respectively [12,15]. The gold standards used in these studies differ, being based on visual diagnosis and on microbiology of the upper genital tract.
There is no agreement regarding endometrial culture results; some authors do not accept that isolation of mycoplasma in endometrial cultures constitutes a positive result [27]. Although we cannot determine causality, our results show a significant association between PID and mycoplasma-positive endometrial cultures. This finding is supported by the fact that, in some patients, mycoplasma isolation was accompanied by a diagnosis of STD. However, it might represent a marker for endometrial tissue infection.
If patients not presenting N. gonorrhoeae or anaerobic bacteria were excluded from the analysis (ten patients in the PID group), sensitivity would rise to 50% (10/20) while specificity would rise to 90%.
The role of transvaginal pelvic sonography in PID diagnosis is under much discussion. Its sensitivity depends on the image described and has been estimated at between 32 and 85%, with 58-100% specificity [28]; it is also operator-dependent. Our results are consistent with these estimates.
Other prediction models have already been developed. Morcos and co-workers [11], with 76% PID prevalence, found adnexal pain, abdominal pain of less than 1 week's duration, and high leukocyte count to be the best predictors. Hudgu and co-workers [10], with 67% PID prevalence, found that the best predictors were purulent endocervical secretion, elevated erythrocyte sedimentation rate, and an endocervical culture positive for N. gonorrhoeae; in both studies PID was diagnosed by laparoscopy. Peipert [21], with 47% prevalence of PID diagnosed by endometrial biopsy, reported the following predictors: fever, leukocyte count greater than 10 000/mm³, and a positive bacteriological test in the cervix.
Although the model is not applicable to the whole population, it is more useful in women younger than 20, for whom the consequences of the disease can be more devastating (i.e. infertility, ectopic pregnancy, and chronic pain). Its predictive value will be most applicable to a gynecological referral service. In other settings, gynecological pathology is the cause of NSLAP in 25% [29] to 55% [30] of patients in a general hospital or surgical service. If the lower figure is taken as an estimate of lower prevalence, the model would have a 67% positive predictive value and an 89% negative predictive value.
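The dependence of predictive values on prevalence described above follows from Bayes' theorem and can be sketched as follows. The function name `predictive_values` is illustrative, and the inputs are the rounded sensitivity (69%) and specificity (88%) reported for the model in the Results.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' theorem."""
    tp = sensitivity * prevalence              # diseased, test positive
    fn = (1 - sensitivity) * prevalence        # diseased, test negative
    fp = (1 - specificity) * (1 - prevalence)  # non-diseased, test positive
    tn = specificity * (1 - prevalence)        # non-diseased, test negative
    return tp / (tp + fp), tn / (tn + fn)


# Rounded model sensitivity and specificity, evaluated at the study prevalence
# (about 50%) and at the lower prevalence of 25% discussed in the text
for prevalence in (0.50, 0.25):
    ppv, npv = predictive_values(0.69, 0.88, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

With these rounded inputs the results fall within about one percentage point of the predictive values quoted in the text; small differences are expected because the published sensitivity and specificity are themselves rounded.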

CONCLUSIONS
(1) Clinical diagnosis was the best diagnostic method for PID screening in our environment; (2) laparoscopy showed the best specificity, and is thus useful in cases with an atypical clinical course for ruling out pelvic pain caused by another factor; (3) the other diagnostic methods were of limited usefulness in our study; and (4) the major and minor criteria evaluated in isolation showed poor operating characteristics, so their recommended use should be re-evaluated in patients with an atypical PID clinical picture.