Diagnostic Accuracy Rates of Appendicitis Scoring Systems for the Stratified Age Groups

Background Many scoring systems have been developed for acute appendicitis, which is the most common emergent disorder in surgical practice. Considering the physiological changes and chronic diseases occurring with advancing age, an applied scoring system may not produce the same score in similar patients in all age groups. Objectives We aimed to compare the predictive values of scoring systems in different age groups. Methods In this prospective study, the patients operated on in our clinic with a prediagnosis of acute appendicitis between March 2020 and March 2021 were included. We divided them into three age groups: 18–45 years (group 1), 46–65 years (group 2), and >65 years (group 3). We compared the scores of the nine acute appendicitis scoring systems most commonly used in the literature for these age groups. Results A total of 203 patients were included in our study. The Alvarado scoring system yielded the most accurate results for group 1, whereas the Fenyo–Lindberg scoring system was the most accurate system for group 2 and the Eskelinen scoring system for group 3. Conclusion Age should be considered as a major parameter during the selection of the scoring system to be applied for patients with a prediagnosis of acute appendicitis. Our study revealed the Alvarado and the Fenyo–Lindberg scoring systems as the most accurate systems for the differential diagnosis of appendicitis in the 18–45 and 46–65 years age groups, respectively. Although we found the Eskelinen scoring system to be the most accurate one in the >65 years age group, the confidence intervals indicated that it may not be appropriate for use alone in this group.


Introduction
Acute appendicitis remains the most frequent disorder requiring emergent surgical intervention worldwide and occurs in approximately one in 10 individuals over the life course [1]. Because acute appendicitis commonly occurs in young employed adults, it also has negative economic and social impacts [2,3]. Although various studies have reported that the incidence of acute appendicitis remains higher in younger people, particularly males, acute appendicitis is no longer exclusively a disease of the young in developed countries but is also frequently seen in middle and advanced ages [4][5][6].
Appendectomy is the most commonly performed abdominal surgery in general surgical practice. However, the negative appendectomy rates remain quite high (6.2-15.9%) despite the improved facilities and radiological examination methods [7,8]. Abdominal surgical procedures, or a perforated appendix, can cause severe morbidities, such as recurrent episodes of intestinal obstruction due to intraabdominal adhesions and ectopic pregnancy [9,10]. Thus, timely and accurate diagnosis is essential for proper management [11,12]. The most useful parameters for diagnosing acute appendicitis are the duration of abdominal pain, physical examination findings, and laboratory parameters. Various imaging methods are used to validate the prediagnosis [13]. Many clinical scoring systems (CSSs) using different parameters have been developed to predict acute appendicitis [14,15]. The purpose of applying a CSS is to predict acute appendicitis and to distinguish patients who need medical treatment from those who need emergent surgery, thereby preventing complications that may increase mortality and morbidity [1,16]. CSSs may facilitate differential diagnosis, reducing unnecessary radiologic examinations and surgical explorations [17]. Many previous studies have compared the effectiveness of CSSs developed for acute appendicitis [13,15]. Although the efficacy of these CSSs for the pediatric age group has been evaluated, no study has compared the efficiency of these scoring systems among adult age groups [18].
In the present study, we aimed to compare the predictive values of scoring systems in different age groups. We tested the hypothesis that the accuracy of a CSS may not be consistent across the different stratified age groups.

Materials and Methods
This prospective study was approved by our institutional review board (approval no: 71522473/050.01.04/44). The study was conducted in a tertiary training and research hospital. We evaluated patients aged >18 years with a diagnosis of acute appendicitis operated on in our clinic between March 2020 and March 2021.

Exclusion Criteria.
We excluded patients aged <18 years, pregnant patients, those who had given birth in the last 3 months, those with an existing malignancy, patients using steroids for any reason, immunosuppressed patients, COVID-19-positive patients, and patients with previous pelvic inflammatory disease.

Study Setting.
An emergency medicine specialist evaluated patients who presented to the emergency department with abdominal pain. The general surgical assistant physician was consulted after blood tests and ultrasound (US) had been requested for each patient with a suspected acute abdomen. Subsequently, the surgical assistant examined the patients, evaluated the tests, and consulted the on-duty surgeon. The on-duty consulting surgeon examined each patient within the first hour after the consultation. An abdominal computed tomography (CT) scan was performed in cases where there was doubt about the diagnosis. Following these examinations, the consulting surgeon hospitalized the patients with a preliminary diagnosis of acute abdomen. The emergency physician discharged the other patients or consulted with other departments. The general surgery assistant collected the variables required to calculate the scores for each system (RIPASA (Raja Isteri Pengiran Anak Saleha Appendicitis), Appendicitis Inflammatory Response (AIR), and the scoring systems of Tzanakis, Eskelinen, Ohmann, Lintula, Fenyo-Lindberg, and Karaman) used to predict acute appendicitis (Table 1) and calculated each patient's score for each scoring system. Regardless of the CSS result (the results were not disclosed to the surgeon), the consulting surgeon evaluated the patient's history, physical examination findings, and laboratory and radiological examination results, and an open or laparoscopic appendectomy procedure was performed if the patient was diagnosed with acute appendicitis. We defined the outcome of appendicitis according to the histopathological examination and recorded the histopathological examination results on the same datasheet to evaluate the accuracy of the CSSs for predicting acute appendicitis. We divided the patients into three age groups (18-45, 46-65, and >65 years). Statistical analyses were performed to compare the scoring systems between the groups.

Statistical Analysis.
Descriptive analyses were performed to obtain information on the general characteristics of the study population. We used the Kolmogorov-Smirnov test to evaluate whether the distributions of numerical variables were normal. We used the independent-samples t test and Kruskal-Wallis test to compare the numeric variables between the groups. The numeric variables are presented as mean ± standard deviation or median (Q1-Q3). Categorical variables were compared using the chi-square test and are presented as counts and percentages. A p value <0.05 was considered significant. We used receiver operating characteristic curve analysis to identify the best cut-off values and assess the performance of the appendicitis test scores. Analyses were performed using SPSS statistical software (version 23.0; IBM Corp., Armonk, NY, USA).
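As an illustration of how a cut-off can be read from a receiver operating characteristic analysis, the following minimal Python sketch scans every observed score as a candidate threshold and keeps the one that maximizes Youden's J (sensitivity + specificity - 1). This is a sketch under stated assumptions, not the procedure run in SPSS: the text does not state which criterion defined the "best" cut-off, so Youden's J is assumed here, and the function name `youden_cutoff` and the toy data are hypothetical.

```python
from typing import List, Tuple


def youden_cutoff(scores: List[float], has_disease: List[bool]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    A patient is called test-positive when score >= cutoff.
    ASSUMPTION: Youden's J is used as the optimality criterion; the
    source does not say which criterion was applied.
    """
    positives = sum(has_disease)
    negatives = len(has_disease) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one patient with and one without disease")

    best = None  # (J, cutoff, sensitivity, specificity)
    for cutoff in sorted(set(scores)):
        true_pos = sum(1 for s, d in zip(scores, has_disease) if d and s >= cutoff)
        true_neg = sum(1 for s, d in zip(scores, has_disease) if not d and s < cutoff)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        j = sensitivity + specificity - 1
        # Strict '>' keeps the lowest cutoff when several share the same J.
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    return best[1], best[2], best[3]


# Hypothetical toy data (NOT study data): score and histopathology result.
toy_scores = [2, 3, 4, 5, 6, 7, 8, 9]
toy_truth = [False, False, True, False, True, True, True, True]

cutoff, sens, spec = youden_cutoff(toy_scores, toy_truth)
print(cutoff, round(sens, 2), round(spec, 2))  # 6 0.8 1.0
```

With a small group, such as the >65 years group here, a cut-off chosen this way can shift substantially when a single patient is reclassified, so the selected value is best read together with its confidence interval.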

Patients with and without Appendicitis.
Appendicitis was determined by histopathological examination in 180 of the 203 patients. The mean age of the patient group with appendicitis was 37.43 ± 16.71 years, while that of the patient group without appendicitis was 32.09 ± 17.50 years. A total of 118 (65.5%) patients with appendicitis were male and 62 (34.5%) were female. In total, 176 (97.7%) patients with appendicitis were native, and 4 (2.3%) were foreign. No significant differences in age, gender, or nationality were observed between patients with and without appendicitis (p = 0.67, 0.96, and 0.84, respectively). We observed appendicitis by US in 169 patients with appendicitis but could not detect appendicitis in 11 (sensitivity: 91.4%, specificity: 38.9%) (Table 2).
The pain relocation rate was significantly higher in the appendicitis group than in the group without appendicitis (53.6% vs. 34.8%, p = 0.04).
The rate of decrease in bowel sounds on auscultation was higher in the appendicitis patient group than in the group without appendicitis (10% vs. 4%, p = 0.004).

Comparisons among the Three Age Groups.
No significant differences in gender or nationality were observed among the three groups (p = 0.811 and 0.851, respectively).
A higher proportion of the 18-45 years group were admitted to the hospital at an early stage than in the other age groups (n = 86/149; 57.7%; p = 0.04). The proportion of patients whose abdominal pain began in the periumbilical region and relocated to the right lower quadrant was significantly lower in the >65 years group than in the other two age groups (n = 15/20; 75%; p = 0.02). No differences in any other complaints were observed among the groups (Table 2).
No significant differences in the physical examination findings were observed among the age groups.
The CRP level was significantly higher in the >65 years group than in the other two groups (86.8 (42.6-146.13) mg/L) (p = 0.002). The leukocyte count was higher in the 18-45 years group than in the other two groups (14,600 (11,200) μl/ml) (p = 0.049). No differences in any other blood parameters or urinalysis were observed among the age groups. The rate of appendicitis detection by US was 91.3% (n = 136/149) in the 18-45 years age group, 91.2% (n = 31/34) in the 46-65 years group, and 90% (n = 18/20) in the >65 years group. Acute appendicitis was revealed radiologically in 41 (91%) of 45 patients evaluated by CT. However, the diagnosis of only 39 (94.2%) of these patients was confirmed by histopathological examination; appendicitis was not detected in the remaining 2 patients (4.8%).

Discussion
The diagnosis of appendicitis can be made with 75% certainty via physical examination and based on the complaints of patients [13]. However, we did not find any significant differences in physical examination findings on admission between patients with and without appendicitis, except for bowel sounds on auscultation and relocated pain. To perform auscultation, the stethoscope is placed over the abdominal skin and the sounds produced by intestinal peristalsis are listened to. However, auscultation is a subjective examination method. It provides information about the presence of peritonitis but cannot be used in the differential diagnosis. In their recent clinical study, Zaborski et al. suggested that the evaluation of the structure of bowel sounds might enable the use of bowel sounds when making a differential diagnosis [19]. Physical examination findings and laboratory parameters are frequently reported in the literature as important factors for diagnosing acute appendicitis in patients aged 20-40 years who do not have a chronic disease or history of regular medication use or pregnancy [13]. Chronic medication use in the elderly and changes in pain tolerance with age may underlie the differences in physical examination findings and blood parameters between age groups, and the variations in diagnostic parameters, rendering the diagnosis of acute appendicitis challenging [20]. Abdominal US, physical examination findings, and laboratory parameters have different weights among scoring systems, and some parameters are not included in all systems. Therefore, it should not be expected that CSSs have the same predictive value in every age group.
Abdominal CT screening increases the accuracy of acute appendicitis diagnosis, and high sensitivity and specificity can be achieved (98%). However, a CT scan is not routinely recommended for diagnosing acute appendicitis due to known disadvantages including radiation exposure, high cost, and unsuitability for pregnant women [21][22][23][24]. CT is more beneficial in complex cases where US is inadequate. Furthermore, the 2020 World Society of Emergency Surgery (WSES) guidelines recommend that CT be performed only in cases of negative US findings; this will reduce the rate of CT by 50%. They recommended this strategy in patients with suspected appendicitis [3]. In our study, while US was performed in all patients during the preoperative period, CT was used in only 22.1% of patients, in accordance with the strategy proposed by the WSES. Our negative appendectomy rate was 11.3%, which is consistent with the literature [7]. Considering the high sensitivity and specificity of US for diagnosing acute appendicitis, the Tzanakis CSS, which uses US as a criterion, was expected to be more accurate in all age groups [21,25]. However, this will only apply if all US examinations are performed under ideal conditions and by experienced operators. Tzanakis et al. reported that US produced a false-negative rate of 24% for diagnosing appendicitis in the 2005 study in which they introduced their score [21]. Various studies have demonstrated that numerous factors affect the performance of the radiologist, such as the daily patient load, clinical experience, and consultations outside of working hours [15,26]. Moreover, the importance of US in the Tzanakis score (6 of 15 points, 40%) means that the radiologist's performance is the most crucial factor in the overall score. Contrary to our expectations, the Tzanakis score was not superior for predicting acute appendicitis over the other scores in any of the three age groups in our study. Radiologists carry out US examinations during various working hours; the predictive power of the Tzanakis score will increase if patients are evaluated during daytime working hours by the same experienced radiologist.
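The proportions quoted in this paragraph can be checked directly from the counts reported in the Results (203 operated patients, 180 with histopathologically confirmed appendicitis, 45 evaluated by CT) and from the 6-of-15-point weight of US in the Tzanakis score. The short Python sketch below is only an arithmetic check; the helper name `percent` is hypothetical.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


total_operated = 203       # patients included (from the Results)
histology_positive = 180   # appendicitis confirmed by histopathology
ct_evaluated = 45          # patients evaluated by CT

# A negative appendectomy is an operation with no appendicitis on histology.
negative_appendectomies = total_operated - histology_positive
negative_rate = percent(negative_appendectomies, total_operated)

ct_rate = percent(ct_evaluated, total_operated)

# Share of the Tzanakis total (15 points) contributed by the US finding (6 points).
tzanakis_us_share = percent(6, 15)

print(negative_appendectomies, negative_rate)  # 23 11.3
print(ct_rate)                                 # 22.2
print(tzanakis_us_share)                       # 40.0
```

The negative appendectomy rate reproduces the 11.3% given in the text. The CT proportion evaluates to 22.17%, which rounds to 22.2%; the 22.1% in the text appears to be the same figure truncated rather than rounded.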
The predictive power of the CSSs developed for acute appendicitis varies [1,27]. The Alvarado scoring system is reportedly more accurate in Western populations, while the RIPASA is more accurate in Middle Eastern and some Asian populations; the two scores are comparable in Eastern populations [16,28]. The RIPASA score was developed for the patients of RIPAS Hospital, Brunei in 2010. Foreign nationality was added as a variable to this scoring system, given the later hospital admissions of foreign patients (because they generally do not have social security numbers) [28]. Pain tolerance differs between patients of different ethnic origins [29]. The proportion of immigrants in our population was 4.37%; furthermore, 45.5% of them were <20 years of age. No significant difference was found between our patient groups in terms of nationality [30]. This may be one of the reasons why the RIPASA was not superior to the other CSSs in any age group in our study. In addition, the variations in demographic characteristics and socioeconomic levels of immigrants in different societies could explain the inability of RIPASA to outperform other scoring systems in our study.
Gender and negative urinalysis results are only considered in the RIPASA and Ohmann CSSs. Existing pain in the young age group is more likely due to appendicitis. However, in our study, the Ohmann CSS was not superior to any of the other CSSs in any age group. In their 1999 study, Ohmann et al. did not recommend using their CSS as a standard tool for the differential diagnosis of acute appendicitis in any age group [31].
The Fenyo-Lindberg, Lintula, and RIPASA CSSs consider the gender of the patient when predicting acute appendicitis. The Fenyo-Lindberg CSS is more accurate than the others for predicting acute appendicitis in female patients [32]. In our study, the Fenyo-Lindberg CSS was more valuable in the differential diagnosis of acute appendicitis in the 46-65 years group. However, its specificity was low (0.46). It can be concluded that the Fenyo-Lindberg scoring system was superior to the others in this age group due to the high female:male ratio in the 46-65 years group (38%).
Various studies have examined the cut-off values to exclude a diagnosis of appendicitis according to the Eskelinen score [33]. The cut-off value was significantly higher in our 18-45 and >65 years age groups than in the 46-65 years group. This may be because right lower quadrant pain is a weighted parameter in the Eskelinen score, and pain tolerance changes with age (Table 4) [20]. In our study, when the cut-off value to exclude appendicitis was 67.7, the Eskelinen CSS was superior to the others in the oldest age group. Lintula et al. developed a scoring system based on examinations of children aged 4-15 years who presented to Kuopio University Hospital with suspected acute appendicitis [18]. Although their scoring system was more accurate in the 18-45 years age group in this study, it was not superior to the other scoring systems.
Muscular defense is the most critical factor affecting the total AIR score. However, the classification of muscular defense as mild, moderate, or high is based on subjective opinions and may differ among clinicians [16]. Considering that muscular defense is weaker during the early stage of the disease and increases in the later period, the AIR score should be more accurate in elderly patients admitted to hospital in an advanced stage of the disease. However, in our study, the AIR score was not superior in any age group. This may be due to subjective judgments of the severity of examination findings by the surgeons who performed the physical examinations.
Karaman et al. conducted a study in our city in 2018 and stated that the Karaman scoring system, which consists of six parameters, is more accurate for predicting appendicitis than the Alvarado score. However, in our study, the Karaman CSS was not superior to the other scoring systems in any age group [15]. The relatively small sample size of our study group may be the reason why Karaman's score was not found to be superior to other scores among age groups. The results obtained by Karaman et al. and the results obtained in our study are different despite both studies being carried out in the same country. This difference shows that scoring systems whose validity and reliability have been demonstrated by various well-designed studies should be preferred when choosing a scoring system for the diagnosis of acute appendicitis. Almost all novel CSSs are compared to the Alvarado score, as it was the first CSS developed for acute appendicitis and has frequently been shown to be accurate [1,15,28,31,34,35]. In our study, the most effective CSS in the 18-45 years age group was the Alvarado score, which was also the second most accurate scoring system (after the Fenyo-Lindberg CSS) among patients of all ages.

Limitations.
The main limitation of this study was the small number of patients, particularly in the >65 years age group. Also, this was a single-center study and had a very high number of positive results that inflated the predictive values.
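The effect of a high proportion of positive cases on predictive values can be made concrete with Bayes' rule: for fixed sensitivity and specificity, the positive predictive value (PPV) rises with disease prevalence. The minimal Python sketch below uses the US sensitivity and specificity reported in the Results (91.4% and 38.9%) and the study prevalence of 180/203; the 50% comparison prevalence is a hypothetical value chosen only for contrast, and the function name is illustrative.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV from Bayes' rule: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


us_sensitivity = 0.914   # reported for US in the Results
us_specificity = 0.389   # reported for US in the Results

study_prevalence = 180 / 203    # operated patients with confirmed appendicitis
hypothetical_prevalence = 0.50  # ASSUMPTION: illustrative comparison value only

ppv_study = positive_predictive_value(us_sensitivity, us_specificity, study_prevalence)
ppv_half = positive_predictive_value(us_sensitivity, us_specificity, hypothetical_prevalence)

print(round(ppv_study, 3))  # 0.921
print(round(ppv_half, 3))   # 0.599
```

Under these assumptions the same test characteristics give a PPV of about 0.92 in the operated cohort but about 0.60 at a prevalence of 50%, which is why predictive values from a surgically selected population should not be carried over unchanged to all patients presenting with abdominal pain.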
Emergency Medicine International

Conclusion
Appendicitis scoring systems help clinicians make a differential diagnosis in cases where imaging methods cannot be used or are insufficient. In this study, the most accurate scoring system for the differential diagnosis of appendicitis was the Alvarado score for the 18-45 years age group and the Fenyo-Lindberg CSS for the 46-65 years group. The Eskelinen scoring system was superior to the others in patients aged >65 years; however, it may not be appropriate to use this scoring system in this age group, based on the confidence intervals calculated herein.

Data Availability
Access to data is restricted. The data were obtained with the permission of the hospital management and the ethics committee, with the guarantee that they would not be shared with third parties.

Ethical Approval
This study was approved by the Ethics Committee of the Faculty of Medicine, Sakarya University (approval no: 71522473/050.01.04/44).

Conflicts of Interest
The authors declare that they have no conflicts of interest.

Authors' Contributions
EG conceptualized and designed the study, acquired the data, analysed and interpreted the data, drafted and revised the article, and gave final approval for the submission. ZB conceptualized and designed the study, acquired the data, drafted and revised the article, and gave final approval for the submission. RC designed the study, performed critical revision of the article, and gave final approval for the submission. BM acquired the data, performed revision of the article, and gave final approval for the submission. BK acquired the data, performed revision of the article, and gave final approval for the submission. TH acquired the data, performed revision of the article, and gave final approval for the submission. FA acquired the data, performed revision of the article, and gave final approval for the submission. UE performed acquisition of data, analysis and interpretation of data, revision of the article, and gave final approval for the submission.