The Use of Scan Statistics and Control Charts in Assessing Ventilator-Associated Pneumonia Quality Control Programs

Scan statistics are concerned with clusters of events over time. In the realm of critical care medicine, such clusters might include the occurrence of ventilator-associated pneumonia (VAP). Given N patients over time, the number of observations in a “moving window” of fixed length can be counted and the maximum cluster value becomes a scan statistic for which both parametric and exact methods exist to calculate its rarity. A statistically unusual cluster may indicate a breakdown in quality. Another approach to monitoring rare events is a g-type statistical process control chart where prospectively observing unusually long periods of time between events can indicate a significant improvement in quality. Both methods are presented in detail and applied to a 24-bed medical/surgical ICU’s experience with VAP during a 27-month period.


INTRODUCTION
Respiratory failure is a common, resource-intensive comorbidity among patients in the intensive care unit (ICU). Over 25% of adult, non-cardiac surgical patients require mechanical ventilation within 1 hour of ICU admission and 36% of all ICU patients require mechanical ventilation at some time during their stay [1,2]. When intubated patients are on a mechanical ventilator for more than 48 hours, up to 20% can develop ventilator-associated pneumonia (VAP), a potentially deadly condition which occurs in 100,000 to 300,000 patients per year in the United States [3,4,5]. VAP is also associated with increased costs and length of stay, and with over 50% of all antibiotic use in the ICU [5,6,7]. Consequently, there has been a concerted effort to implement procedures that would reduce VAP [8,9], and since VAP may be preventable, a hospital's VAP rate is increasingly being used as a measure of hospital quality of care [5,10,11]. Yet, the correlation between recommended practices or "safety bundles" of care and preventing VAP is unclear [5] and the use of a hospital's VAP rate as a quality measure is controversial [10,11]. Despite this, the Centers for Medicare & Medicaid Services recently proposed withholding reimbursement for hospital costs associated with treating VAP [5].
Regardless of how different government agencies or insurers eventually decide to structure payments for patients with VAP, clinicians and hospital managers will want to assess their effectiveness in its prevention. Additionally, if there is an increase in the number of VAP events during a particular period of time, they need to know how likely that occurrence is by chance alone, since an unusual cluster may indicate a breakdown in a safety protocol or an outbreak of a more virulent strain of bacteria. The purpose of this study is to illustrate some new statistical methods that can be used for a VAP quality control program. Examples will be provided using data taken from a 24-bed medical/surgical closed ICU at a large tertiary care teaching hospital in the urban northeast United States from January 2006 to April 2008. This information was collected as part of routine quality improvement efforts and did not include identifiable patient data, and was thus not considered human subjects research requiring IRB approval.

THE DIAGNOSIS OF VAP
Currently, there is no single, unequivocal marker of VAP. Instead, physicians rely on several clinical signs such as the presence of a fever, increased pulmonary secretions, abnormal leukocyte counts, radiographic opacities, and cultures of pulmonary secretions to make the diagnosis. Prior studies have shown that diagnoses based on clinical signs alone lead to over-diagnosis while diagnoses based on bronchoalveolar lavage culture results alone may result in under-diagnosis [10,12]. Further complicating matters is that other commonly encountered conditions in the ICU, such as septicemia with pulmonary edema [10], can mimic VAP.
To accurately assess an ICU's VAP rate over time, the diagnosis of VAP must be consistent. To standardize the diagnosis, the Centers for Disease Control and Prevention's National Healthcare Safety Network (NHSN) (formerly known as the National Nosocomial Infection Surveillance System (NNIS)) has published a diagnostic algorithm (see Appendix A) [13]. There is also a validated Clinical Pulmonary Infection Score (CPIS) for which a score above 6 is consistent with a diagnosis of pneumonia (see Appendix B) [3,14,15]. In our study's ICU, one specialized infection control nurse (with one back-up nurse) has the job of determining VAP among ventilated patients based on the NHSN criteria. Since only one person is making the interpretation, the VAP rate is not subject to inter-rater variability, which may be a problem in other ICUs.
Much of the literature cites VAP rates as cases per 1,000 ventilator days, rather than incidence, but for the purposes of cluster analysis, we will focus on discrete events. Also, since the data in this study were not tied to specific patients, rates per 1,000 vent days must be estimated from population data, not individual data.

ANALYZING A CLUSTER OF VAP EVENTS
Assuming that there is a consistent and accurate diagnosis of VAP over a period of time in an ICU, a major question is how to decide whether a particular cluster of events is unusual. For example, in one 3-month (91-day) period (September through November of 2006), there were 12 cases of VAP in this study's ICU. During this time, 278 patients received 1936 days of mechanical ventilation, implying a VAP rate of 6.2 VAPs/1000 vent days, well above the NHSN benchmark 50th and even 75th percentiles at the time, and higher than the ICU's historic rate. In addition, 4 of these 12 cases occurred in the first week of October (see Figure 1). An analysis comparing 4 cases in 7 days (4/7) vs. 8 cases in the remaining 84 days (8/84) using a chi-square (χ²) test or Fisher's exact test (an exact alternative better suited to the small sample size) reveals a p-value < 0.01.
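The arithmetic in this paragraph can be checked with a short sketch. It assumes that each day is treated as a single case/no-case trial (at most one case per day) and that a one-sided Fisher exact test is used; the function name `fisher_upper_tail` is hypothetical and not part of the original analysis.

```python
from math import comb


def fisher_upper_tail(window_cases, window_days, total_cases, total_days):
    """One-sided Fisher exact p-value (hypergeometric upper tail).

    Probability that at least `window_cases` of the `total_cases` case-days
    fall inside a pre-chosen window of `window_days` days, if case-days are
    scattered at random over `total_days` days.
    Assumption: each day is either a case-day or not (one case per day).
    """
    non_cases = total_days - total_cases
    denominator = comb(total_days, window_days)
    upper = min(window_days, total_cases)
    tail = sum(
        comb(total_cases, x) * comb(non_cases, window_days - x)
        for x in range(window_cases, upper + 1)
    )
    return tail / denominator


# Figures quoted in the text: 12 cases, 1936 ventilator days, 91 calendar days.
vap_rate = 1000 * 12 / 1936                   # VAPs per 1,000 ventilator days
p_value = fisher_upper_tail(4, 7, 12, 91)     # 4 cases in 7 days vs. 8 in 84
print(f"rate = {vap_rate:.1f} per 1,000 vent days, one-sided p = {p_value:.4f}")
```

Under these assumptions the p-value is below 0.01, in line with the text. The test only compares one pre-chosen week against the remaining days.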

Figure 1.
A graphical representation of new VAP cases from 9/1/2006 to 11/30/2006

These 4 cases happened to be consecutive (which was not relevant to the prior analysis), and if we view the chance of a VAP case to be 13% on any one day, as there were 12 cases in 91 days, then the probability of 4 consecutive cases is less than 0.03%, implying a seemingly more unusual occurrence. However, the chi-square analysis only showed that two arbitrarily defined proportions were different. That does not prove that the clustering was unusual. In fact, one can always "cherry pick" a particular sequence in a long run of observations and find clusters of events that appear unusual on the surface. One simply has to imagine the results of flipping a fair coin over time and realize that an occasional long run of consecutive heads or tails is more "natural" than a consistent alternating sequence of heads, tails, heads, tails. In our scenario, the appropriate question to ask is: What is the chance of observing 4 new VAP cases in any 7-day period (i.e., days 1 to 7, then days 2 to 8, then days 3 to 9, etc.) given that there were 12 cases in 91 days? To answer this question, we need to rely on scan statistics.

Scan Statistics: The Discrete Case
Given N events distributed over a period of time, we can define $S_w$ as the largest number of events in a window of fixed time of length w' observed retrospectively. In our analysis, w' will be normalized, leading to the definition of w as the ratio of the time length of interest divided by the total time observed. In our example above with 7 days out of 91, w = 7/91 or ≈ 0.077. $S_w$ is called a scan statistic because one scans a series of time intervals (e.g., days 1 to 7, then days 2 to 8, etc.) to find the value of $S_w$. We can define $W_k$ to be the smallest interval of time that contains k events. The distributions of $S_w$ and $W_k$ are related since $P(S_w \ge k) = P(W_k \le w)$: a window of length w holds at least k events exactly when some k events span an interval no longer than w. As per convention, we write the common probability with the notation P(k; N, w) [16].
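The two quantities defined above can be computed directly from a list of event days. The following is a minimal sketch; the event days are hypothetical (12 events in 91 days, chosen only for illustration and not the study's data), and interval lengths are counted inclusively in whole days.

```python
def scan_statistic(event_days, window_days):
    """Largest number of events in any window of `window_days` consecutive days."""
    days = sorted(event_days)
    best, start = 0, 0
    for end, day in enumerate(days):
        # Shrink the window from the left until it spans at most window_days.
        while day - days[start] >= window_days:
            start += 1
        best = max(best, end - start + 1)
    return best


def smallest_interval(event_days, k):
    """Length in days (inclusive) of the shortest interval containing k events."""
    days = sorted(event_days)
    return min(days[i + k - 1] - days[i] + 1 for i in range(len(days) - k + 1))


# Hypothetical event days within a 91-day period (not the study's data).
hypothetical_days = [3, 17, 31, 32, 34, 36, 50, 58, 66, 73, 80, 88]

s_w = scan_statistic(hypothetical_days, window_days=7)
w_k = smallest_interval(hypothetical_days, k=4)
print(s_w, w_k, 7 / 91)
```

For any k, the scan statistic is at least k exactly when the shortest interval holding k events fits inside the window, which is the duality between the two distributions.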
Historically, exact formulas have been known for two special cases: when all N events happen within a time window w and when at least two events happen within a time window w. Their formulas are given in Equations 1 and 2, respectively [16]:

$$P(N; N, w) = N w^{N-1} - (N-1)\, w^{N} \tag{1}$$

$$P(2; N, w) = 1 - \left[1 - (N-1)\,w\right]^{N} \quad \text{for } w \le \frac{1}{N-1}, \text{ and } 1 \text{ otherwise} \tag{2}$$

More recently, Wallenstein and Neff derived an approximate formula (Equation 3) for P(k; N, w) [16]:

$$P(k; N, w) \approx \left(\frac{k}{w} - N - 1\right) b(k; N, w) + 2\, G_b(k; N, w) \tag{3}$$

where

$$b(k; N, w) = \binom{N}{k}\, w^{k} (1-w)^{N-k} \tag{4}$$

and

$$G_b(k; N, w) = \sum_{i=k}^{N} b(i; N, w) \tag{5}$$

The approximation formula is accurate when the derived P(k; N, w) is low (typically < 0.1) and is exact when k > N/2 and w < 0.5 [16]. There are also published tables that are based on similar algorithms. For example, Neff and Naus derived probabilities for a range of k's and N's at different w's [17].
For our study's ICU, k = 4 events out of N = 12, in a window of w = 0.077 (7/91). Equation 3 gives us a P(4; 12, 0.077) value of 0.38 while the tables of Neff and Naus give a value of 0.36 [17], and so we can conclude with either result that this type of clustering is not unusual.
Table 1 provides p-values adapted from the tables of Neff and Naus [17] for some common values of N and k at a window length of w = 0.08 and highlights in boldface the cases where the p-value < 0.05. For example, if we use 0.08 as our window length w, Table 1 gives the p-value for 4 events out of 12.
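The value of about 0.38 quoted above can be reproduced with a short sketch. It assumes the Wallenstein-Neff approximation takes its commonly published form, (k/w - N - 1) b(k; N, w) + 2 G_b(k; N, w), with b the binomial probability and G_b its upper tail. The function names are hypothetical, and the two exact special cases are included only as a consistency check.

```python
from math import comb


def binom_pmf(k, n, w):
    """b(k; N, w): binomial probability of exactly k of n events in the window."""
    return comb(n, k) * w**k * (1 - w) ** (n - k)


def binom_upper_tail(k, n, w):
    """G_b(k; N, w): binomial probability of k or more of n events."""
    return sum(binom_pmf(i, n, w) for i in range(k, n + 1))


def scan_p_approx(k, n, w):
    """Approximate P(k; N, w), assuming the Wallenstein-Neff form."""
    p = (k / w - n - 1) * binom_pmf(k, n, w) + 2 * binom_upper_tail(k, n, w)
    return min(1.0, max(0.0, p))   # keep the approximation inside [0, 1]


def scan_p_all_events(n, w):
    """Exact P(N; N, w): all n events fall within one window of length w."""
    return n * w ** (n - 1) - (n - 1) * w**n


def scan_p_two_events(n, w):
    """Exact P(2; N, w): some pair of events falls within a window of length w."""
    if w >= 1 / (n - 1):
        return 1.0
    return 1 - (1 - (n - 1) * w) ** n


p_cluster = scan_p_approx(k=4, n=12, w=7 / 91)
print(f"P(4; 12, 7/91) is approximately {p_cluster:.2f}")
```

With N = 2 the two exact formulas describe the same event, so they agree with each other and with the approximation, which is a quick sanity check on the implementation.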

Assessing a Seasonal Trend in the VAP Rate
Implicit in the prior analyses is that the incidence of VAP is random over time. Though this is often a reasonable simplifying assumption, it may not be true in a specific ICU. Rello et al. [18] found in a large U.S. database that trauma patients were at a higher risk for developing VAP. This is true in part because trauma patients are more likely to aspirate during their injury and a lung infection from this aspiration could appear 3 to 4 days later (presumably when the patient is still on a ventilator). There is seasonality to the presentation of trauma patients not only due to weather conditions, but also due to available hours of daylight [19]. Thus, a rise in an ICU's VAP rate may not indicate a breakdown in quality but merely a seasonal change in patient case-mix.
A common mistake when testing for a seasonal trend in the data is to use a chi-square test with each month (or quarter, etc.) serving as the "bins" for the counts of VAP. The chi-square test is the wrong test to use because it ignores the order of the bins: any of the 12! possible orderings of the monthly counts gives the same p-value. Instead, a helpful first step would be to order the counts on a circular plot rather than a rectangular histogram to visually observe temporal trends. The circular plot is needed to help discern if a spike in VAP rate occurs during the winter as it does not split December from January (Figure 2).
There are several inference tests in the literature designed to detect seasonal trends. The first major improvement on the chi-square test was made by Edwards in 1961 [20]. The Edwards test orders data sequentially around the rim of the unit circle and, assuming there are monthly data, creates 12 sectors. The counts in each month become weights and a weighted center of gravity is calculated using trigonometry. If the center of gravity is skewed away from the origin in the x-y plane, the null hypothesis is rejected and a seasonal trend is assumed [20]. Unfortunately, the test is sensitive to extreme values, lacks power, and assumes a sinusoidal trend in the data. Walter and Elwood improved upon the Edwards test by allowing for the size of the population at risk to vary during the time in question, though otherwise it is similarly flawed [21]. Contemporaneously, Hewitt developed a simple, non-parametric test that had more power than the Edwards test but was designed for a 6-month cyclical trend, and later Rogerson devised a more general version of Hewitt's method for 3-, 4-, or 5-month periods [22]. Gao et al. recently proposed a more complex methodology based on angular (i.e., trigonometric) regression and made use of the von Mises distribution as they examined event data on the unit circle [23]. Though this method and its extensions may have promise, the authors have acknowledged technical difficulties in its implementation [23].
Another option is the ratchet scan statistic [16,24]. The algorithm for deriving the ratchet scan statistic R is as follows (see Equation 6). To detect a three-month seasonal trend (M = 3) in yearly data when counts for each month are known, first find the "moving" 3-month maximum total in all 3-month "groupings" (i.e., find the number of VAP cases during {Jan, Feb, Mar}, {Feb, Mar, Apr}, …, {Dec, Jan, Feb}) and denote the largest 3-month sum as $R_{Max}$. Let w = 0.25, as 3 months is 25% (3/12) of the year, and let N be the grand total number of events; then the ratchet scan statistic R is:

$$R = \frac{R_{Max} - 1 - Nw}{\sqrt{Nw(1-w)}} \tag{6}$$

Figure 2 shows a circular plot of our study's VAP counts for each month over two years (e.g., January = 8 as there were 8 total cases in the two months of January during years 2006 and 2007). We do see an above-average spike starting in September and peaking in October but, curiously, August, November, and December had below-average VAP counts. Is a seasonal trend present? The Edwards test gives a p-value of 0.011 on this dataset though, as previously noted, it is sensitive to extreme values. The 3-month ratchet scan statistic is 1.83. Using Table 2, which provides critical thresholds for this statistic [16], we see that α falls between 0.5 and 0.1, which indicates no statistical significance at the traditional α of 0.05. If we calculated a one-month (M = 1) ratchet scan statistic (i.e., the maximum multinomial), we would get R = 2.58 and see that it is somewhat unusual to have a month like October with 11 VAP cases, but again the statistic is between the thresholds of α = 0.1 and 0.05.
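The ratchet scan calculation can be sketched in a few lines. The sketch reads Equation 6 as R = (R_Max - 1 - Nw) / sqrt(Nw(1 - w)), which is an assumption about the intended form. The monthly counts are hypothetical and are not the study's data, and the function name `ratchet_scan` is illustrative only.

```python
from math import sqrt


def ratchet_scan(monthly_counts, months_in_window):
    """Return (R_Max, R) for a circular window of `months_in_window` months.

    Assumed form: R = (R_Max - 1 - N*w) / sqrt(N*w*(1 - w)),
    with w = months_in_window / number of months.
    """
    n_bins = len(monthly_counts)
    total = sum(monthly_counts)
    w = months_in_window / n_bins
    # Circular windows: {Dec, Jan, Feb} wraps around the end of the year.
    r_max = max(
        sum(monthly_counts[(start + j) % n_bins] for j in range(months_in_window))
        for start in range(n_bins)
    )
    r = (r_max - 1 - total * w) / sqrt(total * w * (1 - w))
    return r_max, r


# Hypothetical Jan..Dec counts (not the study's data), N = 48 events.
hypothetical_counts = [5, 3, 4, 2, 3, 2, 4, 3, 6, 9, 4, 3]

r_max_3, r_3 = ratchet_scan(hypothetical_counts, months_in_window=3)
r_max_1, r_1 = ratchet_scan(hypothetical_counts, months_in_window=1)
print(r_max_3, round(r_3, 2), r_max_1, round(r_1, 2))
```

The resulting R would then be compared with published critical thresholds for the chosen M; those thresholds are not reproduced here.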

Prospective Scanning and the Use of Control Charts
The statistics that have been examined so far were retrospective. To derive probabilities for prospective events (e.g., the chance that there will be a certain number of events in a fixed time period in the future), different unconditional scan statistics based on the Poisson process would be needed [16]. Unfortunately, the computation of prospective scan statistic probabilities is more complex than the retrospective discrete case, though exact probability formulas are available and some limited results have been published in tabular form [16,17,25].
From a clinical perspective, there are alternatives to a prospective scan statistic per se that are more useful when assessing an ICU's VAP rate in an ongoing manner. One of the simplest types of benchmarking over time is illustrated in Figure 3, which is an approximation of the quarterly data from the study's ICU compared to published national VAP rates from the NHSN. Though lacking in statistical rigor, the figure shows that after a series of safety bundles (e.g., head-of-bed elevation, blood glucose control, etc.) were implemented in the first quarter of 2005, the VAP rate declined. The figure also shows that by the start of 2008, this ICU was performing well as its VAP rate was lower than the NHSN's 25th percentile.
The vertical dashed line marks the implementation of VAP safety bundles in the study's ICU.

Figure 3.
Quarterly ventilator-associated pneumonia (VAP) rates for the study ICU from January 2004 to June 2008

Another prospective quality control approach for an ICU is statistical process control (SPC) charts. Because VAP events are uncommon, a g-chart should be considered as it is a relatively new SPC chart ideally suited for rare events [26,27,28]. Instead of monitoring the number of events like more traditional SPC charts, a g-chart plots the number of days between VAP events. The Center Line, Upper Control Limit, and Lower Control Limit for the g-chart are derived as follows [26,27,28]:

$$\text{Center Line (CL)} = \bar{X} \quad \text{(average number of days between VAP events)}$$

$$\text{Upper Control Limit (UCL)} = \bar{X} + 3\sqrt{\bar{X}(\bar{X}+1)}$$

$$\text{Lower Control Limit (LCL)} = \bar{X} - 3\sqrt{\bar{X}(\bar{X}+1)}$$

Note that in a g-chart, as the process improves, the number of days between events would increase, and the data will move above the Center Line. In our study's ICU, the average number of days between events was 14.0 days. The UCL formula above then yields a value of 57.5 days. Because the LCL value is negative, we set it to 0 as per convention. The downside of doing this is losing the ability to detect a rate increase.
Benneyan provides several ways to address a negative LCL (e.g., use a value smaller than 3) which can be useful [26,27]. As a caveat, the UCL and LCL are based on an approximation to the normal distribution which may not always be appropriate to a particular dataset. The g-chart for the study's ICU is presented in Figure 4. In our example, the first VAP event occurred on 1/6/06 and the y-coordinate of this point is at 6 as it is 6 days from the start of the study period. The second event occurred on 1/30/06 and is plotted as the second point from the left (y = 24); due to space constraints this date and some others are not shown. On 1/30/06 another event also occurred and so the third point has its y-coordinate at 0.
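The control limits quoted for the study ICU can be checked with a small sketch. It assumes the usual 3-sigma g-chart limits, mean ± 3·sqrt(mean·(mean + 1)), with a negative lower limit set to 0. The function name and the `sigma_multiplier` parameter are illustrative, not part of the original analysis.

```python
from math import sqrt


def g_chart_limits(mean_gap, sigma_multiplier=3.0):
    """Return (CL, UCL, LCL) for a g-chart of days between events.

    Assumed form: limits = mean_gap +/- multiplier * sqrt(mean_gap * (mean_gap + 1)),
    with a negative LCL truncated to 0 as per convention.
    """
    spread = sigma_multiplier * sqrt(mean_gap * (mean_gap + 1))
    center = mean_gap
    upper = mean_gap + spread
    lower = max(0.0, mean_gap - spread)
    return center, upper, lower


# Average of 14.0 days between VAP events, as reported in the text.
center, upper, lower = g_chart_limits(14.0)
print(f"CL = {center:.1f}, UCL = {upper:.1f}, LCL = {lower:.1f}")
```

Passing a smaller `sigma_multiplier` narrows the limits, which is one way to explore the alternatives to a truncated lower limit mentioned above.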

Figure 4.
G-Chart for number of days between ventilator-associated pneumonia (VAP) events
With a g-chart in place, accepted SPC "rules" can then be applied to determine unusual patterns indicating deteriorating or improving quality over time. For example, having 8 or more consecutive values below or above the CL is significant, as is 12 of 14 consecutive values on the same side of the CL or observing a value (i.e., a time period with no VAP events) more than 3 times the CL.
An interesting example of a g-chart application can be found in the work of Wall et al. studying catheter infections in the ICU [29]. They constructed a g-chart before and after a quality initiative was implemented. They observed no values above the baseline UCL during the pre-initiative period but several values above the UCL during the post-initiative period, and so concluded that the initiative had a significant impact on quality. Following this line of reasoning, we observe two periods in Figure 4 where there was a potentially significant improvement in VAP rates as they were both above the UCL, though these may also have been periods where some intermittent special cause variation entered or left the ICU.
Both of these "well-performing" episodes happened in the spring (and a poor-performing episode happened in the fall of 2006). This ICU had essentially the same personnel of intensivist physicians, nurses, and respiratory therapists, and compliance with established safety protocols for pneumonia patients was consistently close to 100%. The explanation for the consistent decline in VAP over time, and during the spring in particular, is unknown, though we did observe differences in case-mix of medical, surgical, and trauma patients throughout the year and note that these statistics do not adjust for case-mix (i.e., varying probabilities of getting VAP), which can distort the results. Thus, when a possible VAP cluster (or a cluster of any outcome with variable diagnostic criteria) arises, the first task should be to confirm the validity of the data by checking if a different person is doing surveillance, if there has been a change in diagnostic method, or if there are changes in the ICU case mix. In this study's ICU, the person doing surveillance and the diagnostic criteria remained constant, though case mix (particularly the prevalence of trauma patients) did vary over time.
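Two of these rules are easy to apply programmatically. The sketch below flags runs of 8 consecutive values on one side of the CL and any value above the UCL. The first three gaps follow the points described for Figure 4 (6, 24, and 0 days); the remaining gaps are hypothetical, and the function name `spc_signals` is illustrative only.

```python
def spc_signals(gaps, center, ucl, run_length=8):
    """Flag g-chart points that break two common SPC rules.

    Returns a list of (index, reason) pairs for values above the UCL and
    for the point that completes `run_length` consecutive values on the
    same side of the center line.
    """
    signals = []
    run_side, run_count = 0, 0
    for i, gap in enumerate(gaps):
        if gap > ucl:
            signals.append((i, "above UCL"))
        side = (gap > center) - (gap < center)   # +1 above, -1 below, 0 on the CL
        if side != 0 and side == run_side:
            run_count += 1
        else:
            run_side, run_count = side, (1 if side != 0 else 0)
        if run_count == run_length:
            label = "run above CL" if side > 0 else "run below CL"
            signals.append((i, label))
    return signals


# First three gaps as described for Figure 4; the rest are hypothetical.
example_gaps = [6, 24, 0, 10, 3, 5, 8, 2, 9, 4, 1, 60, 20]
flags = spc_signals(example_gaps, center=14.0, ucl=57.5)
print(flags)
```

A run below the CL points toward more frequent events (possible deterioration), while a value above the UCL points toward an unusually long event-free period.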

Alternative SPC Methods
A more common but more complex SPC method is the cumulative sum (CUSUM) chart. In this method, rather than plotting an individual statistic of interest (e.g., a mean value or days between events), a statistic based on cumulative sums is derived and plotted. In our context, a target VAP rate is determined a priori, perhaps based on an ICU's rate from the last year. Going forward, the differences between the current rate and the target (historic) rate are accumulated. If there is a constant shift above this rate, a visual shift is evidenced in the plot and there are methods to determine if the shift is statistically significant [30]. An advantage of the CUSUM chart over traditional SPC methods is that it can detect a shift sooner; i.e., it requires a shorter Average Run Length (ARL) [30,31]. A notable variant of the CUSUM is the Bernoulli CUSUM (BCUSUM) developed by Reynolds and Stoumbos [32].
Sego et al. compared a BCUSUM method to a variety of other SPC procedures that have been proposed to monitor changes in rates when the event of interest is rare [33]. Their conclusion was that the BCUSUM method was the most efficient in terms of the ARL needed to detect a true shift [33]. In a related paper, Joner et al. [34] compared the BCUSUM to a prospective scan statistic method described by Naus and Wallenstein [35]. They found that the BCUSUM was slightly more efficient in detecting a rate increase but the scan method was easier to use in practice. They also show how both methods can be used together for surveillance, and their methodology has much to recommend it [34].
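The basic idea can be sketched as follows. The first function accumulates deviations from the target rate as described above. The second is a standard one-sided tabular variant with an allowance term, included as an assumption and not taken from the source. All rates shown are hypothetical.

```python
def cusum_path(observed_rates, target_rate):
    """Running sum of (observed - target); a sustained upward drift
    suggests the rate has shifted above the target."""
    total, path = 0.0, []
    for rate in observed_rates:
        total += rate - target_rate
        path.append(total)
    return path


def upper_cusum(observed_rates, target_rate, allowance):
    """One-sided tabular CUSUM (assumed textbook form): deviations smaller
    than `allowance` are ignored and the sum never drops below zero."""
    value, path = 0.0, []
    for rate in observed_rates:
        value = max(0.0, value + rate - target_rate - allowance)
        path.append(value)
    return path


# Hypothetical quarterly VAP rates per 1,000 vent days against a target of 5.0.
hypothetical_rates = [5.0, 6.0, 7.0, 4.0]
plain = cusum_path(hypothetical_rates, target_rate=5.0)
one_sided = upper_cusum(hypothetical_rates, target_rate=5.0, allowance=0.5)
print(plain, one_sided)
```

In practice the one-sided path would be compared with a decision threshold chosen to give an acceptable ARL; selecting that threshold is outside the scope of this sketch.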

Case Mix and its Role in VAP
If case mix plays a substantial role in VAP rates, then a more advanced analysis based on risk-adjusted sequential probability ratios may be warranted, particularly in ICUs that treat a heterogeneous mix of patients over time. A practical example of this methodology can be found in [36]. An alternative approach is to derive standardized VAP ratios similar to standardized mortality ratios (i.e., to make each VAP case the event of interest instead of each hospital death) and to adjust with a regression model predicting VAP that is analogous to mortality prediction models such as APACHE IV or MPM III familiar to the intensive care community [37,38].
We reiterate that a hospital's VAP rate is a controversial measure of quality. There are safety protocols that physicians follow to prevent VAP and there are certain interventions that focus on reducing the risk of VAP. These include weaning protocols to limit the time spent on the ventilator, protocols that prevent the aspiration of contaminated secretions, and a judicious use of antibiotics to avoid digestive tract colonization by harmful bacteria. In addition, new technologies such as subglottic suctioning and silver-tipped endotracheal tubes may lower VAP rates [39,40]. The fact that the VAP rate has decreased both in this study's ICU and nationally, as evidenced by Figure 3, shows that VAP can be reduced. Nevertheless, some VAP cases are likely unavoidable (i.e., the difference between special cause and common cause variation) and physicians should carefully assess the patient's risk factors and the therapeutic interventions the patient did and did not receive when each VAP case occurs.

CONCLUSION
Scan statistics and control charts have been described in detail and applied to an ICU's experience with VAP during a 27-month period. Statistical methods discussed in this paper can be applied not only to VAP but also to other serious events such as central line infections, medication errors, or falls, to name just a few. Traditional statistical inference methods like the chi-square test and traditional SPC charts are often inappropriate for these scenarios. We encourage clinicians to adopt the new statistical methods of scan statistics, ratchet scan statistics, and g-type SPC charts or more advanced CUSUM methods for rare events in designing, implementing, and evaluating quality improvement initiatives.

NOMENCLATURE
k    A fixed number of events of interest that is < N, where N is the total number of events.
M    Number of consecutive months representing the seasonal trend in the ratchet scan statistic.
N    The total number of events in the study period.
P(k; N, w)    The probability of k events out of N events occurring during time window w.
R    Ratchet scan statistic.
$S_w$    The largest number of events in a window of fixed time length.
w    Normalized time length; defined as the ratio of time length of interest (e.g., 1 week) divided by the total time observed (e.g., 1 year).
$W_k$    The smallest interval of time that contains k events.

Journal of Healthcare Engineering • Vol. 1 • No. 4 • 2010

‡ Adapted with permission from the American Mathematical Society from material originally published in Neff N, Naus J. Selected Tables in Mathematical Statistics, Vol. 6: The Distribution of the Size of the Maximum Cluster of Points on a Line. American Mathematical Society, Providence, RI, copyright 1980.

Figure 2. Two years of new VAP cases by month from 2006 to 2007

Figure 4. G-Chart for number of days between ventilator-associated pneumonia (VAP) events

Table 1. P-values for the retrospective case of P(k; N, 0.08) [17] ‡

Table 1 shows that the p-value for 4 events out of 12 is 0.38887. Clinicians should find Table 1, with a window length w of 0.08, useful for their own ICUs as it corresponds to approximately 7 days out of 3 months, 1 month out of one year, or 2 weeks out of 6 months, etc.

Greek
α    Significance level of the hypothesis test