Comparison of Weight-Gain-Based Prediction Models for Retinopathy of Prematurity in an Australian Population

Purpose Four weight-gain-based algorithms are compared for the prediction of type 1 ROP in an Australian cohort: the weight, insulin-like growth factor, neonatal retinopathy of prematurity (WINROP) algorithm, the Children's Hospital of Philadelphia Retinopathy of Prematurity (CHOPROP) algorithm, the Colorado Retinopathy of Prematurity (CO-ROP) algorithm, and the postnatal growth, retinopathy of prematurity (G-ROP) algorithm. Methods A four-year retrospective cohort analysis of infants screened for ROP in a tertiary neonatal intensive care unit in Brisbane, Australia. The main outcome measures were sensitivities, specificities, and positive and negative predictive values. Results A total of 531 infants were included (mean gestational age 28+3 weeks). Of these, 24 infants (4.5%) developed type 1 ROP. The sensitivities, specificities, and negative predictive values, respectively, for type 1 ROP (95% confidence intervals) were for WINROP 83.3% (61.1–93.3%), 52.3% (47.8–56.7%), and 98.4% (96.1–99.4%); for CHOPROP 100% (86.2–100%), 46.0% (41.7–50.3%), and 100% (98.4–100%); for CO-ROP 100% (86.2–100%), 32.0% (28.0–36.1%), and 100% (98.3–100%); and for G-ROP 100% (86.2–100%), 28.2% (24.5–32.3%), and 100% (97.4–100%). Of the five infants with persistent nontype 1 ROP that underwent treatment, only CO-ROP was able to successfully identify all. Conclusions CHOPROP, CO-ROP, and G-ROP performed well in this Australian population. CHOPROP, CO-ROP, and G-ROP would reduce the number of infants requiring examinations by 43.9%, 30.5%, and 26.9%, respectively, compared to current ROP screening guidelines. Weight-gain-based algorithms would be a useful adjunct to the current ROP screening.


Introduction
Retinopathy of prematurity (ROP), a disease of the developing retinal vasculature of premature infants [1,2], is a significant cause of adverse events and morbidity such as retinal detachment and irreversible visual impairment [1,2].
Infants at risk of developing ROP undergo repeated retinal screening examinations to detect severe disease that requires treatment [3–5]. Current ROP screening guidelines recommend examination of infants below a certain gestational age (GA) and birth weight (BW), which are determined according to the local characteristics of the premature population and the quality of neonatal care [6] (e.g., GA less than 31 weeks and BW less than 1250 g in Queensland, Australia). Infants with a higher GA or BW than the screening cutoffs who have an unstable clinical course are also screened at the judgement of the neonatologist [7].
The detection yield of ROP screening is low. Several years of the Australian and New Zealand Neonatal Network Annual Reports [8] show that the large majority of infants screened for ROP have a low likelihood of developing severe disease (3-4%) [8], and this matches studies in other developed nations [9–13]. Current screening methods for ROP cause neonatal distress including hypertension, decreased oxygen saturation, and the oculocardiac reflex [4,14,15]. Other issues with the low detection yield of ROP screening include parental anxiety [4] and frequent hospital presentations or prolonged hospital admissions for screening [16]. As only a small number of infants examined require treatment for ROP, improving the detection of ROP has the potential to increase the cost-effectiveness of current ROP screening [7].
With a greater understanding of the pathophysiology of ROP, clinical studies have shown that prolonged early IGF-1 deficits are associated with a higher risk of subsequent sight-threatening ROP [1,2,17]. Deficiencies in IGF-1 lead to a hypoxic preclinical phase resulting in a more severe subsequent proliferative vascular clinical phase. However, as routine serial IGF-1 level monitoring would be challenging to obtain and costly, postnatal weight gain has been adopted as a surrogate [10].
Several screening algorithms using postnatal weight gain to reflect serum IGF-1 levels have been developed in the last decade [9,10,18–21]. These algorithms utilise the postnatal weight gain in combination with the GA and BW characteristics to signal an alarm that a particular infant has a high risk of developing severe ROP. Reflecting the variable and complex nature of this disease, these algorithms must be validated at a local level prior to being implemented into clinical practice. Therefore, further studies in Australian cohorts are required [7].
To our knowledge, no studies have directly compared the outcomes of all four algorithms for the same cohort. This study aims to compare the performance of four algorithms, namely CHOP-ROP (Children's Hospital of Philadelphia ROP) [18,19], WINROP (weight, insulin-like growth factor I, neonatal ROP) [10,20], CO-ROP (Colorado retinopathy of prematurity model) [21], and G-ROP (postnatal growth and retinopathy of prematurity) [9] in an Australian tertiary level NICU setting. The secondary objective is to estimate the impact of a more targeted screening process in reducing the number of examinations in low-risk infants to focus on high-risk infants.

Methods
This retrospective cohort study was conducted from January 2017 to December 2020 and included all premature infants admitted to the neonatal intensive care unit (NICU) at the Mater Hospital in Brisbane, Queensland, who underwent ROP screening (criteria: GA < 31 weeks and/or BW < 1250 g, or an unstable clinical course as determined by the treating neonatologist). Digital wide-field images and standard binocular indirect ophthalmoscopy (if required) were used to diagnose and classify ROP.
Weight data were collected from a review of the electronic records. Infants were excluded from this study if they had clinical conditions that cause nonphysiological weight gain (including hydrocephalus and severe subcutaneous oedema) or incomplete medical records (for example, infants transferred from another hospital without weight gain data, or infants who were having ongoing retinal screening but who died prior to the determination of the final ROP outcome).
Data collected included birthweight, gestational age, ROP outcomes including treatment, treatment modality, the postmenstrual age at the time of treatment, and all postnatal weight gain measurements until discharge from the ROP screening clinic. The ETROP study [22] classification was the basis for the categorisation in our study (no ROP, mild ROP, type 1 ROP, and type 2 ROP). Mild ROP is the presence of ROP that does not meet the criteria for type 1 or type 2 ROP.
Data were entered into the following four weight gain predictive algorithms according to their inclusion criteria.
WINROP.
Only infants born at less than 32 weeks of gestation, irrespective of the BW, are eligible to be entered into the WINROP algorithm, which is available online [10,20].

CHOPROP.
CHOPROP [19] requires documentation of the neonatal weight at the end of the second week for an infant to be included in the algorithm. Weight change in the first week was disregarded as per the original study. The daily weight gain rate was calculated from weekly measurements (the difference between the current weight and the previous week's weight, divided by 7). An alarm cutoff of ≥0.014 was used to identify neonates at risk of type 1 ROP.

CO-ROP.
Infants less than or equal to 30 weeks of GA and who have a birthweight of less than 1501 g are eligible to be evaluated by CO-ROP [21]. They also should not gain more than 650 g by the 28th day of life.
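As a rough illustration, the CO-ROP conditions described above can be written as a single predicate. This is only a sketch and not the published implementation: the function and argument names are hypothetical, GA is assumed to be expressed in completed weeks, and it is assumed that an alarm is raised only when all three conditions hold.

```python
def co_rop_alarm(ga_weeks: float, birth_weight_g: float, weight_day28_g: float) -> bool:
    """Sketch of the CO-ROP rule as described in the text (hypothetical helper).

    Assumption: the infant alarms only if all three conditions are met.
    """
    # Net weight gain from birth to the 28th day of life, in grams.
    net_gain_g = weight_day28_g - birth_weight_g
    return ga_weeks <= 30 and birth_weight_g < 1501 and net_gain_g <= 650


# Illustrative (made-up) infants, not data from the study.
print(co_rop_alarm(27, 900, 1400))   # slow weight gain in an eligible infant
print(co_rop_alarm(27, 900, 1600))   # gained more than 650 g by day 28
```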

G-ROP.
This is the most recently designed of the algorithms and was developed using the largest cohort of infants [9]. Infants meet the criteria for ROP screening if they meet any of the following six criteria: (1) birthweight <1051 g; (2) gestational age <28 weeks; (3) weight gain between day 10 and 19 <120 grams; (4) weight gain between day 20 and 29 <180 grams; (5) weight gain between day 30 and 39 <170 grams; (6) diagnosis of hydrocephalus. For infants not meeting the GA, BW, or hydrocephalus criteria for G-ROP, if there was no weight measurement on a particular measurement day (for instance, day 10, 19, 20, 29, 30, or 39), the nearest weight measurement (within 2 days) was used.
Diagnostic performances of all four algorithms were described by calculating sensitivity, specificity, PPV, NPV, and likelihood ratios. The Wilson method was used to determine the 95% confidence intervals for all calculations. We also sought to calculate the efficiency of these algorithms by calculating the reduction in the number of infants that would require eye examinations. For this, we proposed that the algorithms be used to decide whether infants would be screened, based on whether or not they alarmed. Infants that did not alarm would not undergo an eye examination.
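The six G-ROP criteria and the nearest-measurement rule described above can be sketched as follows. The helper names, the dictionary layout (day of life mapped to weight in grams), the tie-breaking towards the earlier day, and the choice to skip a window with no usable measurement are all assumptions made for illustration; they are not taken from the original G-ROP publication.

```python
from typing import Mapping, Optional

# (start day, end day, minimum weight gain in grams) for criteria 3 to 5.
GAIN_WINDOWS = ((10, 19, 120), (20, 29, 180), (30, 39, 170))


def nearest_weight(weights_by_day: Mapping[int, float], target_day: int,
                   tolerance_days: int = 2) -> Optional[float]:
    """Weight on target_day, else the closest measurement within tolerance_days."""
    candidates = [d for d in weights_by_day if abs(d - target_day) <= tolerance_days]
    if not candidates:
        return None
    # Assumption: ties are broken in favour of the earlier day.
    best_day = min(candidates, key=lambda d: (abs(d - target_day), d))
    return weights_by_day[best_day]


def g_rop_alarm(ga_weeks: float, birth_weight_g: float,
                weights_by_day: Mapping[int, float],
                hydrocephalus: bool = False) -> bool:
    """Return True if any of the six criteria listed in the text is met."""
    if birth_weight_g < 1051 or ga_weeks < 28 or hydrocephalus:
        return True
    for start_day, end_day, min_gain_g in GAIN_WINDOWS:
        w_start = nearest_weight(weights_by_day, start_day)
        w_end = nearest_weight(weights_by_day, end_day)
        if w_start is None or w_end is None:
            continue  # Assumption: a window without usable weights is skipped.
        if w_end - w_start < min_gain_g:
            return True
    return False
```

For example, an infant born at 29 weeks and 1200 g who gains only 100 g between day 10 and day 19 would alarm under criterion (3), whereas the same infant with gains of 130 g, 190 g, and 180 g across the three windows would not.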
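For readers who wish to reproduce this kind of analysis, the following is a minimal sketch of the diagnostic measures and of the Wilson score interval. The function names are hypothetical, and the 2 × 2 counts in the example are reconstructed from figures reported in this paper for G-ROP (531 infants, 24 with type 1 ROP, 143 infants not alarmed, none of them with type 1 ROP) rather than copied from Table 2.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple:
    """Wilson score interval for a proportion (z = 1.959964 gives 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, predictive values, and likelihood ratios."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sens / (1 - spec) if spec < 1 else math.inf,
        "lr_negative": (1 - sens) / spec if spec > 0 else math.inf,
    }


# Reconstructed G-ROP counts for type 1 ROP (an assumption, see lead-in).
g_rop = diagnostic_metrics(tp=24, fp=364, fn=0, tn=143)
sens_ci = wilson_interval(24, 24)    # approx. (0.862, 1.0)
spec_ci = wilson_interval(143, 507)  # approx. (0.245, 0.323)
print(g_rop, sens_ci, spec_ci)
```

With these counts the sketch gives a sensitivity of 100% (86.2–100%) and a specificity of 28.2% (24.5–32.3%), in line with the values reported for G-ROP.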
Ethical approval for the study was obtained from the Mater Research Governance (HREC/15/MHS/112).

Results
Five hundred thirty-one infants met the inclusion criteria. Figure 1 describes the study flow and reasons for exclusion from the study.
The 531 infants included in the study had a median BW of 1100 g (IQR 432 g) and a median GA of 28 weeks (IQR 3 weeks). Of the infants, 296 (55.7%) were male. Any ROP developed in 356 infants (67.0%), of whom 24 (4.5%) developed type 1 ROP, and all received intravitreal bevacizumab injections and/or laser retinal photocoagulation. A further 40 (7.5%) infants developed type 2 ROP, of whom 5 received late laser retinal photocoagulation for nonresolving activity/ongoing ischaemia. The demographics of included infants are found in Table 1.
Diagnostic performances of all four screening algorithms are shown in Table 2. All 531 infants were entered into CHOP-ROP, CO-ROP, and G-ROP to determine the risk of type 1 ROP (Table 2). Using WINROP, a total of 508 infants were entered (the remaining 23 infants were more than 32 weeks gestation, and none of these 23 infants developed type 1 or type 2 ROP).
All four algorithms have high negative predictive values (NPV) for type 1, type 2, and treated ROP. CO-ROP had the highest negative predictive values and 95% confidence intervals (95% CI) for type 1, type 2, and treated ROP, closely followed by G-ROP. For an infant with a negative CO-ROP result, the 95% CIs for the NPV for type 1 ROP, type 2 ROP, and treatment-requiring ROP were 98.3–100%, 94.7–99.4%, and 97.7–100%, respectively. Likelihood ratios for all algorithms screening type 1, type 2, treated, and any ROP were calculated and included in Table 2. We found that the negative likelihood ratios were consistent with the high negative predictive values for type 1 ROP and treated ROP.
If the algorithms were used to reduce the number of infants requiring examinations, G-ROP would have reduced the number of infants requiring examinations by 143 (26.9%), including 4 infants with type 2 ROP, of whom 1 required treatment. CO-ROP would have reduced the number by 162 (30.5%), including 3 infants with type 2 ROP, none of whom required treatment. With CHOPROP, 233 (43.9%) infants would not undergo examinations, including 8 infants with type 2 ROP, of whom 2 required treatment. WINROP would reduce the number of infants undergoing examinations by 252 (49.7%), including 13 infants with type 2 ROP, of whom 1 required treatment, and 4 infants with type 1 ROP (who all required treatment).
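The percentage reductions for the three algorithms that were applied to the full cohort follow directly from the number of infants who did not alarm. A short sketch of that arithmetic is shown here; the function name is hypothetical, and WINROP is left out because it was applied to a different number of infants.

```python
def exam_reduction_percent(not_alarmed: int, cohort_size: int) -> float:
    """Percentage of screened infants who would not need an eye examination."""
    return 100.0 * not_alarmed / cohort_size


COHORT_SIZE = 531
# Number of infants who did not alarm, as reported in the Results.
not_alarmed = {"G-ROP": 143, "CO-ROP": 162, "CHOPROP": 233}

for algorithm, count in not_alarmed.items():
    print(f"{algorithm}: {exam_reduction_percent(count, COHORT_SIZE):.1f}%")
```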

Discussion
We present, for the first time, a comparison of the sensitivities, specificities, predictive values, and likelihood ratios for multiple weight gain algorithms within a single Australian cohort.
Our study found a prevalence of 4.5% of type 1 ROP in infants screened for ROP between 2017 and 2020, which is similar to other prevalence studies in industrialised nations. The Postnatal Growth and Retinopathy of Prematurity (G-ROP) retrospective cohort study, conducted in 29 hospitals [11], found a prevalence rate of 6.1% of infants developing type 1 ROP. In a large Swedish cohort, the rate was 5.3% [12], and in a UK cohort, the prevalence rate was 4.0% [13].
In this study's Australian cohort, WINROP had the lowest sensitivity for detecting type 1 ROP. Of all the algorithms, WINROP has been the most extensively studied, with 36 studies across the world reporting sensitivities ranging from 55% to 100% [23]. There is one other study conducted within an Australian population, which found a sensitivity of 85.7%, a specificity of 59.0%, a PPV of 7.0%, and an NPV of 99.1% [24]. A recent systematic review found that WINROP has a sensitivity of 89%, a specificity of 57%, and a negative likelihood ratio of 0.19 [23]. There were three validation studies that found sensitivities less than 75% [25–27]. Zepeda-Romero et al. argued that their low sensitivity was because the cohort of infants studied were exposed to unmonitored supplemental oxygen that caused larger and more mature infants to develop severe ROP [26]. When the validation study was repeated, with NICU changes implementing monitoring with constant pulse oximetry, oxygen saturation targets of 85-95%, alarms for oxygen saturation at 90-95%, and education courses for medical and nursing staff, the authors found an increased sensitivity of 80% [28]. A Taiwanese study also noted poor sensitivity in WINROP, which was suggested to be a likely consequence of regional variation in expected weight gains between a South-East Asian premature infant and a European premature infant [27]. Models developed from small cohorts can be overfitted, resulting in undesirable outcomes when validation studies in other regions of the world are performed [7]. Such results highlight the need to perform validation studies within our own neonatal population prior to implementation into clinical practice.
We found the algorithms did not have 100% sensitivity in detecting type 2 ROP; however, it should be noted that WINROP and CHOPROP were designed to identify type 1 ROP only. In our cohort, we found a prevalence of 4.7% of persistent nontype 1 ROP that required treatment. This is similar to Koucheki et al. [44], who found 4.9% of infants in the Canadian population. These rates are lower than that found by Liu et al. [45], who performed a secondary analysis of data from the G-ROP study to look at the prevalence and indications for treating infants who did not meet ETROP type 1 ROP criteria. They found that of the 1004 eyes of 514 infants who received treatment for ROP, 126 eyes of 91 infants (0.8% of all eyes and 12.5% of treated eyes in G-ROP) were treated for nontype 1 ROP [45]. In Koucheki et al., the decision to treat the cases of nontype 1 ROP with unfavourable structural outcomes was made before postmenstrual age week 45 [44], whereas in our study, this occurred from postmenstrual age week 45 and up to postmenstrual age week 55. Of the five infants with persistent nontype 1 ROP that underwent treatment, only CO-ROP was able to successfully identify all (CHOPROP alarmed 3 out of the 5 infants). As there are no clear current guidelines on whether to treat or observe persisting nontype 1 ROP, the clinical experience of the ophthalmologist and the patient factors ultimately determine management. This limits the utility of these algorithms in these neonates.
The use of weight-gain-based algorithms could be a useful adjunct to screening by reducing the number of infants requiring examinations. We found a reduction in eye examinations with the utilisation of weight-gain-based ROP algorithms ranging from 26.9% for G-ROP to 43.9% for CHOPROP. For CHOPROP, CO-ROP, and G-ROP, our study found similar percentages to other studies in the reduction of the number of eye examinations needed [19,20,23,29–35]. Our study and other worldwide studies show that weight-gain-based ROP algorithms would help reduce the workload for ROP screening by better targeting premature infants most at risk for developing severe ROP.
To date, validation studies have predominantly reported only sensitivities and specificities and have not included other evaluations of screening tests such as predictive values and likelihood ratios [9,10,18–21]. Sensitivity and specificity alone do not allow an individual clinician to decide whether a particular infant can be safely excluded from screening, especially if the infant has a high pretest probability of developing severe ROP (such as extremely preterm or extremely low birthweight/adverse clinical course). We report on likelihood ratios as they can be a better way to apply the results of diagnostic tests to the individual patient [46].
There are some limitations in this study. Due to its retrospective nature, there was no method to standardise the data collection; for example, a difference of 1 day in the weight gain measurement can modify the alarm thresholds (particularly for WINROP and CHOPROP). In addition, 63 of 594 infants (10.6%) were excluded, predominantly because there were not enough postnatal weight measurements to be able to determine an alarm risk. This is a limitation as some algorithms require longitudinal weight measurements, and infants who have had their neonatal care in a different hospital and are subsequently transferred do not have complete postnatal weight gain data. This data loss has been noted as an issue previously [25]. Furthermore, retrospectively assessing clinical records limits the ability to accurately differentiate physiological from nonphysiological weight gains.
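To illustrate how a likelihood ratio is applied to an individual infant, the sketch below converts a pretest probability into a post-test probability via odds. The function name is hypothetical. The example uses the 4.5% cohort prevalence of type 1 ROP and the pooled WINROP negative likelihood ratio of 0.19 quoted above from the systematic review [23]; the 20% pretest probability for a higher-risk infant is an arbitrary value chosen only for illustration.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability using odds."""
    if not 0.0 <= pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must be in [0, 1)")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Average infant in this cohort with a negative (no alarm) result.
print(post_test_probability(0.045, 0.19))  # just under 1%

# Hypothetical higher-risk infant with the same negative result.
print(post_test_probability(0.20, 0.19))   # roughly 4.5%
```

The same negative result therefore leaves a noticeably higher residual risk in an infant whose pretest probability was high, which is the point made in the paragraph above.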
A new ROP predictive model has been proposed known as DIGIROP [47].Tis model utilises data available at birth for greater convenience and was demonstrated to be as accurate as CHOPROP, WINROP, and CO-ROP [48].Once the latest version becomes available, further validation studies would be useful.
Our findings suggest that weight-gain-based ROP predictive models could play a role as an adjunct to ROP screening and add to reported sensitivities and specificities worldwide. Weight-gain-based ROP predictive models would improve the balance between reducing screenings and ensuring timely intervention so that healthcare systems can allocate resources more efficiently. Caution should be advised if these algorithms are used as screening criteria, as 3 of the 4 algorithms missed at least one infant that required treatment. Weight-gain-based algorithms are also limited as around 10% of infants were excluded due to interhospital transfer, death, and nonphysiological weight gain.
The direction of future studies in weight-gain-based ROP predictive models should not only continue to study the efficacy of these algorithms in the local population but also report the negative predictive values and negative likelihood ratios in addition to sensitivity and specificity. Future studies could also assess how easily weight-gain-based algorithms can be incorporated into ROP screening clinics, such as by measuring the time added to the clinic list preparation when utilising these algorithms. The cost-effectiveness of weight-gain-based ROP predictive models could be determined from the number of infants that would not require screening. Prospective studies should be considered to collect an increasing amount of data that can optimise the criteria that weight-gain-based ROP predictive models are based on. From these studies, it can be determined whether weight-gain-based ROP predictive models can shift the cutoff screening criteria for ophthalmoscopy screening to a lower gestational age and birth weight.

Figure 1: Flowchart for patients included in the validation comparison of weight-gain-based prediction models for retinopathy of prematurity in an Australian population.

Table 1: Demographics of infants included in the study.