Novice Reviewers Retain High Sensitivity and Specificity of Posterior Segment Disease Identification with iWellnessExam™

Introduction. Four novices to Spectral Domain Optical Coherence Tomography (SD-OCT) image review were provided a brief lecture on the interpretation of iVue iWellnessExam™ findings (available on iVue® SD-OCT, Optovue, Inc., Fremont, CA). For a cohort of 126 Confirmed Normal and 101 Confirmed Disease subjects, iWellnessExam™ OD, OS, and OU reports were provided. Each novice independently reviewed and sorted the subjects into one of four categories: normal, retinal disease, optic nerve (ON) disease, and retinal + ON disease. Accuracy was compared among the novices and against an expert reviewer. Results. Posterior segment disease was properly detected by novices with sensitivities of 90.6% (any disease), 84.3% (retinal disease), and 88.0% (ON disease); the corresponding expert sensitivities were 96.0%, 95.5%, and 90.0%. Specificity was 84.3% for the novices and 99.2% for the expert. Novice accuracy correlates best with clinical exposure and amount of time spent reviewing each image set. The novices' negative predictive value was 92.0% (i.e., very few false negatives). Conclusions. Novices can be trained to screen for posterior segment disease efficiently and effectively using iWellnessExam™ data, with high sensitivity, while maintaining high specificity. Novice reviewer accuracy covaries with both clinical exposure and time spent per image set. These findings support exploration of training nonophthalmic technicians in a primary medical care setting.


Introduction
A recently published article [1] on the specificity and sensitivity of disease identification utilizing the iVue iWellnessExam™ test revealed that the data provided were sufficient for a well-trained eye clinician to review and accurately detect disease in a very high percentage of subjects with retinal and/or optic nerve (ON) disease and to accurately confirm health in an extremely high percentage of healthy controls. This SD-OCT scan obtains a substantial amount of data for the simultaneous assessment of both central retina and optic nerve integrity [2-7] (see the previous study for details [1]). A follow-up pilot study was undertaken with the same set of data to determine whether novice review of the same SD-OCT data is an effective way to identify retinal and/or optic nerve disease and to confirm health in normal subjects.
The previous study was designed to measure the specificity and sensitivity of a well-trained optometric clinician, utilizing only data obtained on the iWellnessExam™ test, in the identification of retinal and optic nerve disease in a cohort of Confirmed Normal (CN) and Confirmed Disease (CD) subjects. Specificity data were obtained by evaluating patients within the Primary Care clinic at the University Eye Center (UEC) at SUNY State College of Optometry who were determined to be both without retinal and without ON disorder (CN subjects). Sensitivity data were obtained by evaluating patients within the Ocular Disease and Special Testing Service at the UEC with known central retinal and/or optic nerve disorders (CD subjects). All glaucoma suspects were excluded from evaluation. SUNY IRB approval was obtained prior to the initiation of the study, and all subjects signed a SUNY IRB-approved informed consent document.

Materials and Methods
Two groups of patients were examined: a "Confirmed Normal" (CN) cohort for the specificity aspect of the study (126 subjects) and a "Confirmed Disease" (CD) cohort for the sensitivity aspect of the study (101 subjects). Of the CD patients, 67 had retinal pathology and 50 had ON pathology; sixteen (16) fell into both categories, with both retinal and ON pathology. No "glaucoma suspects" were included for evaluation, as their status as a normal or as an ON pathology subject could not be clearly established. Data were obtained in the previous study, utilizing the iVue SD-OCT, which scans at 26,000 A-scans/second with an axial resolution of 5 microns [8]. All analyses were made utilizing the iWellnessExam™, a one-step SD-OCT scan that images a 7 mm × 7 mm area of the posterior pole centered on the fovea. The iWellnessExam™ report provides eight high-resolution cross-sectional retinal images, along with its data analysis results: a full retinal thickness map, a ganglion cell complex (GCC) map, and a report on Superior/Inferior (S/I) symmetry within the eye and on symmetry between eyes. Note that these scans were obtained and reviewed before the release of the normative database for the iVue system.
Four individuals who were novices at reviewing SD-OCT images were enlisted to participate in the clinical review of this data set. Each novice had a different level of clinical and educational experience in the ophthalmic field. The four novices attended a single 1.5-hour lecture given by author JS on the nature of the data obtained on the iVue iWellnessExam™ and on the interpretation of both numerical and pictorial data. Prior to this lecture, none of the novices had any exposure to the iVue system. Subject data sets were given a randomized code number, which served as the only identifier for each subject. Reviewers did not have access to any supplementary patient history, demographic, or clinical data. The novice reviewers were instructed to classify each subject into one of four categories: (1) normal, (2) retinal disease, (3) ON disease, and (4) retinal + ON disease. They were also asked to record the amount of time spent in review sessions so that the time spent per image could be estimated.

Results
Demographics and pathologies are listed in the previous article [1].
Novice reviewers accurately identified disease (sensitivity) in 90.6 ± 6.3% of CD subjects and accurately identified health (specificity) in 84.3 ± 5.2% of CN subjects, utilizing only the iWellnessExam™ data. See Table 1 and Figures 1 and 2 for a detailed display of reviewer sensitivity and specificity data. Overall sensitivity for ocular disease improved with academic experience level.
Data were also evaluated for predictive value. These are measures of the reliability of a positive or a negative result on a test. Positive predictive value (PPV) is the percent of time that a positive test result will indicate disease. PPV is calculated as the number of true positives relative to the number of subjects who were identified as "positive" for the condition in question. Negative predictive value (NPV) is the percent of time that a negative test result will indicate health. NPV is calculated as the number of true negatives relative to the number of subjects who were identified as "negative" for any condition.
(Thus, a reviewer with high sensitivity for a disease, but who tends to over-refer, will identify more subjects as positive for a test than are truly positive. This will adversely impact the PPV.) All novice reviewers demonstrated a greater PPV for the general category of disease than for either subcategory and a greater PPV for retinal disease than for ON disease (see Figure 3). This implies that overreferrals for disease primarily occurred in subjects who had only retinal disease (category 2) but were classified as category 4 (retinal + ON disease). The novices on the whole perform well on the most important factor: appropriate referral of patients who have any disease (82.4 ± 5.0%, with a range of 78-89%). Retinal disease overreferrals in patients with ON disease appear to abate with optometric education (3rd year more successful than 1st year at correctly identifying retinal disease). ON disease overreferral remains somewhat elevated in patients who have retinal disease. By contrast, all reviewers performed with a high NPV, ranging from 85% to 98% (see Figure 4; Table 2). If the novices identified a patient as normal, there was a 92.0 ± 4.8% chance that disease was not present.
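As a rough consistency check on the definitions above, the following sketch derives the PPV and NPV implied by the mean novice sensitivity (90.6%) and specificity (84.3%) for the cohort sizes of this study (101 CD, 126 CN). The function name `predictive_values` is hypothetical, and the calculation assumes that the mean rates can be applied directly to the whole cohort, which yields fractional "counts"; the reported values are means over four individual reviewers, so only approximate agreement should be expected.

```python
def predictive_values(sensitivity, specificity, n_disease, n_normal):
    """Return (PPV, NPV) implied by a sensitivity/specificity pair and cohort sizes.

    Illustrative helper (not from the original study); counts may be fractional
    because they are derived from mean rates rather than per-reviewer tallies.
    """
    tp = sensitivity * n_disease   # diseased subjects flagged as diseased
    fn = n_disease - tp            # diseased subjects called normal
    tn = specificity * n_normal    # normal subjects called normal
    fp = n_normal - tn             # normal subjects flagged as diseased
    ppv = tp / (tp + fp)           # true positives / all positive calls
    npv = tn / (tn + fn)           # true negatives / all negative calls
    return ppv, npv


# Mean novice rates and cohort sizes as reported in the text.
ppv, npv = predictive_values(0.906, 0.843, n_disease=101, n_normal=126)
print(f"implied PPV ~ {ppv:.1%}, implied NPV ~ {npv:.1%}")
```

Under these assumptions the implied values fall within about one percentage point of the reported 82.4% (PPV) and 92.0% (NPV). Because both predictive values depend on the proportion of diseased subjects in the cohort (here 101 of 227), they would be expected to differ in a population with a different disease prevalence.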
With a small sample of novice reviewers, and with variations in their educational backgrounds, it is not easy to rank their relative exposures to ophthalmic conditions and expected disease identification ability. Their performance was therefore plotted in Receiver Operating Characteristic (ROC) space, which relates each reviewer's false positive rate (1 − specificity) to their true positive rate (sensitivity). See Figure 5. Best overall performance is defined by minimizing the false positives while maximizing sensitivity, with the most desirable performance being plotted at the top left corner of the ROC space. ROC plots were used to compare (1) expert performance for overall disease and for the two subcategories of retinal and optic nerve disease (Figure 5(a)) and (2) the novices with the expert and with each other (Figure 5(b), all disease; Figure 5(c), retinal disease; Figure 5(d), optic nerve disease). For ease of comparison, the two-dimensional ROC findings are also presented as an accuracy rating. Accuracy is calculated as the sum of the true positives and true negatives divided by the sum of the total number of positives and negatives. Figure 6 compares the novices' accuracy, arranged by relative amount of time spent in optometric education. Figure 7 also compares their accuracy, rearranged to reflect their relative amount of clinical exposure time.
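The ROC coordinates and the accuracy rating described above can be sketched as follows, using the overall sensitivities and specificities reported for the novices (mean) and the expert. The helper names are hypothetical, and the distance to the top left corner is an illustrative summary added here, not a measure used in the study. Accuracy is again derived from mean rates applied to the cohort sizes (101 CD, 126 CN), which is an assumption.

```python
import math


def roc_point(sensitivity, specificity):
    """Map a reviewer to ROC space: (false positive rate, true positive rate)."""
    return 1.0 - specificity, sensitivity


def accuracy(sensitivity, specificity, n_disease, n_normal):
    """(true positives + true negatives) / (all positives + all negatives)."""
    tp = sensitivity * n_disease
    tn = specificity * n_normal
    return (tp + tn) / (n_disease + n_normal)


def distance_to_ideal(sensitivity, specificity):
    """Euclidean distance from the ROC point to the ideal corner (FPR=0, TPR=1).

    Illustrative only: smaller means closer to the top left corner.
    """
    fpr, tpr = roc_point(sensitivity, specificity)
    return math.hypot(fpr, 1.0 - tpr)


# Overall ("any disease") rates as reported in the text.
reviewers = {"novices (mean)": (0.906, 0.843), "expert": (0.960, 0.992)}

for name, (sens, spec) in reviewers.items():
    fpr, tpr = roc_point(sens, spec)
    acc = accuracy(sens, spec, n_disease=101, n_normal=126)
    dist = distance_to_ideal(sens, spec)
    print(f"{name}: FPR={fpr:.3f}, TPR={tpr:.3f}, "
          f"accuracy={acc:.1%}, distance to corner={dist:.3f}")
```

Collapsing the two ROC coordinates into a single accuracy figure makes reviewers easy to rank, at the cost of hiding whether their errors are mainly false positives or false negatives.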

Time Spent per Image.
Novice reviewers were asked to record the time they spent performing image review.
The novices conducted image review over an average of 4 sittings (ranging from 2 to 6 sittings) and spent an average of 59 ± 13 sec per image set (range 49 to 77 sec per image set). See Table 3. There does seem to be a correlation between the amount of time spent per image set and the accuracy of the subject categorization among novices (see Figure 8).

Effective Screening.
Above all, a screening protocol needs to be capable of disease detection. The data obtained on iWellnessExam™ may complement the clinical data obtained in the course of a routine exam [1, 9-12]. Once disease is detected or suspected, appropriate referrals can be made for follow-up testing and clinical evaluation. The results here show that individuals who are novices at reviewing SD-OCT images can be trained in a short amount of time to achieve an impressive rate of detection of the presence of posterior segment disease, while maintaining high specificity for the affirmation of health in control subjects, using only the data provided on iWellnessExam™. Another study evaluating the learning curve of a novice relative to an expert in imaging interpretation showed a similar learning effect with good accuracy when compared to the expert [13]. A study comparing problem-based learning with more conventional teaching methods concluded that problem-based learning produces better educational results [14]. Thus, in a clinical environment, an ongoing feedback process between the evaluating clinician and the detecting technician should help technicians learn to interpret scans with even greater levels of accuracy.

Educational versus Clinical
In the present pilot study, the novices may be ranked from an educational standpoint as A < B < C < D (refer to Methods). However, from a clinical exposure standpoint, the amount of contact time with patients and with review of typical clinical data may be ranked A < C < D < B, as the pre-1st-year technician has had 4 years of exposure to an ophthalmic environment and has collected clinical data from a typical cross section of the population. Assuming this technician is representative of the value of clinical learning, her performance supports the findings on the value of problem-based learning in medical education [11, 14].
In some regards, the 1st-year student (C) may have been at a relative disadvantage in identifying normal subjects, as he had spent a year in ophthalmic research, exposed to challenging cases of ophthalmic disease with subtle findings. This may have predisposed him to identify disease, even in subtle cases, but not to identify health (i.e., reduced specificity).

Interpreting Accuracy and the ROC Plots.
The ROC plots provide a two-dimensional perspective on reviewer accuracy at a glance. The pre-1st-year optometry student, who has experience as an ophthalmic technician (pink square, Figure 5), consistently outperformed the other novices. This performance supports the need for clinical exposure to general practice in order to help students contextualize their clinical observations. The 1st-year and nonoptometric reviewers have similar levels of clinical exposure. While they make different errors in reviewing the data in Figures 5(b) and 5(c) (one has more false positives with lower sensitivity; the other has fewer false positives with higher sensitivity), their performance is similar in terms of accuracy (see Figure 7).

Time Invested on Image Review.
The novices were asked to report on the amount of time spent reviewing the data and the number of sittings. Novices B and D took a longer amount of time reviewing each subject's data set (which consisted of 3 image files). There is an apparent correlation between the amount of time invested in image review and the accuracy of the overall categorization exercise. Interestingly, this correlation appears strongest for retinal disease (R² = 0.98), which requires a higher level of image scrutiny than the determination of optic nerve disease (R² = 0.78).
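To illustrate how a correlation strength of this kind might be obtained, the sketch below computes the coefficient of determination for a straight-line fit of accuracy against time per image set. It assumes that the values 0.98 and 0.78 quoted above are coefficients of determination from a simple linear fit, which the text does not state explicitly. The per-reviewer numbers in the example are hypothetical placeholders, not the study data (the actual times are in Table 3), and the function name `r_squared` is likewise an assumption.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear fit of y on x.

    For a one-predictor least-squares line this equals the squared
    Pearson correlation coefficient.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# HYPOTHETICAL placeholder values for four reviewers (not the study data):
seconds_per_image_set = [50.0, 55.0, 60.0, 75.0]
accuracy_fraction = [0.80, 0.84, 0.86, 0.93]

print(f"R^2 = {r_squared(seconds_per_image_set, accuracy_fraction):.2f}")
```

With only four reviewers, each fit rests on four points, so a coefficient of this kind is best read as suggestive of a trend rather than as a precise estimate.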

Challenges Predicting Optic Nerve Disease in the Presence of Retinal Disease.
The reduced PPV and reduced sensitivity for patients with optic nerve disease, as compared to retinal disease, may be attributed to the challenges of assessing RNFL in the presence of an irregular outer retina, or even inner retinal disturbances, such as vitreoretinal adhesions. The interpretation of the ganglion cell complex in the presence of retinal pathology has been explored in detail elsewhere [1, 15].

Conclusions
The iWellnessExam™ offers the health care provider a very reliable technology for the clinical identification of eyes at risk. Novices can be trained in a short amount of time to effectively use the data from the iWellnessExam™ to screen for disease with a high rate of sensitivity, while maintaining high specificity. Accuracy of the novice reviewers covaries with both clinical exposure and time spent on image review per subject.

Future Directions
This study shows a small sample of novice reviewers with different levels of clinical and educational exposure. It would be insightful to undertake this review with a larger sample of students at various stages of optometric education. In the interest of public health, a similar study could be undertaken with training of nonophthalmic medical technicians, to explore the potential for the identification of eye disease in patients who do not seek routine eye care but do manage their health with primary medical providers. Indeed, it is often an eye exam which results in medical referrals following the identification of retinal pathology.