Glaucoma Diagnostic Accuracy of Machine Learning Classifiers Using Retinal Nerve Fiber Layer and Optic Nerve Data from SD-OCT

Purpose. To investigate the diagnostic accuracy of machine learning classifiers (MLCs) using retinal nerve fiber layer (RNFL) and optic nerve (ON) parameters obtained with spectral domain optical coherence tomography (SD-OCT). Methods. Fifty-seven patients with early to moderate primary open angle glaucoma and 46 healthy patients were recruited. All 103 patients underwent a complete ophthalmological examination, achromatic standard automated perimetry, and imaging with SD-OCT. Receiver operating characteristic (ROC) curves were built for RNFL and ON parameters. Ten MLCs were tested. Areas under ROC curves (aROCs) obtained for each SD-OCT parameter and MLC were compared. Results. The mean age was 56.5 ± 8.9 years for healthy individuals and 59.9 ± 9.0 years for glaucoma patients (P = 0.054). Mean deviation values were −1.4 dB for healthy individuals and −4.0 dB for glaucoma patients (P < 0.001). SD-OCT parameters with the greatest aROCs were cup/disc area ratio (0.846) and average cup/disc (0.843). aROCs obtained with classifiers varied from 0.687 (CTREE) to 0.877 (RAN). The aROC obtained with RAN (0.877) was not significantly different from the aROC obtained with the best single SD-OCT parameter (0.846) (P = 0.542). Conclusion. MLCs showed good accuracy but did not improve the sensitivity and specificity of SD-OCT for the diagnosis of glaucoma.


Introduction
Primary open-angle glaucoma is a chronic disease characterized by a progressive optic neuropathy and degeneration of the retinal nerve fiber layer (RNFL), resulting in a distinct appearance of the optic nerve head (ONH) and concomitant visual field (VF) loss. Examinations of the RNFL and the ONH are recognized as valuable methods of diagnosing early glaucoma, since these changes are often detectable before VF loss [1]. Some studies have shown that as many as half of the retinal ganglion cells can be lost before standard automated perimetry (SAP) shows a VF defect [2,3]. In recent years, several methods have emerged for the objective assessment of RNFL thickness and ONH topography [4].
Optical coherence tomography (OCT), first described by Huang et al. in 1991 [5], has been widely accepted in glaucoma management [6]. The Cirrus spectral domain OCT (SD-OCT) (Carl Zeiss Meditec Inc., Dublin, CA), one of the commercially available SD-OCT instruments, has an axial resolution of 5 μm and a scan speed of 27,000 A-scans per second. The scanning area covers 6 mm × 6 mm × 2 mm, analyzing both RNFL thickness and ONH topography. This SD-OCT provides faster scanning than previous time domain OCTs (TD-OCT) [7].
Machine learning classifiers (MLCs) have been developed since 1962 [8] and have been used in ophthalmology research since 1990 [9]. MLCs train computerized systems to detect the relationship between multiple input parameters, eventually facilitating the diagnosis of a condition. Some reports suggest that MLCs are as good as [10–13] or better than [14–20] currently available techniques for glaucoma diagnosis. In a recent study [21], we demonstrated that MLCs using RNFL thickness measurements obtained with SD-OCT show good diagnostic accuracy. However, they did not improve the sensitivity and specificity of RNFL parameters alone. In a subsequent study, we analyzed the accuracy of MLCs using RNFL and VF parameters [22]. The purpose of this study is to evaluate the sensitivity and specificity of MLCs using both RNFL and ONH parameters measured by SD-OCT for the diagnosis of glaucoma.

Subjects.
This was a prospective, observational, cross-sectional study. We analyzed 103 eyes of 103 participants (46 healthy control subjects and 57 patients with glaucoma), all of them older than 40 years, at the Glaucoma Service of the University of Campinas (UNICAMP), Brazil. Each participant had a complete ophthalmic evaluation that included medical history, best corrected visual acuity (BCVA), slit lamp biomicroscopy, measurement of intraocular pressure (IOP) with Goldmann tonometry, gonioscopy, dilated slit lamp fundus examination with a 78-diopter lens, SAP using the standard 24-2 Swedish interactive threshold algorithm (SITA) (Humphrey Field Analyzer II, Carl Zeiss Meditec Inc., Dublin, CA), and imaging with the Cirrus SD-OCT. All patients had participated in two other studies published previously by our group [21,22]. However, after an upgrade of the Cirrus software (5.1.1.6), which allows the analysis of the ONH, a change in the signal strength of almost all OCT images was observed. We therefore modified the inclusion criteria, decreasing the minimum signal strength to 6 (instead of 7), which resulted in a total of 103 eligible eyes from the 110 original participants.
Participants of both groups had a BCVA better than or equal to 20/40, spherical refraction within ±5.0 diopters (D), cylinder correction within ±3.0 D, open angles on gonioscopy, and reliable SAPs with false-positive errors <33%, false-negative errors <33%, and fixation losses <20%. We excluded all eyes with retinal diseases, uveitis, pseudophakia or aphakia, nonglaucomatous optic neuropathy, and significant cataract according to the criteria of the Lens Opacification Classification System III (LOCSIII) [23], defined as the maximum nuclear opacity (NC3, NO3), cortical (C3), and subcapsular (P3). If both eyes were eligible, one eye was randomly selected.
The inclusion criteria for healthy eyes were IOP ≤ 21 mmHg with no history of elevated IOP or glaucoma cases in the family and two consecutive and reliable normal visual fields.
The inclusion criteria for glaucomatous eyes were two or more IOP measurements >21 mmHg and a glaucomatous VF defect confirmed in two recent and reliable examinations. Eyes with glaucomatous VF defects were defined as those that met two of the following criteria: (1) cluster of 3 points with a probability of <5% on a pattern deviation map in a single hemifield, including at least 1 point with a probability of <1%; (2) glaucoma hemifield test outside 99% of the age-specific normal limits; and (3) pattern standard deviation outside 95% of the normal limit. The severity of glaucomatous damage was classified into (a) mild damage: mean deviation (MD) ≥ −6 dB; (b) moderate damage: MD between −6 dB and −15 dB; (c) advanced damage: MD ≤ −15 dB. Glaucomatous eyes with advanced damage were excluded from this study.
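The VF rules above can be expressed compactly in code. The following is a minimal sketch, not the procedure used in the study: the function and argument names are hypothetical, and "met two of the following criteria" is assumed here to mean at least two of the three.

```python
def is_glaucomatous_vf(cluster_defect: bool,
                       ght_outside_limits: bool,
                       psd_outside_limits: bool) -> bool:
    """Return True when at least two of the three VF criteria are met.

    Assumption: "two of the following criteria" is read as "two or more".
    Each flag is expected to be evaluated beforehand from the SAP printout.
    """
    return sum([cluster_defect, ght_outside_limits, psd_outside_limits]) >= 2


def severity_from_md(md_db: float) -> str:
    """Classify glaucomatous damage from the mean deviation (MD, in dB)."""
    if md_db >= -6.0:
        return "mild"        # MD >= -6 dB
    if md_db > -15.0:
        return "moderate"    # MD between -6 dB and -15 dB
    return "advanced"        # MD <= -15 dB (excluded from this study)
```

For instance, an eye with MD = −4.0 dB falls in the mild category, whereas an eye with MD = −15 dB is labeled advanced and would have been excluded.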
We respected the Declaration of Helsinki and obtained an informed consent from all participants. The study was approved by the University of Campinas Medical Institutional Review Board.

Optical Coherence Tomography.
All subjects had RNFL thickness and ONH topography measured with the Cirrus SD-OCT (software version 5.1.1.6). The ONH mode consists of a 3-dimensional dataset of 200 A-scans that are derived from 200 B-scans and analyzes a 6 mm × 6 mm area centered on the optic disc. The software creates an RNFL thickness map from the 3-dimensional cube dataset and centers the disc. Subsequently, it extracts a circumpapillary circle with a radius of 1.73 mm for RNFL thickness measurements. The SD-OCT provides RNFL thickness maps with 4 quadrants (superior, inferior, nasal, and temporal), 12 clock hours, and average thickness measurements. All RNFL clock-hour measurements were aligned according to the orientation of the right eye. Hence, clock hour 3 of the circumpapillary scan represented the nasal side of the optic disc for both eyes. The 5.1.1.6 software also allows the measurement of ONH parameters, such as rim area, disc area, average cup/disc ratio, vertical cup/disc ratio, and cup volume. We created an additional parameter, the cup/disc area ratio, defined as (disc area − rim area)/disc area. The end of Bruch's membrane is defined as the disc margin and is identified from the 3-dimensional cube dataset. The rim width around the circumference of the optic disc edge is determined by measuring the amount of neuroretinal tissue in the optic nerve [6]. We excluded all poor-quality scans, assessed on printouts, with (a) incorrect identification of the vitreoretinal surface, (b) horizontal eye motion within the measurement circle in the en face image, or (c) misidentification of Bruch's membrane. Only well-centered scans with a signal strength between 6 and 10 were included. All images were acquired with undilated pupils by a single, well-trained ophthalmologist, masked to the diagnosis.
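The derived cup/disc area ratio and the right-eye alignment of clock hours can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, areas are taken to be in mm², and left-eye clock hours are assumed to be numbered as on a clock face, so that mirroring hour h to 12 − h places the nasal side at clock hour 3.

```python
def cup_disc_area_ratio(disc_area_mm2: float, rim_area_mm2: float) -> float:
    """Compute (disc area - rim area) / disc area, as defined in the text."""
    if disc_area_mm2 <= 0:
        raise ValueError("disc area must be positive")
    if not 0 <= rim_area_mm2 <= disc_area_mm2:
        raise ValueError("rim area must lie between 0 and the disc area")
    return (disc_area_mm2 - rim_area_mm2) / disc_area_mm2


def to_right_eye_clock_hour(hour: int, is_left_eye: bool) -> int:
    """Map a clock hour to right-eye orientation.

    Assumption: a left eye is mirrored about the vertical axis, so hours 12
    and 6 stay in place and every other hour h becomes 12 - h.
    """
    if not 1 <= hour <= 12:
        raise ValueError("clock hour must be between 1 and 12")
    if not is_left_eye or hour == 12:
        return hour
    return 12 - hour
```

With a disc area of 2.0 mm² and a rim area of 1.5 mm², the ratio is 0.25; the nasal clock hour 9 of a left eye maps to clock hour 3.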

Machine Learning Classifiers.
Ten MLC algorithms were tested using 23 parameters measured with the SD-OCT (17 RNFL and 6 ONH). The following MLCs were tested: bagging (BAG), naïve Bayes (NB), linear support vector machine (SVML), Gaussian support vector machine (SVMG), multilayer perceptron (MLP), radial basis function (RBF), random forest (RAN), ensemble selection (ENS), classification tree (CTREE), and AdaBoost M1 (ADA). The rationale behind each MLC was explained in a previous paper [21]. Initially, the classifiers were trained with all 23 SD-OCT parameters. Then, backward feature selection was used to find the smallest number of parameters that resulted in the best accuracy. The analysis started with the full-dimensional feature set, sequentially deleted the feature with the worst accuracy (based on the aROC), and restarted a new analysis.
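The backward selection described above can be sketched in a few lines. This is an illustrative outline rather than the Weka procedure used in the study: `backward_elimination`, `single_feature_auc`, and `evaluate` are hypothetical names, and `evaluate` stands in for whatever routine returns the cross-validated aROC of a classifier trained on a given feature subset.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def backward_elimination(
    single_feature_auc: Dict[str, float],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Drop the feature with the worst individual aROC, one at a time.

    Returns the smallest subset that reached the best classifier aROC.
    """
    # Order features from best to worst individual aROC.
    remaining = sorted(single_feature_auc,
                       key=lambda name: single_feature_auc[name],
                       reverse=True)
    best_subset, best_score = list(remaining), evaluate(remaining)
    while len(remaining) > 1:
        remaining = remaining[:-1]      # delete the worst remaining feature
        score = evaluate(remaining)     # restart the analysis on the subset
        if score >= best_score:         # ">=" favors the smaller subset on ties
            best_subset, best_score = list(remaining), score
    return best_subset, best_score


# Toy illustration with made-up numbers (not study data).
toy_auc = {"cup_disc_area_ratio": 0.85, "average_thickness": 0.78, "disc_area": 0.55}


def toy_evaluate(features: Sequence[str]) -> float:
    """Hypothetical stand-in for a cross-validated classifier aROC."""
    useful = {"cup_disc_area_ratio", "average_thickness"}
    chosen = set(features)
    return 0.70 + 0.05 * len(useful & chosen) - 0.01 * len(chosen - useful)


subset, score = backward_elimination(toy_auc, toy_evaluate)
```

In the toy run, removing the uninformative feature raises the score, and removing a useful one lowers it, so the two informative features are retained.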
Weka software version 3.7.7 (Waikato Environment for Knowledge Analysis, the University of Waikato, New Zealand) was used to develop all 10 classifiers. Both receiver operating characteristic (ROC) curves and the calculation of the area under the ROC curve (aROC) were obtained using this software. We used the 10-fold cross-validation resampling method to maximize the use of our data. All eyes were randomly divided into 10 subsets, each containing approximately the same number of healthy and glaucomatous eyes. Nine subsets were used for training the classifiers, while the remaining subset was used for testing the classification performance.
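The stratified 10-fold partition can be sketched as follows. This is a minimal illustration and not Weka's implementation; the function names, the label coding (0 = healthy, 1 = glaucoma), and the fixed random seed are assumptions made for the example.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def stratified_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin across folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists: k - 1 folds for training, 1 for testing."""
    for held_out, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# 46 healthy (0) and 57 glaucomatous (1) eyes, as in this study.
labels = [0] * 46 + [1] * 57
folds = stratified_folds(labels)
splits = list(train_test_splits(folds))
```

With these group sizes, each fold receives 4 or 5 healthy eyes and 5 or 6 glaucomatous eyes, and every eye is used for testing exactly once.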

Statistical Analysis.
MedCalc software version 12.3.0 (MedCalc Software, Mariakerke, Belgium) was used for all analyses. Continuous variables were compared using the Student's t-test and categorical variables were analyzed using the chi-square test.
aROCs were obtained for all 23 SD-OCT parameters: average thickness, 4 quadrants (superior, inferior, nasal, and temporal), and 12 clock-hour RNFL thickness measurements, rim area, disc area, cup/disc area ratio, average cup/disc, vertical cup/disc, and cup volume measurements. aROCs obtained for each SD-OCT parameter and each machine learning classifier, before and after optimization, were compared using the test. P values <0.05 were considered to be statistically significant.
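For readers less familiar with the aROC, it equals the probability that a randomly chosen glaucomatous eye receives a higher score than a randomly chosen healthy eye, with ties counted as one half. The sketch below computes it directly from that definition; it is an illustration with a hypothetical function name and made-up numbers, not the MedCalc or Weka routine used in the study.

```python
from typing import Sequence


def area_under_roc(positive_scores: Sequence[float],
                   negative_scores: Sequence[float]) -> float:
    """aROC as the fraction of (positive, negative) pairs ranked correctly.

    Higher scores are assumed to indicate disease. For parameters that
    decrease with disease (e.g., RNFL thickness), negate the values first.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up example values (not study data).
example_auc = area_under_roc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])

# Thickness falls with disease, so the values are negated before scoring.
thickness_auc = area_under_roc([-70.0, -75.0], [-90.0, -95.0])
```

In the first example, 8 of the 9 possible pairs are ranked correctly, giving an aROC of 8/9; in the second, every glaucomatous eye is thinner than every healthy eye, giving an aROC of 1.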

Results
One hundred and three eyes of 103 patients were enrolled in this study; 46 of them were healthy eyes and 57 glaucomatous eyes.
The clinical characteristics of the study population are shown in Table 1. The mean age was 56.5 ± 8.9 years for healthy individuals and 59.9 ± 9.0 years for glaucoma patients (P = 0.054). There was no significant difference between groups regarding IOP (14.7 ± 2.6 mmHg and 13.8 ± 2.5 mmHg, respectively) (P = 0.100), but glaucoma patients were using a mean of 2.0 ± 1.1 medications to lower IOP. Mean MD values were −1.4 ± 1.6 dB for healthy individuals and −4.0 ± 2.4 dB for glaucoma patients (P < 0.001). Among the glaucoma patients, 86.0% had early VF damage and 14.0% had moderate VF damage.
Table 2 compares the mean SD-OCT values in both groups. All SD-OCT parameters were significantly different between the groups, except for the 3, 4, and 9 o'clock positions and disc area. The SD-OCT parameters with the greatest aROCs were the cup/disc area ratio (0.846) and the average cup/disc ratio (0.843) (Table 3). aROCs obtained with the classifiers varied from 0.687 (CTREE) to 0.877 (RAN) (Table 4). The best aROC, obtained with RAN trained with 13 parameters (0.877), was not significantly different from the aROC obtained with the best single optic nerve SD-OCT parameter, the cup/disc area ratio (aROC = 0.846; P = 0.542) (Figure 1), or from the aROC obtained with the best single retinal nerve fiber layer SD-OCT parameter, average thickness (aROC = 0.783; P = 0.094).

Discussion
Statistical comparisons between the various published studies that investigate the glaucoma diagnostic accuracy are difficult because of different demographic distributions, inclusion and exclusion criteria, OCTs and MLCs employed and, mainly, the severity of glaucoma.
Since their inception, MLCs have been studied in combination with several devices designed to improve the diagnosis of glaucoma, such as TD-OCT [11,15,17,19], SD-OCT [21,22], Heidelberg Retina Tomograph (HRT) [16,18,24], Scanning Laser Polarimetry (GDx) [14], and VF [12,17,18,20,22]. In the scientific literature, we identified six relevant studies involving the structural analysis of the ONH with OCT and/or HRT associated with MLCs in order to improve the accuracy in the diagnosis of glaucoma. Similar to previous reports, our study demonstrated that reducing the number of OCT parameters improved the performance of MLCs [10,11,16,19,21,22,24]. However, there is disagreement about the superiority of MLCs over isolated parameters for the diagnosis of glaucoma.
As far as we know, this study was the first to use MLCs with both RNFL and ONH data obtained from SD-OCT in an attempt to improve glaucoma diagnostic accuracy. We found that the best classifier was RAN (aROC = 0.877) and the best individual parameter from SD-OCT was the cup/disc area ratio (aROC = 0.846), with no statistical difference between them (P = 0.542). We consider those results as reflecting a good diagnostic accuracy, especially because 86% of the glaucomatous eyes were classified as having mild VF damage (MD > −6 dB).
Burgansky-Eliash et al. used five MLCs built with RNFL, ONH, and macular data from TD-OCT (a total of 38 parameters, of which only the 8 parameters with the best correlation with MD were used). They examined 42 healthy eyes and 47 glaucomatous eyes (among these, 27 eyes with early glaucoma and 20 eyes with advanced glaucoma). The healthy subjects were significantly younger than the glaucoma patients (P = 0.001) and the mean VF MD of the glaucomatous eyes was −6.4 dB. They concluded that the aROC obtained with the best classifier, SVM (0.981), was not significantly different from the aROC obtained with the best single ONH parameter of OCT, the rim area (0.969) (P = 0.07). On the other hand, the aROC obtained with SVM was significantly larger than those of the best single RNFL parameter, average thickness (0.938), and of mean macular thickness (0.839) (P = 0.01 and P < 0.001, respectively) [11]. In our study, we did not observe differences between aROCs obtained with RNFL parameters compared with those obtained with classifiers. However, the previous study used VF information to reduce the number of OCT parameters, which could have introduced bias by adding functional information to a classifier that should have been based exclusively on structural data.
In a study from our group, Vidotti et al. compared the performance of 17 RNFL parameters from SD-OCT and 10 MLCs in discriminating between 48 healthy and 62 glaucomatous eyes. The best individual parameter was the inferior quadrant (aROC = 0.813), the best classifier trained with all OCT parameters was SVMG (aROC = 0.795), and the best classifier trained with two SD-OCT parameters was BAG (aROC = 0.818) (P = 0.93) [21]. Similar to our study, a large proportion (82.3%) of their [...] parameters (P = 0.038). SVM performance based on this input was also better than the performance of the average RNFL thickness (P = 0.013) [15].
Likewise, Huang et al. tested three MLCs in order to improve the accuracy of glaucoma diagnosis based on RNFL and ONH data obtained with TD-OCT. They analyzed 100 normal individuals and 89 glaucomatous patients with early VF damage (MD > −6 dB). The inferior quadrant thickness was the best individual OCT parameter (aROC = 0.832) and Mahalanobis was the best MLC (aROC = 0.849). However, that paper reports no statistical comparison between the aROCs obtained with the inferior quadrant and with Mahalanobis [19].
Naithani et al. evaluated the relationship between RNFL and ONH from TD-OCT and HRTII and compared three TD-OCT-based MLCs with those built into the HRTII for detection of glaucomatous damage. OCT and HRT evaluate the ONH with two different scanning techniques. Furthermore, they use distinct reference planes to define where the cup begins, which may cause a difference in the measured values of all ONH parameters evaluated by the two modalities. They evaluated 60 normal eyes and 60 glaucomatous eyes, 30 of those with early glaucoma and 30 with moderate glaucoma. LDA was the best MLC-OCT parameter (aROC = 0.982) and the FSM function was the best MLC-HRT parameter (aROC = 0.859). Although there was no statistical comparison between those values, they concluded that OCT algorithms perform better than HRT-based formulas in distinguishing patients with early or moderate glaucoma from normal subjects [24].
Finally, Townsend et al. aimed to assess the performance of seven classifiers trained on HRTIII parameters for discriminating between 60 healthy eyes and 140 glaucomatous eyes. The classifiers were trained on all 95 variables and on smaller sets created with backward elimination. The aROC was calculated for classifiers, individual parameters, and HRTIII glaucoma probability scores (GPS). Vertical cup/disc ratio was the individual parameter with the best performance (aROC = 0.848), global GPS was the best GPS parameter (aROC = 0.829), and SVMr showed significant improvement over both (aROC = 0.904) (P = 0.018 and P = 0.006, respectively). They concluded that MLCs can provide a significant improvement in HRTIII diagnostic power over single parameters and GPS [16].
Our study has limitations, including a limited sample size in both groups and the use of the 10-fold cross-validation resampling method, which maximizes the use of our data but relies on the same population to train the MLCs and to test their performance.
In conclusion, the MLCs built with RNFL and ONH data did not improve the sensitivity and specificity of the Cirrus SD-OCT for the diagnosis of mild to moderate glaucoma in this population, even though a good diagnostic accuracy was observed. Further studies with a larger sample, a pool of new structural OCT parameters, and new classifiers may improve the accuracy for the diagnosis of glaucoma.