Segmentation and Classification of Glaucoma Using U-Net with Deep Learning Model

Glaucoma is the second most common cause of blindness worldwide and the third most common in Europe and the USA. Around 78 million people were living with glaucoma in 2020, and 111.8 million are expected to have it by 2040. In developing nations, 90% of glaucoma cases go undetected, so a detection system for early diagnosis is essential. In this research, early prediction of glaucoma using a deep learning technique is proposed, and the ORIGA dataset is used to evaluate the model on glaucoma images. The U-Net architecture is implemented for optic cup segmentation, and a pretrained transfer learning model, DenseNet-201, is used for feature extraction along with a deep convolutional neural network (DCNN). The DCNN is used for classification, and the final result indicates whether the eye is affected by glaucoma or not. The primary objective of this research is to detect glaucoma from retinal fundus images, which helps determine whether a patient is affected by glaucoma. The model is evaluated using accuracy, precision, recall, specificity, and F-measure. A comparative analysis is also conducted to validate the proposed model against other current deep learning models used for CNN classification, namely VGG-19, Inception ResNet, ResNet 152v2, and DenseNet-169. The proposed model achieved 98.82% accuracy in training and 96.90% in testing, and it performed better than the compared models on every metric analyzed.


Introduction
It is important to diagnose glaucoma early, since early diagnosis can reduce damage and loss of vision and ensure prompt and appropriate care. The worldwide prevalence of glaucoma for people aged 40 to 80 years is 3.54%. One in 200 individuals aged 40 has glaucoma, which rises to one in eight by age 80 [1]. Various glaucoma-related risk factors have been established, the most significant being elevated intraocular pressure (IOP), which damages the optic nerves and blood vessels. If glaucoma is left untreated, it can lead to total damage to the optic nerves and cause vision loss. This gradual and complete damage to the optic nerves is often accompanied by only mild or no symptoms, so glaucoma is known as the "sneak thief of sight" [2].
Glaucoma is one of the most common causes of irreversible vision loss after cataracts worldwide, accounting for 12 percent of all blindness cases each year. The number of people affected by glaucoma between the ages of 40 and 80 is expected to rise to 111.8 million by 2040. Furthermore, 2.4 percent of all people and 4.7 percent of those aged 70 and up are at risk of developing the disorder. Glaucoma is defined as the degeneration of retinal ganglion cells (RGCs) caused by a variety of disorders. RGC degeneration may result in two major health concerns:
(i) Structural changes to the optic nerve head (ONH) and the nerve fiber layer
(ii) Concurrent functional failures of the field of vision
These two effects of glaucoma can induce peripheral vision loss and, if left unchecked, blindness. There is no cure for glaucoma; it can only be managed through early detection and treatment. It is therefore essential to develop automated techniques for detecting glaucoma early [3].
A retinal fundus image is an essential tool for documenting the health of the optic nerve, vitreous, macula, retina, and blood vessels. Ophthalmologists use a fundus camera to take the retinal image, which is then used to diagnose eye diseases such as glaucoma. Glaucoma is a significant cause of global blindness that cannot be cured. The disease can change the shape of the cup region, which is the center portion of the ONH, and these changes can be used as an early indicator of glaucoma. The ONH transmits visual information from the retina to the brain [2]. Figure 1 shows the retinal fundus images.
There are no initial glaucoma symptoms, but the disease gradually damages the optic nerves and eventually results in blindness. Thus, it is crucial to detect glaucoma as early as possible to prevent visual damage. Physiologically, glaucoma is indicated by increased optic cup excavation. As the optic cup grows, its size relative to the optic disc changes, and this relation is known as the cup-to-disc ratio (CDR). Ophthalmologists can therefore assess glaucoma progression using the measured CDR value. Segmenting the optic cup and the optic disc makes it possible to calculate the CDR from the retinal image [3]. The most noticeable symptom of glaucoma is often a loss of side vision, which might go unnoticed as the condition progresses.
This is why glaucoma is sometimes referred to as the sneaky thief of vision. In the case of extreme intraocular pressure levels, headache, sudden eye pain, impaired vision, or the formation of halos around lights might occur. Other symptoms include the following [4]:
(i) Loss of vision
(ii) Eye redness
(iii) Hazy eyes (specifically in infants)
(iv) Vomiting or nausea
(v) Vision narrowing (tunnel vision)
There are many forms of glaucoma, including angle-closure glaucoma (ACG), primary open-angle glaucoma (POAG), primary congenital glaucoma (PCG), normal-tension glaucoma (NTG), pseudoexfoliative glaucoma (XFG), traumatic glaucoma (TG), uveitic glaucoma (UG), pigmentary glaucoma (PG), and neovascular glaucoma. The forms vary between different ethnicities in intensity, complexity, and occurrence. Open-angle and angle-closure glaucoma are the two major forms [4]. Figure 2 shows the optic nerve head structure.
The most common form is open-angle glaucoma, also referred to as wide-angle glaucoma. It results from a partial blockage of the drainage canal, in which the pressure rises slowly because the fluid is not properly drained. Symptoms start with vision loss in the periphery and may not be detected until central vision is impaired. Angle-closure glaucoma, often called acute glaucoma, is caused by a sudden and complete blockage of aqueous drainage. The pressure increases rapidly, which quickly leads to vision loss. It arises from a narrow drainage angle and a small, droopy iris. The iris is pulled into the anterior angle of the eye against the trabecular meshwork (drainage canals), leading to blockage and forward bulging of the iris [5].
In most situations, this damage is caused by an abnormal rise of the pressure inside the eye. In healthy eyes, the secretion rate equals the drainage rate. Glaucoma occurs when the drainage canal is partially or entirely blocked, leading to a surge in pressure, known as intraocular pressure, that affects the optic nerves, which relay signals to the brain where visual information is perceived. If this damage is left untreated, complete blindness will result. Hence, it is essential to diagnose glaucoma at an early stage.
In this research, early prediction of glaucoma using a deep learning technique is proposed. The ORIGA dataset is used to evaluate the proposed model on glaucoma images. For segmentation, the U-Net model is implemented, and a pretrained transfer learning model, DenseNet-201, is used for feature extraction along with a deep convolutional neural network (DCNN). The DCNN is used for classification, and the final result indicates whether the eye is affected by glaucoma or not.

Related Works
Several models have been developed by various authors for the segmentation and classification of glaucoma, each employing a different methodology and algorithm. As detailed below, the majority of them are deep learning-based models with varying levels of performance. Retinal disease is a serious ailment, and it is difficult to detect and to distinguish between the two conditions, normal and glaucomatous. The most common approach used in the studies to diagnose glaucoma was the acquisition of retinal scans using digital capture equipment. The scan images were then preprocessed to equalize the anomalies. During the preprocessing stage, blood vessels were segmented and depicted in order to create a vessel-free image. Furthermore, feature extraction was used to reduce the dimensions of an image so that its regions of interest could be represented as a compact feature vector suitable for precisely classifying the large amount of data collected. Techniques such as textures, pixel intensity values, FFT coefficients, and histogram models were employed for feature extraction and classification. Classification was carried out by examining the numerical features of an image, and the dataset was divided into classes, such as normal or glaucoma, based on the results.
Prastyo et al. applied the U-Net segmentation technique to retinal fundus images in order to segment the optic cup. Segmenting the optic cup and the optic disc helps improve performance in the detection of glaucoma. The region of interest (ROI) based on the optic disc image was cropped and segmented with the U-Net algorithm. To obtain optimal training, an adaptive learning rate optimization technique was applied, and the model attained a Dice coefficient of 98.42 percent and a loss of 0.15 percent during testing [6].
Li et al. proposed an attention-based CNN (AG-CNN) for identifying glaucoma and tested it on the large-scale attention-based glaucoma (LAG) database. Large amounts of redundancy in fundus images may reduce the accuracy and reliability of glaucoma identification, and the AG-CNN model was designed to take this into account. The overall model combined subnets for attention prediction, pathological region localization, and classification. For glaucoma detection, the model achieved an accuracy of 96.2 percent and an AUC of 0.983. In several cases, the ROI was only partially highlighted, and the minor problematic regions were not correctly identified [7].
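The Dice coefficient reported for the U-Net segmentation above measures the overlap between a predicted mask and its ground truth. The following is a minimal sketch in Python, assuming binary masks stored as nested lists of 0/1; the function name and the example masks are hypothetical and are not taken from the cited study.

```python
def dice_coefficient(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks (nested lists of 0/1)."""
    flat_pred = [p for row in pred for p in row]
    flat_truth = [t for row in truth for t in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(flat_pred, flat_truth) if p and t)
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_truth if t)
    # Two empty masks agree perfectly; define Dice as 1.0 in that case.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical 2 x 3 masks: 3 foreground pixels each, 2 of them shared.
pred_mask = [[0, 1, 1],
             [0, 1, 0]]
true_mask = [[0, 1, 0],
             [0, 1, 1]]
dice_score = dice_coefficient(pred_mask, true_mask)
print(round(dice_score, 4))
```

A value of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground pixels.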
To segment glaucoma images automatically, MacCormick et al. developed a new glaucoma detection algorithm based on spatial detection. The method was developed on the basis of four considerations: the segmentation, deformation, shape, and size of the images. After the cup and disc of the retinal fundus images were segmented, the cup/disc ratio (CDR) was estimated in 24 cross sections to generate the pCDR (CDR profile). The results were compared between healthy and glaucomatous discs on both internal and external validation, with an AUROC of 99.6 percent for internal validation and 91 percent for external validation [8].
Juneja et al. proposed an artificial intelligence glaucoma expert system based on the segmentation of the optic cup and disc. To automate the identification of glaucoma, a deep learning architecture was designed with a CNN as the core element. In this model, two neural networks were integrated and used to segment the optic disc and cup in fundus images taken with different cameras. On a set of 50 images, the model segmented the cup with 93 percent accuracy and the disc with 95.8 percent accuracy [9]. To diagnose glaucoma in retinal fundus images, Diaz-Pinto et al. used five ImageNet-trained models: VGG-16, VGG-19, ResNet50, Inception-v3, and Xception. The performance study revealed that the Xception model outperformed the other models, and it was then tested on five publicly accessible glaucoma datasets to confirm its superiority. The Xception model was also more computationally efficient than the other commonly used models [10].
Using deep learning, Sreng et al. developed an automated two-stage model for glaucoma diagnosis and classification. First, the optic disc area was segmented using the DeepLabv3+ architecture, with the encoder segment replaced by several deep CNNs. For classification, a trained DCNN was employed with three approaches: transfer learning, learning feature descriptors with an SVM, and an ensemble of the transfer learning and feature descriptor learning techniques. Integrating the DeepLabv3+ and MobileNet architectures made it possible to segment the optic discs. Five separate glaucoma datasets were used in the classification process, which was done using an ensemble of algorithms. Finally, on the ACRIMA dataset, the combination of DeepLabv3+ and MobileNet achieved an accuracy of 99.7 percent for optic disc (OD) segmentation and 99.53 percent for classification [11].
To diagnose diabetic retinopathy, Mateen et al. developed a fundus image classification model that combined VGG-19 with principal component analysis (PCA) and singular value decomposition (SVD). The model's performance in region segmentation, feature extraction and selection, and classification was improved by combining the Gaussian mixture model with the VGG, PCA, and SVD [12,13]. Fu et al. employed two deep learning-based techniques to detect the presence of glaucoma: the multilabel segmentation network (M-Net) and the disc-aware ensemble network (DENet). M-Net was used to segment both the optic cup and the optic disc, and DENet was used to combine the deep hierarchical context of the global fundus image with the local optic disc region. The CDR was calculated from the segmentation of the optic cup and disc in order to determine the glaucoma risk. With DENet, it is possible to obtain accurate results from an image without segmenting it [13].
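Several of the studies above derive the CDR from the segmented optic cup and disc. As a rough sketch, the ratio can be computed from two binary masks; the version below assumes the common vertical-diameter definition (cup height divided by disc height), which is an assumption rather than the definition used by any specific cited work, and all names and masks are hypothetical.

```python
def vertical_extent(mask):
    """Number of rows between the first and last row containing foreground pixels."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    return 0 if not rows else rows[-1] - rows[0] + 1


def cup_to_disc_ratio(cup_mask, disc_mask):
    """Vertical CDR: cup height over disc height, both measured in pixels."""
    disc_height = vertical_extent(disc_mask)
    if disc_height == 0:
        raise ValueError("disc mask is empty")
    return vertical_extent(cup_mask) / disc_height


# Hypothetical 8 x 8 masks: a disc spanning rows 1..6 and a cup spanning rows 3..5.
disc = [[1 if 1 <= r <= 6 and 1 <= c <= 6 else 0 for c in range(8)] for r in range(8)]
cup = [[1 if 3 <= r <= 5 and 3 <= c <= 5 else 0 for c in range(8)] for r in range(8)]
cdr = cup_to_disc_ratio(cup, disc)
print(cdr)
```

An area-based ratio could be substituted by counting foreground pixels instead of measuring row extents.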
Jiang et al. developed a new multipath recurrent U-Net model for segmenting retinal fundus images. The efficiency of the model was validated on two segmentation tasks: optic cup and disc segmentation, and retinal vessel segmentation. Using the Drishti-GS1 dataset, the model achieved 99.67% accuracy for optic disc segmentation, 99.50% for optic cup segmentation, and 96.42% for retinal vessel segmentation [14].
Mahum et al. proposed an early-stage glaucoma diagnosis model based on deep learning-based feature extraction. Images were preprocessed in the first phase before the region of interest was retrieved using segmentation. Then, using hybrid feature descriptors, namely CNN, histogram of oriented gradients (HOG), local binary patterns (LBP), and speeded up robust features (SURF), the characteristics of the optic disc, including the optic cup, were recovered from the images. HOG was used to extract low-level features, the LBP and SURF descriptors were used to extract texture features, and the CNN was used to compute high-level characteristics. The maximum relevance minimum redundancy technique was applied for feature selection and ranking. Finally, multiclass classifiers such as SVM, KNN, and random forest were used to determine whether fundus images were healthy or diseased [15].
Gheisari et al. proposed a new method for detecting glaucoma that combined temporal (dynamic vascular) and spatial (static structural) data. They developed a classification model combining a CNN and a recurrent neural network (RNN) that extracts not only the spatial features in the fundus images but also the temporal features inherent in consecutive images. A CNN on its own diagnoses glaucoma from the spatial information encoded in images, so the CNN was combined with an RNN to improve glaucoma detection based on both temporal and spatial features [16].

Proposed Methodology
In this research, deep learning-based models are proposed for the segmentation and classification of glaucoma using retinal fundus images collected from the ORIGA database. For segmentation, the U-Net architecture is used, and a pretrained DenseNet-201 architecture is used to extract features from the segmented image. For classification, the DCNN architecture is used to classify the images for glaucoma detection.
There are three convolutional blocks each in the contracting and expansive paths. A block in the contracting path consists of two convolutional layers followed by a max-pooling layer with a pool size of 2 × 2. A block in the expansive path contains a 2 × 2 upsampling layer, a concatenation with the corresponding block from the contracting path (i.e., a merged layer), a dropout layer, and two convolutional layers. The connecting path includes two convolutional layers. The final output layer is a 1 × 1 convolutional layer with a single filter and a sigmoid activation, which outputs pixel-wise class scores. Every convolutional layer in blocks 1, 2, and 3 of the contracting path includes 112, 224, and 448 filters, respectively, while those in blocks 5, 6, and 7 of the expansive path include 224, 122, and 122 filters, respectively. Every convolutional layer in the connecting path has 448 filters. The proposed network differs from the original U-Net in the number of filters per convolutional layer, which was chosen so that the model fits into GPU memory, and in the use of dropout in the expansive path.
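To make the block layout above easier to follow, the sketch below traces the feature-map shape through each stage. It is a bookkeeping aid in plain Python rather than a trainable network. It assumes "same" padding in every convolution and a 256 × 256 input (the text does not state either for the U-Net), and the function name is hypothetical; the filter counts are the ones quoted in the paragraph above.

```python
def trace_unet_shapes(size, down_filters, bridge_filters, up_filters):
    """List (stage, height, width, channels) per stage, assuming 'same' padding."""
    if len(down_filters) != len(up_filters):
        raise ValueError("contracting and expansive paths need the same number of blocks")
    if size % (2 ** len(down_filters)) != 0:
        raise ValueError("input size must be divisible by 2 ** number_of_blocks")
    shapes, skips = [], []
    height = width = size
    for index, filters in enumerate(down_filters, start=1):
        shapes.append((f"block {index} conv", height, width, filters))
        skips.append(filters)                        # kept for the later merge
        height, width = height // 2, width // 2      # 2 x 2 max pooling
    shapes.append(("connecting path", height, width, bridge_filters))
    channels = bridge_filters
    block = len(down_filters) + 2
    for filters in up_filters:
        height, width = height * 2, width * 2        # 2 x 2 upsampling
        merged = channels + skips.pop()              # concatenation with the skip
        shapes.append((f"block {block} merge", height, width, merged))
        shapes.append((f"block {block} conv", height, width, filters))
        channels = filters
        block += 1
    shapes.append(("output 1x1 conv", height, width, 1))  # single sigmoid filter
    return shapes


shapes = trace_unet_shapes(256, [112, 224, 448], 448, [224, 122, 122])
for stage in shapes:
    print(stage)
```

Under these assumptions the connecting path operates on 32 × 32 feature maps, and the output has the same spatial size as the input with a single channel of pixel-wise scores.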

DenseNet-201 with CNN.
A DCNN model with a pretrained DenseNet-201 is proposed in this research [21]. The DenseNet-201 model is based on deep transfer learning (DTL) and is used to identify glaucoma images in the input dataset by classifying the retinal fundus images. A pretrained DenseNet-201 model is used to extract features from the dataset, and the DCNN model is used for classification. The input image size is 256 × 256. The architecture of DenseNet-201 with the DCNN is shown in Figure 4.
A DCNN usually performs better with a larger dataset than with a smaller one. Transfer learning (TL) can be useful in CNN applications where the dataset is not large. For applications with comparatively small datasets, TL reuses a model learned from large datasets such as ImageNet. This removes the need for a large dataset and reduces the lengthy training time required when a deep learning model is trained from scratch. TL is a deep learning method that uses a model trained for one task as the starting point for training a model for a similar task. It is typically much quicker and simpler to fine-tune a TL network than to train a network from scratch. By leveraging common models that have already been trained on large datasets, TL allows models to be trained with a small amount of similar labeled data, so training time and computing resources can be significantly decreased. With TL, the model does not need to be trained for as many epochs (an epoch being one complete training pass over the entire dataset) as a new model.

Journal of Healthcare Engineering
Because features can be reused by different layers, DenseNet-201 uses a condensed network that yields easy-to-train and highly parameter-efficient models, increases the variety in the input of subsequent layers, and improves performance. DenseNet-201 has shown remarkable results on various datasets, such as CIFAR-100 and ImageNet. To boost connectivity, direct connections from each previous layer to every subsequent layer are added in the DenseNet-201 model, as shown in Figure 5. The concatenation of features can be expressed mathematically as
$$fc_i = NL_i\left(\left[fc_0, fc_1, \ldots, fc_{i-1}\right]\right). \tag{1}$$
Here, $NL_i(\cdot)$ is a nonlinear transformation that can be described as a composite function of batch normalization (BN), followed by a rectified linear unit (ReLU) and a 3 × 3 convolution layer.
For ease of implementation, $[fc_0, fc_1, \ldots, fc_{i-1}]$ denotes the concatenation of the feature maps produced by layers 0 to $i-1$ into a single tensor. For downsampling purposes, the network architecture is organized into dense blocks separated by layers known as transition layers, each consisting of BN followed by a 1 × 1 convolution layer and a 2 × 2 average pooling layer. The growth rate of DenseNet-201, denoted by the hyperparameter "H", determines how much each layer contributes to the dense architecture. Because of its structure, in which the feature maps are regarded as the global state of the network, DenseNet-201 performs adequately well even with a small growth rate. Each successive layer therefore has access to all feature maps of the preceding layers. Each layer adds "H" feature maps to the global state, so the number of input feature maps at the $i$th layer, $(fm)_i$, is expressed as
$$(fm)_i = H_0 + H \times (i - 1).$$
Here, $H_0$ is the number of channels in the input layer. To increase computational efficiency, a 1 × 1 convolution layer is added before each 3 × 3 convolution layer, which reduces the number of input feature maps, usually higher than the number of output feature maps H. The 1 × 1 convolution layer is known as the bottleneck layer and generates feature maps. Fully connected (FC) layers act as the classifier in the classification stage. The classifier uses the extracted features and assesses the probability of a segment in the image. The architecture of DenseNet-201 is shown in Figure 6.
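The growth of the feature-map count inside a dense block can be checked with a few lines of arithmetic. In the sketch below, each layer in a block sees the input channels H_0 plus H feature maps from every earlier layer, that is, H_0 + H × (i − 1) at layer i. The concrete numbers (64 initial feature maps, growth rate 32, blocks of 6, 12, 48, and 32 layers, and transition layers that halve the channel count) are the standard published DenseNet-201 configuration and are not stated in this paper; the function names are hypothetical.

```python
def dense_layer_inputs(h0, growth_rate, num_layers):
    """Input feature-map count seen by layers 1..num_layers: h0 + growth_rate * (i - 1)."""
    return [h0 + growth_rate * (i - 1) for i in range(1, num_layers + 1)]


def densenet_output_channels(init_features, growth_rate, block_config, compression=0.5):
    """Channel count after the last dense block, with a transition between blocks."""
    channels = init_features
    for index, num_layers in enumerate(block_config):
        channels += growth_rate * num_layers           # each layer adds H feature maps
        if index != len(block_config) - 1:
            channels = int(channels * compression)     # transition layer (assumed 0.5)
    return channels


first_block_inputs = dense_layer_inputs(64, 32, 6)
final_channels = densenet_output_channels(64, 32, (6, 12, 48, 32))
print(first_block_inputs)
print(final_channels)
```

With this configuration the feature extractor ends with 1920 channels, which is the size of the feature vector passed to the classification layers after global pooling.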
An activation function and a dropout layer are typically used to introduce nonlinearity and to reduce overfitting. Two dense layers of 128 and 64 neurons were implemented for classification. Because the DenseNet-201 feature extraction model is used here for binary classification, the softmax activation function of the original DenseNet-201 design was replaced by a sigmoid activation function. In the FC dense layer, every neuron is fully connected to the neurons of the prior layer. The FC layer "i", whose 2D input feature map is flattened into a 1D feature vector, can be described mathematically.
Here, the Bernoulli function randomly produces a vector $v_{i-1}$ that obeys the 0-1 distribution with a certain probability. The dimension of the vector is $d_{i-1}$. The first two layers of the FC stage use the dropout strategy to randomly block some neurons with a defined probability, which efficiently avoids overfitting in deep networks. The terms "x" and "u" denote the weight and offset parameters of the FC layer, respectively. The sigmoid activation maps unnormalized outputs to values between 0 and 1, which are interpreted as binary outputs. It therefore helps to classify the images as nonglaucoma or glaucoma. The sigmoid function can be expressed as
$$S = \frac{1}{1 + e^{-\sum_i x_i z_i}},$$
where $S$ is the neuron output and $x_i$ and $z_i$ represent the weights and inputs, respectively.
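The classification head described above can be illustrated with a small numerical sketch: a Bernoulli 0/1 mask blocks some inputs, a weighted sum plus offset is formed, and a sigmoid maps the result into the interval (0, 1). The code uses only the Python standard library; the function names, the example weights, the keep probability, and the 0.5 decision threshold are all assumptions made for illustration and are not specified in the text.

```python
import math
import random


def sigmoid(value):
    """Numerically stable logistic function 1 / (1 + exp(-value))."""
    if value >= 0:
        return 1.0 / (1.0 + math.exp(-value))
    exp_value = math.exp(value)
    return exp_value / (1.0 + exp_value)


def dropout_mask(size, keep_prob, rng):
    """Bernoulli 0/1 vector: each entry is 1 with probability keep_prob."""
    return [1 if rng.random() < keep_prob else 0 for _ in range(size)]


def dense_sigmoid(inputs, weights, offset, mask=None):
    """Sigmoid of the weighted sum of (optionally masked) inputs plus an offset."""
    if mask is None:
        mask = [1] * len(inputs)                 # no dropout at inference time
    pre_activation = sum(w * z * m for w, z, m in zip(weights, inputs, mask)) + offset
    return sigmoid(pre_activation)


rng = random.Random(0)                           # fixed seed for repeatability
mask = dropout_mask(4, keep_prob=0.5, rng=rng)   # training-time mask (hypothetical rate)

features = [0.2, -0.1, 0.4, 0.3]                 # hypothetical extracted features z_i
weights = [1.0, 2.0, -1.0, 0.5]                  # hypothetical weights x_i
score = dense_sigmoid(features, weights, offset=0.0)
label = "glaucoma" if score >= 0.5 else "nonglaucoma"   # assumed 0.5 threshold
print(round(score, 4), label)
```

Lowering or raising the threshold trades false negatives against false positives, which is one way the decision rule can be tuned independently of the network weights.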

Performance Analysis
The performance of the proposed DCNN with the U-Net and DenseNet-201 model is assessed on the dataset in this section. The model is evaluated using accuracy, precision, recall, specificity, and F-measure. A comparative analysis is also conducted to validate the proposed model. The output is compared with other current deep learning models used for CNN classification, namely VGG-19, Inception ResNet, ResNet 152v2, and DenseNet-169. All experiments are implemented and carried out in the MATLAB 2019a Simulink toolbox. The dataset is split into 75% for training and 25% for validating the performance.
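A hold-out split of this kind can be sketched as follows, with 75% of the samples used for training and the remainder held out. This is a generic illustration in Python and not the authors' MATLAB procedure; the function name, the fixed seed, and the image identifiers are hypothetical, and the split is a plain shuffle without stratification by class.

```python
import random


def split_dataset(items, train_fraction=0.75, seed=0):
    """Shuffle a copy of the items and split it into training and held-out parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)        # seeded for a repeatable split
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


image_ids = [f"img_{i:03d}" for i in range(100)]  # hypothetical identifiers
train_ids, test_ids = split_dataset(image_ids)
print(len(train_ids), len(test_ids))
```

For a dataset with unbalanced classes, splitting each class separately and merging the parts would keep the class proportions equal in both subsets.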

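[Figure: original image, output image, and ground-truth image.]
Accuracy is the proportion of all samples that the model classifies correctly. It is the primary output metric used to assess the performance of the classification process and is most appropriate when the positive and negative classes are equally important. It is calculated using the following equation:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively.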
As shown in Table 1, the proposed model achieved better classification accuracy in both training and testing for classifying the glaucoma fundus images. The model obtained 98.82% training accuracy, an improvement of 1.09% to 3.96% over the other techniques. The testing accuracy is 96.90%, which is 1.36% to 5.26% higher than that of the other compared models. The comparison is plotted in Figure 7.
Precision is the positive predictive value. It is the ratio of correctly predicted positive observations to all observations predicted as positive. A lower precision value reflects that a large number of false positives have affected the classification model. Precision can be computed using the following equation:
$$\text{Precision} = \frac{TP}{TP + FP},$$
where TP and FP are the numbers of true positives and false positives, respectively.
The precision values are tabulated in Table 2, which shows that the proposed model achieved better precision than the compared models. The model obtained a 98.63% precision rate in training, an improvement of 1.1% to 4.8% over the other techniques. The precision rate in testing was 96.45%, which was 1.08% to 4.9% higher than that of the other compared models. Figure 8 shows the comparison of the precision analysis.
Sensitivity is also referred to as recall. It is the ratio of correctly predicted positive observations to all actual positive observations. A lower recall value reflects that a large number of false negatives have affected the classification model. Recall can be calculated using the following equation:
$$\text{Recall} = \frac{TP}{TP + FN},$$
where TP and FN are the numbers of true positives and false negatives, respectively.
The proposed model achieved a better recall (sensitivity) rate, as tabulated in Table 3. The model obtained a 98.95% recall rate in training, an improvement of 1.1% to 4.05% over the other techniques. The recall rate in testing was 97.03%, which was 1.3% to 5.06% higher than that of the other compared models. The comparison graph is shown in Figure 9.
In this model, specificity measures how well healthy subjects are predicted as not having the disease. It is the percentage of subjects without the illness who test negative. Specificity can be calculated using the following equation:
$$\text{Specificity} = \frac{TN}{TN + FP},$$
where TN and FP are the numbers of true negatives and false positives, respectively.
As shown in Table 4, the proposed model obtained a better specificity rate than the other deep learning models compared. The specificity rate in testing was 96.33%, which was 0.6% to 6.4% higher than that of the other compared models. Figure 10 shows the comparison of the estimated specificity.
The F-measure estimates the accuracy of the test and is defined as the harmonic mean of precision and recall. Accuracy does not take into account how the data are distributed, so the F-measure is used to handle this distribution problem. It is useful when the dataset has imbalanced classes. The F-measure can be calculated using the following equation:
$$F\text{-measure} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}.$$
The F-measure values are tabulated in Table 5, which shows that the proposed model achieved a better F-measure than the compared models. The model obtained a 98.50% F-measure in training, an improvement of 0.9% to 3.7% over the other techniques. The F-measure in testing was 96.28%, which was 0.8% to 4.7% higher than that of the other compared models. Figure 11 shows the comparison of the F-measure analysis.
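The five evaluation measures used in this section can all be derived from the four confusion-matrix counts. The sketch below uses the standard definitions (with the F-measure taken as the harmonic mean of precision and recall) and hypothetical counts chosen only to show the arithmetic; they are not results from this study, and the function names are illustrative.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, specificity, and F-measure from raw counts."""
    precision = safe_ratio(tp, tp + fp)
    recall = safe_ratio(tp, tp + fn)
    return {
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_ratio(tn, tn + fp),
        "f_measure": safe_ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts for 100 test images.
metrics = classification_metrics(tp=45, tn=40, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because precision ignores false negatives and recall ignores false positives, reporting them together with specificity gives a fuller picture than accuracy alone.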
In this research, compared with VGG-19, Inception ResNet, ResNet 152v2, and DenseNet-169, the proposed model achieved better performance in both the training and testing stages. Inception ResNet showed the lowest performance, while DenseNet-169 came closest to the proposed model.

Conclusion
In this research, a model for the early prediction of glaucoma using a deep learning technique was proposed. The ORIGA dataset was used to evaluate the proposed model on glaucoma images, with 75% of the data used for training and 25% used for testing. For segmentation, the U-Net model was implemented, and a pretrained transfer learning model, DenseNet-201, was used for feature extraction along with the DCNN. The DCNN was used to classify the images for glaucoma detection. The primary objective of this model was to detect glaucoma from retinal fundus images, which can be useful to determine whether the patient is affected by glaucoma or not. The optic cup region was segmented from the fundus images and compared with the ground truth images from the dataset. After segmentation, the features were extracted from the images using the DenseNet model and classified using the DCNN. The proposed model obtained 98.82% training accuracy, which was 1.09% to 3.96% higher than the other models, and the testing accuracy was 96.90%, which was 1.36% to 5.26% higher than the compared models. The performance analysis shows that the proposed model is effective; results of 100% were not achieved because of false positives and false negatives. In future work, this issue will be addressed by improving the classifier and reducing the threshold. This model can be useful for various medical image segmentation and classification tasks, such as diabetic retinopathy, brain tumor detection, and breast cancer detection.
Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.