Deep Learning Approach for Stages of Severity Classification in Diabetic Retinopathy Using Color Fundus Retinal Images



Introduction
In the human eye, the retina is a thin layer of tissue. The retina is responsible for vision: it receives light, converts it into neural signals, and forwards these signals to the brain for visual recognition. The retina can be damaged by diabetes, a condition called diabetic retinopathy (DR) [1][2][3][4]. Here, "retino" means retina, and "pathy" means disease. DR can be broadly categorized into three stages: (1) diabetes without retinopathy, (2) nonproliferative or preproliferative DR, and (3) proliferative DR. However, no outward symptoms distinguish these stages; they are defined only by pathology.
When the sugar level in the body increases, it results in hyperglycemia, which damages cells known as retinal pericytes [5]. These retinal pericytes play a significant role in the regulation of blood flow. Pericyte damage can impair the ability of cells to metabolize glucose properly. This early stage of DR is called diabetes without retinopathy. These changes can only be detected by using a microscope. An increase in vascular (capillary) permeability allows large molecules such as proteins and lipids to move in and out of vessels. If these fluids leak out, they get trapped [6]. This can be detected through dilated eye exams. The leakage also results in swelling of the macula and appears as yellow and white flecks on the retina, which are called hard exudates. Microaneurysms and hard exudates can be detected in the second stage of DR, which results in ischemia. Ischemia is a condition in which the retinal cells receive a poor supply of oxygen. In the retina, Vascular Endothelial Growth Factor (VEGF) tries to counteract ischemia by producing new blood vessels.

The third and final stage of DR results in blurred vision and is called proliferative DR (PDR). Once DR reaches this stage, it can be very severe and result in permanent, complete loss of vision. In PDR, new, abnormal blood vessels form in the retina. Other complications of PDR are detachment of the retina from the back of the eye and, sometimes, bursting and bleeding of the newly formed vessels in the retina, which results in permanent blindness. DR detected early can be treated, but once it becomes severe, alleviating it is difficult. Treatment at the severe stage includes photocoagulation, a focal laser treatment that slows bleeding or leakage of fluid in the eye.
After the pupil is dilated, fundoscopy is performed, a test in which the fundus of the eye is examined with a magnifying lens and a light. Fundoscopy allows the entire retina to be visualized. The changes seen in DR are microaneurysms, hemorrhages, and hard exudates.
Classifying DR lesions into categories, namely hemorrhages (HE), microaneurysms (MA), soft exudates (SE), and hard exudates (EX), makes it easier for the ophthalmologist to choose the most suitable treatment for the detected disease. Many studies have addressed image classification techniques based on the signs of DR using computer vision [7,8].
Over the last few years, various machine learning approaches have been used for automatic classification tasks [9][10][11][12][13]. Image classification follows a standard procedure: after a preprocessing stage, the important features are extracted from the image dataset using convolutional layers. Convolutional Neural Networks (CNNs) have given researchers a straightforward way to build state-of-the-art algorithms that classify diseases and support ophthalmologists [14,15]. Figure 1 shows the presence of different types of lesions in the retina of an affected patient. The major contributions of the present research work are as follows:
(1) A wide CNN-based computer-assisted lesion detection system for early and accurate categorization of lesions to aid in treatment planning.
(2) An approach that involves image preprocessing, feature extraction, feature reduction, and classification of images into the different lesions present in the retina of a patient suffering from diabetic retinopathy.
(3) An improvement in CNN performance obtained by applying data preprocessing techniques to images of the various lesions.
The remainder of the paper is structured as follows: Section 2 provides a thorough overview of the literature; Section 3 explains the preliminary work (dataset description), the experimental environment, and the procedure; Section 4 focuses on the results and the discussion; and Section 5 concludes the paper and outlines the scope of future work.

Related Work
Image processing plays a vital role in extracting significant data from an image [16]. A lot of research has been carried out on the detection of DR in clinical image datasets, and several state-of-the-art algorithms have been applied to this task. Many researchers have contributed to overcoming the shortcomings of DR detection. Gadekallu et al. [17] introduced several DL and machine learning (ML) techniques with data normalization and dimensionality reduction to obtain good results. Firefly and Principal Component Analysis techniques were applied for extracting features and reducing dimensionality. Finally, these images were passed to a Deep Neural Network for classification. Gangwar et al. [18] used an Inception-ResNet-v2 pretrained model and merged it with CNN layers for the detection of diabetic retinopathy. The proposed work used the Messidor-1 diabetic retinopathy and APTOS 2019 blindness detection datasets. The use of transfer learning gave good results. Reddy et al. [19] applied min-max normalization to the given dataset to extract the diabetic images. Once these images were preprocessed, ensemble-based ML algorithms were applied. The results show that ensemble learning algorithms perform better than traditional ML algorithms. Gupta et al. [8] worked with different pretrained DL models, namely Inception v3, VGG16, and VGG19, for feature extraction from an image dataset of several lesions. To classify the lesions, the extracted features were passed to ML classifiers. Gayathri et al. [20] discussed a new technique for DR detection in which Anisotropic Dual-Tree Complex Wavelet Transform and Haralick features were extracted. ML classifiers such as SVM, RF, Tree, and J48 were used for binary as well as multiclass classification of different DR lesions. The Messidor and DIARETDB0 datasets were used in their work.
Mathematical Problems in Engineering

Nguyen et al. [21] used various DL models for the classification of various lesions. With DL techniques, automatic detection becomes easier than manual detection. Promising results were obtained with these models: an accuracy of 82% and a sensitivity of 80% were reported. Erciyas et al. [22] proposed a deep learning-based technique for detecting diabetic retinopathy lesions automatically and independently of datasets and then classifying the lesions found. In the first stage of the proposed technique, a data pool is generated by gathering diabetic retinopathy data from several datasets. Lesions are identified and the region of interest is tagged using Faster RCNN. In the second stage, transfer learning and an attention method are used to classify the resulting images. Wan et al. [23] presented a segmentation approach for various lesions in DR. The proposed technique is based on a convolutional neural network that can be split into three modules, encoder, attention, and decoder, which is why it is named EAD-Net. After normalization and augmentation, the fundus scans were submitted to EAD-Net for automatic feature extraction and pixelwise label prediction. EAD-Net is a clinical DR diagnosis-based method, and its segmentation of four distinct types of lesions produces excellent results. Gharaibeh [24] describes a new method for detecting microaneurysms and hemorrhages in fundus photographs. The author used particle swarm optimization (PSO) and a Gaussian interval type-2 fuzzy membership function for the detection of diabetic retinopathy lesions. The experimental results are based on a MATLAB simulation program that uses the DR2 and Messidor databases. On these databases, the method produces accurate and efficient categorization results with an accuracy of 95%.

Preprocessing.
The first dataset prepared for detecting eye disease in the Indian population is the IDRID dataset [25,26]. The idea originated with an eye specialist in Nanded, India, who captured the fundus images in IDRID. Out of thousands of images, experts selected 516 images to form the dataset based on adequate quality, clinical relevance, and absence of duplicates. The images were acquired with a Kowa VX-10 alpha digital fundus camera with a 50° field of view at a resolution of 4288 × 2848 pixels. The captured images were stored in jpeg format at about 800 KB per image. The dataset includes typical DR lesions and some normal retinal structures, and it provides information on the stage and severity of the disease. It contains eighty-one color fundus images showing signs of DR. These images have various types of lesions, namely hard exudates (EX), soft exudates (SE), microaneurysms (MA), and hemorrhages (HE), as represented in Table 1.

Training the Convolutional Network.
The VGG16 model [27], pretrained on the dataset of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), was fine-tuned with over 122 color fundus images from the IDRID dataset using transfer learning.
CNNs are used for image classification, and on some computer vision tasks these networks have matched or exceeded human performance. The VGG models were among the top entries in the 2014 ILSVRC challenge, placing first in localization and second in classification. They were developed by the Visual Geometry Group (VGG), which designed two CNN models for image classification: a 16-layer model and a 19-layer model. The VGG16 architecture has 16 weight layers: 13 convolutional layers, each followed by a ReLU activation and grouped into blocks separated by max-pooling layers, and 3 fully connected layers at the end. The weights of the VGG models are freely available and easy to load and use. In this work, the pretrained model is fine-tuned on color fundus images using transfer learning, which applies the feature extraction capability of an established model to a new predictive modeling task. Figure 2 shows the overall architecture for the classification of lesions using the VGG16 model.
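The layer structure described above can be checked with a short sketch. It assumes the standard VGG16 configuration (3 × 3 convolutions, 2 × 2 max pooling, a 224 × 224 RGB input, and the 1000-class ImageNet head); the names `VGG16_CONFIG`, `FC_UNITS`, and `summarize_vgg16` are illustrative and are not taken from the authors' implementation, whose fine-tuned head may differ.

```python
# Sketch under stated assumptions: standard VGG16 layout, 224x224 RGB input.
# Integers are the output channels of a 3x3 convolution; "M" is 2x2 max pooling.
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]
FC_UNITS = [4096, 4096, 1000]  # fully connected layers of the ImageNet head


def summarize_vgg16(height=224, width=224, channels=3):
    """Count weight layers and parameters and track the feature-map shape."""
    weight_layers = 0
    parameters = 0
    for item in VGG16_CONFIG:
        if item == "M":
            # Max pooling halves the spatial size and has no weights.
            height //= 2
            width //= 2
        else:
            # 3x3 kernel weights plus one bias per output channel.
            parameters += 3 * 3 * channels * item + item
            channels = item
            weight_layers += 1
    feature_map = (height, width, channels)
    inputs = height * width * channels  # flattened feature vector
    for units in FC_UNITS:
        parameters += inputs * units + units
        inputs = units
        weight_layers += 1
    return {
        "weight_layers": weight_layers,
        "feature_map": feature_map,
        "parameters": parameters,
    }


summary = summarize_vgg16()
print(summary)
```

Only the convolutional and fully connected layers carry weights, which is where the "16" in the model name comes from; the activation and pooling layers add no parameters.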

Classifiers and Energy Functions Used.
In ML, a program is trained on a set of inputs and then uses what it has learned to assign new input data to classes. This technique is called classification, and the algorithm that implements it is called a classifier. The classifiers used in ML include SVM, RF, K-nearest neighbors, decision tree, logistic regression, naïve Bayes, and AdaBoost [28]. In this work, once the model is trained, classification is the key step for verifying its applicability. Classifiers such as logistic regression, neural network, SVM, RF, and AdaBoost can be applied to classify the various types of lesions, namely MA, EX, SE, and HE, in fundus images of the retina [29]. Logistic regression (LR) is suited to problems in which the output variable is categorical; in its basic form it is applied when the data have a binary output, i.e., 0 or 1. Neural network classifiers learn by themselves and can produce outputs that are not constrained by the input data. Because information is distributed across the network, the loss of some data has little effect on the operation of the system. SVM methods need a small amount of memory for execution. They show significant results when the classes have a clear margin of separation and in high-dimensional spaces, and they remain effective when the number of observations is smaller than the number of parameters. Random forests often achieve high accuracy compared with other classification algorithms. The random forest approach can also handle large datasets with thousands of variables, and it can balance datasets in which one class is rarer than the others. Adaptive Boosting (AdaBoost) is a common boosting approach that combines many weak classifiers into a single strong classifier.
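For reference, two of the classifiers above can be written in their usual textbook forms. This is a sketch for orientation only: the symbols $\mathbf{w}$, $b$, $h_t$, $\alpha_t$, and $\varepsilon_t$ do not appear in the paper, and the paper does not state which multiclass variant of each classifier was used.

```latex
% Binary logistic regression: probability of the positive class
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^{\top}\mathbf{x} + b),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}

% Multinomial (softmax) extension to K classes, e.g. K = 4 lesion types
P(y = k \mid \mathbf{x}) =
\frac{\exp(\mathbf{w}_k^{\top}\mathbf{x} + b_k)}
     {\sum_{j=1}^{K} \exp(\mathbf{w}_j^{\top}\mathbf{x} + b_j)}

% AdaBoost (binary form): weighted vote of T weak classifiers h_t,
% where \varepsilon_t is the weighted error of h_t
H(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t\, h_t(\mathbf{x})\right),
\qquad \alpha_t = \frac{1}{2}\ln\frac{1 - \varepsilon_t}{\varepsilon_t}
```

Here $\mathbf{x}$ would be the feature vector extracted from a lesion image, and a weak classifier with a lower error $\varepsilon_t$ receives a larger vote $\alpha_t$.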
In a binary classification system, the result is either positive or negative. The possible classification outcomes are true negative (TN), true positive (TP), false negative (FN), and false positive (FP). Several measures are important for assessing the performance of each classifier, including precision, recall, accuracy, F-score, and AUC [30]. These measures are defined as follows.
Accuracy is the number of correct classifications divided by the total number of cases:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}. \tag{1}$$

Recall is the number of TP divided by the total number of actual positives:

$$\text{Recall} = \frac{TP}{TP + FN}. \tag{2}$$

Precision is TP divided by the number of predicted positives (the sum of TP and FP):

$$\text{Precision} = \frac{TP}{TP + FP}. \tag{3}$$

The F-score is the harmonic mean of precision and recall:

$$F\text{-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}. \tag{4}$$

The F-score is high only when both precision and recall are high; it drops when one measure is good at the expense of the other.
Sensitivity and specificity are both conditional probabilities. Sensitivity is the probability of a positive result given that the patient has the disease (the true positive rate). Specificity is the probability of a negative result given that the patient does not have the disease (the true negative rate). A symptom or a test is considered effective in predicting a disease when both the sensitivity and the specificity are high [28].
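The measures defined above can be computed directly from the four outcome counts. The following is a minimal sketch; the function name `binary_metrics` and the example counts are illustrative assumptions and are not results from this study.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute the evaluation measures from binary outcome counts.

    Any ratio with a zero denominator is reported as 0.0.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # also called sensitivity
    specificity = ratio(tn, tn + fp)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    # Harmonic mean of precision and recall.
    f_score = ratio(2 * precision * recall, precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score,
    }


# Made-up counts for illustration only.
example = binary_metrics(tp=8, tn=6, fp=2, fn=4)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

In this example precision (0.800) is noticeably higher than recall (0.667), and the F-score (0.727) falls between the two, closer to the lower value.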

Results and Discussion
The network trained using the VGG16 model was able to classify 122 lesion images segmented from the IDRID dataset. The results gave us an insight into what the VGG model has learned; convolutional networks sometimes give better results in object recognition than humans. The 122 lesion images comprised four types of lesions: SE with 14 images, and EX, HE, and MA with 27 images each. VGG16 serves as the base model and is combined with classifiers such as NN, KNN, RF, LR, and SGD to classify these lesions [31]. The confusion matrix, also known as the error matrix, describes a classifier's performance on a particular test dataset [32] and is represented in Table 2. It breaks down the correct and incorrect predictions for each class. Table 2 shows the confusion matrix for the logistic regression classifier, in which 26 MA, 19 HE, 19 EX, and 8 SE images were correctly predicted.
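A multiclass confusion matrix can be reduced to the binary outcomes TP, FP, FN, and TN by treating each lesion type in turn as the positive class (one-vs-rest). The sketch below illustrates this; the function name `one_vs_rest_counts` is an assumption, and the matrix entries are made-up values whose row totals merely follow the test-set sizes of 27, 27, 27, and 14 images. They are not the values reported in Table 2.

```python
def one_vs_rest_counts(matrix, labels):
    """Derive per-class TP, FN, FP, TN from a multiclass confusion matrix.

    matrix[i][j] is the number of images of true class i predicted as class j.
    """
    total = sum(sum(row) for row in matrix)
    counts = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                 # true class k, predicted otherwise
        fp = sum(row[k] for row in matrix) - tp  # predicted k, true class differs
        tn = total - tp - fn - fp
        counts[label] = {"TP": tp, "FN": fn, "FP": fp, "TN": tn}
    return counts


# Illustrative matrix only; rows are true classes, columns are predictions.
LABELS = ["MA", "HE", "EX", "SE"]
EXAMPLE_MATRIX = [
    [25, 1, 1, 0],
    [2, 22, 2, 1],
    [1, 2, 23, 1],
    [0, 1, 2, 11],
]

counts = one_vs_rest_counts(EXAMPLE_MATRIX, LABELS)
for label in LABELS:
    print(label, counts[label])
```

The per-class counts obtained this way can then be fed into the accuracy, precision, recall, and F-score definitions given earlier.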
Table 1: Number of images per lesion type.

Type  Training set  Testing set  Total images
MA    54            27           81
SE    26            14           40
EX    54            27           81
HE    53            27           80

Another measure of the efficiency of the model is the ROC curve, which plots the sensitivity (true positive rate) against 1 − specificity (false positive rate) at several probability thresholds. A curve that lies closer to the top-left corner indicates a classifier that identifies more true positives while producing fewer false positives. In the present work, the ROC curve of the LR classifier covered the largest area. Figures 3-6 show the ROC curves of the AdaBoost, NN, RF, LR, and SVM classifiers across threshold values. This multiclass classification scenario is considered in Tables 3-5. These tables show that 26 out of 27 MA images and 9 out of 14 SE images were predicted correctly, whereas 22 out of 27 images were predicted correctly for both HE and EX. Tables 3-7 summarize the results, which show that LR outperformed the other classifiers on HE, EX, SE, and MA and also obtained better precision, accuracy, F1-score, recall, and AUC [33].
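The construction of an ROC curve and its AUC can be sketched as follows. This is a minimal illustration that assumes binary labels (1 for the lesion class of interest, 0 otherwise) and one classifier score per image; the function names and the example scores are hypothetical and do not reproduce the curves in the figures.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve.

    The threshold is lowered step by step; tied scores are processed together.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up scores for six images, three of them positive.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(example_labels, example_scores)
print(curve)
print(round(auc(curve), 4))  # 8 of the 9 positive-negative pairs are ranked correctly
```

An AUC of 1.0 would mean that every positive image receives a higher score than every negative image, while 0.5 corresponds to random ranking.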
MA are very important because they are the first lesions to appear in the retina and mark the starting stage of DR. They can be identified by spotting red dots, which normally occur in clusters. The ROC curve for MA is shown in Figure 3.
If MA are not identified at the initial stage and burst in the retinal area, they produce minor bleeding known as retinal hemorrhage (HE). Bleeding in the eye is a common finding in both newborns and adults. The ROC curve for HE is depicted in Figure 4.
Hard exudates (EX) are deposits that reside in the outer layer of the retina. They consist largely of lipid and protein material. The ROC curve for EX is shown in Figure 5.
Soft exudates (SE) form when arterioles in the retina become occluded. These lesions, which arise at the slightly larger precapillary arterioles, are also known as cotton wool spots. The ROC curve for SE is represented in Figure 6.

The current study describes the application of VGG pretrained models for the categorization of various lesions found in the retina of diabetic patients' eyes. The lesion images were converted into feature space using the VGG16 and VGG19 pretrained models. These models were chosen for their clarity, ease of use, and depth of learning. Their multiple hidden layers help to capture and analyze the characteristics present in the scans and to categorize the components. A major advantage of these networks is their consecutive blocks of stacked convolutional layers, which progressively reduce the amount of spatial information required. These models were used in our lesion detection to improve the accuracy of detecting lesions in the input jpeg images [34]. The findings of our research may be compared with those of Gharaibeh [24], who classified the lesions using particle swarm optimization and Gaussian interval type-2 fuzzy membership functions. The accuracy obtained in our study surpasses the accuracy reported in [24].
To carry out this research and implement our models, we used an Nvidia Tesla K80 GPU, and we built our algorithm on Google Colaboratory [35]. We used a deep CNN-based architecture for our model.

Conclusion
The present work has shown that transfer learning with DL models can achieve high accuracy in classifying the different lesions in the retinal area of an eye affected by diabetic retinopathy. We used VGG16 trained on the ImageNet dataset as our model and froze its first few layers. We then retrained the last few layers of the network so that it captures the higher-level abstracted image features of the IDRID dataset. Diabetic retinopathy targets the retinal portion of the eye and leads to loss of vision. Early diagnosis of this condition, before it reaches the advanced stages, helps to avoid this outcome.
This requires very accurate learning models to classify the stages correctly. Our proposed model classifies the lesions of DR patients into the proper class, with accuracy for classifying MA, SE, EX, and HE images of 95.9% by using LR, 91% by using SGD, 95.9% by using NN, and 93.4% by using HE. A high AUC of 99.7% is indicative of fewer type 2 errors, which is highly desirable in medical classification problems. The rise of DL techniques with transfer learning has made it possible to achieve high accuracies in vision problems. Gharaibeh utilized the PSO and GIT2FMFS techniques to classify lesions in [24]. The accuracy attained with those approaches was 95%, which is lower than the accuracy achieved in our study. This model can be extended to achieve higher accuracy by increasing the size of the dataset or by implementing a newer model such as ResNet, MobileNet, or EfficientNet, which can learn more relevant features from the image dataset.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors would like to confirm there are no conflicts of interest regarding the study.