A Hybrid Unsupervised Approach for Retinal Vessel Segmentation

Retinal vessel segmentation (RVS) is a key source of information for the monitoring, identification, early treatment, and surgical planning of ophthalmic disorders. Common disorders, e.g., stroke, diabetic retinopathy (DR), and cardiac diseases, often change the normal structure of the retinal vascular network. A great deal of research has been devoted to building automatic RVS systems, but the problem remains open. In this article, a framework is proposed for RVS with fast execution and competitive results. An initial binary image is obtained by applying MISODATA to the preprocessed image. For vessel structure enhancement, B-COSFIRE filters are utilized, followed by thresholding, to obtain a second binary image. These two binary images are combined by a logical AND-type operation. The result is then fused with the B-COSFIRE-enhanced image, followed by thresholding, to obtain the vessel location map (VLM). The methodology is verified on four datasets that are publicly accessible for benchmarking and validation: DRIVE, STARE, HRF, and CHASE_DB1. The obtained results are compared with existing competing methods.


Introduction
The human visual system is the most essential sensory system for gathering information, navigation, and learning [1]. The retina is the light-sensitive part of the eye that contains the fovea, light receptors, optic disc, and macula. It is a layered tissue coating the interior of the eye, acting as the initial sensor of the visual communication system that gives the sense of sight. Moreover, it allows understanding the colors, dimensions, and shape of objects by processing the amount of light they reflect or emit. The retinal image of an eye is captured with a fundus camera [2]. RGB photographs of the fundus are projections of the internal surface of the eye. Retinal imaging has evolved swiftly and is now one of the most common practices in healthcare, both generally and for screening patients suffering from ophthalmologic or systemic diseases. For identifying numerous ophthalmologic diseases, the ophthalmologist uses the vessel condition as an indicator, which is a vital component of retinal fundus images.
Critical diagnostic signs of eye diseases in human retinal images can be indicated by the shape, appearance, morphological features, and tortuosity of the blood vessels [3]. The structure of the retinal vasculature is also used for screening of stroke and heart diseases [4,5]. Retinal vessel structures play a significant role among the structures in fundus images, and RVS is the elementary phase of retina image examination [6]. Vascular-related diseases are diagnosed with the help of vessel delineation, which is an important component of medical image processing. Additionally, ongoing research in the area of deep learning has suggested multiple approaches with emphasis on the separation and delineation of the vasculature.
The inadequate number of images and the low contrast in publicly available retina datasets are challenging for deep learning-based research. A dataset with a large number of retina images captured with different imaging systems and under diverse environmental conditions is required to train a supervised network. Deep learning-based methods will help control blindness through the timely and precise identification of diseases for successful remedy, and thus vividly increase the quality of life of patients with eye ailments [7]. RVS is a very difficult task for many reasons: (1) The structure and formation of retinal vessels are very complex, and there is prominent dissimilarity across local regions regarding the shape, size, and intensity of vessels.
(2) Some structures, e.g., hemorrhages, have the same intensity and shape as vessels. Moreover, there are also thin microvessels, whose width normally ranges from one to a few pixels and which can easily be confused with the background. The images exhibit irregular illumination and low, varying contrast [7,8]. Typically, noise is added to fundus images by the image-capturing procedure, such as artifacts on the lens or movement of the patient [9]. It is hard to differentiate vessels from similar structures or noise in the retina image. In addition, thicker vessels are more prominent than thinner ones, as shown in Figure 1. (3) Different manual graders produce different segmentation results, and manual RVS is a very hard and tedious task. Over the recent two decades, automatic RVS has caught noteworthy attention and numerous techniques have been developed, but their performance degrades with a change of dataset. Some of the techniques are not fully automatic, while others are incapable of handling pathological images. Some of these methods are evaluated on datasets with a limited number of images, while others suffer from oversegmentation or undersegmentation on abnormal images [10]. Hence, the problem of accurate RVS is still not solved.
Automated RVS techniques provide incredible support to the ophthalmologist in the identification and medication of numerous ophthalmological abnormalities. In this article, an automatic unsupervised approach is developed for RVS that consists of a combination of preprocessing steps, segmentation, vessel structure-based enhancement, and postprocessing steps. The preprocessing steps aim at exterminating noise and improving the contrast of the fundus image. Segmentation is performed using the Modified Iterative Self-Organizing Data Analysis Technique (MISODATA) to acquire a binary image that is fused with the segmented image of the filter based on the Combination Of Shifted Filter Responses (B-COSFIRE). Then, the fused image is multiplied with the enhanced image of the B-COSFIRE filter to obtain the initial vessel location map (VLM). Lastly, the VLM and the fused image are combined by a logical OR-type operation to obtain the final result. In a nutshell, the main contributions of this research are the following: (1) A mask image is not provided with all retina datasets.
Automatic masking creation is proposed for each image to extract ROI which suppresses the false positive rate (FPR).
(2) The proposed efficient denoising process (preprocessing steps) improves the selection of a suitable threshold.
(3) The basic ISODATA algorithm processes the retina image only once, locally and then globally, which sometimes makes it unable to find an optimal threshold. The modified ISODATA technique is introduced to find the global threshold of the entire image, which is compared with the individual local threshold of each segment in order to find the optimal threshold for more precise detection of vessels.
(4) The vessel location map (VLM) is a new scheme to achieve better performance. In this scheme, the background noise eradication and vessel enhancement are accomplished independently.
(5) Distinctive postprocessing steps (AND-type and OR-type operations) are applied to reject misclassified foreground pixels.

Related Works
Numerous methodologies for RVS have been developed in the literature [4,10]. These methodologies are arranged into two sets: supervised and unsupervised procedures. Supervised techniques utilize a trained classifier to classify pixels into foreground or background. They employ various classifiers, for instance, adaptive boosting (AdaBoost), support vector machines (SVM), neural networks (NN), Gaussian mixture models (GMM), and k-nearest neighbors (k-NN). An RVS method utilizing a supervised k-NN classifier for the isolation of foreground and background pixels was recommended by Niemeijer et al. [11], with a feature vector (FV) formed from a multiscale (MS) Gaussian filter. Staal et al. [12] proposed a comparable RVS methodology using an FV generated from a ridge detector. A feed-forward NN-based classifier was applied by Marin et al. [13], using a 7-D FV generated from moment invariants.
An SVM-based approach was presented by Ricci et al. [14], utilizing an FV constructed through a rotation-invariant linear operator and pixel intensity. An AdaBoost classifier was suggested by Lupascu et al. [15], utilizing a 41-D feature set. An ensemble-based RVS system applying a simple linear iterative clustering (SLIC) algorithm was presented by Wang et al. [16]. A GMM classifier-based scheme was recommended by Roychowdhury et al. [17], utilizing an 8-D FV extracted from the pixel neighborhood on first- and second-order gradient images.

BioMed Research International
Zhu et al. [18] offered an extreme learning machine- (ELM-) based RVS scheme utilizing a 39-D FV generated by morphological and local attributes combined with attributes extracted from phase congruency, the Hessian, and the divergence of vector fields (DVF). Tang et al. [19] recommended an SVM-based RVS scheme utilizing an FV created from MS vessel filtering and Gabor wavelet features. A random forest classifier-based RVS system was proposed by Aslani et al. [20], utilizing a 17-D FV created from MS and multiorientation Gabor filter responses and an intensity feature combined with features extracted from a vesselness measure and the B-COSFIRE filter.
A directionally sensitive vessel enhancement-based scheme combined with an NN derived from the U-Net model was presented in [21]. Thangaraj et al. [22] constructed a 13-D FV from Gabor filter responses, Frangi's vesselness measure (1D), a local binary pattern feature (1D), Hu moment invariants (7D), and grey-level co-occurrence matrix features (3D) for RVS utilizing an NN-based approach. Memari et al. [23] recommended an arrangement of various enhancement techniques with the AdaBoost classifier to segregate foreground and background pixels.
A three-stage (thick vessel extraction, thin vessel extraction, and vessel fusion-based) deep learning approach was proposed in [24]. Guo et al. [25] suggested an MS deeply supervised network with short connections (BTS-DSN) for RVS. Local intensities, local binary patterns, a histogram of gradients, DVF, higher-order local autocorrelations, and morphological transformation features were used for RVS in [26]. Random forests were used for the selection of feature sets, which were utilized in combination with a hierarchical classification methodology to extract the vessels.

Algorithm 1: Threshold level computation of the MISODATA (listing truncated).
1: function level = isodata(I_d)
2: Step 1: compute the mean intensity of the image from its histogram;
3:   set T = mean(I);
4:   [counts, N] = imhist(I_d);
5:   i ← iteration, let i = 1;
6:   mu = cumsum(counts);
7:   T(i) = (sum(N * counts)) / mu(end);
8:   T(i) = round(T(i));
9:   M ← mean intensity of I_d utilizing the histogram;
10: Step 2: for i ← 1 do
11:   compute MAT ← mean above threshold using T from Step 1;
12:   compute MBT ← mean below threshold using T from Step 1;
13:   …

Alternatively, unsupervised systems are categorized into matched filtering- (MF-), mathematical morphology- (MM-), and multiscale-based approaches. In matched filtering approaches, thick and thin vessels are extracted by the selection of large and small filter kernels, respectively. However, the application of large kernels accurately detects major vessels while misclassifying thin vessels by increasing their width. Similarly, smaller kernels accurately extract thin vessels but also extract thick vessels with reduced widths. To obtain a complete vascular network, a conventional MF technique must be applied with a large number of diverse filter masks in various directions.

Proposed Model
The complete structure of the proposed RVS framework is introduced in this section. The information and description of every stage are also presented in subsections.
3.1. Overview. The proposed framework consists of two major blocks to obtain a final binary image: (1) retina image denoising and segmentation and (2) vessel structure-based enhancement and segmentation. The key objective of this framework is to extract the vasculature accurately along with the elimination of noise and other disease-related distortions. The complete structure of the proposed framework is depicted in Figure 2. Block-I consists of the selection of a suitable retina channel, contrast enhancement, noise filtering, region of interest (ROI) extraction, thresholding, and postprocessing steps. Block-II includes the application of the B-COSFIRE filter, logical operations, and postprocessing steps. The initial binary vessel map of Block-I is fused with the B-COSFIRE filter-segmented image in Block-II. Then, it is multiplied with the B-COSFIRE filter-enhanced image, which is further thresholded. This output image is combined with the initial postprocessed image by the logical OR-type operation to obtain the final binary image.
3.2. Block-I: Retina Image Denoising and Segmentation. In the first block, the retina image is passed through selected techniques to extract the initial denoised vessel map. The green band of the RGB retina image is extracted and selected for subsequent operations due to its noticeable contrast difference between the vessels and other retina structures. RGB retina images generally have contrast variations, low resolution, and noise. To counter such variations and produce a more appropriate image for further processing, vessel light reflex elimination and background uniformity operations are performed. Retinal vessel structures have poor reflectance compared to other retinal planes, and some vessels contain a bright stripe (light reflex) that runs down the central length of the vessel. To overcome this problem, a morphological opening with a disc-shaped structuring element (SE) of 3-pixel width is applied to the green plane. A minimal disc width is selected to avoid merging close vessels. Background uniformity and the smoothing of random salt-and-pepper noise are obtained by applying a 3 × 3 mean filter. Additional noise flattening is achieved by applying a Gaussian kernel of size 9 × 9 with mean 0 and variance 1.8.
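The preprocessing chain above (light-reflex removal by opening, mean filtering, Gaussian smoothing) can be sketched as follows. This is a minimal illustration using scipy.ndimage, not the authors' implementation; the SE radius and filter sizes follow the values stated in the text.

```python
import numpy as np
from scipy import ndimage

def disk(radius):
    """Binary disc-shaped structuring element of the given pixel radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius

def preprocess_green(green):
    """Light-reflex removal and noise smoothing on the green channel."""
    # Morphological opening with a ~3-pixel-wide disc SE suppresses the
    # bright central light reflex without absorbing close vessels.
    opened = ndimage.grey_opening(green, footprint=disk(1))
    # A 3x3 mean filter flattens salt-and-pepper noise.
    mean3 = ndimage.uniform_filter(opened.astype(float), size=3)
    # 9x9 Gaussian kernel with variance 1.8 -> sigma = sqrt(1.8);
    # truncate is chosen so the kernel radius is 4 pixels (9x9 support).
    sigma = np.sqrt(1.8)
    return ndimage.gaussian_filter(mean3, sigma=sigma, truncate=4.0 / sigma)
```

All three steps are value-preserving smoothers, so the output stays within the intensity range of the input image.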
CLAHE [51,52] is applied to the preprocessed green channel to make vessel structures prominent. The CLAHE operation divides the input image into blocks (of size 8 × 8 in our case) with the contrast-improvement constraint (clip limit) set to 0.01. The clip limit suppresses the noise level while escalating the contrast. The effect of the CLAHE process (I_clahe) alongside the green plane is displayed in Figure 3, and a histogram-based graphical demonstration of the contrast improvement operations is displayed in Figure 4. An averaging filter of size 49 × 49 is applied for smoothing and elimination of anatomical regions (e.g., optic disc, macula, and fovea); I_avg symbolizes the output image of the averaging filter. The difference image (I_d) is then computed for all pixels as I_d = I_clahe − I_avg.
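The background-subtraction step can be sketched as follows, assuming `i_clahe` is the CLAHE output; the 49 × 49 averaging window follows the text. The heavy blur wipes out thin vessels, leaving only the slowly varying background, so the subtraction makes the vasculature stand out.

```python
import numpy as np
from scipy import ndimage

def difference_image(i_clahe):
    """Estimate the background with a 49x49 averaging filter and subtract it.

    The large window removes thin structures (vessels) and retains broad
    anatomical regions (optic disc, macula, fovea); the difference image
    therefore highlights vessels against a flattened background.
    """
    i_avg = ndimage.uniform_filter(i_clahe.astype(float), size=49)
    i_d = i_clahe.astype(float) - i_avg
    return i_avg, i_d
```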
The extra regions of the retinal image are cropped by a masking method that extracts the ROI, which reduces the computational complexity. An automatic mask is created from the red band of the retinal image; the red channel is used for mask construction because it has a good vessel-background dissimilarity. The automatic mask is created for all datasets because a mask image is not available in some of them. I_d is thresholded by the MISODATA algorithm; the procedure used to compute the threshold level by MISODATA is shown in Algorithm 1. Isolated pixel groups with an area of less than 25 pixels in the resulting image (I_s1) are trimmed, and the image is later fused with the B-COSFIRE filter-segmented image of Block-II by an AND-type operation. The physical statistics (eccentricity and area) are utilized for the rejection of nonvessel structures, since vessel structures have a higher area and eccentricity: their pixels are linked and form elongated structures. Figure 5 shows the graphical results of I_avg, I_d, and I_s1.
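The thresholding idea can be sketched as follows, using a basic ISODATA iteration. The exact rule for combining the global threshold with the per-segment local thresholds is our illustrative assumption (here, the stricter of the two is kept per tile), not the authors' MISODATA.

```python
import numpy as np

def isodata_threshold(img, tol=0.5, max_iter=100):
    """Basic ISODATA: iterate T = (mean above T + mean below T) / 2."""
    t = float(img.mean())
    for _ in range(max_iter):
        above, below = img[img > t], img[img <= t]
        if above.size == 0 or below.size == 0:
            break  # degenerate split; keep the current threshold
        t_new = 0.5 * (above.mean() + below.mean())
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t

def misodata_threshold(img, tiles=4):
    """Hypothetical MISODATA-style sketch: compare the global threshold with
    each tile's local threshold and apply the stricter one per tile."""
    t_global = isodata_threshold(img)
    h, w = img.shape
    out = np.zeros_like(img, dtype=bool)
    for i in range(tiles):
        for j in range(tiles):
            rs, re = i * h // tiles, (i + 1) * h // tiles
            cs, ce = j * w // tiles, (j + 1) * w // tiles
            tile = img[rs:re, cs:ce]
            t_local = isodata_threshold(tile)
            out[rs:re, cs:ce] = tile > max(t_global, t_local)
    return out
```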

3.3. Block-II: Vessel Structure-Based Enhancement and Segmentation. In Block-II, the masked image of Block-I is used as the input for vessel structure-based enhancement and RVS. The B-COSFIRE filter [5] is applied for contrast improvement of vessel structures; it would also enhance noise along with the vessel structures if the image were not preprocessed, which is why the masked image is used for further processing. The B-COSFIRE filter produces two results: a binary segmented image (I_sC) and a vessel structure-based enhanced image (I_EC). The outputs of the B-COSFIRE filter are displayed in Figure 6. The AND-type operation is used to combine I_s1 with I_sC, producing the output image denoted by I_AND. The effect of the AND-type operation is shown in Figure 7, which demonstrates that an alternative operator such as an OR-type would introduce noise and misclassification. The advantage of the AND-type operator is further exposed in Figure 8 by displaying the visual results with and without it. I_AND is postprocessed (yielding I_p1) and multiplied with I_EC, which is further thresholded to obtain a segmented image (I_s2). The pixel-by-pixel multiplication aims at ensuring the detection of vessels at their correct positions. The logical OR-type operation is used to produce the final result by coupling I_p1 and I_s2. The visual effects of the OR-type operator are presented in Figure 9.
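The fusion logic of this block can be sketched as follows. This is a simplified illustration: the intermediate postprocessing of the AND image is omitted, and the threshold value is an illustrative placeholder, not a value from the paper.

```python
import numpy as np

def fuse_vessel_maps(i_s1, i_sc, i_ec, thresh=0.2):
    """Sketch of the Block-II fusion: AND the two binary maps, multiply the
    result with the B-COSFIRE-enhanced image, re-threshold, then OR back."""
    i_and = i_s1 & i_sc             # keep only pixels both segmentations agree on
    i_s2 = (i_and * i_ec) > thresh  # re-detect vessels at the agreed positions
    return i_and | i_s2             # final vessel location map
```

The AND step suppresses false positives that appear in only one of the two maps, while the final OR step restores vessel pixels confirmed by the enhanced image.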
The B-COSFIRE filter application includes convolution with difference-of-Gaussians (DoG) filters, blurring of the filter responses, shifting of the blurred responses, and an approximate pointwise weighted geometric mean (GM). A DoG function $\mathrm{DoG}_\sigma(x, y)$ is given by [5]

$$\mathrm{DoG}_\sigma(x, y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) - \frac{1}{2\pi(0.5\sigma)^2}\exp\left(-\frac{x^2 + y^2}{2(0.5\sigma)^2}\right),$$

where $\sigma$ is the standard deviation (SD) of the outer Gaussian function (GF) that decides the range of the boundary, $0.5\sigma$ is the manually set SD of the inner GF, and $(x, y)$ symbolizes the pixel position in the image. The response $C_\sigma(x, y)$ of a DoG filter with kernel $\mathrm{DoG}_\sigma(x - x', y - y')$ is estimated by convolution with the intensity distribution $I(x', y')$ of the image pixels:

$$C_\sigma(x, y) = \left|\sum_{x', y'} I(x', y')\, \mathrm{DoG}_\sigma(x - x', y - y')\right|^{+},$$

where $|\cdot|^{+}$ represents the half-wave rectification process that rejects negative values.
In the B-COSFIRE filter, three parameters $(\sigma_i, \rho_i, \phi_i)$ are used to represent each point $i$, where $\sigma_i$ is the SD of the DoG filter, while $\rho_i$ and $\phi_i$ denote the polar coordinates. This set of parameters is indicated by $S = \{(\sigma_i, \rho_i, \phi_i) \mid i = 1, \dots, n\}$, where $n$ represents the number of considered DoG responses. The blurring operation computes the maximum of the weighted thresholded responses of a DoG filter in a local neighborhood, with weights given by a Gaussian $G_{\sigma'}(x', y')$ whose SD grows linearly with the distance from the filter center:

$$\sigma' = \sigma_0' + \alpha\rho_i,$$

where $\sigma_0'$ and $\alpha$ are constants. Each DoG-blurred response is shifted in the direction opposite to $\phi_i$ by a distance $\rho_i$, and as a result, all responses meet at the support center of the B-COSFIRE filter; the shift vector is $(\Delta x_i, \Delta y_i) = (-\rho_i \cos \phi_i, -\rho_i \sin \phi_i)$. The blurred and shifted response of the DoG filter is indicated by $s_{\sigma_i, \rho_i, \phi_i}(x, y)$ for every tuple $(\sigma_i, \rho_i, \phi_i)$ in the set $S$:

$$s_{\sigma_i, \rho_i, \phi_i}(x, y) = \max_{x', y'} \left\{ C_{\sigma_i}(x - \Delta x_i - x',\, y - \Delta y_i - y')\, G_{\sigma'}(x', y') \right\}.$$

The filter output is the weighted GM of all blurred and shifted responses:

$$r_S(x, y) = \left| \left( \prod_{i=1}^{n} \left( s_{\sigma_i, \rho_i, \phi_i}(x, y) \right)^{\omega_i} \right)^{1 / \sum_i \omega_i} \right|_t, \qquad \omega_i = \exp\left(-\frac{\rho_i^2}{2\hat{\sigma}^2}\right),$$

where $|\cdot|_t$ symbolizes thresholding the response at a fraction $t$ $(0 \le t \le 1)$ of its maximum. This GM realizes the AND-type behavior of the B-COSFIRE filter: a response is attained only when all DoG filter responses $s_{\sigma_i, \rho_i, \phi_i}$ are larger than zero. The overall step-by-step visual results according to the block diagram (Figure 2) are portrayed in Figure 10.
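The weighted geometric-mean combination of the blurred and shifted DoG responses can be sketched numerically as follows; `responses` stands for precomputed $s_i(x, y)$ maps, and the $\hat{\sigma}$ value is an illustrative assumption.

```python
import numpy as np

def weighted_geometric_mean(responses, rhos, sigma_hat=5.0):
    """Combine blurred-and-shifted DoG responses with the weighted geometric
    mean used by B-COSFIRE.  `responses` is a list of 2-D arrays (one per
    tuple in S); omega_i = exp(-rho_i^2 / (2 * sigma_hat^2))."""
    omegas = np.exp(-np.asarray(rhos, dtype=float) ** 2 / (2.0 * sigma_hat ** 2))
    # Work in log space for numerical stability; clip to avoid log(0).
    log_sum = sum(w * np.log(np.maximum(r, 1e-12))
                  for w, r in zip(omegas, responses))
    # AND-like behavior: the result collapses toward zero if any response is ~0.
    return np.exp(log_sum / omegas.sum())
```

Because the combination is multiplicative, a near-zero response at any point suppresses the output there, which is exactly the AND-type behavior described above.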

Experimental Outcomes and Deliberation
This section provides information about the datasets, performance metrics, analysis of experimental results, and time complexity of the proposed method.

4.1. Datasets. The proposed system obtained remarkable results on the freely available online datasets DRIVE [11,12], STARE [53], HRF [54], and CHASE_DB1 [55]. The merit of the framework is demonstrated through comparison with state-of-the-art systems. The datasets used for the validation of the suggested framework are summarized in Table 1. The manually labeled results in all datasets are utilized as the gold standard for performance assessment of the proposed framework.
4.2. Performance Metrics. Beyond the standard pixel-wise measures, the connectivity-area-length (CAL) metric is also computed. This metric provides a justification based on the properties (connectivity, area, and length) of the segmented structures beyond the correctly classified image pixels. In Table 2, N = TN + TP + FN + FP, S = (TP + FN)/N, and P = (TP + FP)/N [58]. The terms TP, TN, FP, and FN denote the true positives (correctly detected vessel pixels), true negatives (correctly detected nonvessel pixels), false positives (nonvessel pixels wrongly predicted as vessels), and false negatives (vessel pixels wrongly predicted as nonvessels), respectively.
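The pixel-wise scores can be computed directly from the confusion counts, as in the following minimal numpy sketch (the AUC formula of Table 2 is omitted here).

```python
import numpy as np

def pixel_metrics(pred, gold):
    """Standard pixel-wise scores from the confusion counts."""
    pred, gold = pred.astype(bool), gold.astype(bool)
    tp = np.sum(pred & gold)     # correctly detected vessel pixels
    tn = np.sum(~pred & ~gold)   # correctly detected nonvessel pixels
    fp = np.sum(pred & ~gold)    # nonvessel pixels predicted as vessels
    fn = np.sum(~pred & gold)    # vessel pixels predicted as nonvessels
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)          # sensitivity
    sp = tn / (tn + fp)          # specificity
    mcc = (tp * tn - fp * fn) / np.sqrt(
        float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"Acc": acc, "Sn": sn, "Sp": sp, "MCC": mcc}
```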
Let $I_S$ be the extracted final binary image and $I_G$ the corresponding manually segmented image. The considered metric evaluates the following [57,58]:

(i) Connectivity (C): it calculates the fragmentation grade of $I_S$ with respect to the manual segmentation $I_G$ and penalizes fragmented segmentations:

$$C = 1 - \min\left(1, \frac{|\#_C(I_G) - \#_C(I_S)|}{\#(I_G)}\right),$$

where $\#_C(\cdot)$ counts the connected segments while $\#(\cdot)$ measures the number of vessel pixels in the considered binary image.

(ii) Area (A): it measures the degree of overlap between $I_S$ and $I_G$, where a morphological dilation $\delta_\varepsilon(\cdot)$ by a disc SE of $\varepsilon$-pixel radius provides tolerance to lines of various sizes:

$$A = \frac{\#\left((\delta_\varepsilon(I_S) \cap I_G) \cup (I_S \cap \delta_\varepsilon(I_G))\right)}{\#(I_S \cup I_G)}.$$

We set $\varepsilon = 2$.

(iii) Length (L): it determines the degree of length similarity between $I_S$ and $I_G$ by comparing the two line networks:

$$L = \frac{\#\left((\varphi(I_S) \cap \delta_\beta(I_G)) \cup (\delta_\beta(I_S) \cap \varphi(I_G))\right)}{\#(\varphi(I_S) \cup \varphi(I_G))},$$

where $\varphi(\cdot)$ is a skeletonization process and $\delta_\beta(\cdot)$ is a morphological dilation with a disc SE of $\beta$-pixel radius. The value of $\beta$ controls the tolerance to dissimilarity of the line tracing output. We set $\beta = 2$. The final assessment parameter, named CAL, is defined as $f(C, A, L) = C \cdot A \cdot L$.

The per-image results of the proposed framework on the four datasets are reported in Tables 3-6. The best and worst results within Tables 3-6 are highlighted in italic font. The best and worst image results from each dataset are selected based on their accuracy scores, and their pictorial results are shown in Figures 11-14.
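The connectivity and area terms of the CAL metric can be sketched with scipy.ndimage as follows; the length term additionally requires a skeletonization step and is omitted here for brevity.

```python
import numpy as np
from scipy import ndimage

def disk(r):
    """Binary disc-shaped structuring element of pixel radius r."""
    y, x = np.ogrid[-r:r + 1, -r:r + 1]
    return x * x + y * y <= r * r

def cal_connectivity_area(seg, gold, eps=2):
    """C and A terms of the CAL metric for binary vessel maps."""
    seg, gold = seg.astype(bool), gold.astype(bool)
    # C: penalize fragmentation by comparing connected-component counts.
    n_seg = ndimage.label(seg)[1]
    n_gold = ndimage.label(gold)[1]
    c = 1.0 - min(1.0, abs(n_gold - n_seg) / gold.sum())
    # A: overlap with tolerance - dilate each map by a disc of radius eps.
    d_seg = ndimage.binary_dilation(seg, structure=disk(eps))
    d_gold = ndimage.binary_dilation(gold, structure=disk(eps))
    a = np.sum((d_seg & gold) | (seg & d_gold)) / np.sum(seg | gold)
    return c, a
```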
The framework performs well on both healthy and pathological images of all selected datasets. The statistical results in Tables 3-6 validate that the suggested system is robust and has the capability to handle the bright-lesion images of the STARE dataset, the higher-resolution images of the HRF dataset, the low-resolution images of the DRIVE dataset, and the left- and right-eye images of the CHASE_DB1 dataset. The anatomical structures are also efficiently omitted to avoid misclassification.
The average statistical results of the proposed framework on all selected datasets are displayed in Table 7, which shows that the highest mean scores of Acc 0.997, Sn 0.814, Sp 0.997, and AUC 0.905 are achieved on the CHASE_DB1 dataset. The lowest FPR is also observed on the same dataset. The highest values of MCC 0.761 and CAL 0.699 are recorded on the HRF dataset. The highest value of each parameter is italicized in the respective column of Table 7.
The average performance parameter scores of the proposed framework on the DRIVE and STARE datasets are compared with the existing literature in Table 8, while  Table 9 shows the result comparison of the HRF and CHASE_DB1 datasets. The Acc, Sn, and Sp results of all techniques in Tables 8 and 9 are acquired from their respective published articles while the AUC result is calculated by using the formula in Table 2.
In Table 8, the obtained results of the framework are compared with 19 unsupervised and 18 supervised existing techniques. The proposed framework achieved a higher Acc than all unsupervised methods on the DRIVE dataset except Khan et al. [40] and Memari et al. [59], which are 0.003% better, and Fan et al. [60], which is 0.002% better than ours. The supervised methods of Ricci and Perfetti [14], Lupascu et al. [15], Wang et al. [16], Zhu et al. [18], Thangaraj et al. [22], Memari et al. [23], Khowaja et al. [26], and Fan et al. [61] show 0.001%, 0.001%, 0.019%, 0.003%, 0.003%, 0.014%, 0.017%, and 0.008% better results than the proposed method, respectively. However, some of these methods are validated on only one dataset, which suggests that they are tuned for a single dataset. Some of them also produce a very low AUC score, which reflects the trade-off between Sn and Sp. Moreover, supervised methods are computationally very expensive. In the case of the STARE dataset, the framework produced the highest Acc score among all compared methods.

Table 9 reflects that very few techniques used both the HRF and CHASE_DB1 datasets for validation. The Acc score of the framework is higher than that of both supervised and unsupervised approaches on the HRF and CHASE_DB1 datasets, except Soomro et al. [62] and Fan et al. [61], which are slightly higher than ours on the HRF dataset only. Fan et al. [61] showed a higher Sp value than all other methods on the HRF dataset.

Table 11: Execution time per image and the platform used.

Method              Time         Platform
Roychowdhury [17]   3.11 sec     Intel Core i3 CPU 2.6 GHz, 2 GB RAM
Zhu [18]            12.160 sec   Intel i7-4790K CPU 4.0 GHz, 32 GB RAM
Memari [23]         8.2 min      Intel i5-M480 CPU 2.67 GHz, 4 GB RAM
Biswal [29]         3.3 sec      Intel i3-4010U CPU 1.7 GHz, 4 GB RAM
Badawi [3]          8 sec        CPU 2.7 GHz, 16 GB RAM
Yue [50]            4.6 sec      Intel i5-6200U CPU 2.3 GHz, 8 GB RAM
Khan [39]           6.1 sec      5 × Intel Core i3 CPU, 2.53 GHz, 4 GB RAM
Khan [40]           1.56 sec     —
Azzopardi [5]       11.83 sec    —
Vlachos [47]        9.3 sec      —
Bankhead [38]       22.45 sec    —
Proposed            5.5 sec      —
The highest Sp value on the CHASE_DB1 dataset is obtained by the proposed method. The other supervised and unsupervised methods acquired slightly greater or equivalent values of the Sn and AUC metrics on the HRF and CHASE_DB1 datasets compared to ours.
The average value of MCC attained by the proposed method is higher than that of all compared unsupervised approaches on the DRIVE, STARE, and HRF datasets, while it is statistically lower than that of the supervised methods (i.e., FC-CRF [73] and UP-CRF [73]) on the DRIVE, STARE, and CHASE_DB1 datasets. The CAL value of the proposed method is higher than that of all supervised and unsupervised methods on the HRF dataset, while it is statistically lower than or equivalent to the CAL values of other methods on the DRIVE, STARE, and CHASE_DB1 datasets. The execution time of the proposed method is compared with that of existing techniques in Table 11. The time values are computed on a single image taken from the DRIVE and STARE datasets.

Conclusion
Vessel extraction is crucial for inspecting abnormalities inside and around the retinal periphery. Retinal vessel segmentation is a challenging task due to the existence of pathologies, the unpredictable dimensions and contours of the vessels, nonuniform illumination, and structural inconsistency between subjects. The proposed methodology is consistent, fast, and completely automated for the isolation of the retinal vascular network. The success of the proposed framework is evidently revealed by the RVS statistics on the DRIVE, STARE, HRF, and CHASE_DB1 datasets. The eradication of anomalous structures prior to enhancement boosted the efficiency of the proposed method. The application of logical operators avoids the misclassification of foreground pixels, which enhances the accuracy and makes the method robust. The pictorial representations validate that the framework is able to segment both healthy and unhealthy images. Furthermore, the method does not require any expert-marked data for training, which keeps it computationally fast.

Data Availability
All the data are fully available within the manuscript without any restriction.