Multichannel Retinal Blood Vessel Segmentation Based on the Combination of Matched Filter and U-Net Network

To address the insufficient extraction of small retinal vessels by current methods, we propose a retinal vessel segmentation algorithm that combines supervised and unsupervised learning. The algorithm uses a multiscale matched filter with vessel enhancement capability and a U-Net model with an encoder-decoder structure. Three channels extract vessel features separately, and the segmentation results of the three channels are then merged. The proposed algorithm was verified and evaluated on the DRIVE, STARE, and CHASE_DB1 datasets. The experimental results show that it segments small vessels better than most other methods: its sensitivity reaches 0.8745, 0.8903, and 0.8916 on the three datasets, respectively, nearly 0.1 higher than that of other existing methods.


Introduction
The human eye consists of the following parts: the cornea, pupil, iris, vitreous, and retina. Abnormalities in any of these tissues may cause vision defects or even blindness. Among them, the study of the retinal structure and its blood vessels is particularly significant [1]. Extracting retinal vessels and characterizing their morphological properties, such as diameter, shape, distortion, and bifurcation, can be used to screen, evaluate, and treat different ocular abnormalities [2]. Evaluating retinal vascular properties, such as changes in vessel width, helps in analyzing hypertension, while bifurcation points and tortuosity can help identify cardiovascular disease and diabetic retinopathy [3].
Retinal vessel extraction methods, including pattern recognition, are classified into five core classes [4]. Pattern recognition techniques are generally divided into two categories: supervised learning and unsupervised learning. Supervised methods need manual segmentations by ophthalmologists for training. They require many training images and take longer to train than other methods, but they generalize well and can be applied to other images of the same type. In contrast, unsupervised methods, such as matched filtering, mathematical morphology operations, vessel tracking, and clustering, do not require image labels and instead analyze and process the available data directly. Many researchers have applied and extended both types of methods in recent years.
1.1. Unsupervised Learning Methods. Literature [5] proposed a new kernel-based technique, namely, a Fréchet PDF-based matched filter, which achieves a better match between the vessel profile and the Fréchet template. Literature [6] improved the vessel extraction method by using a series of morphological operations to extract small vessels and then fusing them with the segmented image to supplement the small vessels. Compared with other algorithms, it segments as many tiny vessels as possible. However, the algorithm's steps are overly complicated, and although the final segmentation captures the smallest vessels, these small vessels are fragmented overall and poorly connected to the thicker vessels. Literature [7] proposed a new matched filtering method that applies contrast-limited adaptive histogram equalization and a Gaussian second-derivative-based matched filter in preprocessing and uses an entropy-based optimal thresholding method for binarization. This algorithm effectively improves segmentation sensitivity, but, like literature [6], it does not perform well in accuracy.
Literature [8] proposed an automatic retinal vessel segmentation method using a matched filter and fuzzy C-means clustering. The algorithm uses contrast-limited adaptive histogram equalization to enhance image contrast. After Gabor and Frangi filters are used for noise and background removal, fuzzy C-means clustering extracts the initial vascular network, and an integrated level set method further refines the segmentation. The algorithm has good sensitivity and specificity, but its ability to segment small vessels is limited, and many segmentation details are missed. Literature [9] proposed a novel method to extract retinal vessels using local contrast normalization and a second-order detector.

BioMed Research International
This methodology achieves higher vessel segmentation accuracy than existing techniques. Literature [10] proposed a novel matched filter approach with the Gumbel probability distribution function as its kernel; the authors attribute its higher accuracy to the better matching provided by the Gumbel PDF-based kernel.
1.2. Supervised Learning Methods. Literature [11] proposed a method using deep convolutional neural networks and a hysteresis thresholding method to detect vessels accurately; it performs well and detects more tiny vessels. Literature [12] proposed a multilevel CNN model for automatic vessel segmentation in retinal fundus images, together with a novel max-resizing technique that improves the generalization of the training procedure. Literature [13] proposed a new segment-level loss, used together with the pixel-wise loss, to balance the importance of thick and thin vessels during training. Literature [14] proposed a cross-connected convolutional neural network (CcNet) to segment retinal vessel trees automatically; cross connections between a primary path and a secondary path fuse multilevel features. This method achieves relatively advanced performance, with strong robustness and competitive segmentation speed. Literature [15] proposed a method for retinal vessel segmentation using patch-based fully convolutional networks. Literature [16] applied dilated convolutions in a deep neural network to improve the segmentation of retinal vessels from fundus images. Literature [17] proposed an improved algorithm based on the U-Net model that integrates Inception-Res and Dense-Inception structure modules into U-Net. The algorithm greatly increases the depth of the network without adding extra training parameters, performs well in retinal vessel segmentation, and generalizes well. Literature [18] proposed a new hybrid algorithm for retinal vessel segmentation on fundus images that applies a new directionally sensitive vessel enhancement before feeding the fundus images to U-Net.
Literature [19] proposed a supervised method based on a fully convolutional network pretrained through transfer learning. This method reduces the typical retinal vessel segmentation problem to regional semantic vessel element segmentation tasks. In general, unsupervised methods are less complex than supervised methods but have relatively lower accuracy [13].
To address the insufficient segmentation of small vessels in most existing work, we devise a new automatic retinal vessel segmentation framework based on an improved U-Net and a multiscale matched filter. The main contributions of this paper are as follows: (1) We propose an improved black hat algorithm that enhances vessel features and reduces interference from other tissues. (2) We propose an algorithm that combines a multiscale matched filter with a U-Net neural network. This paper mainly uses the improved U-Net convolutional
The rest of this paper is organized as follows. Section 2 outlines the proposed method and datasets. Section 3 presents the performance of the proposed method and a detailed discussion. Section 4 concludes the paper.

System Overview.
The proposed algorithm consists of three steps: preprocessing of the datasets, training U-Net in three channels, and postprocessing. The main feature extraction framework is based on the improved U-Net model with three feature extraction channels. Channel 1 performs feature extraction on the whole image, so morphological operations are applied in its preprocessing to reduce image artifacts and noise. In the other two channels, matched filters extract retinal vessels of different scales before the improved U-Net model extracts features. Finally, an OR operator fuses the three outputs into the final image. Experimental results verify that multichannel matched filtering gives better segmentation than the unprocessed image. The overall flowchart is shown in Figure 1.
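The OR-type fusion of the three channel outputs can be sketched as a pixel-wise logical OR of binary masks. This is a minimal NumPy illustration, assuming each channel has already been binarized to an array of the same shape; the function name `fuse_channels` is hypothetical.

```python
import numpy as np


def fuse_channels(masks):
    """Fuse binary vessel masks with a pixel-wise OR.

    masks: sequence of 2-D arrays with identical shape; nonzero marks a vessel pixel.
    Returns a boolean array that is True wherever any channel detected a vessel.
    """
    stacked = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return np.any(stacked, axis=0)


if __name__ == "__main__":
    ch1 = np.array([[1, 0], [0, 0]])
    ch2 = np.array([[0, 1], [0, 0]])
    ch3 = np.array([[0, 0], [0, 1]])
    print(fuse_channels([ch1, ch2, ch3]).astype(int))
```

Because OR keeps every pixel found by any channel, it favors sensitivity, but it also accumulates each channel's false positives, which the postprocessing step later has to remove.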

Datasets.
To verify the effectiveness of the proposed algorithm, we train and test it on three commonly used public datasets: DRIVE, STARE, and CHASE_DB1. These datasets include a wide range of challenging images. DRIVE contains 40 color retinal fundus images, divided into a training set and a testing set, with a resolution of 565 × 584 pixels. STARE contains 20 color retinal fundus images with a resolution of 605 × 700 pixels; unlike DRIVE, it is not divided into training and testing sets. CHASE_DB1 contains 28 color retinal fundus images with a resolution of 960 × 999 pixels, and it is not divided into training and testing sets either. Each image in these three datasets has a retinal vessel label segmented manually by two professional physicians. In STARE, we randomly selected 5 images as test images (im0002, im0077, im0163, im0255, and im0291) and used the remaining 15 images as the training set. In CHASE_DB1, we selected the last 8 images as the test set and the remaining 20 images as the training set. Because mask images are not available for STARE and CHASE_DB1, we extracted the green channel of each image and then applied morphological and thresholding operations to obtain the mask images.
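The mask-generation step for STARE and CHASE_DB1 is described only at a high level. A minimal sketch, assuming a fixed green-channel threshold followed by a disk-shaped opening, hole filling, and keeping the largest connected component, could look like the following. The threshold of 20 and the opening radius of 3 are hypothetical values, not the paper's settings, and `estimate_fov_mask` is a hypothetical name.

```python
import numpy as np
from scipy import ndimage


def estimate_fov_mask(rgb, threshold=20, open_radius=3):
    """Rough field-of-view (FOV) mask from the green channel.

    rgb: H x W x 3 uint8 array in RGB order.
    threshold and open_radius are hypothetical defaults.
    """
    green = rgb[..., 1].astype(np.float64)
    mask = green > threshold                      # dark surround -> background
    r = open_radius
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    disk = xx**2 + yy**2 <= r**2                  # disk-shaped structuring element
    mask = ndimage.binary_opening(mask, structure=disk)  # drop isolated bright specks
    mask = ndimage.binary_fill_holes(mask)               # fill dark vessels inside the FOV
    labels, n = ndimage.label(mask)
    if n > 1:                                     # keep only the largest component
        sizes = np.bincount(labels.ravel())
        sizes[0] = 0                              # ignore the background label
        mask = labels == np.argmax(sizes)
    return mask
```

Hole filling matters here because dark vessels inside the field of view fall below a global threshold and would otherwise punch holes into the mask.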

Preprocessing.
The green channel is selected as the input of the preprocessing step because retinal vessels in the green channel have better contrast with the background than in the red and blue channels [20,21], as shown in Figure 2.
Figure 2 shows that the vessels in the green channel of the color image carry more information than those in the red and blue channels, but the overall image is still dark and its contrast is low. To improve this, contrast-limited adaptive histogram equalization (CLAHE) [22] and a gamma transformation are applied to the extracted green-channel grayscale image, as shown in Figure 3. In this step, CLAHE enhances the contrast between vessels and nonvessel regions, and the gamma transformation adjusts the image and reduces background noise. Tables 1-3 in the Supplementary Materials give a comprehensive comparison of vessel enhancement algorithms, and these data show that CLAHE improves the overall performance of the proposed method.
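The green-channel extraction, CLAHE, and gamma steps can be sketched with scikit-image's `exposure.equalize_adapthist`, which implements CLAHE. This is a sketch under stated assumptions: the clip limit of 0.01 and the gamma of 1.2 are illustrative values, since the paper does not give its settings here, and `preprocess_green` is a hypothetical name.

```python
import numpy as np
from skimage import exposure


def preprocess_green(rgb, clip_limit=0.01, gamma=1.2):
    """Green channel -> CLAHE -> gamma transformation.

    rgb: H x W x 3 uint8 fundus image in RGB channel order.
    clip_limit and gamma are hypothetical defaults, not the paper's values.
    Returns a float64 image in [0, 1].
    """
    green = rgb[..., 1]                                  # best vessel contrast
    enhanced = exposure.equalize_adapthist(green, clip_limit=clip_limit)  # CLAHE, in [0, 1]
    return np.clip(enhanced, 0.0, 1.0) ** gamma          # gamma transformation
```

Working in [0, 1] keeps the gamma step well defined: values stay in range for any positive gamma.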

Multichannel Feature Extraction
2.4.1. Channel 1. To retain as much of the vessel feature information in the image as possible, channel 1 first removes background noise with morphological operations and then uses the U-Net network for feature extraction. To handle the artifacts caused by uneven illumination and by nonvascular structures, we estimate the background with a morphological closing operation and then combine it with the original image as shown in equation (1). Figure 4 shows that the bright optic disc structure in the original image is removed and most of the artifacts are suppressed.
where f(x, y) is the processed image and I_close(x, y) is the image after the morphological closing operation, for which we use a disk-shaped structuring element with a radius of eleven pixels. I(x, y) is the original image, and m and n are the image dimensions in pixels.
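Equation (1) itself is not reproduced in this text, so the sketch below only illustrates the background-estimation step it relies on: a grayscale closing with a disk of radius 11, as stated above. The way the closing result is combined with the original image (subtracting it and adding back its mean) is a hypothetical stand-in for equation (1), chosen so that bright structures larger than the disk, such as the optic disc, are flattened to a common level. The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def disk(radius):
    """Boolean disk-shaped structuring element of the given radius."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return xx**2 + yy**2 <= radius**2


def remove_background(img, radius=11):
    """Estimate the background with a grayscale closing and subtract it.

    The closing fills dark structures narrower than the disk (vessels), so it
    approximates the background, including bright regions such as the optic disc.
    HYPOTHETICAL: the subtraction plus mean offset stands in for equation (1).
    """
    img = np.asarray(img, dtype=np.float64)
    background = ndimage.grey_closing(img, footprint=disk(radius))
    return img - background + background.mean()
```

In this sketch a dark vessel keeps its contrast against its surroundings, while a large bright blob ends up at the same level as plain background.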

Channel 2.
Analysis of grayscale retinal images shows that the cross-sectional gray intensity of a vessel follows an inverted Gaussian curve: the gray value at the vessel centerline is low, and the gray value at the vessel edge is high [5]. Exploiting this characteristic, Chaudhuri et al. [23] designed a Gaussian matched filter whose profile models the gray-intensity distribution of a vessel cross section and filters the vessels piecewise. In this paper, matched filters are used in channels 2 and 3 to enhance and extract large and small vessels separately, so that the retinal vessels are segmented comprehensively.
The kernel of the Gaussian matched filter is

$$K(x, y) = -\exp\left(-\frac{x^2}{2s^2}\right), \quad |y| \le \frac{l}{2},$$

where s is the width of the Gaussian kernel and l is its length. Retinal vessels start from the center of the optic disc and extend in multiple directions, so the Gaussian kernel is rotated to filter vessels in different orientations. Let p = (x, y) be a discrete point in the kernel, and let θ_i (0 ≤ θ_i ≤ π) be the angle of the i-th kernel. The rotation matrix is

$$r_i = \begin{bmatrix} \cos\theta_i & -\sin\theta_i \\ \sin\theta_i & \cos\theta_i \end{bmatrix},$$

and the coordinates of p after rotation are p_i = (u, v) = p r_i^T. The i-th template kernel is then

$$K_i(x, y) = -\exp\left(-\frac{u^2}{2s^2}\right), \quad \forall p_i \in N,$$

where N is the template neighborhood, with range N = {(u, v) : |u| ≤ 3s, |v| ≤ l/2}. In practice, the mean value of the template coefficients must be taken into account:

$$m_i = \frac{1}{A}\sum_{p_i \in N} K_i(x, y),$$

where A is the number of points in the template area. The final template kernel is therefore

$$K_i'(x, y) = K_i(x, y) - m_i, \quad \forall p_i \in N.$$

This paper addresses the dependence of the Gaussian matched filter response on vessel diameter. Figure 5 shows the image enhancement result of the large-scale Gaussian matched filter in channel 2, with parameters l = 10.8, s = 1.9, and 8 directions (i = 1, 2, …, 8 in equation (3)). The filter segments thicker vessels well and is robust to noise, but it performs poorly on small vessels: the smaller vessels cannot be distinguished from the background, and vessels are easily broken. To solve this problem, this paper proposes an improved method based on the black hat algorithm, which reduces the influence of background noise and enhances vessel features by taking the difference between the image before matched filtering and the image obtained after it. We applied the series of transformations shown in equations (8) and (9) to the images processed by large-scale matched filtering.
We call this improved procedure black hat2. The black hat transformation is

$$B_{\text{hat}}(f) = (f \bullet b) - f,$$

where • is the morphological closing operation, b(u, v) is a disk-shaped structuring element, B_hat(f) is the black hat transformation, f(x, y) is the original image, and g(x, y) is the final processed image.
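The matched-filter kernel construction described for channel 2 can be sketched in NumPy as follows. This is a sketch of the classic Chaudhuri-style filter, not the authors' code: it assumes the orientations θ_i are spaced uniformly in [0, π) (the text fixes only their number), filters with `scipy.ndimage.correlate`, and keeps the maximum response over all orientations at each pixel. The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def matched_filter_kernels(s, l, n_directions):
    """Zero-mean Gaussian matched-filter kernels K'_i, one per orientation."""
    # Grid large enough to hold the rotated neighborhood N (|u| <= 3s, |v| <= l/2).
    half = int(np.ceil(np.hypot(3 * s, l / 2)))
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    kernels = []
    for i in range(n_directions):
        theta = i * np.pi / n_directions              # assumed uniform spacing in [0, pi)
        u = x * np.cos(theta) - y * np.sin(theta)     # (u, v) = p r_i^T
        v = x * np.sin(theta) + y * np.cos(theta)
        inside = (np.abs(u) <= 3 * s) & (np.abs(v) <= l / 2)   # neighborhood N
        k = np.zeros_like(x)
        k[inside] = -np.exp(-u[inside] ** 2 / (2 * s ** 2))    # K_i
        k[inside] -= k[inside].mean()                           # subtract m_i over the A points
        kernels.append(k)
    return kernels


def matched_filter_response(img, s, l, n_directions):
    """Maximum response over orientations; dark vessels give high values."""
    img = np.asarray(img, dtype=np.float64)
    responses = [ndimage.correlate(img, k, mode="reflect")
                 for k in matched_filter_kernels(s, l, n_directions)]
    return np.max(responses, axis=0)
```

With the channel 2 settings, a call would be `matched_filter_response(green, s=1.9, l=10.8, n_directions=8)`. Because each kernel sums to zero, a flat background produces no response, and only vessel-like dark ridges stand out.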

Channel 3.
This paper uses a small-scale Gaussian matched filter to enhance small vessels, as shown in Figure 6. After many experiments, the parameters of the matched filter are set to l = 5, s = 0.1, and 18 directions (i = 1, 2, …, 18 in equation (3)). The small-scale filter effectively enhances the small vessels in the image, but it also enhances a lot of striped noise and performs poorly on thick vessels with a central light reflex. To reduce the background noise, the black hat2 algorithm used in channel 2 is also applied to remove the background in channel 3.
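The black hat transformation used by black hat2 in channels 2 and 3, closing(f) - f with a disk-shaped structuring element, can be sketched with a grayscale closing. Equation (9), which produces g(x, y), is not reproduced in this text, so the sketch stops at the black hat itself; the disk radius is a hypothetical parameter. The example illustrates its scale selectivity: dark structures narrower than the disk are highlighted, while broader dark regions and flat background are suppressed.

```python
import numpy as np
from scipy import ndimage


def disk(radius):
    """Boolean disk-shaped structuring element of the given radius."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return xx**2 + yy**2 <= radius**2


def black_hat(f, radius):
    """Black hat transformation: grayscale closing of f minus f itself."""
    f = np.asarray(f, dtype=np.float64)
    return ndimage.grey_closing(f, footprint=disk(radius)) - f
```

Choosing the radius sets the cutoff: a thin dark line is lifted by its full depth, whereas a dark blob wider than the disk survives the closing and therefore yields almost no black hat response.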

U-Net Model.
In deep-learning-based semantic image segmentation, U-Net is the most widely used network model; it improves on the classic fully convolutional network (FCN) [24]. U-Net is an image-to-image, pixel-level classification network with a clear structure, as shown in Figure 7. Unlike other standard segmentation networks, U-Net fuses features by concatenation: it joins feature maps along the channel dimension. Unlike the structure in the original literature [24], this paper sets the padding to 1 in every convolution layer, with a 3 × 3 convolution kernel. This keeps the output and input images the same size and avoids an extra resizing operation in the output layer.

The output layer of U-Net essentially performs a binary classification. In this paper, its output is binarized with an adaptive threshold segmentation algorithm. Instead of computing one global threshold for the whole image, this algorithm computes a local threshold for each area of the image, so different areas receive different thresholds for binary segmentation. In this calculation, b is a fixed parameter, (2m + 1) × (2n + 1) is the size of the local area, and T is the threshold of that area.

This paper proposes a loss function that combines the Dice coefficient with the binary cross-entropy loss. The Dice coefficient is widely used to evaluate image segmentation; to turn it into a loss to be minimized, we use

$$L_{\text{Dice}} = 1 - \frac{2|X \cap Y|}{|X| + |Y|},$$

where |X ∩ Y| is the number of elements common to the prediction map and the label map, and |X| and |Y| are the numbers of elements in the prediction map and the label map, respectively.
To simplify the calculation, |X ∩ Y| is approximated as the dot product between the predicted probability map and the label map, summed over all elements, and |X| and |Y| are computed by summing the squares of their elements:

$$L_{\text{Dice}} = 1 - \frac{2\sum_{k}\sum_{i=1}^{N} p(k,i)\,q(k,i)}{\sum_{k}\sum_{i=1}^{N} p(k,i)^{2} + \sum_{k}\sum_{i=1}^{N} q(k,i)^{2}},$$

where N is the number of pixels, and p(k, i) ∈ [0, 1] and q(k, i) ∈ [0, 1] are the predicted probability and the true label of pixel i for category k. The cross-entropy loss used to optimize the network is defined in terms of the following quantities: TP and TN are the numbers of true positive and true negative pixels, respectively; N_p and N_n are the numbers of segmented and nonsegmented pixels, respectively; y is the label value (y = 1 for the segmentation target and y = 0 for the background); and p is the predicted probability of the pixel.
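The soft Dice loss with squared-sum denominators, combined with a binary cross-entropy term, can be sketched in NumPy as follows. Two assumptions are labeled: the combination is written as λ·L_Dice + (1 - λ)·L_BCE with λ = 0.5, because the text gives the value of λ but not the combined formula; and a plain, unweighted binary cross-entropy stands in for the paper's cross-entropy, whose exact form is not reproduced in this text. The function names are hypothetical.

```python
import numpy as np


def soft_dice_loss(p, q, eps=1e-7):
    """1 - 2*sum(p*q) / (sum(p^2) + sum(q^2)); eps avoids division by zero."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    inter = np.sum(p * q)                                  # dot-product approximation of |X ∩ Y|
    return 1.0 - 2.0 * inter / (np.sum(p**2) + np.sum(q**2) + eps)


def bce_loss(p, y, eps=1e-7):
    """Mean binary cross-entropy; p is clipped away from 0 and 1 for stability."""
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0 - eps)
    y = np.asarray(y, dtype=np.float64)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def combined_loss(p, y, lam=0.5):
    """HYPOTHETICAL weighting: lam * Dice + (1 - lam) * BCE."""
    return lam * soft_dice_loss(p, y) + (1.0 - lam) * bce_loss(p, y)
```

In training, a deep-learning framework would compute the same quantities on tensors so that gradients can flow; the Dice term counteracts the dominance of background pixels, and the cross-entropy term gives smooth per-pixel gradients.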
The weighting coefficient λ is set to 0.5 in this work, and the U-Net training procedure is summarized in Algorithm 1.
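The local adaptive threshold used to binarize the U-Net output (a threshold T computed over a (2m + 1) × (2n + 1) window with a fixed parameter b) can be sketched as follows. The exact formula is not reproduced in this text, so this sketch assumes the common local-mean form T = mean - b and computes the window mean with `scipy.ndimage.uniform_filter`. The defaults m = n = 7 and b = -0.05 are hypothetical; a negative b requires a pixel to exceed its local mean, which keeps flat, low-probability background from being labeled as vessel.

```python
import numpy as np
from scipy import ndimage


def adaptive_threshold(prob, m=7, n=7, b=-0.05):
    """Binarize a probability map with a local-mean threshold T = mean - b.

    The window is (2m + 1) x (2n + 1); m, n, and b are hypothetical defaults.
    """
    prob = np.asarray(prob, dtype=np.float64)
    local_mean = ndimage.uniform_filter(prob, size=(2 * m + 1, 2 * n + 1), mode="reflect")
    T = local_mean - b
    return prob > T
```

Because each pixel is compared only with its neighborhood, a faint thin vessel in a dim region can still pass, whereas a single global threshold would have to trade it off against bright areas elsewhere in the image.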

2.6. Postprocessing.
Because the final segmentation image merges three segmentation images, it also accumulates the noise of all three. This noise can significantly degrade the segmented image, so this paper addresses it in a final postprocessing step. A morphological algorithm computes the size of each connected region of the image; using 8-adjacency, regions smaller than 25 pixels are eliminated, that is, their pixels are reclassified as background. We selected a test image from the DRIVE dataset for experimental comparison, and the comparison images are shown in Figure 8.
The U-Net model used in this paper differs slightly from the structure in literature [24]: to keep the input and output image sizes of the model consistent, the convolution structure is adjusted accordingly. The specific model structure parameters are shown in Table 1.
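The small-region removal step can be sketched with `scipy.ndimage.label` using an 8-connected structuring element and the 25-pixel limit given above. The function name `remove_small_components` is hypothetical.

```python
import numpy as np
from scipy import ndimage


def remove_small_components(mask, min_size=25):
    """Reclassify 8-connected regions smaller than min_size pixels as background."""
    mask = np.asarray(mask, dtype=bool)
    eight = np.ones((3, 3), dtype=bool)          # 8-adjacency
    labels, n = ndimage.label(mask, structure=eight)
    if n == 0:
        return mask.copy()
    sizes = np.bincount(labels.ravel())          # sizes[0] counts background pixels
    keep = sizes >= min_size
    keep[0] = False                              # never keep the background label
    return keep[labels]
```

The choice of 8-adjacency matters for thin vessels: a one-pixel-wide diagonal vessel is a single region under 8-adjacency but would fall apart into isolated pixels, and be deleted, under 4-adjacency.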
During training, we set the number of epochs to 30 and the initial learning rate lr to 0.01, and the learning rate is then updated according to a three-stage formula. A larger learning rate at the beginning lets the model reach the vicinity of the optimal parameters faster, which reduces training time. After training for a certain number of epochs, the learning rate is reduced so that subsequent updates bring the parameters closer to the optimal values. The loss function is optimized with stochastic gradient descent (SGD).
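A three-stage, piecewise-constant learning-rate schedule of the kind described can be sketched as follows. The paper's own formula is not reproduced in this text, so the stage boundaries (epochs 10 and 20) and the decay factor of 0.1 are hypothetical; only the initial rate of 0.01 and the 30 training epochs come from the text.

```python
def three_stage_lr(epoch, base_lr=0.01, boundaries=(10, 20), factor=0.1):
    """Piecewise-constant learning rate with three stages.

    HYPOTHETICAL: boundaries and factor are illustrative; the paper's
    three-stage formula is not reproduced in this text.
    """
    stage = sum(epoch >= b for b in boundaries)   # 0, 1, or 2
    return base_lr * factor ** stage


if __name__ == "__main__":
    for epoch in (0, 9, 10, 19, 20, 29):
        print(epoch, three_stage_lr(epoch))
```

In a PyTorch setup, such a function could drive `torch.optim.lr_scheduler.LambdaLR` through a multiplicative `lr_lambda`, for example `lambda e: three_stage_lr(e) / 0.01`, on top of an SGD optimizer.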

Training Image Preparation.
We randomly selected 15 images from STARE and the first 20 images from CHASE_DB1 as their respective training sets. Because the existing datasets contain only a small number of images, we expand the training set of each dataset to avoid overfitting during model training: each training image is flipped horizontally and vertically and rotated by 180 degrees, which increases the amount of training data fourfold.
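The fourfold expansion can be sketched as follows: each image and its label are transformed identically, so the vessel annotation stays aligned with the image. The function name `augment_pair` is hypothetical.

```python
import numpy as np


def augment_pair(image, label):
    """Return the original pair plus horizontal flip, vertical flip, and 180-degree rotation."""
    ops = (
        lambda a: a,                    # original
        np.fliplr,                      # horizontal flip (axis 1)
        np.flipud,                      # vertical flip (axis 0)
        lambda a: np.rot90(a, 2),       # 180-degree rotation in the image plane
    )
    return [(op(image), op(label)) for op in ops]
```

`np.fliplr`, `np.flipud`, and `np.rot90` act on the first two axes, so the same function also works for H × W × 3 color images.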

Measuring Metrics.
To evaluate the segmentation performance of the algorithm comprehensively, we use accuracy (ACC), sensitivity (Se), specificity (Sp), and AUC, calculated as follows:

$$\text{ACC} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Se} = \frac{TP}{TP + FN}, \qquad \text{Sp} = \frac{TN}{TN + FP},$$

where TP is true positive, FP is false positive, TN is true negative, and FN is false negative. Se, the sensitivity, is the proportion of vessel pixels that are correctly classified as vessels; in this paper, higher sensitivity indicates that more tiny vessels are detected. Sp, the specificity, expresses the algorithm's ability to recognize nonvascular pixels correctly. ACC, the accuracy, is the proportion of all pixels classified correctly and reflects the agreement between the segmentation result and the ground truth. AUC is the area under the ROC curve; we compute it with an alternative method, shown in equation (19) [11]. In addition, we use two other metrics to measure segmentation quality: MCC and CAL.
MCC is the correlation coefficient between the algorithm's segmentation output and the ground truth. Because it takes all of TP, TN, FP, and FN into account, it is a relatively balanced metric and is well suited to imbalanced class ratios.
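The pixel-level metrics can be sketched directly from the confusion counts. Se, Sp, and ACC follow the definitions above, and MCC uses the standard Matthews formula, (TP·TN - FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). Restricting the counts to an optional field-of-view mask is an assumption that reflects common practice; the function names are hypothetical.

```python
import numpy as np


def confusion_counts(pred, gt, fov=None):
    """Count TP, FP, TN, FN for binary masks, optionally only inside a FOV mask."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    valid = np.ones_like(gt) if fov is None else np.asarray(fov, dtype=bool)
    tp = int(np.sum(pred & gt & valid))
    fp = int(np.sum(pred & ~gt & valid))
    tn = int(np.sum(~pred & ~gt & valid))
    fn = int(np.sum(~pred & gt & valid))
    return tp, fp, tn, fn


def segmentation_metrics(tp, fp, tn, fn):
    """Se, Sp, ACC, and MCC from confusion counts."""
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    acc = (tp + tn) / (tp + fp + tn + fn)
    denom = np.sqrt(float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return {"Se": se, "Sp": sp, "ACC": acc, "MCC": float(mcc)}
```

The example counts in the test (TP = 8, FP = 2, TN = 85, FN = 5) illustrate why MCC is reported alongside ACC: accuracy is 0.93 because background dominates, while MCC is noticeably lower, at about 0.66.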
CAL is the product of three functions, C, A, and L:

$$f(C, A, L) = C \times A \times L.$$

Suppose S and S_G are the segmentation result and the corresponding ground truth, respectively. The three functions are defined as follows:
(1) Connectivity (C) evaluates the degree of fragmentation of S relative to S_G by comparing their numbers of connected components:

$$C(S, S_G) = 1 - \min\left(1, \frac{\left|\#_C(S_G) - \#_C(S)\right|}{\#(S_G)}\right),$$

where #_C(·) is the number of connected components and #(·) is the number of pixels in a set.
(2) Area (A) evaluates the degree of overlap between S and S_G:

$$A(S, S_G) = \frac{\#\big((\delta_\alpha(S) \cap S_G) \cup (S \cap \delta_\alpha(S_G))\big)}{\#(S \cup S_G)},$$

where δ_α(·) is a morphological dilation using a disc of radius α pixels. We set α = 2.
(3) Length (L) evaluates the degree of coincidence between S and S_G by comparing their total lengths:

$$L(S, S_G) = \frac{\#\big((\varphi(S) \cap \delta_\beta(S_G)) \cup (\delta_\beta(S) \cap \varphi(S_G))\big)}{\#\big(\varphi(S) \cup \varphi(S_G)\big)},$$

where φ(·) is the homotopic skeletonization and δ_β(·) is a morphological dilation with a disc of radius β pixels. We set β = 2.
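The CAL metric can be sketched with SciPy morphology and scikit-image's `skeletonize`, which here stands in for the homotopic skeletonization φ (it is a topology-preserving thinning, not necessarily the exact operator used in the original CAL definition). Counting connected components with 8-connectivity is an assumption; α = β = 2 follows the text. The function names are hypothetical.

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import skeletonize


def disk(radius):
    """Boolean disk-shaped structuring element of the given radius."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return xx**2 + yy**2 <= radius**2


def cal_metric(S, SG, alpha=2, beta=2):
    """Return (C, A, L, CAL) for a binary segmentation S and ground truth SG."""
    S = np.asarray(S, dtype=bool)
    SG = np.asarray(SG, dtype=bool)
    eight = np.ones((3, 3), dtype=bool)                  # assumed 8-connectivity

    # Connectivity: penalize a difference in the number of components.
    n_s = ndimage.label(S, structure=eight)[1]
    n_g = ndimage.label(SG, structure=eight)[1]
    C = 1.0 - min(1.0, abs(n_g - n_s) / SG.sum())

    # Area: overlap tolerant to alpha-pixel boundary shifts.
    dS = ndimage.binary_dilation(S, structure=disk(alpha))
    dG = ndimage.binary_dilation(SG, structure=disk(alpha))
    A = float(np.sum((dS & SG) | (S & dG)) / np.sum(S | SG))

    # Length: skeleton agreement tolerant to beta-pixel shifts.
    skS, skG = skeletonize(S), skeletonize(SG)
    dS_b = ndimage.binary_dilation(S, structure=disk(beta))
    dG_b = ndimage.binary_dilation(SG, structure=disk(beta))
    L = float(np.sum((skS & dG_b) | (dS_b & skG)) / np.sum(skS | skG))

    return C, A, L, C * A * L
```

A perfect segmentation scores 1 on all three factors, whereas cutting one vessel into two pieces lowers C even when the overlap-based factors barely change.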

Results and Discussion
As shown in Figure 9, one test image is selected from each of the three datasets to display the segmentation results of each channel and the fused results. Some of the discontinuous vessels in the individual channels are reconnected after fusion, and the fused map contains significantly more small vessels than any single-channel segmentation map.
Using the DRIVE dataset, we compare the metrics of the three individual channels with those of the fused result. The results in Table 2 show that the fusion of the three channels performs better overall than any single channel; in particular, sensitivity improves dramatically.
To illustrate the segmentation performance of this paper, Tables 3-5 list various metrics reported by recent papers on the DRIVE, STARE, and CHASE_DB1 datasets. The algorithm in this paper outperforms most similar papers in the sensitivity and AUC metrics. For a more comprehensive view of the overall segmentation performance on the test set, Table 6 lists the relevant metrics for the predictions on all test images. Two other essential metrics are MCC and CAL; Table 7 compares the values achieved by the proposed method with those of existing segmentation techniques on the DRIVE, STARE, and CHASE_DB1 datasets.
We selected image 19_test from the DRIVE test set to display the segmentation results, as shown in Figure 10. Literature [5,27] segmented some small vessels, but fewer than the segmentation in this paper. Literature [10] misses many details and does not segment the small vessels. The segmentation result of literature [11] contains a lot of edge noise and many discontinuous vessels. Compared with these existing methods, the segmentation results in this paper perform well in terms of both the integrity of the whole vessel tree and the segmentation of small vessels.
As shown in Figure 11, we compare the test results for image im0163 in the STARE dataset. The segmentation results of this paper are similar to those of literature [13,14], but the background noise in literature [13] is not eliminated. Compared with literature [5,10,27], the algorithm in this paper eliminates the optic disc structure of the original image as far as possible in the preprocessing step, so the final segmentation does not wrongly classify part of the optic disc as vessels, as those papers do.
Figure 11: Comparison of different methods on the STARE dataset. (a) Original image, (b) ground truth, (c) literature [5], (d) literature [13], (e) literature [14], (f) literature [10], (g) literature [27], and (h) proposed result.
Most papers on retinal vessel segmentation do not use the CHASE_DB1 dataset. One reason is that half of its images are abnormal, which may interfere with the trained segmentation model. This dataset is also newer and more challenging than the classic DRIVE and STARE datasets. To verify the generalizability of the proposed algorithm, we selected four images (image_12R, image_13L, image_13R, and image_14L) from the CHASE_DB1 test set to compare segmentation results, as shown in Figure 12. The segmentation result of the algorithm in literature [19] contains much noise, and some vessels are not effectively separated. Literature [28] does an excellent job of segmenting small vessels, but some vessels are not connected. Thanks to the postprocessing in this paper, the segmentation result on this dataset contains less noise and preserves the continuity of most vessels. However, compared with the manual labels, some tiny vessels cannot be completely separated from the image background.
The proposed framework was run on a PC (Intel Core i5-6300HQ CPU, 2.30 GHz, 12.0 GB RAM, NVIDIA GTX 950M GPU). Training took 11.3 h, 7.1 h, and 16.4 h per channel on DRIVE, STARE, and CHASE_DB1, respectively. The average testing time per test image was 1.34 s. Table 8 compares the number of parameters of the proposed method with that of other U-Net-based methods, which helps compare the complexity of the different frameworks. Note that the number of parameters does not directly determine training time, because some methods use patches of the training images as network input. For example, literature [19] uses 42421 patches as its training set, so it needs more time to train the network.
Figure 12: (a, e, i, m, q) Image_12R, (b, f, j, n, r) image_13L, (c, g, k, o, s) image_13R, and (d, h, l, p, t) image_14L from the CHASE_DB1 dataset. (a-d) Original images, (e-h) ground truth, (i-l) literature [19], (m-p) literature [28], and (q-t) proposed segmentation images.

Conclusion
This paper proposes a new retinal vessel segmentation method that combines a multiscale matched filter with a deep-learning U-Net model. First, we use an improved morphological image algorithm to reduce the impact of the image background on feature extraction. Second, to avoid missing the characteristics of small vessels, this paper performs multichannel feature extraction and segmentation on retinal images. Finally, the segmented images of the three channels are merged to capture as many characteristics of the retinal vessels as possible. In training the U-Net model, we used a loss function that weights the Dice coefficient and the binary cross-entropy to address pixel class imbalance. The algorithm was tested on the public DRIVE, STARE, and CHASE_DB1 datasets, and the experimental results show better performance in four metrics compared with similar papers. The average sensitivity of the algorithm reached 0.8745, 0.8903, and 0.8916 on DRIVE, STARE, and CHASE_DB1, respectively, nearly 0.1 higher than the average sensitivity of other papers. This improvement in sensitivity also reflects the algorithm's good performance in extracting small vessels. The focus of this paper is to combine the advantages of unsupervised and supervised algorithms, and we made few modifications to the U-Net network itself. Therefore, pruning the structure of the deep learning network model will be an interesting direction for future research.

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.