Multimodal Medical Image Fusion by Adaptive Manifold Filter

Medical image fusion plays an important role in the diagnosis and treatment of diseases, for example in image-guided radiotherapy and surgery. A fusion method based on modified local contrast information is proposed for multimodal medical images. First, the source images are filtered by the adaptive manifold filter, and the filtered results serve as the low-frequency part of the modified local contrast. Second, the modified spatial frequency of the source images is adopted as the high-frequency part of the modified local contrast. Finally, the pixel with the larger modified local contrast is selected into the fused image. In terms of visual quality, the presented scheme outperforms the guided filter method in the spatial domain, the dual-tree complex wavelet transform-based method, the nonsubsampled contourlet transform-based method, and four classic fusion methods. Furthermore, over the six pairs of source images, the mutual information values of the presented method are on average 55%, 41%, and 62% higher than those of the three methods, and its edge-based similarity measure values are on average 13%, 33%, and 14% higher.


Introduction
With the development of medical technology, computer science, and biomedical engineering, medical imaging can provide clinical diagnosis with a variety of multimodal medical images, such as computed tomography (CT), magnetic resonance imaging (MRI), single photon emission computed tomography (SPECT), positron emission tomography (PET), and ultrasonic images [1]. Different modalities display different information about the same organ. For example, MRI depicts soft tissue better than CT, whereas CT provides better information on tissue calcification and bone than MRI does. In clinical applications, a single-modality image often cannot provide doctors with enough information to make a correct diagnosis [2,3]. It is therefore necessary to combine images of different modalities into one image that retains the essential information of the source images. The fused image can contain the vital information from several modalities and thus present comprehensive information about diseased tissues or organs, while the redundant information in the source images is discarded. Hence, doctors can more easily make an accurate diagnosis or determine an appropriate treatment plan.
Generally, medical image fusion algorithms are divided into two categories: spatial domain methods and multiscale decomposition domain methods [4]. Spatial domain methods combine pixels or regions of the source images directly in the spatial domain [5]. Multiscale decomposition domain methods adopt transforms such as traditional wavelets and pyramids, the contourlet [6], and the nonsubsampled contourlet transform [6]. Compared with spatial domain methods, multiscale decomposition domain methods have higher time complexity because of their redundant decompositions, especially nonsubsampled contourlet transform-based fusion approaches. Spatial domain methods, on the other hand, have low complexity and can therefore be introduced into clinical applications and surgical procedures. Generally speaking, spatial domain fusion methods can run in real time and thus support real-time diagnosis during surgery. Therefore, this paper focuses on multimodal medical image fusion in the spatial domain.
In recent years, edge-preserving filters such as the bilateral filter, weighted least squares [7], the guided filter [8], the domain transform filter [9], and the cost-volume filter [10] have become an active research topic in image processing. Because edge-preserving filters avoid ringing artifacts and preserve edge structures well, they have been widely used in image matching, image dehazing, image denoising, and image classification [11]. The guided filter assumes that the filtered output is a local linear transformation of the guidance image. Building on this local linear model, Kang [11] first introduced the guided filter into spatial domain image fusion. The domain transform filter preserves the geodesic distance between points on a curve, adaptively warping the input signal so that 1D edge-preserving filtering can be performed efficiently in linear time. However, the recursive filter used in the domain transform makes it less effective for complex edge structures with many discontinuities. The cost-volume filter is a discrete optical flow approach that handles both fine (small-scale) motion structures and large displacements; the cost-volume filtering framework is generic and fast and is widely applicable to computer vision problems. The adaptive manifold filter [12], which offers better global diffusion and edge-preserving ability, is a real-time high-dimensional filter based on iterative filtering. Moreover, the adaptive manifold filter produces high-quality results and requires little memory. In this paper, the adaptive manifold filter is introduced into image fusion, specifically multimodal medical image fusion, for the first time.
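To illustrate the local linear model behind the guided filter [8], here is a minimal NumPy sketch of its standard formulation: in every (2r+1)×(2r+1) window the output is modeled as q = a·I + b, with a and b fitted by least squares from the guidance image I and the input p, and eps regularizing a. The function names, the edge-replicated borders, and the summed-area-table box filter are illustrative choices of this sketch, not details taken from Kang's fusion method [11].

```python
import numpy as np


def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window; borders are edge-replicated (illustrative choice)."""
    k = 2 * r + 1
    padded = np.pad(img, r, mode="edge")
    # Summed-area table with a leading row and column of zeros.
    sat = np.pad(np.cumsum(np.cumsum(padded, axis=0), axis=1), ((1, 0), (1, 0)))
    window_sum = sat[k:, k:] - sat[:-k, k:] - sat[k:, :-k] + sat[:-k, :-k]
    return window_sum / (k * k)


def guided_filter(guide, src, r, eps):
    """Guided filter: fit q = a * guide + b in every window, then average the coefficients."""
    guide = guide.astype(float)
    src = src.astype(float)
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    var_i = box_mean(guide * guide, r) - mean_i ** 2
    cov_ip = box_mean(guide * src, r) - mean_i * mean_p
    a = cov_ip / (var_i + eps)   # local slope; eps regularizes flat windows
    b = mean_p - a * mean_i      # local offset
    # Each pixel lies in many windows, so a and b are averaged before applying the model.
    return box_mean(a, r) * guide + box_mean(b, r)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    guide = rng.random((32, 32))
    src = 2.0 * guide + 3.0          # exactly linear in the guide
    out = guided_filter(guide, src, r=2, eps=1e-8)
    print(np.abs(out - src).max())   # small: the linear model is recovered
```

Because the fit is exact when p is a linear function of I, feeding `2 * guide + 3` returns almost the same image, while in flat windows a shrinks to 0 and the output falls back to the local mean of p.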

Adaptive Manifold Filter.
The adaptive manifold filter is the first high-dimensional filter that performs high-dimensional filtering of images and videos in real time [13]. It is quite flexible and can produce responses that approximate either standard Gaussian filters or non-local-means filters. The adaptive manifold filter consists of three main stages: projection, blurring, and gathering.
Let $f: \mathcal{S} \subset \mathbb{R}^{d_S} \rightarrow \mathcal{R} \subset \mathbb{R}^{d_R}$ be a signal that associates each point of its $d_S$-dimensional spatial domain $\mathcal{S}$ with a value in its $d_R$-dimensional range $\mathcal{R}$. For a grayscale image, $d_S$ and $d_R$ equal 2 and 1, respectively [14].
The number of manifolds $K$ is independent of the filter dimensionality and can be computed as

$$K = 2^{H} - 1, \qquad H = \max\left(2, \left\lceil H_s L_r \right\rceil\right), \qquad H_s = \lfloor \log_2 \sigma_s \rfloor - 1, \qquad L_r = 1 - \sigma_r,$$

where $L_r$ is a linear correction computed from the range standard deviation $\sigma_r$ and $H_s$ is a height computed from the spatial standard deviation $\sigma_s$. Let $\{x_1, \ldots, x_N\}$ be the set of samples obtained by sampling $f$ on a regular grid; each $x_i$ is called a pixel, and $f_i = f(x_i)$. The $k$th $d_S$-dimensional adaptive manifold can be described by a graph $(\mathcal{S}, \eta_k)$, and the manifold value $\eta_{ki} \in \mathcal{R}$ associated with pixel $x_i \in \mathcal{S}$ is defined by evaluating a function $\eta_k: \mathcal{S} \rightarrow \mathcal{R}$ at $x_i$, that is, $\eta_{ki} = \eta_k(x_i)$ [15]. Low-pass filtering the input signal $f$ generates the first manifold:

$$\eta_1 = h_{\Sigma_S} * f,$$

where $*$ denotes convolution and $h_{\Sigma_S}$ is a low-pass filter with covariance matrix $\Sigma_S$. Based on the first manifold $\eta_1$, a Gaussian distance-weighted projection of the pixel values onto each manifold is performed:

$$\Psi_1(\hat{\eta}_{ki}) = \phi_{\Sigma_R/2}(\eta_{ki} - f_i)\, f_i,$$

where $\Sigma_R/2$ is a $d_R \times d_R$ diagonal covariance matrix that controls the decay of the Gaussian kernel $\phi$. Gaussian filtering is then performed over each manifold, mixing the projected values $\Psi_1$ from all sampling points $\hat{\eta}_{kj}$:

$$\Psi_2(\hat{\eta}_{ki}) = \sum_{j=1}^{N} \phi_{\Sigma}(\hat{\eta}_{ki} - \hat{\eta}_{kj})\, \Psi_1(\hat{\eta}_{kj}),$$

where $\hat{\eta}_{ki} = (x_i, \eta_{ki})$ and $\phi_{\Sigma}$ is a Gaussian kernel on the $(d_S + d_R)$-dimensional space. Finally, the filter response $g_i$ for each pixel is obtained by interpolating the blurred values $\Psi_2$ gathered from all adaptive manifolds:

$$g_i = \frac{\sum_{k=1}^{K} w_{ki}\, \Psi_2(\hat{\eta}_{ki})}{\sum_{k=1}^{K} w_{ki}},$$

where $K$ is the total number of adaptive manifolds used to filter the signal and $w_{ki}$ is the weight corresponding to $\eta_{ki}$.
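The three stages can be made concrete with a deliberately simplified NumPy sketch for a grayscale image with values in [0, 1]. Several choices here are assumptions rather than the published algorithm: child manifolds come from splitting pixels above or below their parent manifold and low-pass filtering each cluster, the blurring stage is a plain spatial Gaussian without the range term, the gathering stage normalizes with a blurred weight channel, and `tree_height` implements the manifold-count rule $K = 2^{H} - 1$ with $H = \max(2, \lceil H_s L_r \rceil)$ as an assumed heuristic. All function names are hypothetical.

```python
import math
import numpy as np


def gaussian_blur(img, sigma):
    """Separable Gaussian low-pass with edge-replicated borders (plays the role of h_{Sigma_S})."""
    radius = max(1, int(math.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def tree_height(sigma_s, sigma_r):
    """Assumed heuristic: H = max(2, ceil(H_s * L_r)); the filter then uses K = 2**H - 1 manifolds."""
    h_s = math.floor(math.log2(sigma_s)) - 1   # height from the spatial standard deviation
    l_r = 1.0 - sigma_r                        # linear correction from the range standard deviation
    return max(2, math.ceil(h_s * l_r))


def build_manifolds(f, sigma_s, height, eps=1e-10):
    """Simplified manifold tree: the root is h * f; each child low-passes the pixels
    of its parent's cluster that lie above (or below) the parent manifold."""
    manifolds = []

    def grow(member, level):
        w = member.astype(float)
        # Normalized low-pass of the cluster; for the all-ones root this is h * f (up to rounding).
        eta = gaussian_blur(w * f, sigma_s) / np.maximum(gaussian_blur(w, sigma_s), eps)
        manifolds.append(eta)
        if level < height:
            above = member & (f >= eta)
            grow(above, level + 1)
            grow(member & ~above, level + 1)

    grow(np.ones(f.shape, dtype=bool), 1)
    return manifolds


def adaptive_manifold_filter(f, sigma_s, sigma_r):
    """Simplified AMF for a grayscale image in [0, 1]: project, blur, gather."""
    f = f.astype(float)
    num = np.zeros_like(f)
    den = np.zeros_like(f)
    for eta in build_manifolds(f, sigma_s, tree_height(sigma_s, sigma_r)):
        # Projection: phi_{Sigma_R/2}(eta - f) = exp(-(eta - f)^2 / sigma_r^2).
        w = np.exp(-((eta - f) ** 2) / sigma_r ** 2)
        # Blurring: spatial Gaussian only (simplification: no range term on the manifold).
        blurred_values = gaussian_blur(w * f, sigma_s)
        blurred_weights = gaussian_blur(w, sigma_s)
        # Gathering: interpolate back with the same weights; the weight channel normalizes.
        num += w * blurred_values
        den += w * blurred_weights
    # Pixels far from every manifold (den ~ 0) need extra handling in the full AMF; omitted here.
    return num / np.maximum(den, 1e-12)


if __name__ == "__main__":
    step = np.zeros((32, 32))
    step[:, 16:] = 1.0
    smoothed = adaptive_manifold_filter(step, sigma_s=4.0, sigma_r=0.1)
    blurred = gaussian_blur(step, 4.0)
    print(np.abs(smoothed - step).max(), np.abs(blurred - step).max())
```

With $\sigma_s = 14$ and $\sigma_r = 0.10$, `tree_height` returns 2, that is, three manifolds. On the synthetic step edge the sketch keeps the edge nearly intact, whereas a plain Gaussian with the same $\sigma_s$ smears it.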

Modified Local Contrast.
The contrast of an image measures the difference between the intensity at a pixel and the intensities of its neighboring pixels. The human visual system is highly sensitive to intensity contrast rather than to the intensity value itself; the same intensity can look different depending on the intensities of the neighboring pixels. According to [16], local luminance contrast can be defined as

$$C = \frac{L - L_B}{L_B} = \frac{L_H}{L_B},$$

where $L$ is the local brightness of the image and $L_B$ is the brightness of the local background. In general, $L_B$ is regarded as the local low-frequency information of an image and $L_H = L - L_B$ as its local high-frequency information. A proper way to select the high-frequency and low-frequency information is therefore necessary for good information interpretation. The modified spatial frequency (MSF) [17] is calculated from the row frequency, column frequency, and diagonal frequency of the image. A larger modified spatial frequency indicates salient features such as edges, lines, and region boundaries, so the MSF of an image can serve as its high-frequency information. Conversely, the result of filtering an image with the adaptive manifold filter can serve as its low-frequency information. The modified local contrast $\mathrm{MLC}(i,j)$ in the spatial domain is then given by

$$\mathrm{MLC}(i,j) = \frac{\mathrm{MSF}(i,j)}{\mathrm{AMF}(i,j)},$$

where $\mathrm{MSF}(i,j)$ is the modified spatial frequency of image $I$ at row $i$ and column $j$, and $\mathrm{AMF}(i,j)$ is the result of filtering $I(i,j)$ with the adaptive manifold filter. Because it incorporates the diagonal frequency as well as the row and column frequencies, the modified spatial frequency captures the fine details present in the image. It is calculated as

$$\mathrm{MSF}(i,j) = \sqrt{\mathrm{SF}(i,j)^2 + \mathrm{DF}(i,j)^2},$$

where the spatial frequency $\mathrm{SF}(i,j)$ is calculated as follows [18,19]:

$$\mathrm{SF}(i,j) = \sqrt{\mathrm{RF}^2 + \mathrm{CF}^2}, \quad \mathrm{RF} = \sqrt{\frac{1}{MN}\sum_{m=1}^{M}\sum_{n=2}^{N}\left[I(m,n) - I(m,n-1)\right]^2}, \quad \mathrm{CF} = \sqrt{\frac{1}{MN}\sum_{m=2}^{M}\sum_{n=1}^{N}\left[I(m,n) - I(m-1,n)\right]^2},$$

where $M$ and $N$ denote the numbers of rows and columns of image $I$, respectively.
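A pixel-wise version of the MSF and the modified local contrast can be sketched as follows. Three choices are assumptions of this sketch: RF, CF, and DF are averaged over a small (2r+1)×(2r+1) window around each pixel; DF sums the squared differences along both diagonals with the same normalization as RF and CF (other MSF variants weight the diagonal terms differently); and a small `eps` is added to the denominator of MLC. The `low_pass` argument stands in for the adaptive manifold filter output, so any low-pass image of the same shape can be passed.

```python
import numpy as np


def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window with edge-replicated borders."""
    k = 2 * r + 1
    padded = np.pad(img, r, mode="edge")
    sat = np.pad(np.cumsum(np.cumsum(padded, axis=0), axis=1), ((1, 0), (1, 0)))
    return (sat[k:, k:] - sat[:-k, k:] - sat[k:, :-k] + sat[:-k, :-k]) / (k * k)


def modified_spatial_frequency(img, r=1):
    """Pixel-wise MSF = sqrt(RF^2 + CF^2 + DF^2), each term averaged over a local window."""
    img = img.astype(float)
    p = np.pad(img, 1, mode="edge")
    c = p[1:-1, 1:-1]                                      # I(m, n)
    row = (c - p[1:-1, :-2]) ** 2                          # [I(m,n) - I(m,n-1)]^2
    col = (c - p[:-2, 1:-1]) ** 2                          # [I(m,n) - I(m-1,n)]^2
    diag = (c - p[:-2, :-2]) ** 2 + (c - p[:-2, 2:]) ** 2  # main and anti-diagonal neighbours
    rf2, cf2, df2 = box_mean(row, r), box_mean(col, r), box_mean(diag, r)
    return np.sqrt(rf2 + cf2 + df2)                        # sqrt(SF^2 + DF^2)


def modified_local_contrast(img, low_pass, r=1, eps=1e-6):
    """MLC = MSF / low-frequency part; eps (assumed) avoids division by zero."""
    return modified_spatial_frequency(img, r) / (low_pass + eps)


if __name__ == "__main__":
    step = np.zeros((8, 8))
    step[:, 4:] = 1.0
    print(modified_spatial_frequency(step)[0])        # non-zero only near the edge
    print(modified_local_contrast(step, box_mean(step, 1)).shape)
```

Flat regions get an MSF of zero, so their modified local contrast is zero regardless of brightness, while edges and textures produce large values.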
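The diagonal frequency $\mathrm{DF}(i,j)$ can be expressed as

$$\mathrm{DF}(i,j) = \sqrt{\frac{1}{MN}\sum_{m=2}^{M}\sum_{n=2}^{N}\left[I(m,n) - I(m-1,n-1)\right]^2 + \frac{1}{MN}\sum_{m=2}^{M}\sum_{n=1}^{N-1}\left[I(m,n) - I(m-1,n+1)\right]^2}.$$

Figure 1 shows the schematic diagram of the proposed fusion algorithm. The proposed fusion approach can be summarized in the following five steps: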

Summary of Fusion Method.
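(1) The source medical images $A$ and $B$ are registered.
(2) The source images $A$ and $B$ are filtered by the adaptive manifold filter, and the filtered results $\mathrm{AMF}_A(i,j)$ and $\mathrm{AMF}_B(i,j)$ are adopted as the low-frequency information of the modified local contrast.
(3) The modified spatial frequency of each source image is adopted as the high-frequency information of the modified local contrast, following the definition of $\mathrm{MLC}(i,j)$ given above. The modified local contrasts of the source images $A$ and $B$, denoted $\mathrm{MLC}_A(i,j)$ and $\mathrm{MLC}_B(i,j)$, are

$$\mathrm{MLC}_A(i,j) = \frac{Y_A(i,j)}{\mathrm{AMF}_A(i,j)}, \qquad \mathrm{MLC}_B(i,j) = \frac{Y_B(i,j)}{\mathrm{AMF}_B(i,j)},$$

where $Y_A(i,j) = \mathrm{MSF}_A(i,j)$ and $Y_B(i,j) = \mathrm{MSF}_B(i,j)$ represent the high-frequency information of the modified local contrast.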
(4) The decision map $D(i,j)$ used to fuse the source multimodal medical images is

$$D(i,j) = \begin{cases} 1, & \mathrm{MLC}_A(i,j) \geq \mathrm{MLC}_B(i,j), \\ 0, & \text{otherwise}. \end{cases}$$

(5) The fused image $F$ is obtained by selecting, at each pixel, the source pixel with the larger modified local contrast: $F(i,j) = D(i,j)\,A(i,j) + \left[1 - D(i,j)\right] B(i,j)$.

One group of source images in Figure 2 consists of the T1-MRI and GD-MRI images, and group (d) in Figure 2 consists of the T1-MRI and MRA images. The corresponding pixels of each pair of input images are perfectly matched. All images have the same size of 256 × 256 pixels with 256 gray levels. On the one hand, the proposed method is compared with classic image fusion methods, namely principal component analysis (PCA), the Laplacian pyramid, the gradient pyramid, and the shift-invariant discrete wavelet transform (SIDWT), which are used for comparison in many works [4,20]. On the other hand, the proposed method is compared with the NSCT-based method proposed by Sudeb [17], in which the modified spatial frequency of the NSCT coefficients motivates a PCNN, and with Liu's method [21], which combines the dual-tree complex wavelet transform with the nonsubsampled directional filter bank (NSDFB). In Sudeb's NSCT-based scheme, the pyramid filter and the directional filter are set to "pyrexc" and "vk," respectively, and the NSCT decomposition levels are set to [1,2,4] in accordance with [17]. In Liu's method, three levels of the dual-tree complex wavelet transform are adopted to decompose the NSDFB coefficients, and the directional filter is set to "cd." Furthermore, because the proposed method belongs to the spatial domain fusion methods, it is also compared with Kang's guided filter method in the spatial domain [11]. In Kang's method, the source images are decomposed into a base layer and a detail layer by average filtering, and a guided filtering-based weighted average technique makes full use of spatial consistency to fuse the base and detail layers. The parameters reported in [11] are adopted directly in this comparison. In the adaptive manifold filter, the spatial standard deviation and the range standard deviation are set to 14 and 0.10, respectively.
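The decision rule of step (4), which keeps the pixel with the larger modified local contrast, reduces to a pixel-wise selection. The sketch below assumes the MSF and low-frequency (adaptive manifold filter) maps of both images are already computed; the `eps` term and sending ties to image A are assumptions of this sketch, not details from the text, and the function names are hypothetical.

```python
import numpy as np


def modified_local_contrast(msf, low, eps=1e-6):
    """MLC = MSF / AMF; eps (assumed) guards against zero-valued dark background."""
    return msf / (low + eps)


def fuse_by_decision_map(img_a, img_b, mlc_a, mlc_b):
    """D = 1 where MLC_A >= MLC_B (ties go to A, an assumption), then F = D*A + (1 - D)*B."""
    decision = (mlc_a >= mlc_b).astype(float)
    fused = decision * img_a + (1.0 - decision) * img_b
    return fused, decision


if __name__ == "__main__":
    a = np.array([[10.0, 20.0], [30.0, 40.0]])
    b = np.array([[50.0, 60.0], [70.0, 80.0]])
    mlc_a = np.array([[1.0, 0.0], [2.0, 2.0]])
    mlc_b = np.array([[0.0, 1.0], [1.0, 3.0]])
    fused, decision = fuse_by_decision_map(a, b, mlc_a, mlc_b)
    print(fused)  # takes A at (0,0) and (1,0), B elsewhere
```

Because the rule is a hard selection, every fused pixel is copied unchanged from one of the source images, so no intensities are averaged away.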

Mutual Information.
Mutual information (MI), proposed by Piella [22], measures how much information the fused image conveys about the source images. The MI is defined as $\mathrm{MI} = \mathrm{MI}_{AF} + \mathrm{MI}_{BF}$, where $\mathrm{MI}_{XF}$ is calculated by

$$\mathrm{MI}_{XF} = \sum_{u=1}^{L}\sum_{v=1}^{L} h_{X,F}(u,v)\,\log_2\frac{h_{X,F}(u,v)}{h_X(u)\,h_F(v)},$$

where $X$ and $F$ denote the source image ($A$ or $B$) and the fused image, respectively, $h_{X,F}$ is the normalized joint gray-level histogram of $X$ and $F$, $h_X$ and $h_F$ are the normalized gray-level histograms of $X$ and $F$, and $L$ is the number of bins. A larger MI value indicates that the fused image acquires more information from images $A$ and $B$.
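The MI formula can be evaluated directly from a normalized joint histogram. This sketch assumes 8-bit images (256 bins over [0, 256)) and base-2 logarithms, so MI is reported in bits; empty histogram cells contribute nothing. The function names are hypothetical.

```python
import numpy as np


def mutual_information(x, f, bins=256, value_range=(0, 256)):
    """MI_XF in bits, from the normalized joint gray-level histogram of X and F."""
    joint, _, _ = np.histogram2d(x.ravel(), f.ravel(), bins=bins,
                                 range=[value_range, value_range])
    joint /= joint.sum()                  # h_{X,F}
    h_x = joint.sum(axis=1)               # marginal histogram h_X
    h_f = joint.sum(axis=0)               # marginal histogram h_F
    expected = np.outer(h_x, h_f)         # h_X(u) * h_F(v)
    nz = joint > 0                        # 0 * log 0 is taken as 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / expected[nz])))


def fusion_mi(a, b, fused, bins=256):
    """MI = MI_AF + MI_BF."""
    return mutual_information(a, fused, bins) + mutual_information(b, fused, bins)


if __name__ == "__main__":
    x = np.zeros((4, 4), dtype=np.uint8)
    x[:, 2:] = 255
    print(mutual_information(x, x), mutual_information(x, np.zeros_like(x)))
```

For an image that is half 0 and half 255, MI with itself is exactly 1 bit, and MI with a constant image is 0.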

Edge Based Similarity Measure.
The edge-based similarity measure $Q^{AB/F}$ [23] gives the similarity between the edges transferred in the fusion process. Mathematically, $Q^{AB/F}$ is defined as

$$Q^{AB/F} = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N}\left[Q^{AF}(i,j)\,w^{A}(i,j) + Q^{BF}(i,j)\,w^{B}(i,j)\right]}{\sum_{i=1}^{M}\sum_{j=1}^{N}\left[w^{A}(i,j) + w^{B}(i,j)\right]},$$

where $A$ and $B$ are the input images, $F$ is the fused image, and $w^{A}$ and $w^{B}$ are weights, usually derived from the edge strengths of $A$ and $B$. $Q^{AF}$ and $Q^{BF}$ are defined in the same way:

$$Q^{*F}(i,j) = Q_g^{*F}(i,j)\,Q_o^{*F}(i,j),$$

where $Q_g^{*F}(i,j)$ and $Q_o^{*F}(i,j)$ are the edge strength and orientation preservation values at location $(i,j)$, respectively, and $*$ represents image $A$ or image $B$. $Q^{AB/F}$ ranges over $[0, 1]$, and values closer to 1 indicate better fusion.

[…] Figure 2, and the proposed method fuses more information from the source images than Kang's method and Sudeb's method. In summary, the proposed algorithm transfers more accurate and necessary information into the fused images than the other methods do. At the same time, the presented scheme introduces fewer undesirable artifacts, such as blocking effects, into the fused images.
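The $Q^{AB/F}$ metric defined in the Edge Based Similarity Measure paragraph can be sketched with 3×3 Sobel gradients, the edge strengths serving as the weights $w^A$ and $w^B$, and the sigmoid constants commonly used with Xydeas and Petrović's original definition. The constants, the Sobel operator, and the orientation term folded modulo π are assumptions of this sketch rather than details given in the text; the function names are hypothetical.

```python
import numpy as np

# Sigmoid constants commonly quoted for the Q^{AB/F} metric (assumed here).
GAMMA_G, KAPPA_G, SIGMA_G = 0.9994, -15.0, 0.5
GAMMA_A, KAPPA_A, SIGMA_A = 0.9879, -22.0, 0.8


def sobel(img):
    """Edge strength g and orientation alpha in (-pi/2, pi/2] from 3x3 Sobel responses."""
    p = np.pad(img.astype(float), 1, mode="edge")
    sx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    sy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    g = np.hypot(sx, sy)
    # arctan(sy / sx), with vertical gradients (sx == 0) mapped to pi/2.
    alpha = np.arctan(np.divide(sy, sx, out=np.full_like(sx, np.inf), where=sx != 0))
    return g, alpha


def edge_preservation(g_x, a_x, g_f, a_f):
    """Q^{XF} = Q_g * Q_o at every pixel."""
    lo, hi = np.minimum(g_x, g_f), np.maximum(g_x, g_f)
    g_ratio = np.divide(lo, hi, out=np.ones_like(hi), where=hi > 0)   # relative strength
    a_rel = np.abs(np.abs(a_x - a_f) - np.pi / 2) / (np.pi / 2)        # orientation, modulo pi
    q_g = GAMMA_G / (1.0 + np.exp(KAPPA_G * (g_ratio - SIGMA_G)))
    q_o = GAMMA_A / (1.0 + np.exp(KAPPA_A * (a_rel - SIGMA_A)))
    return q_g * q_o


def qabf(a, b, f):
    """Weighted average of Q^{AF} and Q^{BF}, with the edge strengths as weights."""
    g_a, al_a = sobel(a)
    g_b, al_b = sobel(b)
    g_f, al_f = sobel(f)
    q_af = edge_preservation(g_a, al_a, g_f, al_f)
    q_bf = edge_preservation(g_b, al_b, g_f, al_f)
    den = np.sum(g_a + g_b)
    return float(np.sum(q_af * g_a + q_bf * g_b) / den) if den > 0 else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    img = rng.random((16, 16))
    print(qabf(img, img, img), qabf(img, img, np.zeros_like(img)))
```

For $F = A = B$, every pixel receives the same score, about 0.975 with these constants, so even a perfect fusion scores slightly below 1; an edge-free fused image scores close to 0.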

Objective Evaluation Analysis.
Apart from subjective evaluation, objective evaluation metrics are necessary to quantify the differences among the fused images. Tables 1, 2 […] is more effective than some state-of-the-art works and four classic methods.

Conclusion
To improve multimodal medical image fusion and increase diagnostic accuracy, this paper presents a novel and effective medical image fusion algorithm in the spatial domain. The modified local contrast information is proposed as the decision map for fusing multimodal medical images. Considering the good global diffusion and edge-preserving ability of the adaptive manifold filter, the result of filtering the source images with it is introduced as the low-frequency part, while the modified spatial frequency of the source images is adopted as the high-frequency part. The experimental results clearly show that the presented scheme outperforms many other fusion methods, such as the guided filter method in the spatial domain, the NSCT-based method in the transform domain, the method combining the dual-tree complex wavelet transform with the NSDFB, and several classic image fusion methods, in both subjective performance and objective evaluation.