A Novel Approach for Detail-Enhanced Exposure Fusion Using Guided Filter

In this paper, we propose a novel detail-enhancing exposure fusion approach using a nonlinear translation-variant filter (NTF). Given Standard Dynamic Range (SDR) images captured under different exposure settings, fine details are first extracted with a guided filter. Next, the base layers (i.e., the images obtained from the NTF) of all input images are fused using a multiresolution pyramid. Exposure, contrast, and saturation measures are used to generate a mask that guides the fusion of the base layers. Finally, the fused base layer is combined with the extracted fine details to obtain a detail-enhanced fused image. The goal is to preserve details in both very dark and extremely bright regions without a High Dynamic Range Image (HDRI) representation and tone mapping step. Moreover, we demonstrate that the proposed method is also suitable for multifocus image fusion without introducing artifacts.


Introduction
With a single exposure, a normal digital camera can capture only a limited range of the luminance variations in a real-world scene; the result is termed a low dynamic range (LDR) image. To circumvent this problem, modern digital photography varies the exposure time, which controls the amount of light allowed to fall on the sensor, to capture details in very dark or extremely bright regions. Capturing different LDR images in rapid succession at different exposure settings to collect the complete luminance variation is known as exposure bracketing. However, each exposure handles only a small portion of the luminance variation in the entire scene: a short exposure captures details in bright regions (i.e., highlights), and a long exposure captures details in dark regions (i.e., shadows) (see Figure 1).
In the past decade, two solutions have been proposed to handle the large luminance variations present in natural scenes. The first is the HDR representation. To date, many HDRI techniques [1,2] have been proposed that extend dynamic range by compositing differently exposed images of the same scene. HDR images generally encode intensity variations with more than 8 bits, and their pixel values are proportional to the true scene radiance, transformed by a nonlinear mapping called the camera response function. The four-byte "hdr" format was developed to encode radiance maps. A second option for encoding radiance maps is "floating point TIFF," which uses 12 bytes to encode approximately 79 orders of magnitude. Currently used standard display devices have a smaller contrast ratio (i.e., 1 : 100), and the contrast ratio of LCD monitors can reach 1 : 400. Recently developed HDR display prototypes [3] can represent a high contrast ratio (i.e., 1 : 25,000), but they are not yet available on the market for ordinary consumers. Therefore, an HDR image must first be tone-mapped to appear on a standard display device. Various local and global tone mapping methods [2] have been proposed for this purpose. Local operators adopt the local light adaptation property of the human visual system (HVS) to reproduce the visual impression an observer had when watching the original scene, whereas global operators are spatially invariant and are less effective than local operators.
The second, more recently proposed option is "exposure fusion." Its fundamental goal is to preserve details in both very dark and extremely bright regions without an HDRI representation and tone mapping step. Various exposure fusion approaches [4][5][6][7] rely on different local measures to generate weight maps that preserve the details present in the different exposures.
The present work draws inspiration from imaging techniques that combine information from two or more images captured at different exposure settings, although with different goals. A block diagram of the proposed detail-enhancement framework is shown in Figure 2. We seek to enhance fine details in the fused image by using an edge-preserving filter [8]. Edge-preserving filters have been used in several image processing applications, such as edge detection [9], image enhancement, and noise reduction [10]. A joint bilateral filter [11] has been proposed that is effective for detecting and reducing large artifacts, such as reflections, using gradient projections. More recently, anisotropic diffusion [9] has been used for detail enhancement in exposure fusion [12], where texture features control the contribution of pixels from the input exposures. In our approach, the guided filter is preferred over these alternatives because it accurately preserves the gradients near edges. We use the guided filter [8] to extract the base and detail layers, which is effective for enhancing texture details and reducing gradient reversal artifacts near strong edges in the fused image. A multiresolution approach is used to fuse the base layers computed from all input images, while the detail layers extracted from the input exposures are manipulated and fused separately. The final detail-enhanced fused image (see Figures 3 and 4) is obtained by combining the fused base layer and the fused detail layer. The proposed approach is described in detail in the following sections. It is worth pointing out that our method, which aims at enhancing the texture and contrast details in the fused image with a nonlinear edge-preserving filter (i.e., the guided filter), essentially differs from [4].
Moreover, we demonstrate that the proposed approach fuses multifocus images effectively and produces results rich in visual detail.

Edge Preserving Guided Filter.
In this section, we first describe how the guided filter [8], derived from a local linear model, preserves edges, and then show how it avoids the gradient reversal artifacts near strong edges that may appear in the fused image after detail layer enhancement. We seek to maintain the shape of the strong edges in the fused image that arise from exposure time variation across the input images. The guided filter was developed by He et al. [8] in 2010 as an alternative to the bilateral filter [11]. It is an edge-preserving filter in which the filter output $q$ is a local linear transform of the guidance image $I$. The choice of guidance image depends on the application [11]; in our implementation, the input image $p$ and the guidance image $I$ are identical. The output of the guided filter at a pixel $i$ is computed as a weighted average:

$$q_i=\sum_j W_{ij}(I)\,p_j,\tag{1}$$

where $i$ and $j$ are pixel indexes and $W_{ij}$ is the filter kernel, which is a function of the guidance image $I$ and independent of the input image $p$. Let $q$ be a linear transform of $I$ in a window $\omega_k$ centered at pixel $k$:

$$q_i=a_k I_i+b_k,\qquad \forall i\in\omega_k,\tag{2}$$

where $(a_k, b_k)$ are linear coefficients assumed to be constant in $\omega_k$, a square window of radius $r$, i.e., of size $(2r+1)\times(2r+1)$. The local linear model (2) ensures that $q$ has an edge (i.e., a discontinuity) only if $I$ has an edge, because $\nabla q = a\nabla I$. The coefficients $a_k$ and $b_k$ are computed within $\omega_k$ by minimizing the cost function

$$E(a_k,b_k)=\sum_{i\in\omega_k}\Big(\big(a_k I_i+b_k-p_i\big)^2+\epsilon\,a_k^2\Big),\tag{3}$$

where $\epsilon$ is a regularization parameter on $a_k$ for numerical stability. The significance of $\epsilon$ and its relation to the bilateral kernel [11] are discussed in [8]. In our implementation, we use $r = 2$ (i.e., a 5 × 5 square window) and $\epsilon = 0.01$.

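The Scientific World Journal

The linear coefficients that minimize the cost function in (3) are obtained by linear regression [15]:

$$a_k=\frac{\frac{1}{|\omega|}\sum_{i\in\omega_k} I_i\,p_i-\mu_k\,\bar{p}_k}{\sigma_k^2+\epsilon},\tag{4}$$

$$b_k=\bar{p}_k-a_k\,\mu_k,\tag{5}$$

where $\mu_k$ and $\sigma_k^2$ are the mean and variance of $I$ in $\omega_k$, $|\omega|$ is the number of pixels in $\omega_k$, and $\bar{p}_k$ is the mean of $p$ in $\omega_k$.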
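As a quick numerical check of (3)–(5), the sketch below (Python/NumPy, our own illustration rather than the authors' code) computes $a_k$ and $b_k$ for a single 5 × 5 window from the closed form and compares them with a direct least-squares solution of (3). The helper names `closed_form_coeffs` and `ridge_coeffs` are hypothetical.

```python
import numpy as np


def closed_form_coeffs(I_win, p_win, eps):
    """Coefficients (a_k, b_k) from Eqs. (4)-(5) for one window."""
    mu = I_win.mean()                 # mean of guidance I in the window
    var = I_win.var()                 # population variance of I in the window
    p_bar = p_win.mean()              # mean of input p in the window
    a = ((I_win * p_win).mean() - mu * p_bar) / (var + eps)
    b = p_bar - a * mu
    return a, b


def ridge_coeffs(I_win, p_win, eps):
    """Minimise Eq. (3) directly as an augmented least-squares problem.

    In Eq. (3) the eps * a^2 term appears once per pixel, i.e. |w| * eps * a^2
    in total; the extra row [sqrt(|w| * eps), 0] with target 0 adds exactly
    that penalty to the squared residual.
    """
    n = I_win.size
    A = np.column_stack([I_win.ravel(), np.ones(n)])
    A = np.vstack([A, [np.sqrt(n * eps), 0.0]])
    y = np.append(p_win.ravel(), 0.0)
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return a, b


rng = np.random.default_rng(0)
I_win = rng.random((5, 5))            # r = 2 gives a 5 x 5 window
p_win = I_win.copy()                  # guidance == input, as in the paper
print(closed_form_coeffs(I_win, p_win, 0.01))
print(ridge_coeffs(I_win, p_win, 0.01))
```

With the guidance equal to the input, (4) reduces to $a_k = \sigma_k^2/(\sigma_k^2+\epsilon)$, so $a_k$ approaches 1 in high-variance (edge) windows and 0 in flat ones, which is the mechanism behind edge preservation.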
The linear coefficients $a_k$ and $b_k$ are computed for all windows in the entire image. However, a pixel $i$ is involved in all windows $\omega_k$ that contain it, so the value of $q_i$ in (2) differs from window to window. Averaging all possible values of $q_i$ gives the filtered output

$$q_i=\bar{a}_i I_i+\bar{b}_i,\tag{6}$$

where

$$\bar{a}_i=\frac{1}{|\omega|}\sum_{k\in\omega_i}a_k,\qquad \bar{b}_i=\frac{1}{|\omega|}\sum_{k\in\omega_i}b_k.\tag{7}$$

In practice, $\bar{a}_i$ and $\bar{b}_i$ in (7) vary spatially so as to preserve the strong edges of $I$ in $q$, that is, $\nabla q\approx\bar{a}\,\nabla I$. Therefore, $q$ computed in (6) preserves the strongest edges in $I$ while smoothing small changes in intensity. Let $B_n(x,y)$ be the base layer computed from (6) (i.e., $B_n(x,y)=q$) for the $n$th input image $I_n(x,y)$, $1\le n\le K$, where $K$ is the number of input exposures. The detail layer is defined as the difference between the input image and the guided filter output:

$$D_n(x,y)=I_n(x,y)-B_n(x,y).\tag{8}$$
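A minimal NumPy sketch of (4)–(8) follows, assuming grayscale images scaled to [0, 1] and the same image used as input and guidance. Window means are computed with an integral image; at the image border the windows are clipped and averaged over the pixels they actually contain, which is an assumption, since the section does not specify border handling. The function names are hypothetical.

```python
import numpy as np


def box_mean(x, r):
    """Mean over each (2r+1)x(2r+1) window; windows are clipped at borders."""
    h, w = x.shape
    c = np.zeros((h + 1, w + 1))          # integral image, zero row/col prepended
    c[1:, 1:] = x.cumsum(0).cumsum(1)
    rows, cols = np.arange(h), np.arange(w)
    y0, y1 = np.clip(rows - r, 0, h), np.clip(rows + r + 1, 0, h)
    x0, x1 = np.clip(cols - r, 0, w), np.clip(cols + r + 1, 0, w)
    s = c[y1][:, x1] - c[y0][:, x1] - c[y1][:, x0] + c[y0][:, x0]
    n = (y1 - y0)[:, None] * (x1 - x0)[None, :]   # pixels per clipped window
    return s / n


def guided_filter(I, p, r=2, eps=0.01):
    """Guided filter output q; I is the guidance image, p the input."""
    mu = box_mean(I, r)
    p_bar = box_mean(p, r)
    var = box_mean(I * I, r) - mu * mu
    cov = box_mean(I * p, r) - mu * p_bar
    a = cov / (var + eps)                          # Eq. (4)
    b = p_bar - a * mu                             # Eq. (5)
    return box_mean(a, r) * I + box_mean(b, r)     # Eqs. (6)-(7)


def decompose(img, r=2, eps=0.01):
    """Base layer B_n (guidance == input) and detail layer D_n, Eq. (8)."""
    base = guided_filter(img, img, r, eps)
    return base, img - base


# Usage: a vertical 0/1 step edge stays sharp in the base layer.
step = np.zeros((16, 16))
step[:, 8:] = 1.0
base, detail = decompose(step)
print(base[8, 6:10].round(3))
```

On this step image the base layer jumps by about 0.96 between the two pixels adjacent to the edge, whereas a plain 5 × 5 box blur would reduce that jump to 0.2; this is the edge-preserving behavior that keeps the detail layer free of large values along strong edges.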

Computation of Laplacian and Gaussian Pyramid.
Researchers have attempted to synthesize and manipulate features at several spatial resolutions to avoid introducing seams and artifacts such as contrast reversal or black halos. In the proposed algorithm, the band-pass [13] components at different resolutions are manipulated based on a weight map that determines the pixel values in the reconstructed fused base layer. The pyramid representation expresses an image as a sum of spatially band-passed images while retaining local spatial information in each band. A pyramid is created by low-pass filtering an image $G_0$ with a compact two-dimensional filter. The filtered image is then subsampled by removing every other pixel and every other row to obtain a reduced image $G_1$. This process is repeated to form a Gaussian pyramid $G_0, G_1, G_2, G_3, \dots, G_N$:

$$G_l(i,j)=\sum_{u=-2}^{2}\sum_{v=-2}^{2}w(u,v)\,G_{l-1}(2i+u,\,2j+v),\qquad l=1,\dots,N,$$

where $w(u,v)$ is a compact 5 × 5 weighting kernel and $N$ is the number of levels in the pyramid.
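The REDUCE step can be sketched in NumPy as follows. The 5-tap binomial kernel [1, 4, 6, 4, 1]/16 (whose outer product gives a 5 × 5 $w(u,v)$) and the edge-replicating border handling are assumptions, since the section does not specify the kernel weights; the function names are hypothetical.

```python
import numpy as np

# Assumed separable kernel; w(u, v) = KERNEL[u + 2] * KERNEL[v + 2].
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def blur(img, k=KERNEL):
    """Separable low-pass filter; borders handled by edge replication."""
    r = len(k) // 2
    h, w = img.shape
    padded = np.pad(img, ((r, r), (0, 0)), mode="edge")
    tmp = sum(k[i] * padded[i:i + h, :] for i in range(len(k)))   # vertical pass
    padded = np.pad(tmp, ((0, 0), (r, r)), mode="edge")
    return sum(k[i] * padded[:, i:i + w] for i in range(len(k)))  # horizontal pass


def reduce_level(img):
    """REDUCE: low-pass filter, then keep every other row and column."""
    return blur(img)[::2, ::2]


def gaussian_pyramid(img, levels):
    """Return [G_0, G_1, ..., G_N] with N = levels."""
    pyr = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        pyr.append(reduce_level(pyr[-1]))
    return pyr


pyr = gaussian_pyramid(np.random.default_rng(0).random((64, 48)), 3)
print([g.shape for g in pyr])
```

Because the kernel weights sum to 1, a constant image stays constant at every level; only the resolution is halved.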
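Expanding $G_1$ to the same size as $G_0$ and subtracting it from $G_0$ yields the band-passed image $L_0$. Repeating this at every level builds a Laplacian pyramid $L_0, L_1, L_2, \dots, L_{N-1}$ containing band-passed images of decreasing size and spatial frequency:

$$L_l=G_l-\hat{G}_{l+1},\qquad l=0,\dots,N-1,$$

where the expanded image $\hat{G}_{l+1}$ is given by

$$\hat{G}_{l+1}(i,j)=4\sum_{u=-2}^{2}\sum_{v=-2}^{2}w(u,v)\,G_{l+1}\!\left(\frac{i-u}{2},\,\frac{j-v}{2}\right),$$

with the same compact kernel $w(u,v)$ used to build the Gaussian pyramid, and with only the terms for which $(i-u)/2$ and $(j-v)/2$ are integers included in the sum. Setting $L_N=G_N$, the original image can be reconstructed from the expanded band-pass images:

$$G_0=\sum_{l=0}^{N}\hat{L}_l,$$

where $\hat{L}_l$ denotes $L_l$ expanded $l$ times to the size of $G_0$. The Gaussian pyramid contains low-passed versions of the original $G_0$ at progressively lower spatial frequencies; this is clearly seen when the Gaussian pyramid "levels" are expanded to the same size as $G_0$. The Laplacian pyramid consists of band-passed copies of $G_0$: each Laplacian level contains the "edges" of a certain size and spans approximately an octave in spatial frequency.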
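The following self-contained sketch builds the Laplacian pyramid and reconstructs $G_0$ from it, under an assumed 5-tap binomial kernel and edge-replicating borders (neither is specified in the text). EXPAND inserts zeros between samples and low-pass filters the result; the factor 4 compensates for the three zeros inserted per original sample. Because reconstruction adds back exactly the expanded images that were subtracted, it recovers the input up to floating-point error regardless of the kernel choice. The function names are hypothetical.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # assumed 5-tap kernel


def blur(img, k=KERNEL):
    """Separable low-pass filter with edge-replicated borders."""
    r = len(k) // 2
    h, w = img.shape
    padded = np.pad(img, ((r, r), (0, 0)), mode="edge")
    tmp = sum(k[i] * padded[i:i + h, :] for i in range(len(k)))
    padded = np.pad(tmp, ((0, 0), (r, r)), mode="edge")
    return sum(k[i] * padded[:, i:i + w] for i in range(len(k)))


def reduce_level(img):
    return blur(img)[::2, ::2]


def expand_level(img, shape):
    """EXPAND: upsample by zero insertion to `shape`, then interpolate."""
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * blur(up)


def laplacian_pyramid(img, levels):
    """Return [L_0, ..., L_{N-1}, L_N] with L_l = G_l - EXPAND(G_{l+1}), L_N = G_N."""
    gauss = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        gauss.append(reduce_level(gauss[-1]))
    lap = [g - expand_level(g_next, g.shape) for g, g_next in zip(gauss[:-1], gauss[1:])]
    lap.append(gauss[-1])
    return lap


def collapse(lap):
    """Rebuild G_0: start from L_N, repeatedly expand and add the finer level."""
    img = lap[-1]
    for level in reversed(lap[:-1]):
        img = level + expand_level(img, level.shape)
    return img


img = np.random.default_rng(3).random((37, 50))
print(np.abs(collapse(laplacian_pyramid(img, 4)) - img).max())
```

Expanding and adding level by level, as in `collapse`, is equivalent to summing the fully expanded levels $\hat{L}_l$ because EXPAND is linear.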

Base Layer Fusion Based on Multiresolution
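The fused base layer $B_f(x,y)$, which contains the well-exposed pixels, is reconstructed by expanding each level of the fused Laplacian pyramid and then summing all the levels:

$$B_f(x,y)=\sum_{l=0}^{N}\hat{L}_l\{B_f\}(x,y),\tag{14}$$

where $\hat{L}_l\{B_f\}$ denotes level $l$ of the fused Laplacian pyramid expanded to full resolution.

Detail Layer Fusion and Manipulation.
The detail layers computed in (8) for all $K$ input exposures are linearly combined to produce the fused detail layer $D_f(x,y)$, which carries the combined texture information:

$$D_f(x,y)=\gamma\,M\!\left(\sum_{n=1}^{K}D_n(x,y)\right),\tag{15}$$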
where $\gamma$ is a user-defined parameter that controls the amplification of texture details (typically set to 5), and $M(\cdot)$ is a nonlinear function that enhances details while reducing the noise and the artifacts near strong edges caused by overenhancement. We follow the approach of [10] to reduce noise across all detail layers: $M(\cdot)$ uses a smooth step function that equals 0 where the pixel value at $(x,y)$ is less than 1% of the maximum intensity and 1 where it is more than 2%, with a smooth transition in between, together with a parameter that controls contrast in the detail layers. We found 0.2 to be a good default for this contrast parameter in all experiments. Finally, the detail-enhanced fused image $F(x,y)$ is computed by adding the fused base layer $B_f(x,y)$ from (14) and the manipulated fused detail layer $D_f(x,y)$ from (15):

$$F(x,y)=B_f(x,y)+D_f(x,y).$$

Figures 1, 3, and 4 show examples of images fused from multiexposure sequences. The proposed approach enhances texture details while preventing halos near strong edges. As shown in Figure 1(b), the details from all of the input images are combined, and the fine textures on the chair that appear in the fused image are not revealed by any of the four input exposures (see Figure 1(a)). In Figures 3(a)–3(d), we compare our results with recently proposed approaches. Figures 3(a) and 3(b) show fusion results obtained with multiresolution pyramid based approaches. The result of Mertens et al. [4] (Figure 3(a)) appears blurry and loses texture details, whereas in our result (Figure 3(d)) the wall texture and the painting on the window glass, which are difficult to see in Figure 3(a), are emphasized. The approach of [4] is suboptimal here because building a Laplacian pyramid removes pixel-to-pixel correlations by subtracting a low-pass filtered copy of the image from the image itself, which reduces texture and edge details in the fused image.
Figure 3(b) shows the result of the pyramid approach [13], which reveals many details but loses contrast and color information. The result of generalized random walks based exposure fusion, shown in Figure 3(c), has less texture and color detail in brightly illuminated regions (i.e., the lamp and the window glass). Figure 3(d) retains colors, sharp edges, and details while also reducing high-frequency artifacts near strong edges. Figure 4 shows our results for different image sequences captured at various exposure settings (Figure 4(a), Hermes; Figure 4(b), Chairs; and Figure 4(c), Syn; input images courtesy of Jacques Joffre and Shree Nayar). The strong edges and fine texture details are accurately preserved in the fused images without introducing halo artifacts; such halos would stand out if the detail layer were boosted substantially.
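The fusion stage, i.e., (14), (15), and the final sum $F = B_f + D_f$, can be sketched as below. This is a hedged illustration, not the authors' implementation: the per-exposure weight maps are taken as given inputs (the exposure, contrast, and saturation measures are not spelled out in this section), the pyramid kernel is an assumed 5-tap binomial filter, and `fuse_details` is a hypothetical stand-in for $M(\cdot)$ that applies only the 1%–2% smooth step to an intensity map supplied by the caller and omits the contrast parameter, whose exact form is not given here.

```python
import numpy as np

KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # assumed 5-tap kernel


def blur(img):
    h, w = img.shape
    p = np.pad(img, ((2, 2), (0, 0)), mode="edge")
    t = sum(KERNEL[i] * p[i:i + h, :] for i in range(5))
    p = np.pad(t, ((0, 0), (2, 2)), mode="edge")
    return sum(KERNEL[i] * p[:, i:i + w] for i in range(5))


def expand_level(img, shape):
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * blur(up)


def gaussian_pyr(img, levels):
    pyr = [np.asarray(img, dtype=float)]
    for _ in range(levels):
        pyr.append(blur(pyr[-1])[::2, ::2])
    return pyr


def laplacian_pyr(img, levels):
    g = gaussian_pyr(img, levels)
    return [a - expand_level(b, a.shape) for a, b in zip(g[:-1], g[1:])] + [g[-1]]


def collapse(lap):
    img = lap[-1]
    for level in reversed(lap[:-1]):
        img = level + expand_level(img, level.shape)
    return img


def fuse_base(bases, weights, levels=4):
    """Eq. (14): blend base-layer Laplacian pyramids with Gaussian-pyramid weights.

    The weight maps are assumed given and are normalised to sum to 1 per pixel.
    """
    w = np.stack(weights).astype(float) + 1e-12
    w /= w.sum(axis=0)
    fused = None
    for base, wn in zip(bases, w):
        terms = [gw * lb for gw, lb in zip(gaussian_pyr(wn, levels), laplacian_pyr(base, levels))]
        fused = terms if fused is None else [f + t for f, t in zip(fused, terms)]
    return collapse(fused)


def smooth_step(x, lo=0.01, hi=0.02):
    """0 below lo, 1 above hi, cubic transition in between."""
    t = np.clip((x - lo) / (hi - lo), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)


def fuse_details(details, intensity, gamma=5.0):
    """Hypothetical stand-in for Eq. (15): gamma * M(sum_n D_n), M = smooth-step gate only."""
    gate = smooth_step(intensity / intensity.max())
    return gamma * gate * np.sum(details, axis=0)


def fuse(bases, details, weights, intensity, gamma=5.0, levels=4):
    """Detail-enhanced fused image F = B_f + D_f."""
    return fuse_base(bases, weights, levels) + fuse_details(details, intensity, gamma)
```

Since the normalized weights sum to 1 at every pixel and the Gaussian pyramid is linear, fusing identical base layers returns that base layer unchanged, which is a useful sanity check for any implementation of this step.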

Moreover, Figure 5 demonstrates that the proposed method is also suitable for multifocus image fusion, yielding rich contrast. As illustrated in Figure 5(c), edges and textures are better preserved than in the input images. Because our approach excludes fine textures from the base layers, fine details can be preserved and enhanced separately. In addition, the multiresolution pyramid approach can be used effectively to retain strong edges and enhance texture details in the multifocus image fusion problem.

Comparison with Other Exposure Fusion Methods.
Figure 6 compares the results obtained with different edge-preserving filters [9,11,14] and with the method of Mertens et al. [4] on the Cathedral sequence. We used the default parameter settings suggested for each edge-preserving filter [9,11,14]. Figures 6(c) and 6(f) show, respectively, the fusion results of the anisotropic diffusion [9] based approach and the multiresolution pyramid based exposure fusion approach [4]; both are close to the result obtained with the guided filter (Figure 6(g)), but overall they yield less texture and edge detail. The texture detail enhancement results using the bilateral filter [11] and the weighted least squares filter [14], shown in Figures 6(d) and 6(e), respectively, show overenhancement near strong edges and less color detail. As shown in the close-up view in Figure 6(g), the proposed method based on the guided filter enhances image texture details while preserving strong edges without overenhancement.

Analysis of Free Parameters and Fusion Performance Metrics.
To analyze the effect of $\epsilon$, $\gamma$, and the window radius $r$ on the quality score $Q^{AB/F}$ (Qabf) [16], entropy, and visual information fidelity for fusion (VIFF) [17], we plot these metrics for the "Cathedral" input sequence in Figures 7(a)–7(c), respectively. All experiments were executed on a PC with a 2.2 GHz i5 processor and 2 GB of RAM. VIFF [17] first decomposes the source and fused images into blocks. It then uses the models of VIF (the GSM model, the distortion model, and the HVS model) to capture the visual information in the two source-fused pairs. Using an effective visual information index, VIFF measures the effective visual information of the fusion in all blocks of each subband, and the final score is obtained by integrating the information over all subbands. Qabf [16] evaluates the amount of edge information transferred from the input images to the fused image; a Sobel operator is applied to obtain the edge strength and orientation at each pixel. First, to analyze the effect of $\epsilon$, the window radius $r$ and the texture amplification parameter $\gamma$ were set to 2 and 5, respectively. As shown in Figure 7(a), Qabf and entropy decrease as $\epsilon$ increases, whereas VIFF increases. Figure 7(b) shows that VIFF and entropy increase as $\gamma$ increases, whereas Qabf decreases. A small window radius $r$ is preferred to reduce computational time; in the analysis of $r$, the other parameters are set to $\epsilon = 0.01$ and $\gamma = 5$. The visual effect of $\gamma$ on the "Cathedral" sequence is shown in Figure 8: as $\gamma$ increases (Figures 8(a)–8(c)), strong edges and textures become overenhanced, which leads to artifacts.
Regarding $r$, entropy and Qabf decrease as $r$ increases, whereas VIFF increases. To obtain good detail enhancement at a low computational cost, we conclude that $\epsilon = 0.01$, $\gamma = 5$, and $r = 2$ yield reasonably good results in all cases.
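Of the three metrics, entropy is the only one whose computation is not described above. A common definition, assumed here rather than taken from the paper, is the Shannon entropy (in bits) of the 256-bin grey-level histogram of the fused image. A minimal sketch, with a hypothetical function name:

```python
import numpy as np


def image_entropy(img, bins=256):
    """Shannon entropy (bits) of the grey-level histogram of an image in [0, 1]."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                      # 0 * log 0 is taken as 0
    return float(-(p * np.log2(p)).sum())


rng = np.random.default_rng(0)
print(image_entropy(rng.random((128, 128))))   # close to the 8-bit maximum
```

Higher entropy indicates a more spread-out intensity histogram, which is why boosting texture details (larger $\gamma$) tends to raise it even when edge transfer, as measured by Qabf, gets worse.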

Conclusions
We proposed a method to construct a detail-enhanced image from a set of multiexposure images by using a multiresolution decomposition technique. Compared with existing techniques that use multiresolution and single-resolution analysis for exposure fusion, the proposed method performs better at enhancing texture details in the fused image. The framework is inspired by the edge-preserving property of the guided filter, which has a better response near strong edges. A two-layer decomposition based on the guided filter is used to extract fine textures for detail enhancement. Moreover, we have demonstrated that the method can also be applied to fuse multifocus images (i.e., images focused on different targets). More importantly, the information in the resulting image can be controlled with the proposed free parameters.