Weighted Nuclear Norm Minimization Based Tongue Specular Reflection Removal

In computational tongue diagnosis, specular reflection is generally inevitable in tongue image acquisition; it adversely affects feature extraction and tends to degrade diagnosis performance. In this paper, we propose a two-stage approach (i.e., a detection and inpainting pipeline) to address this issue: (i) by considering both highlight reflection and subreflection areas, a superpixel-based segmentation method is adopted for the detection of the specular reflection areas; (ii) by extending the weighted nuclear norm minimization (WNNM) model, a nonlocal inpainting method is proposed for specular reflection removal. Experimental results on synthetic and real images show that the proposed method is accurate in detecting the specular reflection areas and is effective in restoring tongue images with more natural texture information of the tongue body.


Introduction
In traditional Chinese medicine (TCM), practitioners observe the color, shape, texture, and coating characteristics of the tongue to evaluate the health condition of a person. Because of its convenience and effectiveness, traditional tongue diagnosis has been very popular for thousands of years in the countries of East Asia, especially China, Korea, and Japan [1][2][3].
Traditional tongue diagnosis, however, is a skill that requires years of training to master, and the diagnosis result is highly dependent on the practitioner's personal experience. That is, for a specific patient, the diagnosis results given by several practitioners may be distinctly different. In summary, these limitations greatly restrict the application of traditional tongue diagnosis in modern medicine.
However, because of the saliva on the tongue body, specular reflection is generally inevitable for the existing tongue image acquisition systems [19]. Figure 1 shows one typical semiclosed system for tongue image acquisition [10] and several tongue images with specular reflections. The specular reflection on the tongue image adversely affects many tongue analysis tasks, for example, tongue fur, tongue texture, and tongue color detection. To alleviate the effect of specular reflection, one may improve the robustness of the existing feature extraction and classification methods, but the more natural strategy is to detect and repair the specular reflection areas (i.e., the detection and inpainting pipeline).
So far, several methods have been proposed for the detection and repair of the specular reflection areas in tongue images. These methods, however, are limited in both detection and inpainting. In the detection stage, as shown in Figure 2, a typical specular reflection region in a tongue image includes both the highlight reflections and the subreflections, while the existing detection methods usually considered only the highlight reflections [16,20,21] or adopted only simple strategies (e.g., morphological operators) to cope with the subreflections [21]. In the repairing stage, bilinear interpolation [20] and total variation- (TV-) based methods [21] were applied for inpainting the tongue specular reflection regions. These methods, however, considered only the local smoothness of the image and were not effective when the reflection areas were large.
In this paper, we adopted the detection and inpainting pipeline and proposed a novel method for the removal of specular reflection areas in tongue images. In the detection stage, adaptive thresholding and superpixel segmentation were applied to detect the highlight reflections and the subreflections. Referring to the location of the initial highlight reflections obtained via thresholding, subreflections were defined as the surrounding superpixels with lower illumination than that of the normal pixels of the tongue body. The highlight reflections together with the subreflections were regarded as the final detection result of specular reflection areas.
In the inpainting stage, based on the homogeneous property of tongue images, we proposed a nonlocal inpainting approach, that is, weighted nuclear norm minimization- (WNNM-) based tongue specular reflection removal. For a single small patch of a tongue image, there are many nonlocal similar patches across the whole image. If the patch and its nonlocal similar ones are stacked into a matrix, it is reasonable to assume that the matrix is of low rank. Thus, by solving a low rank matrix completion problem within the WNNM framework, we can fill the specular reflection areas with the texture information of the tongue.
This paper focuses on specular reflection removal in tongue images, and its contributions are twofold: (i) detection of highlight reflection and subreflection areas and (ii) WNNM-based inpainting. First, we analyzed the adverse impact of subreflections, which to the best of our knowledge have received little attention in tongue specular reflection removal. To address this, we proposed a superpixel-based specular reflection detection method that effectively detects the highlight reflection together with the subreflection areas while barely affecting the normal pixels around the subreflection areas. Second, we proposed a WNNM-based nonlocal inpainting method for the removal of specular reflections. The proposed method converts the inpainting problem into an unconstrained optimization problem that can be solved iteratively by using standard WNNM. Compared with other inpainting approaches, our method obtains more natural texture information of the tongue, especially for large reflection areas, which improves both the PSNR value and the visual quality of the restored tongue image.
The remainder of the paper is organized as follows: Section 2 describes the superpixel-based specular reflection detection method. Section 3 presents the WNNM-based nonlocal inpainting method for specular reflection removal of tongue images. The experimental results are provided in Section 4, and finally Section 5 concludes this paper.

Tongue Image Specular Reflection Detection
In this section, we first describe the characteristics of specular reflections in tongue images and then propose our superpixel-based specular reflection detection method.
Figure 2(a) shows a typical example of a tongue image with specular reflection areas, while its enlarged subimage is shown in Figure 2(b). As shown in Figure 2(b), the pixels of highlight reflection areas always have higher illumination and lower saturation values than other pixels, while the subreflections consist of many abnormally dark pixels with lower illumination around the highlight reflection areas. In [21], Jia et al. proposed to use morphological dilation to detect the subreflection pixels around the highlight reflection area. However, for a large highlight reflection area, because the surrounding abnormal pixels of the subreflection area are not uniform, morphological dilation often ends with an inaccurate detection result that includes many normal pixels of the tongue body, as shown in Figures 6(c) and 6(d).
To solve this problem, we proposed a two-stage method for the detection of specular reflection areas. First, utilizing the hue-saturation-illumination (HSI) color model, an initial detection result was obtained for the highlight reflection area. Then, the subreflection areas were further detected via superpixel segmentation.

Detection of Highlight Reflection Area.
From our observation, the highlight reflection areas of a tongue image often have higher illumination values and lower saturation values. Thus, it is natural to transform the tongue image into the HSI color space, in which hue and saturation are adopted to describe color. The following equations were adopted to convert the tongue image into the HSI color space:

$$H = \begin{cases} \theta, & B \le G, \\ 360^\circ - \theta, & B > G, \end{cases} \qquad S = 1 - \frac{3\min(R, G, B)}{R + G + B}, \qquad I = \frac{R + G + B}{3}, \tag{1}$$

where $\theta = \cos^{-1}\left\{ \frac{(1/2)[(R - G) + (R - B)]}{\sqrt{(R - G)^2 + (R - B)(G - B)}} \right\}$, and $R$, $G$, and $B$ are the three channels of the RGB color space, respectively.
Figure 3(a) shows a typical tongue image with specular reflection, while Figures 3(b) and 3(c) show the $I$ channel and the $S$ channel of Figure 3(a) in the HSI color space, respectively. For each channel, a threshold was used to check whether a pixel belongs to the highlight reflection area. The thresholds $T_I$ for the $I$ channel and $T_S$ for the $S$ channel were adaptively obtained by (2), where $I_{\max}$, $I_{\min}$, and $I_{\mathrm{mean}}$ denote, respectively, the maximum, minimum, and mean value of all illumination values in a tongue image, and $S_{\max}$, $S_{\min}$, and $S_{\mathrm{mean}}$ denote, respectively, the maximum, minimum, and mean value of all saturation values in the tongue image. Finally, the highlight reflection image HR was obtained by using the following criterion:

$$\mathrm{HR}(x, y) = \begin{cases} 1, & I(x, y) > T_I \text{ and } S(x, y) < T_S, \\ 0, & \text{otherwise}, \end{cases} \tag{3}$$

where $I(x, y)$ and $S(x, y)$ denote the $I$ channel and $S$ channel of pixel $(x, y)$, respectively. Figure 3(d) shows an example of the detection result of highlight reflection areas in a typical tongue image.
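The thresholding step can be sketched as follows. This is a minimal illustration, assuming an RGB image scaled to [0, 1]; the function names `intensity_saturation` and `highlight_mask` are hypothetical, and the thresholds `t_i` and `t_s` are passed in by the caller because their adaptive formula is not reproduced here.

```python
import numpy as np

def intensity_saturation(rgb):
    """Return the I and S channels of the HSI model for an RGB image in [0, 1]."""
    rgb = np.asarray(rgb, dtype=np.float64)
    total = rgb.sum(axis=-1)
    intensity = total / 3.0
    # S = 1 - 3 * min(R, G, B) / (R + G + B); black pixels are given S = 0.
    safe_total = np.where(total > 0, total, 1.0)
    saturation = np.where(total > 0, 1.0 - 3.0 * rgb.min(axis=-1) / safe_total, 0.0)
    return intensity, saturation

def highlight_mask(rgb, t_i, t_s):
    """Mark pixels that are both bright (I > t_i) and weakly saturated (S < t_s)."""
    intensity, saturation = intensity_saturation(rgb)
    return (intensity > t_i) & (saturation < t_s)

# Usage: a pure white pixel is flagged, a saturated red pixel is not.
image = np.array([[[1.0, 1.0, 1.0], [0.6, 0.1, 0.1]]])
mask = highlight_mask(image, t_i=0.8, t_s=0.2)
```

The hue channel is not computed because only illumination and saturation enter the highlight test.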

Superpixel Based Detection of Subreflection Area.
For a tongue image, a large number of small regions, that is, superpixels, were obtained by applying oversegmentation. Subreflections were defined as the superpixels that surround the highlight reflection and have lower illumination values than the normal pixels of the tongue body. Utilizing the isotropic characteristics of superpixels [22,23], we could naturally find the subreflections and avoid bringing in many normal pixels of the tongue body. Superpixels can be obtained by oversegmenting a tongue image with any reasonable existing segmentation algorithm. In this paper, a graph-based segmentation method [22] was adopted. For each tongue image, an undirected graph was defined with the pixels/regions as nodes, connected by edges to their neighbors. A nonnegative weight $w(e)$ was assigned to each edge $e$ to measure the dissimilarity of its two nodes. In the beginning, each pixel was a node. The graph-based segmentation algorithm gradually merged similar regions/nodes into the same superpixel. The merging process was driven by the internal variation $\mathrm{Int}(C)$ of region $C$, defined as

$$\mathrm{Int}(C) = \max_{e \in \mathrm{MST}(C)} w(e) + \frac{k}{|C|}, \tag{4}$$

where $|C|$ is the size of region $C$, $\mathrm{MST}(C)$ is the minimum spanning tree of $C$, and $k$ is a nonnegative parameter related to the size or the number of superpixels. Two different regions $C_1$ and $C_2$ were kept as separate regions only if the edge connecting them had a weight higher than $\mathrm{Int}(C_1)$ and $\mathrm{Int}(C_2)$ (if there is no edge between the two regions, the edge weight is regarded as $+\infty$). Otherwise, the two regions were merged into one new region, and the internal variation of this compound region was updated. Finally, for a tongue image, we obtained a series of superpixels. For the details of this algorithm, please refer to [22].
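The merging procedure can be illustrated with a short sketch. It assumes a grayscale image, a 4-connected pixel grid, absolute intensity difference as the edge weight, and the min-based merge test of the original graph-based algorithm; these choices and the function name `graph_segment` are assumptions for illustration, not the exact configuration used in this paper.

```python
import numpy as np

def graph_segment(gray, k):
    """Label map from a greedy graph-based merge on a 4-connected pixel grid."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    idx = np.arange(h * w).reshape(h, w)
    # Edges between horizontal and vertical neighbours, weighted by |difference|.
    edges = []
    for a, b, ga, gb in (
        (idx[:, :-1], idx[:, 1:], gray[:, :-1], gray[:, 1:]),
        (idx[:-1, :], idx[1:, :], gray[:-1, :], gray[1:, :]),
    ):
        edges.extend(zip(np.abs(ga - gb).ravel(), a.ravel(), b.ravel()))
    edges.sort(key=lambda e: e[0])

    parent = list(range(h * w))
    size = [1] * (h * w)
    max_mst = [0.0] * (h * w)  # largest MST edge weight inside each region

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for weight, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        int_a = max_mst[ra] + k / size[ra]
        int_b = max_mst[rb] + k / size[rb]
        if weight <= min(int_a, int_b):
            parent[rb] = ra
            size[ra] += size[rb]
            max_mst[ra] = weight  # edges are sorted, so this is the new maximum
    roots = np.array([find(i) for i in range(h * w)])
    _, labels = np.unique(roots, return_inverse=True)
    return labels.reshape(h, w)

# Usage: two flat halves stay separate for a small k and merge for a large k.
demo = np.zeros((4, 4))
demo[:, 2:] = 1.0
labels = graph_segment(demo, k=0.1)
```

Processing edges in ascending weight order means the weight of the merging edge is always the largest weight in the merged region's spanning tree, which is why `max_mst` can be updated in constant time.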
After oversegmentation, we obtained many superpixels of the tongue image, which could be roughly grouped into three categories: highlight reflection superpixels, subreflection superpixels, and normal tongue body superpixels. As the subreflections commonly lie around the highlight reflection, we used the following strategy to locate them. First, as shown in Figure 4, with the initial highlight reflection detected in Section 2.1, we obtained an interest area (a rectangular box with green border) and a circle band (with red dashed border) around each highlight reflection using the morphological dilation operation. Then, for each superpixel in the circle band, we determined its category with (5), where $v_{sp}$ denotes the pixel value of a single superpixel in the band, $v_{tb}$ is the mean value of the normal pixels of the tongue body in the interest area, and $t_{\min}$ and $t_{\max}$ are two nonnegative parameters, which were empirically set to 1 and 1.15 in this paper. The width of the circle band was adaptively adjusted according to the size of the highlight reflection: the larger the highlight reflection, the wider the corresponding band. By classifying all the candidate superpixels in the circle band using Formula (5), we could detect all the subreflections and update the highlight reflections at the same time.
The parameter $k$ in (4) indirectly controls the granularity of the final segmentation. A larger $k$ usually leads to larger regions but also brings a higher likelihood of missing segmentation boundaries, while a smaller $k$ often leads to consistent oversegmentation. Figure 5 shows the detection results obtained by setting $k$ to 0.01, 0.05, 0.10, and 0.15, respectively. The mask map in Figure 5(b) covers the highlight reflections and subreflections while excluding the normal pixels around the subreflections. In Figure 5(a), the subreflection areas are not detected, while in Figures 5(c) and 5(d) many normal pixels are misclassified to the subreflection area.
Finally, the highlight reflections and subreflections were combined as the final detection result of specular reflection areas for inpainting.

The WNNM-Based Nonlocal Inpainting Method
In this section, we propose an exemplar-based inpainting method, that is, WNNM-based inpainting, for tongue specular reflection removal. Exemplar-based methods can be traced back to 1999 [24] and have been widely adopted for image inpainting [25][26][27][28][29][30][31]. The rise of nonlocal self-similarity methods [32] further triggered the development of exemplar-based inpainting approaches. Most existing methods, however, fill the holes based on the best matched patch or the mean of the nonlocal similar patches, while second-order information is usually neglected in inpainting.
To remedy this, a low rank based inpainting model [33] was proposed to synthesize the missing regions, where the tensor trace norm is adopted as the low rank regularizer. Besides, the WNNM model [34] is generally superior to other low rank models, for example, trace norm or nuclear norm, for image denoising. This motivates us to employ WNNM for tongue specular reflection removal, resulting in the proposed WNNM-based inpainting model. Given a small patch on the tongue image, there are many nonlocal similar patches across the whole image. If we stretch each patch into a vector and stack all the vectors in a matrix, it is intuitive that the matrix should be of low rank. Based on this assumption, tongue image inpainting can be regarded as a low rank matrix completion problem. By utilizing the recently proposed weighted nuclear norm model, we further show that tongue image inpainting can be solved well by iteratively performing weighted nuclear norm minimization (WNNM).
In the following subsections, matrix completion is first briefly reviewed. Then, WNNM is described. Finally, the WNNM-based nonlocal inpainting method is proposed for specular reflection removal of tongue images.

Matrix Completion.
Matrix completion aims to recover a low rank matrix from incomplete samples of its entries, and it has received considerable attention in many areas of engineering and science [35,36], for example, the well-known Netflix problem [37]. Matrix completion can be cast as the following minimization problem:

$$\min_X \operatorname{rank}(X) \quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(M), \tag{6}$$

where $X \in \mathbb{R}^{m \times n}$ is recovered from $M \in \mathbb{R}^{m \times n}$, $\Omega$ denotes the set of positions of known entries in matrix $M$, and $P_\Omega(\cdot)$ is a linear operator with the following definition: for any matrix $X$,

$$[P_\Omega(X)]_{ij} = \begin{cases} X_{ij}, & (i, j) \in \Omega, \\ 0, & \text{otherwise}. \end{cases} \tag{7}$$

The model in (6) is nonconvex and NP-hard, and it is nontrivial to solve. Therefore, a convex relaxation of (6) is usually adopted with the following formulation:

$$\min_X \|X\|_* \quad \text{s.t.} \quad P_\Omega(X) = P_\Omega(M), \tag{8}$$

where $\|X\|_* = \sum_i \sigma_i(X)$ is the nuclear norm of $X$, and $\sigma_i(X)$ is the $i$th singular value of $X$. The problem in (8) can be approximated by the following unconstrained convex minimization problem:

$$\min_X \|X\|_* + \frac{\lambda}{2} \|P_\Omega(X) - P_\Omega(M)\|_F^2, \tag{9}$$

where $\|\cdot\|_F$ denotes the Frobenius norm, and $\lambda$ is a tradeoff parameter. When $\lambda$ goes to infinity, the solution of (9) will be the same as that of (8). As an unconstrained optimization problem, (9) can be solved by the iterative shrinkage algorithm [38] and the APG method [39].
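An iterative shrinkage scheme for the unconstrained nuclear norm problem can be sketched as below. This is an illustrative sketch under stated assumptions: the tradeoff parameter is taken to multiply the data term, the step size is the reciprocal of the Lipschitz constant of the data-term gradient, and the names `svt`, `objective`, and `complete_matrix` as well as the default values are hypothetical.

```python
import numpy as np

def svt(mat, tau):
    """Singular value thresholding: shrink every singular value by tau."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt

def objective(x, observed, mask, lam):
    """Nuclear norm plus (lam / 2) times the squared error on known entries."""
    nuclear = np.linalg.svd(x, compute_uv=False).sum()
    return nuclear + 0.5 * lam * np.sum((x - observed)[mask] ** 2)

def complete_matrix(observed, mask, lam=10.0, n_iter=50):
    """Proximal gradient iterations for the unconstrained completion problem."""
    x = np.where(mask, observed, 0.0)
    for _ in range(n_iter):
        # Gradient step with step size 1 / lam: known entries are reset to the data.
        z = x - np.where(mask, x - observed, 0.0)
        # Proximal step of the nuclear norm scaled by the same step size.
        x = svt(z, 1.0 / lam)
    return x

# Usage on a random rank-one matrix with roughly 70% of the entries known.
rng = np.random.default_rng(0)
m = np.outer(rng.normal(size=12), rng.normal(size=10))
mask = rng.random(m.shape) < 0.7
x_hat = complete_matrix(m, mask)
```

With this step size each iteration cannot increase the objective, which makes `objective` a convenient quantity to monitor.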

Weighted Nuclear Norm Minimization.
The iterative shrinkage algorithm for the model in (9) usually involves solving the nuclear norm minimization (NNM) problem

$$X^{t+1} = \arg\min_X \frac{1}{2} \|X - Y^*\|_F^2 + \tau \|X\|_*, \tag{10}$$

where $Y^*$ is the matrix obtained from the gradient step at the current iterate $X^t$, and $\tau > 0$. Cai et al. [40] proved that the NNM problem of (10) can be easily solved by the singular value thresholding (SVT) method; that is,

$$X^{t+1} = U S_\tau(\Sigma) V^T, \tag{11}$$

where $X^{t+1}$ is the solution to (10), $Y^* = U \Sigma V^T$ is the singular value decomposition (SVD) of $Y^*$, $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_n)$ is the diagonal matrix of singular values, and $S_\tau(\Sigma)$ is the soft-thresholding function on $\Sigma$:

$$S_\tau(\Sigma)_{ii} = \max(\sigma_i - \tau, 0). \tag{12}$$

Recently, Gu et al. [34] pointed out that, as the soft-thresholding operator $S_\tau(\Sigma)$ shrinks each singular value with the same $\tau$ in order to preserve convexity, it ignores the prior knowledge about the singular values. Compared with the small singular values, the larger ones are generally associated with the major projection orientation of the matrix in the lower subspace, and they should be shrunk less to preserve the major data components. Thus, they extended the standard nuclear norm to the weighted nuclear norm:

$$\|X\|_{w,*} = \sum_i w_i \sigma_i(X), \tag{13}$$

where $w = [w_1, \ldots, w_n]$ and $w_i \ge 0$ is a nonnegative weight assigned to the singular value $\sigma_i(X)$. The weighted nuclear norm minimization (WNNM) problem can then be formulated as

$$\hat{X} = \arg\min_X \frac{1}{2} \|X - Y^*\|_F^2 + \|X\|_{w,*}. \tag{14}$$

Generally, the model in (14) is not convex, but Xie et al. [41] further proved that if the weights assigned to the singular values are arranged in ascending order, the globally optimal solution can be obtained by the following theorem.
Theorem 1 (see [41]). If the weights $w$ satisfy $0 \le w_1 \le w_2 \le \cdots \le w_n$, then the WNNM problem (14) has a globally optimal solution:

$$\hat{X} = U S_w(\Sigma) V^T, \tag{15}$$

where $Y^* = U \Sigma V^T$ denotes the SVD of matrix $Y^*$, $\Sigma$ is the diagonal matrix with singular values arranged in descending order, that is, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n$, and $S_w(\Sigma)$ is the soft-thresholding operator:

$$S_w(\Sigma)_{ii} = \max(\sigma_i - w_i, 0). \tag{16}$$

WNNM has been applied to image denoising and has achieved better results than state-of-the-art methods, such as LSSC [42] and BM3D [43], in terms of PSNR and SSIM values. Moreover, WNNM is effective in preserving the local structures of images and generates fewer visual artifacts.
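The closed-form solution in Theorem 1 amounts to shrinking each singular value by its own weight. A minimal sketch follows, assuming the factor 1/2 on the quadratic term; the function name `weighted_svt` is hypothetical.

```python
import numpy as np

def weighted_svt(mat, weights):
    """Weighted singular value thresholding for non-descending weights."""
    # np.linalg.svd returns the singular values sorted in descending order.
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    weights = np.asarray(weights, dtype=np.float64)
    if np.any(weights < 0) or np.any(np.diff(weights) < 0):
        raise ValueError("weights must be non-negative and non-descending")
    # The largest singular value receives the smallest weight, so it is shrunk least.
    return (u * np.maximum(s - weights, 0.0)) @ vt

# Usage: singular values 5, 3, 1 shrunk by 0.5, 1, 2 become 4.5, 2, 0.
result = weighted_svt(np.diag([5.0, 3.0, 1.0]), [0.5, 1.0, 2.0])
```

The explicit check on the weight order mirrors the condition of the theorem; without it the thresholding is no longer guaranteed to be globally optimal.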

Tongue Image Inpainting by Using WNNM.
In this section, we propose a WNNM-based nonlocal inpainting method for repairing the specular reflection areas of tongue images. First, the WNNM-based inpainting model is introduced to utilize the nonlocal information of the tongue image. Then, we describe the proposed optimization algorithm, which iteratively solves a series of standard WNNM problems. Finally, we analyze the convergence of our algorithm.
Given a tongue image $Y$ with specular reflections, the proposed WNNM-based tongue inpainting model is formulated as

$$\hat{X} = \arg\min_X \frac{1}{2} \|P_\Omega(X) - Y\|_F^2 + \|X\|_{w,*}, \tag{17}$$

where $X$ is the inpainted image, $\Omega$ denotes the nonreflection area, and $P_\Omega$ is the linear projection operator defined in (7). The model in (17) can be obtained by substituting the standard nuclear norm with the weighted nuclear norm in (9). Because of the introduction of the projection operator $P_\Omega$, the model in (17) cannot be directly solved by the WNNM algorithm in [34]. Thus, we adopt the following iterative shrinkage algorithm. In each iteration, we consider the following surrogate function [44] of the objective $F(X)$ of (17) at a given point $X^t$:

$$Q(X, X^t) = f(X^t) + \langle \nabla f(X^t), X - X^t \rangle + \frac{L_f}{2} \|X - X^t\|_F^2 + \|X\|_{w,*}, \tag{18}$$

where $f(X^t) = (1/2) \|P_\Omega(X^t) - Y\|_F^2$ and $L_f$ is the Lipschitz constant of $\nabla f$. $Q(X, X^t)$ satisfies (i) $Q(X, X^t) \ge F(X)$ for any $X$ and (ii) $Q(X^t, X^t) = F(X^t)$. By ignoring the constant term, the minimization of (18) can be formulated as

$$X^{t+1} = \arg\min_X \frac{L_f}{2} \|X - Z^t\|_F^2 + \|X\|_{w,*}, \tag{19}$$

where $Z^t = X^t - (1/L_f) P_\Omega(P_\Omega(X^t) - Y)$. Equation (19) has the same form as (14), up to scaling the weights by $1/L_f$, and can therefore be solved by using WNNM. Finally, we can get the minimizer of (17) by iteratively solving (19) with the proximal gradient method, as described in Algorithm 1.
There are many repeated patterns in the tongue image which are useful for tongue image inpainting. In order to take advantage of this nonlocal information, we split the tongue image into many patches of 5 × 5 pixels. For a single patch $p_j$, many nonlocal similar patches can be found across the whole image. Stretching each similar patch into a vector and stacking the vectors into a matrix $Z_j$, it is intuitive that $Z_j$ should be of low rank. By using (19), we obtain the estimate of $Z_j$ as

$$\hat{Z}_j = \arg\min_Z \frac{L_f}{2} \|Z - Z_j\|_F^2 + \|Z\|_{w,*}, \tag{20}$$

which can be solved by the generalized soft-thresholding method in Theorem 1. Specifically, the weight $w_i$ assigned to the $i$th singular value $\sigma_i(Z_j)$ of $Z_j$ is $w_i = C\sqrt{n}/(\sigma_i(Z_j) + \varepsilon)$, where $C > 0$ is a constant, $n$ is the number of similar patches in $Z_j$, and $\varepsilon = 10^{-6}$ avoids division by zero.
In this paper  is set to 10 −3 ,  is set to 0.1,   and  are set to 2 and 5, respectively.After that, the patch   can be inpainted by finding the most similar one to it in Ẑ .Finally, by aggregating all the patches p together, the whole image  can be updated.
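The iteration described in this subsection can be sketched for a single matrix of stacked patches. This is an illustration under stated assumptions rather than the paper's Algorithm 1: columns are taken to be the similar patches, the weights are recomputed from the singular values of the gradient-step matrix at every iteration, a Lipschitz constant of 1 is used (the value for a masking projection), and the constant `c = 0.1`, the iteration count, and the function names are hypothetical choices.

```python
import numpy as np

def wnnm_weights(sigma, n_patches, c=0.1, eps=1e-6):
    """Weights inversely proportional to the singular values."""
    return c * np.sqrt(n_patches) / (sigma + eps)

def wnnm_inpaint(y, known, c=0.1, lipschitz=1.0, n_iter=50):
    """Fill the unknown entries of a patch matrix by iterated weighted shrinkage."""
    # Start from the observed data, with unknown entries set to the known mean.
    x = np.where(known, y, y[known].mean())
    for _ in range(n_iter):
        # Gradient step on the data term: only known entries are corrected.
        z = x - np.where(known, x - y, 0.0) / lipschitz
        u, s, vt = np.linalg.svd(z, full_matrices=False)
        # s is descending, so the weights are ascending as Theorem 1 requires.
        w = wnnm_weights(s, z.shape[1], c)
        x = (u * np.maximum(s - w / lipschitz, 0.0)) @ vt
    return x

# Usage: a rank-one group of 8 similar 5 x 5 patches with 5 missing pixels.
truth = np.outer(np.linspace(1.0, 2.0, 25), np.linspace(0.9, 1.1, 8))
known = np.ones_like(truth, dtype=bool)
known[10:15, 0] = False
restored = wnnm_inpaint(np.where(known, truth, 0.0), known)
```

Block matching to collect the similar patches and the aggregation of restored patches back into the image are omitted here; they would wrap around this per-group step.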

Convergence of the WNNM-Based Nonlocal Inpainting Method.
In this section, we prove the convergence of the WNNM-based inpainting method, which can be summarized in the following theorem.
From Theorem 2, the proposed algorithm is guaranteed to decrease the loss function over the iterations until it converges to a fixed point.

Experimental Results
In this section, we validate the effectiveness of the proposed method for tongue image inpainting. For better evaluation, both synthetic images and real images were used in the experiments. The inpainting results of the TV-based method [21] were used for comparison.

Experimental Results on Synthetic Images.
In this section, we quantitatively compare the performance of the competing methods on synthetic images. The synthetic images were constructed as follows: we first detected the specular reflections of one real tongue image and then randomly placed these reflections onto nonreflection tongue images to synthesize new images for inpainting. The benefit is twofold. On the one hand, the topological structure of the synthetic reflections is as natural as that of real ones. On the other hand, the peak signal-to-noise ratio (PSNR) can be conveniently calculated because the ground truth is available.
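For reference, PSNR against the ground truth can be computed as in the following sketch, which uses the standard definition; the function name `psnr` and the default peak value of 255 (8-bit images) are assumptions.

```python
import numpy as np

def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between a ground truth and a restored image."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Usage: a uniform error of 0.1 on a [0, 1] image gives 20 dB.
value = psnr(np.zeros((4, 4)), np.full((4, 4), 0.1), peak=1.0)
```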
We used two ways to detect the highlight reflections of the tongue image. One was the morphological dilation based method in [20] and the other was our superpixel-based method. Synthetic images with both kinds of reflections were used in the experiments of WNNM-based inpainting and TV-based inpainting, respectively.
Figure 7 compares the results of the competing methods on synthetic tongue images with reflections detected by the morphological dilation based method, while Figure 8 shows the results on synthetic tongue images with reflections detected by the proposed superpixel-based method.
From Figures 7, 8, and 9, we find that, for the same synthetic tongue image, the proposed inpainting method obtained better results than the TV-based inpainting method in terms of PSNR. Furthermore, the TV-based inpainting method fills in pixels by minimizing the total variation, while the proposed method fills in pixels that are similar to the area around the reflections. Thus, compared with the TV-based method, the proposed method is much better at handling tongue images with large reflection areas because it involves more texture information, as shown in Figure 9. In Figure 9(b), we manually added a block to the original tongue image to simulate a large reflection area. The inpainting result of the proposed method is clearly more similar to the ground truth than that of the TV-based method, because it involves more texture information of the tongue body.

Experimental Results on Real Tongue Images.
For better evaluation of the proposed inpainting method, we used two real tongue images with serious specular reflection areas to validate its performance. The experimental inpainting results of three real tongue images are shown in Figure 10. Moreover, on the lower right of each image of Figures 10(b) and 10(c), we further show the enlarged subimages of the inpainted images. From Figure 10, one can observe that the proposed method achieves satisfactory visual results with rich texture information of the tongue image, while horizontal or vertical discontinuities can be observed in parts of the results obtained by the TV-based method.
Based on the results on synthetic and real tongue images, we showed that the proposed method is better than the TV-based method in terms of both PSNR and visual results. The proposed method fills the reflection area with more texture information of the tongue image, which is useful in tongue texture analysis applications. Compared with the TV-based inpainting method, the proposed method is more suitable for the specular reflection removal of tongue images.

Conclusion
In this paper, we proposed a WNNM-based nonlocal inpainting method for specular reflection removal of tongue images. In the proposed method, superpixel segmentation was adopted to handle the specular reflection detection task. Then, based on nonlocal self-similarity, we transformed the problem of inpainting the specular reflections of a tongue image into a matrix completion problem and further proved the convergence of the proposed WNNM-based tongue image inpainting method. We evaluated the performance of the proposed approach by comparing it with the TV-based tongue image inpainting method. The comparison results on both synthetic and real tongue images showed that the proposed method achieves not only higher PSNR but also more satisfactory visual results with more texture information of the tongue body, and that it is more suitable for the specular reflection removal of tongue images.

Figure 1: Tongue image acquisition device and typical tongue images with specular reflections.

Figure 2: A tongue image with specular reflection area: (a) one of the highlight areas is marked by the green box and (b) the enlarged version of the subimage in the green box. It is observed that the pixels along the highlight area are usually darker than the normal pixels.

Figure 3: Tongue image in the HSI color space: (a) the typical tongue image, (b) $I$ channel of the tongue image, (c) $S$ channel of the tongue image, and (d) the initial detection of reflection.

Figure 4: Sketch map of subreflection detection: (a) tongue image with specular reflections, where the green box is the selected interest area, and (b) the enlarged interest area in (a), where the candidate superpixels are searched in the circle band with red dashed border.

Figure 6: Comparison of specular reflection area detection results: (a) the original tongue image, (b) the detection result of the proposed superpixel-based method, (c) the detection result of the morphological dilation method using a disk-shaped structuring element of 1 × 1 pixel, and (d) of 5 × 5 pixels.
Our experiments show that, for most tongue images, satisfactory subreflection detection results are obtained with the segmentation parameter $k$ of (4) in $[0.05, 0.07]$, whereas with the larger values used in Figures 5(c) and 5(d) many normal pixels are misclassified to the subreflection area.
Figure 6(b) shows the final detection result of the specular reflection area of Figure 6(a). For comparison, Figures 6(c) and 6(d) show the results of the morphological dilation based method [21] using a disk-shaped structuring element of 1 × 1 pixel and 5 × 5 pixels, respectively. Comparing the enlarged subimage in Figure 6(b) with those in Figures 6(c) and 6(d), we can see that the mask map in Figure 6(c) does not cover all the subreflections, especially in the part inside the red circle, while the mask map in Figure 6(d) covers too many normal pixels. Generally, the proposed superpixel-based method detects the highlight reflection and subreflection areas more accurately than morphological dilation.

Figure 7: Comparison of inpainting results of synthetic tongue images with reflection detected by the morphological dilation based method.

Figure 8: Comparison of inpainting results of synthetic tongue images with reflection detected by the superpixel-based method.

Figure 9: Comparison of inpainting results on a tongue image with a large reflection area.
Figure 10: Comparison of inpainting results of the WNNM-based and TV-based methods: (a) original tongue images with specular reflections, (b) inpainting results of our WNNM-based method, and (c) inpainting results of the TV-based method.