A Simple Framework for Face Photo-Sketch Synthesis

This paper proposes a simple framework for face photo-sketch synthesis. We first describe the shadow details on faces and extract the prominent facial features by two-scale decomposition using bilateral filtering. Then, we enhance the hair and some unapparent facial feature regions by combining the edge map and the hair color similarity map. Finally, we obtain the face photo sketch by adding the results of the two processes. Compared with current methods, the proposed framework requires no feature localization, training, or iteration; it creates vivid hair in the synthesized sketch; and it handles input images under arbitrary lighting conditions, especially complex self-shadows. More importantly, it can be easily extended to natural scenes. The effectiveness of the presented framework is evaluated on a variety of databases.


Introduction
Face sketching is a simple yet expressive representation of faces. It depicts a concise sketch of a face that captures the most essential perceptual information with a number of strokes [1]. It has useful applications for both digital entertainment and law enforcement.
In recent years, two kinds of representative methods for computer-based face sketching have been presented: (1) line drawing based [1-5] and (2) Eigen-transformation based [6, 7]. Line drawing-based methods are expressive in conveying 3D shading information, at the cost of losing sketch texture. The performance of these approaches largely depends on the shape extraction and facial feature analysis algorithms, such as the active appearance model [5]. Other line drawing methods use a compositional and-or graph representation [8, 9] or the direct combined model [10] to generate face photo cartoons. Eigen-transformation-based approaches use complex mathematical models to synthesize face sketches, such as PCA, LDA, E-HMM, and MRF. Gao et al. [6] use an embedded hidden Markov model and a selective ensemble strategy to synthesize sketches from photos. However, the hair region is excluded in PCA/LDA/E-HMM-based methods [11]. Wang and Tang [2] use a multiscale Markov random field (MRF) model to synthesize face photo sketches and recognize them. The face region is first divided into overlapping patches for learning, and the size of the patches decides the scale of the local face structures to be learned. The joint photo-sketch model is then learned from a training set at multiple scales using a multiscale MRF model. This method requires modeling both face shape and texture and can provide more texture information. However, the current approaches have three disadvantages: (1) both line drawing and Eigen-transformation-based methods require complex computation; (2) the face sketch is unnecessarily exaggerated, so the facial features are depicted with distortion; (3) most of the existing methods can only sketch human faces and cannot be applied to natural scene images.
In this paper, we present a novel and simple face photo-sketch synthesis framework. The hair is synthesized using a two-scale decomposition and a color similarity map. The proposed framework is very simple, without any iteration or facial feature extraction. In particular, the proposed method can easily be applied to natural scenes for sketching.
A schematic overview of our framework is shown in Figure 1. Firstly, for an input face image, a two-scale image decomposition by bilateral filtering is used to describe the shading texture and the prominent feature shapes, while hair creation based on the color similarity map generates the hair texture and the unapparent facial features. Then, the edge map is computed by an edge detector from the skin color similarity map. As a result, the hair, eye, and mouth regions are enhanced by multiplying the edge map and the hair color similarity map. Finally, the face photo sketch is synthesized by combining the results of the former two processes using an addition operation.

Bilateral Filter
The bilateral filter is an edge-preserving filter developed by Tomasi and Manduchi [12]. It is a normalized convolution in which the weighting for each pixel p is determined by the spatial distance from the center pixel q, as well as its relative difference in intensity. The spatial and intensity weighting functions f and g are typically Gaussian [13, 14]. The spatial kernel increases the weight of pixels that are spatially close, and the weight in the intensity domain decreases the weight of pixels with large intensity differences. Therefore, the bilateral filter effectively blurs an image while keeping sharp edges intact. For input image I, output image J, and a window Ω neighboring q, bilateral filtering is defined as follows:

\[ J_q = \frac{1}{k(q)} \sum_{p \in \Omega} f(p - q)\, g(I_p - I_q)\, I_p, \qquad k(q) = \sum_{p \in \Omega} f(p - q)\, g(I_p - I_q), \tag{2.1} \]

where $\sigma_s$ and $\sigma_r$ are the sizes of the spatial kernel and the range kernel, corresponding to the Gaussian functions f and g. When $\sigma_s$ increases, larger features in the image are smoothed; when $\sigma_r$ increases, the bilateral filter becomes closer to a Gaussian blur.
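As an illustration of the definition above, the following is a minimal brute-force sketch in Python, not the piecewise-linear accelerated version used in this paper. It assumes Gaussian f and g, a square window of radius about $2\sigma_s$, and edge-replicated borders; the function name and the `radius` parameter are hypothetical choices made only for this example.

```python
import numpy as np

def bilateral_filter(img, sigma_s, sigma_r, radius=None):
    """Brute-force bilateral filter for a 2-D float image (illustrative only)."""
    img = np.asarray(img, dtype=np.float64)
    if radius is None:
        radius = max(1, int(np.ceil(2 * sigma_s)))  # assumed window size
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    num = np.zeros_like(img)  # sum of f * g * I_p
    den = np.zeros_like(img)  # normalization k(q) = sum of f * g
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbor values I_p for every center pixel q at offset (dy, dx).
            shifted = padded[radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            f = np.exp(-(dx * dx + dy * dy) / (2.0 * sigma_s ** 2))   # spatial weight
            g = np.exp(-((shifted - img) ** 2) / (2.0 * sigma_r ** 2))  # range weight
            num += f * g * shifted
            den += f * g
    return num / den  # den >= 1 because the center pixel has weight 1

# A sharp step edge survives filtering when sigma_r is small.
step = np.zeros((8, 8))
step[:, 4:] = 1.0
smoothed = bilateral_filter(step, sigma_s=2.0, sigma_r=0.1)
```

With a small $\sigma_r$ the pixels across the edge receive almost no weight, which is why the step is preserved; increasing $\sigma_r$ makes g nearly constant and the result approaches an ordinary Gaussian blur.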

Facial Feature Detail Detection by Two-Scale Image Decomposition
Multiscale image decomposition (or multiscale retinex, MSR) was developed by Jobson et al. [15-17] in an attempt to bridge the gap between images and the human observation of scenes. It is widely used in HDR image rendering for color reproduction and contrast reduction [18, 19], color enhancement, and color constancy processing. In HDR image rendition, the smallest scale is strong on detail and dynamic range compression but weak on tonal and color rendition. The reverse is true for the largest spatial scale. Multiscale retinex combines the strengths of each scale and mitigates the weaknesses of each [15].
Durand and Dorsey [19] use only a two-scale decomposition to decompose the input image into a "base" and a "detail" image. The base layer, which is obtained by bilateral filtering, has its contrast reduced and contains only large-scale intensity variations. The detail layer is the division of the input intensity by the base layer; its magnitude is unchanged, thus preserving detail.
We now describe how two-scale decomposition can be used for lighting conditions and facial feature detail detection in face photo-sketch synthesis. According to the image decomposition, we can get the detail image by subtracting the base layer from the input image. The detail image preserves the important details of the input image, such as edges, texture, and shadows, depending on the degree of smoothing. In theory, the more heavily the input image is smoothed, the more details are preserved in the detail image. Figures 2 and 3 illustrate this phenomenon.
On the other hand, facial feature and shadow details are very important to a face photo sketch. The difference between sketches and photos mainly lies in two aspects, texture and shape, which are often exaggerated by the artist in a sketch. The texture contains hair texture and shadow texture [2]. In this paper, we obtain good shading effects near the features of interest from the detail images by the two-scale image decomposition proposed by Durand and Dorsey [19]. The shape of the obvious facial features is also obtained from the detail images.
The two-scale decomposition is performed on the logs of pixel intensities using piecewise-linear bilateral filtering and subsampling. On the one hand, logs of intensities are used because an image can be considered as a product of a reflectance component and an illumination component. The decomposition can therefore be viewed as separating an image into intrinsic layers of reflectance and illumination [20-22], where the base layer corresponds to the illumination component and the detail layer corresponds to the reflectance [23]. Consequently, we can obtain the facial features from the detail layer because of the distinct reflectance difference between the facial features and the skin region. In fact, human vision is mostly sensitive to reflectance rather than to illumination conditions. Moreover, because of its shape, the logarithm function handles low-intensity pixels far better than high-intensity pixels. On the other hand, piecewise-linear bilateral filtering in the intensity domain and subsampling in the spatial domain efficiently accelerate bilateral filtering.
As described in (2.1), when the scale $\sigma_s$ of the spatial kernel and/or the scale $\sigma_r$ of the intensity domain increases, the input image is smoothed more. Although the scale $\sigma_s$ of the spatial kernel has little influence on the result, it plays an important role in facial feature detection. Several conclusions can be observed. (1) For the same small scale $\sigma_r$ of the intensity domain, increasing the scale $\sigma_s$ of the spatial domain smooths more pixels near the edges of the facial features, which results in heavier shadows in the detail layer without changing the edges. The results for different spatial scales are shown in Figure 2. (2) For the same scale $\sigma_s$ of the spatial domain, increasing the scale $\sigma_r$ of the intensity domain directly smooths the input image more than in case (1), because the larger intensity scale smooths more pixels on the edges [24, 25]. Figure 3 illustrates the different results as the intensity scale changes.
Further research demonstrates that facial features such as hair, eyebrows, eyes, mouth, and nose, which have lower intensity than skin, can be obtained in the detail image by two-scale decomposition using piecewise bilateral filtering. With a small scale $\sigma_r$ in the intensity domain and a larger scale $\sigma_s$ in the spatial domain, we get more reflectance component responses from low-contrast areas, such as the shadows near the nose and the chin, as shown in Figure 2. With a small scale $\sigma_s$ in the spatial domain and a larger scale $\sigma_r$ in the intensity domain, lower-intensity pixels in small regions appear in the detail image, including the eyes and eyebrows, as shown in Figure 3. When the scales in both the intensity and spatial domains are set to their largest values, we get the lowest-intensity region, such as black hair, as shown in Figure 4. However, it is worth noting that this works only for dark (fuscous) hair. To create good hair for arbitrary colors, we propose a new method in Section 3. In addition, because of the highlights on some mouths, especially when the color and intensity of the mouth are similar to those of the facial skin, the mouth is not extracted at this step; the extraction of an unapparent mouth is dealt with in Section 3.
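The log-domain split described above can be sketched as follows. This is only an illustration under stated assumptions: the function names and the small offset `eps` are hypothetical, and the default smoother is a plain box blur used as a stand-in so the example stays self-contained. A faithful version would pass an edge-preserving bilateral filter as `smooth`.

```python
import numpy as np

def box_blur(img, radius=2):
    """Stand-in smoother (NOT edge-preserving); replace with a bilateral filter."""
    h, w = img.shape
    padded = np.pad(img, radius, mode="edge")
    acc = np.zeros((h, w), dtype=np.float64)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / (2 * radius + 1) ** 2

def two_scale_decompose(intensity, smooth=box_blur, eps=1e-4):
    """Split log intensity into a base layer and a detail layer.

    base   ~ illumination (large-scale variation, output of the smoother)
    detail ~ reflectance  (log input minus base)
    """
    log_i = np.log(np.asarray(intensity, dtype=np.float64) + eps)  # eps avoids log(0)
    base = smooth(log_i)
    detail = log_i - base
    return base, detail

# Because the split is additive in the log domain, it is multiplicative in
# intensity: exp(base) * exp(detail) reproduces the input (plus eps).
rng = np.random.default_rng(0)
photo = rng.uniform(0.05, 1.0, size=(16, 16))
base, detail = two_scale_decompose(photo)
```

Working in logs is what turns the product of reflectance and illumination into a sum, so subtracting the base from the log input is equivalent to dividing the input intensity by the base in the linear domain.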
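In this paper, we define the two-scale decomposition of the input image as follows [16]:

\[ R_{\mathrm{MSR}_i}(x, y) = \sum_{n=1}^{N} w_n \left\{ \log I_i(x, y) - \log \left[ F_n(x, y) * I_i(x, y) \right] \right\}, \tag{2.3} \]

where $R_{\mathrm{MSR}_i}$ is the $i$th color component of the MSR output, $i \in \{R, G, B\}$, $N$ is the number of scales, $w_n$ is the weighting factor for the $n$th scale, $I_i(x, y)$ is the image distribution in the $i$th color band, "$*$" denotes the convolution operation, and $F_n(x, y)$ is the weighting function of the $n$th bilateral filter, that is, the normalized product of the spatial and range kernels $f$ and $g$ at the $n$th pair of scales:

\[ F_n = \frac{f_n\, g_n}{\sum_{\Omega} f_n\, g_n}. \tag{2.4} \]

So the base image is the output of bilateral filtering, and the detail image $D$ is

\[ D(x, y) = \min_{i \in \{R, G, B\}} R_{\mathrm{MSR}_i}(x, y). \tag{2.5} \]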
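The per-channel MSR output and the channel-wise minimum that yields the detail image can be sketched as below. This is a hedged illustration: the function name, the argument layout (one pre-filtered copy of the image per scale), and the offset `eps` are assumptions introduced for the example, and the filtered images are taken as given rather than computed here.

```python
import numpy as np

def msr_detail(rgb, smoothed_per_scale, weights, eps=1e-4):
    """Detail image D(x, y) = min over color channels of the MSR output.

    rgb                : H x W x 3 array of intensities
    smoothed_per_scale : list of H x W x 3 arrays, the image filtered at scale n
    weights            : list of weighting factors w_n, one per scale
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    r_msr = np.zeros_like(rgb)
    for w_n, smooth_n in zip(weights, smoothed_per_scale):
        smooth_n = np.asarray(smooth_n, dtype=np.float64)
        # w_n * (log I_i - log(filtered I_i)), accumulated over scales
        r_msr += w_n * (np.log(rgb + eps) - np.log(smooth_n + eps))
    return r_msr.min(axis=-1)  # minimum over the R, G, B components

# One pixel, one scale: the darkest channel relative to its base dominates.
pixel = np.array([[[0.5, 0.25, 1.0]]])
base = [np.full((1, 1, 3), 0.5)]
d = msr_detail(pixel, base, [1.0], eps=0.0)  # d[0, 0] equals log(0.25 / 0.5)
```

Taking the minimum keeps, at each pixel, the channel in which the pixel is darkest relative to its smoothed surroundings, which favors dark features such as eyes, eyebrows, and hair.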
In conclusion, to ensure both speed and quality, we set the relative scales in the two-scale decomposition as follows. (1) For creating the shadows near the facial features, the small intensity scale $\sigma_r$ is set to a constant value in the range 0.05-0.08, and the associated spatial scale $\sigma_s$ is set to a constant value of 5% of the image column size. (2) For creating clear facial features, including eyes, eyebrows, and sometimes the mouth, the large intensity scale $\sigma_r$ is set to a constant value of 0.35, and the spatial scale $\sigma_s$ is set to a constant value of 2% of the image column size. Experimental results demonstrate that the above fixed-scale values perform consistently well for all our face images. The results and the analysis process are shown in Figure 5.

Figure 7: Face photo-sketch synthesis results: (a) the background of the cropped images in the CUHK database is simple; (b) our method illustrates the hair and facial features more accurately, especially the profile of the face; (c) to get a better effect, we smoothed (b) by a simple enhancing method; (d) the artist draws the face sketches with some exaggerations, such as in the nose bridge and mouth edge; (e) in the synthesis results of Tang's method, the face profile is obviously defective, and some of the important marks, such as the mole on the fourth man's face, are lost; (f) Zhang's method draws the facial features, such as the eyes and mouth, with some distortion in size and shape.

Color Similarity Map
In this section, we discuss two problems: (1) computing the color similarity map for the input image, which is used to select the skin region and the hair region separately, and (2) creating the hair and the unapparent facial features.
To detect the skin color region, we propose a skin/hair classification method based on color similarity. A Gaussian similarity measuring function is defined to compare the similarity between two colors. The Gaussian function is applied to every pixel of the image in the same way, yielding the similarity of each pixel to a specified skin color from their color difference. The larger the difference between the current pixel's color and the specified color, the lower the probability that the pixel is skin (or hair). Let $G(x)$ denote the Gaussian mask and $E_b$ the specified (known) color; the color similarity function can then be defined as $G(\lVert E_p - E_b \rVert_2)$, where $E_p$ is the color of pixel $p$ in the CIE Lab color space, and $\lVert E_p - E_b \rVert_2$ is the color difference between the specified color and the pixel $p$ in the input image, whose value is determined by the CIELAB color difference equation. $\sigma_c$ is the threshold of color difference, which determines whether the current pixel belongs to the same kind as the known color. Generally, we keep $\sigma_c$ constant at a value of 30.
To compute the similarity of any color to the specified color exactly, the first step is to fix a benchmark color. It can be set in two ways: it is either specified by user interaction or computed automatically by the program as the average color of a certain percentage of the pixels most similar to a known color. The time complexity of fixing a skin color with this method is lower than that of any face detection-based algorithm. Consequently, the hair similarity map and the skin similarity map can be obtained with different benchmark colors.
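A minimal sketch of such a color similarity map is given below. It assumes that the image has already been converted to CIE Lab, that the color difference is the Euclidean (CIE76) distance in Lab, and that the Gaussian has the common form $\exp(-d^2 / 2\sigma_c^2)$; the exact normalization used in this paper is not recoverable from the text, and the function name is hypothetical.

```python
import numpy as np

def color_similarity_map(lab, benchmark, sigma_c=30.0):
    """Gaussian similarity of every pixel to a benchmark color.

    lab       : H x W x 3 array in CIE Lab (conversion from RGB is assumed done)
    benchmark : length-3 Lab color E_b (for example, a skin or hair color)
    sigma_c   : color-difference scale (30 is the value quoted in the text)
    """
    lab = np.asarray(lab, dtype=np.float64)
    diff = lab - np.asarray(benchmark, dtype=np.float64)
    delta_e = np.sqrt((diff ** 2).sum(axis=-1))  # assumed CIE76 color difference
    return np.exp(-(delta_e ** 2) / (2.0 * sigma_c ** 2))

# A pixel equal to the benchmark has similarity 1; a pixel 30 units away
# in Lab has similarity exp(-0.5).
lab = np.array([[[60.0, 20.0, 20.0], [60.0, 50.0, 20.0]]])
sim = color_similarity_map(lab, (60.0, 20.0, 20.0))
```

Calling the function once with a skin benchmark and once with a hair benchmark gives the two maps used in the following subsection.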
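The automatic choice of a benchmark color can be sketched as follows, assuming the "most similar" pixels are those with the smallest Euclidean distance to the known color in Lab. The function name, the `seed_color` argument, and the default percentage are hypothetical; the paper does not state which percentage it uses.

```python
import numpy as np

def estimate_benchmark(lab, seed_color, percent=10.0):
    """Average color of the `percent` percent of pixels closest to seed_color."""
    pixels = np.asarray(lab, dtype=np.float64).reshape(-1, 3)
    dist = np.linalg.norm(pixels - np.asarray(seed_color, dtype=np.float64), axis=1)
    k = max(1, int(round(len(pixels) * percent / 100.0)))  # number of pixels kept
    nearest = np.argpartition(dist, k - 1)[:k]             # indices of the k closest
    return pixels[nearest].mean(axis=0)

# Two skin-like pixels among ten: with percent=20 only those two are averaged.
lab = np.zeros((1, 10, 3))
lab[0, :, 0] = 20.0
lab[0, 0] = (60.0, 20.0, 20.0)
lab[0, 1] = (62.0, 18.0, 22.0)
benchmark = estimate_benchmark(lab, seed_color=(65.0, 20.0, 20.0), percent=20.0)
```

The cost is a single pass over the pixels plus a partial selection, which is in line with the remark above that this step is cheaper than running a face detector.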

Hair and Unapparent Facial Feature Creating
After computing the skin color similarity, we get the color similarity map, as shown in Figure 6. Because the color and intensity of skin are distributed uniformly, the gradient is very small in the skin region, while the gradient is large in the hair region due to the irregular hair colors. In the hair color similarity map, the hair region has the largest values. The product of the gradient of the skin color similarity map and the negative of the hair color similarity map then strengthens the hair region by minimizing the hair values. The hair region enhancement is realized by multiplying the gradient edge map of the skin color similarity and the hair color similarity map, which is proposed in Section 3.1. Since facial feature regions, such as the nose and mouth regions, have an obvious color difference, they are also enhanced by the multiplication. On the other hand, the color of the eyes and eyebrows is either similar to the hair color or distinct from the skin color. Both cases are enhanced by the above operation. Figure 6 shows the process of creating the hair and facial features. The proposed method can extract good hair texture under different lighting conditions. It is worth noting that the hair creation method can be applied to other scenarios in which the hair region is needed, for example, image abstraction, cartoon making, and wig wearing.
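The multiplication step can be sketched as below. The edge detector is not specified in the text, so this example assumes a central-difference gradient magnitude, normalized to [0, 1]; it multiplies the edge map by the hair similarity map directly and leaves any inversion to a dark-on-white sketch to the caller. All function names are hypothetical.

```python
import numpy as np

def gradient_magnitude(img):
    """Central-difference gradient magnitude (assumed edge detector)."""
    gy, gx = np.gradient(np.asarray(img, dtype=np.float64))
    return np.hypot(gx, gy)

def enhance_hair(skin_sim, hair_sim):
    """Edge map of the skin similarity map, weighted by the hair similarity map."""
    edges = gradient_magnitude(skin_sim)
    peak = edges.max()
    if peak > 0:
        edges = edges / peak  # normalize edge strength to [0, 1]
    return edges * np.asarray(hair_sim, dtype=np.float64)

# Synthetic example: uniform skin on the left, irregular hair on the right.
skin_sim = np.ones((6, 8))
skin_sim[:, 4:] = [0.1, 0.3, 0.2, 0.0]   # irregular values inside the hair region
hair_sim = np.zeros((6, 8))
hair_sim[:, 4:] = 1.0
enhanced = enhance_hair(skin_sim, hair_sim)
```

In this toy case the response is zero over the uniform skin region and strictly positive inside the hair region, which mirrors the argument that small skin gradients are suppressed while irregular hair texture is kept.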

Experimental Results
We tested the two-scale decomposition-based face sketch synthesis framework on the CUHK face photo-sketch database, which contains 606 faces in total. All input images are 1024 × 768 pixels. On average, for an uncropped CUHK student face image, one decomposition of the RGB components of an input image took about 15.5 seconds (24 s with the large spatial scale and 6.9 s with the small spatial scale) using two-scale image decomposition based on piecewise bilateral filtering. Creating the hair and unapparent features based on color similarity took about 3 seconds. So the total synthesis time is

Natural Scene Sketch and Line Extraction
The proposed method can be applied to natural scene sketching and line extraction with a little modification. The modified framework is shown in Figure 8. The detail is illustrated by two-scale bilateral filtering, as in the proposed framework, while high boost filtering [29] (Figure 8) is used to enhance the highlights and shadows in the image. The Sobel detector gives prominence to the distinct edges. The computation costs of both high boost filtering and the Sobel detector are small, so the natural scene sketch is very fast. By multiplying the Sobel edge by the two detail layers, we get the initial natural scene sketch, which has a line look, as shown in Figure 9(b). Then, we can extract the lines using the Difference of Gaussians (DoG). Without human skin similarity computing and human hair extraction, the line drawing process is sped up. The results are shown in Figure 9. Our method performs well on thick and subtle edges in input images. Although Kang's method depicts the edges with smooth and coherent lines [26, 27], it is very slow because of LIC. In addition, it has difficulty detecting the dense edges in some regions, such as the edges of the building in the second image. On the other hand, the DoG operator applied to the input image fails to deal with the edges in shadows and details. Our line extraction approach takes less than 0.7 seconds to synthesize a sketch for a 256 × 256 image. We implemented our method in MATLAB and ran the code on a computer with a 2.20 GHz CPU. The speed comparison is shown in Table 2.
To get a better sketch of an arbitrary input image, high boost filtering is preferred. Our approach works well on many kinds of images, such as outdoor natural scenes, animals, plants, buildings, and human faces. Some of the sketch results are shown in Figure 10. Figure 11 shows the human face sketch results on the CUHK database, which is introduced in Section 4. All the parameters in bilateral filtering are the same as in Section 2.2. Our approach performs well on human faces. It can deal with gray images and color images of different races and under different lighting conditions. More results on other kinds of images are shown in the appendix.
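The two inexpensive operators named above, the Sobel detector and the Difference of Gaussians, can be sketched with NumPy alone. This is an illustrative sketch rather than the MATLAB implementation: the function names, the ratio `k=1.6` between the two Gaussian scales, the threshold, and the assumption that the detail layers are rescaled to [0, 1] before multiplication are all choices made for this example.

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude from the 3x3 Sobel kernels (edge-replicated border)."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            window = p[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    return np.hypot(gx, gy)

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with a normalized, truncated kernel."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    k /= k.sum()
    p = np.pad(np.asarray(img, dtype=np.float64), radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, p)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def dog(img, sigma=1.0, k=1.6):
    """Difference of Gaussians: narrow blur minus wide blur."""
    return gaussian_blur(img, sigma) - gaussian_blur(img, k * sigma)

def dog_lines(img, sigma=1.0, k=1.6, threshold=0.01):
    """Binary line mask on the dark side of edges (assumed thresholding rule)."""
    return dog(img, sigma, k) < -threshold

def initial_scene_sketch(edge, detail_a, detail_b):
    """Sobel edge times the two detail layers (all assumed rescaled to [0, 1])."""
    return edge * detail_a * detail_b
```

Both operators need only a handful of passes over the image, which is consistent with the remark that the natural scene sketch is fast compared with methods that rely on line integral convolution.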
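For reference, high boost filtering in its common textbook form adds a scaled unsharp mask back to the image. The sketch below assumes that form with a 3 × 3 box blur as the low-pass filter; the function names and the boost factor `k` are hypothetical, and the exact variant and parameters used in this paper are not stated.

```python
import numpy as np

def box_blur3(img):
    """3x3 mean filter with edge-replicated borders."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    p = np.pad(img, 1, mode="edge")
    acc = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            acc += p[dy:dy + h, dx:dx + w]
    return acc / 9.0

def high_boost(img, k=1.5):
    """High boost filtering: image plus k times the unsharp mask (k > 1 boosts)."""
    img = np.asarray(img, dtype=np.float64)
    mask = img - box_blur3(img)  # high-frequency residue
    return img + k * mask        # output is not clipped here

# On a 0 | 1 step edge the dark side undershoots and the bright side overshoots,
# which is the highlight and shadow emphasis referred to in the text.
step = np.zeros((6, 12))
step[:, 6:] = 1.0
boosted = high_boost(step)
```

Flat regions are left unchanged because the unsharp mask is zero there, so the operation only exaggerates transitions.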

Image Stylization
Based on the initial line sketch, we can easily extract the edges with the DoG operator. By combining them with the base layer of the two-scale decomposition after color quantization, we obtain a simple image abstraction, as proposed in [27, 28]. The results are shown in Figure 12.
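The combination of a quantized base layer with extracted edges can be sketched as follows. This is a simplified single-channel illustration: the uniform quantization into `levels` bins, the drawing of edges as black pixels, and the function names are assumptions for the example, and the edge mask is taken as given (for instance, from a DoG threshold).

```python
import numpy as np

def quantize(img, levels=8):
    """Uniformly quantize values in [0, 1] to the centers of `levels` bins."""
    img = np.clip(np.asarray(img, dtype=np.float64), 0.0, 1.0)
    bins = np.minimum((img * levels).astype(int), levels - 1)
    return (bins + 0.5) / levels

def abstract_image(base, edge_mask, levels=8):
    """Quantized base layer with edge pixels drawn in black."""
    return np.where(np.asarray(edge_mask, dtype=bool), 0.0, quantize(base, levels))

base = np.array([[0.1, 0.6], [0.9, 0.4]])
edges = np.array([[False, True], [False, False]])
stylized = abstract_image(base, edges, levels=4)
```

For color images the same quantization would typically be applied to a luminance channel only, so that hue is preserved while tones are flattened into a few bands.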

Discussion and Conclusion
We have presented a novel framework for human face photo-sketch synthesis, which is very simple and fast and requires no parameter setting. Firstly, by combining the results of the two two-scale image decompositions, the shading details of the facial features and the prominent facial features are obtained. Secondly, based on the color similarity map, we extracted the unapparent facial features and created the vivid hair region. Finally, we completed the framework by a simple addition of the former results; the framework can also be applied to faces of other races, such as white faces. Moreover, the framework can be extended to other applications, such as natural scene sketching and line extraction. In conclusion, the proposed framework is very simple in that no feature localization algorithm, complex mathematical model, or iteration is needed. In addition, the sketch synthesis result is more vivid than that of other methods, especially in the created hair texture. Most importantly, the method is easy to use for other applications, such as line extraction, natural scene sketches, and image abstraction. Sketch recognition is more and more widely used in sketch-based user interfaces [31]. Given the strengths of our sketch method in line extraction, we will try to implement recognition of face photo sketches or other image sketches.

Mathematical Problems in Engineering

Figure 2: The influence of a fixed intensity scale and a varied spatial scale on the decomposition. Row 1 is the input image with a scanline; the scanplots of the log intensity along a row of the input image (cyan) are represented in floating point. Green, blue, and red are the corresponding scanplots of the log intensity along the second row. Rows 2 and 3 are the decompositions of the input image using different spatial scales: row 2 shows the base images and row 3 the detail images. From left to right, $\sigma_s$ = 1%, 2%, and 5% of the size of the input image, respectively, while $\sigma_r$ = 0.08.

Figure 4: Two-scale decomposition with a large spatial scale and a large intensity scale. The input image is the same as in Figure 3. From left to right, $\sigma_s$ = 10%, while $\sigma_r$ = 0.60 and 0.95, respectively.

(f) Zhang et al. [3]

Figure 9: Comparison of our line extraction based on the initial scene sketch with existing methods: Difference of Gaussians and Kang's line drawing [26, 27].

Figure 10: Arbitrary scene sketch results: our method sketches the images vividly and quickly. The test images in the second and third rows are taken from DeCarlo's paper [28].

Figure 11: Comparison of our human face sketch results with those of the artist and of Tang's approach. The first proposed method is introduced in Figure 1, while the second method is shown in Figure 8. Both methods presented in this paper successfully depict the facial features without distortion.

Figure 12: Image abstraction based on our line extraction

Figure 13: The 1st row shows the input human faces. The 2nd row shows the initial face sketch results in lines by the approach proposed in Section 5. The 3rd row shows the final sketches by the approach proposed in Section 5.

Figure 14: Some sketch results on the Olivetti-Att-ORL gray-scale database, which was built for face identification. The first row and the second row are the input images and the corresponding sketch results. The ORL database can be downloaded from http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.

Figure 15: Some sketch results on the MUCT color face database, which contains human faces of different races and under different lighting conditions. The 1st row and the 2nd row are the input images and the sketch results, respectively. The database can be downloaded from http://code.google.com/p/muct/downloads/list. The input image size is 640 × 480.

Figure 16: Some sketch results for images under varying lighting conditions downloaded from the Internet. The first and the third columns are input images, and the other columns are the corresponding sketch results.
(a) Input image (b) Initial scene sketch (c) High boost filtering (d) Final scene sketch

Figure 17: Other natural scene sketches of images downloaded from the Berkeley image segmentation database [30].

Table 1: The computing speed of the proposed method and others.

/mmlab.ie.cuhk.edu.hk/facesketch.html, whose size is 250 × 200, the time cost is 2.4 seconds for two-scale decomposition and 0.45 seconds for hair and unapparent feature creating. The speed comparison is shown in Table 1. Figure 7 shows the experimental results on CUHK cropped images.

Table 2: The computing speed of the proposed line extraction method and others.