Unsupervised Portrait Drawing Generation for Free Styles

Artistic portrait drawing (APDrawing) generation has seen progress in recent years. However, because APDrawings are naturally scarce and highly artistic, it is difficult to collect large-scale labeled and paired data or to divide drawing styles into a few specific recognized categories. Existing works suffer from limited labeled data and a naive manual division of drawing styles according to the corresponding artists. They cannot adapt to actual situations: for example, a single artist might have multiple drawing styles, and APDrawings from different artists might share similar styles. In this paper, we propose to use unlabeled and unpaired data and perform the task in an unsupervised manner. Without a manual division of drawing styles, we take each portrait drawing as a unique style and introduce self-supervised feature learning to learn free styles for unlabeled portrait drawings. Besides, we devise a style bank and a decoupled cycle structure to address the two main considerations in the task: generation quality and style control. Extensive experiments show that our model is more adaptable to different style inputs than state-of-the-art methods.


Introduction
In recent years, studies on neural style transfer [1][2][3] have flourished. Researchers are no longer satisfied with a single form of image expression and have begun to consider more complex and diverse image translation relations [4][5][6][7][8][9][10][11][12][13][14][15][16]. For example, CycleGAN [17] transfers a color photograph to a Monet painting. Neural style transfer [18] treats style transfer as a problem of texture transfer, taking the texture from a style image and applying it to a content image. Based on the same texture-based style modeling assumption, many other methods [19, 20] have been developed for neural style transfer.
However, artistic portrait drawing (APDrawing) generation is not a texture-based style transfer task. APDrawings differ in style from other portrait paintings in terms of strokes, color composition, etc. APDrawing generation transforms a face photo into a highly abstract artistic portrait drawing while preserving the characteristics of the human face. Since APDrawings contain little texture information, previous texture-based style transfer methods are not suitable for the task. Two recent methods [21, 22] have been specifically proposed for APDrawing generation. APDrawingGAN [21] first constructed an APDrawing dataset and developed a hierarchical structure and a distance transform loss. However, it requires paired data, and its drawing style is thereby limited to a single one. The method of [22] further improved the generation quality and increased the number of generation styles to three. It proposed an asymmetric cycle structure, including a truncation loss and a relaxed forward cycle consistency. However, it naively divided drawing styles according to the corresponding artists, which does not conform to actual situations. As shown in Figure 1, the first artist uses parallel lines to draw shadows, while the second artist often uses continuous thick lines and large dark regions, which is sometimes close to the drawing style of the third artist. The similarities, or even sameness, in style between drawings of different artists, and the differences between drawings of the same artist, make it inappropriate to simply divide drawing styles by artist.
Meanwhile, there exists no publicly available dataset with APDrawings of obviously different styles; building one relies heavily on manual crawling and filtering. It is time-consuming and costly to collect large-scale labeled and paired training data. In addition, compared with other art paintings, APDrawings are relatively rare. The shortage of training data creates a great need for a new unsupervised paradigm.
In this paper, we propose two realistic assumptions for the APDrawing generation task: there is only access to unlabeled and unpaired data, and any manually explicit style definition should be discarded. The first assumption lets us avoid the cost and difficulty of annotating large-scale labeled and paired data. Based on the second assumption, we treat each APDrawing as a single style instead of relying on a manual division. Specifically, contrastive self-supervised learning is introduced to learn styles for all input APDrawings, which enables low-cost use of large-scale unlabeled training data. It pulls each APDrawing close to its augmented images and pushes different APDrawings apart. In this way, our style feature extractor can explore latent relations among the input data.
Intuitively, without a clear division of drawing styles, the self-supervised style features carry irrelevant information. It is not easy to embed such unconstrained style features into our generation process while preserving generation quality and controlling the drawing styles as expected. Accordingly, we build up a style bank for all these styles. As a set of representative styles for style groups, the style bank can also be viewed as a way to reduce the dimension of the style feature space and stabilize the training process. Furthermore, we propose a decoupled cycle structure with two streams to guarantee both the generation of vivid APDrawings and generation for free styles.
In summary, the main contributions of our work are as follows: (1) We propose to use unlabeled and unpaired data for the APDrawing generation task, which frees us from excessive dependence on labeled data. (2) Without a naive manual division of drawing styles, we treat each APDrawing as a single style and introduce contrastive self-supervised learning to learn style features for them, which enables us to generate APDrawings with free styles for different style inputs. (3) We propose a style bank to update the original style features and a decoupled cycle structure, which guarantees the stability and robustness of training with a set of unsupervised style features.

Deep Learning in APDrawing.
With the help of deep learning techniques, great strides have been made towards more powerful artificial intelligence for many vision tasks. Drawing-related applications, such as line drawing colorization [23] and artistic shadow creation [24], also benefit from deep learning, which can produce more creative and richer paintings with less human effort. Zhang et al. [25] proposed a deep learning framework for user-guided line art flat filling. It included a split filling mechanism to directly estimate the result colors and influence areas of scribbles. Im2Pencil [26] translated photos to pencil drawings with a two-branch framework that learned separate filters for outline and shading generation, respectively; it can generate pencil drawings with style control.

Image-to-Image Translation.
Our method also takes advantage of deep learning techniques for image generation. Efforts on image-to-image translation usually fall into two categories: domain-level translation and instance-level translation. Domain-level translation was first proposed by [27] to translate images between two domains, defined by two sets of datasets. Many of these methods resorted to conditional generative adversarial networks (cGANs) to synthesize images. Pix2Pix [28] was built on cGAN and used paired data between domains to learn the translation function. For many tasks, paired data are not available. To overcome this limitation, cycle consistency was proposed in CycleGAN [17] and DualGAN [2]. This constraint enforces that the two mappings, from domain A to B and from B to A, when applied consecutively to an image, revert the image back to itself; it regularizes training by reconstructing an original image from its translated image. StarGAN [29, 30] and ComboGAN [31] were then proposed to extend image-to-image translation between two domains to multiple domains based on cycle consistency. Another line of methods [1, 32] was restricted not at the image level but at the feature level. They assumed a shared latent space, but MUNIT [32] postulated that only part of the latent space should be shared, rather than the full latent space proposed in UNIT [1]. However, these methods either cannot generate images of different styles or are not suitable for our APDrawing generation task because they lack the ability to describe facial features in detail.
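For concreteness, the cycle-consistency constraint can be sketched in a few lines of PyTorch; `G_AB` and `G_BA` below are hypothetical generator modules for the two mapping directions, not components of any cited method.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(G_AB, G_BA, real_A, real_B):
    """CycleGAN-style constraint: mapping A -> B -> A (and B -> A -> B)
    should reproduce the original image, measured with an L1 penalty."""
    rec_A = G_BA(G_AB(real_A))  # translate A to B, then back to A
    rec_B = G_AB(G_BA(real_B))  # translate B to A, then back to B
    return F.l1_loss(rec_A, real_A) + F.l1_loss(rec_B, real_B)
```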

Neural Style Transfer.
Neural style transfer is closely related to image-to-image translation; it aims at preserving the content of one image while transferring the style of another. Classic neural style transfer usually refers to example-guided style transfer, while image-to-image translation mainly refers to domain-based image translation. Neural style transfer was first proposed in image style transfer [18], which introduced a CNN to reproduce famous painting styles on natural images. It penalized differences between high-level CNN features of the generated image and the content image, and used Gram matrix statistics of CNN features to measure style similarity.
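The Gram-matrix style statistic of [18] can be sketched as follows; the batch handling and the normalization constant are common conventions rather than details taken from this paper.

```python
import torch

def gram_matrix(features):
    """Gram matrix of CNN features (B, C, H, W): channel-wise feature
    correlations, used in [18] as a texture/style statistic."""
    b, c, h, w = features.size()
    f = features.view(b, c, h * w)
    gram = torch.bmm(f, f.transpose(1, 2))  # (B, C, C) correlations
    return gram / (c * h * w)               # normalize by feature size
```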
Many follow-up studies [4, 33-38] were conducted to either improve or extend the method. The methods of [20, 39] addressed the slow optimization process by training feed-forward neural networks. Differently, the method of [19] replaced Gram-based style modeling with a Markov random field (MRF) regularizer; combining deep convolutional neural networks (DCNNs) with MRF-based texture synthesis can be applied to both photographic and non-photorealistic synthesis tasks.
However, most methods model style as texture, which is not suitable for our task. With little texture information, APDrawing generation requires a high degree of abstraction and, at the same time, completeness of some facial details in strokes. As illustrated in Figure 2, our training process can be divided into two phases, i.e., extracting style features for unlabeled APDrawings and generating APDrawings with the desired styles. The first training phase uses unlabeled and unpaired APDrawings and introduces contrastive self-supervised learning to learn styles for them. The style bank B is built up, and the style features are updated to the similarities with the style bank. The second phase uses a set of generators and discriminators. The generator G and an inverse generator F are included to generate vivid APDrawings from input photos and style features, and to transform APDrawings back to input photos without edge information loss, respectively. The discriminators consist of D_A and D_P to guarantee discrimination between generated fake images and real images in both domains A and P.

Method
Next, we will introduce details of our proposed method from the following aspects: (1) unsupervised style feature extraction and (2) unsupervised portrait generation.

Unsupervised Style Feature Extraction.
Previous methods [21, 22] for the APDrawing generation task are either limited to one single style drawn by an artist or to a predefined division of drawing styles based on the corresponding artists. In fact, it is hard to divide the drawing styles of APDrawings into several specific categories. Meanwhile, there is a lack of public APDrawing datasets with several different drawing styles, and it is quite costly to collect such a large-scale labeled one. A new benchmark is required for the task, due to the high artistry in drawing styles and the scarcity of labeled data. The new benchmark should be able to use unlabeled data and adapt to various drawing styles.
We introduce contrastive self-supervised learning to train our style extractor on unlabeled and unpaired APDrawings. It is capable of adopting self-defined pseudolabels as supervision and utilizing the learned style features in the subsequent APDrawing generation phase. As a discriminative approach, contrastive self-supervised learning aims at grouping similar samples closer and separating diverse samples far from each other, as shown in Figure 2. Specifically, the VGG19 network is adopted as our feature extractor, denoted as W. We pull augmented versions of the same sample close to each other while pushing away style features from different samples. The loss function for an anchor drawing $a_i$ takes the standard contrastive (NT-Xent) form:

$$\mathcal{L}_{i} = -\log \frac{\exp\left(\mathrm{sim}\left(W(a_i), W(a_i^{+})\right)/\tau\right)}{\exp\left(\mathrm{sim}\left(W(a_i), W(a_i^{+})\right)/\tau\right) + \sum_{j=1}^{n^{-}} \exp\left(\mathrm{sim}\left(W(a_i), W(a_j^{-})\right)/\tau\right)}, \quad (1)$$

where $a_i^{+}$ is a positive (augmented) view of $a_i$, the $a_j^{-}$ are negative samples, $\tau$ is a temperature coefficient, $n^{-}$ represents the number of negative samples of $a_i$ in a minibatch, and $\mathrm{sim}(\cdot)$ measures the cosine similarity between two input vectors. In order to keep the style invariance of these APDrawings, we choose data augmentation methods including cropping, resizing, horizontal flipping, and rotation. In a minibatch, the original APDrawing, its transformed version, and the aligned versions of the APDrawing and its transformed version are positive samples to each other. In this self-supervised way, our feature extractor explores the underlying data structure of these APDrawings.
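To make the objective concrete, here is a minimal PyTorch sketch of the per-anchor contrastive term; the function name and per-anchor formulation are our own illustration, while the cosine similarity, temperature τ, and in-batch negatives follow equation (1).

```python
import torch
import torch.nn.functional as F

def contrastive_style_loss(z_anchor, z_pos, z_negs, tau=0.07):
    """Per-anchor NT-Xent-style loss as in equation (1).
    z_anchor: (D,) feature W(a_i); z_pos: (D,) feature of an augmented
    view of a_i; z_negs: (n_neg, D) features of the other drawings in
    the minibatch. All similarities are cosine similarities."""
    z_anchor = F.normalize(z_anchor, dim=0)
    z_pos = F.normalize(z_pos, dim=0)
    z_negs = F.normalize(z_negs, dim=1)
    pos = torch.exp(torch.dot(z_anchor, z_pos) / tau)
    neg = torch.exp(z_negs @ z_anchor / tau).sum()
    return -torch.log(pos / (pos + neg))
```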
Considering the high dimensionality of the feature space and the scarcity of data, the model might not be able to learn a stable and robust mapping from APDrawings to style features. We therefore build up a style bank $B = [b_1, \ldots, b_T]$, where T is the bank size. It is obtained by clustering the style features of the unlabeled APDrawings with the K-means algorithm. On the one hand, the bank reduces the dimension of the style feature space and stabilizes the training process, yielding robust style features for APDrawings. On the other hand, the style bank can be viewed as a set of representative styles for style groups, which helps alleviate the negative impact of the irrelevant information about drawing styles brought in by contrastive self-supervised learning. Finally, the updated style feature of $a_i$ is computed as the cosine similarities between the style bank and the original style feature:

$$s_i = \left[\mathrm{sim}\left(W(a_i), b_1\right), \ldots, \mathrm{sim}\left(W(a_i), b_T\right)\right], \quad (2)$$

where $W(a_i)$ is a K-dimensional vector and $B \in \mathbb{R}^{K \times T}$ represents the style bank. All updated style features make up the set $S = \{s_i\}_{i=1}^{N}$.
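A possible implementation of the style bank and the feature update is sketched below; the helper names are hypothetical, and scikit-learn's KMeans stands in for the K-means step described above.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_style_bank(features, T=10):
    """Cluster the style features (N, K) of all unlabeled APDrawings
    into T centers; the centers form the style bank B of shape (K, T)."""
    km = KMeans(n_clusters=T, n_init=10).fit(features)
    return km.cluster_centers_.T  # (K, T)

def updated_style_feature(w_ai, bank):
    """Equation (2): cosine similarities between W(a_i) (K,) and each
    bank column b_t, giving the T-dimensional updated feature s_i."""
    w = w_ai / np.linalg.norm(w_ai)
    b = bank / np.linalg.norm(bank, axis=0, keepdims=True)
    return w @ b  # (T,)
```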

Unsupervised Portrait Generation.
With the updated style features as input, the portrait generation phase aims at transferring the style defined by an input style image to a face photo. During the generation process, two considerations need to be guaranteed: generation quality and style control. Generation quality ensures a vivid portrait that preserves the facial features and is hard to discriminate from real drawings. Style control keeps the drawing style unchanged with respect to the style input. The total loss function can be summarized as the sum of these two groups of terms:

$$\mathcal{L}_{total} = \mathcal{L}_{quality} + \mathcal{L}_{style}, \quad (3)$$

where $\mathcal{L}_{quality}$ collects the adversarial, cycle-consistency, and truncation losses described next and $\mathcal{L}_{style}$ is the style loss for style control.

Generation of Vivid Portraits.
There are two generators, G and F, using an autoencoder architecture with residual blocks. The discriminator set D_A is based on PatchGAN [28]. It involves a global discriminator, which avoids information loss in the holistic characteristics, and a set of local discriminators for the fine details in facial regions. G and D_A are used in the generation from input photos to APDrawings; F and D_P are optimized in the opposite direction. They are trained with the adversarial loss, the asymmetric cycle-consistency loss, and the truncation loss of [22]. The adversarial and relaxed forward cycle-consistency terms take the standard forms

$$\mathcal{L}_{adv} = \mathbb{E}_{a \sim A}\left[\log D_A(a)\right] + \mathbb{E}_{p \sim P,\, s \sim S}\left[\log\left(1 - D_A(G(p, s))\right)\right],$$

$$\mathcal{L}_{cyc} = \mathbb{E}_{p \sim P,\, s \sim S}\left[\left\| F(G(p, s)) - p \right\|_1\right].$$

In other style transfer tasks, color information is quite important for style modeling, while it is irrelevant for our task. We propose to transform an input image to a gray-scale one before sending it to the generation network. This forces our network to focus on line strokes and shadow usage, and balances the training of F and D_P in the second cycle. Besides, we decouple the two cycles by sending different pairs of inputs, including APDrawings, photos, and style features. The richness of the pair combinations is more conducive to our generation.
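The gray-scale conversion and one forward cycle can be sketched as follows; the generator signatures `G(photo, style)` and `F(drawing)` and the ITU-R 601 gray weights are illustrative assumptions.

```python
import torch

def to_grayscale(img):
    """Convert an RGB batch (B, 3, H, W) to 3-channel gray-scale so the
    generators attend to strokes and shadows rather than color."""
    r, g, b = img[:, 0:1], img[:, 1:2], img[:, 2:3]
    gray = 0.299 * r + 0.587 * g + 0.114 * b  # ITU-R 601 luma weights
    return gray.repeat(1, 3, 1, 1)

def forward_cycle(G, F_inv, photo, style):
    """One relaxed forward cycle P -> A -> P on a gray-scale input."""
    p = to_grayscale(photo)
    drawing = G(p, style)   # photo + style feature -> APDrawing
    recon = F_inv(drawing)  # APDrawing -> photo
    return torch.nn.functional.l1_loss(recon, p)
```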

Generation of Portraits for Specifc Styles.
The style classification network, denoted as D_s, shares the first few blocks with the global discriminator of D_A. In order to achieve style control, the style loss is formulated as follows:

$$\mathcal{L}_{s} = \mathbb{E}_{a \sim A}\left[\left\| D_s(a) - \hat{s} \right\|^2\right] + \mathbb{E}_{p \sim P,\, \tilde{s} \sim S}\left[\left\| D_s\left(G(p, \tilde{s})\right) - \tilde{s} \right\|^2\right],$$

where, for a real APDrawing a, D_s outputs a predicted style feature that should get close to the one computed in the first style feature extraction phase, denoted as $\hat{s}$, and for a generated APDrawing $G(p, \tilde{s})$, the output style feature is specified by the input style feature $\tilde{s}$. The style loss guides D_s to produce style features close to the real feature distribution S and the generator to produce APDrawings close to the desired styles.
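A sketch of the style loss in PyTorch follows; the squared-error distance is our assumption for the norm, while the two terms mirror the equation above (real drawings supervised by $\hat{s}$, generated drawings by the input $\tilde{s}$).

```python
import torch.nn.functional as F

def style_loss(D_s, real_a, s_hat, G, photo, s_tilde):
    """D_s should predict s_hat for a real drawing a, and predict the
    input style s_tilde for a generated drawing G(p, s_tilde)."""
    loss_real = F.mse_loss(D_s(real_a), s_hat)
    loss_fake = F.mse_loss(D_s(G(photo, s_tilde)), s_tilde)
    return loss_real + loss_fake
```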

Extraction of Style Features.
The CNNs for image feature extraction consist of three Conv-BatchNorm-ReLU blocks with two 1/2-scale downsampling operations. For the transformer, the feature dimension is 256, and both the encoder and decoder have 3 layers. We use the ReLU output of the 13th convolutional layer of VGG19 as the style feature. We set the initial learning rate to 0.001 for the first 10 epochs and decay it by a factor of 10 after the 10th and 25th epochs. The total number of training epochs is 30. The batch size is fixed to 16 and the bank size to 10.
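For the VGG19 style feature, a torchvision sketch is given below; mapping the "13th convolutional layer" to index 28 of `vgg19().features` (with its ReLU at index 29) is our reading of the standard torchvision layout, not a detail stated in the paper.

```python
import torch
import torchvision.models as models

# Truncate VGG19 right after the ReLU of its 13th conv layer: in
# torchvision's layout the 13th conv is features[28], its ReLU features[29].
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1)
style_extractor = torch.nn.Sequential(*list(vgg.features.children())[:30]).eval()

with torch.no_grad():
    feat = style_extractor(torch.randn(1, 3, 512, 512))  # (1, 512, 32, 32)
```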

Generation of APDrawings.
We set the hyperparameters in equation (5) as λ_1 = 5 − 4.5i/n, λ_2 = 5, and λ_3 = 4.5i/n, where i and n are the current epoch and the total number of epochs, respectively. The learning rate is set to 1.5e−5 for the first 100 epochs and is linearly decayed to 0 over the next 100 epochs.
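These schedules can be written compactly as below; the function names are illustrative, and the formulas are taken directly from the text above.

```python
def loss_weights(i, n):
    """Per-epoch weights for equation (5): lambda1 decays from 5 toward
    0.5, lambda2 is constant, lambda3 grows from 0 toward 4.5."""
    return 5 - 4.5 * i / n, 5.0, 4.5 * i / n

def learning_rate(epoch, base_lr=1.5e-5, hold=100, decay=100):
    """Constant LR for the first 100 epochs, then linear decay to 0."""
    if epoch < hold:
        return base_lr
    return base_lr * max(0.0, 1 - (epoch - hold) / decay)
```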

User Study.
To evaluate the effectiveness of our method, we conduct a user study comparing it with CycleGAN [17] and the method of [22]. We randomly sample 30 face photos from the test set of APDrawingCrawl and transform them into three different styles: with a given style example image, every 10 images are transformed to that style, and there are 3 style images in total. Fifty participants are involved in the user study, and each of them is shown these 30 images, resulting in 1,500 votes. All participants are given an input photo, a style example image, and the drawings generated by the three methods at the same time. The voting criteria are image integrity, image quality, style preservation, and facial characteristic similarity.
As shown in Table 1, we achieve performance similar to the state-of-the-art method [22], which is in line with expectations. Our method aims at solving the problem of APDrawing generation with unlabeled and unpaired training data, preserving the input style and the content of the face photo. The method [21] requires paired data, and its drawing style is thereby limited to a single one. The method [22] further increases the number of generation styles to three with more collected, labeled data. However, both suffer from limited labeled data and a naive manual division of drawing styles according to the corresponding artists, which does not conform to actual situations.
Besides, it is truly hard for unsupervised methods to obtain apparently better generation quality than supervised methods, which benefit from restricted classification categories and abundant labeled data for each class. We therefore emphasize the superiority of our method in its higher flexibility and scalability for different style inputs and its lower dependence on abundant labeled data.

Qualitative Results.
We conduct qualitative model analysis on both the style feature extraction phase and the APDrawing generation phase to demonstrate our effectiveness.

Style Feature Extraction.
We use K-means to divide the style images into five clusters and visualize the learned style features by t-SNE. As shown in Figure 3, our style extraction is able to learn well-separated features. There is still the problem of unclear boundaries between different clusters, which can be expected and explained. From the very beginning, we do not want to manually divide the styles, and therefore the style bank is introduced. The style clustering is only for generating the style bank, and the unseparated samples demonstrate the inapplicability of simple divisions. Due to the high dimensionality of the feature space and the scarcity of data, it is hard for the model to learn robust style features; with the dimensionality constraints brought by the style bank, the learning process can be stabilized.
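A minimal sketch of this clustering-plus-visualization step, assuming the style features have been exported to a NumPy array (the file name is hypothetical):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

features = np.load("style_features.npy")  # (N, K) features from W
labels = KMeans(n_clusters=5, n_init=10).fit_predict(features)
emb = TSNE(n_components=2, perplexity=30).fit_transform(features)

plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=8, cmap="tab10")
plt.title("t-SNE of learned style features (5 K-means clusters)")
plt.show()
```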
As shown in Figure 4, we display the portrait drawings nearest to the cluster centers (the style bank) of all APDrawing styles. These drawings can be viewed as the prototype style images of each style in the style bank. Judging from the line strokes and shadow usage of these APDrawings, our method has the ability to distinguish among various drawing styles, and the prototypes are actually from diverse artists, i.e., Yann Legendre, vectorportral.com, Kathryn Rathke, Charles Burns, and an unknown artist.

APDrawing Generation.
We compare our method with the method of Yi et al. [22] designed for the same APDrawing task, two example-guided neural style transfer methods, image style transfer [18] and linear style transfer [5], and an unpaired image-to-image translation method, ComboGAN [31].
As shown in Figure 5, we make a comparison with Yi et al. [22] to demonstrate the limitation of relying on a manual division of styles based on artists. We list three input styles and the corresponding generation results. The first two style input images are from the same artist, Kathryn Rathke. According to the style division strategy defined in Yi et al. [22], the outputs transferred from input style1 and style2 should be the same, while those of style2 and style3 should be obviously different. In fact, however, we can easily distinguish different drawing techniques between input style1 and input style2: the style1 input image is distinguished by a large area of gray, thick shadow on the face, while the style2 input image is not, and the input images of style2 and style3 share a similar usage of dark regions. The manual style division according to the artist is apparently not suitable in this situation, resulting in confusion in the generated results of Yi et al. Compared to the method of Yi et al. [22], our method is more flexible in generating APDrawings with the desired drawing styles. As shown in Figure 6, we make a comparison with two example-guided methods and the state-of-the-art method of Yi et al. [22]. It can be easily seen that the two neural style transfer methods either fail to capture the differences in input styles or cannot generate images with acceptable quality. Yi et al. [22] can generate drawings with three distinct styles; however, our method can generate drawings according to arbitrarily different input styles, with higher flexibility. Our first generation result uses parallel lines to draw shadows, the second tends to use clean lines, and the third uses a large area of dark regions.
As shown in Figure 7, we make a comparison with an image-to-image translation method and the state-of-the-art method. ComboGAN fails to generate vivid APDrawings with good quality; for example, there exist some stripes and undesired gray color on the face. Yi et al. [22] can generate discriminative results for three fixed styles. Despite the lack of label information, our model can still capture the differences between styles: the generated images are consistent within a single style while differing from images in other styles.

Ablation Study. We conduct ablation studies as follows:
(1) generation without using the style bank and (2) generation without gray-scale inputs, i.e., using the original color inputs.
As shown in Figure 8(c), only the center areas have visible traces of the generated drawings, and the hair disappears. Figure 8(e) has finer details in the cap and the hair than Figure 8(c). Without the style bank, our method suffers from the negative influence of the redundant information accompanying the style features. It might lead to blank regions outside a fixed area or other undesired artifacts in our generated drawings. The introduction of the style bank is to eliminate this redundant information.
As shown in Figure 8(d), color changes on the face lead to extra or erroneous lines. With color inputs, the results are more sensitive to areas with color changes, which is not desired in the APDrawing task. The use of gray-scale inputs is to avoid being affected by such task-irrelevant information.

Conclusion
In this paper, we propose to perform the APDrawing generation task in an unsupervised manner, which avoids the difficulty of collecting large-scale labeled data and the irrationality of dividing drawing styles into a few specific categories. We introduce contrastive self-supervised learning to learn free styles of APDrawings by treating each drawing as a single style. The style bank and the corresponding decoupled cycle structure guarantee the generation quality and style control of the output APDrawings. Experiments show the flexibility and scalability of our method in generating APDrawings with different styles, which is more adaptive than other state-of-the-art methods. In future work, we will investigate how to improve the realism and faithfulness for low-quality original photos, such as those with blurry textures, which currently cause noisy and messy lines and fail to preserve fine details.

Figure 1:
Figure 1: Examples of portraits and their corresponding artists. It can be noticed that there exist different drawing styles among portraits from the same artist and fairly similar drawing styles among different artists. This observation motivates us to develop a new paradigm for generating portrait drawings with free styles.

3.1. Overview.
Our method is proposed for the APDrawing generation task, transferring the style of an APDrawing in domain A to an input photo in domain P. The APDrawings and the photos are denoted as $\{a_i\}_{i=1}^{N}$, where $a_i \in A$, and $\{p_i\}_{i=1}^{M}$, where $p_i \in P$, respectively. N and M are the numbers of APDrawings and photos in our training set.

Figure 2:
Figure 2: The pipeline of our proposed model for the APDrawing generation task. It consists of two training phases, i.e., (a) a style feature extraction phase and (b) a generation phase. The style feature extraction phase learns the style feature of any given portrait drawing. The generation phase generates portrait drawings with the specific styles of given style images from original input photos.

4.1. Datasets.
Although the training set of APDrawings used in the method [22] has not been released, we have collected a similar number of APDrawings to train our method. Due to the lack of public datasets with multiple styles, the APDrawings were crawled from the Internet to construct an APDrawing dataset of size 641, named APDrawingCrawl. It consists of 116 APDrawings by the artist Charles Burns, 45 by the artist Yann Legendre, 89 by the artist Kathryn Rathke, 233 from vectorportral.com, and 158 others without tagged artist/source information. The 641 APDrawings and 1,000 face images from CELEBA-HQ [40] form our training set. The testing set consists of 200 face photos, mainly from CELEBA-HQ. All training images, including APDrawings and face photos, are resized to a resolution of 512 × 512. For training the local discriminators in D_A, training images are aligned using facial landmarks and face-parsed with the BiSeNet model [41].
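A minimal preprocessing sketch consistent with this setup; the dataset path and file layout are hypothetical, and landmark alignment / BiSeNet parsing are separate steps not shown here.

```python
from pathlib import Path
from PIL import Image

def preprocess(path, size=512):
    """Resize a training image (APDrawing or face photo) to 512 x 512."""
    img = Image.open(path).convert("RGB")
    return img.resize((size, size), Image.BICUBIC)

for p in Path("APDrawingCrawl/train").glob("*.png"):  # hypothetical layout
    preprocess(p).save(p)
```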

Figure 3:
Figure 3: t-SNE visualization of the collected dataset APDrawingCrawl in the style extraction phase. We split all style images into five clusters as representative styles.

Figure 4:
Figure 4: Visualization of the portrait drawings nearest to the cluster centers (style bank) of all portrait drawings in the style extraction phase. These drawings can be viewed as the prototype style images of each style in the style bank. They are obviously different in drawing styles and are actually from diverse artists, i.e., Yann Legendre, vectorportral.com, Kathryn Rathke, Charles Burns, and an unknown artist.

Figure 5:
Figure 5: Comparison between the method [22] and ours. The input style1 and input style2 are actually from the same artist, Kathryn Rathke, while the input style3 is from another artist. Although the input images of style1 and style2 are from the same artist, they apparently differ in drawing styles. The method [22] simply treats style1 and style2 as the same style through a predefined manual division of styles, while ours uses a more adaptive paradigm for free-style APDrawing generation.

Figure 6:
Figure 6: Comparisons with two state-of-the-art example-guided neural style transfer methods, i.e., image style transfer [18] and linear style transfer [5], and an APDrawing generation method [22] with multiple styles.

Figure 7:
Figure 7: Comparisons with a general unpaired image-to-image translation method, ComboGAN [31], and a method [22] specially designed for the APDrawing generation task. They can generate images with different styles or translate images into several domains.

Figure 8:
Figure 8: Ablation study results: (c) generation without the style bank, (d) generation with original color inputs instead of gray-scale ones, and (e) our full method.

Table 1:
Table 1: User study results comparing our method with CycleGAN [17] and Yi et al. [22]. The i-th column represents the percentage of votes in which each method ranked i-th among the three methods.