GANs with Multiple Constraints for Image Translation

Unpaired image translation is a challenging problem in computer vision. Existing generative adversarial networks (GANs) models rely mainly on the adversarial loss and a few additional constraints, but the constraints imposed on the generator and the discriminator are not strong enough, which degrades image quality. In addition, we find that current GANs-based models have not yet constrained the generator by adding an auxiliary domain. To solve the problems mentioned above, we propose a multiscale and multilevel GANs (MMGANs) model for image translation. In this model, we add an auxiliary domain to constrain the generator: the auxiliary domain is combined with the original domains for modelling and helps the generator learn the detailed content of the image. We then use multiscale and multilevel feature matching to constrain the discriminator, with the aim of making the training process as stable as possible. Finally, we conduct experiments on six image translation tasks. The results verify the validity of the proposed model.


Introduction
Image translation [1] is similar to language translation: it converts an input image from a source domain into a corresponding image in a target domain, for example, inputting an image of a pear and turning it into an image of an apple. There are many methods [1][2][3][4][5][6] for the image translation problem, but GANs-based methods [1,[7][8][9][10][11] have gained increasing attention. In these methods, the input image from the source domain is fed to the generator, which produces fake samples to deceive the discriminator. The discriminator is then responsible for judging whether samples are real samples from the target domain or generated fake ones. Deep convolutional or deconvolutional neural networks [12][13][14][15][16][17] are often used to construct the generator and discriminator.
According to whether the datasets are paired or unpaired, GANs-based image translation methods can be roughly classified into two categories: paired and unpaired image translation. Paired methods [1,18] require paired datasets, which are very difficult and expensive to prepare in practical applications [19]. To reduce the cost of obtaining paired training datasets, [20][21][22] propose unpaired methods, which are unsupervised domain translation methods [23].
However, these paired and unpaired models are inadequate at generating the detailed information of images. Producing better image translation results remains a challenging task because of the following problems: (1) how to constrain the generator to generate the detailed content of the image and (2) how to stabilize the training process to improve the model, in terms of both the generated images and generalization performance.
On the one hand, there is the constraint on the generator. GANs-based methods gradually approach the real data distribution by adjusting the generator parameters. In other words, this type of method does not need to be pre-modelled, so the modelling is too unconstrained. When there are many pixels in the image, such methods suffer from an uncontrollability problem. To address this problem, [24] proposes constraining the generator and discriminator by adding a condition variable. Inspired by this added constraint, many researchers have proposed models from different perspectives. The first is to add text or semantic information to the model. Reference [25] adds text information to cascaded GANs, which generate high-definition images from text. Reference [26] proposes structural GANs, which incorporate semantic information into a conditional generative model. The second is to add a regularization term. Reference [27] combines mutual information with adversarial modelling, which in effect adds a mutual-information regularization term. Reference [7] introduces a cycle consistency constraint to achieve cross-domain image translation. Reference [28] uses the Wasserstein distance and a gradient penalty for adversarial modelling. Then, because the Nash equilibrium points of the original GANs are not asymptotically stable, [29] adds a regularization term to the gradient update and proves that, with this term, the equilibrium points of the original GANs become locally asymptotically stable.
By looking at similar objects in real life, we find that they share a certain similarity in appearance or structure. In addition, few methods constrain the generator by adding an auxiliary domain [30]. Inspired by this, we add an auxiliary domain to help the generator learn the detailed information in the image during translation.
On the other hand, there is the stabilization of the training process. To obtain better generated images, many researchers have put forward ideas for stabilizing training. First, there is the mode-missing perspective. Reference [31] analyses the problems of the original GANs objective function, which easily leads to gradient vanishing and mode missing when training GANs. Reference [28] then improves the training process. Since the convergence of the original GANs training is not proved, [32] proposes a two time-scale update rule to train GANs, which makes the training process converge to a local Nash equilibrium. Second, there is the probabilistic perspective. Reference [33] studies the distribution of the squared singular values of the input-output Jacobian of the generator. Third, there is the multiscale discriminator. Reference [34] proposes a multiscale discriminator to stabilize the training process and generate high-resolution images. Inspired by these discussions, in order to stabilize the training process and improve the discriminating ability of the discriminator, we use multiscale and multilevel ideas to constrain the discriminator, implemented with deep convolutional neural networks [15,[35][36][37]].
In this paper, we focus on the unpaired image translation task based on GANs. We try to solve two problems: (1) In GANs-based methods [7-9, 20, 21], the generator lacks control over detailed information during image generation. We try to constrain the generator to generate more detailed content during translation.
(2) To obtain better generated images and generalization performance, we stabilize the GANs training process as much as possible.
To address these concerns, we propose a novel unpaired image translation framework that constrains the generator and discriminator simultaneously. On the one hand, to constrain the generator, we add an auxiliary domain to the model, which helps the generator learn the detailed information in the image; we combine this auxiliary domain with the domains to be learned and design multiple generators and discriminators for image translation. On the other hand, to improve the ability of the discriminator and stabilize training, we use multiscale and multilevel feature matching to constrain the discriminator. Finally, we use multiple generative losses, multiscale discriminator losses, multilevel feature matching losses, and full cycle consistency losses to constrain the proposed model.
Our main contributions are as follows: (1) We propose an unpaired, multiscale and multilevel feature matching generative adversarial networks (MMGANs) model that adds an auxiliary domain to achieve cross-domain image translation.
(2) We modify the original GANs model by constraining the generator and discriminator simultaneously. In our model, we add an additional auxiliary domain as auxiliary information to help the generator learn detailed information while generating images, and we constrain the discriminator with multiscale and multilevel feature matching losses.
(3) Finally, we conduct experiments on six image translation tasks. According to the proposed evaluation method and the generated images, the experimental results show that our model performs better.
The rest of this paper is organized as follows. Section 2 describes the proposed method and the detailed model. Section 3 provides the results and discussion. Section 4 concludes this paper.

Materials and Methods
In this paper, inspired by the constrained generator, the cycle consistency of CycleGAN, the multiscale discriminator, and multilevel feature matching, we design the MMGANs model. On the one hand, we constrain the generator by adding an additional domain [30] as auxiliary information. On the other hand, we use multiscale and multilevel feature matching to constrain the discriminator. To realise this model, we specifically design multiple generator losses, multiscale discriminator losses, full cycle constraint losses, and multilevel feature matching losses.

Formulation Description.
We focus on the unpaired image translation problem. For convenience of description, we suppose that image translation is implemented between the domains X and Y, and that the added auxiliary domain is Z. We have samples {x}, {y}, and {z} from the X, Y, and Z domains. To implement the proposed model, we design six generators (namely, G1: X → Y, G2: X → Z, G3: Z → X, F1: Y → X, F2: Y → Z, and F3: Z → Y) and three discriminators (namely, D_X, D_Y, and D_Z). In particular, to stabilize the training process, each discriminator is multiscale (s ∈ [1, ms]). Moreover, to improve the quality of generated images, we add multilevel feature matching in the discriminator. For example, the generator G3 generates samples of the target domain X from inputs in the Z domain, and the generator F1 generates samples of the target domain X from inputs in the Y domain. The discriminator D_X then determines whether the generated samples in F1(y) and G3(z) and the real samples x ∈ X are real or fake. The discriminators D_Y and D_Z work similarly. The framework of the MMGANs model and its discriminator model are shown in Figures 1 and 2.
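A minimal sketch of this wiring, taking the domain names X and Y, the auxiliary domain Z, and the six generator directions as assumptions (the original symbols are not preserved in this copy). Each discriminator judges real samples of its domain together with the outputs of the two generators that target it:

```python
# Generator name -> (source domain, target domain); the directions are our
# reconstruction of the six mappings among X, Y, and the auxiliary domain Z.
GENERATORS = {
    "G1": ("X", "Y"), "G2": ("X", "Z"), "G3": ("Z", "X"),
    "F1": ("Y", "X"), "F2": ("Y", "Z"), "F3": ("Z", "Y"),
}

def generators_into(domain):
    """Names of the generators whose target is `domain`; these feed the
    discriminator for that domain alongside its real samples."""
    return sorted(name for name, (_, tgt) in GENERATORS.items() if tgt == domain)
```

For example, `generators_into("X")` returns `["F1", "G3"]`, matching the description of D_X above; each domain receives exactly two incoming generators.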

Adversarial Loss with Multiscale and Multilevel.
The proposed MMGANs model is inspired by [30] and consists of multiple generators and discriminators. In the proposed model, we use multiscale and multilevel constraints on the discriminator. In this way, the training process is stabilized and the discriminator pays attention to identifying the detailed content of the image.
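The multilevel part can be sketched as a feature matching penalty between the discriminator's intermediate activations for real and generated images. Using the L1 norm matches the setting p = 1 stated later in the training section; the choice and number of levels are assumptions for illustration:

```python
import numpy as np

def feature_matching_loss(feats_real, feats_fake):
    """Multilevel feature matching: mean L1 distance between discriminator
    features of real and generated images, summed over levels (layers).
    The layer selection here is illustrative, not the paper's exact one."""
    return float(sum(np.mean(np.abs(fr - ff))
                     for fr, ff in zip(feats_real, feats_fake)))
```

Each entry of `feats_real`/`feats_fake` is one level's feature map; the loss pushes generated features toward the real-image statistics level by level.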
Concretely, the discriminator D_X has three inputs, x, F1(y), and G3(z), and must distinguish the real from the fake samples among them. To stabilize the training process, we design multiscale discriminator losses. The adversarial loss with the multiscale discriminator is

L_Dis(F1, G3, D_X) = Σ_{s=1}^{ms} ( E_x[log D_X^s(x)] + E_y[log(1 − D_X^s(F1(y)))] + E_z[log(1 − D_X^s(G3(z)))] ),

where the subscript Dis denotes the multiscale discriminator case and D_X^s is the discriminator at scale s.
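A toy sketch of this multiscale term, assuming 2×2 average pooling between scales and a placeholder scalar discriminator (both are illustrative choices, not the paper's networks):

```python
import numpy as np

def avg_pool2x(img):
    """2x2 average pooling: one image at the next, coarser scale."""
    h, w = img.shape
    return img[: h // 2 * 2, : w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def multiscale_adv_loss(disc, real, fake1, fake2, ms=3):
    """Discriminator-side adversarial loss summed over `ms` scales.
    `disc` maps an image to a probability of being real; `fake1` and `fake2`
    play the roles of F1(y) and G3(z) in the text."""
    eps, loss = 1e-8, 0.0
    for _ in range(ms):
        loss += -np.log(disc(real) + eps)        # real judged real
        loss += -np.log(1.0 - disc(fake1) + eps)  # fakes judged fake
        loss += -np.log(1.0 - disc(fake2) + eps)
        real, fake1, fake2 = map(avg_pool2x, (real, fake1, fake2))
    return float(loss)
```

With a toy discriminator such as `lambda im: 1.0 / (1.0 + np.exp(-im.mean()))`, the loss is a finite positive number; a real multiscale discriminator would use separate convolutional networks per scale.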

And the full objective L is

L = λ1 · L_adv + λ2 · L_cyc + λ3 · L_fcyc,

where λ1, λ2, and λ3 are 1 × 3 row vectors of weights, and L_cyc and L_fcyc respectively denote the cycle loss and the full cycle loss [30].
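Assuming the three weight rows multiply per-domain adversarial, cycle, and full cycle losses respectively (our reading of the text, with numeric values taken from the parameter-setting section), the combination can be sketched as:

```python
import numpy as np

# One 1x3 weight row per loss type, one entry per domain direction.
# Values follow the experiments section (lambda_1* = 10, lambda_2 = (5, 0, 0),
# lambda_3* = 1); the grouping by loss type is our assumption.
lam1 = np.array([10.0, 10.0, 10.0])  # adversarial losses
lam2 = np.array([5.0, 0.0, 0.0])     # cycle losses
lam3 = np.array([1.0, 1.0, 1.0])     # full cycle losses

def total_loss(adv, cyc, fcyc):
    """adv, cyc, fcyc: length-3 vectors of per-domain loss values."""
    return float(lam1 @ np.asarray(adv) + lam2 @ np.asarray(cyc) + lam3 @ np.asarray(fcyc))
```

Zero entries in a row simply switch the corresponding loss term off for that domain direction.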
We use the Adam optimizer to solve this model; a description of the procedure is given in Algorithm 1.
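Algorithm 1 is not reproduced here, but a single Adam parameter update has the following form; the hyperparameter values shown are common GAN defaults, not values confirmed by this excerpt:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=2e-4, b1=0.5, b2=0.999, eps=1e-8):
    """One Adam update (t is the 1-based step count). lr = 2e-4 and
    beta1 = 0.5 are typical GAN settings, assumed for illustration."""
    m = b1 * m + (1 - b1) * grad          # first-moment estimate
    v = b2 * v + (1 - b2) * grad ** 2     # second-moment estimate
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

In training, the generator and discriminator parameters are updated alternately with such steps on their respective losses.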
In addition, this article is an extended version of our previous work [30], with the following major improvements.
Firstly, we modify the discriminator with multiscale and multilevel constraints. In this way, the training process is more stable and the discriminator pays more attention to matching features in the generated image, with the goal of making the details of the generated image more realistic. Secondly, we add the UNIT method [9] as a baseline in the experiments. Furthermore, in the performance evaluation, the AMT evaluation index is further modified to make the evaluation more objective and comprehensive.

Training and Testing Datasets.
We use fruits and seasons datasets for the experiments in this paper. The fruits dataset includes images of apples, oranges, and pears, and the seasons dataset covers images of summer, autumn, and winter. We adopt the datasets of [30] and resize all images to 128 × 128. The training and testing sets are shown in Table 1.

Baseline.
To evaluate the performance of MMGANs, we compare our method with CycleGAN [7], DualGAN [8], and UNIT [9]. CycleGAN: it uses the adversarial loss and a cycle constraint to achieve unpaired, unsupervised image translation from the X domain to the Y domain and vice versa.
DualGAN: it uses dual learning to constrain the generator and discriminator. DualGAN contains two generators and two discriminators to achieve unpaired, unsupervised image translation, expanding the basic GANs into two coupled GANs.
UNIT: it encodes the images of two domains into a shared latent space through weight-sharing encoders and then realises unsupervised cross-domain image translation with GANs.

Performance Evaluation. Amazon Mechanical Turk (AMT)
AMT [8] is one of the methods for evaluating images generated by GANs. It asks observers only to pick out the real or fake image among the testing images. Selecting only real or fake samples does not characterise the performance of the model well. Instead, the generated images can be classified by the observers into four categories: better, worse, both good, or both bad. Based on this discussion, we propose a more comprehensive way to evaluate the model. First, several observers label images as worse, better, both good, or both bad, and we count the numbers of each of the four types of picked images. We then take the mean counts and calculate their respective proportions.
Suppose that N is the total number of testing images. Comparing MMGANs with the CycleGAN, DualGAN, and UNIT models, we respectively calculate the mean numbers of images generated by MMGANs that are judged worse, better, both good, and both bad. The corresponding percentages P1, P2, P3, and P4 are obtained by dividing each mean count by N. Finally, we analyse the model using the quantitative indicators P1, P2, P3, and P4. For convenience of narration, we set six experimental cases of image translation, as shown in Table 2.
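The four proportions can be computed directly from the mean counts; the labels P1 to P4 are our names for the four indicators:

```python
def amt_percentages(n_worse, n_better, n_both_good, n_both_bad):
    """Turn the four mean AMT counts into percentage indicators P1..P4.
    The four counts are assumed to cover all judged images."""
    total = n_worse + n_better + n_both_good + n_both_bad
    return tuple(round(100.0 * n / total, 1)
                 for n in (n_worse, n_better, n_both_good, n_both_bad))
```

For instance, mean counts of (10, 20, 40, 30) over 100 judged images give percentages (10.0, 20.0, 40.0, 30.0).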

Setting Training Parameters.
In this section, we discuss the parameter settings of the experiment, which include λ11, λ12, λ13, λ21, λ22, λ23, λ31, λ32, and λ33, and select the multiscale level of the discriminator. In addition, we set the norm p = 1 when training. In the specific implementation, we adopt the network structure of the CycleGAN model.
We choose the parameters by considering the training time and the quality of the generated images. Parameters λ: we set λ11 = λ12 = λ13 = 10, λ21 = 5, λ22 = λ23 = 0, and λ31 = λ32 = λ33 = 1 for all cases. The multiscale and multilevel parameters are shown in Table 3. As an example, Figure 3 shows results for different multiscale and multilevel values on the apple-to-orange task, illustrating that these values affect the generated images. We therefore use different multiscale and multilevel parameters for different image translation tasks.

Generated Images.
To compare the models fairly, we present images generated by CycleGAN, DualGAN, UNIT, and MMGANs. The auxiliary domains are, respectively, pear, orange, apple, winter, autumn, and summer. The testing results are shown in Figures 4, 5, and 6. They show that the images generated by MMGANs are more realistic: the outline, colour, background, and foreground are better than in the images generated by CycleGAN, DualGAN, and UNIT. Moreover, for images with multiple target entities, our model also works better, whereas the fidelity of the images generated by CycleGAN, DualGAN, and UNIT is worse. These results indicate that the added auxiliary domain information helps the MMGANs model focus on generating image details: the generator is constrained by the auxiliary domain, which makes the generated images more realistic.

Performance Results.
According to our proposed evaluation method, we ask a few observers to sort the images in the testing results. For example, when testing apple2orange, based on their first impression, observers judge which image in the testing results is worse, better, both good, or both bad.
Compared with CycleGAN, DualGAN, and UNIT, the testing results show that our model obtains better performance in terms of the values of P1, P2, P3, and P4. In particular, the value of P2 is higher, meaning that MMGANs has a higher probability of generating better images.
3.5. Discussion. In this paper, we propose an approach that enhances the ability of GANs in image translation by using an augmented auxiliary domain to constrain the generator and multiscale and multilevel losses to constrain the discriminator. In this regard, we have studied the controllability of GANs; through this study, MMGANs makes the generated images more realistic in image translation.
However, our model also fails on some image translation tasks. Possible reasons include the following. (1) There are bad samples in our training and testing sets, which may lead to bad generated samples at test time, as shown in Figures 7 and 8.
(2) The performance of the model needs further improvement, for example by introducing an attention mechanism [38][39][40][41]. In addition, to diversify the generated images and make the generation process interactive, we will consider using semantics to control image generation in future work.

Conclusion
We presented the MMGANs model for image translation, which constrains the generator and discriminator with an added auxiliary domain, a full cycle consistency loss, and multiscale and multilevel losses. In particular, the constraint on the generator allows it to learn the detailed content of the image, while the constraint on the discriminator makes the whole training process more stable and the generated images more lifelike. Through experiments on six image translation tasks, our model achieves better performance than CycleGAN, DualGAN, and UNIT.

Figure 2 :
Figure 2: An example of the multiscale and multilevel discriminator. There are three columns, which respectively represent real samples from the X domain, samples generated from the Y domain by F1, and samples generated from the Z domain by G3. D_X^s denotes the multiscale and multilevel discriminator, s ∈ [1, ms].

Figure 3 :
Figure 3: Results for different multiscale and multilevel values on the apple-to-orange task. We show the original image and the images generated with different multiscale and multilevel settings.

Figure 4 :
Figure 4: Generated images from the test datasets. The left column is the input image, followed by the images generated with CycleGAN, DualGAN, UNIT, and MMGANs, respectively.

Figure 5 :
Figure 5: Generated images from the test datasets (continued). The left column is the input image, followed by the images generated with CycleGAN, DualGAN, UNIT, and MMGANs, respectively.

Figure 7 and Figure 8:
Figure 7: Training samples. These bad training samples may cause the failure of image translation.

Table 1 :
Training and testing sets. We list the number of images in the different datasets.

Table 2 :
Experiment settings. For image translation between apple and orange, we add images of the pear domain, giving the cases apple2orange and orange2apple. The other cases are similar.

Table 3 :
Multiscale and multilevel parameters in the different datasets.
3.4. Results of Testing. For fairness, all models are trained for 200,000 iterations and tested on the datasets mentioned in this paper. The experiments focus on two points: (1) we show images generated by CycleGAN, DualGAN, UNIT, and MMGANs on different datasets, and (2) we use our comprehensive performance evaluation method to calculate and analyse quantitative ratios.

Table 4 :
Table 4: Results of CycleGAN vs MMGANs. The third and fourth columns respectively give the percentage of better-quality images generated by CycleGAN and by MMGANs. The last two columns give the percentages of testing images for which both models generated good images or bad images.

Table 5 :
Table 5: Results of DualGAN vs MMGANs. The third and fourth columns respectively give the percentage of better-quality images generated by DualGAN and by MMGANs. The last two columns give the percentages of testing images for which both models generated good images or bad images.

Table 6 :
Table 6: Results of UNIT vs MMGANs. The third and fourth columns respectively give the percentage of better-quality images generated by UNIT and by MMGANs. The last two columns give the percentages of testing images for which both models generated good images or bad images.