Generative adversarial networks (GANs) are among the most popular generative models and are used to solve a wide range of problems. Training is a continuous game between the generator and the discriminator. While this game improves the quality of the generated images, it also makes the loss function hard to stabilize, and training is very slow compared with other methods. In addition, because a GAN learns the data distribution of the samples directly, the model becomes hard to control and has too much freedom as it approximates the original data distribution. We propose a transfer learning approach to training unsupervised generative models. The decoder of a trained variational autoencoder provides the network architecture and initial parameters of the GAN generator. In addition, the input of the generator is sampled from a standard normal distribution to limit the degrees of freedom of the model. Finally, we evaluate our method on the MNIST, CIFAR10, and LSUN datasets. The experiments show that the proposed method makes the loss function converge faster and increases the model accuracy.
1. Introduction
The generative adversarial network (GAN) is an image generation model based on game theory. It combines a generative model with an adversarial model and uses a training method built on this pairing to produce clearer and sharper images than other methods.
Compared with earlier, more complex image generation methods, the generative adversarial network does not need to model the original dataset explicitly; it only needs the generator to approximate the original data distribution. Its generator and discriminator do not need complex network structures either, and an ordinary deep neural network can achieve good generation results. Although researchers have made many improvements to the generative adversarial network, some weaknesses inherent in its design remain. For example, training is slow and the model has too much freedom. The purpose of our study is therefore to speed up model training and reduce model freedom.
This is a challenging problem for two reasons. (1) Training a generative adversarial network is a continuous game between the generator and the discriminator. While the generated images keep improving, violent fluctuations of the gradient make it difficult for the model to reach equilibrium or even stability, so training is slow. Training a traditional GAN on a CPU takes about 3 hours. (2) The generative adversarial network requires no modeling in advance. Although this simplifies the process of generating images, it makes the model hard to control and gives it too much freedom.
To address these problems, we apply transfer learning to unsupervised deep models and combine the GAN with another image generation model, the variational autoencoder, to reduce the resource consumption and time of training a generative adversarial network. In addition, exploiting the particular form of the input to the variational autoencoder's decoder, we further limit the freedom of the model by restricting the input data distribution of the generator, which offers an improved method for training other unsupervised generative models in the future. In recent years, given the great potential of generative adversarial networks, researchers have widely improved and applied the method. Some studies use GANs in semisupervised learning. Other studies combine GANs with specific applications and achieve good results.
We combine transfer learning and unsupervised learning so that each compensates for the shortcomings of the other, and we use samples drawn from a standard normal distribution as the input of the model. The specific combination and control methods are as follows:
(1) The decoder of a trained variational autoencoder is used as the network architecture and initial parameters of the generative adversarial network generator. This gives the generator a new starting point for learning and makes the loss function fall as quickly as possible. Experiments show that the method can reduce the training time by about half.
(2) Samples from a standard normal distribution are used as the input data of the generator, so that limiting the input data of the model mitigates the problem of too much model freedom.
2. Related Works
The idea of generative adversarial networks (GANs) is derived from the Nash equilibrium of game theory [1]. The model is shown in Figure 1. A GAN consists of two networks: a generator G, which is responsible for producing data close to the real data, and a discriminator D, which determines whether a sample is real or fake. Here t is sampled from the real data, and n is the noise input from which the generator produces the sample G(n). The two networks are trained by playing against each other, and the loss function is
$$\min_G \max_D V(D, G) = \mathbb{E}_{t \sim p_{data}(t)}[\log D(t)] + \mathbb{E}_{n \sim p_g(n)}[\log(1 - D(G(n)))]. \tag{1}$$
Figure 1: Structure of generative adversarial networks.
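To make the two terms of Eq. (1) concrete, the following minimal sketch estimates V(D, G) on a minibatch from discriminator outputs alone. The function name `gan_value` and the example probabilities are illustrative assumptions, not part of the original method.

```python
import math

def gan_value(d_real, d_fake):
    """Minibatch estimate of V(D, G) from Eq. (1).

    d_real: discriminator outputs D(t_i) on real samples, each in (0, 1)
    d_fake: discriminator outputs D(G(n_i)) on generated samples, each in (0, 1)
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell real from fake outputs 0.5 everywhere,
# which gives V = log(0.5) + log(0.5) = -2 log 2.
undecided = gan_value([0.5, 0.5], [0.5, 0.5])

# A confident and correct discriminator pushes V toward its upper bound of 0.
confident = gan_value([0.99, 0.98], [0.01, 0.02])
```

The discriminator is trained to increase this quantity, while the generator is trained to decrease the second term.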
GANs have appeared widely in conferences and journals in recent years [2–8] and are already a hot topic in artificial intelligence [9–11], with many related papers. The original GAN [1] requires no Markov chain or unrolled approximate inference network during training. DCGAN uses a CNN to replace the basic network structure in the GAN and uses a global pooling layer to reduce computation. PROGAN trains progressively, starting from blurred images and gradually generating clear ones [12]. SAGAN uses self-attention to compute the response at a location as the weighted sum of the features at all positions, which better handles dependencies between different regions [13]. LP-WGAN [14] replaces the Lipschitz constraint [15] with an Lp-norm constraint on the weights, which effectively brings the generated distribution closer to the true distribution. WGAN-GP [16] uses an improved gradient penalty to solve the problem of parameters concentrating at the clipping limits and to enforce a true Lipschitz constraint. LSGAN changes the objective function of the model, describes the loss of the model more accurately, and addresses low image quality and unstable training [17]. f-GAN proves that any f-divergence can be used to train generative models [18]. UGAN can remove the source category information retained in the generated image, making the source category of the generated image more difficult to track [19]. LSGAN uses a Lipschitz regularity condition on the density of the real data to regularize its loss function [20]. MRGAN takes advantage of the unique geometry of real data, especially manifold information [21]. Geometric GAN [22] uses an SVM separating hyperplane to maximize the margin. RGAN [23] argues that the probability that real data are judged real should be reduced, because this narrows the gap between real and fake data and makes the data smoother.
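The autoencoder is a neural network architecture that imposes a bottleneck on the network, forcing a compressed knowledge representation of the original input. The autoencoder is shown in Figure 2. The bottleneck forces the model to retain only the variations in the data needed to reconstruct the input, not the redundancy of the input. For the variational autoencoder, the loss function is
$$L = \min_\theta \; \|x - f(x)\|^2 + \frac{1}{2}\left(\operatorname{tr}\Sigma(x) + \mu(x)^T \mu(x) - K - \log\det\Sigma(x)\right), \tag{2}$$
where f(x) is the reconstruction of x, μ(x) and Σ(x) are the mean and covariance produced by the encoder, and K is the dimension of the latent variable.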
Figure 2: A variational autoencoder, consisting of an encoder and a decoder.
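As a worked instance of Eq. (2), the sketch below evaluates the loss for a single sample. It assumes a diagonal covariance, so that the trace is the sum of the variances and the log-determinant is the sum of their logarithms; the function name `vae_loss` and the numbers are illustrative assumptions.

```python
import math

def vae_loss(x, x_recon, mu, var):
    """Eq. (2) for one sample, assuming a diagonal covariance Sigma(x).

    x, x_recon: input and reconstruction f(x), as equal-length lists
    mu, var:    encoder mean and per-dimension variances (diagonal of Sigma)
    """
    # Squared reconstruction error ||x - f(x)||^2.
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    # Second term of Eq. (2): tr(Sigma) + mu^T mu - K - log det(Sigma), halved.
    k = len(mu)
    trace = sum(var)
    log_det = sum(math.log(v) for v in var)
    kl = 0.5 * (trace + sum(m * m for m in mu) - k - log_det)
    return recon + kl

# Perfect reconstruction with a standard normal latent code: both terms vanish.
zero_loss = vae_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], [1.0, 1.0])

# Reconstruction error of 1 plus a shifted mean: 1 + 0.5 * (2 + 1 - 2 - 0) = 1.5.
some_loss = vae_loss([1.0, 2.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0])
```

The second term is smallest when the encoder output matches a standard normal distribution, which is why the decoder can later be driven by standard normal samples.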
3. Methods
Transfer learning transfers the parameters of a trained model to a new model to help train the new model. Most data and tasks are correlated. Through transfer learning, the learned model parameters can be shared with the new model in some way, which speeds up and improves learning without starting from scratch as most networks do. Transfer learning is built on the concepts of data domain and task. A data domain D consists of a feature space X and a probability distribution P(X), where X = {x_1, x_2, ..., x_n}. A task T consists of a label space Y and an objective function F. Different models can be transferred according to the correlation between the data domains and tasks of the source model and the target model.
3.1. Transfer Learning Theory Based on CNN
Transfer learning based on convolutional neural networks can be divided into two approaches: the development model and the pretraining model.
In the development model, we first select a related predictive modeling problem with abundant data. The source and target tasks should be related in their input data, their output data, and the concepts the model learns when its parameters are fitted to the input data. Next, we define a strong network model for the source task so that the network represents the features of the source dataset as well as possible. The network model of the source task is then taken as the training starting point of the network model for the new task. Part or all of the source network model can be selected according to the needs of the new task. Finally, according to the training results of the new model and the needs of the target task, the network model is adjusted so that it performs better on the new task.
In the pretraining model, we first select the source model. The pretraining model differs from the development model in that the base model does not need to be trained. Many research institutions have already trained base models on very large datasets, and the user only needs to select a network model suitable for the new task. We then reuse the model: the network model of the source task is taken as the training starting point of the network model for the new task, and part or all of it is selected according to the needs of the new task. Finally, according to the training results of the new model and the needs of the target task, the network model is adjusted so that it performs better on the new task.
3.2. Transfer Learning Based on Unsupervised Generation Model
Transfer learning is generally applied to supervised discriminative models. The source model is trained on a similar dataset and then transferred to the target model, giving the target model a new starting point that accelerates its training. By combining generative models with transfer learning, we present a new transfer method that can be applied to unsupervised generative models in deep learning. In this paper, two unsupervised generative models, the generative adversarial network and the variational autoencoder, and their improvements are studied, and the MNIST, CIFAR10, and LSUN datasets are selected as input data. The variational autoencoder is selected as the source model of transfer learning, and the original generative adversarial network, the deep convolutional generative adversarial network, and WGAN-GP are selected as the target models. The experiments adopt a new transfer learning training method based on the unsupervised generative model:
(1) Select the source model: the variational autoencoder is chosen as the source model and trained.
(2) Reuse the model: the model parameters of the decoder of the variational autoencoder are saved.
(3) Adjust the model: the generative adversarial network is adjusted so that the input of the generator is drawn from the standard normal distribution, and iterative training is then carried out according to the generative adversarial network algorithm.
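The three steps above can be sketched as follows. This is a minimal illustration that uses plain nested lists as stand-ins for trained weights; the layer sizes, the `init_layers` helper, and the dictionary layout are assumptions made for the example, not the networks used in the experiments.

```python
import copy
import random

def init_layers(sizes, rng):
    """Random weights for a fully connected stack, one matrix per layer."""
    return [
        [[rng.gauss(0.0, 0.02) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

rng = random.Random(0)
latent_dim, hidden_dim, image_dim = 8, 16, 28 * 28

# Step 1: source model. Stand-in for a VAE whose decoder is already trained.
vae = {
    "encoder": init_layers([image_dim, hidden_dim, 2 * latent_dim], rng),
    "decoder": init_layers([latent_dim, hidden_dim, image_dim], rng),
}

# Step 2: reuse. Save the decoder architecture and parameters.
saved_decoder = copy.deepcopy(vae["decoder"])

# Step 3: adjust. The GAN generator starts from the decoder weights and
# draws its input from a standard normal distribution.
gan = {
    "generator": copy.deepcopy(saved_decoder),
    "discriminator": init_layers([image_dim, hidden_dim, 1], rng),
}
noise = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
```

Copying the parameters, rather than sharing them, means that later GAN training does not modify the saved source model.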
The improved original generative adversarial network is trained with minibatch stochastic gradient methods. The discriminator is updated k times for each generator update, where k is a hyperparameter. The dataset is first input into the encoder of the variational autoencoder so that the encoder learns the mean and variance. The latent variable n is sampled from the standard normal distribution and input into the decoder, and the variational autoencoder is trained until its loss function is stable or minimal. The network structure and parameters of the trained decoder are then transferred to the generative adversarial network as its initial generator for the subsequent training. The improved original generative adversarial network algorithm is described in Algorithm 1.
Algorithm 1: Improved original generative adversarial network algorithm. D and G are the two networks that make up the GAN, t is data sampled from the real distribution, and n is noise data. G(n) is the sample generated by G to approximate the real data. D(t) is the discriminator output on real data, and D(G(n)) is the discriminator output on fake data from the generator.
Input: noise data n_1, …, n_m from the standard normal distribution p_g(n) and real data t_1, …, t_m from the data distribution p_data(t).
Output: the losses of the generator and discriminator.
for each training iteration of the generative adversarial network do
    for k steps do
        Sample a minibatch of m noise samples n_1, …, n_m from the standard normal distribution p_g(n).
        Sample a minibatch of m samples t_1, …, t_m from the data distribution p_data(t).
        Update the discriminator weights by stochastic gradient ascent: $\nabla_{\theta_D} \frac{1}{m} \sum_{i=1}^{m} \left[\log D(t_i) + \log(1 - D(G(n_i)))\right]$
    end for
    Sample a minibatch of m noise samples n_1, …, n_m from the standard normal distribution p_g(n).
    Update the generator weights by stochastic gradient descent: $\nabla_{\theta_G} \frac{1}{m} \sum_{i=1}^{m} \log(1 - D(G(n_i)))$
end for
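The loop structure of Algorithm 1 can be illustrated with a deliberately small one-dimensional example in which the gradients are written out by hand. Everything specific here is an assumption made for the sketch: the linear generator G(n) = a·n + b, the logistic discriminator D(t) = sigmoid(w·t + c), the real data distribution N(3, 1), and the hyperparameter values. The argument `gen_init` plays the role of the transferred decoder parameters.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_gan(gen_init, steps=200, k=1, m=32, lr=0.05, seed=0):
    """Toy 1-D version of Algorithm 1 with hand-derived gradients."""
    rng = random.Random(seed)
    a, b = gen_init          # generator parameters (transferred start point)
    w, c = 0.0, 0.0          # discriminator parameters
    history = []
    for _ in range(steps):
        for _ in range(k):
            noise = [rng.gauss(0.0, 1.0) for _ in range(m)]
            real = [rng.gauss(3.0, 1.0) for _ in range(m)]
            fake = [a * n + b for n in noise]
            grad_w = grad_c = 0.0
            for t, g in zip(real, fake):
                d_t, d_g = sigmoid(w * t + c), sigmoid(w * g + c)
                # d/dw and d/dc of log D(t) + log(1 - D(g)).
                grad_w += (1.0 - d_t) * t - d_g * g
                grad_c += (1.0 - d_t) - d_g
            # Gradient ascent on the discriminator objective.
            w += lr * grad_w / m
            c += lr * grad_c / m
        noise = [rng.gauss(0.0, 1.0) for _ in range(m)]
        grad_a = grad_b = g_loss = 0.0
        for n in noise:
            d_g = sigmoid(w * (a * n + b) + c)
            # d/da and d/db of log(1 - D(G(n))).
            grad_a += -d_g * w * n
            grad_b += -d_g * w
            g_loss += math.log(1.0 - d_g)
        # Gradient descent on the generator objective.
        a -= lr * grad_a / m
        b -= lr * grad_b / m
        history.append(g_loss / m)
    return (a, b), (w, c), history

generator, discriminator, g_losses = train_gan(gen_init=(1.0, 0.0))
```

Passing a `gen_init` that is already close to the data distribution corresponds to starting the generator from a trained decoder instead of random weights.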
4. Experiments
In this experiment, the MNIST, CIFAR10, and LSUN datasets are used to compare different models. MNIST is a computer vision dataset containing 70,000 grayscale images of handwritten digits, each 28 × 28 pixels, and each image has a label giving the digit it shows. CIFAR10 has a total of 60,000 color images of 32 × 32 pixels, divided into 10 categories with 6,000 images in each category. LSUN is a scene understanding image dataset that mainly includes 10 scene categories and 20 object categories, with about one million pictures per category; its scenes include bedroom, house, living room, classroom, and others. We improve the original generative adversarial network, the deep convolutional generative adversarial network, and WGAN-GP on the three datasets, respectively. The details are shown in Table 1. The original generative adversarial network is trained on the MNIST dataset, and the improved original generative adversarial network adopts the same network structure. The original network uses the leaky ReLU activation function. The deep convolutional generative adversarial network changes the structure of the original generative adversarial network, and the improved deep convolutional generative adversarial network adopts the same network structure. In the discriminator of the deep convolutional generative adversarial network, strided convolution replaces the pooling layer for downsampling, and in the generator, transposed convolution is used for upsampling. The fully connected layer is removed and replaced with global pooling. The improved WGAN-GP and WGAN-GP adopt the same network structure as the deep convolutional generative adversarial network, while WGAN-GP modifies the loss function of the network to avoid mode collapse.
Table 1: Experimental comparison.
| Dataset | Algorithm |
|---|---|
| MNIST | Original generative adversarial network |
| MNIST | Improved original generative adversarial network |
| CIFAR10 | Deep convolution generative adversarial network |
| CIFAR10 | Improved deep convolution generative adversarial network |
| LSUN | WGAN-GP |
| LSUN | Improved WGAN-GP |
Instruction: The experimental results of the original network and the improved network are shown, which demonstrate the effectiveness of the improved network.
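The downsampling and upsampling by strided and transposed convolution described in this section can be checked with simple size arithmetic. The sketch below uses the standard output-size formulas for these layers; the kernel size 4, stride 2, and padding 1 are assumed values chosen for illustration, since the layer settings are not listed in the text.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a strided convolution (downsampling)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride, padding):
    """Spatial output size of a transposed convolution (upsampling)."""
    return (size - 1) * stride - 2 * padding + kernel

# Discriminator path for a 32 x 32 CIFAR10 image: each layer halves the size.
sizes_down = [32]
for _ in range(3):
    sizes_down.append(conv_out(sizes_down[-1], kernel=4, stride=2, padding=1))

# Generator path: each transposed convolution doubles the size back to 32.
sizes_up = [sizes_down[-1]]
for _ in range(3):
    sizes_up.append(conv_transpose_out(sizes_up[-1], kernel=4, stride=2, padding=1))
```

With these assumed settings, the generator path mirrors the discriminator path, so no pooling or explicit upsampling layer is needed.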
4.1. Improved Original Generative Adversarial Network
In this experiment, the low-resolution MNIST dataset is used for training, and the network structure of the generator and discriminator is relatively simple. To compare with the original generative adversarial network, the two algorithms are each trained on the MNIST dataset. The training loss curves of the original generative adversarial network on MNIST are shown in Figures 3 and 4, and those of the improved generative adversarial network in Figures 5 and 6. Figures 7(a) and 7(b) show the generated images, respectively. In the figures, d_loss is the loss value of the discriminator, g_loss is the loss value of the generator, and the x-coordinate is the number of training iterations. As training proceeds, the loss of the original generative adversarial network gradually decreases and becomes stable when training reaches 120k iterations. The improved generative adversarial network converges when training reaches 40k iterations, where the losses of the generator and discriminator become stable. Therefore, training on the MNIST dataset with the generative adversarial network based on transfer learning greatly improves the convergence speed.
Figure 3: The discriminator loss function of generative adversarial networks.
Figure 4: The generator loss function of generative adversarial networks.
Figure 5: The discriminator loss function of improved generative adversarial networks.
Figure 6: The generator loss function of improved generative adversarial networks.
4.2. Improved Deep Convolution Generative Adversarial Network
In this experiment, the CIFAR10 dataset is used for training, and both the generator and discriminator of the generative adversarial network use deep convolution. To compare with the original deep convolutional generative adversarial network, the two algorithms are each trained on the CIFAR10 dataset. The training loss curves of the original deep convolutional generative adversarial network on CIFAR10 are shown in Figures 8 and 9, and those of the improved network in Figures 10(a) and 10(b). Generated images are shown in Figures 11 and 12. In the figures, error_D is the loss value of the discriminator, error_G is the loss value of the generator, and the x-coordinate is the number of training iterations. The original deep convolutional generative adversarial network converges when training reaches 15k iterations, where the losses of the generator and discriminator become stable, whereas the improved deep convolutional generative adversarial network converges when training reaches 8k iterations. Therefore, training on CIFAR10 with the deep convolutional generative adversarial network based on transfer learning improves the convergence speed. In addition, the generated images show that image quality also improves to some extent.
Figure 8: The discriminator loss function of deep convolutional generative adversarial networks.
Figure 9: The generator loss function of deep convolutional generative adversarial networks.
Figure 10: (a) The discriminator loss function of improved deep convolutional generative adversarial networks. (b) The generator loss function of improved deep convolutional generative adversarial networks.
Figure 11: Deep convolutional generative adversarial network-generated pictures.
Figure 12: Improved deep convolutional generative adversarial network-generated pictures.
4.3. Improved WGAN-GP
In this experiment, the LSUN dataset is used for training, and the generator and discriminator of the generative adversarial network use deep convolution. To compare with the original WGAN-GP, the two algorithms are each trained on the LSUN dataset. The training loss curves of the original WGAN-GP on LSUN are shown in Figures 13 and 14, and those of the improved WGAN-GP in Figures 15 and 16. Generated images are shown in Figures 17 and 18, respectively. In the figures, data/disc_cost is the loss value of the discriminator, data/gen_cost is the loss value of the generator, and the x-coordinate is the number of training iterations. The original WGAN-GP converges when training reaches 25k iterations, where the losses of the generator and discriminator become stable, whereas the improved WGAN-GP converges when training reaches 20k iterations. Therefore, training on the LSUN dataset with the WGAN-GP algorithm based on transfer learning improves the convergence speed. In addition, the generated images are better than those of the original network and appear sharper.
Figure 13: WGAN-GP discriminator loss function.
Figure 14: WGAN-GP generator loss function.
Figure 15: Improved WGAN-GP discriminator loss function.
Figure 16: Improved WGAN-GP generator loss function.
Figure 17: WGAN-GP generated pictures.
Figure 18: Improved WGAN-GP generated pictures.
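The convergence points reported in Sections 4.1 to 4.3 can be put side by side with a short calculation. The sketch below uses only the iteration counts stated in the text; the `reduction` helper is an illustrative name, and iteration counts are treated as a proxy for training cost, which is an assumption because the time per iteration is not reported.

```python
# Iterations needed for the losses to become stable, as reported in the text:
# (original network, improved network).
iterations = {
    "MNIST / original GAN": (120_000, 40_000),
    "CIFAR10 / DCGAN": (15_000, 8_000),
    "LSUN / WGAN-GP": (25_000, 20_000),
}

def reduction(original, improved):
    """Return (speedup factor, fraction of iterations saved)."""
    return original / improved, 1.0 - improved / original

summary = {name: reduction(*pair) for name, pair in iterations.items()}
for name, (speedup, saved) in summary.items():
    print(f"{name}: {speedup:.2f}x fewer iterations, {saved:.0%} saved")
```

On these numbers the gain is largest for the simple MNIST network and smallest for WGAN-GP on LSUN.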
5. Conclusions
Among unsupervised generative models, the generative adversarial network has the advantages of generating clearer and sharper images, of not needing a Markov chain to sample the input dataset repeatedly, and of being more concise than other unsupervised network models. However, the problems of long training time and too much model freedom have remained. We therefore propose a transfer method between unsupervised generative models. First, we transfer the decoder of the variational autoencoder to the generator of the generative adversarial network, which greatly accelerates the convergence of the adversarial network. Second, our model restricts the input of the generative adversarial network to a standard normal distribution, which further limits the freedom of the model. Experiments show that the generated pictures are better guided and closer to real pictures. Image resolution remains a focus of research, so in future work we will design experiments to build a new network that makes the generated images more realistic and further speeds up training.
Data Availability
The MNIST, CIFAR10, and LSUN data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
This research was funded by the Open Program of Shanghai Key Lab of Intelligent Information Processing under Grant no. IIPL-2019-10.
References
[1] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, "Generative adversarial networks," 2014, https://arxiv.org/abs/1406.2661.
[2] O. Goudet, D. Kalainathan, P. Caillou, "Learning functional causal models with generative neural networks," Springer Link, New York, NY, USA, 2018, pp. 39–80.
[3] Y.-Y. Zhang, C.-M. Shen, H. Feng, P. T. Fletcher, G.-X. Zhang, "Generative adversarial networks with joint distribution moment matching," vol. 7, no. 4, pp. 579–597, 2019, doi: 10.1007/s40305-019-00248-x.
[4] S. Song, W. Zhang, J. Liu, T. Mei, "Unsupervised person image generation with semantic parsing transformation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Long Beach, CA, USA, June 2019, pp. 2357–2366, doi: 10.1109/CVPR.2019.00246.
[5] A. Odena, C. Olah, J. Shlens, "Conditional image synthesis with auxiliary classifier GANs," in Proceedings of the 34th International Conference on Machine Learning, vol. 70, Sydney, Australia, July 2017, pp. 2642–2651.
[6] C. Yang, X. Lu, Z. Lin, E. Shechtman, O. Wang, H. Li, "High-resolution image inpainting using multi-scale neural patch synthesis," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Honolulu, HI, USA, July 2017, pp. 6721–6729, doi: 10.1109/CVPR.2017.434.
[7] J.-Y. Zhu, P. Krähenbühl, E. Shechtman, A. A. Efros, "Generative visual manipulation on the natural image manifold," in Proceedings of the European Conference on Computer Vision, Springer Link, New York, NY, USA, September 2016, pp. 597–613, doi: 10.1007/978-3-319-46454-1_36.
[8] C. Lassner, G. Pons-Moll, P. V. Gehler, "A generative model of people in clothing," in Proceedings of the IEEE International Conference on Computer Vision, IEEE, Venice, Italy, October 2017, pp. 853–862, doi: 10.1109/iccv.2017.98.
[9] H. Gan, Z. Luo, M. Meng, Y. Ma, Q. She, "A risk degree-based safe semi-supervised learning algorithm," vol. 7, no. 1, pp. 85–94, 2016, doi: 10.1007/s13042-015-0416-8.
[10] W. Fedus, I. Goodfellow, A. M. Dai, "MaskGAN: better text generation via filling," 2018, https://arxiv.org/abs/1801.07736.
[11] N. Jetchev, U. Bergmann, R. Vollgraf, "Texture synthesis with spatial generative adversarial networks," 2016, https://arxiv.org/abs/1611.08207.
[12] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, J. Choo, "StarGAN: unified generative adversarial networks for multi-domain image-to-image translation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Salt Lake City, UT, USA, June 2018, pp. 8789–8797, doi: 10.1109/cvpr.2018.00916.
[13] H. Zhang, I. Goodfellow, D. Metaxas, A. Odena, "Self-attention generative adversarial networks," 2018, https://arxiv.org/abs/1805.08318.
[14] C. Zhou, J. Zhang, J. Liu, "Lp-WGAN: using Lp-norm normalization to stabilize Wasserstein generative adversarial networks," vol. 161, pp. 415–424, 2018, doi: 10.1016/j.knosys.2018.08.004.
[15] Y. Rubner, C. Tomasi, L. J. Guibas, "The earth mover's distance as a metric for image retrieval," vol. 40, no. 2, pp. 99–121, 2000, doi: 10.1023/a:1026543900054.
[16] X. Wei, B. Gong, Z. Liu, W. Lu, L. Wang, "Improving the improved training of wasserstein gans: a consistency term and its dual effect," 2018, https://arxiv.org/abs/1803.01541.
[17] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, S. P. Smolley, "Least squares generative adversarial networks," in Proceedings of the IEEE International Conference on Computer Vision, IEEE, Venice, Italy, October 2017, pp. 2794–2802, doi: 10.1109/ICCV.2017.304.
[18] S. Nowozin, B. Cseke, R. Tomioka, "f-GAN: training generative neural samplers using variational divergence minimization," in Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 2016, pp. 271–279.
[19] Y. Wang, B. Yu, L. Wang, "3D conditional generative adversarial networks for high-quality PET image estimation at low dose," vol. 174, pp. 550–562, 2018, doi: 10.1016/j.neuroimage.2018.03.045.
[20] C. Wu, L. Li, Z. Yang, P. Yan, J. Jiao, "Image-to-Image local feature translation using double adversarial networks based on CycleGAN," in Proceedings of the International Conference in Communications, Signal Processing, and Systems, Springer Link, New York, NY, USA, August 2019, pp. 907–915, doi: 10.1007/978-981-13-6504-1_109.
[21] Y. Li, N. Xiao, W. Ouyang, "Improved generative adversarial networks with reconstruction loss," vol. 323, pp. 363–372, 2019, doi: 10.1016/j.neucom.2018.10.014.
[22] J. H. Lim, J. C. Ye, "Geometric GAN," 2017, https://arxiv.org/abs/1705.02894.
[23] A. Jolicoeur-Martineau, "The relativistic discriminator: a key element missing from standard GAN," 2018, https://arxiv.org/abs/1807.00734.