Application of an Improved DCGAN for Image Generation



Introduction
With the introduction of the concepts of cloud computing and big data and the rapid development of computer hardware, deep learning has developed rapidly and has been used in many applications in recent years [1]. However, among the branches of deep learning, image generation technology has developed slowly. Before GANs were proposed, the main image generators were autoregressive models [2] and variational autoencoders [3]. At the same time, based on improvements in GAN theory, GANs have been applied in image conversion, image feature extraction, and other fields.
There are many models used in image generation and modeling research, including BPT-CNN [4], the BERT-based deep spatial-temporal network [5], GeNet of deep convolutional neural network [6], and RNN-LSTM [4]. However, as a new type of image generation model, GANs have attracted the attention of many researchers, who have gradually improved them and provided a large number of mature image generation frameworks (such as DCGAN, CGAN, and Pix2Pix). In terms of theoretical research on GANs, in 2014, Goodfellow et al. [7] first described a new image generation model, the GAN, which is composed of a generator and a discriminator. In the same year, Mirza and Osindero [8], inspired by the introduction of convolutional neural networks, proposed the CGAN on the basis of GANs, which addressed the unstable training behavior of GANs by adding category labels. To solve the instability of GANs, in 2016, Goodfellow et al. [7] and Salimans et al. [9] proposed stabilizing the training process of DCGAN with feature matching, minibatch discrimination, and historical averaging, and this work provided a basis for follow-up research. With regard to the applications of GANs, Isola et al. [10] implemented image conversion using Pix2Pix and paired training data; Zhang et al. [11] proposed StackGAN, which first generates basic images and text descriptions based on the original image information and then refines the process to generate high-resolution images. In 2017, Zhu et al. [12] proposed CycleGAN, which removed the need of Pix2Pix for paired data and introduced the cycle-consistency loss function to realize image conversion, for example from a horse to a zebra. Karras et al. [13] proposed StyleGAN in 2018 to accelerate and stabilize network training by gradually increasing the numbers of generators and discriminators.
This method draws on natural style-transfer technology, such as adaptive instance normalization (AdaIN), and realizes the real-time transfer of arbitrary styles [14]. BigGAN was proposed by Andrew Brock in 2019. BigGAN adopts a self-attention mechanism and spectral normalization, and it is currently a good model for image generation on ImageNet [15]. In summary, several comprehensive methods have been proposed in recent years to solve the generation and resolution problems of GANs.
Alec Radford et al. [16] used the CNN structure [17,18] to implement the GANs model for the first time and proposed the DCGAN. Liu et al. compared the unconstrained DCGAN and the constrained DCGAN, and the results showed that after adding constraints during the training phase, the DCGAN model significantly improved the results of virtual face generation, thus demonstrating the enhanced ability of the generator and discriminator [19]. Mahmoud and Guo [20] used the DCGAN to extract deep features that strongly represent TSR images. Fang et al. [21] proposed a new gesture recognition algorithm based on a convolutional neural network and the DCGAN and applied this method to expression recognition, calculation, and text output, achieving good results in all cases. In another study, the DCGAN was trained on real images and noise vectors, and smoke images were used to train the discriminator, thereby showing that the DCGAN can effectively monitor smoke images [22]. The biggest differences between the DCGAN and the original GANs are that the DCGAN uses a convolutional neural network (CNN) to replace the multilayer perceptron in the original GANs, removes the pooling layer, and uses strided convolution to replace the upsampling layer to improve the stability of the training process.
To solve the problem of gradients vanishing easily in GANs, this paper further improves the fully connected layer of the DCGAN. The LeakyReLU function is used as the activation function in all layers of the discriminator, the Tanh function is used in the output layer of the generator, and the ReLU function is used in the other layers of the generator; the open source MNIST dataset is used. For verification, we compare the improved DCGAN with GANs and establish the ISs and ISc metrics to evaluate image quality and image generation diversity, respectively. According to the numerical values, it can be concluded that the improved DCGAN has better image generation ability. The remainder of the paper is organized as follows: Section 1 summarizes the progress of research with regard to GANs and the DCGAN; Section 2 introduces the principles of the improved DCGAN algorithm and designs the network structure; Section 3 constructs the image generation models, with one based on GANs and the other based on the DCGAN; Section 4 introduces two image generation quality evaluation methods and analyzes the effects of the two models on image generation quality and image diversity. The study's discussion and conclusions are presented in Section 5.

Improved Design of the
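The objective function of GANs is
$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{Z \sim P_Z(Z)}[\log(1 - D(G(Z)))], \tag{1}$$
where $P_{data}$ is the distribution of the real data, $x$ is the sample image data, $P_Z$ represents arbitrarily distributed noise, $Z$ is a random vector drawn from $P_Z$, and $\mathbb{E}$ denotes expectation. The first step is to find the minimum cross-entropy of the discriminator D (that is, to maximize V with respect to D) under the condition where a generator G is given. The objective function is
$$\max_D V(D, G) = \mathbb{E}_{x \sim P_{data}(x)}[\log D(x)] + \mathbb{E}_{Z \sim P_Z(Z)}[\log(1 - D(G(Z)))], \tag{2}$$
where $\log[D(x)]$ is used to judge the sample data and $\log[1 - D(G(Z))]$ represents the judgment of the generated sample data, that is, the closeness of the real data distribution $P_{data}$ and the data distribution $P_G$ generated by G. Writing both expectations as integrals over $x$, a sample from the real data space, gives
$$V(G, D) = \int_x \left[ P_{data}(x) \log D(x) + P_G(x) \log(1 - D(x)) \right] dx. \tag{3}$$
At this time, because the data and generator have been given, $P_{data}(x)$ and $P_G(x)$ can be regarded as constants. Replacing them with the constants $a$ and $b$, the integrand becomes
$$f(D) = a \log D + b \log(1 - D). \tag{4}$$
Setting $f'(D) = 0$ gives the maximum point:
$$D^*(x) = \frac{P_{data}(x)}{P_{data}(x) + P_G(x)}. \tag{5}$$
The second step is to fix the discriminator D. At this time, the optimization function for the generator G is
$$G^* = \arg\min_G V(G, D^*). \tag{6}$$
Furthermore, substituting $D^*$ into $V(G, D)$ gives the quantity minimized by the optimal generator:
$$V(G, D^*) = \mathbb{E}_{x \sim P_{data}}\left[\log \frac{P_{data}(x)}{P_{data}(x) + P_G(x)}\right] + \mathbb{E}_{x \sim P_G}\left[\log \frac{P_G(x)}{P_{data}(x) + P_G(x)}\right] = -\log 4 + 2\,\mathrm{JSD}(P_{data} \,\|\, P_G), \tag{7}$$
which attains its minimum when $P_G = P_{data}$. In practical training, the discriminator D is usually trained first.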
Then, the discriminator D is fixed and the generator G is trained. Next, we fix G again and train the discriminator D, performing iterative optimization training until P_data = P_G, at which point the global optimum is achieved.
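The role of the optimal discriminator can be checked numerically. The following is a minimal sketch, assuming small discrete distributions in place of image distributions; the example probabilities and the function names are illustrative and do not come from the paper.

```python
import math

def value_function(p_data, p_g, d):
    """V(G, D) for discrete distributions: sum over x of
    p_data(x) * log D(x) + p_g(x) * log(1 - D(x))."""
    return sum(
        pd * math.log(dx) + pg * math.log(1.0 - dx)
        for pd, pg, dx in zip(p_data, p_g, d)
    )

def optimal_discriminator(p_data, p_g):
    """D*(x) = p_data(x) / (p_data(x) + p_g(x)) for a fixed generator."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_g)]

# Illustrative (hypothetical) distributions over three outcomes.
p_data = [0.1, 0.4, 0.5]
p_g = [0.3, 0.3, 0.4]

d_star = optimal_discriminator(p_data, p_g)
v_star = value_function(p_data, p_g, d_star)

# Any other discriminator gives a smaller value of V for the same generator.
d_other = [dx + 0.05 for dx in d_star]
v_other = value_function(p_data, p_g, d_other)

# When the generator matches the data exactly, D* = 1/2 and V = -log 4.
v_equal = value_function(p_data, p_data, optimal_discriminator(p_data, p_data))

print(v_star, v_other, v_equal)
```

In this sketch, `v_star` stays above `-log 4` as long as the two distributions differ, which is the gap that generator training tries to close.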

Design of the Structure of the DCGAN.
Compared with traditional GANs, the salient feature of the DCGAN is that a CNN is used to replace the multilayer perceptron. The pooling layer and sampling layer are removed in the CNN model.
The convolution layer is used to discriminate the image in the discriminator, and the deconvolution layer is used to generate the image in the generator. The specific structure of the DCGAN generator is as follows: the input layer is followed by a batch normalization layer (which can hasten the convergence of the model), and the reshaping layer is used to reshape the preliminary data; then, an upsampling layer, a Conv2DTranspose layer, and a batch normalization layer are used to upsample, deconvolve, and normalize the data, respectively. In this paper, the DCGAN adds three groups of the above structures to increase the depth of the network. The main framework of the network architecture of the generator is shown in Figure 1.
In this paper, to solve the problem in which gradients vanish easily, the LeakyReLU activation function is used in all layers of the discriminator, and the Tanh activation function is used in the output layer of the generator. The definition of this function is
$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}. \tag{8}$$
For the generator, except for the activation function of the last layer, the ReLU activation function is used, and its definition is
$$\mathrm{ReLU}(x) = \max(0, x). \tag{9}$$
In addition, the concrete structure of the DCGAN discriminator is a Conv2D layer (2D convolution layer), a batch normalization layer, and a dropout layer (after the image is convolved, it is normalized, and the dropout layer is added to increase the generalization ability of the model). These three layers form a group, and four groups are added. Finally, a flattening layer and a fully connected layer are used to flatten the data and output the probability of whether it is sample data or generated data. Except for the last layer, the activation function of all layers is the LeakyReLU function. The definition of this function is
$$\mathrm{LeakyReLU}(x) = \begin{cases} x, & x > 0, \\ a x, & x \le 0, \end{cases} \tag{10}$$
where $x$ is the input to the activation and $a$ is the slope parameter for negative inputs. The activation function of the last layer is the sigmoid function. The definition of this function is
$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}. \tag{11}$$
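The four activation functions named above can be written out directly. This is a minimal scalar sketch; the default slope `a=0.2` is an assumption for illustration, since the paper does not state the value it uses.

```python
import math

def tanh(x):
    """Tanh, used in the generator output layer; output lies in (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """ReLU, used in the other generator layers; zero for negative inputs."""
    return max(0.0, x)

def leaky_relu(x, a=0.2):
    """LeakyReLU, used in the discriminator; a is an assumed slope value."""
    return x if x > 0 else a * x

def sigmoid(x):
    """Sigmoid, used in the discriminator output; output lies in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# For a negative input, ReLU outputs zero (and has zero gradient), while
# LeakyReLU keeps a small nonzero response, which is why it is preferred
# in the discriminator when gradients tend to vanish.
for value in (-2.0, 0.0, 2.0):
    print(value, tanh(value), relu(value), leaky_relu(value), sigmoid(value))
```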

Construction of the Image Generation Models Based on the DCGAN and GANs
3.1. Data Sources. This paper uses the MNIST dataset, which is free of charge and open source, consists of small images, and includes a large number of samples [24]. It is composed of digits (0-9, a total of 10 digits) handwritten by 250 different people; it is a relatively mature benchmark in image processing and image quality processing, as shown in Figure 2.
To make the dataset representative, 50% of the data in the dataset are from high school students, and 50% are from Census Bureau staff. At the same time, this paper uses the Keras framework with TensorFlow as the back end.
The Keras framework is an open source artificial neural network library written in Python.
Its code is written in an object-oriented style and is completely modular and extensible, which makes it suitable as the implementation framework for the code in this experiment.

Determination of the DCGAN Model Parameters.
The collected basic data are used in the DCGAN model for experimental research, and the network parameters are determined through continuous testing.
The configuration is as follows: the model generator accepts 1 × 100 random, normally distributed noise data. Considering that the image generated by deconvolution must be 28 × 28 pixels (i.e., the size of the MNIST images), the input layer is set as a fully connected layer, and the number of neurons is 7 × 7 × 256. After that, a batch normalization layer and a reshaping layer are added, and then the pixel size of the data is expanded through the upsampling layer. Using a 5 × 5 convolution kernel, the Conv2DTranspose layer uses the "same" border mode to preserve the convolution results at the boundary, so that the input and output dimensions are the same. An upsampling layer is added after the first module to make the pixel size of the output image 28 × 28. The activation function in these layers is the ReLU function, and the last layer uses the Tanh function.
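The shape arithmetic behind this configuration can be traced with a short sketch. It assumes two 2× upsampling steps (7 → 14 → 28) and stride-1 transposed convolutions with "same" padding; the filter counts 128, 64, and 1 are hypothetical, since only the 7 × 7 × 256 input layer and the 28 × 28 output are stated in the text.

```python
def dense_units(height, width, channels):
    """Number of neurons the fully connected input layer needs."""
    return height * width * channels

def upsample(shape, factor=2):
    """Upsampling multiplies the spatial size and keeps the channels."""
    height, width, channels = shape
    return (height * factor, width * factor, channels)

def conv_same(shape, filters):
    """A stride-1 (transposed) convolution with "same" padding keeps the
    spatial size; only the channel count changes."""
    height, width, _ = shape
    return (height, width, filters)

def trace_generator(filters=(128, 64, 1)):
    """Return the dense layer size and the feature-map shape after each stage.
    The filter counts are assumed values for illustration."""
    units = dense_units(7, 7, 256)
    shape = (7, 7, 256)                              # reshape of the dense output
    shapes = [shape]
    shape = conv_same(upsample(shape), filters[0])   # 14 x 14
    shapes.append(shape)
    shape = conv_same(upsample(shape), filters[1])   # 28 x 28
    shapes.append(shape)
    shape = conv_same(shape, filters[2])             # single-channel image, Tanh
    shapes.append(shape)
    return units, shapes

units, shapes = trace_generator()
print(units)
for stage in shapes:
    print(stage)
```

Because "same" padding with stride 1 leaves the spatial size unchanged, only the upsampling layers control the final resolution in this sketch.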
In the discriminator, a Conv2D layer (2D convolution layer), a batch normalization layer, and a dropout layer are added as a module, and four modules are added. Finally, a flattening layer and a fully connected layer are used as the back end. The Conv2D layer uses a 5 × 5 convolution kernel, the boundary mode of the convolution layer is the same as that in the generator, and the activation function is the LeakyReLU function. The dropout layer in the module causes the activation value of each neuron to be dropped with a certain probability p during forward propagation. This makes the model independent of some local features and enhances its generalization ability. The last activation function is the sigmoid function, which outputs the probability of discrimination, that is, the probability that the discriminator considers the image to be a real image and not a generated image.
The discriminator uses the Adam optimizer with a learning rate of 0.0002, and the overall GAN model uses the RMSprop optimizer with a learning rate of 0.0001.
The RMSprop optimizer uses an exponential moving average of the squared gradient to adjust the learning rate. It can converge effectively under an unstable objective function and yields good results with the DCGAN model. The batch size is 32, and the DCGAN is trained for 10,000 iterations. The binary cross-entropy function is selected as the loss function.
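The RMSprop rule described above can be sketched for a single scalar parameter. The learning rate 0.0001 is taken from the text; the decay `rho=0.9` and `eps=1e-7` are assumed values, and the quadratic objective is only a stand-in for the real loss.

```python
import math

def rmsprop_step(param, grad, avg_sq, lr=0.0001, rho=0.9, eps=1e-7):
    """One RMSprop update: keep an exponential moving average of the squared
    gradient and divide the step by its square root."""
    avg_sq = rho * avg_sq + (1.0 - rho) * grad * grad
    param = param - lr * grad / (math.sqrt(avg_sq) + eps)
    return param, avg_sq

# Toy objective f(w) = w^2 with gradient 2w (hypothetical stand-in for the loss).
w, avg_sq = 1.0, 0.0
for _ in range(100):
    w, avg_sq = rmsprop_step(w, 2.0 * w, avg_sq)

print(w, avg_sq)
```

Dividing by the running root-mean-square of the gradient keeps the step size close to the learning rate even when the gradient magnitude changes, which is the property the text relies on for an unstable objective.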
The training process of the DCGAN is slightly different from that of the GANs because it takes more time to train the discriminator of the DCGAN model. First, when training the discriminator, the input data size is 2 × batch, where the input data contain the real data and generated data from one batch; second, the combined data of size 2 × batch are used as the input data to train the discriminator 5 times; finally, the whole GAN model is trained once with the random noise data from one batch as the input data, and only the generator is updated. A complete training cycle is a batch (epoch) in which the ratio of discriminator training iterations to generator training iterations is 5 : 1 to realize the alternating iterative training process. The main parameter configuration of the DCGAN is shown in Table 1.
During training of the GANs, the discriminator is trained once, and the input data are half real and half generated; i.e., the input dataset is composed of one batch of real data and one batch of generated data. This combined dataset is used as input data to train the discriminator once. Then, the whole GAN model is trained once with a batch of random noise data as input, and only the generator is updated. Such a training cycle is a batch (epoch), which is repeated to realize the alternating iterative training process.
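The two alternating schedules can be summarized in a short sketch. The training callbacks are placeholders that only count updates, and the function names are illustrative; the batch size of 32 and the 5 : 1 and 1 : 1 ratios follow the text.

```python
import random

def make_discriminator_batch(real_data, generate, batch_size):
    """Build the 2 x batch discriminator input: one batch of real samples
    (label 1) followed by one batch of generated samples (label 0)."""
    real = random.sample(real_data, batch_size)
    fake = [generate() for _ in range(batch_size)]
    return real + fake, [1] * batch_size + [0] * batch_size

def run_schedule(iterations, d_steps, train_discriminator, train_generator):
    """Alternate d_steps discriminator updates with one generator update."""
    for _ in range(iterations):
        for _ in range(d_steps):
            train_discriminator()
        train_generator()

def count_updates(iterations, d_steps):
    """Run the schedule with placeholder callbacks that only count updates."""
    counts = {"discriminator": 0, "generator": 0}

    def train_discriminator():
        counts["discriminator"] += 1

    def train_generator():
        counts["generator"] += 1

    run_schedule(iterations, d_steps, train_discriminator, train_generator)
    return counts

# Placeholder data: integers stand in for real images, -1 for a generated one.
samples, labels = make_discriminator_batch(list(range(1000)), lambda: -1, 32)
dcgan_counts = count_updates(iterations=10, d_steps=5)  # DCGAN schedule
gan_counts = count_updates(iterations=10, d_steps=1)    # GANs schedule
print(len(samples), dcgan_counts, gan_counts)
```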

Experimental Results of the Image Generation Models Based on the GANs and DCGAN.
There are two experimental groups in the training experiments of the image generation models based on the GANs and DCGAN. Each experimental group is divided into Experiment 1 (composed of digits 0-9, a total of 10 digits, handwritten by 250 people) and Experiment 2 (only using the number "6" in the dataset). The experiments within the same experimental group are conducted to observe the learning effects of a network under different image distribution complexities and its image generation quality; the experiments across different experimental groups using the same dataset are conducted to compare the advantages and disadvantages of the GANs and DCGAN with regard to image generation. The experiments covered in this article use the following hardware: CPU frequency: 2.5 GHz, memory capacity: 16 GB, graphics chip: NVIDIA GeForce RTX 3070, and hard disk capacity: 512 GB.
By observing the output at each checkpoint, we can find that the original noise data are disordered. After 1K iterations, the image is gradually regionalized, and the only content is in the central area. However, most of the digit contours cannot be clearly recognized, and obvious characteristics of the deconvolution layer can be found. The image learning process proceeds by regions and features rather than by individual pixels, similar to the process of the GANs. After training 5K times, the lines gradually become clearer, and the numbers can be identified but not recognized clearly. After training 10K times, each number can be clearly identified, clear images are generated, and the complex image distribution is successfully fitted.
From the loss curves of the discriminator (Figure 4(a)) and the generator (Figure 4(b)), it can be seen that the discriminator loss stabilizes at approximately 0.5 after very few batches. By the 1K batch, the loss of the generator decreases to 2, then approaches 1 slowly, and finally stabilizes at approximately 1.5, although a downward trend remains. Because of the limits of this hardware configuration, the experiment could not be continued; even though the Nash equilibrium point is not reached, the quality of the output image is still very good, and this shows that the DCGAN can obtain better experimental results than those of other methods under the premise of satisfying the computational power requirements.
(2) Experiment 2. Run the same training process as in Experiment 1 in the local environment for 10 h; the results are shown in Figure 5, in which Figures 5(a)-5(d) are the original noise image, the results after training 1K times, the results after training 5K times, and the results after training 10K times, respectively. By observing the output of each checkpoint, it can be found that the original noise is the same as in Experiment 1. After 1K iterations, some instances of the number "6" can be identified, such as the first number from the left in the second row and the fourth number from the left in the third row; most instances of the number "6" can be recognized after 5K iterations. All the numbers can be recognized after 10K iterations, but the numbers are not standardized, mainly because the MNIST dataset contains handwritten numbers from adults and children.
According to the loss images of the discriminator (Figure 6(a)) and generator (Figure 6(b)), the discriminator D starts to fluctuate at approximately 0.5 at 1000 epochs (the ratio of the training times of the discriminator and generator is 5 : 1); the generator also starts to stabilize at approximately 2 at 1K epochs, gradually converges to 1, and finally fluctuates at approximately 1 with a small fluctuation range.
Through the comparative training processes of the two experiments, it is found that the DCGAN model can adapt to both complex image distributions and simple image distributions, and it can converge earlier than other methods. Its learning pattern is feature-based and regional. At the beginning, it concentrates on the central area, then learns the line features, and finally learns the location characteristics of the lines.

Training Results of the Image Generation Model Based on GANs
(1) Experiment 1. Set checkpoints in the training process, run the training process for 5 h in the experimental environment, and observe the output results once every 500 training iterations.
The results are shown in Figure 7, in which Figures 7(a)-7(d) are the original noise image, the results after training 10K times, the results after training 50K times, and the results after training 100K times, respectively. By observing the output at each checkpoint, we can find that the original noise data are disordered. After 10K iterations, the image is gradually focused in the central area rather than at scattered points in each position. After 50K iterations, some figures have a preliminary outline, indicating that the generator is gradually learning the image distribution of the original dataset. Continuing to train for a total of 100K iterations, one can find that most of the figures are clear and distinguishable. This situation lasts nearly 20K batches during the training process, indicating that the GANs maintain stability for a long time but only reach a pseudo-Nash equilibrium point. It is found that adding a hidden layer does not affect the experimental results but rather increases the experimental time. By observing the loss curves of the discriminator (Figure 8(a)) and generator (Figure 8(b)), it is not difficult to see that the loss of the generator decreases rapidly over the first 10K batches, then gradually approaches 1, fluctuates greatly between 30K and 70K iterations, and finally stabilizes near 1. This shows that the image generated by the generator can make the discriminator think that it is real. However, the discriminator loss gradually becomes stable at approximately 0.5 starting with the 15000th batch, and then it fluctuates up and down. As the number of batches increases, the fluctuation range does not decrease, and the overall loss is slightly less than 0.5, indicating that the discriminator has a high probability of correct discrimination (recognition of the real image). After 100K iterations, the Nash equilibrium point is still not reached, and the model cannot converge further.
(2) Experiment 2. The running time of the training process is 5 h in the local environment, and the results are shown in Figure 9, where Figures 9(a)-9(d) are the original noise image, the results after training 10K times, the results after training 50K times, and the results after training 100K times, respectively. The original noise is the same as in Experiment 1, and the learning process of the GANs can still be seen in (b) and (c), but the speed of the pixel setting is much higher than that in Experiment 1. Comparing Figure 9(d) with Figure 7(d) of Experiment 1, it can be found that after 100K iterations, the GANs can effectively fit the simple image distribution. The second digit from the left in the second row of Figure 7(d) is a generated "6", but it is not as good as the result of learning only "6". In Figure 9(d) of Experiment 2, all the numbers "6" can be clearly identified.
Comparing the loss curves of the discriminator (Figure 10(a)) and generator (Figure 10(b)), it is not difficult to find that the generator loss rapidly decreases over the first 20K batches, gradually approaches 1, and stabilizes near 1.
This shows that the image gradually generated by the generator can make the discriminator think it is real. From the 20000th batch onward, the discriminator loss gradually stabilizes at approximately 0.5 and gradually reduces its fluctuation range, but the overall loss is slightly higher than 0.5, which indicates that the discriminator cannot correctly judge whether the generated image is real, and the loss curve still cannot converge at 0.5; that is, it cannot reach the real Nash equilibrium point.

Construction of the Evaluation Index for Image Generation.
When comparing and analyzing the quality of images generated by models, two factors are generally considered: one is the texture of the images, and the other is the diversity of image generation. At present, the popular quantitative evaluation method for image texture is the inception score (IS). In this paper, the simple inception score (ISs) is used to evaluate image quality, and the complex inception score (ISc) is used to evaluate image diversity.

Image Quality Assessment.
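The calculation formula of the ISs evaluation index using the simple inception score method is
$$\mathrm{IS_s}(G) = \exp\left( \mathbb{E}_{x \sim p_G} \left[ D_{KL}\big( p(y|x) \,\|\, p(y) \big) \right] \right), \tag{12}$$
where $x \sim p_G$ represents the image generated by the generator to be evaluated, $p(y|x)$ represents the probability that the picture belongs to each category, and $p(y)$ is the marginal class distribution over the images to be evaluated. The following formula can be obtained by further derivation:
$$\ln \mathrm{IS_s}(G) = \mathbb{E}_{x \sim p_G} \left[ \sum_{y} p(y|x) \ln \frac{p(y|x)}{p(y)} \right] = H(y) - H(y|x). \tag{13}$$
It can be seen from (13) that the larger the ISs evaluation index, the greater the differences between the $p(y|x)$ and $p(y)$ distributions.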
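As an illustration of how such a score behaves, the following minimal sketch computes the standard inception score from a list of class-probability vectors p(y|x). It assumes the classifier outputs are already available as plain Python lists; the three test cases are synthetic and are not results from the paper.

```python
import math

def inception_score(cond_probs):
    """exp of the mean KL divergence between each p(y|x) and the marginal p(y).
    cond_probs is a list of per-image class-probability vectors."""
    n = len(cond_probs)
    k = len(cond_probs[0])
    # Marginal p(y): average of the conditional distributions over all images.
    marginal = [sum(row[j] for row in cond_probs) / n for j in range(k)]
    mean_kl = 0.0
    for row in cond_probs:
        # Terms with p = 0 contribute nothing to the KL divergence.
        mean_kl += sum(p * math.log(p / q) for p, q in zip(row, marginal) if p > 0)
    return math.exp(mean_kl / n)

def one_hot(index, k=10):
    return [1.0 if j == index else 0.0 for j in range(k)]

# Confident predictions spread evenly over 10 classes: the best case.
sharp_and_diverse = [one_hot(i % 10) for i in range(100)]
# Confident predictions that all fall in one class: no diversity.
collapsed = [one_hot(0) for _ in range(100)]
# Completely uncertain predictions: no recognizable content.
blurry = [[0.1] * 10 for _ in range(100)]

print(inception_score(sharp_and_diverse))
print(inception_score(collapsed))
print(inception_score(blurry))
```

The score is large only when each p(y|x) is sharply peaked and the marginal p(y) is spread out, which is why a single number mixes quality and diversity.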
If only the ISs comprehensive analysis method is used, there may be a large error. In this paper, a concept classification network is proposed to eliminate the error as much as possible. In the MNIST dataset, there are 10 numbers from 0 to 9, and each image is a black and white image (the values are composed of 0s and 1s). For any two random variables X and Y, the correlation coefficient is calculated as follows:
$$\rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}}, \tag{14}$$
where Cov(X, Y) in (14) is the covariance of X and Y, Var[X] is the variance of X, and Var[Y] is the variance of Y.
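The correlation coefficient can be computed directly from its definition. This is a minimal sketch that treats two binary images as flattened lists of 0s and 1s; the tiny 8-pixel "images" are made up for illustration.

```python
import math

def correlation(xs, ys):
    """Pearson correlation: Cov(X, Y) / sqrt(Var[X] * Var[Y])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / math.sqrt(var_x * var_y)

# Hypothetical flattened binary images (0 = background, 1 = stroke).
image_a = [0, 1, 1, 0, 1, 0, 0, 1]
image_b = [1 - pixel for pixel in image_a]   # inverted copy of image_a
image_c = [0, 1, 1, 0, 1, 0, 1, 0]           # differs from image_a in two pixels

print(correlation(image_a, image_a))
print(correlation(image_a, image_b))
print(correlation(image_a, image_c))
```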
According to the principle of the concept classification network, it is not difficult to see that if the correlation between images is weak, the classifier easily divides them into two categories; otherwise, it more easily assigns them to one category. When considering the diversity of images, we should consider the distribution of the labels. If the image distribution is complex, we naturally want the labels to be evenly distributed instead of generating only a certain kind of image. For example, when all the numbers in the MNIST dataset are used, a good situation is that the generated 0s-9s are evenly distributed, rather than the distribution containing more of certain numbers than others. In this case, we need to consider the marginal probability p(y). In the ideal state, p(y_1) = p(y_2) = ... = p(y_n) = 1/n, where n is the number of classes in the original training data; the greater the entropy of p(y), the better the situation.
If the same kind of data is used, that is, a single number is used, although these data would be highly correlated and belong to the same class, because of the characteristics of the concept classification network, the set would still be divided into several categories, but lim p(y_1) = 1 and lim p(y_i) = 0, where i = 2, 3, ..., n. At this time, n is the number of classification categories for the concept classification network.
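The two extreme cases for the marginal p(y) can be checked numerically. This is a minimal sketch assuming Shannon entropy with the natural logarithm and n = 10 classes; the near-collapsed distribution is a synthetic example.

```python
import math

def entropy(probs):
    """Shannon entropy H(p) = -sum p_i * ln p_i, skipping zero entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)

n = 10
# Evenly distributed labels: p(y_i) = 1/n for every class.
uniform = [1.0 / n] * n
# Almost all mass on the first class: p(y_1) -> 1 and p(y_i) -> 0 for i >= 2.
eps = 1e-9
near_collapsed = [1.0 - (n - 1) * eps] + [eps] * (n - 1)

print(entropy(uniform), math.log(n))
print(entropy(near_collapsed))
```

The uniform case reaches the maximum entropy ln n, while the single-digit case drives the entropy toward 0.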

Image Diversity Assessment.
The complex inception score (ISc) is based on the simple inception score method. According to the image classification value of the simple distribution, the entropy of the simple image distribution under discrete conditions is calculated, where H in (15) represents entropy. The marginal probability value of the simple image distribution is included when i = 1, and (16) is used to calculate the limit. When i = 2, 3, ..., n, lim p(y_i) = 0+. Thus, the entropy limit of the concept classification network is obtained when the image distribution tends to be simple, i.e., lim H(p(y)) = 0 in (17). Then, the formula of the ISc evaluation index is given by (18), where H(y|x) in (18) represents the probability that an image belongs to a certain category. The higher the value, the higher the image quality. Therefore, when the image distribution is simple, the ISc under a complex distribution can be used to comprehensively evaluate the diversity of the images. The diversity of image generation can be assessed by combining the two experiments with a comprehensive ISc analysis. The two groups of experiments discuss image generation under different complexities of the image distribution. If the ISc value can still reach a high value under the condition of a complex image distribution, this shows that the diversity of image generation is very high.
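The limit argument above can be sketched as follows, assuming that H denotes the Shannon entropy of the marginal label distribution p(y). This is a reconstruction under that assumption and may differ in form from the paper's own equations (15)-(17).

$$H(p(y)) = -\sum_{i=1}^{n} p(y_i) \ln p(y_i)$$

$$\lim_{p(y_1) \to 1} p(y_1) \ln p(y_1) = 0, \qquad \lim_{p(y_i) \to 0^{+}} p(y_i) \ln p(y_i) = 0 \quad (i = 2, 3, \ldots, n)$$

$$\Rightarrow \quad \lim H(p(y)) = 0$$

Every term of the sum vanishes in the limit, so a simple (single-class) image distribution contributes no label entropy, and any remaining score reflects only how confidently each image is classified.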

Comparative Analysis of Experimental Results.
According to the image quality evaluation method proposed in Section 4.1, two groups of experiments are compared and analyzed. For the MNIST dataset, first, 10K groups of images are generated by the generator, all data are divided into 10 pieces, and the score is calculated on each piece and then averaged. Second, the concept classification network is built to calculate the ISc and ISs of the GANs and the DCGAN, respectively. The experimental results are shown in Table 3. The first experiment is conducted to evaluate the image diversity, and the second is performed to evaluate the image quality.
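The split-and-average procedure can be sketched as below. The scoring function here is a simple placeholder (the mean of each piece), and the integer list stands in for the 10K generated images; in the actual evaluation the score would be the ISs or ISc of each piece.

```python
def split_average(items, n_splits, score_fn):
    """Divide items into n_splits equal pieces, score each piece,
    and return the mean score together with the per-piece scores."""
    size = len(items) // n_splits
    scores = [score_fn(items[i * size:(i + 1) * size]) for i in range(n_splits)]
    return sum(scores) / n_splits, scores

def placeholder_score(piece):
    """Hypothetical stand-in for the real metric: the mean of the piece."""
    return sum(piece) / len(piece)

generated = list(range(10000))   # stand-in for 10K generated images
mean_score, piece_scores = split_average(generated, 10, placeholder_score)
print(mean_score, len(piece_scores))
```

Averaging over pieces also makes it possible to report the spread of the per-piece scores, which gives a rough sense of how stable the metric is.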
According to the experimental results in Table 3, in terms of image quality, the ISs value of the DCGAN model is 2.02 higher than that of the GANs model; in terms of image generation diversity, the ISc value of the DCGAN model reaches 6.10, which is 1.55 higher than that of the GANs model. The results show that the improved DCGAN model has more advantages than the GANs model, can effectively solve the problem of low-quality images being output by the GANs model, and achieves good results.

Discussion and Conclusions
Based on the GANs model and the improved DCGAN model, this paper uses the MNIST dataset as experimental data for experiments on the algorithms and evaluates the quality and diversity of the generated images based on the ISs and ISc metrics: (1) This paper compares and analyzes the different performances of traditional GANs and the DCGAN in two groups of experiments. The DCGAN is a model based on a combination of GANs and a convolutional neural network. The fully connected layer is replaced by a convolution layer and deconvolution layer.
The structure of the DCGAN layers is redesigned: an upsampling layer is used in the output layer of the generator to expand the data, and a dropout layer is added in each layer of the discriminator. To solve the problem that the gradient easily vanishes in GANs, the generator output layer uses the Tanh function; the ReLU function is used in the other layers of the generator, and the LeakyReLU function is used in all layers of the discriminator. From the two groups of comparative experiments, it can be concluded that the images of the GANs model are generated column by column because of the flattening layers and reshaping layers.
The image generation method using the DCGAN constructed in this paper generates regional and feature-based images by using a Conv2DTranspose layer (two-dimensional transposed convolution layer), which fundamentally solves the problem of low-quality images being generated by the GANs.
(2) In this paper, through the optimized DCGAN model, it is shown that the variables of a simple image distribution tend to be independent of each other, thereby overcoming the error caused by the independence of traditional indexes. The ISs value under the simple image distribution is taken as the image quality evaluation result, and the ISc value of the complex image distribution is combined with it to comprehensively evaluate the diversity and quality of the generated image.
Through two groups of experiments, it can be concluded that the image quality evaluation index ISs of the DCGAN model is 6.82, which is 2.02 higher than that of the GANs model with the same image distribution. The image diversity index ISc of the DCGAN is 6.10, while that of the GANs is only 4.55.
The reason for this is that the DCGAN has more advantages than the GANs in terms of its model framework and model detail parameters.

3.3. Determination of the Parameters of the GAN Model. The noise data received by the generator are the same as above, and a basic multilayer perceptron is built from three modules, each consisting of a fully connected layer, a LeakyReLU layer, and a batch normalization layer; the model ends with a fully connected layer and a reshaping layer. The activation function of the last layer is the Tanh function. The discriminator uses a flattening layer to flatten the data and then adds two fully connected layers. The activation function is also the LeakyReLU function, and the last activation function is the sigmoid function, which outputs the discrimination probability. Both the GANs and the discriminator use Adam as their optimizer, and the learning rate is 0.002. To calculate the update step size, the Adam optimizer comprehensively considers the first-order moment estimate (the average value of the gradient) and the second-order moment estimate (the uncentered variance of the gradient). The batch size is 32. Because GANs converge slowly, the model is trained for 100K iterations to avoid falling into local optimal solutions. The binary cross-entropy function is selected as the loss function. The main parameter configuration of the GANs is shown in Table 2.
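The Adam update mentioned above can be sketched for a single scalar parameter. The learning rate 0.002 follows the text; `beta1=0.9`, `beta2=0.999`, and `eps=1e-8` are the commonly used defaults and are assumptions here, since the paper does not list them.

```python
import math

def adam_step(param, grad, m, v, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using the first-moment estimate m (mean of the gradient)
    and the second-moment estimate v (uncentered variance of the gradient).
    t is the 1-based step counter used for bias correction."""
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    m_hat = m / (1.0 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)   # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective f(w) = w^2 with gradient 2w (hypothetical stand-in for the loss).
w, m, v = 1.0, 0.0, 0.0
first_w, _, _ = adam_step(w, 2.0 * w, m, v, t=1)
for t in range(1, 101):
    w, m, v = adam_step(w, 2.0 * w, m, v, t)

print(first_w, w)
```

With bias correction, the very first step has a magnitude close to the learning rate regardless of the gradient scale, which is visible in `first_w`.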

Table 1: Main parameters of the DCGAN.

Table 2: Main parameters of the GANs.

Table 3: Experimental results of the image quality comparison.