Digital Image Art Style Transfer Algorithm Based on CycleGAN

With the continuous development and popularization of artificial intelligence technology in recent years, the field of deep learning has also developed relatively rapidly. The application of deep learning technology has attracted attention in image detection, image recognition, image recoloring, and image artistic style transfer. Some image art style transfer techniques with deep learning as the core are also widely used. This article intends to create an image art style transfer algorithm to quickly realize the image art style transfer based on the generation of confrontation network. The principle of generating a confrontation network is mainly to change the traditional deconvolution operation, by adjusting the image size and then convolving, using the content encoder and style encoder to encode the content and style of the selected image, and by extracting the content and style features. In order to enhance the effect of image artistic style transfer, the image is recognized by using a multi-scale discriminator. The experimental results show that this algorithm is effective and has great application and promotion value.


Introduction
e specific characteristics of artistic schools are mainly artistic styles, and different styles also present different cultural connotations and backgrounds. Image art style transfer is mainly the processing of digital images, which refers to the method of forming new image features by fusing the features of different images with each other.
ere are more and more achievements in the field of deep learning. For example, in 2015, scholar Gatys proposed a method of image art style transfer, through image reorganization, segmentation, and other methods, to generate a confrontation network model. It has also become particularly applicable in image processing, which has attracted the attention of many scholars. e main principle is to improve the performance of neural networks through adversarial training. In order to ensure the effectiveness and operability of the network, the applicability of the model is ensured by adjusting the introduction mechanism of the generative confrontation network and adjusting the frame result. is paper proposes a new artistic style based on the generative confrontation network model, through the extraction of image features and the transfer of image artistic styles. is article focuses on how to finally achieve the artistic visual effects that need to be presented based on the generative confrontation network model. Some problems that can be improved are discovered, and then verified by experiments, practical solutions are proposed, which have certain practical application value and important research significance.

Related Work
Deep neural networks contain one or more hidden layers of neural network structures, which have good generalization capabilities for complex problems. In theory, deep learning networks can fit arbitrary complex functions and show a powerful ability to learn sample features. In deep learning, CNN is the most commonly used deep neural network for processing computer vision tasks. By sharing the convolution kernel, the CNN can process high-dimensional data, and as long as the training parameters are set, the features in the image can be extracted through layered convolution. erefore, deep neural networks based on CNN are good at learning and extracting the features and special information contained in images, such as texture features, shape features, boundary features, grayscale features, topological features, and relational structures. e extraction and utilization of these features or information make the application of image processing further enriched and expanded.
Different experts and scholars have carried out different researches on image art style transfer processing. Among them, Sun et al. [1]proposed a CNN architecture with two paths, by extracting the object features and textures of the image, respectively. Features realize image style recognition and output image category. Zhao et al. proposed a [2]highdefinition image deep learning training algorithm. e method is mainly divided into three steps: (1) break the traditional method of extracting image features, by extracting the image features of high-and low-resolution images, which can greatly save training time; (2) establish a high-resolution deep learning model based on the dictionary model; (3) transform low-resolution images into highresolution images. Lu et al. [3][4][5] believed that the current deep learning model can no longer adapt to the aggregation of multiple images, and proposed a multi-image training model based on an improved neural network. Its framework is mainly to integrate the feature extraction and aggregation algorithms so that the image can extract multiple image features. Castillo et al. proposed constructing a model of artistic target transfer, its principle is mainly to carry out artistic transfer and fusion of a single image object. Regarding the artistic style transfer of dynamic video, Ruder et al. improved on the basis of Gatys, Gatys mainly transfers the artistic style of still images, and Ruder proposes to extract the artistic style of the video image, and extract and transfer the artistic style of the entire video. With the continuous extension and development of image processing methods and technologies in recent years, some creative thinking and methods have also been continuously developed. is paper proposes a generative confrontation network model, which encodes the style and content of the image based on the residual network structure and the encoding and decoding structure. e most important part of network decoding is the generative network, which uses the encoded information to generate the image of artistic style transfer, discriminate the network, make the image effect of the generated network better, and the image more vivid and lifelike, thereby improving the transfer effect of the entire image art style [6][7][8][9][10].

e Algorithm Flow of Generating a Confrontation Network
GAN is a powerful generative model. e model is mainly composed of two parts; First, the generative model, which is mainly to generate natural and real images as much as possible; second is the discrimination model, which mainly judges the authenticity of the image [11].

GAN Algorithm
Flow. e ultimate goal of GAN is to realize the transfer of artistic style by judging model D and generating model G. G and D are generally nonlinear mapping functions, usually in the form of a network structure such as a multilayer perceptron or a convolutional neural network change. Assuming that P g (z) � {z(1),. . ., z(n)} is a given random noise sample set that obeys a simple distribution, P data (x) � {x(1),. . .,x(n)} Is the real data sample set, G is the generating network, and D is the discriminant network (Algorithm 1) [8].

CycleGAN Algorithm.
CycleGAN can transform two different styles of images, such as the mutual conversion of photos and oil paintings, day and night, spring, summer, autumn, and winter. Assume that X and Y are data sample sets in two different fields, G is a generator generated from X field to Y field, F is a generator generated from Y field to X field, it is to identify the X field data (real X field sample). e true and false discriminator of the X field data is generated by F, and F is a discriminator for authenticating the Y field data (real Y field samples and Y field data generated by G), which is a cycle consistent loss. e CycleGAN algorithm is described as follows (Algorithm 2) [9].

Image Art Style Transfer Method.
e essence of the image to image conversion task is to learn the mapping between X and Y image domains: G XY: X ⟶ Y and G YX : Y ⟶ X. e goal of the unsupervised translation is to restore this mapping using unpaired samples in the original data. is method combines the instance information and adds a set of instance attribute information to the original image x to improve the image to image translation effect. erefore, this method can be understood as learning X × A and Y × for the joint mapping between B, each network is designed to simultaneously encode the image and the corresponding instance mask. e method is described in detail as follows: CycleGAN is selected as the basic structure, and the periodic consistency loss is used to encourage G YX (G YX (x)) ≈ x and G XY (G YX (y) ≈ y. As shown in Figure 1, it is composed of two generators g and F, and two discriminators D Y and D X . It is assumed that the given source image data set and target image data set are x i N i�1 ∈ X and y i M j�1 ∈ Y , N and M represent the data set size of the source image and the target image, respectively, X i represents the first image in the source image, and y j represents the j th image in the target image. e adversary discriminator D Y is responsible for distinguishing the original image {y} from the generated image {G( x )}, and the adversary discriminator D X is also responsible for distinguishing the original image{x} from the generated image {F (y) }.
is structure is circular, that is, converting from one domain to another and then returning. Among them, generators G and F and discriminators D Y and D X follow the structure in Figure 1 For the task of this paper, the structure is also maintained, and the whole includes two generators G XY and G YX , as well as the corresponding adversary discriminators D Y and D X . First train the generator G XY : X × A ⟶ Y, which simultaneously encodes and converts the original data (x, a) to the target domain data y′. e generator first inputs the original photo and the corresponding segmentation mask, then uses the image feature extractor to extract features from the photo and mask, respectively, connects the corresponding image with the mask features, feeds them back to the image generator, and finally outputs the transformed image. e adversary discriminator D Y : Y × B ("Y," "not Y"}, where dy encodes y and b (or y′ and b′) at the same time and then determines whether the data is in the target domain Y × B.
e results are fed back to the generator or discriminator to be improved. e specific structure is shown in Figure 2.
For generator G YX : Y × B ⟶ X, which simultaneously encodes and converts the original data (y′, b′) to the target domain data X. e generator first inputs the photo generated by G XY and its corresponding segmentation mask, then uses the image feature extractor to extract features from the photo and mask, respectively, connects the corresponding image with the mask features, feeds them back to the image generator, and finally outputs the transformed image. e adversary discriminator D X : X × A ("x," "not X"}, where DX encodes x′ and a (or x and a) at the same time, then distinguish whether the data is in the target domain X×A, and feedback the results to the generator or discriminator to be promoted. e specific structure is shown in Figure 3.
rough the continuous game and learning using the generator and discriminator, the generator can finally generate an image with the style of the target area, while keeping the background and corresponding structure position completely consistent with the original image. For our task, input the image containing the target clothing and the Figure 1: Periodic consistent generation countermeasure network model.

BEGIN FOR training iterations DO:
Extract m noise samples [z (1) ,. . ., z (m) ]from the noise data p g (z). Extract m noise samples [x (1) ,. . ., x (m) ]from the real data p data (x). Optimize the parameters of the discriminant network through stochastic gradient ascent.

BEGIN FOR training iterations DO:
Extract n sample data from the X field {x (1) , . . ., segmentation mask corresponding to the target clothing. After the whole model generation process, we can finally fix the clothing style migration task in the specific area of the target clothing, that is, complete the corresponding style migration of the target clothing while keeping the content and style of other areas unchanged. In addition, according to the target task requirements of this paper, the corresponding mask should be completely consistent before and after the image conversion, that is, the mask of the same image input generator should be the same as that in the discriminator in the whole process, which can be expressed as a � b′.

Image Art Style Transfer Based on Generative Confrontation Network
is article mainly improves the CycleGAN network to realize the artistic style transfer of images.
rough the improvement of the CycleGAN network, the effect of image art style transfer is improved on the premise of reducing the calculation resources and calculation time occupied by the algorithm. Partial optimizations have been made in all aspects of training for research purposes.
is chapter will sequentially introduce the image art style transfer network design based on the generative confrontation network, the design of the loss function, the training process and testing process of the network, the experimental environment, and the analysis of the experimental results [12][13][14].

Image Art Style Transfer Network Design Based on
Generative Confrontation Network

Encoder Design
(1) Content Coding. e new content encoder in the network structure of this article adopts the autoencoder network model. e autoencoder uses a neural network architecture.
In the self-encoder, the network part from the input layer to the coding layer is called the encoder, and the network part from the coding layer to the output layer is called the decoder [15]. e content encoder in the network structure of this article is mainly composed of the residual network. In 2016, He et al. suggested residual learning in the framework of ResNet, the input of the residual network is x, which is added to the output F(x) of the function through the shortcut operation, and the output is F(x) and x F(x) + x. e shortcut connection in the residual network increases the number of layers of the network, makes the extracted image features richer, and can effectively avoid the gradient disappearance and gradient explosion of the network [16]. e deep residual network can be effectively applied to the fields of image classification, image target detection, and image target localization. e content encoder in this article mainly uses an improved residual network, adding an instance normalization layer and a regular activation function after the convolutional layer. e schematic diagram of the residual structural unit in the content encoder is shown in Figure 4. e network structure diagram of the content encoder is shown in Figure 5.
e specific process of the content encoder is described as follows: (1) e input picture is a 256 × 256 pixel three-channel image, which is down-sampled; the first layer uses a 7 × 7 size convolution kernel, the number of convolution kernels is 64, the sliding step is 1, and the filling size is 3, then the instance normalization process is performed, and finally, the ReLU activation process is performed.
(2) e second and third layers both use 4 × 4 convolution kernels, the number of convolution kernels is 128 and 256, respectively, the sliding step size is 2 for both layers, and the filling size is 1, and then the instance is normalized, and finally the ReLU activation processing is carried out.
(3) en use 2 residual modules, there are two convolutional layers in the residual module, the number of convolution kernels in each convolutional layer is 256, the size of the convolution kernel is 3 × 3, and the sliding step size is 1 for all, the padding size is 1, then the instance normalization process is performed, and finally the ReLU activation process is performed; specifics are as shown in Table 1.
(2) Style Coding. e new style encoder in the network structure of this paper adopts the variational autoencoder network model. Variational autoencoders require the hidden vectors generated by encoding to follow a Gaussian distribution. is restriction enables the encoder to understand the underlying laws of the training data and enables the encoder to learn the hidden variable model of the input data. e ordinary autoencoder learns a certain function through training data, and the variational autoencoder can learn the probability distribution of the parameters based on the training data. e style encoding of the image adopts a variational autoencoder to increase the randomness of style features [17]. e specific implementation method of the variational autoencoder is to change the output result of the encoder from one to two in the encoding stage, and the two vectors  correspond to the mean vector and the standard deviation vector. and the hidden code vector is obtained by random sampling of the probability model [18][19][20]. e variational autoencoder can form a hidden vector of Gaussian distribution, randomly sample a hidden encoding vector from the Gaussian distribution as the style feature of the image, and then input the extracted style feature into the generation network. e network structure diagram of the style encoder is shown in Figure 6. e specific process of the style encoder is described as follows: (1) e input picture is a 256 × 256 pixel three-channel image, which is down-sampled; the first layer uses a 7 × 7 convolution kernel, the number of convolution kernels is 64, the sliding step is 1, and the filling size is 3, then proceed to the ReLU activation reason. (2) e second to fifth layers all use 4 × 4 convolution kernels, the number of convolution kernels are 128, 256, 256, 256, respectively, the sliding step size is 2, the padding size is 1, and the ReLU activation processing is performed; then the average pooling operation is carried out. (3) e last layer uses a 1 × 1 size convolution kernel to realize the function of full connection. e number of convolution kernels is 8, the sliding step is 1, no padding, and no activation function; specifics are as shown in Table 2.

Generating Network Design.
is article improves the generation network. In the process of implementing the image art style migration of the CycleGAN network, the generation network uses the U-net jump cascade network structure. e U-net performs multiple splicing in the decoding part and the encoding part, not only the previous decoding e result is used as input and the corresponding encoding part is also used as input for decoding so that the detailed information and high-level semantic information of the input image can be retained, but the U-net network structure occupies a large amount of memory, which increases the number of network calculations and the generated image. e result of artistic style transfer is rather vague [21][22][23]. In this paper, the generation network mainly uses the residual network structure, and at the same time, the improved generation networks G1 and G2 have the same network structure, but the purpose is different. e purpose of generating the network G1 is to receive the content and style features of the image, and then through the calculation of the intermediate hidden layer, so that the content of the image after the artistic style transfer remains unchanged, which improves the robustness of the network and makes the generated image more realistic. e generation network is a fully convolutional network, which uses residual structural units, up-sampling, and convolution operations to complete the artistic style transfer of the image. e residual structural unit used in the generation network is different from the residual structural unit used in the content encoder.   Computational Intelligence and Neuroscience e normalization method of the residual structural unit in the network uses adaptive normalization. e adaptive normalization method can speed up the stylization of the image and make the generated image retain a strong style. In the generation network, the content information and style information is first input into the residual network at the same time, and after 4 residual module units, specifics are as shown in Figure 7. e generative network is a convolutional neural network. e input is style features and content features, and the output is an image after artistic style transfer. e specific process is described as follows: (1) e input style coding and content coding first go through 4 residual network structures, in which each residual network structure uses a 3 × 3 size convolution kernel, the number of convolution kernels is 256, the sliding step size is 1, the padding size is 1, the normalization method uses adaptive normalization, and then performs the ReLU activation processing. (2) Use the nearest neighbor interpolation to double the size of the image, and then uses a 5 × 5 size convolution kernel. e number of convolution kernels are 128, 64, sliding step lengths are 1 for both, the padding size is 2, and the normalization method uses layer normalization, and then performs the ReLU activation processing. (3) e last layer uses a 7 × 7 convolution kernel, the number of convolution kernels is 3, the sliding step size is 1, and the padding size is 3. en perform Tanh activation function processing and output the image after the artistic style transfer. Specifics are as shown in Table 3.
As a result of artistic style transfer, some grid-like images often appear.
is is because the images of artistic style transfer are generally low-scoring and have high-level semantic information. After the features of the image are extracted through the convolutional neural network, the features are combined to complete the artistic style transfer of the image.

Discriminant Network Design.
e discriminant network in the CycleGAN network uses a convolutional neural network to extract the features of the entire image and then based on the extraction to distinguish the features. However, this method is easy to ignore the detailed information of the image, resulting in problems such as blurring and loss of details in the generated image. is paper uses the multiscale discriminant network structure in the construction of the discriminant network. e multiscale discriminant network structure is an improvement of the block discriminant network structure. e block discrimination network structure is to discriminate image blocks, and finally, take the average value as the final discrimination result of the discrimination network. Compared with discriminating the entire image, the block discrimination network makes the generated image closer to reality. e multiscale discriminant network combines multiple block discriminant networks to enhance the discriminative   Computational Intelligence and Neuroscience ability of the discriminant network. In this way, this article promotes the generated images to be more realistic, thereby improving the quality of the generated images. It is judged that the network structure of networks D1 and D2 is the same. In order to accelerate the convergence speed of the network and improve the generalization performance of the model, to make the image of the artistic style transfer more lifelike, the multiscale discriminator used in this article is composed of two identical discriminators, and each discriminator model has the same network structure. Discrimination, using the discriminant results of image blocks of different sizes as the final output, thereby improving the discriminative ability of the discriminant network. e discriminant network used in this paper is shown in Figure 8.
In the final analysis, the discriminant network is a convolutional neural network composed of a convolutional layer and a fully connected layer. e main principle is to compare and analyze the features of the sample images. e specific structure diagram is shown in Figure 9. e network structure of the discriminant network consists of 4 convolutional layers and 1 fully connected layer. e specific process is described as follows: (1) e input image is a three-channel image with a size of 256 × 256. (2) e first to fourth layers all use 4 × 4 convolution kernels, the number of convolution kernels are 64, 128, 256, and 512, the sliding step size is 2 for both, the filling size is 1, and the LReLU activation process is performed. (3) e fifth layer uses a 1 × 1 convolution kernel, the number of convolution kernels is 1, and the sliding step is 1, specifics are as shown in Table 4.
As shown in Table 4, in order to prevent the problem of gradient disappearance during the training process of the generative adversarial network, this paper uses the LReLU activation function in the activation function of the discriminant network except for the output layer. e last layer uses a 1×1 size convolution kernel to realize the function of full connection, convert multi-dimensional input data into one-dimensional, and output the discriminant result of the discriminant network.

Design of Loss Function.
e CycleGAN total loss consists of three parts: two adversarial losses and a cycle consistent loss. Taking x⟶G(x)⟶F(G(x)) as an example, the x ⟶ G(x) adversarial loss is l GAN G, D y � E y∼pdata(y) log D y (y) e confrontation loss of G(x) ⟶ F(G(x)) is In the training process of the network, the loss function can speed up the training of the network and improve the quality of image art style transfer.   e entire database has 4482 photos in summer and winter. e training set divides the photos into two folders trainA and trainB according to 8 : 2; the test set is divided into testA folders and testB folders. In the experiment, the batch size is set to 1, and the epoch is set to 10. rough repeated iterations, the Adam optimization parameters learning rate is set to 0.001 for optimization [24].
In the experiment established in this paper, the parameters of the image reconstruction loss function and the content encoding loss function are set, respectively. e weight of the image reconstruction loss function λ x is set to 10.0, the weight λ c of the content encoding loss function is set to 1.0, and the cyclic consistency loss weight of the function λ cyc is set to 10.0, and the counter loss function λ gan is set to 1.0, as shown in Table 5.

Experimental Results and Analysis.
In the process of running the model, you need to save the encoder, generate a network, discriminant network parameter settings, and model running parameter settings in a file every 30 times of training. When the training is interrupted due to external factors, the model has a memory function, and it can continue to run from the interrupted position until the training is completed. After all network training is completed, the artistic style is transferred through the saved model file information. Figure 10 is the winter image after the summer image migration experimental results of artistic style photos.
After testing the image art style migration of the summer2winter_yosemite dataset, it can be seen from Figure 10

Conclusion
Image art transfer technology has now been widely used and developed prospects, and has relatively successful cases in many fields, but also attracts a large number of experts and scholars to study. is paper proposes an improved generative adversarial network model, which improves the traditional drawbacks of artistic style transfer due to blurred images. e method can extract the content and style characteristics of the selected data set images at the same time. Based on the predecessors, the traditional generative confrontation network model is improved. e content and style encoder of the model are used to extract the features of the image under study. e automatic encoder based on the neural network applies the normalization method to the content and style features. Fusion changes the deconvolution operation in the traditional CycleGAN network, uses image size and convolution, improves the traditional discriminant network, and uses multiscale discriminators to further improve the overall effect of artistic style fusion and migration.
Data Availability e dataset can be accessed upon request.

Conflicts of Interest
e author declares that there are no conflicts of interest.   Computational Intelligence and Neuroscience 9