A Generative Adversarial Network Fused with Dual-Attention Mechanism and Its Application in Multitarget Image Fine Segmentation

Aiming at the problem of insignificant target morphological features, inaccurate detection and unclear boundary of small-target regions, and multitarget boundary overlap in multitarget complex image segmentation, combining the image segmentation mechanism of generative adversarial network with the feature enhancement method of nonlocal attention, a generative adversarial network fused with attention mechanism (AM-GAN) is proposed. The generative network in the model is composed of residual network and nonlocal attention module, which use the feature extraction and multiscale fusion mechanism of residual network, as well as feature enhancement and global information fusion ability of nonlocal spatial-channel dual attention to enhance the target features in the detection area and improve the continuity and clarity of the segmentation boundary. The adversarial network is composed of fully convolutional networks, which penalizes the loss of information in small-target regions by judging the authenticity of prediction and label segmentation and improves the detection ability of the generative adversarial model for small targets and the accuracy of multitarget segmentation. AM-GAN can use the GAN's inherent mechanism that reconstruct and repair high-resolution image, as well as the ability of nonlocal attention global receptive field to strengthen detail features, automatically learn to focus on target structures of different shapes and sizes, highlight salient features useful for specific tasks, reduce the loss of image detail features, improve the accuracy of small-target detection, and optimize the segmentation boundary of multitargets. Taking medical MRI abdominal image segmentation as a verification experiment, multitargets such as liver, left/right kidney, and spleen are selected for segmentation and abnormal tissue detection. In the case of small and unbalanced sample datasets, the class pixels' accuracy reaches 87.37%, the intersection over union is 92.42%, and the average Dice coefficient is 93%. Compared with other methods in the experiment, the segmentation precision and accuracy are greatly improved. It shows that the proposed method has good applicability for solving typical multitarget image segmentation problems such as small-target feature detection, boundary overlap, and offset deformation.


Introduction
Image multitarget segmentation is one of the hotspots in the field of image processing and artificial intelligence. Essentially, it can be expressed as pixel classification of multitarget with semantic labels, that is, segmenting and describing multitarget of interest in the image by using a set of object categories to classify and mark images at the pixel level [1,2]. With the continuous development of image analysis theory and deep learning technology, researchers have proposed accuracy of small-size targets are difficult to guarantee [8,9].
ere are still challenges in the research of image multitarget detection and segmentation.
At present, image segmentation can be divided into two categories: traditional methods and deep learning methods. Traditional image segmentation algorithms mostly use grayscale features to segment images [10]. Typical methods include threshold-based method [11], edge-based method [12], and region-based method [13]. However, there are problems such as susceptibility to the contrast of image gray features, poor segmentation of low-resolution and blurred images, and prone to oversegmentation of images. In general, traditional image segmentation methods are greatly affected by subjective factors, and the preprocessing process is complicated. For example, strong prior knowledge is needed to solve the problems such as the selection of seed points during segmentation and the selection of the threshold of the segmentation boundary. At the same time, small errors in the selection of key parameters have a greater impact on segmentation accuracy. ese problems make traditional image segmentation algorithms still have many restrictions when applied to image semantic segmentation.
Deep convolutional network is an effective image feature extraction and analysis method, which has been widely used in image classification, image generative, and target detection. Many excellent algorithms have emerged, such as AlexNet [14], ResNet [15], Faster R-CNN [16], and so on. Although deep convolutional networks have powerful feature extraction capabilities, they still have many limitations when applied to image segmentation problems. Take multitarget segmentation of medical images as an example. Medical images intuitively reflect the 2D and 3D morphological characteristics of organs and tissues in specific areas of the human body.
e human body has a relatively complex organ and tissue structure, and the anatomical structure of different human bodies is also different among individuals. At the same time, it is easy to be affected by factors such as noise, illumination, and local posture effect, and the image shape and various organ tissue regions are soft boundaries, showing regular or irregular dynamic periodicity with cardiac contraction or relaxation. ese factors increase the difficulty of medical image feature differentiation, target detection, and segmentation [17,18]. Moreover, the pooling layer in the convolutional neural network (CNN) will downsample the input image size and reduce the resolution of the image; the fully connected layer will turn the image features into vectors, destroying the spatial information of the image. erefore, it still has inadequacy in solving the problem of complex image segmentation. Although these convolution-based methods have achieved certain segmentation results, they ignore the spatial correlation of medical images such as CT and MRI, which is easy to produce nonsmooth and discontinuous segmentation results [19].
In order to solve the limitations of the structure and information processing mechanism of traditional convolutional neural networks in image segmentation problems, Long et al. [20] proposed the fully convolutional network (FCN) model. FCN abandons the fully connected layer of CNN and replaces it with the fully convolution structure. At the same time, it uses the deconvolution operator to restore the image size and introduces a shortcut-connection structure to fuse the high-level characteristics of the network with the low-level features to optimize the segmentation results. At present, most of the deep learning models used in image segmentation are based on the idea of the FCN model. Ronneberger et al. [21] proposed the U-Net model, which retains the convolution and deconvolution structure of the FCN model, but changes the way of fusion of high-level feature map and low-level feature map. e encoding part and the decoding part of U-Net are completely symmetrical, and the connection is realized through channel splicing and then convolution. At present, there are many variants of U-Net models, such as 3D U-Net [22], Res U-Net [23], Dens U-Net [24], and Attention U-Net [25]. In addition, the SegNet model proposed by Badrinarayanan et al. [26] replaces the deconvolution of FCN with an up-pooling method, which makes the upsampling part no longer participate in training while ensuring the segmentation accuracy, reducing the computational complexity. e above methods and mechanisms effectively solve the principle and strategy problems of image segmentation, but in complex image segmentation with noise and content diversity, there are still problems of unstable segmentation effect on lowresolution and fuzzy images and low accuracy of target pixel classification [27].
In the research of low-resolution image high-definition processing, Zhang et al. [28] (2021) build a hierarchical correlation filters model based on the multilevel convolutional features, which can suppress interference of background and similar objects. Chen et al. [29] (2021) proposed an image super-resolution reconstruction method using attention mechanism with the feature map. It uses the information extraction block of feature map attention mechanism to adaptively adjust the channel characteristics, enhance the feature expression ability, and facilitate reconstruction from original low-resolution images to multiscale super-resolution images. For the image inpainting, Chen et al. [30] (2021) proposed a novel image embedding algorithm based on encoder and similarity constraint, which effectively solved the problem of joint context awareness loss in image inpainting and improved the utilization of features.
e above works provide good support for fine segmentation of complex images. However, the proposed method will still be affected by obvious blur, and the training time required will also be longer. en, Chen et al. [31] (2021) proposed image completion algorithm based on the improved total variation minimization method, which can solve the issue mismatching and structure disconnecting in exemplarbased image inpainting. e generative adversarial network (GAN) proposed by Goodfellow et al. (2014) [32] is a deep learning method based on Nash equilibrium in game theory, including two parts: generator and discriminator. e generator in the GAN model, that is, the generator network model, is mainly used to generate target data. e typical generator structure is a neural network based on deconvolution, which restores the input image size through multiple deconvolution layers' upsampling and finally obtains generated image data. In image segmentation, the generator is essentially a segmentation model. For example, FCN [20], U-Net [21], and SegNet [26] can be selected as the generative network, which receives the input of the original image and takes the predictive segmentation as the output. e discriminator, that is, the adversarial network in GAN, usually uses CNNs as the basic model. Its inputs include the real data and the generator's generated data. e generated data are judged as false, and the real data are judged as true.
rough the authenticity judgment, the game learning is performed to optimize the generator's ability to generate data. In mechanism, GAN can reconstruct low-resolution images into super-resolution high-definition images [33] and train the generative model based on the surrounding pixels of the missing part of the image to repair the complete image [34]. If applied to the image segmentation problems in the case of blur, offset, and small target, it can effectively improve the quality and accuracy of image segmentation. Attention mechanism is a target feature enhancement method that is widely studied and applied at present. It can be used as a module of the deep learning model to focus attention on objects of interest [35,36]. However, most of the existing methods mainly focus on the local pixels of the target area and have low relevance to the image content with a large receptive field [37]. In order to capture the dependence of spatial long-distance information in the image, Wang et al. [38] proposed a nonlocal attention mechanism, the strategy of which is that the characteristic response value at a pixel is equal to the weighted average of the characteristic values at all receptive field points, that is, all points in the larger receptive field are connected to realize global information fusion. e nonlocal attention mechanism connects the understanding of global content with the semantics of local targets, which improves the enlightenment and restriction of target pixel classification. In complex image multiobject segmentation, if the image segmentation mechanism of GAN is combined with nonlocal attention feature enhancement method, it can effectively improve the accuracy of complex image multiobject segmentation and optimize segmentation boundary in mechanism.
Aiming at the poor accuracy of multitarget instance segmentation and small-scale target segmentation in complex images, a generative adversarial network fused with nonlocal attention mechanism is proposed in this paper. e generative network module of the GAN uses the residual network as the basic model for preliminary target segmentation.
e nonlocal spatial-channel dual-attention mechanism is added to the output feature map of the residual network to capture the long-distance dependence information of each feature point on the output feature map. e adversarial network module is constructed based on CNNs, which performs masking operations on the original image with prediction segmentation and label segmentation, respectively, and inputs the masking result into the adversarial network. e adversarial network judges the mask result of the predicted segmentation as false, and the mask result of the label segmentation is judged as true. And the generative network judges the mask result of the predicted segmentation as true. rough the game learning between the generative network and the adversarial network, the image segmentation ability of the generative network is optimized. Specifically, firstly, a generative network module is constructed based on residual network and nonlocal attention mechanism for preliminary image segmentation. On this basis, masking operations on the original image with the prediction and label segmentation, respectively, are performed, and the results are input to the adversarial network to optimize the segmentation results. AM-GAN strengthens the extraction and fusion of multiscale features, as well as the distinguishing ability of each instance boundary pixels, to achieve the fine segmentation of multitarget instance regions in complex image and improve the accuracy of small-target segmentation.
Medical images are often blurred due to the influence of the imaging environment and detecting equipment. Meanwhile, the complexity of human organ and tissue structure, the differences of different individual anatomical structures, and the influence of local posture effect also increase the difficulty of medical image feature differentiation and segmentation. In this paper, taking the segmentation of abdominal MRI images as a verification experiment, multitarget tissues such as liver, left/right kidney, and spleen are selected for segmentation and abnormal detection to verify the effectiveness of the model and algorithm.
In this paper, the problems in image segmentation, such as insignificant morphological characteristics of the target image, easy to be affected by noise, gray value distortion and local position effect, and unclear boundary of the target region, are studied. e main motivation is to establish a novel image segmentation model, which can improve the accuracy of complex image fine segmentation in mechanism. e novelty and main contributions of this paper are as follows: (1) A novel generative adversarial network fused with the attention mechanism (AM-GAN) multitarget image segmentation model is proposed. In mechanism, the image segmentation mechanism of GAN is organically combined with the feature enhancement method of nonlocal attention so that the model and algorithm can automatically learn to focus on the target structure with different shapes and sizes, highlighting the feature usefulness for specific tasks. It can effectively improve the problems of existing image segmentation methods, such as insufficient utilization of correlation information between image voxels, imprecise detection of small-target area, unclear boundary, and overlapping multitarget boundary, and effectively improve the accuracy of complex image multitarget segmentation.
(2) e generative module of AM-GAN combines the feature enhancement of nonlocal attention with the feature fusion method of the residual network. In the generative network, the understanding of the global content is linked with the semantics of the local target, and the low-level and high-level feature maps Computational Intelligence and Neuroscience are added through the shortcut-connection structure to achieve feature fusion. In mechanism, it can give play to the guidance and heuristics of target segmentation and refine the segmentation results. (3) In this paper, AM-GAN is proposed as the image multitarget segmentation model. In mechanism, it can use GAN's high-definition processing and repair capabilities to reduce the effects of noise, bias deformation, and gray value distortion. rough the information association and restriction of the nonlocal attention large receptive field to the local target, the continuity of target segmentation is maintained. Meanwhile, the Nash game strategy between generative network and adversarial network is adopted in AM-GAN algorithm to optimize the segmentation results and improve the continuity, smoothness, and accuracy of multitarget segmentation results.
In Section 1, the current challenges and current research status of the complex image multitarget segmentation are reviewed and analyzed. e ideas and algorithm strategies of a novel AM-GAN segmentation model established in this paper are pointed out. In Section 2, the AM-GAN model is established and its theoretical properties are analyzed. In Section 3, the comprehensive learning algorithm of AM-GAN is designed and proposed. In Section 4, multitarget segmentation experiments and result analysis are conducted based on medical images. Finally, the work of the paper is summarized, and the advantages and limitations are pointed out.

e Generative Adversarial Network Basic
Model. e generative adversarial network (GAN) basic model in this paper includes two parts: generator and discriminator. e basic structure and information processing flow are shown in Figure 1.
In Figure 1, the generative network module in GAN is mainly used to generate target data. e typical generative network generally is usually the neural network based on deconvolution, such as FCN, U-Net, and SegNet, which recovers to the size of the input image through sampling on multiple deconvolution layers and finally obtains the generated image data. For image segmentation, the generative network is used as an image segmentation model, which receives the input of the original image and takes the predictive segmentation as the output. e adversarial network in GAN usually adopts convolutional neural networks. Its input includes real data and generated data from the generator. rough authenticity judgment and game learning with the generator, the ability of the generator to generate data is optimized. e gray value of the input image to generative network is recorded as the random variable x, the distribution it obeys is set to P(x), and the noise data z obeys the uniform distribution. e generator maps the noise data z to G(z), and the authenticity discrimination probability of the image random variable x is D(x).
In learning, GAN is optimized according to the principle that the generative model maximizes log D(x) and the discrimination model minimizes log(1 − D(G(z))). e objective function is defined as follows: (1)

Nonlocal Attention Mechanism.
e core idea of nonlocal attention mechanism is that the characteristic response value at a pixel is equal to the weighted average of the characteristic values at all points, that is, all points in the receptive field are connected, and the dependency relationship between pixels based on spatial distance is established to realize global information fusion. e calculation formula [38] is where i represents the output position index, j represents any position of the input, x represents the input, y represents the output with the same size as x, f(·) is the similarity measurement function between pixels, g(·) represents the feature mapping of j, and C(x) is the normalization factor.
(f) and g can be implemented in many ways. If g is a Gaussian function, its expression is e operation in equation (3) represents the difference amplified exponentially after multiplying the two vector matrices, and equation (4) is the mathematical expression of the normalization function C(x).
Consider an extended form of equation (3). Firstly, the vector x is embedded into spatial mapping, that is, the two vectors are mapped to different feature spaces, and then, the Gaussian function is used to measure the similarity. e specific form is e vector dot product of function f can be expressed as Now, the normalization function C(x) is equal to N, and N is the number of positions of x.
Based on the paired function form in the relational network proposed by Santoro et al. [39], the function f can be expressed as a cascaded form: e nonlocal attention mechanism is integrated into convolutional neural network to form a general nonlocal block. e module can be integrated into any neural network structure. e definition of nonlocal block is as follows: where z is the output of the attention module, y is the output of equation (2), w z is the weight matrix of the number of restored input channels, and x is the input. e structure of nonlocal block is shown in Figure 2. In Figure 2, the nonlocal block first performs feature mapping on the input matrix I(C × H × W) with a 1 × 1 convolution, that is, the mapping operation of the functions g, θ, and φ in (3) and (5), to get matrices w r , w k , and w v . en, the matrix w k is deformed and transposed into a matrix F k (HW × T), and w r is transformed into a matrix F r (T × HW). F k is multiplied by F r and then normalized by Softmax to obtain the similarity measure matrix F(HW × HW). en, multiply the deformed matrix F v (T × HW) from w v and the matrix F to obtain the characteristic response matrix F s (T × HW). Finally, F s is deformed and multiplied by the convolution kernel w z to restore the original number of channels. e operation result is added to the input matrix I(C × H × W) to obtain the output O(C × H × W), and the final output feature map is a feature map with enhanced global information dependence.

Residual Generation Network Based on Dual-Attention
Mechanism. In the complex images' multitarget segmentation, if the number of image instance targets is large and the difference of individual gray value is small, the neural network needs to have strong ability of feature extraction, fusion, and recognition. In this paper, the residual network is used as the main body of the image segmentation model, and the shortcut-connection structure is added between different convolution layers, that is, the input of the upper network is directly superimposed with the output of the lower network at the element level, so as to reduce the loss of feature information and realize feature fusion of different levels. In this way, the network can still maintain good convergence properties in the deep case. e residual structure is shown in Figure 3.
At present, there are many classical residual network models, such as ResNet18, ResNet34, and ResNet50 [16]. In practice, the selection of the model is mainly based on the requirements of input image size, quality, and segmentation accuracy. Generally, the deeper the network, the stronger the feature extraction ability. In this paper, ResNet50 is selected as the basic model according to the problem of small-target region detection in complex image multitarget segmentation.
e structure of segmentation model based on ResNet50 is shown in Figure 4.
In Figure 4, the segmentation model is divided into six parts. (1) 7 × 7 convolution with step size of 2 and the number of output image channels is 64. (2) e pooling operation with step size of 2 and the 3 × 3 sliding window cascades three residual blocks. Each residual block contains three convolution layers. e first and third are 1 × 1 convolution to adjust the number of channels of the feature maps. e second is 3 × 3 convolution with a step size of 1. Equations (3) to (5) are stacked residual blocks. e last one includes average pooling with a step size of 1 and 7 × 7 sliding window, as well as fully connection and Softmax classification. Extract the output from the second to the fifth part of the model, and then, upsample the four output feature maps for prediction segmentation.
In practice, only relying on the residual network to perform image multitarget segmentation will result in the blurring of the segmentation boundary and the loss of small-target information, that is, the residual network cannot make full use of the image feature information to segment the image. In this paper, based on the residual network, a nonlocal attention mechanism is introduced to construct a dual-attention mechanism model that can integrate spatial and channel attention. e structure is shown in Figure 5.
In Figure 5, the spatial-attention module is used on a nonlocal mechanism to strengthen the dependency between all pixels on the feature maps, which is represented by a similar weight matrix. In this module, the output feature map of the residual network first undergoes 1 × 1 convolution to obtain three feature maps F1, F2, and F3, which have the same width and height as the input image, but the number of channels is reduced to 1/4 of the original. On this basis, F1 is transposed and multiplied by F2 and then processed by Softmax normalization to obtain the interpixel similarity matrix F(HW × HW). F is multiplied by F3, and Computational Intelligence and Neuroscience the original input is added to obtain the feature map after spatial feature optimization. e channel-attention mechanism is used to capture the interdependence between any two channels and update the value in one channel by using the weighted average of all channels. Its implementation steps are similar to the spatialattention module. e feature maps obtained by the spatialattention module and the channel-attention module are added and fused to generate a segmented image strengthened by the attention mechanism.
In Figure 5, the mathematical expression of Softmax is where z l i represents the i th pixel value in the l th column of the feature map F.
is formula normalizes the similarity measure matrix F by column so that the weight value of each column is within the interval [0, 1]. e nonlocal attention mechanism is combined with the residual network model to construct the generator module in the GAN. e structure is shown in Figure 6.
In Figure 6, the generator module receives image data input, and the input image is extracted features by the ResNet50 network based on the dual-attention mechanism; four output feature maps at different levels are obtained, denoted as F0, F1, F2, and F3. e F1, F2, and F3 feature maps are upsampled to the same size as F0, and then, the four output feature map channels are spliced and 3 × 3 convolution is performed to obtain the fused feature map F. e fusion feature map F is spliced with F0, F1, F2, and F3 at channel level, respectively, and then, input into the selfattention mechanism module after 3 × 3 convolution to obtain four prediction segmentation maps. ey are added and fused and averaged, and finally, the image multitarget prediction segmentation map is obtained.

e Adversarial Network Model Based on Convolutional
Network. In GAN, the adversarial network module penalizes the lost details and small-size target information of the network through game learning, which makes the multitarget images segmented by the generative network more accurate. In this paper, the adversarial network module in GAN is constructed based on CNN, and its structure is shown in Figure 7.
In Figure 7, the adversarial network model includes two inputs: one is dot-product image of segmentation label and the original image, and the other is dot-product image of generative network and the original image. Dot-product operation is a mask operation on the original image to obtain the area of label and prediction segmentation on the original image. e adversarial network performs feature extraction on the obtained detection area, distinguishes whether the input is a real or predicted segmentation area, and optimizes the segmentation result through the adversarial learning with the generative network. In this paper, the CNN in the adversarial network model contains a total of 5 convolutional layers, 5 pooling layers, and two fully connected layers. e size of the convolution kernel is 3 × 3, and the step size is 1 × 1, that is, the convolution operation does not change the size of the input. e pooling layer is the maximum pooling with a size of 2 × 2, and the step size is 2 × 2, which reduces the input resolution by half.
e ReLu function is selected as the activation function, and the Sigmoid function is used for classification operations. e Generative Adversarial Network-Fused Attention Mechanism. In this paper, a generative adversarial network segmentation model fused with attention mechanism (AM-GAN) is proposed and the overall structure is shown in Figure 8.
In Figure 8, the gold standard is an accurate result of manual segmentation by experts. In AM-GAN, the generative network is composed of the ResNet50 model and the nonlocal dual-attention mechanism module. It takes the original image as input and the predicted segmentation as     Computational Intelligence and Neuroscience output. e adversarial network is composed of CNNs. Its inputs include the mask of the predicted segmentation and the original image and the mask of the gold standard and the original image. e mask of the predicted segmentation is judged as false (marked as 0), and the mask of the gold standard is judged to be true (marked as 1).
Combining the generative network based on the residual network and the nonlocal dual-attention mechanism with the adversarial network based on the CNNs to build a generative adversarial network model for the multitarget image segmentation. After AM-GAN is trained to reach the optimum, the model used for image segmentation is a generator network. e structure of the AM-GAN segmentation model is shown in Figure 9.
As shown in Figure 9, in the generator network, the conv2x, conv3x, conv4x, and conv5x parts of the ResNet50 based on the dual-attention mechanism all have a prediction segmentation output, and the final prediction segmentation of the generator network is the average value of the prediction segmentation of these four parts. e input of the generator is subjected to feature extraction through the extended ResNet50 network, and the rough feature map is obtained by upsampling. e calculation formula of the upsampling output F 1 of the conv3x part in the extended ResNet50 network is where I represents the input image, conv3x(I) represents the convolution output of the conv3x part, w 1 represents the 1 × 1 convolution, whose purpose is to reduce the number of channels of the convolution output, b 1 represents the bias, BN(·) represents the batch normalization, Relu(·) is the activation function, and Umsample(·) is the interpolation upsampling. In this paper, the bilinear interpolation algorithm is used to upsample the feature map, and the output is F 1 . Using the same algorithm, the upsampled output F 2 and F 3 of the conv4x and conv5x parts can be obtained. e output F 0 of the conv2x part does not need to be upsampled, and the sizes of F 1 , F 2 , and F 3 are the same as F 0 . After feature extraction and upsampling of the input image, the output information of each part is fused. e calculation formula of the fusion feature map F is where cat(·) represents the splicing operation of feature map channel, w 2 is a 3 × 3 convolution, which is used to fuse the information between the four feature maps, w 3 is a 1 × 1 convolution, which is used to reduce the number of channels of the fusion feature map, b 2 and b 3 are biases, and Relu(·) is the activation function. After the information of each part is fused, the feature map is input into the dual-attention mechanism module for feature enhancement. e calculation formula of the predicted output P 1 of the conv3x module is where pAttention(·) represents the spatial-attention mechanism module, cAttention(·) represents the channelattention mechanism module, w 4 represents 1 × 1 convolution, which is used to convert the number of channels into the number of classification categories, Upsample(·) represents a bilinear interpolation up-sampling operation, which is used to restore the size of output feature map to the size of the input image to obtain a predicted segmented image. Using the same method, the predicted output P 0 , P 1 , and P 3 can be obtained. Finally, P 0 , P 1 , P 2 , and P 3 are added and averaged to obtain the predicted segmented image. e calculation formula is Equation (13) is the calculation expression for predicting segmentation P, where avg(·) represents the average operation.
In the GAN segmentation model established in this paper, the data processing capability of the ResNet50 extended based on the nonlocal dual-attention mechanism can mechanically ensure that the image multitarget features can be accurately extracted, while nonlocal attention mechanism also strengthens the output feature map of the ResNet50, which can further improve the accuracy of segmentation. e authenticity judgment of the adversarial network can guide the segmentation network to avoid the loss of detailed information as much as possible and optimize boundary segmentation and small-target segmentation.

e Loss Function.
In the complex images' multitarget segmentation, it is necessary to calculate classification errors of multiclass pixel. In this paper, the multiclassification cross-entropy loss [40] is used as the loss function of the generative network, which is defined as follows: Equation (14) represents the multiclassification crossentropy loss of the label image x and the predicted segmentation y, where (H, W, C) represents the length, width, and number of channels of the image.
In the AM-GAN segmentation model, the adversarial network uses the authenticity loss to perform game learning with the generative network, which is essentially a binary classification problem. erefore, the two-classification cross-entropy loss [41] is used as the loss function of the adversarial network, which is defined as follows: Equation (15) represents the two-classification crossentropy loss, where x represents the classification label and y represents the classification probability output of the adversarial network. e loss function of the generative network is defined as follows: Computational Intelligence and Neuroscience G x n , y n , θ g � L mec g x n , θ g , y n + λL bec d g x n , θ g , x n , 1 , (16) where x n is the input image, y n is the label segmentation, and θ g is the training parameter set of the generative network. According to equation (16), the loss of generative network is composed of two parts, that is, the multiclassification crossentropy loss of prediction segmentation and label segmentation and the loss of adversarial network which predict segmentation. e hyperparameter λ is used to balance these two losses. When optimizing the generative network, we minimize the objective function. e loss function of the adversarial network is defined as D x n , y n , θ d � L bec d x n , y n , θ d , 1 where θ d represents the training parameter set of the adversarial network. From (17), the loss function of the adversarial network consists of two parts. e first is the twoclassification cross-entropy loss of the mask map of input image x n and the label segmentation y n , which is judged to be true (that is, the value is 1). e second is the twoclassification cross-entropy loss of the mask map of input image x n and the label segmentation g(x n ), which is judged to be false (that is, the value is 0). When optimizing the adversarial network, we minimize the loss function. e loss function of GAN is composed of the loss of the generative network and the loss of the adversarial network. e calculation expression is V (G, D) � G x n , y n , θ g + D x n , y n , θ d .

e Training
Process. AM-GAN adopts the method of alternate training of generative network and adversarial network. In order to ensure the stability of training, the training times of the discriminating module are generally more than that of the generative module. In this paper, the minibatch gradient descent (MBGD) algorithm is used for AM-GAN training. e specific process is as follows: (1) Randomly sample n samples z n from noise samples, and n samples x n from real samples. (2) Gradient ascent algorithm is used to update the adversarial network: (3) Repeat steps (1) and (2) k times to update the adversarial network k times. (4) Sampling n generated samples from the noise samples, and we update the generative network once using gradient descent algorithm: (5) Sequentially, we repeat the above steps until the model training is stable and optimal.
e specific algorithm implementation is as follows: Step 1: determine and initialize the GAN training parameter set. e training parameter set of the generated network is θ g � (w 0 , w 1 , . . . , w m , b 0 , b 1 ,  . . . , b m , g, θ, φ, w z , w c ), where (w 0 , w 1 , . . . , w m ) is the set of convolution kernel weight of extended ResNet50 net- work, (b 0 , b 1 , . . . , b m ) is the set of biases, g, θ, φ, and w z are 1 × 1 convolution kernel weights of spatial-attention mechanism, and w c is 1 × 1 convolution kernel weights of channel-attention mechanism. e training parameter set  10 Computational Intelligence and Neuroscience of the adversarial network is θ d � (w 0 , w 1 , . . . , w n , b 0 , b 1 , . . . , b n ), where (w 0 , w 1 , . . . , w n ) is the set of convolution kernel weights of CNN, and (b 0 , b 1 , . . . , b n ) is the set of biases.
Step 2: randomly select N image samples x n from the sample set, and select corresponding N label segmentation samples y n .
Step 3: calculate the loss of the adversarial network: Step 4: calculate the training parameter gradient according to the chain rule, σ is the activation function, a i is the input feature map of the i th layer, z i is the feature map of the i th layer after convolution, w i is the weight of the i th layer, and b i is the bias of the i th layer. e formula for calculating the weight gradient using the chain rule is where σ ′ (·) is the derivative of the activation function and (·) T represents the matrix transpose. e formula for calculating the bias gradient using the chain rule is Step 5: use the MBGD optimizer to update the convolution weights and biases of the adversarial network: Step 6: repeat Step 2 Step 5 k times.
Step 7: randomly select N image samples x n , and select corresponding N label segmentation samples y n .
Step 8: calculate the loss ∇L(θ g ) of the generative network: L mec g x n , θ g , y n +λL bec d g x n , θ g , x n , 1 .
Step 9: use the MBGD optimizer to update the parameters θ g of the generation network.
Step 10: if the network converges to the optimal, then the training ends; otherwise, it returns to Step 2.
e epoch of training is set to 200 and the batch size is 10. According to the above algorithm process, the pseudocode of the training algorithm is shown in Algorithm 1.

e Experiment Dataset.
e dataset comes from the Combined Healthy Abdominal Organ Segmentation (CHAOS) competition dataset [42]. e CHAOS dataset contains the abdomen MRI images. In the experiment, the abdominal MRI abdominal images containing the label information of the liver, left/right kidney, and spleen were selected as the sample set. e typical abdomen image and label segmentation image are shown in Figure 10.
e experimental dataset contains 38 groups of abdominal MRI images, and each group has 26 slices with a size of 256 × 256, totaling 988 slices. e dataset is divided into a training set, a validation set, and a test set. e training set includes 30 groups of slices, the validation set includes 2 groups of slices, and the test set includes 6 groups of slices. Due to the small number of samples in the dataset, random rotation, mirroring, and other operations are used to enhance the data in the experiment. e final number of slices in the training set is expanded to 3120, the verification set is expanded to 104, and the test set is expanded to 312. e sample distribution of dataset is shown in Table 1.

4.2.
e Model Structure and Parameter Settings. e training strategy of the AM-GAN is to alternate training between the generative network and the adversarial network. In order to ensure the stability of training, the ratio of the training times of the generative network and the adversarial network is set to 1 : 6, that is, the adversarial network training is performed 6 times first, and then, the generative network is trained once. e size of the convolution kernel of the extended ResNet50 in the generative network is all 3 × 3, the step size of the downsampling is 2 × 2, and the other step size is 1 × 1. e activation function adopts the ReLu function, and the balance coefficient λ of the loss function of generative network is set to 0.2. e adversarial network includes 5 convolutional layers, 5 pooling layers, and 2 fully connected layers. e convolution kernel size of the convolutional layer is set to 3 × 3, and the step size is 1 × 1. e pooling window size of the pooling layer is set to 2 × 2, and the step size is 2 × 2.
e Sigmoid function is used as the classification function. e learning rate of the MBGD optimizer is set to 0.01, and the weight parameters are initialized with truncated normal distribution. e number of training iterations is 200, and the batch of one training is 10. e experimental environment is Linux system with NVIDIA GeForce RTX 2080Ti GPU.

4.3.
e Experiment Result and Analysis. e AM-GAN training is carried out according to the algorithm strategy in Computational Intelligence and Neuroscience Section 3 and the model structure and parameter setting in Section 4.2. Multitarget image segmentation is performed by generative network. Recall rate, precision, F1-score, accuracy, and Dice similarity coefficient are used to evaluate the properties of the segmentation model. e image segmentation experiment results of the test set are shown in Figure 11, and the index evaluation results are shown in Table 2.
In Figure 11, from left to right, there is the original abdominal image, the gold standard, that is, the image manually segmented by the expert, and the predicted segmented image by network. From Figure 11, the segmentation results of the GAN are clear and smooth. Whether it is a large-sized organ such as the liver or a small-sized organ such as the kidney and pancreas, the GAN can distinguish them more accurately, which shows that the proposed method has good applicability.
For analyzing the properties of the learning algorithm, the curves of the Dice coefficient indicators of four organs and tissues, including liver, left kidney, right kidney, and spleen changing with iteration, and the confusion matrix of the validation set are recorded.
e results are shown in Figure 12 and Table 3.  (4) N original abdominal MRI images were randomly selected, and N corresponding label segmentation images were selected; (5) Calculate the loss of the adversarial network by formula (17); (6) Update the training parameters of the adversarial network by formulas (21) and (22)    In Figure 12, the Dice similarity coefficient training curves of the liver, left kidney, right kidney, and spleen are shown. e Dice curve of the liver converges quickly and is relatively stable in the later iterations. is is because its size is relatively large compared to other organs, which causes the segmentation network to be more inclined to learn the classification of the liver area pixels in the early stage. e left kidney, right kidney, and spleen organs are smaller in size than the liver, and the segmentation network has a deeper level, which leads to easy loss of information and large fluctuations in the training. In the later stage of training, the average Dice coefficient of liver, left kidney, right kidney, and spleen on the training set is 0.92, which shows that the overall effect of model segmentation is good, and it has good applicability to smalltarget segmentation.
From Table 3, the segmentation accuracy of liver organs is the highest, 97.55%. is is because the size of liver organs is relatively large, and the network tends to learn their pixel weights of liver organs. e spatial location of the left kidney and the liver area is relatively close, and the boundary tissues overlap, leading to the segmentation error of the left kidney mainly from the liver and background. e right kidney and spleen organs are close in space, leading to mutual influence. e right kidney and spleen are small-target organs, and there is no boundary overlap, so the segmentation is less affected than the left kidney. From the confusion matrix, the accuracy of left kidney segmentation was 82.39%, while that of right kidney segmentation was 92.68% and that of spleen segmentation was 94.12%. It shows that the method in this paper has a good ability to identify and distinguish the features of complex image pixel categories.

Comparative Experiment and Analysis.
In order to verify the performance improvement of AM-GAN brought by the nonlocal attention mechanism, a ResNet-GAN model was constructed in the comparative experiment.
is model removes the nonlocal attention mechanism in AM-GAN, and other structures are the same as the AM-GAN. In addition, current mainstream, AM-FCN [43]    e four index values of accuracy, recall, precision, and F1-score are used for comparative evaluation. In order to avoid misleading the performance evaluation by high background indicators, all evaluation indicators do not include the value of background indicators. e specific results are shown in Table 4.
From Table 4, the four indexes of AM-FCN and Attention U-Net are lower than the proposed model. is is because it organically integrates GAN's inherent ability to process and repair low-resolution images, and the global to local content association and feature enhancement mechanism of nonlocal attention improves the accuracy of multitarget segmentation results. Except for the recall rate, other indexes of DANet are lower than that of the proposed model, which shows that the nonlocal attention mechanism used in this model can gather the receptive field from global to local target area to realize information context correlation. In addition, GAN can optimize the segmentation results in mechanism and improve the segmentation properties of the model, and the model and algorithm are relatively robust. Compared with ResNet-GAN, the accuracy of proposed model is 1.1% higher, which shows that the nonlocal attention can effectively realize the information association from global to local. Combined with GAN's probability exploration mechanism, the segmentation accuracy of the model is comprehensively improved. Compared with the SEVNet, due to the inherent high-definition processing and repairing capabilities of the GAN for noise images and the use of a game strategy, the optimization of the segmentation results is realized, and the accuracy, continuity, and smoothness of the segmentation results are comprehensively improved. Based on the above analysis, it shows that the method in this paper has comprehensive advantages when performing image target segmentation with insignificant structural features and achieves a better segmentation effect.

Conclusion
In this paper, aiming at the fine segmentation of multitarget complex images, a generative adversarial network model fused with attention mechanism is proposed. In mechanism, the AM-GAN can use the ability to process and restore highdefinition images of GAN to effectively reduce the effects of noise, offset distortion, and gray value distortion. Nonlocal spatial-channel dual attention is introduced to realize the information association and constraint of large receptive field content on local targets and maintain the continuity of segmentation results. At the same time, the Nash game strategy of the generative network and the adversarial network is adopted, which reduces the algorithm's loss of detailed features and effectively improves the accuracy of small-target segmentation. In terms of information processing mechanism, this method comprehensively realizes the context correlation of image information, the feature fusion of different levels and scales, the high-definition processing and repair of high-noise images, and the optimization of segmentation results. e experimental results show that the multitarget segmentation method proposed in this paper has good applicability for both small-size and large-size targets. Compared with the other methods, each evaluation index has been greatly improved. AM-GAN comprehensively utilizes the advantages of nonlocal attention mechanism and generative adversarial network, which can finely segment multi-instance targets in complex images. It has good applicability in mechanism to solve the image segmentation problem of insignificant morphological features and weak spatial information relevance. It improves the limitations and deficiencies of the comparison method in solving the above problems, provides a novel deep learning method for image segmentation, and has great application value and a good prospect for promotion. However, the information processing mechanism and algorithm process of the method in this paper are more complicated, and the image semantic knowledge and structural features are less used. From the experimental results, the proportion of image background pixels compared with target area pixels is too large, and the number of samples in the dataset is unbalanced, which still has a certain impact on the segmentation accuracy of this method. How to optimize the model and algorithm, improve the discrimination ability of the local target image and the background image, and embed the scene semantic feature knowledge into the segmentation model, which will be an important work in the next stage of research.
Data Availability e data that support the findings of this study are available upon request from the corresponding author.

Conflicts of Interest
e authors declare that there are no conflicts of interest regarding the publication of this paper.

Supplementary Materials
e AM-GAN model combines the generative network based on the residual network and the nonlocal dual-attention mechanism with the adversarial network based on the CNNs to build a generative adversarial network model for the multitarget image segmentation. After AM-GAN is trained to reach the optimum, the model used for image segmentation is a generator network. (Supplementary Materials)