A Change Detection Method for Remote Sensing Images Based on Coupled Dictionary and Deep Learning

With the rapid development of remote sensing technology, change detection (CD) methods based on remote sensing images have been widely used in land resource planning, disaster monitoring, and urban expansion, among other fields. The purpose of CD is to accurately identify changes on the Earth's surface. However, most CD methods focus on changes between the pixels of multitemporal remote sensing image pairs while ignoring the coupled relationships between them. This often leads to uncertainty about edge pixels with regard to changing objects and misclassification of small objects. To solve these problems, we propose a CD method for remote sensing images that uses a coupled dictionary and deep learning. The proposed method realizes the spatial-temporal modeling and correlation of multitemporal remote sensing images through a coupled dictionary learning module and ensures the transferability of reconstruction coefficients between multisource image blocks. In addition, we constructed a differential feature discriminant network to calculate the dissimilarity probability for the change area. A new loss function that considers true/false discrimination loss, classification loss, and cross-entropy loss is proposed. The most discriminating features can be extracted and used for CD. The performance of the proposed method was verified on two well-known CD datasets. Extensive experimental results show that the proposed method is superior to other methods in terms of precision, recall, F1-score, IoU, and OA.


Introduction
In 1858, the first remote sensing image was taken over Paris, France. Human's cognition of the Earth has entered a new stage from local to large-scale surface observation as a result of remote sensing technology on various platforms. In recent years, with the rapid development of sensor and information technology, it has become possible to acquire and store large amounts of remote sensing images. Remote sensing images provide extensive Earth observational data that are characterized by high precision, wide range, and short period, offering important data support for the military, science, agriculture, environment, resources, transportation, and urban management [1]. Researching the surface ecological environment using these remote sensing images has great societal and scientific significance [2].
Remote sensing image change detection (CD) is an important application of remote sensing image interpretation that aims to compare and analyze two (or more) remote sensing images taken at different times in the same region and obtain change information about the ground object. is technology has been widely used in land planning [3], urban expansion [4], disaster assessment [5], agricultural investigation [6], scene classification [7], and ecological environment monitoring [8], among other fields. Traditional artificial visual interpretation methods mainly detect changes in the obtained remote sensing images using the visual interpretation experience, which has low efficiency and is easily affected by subjective conditions. erefore, such methods cannot meet current needs for dynamic monitoring of the environment. With the development of remote sensing image CD technology, although new CD methods continue to emerge, many new problems and challenges have arisen due to improvements in imaging ability and different properties of sensors. e main problems and challenges are as follows. First, the continuous improvement of spatial and spectral resolution of remote sensing images offers abundant ground object information, which significantly increases redundancy and noise. Second, with the continuous development of sensor technology, various new data sources are emerging. is poses a new challenge to the universality of existing remote sensing image processing methods. How to accurately and quickly extract useful information from massive datasets for specific tasks has become an urgent problem. ird, the applications for remote sensing images continuously involve more and more fields. us, a single model can no longer complete most of the complex tasks required. erefore, the existing image classification and CD models must start with a practical application and fully combine the advanced technologies in various fields to meet the needs of different industries. However, for a variety of complex tasks, it is difficult for human beings to obtain an effective and comprehensive model due to limited prior knowledge of the task. Identifying new methods to solve the aforementioned problems has increasingly attracted researchers' attention.
With the rapid development of computer theory and technology, artificial intelligence technology has appeared in image segmentation, image classification, CD, and other related research areas. In particular, the recent emergence of deep learning methods as a powerful driving force of artificial intelligence technology has greatly stimulated the enthusiasm of various research fields to utilize deep learning to solve existing problems. e outstanding deep learning model UNet is widely used in segmentation and achieved good performance in multiobject [9,10] and multiscale [11] remote sensing image segmentation. In respect of remote sensing image classification, the DRL-GAN is proposed by literature [12]; the network can not only enhance the image representation ability, but also identify tiny objects from remote sensing images. As for remote sensing image translation, the attention mechanism is introduced to the generative adversarial network architecture, and the SPA-GAN model is proposed by literature [13]. With regard to remote sensing image dehazing, considering the CNN's excellent performance in image dehazing, the FD-GAN is proposed by literature [14]; the model can produce more realistic and natural images with less color distortion and fewer artifacts. Concerning remote sensing image object detection, an architecture with ESRGAN, EEN, and detection network is proposed by literature [15]; the quality of remote sensing images is enhanced by using EESRGAN, and the detection performance is improved by backpropagating the loss into the EESRGAN. From the above achievement, we can discover that the rich hierarchical architecture of deep learning networks is conducive to adopting shallow to deep approaches. e target task is analyzed in depth and thoroughly, and the understanding of high-level and highdimensional features is gradually formed through low-level information analysis. e strong feature learning and expression abilities of deep learning models can be used to realize automatic robust learning if the model is able to effectively utilize a large amount of existing information. A large number of training models based on deep learning methods within the category of supervised learning can provide sufficient reference for massive multisource datasets, greatly improving the utilization of data and the timeliness of a variety of remote sensing tasks, including CD.
At present, deep learning methods have been used to carry out a great deal of research in the field of remote sensing image target extraction. e theory underlying such methods should also be applicable to the field of CD. Highlevel semantic features such as spectrum, space, and texture can be extracted from multitemporal images using deep learning to establish a nonlinear relationship between ground object features in multitemporal images. Remote sensing CD models based on deep learning are different from traditional remote sensing CD models in that deep learning can identify the features of data through representation learning, which can reduce the heavy dependence on manual work during feature extraction in remote sensing image CD. erefore, researching remote sensing image CD methods based on deep learning can not only improve the accuracy of remote sensing image CD, but also offer a new automatic and intelligent processing method for realizing current application requirements. e main contributions of the paper can be summarized as follows: (1) In this paper, a change detection framework based on coupled dictionary learning is proposed to map multisource and multitemporal data from low-dimensional space to high-dimensional space and then estimate its change. (2) A differential feature extraction network is proposed to construct a new reconstruction loss by integrating true-false discrimination loss, classification loss, and cross-entropy loss, which is conducive to training the coupled dictionary with stronger representation ability. (3) e multisource remote sensing image change detection is realized by combining the advantages of model driven method (coupled dictionary learning) and data driven method (differential feature extraction network), and data from different observation spaces can be modeled uniformly. e rest of this paper is organized in four sections. Section 2 is a description for related works of the change detection task we are dealing with. Section 3 provides details on the proposed model. Section 4 deals with the experimental results and discussion. Finally, Section 5 summarizes the observations and points out some probable future works to complete this paper.

Related Works
CD refers to the process of observing the same object on the Earth's surface at different times to determine its different states. e task of CD is to answer three questions about object change: (1) Has it changed? (2) Where has the change occurred? (3) What has changed? With the increasing resolution of sensors, the acquisition period of remote sensing images has gradually shortened, and a large number of CD methods based on remote sensing images have been proposed.
In recent years, the emergence of deep learning technology has provided new concepts for CD in remote sensing images and shown strong ability to solve the problems associated with traditional methods. First, compared with traditional methods, deep learning methods do not require a large amount of existing knowledge. Second, no manual intervention is required during the training process, which greatly improves the automation of remote sensing image CD. ird, since deep neural networks have strong learning and feature expression abilities, they are more suitable for processing high-dimensional data than traditional methods. erefore, remote sensing image CD methods based on deep learning methods are attracting increasing attention. Many deep learning methods, such as NN [16], FCN [17], Den-seNet [18], UNet [19], DeepLab [20], and SegNet [21], have been introduced for remote sensing image applications and have achieved extremely high accuracy. Some research results have even approached or attained the accuracy of human vision systems. Previous literature [22] has used an end-to-end recurrent neural network with a long short-term memory (LSTM) network to learn change rules of land cover. Literature [23] trained a deep belief network to classify changing and unchanging regions in SAR images, while literature [24] proposed an end-to-end two-dimensional convolutional neural network for hyperspectral image CD. Other studies [25,26] have researched the use of generative adversarial networks to detect the change areas of multispectral remote sensing images and other cross-domain remote sensing images. Literature [27] proposed a deep Siamese convolutional network (DSCN) to detect changing regions with contrast loss in images. A symmetric convolutional coupled network (SCCN) based on a DSCN was proposed in literature [28] to detect changes between optical and SAR images. In sum, existing CD methods based on deep learning can autonomously learn high-dimensional features of the changed region without manual intervention and perform CD according to the truth value.
Most CD methods analyze multitemporal remote sensing data obtained by the same sensor or same type of sensor. However, with the development of remote sensing imaging technology, there are more and more multisource remote sensing data covering the same area. e imaging mechanisms and results of these multisource data are quite different, making it difficult to detect changes using traditional CD technology. Although there may be major differences in multisource data, they ultimately constitute different descriptions of the same scene. e imaging makes comparisons difficult in the original low-dimensional space, but having the same ground object makes multisource data comparable in high-dimensional space. On the other hand, although rich data are provided by sensors, these data are often the result of a combination of factors. erefore, it is necessary to identify the key relevant information to explain and assist in decision-making. In recent years, sparse representation has increasingly been used in remote sensing image CD and data fusion. In 2021, in order to solve the problem that the accuracy of CD is easily affected by speckle noise, a method based on sparse representation and capsule network was proposed in literature [29]. First, sparse features of difference images are extracted using SR. Second, a lightweight capsule network (L-CapsNet) is constructed to mine the spatial relationships between features. Finally, a change map (CM) is generated. Experimental results show that this method can reduce the influence of speckle noise and obtain stronger robust features. An image fusion method based on sparse representation has also been proposed in the literature [30]. e method is based on the assumption that NDVI and HRVI with different resolutions have the same sparse coefficients in a particular dictionary. e application of this method in multisource NDVI CD was verified using GF-1 and GF-2 satellite datasets.
As mentioned above, multisource image pairs are different descriptions of the same object. e data representation and correlation in different observation space constitute a challenging task. Due to the difference of imaging mechanism, multisource images have completely different imaging results. is encouraged us to try to learn two independent and coupled dictionaries to relate different observation spaces. e dictionary can be coupled by forcing multiple image block pairs to share the same sparse representation. Inspired by the above idea, a coupled dictionary learning method is proposed for multisource image change detection.
e goal of coupled dictionary learning is to match the elements from different observation spaces, so that they have one-to-one correspondence in the same position, and realize the mapping of multisource data.
rough coupling dictionary learning, invariant image block pairs tend to have lower reconstruction errors, while variable image block pairs have higher reconstruction errors. is can suppress the unchanged area, highlight the changed area, and obtain a robust change map. However, the randomly selected training sample may contain some changing training samples, and these changing samples may interfere with the learning of the coupled dictionary. erefore, we design a new loss function to automatically select training samples with low reconstruction errors and exclude training samples with changes in the training set, so as to build a more robust coupled dictionary. Finally, the multisource data is mapped to a high-dimensional feature space to enhance the variation of multisource data.

Proposed Method
Firstly, we give a general introduction to the proposed model in this paper; the input of the model is the original remote sensing images, and the output of the model is the CD results of the corresponding pixel. en, the data preprocessing method, coupled dictionary learning method, and differential feature extraction network are introduced in detail. At the same time, a loss function for CD is introduced. Figure 1, the method proposed in this paper consists of five steps: input, preprocessing, coupled dictionary learning, CD, and output. First, the input consists of multitemporal and multisource remote sensing image data. Second, in the preprocessing step, local image blocks of two different Computational Intelligence and Neuroscience periods data obtained by different sensors on the same scene are expanded into vector representations. ird, each image block pair is stacked as a training sample for coupled dictionary learning and forced to share the same reconstruction coefficient. After the coupled dictionary is learned, it can be determined whether the current coupled dictionary meets the termination condition. If not, the training sample with low reconstruction error is chosen as the next generation of the training set to learn the dictionary pairs with higher coupled degrees (and, if necessary, the process is repeated). Fourth, based on the coupled dictionary learned in the second step, the reconstruction coefficient of each image block pair is calculated and input into the difference feature extraction network to estimate the change degree. Finally, the final binary CD result is produced as output.

Data Preprocessing.
e concept of sparsity first appeared in the 1970s in the context of the efficient coding hypothesis in neurobiology. Today, this idea is widely used in the field of image processing. e following preprocessing should be carried out before attempting remote sensing shadow CD.
Let X x i N i 1 and Y y i N i 1 represent n pairs of image blocks. Each pair (x i , y i ) is randomly sampled from two multisource remote sensing images taken at different times in the same geographical location. In the proposed method, two image blocks (x i , y i ) at the same position are regarded as a pair of image blocks. As shown in Figure 1 where m t n t w 2 . x i and y i can be regarded as column vectors with dimensions m 1 and m 2 , respectively.

Coupled Dictionary Learning.
e typical problems related to sparse representation are expressed as in the following equation: . . , d n ] ∈ R m×d represents the dictionary corresponding to the image; α i represents the sparse coefficient, which is obtained by sparse representation of the i th image block x i according to dictionary D; λ is the regular parameter; and λ α i 1 ensures the sparsity of vector α i . Finally, the constraint on d 2 i 2 ensures that each dictionary atom is homogeneously distributed in order to avoid toolarge values that will increase the time necessary for optimization. e coupled dictionary learning method for remote sensing images proposed in this paper is an extension of the above methods. Let x Let X x i y i , D D x D y ; (2) can be written as follows: where d j is the i th element of the coupled dictionary D.
After the above combination, the complex equations are simplified into sparse representation objective functions, and the corresponding optimization process is also simplified. Equation (3) is not a convex function, which is not convenient for optimization. However, when one of the values is fixed, the above equation becomes a convex function that can be optimized. erefore, the alternate iteration method is used to optimize it. e specific calculation method is as follows: (1) Optimizing coupled sparse coefficient α i . According to the above analysis, dictionary D must be fixed before coupled sparse coefficient α i is optimized. For the first iteration, D is unknown. e dictionary is initialized by randomly selecting data in the training set or randomly generating a matrix. After dictionary D is determined, the objective function becomes a sparse coding problem, and the sparse coefficients are calculated according to the categories. When the i th sparse coefficient α i is calculated, the remaining sparse coefficients that do not belong to the α i (i ≠ j) are fixed. e following equation can be used to solve the sparse coefficient α i : e iterative projection method [31] is used to solve (4).
(2) Optimizing dictionary D. In this optimization process, the sparse coefficient α i is a known condition. e method of updating the dictionary is similar to the previous procedure. When the dictionary D i belonging to the i th category is computed, the rest of the dictionaries D j , (j ≠ i) are fixed. e objective function can be simplified to the following equation: x i represents the sparse coefficient obtained by sparse representation of training image set X on dictionary D, which belongs to the i th category.

Differential Feature Extraction Network.
After obtaining the coupled dictionary D x and D y , the multisource remote sensing image data is mapped to the high-dimensional feature space. In this space, changing image blocks will have different reconstruction coefficients, while unchanging image blocks will have similar reconstruction coefficients. e change degree of each image block pair can be measured by comparing the reconstruction coefficients α x i and α y i of the i th pair of images x i and y i with their coupled dictionaries D x and D y . e calculation method for α x i and α y i is shown in (6), (7), and (8).
λ is the regular parameter. Equation (7) can be solved as in Since the goal of CD is to detect the differences between two images taken at different times, the difference feature vectors are calculated after the feature vectors α x i and α y i of the two images are obtained using the above method. e difference feature vector α d is calculated as in In order to learn the dissimilarity probability between two images from the difference feature vectors, a difference feature extraction network is designed after the difference feature vectors are obtained. e network consists of three fully connected layers, f 1 , f 2 , f 3 , as shown in Figure 1(d).
e value of its hidden layer can be calculated using the following equation: h (k) and h (k− 1) are the output matrices of the k th and (k − 1) th layers. W k is the weight that is input to the hidden layer, b (k) is the bias, and because we need is a binary value of 0 or 1, the ReLU(·) is selected as the activation function. When the information is transmitted to the last fully connected layer fc3, a value ranging from 0 to 1 is output, indicating the probability of dissimilarity. For the above differential feature discriminant network, it is expected that the generated dissimilarity probability can be judged as the correct category label and that the obtained dissimilarity probability will be large for the changing category and small for the unchanging category. erefore, Computational Intelligence and Neuroscience cross-entropy loss is added to its loss function. Its loss function consists of three parts, as shown in the following equation: L adv represents true/false loss, L cls represents category loss, and L E represents cross-entropy loss. ey are calculated as in the following equations: L cls E log C c|y fake , N is the number of training samples, and the value of y i is 0 or 1, which indicates the label of the input neighborhood image block pairs. 1 indicates that they have changed, and 0 indicates that they have not changed. y i fake represents the probability of dissimilarity of input neighborhood image block pairs.
For the optimize algorithm, SGD algorithm is a famous parameter optimization method proposed, which is widely used in model parameter optimization of deep learning. However, its disadvantage is that it has a slow descent speed and may oscillate near the minimum value [32]. erefore, SGD method based on first-order momentum is proposed [33]. is method updates each parameter with the same learning rate. However, deep neural network often contains a large number of parameters. For the parameters which are updated frequently, the model has accumulated a lot of knowledge about them; we hope that they are affected by a single sample, so the learning rate should be slower. However, for parameters that are occasionally updated, the model has little accumulated knowledge, and we hope that more knowledge can be learned from each random sample, so the learning rate should be higher. erefore, an optimization method named AdaGrad based on second-order momentum is proposed [34]. is method performs well for sparse data. However, it may reduce the learning rate monotonously to zero, thus ending the training process prematurely. Adam [35] optimization method solves the above problems completely. is method integrates firstorder momentum and second-order momentum, and the coefficient is used to control the first-order momentum and second-order momentum, which is more conducive to large-scale remote sensing image optimization. erefore, the Adam algorithm is used to optimize loss function L G . During the training process, all parameters of the network are fixed first, and they are updated. When L G training tends toward stability, this indicates that the network has reached a convergence state and the training process can be stopped.

Dataset.
e first dataset is the CDD [36], shown in Figure 2. It consists of seven pairs of images with 4725 × 2700 resolution obtained from Google Earth. A total of 16,000 image pairs (10,000 in the training set, 3,000 in the test set, and 3,000 in the validation set) with a spatial resolution of 3 − 100 cm were collected from the seven pairs of images.
e second dataset is the LEVIR-CD dataset [37], shown in Figure 3. e dataset contains 637 pairs of CD 1024 × 1024 images with a spatial resolution of 0.5 m. e images were collected from 20 different locations in Texas between 2002 and 2018. is dataset helps train more efficient CD models while focusing on interesting changes and reducing the impact of other irrelevant changes on the model.

Evaluation Metrics.
For CD, it is necessary to prove the effectiveness and robustness of the proposed method using quantitative analysis. For performance evaluation purposes, CD results are usually presented as binary images, wherein white pixels represent changed pixels and black pixels represent unchanged pixels. In this experiment, precision, recall, F 1 -score, OA, and IoU were used to evaluate the proposed methods. e calculation methods are shown in the following equations: OA  e Python version was 3.6.10, and the deep learning framework was PyTorch version 1.9.0.
During the training process, due to the low resolution of the CDD, its batch size was set to 16, while due to the high resolution of the LEVIR-CD dataset, its batch size was set to 2 in consideration of the capacity of the system. e initial    Computational Intelligence and Neuroscience 7 learning rate was set to 1e-3 and was adjusted automatically by the system according to the training situation. e number of epochs was set to 100. e K-SVD algorithm was used to learn the coupled dictionary. e differential feature extraction network was composed of three fully connected layers (fc 1 , fc 2 , and fc 3 ). e range of the numbers in the hidden layer of the differential feature extraction network was determined by empirical formulas and Kolmogorov theorem [38]. e number of nodes was set to 150, 100, and 1, respectively.

Ablation Analysis.
Ablation analysis was used to illustrate the contribution of the proposed method's component. We assessed the contribution of the component to the model by comparing the model that contains it with the one that does not. Instead of subtracting, we gradually added modules such as coupled dictionary learning (CDL), differential feature extraction (DFE), and L G loss function (L) onto our proposed baseline network to verify its performance. Compared with LEVID-CD dataset, CDD is a largescale remote sensing dataset. erefore, the CDD is used for ablation analysis. Table 1 shows the results of the ablation experiment. e last row in the table represents the performance of our proposed model. e first row represents the performance of the baseline model (sparse representation). e next row in Table 1 indicates that deploying the coupled dictionary learning module, differential feature extraction, and L G loss function to the baseline model leads to a significant improvement in all the metrics. Adding the coupled dictionary learning module to the model results in a slight improvement in all the metrics except the recall rate.
e F 1 -score is improved due to the addition of differential feature extraction. We then introduce the L G loss function, which raises all the metrics except slightly.
is variation can be attributed to L G loss function, which enhances the representation ability for the proposed method.

Results and Discussion.
To evaluate the performance and effectiveness of the proposed method, we compared it with several classical methods.
In this experiment, four typical change areas, road, car, single object, and multiple objects, were selected to evaluate the performance of the proposed method. As can be seen from Figures 4 to 7, the proposed method consistently achieved the best performance. e boundary of the proposed method's CM is clearer than those of the other methods, and the misclassification area of the proposed method is smaller than those of the other methods. In addition, the change maps obtained by the proposed method are almost consistent with the reference CM.
Meanwhile, the quantitative results in terms of the five quantitative evaluation metrics listed in Table 2 show that the proposed method achieved excellent performance in terms of precision, recall, F 1 -score, IoU, and OA. e OA of the proposed method was 0.883, which is 0.02 higher than that of the GANs method (which had the best precision among the seven SOTA CD methods). e accuracy value was also significantly improved compared with the other methods.
In this experiment, four typical change areas, singleobject change, more-object change, much-more-object change, and dense-object change, were selected to evaluate the performance of the proposed method. As can be seen in Figures 8 to 11, the proposed method achieved the best  Meanwhile, the quantitative results on the four quantitative evaluation metrics listed in Table 3 show that the proposed method achieved excellent performance in terms of precision, recall, F 1 -score, IoU, and OA. Compared with the seven SOTA CD methods, the precision, recall, F 1 -score, IoU, and OA of the proposed method were 0.877, 0.889, 0.873, 0.852, and 0.991, respectively, being 0.031, 0.006, 0.012, 0.005, and 0.004 higher than their counterparts for the GANs method (the second-best method).

Conclusions
In this paper, we proposed a CD method for multisource remote sensing images using a coupled dictionary and deep learning. To reveal the coupled relationship between multisource remote sensing images, a coupled dictionary module was proposed. is module maps multisource remote sensing images from low-dimensional to high-dimensional space to calculate the reconstruction coefficient. e changing image blocks will have different reconstruction coefficients, while unchanging image blocks will have similar reconstruction coefficients. To more accurately describe the degree of change in an object, we constructed a new image block for reconstruction loss in a differential feature discriminant network, which measures the change degree of each image block pair by considering true/false discrimination loss, classification loss, and cross-entropy loss. Compared with other CD methods, the proposed method can effectively alleviate the uncertainty of edge pixel CD and realize satisfactory results. In the future, we plan to focus on CD for large remote sensing images and extend the proposed method to practical applications. e method that we proposed has its limitations. e location where has changed in multisource remote sensing images can be detected by the proposed method. But what has changed in multisource remote sensing images cannot be detected. To address this problem, hyperspectral images containing a lot of spectral information can be used to detect what has changed. In addition, cloud and fog occlusion in remote sensing image can also affect change detection. erefore, it is necessary to preprocess remote sensing images before using the proposed method for change detection.
ese limitations indicate that although a highly efficient change detection method is proposed, it is still a challenging task to generalize the method to practical production, and this will be the direction of our future efforts.
Data Availability e underlying data supporting the results of our study can be found in [36] and [37].

Conflicts of Interest
e authors declare no conflicts of interest.