An Orthogonal Learning Differential Evolution Algorithm for Remote Sensing Image Registration

We introduce an area-based method for remote sensing image registration. We use an orthogonal learning differential evolution algorithm to optimize the similarity metric between the reference image and the target image. Many local and global methods have been used to achieve the optimal similarity metric in the last few years. Because remote sensing images are usually influenced by large distortions and high noise, local methods will fail in some cases. For this reason, global methods are often required. The orthogonal learning (OL) strategy is efficient when searching in complex problem spaces. In addition, it can discover more useful information via orthogonal experimental design (OED). Differential evolution (DE) is a heuristic algorithm that has been shown to be efficient in solving the remote sensing image registration problem. Combining the two, the orthogonal learning differential evolution (OLDE) algorithm is efficient for many optimization problems: it uses the OL strategy to guide the DE algorithm to discover more useful information. Experiments show that the OLDE method is more robust and efficient than the compared methods for registering remote sensing images.


Introduction
Image registration is an important step in many fields [1], such as change detection, image fusion, and object recognition. In order to provide complete information about a scene, it is necessary to register images taken from different sensors or from the same sensor at different times. The result of image registration greatly influences the performance of the follow-up procedure. Remote sensing image registration methods should therefore be efficient, robust, and accurate.
Image registration methods are usually divided into two categories: feature-based and intensity-based methods [2, 3]. Many feature-based methods have been proposed [4, 5]. These methods first extract salient features, such as points, edges, contours, and regions. Those features are then matched using similarity measures to establish the geometric correspondence between the two images. One of the main advantages of these approaches is that they are efficient and robust to noise, complex geometric distortions, and significant radiometric differences. However, they perform well only on the condition that suitable features are extracted and reliable algorithms are used [3]. For some images, where features are not obvious, intensity-based methods perform better than feature-based methods.
The key procedure of an intensity-based method is to find the optimal value of the similarity metric. The similarity metric measures how closely the gray values of two images are matched. The similarity metric for remote sensing image registration must be robust. There are many commonly used similarity metrics [2, 6-11]. Mutual information, which is based on the Shannon definition of entropy [9, 10], has been widely used in recent work. Mutual information has been shown to be robust and does not depend on the intensity scaling or specific dynamic range of the images [12].
The search strategy optimizes the similarity metric. Both local and global search strategies are commonly used. Many local methods have been used in image registration [13, 14]. These local methods yield the best registration when the initial orientation is very close to the true transformation. However, they are easily trapped in a local optimum [14, 15]. Global optimization is therefore often required, and global methods have been successfully applied to image registration [8, 12, 16-22].

Mathematical Problems in Engineering
The differential evolution (DE) algorithm was proposed by Storn and Price for global optimization over continuous search spaces [23]. DE is a version of evolutionary algorithm (EA) that has proven to be fast and reliable in many applications [23-29]. The DE algorithm has been shown to be efficient in remote sensing image registration [30]. The seminal idea of DE is to generate a new vector by adding the weighted difference between two trial vectors to a third vector. The new vector is defined as $V = X_1 + F(X_2 - X_3)$, where $X_1$, $X_2$, and $X_3$ are three randomly selected trial vectors from the population and $F$ is a multiplier, which is the main parameter of the DE algorithm [25].
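The combination step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `combine` and `new_vector`, the toy population of 30 six-parameter vectors, and the fixed random seed are assumptions made for the example and are not taken from the paper.

```python
import random

def combine(x1, x2, x3, F):
    """DE combination step: v = x1 + F * (x2 - x3), element by element."""
    return [a + F * (b - c) for a, b, c in zip(x1, x2, x3)]

def new_vector(population, F, rng):
    """Pick three distinct trial vectors at random and combine them."""
    x1, x2, x3 = rng.sample(population, 3)
    return combine(x1, x2, x3, F)

# Hypothetical toy population: 30 trial vectors with 6 parameters each.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(30)]
v = new_vector(population, F=0.7, rng=rng)
```

The multiplier controls how far the new vector moves along the difference direction; a value of 0 would simply copy the first trial vector.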
The orthogonal experimental design (OED) offers an ability to discover the best combination of levels for different factors with a reasonably small number of experimental samples [31]. Owing to the OED's orthogonal prediction ability and test ability, the orthogonal learning (OL) strategy can construct a guidance exemplar with the ability to predict promising search directions toward the global optimum [31]. The OL strategy has been successfully applied to many areas [32-38]. In this paper, we apply the OL strategy to the image registration problem. Considering the effectiveness of the DE algorithm in image registration [30], we combine the OL strategy with the DE. The main idea of our method is based on the observation that the major step in the DE can be considered to be an "experiment." Based on this "experiment," the OL strategy constructs a guidance exemplar with an ability to predict promising search directions toward the global optimum. In our method, the OED is used to discover the best combination of three trial vectors.
The rest of this paper is organized as follows. In Section 2, image transformation, similarity metrics, and optimization techniques are discussed. In Section 3, the orthogonal learning strategy is formulated. In Section 4, the proposed OLDE for image registration is presented. Experimental results are described in Section 5. Finally, conclusions are drawn in Section 6.

Image Transformation.
The image registration process seeks a one-to-one mapping between two images. The process links the points in the two images that correspond to the same spatial position. The mapping is commonly referred to as a transformation. Here it is a two-dimensional transformation in a two-dimensional space, and the approach proposed in this paper is intended for image registration in two-dimensional space. We use a widely applied affine transformation model to transform the target image. This allows us to demonstrate the efficacy of the OLDE method in image registration.
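As a small illustration of a six-parameter affine model, the sketch below maps one pixel coordinate $(x, y)$ of the target image to its transformed position. The parameter names `a11`, `a12`, `a21`, `a22`, `b1`, `b2` and the function name `affine_transform` are assumptions chosen for this example; a full registration would apply the mapping to every pixel and interpolate gray values, which is omitted here.

```python
import math

def affine_transform(point, params):
    """Map (x, y) to (x', y') with x' = a11*x + a12*y + b1, y' = a21*x + a22*y + b2."""
    x, y = point
    a11, a12, a21, a22, b1, b2 = params
    return (a11 * x + a12 * y + b1, a21 * x + a22 * y + b2)

# Identity transformation: every point stays where it is.
identity = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)

# Hypothetical example: rotate by 90 degrees, then shift 10 pixels along x.
theta = math.radians(90)
rotate_shift = (math.cos(theta), -math.sin(theta),
                math.sin(theta), math.cos(theta), 10.0, 0.0)

moved = affine_transform((1.0, 0.0), rotate_shift)  # close to (10.0, 1.0)
```

Because the model has only six real parameters, each candidate transformation can be stored as a short vector, which is what makes it convenient for population-based search.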

Similarity Metrics.
Similarity metrics must be robust. At the correct registration, they should attain a global or a very distinct local maximum. Most of the current work on remote sensing image registration utilizes mutual information, which has been shown to be robust for this task. Mutual information represents the relative entropy of two images [6]. The larger the value of mutual information, the better the registration of the two images. In general, given two images $A$ and $B$, their mutual information is

$$MI(A, B) = H(A) + H(B) - H(A, B), \tag{1}$$

where $H(A, B)$ is the joint entropy and $H(A)$ and $H(B)$ are the entropies of $A$ and $B$, respectively. $H(A)$, $H(B)$, and $H(A, B)$ are given as

$$H(A) = -\sum_{u} p_A(u) \log p_A(u), \quad H(B) = -\sum_{v} p_B(v) \log p_B(v), \quad H(A, B) = -\sum_{u, v} p_{A,B}(u, v) \log p_{A,B}(u, v), \tag{2}$$

where $p_A(u)$ is the marginal probability density function of $A$, $p_B(v)$ is the marginal probability density function of $B$, and $p_{A,B}(u, v)$ is the joint probability density function of $A$ and $B$. The functions $p_A(u)$, $p_B(v)$, and $p_{A,B}(u, v)$ can be estimated with Parzen windows [10]. Normalized mutual information $NMI$ is given as

$$NMI(A, B) = \frac{H(A) + H(B)}{H(A, B)}. \tag{3}$$

Normalized mutual information is less sensitive to the size of the overlap area. Previous studies have shown that normalized mutual information is an accurate and robust similarity metric for image registration [6, 8, 39]. Therefore, in the current study, normalized mutual information was selected as the similarity measure. In image registration, the greater the value of $NMI$, the better the match between the two images, so this is a maximization problem. Many optimization problems are formulated as minimization problems; if the objective function is denoted as $f(X)$, then for image registration the goal is equivalently to minimize $-f(X)$, without loss of generality.
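To make the metric concrete, the following sketch computes normalized mutual information for two equally sized images given as flat lists of gray values. It assumes the common definition $NMI = (H(A) + H(B)) / H(A, B)$ and, for brevity, estimates the probabilities from plain joint-histogram counts rather than from the Parzen windows mentioned in the text. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(counts, total):
    """Shannon entropy (natural log) of a histogram given as raw counts."""
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def normalized_mutual_information(a, b):
    """NMI = (H(A) + H(B)) / H(A, B) for two flat lists of gray values."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    n = len(a)
    h_a = entropy(Counter(a).values(), n)           # marginal entropy of A
    h_b = entropy(Counter(b).values(), n)           # marginal entropy of B
    h_ab = entropy(Counter(zip(a, b)).values(), n)  # joint entropy
    if h_ab == 0.0:
        raise ValueError("joint entropy is zero; NMI is undefined")
    return (h_a + h_b) / h_ab

# Two tiny 4-pixel "images" used only for illustration.
aligned = normalized_mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
unrelated = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

With this definition, perfectly dependent images give a value near 2 and statistically independent images give a value near 1, so a higher value indicates a better match.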

Optimization of Similarity Metrics.
The goal of image registration is to find the best transformation parameters to maximize the objective function (similarity metric). Both image interpolation and joint probability density estimation are involved in image registration. Thus, computing the input quantities for (3) is computationally expensive. Therefore, an effective method is needed to reduce this cost. The OLDE algorithm proposed in this paper is an efficient global method for image registration.

Motivation of Orthogonal Learning Strategy.
The OL strategy can discover more useful information toward the global optimum [31]. Here is a simple case to illustrate the importance of the OL strategy. Consider the 3-dimensional sphere function $f(X) = x_1^2 + x_2^2 + x_3^2$, whose minimum value is 0 and whose minimum point is [0, 0, 0]. Suppose that the current trial vectors are $X_1$, $X_2$, and $X_3$. Furthermore, assume that the value of $F$ is 0.5, so that the new vector is $V = X_1 + F(X_2 - X_3)$. This results in a new vector with a cost value of 20, which is worse than both $X_1$ and $X_3$. However, if we can discover the good dimensions of the three vectors, we can combine them to form three new trial vectors $X_1'$, $X_2'$, and $X_3'$. Then, the updated vector is $V' = [0, 0, 1.5]$, with a cost value of 2.25. Thus, the objective function moves toward the global optimum value of 0. The simple case above illustrates the importance of designing three new trial vectors to generate the updated vector. In order to find the best combination of trial vectors, we use orthogonal experimental design (OED) [40], which can obtain a relatively good vector through only a few experimental tests [31].
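The effect can be reproduced numerically. In the sketch below, the three starting vectors are hypothetical values chosen only so that the costs match the figures quoted above (20 before recombination, 2.25 after); they are not the vectors used in the paper, and the particular recombination shown is just one way of picking good dimensions.

```python
def sphere(x):
    """3-dimensional sphere function: sum of squared components."""
    return sum(c * c for c in x)

def de_vector(x1, x2, x3, F):
    """DE combination step: x1 + F * (x2 - x3)."""
    return [a + F * (b - c) for a, b, c in zip(x1, x2, x3)]

F = 0.5
# Hypothetical trial vectors (assumed for illustration only).
x1, x2, x3 = [0, 4, 1], [4, 0, 0], [0, 0, 2]

v = de_vector(x1, x2, x3, F)   # [2.0, 4.0, 0.0], cost 20.0
# Costs of x1 and x3 are 17 and 4, so v is worse than both.

# Recombine dimension by dimension, taking each value from one of the parents.
y1 = [x1[0], x2[1], x3[2]]     # [0, 0, 2]
y2 = [x3[0], x3[1], x2[2]]     # [0, 0, 0]
y3 = [x1[0], x2[1], x1[2]]     # [0, 0, 1]

v_new = de_vector(y1, y2, y3, F)  # [0.0, 0.0, 1.5], cost 2.25
```

The same three parents therefore yield a much better vector once the zero-valued dimensions are gathered together, which is the kind of combination the OED is meant to find without testing every possibility.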

Orthogonal Experimental Design.
We use a simple example to explain OED. In this example, to yield the maximum chemical product, we should find the best level combination of three factors: temperature, time, and alkali. These three factors, which affect the experimental results, are shown in Table 1, and we denote them as factors A, B, and C. The temperature has three levels: 80 °C, 85 °C, and 90 °C. The time can be 90 min, 120 min, or 150 min. The alkali can be 5%, 6%, or 7%. Therefore, there are $3^3 = 27$ combinations of experimental designs in total. It is desirable to obtain or predict the best combination by sampling only a few representative experimental cases.
Let $L_M(Q^N)$ denote an orthogonal array, where $N$ is the number of factors, each factor has $Q$ levels, $M$ is the number of combinations of levels, and "L" denotes a Latin square. Table 2 shows an orthogonal array $L_9(3^3)$. Each row in this table shows one combination of levels. The orthogonal array has three properties. First, for the factor in any column, every level occurs an equal number of times. Second, for the two factors in any two columns, every combination of two levels occurs an equal number of times. Third, the selected combinations are uniformly distributed over the whole space of all the possible combinations [32]. We apply $L_9(3^3)$ to the example of the chemical experiment. An orthogonal array with three factors is shown in Table 3.
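The first two properties can be checked mechanically. The sketch below builds one standard nine-row, three-factor, three-level orthogonal array (the third column is the sum of the first two modulo 3) and verifies the level and pair balance. This construction is an assumption made for illustration; it may differ from the array in Table 2 in row order or level labels, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def l9_array():
    """Nine rows, three factors, levels 0..2; third column = (a + b) mod 3."""
    return [(a, b, (a + b) % 3) for a in range(3) for b in range(3)]

def is_orthogonal(array, levels):
    """Check that levels are balanced per column and per pair of columns."""
    n_cols = len(array[0])
    per_level = len(array) // levels
    per_pair = len(array) // (levels * levels)
    # Property 1: every level occurs equally often in each column.
    for col in range(n_cols):
        counts = Counter(row[col] for row in array)
        if len(counts) != levels or any(c != per_level for c in counts.values()):
            return False
    # Property 2: every pair of levels occurs equally often in any two columns.
    for c1, c2 in combinations(range(n_cols), 2):
        counts = Counter((row[c1], row[c2]) for row in array)
        if len(counts) != levels ** 2 or any(c != per_pair for c in counts.values()):
            return False
    return True

array = l9_array()
balanced = is_orthogonal(array, levels=3)
```

Nine rows instead of 27 is the saving the text refers to: each level of each factor is still tested three times, and each pair of levels exactly once.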
In this paper, we use the OL strategy to guide the DE algorithm to select promising search directions toward the global optimum, which enables us to achieve the best image registration results in terms of the normalized mutual information similarity metric.

Differential Evolution Algorithm.
Then we execute DE $N_P$ times to generate $N_P$ new vectors, where $N_P$ is the population size. Each new vector $V$ is created by combining three randomly selected trial vectors from the population. This combination process is defined as follows:

$$V = X_1 + F(X_2 - X_3), \tag{8}$$

where $X_1$, $X_2$, and $X_3$ are three randomly selected trial vectors from the population and $F$ is a multiplier, which is the main parameter of the DE.

Fitness of Image Registration Using the OLDE.
We take (3) as the fitness function. Given two images $A$ and $B$, the aim of the problem becomes finding the best affine transformation $T$ for $B$ so that the normalized mutual information of $A$ and $T(B)$ is maximized.

The Procedure of Image Registration Using the OLDE.
The procedure of the OLDE is as follows.
Step 1 (input). Input the target image and the reference image.
Step 3 (population evolution).
Step 3.1. DE, as (8), is executed $N_P$ times using multiplier $F$, where $N_P$ is the population size. A new population $P = \{V_1, V_2, \ldots, V_{N_P}\}$ is generated.
Step 3.2. Randomly choose three vectors in $P$ to undergo three-to-three orthogonal crossover. After evaluating the fitness of vectors $V_1, V_2, \ldots, V_{N_P}$ as (3), the worst three of them are recorded. The worst three vectors are replaced with the three new vectors generated by the orthogonal crossover. This results in a new population $P'$.
Step 3.3. Evaluate the fitness of the vectors in $P'$, choose the best one $V_{best}$, and record its fitness value $f(V_{best})$. After that, increment the generation number $g$ by 1.
Step 5 (fusion). Fuse the transformed target image and the reference image to generate the result image.
In Step 3, the population is evolved and improved iteratively until the halting criterion is satisfied. One possible halting criterion is to stop when the generation number $g$ equals a given maximum value. In Step 3.1, each new parameter value must be checked: if $v_{ij}$ is out of the range of the $j$th parameter, we replace $v_{ij}$ with the initialized value $x_{ij}$, where $1 \le i \le N_P$ ($N_P$ being the population size) and $1 \le j \le 6$. In Step 3.2, three-to-three orthogonal crossover is executed: an orthogonal array is generated as in Table 2, and the orthogonal crossover procedure is carried out with $N = 3$ and $Q = 3$. In this step, the three worst vectors are eliminated. Experiments showed that the speed of convergence can be improved and the diversity of the population can be kept by replacing the three worst vectors with three vectors generated by orthogonal crossover.
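The range check and the replacement of the three worst vectors can be sketched as follows. Several details here are assumptions, since they are not fully specified in the text available: the six parameters are grouped into three factors of two parameters each, the level of a factor selects which of the three parents supplies that group, and the three fittest of the nine sampled combinations are kept as the new vectors. All function names and the toy fitness are hypothetical.

```python
# Nine-row, three-factor, three-level orthogonal array (levels 0..2).
L9 = [(a, b, (a + b) % 3) for a in range(3) for b in range(3)]
GROUPS = [(0, 1), (2, 3), (4, 5)]  # assumption: three factors, two parameters each

def repair(v, x, bounds):
    """Replace any out-of-range component of v with the matching component of x."""
    return [vj if lo <= vj <= hi else xj for vj, xj, (lo, hi) in zip(v, x, bounds)]

def orthogonal_crossover(parents, fitness, n_children=3):
    """Build nine candidates from three parents and keep the fittest ones."""
    candidates = []
    for row in L9:
        child = [0.0] * 6
        for group, level in zip(GROUPS, row):
            for j in group:
                child[j] = parents[level][j]  # factor level = index of the parent
        candidates.append(child)
    candidates.sort(key=fitness, reverse=True)  # higher fitness is better
    return candidates[:n_children]

def replace_worst(population, children, fitness):
    """Drop the len(children) worst vectors and append the children."""
    ranked = sorted(population, key=fitness, reverse=True)
    return ranked[:len(ranked) - len(children)] + children

def toy_fitness(v):
    """Stand-in for the NMI fitness: best (zero) at the origin."""
    return -sum(c * c for c in v)

parents = [[0, 0, 5, 5, 5, 5], [5, 5, 0, 0, 0, 0], [5, 5, 5, 5, 5, 5]]
children = orthogonal_crossover(parents, toy_fitness)
population = [[1] * 6, [2] * 6, [3] * 6, [4] * 6, [5] * 6]
new_population = replace_worst(population, children, toy_fitness)
```

In a real registration run, `toy_fitness` would be replaced by the normalized mutual information between the reference image and the transformed target image, which is far more expensive to evaluate, so sampling only nine combinations matters.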

Experiments and Discussion
To investigate the performance of our method, we have compared it against three image registration methods: genetic algorithm (GA), particle swarm optimization (PSO) [12], and the differential evolution (DE) [30] algorithm.
In both experiments, we set the population size $N_P = 30$ and the maximum allowed number of generations $G_{\max} = 200$. In the GA algorithm, we set $p_m = 0.05$ and $p_c = 0.8$, where $p_m$ is the probability of mutation and $p_c$ is the probability of crossover. In the PSO algorithm, the weight $w$ declines linearly from 0.9 to 0.4 and $c_1 = c_2 = 2.0$. In the DE algorithm, we set the multiplier FD = 0.7 and the probability of crossover CR = 0.5. In the OLDE algorithm, we set the initial value $F = 0.7$, where $F$ is the multiplier of the DE procedure in the OLDE. The allowed variation ranges of the 6 affine transformation parameters used in our experiments are shown in Table 4. To test the performance of all methods, we ran each method 20 times. We record the value of the normalized mutual information (NMI) for the best solution $NMI_{best}$, the worst solution $NMI_{worst}$, the mean value $\overline{NMI}$ over the 20 runs, and the standard deviation $\sigma$ (see Tables 5 and 6).

Ottawa Dataset Task.
In the first experiment, we select two images from the Ottawa dataset, shown in Figure 1, as the experimental images. In Figure 2, image registration results are shown only for the best experiment out of 20 experiments; that is, for each search method, the image corresponding to the highest NMI value is shown. The matching image is created by superimposing one image on the other. Table 5 records the statistical results for each method. The table also contains the optimal transformation parameter values obtained by each optimization method. Our method obtains the best result among the four methods. In addition, our method attains a much smaller $\sigma$ value than the other methods; the standard deviation of 0.0002 for the OLDE is very small. Therefore, the OLDE method is robust with respect to the initial parameter values. In Figure 3, the NMI values are shown as a function of the generation number. As shown in the figure, the OLDE has a much higher NMI value than the other methods. Thus, our method outperforms all the comparison methods.

Yellow River Dataset Task.
The numerical results of independently running the optimization algorithms 20 times are shown in Table 6. Also in this case, the OLDE method is the best method in terms of the NMI. Our method attains a much smaller $\sigma$ value than DE, PSO, and GA. Therefore, the OLDE method is robust with respect to the initial parameter values.
In Figure 5, the NMI values are shown as a function of the generation number. As can be observed from the results, as the number of generations increases, the performance of PSO and GA is far worse than that of the OLDE method and the DE method. In more detail, after the 60th generation, the OLDE method emerges as a better method than the DE method. Therefore, using OL to guide DE can improve the performance of the DE method.
In Figure 6, image registration results are shown only for the best experiment out of 20 experiments; that is, for each search method, the image corresponding to the highest NMI value is shown. The best results for each of the four optimization methods are included in the figure.
In the third experiment, we also select two images acquired by RADARSAT-2 at the Yellow River Estuary region in June 2008 and June 2009. The two images are shown in Figure 7. Both images are 400 × 400 pixels with an 8-bit dynamic range.
The numerical results of independently running the optimization algorithms 20 times are shown in Table 7. Also in this case, the OLDE method can receive the largest value

Conclusion
This paper proposed a method for remote sensing image registration using the orthogonal learning differential evolution (OLDE). The orthogonal learning (OL) strategy can construct a guidance exemplar with an ability to predict promising search directions toward the global optimum. Differential evolution (DE) is a version of evolutionary algorithm (EA) that has proven to be fast and reliable in many applications.
The OLDE method uses the OL strategy to guide the DE algorithm to select promising search directions toward the global optimum.
To investigate the performance of our method, we compared it against three image registration methods: genetic algorithm (GA), particle swarm optimization (PSO), and the differential evolution (DE) algorithm. The OLDE method was shown to achieve the best image registration results in terms of the normalized mutual information similarity metric. Furthermore, experiments showed that the OLDE method was robust and efficient with respect to the initial parameter values.

Figure 1: Ottawa dataset: (a) target image acquired in May 1997. (b) Reference image acquired in August 1997.
The two experimental images are portions of the city of Ottawa acquired by the RADARSAT SAR sensor in May 1997 and August 1997, respectively. They were provided by Defense Research and Development Canada (DRDC), Ottawa. Figure 1(a) shows the image acquired in May 1997 during the summer flooding and Figure 1(b) shows the image acquired in August 1997 after the summer flooding. Both images are 290 × 350 pixels with 8 bits per pixel.

Figure 2: Result of Ottawa dataset: image (a) is the result of image registration using OLDE. Image (b) is the result of image registration using DE. Image (c) is the result of image registration using PSO. Image (d) is the result of image registration using GA.

Figure 3: Result of Ottawa dataset: behavior of GA, PSO, DE, and OLDE for optimizing the normalized mutual information.
In the second experiment, we use two images acquired by RADARSAT-2 at the Yellow River Estuary region in China in June 2008 and June 2009 as the experimental images. The two images are shown in Figure 4. Both Figures 4(a) and 4(b) are 600 × 500 pixels with an 8-bit dynamic range.

Figure 5: Result of Yellow River dataset: behavior of GA, PSO, DE, and the OLDE for optimizing the normalized mutual information.

Figure 6: Result of Yellow River dataset: image (a) is the result of image registration using the OLDE method. Image (b) is the result of image registration using DE. Image (c) is the result of image registration using PSO. Image (d) is the result of image registration using GA.

Figure 8: Result of Yellow River dataset: behavior of GA, PSO, DE, and the OLDE for optimizing the normalized mutual information.

Figure 9: Result of Yellow River dataset: image (a) is the result of image registration using the OLDE method. Image (b) is the result of image registration using DE. Image (c) is the result of image registration using PSO. Image (d) is the result of image registration using GA.

Table 1: Factors and levels of the chemical experiment example.

$a_{11}$, $a_{12}$, $a_{21}$, $a_{22}$, $b_1$, and $b_2$ are six transformation parameters. Then, the transformation formulas of the image are represented as

$$x' = a_{11} x + a_{12} y + b_1, \qquad y' = a_{21} x + a_{22} y + b_2,$$

where $(x, y)$ is the coordinate of the target image and $(x', y')$ is the coordinate of the transformed target image. Given the 6 real parameters $a_{11}$, $a_{12}$, $a_{21}$, $a_{22}$, $b_1$, and $b_2$, trial vectors are formed based on those values. Therefore, each trial vector in the initial population of trial vectors is an array with 6 positions, with the parameter vector denoted by $X = (a_{11}, a_{12}, a_{21}, a_{22}, b_1, b_2)$. The initial population is randomly initialized so that each parameter varies uniformly within its own range.

Population Evolution.
To create a new population, both orthogonal crossover and the DE algorithm are executed consecutively. We refer to this combination as the OLDE algorithm.

Orthogonal Crossover Based on Orthogonal Array.
Here, we introduce the process of orthogonal crossover. Suppose we have a population of vectors, from which some are selected to be the parent vectors, each holding the same number of real values. Based on an orthogonal array $L_M(Q^N)$, these parents produce new vectors as follows. Input: the parent vectors.

Table 4: Variation ranges of 6 affine transformation parameters.

Table 5: Comparison of the image registration results on the Ottawa dataset obtained by different methods.

Table 6: Comparison of the image registration results on the Yellow River dataset obtained by different methods.

Table 7: Comparison of the image registration results on the Yellow River dataset obtained by different methods.