Multisensor Fusion of Landsat Images for High-Resolution Thermal Infrared Images Using Sparse Representations

Land surface temperature (LST) is an important parameter in the analysis of climate and human-environment interactions. Landsat Earth observation satellite data, including a thermal band, have been used for environmental research and applications; however, the spatial resolution of this thermal band is relatively low. This study investigates an efficient method of fusing Landsat panchromatic and thermal infrared images using a sparse representation (SR) technique. SR is used to estimate the missing details of the available thermal infrared (TIR) image and thereby enhance its spatial features. First, we propose a method of building a proper dictionary that accounts for the spatial resolution of the original thermal image. Second, a sparse representation relation between low- and high-resolution images is constructed in terms of the Landsat spectral response. We then compare the fused images created with different sampling factors and patch sizes. The results of both qualitative and quantitative evaluation show that the proposed method improves spatial resolution and preserves the thermal properties of basic LST data for use with environmental problems.


Introduction
Land surface temperature (LST) is defined as the temperature emitted by the surfaces of land objects. LST is a key variable in agricultural, climatological, hydrological, environmental, and ecological studies [1,2]. Furthermore, it reflects the combined results of surface-atmosphere interactions and energy fluxes between the atmosphere and the ground [3]. For urban climate research, satellite-based LST can comprehensively depict the complex spatial temperature distribution of highly heterogeneous urban environments [2].
LST can be obtained for a particular set of sample points from ground weather stations or, with the modern development of satellites and high-resolution sensors, it can be estimated over large regions from the thermal infrared band data supplied by satellites [4]. Many satellite sensors can sense thermal infrared (TIR) radiation with different spectral and spatial resolutions [5]. Medium-resolution thermal infrared data, such as Landsat TM/ETM+ and ASTER imagery, have been used extensively for analysis on a regional scale to study surface temperature variations and to relate them to land cover characteristics [6].
Unfortunately, such TIR images from satellite sensors have low spatial resolutions and high noise levels. In contrast, the panchromatic band has a relatively higher resolution. The motivation for fusing TIR and panchromatic data is to produce data with an improved spatial resolution.
Panchromatic images observe radiation reflected from Earth's surface over a visible and near-infrared (NIR) wavelength range of 0.4 to 0.9 μm, whereas TIR images observe radiation emitted from Earth's surface over a TIR wavelength range of 8 to 15 μm [7]. Because the panchromatic and TIR bands capture different physical processes, fusing the two types of images is difficult.
Many researchers in the field of image processing have developed efficient image fusion algorithms such as the expectation maximization (EM), discrete wavelet transform (DWT), and Laplacian pyramid [8,9] methods. Much of the literature focuses on techniques for integrating IR and visible image information for enhanced visual quality.
In addition to these general multiresolution fusion methods, many techniques have been developed specifically for the fusion of multispectral satellite images [9]. However, those methods require overlapping spectral responses, for example, between the RGB visible bands and the panchromatic band. Additionally, fusion methods for satellite TIR and visible band images have been proposed that use a physical property-based postcorrection in a two-step process [9] or an optimal scaling factor to control the trade-off between spatial detail and thermal information [10]. However, a one-step approach that preserves thermal properties is still needed.
Image fusion methods based on compressive sensing (CS) have gained attention in recent years; in these approaches, the high-frequency details of reconstructed high-resolution (HR) images can be learned from HR training images [11]. Yang and Li [12] first introduced the use of sparse representation (SR) in image fusion. The sliding window technique (overlapping patches) is adopted in their method to make the fusion process more robust against noise and misregistration. Yang et al. [13] presented an SR method in the CS framework, which needs to train two dictionaries to ensure that the low-resolution (LR) and HR image patches have the same coefficients. However, the performance of this method relies heavily on the number of atoms in the dictionaries. Jiang et al. [14] proposed a practical fusion method using dictionaries constructed from a set of available PAN and MS images. Liu et al. [15] presented a general image fusion framework combining multiscale transforms (MST) and SR to simultaneously overcome the inherent defects of both the MST- and SR-based fusion methods. Low-pass bands are merged using an SR-based fusion approach that uses the max-L1 rule to obtain a fused sparse vector.
Inspired by these ideas, we propose a fusion method based on SR for Landsat TIR and panchromatic images. A proper TIR dictionary is shaped considering the spatial resolution of the original thermal images, and an SR relation between low- and high-resolution images is constructed in terms of the satellite sensor's spectral response. To verify the effectiveness of the proposed framework, four fusion methods, namely, the cross bilateral filter (CBF) [16], multiwavelet (MW) [9], discrete wavelet transform (DWT) [17], and optimal-scaling-factor-based fusion (OSF) [10] methods, are tested in our experiments.
The rest of this paper is organized as follows. We first introduce the basic theory of CS and describe the detailed fusion framework in Section 2. In Section 3, experimental results are shown and compared with the results of other approaches. Finally, Section 4 summarizes the main conclusions of this paper.

Proposed Fusion Framework
To better illustrate the proposed SR-based framework, we present its details in this section.

Compressive Sensing.
Consider obtaining the value of α in the following problem for y ∈ R^n and α ∈ R^m:

y = Dα. (1)

When n > m, the system is incomplete, and when n < m, it is overcomplete. Only in the case of n = m do we expect that the inverse of D exists. For the incomplete case no solution exists; for the overcomplete case, there exists more than one solution. This means that we obtain the desired solution only under certain conditions, which leads to optimization problems in signal representation. Under a sparsity condition on the signal, we can choose the best signal recovery approach in the overcomplete case. If we assume that y is a measurement and D is some transform matrix or dictionary, we must decide what constitutes the true signal α. When the signal α is sparse, we can solve the optimization problem

α̂ = arg min ‖α‖_0 subject to y = Dα, (2)

which is an ℓ0-minimization problem. It is NP-hard, but several approximation methods exist, such as basis pursuit (BP) and orthogonal matching pursuit (OMP) [18]. Another optimization approach is ℓ1-minimization:

α̂ = arg min ‖α‖_1 subject to y = Dα. (3)

Optimization based on ℓ1-minimization can exactly recover K-sparse signals and closely approximate compressible signals with high probability using only n ≥ cK log(m/K) i.i.d. Gaussian measurements [19].

Signal Recovery Algorithm.
A stable solution of (3) can be found by using relaxation techniques such as basis pursuit or greedy algorithms such as orthogonal matching pursuit (OMP). We used the OMP method [20] to approximate the true signal α for the given dictionary D, the measured data y, and the sparsity K of the true signal. The OMP method chooses K columns out of the dictionary by using projection methods for K-sparse signal recovery. Initially, the signal y is projected onto all the columns of D. The column having the maximal inner product is selected, and an index set representing the position of the column is built; this set is later used as the signal support. Then, the residual is taken as the new signal and the procedure above is repeated until the index set contains a number of elements equal to the degree of sparsity. The residuals and the index set are thus updated successively, and the signal is approximated.
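The greedy selection loop described above can be sketched in a few lines of NumPy (a minimal sketch; the function name `omp` and the fixed-sparsity stopping rule are our illustrative choices, not from a particular library):

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedily select k atoms of D to approximate y.

    D : dictionary with unit-norm columns, shape (n, m)
    y : measurement vector, shape (n,)
    k : sparsity (number of atoms to select)
    """
    residual = y.copy()
    support = []          # index set of selected columns (the signal support)
    coef = np.zeros(0)
    for _ in range(k):
        # Project the residual onto all columns; pick the most correlated one.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break         # no new atom improves the fit
        support.append(idx)
        # Least-squares fit of y on the selected atoms, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha
```

For a signal synthesized from a few atoms of an incoherent random dictionary, this loop typically recovers the support within a few iterations.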

Problem Statement.
Let MS = {MS_k}, k = 1, ..., B, be the available MS image composed of B bands, and let P be the HR PAN image. In our research, the objective of the fusion process is the estimation of the TIR image X at the resolution of the PAN image, which is denoted as HR TIR (see Figure 1). X is tiled in Q overlapping patches of size NR × NR, where R is the resolution ratio between the MS and PAN images, and N is a scalar coefficient tuned by the user.
Let x denote a vector containing the pixels, arranged by columns, belonging to a generic patch of X. In the SR-based framework, x is approximated as a linear combination of the elements of a dictionary D^h, where the superscript h indicates that the dictionary is composed of high-spatial-resolution pixels:

x ≈ D^h α. (5)

The aim of our SRT is to find an optimal combination of x using the elements of the dictionary subject to a sparsity constraint on the coefficients α, which results in minimum reconstruction error. The sparsity constraint enforces that the coefficient vector α has few nonzero entries (i.e., a small "pseudonorm" ‖α‖_0). A good choice of dictionary enables signal description using a (very) sparse representation.
Consequently, the suitability of the approach for the problem at hand derives from the consideration that patterns tend to repeat several times within an image, although the number of recurrences is smaller for high-variance zones. In other words, there are fewer recurrences in zones characterized by a greater amount of spatial detail [21].
As in [14], we construct the dictionary as

D^h = diag(Δ^h_1), (6)

where the dictionary Δ^h_1 = [p^h_(1,1), ..., p^h_(1,M)] is composed of M patches of the PAN image P.
The estimation of the coefficients α is performed at a simulated low-resolution PAN image scale from the original MS image by solving (7). This process is based on the scale-invariance hypothesis, which is justified by the direct correspondence of patches in the fused and original images [14]. In more detail, we consider the underdetermined system y = D^l α under the sparsity constraint on α; namely, we solve

α̂ = arg min ‖α‖_0 such that y = D^l α. (7)
In (7), y = [y_1] ∈ R^(N²) denotes a patch of the TIR image arranged by columns, which is a band of the original MS image (see Figure 1). The dictionary D^l = diag(Δ^l_1) is the reduced-scale counterpart of D^h and consists of Δ^l_1 = [p^l_(1,1), ..., p^l_(1,M)], which is composed of M patches of the low-resolution (LR) PAN image. The relative spectral response functions of the Landsat satellite sensors are shown in Figure 2. According to Figure 2, an LR PAN image patch can be expressed as a nearly linear combination of all the bands of the corresponding original MS image patch [22].
We apply the OMP procedure to the TIR band; in other words, we find the vector α = [α_TIR]. This allows appreciable computational savings with respect to the overall optimization problem defined by (7) without introducing significant performance degradation [23]. Each corresponding patch of the image X is then reconstructed using formula (5). The whole HR TIR image is finally obtained by averaging the patches in the overlapping zones.
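The patch-based pipeline above can be sketched as follows. This is a simplified single-band illustration under our own assumptions: `sr_fuse`, `extract_patches`, the patch sizes, and the column normalization are illustrative choices, not the paper's exact implementation; only the tolerance (1e-4) and iteration cap (150) follow the settings reported later.

```python
import numpy as np

def omp(D, y, tol=1e-4, max_iter=150):
    """Greedy OMP with an error tolerance (settings mirror the paper's setup)."""
    residual = y.copy()
    support, coef = [], np.zeros(0)
    for _ in range(max_iter):
        if np.linalg.norm(residual) < tol * max(np.linalg.norm(y), 1e-12):
            break
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break  # atom already selected; no further progress possible
        support.append(idx)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def extract_patches(img, size, step=1):
    """All size-by-size patches at the given step, flattened column-wise."""
    H, W = img.shape
    cols, offs = [], []
    for i in range(0, H - size + 1, step):
        for j in range(0, W - size + 1, step):
            cols.append(img[i:i + size, j:j + size].flatten(order='F'))
            offs.append((i, j))
    return np.array(cols).T, offs

def sr_fuse(pan_hr, pan_lr, tir_lr, n_lr=8, ratio=2):
    """Sparse codes estimated on the LR PAN dictionary are reused with the
    matching HR PAN dictionary; overlapping reconstructions are averaged."""
    n_hr = n_lr * ratio
    Dl, _ = extract_patches(pan_lr, n_lr, step=n_lr)   # LR dictionary atoms
    Dh, _ = extract_patches(pan_hr, n_hr, step=n_hr)   # corresponding HR atoms
    norms = np.linalg.norm(Dl, axis=0)
    norms[norms == 0] = 1.0
    Dl, Dh = Dl / norms, Dh / norms                    # keep LR/HR atoms consistent
    out = np.zeros(pan_hr.shape)
    weight = np.zeros(pan_hr.shape)
    patches, offs = extract_patches(tir_lr, n_lr, step=1)
    for y, (i, j) in zip(patches.T, offs):
        alpha = omp(Dl, y)                             # sparse code at the LR scale
        x = (Dh @ alpha).reshape(n_hr, n_hr, order='F')  # HR patch reconstruction
        out[i * ratio:i * ratio + n_hr, j * ratio:j * ratio + n_hr] += x
        weight[i * ratio:i * ratio + n_hr, j * ratio:j * ratio + n_hr] += 1
    return out / np.maximum(weight, 1)                 # average overlapping zones
```

The sliding-window step of one for the TIR patches and the averaging of overlapping reconstructions correspond to the tiling procedure described above.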

Source Images.
We tested the proposed fusion approach using Landsat 7 multispectral images. Landsat 7 satellite data were downloaded from the US Geological Survey Global Visualization Viewer (GloVis) website (http://glovis.usgs.gov/, accessed August 2016). The test data were acquired on 25 October 2015 from 115/036 (path/row) in World Reference System-2 (WRS-2). The ETM+ sensor scans a spectrum of eight bands, providing images of Earth's surface with resolutions of 30 m for the multispectral bands, 60 m for the thermal band, and 15 m for the panchromatic band (Table 1). The test data cover Jellanamdo in South Korea, including mountains, agricultural fields, coastlines, and city areas. Consequently, we tested the performance of the fusion approach in both urban and rural areas. Figures 3(a) and 3(b) show the PAN and TIR images of the study area. Spatial details including paddy fields, foreshores, roads, and mountains are apparent from the PAN image in Figure 3(a), but we can recognize only the overall pattern of features from the TIR image in Figure 3(b). As seen in the PAN image (Figure 3(a)), most mountains, rivers, and foreshores have relatively lower reflectance values, and most paddy fields are brighter than mountains. The two brightest targets are a greenhouse and a vacant lot covered with cement. As for the TIR image, paddy fields are much brighter than mountains and rivers (Figure 3(b)). The brightest spot in the center of the study area is Suncheon's water treatment plant (Figure 4). It should be noted that some features show different grayscales in the PAN and TIR images. For example, roads covered with different cement and asphalt materials are represented differently in grayscale in the PAN and TIR images because the TIR image is related to surface emittance, namely, surface temperature (Figure 4). Thus, fusion must be carefully performed in order to maintain the emission properties of the ground surface and to allow interpretation of the spatial variation of surface features.

Objective Evaluation Metrics.
The quantitative evaluation of different fused images is not easy because a reference temperature image, as ground truth, does not exist in practice. Although various quality indices have been proposed to evaluate image fusion performance, none of them is more plausible than the others [9,24]. Thus, fused image quality is assessed with several metrics that evaluate the source images and the fused image. In this work, ten typical metrics are briefly introduced and employed to compare quantitatively the performances of the fusion methods.
(1) The pixel intensity mean (PIM) measures the overall pixel level and is given by PIM = (1/(MN)) ∑_(i=1)^M ∑_(j=1)^N F(i,j), where F(i,j) is the pixel value at (i,j) and the image size is M × N. Because we process a thermal image, it is important to keep the PIM as close as possible to that of the original thermal image.
(3) Entropy (E) estimates the amount of information present in the image and is given by E = −∑_(l=0)^(L−1) p_l log2(p_l), where p_l is the probability of the intensity value l in an L-gray-level image. An image with high information content will have high entropy, suggesting better fusion quality.
(4) Mutual information (MI) quantifies the overall mutual information transferred to the fused image from the source images. MI can be defined as MI = MI_AF + MI_BF, where MI_AF = ∑_a ∑_f p_(A,F)(a,f) log2(p_(A,F)(a,f)/(p_A(a) p_F(f))) is the mutual information between the source image A and the fused image F, and MI_BF is defined analogously for the source image B and the fused image F.
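For reference, the entropy and mutual information metrics can be computed from image histograms as follows (a sketch assuming images with integer gray levels in [0, 256); the function names are ours):

```python
import numpy as np

def entropy(img, levels=256):
    """Shannon entropy E = -sum p_l log2(p_l) over the gray-level histogram."""
    hist, _ = np.histogram(img, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    p = p[p > 0]                      # 0 * log(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))

def mutual_information(a, f, levels=256):
    """MI between a source image and a fused image via their joint histogram."""
    joint, _, _ = np.histogram2d(a.ravel(), f.ravel(),
                                 bins=levels, range=[[0, levels], [0, levels]])
    p_af = joint / joint.sum()        # joint probability p_{A,F}(a, f)
    p_a = p_af.sum(axis=1, keepdims=True)   # marginal p_A(a)
    p_f = p_af.sum(axis=0, keepdims=True)   # marginal p_F(f)
    nz = p_af > 0
    return float(np.sum(p_af[nz] * np.log2(p_af[nz] / (p_a @ p_f)[nz])))
```

Note that the mutual information of an image with itself equals its entropy, which is a convenient sanity check.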
(8) The sum of the correlations of differences (SCD) uses the sum of correlation values as a quality measure of the fused image [24]. The difference images D1 and D2 are the differences between the fused image F and the input images B and A, respectively, and SCD = r(A, D1) + r(B, D2), where r(⋅,⋅) is a function that calculates the correlation between a source image and the corresponding difference image.
(9) The average thermal energy deviation (AVGD) is used because the fusion method is based on thermal radiation. This parameter evaluates the deviation of thermal energy between the synthesized image and the original IR image [9]. (10) The universal image quality index (QI) was initially proposed as a universal measure of image quality achieved by modeling structural distortion [26], but here we use it as an index of image similarity. The closer the QI is to 1, the more similar the two images being compared.
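The global form of the QI between two images x and y is Q = 4σ_xy x̄ȳ / ((σ_x² + σ_y²)(x̄² + ȳ²)). A direct implementation is sketched below; note that in practice the index is usually computed over sliding windows and averaged, which we omit here for brevity:

```python
import numpy as np

def uiqi(x, y):
    """Universal image quality index (global form):
    Q = 4*cov(x,y)*mx*my / ((var(x)+var(y)) * (mx^2 + my^2)).
    Q = 1 if and only if the two images are identical."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cxy = ((x - mx) * (y - my)).mean()   # covariance of the two images
    return 4 * cxy * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))
```

The index jointly penalizes loss of correlation, mean shift, and contrast distortion, which is why it is a reasonable similarity measure for thermal fusion results.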
For each of the ten metrics, a larger value generally indicates a better fused result, except for the fusion-loss and fusion-artifact metrics and AVGD, for which smaller values are better. To guarantee the objectivity of the evaluation results, all the parameters in the above metrics are set to the default values reported in the related publications.

Investigation of Sampling and Patch Sizes.
In this subsection, the effects of the sampling level and the dictionary patch size are studied for the optimal SRT fusion method. The low- and high-resolution dictionaries used in the sparse model are constructed from patches randomly sampled from the Landsat test image. The dictionary size is set to sampling levels of 1/5, 1/10, 1/15, 1/20, 1/25, and 1/30. The local image patch sizes are set to 20 × 20, 30 × 30, 40 × 40, 50 × 50, and 60 × 60, considering the computation time.
For convenience, a fusion method under the proposed framework using a certain SRT is denoted as SR-L-W. For example, SR-5-30 represents the fusion method using SRT with a 1/5 sampling level and a 30 × 30 image patch dictionary. The error tolerance for the termination of the OMP algorithm is set to 10^(-4), and the OMP iteration number is fixed at 150. The step length of the sliding window in the patch tiling process is set to one.
The objective assessment results of the SRT methods under various parameters show patch-size effects when the sampling ratio is larger than 1/30, while the computation time increases with the size of the sampling data and patches (Figure 5). Therefore, we use SR-10-40 fusion for comparison with other popular fusion methods, because a patch size of at least 40 × 40 is desirable in this experiment.

Fused Results.
In this subsection, four fusion algorithms, CBF, MW, OSF, and DWT [17], are used to illustrate the advantages of the proposed fusion framework. The principle of image fusion using DWT is to merge the wavelet decompositions of the two original images by applying fusion rules to the approximation and detail coefficients. TIR images were used for the approximation coefficients, and their average images were used for the detail coefficients. MW is a multiwavelet image fusion algorithm for extracting fine textural features from PAN images and superimposing them onto the TIR image.
First, for each specific method, the effectiveness of the proposed method is verified using the above objective fusion metrics, and the impact of the sampling level of the dictionary is studied. We then make an overall comparison in terms of both objective assessment and subjective visual quality. All the fusion methods in this work were implemented in MATLAB on a computer with a 3.4 GHz CPU and 32 GB of RAM.
Table 2 lists the visual assessments of the fusion methods for the different regions in Figure 4. The proposed SRT method shows pixel values similar to those of the TIR images in all regions (Figure 6). The OSF method shows similar pixel values except for cement and asphalt roads. In particular, we cannot see regions of similar pixel values with CBF fusion, mainly because previous methods neglect the maintenance of TIR properties. The SRT method covers a range similar to that of the original TIR image, but the other methods do not.
Quantitative analysis of the five image fusion methods is shown in Table 3. The bold value in each column indicates the best performance among all the methods. The PIM values of the CBF and OSF fused images were considerably smaller than that of the original TIR image, suggesting that the fusion process did not maintain the physical properties of the TIR source. The PIM and range of the SRT method were preserved. The CBF method had the best values in terms of AG, MI, and SF. The OSF result had the best values in terms of E, CC, and SCD. Otherwise, the SRT method achieved the best values in terms of UIQI and AVGD. AG and SF reflect local intensity variation: the PAN image has high AG and SF values, and the TIR image has low AG and SF values. Although the SRT method has low AG and SF values, it may be considered relatively less affected by the PAN image. The advantage of the SRT method over the other methods is very clear in terms of minimal artifacts and less deviation from the original TIR images, in both visual and some quantitative aspects.
Our method extracts more meaningful information from TIR images without physical correction of the fused images, while discarding less relevant information from PAN images.

Conclusion
In this paper, we present an SR-based image fusion framework for Landsat thermal images. Many previously proposed image fusion methods are capable of synthesizing multiple sensor images and producing good visual effects, but they do not retain the thermal properties of IR images. Our method is designed to preserve surface-temperature-related information as well as spatial resolution. We applied a sparse-representation-based fusion method for estimating high-resolution thermal images. We tested the impact of sampling level and patch size for SRT parameter adjustment. Finally, our fusion results were compared with the results of four other fusion methods and analyzed using visual and objective assessments. The experimental results demonstrated that the proposed method improves the spatial properties of TIR images while preserving their spectral consistency. Therefore, the proposed method can contribute to the estimation of high-resolution land surface temperature for environmental analysis.
Although this work presents a new approach to thermal image fusion, the proposed method has some limitations. First, because it is difficult to distinguish individual features in the mountainous and agricultural fields of the study site, additional sites should be tested in further experiments. Second, the optimal parameters for SRT fusion have not been fully determined. Finally, it is necessary to evaluate the fused image against a high-resolution thermal reference, for example, one acquired by a thermal drone, for more accurate evaluation. These issues will be addressed in future research.

Figure 1: Schematic of the proposed SRT technique (we call our proposed method the SRT).

Figure 3: (a) PAN and (b) TIR images from Landsat 7 in the studied area.

Figure 4: Region of interest for visual assessment.

Table 1: Bands available with the Landsat 7 ETM+ sensor.

Table 2: Visual assessment of the CBF, MW, DWT, OSF, and SRT methods.

Table 3: Objective assessment of the CBF, MW, DWT, OSF, and SRT methods.