Automated Coregistration of Multisensor Orthophotos Generated from Unmanned Aerial Vehicle Platforms

Image coregistration is a key preprocessing step to ensure the effective application of very-high-resolution (VHR) orthophotos generated from multisensor images acquired from unmanned aerial vehicle (UAV) platforms. The most accurate method to align an orthophoto is the installation of air-photo targets at a test site prior to flight image acquisition; these targets are then used as ground control points (GCPs) for georeferencing and georectification. However, there are time and cost limitations related to installing the targets and conducting field surveys on the targets during every flight. To address this problem, this paper presents an automated coregistration approach for orthophotos generated from VHR images acquired from multiple sensors mounted on UAV platforms. Spatial information from the orthophotos, provided by the global navigation satellite system (GNSS) at each image's acquisition time, is used as ancillary information for phase correlation-based coregistration. A transformation function between the multisensor orthophotos is then estimated based on conjugate points (CPs), which are locally extracted over the orthophotos using the phase correlation approach. Two multisensor datasets are constructed to evaluate the proposed approach. Visual and quantitative evaluations confirm the effectiveness of the proposed method.


Introduction
Unmanned aerial vehicles (UAVs) have become competitive platforms for remote sensing-based applications because they can be deployed easily and cost-effectively while collecting geospatial data with higher temporal and spatial resolution than other platforms [1][2][3][4][5][6][7][8][9][10]. They can also collect multisensor images by mounting diverse sensors on the UAV platforms to achieve a wide range of remote sensing applications [11][12][13][14]. To perform efficient geospatial analysis using the multisensor images acquired from a UAV platform, image coregistration, which geometrically overlays the images, is an essential step [14,15]. The most stable and accurate method to achieve coregistration is the installation of air-photo targets in the study area at the time of the UAV flight. The targets, whose ground coordinates have already been measured by global navigation satellite system (GNSS)-based surveying, are detected in the images, and a transformation function is estimated between the ground and image coordinates. Using ground control points (GCPs), which are targets with a known coordinate relationship between the ground and image domains, all images are then transformed to ground coordinates. If GNSS-based GCP target surveying is not performed, a position error occurs between the multisensor/multitemporal orthophotos or digital surface models (DSMs), which are generated from multiple overlapping images captured during UAV flight. However, there are time and cost limitations related to installing targets and performing GNSS surveys during all flights. Conducting image coregistration without GNSS-based target surveying may be possible if we can automatically extract conjugate points (CPs) that exist simultaneously in the images and define their spatial relationships [16,17].
To do this, previous studies have introduced two representative methods to extract CPs: feature- and area-based matching approaches [18]. The feature-based method extracts CPs based on representative points on the images, e.g., dominant features, line intersections, or centroid pixels in closed-boundary regions (i.e., segments). The extracted representative points between images are matched to each other using various descriptors or similarity measures along with the spatial relationships between points. The feature-based method can be applied when significant distortions and geometric differences occur between images. Thus, the feature-based method is more effective than the area-based method when working with VHR multitemporal images [19,20]. However, well-known feature-based matching methods, such as the scale-invariant feature transform (SIFT) [21] and speeded-up robust features (SURF) [22], also have limitations because they cannot be used to match an optical image to heterogeneous nonoptical data [23].
The area-based method uses templates of predefined size for each image and identifies similarity peaks between the templates to extract the CPs. This method works well when images have nonlinear radiometric properties, i.e., multisensor images [24]. Unlike methods such as the correlation-based approach that directly use pixel values to measure similarity, methods using phase correlation or mutual information to perform similarity calculations between templates are effective for these types of multisensor images [25,26]. The area-based method is appropriate when images do not have severe geometric differences in terms of scale, rotation, and translation. In this situation, the template search space used to identify the similarity peak associated with the CPs can be geometrically limited. The area-based method is likely to fail when images have significant distortions or geometric differences.
To generate an accurate orthophoto via fine coregistration of the VHR images acquired from UAVs, navigation information derived from GNSS-equipped UAVs can be used to constrain the search space and extract reliable CPs [27,28]. This GNSS-based navigation information can also be used to conduct the area-based coregistration approach between multisensor images. We propose an automated coregistration approach between orthophotos generated by multisensor images captured from a UAV platform. Due to the GNSS navigation data provided at the time of image capture during UAV flight, the two multisensor orthophotos contain ground coordinate information along with their spatial resolutions. The spatial information derived from the GNSS data allows us to effectively conduct the area-based matching approach. The phase correlation method, which is a representative area-based matching approach with the advantage of estimating the similarity between multisensor data, is exploited based on the condition that the search space used for matching is limited by the GNSS-based navigation information. Two multisensor datasets, i.e., the RGB-thermal infrared (TIR) and RGB-multispectral (MS) sensor images, are constructed to confirm the possibility of coregistering the multisensor orthophotos without performing GNSS surveying.
The main contributions of this study are as follows:
(1) The process of installing GCP targets and performing target surveying on every flight is not necessary, and therefore, we can dramatically reduce the labor time and cost related to fieldwork.
(2) The proposed coregistration method is applicable to multisensor orthophotos, e.g., those generated by RGB, TIR, and MS sensor images that have nonlinear radiometric differences.
(3) By constraining the search space for area-based matching to extract CPs, the proposed method is applicable to study sites with similar repetitive patterns, where the detection of reliable CPs is otherwise prone to failure.
The rest of the manuscript is organized as follows: the methodology is described in Section 2. The two constructed multisensor datasets are discussed in Section 3. The experimental results and discussion are presented in Section 4. Section 5 concludes the paper with a description of future work.

Methodology
The process of automated coregistration between UAV-based multisensor orthophotos consists of two main steps. First, the GNSS navigation data acquired during image capture is used to initially correlate the coordinates between orthophotos. Then, a template-based phase correlation approach is applied to estimate a transformation function that connects the target and reference orthophotos. Target orthophotos are then warped to the reference orthophoto coordinates based on the estimated transformation function. Figure 1 provides a flowchart of the proposed multisensor image coregistration approach. The following sections describe each step in detail.

Initial Orthophoto Geometric Alignment Based on GNSS Data
During UAV flight, images are captured at a predetermined time interval. When each image is captured, position information is provided by the GNSS receiver mounted on the platform. Based on this information, each orthophoto generated from the captured overlapping images contains ground coordinates. Therefore, orthophotos generated from multisensor images have common position information, which is beneficial when constraining the area-based matching search space for coregistration. Specifically, orthophoto spatial resolutions, which are estimated while generating orthophotos from the captured images, are used to minimize scale differences. The resolution ratio is calculated, and one orthophoto (target orthophoto) is resampled to the spatial resolution of the other orthophoto (reference orthophoto). Then, the resampled target orthophoto is shifted to the reference orthophoto coordinates based on the amount of translation in the x- and y-directions derived from the GNSS-based ground coordinates. Subsequently, phase correlation-based matching is carried out to detect CPs and precisely coregister the orthophotos.
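The resampling ratio and the coordinate shift described above can be sketched as follows. This is a minimal illustration, assuming each orthophoto is north-up and described by the ground coordinates of its upper-left corner and its pixel size; the class `OrthoMeta`, the function `initial_alignment`, and the sample coordinates are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class OrthoMeta:
    """Hypothetical georeferencing summary of a north-up orthophoto."""
    x_ul: float  # ground x (easting) of the upper-left corner, in metres
    y_ul: float  # ground y (northing) of the upper-left corner, in metres
    gsd: float   # spatial resolution (pixel size), in metres


def initial_alignment(ref: OrthoMeta, tar: OrthoMeta):
    """Return the resampling factor and the pixel shift that place the
    target orthophoto on the reference grid using GNSS-derived coordinates."""
    # Resolution ratio: factor by which the target is resampled
    # so that its pixel size matches the reference pixel size.
    scale = tar.gsd / ref.gsd
    # Offset of the target's upper-left corner in reference pixels.
    # Image rows grow downwards while northing grows upwards,
    # hence the reversed sign for the y-direction.
    dx_px = (tar.x_ul - ref.x_ul) / ref.gsd
    dy_px = (ref.y_ul - tar.y_ul) / ref.gsd
    return scale, dx_px, dy_px


if __name__ == "__main__":
    # Illustrative values only: 5 cm reference and 15 cm target resolution.
    ref = OrthoMeta(x_ul=1000.0, y_ul=2000.0, gsd=0.05)
    tar = OrthoMeta(x_ul=1001.5, y_ul=1999.0, gsd=0.15)
    print(initial_alignment(ref, tar))
```

After this step only a residual translation of a few pixels should remain, which is what keeps the search space of the subsequent template matching small.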

Phase Correlation-Based Coregistration.
To detect well-distributed CPs over the overlapping orthophotos, local templates are constructed over the entire reference orthophoto at identical intervals. The location of the corresponding templates in the target orthophoto is determined based on the coordinate information derived from the GNSS data. Since the target orthophoto has been roughly transformed to the reference orthophoto coordinates by taking into account the scale and translation differences using GNSS-based spatial information, the corresponding local templates of the target orthophoto geometrically overlap the reference templates.
Then, we perform the phase correlation approach to find similarity peaks, whose positions indicate the translation difference between corresponding local templates. The phase correlation method can extract translation differences between images in the x- and y-directions [29]. This method searches for differences in the frequency domain. If we let $I_{ref}(x, y)$ and $I_{tar}(x, y)$ represent the corresponding local template images of the reference and target orthophotos, respectively, that differ only by a translation $(x_0, y_0)$, we can derive $I_{ref}(x, y)$ with the following equation:

$$I_{ref}(x, y) = I_{tar}(x - x_0, y - y_0).$$

The phase correlation ($C$) between the two template images is calculated with the following equation:

$$C = F^{-1}\left[\frac{F(I_{ref})\,F(I_{tar})^{*}}{\left|F(I_{ref})\,F(I_{tar})^{*}\right|}\right],$$

where $F$ and $F^{-1}$ represent the 2-D Fourier and 2-D inverse Fourier transformations, respectively, and $^{*}$ denotes the complex conjugate. Since we assume that the template images only have translation differences, the phase correlation values are approximately zero everywhere except at the shifted displacement. We can interpret the location of this peak as the translation difference between the two corresponding local templates. We use the phase correlation peak location between the template images to extract the well-distributed CPs. More specifically, the centroid of each local template in the reference orthophoto is selected as the CP for the reference template. Then, the corresponding CP position in the target template is obtained by shifting the centroid of the local target template by the translation at which the phase correlation reaches its highest value. After conducting the local phase correlation process, we can extract a large number of well-distributed CPs, i.e., as many as there are local templates defined over the reference orthophoto.
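The peak search for a single pair of templates can be sketched with NumPy's FFT routines. This is an illustrative implementation of standard phase correlation, not the authors' code; the function name `phase_correlation_shift`, the small constant added to the denominator, and the integer-pixel peak search (no subpixel refinement) are assumptions made for the example.

```python
import numpy as np


def phase_correlation_shift(ref, tar):
    """Estimate the integer shift (dx, dy) such that
    ref(x, y) is approximately tar(x - dx, y - dy)."""
    f_ref = np.fft.fft2(ref)
    f_tar = np.fft.fft2(tar)
    # Cross-power spectrum of the two templates.
    cross = f_ref * np.conj(f_tar)
    # Keep only the phase; the small constant avoids division by zero.
    cross /= np.abs(cross) + 1e-12
    corr = np.real(np.fft.ifft2(cross))
    # The correlation surface is close to zero except at the displacement.
    row, col = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks beyond half the template size are negative shifts
    # because the discrete Fourier transform is periodic.
    dy = row - corr.shape[0] if row > corr.shape[0] // 2 else row
    dx = col - corr.shape[1] if col > corr.shape[1] // 2 else col
    return int(dx), int(dy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tar = rng.random((64, 64))
    # Circularly shift the target by 5 rows and -7 columns to build a reference.
    ref = np.roll(tar, shift=(5, -7), axis=(0, 1))
    print(phase_correlation_shift(ref, tar))
```

Under this sign convention, a feature at the reference template centroid $(c_x, c_y)$ appears in the target template at $(c_x - dx, c_y - dy)$, which gives the CP pair for that template.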
Finally, the extracted CPs are exploited to estimate the affine transformation coefficients and outliers that have large residuals compared with other CPs are eliminated from the CP set. The affine transformation coefficients estimated from the remaining CPs are then used to warp the target orthophoto to the reference orthophoto coordinates.
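The affine estimation with outlier removal can be sketched as a least-squares fit followed by one rejection pass. The text does not specify the exact rejection rule, so the single pass, the function names, and the default threshold of 5 (borrowed from the residual limit quoted in the experimental settings) are assumptions for illustration only.

```python
import numpy as np


def fit_affine(src, dst):
    """Least-squares 2-D affine mapping src -> dst.
    Returns a 3x2 matrix M such that [x, y, 1] @ M gives the mapped point."""
    design = np.hstack([src, np.ones((len(src), 1))])
    coeffs, *_ = np.linalg.lstsq(design, dst, rcond=None)
    return coeffs


def fit_affine_with_rejection(src, dst, max_residual=5.0):
    """Fit once, drop CPs whose residual exceeds max_residual,
    then refit the affine coefficients on the remaining CPs."""
    coeffs = fit_affine(src, dst)
    design = np.hstack([src, np.ones((len(src), 1))])
    residuals = np.linalg.norm(design @ coeffs - dst, axis=1)
    keep = residuals <= max_residual
    return fit_affine(src[keep], dst[keep]), keep


if __name__ == "__main__":
    # Synthetic CPs on a regular 5 x 5 grid with a 250-pixel interval.
    gx, gy = np.meshgrid(np.arange(5) * 250.0, np.arange(5) * 250.0)
    src = np.column_stack([gx.ravel(), gy.ravel()])
    truth = np.array([[1.01, 0.002], [-0.003, 0.99], [4.0, -6.0]])
    dst = np.hstack([src, np.ones((25, 1))]) @ truth
    dst[12] += [50.0, 0.0]  # one deliberately wrong CP
    coeffs, keep = fit_affine_with_rejection(src, dst)
    print(int(keep.sum()), np.round(coeffs, 3))
```

A gross outlier also biases the first fit, so for heavily contaminated CP sets an iterative or robust estimator may be preferable to this single pass.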

Dataset Construction
We constructed two multisensor datasets. The first dataset is composed of multisensor orthophotos acquired from overlapping RGB and TIR multisensor images, and the second dataset consists of orthophotos acquired from RGB and MS sensors. We used Agisoft Photoscan software to perform image processing with the goal of generating the orthophotos.
3.1. RGB-TIR Orthophoto Dataset. The first dataset, composed of RGB and TIR sensor images, is located at Hapcheon-gun, Gyeongsangnam-do, South Korea. The site includes various land cover types, such as man-made structures, farmland, vegetation, and bare soil. A DJI Inspire 1 quadcopter was flown twice over the test site on June 16, 2017, with a Zenmuse X3 RGB camera (RGB sensor) mounted on the first flight and a FLIR Zenmuse XT sensor (TIR sensor) mounted on the second flight. Table 1 presents the specifications of the UAV and sensors used during data acquisition. The overlap and sidelap were adequately established to generate orthophotos from both RGB and TIR images. During the flight, 102 RGB and 263 TIR images were captured. When captured, all images have GNSS-tag information at the position where the GNSS signal receiver is mounted on the UAV, such that the generated orthophotos contain ground coordinates, which are used as ancillary information to improve phase correlation performance. The spatial resolutions for the RGB and TIR orthophotos are approximately 5 and 15 cm, respectively. Table 2 lists the detailed specifications of the dataset, and Figure 2 presents the onsite generated RGB (reference) and TIR (target) orthophotos.

RGB-MS Orthophoto Dataset
The second dataset, designed to conduct coregistration between the RGB and MS orthophotos, is located at the Texas A&M AgriLife research center farm in Corpus Christi, Texas, USA. This site includes two crop types (cotton and sorghum) managed by two different tillage treatments (conventional tillage and no-tillage). UAV data were collected on April 23, 2018, using RGB and MS platforms, which comprised the DJI Phantom 4 Pro with a standard RGB sensor and the DJI Matrice 100 with a SlantRange 3p sensor (Table 3). The RGB data were collected at an altitude of 40 m with 80% overlap and sidelap. The MS data were collected at an altitude of 50 m with 70% overlap and sidelap (Table 4). The numbers of RGB and MS raw images were 445 and 846, respectively. In addition to the UAV flights, GNSS surveying was performed on 11 GCP targets that were evenly installed over the entire field. All GCP coordinates were surveyed twice using a differential dual-frequency GNSS receiver manufactured by V-Map (http://v-map.net/). The averaged values of the surveyed coordinates at each point were used as tie point constraints in Agisoft Photoscan software to generate a georectified orthophoto with the RGB platform. In contrast, orthophotos from the MS platform were generated using only the basic GNSS information collected by the UAV itself. By coregistering the MS orthophoto to the georectified RGB orthophoto coordinates, we can generate the georectified MS orthophoto without performing GNSS surveying. The GNSS surveying data were additionally used to evaluate the accuracy of the coregistration results for the MS orthophoto. The RGB and MS orthophoto spatial resolutions are 1.04 and 1.96 cm, respectively. The generated orthophotos from the RGB-MS dataset are presented in Figure 3.

Experimental Results
To conduct the proposed coregistration approach for the constructed datasets, we empirically set the required settings and parameters. For efficient processing, orthophotos initially aligned by GNSS-tagging data were reduced to one-quarter of their original size with an image pyramid-based resampling method. For the RGB orthophoto from the first dataset, a red-band image was selected to conduct the phase correlation calculation with the TIR orthophoto. For the second dataset, red-band images from both the RGB and MS datasets were used for the phase correlation process. The local template size was set to 250 × 250 pixels with the same interval (250 pixels) in both the x- and y-directions. Extracted CPs with root mean square errors (RMSE) above 5 were removed from the CP set before estimating the affine transformation coefficients.
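The regular template layout implied by these settings can be sketched as below. The helper names are hypothetical, and the choice to keep only templates that lie entirely inside the image is an assumption, since the handling of image borders and no-data areas is not described in the text.

```python
def template_origins(width, height, size=250, step=250):
    """Upper-left corners (x, y) of square templates of the given size,
    placed at a regular interval and lying entirely inside the image."""
    return [(x, y)
            for y in range(0, height - size + 1, step)
            for x in range(0, width - size + 1, step)]


def template_centroid(origin, size=250):
    """Centre of a template, used as the CP location in the reference image."""
    x, y = origin
    return (x + size / 2.0, y + size / 2.0)


if __name__ == "__main__":
    # Illustrative image size only; not a dimension reported in the paper.
    origins = template_origins(width=1000, height=750)
    print(len(origins), template_centroid(origins[1]))
```

With a template size equal to the interval, the templates tile the orthophoto without overlap, so the number of candidate CPs grows with the orthophoto area.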

RGB-TIR Dataset Coregistration Results
Based on the experiments implemented with the chosen settings, 56 CPs were extracted in the RGB-TIR dataset. Because the template images, whose centroids define the CP positions, are selected at a regular interval, the CPs are evenly distributed in the image. These CPs were used to estimate the affine coefficients with which the TIR orthophoto was warped to the RGB orthophoto's coordinates. To visually inspect the performance of the proposed coregistration method, Figure 4 shows some parts of the orthophotos before and after coregistration. We observe that misaligned linear features and building roofs in the original orthophoto pair (Figure 4(a)) are properly aligned using the proposed coregistration approach (Figure 4(b)).
To quantitatively evaluate coregistration performance, we calculated representative similarity measures, i.e., correlation coefficients (CC) and normalized mutual information (NMI) values, between the original multisensor orthophotos and the coregistered multisensor orthophotos. The CC estimates a covariance-based similarity, and the NMI measures a statistical correlation using joint entropy between two images. Larger values correspond to a higher similarity between images. The CC values for orthophotos $I_{ref}$ and $I_{tar}$ are calculated with the following equation:

$$CC = \frac{\sigma_{I_{ref} I_{tar}}}{\sigma_{I_{ref}}\,\sigma_{I_{tar}}},$$

where $\sigma_{I_{ref} I_{tar}}$ denotes the covariance between the reference and target orthophotos, and $\sigma_{I_{ref}}$ and $\sigma_{I_{tar}}$ denote their standard deviations. [...] (Table 5). Specifically, NMI values significantly increased from 0.06 to 0.47 because NMI is an effective index that measures the similarity between multisensor images with dissimilar radiometric properties. Therefore, visual and quantitative evaluations confirm that the proposed approach effectively coregisters RGB and TIR orthophotos.
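Both similarity measures can be sketched with NumPy. The NMI normalization used by the authors is not recoverable from the text; the variant below, $2\,I(A;B)/(H(A)+H(B))$, is an assumption chosen because it lies in [0, 1], the same range as the reported values. The function names and the number of histogram bins are also illustrative.

```python
import numpy as np


def correlation_coefficient(a, b):
    """Covariance of two images divided by the product of their standard deviations."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return cov / (a.std() * b.std())


def _entropy(p):
    """Shannon entropy (in bits) of a probability array, ignoring empty bins."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))


def normalized_mutual_information(a, b, bins=32):
    """Assumed variant: NMI = 2 * I(A;B) / (H(A) + H(B)), in [0, 1]."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    p_ab = joint / joint.sum()
    h_a = _entropy(p_ab.sum(axis=1))  # marginal entropy of the first image
    h_b = _entropy(p_ab.sum(axis=0))  # marginal entropy of the second image
    h_ab = _entropy(p_ab)             # joint entropy
    return 2.0 * (h_a + h_b - h_ab) / (h_a + h_b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((128, 128))
    other = rng.random((128, 128))
    print(correlation_coefficient(img, img),
          normalized_mutual_information(img, img),
          normalized_mutual_information(img, other))
```

Because NMI depends only on the joint distribution of intensities, it remains informative when the two sensors respond to the scene in nonlinearly related ways, which is where CC tends to be weak.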

RGB-MS Dataset Coregistration Results
For the RGB-MS dataset, 389 CPs were extracted, evenly distributed over the entire overlapping region. The affine coefficients were estimated, and the MS orthophoto was transformed to the RGB orthophoto coordinates using the constructed transformation model. As with the first dataset, we performed a visual inspection (Figure 5). Interior and exterior rectangles represent the RGB and MS orthophotos, respectively. We note that the MS orthophoto is displayed as an NIR, R, and G color composite. From Figure 5(b), we observe that the rows of plants are precisely aligned at the boundary between the rectangles after applying the proposed coregistration method. Additionally, we performed an absolute quantitative assessment using the GNSS survey data collected on the GCP targets. After coregistering the RGB and MS orthophotos, we manually digitized the points located at the center of each GCP target (e.g., marked with a yellow point in Figure 6). The ground coordinates of the digitized points were then compared with their GNSS survey coordinates to conduct the coregistration accuracy assessment (Table 6). The overall RMSE was 0.111 m, which corresponds to approximately five pixels in the MS orthophoto. This indicates that the proposed method is able to provide high-precision coregistration results in agricultural fields that have similar repetitive crop patterns with no artificial features, which poses a problem when searching for reliable CPs with reasonable RMSE values. These results are promising because the proposed approach enables the precise geometrical alignment of the multisensor orthophotos without conducting GNSS surveying on the targets, which is the most time-consuming and labor-intensive part of the entire UAV surveying process.
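The accuracy assessment against the surveyed targets can be sketched as a root mean square of planimetric distances. The text does not state whether the reported RMSE is computed per axis or from 2-D distances, so the 2-D form below is an assumption, and the function name and sample coordinates are hypothetical.

```python
import math


def horizontal_rmse(digitized, surveyed):
    """RMSE of the planimetric distances between paired (x, y) ground
    coordinates, e.g., digitized target centres and GNSS survey coordinates."""
    squared = [(xd - xs) ** 2 + (yd - ys) ** 2
               for (xd, yd), (xs, ys) in zip(digitized, surveyed)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Illustrative coordinates in metres; not values from Table 6.
    digitized = [(100.10, 200.00), (150.00, 250.05), (199.95, 300.00)]
    surveyed = [(100.00, 200.00), (150.00, 250.00), (200.00, 300.00)]
    print(horizontal_rmse(digitized, surveyed))
```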

Conclusions
This study proposed an automated coregistration approach between UAV-based multisensor orthophotos. The phase correlation approach was employed to coregister multisensor orthophotos with the aid of GNSS-based navigation information derived during UAV flight. Two different multisensor datasets, composed, respectively, of RGB-thermal infrared and RGB-multispectral platforms, were constructed to evaluate the efficacy of the proposed method. Visual and quantitative evaluations confirmed that the proposed approach yields accurate coregistration performance for multisensor orthophotos without the need for GNSS surveying, which is the most time-consuming and labor-intensive part of the entire UAV surveying process. The overall RMSE for the RGB-MS dataset was approximately 0.111 m, which indicates that the proposed method works properly on agricultural fields that have similar repetitive crop patterns, especially areas that pose problems for the extraction of reliable CPs. For future work, we will analyze multitemporal datasets to improve the robustness of the proposed approach. Furthermore, we will exploit datasets constructed with a diverse combination of multisensor orthophotos to further verify this approach.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.