DMF Data Format Usage for Change Detection

Because of the availability of an overwhelming amount of remote sensing data obtained by different instruments, new techniques and applications have been developed in order to pursue the objective of detecting changes that occur in a particular area of the Earth or that affect a large part of the Earth. These studies have used datasets covering different wavelength ranges (visible, IR, radar, and so on), but common to all of them is the necessity for great accuracy to ensure that no bias is introduced due to data correction. Otherwise, a result may be the generation of false positives. Also, many studies have used several different datasets for the same area to detect changes (this is usually called data fusion), but there exists no specific data structure designed for this purpose. In this paper, we propose a data structure to be used for accurate change detection. This structure is transparent to the user and can be used for data fusion to improve those studies.


Introduction
In the last decades, several remote sensing satellites for quasi-daily global Earth observation have been launched, and various new and improved versions of their instruments are to be launched in this present decade. The data they provide open up great opportunities for the rapidly growing field of change detection of the Earth [1][2][3].
Change detection can be used for precision agriculture, by observing the changes that occur in the plants; for terrain motion, by analysing the differences in height and position of some key points of the ground (ground control points); for the evolution of glaciers, by observing some physical features of the ice together with their positions; for the exhaustion of water resources in rural areas, where water is extracted and used for irrigation; and for many similar applications.
Thus, it is clear that this is a very interesting area for remote sensing, geology, agriculture, and so forth, and even for politics and local governments. But despite having such great potential, it also suffers from acute problems and difficulties. The main difficulty arises from the very basic requirement of any change detection algorithm: having two images of the same scene, acquired on different dates, that reproduce exactly the same area (or at least a sufficiently large common area).
Having two or more images of the same scene and simply using them without worrying about geocorrection and the introduction of incorrectly located data points, or the commonplace assignment of the mean of neighbouring pixels, leads to the appearance of spatial and spectral errors. These are not accounted for and produce a large number of false positives and false negatives concerning changes in the scene being considered [4]. This is so even in the fairly straightforward case of only considering images provided by the same instrument and with the same geocorrection algorithm, but if images from different instruments and dates are to be used, the problems become ever more complex and error prone.
One of the main problems, therefore, is the need for a common basic data structure that overcomes these problems and provides a simple, yet powerful redistribution of data points so that no false data are introduced, and data from different instruments and dates can be dealt with together [5].
Different sensors produce data with different formats, geometries, and scales. In order, therefore, to evaluate changes from images of the same scene taken by different sensors, a unified data format would have to be used. There are today very many data formats to represent remote-sensing information. Examples are HDF [6], CDF [7], NetCDF [8], GeoTIFF [9], and JPEG2000 [10]. There are also many techniques in the literature designed to improve the storage of the information, but few data formats have been focused on improving how such methods are actually applied. In this sense, it is difficult to find a popular and conventional data format that has been developed for the purpose of offering improved change detection, or which only requires a minimum of modification to achieve such improvement.
Here, we propose a data structure that covers the aforementioned problems together with many others (in the data fusion field). In the following sections, we shall describe the structure and its usage and some experiments and results illustrating its benefits.
The rest of this paper is organized as follows. In Section 2, we present the proposed method and its usage, and in Section 3, the results of experiments performed with this structure together with some interesting results showing its capabilities for the field of change detection. Finally, in Section 4, we draw some conclusions on the structure and note some further improvements to be made to increase its usability and functionality.

Methods
There are currently several methodological approaches to geocorrecting images corresponding to the data acquired by remote sensors [11]. The most widely used involve either the insertion of reference points or ground control points (GCPs: pixels whose coordinates are already known) manually or automatically (such as image-image registration or image-map registration) or the addition of extra bands or files with the spatial coordinates for each measurement (provided by the positioning system). But they all have the goal of placing all the measurements over a spatial grid in accordance with their location in order to display and work with an image whose representation is close to reality. The allocation of the original measurements to pixels of this referenced grid implies the potential appearance of unassigned pixels or of two or more measurements assigned to the same pixel. In the former case, it is possible to assign the nearest measurement (introducing a displacement of the information) or the mean of the measurements in the neighbourhood by a technique such as interpolation or convolution [12] (i.e., introducing false spectral information).
A matrix-based structure denominated Diffused Matrix Format (DMF) [13,14] was designed to minimize such spatial and spectral errors introduced into the geocorrection process. In the present work, this structure is modified in order to optimize its application in change detection methods.
As a result of the acquisition and geocorrection processes, some ancillary data files may be provided together with the resultant image [15]. Examples are the L0 image (as acquired by the sensor), the IGM image (that stores the spatial coordinates for each measurement of the L0 image, provided by the GPS/INS device of the positioning system), and the GLT image (a mask of the geocorrected image in which each cell indicates by means of a positive value that the measurement in the analogous pixel was originally allocated in that position or by means of a negative value that it was reallocated with a nearest-neighbour algorithm because that position was unassigned). All these kinds of file are very useful for extracting information about the accuracy when georeferencing the measurements, and they will be used to construct the DMF data structure and to perform the experiments with regard to change detection techniques.
To this end, the DMF structure has been adapted to store all the information corresponding to one sensor which has scanned a certain scene on different dates. The DMF is a matrix-based structure whose cells can store none, one, or several measurements together with their spatial coordinates. It has been tested in different fields of remote sensing processing, such as processing time [16], storage space [16], spatial accuracy [17], data fusion [18,19], and spaceborne processing [15], in all cases yielding very good results.

The Diffused Matrix Format (DMF).
The basic unit of the DMF structure is the so-called Diffused Measurement Record (DMR). This represents the information corresponding to a measurement acquired by the sensor. The information included consists of the coordinates of the pixel's location and its spectrum (Figure 1(a)).
The DMRs are arranged in a grid called the DMF matrix, in accordance with their spatial locations. This matrix consists of a regular spatial grid indexed in the N/S direction regarding the cartographic latitude and longitude, with the cell size being definable by the user. The fact of considering the exact spatial coordinates for each measurement in a way that is independent of the matrix in which they are stored (unlike the case of conventional formats) allows them to be located with a spatial precision that is limited only by the positioning system since the errors committed by misplacements have been eliminated.
Because the position of the acquired measurements can be completely random, there are three possible situations when assigning DMR measurements to the cells of the DMF matrix.
(1) One measurement has been assigned to the cell: the corresponding DMR record is stored in that cell as such.
(2) No measurement has been assigned to the cell: the cell will remain as Null, with no information about that position.
(3) Two or more measurements have been assigned to the cell: the cell must store all the information corresponding to that location, that is, store all the corresponding DMR records.
In order to allow for all three possibilities, the cells of the DMF matrix have been adapted to store a dynamic list of DMR records. In particular, there may be cells that store no measurements, only one measurement, or more than one measurement (Figure 1(b)). To provide for this, each DMR will also have a field that will store the memory address of the  next measurement stored in the same DMF cell (represented with an arrow in the figure). The form of sorting and storing the measurements in the list can be varied, for example, storing as first DMR either the first measurement acquired or that nearest to the centre of the DMF cell. Null DMF cells will store empty memory addresses.
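The cell layout just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `DMR`, `append_to_cell`, and `cell_records` are hypothetical, the `next` reference stands in for the stored memory address, and `None` stands for a Null cell. Records are kept in acquisition order, which is one of the orderings the text mentions.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class DMR:
    """One measurement: coordinates, spectrum, and a link to the next DMR in the same cell."""
    longitude: float
    latitude: float
    spectrum: List[float]
    next: Optional["DMR"] = None  # plays the role of the "memory address" field


def append_to_cell(head: Optional[DMR], record: DMR) -> DMR:
    """Attach `record` to the end of a cell's list and return the head of that list."""
    if head is None:              # Null cell: the record becomes the whole list
        return record
    last = head
    while last.next is not None:  # walk to the last DMR stored so far
        last = last.next
    last.next = record
    return head


def cell_records(head: Optional[DMR]) -> Iterator[DMR]:
    """Walk every DMR stored in one cell (yields nothing for a Null cell)."""
    while head is not None:
        yield head
        head = head.next


cell = None                                    # a Null DMF cell
cell = append_to_cell(cell, DMR(-6.1, 38.9, [0.2, 0.3]))
cell = append_to_cell(cell, DMR(-6.1, 38.9, [0.4, 0.1]))
count = sum(1 for _ in cell_records(cell))     # number of DMRs now in the cell
```

Appending at the tail requires walking the list, which is the cost the text later attributes to very long DMR lists.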
The DMF format allows the user always to work with all the raw measurements just as they were acquired by the sensor and makes it easy to change the scale of the matrix (without altering the stored data) rather than be limited to a certain spatial resolution [20]. For instance, the greatest resolution of the DMF matrix would be that at which each cell stores only one DMR record, and the lowest resolution would be that at which the DMF matrix consists of only one cell that stores all the image data. Furthermore, it is possible to process data obtained from different sensors flying at very different altitudes and with different ground tracks. Indeed, the orientation of the ground track is taken into account automatically by the DMF format.
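The two extremes of resolution mentioned above can be illustrated with a short sketch. It assumes that a measurement's cell is found by dividing its offset from the minimum longitude and latitude by the cell size and taking the floor; that indexing rule, the function name `regrid`, and the tuple layout are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def regrid(measurements, min_lon, min_lat, sigma):
    """Group raw measurements into cells of size `sigma` without modifying them."""
    cells = defaultdict(list)
    for lon, lat, spectrum in measurements:
        col = math.floor((lon - min_lon) / sigma)   # assumed indexing rule
        row = math.floor((lat - min_lat) / sigma)
        cells[(row, col)].append((lon, lat, spectrum))
    return dict(cells)


# Three raw measurements: (longitude, latitude, spectrum)
measurements = [(0.5, 0.5, [1.0]), (1.5, 0.5, [2.0]), (1.5, 1.5, [3.0])]

fine = regrid(measurements, 0.0, 0.0, 1.0)     # small cells: one record per cell
coarse = regrid(measurements, 0.0, 0.0, 10.0)  # one large cell holds everything
```

Only the grouping changes between the two calls; the stored coordinates and spectra are the same objects in both grids.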
Hence, the DMF format has several advantages over conventional data formats.
(i) All the measurements are placed in their corresponding DMF cell, according to their coordinates, without deleting any data if two or more are in the same DMF cell, and without displacing the data to unassigned DMF cells. In this case, the duplication and loss of information present in conventional data formats are obviated in the DMF format, and the spatial error is limited to that of the positioning system.
(ii) Because all the measurements are spatially identified with their coordinates, the process of geocorrection is simplified to the calculation of the DMF cell for each measurement. There is no need to use additional techniques such as warping, GCP assignment, and resampling. It is thus possible to construct the structure while the data is being acquired, providing the geocorrected DMF file to the scientific community at the very moment the acquisition has finished (avoiding delays of hours or even days to prepare and deliver the data). In addition, since the effect of the different ground tracks of the plane or the satellite is automatically corrected, the DMF has a clear potential use for airborne and spaceborne processing [15].
(iii) The size of the DMF cell (pixel size in conventional data formats) is adjustable and can be defined by the user as needed. The choice of a certain DMF cell size has no effect at all on the accuracy of the positioning of the measurements over the grid, so that the spatial accuracy is maintained.
Despite all of the DMF's clear advantages, one must bear in mind some of its disadvantages.
(i) Very long DMR lists may affect the performance of the DMF data structure by increasing computation times. Similarly, this could also be the case if there is a high proportion of Null DMF cells. However, this can be avoided altogether or improved with a suitable choice of the DMF cell size.
(ii) Direct access to the data in the grid cells is lost, unlike with conventional data formats. It is necessary to analyse the entire DMR list in the corresponding DMF cell to find a certain measurement.
(iii) The user might not know which DMF cell size to use and choose one at random that could affect the overall performance of the structure. It is necessary to provide additional features that help the user to choose an appropriate DMF cell size.

Adaptation of the DMF Format for Change Detection.
In this work, the DMF data structure has been modified to make it capable of storing information of a certain scene but in different time periods. This was done by adding a new field to the DMR record: the Timestamp (Figure 2(a)). This new field consists of a date format parameter that will store the instant when the measurement was acquired, which can be provided by the GPS/INS device [21]. In this way, it is possible to know the exact location of the measurement and the exact time when it was acquired.
The DMR records will be placed in the DMF matrix just as in the conventional design, so that it will be possible to store all the information from different time periods together in the same data structure. This allows better and more accurate processing and change detection, since data from different instants are not kept in different image data structures, and it is possible to analyse each measurement with its neighbourhood more easily and more rapidly.
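A minimal sketch of a timestamped record, assuming Python's `datetime` for the date field. The names `TimedDMR` and `records_between` are hypothetical, and the dates are arbitrary illustrative values, not those of the dataset used in this paper. Because every measurement has its own acquisition instant, an acquisition is selected here by a time window rather than by a single value.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TimedDMR:
    """A DMR extended with the instant at which the measurement was acquired."""
    longitude: float
    latitude: float
    spectrum: List[float]
    timestamp: datetime


def records_between(cell: List[TimedDMR], start: datetime, end: datetime) -> List[TimedDMR]:
    """Return the records of one cell acquired in the half-open window [start, end)."""
    return [r for r in cell if start <= r.timestamp < end]


# One DMF cell holding measurements from two overpasses (illustrative dates)
cell = [
    TimedDMR(-6.10, 38.90, [0.2, 0.3], datetime(2000, 1, 1, 10, 0, 5)),
    TimedDMR(-6.10, 38.90, [0.2, 0.4], datetime(2000, 1, 1, 10, 20, 7)),
]
first_pass = records_between(cell, datetime(2000, 1, 1, 10, 0), datetime(2000, 1, 1, 10, 10))
```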

Experiments and Results
In order to test the performance of the DMF format in application to the field of change detection, a dataset was used comprising two 63-band images acquired by the AHS sensor [22,23] (ESA). The two images were acquired in the same flight but on different overpasses, with a time difference of the order of minutes. INTA's purpose in acquiring these two real images with different spatial resolutions but with no changes at all was to provide a reliable dataset focused on the development of new methodological approaches to change detection.
Two experiments were performed to observe the performance of the DMF format in detecting changes in the same scene acquired on different dates. The first experiment shows the aspect of the DMF matrix after detecting all the changes. The second shows the increase in accuracy when using the DMF format versus the analogous geocorrected version, since this format stores the exact location of each measurement together with the information acquired by the instruments.

Display of the Results Provided by the Change Detection Method.
For the first experiment, the L0 images and their analogous IGM files (which store the coordinates of each L0 measurement) were used to construct the DMF versions of the scenes using the same DMF cell size as the pixel size (σ) used in the geocorrected versions. This task was performed as follows.
(1) Create a matrix according to the maximum and minimum longitudes and latitudes of the IGM file (covered area), where the number of columns and rows can be defined as
\[ \text{num columns} = \left\lceil \frac{\text{max longitude} - \text{min longitude}}{\sigma} \right\rceil, \qquad \text{num rows} = \left\lceil \frac{\text{max latitude} - \text{min latitude}}{\sigma} \right\rceil. \tag{1} \]
(2) For each measurement of the L0 image, obtain its coordinates (latitude and longitude) from the IGM file, create its analogous DMR record, and calculate its position (row and column) in the DMF matrix from those coordinates.
(3) There are two possibilities when assigning the DMR to its corresponding DMF cell:
(i) the DMF cell has no measurements stored yet (Null): the new DMR will be assigned as such to that DMF cell;
(ii) the DMF cell already has measurements in it: the new DMR will be stored in the DMR list by attaching it to the last DMR stored. In this way, this last DMR will point to the new one by storing its memory address. The new DMR will now be the last measurement of the DMR list in that DMF cell.
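The construction steps above can be sketched as follows. The grid size follows Equation (1); the rule used to place a measurement (floor of its offset from the minimum longitude and latitude divided by σ, clamped to the last row or column) is an assumption, since the corresponding equation is not reproduced here. The function name `build_dmf` and the use of plain Python lists in place of linked DMR lists are illustrative choices.

```python
import math


def build_dmf(coords, spectra, sigma):
    """Build a grid of cells; each cell is a list of records (an empty list means Null)."""
    lons = [lon for lon, _ in coords]
    lats = [lat for _, lat in coords]
    min_lon, max_lon = min(lons), max(lons)
    min_lat, max_lat = min(lats), max(lats)

    # Equation (1); max(1, ...) is an added guard for the case of a single location
    num_columns = max(1, math.ceil((max_lon - min_lon) / sigma))
    num_rows = max(1, math.ceil((max_lat - min_lat) / sigma))
    matrix = [[[] for _ in range(num_columns)] for _ in range(num_rows)]

    for (lon, lat), spectrum in zip(coords, spectra):
        # Assumed placement rule: floor of the offset, clamped to the grid edge
        col = min(int((lon - min_lon) / sigma), num_columns - 1)
        row = min(int((lat - min_lat) / sigma), num_rows - 1)
        # Appending covers both cases of step (3): first record or attach to the last one
        matrix[row][col].append({"longitude": lon, "latitude": lat, "spectrum": spectrum})
    return matrix


coords = [(0.0, 0.0), (0.4, 0.2), (3.0, 1.0)]   # (longitude, latitude) as read from an IGM file
spectra = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # matching L0 measurements
matrix = build_dmf(coords, spectra, sigma=1.0)
```

In this toy case the single row holds a cell with two records, a Null cell, and a cell with one record, which are the three situations listed in Section 2.1.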
The results of constructing the DMF versions of the two images are shown in Figures 4(a) and 4(b). In these grayscale displays of the images, each pixel corresponds to a DMF cell. Since different measurements can be stored in the same DMF cell, only the DMR nearest to the centre of the DMF cell has been considered for display (when processing the DMF matrix it is necessary to consider the whole DMR list). Pixels in pure gray represent DMF cells that do not contain any DMR record (Null). As can be appreciated in the zoomed previews, this kind of Null DMF cell may exist within the image, representing small areas where no measurements were acquired by the instruments.
If a change detection method is to be applied to this scene, it would be necessary to use two different geocorrected images (Image1 and Image2). In order to avoid this, the form of constructing the DMF has been slightly modified for change detection, and now, instead of having two independent DMF matrices (one per image), all the measurements from both images will be stored in the same DMF matrix (Figure 4(c)). Thus, only one data structure would be used, with the origin of each DMR being distinguishable from the Timestamp field.
So, the procedure to detect the changes when using the conventional geocorrected images is performed in the following way.
(1) For each pixel of Image1 (Pixel1), calculate the coordinates as follows:
\[ \text{longitude} = \text{column} \times \text{pixel size} + \text{corner longitude}, \qquad \text{latitude} = \text{row} \times \text{pixel size} + \text{corner latitude}. \tag{3} \]
(2) Locate the pixel in Image2 (Pixel2) that corresponds to those coordinates.
(3) To calculate the difference between the measurements of both images, calculate the spectral angle (θ) between Pixel1 and Pixel2.
The result is an RGB image with the same spatial size as Image1, but where each pixel indicates the angle θ relative to its analogue in Image2. The greater this angle, the whiter this pixel is painted (Figure 5(a)).
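A minimal sketch of this procedure for geocorrected images follows. Two points are assumptions rather than statements of the authors' method: the lookup in Image2 is taken to be the inverse of Equation (3) rounded to the nearest pixel, and the spectral angle is computed with the usual definition, the arccosine of the normalised dot product of the two spectra. All function names are hypothetical, and images are plain nested lists (rows, columns, bands).

```python
import math


def spectral_angle(a, b):
    """Angle (rad) between two spectra: arccos of their normalised dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp against rounding


def pixel_to_coords(row, column, pixel_size, corner):
    """Equation (3); `corner` is (corner longitude, corner latitude)."""
    return column * pixel_size + corner[0], row * pixel_size + corner[1]


def coords_to_pixel(longitude, latitude, pixel_size, corner):
    """Assumed inverse of Equation (3), rounded to the nearest pixel index."""
    return (round((latitude - corner[1]) / pixel_size),
            round((longitude - corner[0]) / pixel_size))


def change_map(image1, image2, pixel_size, corner1, corner2):
    """Return theta per pixel of image1 (None where image2 has no matching pixel)."""
    result = []
    for row, line in enumerate(image1):
        out = []
        for column, pixel1 in enumerate(line):
            lon, lat = pixel_to_coords(row, column, pixel_size, corner1)
            r2, c2 = coords_to_pixel(lon, lat, pixel_size, corner2)
            inside = 0 <= r2 < len(image2) and 0 <= c2 < len(image2[0])
            out.append(spectral_angle(pixel1, image2[r2][c2]) if inside else None)
        result.append(out)
    return result


image1 = [[[1.0, 0.0], [1.0, 1.0]]]   # one row, two pixels, two bands
image2 = [[[1.0, 0.0], [0.0, 1.0]]]
angles = change_map(image1, image2, 1.0, (0.0, 0.0), (0.0, 0.0))
```

The first pixel is identical in both images and gives an angle of zero; the second gives π/4.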
For the DMF format, the procedure to detect the changes is similar.
(1) For each measurement stored in the DMF matrix (considering that it is possible to find more than one in the same DMF cell) with Timestamp = T1 (Measurement1), search for the nearest measurement with Timestamp = T2 within the same DMF cell (Measurement2).

Journal of Electrical and Computer
(2) To calculate the difference between the measurements of both images, calculate the spectral angle (θ) between Measurement1 and Measurement2.
The result is another DMF matrix with the same spatial size as the original DMF matrix, but storing single-value DMR records instead of spectrum-value DMR records. These single-value DMR records will store the value of the spectral angle (θ) between the reference measurement and its analogue. The representation of the resulting DMF matrix is carried out by creating an RGB image with the same spatial size as the DMF matrix, where each pixel indicates the angle of the measurement with Timestamp = T1 nearest to the centre of the DMF cell relative to the nearest measurement with Timestamp = T2 within the same DMF cell (Figure 5(b)).
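The DMF procedure can be sketched for a single cell as follows. The sketch assumes that "nearest" means the smallest planar distance between the stored coordinates, that the acquisition is identified by a simple label (`"T1"` or `"T2"`) instead of a full timestamp, and that the spectral angle is the arccosine of the normalised dot product; the names `detect_changes` and `spectral_angle` are hypothetical.

```python
import math


def spectral_angle(a, b):
    """Angle (rad) between two spectra: arccos of their normalised dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp against rounding


def detect_changes(cell):
    """For each T1 record of a cell, store theta to the nearest T2 record of the same cell."""
    later = [r for r in cell if r["timestamp"] == "T2"]
    results = []
    if not later:                      # no T2 measurement nearby: no information for this cell
        return results
    for ref in (r for r in cell if r["timestamp"] == "T1"):
        nearest = min(
            later,
            key=lambda r: math.hypot(r["lon"] - ref["lon"], r["lat"] - ref["lat"]),
        )
        # Single-value record: the reference position plus the spectral angle
        results.append({"lon": ref["lon"], "lat": ref["lat"],
                        "angle": spectral_angle(ref["spectrum"], nearest["spectrum"])})
    return results


cell = [
    {"lon": 0.1, "lat": 0.1, "spectrum": [1.0, 0.0], "timestamp": "T1"},
    {"lon": 0.9, "lat": 0.9, "spectrum": [0.0, 1.0], "timestamp": "T2"},
    {"lon": 0.2, "lat": 0.1, "spectrum": [1.0, 0.0], "timestamp": "T2"},
]
changes = detect_changes(cell)
```

A cell that holds only T1 records returns an empty list, which corresponds to the cells with no information discussed in the text.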
As one can see, the two results obtained by the geocorrected version and the DMF version are visually quite similar. The whiter pixels indicate areas where the changes are greater. As the original two images were acquired under the same conditions, the resultant image is almost black (with very low differences) except at the borders, where the differences are caused by the inaccuracy of the positioning system.
However, the number of black pixels inside the DMF image indicates that many measurements with Timestamp = T1 have no measurement with Timestamp = T2 nearby (and vice versa), which is why there is no information in the small area represented by that DMF cell. In the geocorrected version, each pixel of the first image (Image1) always finds an analogous pixel in the second image (Image2) since, even if there is no measurement nearby, the nearest measurement is assigned to each pixel during the geocorrection process, thereby introducing positioning and spatial errors. This additional inaccuracy is avoided when using the DMF format, so the result obtained in Figure 5(b) is free of this kind of spatial error, the uncertainty having been reduced to that of the GPS/INS system.
There could be cases when, after detecting the changes in the images overall, some analysis of different regions is required. If there are not many points in common between the two images, there would exist a major proportion of Null DMF cells, making the task of analysing the changes in smaller areas more difficult. One can consider different ways to proceed when this is the case.
(1) The most recommendable option is just to analyse the non-Null DMF cells, even though there are few measurements in the region considered. Despite the small number of data, they still will all be real data acquired by the instruments. It could be possible to attach to this area analysis a parameter indicating the reliability of the operation, such parameter being directly related to the number of measurements.
(2) If the analysis requires the consideration of all the DMF cells within a considered area, and the Null DMF cells are really a problem, they could be filled by assigning one of the measurements in the neighbourhood (the nearest one), just as is done in conventional geocorrection processes. However, this option is less recommendable because false information is being introduced (increasing the spatial errors), that is, by assigning some measurement values to a small area in which no acquisition was performed at all.
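The first option can be sketched as a regional summary that ignores Null cells and reports, alongside the mean angle, how many measurements it is based on. Using the raw count as the reliability parameter is only one possible choice and is an assumption here, as is the function name `region_mean_angle`.

```python
def region_mean_angle(region_cells):
    """Mean spectral angle over the non-Null cells of a region, plus the measurement count.

    `region_cells` is a list of cells, each a list of angles; an empty list is a Null cell.
    """
    angles = [theta for cell in region_cells for theta in cell]
    if not angles:                 # the whole region is Null: nothing to report
        return None, 0
    return sum(angles) / len(angles), len(angles)


# Four cells, two of them Null
region = [[0.02], [], [0.01, 0.03], []]
mean_angle, n_measurements = region_mean_angle(region)
```

A small count signals that the mean rests on few real measurements, without any value having been invented for the Null cells.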
In order to test how these measurement displacements affect the result of the change detection, the experiment to be described in Section 3.2 was carried out.

Accuracy of the DMF Format for Change Detection.
In order to study the performance of the DMF in change detection, a second experiment was carried out. This showed that DMF increases the accuracy since, with there being no previous geocorrection phase, there are no spatial errors, and some DMF cells may remain empty if no nearby measurements are present.
To test this, the images Image1 and Image2 created in Section 3.1 and shown in Figures 3 and 4 and the corresponding GLT files were considered, calculating the accuracy for the geocorrected images as follows.
(1) For each pixel of Image1 (Pixel1), calculate the spectral angle relative to its analogue in Image2 (Pixel2), as explained in Section 3.1.
(2) Check the sign for the positions of Pixel1 and Pixel2 in their corresponding GLT file. A positive sign implies that the measurement has been assigned to that pixel in the geocorrection process, whereas a negative sign implies that the pixel at that location was unassigned, and the nearest measurement (not of that pixel) was assigned. Hence, there are the following four possibilities:
00: both the measurement of Pixel1 and the measurement of Pixel2 were assigned with a nearest-neighbour algorithm;
01: the measurement of Pixel1 was assigned with a nearest-neighbour algorithm, but the measurement of Pixel2 was assigned originally;
10: the measurement of Pixel1 was assigned originally, but the measurement of Pixel2 was assigned with a nearest-neighbour algorithm;
11: both the measurement of Pixel1 and the measurement of Pixel2 were assigned originally.
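The four-way classification can be sketched as follows, using flat lists of GLT values and spectral angles. The function names are hypothetical, and treating a GLT value of zero as "not originally assigned" is an assumption, since the text only describes positive and negative values.

```python
from collections import defaultdict


def glt_case(glt1, glt2):
    """Two-digit case code: 1 = originally assigned (positive GLT), 0 = nearest neighbour."""
    return ("1" if glt1 > 0 else "0") + ("1" if glt2 > 0 else "0")


def mean_angle_by_case(angles, glt1_values, glt2_values):
    """Group the per-pixel spectral angles by case and return the mean of each group."""
    buckets = defaultdict(list)
    for theta, g1, g2 in zip(angles, glt1_values, glt2_values):
        buckets[glt_case(g1, g2)].append(theta)
    return {case: sum(values) / len(values) for case, values in buckets.items()}


angles = [0.02, 0.03, 0.01, 0.03]   # spectral angle of each pixel pair (rad)
glt1 = [5, -5, 7, -2]               # GLT values at the positions of Pixel1
glt2 = [4, -1, 9, 3]                # GLT values at the positions of Pixel2
means = mean_angle_by_case(angles, glt1, glt2)
```

Here the pairs fall into cases 11, 00, 11, and 01 respectively, so case 10 does not appear in the result.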
Regarding the DMF format, there are no assignments made by using nearest-neighbourhood algorithms, so the 00, 01, and 10 possibilities will never exist, reducing the error caused by filling the empty pixels with that type of procedure.
Table 1 lists the mean angle (θ) for each case, for both the geocorrected and the DMF versions of the change detection result.
By analysing Table 1, one can make several deductions. One is that the percentage of real well-assigned pixels in the geocorrected version of the image is about 57%. This means that 57% of the measurements of the geocorrected images are exactly in the pixel where they should be according to their spatial coordinates. The rest of the measurements, about 43% of the total, have been reassigned to pixels in their neighbourhood in order to avoid empty holes in the image. This introduces further error when detecting the changes according to the mean spectral angle, reaching up to 0.0231 rad in the worst case (Case 00, both pixels in Image1 and Image2 have been reallocated through a nearest-neighbourhood algorithm), and giving better results, a mean spectral angle of 0.0204 rad, when both pixels were well allocated (Case 11).
All the measurements in the DMF matrix are exactly in the DMF cell to which they should be allocated according to their spatial coordinates. That is why Cases 00, 01, and 10 always present zero measurements. That is, all the measurements are stored in their corresponding DMF location cell (Case 11) and no displacement of the measurements is performed with a nearest-neighbourhood algorithm. This means a reduction of the change detection error: considering all the measurements acquired by the sensor implies a mean spectral angle of 0.0194 rad versus the 0.0204 rad in the best case of the geocorrected versions.
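As a worked illustration of the size of this reduction, using only the mean spectral angles quoted in the text, the relative differences with respect to the best (Case 11) and worst (Case 00) geocorrected cases are approximately:

```latex
\[
\frac{0.0204 - 0.0194}{0.0204} = \frac{0.0010}{0.0204} \approx 0.049,
\qquad
\frac{0.0231 - 0.0194}{0.0231} = \frac{0.0037}{0.0231} \approx 0.160 .
\]
```

That is, the DMF mean angle is roughly 5% lower than the best geocorrected case and roughly 16% lower than the worst one. These percentages are derived here from the quoted values and are not reported in Table 1.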
Another deduction that can be made from the table is that far more measurements are found in the geocorrected version than in the analogous DMF representation. This is because some duplication of the data exists when assigning the nearest measurements to the empty pixels in the geocorrection process: a given measurement can be assigned to one, two, or more pixels if they are unassigned and it is the nearest. This introduces a displacement of the information that implies a reduction in the applied method's accuracy. This kind of error does not exist when using the DMF format, since all the measurements acquired by the sensor (a total of 1,500,000) are stored as such. Just in Image1, the number of duplicated measurements is 1,953,586 − 1,500,000 = 453,586, a very large quantity of information that is not real. The result is that the mean spectral angle in the best case of the geocorrected image (11) is worse than in the analogous DMF version.

Conclusions
In this work, the Diffused Matrix Format (DMF), a data structure originally designed to improve the spatial allocation of the measurements acquired by the sensor, has been adapted for application to methods of change detection. It is based on a basic unit denominated the Diffused Measurement Record (DMR) that stores all the information corresponding to each measurement, including a Timestamp field that indicates the time when it was acquired by the sensor. To test its performance, a dataset was used consisting of two AHS images of the same scene, acquired in the same flight but from different overpasses. For each measurement of the first image, the analogous one is located in the second. The spectral angle between them indicates how different they are. The results showed that the accuracy when using the DMF format is greater than with conventional structures for various reasons.
(1) Since DMF cells may hold no measurements (Null), the DMF does not displace measurements to unassigned pixels, as is done in conventional image geocorrection approaches. The spatial errors that such displacements introduce are therefore absent when the DMF format is used.
(2) Because the exact location of each measurement (provided by the GPS/INS positioning system) is stored as DMR fields, DMF yields a spatial accuracy limited only by the errors of the positioning system.
(3) The use of the DMF format for change detection has additional advantages, such as using the same data structure to store all the data from different time periods together, instead of using different individual images. This allows easier and faster identification of analogous measurements with different timestamps, improving the performance of the change detection method.
(4) The use of a unique data structure capable of storing all the measurements acquired in different time periods opens up a new field in the possibility of designing new algorithms that could consider the temporal distance as an additional dimension, the aim of course being to obtain improved change detection results.