Target Detection Using Nonsingular Approximations for a Singular Covariance Matrix

Accurate covariance matrix estimation for high-dimensional data can be a difficult problem. A good approximation of the covariance matrix usually requires a prohibitively large number of pixels, that is, pixels from a stationary section of the image whose number is several times the number of bands. Estimating the covariance matrix with a number of pixels on the order of the number of bands, or less, causes not only a poor estimate of the covariance matrix but also a singular covariance matrix that cannot be inverted. In this paper we investigate two methods that give a sufficient approximation of the covariance matrix while using only a small number of neighboring pixels. The first is the quasilocal covariance matrix (QLRX), which uses the variances of the global covariance instead of local variances that are too small and would cause a singular covariance. The second is the sparse matrix transform (SMT), which performs a set of K Givens rotations to estimate the covariance matrix. We compare target detection results based on both of these methods. An improvement to the SMT algorithm is suggested.


Introduction
The most widely used algorithms for target detection are traditionally based on the covariance matrix [1]. This matrix estimates the direction and magnitude of the noise in an image. The matched filter presented in [1] is

MF(x) = t^T Φ_G^{-1} (x − m),  (1)

where x is the examined pixel, m is the estimate of that pixel based on its surroundings, Φ_G is the global covariance matrix, and t is the target signature. In words, our matched filter will detect the target in a particular pixel x if x differs from its surroundings (x − m), unlike the noise (controlled by Φ_G^{-1}), and in the direction of the target. If the target signature is unknown, then the RX algorithm uses the target residual (x − m) as its own match, that is,

RX(x) = (x − m)^T Φ_G^{-1} (x − m).  (2)

Φ_G is traditionally calculated as

Φ_G = (1/n) Σ_{i=1..n} (x_i − m)(x_i − m)^T.

Although this estimate is theoretically justified if the background is stationary, it is often used in cases where this is not true.
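As a concrete illustration of these two scores, the following sketch computes the matched-filter and RX values on synthetic data (the variable names follow the text; the data and target signature are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5                                   # number of bands
n = 1000                                # number of background pixels
X = rng.normal(size=(n, p))             # background pixels, one per row

m = X.mean(axis=0)                      # background mean estimate
phi_g = np.cov(X, rowvar=False)         # global covariance Phi_G
phi_inv = np.linalg.inv(phi_g)

t = np.ones(p)                          # assumed target signature
x = X[0] + 0.5 * t                      # examined pixel with implanted target

mf_score = (x - m) @ phi_inv @ t        # matched filter, eq. (1)
rx_score = (x - m) @ phi_inv @ (x - m)  # RX: residual as its own match, eq. (2)
```

With enough background pixels (n much larger than p) the inverse exists; the rest of the paper concerns the regime where it does not.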
In target detection, the image is not normally statistically stationary; it will, however, have quasistationary "patches" that connect to each other at the edges. When one estimates the mean and covariance matrix of the background of a particular pixel, the local neighboring pixels provide a better estimate than the pixels of the entire image. In [2], we showed that much better results can be obtained if one uses a "quasilocal covariance matrix" (QLRX). In general terms, it uses the eigenvectors of the overall global matrix, but the eigenvalues are taken locally. This tends to lower the matched-filter scores at edges in the data (where the image passes from one stationary distribution to another) but allows accurate detection in less noisy areas.
The overall question of estimating a covariance matrix from local areas in which the data are too sparse is actually a well-studied issue in the literature. In particular, in [3], Theiler et al. consider the sparse matrix transform for determining a covariance matrix from limited data. In this paper, our intention is to compare the two methods in terms of both their detection ability and their overall efficiency.

Local Covariance Matrix
Assume we are given a dataset X composed of n pixels with p dimensions. Φ_G is the covariance matrix of this dataset. An SVD (singular value decomposition) can be used to decompose the global covariance matrix [2] into eigenvectors and eigenvalues,

Φ_G = E_G Λ_G E_G^T;

we will refer to the resulting coordinate system as the PCA (principal component analysis) space.
To compare the covariance matrix based on the local area surrounding a pixel with a matrix based on all the available data (referred to as global), consider the statistics of the dataset X' = E_G^T X. Here E_G is the rotation matrix built from the global eigenvectors [2], and X' is the dataset after rotation into the PCA subspace.
If X' is based on all the pixels in the image, then the covariance matrix of X' is a diagonal matrix with the global eigenvalues on the diagonal. However, if X' contains only the local surroundings, then the values on the diagonal of the covariance matrix of X' represent the variances of the local data in the directions of the global eigenvectors.
Mathematically, using the dataset X, for every pixel we calculate the local covariance Φ_L from the nearest neighbors. The diagonal D_L of the local covariance matrix gives the variances of the neighbors in the PCA subspace:

D_L = diag(E_G^T Φ_L E_G).

Since the local covariance is computed from a small number of samples, some of the variances may be inappropriately small or may even produce a singular covariance. To avoid singularity, the quasilocal variance matrix Λ_QL takes the maximum between the global variances (Λ_G) and the local variances in the PCA subspace (D_L):

Λ_QL = max(Λ_G, D_L)  (elementwise).

In this way, if the local area of the pixel has a large variance in some bands, those bands are whitened by the local variance; bands whose local variances are too small (and could even cause a singular covariance matrix) use the global variance instead.
The quasilocal covariance is then

Φ_QL = E_G Λ_QL E_G^T,

and in the PCA subspace it is simply Λ_QL. If we calculate the RX score in the PCA subspace, we need fewer rotations: all the data are rotated once into the PCA subspace, and then

RX_QL(x) = (x' − m_L)^T Λ_QL^{-1} (x' − m_L),

where m_L is the mean of the selected surrounding pixels in the PCA subspace.
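The whole QLRX construction can be sketched as follows on synthetic data (a hypothetical illustration, not the authors' code; `lam_g`, `E_g`, and `d_l` stand for Λ_G, E_G, and D_L):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n_local = 10, 8                       # only 8 neighbors, fewer than p bands

# global statistics from many pixels
X = rng.normal(size=(5000, p)) * np.arange(1, p + 1)
phi_g = np.cov(X, rowvar=False)
lam_g, E_g = np.linalg.eigh(phi_g)       # global eigenvalues / eigenvectors

# local statistics from the 8 neighbors alone
neighbors = X[:n_local]
m_l = neighbors.mean(axis=0)
Xp = (neighbors - m_l) @ E_g             # rotate neighbors into PCA subspace
d_l = Xp.var(axis=0, ddof=1)             # local variances along global axes

# quasilocal eigenvalues: elementwise maximum of global and local variances,
# so no variance is ever inappropriately small (no singularity possible)
lam_ql = np.maximum(lam_g, d_l)

# RX score in the PCA subspace, whitened by the quasilocal variances
x = X[100]
resid = (x - m_l) @ E_g
rx_ql = resid @ (resid / lam_ql)
```

Note that only one global eigendecomposition is needed; per pixel, QLRX requires just one rotation of the neighbors and a variance along each axis.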
For subpixel targets, previous work [1] shows that it is better to use the mean of the 8 nearest neighbors, m_8. This can be done assuming that the target does not affect its surroundings; if we fear that the target has leaked into the surrounding pixels, then we ignore those pixels and use only the pixels external to them for our estimate.
Sparse Matrix Transform (SMT)

In this method, we use sparse matrix rotations to find the covariance matrix nearest to the original one that is still nonsingular. We can use SVD to decompose the local covariance:

Φ_L = E Λ E^T.

Based on the fact that every eigenvector matrix E (or any unitary matrix) can be written as a product of K sparse orthonormal rotation matrices [3], we can write

E = E_1 E_2 ... E_K.

Every rotation matrix E_k is a Givens rotation operating on a pair of coordinate indices (i_k, j_k); the rotation acts in the plane spanned by the axes i_k and j_k. E_k equals the identity except for the entries E_k(i_k, i_k) = E_k(j_k, j_k) = cos θ_k, E_k(i_k, j_k) = −sin θ_k, and E_k(j_k, i_k) = sin θ_k. With K = N(N − 1)/2 such rotations we can get from the identity matrix to any rotation matrix.
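A minimal sketch of a single Givens rotation, using the sign convention above (any consistent convention works):

```python
import numpy as np

def givens(p, i, j, theta):
    """Givens rotation E_k acting on the plane of coordinates (i, j)."""
    E = np.eye(p)
    c, s = np.cos(theta), np.sin(theta)
    E[i, i] = c
    E[j, j] = c
    E[i, j] = -s
    E[j, i] = s
    return E

# A Givens rotation is orthonormal and differs from the identity only in
# rows and columns i and j -- hence "sparse" rotation matrix.
E = givens(5, 1, 3, 0.7)
```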
The concept of SMT is to start from the identity matrix, rotating two axes at a time toward the directions of the eigenvectors of the local covariance matrix, and to stop the rotations at the point that gives the best fit without becoming singular.
In other words, from a first set of data we can determine the correlations between the variables. If we did all the possible rotations, we would diagonalize the matrix, but we cannot do this since we do not have enough data to find all the local eigenvectors simultaneously. Instead, we perform these rotations on the most correlated pairs, testing each new matrix by how well it describes a second dataset. When the correction fails on the second dataset, we stop the diagonalizing procedure. Mathematically, the rotation matrix T is all the selected rotations combined:

T = Π_{k=1..K} E_k.

The variances are the variances of the local covariance matrix in the directions given by T,

Λ_SMT = diag(T^T Φ_L T),

and the inverse covariance matrix is

Φ_SMT^{-1} = T Λ_SMT^{-1} T^T.

To decide which rotation matrix is best, we use maximum likelihood covariance estimation and "leave-third-out" cross-validation. (Note that leave-one-out cross-validation would give better results but would cost much more computationally.)
We divide the group of pixels into three groups. One third is taken as the test pixels Y ∈ R^{(n/3)×p}, and the other two thirds are used to form the approximation of the covariance Φ_SMT. After every rotation we calculate the likelihood that the covariance Φ_SMT correctly describes the group Y. The data are preprocessed so that Y has zero mean, and the Gaussian log-likelihood is

log P(Y | Φ_SMT) = −(n/6) log|2π Φ_SMT| − (1/2) Σ_{y ∈ Y} y^T Φ_SMT^{-1} y.

We do this three times, each time taking out another third as the test data Y; after combining the results of the three tests, we find the number of rotations that gives the best result (the highest value of P(Y)); we then use the full set and this number of rotations to obtain the final approximation of the covariance matrix (see Figure 1). To select at each step the rotation that makes the biggest improvement, we perform a greedy minimization, always choosing the next rotation that contributes most to reducing the correlation between the data along a pair of axes:

(i*, j*) = argmax_{i ≠ j} S_ij^2 / (S_ii S_jj),

where S is the current covariance matrix, (i, j) are the indices of two rows in the matrix, and S_ij, S_ii, S_jj are the entries of the matrix at those indices.
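One greedy step, combining the selection criterion with the Givens rotation that zeroes the selected entry, can be sketched as follows (synthetic data; an illustration of the technique, not the authors' code):

```python
import numpy as np

def smt_step(S):
    """One greedy SMT rotation: pick the pair (i, j) with the largest
    normalized correlation S_ij^2 / (S_ii * S_jj), then apply the Givens
    rotation that drives that off-diagonal entry to zero."""
    C = S**2 / np.outer(np.diag(S), np.diag(S))   # greedy criterion
    np.fill_diagonal(C, -np.inf)                  # ignore diagonal entries
    i, j = np.unravel_index(np.argmax(C), C.shape)
    # rotation angle that zeroes S[i, j] after E^T S E
    theta = 0.5 * np.arctan2(2.0 * S[i, j], S[i, i] - S[j, j])
    E = np.eye(S.shape[0])
    c, s = np.cos(theta), np.sin(theta)
    E[i, i] = E[j, j] = c
    E[i, j] = -s
    E[j, i] = s
    return E, E.T @ S @ E, (i, j)

rng = np.random.default_rng(2)
A = rng.normal(size=(12, 5))
S = A.T @ A / 12                    # sample covariance from only 12 pixels
E1, S1, (i, j) = smt_step(S)        # S1[i, j] is driven to zero
```

In the full algorithm this step is repeated, and cross-validation on the held-out third decides when to stop.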
After we calculate the covariance matrix for SMT, we can use it for anomaly detection:

RX_SMT(x) = (x − m_L)^T Φ_SMT^{-1} (x − m_L),

where x is the tested pixel and m_L is the mean of the selected surrounding pixels.
For subpixel targets, as stated previously, it is better to use the mean of the 8 nearest neighbors, m_8.
Figure 1: The probability that Φ_SMT correctly describes Y after k rotations; the chosen k is the one that gives the maximum probability.

Dataset
Two datasets were used (Figure 2); a description of their origin can be found in Table 1 and in greater detail in [4].
The two data cubes (OBP1 and OBP2) are real data from the HyMap sensor in which anomalies were inserted artificially by linearly mixing the spectrum of a green-paint pixel with the original background pixel. For display purposes, Figure 2 shows images with full-pixel paint spectra. For the evaluation of anomaly detection results, images with a mixing ratio of 0.33 (P = 33%) were used.

Results
We now wish to compare the SMT and QLRX algorithms. We perform RX anomaly detection (2) using the covariance matrices given by each of the algorithms.
Since the dataset being used contains implanted subpixel targets without any danger of overlap into neighboring pixels, the mean in the calculation of (2) was always the mean of the eight nearest neighbors. However, we must determine the neighborhood for the SMT covariance calculation that provides the best results.
The first test used only the nearest 8 neighbors for the approximation of the covariance; in this test, it is very easy to see that the QLRX results are superior to the SMT results. The ROC curves are given in Figure 3. We assume that the area of the target (and of any examined pixel) consists of a square region of dimension OWS by OWS (outer window size). The target area itself is GWS by GWS (guard window size); for subpixel targets GWS equals 1. Thus the neighboring pixels are those located in the OWS-by-OWS square but not contained in the inner GWS-by-GWS square.
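The window geometry just described can be sketched as follows (the function name `neighbor_pixels` is our own; the image is synthetic):

```python
import numpy as np

def neighbor_pixels(img, r, c, ows, gws):
    """Return spectra of pixels inside the OWS-by-OWS outer window but
    outside the GWS-by-GWS guard window centered on pixel (r, c)."""
    ho, hg = ows // 2, gws // 2
    out = []
    for dr in range(-ho, ho + 1):
        for dc in range(-ho, ho + 1):
            if abs(dr) <= hg and abs(dc) <= hg:
                continue                 # skip the guard window (target area)
            rr, cc = r + dr, c + dc
            if 0 <= rr < img.shape[0] and 0 <= cc < img.shape[1]:
                out.append(img[rr, cc])
    return np.array(out)

img = np.zeros((11, 11, 4))              # 11x11 image with 4 bands
nb8 = neighbor_pixels(img, 5, 5, ows=3, gws=1)   # the 8 nearest neighbors
nb24 = neighbor_pixels(img, 5, 5, ows=5, gws=1)  # 5x5 window minus center
```

With OWS = 3 and GWS = 1 this yields exactly the 8 nearest neighbors used in the first test.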
The figure shows the result for P = 33%, but the tests for 10, 25, 50, and 100 percent gave similar results.
Results from the OBP2 dataset were comparable. When the OWS is larger, the results of the SMT improve dramatically.
For the dataset OBP1 we see a large improvement as OWS increases. For this dataset QLRX gives better results, especially in the low CFAR (constant false alarm rate) regime.
For the dataset OBP2, the differences between QLRX and SMT are reduced, but QLRX still performs better in the low CFAR regime (see Figure 5).
Similar results were obtained for the cases in which 10, 25, 50, or 100 percent of the target is in the pixel.
The SMT has two main difficulties. First, the algorithm calculates a new covariance matrix at every point. This calculation needs a sequential set of rotations based on the training set, followed by evaluations on the test set. Both sets are taken only from the pixel's surroundings, so none of the information outside the selected group is used in the calculation. In QLRX, the eigenvectors are the same for all points (the eigenvectors of the global covariance); all that we need to do is measure the variance of the local area in the spectral directions of the eigenvectors and calculate the new covariance matrix. Second, the calculation of the SMT itself depends strongly on the size of the "local" area. While a larger area improves the results, it also increases the calculation time (see Table 2).

Improvements for SMT
A small change in the published method for SMT can lead to a large improvement. In the original algorithm, the initial assumed axes of the covariance matrix are in the directions of the original dataset.
Then the axes are rotated in pairs into the directions of the "local eigenvectors" to create new covariance matrices.
When we stop, some of the axes will point almost in the same directions as the local covariance eigenvectors and some will remain closer to the directions of the original axes. Since the original directions were arbitrary, that is, not related to the correlations between the axes, it is easy to see that there is no reason this should be optimal. In particular, would it not make more sense to start from the global eigenvectors and rotate into the local ones? Another benefit of this approach is that we start the rotations from a point that is most probably closer to the optimum (see Figure 6); with fewer rotations, we reach the maximum likelihood. We will call this new algorithm SMT-PCA.
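The SMT-PCA starting point can be sketched as follows (synthetic data; `offdiag_ratio` is our own illustrative measure of how far a matrix is from diagonal):

```python
import numpy as np

rng = np.random.default_rng(3)
p = 8
mix = rng.normal(size=(p, p))                # induces correlations between bands
X = rng.normal(size=(4000, p)) @ mix         # "global" pixels
phi_g = np.cov(X, rowvar=False)
_, E_g = np.linalg.eigh(phi_g)               # global eigenvectors

local = X[:12] - X[:12].mean(axis=0)         # small local sample
S_raw = local.T @ local / len(local)         # plain SMT's starting matrix
S_pca = E_g.T @ S_raw @ E_g                  # SMT-PCA's starting matrix

def offdiag_ratio(S):
    """Fraction of the matrix energy that sits off the diagonal."""
    return 1.0 - np.sum(np.diag(S)**2) / np.sum(S**2)

# S_pca is typically much closer to diagonal than S_raw, so the greedy
# Givens rotations start nearer the optimum and fewer of them are needed;
# the final covariance must be rotated back through E_g.
```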
For the OBP1 dataset (Figure 7), the results after starting with the subspace based on the global eigenvectors are better than QLRX when OWS is large (7, 9). SMT-PCA gives better results than SMT for any OWS.
For the OBP2 dataset (Figure 8), the results after starting with the subspace based on the global eigenvectors are better than QLRX when OWS is large (7, 9). SMT-PCA gives better results than SMT for any OWS (for OWS = 3, SMT and SMT-PCA are almost the same).
Table 3 compares the number of rotations needed by SMT and by SMT-PCA.
Figure 6: When starting from the PCA subspace, we begin from a point closer to the maximum, so fewer rotations are needed; the difference in k between the original starting point and the new one corresponds to the rotations accomplished by the transformation to the PCA subspace.

Conclusions
As a preliminary to our conclusions, note that when we discuss using a small or large number of pixels, in all cases the number of pixels used is less than the number of spectral bands. Two methods have been considered in this paper for dealing with possibly singular covariance matrices. In the first (QLRX), we use global eigenvectors and local eigenvalues as an approximation of the inverse covariance matrix. In the second (SMT), we use an iterative process to slowly "twist" our axes closer to those determined by the data.
In our two datasets, we found that if a small area was used for estimating the background, the QLRX algorithm was superior. For large areas of background, QLRX remains superior, although SMT improves greatly. In summary:

Here "number of pixels" is measured relative to the number of bands (the pixels-to-bands ratio):
(i) the calculation time of QLRX is much smaller (by two orders of magnitude) than that of both SMT and SMT-PCA;
(ii) the calculation time of SMT-PCA is about half that of the original SMT;
(iii) SMT-PCA and QLRX both perform better than SMT for any number of pixels;
(iv) for a small number of pixels (ratio ≤ 0.1), QLRX performs better than SMT-PCA;
(v) for a larger number of pixels (ratio ≥ 0.2), SMT-PCA performs better than QLRX.

Figure 4: OBP1 results with OWS given by the number in the legend.

Figure 5: Similar to Figure 4 for the OBP2 dataset.

Figure 7: OBP1 results with OWS given by the number in the legend.

Table 1 :
Datasets information-OBP1 and OBP2 are parts of OBP.

Table 2 :
This table shows the time it took to complete the calculation of a dataset.

Table 3 :
This table shows the number of rotations needed to complete the calculation of a dataset.