Journal of Electrical and Computer Engineering, Hindawi Publishing Corporation, 2012, Article ID 628479, doi:10.1155/2012/628479

Research Article

Target Detection Using Nonsingular Approximations for a Singular Covariance Matrix

Nir Gorelik (1), Dan Blumberg (2), Stanley R. Rotman (1), and Dirk Borghys (3)

(1) Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
(2) Department of Geography and Environmental Development, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
(3) Signal and Image Centre, Royal Military Academy, 1000 Brussels, Belgium

Academic Editor: Xiaofei Hu

Received 1 April 2012; Accepted 7 June 2012; Published 30 July 2012

Copyright © 2012 Nir Gorelik et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Accurate covariance matrix estimation for high-dimensional data can be a difficult problem. A good approximation of the covariance matrix needs, in most cases, a prohibitively large number of pixels, that is, pixels from a stationary section of the image numbering several times the number of bands. Estimating the covariance matrix with a number of pixels on the order of the number of bands or less will produce not only a bad estimate of the covariance matrix but also a singular covariance matrix that cannot be inverted. In this paper we investigate two methods that give a sufficient approximation of the covariance matrix while using only a small number of neighboring pixels. The first is the quasilocal covariance matrix (QLRX), which substitutes the variances of the global covariance for local variances that are too small and would cause singularity. The second is the sparse matrix transform (SMT), which performs a set of K Givens rotations to estimate the covariance matrix. We compare target acquisition results based on both of these methods, and an improvement for the SMT algorithm is suggested.

1. Introduction

The most widely used algorithms for target detection are traditionally based on the covariance matrix. This matrix estimates the direction and magnitude of the noise in an image. For a matched filter we have
(1) R = t^T \Phi_G^{-1} (x - m),
where x is the examined pixel, m is the estimate of that pixel based on its surroundings, \Phi_G is the global covariance matrix, and t is the target signature. In words, the matched filter will detect the target in a particular pixel x if x differs from its surroundings (x - m), unlike the noise (controlled by \Phi_G^{-1}), and in the direction of the target. If the target signature is unknown, the RX algorithm uses the target residual (x - m) as its own match, that is,
(2) R = (x - m)^T \Phi_G^{-1} (x - m).
\Phi_G is traditionally calculated as
(3) \Phi_G = \frac{1}{N} \sum_{i=1}^{N} (x_i - m)(x_i - m)^T.
Although this equation is theoretically justified only if the background is stationary, it is often used in cases where this is not true.
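To make the scoring concrete, here is a minimal numpy sketch of (1)-(3). The function names and the toy data are our own illustration, not code from the paper; note that the mean RX score under the maximum-likelihood covariance equals the number of bands.

```python
import numpy as np

def matched_filter_scores(X, t, m, cov_inv):
    """Matched-filter score R = t^T Phi^-1 (x - m) per pixel, as in (1)."""
    return (X - m) @ (cov_inv @ t)

def rx_scores(X, m, cov_inv):
    """RX anomaly score R = (x - m)^T Phi^-1 (x - m) per pixel, as in (2)."""
    d = X - m
    return np.einsum('ij,jk,ik->i', d, cov_inv, d)

# Toy example: 500 background pixels in 10 bands; Phi_G estimated as in (3).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
m = X.mean(axis=0)
phi_g = (X - m).T @ (X - m) / X.shape[0]
phi_inv = np.linalg.inv(phi_g)
scores = rx_scores(X, m, phi_inv)
mf = matched_filter_scores(X, np.ones(10), m, phi_inv)
```

With the covariance estimated from the scored pixels themselves, the RX scores average exactly to the number of bands (here 10), a useful sanity check.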

In target detection, the image is not normally statistically stationary; it will, however, have quasistationary "patches" which connect to each other at the edges. When one estimates the mean and covariance matrix of the background of a particular pixel, the local neighboring pixels provide a better estimate than the pixels of the entire image. In previous work, we showed that much better results can be obtained if one uses a "quasilocal covariance matrix" (QLRX). In general terms, it uses the eigenvectors of the overall global covariance matrix, but the eigenvalues are taken locally. This tends to lower the matched-filter scores at edges in the data (where the image passes from one stationary distribution to another) but allows for accurate detection in less noisy areas.

The overall question of using a covariance matrix from local areas in which data is sparse is actually a well-studied issue in the literature. In particular, Theiler et al. consider the sparse matrix transform for determining a covariance matrix from limited data. In this paper, it is our intention to compare the two methods both in terms of their detection ability and their overall efficiency.

2. Local Covariance Matrix

Assume we are given a dataset X composed of n pixels with p dimensions, and let \Phi_G be the covariance matrix of this dataset. A singular value decomposition (SVD) can be used to decompose the global covariance matrix into eigenvectors and eigenvalues; we will refer to the resulting coordinate system as PCA (principal component analysis) space.

To compare a covariance matrix based on the local area surrounding a pixel with one based on all the available data (referred to as global), consider the statistics of the dataset \tilde{X} = E_G^T X. Here E_G is the rotation matrix formed from the global eigenvectors, and \tilde{X} is the dataset after rotation into the PCA subspace.

If \tilde{X} is based on all the pixels in the image, then the covariance matrix of \tilde{X} is a diagonal matrix with the global eigenvalues on the diagonal. However, if \tilde{X} contains only the local surroundings, then the values on the diagonal of its covariance matrix represent the variances of the local data in the directions of the global eigenvectors.

Mathematically, using the dataset \tilde{X}, for every pixel we calculate the local covariance \tilde{\Phi}_L from its nearest neighbors. The diagonal \tilde{D}_L of the local covariance matrix contains the variances of the neighbors in the PCA subspace:
(4) \tilde{D}_L = \mathrm{diag}(E_G^T \Phi_L E_G) = \mathrm{diag}(\tilde{\Phi}_L).
Since the local covariance is composed from a small number of samples, some of the variances may be inappropriately small or may even cause a singular covariance. To avoid singularity, the variance matrix \Lambda_{QL} is taken as the elementwise maximum of the global eigenvalues (\Lambda_G) and the local variances in the PCA subspace (\tilde{D}_L):
(5) \Lambda_{QL} = \max(\Lambda_G, \tilde{D}_L).
In this way, if the local area of the pixel has a large variance in some bands, those bands are whitened by the local variance; bands whose local variances are too small (and could even cause a singular covariance matrix) use the global variance instead. The quasilocal covariance is then
(6) \Phi_{QL} = E_G \Lambda_{QL} E_G^T,
and in the PCA subspace it is simply
(7) \tilde{\Phi}_{QL} = \Lambda_{QL}.
If we calculate the RX score in the PCA subspace, we need fewer rotations: we rotate all the data into the PCA subspace once and then compute
(8) QLRX = (\tilde{x} - \tilde{m}_L)^T \Lambda_{QL}^{-1} (\tilde{x} - \tilde{m}_L) = \sum_{i=1}^{p} \frac{(\tilde{x}_i - \tilde{m}_{L,i})^2}{\lambda_{QL,i}},
where \tilde{m}_L is the mean of the selected surrounding pixels in the PCA subspace.
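The construction in (4)-(8) fits in a few lines of numpy. This is our own sketch (function name and toy data assumed), not the authors' code; it also demonstrates why the elementwise maximum in (5) keeps the score finite even when the local covariance is singular.

```python
import numpy as np

def qlrx_score(x, neighbors, E_g, lam_g):
    """QLRX score for one pixel, following (4)-(8).

    x: (p,) pixel; neighbors: (k, p) local background pixels;
    E_g: (p, p) global eigenvectors (columns); lam_g: (p,) global eigenvalues.
    """
    nb = neighbors @ E_g                 # neighbors rotated into PCA subspace
    m_l = nb.mean(axis=0)                # local mean in PCA subspace
    d_l = nb.var(axis=0)                 # local variances, the diagonal in (4)
    lam_ql = np.maximum(lam_g, d_l)      # quasilocal eigenvalues, (5)
    xt = x @ E_g                         # pixel in PCA subspace
    return np.sum((xt - m_l) ** 2 / lam_ql)   # (8)

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 6))
lam_g, E_g = np.linalg.eigh(np.cov(X, rowvar=False))
score = qlrx_score(X[0], X[1:9], E_g, lam_g)

# Even if all 8 neighbors are identical (zero local variance, a singular
# local covariance), the global eigenvalues keep the score finite:
score_degenerate = qlrx_score(X[0], np.tile(X[1], (8, 1)), E_g, lam_g)
```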

For subpixel targets, previous work shows that it is better to use the mean of the 8 nearest neighbors, \tilde{m}_8. This is valid assuming the target does not affect its surroundings; if we fear that the target has entered the surrounding pixels, we ignore those pixels and use only pixels external to them for our estimate.

In the second method, the sparse matrix transform (SMT), we use sparse matrix rotations to find the covariance matrix nearest to the original one that is still nonsingular. We can use SVD to decompose the local covariance as
(9) \Phi_L = E_L \Lambda_L E_L^T, \quad \Phi_L \in R^{p \times p}.
Based on the fact that every eigenvector matrix E (or any unitary matrix) can be written as a product of K sparse orthonormal rotation matrices, we can write
(10) E_L = \prod_{k=K-1}^{0} E_k = E_{K-1} E_{K-2} \cdots E_0.
Every rotation matrix E_k is a Givens rotation operating on a coordinate pair (i_k, j_k); the rotation acts in the plane spanned by axes i_k and j_k:
(11) E_k = \begin{pmatrix} 1 & & & & \\ & \cos\theta_k & \cdots & \sin\theta_k & \\ & \vdots & \ddots & \vdots & \\ & -\sin\theta_k & \cdots & \cos\theta_k & \\ & & & & 1 \end{pmatrix},
where the cosine and sine entries sit in rows and columns i_k and j_k. With K = \binom{p}{2} rotations we can get from the identity matrix to any rotation matrix.
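The claim behind (10), that a p x p rotation factors into p(p-1)/2 Givens rotations, can be checked numerically. The elimination order below (zeroing subdiagonal entries column by column) is one standard choice we assume for illustration; it is not prescribed by the paper.

```python
import numpy as np

def givens(p, i, j, theta):
    """p x p Givens rotation in the (i, j) plane, as in (11)."""
    G = np.eye(p)
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c; G[j, j] = c
    G[i, j] = s; G[j, i] = -s
    return G

# Factor a random orthogonal matrix with p(p-1)/2 Givens rotations by
# zeroing its subdiagonal entries column by column.
rng = np.random.default_rng(2)
p = 5
E, _ = np.linalg.qr(rng.normal(size=(p, p)))   # a random orthogonal matrix
A = E.copy()
count = 0
for j in range(p - 1):
    for i in range(j + 1, p):
        theta = np.arctan2(A[i, j], A[j, j])
        A = givens(p, j, i, theta) @ A         # zeroes A[i, j]
        count += 1
# A is now diagonal with +/-1 entries, so p(p-1)/2 rotations reproduce E.
```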

The concept of SMT is to start from the identity matrix, rotate every time two axes in the direction of the axis of the eigenvectors of the local covariance matrix, and to stop the rotations when it gives the best fit without becoming singular.

In other words, from a first set of data we can determine the correlations between the variables. If we did all the possible rotations, we would have diagonalized the matrix, but we cannot do this since we do not have enough data to simultaneously find all the local eigenvectors. Instead, we do these rotations on the most correlated ones, testing our new matrix by the degree that it provides good results on a second dataset. When our correction to the second dataset fails, we stop the diagonalizing procedure.

Mathematically, the rotation matrix T is the product of all the selected rotations:
(12) T = \prod_{k=K-1}^{0} E_k = E_{K-1} E_{K-2} \cdots E_0, \quad K < \binom{p}{2}.
The variances are the variances of the local covariance matrix in the directions given by T, \Lambda_{SMT} = \mathrm{diag}(T^T \Phi_L T), and the inverse covariance matrix is \Phi_{SMT}^{-1} = T \Lambda_{SMT}^{-1} T^T. To decide which rotation matrix is best, we use maximum-likelihood covariance estimation with "leave-third-out" cross-validation. (Note that leave-one-out cross-validation would give better results but would cost much more computational effort.)

We divide the group of pixels into three groups. We take one third as the test pixels, Y \in R^{(n/3) \times p}, and use the other two thirds to form the approximation \Phi_{SMT}. After every rotation we calculate the likelihood that the covariance \Phi_{SMT} correctly describes the group Y (the data are preprocessed so that Y is zero mean):
(13) P_{\Phi_{SMT}}(Y) = \frac{1}{(2\pi)^{p/2} |\Phi_{SMT}|^{1/2}} \exp\left\{ -\frac{1}{2} \mathrm{tr}\{ Y \Phi_{SMT}^{-1} Y^T \} \right\}.
We do this three times, each time holding out another third as the test data Y; after combining the results of the three tests, we find the number of rotations that gives the best result (the highest value of P(Y)); we then use the full set and this number of rotations to obtain the final approximation of the covariance matrix (see Figure 1).
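Evaluating (13) on a held-out third is usually done in log form for numerical stability; the sketch below is our own (function name and toy data assumed) and simply checks that the true covariance scores higher than a badly scaled one.

```python
import numpy as np

def gauss_loglik(Y, cov):
    """Average zero-mean Gaussian log-likelihood of the rows of Y under cov,
    i.e. the log of (13) normalized per test pixel."""
    p = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    quad = np.einsum('ij,jk,ik->i', Y, np.linalg.inv(cov), Y)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + quad).mean()

# Held-out third drawn from N(0, I): the true covariance should score
# higher than a badly scaled candidate.
rng = np.random.default_rng(5)
Y = rng.normal(size=(300, 4))
ll_true = gauss_loglik(Y, np.eye(4))
ll_bad = gauss_loglik(Y, 10 * np.eye(4))
```

In the cross-validation loop, this value would be computed after each rotation and the rotation count with the highest combined likelihood kept.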

The probability that \Phi_{SMT} correctly describes Y after k rotations; the k chosen is the one that gives the maximum probability.

To select at each step the rotation that makes the biggest improvement, we perform greedy minimization, that is, we always choose the next rotation that contributes most to reducing the correlation between the data along the axes of the matrix:
(14) (i_k, j_k) = \arg\max_{i,j} \frac{S_{ij}^2}{S_{ii} S_{jj}},
where S is the current covariance matrix, (i, j) are indices of two rows of the matrix, and S_{ij}, S_{ii}, S_{jj} are the entries of the matrix at those indices.
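One greedy step of (14), followed by the Givens rotation that decorrelates the chosen pair, can be sketched as follows. The rotation angle is the standard Jacobi choice that zeroes S[i, j]; all names and the toy data are our own assumptions, and we deliberately use fewer samples than bands so the starting covariance is singular.

```python
import numpy as np

def greedy_pair(S):
    """Coordinate pair maximizing S_ij^2 / (S_ii S_jj), as in (14)."""
    C = S ** 2 / np.outer(np.diag(S), np.diag(S))
    np.fill_diagonal(C, -np.inf)
    return np.unravel_index(np.argmax(C), C.shape)

def smt_step(S, T):
    """One greedy SMT rotation: a Givens rotation chosen to zero S[i, j]."""
    i, j = greedy_pair(S)
    theta = 0.5 * np.arctan2(2 * S[i, j], S[i, i] - S[j, j])
    G = np.eye(S.shape[0])
    c, s = np.cos(theta), np.sin(theta)
    G[i, i] = c; G[j, j] = c
    G[i, j] = -s; G[j, i] = s
    return G.T @ S @ G, T @ G

def offdiag_energy(S):
    return np.sum((S - np.diag(np.diag(S))) ** 2)

# Only 4 samples in 6 bands, so the sample covariance is singular; the
# rotations still apply and steadily decorrelate the strongest pairs.
rng = np.random.default_rng(3)
A = rng.normal(size=(4, 6))
S = A.T @ A / 4
T = np.eye(6)
before = offdiag_energy(S)
for _ in range(5):
    S, T = smt_step(S, T)
after = offdiag_energy(S)
```

Each step reduces the off-diagonal energy by twice the squared entry it zeroes, and T stays orthogonal throughout.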

After we calculate the covariance matrix for SMT, we can use it for anomaly detection:
(15) RX_{SMT} = (x - m_L)^T \Phi_{SMT}^{-1} (x - m_L),
where x is the tested pixel and m_L is the mean of the selected surrounding pixels.

For subpixel targets, as we stated previously, it is better to use the mean of the 8 nearest neighbors, m_8.

3. Dataset

Two datasets were used (Figure 2); a description of their origin can be found in Table 1 and in greater detail in the cited literature.

Datasets information—OBP1 and OBP2 are parts of OBP.

Name Site Sensor name No. bands Waveband (μm) Spat. Res. (m) Scene description
OBP Oberpfaffenhofen (Ge) Hymap 126 0.44–2.45 4 Airfield with agricultural area around
OBP1 Oberpfaffenhofen (Ge) Hymap 126 0.44–2.45 4 Agricultural area
OBP2 Oberpfaffenhofen (Ge) Hymap 126 0.44–2.45 4 Agricultural area

RGB composite of the original data cubes. From left to right: OBP1, OBP2.

The two data cubes (OBP1 and OBP2) are real data from the Hymap sensor in which anomalies were inserted artificially by linearly mixing the spectra of a green paint pixel with the original background pixel. For display purposes, in Figure 2, images with full-pixel paint spectra are shown. For the evaluation of anomaly detection results, images with a mixing ratio of 0.33 (P=33%) were used.

4. Results

We now wish to compare the SMT and QLRX algorithms. We will perform RX anomaly detection (2) using the covariance matrices given by each of the algorithms.

Since the dataset being used contains implanted subpixel targets without any danger of overlap into neighboring pixels, the mean in the calculation of (2) was always the mean of the eight nearest neighbors. However, we must determine which neighborhood gives the best results for the SMT covariance calculation.

The first test was done using only the nearest 8 neighbors for the approximation of the covariance; in this test, it is very easy to see that the QLRX results are superior to the SMT results. The ROC curves are given in Figure 3. We assume that the neighborhood of any examined pixel consists of a square region of dimension OWS x OWS (outer window). The target area itself occupies GWS x GWS pixels (guard window); for subpixel targets GWS equals 1. The neighboring pixels are thus those pixels located inside the OWS x OWS square but not contained in the inner GWS x GWS guard window.
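The OWS/GWS geometry can be expressed as a set of row/column offsets around the examined pixel; the helper below is our own sketch, assuming odd window sizes centered on the pixel.

```python
def neighbor_offsets(ows, gws):
    """Row/column offsets of background pixels: inside the OWS x OWS outer
    window but outside the GWS x GWS guard window (odd sizes assumed)."""
    r_o, r_g = ows // 2, gws // 2
    return [(di, dj)
            for di in range(-r_o, r_o + 1)
            for dj in range(-r_o, r_o + 1)
            if max(abs(di), abs(dj)) > r_g]

offs8 = neighbor_offsets(3, 1)    # the 8 nearest neighbors
offs80 = neighbor_offsets(9, 1)   # 80 background pixels for OWS = 9
```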

Results of the RX algorithm using QLRX and SMT on dataset OBP1, with OWS = 3 and GWS = 1.

OBP1 results with OWS given by the number in the legend.

This figure shows the result for P = 33%, but the tests for 10, 25, 50, and 100 percent gave similar results.

Results from the OBP2 dataset were comparable.

When the OWS is larger, the results of the SMT improve dramatically.

For the dataset OBP1 we can see a large improvement as OWS increases. For this dataset QLRX gives better results, especially in the low CFAR (constant false alarm rate).

For the dataset OBP2, the differences between QLRX and SMT are reduced but still QLRX performs better in the low CFAR regime (see Figure 5).

Similar to Figure 4 for the OBP2 dataset.

Similar results were obtained for the cases in which 10, 25, 50, or 100 percent of the target is in the pixel.

The SMT has two main difficulties. First, the algorithm calculates a new covariance matrix at every point. This calculation needs a sequential set of rotations based on the training set, followed by evaluations on the test set. Both sets are taken only from the pixel's surroundings, so no information outside the selected group is used in the calculation. In QLRX, the eigenvectors are the same for all points (the eigenvectors of the global covariance); all we need to do is measure the variance of the local area in the directions of the global eigenvectors and calculate the new covariance matrix. Second, the calculation of the SMT itself is highly dependent on the size of the "local" area: while a larger area improves the results, it also increases the calculation time (see Table 2).

This table shows the time required to complete the calculation over each dataset.

Name OWS GWS QLRX time (s) SMT time (s) Time ratio
“OP1_T1_S10” 9 1 26 3845 148
“OP1_T1_S33” 9 1 26 3794 146
“OP1_T1_S100” 9 1 25 3820 153

“OP1_T1_S10” 7 1 25 2781 111
“OP1_T1_S33” 7 1 24 2769 115
“OP1_T1_S100” 7 1 24 2770 115

“OP1_T1_S10” 5 1 23 1965 85
“OP1_T1_S33” 5 1 23 1969 86
“OP1_T1_S100” 5 1 23 1965 85

“OP1_T1_S10” 3 1 23 1195 52
“OP1_T1_S33” 3 1 22 1187 54
“OP1_T1_S100” 3 1 23 1201 52

“OP2_T1_S10” 9 1 14 2135 153
“OP2_T1_S33” 9 1 14 2105 150
“OP2_T1_S100” 9 1 14 2095 150

“OP2_T1_S10” 7 1 14 1480 106
“OP2_T1_S33” 7 1 14 1481 106
“OP2_T1_S100” 7 1 14 1464 105

“OP2_T1_S10” 5 1 13 961 74
“OP2_T1_S33” 5 1 14 962 69
“OP2_T1_S100” 5 1 13 960 74

“OP2_T1_S10” 3 1 12 556 46
“OP2_T1_S33” 3 1 13 555 43
“OP2_T1_S100” 3 1 14 552 39
5. Improvements for SMT

A small change in the published method for doing SMT could lead to a large improvement.

In the original algorithm, the initial assumed axes of the covariance matrix are in the direction of the original dataset. Then the axes are rotated in pairs into the directions of the “local eigenvectors” to create new covariance matrices.

When we stop, some of the axes will be almost in the same direction as the local covariance eigenvectors, and some will be closer to the direction of the original axes.

Now, since the original directions were arbitrary, that is, not related to the correlations between the axes, it is easy to see that there is no reason this starting point should be optimal. In particular, would it not make more sense to start from the global eigenvectors and rotate into the local ones? Another benefit of this approach is that we start the rotations from a condition that is most probably closer to the optimum point (see Figure 6), so we reach the maximum likelihood in fewer rotations. We will call this new algorithm SMT-PCA.
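The change amounts to a single extra rotation of the data before the SMT loop. A sketch with assumed toy data and names: rotating the local pixels into the global PCA basis is a similarity transform of the local covariance, so nothing is lost, and the remaining SMT rotations only have to bridge the (usually small) gap between the global and local eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(4)
# Globally correlated data: 1000 pixels, 12 bands.
X = rng.normal(size=(1000, 12)) @ rng.normal(size=(12, 12))
phi_g = np.cov(X, rowvar=False)
_, E_g = np.linalg.eigh(phi_g)

local = X[:8]                    # a small neighborhood (fewer pixels than bands)
local_pca = local @ E_g          # one rotation into the global PCA subspace
S0 = np.cov(local, rowvar=False)       # plain SMT would start from here...
S1 = np.cov(local_pca, rowvar=False)   # ...SMT-PCA starts from here
# S1 = E_g^T S0 E_g: a similarity transform, so the eigenvalues are
# unchanged; only the starting orientation of the axes differs.
```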

When starting from the PCA subspace, we start from a point closer to the maximum, so fewer rotations are needed; the difference in k between the original starting point and the new one corresponds to the rotations performed by the transformation to the PCA subspace.

For the OBP1 dataset (Figure 7), the results after starting from the subspace based on the global eigenvectors are better than QLRX when OWS is large (7, 9). SMT-PCA gives better results than SMT for any OWS.

OBP1 results with OWS given by the number in the legend.

For the OBP2 dataset (Figure 8), the results after starting from the subspace based on the global eigenvectors are better than QLRX when OWS is large (7, 9). SMT-PCA gives better results than SMT for any OWS (for OWS = 3, SMT and SMT-PCA are almost the same).

Similar to Figure 7 for the OBP2 dataset.

Table 3 compares the number of rotations needed by the original SMT and by SMT-PCA.

This table shows the number of rotations needed to complete the calculation over each dataset.

Name OWS GWS Original SMT rotations SMT-PCA rotations Rotation-count ratio
“OP1_T1_S10” 9 1 3845 1851 2.1
“OP1_T1_S33” 9 1 3794 1831 2.1
“OP1_T1_S100” 9 1 3820 1806 2.1

“OP1_T1_S10” 7 1 2781 1508 1.8
“OP1_T1_S33” 7 1 2769 1498 1.8
“OP1_T1_S100” 7 1 2770 1457 1.9

“OP1_T1_S10” 5 1 1965 1177 1.7
“OP1_T1_S33” 5 1 1969 1203 1.6
“OP1_T1_S100” 5 1 1965 1122 1.8

“OP1_T1_S10” 3 1 1195 442 2.7
“OP1_T1_S33” 3 1 1187 407 2.9
“OP1_T1_S100” 3 1 1201 421 2.9

“OP2_T1_S10” 9 1 2135 839 2.5
“OP2_T1_S33” 9 1 2105 840 2.5
“OP2_T1_S100” 9 1 2095 827 2.5

“OP2_T1_S10” 7 1 1480 647 2.3
“OP2_T1_S33” 7 1 1481 641 2.3
“OP2_T1_S100” 7 1 1464 635 2.3

“OP2_T1_S10” 5 1 961 440 2.2
“OP2_T1_S33” 5 1 962 435 2.2
“OP2_T1_S100” 5 1 960 431 2.2

“OP2_T1_S10” 3 1 556 203 2.7
“OP2_T1_S33” 3 1 555 205 2.7
“OP2_T1_S100” 3 1 552 206 2.7

The smaller the number of rotations, the less time needed for the calculation.

6. Conclusions

As a preliminary to our conclusions, please note that when we discuss using a small or large number of pixels, in all cases the number of pixels used is less than the number of spectral bands.

Two methods in this paper have been considered for dealing with possibly singular covariance matrices. In the first (QLRX), we use global eigenvectors and local eigenvalues as an approximation of the inverse covariance matrix. In the second (SMT), we use an iterative process to slowly “twist” our axes to come closer to those determined by the data.

In our two datasets, we found that if a small area was used for estimating the background, the QLRX algorithm was superior. For large areas of background, QLRX remains superior, although SMT improves greatly. Writing the ratio of the number of pixels to the number of bands as
(16) r = \frac{\text{Number of pixels}}{\text{Number of bands}} \in (0, 1),
our findings are the following:

the calculation time of QLRX is much smaller (two orders of magnitude) than that of both SMT and SMT-PCA;

the calculation time of SMT-PCA is less than that of the original SMT by about a factor of two;

SMT-PCA and QLRX both perform better than SMT for any number of pixels;

for a small number of pixels (r of about 0.1 or less), QLRX performs better than SMT-PCA;

for a large number of pixels (r of about 0.2 or more), SMT-PCA performs better than QLRX.

Acknowledgments

The authors gratefully recognize the partial support for this work from the Paul Ivanier Center for Robotics Research and Production Management, Beer-Sheva, Israel. The test images are part of the Oberpfaffenhofen HyMAP scene, collected in 2004, during an airborne campaign sponsored by the Belgian Science Policy Office (BelSPO). Flights were operated by the German Aerospace Center (DLR).

References

Caefer C. E., Stefanou M. S., Nielsen E. D., Rizzuto A. P., Raviv O., and Rotman S. R., "Analysis of false alarm distributions in the development and evaluation of hyperspectral point target detection algorithms," Optical Engineering, vol. 46, no. 7, 076402, 2007. doi:10.1117/1.2759894

Caefer C. E., Silverman J., Orthal O., Antonelli D., Sharoni Y., and Rotman S. R., "Improved covariance matrices for point target detection in hyperspectral data," Optical Engineering, vol. 47, no. 7, 076402, 2008. doi:10.1117/1.2965814

Theiler J., Cao G., Bachega L. R., and Bouman C. A., "Sparse matrix transform for hyperspectral image processing," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 3, pp. 424-437, 2011. doi:10.1109/JSTSP.2010.2103924

Borghys D. and Perneel C., "Study of the influence of pre-processing on local statistics-based anomaly detector results," in Proceedings of the Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS '10), pp. 1-4, June 2010. doi:10.1109/WHISPERS.2010.5594922