JECE Journal of Electrical and Computer Engineering 2090-0155 2090-0147 Hindawi Publishing Corporation 818175 10.1155/2012/818175 818175 Research Article Non-Gaussian Linear Mixing Models for Hyperspectral Images Bajorski Peter Gurram Prudhvi 1 Graduate Statistics Department and Center for Imaging Science Rochester Institute of Technology Rochester NY USA rit.edu 2012 4 10 2012 2012 17 06 2012 10 09 2012 2012 Copyright © 2012 Peter Bajorski. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Modeling of hyperspectral data with non-Gaussian distributions is gaining popularity in recent years. Such modeling mostly concentrates on attempts to describe a distribution, or its tails, of all image spectra. In this paper, we recognize that the presence of major materials in the image scene is likely to exhibit nonrandomness and only the remaining variability due to noise, or other factors, would exhibit random behavior. Hence, we assume a linear mixing model with a structured background, and we investigate various distributional models for the error term in that model. We propose one model based on the multivariate t-distribution and another one based on independent components following an exponential power distribution. The former model does not perform well in the context of the two images investigated in this paper, one AVIRIS and one HyMap image. On the other hand, the latter model works reasonably well with the AVIRIS image and very well with the HyMap image. This paper provides the tools that researchers can use for verifying a given model to be used with a given image.

1. Introduction

The following linear mixing model (with a structured background) is often used in hyperspectral imaging literature : (1)r=B·α+ε, where r is a p-dimensional vector (e.g., of reflectance or radiance,) of a pixel spectrum, B is a fixed matrix of spectra of m materials present in the image (as columns bj,j=1,,m), and α is an unknown vector of material abundances. The error term ε is often assumed to follow the multivariate normal (Gaussian) distribution N(0,σ2I) or some other distributional assumptions can be made here.

In this paper, we want to address the question whether model (1) is realistic for hyperspectral images and what type of a distribution should be used for the error term ε. In previous research , we provided a preliminary investigation of the marginal distributions of ε to be modeled by the exponential power distribution. We expand this research here to model the multivariate structure of ε and to propose some other models such as the multivariate t-distribution.

The multivariate t-distribution is a popular distribution for modeling hyperspectral data (see ). In the current literature, this distribution is mostly used for modeling the variability of background materials in a purely stochastic model without a deterministic part such as the one defined in model (1). Here we try to use that distribution for modeling the error term ε, while the major part of the image variability is explained by the deterministic part B·α of the model.

In Section 2, we introduce our notation and show how the deterministic and the stochastic parts of the model are constructed based on the singular value decomposition (SVD). We also provide details on an AVIRIS hyperspectral image used for the numerical results performed in this paper. In Section 3, we explore potential models for the marginal distributions of the error term. In Section 4, we explore potential models for the joint multivariate distributions of the error term. The AVIRIS image is used for numerical examples in Sections 3 and 4. In Section 5, we provide an additional example using a subset of the HyMap Cooke City image. Conclusions are formulated in Section 6.

2. Preliminary Considerations

In this paper, we are going to assume that the abundances of all materials sum up to 1, that is, (2)j=1mαj=1, where αi are coordinates of the vector α. In practice, this assumption may not necessarily be strictly fulfilled due to imperfections in estimation of background signatures and in the linearity of the spectral mixing process. However, it is reasonable to assume that (2) is fulfilled at least approximately. We do not specifically address another reasonable assumptions that 0αj1, but an appropriate choice of the spectral signatures in B should result in positive (or almost positive) αj.

We also assume that no a priori information is available about the spectral signatures in B, that is, we are trying to “guess” all terms on the right-hand side of (1). Let us now assume that we have a hyperspectral image with n pixels, and the model (1) takes the form (3)ri=B·αi+εi, for i=1,,n. Denote the average of all pixel spectra by r-=i=1nri. Let B* be a matrix of vectors (bj-r-),j=1,,m as columns. Because of the property (2), the model (3) is equivalent to the following model centered at r-: (4)ri-r-=B*·αi+εi.

Let us denote by X an n by p matrix of vectors (ri-r-),i=1,,n and write its SVD as (5)X=UDVT, where U is an n by p matrix (with columns uj,j=1,,p), D is a diagonal p by p matrix of singular values dj,j=1,,p, and V is an orthogonal p by p matrix (with columns vj,j=1,,p). We assume that d1d2dp0. The SVD in (5) can also be written as (6)X=j=1pdjujvjT.

In order to build a model of the form (4), we now want to identify m out of the total of p basis vectors vj such that j=1mdjujvjT represents the deterministic part B*·αi of the model (4). This deterministic part will be selected based on the idea that the major materials present in the image exhibit nonrandom behavior, while the remaining variability due to noise, or due to some undetected small amounts of materials, or due to other imperfections in the model, should exhibit more random behavior. We want the remaining sum j=m+1pdjujvjT to represent realizations of the error term ε. We will investigate that sum in the basis system defined by the vectors vj, that is, realizations of a random vector that we denote by Ym+1=(Ym+1,,Yp). Note that each n dimensional vector djuj,  j=(m+1),,p represents a sample from the distribution of Yj. We want to study those samples so that an appropriate distributional assumption about the error terms can be made. We can define the standardized form Zj=Yj/St.Dev.(Yj) with the sample of realizations given by n-1  uj. The vector Zm+1=(Zm+1,,Zp) represents the error term ε in the sphered, or whitened, coordinates. The marginal (uncorrelated) distributions of Zm+1 will be studied in the next section followed by modeling the joint distribution in Section 4.

Numerical results in Sections 3 and 4 use a 100 by 100 pixel (so n=10,000) AVIRIS urban image in Rochester, NY, USA, near the Lake Ontario shoreline (see Figure 1). The scene has a wide range of natural and man-made clutter including a mixture of commercial/warehouse and residential neighborhoods to add a wide range of spectral diversity. Prior to processing, invalid bands, due to atmospheric water absorption, were removed reducing the overall dimensionality to p=152 spectral bands. This image was used earlier in [9, 10].

Color rendering of the cluttered AVIRIS urban scene in Rochester, NY, USA, used in this paper.

3. Modeling Marginal Distributions

Here we investigate two families of distributions as models for the distributions of Zj's. The first one is the exponential power distribution (also called general error or general Gaussian distribution) with the location parameter μ, the scale parameter β>0, and the shape parameter λ>0. Its density is defined by (7)f(x)=λ2βΓ(1/λ)exp[-(|x-μ|β)λ]forx, where Γ(s)=0ts-1e-tdtfors>0 is the gamma function. This is a flexible family of distributions. Its special cases are the Gaussian distribution (λ=2) and the Laplace distribution (λ=1). We assume that μ=0 since the distribution of Zj is already centered. For each Zj,  j=1,,p, the remaining parameters, λ and β, were estimated using the maximum likelihood principle. The fit of data to the resulting exponential power distribution was then checked with the chi-square test based on the statistic (8)χ2=i=1k(Ni-n·pi)2n·pi, where Ni is the count of observations in the ith interval (i=1,,k=25), and pi is the interval probability based on the testing distribution. The intervals were chosen so that pi=1/k,  i=1,,k. The tested distribution should be rejected when χ2χk-1-m2(α/p), where χk-1-m2(α) is the upper 100·αth percentile from the chi-squared distribution with (k-1-m) degrees of freedom, and m is the number of estimated parameters. The value α/p is chosen based on the Bonferroni principle since p inferences are performed here. Figure 2 shows the base 10log values of the chi-square statistic plotted versus the dimension number j. The horizontal line is at the threshold level of χ25-1-22(0.01/152)=56.79 (1.75 on the log scale). The points above the threshold represent the first 19 dimensions and then 22, 23, 25, 28, and 67. Since not all dimension numbers are consecutive, we are faced with a difficulty of choosing a suitable value for m. We propose two potential strategies.

Select as m the largest dimension number (here 28) that is not an “outlier” (such as dimension 67 here). With this approach, we would accept the imperfect modeling along the 67th dimension, and some of the dimensions (20, 21, 24, 26, and 27) would be represented in the deterministic part of the model, even though they could be modeled stochastically by the exponential power distribution. This approach is consistent with the principle of keeping the dimensions with the highest explained variability.

Select all dimensions above the threshold for the deterministic part of the model (possibly including dimension 67) and use the remaining dimensions for the error term. From the point of view of notation in Section 2, we would reorder the dimensions, so that the “non-random” dimensions 22, 23, 25, 28, and 67 would be numbered from 20 to 24, and m would be chosen as 24.

Base 10log values of the chi-square statistic (for testing the fit to the exponential power distribution) plotted versus the dimension number j (for the AVIRIS image).

Figure 3 shows the base 10log values of the chi-square statistic plotted versus λ. For the values below the threshold, where the exponential power distribution model would be used, the λ values range from 1.37 to 1.91. Hence, the distributions have tails heavier than the Gaussian distribution, but not as heavy as those of the Laplace distribution.

Base 10log values of the chi-square statistic (for testing the fit to the exponential power distribution) plotted versus the shape parameter λ (for the AVIRIS image).

The second potential family of distributions for modeling the distributions of Zj's is the t-distribution with ν degrees of freedom given by the density (9)f(x)=Γ((ν+1)/2)πνΓ(ν/2)(1+x2ν)-(ν+1)/2forx.

Since Zj is scaled to have variance 1, we also need to scale the t-distribution. Hence, we fit a distribution with the density g(z)=γf(γz), where f is defined in (9) and γ=ν/(ν-2). We assume that ν is larger than 2 but is not necessarily an integer. The scaling depends on the unknown parameter ν, so it needs to be taken into account in the maximum likelihood estimation of ν. The fit of data to the scaled t-distribution was again checked with the chi-square test. Figure 4 is analogous to Figure 2, but here the chi-square statistic measures the fit to the scaled t-distribution. The horizontal line is at the threshold level of χ25-1-12(0.01/152)=58.36 (1.77 on the log scale). The points above the threshold represent the first 10 dimensions and then 12, 14, 17, 19, and 23. Here again, we can use one of the proposed two strategies for identifying the dimensionality m of the deterministic part of the model. We can see that the scaled t-distribution gives a significantly better fit than the exponential power distribution.

Base 10log values of the chi-square statistic (for testing the fit to the scaled t-distribution) plotted versus the dimension number j (for the AVIRIS image).

Figure 5 shows the base 10log values of the chi-square statistic plotted versus the number of degrees of freedom ν. (The parameter ν had a very large value for the first dimension (Z1 with the highest chi-squared value), and it is not shown in the graph for clarity of presentation.) For the values below the threshold, where the scaled t-distribution model would be used, the ν values range from 4.3 to 43.3. Hence again, the distributions have tails heavier than the Gaussian distribution, but not heavier than those of the scaled t-distribution with 4 degrees of freedom.

Base 10log values of the chi-square statistic (for testing the fit to the scaled t-distribution) plotted versus the number of degrees of freedom ν (for the AVIRIS image).

4. Modeling Joint Multivariate Distributions

The model fitting discussed in the previous section accounted only for the marginal distributions of Zj's. A more challenging task is to ensure a fit of the joint multivariate distribution of data to a suitable model. In the whitened version, the components of Zr=(Zr,,Zp) are uncorrelated, but they might be either stochastically independent or dependent, which depends on a specific assumed model. Again, we will use two competing models. The first one is based on the assumption of independent Zj components following an exponential power distribution. In order to verify this model, we investigate the joint distribution of Zr=(Zr,,Zp) for r=1,,p. More specifically, we investigate the fit of the theoretical cumulative distribution function (CDF) Gr(x)=P{Zrx,Zr+1x,,Zpx} to its empirical equivalent. Note that Gr(x) can be regarded as a univariate simplification of the multivariate CDF of Zr with all coordinates being equal. Based on the assumption of independence, Gr(x)=F(x)p-r+1, where F(x) is the CDF of the exponential power distribution defined in (7). The empirical equivalent of Gr(x) is the fraction of observations (realizations) such that Zjx,  j=r,,p, or equivalently maxrjpZjx. The CDF Gr(x) depends on the parameters β>0 and λ>0 that were estimated using the maximum likelihood principle. The chi-square test based on the statistic (8) was then used to assess the fit of the observations of maxrjpZj to Gr(x). The resulting base 10log values of the chi-squared statistic are shown in Figure 6. As before, the horizontal line is at the threshold level of χ25-1-22(0.01/152)=56.79 (1.75 on the log scale). The points below the threshold represent the dimensions 135 and 136, and then 138 through 152 (=p). The value for G137(x) is only slightly above the threshold. We can say that the model of independent components with an exponential power distribution is quite successful in modeling the multivariate distribution of the “last” 18 dimensions from 135 until the last one (152). However, the model is not very successful in modeling further dimensions, where we apparently observe significant dependencies among Zj’s. This is consistent with low-dimensional components of Zr being independent or having very weak dependence structure that does not show up as significant. That dependence structure gets stronger with higher dimensions.

Base 10 log values of the chi-square statistic (for testing the multivariate fit to Gr(x)) plotted versus r (for the AVIRIS image).

Figure 7 shows the base 10log values of the chi-square statistic plotted versus λ. For the values below the threshold, where a model with independent exponential power distributions would be used, the λ values range from 1.48 to 1.77. Hence again, the distributions have tails heavier than the Gaussian distribution, but not as heavy as those of the Laplace distribution.

Base 10log values of the chi-square statistic (for testing the multivariate fit to Gr(x)) plotted versus the shape parameter λ (for the AVIRIS image).

The second multivariate model that we want to discuss is based on the standard d-dimensional multivariate t-distribution with the density function (10)f(x)=Γ((ν+d)/2)(πν)d/2Γ(ν/2)(1+xTxν)-(ν+d)/2forxd.

The variance-covariance matrix of this distribution is equal to γ2·Id, where Id is a d-dimensional identity matrix and γ=ν/(ν-2). Since we deal here with sphered data modeled by Zr=(Zr,,Zp) (with Var(Zr)=Id and d=p-r+1), we want γZr to follow the standard multivariate t-distribution. Hence, an appropriate candidate distribution for Zr is a scaled multivariate t-distribution with the density function given by g(z)=γdf(γz), where f is defined in (10). We can write (11)g(z)=Γ((ν+d)/2)[π(ν-2)]d/2Γ(ν/2)(1+zTzν-2)-(ν+d)/2forzd.

This distribution is spherically contoured in the sphered coordinates and elliptically contoured in the original coordinates. All marginal distributions of the scaled multivariate t-distribution are the scaled t-distributions discussed in Section 3 that were already confirmed as reasonable models for the AVIRIS data. Here, we want to verify if the multivariate t-distribution provides an adequate multivariate structure for those data.

If γZr follows the standard multivariate t-distribution, then γZr2/d follows the F-distribution with d and ν degrees of freedom (see ). Hence, in order to check the assumption of the multivariate t-distribution, we verify if Zr2/d follows the F-distribution scaled by 1/γ2. As before, the degrees of freedom parameter ν is estimated based on the maximum likelihood principle, and the fit is assessed based on the chi-square statistic. Figure 8 shows the base 10log values of the chi-square statistic plotted versus r. The horizontal line is at the threshold level of χ25-1-12(0.01/152)=58.36 (1.77 on the log scale). We can see only one value below the threshold, suggesting the scaled F-distribution is suitable only for Z1522, which is consistent with the marginal scaled t-distribution for Z152 (discussed in Section 3) since the vector Z152 is one-dimensional. All remaining Zr,  r<p=152, are apparently not modeled well by the scaled multivariate t-distribution. We note that the components of the multivariate t-distribution are not independent, and the dependency structure proposed by this distribution is apparently not consistent with the AVIRIS hyperspectral data investigated here.

Base 10log values of the chi-square statistic (for testing the multivariate t-distribution) plotted versus r (for the AVIRIS image).

5. Cooke City Image

In the two previous sections, we used an AVIRIS image as an example to demonstrate how to fit a linear mixing model with the ε term being potentially non-Gaussian. Specific numerical results might be different, of course, for other images. Hence, it is interesting to perform the same calculations on another image. The AVIRIS image has a wide range of spectral diversity due to the presence of various natural and man-made materials. Hence, it would be interesting to try a more homogenous dataset with less variety. One way to do this could be to classify an image into various types of cover material, and then use spectra from one class as our dataset. A disadvantage of such an approach is the possibility of removing tails of distributions, which might have a tendency of being assigned to a different class. Hence, we chose a relatively uniform subimage of forest in the HyMap Cooke City (see ) image as marked by a red rectangle shown in Figure 9. Four spectral bands were also removed due to some suspicious values. The spectral dimensionality of the dataset used is then p=122, and the spatial dimensionality is 50 by 300 pixels for a total of n=15,000 pixels.

Color rendering of the 280 by 800 pixel HyMap Cooke City image (see  for details about the image).

In order to investigate the marginal distributions, we now follow the ideas and notation of Section 3. The fit with the exponential power distribution is verified with Figure 10, which is analogous to Figure 2. Using Strategy 1 explained in Section 3, we would identify 26 as the dimensionality of the deterministic part of the model.

Base 10log values of the chi-square statistic (for testing the fit to the exponential power distribution) plotted versus the dimension number j (for the Cooke City image).

Figure 11 (analogous to Figure 3) shows the base 10log values of the chi-square statistic plotted versus λ. The highest value of λ is almost perfectly equal to 2, suggesting the Gaussian distribution. Most of the λ values are around 1.8 and higher, suggesting slightly lighter tails than those for the AVIRIS image (compare with Figure 3).

Base 10 log values of the chi-square statistic (for testing the fit to the exponential power distribution) plotted versus the shape parameter λ (for the Cooke City image).

Figure 12 (analogous to Figure 4), shows a slightly better fit of the scaled t-distribution than that of the exponential power distribution. However, the resulting dimensionality would be identified as the same as before (26).

Base 10 log values of the chi-square statistic (for testing the fit to the scaled t-distribution) plotted versus the dimension number j (for the Cooke City image).

Figure 13 (analogous to Figure 5) shows the base 10log values of the chi-square statistic plotted versus the number of degrees of freedom ν. The high values of ν again point to more Gaussian-like marginal distributions than those of the AVIRIS image.

Base 10log values of the chi-square statistic (for testing the fit to the scaled t-distribution) plotted versus the number of degrees of freedom ν (for the Cooke City image).

So far, we discussed only the marginal distributions of Zj's as a precursor to checking the model fit. We now want to consider a more challenging task to ensure a fit of the joint multivariate distribution of data to a suitable model as discussed in Section 4. Figure 14 (analogous to Figure 6) shows the base 10log values of the chi-square statistic (for testing the multivariate fit to Gr(x) representing independent exponential power distributions) plotted versus r. This figure is so strikingly different from the analogous Figure 6, that the author double checked the code (in fact the same code was used in both cases). Note that even for r as small as 1 (which represents all dimensions from 1 until p=122), we obtain an acceptable fit with independent exponential power distributions. The only unacceptable fit is for k=78, which points to some minor issues with the model.

Base 10log values of the chi-square statistic (for testing the multivariate fit to Gr(x)) plotted versus r (for the Cooke City image).

Regarding the final conclusion about the model fit, we need to keep in mind that this is just one test of multivariate fit, and it needs to be considered together with the results shown in Figure 12 telling us that some of the marginal distributions do not have the satisfactory fit. Hence, we can accept the previous conclusion of the dimensionality of the deterministic part of the model as 26, and the remaining dimensions are remarkably well modeled by independent exponential power distributions. This successful modeling might be largely due to this dataset being a fairly homogenous set of spectra (forest area in the Cooke City image).

Since almost all chi-square values in Figure 14 are not significant, it would not be interesting to show an analog of Figure 7. Hence, in Figure 15, we created a plot of the shape parameter λ values (from the multivariate fit to Gr(x)) versus r. We can see that for large r, the fit is fairly close to the Gaussian distribution (λ close to 2). Then for smaller r, λ's are getting smaller, which represents much heavier tails of the distribution, almost up to the Laplace distribution (λ close to 1).

The shape parameter λ (from the multivariate fit to Gr(x)) plotted versus r (for the Cooke City image).

We have also checked the fit to the multivariate t-distribution as shown in Figure 16, which is an analog of Figure 8. We can observe the only good fit at r=p=122 (which is really a univariate fit of the last dimension), which means that none of the multivariate t-distributions fits well to the data.

Base 10log values of the chi-square statistic (for testing the multivariate t-distribution) plotted versus r (for the Cooke City image).

6. Conclusions

In this paper, we investigated various distributional models for the error term in the linear mixing model with a structured background. The models were tested on two datasets. One was an AVIRIS image and the other one was a subimage of a forest area in a HyMap Cooke City image. The first proposed model was based on independent components following an exponential power distribution. The model fitted reasonably well to both datasets in terms of modeling marginal distributions. For the AVIRIS image, the fit of the joint distribution worked quite well for a small number of components, but not for a larger number. For the forest area in the Cooke City image, the fit with the joint distribution of independent exponential power distributions worked very well. This successful modeling might be largely due to this dataset being a fairly homogenous set of spectra.

The second model was based on the multivariate t-distribution. It performed well in terms of the resulting marginal distributions in both datasets, but the dependency structure imposed by this distribution was entirely inconsistent with both datasets. More research is needed to investigate the two models on other hyperspectral images. However, the multivariate t-distribution model does not look promising at this point, while the exponential power distribution model seems to have more potential.

Manolakis D. G. Shaw G. Detection algorithms for hyperspectral imaging applications IEEE Signal Processing Magazine 2002 19 1 29 43 2-s2.0-0036129315 10.1109/79.974724 Scharf L. L. Friedlander B. Matched subspace detectors IEEE Transactions on Signal Processing 1994 42 8 2146 2156 2-s2.0-0028485933 10.1109/78.301849 Johnson S. The relationship between the matched-filter operator and the target signature space-orthogonal projection classifier IEEE Transactions on Geoscience and Remote Sensing 2000 38 1 283 286 2-s2.0-0033873529 Bajorski P. Analytical comparison of the matched filter and orthogonal subspace projection detectors for hyperspectral images IEEE Transactions on Geoscience and Remote Sensing 2007 45 7, part 2 2394 2402 2-s2.0-34347209724 10.1109/TGRS.2007.896544 Bajorski P. Shen S. S. Lewi P. E. A family of distributions for the error term in linear mixing models for hyperspectral images 7086 Imaging Spectrometry XIII August 2008 Proceedings of SPIE 2-s2.0-52349102560 10.1117/12.793380 Marden D. B. Manolakis D. Shen S. S. Lewi P. E. Using elliptically contoured distributions to model hyperspectral imaging data and generate statistically similar synthetic data 5425 Algorithms and Technologies for MultiSpectral, Hyperspectral, and Ultraspectral Imagery X April 2004 558 572 Proceedings of SPIE 2-s2.0-10444280883 10.1117/12.542077 Manolakis D. Rossacci M. Cipar J. Lockwood R. Cooley T. Jacobson J. Shen S. S. Lewi P. E. Statistical characterization of natural hyperspectral backgrounds using t-elliptically contoured distributions 5806 Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XI April 2005 56 65 Proceedings of SPIE 2-s2.0-27544479599 10.1117/12.602315 Theiler J. Scovel C. Bouman C. A. Miller E. L. Pollak I. Uncorrelated versus independent Elliptically-contoured distributions for anomalous change detection in hyperspectral imagery 7246 Computational Imaging VII January 2009 72460T Proceedings of SPIE-IS & T Electronic Imaging 2-s2.0-65649115481 10.1117/12.814325 Bajorski P. Generalized detection fusion for hyperspectral images IEEE Transactions on Geoscience and Remote Sensing 2012 50 4 1199 11205 10.1109/TGRS.2011.2166160 Bajorski P. Statistics For Imaging, Optics, and Photonics 2011 New York, NY, USA John Wiley & Sons Anderson T. W. An Introduction to Multivariate Statistical Analysis 2003 3rd New York, NY, USA John Wiley & Sons Snyder D. Kerekes J. Fairweather I. Crabtree R. Shive J. Hager S. Development of a web-based application to evaluate target finding algorithms 2 Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS '08) July 2008 Boston, Mass, USA 915 918 2-s2.0-66549101524 10.1109/IGARSS.2008.4779144