Non-Gaussian Linear MixingModels for Hyperspectral Images

Modeling of hyperspectral data with non-Gaussian distributions is gaining popularity in recent years. Such modeling mostly concentrates on attempts to describe a distribution, or its tails, of all image spectra. In this paper, we recognize that the presence of major materials in the image scene is likely to exhibit nonrandomness and only the remaining variability due to noise, or other factors, would exhibit random behavior. Hence, we assume a linear mixing model with a structured background, and we investigate various distributional models for the error term in that model. We propose one model based on the multivariate t-distribution and another one based on independent components following an exponential power distribution. The former model does not perform well in the context of the two images investigated in this paper, one AVIRIS and one HyMap image. On the other hand, the latter model works reasonably well with the AVIRIS image and very well with the HyMap image. This paper provides the tools that researchers can use for verifying a given model to be used with a given image.


Introduction
The following linear mixing model (with a structured background) is often used in hyperspectral imaging literature [1][2][3][4]: where r is a p-dimensional vector (e.g., of reflectance or radiance,) of a pixel spectrum, B is a fixed matrix of spectra of m materials present in the image (as columns b j , j = 1, . . ., m), and α is an unknown vector of material abundances.The error term ε is often assumed to follow the multivariate normal (Gaussian) distribution N(0, σ 2 I) or some other distributional assumptions can be made here.
In this paper, we want to address the question whether model (1) is realistic for hyperspectral images and what type of a distribution should be used for the error term ε.In previous research [5], we provided a preliminary investigation of the marginal distributions of ε to be modeled by the exponential power distribution.We expand this research here to model the multivariate structure of ε and to propose some other models such as the multivariate t-distribution.
The multivariate t-distribution is a popular distribution for modeling hyperspectral data (see [6][7][8]).In the current literature, this distribution is mostly used for modeling the variability of background materials in a purely stochastic model without a deterministic part such as the one defined in model (1).Here we try to use that distribution for modeling the error term ε, while the major part of the image variability is explained by the deterministic part B • α of the model.
In Section 2, we introduce our notation and show how the deterministic and the stochastic parts of the model are constructed based on the singular value decomposition (SVD).We also provide details on an AVIRIS hyperspectral image used for the numerical results performed in this paper.In Section 3, we explore potential models for the marginal distributions of the error term.In Section 4, we explore potential models for the joint multivariate distributions of the error term.The AVIRIS image is used for numerical examples in Sections 3 and 4. In Section 5, we provide an additional example using a subset of the HyMap Cooke City image.Conclusions are formulated in Section 6.

Preliminary Considerations
In this paper, we are going to assume that the abundances of all materials sum up to 1, that is, where α i are coordinates of the vector α.In practice, this assumption may not necessarily be strictly fulfilled due to imperfections in estimation of background signatures and in the linearity of the spectral mixing process.However, it is reasonable to assume that ( 2) is fulfilled at least approximately.We do not specifically address another reasonable assumptions that 0 ≤ α j ≤ 1, but an appropriate choice of the spectral signatures in B should result in positive (or almost positive) α j .We also assume that no a priori information is available about the spectral signatures in B, that is, we are trying to "guess" all terms on the right-hand side of (1).Let us now assume that we have a hyperspectral image with n pixels, and the model (1) takes the form for i = 1, . . ., n. Denote the average of all pixel spectra by r = n i=1 r i .Let B * be a matrix of vectors (b j − r), j = 1, . . ., m as columns.Because of the property (2), the model ( 3) is equivalent to the following model centered at r: Let us denote by X an n by p matrix of vectors (r i −r), i = 1, . . ., n and write its SVD as where U is an n by p matrix (with columns u j , j = 1, . . ., p), D is a diagonal p by p matrix of singular values d j , j = 1, . . ., p, and V is an orthogonal p by p matrix (with columns v j , j = 1, . . ., p).We assume that The SVD in (5) can also be written as In order to build a model of the form (4), we now want to identify m out of the total of p basis vectors v j such that m j=1 d j u j v T j represents the deterministic part B * • α i of the model (4).This deterministic part will be selected based on the idea that the major materials present in the image exhibit nonrandom behavior, while the remaining variability due to noise, or due to some undetected small amounts of materials, or due to other imperfections in the model, should exhibit more random behavior.We want the remaining sum p j=m+1 d j u j v T j to represent realizations of the error term ε.We will investigate that sum in the basis system defined by the vectors v j , that is, realizations of a random vector that we denote by Y m+1 = (Y m+1 , . . ., Y p ).Note that each n dimensional vector d j u j , j = (m + 1), . . ., p represents a sample from the distribution of Y j .We want to study those samples so that an appropriate distributional assumption about the error terms can be made.We can define the standardized form Z j = Y j /St.Dev.(Yj ) with the sample of realizations given by √ n − 1 u j .The vector Z m+1 = (Z m+1 , . . ., Z p ) represents the error term ε in the sphered, or whitened, coordinates.The marginal (uncorrelated) distributions of Z m+1 will be studied in the next section followed by modeling the joint distribution in Section 4. Numerical results in Sections 3 and 4 use a 100 by 100 pixel (so n = 10,000) AVIRIS urban image in Rochester, NY, USA, near the Lake Ontario shoreline (see Figure 1).The scene has a wide range of natural and man-made clutter including a mixture of commercial/warehouse and residential neighborhoods to add a wide range of spectral diversity.Prior to processing, invalid bands, due to atmospheric water absorption, were removed reducing the overall dimensionality to p = 152 spectral bands.This image was used earlier in [9,10].

Modeling Marginal Distributions
Here we investigate two families of distributions as models for the distributions of Z j 's.The first one is the exponential power distribution (also called general error or general Gaussian distribution) with the location parameter μ, the scale parameter β > 0, and the shape parameter λ > 0. Its density is defined by where Γ(s) = ∞ 0 t s−1 e −t dt for s > 0 is the gamma function.This is a flexible family of distributions.Its special cases are the Gaussian distribution (λ = 2) and the Laplace distribution (λ = 1).We assume that μ = 0 since the distribution of Z j is already centered.For each Z j , j = 1, . . ., p, the remaining parameters, λ and β, were estimated using the maximum likelihood principle.The fit of data to the resulting exponential power distribution was then checked with the chi-square test based on the statistic where N i is the count of observations in the ith interval (i = 1, . . ., k = 25), and p i is the interval probability based on the testing distribution.The intervals were chosen so that p i = 1/k, i = 1, . . ., k.The tested distribution should be rejected when αth percentile from the chi-squared distribution with (k − 1 − m) degrees of freedom, and m is the number of estimated parameters.The value α/ p is chosen based on the Bonferroni principle since p inferences are performed here.Figure 2 shows the base 10 log values of the chi-square statistic plotted versus the dimension number j.The horizontal line is at the threshold level of χ 2 25−1−2 (0.01/152) = 56.79(1.75 on the log scale).The points above the threshold represent the first 19 dimensions and then 22, 23, 25, 28, and 67.Since not all dimension numbers are consecutive, we are faced with a difficulty of choosing a suitable value for m.We propose two potential strategies.
(1) Select as m the largest dimension number (here 28) that is not an "outlier" (such as dimension 67 here).With this approach, we would accept the imperfect modeling along the 67th dimension, and some of the dimensions (20, 21, 24, 26, and 27) would be represented in the deterministic part of the model, even though they could be modeled stochastically by the exponential power distribution.This approach is consistent with the principle of keeping the dimensions with the highest explained variability.
(2) Select all dimensions above the threshold for the deterministic part of the model (possibly including dimension 67) and use the remaining dimensions for the error term.From the point of view of notation in Section 2, we would reorder the dimensions, so that the "non-random" dimensions 22, 23, 25, 28, and 67 would be numbered from 20 to 24, and m would be chosen as 24.
Figure 3 shows the base 10 log values of the chi-square statistic plotted versus λ.For the values below the threshold, where the exponential power distribution model would be used, the λ values range from 1.37 to 1.91.Hence, the distributions have tails heavier than the Gaussian distribution, but not as heavy as those of the Laplace distribution.The second potential family of distributions for modeling the distributions of Z j 's is the t-distribution with ν degrees of freedom given by the density Since Z j is scaled to have variance 1, we also need to scale the t-distribution.Hence, we fit a distribution with the density g(z) = γ f (γz), where f is defined in ( 9) and γ = ν/(ν − 2).We assume that ν is larger than 2 but is not necessarily an integer.The scaling depends on the unknown parameter ν, so it needs to be taken into account in the maximum likelihood estimation of ν.The fit of data to the scaled t-distribution was again checked with the chi-square test.Figure 4 is analogous to Figure 2, but here the chi-square statistic measures the fit to the scaled t-distribution.The horizontal line is at the threshold level of χ 2 25−1−1 (0.01/152) = 58.36(1.77 on the log scale).The points above the threshold represent the first 10 dimensions and then 12, 14, 17, 19, and 23.Here again, we can use one of the proposed two strategies for identifying the dimensionality m of the deterministic part of the model.We can see that the scaled t-distribution gives a significantly better fit than the exponential power distribution.
Figure 5 shows the base 10 log values of the chi-square statistic plotted versus the number of degrees of freedom ν.(The parameter ν had a very large value for the first dimension (Z 1 with the highest chi-squared value), and it is not shown in the graph for clarity of presentation.)For the values below the threshold, where the scaled t-distribution model would be used, the ν values range from 4.3 to 43.3.Hence again, the distributions have tails heavier than the Gaussian distribution, but not heavier than those of the scaled t-distribution with 4 degrees of freedom.

Modeling Joint Multivariate Distributions
The model fitting discussed in the previous section accounted only for the marginal distributions of Z j 's.A more challenging task is to ensure a fit of the joint multivariate distribution of data to a suitable model.In the whitened version, the components of Z r = (Z r , . . ., Z p ) are uncorrelated, but they might be either stochastically independent or dependent, which depends on a specific assumed model.Again, we will use two competing models.The first one is based on the assumption of independent Z j components following an exponential power distribution.In order to verify this model, we investigate the joint distribution of Z r = (Z r , . . ., Z p ) for r = 1, . . ., p.More specifically, we investigate the fit of the theoretical cumulative distribution function (CDF) G r (x) = P{Z r ≤ x, Z r+1 ≤ x, . . ., Z p ≤ x} to its empirical equivalent.Note that G r (x) can be regarded as a univariate simplification of the multivariate CDF of Z r with all coordinates being equal.Based on the assumption of independence, G r (x) = F(x) p−r+1 , where F(x) is the CDF of the exponential power distribution defined in (7).The empirical equivalent of G r (x) is the fraction of observations (realizations) such that Z j ≤ x, j = r, . . ., p, or equivalently max r≤ j≤p Z j ≤ x.The CDF G r (x) depends on the parameters β > 0 and λ > 0 that were estimated using the maximum likelihood principle.The chi-square test based on the statistic (8) was then used to assess the fit of the observations of max r≤ j≤p Z j to G r (x).The resulting base 10 log values of the chi-squared statistic are shown in Figure 6.As before, the horizontal line is at the threshold level of χ 2 25−1−2 (0.01/152) = 56.79(1.75 on the log scale).The points below the threshold represent the dimensions 135 and 136, and then 138 through 152 (= p).The value for G 137 (x) is only slightly above the threshold.We can say that the model of independent components with an exponential power distribution is quite successful in modeling the multivariate distribution of the "last" 18 dimensions from 135 until the last one (152).However, the model is not very successful in modeling further dimensions, where we apparently observe significant dependencies among Z j 's.This is consistent with low-dimensional components of Z r being independent or having very weak dependence structure that does not show up as significant.That dependence structure gets stronger with higher dimensions.
Figure 7 shows the base 10 log values of the chi-square statistic plotted versus λ.For the values below the threshold, where a model with independent exponential power distributions would be used, the λ values range from 1.48 to 1.77.Hence again, the distributions have tails heavier than the Gaussian distribution, but not as heavy as those of the Laplace distribution.
The second multivariate model that we want to discuss is based on the standard d-dimensional multivariate tdistribution with the density function The variance-covariance matrix of this distribution is equal to γ 2 • I d , where I d is a d-dimensional identity matrix and γ = ν/(ν − 2).Since we deal here with sphered data modeled by Z r = (Z r , . . ., Z p ) (with Var (Z r ) = I d and d = p−r+1), we want γZ r to follow the standard multivariate t-distribution.Hence, an appropriate candidate distribution for Z r is a scaled multivariate t-distribution with the density function given by g(z) = γ d f (γz), where f is defined in (10).We can write This distribution is spherically contoured in the sphered coordinates and elliptically contoured in the original coordinates.All marginal distributions of the scaled multivariate t-distribution are the scaled t-distributions discussed in Section 3 that were already confirmed as reasonable models for the AVIRIS data.Here, we want to verify if the multivariate t-distribution provides an adequate multivariate structure for those data.
If γZ r follows the standard multivariate t-distribution, then γZ r 2 /d follows the F-distribution with d and ν degrees of freedom (see [11]).Hence, in order to check the assumption of the multivariate t-distribution, we verify if Z r 2 /d follows the F-distribution scaled by 1/γ 2 .As before, the degrees of freedom parameter ν is estimated based on the maximum likelihood principle, and the fit is assessed based on the chi-square statistic.Figure 8 shows the base 10 log values of the chi-square statistic plotted versus r.The horizontal line is at the threshold level of χ 2 25−1−1 (0.01/152) = 58.36(1.77 on the log scale).We can see only one value below the threshold, suggesting the scaled F-distribution is suitable only for Z 152 2 , which is consistent with the marginal scaled t-distribution for Z 152 (discussed in Section 3) since the vector Z 152 is one-dimensional.All remaining Z r , r < p = 152, are apparently not modeled well by the scaled multivariate t-distribution.We note that the components of the multivariate t-distribution are not independent, and the dependency structure proposed by this distribution is apparently not consistent with the AVIRIS hyperspectral data investigated here.

Cooke City Image
In the two previous sections, we used an AVIRIS image as an example to demonstrate how to fit a linear mixing model with the ε term being potentially non-Gaussian.Specific numerical results might be different, of course, for other images.Hence, it is interesting to perform the same calculations on another image.The AVIRIS image has a wide range of spectral diversity due to the presence of various natural and man-made materials.Hence, it would be interesting to try a more homogenous dataset with less variety.One way to do this could be to classify an image into various types of cover material, and then use spectra from one class as our dataset.A disadvantage of such an approach is the possibility of removing tails of distributions, which might have a tendency of being assigned to a different class.Hence, we chose a relatively uniform subimage of forest in the HyMap Cooke City (see [12]) image as marked by a red rectangle shown in Figure 9. Four spectral bands were also removed due to some suspicious values.The spectral dimensionality of the dataset used is then p = 122, and the spatial dimensionality is 50 by 300 pixels for a total of n = 15,000 pixels.
In order to investigate the marginal distributions, we now follow the ideas and notation of Section 3. The fit with the exponential power distribution is verified with Figure 10, which is analogous to Figure 2. Using Strategy 1 explained in Section 3, we would identify 26 as the dimensionality of the deterministic part of the model.
Figure 11 (analogous to Figure 3) shows the base 10 log values of the chi-square statistic plotted versus λ.The highest value of λ is almost perfectly equal to 2, suggesting the Gaussian distribution.Most of the λ values are around 1.8 and higher, suggesting slightly lighter tails than those for the AVIRIS image (compare with Figure 3).
Figure 12 (analogous to Figure 4), shows a slightly better fit of the scaled t-distribution than that of the exponential power distribution.However, the resulting dimensionality would be identified as the same as before (26).
Figure 13 (analogous to Figure 5) shows the base 10 log values of the chi-square statistic plotted versus the number of degrees of freedom ν.The high values of ν again point to more Gaussian-like marginal distributions than those of the AVIRIS image.
So far, we discussed only the marginal distributions of Z j 's as a precursor to checking the model fit.We now want to consider a more challenging task to ensure a fit of the joint multivariate distribution of data to a suitable model  as discussed in Section 4. Figure 14 (analogous to Figure 6) shows the base 10 log values of the chi-square statistic (for testing the multivariate fit to G r (x) representing independent exponential power distributions) plotted versus r.This figure is so strikingly different from the analogous Figure 6, that the author double checked the code (in fact the same code was used in both cases).Note that even for r as small as 1 (which represents all dimensions from 1 until p = 122), we obtain an acceptable fit with independent exponential power distributions.The only unacceptable fit is for k = 78, which points to some minor issues with the model.Regarding the final conclusion about the model fit, we need to keep in mind that this is just one test of multivariate fit, and it needs to be considered together with the results shown in Figure 12 telling us that some of the marginal distributions do not have the satisfactory fit.Hence, we can accept the previous conclusion of the dimensionality of the deterministic part of the model as 26, and the remaining dimensions are remarkably well modeled by independent exponential power distributions.This successful modeling might be largely due to this dataset being a fairly homogenous set of spectra (forest area in the Cooke City image).Since almost all chi-square values in Figure 14 are not significant, it would not be interesting to show an analog of Figure 7. Hence, in Figure 15, we created a plot of the shape parameter λ values (from the multivariate fit to G r (x)) versus r.We can see that for large r, the fit is fairly close to the Gaussian distribution (λ close to 2).Then for smaller r, λ's are getting smaller, which represents much heavier tails of the distribution, almost up to the Laplace distribution (λ close to 1).
We have also checked the fit to the multivariate tdistribution as shown in Figure 16, which is an analog of Figure 8.We can observe the only good fit at r = p = 122 (which is really a univariate fit of the last dimension), which means that none of the multivariate t-distributions fits well to the data.

Conclusions
In this paper, we investigated various distributional models for the error term in the linear mixing model with a structured background.The models were tested on two datasets.One was an AVIRIS image and the other one was a subimage of a forest area in a HyMap Cooke City image.The first proposed model was based on independent components following an exponential power distribution.The model fitted reasonably well to both datasets in terms of modeling marginal distributions.For the AVIRIS image, the fit of the joint distribution worked quite well for a small number of components, but not for a larger number.For the forest area in the Cooke City image, the fit with the joint distribution of independent exponential power distributions worked very well.This successful modeling might be largely due to this dataset being a fairly homogenous set of spectra.
The second model was based on the multivariate tdistribution.It performed well in terms of the resulting marginal distributions in both datasets, but the dependency structure imposed by this distribution was entirely inconsistent with both datasets.More research is needed to investigate the two models on other hyperspectral images.However, the multivariate t-distribution model does not look promising at this point, while the exponential power distribution model seems to have more potential.

Figure 1 :
Figure 1: Color rendering of the cluttered AVIRIS urban scene in Rochester, NY, USA, used in this paper.

Figure 2 :
Figure 2: Base 10 log values of the chi-square statistic (for testing the fit to the exponential power distribution) plotted versus the dimension number j (for the AVIRIS image).

Figure 3 :
Figure 3: Base 10 log values of the chi-square statistic (for testing the fit to the exponential power distribution) plotted versus the shape parameter λ (for the AVIRIS image).

Figure 4 :Figure 5 :
Figure 4: Base 10 log values of the chi-square statistic (for testing the fit to the scaled t-distribution) plotted versus the dimension number j (for the AVIRIS image).

Figure 6 :Figure 7 :
Figure 6: Base 10 log values of the chi-square statistic (for testing the multivariate fit to G r (x)) plotted versus r (for the AVIRIS image).

Figure 8 :
Figure 8: Base 10 log values of the chi-square statistic (for testing the multivariate t-distribution) plotted versus r (for the AVIRIS image).

Figure 9 :Figure 10 :
Figure 9: Color rendering of the 280 by 800 pixel HyMap Cooke City image (see[12] for details about the image).

Figure 11 :
Figure 11: Base 10 log values of the chi-square statistic (for testing the fit to the exponential power distribution) plotted versus the shape parameter λ (for the Cooke City image).

Figure 12 :
Figure 12: Base 10 log values of the chi-square statistic (for testing the fit to the scaled t-distribution) plotted versus the dimension number j (for the Cooke City image).

Figure 13 :
Figure 13: Base 10 log values of the chi-square statistic (for testing the fit to the scaled t-distribution) plotted versus the number of degrees of freedom ν (for the Cooke City image).

Figure 14 :Figure 15 :
Figure 14: Base 10 log values of the chi-square statistic (for testing the multivariate fit to G r (x)) plotted versus r (for the Cooke City image).

Figure 16 :
Figure 16: Base 10 log values of the chi-square statistic (for testing the multivariate t-distribution) plotted versus r (for the Cooke City image).