Analysis of a Pareto Mixture Distribution for Maritime Surveillance Radar

The Pareto distribution has been shown to be an excellent model for X-band high-resolution maritime surveillance radar clutter returns. Given the success of mixture distributions in radar, it is thus of interest to consider the effect of Pareto mixture models. This paper introduces a formulation of a Pareto intensity mixture distribution and investigates coherent multilook radar detector performance using this new clutter model. Clutter parameter estimates are derived from data sets produced by the Defence Science and Technology Organisation’s Ingara maritime surveillance radar.


Introduction
In recent years at the Australian Defence Science and Technology Organisation (DSTO), clutter mixture models have been analysed, since they can provide tighter fits to spiky clutter in the distribution's upper tail region [1,2]. The main application has been to improve the fit of the K-Distribution to real data, resulting in the KK-Distribution introduced in [1]. This mixture model involves two separate K-Distributions combined in the amplitude domain. The justification for this is that one component of the mixture models the Bragg/whitecap scatterers, while the other accounts for sea spikes. It was also shown in [1] that the KK-Distribution modelled real polarimetric high-resolution sea clutter returns better than a single K-Distribution, for the case of horizontally polarised clutter, which tends to be spikier than vertically polarised returns.
Recently, the Pareto distribution has been proposed as an alternative clutter model and has also been validated for high-resolution radar clutter returns [3,4]. This intensity model has been found to fit both low and high grazing angle clutter returns, for X-band radar operating in a maritime surveillance environment. It has been found to match the performance of the K-Distribution and to be almost as accurate as the KK-model. There are examples where it matches the performance of the KK-Distribution exactly [4]. The advantage of using a Pareto clutter model is that it is simpler mathematically, resulting in more basic and efficient detection schemes [5]. Thus it is of interest to examine whether, in comparison to the success of the KK-Distribution, a Pareto mixture model yields any improvement in clutter modelling and detection performance.
The paper is structured as follows. Section 2 introduces the Pareto distribution and illustrates why it is becoming of interest to the radar signal processing community. Section 3 proposes a Pareto mixture model and describes the result of fitting it to DSTO's high-resolution radar clutter sets. Section 4 derives the Neyman-Pearson optimal detector based upon this clutter model, together with a generalised likelihood ratio test (GLRT) detector. Finally, Section 5 gives some examples of detector performance curves, with parameters estimated from real clutter returns.

The Pareto Distribution
As maritime surveillance radar resolution has improved over the years, the backscattering from the sea surface, known as sea clutter, has been found to deviate significantly from the traditional Gaussian model [6]. As a consequence of this, many models of clutter have been proposed and analysed, including the Lognormal [7], Weibull [8] and the K-Distribution [9,10]. The KK-Distribution [11,12] was proposed to see whether more accurate clutter fits could be achieved, especially for the very spiky horizontally polarised clutter returns. This model proved to be very successful but has a number of serious shortcomings. The first is that the model is a 5-parameter family, although [1] reduces this to 4, based upon DSTO X-band sea clutter data. Previous models studied in radar signal processing have been two-parameter distributions. The complexity of the K- and KK-Distribution clutter models also results in difficult mathematical expressions for the coherent multilook detectors. A good example of this is the detector introduced in [13], which shows that the KK-Distribution coherent multilook detector is analytically difficult to work with.
The Pareto distribution [14,15] is named after the Italian economist Vilfredo Pareto (15 July 1848 to 19 August 1923) [16,17] and is a power law probability distribution that has been found to be an excellent model of long-tailed phenomena [18]. Its original application was in the modelling of income over a population [19]. It has been used in the modelling of actuarial data; an example is in excess of loss quotations in insurance [20]. Its usefulness as a model for long-tailed distributions has resulted in applications to a number of diverse areas including physics, hydrology, and seismology [18]. It has also found suitable applications in engineering, such as internet teletraffic modelling [21]. From a radar perspective, its simplicity has resulted in the construction of much simpler detection schemes [5,22,23].
In the intensity domain, the Pareto distribution has a power law density given by
$$f(t) = \frac{\alpha_p \beta_p^{\alpha_p}}{t^{\alpha_p + 1}} \tag{1}$$
for $t \geq \beta_p$. The parameter $\alpha_p > 0$ is called the distribution's shape parameter, which governs the tail behaviour. Parameter $\beta_p > 0$ is the scale parameter, which determines where the distribution's support begins. Its mean is given by $E(X) = \alpha_p \beta_p / (\alpha_p - 1)$, while its variance is $\mathrm{var}(X) = \beta_p^2 \alpha_p / [(\alpha_p - 1)^2 (\alpha_p - 2)]$. Hence for finite first and second moments, it is required that $\alpha_p > 2$.
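The density and moment formulas above can be checked numerically. The following is a minimal Python sketch, assuming the standard Pareto form $\alpha \beta^{\alpha} / t^{\alpha + 1}$ on $t \geq \beta$; the function names and the inverse-CDF sampler are illustrative choices and are not taken from the paper.

```python
import random

def pareto_pdf(t, alpha, beta):
    """Pareto intensity density alpha * beta**alpha / t**(alpha + 1), for t >= beta."""
    if t < beta:
        return 0.0
    return alpha * beta ** alpha / t ** (alpha + 1)

def pareto_mean(alpha, beta):
    """Mean alpha * beta / (alpha - 1); finite only when alpha > 1."""
    if alpha <= 1:
        raise ValueError("mean is infinite for alpha <= 1")
    return alpha * beta / (alpha - 1)

def pareto_var(alpha, beta):
    """Variance beta**2 * alpha / ((alpha - 1)**2 * (alpha - 2)); needs alpha > 2."""
    if alpha <= 2:
        raise ValueError("variance is infinite for alpha <= 2")
    return beta ** 2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))

def pareto_sample(alpha, beta, rng):
    """Inverse-CDF sampling, using F(t) = 1 - (beta / t)**alpha."""
    u = 1.0 - rng.random()          # uniform on (0, 1]
    return beta * u ** (-1.0 / alpha)

# Shape and scale values as quoted for the simulated clutter example.
rng = random.Random(0)
alpha, beta = 25.0, 1.0
xs = [pareto_sample(alpha, beta, rng) for _ in range(100_000)]
sample_mean = sum(xs) / len(xs)
print(sample_mean, pareto_mean(alpha, beta))
```

The sample mean should sit close to $\alpha\beta/(\alpha - 1)$, and every draw lies at or above the scale parameter, which is the start of the support.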
Figure 1 shows an example of correlated Pareto intensity clutter, generated using the compound Gaussian model defined in [5]. The figure shows the clutter in both the time and frequency domain. In this case, the Pareto shape parameter is $\alpha_p = 25$, while its scale parameter is $\beta_p = 1$. The compound Gaussian model has a clutter covariance matrix $\Sigma$ that is Toeplitz, with factor $\kappa = 0.01$, resulting in weakly correlated clutter returns. This means the $(i, j)$th entry of $\Sigma$ is given by $\kappa^{|i - j|}$, for $i$ and $j$ in the set $\{1, 2, \ldots, N\}$, with $N$ the number of rows/columns of $\Sigma$.
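The Toeplitz covariance structure described above is straightforward to construct. A short Python sketch follows; the helper name `toeplitz_covariance` is hypothetical, and only the entry rule (factor raised to the absolute index difference) comes from the text.

```python
def toeplitz_covariance(kappa, n):
    """Return the n x n matrix whose (i, j)th entry is kappa ** |i - j|."""
    return [[kappa ** abs(i - j) for j in range(n)] for i in range(n)]

# Weakly and strongly correlated cases, using the two Toeplitz factors quoted
# for the simulated clutter examples.
weak = toeplitz_covariance(0.01, 4)
strong = toeplitz_covariance(0.99, 4)

for row in strong:
    print(["%.4f" % entry for entry in row])
```

With a factor near zero the matrix is close to the identity, so looks are nearly uncorrelated; with a factor near one, even widely separated looks remain strongly correlated.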
Figure 2 shows the same Pareto distribution simulated with more strongly correlated clutter returns (Toeplitz factor $\kappa = 0.99$). In this case the correlation has a significant effect on the clutter returns.
These figures show the difficulty of performing target detection in a spiky clutter environment. Both Figures 1 and 2 illustrate that many small targets would easily be missed because of the spikiness of the clutter. In addition, false detections are likely for the same reason. Hence it is important to try to find detectors which allow increased probability of detection, even if there are only minor gains in performance. Small gains are significant for detection of small targets in spiky clutter environments.

A Pareto Mixture Model
The extension of the Pareto model (1) to a mixture model can be based upon the definition of the KK-Distribution in [1]. The Pareto mixture distribution, in the intensity domain, is defined by
$$f(t) = k \, \frac{\alpha_{pp1} \beta_{pp}^{\alpha_{pp1}}}{t^{\alpha_{pp1} + 1}} + (1 - k) \, \frac{\alpha_{pp2} \beta_{pp}^{\alpha_{pp2}}}{t^{\alpha_{pp2} + 1}}, \tag{2}$$
where $\beta_{pp}$ is the distribution's scale parameter (so that $t \geq \beta_{pp}$) and $\alpha_{pp1}$ and $\alpha_{pp2}$ are two shape parameters. These three parameters are all assumed to be positive real numbers. The mixing coefficient is $k \in [0, 1]$. The standard Pareto distribution density (1) is recovered by setting $k$ to zero or one in (2). In contrast to the KK-Distribution, the Pareto mixture was chosen to have a single scale parameter and two shape parameters. The idea behind this was that since the scale parameter determines the support, it would be useful to have only one such parameter and allow for greater distributional fitting through two shape parameters.
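As an illustration of the mixture definition, the sketch below evaluates and samples a two-shape, single-scale Pareto mixture in Python. It assumes the mixture is the convex combination of two Pareto densities with a common scale, with weight $k$ on the first component; which component carries $k$ is an assumption of this sketch. The function names are hypothetical.

```python
import random

def pp_pdf(t, a1, a2, beta, k):
    """Mixture density: k * Pareto(a1, beta) + (1 - k) * Pareto(a2, beta)."""
    if t < beta:
        return 0.0
    return (k * a1 * beta ** a1 / t ** (a1 + 1)
            + (1 - k) * a2 * beta ** a2 / t ** (a2 + 1))

def pp_cdf(t, a1, a2, beta, k):
    """Mixture distribution function, obtained by integrating pp_pdf."""
    if t < beta:
        return 0.0
    return 1.0 - k * (beta / t) ** a1 - (1 - k) * (beta / t) ** a2

def pp_sample(a1, a2, beta, k, rng):
    """Pick a component with probability k, then invert that component's CDF."""
    alpha = a1 if rng.random() < k else a2
    u = 1.0 - rng.random()          # uniform on (0, 1]
    return beta * u ** (-1.0 / alpha)

# Mixture parameter values as quoted for the fitted Ingara example.
rng = random.Random(1)
params = dict(a1=3.6637, a2=7.0837, beta=0.0083, k=0.5402)
draws = [pp_sample(rng=rng, **params) for _ in range(50_000)]
t0 = 0.01
empirical = sum(d <= t0 for d in draws) / len(draws)
print(empirical, pp_cdf(t0, **params))
```

Setting `k` to 0 or 1 in `pp_pdf` reduces it to a single Pareto density, matching the remark that the standard model is a special case.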
In MATLAB, an algorithm was created to optimise the fit of the Pareto mixture to real data, in a similar vein to MATLAB's automatic Pareto fit function gpfit, using maximum likelihood estimation. This algorithm fitted the density (2) to individual data sets based upon the four distributional parameters ($\alpha_{pp1}$, $\alpha_{pp2}$, $\beta_{pp}$, and $k$). Figure 3 shows an example of the fit of a Pareto mixture distribution to real data, using empirical distribution functions. The data used for this purpose is the DSTO Ingara maritime sea clutter trials data, which consists of fully polarimetric high-resolution X-band clutter returns. Details of the data and radar can be found in [1,24]. This data was collected during a trial in 2004, in the Southern Ocean, and has been well documented. A description of the radar and clutter analysis can be found in [1], while the fitting of the Pareto distribution to this data is described in [4]. Key points about the trial are that the radar operated in a circular spotlight mode, scanning the same patch of sea surface at different azimuth angles. The Ingara radar is fully polarimetric and operates in X-band [24]. No targets were present in the trial, so that the returns are pure clutter.
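The paper's MATLAB fitting routine is not reproduced in the text. As a rough stand-in, the Python sketch below performs maximum likelihood fitting of a two-shape, single-scale Pareto mixture using the expectation-maximisation (EM) algorithm, which is a substituted technique and not necessarily what the author used. It assumes the mixture is a convex combination of two Pareto densities sharing one scale; under that assumption the likelihood increases with the scale, so the scale estimate is the sample minimum, and the log-ratios of the data to that scale follow a mixture of two exponentials. All names and the synthetic test values are hypothetical.

```python
import math
import random

def fit_pareto_mixture(data, iters=200):
    """EM sketch returning (a1, a2, beta, k, log-likelihood trace)."""
    beta = min(data)                              # ML estimate of the common scale
    y = [math.log(t / beta) for t in data]        # exponential-mixture variates
    mean_y = sum(y) / len(y)
    a1, a2, k = 0.5 / mean_y, 2.0 / mean_y, 0.5   # crude starting values
    trace = []
    for _ in range(iters):
        # E-step: responsibility of component 1 for each sample.
        w, loglik = [], 0.0
        for yi in y:
            p1 = k * a1 * math.exp(-a1 * yi)
            p2 = (1 - k) * a2 * math.exp(-a2 * yi)
            w.append(p1 / (p1 + p2))
            loglik += math.log(p1 + p2)
        trace.append(loglik)
        # M-step: weighted exponential-rate estimates and mixing weight.
        s1 = sum(w)
        s2 = len(y) - s1
        a1 = s1 / sum(wi * yi for wi, yi in zip(w, y))
        a2 = s2 / sum((1 - wi) * yi for wi, yi in zip(w, y))
        k = s1 / len(y)
    return a1, a2, beta, k, trace

# Synthetic data from a hypothetical mixture (not Ingara data).
rng = random.Random(2)
data = []
for _ in range(5000):
    alpha = 2.0 if rng.random() < 0.5 else 20.0
    data.append(1.0 * (1.0 - rng.random()) ** (-1.0 / alpha))

a1_hat, a2_hat, beta_hat, k_hat, trace = fit_pareto_mixture(data)
print(a1_hat, a2_hat, beta_hat, k_hat)
```

EM only guarantees that the log-likelihood does not decrease between iterations, so in practice several starting points would be tried; a general-purpose optimiser applied to the four-parameter likelihood is an equally reasonable design.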
This figure shows the Ingara data corresponding to run 34690, at an azimuth angle of 255°, with horizontal polarisation, as considered in [4]. The fitted Pareto parameters are $\alpha_p = 3.191$ and $\beta_p = 0.0047$, while the Pareto mixture parameters are $\alpha_{pp1} = 3.6637$, $\alpha_{pp2} = 7.0837$, $\beta_{pp} = 0.0083$, and $k = 0.5402$. For comparison, a K- and a KK-Distribution were also fitted, using the parameter estimation guidelines in [1]. The fitted K-Distribution has scale parameter $c = 74.45$ and shape parameter $\nu = 2.863$, while the KK-Distribution has scale parameters $c_1 = 71.45$ and $c_2 = 344.39$, the same shape parameter and mixing coefficient $k = 0.01$. In this case the K- and KK-Distributions are extremely close together, while the Pareto mixture (denoted PP in Figure 3) fits the data more closely. Under magnification, the KK-Distribution is closer to the data than the K-Distribution, especially in the upper tail region. The Pareto mixture clearly fits the data better than a single Pareto distribution, and this was observed in all the Ingara data examined.
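One way to quantify a comparison of empirical and fitted distribution functions is the largest vertical gap between them. The Python sketch below computes this gap for a single Pareto fit and for a mixture on synthetic data; it does not use Ingara data, the mixture values are hypothetical, and the mixture CDF assumes a convex combination of two Pareto components with a common scale.

```python
import math
import random

def edf_distance(data, cdf):
    """Largest gap between the empirical distribution function and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The EDF jumps from i/n to (i+1)/n at x; check both sides of the jump.
        gap = max(gap, abs((i + 1) / n - f), abs(i / n - f))
    return gap

def pareto_cdf(t, alpha, beta):
    return 1.0 - (beta / t) ** alpha if t >= beta else 0.0

def mixture_cdf(t, a1, a2, beta, k):
    if t < beta:
        return 0.0
    return 1.0 - k * (beta / t) ** a1 - (1 - k) * (beta / t) ** a2

# Synthetic data from a hypothetical, well-separated mixture.
rng = random.Random(3)
a1, a2, beta, k = 2.0, 20.0, 1.0, 0.5
data = []
for _ in range(5000):
    alpha = a1 if rng.random() < k else a2
    data.append(beta * (1.0 - rng.random()) ** (-1.0 / alpha))

# Single Pareto maximum likelihood fit: scale = sample minimum,
# shape = n / sum(log(t / scale)).
b_hat = min(data)
a_hat = len(data) / sum(math.log(t / b_hat) for t in data)

d_single = edf_distance(data, lambda t: pareto_cdf(t, a_hat, b_hat))
d_mixture = edf_distance(data, lambda t: mixture_cdf(t, a1, a2, beta, k))
print(d_single, d_mixture)
```

On data that genuinely come from a mixture, the single Pareto fit leaves a visibly larger gap, which mirrors the qualitative behaviour reported for the Ingara fits.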

Detector Performance
Neyman-Pearson detectors [15] can be constructed using the Pareto mixture distribution as an intensity clutter model. The formulation of the basic detection problem is outlined briefly, following the set up in [5]. The radar return is denoted $z$, which is a vector of length $N$, corresponding to the number of looks. The coherent multilook detection problem is stated as the statistical hypothesis test $H_0: z = c$ against $H_1: z = Rp + c$, where all complex vectors are $N \times 1$. The vector $p$ is the Doppler steering vector, whose components are given by $p(j) = e^{2\pi i j f_D}$, for $j \in \{1, 2, \ldots, N\}$, where $i$ is the imaginary unit and $f_D$ is the normalised target Doppler frequency ($-0.5 \leq f_D \leq 0.5$), which is assumed to be completely known. The complex random variable $R$ represents the target, and $|R|$ is its amplitude. The clutter vector $c$ will be modelled as a spherically invariant random process (SIRP). This means we assume that the clutter takes the form $c = SG$, where $G$ is a zero mean complex Gaussian process with covariance matrix $\Sigma$ and $S$ is a univariate nonnegative random variable with density $f_S$. Useful references on the SIRP approach include [25,26]. As in [5], it will be assumed that $\Sigma$ is positive definite (so that its inverse exists), implying that a Cholesky factorisation can be applied to its inverse. Consequently, there exists a matrix $A$ such that $\Sigma^{-1} = A^H A$, where $A^H$ is the Hermitian transpose of $A$. This enables a whitening approach to be applied to the original hypothesis test, since SIRPs are unaffected by a linear transformation [25]. As a result of this, the original hypothesis test can be re-expressed as $H_0: r = n$ versus $H_1: r = Ru + n$, where $r = Az$, $n = Ac$ and $u = Ap$. The advantage of this is that the clutter process $n = SAG$ is, when conditioned on $S$, a complex Gaussian process, because $AG$ is a multidimensional complex Gaussian process with zero mean and covariance equal to the $N \times N$ identity matrix. This enables the corresponding densities to be written down more simply.
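The whitening step can be sketched in a few lines of Python. This version factorises the covariance matrix itself as a lower-triangular matrix times its transpose, and takes the whitening matrix to be the inverse of that factor, which satisfies the stated condition that its Hermitian transpose times itself equals the inverse covariance. It assumes a real, positive definite Toeplitz covariance and a steering vector with unit-modulus entries indexed from 1 to N; all function names are hypothetical.

```python
import cmath
import math

def toeplitz_covariance(kappa, n):
    """Matrix with (i, j)th entry kappa ** |i - j|."""
    return [[kappa ** abs(i - j) for j in range(n)] for i in range(n)]

def cholesky_lower(m):
    """Lower-triangular L with L L^T = m, for real symmetric positive definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][q] * low[j][q] for q in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def whiten(low, v):
    """Return A v with A = low^{-1}, by forward substitution on low x = v."""
    x = []
    for i, row in enumerate(low):
        acc = v[i] - sum(row[q] * x[q] for q in range(i))
        x.append(acc / row[i])
    return x

def steering_vector(f_d, n):
    """Doppler steering vector with entries exp(2 pi i j f_d), j = 1..n."""
    return [cmath.exp(2j * math.pi * j * f_d) for j in range(1, n + 1)]

sigma = toeplitz_covariance(0.4, 5)
low = cholesky_lower(sigma)
p = steering_vector(0.5, 5)
u = whiten(low, p)      # whitened steering vector u = A p
print(u)
```

Applying `whiten` to a received vector gives the whitened return in the same way. Forward substitution avoids forming the inverse matrix explicitly, which is usually the better-conditioned choice.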
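In particular, since $f_{n|S=s}(x) = \frac{1}{\pi^N s^{2N}} e^{-s^{-2} \|x\|^2}$, it can be shown that
$$f_{H_0}(r) = \int_0^\infty f_{n|S=s}(r) f_S(s) \, ds = \pi^{-N} h_N(\|r\|^2), \tag{3}$$
where
$$h_N(p) = \int_0^\infty s^{-2N} e^{-s^{-2} p} f_S(s) \, ds \tag{4}$$
is called the characteristic function [25]. For the case of a fixed target model (so that $R$ is constant), the density under $H_1$ can be shown to be
$$f_{H_1}(r) = \pi^{-N} h_N(\|r - Ru\|^2). \tag{5}$$
To extend this to the case where $R$ is unknown, the GLRT uses the approximation
$$\hat{R} = \frac{u^H r}{\|u\|^2} \tag{6}$$
in the density under $H_1$.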
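The GLRT replaces the unknown target amplitude with an estimate computed from the whitened return. The Python sketch below assumes this estimate is the usual least-squares value, the Hermitian inner product of the whitened steering vector with the return divided by the squared norm of the steering vector, and checks that it minimises the residual energy. The vectors and the "true" amplitude are hypothetical illustration values.

```python
def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def glrt_amplitude(u, r):
    """Least-squares amplitude estimate u^H r / ||u||^2."""
    return inner(u, r) / inner(u, u).real

def residual_energy(r, u, amp):
    """Squared norm of r - amp * u, the argument of h_N under H1."""
    return sum(abs(ri - amp * ui) ** 2 for ri, ui in zip(r, u))

# Hypothetical whitened steering vector, disturbance and target amplitude.
u = [1 + 0j, 1j, -1 + 0j, -1j]
noise = [0.1 - 0.2j, -0.05 + 0.1j, 0.2 + 0.05j, -0.1 - 0.1j]
true_amp = 1.5 - 0.5j
r = [true_amp * ui + ni for ui, ni in zip(u, noise)]

amp_hat = glrt_amplitude(u, r)
print(amp_hat, residual_energy(r, u, amp_hat))
```

Because the characteristic function in the SIRP densities is typically decreasing in its argument, minimising the residual energy is what maximises the density under the target-present hypothesis.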
In order to examine the performance of detectors based upon a Pareto mixture distribution assumption, it is necessary to specify the SIRP that generates the desired marginal intensity distributions. To generate a Pareto mixture SIRP, one uses the characteristic function (7), which is expressed in terms of a quantity $\zeta$ defined in (8). It can be shown that the SIRP with characteristic function (7) has Pareto mixture marginal intensity distributions. The Neyman-Pearson optimal detector [15] is given by the ratio of densities under the respective hypotheses:
$$L(r) = \frac{f_{H_1}(r)}{f_{H_0}(r)} \underset{H_0}{\overset{H_1}{\gtrless}} \tau, \tag{9}$$
where $L(r)$ is the likelihood function and $\tau$ is the detection threshold. The notation (9) means we reject the null hypothesis $H_0$ (return is pure clutter) if the likelihood exceeds the threshold (set by the probability of false alarm). When we reject $H_0$ we accept the alternative hypothesis $H_1$, implying that there is strong evidence of a target present in the radar return $r$. The densities (3) and (5), together with (7), can be applied to (9) to construct the optimal detector explicitly. The GLRT is obtained explicitly by applying (6) to (5) prior to it being applied to (9).
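The structure of the likelihood ratio test can be sketched independently of the particular clutter model. The Python example below assumes the SIRP densities depend on the whitened return only through a characteristic function of a squared norm, so the ratio reduces to that function evaluated at the residual energy divided by its value at the return energy. Since the Pareto mixture characteristic function is not reproduced here, a Gaussian stand-in, exp(-p), corresponding to a texture fixed at one, is used instead. All names, vectors and the threshold are hypothetical.

```python
import math

def likelihood_ratio(r, u, amp, h_n):
    """Ratio h_N(||r - amp * u||^2) / h_N(||r||^2) for a supplied h_N."""
    q0 = sum(abs(ri) ** 2 for ri in r)
    q1 = sum(abs(ri - amp * ui) ** 2 for ri, ui in zip(r, u))
    return h_n(q1) / h_n(q0)

def detect(r, u, amp, h_n, tau):
    """Declare a target (reject H0) when the likelihood ratio exceeds tau."""
    return likelihood_ratio(r, u, amp, h_n) > tau

def h_gaussian(p):
    """Stand-in characteristic function for Gaussian clutter (S fixed at 1)."""
    return math.exp(-p)

# Hypothetical whitened quantities.
u = [1 + 0j] * 4
clutter = [0.3 - 0.1j, -0.2 + 0.4j, 0.1 + 0.1j, -0.3 - 0.2j]
amp = 2.0
tau = 1.0      # hypothetical threshold; in practice set from the false alarm rate
r_h0 = clutter
r_h1 = [amp * ui + ci for ui, ci in zip(u, clutter)]

print(detect(r_h0, u, amp, h_gaussian, tau), detect(r_h1, u, amp, h_gaussian, tau))
```

Swapping `h_gaussian` for the characteristic function of another SIRP changes the detector without touching the test logic; for the GLRT, `amp` would be replaced by its estimate from the data.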

Detector Performance Considerations
Using the DSTO Ingara data sets, the performance of the optimal detector and GLRT for Pareto mixture clutter models was assessed. Because the Ingara data is pure clutter, a synthetic Gaussian target model has been used throughout, as in [5]. Pareto and Pareto mixture clutter models have had their parameters estimated from Ingara data sets. The detector performance curves have then been generated using Monte Carlo simulation, using $10^6$ runs in each case.
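The Monte Carlo procedure can be illustrated with a deliberately simplified Python sketch: a threshold is set from clutter-only runs to meet a target false alarm probability, and the probability of detection is then estimated from runs containing a target. To keep the example short it uses whitened Gaussian clutter, a matched-filter statistic, a fixed target amplitude, a false alarm probability of 0.01 and 20,000 runs; these are stand-ins and not the clutter models, detectors, target model or run counts of the paper.

```python
import math
import random

def complex_gaussian(n, rng):
    """n independent zero-mean, unit-variance complex Gaussian samples."""
    s = math.sqrt(0.5)
    return [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n)]

def statistic(r, u):
    """Stand-in detector statistic |u^H r|^2 / ||u||^2 (matched filter)."""
    num = abs(sum(ui.conjugate() * ri for ui, ri in zip(u, r))) ** 2
    return num / sum(abs(ui) ** 2 for ui in u)

def threshold_for_pfa(u, pfa, runs, rng):
    """Empirical (1 - pfa) quantile of the statistic under clutter only."""
    vals = sorted(statistic(complex_gaussian(len(u), rng), u) for _ in range(runs))
    return vals[int(math.ceil((1 - pfa) * runs)) - 1]

def prob_detection(u, amp, tau, runs, rng):
    """Fraction of target-plus-clutter runs whose statistic exceeds tau."""
    hits = 0
    for _ in range(runs):
        noise = complex_gaussian(len(u), rng)
        r = [amp * ui + ni for ui, ni in zip(u, noise)]
        hits += statistic(r, u) > tau
    return hits / runs

rng = random.Random(7)
u = [complex(1, 0)] * 4                      # hypothetical whitened steering vector
tau = threshold_for_pfa(u, pfa=1e-2, runs=20_000, rng=rng)
pd_weak = prob_detection(u, amp=0.2, tau=tau, runs=2_000, rng=rng)
pd_strong = prob_detection(u, amp=2.0, tau=tau, runs=2_000, rng=rng)
print(tau, pd_weak, pd_strong)
```

For this Gaussian stand-in the clutter-only statistic is exponentially distributed with unit mean, so the threshold should be near the negative logarithm of the false alarm probability. Very small false alarm probabilities need far more clutter-only runs for the empirical quantile to be reliable.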
The comparison of detectors based upon the Pareto and Pareto mixture clutter models produced some very interesting results. In most cases it was observed that the mixture distribution assumption did not improve detector performance significantly. This is illustrated in Figure 4, which is a detection performance plot showing the probability of detection (Pd) as a function of the target signal to clutter ratio (SCR). In this example, the mixture distribution has $\alpha_{pp1} = 20$, $\alpha_{pp2} = 100$, $\beta_{pp} = 0.05$, and $k = 0.8$. The Pareto distribution has $\alpha_p = 2.347$ and $\beta_p = 0.001277$. Additionally, $N = 10$ looks and the normalised Doppler frequency is $f_D = 0.5$. The false alarm probability has been set to $10^{-6}$. The simulated clutter has been generated with a Toeplitz covariance matrix with $\kappa = 0.4$. This example corresponds to a typical horizontally polarised Ingara data set. Figure 4 shows the optimum detector result as well as the GLRT detector. As can be observed, although the mixture model provided a better clutter fit, there is only a minor gain in detection performance.
Figure 5 is a second example of detector performance. Here, the same clutter parameters and normalised Doppler frequency have been used as for Figure 4. However, the number of looks is $N = 5$ and the Toeplitz factor has been increased to 0.7. Additionally, the false alarm probability has been reduced to $10^{-7}$. Comparing the pairs of decision rules, we again observe that the mixture model does not provide a significant improvement in detection performance.
Varying the number of looks, normalised Doppler frequency and Toeplitz covariance factor did not significantly alter the results. Hence it appears that although the KK mixture distribution provides a significant detection performance improvement relative to a detector based upon a single K-Distributed clutter assumption [11], the same is not repeated when the Pareto assumption is applied. This implies a single Pareto distribution is sufficient from a detection performance perspective.

Conclusions
Although a Pareto mixture model can provide a better fit to the upper tail region in real radar clutter returns, this has been observed to not significantly improve detection performance, which is the primary application of radar. Increasing the number of degrees of freedom in a model adds computational cost, slowing the radar's ability to perform its operation in real time. Hence it is computationally better to use a non-mixture model when designing radar detection schemes, under the basic Pareto clutter model assumption.

Figure 1: An example of correlated Pareto clutter returns, where the correlation is small. (a) shows the clutter in the time domain, while (b) is the result of a fast Fourier transform (FFT) applied to the same clutter.

Figure 3: Clutter fit example, showing the empirical distribution functions F(x) for real Ingara data compared to the fitted Pareto, Pareto mixture (PP), K- and KK-Distributions, as a function of intensity x. Here the latter two coincide almost exactly, while the Pareto mixture provides the best fit.

Figure 4: Comparison of detector performance curves for Pareto and Pareto mixture models. The probability of detection (Pd) is plotted as a function of the signal to clutter ratio (SCR). The optimal detectors are denoted Optimal (PP) for the mixture model and Optimal (P) for the standard Pareto counterpart. The GLRT for the Pareto mixture is denoted GLRT (PP), while the same for a single Pareto model is GLRT (P). The figure shows that the mixture model provides only a very small detection gain.

Figure 5: A second example of detector performance, with the same legend as for Figure 4. In this case, the clutter correlation has been increased (Toeplitz factor of 0.7) and the number of looks decreased. As in the previous figure, the detection gain provided by a mixture model is very small.