An Information Geometric Viewpoint on the Detection of Range Distributed Targets

This paper adopts information geometry to put forward a new viewpoint on the detection of range distributed targets embedded in Gaussian noise with unknown covariance. The original hypothesis test problem is formulated as the discrimination between the distributions of the measurements and the noise. The Siegel distance, which is exactly the well-known geodesic distance between the images of the original distributions under an embedding into a higher-dimensional manifold, is used as an intrinsic measure of the difference between multivariate normal distributions. Without assuming uncorrelated measurements, we propose a set of geometric distance detectors that is designed on the basis of the Siegel distance and differs from the generalized likelihood ratio algorithm and other common criteria in statistics. The classical optimal matched filter, Rao test, and Wald test, which have been proven to have the CFAR property, belong to the set as special cases. Moreover, the approach lends itself to an intuitive geometric analysis of how strongly the data contradict the null hypothesis.


Introduction
The problem of detecting range distributed targets in the presence of Gaussian noise with unknown covariance deserves repeated study. In practice, measurements are collected via radar or sonar and are modeled as reflections from targets. Given the collected data, a hypothesis test aims at distinguishing between target returns plus noise and noise only. In 1986, Kelly proposed his famous generalized likelihood ratio test (GLRT) for detecting point targets [1]. Since then, many advances in the detection of distributed targets have been based on the generalized likelihood ratio algorithm, and recent work, which mainly focuses on adaptive detection, is no exception. In this field, one class of methods deals with completely unknown signals [2,3]. Specifically, the paper [2] by Conte et al. provides a valuable statistical tool for ensuring the constant false alarm rate (CFAR) property. Others consider detecting a rank-one steering vector [4][5][6], a spread-spectrum signal [7,8], and so forth. All these detectors have been shown to perform well in the corresponding situations.
However, a common problem is that the detectors designed by the generalized likelihood ratio algorithm are limited to the case of deterministic signals, because the form of the distribution under the alternative hypothesis is fixed in advance. In this paper, the detection problem with completely unknown target returns is discussed. In addition, no assumption is made on whether the measurements are correlated. The original hypothesis test problem is formulated as a measure of the difference between the distributions of the measurements and the noise, which is described intuitively in the framework of information geometry.
The differential geometric approach was first introduced by Rao [9] as a way to construct distances between parametric density functions. Later papers greatly developed Rao's concept. In particular, Amari presented statistical models with a differential structure in [10,11], which plays an important role in information geometry. The geodesic distance has been put forward as the shortest length between two distributions on the statistical manifold endowed with the Fisher metric and the Levi-Civita connection. A very active field of information geometry concerns the manifold of multivariate normal distributions. On the multivariate normal manifold, the explicit expression of the geodesics has been derived by Eriksen [12] and later by Calvo and Oller [13] for a given initial distribution and direction. However, the geodesic distance between multivariate normal distributions has not yet been obtained, except on the submanifold of multivariate normal distributions with the same mean [14]. The lack of an explicit expression greatly limits the application of the geodesic distance. In 1990, Calvo and Oller proposed the Siegel distance on the multivariate normal manifold via an embedding into the Siegel group [15]. The Siegel distance is the geodesic distance between the images of the initial distributions and also provides a lower bound for the geodesic distance. Moreover, it is proven by Calvo et al. in [16] that the Siegel distance behaves similarly to the geodesic distance in special cases. This suggests that the Siegel distance is a reasonable measure of the difference between general multivariate normal distributions.
In this paper, the key point is how to calculate the Siegel distance between the distributions of the measurements and the noise. Since the target returns are completely unknown, the possible distributions of the measurements form a submanifold of the multivariate normal manifold. By calculating the Siegel distance $D$ from a point on the multivariate normal manifold to this submanifold, a set of detectors is designed with the critical region
$$\{ D > D_\alpha \}, \tag{1}$$
where $\alpha$ denotes the significance level and the constant $D_\alpha$ is chosen to satisfy
$$P(D > D_\alpha \mid H_0) = \alpha. \tag{2}$$
From the viewpoint of information geometry, this also provides an intuitive analysis of how strongly the measurements contradict the null hypothesis. Moreover, the optimal matched filter, the Rao test (or modified 2-step GLRT), and the Wald test (or 2-step GLRT), which have been proven to have the CFAR property in [2], can also be derived from the set of detectors.
The outline of the paper is as follows. Section 2 gives the problem formulation. Section 3 reviews some important principles related to the geodesic distance between multivariate normal distributions. Section 4 is devoted to the Siegel distance and the design of geometric distance detectors. Section 5 discusses the choice of the nominal noise covariance matrix. Some intuitive interpretations are presented in Section 6. Section 7 gives a conclusion.

Problem Formulation
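The problem under investigation is the detection of completely unknown target returns in the presence of Gaussian noise with unknown covariance. We assume the targets are spatially distributed across $L$ range cells, from which the primary data $\{\mathbf{z}_t\}$, $t = 1, \ldots, L$, are collected. The primary data consist of the possible target returns plus Gaussian noise. To acquire information about the noise, secondary data coming from $K$ cells arranged around the targets are assumed to be available. More precisely, the secondary data form a training set $\{\mathbf{z}_t\}$, $t = L+1, \ldots, L+K$, in which each sample shares the distribution of the Gaussian noise affecting the primary data. Therefore, the detection problem is formulated as
$$H_0: \ \mathbf{z}_t = \mathbf{n}_t,\ t = 1, \ldots, L+K; \qquad H_1: \ \begin{cases} \mathbf{z}_t = \mathbf{s}_t + \mathbf{n}_t, & t = 1, \ldots, L, \\ \mathbf{z}_t = \mathbf{n}_t, & t = L+1, \ldots, L+K, \end{cases} \tag{3}$$
where $\mathbf{s}_t$ denotes the unknown target return and $\mathbf{n}_t$ the noise in the $t$-th cell, and the primary data are stacked as $\mathbf{Z}_L^T = [\mathbf{z}_1^T, \ldots, \mathbf{z}_L^T]$. In effect, the hypotheses (3) compare the distribution of $\mathbf{z}_t$ with that of the noise. As usual, it is more effective to consider the distribution of the whole measurement set $\mathbf{Z}_L$ [4].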
Suppose the joint probability distribution of the measurements $\mathbf{Z}_L$ is Gaussian with an unknown covariance matrix $\Sigma$. Let $\mathbf{Z}_L^0$ be the realization of the random vector $\mathbf{Z}_L$. In real-time processing such as radar target detection, the mean of $\mathbf{Z}_L$ is unknown, so the multivariate normal sampling model for $\mathbf{Z}_L$ is taken as
$$\mathbf{Z}_L \sim N(\mathbf{Z}_L^0, \Sigma), \quad \Sigma \in S_+^{NL}, \tag{4}$$
where $N$ is the dimension of each $\mathbf{z}_t$ and $S_+^{NL}$ denotes the set of $NL \times NL$ symmetric positive-definite matrices. For comparison, an array of noise of the same size is considered. Let $\mathbf{N}_L^T = [\mathbf{n}_1^T, \ldots, \mathbf{n}_L^T]$; then the sampling distribution of $\mathbf{N}_L$ is
$$\mathbf{N}_L \sim N(\mathbf{0}, \Lambda), \tag{5}$$
where $\Lambda \in S_+^{NL}$ denotes the block diagonal matrix whose diagonal blocks all equal $S \in S_+^{N}$. The matrix $S$, which is obtained from the collected data, contains information about the unknown noise covariance. Its choice is discussed in Section 5.
The remaining problem is how to compare the two distributions above. This is essentially a question of distance and can be described intuitively in the framework of information geometry. In this paper, our major concern is to measure the difference between distributions (4) and (5) with the methods of information geometry and to give a geometric criterion for how strongly the data contradict the hypothesis $H_0$.
Remark 1. In [2,4,5], the distribution of $\mathbf{Z}_L$ under $H_1$ is specified in a given form, so the discussion there is limited to the case of deterministic signals. Thus, many other cases cannot be described by the generalized likelihood ratio algorithm; nevertheless, the derived detectors still perform well in the simulations of [2]. In our development, the methods of information geometry enlarge the scope of the discussion, since the covariance $\Sigma$ in (4) is subject to no constraint on its matrix structure.
Remark 2. It is common in the literature to use a distance as a test statistic; see [14,15]. The Mahalanobis distance can also be used to derive a test, but it is not an intrinsic distance in the geometric sense.

The Riemannian Geometry of the Multivariate Normal Model
We hereafter review some important notions in information geometry, in order to induce a measure of distance on the multivariate normal model.

Statistical Manifold.
A statistical model describes a family of parameterized probability distributions:
$$G = \{ p(x; \theta) : \theta \in \Xi \}, \tag{7}$$
where $\Xi$ is a subset of $R^m$. Under the condition of $C^\infty$ diffeomorphism, the family $G$ is considered as an $m$-dimensional statistical manifold [11]. Each probability distribution $p(x; \theta)$ is represented by a point on the manifold $G$ with the corresponding coordinate $\theta$.

The Fisher Metric and the Geodesic Distance.
For a statistical manifold $G$, the Fisher metric defines an inner product on the tangent space $T_\theta(G)$. It is given by the Fisher information matrix $I(\theta) = [g_{ij}(\theta)]$, widely used in information science, as
$$g_{ij}(\theta) = E_\theta\!\left[ \frac{\partial \ln p(x;\theta)}{\partial \theta^i} \, \frac{\partial \ln p(x;\theta)}{\partial \theta^j} \right], \tag{8}$$
where $E_\theta$ denotes the expectation with respect to the distribution $p(x; \theta)$. With the Fisher metric, the length of a curve $\theta(t)$, $t \in [a, b]$, on the manifold is defined in [11] as
$$\ell(\theta) = \int_a^b \sqrt{ \sum_{i,j} g_{ij}(\theta(t)) \, \dot{\theta}^i(t) \, \dot{\theta}^j(t) } \; dt, \tag{9}$$
where the dot denotes differentiation with respect to the variable $t$ and the indices indicate the corresponding components in the given coordinate parameterisation. For a curve joining $p(x; \theta_1)$ and $p(x; \theta_2)$, the length between the endpoints depends on the choice of the curve. The geodesic distance between $p(x; \theta_1)$ and $p(x; \theta_2)$ is defined as the minimum length over all possible curves connecting the two points on the manifold $G$ [17]; that is,
$$D(\theta_1, \theta_2) = \min_{\theta(t):\ \theta(a) = \theta_1,\ \theta(b) = \theta_2} \ell(\theta). \tag{10}$$
We emphasize that the geodesic distance is not only the intrinsic metric on the manifold of parametric probability distributions but also a genuine metric, in the sense that it satisfies nonnegativity, symmetry, and the triangle inequality. As a measure between probability distributions, the Kullback-Leibler divergence, well known as the relative entropy [18], is also popular in information science:
$$KL(\theta_1 \,\|\, \theta_2) = \int p(x; \theta_1) \ln \frac{p(x; \theta_1)}{p(x; \theta_2)} \, dx. \tag{11}$$
However, it is not a genuine metric, since it fails to be symmetric and does not satisfy the triangle inequality [19].
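As a concrete illustration of the asymmetry of the Kullback-Leibler divergence, the following minimal sketch evaluates it between two univariate normal distributions in both directions. The closed form is the standard one for Gaussians; the function name `kl_normal` and the example parameters are illustrative assumptions and do not come from the paper.

```python
import math


def kl_normal(mu1, sigma1, mu2, sigma2):
    """KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) in nats (standard closed form)."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)


# Illustrative parameters (hypothetical): p = N(0, 1), q = N(1, 4).
kl_pq = kl_normal(0.0, 1.0, 1.0, 2.0)
kl_qp = kl_normal(1.0, 2.0, 0.0, 1.0)
print("KL(p || q) =", kl_pq)
print("KL(q || p) =", kl_qp)
```

The two directions give different values, which is why the divergence cannot serve as a distance in the strict sense, whereas the geodesic distance is symmetric by construction.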

Riemannian Connection and Geodesics.
Besides the Fisher metric, another fundamental tensor in Riemannian geometry is the Riemannian connection.The Riemannian connection is a symmetric affine connection that defines a linear one-to-one mapping between tangent spaces.For the Fisher metric, the Riemannian connection exists uniquely.
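The Riemannian connection coefficients are given in the form of the Christoffel symbols $\Gamma_{ij}^k$ by the Fisher metric [11] as
$$\Gamma_{ij}^k = \frac{1}{2} \sum_{l} g^{kl} \left( \frac{\partial g_{jl}}{\partial \theta^i} + \frac{\partial g_{il}}{\partial \theta^j} - \frac{\partial g_{ij}}{\partial \theta^l} \right), \tag{12}$$
where $g^{kl}$ denotes the entries of the inverse of the Fisher information matrix. Geodesics are defined as the autoparallel curves on the manifold. With the Christoffel symbols, a geodesic $\theta(t)$ is calculated from the equations [11]
$$\ddot{\theta}^k + \sum_{i,j} \Gamma_{ij}^k \, \dot{\theta}^i \dot{\theta}^j = 0, \quad k = 1, \ldots, m. \tag{13}$$
It must be pointed out that, on a Riemannian manifold, the geodesic distance between two points equals the length of the shortest geodesic segment between them [14].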

Riemannian Geometry of the Multivariate Normal Manifold.
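The multivariate normal manifold consists of the $n$-dimensional normal distributions parameterized by the mean and covariance matrix and is denoted by
$$G^n = \{ N(\mu, \Sigma) : \mu \in R^n,\ \Sigma \in S_+^n \}. \tag{14}$$
As detailed in [14], the coordinate system of $G^n$ is given by $\{\mu_i,\ i = 1, \ldots, n;\ \sigma_{ij},\ i \le j\}$. Moreover, the tangent space $T_\theta(G^n)$ coincides with the product space of $n$-dimensional vectors and $n \times n$ symmetric matrices, which is denoted by $R^n \times S^n$. According to (8), the Fisher metric is given in the form of the inner product
$$\langle (\xi_1, A), (\xi_2, B) \rangle_\theta = \xi_1^T \Sigma^{-1} \xi_2 + \frac{1}{2} \operatorname{tr}\!\left( \Sigma^{-1} A \Sigma^{-1} B \right), \tag{15}$$
where $\theta$ denotes the point $(\mu, \Sigma)$ and $\xi_1, \xi_2 \in R^n$, $A, B \in S^n$. According to (12) and (13), the geodesics on the multivariate normal manifold $G^n$ are calculated by solving the following equations:
$$\ddot{\mu} - \dot{\Sigma} \Sigma^{-1} \dot{\mu} = 0, \qquad \ddot{\Sigma} + \dot{\mu} \dot{\mu}^T - \dot{\Sigma} \Sigma^{-1} \dot{\Sigma} = 0. \tag{16}$$
The geodesic distance is calculated by the integral (9) along the shortest geodesic. Specifically, when $n = 1$, the geodesic is explicitly given in [14,20,21], and the geodesic distance on the univariate normal manifold is given by
$$D\big(N(\mu_1, \sigma_1^2), N(\mu_2, \sigma_2^2)\big) = 2\sqrt{2}\, \operatorname{artanh} \sqrt{ \frac{(\mu_1 - \mu_2)^2 + 2(\sigma_1 - \sigma_2)^2}{(\mu_1 - \mu_2)^2 + 2(\sigma_1 + \sigma_2)^2} }. \tag{17}$$
Besides the aforementioned basic properties, the geodesic distance defined on the multivariate normal manifold is invariant under the action of the group $R^n \times GL_n(R)$, which acts as
$$(\mathbf{a}, P): \ (\mu, \Sigma) \mapsto (P\mu + \mathbf{a},\ P \Sigma P^T), \tag{18}$$
where $GL_n(R)$ denotes the group of $n \times n$ regular matrices.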
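The univariate case can be checked numerically. The sketch below implements the standard closed form of the Fisher-Rao geodesic distance between univariate normal distributions, for the metric $ds^2 = (d\mu^2 + 2\,d\sigma^2)/\sigma^2$ that follows from the inner product above with $n = 1$. It may differ in algebraic form from the expression printed in the paper's equation for the univariate case, and the function name `rao_distance_1d` is an assumption made here for illustration.

```python
import math


def rao_distance_1d(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2)."""
    num = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    den = (mu1 - mu2) ** 2 + 2.0 * (sigma1 + sigma2) ** 2
    return 2.0 * math.sqrt(2.0) * math.atanh(math.sqrt(num / den))


# Equal means: the distance reduces to sqrt(2) * |ln(sigma2 / sigma1)|.
d_same_mean = rao_distance_1d(0.0, 1.0, 0.0, math.e)

# Invariance under the affine map x -> 3x - 2, the n = 1 instance of the group action.
d_before = rao_distance_1d(0.0, 1.0, 2.0, 0.5)
d_after = rao_distance_1d(-2.0, 3.0, 4.0, 1.5)

print(d_same_mean, d_before, d_after)
```

The second pair of calls illustrates the invariance property: mapping both distributions by the same affine transformation leaves the distance unchanged.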
Unfortunately, in the general case $n > 1$, the analytical solutions to the geodesic equations (16) are complicated. Explicit formulas for the geodesic curves have been derived in [12,13,22] for a given initial point and initial direction of the geodesic. Nevertheless, it remains unsolved how to obtain the geodesic curve between given endpoints and hence to calculate the geodesic distance between them. Progress has been made, however, on the set of multivariate normal distributions with a constant mean vector, for which a well-known geometry is available.

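The Submanifold with Fixed Mean Vector. Consider the submanifold $G^n_{\mu_0}$ in $G^n$ defined by
$$G^n_{\mu_0} = \{ N(\mu_0, \Sigma) : \Sigma \in S_+^n \}, \tag{19}$$
where $\mu_0 \in R^n$ is a constant vector. The structure of $G^n_{\mu_0}$ has been studied in [14,15,23]. The geodesics have been given explicitly and the geodesic distance has been computed. In particular, it is shown in [14] that $G^n_{\mu_0}$ is a totally geodesic submanifold of $G^n$. That is, on the manifold $G^n$, the geodesic joining two points of $G^n_{\mu_0}$ lies entirely in $G^n_{\mu_0}$. Thus, given $\Sigma_1, \Sigma_2 \in S_+^n$, the geodesic from $N(\mu_0, \Sigma_1)$ to $N(\mu_0, \Sigma_2)$ is given in [17] by
$$\Sigma(t) = \Sigma_1^{1/2} \left( \Sigma_1^{-1/2} \Sigma_2 \Sigma_1^{-1/2} \right)^{t} \Sigma_1^{1/2}, \quad t \in [0, 1], \tag{20}$$
and the initial vector is in the direction of
$$\Sigma_1^{1/2} \ln\!\left( \Sigma_1^{-1/2} \Sigma_2 \Sigma_1^{-1/2} \right) \Sigma_1^{1/2}. \tag{21}$$
For any $A \in S^n$, define $\|A\| = \left( \tfrac{1}{2} \operatorname{tr} A^2 \right)^{1/2}$. The geodesic distance between the two distributions is then
$$D\big(N(\mu_0, \Sigma_1), N(\mu_0, \Sigma_2)\big) = \left\| \ln\!\left( \Sigma_1^{-1/2} \Sigma_2 \Sigma_1^{-1/2} \right) \right\| = \left( \frac{1}{2} \sum_{i=1}^{n} \ln^2 \lambda_i \right)^{1/2}, \tag{22}$$
where $\lambda_i$, $i = 1, \ldots, n$, are the eigenvalues of $\Sigma_1^{-1} \Sigma_2$.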
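The fixed-mean distance depends only on the eigenvalues of $\Sigma_1^{-1} \Sigma_2$, so it is simple to evaluate. The following is a minimal sketch for $2 \times 2$ covariance matrices in plain Python, assuming the standard form $\left( \tfrac{1}{2} \sum_i \ln^2 \lambda_i \right)^{1/2}$; the helper names and the example matrices are hypothetical.

```python
import math


def pencil_eigenvalues_2x2(s1, s2):
    """Eigenvalues of inv(s1) @ s2 for 2x2 symmetric positive-definite matrices."""
    (a, b), (_, d) = s1
    det1 = a * d - b * b
    inv1 = [[d / det1, -b / det1], [-b / det1, a / det1]]
    m = [[sum(inv1[i][k] * s2[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    # The eigenvalues are real and positive; clamp the discriminant against rounding.
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam1 = 0.5 * (trace + disc)
    lam2 = det / lam1  # avoids cancellation in (trace - disc) / 2
    return lam1, lam2


def fixed_mean_distance_2x2(s1, s2):
    """Geodesic distance between N(mu0, s1) and N(mu0, s2) on the fixed-mean submanifold."""
    lam1, lam2 = pencil_eigenvalues_2x2(s1, s2)
    return math.sqrt(0.5 * (math.log(lam1) ** 2 + math.log(lam2) ** 2))


identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[math.e ** 2, 0.0], [0.0, 1.0]]
d_diag = fixed_mean_distance_2x2(identity, stretched)

sa = [[2.0, 0.5], [0.5, 1.0]]
sb = [[1.0, -0.3], [-0.3, 3.0]]
d_ab = fixed_mean_distance_2x2(sa, sb)
d_ba = fixed_mean_distance_2x2(sb, sa)
print(d_diag, d_ab, d_ba)
```

Swapping the arguments inverts the eigenvalues, which leaves the squared logarithms unchanged, so the distance is symmetric.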

The Design of Geometric Distance Detector
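As described in Section 2, the results on the geodesic distance can be applied to compare the distributions of $\mathbf{Z}_L$ and $\mathbf{N}_L$ and, further, to construct test statistics for problem (3).
Let $n = N \times L$. Note that $\Sigma \in S_+^n$ in (4); thus the possible distributions of $\mathbf{Z}_L$ constitute a submanifold $G^n_{\mathbf{Z}_L^0}$ in $G^n$. We denote the distribution in (5) as the point $\theta_1$ in the manifold $G^n$. As mentioned before, in the general case $n > 1$ it is hard to obtain the geodesic distance between multivariate normal distributions with different mean vectors directly. Here we introduce the Siegel distance given by Calvo and Oller in [15], which calculates the geodesic distance between the images of the original distributions under the embedding $f$ into $G^{n+1}_0$:
$$f: \ N(\mu, \Sigma) \mapsto N\!\left( \mathbf{0},\ \begin{bmatrix} \Sigma + \mu \mu^T & \mu \\ \mu^T & 1 \end{bmatrix} \right). \tag{23}$$
Then the Siegel distance is defined in [15] as
$$D_S(\theta_1, \theta_2) = D\big( f(\theta_1), f(\theta_2) \big), \tag{24}$$
that is, as the geodesic distance in $G^{n+1}_0$ between the images. Similarly, the Siegel distance between $\theta_1$ and the submanifold $G^n_{\mathbf{Z}_L^0}$ is defined as
$$D_S\big(\theta_1, G^n_{\mathbf{Z}_L^0}\big) = \min_{\Sigma \in S_+^n} D_S\big(\theta_1, N(\mathbf{Z}_L^0, \Sigma)\big) = \min_{\Sigma \in S_+^n} D_S\big(\theta_1', N(\mathbf{0}, \Sigma)\big) = \min_{\Sigma \in S_+^n} D\big( f(\theta_1'), f(N(\mathbf{0}, \Sigma)) \big), \tag{25}$$
where $\theta_1'$ denotes the normal distribution $N(-\mathbf{Z}_L^0, \Lambda)$ and
$$f(\theta_1') = N\!\left( \mathbf{0},\ \begin{bmatrix} \Lambda + \mathbf{Z}_L^0 (\mathbf{Z}_L^0)^T & -\mathbf{Z}_L^0 \\ -(\mathbf{Z}_L^0)^T & 1 \end{bmatrix} \right), \qquad f\big(N(\mathbf{0}, \Sigma)\big) = N\!\left( \mathbf{0},\ \begin{bmatrix} \Sigma & \mathbf{0} \\ \mathbf{0}^T & 1 \end{bmatrix} \right).$$
The second equality is due to the invariance properties of the Siegel distance, which are the same as those of the geodesic distance. The last equality follows from the definition of the Siegel distance in (24).
It has been proven that $G^n$ is isometric to $f(G^n)$. However, $f(G^n)$ is not a geodesic submanifold of $G^{n+1}_0$. That is, the geodesic curve joining the images $f(\theta_1)$ and $f(\theta_2)$ in $G^{n+1}_0$ may contain points outside $f(G^n)$. Thus, the Siegel distance provides a lower bound for the geodesic distance between multivariate normal distributions; it has also been verified to be a distance measure on the multivariate normal manifold.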
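The lower-bound property can be observed directly in the univariate case, where both distances have closed forms. The sketch below assumes the Calvo-Oller embedding $(\mu, \sigma^2) \mapsto \begin{bmatrix} \sigma^2 + \mu^2 & \mu \\ \mu & 1 \end{bmatrix}$ and the fixed-mean distance $\left( \tfrac{1}{2} \sum_i \ln^2 \lambda_i \right)^{1/2}$ on the embedded matrices, and compares the result with the standard Fisher-Rao distance. Function names and sample points are illustrative assumptions.

```python
import math


def rao_distance_1d(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao geodesic distance between univariate normals (standard closed form)."""
    num = (mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2
    den = (mu1 - mu2) ** 2 + 2.0 * (sigma1 + sigma2) ** 2
    return 2.0 * math.sqrt(2.0) * math.atanh(math.sqrt(num / den))


def siegel_distance_1d(mu1, sigma1, mu2, sigma2):
    """Siegel distance between univariate normals via the 2x2 embedded matrices."""
    # Embedded matrices S_k = [[sigma_k^2 + mu_k^2, mu_k], [mu_k, 1]].
    a, b = sigma1 ** 2 + mu1 ** 2, mu1
    p, q = sigma2 ** 2 + mu2 ** 2, mu2
    det1 = a - b * b
    det2 = p - q * q
    # Trace and determinant of inv(S_1) @ S_2 give its two eigenvalues.
    trace = (p - 2.0 * b * q + a) / det1
    det = det2 / det1
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam1 = 0.5 * (trace + disc)
    lam2 = det / lam1
    return math.sqrt(0.5 * (math.log(lam1) ** 2 + math.log(lam2) ** 2))


pairs = [(0.0, 1.0, 2.0, 1.0), (0.0, 1.0, 0.55, 0.7), (-1.0, 2.0, 3.0, 0.5)]
results = [(siegel_distance_1d(*args), rao_distance_1d(*args)) for args in pairs]
for args, (d_siegel, d_rao) in zip(pairs, results):
    print(args, "Siegel:", d_siegel, "Rao:", d_rao)
```

In each sampled pair the Siegel distance stays below the Fisher-Rao distance, and the two coincide when the means are equal, which is consistent with the fixed-mean submanifold being totally geodesic.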

Lemma 3 (Pythagorean theorem).
Let $\theta$ be a point in $G$ and let $G_0$ be a submanifold of $G$. A necessary and sufficient condition for a point $\theta^* \in G_0$ to be a stationary point of the function $D(\theta, \cdot): \theta' \mapsto D(\theta, \theta')$ restricted to $G_0$ is that the geodesic connecting $\theta$ and $\theta^*$ be orthogonal to $G_0$ at $\theta^*$.
The proof of the Pythagorean theorem is standard in differential geometry, and the lemma follows from it.
Theorem 4. The point    2 in G  0 that achieves the minimum of (25) must satisfy for some  ∈ R and b ∈ R  .
Proof.By the Pythagorean theorem, we have = tr (( This theorem follows due to the arbitrariness of Σ. The matrix Q given in (30) has at most two nonzero eigenvalues  1 ,  2 , which must satisfy Note that the eigenvalues  1 and  2 are opposite sign while b ̸ = 0.
Theorem 5.If  1 and  2 are a solution of the following equations, then the minimum in ( 25) is achieved by and the Siegel distance between  1 and G Proof.For the eigenvalues  1 ,  2 , the corresponding eigenvectors of Q in (30) can be given as Taking ( + 1) × ( + 1) orthogonal matrix we have we have where From ( 28) and (39), we can get Therefore, From ( 41)-(42), we have Equations ( 43) and ( 46) imply (33).Since G +1 0 is a complete manifold, there exists a solution to (33) of the two variables  1 and  2 .
The equations in (33) are nonlinear and can be solved numerically; the Siegel distance $D_S$ is then calculated by (35). Let $t^2 = (\mathbf{Z}_L^0)^T \Lambda^{-1} \mathbf{Z}_L^0$. From Theorem 5, it is easy to see that the Siegel distance $D_S$ is closely related to $t^2$. As stated in Section 1, with the Siegel distance $D_S$, the detection problem (3) can be carried out with the critical region defined in (1). Note that the matrix $S$ in $\Lambda$ is left undetermined, which does not affect the derivation of $D_S$. Therefore, a set of geometric distance detectors is obtained via different choices of the matrix $S$.
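To make the dependence of the distance on the quadratic form tangible, the following sketch treats the univariate case by brute force: it minimizes the Siegel distance from $N(0, s)$ over the fixed-mean family $\{N(z, \sigma^2)\}$ with a grid search in $\ln \sigma$. This is not the solution procedure of Theorem 5, only a numerical stand-in under the assumed Calvo-Oller embedding; the function names, grid size, and search span are illustrative assumptions.

```python
import math


def siegel_distance_1d(mu1, var1, mu2, var2):
    """Siegel distance between N(mu1, var1) and N(mu2, var2) via 2x2 embedded matrices."""
    a, b = var1 + mu1 * mu1, mu1
    p, q = var2 + mu2 * mu2, mu2
    det1 = a - b * b
    trace = (p - 2.0 * b * q + a) / det1
    det = (p - q * q) / det1
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    lam1 = 0.5 * (trace + disc)
    lam2 = det / lam1
    return math.sqrt(0.5 * (math.log(lam1) ** 2 + math.log(lam2) ** 2))


def siegel_to_fixed_mean_1d(z, s, n_grid=4001, log_span=4.0):
    """Grid-search minimum over sigma of the Siegel distance from N(0, s) to N(z, sigma^2)."""
    best = float("inf")
    for i in range(n_grid):
        u = -log_span + 2.0 * log_span * i / (n_grid - 1)  # u = ln(sigma / sqrt(s))
        var = s * math.exp(2.0 * u)
        best = min(best, siegel_distance_1d(0.0, s, z, var))
    return best


z_values = (0.0, 0.5, 1.0, 2.0, 3.0)
distances = [siegel_to_fixed_mean_1d(z, 1.0) for z in z_values]
for z, d in zip(z_values, distances):
    print("t^2 =", z * z, "distance =", d)
```

With $s = 1$ the quadratic form is $t^2 = z^2$, and the printed distances grow with it. Rescaling both $z$ and $s$ so that $z^2 / s$ is unchanged gives the same distance, which reflects the invariance of the Siegel distance.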

The Choice of Matrix S
Let $\mathbf{z}_t^0$ be the realization of the random vector $\mathbf{z}_t$ for each $t = 1, \ldots, L + K$. Since $\Lambda$ is block diagonal with identical blocks $S$, the quadratic form $t^2 = (\mathbf{Z}_L^0)^T \Lambda^{-1} \mathbf{Z}_L^0$ satisfies
$$t^2 = \sum_{t=1}^{L} (\mathbf{z}_t^0)^T S^{-1} \mathbf{z}_t^0.$$
Figure 1 illustrates how the Siegel distance varies as the logarithm of $t^2$ increases. The curve is clearly monotonically increasing. Thus the critical region in (1) is rewritten as
$$\{ t^2 > \eta \},$$
where the Siegel distance equals $h(t^2)$ for an increasing function $h$ and the threshold $\eta$ is determined by the significance level. If the underlying covariance $M$ is known, it is natural to take $S$ as $M$ directly. In this case, the optimal matched filter is derived. Otherwise, information about $M$ is provided by the maximum likelihood estimate based on the received data. When only the secondary data are used, $S$ is replaced by the corresponding sample covariance matrix $S_1$; thus the Wald test belongs to the set of geometric distance detectors. In addition, substituting for $S$ the maximum likelihood estimate $S_2$ based on the whole set of data yields the Rao test. In summary, the three classical tests referred to above are verified to be members of the set of geometric distance detectors designed on the basis of the Siegel distance.
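The effect of the choice of $S$ on the quadratic statistic can be sketched for two-dimensional cells. The code below assumes a zero-mean sample covariance normalized by the number of samples, which is an assumption made here because the paper's exact normalization is not reproduced in this text, and it evaluates $\sum_t \mathbf{z}_t^T S^{-1} \mathbf{z}_t$ over the primary data for two choices of $S$. All function names and data values are hypothetical.

```python
def sample_covariance_2d(samples):
    """Zero-mean sample covariance (1/k) * sum of z z^T; the 1/k scaling is an assumption."""
    k = len(samples)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for z in samples:
        for i in range(2):
            for j in range(2):
                s[i][j] += z[i] * z[j] / k
    return s


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def quadratic_statistic(primary, s):
    """Sum over primary cells of z^T S^{-1} z."""
    s_inv = inverse_2x2(s)
    return sum(z[i] * s_inv[i][j] * z[j]
               for z in primary for i in range(2) for j in range(2))


# Hypothetical data: two primary cells and four secondary cells.
primary = [[1.0, 2.0], [0.0, 1.0]]
secondary = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

s_secondary = sample_covariance_2d(secondary)        # secondary data only
s_whole = sample_covariance_2d(primary + secondary)  # whole data set

t2_secondary = quadratic_statistic(primary, s_secondary)
t2_whole = quadratic_statistic(primary, s_whole)
print(t2_secondary, t2_whole)
```

For this data set the statistic based on the whole-data estimate is smaller, because the primary samples inflate the estimated covariance along their own direction.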

Interpretations
In this section, some figures are presented, in order to give intuitive interpretations about how the geometric distance detectors work.As special cases, the optimal matched filter, the Rao test, and Wald test are described in the form of geometric distance detectors.
Figure 2 illustrates the definition of the geodesic distance from a point on the univariate normal manifold to a submanifold with fixed mean. A group of geodesics from N(0, 1) is shown. Gradient colors along the geodesics represent the geodesic distances given by (17) from N(0, 1) to the current point. The geodesic distance from N(0, 1) to $G^1_{0.55}$ is given by the integral along the geodesic marked in red. A geometric distance detector requires a threshold for the hypothesis test problem: $H_0$ is not rejected if the Siegel distance between the distribution of the noise and the submanifold determined by the measurements is less than the threshold. The critical region in (1) is obtained by projecting the submanifolds that lie beyond the threshold onto the observation space. The same principle applies to the Siegel distance on the multivariate normal manifold. Since they cannot be visualized, the multidimensional cases are not presented here.
Figure 3 shows a simulation with $L = 1$, $N = 2$, since examples of higher dimension behave similarly but cannot be visualized. The distributions of the noise are displayed at the origin, with the corresponding noise covariance matrix: the underlying covariance $M$ or the sample covariance $S_1$ or $S_2$. All curves are drawn as projections onto the measurement space. The region outside a contour curve represents the critical region of the geometric distance detector with the corresponding alarm probability listed in the legend. As an intrinsic merit of the geometric measure, the magnitude of the Siegel distance reflects how strongly the collected data contradict the null hypothesis $H_0$. In Figure 3(d), contour curves with the same alarm probability $\alpha = 10^{-4}$ are presented. We observe that the contour curves of the Siegel distance based on $S_1$ and $S_2$ appear to overlap. This is due to the equivalence of the Rao test and the Wald test when $L = 1$ [24]. In fact, if we denote the eigenvalues of the matrix $\Omega = \sum_{t=1}^{L} \mathbf{z}_t S_1^{-1} \mathbf{z}_t^T$ by $\lambda_i$, then, according to [2,3], the Rao test and the Wald test are, respectively, rewritten as
$$\sum_i \frac{\lambda_i}{1 + \lambda_i} \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \eta_R, \qquad \sum_i \lambda_i \ \underset{H_0}{\overset{H_1}{\gtrless}}\ \eta_W,$$
with thresholds $\eta_R$ and $\eta_W$. When $L = 1$, $\Omega = \mathbf{z}_1 S_1^{-1} \mathbf{z}_1^T$ has only one nonzero eigenvalue. Thus, both the Rao test and the Wald test reduce to a test on this single nonzero eigenvalue. However, they yield different Siegel distances, because the distinct covariance matrices are neglected in the projection mapping. The acceptance regions of both the Rao test and the Wald test can be improved by increasing the number of secondary data, so as to approach that of the geometric distance detector based on the underlying noise covariance.
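The relation between the two tests can be illustrated on their eigenvalue forms. The sketch below assumes the forms commonly associated with these detectors in the adaptive detection literature, namely the sum of the $\lambda_i$ for the 2-step GLRT (Wald test) and the sum of $\lambda_i / (1 + \lambda_i)$ for the modified 2-step GLRT (Rao test). The function names and eigenvalue sets are hypothetical.

```python
def wald_statistic(eigenvalues):
    """Assumed eigenvalue form of the Wald test (2-step GLRT): sum of lambda_i."""
    return sum(eigenvalues)


def rao_statistic(eigenvalues):
    """Assumed eigenvalue form of the Rao test (modified 2-step GLRT)."""
    return sum(lam / (1.0 + lam) for lam in eigenvalues)


# One nonzero eigenvalue: both statistics are increasing functions of it,
# so thresholding one is equivalent to thresholding the other.
single = [0.2, 1.0, 3.0, 10.0]
wald_single = [wald_statistic([lam]) for lam in single]
rao_single = [rao_statistic([lam]) for lam in single]

# Two nonzero eigenvalues: the two statistics can rank data sets differently.
concentrated = [3.0, 0.0]
spread = [1.4, 1.4]
print(wald_statistic(concentrated), wald_statistic(spread))
print(rao_statistic(concentrated), rao_statistic(spread))
```

In the two-eigenvalue example the Wald form ranks the concentrated set higher while the Rao form ranks the spread set higher, so the acceptance regions of the two tests need not coincide once more than one eigenvalue is nonzero.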
As a general rule, the Rao test and Wald test have different acceptance regions. Figure 4 displays another simulation with

Conclusion
In this paper, an information geometric viewpoint on the detection of range distributed targets embedded in Gaussian noise with unknown covariance is put forward. More precisely, we have derived a set of geometric distance detectors, of which the optimal matched filter, the Rao test, and the Wald test are members. This establishes a link between information geometry and hypothesis testing. As future research, other choices of $S$ can be tested, and it may also be of interest to find the member with the best performance among the set of geometric distance detectors.

Figure 1: The Siegel distance $D$ versus the logarithm of $t^2 = (\mathbf{Z}_L^0)^T \Lambda^{-1} \mathbf{Z}_L^0$.

Figure 2: Illustration of the geodesic distance from N(0, 1) on the univariate normal manifold to the submanifold $G^1_{0.55}$. The blue straight line shows the submanifold with the fixed mean 0.55. The geodesic curves from the initial point N(0, 1) with different initial directions are displayed as gradient colored curves, and the one orthogonal to the submanifold $G^1_{0.55}$ is marked in red.

Figure 3: Contour curves of the Siegel distance based on (a) the underlying covariance matrix $M$; (b) the sample covariance matrix $S_1$; (c) the sample covariance matrix $S_2$. The values marked on the contour curves denote the Siegel distances from the origin to the submanifold specified by the current point. The values of $\alpha$ listed in the legends denote the corresponding alarm probabilities of the geometric distance detectors. (d) Contour curves of the Siegel distances based on different choices of the noise covariance matrix, with the same alarm probability $\alpha = 10^{-4}$. All curves are drawn as projections onto the measurement space.

Figure 4: The contour curves of the Siegel distance based on the sample covariance matrices $S_1$ and $S_2$, with the same alarm probability $\alpha = 10^{-4}$. All curves are drawn as projections onto the eigenvalue space of $\Omega$.