Unsupervised Classification Method for Polarimetric Synthetic Aperture Radar Imagery Based on Yamaguchi Four-Component Decomposition Model

To improve the accuracy of unsupervised classification based on scattering models, this paper introduces the four-component Yamaguchi model, an improved version of the well-known three-component Freeman model, and combines it with the Wishart distance measure. A new clustering algorithm is then proposed and its procedure is described step by step. In the experiments, seven areas of varying homogeneity are selected from the Flevoland sample image of the AIRSAR dataset, and qualitative and quantitative experiments are performed for a comparative study. The results show that the proposed method markedly improves the resolution and detail of the classification map, and that the new iterative algorithm significantly increases the classification accuracy in homogeneous areas.


Introduction
Terrain and land-use classification are arguably the most important applications of polarimetric synthetic aperture radar (PolSAR). This paper concentrates on improving the unsupervised classification of PolSAR imagery. Classification methods are commonly divided into two categories: supervised and unsupervised. Supervised classification can be applied if training sets obtained from a ground truth map are available; however, such training sets are absent in most applications. Unsupervised classification, in contrast, does not require training samples or ground truth maps.
In general, unsupervised classification methods can be divided into three categories. Earlier algorithms of the first category classify polarimetric SAR data based on their statistical characteristics. Kong et al. [1] proposed an optimal classification scheme using a quadratic distance measure, and Yueh et al. [2] and Lim et al. [3] extended it to normalized polarimetric SAR data. For PolSAR data represented by covariance or coherency matrices, Lee et al. developed a maximum likelihood classifier that assigns terrain types based on the complex Wishart distribution [4]. The second category classifies polarimetric SAR imagery on the basis of the intrinsic physical scattering mechanisms. These methods have the additional benefit of providing information for class type assignment, but their results commonly lose image details. The third category combines physical scattering characteristics and statistical properties. In 1999, Lee et al. [5] proposed an algorithm that applied the Cloude and Pottier decomposition for the initial classification, and a significant improvement in the classification of details was observed during the iterations. That paper also demonstrated, however, that the final classification could differ substantially from the initial result and that pixels with different scattering mechanisms could be mixed together. Chen et al. [6] presented an unsupervised land cover/land-use classification scheme based on polarimetric scattering similarity, which automatically identifies the major and minor scattering mechanisms from the relative magnitudes of multiple scattering similarities; however, the identification depends solely on the entropy value, which is not an accurate index. In 2011, Shah Hosseini et al. [7] proposed an SVM-based classification algorithm for fully polarimetric AIRSAR L-band data that combines several polarimetric parameters to improve classification accuracy; this combination of parameters, however, is handcrafted and subjective. Also in 2011, Panigrahi and Mishra [8] proposed an entropy-based land cover classification using a Gaussian mixture model. In 2014, Xu et al. [9] employed the polarimetric interferometric similarity parameter and the total scattering power (SPAN) and overcame the misclassification problems to some extent; nonetheless, this approach is applicable only to interferometric datasets.
In practice, the two methods most widely used for classification are unsupervised classification based on the Cloude and Pottier decomposition and unsupervised classification based on the Freeman three-component decomposition [5,10]. In the former, scattering mechanisms are characterized by the entropy and the averaged alpha angle, and the physical scattering characteristic associated with each zone of the entropy-alpha plane provides information for terrain type assignment. This distinctive advantage, however, is offset by the predefined zone boundaries in the entropy-alpha plane: no comprehensive theory satisfactorily accounts for how these boundaries are determined.
The three-component Freeman and Durden model has been successfully applied to decompose PolSAR images under the well-known reflection symmetry condition. In 2004, Lee et al. [10] proposed a well-known algorithm that combines the Freeman model with a maximum likelihood classifier based on the complex Wishart distribution; this algorithm is effective and computationally efficient. Yamaguchi et al. [11-15] proposed a four-component scattering model for decomposing PolSAR images. This model extends the Freeman model to handle the nonreflection symmetric scattering case. In this paper, we combine the four-component Yamaguchi model with the Wishart distance measure. For the classification of PolSAR imagery, a physical scattering model outperforms a purely statistical model, and the four-component Yamaguchi model is a more accurate physical model, which is vital for capturing the scattering properties of each pixel. Combining it with the Wishart distance measure, a statistical approach to classification and clustering, yields a considerable improvement. The rest of this paper is organized as follows. Section 2 introduces the probabilistic model used by the proposed method. Section 3 presents the four-component Yamaguchi decomposition and then the proposed classification scheme. Section 4 reports experiments on sample data from the AIRSAR dataset, and the last section draws conclusions.

Probabilistic Model and Wishart Distance of PolSAR Data
2.1. Polarimetric Covariance Matrix.
This paper deals only with the monostatic backscattering case. For a reciprocal medium ($S_{HV} = S_{VH}$), the complex scattering vector is

$$\mathbf{k} = [k_1,\ k_2,\ k_3]^{T}. \tag{1}$$

In general, $k_1$, $k_2$, and $k_3$ denote $S_{HH}$, $\sqrt{2}S_{HV}$, and $S_{VV}$ in the linear basis, or $S_{HH}+S_{VV}$, $S_{HH}-S_{VV}$, and $2S_{HV}$ (each scaled by $1/\sqrt{2}$) in the Pauli basis. For a nonreciprocal medium or for bistatic radars, $\mathbf{k}$ has dimension 4, because $S_{HV} \neq S_{VH}$. When the radar illuminates an area of a random surface containing many elementary scatterers, $\mathbf{k}$ can be modeled as a multivariate complex Gaussian vector with probability density

$$p(\mathbf{k}) = \frac{1}{\pi^{q}\,|\mathbf{C}|}\exp\!\left(-\mathbf{k}^{*T}\mathbf{C}^{-1}\mathbf{k}\right), \tag{2}$$

where $q = 3$ is the dimension of $\mathbf{k}$, $|\mathbf{C}|$ is the determinant of $\mathbf{C}$, and $\mathbf{C}$ is the complex covariance matrix

$$\mathbf{C} = E\!\left[\mathbf{k}\,\mathbf{k}^{*T}\right]. \tag{3}$$

The superscript $*T$ denotes the complex conjugate transpose, so by definition $\mathbf{C}$ is Hermitian. The real and imaginary parts of each element of $\mathbf{k}$ are assumed to follow a circular Gaussian distribution, an assumption that has been validated experimentally. Multilook polarimetric SAR processing averages $n$ independent one-look covariance matrices:

$$\mathbf{Z} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{k}(i)\,\mathbf{k}(i)^{*T}. \tag{4}$$

Let $\mathbf{A} = n\mathbf{Z}$; then $\mathbf{A}$ follows the complex Wishart distribution

$$p(\mathbf{A}) = \frac{|\mathbf{A}|^{\,n-q}\exp\!\left[-\mathrm{Tr}\!\left(\mathbf{C}^{-1}\mathbf{A}\right)\right]}{K(n,q)\,|\mathbf{C}|^{n}}, \tag{5}$$

where

$$K(n,q) = \pi^{0.5q(q-1)}\,\Gamma(n)\cdots\Gamma(n-q+1). \tag{6}$$
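As an illustration of formulas (1) and (4), the following Python sketch builds the Pauli-basis scattering vector (assuming the common $1/\sqrt{2}$ scaling) and forms an $n$-look matrix by averaging one-look outer products; with Pauli vectors the result is the coherency matrix used later in the paper. The helper names `pauli_vector`, `outer`, and `multilook` are hypothetical, and plain lists of complex numbers are used so that no external library is needed.

```python
import math
from typing import List, Sequence

Matrix = List[List[complex]]


def pauli_vector(s_hh: complex, s_hv: complex, s_vv: complex) -> List[complex]:
    """Pauli-basis vector (1/sqrt(2)) * [HH + VV, HH - VV, 2 HV] for a reciprocal medium."""
    c = 1.0 / math.sqrt(2.0)
    return [c * (s_hh + s_vv), c * (s_hh - s_vv), c * 2.0 * s_hv]


def outer(k: Sequence[complex]) -> Matrix:
    """One-look matrix k k^{*T}: element (i, j) equals k_i * conj(k_j)."""
    return [[ki * kj.conjugate() for kj in k] for ki in k]


def multilook(vectors: Sequence[Sequence[complex]]) -> Matrix:
    """n-look matrix Z = (1/n) * sum_i k(i) k(i)^{*T}, as in formula (4)."""
    n, q = len(vectors), len(vectors[0])
    z = [[0j] * q for _ in range(q)]
    for k in vectors:
        one_look = outer(k)
        for i in range(q):
            for j in range(q):
                z[i][j] += one_look[i][j] / n
    return z


# Usage: two looks of one pixel; the trace of Z equals the mean span |HH|^2 + 2|HV|^2 + |VV|^2.
looks = [pauli_vector(1.0 + 0j, 0.1j, 0.8 + 0j), pauli_vector(0.9 + 0.1j, 0.05 + 0j, 1.1 + 0j)]
z = multilook(looks)
print(sum(z[i][i].real for i in range(3)))
```

Because each one-look matrix is Hermitian, their average is Hermitian as well, which matches the property of $\mathbf{C}$ stated above.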

Wishart Distance.
Based on the Bayesian maximum likelihood classifier, Lee et al. derived a distance measure in 1994 [4]. This is the well-known Wishart distance between a pixel with sample covariance matrix $\mathbf{Z}$ and class $m$:

$$d(\mathbf{Z}, \mathbf{V}_m) = \ln|\mathbf{V}_m| + \mathrm{Tr}\!\left(\mathbf{V}_m^{-1}\mathbf{Z}\right), \tag{7}$$

where $\mathbf{V}_m$ denotes the class-center covariance matrix of class $m$, which can be estimated by averaging all the training samples of that class. The distance is used in discriminant analysis: for each pixel, the distance to every class is calculated, and the pixel is assigned to the class with the minimum distance. In unsupervised classification, the distance between classes is also needed, because it serves as the criterion for splitting or merging classes. The distance $D_{ij}$ between class $i$ and class $j$ is defined as

$$D_{ij} = \frac{1}{2}\left\{\ln|\mathbf{V}_i| + \ln|\mathbf{V}_j| + \mathrm{Tr}\!\left(\mathbf{V}_i^{-1}\mathbf{V}_j + \mathbf{V}_j^{-1}\mathbf{V}_i\right)\right\}. \tag{8}$$

The Wishart distance measure is simple to apply and effective for terrain and land-use classification, and it has two useful properties. First, it is independent of the number of looks, so it is applicable to multilook-processed or speckle-filtered polarimetric SAR data. Second, it is independent of the polarization basis: whether the data are in covariance-matrix or coherency-matrix format, it produces identical classification results.
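The following sketch evaluates the Wishart distance of formula (7) and the between-class distance of formula (8) for 3 × 3 Hermitian matrices, and assigns a pixel to the nearest class center. It is a minimal pure-Python illustration with hypothetical helper names (`det3`, `inv3`, `wishart_distance`, `class_distance`, `assign`); it assumes the class centers are positive definite so that their determinants are real and positive. A production implementation would normally rely on a numerical linear algebra library instead of hand-written 3 × 3 routines.

```python
import math
from typing import List, Sequence

Matrix = List[List[complex]]


def det3(a: Matrix) -> complex:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def inv3(a: Matrix) -> Matrix:
    """Inverse of a 3x3 matrix as adjugate / determinant."""
    d = det3(a)
    cof = [[0j] * 3 for _ in range(3)]
    for i in range(3):
        r0, r1 = [r for r in range(3) if r != i]
        for j in range(3):
            c0, c1 = [c for c in range(3) if c != j]
            cof[i][j] = (-1) ** (i + j) * (a[r0][c0] * a[r1][c1] - a[r0][c1] * a[r1][c0])
    return [[cof[j][i] / d for j in range(3)] for i in range(3)]  # transpose of cofactors


def trace_product(a: Matrix, b: Matrix) -> float:
    """Real part of Tr(A B); the trace is real when A and B are both Hermitian."""
    return sum(a[i][k] * b[k][i] for i in range(3) for k in range(3)).real


def wishart_distance(z: Matrix, v: Matrix) -> float:
    """Formula (7): d(Z, V_m) = ln|V_m| + Tr(V_m^{-1} Z)."""
    return math.log(det3(v).real) + trace_product(inv3(v), z)


def class_distance(vi: Matrix, vj: Matrix) -> float:
    """Formula (8): D_ij = 0.5 * (ln|V_i| + ln|V_j| + Tr(V_i^{-1} V_j + V_j^{-1} V_i))."""
    return 0.5 * (math.log(det3(vi).real) + math.log(det3(vj).real)
                  + trace_product(inv3(vi), vj) + trace_product(inv3(vj), vi))


def assign(z: Matrix, centers: Sequence[Matrix]) -> int:
    """Index of the class center with minimum Wishart distance to the pixel matrix Z."""
    return min(range(len(centers)), key=lambda m: wishart_distance(z, centers[m]))


def diag(x: float) -> Matrix:
    """Scaled 3x3 identity, used here only to build toy class centers."""
    return [[complex(x) if i == j else 0j for j in range(3)] for i in range(3)]


centers = [diag(1.0), diag(4.0)]
print(assign(diag(1.0), centers), assign(diag(4.0), centers))  # prints: 0 1
```

Note that $D_{ii}$ in formula (8) is not zero, so the between-class distance is meaningful only when comparing candidate pairs, which is exactly how it is used for merging.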

Four-Component Scattering Model and Revised Classification Method
The three-component scattering model of Freeman and Durden, which assumes reflection symmetry, is a helpful tool for interpreting the scatterers. However, for PolSAR images that include urban areas, the reflection symmetry condition does not hold. In this section, the three-component model is first analyzed. The well-established four-component Yamaguchi model is then presented, and finally the formulas for computing its components are given.

Analysis and Extension of Freeman Model.
For scenes that include urban areas, the reflection symmetry condition becomes invalid. Under such circumstances, the terms $\langle S_{HH}S_{HV}^{*}\rangle \neq 0$ and $\langle S_{VV}S_{HV}^{*}\rangle \neq 0$ must be taken into account; this situation is often called the nonreflection symmetry condition. In actual data, $\langle S_{HH}S_{HV}^{*}\rangle \neq 0$ and $\langle S_{VV}S_{HV}^{*}\rangle \neq 0$ are frequently observed, so the reflection symmetry assumption is inconsistent with the observed scattering phenomena. To adapt the decomposition to more general scattering scenes, a fourth term corresponding to these nonzero correlations is introduced: the helix scattering power. This term is close to zero for most natural scenes but is noticeable in urban areas. It arises from the scattering matrix of helices and is related to the complicated shapes of man-made structures, which dominate urban areas [11].
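To check the reflection symmetry assumption on real data, one can estimate the two cross-correlations over a local window of single-look samples. The sketch below (hypothetical function name `cross_correlations`; the window is assumed to be given as three equal-length lists of complex values) averages $S_{HH}S_{HV}^{*}$ and $S_{VV}S_{HV}^{*}$; under reflection symmetry both averages should be close to zero.

```python
from typing import Sequence, Tuple


def cross_correlations(s_hh: Sequence[complex], s_hv: Sequence[complex],
                       s_vv: Sequence[complex]) -> Tuple[complex, complex]:
    """Sample estimates of <S_HH S_HV^*> and <S_VV S_HV^*> over one window."""
    n = len(s_hh)
    hh_hv = sum(h * x.conjugate() for h, x in zip(s_hh, s_hv)) / n
    vv_hv = sum(v * x.conjugate() for v, x in zip(s_vv, s_hv)) / n
    return hh_hv, vv_hv


# Usage: a helix-like response (HH = 0.5, HV = 0.5j, VV = -0.5) gives clearly nonzero values.
print(cross_correlations([0.5 + 0j], [0.5j], [-0.5 + 0j]))
```

In practice the magnitudes would be compared against the co-polarized powers, since "close to zero" only has meaning relative to the overall signal level.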
The calculation of the volume scattering power also needs to be revised [11]. It should be based on the ratio of the backscattering magnitudes $\langle|S_{HH}|^{2}\rangle$ and $\langle|S_{VV}|^{2}\rangle$. In the original volume scattering model, the volume is represented by a cloud of randomly oriented dipoles whose orientation angle follows a uniform probability density. For vegetated areas, however, scattering from tree trunks and branches appears to exhibit a characteristic, nonuniform angular distribution.

Implementation of Four-Component Model.
It has been shown that applying the decomposition to the coherency matrix yields identical results [13]. The helix scattering power, denoted by $P_c$, is obtained by comparing the measured data with the expansion of the coherency matrix; its expression is given in formula (9). The volume scattering power is determined from the ratio of $\langle|S_{HH}|^{2}\rangle$ to $\langle|S_{VV}|^{2}\rangle$, as mentioned in Section 3.1. Let ratio $= 10\log_{10}\left(\langle|S_{HH}|^{2}\rangle/\langle|S_{VV}|^{2}\rangle\right)$. If the ratio is less than $-2$ dB, the volume scattering power $P_v$ is given by formula (10); if it lies between $-2$ dB and $2$ dB, by formula (11); and if it is greater than $2$ dB, by formula (12). The surface scattering power $P_s$ can then be determined from formula (13), and the double-bounce scattering power $P_d$ from formula (14). The intermediate variables needed to calculate $P_s$ and $P_d$ are given in formulas (15)-(17).

Negative powers must be handled in practical applications. Typical cases are $P_v < 0$, $P_v + P_c >$ total power, and $P_s < 0$ or $P_d < 0$. To overcome these problems, a simple constraint is applied so that all powers are nonnegative and do not exceed the total power. If $P_v < 0$, $P_c$ is set to zero. If $P_v + P_c$ exceeds the total power, the surface and double-bounce scattering powers are set to zero. If $P_s < 0$ or $P_d < 0$, the corresponding power is set to zero.
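Because formulas (9)-(17) are not reproduced here, the following sketch assumes the coherency-matrix expressions commonly used for the Yamaguchi four-component decomposition without orientation compensation, as we recall them: $P_c = 2|\mathrm{Im}\,T_{23}|$; a uniform-orientation volume model with $P_v = 4T_{33} - 2P_c$ when the ratio lies between $-2$ and $2$ dB; and a cosine-squared volume model with $P_v = (15/4)T_{33} - (15/8)P_c$ otherwise. These coefficients, the branch test on $C_0 = T_{11} - T_{22} - T_{33} + P_c$, recomputing $P_v$ after dropping the helix term, handing leftover power to the other component when $P_s$ or $P_d$ is negative, and the division-by-zero guard are all assumptions to be checked against the paper's formulas, not a transcription of them. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[complex]]
EPS = 1e-12


def _volume_terms(t11: float, t22: float, t33: float, t12: complex,
                  pc: float, ratio_db: float) -> Tuple[float, float, float, complex]:
    """Return (Pv, S, D, C) after removing the assumed volume and helix models."""
    if -2.0 <= ratio_db <= 2.0:
        # Assumed uniform orientation: T_vol = Pv/4 * diag(2, 1, 1)
        pv = 4.0 * t33 - 2.0 * pc
        return pv, t11 - pv / 2.0, t22 - pv / 4.0 - pc / 2.0, t12
    # Assumed cosine-squared orientation: T_vol = Pv/30 * [[15, +-5, 0], [+-5, 7, 0], [0, 0, 8]]
    pv = 3.75 * t33 - 1.875 * pc
    sign = 1.0 if ratio_db > 2.0 else -1.0  # HH-dominant -> +5, VV-dominant -> -5
    return pv, t11 - pv / 2.0, t22 - 7.0 * pv / 30.0 - pc / 2.0, t12 - sign * pv / 6.0


def yamaguchi4(t: Matrix) -> Dict[str, float]:
    """Four-component powers Ps, Pd, Pv, Pc of a 3x3 Pauli coherency matrix (sketch)."""
    t11, t22, t33 = t[0][0].real, t[1][1].real, t[2][2].real
    t12 = t[0][1]
    total = t11 + t22 + t33                 # total power (span)
    pc = 2.0 * abs(t[1][2].imag)            # assumed helix power, from Im(T23)

    # <|S_HH|^2> and <|S_VV|^2> recovered from the Pauli elements; ratio as defined in the text
    hh = 0.5 * (t11 + t22 + 2.0 * t12.real)
    vv = 0.5 * (t11 + t22 - 2.0 * t12.real)
    ratio_db = 10.0 * math.log10(max(hh, EPS) / max(vv, EPS))

    pv, s, d, c = _volume_terms(t11, t22, t33, t12, pc, ratio_db)
    if pv < 0.0:                            # constraint: drop the helix term and recompute
        pc = 0.0
        pv, s, d, c = _volume_terms(t11, t22, t33, t12, pc, ratio_db)
    if pv + pc > total:                     # constraint: no power left for Ps and Pd
        return {"Ps": 0.0, "Pd": 0.0, "Pv": total - pc, "Pc": pc}

    c0 = t11 - t22 - t33 + pc               # assumed test: surface vs double-bounce branch
    cc = abs(c) ** 2
    if c0 > 0.0 and s > EPS:
        ps, pd = s + cc / s, d - cc / s
    elif d > EPS:
        ps, pd = s - cc / d, d + cc / d
    else:
        ps, pd = s, d                       # degenerate case: nothing to redistribute
    if ps < 0.0:                            # constraint: negative surface power
        ps, pd = 0.0, total - pv - pc
    elif pd < 0.0:                          # constraint: negative double-bounce power
        ps, pd = total - pv - pc, 0.0
    return {"Ps": ps, "Pd": pd, "Pv": pv, "Pc": pc}
```

Under these assumed expressions the four powers sum to the total power whenever no constraint is triggered, which is a convenient sanity check when comparing against the paper's own formulas.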

Scheme for Unsupervised Classification.
Supervised classification outperforms unsupervised classification because it exploits a priori knowledge in the form of training samples. The method proposed by Lee et al. [10] adopts this core idea of supervised classification: it applies the Freeman decomposition and then classifies according to the Wishart distance, with the Freeman decomposition results serving as the initial training sets. Experiments have shown that this method performs well in applications. Owing to the intrinsic deficiency of the Freeman model, however, some errors arise when it is applied to actual data, notably in urban areas. Nevertheless, the core idea deserves to be carried forward, and we follow it in this paper. The proposed method uses the coherency matrix format; the equivalence of the covariance and coherency matrices has been proved in [5]. The steps of the proposed method are briefly as follows. (a) The Yamaguchi four-component decomposition is applied to the coherency matrix. (b) The results of step (a) are used as initial training sets, and all pixels are classified into several class sets. (c) Further clustering is performed according to the dominant power of each pixel, based on the Wishart distance introduced in Section 2.2; the clustering can be iterated if needed. (d) A color is assigned to each pixel according to its dominant power.
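Steps (a) and (b), detailed as step (1) in the list that follows, can be illustrated with a short sketch that labels each pixel with the top-level class of its largest scattering power and groups pixel indices by that label. The dictionary keys `Ps`, `Pd`, `Pv`, `Pc` and the helper names are hypothetical; ties are broken in the fixed order S, DB, V, H, which is an arbitrary choice.

```python
from typing import Dict, List, Sequence

# Top-level class label for each scattering power component (hypothetical key names).
TOP_LEVEL = {"Ps": "S", "Pd": "DB", "Pv": "V", "Pc": "H"}


def top_level_class(powers: Dict[str, float]) -> str:
    """Label of the dominant (largest) scattering power; ties go to the earlier key."""
    dominant = max(TOP_LEVEL, key=lambda name: powers[name])
    return TOP_LEVEL[dominant]


def group_by_top_level(pixel_powers: Sequence[Dict[str, float]]) -> Dict[str, List[int]]:
    """Indices of the pixels in each top-level class; these labels stay fixed afterwards."""
    groups: Dict[str, List[int]] = {label: [] for label in TOP_LEVEL.values()}
    for index, powers in enumerate(pixel_powers):
        groups[top_level_class(powers)].append(index)
    return groups


pixels = [
    {"Ps": 2.0, "Pd": 0.1, "Pv": 0.3, "Pc": 0.0},
    {"Ps": 0.2, "Pd": 1.5, "Pv": 0.4, "Pc": 0.1},
    {"Ps": 0.3, "Pd": 0.2, "Pv": 0.9, "Pc": 0.0},
]
print(group_by_top_level(pixels))  # {'S': [0], 'DB': [1], 'V': [2], 'H': []}
```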
The detailed steps of the proposed method are as follows.
(1) Applying the Yamaguchi decomposition to the coherency matrix of each pixel: each pixel is classified into a top-level class according to its dominant power, that is, the largest of its four scattering power components. There are four top-level classes in total (surface scattering, double-bounce scattering, volume scattering, and helix scattering), denoted by S, DB, V, and H, respectively. Importantly, the top-level class assigned to each pixel cannot be altered in the following steps; this requirement guarantees that the scattering mechanism of each pixel is preserved.
(2) Sorting the members of each top-level class by their dominant scattering power: all the members of each top-level class are divided into several second-level classes. For example, each top-level class can be divided into 25 second-level classes.
(3) Clustering the second-level classes: this clustering is based on the between-class distance defined in formula (8) and never crosses the boundary of the top-level class. For example, the second-level classes can be merged into a smaller, predefined number of second-level classes.
(4) Reassigning second-level classes for each pixel: for all the members of each top-level class, the Wishart distance to the second-level class centers is recalculated, and each pixel is reassigned by discriminant analysis based on the updated distances. This discriminant analysis can be iterated if needed.
(5) Assigning a color to each second-level class: the averaged dominant power of each second-level class is calculated, and the color shade of the class is determined by this averaged dominant power.
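Steps (2) to (5) can be sketched generically as below. Several details are assumptions not fixed by the text: sorted members are cut into near-equal bins, merging is greedy (always the closest pair), and the color shade is a linear function of the mean dominant power. The helper names are hypothetical. In practice `center_of` would return the mean coherency matrix of a class, `distance` would be formula (8), and `pixel_distance` would be formula (7); the usage below substitutes scalar stand-ins to keep the example short. The rule that clustering never crosses top-level classes is honored by calling these functions separately for the members of each top-level class.

```python
from typing import Callable, List, Sequence, TypeVar

Center = TypeVar("Center")


def split_by_dominant_power(members: Sequence[int], dominant: Sequence[float],
                            n_bins: int = 25) -> List[List[int]]:
    """Step (2): sort one top-level class by dominant power and cut it into near-equal bins."""
    order = sorted(members, key=lambda i: dominant[i])
    size = len(order)
    bins = [order[b * size // n_bins:(b + 1) * size // n_bins] for b in range(n_bins)]
    return [b for b in bins if b]  # empty bins occur when the class has fewer than n_bins pixels


def merge_until(groups: List[List[int]], target: int,
                center_of: Callable[[List[int]], Center],
                distance: Callable[[Center, Center], float]) -> List[List[int]]:
    """Step (3): repeatedly merge the closest pair of classes until `target` classes remain."""
    groups = [list(g) for g in groups]
    while len(groups) > target:
        centers = [center_of(g) for g in groups]
        pairs = [(a, b) for a in range(len(groups)) for b in range(a + 1, len(groups))]
        i, j = min(pairs, key=lambda ab: distance(centers[ab[0]], centers[ab[1]]))
        groups[i].extend(groups[j])
        del groups[j]  # j > i, so index i is still valid
    return groups


def reassign(members: Sequence[int], groups: List[List[int]],
             center_of: Callable[[List[int]], Center],
             pixel_distance: Callable[[int, Center], float]) -> List[List[int]]:
    """Step (4): move each pixel of the top-level class to its nearest second-level center."""
    centers = [center_of(g) for g in groups]
    new_groups: List[List[int]] = [[] for _ in groups]
    for p in members:
        best = min(range(len(centers)), key=lambda m: pixel_distance(p, centers[m]))
        new_groups[best].append(p)
    return new_groups


def shades(groups: List[List[int]], dominant: Sequence[float]) -> List[float]:
    """Step (5): shade in [0, 1] proportional to each class's mean dominant power (assumed)."""
    means = [sum(dominant[i] for i in g) / len(g) if g else 0.0 for g in groups]
    top = max(means) or 1.0
    return [m / top for m in means]


# Usage with scalar stand-ins for coherency matrices and Wishart distances.
dominant = [0.1, 0.5, 0.2, 0.9, 0.4, 0.3]
members = list(range(len(dominant)))


def mean_of(group: List[int]) -> float:
    return sum(dominant[i] for i in group) / len(group)


groups = split_by_dominant_power(members, dominant, n_bins=3)
groups = merge_until(groups, 2, mean_of, lambda a, b: abs(a - b))
groups = reassign(members, groups, mean_of, lambda p, c: abs(dominant[p] - c))
print(groups, shades(groups, dominant))
```

Step (4) can be repeated by feeding its output back into `reassign`, which corresponds to the optional iterative discriminant analysis.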
The flow chart of this scheme is presented in Figure 1.

Experiments
The probabilistic model, the Wishart distance of PolSAR data, and the four-component scattering model were presented in the preceding sections, followed by the unsupervised classification scheme. In this section, several experiments are performed to validate the performance of the proposed method. The AIRSAR dataset provided by the NASA Jet Propulsion Laboratory is used, and the Flevoland image is selected as the experimental sample. The image has 750 rows and 1024 columns, with a pixel size of 6.6 m in range and 12.10 m in azimuth, and it contains many cropland areas, roads, and water areas. The qualitative results are presented in Figure 2; speckle filtering of the image is not a prerequisite. The initial classification results using the Freeman model and those after 4 clustering iterations are shown in Figures 2(a) and 2(b), respectively. The initial classification results using the Yamaguchi model and those after 4 clustering iterations are shown in Figures 2(c) and 2(d), respectively. It can be easily seen that iterative clustering enhances the spatial details. The small differences between the two methods, however, cannot be discerned by visual inspection alone, so a quantitative evaluation is performed using the classification accuracy. The reference classification map is taken from [5]. The classification accuracy of a class is defined as the number of pixels assigned to that class divided by the corresponding number of pixels in the reference classification map. Ten areas are selected for measuring classification accuracy: stembean, forest, rapeseed, baresoil, potato, beet, pea, wheat, water, and lucerne. Results for six of these areas are listed in Table 1, where the results obtained with the Freeman and Yamaguchi models, denoted by FDD and New, respectively, are listed from top to bottom. Among these six areas, stembean and rapeseed have relatively low homogeneity, wheat has intermediate homogeneity, and forest, baresoil, and water have relatively high homogeneity. Table 1 shows that the accuracy measured in areas of low homogeneity is not a reliable reference, because these areas consist of pixels with mixed scattering mechanisms. In areas of relatively high homogeneity, such as baresoil, the vast majority of pixels are surface scattering pixels; the classification accuracy for this area increases from 75.02% (12194 pixels) to 80.05% (13012 pixels).

Conclusion
In this paper, the classic Freeman three-component model is first analyzed. The Yamaguchi four-component model, an upgraded version of the Freeman model, is introduced into unsupervised classification. The Wishart distance is also employed in this unsupervised classification for initial discriminant analysis and iterative clustering. The complete

Figure 1: The flow chart of unsupervised classification.

Figure 2: Initial classification results and those after clustering. (a) Initial classification results using the Freeman model. (b) Classification results after 4 clustering iterations. (c) Initial classification results using the Yamaguchi model. (d) Classification results after 4 clustering iterations.
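The per-class classification accuracy used in the experiments is a simple ratio, and the sketch below computes it from predicted and reference label lists (the function name and label values are hypothetical). As an arithmetic cross-check on the baresoil figures, both reported values imply the same reference area of about 16,254 pixels (12194 / 0.7502 ≈ 16,254 and 13012 / 0.8005 ≈ 16,255), which is consistent with reading the quoted pixel counts as correctly classified pixels.

```python
from typing import Hashable, Sequence


def classification_accuracy(predicted: Sequence[Hashable], reference: Sequence[Hashable],
                            label: Hashable) -> float:
    """Fraction of reference pixels of `label` that the classifier also labels `label`."""
    ref_pixels = [i for i, r in enumerate(reference) if r == label]
    if not ref_pixels:
        raise ValueError("label does not occur in the reference map")
    hits = sum(1 for i in ref_pixels if predicted[i] == label)
    return hits / len(ref_pixels)


print(classification_accuracy(["S", "S", "V", "S"], ["S", "S", "S", "V"], "S"))
# Reported baresoil counts over the implied reference area of about 16,254 pixels.
print(round(100 * 12194 / 16254, 2), round(100 * 13012 / 16254, 2))
```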