Improvement of Sentinel-1 Remote Sensing Data Classification by DWT and PCA

This article presents a new alternative data resource, obtained by applying Principal Components Analysis (PCA) or the Discrete Wavelet Transformation (DWT) to the VV and VH polarization images of the Sentinel-1 radar satellite, with the aim of improving data classification. The study area is the Houareb site, located in the city of Kairouan in central Tunisia. In addition to Sentinel-1 data, field truth data and the Euclidian Minimum Distance (EMD) criterion were used for classification and validation. Energy descriptors are proposed in this study for classification. Cross validation was used to evaluate the classification results. The best classification result was achieved using the DWT method applied to the VH and VV images, with an Overall Accuracy (OA) of 0.671 and 0.548, respectively, against OA values of 0.371 and 0.449 when the PCA method and the Minimum Distance (MDist) classifier, respectively, were applied to the dual (VV; VH) polarization. The DWT transformation gives the highest Kappa Precision Coefficient (KPC) of 0.8.


Introduction
Sentinel-1 satellite images in Interferometric Wide (IW) swath mode at level 1 Ground Range Detected (GRD) have been widely used in recent years to identify the Land Cover (LC). Makinde and Oyelade [1] established the LC map of the Lagos site, located in Nigeria, by applying the maximum likelihood method to both the VV and VH polarization GRD_IW Sentinel-1 images and achieved an OA of 0.757 and a KPC of 0.719. Abdikan et al. [2] combined VV and VH polarization images according to eight scenarios and used the Support Vector Machine (SVM) classifier to map the LC in the urban area of the megacity of Istanbul, in Turkey. Results showed a higher OA value of 0.933, instead of 0.739 when the VV polarization is used. Balzter et al. [3] used the full polarizations and the digital elevation data of the SRTM to provide the LC maps of the Thüringen region located in Germany. Four classified images were produced using the Random Forest (RF) method. The results were evaluated against the 2006-CORINE map. The OA and KPC values were equal to 0.684 and 0.63, respectively. Various approaches have used the synergy between remotely sensed optical and radar data to produce LC maps. Recently, Bousbih et al. [4] used the classical RF and SVM classifiers with different indicators, derived from the Sentinel constellations (Sentinel-1 and Sentinel-2), to generate the clay content map of a semiarid LC located in central Tunisia. The moisture indicator was derived from the soil moisture maps over different periods. These maps were obtained by combining the Synthetic Aperture Radar (SAR) and the optical data in the Water Cloud Model (WCM) and in the Integral Equation Model (IEM). Gao et al. [5] used Sentinel-1 and Sentinel-2 data fusion over a site in Urgell (Catalunya, Spain) to retrieve the soil moisture by using the change detection approach and the backscattered NDVI Sentinel-1 radar index.
The synergistic use of optical and radar data has been helpful for LC discrimination and identification. However, full polarimetric radar data and optical data were not freely available at the same time. The freely accessible IW_GRD Sentinel-1 data are available in VV and VH polarizations only, as is the case for Tunisia. Sentinel-1 and Sentinel-2 data can be freely downloaded from the Copernicus web site (https://scihub.copernicus.eu/). Unfortunately, using this dual polarization (VV; VH) alone yields poor OA and KPC values for Sentinel-1 data classification. For this reason, we propose to apply the PCA or DWT orthogonal transformations to the VV and VH Sentinel-1 images in order to improve the LC classification.
The orthogonal transformations can lead to a spatial or a frequency analysis, depending on the nature of the transformation [6]. When a Finite Impulse Response (FIR) filter is used to filter the satellite image, a frequency analysis can be made; the new components resulting from the filtering then carry energy information. The new components resulting from a spatial transformation are obtained by a pixel projection procedure, such as the PCA and the Independent Component Analysis (ICA). These methods use the interband covariance matrix to build a new space representation in which the bands are uncorrelated.
The PCA transformation is widely used in feature extraction methods. Alons and Malpica [7] used the PCA to obtain pan-sharpened remote sensing images from the multispectral bands in order to perform automatic classification, while Wang and Wang [8] developed a region-based unsupervised segmentation by adaptively combining the texture and the spectral distributions using the PCA method. Singh and Kaur [9] used the PCA to verify whether the reduced feature sets for both the water and the urban area coverage in a SAR image database are the same as those obtained by the GLCM (Gray Level Cooccurrence Matrix) and the GLRLM (Gray Level Run Length Matrix).
In the literature, the DWT has been applied to SAR data (i) for noise filtering of the Sentinel-1 data [10], (ii) for preclassification change detection from the Sentinel-1 multipolarized images [11], (iii) to retrieve the wind direction from a series of VV Sentinel-1 images [12], (iv) for image classification [13], and (v) for parameterizing feedforward neural networks to improve remote sensing LC identification [14]. In this context, this paper shows the strength of the DWT components of the Sentinel-1 data, compared to the spatial PCA method, in a classification procedure. Classification accuracies are based on the field measurements acquired over the Kairouan plain study site, located in central Tunisia (Lat. 35°40.686′ N, Long. 10°5.7798′ E).
During the last two decades, this study area has been a subject of interest for many researchers. The most recent study is that of Bousbih et al. [4], who generated the clay content map of the plain using the Sentinel-1 and the Sentinel-2 data. Bousbih et al. [15] used the multidate VV and VH Sentinel-1 measurements in synergy with Landsat-8 optical data to evaluate the soil characteristics (moisture and roughness) and the vegetation parameters of the same site. Zribi et al. [16] used the ENVISAT ASAR radar data (C-band) in synergy with the SPOT-5 optical images. Zribi et al. [17] developed, for the same site, a moisture model of bare soils by combining the Envisat ASAR and the TerraSAR-X multispectral (L, C, and X) data, which were acquired together with in situ measurements of the soil moisture content and the ground surface roughness. Other studies were carried out to map the Kairouan plain LC by using only optical sensors [18]. So far, there is no satisfactory classified map for this site.
This paper aims to provide a Sentinel-1 classified image of the Kairouan plain by using the above-mentioned methods. Our paper is organized as follows: in Section 2, the studied area is described; the ground measurements and the database are presented and the two proposed procedures are described. Section 3 presents the results and the discussion. Finally, conclusions are presented in the last section.

Study Site.
The study region is located in the southwest of Kairouan, as presented in Figure 1, between latitudes 35° and 35°45′ N and longitudes 9°30′ and 10°15′ E, in the center of Tunisia. The study area was chosen as an experimental area by the SIDA FAO project. The region is characterized by a semiarid climate, and it is considered a plain area. In the Kairouan plain, agriculture is the main economic activity. It includes animal husbandry, cereals and vegetable crops, and arboriculture (almond, apricot, citrus, and olive), dominated mainly by olive plantations.

GCP Measures.
A campaign of Ground Control Points (GCP) of LCs was carried out from 10 to 12 June 2019 on the study area by using the free ODK Briefcase Android application (https://odk-demo.readthedocs.io/en/latest/briefcaseinstall.html). The accuracy of the GCP is 14.53 meters. The GPS point data were retrieved on a computer and processed in the QGIS 3.4.6 software to obtain the georeferenced LC map, which represents the GCPs centered in polygons of each LC species. Seven LC species were identified on this site, as presented in Table 1 and Figure 2: fruit trees, cereals, fallow, bare soil, soil covered with straws, olive trees, and vegetable crops. The urban area, daim, and wadi were visited during the campaign and then localized on the TCI Sentinel-2B subimage of the study area, as presented in Figure 1. The SRTM data were used to extract the Digital Elevation Model (DEM) of the study area (Figure 3(a)). The SRTM data can be freely downloaded from the website http://edcsns17.cr.usgs.gov/EarthExplorer/.
The north and north-west of the study area contain a mountain chain. The DEM map was thresholded so that pixels with slope values >15% were set to gray level 255 and the others to 0, as presented in Figure 3(b). Using the slope map, we masked the relief areas of the study area in black (as presented in Figure 3(c)).
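The slope thresholding described above can be sketched as follows. This is a minimal illustration and not the authors' processing chain: the function name `slope_mask`, the finite-difference slope estimate, and the 30 m pixel size default are assumptions made for the example.

```python
import numpy as np

def slope_mask(dem, pixel_size_m=30.0, threshold_pct=15.0):
    """Return a uint8 mask: 255 where the slope exceeds threshold_pct, else 0.

    The slope is estimated by finite differences (an assumption; GIS tools
    may use a different neighbourhood operator).
    """
    dz_dy, dz_dx = np.gradient(dem.astype(float), pixel_size_m)
    slope_pct = 100.0 * np.hypot(dz_dx, dz_dy)  # rise over run, in percent
    return np.where(slope_pct > threshold_pct, 255, 0).astype(np.uint8)

# Toy DEM (hypothetical values): flat on the left, a steep ramp on the right.
dem = np.zeros((4, 6))
dem[:, 3:] = np.arange(1, 4) * 30.0  # elevation rises 30 m per 30 m pixel
mask = slope_mask(dem)
```

Pixels set to 255 in `mask` would then be painted black in the image to be classified, which corresponds to the relief masking step.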

Sentinel-1 Data and Processing.
Sentinel-1 data dated 05 June 2019 were downloaded from the ESA's Copernicus Open Access Hub (https://scihub.copernicus.eu/). No LC change was detected between this date and the campaign date. The Sentinel-1 satellite provides the level 1 GRD products in IW mode in descending and ascending orbits, at an angle of incidence between 39 and 40 degrees and in the C band frequency, with a spatial resolution of 10 meters in dual polarization (VV and VH). The Sentinel-1 data was preprocessed using the Sentinel Application Platform (SNAP) as follows: (i) the calibration step converts the signal recorded by the sensor in the form of digital numbers into backscatter coefficients; (ii) the geometric terrain correction (georeferencing) step corrects the geometric distortions by using the Digital Elevation Model (DEM) provided by NASA's Shuttle Radar Topography Mission (SRTM); (iii) the thermal noise removal step uses the Lee filter.
The subimages as shown in Figures 4(a) and 4(b) were extracted from the VH and VV polarization images; they cover the region of interest (as presented in Figure 1). The size of each image is 3227 pixels in line by 2917 pixels in column and is coded on 16 bits. The VH and VV subimages have a covariance of 733.5. Thus, the LC similarities included in the polarization images reduce the accuracy of the classification. We propose to reduce this correlation between bands by the use of PCA or DWT transformations.

Classifier Algorithms.
Classification methods which use pixel values or distribution functions give low OA values because of the cross-correlation between the pixel values of the LCs [19]. The PCA and the DWT methods transform the initial images into a set of new images which are uncorrelated with each other, and the LCs are uncorrelated in the new space representation. The contribution of these transformations is to reduce the LC correlation between the VV and VH bands and thus to increase the LC-OA value of the classified image.
In this paper, DWT and PCA transformations were applied to the Sentinel-1 data, and the obtained pansharpening images of the study area were classified according to the EMD criterion (Figure 5). Results were compared to those obtained with the MDist method in order to assess the contribution of orthogonal transformations to the improvement of the classification results.
DWT Description.
DWT is a multiresolution approach. The bidimensional DWT is achieved by implementing a bank of single-dimensional filters (filters designed from a Mother Function (MF)), which are the low-pass, h(x), and high-pass, g(x), analysis filters. For one level of redundant DWT decomposition, as presented in Figure 6, the image is decomposed into three detail images (LH, HL, and HH), corresponding to distinct frequency bands of the image, and into the LL subband, which is the low-pass-filtered version of the image. The LL subband is further decomposed in the same manner at the second decomposition level [20,21]. The four subbands have the same size as the original image in the case of the redundant DWT decomposition. We applied the redundant DWT to the VV and VH Sentinel-1 images separately at one level of decomposition and obtained three detail subbands and an LL band for each image.
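To make the one-level redundant decomposition concrete, here is a minimal sketch in NumPy. It uses the two-tap Haar filters with circular boundary handling purely for brevity; this is an assumption for illustration, since the study selects other mother functions, and a wavelet library would normally be used for longer filters. The function name `redundant_haar_dwt2` and the subband naming order are likewise illustrative.

```python
import numpy as np

def _low(a, axis):
    # Averaging (low-pass) Haar filter, no downsampling, circular boundary.
    return (a + np.roll(a, -1, axis=axis)) / 2.0

def _high(a, axis):
    # Differencing (high-pass) Haar filter, no downsampling, circular boundary.
    return (a - np.roll(a, -1, axis=axis)) / 2.0

def redundant_haar_dwt2(image):
    """One level of undecimated 2-D Haar DWT.

    Returns LL, LH, HL, HH, each with the same shape as the input.
    Naming assumption: first letter = filter along rows (axis 1),
    second letter = filter along columns (axis 0).
    """
    x = image.astype(float)
    lo_rows, hi_rows = _low(x, 1), _high(x, 1)
    ll, lh = _low(lo_rows, 0), _high(lo_rows, 0)
    hl, hh = _low(hi_rows, 0), _high(hi_rows, 0)
    return ll, lh, hl, hh

# Synthetic stand-in for a VH subimage (hypothetical data).
rng = np.random.default_rng(0)
vh = rng.normal(size=(8, 8))
LL, LH, HL, HH = redundant_haar_dwt2(vh)
```

With this normalization the four subbands sum back to the input image, which gives a quick sanity check that no information is lost at this level.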

PCA Transform.
To apply the PCA to the Sentinel-1 data (composed of the two bands VV and VH), we first compute the covariance matrix between the VV and VH images and then project each pixel of the Sentinel-1 image from the VV and VH space representation into the new space CP1 and CP2, respectively, by using a transform matrix composed of the eigenvectors of the covariance matrix [6]. Figure 7 shows the new CP1 and CP2 components of the Sentinel-1 images of the study area.
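The two steps above (covariance matrix, then projection onto its eigenvectors) can be sketched as follows. The function name `pca_two_bands`, the mean-centering of the bands, and the synthetic input data are assumptions made for the example.

```python
import numpy as np

def pca_two_bands(vv, vh):
    """Project the (VV, VH) pixel pairs onto the covariance eigenvectors.

    Returns CP1, CP2 (same shape as the inputs) and the eigenvalues,
    sorted so that CP1 carries the largest variance.
    """
    pixels = np.stack([vv.ravel(), vh.ravel()]).astype(float)  # shape (2, N)
    centered = pixels - pixels.mean(axis=1, keepdims=True)
    cov = np.cov(centered)                   # 2 x 2 interband covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending variance
    components = eigvecs[:, order].T @ centered
    cp1 = components[0].reshape(vv.shape)
    cp2 = components[1].reshape(vv.shape)
    return cp1, cp2, eigvals[order]

# Synthetic, strongly correlated bands standing in for VV and VH.
rng = np.random.default_rng(1)
vv = rng.normal(size=(16, 16))
vh = 0.8 * vv + 0.2 * rng.normal(size=(16, 16))
cp1, cp2, variances = pca_two_bands(vv, vh)
```

The covariance between `cp1` and `cp2` is zero up to rounding, which is the decorrelation property the classification procedure relies on.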

LC Descriptors.
The LC descriptors used are the energy and the L1 norm [6,22], given by formulas (1) and (2). These descriptors are computed in a Region Of Interest (ROI), where (l, m) are the coordinates of a pixel in the pansharpening image D and M is the ROI size, given in pixels.
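Since formulas (1) and (2) are not reproduced in this text, the following sketch shows one common form of the two descriptors, with the sums over the ROI divided by the ROI size M. That normalization, and the function names `energy` and `l1_norm`, are assumptions; the original formulas may differ by a constant factor.

```python
import numpy as np

def energy(roi):
    """Mean squared pixel value over the ROI (assumed form of the energy)."""
    values = roi.astype(float)
    return float(np.sum(values ** 2) / values.size)

def l1_norm(roi):
    """Mean absolute pixel value over the ROI (assumed form of the L1 norm)."""
    values = roi.astype(float)
    return float(np.sum(np.abs(values)) / values.size)

# Toy 2 x 2 ROI taken from a hypothetical detail subband D.
roi = np.array([[1, -2], [3, -4]])
e = energy(roi)    # (1 + 4 + 9 + 16) / 4
n1 = l1_norm(roi)  # (1 + 2 + 3 + 4) / 4
```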

Classification Procedure.
The EMD criterion was used to classify the Sentinel-1 data of the study area after applying the DWT or PCA methods, as shown in Figures 5(a) and 5(b), respectively. In the classification procedure, the EMD represents the minimum Euclidian distance between the Local Descriptor Vector (LDV) and the LC energy Descriptor Vectors (LC_DV). The LDV (given by (3) in the case of DWT and by (4) in the case of PCA) was computed for each ROI on the pansharpening images according to formulas (1) and (2), for each position of the ROI Scanning Window (ScW), which has a size of k × k. The scanning shift is equal to 1, and the classification result is a labeled image.
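The scanning-window classification can be sketched as below. Because formulas (3) and (4) are not reproduced here, the composition of the LDV is an assumption: it stacks the mean energy and the mean absolute value of every band inside the window. The reflect padding at the image border and the names `local_descriptor` and `classify_emd` are also illustrative choices.

```python
import numpy as np

def local_descriptor(window_stack):
    """Assumed LDV: per-band energy followed by per-band L1 norm."""
    flat = window_stack.reshape(window_stack.shape[0], -1).astype(float)
    return np.concatenate([np.mean(flat ** 2, axis=1),
                           np.mean(np.abs(flat), axis=1)])

def classify_emd(bands, class_vectors, k=3):
    """Label each pixel with the class whose LC_DV is nearest to the LDV.

    bands: array (B, H, W) of transformed images (subbands or components).
    class_vectors: array (C, 2B), one reference descriptor vector per class.
    The k x k window moves with a shift of 1 pixel.
    """
    pad = k // 2
    padded = np.pad(bands, ((0, 0), (pad, pad), (pad, pad)), mode="reflect")
    _, height, width = bands.shape
    labels = np.empty((height, width), dtype=int)
    for i in range(height):
        for j in range(width):
            ldv = local_descriptor(padded[:, i:i + k, j:j + k])
            distances = np.linalg.norm(class_vectors - ldv, axis=1)
            labels[i, j] = int(np.argmin(distances))
    return labels

# Toy single-band image: left half 0, right half 10 (hypothetical values).
band = np.zeros((1, 4, 6))
band[0, :, 3:] = 10.0
references = np.array([[0.0, 0.0],      # class 0: energy 0, L1 norm 0
                       [100.0, 10.0]])  # class 1: energy 100, L1 norm 10
labels = classify_emd(band, references, k=3)
```

The double loop keeps the sketch readable; for an image of several million pixels a vectorized or tiled implementation would be preferable.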

[Figure: DWT decomposition of the image along rows and columns into the LL, LH, HH, and HL subbands]

For each LC class (Table 1), an LC_DV was attributed (Table 2). We considered 2/3 of these LC polygons as training samples, and the remaining 1/3 were used to validate the image classification.
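The 2/3 training and 1/3 validation partition of the polygons could be drawn as in the following sketch. The text does not state how the polygons were selected, so the seeded random shuffle and the function name `split_polygons` are assumptions; in practice the split would be applied separately to the polygons of each LC class.

```python
import random

def split_polygons(polygon_ids, train_fraction=2 / 3, seed=0):
    """Shuffle the polygon identifiers and split them into two groups."""
    ids = list(polygon_ids)
    random.Random(seed).shuffle(ids)  # seeded, so the split is reproducible
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# Thirty hypothetical polygon identifiers for one LC class.
train_ids, valid_ids = split_polygons(range(30))
```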
Journal of Sensors

The precision (Pr) and the true-positive rate (R) of a class are given by (5) and (6), where TP and FP are the numbers of true-positive and false-positive predictions for a considered class. FP is the sum of the values in the corresponding column (excluding the TP), and FN for a class is the sum of the values in the corresponding row (excluding the TP). The Kappa coefficient (KPC) is also a measure used in the assessment of remote sensing data classification. KPC is computed from the nonnormalized confusion matrix (Table 3). In this case, the coefficients X_ij of the confusion matrix represent the number of pixels of LC i that are assigned to class j. KPC is defined using equation (7), where p_e is the expected agreement ratio (8) and oa is the observed agreement (9).
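Equations (5) to (9) are not reproduced in this text, so the sketch below uses the standard definitions that match the description: Pr = TP / (TP + FP), R = TP / (TP + FN), and KPC = (oa - p_e) / (1 - p_e). These forms, the function names, and the example matrix are assumptions for illustration. Rows index the true LC and columns the assigned class, as stated for X_ij.

```python
import numpy as np

def precision_recall(cm, c):
    """Pr and R of class c from a nonnormalized confusion matrix."""
    tp = cm[c, c]
    fp = cm[:, c].sum() - tp  # column sum, excluding the TP
    fn = cm[c, :].sum() - tp  # row sum, excluding the TP
    return tp / (tp + fp), tp / (tp + fn)

def overall_accuracy(cm):
    """Observed agreement: fraction of pixels on the diagonal."""
    return np.trace(cm) / cm.sum()

def kappa(cm):
    """Kappa coefficient from observed and expected agreement."""
    n = cm.sum()
    oa = np.trace(cm) / n
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    return (oa - p_e) / (1.0 - p_e)

# Toy two-class confusion matrix (hypothetical pixel counts).
cm = np.array([[8, 2],
               [1, 9]])
pr0, r0 = precision_recall(cm, 0)  # 8/9 and 8/10
oa = overall_accuracy(cm)          # 17/20
kpc = kappa(cm)                    # (0.85 - 0.5) / (1 - 0.5)
```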

Results and Discussions
The results show that, with the DWT method, a single decomposition level is enough to achieve a satisfactory accuracy assessment for all LC classes. We could identify nine LC classes among the ten observed classes with an R value (6) above 0.65, as presented in Table 4. However, the best R value achieved for the C3 class (soil covered with straws), which is not well identified by the DWT procedure, is equal to 0.44. As seen in Table 4, the best results for each LC class depend on a specific choice of the DWT MF, the polarization, and the ScW size (k × k). Thus, using only the VH polarization (resp., VV), we can identify well the classes C2, C4, C5, C6, C7, C9, and C10 (resp., C1, C4, and C8). The ScW size is equal to 9 or 11, and the OA values range from 0.474 to 0.671 in these cases. The OA and R give good accuracy assessments, but some classes such as C2, C6, C8, and C10 (resp., C1 and C8), for which R > 0.6 and Pr < 0.58 (5), present a risk of confusion with other classes when the VH polarization (resp., the VV polarization) is used. To avoid this risk, Pr must converge to 1 (high R value and low FP value). Table 5 illustrates the eight classes for which Pr > 0.5 with (R > 0.57, FP < 47%). With these restrictions, the R value for some classes (C2, C6, and C8) can be lower than those given in Table 4. The classes C3 and C10 are not identified under this restriction. The classes C1, C4, C5, C6, C7, C8, and C9 give the highest R values under the restriction when the VH polarization is used. All the classes, except C3 and C10, can be well identified by using the MF named 'coiflet5', the VH polarization, and k equal to 11. The OA and the KPC values are equal to 0.671 and 0.8, respectively, in this case (Table 4, Table 6). Figure 8(a) (resp., Figure 8(b)) illustrates the VH (resp., VV) classified image, for k equal to 11 and MF 'coiflet5' (resp., MF 'sym2').
During the field campaign in June, we noticed that the wadi's soils (C4) were dry and dotted with straws, so they can be considered as bare soil. In addition, the farmers had begun to harvest grain. In light of these findings, we merged the classes C3 and C4 in the classification procedure. The results show that we can achieve an R value for these merged classes equal to 0.97 by using the MF 'Db45', the VV polarization, and k equal to 5.
The results obtained by the PCA classification procedure (Figure 5(b)) are not satisfactory (Table 7). The OA and KPC values do not exceed 0.371 and 0.336, respectively (Table 6). The results show that only the classes C2 and C9 reach a TP > 50%, but with an FP > 50% (87.57% and 89.12%, respectively). These classes have a high risk of class confusion (Table 8). Thus, the DWT method gives better results than the PCA and MDist methods for all classes, with lower FP values.

Conclusion
In this article, we present the potential of the Discrete Wavelet Transform (DWT) to retrieve efficient subbands for the classification of Sentinel-1 images (VH and VV). The proposed classification approach was tested on the Kairouan plain, located in central Tunisia. The Land Cover (LC) of this semiarid region presents a diversity of classes: fruit trees, cereals, fallow, bare soil, soil covered with straws, olive trees, vegetable crop, urban area, daim, and wadi. The proposed LC descriptors (energy, norm L1) and the Euclidian Minimum Distance criterion were used to classify the Sentinel-1 data of the study area. Classification results were assessed using the ground truth data. As shown in our study, the DWT approach gives better classification results than the Principal Components Analysis approach and the minimum distance classification. The richer representation of the data provided by the DWT improves the OA from 0.371 to 0.671 and the KPC from 0.336 to 0.8.
In the future, we plan to use features extracted from the DWT pansharpening images as inputs for machine learning, in order to automate the classification process and reduce the time needed for LC mapping.

R: True-positive rate of the considered class
TP: True-positive prediction for a considered class
FP: False-positive prediction for a considered class

Data Availability
The Shuttle Radar Topography Mission (SRTM) data was used to extract the Digital Elevation Model (DEM) at 30 m of the study area. The SRTM data can be freely downloaded from the website http://edcsns17.cr.usgs.gov/EarthExplorer/, and the Sentinel-1 data from the ESA's Copernicus Open Access Hub (https://scihub.copernicus.eu/). The Sentinel-1 data was preprocessed using the Sentinel Application Platform (SNAP) as follows: (i) the calibration step converts the signal recorded by the sensor in the form of digital numbers into backscatter coefficients; (ii) the geometric terrain correction (georeferencing) step corrects the geometric distortions by using the Digital Elevation Model (DEM) provided by NASA's Shuttle Radar Topography Mission (SRTM); (iii) the thermal noise removal step uses the Lee filter.

Conflicts of Interest
The authors declare no conflict of interest.