A Comparative Study of Land Cover Classification by Using Multispectral and Texture Data

The main objective of this study is to assess the importance of a machine vision approach for the classification of five types of land cover data: bare land, desert rangeland, green pasture, fertile cultivated land, and Sutlej river land. A novel spectra-statistical framework is designed to classify these land cover types accurately. Multispectral data of these land covers were acquired with a handheld multispectral radiometer in five spectral bands (blue, green, red, near infrared, and shortwave infrared), while texture data were acquired with a digital camera by transforming each acquired image into 229 texture features. The 30 most discriminant features of each image were obtained by combining three statistical feature selection techniques: Fisher, Probability of Error plus Average Correlation, and Mutual Information (F + PA + MI). Clustering of the selected texture data was verified by nonlinear discriminant analysis, while linear discriminant analysis was applied to the multispectral data. For classification, the texture and multispectral data were deployed to an artificial neural network (ANN: n-class). With an 80-20 cross validation method, we obtained an accuracy of 91.334% for texture data and 96.40% for multispectral data.


Introduction
Image processing and remote sensing play a vital role in the improvement of agriculture [1]. With this technology, a vast land cover area can be classified into different categories [2]. This is helpful not only for the socioeconomic sector but also for meeting the future needs of sustainable development. In the twenty-first century, the world is facing the challenges of hunger, food shortage, and poverty [3]. These issues can be addressed by increasing crop production and making better use of cultivated land. Land cover information is necessary for policy making, planning, and management purposes, including land records of forests, deserts, farmland, and wetlands as well as other biophysical resources. Researchers are trying to bring the benefits of technology to the agriculture field [4]. Efforts are being made to extend the cultivated land area and to monitor the land through field surveys [5]. Such surveys require considerable time and expensive labor. In developing countries like Pakistan, it is very difficult to spend such resources on these projects. Directly or indirectly, almost 50% of the population of these countries is associated with agriculture [6]. All of the preceding issues highlight the importance of proper land management and better crop growth and production. According to its geographical distribution, the country is categorized into different land cover types such as barren, fertile, rocky, and sandy. In Pakistan, the conventional field-based survey system could not be properly managed because of financial and technical limitations. For the same reason, remote sensing technology has not yet been used for natural resource management, as was proposed by the relevant professionals [7].
BioMed Research International
Many researchers used this technology for better resource management; for example, a two-layer conditional random field (CRF) model was proposed for land cover and land use classification [8]. Similarly, a multilayer conditional random field (MCRF) land classification model was suggested. It was used for multitemporal with multiscale remote sensing data [9]. A gray level cooccurrence matrix with different window size images was used to find the four land types of aerial data. Different statistical features, that is, dissimilarity, homogeneity, angular second moment, and entropy, were calculated to classify the data [10]. A supervised pixel-based classification algorithm was used by implementing Markov Random Field (MRF) technique to distinguish the agriculture land cover area (cropland and grassland). It gave the satisfactory results for updating in GIS database for the cropland and grassland region [11]. A new idea of image spectroscopy (IS) and near-infrared spectroscopy (NIRS) was presented by [12] and it predicted that in the future it will be potentially used in many disciplines like geology, environmental sciences, precision agriculture, urban development, water and soil sciences, and so forth. The objective of this study is to design a simple, concise, and robust framework to classify the five types of land cover data in an absolute natural environment by using spectral and texture features. To accomplish this study, the procedural steps of data collection, image preprocessing, feature extraction, feature selection, feature reduction, and classification are employed for this classification framework.

Study Area
This study uses image processing and remote sensing for land cover classification in place of conventional field surveys. It is conducted in the Bahawalpur division of Punjab province (Pakistan), which covers an area of 45,588 square kilometers and is the largest division of the province by area; it is located at 29°23′44″ N and 71°41′1″ E and is shown in Figure 1. The study focuses on land cover assessment, management, and classification through photographic and multispectral radiometric data of this area, which is mostly barren and desert rangeland. It will also help to monitor land cover changes and to estimate the biomass of land vegetation, which is used for forecasting the yield of different crops.

Material and Methods
In this study, two types of data are acquired: (1) photographic data for texture features and (2) radiometric data for remote sensing. Remote sensing data are acquired with a multispectral radiometer (MSR5, CROPSCAN). It is a handheld device that provides data equivalent to the satellite Landsat 5 TM (Thematic Mapper). The MSR5 offers an alternative way of acquiring remote sensing data where satellite or radar datasets are not easily available. Its output comprises five spectral bands covering the visible (blue, green, and red), near-infrared, and shortwave infrared ranges from 450 nm to 1750 nm. Photographic data are acquired with a digital camera. This study is based on the analysis of five types of land cover datasets: bare land, desert rangeland, fertile cultivated land, green pasture, and Sutlej river land.

3.1. Photographic Data and Image Acquisition.
The five land cover plots mentioned above each have an area of 43,560 square feet (1 acre). Digital photographs of bare land, desert rangeland, fertile cultivated land, green pasture, and Sutlej river land are taken with a Nikon COOLPIX digital camera with a 10.1-megapixel resolution and are shown in Figure 2. For each land cover type, 15 color images with dimensions of 4288 × 3216 pixels and 24-bit depth are acquired in JPG format. To enlarge the dataset, four nonoverlapping regions of interest (ROIs) of size 512 × 512 are taken from each image, so that a total of 75 × 4 = 300 subimages are available for the analysis. The photographic data are acquired at a height of 5 feet above the ground surface at the same locations where the radiometric data are acquired. To avoid shadow effects, the data are acquired around noontime (1:00 pm to 3:00 pm) under a clear sky. At the time of data acquisition, the light intensity is measured with a digital Luxmeter (MS 6610, MATECH) and is reported in Table 1.
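The ROI arithmetic above can be illustrated with a minimal sketch. The placement of the four regions (a centered 2 × 2 grid) is an assumption made only for illustration; the text states only that the ROIs are 512 × 512 and nonoverlapping. The function name `roi_boxes` is hypothetical.

```python
def roi_boxes(width, height, size=512, grid=2):
    """Return (left, top, right, bottom) boxes for a centered grid of
    nonoverlapping square ROIs. The centered 2 x 2 layout is an
    illustrative assumption, not the documented placement."""
    block = size * grid
    if block > width or block > height:
        raise ValueError("image too small for the requested ROI grid")
    x0 = (width - block) // 2
    y0 = (height - block) // 2
    boxes = []
    for row in range(grid):
        for col in range(grid):
            left = x0 + col * size
            top = y0 + row * size
            boxes.append((left, top, left + size, top + size))
    return boxes


boxes = roi_boxes(4288, 3216)            # image dimensions given in the text
images_per_class, n_classes = 15, 5      # 15 images for each of 5 land covers
total_rois = images_per_class * n_classes * len(boxes)
print(len(boxes), total_rois)            # 4 ROIs per image, 300 in total
```

Each box can be passed to an image library's crop routine; only the counting logic is shown here.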

Remote Sensing Data Acquisition.
Remote sensing can be defined as the collection of data about an object, in the form of radiation, from a particular distance [16]. It now plays an important role in many disciplines, such as environmental sciences, geography, agriculture, forestry, botany, meteorology, oceanography, and earth sciences [17]. It has already been used for the assessment and measurement of the effect of weeds on crops [18] and for vegetation cover and disease estimation [19,20]. For remote sensing, 50 MSR5 scans of each plot are acquired at a height of 5 feet above the land cover surface. Each MSR5 scan contains five wavebands, three visible (blue, green, and red) and two invisible (near infrared and shortwave infrared). The five land cover types together yield 250 spectral data instances.

Spectral Features.
The multispectral radiometer (MSR5) covers five sections of the spectrum: blue, green, and red in the visible range, plus near infrared (NIR) and shortwave infrared (SWIR) [15]. The wavelengths of these five bands, measured in nanometers (nm), are described in Table 2. The device is used to collect data at a specific height, normal to the land surface. The device used in this study is an MSR5 with serial number 566. Each scan is acquired at a height of 5 feet and covers a land area whose diameter is almost half of the sensor height, that is, about 2.5 feet. The multispectral data acquisition process is shown in Figure 3.

Proposed Methodology.
No special laboratory setup for morphological and color features was established for this study; only texture features were acquired for the photographic data and spectral features for the MSR5 remote sensing data. A novel spectra-statistical framework is proposed for land cover classification. The proposed framework is shown in Figure 4, and its functionality is described in detail below. The methodology was implemented with MaZda software version 4.6 on an Intel Core i3 processor (2.4 GHz) with a 64-bit operating system.

Preprocessing.
Each image contains a large irrelevant area, so the relevant portion of the image was extracted prior to further processing. The extracted portions were converted to 8-bit grayscale images and stored in bitmap (bmp) format, because MaZda works better with this format when calculating the statistical texture parameters [21]. The contrast of the grayscale images was enhanced with image converter software.

Feature Extraction.
Feature extraction is the transformation of an image into statistical attributes that are used for its classification. Different kinds of features can be extracted, for example, texture, Gabor, wavelet transform, and boundary features.

Texture Features.
Statistical texture features are categorized into first-order features, which relate to the intensity of individual pixels, and second-order features, which relate to the cooccurrence of neighboring pixels. First-order statistical parameters are based directly on the histogram of an image, while second-order parameters are based on the gray level cooccurrence matrix (GLCM). For this study, a total of 229 statistical texture features are calculated for each region of interest (ROI) with MaZda software version 4.6. They comprise 9 first-order statistical parameters and 220 second-order parameters, namely, 11 Haralick parameters derived from the GLCM in four directions (0°, 45°, 90°, and 135°) at interpixel distances of 1 to 5 pixels (11 × 4 × 5 = 220) [22]. Each ROI is therefore represented by 229 statistical texture features, and the 300 subimages together form a 300 × 229 feature matrix with 68,700 values.
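As a hedged illustration of how one second-order feature arises from the GLCM, the sketch below builds a symmetric, normalized cooccurrence matrix for a given distance and direction and computes the inverse difference moment. It is not MaZda's implementation: the symmetric counting, the offset convention, and the 4-level toy image are assumptions made for the example.

```python
import numpy as np

# (row step, column step) per direction at unit distance; an assumed convention.
DIRECTIONS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, distance, angle, levels):
    """Symmetric, normalized gray level cooccurrence matrix."""
    dr, dc = DIRECTIONS[angle]
    dr, dc = dr * distance, dc * distance
    rows, cols = image.shape
    counts = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r, c], image[r2, c2]] += 1
    counts += counts.T                   # count each pixel pair in both orders
    return counts / counts.sum()


def inverse_difference_moment(p):
    """Sum of p(i, j) / (1 + (i - j)^2) over all gray level pairs."""
    i, j = np.indices(p.shape)
    return float(np.sum(p / (1.0 + (i - j) ** 2)))


# Toy 4-level image, used only to exercise the functions.
toy = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
p = glcm(toy, distance=1, angle=0, levels=4)
idm = inverse_difference_moment(p)

# Feature count stated in the text: 9 first-order + 11 x 4 directions x 5 distances.
n_features = 9 + 11 * len(DIRECTIONS) * 5
print(n_features)                        # 229
```

Repeating the computation for every distance from 1 to 5 and every direction gives the 20 variants of each second-order parameter.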

Feature Selection.
Feature selection is an important area of study where datasets with hundreds to thousands of features are available. Its objective is to select the most significant features for the employed procedures. Reliable classification based on a large number of features usually requires a large amount of data, which is not easily available. It is therefore necessary to reduce the dimensionality of the statistical feature vector space while retaining the capability to discriminate and classify the different land cover classes. Classification based on the selected features is also faster and more cost-effective. In this study, three feature selection approaches, that is, Fisher Coefficient (F), Probability of Error (POE) plus Average Correlation Coefficient (ACC), and Mutual Information (MI) Coefficient, are used to reduce the feature vector space. Feature selection is performed on the entire feature vector space with the combined set of these three approaches (F + PA + MI) by using MaZda software. The Fisher Coefficient (F) [23] is described mathematically as

$$F = \frac{D}{V} = \frac{\dfrac{1}{1 - \sum_{k=1}^{K} P_k^{2}} \sum_{k=1}^{K} \sum_{j=1}^{K} P_k P_j \left(\mu_k - \mu_j\right)^{2}}{\sum_{k=1}^{K} P_k V_k},$$

where $D$ is the between-class variance, $V$ is the within-class variance, $P_k$ is the probability of class $k$, and $V_k$ and $\mu_k$ are the variance and mean value of the feature in class $k$.
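A minimal sketch of Fisher-based ranking follows, assuming the usual ratio of between-class to within-class variance with class probabilities estimated from label frequencies. The function names and the toy data are hypothetical, and MaZda's own computation may differ in detail.

```python
import numpy as np


def fisher_coefficient(values, labels):
    """Between-class variance over within-class variance for one feature."""
    classes = np.unique(labels)
    p = np.array([np.mean(labels == c) for c in classes])          # class priors
    mu = np.array([values[labels == c].mean() for c in classes])   # class means
    var = np.array([values[labels == c].var() for c in classes])   # class variances
    diff2 = (mu[:, None] - mu[None, :]) ** 2
    between = (p[:, None] * p[None, :] * diff2).sum() / (1.0 - np.sum(p ** 2))
    within = np.sum(p * var)
    return between / within


def top_k_by_fisher(X, y, k=10):
    """Indices of the k features with the largest Fisher coefficient."""
    scores = np.array([fisher_coefficient(X[:, f], y) for f in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]


# Toy data: feature 0 separates the two classes, feature 1 does not.
X_toy = np.array([[0.0, 5.0], [1.0, 4.0], [10.0, 5.0], [11.0, 4.0]])
y_toy = np.array([0, 0, 1, 1])
best = top_k_by_fisher(X_toy, y_toy, k=1)
```

With `k=10` on the 300 × 229 feature matrix, the same ranking would return ten feature indices, matching the number selected per method in this study.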
Probability of Error plus Average Correlation Coefficient (POE + ACC) [24] is based on the probability of error of a feature $f_j$, defined as

$$\mathrm{POE}(f_j) = \frac{\text{number of misclassified samples}}{\text{total number of samples}}.$$

The Mutual Information (MI) Coefficient [25] between a feature $A$ and the class variable $B$ is given by the relation

$$\mathrm{MI}(A, B) = \sum_{a} \sum_{b} P(a, b) \log \frac{P(a, b)}{P(a)\,P(b)}.$$

Because not all of the 229 calculated features of each image are equally significant for land cover classification, MaZda selects the 10 most discriminant features for each method, in descending order of significance. The combined set of feature selection approaches was observed to provide better classification results, so a total of 30 features (10 from each approach) were selected for further processing [26]. The selected features are listed in Table 3 according to their corresponding feature selection techniques (F, PA, and MI). In Table 3, the features selected by MI are highly correlated, for example, the "inverse difference moment" features, but they differ in interpixel distance and direction, and their calculated values therefore also differ. Each combination of pixel distance ($d$) and angular direction ($\theta$) requires its own intensive computation and yields a different value of the same parameter, that is, "inverse difference moment." For this study, we have taken $d$ = 1, 2, 3, 4, and 5 pixels with $\theta$ = 0°, 45°, 90°, and 135°. For this reason, none of the values of the given texture features can be ignored. Each MI-based texture feature describes the land cover dataset along its own distance and direction, and taken together these features reveal the entire texture pattern.
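The mutual information between a feature and the class label can be estimated from a joint histogram. The sketch below assumes the feature has already been discretized into bins and uses a base-2 logarithm (so the result is in bits); both choices are assumptions for illustration, and the function name is hypothetical.

```python
import numpy as np


def mutual_information(feature_bins, labels):
    """Mutual information (in bits) between a discretized feature and labels."""
    a_vals, a_idx = np.unique(np.asarray(feature_bins).ravel(), return_inverse=True)
    b_vals, b_idx = np.unique(np.asarray(labels).ravel(), return_inverse=True)
    joint = np.zeros((a_vals.size, b_vals.size))
    np.add.at(joint, (a_idx, b_idx), 1)          # joint histogram of (bin, class)
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)        # marginal of the feature
    pb = joint.sum(axis=0, keepdims=True)        # marginal of the class
    nz = joint > 0                               # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])))


# A feature identical to a balanced binary label carries 1 bit of information;
# a feature independent of the label carries none.
mi_same = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
mi_indep = mutual_information([0, 1, 0, 1], [0, 0, 1, 1])
```

Ranking features by this score and keeping the ten largest mirrors the per-method selection described above.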
It is reported by different researchers [10,11] that five different control features such as window size, texture derivative(s), input channel (i.e., spectral channel to measure the texture), quantization level of output channel (8 bits, 16 bits, and 32 bits), and the spatial components (i.e., interpixel distance and angle during cooccurrence matrix computation) play a vital role during the analysis of GLCM texture features.

3.10. Feature Reduction.
Feature reduction techniques are also called feature projection. In feature reduction, the original feature space of the selected features is transformed into a new space of lower dimensionality, also called the projection space, in which the data are clustered into their respective classes. Feature projection techniques include linear discriminant analysis (LDA), principal component analysis (PCA), and nonlinear discriminant analysis (NDA). These techniques preserve the actual structure of the data as much as possible while reducing the number of dimensions. In the reduced feature space, the execution time and cost are also reduced, and the obtained results remain close to those for the original data space. Before classification, the data are standardized to reduce the impact of undesirable variation within the data due to outliers and other factors by applying the following equation:

$$z_i = \frac{x_i - \bar{x}}{\sigma},$$

where $z_i$ is the standardized value of the $i$th feature, $i = 1, 2, 3, \ldots, n$, $x_i$ is the original feature value, $\bar{x}$ is the mean feature value, and $\sigma$ is the standard deviation.
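The standardization step can be sketched as a column-wise z-score. The use of the population standard deviation and the handling of constant features are assumptions of this example, not details given in the text.

```python
import numpy as np


def standardize(X):
    """Column-wise z-score: subtract each feature's mean, divide by its std."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)                    # population std (ddof=0), an assumption
    std = np.where(std == 0, 1.0, std)     # constant features map to zero
    return (X - mean) / std


# Toy matrix with two features on very different scales.
X_raw = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = standardize(X_raw)
```

After this step every feature has zero mean and unit variance, so features with large numeric ranges do not dominate the projection.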
The feature selection techniques discussed above (F + PA + MI) only select the significant features; they do not quantify how well the data can be classified with them. To obtain the feature data projection, the 30 selected features are deployed to nonlinear discriminant analysis (NDA), available in the B11 software integrated with MaZda [27]. In this technique there are three layers of processing elements (neurons), namely, the first hidden layer, the second hidden layer, and the output layer, which follow the input layer. NDA can be described by the logistic function, whose value is equal to 0.5 for $x = 0$ and which changes smoothly from 0 to 1 as $x$ varies from large negative to large positive values:

$$f(x) = \frac{1}{1 + e^{-x}}.$$

If $X$ is the feature vector given as input to the artificial neural network (ANN), the number of input terminals is equal to its dimension $n$. The vector $Y$ is the output of the ANN, and its dimension $m$ is equal to the number of classes in the dataset. Thus, the ANN has $m$ output terminals:

$$y_k = f(s_k), \quad k = 1, 2, 3, \ldots, m.$$

Consider

$$s_k = \sum_{j=1}^{h} v_{kj} z_j, \quad k = 1, 2, 3, \ldots, m.$$

Consider

$$z_j = f\left(\sum_{i=1}^{n} w_{ji} x_i\right), \quad j = 1, 2, 3, \ldots, h.$$

Supervised learning methods are based on input patterns and the correct classes to which they belong, $\{X^{p}, T^{p}\}$, where $p = 1, 2, 3, \ldots, P$. For this purpose, the following error function is calculated:

$$E = \frac{1}{2} \sum_{p=1}^{P} \sum_{k=1}^{m} \left(y_k^{p} - t_k^{p}\right)^{2}.$$

For the MSR5 datasets, linear discriminant analysis (LDA) gives better results for feature data clustering and projection. Let $x_i^{(j)}$ denote the $i$th pattern in class $j$, where $i = 1, 2, 3, \ldots, M_j$ and $j = 1, 2, 3, \ldots, c$. Define the within-class scatter matrix as

$$S_W = \sum_{j=1}^{c} \sum_{i=1}^{M_j} \left(x_i^{(j)} - \mu_j\right)\left(x_i^{(j)} - \mu_j\right)^{T},$$

where $\mu_j$ is the mean vector of class $j$. Similarly, define the between-class scatter matrix as

$$S_B = \sum_{j=1}^{c} M_j \left(\mu_j - \mu\right)\left(\mu_j - \mu\right)^{T}.$$

Here $\mu$ is the mean vector of the pooled data. The total scatter matrix is $S_T = S_W + S_B$, and the objective of LDA is to obtain a linear transform matrix

$$\Phi^{*} = \arg\max_{\Phi} \frac{\left|\Phi^{T} S_B \Phi\right|}{\left|\Phi^{T} S_W \Phi\right|}.$$

The proposed NDA architecture is given for both types of dataset in Tables 4 and 5.

3.11. Classification.
For this work, we applied a supervised artificial neural network (ANN) classifier.
This classifier is employed for two reasons: first, the data are supervised (labeled with the five land cover types), and second, it is discussed by [28] that ANN is a strong and efficient technique for noisy data and for datasets acquired in a natural open environment. The implemented classifier, based on a feed forward approach with a single hidden layer of sigmoidal neurons, is shown in Figure 5. If $n$ is the dimension of the input feature vector $X$ deployed to the ANN classifier, then the number of input terminals is equal to $n$. The output vector is $Y$, whose dimension $m$ is determined by the number of classes to be classified. Thus, the ANN has $m$ output terminals:

$$y_k = f\left(\sum_{j=1}^{h} v_{kj} z_j\right),$$

where $k = 1, 2, 3, \ldots, m$, $f$ is the sigmoid (logistic) activation function, and the outputs of the hidden layer are given as

$$z_j = f\left(\sum_{i=1}^{n} w_{ji} x_i\right).$$

We see here that $j = 1, 2, 3, \ldots, h$. During training and testing, the weight coefficients are adjusted, and the closeness of the actual output to the desired output is observed.

Figure 5: Implemented ANN classifier model [15].
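The forward pass of a single-hidden-layer sigmoidal network, and a squared-error measure of how close the output is to the desired output, can be sketched as follows. The layer sizes, the omission of bias terms, and the all-zero weights are assumptions chosen so the example is easy to check by hand; they are not the architecture settings used in B11.

```python
import numpy as np


def sigmoid(x):
    """Logistic function: 0.5 at x = 0, approaching 0 and 1 at the extremes."""
    return 1.0 / (1.0 + np.exp(-x))


def forward(x, W, V):
    """x: (n,) input; W: (h, n) input-to-hidden; V: (m, h) hidden-to-output."""
    z = sigmoid(W @ x)     # hidden layer outputs
    y = sigmoid(V @ z)     # output layer, one value per class
    return z, y


def squared_error(y, t):
    """Half the sum of squared differences between actual and desired output."""
    return 0.5 * float(np.sum((y - t) ** 2))


n, h, m = 30, 8, 5                      # 30 features, 5 classes; h = 8 is hypothetical
x = np.ones(n)
W = np.zeros((h, n))                    # zero weights make every neuron output 0.5
V = np.zeros((m, h))
z, y = forward(x, W, V)
target = np.array([1.0, 0.0, 0.0, 0.0, 0.0])   # one-hot desired output
err = squared_error(y, target)
```

Training would adjust `W` and `V` (for example, by gradient descent) to reduce this error over all training patterns.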
Supervised training techniques are based on input patterns and the correct categories to which they belong, $\{X^{p}, T^{p}\}$, where $p = 1, 2, 3, \ldots, P$; the following error function is then reduced by changing the weights $V$ and $W$:

$$E = \frac{1}{2} \sum_{p=1}^{P} \sum_{k=1}^{m} \left(y_k^{p} - t_k^{p}\right)^{2},$$

where $y_k^{p}$ and $t_k^{p}$ are the actual and desired values of output $k$ for pattern $p$.

Here, better data clustering was obtained with the NDA approach than with the other three approaches. The first three ROIs discussed above did not give satisfactory results: they yielded less than 70% feature projection accuracy, which is not acceptable, whereas for the ROI of size 512 × 512 we obtained 80%, 84%, and 88.324% accuracy with F, PA, and MI, respectively, in the NDA projection space. Because a number of researchers have reported that classification accuracy usually improves with the number of features deployed [28], the same strategy was adopted to obtain better results. For this purpose, the features selected by the three approaches discussed above were merged (F + PA + MI), giving a set of 30 features (10 from each selection method) for the 512 × 512 ROIs. These 30 features were then deployed to RDA, PCA, LDA, and NDA with the K-fold (80-20) cross validation method. The results of these statistical texture data analyses are summarized in Table 6, which shows that NDA gives the best data clustering and projection accuracy, 99.64%, compared with RDA, PCA, and LDA. Figure 6 shows the clustering of the photographic feature data of the five input land cover classes in the NDA projection space.

Results and Discussion
Different researchers [27,28] have observed that feature reduction techniques including raw data analysis (RDA), linear discriminant analysis (LDA), and principal component analysis (PCA) perform well on linearly separable data, because PCA and LDA use linear transformations of the input data. These techniques are capable of feature compression: the most expressive features (MEF) are obtained by PCA and the most discriminating features (MDF) by LDA. The resulting feature vectors have fewer features than the original feature vector space. Because the transformations are linear, however, they cannot help in the classification of linearly nonseparable data; such data need hypersurfaces instead of hyperplanes to separate the data clusters. That is why nonlinear discriminant analysis (NDA) is used for a nonlinear transformation of the feature vectors, such that the input data are projected onto a space (probably of lower dimensionality than for PCA and LDA) in which they become linearly separable. In this study, this technique is implemented with a feed forward artificial neural network (ANN) with two hidden layers of sigmoid-type neurons. NDA is the best approach for verifying the capability of data clustering based on selected features of complex nonlinear datasets; therefore, we employed it, and the B11 software also offers a number of options with which NDA may be configured to obtain the best result. The NDA graph shows the data properly clustered into the five appropriate classes; it is shown in Figure 7. Training and testing of the (ANN: n-class) classifier, available in B11 integrated with MaZda software, are performed to verify the validity of the classifier. For this purpose, a K-fold (80-20) cross validation method is used. For training, 48 data instances (ROIs of size 512 × 512) from each land cover type are used.
In each iteration, a total of 240 data instances out of 300 are used for training. Testing is performed on the remaining 60 data instances (12 from each land cover type). An accuracy of 100% is obtained when the classifier is trained with the architecture settings discussed in Section 3.11, and an average classification accuracy of 91.334% is obtained when the classifier is tested on the photographic data. The five types of land cover data are thus classified properly with the (ANN: n-class) method. The results for the statistical texture data are shown in Table 7.
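The 80-20 partition described above (48 training and 12 testing instances per land cover type) can be sketched as a per-class split. The random shuffling, the seed, and the function name `stratified_split` are assumptions for illustration; how B11 draws the partition is not stated in the text.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices so that each class contributes the same test fraction."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# 5 land cover types with 60 subimages each, as in the photographic dataset.
labels = [cover for cover in range(5) for _ in range(60)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))     # 240 60
```

Repeating the split with different seeds (or rotating five disjoint folds) gives the multiple iterations over which an average test accuracy is reported.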
The performance of the classifier in the testing phase for the different classes is summarized in the confusion matrix in Table 8. A total of 300 photographic data instances (60 of each land cover type) are assigned to the five classes. The confusion matrix of Table 8 for the photographic data of the five land cover types is presented graphically in Figure 8.
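For readers who wish to reproduce accuracy figures from a confusion matrix, a minimal sketch is given below. The 3 × 3 matrix is a hypothetical toy example and does not reproduce the values of Table 8.

```python
def overall_accuracy(confusion):
    """Fraction of instances on the diagonal (rows: true class, columns: predicted)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_recall(confusion):
    """Correctly classified fraction of each true class."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Hypothetical toy matrix, NOT the study's data.
toy_confusion = [
    [10, 0, 0],
    [1, 8, 1],
    [0, 2, 8],
]
accuracy = overall_accuracy(toy_confusion)
recalls = per_class_recall(toy_confusion)
```

Applied to a 5 × 5 matrix of test results, the same two functions give the overall accuracy and the per-class accuracy for each land cover type.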

Spectral Data.
As already mentioned, a scene is explored on the basis of the five spectral bands (blue, green, red, near infrared, and shortwave infrared) acquired by the MSR5. The whole dataset (250 scans) acquired by the MSR5 is deployed to RDA, PCA, LDA, and NDA to verify the validity of the data projection. Data projection accuracies of 98.7% for RDA, 98.4% for PCA, 99.5% for LDA, and 99.4% for NDA are obtained. The best feature projection accuracy is thus obtained with the LDA approach; the results are presented in detail in Table 9.
Multispectral features data analyses of RDA, PCA, LDA, and NDA are shown in Table 9; this shows that LDA outperforms the others and gives 99.5% feature data projection accuracy. For feature reduction techniques, feature data projection graph of MSR5 is shown in Figure 9.
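The LDA projection applied to the five-band data can be sketched with within-class and between-class scatter matrices. This is a textbook formulation on synthetic data, offered under the assumption that B11 follows the standard definition; the synthetic class offsets and the function name are hypothetical.

```python
import numpy as np


def lda_projection(X, y, n_components=2):
    """Return within-class scatter, between-class scatter, and projection axes."""
    mean_all = X.mean(axis=0)
    n_features = X.shape[1]
    s_w = np.zeros((n_features, n_features))
    s_b = np.zeros((n_features, n_features))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        s_w += centered.T @ centered                 # scatter around class mean
        d = (mean_c - mean_all)[:, None]
        s_b += Xc.shape[0] * (d @ d.T)               # scatter of class means
    # Axes that maximize between-class relative to within-class scatter.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(s_w) @ s_b)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return s_w, s_b, eigvecs.real[:, order]


# Synthetic stand-in for 5-band scans of 3 classes (not the MSR5 measurements).
rng = np.random.default_rng(0)
y_syn = np.repeat(np.arange(3), 10)
X_syn = rng.normal(size=(30, 5)) + 3.0 * y_syn[:, None]
s_w, s_b, axes = lda_projection(X_syn, y_syn)
projected = X_syn @ axes                             # data in the projection space
```

Plotting the columns of `projected` against each other gives a cluster graph of the kind used to judge projection quality.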
The linear discriminant analysis (LDA) graph shows the data clustered into the five appropriate classes more clearly than the other reduction techniques employed; the data cluster graph is shown in Figure 10. For training and testing, the artificial neural network (ANN: n-class) classifier is employed, and the same K-fold (80-20) cross validation method is used for multispectral data classification. A dataset of 200 scans with five multispectral parameters (blue, green, red, near infrared, and shortwave infrared) is deployed for (ANN: n-class) training with the same architecture settings as mentioned in Section 3.11. The training results for the multispectral data are summarized in Table 10 and represented graphically in Figure 11. Under the same architecture settings, the ANN classifier is tested on 50 disjoint data instances (10 of each land cover type) of the five multispectral features. The MSR5 data are shown in Table 10.
The ANN classifier gave very promising results during training and testing. An average classification accuracy of 100% was achieved when the classifier was trained on these data, and an average classification accuracy of 96.40% was achieved when it was tested. The five land cover types are thus classified properly with the (ANN: n-class) method. The confusion matrix of the multispectral data classification for the five land cover types is shown in Table 11, and the corresponding confusion matrix graph for the MSR5 data is shown in Figure 11. Comparing the multispectral and statistical texture classification accuracies, the multispectral result (96.40%) is better than the statistical texture result (91.334%). A comparison graph between multispectral and statistical texture data is shown in Figure 12.
The reason for this difference in classification accuracy is that statistical analysis performs better on fine texture than on coarse texture, which is why the texture data classification accuracy is lower than that of the multispectral data [29,30]. In this study, the photographic data are taken at a height of 5 feet, so the areas under these photographs are not equally covered and distributed. The ROI size also plays an important role in classification [31]: as the ROI size increased, better accuracy was observed. If photographs are taken from a greater height, so that a larger area is covered, the classification accuracy can be improved. Second, about 5% to 6% better classification results are obtained with the remote sensing MSR5 data than with the photographic data (400 nm to 700 nm), because the MSR5 data comprise the visible (400 nm to 700 nm) as well as the invisible near-infrared (NIR) and shortwave infrared (SWIR) (790 nm to 1750 nm) wavelengths. The data acquisition technique, the normalization and standardization of the data, and the choice of classifier may also affect the classification results. By using these quantitative parameters rather than conventional qualitative parameters, the different types of land cover data can be classified accurately. In general, the proposed methodology provides a novel technique for mapping and classifying land cover data by using multispectral and digital photographic data.

Conclusions
In this study, five types of land cover data are classified by using quantitative parameters instead of conventional qualitative parameters, and an accuracy of 96.40% for the spectral dataset and 91.334% for the statistical texture dataset is achieved. Determining the extent to which these classes can be separated into their appropriate pattern classes is a difficult task, and the result is also a verification of the intra- and interclass pattern features of these five land cover data types. Five spectral features together with nine first-order and eleven second-order cooccurrence matrix features are used to test the land cover datasets, which makes this framework novel and more reliable and robust than other land classification frameworks in which morphological, size, color, and other geometry features have been used. The artificial neural network is used very effectively for the classification of the five land cover types: fertile cultivated land, green pasture, desert rangeland, bare land, and Sutlej river land. In the future, this study may be extended to hyperspectral data for crop growth and yield assessment. Results may also be obtained with data fusion, combining MSR5 data with digital photographic data, to account for environmental factors such as rain, fertilizer usage, dry weather, and soil and air moisture.