Multisource Data Fusion Framework for Land Use / Land Cover Classification Using Machine Vision

1Department of Computer Science & IT, The Islamia University of Bahawalpur, Punjab 63100, Pakistan
2Key Laboratory of Photo-Electronic Imaging Technology and System, School of Computer Science, Beijing Institute of Technology (BIT), Beijing 100081, China
3Department of Computer Science, NFC IET, Multan, Punjab 60000, Pakistan
4Department of Computer Sciences, Quaid-i-Azam University, Islamabad 45320, Pakistan
5Department of Computer Science, Virtual University of Pakistan, Lahore, Punjab 54000, Pakistan


Introduction
Conventional methodologies exist to measure and monitor land use/land cover (LU/LC) for regional and global environmental change [1]. Real-time LU/LC data are very important for resource management, future prediction, crop growth assessment, and sustainable development [2]. Although LU/LC data are conventionally collected through field-based surveys, remote sensing data collection has its own importance owing to factors such as time, accuracy, and transparency. During the last decade, space-borne multispectral data have proved more beneficial than ground and airborne data for land monitoring, assessment, and accurate information because of their increased spectral resolution.
Previously, single-source datasets were mostly used for LU/LC classification, but recently multisource datasets have been used to obtain better overall accuracy. Land cover is a primary factor that plays an important role in physical and chemical variation in the environment. Changes in LU/LC can be accurately identified by continuously monitoring regional and global classification maps. When remote sensing data are used along with ground truth data, they provide reliable and cost-effective LU/LC information. Remote sensing has mostly relied on synthetic aperture radar (SAR) data for LU/LC information, because cloudy weather is one of the major obstacles to acquiring information through optical imagery. This has strengthened the significance of new tools and techniques for acquiring LU/LC thematic information from remote sensing data [3]. In recent years, satellite-based remote sensing data have been a very active research area for earth scientists.
Many researchers have worked on combining spectral and optical data, which enhanced the discrimination power of the integrated data and their overall classification accuracy [4,5], and described a simple fused model for land cover classification, named the fused mixture model. The spatial and temporal adaptive-reflectance fusion model (STARFM) was proposed by [6] and gave better accuracy results. For earth observation applications, remotely sensed multispectral data provide better large-scale information than optical data [7]. Fusion techniques enhance the operational capabilities of a dataset with respect to other tuning factors and overall accuracy [8]. In data fusion, two or more datasets are merged to obtain a single dataset that retains the features of each individual dataset [9]. A low resolution multispectral dataset is fused with a high resolution optical radar dataset to get better results in terms of spatial resolution and overall classification accuracy [10]. Huang and his coauthors described that LU/LC data are coarse in spatial resolution and change frequently when observed through remote sensing, so it is very difficult to measure and monitor the change accurately [11]. Different image fusion techniques, with their constraints at the implementation stages, are discussed by [12,13]. They showed quantitatively that fusion plays an important role in better interoperational capabilities and reduces the ambiguity linked with data acquired by different sensors or by the same sensor with temporal variation. Quartz-rich and quartz-poor mineral types were identified using an image fusion method with the supervised maximum likelihood (ML) classifier, which achieved an overall accuracy and kappa coefficient of 96.53% and 0.95, respectively [14].
This study attempts to design a framework for analyzing the potential of a multispectral dataset fused with a texture feature dataset for discriminating different LU/LC classes.

Study Area
This study explains a data fusion technique for LU/LC classification as an alternative to traditional ground-based field surveys. All the experiments were performed at the Islamia University of Bahawalpur, Punjab province (Pakistan), located at 29°23′44″ N and 71°41′1″ E. The study area is mostly bare and deserted rangeland. This study describes LU/LC monitoring, management, and classification using a fused dataset generated by combining photographic and multispectral radiometric data. The approach is intended to provide accurate results for LU/LC change detection and prediction for better crop yield assessment.

Dataset
For this study, the multispectral dataset was obtained using a device named Multispectral Radiometer Crop Scan (MSR5). It gives data equivalent to those of the Landsat TM (Thematic Mapper) satellite [15]. It has five spectral bands, that is, blue (B), green (G), red (R), near infrared (NIR), and shortwave infrared (SWIR), ranging from 450 nanometers to 1750 nanometers. The digital photographic data were acquired with a high resolution digital NIKON Coolpix camera having 10.1-megapixel resolution.

Material and Methods
The objective of this study is to analyze multispectral data of five LU/LC types together with digital photographic data. A multisource data fusion framework is designed to classify the data of the LU/LC types under study accurately. Different image processing techniques were applied to the photographic data, that is, color to grayscale conversion, contrast enhancement, and image sharpening.
A still camera was mounted on a stand at a height of 4 feet to acquire the image dataset of the five LU/LC types. For the image dataset, 20 images of each LU/LC type were acquired in jpg format with dimensions of 4288 × 3216 pixels and 24-bit depth. To increase the size of the image dataset, 5 nonoverlapping regions of interest (ROIs) with different pixel sizes, that is, (32 × 32), (64 × 64), (128 × 128), (256 × 256), and (512 × 512), were taken on each (4288 × 3216) image. This gave a total of 100 (20 × 5) images of the above sizes for each land type, and a dataset containing 500 images of the five LU/LC types was developed for the experiments.
Similarly, for the multispectral dataset, data in five spectral bands were acquired: the visible bands, that is, B, G, and R, from 400 nanometers to 700 nanometers; the invisible near infrared (NIR) band, ranging from 750 nanometers to 900 nanometers; and the shortwave infrared band, ranging from 1350 nanometers to 1650 nanometers. The multispectral dataset was acquired using a multispectral radiometer (MSR5), serial number 566. For the MSR5 dataset, 100 scans of each LU/LC type, 500 scans in total, were acquired at the same locations where the digital images were acquired. To avoid sun shadow effects, the whole data gathering process was completed at noon time (1.00 pm to 2.30 pm) under a clear sky.
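The ROI sampling described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say where the ROIs were placed on each image, so the hypothetical helper `roi_boxes` simply spreads them evenly over a grid of tiles, which guarantees that they do not overlap.

```python
def roi_boxes(width, height, size, count=5):
    """Return `count` non-overlapping square boxes as (left, top, right, bottom).

    The placement rule (evenly spaced tiles in row-major order) is an
    assumption made for illustration; the source does not specify it.
    """
    cols = width // size
    rows = height // size
    if cols * rows < count:
        raise ValueError("image too small for the requested number of ROIs")
    step = (cols * rows) // count  # distance between chosen tiles
    boxes = []
    for k in range(count):
        r, c = divmod(k * step, cols)  # tile index -> (row, column)
        boxes.append((c * size, r * size, (c + 1) * size, (r + 1) * size))
    return boxes


# Five 512 x 512 ROIs on one 4288 x 3216 image
for box in roi_boxes(4288, 3216, 512):
    print(box)
```

With 20 images per land type and 5 ROIs per image, this yields the 100 ROIs per land type reported in the text.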
Experimentation. A distinguishing aspect of this study is that it needs no special laboratory setup. For the image dataset, prior to further processing, the images of different sizes were converted from color to grayscale (8 bits) and stored in bitmap (.bmp) format, because MaZda software works better when calculating texture features in this format. The contrast level of the grayscale images was enhanced using image converter software. The image dataset was then ready for calculating the first-order histogram and second-order texture parameters. MaZda software was used to calculate 9 histogram features and 11 second-order (Haralick) texture features using the gray level cooccurrence matrix (GLCM) in four directions, that is, 0°, 45°, 90°, and 135°, up to a 5-pixel distance. This gave 220 (11 × 4 × 5) texture features which, with the 9 histogram features, made 229 features in total for each ROI. The total feature space calculated for the whole image dataset was therefore 114500 (229 × 500) values [16]. Such a large feature space is not easy to handle, which is why three feature selection techniques, namely, Fisher (F), Probability of Error plus Average Correlation Coefficient (POE + AC), and Mutual Information (MI), were employed to extract an optimized feature dataset. These three techniques were merged together (F + PA + MI) to extract the thirty most discriminant features (10 features by each technique) out of the 229-feature space for each ROI of the image dataset. All the experiments were performed using MaZda software version 4.6 with the Weka data mining tool version 3.6.12 on an Intel® Core i3 processor at 2.4 gigahertz (GHz) with a 64-bit operating system [17].
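To illustrate the Fisher-based part of the feature selection, the sketch below ranks features by a Fisher score and keeps the top k. It is a minimal sketch, not MaZda's implementation: the score used here (between-class variance divided by within-class variance) is one common definition and is an assumption, and the names `fisher_score` and `top_k_features` are hypothetical.

```python
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Between-class over within-class variance for one feature (assumed definition)."""
    overall = fmean(values)
    between = 0.0
    within = 0.0
    for cls in set(labels):
        group = [v for v, y in zip(values, labels) if y == cls]
        between += len(group) * (fmean(group) - overall) ** 2
        within += len(group) * pvariance(group)
    return between / within if within > 0 else float("inf")


def top_k_features(columns, labels, k=10):
    """Rank feature columns (name -> list of values) by Fisher score, best first."""
    scores = {name: fisher_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: "good" separates the two classes, "noise" does not
labels = ["a", "a", "a", "b", "b", "b"]
columns = {
    "good": [1.0, 1.1, 0.9, 5.0, 5.1, 4.9],
    "noise": [1.0, 5.0, 3.0, 1.1, 4.9, 3.1],
}
print(top_k_features(columns, labels, k=1))
```

In the study, 10 features are taken from each of the three techniques, giving 30 selected features out of 229.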
Proposed Methodology. The proposed methodology is described in Figure 1. First, the data fusion algorithm is described with all its procedural steps.

Data Fusion Algorithm
Start main
{
Input: Multispectral and photographic land use/land cover dataset
For (Step 1 to Step 7)
{
Step 1 = Photographic and multispectral datasets → five land types
Step 2 = Data preprocessing
Step 3 = Develop co-occurrence matrix for photographic dataset and extract texture features
Step 4 = Multispectral dataset with five spectral bands → visible and invisible wavelengths
Step 5 = Three feature selection techniques, Fisher (F), probability of error plus average correlation (POE + AC), and mutual information (MI), are merged (F + PA + MI) and employed on photographic dataset
Step 6 = Extract 30 optimized texture features dataset
Step 7 = 30 optimized texture features + 5 spectral features → fused dataset
}
}

Each of the selected texture features describes the LU/LC dataset in its own direction, and as a whole these features disclose the entire texture pattern. It has been discussed by many researchers [10–14] that five control features play a very important role in the analysis of texture features: window size, texture derivative(s), input channel (i.e., the spectral channel used to measure the texture), quantization level of the output channel (8 bits, 16 bits, and 32 bits), and the spatial components, that is, interpixel distance and angle during cooccurrence matrix computation.
In the fourth step, these 30 texture features are combined with the 5 multispectral features, and a fused dataset is developed from the combination of the two different sources of data [18].
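The fusion step can be sketched as a simple feature-level concatenation. This is a hedged illustration only: the paper does not state how each ROI was paired with a radiometer scan, so pairing by position in the list is an assumption, and `fuse` is a hypothetical helper name.

```python
def fuse(texture_rows, spectral_rows):
    """Concatenate texture and spectral feature vectors, instance by instance.

    Pairing the i-th ROI with the i-th scan is an assumption made for
    illustration; the source does not describe the pairing rule.
    """
    if len(texture_rows) != len(spectral_rows):
        raise ValueError("both sources must have the same number of instances")
    return [list(t) + list(s) for t, s in zip(texture_rows, spectral_rows)]


# Placeholder values: 2 instances, 30 texture features and 5 spectral bands each
texture = [[0.1] * 30, [0.2] * 30]
spectral = [[10.0, 11.0, 12.0, 13.0, 14.0], [20.0, 21.0, 22.0, 23.0, 24.0]]
fused = fuse(texture, spectral)
print(len(fused), len(fused[0]))
```

Each fused instance then carries the 35 features (30 texture plus 5 spectral) that are passed to the classifiers.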
Table 1 describes the optimized texture feature dataset while Table 2 describes the multispectral feature dataset.
In the last step, this fused dataset is passed to different machine vision classifiers, that is, an artificial neural network, the multilayer perceptron (MLP), Naïve Bayes (NB), Random Forest (RF), and J48; here J48 is the Weka implementation of the C4.5 decision tree algorithm. Figure 1 describes the multisource data fusion framework for LU/LC classification.

Results and Discussion
As discussed above for the image dataset, the four smaller ROI sizes, that is, 32 × 32, 64 × 64, 128 × 128, and 256 × 256, did not give satisfactory classification results. With the MLP, NB, J48, and RF classifiers, an overall classification accuracy of less than 75% was observed on these four ROI sizes, which is not acceptable, whereas the 512 × 512 ROI gave promising results for image data classification. Therefore, to generate the fused dataset, the texture features of the 512 × 512 ROIs were merged with the multispectral dataset.
For classification, different machine vision classifiers, that is, Multilayer Perceptron (MLP), Naïve Bayes (NB), Random Forest (RF), and J48, were employed on this optimized fused dataset using Weka software version 3.6.12 [19]. Before being loaded into Weka, the fused dataset was converted into the Attribute-Relation File Format (ARFF). The fused dataset was also compared with the individual texture and multispectral datasets. These machine vision approaches have the potential to analyze the fused dataset. The fused dataset was separated into 66% for training and 34% for testing with the 10-fold cross-validation method, and the same strategy was applied to the individual datasets, namely, multispectral and texture. In addition, several other performance evaluation factors, that is, mean absolute error (MAE), root mean squared error (RMSE), confusion matrix, true positive (TP), false positive (FP), receiver-operating characteristic (ROC), time complexity, and overall accuracy (OA), were calculated. First, the fused dataset for LU/LC classification was evaluated with the MLP, RF, NB, and J48 classifiers on the optimized set of 35 features, which gave different accuracy results. The overall accuracy and the other performance parameters are shown in Table 3.
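Two of the evaluation measures used here, overall accuracy and the kappa coefficient, can both be derived from a confusion matrix. The sketch below uses the standard definitions; the small matrix in the example is invented for illustration and is not taken from the paper's tables, and the function names are hypothetical.

```python
def overall_accuracy(cm):
    """Fraction of instances on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond what is expected by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    # Chance agreement from row (actual) and column (predicted) totals
    expected = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)


# Illustrative 2-class matrix (rows = actual, columns = predicted); not paper data
cm = [[45, 5],
      [10, 40]]
print(overall_accuracy(cm), cohen_kappa(cm))
```

The same functions apply unchanged to a 5 × 5 matrix for the five LU/LC classes.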
Table 4 presents the confusion matrix of the fused dataset obtained with the MLP classifier; the diagonal of the table holds the maximal values, which correspond to the correctly classified instances of the five LU/LC classes. MLP shows the best overall accuracy among the employed classifiers.
The LU/LC classification graph of the MLP classifier for the fused dataset is shown in Figure 2. Each type of dataset has 100 data instances (ROIs), and these instances have been classified into five classes. In the graph, blue denotes the fused dataset, green the texture dataset, and red the multispectral dataset.
Similarly, for the texture and multispectral datasets, the same classifiers were employed with the same strategy as for the fused dataset. For the texture dataset, the 30 optimized texture features [20] were used, while, for the multispectral dataset, the 5 spectral features were used on their own. For both the texture and multispectral datasets, the MLP classifier showed higher overall accuracy than the other classifiers, along with the other performance evaluation parameters, including kappa coefficient, TP, FP, ROC, MAE, RMSE, and time complexity [21,22]. The overall accuracy and the other performance evaluation parameters for the texture dataset are shown in Table 5. Table 6 presents the confusion matrix for the texture dataset, containing the actual and predicted classes for the MLP classifier. The MLP classification graph for the texture dataset is shown in Figure 2.
The overall accuracy of the multispectral dataset, with the other performance evaluation parameters, is shown in Table 7 [23].
The MLP confusion matrix for the multispectral dataset is shown in Table 8; it contains the actual and predicted classes for the MLP classifier. MLP shows the best overall accuracy among the classifiers employed on the multispectral dataset.
The MLP classification graph for the multispectral LU/LC dataset is shown in Figure 2. Each type of dataset has 100 data instances (ROIs), and these instances have been classified into five classes.
Finally, a comparative LU/LC classification graph of the fused, multispectral, and texture datasets using the MLP classifier is shown in Figure 3. Each type of dataset has 100 data instances (ROIs), and these instances have been classified into five classes. The fused dataset has relatively better overall accuracy than the multispectral and texture datasets [24]. This shows that data fusion plays a vital role in better land assessment, management, and accurate monitoring [25,26].

Conclusions
This study focused on the classification of five different types of LU/LC. Four data mining classifiers, that is, MLP, RF, NB, and J48, were employed on the fused, texture, and multispectral datasets. These three types of dataset were examined for overall classification accuracy together with the other performance evaluation factors discussed above in Results and Discussion. All the classifiers showed satisfactory results, but the multilayer perceptron (MLP) performed considerably better than the others. With MLP, an overall accuracy of 96.67% for the texture data, 97.60% for the multispectral data, and 99.60% for the fused dataset was observed. The fused dataset showed the best overall accuracy among all types of dataset. The final classification results of the three datasets do not differ greatly, but other performance evaluation factors, that is, kappa statistics, RMSE, TP, FP, MAE, and execution time, also play an important role in the analysis.
It is worth mentioning that the photographic (texture) data are visual data, with a visible wavelength range from 400 nm to 700 nm, and gave a classification accuracy of 96.67%. The multispectral data include visual plus nonvisual data (NIR and SWIR), with a nonvisual wavelength range from 750 nm to 1650 nm, and attained a classification accuracy of 97.60%. The fused dataset, which integrates both types of data, that is, multispectral and statistical texture, achieved a better overall accuracy of 99.60% than either the multispectral or the texture dataset. Finally, as the dataset features were increased, the overall accuracy also improved. This shows that multisource data integration significantly improves the analysis and classification of LU/LC types and that the employed classification framework is a powerful tool for generating reliable, comprehensive, and accurate results for LU/LC classification. In addition, this method can be used for decision-making, future prediction, and quick and accurate analysis of land use and land cover when employing sophisticated rules on multisource LU/LC datasets. In future, the effect of variation in light intensity with incident light angle will be verified.

Figure 3: A comparison classification graph of fused, multispectral, and texture dataset.

Table 1: Integrated texture feature selection table (F + PA + MI) for ROI (512 × 512).

Figure 1 describes the proposed methodology in detail. In the first step, two different types of datasets are acquired, that is, the image dataset and the multispectral dataset. The second step employs different image preprocessing filters, that is, Sobel or Laplacian, to sharpen the images and extract first-order and second-order texture features. In step three, an optimized feature dataset is acquired by applying the three combined feature selection techniques (F + PA + MI), and the 30 most discriminant features are extracted. These 30 optimized texture features are shown in Table 1. It can be observed in Table 2 that the texture features selected by Mutual Information (MI), namely, "inverse difference moment," are highly correlated, but these features vary in interpixel distance and direction and, owing to this variation, their computed values are also different. For every pixel distance (d) and angular direction (θ), a different value is obtained for the texture feature "inverse difference moment." For this study, we have taken d = 1, 2, 3, 4, and 5 pixels with angles θ = 0°, 45°, 90°, and 135°. As a result, we cannot ignore any of the MI-based texture feature values.
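The dependence of "inverse difference moment" on pixel distance and angle can be illustrated with a small gray level cooccurrence matrix computation. This is a sketch under stated assumptions, not MaZda's implementation: the matrix is built symmetrically and normalized, the mapping of angles to pixel offsets is an assumed convention, and all function names are hypothetical.

```python
# (row step, column step) per angle; rows grow downward (assumed convention)
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def glcm(image, levels, distance, angle):
    """Symmetric, normalized cooccurrence matrix of a 2D list of gray levels."""
    step_r, step_c = OFFSETS[angle]
    dr, dc = distance * step_r, distance * step_c
    rows, cols = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
                total += 2
    if total == 0:
        raise ValueError("distance too large for this image")
    return [[v / total for v in row] for row in counts]


def inverse_difference_moment(p):
    """Sum of p(i, j) / (1 + (i - j)^2) over the cooccurrence matrix."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2) for i in range(n) for j in range(n))


def idm_table(image, levels):
    """One value per (distance, angle) pair: 5 distances x 4 angles = 20 values."""
    return {
        (d, a): inverse_difference_moment(glcm(image, levels, d, a))
        for d in range(1, 6)
        for a in OFFSETS
    }


# Toy 8 x 8 checkerboard with two gray levels
board = [[(r + c) % 2 for c in range(8)] for r in range(8)]
for key, value in sorted(idm_table(board, 2).items()):
    print(key, round(value, 3))
```

On the checkerboard, horizontal neighbours at distance 1 always differ while diagonal neighbours always match, so the same feature takes different values for different (d, θ) pairs, which is why none of them can be discarded a priori.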

Table 5: Texture data classification table.

Table 6: Confusion table of texture data for multilayer perceptron (MLP).

Table 8: Confusion table of multispectral data for multilayer perceptron (MLP).