A Deep Fusion Gaussian Mixture Model for Multiview Land Data Clustering
Wireless Communications and Mobile Computing

School of Software Technology, Dalian University of Technology, Dalian 116620, China
Key Laboratory for Ubiquitous Network and Service Software of Liaoning Province, Dalian 116620, China
College of Natural Resources and Environment, South China Agricultural University, Guangzhou 510642, China
Guangdong Province Key Laboratory for Land Use and Consolidation, South China Agricultural University, Guangzhou 510642, China
Guangdong Province Engineering Research Center for Land Information Technology, South China Agricultural University, Guangzhou 510642, China


Introduction
With rapid industrialization and urbanization worldwide, environmental contamination caused by the unreasonable use of natural resources, such as the overuse of coal, is attracting increasing attention [1]. Among the forms of environmental contamination, heavy metal contamination of soil is a core public concern, because heavy metals in agricultural products can directly affect human health [2]. Many researchers focus on controlling heavy metal contamination of soil by mining the intrinsic patterns hidden among various heavy metals, which can support contamination control and environmental protection. However, the correlations among heavy metals and the high-dimensional representation of heavy metal data pose considerable challenges to the accurate mining of soil contamination patterns. As industrialization and urbanization continue, more research is still required to capture effective patterns from high-dimensional heavy metal data in order to control soil contamination.
In recent years, a large body of research has been devoted to learning patterns from data [3][4][5][6][7][8]. For example, Chen et al. used multivariate statistics and geostatistics to explore the distributions of heavy metals in the soil of northwest China and identified the pollution sources of heavy metals from the distribution patterns [9].
Additionally, the chemical mass balance model, factor analysis, target transformation factor analysis, and principal component analysis are used to capture the complicated relationships among heavy metals [10][11][12]. Those statistical methods are able to mine patterns of heavy metals in simple cases where only a few kinds of heavy metals are involved. Also, they can only mine contamination patterns in a single view. In other words, those traditional statistical methods cannot adequately learn the complex contamination patterns of heavy metals in current soil, which are expressed by high-dimensional data. Thus, exploring the complicated patterns of various heavy metals requires novel computing methods.
Clustering, as a fundamental approach to pattern mining, divides data into several groups based on data similarity, so that data in the same group are more similar than data in different groups [13]. It is widely used in various domains, such as text recognition and image processing [14][15][16][17]. Among various clustering algorithms, the Gaussian mixture model, as a generative method, captures each cluster by a probability distribution, which naturally fits the multiview characteristics of data [18]. Inspired by this, a Gaussian mixture model is introduced to mine multiview heavy metal data. However, the current Gaussian mixture model-based methods neglect the multiview information of data, especially the deep intrinsic fusion features of all views.
To address these challenges, this paper proposes a multiview Gaussian mixture model that naturally captures the complicated relationships across multiple views on the basis of deep fusion features of data, which can potentially mine robust patterns of heavy metals in practice. In particular, a deep fusion feature architecture with modality-specific and modality-common stacked autoencoders is designed to distill fusion representations from the information of all views. Then, the Gaussian mixture model is extended to the fusion representations to naturally recognize accurate patterns within and across views. Extensive experiments are conducted on representative datasets to evaluate the performance of the multiview Gaussian mixture model. Results show that the proposed method outperforms the compared methods.
Thus, the major contributions of this paper are threefold:
(i) To accurately capture complex patterns of heavy metal data, a multiview Gaussian mixture model is introduced based on the fusion representations, which fully considers the information of each view in a nonlinear manner
(ii) To distill fusion representations from the information of all views, a deep fusion feature architecture is designed, which consists of modality-specific and modality-common stacked autoencoders
(iii) Extensive experiments are conducted on representative datasets to assess the performance, in which the proposed method outperforms the compared methods
The rest of the paper is organized as follows. Section 2 reviews common statistical learning methods for the pattern mining of heavy metals. Sections 3 and 4 present the fundamentals of the proposed method. Section 5 describes the details of the proposed method, and Section 6 validates the proposed method. Finally, Section 7 concludes this work.

Related Works
To trace the sources of soil heavy metal pollution, many statistical methods have been proposed. Most of them can be grouped into the following categories.
Linear regression. Because of its simplicity and efficiency, linear regression is a frequently used method [19]. It tries to find the best linear projection function by updating the parameters of the function with the least squares method or the gradient descent method. For example, Tian et al. [20] improved the multiple linear regression (MLR) method to quantitatively estimate the relationships between soil properties and sources of heavy metals. In MLR, heavy metal concentrations were regarded as dependent variables, while the scores of soil properties and sources were independent variables. However, due to the influence of various complex factors, such as climate, parent material, topography, and human activities, a linear projection cannot adequately model the correlations between environmental parameters and soil properties in soil pollution research [21].
Decision tree. Decision tree methods such as the classification and regression tree (CART) and random forest (RF) use a tree structure to decide classification results by evaluating conditions from the root to the leaves [22,23]. For example, Qiu et al. [24] applied stepwise linear regression (SLR), CART, and RF to the prediction of the spatial distribution of soil Cd. In that article, RF was the best method for handling the nonlinear and hierarchical relationships between soil Cd and the influencing factors. Wang et al. [25] used RF and the stochastic gradient boosting (SGB) method to identify and apportion heavy metal pollution. Both RF and SGB showed that anthropogenic sources were the main contributor to the concentrations of Pb and Cd.
Neural network. The neural network imitates the mechanism of the human brain, recombining the input information to extract simple and fuzzy features and producing the corresponding impression and judgment. Furthermore, the nonlinear activation functions of each layer, such as the sigmoid function and the Rectified Linear Unit (ReLU) function, play an important role in the nonlinear fitting ability. In a representative work [26], neural networks are combined with Monte Carlo simulations to address the uncertainties from data quality and measurement errors when predicting the phytoavailability of copper in contaminated soils from the soil input parameters.
Principal component analysis (PCA). Principal component analysis uses the covariance matrix of the data matrix to choose the principal components of data, so that it can eliminate the less important properties, reduce the dimension of data, and extract hidden subsets to detect possible sources. To survey metal accumulation in Chinese farmland soil at the national scale, Niu et al. [27] performed multivariate statistical analysis on soil properties and metal concentrations using PCA and correlation analysis. The results on 11 metals showed that Pb, Cd, Zn, and Cu had concentrations above the reference values. The results also indicated that the accumulation of these 4 metals may be associated with artificial fertilization. Also, Sun et al. [28] used PCA and correlation coefficient analysis to study the accumulation of major and trace elements in agricultural soil in the Gannan area, China. More PCA-based research includes [29,30].
Cluster analysis (CA). CA classifies the data points into several disjoint and nonempty clusters on the basis of the similarity or distance among data points. Various clustering algorithms are used in heavy metal analysis, such as spectral clustering, K-means, and hierarchical clustering. For the characterization of heavy metals in soils, Chai et al. [31] performed PCA and clustering analysis on data from the surface and underlying horizons of grassland. Three principal components were extracted, and hierarchical clustering confirmed this result. Moreover, among the three clusters from hierarchical clustering, clusters 1 and 2 were merged at a higher level, indicating that the heavy metals in clusters 1 and 2 had a similar source. Similarly, Liu et al. [32] applied PCA and clustering analysis to data from the outskirts of Changchun, China. Results showed that Pb, Cu, and Zn were from human activities, while Cr and Ni were from natural sources.
In summary, the above methods can mine patterns of heavy metals in soil in simple cases where only a few kinds of heavy metals are involved. However, they neglect the multiview characteristics of land data, leading to unsatisfactory patterns in complicated cases. Also, those methods cannot capture the intrinsic patterns within high-dimensional representations of land data. To address these challenges, a deep fusion Gaussian mixture model for multiview land data clustering is proposed in this paper.

The Deep Stacked Autoencoder
The deep stacked autoencoder is a fully connected neural network built from autoencoders, as shown in Figure 1 [33][34][35]. It extracts intrinsic representations of data through data reconstruction between an encoder and a decoder, where the encoder constructs deeper representations layer by layer and the decoder reconstructs the input [36][37][38]. The deep stacked autoencoder is trained by a greedy layer-wise method, in which each layer of the encoder and the corresponding layer of the decoder are modeled as an autoencoder to obtain the pretrained parameters, followed by end-to-end fine-tuning.
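Specifically, in a deep stacked autoencoder of $l$ layers, the $s$-th layer is modeled as an autoencoder together with the $(l-s+1)$-th layer to pretrain the weights and biases in the following form:
$$h = \sigma(w_s \odot x + b_s), \qquad \hat{x} = \sigma(w_{l-s+1} \odot h + b_{l-s+1}),$$
where $w_s$, $w_{l-s+1}$, $b_s$, and $b_{l-s+1}$ are the weights and biases of the $s$-th layer and the $(l-s+1)$-th layer, respectively, $\odot$ is the matrix product, and $h$ denotes the hidden representation. Here, $x$ and $\hat{x}$ denote the input of the $s$-th layer and its reconstruction, and $\sigma(\cdot)$ is the activation function.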
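After the pretraining, each hidden layer in the deep stacked autoencoder is fine-tuned as follows:
$$\theta \leftarrow \theta - \eta \frac{\partial J(\theta)}{\partial \theta},$$
which is based on the stochastic gradient descent algorithm. Here, $\theta$ collects the weights and biases of the layer, $\eta$ is the learning rate, and $J(\theta)$ is the reconstruction loss.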
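The layer-wise pretraining described above can be illustrated by a minimal sketch. It is not the authors' implementation: it assumes a sigmoid activation, a squared reconstruction error, per-sample stochastic gradient descent, and pure Python lists in place of a deep learning framework. The names `dense_sigmoid` and `pretrain_autoencoder`, the toy data, and all hyperparameters are hypothetical.

```python
import math
import random


def dense_sigmoid(weights, biases, inputs):
    """One fully connected layer: sigmoid(w . x + b)."""
    return [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
            for row, b in zip(weights, biases)]


def pretrain_autoencoder(data, hidden_dim, epochs=200, lr=0.5, seed=0):
    """Train one encoder layer and its mirrored decoder layer by SGD."""
    rng = random.Random(seed)
    in_dim = len(data[0])
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden_dim)]
    b_enc = [0.0] * hidden_dim
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(in_dim)]
    b_dec = [0.0] * in_dim
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            h = dense_sigmoid(w_enc, b_enc, x)        # hidden representation
            x_hat = dense_sigmoid(w_dec, b_dec, h)    # reconstruction
            total += 0.5 * sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
            # Gradients of the squared error w.r.t. the pre-activations.
            delta_out = [(xh - xi) * xh * (1.0 - xh) for xh, xi in zip(x_hat, x)]
            delta_hid = [hj * (1.0 - hj) * sum(delta_out[i] * w_dec[i][j] for i in range(in_dim))
                         for j, hj in enumerate(h)]
            # Gradient descent step on decoder, then encoder parameters.
            for i in range(in_dim):
                for j in range(hidden_dim):
                    w_dec[i][j] -= lr * delta_out[i] * h[j]
                b_dec[i] -= lr * delta_out[i]
            for j in range(hidden_dim):
                for i in range(in_dim):
                    w_enc[j][i] -= lr * delta_hid[j] * x[i]
                b_enc[j] -= lr * delta_hid[j]
        losses.append(total / len(data))
    return (w_enc, b_enc, w_dec, b_dec), losses


# Greedy stacking on hypothetical toy data: the second autoencoder is trained
# on the hidden codes produced by the first one.
toy_data = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
(first_w, first_b, _, _), first_losses = pretrain_autoencoder(toy_data, hidden_dim=3)
codes = [dense_sigmoid(first_w, first_b, x) for x in toy_data]
_, second_losses = pretrain_autoencoder(codes, hidden_dim=2)
```

The end-to-end fine-tuning step is omitted here; in this sketch it would amount to running the same gradient update through all stacked layers at once.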

The Gaussian Mixture Model
A Gaussian mixture model (GMM) is a generative probabilistic model with trainable parameters [16]. It uses several basis Gaussian components to naturally represent the multimodal characteristics of collected data by weighted superposition, where each Gaussian component denotes a modal source. Generally, the Gaussian mixture model is trained by the expectation-maximization method, which maximizes the likelihood function: the expectation step computes the probability that each sample is generated from each basis component, and the maximization step learns the mean, covariance, and weight parameters of each basis component. GMMs have been widely used in various applications, such as text clustering and image recognition.
Given a dataset $X = \{x_1, x_2, \cdots, x_N\}$ with $x_i \in \mathbb{R}^d$, the Gaussian mixture distribution is denoted as
$$p(x_i) = \sum_{k=1}^{K} w_k \, g(x_i; \mu_k, \Sigma_k),$$
where $w_k$ is the weight of each basis Gaussian component and $g(x_i; \mu_k, \Sigma_k)$ represents the basis distribution parameterized by the mean vector $\mu_k$ and the covariance matrix $\Sigma_k$ with the following form:
$$g(x_i; \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(x_i - \mu_k)^{T} \Sigma_k^{-1} (x_i - \mu_k)\right).$$
Here, $d$ is the dimension of data, and $K$ is the number of basis Gaussian components.
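As a small illustration of the mixture density defined above, the following sketch evaluates it in pure Python. It assumes diagonal covariance matrices, so that each $\Sigma_k$ is given as a list of per-dimension variances; this restriction, and the function names `gaussian_density` and `mixture_density`, are simplifications introduced here and are not part of the original formulation.

```python
import math


def gaussian_density(x, mean, variances):
    """Gaussian density g(x; mu, Sigma) for a diagonal Sigma = diag(variances)."""
    d = len(x)
    log_det = sum(math.log(v) for v in variances)                      # log |Sigma|
    maha = sum((xj - mj) ** 2 / v for xj, mj, v in zip(x, mean, variances))
    return math.exp(-0.5 * (d * math.log(2.0 * math.pi) + log_det + maha))


def mixture_density(x, weights, means, variances):
    """Weighted superposition of the K basis Gaussian components."""
    return sum(w * gaussian_density(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Hypothetical two-component mixture in two dimensions.
weights = [0.3, 0.7]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [0.5, 2.0]]
density_at_origin = mixture_density([0.0, 0.0], weights, means, variances)
```

Working with the logarithm of the normalizing constant, as done here, avoids forming very small intermediate products when $d$ is large.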
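Thus, to fit the given dataset $X = \{x_1, x_2, \cdots, x_N\}$, the logarithm likelihood function of the GMM is expressed in the following form:
$$\log L = \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \left[\log w_k + \log g(x_i; \mu_k, \Sigma_k)\right],$$
where $z_i \in \{0, 1\}^K$, $\sum_{k=1}^{K} z_{ik} = 1$, denotes the component from which $x_i$ is generated. Then, replacing $z_{ik}$ by its posterior expectation $\gamma_{ik}$ and setting the derivatives of $\log L$ to zero, we can get the computing equations of the mean, covariance, and weight parameters of each basis component:
$$\mu_k = \frac{\sum_{i=1}^{N} \gamma_{ik} x_i}{\sum_{i=1}^{N} \gamma_{ik}}, \qquad \Sigma_k = \frac{\sum_{i=1}^{N} \gamma_{ik} (x_i - \mu_k)(x_i - \mu_k)^{T}}{\sum_{i=1}^{N} \gamma_{ik}}, \qquad w_k = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ik},$$
in which
$$\gamma_{ik} = \frac{w_k \, g(x_i; \mu_k, \Sigma_k)}{\sum_{j=1}^{K} w_j \, g(x_i; \mu_j, \Sigma_j)}.$$
Generally, the expectation-maximization method is used to train the GMM in an iterative maximization manner, where the current parameters are employed to estimate the next parameters.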
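The iterative scheme can be made concrete with a minimal expectation-maximization sketch. It is restricted to one-dimensional data for brevity, so each covariance matrix reduces to a scalar variance; the function names, the small variance floor, the fixed iteration count, and the toy data are assumptions made for this illustration only.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(xs, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture by alternating E-steps and M-steps."""
    means, variances, weights = list(means), list(variances), list(weights)
    n, components = len(xs), range(len(means))
    resp = []
    for _ in range(iterations):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in xs:
            joint = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in components]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: re-estimate mean, variance, and weight of each component.
        for k in components:
            n_k = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / n_k
            variances[k] = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / n_k
            variances[k] += 1e-6  # floor that keeps the variance strictly positive
            weights[k] = n_k / n
    # resp holds the responsibilities computed in the last E-step.
    return means, variances, weights, resp


# Hypothetical data drawn around -2 and 3.
samples = [-2.1, -1.9, -2.0, -2.2, -1.8, 2.9, 3.1, 3.0, 3.2, 2.8]
fit_means, fit_vars, fit_weights, fit_resp = em_gmm_1d(
    samples, means=[-1.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5])
```

A practical implementation would monitor the log-likelihood for convergence instead of running a fixed number of iterations, and would evaluate the densities in the log domain to avoid underflow.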

The Multiview Fusion Gaussian Mixture Model Algorithm
To mine complicated fusion relationships over multiview data, a deep fusion representation-based Gaussian mixture model is proposed, which is composed of the deep fusion feature learning and the expectation-maximization clustering.
In the deep fusion feature learning, intrinsic view-specific features are first extracted by each view-specific stacked autoencoder. Then, those view-specific features are concatenated and passed through a view-common stacked autoencoder, capturing fusion representations of multiview data. In the expectation-maximization clustering, the Gaussian mixture model is used to recognize structure patterns of complicated shapes.
Figure 2: The computing paradigm of the multiview fusion Gaussian mixture model. Modality-specific encoders, the modality-common encoder-decoder, and modality-specific decoders are linked in a cascaded manner: data are transferred into hidden representations of each view by the modality-specific encoders; then, those hidden representations are concatenated and reconstructed via the modality-common encoder-decoder; finally, the reconstructed hidden representations are decoded into the original data space by the modality-specific decoders.

The Deep Fusion Feature Learning.
To obtain effective representations of multiview data, a deep fusion architecture is designed in an unsupervised encode-decode manner, which can avoid the curse of dimensionality. As shown in Figure 2, in the deep fusion architecture, all views of the data are simultaneously fed into the corresponding view-specific stacked autoencoders to learn intrinsic view-specific features. In detail, given the multiview dataset $\{x_1, x_2, \cdots, x_N\}$ with $x_i = (x_i^1, x_i^2, \cdots, x_i^v)$, the $j$-th view of each sample $x_i$ is mapped to the view-specific feature space as follows:
$$h_i^j = f_l^j(f_{l-1}^j(\cdots(f_1^j(x_i^j)))),$$
where $f_l^j(f_{l-1}^j(\cdots(f_1^j(\cdot))))$ denotes the encoding network function. To train those parameters, the features of all views are mapped back to the original data space as follows:
$$\hat{x}_i^j = g_l^j(g_{l-1}^j(\cdots(g_1^j(h_i^j)))),$$
where $g_l^j(g_{l-1}^j(\cdots(g_1^j(\cdot))))$ denotes the decoding network function. Each view-specific encoder is cascaded with the corresponding decoder to obtain the pretrained weights and biases through end-to-end training with the stochastic gradient descent algorithm.
After the view-specific intrinsic representations $\{h_i^1, h_i^2, \cdots, h_i^v\}$ are obtained, they are concatenated in the following form:
$$h_i = \mathrm{con}(h_i^1, h_i^2, \cdots, h_i^v),$$
where $\mathrm{con}(\cdot)$ is the linear concatenation function. Then, a view-common stacked autoencoder is used to transfer the concatenated representations to a fusion feature space, learning fused representations of multiview data via
$$f_i = \mathrm{encoder}(h_i), \qquad \hat{h}_i = \mathrm{decoder}(f_i),$$
in which $\mathrm{encoder}(\cdot)$ and $\mathrm{decoder}(\cdot)$ are deep neural networks with the same number of layers.
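The fusion step can be sketched as follows. The sketch assumes that the view-specific features are already available as plain Python lists, and it uses a single sigmoid layer for both the view-common encoder and decoder, whereas the architecture described above uses deep networks. The function names `concat_views`, `dense_sigmoid`, and `fuse_views`, as well as all weights and feature values, are hypothetical.

```python
import math


def concat_views(view_features):
    """con(.): join the view-specific features of one sample into one vector."""
    fused_input = []
    for features in view_features:
        fused_input.extend(features)
    return fused_input


def dense_sigmoid(weights, biases, inputs):
    """One fully connected layer with a sigmoid activation."""
    return [1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
            for row, b in zip(weights, biases)]


def fuse_views(view_features, enc_w, enc_b, dec_w, dec_b):
    """Concatenate the views, encode to the fusion space, and reconstruct."""
    h = concat_views(view_features)          # concatenated representation
    f = dense_sigmoid(enc_w, enc_b, h)       # fusion representation
    h_hat = dense_sigmoid(dec_w, dec_b, f)   # reconstruction used for training
    return h, f, h_hat


# Hypothetical features of one sample from two views (dimensions 2 and 3).
view_features = [[0.2, 0.8], [0.5, 0.1, 0.9]]
enc_w = [[0.1, -0.2, 0.3, 0.0, 0.5], [-0.4, 0.2, 0.1, 0.3, -0.1]]
enc_b = [0.0, 0.1]
dec_w = [[0.2, -0.1], [0.0, 0.3], [0.4, 0.1], [-0.2, 0.2], [0.1, 0.1]]
dec_b = [0.0] * 5
h, f, h_hat = fuse_views(view_features, enc_w, enc_b, dec_w, dec_b)
```

Only `f` is passed on to the clustering stage; `h_hat` matters during training, where the reconstruction error between `h` and `h_hat` drives the parameter updates.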

The Clustering Pattern Mining.
Specifically, after obtaining the fusion representations of the multiview dataset $\{f_1, f_2, \cdots, f_N\}$, the Gaussian mixture model with $K$ basis components is defined as follows:
$$p(f_i) = \sum_{k=1}^{K} w_k \, g(f_i; \mu_k, \Sigma_k),$$
where $w_k$ denotes the weight of the $k$-th basis Gaussian model, $f_i$ represents the $i$-th fusion representation, and $g(f_i; \mu_k, \Sigma_k)$ is the basis distribution parameterized by the mean vector $\mu_k$ and the covariance matrix $\Sigma_k$ with the following form:
$$g(f_i; \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(f_i - \mu_k)^{T} \Sigma_k^{-1} (f_i - \mu_k)\right).$$
Here, $d$ is the dimension of the fusion representations of data. Thus, the logarithm likelihood function of the given data is expressed in the following form:
$$\log L = \sum_{i=1}^{N} \sum_{k=1}^{K} z_{ik} \left[\log w_k + \log g(f_i; \mu_k, \Sigma_k)\right],$$
where $z_i \in \{0, 1\}^K$, $\sum_{k=1}^{K} z_{ik} = 1$, denotes the component from which $f_i$ is generated.
Then, setting the derivatives of $\log L$ to zero, we can get the computing equations of the mean, covariance, and weight parameters of each basis component.
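For reference, the standard expectation-maximization updates that result from this step can be written for the fusion representations as follows. The responsibility $\gamma_{ik}$, the posterior probability that $f_i$ is generated from the $k$-th component, is notation introduced here for the illustration.

$$\gamma_{ik} = \frac{w_k \, g(f_i; \mu_k, \Sigma_k)}{\sum_{j=1}^{K} w_j \, g(f_i; \mu_j, \Sigma_j)},$$

$$\mu_k = \frac{\sum_{i=1}^{N} \gamma_{ik} f_i}{\sum_{i=1}^{N} \gamma_{ik}}, \qquad \Sigma_k = \frac{\sum_{i=1}^{N} \gamma_{ik} (f_i - \mu_k)(f_i - \mu_k)^{T}}{\sum_{i=1}^{N} \gamma_{ik}}, \qquad w_k = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ik}.$$

The first expression corresponds to the expectation step and the remaining three to the maximization step; the two steps are alternated until the likelihood no longer increases.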

The Multiview Fusion Gaussian Mixture Model Algorithm.
The multiview fusion Gaussian mixture model algorithm consists of two steps, i.e., fusion feature learning and pattern mining. In the former step, all view-specific stacked autoencoders and the view-common stacked autoencoder are trained in a greedy layer-wise unsupervised manner. Then, end-to-end fine-tuning is conducted on the basis of SGD. In the latter step, the fusion features of multiview data extracted in the former step are fed into the multiview Gaussian mixture model with the predefined K. Then, the parameters of each component Gaussian model and the weight coefficients between Gaussian models are learned based on the expectation-maximization algorithm. The details of the multiview fusion Gaussian mixture model algorithm are shown in Algorithm 1.

Experiments
To evaluate the performance of the multiview fusion Gaussian mixture model, extensive experiments are conducted on two datasets. The experiments are implemented in Python, and their details are described in the following.
6.1. Compared Methods.
K-means. K-means is a typical clustering method that is widely used in practice as a representative baseline.
Gaussian mixture model. The Gaussian mixture model is a generative method based on probability distributions. It mines cluster patterns of data with multiple Gaussian distributions.
In the experiments, K-means and the Gaussian mixture model are used as the base models, which are extended to modality-specific, modality-common, and modality-fused methods with respect to raw, shallow, and deep representations of data.
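To make the contrast between the two base models concrete, the following is a minimal sketch of the K-means baseline (Lloyd's algorithm) in pure Python. Unlike the Gaussian mixture model, it assigns each sample to exactly one cluster. The function names, the fixed iteration count, the explicit initial centers, and the toy points are assumptions made for this illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, centers, iterations=20):
    """Lloyd's algorithm with hard assignments and explicit initial centers."""
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its assigned points.
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = [sum(column) / len(members) for column in zip(*members)]
    return labels, centers


# Hypothetical two-dimensional points forming two groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centers = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
```

The hard assignment in the first step is the counterpart of the expectation step of the Gaussian mixture model, which instead keeps a probability for every component.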
6.2. Datasets.
MNIST-EMNIST. MNIST [39] and EMNIST [40] are representative image datasets, which contain images of the digits from 0 to 9. They are widely used in image classification and image clustering. In the experiments, MNIST and EMNIST are fed into a fully connected neural network and a convolutional neural network, respectively, in feature learning to represent different views. The results are illustrated in Tables 1-4. Also, Figure 3 visualizes the feature learning process.

Algorithm 1: The multiview fusion Gaussian mixture model algorithm.
Input: the multiview dataset $\{x_1, x_2, \cdots, x_N\}$, the number of component models $K$, the hyperparameters of the deep fusion architecture
Output: patterns of the input data
1. Randomly initialize the parameters of each autoencoder in the deep fusion architecture;
2. Train each autoencoder layer by layer;
3. Fine-tune the deep fusion architecture in an end-to-end manner;
4. Randomly initialize the model parameters and weight coefficients of the Gaussian models;
5. Compute the probability of each sample being generated from each Gaussian model;
6. Compute the model parameters and weight coefficients of each Gaussian model;
7. Update the model parameters and weight coefficients of the Gaussian models;
8. Go to 5 until convergence; then output the probability of each data sample being generated from each Gaussian model as the patterns of the input data.

From the above results, several observations can be made. On raw representations of data, K-means produced better results than GMM in terms of ARI and NMI. This is because the less important properties in raw representations are also modeled by the probability distributions of GMM, which decreases the clustering performance. The second observation is that the deep feature-based methods (K-means-DM, GMM-DM) outperform the shallow methods (K-means-M, GMM-M), since the proposed modality-specific stacked autoencoder can effectively extract the intrinsic features of each view of data.
Additionally, the clustering results of GMM-DM are better than those of K-means-DM, since, given clean features, the multiple Gaussian distributions in GMM fit the patterns of data better than the hard division in K-means. The third observation is that the proposed method achieves the best results in terms of ARI and NMI, since it can distill information from all views through the designed deep fusion network. These observations demonstrate the advantage of the proposed method.
Figure 3 shows the t-SNE plots of the above models to visualize the features learned by each model. There are two observations. First, the fusion model learns better representations than each single-view model. Specifically, as shown in the third column, the proposed model produces features in which similar data are closer to each other than dissimilar data. Furthermore, the distance between different clusters is larger. Second, the proposed model learns data representations faster than the single-view models. In detail, the representations produced by the fusion model are more disordered than those of the compared models at the beginning, while the fusion model achieves better representations after the same number of training epochs.
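Since the comparison above is reported in terms of ARI and NMI, a short sketch of how the two scores can be computed from ground-truth labels and cluster assignments may be helpful. The normalization of the mutual information is not specified in the text; the geometric mean of the two entropies is assumed here, and other variants (for example, the arithmetic mean) give slightly different values. The function names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def pairs(n):
    """Number of unordered pairs among n items."""
    return n * (n - 1) / 2.0


def adjusted_rand_index(labels_true, labels_pred):
    """ARI computed from the contingency table of two labelings (n >= 2)."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    rows, cols = Counter(labels_true), Counter(labels_pred)
    sum_cells = sum(pairs(c) for c in cells.values())
    sum_rows = sum(pairs(c) for c in rows.values())
    sum_cols = sum(pairs(c) for c in cols.values())
    expected = sum_rows * sum_cols / pairs(n)
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:  # degenerate case, e.g. a single cluster on both sides
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with geometric-mean normalization (an assumption of this sketch)."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    rows, cols = Counter(labels_true), Counter(labels_pred)
    mutual_info = sum(c / n * math.log(c * n / (rows[a] * cols[b]))
                      for (a, b), c in cells.items())
    h_true = -sum(c / n * math.log(c / n) for c in rows.values())
    h_pred = -sum(c / n * math.log(c / n) for c in cols.values())
    if h_true == 0.0 or h_pred == 0.0:
        return 1.0 if h_true == h_pred else 0.0
    return mutual_info / math.sqrt(h_true * h_pred)


# Hypothetical labels: the clustering matches the ground truth up to renaming.
truth = [0, 0, 1, 1, 2, 2]
clusters = [2, 2, 0, 0, 1, 1]
ari = adjusted_rand_index(truth, clusters)
nmi = normalized_mutual_info(truth, clusters)
```

Both scores are invariant to a renaming of the cluster labels, which is why they suit clustering evaluation, where cluster indices carry no meaning.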

Conclusions
In this paper, a deep fusion Gaussian mixture model is proposed for multiview data clustering based on deep fusion representations, which can potentially capture the intrinsic patterns of heavy metal data. In this model, a deep fusion feature architecture of modality-specific and modality-common stacked autoencoders is designed to merge the information of all views of data, which can capture deep intrinsic fusion representations of data. Afterward, the Gaussian mixture model is extended to the fusion representations to naturally recognize accurate patterns. Finally, extensive experiments show that the proposed method outperforms the compared methods. In the future, more effective deep clustering methods that are trained in an end-to-end manner will be explored.

Data Availability
The datasets used in this paper are public datasets, which can be accessed at the following website: MNIST and EMNIST (https://pytorch.org/docs/stable/torchvision/datasets.html).