Image Retrieval Based on Multiview Constrained Nonnegative Matrix Factorization and Gaussian Mixture Model Spectral Clustering Method

Content-based image retrieval has recently become an important research topic and is widely used for managing images in large repositories. In this article, we present an efficient technique, called MNGS, which integrates multiview constrained nonnegative matrix factorization (NMF) and Gaussian mixture model- (GMM-) based spectral clustering for image retrieval. In the proposed methodology, the multiview NMF scheme provides competitive sparse representations of the underlying images by decomposing a similarity-preserving matrix that is formed by fusing multiple features from different visual aspects. In particular, the proposed method merges manifold constraints into the standard NMF objective function to impose an orthogonality constraint on the basis matrix and to satisfy the structure preservation requirement of the coefficient matrix. To cluster the sparse representations, we develop a GMM-based spectral clustering method in which the Gaussian components are regrouped in spectral space, which significantly improves the retrieval effectiveness. In this way, image retrieval over the whole database reduces to a nearest-neighbour search in the cluster containing the query image. We also prove the convergence of the objective function and analyse the computational complexity. Experimental results on three standard image datasets demonstrate the advantages of the proposed retrieval scheme.


Introduction
With the increasing abundance of digital images available from a variety of sources, content-based image retrieval (CBIR) from huge databases has attracted a lot of attention in the past decade [1–3]. An effective CBIR system should search images by computing the similarity of the extracted features (views) between the user-defined query pattern and the images in large-scale collections. Existing visual features include, but are not limited to, intensity, shape, colour, scene, texture, and local invariants. Early CBIR systems calculated the similarity with only one feature [4,5], which normally leads to undesirable retrieval results because a single feature cannot effectively distinguish all types of images. Generally, to achieve proper results in the CBIR framework, several features that capture the meaningful contents of the underlying images are integrated [6–8]. However, these visual features are often high dimensional and nonsparse, and direct manipulation of the feature descriptors is the most time-consuming operation. To overcome this limitation, dimensionality reduction is widely used. Popular dimensionality reduction techniques include linear discriminant analysis (LDA) [9], principal component analysis (PCA) [10], independent component analysis (ICA) [11], and singular value decomposition (SVD) [12]. Nonnegative matrix factorization (NMF), a novel tool for source separation [13–15], is an alternative way to reduce the dimensionality. It decomposes a nonnegative matrix into two smaller nonnegative matrices: a basis matrix and a coefficient matrix. The defining characteristic of NMF is that all elements are nonnegative, which distinguishes it from other conventional dimensionality reduction techniques. The coefficient matrix models the features of images as an additive combination of a set of basis vectors. However, as is well known, the original NMF does not impose a structure constraint on the sparse representations of features during decomposition [16]. In recent years, various studies have extended the standard NMF by enforcing a structure preservation constraint on the objective function [17,18]. Among them, a graph-embedding objective function of NMF encodes the graph information of the images into the sparse representation [19]. Liu et al. [20] introduced constrained nonnegative matrix factorization (CNMF), in which the label information, treated as an additional hard constraint for semisupervised retrieval, is directly incorporated into the original NMF. Another popular NMF algorithm, topographic NMF (TNMF), was proposed by Xiao et al. [21], who imposed a topographic constraint on the objective function to pool together structure-corrected features. The normalization strategies of all the above-mentioned NMF-based techniques pertain to the coefficient matrix of the NMF decomposition. In fact, studies on constraints for the basis matrix are still limited.
More recently, some works based on statistical model frameworks have been reported for image retrieval [22,23]. For example, Zeng et al. [24] introduced an image description algorithm that characterizes a colour image by combining the spatiogram with Gaussian mixture model- (GMM-) based colour quantization. Similarly, another colour image indexing method based on a spatiochromatic multichannel GMM was introduced by Piatek and Smolka [25]. Marakakis et al. [26] proposed a relevance feedback method for CBIR that uses GMMs as image representations and employs the Kullback-Leibler (KL) divergence. The retrieval capability of these methods mainly relies on the fact that mixture model-based techniques provide powerful methodologies for data clustering [27,28]. This type of technique is able to model uncertainty in a statistical manner. Specifically, a GMM fits different shapes of observed data using multivariate Gaussian distributions. A special virtue of the GMM is that it requires the estimation of only a small number of parameters.
Motivated by the above discussion, in this paper, we propose a novel technique combining multiview constrained NMF and GMM-based spectral clustering (MNGS) for image retrieval. The proposed MNGS has the following attractive characteristics. First, multiple features are extracted from the underlying images, and MNGS then integrates these features to obtain a similarity-preserving matrix. Second, we incorporate two constraint terms into the original NMF objective function to represent latent feature information in a low-dimensional space. The first constraint term drives the basis matrix towards orthogonality as far as possible to reduce redundancy; this constraint therefore tends to yield competitive sparse representations of the visual features. The remaining constraint allows us to consider the latent graph information of the images and satisfy the structure preservation requirement. More importantly, this study provides a detailed proof of convergence of the objective function to ensure that the algorithm converges to a local minimum during decomposition. Third, a multivariate GMM is embedded into the proposed MNGS to model the distribution of the sparse features in terms of the coefficient matrix of NMF. Consequently, images with sparse features belonging to the same Gaussian component are similar, so these images can be labelled with the same subcluster. Considering the complexity of images in repositories, it is natural to assign more GMM components to label images. In general, the larger the number of components, the more accurate the indexing results obtained with GMM. However, the computational cost also increases owing to the learning of the GMM parameters. To match the optimal number of components, inspired by the work of [29], spectral clustering based on KL divergence is finally utilized to merge the GMM components and achieve the desired retrieval results. Specifically, by the eigendecomposition of the similarity matrix measured by KL divergence, similar GMM components can be grouped into several spectral components in a lower-dimensional spectral space. Thus, each sparse feature can be labelled by both a GMM component and a spectral component, which might lead to more accurate clustering results. With the label information of the query image obtained from this statistical model, similar images can be retrieved effectively from the clustering results.
The rest of this paper is organized as follows. Section 2 introduces the related work, including the classical NMF and GMM. In Section 3, we describe the details of the proposed framework, followed by the proof of the convergence of this approach and the complexity analysis in Section 4. Retrieval experiments conducted on real-world image datasets are discussed in Section 5. Section 6 reports the concluding remarks and some suggestions for further research.

Preliminaries
This section briefly reviews the classical NMF and GMM. The former is attractive owing to its intrinsic advantage of providing a low-dimensional description for nonnegative data. The latter shows its accuracy and effectiveness in most clustering tasks.

Nonnegative Matrix Factorization.
The NMF is commonly used to decompose a matrix into two nonnegative matrices under the condition that all its elements are nonnegative. Mathematically, given a nonnegative data matrix $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{M \times N}$, NMF aims at finding two nonnegative matrices $U = [u_{ik}] \in \mathbb{R}^{M \times K}$ and $V = [v_{jk}] \in \mathbb{R}^{N \times K}$ such that the original data matrix $X$ can be well approximated by
$$X \approx U V^{T}, \tag{1}$$
where $K$ is a new reduced dimension (inner dimension) and satisfies $K < \min\{M, N\}$. In this way, each column of $V^{T}$ can be regarded as a sparse representation of the associated column vector in $X$. There are many criteria to solve the factoring problem and evaluate the quality of the approximation, for example,
$$O = \| X - U V^{T} \|_{F}^{2}, \tag{2}$$
where $\| \cdot \|_{F}$ denotes the Frobenius norm. This objective function is nonconvex in $U$ and $V$ jointly, and it has been proved to be nonincreasing under the multiplicative updates below [16]. To minimize the above objective function, Lee and Seung [30] derived the multiplicative updates of the basis matrix $U$ and coefficient matrix $V$, respectively, as follows:
$$u_{ik} \leftarrow u_{ik} \frac{(X V)_{ik}}{(U V^{T} V)_{ik}}, \qquad v_{jk} \leftarrow v_{jk} \frac{(X^{T} U)_{jk}}{(V U^{T} U)_{jk}}. \tag{3}$$
The above multiplicative updates ensure convergence to a locally optimal solution. The iteration stops when the objective function converges or the maximum number of iterations is reached.
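The multiplicative updates reviewed above can be illustrated with a minimal pure-Python sketch. It assumes the factorization is written as X ≈ U Vᵀ with U of size m × k and V of size n × k; the function names (`nmf`, `matmul`, `frobenius_error`), the random initialization, and the small constant `eps` guarding against division by zero are illustrative choices, not details taken from the paper.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(x, u, v):
    """Squared Frobenius norm of X - U V^T."""
    approx = matmul(u, transpose(v))
    return sum((xi - ai) ** 2 for xr, ar in zip(x, approx) for xi, ai in zip(xr, ar))

def nmf(x, k, iters=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for X ~ U V^T (U: m x k, V: n x k)."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    # Strictly positive random initialization (an assumption of this sketch).
    u = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    v = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    errors = [frobenius_error(x, u, v)]
    for _ in range(iters):
        # U <- U * (X V) / (U V^T V), element-wise
        xv = matmul(x, v)
        uvtv = matmul(u, matmul(transpose(v), v))
        u = [[u[i][j] * xv[i][j] / (uvtv[i][j] + eps) for j in range(k)] for i in range(m)]
        # V <- V * (X^T U) / (V U^T U), element-wise
        xtu = matmul(transpose(x), u)
        vutu = matmul(v, matmul(transpose(u), u))
        v = [[v[i][j] * xtu[i][j] / (vutu[i][j] + eps) for j in range(k)] for i in range(n)]
        errors.append(frobenius_error(x, u, v))
    return u, v, errors

# Toy nonnegative data matrix (4 features x 3 samples), factorized with k = 2.
x = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [0.5, 1.0, 2.0]]
u, v, errors = nmf(x, k=2)
```

The recorded `errors` list makes the nonincreasing behaviour of the objective easy to inspect; a practical implementation would normally rely on a vectorized numerical library instead of nested lists.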

Gaussian Mixture Model.
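The classical Gaussian mixture model assumes that each observed data point depends on its label $\mathcal{C}_j$. The density function of GMM at each observation $x_i$ can be described by
$$p(x_i \mid \Pi, \mathcal{C}) = \sum_{j=1}^{J} \pi_j \, p(x_i \mid \mathcal{C}_j), \tag{4}$$
where $\Pi = \{\pi_1, \pi_2, \ldots, \pi_J\}$, and the prior distribution $\pi_j$ indicates the probability of each component belonging to GMM, which satisfies the constraints
$$0 \le \pi_j \le 1, \qquad \sum_{j=1}^{J} \pi_j = 1. \tag{5}$$
Expression (4) can be regarded as a linear combination of several Gaussian components. Each component is a Gaussian distribution that has its own covariance $\Sigma_j$ and mean $\mu_j$, defined as
$$p(x_i \mid \mathcal{C}_j) = \frac{1}{(2\pi)^{D/2} |\Sigma_j|^{1/2}} \exp\left( -\frac{1}{2} (x_i - \mu_j)^{T} \Sigma_j^{-1} (x_i - \mu_j) \right), \tag{6}$$
where $D$ is the dimension of $x_i$. To optimize the parameter set $\mathcal{C}_j = \{\mu_j, \Sigma_j\}$, the log-likelihood function of (4) must be maximized by the expectation maximization (EM) algorithm [31], which is expressed as
$$L(\Pi, \mathcal{C}) = \sum_{i=1}^{N} \log \left( \sum_{j=1}^{J} \pi_j \, p(x_i \mid \mathcal{C}_j) \right). \tag{7}$$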
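As a hedged illustration of how EM maximizes the mixture log-likelihood, the following sketch runs EM on a one-dimensional, two-component GMM. The restriction to scalar data, the variance floor `min_var`, the toy data, and all function names are assumptions made for brevity; the paper itself works with multivariate components.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    """Sum over points of the log of the weighted mixture density."""
    return sum(
        math.log(sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, variances)))
        for x in data
    )

def em_step(data, weights, means, variances, min_var=1e-6):
    """One EM iteration: posteriors (E-step), then closed-form updates (M-step)."""
    resp = []
    for x in data:
        joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    n = len(data)
    new_w, new_m, new_v = [], [], []
    for k in range(len(weights)):
        nk = sum(r[k] for r in resp)
        mean = sum(r[k] * x for r, x in zip(resp, data)) / nk
        var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, data)) / nk
        new_w.append(nk / n)
        new_m.append(mean)
        new_v.append(max(var, min_var))  # floor avoids a collapsing component
    return new_w, new_m, new_v, resp

# Toy data: two well-separated groups around 1.0 and 5.0.
data = [0.9, 1.0, 1.1, 1.2, 0.8, 4.9, 5.0, 5.1, 5.2, 4.8]
w, m, v = [0.5, 0.5], [0.0, 6.0], [1.0, 1.0]
lls = []
for _ in range(20):
    w, m, v, resp = em_step(data, w, m, v)
    lls.append(log_likelihood(data, w, m, v))
# Hard labels: the component with the largest posterior for each point.
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
```

Tracking `lls` shows the log-likelihood rising monotonically across iterations, which is the property that makes EM a convenient fitting procedure here.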

The Proposed Method
In this section, we introduce an image retrieval approach, called MNGS, which consists of four major parts: feature extraction, multiview constrained NMF, GMM-based spectral clustering, and similarity ranking. Figure 1 shows the overall framework of the proposed MNGS method.

Mathematical Problems in Engineering

Feature Extraction.
Images may be represented in terms of different features, each of which carries different information, such as orientation, texture, intensity, scene, colour, and spatial relation. To construct the image feature space, this paper utilizes several visual descriptors for the detection of objects. Scene characteristics play an important role in the description of the object. To obtain the scene information, in this study, the energy spectrum of the grey-scaled image is first filtered by 32 Gabor filters in 4 frequency bands with 8 orientations. Each filtered spectrum image is then decomposed into 4 × 4 grid subregions, leading to a 512-dimensional Gist [32] feature.
Orientation information is a common feature of every object. Here, the grey-scaled image with Gamma correction is divided into 3 × 3 blocks, each of which consists of 4 × 4 cells. By calculating an 8-bin gradient histogram for each cell, a 1152-dimensional HOG feature [33] is obtained.
The texture feature is one of the most important characteristics for an image retrieval strategy, because each object has its own texture in nature. According to [34], LBP is highly suitable for characterizing the texture of an object; therefore, this paper adopts a 512-dimensional LBP feature. More specifically, LBP redefines the grey value of each pixel by thresholding a 3 × 3 neighbourhood, and the histogram of this new grey-scaled image is defined as the LBP feature.
Generally, features extracted from a natural image include colour information. Therefore, it is very important for the image retrieval system to select a colour descriptor that is robust to illumination, colour, and hue. This paper computes a 64-bin histogram of each RGB channel and concatenates them, leading to a 192-dimensional ColorHist [35].
Thus, each image can be simultaneously described by four visual features, namely, Gist, HOG, LBP, and ColorHist.
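The dimensionalities quoted above follow from simple counting, and the colour descriptor is easy to sketch. The snippet below is a minimal illustration that assumes 8-bit RGB pixels given as tuples and equal-width bins; the function name `color_hist` and the constants are hypothetical and not taken from the paper.

```python
# Dimension bookkeeping for the descriptors described in the text.
GIST_DIM = 32 * (4 * 4)            # 32 Gabor filters x 16 grid subregions = 512
HOG_DIM = (3 * 3) * (4 * 4) * 8    # 9 blocks x 16 cells x 8 orientation bins = 1152
COLOR_DIM = 3 * 64                 # 3 channels x 64 bins = 192

def color_hist(pixels, bins=64):
    """Concatenate per-channel histograms of 8-bit RGB pixels into one vector.

    Assumes 256 is divisible by `bins`, so every bin covers the same number
    of intensity levels (4 levels per bin when bins == 64).
    """
    width = 256 // bins
    hist = [0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            hist[channel * bins + value // width] += 1
    return hist

# Three example pixels: the result has 192 entries and counts 9 channel values.
example = color_hist([(0, 0, 0), (255, 128, 3), (4, 4, 4)])
```

In practice the histogram would usually be normalized by the number of pixels so that images of different sizes remain comparable; that step is omitted here.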

Multiview Constrained NMF.
This paper seeks a novel constrained objective function to make it converge to a local minimum.We thus first consider incorporating the structure constraint into the conventional NMF framework for similarity preservation. Let We then adopt a Gaussian-like heat kernel weighting as follows: where the scalable parameter  (V) stands for the median of all paired distances measured by the Frobenius form.We can then construct the following feature matrix to fuse the different views: where  V is the view number; in the current study,  V = 4.In classical NMF, the coefficient matrix cannot reflect the latent similar contents shared by different views, which is meaningful to the image retrieval system.To solve this problem, a symmetric similarity measurement matrix is introduced to describe the closeness of the considered images, which is expressed as where   is a unit matrix with size  × .With the similarity matrix defined above, we can enforce the structure constraint into the matrix factorization process by introducing the following regularization: where tr(⋅) denotes the trace of a matrix,  is a Laplacian matrix  = Λ −  [36], and Λ represents a diagonal matrix whose element corresponds to Λ  = ∑    .
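To make the kernel weighting and the graph Laplacian more concrete, here is a small sketch for a single view. It assumes the common heat-kernel form exp(-d² / (2σ²)) with σ set to the median pairwise distance; the exact expression used by the authors is not recoverable from the text, so this form, together with the function names, is an assumption.

```python
import math
from statistics import median

def heat_kernel(features):
    """Gaussian-like heat kernel with the median pairwise distance as bandwidth."""
    n = len(features)
    dist = [[math.dist(a, b) for b in features] for a in features]
    sigma = median(dist[i][j] for i in range(n) for j in range(i + 1, n))
    return [
        [math.exp(-dist[i][j] ** 2 / (2.0 * sigma ** 2)) for j in range(n)]
        for i in range(n)
    ]

def laplacian(w):
    """Graph Laplacian: diagonal degree matrix minus the weight matrix."""
    n = len(w)
    degrees = [sum(row) for row in w]
    return [[(degrees[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]

# Two nearby points and one distant point in a 2-D feature space.
points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
w = heat_kernel(points)
lap = laplacian(w)
```

Each row of the Laplacian sums to zero by construction, which is what makes a trace regularizer of this kind penalize differences between the representations of strongly connected samples.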
In addition, considering the redundancy of the basis matrix, the current paper imposes an orthogonality constraint on the basis vectors and expects this orthogonal regularization to sharply reduce the redundancy and make the basis matrix near-orthogonal; a regularization term on the basis matrix is defined for this purpose. Based on the discussions above, by introducing the structure and orthogonality constraints into the objective function of the classical NMF framework, we obtain the proposed objective function (13), in which two nonnegative parameters balance the factorization error against the regularized constraints.
Because the objective function of the proposed regularized NMF is not convex, it is unrealistic to search for a global solution with respect to both the basis matrix and the coefficient matrix. Instead, to solve the optimization problem, a novel multiplicative update scheme for the two matrices is introduced in this paper to achieve a local minimum.
Considering the nonnegativity of the two variables  and , the Lagrange multipliers Φ = [  ] ∈ R × and Ψ = [  ] ∈ R × are utilized to optimize the objective function (13).This yields the corresponding Lagrange function written by To find  and  that minimize L, we take the partial derivatives of L with respect to  and  on both sides Applying the Karush-Kuhn-Tucker condition [37]     = 0 and     = 0 lead to the following update formula for  and : Based on this condition, it is easy to derive the ultimate update rules as follows: Regarding the convergence of the update rules, we have the following theorem.
This theorem guarantees that the proposed objective function can converge to a local optimum, and its proof is given via an auxiliary function in Section 4.

GMM-Based Spectral Clustering.
The coefficient matrix of the proposed multiview NMF can be regarded as a low-rank representation of the latent sparse features from various sources of data. Specifically, each row of the coefficient matrix represents the feature of the associated training sample. Thus, once we obtain the coefficient matrix, the training data can be clustered by applying a common clustering method, such as $k$-means or fuzzy $c$-means (FCM), to the sparse features. In our method, we apply a GMM to the rows of the coefficient matrix, $[z_1, z_2, \ldots, z_N]^{T}$, and cluster them into $J$ labels, where $z_i$ is a $K$-dimensional feature vector. Labels are denoted by $(\mathcal{C}_1, \mathcal{C}_2, \ldots, \mathcal{C}_J)$. The proposed GMM-based spectral clustering strategy assumes that it is possible to fit the distribution of different features of the coefficient matrix using a GMM with parameter $\Xi = \{\pi_j, \mathcal{C}_j = \{\mu_j, \Sigma_j\}\}$. In light of (7), it is clear that the EM algorithm cannot be directly adopted to optimize these parameters. To solve this problem, Jensen's inequality in the form of $\log(\sum_{j=1}^{J} \rho_j y_j) \ge \sum_{j=1}^{J} \rho_j \log(y_j)$ is introduced. Thus, we can rewrite the log-likelihood function (7) as follows:
$$\mathcal{J}(\Xi) = \sum_{i=1}^{N} \sum_{j=1}^{J} \gamma_{ij} \left[ \log \pi_j + \log p(z_i \mid \mathcal{C}_j) \right]. \tag{20}$$
With the Bayesian theory, the posterior probability $\gamma_{ij}$ indicates the possibility of $z_i$ belonging to the $j$th component by
$$\gamma_{ij} = \frac{\pi_j \, p(z_i \mid \mathcal{C}_j)}{\sum_{l=1}^{J} \pi_l \, p(z_i \mid \mathcal{C}_l)}. \tag{21}$$
Considering the constraints $0 \le \pi_j \le 1$ and $\sum_{j=1}^{J} \pi_j = 1$, the prior distribution $\pi_j$ can be estimated by setting the partial derivative of the objective function $\mathcal{J}(\Xi)$ over it to zero:
$$\frac{\partial}{\partial \pi_j} \left[ \mathcal{J}(\Xi) - \lambda \left( \sum_{j=1}^{J} \pi_j - 1 \right) \right] = 0, \tag{22}$$
where $\lambda$ is the Lagrange multiplier. Equation (22) yields the following form at step $(t + 1)$:
$$\pi_j^{(t+1)} = \frac{1}{N} \sum_{i=1}^{N} \gamma_{ij}^{(t)}. \tag{23}$$
Similarly, setting the derivatives of the objective function with respect to the mean $\mu_j$ and covariance $\Sigma_j$ to zero, we obtain their estimations as follows:
$$\mu_j^{(t+1)} = \frac{\sum_{i=1}^{N} \gamma_{ij}^{(t)} z_i}{\sum_{i=1}^{N} \gamma_{ij}^{(t)}}, \tag{24}$$
$$\Sigma_j^{(t+1)} = \frac{\sum_{i=1}^{N} \gamma_{ij}^{(t)} (z_i - \mu_j^{(t+1)}) (z_i - \mu_j^{(t+1)})^{T}}{\sum_{i=1}^{N} \gamma_{ij}^{(t)}}. \tag{25}$$
According to the posterior probability, the final labelling result is determined by
$$z_i \in \mathcal{C}_{j^{*}}, \qquad j^{*} = \arg\max_{j} \gamma_{ij}. \tag{26}$$
A crucial problem in GMM-based clustering is how to select a proper number of Gaussian components (subclusters). Owing to the complexity of the training samples, the number of components in GMM is generally expected to be larger than the number of artificial labels. The subclusters consisting of sparse features labelled by similar GMM components are then regarded as similar. Finally, we merge the subclusters using spectral clustering to match the number of artificial labels. In our method, two main successive steps are considered. First, the similarity measurement of different Gaussian components is implemented using KL divergence [38], defined as
$$\mathrm{KL}(p_a \,\|\, p_b) = \int p_a(z) \log \frac{p_a(z)}{p_b(z)} \, dz. \tag{27}$$
In particular, according to (27), the explicit expression of KL divergence between two Gaussian distributions can be obtained by
$$\mathrm{KL}(p_a \,\|\, p_b) = \frac{1}{2} \left[ \log \frac{|\Sigma_b|}{|\Sigma_a|} - K + \mathrm{tr}(\Sigma_b^{-1} \Sigma_a) + (\mu_b - \mu_a)^{T} \Sigma_b^{-1} (\mu_b - \mu_a) \right]. \tag{28}$$
Second, after the similarity measurement has been completed, a symmetric similarity matrix $S^{\mathrm{GMM}}$ is generated from the pairwise KL divergences; this matrix is critical to the success of spectral clustering. We then transform the subclusters obtained by GMM into a two-dimensional eigenspace by utilizing the eigenvalues and eigenvectors of the symmetric similarity matrix. The eigenvalues and eigenvectors can be obtained by eigendecomposition of the Laplacian matrix $L^{\mathrm{GMM}} = \Lambda^{\mathrm{GMM}} - S^{\mathrm{GMM}}$, where $\Lambda^{\mathrm{GMM}}$ is a diagonal matrix with $\Lambda^{\mathrm{GMM}}_{ii} = \sum_{j} S^{\mathrm{GMM}}_{ij}$. According to graph Laplacian theory [39], the crucial feature information of the subclusters is contained in the eigenvectors corresponding to the second and third smallest eigenvalues. Assume that the two chosen eigenvectors are $\mathrm{v}^{1}$ and $\mathrm{v}^{2}$. Some conventional clustering approaches, such as $k$-means and FCM, are then conducted on the points $\{(\mathrm{v}^{1}_{j}, \mathrm{v}^{2}_{j}) \mid j = 1, \ldots, J\}$ to obtain the clustering results. Finally, if some subclusters belong to the same labels in the spectral space $(\Omega_1, \Omega_2, \ldots, \Omega_R)$, where $R$ is the number of artificial labels, the samples belonging to each subcluster are treated as similar and can be merged as a new spectral component $\Omega_r$. The framework of the MNGS method is outlined as follows.
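The similarity measurement between Gaussian components can be sketched with the closed-form KL divergence. For brevity the snippet assumes diagonal covariances (each component given as a list of per-dimension variances), and it symmetrizes the divergence with exp(-(KL_ab + KL_ba) / 2); both the diagonal restriction and this particular symmetrization are assumptions of the sketch, since the similarity matrix used in the paper is not recoverable from the text.

```python
import math

def kl_diag_gaussians(mean_a, var_a, mean_b, var_b):
    """Closed-form KL(N_a || N_b) for Gaussians with diagonal covariances."""
    return 0.5 * sum(
        va / vb + (mb - ma) ** 2 / vb - 1.0 + math.log(vb / va)
        for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b)
    )

def similarity_matrix(means, variances):
    """Symmetric similarity between components (hypothetical symmetrization)."""
    k = len(means)
    sim = [[0.0] * k for _ in range(k)]
    for a in range(k):
        for b in range(k):
            forward = kl_diag_gaussians(means[a], variances[a], means[b], variances[b])
            backward = kl_diag_gaussians(means[b], variances[b], means[a], variances[a])
            sim[a][b] = math.exp(-0.5 * (forward + backward))
    return sim

# Three components: the first two are close, the third is far away.
means = [[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
sim = similarity_matrix(means, variances)
```

Components with a large mutual similarity are the ones a subsequent spectral clustering step would tend to merge into the same spectral component; the eigendecomposition itself is left out because it is best delegated to a numerical library.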

Similarity Ranking.
The related kernel matrix of the similarity measurement between a given query image and training samples can be expressed by where  (V) test represents the Vth feature of the query image and  (V)   corresponds to the same type of feature about the training sample.Meanwhile, using a linear projection matrix , We can directly project the kernel matrix (30) into the lowdimensional space and finally obtain the sparse representation of the test sample as follows: To rank the training samples based on its similarity against the query image in descending order, the probability that the query image belongs to different GMM components should be computed first.
. ., } to compare the similarity between different spectral components against the query.With the rules and similarity measurements mentioned above, all training samples can be sorted in a reverse order (e.g., the third step, the second step, and the first step).Finally, for each query, we can retrieve the best matches for it using this queue.
Input. The multiview feature fusion matrix obtained using (9), the inner dimension, the regularization parameters, the number of GMM components, and the image dataset.

Convergence and Computational Complexity Analysis
The convergence proof of the proposed method benefits from an auxiliary function, which is characterized by the following lemma.
Lemma 2. If $h$ is an auxiliary function of $F$ and the conditions $h(u, u^{t}) \ge F(u)$ and $h(u, u) = F(u)$ are satisfied, then $F$ is nonincreasing under the update
$$u^{t+1} = \arg\min_{u} h(u, u^{t}). \tag{34}$$
Proof. Obviously, according to the conditions, we have
$$F(u^{t+1}) \le h(u^{t+1}, u^{t}) \le h(u^{t}, u^{t}) = F(u^{t}). \tag{35}$$
The equality $F(u^{t+1}) = F(u^{t})$ holds only if $u^{t}$ is a local minimum of $h(u, u^{t})$.
Because the update operations defined by ( 18) and ( 19) are element-wise in nature, if we let  be a constant, it is sufficient to verify that (, ) = () is convergent for any element   in .To accomplish this, this paper defines an auxiliary function regarding    as follows: From (36), it is easy to find ℎ( t  ,    ) = (   ); thus, the problem is reduced to prove ℎ(,    ) ≥   ().To achieve this, the auxiliary function in ( 36) is compared with the Taylor series expansion of   (), given by where    and    represent the first and second derivatives, respectively.Using (33), we have Consequently, comparing (36) to (37) by employing (38), the problem of ℎ(,    ) ≥   () can be equivalently transformed to prove With the definition of matrix multiplication, the left hand of the inequality above is certified as Additionally, as mentioned in Algorithm Substituting ℎ(,    ) of ( 36) into (34), we can arrive at the local minimum of the auxiliary function denoted by We rewrite (42) as It is obvious that ( 43) is equivalent to the expression of ( 16); therefore, the minimum solution ( 42) is equivalent to the update rule for the basis matrix in (18).Now, combining ( 36), ( 37), (39), and ( 42), we can obtain Next, following a procedure similar to that described above, fixing  as a constant, the update rule for  will be Mathematical Problems in Engineering proved to be nonincreasing.Let   () represent the elementbased objective function.Similarly, the proof begins with the introduction of an auxiliary function with respect to    , defined by Because ℎ(   ,    ) = (   ) is obvious, the Taylor series expansion of   () is then utilized to prove the inequality ℎ(,    ) ≥   (), expressed as According to (33), the corresponding first-and second-order derivatives of   () regarding    can be formulated by We could find without difficulty from ( 45) and ( 46) that ℎ(,    ) ≥   () yields Obviously, we have the following inequality: Substituting ℎ(,    ) into (34), the update rule of  in ( 
19) can be obtained as a local optimum of the auxiliary function (45): Again combining ( 45), ( 46), (48), and (50), the following inequality is inferred as 44) and (51), the objective function (, ) can be proved to be nonincreasing with (18) and (19) as  ( +1 ,  +1 ) ≤  (  ,  +1 ) ≤  (  ,   ) . (52) Thus, we have proved the convergence of Theorem 1.

Complexity Analysis.
This subsection mainly analyses the computational complexity of the proposed MNGS algorithm using the large O notation.The computational cost of MNGS primarily comprises three parts.First, we need O(2 2 (∑  V V=1  V )) float point operations to obtain the multiview feature fusion matrix  and the similarity measurement matrix  in terms of ( 9) and (10), respectively.Based on the updating rules reported by ( 18) and ( 19), the cost required for updating matrices  and  is then O( 2 ).Additionally, the proposed MNGS requires O( 3 ) to work with the matrix operation for each component of GMM and O( 2 ) to label the membership for each sample of the subclusters.Thus, assuming that subclusters , samples , and iterations  have been considered, the total time complexity of the EM algorithm is O( 2 +  3 ).Finally, concentrating on the spectral clustering represented by ( 28) and ( 29), the cost spent on the similarity matrix of GMM components is O( 2  3 ).Except for this cost, the computational complexity of clustering the similar components is O( 3 ).Considering the relationships  >  and  > , the time cost of the spectral clustering can be ignored.Thus, from the foregoing complexity analysis, if the algorithm terminates after  iterations, the maximum overall cost of the proposed

Experimental Results
This section presents a series of experiments to verify the accuracy and effectiveness of the proposed MNGS algorithm. The search performance reflects the clustering capability of the multiview constrained NMF learned in the training phase. Once the clustering result is obtained using GMM-based spectral clustering, all $N$ training images in the dataset are ranked in the retrieval phase according to the probabilities that the images and the query belong to the clustering components. A returned image set with retrieval length L, expressed as a percentage, is then taken to be the $N \times$ L images nearest the query. Three publicly available datasets are used in the present study: Caltech-256 (http://www.vision.caltech.edu/ImageDatasets/Caltech256/) [40], CIFAR-10, and CIFAR-100 (http://www.cs.toronto.edu/~kriz/cifar.html) [41]. The evaluations are carried out on a general purpose computer with an Intel Core 2 Duo 2.1 GHz CPU (T6570) and 3 GB of RAM under the Windows 7 environment. All algorithms have been implemented in MATLAB 2010b.

Preparation for Datasets.
The Caltech-256 dataset includes 30,607 images in 257 categories; each category contains more than 80 colour images. These images cover a wide variety of natural scenes and artificial objects. The experiment randomly selects four categories, named friedegg, touring-bike, tweezer, and watermelon, with a total of 415 images as training samples, and forty images are randomly chosen from these four categories as query images.
CIFAR-10 and CIFAR-100 are databases of 60,000 images with 10 groups and 100 groups, respectively. For the former dataset, we use 541 images randomly selected from the 10 groups as the training set, and 10 images from each group are randomly selected as query samples. We also perform the experiment on the CIFAR-100 dataset, from which we select 400 images from four categories: beetle, bicycle, chair, and mountain. Similarly, the testing set consists of forty images randomly selected from the four categories. In this work, the images are resized to a resolution of 128 × 128 pixels for convenience of feature extraction.

Parameter Selection.
It can be seen from Section 3 that there are four parameters in the proposed MNGS: the inner dimension of NMF, the number of GMM components, and two regularization parameters. It is natural to expect that the retrieval accuracy is affected by these parameters; therefore, this subsection first discusses the parameters we choose for the proposed method. The evaluation is based on the two most commonly used metrics: precision and recall rate [42]. The precision is defined as the ratio of relevant images among all retrieved images:
$$\text{Precision} = \frac{\text{Number of relevant images within the retrieval length}}{\text{Total number of images within the retrieval length}}. \tag{53}$$
The recall represents the ratio of the retrieved relevant images to all relevant images in the dataset, defined as
$$\text{Recall} = \frac{\text{Number of relevant images within the retrieval length}}{\text{Total number of relevant images in the dataset}}. \tag{54}$$
The NMF-based method has shown great advantage in working with high-dimensional data because NMF can reduce the dimensionality of the considered data while preserving the characteristics of the underlying data. It is clear that NMF with a small inner dimension can efficiently reduce the complexity of decomposition. However, an excessively small inner dimension could lead to an unfavourable retrieval result because information in the data is lost. Conversely, a large inner dimension may incur extra computational cost. Thus, one key issue is to choose an inner dimension that yields a "meaningful" retrieval result. Following a similar consideration to [43], the work first discusses the influence of the inner dimension, for example, values of 8, 16, 32, 64, 96, and 128, on the retrieval results when the two regularization coefficients are fixed at 0.15 and 0.325, respectively. Table 1 summarizes the corresponding results under different choices of inner dimension. As we can observe from this table, the proposed framework performs best with an inner dimension of 64 for the Caltech-256 and CIFAR-100 datasets and 32 for CIFAR-10. Next, we test the performance of the proposed method with varying numbers of GMM components. As mentioned before, each artificially selected category can be further subdivided
In the proposed MNGS, there are two regularization coefficients, which represent the trade-off between factorization errors and regularized constraints. The following values for the first regularization coefficient are considered first: 0.05, 0.15, 0.35, 0.55, 0.75, and 0.95. Table 3 illustrates the comparison using different datasets with the second coefficient fixed at 0.325, and the retrieval results indicate that the best performance is obtained with a value of 0.15. Similarly, we investigate the effect of the second coefficient on the retrieval performance, and experimental results using different values are reported in Table 4. In this case, we can see from the table that a value of 0.325 leads to both the highest precision and the highest recall rate for all three datasets. From these results, we find that if the two coefficients are set to 0.15 and 0.325, respectively, our approach provides the best retrieval results.
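The two metrics can be computed directly from a ranked list of category labels. The sketch below assumes that the retrieval length is given as a fraction of the ranked training set, as in the experimental setup; the function name and the toy labels are illustrative.

```python
def precision_recall(ranked_labels, query_label, fraction, total_relevant):
    """Precision and recall over the top `fraction` of a ranked label list."""
    # At least one image is always returned (an assumption of this sketch).
    length = max(1, int(fraction * len(ranked_labels)))
    retrieved = ranked_labels[:length]
    hits = sum(1 for label in retrieved if label == query_label)
    return hits / len(retrieved), hits / total_relevant

# Six ranked training images, three of which share the query category "a".
ranked = ["a", "a", "b", "a", "b", "b"]
half = precision_recall(ranked, "a", 0.5, total_relevant=3)
full = precision_recall(ranked, "a", 1.0, total_relevant=3)
```

With the full list returned, the recall reaches 1 while the precision falls to the share of the query category in the training set (3 of 6 here), which is why precision curves flatten out at large retrieval lengths.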
To investigate the influence of feature dimensions on the retrieval results and execution time, the proposed method is evaluated with two different sets of feature dimensions. One is the set described above, which we refer to as the higher-dimensional case. The other, which we call the lower-dimensional case, uses Gist, HOG, LBP, and ColorHist features of 256, 576, 256, and 96 dimensions, respectively. We can observe from Table 5 that reducing the feature dimensions helps to reduce the time consumption, but it is accompanied by a moderate decrease in precision and recall rate. Unless otherwise specified, the higher-dimensional features are used in our method.

Performance and Comparison.
This subsection evaluates the retrieval performance of the proposed MNGS algorithm.Considering some random initialization of the proposed method, repeated retrieval performance should be considered on the same dataset to obtain a relatively stable result.More precisely, taking Calth-256 and CIFAR-100 datasets as an example, for each retrieval task, the training process is conducted once, yet 40 iterations of testing for 40 query images that are selected from the corresponding dataset would be carried out to assess the retrieval accuracy.As mentioned before, the datasets used in this experiment are parts of the complete datasets but are still named Calth-256, CIFAR-10, and CIFAR-100 for convenience in the following description.The experiments will assign parameters , , ,  in terms of the above discussion.We perform the comparison with three stateof-the-art approaches on these datasets.The first method measures the similarity by modelling the RGB image with Gaussian distributions and calculating the KL divergence between the two GMMs, which we call KL-GMM [44] in our experiment.The second approach detects the resembled images by labelling the dataset via multiview joint nonnegative matrix factorization (JNMF) [45].The third algorithm searches the nearest neighbour of the query image in a low-dimensional feature space via multiview alignment hashing (MAH) [43].The precision and recall rate versus different retrieval lengths are adopted to compare the behaviour of different approaches, and the results are exhibited in Figure 2.
For the precision curve, we can observe that all approaches achieve higher precision values on Calth-256 and CIFAR-100 than on the CIFAR-10 dataset. We believe that this is because the samples of CIFAR-10 present more complex details. Another interesting finding is that the precision curves tend to a constant ratio as the retrieval length increases, a ratio determined by the total number of images in each category. This is because most images that closely match the query gather at the front of the queue owing to the similarity ranking of the training set. Additionally, as is commonly observed, the recall rates of all algorithms rise with increasing retrieval length. It is noteworthy that the algorithms with feature extraction, namely JNMF, MAH, and MNGS, always achieve higher retrieval accuracy. In addition, the GMM-based spectral clustering scheme of the proposed method achieves better performance than the regression-based hashing of MAH. Therefore, in the comparison with JNMF, MAH, and KL-GMM shown in Figure 2, MNGS outperforms its competitors in terms of the assessment criteria. Table 6 presents the top six images retrieved from the respective categories corresponding to the query images. The advantage of an algorithm can be judged from the number of retrieved images that correctly match the query image. It can be observed that the retrieval accuracy of KL-GMM is slightly poorer than that of the other methods. This is consistent with the results depicted in Figure 2.
Once again, the proposed method is shown to moderately improve the retrieval performance.

Owing to the feature extraction steps of the JNMF, MAH, and MNGS algorithms, the computational time required in the training phase is increased. In contrast, KL-GMM requires remarkably less training time. However, for the average CPU time in the testing phase, as we can see from Figure 3, no significant difference is found among the approaches; only KL-GMM and MNGS spend slightly more time than the other methods on Calth-256 and CIFAR-10, as does MNGS on CIFAR-100.

Conclusions
In this paper, a novel retrieval algorithm called MNGS was presented, in which the sparse representations of the training data were precisely learnt via constrained NMF with pooling of multiview features. The main contributions of this work comprise the following four aspects: (1) the proposed method imposed structure constraints and orthogonal regularization on the standard NMF framework and demonstrated its convergence; (2) by incorporating multiple structure-correlated features, the coefficient matrix of MNGS preserved features from all images in the low-dimensional space; (3) the EM algorithm was incorporated to estimate the GMM components for the sparse features; and (4) spectral clustering based on the KL divergence strategy was designed to obtain the final retrieval results. Experiments on the three standard datasets indicated that, when appropriate parameters were used, the desired image retrieval could be achieved in terms of searching accuracy and effectiveness.

Figure 1: Flowchart of the proposed framework. During the training period, raw image features are extracted, and their sparse representations acquired by NMF are labelled via GMM and spectral clustering. Utilizing the sparse features of the query image and the trained GMM, the proposed method then outputs the similarity ranking based on the probabilities that the images belong to the clusters.

5.4. Time Consumed. This subsection compares the computational time of each of the aforementioned approaches. The runtimes of the different algorithms, including the training time and the testing time, are illustrated in Figure 3.

Figure 3: Average runtime of the four algorithms on three datasets, (a) training times and (b) test times.
After all probabilities $\{\Phi_1(x_{\text{test}} \mid C_1), \Phi_2(x_{\text{test}} \mid C_2), \ldots, \Phi_K(x_{\text{test}} \mid C_K)\}$ have been obtained, the next objective is to retrieve the samples most similar to the given query image. The whole process follows three steps: (1) for the samples belonging to each component of the GMM, the similarity is compared by ranking the probabilities $\{\Phi_k(x_i \mid C_k) \mid x_i \in C_k\}$ of the participants in descending order; (2) the probabilities $\{\Phi_k(x_{\text{test}} \mid C_k) \mid C_k \in \Omega_j\}$ in each spectral component are compared, in descending order, to distinguish the similarity of the different subclusters to the query image; (3) the last step sorts the probabilities $\{\max\{\Phi_1, \ldots\}\}$.

Each column of the matrix is normalized during each iteration. This means that the inner product of every column with itself equals 1. With this conclusion, the remaining part in (39) can be proved equivalently.
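The three-step ranking described for the query image can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `rank_images`, the dictionary-based inputs, and the toy probabilities are hypothetical, and step (3) is assumed to order the spectral components by their largest query probability, since the source text for that step is incomplete. Every spectral component is assumed to contain at least one subcluster.

```python
def rank_images(query_probs, member_probs, spectral_groups):
    """Return image ids ordered from most to least similar to the query.

    query_probs     : {subcluster_id: probability of the query under that GMM component}
    member_probs    : {subcluster_id: [(image_id, probability under that component), ...]}
    spectral_groups : {group_id: [subcluster_id, ...]}  (non-empty lists assumed)
    """
    # Step 1: inside each GMM component, rank member images by probability.
    ranked_members = {
        k: [img for img, _ in sorted(members, key=lambda m: m[1], reverse=True)]
        for k, members in member_probs.items()
    }
    # Step 2: inside each spectral component, rank subclusters by the query probability.
    ranked_groups = {
        g: sorted(subs, key=lambda k: query_probs[k], reverse=True)
        for g, subs in spectral_groups.items()
    }
    # Step 3 (assumed): order spectral components by their best query probability.
    group_order = sorted(
        ranked_groups,
        key=lambda g: max(query_probs[k] for k in ranked_groups[g]),
        reverse=True,
    )
    # Concatenate the three orderings into one ranked list.
    return [
        img
        for g in group_order
        for k in ranked_groups[g]
        for img in ranked_members[k]
    ]


# Hypothetical toy data: four GMM subclusters regrouped into two spectral components.
query_probs = {0: 0.05, 1: 0.60, 2: 0.30, 3: 0.05}
member_probs = {
    0: [("a", 0.2), ("b", 0.9)],
    1: [("c", 0.7), ("d", 0.8)],
    2: [("e", 0.5)],
    3: [("f", 0.4), ("g", 0.6)],
}
spectral_groups = {"A": [0, 2], "B": [1, 3]}

ranking = rank_images(query_probs, member_probs, spectral_groups)
print(ranking)
```

Sorting at three nested levels keeps images from the spectral component that best explains the query at the front of the list, which matches the idea of restricting the nearest-neighbour search to the cluster containing the query image.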

Table 1: Comparison of retrieval performance with the retrieval length (L = 0.1), the regularization parameters (0.15 and 0.325), and the GMM component numbers (12, 10, and 6 for Calth-256, CIFAR-10, and CIFAR-100, respectively) held fixed. Bold values indicate the best result of the column.

into several subclusters owing to the abundant images of the category. Considering that the number of subclusters labelled by GMM is always larger than the number of its artificial categories, the experiment sets the number of GMM subclusters (component numbers) to 6, 8, 10, 12, 14, and 16 for the Calth-256 and CIFAR-100 datasets and to 12, 14, 18, 22, 26, and 30 for CIFAR-10. Table 2 records the corresponding retrieval results for the three datasets with different component numbers. Based on the precision and recall rate results, a component number of 6 for Calth-256 and CIFAR-100 and of 12 for CIFAR-10 provides higher retrieval precision. Thus, this experiment indicates that the resulting subclusters are close to the artificial class labels and that an excessively large component number might reduce the retrieval precision.

Table 5: The influence of feature dimensions on retrieval results and execution time.

Table 6: The top six images retrieved by the different algorithms on the Calth-256 dataset. The first column is the query image, followed by the retrieved images listed in columns.