Feature representation learning is a key issue in artificial intelligence research. Multiview multimedia data can provide rich information, which has made multiview feature representation one of the current research hotspots in data analysis. Recently, a large number of feature representation methods for multiview data have been proposed, among which matrix factorization shows excellent performance. Therefore, we propose an adaptive-weighted multiview deep basis matrix factorization (AMDBMF) method that integrates matrix factorization, deep learning, and view fusion. Specifically, we first perform deep basis matrix factorization on the data of each view. Then, all views are integrated to complete the multiview feature learning procedure. Finally, we propose an adaptive weighting strategy to fuse the low-dimensional features of each view so that a unified feature representation can be obtained for multiview multimedia data. We also design an iterative update algorithm to optimize the objective function and justify the convergence of the optimization algorithm through numerical experiments. We conduct clustering experiments on five multiview multimedia datasets and compare the proposed method with several state-of-the-art methods. The experimental results demonstrate that the clustering performance of the proposed method is better than that of the comparison methods.
1. Introduction
With the rapid development of computer technology, the multimedia data collected in many research fields, such as computer vision, image processing, and natural language processing, often have high-dimensional features and complex structures. These high-dimensional data can provide abundant information, but they also bring problems such as the “curse of dimensionality” [1, 2]. Therefore, how to effectively deal with high-dimensional data has become a widespread concern [3]. Dimensionality reduction is an efficient way to address this issue: it maps the original data to a low-dimensional space and obtains a low-dimensional representation that preserves the information hidden in the original data [4].
In recent years, many dimensionality reduction methods have been proposed for multimedia data [5]. Matrix factorization has become one of the research hotspots owing to its simple theoretical basis and easy implementation. Principal component analysis (PCA) [6], independent component analysis (ICA) [7], and vector quantization (VQ) [8] are well-known matrix factorization methods that obtain a low-rank approximation by decomposing a high-dimensional data matrix, and they can effectively extract a low-dimensional representation from high-dimensional data. However, these methods place no sign constraints on the matrix elements during decomposition, so the factors may contain negative elements, which deprives the low-dimensional representations of physical meaning. To solve this problem, Lee et al. added nonnegativity constraints to matrix decomposition and proposed nonnegative matrix factorization (NMF) [9]. The low-dimensional feature representations obtained by NMF are part-based and therefore highly interpretable. Consequently, NMF has attracted wide attention from researchers, and a large number of improved NMF-based algorithms have emerged, achieving great success in computer vision, natural language processing, speech recognition, DNA sequence analysis, and other areas [10–13].
NMF decomposes the original nonnegative data matrix into the product of a nonnegative basis matrix and a nonnegative coefficient matrix (also called the low-dimensional feature matrix). Each original sample is expressed as a linear combination of the basis vectors, and the combination coefficients form the coefficient matrix. Because NMF uses nonnegativity constraints, it reflects the intuitive notion of combining parts to form a whole and is more interpretable than other methods. Experimental results indicate that NMF achieves good performance on image and document clustering tasks. Nevertheless, traditional NMF considers only the nonnegativity of the elements, so the obtained basis matrix may have poor sparseness and independence. To address these problems, researchers have imposed additional constraints on the basis matrix or the coefficient matrix and proposed a series of improved methods. For instance, Hoyer [14] designed a sparsity measurement criterion and proposed an NMF variant with sparsity constraints (NMF-SC). Moreover, to enhance the independence of the obtained basis matrix and low-dimensional representation, Choi [15] proposed orthogonal nonnegative matrix factorization (ONMF), which imposes orthogonality constraints on the basis matrix and the coefficient matrix. However, the above methods require the original data to be nonnegative, which limits the applicability of these NMF-based algorithms. Therefore, Ding et al. [16] proposed semi-nonnegative matrix factorization (SNMF). Different from traditional NMF, SNMF relaxes the restrictions on the original data and the coefficient matrix and imposes a nonnegativity constraint only on the basis matrix. The methods mentioned above have stronger feature extraction capabilities than their predecessors and achieve better results in real-world tasks, but they extract only shallow features [17].
In recent years, deep learning has exhibited outstanding performance in feature representation tasks [18–20]. Therefore, many researchers have introduced deep learning into matrix factorization and proposed a large number of deep feature representation methods [21–27]. Ahn et al. [21] proposed multilayer nonnegative matrix factorization (MNMF). Different from traditional NMF-based approaches, MNMF decomposed the coefficient matrix several times to obtain an underlying part-based representation that can extract deep hierarchical features from the original data. In addition, to expand the application scope, Trigeorgis et al. [22] integrated deep factorization and semi-NMF to propose a deep semi-nonnegative matrix factorization (deep semi-NMF) method. However, both MNMF and deep semi-NMF only considered the deep decomposition of the coefficient matrix for the training data. For the new test data, the basis matrix was used to obtain the deep low-dimensional representation. Therefore, the basis matrix directly affected the results of the deep low-dimensional representation. To obtain a more accurate deep low-dimensional representation of the original data matrix, Zhao et al. [23] applied deep factorization to the basis matrix and proposed a deep NMF method based on basis image learning.
With the rapid development of the Internet and data collection technology, large amounts of multiview multimedia data can be easily acquired [28–30]. For example, an object can be photographed from different viewpoints, and an image can be described with different types of features such as color, texture, and shape. Each view of such data provides distinct information, yet the views also share potential correlations, so multiview data contain more information than single-view data. Simply concatenating multiview data into single-view data is possible, but it ignores the differences and potential correlations between the views [28–30].
Consequently, extensive multiview dimensionality reduction methods have been proposed [31–33]. Liu et al. [34] proposed a multiview NMF (multi-NMF) method, which establishes relationships between views by learning a common coefficient matrix shared across them. Subsequently, Chang et al. [35] introduced a new regularization term into multi-NMF and applied it to clothing image clustering. Inspired by ONMF, Liang et al. [36] proposed NMF with coorthogonal constraints (NMFCC) for multiview multimedia data clustering. Additionally, to consider the correlations between multiple views, Zhan et al. [37] jointly optimized the graph matrix and the concept factorization process and proposed an adaptive structure concept factorization (ASCF) method for multiview clustering. Although the above methods can handle multiview multimedia data well, they still belong to the class of feature representation methods based on shallow factorization [38, 39], so the deep features underlying multiview data remain unexploited. Therefore, Zhao et al. [40] applied deep semi-NMF to multiview multimedia data clustering by maximizing the mutual information between views, which forces the nonnegative representations of the last layer in each view to be as similar as possible. Unlike [40], Huang et al. [41] introduced an adaptive-weighted framework into multiview deep semi-NMF and proposed an adaptive-weighted multiview clustering method based on deep matrix factorization, which can adaptively assign weights to the different views during deep feature representation. However, these methods still consider only the deep decomposition of the coefficient matrix.
Therefore, an adaptive-weighted multiview deep basis matrix factorization (AMDBMF) method is proposed for multimedia data clustering in this paper. Different from the above methods, AMDBMF first performs a deep decomposition of the basis matrix on the data of each view and then integrates the low-dimensional features of all views through an adaptive weighting mechanism to extract more accurate multiview deep low-dimensional representations. The flowchart of the proposed AMDBMF approach is shown in Figure 1. Finally, we perform extensive experiments on five publicly available multiview multimedia datasets. The experimental results show that the proposed AMDBMF approach outperforms existing related approaches.
The flowchart of the proposed AMDBMF approach.
The remainder of this paper is organized as follows. “Related Works” describes the related algorithms including NMF and deep semi-NMF briefly. “Adaptive-Weighted Multiview Deep Basis Matrix Factorization” introduces the adaptive-weighted multiview deep basis matrix factorization (AMDBMF) algorithm in detail. The experimental results and analysis are discussed in “Experiments and Analysis.” Finally, the conclusions are given in “Conclusions and Future Work.”
2. Related Works

2.1. Nonnegative Matrix Factorization
Suppose that the given multimedia data can be represented as $X = [x_1, \cdots, x_N] \in \mathbb{R}_{+}^{D \times N}$, where $D$ is the dimensionality of the data and $N$ is the number of samples, so each sample is a $D$-dimensional feature vector $x_j\ (1 \le j \le N)$. NMF aims to find two low-rank nonnegative matrices $W = [w_1, \cdots, w_d] \in \mathbb{R}_{+}^{D \times d}$ and $H = [h_1, \cdots, h_N] \in \mathbb{R}_{+}^{d \times N}$ (with $d \ll D$ and $d \ll N$) that fulfill $X \approx WH$. After obtaining $W$ and $H$, each sample can be expressed as $x_j \approx \sum_{i=1}^{d} w_i h_{ij}$, that is, a linear combination of the columns of the basis matrix $W = [w_1, \cdots, w_d]$ with coefficient vector $h_j$. Therefore, $W$ and $H$ are called the basis matrix and the coefficient matrix, respectively. The objective function of NMF is defined as follows:
$$\min_{W,H} \|X - WH\|_F^2 \quad \text{s.t.}\ W \ge 0,\ H \ge 0, \tag{1}$$

where $\|\cdot\|_F$ denotes the Frobenius norm.
According to the Karush-Kuhn-Tucker (KKT) condition, the update formulas for variables W and H are as follows:
$$W_{il} \leftarrow W_{il}\,\frac{(XH^T)_{il}}{(WHH^T)_{il}}, \qquad H_{lj} \leftarrow H_{lj}\,\frac{(W^TX)_{lj}}{(W^TWH)_{lj}}. \tag{2}$$
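As a concrete illustration, the multiplicative updates of Eq. (2) take only a few lines of NumPy. The sketch below is ours: the function name `nmf`, the iteration count, and the small `eps` added to the denominators for numerical safety are our own choices, not part of the original formulation.

```python
import numpy as np

def nmf(X, d, n_iter=200, eps=1e-10, seed=0):
    """Basic NMF via the multiplicative updates of Eq. (2): X ~ W @ H."""
    rng = np.random.default_rng(seed)
    D, N = X.shape
    W = rng.random((D, d)) + eps   # nonnegative random initialization
    H = rng.random((d, N)) + eps
    for _ in range(n_iter):
        # W_il <- W_il * (X H^T)_il / (W H H^T)_il
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        # H_lj <- H_lj * (W^T X)_lj / (W^T W H)_lj
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H
```

Starting from a nonnegative initialization, both factors stay nonnegative because each update only multiplies by ratios of nonnegative quantities.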
2.2. Deep Nonnegative Matrix Factorization
The traditional NMF method can remove redundant information and reveal hidden semantic features of multimedia data, but a single shallow factorization cannot learn an effective representation of data with complex variations; a facial image, for example, contains changes in posture, lighting, and expression. Therefore, Trigeorgis et al. [22] pointed out that the coefficient matrix, as a low-dimensional representation of the high-dimensional data, can itself be decomposed further so that more abstract low-dimensional features are obtained. This process of deep factorization is defined as
$$X \approx W_1 H_1,\quad H_1 \approx W_2 H_2,\quad \cdots,\quad H_{l-1} \approx W_l H_l, \tag{3}$$

where $W_i$ and $H_i$ represent the factorization results of the $i$-th layer. As Eq. (3) shows, deep NMF performs one matrix factorization at each layer and uses the resulting coefficient matrix as the input of the next layer. Consequently, the deep matrix factorization of the data is expressed as
$$X \approx W_1 W_2 \cdots W_l H_l. \tag{4}$$
The objective function of deep NMF is defined as follows:
$$\min \|X - W_1 W_2 \cdots W_l H_l\|_F^2. \tag{5}$$
Similar to that of NMF, the update formula can be defined as follows:
$$W_i = (\Psi^T\Psi)^{-1}\Psi^T X \tilde{H}_i^T (\tilde{H}_i \tilde{H}_i^T)^{-1}, \qquad H_i = H_i \odot \sqrt{\frac{[\Psi^T X]^{pos} + [\Psi^T\Psi]^{neg} H_i}{[\Psi^T X]^{neg} + [\Psi^T\Psi]^{pos} H_i}}, \tag{6}$$

where $\Psi = W_1 \cdots W_{i-1}$, $\tilde{H}_i$ denotes the reconstruction of the $i$-th layer's feature matrix, and $\odot$ represents the element-wise product of matrices. The operation $[A]^{pos} = (|A| + A)/2$ sets all negative elements to zero and keeps the positive elements unchanged; conversely, $[A]^{neg} = (|A| - A)/2$ sets the positive elements to zero and replaces the negative elements with their absolute values.
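The positive/negative splitting in Eq. (6) is easy to get wrong, so the helpers below sketch it in NumPy together with the multiplicative H-update. This is a sketch under our own naming: `Psi` stands for the product of basis matrices above layer $i$, `eps` is a small constant we add for numerical safety, and the element-wise square root follows the deep semi-NMF rule of [22].

```python
import numpy as np

def pos(A):
    """[A]^pos = (|A| + A) / 2: keep positive entries, zero out negatives."""
    return (np.abs(A) + A) / 2.0

def neg(A):
    """[A]^neg = (|A| - A) / 2: magnitudes of negative entries, zero elsewhere."""
    return (np.abs(A) - A) / 2.0

def update_Hi(X, Psi, Hi, eps=1e-10):
    """One multiplicative update of H_i from Eq. (6)."""
    PtX, PtP = Psi.T @ X, Psi.T @ Psi
    num = pos(PtX) + neg(PtP) @ Hi
    den = neg(PtX) + pos(PtP) @ Hi + eps
    return Hi * np.sqrt(num / den)
```

Because `num` and `den` are nonnegative whenever `Hi` is, the update keeps `Hi` nonnegative even though `Psi` and `X` may contain negative entries, which is exactly the point of the pos/neg split.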
3. Adaptive-Weighted Multiview Deep Basis Matrix Factorization
First, an adaptive-weighted multiview deep basis matrix factorization (AMDBMF) method is proposed, which incorporates the nonnegative matrix factorization and deep learning into a unified framework. Next, an optimization algorithm with an iterative updating rule is designed to solve the objective function of AMDBMF. Then, an adaptive-weighted fusion mechanism is provided. Finally, we provide the complexity analysis of the proposed algorithm.
Suppose that $X = \{X^1, \cdots, X^M\}$ denotes a multimedia dataset containing $N$ samples, where each sample $x_i\ (i = 1, \cdots, N)$ is described by $M$ views. The $m$-th view's features of this sample are represented as $x_i^m\ (m = 1, \cdots, M)$, and the features of all samples in this view are collected as $X^m = [x_1^m, \cdots, x_N^m]$.
3.1. Objective Function
First, matrix factorization is performed on the features in each view of the multimedia data, and the objective function can be defined as
$$\min_{W^m,H^m} \|X^m - W^m H^m\|_F^2 \quad \text{s.t.}\ W^m \ge 0,\ H^m \ge 0, \tag{7}$$

where $W^m$ and $H^m$ denote the basis matrix and the coefficient matrix of the $m$-th view's features, respectively.
Then, deep factorization is performed on $W^m$. The process is defined as follows:
$$X^m \approx W_1^m H_1^m,\quad W_1^m \approx W_2^m H_2^m,\quad \cdots,\quad W_{l-2}^m \approx W_{l-1}^m H_{l-1}^m,\quad W_{l-1}^m \approx W_l^m H_l^m, \tag{8}$$

where $W_1^m, W_2^m, \cdots, W_l^m$ and $H_1^m, H_2^m, \cdots, H_l^m$ denote the basis matrices and coefficient matrices of each layer in the $m$-th view, respectively.
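The cascade in Eq. (8) can be pretrained greedily, with each layer running an ordinary NMF on the basis matrix produced by the layer above it. The following is a minimal sketch under our own naming; the inner `nmf` helper (multiplicative updates of Eq. (2)) and all parameter choices are ours.

```python
import numpy as np

def nmf(X, d, n_iter=200, eps=1e-10, seed=0):
    """Plain NMF via the multiplicative updates of Eq. (2)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], d)) + eps
    H = rng.random((d, X.shape[1])) + eps
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H

def pretrain_basis_layers(Xm, dims):
    """Greedy layer-wise pretraining of Eq. (8) for one view:
    X^m ~ W_1 H_1, then W_1 ~ W_2 H_2, ..., so X^m ~ W_L H_L ... H_1."""
    Ws, Hs = [], []
    A = Xm                  # layer 0 "basis matrix" is the data itself
    for d in dims:          # dims = [d_1, ..., d_L]
        W, H = nmf(A, d)
        Ws.append(W)
        Hs.append(H)
        A = W               # the next layer factorizes the basis matrix
    return Ws, Hs
```

After pretraining, the view is reconstructed as `Ws[-1] @ Hs[-1] @ ... @ Hs[0]`, matching the factor ordering of Eq. (9).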
By combining it with Eq. (8), Eq. (7) is further rewritten as follows:
$$\min_{W_l^m,H_l^m} \|X^m - W_l^m H_l^m H_{l-1}^m \cdots H_2^m H_1^m\|_F^2 \quad \text{s.t.}\ W_l^m \ge 0,\ H_l^m \ge 0. \tag{9}$$
Finally, to fuse the data from multiple perspectives, the final objective function is defined as
$$\min_{W_l^m,H_l^m} \sum_{m=1}^{M} \|X^m - W_l^m H_l^m H_{l-1}^m \cdots H_2^m H_1^m\|_F^2 \quad \text{s.t.}\ W_l^m \ge 0,\ H_l^m \ge 0. \tag{10}$$
3.2. Optimization
From Eq. (10), we can see that the objective function is nonconvex in all variables jointly but convex in each variable individually. Therefore, we design an iterative update algorithm to find a local optimum of the objective function: one variable is updated while the other variables are held fixed. The detailed updating rules are described as follows.
The optimization objective for variables $W_l^m$ and $H_l^m$ can be defined as
$$\min_{W_l^m,H_l^m} \|X^m - W_l^m H_l^m H_{l-1}^m \cdots H_2^m H_1^m\|_F^2 \quad \text{s.t.}\ W_l^m \ge 0,\ H_l^m \ge 0. \tag{11}$$
Let $\Lambda_{l-1}^m = H_{l-1}^m \cdots H_2^m H_1^m$, and then Eq. (11) can be simplified as
$$\begin{aligned} \min_{W_l^m,H_l^m}\ \|X^m - W_l^m H_l^m \Lambda_{l-1}^m\|_F^2 &= \operatorname{tr}\!\left((X^m - W_l^m H_l^m \Lambda_{l-1}^m)^T (X^m - W_l^m H_l^m \Lambda_{l-1}^m)\right) \\ &= \operatorname{tr}\!\left(-2 X^{mT} W_l^m H_l^m \Lambda_{l-1}^m + \Lambda_{l-1}^{mT} H_l^{mT} W_l^{mT} W_l^m H_l^m \Lambda_{l-1}^m\right) + \text{const} \\ &\quad\ \text{s.t.}\ W_l^m \ge 0,\ H_l^m \ge 0, \tag{12} \end{aligned}$$

where the constant term $\operatorname{tr}(X^{mT} X^m)$ does not depend on the variables and can be dropped.
The Lagrangian function of Eq. (12) is expressed as
$$\varphi(W_l^m, H_l^m) = \|X^m - W_l^m H_l^m \Lambda_{l-1}^m\|_F^2 + \operatorname{tr}(\gamma_l^m H_l^m) + \operatorname{tr}(\eta_l^m W_l^m), \tag{13}$$

where $\gamma_l^m$ and $\eta_l^m$ are Lagrange multipliers.
Taking the partial derivatives of Eq. (13) with respect to $W_l^m$ and $H_l^m$, and setting these derivatives to zero, we have
$$\frac{\partial \varphi(W_l^m, H_l^m)}{\partial W_l^m} = \frac{\partial \|X^m - W_l^m H_l^m \Lambda_{l-1}^m\|_F^2}{\partial W_l^m} + \frac{\partial \operatorname{tr}(\eta_l^m W_l^m)}{\partial W_l^m} = 0, \qquad \frac{\partial \varphi(W_l^m, H_l^m)}{\partial H_l^m} = \frac{\partial \|X^m - W_l^m H_l^m \Lambda_{l-1}^m\|_F^2}{\partial H_l^m} + \frac{\partial \operatorname{tr}(\gamma_l^m H_l^m)}{\partial H_l^m} = 0. \tag{14}$$
According to the KKT conditions $(\gamma_l^m H_l^m)_{ij} = 0$ and $(\eta_l^m W_l^m)_{ij} = 0$ [41], the update rules for $W_l^m$ and $H_l^m$ are as follows:
$$W_l^m = W_l^m \odot \frac{X^m \Lambda_{l-1}^{mT} H_l^{mT}}{W_l^m H_l^m \Lambda_{l-1}^m \Lambda_{l-1}^{mT} H_l^{mT}}, \qquad H_l^m = H_l^m \odot \frac{W_l^{mT} X^m \Lambda_{l-1}^{mT}}{W_l^{mT} W_l^m H_l^m \Lambda_{l-1}^m \Lambda_{l-1}^{mT}}, \tag{15}$$

where $\odot$ and the fraction bar denote element-wise multiplication and division of matrices, respectively.
Finally, the algorithmic steps of the proposed method are given in Algorithm 1. To make it easier to understand, Figure 2 depicts the block diagram of the proposed optimization algorithm.
Algorithm 1: The optimization algorithm.
Input:
  Multiview nonnegative data $X = \{X^1, X^2, \cdots, X^m, \cdots, X^M\}$
  The feature dimension of each layer $d_1, d_2, \cdots, d_l, \cdots, d_L$
Initialize:
  The basis matrices of each view $W_l^m,\ l = 1, \cdots, L,\ m = 1, \cdots, M$
  The coefficient matrices of each view $H_l^m,\ l = 1, \cdots, L,\ m = 1, \cdots, M$
Pretrain:
for $m = 1 : M$ do
  for $l = 1 : L$ do
    $[W_l^m, H_l^m] = \mathrm{NMF}(W_{l-1}^m, d_l)$, where $W_0^m = X^m$
  end for
end for
Update:
repeat
  for $m = 1 : M$ do
    for $l = 1 : L$ do
      if $l = 1$ then $\Lambda_{l-1}^m = I$
      else $\Lambda_{l-1}^m = H_{l-1}^m \cdots H_2^m H_1^m$
      end if
      Update $W_l^m = W_l^m \odot \dfrac{X^m \Lambda_{l-1}^{mT} H_l^{mT}}{W_l^m H_l^m \Lambda_{l-1}^m \Lambda_{l-1}^{mT} H_l^{mT}}$
      Update $H_l^m = H_l^m \odot \dfrac{W_l^{mT} X^m \Lambda_{l-1}^{mT}}{W_l^{mT} W_l^m H_l^m \Lambda_{l-1}^m \Lambda_{l-1}^{mT}}$
    end for
  end for
until the convergence condition or the maximum number of iterations is reached
Output:
  The basis matrices of each view $W_l^m,\ l = 1, \cdots, L,\ m = 1, \cdots, M$
  The coefficient matrices of each view $H_l^m,\ l = 1, \cdots, L,\ m = 1, \cdots, M$
The block diagram of the proposed optimization algorithm.
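Algorithm 1's fine-tuning loop can be sketched in NumPy as below. This is our own sketch: `Xs[m]` holds view m's data, `Ws[m][i]` and `Hs[m][i]` that view's layer-(i+1) factors (pretrained greedily or initialized randomly to nonnegative values), and the returned per-sweep objective trace of Eq. (10) is our own addition for monitoring convergence.

```python
import numpy as np

def fine_tune(Xs, Ws, Hs, n_iter=50, eps=1e-10):
    """Fine-tuning sweeps of Algorithm 1 using the updates of Eq. (15).
    Returns the objective of Eq. (10) after each sweep."""
    objs = []
    for _ in range(n_iter):
        for m, X in enumerate(Xs):
            for i in range(len(Ws[m])):
                # Lambda_{l-1} = H_{l-1} ... H_2 H_1 (identity when l = 1)
                Lam = np.eye(X.shape[1])
                for Hk in Hs[m][:i]:
                    Lam = Hk @ Lam
                W, H = Ws[m][i], Hs[m][i]
                # Eq. (15): element-wise multiplicative updates (in place)
                W *= (X @ Lam.T @ H.T) / (W @ H @ Lam @ Lam.T @ H.T + eps)
                H *= (W.T @ X @ Lam.T) / (W.T @ W @ H @ Lam @ Lam.T + eps)
        # objective of Eq. (10): top-layer reconstruction error over all views
        obj = 0.0
        for m, X in enumerate(Xs):
            Lam = np.eye(X.shape[1])
            for Hk in Hs[m][:-1]:
                Lam = Hk @ Lam
            obj += np.linalg.norm(X - Ws[m][-1] @ Hs[m][-1] @ Lam) ** 2
        objs.append(obj)
    return objs
```

In practice, the loop would stop once the relative change of the objective falls below a tolerance, which is the convergence condition referred to in Algorithm 1.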
3.3. Feature Fusion
After obtaining the basis matrix and coefficient matrix of each layer for each view through the optimization algorithm, an adaptive-weighted fusion mechanism is adopted to obtain a low-dimensional representation of the multiview data, and the weight calculation is
$$\alpha^m = \frac{1}{2\|X^m - W_l^m H_l^m H_{l-1}^m \cdots H_2^m H_1^m\|_F^2 + \varepsilon}, \tag{16}$$

where $\varepsilon$ is a small positive constant that prevents division by zero.
Then, $\alpha^m$ is normalized by Eq. (17):

$$\alpha^m = \frac{\alpha^m}{\sum_{m=1}^{M} \alpha^m}. \tag{17}$$
Finally, since the low-dimensional representation of each view is $H^m = H_l^m \Lambda_{l-1}^m$, the fusion of the low-dimensional features derived from the multiview data can be expressed as

$$H^{\ast} = \sum_{m=1}^{M} \alpha^m H^m. \tag{18}$$
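Eqs. (16)-(18) amount to a few lines of code. The sketch below (our own naming; it assumes every view uses the same final dimension $d_L$ so that the per-view representations can be summed) computes the weights from the per-view reconstruction errors and fuses the representations:

```python
import numpy as np

def fuse_views(Xs, Ws, Hs, eps=1e-8):
    """Adaptive-weighted fusion of Eqs. (16)-(18): a view with a smaller
    reconstruction error receives a larger normalized weight."""
    reps, alphas = [], []
    for m, X in enumerate(Xs):
        # Lambda_{L-1} = H_{L-1} ... H_1 (identity for a single layer)
        Lam = np.eye(X.shape[1])
        for Hk in Hs[m][:-1]:
            Lam = Hk @ Lam
        Hm = Hs[m][-1] @ Lam                    # H^m = H_L^m Lambda_{L-1}^m
        err = np.linalg.norm(X - Ws[m][-1] @ Hm) ** 2
        alphas.append(1.0 / (2.0 * err + eps))  # Eq. (16)
        reps.append(Hm)
    alphas = np.array(alphas)
    alphas /= alphas.sum()                      # Eq. (17): weights sum to 1
    H_star = sum(a * Hm for a, Hm in zip(alphas, reps))  # Eq. (18)
    return H_star, alphas
```

The fused matrix `H_star` is the unified representation that is subsequently clustered (e.g., with K-means in the experiments).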
3.4. Complexity Analysis
Clearly, the proposed algorithm can be divided into two stages: pretraining and fine-tuning. For convenience, suppose that the number of iterations is $T$, $M$ is the number of views, and $L$ is the number of layers. The feature dimension of the views is $D$, and the dimension of the low-dimensional representation of each layer is $K$. In the pretraining stage, the complexity for a single view is $O(TDNK)$, so the complexity of the whole pretraining stage is $O(TMLDNK)$. In the fine-tuning stage, the main computational cost comes from updating $\Lambda_{l-1}^m$, $W_l^m$, and $H_l^m$, which require $O(TM(L-1)NK^2)$, $O(TMLDNK)$, and $O(TMLDNK)$, respectively. Since $D \gg K$, the total computational complexity of the proposed algorithm is $O(TMLDNK)$.
4. Experiments and Analysis

4.1. Datasets
Five commonly used multiview multimedia datasets from the Internet are used in the experiments to verify the effectiveness of the proposed method.
4.1.1. 3sources
This dataset is a collection of 948 news reports covering 416 news stories, gathered between February and April 2009 from three well-known news outlets: BBC, Reuters, and The Guardian. In the experiments, the 169 stories reported by all three outlets are used. These stories fall into six categories: business, entertainment, health, politics, sports, and technology (http://mlg.ucd.ie/datasets/3sources.html).
4.1.2. BBC [42]
This dataset contains 685 news articles collected from the BBC News network between 2004 and 2005. Each article is divided into four segments, which serve as four views, and the articles cover five categories of news: business, entertainment, politics, sport, and technology (http://mlg.ucd.ie/datasets/segment.html).
4.1.3. BBC Sport [42]
This dataset includes 737 news articles from the BBC Sport network from 2004 to 2005, covering five fields: athletics, cricket, football, rugby, and tennis; the experiments use the two-view subset of 544 articles listed in Table 1 (http://mlg.ucd.ie/datasets/segment.html).
4.1.4. Reuters [43]
This dataset includes 1200 English articles belonging to six classes, and each article has been translated into French, German, Italian, and Spanish (http://lig-membres.imag.fr/grimal/data.html).
4.1.5. Wikipedia [43]
This dataset consists of specific Wikipedia material with 2669 articles in 29 categories (http://www.svcl.ucsd.edu/projects/crossmodal/). In the experiments, we select a subset of the 10 most popular categories containing a total of 693 samples. The detailed statistical information about the different datasets is given in Table 1.
Statistical information about the different datasets.

Datasets  | Size (N) | Classes (C) | Views (V) | Feature dimension of each view (d1 / d2 / d3 / d4 / d5)
3sources  | 169      | 6           | 3         | 3560 / 3068 / 3631 / — / —
BBC       | 685      | 5           | 4         | 4659 / 4633 / 4465 / 4684 / —
BBC Sport | 544      | 5           | 2         | 3183 / 3203 / — / — / —
Reuters   | 1200     | 6           | 5         | 12000 / 12000 / 12000 / 12000 / 12000
Wikipedia | 693      | 10          | 2         | 128 / 10 / — / — / —
4.2. Metrics
In the experiments, we select three commonly used clustering evaluation indicators [44]: accuracy (ACC), normalized mutual information (NMI), and purity to evaluate the performance of the proposed method.
Assuming that the clustering result for sample $x_i$ is $l_i$ and the corresponding true label is $t_i$, the clustering accuracy (ACC) [45] is defined as

$$\mathrm{ACC} = \frac{\sum_{i=1}^{N} \delta(t_i, \mathrm{map}(l_i))}{N}, \tag{19}$$

where the function $\delta(\cdot)$ is defined as follows:

$$\delta(x, y) = \begin{cases} 1, & x = y, \\ 0, & x \neq y. \end{cases} \tag{20}$$

The function $\mathrm{map}(\cdot)$ maps each cluster label to the corresponding true label; the Kuhn-Munkres algorithm [46] is employed to find the best mapping.
Assume that L and T are the clustering result and the true label set, respectively. The mutual information (MI) between them is defined as
$$\mathrm{MI}(L, T) = \sum_{l_i \in L,\, t_i \in T} p(l_i, t_i) \cdot \log_2 \frac{p(l_i, t_i)}{p(l_i) \cdot p(t_i)}, \tag{21}$$

where $p(l_i)$ and $p(t_i)$ represent the probabilities that a sample randomly selected from the dataset belongs to $l_i$ and $t_i$, respectively, and $p(l_i, t_i)$ represents the joint probability that a randomly selected sample belongs to both $l_i$ and $t_i$. Let $H(L)$ and $H(T)$ represent the entropies of $L$ and $T$, respectively. Since the mutual information ranges between 0 and $\max(H(L), H(T))$, the normalized mutual information (NMI) is defined as

$$\mathrm{NMI} = \frac{\mathrm{MI}(L, T)}{\max(H(L), H(T))}. \tag{22}$$
Purity is a straightforward and transparent evaluation method that is defined as follows:
$$\mathrm{purity} = \frac{1}{k} \sum_{i=1}^{k} \frac{|C_i^d|}{|C_i|}, \tag{23}$$

where $k$ is the number of clusters, $|C_i^d|$ is the number of elements belonging to the dominant (most numerous) class in cluster $C_i$, and $|C_i|$ is the number of elements in cluster $C_i$.
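For concreteness, the three metrics can be sketched in NumPy as follows. The function names are ours, and the brute-force permutation search stands in for the Kuhn-Munkres mapping of Eq. (19); it is only practical for small numbers of clusters.

```python
import numpy as np
from itertools import permutations

def clustering_acc(true, pred):
    """ACC of Eq. (19); best label mapping found by brute force."""
    labels = sorted(set(true) | set(pred))
    best = 0.0
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        acc = np.mean([mapping[p] == t for p, t in zip(pred, true)])
        best = max(best, acc)
    return float(best)

def nmi(true, pred):
    """Normalized mutual information, Eqs. (21)-(22)."""
    true, pred = np.asarray(true), np.asarray(pred)
    mi = 0.0
    for l in np.unique(pred):
        pl = np.mean(pred == l)
        for t in np.unique(true):
            p_joint = np.mean((pred == l) & (true == t))
            if p_joint > 0:
                mi += p_joint * np.log2(p_joint / (pl * np.mean(true == t)))
    h_pred = -sum(np.mean(pred == l) * np.log2(np.mean(pred == l))
                  for l in np.unique(pred))
    h_true = -sum(np.mean(true == t) * np.log2(np.mean(true == t))
                  for t in np.unique(true))
    return mi / max(h_pred, h_true)

def purity(true, pred):
    """Purity of Eq. (23): per-cluster majority fraction, averaged."""
    true, pred = np.asarray(true), np.asarray(pred)
    fracs = []
    for c in np.unique(pred):
        members = true[pred == c]
        fracs.append(np.max(np.bincount(members)) / len(members))
    return float(np.mean(fracs))
```

All three scores equal 1 for a perfect clustering, regardless of how the cluster indices are numbered, which is exactly why the label mapping in ACC is needed.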
4.3. Experimental Results and Analysis
In the first experiment, to test the influence of the parameters on the proposed method, we vary the number of factorization layers $L$ over $\{1, 2, 3\}$ and the feature dimension $D$ of each layer over $\{10, 30, 50, 70, 90, 110, 130\}$, and we adopt a grid search to find the optimal parameter values. The low-dimensional features obtained by the proposed algorithm are clustered with the K-means algorithm. Since the initialization of K-means affects the clustering results, we repeat the random initialization 10 times and report the mean value. First, the optimal feature dimension of each layer is fixed, and the number of layers is varied. As shown in Figure 3, when the number of layers is set to 1, each measure is in most cases poorer than with more layers, and performance improves as the number of layers increases. This shows that deep factorization helps improve the performance of the proposed method.
The influence of factorization layer L on the clustering results, (a) ACC, (b) NMI, and (c) purity.
Then, the number of layers is fixed, and the feature dimension is varied. The results are shown in Figure 4. As the dimensionality increases, the clustering performance also improves in most cases; however, this trend does not continue indefinitely, and once the optimum is reached, performance plateaus or decreases with further increases in dimensionality. The optimal parameter groups for our proposed algorithm are listed in Table 2.
The influences of feature dimension D on clustering results, (a) ACC, (b) NMI, and (c) purity.
The optimal parameter groups {L, D} in our proposed algorithm.

Datasets  | ACC      | NMI      | Purity
3sources  | {3, 50}  | {3, 50}  | {3, 50}
BBC       | {2, 110} | {3, 110} | {2, 110}
BBC Sport | {2, 90}  | {3, 50}  | {2, 90}
Reuters   | {3, 110} | {2, 110} | {2, 70}
Wikipedia | {3, 10}  | {2, 130} | {2, 130}
The second experiment verifies that fusing multiview information improves the clustering performance of the proposed method. First, we perform traditional NMF and deep basis matrix factorization (DBMF) on the data of each view. Then, we obtain low-dimensional features of the multiview data by fusing the features of the different views with equal weights. Finally, the proposed AMDBMF method is compared with these two baselines. The comparison results are listed in Tables 3–5. According to the tables, DBMF outperforms traditional NMF, which indicates that more abstract features can be obtained through deep factorization. The proposed AMDBMF method in turn outperforms DBMF, which verifies that adaptively fusing the different views helps extract more robust low-dimensional features from multiview data.
The ACCs (%) of the AMDBMF and single-view methods on different datasets.

Methods | 3sources | BBC   | BBC Sport | Reuters | Wikipedia
NMF     | 55.03    | 46.39 | 61.75     | 73.57   | 56.71
DBMF    | 76.98    | 63.84 | 80.24     | 67.67   | 57.11
AMDBMF  | 79.29    | 82.86 | 87.65     | 90.61   | 60.12
The NMIs (%) of the AMDBMF and single-view methods on different datasets.

Methods | 3sources | BBC   | BBC Sport | Reuters | Wikipedia
NMF     | 51.90    | 18.74 | 40.45     | 66.84   | 51.23
DBMF    | 67.16    | 56.71 | 66.80     | 59.47   | 54.89
AMDBMF  | 69.78    | 63.97 | 71.21     | 86.63   | 56.84
The purities (%) of the AMDBMF and single-view methods on different datasets.

Methods | 3sources | BBC   | BBC Sport | Reuters | Wikipedia
NMF     | 72.01    | 48.66 | 64.67     | 75.34   | 60.75
DBMF    | 81.72    | 73.74 | 83.62     | 69.98   | 59.60
AMDBMF  | 82.84    | 82.86 | 87.65     | 91.13   | 63.49
The third experiment compares the proposed AMDBMF method with several currently popular multiview algorithms: MVCF [37], DeepMVC [41], GMC [47], and NMFCC [36]. MVCF exploits the correlation information between views obtained by jointly optimizing the graph matrix of each view's data. DeepMVC uses a nonparameterized adaptive learning method to obtain the weights between views. NMFCC introduces orthogonality constraints on the basis matrix and coefficient matrix. The best results of the different multiview learning methods on the different datasets are shown in Tables 6–8. The performance of the proposed method is significantly better than that of the other methods in most cases. Since these methods use different mechanisms to fuse multiview information, they behave differently on different datasets; how to effectively integrate fusion mechanisms therefore remains an open problem.
The ACCs (%) of different multiview methods on different datasets.

Methods | 3sources | BBC   | BBC Sport | Reuters | Wikipedia
MVCF    | 69.94    | 63.18 | 66.82     | 63.42   | 54.04
DeepMVC | 65.68    | 72.12 | 72.12     | 76.25   | 60.46
GMC     | 69.23    | 69.34 | 80.70     | 66.58   | 44.88
NMFCC   | 69.82    | 73.40 | 83.09     | 76.04   | 61.46
AMDBMF  | 79.29    | 82.86 | 87.65     | 90.61   | 60.12
The NMIs (%) of different multiview methods on different datasets.

Methods | 3sources | BBC   | BBC Sport | Reuters | Wikipedia
MVCF    | 60.64    | 45.33 | 46.01     | 65.92   | 49.24
DeepMVC | 56.50    | 52.58 | 52.58     | 73.76   | 55.45
GMC     | 54.80    | 48.52 | 72.26     | 70.43   | 36.14
NMFCC   | 61.91    | 53.95 | 67.38     | 73.92   | 56.00
AMDBMF  | 69.78    | 63.97 | 71.21     | 83.63   | 56.84
The purities (%) of different multiview methods on different datasets.

Methods | 3sources | BBC   | BBC Sport | Reuters | Wikipedia
MVCF    | 78.52    | 64.64 | 68.36     | 68.84   | 57.17
DeepMVC | 75.15    | 72.12 | 72.12     | 78.00   | 63.64
GMC     | 74.56    | 69.34 | 84.38     | 67.25   | 48.20
NMFCC   | 78.70    | 73.45 | 83.09     | 78.96   | 64.86
AMDBMF  | 82.84    | 82.86 | 87.65     | 91.13   | 63.49
The final experiment verifies the convergence of the proposed optimization algorithm. The convergence curves of the proposed method on the different datasets are given in Figure 5. As seen from the figures, the iterative update rules in Algorithm 1 steadily decrease the objective function value, and the proposed method converges quickly on all of these datasets.
The convergence curves of the proposed method on five datasets: (a) 3sources, (b) BBC, (c) BBC Sport, (d) Reuters, and (e) Wikipedia.
5. Conclusions and Future Work
To efficiently learn feature representations of multiview multimedia data, this paper proposes a new deep nonnegative matrix factorization method with multiview learning. Unlike traditional methods, the proposed method deeply decomposes the basis matrix, so it can learn not only the part-based representation of the original data but also more abstract deep features. Furthermore, to effectively fuse the available multiview information, this paper introduces an adaptive feature fusion mechanism.
To overcome the shortcomings of information fusion for multiview data, a large number of fusion mechanisms have been proposed, and they achieve different performances on different datasets. Therefore, how to effectively integrate different mechanisms to improve the feature representation ability of a given approach is one of the key research tasks to be addressed in the future. Moreover, we will apply our method to other fields such as medical image processing and medical text analysis [48].
Data Availability
The data are derived from public domain resources.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research is supported by the National Natural Science Foundation of China under grant nos. 62062040, 61962026, 62006174, and 71762018, the Chinese Postdoctoral Science Foundation (grant no. 2019M661117), the Provincial Key Research and Development Program of Jiangxi under grant nos. 20192ACBL21031 and 20202BABL202016, the Science and Technology Research Project of Jiangxi Provincial Department of Education (grant nos. GJJ191709 and GJJ191689), Fundamental Research Funds for the Central Universities under grant no. 2412019FZ049, the Graduate Innovation Foundation Project of Jiangxi Normal University under grant no. YJS2020045, and the Young Talent Cultivation Program of Jiangxi Normal University.
References
[1] Y. Yi, Y. Chen, J. Wang, G. Lei, J. Dai, and H. Zhang, "Joint feature representation and classification via adaptive graph semi-supervised nonnegative matrix factorization," 2020, vol. 89, article 115984.
[2] Y. Yi, J. Wang, W. Zhou, C. Zheng, J. Kong, and S. Qiao, "Non-negative matrix factorization with locality constrained adaptive graph," 2020, vol. 30, no. 2, pp. 427-441.
[3] G. T. Reddy, M. P. K. Reddy, K. Lakshmanna, R. Kaluri, D. S. Rajput, G. Srivastava, and T. Baker, "Analysis of dimensionality reduction techniques on big data," 2020, vol. 8, pp. 54776-54788.
[4] Y. Yi, J. Wang, W. Zhou, Y. Fang, J. Kong, and Y. Lu, "Joint graph optimization and projection learning for dimensionality reduction," 2019, vol. 92, pp. 258-273.
[5] S. Ayesha, M. K. Hanif, and R. Talib, "Overview and comparative study of dimensionality reduction techniques for high dimensional data," 2020, vol. 59, pp. 44-58.
[6] H. Abdi and L. J. Williams, "Principal component analysis," 2010, vol. 2, no. 4, pp. 433-459.
[7] A. Hyvärinen and E. Oja, "Independent component analysis: algorithms and applications," 2000, vol. 13, no. 4-5, pp. 411-430.
[8] Y. Linde, A. Buzo, and R. Gray, "An algorithm for vector quantizer design," 1980, vol. 28, no. 1, pp. 84-95.
[9] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," 1999, vol. 401, no. 6755, pp. 788-791.
[10] Y. X. Wang and Y. J. Zhang, "Nonnegative matrix factorization: a comprehensive review," 2012, vol. 25, no. 6, pp. 1336-1353.
[11] A. A. Jamali, A. Kusalik, and F. X. Wu, "MDIPA: a microRNA–drug interaction prediction approach based on non-negative matrix factorization," 2020, vol. 36, no. 20, pp. 5061-5067.
[12] P. Chalise, Y. Ni, and B. L. Fridley, "Network-based integrative clustering of multiple types of genomic data using non-negative matrix factorization," 2020, vol. 118, article 103625.
[13] M. Hou, J. Li, and G. Lu, "A supervised non-negative matrix factorization model for speech emotion recognition," 2020, vol. 124, pp. 13-20.
[14] P. O. Hoyer, "Non-negative matrix factorization with sparseness constraints," 2004, vol. 5, pp. 1457-1469.
[15] S. Choi, "Algorithms for orthogonal nonnegative matrix factorization," in 2008 IEEE International Joint Conference on Neural Networks, Hong Kong, China, June 2008, pp. 1828-1832.
[16] C. H. Q. Ding, T. Li, and M. I. Jordan, "Convex and semi-nonnegative matrix factorizations," 2008, vol. 32, no. 1, pp. 45-55.
[17] J. Fan and J. Cheng, "Matrix completion by deep matrix factorization," 2018, vol. 98, pp. 34-41.
[18] Y. Bengio, A. Courville, and P. Vincent, "Representation learning: a review and new perspectives," 2013, vol. 35, no. 8, pp. 1798-1828.
[19] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," 2015, vol. 521, no. 7553, pp. 436-444.
[20] G. Zhong, L. N. Wang, X. Ling, and J. Dong, "An overview on data representation learning: from traditional feature learning to recent deep learning," 2016, vol. 2, no. 4, pp. 265-278.
[21] J. H. Ahn, S. Kim, J. H. Oh, and S. Choi, "Multiple nonnegative-matrix factorization of dynamic PET images," in Proceedings of the Asian Conference on Computer Vision, Jeju, Korea, 2004, pp. 1009-1013.
[22] G. Trigeorgis, K. Bousmalis, S. Zafeiriou, and B. W. Schuller, "A deep matrix factorization method for learning attribute representations," 2016, vol. 39, no. 4, pp. 417-429.
[23] Y. Zhao, H. Wang, and J. Pei, "Deep non-negative matrix factorization architecture based on underlying basis images learning," 2021, vol. 43, no. 6, pp. 1897-1913.
[24] Y. Meng, R. Shang, F. Shang, L. Jiao, S. Yang, and R. Stolkin, "Semi-supervised graph regularized deep NMF with bi-orthogonal constraints for data representation," 2020, vol. 31, no. 9, pp. 3245-3258.
[25] M. Tong, Y. Chen, L. Ma, H. Bai, and X. Yue, "NMF with local constraint and deep NMF with temporal dependencies constraint for action recognition," 2020, vol. 32, no. 9, pp. 4481-4505.
[26] J. Li, G. Zhou, Y. Qiu, Y. Wang, Y. Zhang, and S. Xie, "Deep graph regularized non-negative matrix factorization for multi-view clustering," 2020, vol. 390, pp. 108-116.
[27] Z. Shu, X. Wu, C. Hu, C. Z. You, and H. H. Fan, "Deep semi-nonnegative matrix factorization with elastic preserving for data representation," 2021, vol. 80, pp. 1707-1724.
[28] J. Zhao, X. Xie, X. Xu, and S. Sun, "Multi-view learning overview: recent progress and new challenges," 2017, vol. 38, pp. 43-54.
[29] Y. Li, M. Yang, and Z. Zhang, "A survey of multi-view representation learning," 2018, vol. 31, no. 10, pp. 1863-1883.
[30] T. Hussain, K. Muhammad, W. Ding, J. Lloret, S. W. Baik, and V. H. C. de Albuquerque, "A comprehensive survey of multi-view video summarization," 2021, vol. 109, article 107567.
[31] X. Xu, Y. Yang, C. Deng, and F. Nie, "Adaptive graph weighting for multi-view dimensionality reduction," 2019, vol. 165, pp. 186-196.
[32] R. Zhang, F. Nie, X. Li, and X. Wei, "Feature selection with multi-view data: a survey," 2019, vol. 50, pp. 158-167.
[33] P. Luo, J. Peng, Z. Guan, and J. Fan, "Dual regularized multi-view non-negative matrix factorization for clustering," 2018, vol. 294, pp. 1-11.
[34] J. Liu, C. Wang, J. Gao, and J. Han, "Multi-view clustering via joint nonnegative matrix factorization," in Proceedings of the 2013 SIAM International Conference on Data Mining, Austin, TX, USA, 2013, pp. 252-260.
[35] W. Y. Chang, C. P. Wei, Y. C. Wang, …
F.Multi-view nonnegative matrix factorization for clothing image characterization2014 22nd International Conference on Pattern RecognitionAugust 2014Stockholm, Sweden1272127710.1109/ICPR.2014.2282-s2.0-84919915605LiangN.YangZ.LiZ.SunW.XieS.Multi-view clustering by non-negative matrix factorization with co-orthogonal constraints202019410558210.1016/j.knosys.2020.105582ZhanK.ShiJ.WangJ.WangH.XieY.Adaptive structure concept factorization for multiview clustering20183041080110310.1162/neco_a_010552-s2.0-8504432590329342398WeiS.WangJ.YuG.DomeniconiC.ZhangX.Multi-view multiple clusterings using deep matrix factorization20203446348635510.1609/aaai.v34i04.6104ZhaoW.XuC.GuanZ.LiuY.Multiview concept learning via deep matrix factorization202132281482510.1109/TNNLS.2020.297953232275617ZhaoH.DingZ.FuY.Multi-view clustering via deep matrix factorization2017311HuangS.KangZ.XuZ.Auto-weighted multi-view clustering via deep matrix decomposition202097, article 10701510.1016/j.patcog.2019.107015GreeneD.CunninghamP.Practical solutions to the problem of diagonal dominance in kernel document clusteringProceedings of the 23rd international conference on Machine learning2006New York, NY, USA37738410.1145/1143844.11438922-s2.0-34250762673BissonG.GrimalC.HuangT.ZengZ.LiC.LeungC. S.An architecture to efficiently learn co-similarities from multi-view datasets2012Berlin, HeidelbergSpringer18419310.1007/978-3-642-34475-6_232-s2.0-84869049002FuL.LinP.VasilakosA. V.WangS.An overview of recent multi-view clustering202040214816110.1016/j.neucom.2020.02.104YiY.ShiY.ZhangH.WangJ.KongJ.Label propagation based semi-supervised non-negative matrix factorization for feature extraction20151491021103710.1016/j.neucom.2014.07.0312-s2.0-85027941172ZhuH.ZhouM. 
C.Efficient role transfer based on Kuhn–Munkres algorithm201142249149610.1109/TSMCA.2011.21595872-s2.0-84857505320WangH.YangY.LiuB.GMC: graph-based multi-view clustering20193261116112910.1109/TKDE.2019.29038102-s2.0-85063012218HuangX.ZhongB.CaoY.YiY.GuM.Chest X-ray lung Chinese description generation based on semantic labels and hierarchical LSTM2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)December 2020Seoul, Korea (South)1020102310.1109/BIBM49941.2020.9313293