We present a novel method, called graph sparse nonnegative matrix factorization, for dimensionality reduction. An affinity graph and a sparse constraint are incorporated into nonnegative matrix factorization, and we show that the proposed matrix factorization method can respect the intrinsic graph structure and provide a sparse representation. Unlike existing traditional methods, we develop an inertial neural network to optimize the proposed matrix factorization problem. By adjusting one parameter in the neural network, the global optimal solution can be searched for. Finally, simulations on numerical examples and clustering experiments on real-world data illustrate the effectiveness and performance of the proposed method.
1. Introduction
Dimensionality reduction plays a fundamental role in image processing, and many researchers have sought effective methods for this problem. For a given image database, the raw feature space is high-dimensional, whereas the number of truly informative features is far smaller. Thus, it is of great significance to find useful low-dimensional features to represent the original feature space. For this purpose, matrix factorization techniques have attracted great attention in recent decades [1–3], and many methods have been developed using different criteria. The most familiar methods include Singular Value Decomposition (SVD) [4], Principal Component Analysis (PCA) [5], and Vector Quantization (VQ) [6]. The main idea of matrix factorization methods is to find several matrices whose product approximates the original matrix. In dimensionality reduction, the dimensions of the decomposed matrices are smaller than those of the original matrix. This gives rise to a low-dimensional compact representation of the original data points, which can facilitate clustering or classification.
Among these matrix factorization methods, one of the most widely used is nonnegative matrix factorization (NMF) [3], which requires the decomposed matrices to be nonnegative. The nonnegativity constraint leads NMF to learn a parts-based representation of high-dimensional data, and it has been applied in many areas such as signal processing [7], data mining [8, 9], and computer vision [10]. In general, NMF is effective for unsupervised learning problems but is not directly applicable to supervised learning problems. To overcome this limitation, some researchers [11–13] have applied semi-supervised learning theory to achieve better performance in dimensionality reduction. In the light of locality preserving projection, a graph regularized nonnegative matrix factorization method (GNMF) has been proposed to impose geometrical information on the data space; the geometrical structure is encoded by a nearest neighbor graph [11]. Based on the idea of label propagation, Liu et al. [13] imposed a label information constraint on nonnegative matrix factorization (CNMF). The idea of CNMF is that neighboring data points with the same class label should merge together in the low-dimensional representation space.
Motivated by previous research on matrix factorization, in this paper we propose a novel method, called graph sparse nonnegative matrix factorization (GSNMF), for dimensionality reduction, which can be used for semi-supervised learning problems. In GSNMF, a sparse constraint is imposed on GNMF, which leads the matrix factorization to learn a sparse parts-based representation of the original data space. The sparse constraint makes GSNMF a nonconvex nonsmooth problem, which traditional optimization algorithms cannot solve directly. Recently, numerous neural networks have emerged as a powerful tool for optimization problems [14–27]. For some nonconvex problems, an inertial projection neural network (IPNN) [16] has been proposed that can reach different local optimal solutions via the inertial term. In [17], a shuffled frog leaping algorithm (SFLA) was developed using a recurrent neural network; based on the SFLA framework, the global optimal solution can be searched for. Moreover, there are many neural-network-based optimization methods for nonconvex nonsmooth problems [22–27].
It is worth highlighting some advantages of our proposed method as follows:
Traditional algorithms for GNMF [11] and NMF [3] are easily trapped in local optima and are sensitive to initial values, while our proposed algorithm, which uses an inertial projection neural network, can avoid these problems.
Our proposed algorithm can be initialized with sparse matrices, whereas GNMF and NMF may fail in this case.
By adjusting one parameter in the neural network, GSNMF achieves a better clustering effect than GNMF and NMF.
The rest of the paper is organized as follows. In Section 2, some works related to NMF are briefly reviewed; then we introduce GSNMF. Section 3 reviews the inertial projection neural network theory and provides a convergence proof for GSNMF. Section 4 presents numerical examples to demonstrate the validity of our proposed algorithm. Experiments on clustering are given in Section 5. Finally, we present some concluding remarks and future work in Section 6.
2. Problem Formulation
To find effective features of high-dimensional data, matrix factorization can be used to learn sets of features that represent the data. Given a data matrix $Y=[Y_1,\ldots,Y_n]\in\mathbb{R}^{m\times n}$ and an integer $k$, matrix factorization finds two matrices $A\in\mathbb{R}^{m\times k}$ and $S=[S_1,\ldots,S_n]\in\mathbb{R}^{k\times n}$ such that
\[ Y \approx AS. \tag{1} \]
When $k\ll\min(m,n)$, matrix factorization can be regarded as a dimensionality reduction method. In image dimensionality reduction, each column of $A$ is a basis vector capturing structure in the original image data, and each column of $S$ is the representation of the corresponding data point with respect to the new basis. The most common way to measure the quality of the approximation is the Frobenius norm:
\[ \min_{A,S} f(A,S)=\frac{1}{2}\|Y-AS\|_F^2=\frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{n}\Big(y_{ij}-\sum_{l=1}^{k}a_{il}s_{lj}\Big)^2. \tag{2} \]
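As a concrete illustration (not part of GSNMF itself), the best rank-$k$ approximation of $Y$ in the Frobenius norm comes from the truncated SVD; the matrix sizes below are made up.

```python
import numpy as np

# Best rank-k approximation of Y via truncated SVD (Eckart-Young).
rng = np.random.default_rng(0)
m, n, k = 8, 10, 3
Y = rng.random((m, k)) @ rng.random((k, n))    # exactly rank-k data

U, sigma, Vt = np.linalg.svd(Y, full_matrices=False)
A = U[:, :k] * sigma[:k]       # m x k factor (scaled basis directions)
S = Vt[:k, :]                  # k x n low-dimensional representation

err = 0.5 * np.linalg.norm(Y - A @ S, "fro") ** 2   # objective (2)
print(err < 1e-12)
```

Because $Y$ was built to have rank exactly $k$, the objective (2) is numerically zero; for general data the truncated SVD minimizes it over all rank-$k$ factorizations.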
Different matrix factorization methods impose different constraints on (2) to address different practical problems. At present, the most widely used matrix factorization method is nonnegative matrix factorization (NMF) [3], which adds nonnegativity constraints on $A$ and $S$. The classic multiplicative update algorithm is
\[ a_{ik} \leftarrow a_{ik}\frac{(YS^T)_{ik}}{(ASS^T)_{ik}}, \qquad s_{kj} \leftarrow s_{kj}\frac{(A^TY)_{kj}}{(A^TAS)_{kj}}. \tag{3} \]
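The multiplicative updates (3) are simple to implement; the following sketch (with a made-up random matrix and a small `eps` added to the denominators for numerical safety) illustrates them in NumPy.

```python
import numpy as np

def nmf_multiplicative(Y, k, iters=500, eps=1e-9, seed=0):
    """Sketch of the classic NMF multiplicative updates (3):
    a_ik <- a_ik (Y S^T)_ik / (A S S^T)_ik,
    s_kj <- s_kj (A^T Y)_kj / (A^T A S)_kj."""
    rng = np.random.default_rng(seed)
    m, n = Y.shape
    A = rng.random((m, k))
    S = rng.random((k, n))
    for _ in range(iters):
        A *= (Y @ S.T) / (A @ S @ S.T + eps)
        S *= (A.T @ Y) / (A.T @ A @ S + eps)
    return A, S

rng = np.random.default_rng(1)
Y = rng.random((8, 3)) @ rng.random((3, 10))   # nonnegative rank-3 matrix
A, S = nmf_multiplicative(Y, k=3)
rel = np.linalg.norm(Y - A @ S, "fro") / np.linalg.norm(Y, "fro")
print(rel)
```

Since the updates multiply nonnegative entries by nonnegative ratios, $A$ and $S$ stay nonnegative throughout.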
Recently, Cai et al. [11] proposed a graph regularized nonnegative matrix factorization method (GNMF), which incorporates geometrical information about the data space. The goal of GNMF is to find effective basis vectors that respect the intrinsic structure. It rests on the natural assumption that if two data points $y_j$ and $y_l$ from $Y$ are close in the intrinsic geometry of the data distribution, then their new representations $s_j$ and $s_l$ should also be close to each other. For each data point $y_j$, one finds its $p$ nearest neighbors and puts edges between $y_j$ and those neighbors; the edges define a weight matrix $W$, with $W_{jl}=1$ if nodes $j$ and $l$ are connected by an edge. In the semi-supervised setting, $W$ can instead be built from label information:
\[ W_{jl}=\begin{cases}1, & \text{if } y_j \text{ and } y_l \text{ have the same label},\\ 0, & \text{otherwise}.\end{cases} \tag{4} \]
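A minimal sketch of the label-based weight matrix (4) and the associated graph quantities used later in (6); the label vector is made up for illustration.

```python
import numpy as np

# Label-based 0/1 weight matrix (4): W_jl = 1 when points j and l share
# a label; made-up labels, and no self-loops (a common convention).
labels = np.array([0, 0, 1, 1, 1, 2])
W = (labels[:, None] == labels[None, :]).astype(float)
np.fill_diagonal(W, 0)

F = np.diag(W.sum(axis=1))          # degree matrix, F_jj = sum_l W_jl
L = F - W                           # graph Laplacian used in (6)
print(np.allclose(L.sum(axis=1), 0))   # Laplacian rows sum to zero
```

The zero row sums and the symmetry of $W$ are the two properties that make the regularizer $\operatorname{tr}(SLS^T)$ in (6) a valid smoothness penalty.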
The low-dimensional representation of $y_j$ with respect to the new basis is $s_j$, and the Euclidean distance measures the dissimilarity between $s_j$ and $s_l$:
\[ d(s_j,s_l)=\|s_j-s_l\|^2. \tag{5} \]
With the above analysis, the following term measures the smoothness of the low-dimensional representation:
\[ R=\frac{1}{2}\sum_{j,l=1}^{n}\|s_j-s_l\|^2 W_{jl}=\sum_{j=1}^{n}s_j^Ts_jF_{jj}-\sum_{j,l=1}^{n}s_j^Ts_lW_{jl}=\operatorname{tr}(SFS^T)-\operatorname{tr}(SWS^T)=\operatorname{tr}(SLS^T), \tag{6} \]
where $\operatorname{tr}(\cdot)$ denotes the trace of a matrix, $F_{jj}=\sum_{l=1}^{n}W_{jl}$, and $L=F-W$. Combining (6) and (2), the new objective function is
\[ \min_{A\geqslant 0,\,S\geqslant 0} f(A,S)=\frac{1}{2}\|Y-AS\|_F^2+\alpha\operatorname{tr}(SLS^T). \tag{7} \]
The multiplicative algorithm that solves (7) is
\[ a_{ik} \leftarrow a_{ik}\frac{(YS^T)_{ik}}{(ASS^T)_{ik}}, \qquad s_{kj} \leftarrow s_{kj}\frac{(A^TY+\alpha SW)_{kj}}{(A^TAS+\alpha SF)_{kj}}. \tag{8} \]
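The GNMF updates (8) differ from the plain NMF updates (3) only in the extra graph terms. A hedged sketch (assuming $S$ is $k \times n$, $W$ is the $n \times n$ weight matrix, and $F$ its degree matrix); the toy data are made up:

```python
import numpy as np

def gnmf_update(Y, A, S, W, alpha=1.0, eps=1e-9):
    """One round of the GNMF multiplicative updates (8)."""
    F = np.diag(W.sum(axis=1))
    A *= (Y @ S.T) / (A @ S @ S.T + eps)
    S *= (A.T @ Y + alpha * (S @ W)) / (A.T @ A @ S + alpha * (S @ F) + eps)
    return A, S

# Toy run: the graph-regularized objective (7) should not increase.
rng = np.random.default_rng(4)
m, n, k = 6, 10, 2
labels = rng.integers(0, 2, n)
W = (labels[:, None] == labels[None, :]).astype(float)
np.fill_diagonal(W, 0)
Lap = np.diag(W.sum(axis=1)) - W
Y = rng.random((m, k)) @ rng.random((k, n))
A, S = rng.random((m, k)), rng.random((k, n))

def objective(A, S, alpha=1.0):
    return 0.5 * np.linalg.norm(Y - A @ S, "fro") ** 2 \
        + alpha * np.trace(S @ Lap @ S.T)

before = objective(A, S)
for _ in range(50):
    A, S = gnmf_update(Y, A, S, W)
after = objective(A, S)
print(after < before)
```

The monotone decrease of (7) under these updates is the content of the convergence analysis in [11]; the small `eps` only guards against zero denominators.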
When $\alpha=0$ or $W=0$, GNMF reduces to nonnegative matrix factorization. In representing image data, GNMF and NMF consider only the Euclidean structure of the image space. However, recent research has shown that human-generated images may lie on a submanifold of the ambient Euclidean space [28, 29]; in general, such images do not uniformly fill up the high-dimensional Euclidean space. Therefore, matrix factorization should respect the intrinsic manifold structure and learn a sparse basis to represent the image data. In the spirit of sparse coding [30], we impose a sparse constraint on (7), and the optimization problem becomes
\[ \min_{A\geqslant 0,\,S\geqslant 0} f(A,S)=\frac{1}{2}\|Y-AS\|_F^2+\alpha\operatorname{tr}(SLS^T)+\beta\Phi(S). \tag{9} \]
Because the optimization problem (9) is nonconvex, a block-coordinate update (BCD) [31] structure is used to optimize GSNMF. Given initial $A$ and $S$, BCD alternately solves
\[ \min_{S} f(S)=\frac{1}{2}\|Y-AS\|_F^2+\alpha\operatorname{tr}(SLS^T)+\beta\Phi(S) \quad \text{s.t. } S\geqslant 0, \tag{10} \]
\[ \min_{A} f(A)=\frac{1}{2}\|Y^T-S^{*T}A^T\|_F^2 \quad \text{s.t. } A\geqslant 0 \tag{11} \]
until convergence. Since (11) and (10) have a similar form, we only consider how to solve (10); then (11) can be solved accordingly. Problem (10) can be transformed into the following vector form:
\[ \min_{H} f(H)=\frac{1}{2}\sum_{i=1}^{n}\|Y_i-AH_i\|_2^2+\frac{\alpha}{2}\sum_{j,l=1}^{n}\|H_j-H_l\|_2^2W_{jl}+\beta\|H\|_1 \quad \text{s.t. } H\geqslant 0, \tag{12} \]
where
\[ H=\begin{pmatrix}H_1\\ \vdots\\ H_n\end{pmatrix}=\begin{pmatrix}S_1\\ \vdots\\ S_n\end{pmatrix}\in\mathbb{R}^{nk}, \qquad H_i\in\mathbb{R}^{k}, \qquad Y_i\in\mathbb{R}^{m}. \tag{13} \]
The $L_1$-norm is not differentiable, but [32] presents a splitting method to handle it. Writing $H=U-V$, problem (12) can be rewritten as
\[ \min_{U,V} f(U,V)=\frac{1}{2}\sum_{i=1}^{n}\|Y_i-A(U_i-V_i)\|^2+\frac{\alpha}{2}\sum_{j,l=1}^{n}\|(U_j-V_j)-(U_l-V_l)\|_2^2W_{jl}+\beta\|U-V\|_1 \quad \text{s.t. } U-V\geqslant 0, \tag{14} \]
where $U_i=(H_i)_+\in\mathbb{R}^{k}$, $V_i=(-H_i)_+\in\mathbb{R}^{k}$, $(H_i)_+=\max(H_i,0)$, $i=1,\ldots,n$, $\mathbf{1}_{nk}=[1,\ldots,1]^T$, and $U$ and $V$ are, respectively,
\[ U=\begin{pmatrix}U_1\\ \vdots\\ U_n\end{pmatrix}\in\mathbb{R}^{nk}, \qquad V=\begin{pmatrix}V_1\\ \vdots\\ V_n\end{pmatrix}\in\mathbb{R}^{nk}. \tag{15} \]
According to the BCD structure, (14) separates into two subproblems. Given initial $U$ and $V$, one alternately solves
\[ \min_{U} f(U)=\frac{1}{2}\sum_{i=1}^{n}\|(Y_i+AV_i)-AU_i\|_2^2+\frac{\alpha}{2}\sum_{j,l=1}^{n}\|(U_j-U_l)-(V_j-V_l)\|_2^2W_{jl}+\beta\|U\|_1 \quad \text{s.t. } U\geqslant V, \tag{16} \]
\[ \min_{V} f(V)=\frac{1}{2}\sum_{i=1}^{n}\|(Y_i-AU_i)-(-AV_i)\|_2^2+\frac{\alpha}{2}\sum_{j,l=1}^{n}\|(V_j-V_l)-(U_j-U_l)\|_2^2W_{jl}+\beta\|V\|_1 \quad \text{s.t. } 0\leqslant V\leqslant U \tag{17} \]
until convergence. Since (16) and (17) have a similar form, we only consider how to solve (16); then (17) can be solved accordingly. Problem (16) can be transformed into the following convex quadratic program (CQP):
\[ \min_{U} f(U)=\frac{1}{2}U^TCU+DU \quad \text{s.t. } U\in\Omega, \tag{18} \]
where
\[
\begin{aligned}
E&=\operatorname{diag}(A^TA,\ldots,A^TA)\in\mathbb{R}^{nk\times nk}, \qquad F_{jl}=W_{jl}I_k\in\mathbb{R}^{k\times k},\\
F&=\begin{pmatrix}F_{11}&F_{12}&\cdots&F_{1n}\\ F_{21}&F_{22}&\cdots&F_{2n}\\ \vdots&\vdots&&\vdots\\ F_{n1}&F_{n2}&\cdots&F_{nn}\end{pmatrix}\in\mathbb{R}^{nk\times nk}, \qquad G=\operatorname{diag}\Big(\sum_{l=1}^{n}W_{1l}I_k,\ldots,\sum_{l=1}^{n}W_{nl}I_k\Big),\\
C&=E-\alpha F+\alpha G,\\
D&=\beta\mathbf{1}_{nk}^T-\big[Y_1^TA+V_1^TA^TA,\ldots,Y_n^TA+V_n^TA^TA\big]-2\alpha V^T(G-F),\\
\Omega&=\{U \mid U\geqslant V\}.
\end{aligned} \tag{19}
\]
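The feasible set of the CQP (18) is $\Omega=\{U \geqslant V\}$, whose Euclidean projection is simply the elementwise maximum $P_\Omega(z)=\max(z,V)$. As an illustration of this structure (not the algorithm used in the paper), a plain projected gradient iteration solves a made-up instance of (18):

```python
import numpy as np

# Projected gradient on a made-up instance of (18): min (1/2)U^T C U + D U
# over Omega = {U >= V}; C, D, V are invented test data.
rng = np.random.default_rng(2)
nk = 6
M = rng.random((nk, nk))
C = M @ M.T + nk * np.eye(nk)        # symmetric positive definite
D = rng.random(nk)                    # linear term (row vector in (18))
V = rng.random(nk)                    # lower bound defining Omega

U = V.copy()
step = 1.0 / np.linalg.norm(C, 2)     # 1/L step for the L-smooth quadratic
for _ in range(2000):
    U = np.maximum(U - step * (C @ U + D), V)   # gradient step + projection

# First-order optimality: U = P_Omega(U - grad f(U)) at a solution of (18).
residual = np.linalg.norm(U - np.maximum(U - (C @ U + D), V))
print(residual < 1e-8)
```

The fixed-point residual checked at the end is exactly the quantity $N(U)=U-P_\Omega(U-\nabla f(U))$ that the convergence analysis in Section 3 drives to zero.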
According to the above analysis, problem (11) can also be transformed into a convex quadratic program. To save space, we omit the derivation. In the following section, we introduce IPNN to optimize (18).
3. Neural Network Model and Analysis
3.1. Inertial Projection Neural Network
To solve problem (18), we establish the following neural network using IPNN [16]:
\[
\begin{aligned}
\frac{dU}{dt}&=X,\\
\frac{dX}{dt}&=-\lambda X+P_\Omega\big(U-\nabla f(U)\big)-U, \qquad U(t_0)=U_0\in\Omega,
\end{aligned} \tag{20}
\]
where $\nabla f(U)=CU+D^T$. Now we are ready to show the convergence and optimality of (20). For any $U\in\mathbb{R}^{nk}$ and $X\in\mathbb{R}^{nk}$, we set $Q_1=(U_1^T,X_1^T)^T$, $Q_2=(U_2^T,X_2^T)^T$, and
\[ R(Q_i)=\begin{pmatrix}X_i\\ -\lambda X_i+P_\Omega\big(U_i-\nabla f(U_i)\big)-U_i\end{pmatrix}; \]
then we present the following theorems.
Theorem 1.
For any initial point $Q_0\in\Omega$ with the initial condition $Q(0)=Q_0$, there exists a unique continuous solution $Q(t)\in\Omega$ for $t\in[t_0,+\infty)$.
Proof.
Note that $\nabla f(U)=CU+D^T$ is Lipschitz continuous and that the projection $P_\Omega$ is nonexpansive. Thus, for any $Q_1,Q_2\in\Omega$, we obtain
\[
\begin{aligned}
\|R(Q_1)-R(Q_2)\| &\leqslant \|X_1-X_2\|+\lambda\|X_1-X_2\|+\|U_1-U_2\|+\big\|P_\Omega\big(U_1-\nabla f(U_1)\big)-P_\Omega\big(U_2-\nabla f(U_2)\big)\big\|\\
&\leqslant (2+\lambda)\|Q_1-Q_2\|+\|I-C\|\|U_1-U_2\|\\
&\leqslant \big(2+\lambda+\|I-C\|\big)\|Q_1-Q_2\|, \tag{21}
\end{aligned}
\]
where $I$ is the identity matrix. Therefore, $R(Q)$ is Lipschitz continuous on $\Omega$, and a unique solution $Q(t)$ with initial condition $Q(t_0)$ exists by the local existence theorem for ordinary differential equations.
Theorem 2.
Define $N(U)=U-P_\Omega\big(U-\nabla f(U)\big)$ and suppose the following two conditions hold.
(1) For any $U_1,U_2\in\mathbb{R}^{nk}$,
\[ \big\langle N(U_1)-N(U_2),\,U_1-U_2\big\rangle \geqslant \frac{\cos\theta}{1+\|I-C\|}\,\|N(U_1)-N(U_2)\|^2, \tag{22} \]
where $\theta$ is the angle between $N(U_1)-N(U_2)$ and $U_1-U_2$.
(2) $\lambda>\dfrac{1+\|I-C\|}{\cos\theta}$.
Then the solution of model (20) converges to the optimal solution set of (18).
Proof.
Consider $U^*\in\Omega$ and the Lyapunov function $V(t)=\frac{1}{2}\|U(t)-U^*\|^2=\frac{1}{2}(U(t)-U^*)^T(U(t)-U^*)$. We obtain
\[
\begin{aligned}
\ddot V(t)+\lambda\dot V(t)&=X^TX+(U-U^*)^T\dot X+\lambda(U(t)-U^*)^TX\\
&=\|X\|^2+\big\langle U(t)-U^*,\,\dot X+\lambda X\big\rangle\\
&=\|X\|^2+\big\langle U(t)-U^*,\,P_\Omega\big(U-\nabla f(U)\big)-U\big\rangle\\
&=\|X\|^2+\big\langle U(t)-U^*,\,-N(U)\big\rangle. \tag{23}
\end{aligned}
\]
Since $N(U^*)=0$ and Condition (1) holds, (23) can be rewritten as
\[
\begin{aligned}
\|X\|^2&=\ddot V(t)+\lambda\dot V(t)+\big\langle U(t)-U^*,\,N(U)\big\rangle=\ddot V(t)+\lambda\dot V(t)+\big\langle U(t)-U^*,\,N(U)-N(U^*)\big\rangle\\
&\geqslant \ddot V(t)+\lambda\dot V(t)+\frac{\cos\theta}{1+\|I-C\|}\|N(U)\|^2=\ddot V(t)+\lambda\dot V(t)+\frac{\cos\theta}{1+\|I-C\|}\|\dot X+\lambda X\|^2\\
&=\ddot V(t)+\lambda\dot V(t)+\frac{\cos\theta}{1+\|I-C\|}\|\dot X\|^2+\frac{\lambda\cos\theta}{1+\|I-C\|}\frac{d\|X\|^2}{dt}+\frac{\lambda^2\cos\theta}{1+\|I-C\|}\|X\|^2. \tag{24}
\end{aligned}
\]
Then (24) can be transformed into
\[ \ddot V(t)+\lambda\dot V(t)+\frac{\cos\theta}{1+\|I-C\|}\|\dot X\|^2+\frac{\lambda\cos\theta}{1+\|I-C\|}\frac{d\|X\|^2}{dt}+\Big(\frac{\lambda^2\cos\theta}{1+\|I-C\|}-1\Big)\|X\|^2\leqslant 0, \tag{25} \]
which indicates that the function
\[ Z(t)=\dot V(t)+\lambda V(t)+\frac{\lambda\cos\theta}{1+\|I-C\|}\|X\|^2+\frac{\cos\theta}{1+\|I-C\|}\int_0^t\|\dot X(s)\|^2\,ds+\Big(\frac{\lambda^2\cos\theta}{1+\|I-C\|}-1\Big)\int_0^t\|X(s)\|^2\,ds \]
is monotone nonincreasing. Thus, for any $t>0$,
\[ Z(t)\leqslant Z(0)=\dot V(0)+\lambda V(0)+\frac{\lambda\cos\theta}{1+\|I-C\|}\|X(0)\|^2=(U(0)-U^*)^TX(0)+\frac{\lambda}{2}\|U(0)-U^*\|^2+\frac{\lambda\cos\theta}{1+\|I-C\|}\|X(0)\|^2. \tag{26} \]
Since $\lambda>(1+\|I-C\|)/\cos\theta$, we have $\lambda^2\cos\theta/(1+\|I-C\|)-1>0$. Furthermore,
\[ \dot V(t)+\lambda V(t)\leqslant Z(0). \tag{27} \]
Multiplying inequality (27) by $e^{\lambda t}$ gives $\dot V(t)e^{\lambda t}+\lambda V(t)e^{\lambda t}\leqslant Z(0)e^{\lambda t}$, which implies that
\[ \frac{d\big(V(t)e^{\lambda t}\big)}{dt}\leqslant Z(0)e^{\lambda t}. \tag{28} \]
Integrating (28) from $0$ to $t$, we obtain $V(t)\leqslant\big(V(0)-\frac{1}{\lambda}Z(0)\big)e^{-\lambda t}+\frac{1}{\lambda}Z(0)$. Therefore, the trajectory of model (20) is bounded.
Since $V(t)$ is bounded and inequality (26) holds, we obtain
\[ \dot V(t)+\frac{\lambda\cos\theta}{1+\|I-C\|}\|X(t)\|^2\leqslant Z(0). \tag{29} \]
Thus, (29) can be rewritten as
\[ \big\langle U(t)-U^*,\,X(t)\big\rangle+\frac{\lambda\cos\theta}{1+\|I-C\|}\|X(t)\|^2\leqslant Z(0). \tag{30} \]
Since $U(t)-U^*$ is bounded, (30) indicates that $\|X(t)\|^2$ is also bounded. From (25) and (30), one obtains $\int_0^{+\infty}\|X(s)\|^2\,ds<+\infty$ and $\int_0^{+\infty}\|\dot X(s)\|^2\,ds<+\infty$.
Since $\int_0^{+\infty}\|\dot X(s)\|^2\,ds<+\infty$, there exists $P>0$ such that $\|\dot X(t)\|<P$ for any $t\in(0,+\infty)$. Therefore, one obtains
\[ \int_0^{+\infty}\|X(s)\|^2\|\dot X(s)\|\,ds\leqslant P\int_0^{+\infty}\|X(s)\|^2\,ds<+\infty, \tag{31} \]
and it is easy to obtain that
\[ \frac{1}{3}\Big|\lim_{t\to+\infty}\|X(t)\|^3-\|X(0)\|^3\Big|\leqslant\int_0^{+\infty}\|X(s)\|^2\|\dot X(s)\|\,ds<+\infty. \tag{32} \]
According to (32) and the convergence of the integral, $\lim_{t\to+\infty}\|X(t)\|^3$ exists, and therefore $\lim_{t\to+\infty}\|X(t)\|$ also exists. Since $\int_0^{+\infty}\|X(s)\|^2\,ds<+\infty$, we conclude that $\lim_{t\to+\infty}X(t)=0$.
It follows from (20) that
\[ \lim_{t\to+\infty}\Big[\dot X+U-P_\Omega\big(U-\nabla f(U)\big)\Big]=0. \tag{33} \]
If $\lim_{t\to+\infty}\dot X(t)=0$, then
\[ \lim_{t\to+\infty}\Big[U-P_\Omega\big(U-\nabla f(U)\big)\Big]=0, \tag{34} \]
which implies that $\lim_{t\to+\infty}U(t)=U^*$. In the following, we prove $\lim_{t\to+\infty}\dot X(t)=0$.
Define $r_\varepsilon(t)=\frac{1}{\varepsilon}\big(X(t+\varepsilon)-X(t)\big)$. Substituting $r_\varepsilon(t)$ into (20), one obtains
\[ \dot r_\varepsilon(t)+\lambda r_\varepsilon(t)=-\frac{1}{\varepsilon}N\big(U(t+\varepsilon)\big)+\frac{1}{\varepsilon}N\big(U(t)\big). \tag{35} \]
Since Condition (1) holds, we have
\[ \|N(U(t+\varepsilon))-N(U(t))\|^2\leqslant\frac{1+\|I-C\|}{\cos\theta}\,\varepsilon\,\|N(U(t+\varepsilon))-N(U(t))\|\,\Big\|\frac{U(t+\varepsilon)-U(t)}{\varepsilon}\Big\|. \tag{36} \]
Hence, noting that $\|U(t+\varepsilon)-U(t)\|\leqslant\varepsilon\sup_{s\in[t,+\infty)}\|X(s)\|$, we get
\[ \|N(U(t+\varepsilon))-N(U(t))\|\leqslant\varepsilon\,\frac{1+\|I-C\|}{\cos\theta}\sup_{s\in[t,+\infty)}\|X(s)\|. \tag{37} \]
Integrating (35), it follows that $\lim_{t\to+\infty}\sup\|r_\varepsilon(t)\|=0$. Since $\|\dot X(t)\|\leqslant\sup\|r_\varepsilon(t)\|$, we obtain $\lim_{t\to+\infty}\dot X(t)=0$. Thus, the solution of system (20) converges to the optimal solution set of (18). The proof is completed.
Remark 3.
In Theorem 2, Condition (1) must be satisfied. In the following, we discuss the existence of the parameter $\cos\theta/(1+\|I-C\|)$. For any $U_1$ and $U_2$, we have
\[
\begin{aligned}
\|N(U_1)-N(U_2)\|&=\big\|U_1-P_\Omega\big(U_1-\nabla f(U_1)\big)-U_2+P_\Omega\big(U_2-\nabla f(U_2)\big)\big\|\\
&\leqslant\|U_1-U_2\|+\big\|P_\Omega\big(U_1-\nabla f(U_1)\big)-P_\Omega\big(U_2-\nabla f(U_2)\big)\big\|\\
&\leqslant\|U_1-U_2\|+\|I-C\|\|U_1-U_2\|\\
&\leqslant\big(1+\|I-C\|\big)\|U_1-U_2\|. \tag{38}
\end{aligned}
\]
Therefore, $N(U)$ is also Lipschitz continuous, and
\[ \|N(U_1)-N(U_2)\|^2\leqslant\big(1+\|I-C\|\big)\|N(U_1)-N(U_2)\|\|U_1-U_2\|. \tag{39} \]
Thus, we get
\[ \|N(U_1)-N(U_2)\|^2\leqslant\frac{1+\|I-C\|}{\cos\theta}\big\langle N(U_1)-N(U_2),\,U_1-U_2\big\rangle, \tag{40} \]
where $\theta$ is the angle between $N(U_1)-N(U_2)$ and $U_1-U_2$. It is easy to obtain
\[ \big\langle N(U_1)-N(U_2),\,U_1-U_2\big\rangle\geqslant\frac{\cos\theta}{1+\|I-C\|}\|N(U_1)-N(U_2)\|^2. \tag{41} \]
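As a sanity check on the dynamics (20) (this is not the authors' Algorithm 2), the inertial system can be simulated by a forward Euler discretization on a made-up instance of the CQP (18); the values of $\lambda$ and the Euler step $h$ below are assumed for illustration.

```python
import numpy as np

# Forward Euler simulation of (20):
#   dU/dt = X,   dX/dt = -lambda*X + P_Omega(U - grad f(U)) - U,
# on an invented instance of (18) with Omega = {U >= V}.
rng = np.random.default_rng(3)
nk = 6
M = rng.random((nk, nk))
C = M @ M.T + nk * np.eye(nk)          # symmetric positive definite
D = rng.random(nk)
V = rng.random(nk)                     # lower bound defining Omega

proj = lambda z: np.maximum(z, V)      # P_Omega for Omega = {U >= V}
grad = lambda u: C @ u + D             # grad f(U) = CU + D^T

lam, h = 20.0, 0.05                    # assumed inertial term and step size
U, X = V.copy(), np.zeros(nk)
for _ in range(5000):
    U, X = U + h * X, X + h * (-lam * X + proj(U - grad(U)) - U)

# At an equilibrium, N(U) = U - P_Omega(U - grad f(U)) = 0.
residual = np.linalg.norm(U - proj(U - grad(U)))
print(residual < 1e-4)
```

The damping term $-\lambda X$ is what makes the trajectory settle at a point where $N(U)=0$, consistent with Theorem 2's requirement that $\lambda$ be sufficiently large.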
3.2. Algorithms
Based on the above analysis, we summarize Algorithms 1, 2, and 3 to optimize GSNMF. First, the parameters $D$, $E$, $F$, and $G$ are derived by Algorithm 1. Second, Algorithm 2 applies IPNN to optimize the CQP problem. Third, the optimization problem (9) is divided into two CQP problems, which are solved alternately by Algorithm 3. In the following, we analyse the time complexity of our proposed algorithms. The main cost is spent on the calculation of the gradient $\nabla f$ in Algorithm 2. To optimize subproblem (10), the operation in Algorithm 2 is the matrix product $EU-UF+UG$, which takes $O(nk^2+n^2k)$. With the cost $O(mnk)$ for calculating $D^T$, the complexity of using Algorithm 2 to optimize subproblem (10) is
\[ O(mnk)+K_1\times O(nk^2+n^2k). \tag{42} \]
Similarly, the complexity of using Algorithm 2 to optimize subproblem (11) is
\[ O(mnk)+K_1\times O(mk^2). \tag{43} \]
Hence, the overall cost to solve GSNMF is
\[ O(mnk)+K_1\times O(mk^2+nk^2+kn^2). \tag{44} \]
We summarize the time complexity of one iteration round of GSNMF, GNMF, and NMF in Table 1. At each iteration, there are two $O(mnk)$ operations, the same as NMF and GNMF.
Table 1: Time complexity of GSNMF, GNMF, and NMF.

Solver   Time complexity
GSNMF    O(mnk) + K1 × O(mk^2 + nk^2 + kn^2)
GNMF     O(mnk + mk^2 + nk^2 + kn^2)
NMF      O(mnk + mk^2 + nk^2)
Algorithm 1: Parameters.
Input: Y, A, V, W, α, β
Output: D, E, F, G
(1) Calculate the number of rows k and columns n in the matrix V
(2) Calculate E with E←ATA
(3) Calculate F with F←αW
(4) Calculate the diagonal matrix G with Gii←α∑j=1nWij
(5) Calculate D with D ← β1_{n×k} − (Y^T A + V^T A^T A) − 2α(GV^T − FV^T)
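A direct NumPy transcription of Algorithm 1 may read as follows. This is a sketch: the shapes follow the text (Y is m × n, A is m × k, V is k × n, W is n × n), and because α is already folded into F and G in steps (3)-(4), the factor written as 2α in step (5) is applied here as 2, which is an interpretation rather than a verified detail.

```python
import numpy as np

def gsnmf_parameters(Y, A, V, W, alpha, beta):
    """Sketch of Algorithm 1: compute D, E, F, G from Y, A, V, W, alpha, beta."""
    k, n = V.shape                               # step (1)
    E = A.T @ A                                  # step (2)
    F = alpha * W                                # step (3)
    G = np.diag(alpha * W.sum(axis=1))           # step (4)
    # Step (5); alpha is already inside F and G, so the factor is 2 here.
    D = beta * np.ones((n, k)) - (Y.T @ A + V.T @ (A.T @ A)) \
        - 2 * (G @ V.T - F @ V.T)
    return D, E, F, G
```

The output shapes (D is n × k, E is k × k, F and G are n × n with G diagonal) match the matrix forms used in the complexity analysis above.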
4. Numerical Examples

In this section, we exhibit the global searching ability of GSNMF. By adjusting the inertial term λ in the neural network, different local optimal solutions can be searched. Let m=8, n=8, k=5, 1⩽λ⩽10, α=0, β=0, K1=5, K2=500, and h=0.2. To ensure the validity of this experiment, we provide the initial Y, A, and S in Tables 2, 3, and 4, respectively. Table 5 shows the comparison between GSNMF and NMF. To investigate whether GSNMF can converge, the convergence curve is depicted in Figure 1 with λ=1.
Table 2: The initial matrix Y (8 × 8).

 4.345   1.79    2.92    0.852   6.036   6.929   2.803   2.445
 0.513   1.164   8.54    8.645   0.879  12.309   4.731   1.326
 4.277   3.665   5.619   2.1     6.048   6.164   1.077   4.167
 5.658   5.788   0.887   4.599   7.653   2.342   7.534   7.104
 2.282   0.3    12.657   3.248   4.33    1.508   1.819   1.26
 1.856   7.669   2.113   4.084   7.616   0.287   2.865   1.329
 0.589   6.886   2.388   0.446   7.759   2.456   0.486   7.189
 0.214  16.156   3.684   1.973   0.25    3.841   2.825   1.482
Table 3: The initial matrix A (8 × 5).

0.797  0.394  0.001  1.029  0.182
0.3    0.633  0.28   1.14   0.727
0.81   0.078  1.589  1.764  1.152
0.062  0.408  0.265  0.78   0.716
0.073  0.389  0.005  1.768  0.166
1.921  1.45   1.474  0.673  0.016
0.346  1.896  0.284  1.135  0.087
1.849  0.058  0.445  1.574  1.745
Table 4: The initial matrix S (5 × 8).

0.071  0.242  0.413  1.101  1.345  0.967  0.923  0.533
1.319  1.305  0.329  0.99   0.309  1.173  0.734  1.309
1.377  0.796  0.913  0.377  0.504  1.322  0.378  0.682
0.403  0.608  0.224  0.163  0.612  0.162  0.724  0.849
0.929  1.402  0.773  1.727  0.039  0.187  0.446  0.132
Table 5: Comparison between GSNMF using different λ and NMF.

Solver        Objective value
GSNMF, λ=1    20.5172
GSNMF, λ=3    20.7534
GSNMF, λ=6    21.6804
NMF           20.5172
Figure 1: Iteration number versus objective value for GSNMF when λ=1.
5. Application in Image Clustering
5.1. Databases
To examine the clustering performance of GSNMF, we present the experiment in two databases including IRIS and COIL20. Their details are presented in the following (see also Table 6).
Table 6: Two data sets.

Dataset   Size (n)   Dimensionality (m)   Classes (k)
IRIS      150        4                    3
COIL20    1440       256                  20
(1) IRIS. It includes 150 instances with 4 features each. There are 3 classes, Versicolour, Setosa, and Virginica, with 50 instances per class.
(2) COIL20. This data set is an image library containing 1440 instances of 16 × 16 grayscale images (256 features). There are 20 different classes, and each class contains 72 instances.
5.2. Compared Methods
We present the clustering performance on the two databases using GSNMF, GNMF, and NMF. Two metrics, accuracy and normalized mutual information [33], are used to evaluate the clustering performance. To reveal the effect of the sparse constraint, different cardinalities (numbers of zero entries) of the initial S are considered. Because NMF is a nonconvex problem, different initial A and S may lead to different local optimal solutions; for a fair comparison, we try 10 different initial A and S and report the average results. We compare the methods in two cases. First, A and S are randomly generated between 0 and 1. Second, A and S are randomly generated between −1 and 1.
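For reference, hedged sketches of the two evaluation metrics are shown below: a brute-force permutation-matched clustering accuracy and a normalized mutual information normalized by the geometric mean of the label entropies. The toy label vectors are made up, and the permutation search assumes a small, equal number of predicted and true classes.

```python
import numpy as np
from itertools import permutations

def accuracy(true, pred):
    """Best accuracy over all matchings of predicted labels to true labels
    (brute force; fine for a handful of classes of equal count)."""
    pred_classes, true_classes = np.unique(pred), np.unique(true)
    best = 0.0
    for perm in permutations(true_classes):
        mapping = dict(zip(pred_classes, perm))
        mapped = np.array([mapping[p] for p in pred])
        best = max(best, float(np.mean(mapped == true)))
    return best

def nmi(true, pred):
    """Mutual information normalized by sqrt(H(true) * H(pred))."""
    n = len(true)
    def entropy(x):
        p = np.bincount(x) / n
        p = p[p > 0]
        return -(p * np.log(p)).sum()
    mi = 0.0
    for a in np.unique(true):
        for b in np.unique(pred):
            pab = np.mean((true == a) & (pred == b))
            if pab > 0:
                pa, pb = np.mean(true == a), np.mean(pred == b)
                mi += pab * np.log(pab / (pa * pb))
    return mi / (np.sqrt(entropy(true) * entropy(pred)) + 1e-12)

true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
pred = np.array([1, 1, 1, 0, 0, 2, 2, 2, 2])   # one point misclustered
print(round(accuracy(true, pred), 3))          # -> 0.889
```

In practice the optimal label matching is usually computed with the Hungarian algorithm instead of brute-force permutations, but the value is the same.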
5.3. Compared Results
Tables 7 and 8 present the clustering results on IRIS in two cases. The parameters for IRIS are h=0.02, K1=3, K2=200, α=1, β=1e-4, and 0≤λ≤10. The clustering performance on COIL20 is shown in Tables 9 and 10 in two cases. The parameters for COIL20 are h=0.02, K1=20, K2=200, α=1, β=1e-4, and 0≤λ≤10. These tables reveal some interesting points:
When the sparse constraint is imposed on GNMF, the clustering performance of GSNMF is better than NMF and GNMF.
When the initial A and S have some negative entries, NMF and GNMF fail to cluster. However, GSNMF is not affected in this case.
Table 7: Clustering performance on IRIS in case 1.

                 Accuracy (%)             Normalized mutual information (%)
Cardinality (%)  GSNMF   GNMF    NMF      GSNMF   GNMF    NMF
0                98.27   91.60   83.87    93.09   80.66   70.58
10               98.47   77.93   69.47    93.97   55.70   48.48
20               98.27   71.93   68.67    93.09   42.23   38.86
30               98.33   61.93   62.87    93.39   28.90   28.82
40               98.27   56.93   55.80    93.15   18.89   17.59
50               98.33   53.40   53.27    93.39   13.35   12.96
Table 8: Clustering performance on IRIS in case 2.

                 Accuracy (%)             Normalized mutual information (%)
Cardinality (%)  GSNMF   GNMF    NMF      GSNMF   GNMF    NMF
0                98.60   34.00   34.00    94.56   1.035   1.035
10               98.13   34.00   34.00    93.39   1.035   1.035
20               98.20   34.00   34.00    93.02   1.035   1.035
30               98.00   34.00   34.00    92.27   1.035   1.035
40               98.13   34.00   34.00    92.92   1.035   1.035
50               98.00   34.00   34.00    92.33   1.035   1.035
Table 9: Clustering performance on COIL20 in case 1.

                 Accuracy (%)             Normalized mutual information (%)
Cardinality (%)  GSNMF   GNMF    NMF      GSNMF   GNMF    NMF
0                68.92   72.92   64.54    77.52   84.88   74.18
10               68.24   63.37   57.63    77.05   71.78   65.49
20               68.15   53.40   48.39    77.31   61.90   56.26
30               68.38   45.81   43.91    77.39   54.34   51.60
40               68.06   41.03   38.28    77.31   49.45   46.26
50               68.67   35.25   32.64    77.61   42.92   41.42
Table 10: Clustering performance on COIL20 in case 2.

                 Accuracy (%)             Normalized mutual information (%)
Cardinality (%)  GSNMF   GNMF    NMF      GSNMF   GNMF    NMF
0                68.22   5.07    5.07     77.05   1.38    1.035
10               68.16   5.07    5.07     77.36   1.38    1.035
20               68.78   5.07    5.07     77.11   1.38    1.035
30               68.12   5.07    5.07     77.08   1.38    1.035
40               68.38   5.07    5.07     77.33   1.38    1.035
50               68.25   5.07    5.07     77.67   1.38    1.035
5.4. Parameters Selection
Our GSNMF has one essential parameter: the inertial term λ. Figures 2 and 3 depict the average performance of GSNMF with different λ.
Figure 2: The performance of GSNMF versus different λ on the IRIS data set (panels: accuracy; normalized mutual information).
Figure 3: The performance of GSNMF versus different λ on the COIL20 data set (panels: accuracy; normalized mutual information).
5.5. Convergence Study
According to the BCD and IPNN theory, the method for optimizing GSNMF is provably convergent. Here we investigate whether it converges to a stationary point in practice. Figure 4 depicts the convergence curves of GSNMF on the two data sets; in each panel, the x-axis is the iteration number and the y-axis is the objective value.
Figure 4: Iteration number versus objective value (100 : 1 scale) for GSNMF when λ=5 on the IRIS and COIL20 data sets (panels: IRIS; COIL20).
6. Conclusion and Future Work
We propose a dimensionality reduction method that can be solved by an inertial projection neural network. The experiments demonstrate three advantages. First, different local solutions can be reached with different inertial terms λ. Second, the clustering performance is not affected by negative initial values, whereas GNMF and NMF perform poorly in clustering with negative initial values. Third, if the initial values are sparse, our proposed method outperforms GNMF and NMF in clustering.
Several topics remain to be discussed in our future work:
The parameter λ drives the search for the global optimal solution of GSNMF, so a suitable value of λ is critical to our algorithm. It remains unclear how to select λ theoretically.
h is the step length that determines the convergence rate in Algorithm 2. If it is assigned a small value, slow convergence leads to poor clustering performance. Thus, an adaptive step length should be considered.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
References
[1] S. Agarwal, A. Awan, and D. Roth, "Learning to detect objects in images via a sparse, part-based representation."
[2] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman, "Indexing by latent semantic analysis."
[3] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization."
[4] D. Kalman, "A singularly valuable decomposition: the SVD of a matrix."
[5] I. T. Jolliffe, Principal Component Analysis.
[6] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression.
[7] A. Cichocki, R. Zdunek, and S.-I. Amari, "New algorithms for non-negative matrix factorization in applications to blind source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '06), vol. 5, Toulouse, France, May 2006, pp. V621–V624.
[8] V. P. Pauca, F. Shahnaz, M. W. Berry, and R. J. Plemmons, "Text mining using non-negative matrix factorizations," in Proceedings of the 4th SIAM International Conference on Data Mining, 2004, pp. 452–456.
[9] W. Xu, X. Liu, and Y. Gong, "Document clustering based on non-negative matrix factorization," in Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '03), Toronto, Canada, August 2003, pp. 267–273.
[10] P. O. Hoyer, "Non-negative sparse coding," in Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing, 2002, pp. 557–565.
[11] D. Cai, X. He, J. Han, and T. S. Huang, "Graph regularized nonnegative matrix factorization for data representation."
[12] W.-S. Chen, Y. Zhao, B. Pan, and B. Chen, "Supervised kernel nonnegative matrix factorization for face recognition."
[13] H. Liu, Z. Wu, X. Li, D. Cai, and T. S. Huang, "Constrained nonnegative matrix factorization for image representation."
[14] J. J. Hopfield and D. W. Tank, "Neural computation of decisions in optimization problems."
[15] Y. Xia, H. Leung, and J. Wang, "A projection neural network and its application to constrained optimization problems."
[16] X. He, T. Huang, J. Yu, and C. Li, "An inertial projection neural network for solving variational inequalities."
[17] H. Che, C. Li, X. He, and T. Huang, "An intelligent method of swarm neural networks for equalities-constrained nonconvex optimization."
[18] J. Wang, "Analysis and design of a recurrent neural network for linear programming."
[19] X. Hu and B. Zhang, "An alternative recurrent neural network for solving variational inequalities and related optimization problems."
[20] X. Hu and J. Wang, "A recurrent neural network for solving a class of general variational inequalities."
[21] X. Hu and J. Wang, "Design of general projection neural networks for solving monotone linear variational inequalities and linear and quadratic optimization problems."
[22] Z. Yan, J. Wang, and G. Li, "A collective neurodynamic optimization approach to bound-constrained nonconvex optimization."
[23] Q. Liu and J. Wang, "A one-layer recurrent neural network with a discontinuous hard-limiting activation function for quadratic programming."
[24] J. Fan and J. Wang, "A collective neurodynamic optimization approach to nonnegative matrix factorization."
[25] Z. Yan and J. Wang, "Nonlinear model predictive control based on collective neurodynamic optimization."
[26] J. Shi, X. Ren, G. Dai, J. Wang, and Z. Zhang, "A non-convex relaxation approach to sparse dictionary learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '11), June 2011, pp. 1809–1816.
[27] G. Li, Z. Yan, and J. Wang, "A one-layer recurrent neural network for constrained nonconvex optimization."
[28] H. Lee, A. Battle, R. Raina, and A. Ng, "Efficient sparse coding algorithms," in Proceedings of the International Conference on Neural Information Processing Systems, 2006, pp. 801–808.
[29] D. Cai, H. Bao, and X. He, "Sparse concept coding for visual analysis," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), June 2011, pp. 2905–2910.
[30] B. A. Olshausen and D. J. Field, "Emergence of simple-cell receptive field properties by learning a sparse code for natural images."
[31] P. Tseng, "Convergence of a block coordinate descent method for nondifferentiable minimization."
[32] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, "Gradient projection for sparse reconstruction: application to compressed sensing and other inverse problems."
[33] D. Cai, X. He, and J. Han, "Document clustering using locality preserving indexing."