
Recently, a considerable growth of interest in projected gradient (PG) methods has been observed, owing to their high efficiency in solving large-scale convex minimization problems subject to linear constraints. Since the minimization problems underlying nonnegative matrix factorization (NMF) of large matrices match this class well, we investigate and test some recent PG methods in the context of their applicability to NMF. In particular, the paper focuses on the following modified methods: projected Landweber, Barzilai-Borwein gradient projection, projected sequential subspace optimization (PSESOP), interior-point Newton (IPN), and sequential coordinate-wise. The proposed and implemented NMF PG algorithms are compared with respect to their performance in terms of signal-to-interference ratio (SIR) and elapsed time, using a simple benchmark of mixed partially dependent nonnegative signals.

Nonnegative
matrix factorization (NMF) finds such nonnegative factors (matrices)

This method has found a variety of real-world
applications in areas such as blind separation of images and nonnegative
signals [

Depending on the application, the estimated factors may
have different interpretations. For example, Lee and Seung [

Our objective is to estimate the mixing matrix

Obviously, NMF is not unique in the general case; it
is characterized by scale and permutation indeterminacies. These problems
have been addressed recently by many researchers [

The noise distribution is strongly
application-dependent; however, in many BSS applications, Gaussian noise is
expected. Here our considerations are restricted to this case, although
alternative NMF algorithms optimized for different noise distributions
exist and can be found, for example, in [

NMF was proposed by Paatero and Tapper [

In this paper, we extend the approach to NMF that we
initiated in [

The main objective of this paper is to develop,
extend, and/or modify some of the most promising PG algorithms for the standard
NMF problem and to find optimal conditions or parameters for this class of
NMF algorithms. The second objective is to compare the performance and
complexity of these algorithms on NMF problems, and to identify
the most efficient and promising ones. We would like to emphasize that
most of the discussed algorithms have been neither implemented nor tested
on NMF problems before; rather, they have been
considered for solving standard systems of algebraic equations:

In Section

In contrast to the multiplicative algorithms, the
class of PG algorithms has additive updates. The algorithms discussed here
approximately solve nonnegative least squares (NNLS) problems with the basic
alternating minimization technique that is used in NMF:
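To make the alternating scheme concrete, the following Python sketch (illustrative only; function names, step sizes, and iteration counts are our assumptions, not the exact implementation discussed in this paper) alternates approximate NNLS updates: X is updated with A fixed, then A with X fixed, each by a few projected-gradient steps.

```python
import numpy as np

def pg_nnls(B, C, X, n_inner=10):
    """Approximately solve min_{X >= 0} ||C - B*X||_F^2 by projected
    gradient descent (a generic stand-in for the PG solvers in the text)."""
    G = B.T @ B
    F = B.T @ C
    eta = 1.0 / np.linalg.norm(G, 2)        # step size below 2/lambda_max
    for _ in range(n_inner):
        X = np.maximum(0.0, X - eta * (G @ X - F))
    return X

def nmf_alternating(Y, J, n_outer=50, seed=0):
    """Basic alternating minimization for NMF: Y (I x T) ~ A (I x J) X (J x T)."""
    rng = np.random.default_rng(seed)
    I, T = Y.shape
    A = rng.random((I, J))
    X = rng.random((J, T))
    for _ in range(n_outer):
        X = pg_nnls(A, Y, X)                # update X with A fixed
        A = pg_nnls(X.T, Y.T, A.T).T        # update A with X fixed
    return A, X
```

Each subproblem is a convex NNLS problem, so any of the PG methods discussed below can replace the inner solver.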

The solution

Similarly, the KKT conditions for the solution

There are many approaches to solve the problems (

The projection

The Landweber method [
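In its projected form (a minimal Python sketch; the step-size choice here is one common safe option, not necessarily the exact variant used in this paper), the Landweber update X ← X − η Aᵀ(AX − Y) is followed by projection onto the nonnegative orthant, and converges for 0 < η < 2/λ_max(AᵀA):

```python
import numpy as np

def projected_landweber(A, Y, X, n_iter=200):
    """Projected Landweber iteration for min_{X >= 0} ||Y - A*X||_F^2.
    The step size must satisfy 0 < eta < 2/lambda_max(A^T A) to converge."""
    AtA = A.T @ A
    AtY = A.T @ Y
    eta = 1.0 / np.linalg.norm(AtA, 2)   # safely inside (0, 2/lambda_max)
    for _ in range(n_iter):
        X = np.maximum(0.0, X - eta * (AtA @ X - AtY))
    return X
```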

One of the fundamental PG algorithms for NMF was
proposed by Lin in [

For computation of

The Matlab implementation of the Lin-PG algorithm is
given in [

The Barzilai-Borwein gradient projection method
[

Since

The Matlab implementation of the GPSR-BB algorithm,
which solves the system

The projected
sequential subspace optimization (PSESOP) method [

% Barzilai-Borwein gradient projection (GPSR-BB) algorithm

%

function [X] = nmf_gpsr_bb(A,Y,X,no_iter)

%

% [X] = nmf_gpsr_bb(A,Y,X,no_iter) finds such matrix X that solves

% the equation AX = Y subject to nonnegativity constraints.

%

% INPUTS:

% A - system matrix of dimension [I by J]

% Y - matrix of observations [I by T]

% X - matrix of initial guess [J by T]

% no_iter - maximum number of iterations

%

% OUTPUTS:

% X - matrix of estimated sources [J by T]

%

% #########################################################################

% Parameters

alpha_min = 1E-8; alpha_max = 1;

alpha = .1*ones(1,size(Y,2));

B = A'*A; Yt = A'*Y;

% Main loop (the loop body is a standard Barzilai-Borwein
% projected-gradient update with columnwise step lengths)

for k=1:no_iter

    G = B*X - Yt; % gradient of .5*||Y - A*X||_F^2

    Xp = max(X - repmat(alpha,size(X,1),1).*G, 0); % projected gradient step

    D = Xp - X; X = Xp;

    % BB step lengths per column, clipped to [alpha_min, alpha_max]

    alpha = sum(D.*D,1)./(sum(D.*(B*D),1) + eps);

    alpha = min(alpha_max, max(alpha_min, alpha));

end

The parameter

The line search vector

The interior point Newton (IPN) algorithm [

If the solution is degenerate, that is,

The transformation of the normal matrix

Since our cost function is quadratic, its minimization
in a single step is performed by combining the projected Newton step with the
constrained scaled Cauchy step, which is given in the form

Using the constrained scaled Cauchy step leads
to the following updates:

The Matlab code of the IPN algorithm, which solves the
system

% Interior Point Newton (IPN) algorithm function

%

function [x] = nmf_ipn(A,y,x,no_iter)

%

% [x]=nmf_ipn(A,y,x,no_iter) finds such x that solves the equation Ax = y

% subject to nonnegativity constraints.

%

% INPUTS:

% A - system matrix of dimension [I by J]

% y - vector of observations [I by 1]

% x - vector of initial guess [J by 1]

% no_iter - maximum number of iterations

%

% OUTPUTS:

% x - vector of estimated sources [J by 1]

%

% #########################################################################

% Parameters

s = 1.8; theta = 0.5; rho = .1; beta = 1;

H = A'*A; yt = A'*y; J = size(x,1);

% Main loop

for k=1:no_iter

g = H*x - yt; d = ones(J,1); d(g >= 0) = x(g >= 0);

ek = zeros(J,1); ek(g >= 0 & g < x.^s) = g(g >= 0 & g < x.^s);

M = H + diag(ek./d);

dg = d.*g;

tau1 = (g'*dg)/(dg'*M*dg); tau_2vec = x./dg;

tau2 = theta*min(tau_2vec(dg > 0));

tau = tau1*ones(J,1); tau(x - tau1*dg <= 0) = tau2;

w = 1./(d + ek); sk = sqrt(w.*d); pc = - tau.*dg;

Z = repmat(sk,1,J).*M.*repmat(sk',J,1);

rt = -g./sk;

pt = Z\rt; % solve the scaled Newton system Z*pt = rt

p = pt.*sk;

phx = max(0, x + p) - x;

ph = max(rho, 1 - norm(phx))*phx;

Phi_pc = .5*pc'*M*pc + pc'*g; Phi_ph = .5*ph'*M*ph + ph'*g;

red_p = Phi_ph/Phi_pc; dp = pc - ph;

if red_p >= beta

    t = 0; % the projected Newton step ph alone gives sufficient reduction

else

    t = 1; % fall back to the constrained scaled Cauchy step pc

end

sk = t*dp + ph;

x = x + sk;

end % for k

The NNLS problem (

The sequential coordinate-wise algorithm (SCWA),
first proposed by Franc et al. [

Updating only a single variable

Finally, the SCWA can take the following updates:
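In code, these coordinate-wise updates amount to the following (a Python sketch of our illustrative variant; the closed-form clipping and rank-one gradient refresh are the essential ingredients):

```python
import numpy as np

def scwa_nnls(A, y, x, n_sweeps=100):
    """Sequential coordinate-wise NNLS for min_{x >= 0} ||y - A*x||^2:
    each variable is minimized exactly in closed form while the others
    are held fixed, and the gradient is refreshed by a rank-one update."""
    H = A.T @ A                    # normal matrix
    g = H @ x - A.T @ y            # gradient (up to a factor of 2)
    for _ in range(n_sweeps):
        for j in range(x.size):
            xj = max(0.0, x[j] - g[j] / H[j, j])  # closed-form coordinate min
            delta = xj - x[j]
            if delta != 0.0:
                g += delta * H[:, j]              # rank-one gradient update
                x[j] = xj
    return x
```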

All the proposed algorithms were implemented in our
NMFLAB and evaluated with numerical tests related to typical BSS problems.
We used a synthetic benchmark of 4 partially dependent nonnegative signals
(with only

Dataset: (a) original 4 source signals, (b) observed 8 mixed signals.

Because the number of variables in

In general, the FP-ALS algorithm solves the
least-squares problem
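A common form of this update can be sketched as follows (a sketch under our assumptions: the unconstrained least-squares solution via the pseudoinverse, projected onto the nonnegative orthant; the exact FP-ALS variant may differ in details):

```python
import numpy as np

def fp_als_step(A, Y):
    """One FP-ALS update for X in Y ~ A*X: solve the unconstrained
    least-squares problem X_LS = pinv(A)*Y, then project onto X >= 0."""
    return np.maximum(0.0, np.linalg.pinv(A) @ Y)
```

The projection makes this update a fixed-point heuristic rather than an exact NNLS solver, which is one reason it is typically fast.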

Although the cost function is convex with respect to
each set of variables separately, the alternating minimization problem as a
whole is nonconvex. Thus, most NMF algorithms may get stuck in local minima,
and hence the initialization plays a key role. In the performed tests, we
applied the multistart initialization described in [

The multilayer technique can be regarded as multistep
decomposition. In the first step, we perform the basic decomposition
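The cascade can be sketched as follows (Python; `nmf_once` is a placeholder for any single-stage NMF routine, and the layer count `L` is a user choice, both our illustrative assumptions):

```python
import numpy as np

def multilayer_nmf(Y, J, L, nmf_once):
    """Multistep decomposition: Y ~ A1*X1, then X1 ~ A2*X2, and so on.
    The effective mixing matrix is the product A = A1*A2*...*AL."""
    A_total, X = None, Y
    for _ in range(L):
        Ai, X = nmf_once(X, J)                     # factorize current residue
        A_total = Ai if A_total is None else A_total @ Ai
    return A_total, X
```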

There are many stopping criteria for terminating the
alternating steps. We stop the iterations if

The algorithms have been evaluated with the
signal-to-interference ratio (SIR) measures, calculated separately for each
source signal and each column of the mixing matrix. Since NMF suffers from
scale and permutation indeterminacies, the estimated components are
appropriately permuted and rescaled: first, the source and estimated signals
are normalized to uniform variance, and then the estimated signals are
permuted to match the order of the source signals. In NMFLAB [

We test the algorithms with the Monte Carlo (MC)
analysis, running each algorithm 100 times. Each run has been initialized with
the multistart procedure. The algorithms have been evaluated with the mean-SIR
values that are calculated as follows:
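The SIR computation described above can be sketched as follows (Python; the normalization and permutation-matching strategy here is a straightforward reading of the text, not NMFLAB's exact code):

```python
import numpy as np
from itertools import permutations

def mean_sir_db(X_true, X_est):
    """Mean SIR [dB] over sources: normalize all signals to unit variance,
    then pick the permutation of estimated rows that maximizes the mean of
    SIR_j = 10*log10(||x_j||^2 / ||x_j - xhat_j||^2)."""
    Xt = X_true / X_true.std(axis=1, keepdims=True)
    Xe = X_est / X_est.std(axis=1, keepdims=True)
    best = -np.inf
    for perm in permutations(range(Xt.shape[0])):
        sirs = [10.0 * np.log10(np.sum(Xt[j] ** 2)
                                / np.sum((Xt[j] - Xe[p]) ** 2))
                for j, p in enumerate(perm)]
        best = max(best, float(np.mean(sirs)))
    return best
```

Exhaustive permutation search is fine for the 4-source benchmark used here; larger problems would call for an assignment solver.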

Mean-SIRs [dB] obtained with 100 samples of Monte
Carlo analysis for the estimation of sources and columns of mixing matrix from
noise-free mixtures of signals in Figure

Algorithm | Mean-SIR_X: s1 | s2 | s3 | s4 | Mean-SIR_A: a1 | a2 | a3 | a4 | Time [s]
---|---|---|---|---|---|---|---|---|---
M-NMF (best) | 21 | 22.1 | 42.6 | 37.3 | 26.6 | 27.3 | 44.7 | 40.7 | 1.9
M-NMF (mean) | 13.1 | 13.8 | 26.7 | 23.1 | 14.7 | 15.2 | 28.9 | 27.6 | 
M-NMF (worst) | 5.5 | 5.7 | 5.3 | 6.3 | 5.8 | 6.5 | 5 | 5.5 | 
OPL (best) | 22.9 | 25.3 | 46.5 | 42 | 23.9 | 23.5 | 55.8 | 51 | 1.9
OPL (mean) | 14.7 | 14 | 25.5 | 27.2 | 15.3 | 14.8 | 23.9 | 25.4 | 
OPL (worst) | 4.8 | 4.8 | 4.8 | 5.0 | 4.6 | 4.6 | 4.6 | 4.8 | 
Lin-PG (best) | 36.3 | 23.6 | 78.6 | 103.7 | 34.2 | 33.3 | 78.5 | 92.8 | 8.8
Lin-PG (mean) | 19.7 | 18.3 | 40.9 | 61.2 | 18.5 | 18.2 | 38.4 | 55.4 | 
Lin-PG (worst) | 14.4 | 13.1 | 17.5 | 40.1 | 13.9 | 13.8 | 18.1 | 34.4 | 
GPSR-BB (best) | 18.2 | 22.7 | 7.3 | 113.8 | 22.8 | 54.3 | 9.4 | 108.1 | 2.4
GPSR-BB (mean) | 11.2 | 20.2 | 7 | 53.1 | 11 | 20.5 | 5.1 | 53.1 | 
GPSR-BB (worst) | 7.4 | 17.3 | 6.8 | 24.9 | 4.6 | 14.7 | 2 | 23 | 
PSESOP (best) | 21.2 | 22.6 | 71.1 | 132.2 | 23.4 | 55.5 | 56.5 | 137.2 | 5.4
PSESOP (mean) | 15.2 | 20 | 29.4 | 57.3 | 15.9 | 34.5 | 27.4 | 65.3 | 
PSESOP (worst) | 8.3 | 15.8 | 6.9 | 28.7 | 8.2 | 16.6 | 7.2 | 30.9 | 
IPG (best) | 20.6 | 22.2 | 52.1 | 84.3 | 35.7 | 28.6 | 54.2 | 81.4 | 2.7
IPG (mean) | 20.1 | 18.2 | 35.3 | 44.1 | 19.7 | 19.1 | 33.8 | 36.7 | 
IPG (worst) | 10.5 | 13.4 | 9.4 | 21.2 | 10.2 | 13.5 | 8.9 | 15.5 | 
IPN (best) | 20.8 | 22.6 | 59.9 | 65.8 | 53.5 | 52.4 | 68.6 | 67.2 | 14.2
IPN (mean) | 19.4 | 17.3 | 38.2 | 22.5 | 22.8 | 19.1 | 36.6 | 21 | 
IPN (worst) | 11.7 | 15.2 | 7.5 | 7.1 | 5.7 | 2 | 1.5 | 2 | 
RMRNSD (best) | 24.7 | 21.6 | 22.2 | 57.9 | 30.2 | 43.5 | 25.5 | 62.4 | 3.8
RMRNSD (mean) | 14.3 | 19.2 | 8.3 | 33.8 | 17 | 21.5 | 8.4 | 33.4 | 
RMRNSD (worst) | 5.5 | 15.9 | 3.6 | 8.4 | 4.7 | 13.8 | 1 | 3.9 | 
SCWA (best) | 12.1 | 20.4 | 10.6 | 24.5 | 6.3 | 25.6 | 11.9 | 34.4 | 2.5
SCWA (mean) | 11.2 | 16.3 | 9.3 | 20.9 | 5.3 | 18.6 | 9.4 | 21.7 | 
SCWA (worst) | 7.3 | 11.4 | 6.9 | 12.8 | 3.8 | 10 | 3.3 | 10.8 | 

For comparison, Table

The performance of the proposed NMF algorithms can be
inferred from the results given in Table

Notice that our NMF-PSESOP algorithm gives the best estimation (the sample with the highest best-SIR value), and its mean-SIR values are only slightly lower than those of the Lin-PG algorithm. Considering the elapsed time, the PL, GPSR-BB, SCWA, and IPG algorithms are among the fastest, while the Lin-PG and IPN algorithms are the slowest.

The multilayer technique generally improves the
performance and consistency of all the tested algorithms if the number of
observations is close to the number of nonnegative components. The highest
improvement can be observed for the NMF-PSESOP algorithm, especially when the
number of inner iterations is greater than one (typically,

In summary, the best and most promising NMF-PG algorithms are the NMF-PSESOP, GPSR-BB, and IPG algorithms. However, the final selection of the algorithm depends on the size of the problem to be solved. Nevertheless, in our tests the projected gradient NMF algorithms seem to be much better (in the sense of speed and performance) than the multiplicative algorithms, provided that we can use the squared Euclidean cost function, which is optimal for data with Gaussian noise.