In recent years, nonnegative matrix factorization (NMF) methods for reduced image data representation have attracted the attention of the computer vision community. These methods are regarded as a convenient part-based representation of image data for recognition tasks involving occluded objects. We propose a novel modification of the NMF recognition task that utilizes the matrix sparseness control introduced by Hoyer. We have analyzed the influence of sparseness on recognition rates (RRs) for various dimensions of subspaces generated for two image databases, the ORL face database and the USPS handwritten digit database. We have studied the behavior of four types of distances between a projected unknown image object and feature vectors in NMF subspaces generated for training data. One of these metrics is also a novel proposal of ours. In the recognition phase, partial occlusions in the test images have been modeled by putting two randomly sized, randomly positioned black rectangles into each test image.

Subspace methods represent a separate branch of high-dimensional data analysis, for example in computer vision and pattern recognition. In particular, these methods have found efficient applications in face identification and in the recognition of digits and characters. In general, they are characterized by learning a set of basis vectors from a set of suitable image templates. The subspace spanned by this vector basis captures the essential structure of the input data. Once the subspace has been found (offline phase), a new image is classified (online phase) by projecting it onto the subspace in some way and finding the nearest neighbor among the templates projected onto this subspace.
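The offline and online phases described above can be sketched as follows. This is only a generic illustration, not the NMF procedure studied in this paper: it assumes an orthonormal basis obtained from a singular value decomposition, a Euclidean nearest-neighbor rule, and synthetic data. The function names `learn_basis`, `project`, and `classify` are hypothetical.

```python
import numpy as np

def learn_basis(templates, dim):
    """Offline phase: learn an orthonormal basis from template columns."""
    mean = templates.mean(axis=1, keepdims=True)
    # Left singular vectors of the centered data span the principal subspace.
    u, _, _ = np.linalg.svd(templates - mean, full_matrices=False)
    return mean, u[:, :dim]

def project(basis, mean, images):
    """Subspace coordinates of the image columns (orthonormal basis assumed)."""
    return basis.T @ (images - mean)

def classify(basis, mean, train_feats, train_labels, image):
    """Online phase: return the label of the nearest projected template."""
    feat = project(basis, mean, image.reshape(-1, 1))
    dists = np.linalg.norm(train_feats - feat, axis=0)
    return train_labels[int(np.argmin(dists))]

# Synthetic example (assumption): two classes of 16-pixel "images",
# five noisy templates per class.
rng = np.random.default_rng(0)
centers = rng.random((16, 2))
templates = np.hstack(
    [centers[:, [k]] + 0.01 * rng.standard_normal((16, 5)) for k in range(2)]
)
labels = np.array([0] * 5 + [1] * 5)

mean, basis = learn_basis(templates, dim=2)
train_feats = project(basis, mean, templates)

query = centers[:, 1] + 0.01 * rng.standard_normal(16)
predicted = classify(basis, mean, train_feats, labels, query)
```

With an NMF basis the columns are not orthonormal, so the simple product `basis.T @ x` used here would no longer yield an orthogonal projection.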

In 1999, Lee and Seung [

The main idea of NMF application in visual object
recognition is that the NMF algorithm identifies localized parts describing the
structure of that object type. These localized parts can be added in a purely
additive way with varying combination coefficients to form the individual
objects. The original algorithm of Lee and Seung could not properly achieve
this locality of essential object parts. Thus, other authors
investigated ways to control the sparseness of the basis images
(columns in

One important problem in using NMF for recognition
tasks is how to obtain NMF subspace projections for new image data that are
comparable with the feature vectors determined in NMF coded in matrix

An important aspect of measuring distances in NMF
subspaces, which is necessary in recognition tasks, is the metric used. NMF
subspace basis vectors do not form an orthogonal system. Due to this fact, it
is not convenient to apply the natural Euclidean metric. Guillamet and
Vitrià [

In our research, we focus on studying the influence of
matrix sparseness parameters, subspace dimension, and the use of distance
measures on the recognition rates, in particular for partially occluded
objects. We use Hoyer's algorithms to achieve sparseness control. Additionally,
we propose a modification of the entire NMF task similar to the methods of Yuan
and Oja [

In Section

The aim of the work of Hoyer [

Given a nonnegative
data matrix

The projected
gradient descent algorithm for NMF with sparseness constraints essentially
takes a step in the direction of the negative gradient, and subsequently
projects onto the constraint space, making sure that the taken step is small
enough that the objective function is reduced at every step. The core of
the algorithm is the projection operator proposed by Hoyer [

In the papers mentioned so far, attention was concentrated on methodological aspects of NMF as a part-based representation of image data, as well as on numerical properties of the optimization algorithms developed for the matrix factorization problem. It turned out that the notion of matrix sparseness involved in NMF plays a central role in part-based representation. However, little effort has been devoted to a systematic analysis of the behavior of the NMF algorithms in actual pattern recognition problems, especially for partially occluded data.
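For orientation, the sparseness of a vector in Hoyer's sense is usually quantified by a measure based on the ratio of its L1 and L2 norms, which equals 1 for a vector with a single nonzero entry and 0 for a vector with all entries equal. The sketch below implements this measure as it is commonly stated. The function name `hoyer_sparseness` is hypothetical, and the snippet is not taken from the implementation used in this study.

```python
import numpy as np

def hoyer_sparseness(x):
    """Sparseness in [0, 1] based on the L1/L2 norm ratio of a vector."""
    x = np.asarray(x, dtype=float).ravel()
    n = x.size
    l1 = np.abs(x).sum()
    l2 = np.sqrt((x ** 2).sum())
    if n < 2 or l2 == 0.0:
        raise ValueError("need at least two entries and a nonzero vector")
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)

one_hot = hoyer_sparseness([0.0, 0.0, 3.0, 0.0])   # single nonzero entry
uniform = hoyer_sparseness([2.0, 2.0, 2.0, 2.0])   # all entries equal
```

Applied column-wise to the basis matrix or row-wise to the coefficient matrix, such a measure gives the quantity that the sparseness parameters constrain.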

For a particular recognition task of objects
represented by a set of training images (

Visualization
of the Nmfsc results for a low-dimension example (3D data sets

Nonetheless, both methods have their disadvantages. The method of Guillamet and Vitrià operates with nonorthogonal projected feature vectors that stem directly from the NMF algorithm and do not reflect the data cluster separation in the subspace. On the other hand, the conventional method does not preserve the optimal data approximation determined by NMF, because one of the two optimal factor matrices is replaced by a different one in the classification phase. Our intention was to combine the benefits of both methods, that is, orthogonal projections of input data and preservation of the optimal NMF approximation of the training data. We achieve this by changing the NMF task itself. Before we present this modification, we recall in more detail how the orthogonal projections of the input data are computed.
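As a rough numerical illustration of an orthogonal projection onto a subspace spanned by nonorthogonal basis vectors, the sketch below computes the coefficients by least squares. The name `W` for the basis matrix, the function name, and the toy numbers are assumptions made for this example only.

```python
import numpy as np

def orthogonal_projection_coeffs(W, x):
    """Coefficients h that minimize ||x - W h||_2 (least-squares solution)."""
    h, *_ = np.linalg.lstsq(W, x, rcond=None)
    return h

# Toy basis (assumption): two nonnegative, nonorthogonal columns in 3D.
W = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])
x = np.array([2.0, 1.0, 3.0])

h = orthogonal_projection_coeffs(W, x)
residual = x - W @ h          # part of x outside the subspace
naive = W.T @ x               # differs from h because W is not orthonormal
```

The residual is orthogonal to every basis column, which is what distinguishes this projection from the plain product `W.T @ x`. Note that least-squares coefficients are not guaranteed to be nonnegative.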

As the basis matrix

If one has decided to use the orthogonal projections
of input data onto the subspace as feature vectors, the fact that the matrix

Within this novel concept (

There are two existing methods that are related to
modNMF in two complementary ways. In projective NMF (

Having solved the NMF task for the given training
images (matrix

As outlined by various authors mentioned in Section

For comparison, we also included the Euclidean
metric

The third metric, Riemannian distance, will be
described in more detail, as it is the basis of our proposal, ARC-distance. Liu
and Zheng [

For the standard Euclidean metric

To be able to deal with partial occlusions, the
correctly chosen distance measure should also be able to discriminate two
specific cases of vectors: (i) a case for which the value of the Riemannian
distance of two vectors is large because of great deviations in all components
of these vectors, and (ii) a case when only a few components contribute to the
great value of the Riemannian distance, that is, when the error of recognition
is sparsely distributed over the feature vector components. Therefore, to
define a modified Riemannian distance (the “ARC-distance” for short), we introduce
a sparseness term into the Riemannian metric formula, that is,

The goal of our study was to investigate influences of

For our experiments, we chose three widely used image
databases: (i) the Cambridge ORL face database (cited in the paper of Li et al.
[

Example face images of one person selected from the ORL face database (top two rows) and examples of different randomly occluded faces (bottom row).

Example handwritten digit images selected from the USPS database (top two rows) and examples of different randomly occluded digits (bottom row).

Example face images of two persons selected from the CBCL face database (top two rows) and examples of different randomly occluded faces (bottom row).
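The occlusion model used for the test images, two black rectangles of random size and position, can be sketched as follows. The size limit `max_frac`, the function name `occlude`, and the use of 0 for black are assumptions, since the exact size range of the rectangles is not specified here.

```python
import numpy as np

def occlude(image, rng, n_rects=2, max_frac=0.5):
    """Return a copy of a 2D image with random black rectangles pasted in."""
    out = np.array(image, dtype=float, copy=True)
    height, width = out.shape
    for _ in range(n_rects):
        # Rectangle size: at least 1 pixel, at most max_frac of each side.
        h = int(rng.integers(1, max(2, int(height * max_frac) + 1)))
        w = int(rng.integers(1, max(2, int(width * max_frac) + 1)))
        # Position chosen so that the rectangle lies fully inside the image.
        top = int(rng.integers(0, height - h + 1))
        left = int(rng.integers(0, width - w + 1))
        out[top:top + h, left:left + w] = 0.0
    return out

rng = np.random.default_rng(0)
image = np.ones((16, 16))          # stand-in for a 16x16 test image
occluded = occlude(image, rng)
```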

In the case of the ORL database, the number of training
images was 222, and the number of testing images was 151. These two sets of
images were chosen to be disjoint. For the experiments with the USPS database,
we chose 2000 training images and 1000 testing images (different from the
training ones again). (In the USPS
recognition rate plots (Figures

The results of
the first set of our experiments, carried out for all three image databases,
for unoccluded as well as occluded images, are displayed in Figures

Classification
results for ORL training image data using Hoyer's method. (a), (c), (e): unoccluded
test images for

Classification
results for USPS training image data using Hoyer's method. (a), (c), (e): unoccluded
test images for

Classification
results for CBCL training image data using Hoyer's method. (a), (c), (e): unoccluded
test images for

Classification results for ORL training image data. (a), (c), (e): Hoyer's
Nmfsc algorithm applied to occluded test images for

For unoccluded images, all three data sets show
similar RR behavior for the Riemannian-like metrics (Riemannian and
ARC-distance); only the CBCL RRs are slightly smaller. The Euclidean and diffusion
curves for the ORL and CBCL data are almost as high as for the Riemannian-like measures,
but lower, as one would expect. Their behavior for USPS data fulfills
these expectations even more clearly, as they are much smaller than the Riemannian-like RR curves
and, moreover, decrease with increasing dimension. This behavior is to be expected,
as more (nonorthogonal) basis vectors introduce more error components into the
distance computation. This happens because the Euclidean and diffusion
distances do not take into account the mutual basis vector angles. The dimension
reduction for all datasets is very high, as for the Riemannian-like metrics all
three achieve the maximal RR at about

The RR behavior for occluded data differs markedly
between ORL and USPS data. First, RR maxima for USPS data are higher than for
ORL data: below 0.7 in the ORL case versus about 0.75 for USPS data. Second,
for ORL data the RR curves of the metrics do not behave in the expected way.
Euclidean and diffusion distances generate much better results than the
Riemannian-like ones. For USPS, the RRs behave qualitatively in the same way as in the
unoccluded case; the RR values are only smaller. Finally, RR maxima are achieved
for higher dimension values in the ORL case, that is, with a much smaller dimension
reduction. In the case of the CBCL image database, the situation changes
dramatically in comparison to that of the ORL face images: on average the RRs are
50% smaller, reaching approximately the value of 0.3 (compared to the 0.7
maximum for ORL). For two value combinations of the sparseness parameters in
Hoyer's method (Figures

In the second
part of our study, we were interested in a comparison of the RRs of Nmfsc and
modNMF, the latter being implemented with Hoyer's sparseness control mechanisms.
Since the NMF methodology is intended mainly to generate part-based
subspace representations of template images, our further interest was
concentrated only on occluded images. These results, obtained for optimum
values of sparseness parameter

Classification results for USPS training image data. (a), (c), (e): Hoyer's
Nmfsc algorithm applied to occluded test images for

Classification results for CBCL training image data. (a), (c), (e): Hoyer's
Nmfsc algorithm applied to occluded test images for

The qualitative behavior of the RR curves of ORL faces
according to the distance measures is the same as described in Section

In this paper,
we have analyzed the influence of the matrix sparseness, controlled in NMF
tasks via Hoyer's algorithm [

(1) The ability of NMF methods to solve recognition tasks depends on the kind of images used and on the databases as a whole. Independently of the method, the RRs for USPS data are higher than those for ORL face data. This finding could be ascribed to the simpler structure of the digits (almost binary data, lower resolution, objects sparsely covering the image area). Moreover, USPS contains much larger classes (USPS: 2000 training images for only 10 classes, ORL: 222 images with only 5 training images per class), so that the interclass variations in USPS can be covered better. In general, the RRs obtained for faces from the CBCL database are significantly worse than in comparable cases with ORL face images. We attribute these results to the poor resolution of the structured face image data.

(2) Contrary to the overall expectation, Euclidean and diffusion distances showed better recognition performance for occluded test images in the case of ORL data. As these do not take into account the angles between subspace basis vectors, this is a surprise. USPS
data treated with Hoyer's Nmfsc method behave as expected: with increasing
dimension and decreasing

(3) Extensive recognition experiments using the Nmfsc and modNMF algorithms, reported in our preliminary study [

USPS performed better and followed the overall expectations more closely than ORL and CBCL. We ascribe this mainly to the different training data situations. As mentioned in the first point above, inter-class variations were covered much better for the USPS dataset than for the face images. The novel modNMF algorithm improved even the results achieved for the already well-performing USPS data set. ARC-distance in its current form did not fulfill the expectations in the experiments. The significantly lower spatial resolution of the CBCL face data, compared with the face data in the ORL image database, is reflected in an apparent decrease of recognition rates for occluded images for both methods being compared. The various distances used for the CBCL database had little influence on RR.

Spratling [