Image classification is a process that depends on the descriptor used to represent an object. To create such descriptors we use object models with rich information about the distribution of points. The object model stage is improved with an optimization process that spreads the points that conform the mesh. In this paper, particle swarm optimization (PSO) is used to improve the model generation, while a support vector machine (SVM) is used for the classification problem. In order to measure the performance of the proposed method, a group of objects from a public RGBD object data set has been used. Experimental results show that our approach improves the distribution of the model in the feature space, which allows reducing the number of support vectors obtained in the training process.
Over the past years, there has been an increasing interest in object recognition. Object recognition can be divided into two major tasks: object localization and image classification.
In this paper we propose an image classification system based on an invariant moment descriptor that includes depth information. The 3D data allows producing small and robust descriptors that improve image classification. These descriptors are constructed using object models with rich information about the distribution of points. The model generation stage requires that the best points be selected; therefore, this stage can be defined as an optimization problem.
Mathematical optimization is the selection of the best element with regard to some criterion. In the simplest case, it consists of maximizing or minimizing a fitness function [
The sensor used in this work is a Kinect [
In this work we analyze the inclusion of 3D information (provided by an RGBD sensor) and the use of the PSO algorithm to create robust object models. Using this approach we can construct small and robust descriptors that improve image classification.
The rest of the paper is organized as follows. The next section will present the proposed image classification system. In Section
In recent years image classification has become a very active research field. The goal of the classification task is to develop an algorithm that assigns a class label to each of the samples in the training data set. There are two main approaches to classification in the literature, namely, (a) the supervised approach and (b) the unsupervised approach, where the former uses a set of samples to train the classifier, and the latter performs the classification by exploring the data [
One of the most widely used detectors is the Harris Corner detector [
Many different feature descriptors have been proposed in the literature: steerable filters [
Histogram of oriented gradients (HoG) was proposed in [
One of the problems encountered during the design of an image classification technique is data overfitting, which arises when the number of training samples is small in comparison with the number of features. To overcome this problem we design a descriptor with an optimized 3D point distribution.
The contribution of this work is the development of a small and robust descriptor based on invariant moments and 3D information, in order to improve the classification process. The 3D information is incorporated from depth data obtained with an RGBD sensor. The object model is optimized through the use of a PSO algorithm; this optimization improves the classification.
The proposed classification system uses 3D information and is based on local features, invariant moments, contour generation, and a reduction of depth information using a mesh grid. It consists of five main steps: feature extraction, contour creation, mesh reduction, mesh optimization, and invariant moment descriptor formation; see Figure
Image classification system.
The first step is the extraction of SURF keypoints from the input image (Figure
SURF and SIFT detectors employ slightly different ways of detecting features. The SIFT detector builds image pyramids, filters each layer with Gaussians of increasing sigma values, and then takes the difference. The SURF detector uses a box filter approximation. A comparison of the SIFT and SURF detectors is presented in [
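The contrast between the two detection strategies can be sketched numerically: a difference of Gaussians (SIFT-style) and a difference of box filters (SURF-style) produce closely correlated blob responses on the same image. The toy image, filter sizes, and sigma values below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

# Toy image: a bright square blob on a dark background.
img = np.zeros((64, 64))
img[28:36, 28:36] = 1.0

# SIFT-style response: difference of Gaussians at increasing sigma.
dog = gaussian_filter(img, 2.0) - gaussian_filter(img, 1.0)

# SURF-style response: box (mean) filters approximating the Gaussians.
box = uniform_filter(img, 7) - uniform_filter(img, 3)

# Both responses share the same sign pattern around the blob edges,
# so their correlation is high.
corr = np.corrcoef(dog.ravel(), box.ravel())[0, 1]
print(corr > 0.5)
```

The box approximation is what makes SURF fast: box filters can be evaluated in constant time with integral images, while Gaussian filtering cost grows with the kernel size.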
The RGBD sensor provides a huge amount of data; thus the information must be reduced to a contour-mesh model to decrease the computational cost. To construct the contour-mesh we use keypoints. First, they are scaled and translated; then, we compute the magnitude of the keypoints with respect to their centroid
Totem object, with SURF keypoints.
Sample contour for the totem object.
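The contour construction described above can be sketched as follows: keypoints are translated to their centroid, their magnitudes are scaled, and the angles are quantized into 72 bins of 5 degrees each (the bin layout reported in the experiments section). Keeping the farthest point per angular bin is an assumption of this sketch, not necessarily the paper's exact reduction rule.

```python
import numpy as np

def contour_from_keypoints(pts, n_bins=72):
    """Build a polar contour: one radius per 5-degree bin (72 bins).

    pts: (N, 2) array of keypoint coordinates.  The per-bin reduction
    (maximum magnitude) is an illustrative assumption.
    """
    centroid = pts.mean(axis=0)
    d = pts - centroid                       # translate to the centroid
    mag = np.hypot(d[:, 0], d[:, 1])
    mag = mag / mag.max()                    # scale magnitudes to [0, 1]
    ang = np.degrees(np.arctan2(d[:, 1], d[:, 0])) % 360.0
    bins = (ang // (360.0 / n_bins)).astype(int)
    radius = np.zeros(n_bins)
    for b, m in zip(bins, mag):
        radius[b] = max(radius[b], m)        # farthest point per bin
    return radius

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2))              # stand-in SURF keypoints
r = contour_from_keypoints(pts)
print(r.shape)                               # one radius per angular bin
```

The resulting 72-element radial profile is a compact, rotation-discretized outline of the object from which the region of interest for the depth data can be delimited.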
In the next sections we will explain the last two steps of the proposed approach, namely, the mesh reduction step and the construction of the descriptor.
In this step, the 3D data that belongs to the object contour is segmented, and a cloud of 3D points is obtained. Due to the large number of points in the cloud, a reduction of the data is required in order to keep the computational cost low. Therefore, in this step a mesh that covers the 3D point cloud is constructed, which allows reducing the number of 3D points in the cloud without losing important information.
First, we proceed to extract the depth information contained within the boundaries of the contour and reduce the points that will be taken to compute the moments. The reduction of points aims to generate a smaller set with rich information by adjusting a mesh grid over the object. The initial position of the points is obtained by sectioning the bounding box into equally separated cells, generating
Then, for each invalid point, we take two valid points at random and move the outlier to a position between them; this can be seen as a biased migration. Later, we attach the
Mesh creation example.
An image through the different stages of the procedure. (a) Original input image. (b) Feature extraction. (c) Contour created for the object. (d) Depth information attached in a mesh grid.
It is important to mention that a point will be considered valid if and only if the coordinates
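The grid initialization and the biased migration of invalid points can be sketched as below. A unit disc stands in for the real validity test (which, per the text, checks the contour boundary and the depth data); the interpolation range between the two chosen valid points is an assumption. Because the disc is convex, any point between two valid points is itself valid.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "object": a point is valid inside the unit disc.  The real
# system instead checks the contour polygon and the depth coordinates.
def is_valid(p):
    return p[0] ** 2 + p[1] ** 2 <= 1.0

# 1) Initial positions: section the bounding box into equal cells.
xs = np.linspace(-1, 1, 8)
ys = np.linspace(-1, 1, 8)
grid = np.array([(x, y) for x in xs for y in ys])

valid = np.array([is_valid(p) for p in grid])
valid_pts = grid[valid]

# 2) Biased migration: move each invalid point to a position between
#    two valid points chosen at random.
for i in np.where(~valid)[0]:
    a, b = valid_pts[rng.choice(len(valid_pts), size=2, replace=False)]
    t = rng.uniform(0.25, 0.75)              # somewhere between the two
    grid[i] = a + t * (b - a)

print(all(is_valid(p) for p in grid))        # every point now lies inside
```

Note that for a nonconvex contour the migrated point could still land outside the object, which is one reason the subsequent optimization step keeps checking and clamping point positions.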
These steps generate an object model from which we can extract information for classification; however, the simple migration of points produces a model in which the points are not equally distributed. This problem is solved by applying evolutionary computation.
The mesh reduction step produces an object model; however, the points are not equally distributed, and thus an optimization step is required. For the optimization step, we have chosen an evolutionary computation (EC) technique, due to the multiple constraints of the mesh optimization problem.
Evolutionary algorithms (EA) are stochastic search methods inspired by the behavior of swarm populations or other natural biological processes. In general there are five popular algorithms: genetic algorithms [
To solve the mesh optimization problem, the PSO algorithm (Algorithm
(1) Initialize each particle with a random position and velocity
(2) Calculate fitness value of each particle using the fitness function
(3) Update local best if the current fitness value is better
(4) Determine global best: take the best fitness particle and compare it to current best
(5) For each particle
(6) Calculate particle velocity according to (
(7) Update particle position according to (
(8) Repeat from step (2) until the stop criterion is reached
In our approach, each PSO particle represents a point in the mesh. The problem has multiple boundaries; every point of the contour is one. Thus, we have to check whether particles lie inside the polygon and clamp them after every update. Instead of gathering towards a global best position, particles take positions uniformly separated from each other. This is obtained through the fitness function and a modification of the updating rules.
The objective of this method is the construction of a mesh with the best distribution of points. The first step in the mesh construction is the determination of the object contour (Section
PSO is a stochastic search method inspired by the behavior of swarming animals, such as bird flocking and fish schooling. In PSO, particles, or candidate solutions, move over the search space towards the zone with the best conditions, using a cognitive component of each particle and a social component generated by the swarm (the best local and global positions). This lets PSO evolve social behavior and relative movement into globally optimal solutions [
In the iterative process, the position
For more details on PSO, the interested reader is referred to [
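The standard PSO update described above can be condensed into a short sketch: each particle keeps its own best position (cognitive component) and is attracted to the best position of the swarm (social component). The parameter values and the sphere test function are common illustrative defaults, not the paper's settings.

```python
import numpy as np

def pso(f, dim, n_particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal standard PSO minimizing f over [-5, 5]^dim (illustrative)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-5, 5, (n_particles, dim))   # positions
    v = np.zeros_like(x)                         # velocities
    pbest = x.copy()                             # local best per particle
    pval = np.apply_along_axis(f, 1, x)
    g = pbest[pval.argmin()].copy()              # global best of the swarm
    for _ in range(iters):
        r1 = rng.uniform(size=x.shape)
        r2 = rng.uniform(size=x.shape)
        # Velocity: inertia + cognitive (pbest) + social (gbest) terms.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = x + v
        val = np.apply_along_axis(f, 1, x)
        improved = val < pval
        pbest[improved] = x[improved]
        pval[improved] = val[improved]
        g = pbest[pval.argmin()].copy()
    return g, pval.min()

best, best_val = pso(lambda p: float(np.sum(p ** 2)), dim=2)
print(best_val < 1e-4)                           # converges near the optimum
```

In the mesh optimization described next, this canonical scheme is modified: the social attraction towards a single global best is replaced by a neighbor-dependent term, since the goal is to spread particles rather than to gather them.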
Instead of a single fitness function, we undertake three steps to obtain a fitness value. First, we measure the distance of each particle to its nearest neighbor; this measure gives information about how separated every particle is. We only take the distance between the current
Then we calculate the mean of distances
Next, we compute the difference between the local distance of the particle to its neighbor and the mean global distance. Thus, the fitness of
This is the fitness value; by minimizing (
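The three steps above (nearest-neighbor distance per particle, mean of those distances, and the per-particle deviation from that mean) can be sketched directly. The pairwise-distance implementation is an assumption of this sketch; minimizing the returned value for every particle drives the mesh towards uniform spacing.

```python
import numpy as np

def spread_fitness(positions):
    """Per-particle fitness: |nearest-neighbour distance - mean distance|.

    positions: (N, 2) array of mesh points.  A value of zero for every
    particle means the points are uniformly separated.
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)           # ignore self-distance
    d = dist.min(axis=1)                     # distance to nearest neighbour
    return np.abs(d - d.mean())              # deviation from the mean

# A perfectly regular grid has zero fitness everywhere...
grid = np.array([(x, y) for x in range(4) for y in range(4)], dtype=float)
print(spread_fitness(grid).max())            # 0.0
# ...while a clumped layout does not.
clumped = grid.copy()
clumped[0] = (0.1, 0.1)
print(spread_fitness(clumped).max() > 0.01)  # True
```

This formulation has no single optimum position per particle; each particle's fitness depends on the whole configuration, which is why the update rules are modified rather than used in their canonical form.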
Since our problem is multiconstrained, we add a penalty term to the fitness function. Particles that go outside the boundaries are penalized according to their distance to the centroid. We add a constant
In addition to the penalization term, we also use a preserve-feasibility approach, as explained in [
The position update includes a variable
The threshold value depends on the range of the data set; it establishes how far from the border the data can be without being affected. In our case feature points are scaled to
In the velocity update, the global coefficient has the effect of moving particles towards the best position of the swarm. Since in this application we do not need that effect, the term is replaced by one that makes particles move closer to or farther from their neighbor as needed. This behavior is accomplished by using the sign of the distance between particles and a constant value
The fitness function of the algorithm is defined to minimize the distance between the particle
With respect to the stop criterion of the algorithm, we cannot force PSO to stop only when the best value
The inclusion of PSO in the mesh reduction step of the image classification system aims to spread the points and create models that describe the entire object surface better. After the initial generation of points in a grid over the object and the migration of points that lie outside the object, the PSO variation is applied to the mesh of points. Particles are initialized with the coordinate values of the points; the number of particles is therefore determined by the number of points that we want over the object. A maximum number of iterations is set, and the algorithm is executed until the stop criterion is reached. An object model is recovered from the local best position
(1) Obtain object contour
(2) Generate starting point coordinates from bounding box
(3) Migrate nonvalid points
(4) PSO spreading
In Figures
Different stages of the mesh grid procedure for the hair dryer object. (a) Original input image. (b) Feature extraction. (c) Contour created for the object. (d) Depth information attached in a mesh grid.
Different stages of the mesh grid procedure for the earth globe object. (a) Original input image. (b) Feature extraction. (c) Contour created for the object. (d) Depth information attached in a mesh grid.
Moments provide useful and compact information about a data set, such as its spread or dispersion. A pattern may be represented by a density distribution function; moments can be obtained for a set of points representing an object, and they can then be used to discriminate between objects [
In particular, [
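As an illustration of moment invariants on point sets, the sketch below computes the two classical second-order invariants of a 2D point cloud. The normalization scheme (centroid subtraction, division by the maximum radius) is an assumption chosen to make the values invariant to translation, rotation, and uniform scale; the paper's actual descriptor also incorporates the depth coordinate.

```python
import numpy as np

def moment_invariants(pts):
    """Two second-order moment invariants of a 2-D point set.

    Centroid subtraction removes translation; dividing by the maximum
    radius removes uniform scale; phi1 and phi2 are the classical
    rotation-invariant combinations of the second-order moments.
    """
    x = pts - pts.mean(axis=0)
    x = x / np.linalg.norm(x, axis=1).max()
    mu = lambda p, q: float(np.mean(x[:, 0] ** p * x[:, 1] ** q))
    phi1 = mu(2, 0) + mu(0, 2)
    phi2 = (mu(2, 0) - mu(0, 2)) ** 2 + 4.0 * mu(1, 1) ** 2
    return phi1, phi2

rng = np.random.default_rng(3)
pts = rng.normal(size=(300, 2)) * np.array([2.0, 0.5])  # anisotropic cloud
c, s = np.cos(0.6), np.sin(0.6)
R = np.array([[c, -s], [s, c]])
moved = 3.0 * pts @ R.T + np.array([10.0, -4.0])        # similarity transform
a = moment_invariants(pts)
b = moment_invariants(moved)
print(np.allclose(a, b))                                 # invariants match
```

A few such invariant values per object model form a descriptor that is small compared with dense descriptors such as HOG, which is the property exploited later in the support-vector analysis.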
Finally, the descriptor is given as input to an SVM, which determines whether the image contains the target object. The SVM [
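The classification stage can be sketched with scikit-learn (an assumption of this sketch; the paper does not name a specific SVM implementation). The toy descriptors below are random clusters standing in for the moment descriptors; the example also reports the number of support vectors, the quantity analyzed in the experiments.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
# Two toy descriptor classes: well-separated clusters in feature space
# stand in for the moment descriptors of two objects.
pos = rng.normal(loc=+2.0, size=(50, 4))
neg = rng.normal(loc=-2.0, size=(50, 4))
X = np.vstack([pos, neg])
y = np.array([1] * 50 + [0] * 50)

clf = SVC(kernel="linear", C=10.0).fit(X, y)
n_sv = int(clf.n_support_.sum())
# Well-distributed patterns need only a few support vectors.
print(n_sv < 50)
```

The link to the rest of the paper: the better the descriptors are distributed in the feature space, the fewer support vectors the trained SVM needs, which is exactly the effect reported for the optimized mesh.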
In the following, we validate the proposed approach and compare it with histograms of oriented gradients (HOG) [
First, a data set composed of 5 objects was defined (cups, hair dryers, irons, cereal boxes, and soda cans), with around 50 images per object; see Figure
Objects comprising the data set.
In addition to the homemade data set, the RGBD object data set from the University of Washington [
Objects from the RGBD data set of the University of Washington [
We worked with images containing a single object on a discriminative background. As explained before, we start by extracting keypoints from the image. The contours were formed by 72 bins of 5 degrees each. The mesh was composed of a
In our case, for
The cross-validation method was used to validate the training process [
To define the SVM model, different SVMs with linear,
The first experiment consisted of binary classification with one object being discriminated from another (1 versus 1 classification); see Table
Binary classification using 2 objects.
1 versus 1 classification percentages (CR: correct recognition; C1E/C2E: class 1/class 2 error)  

Test  PSA-moment descriptor  
Name  CR  C1E  C2E 
Dryer and box  100%  0%  0% 
Can and irons  94%  3%  3% 
Cup and can  86%  10%  4% 
Iron and dryer  88%  4%  8% 
Box and cup  90%  0%  10% 
The second test consisted of binary classification using the 5 objects (one object being discriminated from the rest); see Table
Binary classification using 5 objects.
1 versus all classification percentages (CR: correct recognition; FP: false positives; FN: false negatives)  

Test  PSA-moment descriptor  
Name  CR  FP  FN 
Dryer  96%  2%  2% 
Iron  88%  0%  12% 
Cup  92%  2%  6% 
Can  90%  10%  0% 
Box  96%  2%  2% 
Finally, a multiclass test was made in which the objects were classified all at once. Each object was set as a different class, and the classification delivered the class to which each object belongs. This test yielded 79% correct classification. There was 8% incorrect classification in class 1 (cups) and 9% in class 2 (irons); all objects of classes 3 and 4 (irons and cans) were correctly classified, and there was 4% incorrect classification in class 5 (box).
After testing the descriptors with the small data set, classification tests were performed using a group of objects from the RGBD object data set of the University of Washington. All objects were discriminated from the rest in binary classification. Tests were performed first using the moment descriptor obtained when the points in the mesh are simply migrated; then the same tests were done using the point spreading approach. These results were then compared with histograms of oriented gradients (HOG) [
The classification was made for each descriptor (moment, moment-PSA, HOG, and SIFT) using SVMs. From cross-validation, the best results were obtained with a linear kernel function and a cost parameter
Tests using objects from the Washington University data set.
Classification percentages (CR: correct recognition; FN: false negatives; FP: false positives)  

Object  Object 1  Object 2  Object 3  Object 4  
Result  CR  FN  FP  CR  FN  FP  CR  FN  FP  CR  FN  FP 
Moment descriptor  80  8  12  96  0  4  92  4  4  79  7  14 
Moment-PSA descriptor  85  5  10  98  0  2  97  1  2  82  8  10 
HOG descriptor  99  1  0  98  2  0  90  8  2  85  10  5 
SIFT descriptor  74  22  4  85  8  7  76  10  14  70  27  3 
HAAR cascade  90  9  1  89  7  4  75  10  15  72  27  1 
We can see from these results that spreading the points over the object surface improves the computed descriptor and thus the classification of the objects. With this approach we obtain results similar to the well-known HOG descriptors, which are used in many state-of-the-art recognition systems. These two descriptors achieved the highest correct recognition percentages in all the tests.
The accuracy of the classification system has been assessed in terms of receiver operating characteristic (ROC) curves, which relate the positive and false acceptance rates according to an acceptance threshold
ROC plots for the classification of the bowl. (a) HOG ROC, (b) moments ROC, (c) moments-PSA ROC, (d) HAAR ROC.
ROC plots for the classification of the coffee mug. (a) HOG ROC, (b) moments ROC, (c) moments-PSA ROC, (d) HAAR ROC.
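The ROC computation behind these plots can be sketched in a few lines: sweeping the acceptance threshold over the classifier scores traces out the true and false positive rates. The synthetic score distributions below are illustrative assumptions, not the paper's classifier outputs.

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep the acceptance threshold over the scores and return the
    false/true positive rates, starting from the (0, 0) corner."""
    order = np.argsort(-scores)                  # accept highest scores first
    l = labels[order]
    tp = np.concatenate([[0], np.cumsum(l)])     # positives accepted so far
    fp = np.concatenate([[0], np.cumsum(1 - l)]) # negatives accepted so far
    return fp / (len(l) - l.sum()), tp / l.sum()

rng = np.random.default_rng(5)
labels = np.concatenate([np.ones(100, int), np.zeros(100, int)])
scores = np.where(labels == 1,
                  rng.normal(1.0, 1.0, 200),     # scores for true objects
                  rng.normal(-1.0, 1.0, 200))    # scores for non-objects
fpr, tpr = roc_points(scores, labels)
# Area under the curve via the trapezoidal rule; 0.5 is chance level.
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
print(auc > 0.8)
```

The closer the curve hugs the top-left corner (AUC near 1), the better the classifier, which is the visual criterion used when comparing the moments, moments-PSA, HOG, and HAAR plots.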
From the ROC curves, we can also note that the moments-PSA classifier shows an improvement over the one built on the nonoptimized grid, and its performance is similar to that of classification with HOG descriptors.
The efficiency of the descriptors with an SVM classifier was also evaluated in terms of the number of support vectors obtained by the SVM algorithm. This was done for the moment, moment-PSA, HOG, and SIFT descriptors. Table
Average and minimum number of support vectors.
Support vectors  

Object  Object 1  Object 2  Object 3  Object 4  
Support vectors  avg.  min.  avg.  min.  avg.  min.  avg.  min. 
Moment descriptor  20  15  25  17  101  73  27  20 
Moment-PSA descriptor  8  5  24  12  73  52  12  10 
HOG descriptor  300  23  400  60  370  43  418  70 
SIFT descriptor  310  22  404  65  401  84  450  72 
These results show that although HOG and our approach have similar classification results, the number of support vectors needed in the training process is smaller for our approach; moreover, when the distribution of the points is optimized, the resulting patterns are better distributed in the feature space, so even fewer support vectors are needed.
In this paper we propose the design of a new small and robust feature descriptor based on SURF features, 3D data obtained by an RGBD sensor, an optimization based on a modified PSO, and moment invariants. The experimental results presented in this section show that even though the new descriptors are small, the addition of 3D data is advantageous and provides robust features. The importance of the mesh optimization step lies in the reduction of points and their better distribution over the object. Table
In order to determine the time required to construct the descriptor, we have run each step 100 times for each object to estimate its average processing time; a summary of these results is presented in Table
Descriptor construction time. The steps are feature extraction (FE), contour creation (CC), mesh reduction (MR), mesh optimization (MO), and invariant moments (IM).
Step  FE  CC  MR  MO  IM 

Time  0.0929 s  0.0419 s  0.0202 s  0.0566 s 

In the previous sections the different steps of the proposed approach have been presented. The approach produces a descriptor based on moment invariants, computed using 3D data provided by an RGBD sensor. A new method has been introduced for the extraction of object models from the 3D data.
The first step of the procedure is feature extraction; in this approach the SURF detector was used, and the difficulties in obtaining SURF features are those traditional to any image feature detector. The second step is contour creation: this step takes as input the features detected in the previous step and extracts the contour of the object. Here we must take into account that some interest points do not belong to the object; they are close to the contour but outside of it. These points can be seen as noise, and they are eliminated if their distance to the centroid is more than double the mean distance.
After the construction of the object contour, the depth information is added. If the area occupied by the object is large, there would be a lot of data, and the subsequent steps would require considerable computation. For this reason, the third step of the proposed approach optimizes the mesh in order to reduce the computational time. In this step a modified version of PSO has been used to optimize the points on the mesh. One of the challenges encountered in this step was the stop criterion; we cannot force PSO to stop when the best value is zero, and therefore the algorithm stops when the best value of each particle
Taking into account the steps illustrated in Figure
With respect to the feature extraction step, as we mentioned before, it is made by using a SURF detector. The theoretical complexity of SURF was determined and validated through experimentation in [
The contour creation step is a linear process with respect to the number of points of the cloud produced by the SURF algorithm; nevertheless, when the contour is created this number is reduced, and only the points which belong to the silhouette of the object are passed to the mesh reduction step, which is also linear with respect to this reduced number of points.
The mesh optimization step is performed by using the PSO algorithm. The number of computations in a PSO run comprises the evaluations of the cost function and the position and velocity updates, which are directly proportional to the number of particles and iterations of the algorithm. The computational complexity of evaluating the cost function depends basically on the Euclidean distances between the particles, which requires
The invariant moment descriptor generation appears to be the most expensive step executed in real time by the system, because it is a process with a computational complexity of
The last step of the system is the classification of the obtained descriptors; it is done using an SVM. An SVM solves a quadratic programming (QP) problem, and therefore its computational complexity depends on the QP solver used. The computational complexity of the traditional SVM training algorithm is
In this paper we have presented an image classification technique based on an invariant moment descriptor that includes depth information. The inclusion of 3D data enables invariant moments to produce small and robust descriptors, improving image classification. To create such descriptors, we used object models with rich information about the distribution of points. The application of the PSO optimization algorithm to the model generation stage improved the computed descriptor and the object recognition. From the experimental results, it is clear that these descriptors achieve high correct recognition percentages. Furthermore, the number of support vectors obtained in the training process is smaller for our approach, due to the fact that the points are optimized and thus the patterns are better distributed in the feature space.
Position of PSO particle
Best position found by PSO particle
Velocity of PSO particle
Best position found by the swarm
PSO inertia weight
Uniform random numbers
PSO acceleration constants
PSO fitness function
Fitness penalization gain
Particle closeness to boundaries
Particle acceleration with respect to closest particle
Moment descriptor
Distance of particle
Mean of the particle distances.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
The authors would like to thank CONACYT and the University of Guadalajara. This work has been partially supported by the CONACYT projects CB156567, CB106838, CB103191, and INFR229696.