Current RGBD sensors provide a large amount of valuable information for mobile robotics tasks like 3D map reconstruction, but the storage and processing of the incremental data provided by the different sensors through time quickly become unmanageable. In this work, we focus on the representation of 3D maps and propose the use of the Growing Neural Gas (GNG) network as a model to represent 3D input data. The GNG method is able to represent the input data with a desired number of neurons, or resolution, while preserving the topology of the input space. Experiments show how the GNG method yields a better input space adaptation than other state-of-the-art 3D map representation methods.
A 3D point consists of (
The amount of data in RGBD maps is huge since the number of poses is high. In a typical map with 10,000 poses, the data could consist of more than 3 billion 3D points, which is unaffordable for representation and other tasks. Furthermore, as the acquisition frame rate is high, a huge amount of redundant points is used to represent a common area of the input space. Due to this huge quantity of data, several methods have been proposed to reduce the number of points in the map while preserving the main features of the data, since they will be used in further tasks.
Elevation maps were a commonly used structure in the past [
Following this idea of 3D space representation, some other structures have been proposed, like occupancy grids or Octrees. Occupancy grids represent the entire space as a 3D grid of cells. The cell information could consist of a single occupancy value or contain more complex information, such as the probability of occupancy. Several works in mobile robotics have used this structure as a basis for their applications [
Wang et al. [
Other approaches use self-organizing maps in order to reduce the input space. Viejo et al. [
The rest of this work is organized as follows. First, in Section
One way of selecting points of interest in 3D point clouds is the use of a topographic mapping where a low-dimensional map is fitted to the high-dimensional manifold of the model, whilst preserving the topographic structure of the data.
In this section, we review some typical methods to represent and compress 3D data. First, we propose the use of the Growing Neural Gas algorithm to reduce and represent 3D point cloud maps. Then, we briefly describe two well-known data structures in order to compare them with our method.
A common way to achieve data dimension reduction is by using self-organising neural networks where input patterns are projected onto a network of neural units such that similar patterns are projected onto units adjacent in the network and vice versa. As a result of this projection, a representation of the input patterns is achieved, which in post-processing stages allows exploiting the similarity relations of the input patterns.
However, most common approaches do not provide good neighborhood and topology preservation if the logical structure of the input pattern is not known a priori. In fact, the most common approaches specify in advance the number of neurons in the network and a graph that represents topological relationships between them, for example, a two-dimensional grid, and seek the best match to the given input pattern manifold. When this is not the case, the networks fail to provide good topology preservation as in the case of Kohonen’s algorithm [
The approach presented in this paper is based on self-organising networks trained using the Growing Neural Gas learning method [
In GNG, the nodes of the network compete to determine the ones with the highest similarity to the input distribution. In our case, the input distribution is a finite set of 3D points extracted from different types of sensors. The highest similarity reflects which node, together with its topological neighbors, is the closest one to the input sample point, which is the pattern generated by the network. The
The nodes move towards the input distribution by adapting their position to the input data geometry. During the learning process, local error measures are gathered to determine where to insert new nodes. New nodes are inserted near the node with the highest accumulated error. At each adaptation step, a connection between the winner and its topological neighbors is created as dictated by the competitive Hebbian learning method. This continues until an ending condition is fulfilled, for example, reaching an optimal network topology, a predefined network size, or a deadline.
The network is specified as follows.
It is a set
It is a set of edges (connections) between pairs of neurons. These connections are not weighted, and their purpose is to define the topological structure. The edges are determined using the competitive Hebbian learning algorithm. An edge-aging scheme is used to remove connections that become invalid due to the adaptation of the neurons during the learning process.
The GNG learning algorithm is as follows.
Start with two neurons
Generate a random input signal
Find the nearest neuron (winner neuron)
Increase the age of all the edges emanating from
Add the squared distance between the input signal and the winner neuron to a counter error of
Move the winner neuron
If
Remove the edges larger than
With every certain number
Determine the neuron
Insert a new neuron
Insert new edges connecting the neuron
Decrease the error variables of neurons
Decrease all error variables by multiplying them with a constant
If the stopping criterion is not yet achieved, go to step 2.
In summary, the adaptation of the network to the input space takes place in step 6. The insertion of connections (step 7) between the two closest neurons to the input pattern establishes an induced Delaunay triangulation in the input space. The elimination of connections (step 8) removes the edges that no longer form part of the triangulation. This is done by eliminating the connections between neurons that are no longer activated or that have become isolated. Finally, the accumulated error (step 5) allows the identification of those areas of the input space where it is necessary to increase the number of neurons to improve the mapping.
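The steps above can be condensed into a compact sketch. The following pure-Python implementation is illustrative only: the parameter names (`eps_b`, `eps_n`, `lam`, `age_max`, `alpha`, `beta`) and their default values are our assumptions rather than the settings used in the experiments, and the removal of isolated neurons is omitted for brevity.

```python
import math
import random

def gng(points, max_neurons=20, lam=100, eps_b=0.2, eps_n=0.006,
        age_max=50, alpha=0.5, beta=0.995, iterations=5000, seed=0):
    """Minimal Growing Neural Gas sketch. points: iterable of (x, y, z)
    tuples; returns (neuron positions, edge set). Illustrative defaults."""
    rng = random.Random(seed)
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    # step 1: start with two neurons placed at random input positions
    w = [list(rng.choice(points)), list(rng.choice(points))]
    err = [0.0, 0.0]
    edges = {}  # (i, j) with i < j -> age
    for t in range(1, iterations + 1):
        x = rng.choice(points)  # step 2: random input signal
        # step 3: find the winner s1 and the second-nearest neuron s2
        order = sorted(range(len(w)), key=lambda i: dist2(w[i], x))
        s1, s2 = order[0], order[1]
        # step 4: increase the age of all edges emanating from s1
        for e in list(edges):
            if s1 in e:
                edges[e] += 1
        # step 5: accumulate the winner's squared error
        err[s1] += dist2(w[s1], x)
        # step 6: move the winner and its topological neighbors toward x
        w[s1] = [wi + eps_b * (xi - wi) for wi, xi in zip(w[s1], x)]
        for e in edges:
            if s1 in e:
                n = e[0] if e[1] == s1 else e[1]
                w[n] = [wi + eps_n * (xi - wi) for wi, xi in zip(w[n], x)]
        # step 7: create (or refresh) the s1-s2 edge
        edges[tuple(sorted((s1, s2)))] = 0
        # step 8: remove edges older than age_max
        edges = {e: a for e, a in edges.items() if a <= age_max}
        # step 9: every lam signals, insert a neuron between the neuron q
        # with the highest error and its highest-error neighbor f
        if t % lam == 0 and len(w) < max_neurons:
            q = max(range(len(w)), key=lambda i: err[i])
            nbrs = [e[0] if e[1] == q else e[1] for e in edges if q in e]
            if nbrs:  # skip insertion if q happens to be isolated
                f = max(nbrs, key=lambda i: err[i])
                w.append([(a + b) / 2 for a, b in zip(w[q], w[f])])
                err[q] *= alpha
                err[f] *= alpha
                err.append(err[q])
                r = len(w) - 1
                edges.pop(tuple(sorted((q, f))), None)
                edges[tuple(sorted((q, r)))] = 0
                edges[tuple(sorted((f, r)))] = 0
        # step 10: decay all error variables
        err = [e * beta for e in err]
    return w, set(edges)
```

Because the neurons start at input samples and every update is a convex move toward an input sample, the network always stays inside the convex hull of the point cloud.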
Using a Growing Neural Gas model to represent 3D data has some advantages over traditionally used methods like the voxel grid or Octrees. For example, we can specify the number of neurons (representative points of the map), while methods like the voxel grid or Octree yield a different number of occupied cells depending on the distribution and resolution of the cells (voxels in the voxel grid and leaves in Octree-based methods).
Most 3D point cloud mapping algorithms typically use the spatial organization of the points to encode them in a structure like an Octree to reduce the amount of information. An Octree is a tree data structure in which each internal node has exactly eight children. Octrees partition the three-dimensional space by recursively subdividing it into eight octants. The process starts from a user-specified volume or from the bounding box of the input set. Then, each node or cell is subdivided into eight child nodes until a certain condition is reached. These conditions vary depending on the problem or the Octree implementation. A commonly used condition is to stop producing new child nodes when the volume or size of the corresponding cell reaches the desired precision.
One of the main features of the Octree representation is that nodes not containing input space points are not subdivided, and therefore those leaf nodes represent an empty volume of the space. This feature is useful for some mobile robotics applications such as robot navigation. There exist different approaches to select the representative point of the occupied nodes. A simple one is to take the center of the node cell, but using the mean or centroid of the points inside the cell improves the preservation of the topology. This approach offers better results, but it has a higher computational and memory cost.
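A minimal sketch of this subdivision scheme with the centroid-based representative selection just described might look as follows. The function name and the cubic root cell are our choices; a production Octree (e.g., PCL's) stores the full tree rather than just the leaf centroids.

```python
def octree_representatives(points, min_size, bbox=None):
    """Illustrative Octree reduction: recursively split a cubic cell into
    eight octants, stop when the cell edge length reaches min_size, and
    emit the centroid of each occupied leaf. Empty octants are never
    subdivided, so they implicitly represent empty space."""
    if not points:
        return []
    if bbox is None:
        # root cell: cubic bounding box of the input set
        lo = [min(p[i] for p in points) for i in range(3)]
        hi = [max(p[i] for p in points) for i in range(3)]
        side = max(hi[i] - lo[i] for i in range(3)) or min_size
        bbox = (lo, side)
    lo, side = bbox
    if side <= min_size:  # desired precision reached: centroid representative
        n = len(points)
        return [tuple(sum(p[i] for p in points) / n for i in range(3))]
    half = side / 2.0
    buckets = {}  # octant index (0 or 1 per axis) -> points inside it
    for p in points:
        key = tuple(min(max(int((p[i] - lo[i]) // half), 0), 1)
                    for i in range(3))
        buckets.setdefault(key, []).append(p)
    reps = []
    for key, inside in buckets.items():
        clo = [lo[i] + key[i] * half for i in range(3)]
        reps += octree_representatives(inside, min_size, (clo, half))
    return reps
```

Replacing the centroid computation with the cell center would reproduce the cheaper, less topology-preserving variant mentioned above.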
The VG downsampling technique is based on sampling the input space using a grid of 3D voxels [
The VG method, like the Octree-based methods, presents the same problems as other subsampling techniques: it is not possible to define the final number of points that represent the surface, geometric information is lost due to the reduction of the points within a voxel, and it is sensitive to noisy input spaces.
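The bucketing at the core of the VG filter can be sketched in a few lines; `leaf` (the voxel edge length) is the only parameter, and centroids are used as representatives. The sketch also makes the first drawback concrete: the caller controls the resolution, not the final number of points.

```python
import math

def voxel_grid_downsample(points, leaf):
    """Voxel-grid downsampling sketch: bucket the points into cubic voxels
    of edge length `leaf` and replace each occupied voxel by the centroid
    of its points. The number of output points depends on how the data is
    distributed over the grid, not on the user."""
    voxels = {}
    for p in points:
        # integer voxel coordinates of the point
        key = tuple(math.floor(c / leaf) for c in p)
        voxels.setdefault(key, []).append(p)
    return [tuple(sum(c[i] for c in pts) / len(pts) for i in range(3))
            for pts in voxels.values()]
```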
In this subsection we briefly describe the main differences of the above-described methods. Figures
Partial point cloud example.
GNG representation example of the partial point cloud shown in Figure
Octree representation example of the partial point cloud shown in Figure
Voxel grid representation example of the partial point cloud shown in Figure
Both the voxel grid and Octree methods should provide similar results due to their final representation of the points. In a point cloud reduction application, the Octree takes its representatives from the leaf nodes, and if we use the same resolution as the voxel grid method, we get a similar division of the space into cubes or cells of the same dimension. The voxel grid method is the simplest and fastest reduction method, but it has none of the advantages of the Octree structure or the GNG model, such as neighbor searching facilities.
Figure
Two-dimensional examples of the three tested methods.
In this section we test the quality of adaptation of the three described methods. We first describe the data used in the experiments and then we analyze the results of the tested methods, both quantitatively and qualitatively.
To test the implemented scene mapping systems on room map scenarios, we used the TUM RGBD dataset [
This dataset contains 39 sequences recorded in two different scenarios. The fr1 datasets were recorded in a typical office environment (first scenario) and the fr2 datasets were recorded in a large industrial hall (second scenario). Figures
Example of the “fr1 360” ground-truth point cloud map.
Example of the “fr3 long” ground-truth point cloud map.
Table
Number of points of each ground-truth map dataset.
Dataset  Number of input points
fr1  1049739
fr1 desk  1952544
fr1 360  2357039
fr1 desk2  2751402
fr2  3492032
fr2 desk  5841800
fr3 long  1636623
As we previously mentioned, we are going to compare the proposed GNG adaptation with two data structures commonly used in the state of the art: Octree and voxel grid. The implementation of both methods is included in the Point Cloud Library (PCL), a large-scale open project [
We extensively tested the implemented methods using different numbers of representatives. Since the three tested methods reduce the amount of noise in the generated map, it is necessary to know the real distance from the selected representatives to the original input space. The following measure specifies how close the representations are to the original model.
A quantitative measure of the input space adaptation of the generated map is obtained by computing the mean error (ME) of the reduced map against sampled points (input space):
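Since the equation itself is not reproduced here, the following brute-force sketch follows one plausible reading of this measure, consistent with the description above: the ME averages, over all representatives, the Euclidean distance to the closest input-space point (in practice a k-d tree would replace the inner loop for large clouds).

```python
import math

def mean_error(representatives, input_points):
    """Mean error (ME) sketch: average distance from each representative
    of the reduced map to its closest sampled input-space point."""
    return sum(min(math.dist(r, p) for p in input_points)
               for r in representatives) / len(representatives)
```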
Figure
Closest neighbor distance mean errors of the different datasets.
It is important to point out again that the representative selection method used in this comparison is the one provided by the chosen implementations, although the Octree and voxel grid methods could use other representative selection strategies. The GNG adaptation shows the best results on all datasets. It is noticeable that the GNG obtains lower errors across different numbers of representatives, but as the number of representatives increases, the three methods converge to the same error.
In this subsection we qualitatively analyze the results obtained with the three different methods. Figure
(a) Original point cloud map. (b) Octree reduction. (c) Voxel grid reduction. (d) GNG representation.
Figure
Other examples of GNG representation with two additional maps. Left part is a zoomed detail.
With respect to computational cost, our method can feasibly be included in a modern system using general-purpose computing platforms. However, we designed in a previous work [
In Table
Runtimes and speedup of GPU versus CPU implementation for different GNG versions.
Neurons  Patterns  CPU runtime (s)  GPU speedup  GPU runtime (s)
5000  250  63  3×  21
12000  350  526  5×  105.2
18000  500  1448  6×  241.33
The experiments showed how the GNG is able to adapt its topology to represent the input map space. In [
Normal estimation methods are based on the analysis of the eigenvectors and eigenvalues of a covariance matrix created from the nearest neighbours, and they are very sensitive to noisy data. Therefore, we computed normals on raw and filtered point clouds in order to demonstrate how a simple 3D processing step like normal or curvature estimation is affected by the presence of noise.
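The eigen-analysis just described can be sketched without external libraries: extract the two dominant eigenvectors of the neighborhood covariance by power iteration with deflation, then take their cross product as the smallest-eigenvalue direction, i.e., the normal. The iteration counts and starting vectors below are arbitrary illustrative choices.

```python
import math

def estimate_normal(neighbors):
    """Covariance-based normal estimation sketch for one query point,
    given its nearest neighbors as (x, y, z) tuples."""
    n = len(neighbors)
    mean = [sum(p[i] for p in neighbors) / n for i in range(3)]
    d = [[p[i] - mean[i] for i in range(3)] for p in neighbors]
    # 3x3 covariance matrix of the neighborhood
    C = [[sum(q[i] * q[j] for q in d) / n for j in range(3)] for i in range(3)]
    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]
    def normalize(v):
        s = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / s for x in v]
    def power_iter(M, start, iters=300):
        v = normalize(start)
        for _ in range(iters):
            v = normalize(matvec(M, v))
        return v
    e1 = power_iter(C, [1.0, 0.7, 0.3])  # dominant tangent direction
    lam1 = sum(matvec(C, e1)[i] * e1[i] for i in range(3))
    # deflate the dominant component, then find the second direction
    C2 = [[C[i][j] - lam1 * e1[i] * e1[j] for j in range(3)] for i in range(3)]
    e2 = power_iter(C2, [0.2, 1.0, 0.5])
    # normal = cross product of the two in-surface directions
    normal = [e1[1] * e2[2] - e1[2] * e2[1],
              e1[2] * e2[0] - e1[0] * e2[2],
              e1[0] * e2[1] - e1[1] * e2[0]]
    return normalize(normal)
```

The sensitivity to noise mentioned above is visible here: noisy neighbors perturb the covariance matrix, and with it the smallest-eigenvalue direction.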
Figure
Normal estimation comparison. (a) Normal estimation on raw point cloud. (b) Normal estimation on filtered point cloud produced by the GNG method.
In order to test the keypoint detector/descriptor improvement, we applied a transformation estimation algorithm and compared the GNG results against the voxel grid representation and against the entire source point cloud. We used the available descriptors [
RMS deviation error (meters) obtained using different detector-descriptor combinations. Combinations are computed on the original point cloud (raw) and on different filtered point clouds using voxel grid and the proposed GNG method. The keypoint detector search radius is 0.05 meters. The feature extractor search radius is 0.02 meters.
Detector  Descriptor  GNG 5000  GNG 10000  GNG 20000  VG 5000  VG 10000  VG 20000  Raw (all points)
SIFT3D  FPFH  0.168  0.092  0.231  0.239  0.073  0.139  0.103
SIFT3D  CSHOT  0.052  0.037  0.019  0.063  0.07  0.037  0.039
SIFT3D  PFH  0.185  0.367  0.255  0.54  0.171  0.375  0.082
SIFT3D  PFHRGB  0.106  0.029  0.057  0.08  0.05  0.027  0.041
Harris3D  FPFH  0.151  0.114  0.079  0.404  0.088  0.18  0.128
Harris3D  CSHOT  0.085  0.046  0.052  0.038  0.033  0.069  0.066
Harris3D  PFH  0.109  0.177  0.153  0.305  0.097  0.469  0.144
Harris3D  PFHRGB  0.054  0.047  0.042  0.033  0.058  0.043  0.093
Tomasi3D  FPFH  0.049  0.054  0.27  0.383  0.127  0.148  0.659
Tomasi3D  CSHOT  0.043  0.023  0.063  0.022  0.047  0.046  0.049
Tomasi3D  PFH  0.189  0.308  0.319  0.067  0.258  0.123  0.765
Tomasi3D  PFHRGB  0.112  0.121  0.066  0.066  0.049  0.078  0.062
Noble3D  FPFH  0.143  0.186  0.199  0.387  0.188  0.289  0.114
Noble3D  CSHOT  0.098  0.052  0.063  0.085  0.059  0.048  0.06
Noble3D  PFH  0.273  0.127  0.239  0.073  0.244  0.188  0.099
Noble3D  PFHRGB  0.188  0.117  0.064  0.096  0.082  0.079  0.077
Lowe3D  FPFH  0.143  0.117  0.076  0.387  0.188  0.093  0.203
Lowe3D  CSHOT  0.065  0.052  0.062  0.04  0.059  0.107  0.067
Lowe3D  PFH  0.273  0.12  0.239  0.073  0.244  0.167  0.24
Lowe3D  PFHRGB  0.188  0.135  0.016  0.077  0.082  0.076  0.027
Curvature3D  FPFH  0.099  0.228  0.113  0.228  0.103  0.093  —
Curvature3D  CSHOT  0.048  0.032  0.022  0.033  0.042  0.05  —
Curvature3D  PFH  0.262  0.151  0.123  0.244  0.101  0.057  —
Curvature3D  PFHRGB  0.139  0.071  0.053  0.068  0.083  0.023  —
3D maps obtained from RGBD data are useful for robotics tasks, like robot navigation. However, this kind of map contains a huge amount of data, which must be reduced in order to process the map properly. In this paper, we have presented a method to represent and reduce 3D maps. Our method is based on a GNG neural network which is adapted to the 3D input space. The experiments carried out demonstrate the validity of our method, as it provides better adaptation than two of the most widely used methods for these tasks: voxel grid and Octree.
As future work, we propose to extend our method to provide a useful map for robot navigation. We also plan to provide the GNG with a way to revert the reduction or compression of the points by storing information in the neurons' neighborhood (color, point distribution, etc.).
The authors declare that there is no conflict of interest regarding the publication of this paper.
This work was partially funded by the Spanish Government grant DPI2013-40534-R. Experiments were made possible by a generous hardware donation from NVIDIA.