Monte Carlo Registration and Its Application with Autonomous Robots

and


Introduction
Pose estimation is needed in a lot of different robotic applications, such as object pose estimation for grasping or manipulating objects, mobile robot localization, or registration of submodels in 3D modeling.In many cases a streaming pose estimation is beneficial or mandatory.One example is autonomous object modeling: if an object is placed on a table or other objects are in the proximity, the bottom or occluded part cannot be modeled without repositioning the object.However, if the object is repositioned and a registration is performed with newly acquired data, the autonomous 3D modeling could continue in order to acquire a complete object model.A streaming pose estimation after object repositioning has various positive impacts on the entire approach.First, time is saved as the pose estimation is readily available after data acquisition.Second, streaming pose estimation has the potential of reporting convergence or failure during data acquisition, enabling an autonomous reaction of the robot, for example, switching to modeling or rescanning.Third, after convergence, acquired data can be passed on-the-fly to modeling modules, resulting in a seamless transition from pose estimation to modeling.
Existing methods do not satisfy the requirements for onthe-fly global pose estimation.On the one hand, particle filters for mobile robot localization are able to keep pace with data acquisition.Unfortunately, they typically work either locally, are specialized to a certain sensor type, or cope only with 2D pose estimation.On the other hand, many global pose estimation methods cannot be adopted to work streamingly.Recently, we tried to fill this gap by introducing a particle filter registration [1] and adapting it to streaming pose estimation [2].The work is aimed at applying in autonomous 3D modeling of unknown objects with laser stripers [3].
In this paper, we review these Monte Carlo methods and their performance.We focus especially on streaming registration, meaning that the pose is estimated on-the-fly during data acquisition.For autonomous object modeling, we propose to combine the registration method with the approach presented in [4] for creating complete object models.Further, we show various use cases, such as in robot assisted teleoperation, allowing for partial autonomy when grasping a power screwdriver as can be seen in Figure 1.
Compared to our previous works [1][2][3], new and extended experiments especially in autonomous 3D modeling and mobile robot localization are shown (Sections 8.3 and 8.4).We enhance the streaming registration by providing a convergence criterion.Moreover, the influence of the sampling distribution, the introduced pose optimization during particle weighting, and the convergence criterion are investigated in detail.The main contributions in contrast to our previous works are (i) investigation of the influence of the proposed optimization step frequency on the robustness, reliability, and accuracy of results; (ii) investigation of the convergence behavior of the optimization frequency in comparison to the previous variants; (iii) definition of a convergence criterion; (iv) investigation of its influence on the robustness, reliability, and accuracy of results; (v) application in mobile robotics and comparison to a standard particle filter.
The remainder of this work is organized as follows.In Section 2, the related work concerning pose estimation and autonomous object modeling is reviewed.Then, in Section 3, the scalar features utilized in this work are introduced.Afterwards, particle filters are shortly reviewed and their application to registration is derived (Section 4).In the subsequent two sections, the sampling (Section 5) and scoring (Section 6) of particles are detailed.In Section 7, a feasibility study with the basic Monte Carlo method for global registration is performed.In Section 8, real experiments with the streaming Monte Carlo registration and its results in the autonomous object modeling and mobile robot localization scenario are presented.The work is completed by a conclusion and a perspective on future work (Section 9).

Related Work
Registration is a crucial part of many technical applications such as reverse engineering, rapid prototyping, or manipulation tasks in manufacturing processes.It denotes the estimation of an unknown rigid motion between two 3D models of the same object.
Registration methods can be partitioned into fine (or local) and coarse (or global) ones.Fine methods require a good initial guess of the sought-after transformation.Coarse methods try to estimate the true rigid motion within a global search space.Both are reviewed in the following.For fine methods only a short overview including its application to mobile robot localization is given, as coarse methods are more closely related.Further, related work on real systems which autonomously model unknown objects is provided.

Fine Registration Methods.
The iterative closest point (ICP) algorithm [5] is notably the most important representative of local methods (see [6] for an overview or [7] for a generalization).However, specialized methods have been proposed recently for RGB-D cameras [8][9][10] and image features [11,12].A further local technique related to our work is tracking with particle filters [13,14], which is specialized for fast local object movement estimation.Choi and Christensen [10] contribute a RGB-D object tracking approach implemented on a GPU and give an excellent general overview of particle filters in tracking problems, including edge based tracking, contour tracking, SLAM, and RGB-D based pose estimation.Therefore, the reader is referred to that work and the references cited therein for related tracking methods.
Localization and SLAM of mobile systems are also closely related but often specialized for local methods or methods restricted to 2D and could be categorized in between registration and tracking.A comprehensive overview of mobile robotics approaches is given by Sturm et al. [6].Often particle filters based on pure depth images are used [15] which are not suitable for the stated problem, as the weighting is time consuming.Moreover, our estimation problem is not in 2D, and thus we need a much larger number of particles than classic mobile robotics particle filters.Additionally, we have to cope with higher update rates.

Coarse Registration Methods.
In a large number of publications, contour and shape matching techniques are used, for which we refer the reader to the work of Ferrari et al. [16] and the literature cited therein.Note that we consider these techniques inappropriate for streaming pose estimation with laser stripers as they are slow and the contour or shape given by a few laser stripes is not very meaningful.
One of the most common methods in pose estimation is the random sampling consensus (RANSAC) introduced by Fischler and Bolles [17].Its application to registration has been shown by Chen and Hung [18].Commonly, subsets of points, point-normal pairs, or higher-dimensional features are sampled in the data sets to calculate unique rigid motions.Winkelbach [19] contributes an efficient way to sample pointnormal pairs in order to build transformations.Drost et al. [20] also use point-normal pairs and a variant of the Generalized Hough Transform as voting scheme.Hillenbrand [21] contributes a robust cluster search in a set of transformations which are calculated from samples of either point-triples or point-normal pairs.Rusu et al. [22] use the Fast Point Feature Histogram in order to assign correspondences and use a sample consensus method in order to maximize the 3D overlap.Unfortunately, adapting these RANSAC-based methods to work with streaming data is not possible because a uniform sampling of points cannot be achieved before all data is acquired.
Another group of algorithms tries to group correspondences [23] or exploit salient points [24].Aldoma et al. [23] evaluate various high-dimensional features for object recognition with a correspondence grouping method based on geometric consistency.A "center-star" variant of [24], followed by RANSAC-based filtering, yields a matching of similarly spaced point sets.In contrast to Rusu et al. [24] who search for salient points, points are sampled uniformly.The reduction to significant points is common.Gelfand et al. [25] reduce the data to a very small point set and apply a branch-and-bound algorithm to assign correspondences.The transformation is calculated by a least squares estimation.Cheng et al. [26] use a region growing algorithm to calculate feature areas.In order to find correct correspondences, they use a relaxation labeling method.Both last methods rely on finding the unique and correct correspondences of feature points or areas.Again, this class of algorithms cannot be adapted to work with streaming data, since global data sets are needed, especially for feature calculation.
Barequet and Sharir [27] introduce a method that is based on the unique decomposability of rigid motions into a rotation and a translation.They iteratively search a discrete space of rotations by clustering the corresponding translations and finding the most definite cluster as the best rotation.In a later paper [28], they modify the method to work with directed features, that is, feature points with surface normals.They build ⟨feature point, normal⟩ pairs to directly get possible rotations (with 1 free DOF).
A similar approach has been proposed by Tombari and Di Stefano [29].The features they use yield a complete reference frame, not only a sole surface normal (as in [28]).Therefore, a correspondence pair defines a rigid motion, in contrast to a set of pure rotations.However, the scoring/voting table is the same (up to a constant translation) as in [28].In order to calculate the best translation, Tombari and Di Stefano apply an ICP iteration to the correspondence pairs after voting.Barequet and Sharir use the found transformation directly (ICP can be applied to the whole data set afterwards).Tombari and Di Stefano also use their method for object detection and prove that it is more robust and reliable than other standard methods that use a pose space clustering or geometric consistency.
Glover et al. [30] use a Bayes filter for pose estimation, where the rotational part of the transformations is represented by a Bingham distribution and extend their approach for multiple object detection in cluttered scenes [31].
Rink et al. [1] reformulate Barequet and Sharir's approach as a particle filter and show that, in applications with very noisy data, relying on accurate surface normals or reproducible reference frames (as in [29]) can fail.Thus, scalar feature descriptors are proposed.Furthermore, a comparison to similar strategies [22,23,27,29] is given and the advantages of explicit integration of prior knowledge about the searched transformation are presented.In a subsequent paper Rink et al. [2] advance the idea of particle filtering with scalar features to streaming pose estimation, adapting the search space to the space of rigid body transformations and giving a theoretically sound weighting of particles.The streaming feature calculation in that approach is based on a streaming principal component analysis used for tangential plane estimation in streaming mesh construction, proposed originally by Bodenmüller [32].In a recent paper, Rink and Kriegel [3] optimize their method for the application in autonomous 3D modeling.An optimization in the particle weighting step is introduced and sampling pose particles according to a truncated Bingham/normal distribution is compared to uniform sampling.Further, they integrate the streaming pose estimation into an autonomous 3D modeling approach.

Autonomous Object Modeling.
In autonomous object modeling, usually a robot-sensor system is utilized to plan a Next-Best-View (NBV) in order to acquire a 3D model of an unknown object.Although the area of NBV planning has been widely explored [33,34], there is little research on real systems for autonomous object modeling.
For the purpose of modeling cultural heritage objects, Karaszewski et al. [35] present a measurement system comprising a turntable and a vertically moveable pedestal.First, they select areas in the boundary area as viewpoint candidates.Then, low point density areas are selected.The digitization time is several hours, even for small objects.Khalfaoui et al. [36] combine an industrial robot, a turntable, and a very large and expensive fringe projection system.In order to plan NBVs, they define barely visible surfaces as viewpoint candidates.They perform a mean shift clustering for NBV selection.In order to avoid viewpoints being very close to each other, they use a minimal distance criterion.In the resulting model, several holes remain.Torabi and Gupta [37] use a smaller robot with 2D range sensor and focus on the exploration, not on the object modeling.Vasquez-Gomez et al. [38] autonomously reconstruct unknown objects with a mobile manipulator and a Kinect sensor.In order to avoid collisions, NBVs are sampled in configuration space and evaluated in Cartesian space.
Kriegel et al. [4] combine an industrial robot with a laser striper and focus on high quality model acquisition.Therefore, they define a quality criterion and consider the surface quality during Next-Best-Scan (NBS) planning.None of these approaches are able to obtain the bottom part of an object.In [39] the last approach [4] is used to create a pose estimation data set.There, initially occluded parts are also modeled.However, the objects have to be repositioned manually about a defined axis quite perfectly because an ICP is used for registration.

Distinction.
In this work, we present Monte Carlo registration methods and apply it to different scenarios.The streaming registration works with pure depth measurements and is thus not specialized for a particular sensor.Notably it also works with laser stripers.It is a global method and works streamingly.To the best of our knowledge, there is so far no method satisfying these requirements.In addition to the results presented in [1][2][3], we perform a more indepth analysis of the theoretical basis for the presented method.Further, we review the Monte Carlo registration approaches with more details concerning the used sampling methods.We improve the original methods, especially by equipping the streaming pose estimation method with a convergence criterion and perform extended experiments.Moreover, we combine the registration method with the approach presented in [4] which focuses on the modeling and plans NBSs for a laser striper for creating complete object models.Thus, we extend the autonomous modeling system by enabling the acquisition of complete object models by arbitrary repositioning of objects, which has not been done so far.

Features
For a streaming calculation only local features can be used, since regional or global features are computationally too expensive.In the literature, various multidimensional features exist [22] that work well with a low or moderate level of noise.In this work scalar features are used for two reasons.On the one hand, if a small neighborhood radius is used for feature calculation, a high level of noise increases the number of false matches [23].This limits the advantage of the higher expressiveness over scalar features.On the other hand, with a large neighborhood radius, an iterative calculation is difficult to achieve because an exhaustive neighborhood search has to be performed.Moreover, the scalar curvature features used in this work already proved to be suitable for an iterative streaming calculation; see [32] for an application of some of the features as a quality measure for streaming surface normal estimation.
Point clouds and triangle meshes are common representations for 3D sensor data and can be directly computed from range images or from streams [32].Examples for angular features working on different data types are given by Barequet and Sharir [27].However, as special data structures are presumed there, the proposed mean angles for meshes cannot be used for the situation considered here.In the following we present some applicable features for homogeneous triangle meshes and point clouds and show how to compute them streamingly.Note that every feature point  = (  ,   , V  ) ∈ R 3 × S 2 × R comprises coordinates   , a surface normal   (S 2 being the unit sphere), and a feature value V  .In this work we use two types of neighborhoods for the calculation of features, depending on the data structure that is used.A polygon mesh contains a set of vertices V, a set of edges E, and a set of polygons.Each edge  ∈ E is defined by two vertices V 1 , V 2 ∈ V, denoted by ⟨V 1 , V 2 ⟩.When dealing with homogeneous polygon meshes, it is reasonable to define the neighborhood of a vertex  ∈ V as all points that are connected with  via  edges.It can be defined recursively by  0 () = {} and

Normal Cosines in
Figure 2 shows the MNC, the MaNC, and the MiNC of a triangle mesh of a wooden workpiece and their histograms.Note that points containing border vertices in their neighborhood are excluded from the feature calculation (depicted in white) because a robust feature calculation is not possible for this case.Since holes especially arise in high curvature areas, significant features could be incorrectly computed there.Note that in this case synthetic point data is used for mesh and feature generation.In nontechnical data sets, the distribution of the features is often closer to a normal distribution than in this example.
If point clouds are used or the polygon meshes are not homogeneous (i.e., edge lengths vary strongly), we use ball neighborhoods.For a point cloud  and a point  ∈  let The normal cosine used in practice depends on the objects that are expected.Flat objects like metal sheets do not yield relevant curvature features, disabling a robust pose estimation.In such cases we use the feature value  2 / 3 , denoted as EVQ23.This feature characterizes border points of the object (see Figure 13 for an example).

Streaming Feature Calculation.
Feature-based streaming pose estimation requires streaming feature calculation which is implemented by a processing pipeline comprising three stages: the density limitation, the normal estimation, and the feature generation step.The depth points coming from a realtime data stream have to pass a limitation test in order to be inserted into the model: each newly acquired point that is closer than a distance   to any point already inserted into the model is rejected.Thus, the computational effort can be controlled because the entire Euclidean point density of the model is limited.For each point passing the density limitation, a surface normal is estimated using principal component analysis for all points within a ball neighborhood with radius   .Only points with a robust normal estimation (see [32] for details) are transferred to the subsequent feature generation step.

Eigenvalues.
At the end of the normal estimation stage, the eigenvalues of the point neighborhood covariance matrix are readily available from the principal component analysis.Thus, the streaming feature calculation for EVQ13 and EVQ23 is straightforward: if a stable normal is ready, the corresponding feature point is calculated Besides the correspondence assignment features can be used to reduce the data to significant points.On the one hand this leads to lower computational costs.On the other hand correspondence assignment is concentrated on characteristic areas of the object, reducing the number of false positive correspondences.Therefore, the feature values are first summarized in a histogram.Then the histogram's bins can be removed iteratively.Either the biggest bin (see Figure 3) or the leftmost/rightmost bin is removed until only a given percentage of the original data remains.Alternatively, it can be useful to keep some points from the biggest histogram bins, for example, when dealing with great flat areas.Then, the smallest bins are kept and from the remaining bins a certain fraction is sampled so that the desired overall fraction remains in the end.Figure 3 shows the differences in these reduction strategies for the MNC.In ((a), (b), (c), (d)), the remaining feature points are depicted.In ((e), (f), (g), (h)), the corresponding histograms of feature values are shown.Clearly, the removal of the biggest bins results in remaining convex and concave extrema.The removal of the leftmost/rightmost bins results in remaining convex/concave extrema.In this case, weighted random reduction in the biggest bins results in evenly sized bins for the most frequent feature values.
In order to assign correspondences, we use discretized feature values, so-called feature classes, conferring some robustness to noisy sensor data.Given a number  of feature classes, the classes are defined uniquely by the maximum feature  max , the minimum feature  min , and an equal bin width  = ( max − min )/.In this work, values of  = 7,  = 5, or  = 3 are used.Then, every feature point of one data set corresponds to every feature point with a feature of the same class in the other data set.This implicates a normalization and allows the data sets to be measured with different sensors.3.5.Features in Time-of-Flight Camera Data.Robust feature calculation from noisy Time-of-Flight camera (ToF cam) data is a special challenge.Figure 4 illustrates the noise with an exemplary depth image of a power screwdriver.In such cases, the selection of key points relying purely on geometry fails.Though, we take advantage of the additional intensity image provided by ToF cams because optical and geometric edges often coincide in the data of technical products.In this particular use case, we search features on the handle of the screwdriver.On the CAD data template, the fine geometric features can easily be detected; see Figure 5.In the real ToF cam depth image, these geometric edges cannot be found.However, the intensity gradients in the intensity image include the searched edge at the handle of the power screwdriver.As they also include the transition between the shape of the screwdriver and the background, edges in the range image must be subtracted.Therefore, we use an image filter combination consisting of dilation, blurring, and a Canny edge detector on the intensity image as well as on the depth image to acquire the edges (for filtering the software OpenCV 2.4 is used).Finally, the depth edge image is subtracted from the intensity edge image, as shown in Figure 6.In total, exactly the geometric edges we sought for are extracted.Section 7.2.2 shows an application and more details.

Monte Carlo Registration
In this section, we introduce the basic Monte Carlo registration methods.First, we review the general particle filter and its specialization for registration.Then, the proposed methods are derived.

Particle Filter.
For a detailed description of particle filters and their applications the reader is referred to the literature [10,[13][14][15].Here, just the notations used in subsequent sections are given and the most important points are reviewed.
Let  be a random variable and let ( | ) be the probability density function (pdf) of  conditioned on some parameter .Let further () be the a priori pdf of the parameter .According to Bayes' rule, after some observation  of , the posterior of  is given by with ∝ being the symbol for proportionality.If repeated measurements of  are carried out, Bayes' rule can be applied iteratively (if the measurements are statistically independent).Suppose that the state of the parameter  changes between the observations   and  +1 of  according to a transition function   : where   (⋅) describes a systematic change with an error    that follows the pdf    .The transition   is changing over time in many applications.Moreover,    often depends on   .Therefore, the pdf    is changing, too.In the case of registration,   as well as    can be assumed to be constant.The assumption of first-order Markov chains and the independence of  +1 from   yields Now let the pdf of   be represented by   particles: with sampled states    and corresponding weights    .After a transition according to   and a new observation of , new particles are sampled according to (5).Note that sampling from the pdf (   ) has to be possible, which is not generally the case.Therefore,    is typically assumed to be distributed normally or uniformly.The new particles are each weighted with Finally,  +1 particles are resampled according to these weights.Thus, particle filtering can be summarized as follows: (1) Initialize ( = 0) the  0 particles

Monte Carlo Registration.
In the case of Monte Carlo registration, the particles can either be sampled in the space of rotations [1] or rigid body transformations [2].Let T be the corresponding state space in either case.Further,   ∈ T denotes the unknown transformation between two models  and  of the same object at time step .Then, every particle (1) sample pose particles initially; (2) weigh particles by (  | , ); (3) resample particles according to their weights; (4) optionally: adapt some parameters; (5) sample particles in neighborhoods of existing particles; return to Step (2) if not converged.

MCR and SMCR.
The Monte Carlo registration method presented in [1] is denoted by MCR in this paper.It works offline and uses all feature points in every weighting step and the particles are rotations.The particle weighting is done by a heuristic cluster evaluation.The streaming Monte Carlo registration method presented in [2] is denoted by SMCR in this paper.It works streamingly and can use either new incoming feature points or all accumulated feature points in the weighting steps.The particles are rigid body transformations.The particle weighting is done with respect to a truncated normal distribution.For SMCR, the number of particles, the sampling radii, and the radius for the score function from Section 6 are adapted in each Step (4) of Section 4.2.Each of these parameters has a maximum value and is reduced by a factor of 0.8 in Step (4), until a minimum value is reached.

Sampling Pose Particles
Initially the particles are sampled uniformly in a region according to the prior knowledge.In the subsequent iterations only neighborhood sampling is performed.This is done either with truncated uniform distributions or truncated normal/Bingham distributions.The radius of the truncated neighborhood can be adapted with convergence over time.
As sampling of translations uniformly or normally in neighborhoods is trivial, only the sampling of rotations is detailed here.Rigid body transformations are sampled by sampling the rotational and translational parts separately.

Sampling Unit Vectors Uniformly.
A prerequisite for our approach to sample rotations uniformly is to sample uniformly distributed points in neighborhoods on the unit sphere S 2 in R 3 .Sampling a point on the unit sphere can be achieved by the following steps: According to the transformation theorem for densities,  will be uniformly distributed on the unit sphere [44].
Let the -neighborhood   () of a vector  ∈ S 2 be defined as where  ∈ [0, ] .(10) Then, the above approach can be adapted to sample unit vectors in an -neighborhood of the first standard basis vector  1 with  ≤ /2.Just modify steps one and two as follows: (1) Sample V uniformly distributed in [− sin(); sin()].
(2) Sample  uniformly distributed in [−; ].This specialization is biased for  > /2.In this case simple rejection sampling can be used.

Sampling Rotations in Neighborhoods
Uniformly.In statistics, various strategies are common to choose priors that express ignorance about a parameter.In this work, uniform distributions are used for the initial sampling of transformations.The most common way to achieve a deterministic uniform sampling in a space is to build a homogeneous grid in that space.However, this leads to biased grids when dealing with the common representations of rotations.Sampling matrices straightforward in this manner is not even possible.There are sophisticated algorithms for deterministic uniform sampling of rotations [45][46][47], but unbiased deterministic sampling is not possible.
However, uniform sampling in a statistical sense can be achieved [48,49] and is advantageous compared to simple grid sampling on Euler angles [1].In the remainder of this section we detail our variant of Shoemake's method [49] to sample rotations uniformly in neighborhoods.Let R be the space of rotations in the remainder.The -neighborhood   () of a rotation  is defined as with (, R) being the rotational difference between two rotations , R, that is, the absolute value of the angle of the axis-angle representation of  ∘ R−1 .Let  1 ,  2 ,  3 be the basis vectors of the base coordinate system.In order to sample rotations on -neighborhoods, we represent a rotation by the transformed coordinate system ẽ1 , ẽ2 , ẽ3 , that is, the columns of the rotation matrix.Sampling rotations in a neighborhood   () of  is done by sampling a rotation   in   (id) and calculating R =  ∘   .In order to sample in   (id), we propose the following procedure (see Figure 7): (1) Sample ẽ1 in the -neighborhood of  1 .
(2) Calculate the vector (3) Calculate the vector Figure 7 illustrates these relations: V 2 is perpendicular to V 3 , which in turn is perpendicular to ẽ1 and  2 .Therefore, V 2 lies in the plane spanned by ẽ1 and  2 and thus V 2 is as close as possible to  2 conditioned on a fixed ẽ1 .This assures an overall rotation angle as small as possible before applying the random rotation around ẽ1 in Step (4).Rotating around ẽ1 moves V 2 onto ẽ2 .For  < /2, rotations around ẽ1 by an angle greater than  cause V 2 to leave the -neighborhood of  2 .For  ≥ /2, this is not necessarily the case.Together, this motivates Step (4) because each transformed basis vector ẽ must not lie outside the -neighborhood of the basis vector   if the resulting rotation is to lie in the -neighborhood of the identity.Rotations that lie outside the neighborhood are removed by Step (5). Figure 8(a) shows uniformly sampled rotations in an -neighborhood with  = 90 ∘ .The rotations are represented by the transformed unit vectors ẽ1 and ẽ3 , that is, the first and last columns of the corresponding rotation matrix.For better visibility,  2 was left out.

Truncated Bingham Sampling.
On the space of rotations, no normal distribution exists, though so-called projected Gaussians have been proposed [50].Most similar are special cases of Bingham distributions, which have already been used for pose estimation [30].We propose to sample from a truncated special case of a Bingham distribution, with rejection sampling.Shortly, after sampling a rotation  uniformly in an -neighborhood of a rotation R, we do rejection sampling with the weight exp((, R) 2 /2 2 ).This sampling strategy is easy to implement and if  2 is chosen appropriately (independent of ), the computation time is negligible compared to the weighting step of the particles.For better visibility, the normal distribution's variance is chosen as 18 ∘ in this example.In practice, we choose the standard deviation to be half the neighborhood radius.

Scoring Pose Particles
In this section, the scoring methods for rotations and rigid body transformations are presented.Scoring of a rotation is done by a cluster evaluation on the set of corresponding possible translations.This voting scheme is very similar to the generalized Hough transform.Scoring of a rigid body transformation is done by evaluating a truncated normal pdf.
The former voting scheme is used in MCR and the latter in SMCR.

Scoring Rotations.
In order to score a rotation, it is first applied to the corresponding data set.Then, all translational differences between all corresponding points are calculated (correspondences between the data sets are defined by equal feature classes).These differences define the set of all possible translations for the considered rotation.In order to find the one rotation with the most clustered set of such translations, two strategies are considered in this work.The first is to store the translations in a three-dimensional voting table and use the maximum number of elements in one bin as score.This method will be denoted in table in the remainder and is detailed in [27][28][29].The second is to store the translations in a voxel space and use the maximum number of neighbors in a ball neighborhood as score; see Figure 9.This scoring method will be denoted as nb and was originally presented in [1].The pdf (  | , ) (see Section 4.2) cannot be used as score function, since it is not known and there is no reasonable assumption about it.

Scoring Rigid Body
Transformations.The scores of the particles representing rigid body transformations are calculated according to a truncated normal distribution of the feature point coordinates, conditional on the pose.Let ,  be the feature point sets of the template and the incoming data, correspondingly.Further, let the features be classified; that is, V  is a discrete category for every  ∈  (and correspondingly for ).Every particle describes a rigid body transformation  = (, ), defined by a rotation  and a translation .In the following,  ∈  is corresponding to  ∈ .For a known transformation  between data and template, it is reasonable to assume that   follows a normal distribution with expectation (  ) and a covariance matrix Σ =  2 ⋅ id.
If the errors are identically and independently distributed and a set of feature points p = { 1 , . . .,   } and a set of correspondences q = { 1 , . . .,   } are given, the conditional pdf of all feature point locations is The corresponding   are approximated by the nearest feature point to   with the same feature class: For a  with erroneous feature class, no correct corresponding  will be found in the template.The best we can do in this case is to assume a uniform distribution of the feature point location, conditional on a wrong corresponding .For some distance threshold  max we define and adopt (13) to which corresponds to (truncated) normally distributed errors if the correspondences are found within a radius  max and uniformly distributed errors if not (with the density equal to that of the truncated normal at its boundary).This is actually an improper pdf because its integral is unbounded.Nevertheless, sampling importance resampling [51] is possible with it.Let therefore   be the incoming set of feature points at time step .The results in Section 8.2 will show that  =   leads to poor convergence behavior.The convergence can be improved by using all previously measured feature points for ; that is,  fl ⋃  =0   .In order to distinguish the two scoring variants, we call the former streaming particle filter registration (SPFR) and the latter streaming Monte Carlo registration (SMCR) in the remainder.As shown in [2,3] and reviewed in Section 8.2, SMCR has a better convergence behavior.Note that the offline variant MCR for global registration always uses all feature points.

Optimization.
If a   with   ≤  max according to (14) and ( 15) is found, it defines a correspondence to   .Thus, each particle yields a set of correspondences in the weighting step.In order to correct the pose particle, these correspondences can be used for an ICP iteration.If such an optimization step is applied, the method is denoted by SMCRO in the remainder.An additional subscript defines the frequency of such an optimization; for example, SMCRO 5 denotes an optimization in every fifth update.

Convergence Criterion.
One advantage of streaming pose estimation is the possibility of stopping data acquisition as soon as the estimation converged.For this purpose a convergence criterion for SMCR is introduced.In every update step we calculate the rotational and translational difference of the highest rated transformation to the highest rated transformation from the last step.If the differences are below some thresholds   and   in five consecutive steps, the calculation is aborted.We combine the optimization in every step with this criterion and denote the method by SMCRC.

Feasibility Study with MCR
In this section, we briefly summarize the most important results that are obtained with MCR.For a more detailed review, the reader is referred to [1].The results serve as feasibility study on the question, whether this kind of particle filter can compete with the state-of-the-art registration methods.Otherwise, the effort of developing a streaming variant would not be justified.

Validation with Artificial Data.
For the validation with artificial data, two submeshes of the well-known Stanford Bunny are extracted; see Figure 10.The computing time , the mean rotational error , and the percentage of successful estimations sr are investigated.A successful estimation is defined by a rotational error less than 20 ∘ and reflects robustness as such errors can be equalized by ICP.All experiments in this section were performed by 1000 test runs.In each of these test runs, the underlying rotation was chosen randomly.
First, three different sampling strategies are examined.The first is the original one [27], denoted by det in the following.There, rotations are represented by Euler angles and sampled initially on a homogeneous grid with resolution  0 .In each further step only the best rotation is kept and neighboring rotations are resampled on a local grid with 27 grid points, including the currently best rotation as center point.Subsequently, two stages are carried out: a coarse neighborhood search and a fine neighborhood search.The initial grid resolutions of the coarse and fine search are denoted by  1 and  2 in the following.Both in the coarse and in the fine search the grid resolution is adapted depending on whether a better rotation has been found in the last step or not.If a better rotation has been found in the coarse search, the local grid sampling is repeated around that rotation with the initial coarse resolution  1 ; otherwise the local search The second sampling strategy (denoted by randdet) is a mixture of discrete and random sampling: the initial samples are drawn as before.The neighborhood search is performed by randomly sampling 27 new rotations in the neighborhood (with adapted radius as before) of the best rotation of the previous step.
The third method is denoted by rand and is an important resampling approach: in every step samples are drawn randomly and resampled according to the scores.Therefore, not only in the neighborhood of the best rotation new samples are drawn, but in the neighborhood of all rotations.In order to assure comparability, initially the same number of rotations is sampled as in the first method.This number is defined by the initial resolution  0 .In each further step, the number of samples and the neighborhood radius are reduced by a factor of 0.5.
As convergence to the correct rotation is assumed, the resolution in the scoring (the bin width of table or the ball radius of nb in Section 6.1) is also adapted for each sampling stage.Sampling with det comprises three stages: an initial, a coarse, and a fine search.The bin widths of table in these stages are denoted as  0 ,  1 ,  2 , respectively, and are decreasing: The results concerning sampling and scoring strategies are depicted in Tables 1 and 2 and can be summarized as follows: if the particle number is chosen properly, the proposed sampling rand outperforms the original method det concerning robustness and accuracy.The proposed scoring method nb yields slightly better results in accuracy and robustness compared to the original table.Though a computationally expensive neighborhood search in an octree has to be performed and thus computation time is much higher compared to computing a discretized 3D vote.Therefore, nb can only be recommended if computation time is irrelevant.Therefore, table and rand are used in the remainder.
Finally, a comparison of MCR to alternative approaches implemented in PCL was performed.These are the Hough voting method of Tombari and Di Stefano [29], the Geometric Consistency (GC) approach used by Aldoma et al. [23], and the SAC-IA method of Rusu et al. [22].Figure 11 shows  violin plots [52] of the results, including two versions of SAC-IA, first with the default 1000 (1k) iterations, then with an increased number of 3000 (3k) iterations.Notably, SAC-IA always runs up to the maximum allowed number (1000 or 3000 in this test).It maximizes the overlap quality directly and therefore performs well with low-dimensional features that are not as descriptive.Typically an increased number of iterations is needed to outperform MCR.MCR and SAC-IA clearly outperformed both GC and Hough when using the scalar features.

Experiments with Real Data.
In the following, selected results with real data from different 3D sensors are presented, showing the effectiveness of MCR under hard conditions like small overlap areas and noisy data.

Registration with Laser Striper.
In the first experiments, we use the DLR Multisensory 3D Modeler [53] that comprises a laser-line projector and a stereocamera system, implementing a laser-stripe range sensor.It is attached to a 6 DOF industrial robot, the KUKA KR16-2.model after registration and remeshing on (e), with mapped textures on (f).

Registration of a Wooden Workpiece.
Registration of Steel Sheets.In this case, the problem is to map texture to a high quality 3D surface model.In order to overcome the sensor's problems with reflecting materials the object is sprayed with developer, losing the original texture information but gaining a high quality surface model.Afterwards a low quality surface model with correct texture (gained from the unsprayed object) is registered to the first model.Figure 13 shows (from (a) to (g)) the complete reference surface model, the feature points calculated from the point cloud (EVQ23, Section 3.2), and the reduced feature points (rightmost bins have been removed).In the middle the feature points before (d) and after (e) reduction of the second measurement are depicted.The texture information (belonging to the second measurement) from a monocamera shot and the result, that is, the reference model with mapped texture, are shown in Figures 13(f) and 13(g).

CAD Model Registration with ToF Camera Images.
Operators in telepresence systems (as in [54]) profit from semiautonomous functions, like grasping tools for manipulation.Here, we depict a use case where our method is employed to help the humanoid robot "Justin" grasp a power screwdriver, as shown in Figure 1.The pose estimation of the screwdriver is based on one frame of a SwissRanger SR4000 ToF cam fixed on one side of the torso.As the mounting is known, the screwdriver's position on a table is assumed to be known up to 10 cm.Further, two rotational degrees of freedom (DOF) are known up to a tolerance of 40 ∘ and 120 ∘ , respectively.In this setup the major challenge is the high noise of the ToF cam (Figure 4), which additionally produces an inaccurate extrinsic calibration.The template feature points for global registration are calculated from CAD data.As the features we want to extract define a gap at the handle, concave regions have to be extracted.Therefore, points on the surface polygons of the CAD data are sampled and remeshed initially.On the resulting homogeneous triangle mesh, the MaNC(3) is calculated and the leftmost bins of the feature histogram are removed according to Section 3.4, until 30% of the feature points remain.Figure 5 shows the template point cloud model and the calculated feature points.
From the incoming data, that is, one frame of the ToF cam, the feature points are extracted as outlined in Section 3.5.Prior to that, the following strategy is used to handle the camera data.Considering the roughly known pose, the acquired depth image of the whole scene and its corresponding intensity image are 3D-cropped to the surrounding of the table first.Then, the table top is removed from the depth image with the help of a plane detection.Finally, the edges described in Section 3.5 are extracted to be used as features; see Figure 6.
As feature values do not correspond between template and measured data, only one feature class comprising all points is used.This works well as long as two conditions are met.First, visual and geometric features have to coincide, which is often the case with technical products.Second, in order to keep the number of false matches low, there should be not too many different edges.
Due to the possible incorporation of prior knowledge into MCR, the initial sample consists of 549 rotations on a grid, where the two rotational DOFs are sampled in steps of 2 ∘ and 5 ∘ , respectively.The pose estimation with MCR is fine adjusted with a subsequent ICP.The robot fulfills its task fluently, as the overall computing time is between 1 and 5 seconds.An example of a successfully fitted template in the original depth image is shown in Figure 1.
In order to prove the methods competitiveness it is tested against alternative methods as in Section 7.1.Here, 1000 random poses are tested using MCR and the methods from PCL. Obtaining 1000 real scans with ground truth pose is problematic.Therefore, the model is aligned to a scan manually.In order to get 1000 different poses, 1000 random rotations are applied to it.Figure 14 shows violin plots of the resulting rotational errors.Summarizing, MCR outperforms the other methods clearly.Note that the seemingly good results of SAC-IA are only due to rejected estimations outside the prior knowledge region.Only in 1% of the cases an acceptable estimation is obtained.The mean computation times for SAC-IA, SAC-IA full, and MCR are 0.47 s, 3.54 s, and 0.14 s, respectively, with under 0.01 s added for MCR ICP (if ICP correctly converged).

Experiments with SMCR
In this section, the main experiments with SMCR are carried out on an industrial robot and the method is applied to autonomous object modeling.In the following, the utilized hardware and experimental setup are described.Then, the results for SMCR are discussed, followed by the integration of SMCR with autonomous object modeling and an evaluation of its performance.Further, the application in mobile robot localization is depicted.

System Setup.
Here, the utilized hardware, test objects, evaluation criteria, and parameters are described.

Hardware.
For the experiments a 6 DOF industrial robot, the KUKA KR16-2, with mounted laser striper is utilized (see Figure 15).For the KR16-2, the absolute positioning error is in millimeter range.The streaming Monte Carlo registration and the autonomous object modeling are run on an external computer with Quad Xeon W3520 2.67 GHz CPUs and 6 GB RAM as the KUKA Robot Control 4 (KRC4) is not designed for additional modules.The communication between KRC4 and the external PC is performed at 250 Hz using the KUKA Robot-Sensor Interface.The laser striper is a Micro-Epsilon ScanControl 2700-100 which obtains a stripe of 640 depth points in a range of 0.3 m to 0.6 m at 50 Hz with a maximum measuring error of approx.0.5 mm.

Test Objects and Data.
All experiments in Sections 8.1-8.3 are performed for three objects: a Zeus bust, a bunny, and a wooden chevron (see Figure 16(a)).These represent different application domains, namely, cultural heritage, household, and manufacturing.The approximate height of the Zeus bust is 22 cm and that of the bunny and chevron 18 cm.Figure 16(b) shows the calculated features for these objects and Figure 16(c) the autonomously acquired 3d models (see Section 8.3).

Evaluation Criteria and Parameters.
Similar to Section 7, the evaluations are done with respect to the median of rotational and translational error, denoted by   and   ,  respectively, and the success rate sr.Here, a success is defined, if the final error in translation and rotation is below 8 mm and 8 ∘ , which are tighter bounds than in Section 7.
As stated in Section 4, some parameters are adapted (by a factor of 0.8) in the iterative process.If not stated otherwise, we use a maximum number of 200 and a minimum number of 20 particles.The maximum scoring radius  max starts with 40 mm and is bounded from below by 4 mm.Neighborhood sampling of translations starts with a radius of 10 mm and is bounded by 1 mm.

Pose Estimation with SMCR.
In this section, first an overview of the different modules of SMCR and its integration are given.Then, the influence of prior knowledge is investigated.Finally, the concept of SMCR is verified by comparison to offline global methods.Details on the results can also be found in [2].Stream Processing, and the Particle Filter module, as depicted in Figure 17.
The 3D Data Acquisition module synchronizes the depth images from the laser scanner with the pose information from the robot [55].Therefore, the resulting depth images contain the sensor pose in robot coordinates, so that they need not be aligned to each other.
The depth image stream is handled to the Depth Image Stream Processing module.Here, the Depth Image Stream Switch component is dividing the stream again into a pose stream and a depth image stream.The pose stream is ending in the Sensor Movement Detection component, where the pose information is evaluated.If the translational or rotational difference to the last pose used for an update exceeds some predefined thresholds th  and th  , respectively, the sensor has moved significantly, and the particle filter is triggered to perform an update.
The depth image stream serves for feature point calculation.The depth images are converted to depth points and passed to the Stream Feature Estimator component (see

Data and Parameters.
The method is evaluated with 10 different scan paths of the Zeus bust, 8 scan paths of the bunny, and 5 scan paths of the chevron.The difference in scan path numbers is due to the different object shapes and the chevron's symmetry.In order to ensure independent results, the scan paths are placed all around the objects.Each test is repeated 100 times to achieve a meaningful number of test runs.The tests are performed once on the real robot and the scan data is saved, as it is not feasible to repeat the whole scanning process so often.Then, the repeated tests are performed on the saved data.
The scans are registered to surface models, which are generated with a commercial 3D modeling system in advance.Feature points calculated from these models are depicted in Figure 16(b).Each feature point set is classified with 5 classes and the middle class is removed.The reduced feature point sets serve as template models and are used for registration during the laser scans.The template models of the bunny, the chevron, and the Zeus bust consist of 6714, 13075, and 8771 feature points, respectively.The robot's pose error during a scan is usually negligible, concerning the quality of the acquired 3D models.Nevertheless, between two different scans, significant differences of robot configurations lead to considerable pose errors between the acquired 3D scan data.In between different scans, there typically occur gaps of up to 3 mm.Therefore, for each scan, a ground truth estimation is necessary because the resulting pose estimation accuracy is in the range of millimeters.
The ground truth estimations are calculated by utilizing MCR, followed by an ICP working on all acquired raw points.Correct results are assured by a visual inspection by a human operator.Moreover, the coordinate root mean square error after the ICP is checked to be lower than 0.2 mm.

Influence of Prior Knowledge.
The prior knowledge about the searched transformation is separated into a translational and a rotational part.The translational part is expressed in terms of a cuboid.The volume of that cuboid is denoted by () in this section.The rotational part is explained by a Additionally, the rotation axis can be assumed to be fixed in some cases, for example, if the object has been turned around the -axis.The maximal difference from the mean rotation is denoted   .If not stated otherwise, the initial translations are sampled in a cuboid with an extension of 16 cm × 10 cm × 3 cm, which corresponds to the approximate position on the scanning pedestal.
In order to examine the influence of the prior knowledge, one scan of the bunny is registered to a ground truth surface model, which was acquired with a commercial scanning system.Table 3 shows the results.The initial sampling radius for the rotation seems to have little effect on the success rate and on   /  : for both fixed () of 0.48 L and 1.22 L neither the success rate nor the errors are clearly increasing or decreasing with increasing radius.The volume of the cuboid for the initial translation has a clear influence if increased, as the last row shows.Both the success rate and the errors are significantly worse for a translational volume of 4.00 L compared to that of 0.48 L and 1.22 L. This confirms the suitability in autonomous 3D modeling because there is often good a priori knowledge about the position of the object, whereas the rotation cannot be told beforehand.In our scenario the rotation axis is known approximately, and often the rotation angle is restricted within a known range.

Comparisons with Offline
Methods.This section shows the comparison of SMCR to MCR and SAC-IA.We chose SAC-IA as reference because it performs best in the previous experiments in Section 7. In these experiments high quality scans are used.Therefore, SAC-IA yields the best results when applied to the complete point cloud (downsampled to a density of 3 mm) with the FPFH feature, a multidimensional feature [22], which has been used in these experiments.As SAC-IA is not taking any prior information into account, it needs to perform many trials, resulting in long computation times.Introducing prior information is possible, but the method slows down a lot by this, as discussed in Section 7.2.2.Table 4 shows the result for 100 runs with all scans.Again, SAC-IA works well with 3000 iterations (results can be improved by using even more trials, but we do not find that feasible).  ,   ,   denote the scans of the Zeus bust, the bunny, and the chevron, respectively.The highest success rate and lowest translational and rotational error are highlighted for each scan.Overall, MCR resulted in the lowest pose errors.However, MCR took up to 30 seconds and SAC-IA up to a few minutes in the worst case whereas SMCR did not require any additional computation time.
Figure 18 highlights one of the cases where both SMCR and MCR perform poorly but SAC-IA performs relatively well, in a violin plot in (a).The  6 scan captures a smooth surface at the back of the statue's head (on Figure 18(b)), which contains relatively few feature points.This results in larger errors of MCR and SMCR because they rely on local features.Contrary, SAC-IA uses all the points and FPFH and manages to find a good transformation in 44% of the cases.However, the distribution of the errors shows that while SAC-IA performs better, it fails completely in many cases, which influences the median not too much.In order to increase the performance of MCR and SMCR in such cases, more points could be considered.Though, more points lead to an increased computation time and introduces more uncertainty (due to a higher number of false matches).

SMCR in Autonomous Object
Modeling.In this section, first an overview is given on how the SMCR workflow (see Figure 17) is integrated into autonomous object modeling.Then, the convergence behavior and the performance of the different streaming registration variants and ICP are compared.Finally, the accuracy of autonomous modeling with integrated SMCR and repositioning is compared with the previous method.
Similar investigations can be found in [2,3].In contrast to those, here the particle optimization and the convergence behavior and the convergence criterion are investigated in detail.Basically, only the investigations of the uniform and Gaussian/Bingham sampling are covered by the previous publications.But for the sake of comparability with our new methods, we recomputed these tests as well in this work.

Overview.
In this section, the integration of SMCR into the autonomous 3D modeling approach (see [4]) is presented.As the SMCR workflow has been described in detail in Section 8.2.1, here we concentrate on the modeling part as can be seen in the overview in Figure 19.Initially, an arbitrary laser scan of the unknown object is performed with the robotsensor system (see Figure 15).The corresponding depth image stream contains the robot's pose information and is handled to three modules: Mesh Update, Probabilistic Space Update, and Feature Calculation.The features are calculated in realtime but will be used later for registration after the object has been repositioned.After updating the mesh and the probabilistic space, it is checked if the triangle mesh (surface model) has reached the desired quality, that is, if there are no holes except for not scannable parts.Then, Next-Best-Scan Planning, collision-free Motion Planning, and further laser scans are performed repeatedly, until the quality is reached.The mesh enables planning possible scan path candidates and selecting a Next-Best-Scan, in order to reach the desired surface model quality.The probabilistic space is a volumetric  For more details regarding the autonomous modeling and its modules we refer to [4].
As soon as the desired mesh quality has been reached, the object is repositioned in order to model previously occluded object parts.This is done by rotating it onto one of its sides.Then, a laser scan is performed along the region of interest.Again, the Feature Calculation is carried out on-thefly.Synchronously, the Particle Filter component iteratively performs a neighborhood sampling and weighting of the particles with the incoming feature points.After the laser scan has finished the pose estimation is instantly available.Finally, an ICP is used for fine registration, which results in a precise transformation between the original object position and the object position after repositioning.Then, the autonomous modeling is continued until a complete model is generated.In order to be able to model the object within the same coordinate system, all further generated laser scans are transformed by the resulting transformation from the registration.

SMCR and ICP with Partially Overlapping Submodels.
In order to show that local methods like ICP cannot be applied, we register two partially overlapping scans of the Zeus bust to each other, which poses a higher challenge than registering a submodel to a complete object model with low noise.For this purpose, we record 20 different pairs of overlapping scans (see Figure 20 for an example).To one of the scans in each pair we apply 10 different random transformations with translational and rotational variations.
The translation vectors have a maximal norm of 20 mm and the rotations a maximum rotation angle of 45 ∘ , 90 ∘ , 120 ∘ , or 180 ∘ around the -axis.The prior knowledge in SMCR is set accordingly, denoted with 45, 90, 120, 180, respectively in Table 5.  stands for the object Zeus and  presents the different methods described below.After the SMCR, we apply an ICP with a small search radius (20 mm) and few iterations for fine fitting, denoted by SMCR-ICP.We compare the results to a pure ICP with a bigger search radius (50 mm), simply denoted by ICP.For each rotation range we test the original Uniform neighborhood sampling , the proposed Gaussian sampling , and the Gaussian sampling with Optimization step , denoted by a capital , , or   in the data names, respectively.For instance, 90 denotes the case of uniformly sampled rotations with a maximal rotation angle of 90 ∘ for the Zeus bust, and 120 5 denotes the case of optimization in every 5th step (normal/Bingham sampling) and a maximal rotation angle of 120 ∘ for the bunny.In the table, the highest success rate and lowest translational and rotational error are highlighted for each rotation angle over all methods.Overall, the accuracy in translation and the success rate are increased for SMCR when applying the optimization step.The rotational accuracy is not generally increased.The effects appear in all rotation ranges.Concerning pure ICP, it becomes clear that, with increasing rotation range, its performance significantly decreases, both in accuracy and reliability.SMCR-ICP outperforms pure ICP clearly, both in reliability and accuracy.Further, the performance of pure SMCR is also significantly better than ICP.The frequency of the optimization steps has no clear effect on the results, neither on pure SMCR nor on the combination with ICP.Uniform sampling yields slightly better results than Bingham/Gaussian sampling.Success rates of about 0.6 of pure SMCR appear to be pretty small for two reasons: on the one hand, the parameters are not tuned for the data sets.On the other hand, the partially overlapping scans are harder to register as it seems at first glance.The most descriptive features are not easy to scan and very similar features are spread over the object.Moreover, the descriptive features appear more or less randomly in the data sets because the scan paths are chosen arbitrarily.The results in the preceding sections show that with this kind of data other state-of-the-art methods perform even worse, even when registering to a complete high-precision ground truth surface model.

Convergence Behavior for SPFR, SMCR, and SMCRO.
In this section, the convergence of SMCRO, SMCR, and SPFR is investigated.In order to account for the application in autonomous 3D modeling, we first autonomously acquire a more or less complete model of the bunny and the Zeus bust (without bottom part).Then, we reposition them on the scanning pedestal and acquire one scan manually.With this scan, 1000 repetitions for pose estimation are performed.
The ground truth estimation is calculated by utilizing MCR, followed by an ICP working on all acquired raw points of the ten subscans.Correctness is assured by visual inspection of a human operator.
Therefore, we repeat estimations for one scan path of the bunny and the Zeus bust 1000 times and calculate the medians of the translational and rotational errors in each update step, as depicted in Figure 21.In the case of SMCRO, the optimization is carried out in every step and every 2nd, every 5th, or every 10th step.Clearly, SMCR yields better convergence behavior than SPFR, which does not reach the success criterion at all.SMCRO in turn converges faster than SMCR, and the faster the more optimization steps are performed.Note that, at the update steps, the error medians often visibly drop down.
However, the optimization needs to be carried out with caution as, in individual cases, for too early or too many optimization steps the method may tend to converge to the wrong transformation, especially for objects with many symmetries.Therefore we started optimization not before the 5th step in any case.

SMCR and ICP in Autonomous
Modeling.In this section, an extensive evaluation of SPFR, SMCR, and SMCRO and comparison to ICP are performed.Therefore, a more or less complete model (without bottom part) of the object is autonomously acquired initially.Then, the object is repositioned on the scanning pedestal and 10 different single scans are performed manually.Each of the manual scans is transformed by ten different random transformations, giving a total of 100 different test scans for each object.These are registered to the corresponding previously acquired complete  model.The ground truth estimation is calculated by MCR and an ICP working on the raw points of the ten subscans.Visual inspection of a human operator assures the correctness.
Goodness of Fit and Success Rates.Table 6 shows the results of the experiments concerning rotational and translational error as well as the success rate.In contrast to the experiments in Section 8.3.2, the registration is performed based on the 3D model.Additionally we perform them for the bunny (denoted by a capital ) and the chevron () objects.
The results of the bunny and the Zeus bust clearly show that the proposed optimization step yields a higher accuracy and success rate.Moreover, in many cases, results get better with increasing optimization frequency.Finally, the errors SMCR itself performs good, but accuracy is not comparable to the ICP (if both are successful).The proposed Gaussian sampling does not lead to higher accuracy or success rates.The accuracy as well as the success rate is not much influenced in the example of the bunny.The opposite is true for the Zeus bust.The pure ICP is working surprisingly good, especially with the chevron, though it gets unusable for rotation angles over 45 ∘ for the bust.SMCR-ICP works extremely good, even in cases when SMCR yields problematic results.
Concerning the chevron, the rotation is estimated pretty good by all methods and the translation very bad espcially when allowing for large rotations.Note that the low median in the ICP results is misleading, as the success rates are pretty low, compared to the results with the Zeus bust and the bunny.An explanation could be that on the one hand the chevron has big flat surface areas which allow a robust estimation of rotations.On the other hand, this seems to allow the translation to slide along these areas, especially vertically along the triangular part.Additionally, there are a lot of spurious measurements, including the pedestal the object is placed on.
Convergence Criterion.The results obtained with the application of the convergence criterion (SMCRC) are denoted with a capital  as last letter in the table rows of Table 6; for example, 45 denotes the experiments with the Zeus bust for an angle of 45 ∘ and with abortion due to convergence.In the tests, we used convergence thresholds of   = 2 mm and   = 2 ∘ (see Section 6.2.3).Concerning the Zeus bust and the bunny, Table 6 shows that the results with SMCRC are comparable with those of SMCRO 2 , and after application of ICP comparable to SMCRO 1 .Further, SMCRC clearly outperforms the variants with few optimization steps (SMCRO 5 and SMCRO 10 ).Concerning the chevron, SMCRC yields clearly worse results than SMCRO.Probably this is due to the geometry of the chevron, where initially a large translational error still allows for a very good matching of wrong correspondences.

Autonomous Object Modeling.
Here, we compare the autonomous modeling results without repositioning the object as in [4] with the integration of the SMCR and repositioning of the object as presented in Figure 19.Therefore, the complete object modeling is performed 10 times each for the bunny, the Zeus bust, and the wooden chevron.For a comparison of the autonomous modeling method [4] with the other state-of-the-art methods concerning the algorithms for NBV planning, we refer to [56].It has been shown that the NBV approach which plans NBVs based on the boundaries of the surface models and considers information gain and surface quality for the NBV selection outperforms the other methods.
During these experiments, the object is manually placed onto its side after the desired quality for the visible object parts has been reached.For the 10 runs, different arbitrary initial scans and variations in the repositioning object orientation are chosen.The average model completeness and coordinate root mean square (CRMS) error when comparing with ground truth models are given in Table 7.The completeness is evaluated by comparing a ground truth model with the generated triangle mesh.The CRMS gives a measure for the model error which is influenced by the fact that details in the object are not modeled perfectly as can be seen in Figure 16(c).The error is mainly a result of sensor noise, sensor calibration, and robot accuracy which for the KUKA KR16-2 is in millimeter range.The completeness after repositioning is larger as the bottom parts have been filled.Figure 22 shows exemplarily for the Zeus bust how the bottom part is filled accurately with no major deviations due to the different object positions.The completeness still does not reach 100% which is due to the NBS planning which aborts based on a coverage estimation utilizing the current surface model which sometimes is noisy.However, these are just small holes which can easily be filled by a postprocessing technique.For the bunny and chevron, 100% is reached for some runs whereas for the Zeus bust a small part in the chin area below the beard could never be filled due to sensor restrictions as this area is very narrow.The CRMS shows that, due to object repositioning and SMCR, the model error does not increase.The CRMS is even slightly lower when the object is repositioned.One reason for this is probably due to the fact that along borders in the mesh larger errors occur due to incorrect matching (see Figure 22(a)).Further, the objects do not have many details on the bottom and thus the error is lower which influences the average error positively.

SMCRO in Mobile Robot
Localization.In order to compare SMCRO with a standard particle filter based on 3D depth images (see [15]), we set up a simulated kidnapped robot scenario, where a mobile robot is randomly placed in a predefined area at an unknown pose (, , ) ∈ R 2 × [−, ) in a given global 3D map as shown in Figure 23.The particle filter weights the particles with a likelihood representing independently and identically normally distributed errors for the depth values.Note that the map is 3D and the depth image simulation (for expected depth images in the weighting step) is also done in 3D, whereas the pose estimation is only in 2D.Our robot is moving omnidirectional on four wheels and is equipped with eight ToF cameras, each having a field of view of about 35 ∘ × 45 ∘ and a resolution of 48 × 64 pixels.This setup is chosen such to represent the KUKA OmniRob, equipped with eight O3D100 ToF cameras of ifm.The robot's task is to get "home" immediately, which can be divided into three subtasks: at first the robot actively localizes itself using the ToF cameras and a particle filter to get its current pose in the global map.After a successful localization the robot plans a piecewise linear path to reach the goal, containing about 20 waypoints.Finally the robot moves to the goal along the calculated path.For the odometry as well as the ToF data artificial noise is added.At every waypoint the robot corrects its odometrical pose using the ToF cameras to reduce the dead reckoning error.For correcting the pose, the standard particle filter is used.SMCRO is running in parallel, enabling a comparison of pose estimates, starting with the initial localization.
In order to achieve a better comparison the whole task has been repeated 40 times, resulting in a total number of 866 pose estimation steps.Figure 24(b) shows the errors of all these steps, starting with the first run.Clearly a periodic behavior is seen, which resembles the strong dependency of the pose estimation quality from the real pose in the environment.Note that the histograms (see Figure 24(a)) have been cut off on the right for better visibility.Both histograms and plots show that the accuracy of SMCRO outperforms that of the standard particle filter.The medians of translational and rotational errors are 76.4 mm and 0.94 ∘ for the standard particle filter and 47.6 mm and 0.26 ∘ for SMCRO.The maximum of translational and rotational errors are 637.5 mm and 13.8 ∘ for the standard particle filter and 340.5 mm and 4.4 ∘ for SMCRO.
Note that the mean computation times are 1.59 s for the standard particle filter and 2.1 s for SMCRO.In contrast to SMCRO, the standard particle filter has already been optimized for the mobile robot application.The number of beams from the depth images used for the updates are dependent on the number of particles in order to assure acceptable computation times.8.5.Discussion.The comparison with offline global methods yields no definite best method.But they clearly show that the method is competitive in accuracy and robustness with available state-of-the-art methods in many cases.The advantage of no extra computation time is clear, as even for these simple objects the offline methods need up to 2.5 minutes for getting similar results, whereas our method does not need any extra computation time.This effect will dramatically increase with larger objects, which could not be investigated with the hardware setup in this paper, due to kinematic constraints.The comparison between uniform and normal/Bingham sampling show that normal/Bingham sampling does neither improve accuracy nor robustness.The investigation of the different scoring variants (SPFR, SMCR, SMCRO, and SMCRC) shows that SMCR has a clearly better convergence behavior than SPFR.The optimization in SMCRO yields faster convergence, higher accuracy, and higher success rate.In many cases, it is advantageous and poses no problem to optimize in every weighting step.However, this has to be done carefully, as convergence to false transformations can occur.In the data sets of this paper, no delay in updating occurred when updating in every step.However, in bigger data sets, it could lead to problems concerning computation time.The given convergence criterion proved to work efficiently when combined with optimization, with only slightly lower accuracy and robustness.
For autonomous modeling with SMCR, the results show that almost complete 3D models including object parts which are not visible in the initial pose can be created.Further, the average model error when comparing to ground truth is not increased by the object repositioning and SMCR which shows that the pose estimation is performed accurately for all runs.
First experiments in mobile robot localization show that SMCR is able to achieve a significantly higher pose accuracy than a tuned standard particle filter with only slightly higher computation time which can easily be optimized.

Conclusion and Future Work
In this work, Monte Carlo registration methods have been presented and advanced.The offline particle filter variant searching in the space of rotations outperformed the state-ofthe-art algorithms, especially when prior knowledge is available.The proposed curvature features proved to be robust under sensor noise.For streaming application, the scoring of rotations is too time consuming.Thus, the space of rigid body transformations is searched in this case.Various real data experiments showed the competitiveness of the streaming variant.Thereby, convergence behavior and influence of prior knowledge have been investigated.The streaming variant has been enhanced with pose optimization and convergence criterion.The applicability in autonomous 3D modeling has been proven by various experiments with an industrial robot and a laser striper.The integration of the streaming registration into autonomous object modeling worked robustly and allowed for obtaining complete high quality 3D surface models of initially unknown objects.Finally, experiments in mobile robot localization showed that the straightforward application without any tuning yielded a higher accuracy than a standard particle filter (with comparable computation time).
Future work will focus on autonomous feedback of failure, in order to enable rescanning and detailed investigation of other convergence criteria.Furthermore, we want to apply the method during modeling of object scenes as presented in [56] where the template models contain less data as objects are occluded by others.Moreover, we want to apply the method to real mobile robots for localization and for modeling of larger indoor areas of buildings with big data sets.If data becomes too big to keep all feature points in memory, probably the most demanding challenge will be the combination with and the development of data structures that enable reloading and unloading afforded feature points for the weighting.

Figure 1 :
Figure 1: A humanoid robot (a) acquires a depth image of an object scene with a Time-of-Flight camera and registers the model of a power screwdriver ((c), (d), (e)) which allows for grasping the object (b).(c)-(e) depth values are color coded from red (close) to green (far), and the located model is shown in blue (points) and gray (CAD).

Figure 2 :
Figure 2: (a)-(d) The MNC, MaNC, MiNC, and EVQ13 of a wooden workpiece (high values are light red and low values are blue).(e)-(h) The corresponding histograms (most frequent value cut-off).

Figure 4 :
Figure 4: The noise in a ToF cam depth image prevents reliable geometric feature calculation.The power screwdriver is colored light blue.

Figure 5 :Figure 6 :
Figure 5: Point cloud models describing a screwdriver: (a) shows the template point cloud model (light blue) and the calculated feature points (red).(b) shows the cropped depth image with emphasized feature points (red) extracted from the ToF data (before plane deletion).

Figure 8 (
Figure 8(b)  shows truncated Bingham sampled rotations in a neighborhood of radius 90 ∘ .For better visibility, the normal distribution's variance is chosen as 18 ∘ in this example.In practice, we choose the standard deviation to be half the neighborhood radius.

Figure 9 :
Figure 9: Scoring rotations: the maximum number of translations in a ball neighborhood in the set of translations defines a score for the clusteredness.

Figure 10 :
Figure 10: The two data sets obtained by selecting parts of the Stanford Bunny.

Figure 11 :
Figure 11: The rotational errors for the Stanford Bunny, comparison with methods available in PCL.A standard boxplot from python's matplotlib-package is overlaid with a density plot, in order to capture the multimodal distribution of the errors.
Figures 12(a), 12(b), 12(c), and 12(d) show the feature points and the reduced feature points used to match two scans of a wooden workpiece.Figures 12(e) and 12(f) show the result: the complete surface

Figure 12 :Figure 13 :
Figure 12: Modeling a wooden workpiece.Two sets of feature points ((a) and (b)), the reduced feature points ((c) and (d)), and the result after registration ((e) and (f)): the remeshed model (e) and the whole model with textures (f).

Figure 14 :
Figure 14: Violin plots of the rotational errors for the screwdriver compared to methods from PCL.Note that the Hough voting method never finds a solution, GC only in around 66% of the cases, and SAC-IA in around 1% of the cases (or less, when all points are used).MCR gives a result in all 1000 runs.

Figure 15 :
Figure 15: An industrial robot with attached laser striper performs a scan of an object placed onto a pedestal.The features are calculated in a real-time stream and are used to estimate the object's pose.

Figure 16 :
Figure 16: Zeus bust, bunny, and wooden chevron object (from left to right): the test objects (a) with corresponding template models (b) used for the experiments and surface models (c) acquired during autonomous object modeling (see Section 8.3.5).Colors in the template model describe the different feature classes: light/dark red occur in convex regions, blue/purple in concave regions (plane regions are removed).

8. 2 . 1 .Figure 17 :
Figure 17: The SMCR workflow is divided into three main modules: the 3D Data Acquisition, the Depth Image Stream Processing, and the Particle Filter.Each module contains different components.

Section 3 )
. The resulting feature point stream is handled to the Stream Feature Classifier component, which classifies the features according to the class borders of the template.Finally, the feature point stream is collected in the Feature Point Collector component from which the Particle Weighter component acquires the latest feature points on demand.The Particle Filter module itself contains the Particle Sampler, the Particle Weighter, and the Particle Resampler components and works as described in Section 4. The Particle Sampler component starts sampling the particles when it is notified by the Sensor Movement Detection component.It performs a neighborhood sampling of the transformation particles (see Section 5).When finished, the Particle Weighter component acquires the latest feature points from the Feature Point Collector component.The particle weighting is carried out according to Section 6.After weighting, the particles are resampled with an importance resampling step by the Particle Resampler.

Figure 18 :
Figure 18: (a) Violin plots of exemplary translational errors in mm and rotational errors in degree for SMCR (left) and SAC-IA (right) for scan  6 .(b) Scan  6 (green) on ground truth model of Zeus.

Figure 20 :
Figure 20: Two typical scans of Zeus bust before ((a) and (b)) and after alignment (c).

Figure 21 :
Figure21: Exemplary error convergence of SPFR (red), SMCR (green), and SMCRO (blue) for 1000 runs on a bunny (top) and a Zeus (bottom) scan.Optimization is performed in every (solid) step and every 5th (dashed) and every 10th (dotted) step.Left: translational error in mm.Right: rotational error in degree.-axis: step number.The black horizontal line represents the success threshold of 8 degrees or 8 mm.

Figure 22 :
Figure 22: 3D model of Zeus bust from bottom view without (a) and with repositioning the object and performing SMCR (b).

Figure 23 :
Figure 23: Kidnapped robot problem: a robot is randomly placed in a predefined area (green) in a known map.After self-localization the robot plans a path (yellow) to its goal.
Errors of all update steps of all iterations, sorted by runs

Figure 24 :
Figure 24: Comparison between SMCRO (blue) and a standard particle (red) based on depth images.In the plots the errors are sorted by runs, resulting in periodic effects.
Polygon Meshes and Point Clouds.Let in the following  be a point with surface normal   and neighborhood ().For a  ∈ () \ {} we define

Table 1 :
Mean computing time , mean rotational error  in degree, and number of successful estimations in percent ().Entries are for sampling types det/randdet/rand., the coarse search is aborted.Then, the fine search is started with the initial resolution  2 .If a better rotation can be found, the local grid sampling is repeated around that rotation with the initial fine resolution  2 ; otherwise the current resolution is cut in half and the local search is repeated.If the resolution falls below a minimum, the search is finished.

Table 2 :
Mean computing time , mean rotational error  in degree, and number of successful estimations in percent.Entries are for scoring methods table and nb.

Table 3 :
sr,   , and   for different initial a priori knowledge.Fifth row: only rotations about the -axis are sampled.

Table 4 :
100 runs: sr,   , and   and the mean computation time  for SMCR, MCR, and SAC-IA.
Overview of the integration of autonomous object modeling with streaming Monte Carlo registration.Gray boxes represent modules, white ovals robot-sensor system actions, and blue diamond boxes decisions.

Table 7 :
Comparison of modeling results without and with repositioning using SMCR (average of 10 runs).