3D Point Cloud Simplification Based on k-Nearest Neighbor and Clustering

While the reconstruction of 3D objects is increasingly used today, the simplification of 3D point clouds has become a substantial phase of the reconstruction process. This is due to the huge amounts of dense 3D point cloud data produced by 3D scanning devices. In this paper, a new approach is proposed to simplify 3D point clouds based on k-nearest neighbor (k-NN) and a clustering algorithm. Initially, the 3D point cloud is divided into clusters using the k-means algorithm. Then, an entropy estimation is performed for each cluster to remove the ones that have minimal entropy. MATLAB is used to carry out the simulation, and the performance of our method is verified on test datasets. Numerous experiments demonstrate the effectiveness of the proposed 3D point cloud simplification method.


Introduction
The simplification of a 3D point cloud, obtained from the digitization of a real object, is a fundamental step in the field of 3D reconstruction. This step optimizes the number of points that constitute the 3D point cloud [1]. The scanning of a real object is performed by a device called a 3D scanner [2]. Such devices fall into three primary types: contact, active noncontact, and passive noncontact.
Simplification of a 3D set of points can be defined as follows: given an original surface S represented by a point cloud X such that |X| = N, simplification of X consists of computing a point cloud X′ such that |X′| = M, where |.| denotes cardinality. After simplification, we obtain a simplified point cloud such that |X′| ≤ |X|, that is, M ≤ N. It should be noted that X′ samples a surface S′ close to the original surface S that is sampled by X.
Several scientific articles have studied and presented simplification methods. Pauly et al. [3] proposed a method based on hierarchical decomposition of the sample of points, calculated by binary partition of space. The cutting planes are defined by the centre and the main direction of each region. The partitioning criterion depends both on a maximum number of points and on variations in local geometry in a region. Due to the spatial nature of this approach, it is difficult to control the quality of the distribution of points on the sampled surface. Wu and Kobbelt [4] computed an optimal set of splats to cover a sampled surface. The first step of the method consists of locally approximating the surface at each point of the sample by a circular or elliptical planar surface element called a splat. In the second step, the redundant splats are eliminated during a filtering process of the surface expansion type. To guarantee the coverage of the entire sampled surface, the algorithm proceeds as follows. For each splat processed, the points it covers are projected onto its plane, and then only the splats associated with the points projected inside the convex hull of the projected points are eliminated. During this process, the regularity of the distribution is not checked. A relaxation phase can be applied to determine an optimal position for the remaining splats. This method makes it possible to generate high-quality splat covers for smooth surfaces by filtering noise. However, it is penalized by the cost of its initialization and of the relaxation phase for large point samples. Linsen [5] presented a technique that associates a scalar value with each point, locally measuring the average variation of certain information, such as the proximity of neighbors or the direction of the normals. The points with the weakest measurement are removed iteratively. The algorithm has the disadvantage of not giving any guarantee on the density of the resulting set of points. Dey et al. [6] used an approximation of the LFS (local feature size) of the sampled area.
This approximation is calculated from the Delaunay triangulation of the sample of input points, which is a drawback for very large samples. Alexa et al. [7] estimated the local geometrical properties of the sampled surface using a Moving Least Squares (MLS) model of the underlying surface, which requires consistently oriented normals. They calculate the contribution of a point to this surface by projecting it onto an MLS surface estimated from neighboring points. The distance between the position of the point and its projection on the surface provides a measure of error. The points for which this distance is the smallest are removed. This method does not guarantee the density of the resulting sample points. To compensate, Alexa et al. [7] proposed to enrich the sample in the undersampled regions by projecting them onto a plane. They calculated the planar Voronoi diagram of the projected points so as to insert new points equidistant from the existing ones.
These new points are then raised to the surface using the projection operator. The process is repeated until the Euclidean distance between the next point to be added and the nearest existing point becomes less than a certain threshold. While this method achieves high-quality results, the intensive use of the MLS projection operator makes it expensive for very large samples. Pauly et al. [3] have directly extended the mesh simplification technique of Garland and Heckbert [8] to point samples by considering the nearest-neighbor relations as connectivity relations. Pairs of nearest neighbors are thus contracted, replacing two points with a new point calculated as a weighted average of the pair. The cost of each contraction operation is measured by adapting the error measure proposed by Garland and Heckbert, whose idea is to approximate the surface locally by a set of tangent planes and to estimate the geometric deviation of a point with respect to the surface by the sum of the squared distances to these planes.
This method has the advantage of controlling the distribution of the simplified sample, and it also preserves details. However, its initialization cost is high, and it requires the maintenance of a global priority queue, which is a disadvantage for large samples of points. Xuan et al. [9] proposed a progressive point cloud simplification technique, founded on the theory of information entropy and the normal angle. The basis of this technique is to find the importance of points using the information entropy of the normal angle. The normal angle is calculated from the normal vectors. The simplification operation is carried out by removing the less relevant points.
Leal et al. [10] proposed a simplification technique comprising three stages. First, the expectation maximization algorithm is used to cluster the point cloud. Second, the points to be removed are selected using curvature. Third, linear programming is used to simplify the point cloud. Ji et al. [11] proposed a simplification technique named the detail feature points simplified algorithm. In this technique, a k-neighborhood rule and an octree structure are used to reduce the point cloud.
The first key interest of this paper is point cloud simplification. The various point cloud simplification strategies reviewed in the literature may be classified into three categories: subsampling algorithms, resampling algorithms, and a mixture of them [12]. A first strategy for simplifying a sample of points is to break it down into small regions, each of which is represented by a single point in the simplified sample, while the resampling algorithms rely on estimating the properties of the sampled surface to compute new relevant points. In the literature, these principles have been applied according to three main simplification schemes: simplification by selection or calculation of points representing subsets of the initial sample [3], iterative simplification [6], and simplification by incremental sampling [13].
The second key interest of this paper is the notion of clustering. Clustering is a statistical analysis method used to organize raw data into homogeneous groups. Within each cluster, the data are grouped according to a common characteristic. The grouping is performed by an algorithm that measures the proximity between elements based on defined criteria. Clustering is a concept used in several areas such as pattern recognition [14], machine learning [15], and 3D point cloud simplification [12,16]. In the literature, there are many clustering techniques [17]. The work in this article uses clustering to optimize the number of points constituting an original 3D point cloud in order to obtain a simplified 3D point cloud close to the original. The third key interest of this paper is information theory in general and the concept of Shannon's entropy [18] in particular. This work relies on this concept to select the sets of points grouped into clusters in order to simplify the original point cloud. Information theory is applied in different areas such as data processing [19,20], data clustering [21], and 3D point cloud simplification [1,9].
In this work, we are inspired by the work of Wang et al. [22] in order to provide a robust method of simplifying the point cloud. This technique is based on the notion of entropy [18] and a clustering algorithm [17].
This paper is organized as follows. In Section 2, we review the density function estimator and the definition of entropy. Then, in Section 3, we present the clustering algorithm used in our method. In Section 4, we describe how to evaluate simplified meshes. Afterwards, in Section 5, we lay out our 3D point cloud simplification algorithm based on Shannon's entropy [18]. Section 6 presents the experimental results and the validation of the proposed technique. Finally, we wrap up with a conclusion.

Clustering Algorithm
The k-means clustering algorithm [23] is a type of unsupervised learning and analysis. The goal of this algorithm is to find groups in the data, with the number of groups represented by the variable K, such that each observation belongs to the group with the closest mean. The k-means clustering algorithm is regarded as one of the most important unsupervised learning approaches and is widely used in pattern recognition and machine intelligence. The details of the k-means clustering algorithm are presented in [17].
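To make the assignment and update steps concrete, the following is a minimal sketch of the standard Lloyd iteration for k-means on 3D points. It is an illustration in plain Python rather than the MATLAB implementation used in this paper; the function name `kmeans`, the random initialization, and the `iterations` and `seed` parameters are assumptions made for this example.

```python
import math
import random

def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's k-means on a list of 3D tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k distinct data points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the means are stable
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels
```

For a dense cloud, a vectorized or tree-accelerated implementation would be preferable; the brute-force distance search here costs O(N·K) per iteration.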

Density Estimation and Entropy Definition
In this 3D point cloud simplification work, we use the concept of entropy to simplify point clouds. The calculation of the entropy requires the estimation of the density function. Many density estimation approaches exist in the literature, and they fall into parametric and nonparametric methods. The first category estimates a parameterized model of a density function, for example with the maximum likelihood estimator [24]. The nonparametric category includes the kernel density estimator, also known as the Parzen-Rosenblatt method [25,26], the k-nearest neighbor (k-NN) estimator, and a combination of them [27]. Each type has its advantages and disadvantages. For the Parzen estimator, the bandwidth choice has a strong impact on the quality of the estimated density [28]. By contrast, the main motivation for the k-NN estimator is that it adapts the amount of smoothing to the local density of the data [21,27].
The parametric approach has the main disadvantage of requiring prior knowledge of the probability law of the random phenomenon under study. The nonparametric approach estimates the probability density directly from the available information on the set of observations. We are interested here in the nonparametric category, specifically the k-NN estimator.

Density Estimation Using k-NN Approach.
In this work, a nonparametric approach was used to estimate the density function. There are two kinds of nonparametric estimation methods: one is the Parzen density estimator [25], and the other is the k-nearest neighbor (k-NN) density estimator [27]. In this paper, we use the k-NN technique to estimate the density function. In the literature, the k-NN concept is used in several fields related to classification, as in [29][30][31]. The level of smoothing of the estimator is set by k, an integer number of nearest neighbors, generally proportional to the size of the sample N. The density estimate is defined for any point x.
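The distances between the objects of the sample and the point x are as follows:

\[ R_1(x) \le R_2(x) \le \cdots \le R_N(x), \tag{1} \]

where R_i(x), with i = 1, . . . , N, are the distances sorted in ascending order. The k-nearest neighbor estimator in d dimensions can be defined as follows:

\[ \hat{p}(x) = \frac{1}{N\,R_k(x)^d} \sum_{i=1}^{N} K\!\left( \frac{x - x_i}{R_k(x)} \right), \tag{2} \]

where R_k(x) is the distance from x to the kth nearest point and K(u) is the Gaussian kernel:

\[ K(u) = \frac{1}{(2\pi)^{d/2}} \exp\!\left( -\frac{\|u\|^2}{2} \right). \tag{3} \]

Then, we obtain

\[ \hat{p}(x) = \frac{1}{N\,(2\pi)^{d/2}\,R_k(x)^d} \sum_{i=1}^{N} \exp\!\left( -\frac{\|x - x_i\|^2}{2\,R_k(x)^2} \right). \tag{4} \]

The classical k-NN estimate is

\[ \hat{p}(x) = \frac{k}{N\,V_k(x)} = \frac{k}{N\,C_d\,R_k(x)^d}, \tag{5} \]

where V_k(x) is the volume of a sphere of radius R_k(x) and C_d is the volume of the unit sphere in d dimensions.
Equation (5) is the special case of (2) when K is the uniform kernel. The latter function is defined as follows:

\[ K(u) = \begin{cases} 1/C_d, & \|u\| \le 1, \\ 0, & \text{otherwise}. \end{cases} \tag{6} \]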
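As a rough illustration of the k-NN estimator discussed in this subsection, the sketch below evaluates the standard uniform-kernel form k / (N · C_d · R_k(x)^d) by brute force. The function names are hypothetical. The example assumes that the query point x is not itself a member of the sample (otherwise its zero self-distance would count as the first neighbor) and that R_k(x) > 0.

```python
import math

def unit_ball_volume(d):
    """Volume C_d of the unit ball in d dimensions."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)

def knn_density(x, sample, k):
    """Uniform-kernel k-NN density estimate at x: k / (N * C_d * R_k(x)^d)."""
    d = len(x)
    # Sort the distances from x to every sample point in ascending order.
    distances = sorted(math.dist(x, xi) for xi in sample)
    r_k = distances[k - 1]  # distance to the k-th nearest sample point
    return k / (len(sample) * unit_ball_volume(d) * r_k ** d)
```

Because R_k(x) shrinks where points are dense and grows where they are sparse, the estimate adapts its smoothing to the local density without a hand-tuned bandwidth.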

Shannon's Entropy.
Shannon's entropy [18] is a mathematical function, developed by Claude Shannon in 1948, that corresponds intuitively to the amount of information contained in or delivered by an information source. The source can be a text, an electrical signal, or any numerical file. For a source that is a discrete random variable x with n symbols, each symbol x_i has a probability p_i of appearing, with i = 1, . . . , n. The entropy H of the source x is defined as

\[ H(x) = -E[\log_2 p(x)] = -\sum_{i=1}^{n} p_i \log_2 p_i, \tag{7} \]

where E is the expected value operator and log_2 is the logarithm in base 2. Shannon's entropy can be found in the literature in various fields of research such as the stock market [32], image segmentation [33], and cryptography [34]. The main reason for using Shannon's entropy is that it intuitively quantifies the amount of information in a variable. In order to remove irrelevant points, our simplification technique is based on the estimation of this amount of information.
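The entropy definition can be sketched in a few lines. The first function below computes the discrete Shannon entropy in bits. The second is an assumption made for this example, not a formula taken from the paper: it estimates the expectation −E[log2 p(x)] by a sample mean over estimated density values, which is one common way to turn a density estimate into an entropy score.

```python
import math

def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution: H = -sum(p_i * log2(p_i))."""
    # Terms with p_i = 0 are skipped because p * log2(p) tends to 0 as p -> 0.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def sample_entropy(densities):
    """Estimate H = -E[log2 p(x)] by a sample mean (assumed estimator).

    densities: estimated density values p(y_i), all strictly positive.
    """
    return -sum(math.log2(p) for p in densities) / len(densities)
```

A uniform distribution over four symbols yields 2 bits, while a source that always emits the same symbol yields 0 bits, which matches the intuition that predictable data carries little information.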

Simplification Error.
In order to evaluate the accuracy of the novel simplification method, we measure the geometric error between the original and the simplified point cloud. To compare two surfaces, Cignoni et al. [35] developed a tool called Metro. Also, Pauly et al. [3] and Miao et al. [36] adopted a technique to measure simplification errors. In this paper, we evaluate the maximum geometric error and the average geometric error between the original model X and the simplified one X′. The maximum geometric error is defined in [3] as

\[ \Delta_{\max}(X, X') = \max_{q \in X} \left| d(q, X') \right|. \tag{8} \]

The average geometric error is defined in [3] as

\[ \Delta_{\mathrm{avg}}(X, X') = \frac{1}{|X|} \sum_{q \in X} \left| d(q, X') \right|. \tag{9} \]

The corresponding normalized geometric errors can then be obtained by scaling the above error measures by the diagonal of the model's bounding box.
For each sample point q ∈ X, the geometric error d(q, X′) can be defined as the Hausdorff distance between q on the original surface and its projection point q′ on the simplified surface X′. The Hausdorff distance is defined as follows:

\[ d(q, X') = \min_{p' \in X'} d(q, p'), \tag{10} \]

where d(., .) between two points is the Euclidean distance. If N_q is the normal vector of point q and q′ is the projection point on the simplified surface X′, the sign of d is the sign of N_q · (q′ − q).
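The maximum and average geometric errors can be approximated directly on the point sets. The sketch below is a simplification under stated assumptions: it replaces the projection onto the simplified surface by the nearest retained point, ignores the sign given by the normal, and uses a brute-force search. All function names are hypothetical.

```python
import math

def point_to_cloud_distance(q, cloud):
    """Unsigned distance from q to its nearest point in cloud (brute force)."""
    return min(math.dist(q, p) for p in cloud)

def geometric_errors(original, simplified):
    """Return (maximum error, average error) from original to simplified."""
    distances = [point_to_cloud_distance(q, simplified) for q in original]
    return max(distances), sum(distances) / len(distances)

def normalized(error, cloud):
    """Scale an error by the diagonal length of the cloud's bounding box."""
    lows = [min(c) for c in zip(*cloud)]
    highs = [max(c) for c in zip(*cloud)]
    return error / math.dist(lows, highs)
```

Since the nearest retained point is never closer than the true surface projection, this approximation tends to overestimate the error, so it is better suited to comparing methods than to reporting absolute values.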

Surface Compactness.
To measure the quality of the obtained meshes, Gueziec [37] proposes a formula to compute the quality of the triangles. It is called the compactness formula and is defined as follows:

\[ z = \frac{4\sqrt{3}\,\alpha}{L_1^2 + L_2^2 + L_3^2}, \tag{11} \]

where L_i are the lengths of the edges of a triangle and α is the area of the triangle, as shown in Figure 1. Note that this measure is equal to 1 for an equilateral triangle and 0 for a triangle whose vertices are collinear. According to [38], a triangle is of acceptable quality if z ≥ 0.6.
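Gueziec's compactness is commonly written as 4√3·α / (L1² + L2² + L3²), which indeed gives 1 for an equilateral triangle and 0 for collinear vertices. The following sketch computes it for one triangle and reports the fraction of triangles that reach the 0.6 threshold. The function names are hypothetical, and the code assumes that no triangle has three coincident vertices.

```python
import math

def triangle_compactness(a, b, c):
    """Compactness 4*sqrt(3)*area / (L1^2 + L2^2 + L3^2) of triangle abc."""
    l1, l2, l3 = math.dist(a, b), math.dist(b, c), math.dist(c, a)
    s = (l1 + l2 + l3) / 2  # semi-perimeter for Heron's formula
    # Clamp at 0 to absorb rounding noise on nearly collinear triangles.
    area = math.sqrt(max(s * (s - l1) * (s - l2) * (s - l3), 0.0))
    return 4 * math.sqrt(3) * area / (l1 ** 2 + l2 ** 2 + l3 ** 2)

def compact_fraction(triangles, threshold=0.6):
    """Fraction of triangles whose compactness is at least threshold."""
    good = sum(1 for t in triangles if triangle_compactness(*t) >= threshold)
    return good / len(triangles)
```

Multiplying the result of `compact_fraction` by 100 gives a percentage of compact triangles for a mesh.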

The Simplification Method Proposed
The goal of 3D point cloud simplification is to choose the relevant and representative 3D points and remove redundant data points. In this work, the k-means clustering algorithm [23], which has been extensively used in the pattern recognition and machine learning literature, is extended to simplify dense points. As shown in Figure 2, the k-means algorithm is used to subdivide the point cloud into c clusters. The size of the clusters is equal to 5% of the size of the original set of points. Subsequently, Shannon's entropy [18] is used to select the clusters to be deleted.
In this paper, we present a new robust approach based on clustering and Shannon's entropy.
This approach keeps a uniform distribution of the points of the resulting cloud. In addition, it makes it easy to control the overall density of the coarse cloud by simply defining the size of the clusters. This approach, as shown in Figure 2, simplifies the 3D point cloud while preserving the characteristics of the model represented by the original point cloud. Moreover, this simplification method preserves contours and sharp features. Small features are also maintained in the simplified point sets. The new method can be adapted to simplify nonuniformly distributed point sets.
Clustering the data into small sets of points, using the information theoretic clustering algorithm [21], makes it possible to obtain groups containing points with great similarity, which guarantees a good quality of simplification with an acceptable calculation time. To subdivide the data sample into groups of 3D points, our simplification technique is based on the information theoretic clustering algorithm [21].
Next, the selection of relevant points in each cluster is done using Shannon's entropy [18]. The set of relevant points consists of the representative data samples that contain more information, selected from the original dataset based on the proposed sample selection algorithm [1].
Compared to other simplification algorithms such as those of Shi et al. [16], Lee et al. [39], and Miao et al. [36], the new algorithm has advantages in several respects.
Firstly, our simplification method preserves borders. This preservation of the integrity of the original border is attributed to the nature of our method: it uses Shannon's entropy, which keeps clusters that have a high entropy value, and this is the case for borders. Secondly, the novel algorithm preserves the compactness of the surface obtained from the simplified point cloud. This characteristic is measured by calculating the percentage of compact triangles using (11), proposed by Gueziec [37]. The surfaces used in this article are constructed using the ball pivoting method [40].
The summary of contributions is as follows:
(i) The 3D dataset is subdivided into clusters using k-means clustering [23], which is widely applied in the pattern recognition and machine learning literature
(ii) Shannon's entropy [18], which is applied to data classification, is used to select clusters of the 3D point cloud
(iii) The effectiveness and performance of the novel method are validated and illustrated through experimental results and comparison with other point sampling methods
(iv) The new algorithm is validated and illustrated by testing its efficiency and performance through experiments and comparison with other simplification methods
The full description of the 3D point simplification algorithm, Algorithm 1, is as follows:

Input
(i) X = {x_1, x_2, . . . , x_N}: the data sample (point cloud)
(ii) C[ ]: the array in which cluster indexes are stored
(iii) c: the number of clusters
(iv) n: the number of clusters to delete (n < c)
(v) Ec_min = E(R_1): minimal entropy
(vi) Begin
(vii) Decompose the initial set of points X into c small clusters, denoted X = R_j (j = 1, 2, . . . , c), using the k-means algorithm
(viii) For i = 1 to n
    For j = 2 to c
        Calculate the global entropy of cluster j by using all data samples in R_j = {y_1, y_2, . . . , y_m} according to equation (7). Note this entropy [...]
ALGORITHM 1: Simplification of 3D point cloud based on the clustering algorithm and Shannon's entropy.

We note that the level of simplification of our approach is mainly determined by the user. This level is defined by the number (n) of clusters to be removed and the size of these clusters. In this work, the size of each cluster constituting the original point cloud is equal to 5% of the number of points of the original point cloud.
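The listing of Algorithm 1 is incomplete in this copy of the text, so the following is only a sketch of one plausible reading of its selection step: score every cluster with an entropy function, delete the n clusters with the lowest scores, and keep the points of the remaining clusters. The function `simplify_by_cluster_entropy` and its `entropy_of` callback are hypothetical names introduced for this example; the clusters are assumed to come from a prior k-means step.

```python
def simplify_by_cluster_entropy(clusters, entropy_of, n):
    """Drop the n lowest-entropy clusters and return the remaining points.

    clusters: list of clusters, each a list of 3D points.
    entropy_of: callable mapping a cluster to its entropy score
        (hypothetical hook; the paper scores clusters with equation (7)).
    n: number of clusters to delete, with 0 <= n < len(clusters).
    """
    if not 0 <= n < len(clusters):
        raise ValueError("n must satisfy 0 <= n < number of clusters")
    # Rank cluster indexes from lowest to highest entropy.
    ranked = sorted(range(len(clusters)), key=lambda j: entropy_of(clusters[j]))
    removed = set(ranked[:n])
    # Keep every point that belongs to a cluster that was not removed.
    return [
        point
        for j, cluster in enumerate(clusters)
        if j not in removed
        for point in cluster
    ]
```

In this reading, the level of simplification is controlled exactly as the text describes: by n and by the cluster size chosen during clustering.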

Results and Discussion
The new technique was implemented using MATLAB and MeshLab software. The algorithm was run on a PC with a 64-bit Intel Core i5-2540M CPU at 2.60 GHz. The David model and the Stanford Bunny model tested in this paper were developed at Stanford University [41]. The Fandisk, Max Planck, Genus, and Bimba models were obtained from the AIM@SHAPE database [42].
In order to verify the robustness of the proposed technique, we apply it to various 3D objects of different sizes and topologies. To ensure a better reconstruction, the surfaces of all the simplified point clouds were reconstructed using the MeshLab software [43].

Computing of Compactness.
Computing the compactness of the original and simplified surfaces of Bimba gives 65.9498% and 66.7420%, respectively. These two values represent the percentages of compact triangles of the two surfaces. These results, illustrated in Figures 3 and 4, show that the method preserves and slightly increases the compactness of the simplified surface of Bimba. The compactness is calculated using (11).

Results of the Novel Simplification Method.
The novel strategy can deliver balanced point clouds. Among the models tested in this paper, we used nonuniform objects such as the models of David, Bimba, and Max Planck. After simplification of these point clouds using the new method, we obtained satisfying results with the preservation of small details. Therefore, the new technique can be used for the simplification of nonuniform point clouds. Figure 6 shows two models simplified using the new technique.
These point sets have boundaries. The Genus model was simplified from 1234 to 1134 points, and the Fandisk model was reduced from 103568 to 93809 points. The experimental results in Figure 6(b) indicate that the new technique can preserve the boundaries. Furthermore, the original sharp edges were well maintained, which again illustrates the advantage of our technique. The novel method can produce sparser level-of-detail point sets while preserving the small features and the sharp edges. In Figure 7, the sharp edges of the bunny model can be clearly seen when the point set is reduced from 16130 to 15813 points. This example demonstrates the good performance of the proposed method.
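The point counts quoted above can be turned into reduction ratios with a short calculation. The helper below is a hypothetical convenience function, not part of the published method; it only restates the reported counts as the fraction of points removed.

```python
def reduction_ratio(original_count, simplified_count):
    """Fraction of points removed by simplification."""
    return (original_count - simplified_count) / original_count

# Point counts as reported in the text for each model.
reported = [("Genus", 1234, 1134), ("Fandisk", 103568, 93809), ("Bunny", 16130, 15813)]
for name, before, after in reported:
    print(f"{name}: {100 * reduction_ratio(before, after):.2f}% of points removed")
```

Expressing each experiment this way makes the different simplification levels directly comparable across models of very different sizes.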

Comparison with Other Simplification Methods.
The adaptive simplification of point clouds using k-means clustering of Shi et al. [16] and the 3D Grid method [39] were employed for a comparative study. The simplification results were triangulated with the MeshLab software [43]. In Figure 8, the well-known Fandisk model was simplified. Since there was no redundant data in the original model (2502 vertices, 5000 faces), we increased the number of vertices with Geomagic Studio [44]. The final number of vertices was 103 570. As shown in Figures 8 and 9, the new simplification technique gives better results both in terms of the number of points deleted and in terms of the error, which represents the difference between the original and simplified surfaces. We obtain uniformly distributed sparse sampling points in the flat areas and the necessary dense points in the high curvature regions. The sharp edges of the Fandisk model are well maintained. The method of Shi et al. [16] and the 3D Grid method [39] can also preserve sharp edges, but too many sampling points are assigned to the sharp edges. The 3D Grid method [39] preserves fewer points in the flat areas, which leads to imbalance, unlike the proposed technique, which produces balanced simplified surfaces, as shown in Figure 4. Figures 8 and 9 and Table 1 show that the error between the original surface and the simplified surface obtained with the new method is small compared to the error obtained with the method of Shi et al. [16] and the 3D Grid method, which shows that our technique yields a simplified point cloud close to the original one.

Conclusion
In this work, Shannon's entropy, which has been widely used in data processing, and the k-means clustering algorithm, which has been extensively used in the pattern recognition and machine learning literature, have been extended to reduce 3D point clouds. This simplification procedure is achieved through the removal of redundant and less relevant groups of 3D points that have a minimum entropy value. Clusters are obtained using the k-means clustering algorithm. The new method is mainly affected by two factors: the number of original clusters and the number of deleted clusters. The studies and illustrations above show that, since both factors can be regulated, this new method can be applied to different levels of detail and different forms of 3D point clouds and produces well-balanced surfaces, which makes it robust, as the results show.
Data Availability
The experimental data, in the form of 3D objects, used to support the results of this study can be downloaded from the AIM@SHAPE database included in the references.