MFOC-CliqueNet: A CliqueNet-Based Optimal Combination of Multidimensional Features Classification Method for Large-Scale Laser Point Clouds

As large-scale laser 3D point clouds data contains massive and complex data, it faces great challenges in the automatic intelligent processing and classification of large-scale 3D point clouds. Aiming at the problem that 3D point clouds in complex scenes are self-occluded or occluded, which could reduce the object classification accuracy, we propose a multidimension feature optimal combination classification method named MFOC-CliqueNet based on CliqueNet for large-scale laser point clouds. The optimal combination matrix of multidimension features is constructed by extracting the three-dimensional features and multidirectional two-dimension features of 3D point cloud. This is the first time that multidimensional optimal combination features are introduced into cyclic convolutional networks CliqueNet. It is important for large-scale 3D point cloud classification. The experimental results show that the MFOC-CliqueNet framework can realize the latest level with fewer parameters. The experiments on the Large-Scale Scene Point Cloud Oakland dataset show that the classification accuracy of our method is 98.9%, which is better than other classification algorithms mentioned in this paper.


Introduction
e development and advancement of 3D scanning technology have significantly improved the measurement efficiency and accuracy of 3D laser point cloud data. e volume of 3D point cloud data is growing at an unprecedented rate. is also makes the processing, analysis, and understanding of massive 3D point cloud data become a new research hotspot in the field of artificial intelligence. Its application prospects are very broad such as photogrammetry, remote sensing, cultural relics protection, autonomous driving, and robot vision. In the process of automatic analysis and processing of 3D point cloud data for large scenes, the point cloud classifications are the basic steps, which have a great impact on subsequent higher level operations. However, due to the self-shielding or self-shielding characteristics of point clouds between different 3D objects in large-scale scenes, the research and application of 3D point cloud classification are challenging.
In recent years, many researches have done a lot of research on multiobject classification in large-scale 3D point cloud data. e DBSCAN algorithm is used to effectively and accurately segment the outdoor large scene point cloud, and then classified the point cloud based on the segmentation result [1]. However, when using DBSCAN algorithm to cluster ungrounded point clouds, some point clouds are not continuous due to the obstacles of 3D objects, resulting in oversegmentation and other phenomena in clustering. To solve this problem, the author proposes a strategy of largescale including small-scale and proximity fusion, but this method cannot split two closely connected objects, which will affect the point cloud classification results in special scenes. RangNet++ [2] uses the distance image as the intermediate representation; it projects the 3D point cloud onto the front view and compensates for the information loss caused by projection by performing a fast, GPU-based kNN-search postprocessing method. Li et al. [3] distinguish the shapes of plane, cylinder, space body, and sphere mainly by using four features of volume, normal vector, principal direction, and principal curvature. However, these features are more or less affected by occlusion and noise. Although the methods proposed in some papers compensate for the loss of information [3,4], the 3D shape features they extract are insufficient to describe a complete 3D object.
is paper proposes a 3D point cloud classification architecture based on MFOC-CliqueNet for large scene. Firstly, three-dimensional nearest neighbor optimization is performed based on neighborhood space, and the threedimensional features of each point are solved based on the adaptive optimal spherical neighborhood. By analogizing the three-dimensional feature extraction method, point clouds are projected onto three two-dimensional planes: XOY, YOZ, and XOZ, and corresponding two-dimensional features are extracted based on two-dimensional adaptive optimal circle neighborhood. Secondly, the optimal combination feature matrix MFOC is obtained by arranging and weighting the two-dimensional and three-dimensional features. Finally, the feature matrix is imported into the CliqueNet to fully learn the features related to the training data, so as to further improve the classification efficiency of large scene point clouds.
e main contributions of this paper are as follows: (1) We propose an optimal combination method of point cloud multidimensional features. e 2D and 3D features of the point cloud are arranged, weighted, and combined, and the optimal combined feature matrix MFOC is constructed. (2) MFOC can better describe the local and contextual global structure of the point cloud in large scale. And it can improve the accuracy of cloud multitarget classification in complex large-scale scenic. (3) In this paper, CliqueNet, a cyclic convolution network, is apply to the classification of 3D point cloud data in large scenes for the first time, and we propose a large-scene 3D point cloud classification architecture based on MFOC-CliqueNet. (4) We conducted experiments on the Oakland data set.
And the experimental results show that the method achieves a good classification accuracy in large-scale point cloud.

Related Works
e deep learning algorithm was first applied in two-dimensional image processing, and achieving outstanding performance and mature technology. For example, AlexNet [4] applied numerous methods to improve model performance, such as the first use of ReLU nonlinear activation function and the first use of dropout and the regularization of the network through massive data enhancement. VGG-Net [5] proposed by Oxford University has a smaller convolution kernel and deeper level than AlexNet, which further improves the parameter efficiency. GoogLeNet [6] contains a very efficient Inception module; it does not use a fully connected network as VGG-Net does, so the amount of parameters is very small. e significant work of ResNet [7] is to solve the problem of gradient disappearance during back propagation, so it can train very deep networks without adding a classification network in the middle like Goo-gLeNet. Although ResNet is very efficient, not every layer of such a deep network is effective, these redundant convolutional layers and feature maps will reduce the parameter efficiency of the model. To this end, Huang et al. [8] proposed DenseNet, the goal of this network is to improve the efficiency of information flow and gradient flow between network layers and to improve the efficiency of parameters.
is structure ensures that each layer can directly access the gradient from the loss function, so it can train very deep networks. Although DenseNet has few parameters, the memory usage is very large. Yang et al. [9] proposed Cli-queNet, which has achieved good results in the two-dimensional image recognition task. Its advantage is that it not only has the prequel part but also optimizes the feature maps of the previous level according to the output of the latter level. is architecture is inspired by the cyclic structure and attention mechanism, that is, the feature maps output by the convolution can be reused, and the refined feature maps will pay attention to more important information. Within the same Clique module, there are forward and reverse connections between any two layers, which also enhances the information flow in the deep network. It has the advantages of parameter amount and calculation amount. erefore, we introduce the CliqueNet to perform feature learning on the 2D and 3D features of the large-scene 3D point cloud and obtain better classification results.
As the advancement of 3D laser scanning technology, the quality and accuracy of the collected 3D point cloud data have been greatly improved. erefore, a study of the analysis and processing based on the 3D point cloud data is significant. Many researchers have begun to apply deep learning to 3D point cloud data processing. In order to make 3D point cloud data suitable for 2D convolutional neural networks, 3D point cloud data are usually preprocessed and then input into the network. At present, the main methods to solve the problems of unstructured and disordered point clouds are as follows. (1) Multiview: Representing threedimensional objects with pictures from different angles, that is, converting 3D CNN into 2D CNN technology for classification and other tasks. e most representative study is MVCNN [10]. e author projects 3D objects through different angles to obtain multiview 2D images to represent 3D objects. However, in the face of complex large-scale scene tasks, a fixed number of multiview images cannot describe all the targets in a three-dimensional large scene well. (2) Volumetric: mainly to solve the problem of disorder of point cloud, by regularizing the 3D point cloud. is method does improve the performance of point cloud classification, but due to its large amount of computation and low resolution of voxel mesh, it takes up a lot of memory and loses local information, it is still not suitable for complex large scenes.
ere are some representative studies, such as 3D ShapeNets [11], the probability distribution of 3D voxel grid binary variables is used to represent the 3D shape. It learns the joint distribution of input and labels by constructing a convolution DBN. VoxNet [12] is proposed by Maturana and Scherer, which uses semisupervised 3DCNN to process the occupied 3D voxel grid. (3) Point clouds are processed directly: this kind of method directly act on the original point cloud, without any transformation of the original point cloud, and retain their rich location information. Charles et al. [13] proposed PointNet, breaking the tradition for the first time, so that the network can directly handle point cloud, and the author uses symmetric functions to avoid the disorder of point cloud.
e feature extraction method of PointNet is to extract a global feature for all point cloud data. Obviously, this is different from the current popular CNN method of extracting local features layer by layer. Inspired by CNN, the author proposed PointNet++ [14], which can extract local features at different scales and obtain deep features through a multilayer network structure. PointConv [15] can construct a multilayer deep convolutional network on a 3D point cloud, this structure can achieve the same translation invariance as 2D convolutional networks, and permutation invariance of point order in point cloud. is type of method has received more and more attention. However, the type of point cloud data used in this method is different from the first two methods; the method is oriented to the 3D point cloud model data, rather than the point cloud data for the large scene.

Methods
We propose a large-scale 3D point cloud classification architecture based on MFOC-CliqueNet. We construct multidimensional feature optimal combination matrix (MFOC) of the large-scale 3D point cloud, and import CliqueNet cyclic network to the field of 3D point cloud classification for large scene for the first time. First, the "Kd-Tree" algorithm is used to search for 100 nearest neighbors for each point in the large-scene 3D point cloud data set and obtaining the optimal neighborhood adaptive radius based on the method of minimum Shannon entropy [16]. en the 3D eigenvalues and eigenvectors corresponding to the 3D covariance matrix of the point cloud are calculated based on the optimal neighborhood adaptive spherical radius. Meanwhile, the 3D point cloud was projected to the XOY, YOZ, and XOZ planes to extract the 2D features. Based on the optimal neighborhood adaptive circle radius, we calculate the 2-dimensional eigenvalues and eigenvectors corresponding to the point cloud. However, the simple horizontal random combination of the multidimensional features of the point cloud is not good. erefore, it is necessary to combine 2D and 3D features according to certain principles to obtain a multidimensional feature matrix. Here, we propose some optimal combination principles: (1) Arranging the 2D projection features diversely to obtain the optimal arrangement features. (2) e 2-D and 3-D features are combined with different weights, and the optimal combination features are obtained according to the specific permutation weighted combination experiment.
(3) Finally, the optimal combination of the multidimensional features matrix MFOC is integrated into the CliqueNet to construct the MFOC-CliqueNet to achieve 3D point cloud classification. e MFOC-CliqueNet architecture is illustrated in Figure 1.
In order to better describe the local structure of the largescale 3D point cloud, this paper adopts the two-dimensional projection feature extraction based on the two-dimensional nearest circular neighborhood and the three-dimensional feature extraction based on the three-dimensional nearest spherical neighborhood, respectively.
e characteristic values, characteristics, and geometric features of the matrix based on 3D covariance matrix are extracted using the 3D structure tension [17]. e 3D point cloud is projected to the XOY, YOZ, and XOZ planes, and the 2D structure tensor is used to extract the 2D features of the point cloud.

A Multidimensional Feature Extraction of the Large-Scale 3D Point Cloud
Due to the complexity of outdoor cloud objects in three-dimensional scenic area, selfshielding and be shielded of three-dimensional point cloud data are prone to occur, which affects the accuracy of point cloud object classification. If the point cloud is only projected on the XOY plane [18], it cannot well describe the three-dimensional spatial characteristics of the nonplanar structure point cloud objects. In response to this situation, we project the point cloud onto the three 2-dimensional planes XOY, YOZ, and XOZ to obtain richer point cloud feature information, as shown in Figure 2: (1) e representation form of any point cloud in the XOY plane is (2) e representation form of any point cloud in the YOZ plane is (3) e representation form of any point cloud in the XOZ plane is We use the neighborhoods sought by KD-Tree and the neighborhood optimization based on minimizing Shannon entropy to obtain the best adaptive circular neighborhood r k . And then the two-dimensional structure tensor of each Computational Intelligence and Neuroscience point is calculated, and the sum of feature values s 2 , value ratio R λ,2D , and two-dimensional local point density D 2 was extracted. erefore, in this paper, we extract r k , S 2 , R λ,2D , and D 2 from each plane of the XOY, XOZ, and YOZ. e given point p k (x k , y k ) is the K-th nearest neighbor of point P, and K is the optimal adaptive circular neighborhood size parameter of point P, then the radius of the circular neighborhood at point P is e two-dimensional local point density [19] is e characteristic value ratio [20] is e sum of the characteristic values is r k is the radius of the optimal adaptive circular neighborhood. λ 2,2D and λ 1,1D are the eigenvalues corresponding to the two-dimensional covariance matrix. We visualized the projection of the three-dimensional point cloud on the three planes XOY, XOZ, and YOZ.
Comparing the visualizations in the three directions of XOY, YOZ, and XOZ in Figure 3, you can find that the red area in the middle and the blue and red areas on the right of the left view (Figure 3(a)) are all obscured in the rear view ( Figure 3(b)). At the same time, the red area on the left side of Figure 3(b) is also missing in Figure 3(a). e top view (Figure 3(c)) is the most severely blocked. erefore, the single view cannot fully display the complex 3D objects in the large scene. If one direction is projected, the relatively complete 2D feature information of the 3D point cloud data cannot be extracted. erefore, in this paper, three-dimensional point cloud is projected onto the planes of XOY, XOZ, and YOZ, respectively.

3D Feature Extraction.
Based on the nearest neighbor computed by KD-tree, the eigenvalues and eigenvectors corresponding to the three-dimensional covariance matrix are computed point by point, and then the following features are computed by combining the neighborhood optimization based on the Minimization of Shannon Entropy: Dimensional features are linear, planar, and cluster-like attributes: ree-dimensional local point density is as follows:    Computational Intelligence and Neuroscience e ball radius is Nearest neighbor tetrahedron volume Q is (1) Verticality. Total variance, anisotropy, feature entropy, and trajectory at this point [21] are as follows: Given a three-dimensional point P (x, y, z), K is the nearest neighbor parameter of the point P, and r K−NN is the distance from the point P to the K-th point p k (x k , y k , z k ) in its neighborhood. e verticality can distinguish the ground point from the nonground point, and n Z is the third component of the third-dimensional feature vector of the three-dimensional structure tensor of the current point P. O λ , A λ , E λ , and T λ are local three-dimensional shape features. e 1 , e 2 , and e 3 are all normalized feature values. T λ describes the invariant characteristics of points. ere are also features related to feature vectors: v i1 , v i2 , and v i3 . v i1 is the maximum distribution direction vector, v i3 is the maximum distribution direction vector, and v i2 is the direction vector perpendicular to v i1 and v i3 , they are all three-element vectors. Until now, a total of 29 2D and 3D features has been extracted in this method.

A Large-Scale 3D Point Cloud Classification Based on
MFOC-CliqueNet. In this paper, CliqueNet is imported into the 3D point cloud classification of large scenes for the first time, and we propose a large-scene 3D point cloud classification framework based on MFOC-CliqueNet. CliqueNet not only combines the cyclic structure and the attention mechanism but also updates the optimal combination matrix (MFOC) of multidimensional features alternately twice. en, we can obtain the MFOC with higher feature quality and pass it to the next block. At the same time, it uses a multiscale feature strategy to avoid parameters linear growth, as shown in Figure 4. e TBlock module contains two parts: the Transition layer and the Clique Block. Each Clique Block has output of the Z 0 (MFOC) and stage-II (the second alternately update of the MFOC feature map parameters). ese two outputs go through concatenation and then connect to a Global Pooling as part of the prediction. Z 0 represents the input of the first Clique Block, and each subsequent Clique Block input is the result obtained after the output stage-II of the previous block passes through the Transition layer.
us, each layer is both the input and output from the other layers in Clique Block.
Each Clique Block of CliqueNet contains two stages: the first stage is similar to DenseNet [8], which can be regarded as the initialization process of MFOC; secondly, the input of each convolution operation includes not only the output Computational Intelligence and Neuroscience feature map MFOC of all the previous layers also includes the output feature map MFOC of the subsequent level (Output after the update of Clique Block module). For the i-th layer and k-th cycle in the second stage, the alternately updated expression is where k ≥ 2, W li * Z l means that the convolution kernel performs a convolution operation on the input feature matrix graph Z, and g is a nonlinear activation function. In this paper, the obtained feature matrix Z0 is input into CliqueNet, and passes through 64 convolutional layers of 5 × 5 with a stride of 2; then it passes through a pooling layer of 3 × 3 with stride of 2. Each Block is connected by Transition layer, and the network is trained with 250 epochs. e CliqueNet is set as a three-stage network, including convolution layer, pooling layer and block module. And the CliqueNet node parameter setting at each stage is shown in Table 1. In order to adapt the CliqueNet to the multidimensional feature matrix (32 × 32 × 1) of the large-scale 3D point cloud obtained in this paper, we need to modify the parameters of the original CliqueNet: (1) K and T in the original network are too large, which is easy to cause over fitting. We reduce K and T to 12 and 16.

Datasets.
e framework proposed is tested on the public Oakland 3D point cloud data set [22]in this paper. e data set is the most widely used, acquired by Mobile Laser Scanner, with marked point cloud data, which is saved in text format. ree real value coordinates are written in each line. e data set represents the urban environment and captured by a mobile platform equipped with a side SICK LMS laser scanner. e data set is divided into training set X, verification set V, and test set Y. Each 3D point is assigned one of five semantic categories, namely wire, pole/trunk, facade, ground, and vegetation. e number of samples in each category is shown in Table 2.

Network Training Details.
In this paper, the initial learning rate of the network is 0.001, the total number of training rounds is initially set to 300 epochs, and it adopt a gradual decrease in learning rate. When the number of training rounds is equal to 150 epochs, the learning rate becomes 0.1 of the initial learning rates. e number of rounds is equal to 225 epochs, and the learning rate becomes 0.01 of the initial learning rates. We use the momentum optimization method with a decay rate of 0.9 to train the MFOC-CliqueNet framework.
e MFOC-CliqueNet framework in this paper takes about 5 hours to train on the GTX 2060 GPU using TensorFlow.
As shown in Figure 5(a), the initial learning rate is 0.03 (orange line) and the initial learning rate is 0.01 (green line), their test accuracy rates fluctuate greatly and are extremely unstable in the first 150 epochs, even training tests after 150 epochs, the accuracy of the initial learning rate of 0.001 (blue line) has been consistently higher than the previous two. For the choice of optimizer, it can be seen from Figure 5(b) that the performance of the SGD optimizer has the worst accuracy, while the Momentum optimizer performs best, and after several training rounds, its accuracy has been stable at more than 90%.

Multidimensional Features Arrangement.
ree-dimensional point clouds are projected onto the XOY, YOZ, and XOZ planes in X, Y, and Z directions, respectively. Four twodimensional features, r k , D 2 , S 2 , and R (λ,2D) , are obtained for each plane. In the process of point cloud projection, the result of feature extraction will be affected by occlusion or self-occlusion between objects. erefore, in this paper, the two-dimensional features extracted from the projected surface are tested in different directions to test the optimal multidimensional feature arrangement.
Initially, the weighted combination ratio of multidimensional features defaults to 3D : 2D � 1 : 1, which means the arrangement of two-dimensional features is added to the original three-dimensional features. Suppose XOY � B, YOZ � C, XOZ � D, there are six different arrangements [BCD, BDC, CBD, CDB, DBC, DCB]. e six arrangements were tested experimentally.
Comparing its experimental results (as shown in Table 3), the overall classification accuracy (OA) of the first  Computational Intelligence and Neuroscience three categories is not significantly different, which are all above 97%. However, compared the classification accuracy of each category, the classification efficiency of the DCB sorting is higher than that of other sorting. (Although the classification accuracy of the two types of "poles and vegetation" arranged by DCB was lower than that of other sequences, there was little difference in the comparison accuracy between them.) e comparison accuracy of the other two categories, "wire and facade," ranked first in DCB, with a large difference. erefore, we take the X � [(r k , D 2 , R (λ,2D) , S 2 ) xoz , (r k , D 2 , R (λ,2D) , S 2 ) yoz , (r k , D 2 , R (λ,2D) , S 2 ) xoy ] as the best arrangement of multidimensional features.

Weighted Combination of Multidimensional Features.
On the basis of the previous section, we have obtained the best arrangement and combination of two-dimensional and three-dimensional features. e best arrangement based on two-dimensional characteristics is X � r k , D 2 , R (λ,2D) , S 2 xoz , r k , D 2 , R (λ,2D) , S 2 yoz , r k , D 2 , R (λ,2D) , S 2 xoy . (15) Here, XOY, YOZ, and XOZ represent three two-dimensional projection surfaces, and the three-dimensional features of the point clouds are arranged as follows: e multidimensional feature weighted combination feature matrix is where W i is the weight. Since the two-dimensional and three-dimensional features have different effects on target classification, the best classification accuracy will not be obtained by simply combining the two-dimensional and three-dimensional features in a 1 : 1 scale.
is paper combines the two-dimensional and three-dimensional characteristics of point clouds with the optimal arrangement of different weights to study the validity of point cloud classification accuracy [23]. e experiment shows in Table 4, the 2D features extracted on the 2D projection, and the optimal arrangement DCB obtained from the above section is used for weighted combination of multidimensional features. It can be seen  Computational Intelligence and Neuroscience from Table 4 that the overall classification accuracy of each weighted combination has little difference, with the maximum value of 98.22% and the minimum value of 96.90%.
Comparing the classification accuracy of the second, fourth, and fifth categories of the first three different weight combinations, it can be concluded that the lower the threedimensional feature ratio, the lower the classification accuracy.
For the first and third categories, when the combined weight of multidimensional features is 3D : 2D � 0.9 : 0.1, the classification accuracy is the lowest, but the overall classification accuracy is higher than other weight combinations. e experimental results show that the three-dimensional feature is more important than the two-dimensional feature for point cloud classification in large scenes. is paper chooses the weight 3D : 2D � 0.9 : 0.1 to combine multidimensional features to further improve the classification accuracy.

Scaling of Multidimensional Features.
e feature combination matrix Z can be divided into three parts according to the value of the weight w. First, if w � 1 then it means that the multidimensional features size remains unchanged. Second, if the weight w < 1 or w > 1 then the features of different dimensions are scaled; and the performance of effective features may be increased by scaling different features. It may also increase the size range between features which is disadvantaged to the fast convergence of gradient descent and thus affects the final classification accuracy. Combining with the multidimensional features matrix designed in our paper, the experiment shows that scaling features of different dimensions with different weight values will affect the classification accuracy of point cloud as shown in Table 5.
Comparing Tables 4 and 5, we reduce and normalize the multidimensional features. e experimental results show that this classification result is better than other amplification weight values. For example, the ratio of 3D : 2D � 0.9 : 0.1 can obtain the best classification accuracy. rough properly scaling and normalization of multidimensional features, the data feature range can be standardized, so that the gradient descent process converges faster.
In summary, the two-dimensional features matrix is the optimal arrangement of X � X D , X C , X B for multidimensional feature weighted combination, and the optimal feature combination matrix Z (MFOC) is obtained as the input of CliqueNet: In the large-scale 3D point cloud classification architecture of MFOC-CliqueNet designed in this paper, the optimal combination matrix Z 0 (MFOC) of multidimensional features of different label categories is visually displayed for observation, as shown in Figure 6.

Number of Training
Epochs. An epoch refers to the process of sending all the data set sent to the network to complete a forward calculation and back propagation. As the number of epochs increases, the number of weight update iterations in the neural network will also increase. However, if  there are too many epochs, it is prone to over-fitting. Conversely, too few training rounds will cause convergence to be too slow and not optimal state. erefore, in order to make the MFOC-CliqueNet, in this paper achieve a good fitting state, the number of epochs is very important. Here, this paper makes an experimental comparison of different epoch numbers to get an efficient classification for large-scale 3D point cloud. e experimental results are shown in Table 6. As can be seen from the above table, with the training epoch increases from 160, 200, and 250 to train the MFOC-CliqueNet network in this paper, the accuracy of each type of point cloud classification and the overall training accuracy are also improved accordingly. erefore, the method in this paper compares the classification accuracy with other methods based on this data set, as shown in Table 7. e experimental results show that the proposed method achieves an overall classification accuracy of 98.9% in Oakland point cloud dataset. Compared with our previous works [18], the overall classification accuracy is improved by 4.05%. And compared with papers [25][26][27][28][29], the overall classification accuracy has increased by 7.14%, 5.27%, 4.12%, 3.3%, 1.8%, and 1.21%, respectively. Figure 7 shows the contrast effect of threedimensional point cloud visualization in a large scene, Figure 7(a) is the visualization of the real-world data set, and Figure7(b) is the visualization of the classification results of the algorithm in this paper. In this paper, all point clouds in a large scene are projected in three directions: XOY, YOZ, and XOZ, and the overall classification accuracy is significantly better than other methods. For pole classification, XOY direction projection will affect the information feature   Bold values highlight the advantages of our method and other literature methods in classification accuracy comparison. extraction, resulting in poor classification efficiency of this category. At the same time, for wire classification, it is also affected by the XOZ direction projection, and making the classification efficiency not optimal. However, for the other three categories of objects, due to the projection in different directions, the self-occlusion and occluded of the 3D point cloud data itself are minimized, and more complete feature information of these point can be obtained, thereby improving the classification accuracy of these categories.
In the next work, we will improve the MFOC-CliqueNet further. Aiming at the situation that the geometric characteristics of some object categories in the 3D scene cloud data are close, such as pole and wire, we will improve the feature extraction efficiency of these two types of objects. Not only to improve the classification accuracy but also to enhance the robustness and generalization of MFOC-CliqueNet.

Conclusions
In the study of large-scene 3D point clouds, we introduce a new network structure MFOC-CliqueNet, based on the optimal combination of multidimensional features, which constructs the optimal combination matrix of multidimensional features by extracting the 3D features of 3D point cloud and the 2D features of multiple projection directions. CliqueNet is introduced into 3D point cloud data processing for the first time. It used a fixed number of parameters to obtain deeper representation space and combined with cyclic feedback to achieve the attention mechanism. It gets the MFOC with higher feature quality and passes it to the next Block by performing twice parameter cycle alternately update processing. At the same time, the multiscale feature strategy is adopted to effectively avoid the linear growth of parameters.
e experiments show that we proposed MFOC-CliqueNet can reach the best level with fewer parameters, especially the total classification accuracy on the Oakland 3D large-scale point cloud data set reaches 98.9%. Different from the previous network, the proposed MFOC-CliqueNet provides the potential of the model development for other computer vision tasks in the future, especially for the application of increasing 3D data, such as semantic segmentation and salient objects detection of video, point cloud, remote sensing data [29].

Conflicts of Interest
e authors declare no conflicts of interest.