Multiview Machine Vision Research of Fruits Boxes Handling Robot Based on the Improved 2D Kernel Principal Component Analysis Network

To better realize intelligent orchard mechanization and reduce the labour intensity of workers, the study of intelligent fruit boxes handling robots is necessary. The first condition for intelligence is fruit boxes recognition, which is the research content of this paper. A multiview two-dimensional (2D) recognition method was adopted, and a multiview dataset of fruits boxes was built. To preserve the structure of the original image, a binary multiview 2D kernel principal component analysis network (BM2DKPCANet) model was established to reduce data redundancy and increase the correlation between the views. A multiview recognition method for fruits boxes was proposed by combining BM2DKPCANet with a support vector machine (SVM) classifier. The performance was verified by comparison with the principal component analysis network (PCANet), 2D principal component analysis network (2DPCANet), kernel principal component analysis network (KPCANet), and binary multiview kernel principal component analysis network (BMKPCANet) in terms of recognition rate and time consumption. The experimental results show that the recognition rate of the method is 11.84% higher than the mean value of the PCANet-related models, though it needs more time. Compared with the mean value of the KPCANet-related models, the recognition rate was 2.485% higher and the time consumption was 24.5% lower. The model can meet the requirements of a fruits boxes handling robot.


Introduction
As the primary industry of the national economy, agriculture is the primary condition for all production, and the proposal of precision agriculture has put forward higher requirements. The fruit industry, as a labour-intensive industry, has a large demand for labour and low work efficiency. Its automation and mechanization industry chain urgently needs upgrading [1]. With the rapid development of artificial intelligence, fruit recognition and fruit picking have been studied extensively [2].
There are relatively few studies on fruit handling [3]. On farms and in wholesale fruit markets, the handling of fruits boxes is still dominated by manual labour, which is time-consuming and labour-intensive. The cost of manual labour is increasing year by year, so manual handling cannot meet the demand of precision agriculture. To realize intelligent handling, this paper studied fruits boxes recognition based on machine vision.
According to the different modelling methods of target appearance, the research results of target recognition in recent years have been divided into three categories [4]: methods based on feature invariants, representation learning, and deep learning. The view models based on feature invariants extract the features of multiple images from different perspectives and then train a classifier, and are used when the number of training samples is small. This research mainly focuses on the construction of handcrafted features and classification algorithms, and many outstanding works have emerged. Because the characteristic invariance of the target must be studied in advance, the candidate features have weak adaptability, weak generalization ability, and limited applicability. The feature description vectors also have a large dimension, and the training cost of the classifier is high. To solve these problems, researchers proposed methods based on subspace learning, which transform high-dimensional feature vectors into low-dimensional ones and train the classifiers in the subspace. Typical representatives are principal component analysis (PCA), based on unsupervised learning, and linear discriminant analysis (LDA), based on supervised learning. Building on these, methods with low data dimension, strong noise processing ability, and high efficiency were put forward, such as robust PCA (RPCA), inductive RPCA (IRPCA), kernel PCA (KPCA), two-dimensional PCA (2DPCA), and discriminative low-rank and sparse principal feature coding (D-LSPFC). With the emergence of a large number of public image datasets, target recognition methods based on deep learning have been studied more and more. Models based on the convolutional neural network (CNN) have particularly promoted the development of computer vision by virtue of their strong nonlinear feature expression ability and good generalization performance.
Region CNN (R-CNN) [5] applied deep learning in the target recognition for the first time. And then deep convolutional neural networks Fast R-CNN [6] and Faster R-CNN [7] were proposed by combining the training and testing process, which improved the identification accuracy and efficiency greatly. As the product of integrating fuzzy logic reasoning and self-learning ability of neural network, neurofuzzy network has also been widely used [8].
The CNN-based single shot detector (SSD) [9] and the YOLO [10] deep learning object detection method raised real-time performance to a new level. On this basis, YOLOv2 [11] and YOLO V3 [12] gradually improved the running speed and robustness, and the detection performance was significantly improved. YOLO V4 [13], which took CSPDarkNet53 + SPP + PANet (path-aggregation-neck) + yolov3-head as the model, improved both speed and accuracy. It is undeniable that the effect of target recognition algorithms based on deep learning is remarkable, and the recognition accuracy is much higher than that of the traditional handcrafted methods and the representation learning methods. However, target recognition still faces great challenges in some situations, such as target overlap, partial occlusion, high similarity, complex environments, and strong interference. The complex models, long training time, and high requirements for hardware computing power of these methods have limited their application in mobile robots.
A fruit box is a three-dimensional (3D) object, and the direct extraction and recognition of its 3D features lead to complex calculation and high storage requirements. A view-based method is adopted in this paper; that is, 3D objects are represented through multiple views. As a common approach to 3D object recognition, multiview learning models and recognition methods have also received increasing attention.
The multiview-based convolutional neural network (MVCNN) was built in [14]. The maximum pooling layers blended the features of the multiple views. The MVCNN model had low convergence speed and low feature extraction efficiency because it was not an end-to-end network.
The end-to-end group-pair convolutional neural network (GPCNN) was established in [15], which could solve the small-scale problem. The novel pairwise multiview convolutional neural network (PMV-CNN) was proposed in [16], which focused on complementary information between views. Feature extraction and target recognition were unified into the CNN, which obviously improved the robustness of feature extraction when the number of training samples was small. To make up for the disadvantages caused by random image selection in multiview recognition, a multiview discrimination and pairwise convolutional neural network (MDPCNN) was obtained by adding the Slice layer and the Concat layer in [17].
The model was verified to have good intraclass compactness and interclass separability. The multiview-based Siamese convolutional neural network was exploited in [18]. An end-to-end multiview 3D fingerprint learning model was proposed in [19], which included a fully convolutional network and three Siamese networks. The multiview generator module was used in [20] to project the 3D point cloud onto a plane at a specific angle. On the premise of retaining the underlying features, a spatial refusion operation was adopted to realize the interaction between different projections, and the features were reconstructed for target recognition. Based on semisupervised learning and expectation maximization, a multiview fusion classification method with label propagation ability was proposed in [21]. An end-to-end cloud convolutional neural network was built in [22] based on the projection network mechanism. The point cloud was projected into a two-dimensional view with rich discriminant features, and the robustness and accuracy were improved significantly. Multiview feature projections were coded as binary in [23], and the recognition descriptors were assembled from block statistical features. Although the above methods have achieved good results, the models based on the convolutional neural network (CNN) also have some problems, such as complicated structure and long training cycles. They do not seem to be the best choice for fruits boxes handling robots. From the above research, it can be concluded that considering the relationship between multiple views can improve the recognition accuracy and robustness, and that binary coding can improve the operating efficiency of the model; both became design factors of the new model developed in this paper.
No system is perfect. The hidden state of the inevitable uncertainty in a system can be stimulated, and the connection between these uncertainties and the object system can be established to improve the system performance [24]. Although fruit packing boxes are generally regular cubes, traditional rule-based feature extraction and recognition methods cannot achieve good results because of the variety of fruits and the influence of surface patterns, colours, and the surrounding environment. Therefore, deep learning algorithms are more advantageous. Current deep learning target recognition algorithms are end-to-end solutions; that is, the task result is produced in one step from the input image, although internally the computation proceeds in stages, as an image feature extraction network followed by classification and regression. To address the long training time of the classic CNN parameters, the simple principal component analysis network (PCANet) was built in [25]. The convolution layer of the CNN was introduced into the classical feature extraction framework of "Feature Map-Pattern Map-Histogram," so that unsupervised hierarchical features were obtained and the high computational complexity caused by iteration and optimization was avoided. It has been widely used because of its simple model and rapid calculation. Since PCA cannot extract the nonlinear relationship between images, the kernel principal component analysis network (KPCANet) model was established in [26], which achieved better classification results than PCANet. To remove the redundancy of multiperspective views, our team proposed a binary multiview kernel principal component analysis network (BMKPCANet) model [27] for multiview object recognition. However, that model converted the two-dimensional image matrix into a vector when the features were extracted, which destroyed the original image structure and required a large amount of computation, so we improved the model.
Inspired by the two-dimensional principal component analysis network (2DPCANet) [28] and the two-dimensional kernel principal component analysis (2DKPCA) [29], the images of fruits boxes were processed by 2DKPCA, and a new multiview feature extraction model was established.
The main contributions of this work are summarized as follows: (1) A binary multiview two-dimensional kernel principal component analysis network (BM2DKPCANet) model was built to extract clustering features, which can reduce data redundancy and realize binary multiview clustering. (2) A multiview recognition method for fruits boxes was proposed by combining the BM2DKPCANet model with the support vector machine (SVM). (3) The proposed method was compared with the PCANet, 2DPCANet, KPCANet, and BMKPCANet models on the fruits boxes dataset and on the ETH-80 and COIL-100 public datasets. With recognition accuracy and time consumption as the evaluation indexes, the experiments showed that the recognition performance of the proposed method was superior to that of the other methods. The rest of this work is organized as follows. The methods based on 2DPCA and KPCA are introduced in Section 2. The method of obtaining fruits boxes images from multiple view angles is introduced in Section 3. The feature extraction process of the proposed BM2DKPCANet algorithm and the identification process of fruits boxes are also discussed in detail in Section 3. The experimental process, results, and discussion are presented in Section 4. Finally, the research and the future work are summarized in Section 5.

Related Works
In order to reduce the sample dimension and obtain the nonlinear correlation between multiple pixels, some scholars have proposed a series of algorithms by synthesizing the advantages of 2DPCA [30] and KPCA [31]. Nhat and Lee [32] proposed the kernel-based 2DPCA, which directly extracted nonlinear features from two-dimensional images.
The nonlinear correlation analysis of the matrix was realized. However, the storage requirement of the kernel matrix was high when the number of training samples was large. Zhou et al. [33] calculated a low-rank approximate decomposition of the kernel matrix using the Cholesky decomposition method to achieve nonlinear feature extraction, but the computational efficiency was low in the test stage. Xu et al. [34] used Laplace to reduce the dimension after 2DKPCA. Choi et al. [35] proposed the incremental 2DKPCA (I2DKPCA), which reduced the computation time and improved the performance of feature extraction. Zhang et al. [29] built the two-dimensional kernel PCA (2DKPCA) framework and compared the performance of unilateral 2DKPCA (row and column) with that of bilateral 2DKPCA in face and object recognition. Mohammad et al. [36] matched historical parameters by bilateral 2DKPCA. Xiang et al. [37] realized dimensionality reduction for hyperspectral images using the segmented row-column K2DPCA method. To reduce the storage requirement and computational complexity of the kernel matrix, blockwise methods were proposed [38,39], which transformed the large kernel matrix into several small kernel matrices and then combined the eigenvectors of the small kernel matrices. Wang and Zhou [40] mixed the image block and vector methods. The scale of the kernel matrix was decreased by taking several adjacent rows or columns of the image as a computing unit for the nonlinear mapping. Chen et al. [41] proposed bidirectional two-dimensional kernel quaternion principal component analysis (BD2DKQPCA) for colour image recognition. The kernel matrix was used to replace the covariance matrix between samples, which avoided high-complexity calculation in the high-dimensional space. They then improved 2DKQPCA by adding a blockwise process [42]. According to the characteristics of the quaternion Hermitian matrix, the blocks on the main diagonal, next to the main diagonal, and in the back-diagonal direction were analyzed.
Through the study of the above algorithms, and considering both recognition and computing performance, this paper sampled images in blocks when extracting the image features. Experiments with B2DKPCA [38], the bidirectional two-dimensional KQPCA (BD2DKQPCA) [41], and the block-based 2DKQPCA [42] demonstrated that the column-oriented algorithm outperformed the row-oriented algorithm in recognition. This work therefore adopted the column-oriented algorithm to conduct 2DKPCA; that is, the column vectors of the image sample are mapped to a high-dimensional space through the nonlinear mapping function, and the kernel matrix replaces the covariance matrix.

Establishment of Multiview Dataset of Fruits Boxes.
This work adopted the multiview feature method to collect images. Several two-dimensional projections from different viewpoints were used to describe the features of the boxes, under the principle that the set of projected views should be as small as possible while still representing the many common attitudes of the boxes. To describe and establish the visual model more conveniently, the relative position between a fixed view and boxes in different positions was transformed into the relation between a moving view and fixed boxes. The various postures of the boxes observed in normal operation were collected under the moving view. Since the opposite sides of the boxes generally had the same pattern, a multiple semiarc viewpoint projection model was set up, as shown in Figure 1. The camera moved on the green cambered surface, and multiple different postures of the boxes were obtained. The semiarc viewpoint surface must be divided into small areas to obtain the projections of 3D targets with different attitudes. The view areas are divided and the viewpoints distributed so that the projection view set is as small as possible while still representing the common attitudes of the boxes. Based on the idea of uniform division and the morphology diagram method [43], the distribution of viewpoints was described by latitude and longitude, as in geography, to simulate the box postures in real situations, as shown in Figure 2.
The projection of the box at each viewpoint corresponds to a two-dimensional image, and the multiview projection model of the box was constructed from these images. The experimental objects came from the fruit wholesale market of Zhangdian District, Zibo City, Shandong Province, China. A total of 15 box types covering 10 kinds of fruit were selected for the experiment, defined as apple1, apple2, apple3, watermelon1, watermelon2, orange1, orange2, cantaloupe1, cantaloupe2, pomegranate, pear, durian, coconut, banana, and pineapple, as shown in Figure 3. Multiview collection was carried out for the boxes of each category, as shown in Figure 4, and 200 samples of each category were retained. The images were normalized to 32 × 32 and converted to grayscale.
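The latitude-longitude division of the semiarc viewpoint surface described above can be illustrated with a short sketch. The function name `semiarc_viewpoints`, the radius, and the angular steps `lat_step_deg` and `lon_step_deg` are hypothetical; the text does not state the step sizes actually used.

```python
import math

def semiarc_viewpoints(radius=1.0, lat_step_deg=30, lon_step_deg=45):
    """Return camera positions (x, y, z) around a box placed at the origin.

    Longitude covers only 0..180 degrees (a semi-arc), on the assumption that
    opposite sides of a box carry the same pattern. Step sizes are illustrative.
    """
    points = []
    for lat in range(0, 91, lat_step_deg):       # 0 = side view, 90 = top view
        if lat == 90:
            points.append((0.0, 0.0, radius))    # a single viewpoint at the pole
            continue
        for lon in range(0, 181, lon_step_deg):
            phi, theta = math.radians(lat), math.radians(lon)
            points.append((radius * math.cos(phi) * math.cos(theta),
                           radius * math.cos(phi) * math.sin(theta),
                           radius * math.sin(phi)))
    return points

views = semiarc_viewpoints()
print(len(views))  # 16 viewpoints for the illustrative step sizes
```

Smaller steps give a denser view set; the trade-off named in the text is to keep the set small while still covering the common box attitudes.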

Multiview Public Datasets.
To fully verify the performance of the proposed multiview recognition algorithm, recognition tests were also carried out on the public datasets ETH-80 [44] and COIL-100 [45]. The ETH-80 dataset contains 8 species classes. Each species is an image set of 10 different objects, with 41 images of each object taken from different angles. In this paper, 4 objects of each species were randomly selected to form the training set, and the rest were used as the test set. The COIL-100 dataset contains images of 100 objects. Each object was photographed at 72 different angles around a 360° circumference. The training set was composed of 720 images of 20 randomly selected objects. Sample images from ETH-80 and COIL-100 are shown in Figure 5.
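The ETH-80 split described above (4 of the 10 objects per class for training, the remaining 6 for testing) can be sketched as follows. The function name `split_eth80`, the `(class, object, view)` index tuples, and the fixed seed are illustrative assumptions, not the authors' code.

```python
import random

def split_eth80(n_classes=8, objects_per_class=10, views_per_object=41,
                train_objects=4, seed=0):
    """Split by object, so all views of an object fall on the same side."""
    rng = random.Random(seed)
    train, test = [], []
    for c in range(n_classes):
        chosen = set(rng.sample(range(objects_per_class), train_objects))
        for o in range(objects_per_class):
            target = train if o in chosen else test
            target.extend((c, o, v) for v in range(views_per_object))
    return train, test

train, test = split_eth80()
print(len(train), len(test))  # 1312 1968
```

Splitting by object rather than by image keeps views of the same physical object out of both sets at once.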

BM2DKPCANet Model Based on 2DKPCA
3.2.1. Construction of Feature Extraction Model. Since the image database is composed of many two-dimensional multiview images that represent the common postures of the boxes as fully as possible, it contains a lot of data redundancy. To reduce unnecessary data storage, this paper added a clustering step to the feature extraction model of fruits boxes, as shown in Figure 6. Following the related research on principal component analysis networks, a two-layer 2DKPCA network was constructed. The extracted feature vectors were binary coded and clustered at the same time. The decimal clustering feature representation was obtained by block histogram transformation, which completed the clustering feature extraction.

BM2DKPCANet Model
(1) First 2DKPCA. The images in the database were resized to $m \times n$. With $I_i$ as the input layer, patches were sampled by sliding a $k \times k$ window over the image. All sample patches were gathered and cascaded. The $j$th patch of the $i$th image was defined as $x_{i,j}$, so the $i$th image could be expressed as

$$X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,mn}] \in \mathbb{R}^{k^2 \times mn},$$

where $m$ was the number of patches on rows and $n$ was the number of patches on columns. The mean was then removed from each sample patch to give the demeaned patch $\bar{x}_{i,j}$, and the local feature matrix of the $i$th image could be written as

$$\bar{X}_i = [\bar{x}_{i,1}, \bar{x}_{i,2}, \ldots, \bar{x}_{i,mn}].$$

After the same process was applied to the other images, feature analysis based on 2DKPCA was performed on the local feature matrices. Because the explicit form of the mapping is not needed, and to avoid complex calculation in the high-dimensional space, the covariance matrix of the mapped samples was replaced by a kernel matrix [41]. Each training sample matrix $I_i \in \mathbb{R}^{m \times n}$ ($i = 1, 2, \ldots, S$) was converted to a local feature matrix $X_i \in \mathbb{R}^{k^2 \times mn}$ by sliding patch sampling. The column-direction kernel matrix for $S$ training samples has dimension $Smn \times Smn$, which requires a large amount of computation. This work replaced the original $mn$ column vectors with their average column vector [29], so each training sample entered the nonlinear mapping as

$$\hat{x}_i = \frac{1}{mn} \sum_{t=1}^{mn} \bar{x}_{i,t},$$

where $i = 1, 2, \ldots, S$ and $t = 1, 2, \ldots, mn$. The $S$ training samples can be approximately expressed in the kernel feature space as

$$\Phi = [\phi(\hat{x}_1), \phi(\hat{x}_2), \ldots, \phi(\hat{x}_S)],$$

and the kernel matrix can then be expressed as

$$K_1 = \Phi^{T} \Phi.$$

The dimension was reduced to $S \times S$, and the computational complexity was greatly reduced. The kernel matrix $K_1$ was centralized [46], such that

$$\tilde{K}_1 = K_1 - \frac{1}{S}\mathbf{1} K_1 - \frac{1}{S} K_1 \mathbf{1} + \frac{1}{S^2}\mathbf{1} K_1 \mathbf{1},$$

where $\mathbf{1}$ was the matrix of order $S$ whose components were all 1. The eigenvectors corresponding to the $L_1$ largest eigenvalues of $\tilde{K}_1$ were taken as the kernel principal component filters $W_l^1$ of the first-layer network.
The training sample $I_i$, after zero-filling of the boundary, was convolved with the first-layer 2DKPCA filters,

$$I_i^l = I_i * W_l^1, \quad l = 1, 2, \ldots, L_1,$$

where $*$ was two-dimensional convolution, $I_i^l \in \mathbb{R}^{m \times n}$, and $L_1$ was the number of filters of the first 2DKPCA.
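The patch sampling, mean removal, column averaging, and kernel centralization steps of the first 2DKPCA can be sketched in plain Python. This is a minimal illustration under stated assumptions rather than the authors' implementation: the mean is removed per patch (as in PCANet), the Gaussian kernel takes the form $\exp(-\|x - y\|^2 / 2\sigma^2)$, and all function names are hypothetical. The eigen-decomposition and the convolution are omitted.

```python
import math

def extract_patches(image, k):
    """Slide a k x k window over a zero-padded image (one patch per pixel)
    and remove the mean of each patch (assumed per-patch mean removal)."""
    m, n = len(image), len(image[0])
    pad = k // 2
    padded = [[0.0] * (n + 2 * pad) for _ in range(m + 2 * pad)]
    for r in range(m):
        for c in range(n):
            padded[r + pad][c + pad] = float(image[r][c])
    patches = []
    for r in range(m):
        for c in range(n):
            patch = [padded[r + dr][c + dc] for dr in range(k) for dc in range(k)]
            mean = sum(patch) / len(patch)
            patches.append([v - mean for v in patch])
    return patches  # m*n vectors, each of length k*k

def average_column(patches):
    """Replace the m*n column vectors by their average column vector."""
    count = len(patches)
    return [sum(p[d] for p in patches) / count for d in range(len(patches[0]))]

def gaussian_kernel(x, y, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def centered_kernel_matrix(vectors, sigma=1.0):
    """Build the S x S kernel matrix and centre it in feature space."""
    S = len(vectors)
    K = [[gaussian_kernel(vectors[p], vectors[q], sigma) for q in range(S)]
         for p in range(S)]
    row_mean = [sum(row) / S for row in K]
    total_mean = sum(row_mean) / S
    # K is symmetric, so column means equal row means.
    return [[K[p][q] - row_mean[p] - row_mean[q] + total_mean for q in range(S)]
            for p in range(S)]

images = [
    [[0, 1, 2, 3], [1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]],
    [[5, 5, 5, 5], [5, 0, 0, 5], [5, 0, 0, 5], [5, 5, 5, 5]],
    [[0, 9, 0, 9], [9, 0, 9, 0], [0, 9, 0, 9], [9, 0, 9, 0]],
]
averaged = [average_column(extract_patches(img, 3)) for img in images]
K_centered = centered_kernel_matrix(averaged)
```

The kernel matrix here is only 3 × 3 (one row per training image), which reflects the reduction from $Smn \times Smn$ to $S \times S$ described in the text.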
(2) Second 2DKPCA. Taking the output of the first 2DKPCA as the input of the second 2DKPCA, the same process as in the first 2DKPCA was repeated. The nonlinear high-dimensional mapping of the image matrix was carried out. The kernel matrix $K_2$ was calculated and centralized to $\tilde{K}_2$. The first $L_2$ kernel principal component features of $\tilde{K}_2$ were used as the convolution filters $W_\ell^2$ of the second-layer network. Similarly, the output of the first 2DKPCA was further convolved, and the output of the second 2DKPCA could be obtained:

$$O_i^{l,\ell} = I_i^l * W_\ell^2, \quad i = 1, 2, \ldots, S;\; l = 1, 2, \ldots, L_1;\; \ell = 1, 2, \ldots, L_2. \quad (12)$$
(3) Binary Hash Features Clustering. To reduce the data redundancy caused by the multiangle acquisition of the box images, a clustering operation was carried out in this stage. The binary clustering algorithm [47] uses binary encoding to solve the multiview clustering problem: the binary encoding and the clustering of multiple views are optimized jointly. This greatly reduces the computation time and storage space needed for large datasets and improves speed and efficiency. The model proposed in this paper encoded and clustered the multiview dataset at the same time. In the total optimization function, $\alpha_m$ was the weight of the $m$th view, $m = 1, \ldots, M$, and different views had different weights; $r > 1$ was a scalar that controlled the weights. $B = [b_1, \ldots, b_n]$, where $b_i$ was the collaborative binary code of the $i$th instance, and each code in $B$ was represented by the product of a clustering centroid $C$ and an indicator vector $g$, with $G = [g_1, \ldots, g_n]$. $U_m$ was the mapping matrix of the $m$th view. $\phi(O_m^l)$ was the kernel function based on a nonlinear RBF mapping between the output features of the second 2DKPCA and randomly selected sample points under the $m$th view. $\gamma$ was a nonnegative constant and $\lambda$ was a regularization parameter. The optimization problem was divided into several subproblems: $U_m$, $B$, $C$, $G$, and $\alpha_m$ were updated alternately, each variable being optimized while the others were fixed. In the corresponding cost functions, con denotes a constant with respect to $B$, and $H$ is the distance from each code in $B$ to the cluster center.
The alternating updates were repeated until the total optimization function reached its optimum, which completed the binary hash clustering optimization.
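One ingredient of the alternating optimization, assigning each binary code to its nearest cluster centroid, can be illustrated in isolation. The sketch below assumes codes and centroids in {-1, +1} and uses the Hamming distance; it shows only this assignment step, not the full joint optimization of [47], and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_to_centroids(codes, centroids):
    """For each binary code, return the index of the nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: hamming(code, centroids[c]))
            for code in codes]

codes = [(1, 1, -1, -1), (1, 1, 1, -1), (-1, -1, 1, 1), (-1, 1, 1, 1)]
centroids = [(1, 1, -1, -1), (-1, -1, 1, 1)]
labels = assign_to_centroids(codes, centroids)
print(labels)  # [0, 0, 1, 1]
```

Because the codes are binary, the distance reduces to counting differing bits, which is the source of the speed and storage savings mentioned in the text.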
(4) Output of the Block Histogram Features. For each input $I_i^l$ of the second 2DKPCA, $L_2$ features were output, and their binary cell vectors were clustered and optimized as a whole. Each optimized feature was converted to a decimal map $T_i^{l,m}$, $l \in [1, L_1]$, in which each pixel was an integer within $[0, 2^{L_2} - 1]$. The $Z$ blocks of each $T_i^{l,m}$ were counted by the histogram $\mathrm{Zhist}(T_i^{l,m})$. A vector can be obtained by connecting the histograms,

$$f_i^m = [\mathrm{Zhist}(T_i^{1,m}), \ldots, \mathrm{Zhist}(T_i^{L_1,m})], \quad (20)$$

where $f_i^m$ is the BM2DKPCANet feature of the $i$th sample under the $m$th view.
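The conversion of $L_2$ binary maps to a decimal map and the block histogram can be sketched as follows. The bit ordering (the first map is the least significant bit), the use of non-overlapping blocks, and the function names are assumptions made for illustration.

```python
def to_decimal_map(binary_maps):
    """Combine L2 binary maps (values 0/1) into one map of integers
    in [0, 2**L2 - 1]; map index 0 is taken as the least significant bit."""
    rows, cols = len(binary_maps[0]), len(binary_maps[0][0])
    return [[sum(bm[r][c] << idx for idx, bm in enumerate(binary_maps))
             for c in range(cols)] for r in range(rows)]

def block_histograms(decimal_map, block, n_bins):
    """Concatenate the histograms of non-overlapping block x block regions."""
    feature = []
    for r0 in range(0, len(decimal_map) - block + 1, block):
        for c0 in range(0, len(decimal_map[0]) - block + 1, block):
            hist = [0] * n_bins
            for r in range(r0, r0 + block):
                for c in range(c0, c0 + block):
                    hist[decimal_map[r][c]] += 1
            feature.extend(hist)
    return feature

b0 = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
b1 = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
T = to_decimal_map([b0, b1])                      # L2 = 2, values in 0..3
feature = block_histograms(T, block=2, n_bins=4)  # 4 blocks x 4 bins
```

Here `feature` has 16 entries; in the model, the histograms of all $L_1$ decimal maps would be concatenated into the final feature vector.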

Fruits Boxes Recognition Based on BM2DKPCANet Model.
The fruits boxes features extracted by the BM2DKPCANet model were input into the classifier for training and recognition. The performance of the classifier directly determines the recognition accuracy and classification speed. The support vector machine (SVM) is widely used in pattern recognition because of its outstanding advantages in small-sample, nonlinear, high-dimensional problems [48,49], and this work also used an SVM as the classifier. According to previous studies [27], the radial basis function (RBF) was selected as the kernel function, which maps the features into a high-dimensional space to find the optimal hyperplane. Different kinds of fruits boxes were thereby recognized correctly. The specific identification process of fruits boxes is shown in Figure 7.

Results and Discussion
The experiment was performed with Matlab2017b and the Python integrated environment Anaconda3 on a platform with an Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.6 GHz, 64 GB RAM, and an NVIDIA GeForce GTX 1080 8G GPU. The classifier kernel parameters were selected by grid search and cross-validation based on the LibSVM software package. The penalty parameter $C = 58$ and the kernel parameter $\gamma = 2$ were determined. The following experiments analyzed the influence of the parameters on model performance, taking the average accuracy over 10 tests as the evaluation index. The recognition accuracy was used as the evaluation metric:

$$\mathrm{ACC} = \frac{1}{n} \sum_{i=1}^{n} \phi(g_i, \mathrm{map}(c_i)),$$

where $n$ is the total number of images in the dataset, $g_i$ is the ground truth of the $i$th image, and $\mathrm{map}(c_i)$ is the class predicted by the algorithm. If $g_i = \mathrm{map}(c_i)$, then $\phi(g_i, \mathrm{map}(c_i)) = 1$; otherwise $\phi(g_i, \mathrm{map}(c_i)) = 0$.
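The accuracy metric and the averaging over repeated tests can be written as a short sketch. The function names and the toy labels are illustrative only.

```python
def recognition_accuracy(ground_truth, predicted):
    """Fraction of samples whose predicted class equals the ground truth."""
    if len(ground_truth) != len(predicted):
        raise ValueError("label lists must have the same length")
    matches = sum(1 for g, p in zip(ground_truth, predicted) if g == p)
    return matches / len(ground_truth)

def mean_accuracy(runs):
    """Average accuracy over several (ground_truth, predicted) test runs."""
    return sum(recognition_accuracy(g, p) for g, p in runs) / len(runs)

truth = ["apple1", "pear", "durian", "banana"]
pred = ["apple1", "pear", "coconut", "banana"]
acc = recognition_accuracy(truth, pred)
avg = mean_accuracy([(truth, pred), (truth, truth)])
```

With these toy labels, three of four predictions match in the first run and all four in the second.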

Influence of Kernel Function.
KPCA is a nonlinear extension of principal component analysis using the kernel technique. The choice of kernel function governs the extraction of nonlinear features from the dataset and directly affects the recognition performance of the model. This paper studied the influence of commonly used kernel functions on the performance of the BM2DKPCANet model, namely the Linear, Polynomial, PolyPlus, Gaussian, and Sigmoid kernel functions. In their expressions, $c$, $d$, $\sigma$, and $\alpha$ are real constants, set in this work to $c = 0$, $d = 3$, $\sigma = 1$, and $\alpha = 1/2$, and $\upsilon_i, \upsilon_j \in \mathbb{R}^{mn}$ are row vectors of the matrix to be transformed. The influence of the kernel function on model performance under the same parameter settings is shown in Figure 8. It can be seen that the Gaussian kernel function achieves the best recognition effect in the model.
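For orientation, commonly used forms of the five kernels are sketched below with the constants stated in the text as defaults. The exact expressions used in the experiments are not recoverable from the text, so these forms (in particular the PolyPlus offset of 1 and the $2\sigma^2$ scaling of the Gaussian) are assumptions.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear(x, y):
    return dot(x, y)

def polynomial(x, y, d=3):
    return dot(x, y) ** d

def polyplus(x, y, d=3):
    return (dot(x, y) + 1.0) ** d          # assumed offset of 1

def gaussian(x, y, sigma=1.0):
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))

def sigmoid(x, y, alpha=0.5, c=0.0):
    return math.tanh(alpha * dot(x, y) + c)

x, y = (1.0, 2.0), (0.5, 1.0)
values = {k.__name__: k(x, y) for k in (linear, polynomial, polyplus, gaussian, sigmoid)}
```

Only the Gaussian kernel depends on the distance between the two vectors rather than on their inner product, so it is bounded in (0, 1] regardless of the vector norms.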

Influence of Number of Filters.
The patch size, block size, and overlapping ratio were set to 5 × 5, 8 × 8, and 0.5, respectively. The influence of the number of filters on the performance of the model was studied, as shown in Figure 9. The blue line represents the accuracy on the fruits boxes dataset when the number of filters of the first 2DKPCA was varied from 2 to 14. The accuracy tended to be stable when $L_1 \geq 8$. The number of filters of the second 2DKPCA was then selected with $L_1 = 8$. The red line represents the accuracy on the fruits boxes dataset when the number of filters of the second 2DKPCA was varied from 2 to 14. The accuracy levelled off when $L_2 \geq 6$. Therefore, $L_1 = 8$ and $L_2 = 6$ were used in the following experiments.

Influence of Patch Size and Block Size.
Since the PCA filters must satisfy $k_1 k_2 \geq L_1, L_2$, the minimum patch size was set to 3 × 3, and the maximum patch size was set to 13 × 13. To observe the influence of patch size and block size on recognition in the proposed model, the block sizes were defined as 4 × 4, 8 × 8, and 16 × 16. The accuracy for different patch sizes and block sizes on the fruits boxes dataset is shown in Figure 10. The accuracy tends to increase as the block size increases. However, because a larger block size loses part of the features of the first-layer network [25], the patch size is set to 5 × 5 and the block size is set to 8 × 8 in this paper.

Influence of Overlapping Ratio.
It has been verified that overlapping blocks not only improve target detection accuracy [50] but also resist geometric rotation and scaling changes to some extent, and robustness is also enhanced [51]. To strengthen the spatial information of fruits boxes, overlapping partitioning was carried out in this paper. The overlapping ratio of blocks was varied from 0.1 to 0.9. The influence of the overlapping ratio on fruits boxes recognition, with the other parameters at their optimal values, is shown in Figure 11. It can be seen that the recognition performance of the model is optimal when the block overlapping ratio is 0.6.
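How the overlapping ratio changes the number of histogram blocks can be seen from a small sketch. The stride rule, the block size multiplied by one minus the overlapping ratio and then rounded, is an assumption borrowed from common PCANet implementations, and `block_starts` is a hypothetical helper.

```python
def block_starts(length, block, overlap_ratio):
    """Start indices of blocks along one axis for a given overlap ratio."""
    stride = max(1, round(block * (1.0 - overlap_ratio)))
    return list(range(0, length - block + 1, stride))

# 32-pixel axis (the normalized image size) with 8 x 8 blocks
no_overlap = block_starts(32, 8, 0.0)    # stride 8
half_overlap = block_starts(32, 8, 0.5)  # stride 4
best_ratio = block_starts(32, 8, 0.6)    # stride 3
print(len(no_overlap), len(half_overlap), len(best_ratio))  # 4 7 9
```

A higher overlapping ratio yields more blocks per axis and therefore a longer feature vector, which is the cost paid for the added spatial information.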

Analysis of Experimental Results.
To verify the recognition ability of the proposed algorithm for fruit packing boxes, 80 images of each type of packing box were randomly selected as training samples, and the other 120 images were taken as test samples. The experiment was done with the optimal parameters. The recognition accuracies of the different categories are shown in Table 1.
The overall recognition meets the requirements of a fruits boxes handling robot. For apple3, watermelon1, orange2, cantaloupe2, pear, and pineapple, the top and side surfaces are easily confused, which leads to lower accuracy. The average accuracy is 92.89%, which is 2.09% higher than that of the BMKPCANet [27] model. To verify the performance of the BM2DKPCANet model, it was compared with PCANet [25], 2DPCANet [28], KPCANet [26], and BMKPCANet [27] in terms of recognition rate and normalized time, as shown in Figure 12. Although the proposed BM2DKPCANet model consumes more time than the PCANet-related models, its recognition rate is 11.84% higher than the average of the PCANet-related models and 2.485% higher than the average of the KPCANet-related models. In addition, its time consumption is 24.5% lower on average than that of the KPCANet-related models. Taking these results into account, the BM2DKPCANet model is better than the other models in fruits boxes recognition.
Recognition experiments were also conducted on the public datasets with the same model parameters to verify the proposed multiview recognition algorithm. The comparisons of model performance on ETH-80 and COIL-100 are shown in Figures 13 and 14, respectively. The proposed BM2DKPCANet model achieves a higher recognition accuracy than the other models.
The experimental results show that the BM2DKPCANet model achieved a good recognition effect on the three datasets. Compared with the PCANet and 2DPCANet models, the proposed model adopts the kernel principal component analysis method, which maps the features to a high-dimensional space by nonlinear mapping and then carries out PCA dimensionality reduction. The nonlinear relationship of the images is thus extracted and the recognition accuracy is greatly improved, although the calculation is more complex and takes more time than PCA. Compared with the KPCA used in the KPCANet and BMKPCANet models, the proposed model does not need to transform the two-dimensional matrix into a one-dimensional vector but directly applies the average column vector method to the two-dimensional image, which preserves the structural information of the original image as far as possible and greatly reduces the complexity. Therefore, both the recognition rate and the efficiency are improved.

Conclusions
To reduce the labour intensity of fruit handling in orchards and fruit markets, this paper studied fruits boxes recognition based on machine vision. The recognition of 3D boxes was transformed into the feature extraction of 2D images. To preserve the structure of the original 2D images, the established BM2DKPCANet model performed two-layer 2DKPCA analysis on the 2D images. A binary clustering algorithm was added in the feature extraction stage to reduce the data redundancy caused by the multiview acquisition. The multiview recognition method for fruits boxes was proposed by combining the BM2DKPCANet model with an SVM classifier based on the RBF kernel. The experimental results showed that the recognition accuracy of this method is 11.84% higher than the average of the PCANet-related models and 2.485% higher than the average of the KPCANet-related models, which can meet the requirements of automatic rapid identification in fruits boxes handling. This lays a foundation for realizing the intelligent mechanization of fruits boxes handling and reducing the labour intensity of fruit farmers.

Data Availability
The dataset presented in this study is available on request from the corresponding author.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this study.