Mango Grading System Based on Optimized Convolutional Neural Network

To improve the accuracy of mango grading, a mango grading system was designed using deep learning. The system mainly includes CCD camera image acquisition, image preprocessing, model training, and model evaluation. Because traditional deep neural network training needs a large number of samples, a convolutional neural network is proposed that grades mangoes efficiently through continuous adjustment and optimization of the super-parameters and batch size. The ultra-lightweight SqueezeNet algorithm is introduced. Compared with AlexNet and other algorithms of the same accuracy level, it has the advantages of small model size and fast operation. The experimental results show that the convolutional neural network model with optimized super-parameters performs very well in deep learning image processing of a small-sample data set. Two hundred thirty-four Jinhuang mangoes from Panzhihua were picked in the natural environment and tested. The analysis results meet the requirements of the agricultural industry standard of the People’s Republic of China for mango and mango grade specification. The average accuracy rate was 97.37%, the average error rate was 2.63%, and the average loss value of the model was 0.44. The processing time of an original image with a resolution of 500 × 374 was only 2.57 milliseconds. This method has important theoretical and application value and can provide a powerful means for automatic mango grading.


Introduction
Mango is an important economic crop in southeast China. It is native to tropical areas, is egg- or kidney-shaped, and is rich in vitamins. As the second largest mango producer in the world, China has established mango plantations in Sichuan, Yunnan, and Hainan, which have become pillar industries of those regions. With the rapid development of the mango planting industry and people's increasing demand for mango quality, the quality of mango directly affects its market competitiveness. Therefore, mango grading has become an indispensable step. China has formulated standards for mangoes based on shape, color, and surface defects. Based on the standards of the People's Republic of China "NY/T492-2002 [1]" and "NY/T3011-2016 [2]," mango was divided into three grades according to the characteristics of its surface defects.
At present, mango is mainly classified by manual inspection or by chemical extraction, but manual inspection is costly and inaccurate, and classification by chemical extraction damages the appearance quality of the mango [3,4]. In recent years, with the rapid development of machine vision and deep learning theory, it has become possible to use machine learning to classify and sort fruits in large quantities, which not only reduces labor but also improves accuracy [5][6][7][8]. Li and Eng [9] took apple images as the research object, improved the deep learning target detection framework, and built the corresponding learning model. After training and testing, the accuracy rate reached 97.6%. To address the difficulty of sample acquisition in supervised learning methods for fruit quality, Li et al. [10] took green plum as the research object and proposed an intelligent algorithm based on deep learning. The simulation analysis showed that the accuracy rate was 98.2%. Li et al. [11] proposed a mango quality grading algorithm based on computer vision and an extreme learning machine neural network. Compared with the traditional back propagation neural network, the proposed algorithm has higher grading accuracy. Great progress has been made in the application of machine vision to the detection and classification of globular fruits [12][13][14][15][16][17]. Applying convolutional neural networks with transfer learning can effectively solve problems in agriculture, because features need not be extracted manually and the sample images are classified automatically [18][19][20][21][22]. Saad et al. [23] presented an improved algorithm for mango grading and mango weight measurement, with a weight grading accuracy of 95%. He et al. [24] used image processing technology to automatically detect the shape of mango fruit and put forward evaluation indexes and methods for mango fruit shape.
Cluster analysis of 50 mango fruit shape indexes was carried out to determine the classification basis of each evaluation index. The results showed that the accuracy rate of mango shape evaluation can reach 92%. Mohd et al. [25] addressed the time-consuming and costly nature of traditional mango grading and proposed using computer vision, combining machine vision and image processing methods, to recognize the shape and irregularity of mango and thereby grade it. The experimental results showed that the average success rate of mango grading was 94%. To sum up, compared with spherical fruit, mango is ellipsoidal in shape and soft in texture. Existing studies have detected and graded mango by extracting mango images but did not refer to Chinese standards, and at present there is no research on mango grading according to Chinese national standards. Therefore, this paper takes the Panzhihua mango as the research object and grades it according to Chinese national standards. A convolutional neural network (CNN) deep learning model is established and trained in the HDevelop development environment.
Through automatic recognition of mango surface features, the identified image features are fed into the CNN model for training and testing, and the trained model can grade mangoes quickly. This model is compared with ResNet-50, MobileNetV2, and other models. After repeated adjustment of the "batch size" and "epoch" parameters, the models are evaluated and compared. The results show that this model is better than the other convolutional neural networks in processing speed and accuracy.
The traditional method of image processing with machine vision is generally divided into three steps: image acquisition, feature extraction, and graphics recognition. Because the deep learning classification method is used for image processing, a large number of images need to be preprocessed, trained, verified, and tested in order to get the expected classification results. The mango image processing flowchart is shown in Figure 2.

Image Preprocessing.
During the experiment, a total of 234 natural mango samples were collected, and the deep learning tool was used to label the samples. There were 55 first-grade mangoes, 60 second-grade mangoes, 58 third-grade mangoes, and 61 inferior mangoes. The labeled mango images are exported to the "hdict" data set format and loaded into the HDevelop integrated development environment. The data set is divided into a 70% training set, a 15% verification set, and a 15% test set; that is, 161 training samples, 35 verification samples, and 38 test samples are randomly assigned to start the pretraining of the deep learning model.
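The random assignment described above can be sketched in a few lines. This is a minimal illustration and not the HDevelop procedure used in the paper: the label strings, the fixed seed, and the function name `split_dataset` are assumptions made for the example, while the class counts and subset sizes are the ones reported in the text.

```python
import random

def split_dataset(items, n_train, n_val, n_test, seed=0):
    """Randomly partition items into training, verification, and test subsets."""
    if n_train + n_val + n_test != len(items):
        raise ValueError("subset sizes must add up to the number of items")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Class counts as reported in the text; the label strings are hypothetical.
counts = {"grade_1": 55, "grade_2": 60, "grade_3": 58, "inferior": 61}
samples = [(label, i) for label, n in counts.items() for i in range(n)]

train, val, test = split_dataset(samples, n_train=161, n_val=35, n_test=38)
print(len(train), len(val), len(test))
```

With 234 samples, the reported subset sizes correspond to roughly 69%, 15%, and 16% of the data, so the 70/15/15 proportions are approximate.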
To meet the requirements for training the CNN classifier, the following steps were carried out.
Step 1. The imported mango image is 500 × 374 pixels, while the network requires an input of 224 × 224 × 3, so the mango image is scaled.
Step 2. The image is enhanced to keep the gray value of each image between 0 and 255, and the gray level of the single-channel image is then converted so that the pixel values lie between −127 and 128. The gray conversion formula relates the output image g(x, y) to the input image f(x, y) through the maximum gray value G_max and the minimum gray value G_min.
Step 3. The preprocessing result image is obtained by connection processing and threshold segmentation. The image preprocessing process is shown in Figure 3.
As one of the most classic deep learning algorithms, the convolutional neural network is widely used in the field of image recognition. Unlike a fully connected neural network (FCNN) [29], a convolutional neural network processing large images uses convolution in place of matrix multiplication. A complete convolutional neural network mainly includes the following parts. The input layer is the input of the whole neural network; when processing images, it is usually the pixel matrix of the image. The convolution layer is the most important part of the convolutional neural network. Its function is to analyze each small patch of its input in depth to obtain more abstract features. The pooling layer is used to reduce the matrix, that is, to convert a higher-resolution image into a lower-resolution image. The fully connected layer gives the classification results from the extracted features. The input of the fully connected layer is produced by repeated convolution and pooling layers.
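As an illustration of the gray-value conversion in Step 2, the sketch below applies a linear min-max mapping from the image's own gray range onto [−127, 128]. The mapping g = (f − G_min) / (G_max − G_min) · 255 − 127 is an assumption that is consistent with the stated symbols and target range; it is not confirmed to be the paper's formula or the HALCON operator that was used, and the function name `rescale_gray` is hypothetical.

```python
def rescale_gray(image, out_min=-127.0, out_max=128.0):
    """Linearly map the gray values of a 2-D image onto [out_min, out_max]."""
    flat = [v for row in image for v in row]
    g_min, g_max = min(flat), max(flat)
    if g_max == g_min:
        # A constant image has no contrast; map every pixel to the midpoint.
        mid = (out_min + out_max) / 2.0
        return [[mid for _ in row] for row in image]
    scale = (out_max - out_min) / (g_max - g_min)
    return [[(v - g_min) * scale + out_min for v in row] for row in image]

# Tiny single-channel example with gray values in 0..255 (made-up data).
image = [[0, 51], [102, 255]]
print(rescale_gray(image))
```

For an image that already spans the full range 0 to 255, this mapping reduces to subtracting 127 from every pixel.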
In this paper, the convolutional neural network is optimized based on the ultra-lightweight SqueezeNet network. The basic module used is called fire, as shown in Figure 4. The fire module consists of three convolution layers: one squeeze layer and two expand layers. In the expand part, the results of the two different kernel sizes are combined and output through concat. The squeeze convolution kernel size is set to 1 × 1, and the expand convolution kernel sizes are set to 1 × 1 and 3 × 3, respectively.
In Figure 4, k represents the side length of the convolution kernel and c represents the number of channels. If the input and output dimensions are the same, the number of input channels is unlimited, and the number of output channels is $e_1 + e_2$. In the SqueezeNet structure proposed in this paper, $e_1 = e_2 = 4 s_1$. The structure of the whole hierarchical convolutional neural network is shown in Figure 5. To improve the training effect, considering the size and number of input samples and the number and size of the convolution kernels, a 12-layer CNN structure based on the ultra-lightweight network is constructed. The specific structure is as follows. The first layer is a convolution layer, which reduces the input image and extracts 64-dimensional features. The second to ninth layers are fire modules, each of which first reduces and then expands the number of channels; after every two modules, the number of channels increases. Max-pooling downsampling is added after layers 1, 3, and 5, each time halving the size. The tenth layer is a convolution layer that makes a prediction for each pixel. Finally, to reduce the amount of calculation, global average pooling replaces the fully connected layer, and the SoftMax function normalizes the output to probabilities. To improve the generalization ability of the model, dropout is used to avoid overfitting, with the dropout probability set to 0.2. The ReLU function is used globally as the activation function in the training process.
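To make the channel bookkeeping of the fire module concrete, the sketch below counts output channels and trainable weights for one module and compares them with a plain 3 × 3 convolution that produces the same number of output channels. It assumes that s_1 denotes the number of squeeze-layer output channels (the text does not define it) and that every convolution has a bias; the input width of 64 and s_1 = 16 are illustrative values, not the configuration reported in the paper.

```python
def conv_weights(c_in, c_out, k):
    """Weights plus biases of a k x k convolution from c_in to c_out channels."""
    return c_in * c_out * k * k + c_out

def fire_module(c_in, s1):
    """Return (output channels, weight count) of one fire module.

    Assumes e1 = e2 = 4 * s1 as stated in the text: a 1x1 squeeze layer
    followed by parallel 1x1 and 3x3 expand layers joined by concat.
    """
    e1 = e2 = 4 * s1
    weights = (conv_weights(c_in, s1, 1)    # squeeze, 1x1
               + conv_weights(s1, e1, 1)    # expand, 1x1
               + conv_weights(s1, e2, 3))   # expand, 3x3
    return e1 + e2, weights

# Illustrative values only: 64 input channels, 16 squeeze channels.
out_channels, fire_weights = fire_module(c_in=64, s1=16)
plain_weights = conv_weights(64, out_channels, 3)
print(out_channels, fire_weights, plain_weights)
```

Under these assumptions the fire module needs several times fewer weights than the plain 3 × 3 layer, which is the source of the small model size attributed to SqueezeNet.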
The convolution layer is the most important module of a CNN. Its input is a set of two-dimensional feature maps. The convolution kernel acts as a filter that is applied to the local data at each neuron node, producing the two-dimensional feature maps of the convolution layer. The principle of the convolution layer can be expressed by the following formula:
$$a_j^{l} = f\Big(\sum_{i \in M_j^{l}} a_i^{l-1} * k_{ij}^{l} + b_j^{l}\Big)$$
where $a_j^{l}$ represents the jth output feature map of layer l, $M_j^{l}$ represents the index set of the feature maps of layer l − 1 that correspond to the jth output feature map of layer l, $a_i^{l-1}$ represents the ith output feature map of layer l − 1, * represents the convolution operation, $k_{ij}^{l}$ and $b_j^{l}$ represent the convolution kernel and bias term, respectively, and f(·) represents the activation function.
A convolutional neural network model depends on the activation function applied to the data as it is passed on. In convolutional neural networks, the activation function must be nonlinear. The sigmoid, SoftMax, and ReLU functions are commonly used. The ReLU function has been shown to help with problems such as vanishing gradients. The ReLU function is given by the following formula:
$$f(x) = \max(0, x)$$
A pooling layer is usually inserted between successive convolution layers. It follows a convolution layer to gradually reduce the spatial size (width and height) of the data representation. The essence of the pooling layer is to further select the features of the convolved data and reduce the feature dimension using kernels of different sizes, executed independently on each depth slice of the input. If layer l − 1 is a convolution layer and layer l is a pooling layer, the expression of the pooling layer is shown in the following formula:
$$a_j^{l} = f\big(\beta_j^{l}\,\mathrm{down}(a_j^{l-1}) + b_j^{l}\big)$$
where down(·) represents the downsampling function and $\beta_j^{l}$ represents the multiplicative bias. After repeated cycles of convolution and pooling layers, the data are passed to the fully connected layer.
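The three operations described above (convolution, ReLU activation, and pooling) can be sketched for a single-channel feature map in plain Python. This is a didactic example under simplifying assumptions: one input map, one kernel, no padding, stride 1, and 2 × 2 max pooling; like most deep learning frameworks it computes cross-correlation (the kernel is not flipped). The image and kernel values are made up.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide the kernel over the image (no padding, stride 1) and add the bias."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw)) + bias
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fmap):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; halves width and height."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

image = [
    [1, 2, 0, 1, 3],
    [3, 1, 2, 0, 1],
    [0, 2, 4, 1, 0],
    [1, 0, 1, 3, 2],
    [2, 1, 0, 1, 1],
]
kernel = [[1, -1], [-1, 1]]

features = max_pool2x2(relu(conv2d_valid(image, kernel)))
print(features)
```

The 5 × 5 input becomes a 4 × 4 map after the 2 × 2 convolution and a 2 × 2 map after pooling, which illustrates how each pooling step halves the spatial size.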
The fully connected layer is used as the output layer and computes the scores for the output categories of the network. It has both ordinary layer parameters and super-parameters. The fully connected layer transforms the input data volume as a function of the activations in the input and of its parameters (the weights and biases of the neurons). The SoftMax function is shown in the following formula:
$$S_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$
where $S_i$ represents the output of the ith neuron, n represents the number of neurons, and $x_i$ represents the input signal. In this paper, mango grades are divided into four categories: first-grade mango, second-grade mango, third-grade mango, and NG.
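A short sketch of the SoftMax step for the four grade categories follows. The raw scores are hypothetical values chosen for illustration; subtracting the maximum score before exponentiating is a common numerical-stability measure and does not change the result.

```python
import math

def softmax(scores):
    """Normalize raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

GRADES = ["first-grade", "second-grade", "third-grade", "NG"]
scores = [2.0, 1.0, 0.1, -1.0]  # hypothetical network outputs for one image

probs = softmax(scores)
predicted = GRADES[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

The predicted grade is simply the category with the largest probability, and that probability can be read as the confidence of the classification.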

Model Building Based on Convolutional Neural Networks.
The pretraining model based on HDevelop in the HALCON software is used. The input layer of the convolutional neural network is a three-channel image. The size of the original image is 500 × 374 pixels. Based on the pretraining model of CNN, a convolutional neural network is constructed by superimposing convolution layers, pooling layers, and a fully connected layer. Then, the processed image is output, and finally mango grade classification is carried out. The specific convolution process is shown in Figure 6.

Super Parameter Setting.
In machine learning, there are not only the parameters of the model itself but also parameters whose tuning makes the network train better and faster. These tuning parameters are called super-parameters. They control the selection of the optimization function and of the model during the training of the learning algorithm. The key point in selecting super-parameters is to ensure that the model neither underfits nor overfits the training data set and learns the data structure as quickly as possible.
The learning rate determines how much the parameters are adjusted in each optimization step in order to minimize the prediction error of the neural network. A large learning rate makes the parameters jump, while a small learning rate (e.g., 0.000001) causes the parameters to change slowly. Therefore, the selection of the learning rate is particularly important. The momentum parameter can help the learning algorithm escape regions of the search space in which the whole system would otherwise stagnate. A suitable momentum value helps to build a higher quality model. To prevent overfitting, in which the parameters grow out of control, regularization is needed to find a suitable fit and keep the feature weight values low.
Because the number of samples is relatively small, a small number of samples are used to train the network. Training is based on the mini-batch stochastic gradient descent algorithm with momentum, and the loss function combines the cross-entropy error with a regularization term. The calculation method is shown in the following formulas:
$$m^{q+1} = \alpha m^{q} - \sigma \nabla_{n} L\big(f(z, n^{q})\big)$$
$$n^{q+1} = n^{q} + m^{q+1}$$
$$L\big(f(z, n)\big) = -\sum_{a} y_a^{\top} \log f(z_a, n) + \frac{\beta}{2}\sum_{i=1}^{k} n_i^{2}$$
where α represents momentum, σ represents learning rate, f(z, n) represents the results of classification, L(·) represents the loss function, n represents the weight parameters, z represents the input batch, $y_a$ represents the encoding of the ath image $z_a$, β represents the regularization parameter, and k represents the number of weights.
Mathematical Problems in Engineering
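The momentum update and the regularized cross-entropy loss can be sketched as follows. This is an illustrative implementation, not the HALCON training routine: the function names are hypothetical, the cross-entropy is summed over the batch (many implementations average it instead), and the toy objective in the demonstration is a one-dimensional quadratic with a larger learning rate than the paper's, chosen only so that convergence is visible in a few hundred steps.

```python
import math

def loss(probabilities, one_hot_labels, weights, beta):
    """Cross-entropy summed over the batch plus an L2 penalty on the weights."""
    cross_entropy = -sum(
        y * math.log(p)
        for probs, labels in zip(probabilities, one_hot_labels)
        for p, y in zip(probs, labels) if y > 0
    )
    return cross_entropy + 0.5 * beta * sum(w * w for w in weights)

def momentum_step(weights, velocity, gradient, learning_rate, momentum):
    """One update: v <- momentum * v - lr * grad, then w <- w + v."""
    new_velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, gradient)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity

# Loss of one hypothetical prediction for a first-grade sample.
example_loss = loss([[0.7, 0.1, 0.1, 0.1]], [[1, 0, 0, 0]],
                    weights=[1.0, 2.0], beta=0.0005)

# Toy demonstration: minimize (w - 3)^2, whose gradient is 2 * (w - 3).
weights, velocity = [0.0], [0.0]
for _ in range(300):
    gradient = [2.0 * (weights[0] - 3.0)]
    weights, velocity = momentum_step(weights, velocity, gradient,
                                      learning_rate=0.1, momentum=0.9)
print(round(example_loss, 4), round(weights[0], 4))
```

The velocity term carries information from earlier gradients, which is what lets the update keep moving through flat regions where a plain gradient step would stall.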

Parameter Test and Results
During the experiments, the parameter "batch_size" is initially set to 8 and the "epoch" to 30.
To avoid overfitting, the regularization parameter is set to 0.0005. The experiment examines the influence of the learning rate and momentum on the verification set accuracy, with learning rates of "0.1, 0.01, and 0.001" and momentum values of "0.5, 0.7, and 0.9," respectively. The experimental data are shown in Table 2.
Based on the experimental data, the learning rate is finally set to 0.001 and the momentum to 0.9. However, the resulting accuracy is still not ideal. Therefore, the parameters "batch_size" and "epoch" must be adjusted to achieve the ideal training effect. Because of the limitations of the experimental hardware and the number of samples, batch sizes at an interval of 4 are tested, namely "12, 16, and 20," and epoch numbers at an interval of 20 are tested, namely "80, 100, and 120." The experimental data are shown in Table 3.
When "batch_size" is set to 16 and "epoch" is set to 120, the accuracy is significantly higher than that of the other groups. After a total of 720 iterations, the training time is 73 s, which means that it takes only 0.23 s on average to process a 500 × 374 image, achieving the expected effect. This completes the preprocessing and the adjustment of the training parameters.
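The two-stage search described above (learning rate and momentum first, then batch size and epoch count) amounts to two small grid searches. The sketch below only enumerates the grids named in the text and provides a generic selector; the `evaluate` callback, which would train a model and return its verification accuracy, is a placeholder and is not implemented here.

```python
from itertools import product

# Candidate values as listed in the text.
LEARNING_RATES = [0.1, 0.01, 0.001]
MOMENTA = [0.5, 0.7, 0.9]
BATCH_SIZES = [12, 16, 20]
EPOCHS = [80, 100, 120]

def select_best(grid, evaluate):
    """Return the configuration for which evaluate(config) is highest."""
    return max(grid, key=evaluate)

# Stage 1 varies learning rate and momentum with batch size 8 and 30 epochs.
stage1_grid = list(product(LEARNING_RATES, MOMENTA))
# Stage 2 varies batch size and epoch count with the stage 1 winner fixed.
stage2_grid = list(product(BATCH_SIZES, EPOCHS))

print(len(stage1_grid), len(stage2_grid))
```

Searching in two stages needs 9 + 9 training runs instead of the 81 that a full grid over all four parameters would require, at the cost of not exploring interactions between the two parameter pairs.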

Model Evaluation and Performance Analysis.
Through the above optimization of the super-parameters of the neural network, the loss function curve gradually converges as the number of iterations and epochs increases; the results are shown in Figure 7. With suitably adjusted super-parameters, the average error rates of the training set and of the verification set also converge as the number of epochs increases, as shown in Figure 8.
It can be seen from Figures 7 and 8 that as training reaches about 35 epochs, the accuracy of the whole CNN-based neural network improves significantly. At the same time, the overall loss rapidly drops below 0.5 and the overall error drops below 0.21, which indicates a good model processing effect.

Robustness and Comparative Analysis.
To verify the robustness of the network model, 35 samples were tested. Mango grading follows the mango standard of the People's Republic of China "NY/T3011-2016," as shown in Table 4.
The test results are shown in Table 5, which gives the confusion matrix of the verification set of thirty-five mangoes. The experimental results show that only one first-grade mango is misjudged as third grade in machine recognition.
The confusion matrix, also known as the error matrix, is a standard format for accuracy evaluation. It is a matrix with n rows and n columns. It supports many evaluation indexes, such as overall accuracy and user's accuracy, through which the accuracy of image classification can be characterized. In deep learning, the confusion matrix is a visualization tool for supervised learning. Each column of the confusion matrix represents a predicted category, and each row represents the true category of the data. The 38 mango images of the test set are imported into the pretrained convolutional neural network model, which is run in the HDevelop environment for classification. The result is shown in Table 6. Only one first-grade mango is judged as a positive sample when it is in fact a negative sample, that is, a false positive (a type I error in statistics). The accuracy is 90.91%.
There is a certain similarity between first-grade mango and second-grade mango, and their F1-score is 95.24%. The F1-score is defined as follows:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where Precision represents the accuracy of the predictions for a single class and Recall represents the recall rate, that is, the fraction of the true samples of a class that are correctly identified. According to the mango quality grade index NY/T3011-2016 corresponding to Table 4, the accuracy of mango grade classification is analyzed. The analysis results are shown in Table 7.
The overall accuracy of the test results reaches 97.37%, and only one first-grade mango is misjudged as second grade, which achieves the expected accuracy.
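The reported figures can be checked against a small confusion matrix. The per-class counts below are hypothetical, since Table 6 is not reproduced in this text; they are chosen only so that the matrix contains 38 test samples with a single first-grade mango predicted as second grade. Rows are true classes and columns are predicted classes, following the convention described above.

```python
def per_class_metrics(matrix, k):
    """Precision, recall, and F1-score for class k (rows true, columns predicted)."""
    true_positive = matrix[k][k]
    predicted_k = sum(row[k] for row in matrix)   # column sum
    actual_k = sum(matrix[k])                     # row sum
    precision = true_positive / predicted_k if predicted_k else 0.0
    recall = true_positive / actual_k if actual_k else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

def overall_accuracy(matrix):
    """Fraction of samples on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)

# Hypothetical counts: first-grade, second-grade, third-grade, NG.
matrix = [
    [9, 1, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 9, 0],
    [0, 0, 0, 9],
]

precision, recall, f1 = per_class_metrics(matrix, k=1)  # second-grade class
print(f"{overall_accuracy(matrix):.2%} {precision:.2%} {recall:.2%} {f1:.2%}")
```

With these illustrative counts, the overall accuracy is 37/38 and the second-grade class has precision 10/11 and recall 1, which round to the 97.37%, 90.91%, and 95.24% (F1) quoted in the text.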
Table 4: Mango grade requirements according to NY/T3011-2016.
Grade | Requirement
First-grade mango | The fruit shape is not deformed and the size is uniform. The color of the fruit is normal and uniform. The pericarp is smooth, almost without defects, with no more than 2 single spots, and the diameter of each spot is less than 2 mm.
Second-grade mango | The fruit shape has no obvious deformation. The color of the fruit is normal and more than 75% of the fruit surface is uniform. The pericarp is smooth, with no more than 4 spots per fruit, and the diameter of each spot is less than 3 mm.
Third-grade mango | Deformation of the fruit shape must not affect the quality of the mango product. The color of the fruit is normal and more than 35% of the fruit surface is uniform. The pericarp is relatively smooth, with no more than 6 spots per fruit, and the diameter of each spot is less than 3 mm.

A heat map, also known as a thermal map, is a graphic representation that highlights the features of an object. By observing the heat map, we can intuitively find the user's overall access and other characteristics. Heat map analysis makes the prominent features of an image directly visible, so the target features can be captured accurately. By using heat maps, the HDevelop development environment combines the deep learning confidence results with visual features, which makes the whole classification process more intuitive. Two mango images are randomly selected from each grade of the test results, and the unprocessed images are analyzed with the heat map to locate the target defects, as shown in Figure 9. The belief propagation algorithm updates the label state of the whole Markov random field (MRF) by passing messages between nodes; it is an approximate calculation based on the MRF. It is an iterative algorithm mainly used to solve probabilistic inference problems in probabilistic graphical models, and all messages can be propagated in parallel. In this experiment, some confidence data from the mango grade recognition results are randomly selected, as shown in Table 8. The confidence is given by the probability distribution formula:
$$b_i(x_i) = \frac{1}{z}\,\Phi_i(x_i, y_i)\prod_{j \in N(i)} m_{ji}(x_i)$$
where $b_i(x_i)$ represents the joint probability distribution of node i, $m_{ji}(x_i)$ represents the message that hidden node j passes to hidden node i, $\Phi_i(x_i, y_i)$ represents the local evidence of node i, 1/z is the normalization factor that makes the confidences sum to 1, and N(i) represents the MRF first-order neighborhood of node i. The message propagation is expressed by formula (11), where $N(j) \setminus i$ represents the MRF first-order neighborhood of node j with the target node i excluded, $x_i$ and $x_j$ represent the hidden nodes, and $m_{ji}$ represents the propagated message.
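The belief (confidence) of a node can be sketched as the normalized product of its local evidence and the messages arriving from its neighbors. The example below illustrates only this normalization step for a single node with four states (one per grade); the evidence and message values are made up, and the computation of the messages themselves is not shown.

```python
def belief(local_evidence, incoming_messages):
    """Normalized product of local evidence and all incoming messages."""
    unnormalized = list(local_evidence)
    for message in incoming_messages:
        unnormalized = [b * m for b, m in zip(unnormalized, message)]
    z = sum(unnormalized)
    if z == 0:
        raise ValueError("evidence and messages assign zero mass to every state")
    return [b / z for b in unnormalized]

# Hypothetical values for one node with four states (one per grade).
evidence = [0.6, 0.2, 0.1, 0.1]
messages = [
    [0.5, 0.3, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
]

b = belief(evidence, messages)
print([round(p, 4) for p in b])
```

Dividing by the sum z plays the role of the 1/z factor: it rescales the product so that the confidences over all states add up to 1.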

Algorithm Comparison.
To verify the rationality of the proposed algorithm, it is compared with other models based on HDevelop. Because of hardware limitations, the same data set is used for all models, with the batch size set to 8, the number of epochs to 60, the momentum to 0.9, the learning rate to 0.001, and the regularization parameter to 0.0005. The comparison results are shown in Table 9. The recognition accuracy of the proposed method is the highest, reaching 96.94%, and the recognition time for a single image is only 2.57 ms. The ResNet-50 model is usually used for more complex tasks and has no obvious advantage on small data sets such as the one used in this paper. The recognition accuracy of the Enhanced model is almost the same as that of the proposed method, but its processing speed is lower. MobileNetV2 performs poorly on this small-sample classification task, with low recognition accuracy and a top1_error rate of 26.67%.

Conclusions
In this paper, the deep learning method based on a convolutional neural network effectively improves the recognition accuracy of mango grade classification and is more robust and efficient than traditional feature recognition algorithms. By adjusting the super-parameters, batch size, and number of epochs of the convolutional neural network, the CNN model achieves a high recognition rate on small data sets. The experimental error converges to the expected range. The algorithm comparison experiment shows that the ultra-lightweight SqueezeNet network saves memory and running time in grading classification tasks and that this model is well suited to nondestructive deep learning classification of small mango data sets. The optimized CNN in this paper is used to classify mangoes. Compared with current models such as AlexNet and ResNet-50, the accuracy is 97.37%, the average error rate is only 2.63%, and the processing time of an original image with a resolution of 500 × 374 is only 2.57 milliseconds. Slight blemishes or color spots on the surface of the mango have a certain influence on grading, and reducing this influence is the part still to be optimized. More training samples are needed to improve the application value of the system. This system can be applied in industrial production, but some modifications will be required.

Data Availability
The data can be obtained from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.