Deep Learning Based on Residual Networks for Automatic Sorting of Bananas

Journal of Food Quality


Introduction
In the food industry, the quality of processed fruits is extremely important. Meeting consumer demand while producing high-quality fruit at a very fast rate on the production line requires high-performance technologies [1]. Moreover, the food industry is one of the few fields that are constrained by their dependency on weather conditions and the labor market [2]. For example, if fruits are not harvested at the most suitable time because of bad weather, the quality and quantity of the harvest may decrease as the fruits ripen excessively. For many years, most technological processes in this industry were controlled mainly by human operators.
Some delicate tasks, such as postharvest handling and the grading of healthy and defective products, were based on human decisions. Human operators are prone to eye strain and to fatigue caused by lack of sleep and overwork, which can affect their performance. Fruit sorting is a decision-making task that uses visual features of a fruit to decide whether it is healthy or defective as it passes along a conveyor belt. Therefore, it is a computer vision problem that is well suited to machine learning, which can prevent the errors made by human operators [3].
Recently, several research works have addressed the inspection and grading of fruits using computer vision and machine learning techniques. The common applications are classification and sorting of fruits [4,5], identification of fruit defects [6], ripeness detection [7], and estimation of food security [8]. Reference [6] argued that products should have a certain weight, size, colour, and density in order to meet quality standards. The authors therefore proposed a machine vision system for controlling 1-10 conveyor belts, with a maximum throughput of 15 fruits per second. The system was based on automatic visual inspection of fruits and vegetables using machine vision algorithms and sensors, and it classified fruits into a set of classes by implementing colour processing, weight detection, size measurement, and density detection. The authors reported that the system performance was satisfactory: when it was compared with human grading criteria, no significant differences were observed. Moreover, the system processed 15 fruits per second while controlling 2 conveyor belts.
In recent years, research has been carried out on the determination of banana size [9], banana ripeness [10], and the sorting of healthy and unhealthy bananas [11]. Reference [5] presented an automatic sorting system for bananas. The system was based on the extraction of texture features of bananas using the gray-level co-occurrence matrix (GLCM). Three algorithms, a backpropagation neural network (BPNN), a support vector machine (SVM), and a radial basis function network (RBFN), were used for classification. Experimental results showed the highest classification rate, 100%, with the SVM, whereas the RBFN and BPNN scored 96.25% and 98.8%, respectively. As a result of implementing such systems, production quality and quantity have increased, and the production process has become faster.
Recently, machine learning algorithms have been applied to a wide range of engineering and image processing problems. Machine learning, and deep learning in particular, has undergone major development that sharply improved its performance in areas such as medicine [12,13], agriculture [14], and food engineering [15]. Different deep learning structures have been designed to improve performance, including AlexNet [16] with 8 layers, VGG [17] with up to 19 layers, and GoogLeNet [18] with 22 layers. Chronologically, these networks became deeper and deeper. However, the deeper structures caused an optimization difficulty during training, namely vanishing gradients, which affected the generalization performance of the network: accuracy became saturated and then degraded rapidly. To overcome this problem, residual learning was introduced for training very deep networks [19]. A few research studies have used residual networks to solve different engineering problems. In reference [20], a deep residual neural network (ResNet) is combined with lower and upper bound estimation to construct prediction intervals for forecasting future flow. In reference [21], a deep neural network is used to identify six kinds of grain pests, with a residual network introduced to improve the convolutional part of the model. Reference [22] presents a local binary residual block that reduces the number of trainable parameters in very deep residual networks; the structure was shown to remove at least 69.2% of the trainable parameters. The study [23] presented a deep convolutional neural network, termed the dense residual network, for optical character recognition. The study [24] presents multiple improved residual networks for super-resolution reconstruction of medical images.
Residual networks (ResNet) are built from shortcut connections that skip over some layers. ResNet models are typically designed with double- or triple-layer skips instead of the strictly consecutive layer connections used in plain deep networks such as AlexNet. Skipping over layers helps to avoid the vanishing gradient problem. In this study, we use residual learning for optimization of the network parameters. The study presents the design of a 50-layer deep network, ResNet-50, that sorts bananas into healthy and defective categories. Transfer learning and residual learning are applied for the optimization of the network parameters and the development of the system. The study is structured as follows. Section 2 presents the ResNet-50 used for grading bananas. Section 3 presents the dataset and the training process of the network. Section 4 presents the results and discussion of the study. Section 5 gives the conclusion.

Residual Learning
Deep networks are multilayer neural network structures with more than one hidden layer. Learning in deep networks is carried out hierarchically, from lower-level to higher-level features, through the successive layers of the network [25]. Deep learning based on convolutional neural networks (CNNs) has been widely used to solve different engineering problems and has shown significant performance [26-32]. As mentioned, very deep structures cause an optimization difficulty during training, the vanishing gradient problem, which affects the performance of the network. In this study, we use residual learning to overcome this problem and design a deep learning structure for grading fruits. Figure 1 depicts a residual block of ResNet. As shown in the figure, the stacked layers of a residual network learn a residual function F(x), while a shortcut connection performs an identity mapping of the input x. The shortcut output is added to the output of the stacked layers, giving F(x) + x.
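The residual block described above can be illustrated with a minimal sketch in plain Python. It assumes two small fully connected layers in place of the convolutions and omits batch normalization; the names `linear`, `relu`, and `residual_block` are hypothetical helpers introduced only for this illustration, not part of the system used in the study.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, value) for value in vector]


def linear(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Return ReLU(F(x) + x), where F(x) = W2 * ReLU(W1 * x).

    The shortcut adds the unchanged input x to the output of the
    stacked layers, so the layers only have to learn the residual F.
    """
    fx = linear(w2, relu(linear(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0, 3.0]
zero_weights = [[0.0] * 3 for _ in range(3)]

# With all weights at zero, F(x) = 0 and the block passes x through unchanged.
out = residual_block(x, zero_weights, zero_weights)
```

The zero-weight case shows why the shortcut helps: even when the stacked layers contribute nothing, the block still behaves as an identity mapping rather than destroying its input.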
During the training of a deep network using backpropagation, the gradient of the error is calculated at the output and propagated back toward the shallow layers. As it passes through more layers, this gradient becomes smaller until it finally vanishes. This is called the vanishing gradient problem of very deep networks. The problem can be solved using residual learning [19], as shown in Figures 1 and 2. Figure 2 shows the original residual unit l inside the residual network. The figure depicts the weights, batch normalization (BN), and rectified linear unit (ReLU). The input and output of a residual unit are calculated as follows:

$$y_l = h(x_l) + F(x_l, W_l), \qquad x_{l+1} = f(y_l), \tag{1}$$

where $h(x_l)$ is the identity mapping, $F$ is the residual function, $x_l$ is the input, $W_l$ is the weight coefficient, and $f$ is the activation function applied to the sum $y_l$. The identity mapping can be written as $h(x_l) = x_l$. This defines the basis of the ResNet architecture. Residual networks were developed with different numbers of layers: 34, 50, 101, and 152. In this study, ResNet-50, which consists of 50 layers, was used. The 50-layer ResNet is obtained from the 34-layer ResNet by replacing each 2-layer block with a 3-layer bottleneck block.
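The step from 34 to 50 layers can be checked with simple arithmetic. The sketch below assumes the block counts per stage given in the original ResNet paper [19] (3, 4, 6, and 3 blocks); these counts are not stated in this text, and `count_layers` is a hypothetical helper.

```python
# Residual blocks per stage, shared by ResNet-34 and ResNet-50
# (assumption taken from the original ResNet paper, not from this study).
BLOCKS_PER_STAGE = [3, 4, 6, 3]


def count_layers(convs_per_block):
    """Count weighted layers: first convolution + block convolutions + final FC."""
    first_conv = 1
    fully_connected = 1
    block_convs = convs_per_block * sum(BLOCKS_PER_STAGE)
    return first_conv + block_convs + fully_connected


resnet34_layers = count_layers(convs_per_block=2)  # 2-layer basic blocks
resnet50_layers = count_layers(convs_per_block=3)  # 3-layer bottleneck blocks
```

Under this assumption, the 16 blocks contribute 32 convolution layers in the 34-layer network and 48 in the 50-layer network; the remaining two layers in each case are the first convolution and the fully connected layer.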

Dataset.
The proposed ResNet-50 deep learning structure was applied to the classification of bananas. The model was retrained using two banana datasets that include healthy and defective banana images. Here, healthy means that the bananas are edible and can be used as raw material in the fruit industry, while defective means that they have deteriorated and are not edible. The first database used in this research was taken from [5]. Its images were captured with a digital camera and then converted into a manageable format. The collected images are 960 × 720 pixels; hence, we downsampled them to fit the ResNet-50 input size of 224 × 224 pixels. This dataset contains 300 images: 150 healthy and 150 defective bananas. The second database also contains 300 images. Its 150 healthy banana images were obtained from the Fruit 360 dataset [32], which includes different fruit types. To balance the classes, 150 images of defective bananas were collected from the web. Overall, the combined dataset includes 600 images: 300 healthy and 300 defective bananas.
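The downsampling step can be sketched as follows. The text does not say which interpolation method was used, so nearest-neighbour sampling is assumed here purely for illustration, and `resize_nearest` is a hypothetical helper. Note that resizing 960 × 720 to 224 × 224 changes the aspect ratio from 4:3 to 1:1.

```python
def resize_nearest(image, out_height, out_width):
    """Downsample a 2D image (list of rows) by nearest-neighbour sampling."""
    in_height, in_width = len(image), len(image[0])
    return [
        [
            image[(row * in_height) // out_height][(col * in_width) // out_width]
            for col in range(out_width)
        ]
        for row in range(out_height)
    ]


# Stand-in for a 960 x 720 photograph: each "pixel" stores its own (row, col).
source = [[(row, col) for col in range(960)] for row in range(720)]

# Resize to the 224 x 224 input size expected by ResNet-50.
small = resize_nearest(source, 224, 224)
```

Storing coordinates as pixel values makes it easy to inspect which source pixel each output pixel was sampled from.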

Data Augmentation.
Data augmentation was employed to create a more robust sorting system and to prevent overfitting during the training of the network. Rotation and shift transformations were applied so that the system can detect the condition of a banana at different angles and positions. The 600 original images were rotated by 0°, 90°, and 180°, and the images were also randomly translated by up to two pixels horizontally and vertically. In total, a dataset of 2400 images was formed, half of healthy bananas and half of defective ones. Figure 3 shows a sample of the formed database.

In this work, the ResNet-50 model was retrained and tested in the Matlab environment. The network was simulated on a 64-bit Windows desktop computer with an Intel Core i7-4770 processor and 8 GB of random access memory. 40% of the images were used for training the pretrained model, while the remaining 60% were used for testing and evaluating the network's performance. The network was evaluated by calculating its training and testing accuracy and loss.
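The rotations and shifts used for data augmentation can be sketched in plain Python on a small 2D array. This is an illustrative sketch only: the helper names are hypothetical, the zero fill for shifted-in pixels is an assumption, and producing four variants per image (0°, 90°, 180°, and one random shift) is one reading that is consistent with 600 originals yielding 2400 images.

```python
import random


def rotate90(image):
    """Rotate a 2D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotate180(image):
    """Rotate a 2D image by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def translate(image, dx, dy, fill=0):
    """Shift an image by dx columns and dy rows, filling exposed pixels."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            nr, nc = r + dy, c + dx
            if 0 <= nr < height and 0 <= nc < width:
                shifted[nr][nc] = image[r][c]
    return shifted


def augment(image, rng, max_shift=2):
    """Return the original, two rotations, and one random shift (assumed scheme)."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return [image, rotate90(image), rotate180(image), translate(image, dx, dy)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
tiny = [[1, 2], [3, 4]]
variants = augment(tiny, rng)
```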

Transfer Learning of
The accuracy and loss were calculated using the following formulas:

$$\text{Accuracy} = \frac{N}{T}, \tag{2}$$

$$\text{Loss} = -\frac{1}{n} \sum_{i=1}^{n} \log P(N), \tag{3}$$

where $N$ is the number of correctly classified images, $P(N)$ is the probability of the correctly classified images, $n$ is the number of images, and $T$ is the total number of images in the training or testing phase.

ResNet is a very deep network that uses skip connections to mitigate the vanishing gradient problem. The model was first presented at the ILSVRC 2015 competition, where its principal breakthrough was enabling the training of networks with more than 150 layers. A brief architecture of ResNet-50 is shown in Figure 4. As seen, the network consists of 4 stages in addition to stage 1, each built from convolution and identity blocks. Each convolution or identity block comprises 3 convolution layers with kernel sizes 1 × 1, 3 × 3, and 1 × 1. Stage 1 consists of 4 layers: convolution, batch normalization (BN), rectified linear unit (ReLU), and maximum pooling (Max Pool). Finally, the network has an average pooling layer followed by a fully connected layer with a softmax activation function (multinomial logistic regression). This output layer has two neurons in order to classify the bananas as healthy or defective. ResNet-50 has more than 23 million trainable parameters; employing transfer learning therefore makes it a practical structure in terms of computation time.
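The two-neuron softmax output layer can be illustrated with a short sketch. The logit values and the class order below are hypothetical; only the softmax function itself is standard.

```python
import math

CLASSES = ("healthy", "defective")  # class order is an assumption


def softmax(logits):
    """Convert raw output scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits):
    """Return the most probable class label and the class probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Hypothetical outputs of the two neurons for one banana image.
label, probs = predict([2.0, 0.5])
```

With two output neurons, the softmax reduces to a logistic decision between the two classes: the larger logit always wins, and the probabilities indicate how confident the network is.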
In this study, transfer learning was employed to carry the knowledge of ResNet-50 over to another classification task, the sorting of bananas. Transfer learning of ResNet-50 can be described in two stages: freezing and fine-tuning. In the freezing stage, the publicly available weights and learned parameters of the pretrained model were frozen and reused. Fine-tuning begins by removing the fully connected (FC) layer of ResNet-50 and replacing it with three fully connected layers, the last of which has two output neurons corresponding to healthy and defective bananas. The weights of the new FC layers were initialized randomly for training. In contrast, the weights of the remaining layers were kept frozen so that they act as a strong feature extractor for high-level abstractions of the input images, as they had already been trained on millions of images from the ImageNet dataset [33].
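The effect of freezing can be shown with a toy example in plain Python rather than an actual deep learning framework. The `Param` class, the parameter values, and the gradients are all hypothetical; the sketch only demonstrates that a gradient descent step changes the new FC weights while leaving the frozen pretrained weights untouched.

```python
class Param:
    """A single scalar weight with a flag saying whether it may be updated."""

    def __init__(self, value, trainable):
        self.value = value
        self.trainable = trainable


def sgd_step(params, grads, learning_rate):
    """Apply one gradient descent update, skipping frozen parameters."""
    for param, grad in zip(params, grads):
        if param.trainable:
            param.value -= learning_rate * grad


# Pretrained feature-extractor weights are frozen (values are made up).
backbone = [Param(0.5, trainable=False), Param(-1.2, trainable=False)]
# Newly added fully connected weights are trainable (values are made up).
head = [Param(0.1, trainable=True), Param(0.3, trainable=True)]

params = backbone + head
sgd_step(params, grads=[1.0, 1.0, 1.0, 1.0], learning_rate=0.0001)
```

Because only the small set of new weights is updated, each training step is much cheaper than retraining all of the network's parameters.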
As mentioned, the network was trained using only 40% of the data. The stochastic gradient descent optimization method [34] was used to train the network with a batch size of 64 images per iteration.
To minimize the cost function, the initial learning rate and the learning rate reduction factor of the fully connected layers were set to 0.0001 and 0.1, respectively. Selecting the number of epochs was not straightforward, as it determines the number of optimization steps performed during training. If the number of epochs is too high, the network may overfit and perform poorly. Therefore, to avoid overfitting, the error and performance rate on validation images were monitored. It was found that ResNet-50 achieved its highest training accuracy and best generalization capability at epoch 6. Table 1 shows that the training performance of the network was good: it reached 100% accuracy in a very short time (37 seconds) and a small number of epochs (6).

Figure 4: ResNet transfer learning process for the bananas sorting system. BN, batch normalization; FC, fully connected layer.

Figure 5 shows the network's training progress curve and its associated loss function (error). The learning curve shows the variation of the training accuracy with each epoch. From the curve, it can be seen that learning was difficult only during epoch 1; once the network had passed that stage, its performance rose sharply until it reached 100% at approximately epoch 2. The network also reached a very small loss, as shown in Figure 5.
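The training configuration implies a small number of optimization steps, which can be worked out directly. The sketch below assumes that the 40% training split is taken from the 2400 augmented images; the learning rate drop period (`DROP_PERIOD`) is not given in the text and is a purely hypothetical value.

```python
import math

TOTAL_IMAGES = 2400   # augmented dataset size
TRAIN_PERCENT = 40    # 40 : 60 learning scheme
BATCH_SIZE = 64
EPOCHS = 6
INITIAL_LR = 1e-4
DROP_FACTOR = 0.1
DROP_PERIOD = 3       # assumption: epochs between learning rate drops

train_images = TOTAL_IMAGES * TRAIN_PERCENT // 100
iterations_per_epoch = math.ceil(train_images / BATCH_SIZE)
total_iterations = iterations_per_epoch * EPOCHS


def learning_rate(epoch):
    """Step schedule: multiply the rate by DROP_FACTOR every DROP_PERIOD epochs.

    `epoch` is zero-based.
    """
    return INITIAL_LR * DROP_FACTOR ** (epoch // DROP_PERIOD)


schedule = [learning_rate(epoch) for epoch in range(EPOCHS)]
```

Under these assumptions there are 960 training images, 15 iterations per epoch, and 90 iterations over 6 epochs, which is consistent with the short training time reported.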

Results and Discussion
In order to verify the feasibility of the proposed transfer learning-based banana sorting system, we tested it on the remaining 60% of the images in our dataset. These test images had not been seen by the system before, and they outnumber the training images. As mentioned, the learning scheme was 40 : 60. As given in Table 2, ResNet-50 achieved a very high recognition rate of 99% during testing, based on formulas (2) and (3). This means that 99% of the images of healthy and defective bananas were correctly classified during testing, showing that the network gained a high generalization power when tested on 60% unseen banana images. Figure 6 shows samples of the banana images used to test ResNet-50. Figure 7 shows samples of misclassified and correctly classified defective bananas. The misclassified images are all defective bananas; all of the healthy bananas were classified correctly. The figure shows a defective banana that the network classified correctly and another defective banana that it failed to classify (Figure 7(b)).

We visualize the learned features of different convolution and pooling layers in order to examine what the network learns in its deeper layers. Figure 8 shows the learned kernels of convolution layer 1. These learned filters consist of gradients and features of different levels, orientations, and edges, which are very helpful for banana sorting.
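The accuracy and loss measures used to evaluate the network can be sketched as follows. The labels and probabilities are hypothetical values chosen for illustration, not results from the study, and the loss is written as the mean negative log-probability assigned to the true class, which is one common form of such a loss.

```python
import math


def accuracy(predicted, actual):
    """Fraction of images whose predicted label matches the true label."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


def mean_log_loss(true_class_probs):
    """Mean negative log of the probability assigned to each true class."""
    return -sum(math.log(p) for p in true_class_probs) / len(true_class_probs)


# Hypothetical test outcome: one defective banana is mistaken for a healthy one.
actual = ["healthy", "healthy", "defective", "defective"]
predicted = ["healthy", "healthy", "defective", "healthy"]

acc = accuracy(predicted, actual)
loss = mean_log_loss([0.9, 0.8, 0.95, 0.4])
```

The loss complements the accuracy: it stays above zero whenever the network is less than fully confident in the true class, and it grows quickly for confident mistakes.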
In its deeper layers, the network learns to detect higher-level and more complicated features than those detected by the first convolution layers. Figure 9 shows the activations of a deeper convolution layer.
Features learned in every channel may change depending on the strength of their activations. In Figure 10, we show the strongest activation channel (Figure 10(b)) of a banana image (Figure 10(a)) from the same convolution layer as in Figure 9. Note that each square of this image (Figure 10(b)) is the activation output of a channel in the convolutional layer 1 in Figure 9. Compared with the original image (Figure 10(a)), it is notable that this channel activates on edges, in particular left and right edges. The channel activates positively on right edges and negatively on dark edges.
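How a single channel can respond with opposite signs to different edges can be illustrated with a hand-written filter. The kernel below is a simple horizontal difference filter chosen for illustration; it is not one of the kernels learned by ResNet-50, and `correlate_rows` is a hypothetical helper.

```python
def correlate_rows(image, kernel):
    """Slide a 1D kernel along each row of a 2D image (valid positions only)."""
    size = len(kernel)
    return [
        [
            sum(kernel[j] * row[col + j] for j in range(size))
            for col in range(len(row) - size + 1)
        ]
        for row in image
    ]


# A bright vertical stripe on a dark background.
image = [[0, 0, 9, 9, 9, 0, 0] for _ in range(3)]

# Horizontal difference kernel: responds to changes in brightness along a row.
kernel = [-1, 0, 1]

response = correlate_rows(image, kernel)
```

The response is positive where the image goes from dark to bright, negative where it goes from bright to dark, and zero inside the uniform stripe, which mirrors the sign-dependent edge activations seen in a convolution channel.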
The proposed model for banana grading is compared with other related works in Table 3. Most of the related research studies employed image processing methods and texture analysis to distinguish colours, intensities, edges, and morphological shapes. These are all handcrafted mechanisms for extracting image features; they are time-consuming and limited by human constraints. Deep learning networks, on the other hand, perform this task automatically within their convolution and pooling layers, which makes them strong and efficient hierarchical extractors of features at different levels of abstraction. Thus, our proposed system based on residual learning, whose skip connections also help to boost network performance, outperforms all other models presented in Table 3.
Note that the data used to train our model are the same data used in [4,5]; however, we added more images from different datasets, as deep networks require a larger number of examples to learn than traditional networks. The proposed network also outperformed the other networks despite a learning scheme that uses fewer training images than testing images (40 : 60). In contrast, all other related works used more training than testing examples. This demonstrates the robustness and effectiveness of residual learning (ResNet-50) in the banana grading task.

Conclusion
The study presents the design of a deep learning structure for grading bananas as healthy or defective. Residual learning was employed in the design of this grading system. The system relies on the skip connections approach, which has led to greater performance in tasks such as classification and object detection. From training and testing, we conclude that ResNet-50, as a very deep network, is capable of accurately grading a banana with a very small margin of error. Compared with other models, this network shows better accuracy when tested on unseen test data, despite a learning scheme (40 : 60) that uses fewer training images than testing ones. The robustness of such a network in grading banana images is due to its ability to learn low-level and high-level features through its deep residual blocks, convolution layers, and pooling layers. This depth helps in extracting rich features that contribute to higher recognition rates during training and testing. Such a system is important because the food industry urgently needs to meet high demand at a very fast rate, which requires sorting systems that are efficient, reliable, accurate, and flexible.

Data Availability
The dataset used to support the findings of this study can be downloaded from the link https://figshare.com/articles/dataset/DeepBanana/14230262.

Conflicts of Interest
The authors declare that there are no conflicts of interest.