Two-Stage Intelligent DarkNet-SqueezeNet Architecture-Based Framework for Multiclass Rice Grain Variety Identification

Image processing is an important domain for identifying various crop varieties. Because rice is produced in large quantities and in many varieties, manually assessing its quality is a tedious and time-consuming task. In this work, we propose a two-stage deep learning framework for detecting and classifying multiclass rice grain varieties. The framework comprises a series of steps. The first step is to preprocess the selected dataset. The second step is to select and fine-tune the pretrained DarkNet19 and SqueezeNet deep models. Transfer learning is used to train the fine-tuned models on the selected dataset; 50% of the sample images are employed for training and the remaining 50% for testing. Features are then extracted and fused using a maximum correlation-based approach. This approach improved classification performance; however, it also introduced redundant information. In the next step, an improved butterfly optimization algorithm (BOA) is proposed for selecting the best features, which are finally classified using several machine learning classifiers. The experiments were conducted on a rice dataset that includes five rice varieties, achieving a maximum accuracy of 100%, which improves upon recent methods. The average accuracy of the proposed method is 99.2%; a confidence interval-based analysis confirms the significance of this work.


Introduction
In the field of agriculture, rice is an unavoidable staple diet and among the most consumed foods around the world [1]. In addition, it is the third most cultivated crop across different areas, including China, East Asia, and South Asia [2,3]. Approximately 90% of Asians prefer rice, and demand for it increases day by day [4]. Myanmar is the largest rice exporter in the world and the sixth-largest rice-producing country [8]. The actual goal is to assess rice grain quality in less time and at lower cost.
Both physical and chemical methods are used to assess rice seed varieties. In other words, both destructive and nondestructive methods are used to solve rice problems [9]. Nondestructive methods are preferable to destructive ones because they use digital image processing (DIP), and people now want to inspect rice grains quickly and at lower cost. Stones, weed seeds, chaff, etc. are the main causes of damaged seed quality [10]. Flatbed scanners, in which rice kernels are placed on a glass plate under a scanner head, are used for rice grading and inspection. The scanner measures the size, shape, density, and percentage of broken kernels [11]. Rice comes in different types, such as C4 Raja rice [12], and rice kernel features depend on length, width, and perimeter [13]. Several rice types, such as Chenab Basmati, Kissan Basmati, Basmati 2000, KSK 133, KSK 434, Punjab Basmati, and PK 1121 aromatic, are mostly grown in Pakistan and have been gathered for rice seed categorization [14]. Some varieties are shown in Figure 1.
Rice grain quality and variety are important factors in cultivation and import/export [16]. Manual labor and some machines are used to solve these rice grain problems. Current machines are costly, so to solve the quality problem, researchers have proposed machine vision algorithms [17] that identify and resolve rice grain problems through classification methods based on images and videos, which are fast and easy for detecting color, shape, and texture features [18]. Different classification methods are used to classify rice into different groups [1]. On the other hand, automatic detection of rice grains is a challenging task in the farming industry. Different applications of this technology are used for automatic identification of rice grains [19]. To remove impurities from rice, different DIP methods and probabilistic neural network (PNN) algorithms are utilized.
To determine the grain's moisture content, machine learning (ML) techniques are applied [20]. Moreover, morphological methods are very important and play a significant part in determining rice quality.
In the field of agriculture, identifying and classifying rice grains is one of the most interesting and demanding tasks. In recent years, different machine learning-based approaches have been introduced; however, some gaps in this area still need to be resolved [21]. Several techniques have been introduced in the literature for rice variety detection and classification using computer vision and machine learning [22]. Normally, they apply preprocessing techniques in the initial phase for better detection [23]. Robert Singh and Chaudhury [24] presented texture, wavelet, morphological, and color features that were used to classify four different types of rice grain images taken with a 12-megapixel camera. The HSI channel was used for image conversion, and the resulting grayscale images were then converted into binary images. Sethi et al. [6] used adaptive thresholding to convert the image into grayscale. Low-pass and Gaussian filters were also used to smooth and denoise the image, and masking was used to highlight the rice object.
Histogram equalization (HE) redistributes the gray levels of an image, and image enhancement indicates which techniques are applied to grayscale and color images [25]. In preprocessing, an auto-align method is available to automatically align the arrangement and positioning of grains, and the contour method is used to find the exact position around the grain [26]. Wongruen [27] utilized four segmentation-based techniques: fast K-means of Dixit, fuzzy C-means (FC), multilevel thresholding, and Otsu's multilevel thresholding (OMT). The OMT and FC accuracy rates are 89.83% and 95.53%, respectively. Itharat et al. [28] used the Khao Dawk Mali 105 dataset and applied global thresholding to segment the chalky area in rice images. RiceNet-based segmentation improved the quality of adhesive rice grains; compared with the region-based convolutional neural network (R-CNN), it achieved 89.5% accuracy [29]. Bhupendra et al. [30] presented a deep learning architecture-based framework for rice grain classification. They used several CNN models, and EfficientNetB0 successfully attained an accuracy of 98.32%. Moreover, computer vision researchers have performed classification for several other crop varieties, such as date fruit [31], pistachio species [32], corn [33], and coffee beans [34].
In the abovementioned studies, the major problem is the appearance variation among rice varieties. Similar shape, texture, irregularity, and rice width affect segmentation accuracy. Furthermore, an incorrect segmentation step produces irrelevant features that later degrade classification accuracy. In addition, in the classification phase, the presence of a few redundant and irrelevant features increases the framework's computational time (during prediction) and can decrease classification accuracy. Another challenge is the choice of classifier for the final accuracy, because many classifiers exist and several of them consume considerable time in the prediction process. Therefore, we propose a two-stage framework for rice variety detection and classification. Our major contributions are as follows: (i) we propose a rice segmentation technique based on a color-based saliency map and thresholding; (ii) two pretrained deep models are fine-tuned and their information fused based on maximum correlation; (iii) the best features are selected using an improved butterfly optimization algorithm.

Proposed Methodology
In the proposed work, the main steps are the acquisition of a dataset, deep saliency-based rice segmentation, deep feature extraction using pretrained models trained on the selected dataset, feature fusion, feature selection using an optimization technique, and finally classification through machine learning classifiers. The main flow of the proposed framework is illustrated in Figure 2, which shows that a saliency map is constructed and then further processed into a color map. Each substep of this framework is discussed mathematically below.

Saliency-Based Rice Segmentation.
Segmentation is an important step in computer vision applications for plants [35]. In this study, we introduce a saliency-based approach using the technique of Wang and Peng [36]. In this technique, we initially pick the superpixel image for the saliency method and select the top, left, and right border pixels as background seeds, forming the vector B_v = {B_1, B_2, ..., B_n}, whose entries take the values B_p = 1 and B_p = 0 to represent labeled and unlabeled nodes, where p denotes a superpixel. The results of this method are shown in Figure 3. The background saliency is estimated first. Diffusion-based compactness is then defined in terms of I_v(p), the spatial variance of superpixel p, and I_d(p), its spatial distance. Finally, we combine the background saliency and the compactness.
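The exact formulation follows Wang and Peng [36]; as a minimal illustrative sketch (not the paper's implementation), a background-seed saliency map can be built by measuring each pixel's color distance from the mean color of the top, left, and right borders, which the text designates as background seeds, and thresholding the result:

```python
import numpy as np

def border_seed_saliency(img):
    """Toy background-seed saliency: pixels on the top, left, and right
    borders are treated as background seeds (as in the text); saliency is
    each pixel's color distance from the mean background color."""
    seeds = np.vstack([img[0, :, :], img[:, 0, :], img[:, -1, :]])
    bg_mean = seeds.mean(axis=0)
    sal = np.linalg.norm(img - bg_mean, axis=2)
    return sal / (sal.max() + 1e-12)  # normalize to [0, 1]

def segment(img, thr=0.5):
    """Threshold the saliency map to obtain a binary rice mask."""
    return (border_seed_saliency(img) > thr).astype(np.uint8)
```

Bright grains on a darker scanner background stand out strongly under this distance measure, which is why border pixels make usable seeds.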

Computational Intelligence and Neuroscience
We jointly take the initial saliency values into the superior affinity network with the goal of creating a coarse-to-fine optimization perspective to diffuse the first saliency map. Second, the robust affinity graph M_J^2 is computed from the affinity entries m_ij^{I_f}. We insert this graph into the manifold ranking objective function in order to obtain the coarse-to-fine saliency result I_2. The Lab color transformation is then applied to I_2, and the resulting color-mapped images are later utilized for training the pretrained deep learning models.

Pretrained Deep Models.
In this work, two pretrained deep models, SqueezeNet and DarkNet19, have been employed for training on the rice dataset.

SqueezeNet Model.
SqueezeNet is a convolutional neural network that is 18 layers deep. The ImageNet database contains a pretrained version of the network trained on almost a million photos. Figure 4 shows the SqueezeNet architecture with three CNN views highlighting subsets of layers. The SqueezeNet model is used for feature extraction. The final layer activations depend on the classification module, which consists of two layers: a convolutional layer followed by a global pooling layer. These two layers are initialized for the new task, while the remaining layers are pretrained on ImageNet. A detailed representation of the SqueezeNet model is shown in Table 1. Classification is performed on images through the pretrained network.
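As a rough numeric sketch (assuming a feature map of shape (C, H, W) and hypothetical weights, not the actual trained parameters), the classification module described above, a 1 × 1 convolution followed by global average pooling, works like this:

```python
import numpy as np

def conv1x1(feature_map, weights):
    """1 x 1 convolution: a per-pixel linear map over channels.
    feature_map: (C, H, W); weights: (num_classes, C)."""
    return np.tensordot(weights, feature_map, axes=([1], [0]))

def squeezenet_head(feature_map, weights):
    """Classification module as described: 1 x 1 conv, then global
    average pooling, yielding one score per class (no dense layer)."""
    scores = conv1x1(feature_map, weights)  # (num_classes, H, W)
    return scores.mean(axis=(1, 2))         # (num_classes,)
```

Replacing a dense classifier with this conv-plus-pooling head is what keeps SqueezeNet's parameter count small.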

DarkNet19.
DarkNet19 is a convolutional neural network, so named because it contains 19 convolutional layers (CLs). DarkNet19 has five max-pooling layers and 19 CLs, with several 1 × 1 CLs placed between the 3 × 3 convolutions to reduce the number of parameters. We train the pretrained network using a transfer learning technique applied to 75,000 images, which provides the features of the network. Originally, DarkNet19 classifies images into 1,000 classes, but we use only 5 classes for our results. Moreover, the input layer of DarkNet19 accepts images of size 256 × 256. The architecture of the DarkNet19 neural network is shown in Figure 5, and its parameter values and further details are given in Table 2.

Transfer Learning.
Transfer learning is used to improve the efficiency of the process and reduce the number of resources required. When elements of a pretrained machine learning model are reused in a new machine learning model, this is known as transfer learning. In transfer learning, we define a feature vector and its probability distribution, which give a probabilistic representation of the function. The source task and its learning rate are denoted as T_o and L_o, and T_f denotes the targeted function and targeted output. The main goal of transfer learning is to improve the learning rate for predicting the targeted item using the recognition function l(x), depending on the training learned from T_o and T_f, where T_o ≠ T_f and L_o ≠ L_f. Pattern recognition is improved via inductive transfer learning, which requires an annotated database for fast training and testing. The process of transfer learning for training the models on rice images is illustrated in Figure 6.

Features are extracted from both fine-tuned models. The first feature vector is extracted from the fine-tuned SqueezeNet model by applying activation on the second-last feature layer, which yields 1,000 features per image. The second feature vector is extracted from the fine-tuned DarkNet19 model by applying activation on the second-last convolutional layer, which yields 1,024 features per image. We fuse both feature vectors using a proposed maximum correlation-based approach that combines the important information from both vectors.
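As an illustrative sketch of this transfer-learning setup (with synthetic data and a hypothetical frozen backbone, not the actual fine-tuned networks), the pretrained feature extractor is kept fixed while only a new classification head is trained:

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_features(x, w_frozen):
    """Stand-in for the frozen pretrained backbone: a fixed projection
    plus ReLU; w_frozen is never updated during head training."""
    return np.maximum(x @ w_frozen, 0.0)

def train_head(feats, labels, n_classes, lr=0.01, epochs=100):
    """Train only the new softmax classification head on the extracted
    features -- the reused-backbone idea behind transfer learning."""
    w = np.zeros((feats.shape[1], n_classes))
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ w
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        w -= lr * feats.T @ (p - onehot) / len(feats)  # gradient step
    return w
```

Because only the small head is optimized, training converges quickly even with few labeled images, which is the practical benefit the text describes.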
We consider two feature vectors, F_v1^Dar and F_v2^Squ, and a fusion vector denoted Fus_v. The dimension of these matrices is N × R, where N is the number of images and R is the feature length; the two vectors have dimensions N × 1024 and N × 1000, respectively. The correlation coefficient between the two feature sets is then computed. The coefficient between Dar and Squ lies in the range (−1, 1), where values near ±1 indicate strong correlation and values near 0 indicate weak correlation. The maximum correlation vector is then defined, where φ denotes the supremum over all Borel functions [37], Squ: ω ⟶ ω lies between 0 and 1, and CV(Dar, Squ) is the maximum correlation. If the correlation between a pair of features is close to 1, we drop one of them; if it is 0, we discard both features from the fused vector. Based on this formulation, a fusion vector of dimension N × 931 is obtained, which is later processed by the optimization algorithm for best feature selection.
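A minimal sketch of this idea (hypothetical threshold and shapes; the paper's exact max-correlation formulation is more involved): concatenate the two feature matrices and, for every pair of columns whose absolute correlation is close to 1, keep only one representative:

```python
import numpy as np

def max_correlation_fuse(f1, f2, thr=0.9):
    """Sketch of correlation-guided fusion: concatenate both feature
    matrices (rows = images), then, for every pair of columns whose
    absolute correlation exceeds thr, keep only one representative.
    thr is a hypothetical value, not the paper's exact rule."""
    fused = np.hstack([f1, f2])              # (N, d1 + d2)
    corr = np.corrcoef(fused, rowvar=False)  # column-wise correlations
    d = fused.shape[1]
    keep = np.ones(d, dtype=bool)
    for i in range(d):
        if not keep[i]:
            continue
        dup = np.abs(corr[i, i + 1:]) > thr  # later columns near-duplicates
        keep[i + 1:][dup] = False
    return fused[:, keep]
```

This explains why the fused vector (N × 931) is shorter than the raw concatenation: highly correlated columns from the two networks are collapsed.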

Improved Butterfy Optimization Algorithm-Based Feature Selection.
Feature selection is a technique for reducing the input variables to a model by removing noise and retaining only useful data. It automatically selects the characteristics best suited to the machine learning problem being tackled. The butterfly optimization algorithm (BOA) is a metaheuristic algorithm [38]. Metaheuristics are a relatively new class of optimization algorithms used to find the best solution; BOA is inspired by the food-foraging behavior of butterflies. Initially, the perceived intensity of the fragrance f, as a function of the physical stimulus intensity, is defined as

f = e · d^L,

where e, d, and L are the sensory modality, the stimulus intensity, and a power exponent that depends on the modality. The global move toward the best butterfly position can be represented as

P_j^{m+1} = P_j^m + (rn^2 × b_p − P_j^m) × f_j,

where P_j^m is the jth butterfly's position at time m, rn is a random number in [0, 1], f_j is the fragrance of the jth butterfly, and b_p is the current best position. The local walk is defined as

P_j^{m+1} = P_j^m + (rn^2 × P_k^m − P_l^m) × f_j,

where rn is a random number in [0, 1] and P_k^m and P_l^m are the positions of the kth and lth butterflies. In the next step, the features selected based on the above equations are passed to the cross-entropy (CE) function, which is used for the global search of the butterflies' movement. The optimization problem is to maximize the performance function R over Z, where Z defines the final states and R is a real-valued function on Z. The associated auxiliary problem estimates the probability

l(c) = P_m(S(R) ≤ c) = E_o[F],

where P_m is the probability measure, S(R) is the random state, E_o is the expectation operator, c is the thresholding parameter, and F is the indicator function (F = 1 if S(R) ≤ c, and 0 otherwise). In the CE method, the importance sampling methodology is employed to lower the sample size.
As a result, the estimator can be rewritten in terms of e(y_i; a), the density of the random sample, and h(y_i), the optimal density, which can be found by minimizing the Kullback-Leibler divergence. The final update is smoothed with a smoothing parameter 0 ≤ α ≤ 1. This improved algorithm continues until the number of iterations is completed; in this work, the number of iterations is 200. At the end, we obtain a feature vector of dimension N × 522, which is finally classified using machine learning classifiers.
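A compact sketch of a BOA-style wrapper for binary feature selection, under simplifying assumptions (a toy correlation-based fitness in place of the paper's cross-entropy criterion, and a simple > 0.5 binarization of the continuous positions):

```python
import numpy as np

rng = np.random.default_rng(1)

def fitness(mask, X, y):
    """Toy objective (stand-in for the paper's cross-entropy criterion):
    absolute correlation of the selected features' mean with the target."""
    if mask.sum() == 0:
        return -1.0
    return abs(np.corrcoef(X[:, mask].mean(axis=1), y)[0, 1])

def boa_select(X, y, n_butterflies=10, iters=30, c=0.01, a=0.1, p_switch=0.8):
    """BOA-style wrapper: continuous positions in [0, 1], binarized at
    0.5; fragrance f = c * I^a drives global moves toward the best
    butterfly and local walks between two random peers."""
    d = X.shape[1]
    pop = rng.random((n_butterflies, d))
    best, best_fit = pop[0].copy(), -np.inf
    for _ in range(iters):
        for j in range(n_butterflies):
            f = c * (fitness(pop[j] > 0.5, X, y) + 1.0) ** a  # fragrance
            r = rng.random()
            if rng.random() < p_switch:   # global search toward best position
                pop[j] += (r**2 * best - pop[j]) * f
            else:                         # local walk between two peers
                k, l = rng.integers(n_butterflies, size=2)
                pop[j] += (r**2 * pop[k] - pop[l]) * f
            pop[j] = np.clip(pop[j], 0.0, 1.0)
            fit = fitness(pop[j] > 0.5, X, y)
            if fit > best_fit:
                best_fit, best = fit, pop[j].copy()
    return best > 0.5  # boolean mask over the d features
```

The switch probability p_switch balances exploitation (moving toward the best-known mask) against exploration (random local walks), mirroring the global/local equations above.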

Dataset. In this work, we utilized a publicly available rice dataset that includes five rice varieties: (a) Khazar, (b) Gharib, (c) Ghasrdashti, (d) Gerdeh, and (e) Mohammadi [39]. Sample images are shown in Figure 1. This dataset consists of 75,000 images in total, with 15,000 images per class (https://www.kaggle.com/datasets/muratkokludataset/rice-image-dataset). We employed 50% of the images from each class for training and the rest for testing.
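The per-class 50 : 50 split described above can be sketched as follows (indices only; image loading and any specific directory layout are omitted as implementation details):

```python
import numpy as np

def split_half(labels, seed=0):
    """Stratified 50 : 50 split: half of each class's image indices go to
    training, the rest to testing."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        half = len(idx) // 2
        train.extend(idx[:half])
        test.extend(idx[half:])
    return np.array(train), np.array(test)
```

Stratifying per class keeps the 5 × 15,000 class balance identical in the training and testing halves.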

Experimental Setup.
The experimental process of the proposed method is conducted on the publicly available rice imaging dataset [39], as discussed in Section 3.1. We used a 50 : 50 split for training and testing, with 10-fold cross-validation. Four different experiments were performed for a detailed analysis of the proposed method. In the first experiment, the DarkNet19 model is used after applying deep saliency-based segmentation to the dataset, and its features are fed to the classifiers. In the second experiment, we train the dataset with the SqueezeNet model and feed its features to the classifiers. In the third experiment, we fuse the features of the first and second models and feed the fused features to the classifiers. Finally, the best features are selected through the improved butterfly optimization. Several classifiers were chosen for classification, and the performance of each is computed in terms of recall rate, precision rate, F1 score, and accuracy; the computational time is also noted during testing. The entire proposed method is implemented in MATLAB R2022a on a desktop computer with 16 GB of RAM and a 6 GB NVIDIA RTX graphics card.
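The evaluation measures listed above can be computed from a confusion matrix; the following sketch uses macro averaging across classes (an assumption, since the paper does not state its averaging convention):

```python
import numpy as np

def metrics(y_true, y_pred, n_classes):
    """Macro-averaged recall, precision, F1 score, and overall accuracy
    computed from a confusion matrix, as used to score each classifier."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    tp = np.diag(cm).astype(float)
    recall = tp / np.maximum(cm.sum(axis=1), 1)     # per-class recall
    precision = tp / np.maximum(cm.sum(axis=0), 1)  # per-class precision
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / cm.sum()
    return recall.mean(), precision.mean(), f1.mean(), accuracy
```

With balanced classes like this dataset's (15,000 images per class), macro and micro averages coincide for accuracy, but recall and precision can still differ per class.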

Experiment 1: DarkNet19
Features. In experiment 1, segmentation is performed on the selected dataset, DarkNet19 features are extracted from the segmented images, and classification is performed; the results are presented in Table 3, and the corresponding confusion matrix is illustrated in Figure 7.

Experiment 2: SqueezeNet Features.
In experiment 2, SqueezeNet testing features were extracted and classification was performed. From this model, 1,000 features are extracted. Ten classifiers were utilized, and the maximum accuracy of 99.5% was obtained by quadratic SVM, as presented in Table 4. The other classifiers also give good accuracy, but the maximum is obtained with quadratic SVM. The computational time is also noted for all classifiers; the linear discriminant consumes less time than the rest. Moreover, the confusion matrix of quadratic SVM is illustrated in Figure 8.

Experiment 3: Feature Fusion.
In the third experiment, the extracted deep learning features are fused using the proposed maximum correlation-based approach. As in experiments 1 and 2, ten different classification methods were utilized, and the results are presented in Table 5. The table shows that accuracy improves after the fusion process, while the computational time slightly decreases. The wide neural network performed best among the classifiers, with an accuracy of 100% and a computational time of 139.86 s. Compared to experiments 1 and 2, all classifiers performed better after fusion. The confusion matrix is illustrated in Figure 9, which clearly shows that the correct prediction rate of each class is above 99%.

Experiment 4: Best Selected Features.
In experiment 4, the best features are selected using the proposed improved butterfly optimization algorithm; a total of 522 features are selected. The results of this experiment are given in Table 6. The best accuracy of 100% was achieved by cubic SVM, with a recall rate of 99.98%, a precision rate of 99.96%, and an F1 score of 99.96%. The other classifiers also obtained good accuracy and consumed less time than in the previous experiments. The corresponding confusion matrix is illustrated in Figure 10.

Analysis and Comparison.
In this section, a detailed analysis of the proposed system is conducted based on numerical values and visual plots. The proposed method was evaluated on a publicly available dataset comprising five classes, as illustrated in Figure 1. The method includes several important steps: rice segmentation, deep learning feature extraction, fusion of deep features using the proposed approach, and selection of the best features using improved butterfly optimization; the main flow is given in Figure 2. The proposed segmentation technique clearly highlights the rice region, as shown in Figure 3, which is later utilized for training the deep models. The results of each step are given in Tables 3-6, and the confusion matrix of each step is illustrated in Figures 7-10. Based on the results, it is clearly observed that the fusion process reduces the computational time, which is further reduced after the best feature selection. Moreover, we compared the proposed method with several other neural networks, as illustrated in Figure 11, which clearly shows that the proposed fusion and selection methods give better accuracy. Finally, a comparison with recent techniques on the rice image dataset is presented in Table 7. Koklu et al. [39] used the rice image dataset and applied three classification methods, ANN, DNN, and CNN, obtaining 99.87%, 99.95%, and 100%, respectively. Cinar and Koklu [40] used the same dataset; their result is also compared in Table 7.

Conclusion
In this study, we proposed an automated deep learning-based framework for rice segmentation and variety classification. The proposed framework includes several critical steps. For better visualization, the segmentation step localizes the rice region and maps its color; the main goal of this step is to provide better images for deep learning model training. Two pretrained deep models extract features from the segmented, color-mapped images, and the extracted features are fused using the proposed maximum correlation-based technique. This step reduced the number of features and retained only the most important ones for the fused matrix. However, it should be noted that during the experimental process some redundant features were also added to the fused matrix, which does not affect accuracy but increases computational time. As a result, we proposed an improved butterfly optimization algorithm that not only maintains accuracy but also reduces testing time. The main limitation of this work is the selection of important features during the fusion process using a static threshold value. In the future, the fusion process will be optimized for lower computational time by employing new techniques. Moreover, plant and rice disease detection and classification problems will be considered [43-47].

Data Availability
The publicly available dataset has been used in this work for the experimental process (https://www.kaggle.com/datasets/muratkokludataset/rice-image-dataset).

Conflicts of Interest
The authors declare that they have no conflicts of interest.