Computer-Aided Multiclass Classification of Corn from Corn Images Integrating Deep Feature Extraction

Corn is of great importance for agricultural production and animal feed. Obtaining pure corn seeds is quite significant for seed quality, and distinguishing among the numerous corn varieties therefore plays an essential role in marketing. This study was conducted with 14,469 images of the BT6470, Calipso, Es_Armandi, and Hiva corn types licensed by BIOTEK. The classification of the images was carried out in three stages. In the first stage, deep feature extraction was performed on the four corn types with the pretrained CNN model SqueezeNet, yielding 1000 deep features for each image. In the second stage, to reduce the features obtained from deep feature extraction with SqueezeNet, separate feature selection processes were performed with the Bat Optimization (BA), Whale Optimization (WOA), and Gray Wolf Optimization (GWO) algorithms. Finally, in the last stage, the features obtained from the first and second stages were classified by the machine learning methods Decision Tree (DT), Naive Bayes (NB), multiclass Support Vector Machine (mSVM), k-Nearest Neighbor (KNN), and Neural Network (NN). In the classification of the features obtained in the first stage, the mSVM model achieved the highest classification success with 89.40%. In the second stage, as a result of the classifications performed on the active features selected by the three feature selection algorithms (BA, WOA, GWO), the classification successes obtained with the mSVM model were 88.82%, 88.72%, and 88.95%, respectively. These accuracies are close to those obtained in the first stage; however, with the feature selection algorithms, successful classifications were carried out with fewer features and in a shorter time.
The results of the study, in which the corn types were classified inexpensively, objectively, and with a shorter processing time, present a different perspective in terms of classification performance.


Introduction
Corn, one of the basic grain products, is a staple food for millions of people all over the world, particularly in Latin America, Asia, and Africa. Corn is processed into various food products consumed directly by humans, such as corn flour, semolina, starch, snacks, and breakfast cereals, and is also used in the production of animal feed [1]. Corn, or maize, which can be harvested once a year, is an agricultural product that ranks third after wheat and rice in terms of cultivation area throughout the world [2]. As a multipurpose grain widely cultivated in many parts of the world, corn has many different types [3]. The distinction of corn type is of great importance for crop monitoring, high-throughput phenotyping, and yield prediction [4]. The region where it is grown has a strong influence on the quality and commercial value of corn; hence, as the geography changes, the unique characteristics of corn also differ [5]. Since the classification of corn has an impact on the final product and its quality, it plays an important and critical role in determining the market value. The main purpose of classification is to facilitate the correct commercialization of corn, as well as to provide information about storage and processing [6]. Seed purity is an important parameter for the evaluation of seed quality and can be effectively examined through seed classification [7]. In addition to the numerous literature studies conducted in this field, classification studies are also carried out on agricultural products.
In recent years, multispectral and hyperspectral imaging techniques have been used, along with several image processing, deep learning, and machine learning methods, for the classification and quality evaluation of corn. When the literature in this area is examined, it is seen that the classification of corn has been performed with Multi-Linear Discriminant Analysis (MLDA) and Least-Squares Support-Vector Machine (LS-SVM) [7], Radial Basis Function Neural Network (RBFNN) and SVM [8], Principal Component Analysis + Partial Least Squares Discriminant Analysis (PCA + PLS-DA) [6], and Deep Convolutional Neural Network (DCNN) [9]. Table 1 gives grain-product classification studies performed with various artificial intelligence methods and the results of these classifications. The aim of this study is to compare nondestructive classification models using images of different corn types. Only a limited number of features can be obtained by extracting color, morphological, and shape features from corn grain images, whereas a large number of features are obtained with deep feature extraction. The deep learning model tested and used in the study is based on the SqueezeNet architecture, as it has a smaller structure compared to well-known pretrained network designs [10]. The created model was used to extract the deep features of the images, and different classification models were created to classify these extracted features. Decision Tree (DT), Naive Bayes (NB), Multiclass Support Vector Machine (mSVM), k-Nearest Neighbor (KNN), and Neural Network (NN) classifiers [11][12][13][14][15][16] were used in these models. Among the deep features, the more effective features were selected with the metaheuristic algorithms Bat Algorithm (BA), Whale Optimization Algorithm (WOA), and Gray Wolf Optimization (GWO) [16][17][18][19][20]. Furthermore, the selected features were classified by the machine learning algorithms DT, NB, mSVM, KNN, and NN.
10-fold cross-validation was used to objectively measure the success of the models. The main contributions of this research to the literature are listed below: (1) A different approach based on a deep feature extraction, selection, and classification strategy is presented for the classification of the corn types used in the study.
(2) The deep features of the corn images were extracted and classified with the DT, NB, mSVM, KNN, and NN models. (3) The features obtained by selecting the most effective features with BA, WOA, and GWO were classified with the DT, NB, mSVM, KNN, and NN models. (4) As a result of these processes, the classification successes of all models, as well as the classification times, were compared and the optimum classification model was determined.
In order to realize the abovementioned contributions, the article is organized as follows: in Section 2, the materials and methods used in this research are described. In Section 3, the experimental results for the multiple classification problem are presented. In Section 4, the performance of the proposed framework is evaluated.

Dataset.
In this study, the licensed BT6470, Calipso, Es_Armandi, and Hiva types belonging to BIOTEK were used. A total of 14,469 corn seed images were obtained from 1 kilogram of corn of each type: 3056, 5090, 3385, and 2938 images, respectively. Each image is 350 × 350 pixels in size. In Figure 1, sample seed images of the corn types in the dataset are given.

Convolutional Neural Network (CNN)
CNN is a deep learning method that has been frequently used in the literature recently, designed to recognize visual patterns directly from image pixels with minimal preprocessing [21]. CNNs are a kind of feedforward neural network with many layers. In Figure 2, a typical CNN architecture is shown [22].

SqueezeNet.
First proposed by Iandola et al. in 2016, SqueezeNet is a specially designed CNN model [23]. It consists of 15 layers: two convolution layers, three max-pooling layers, eight fire layers, a global average pooling layer, and a softmax output layer. SqueezeNet has a lightweight structure with fewer structural parameters and less computation. SqueezeNet uses only 1 × 1 and 3 × 3 convolution kernels, and its purpose is to simplify the complexity of the network while achieving the best possible classification accuracy [24]. Instead of fully connected (FC) layers, the network ends with a 1000-channel convolution followed by global average pooling, producing a 1000-dimensional output.

Feature Selection.
Feature selection plays an important role in terms of dimensionality reduction and classification in high-dimensional datasets. In the feature selection process, only the most active features in the datasets are selected. A good feature selection technique aims to improve classification performance while reducing computational cost and time [25]. Searching for the best feature set is a challenging problem in the feature selection process.
Metaheuristic algorithms perform well in finding the optimal solution for this type of problem [26]. In this study, BA, WOA, and GWO metaheuristic optimization algorithms were utilized for feature selection.
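A wrapper-style fitness function is the usual way such metaheuristics score a candidate feature subset. The paper does not state its fitness formulation, so the weighted combination of classification error and subset size below, including `alpha` and the pluggable `error_fn`, is a common convention offered only as an assumption.

```python
import numpy as np

def subset_fitness(mask, X, y, error_fn, alpha=0.99):
    """Fitness of a binary feature mask (lower is better).

    Combines the classification error obtained on the selected columns
    with the fraction of features kept. This weighted formulation is a
    standard wrapper-based convention, not a detail from the paper.
    """
    selected = np.flatnonzero(mask)
    if selected.size == 0:
        return 1.0                      # an empty subset is worst-case
    error = error_fn(X[:, selected], y)
    return alpha * error + (1 - alpha) * selected.size / X.shape[1]
```

Each of BA, WOA, and GWO would minimize this fitness over binary masks, trading a small loss in accuracy for a much smaller feature set.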

Bat Optimization Algorithm (BA).
The bat optimization algorithm, a metaheuristic optimization method based on the behavior of bats, was proposed by Yang in 2010. It is inspired by the way bats use echolocation to determine the direction and distance of an object [27]. The basics of the bat optimization algorithm are as follows [28]: Rule 1: All bats locate their prey by echolocation. Rule 2: Each bat flies randomly at position x_i with velocity v_i and minimum frequency f_min, searching for prey by varying the wavelength (λ) and loudness (A). Rule 3: Bats can adjust their wavelength and loudness for different situations.
It is frequently used in feature reduction problems [29]. Each bat is associated with a set of binary coordinates indicating whether a feature belongs to the final feature set. The feature reduction function, which depends on the number of bats, requires that a classifier using the features defined by each bat's position be trained and evaluated [30].
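The binary coordinates mentioned above are typically obtained from the bats' continuous velocities via a sigmoid transfer function. The paper does not state its binarization rule, so the sketch below is an assumed, illustrative choice.

```python
import numpy as np

_rng = np.random.default_rng(0)

def to_binary_position(velocity, rng=_rng):
    """Map a bat's continuous velocity vector to a binary feature mask.

    A sigmoid transfer function is the usual device in binary bat
    algorithms: each component's sigmoid value is treated as the
    probability that the corresponding feature is selected.
    """
    prob = 1.0 / (1.0 + np.exp(-velocity))   # per-feature selection probability
    return (rng.random(velocity.shape) < prob).astype(int)
```

A strongly negative velocity component thus almost never selects its feature, while a strongly positive one almost always does.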

The Whale Optimization Algorithm (WOA).
First brought to the literature by Mirjalili and Lewis in 2016, the Whale Optimization Algorithm (WOA) is a metaheuristic optimization method that mimics the hunting behavior of humpback whales. It finds application in classical engineering problems such as unimodal, multimodal, fixed-dimension, and composite functions. Based on the hunting behavior of whales, this technique has both exploitation and exploration stages built around the spiral bubble-net attack method, and in this respect it is used for global optimization [31,32]. The technique can be used to find the best subset of features that maximizes classification success while keeping the number of features to a minimum [33].
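The spiral bubble-net move at the heart of WOA can be written in a few lines. This follows the standard WOA spiral equation from the literature; the spiral constant `b` and the random draw of `l` are illustrative defaults, not values from the paper.

```python
import numpy as np

def spiral_update(x, best, b=1.0, l=None, rng=None):
    """One spiral (bubble-net) move of a whale toward the best solution.

    Implements the standard WOA spiral equation
    X(t+1) = D' * exp(b*l) * cos(2*pi*l) + X*, where D' = |X* - X|
    and l is drawn uniformly from [-1, 1].
    """
    if l is None:
        rng = rng or np.random.default_rng()
        l = rng.uniform(-1.0, 1.0)
    dist = np.abs(best - x)                 # element-wise distance D' to the leader
    return dist * np.exp(b * l) * np.cos(2 * np.pi * l) + best
```

In the full algorithm this spiral move alternates with an encircling move, giving the exploitation and exploration stages mentioned above.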

Gray Wolf Optimization (GWO).
Gray Wolf Optimization is a metaheuristic optimization method developed by Mirjalili et al. in 2014. Gray wolves live in packs of 5 to 12 individuals. In the gray wolf pack, which has a social hierarchy, the alpha wolf, the leader, is followed by the beta and delta wolves; omega wolves are the lowest-level wolves. In this strategy, the gray wolves first recognize the location of the prey and surround it under the leadership of the alpha wolf. In the mathematical model of the gray wolves' hunting strategy, it is assumed that the alpha, beta, and delta wolves have better information about the prey's location. Therefore, the first three best solutions (alpha, beta, delta) are used to update the positions of the wolves in the GWO algorithm, and the omega wolves play no guiding role [34,35]. Fast convergence and simple, easy implementation are the reasons GWO is preferred over other optimization methods. High classification success can be achieved by successfully applying it to feature selection in datasets, using only a small number of features [36].
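The alpha/beta/delta-guided update described above can be sketched as a minimal continuous GWO. This is an illustration on a toy sphere objective, not the paper's feature-selection variant; the population size, iteration count, and search bounds are assumptions.

```python
import numpy as np

def gwo_minimize(f, dim=5, wolves=20, iters=100, seed=0):
    """Minimal Gray Wolf Optimizer sketch for a continuous objective.

    The three best wolves (alpha, beta, delta) guide every position
    update; the coefficients a, A, and C follow the standard GWO
    equations, with a decreasing linearly from 2 to 0.
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(-5, 5, (wolves, dim))
    for t in range(iters):
        fitness = np.apply_along_axis(f, 1, X)
        leaders = X[np.argsort(fitness)[:3]]        # alpha, beta, delta
        a = 2 - 2 * t / iters                       # exploration -> exploitation
        new_X = np.zeros_like(X)
        for lead in leaders:
            A = a * (2 * rng.random(X.shape) - 1)
            C = 2 * rng.random(X.shape)
            D = np.abs(C * lead - X)                # distance to this leader
            new_X += lead - A * D                   # candidate guided by this leader
        X = new_X / 3                               # average of the three guides
    fitness = np.apply_along_axis(f, 1, X)
    return X[np.argmin(fitness)]

best = gwo_minimize(lambda v: np.sum(v * v))
```

For feature selection the same update would be combined with a binarization step and a wrapper fitness over feature masks rather than a continuous objective.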

Classification Methods.
Within the scope of this study, five multiclass supervised classification algorithms are considered: the DT, NB, mSVM, KNN, and NN methods. Detailed information about each method is given below. The features obtained from the deep feature extraction of the corn images are given as separate inputs to these methods.
Decision Tree (DT): The DT method is used to solve regression or classification problems [37].
Naive Bayes (NB): NB is a simple probabilistic algorithm based on Bayes' theorem. It performs classification by assuming that all variables are independent. This conditional independence assumption is rarely valid in real-world applications, which is why the algorithm is characterized as naive. In spite of this, it tends to learn quickly in a variety of supervised classification problems [38].
Multiclass Support Vector Machine (mSVM): The support vector machine (SVM) is one of the most powerful kernel-based machine learning tools used for classification and regression problems, and it can distinguish all classes with a single optimization process [39,40].
K-Nearest Neighbor (KNN): KNN is one of the most frequently used algorithms in machine learning due to its versatility and ease of use. However, since KNN uses all the training data, it needs more time to analyze large datasets and more memory for storage. The letter "K" indicates the number of nearest neighbors considered when classifying and labeling a new point [41].
Neural Network (NN): An NN is a mathematical system consisting of many processing units (neurons) interconnected in a weighted manner. Unlike other statistical, mathematical, and experimental techniques that require prior knowledge, the NN model performs classification by using the similarities and relationships between the data [42].
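Training the five classifiers on a feature matrix can be sketched with scikit-learn. The random `X` and `y` below are placeholders for the deep feature matrix and four-class labels, and the default hyperparameters are assumptions; the paper does not report its classifier settings.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Placeholder stand-ins for the deep feature matrix (samples x features)
# and the four-class labels used in the study.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = rng.integers(0, 4, size=200)

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "NB": GaussianNB(),
    "mSVM": SVC(kernel="rbf"),          # multiclass handled internally (one-vs-one)
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "NN": MLPClassifier(max_iter=300, random_state=0),
}
scores = {name: m.fit(X, y).score(X, y) for name, m in models.items()}
```

In the study proper, each classifier would of course be scored with 10-fold cross-validation rather than on the training data as done here for brevity.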

K-Fold Cross Validation.
The standard k-fold procedure divides the data into k subsets. Each fold contains approximately the same number of samples, and fold membership is typically assigned at random. If the dataset is relatively small, stratified random sampling is also used to ensure that the target variable is approximately uniformly distributed in each fold. After the data are divided into k folds, the candidate model is subjected to an iterative evaluation process: in each iteration, the candidate model is trained on k − 1 folds and its performance is measured on the remaining fold. This process is repeated until each fold has been used exactly once as a validation set, so a total of k training and validation runs are performed for each candidate model [43][44][45][46]. Figure 3 shows the k = 10 fold cross-validation process used in the study.
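The fold-assignment logic described above can be sketched in a few lines of numpy. This is a minimal, unstratified version; the seed and helper name are illustrative.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Randomly assign samples to k roughly equal folds and yield
    (train_idx, val_idx) pairs, one per fold, as described above."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)    # k folds of nearly equal size
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx
```

Each sample appears in exactly one validation fold, so k train/validate runs together cover the whole dataset.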

Experimental Results and Discussion
The feature vectors obtained from feature selection were reclassified with the specified classification methods, as in the first step, and the 10-fold cross-validation method was again used to evaluate the success of the models. The general block diagram of the study is given in Figure 5. The classification performances of the feature vector obtained from the deep feature extraction of the corn images and of the feature vectors obtained after feature selection were calculated separately. Table 3 gives the classification performances and the numbers of features obtained.
According to Table 3, mSVM is the most successful model in the classification performed with 1000 features. The mSVM model is followed by the NN, KNN, NB, and DT models, respectively. The ranking of the models on the other performance metrics parallels the classification success metric. In the classifications performed with the active features selected by the BA, WOA, and GWO feature selection methods, the model with the highest classification success is again mSVM, followed by the NN, KNN, NB, and DT models. Likewise, the performance metrics of these models are consistent with their classification successes. Consequently, mSVM shows the best classification performance among the machine learning algorithms, both after deep feature extraction and after feature selection.
In the classifications performed with the 1000 features obtained from the SqueezeNet model, the highest classification success belonged to the mSVM model. In the classifications made with the features obtained from the BA, WOA, and GWO feature selection methods, the highest classification success was again obtained with the mSVM model. The ACC, TPR, PRE, F1-Score, MCC, and processing times of these mSVM models are given in Table 4, and the graph showing the time taken for these classification processes is given in Figure 6. Figure 7 gives the confusion matrices obtained as a result of the classification. Figure 8 gives the performance metrics ACC, TPR, PRE, F1-Score, and MCC obtained from the classifications performed with mSVM, which has the highest classification success.

Conclusion
In this study, a feature vector containing 1000 deep features was extracted from the images of four different corn types, BT6470, Calipso, Es_Armandi, and Hiva, by using the SqueezeNet model. The features were first classified by the DT, NB, mSVM, KNN, and NN machine learning algorithms. Then, the more effective features selected from this feature vector via the BA, WOA, and GWO algorithms were classified by the same machine learning algorithms. Finally, the performance results of the models were compared. The performances of the classifiers were analyzed by using the confusion matrix data. The mSVM method achieved the highest classification performance in all classification processes, whether performed with the 1000 features obtained with SqueezeNet, the 480 features obtained with BA, the 315 features obtained with WOA, or the 384 features obtained with GWO. This model is followed by the NN, KNN, NB, and DT methods in all classification processes, respectively. As a result of classifying the feature vector obtained from deep feature extraction with mSVM, the ACC value was found to be 89.40%.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.