In this study, eggplant seeds of fifteen different varieties were selected for discriminant analyses with a multispectral imaging technique. Seventy-eight features were extracted from the multispectral images of individual eggplant seeds, which were then classified using SVM and a one-dimensional convolutional neural network (1D-CNN), with overall accuracies of 90.12% and 94.80%, respectively. A two-dimensional convolutional neural network (2D-CNN) was also adopted for discrimination of seed varieties and achieved an accuracy of 90.67%. This study not only demonstrated that multispectral imaging combined with machine learning techniques could be used as a high-throughput and nondestructive tool to discriminate seed varieties but also revealed that the shape of the seed shell may not exactly match that of the female parent because of genetic and environmental factors.
Discrimination among different seed varieties is important for species registration, intellectual properties of plant breeders, and development of new varieties on the market [
The traditional method of sorting seeds typically relies on manual inspection [
The multispectral and hyperspectral images contain both morphological and spectral information. The difference between the regular RGB color imaging and the spectral imaging is that the latter can be used to identify information that is invisible to human eyes. Multispectral and hyperspectral imaging have also been widely used in seed research, such as predicting the viability and vigor of seeds [
In recent years, deep learning techniques have been developing rapidly. For example, some search engines, recommendation systems, and image recognition and speech recognition systems have adopted deep learning techniques and achieved decent results [
CNNs are widely used for image analysis and also perform well on other kinds of data. Wei et al. used a CNN for hyperspectral image classification and achieved decent results [
The purpose of this study was to classify fifteen varieties of eggplant seeds by image recognition and feature extraction. A 2D-CNN was used to classify the images. We also extracted seventy-eight features from the multispectral images, and a 1D-CNN was used to learn classification criteria from the extracted features. A Support Vector Machine (SVM), a traditional machine learning algorithm, was used as a baseline for comparison with the CNNs.
The image acquisition device, VideometerLab 4 (VM) [
Image acquisition and image segmentation. (a) The VideometerLab 4 for image acquisition. (b) The multispectral image of eggplant seeds (view mode: sRGB). (c) Seed boundary image segmented by watershed algorithm. (d) The images of singulated seeds.
Fifteen varieties of eggplant seeds (including 17-5, 17-12, 17-14, 17-15, 17-24, 17-25, 17-26, 17-38, 17-39, 17-41, 17-49, 17-52, 17-53, 17-54, and 17-55) harvested in 2017 were used in this experiment. All seeds were cultivated by the Provincial Key Laboratory of Hebei Agricultural University. A random number of seeds were placed in a petri dish with a diameter of 9 centimeters for image acquisition (Figure
The number of divided data sets.
Varieties | Training set | Testing set | Validation set |
---|---|---|---|
17-5 | 101 | 30 | 15 |
17-12 | 119 | 34 | 17 |
17-14 | 190 | 55 | 27 |
17-15 | 153 | 46 | 29 |
17-24 | 93 | 27 | 14 |
17-25 | 204 | 58 | 29 |
17-26 | 212 | 61 | 30 |
17-38 | 138 | 40 | 20 |
17-39 | 110 | 31 | 16 |
17-41 | 124 | 35 | 18 |
17-49 | 99 | 28 | 14 |
17-52 | 97 | 28 | 14 |
17-53 | 156 | 44 | 22 |
17-54 | 107 | 31 | 15 |
17-55 | 99 | 28 | 14 |
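The per-variety counts in the table correspond to roughly 70% training, 20% testing, and 10% validation. A minimal sketch of such a per-variety split is given below; the fractions are read off the table rather than stated by the authors, and the helper name `split_variety` and the random seed are hypothetical.

```python
import random

def split_variety(items, train_frac=0.7, test_frac=0.2, seed=0):
    """Shuffle one variety's samples and split them into train/test/validation.

    Assumption: the fractions approximate the table; they are not the authors' values.
    """
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_frac)
    n_test = round(len(items) * test_frac)
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]  # whatever remains goes to validation
    return train, test, val

# Example: 146 hypothetical sample IDs (variety 17-5 has 101 + 30 + 15 = 146 seeds).
train, test, val = split_variety(range(146))
print(len(train), len(test), len(val))
```

Splitting each variety separately keeps the class proportions similar across the three sets, which matters here because the varieties have unequal sample counts.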
The Otsu method [
The accuracy of a CNN is positively correlated with the number of training samples [
We extracted seventy-eight features using the Blob Tool (VM software built-in tools), including colors, textures, shapes, smoothness, morphological texture, and spectral texture under nineteen bands. In order to speed up the process of obtaining the optimal solution and improve the accuracy of the result, we normalized the original data. The original data was expressed as
where the mean
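The normalization formula itself is not reproduced in this text. Because the description refers to the mean, one common choice consistent with it is z-score standardization, in which each value has the feature mean subtracted and is divided by the feature standard deviation. The sketch below assumes that form; the function name `standardize` and the sample values are illustrative only.

```python
from statistics import mean, pstdev

def standardize(column):
    """Z-score one feature column: subtract the mean, divide by the standard deviation.

    Assumptions: population standard deviation; a constant column maps to zeros.
    """
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]

# Hypothetical values of one feature measured on four seeds.
feature = [2.0, 4.0, 4.0, 6.0]
z = standardize(feature)
print(z)
```

After this transform every feature has zero mean and unit variance, so features measured on very different scales contribute comparably to the classifier.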
The SVM separates the classes by finding a separating hyperplane; the support vectors are the data points closest to that hyperplane, and the distance from the support vectors to the hyperplane (the margin) is maximized. A kernel function is used in SVM to map the data, linearly or nonlinearly, into a high-dimensional feature space. We compared three kernel functions (radial basis function (RBF), polynomial, and linear) in our study. Scikit-learn [
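The three kernels can be read as similarity functions between two feature vectors. The sketch below writes them out in their standard textbook forms; the parameter values (`gamma`, `degree`, `coef0`) are placeholders, not the settings used in the study.

```python
import math

def linear_kernel(x, y):
    """Linear kernel: the plain dot product <x, y>."""
    return sum(a * b for a, b in zip(x, y))

def poly_kernel(x, y, degree=3, gamma=1.0, coef0=0.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Two hypothetical two-dimensional feature vectors.
u, v = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(u, v), poly_kernel(u, v), rbf_kernel(u, v))
```

In scikit-learn these choices correspond to `SVC(kernel="linear")`, `SVC(kernel="poly")`, and `SVC(kernel="rbf")`.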
The 1D-CNN architecture is shown in Figure
The CNN structure used in this study. (a) Structure used to identify two-dimensional seed images. (b) One-dimensional network structure for identification of seed features.
The 2D-CNN architecture is shown in Figure
The Leaky ReLU layer reduces the loss of information by passing a small, scaled response for negative inputs instead of setting them to zero. BatchNormalization is used to alleviate the vanishing gradient problem during training and to speed up the training of the model. The MaxPooling layer is used to (1) make the features more robust to small shifts and rotations and (2) reduce the number of model parameters and alleviate overfitting.
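The effect of these three layer types can be illustrated on a short one-dimensional signal. The sketch below is a plain-Python approximation written for illustration; it is not the framework implementation used in the study, and the slope `alpha`, the pool size, and the input values are assumptions.

```python
def leaky_relu(values, alpha=0.01):
    """Keep positive inputs; scale negative inputs by alpha instead of zeroing them."""
    return [v if v > 0 else alpha * v for v in values]

def batch_norm(values, eps=1e-5):
    """Normalize a batch to zero mean and unit variance (no learned scale or shift)."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return [(v - mu) / (var + eps) ** 0.5 for v in values]

def max_pool_1d(values, pool_size=2):
    """Non-overlapping max pooling: keep the largest value in each window."""
    return [max(values[i:i + pool_size])
            for i in range(0, len(values) - pool_size + 1, pool_size)]

signal = [-2.0, 1.0, 3.0, -1.0, 0.5, 4.0]  # hypothetical activations
activated = leaky_relu(signal, alpha=0.1)
pooled = max_pool_1d(activated, pool_size=2)
normalized = batch_norm(signal)
print(activated)
print(pooled)
```

Pooling halves the length of the signal here, which is how the layer reduces the number of values passed to later layers.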
Figure
Average reflectance of the fifteen varieties of eggplant seeds.
Discriminant models based on extracted features were developed with SVM and 1D-CNN. Table
The test accuracy of SVM with three kernel functions; the SVM was used with extracted features.
Algorithm | Kernel | Accuracy (%) |
---|---|---|
SVM | RBF | 87.13 |
SVM | Linear | 91.82 |
SVM | Poly | 68.52 |
The curves for 1D-CNN: (a) training loss; (b) training accuracy; (c) testing loss; (d) testing accuracy.
The confusion matrix for features. Values on the diagonal (blue background) are correct predictions for the validation set; off-diagonal values are incorrect predictions. Zero values are neither shown nor colored.
We also used a 2D-CNN to develop discriminant models, and the classification accuracy was 87.6%. Figures
The curves for 2D-CNN: (a) training loss; (b) training accuracy; (c) testing loss; (d) testing accuracy.
The confusion matrix for images. Values on the diagonal (blue background) are correct predictions for the validation set; off-diagonal values are incorrect predictions. Zero values are neither shown nor colored.
Number of classification errors.
Ground truth | Predict | Quantity |
---|---|---|
17-53 | 17-49 | 8 |
17-55 | 17-41 | 5 |
17-52 | 17-54 | 3 |
17-55 | 17-52 | 3 |
17-12 | 17-14 | 2 |
17-14 | 17-15 | 2 |
17-25 | 17-24 | 2 |
17-52 | 17-53 | 2 |
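A table of this kind can be derived from a confusion matrix by collecting its off-diagonal cells. The sketch below shows one way to do so on a toy three-class matrix with made-up counts; the function names and the `min_count` threshold of 2 are assumptions chosen for illustration.

```python
def top_confusions(matrix, labels, min_count=2):
    """List (truth, predicted, count) for off-diagonal cells with count >= min_count.

    matrix[i][j] is the number of samples of class labels[i] predicted as labels[j].
    """
    pairs = [
        (labels[i], labels[j], count)
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j and count >= min_count
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)

def overall_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Toy matrix with invented counts, for illustration only.
labels = ["A", "B", "C"]
matrix = [
    [10, 2, 0],
    [1, 12, 3],
    [0, 0, 9],
]
print(top_confusions(matrix, labels))
print(overall_accuracy(matrix))
```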
A CNN is an end-to-end architecture, which makes models convenient to train and deploy. In our experiments, CNNs performed well on both the extracted feature data and the images. The model built on the quantified features was more accurate than the model built on the images. The advantage of the 2D-CNN is that no complex feature-extraction algorithms need to be designed.
In the 1D-CNN, the accuracy on the training set was close to 100%, whereas the accuracy on the testing set was 93.58%. This gap suggests that the relatively small number of training samples led to overfitting during training.
The reflectance of the variety 17-38 in Figure
The phenomenon revealed by this study is noteworthy. It is widely accepted that the seed shell develops from tissue of the female parent (the integument), so the phenotypes of seeds from the same female parent are expected to be correlated [
The eggplant seed varieties and their parents.
Varieties | Female parent×male parent |
---|---|
17-5 | TM×CJY |
17-12 | CJY×JYHG |
17-14 | CJY×HQY |
17-15 | CJY×TZSQ |
17-24 | GCBY×HLR |
17-25 | GCBY×HQY |
17-26 | GCBY×TZSQ |
17-38 | N5×TQ1 |
17-39 | N5×HQF |
17-41 | TQ1×HQF |
17-49 | 7#M×14#F |
17-52 | Dr3×3#F |
17-53 | Dr3×7#M |
17-54 | Dr3×7#F |
17-55 | Dr3×11#M |
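To relate the listed misclassifications to parentage, the two tables can be cross-referenced. The sketch below tallies only the error pairs listed in the classification-error table and checks whether the two varieties in each pair share a female parent. It is a bookkeeping aid written for illustration, not an analysis performed in the study, and it covers only the pairs shown in that table.

```python
# Female parent of each variety, copied from the table of varieties and parents.
female_parent = {
    "17-5": "TM", "17-12": "CJY", "17-14": "CJY", "17-15": "CJY",
    "17-24": "GCBY", "17-25": "GCBY", "17-26": "GCBY",
    "17-38": "N5", "17-39": "N5", "17-41": "TQ1", "17-49": "7#M",
    "17-52": "Dr3", "17-53": "Dr3", "17-54": "Dr3", "17-55": "Dr3",
}

# (ground truth, predicted, quantity), copied from the table of classification errors.
errors = [
    ("17-53", "17-49", 8), ("17-55", "17-41", 5), ("17-52", "17-54", 3),
    ("17-55", "17-52", 3), ("17-12", "17-14", 2), ("17-14", "17-15", 2),
    ("17-25", "17-24", 2), ("17-52", "17-53", 2),
]

# Sum the error quantities by whether the two varieties share a female parent.
same = sum(n for truth, pred, n in errors
           if female_parent[truth] == female_parent[pred])
different = sum(n for truth, pred, n in errors
                if female_parent[truth] != female_parent[pred])
print(same, different)
```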
In this study, we adopted multispectral imaging with different machine learning methods to discriminate the varieties of eggplant seeds. The SVM and 1D-CNN were used to classify the seeds based on the extracted features and were compared with a 2D-CNN applied to the images without feature extraction. The experiments demonstrated the feasibility of CNNs for the classification of seed varieties. On the extracted features, the 1D-CNN outperformed the SVM, the traditional machine learning algorithm used in this study. In theory, the seed shell comes from the female parent, but our study revealed that genetic and environmental factors can lead to significant differences even among seeds that come from the same female parent. However, this phenomenon needs to be investigated further with an experimental design that incorporates more varieties and a larger sample size.
The data used to support the findings of this study are available from the corresponding authors upon request.
The authors declare that they have neither competing financial nor nonfinancial interests.
Sheng Huang, Xiaofei Fan, and Lei Sun contributed equally to this work.
This work is supported by the Key R&D Program of Hebei Province (20327403D), Hebei Talent Support Foundation (E2019100006), Talent Recruiting Program of Hebei Agricultural University (YJ201847), University Science and Technology Research Project of Hebei (QN2020444), and State Key Laboratory of North China Crop Improvement and Regulation, Hebei Agricultural University.