Hyperspectral imaging (HSI) technology has increasingly been applied as an analytical tool in the fields of agriculture, food, and Traditional Chinese Medicine over the past few years. The HSI spectrum of a sample is typically acquired by a spectroradiometer at hundreds of wavelengths. In recent years, considerable effort has been made towards identifying the wavelengths (variables) that contribute useful information. Wavelength selection is a critical step in data analysis for Raman, NIRS, or HSI spectroscopy. In this study, the performances of 10 different wavelength selection methods for the discrimination of

Hyperspectral imaging (HSI) technology has emerged as an alternative technique that meets both spatial and spectral requirements and has therefore been widely applied in the quality evaluation and classification of Traditional Chinese Medicine. Zhang et al. fabricated a portable visible-near-infrared (Vis-NIR) HSI field spectrometer to distinguish sun-dried and sulfur-fumigated Chinese medicinal herbs and achieved a sensitivity of 96.4% and a specificity of 98.3% for RPA identification [

An HSI spectrum of a sample is typically measured by a spectroradiometer at hundreds of wavelengths. The large number of spectral variables in most spectral datasets often renders the prediction of a dependent variable unreliable. However, the use of appropriate projection or selection techniques, such as principal component analysis or partial least squares regression, may minimize this problem [

Previous studies have usually compared two to five wavelength selection methods [

A total of 675

A hyperspectral imaging system was used in the experiment. It consisted of an imaging spectrograph (Imspector V10E, Spectral Imaging Ltd., Oulu, Finland), a CCD camera (C8484-05, Hamamatsu city, Japan), a lens (OLE-23, Specim, Spectral Imaging Ltd., Oulu, Finland), an illuminant source with two quartz tungsten halogen lamps (Fiber-Lite DC950, Dolan Jenner Industries Inc., Boxborough, USA), a conveyer belt controlled by a stepper motor (IRCP0076 Isuzu Optics Corp, Taiwan, China), and a computer. The whole system, except the computer, was assembled in a dark chamber, as shown in Figure

The hyperspectral imaging system.

After repeated tests, the distance between the lens and the sample was set to 15 cm, the exposure time of the camera was set to 1.35 ms, and the speed of the conveyer was set to 18.7 mm·s^{−1}. The hyperspectral images were acquired using dedicated software (Spectral Image-V10E, Isuzu Optics Corp, Taiwan, China).

The acquired raw hyperspectral images were calibrated with the white and dark references according to the following equation:
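As a minimal sketch of this calibration step, the widely used form R = (I_raw − I_dark) / (I_white − I_dark) is assumed here, where I_raw is the raw image, I_white the white-reference image, and I_dark the dark-reference image. The function name, argument names, and the small `eps` guard are illustrative assumptions, not taken from the acquisition software.

```python
import numpy as np

def calibrate(raw, white, dark, eps=1e-12):
    """Relative reflectance from raw, white-reference, and dark-reference data.

    Assumed form: R = (raw - dark) / (white - dark), applied element-wise.
    All three inputs must have the same shape (or be broadcastable).
    """
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = white - dark
    # Guard against division by zero where white and dark readings coincide.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (raw - dark) / denom
```

For a hypercube stored as (rows, columns, bands), the same call works unchanged as long as the reference arrays broadcast against the raw array.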

The successive projections algorithm (SPA) is an efficient spectral feature selection method that minimizes the collinearity between variables [
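The projection step at the heart of SPA can be sketched as follows. This is a simplified illustration, assuming a fixed starting column and a fixed number of variables; a full SPA run also tries every starting column and chooses the subset by validation error, which is omitted here. The function and argument names are hypothetical.

```python
import numpy as np

def spa_chain(X, start, n_select):
    """Build one SPA chain of column indices with minimal collinearity.

    X        : (samples, variables) spectral matrix
    start    : index of the first selected variable (assumed given)
    n_select : number of variables to select (must not exceed rank of X)
    """
    Xp = np.array(X, dtype=float)  # working copy that gets projected
    selected = [start]
    for _ in range(n_select - 1):
        ref = Xp[:, selected[-1]]
        # Project every column onto the subspace orthogonal to the last pick.
        coef = (ref @ Xp) / (ref @ ref)
        Xp = Xp - np.outer(ref, coef)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never re-select an earlier variable
        # The column with the largest remaining norm is least collinear.
        selected.append(int(np.argmax(norms)))
    return selected
```

In the small test case used to check this sketch, a column that is an exact multiple of the starting column is never chosen, because its projection has zero norm.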

The regression coefficient (RC) is calculated from a PLS model, and sensitive wavelengths are usually selected according to the regression coefficients of the optimal PLS model. Generally, the peaks or bands where the absolute value of RC is greater than a threshold are selected as sensitive wavelengths or wavebands [
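A minimal sketch of this thresholding rule is given below. It assumes the regression coefficients have already been obtained from a fitted PLS model and picks local peaks of |RC| that exceed the threshold; the function name and the tie handling (plateau points are all kept) are illustrative choices.

```python
def select_by_rc(wavelengths, rc, threshold):
    """Return wavelengths at local peaks of |RC| that exceed the threshold."""
    mag = [abs(v) for v in rc]
    picked = []
    for i, m in enumerate(mag):
        left = mag[i - 1] if i > 0 else float("-inf")
        right = mag[i + 1] if i < len(mag) - 1 else float("-inf")
        # Keep a point only if it is above the threshold and not lower
        # than either neighbour (a peak of the absolute coefficient).
        if m > threshold and m >= left and m >= right:
            picked.append(wavelengths[i])
    return picked
```

The threshold itself is problem dependent and has to be tuned for each dataset.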

The loading weights (LW) show the importance of the corresponding wavelengths or bands in the spectral matrix. The peaks or valleys with the maximum absolute loading weights from the first principal factor to the optimal principal factor are selected as sensitive wavelengths [

Uninformative variable elimination (UVE) is widely applied for variable selection based on an analysis of the regression coefficients of the PLS model. It eliminates uninformative variables, and the remaining variables are useful for chemical and classification analysis [
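The selection rule usually associated with UVE can be sketched as follows, under stated assumptions: artificial noise variables are appended to the spectra, PLS coefficients are collected over resampled models (assumed to be available as a matrix), the stability of each variable is the mean of its coefficients divided by their standard deviation, and real variables are kept only if their absolute stability exceeds a fraction of the largest noise stability. The function and argument names are hypothetical.

```python
import numpy as np

def uve_select(coefs, n_real, cutoff_ratio=0.99):
    """Keep real variables whose stability beats the noise variables.

    coefs        : (resamples, n_real + n_noise) PLS regression coefficients
    n_real       : number of real spectral variables (first columns)
    cutoff_ratio : fraction of the maximum noise stability used as cutoff
    """
    coefs = np.asarray(coefs, dtype=float)
    # Stability = mean / standard deviation over the resampled models.
    # Columns with zero variance would need special handling.
    stability = coefs.mean(axis=0) / coefs.std(axis=0, ddof=1)
    cutoff = cutoff_ratio * np.max(np.abs(stability[n_real:]))
    return [j for j in range(n_real) if abs(stability[j]) > cutoff]
```

How the coefficient matrix is produced (leave-one-out or Monte Carlo resampling, number of latent variables) is left open in this sketch.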

Competitive adaptive reweighted sampling (CARS) is a variable selection method that combines Monte Carlo sampling with PLS regression coefficients. In each run, adaptive reweighted sampling retains the variables with larger regression coefficient weights as a new subset on which a PLS model is built; after repeated runs, the subset with the lowest root mean square error of cross validation (RMSECV) is chosen [

In the iPLS method, the data are divided into nonoverlapping sections, and a separate PLS model is developed for each section to identify the most useful variable range [
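The interval bookkeeping behind iPLS can be sketched as below. Only the splitting and the choice of the best interval are shown; the `score` callable is a hypothetical stand-in for fitting a PLS model on one interval and returning its RMSECV.

```python
def split_intervals(n_variables, n_intervals):
    """Split variable indices 0..n_variables-1 into nonoverlapping
    (start, stop) ranges of near-equal size; stop is exclusive."""
    base, extra = divmod(n_variables, n_intervals)
    bounds, start = [], 0
    for i in range(n_intervals):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

def best_interval(bounds, score):
    """Return the interval whose local model has the lowest score.

    score(start, stop) is assumed to return the cross-validation error
    (for example RMSECV) of a model built on that interval alone."""
    return min(bounds, key=lambda b: score(b[0], b[1]))
```

With 512 variables, 16 intervals give sections of 32 variables each and 32 intervals give sections of 16 variables each.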

For the backward iPLS (BiPLS) algorithm, the dataset is split into a given number of intervals; the PLS models are then calculated with each interval left out in a sequence; that is, if
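One way to read the backward procedure is as a greedy elimination loop, sketched below. At each step the interval whose removal gives the lowest error is dropped, and the best subset seen so far is remembered. The `score` callable is a hypothetical placeholder for the RMSECV of a PLS model built on the listed intervals; this is an illustration, not the implementation used in the study.

```python
def backward_interval_selection(n_intervals, score):
    """Greedy backward elimination over interval indices.

    score(subset) is assumed to return the cross-validation error of a
    model built on the intervals in `subset` (lower is better)."""
    active = list(range(n_intervals))
    best_subset, best_score = list(active), score(active)
    while len(active) > 1:
        # Try leaving out each remaining interval in turn.
        trials = [(score([j for j in active if j != i]), i) for i in active]
        trial_score, drop = min(trials)
        active.remove(drop)
        if trial_score < best_score:
            best_score, best_subset = trial_score, list(active)
    return best_subset, best_score
```

The forward variant follows the same pattern in reverse, starting from an empty set and adding the interval that lowers the error the most.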

As in the interval PLS model, the dataset is split into a given number of intervals, but the PLS models are then developed based on successively improving intervals with respect to RMSECV; that is, if

This method combines the advantages of the genetic algorithm (GA) and PLS and is among the most commonly used methods for spectral data analysis. GA applied to PLS has been shown to be a very efficient optimization procedure. It has been applied to many spectral datasets and has been shown to provide better results than full-spectrum methods [

In this method, UVE first eliminates uninformative variables, and SPA is then applied to the remaining variables. The UVE-SPA algorithm therefore selects fewer variables than UVE alone.

The efficiency of a wavelength selection method is evaluated from the identification rate and the number of selected variables. The efficiency equation is as follows:

When

When

When

The spectral data extraction, SPA, UVE, UVE-SPA, iPLS, BiPLS, FiPLS, CARS, GA-PLS, and SVM were performed in MATLAB R2010b (The MathWorks, Natick, MA, USA). LW, RC, and PLS-DA were performed in Unscrambler® 10.1 (CAMO AS, Oslo, Norway).

The spectra of “Zhemaidong” and “Chuanmaidong” were acquired in the range of 380–1030 nm. The raw average spectra of “Zhemaidong” and “Chuanmaidong” are shown in Figure

Average raw spectra reflectance curves of

675

Class assignment and division of

| | Zhemaidong | Chuanmaidong |
|---|---|---|
| Label | 1 | 2 |
| Calibration set | 210 | 240 |
| Prediction set | 105 | 120 |
| Total | 315 | 360 |

First, each wavelength selection method was optimized so that its performance could be evaluated fairly.

The number of sensitive wavelengths was allowed to range from 5 to 30, and 5 wavelengths (889, 1014, 411, 460, and 407 nm) were selected.

The regression coefficients of the PLS model based on the full spectrum are shown in Figure

Seven wavelengths were selected by RC method.

The UVE method was applied to the full-spectrum data with no pretreatment, and the number of principal components was set to 20. The threshold criterion was 99% of the maximum value of variable stability. A total of 291 wavelengths were selected by UVE (as shown in Figure

Plot of 291 selected wavelengths by UVE. Columns represent selected wavelengths.

SPA was then applied to reduce the 291 variables selected by the UVE method. Finally, 12 wavelengths were extracted, as shown in Figure

Plot of 12 selected wavelengths by UVE-SPA. Columns represent selected wavelengths.

The raw dataset was split into 16–32 intervals, and the optimal interval was selected according to the lowest root mean square error of cross validation (RMSECV). As shown in Figure

Minimum RMSECV of different number of intervals.

RMSECV of 16 intervals.

The optimal result was achieved when the raw dataset was divided into 32 intervals. Finally, as seen in Figure

13 intervals selected by BiPLS. Columns represent selected intervals.

In the GA-PLS method, the population size was set to 30, the probability of mutation to 0.01, the probability of crossover to 0.5, and the number of runs to 100. Finally, 85 wavelengths were selected.

After the optimization procedure, the wavelengths selected by the different methods are shown in Table

Effective wavelengths selected by different methods.

| Methods | Number | Wavelengths/nm |
|---|---|---|
| SPA | 5 | 889, 1014, 411, 460, 407 |
| RC | 7 | 409, 448, 455, 491, 545, 959, 999 |
| LW | 8 | 550, 990, 433, 1014, 539, 385, 380, 382 |
| UVE | 291 | 408, 410, 417, 425, 428, 430, 432, 444~467, 477~479, 481~517, 519, 524~564, 574, 576~611, 626, 631~639, 642, 644, 650, 652, 659, 664, 667, 676~695, 697, 700, 703, 709, 711, 723, 740, 752, 759, 776, 786, 788~796, 799, 800, 810, 844, 849, 852, 853, 856, 857, 859~929, 934, 937, 940~981, 986, 990~1007, 1009, 1014 |
| UVE-SPA | 12 | 426, 455, 503, 545, 589, 786, 875, 970, 994, 998, 1007, 1014 |
| CARS | 105 | 418, 426, 431, 437, 443, 457, 466, 475~481, 489, 491~495, 497~500, 507, 511, 516, 519, 533, 535, 539, 540, 543~545, 548~551, 555, 558, 565~569, 571, 573, 576, 582, 584, 594~610, 613~618, 624, 637, 640, 643, 648, 653, 661, 668, 675, 681, 685, 687, 689, 719, 735, 738, 750, 751, 795, 806, 813, 816, 831, 856, 862, 874~877, 881, 888~889, 905, 910, 918, 920, 924, 961~964, 968, 973, 976, 987, 992~996, 1023 |
| iPLS | 32 | 942~982 |
| BiPLS | 208 | 418~436, 456~474, 494~513, 534~592, 614~653, 839~879, 963~1023 |
| FiPLS | 480 | 418~1023 |
| GA-PLS | 85 | 431, 466~469, 472, 473, 479~481, 490~494, 506~509, 512~522, 534~550, 551, 580~584, 685~689, 799~801, 875~877, 888, 956~963, 965~978, 983, 985, 990~1000 |

PLS-DA models were established based on the variables selected by the different methods and on the raw full spectrum, and the prediction results of the models were compared (shown in Table

Results of PLS-DA models using different selected wavelengths.

| Methods | Variables | Calibration: correct number | Calibration: identification accuracy/% | Prediction: correct number | Prediction: identification accuracy/% | Efficiency |
|---|---|---|---|---|---|---|
| Raw | 512 | 449 | 99.8 | 214 | 95.1 | |
| SPA | 5 | 417 | 92.7 | 210 | 93.3 | −1.78 |
| RC | 7 | 413 | 91.8 | 211 | 93.8 | −1.28 |
| LW | 8 | 402 | 89.3 | 199 | 88.4 | −6.60 |
| UVE | 291 | 449 | 99.8 | 219 | 97.3 | 0.95 |
| UVE-SPA | 12 | 449 | 99.8 | 216 | 96.0 | 0.88 |
| CARS | 105 | 449 | 99.8 | 221 | 98.2 | 2.46 |
| iPLS | 32 | 430 | 95.6 | 199 | 88.4 | −6.28 |
| BiPLS | 208 | 450 | 100 | 223 | 99.1 | 2.38 |
| FiPLS | 480 | 449 | 99.8 | 219 | 97.3 | 0.14 |
| GA-PLS | 85 | 446 | 99.1 | 217 | 96.4 | 1.08 |
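The values in the last column of this table are numerically consistent with the relation E = (A − A_full) × (1 − n/N), where A is the prediction accuracy with the selected wavelengths, A_full the prediction accuracy with the full spectrum, n the number of selected variables, and N = 512 the number of full-spectrum variables. This relation is inferred from the tabulated numbers and should be treated as an assumption rather than as the authors' stated equation. The sketch below recomputes the column under that assumption; all names are illustrative.

```python
def efficiency(acc_selected, acc_full, n_selected, n_full):
    """Assumed form: accuracy gain over the full spectrum, scaled by
    the fraction of variables that the method removed."""
    return (acc_selected - acc_full) * (1.0 - n_selected / n_full)

# (method, variables, prediction accuracy / %, tabulated value), PLS-DA models
PLS_DA_ROWS = [
    ("SPA", 5, 93.3, -1.78), ("RC", 7, 93.8, -1.28), ("LW", 8, 88.4, -6.60),
    ("UVE", 291, 97.3, 0.95), ("UVE-SPA", 12, 96.0, 0.88),
    ("CARS", 105, 98.2, 2.46), ("iPLS", 32, 88.4, -6.28),
    ("BiPLS", 208, 99.1, 2.38), ("FiPLS", 480, 97.3, 0.14),
    ("GA-PLS", 85, 96.4, 1.08),
]
FULL_ACC, FULL_VARS = 95.1, 512  # full-spectrum ("Raw") row

def max_deviation(rows, acc_full, n_full):
    """Largest absolute gap between recomputed and tabulated values."""
    return max(abs(efficiency(acc, acc_full, n, n_full) - tab)
               for _, n, acc, tab in rows)

if __name__ == "__main__":
    for name, n, acc, tab in PLS_DA_ROWS:
        value = efficiency(acc, FULL_ACC, n, FULL_VARS)
        print(f"{name:8s} recomputed={value:6.2f} tabulated={tab:6.2f}")
```

Under this reading, a positive value means the selected subset predicts better than the full spectrum, zero means no change, and a negative value means the selection lost useful information.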

The support vector machine (SVM) is suitable for small-sample, nonlinear, and high-dimensional pattern recognition problems. SVM models were developed based on the variables selected by the different methods and on the full spectrum, and the results were compared. As seen from Table

Results of SVM models using different wavelengths.

| Methods | Variables | Calibration: correct number | Calibration: identification accuracy/% | Prediction: correct number | Prediction: identification accuracy/% | Efficiency |
|---|---|---|---|---|---|---|
| Raw | 512 | 449 | 99.8 | 218 | 96.9 | |
| SPA | 5 | 431 | 95.8 | 208 | 92.4 | −4.46 |
| RC | 7 | 429 | 95.3 | 210 | 93.3 | −3.55 |
| LW | 8 | 415 | 92.2 | 205 | 91.1 | −5.71 |
| UVE | 291 | 448 | 99.6 | 219 | 97.3 | 0.17 |
| UVE-SPA | 12 | 448 | 99.6 | 217 | 96.4 | −0.49 |
| CARS | 105 | 444 | 98.7 | 217 | 96.4 | −0.40 |
| iPLS | 32 | 442 | 98.2 | 199 | 88.4 | −7.97 |
| BiPLS | 208 | 444 | 98.7 | 221 | 98.2 | 0.77 |
| FiPLS | 480 | 444 | 98.7 | 218 | 96.9 | 0 |
| GA-PLS | 85 | 448 | 99.6 | 223 | 99.1 | 1.83 |

According to the discriminant results of the SVM and PLS-DA models, BiPLS and GA-PLS were highly efficient methods. SPA, RC, LW, and iPLS showed low efficiency, probably because these methods greatly reduce the number of variables but also remove some useful information. The efficiency of UVE, UVE-SPA, and CARS differed between the two models. Overall, for the identification of “Zhemaidong” and “Chuanmaidong,” BiPLS and GA-PLS were efficient wavelength selection methods.

As seen from Tables

In this study, in view of hyperspectral data of

The study indicated that wavelength selection methods can extract a small number of variables containing effective information while eliminating uninformative variables. Variable selection methods are tools for identifying more concise and effective spectral data and play an important role in multivariate analysis, providing input for subsequent modeling. Meanwhile, the selected characteristic wavelengths could provide a theoretical basis for the development of instruments.

The authors declare that there are no conflicts of interest.

This study was supported by the National Natural Science Foundation of China (Grant nos. 61405175 and MOI201702), the National Key Research and Development Program of China (Grant no. 2016YFD0700304), and the National Key Scientific Instrument and Equipment Development Project (Grant no. 2014YQ47037702).