Existing wood species classification systems can usually handle only a limited number of wood species. We propose a novel incremental self-adaptive wood species classification system to address this issue. A visible/near-infrared (VIS/NIR) spectrometer is used to acquire the spectral curves of wood samples for the subsequent wood species classification. First, when new wood samples of unknown wood species are added, they are classified into an unknown category by our one-class classifier, Support Vector Data Description (SVDD), while samples of the existing wood species are classified into a known category. Second, the wood samples of known species are sent to a BP neural network for wood species classification. Third, the new wood samples of unknown species are sent to the Clustering by Fast Search and Find of Density Peaks (CFSFDP) algorithm for unsupervised clustering, and the clustering result is evaluated by internal and external criteria. Last, if a cluster of an unknown species contains an adequate number of wood samples, these samples are removed and identified by human experts or other schemes to obtain the correct wood species name. These wood samples are then treated as a new known species and are used to retrain the classifiers, SVDD and the BP neural network. Experiments on 13 wood species demonstrate the effectiveness of our prototype system, with an overall classification accuracy above 95%.
Wood species recognition has been investigated for some years because different wood species have different physical and chemical properties and therefore different prices. Many wood species classification systems use sensors and computers for automatic processing. In terms of the sensors used, spectrum-based [
However, most existing wood species classification systems can handle only a limited number of wood species. For example, Yusof et al. proposed a kernel-genetic nonlinear feature selection scheme for the recognition of 52 tropical wood species [
In this paper, we propose a novel prototype system that can handle an increasing number of wood species. Our prototype system is illustrated in Figure
Structure of our prototype system.
Our wood samples consist of 13 wood species, as illustrated in Table
Wood species samples.
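Category | English name | Latin name | Wood sample quantity: training dataset | Wood sample quantity: testing dataset |
---|---|---|---|---|
The known category | Bubinga | | 60 | 20 |
The known category | Merbau | | 60 | 20 |
The known category | Burma teak | | 60 | 20 |
The known category | African padauk | | 60 | 20 |
The known category | Casla | | 60 | 20 |
The unknown category | Birch | | 55 | |
The unknown category | Mongolian pine | | 55 | |
The unknown category | White pine | | 55 | |
The unknown category | Larch | | 55 | |
The unknown category | Pometia | | 55 | |
The unknown category | Indian teak | | 55 | |
The unknown category | Siam teak | | 55 | |
The unknown category | Thailand teak | | 55 | |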
The intraclass spectral variations of wood samples may be caused by tree age, cutting position, and geographic origin. For each wood species, our wood samples are taken from different individual trees at random positions so that these intraclass differences are included. This helps ensure the reliability of our proposed scheme.
The spectrometer used here is an Ocean Optics USB2000+ miniature fiber spectrometer (Ocean Optics Inc., USA). The visible/near-infrared (VIS/NIR) spectra of wood samples are acquired by a spectrometer system that consists of a halogen lamp, a universal serial bus (USB) connection, a computer, a customized holder, the fiber spectrometer, and an optical fiber, as illustrated in Figure
Spectral data acquisition system’s structure graph: (1) halogen lamp; (2) universal serial bus; (3) computer; (4) holder; (5) spectrometer; (6) optic fiber; (7) test sample.
Considering the spectrometer’s price, we choose a VIS/NIR spectrometer for spectral acquisition. Spectral data in the visible band may vary because the color of a wood sample can change with the environment. Therefore, the environment must be kept stable during spectral acquisition and wood sample storage. In addition, 780–900 nm is a near-infrared band, which can characterize different wood properties to some extent.
For each spectral curve, the VIS/NIR band includes thousands of spectral data points, so the dimension of the acquired spectral curve is very large. Each spectral curve also contains redundant information, which reduces the speed and accuracy of wood species classification. In this paper, Principal Component Analysis (PCA) and Wavelet Transform (WT) are used and compared for spectral dimension reduction in our one-class classifier, SVDD.
PCA is a transformation scheme in multivariate statistics and one of the most commonly used spectral feature extraction schemes. WT is an efficient data compression tool that decomposes the signal into a series of wavelet functions with time-frequency analysis capabilities. WT can compress the spectral data, filter the spectral noise, and extract useful spectral features. The spectra are divided into low-frequency and high-frequency parts. The low-frequency coefficients contain most of the useful information, while the high-frequency coefficients contain redundant information such as noise. Therefore, we replace the original spectral data with the low-frequency wavelet coefficients.
Moreover, in our unsupervised clustering stage, the t-distributed stochastic neighbor embedding (t-SNE) algorithm is used for the secondary spectral dimension reduction. This algorithm is a nonlinear dimension reduction method that improves on the original SNE algorithm [
In SVDD, all objects in the known category are treated as a whole to construct an optimal hypersphere that contains almost all objects
When the input objects are not linearly separable, a nonlinear mapping
A standard quadratic programming algorithm can be used to find the optimal solution
For a new testing specimen
If
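For reference, the standard SVDD formulation of Tax and Duin can be written as below. The notation is assumed here and may differ from the symbols used in this paper: $a$ is the hypersphere center, $R$ its radius, $\xi_i$ the slack variables, $C$ the penalty parameter, $\phi$ the nonlinear mapping, $K(x, y) = \langle \phi(x), \phi(y) \rangle$ the kernel, $\alpha_i$ the Lagrange multipliers, and $z$ a new testing specimen.

```latex
% Primal problem: smallest hypersphere enclosing almost all training objects
\min_{R,\,a,\,\xi}\; R^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{s.t.} \quad
\lVert \phi(x_i) - a \rVert^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0

% Dual problem, solved by quadratic programming
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i K(x_i, x_i)
 - \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j)
\quad \text{s.t.} \quad
\sum_{i=1}^{n} \alpha_i = 1, \qquad 0 \le \alpha_i \le C

% Decision rule: z is accepted into the known category if it lies inside the hypersphere
d^2(z) = K(z, z) - 2 \sum_{i=1}^{n} \alpha_i K(z, x_i)
 + \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j K(x_i, x_j) \;\le\; R^2
```

Under this rule, a specimen with $d^2(z) > R^2$ would be assigned to the unknown category.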
In this study, there are 5 wood species in the known category, as illustrated in Table
Class label and code for 5 wood species in known category.
Class label | Wood species | Code |
---|---|---|
1 | Bubinga | |
2 | Merbau | |
3 | Burma teak | |
4 | African padauk | |
5 | Casla | |
In our clustering analysis, an accurate CFSFDP algorithm [
Here,
The distance
Therefore, one sample may be a clustering center when it has a relatively large
Once the clustering centers are determined, each remaining sample is assigned to the same cluster as its nearest neighbor with a higher local density.
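The density-peaks procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the Gaussian-kernel density, the cutoff distance `d_c`, the choice of centers by the largest product of density and distance, and the user-supplied `n_clusters` are all assumptions made for this example.

```python
import math


def cfsfdp(points, d_c, n_clusters):
    """Density-peaks clustering sketch; returns one cluster label per point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density with a Gaussian kernel (an assumption; it avoids ties).
    rho = [sum(math.exp(-(dist[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]

    # Visit samples from the highest to the lowest density.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    parent = [-1] * n  # nearest neighbor with a higher local density
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # convention for the global density peak
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Candidate centers have both a large density and a large distance.
    gamma = [rho[i] * delta[i] for i in range(n)]
    centers = sorted(order, key=lambda i: -gamma[i])[:n_clusters]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Each remaining sample inherits the label of its denser nearest neighbor.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


# Toy data: two well-separated groups of five 2D points each.
toy_points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05),
              (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05)]
toy_labels = cfsfdp(toy_points, d_c=0.2, n_clusters=2)
```

Processing samples in decreasing density order guarantees that the denser neighbor of each sample already has a label when that sample is reached.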
The performance of the one-class classifier SVDD is evaluated by the classification precision, recall, and F1 score.
In the CFSFDP algorithm [
The clustering correct ratio (CCR) is used to evaluate the clustering accuracy. When the cluster number is determined, external criteria are used for clustering evaluation based on each sample’s true class. Since the computed clusters may differ from the true clusters, we suppose the true clusters of dataset with
In this section, an updating procedure is performed to increase the number of wood species in the training dataset so that our system can recognize more and more wood species. If a cluster of an unknown species in the unknown category contains an adequate number of wood samples (i.e., approximately 50 samples per wood species), these wood samples are removed from the clusters and identified by human experts or other schemes to obtain the correct wood species name. These wood samples are then treated as a new known species and are used to retrain the classifiers, SVDD and the BP neural network. In this way, our system can classify the new wood species into the known category with SVDD and then recognize it correctly with the BP neural network.
Moreover, two issues should be considered. First, in the one-class classification with SVDD, some wood samples in the known category may be falsely classified into the unknown category. These wood samples will then be placed into some clusters during the subsequent unsupervised clustering. Therefore, when a cluster of an unknown species is removed and identified by human experts or other schemes, all wood samples of this cluster must be checked one by one to find and discard any samples that were falsely classified by SVDD. Second, conversely, some wood samples in the unknown category may be falsely classified into the known category. These wood samples will be sent to the BP neural network for species recognition, where the ideal classification output is “rejected classification.”
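The routing and updating logic of this section can be sketched as below. All function and parameter names (`route_sample`, `promote_clusters`, `is_known`, `classify_known`, `expert_label`) are hypothetical stand-ins for the SVDD decision, the BP network, and the human expert; the sketch only illustrates the control flow under those assumptions.

```python
from collections import defaultdict

MIN_CLUSTER_SIZE = 50  # "approximately 50 samples per wood species"


def route_sample(sample, is_known, classify_known, unknown_pool):
    """Send a sample to the BP network if SVDD accepts it, else pool it."""
    if is_known(sample):
        # classify_known may itself return None for "rejected classification".
        return classify_known(sample)
    unknown_pool.append(sample)
    return None


def promote_clusters(unknown_pool, cluster_labels, expert_label,
                     known_species, min_size=MIN_CLUSTER_SIZE):
    """Return {species name: samples} for clusters large enough to retrain on."""
    clusters = defaultdict(list)
    for sample, label in zip(unknown_pool, cluster_labels):
        clusters[label].append(sample)

    new_training = defaultdict(list)
    for members in clusters.values():
        if len(members) < min_size:
            continue  # wait until the cluster has collected enough samples
        for sample in members:
            name = expert_label(sample)  # samples are checked one by one
            if name not in known_species:
                new_training[name].append(sample)
            # else: a known-species sample misrouted by SVDD is discarded
    return dict(new_training)
```

The returned samples would then be added to the training dataset before SVDD and the BP neural network are retrained.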
The acquired spectral curve is noisy at the beginning and end of the band due to system error. Therefore, these noisy bands are removed, and the spectral curve in the 450–900 nm band is retained for model training and testing. Figure
Spectral curve graph of our 13 wood species.
WT and PCA are used and compared for spectral dimension reduction. For WT, the sym4 wavelet basis is used, and the preprocessed spectral curve is decomposed into 5 levels. The original 1328D spectral curve is thereby reduced to 48D, and the low-frequency wavelet coefficients are retained and sent to the SVDD classifier. For PCA, the cumulative contribution ratio reaches 99% for the first 15 principal components, which represent the basic information of the original spectral curve. Therefore, the first 15 principal components are retained for SVDD classification, as illustrated in Figure
The cumulative contribution ratio graph of PCA.
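The reduction from 1328D to 48D reported for the 5-level sym4 decomposition can be checked with a short calculation. The sketch assumes symmetric boundary extension, for which each decomposition level maps a signal of length n to floor((n + L - 1) / 2) coefficients, where L is the filter length (8 for sym4); whether the authors used this extension mode is an assumption.

```python
def approx_length(n, filter_len, levels):
    """Length of the approximation coefficients after `levels` DWT stages."""
    for _ in range(levels):
        n = (n + filter_len - 1) // 2
    return n


SYM4_FILTER_LEN = 8  # a symN wavelet has 2N filter taps

# 1328 -> 667 -> 337 -> 172 -> 89 -> 48
low_freq_dim = approx_length(1328, SYM4_FILTER_LEN, 5)
print(low_freq_dim)  # 48
```

The result matches the 48 input nodes of the BP neural network.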
In our one-class classification experiment, the Gaussian kernel function is used in
One-class classification of SVDD with optimal parameters and WT/PCA.
Methods | Precision | Recall | F1 score | | |
---|---|---|---|---|---|
WT-SVDD | 0.9963 | 0.98 | 0.9881 | 0.17 | 18 |
PCA-SVDD | 0.9241 | 0.97 | 0.9465 | 0.06 | 18 |
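The third value in each row of the table above is consistent with an F1 score, the harmonic mean of precision and recall. The short check below recomputes it from the reported precision and recall; reading that column as F1 is an inference from the numbers, not something stated in the table.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported = [("WT-SVDD", 0.9963, 0.98), ("PCA-SVDD", 0.9241, 0.97)]
for name, precision, recall in reported:
    print(name, round(f1_score(precision, recall), 4))
# WT-SVDD 0.9881
# PCA-SVDD 0.9465
```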
The structure of the BP neural network is as follows. The input layer has 48 nodes and the output layer has 5 nodes. The hidden layer has 27 nodes, a number determined through multiple tests. The transfer function between the input layer and hidden layer is “tansig,” while that between the hidden layer and output layer is “purelin.” The maximum number of iterations is 500, with an error target of 0.0002 and a learning rate of 0.01. After training, the network is used for wood species recognition of samples in the known category, which may include some samples falsely classified into the known category by SVDD.
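A forward pass through a 48-27-5 network of this shape can be sketched as follows, using the fact that “tansig” is the hyperbolic tangent and “purelin” is the identity. The weights here are random placeholders, not trained values, and the one-hot class codes and the SSE `threshold` used for rejection are assumptions made for illustration.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random placeholder weights and zero biases for one layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, hidden, output):
    """tansig hidden layer followed by a purelin (linear) output layer."""
    w1, b1 = hidden
    w2, b2 = output
    h = [math.tanh(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    return [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]


def sse(y, target):
    """Sum of squared errors between a network output and a class code."""
    return sum((a - b) ** 2 for a, b in zip(y, target))


def classify_or_reject(y, threshold):
    """Return class label 1..5, or None for a rejected classification."""
    codes = [[1.0 if k == c else 0.0 for k in range(len(y))] for c in range(len(y))]
    best = min(range(len(y)), key=lambda c: sse(y, codes[c]))
    return best + 1 if sse(y, codes[best]) <= threshold else None


rng = random.Random(0)
hidden = init_layer(48, 27, rng)   # 48 wavelet coefficients in, 27 hidden nodes
output = init_layer(27, 5, rng)    # 5 known wood species out
y = forward([0.5] * 48, hidden, output)
```

A sample whose output is far from every class code in the SSE sense is rejected rather than forced into one of the 5 known species.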
To test neural network’s classification performance, the testing dataset in the known category (
SSE comparison for the known category and unknown category.
The relationship between threshold
In our practical system, there are 100 samples in testing datasets of 5 wood species in the known category (Table
Classification result of the BP neural network.
First, the t-SNE algorithm is used for secondary spectral dimension reduction after WT. The dimension reduction results of t-SNE, PCA, Sammon Mapping (SM) [
Comparison of secondary dimension reduction results: (a) t-SNE; (b) PCA; (c) SM; (d) IFM; (e) LE; and (f) LDA.
Second, the CFSFDP algorithm is used for clustering of 2 wood species
Decision chart for parameters
CH value for different clustering numbers.
Third, the remaining 6 wood species are added to our system one by one, and the clustering analysis is performed again. Since the t-SNE algorithm is stochastic in nature, we repeat the clustering experiment 50 times each time a wood species is added. The number of correct clusterings is recorded and the CCR is computed, as illustrated in Tables
CFSFDP clustering analysis with CH criterion.
True wood species number | Clustering number: 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | CCR (%) |
---|---|---|---|---|---|---|---|---|---|---|
2 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
3 | 0 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
4 | 0 | 0 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
5 | 0 | 0 | 0 | 50 | 0 | 0 | 0 | 0 | 0 | 100 |
6 | 0 | 0 | 0 | 0 | 50 | 0 | 0 | 0 | 0 | 100 |
7 | 0 | 0 | 0 | 0 | 0 | 49 | 1 | 0 | 0 | 98 |
8 | 0 | 0 | 0 | 0 | 0 | 0 | 49 | 1 | 0 | 98 |
CFSFDP clustering analysis with DB criterion.
True wood species number | Clustering number: 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | CCR (%) |
---|---|---|---|---|---|---|---|---|---|---|
2 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
3 | 0 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
4 | 0 | 0 | 50 | 0 | 0 | 0 | 0 | 0 | 0 | 100 |
5 | 0 | 0 | 0 | 50 | 0 | 0 | 0 | 0 | 0 | 100 |
6 | 0 | 0 | 0 | 1 | 49 | 0 | 0 | 0 | 0 | 98 |
7 | 0 | 0 | 1 | 7 | 24 | 18 | 0 | 0 | 0 | 36 |
8 | 0 | 0 | 0 | 0 | 3 | 10 | 37 | 0 | 0 | 74 |
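The CCR values in the two tables above are consistent with the share of the 50 runs whose clustering number equals the true wood species number. The helper below recomputes two entries from the table rows; the function name and this reading of CCR are assumptions for illustration.

```python
def ccr(run_counts, true_k, first_k=2):
    """Percentage of runs whose clustering number equals the true number.

    run_counts[i] is the number of runs that produced first_k + i clusters.
    """
    return 100.0 * run_counts[true_k - first_k] / sum(run_counts)


ch_row_7 = [0, 0, 0, 0, 0, 49, 1, 0, 0]   # CH criterion, 7 true species
db_row_7 = [0, 0, 1, 7, 24, 18, 0, 0, 0]  # DB criterion, 7 true species

print(ccr(ch_row_7, 7))  # 98.0
print(ccr(db_row_7, 7))  # 36.0
```

For 7 true species, the CH criterion selects the correct clustering number in 49 of 50 runs, while the DB criterion does so in only 18.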
Last, the clustering results are also evaluated by external criteria such as the Rand, Adjusted Rand, Jaccard, and Fowlkes–Mallows indices, whose value interval is
CFSFDP clustering analysis with external criterion.
Clustering algorithm | External criterion | Clustering number: 2 | 3 | 4 | 5 | 6 | 7 | 8 |
---|---|---|---|---|---|---|---|---|
CFSFDP | Rand | 0.9580 | 0.9504 | 0.9681 | 0.9793 | 0.9854 | 0.9887 | 0.9912 |
CFSFDP | Adjusted Rand | 0.9160 | 0.8861 | 0.9128 | 0.9337 | 0.9462 | 0.9527 | 0.9587 |
CFSFDP | Jaccard | 0.9155 | 0.8564 | 0.8760 | 0.8985 | 0.9138 | 0.9218 | 0.9301 |
CFSFDP | Fowlkes–Mallows | 0.9568 | 0.9231 | 0.9341 | 0.9467 | 0.9551 | 0.9594 | 0.9638 |
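The four external criteria can be computed by counting sample pairs, as in the sketch below. It uses the standard pair-counting definitions of these indices; the function names are chosen for illustration, and the code assumes that both labelings contain at least one pair placed in the same cluster.

```python
import math
from itertools import combinations


def pair_counts(true_labels, pred_labels):
    """Count pairs by agreement: (same/same, same/diff, diff/same, diff/diff)."""
    ss = sd = ds = dd = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            ss += 1
        elif same_true:
            sd += 1
        elif same_pred:
            ds += 1
        else:
            dd += 1
    return ss, sd, ds, dd


def external_indices(true_labels, pred_labels):
    """Rand, Adjusted Rand, Jaccard, and Fowlkes-Mallows indices."""
    ss, sd, ds, dd = pair_counts(true_labels, pred_labels)
    total = ss + sd + ds + dd
    expected = (ss + sd) * (ss + ds) / total      # chance-level agreement
    max_index = ((ss + sd) + (ss + ds)) / 2
    return {
        "rand": (ss + dd) / total,
        "adjusted_rand": (ss - expected) / (max_index - expected),
        "jaccard": ss / (ss + sd + ds),
        "fowlkes_mallows": ss / math.sqrt((ss + sd) * (ss + ds)),
    }


# Cluster names do not matter, only which samples are grouped together.
scores = external_indices([0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0])
```

Because only pair membership is compared, a clustering that matches the true classes up to a renaming of the clusters scores 1 on all four indices.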
In this paper, we propose a novel incremental self-adaptive wood species classification prototype system that can handle an increasing number of wood species. A visible/near-infrared spectrometer is used to acquire the spectral curves of wood samples for the subsequent wood species classification. First, when new wood samples of unknown wood species are added, they are classified into an unknown category by our one-class classifier, SVDD, while samples of the existing wood species are classified into a known category. Second, the wood samples of known species are sent to the BP neural network for wood species classification. A few wood samples of the unknown category that SVDD falsely classifies into the known category can be handled by the BP network as “rejected classification.” Third, the new wood samples of unknown species are sent to the CFSFDP algorithm for unsupervised clustering, and the clustering result is evaluated by internal and external criteria. We propose an improvement to the CFSFDP algorithm that automatically determines the clustering number using the CH or DB internal criterion. Last, if a cluster of an unknown species contains an adequate number of wood samples, these samples are removed and identified by human experts or other schemes to obtain the correct wood species name. All wood samples of this cluster must be checked one by one to find and discard any samples falsely classified by SVDD. These wood samples are then treated as a new known species and are used to retrain the classifiers, SVDD and the BP neural network. This procedure is repeated so that our system can recognize more and more wood species.
Among wood properties such as wood species, wood density, surface color, surface roughness, surface defects, wood strength, moisture content, and so on, wood species is the most important property since other wood properties are usually related to wood species. Many sensors and measurement schemes have been used to detect these wood properties [
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that they have no conflicts of interest.
This research was supported by the National Natural Science Foundation of China (grant no. 31670717), the Fundamental Research Funds for the Central Universities (grant no. 2572017EB09), and the Heilongjiang Province Natural Science Foundation (grant no. C2016011).