Mobile robots that operate in real-world environments interact with their surroundings and generate complex acoustic and vibration signals, which carry rich information about the terrain. This paper presents a new terrain classification framework that utilizes both the acoustic and vibration signals resulting from the robot-terrain interaction. As an alternative to hand-crafted, domain-specific feature extraction, a two-stage feature selection method combining the ReliefF and mRMR algorithms was developed to select optimal feature subsets that carry more discriminative information. As different data sources can provide complementary information, a multiclassifier combination method was proposed that considers a priori knowledge and fuses predictions from five data sources: one acoustic data source and four vibration data sources. In this study, four conceptually different classifiers were employed to perform the classification, each with a different number of optimal features. Signals were collected using a tracked robot moving at three different speeds on six different terrains. The new framework improved the classification performance of all four classifiers using the newly developed optimal feature subsets. The greatest improvement was observed when the robot traversed at lower speeds.
Mobile robots are increasingly deployed in real-world environments, such as forestry, mining, rescue, site inspection, and space exploration [
Considerable effort has been devoted to exploring applications of terrain perception for mobile robots. At present, the most commonly used sensing modalities are cameras and LADARs. Vision-based methods are a powerful tool for perceiving the surrounding environment, in which texture or color information is used to characterize the terrain. However, such methods are unreliable because the appearance of the terrain can change with factors such as illumination, weather, and camouflaging by leaves [
It is well known that human beings can capture information about terrain while walking, both by sensing it with their feet and by the sound of their footsteps, which is of particular importance in dark environments [
Iagnemma et al. first proposed the vibration-based method for terrain classification [
Although comparatively little research has been reported on acoustic-based methods, it is a growing area of interest. Ojeda et al. used a microphone to classify robot-terrain interactions; however, the study reported that sound performed poorly as the sole modality for terrain classification, except on grass [
In this paper, a new terrain classification framework is presented to improve classification performance. The study makes two main contributions. First, instead of relying on hand-crafted, domain-specific features, a two-stage feature selection method combining the ReliefF and mRMR algorithms was developed to select optimal yet compact feature subsets; it takes both attribute weights and redundancy reduction into account. Moreover, the combined method is more computationally efficient than mRMR working alone. Second, a multiclassifier combination method was developed that fuses the predictions from five data sources. The predicted class is determined by integrating prior knowledge with the current classification results. The proposed framework has demonstrated promising performance.
The proposed framework involves the following steps:
Data collection from tracked robot-terrain interaction
Assigning labels to the prepared terrains
Splitting the collected signal into short time windows
Extracting features from each window
Selecting optimal feature subsets using the twostage feature selection method
Training a classifier using the optimal feature subsets
Predicting the class labels of these short windows
Determining the terrain class by fusing the predictions from each classifier based on the prior knowledge of each data source
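The windowing step in the list above can be sketched as follows. This is a minimal illustration only: the function name `split_into_windows` and the parameters `fs`, `window_s`, and `overlap` are hypothetical, and the window length and sampling rate actually used in the study are not assumed here.

```python
import numpy as np


def split_into_windows(signal, fs, window_s, overlap=0.0):
    """Split a 1-D signal into fixed-length windows (one row per window).

    fs (Hz), window_s (seconds) and overlap (fraction in [0, 1)) are
    illustrative parameters, not values taken from the paper.
    """
    signal = np.asarray(signal, dtype=float)
    win = int(round(window_s * fs))  # window length in samples
    if win <= 0 or win > signal.size:
        raise ValueError("window must span between 1 sample and the whole signal")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must lie in [0, 1)")
    hop = max(1, int(round(win * (1.0 - overlap))))  # step between window starts
    n_windows = 1 + (signal.size - win) // hop  # trailing partial window is dropped
    return np.stack([signal[i * hop: i * hop + win] for i in range(n_windows)])
```

Each row of the returned array would then be passed to the feature extraction step and labeled with the terrain on which it was recorded.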
A schematic overview of the framework is shown in Figure
A flowchart of the proposed terrain classification framework.
In each experiment, the tracked robot was driven over six different types of terrain: brick (
Images of the different terrains used in the experiment.
Features were extracted from each window. The FFT, which transforms a signal from the time domain to the frequency domain, is the most commonly used feature extraction method for acceleration data. Moreover, it is the foundation of many other features, such as the MFCC and the frequency characteristics. Thus, the FFT was chosen as a basic feature candidate for both the acoustic and acceleration data. As the SNR is higher in regions where there is more power, only the lower part of the spectrum was used in this study. The truncation point was set to 200 Hz.
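As a rough illustration of the truncated FFT feature described above, the following sketch computes a one-sided magnitude spectrum and keeps only the bins up to 200 Hz. The function name `fft_features`, the sampling rate argument `fs`, and the amplitude normalization are assumptions made for the example; the paper's exact preprocessing is not shown in this section.

```python
import numpy as np


def fft_features(window, fs, f_max=200.0):
    """Return (frequencies, magnitudes) of the spectrum up to f_max Hz.

    fs is the sampling rate in Hz (hypothetical here); f_max defaults to the
    200 Hz truncation point stated in the text.
    """
    window = np.asarray(window, dtype=float)
    magnitudes = np.abs(np.fft.rfft(window)) / window.size  # one-sided spectrum
    freqs = np.fft.rfftfreq(window.size, d=1.0 / fs)
    keep = freqs <= f_max  # discard the low-power upper part of the spectrum
    return freqs[keep], magnitudes[keep]
```

The number of retained bins depends on the window length: a longer window gives a finer frequency resolution and therefore more bins below the truncation point.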
MFCCs are perceptually based spectral features that have been successfully used in speech recognition; they map the linear frequency scale to a scale that resembles the frequency resolution of the human ear [
Transforming (
The best Mel mapping function curve can be obtained by tuning
The classification accuracy as a function of
The gianna and shape feature vector [
Compared to an acoustic signal, there are fewer feature vectors for acceleration. As mentioned previously, FFT was adopted as the basic feature candidate, and the truncation point was set to 200 Hz. By doing so, the lower part of the spectrum is used, resulting in a higher SNR. Based on FFT, five derived frequency domain indicators were calculated: MSF, RMSF, FC, RVF, and VF [
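The five frequency-domain indicators can be sketched as spectral moments of the power spectrum. The definitions below are the commonly used textbook forms (FC as the power-weighted mean frequency, MSF as the power-weighted mean of squared frequency, VF as the power-weighted variance about FC, and RMSF and RVF as their square roots); they are an assumption, since the paper's own formulas are not reproduced in this section. The function name `frequency_indicators` is hypothetical.

```python
import numpy as np


def frequency_indicators(freqs, power):
    """Compute FC, MSF, RMSF, VF and RVF from a power spectrum.

    Assumed (standard) definitions, with S the power at frequency f:
      FC  = sum(f * S) / sum(S)
      MSF = sum(f^2 * S) / sum(S)
      VF  = sum((f - FC)^2 * S) / sum(S)
      RMSF = sqrt(MSF), RVF = sqrt(VF)
    """
    freqs = np.asarray(freqs, dtype=float)
    power = np.asarray(power, dtype=float)
    total = power.sum()
    if total <= 0:
        raise ValueError("power spectrum must have positive total power")
    fc = (freqs * power).sum() / total
    msf = (freqs ** 2 * power).sum() / total
    vf = ((freqs - fc) ** 2 * power).sum() / total
    return {"FC": fc, "MSF": msf, "RMSF": np.sqrt(msf), "VF": vf, "RVF": np.sqrt(vf)}
```

Under these definitions the indicators satisfy MSF = VF + FC², which is a convenient sanity check on an implementation.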
To summarize, each short window was taken as a dataset and processed to extract different feature vectors. The feature vector candidates for the acoustic set are listed as follows:
18-dimensional MMFCCs
200-dimensional FFT
9-dimensional gianna and shape feature
227-dimensional feature vector formed by the above three feature vectors
The feature vector candidates for each acceleration set are listed as follows:
11-dimensional temporal indicators
200-dimensional FFT
5-dimensional frequency characteristics
216-dimensional feature vector formed by the above three feature vectors
All features were normalized to range within
A larger number of features increases the computational load and also makes the corresponding classification model more prone to overfitting. In classification problems, there can be hundreds of potential features for characterizing a target object; however, noisy and irrelevant features provide little information and should be identified and removed. In this paper, a two-stage feature selection method is developed combining ReliefF [
The ReliefF algorithm works as follows: first, an instance
Input: feature vectors and class labels
Output: the vector
set
randomly select a set
search
find
Referring to the pseudocode,
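A simplified ReliefF weighting can be sketched as follows. It is an illustration under stated assumptions rather than the authors' implementation: the function name `relieff`, the parameters `n_neighbors`, `n_samples`, and `seed`, and the use of the Manhattan distance on range-scaled features are all choices made for this example.

```python
import numpy as np


def relieff(X, y, n_neighbors=5, n_samples=None, seed=0):
    """Return one ReliefF-style weight per feature (larger = more relevant).

    Simplified sketch: Manhattan distance on range-scaled features, and misses
    from each other class weighted by that class's prior probability.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0  # constant features: avoid division by zero
    Xs = X / span  # per-feature differences now lie in [0, 1]
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    rng = np.random.default_rng(seed)
    m = n if n_samples is None else min(n_samples, n)
    weights = np.zeros(d)
    for i in rng.choice(n, size=m, replace=False):
        diff = np.abs(Xs - Xs[i])  # feature-wise differences to instance i
        dist = diff.sum(axis=1)
        for c in classes:
            idx = np.flatnonzero(y == c)
            idx = idx[idx != i]  # never use the instance as its own neighbor
            k = min(n_neighbors, idx.size)
            if k == 0:
                continue
            nearest = idx[np.argsort(dist[idx])[:k]]
            contrib = diff[nearest].mean(axis=0) / m
            if c == y[i]:
                weights -= contrib  # nearest hits: differing values are penalized
            else:
                weights += prior[c] / (1.0 - prior[y[i]]) * contrib  # nearest misses
    return weights
```

Features could then be ranked by weight, with only the top-ranked ones passed on to the second selection stage.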
The mRMR algorithm employs MaxRelevance and MinRedundancy criteria. MaxRelevance aims to find features satisfying
Features selected using MaxRelevance criteria could have rich redundancy; therefore, MinRedundancy is performed in the next step. The MinRedundancy condition can be expressed by
Finally, the two criteria can be combined by the operator
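A greedy mRMR selection can be sketched as below. Since the combining operator is not visible in this section, the sketch assumes the difference form (relevance minus mean redundancy); the quotient form is an equally common alternative. The function names `mutual_information` and `mrmr`, the equal-width discretization, and the parameter `n_bins` are illustrative assumptions.

```python
import numpy as np


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    _, ai = np.unique(np.ravel(a), return_inverse=True)
    _, bi = np.unique(np.ravel(b), return_inverse=True)
    ai, bi = np.ravel(ai), np.ravel(bi)
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1.0)  # contingency table
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())


def mrmr(X, y, n_select, n_bins=10):
    """Greedy mRMR using the difference criterion (assumed, see lead-in)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    Xd = np.empty((n, d), dtype=int)
    for j in range(d):  # equal-width discretization of each feature
        edges = np.linspace(X[:, j].min(), X[:, j].max(), n_bins + 1)[1:-1]
        Xd[:, j] = np.digitize(X[:, j], edges)
    relevance = np.array([mutual_information(Xd[:, j], y) for j in range(d)])
    selected = [int(np.argmax(relevance))]  # start with the most relevant feature
    redundancy_sum = np.zeros(d)
    while len(selected) < min(n_select, d):
        last = selected[-1]
        redundancy_sum += np.array(
            [mutual_information(Xd[:, j], Xd[:, last]) for j in range(d)]
        )
        score = relevance - redundancy_sum / len(selected)
        score[selected] = -np.inf  # never pick the same feature twice
        selected.append(int(np.argmax(score)))
    return selected
```

In a two-stage scheme of the kind described in the text, a function like this would be applied only to the features retained by the ReliefF stage, which keeps the number of pairwise mutual information evaluations small.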
For comparison and experimental validation, four conceptually different classifiers were used to perform the classification. The principles of each classifier are briefly described as follows.
As different sensors can provide complementary information, a multiclassifier combination method can be developed to improve classification performance. Voting principles are perhaps the most general and useful multiclassifier combination methods, aiming to make a consensus by fusing opinions from individual classifiers [
Assume that there is a pattern space
The event
A commonly used voting rule based on majority is given by
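For illustration, a simple majority voting rule can be sketched as follows. The strict more-than-half threshold and the rejection option are assumptions made for this example, and the function name `majority_vote` is hypothetical.

```python
from collections import Counter


def majority_vote(predictions, reject=None):
    """Return the label chosen by more than half of the classifiers.

    If no label reaches a strict majority, `reject` is returned instead
    (the rejection behavior is an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    label, count = Counter(predictions).most_common(1)[0]
    return label if count > len(predictions) / 2 else reject
```

With five data sources, for example, a label needs at least three votes to be accepted under this rule.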
In this study, the confusion matrix [
The number of test samples is given by
The number of samples belonging to each class
The number of samples that are classified into class
Under the occurrence
The confusion matrix is believed to be able to reflect the performance of
Confusion matrix for six-class terrain classification using the SVM classifier and the 18-dimensional MMFCC at a speed of 2 m/s. The terrain representations:
Finally, the voting principle based on prior knowledge takes the form
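One way to use confusion matrices as prior knowledge in voting is sketched below. Each classifier's confusion matrix is column-normalized to estimate the probability of each true class given the predicted label, and these estimates are summed over the data sources. The summation rule, the uniform fallback for labels that were never predicted, and the function names `posterior_from_confusion` and `fuse_with_prior` are assumptions of this sketch; the exact form used in the paper is not reproduced in this section.

```python
import numpy as np


def posterior_from_confusion(cm):
    """Estimate P(true class = i | predicted = j) from a confusion matrix.

    cm[i, j] is the number of class-i samples that were labeled j.
    """
    cm = np.asarray(cm, dtype=float)
    col = cm.sum(axis=0, keepdims=True)  # samples classified into each class
    safe = np.where(col > 0, col, 1.0)
    # Labels that were never predicted carry no information: use a uniform prior.
    return np.where(col > 0, cm / safe, 1.0 / cm.shape[0])


def fuse_with_prior(confusions, predictions):
    """Fuse per-source predicted labels using each source's confusion matrix."""
    scores = sum(
        posterior_from_confusion(cm)[:, j] for cm, j in zip(confusions, predictions)
    )
    return int(np.argmax(scores)), scores
```

Unlike plain majority voting, a single reliable data source can outweigh several unreliable ones under this scheme, because each vote is weighted by how trustworthy that prediction was on the data used to build the confusion matrix.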
A data acquisition system was used to record the signals. It consisted of a 24-bit data acquisition instrument, a router, a computer, and three different sensors, as shown in Figure
Frame diagram of the proposed data acquisition system.
The acoustic signal was measured using an acoustic pressure sensor placed close to the first road wheel. To reduce the influence of background noise, the sensor was pointed downward, perpendicular to the ground surface. For shock absorption, a bracket and a damping foam block were used. A single-axis accelerometer was mounted on the axis of the first road wheel to collect acceleration data along the vertical direction. Vibrations induced at the centroid of the robot were measured by a triaxial accelerometer along three perpendicular directions, as depicted by the coordinate system shown in Figure
The instrumented tracked robot.
To keep the classes balanced, 80 samples were chosen for each class, giving a total of 480 samples. The prepared samples were split into two sets of equal size, namely, the training set and the testing set. Representative examples of the signal tokens from the five data sources at 2 m/s are presented in Figure
Signal token examples of five data sources at 2 m/s.
The results of each trial generated a confusion matrix as depicted in Figure
Accuracies with individual data sources and hand-crafted feature vectors at 2 m/s.

| Data source | Feature vector | SVM | RF | | NB |
| --- | --- | --- | --- | --- | --- |
| Acoustics | MMFCC | 89.6% | 87.1% | 80.8% | 77.9% |
| | FFT | 82.9% | 70% | 56.7% | 55.4% |
| | Gianna and shape | 58.8% | 67.9% | 53.8% | 64.2% |
| Wheel vibration | Temporal | 68.3% | 62.5% | 62.1% | 56.3% |
| | FFT | 90.4% | 89.2% | 75.4% | 84.6% |
| | Frequency characteristics | 59.6% | 64.2% | 62.5% | 51.7% |
| Centroid vibration | Temporal | 41.7% | 52.9% | 51.3% | 47.5% |
| | FFT | 94.2% | 92.9% | 72.1% | 88.3% |
| | Frequency characteristics | 46.7% | 54.2% | 47.1% | 44.2% |
| Centroid vibration | Temporal | 42.5% | 44.2% | 50% | 46.7% |
| | FFT | 92.1% | 93% | 91.3% | 88.2% |
| | Frequency characteristics | 44.6% | 44.6% | 46.7% | 45.1% |
| Centroid vibration | Temporal | 44.2% | 46.3% | 50% | 45.5% |
| | FFT | 93.3% | 92.9% | 91.3% | 90.8% |
| | Frequency characteristics | 42.5% | 48.3% | 46.7% | 42.5% |
For the acoustic-based method, accuracies between 53.8% and 89.6% were achieved. The best result in terms of accuracy was obtained using MMFCC and SVM, whereas the worst result was observed using gianna and shape and
Referring to Table
The confusion matrices obtained with SVM at 2 m/s: (a) confusion matrix of
Figure
Accuracies obtained using the proposed terrain classification framework as a function of the number of optimal features: (a) accuracies of SVM; (b) accuracies of RF; (c) accuracies of
Representative signal tokens at three different speeds from (a) data source of
From the classifier standpoint, SVM and RF clearly outperformed the other two simpler classifiers; however, it should be pointed out that they are relatively computationally expensive. The highest accuracy of 99.6% was achieved using RF with 90 and 100 optimal features at 2 m/s. Additionally, the accuracies given by RF at 0.8 m/s were above 95%, except for the trial with 10 optimal features. However, when traversing at 0.4 m/s, all the accuracies given by SVM were above 85% except for the trial with 10 optimal features, while most of the accuracies given by RF were lower than 85%. The worst performance was given by
Tables
Comparison of the proposed terrain classification framework with the traditional method on benchmark feature vectors. MMFCC and FFT are adopted as benchmark feature vectors for the acoustic and acceleration data sources, respectively. The driving speed is 2 m/s. The number in brackets is the dimensionality of the optimal feature subset corresponding to the best result.

| Data source | Feature vector | SVM | RF | | NB |
| --- | --- | --- | --- | --- | --- |
| | MMFCC (18D) | 89.6% | 87.1% | 80.8% | 77.9% |
| | FFT (200D) | 90.4% | 89.17% | 75.42% | 84.6% |
| | FFT (200D) | 94.2% | 92.92% | 72.08% | 88.3% |
| | FFT (200D) | 92.1% | 92.95% | 91.3% | 88.2% |
| | FFT (200D) | 93.3% | 92.92% | 91.3% | 90.8% |
| Proposed framework | | 97.9% (80D) | 99.6% (90D) | 95% (50D) | 97.5% (100D) |
Comparison of the proposed terrain classification framework with the traditional method on benchmark feature vectors. MMFCC and FFT are adopted as benchmark feature vectors for the acoustic and acceleration data sources, respectively. The driving speed is 0.8 m/s. The number in brackets is the dimensionality of the optimal feature subset corresponding to the best result.

| Data source | Feature vector | SVM | RF | | NB |
| --- | --- | --- | --- | --- | --- |
| | MMFCC (18D) | 76.7% | 75% | 67.5% | 72.1% |
| | FFT (200D) | 80% | 82.5% | 76% | 75.8% |
| | FFT (200D) | 82.5% | 82.9% | 76.3% | 76.3% |
| | FFT (200D) | 87.5% | 89.2% | 79.6% | 79.6% |
| | FFT (200D) | 88.3% | 85.4% | 80% | 80% |
| Proposed framework | | 97.1% (30D) | 97.5% (50D) | 92.1% (30D) | 91.3% (30D) |
Comparison of the proposed terrain classification framework with the traditional method on benchmark feature vectors. MMFCC and FFT are adopted as benchmark feature vectors for the acoustic and acceleration data sources, respectively. The driving speed is 0.4 m/s. The number in brackets is the dimensionality of the optimal feature subset corresponding to the best result.

| Data source | Feature vector | SVM | RF | | NB |
| --- | --- | --- | --- | --- | --- |
| | MMFCC (18D) | 74.2% | 67.9% | 62.5% | 68.3% |
| | FFT (200D) | 80% | 63% | 76% | 53.8% |
| | FFT (200D) | 69.2% | 69.2% | 76.3% | 60.8% |
| | FFT (200D) | 82.5% | 79.6% | 79.6% | 70% |
| | FFT (200D) | 83.8% | 80% | 80% | 73.3% |
| Proposed framework | | 92.1% (40D) | 86.3% (50D) | 81.7% (60D) | 82.9% (30D) |
The purpose of terrain classification is to improve robot control; thus, in addition to classification accuracy, classification time is another important factor in guaranteeing real-time implementation. Generally speaking, algorithms with shorter classification times are better suited to running online. Figure
Classification times of different classifiers as a function of the number of optimal features.
In this paper, a new terrain classification framework was presented. The experiments were carried out with a tracked robot on six different terrains. Multiple sensors were employed to collect signals, and in total five data sources were used. A two-stage feature selection method was proposed to obtain optimal feature subsets, and a multiclassifier combination method considering prior knowledge was developed. Finally, four conceptually different classifiers were employed to perform the classification.
The results showed that the new framework improves classification performance with optimal feature subsets across the different classifiers. Only a small number of features contribute effectively to classification, which demonstrates the necessity of the feature selection operation. The different distributions of the confusion matrices resulting from the five data sources revealed that complementary information can be obtained from the classifier combination. In addition, greater improvements were achieved for signals collected at lower speeds, which means that the approach can extract discriminative information hidden in weak signals. The accuracies themselves tend to increase at higher speeds, as higher speeds lead to stronger signals and longer travel distances. Regarding real-time properties, the classification times increase approximately linearly with the number of optimal features. Since the computation time is affected by several factors, such as the size of the training set, the number of optimal features, and the offline training time, it is difficult to determine the best algorithm in relation to computation time. In this study, the SVM was found to be the best approach in terms of classification time. In comparison to traditional methods, this work suggests that the new framework could handle more complex terrain and increase the probability of detecting danger in advance owing to the presence of the acoustic modality. Another advantage is that the proprioceptive sensors used in this study cost much less than tactile sensors, cameras, and LADARs. In future studies, additional types of hazards such as marshland, desert, and streams should be considered. To provide further variation, different locations should be considered within each terrain type.
LADAR: LAser Detection And Ranging
PSD: Power spectral density
PCA: Principal component analysis
FFT: Fast Fourier transform
SVM: Support vector machine
PNN: Probabilistic neural network
NB: Naïve Bayes
SNR: Signal-to-noise ratio
mRMR: Minimal-redundancy-maximal-relevance
MFCC: Mel-frequency cepstrum coefficient
MMFCC: Modified Mel-frequency cepstrum coefficient
ZCR: Zero crossing rate
STE: Short time energy
MSF: Mean square frequency
RMSF: Root mean square frequency
FC: Frequency center
RVF: Root variance frequency
VF: Variance frequency
RMS: Root mean square
LIBSVM: Library for support vector machines
RF: Random forests.
The authors declare that they have no conflicts of interest.
The authors acknowledge the support of the National Natural Science Foundation of China (Grant no. U1564210).