In the engineering field, excessive data dimensionality reduces the efficiency of machine learning and of analyzing the relationships between data or features. To make feature dimensionality reduction more effective and faster, this paper proposes a new feature dimensionality reduction approach combining a sampling survey method with a heuristic intelligent optimization algorithm. Drawing on feature selection, this method builds a feature-scoring system and a reduced-dimension length-scoring system based on the sampling survey method. According to these scores, the method selects the top-ranked features and reduced-dimension lengths. An improved heuristic intelligent optimization algorithm then performs an in-depth optimal selection among these high-scoring features and reduced-dimension lengths. To verify the effectiveness of the dimensionality reduction method, this paper applies it to road roughness time-domain estimation based on vehicle dynamic response and to gene-selection research in bioengineering. Results in the first case show that the proposed method improves the accuracy of road roughness time-domain estimation to above 0.99 while reducing the measured vehicle dynamic response data required, significantly decreasing the experimental workload. Results in the second case show that the method quickly and effectively selects a set of genes with high disease recognition accuracy.
The curse of dimensionality is a common problem in engineering. Ineffective and unreasonable feature dimensionality reduction compromises machine-learning efficiency, pattern recognition accuracy, and data-mining efficiency while increasing the workload of measurement experiments. A high-dimensional feature set suffers from the following problems: too many features relative to the number of samples, many features with little or no relation to the mining task, and excessive redundancy among features [
Feature dimensionality reduction methods can be classified into two types: feature selection and feature projection. Feature projection is also called feature extraction.
This paper takes road roughness time-domain estimation from vehicle dynamic response as an example of feature dimensionality reduction in multivariate time series. Acquiring road roughness information from vehicle dynamic response is economical and practical, but most researchers have only improved the precision of road roughness estimation by using different neural networks or enhanced estimation methods [
For instance, in gene-selection research on cancer, mining gene expression profile data can identify cancer types. Gene expression profiles have many data dimensions but few samples: the leukemia dataset, for example, has 7129 gene dimensions but only 72 samples for analysis, which increases the challenge of feature dimensionality reduction. Various methods exist for gene selection, but they are computationally costly and complex [
To solve the problem of excessive dimensions, this paper proposes a new feature selection method based on a sampling survey method and a heuristic intelligent optimization algorithm. The general process is as follows: first, we build a feature-scoring system and a reduced-dimension length-scoring system based on the sampling survey method; second, we sort features and reduced-dimension lengths by their scores and select the top-ranked ones; third, to optimize the selection of features and reduced-dimension lengths simultaneously, we redefine the meaning of the information carried by individuals in the heuristic intelligent optimization algorithm's population and improve the algorithm accordingly. To better explain and verify the effect of the dimensionality reduction method, this paper applies it to road roughness time-domain estimation from vehicle dynamic response and to gene-selection research in bioengineering. Results show that the proposed method selects features quickly and effectively: the accuracy of road roughness time-domain estimation was generally higher than 0.99 while the required vehicle dynamic response measurement data were effectively reduced, and a reasonable number of genes was selected.
The sampling survey method takes part of the population as samples for analysis [
The feature-scoring system operates as follows:
(1) Draw samples from the feature population and calculate the fitness of each sample.
(2) Set a threshold value; samples whose fitness exceeds the threshold form the set of valid feature scores.
(3) Score each feature based on the set of valid feature scores. The score of each feature is composed of the following two parts:
The total score of a feature is represented by the following formula:
(4) According to Formulas ( ) and ( ), rank the features by their total scores and choose the top-ranked ones.
The scoring system for reduced-dimension length is as follows:
(1) These steps are the same as in the feature-scoring system.
(2) Score the length of each reduced dimension according to the set of valid feature scores. The score of each dimension length is as follows:
(3) Rank each dimension length according to its score and choose the top-ranked lengths.
Both scoring systems are sketched below.
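The following minimal sketch illustrates both scoring systems. The function name, the default threshold (the mean fitness of all draws), and the two-part score (an appearance count plus the carried fitness) are illustrative assumptions, not the paper's exact formulas:

```python
import numpy as np

def survey_scores(evaluate, n_features, n_draws=200, max_len=None, threshold=None, seed=0):
    """Sketch of the sampling-survey scoring systems.

    evaluate(subset) -> fitness in [0, 1], e.g., the estimation accuracy of a
    model trained on the given feature subset (an assumed interface).
    """
    rng = np.random.default_rng(seed)
    max_len = max_len or n_features
    records = []                                 # (subset, fitness) pairs
    for _ in range(n_draws):
        k = int(rng.integers(1, max_len + 1))    # random reduced-dimension length
        subset = rng.choice(n_features, size=k, replace=False)
        records.append((subset, evaluate(subset)))

    fits = np.array([f for _, f in records])
    if threshold is None:
        threshold = fits.mean()                  # assumed default threshold
    valid = [(s, f) for s, f in records if f >= threshold]

    feat_scores = np.zeros(n_features)           # feature-scoring system
    len_scores = np.zeros(max_len + 1)           # dimension-length scoring system
    for subset, fit in valid:
        feat_scores[subset] += 1.0 + fit         # two parts: count + carried fitness
        len_scores[len(subset)] += 1.0 + fit
    return feat_scores, len_scores
```

Ranking `feat_scores` and `len_scores` in descending order then yields the top-ranked features and the reasonable dimension lengths.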
To verify that a scoring system based on the sampling survey method can effectively select the features with the greatest influence on results, this paper uses a calculation example for simple mathematical analysis: the classical Hald cement problem. Hald cement contains four chemical components: 3CaO·Al2O3 (x1), 3CaO·SiO2 (x2), 4CaO·Al2O3·Fe2O3 (x3), and 2CaO·SiO2 (x4); the heat released during hardening (y) depends on their contents.
Data on the relations between the component contents (x1 to x4) and the heat released (y) are listed in the following table.
Sample no. | x1 | x2 | x3 | x4 | y
---|---|---|---|---|---
1 | 7 | 26 | 6 | 60 | 78.5 |
2 | 1 | 29 | 15 | 52 | 74.3 |
3 | 11 | 56 | 8 | 20 | 104.3 |
4 | 11 | 31 | 8 | 47 | 87.6 |
5 | 7 | 52 | 6 | 33 | 95.9 |
6 | 11 | 55 | 9 | 22 | 109.2 |
7 | 3 | 71 | 17 | 6 | 102.7 |
8 | 1 | 31 | 22 | 44 | 72.5 |
9 | 2 | 54 | 18 | 22 | 93.1 |
10 | 21 | 47 | 4 | 26 | 115.9 |
11 | 1 | 40 | 23 | 34 | 83.8 |
12 | 11 | 66 | 9 | 12 | 113.3 |
13 | 10 | 68 | 8 | 12 | 109.4 |
In statistics, variance reflects the degree to which data points differ from one another. Drawing on the contribution-degree calculation in PCA, this paper takes the proportion of one variable's variance in the total variance of all variables to represent that variable's degree of influence; Formula ( ) expresses this proportion.
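Concretely, the degree of influence of variable $x_i$ is $\rho_i = D(x_i)/\sum_j D(x_j)$, where $D(\cdot)$ denotes the variance. The following snippet computes it for the Hald data above:

```python
import numpy as np

# Hald cement data from the table above: columns x1..x4 (component contents).
X = np.array([
    [ 7, 26,  6, 60], [ 1, 29, 15, 52], [11, 56,  8, 20], [11, 31,  8, 47],
    [ 7, 52,  6, 33], [11, 55,  9, 22], [ 3, 71, 17,  6], [ 1, 31, 22, 44],
    [ 2, 54, 18, 22], [21, 47,  4, 26], [ 1, 40, 23, 34], [11, 66,  9, 12],
    [10, 68,  8, 12],
])

var = X.var(axis=0, ddof=1)        # sample variance of each component
influence = var / var.sum()        # proportion of the total variance
for i, p in enumerate(influence, start=1):
    print(f"x{i}: variance = {var[i - 1]:.2f}, influence = {p:.4f}")
```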
According to Step 1 of the scoring system based on the sampling survey method proposed in this paper, we must also calculate the fitness of each sample, taking classification recognition accuracy, data-mining efficiency, or neural network estimation accuracy as the fitness with which to analyze the four features. To better compare their degrees of influence, we used the output approximation degree of the RBF neural network (RBFNN) when applying the scoring system; the resulting scores are listed in the following table.
Scores of four components of Hald cement.
x1 | x2 | x3 | x4
---|---|---|---
0.0920 | 1.8962 | 0.6642 | 1.1559
When ranking the four features by degree of influence in descending order, we obtained x2 > x4 > x3 > x1.
This paper uses an artificial fish swarm algorithm (AFSA) and particle swarm optimization (PSO) to explain and verify the new feature dimensionality reduction method [
AFSA is improved as follows:
(1) Add a memory function to all AFSA behaviors, storing the historical optimal positions and corresponding optimal values of individuals and of the entire fish swarm.
(2) Introduce the speed and position calculation formulas of PSO into the swarm's foraging behavior to update and calculate the movement speed and position of the next-generation fish swarm.
(3) Following the mutation idea of the genetic algorithm, if the global optimum remains unchanged over multiple records, disturb multiple fish in the swarm with a certain probability.
These three improvements are sketched below.
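A minimal sketch of the three improvements, with assumed parameter values and the clustering and tailgating behaviors omitted for brevity, might look as follows:

```python
import numpy as np

def improved_afsa(f, dim, bounds, n_fish=20, iters=30, w=0.7, c1=1.5, c2=1.5,
                  stall_limit=5, mut_prob=0.2, seed=0):
    """Sketch of the improved AFSA; f is minimized over [lo, hi]^dim.
    Parameter names and defaults are illustrative, not the paper's settings."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_fish, dim))
    v = np.zeros((n_fish, dim))
    fit = np.apply_along_axis(f, 1, x)

    # Improvement 1: memory of individual and swarm historical optima.
    pbest, pbest_fit = x.copy(), fit.copy()
    g, g_fit, stall = pbest[pbest_fit.argmin()].copy(), pbest_fit.min(), 0

    for _ in range(iters):
        # Improvement 2: PSO speed/position formulas in foraging behavior.
        r1, r2 = rng.random((n_fish, dim)), rng.random((n_fish, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        fit = np.apply_along_axis(f, 1, x)

        better = fit < pbest_fit
        pbest[better], pbest_fit[better] = x[better], fit[better]
        if pbest_fit.min() < g_fit:
            g, g_fit, stall = pbest[pbest_fit.argmin()].copy(), pbest_fit.min(), 0
        else:
            stall += 1

        # Improvement 3: disturb some fish when the global optimum stagnates.
        if stall >= stall_limit:
            mask = rng.random(n_fish) < mut_prob
            x[mask] = rng.uniform(lo, hi, (int(mask.sum()), dim))
            stall = 0
    return g, g_fit
```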
The PSO is improved as follows:
(1) To avoid the prematurity phenomenon in PSO, introduce the probability-judgment Metropolis rule of the simulated annealing (SA) algorithm.
(2) Change the update criterion of the temperature variable.
(3) Taking the nonuniform mutation idea of the genetic algorithm as reference, disturb the learning parameter.
A sketch of the first two improvements follows.
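For illustration, here is a minimal sketch of the Metropolis acceptance step (improvement 1) with an assumed geometric cooling schedule (improvement 2); the disturbance of the learning parameter (improvement 3) is omitted:

```python
import numpy as np

def metropolis_accept(f_new, f_old, T, rng):
    """SA-style Metropolis rule for a minimization problem: always accept an
    improvement; accept a worse solution with probability exp(-(f_new - f_old)/T)."""
    if f_new <= f_old:
        return True
    return rng.random() < np.exp(-(f_new - f_old) / T)

# Assumed cooling schedule: T is reduced geometrically each iteration,
#   T <- alpha * T  (e.g., alpha = 0.95),
# so worse moves become ever less likely as the search proceeds.
```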
To theoretically confirm the excellent optimum-searching performance of the improved heuristic intelligent optimization algorithms, this paper analyzes the convergence of the improved PSO and improved AFSA based on the criteria of Solis and Wets [

Condition H1:
Condition H1 ensures that the fitness of the solution never worsens from one iteration to the next.

Condition H2: For
Condition H2 indicates that if a stochastic optimization algorithm can search and find the globally optimal solution, the algorithm goes through the globally optimal solution at least once when the number of iterations approaches infinity; that is, the algorithm has ergodicity. The convergence of the improved PSO is confirmed below.
In the PSO, we define each particle by updating its speed and position based on the following formulas:
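In their standard form, the speed and position update formulas are

$$v_i^{t+1} = \omega v_i^{t} + c_1 r_1 \left(p_i^{t} - x_i^{t}\right) + c_2 r_2 \left(g^{t} - x_i^{t}\right), \qquad x_i^{t+1} = x_i^{t} + v_i^{t+1},$$

where $\omega$ is the inertia weight, $c_1$ and $c_2$ are the learning factors, $r_1$ and $r_2$ are uniform random numbers in $[0, 1]$, $p_i^{t}$ is particle $i$'s historical optimal position, and $g^{t}$ is the swarm's historical optimal position.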
The PSO stores the historical optimal positions of each particle and of the entire swarm.
Then, we judge the convergence of particle speed and position according to convergence in probability. With an inertia weight
If the number of iterations is sufficiently large, then the particle’s position and update speed will converge to
Additionally, we analyze and confirm the convergence of PSO using the difference equation [
Then, according to Formulas (
Substituting Formulas (
This is a second-order constant-coefficient nonhomogeneous difference equation, which this paper solves using the characteristic-equation method.
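In the standard single-particle analysis, treating $\varphi = c_1 r_1 + c_2 r_2$ as a constant and $p$ as the fixed attractor formed from the personal and global best positions, the update formulas combine into

$$x_{t+1} - (1 + \omega - \varphi)\,x_t + \omega\, x_{t-1} = \varphi p,$$

with characteristic equation

$$\lambda^2 - (1 + \omega - \varphi)\lambda + \omega = 0, \qquad \lambda_{1,2} = \frac{(1 + \omega - \varphi) \pm \sqrt{\Delta}}{2}, \quad \Delta = (1 + \omega - \varphi)^2 - 4\omega.$$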
The characteristic roots of Equation ( ) depend on the sign of the discriminant Δ:
If Δ > 0, the characteristic equation has two distinct real roots, and the solution converges if and only if both roots have modulus less than 1.
If Δ = 0, the characteristic equation has a repeated real root, and the solution converges if and only if that root has modulus less than 1.
If Δ < 0, the characteristic equation has a pair of complex-conjugate roots, and the solution converges if and only if their common modulus is less than 1.
If 0 ≤ ω < 1 and 0 < φ < 2(1 + ω), the roots lie strictly inside the unit circle in every case, and the particle position converges.
We propose an improved PSO that modifies the learning factors accordingly.
We prove the convergence of the improved PSO using the three methods above. However, if the algorithm is required to converge to the global optimal solution, it must also satisfy condition H2: the algorithm must pass through the global optimal solution at least once as the number of iterations approaches infinity; that is, the particles in the PSO should demonstrate ergodicity. The standard PSO cannot guarantee finding the global optimal value, which results in the aforementioned prematurity phenomenon. We therefore introduce SA into the evolutionary criteria of PSO, because SA has been theoretically proven to converge to the global optimal solution with probability 1 under certain conditions [
Next, this paper modifies the learning factor so that the algorithm flies quickly in early iterations, covering a large scope of the search space, and then narrows the search scope as the search proceeds to locate the global optimal solution. According to the derivation of reference [
Because we use the speed and position update formulas of PSO in the fish swarm's foraging behavior, and because the fish swarm may also exhibit clustering and tailgating behaviors, all of which move in the direction of higher fitness, we suppose that
Similarly, to satisfy condition H2, the position of the fish swarm should be ergodic in AFSA. Like the PSO, the standard AFSA also suffers from the prematurity phenomenon: if a super individual appears in the early iterations, the whole fish swarm converges quickly to its value. Therefore, we propose the third AFSA improvement: if the historical optimal solution does not change over multiple iterations, the algorithm randomly changes some individuals in the fish swarm. As the number of iterations approaches infinity, the fish swarm's position takes all possible values in the definition domain with a theoretical probability of 1. In essence, the improved AFSA is globally convergent with a certain probability.
To examine the computational complexity of the improved PSO and improved AFSA, we chose the following test function [
For analysis, we used a Lenovo computer with an Intel(R) Core(TM) i5-4590 CPU at 3.30 GHz and 8 GB of memory. The PSO and AFSA each used populations of 20 individuals over 30 iterations. Figure
Iterative evolution curves of test function.
The standard AFSA and standard PSO were trapped near the local optimal value in early iterations and identified the global optimal value only slowly as the number of iterations increased, whereas the improved AFSA and improved PSO quickly located the region of the global optimal value in early iterations and finally converged to it. The improved PSO and improved AFSA took only 0.0904 s and 0.0719 s, respectively, to converge to the global optimal value, whereas the standard PSO and standard AFSA took 0.0990 s and 0.5321 s, respectively.
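The test function used above comes from the cited reference; as a stand-in, the timing experiment can be reproduced with a common benchmark such as the Rastrigin function and the `improved_afsa` sketch given earlier:

```python
import time
import numpy as np

def rastrigin(x):
    """Common multimodal benchmark with global minimum 0 at the origin
    (a stand-in; the paper's test function is not reproduced here)."""
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

start = time.perf_counter()
best_x, best_f = improved_afsa(rastrigin, dim=2, bounds=(-5.12, 5.12),
                               n_fish=20, iters=30)
print(f"best f = {best_f:.4f} in {time.perf_counter() - start:.4f} s")
```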
To apply the heuristic intelligent optimization algorithm to optimal feature selection, we improved the algorithm by taking classification recognition accuracy, data-mining efficiency, or neural network estimation precision as the algorithm's optimization target value. We redefined the individuals of each population in the intelligent optimization algorithm (i.e., redefined the information carried by each fish in AFSA or each particle in PSO). Imitating the transcription and translation of genetic information in biology, the information carried by individuals is defined as shown in Figure
Information carried by individuals in the algorithm.
Here,
Selected intervals of reasonable dimension length after dimensionality reduction.
Lower limit of interval | Upper limit of interval | Reasonable dimension length |
---|---|---|
… | … | … |
In this case,
If the value of
According to continuous iterative calculation of the heuristic intelligent optimization algorithm, the information carried by each individual will evolve in the direction of the optimal target value to obtain the most reasonable dimension length after dimensionality reduction and feature selection.
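To make the encoding concrete, the following hypothetical sketch decodes one individual's position vector into a reasonable dimension length and a feature subset; the interval table and weight-ranking scheme are illustrative readings of the figure and table above, not the paper's exact formulas:

```python
import numpy as np

def decode_individual(z, candidate_features, length_table):
    """z: the individual's real-valued position vector. z[0] selects a
    reasonable dimension length L from the interval table; z[1:] carries one
    weight per candidate feature, and the L best-weighted features are kept."""
    L = length_table[-1][2]                       # fallback: last row
    for lower, upper, length in length_table:
        if lower <= z[0] < upper:
            L = length
            break
    order = np.argsort(-z[1:])                    # rank features by weight
    return [candidate_features[i] for i in order[:L]]

# Example with made-up values: z[0] = 0.7 falls in [0.5, 1.0), so L = 3.
table = [(0.0, 0.5, 2), (0.5, 1.0, 3)]
feats = [1, 3, 4, 8]                              # assumed top-scoring features
z = np.array([0.7, 0.2, 0.9, 0.4, 0.8])
print(decode_individual(z, feats, table))         # -> [3, 8, 4]
```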
Researchers generally determine road roughness with contact or noncontact measurements; however, these methods return results slowly and require expensive equipment. A method combining dynamic response with a neural network can estimate road roughness quickly, yet little research exists on the types and number of dynamic responses and the points to be measured in the vehicle. Because measurement points can be positioned freely in the vehicle, and each position offers three dynamic responses (i.e., vertical vibration acceleration, velocity, and displacement), this example theoretically has infinitely many feature dimensions.
Assume the vehicle is symmetric along the longitudinal axis, and the left and right wheels are under the same road conditions while driving. Given these assumptions, we simplify the vehicle model into a half-vehicle model and the vehicle body into a rigid rod. The vibration model in this paper considers only five degrees of freedom: the vehicle body's vertical vibration, the body's pitch-angle vibration, the seat system's vertical vibration, the front wheel's vertical vibration, and the rear wheel's vertical vibration (Figure
Vehicle vibration model with five degrees of freedom.
In Figure
The following is Lagrange’s second equation:
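In its standard form with a Rayleigh dissipation function, consistent with the kinetic energy $T$, potential energy $U$, and dissipated energy $D$ defined below, it reads

$$\frac{\mathrm{d}}{\mathrm{d}t}\left(\frac{\partial T}{\partial \dot{q}_j}\right) - \frac{\partial T}{\partial q_j} + \frac{\partial U}{\partial q_j} + \frac{\partial D}{\partial \dot{q}_j} = Q_j,$$

where the $q_j$ are the generalized coordinates (the five degrees of freedom above) and the $Q_j$ are the corresponding generalized forces.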
Because the pitch-angle vibration is small, the paper takes
The kinetic energy of the system is
The potential energy of the system is
The dissipated energy of the system is
Substituting Equations (
Vibration simulation model.
According to the vibration simulation model, we can conveniently obtain dynamic response data. The vibration model uses the filtered white noise method to produce a road roughness signal that meets the experiment's requirements and takes a level-B road as the road to be estimated.
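For concreteness, a minimal sketch of the filtered white noise road input follows. The cutoff spatial frequency $n_{00}$, the reference spatial frequency $n_0$, and the level-B roughness coefficient $G_q(n_0) = 64 \times 10^{-6}\,\mathrm{m}^3$ are common textbook values assumed here, not parameters quoted from the paper:

```python
import numpy as np

def road_roughness(v=20.0, Gq_n0=64e-6, T=10.0, dt=0.001,
                   n0=0.1, n00=0.011, seed=0):
    """Filtered white noise road model (standard form):
        dq/dt = -2*pi*n00*v*q(t) + 2*pi*n0*sqrt(Gq(n0)*v) * w(t)
    Returns the road elevation time history (m); defaults give a level-B road."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    q = np.zeros(n)
    for k in range(1, n):
        w = rng.standard_normal() / np.sqrt(dt)     # unit-intensity white noise
        dq = -2 * np.pi * n00 * v * q[k - 1] \
             + 2 * np.pi * n0 * np.sqrt(Gq_n0 * v) * w
        q[k] = q[k - 1] + dq * dt                    # explicit Euler step
    return q
```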
This example uses the improved AFSA for optimal feature selection. We number the vertical acceleration, vertical velocity, and vertical displacement at the measurement point, the center of the front wheel, the center of the rear wheel, and the center of the driver's seat as shown in Table
Numbers of dynamic responses.
Dynamic response | Measurement point | Front wheel | Rear wheel | Driver’s seat
---|---|---|---|---
Acceleration | 1 | 3 | 4 | 5
Velocity | 6 | 8 | 9 | 10
Displacement | 11 | 13 | 14 | 15
Numbers 2, 7, and 12 denote the pitch-angle acceleration, pitch-angle velocity, and pitch angle at the measurement point, respectively.
The example uses the improved AFSA and assumes information carried by the
In the iteration process, the improved AFSA finds the corresponding reasonable dimension length
The example uses the RBFNN for time-domain estimation of road roughness. We used the improved AFSA and conducted 12 total optimization experiments; results appear in Table
Optimization selection results of dynamic response features.
Test number | Features selected | Coefficient of determination | Distance
---|---|---|---
1 | [1, 3, 4, 7, 8, 12, 13] | 0.9959 | −0.5574 |
2 | [1, 3, 4, 12, 13, 15] | 0.9934 | 0.3557 |
3 | [2, 3, 4, 13] | 0.9939 | 0.9049 |
4 | [1, 4, 8, 9] | 0.9948 | −1.4350 |
5 | [2, 3, 4, 13] | 0.9944 | −0.0074 |
6 | [1, 3, 4, 8, 9, 14] | 0.9890 | −1.2500 |
7 | [2, 3, 4, 13] | 0.9944 | −0.5321 |
8 | [2, 3, 4, 6, 10, 12] | 0.9904 | −1.0486 |
9 | [1, 4, 8, 9] | 0.9932 | −1.4712 |
10 | [1, 3, 4, 9, 11, 13] | 0.9823 | −0.3755 |
11 | [2, 4, 7, 13, 14] | 0.9892 | −0.9510 |
12 | [3, 4, 8, 13, 14, 15] | 0.9649 | 0.4839 |
In the table above, Experiments 3 and 4 achieved high coefficients of determination using only four features each; the two figures below show the corresponding road roughness estimation results.
Road roughness estimation results with features in Experiment 3.
Road roughness estimation results with features in Experiment 4.
The actual values were close to the estimated values, and the coefficient of determination exceeded 0.99 in both tests, indicating that the proposed feature dimensionality reduction method can select a feature set with high contributions to road roughness containing only four features; thus, we can accurately estimate road roughness using only four dynamic response values. To further confirm this result, we used the random forest method to estimate road roughness based on the feature sets of Experiments 3 and 4. Estimation results are presented in Figures
Road roughness estimation results based on the random forest method and features in Experiment 3.
Road roughness estimation results based on the random forest method and features in Experiment 4.
Based on the random forest method, the coefficients of determination of the road roughness estimates obtained with the feature sets from Experiments 3 and 4 were 0.9489 and 0.9708, respectively. Although the estimation accuracy of the random forest method was slightly lower than that of the RBFNN, it was still high, suggesting that the dynamic response feature set selected with the proposed method also works well with other estimation methods. When the proposed feature selection method is combined with a prediction method, only a few dynamic response parameters are needed to estimate road roughness accurately.
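For readers who wish to reproduce this comparison, a minimal scikit-learn sketch follows; the arrays `X` (dynamic responses restricted to a selected feature set, e.g., features [2, 3, 4, 13] of Experiment 3) and `y` (road roughness samples) are assumed to come from the vibration simulation model, and the split ratio and forest size are illustrative:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def estimate_roughness(X, y):
    """Train a random forest on a selected dynamic response feature set and
    return the coefficient of determination on held-out data."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
    return r2_score(y_te, model.predict(X_te))
```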
The experimental subjects in the example consisted of four open microarray datasets: leukemia, colon cancer, SRBCT, and brain cancer. Leukemia [
Brain cancer [
Because each type of tumor disease is related to many genes, we roughly extracted genes using the information index to classification (IIC) [
The example uses the extreme learning machine (ELM) to recognize tumor subtypes and takes the mean and standard deviation of the 5-fold cross-validation (CV) accuracy over 100 experiments to verify the accuracy of the extracted genes.
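A minimal ELM sketch is given below for concreteness; the hidden-layer size and tanh activation are illustrative choices rather than the paper's settings:

```python
import numpy as np

class ELM:
    """Extreme learning machine sketch: a random hidden layer followed by a
    least-squares (Moore-Penrose) solution for the output weights."""
    def __init__(self, n_hidden=100, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):                      # y: integer class labels
        T = np.eye(int(y.max()) + 1)[y]       # one-hot targets
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = np.tanh(X @ self.W + self.b)      # random hidden-layer features
        self.beta = np.linalg.pinv(H) @ T     # output weights in closed form
        return self

    def predict(self, X):
        return (np.tanh(X @ self.W + self.b) @ self.beta).argmax(axis=1)
```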
To demonstrate the advantage of the proposed method in sample classification accuracy, we chose five gene-selection methods with high accuracy, namely BPSO-ELM (ELM with binary particle swarm optimization), K-means-PSO-ELM, K-means-BPSO-ELM, BPSO-GCSI-ELM, and K-means-GCSI-MBPSO-ELM, for comparison with the proposed method. Results of 100 independent repeated experiments were taken from [
Comparisons between the proposed feature dimensionality reduction method and five other methods. Entries are mean accuracy (%) ± std (%), with the number of selected genes in parentheses.
Method | Colon | Brain cancer | Lung cancer | Lymphoma
---|---|---|---|---
BPSO-ELM | 93.34 ± 1.99 (9) | 85.45 ± 2.33 (7) | 94.80 ± 0.72 (11) | 83.50 ± 2.72 (8) |
K-Means-PSO-ELM | 93.94 ± 1.17 (5) | 86.55 ± 2.35 (5) | 95.67 ± 0.72 (11) | 83.72 ± 2.33 (6) |
K-Means-BPSO-ELM | 93.50 ± 2.02 (9) | 87.23 ± 2.34 (8) | 95.64 ± 0.56 (12) | 85.14 ± 2.87 (6) |
BPSO-GCSI-ELM | 97.83 ± 1.33 (9) | 89.89 ± 2.23 (7) | 96.13 ± 0.42 (12) | — |
K-Means-GCSI-MBPSO-ELM | 97.61 ± 1.37 (6) | 88.63 ± 2.16 (3) | 97.10 ± 0.63 (11) | 86.97 ± 2.44 (8) |
Proposed method | 99.08 ± 1.23 (13) | 93.73 ± 2.64 (4) | 98.67 ± 0.43 (11) | 93.60 ± 2.33 (7) |
BPSO is a PSO variant that solves discrete optimization problems using binary-encoded particles [
Because other gene extraction methods have already demonstrated good performance on leukemia and SRBCT, this paper does not compare results for these two diseases. Tables
Gene sets selected using the proposed feature dimensionality reduction method with classification accuracy.
Dataset | Selected gene sets | Accuracy of test set (%) ± std | 5-fold CV accuracy (%) ± std |
---|---|---|---|
Leukemia | 4050, 2642, 2121 | 100 ± 0.00 | 100 ± 0.00 |
4050, 2642, 1882 | 100 ± 0.00 | 100 ± 0.00 | |
4050, 2642, 3258 | 100 ± 0.00 | 100 ± 0.00 | |
2335, 2642, 1843, 4050 | 100 ± 0.00 | 100 ± 0.00 | |
Brain cancer | 3041, 3052, 3692 | 94.36 ± 0.043 | 92.47 ± 0.021 |
3052, 973, 3041, 3692, 4796 | 92.00 ± 0.023 | 91.78 ± 0.046 | |
3041, 3052, 1014, 3692 | 92.42 ± 0.042 | 92.73 ± 0.026 | |
7129, 2881, 3052, 865, 1970, 2935, 4871 | 92.78 ± 0.012 | 91.88 ± 0.019 | |
Colon | 493, 792, 739, 513, 280, 1346, 1334, 1954, 1110, 1563, 223, 175, 1668 | 97.13 ± 0.028 | 99.08 ± 0.012 |
493, 792, 377, 1976, 280, 1039, 1334, 698, 1110, 1563, 175, 1668, 1617 | 98.18 ± 0.024 | 98.34 ± 0.009 | |
792, 1423, 14, 1976, 1909, 1110, 1589, 102, 107, 1916, 175, 1151, 251 | 93.73 ± 0.031 | 98.71 ± 0.013 | |
792, 14, 1976, 765, 1909, 1524, 1110, 175, 43, 53, 1293, 1740, 251 | 96.86 ± 0.033 | 99.05 ± 0.011 | |
SRBCT | 742, 1003, 1954, 430, 2050, 123 | 100 ± 0.00 | 100 ± 0.00 |
545, 1955, 1434, 509, 971, 255 | 100 ± 0.00 | 100 ± 0.00 | |
1003, 545, 1911, 153, 123, 1489, 2161 | 100 ± 0.00 | 100 ± 0.00 | |
1955, 2050, 545, 2144, 2045, 123, 1489 | 100 ± 0.00 | 100 ± 0.00 | |
Lung cancer | 1765, 2779, 2841, 1474, 2045, 3191, 2763, 2817, 525, 1552, 1630 | 100 ± 0.00 | 98.27 ± 0.004 |
525, 1493, 607, 2763, 792, 580, 867, 368, 3279, 2158, 1225 | 100 ± 0.00 | 98.39 ± 0.003 | |
1765, 883, 2763, 792, 580, 867, 985, 3279, 2988, 2045, 814 | 100 ± 0.00 | 98.67 ± 0.004 | |
1765, 525, 2763, 2841, 1474, 2583, 867, 985, 2045, 814, 918 | 100 ± 0.00 | 98.67 ± 0.00 | |
Lymphoma | 5279, 4862, 6965, 2374, 1855, 2060 | 93.45 ± 0.025 | 93.39 ± 0.018 |
2828, 2437, 152, 1855, 4998, 2416 | 97.55 ± 0.028 | 92.62 ± 0.019 | |
1855, 2828, 152, 2437, 80, 530, 1102 | 92.30 ± 0.041 | 92.36 ± 0.029 | |
5279, 4687, 4940, 5449, 1133, 1855, 4519 | 90.50 ± 0.031 | 93.60 ± 0.023 |
To compare the ELM classifier with other classifiers on disease classification, we selected the random forest classifier and the support vector machine (SVM) classifier to identify tumor diseases based on the gene sets in Table
Tumor disease recognition results of random forest classifier and SVM classifier.
Dataset | Gene sets selected | Random forest classifier (%) ± std | SVM classifier (%) ± std |
---|---|---|---|
Leukemia | 4050, 2642, 2121 | 98.06 ± 0.72 | 100 ± 0.00 |
4050, 2642, 1882 | 95.56 ± 1.10 | 99.44 ± 0.72 | |
4050, 2642, 3258 | 96.39 ± 1.34 | 98.75 ± 0.79 | |
2335, 2642, 1843, 4050 | 96.53 ± 1.18 | 97.50 ± 0.88 | |
Brain cancer | 3041, 3052, 3692 | 82.65 ± 3.15 | 65.48 ± 5.18 |
3052, 973, 3041, 3692, 4796 | 81.75 ± 3.52 | 72.75 ± 3.88 | |
3041, 3052, 1014, 3692 | 82.68 ± 3.28 | 75.83 ± 3.76 | |
7129, 2881, 3052, 865, 1970, 2935, 4871 | 82.20 ± 3.42 | 84.95 ± 3.28 | |
Colon | 493, 792, 739, 513, 280, 1346, 1334, 1954, 1110, 1563, 223, 175, 1668 | 85.32 ± 2.99 | 94.52 ± 2.18 |
493, 792, 377, 1976, 280, 1039, 1334, 698, 1110, 1563, 175, 1668, 1617 | 79.52 ± 0.038 | 90.16 ± 2.46 | |
792, 1423, 14, 1976, 1909, 1110, 1589, 102, 107, 1916, 175, 1151, 251 | 83.55 ± 5.03 | 93.55 ± 2.74 | |
792, 14, 1976, 765, 1909, 1524, 1110, 175, 43, 53, 1293, 1740, 251 | 85.65 ± 3.52 | 94.35 ± 2.77 | |
SRBCT | 742, 1003, 1954, 430, 2050, 123 | 94.12 ± 2.17 | 99.26 ± 0.52 |
545, 1955, 1434, 509, 971, 255 | 95.41 ± 1.78 | 98.94 ± 1.04 | |
1003, 545, 1911, 153, 123, 1489, 2161 | 94.72 ± 1.72 | 97.63 ± 1.15 | |
1955, 2050, 545, 2144, 2045, 123, 1489 | 94.78 ± 1.64 | 98.01 ± 1.10 | |
Lung cancer | 1765, 2779, 2841, 1474, 2045, 3191, 2763, 2817, 525, 1552, 1630 | 94.91 ± 0.91 | 98.74 ± 0.33 |
525, 1493, 607, 2763, 792, 580, 867, 368, 3279, 2158, 1225 | 94.26 ± 1.12 | 98.55 ± 0.48 | |
1765, 883, 2763, 792, 580, 867, 985, 3279, 2988, 2045, 814 | 94.83 ± 1.03 | 99.16 ± 0.31 | |
1765, 525, 2763, 2841, 1474, 2583, 867, 985, 2045, 814, 918 | 94.52 ± 1.00 | 99.18 ± 0.43 | |
Lymphoma | 5279, 4862, 6965, 2374, 1855, 2060 | 79.22 ± 3.61 | 83.71 ± 2.67 |
2828, 2437, 152, 1855, 4998, 2416 | 73.19 ± 4.06 | 78.98 ± 3.03 | |
1855, 2828, 152, 2437, 80, 530, 1102 | 72.76 ± 4.65 | 71.57 ± 3.63 | |
5279, 4687, 4940, 5449, 1133, 1855, 4519 | 73.24 ± 4.50 | 75.47 ± 3.56 |
In terms of tumor disease recognition accuracy based on the gene sets selected using the proposed method, the random forest and SVM classifiers generally performed worse than the ELM classifier. For lung cancer, however, the SVM classifier's average accuracy was slightly higher than the ELM classifier's, and its standard deviation was smaller.
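A minimal sketch of this classifier comparison with scikit-learn follows; the arrays `X_sub` (expression values of one selected gene set) and `y` (tumor subtype labels), as well as the default hyperparameters, are illustrative assumptions:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def compare_classifiers(X_sub, y):
    """Report 5-fold CV accuracy of the random forest and SVM classifiers
    on one selected gene set."""
    for name, clf in [("random forest", RandomForestClassifier(random_state=0)),
                      ("SVM", SVC(kernel="rbf"))]:
        scores = cross_val_score(clf, X_sub, y, cv=5)
        print(f"{name}: {scores.mean():.4f} ± {scores.std():.4f}")
```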
This paper proposes a new feature dimensionality reduction method based on a sampling survey method and a heuristic intelligent optimization algorithm. First, we build a feature-scoring system and a reduced-dimension length-scoring system based on the sampling survey method and select the top-ranked features and reasonable dimension lengths. Then, we select features with the improved heuristic intelligent optimization algorithm. The method has only two steps, with simple operation and low computational complexity. We successfully applied the proposed method to road roughness time-domain estimation; results show that as few as four dynamic responses suffice to estimate road roughness accurately, and the 12 feature dimensionality reduction experiments yielded a mean road roughness time-domain estimation accuracy of 0.9897. We also successfully applied the method to gene-selection research in bioengineering; the gene sets selected with the proposed method achieved high classification accuracy, with mean classification accuracy substantially higher than that of other gene-selection methods.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that there are no conflicts of interest regarding the publication of this paper.
This research was supported by the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX17_0647) and National Key R&D Program of China (2016YFD0701103).