Feature Selection of the Rich Model Based on the Correlation of Feature Components

Currently, the popular Rich Model steganalysis features usually contain a large number of redundant feature components, which may bring the “curse of dimensionality” and a large computation cost. Existing feature selection methods struggle to reduce the dimensionality effectively when many of the effective feature components are strongly correlated. This paper proposes a novel selection method for Rich Model steganalysis features. First, the separability of each feature component in the submodels of Rich Model is measured based on the Fisher criterion, and the feature components are sorted in descending order of separability. Second, the correlation coefficient between any two feature components in each submodel is calculated, and feature selection is performed according to the Fisher value of each component and the correlation coefficients. Finally, the selected submodels are combined as the final steganalysis feature. The results show that the proposed feature selection method can effectively reduce the dimensionalities of JPEG domain and spatial domain Rich Model steganalysis features without affecting the detection accuracies.


Introduction
Digital steganography is a technology that embeds information in the redundancy of digital images, audio, video, text, and so on to achieve the purpose of covert communication [1][2][3][4][5][6]. With the presentation of the HUGO steganography algorithm [7] in 2010, adaptive steganography based on the framework of "distortion function + STC coding" has become the mainstream of image steganography. Based on this framework, researchers have successively proposed a series of adaptive steganography algorithms with high antidetection performance, which render most traditional steganalysis algorithms ineffective [8][9][10][11][12]. In 2012, Fridrich and Kodovský proposed the Rich Model steganalysis feature [13], which effectively improved the detection performance for HUGO steganography. Since then, many features with thousands of dimensions have been successively proposed for steganalysis, such as PSRM (Projection Spatial Rich Model) features [14], PHARM (Phase-Aware Projection Rich Model) features [15], GFR (Gabor Filter Rich Model) features [16], and CC-JRM (Cartesian-Calibrated JPEG Rich Model) features [17]. These features may bring large computation and storage costs and even the "curse of dimensionality." To reduce the dimensionality of steganalysis features, researchers have carried out a series of works along two lines: feature transformation and feature selection.
Feature dimensionality reduction based on feature transformation generally maps the feature vector into another feature space so that the effective information concentrates in a few components of the transformed feature; the most effective of these components are then selected to reduce the dimensionality. For example, Qin et al. [18] used PCA (Principal Component Analysis) to obtain the main components of the feature and thereby reduce its dimension. However, the PCA method is not suitable for features that are not linearly distinguishable [19]. Wang et al. [20] performed a one-dimensional discrete Fourier transformation on the SRM steganalysis feature and then selected the spectral coefficients of the positive half axis as the feature vector, which effectively reduced the feature dimension. Boroumand and Fridrich [21] obtained a class of nonlinear transformations by approximating a symmetric positive semidefinite kernel function and then used them to reduce the dimensionality of the feature and improve the detection accuracy at the same time. However, the two methods [20][21] described above are only applicable to spatial domain steganography.
Dimensionality reduction based on feature selection selects, from the original feature vector, the part of the feature components that most effectively distinguishes the cover images from the stegoimages. For example, Xuan et al. [22] and Davidson and Jalan [23] used the Bhattacharyya distance and the Mahalanobis distance, respectively, to measure the distinguishability of feature components and then selected the feature components with the largest distances between the cover images and the stegoimages. Lu et al. [24] utilized an improved Fisher criterion to measure the importance of feature vectors and then selected the feature subvector with the highest Fisher value for detection. Zhang et al. [25] applied the method in [24] to the reduction of each submodel in SRM. Ma et al. [26] proposed a feature dimensionality reduction method based on decision rough set α-positive region reduction and selected the feature components that conform to the principle of nonreduction in the positive domain and the principle of independence. However, this method incurs a large computation cost in the process of positive domain reduction.
In short, the above methods can significantly reduce the dimensionalities of Rich Model steganalysis features. However, the existing feature selection methods do not take the correlation between feature components into consideration, so they cannot effectively remove feature components that are highly separable but strongly correlated with each other. To address this problem, a feature selection method based on the correlation of feature components is proposed. The experimental results show that the proposed selection method can reduce the dimensionalities of CC-JRM [17] and SRM [13] steganalysis features while maintaining the detection accuracies. The rest of the paper is organized as follows. Section 2 introduces the related knowledge involved in the method of this paper, while Section 3 details the feature selection method based on the correlation of feature components in the Rich Model steganalysis features. Section 4 gives the experimental results of this method, and the paper closes with a summary.

Pearson Correlation and Fisher Criterion.
In statistics, the Pearson correlation coefficient measures the degree of linear correlation between two variables $X_1$ and $X_2$, and its value lies between −1 and 1.
The Pearson correlation coefficient $r_{X_1,X_2}$ between two feature components is defined as

$$r_{X_1,X_2} = \frac{\operatorname{cov}(X_1, X_2)}{\sigma_{X_1}\sigma_{X_2}} = \frac{E\left[(X_1-\mu_1)(X_2-\mu_2)\right]}{\sigma_{X_1}\sigma_{X_2}}, \tag{1}$$

where $\operatorname{cov}(X_1, X_2)$ represents the covariance between the two variables $X_1$ and $X_2$, $\sigma_{X_1}$ and $\sigma_{X_2}$ represent the standard deviations of the two variables, and $\mu_1$ and $\mu_2$ represent their mean values. The larger the absolute value of the correlation coefficient $r_{X_1,X_2}$, the stronger the correlation between the two variables. Inspired by [7], Lu et al. [24] used a Fisher criterion function to measure the separability of a single feature component as follows:

$$J(d) = \frac{\left(\mu_d^C - \mu_d^S\right)^2}{\left(\sigma_d^C\right)^2 + \left(\sigma_d^S\right)^2}, \tag{2}$$

where $\mu_d^C$ and $\sigma_d^C$ denote the mean value and standard deviation of the d-th feature component over the cover samples, and $\mu_d^S$ and $\sigma_d^S$ denote the mean and standard deviation of the d-th feature component over the corresponding stegosamples. $(\mu_d^C - \mu_d^S)^2$ reflects the degree of dispersion between the d-th feature components computed from the cover and stegosamples. The larger this value is, the greater the difference between cover and stegosamples will be. $(\sigma_d^C)^2 + (\sigma_d^S)^2$ reflects the degree of intraclass aggregation of the d-th feature components computed from the cover and stegosamples. Accordingly, the smaller the value of $(\sigma_d^C)^2 + (\sigma_d^S)^2$ is, the smaller the intraclass difference will be. In steganalysis, a larger Fisher value means that the feature component contributes more to the detection of stegoimages.
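The two measures above can be illustrated with a minimal NumPy sketch. It assumes that features are stored as arrays with one row per image and one column per feature component; the function names `fisher_values` and `pearson` are hypothetical and are not taken from the paper. Assigning a Fisher value of 0 to components whose variance is zero in both classes is also an assumption made only to keep the sketch free of division by zero.

```python
import numpy as np

def fisher_values(cover, stego):
    """Fisher separability of every feature component (one column each)."""
    mu_c, mu_s = cover.mean(axis=0), stego.mean(axis=0)
    denom = cover.var(axis=0) + stego.var(axis=0)  # sum of intraclass variances
    out = np.zeros_like(denom, dtype=float)
    # Assumption: components with zero variance in both classes get value 0.
    np.divide((mu_c - mu_s) ** 2, denom, out=out, where=denom > 0)
    return out

def pearson(x1, x2):
    """Pearson correlation coefficient between two 1-D samples."""
    x1c = x1 - x1.mean()
    x2c = x2 - x2.mean()
    return float((x1c * x2c).sum() / np.sqrt((x1c ** 2).sum() * (x2c ** 2).sum()))
```

For whole feature matrices, `np.corrcoef(features, rowvar=False)` returns all pairwise coefficients at once.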

Rich Model.
Currently, the Rich Model steganalysis feature is still one of the most effective methods to detect adaptive steganography. This feature was proposed by Fridrich et al. in 2012 [13], and its extraction procedure is shown in Figure 1. First, a variety of linear and nonlinear high-pass filters are used to filter the image from diverse directions and angles, resulting in various types of residual images. Then, the fourth-order cooccurrence matrices are computed from each residual image along different directions. Owing to the symmetry of the cooccurrence matrix, some cooccurrence matrices are merged to form a new residual cooccurrence matrix. Finally, each newly formed cooccurrence matrix is regarded as a submodel feature, and all submodel features are combined into the final Rich Model steganalysis feature.
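To make the residual and cooccurrence steps concrete, the following is a deliberately simplified sketch, not the full extraction of [13]. It assumes a single first-order horizontal difference filter, a quantization step `q`, a truncation threshold `T`, and horizontal scanning only; the symmetry-based merging of cooccurrence matrices is omitted. The function name and default parameter values are illustrative assumptions.

```python
import numpy as np

def cooccurrence_4th_order(image, q=1.0, T=2):
    """Normalized horizontal 4th-order cooccurrence of a truncated residual."""
    img = image.astype(np.float64)
    # Linear high-pass filter: difference of horizontally adjacent pixels.
    residual = img[:, 1:] - img[:, :-1]
    # Quantize, truncate to {-T, ..., T}, then shift to {0, ..., 2T}.
    r = np.clip(np.round(residual / q), -T, T).astype(np.int64) + T
    base = 2 * T + 1
    # Encode every run of four adjacent residuals as a single bin index.
    idx = (r[:, :-3] * base ** 3 + r[:, 1:-2] * base ** 2
           + r[:, 2:-1] * base + r[:, 3:])
    counts = np.bincount(idx.ravel(), minlength=base ** 4).astype(np.float64)
    return counts / counts.sum()
```

With `T = 2` the sketch yields 5^4 = 625 bins before any merging, which illustrates how quickly the dimensionality grows once many filters and directions are combined.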

Feature Selection of Rich Model Based on Correlation
In this section, the principle and procedure of the feature selection method for the Rich Model steganalysis features are presented. Then, an algorithm is proposed to search for a proper correlation coefficient threshold.

The Method of Feature Selection.
Among the feature components of the high-dimensional Rich Model steganalysis feature, there are some whose variances are 0 in both cover and stegoimages. This situation is particularly common for feature components computed from the quantized DCT coefficients of JPEG images. These feature components contribute nothing to distinguishing cover images from stegoimages. Therefore, the feature components whose variances are 0 in both the cover samples and the stegosamples are eliminated first. The obtained features are $F^C$ and $F^S$, whose dimensions are $D$.
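This first filtering step can be sketched as follows, assuming the features are NumPy arrays with one row per image. The function name `drop_zero_variance` is hypothetical. A component is treated as having zero variance when its maximum equals its minimum, which avoids floating-point round-off in a computed variance.

```python
import numpy as np

def drop_zero_variance(cover, stego):
    """Remove components that are constant in both cover and stego samples."""
    const_cover = cover.max(axis=0) == cover.min(axis=0)
    const_stego = stego.max(axis=0) == stego.min(axis=0)
    keep = ~(const_cover & const_stego)
    # Also return the surviving column indices so that the selection can be
    # mapped back to the original feature layout.
    return cover[:, keep], stego[:, keep], np.flatnonzero(keep)
```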
Next, equation (1) is used to calculate the correlation coefficient matrices of the components in the cover image feature and the components in the stegoimage feature, respectively. The correlation coefficient matrices are denoted as follows:

$$R^C = \begin{pmatrix} r^C_{11} & \cdots & r^C_{1D} \\ \vdots & \ddots & \vdots \\ r^C_{D1} & \cdots & r^C_{DD} \end{pmatrix}, \qquad R^S = \begin{pmatrix} r^S_{11} & \cdots & r^S_{1D} \\ \vdots & \ddots & \vdots \\ r^S_{D1} & \cdots & r^S_{DD} \end{pmatrix}, \tag{3}$$

where $R^C$ and $R^S$ are both symmetric matrices of size $D \times D$, and the elements $r^C_{ij}$ and $r^S_{ij}$ represent the strength of the linear correlation between the i-th and the j-th feature components of the cover image feature and the stegoimage feature, respectively, with $-1 \le r^C_{ij}, r^S_{ij} \le 1$. When $r^C_{ij} = -1$ and $r^S_{ij} = -1$, the two feature components are fully negatively linearly correlated. Correspondingly, when $r^C_{ij} = 1$ and $r^S_{ij} = 1$, the two feature components are fully positively linearly correlated.
Finally, since $R^C$ and $R^S$ are symmetric matrices, it is only necessary to traverse the elements below (or above) the main diagonal in order. Each element is then checked against equation (4), where $T$ is a preset correlation coefficient threshold. According to equation (4), the selection rules of feature components are provided.
(1) When $r^C_{ij} > T$ and $r^S_{ij} > T$, there is a strong positive linear correlation between $f_i$ and $f_j$; feature component $f_j$ is retained, and feature component $f_i$ is removed.
Based on the above rules, the main procedure of the proposed method is shown in Figure 2. Taking the SRM [13] as an example, first, the high-dimensional SRM steganalysis feature is divided into 106 submodel features. Then, the feature components whose variance is 0 in both cover and stegoimages are removed from each submodel. Next, the feature components of each submodel are sorted according to the Fisher value obtained by equation (2). After that, the effective feature components are selected based on the proposed feature selection rules. Finally, the feature components selected from each submodel are combined into a final feature for steganalysis.
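The per-submodel procedure can be sketched as below. This is a greedy illustration under stated assumptions, not a reproduction of the authors' implementation: it applies only the positive-correlation rule (1) given above, compares each component against the already retained, higher-ranked components, and assumes that zero-variance components have been removed beforehand so that neither the Fisher value nor the correlation matrix contains undefined entries. The function name `select_by_correlation` is hypothetical.

```python
import numpy as np

def select_by_correlation(cover, stego, T):
    """Return the original column indices of the retained components."""
    # Fisher value of every component, then sort in descending order.
    fisher = ((cover.mean(axis=0) - stego.mean(axis=0)) ** 2
              / (cover.var(axis=0) + stego.var(axis=0)))
    order = np.argsort(-fisher, kind="stable")
    # Correlation matrices of the sorted cover and stego features.
    R_c = np.corrcoef(cover[:, order], rowvar=False)
    R_s = np.corrcoef(stego[:, order], rowvar=False)
    retained = []
    for i in range(len(order)):
        # Component i is redundant if it is strongly positively correlated,
        # in both cover and stego features, with a retained component j
        # that has a higher Fisher value.
        redundant = any(R_c[i, j] > T and R_s[i, j] > T for j in retained)
        if not redundant:
            retained.append(i)
    return order[retained]
```

Running this on each submodel separately and concatenating the retained columns gives the combined feature described in the last step of the procedure.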

Selection Algorithm of Correlation Coefficient Threshold T.
In the feature selection process, the correlation coefficient threshold T is of critical significance. When the value of T is larger, the dimensionality of the selected feature is smaller, and vice versa. The selection of the threshold T should follow the idea that the selected feature components retain the diversity of the original feature as much as possible while the redundant feature components are removed as much as possible. In addition, to ensure that the search process is finite, the number of significant digits of T is set to 5. Algorithm 1 describes the steps for selecting the proper T.

Experimental Results and Analysis
In this section, the proposed feature selection method was tested on the typical JPEG image steganalysis feature CC-JRM and the spatial image steganalysis feature SRM for the steganography algorithms J-UNIWARD and S-UNIWARD. [13] were obtained. On the other hand, 10,000 grayscale images were compressed by the JPEG standard with a quality factor of 75 to generate the cover JPEG images. The 10,000 cover JPEG images were used to generate stegoimages by J-UNIWARD with payloads of 0.1, 0.2, 0.3, 0.4, and 0.5. Then, six groups of CC-JRM steganalysis features (22510 dimensions) [17] were extracted from the cover images and stegoimages.
In the experiments, an ensemble classifier [27] was trained. For each payload, cover images and stegoimages were divided into two groups; one group was used for training and contained 5000 randomly selected cover images and the 5000 corresponding stegoimages. The remaining images composed the testing group. The performance of steganalysis is evaluated by $\bar{P}_E$, which is the mean value of $P_E$ over 10 tests. $P_E$ is computed from the false alarm and missed detection rates as follows:

$$P_E = \frac{1}{2}\left(P_{FA} + P_{MD}\right), \tag{5}$$

where $P_{FA}$ represents the false alarm rate, namely, the probability of judging a cover image as a stegoimage, and $P_{MD}$ represents the missed detection rate, namely, the probability of judging a stegoimage as a cover image. In addition, the average detection accuracy $P_A = 1 - \bar{P}_E$ was used to reflect the detection performance more intuitively. The larger $P_A$ is, the better the performance of steganalysis is.
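The evaluation measure can be illustrated with a short sketch. It assumes that $P_E$ is the plain average of the false alarm and missed detection rates at the classifier's decision threshold, and that labels and predictions are given as 0/1 arrays with 1 meaning stego; the function names are hypothetical.

```python
import numpy as np

def detection_error(is_stego, predicted_stego):
    """P_E for one test split, from ground truth and classifier decisions."""
    is_stego = np.asarray(is_stego, dtype=bool)
    predicted_stego = np.asarray(predicted_stego, dtype=bool)
    p_fa = np.mean(predicted_stego[~is_stego])   # covers judged as stego
    p_md = np.mean(~predicted_stego[is_stego])   # stegos judged as cover
    return 0.5 * (p_fa + p_md)

def average_accuracy(errors):
    """P_A = 1 - mean(P_E) over repeated random splits."""
    return 1.0 - float(np.mean(errors))
```

In the setting described above, `errors` would hold the 10 values of $P_E$ obtained from the 10 random training and testing splits.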

Effectiveness Analysis of Feature Selection.
This section explores the relationship between the dimension of the selected feature and the detection performance in more detail.
The MMD (Maximum Mean Discrepancy) mentioned in [28] is used to measure the similarity between the feature distributions of cover images and stegoimages, so as to describe the classification performance of the selected feature. Taking samples X and Y as an example, the formula of MMD is as follows:

$$\mathrm{MMD}(X, Y) = \left[\frac{1}{m^2}\sum_{i,j=1}^{m} k(x_i, x_j) - \frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n} k(x_i, y_j) + \frac{1}{n^2}\sum_{i,j=1}^{n} k(y_i, y_j)\right]^{1/2}, \tag{6}$$

where $x_i$ represents the feature extracted from the i-th cover image, $y_i$ represents the feature extracted from the i-th stegoimage, $m$ and $n$ represent the numbers of cover images and stegoimages, respectively, and $k(\cdot, \cdot)$ is the radial basis function (RBF). When the MMD value between X and Y is smaller, the feature distributions of cover images and stegoimages are more similar, and the classification performance is worse, and vice versa. Figure 3 shows the scatter plot of the MMD and detection accuracy for different numbers of feature components selected from the Ah_T3 submodel feature in the CC-JRM steganalysis feature [17] by the proposed selection method.
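A minimal sketch of an MMD computation is given below. It assumes the standard biased empirical estimator with an RBF kernel $k(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$; the bandwidth parameter `gamma` and the function names are illustrative assumptions, and the exact estimator and kernel width used in the experiments may differ.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Matrix of exp(-gamma * ||a - b||^2) for all rows a of A and b of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip round-off negatives

def mmd(X, Y, gamma=1.0):
    """Biased empirical MMD between samples X (m x d) and Y (n x d)."""
    m, n = len(X), len(Y)
    val = (rbf_kernel(X, X, gamma).sum() / m ** 2
           - 2 * rbf_kernel(X, Y, gamma).sum() / (m * n)
           + rbf_kernel(Y, Y, gamma).sum() / n ** 2)
    return float(np.sqrt(max(val, 0.0)))
```

Here `X` would hold the selected feature components of the cover images and `Y` those of the stegoimages, so that the MMD can be plotted against the number of selected components.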
The results were computed on the cover JPEG images and their J-UNIWARD stegoversions with a payload of 0.5. The red "▲" represents the MMD values of the selected feature components, and the blue "•" represents the corresponding detection accuracy. It can be seen that as the number of selected feature components increases, the MMD value and detection accuracy also increase. Once the number of selected feature components reaches a certain range, the MMD value and detection accuracy level off, which means that the subsequent addition of strongly correlated feature components does not bring much improvement to either. Figure 4 shows the corresponding scatter plot of the MMD value and detection accuracy for different numbers of feature components selected from the s1_minmax22v_q1 submodel feature in the SRM steganalysis feature [13], where the stegoimages were generated by the S-UNIWARD algorithm with a payload of 0.5.
Additionally, we selected feature components from the Dix2_T2 submodel of the CC-JRM steganalysis feature [17] and the s1-minmax22h_q1 submodel of the SRM steganalysis feature [13] and then compared the performance of the selected features with that of randomly selected features for different payloads. To obtain suitable selected features, a proper T was searched by Algorithm 1. The experimental results are shown in Tables 1 and 2. It can be seen that the proposed feature selection method nearly maintains the original detection accuracy, and its detection accuracy is higher than that of the randomly selected feature in every case. This can be attributed to two reasons. On the one hand, strongly correlated feature components are redundant, so removing them does not affect the detection performance. On the other hand, the Fisher criterion ensures that the retained feature components are effective rather than randomly chosen.

Performance Test of the Feature Selection Method for CC-JRM Steganalysis Features.
The CC-JRM steganalysis feature proposed by Fridrich et al. is a typical Rich Model steganalysis feature for JPEG image steganography, which can effectively detect the common JPEG image steganography algorithms, such as J-UNIWARD [29], UED [30], and nsF5 [31]. Under different thresholds T, the proposed method was used to select feature components from CC-JRM features of cover and stegoimages with a payload of 0.1. Table 3 shows the number and detection accuracy of the selected feature components.
It can be seen from Table 3 that the detection accuracy of the original 22510-D CC-JRM in steganalysis is 53.06%. The optimal detection accuracy of the selected features reaches 53.48%, which is slightly higher than that of the original features. Meanwhile, the number of these selected feature components is 4436, which is only about 19.7% of the original 22510 dimensions. The proposed method can effectively reduce the feature dimension while maintaining the detection accuracy of steganalysis. Figure 5 shows the detection accuracy of CC-JRM before and after feature selection with a payload of 0.1. The horizontal axis represents the number of feature components, and the vertical axis represents the detection accuracy. The blue "▲" represents the detection accuracy of the most suitable selected feature. The green "▼" represents the detection accuracy of the original feature. It can be inferred from the figure that when too few feature components are retained, the detection accuracy is unsatisfactory because some valid features are not selected. Conversely, when too many feature components are retained, some redundant and even harmful feature components may be included. Moreover, retaining too many redundant feature components increases the time and space complexity of training and classification. As the number of feature components gradually approaches the original feature dimension, the detection accuracy of steganalysis does not improve significantly.
To test the performance of the method under other payloads, experiments were also conducted with payloads of 0.2, 0.3, 0.4, and 0.5, and the results are shown in Figure 6. As in Figure 5, the horizontal axis represents the number of feature components, and the vertical axis represents detection accuracy. From Figures 6(b) to 6(d), it can be seen that the dimensionality of the steganalysis feature can be effectively reduced while the detection accuracy is not significantly affected, and the difference between the detection accuracies before and after selection is within 0.4%. In Figure 6(a), it can be seen that the detection accuracy is even improved in the case of low payload.
The feature selection method proposed in this paper is also compared with those of Zhang et al. [25] and Ma et al. [26]. The specific results are shown in Figures 7 and 8. The four different colors in the figures represent the experimental results of the different feature selection methods. The detection accuracies of the three selection methods are not much different from those of the original features. However, the dimensionality of the feature selected by the proposed method is significantly lower than those of the other methods. The reason is that the proposed method eliminates redundant feature components that are strongly correlated with the retained feature components, whereas the other two methods only select effective feature components and do not consider the redundancy between feature components. On the other hand, Ma's method selects the optimal feature components from the overall features and maximizes the retention of beneficial feature components in the original features. Therefore, although the dimensionality reduction of Ma's method is not as good as that of the proposed method and Zhang's method, the detection performance of the features selected by Ma's method is slightly better than theirs.

Performance Test of the Feature Selection Method for SRM Steganalysis Features.
The previous section tested the effectiveness of the proposed method in the JPEG domain. To explore the generalization ability of the method, this section tests its effectiveness in the spatial domain. As in the JPEG domain, this section sets up experimental groups with payloads of 0.1, 0.2, 0.3, 0.4, and 0.5 (bpp), respectively, and the experimental results are shown in Figure 9.
As in the previous section, Figures 9(a) to 9(e) provide the detection accuracy of SRM before and after feature selection with payloads of 0.1, 0.2, 0.3, 0.4, and 0.5, respectively. The horizontal axis represents the feature number, and the vertical axis represents detection accuracy. The blue "▲" represents the detection accuracy of the most suitable feature after selection, and the green "▼" represents the detection accuracy of the original feature. It can be seen from Figure 9 that the detection accuracy of steganalysis increases gradually with the feature number. When the feature number increases to a certain value, the detection accuracy of steganalysis stabilizes. Experiments show that the difference in detection accuracy between the feature selected by the proposed method and the original feature is within 0.3%. Moreover, the dimensionality of the selected feature is far lower than that of the original feature (less than 50%). Based on the above analysis, the proposed method is still effective in the spatial domain.
In addition, the comparison results between the proposed method and the methods proposed by Zhang et al. [25] and Ma et al. [26] are approximately the same as in the JPEG domain. The experimental results are shown in Figure 10.
It can be seen that the dimensionalities of selected features by the proposed method are obviously lower than those of selected features by the other two methods. In the case of low payloads, the comparison is particularly conspicuous.
In summary, the feature selection method proposed in this paper can effectively reduce both the JPEG domain feature dimension and the spatial domain feature dimension. Moreover, for low payloads of JPEG image steganography, the selected JPEG domain feature shows a certain improvement in detection accuracy compared with the original feature. In addition, the dimensionality reduction of the JPEG domain feature is greater than that of the spatial steganalysis feature. The main reason is that the JPEG domain Rich Model steganalysis feature has a stronger linear correlation.

Conclusion
At present, the Rich Model steganalysis features achieve favorable detection results for adaptive steganography; however, they suffer from high dimensionality and slow training. To reduce the computation cost of training brought by the high-dimensional Rich Model steganalysis features, a new Rich Model steganalysis feature selection method is proposed based on the correlation of feature components. The experimental results show that the proposed method can effectively reduce the dimensionalities of features while maintaining the detection accuracy of steganalysis.
Although the method in this paper has taken the correlation between feature components into consideration, the positive role of the complementarity between feature components in steganalysis is still worthy of further study. In the next step, we will also try to combine the proposed method with deep learning in steganalysis [32] and even extend it to other fields, such as visual search [33].
Data Availability
The cover images used in this manuscript were downloaded from http://agents.fel.cvut.cz/stegodata/.