Visual Security Assessment via Saliency-Weighted Structure and Orientation Similarity for Selective Encrypted Images

Selective encryption has been widely used in image privacy protection. Visual security assessment is necessary to establish the effectiveness and practicability of image encryption methods, and a series of studies have addressed it. However, these methods do not take perceptual factors into account. In this paper, we propose a new visual security assessment (VSA) based on saliency-weighted structure and orientation similarity. Considering that human visual perception is sensitive to the characteristics of selective encrypted images, we extract structure and orientation feature maps and perform similarity measurements on them to generate the structure and orientation similarity maps. Next, we compute the saliency map of the original image. Then, a simple saliency-based pooling strategy combines these measurements to generate the final visual security score. Extensive experiments on two public encryption databases demonstrate the superiority and robustness of the proposed VSA compared with the most advanced existing work.


Introduction
Nowadays, with the pervasive use of interaction devices such as cameras and cloud storage devices, and with the explosive growth of digital images, privacy protection has attracted a lot of attention from researchers [1][2][3][4][5]. Various security schemes, such as digital watermarking [6][7][8], steganography [9], and encryption [10], have been developed to protect copyright, and encryption is the most widely accepted approach because it can ensure the security and integrity of data at all times. Roughly, the existing image encryption methods can be divided into two categories: full encryption and selective encryption. Full encryption refers to encrypting the entire image; therefore, we cannot get any information about the original image from the encrypted image. However, the content information of the image cannot be revealed even if several kinds of redundant information are left unencrypted. For this reason, traditional full encryption methods such as AES are not well suited to image data, because they always encrypt all the information in the image, which costs a lot of time.
Therefore, researchers proposed selective encryption algorithms, which have been widely used to protect the visual content of multimedia by encrypting only specified parts of the multimedia data. A great variety of selective encryption algorithms [10][11][12][13][14][15][16] have been proposed in recent decades. Compared with full encryption algorithms, selective encryption has two main advantages. First, encryption and decryption can be extremely fast because only a portion of the data needs to be encrypted. Second, selectively encrypted multimedia data can prevent the abuse of the essential visual property of the original data. These advantages make selective encryption highly desirable for protecting the growing volume of image and video data on the network, which contains a large amount of personal private information. The purpose of security analysis for selective image encryption is to measure the degree of visual security of selective encrypted images. Visual security analysis can measure the performance of selective encryption methods and thus help us optimize them. Since humans are the ultimate receivers of images, subjective tests conducted by human viewers are the most suitable and accurate way to evaluate the visual security of selective encrypted images. However, such tests are too time-consuming and laborious for real-time applications. For this reason, visual security assessment [17] (VSA) is proposed to evaluate the visual security of selective encrypted images by automatically measuring the unintelligibility or unrecognizability of the image, which indicates the amount of useful information about the original image that an attacker can obtain from its selective encrypted image via visual perception. The higher the degree of unrecognizability of the selective encrypted image, the less visual information the attacker can obtain.
In such a situation, it becomes more difficult for an attacker to obtain information about the original image and the selective encryption method is more secure.
In the past decades, many efforts have been made to design VSAs. At first, researchers believed that the visual security of images was strongly related to image quality. Therefore, they directly used well-known image quality assessment [18] (IQA) methods, such as peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [19], and visual information fidelity (VIF) [20], to evaluate the visual security degree of selective encrypted images, on the premise that images with lower quality tend to have higher security. However, these metrics may be inconsistent with the concept of security strength. For example, an image with a lower PSNR value may be even more recognizable than one with a higher PSNR value. These IQA methods do not take full account of the characteristics of selective encrypted images. For selective encrypted images, an important feature is that the skeleton of the image is still intelligible but the details are almost unintelligible [21]. The structure information can express the skeleton of an image and therefore plays a more important role in selective encrypted images. Subsequently, several VSAs have been developed based on visual features of selective encrypted images, e.g., the edge similarity score (ESS) [22] based on edges, the luminance similarity score (LSS) [22] utilizing the luminance feature, local feature-based visual security (LFBVS) [23] using luminance and localized gradient, and the visual security index based on Canny (VSI-Canny) [21], which extracts edge and texture features. However, these VSAs do not fully consider the role of visual perception [24][25][26], because according to the principles of the human visual system (HVS), the visual perception of each region differs from that of the others, and regions therefore have different impacts on visual security evaluation. Additionally, the HVS exhibits an obvious visual saliency mechanism.
The HVS focuses only on the important regions for detailed perception and disregards the other regions. Regions with high saliency values play a more important role in visual perception than the other regions, and information leakage in high-saliency regions has a larger influence on the visual security assessment.
Motivated by the problems mentioned above, in this paper, we propose a visual security assessment via saliency-weighted structure and orientation similarity. Structure is the basic element that conveys important visual information, and selective encryption can cause obvious structure changes in an image [21]. Therefore, we can measure the visual security of selective encrypted images by the change of structure. The gradient magnitude (GM) and the phase congruency [27] (PC) are widely used to extract image structure information. However, neither GM nor PC alone can effectively reflect the structure degradation in selective encrypted images. GM is sensitive to luminance and reflects changes in image luminance well [28]. However, this characteristic also makes GM ineffective at extracting the structure information of areas with similar grayscale values. Compared with GM, PC is not affected by luminance [27]. However, PC cannot extract clear structure information from areas with similar frequencies because it is calculated in the frequency domain [27]. Therefore, we integrate PC with GM to obtain the structure features of selective encrypted images. Studies show that the HVS is highly adapted to extracting orientation information [29], and selective encryption can cause obvious orientation changes in an image. Therefore, we can extract the orientation information of a selective encrypted image to measure its security. The structure and orientation feature maps are extracted from both the original and the selective encrypted images. Finally, an image saliency-based pooling strategy is introduced to combine these measurements and generate a visual security score.
Our main contributions can be summarized as follows:
(1) We propose to extract structure and orientation features for the visual security evaluation of selective encrypted images, because selective encryption can cause obvious changes in the structure and orientation of an image and the HVS is highly sensitive to such changes. We combine GM and PC to extract the structure information used to measure the structure similarity of original and selective encrypted images, and we utilize the change of image orientation to measure the orientation similarity.
(2) Considering that different regions of an image have different effects on the visual security assessment of selective encrypted images, we combine the saliency map with the structure and orientation similarity maps to generate the final VSA.
(3) We conduct comparative experiments on two common encryption image databases to evaluate the performance of our proposed VSA. The experimental results show that the proposed method achieves superior and robust performance compared with other state-of-the-art VSAs, especially on images in the low- and moderate-quality ranges.
The rest of the paper is structured as follows. Section 2 reviews the related work. Section 3 gives the details of our proposed VSA method. Section 4 describes the experimental evaluation of our proposed VSA and existing VSAs. Finally, Section 5 concludes this paper.

Related Work
A variety of methods have been proposed to estimate the visual security of selective encrypted images. The initial solutions usually employ well-known IQAs to evaluate the visual security. Subsequently, several VSAs have been proposed to evaluate the visual security.

Image Quality Assessment.
Many researchers believe that images with higher visual security tend to have lower visual quality, so many IQA methods designed for the assessment of image visual quality have been employed to measure the visual security of selective encrypted images. For instance, PSNR, which evaluates visual security by calculating the Euclidean distance between the original image and the distorted image, is the simplest and most widely used metric [30,31]. SSIM [19] is also adopted for visual security evaluation [30,31]; it measures the similarities of luminance, contrast, and structure between two images in consideration of the HVS. VIF [20] is another IQA method used to estimate the visual security of selective encrypted images. It measures the amount of information contained in the original and selective encrypted images, respectively, and then relates image information to visual quality. However, these IQA metrics often exhibit unsatisfactory performance when they are used to estimate the visual security of low-quality selective encrypted images. Since the task of IQA is inconsistent with that of VSA, poor visual quality does not necessarily indicate high visual security [21].
For example, an image with a higher VIF, PSNR, or SSIM may even be more visually secure than one with a lower value of one of these indices. Figures 1(a)-1(c) show the performance of the PSNR, VIF, and SSIM indices on several images from the PEID database [33]. Figure 1(c) has higher visual security, but this image is also rated as having better visual quality by PSNR, VIF, and SSIM.
We can find that many IQAs cannot achieve excellent performance on visual security assessment because the targets of image quality assessment and visual security assessment are different: image quality assessment focuses on the fidelity of an image, but visual security assessment is concerned with the leakage degree of an image's content.

Visual Security Assessment.
Several VSAs have been proposed to evaluate the visual security of selective encrypted images. They are usually more accurate and effective than IQA methods because they are specifically designed for the visual security evaluation of selective encrypted images. Mao and Wu [22] proposed the ESS and LSS to compute the edge similarity and the luminance similarity between original and selective encrypted images. However, the ESS and LSS focus only on local information of the images, which may not cover the various types of distortions that appear in selective encrypted images. Tong et al. [23] presented the LFBVS, which considers various types of distortions present in selective encrypted images and measures the similarities of luminance and localized gradient between original and selective encrypted images. Although the LFBVS utilizes more visual information than the ESS and LSS, its performance is still unsatisfactory when tested on various encrypted image databases. Xiang et al. [21] proposed the VSI-Canny, which calculates the edge and texture similarities between original and selective encrypted images. VSI-Canny considers more visual features of selective encrypted images and performs relatively well, but it does not consider the image's visual saliency, which is a critical property of the HVS. As discussed above, the existing visual security metrics have problems in several respects, and these problems lead to inaccurate evaluation of image security. We consider and address these issues in our proposed scheme, as described in the following section.

Proposed Visual Security Assessment
In this section, we describe our proposed VSA; its flowchart is shown in Figure 2. First, we combine GM and PC to extract the structure information and compute the structure similarity map of the original and selective encrypted images. Second, based on the fact that the HVS is sensitive to changes of orientation, we extract the orientation information from the gradients and compute the orientation similarity map. Next, considering that the security of a selective encrypted image depends on the degree of disclosure of its visual content, which is obtained by comparing it with the original image, we compute the saliency map of the original image only. Finally, the structure and orientation similarity maps are fused by a saliency-based pooling method to obtain the final score.

Structure Similarity.
The structure of an image carries important information to which visual perception is highly sensitive. Both GM and PC can extract structural information from images, and we found that they complement each other. Therefore, we integrate the GM map with the PC map to generate the structure features of selective encrypted images.

Gradient Magnitude.
Image gradient magnitude can be defined as a transition in intensity. The GM of an image is derived from a vector that consists of the gradients in the horizontal and vertical directions at each pixel, and it reflects the maximum strength of structure variation. The gradient magnitude of an image is defined as
$$GM(i, j) = \sqrt{G_h^2(i, j) + G_v^2(i, j)}, \tag{1}$$
where $(i, j)$ is the pixel index of an image $I$. In this work, for image $I$, $G_h$ and $G_v$ are calculated as
$$G_h = F * I, \qquad G_v = F^T * I, \tag{2}$$
where $*$ and $T$ denote the convolution and transpose, respectively, and $F$ is the gradient operator defined in equation (3).
As shown in Figure 3(b), the GM maps of the encrypted images change noticeably. However, there are no obvious changes in the GM values in some areas with similar grayscale values. GM is sensitive to luminance; therefore, it reflects changes in image luminance well [28]. However, this characteristic also makes GM ineffective at extracting the structure information of areas with similar grayscale values.
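As a rough illustration of the gradient-magnitude step, the following Python sketch convolves an image with a kernel and with its transpose and then takes the per-pixel Euclidean norm of the two responses. The Prewitt-style kernel, the zero padding at the borders, and all function names are assumptions made for this example and are not taken from the paper; plain nested lists are used only to keep the sketch free of dependencies.

```python
import math

# Assumed 3x3 Prewitt-style kernel; the operator F used in the paper may differ.
F = [[1 / 3, 0.0, -1 / 3],
     [1 / 3, 0.0, -1 / 3],
     [1 / 3, 0.0, -1 / 3]]


def transpose(kernel):
    """Return the transpose of a square kernel."""
    return [list(row) for row in zip(*kernel)]


def convolve2d(image, kernel):
    """Same-size 2D convolution with zero padding (the kernel is flipped)."""
    h, w = len(image), len(image[0])
    k = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    y, x = i - di, j - dj  # flipped indices: true convolution
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di + k][dj + k] * image[y][x]
            out[i][j] = acc
    return out


def gradient_magnitude(image):
    """Return (GM, G_h, G_v) with G_h = F * I, G_v = F^T * I, GM = hypot(G_h, G_v)."""
    g_h = convolve2d(image, F)
    g_v = convolve2d(image, transpose(F))
    gm = [[math.hypot(a, b) for a, b in zip(rh, rv)] for rh, rv in zip(g_h, g_v)]
    return gm, g_h, g_v
```

On a flat image the interior response is zero, while a vertical step edge produces a purely horizontal gradient, which matches the observation that GM does not respond in areas with similar grayscale values.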

Phase Congruency.
The phase congruency model [27], which is based on frequency-domain processing of an image, postulates that features such as edges are perceived at points where the frequency components are maximally in phase. It assumes that the visual system is more competent at operating on the phase and amplitude of the individual frequency components of an image than at handling image information spatially. Compared with GM, PC is invariant to local smooth luminance changes. Given an image $I$, its PC is computed following [27] (equations (4) and (5)), where $W(i, j)$ denotes the weighting function that reduces the effect of frequency spread at position $(i, j)$; $N$ is the number of scales; $c$ is a cutoff value below which low PC values are penalized; $l_{norm}(i, j)$ is the normalized luminance at $(i, j)$, used to avoid the effect of luminance; a gain variable controls the sharpness of the cutoff; and the symbol $\lfloor \cdot \rfloor$ sets negative values to zero. To determine the two-dimensional phase congruency of a given image, the image is first convolved with a bank of Log-Gabor filters, where $s$ and $\theta$ are the scale and orientation of the Log-Gabor filter. The even-symmetric and odd-symmetric filters at scale $s$ and orientation $\theta$ are $M^E_{s\theta}$ and $M^O_{s\theta}$, respectively. $A_{s\theta}(i, j)$ and $\varphi_{s\theta}(i, j)$ represent the amplitude and phase at position $(i, j)$, respectively; $T$ is a quantity introduced to compensate for image noise; $\varepsilon$ is a small positive constant that preserves stability; and $\bar{\varphi}_{s\theta}(i, j)$ represents the mean value of the phase. Since investigating the influence of these parameters on the PC map is out of scope, in this study we set them directly according to [27]. The effectiveness of PC is demonstrated in Figure 3(c): the PC maps of the encrypted images change noticeably. However, there are no obvious changes in the PC values in some areas with similar frequencies. Compared with GM, PC is not affected by luminance [27].
However, PC cannot extract the clear structure information of the areas with similar frequencies as it is calculated based on frequency [28].

Structure Map Integrating GM with PC.
As mentioned above, neither GM nor PC alone can effectively reflect the structure degradation in selective encrypted images. Therefore, after extracting the GM and PC of the image, we propose to build the structure map of the image by integrating GM with PC, and the structure information $ST$ of the image is obtained as
$$ST(i, j) = \max\left( \frac{GM(i, j)}{GM_{\max}},\ PC(i, j) \right), \tag{6}$$
where $(i, j)$ is the index of the pixel and $GM_{\max}$ represents the maximum value of GM. The maximum of the values at corresponding positions of GM and PC forms $ST$, and GM is normalized by its maximum value. If either of the two values is large, the pixel is considered to be a structural feature point, so the maximum fusion strategy can comprehensively extract the structural features of the image. GM and PC thus play complementary roles in extracting structural information. Figure 3 gives an example that demonstrates the effectiveness of our structure extraction method; it shows different feature maps from the reference images and their selective encrypted images. Compared with the other feature maps, the proposed structure map reflects the structure degradation of a selective encrypted image markedly better. From Figure 3(d), we can observe more accurate and clearer structure information in the selective encrypted image. Analogous to the practice in [19,34], the structure similarity map $S_{ST}$ of the original and the encrypted images can be measured as
$$S_{ST}(i, j) = \frac{2 S_O(i, j) S_E(i, j) + R}{S_O^2(i, j) + S_E^2(i, j) + R}, \tag{7}$$
where $S_{ST}(i, j) \in (0, 1]$, $S_O(i, j)$ and $S_E(i, j)$ are the structure maps of the original image $O$ and the encrypted image $E$, respectively, and $R$ is a positive constant used to avoid instability when the denominator approaches zero.
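The maximum-fusion rule and the similarity measurement described above can be sketched in a few lines of Python. This is only a minimal illustration: it assumes that a PC map with values in [0, 1] has already been computed by some external routine, it uses an SSIM-style ratio for the similarity map, and the default value of the stabilizing constant `r` as well as the function names are placeholders rather than values from the paper.

```python
def structure_map(gm, pc):
    """Fuse GM and PC per pixel: max(GM / GM_max, PC). PC is assumed to lie in [0, 1]."""
    gm_max = max(max(row) for row in gm)
    scale = gm_max if gm_max > 0 else 1.0  # guard for a completely flat image
    return [[max(g / scale, p) for g, p in zip(rg, rp)] for rg, rp in zip(gm, pc)]


def similarity_map(a, b, r=0.01):
    """Pixel-wise (2ab + r) / (a^2 + b^2 + r); r = 0.01 is a placeholder constant."""
    return [[(2 * x * y + r) / (x * x + y * y + r) for x, y in zip(ra, rb)]
            for ra, rb in zip(a, b)]


# Toy 2x2 example: the fused map keeps whichever cue is stronger at each pixel.
gm = [[0.0, 2.0], [4.0, 1.0]]
pc = [[0.3, 0.2], [0.1, 0.9]]
st = structure_map(gm, pc)  # [[0.3, 0.5], [1.0, 0.9]]
```

Identical structure maps give a similarity of exactly 1 at every pixel, and the constant `r` keeps the ratio defined where both maps are zero.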

Orientation Similarity.
In addition to the structure similarity, we also consider the orientation similarity between the original and encrypted images because orientation information is an indispensable element of human visual perception. The orientation of an image, which has been widely used in image quality assessment [35], conveys important information [29] and therefore has an important effect on the visual security evaluation of selective encrypted images. The orientation change of each pixel can reflect the degradation of the details of the selective encrypted image.
A visual pattern was built by orientation information in [35], which can be used for IQA. However, this pattern ignores some intuitive visual information; therefore, this pattern does not fully apply to VSA.
Considering the above issue, we design a new algorithm to compute the orientation similarity. In this work, for an image $I$, the preferred orientation of each pixel is calculated as its gradient direction $\theta_I$:
$$\theta_I(i, j) = \arctan\left( \frac{G_v(i, j)}{G_h(i, j)} \right), \tag{8}$$
where $G_h(i, j)$ and $G_v(i, j)$ are the gradients along the horizontal and vertical directions, respectively, which can be obtained from equation (2), and $(i, j)$ is the index of the pixel in $I$. In this way, we obtain quantitative orientation information. Figure 3 gives an example in which the orientation information of the image changes noticeably. Then, we compute the orientation change $D_O$ between the original image $O$ and its encrypted image $E$ as the distance between their orientation maps (equation (9)). Finally, the orientation similarity map $S_O$ of the original and the encrypted images is measured as in equation (10).
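To make the orientation step concrete, the sketch below computes a per-pixel gradient direction from horizontal and vertical gradient maps and then a per-pixel distance between two orientation maps. The arctangent form of the direction, the handling of zero gradients, and in particular the wrapped absolute difference used as the distance are assumptions chosen for illustration; the exact distance and similarity formulas of the paper are not reproduced here.

```python
import math


def orientation_map(g_h, g_v):
    """theta = arctan(G_v / G_h), in (-pi/2, pi/2]; a zero gradient maps to 0."""
    theta = []
    for row_h, row_v in zip(g_h, g_v):
        row = []
        for h, v in zip(row_h, row_v):
            if h == 0.0:
                # Vertical gradient only: orientation is pi/2 (or 0 if no gradient).
                row.append(0.0 if v == 0.0 else math.pi / 2)
            else:
                row.append(math.atan(v / h))
        theta.append(row)
    return theta


def orientation_distance(theta_o, theta_e):
    """Assumed distance: absolute angular difference, wrapped into [0, pi/2]."""
    dist = []
    for row_o, row_e in zip(theta_o, theta_e):
        row = []
        for a, b in zip(row_o, row_e):
            d = abs(a - b)
            row.append(min(d, math.pi - d))  # orientations are pi-periodic
        dist.append(row)
    return dist
```

The wrap-around matters because orientations just below pi/2 and just above -pi/2 describe almost the same line direction, so their distance should be small rather than close to pi.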

Saliency-Weighted Pooling.
Different regions have drastically different effects on the visual understanding of an image. Most of the contribution to visual perception comes from the information loss and distortion in important regions. An image importance map identifies the important regions that contribute more to visual perception, and such maps have been studied extensively in recent years. We therefore highlight these important regions and suppress the other regions with a saliency map for visual content extraction. To this end, the saliency value of each pixel is required. As illustrated in Figure 3(f), the visual saliency map highlights the important regions in an image, which leads to a better VSA. In the past decades, a large number of saliency models [36][37][38][39][40] have been proposed, and these models can help us build a better VSA. $S_{ST}(i, j)$ and $S_O(i, j)$, obtained by equations (7) and (10), respectively, are two feature similarity maps with the same size as the image. However, we need a single VSA score to represent the visual security. Therefore, we need a pooling method that compresses the two feature maps into two scores representing the feature similarities. In our work, we adopt the simple and classic saliency-weighted pooling method. The security of a selective encrypted image depends on the degree of disclosure of its visual content, which is obtained by comparing it with the original image. We therefore select the original image's saliency map $SM_O(i, j)$ and combine it with the structure similarity map $S_{ST}(i, j)$ and the orientation similarity map $S_O(i, j)$, respectively:
$$VS_{ST} = \frac{\sum_{i,j} S_{ST}(i, j)\, SM_O(i, j)}{\sum_{i,j} SM_O(i, j)}, \qquad VS_O = \frac{\sum_{i,j} S_O(i, j)\, SM_O(i, j)}{\sum_{i,j} SM_O(i, j)}. \tag{11}$$
Considering that different saliency models affect the performance and computational cost of our proposed VSA, we measure the performance and running time of different saliency models.
To eliminate the possible bias due to specific image selection, we randomly choose 100 images from the IVC-SelectEncrypt database and then calculate the average running time as the computation cost of each saliency model. Table 1 shows the results. From Table 1, we can find that the most appropriate saliency model is GBVS. Therefore, graph-based visual saliency [36] (GBVS), a simple but powerful saliency model, is employed. A saliency map of an original image generated by GBVS can be seen in Figure 3(f).
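Saliency-weighted pooling can be illustrated with a short Python sketch. It assumes the common weighted-average form, in which each similarity value is weighted by the saliency of the original image at the same pixel and the result is normalized by the total saliency; the function name and the error handling are choices made for this example, and the saliency map itself is assumed to come from an external model such as GBVS.

```python
def saliency_weighted_pool(similarity, saliency):
    """Return sum(S * SM) / sum(SM) over all pixels of two same-sized maps."""
    num = 0.0
    den = 0.0
    for row_s, row_w in zip(similarity, saliency):
        for s, w in zip(row_s, row_w):
            num += s * w
            den += w
    if den == 0.0:
        raise ValueError("saliency map sums to zero")
    return num / den


# Toy example: the salient top-left pixel dominates the pooled score.
sim = [[1.0, 0.0], [0.5, 0.5]]
sal = [[3.0, 1.0], [0.0, 0.0]]
score = saliency_weighted_pool(sim, sal)  # (1*3 + 0*1) / 4 = 0.75
```

With a uniform saliency map the pooled value reduces to the plain mean of the similarity map, so the weighting only changes the score when leakage is concentrated in salient regions.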
After performing the similarity measurements on the structure and orientation features between the original and encrypted images, the resulting structure similarity $VS_{ST}$ and orientation similarity $VS_O$ are combined to calculate the visual security according to equation (12), where $\alpha$ and $\beta$ are two parameters used to adjust the relative importance of $VS_{ST}$ and $VS_O$. The structure and orientation features of an image are important because visual perception is highly sensitive to them. For selective encrypted images, an important feature is that the skeleton of the image is still intelligible but the details are almost unintelligible. Therefore, structure plays a more important role than orientation, and we explore the effects of the structure information and the orientation information separately in Table 2. Accordingly, the value of $\alpha$ should be greater than that of $\beta$. In our experiments, $\alpha$ and $\beta$ are set to 0.8 and 0.2, respectively, because this setting was found to be optimal.

Experiments
In this section, the performance of our proposed VSA is analyzed by comparing with other IQAs and VSAs. We evaluate the performance from confidence, monotonicity, linearity, and accuracy and provide comparisons with other IQAs and VSAs.

Security and Communication Networks
The IVC-SelectEncrypt database consists of 8 original images, from which 200 encrypted images are generated using 5 different encryption algorithms, each with 5 different encryption degrees. The range of its mean opinion scores (MOS) is [1, 5].
The PEID database has 1080 encrypted images obtained from 20 original images by using 10 encryption schemes. It has two subjective scores: the visual quality score and the visual security score. We use only the visual security score here because our task is visual security assessment, and the range of its mean opinion scores (MOS) is [0, 6].

Evaluation Methodology.
We evaluate the performance from confidence, monotonicity, linearity, and accuracy.
Confidence is utilized to establish how well a VSA actually reflects human judgment [17]. Consider a subjective score $x$ ($x \in$ MOS) on a database $D$; each image $I$ that received this score $x$ has an objective score $V_I$. We define $V_{\max}(x)$ as the maximum of the objective scores of those images on $D$ and $V_{\min}(x)$ as the minimum of those objective scores.
Confidence $C_x = |V_{\max}(x) - V_{\min}(x)|$ measures the difference between these two extrema. The normalized mean confidence $\mu_D$, the normalized standard deviation $\sigma_D$, and the normalized maximum confidence $\max_D$ are the evaluation criteria derived from $C_x$.
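The confidence measure can be illustrated with a small Python sketch that groups images by their subjective score and reports the spread of the objective scores within each group. The function name is hypothetical, the sketch assumes that subjective scores take discrete values (for example, rounded MOS levels) so that grouping by exact value is meaningful, and the normalization that leads to the three summary criteria is deliberately left out because its exact form is not given here.

```python
from collections import defaultdict


def confidence_by_score(subjective, objective):
    """Map each subjective score x to C_x = |V_max(x) - V_min(x)|."""
    groups = defaultdict(list)
    for x, v in zip(subjective, objective):
        groups[x].append(v)  # collect objective scores sharing the same x
    return {x: abs(max(vs) - min(vs)) for x, vs in groups.items()}


# Toy data: three subjective levels with 2, 3, and 1 images, respectively.
mos = [1, 1, 2, 2, 2, 3]
vsa = [0.2, 0.5, 0.4, 0.9, 0.6, 0.7]
c = confidence_by_score(mos, vsa)
```

A metric that assigns very different objective scores to images judged equally secure by viewers yields a large spread for that score level, which is why smaller and more stable values are preferred.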
To ascertain the correlation between the objective VSA scores and the subjective scores (MOS), we compute the Spearman rank correlation coefficient (SRCC), the Kendall rank correlation coefficient (KRCC), the Pearson linear correlation coefficient (PLCC), and the root mean-squared error (RMSE). SRCC and KRCC evaluate monotonicity, PLCC evaluates linearity, and RMSE evaluates accuracy. Before the correlation between the objective VSA scores and the MOS is calculated, a five-parameter logistic regression function is applied to reduce the nonlinearity of the objective VSA scores [33], which is defined as
$$S' = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp\left( \beta_2 (S - \beta_3) \right)} \right) + \beta_4 S + \beta_5, \tag{13}$$
where $S'$ is the fitted VSA score, $S$ is the objective VSA score, and $\beta_i$ ($i = 1, 2, 3, 4, 5$) denotes the parameters determined via curve fitting. A better VSA should have lower $\mu_D$, $\sigma_D$, $\max_D$, and RMSE values but higher SRCC, KRCC, and PLCC values.
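The evaluation criteria can be sketched in plain Python as follows. The logistic mapping uses the five-parameter form that is common in the quality-assessment literature, which is assumed here to be the one intended; the parameters are taken as given, since the curve fitting itself would normally be delegated to a numerical optimizer. KRCC is omitted for brevity, and all function names are illustrative.

```python
import math


def logistic5(s, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping of an objective score s (assumed form)."""
    return b1 * (0.5 - 1.0 / (1.0 + math.exp(b2 * (s - b3)))) + b4 * s + b5


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC); inputs must not be constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Average 1-based ranks, so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0  # mean rank of the tied block
        for k in range(pos, end + 1):
            result[order[k]] = avg
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (SRCC): Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def rmse(x, y):
    """Root mean-squared error between fitted scores and MOS."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

In a typical workflow, SRCC would be computed on the raw objective scores, while PLCC and RMSE would be computed after passing the objective scores through `logistic5` with fitted parameters.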

Overall Evaluation.
The results of the confidence evaluation of all IQAs and VSAs on the IVC-SelectEncrypt database are shown in Figure 4. A better VSA should have lower and more stable $C_x$ values. From Figure 4, we can find that the $C_x$ values of VIF, VSI-Canny, and our VSA are more stable. Table 4 lists the overall performance of all IQAs and VSAs on the IVC-SelectEncrypt and PEID databases, with the best results marked in bold. Our proposed VSA performs best on IVC-SelectEncrypt. On the PEID database, VIF achieves the best monotonicity (the highest SRCC and KRCC) and the lowest RMSE, LSS has the lowest $\sigma_D$, ESS achieves the lowest $\max_D$, and our proposed VSA achieves the best $\mu_D$ and PLCC. Although our proposed VSA is not the best on some criteria, it is very close to the best one. Compared with other methods, the structure and orientation features extracted by our proposed VSA are more consistent with the HVS because the HVS is very sensitive to the structure and orientation changes caused by selective encryption. In addition, we also consider visual saliency, which is not considered by the other IQAs and VSAs. Therefore, it is reasonable that our proposed VSA exhibits better overall performance.

Evaluation on Different Quality Ranges.
Selective encrypted images usually have low or moderate visual quality [21,26]. Therefore, to evaluate the performance of these VSAs more comprehensively, we should evaluate them on different image quality ranges (i.e., low, moderate, and high). The detailed division information can be found in Table 2. Considering that selective encrypted images are typically in the low- or moderate-quality ranges, it is more important to evaluate the performance of VSAs in the low and moderate image quality ranges than in the high-quality range [21,26]. The comparison results of different VSAs in different image quality ranges on the two test databases are shown in Table 5. We can find that our proposed VSA performs better than the other VSAs in the low and moderate image quality ranges. In the low image quality range, on the IVC-SelectEncrypt database, VIF shows superior performance in $\max_D$, LSS shows the best performance on PLCC, SRCC, KRCC, and RMSE, and our proposed VSA achieves the best values of $\mu_D$ and $\sigma_D$, with its other values very close to the best ones. On the PEID database, VIF shows superior performance in the confidence evaluation (lowest $\mu_D$, $\sigma_D$, and $\max_D$), and our proposed VSA achieves the best performance on PLCC, SRCC, KRCC, and RMSE. In the moderate image quality range, our proposed VSA achieves the best performance in the monotonicity, linearity, and accuracy evaluations on the two databases. In the high image quality range, SSIM obtains the best performance on the IVC-SelectEncrypt database; on the PEID database, various VSAs exhibit satisfactory performance in different aspects. In summary, our proposed VSA exhibits better performance in the low and moderate image quality ranges on the two databases. In these ranges, the structure and orientation changes caused by selective encryption are more obvious.
Moreover, the saliency of the original image identifies the more important areas of the image, which is important for the visual security assessment of selective encrypted images. Therefore, it is reasonable that our proposed VSA shows better performance in the low and moderate image quality ranges.

Evaluation on Different Encryption Types.
We also evaluated the different VSAs on various types of encryption on the two test databases to more comprehensively evaluate the performance of all VSAs.
There are 15 different encryption types in the test on the two databases. Tables 6 and 7 report the performance results for all encryption types that appear in the test databases.
From Table 6, we can find that our proposed VSA has better monotonicity performance (the higher SRCC and KRCC values) than other IQAs and VSAs on the two databases. More specifically, our proposed VSA achieves the highest SRCC hit-count (8 times) and KRCC hit-count (7 times), and this value is higher than those of the other metrics. We can also find that our proposed VSA still has the highest PLCC hit-count (7 times) and RMSE hit-count (7 times) from Table 7.
From Tables 6 and 7, we can see that all of the involved VSAs obtain relatively inferior performance on encryption types enc08 and enc09 in the PEID database; our proposed VSA is still relatively good on enc09 but relatively poor on enc08. As shown in Figure 5, the distortions caused by these two encryption methods differ from those of the other methods in that they warp the images. Therefore, the features extracted from these encrypted images cannot be matched to the features of the original images, and the VSAs and IQAs do not perform well on these two encryption methods. This situation also explains why the overall performance of our method on PEID is worse than on IVC-SelectEncrypt. Except for enc08 and enc09, the encryption methods cause obvious structure and orientation changes in images. Therefore, it is reasonable that our proposed method shows better performance, because the features of our proposed VSA are more relevant to the content leakage caused by most encryption methods.

Computational Complexity.
Finally, considering that the running time is important in many practical applications, we analyze the computational cost of all VSAs. In our test, we measure the computational cost of a VSA on 512 × 512 images. We perform the experiments using the original code in MATLAB R2016b on a 64-bit Windows 7 operating system with 16 GB of memory and a 3.20 GHz Intel processor. To avoid the possible bias caused by selecting specific images, we randomly choose 100 images from the PEID database and then calculate the average running time as the computation cost of each VSA.
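The averaging procedure can be mimicked with a small timing harness. The sketch below is in Python rather than MATLAB and is meant only to show the measurement idea; `metric` stands for any hypothetical callable that scores an (original, encrypted) image pair, and the image pairs are assumed to be loaded beforehand so that file access does not distort the timing.

```python
import time


def average_runtime(metric, image_pairs):
    """Mean wall-clock seconds per (original, encrypted) pair for `metric`."""
    start = time.perf_counter()
    for original, encrypted in image_pairs:
        metric(original, encrypted)
    return (time.perf_counter() - start) / len(image_pairs)
```

Averaging over many randomly chosen pairs, as done in the experiment, reduces the influence of any single image on the reported cost.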
Table 8 shows that most of the metrics are fast to compute. PSNR and SSIM are the fastest methods, but they are mainly used for image quality assessment and their performance is not excellent. VIF is also an IQA method with relatively good performance, but its running time is much higher than that of the other methods because its computational model is more complex. Compared with other VSAs, our proposed VSA runs faster. In our implementation, most of the running time is spent in the feature extraction procedure. In the future, we will explore more efficient feature extraction techniques to reduce the computational cost of the proposed method.

Conclusions
In this paper, we have presented a novel visual security assessment (VSA) that makes use of structure and orientation information. First, we extract the structure of the original and the encrypted images by combining PC and GM. Then, we extract the orientation information from the gradients and obtain similarity measurements by calculating the structure and orientation similarity maps. Meanwhile, we compute the saliency map of the original image. Then, we utilize a saliency-based pooling strategy to combine these two similarity maps and generate the final VSA score. We conduct extensive experiments on two encryption image databases to evaluate the performance of our proposed VSA and compare it with other IQAs and VSAs that are widely used for the visual security assessment of encrypted images. The experimental results show that our proposed VSA has better performance and stronger robustness than the compared IQAs and VSAs, especially in the low and moderate image quality ranges.
These prior studies (and datasets) are cited at relevant places within the text as references.