Image Quality Evaluation Based on Gradient, Visual Saliency, and Color Information

This paper proposes an image quality evaluation (IQE) metric that considers gradient, visual saliency, and color information. Visual saliency and gradient information are two types of effective features for quality evaluation research. Different regions within an image are not uniformly important for IQE. Visual saliency can find the regions of a given image that are most attractive to the human visual system, and these regions are more strongly correlated with image quality results. In addition, the degradation of gradient information is related to structural distortion, which is a very important factor for image quality. However, these two types of features cannot accurately evaluate the color distortion of images. In order to evaluate chromatic distortion, this paper proposes a color similarity that is measured in the YIQ color space. The computation of the proposed method begins with the similarity calculation of local gradient information, visual saliency, and color information. Then, the final quality score is obtained by applying the standard deviation to the similarity components. The experimental results on five benchmark databases (i.e., CSIQ, IVC, LIVE, TID2013, and TID2008) show that the proposed IQE method correlates better with subjective quality judgments than the compared methods.


Introduction
Images are likely to be distorted by other irrelevant signals during acquisition, transmission, and processing [1]. In order to discriminate the quality of the resulting images, reliable quality evaluation criteria are needed. Moreover, image processing algorithms and systems require image quality evaluation (IQE) algorithms to assess their performance. Therefore, IQE becomes an essential problem in the field of image processing [2].
Because the human visual system (HVS) is the ultimate receiver of image information, image quality is a subjective perception result of the HVS [3]. Therefore, the most reliable IQE method is the subjective method, in which the quality of an image is determined through the evaluation of a large number of human observers [4]. Subjective IQE methods can provide reliable evaluation results that are consistent with human subjective feelings [5,6]. However, this kind of method needs human involvement, which is costly and time-consuming. Moreover, it cannot be used in real-time systems [7].
Objective IQE research aims to design quality evaluation algorithms that can automatically calculate the quality results [8][9][10][11][12]. According to the availability of reference image information, IQE methods can be divided into three types [6]: full-reference (FR), no-reference (NR), and reduced-reference (RR). This paper focuses on the FR-IQE problem, which can be viewed as a similarity (or correlation) computation between a test image and its corresponding reference image in the sense of visual perception [13].
The traditional methods for FR quality evaluation are mean squared error (MSE) and peak signal-to-noise ratio (PSNR). However, these two metrics fail to take into account the perception characteristics of the HVS, so their evaluation results are not consistent with human subjective judgments [14][15][16][17][18][19]. In the past two decades, many IQE models have been proposed that use HVS properties, such as structural information, contrast [20], frequency [21], and gradient information [22]. Among the previously proposed image quality metrics (IQMs), the structural similarity (SSIM) is one of the most influential [6]. SSIM is based on the assumption that the HVS is more sensitive to the structural information of a given image. The structural information mainly refers to the edge and contour information in images, to which the HVS is more sensitive than to other information. Moreover, SSIM can be used to extract the edge and contour information from images.
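For reference, the two traditional measures mentioned above can be sketched in a few lines. This is a minimal illustration using their standard definitions; the function names and the default peak value of 255 (8-bit images) are assumptions made for this example.

```python
import numpy as np


def mse(reference, distorted):
    """Mean squared error between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    dist = np.asarray(distorted, dtype=np.float64)
    return float(np.mean((ref - dist) ** 2))


def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB; `peak` is the maximum pixel value (assumed 8-bit)."""
    err = mse(reference, distorted)
    if err == 0.0:
        return float("inf")  # identical images have no noise
    return float(10.0 * np.log10(peak ** 2 / err))
```

Both functions compare pixels one by one and ignore where in the image the error occurs, which is why they can disagree with perceived quality.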
Image structural information is very effective for IQE, so many researchers have improved and extended the original SSIM, for example, the multiscale SSIM (MS-SSIM) method [23], a textural similarity (content-partitioned SSIM) method based on content segmentation, the complex wavelet domain structural similarity (CW-SSIM) method [24], and the information content weighted SSIM (IW-SSIM) method [25]. In addition to SSIM, gradient information is another effective way to reflect structural information. Feature similarity (FSIM) combines phase congruency with the similarity of gradient magnitude maps and uses a pooling method weighted by phase congruency to obtain the final quality score [26]. The image gradient is a popular feature since it can effectively capture local image structures, to which the HVS is highly sensitive. The gradient magnitude similarity deviation (GMSD) computes the pixel-wise similarity between the gradient magnitudes of the reference and distorted images and then pools the resulting similarity map by its standard deviation [27].
However, structural information cannot reflect the color changes between the reference image and the distorted image, so the above IQE methods based on structural information cannot accurately evaluate the color distortion of images. Moreover, color images are increasingly used in daily life, and color information contains important signals for visual perception. Therefore, color information should be considered to improve the accuracy of IQE methods. In order to extract the color information from images, this paper proposes the color similarity (CS), which is measured after transforming images from the RGB color space to the YIQ color space [28].
Moreover, the visual attention mechanism is a very important characteristic of visual observation, and visual saliency (VS) information is another way to improve the evaluation performance of IQE methods. In summary, this paper proposes an IQE model that uses VS, gradient similarity (GS), and CS. The GS model is used to extract structural information from a given image. The VS model is used to find the important regions in the image, which are given more weight when the image quality is evaluated. Meanwhile, the CS model is used to evaluate the chromatic distortions in the image [29]. In the CS model, the image is transformed from the RGB color space to the YIQ color space, and the CS is calculated on the I and Q chromaticity channels. In addition, a fusion technique is proposed for the computation of GS. This simple IQE structure improves the accuracy of the quality prediction.
The rest of this paper is organized as follows. The detailed calculation process of the proposed algorithm will be introduced in Section 2. Then, experimental results, result comparison, and discussion are introduced in Section 3. Finally, the conclusions are drawn in Section 4 of this paper.

Methodology
Like most IQE methods, the proposed method also utilizes a two-stage framework. Specifically, it mainly includes the following steps. First, the images are transformed into a YIQ color space. Second, the local GS, local CS, and global VS similarity (VSS) are calculated. Then, the weighted results of the three similarity components form the ultimate quality score. The framework of the proposed method is shown in Figure 1.
As shown in Figure 2, VS and GS are very sensitive to distortion because we can easily find the distorted parts in the image from Figures 2(e) and 2(f). Therefore, our research attempts to deal with IQE problems by combining visual attention model and gradient information.
The reference image and the distorted image are first transformed into another color space in which the brightness and chromaticity information can be separated. The perception of color information is a complex physiological and psychological process, and a color space model provides a quantitative measurement of color information. The YIQ color space has the following advantages over other color spaces. It has a linear relationship with the RGB space, which not only adapts to changes in light intensity but also requires very little computation. Besides, in line with the color perception mechanism of the HVS, the YIQ color space separates color and brightness information. Therefore, the YIQ color space is selected in this paper. The RGB space is converted to the YIQ space by a linear transformation, where Y represents the luminance channel, and I and Q are the two chrominance channels that contain the color information of the image.
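The color space conversion can be sketched as a single matrix product. The coefficients below are the standard NTSC RGB-to-YIQ values rounded to three decimals; they are assumed here for illustration, and the function name `rgb_to_yiq` is hypothetical.

```python
import numpy as np

# Standard NTSC RGB -> YIQ matrix, rounded to three decimals (assumed for this sketch).
RGB_TO_YIQ = np.array([
    [0.299,  0.587,  0.114],   # Y: luminance
    [0.596, -0.274, -0.322],   # I: first chrominance channel
    [0.211, -0.523,  0.312],   # Q: second chrominance channel
])


def rgb_to_yiq(rgb):
    """Convert an H x W x 3 RGB array into separate Y, I, and Q channel arrays."""
    rgb = np.asarray(rgb, dtype=np.float64)
    yiq = rgb @ RGB_TO_YIQ.T   # apply the linear transform to every pixel
    return yiq[..., 0], yiq[..., 1], yiq[..., 2]
```

For a gray pixel (R = G = B), both chrominance channels are zero and Y equals the gray level, which illustrates how the transform separates brightness from color.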
2.1. Gradient Similarity. The gradient of a function points in the direction of greatest increase. Image edges always have large gradient values; conversely, smooth regions of the image have smaller gradient values. In the discrete domain, the gradient magnitude is usually calculated with operators that use the difference between adjacent pixels to approximate the derivative of the image function. In classical image gradient algorithms, the change at each pixel is considered, and the gradient operator is defined from the first or second derivative over neighboring pixels. In general, these operators approximate the gradient of an image $f(I)$ in the horizontal and vertical directions by convolution, giving $G_x(I) = P_x * f(I)$ and $G_y(I) = P_y * f(I)$, respectively, where $P_x$ and $P_y$ represent the gradient operators in the horizontal and vertical directions of the image, and $*$ represents the convolution operator.
In this paper, we use the Prewitt filter to detect the edges of the image. $G(Y)$ denotes the gradient magnitude of the luminance channel Y, and the gradient magnitudes of the reference image and the distorted image are both obtained with the Prewitt operator. Finally, the GS is calculated from these two gradient magnitude maps, where $Y_r$ and $Y_d$ represent the luminance channels of the reference image and the distorted image, respectively, and the parameter $C_0$ is a constant that controls numerical stability. Gradient information is widely used in image processing research. However, in many cases, the perception of image structure by the HVS deviates greatly from the structural deformation judged from the gradient. In fact, the above formula cannot effectively reflect the edge features of the image, so the edge features of the reference image and the distorted image cannot play an effective role in this model. This defect prompts us to put forward a new GS mapping method.
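The traditional GS computation described above can be sketched as follows. This is a minimal illustration under stated assumptions: the Prewitt kernels are normalized by 1/3, the similarity takes the ratio form common to gradient-based metrics, and the default value of `c0` is a placeholder rather than a tuned constant. All function names are hypothetical.

```python
import numpy as np

# 3 x 3 Prewitt kernels; the 1/3 normalization is an assumption of this sketch.
PREWITT_X = np.array([[1, 0, -1],
                      [1, 0, -1],
                      [1, 0, -1]], dtype=np.float64) / 3.0
PREWITT_Y = PREWITT_X.T


def filter3x3(image, kernel):
    """Cross-correlate a 2-D image with a 3 x 3 kernel using edge padding.

    The sign flip relative to true convolution does not change the magnitude.
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def gradient_magnitude(y):
    """Gradient magnitude G(Y) of a luminance channel."""
    gx = filter3x3(y, PREWITT_X)
    gy = filter3x3(y, PREWITT_Y)
    return np.sqrt(gx ** 2 + gy ** 2)


def gradient_similarity(y_ref, y_dist, c0=1.0):
    """Pixel-wise GS map; c0 > 0 avoids division by zero (placeholder value)."""
    g_r = gradient_magnitude(y_ref)
    g_d = gradient_magnitude(y_dist)
    return (2.0 * g_r * g_d + c0) / (g_r ** 2 + g_d ** 2 + c0)
```

The map equals 1 wherever the two gradient magnitudes agree and falls toward 0 as they diverge. Note that `g_r` and `g_d` are computed independently of each other in this form.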
The above deficiency of the traditional GS arises mainly because $G_r$ and $G_d$ are calculated independently of each other. Therefore, we incorporate the correlation between the reference image and the distorted image into the calculation of the GS map by using a fusion technique. First, a fused image, $Y_m$, is obtained as the weighted average of $Y_r$ and $Y_d$; the weight parameters were determined through a large number of experiments.
Then, two additional GS maps are calculated using the fused image, where $C_1$ and $C_2$ are constants that maintain the stability of the formulas. Note that the results of equations (7) and (8) are usually unequal. Finally, the GS result, $G$, is calculated from these maps.
2.2. Visual Saliency Similarity. We use the spectral residual method to generate visual saliency maps. This method extracts the spectral residual of the input image in the spectral domain and then generates a saliency map in the spatial domain. Compared with other methods, the spectral residual method has clear advantages in extraction quality and computational complexity. The saliency map is not computed at the original image size; instead, the VS is evaluated at a reduced resolution. The VSS information, $S$, is calculated from the saliency maps of the reference and distorted images, where $C_3$ is a constant and $VS(\cdot)$ is a function that extracts saliency information from images. The VS could be computed using various VS models, e.g., GBVS [30], SR [31], AIM [32], and SDSP [33]. For reasons of performance and efficiency, the SDSP model is used to compute the VS.
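The spectral residual idea and a saliency similarity map can be sketched as follows. This is an illustrative sketch only. It assumes a 3 x 3 mean filter on the log-amplitude spectrum, replaces the usual Gaussian post-smoothing with a box blur, omits the resolution reduction, and assumes the ratio form of the similarity. The SDSP model is not reproduced, and all function names and the default `c3` are hypothetical.

```python
import numpy as np


def box_blur3(a, mode="edge"):
    """3 x 3 mean filter implemented with shifted sums."""
    a = np.asarray(a, dtype=np.float64)
    h, w = a.shape
    padded = np.pad(a, 1, mode=mode)
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def spectral_residual_saliency(image):
    """Saliency map from the spectral residual of a 2-D grayscale image."""
    img = np.asarray(image, dtype=np.float64)
    spectrum = np.fft.fft2(img)
    log_amp = np.log(np.abs(spectrum) + 1e-12)    # log-amplitude spectrum
    phase = np.angle(spectrum)                    # phase spectrum
    # Residual: log amplitude minus its local average (spectrum is periodic).
    residual = log_amp - box_blur3(log_amp, mode="wrap")
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = box_blur3(sal)                          # box blur stands in for Gaussian smoothing
    peak = sal.max()
    return sal / peak if peak > 0 else sal        # scale to [0, 1]


def saliency_similarity(vs_ref, vs_dist, c3=1e-3):
    """Pixel-wise similarity of two saliency maps; c3 is a placeholder constant."""
    a = np.asarray(vs_ref, dtype=np.float64)
    b = np.asarray(vs_dist, dtype=np.float64)
    return (2.0 * a * b + c3) / (a ** 2 + b ** 2 + c3)
```

In use, `spectral_residual_saliency` would be applied to the reference and distorted images separately, and the two maps would then be passed to `saliency_similarity`.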

2.3. Color Similarity. In addition, we must consider a special case. The color change between the reference image and the distorted image cannot be reflected by the structural information of the image. Therefore, the color characteristics of the test image need to be taken into consideration. Previous studies use CS maps to represent color differences between images [34,35]. Based on these research results, we design the following CS calculation steps.
Let I and Q denote the two chrominance channels in the YIQ color space. The CS on the I and Q channels ($CS_I$ and $CS_Q$) is calculated separately between the reference image and the distorted image, where $C_4$ and $C_5$ are constants that control numerical stability. Finally, the CS result, $C$, is calculated by merging $CS_I$ and $CS_Q$.
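A possible form of the CS computation is sketched below. The ratio form of the per-channel similarity and the merging of $CS_I$ and $CS_Q$ by pixel-wise product are assumptions made for this example, in the spirit of earlier chrominance similarity maps. The function names and the default values of `c4` and `c5` are hypothetical placeholders.

```python
import numpy as np


def channel_similarity(a, b, c):
    """Pixel-wise similarity of one chrominance channel; c > 0 keeps the ratio stable."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return (2.0 * a * b + c) / (a ** 2 + b ** 2 + c)


def color_similarity(i_ref, q_ref, i_dist, q_dist, c4=1.0, c5=1.0):
    """CS map from the I and Q channels of the reference and distorted images."""
    cs_i = channel_similarity(i_ref, i_dist, c4)
    cs_q = channel_similarity(q_ref, q_dist, c5)
    return cs_i * cs_q   # merging by product is an assumption of this sketch
```

Because I and Q can be negative, the map can also become negative where the two images have chrominance of opposite sign; a pooling step has to account for that.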
2.4. Pooling Stage. The local GS, local CS, and global VSS are integrated by weighted average calculation, and the final quality score is generated from these three components. In the pooling formula, $\mathrm{std}(\cdot)$ is the standard deviation of a matrix, and $\alpha$ and $\beta$ represent the weights of the local GS and CS, respectively. According to our experiments, top performance is achieved when $\alpha = 0.2$ and $\beta = 0.47$.
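As an illustration of standard deviation pooling, the sketch below combines three similarity maps and pools the result. The combination rule (VSS multiplied by GS and CS raised to the powers $\alpha$ and $\beta$) is an assumption made only for this example and is not taken from the pooling formula of this paper. The function name `pool_quality` and the clipping of negative CS values are likewise hypothetical.

```python
import numpy as np


def pool_quality(vss, gs, cs, alpha=0.2, beta=0.47):
    """Pool three similarity maps into one score by standard deviation.

    Assumed combination (illustrative only): vss * gs**alpha * cs**beta.
    """
    vss = np.asarray(vss, dtype=np.float64)
    gs = np.asarray(gs, dtype=np.float64)
    # Clip negative CS values so that the fractional power stays real (assumption).
    cs = np.maximum(np.asarray(cs, dtype=np.float64), 0.0)
    quality_map = vss * gs ** alpha * cs ** beta
    return float(np.std(quality_map))   # population standard deviation of the map
```

With this kind of pooling, a score of 0 means the combined map is constant across the image, and larger values indicate that the similarity varies more strongly from region to region.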

Experiments
The experimental comparison is conducted on five databases: CSIQ [36], LIVE [6], IVC, TID2008 [37], and TID2013 [38]. The five databases are summarized in Table 1. All the images in these databases were subjectively evaluated by many observers, and the evaluation results were statistically processed into subjective evaluation scores, that is, the mean opinion score (MOS) or the differential mean opinion score (DMOS). MOS or DMOS serves as the reference scale against which objective image quality measures are evaluated.
The eight IQE methods were tested on the five databases, and the results were analysed and compared. This paper adopts the evaluation protocol for IQE models provided by the Video Quality Experts Group (VQEG). It is usually assumed that objective quality scores and subjective scores have a specific nonlinear relationship, and that the natural texture and edge information of images is highly unstructured and singular. A logistic regression function with five fitting parameters, $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, and $\beta_5$, is therefore employed as the nonlinear mapping. Four criteria commonly adopted to measure the performance of IQE models are used for validation: the Pearson linear correlation coefficient (PLCC), the Spearman rank order correlation coefficient (SROCC), the Kendall rank order correlation coefficient (KROCC), and the root mean square error (RMSE).
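The mapping function and the four criteria can be sketched as follows. The five-parameter logistic form below is the one commonly used in IQE studies and is assumed here. The fitting of the parameters is omitted, KROCC is computed as Kendall's tau-a without tie correction, and all function names are hypothetical.

```python
import numpy as np


def logistic5(q, b1, b2, b3, b4, b5):
    """Five-parameter logistic mapping from objective scores to the subjective scale."""
    q = np.asarray(q, dtype=np.float64)
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5


def plcc(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])


def _ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=np.float64)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tie = values == v
        ranks[tie] = ranks[tie].mean()
    return ranks


def srocc(x, y):
    """Spearman rank order correlation: Pearson correlation of the ranks."""
    return plcc(_ranks(x), _ranks(y))


def krocc(x, y):
    """Kendall rank correlation (tau-a): concordant minus discordant pairs over all pairs."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n = len(x)
    total = 0.0
    for i in range(n - 1):
        total += np.sum(np.sign(x[i + 1:] - x[i]) * np.sign(y[i + 1:] - y[i]))
    return float(total / (n * (n - 1) / 2.0))


def rmse(x, y):
    """Root mean square error between mapped objective scores and subjective scores."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    return float(np.sqrt(np.mean((x - y) ** 2)))
```

In a typical evaluation, PLCC and RMSE would be computed after the objective scores have been passed through the fitted logistic function, whereas SROCC and KROCC depend only on rank order and can be computed on the raw scores.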
PLCC measures the linear correlation between the subjective and objective evaluations and thus reflects the prediction accuracy of an IQE method. SROCC measures the monotonic consistency between the objective evaluation (predicted value) and the subjective evaluation value. KROCC, similar to SROCC, is a rank-based measure of the correlation between the model predictions and the subjective scores. RMSE represents the root mean square error between the objective evaluation value and the subjective quality score. The value range of PLCC, SROCC, and KROCC is [-1,1]. The closer the absolute value of PLCC, SROCC, and KROCC is to 1, the stronger the correlation of the model and the higher its accuracy. The smaller the RMSE value is, the better the prediction of the model.
Table 2 shows the evaluation results of GSC and 7 other classical IQMs on 5 commonly used databases. The two best results are shown in bold. As can be seen from Table 2, GSC performs significantly better than the other IQMs on the CSIQ, IVC, TID2008, and TID2013 databases. According to the experimental results, the IQMs compared in this paper are generally better than the PSNR and SSIM methods, and the performance of VSI and GSC is comparable. In addition, the GSC evaluation algorithm proposed in this paper is more consistent with subjective perception.
In order to provide a more visual illustration, the scatter plots in Figure 3 show the subjective and objective scores of six IQE methods on the CSIQ database. These methods are PSNR, SSIM, VIF-p, IW-SSIM, VSI, and GSC. Each point in a scatter plot represents a test image, and the red curve is obtained from the logistic regression function.
The scatter plot of a good IQE method should show good convergence and correlation. The gap between the points and the fitted curve reflects the correlation of the model: the smaller the gap, the higher the correlation. As shown in Figure 3, the objective scores of GSC have a stronger correlation with the subjective scores. The points of GSC cluster more closely around the fitted curve, which indicates better performance.
Table 3 shows the SROCC results for each distortion type in the LIVE, CSIQ, and TID2013 databases. For each distortion type, the top two SROCC results among the 8 IQMs are highlighted in bold. There are 35 distortion types in the LIVE, CSIQ, and TID2013 databases, and GSC ranks in the top two 27 times.
In order to test the stability of performance on different distortion types, four types of distorted images were collected from LIVE, CSIQ, and TID2013. Compared with the other IQMs, GSC has excellent performance on JPEG compression (JPEG), JPEG-2000 compression (JP2K), and Gaussian blur (GB) distortion. Figure 4 shows the histogram of the SROCC results of the 8 IQMs on the four distortion types, from which it can be seen that GSC not only has high accuracy but also has very stable performance.
In Table 3, most IQE methods perform poorly on contrast change (CC), change of color saturation (CCS), and image color quantization with dither (ICQD) in the TID2013 database, because these methods ignore the correlation between color information and quality distortion. Because GSC uses CS maps, obtained after color space transformation, to represent the color differences between the distorted and reference images, it can give more accurate perceptual quality scores than the other IQMs when evaluating color-distorted images.
To further examine whether the advantage of GSC comes from the proposed VSS, GS, or CS, we also conduct ablation experiments, as shown in Table 4. The proposed GSC (i.e., VSS + GS + CS) measures the image quality by integrating the effects of the VSS, GS, and CS and achieves the highest performance, with SROCC = 0.8657 and PLCC = 0.8766 on the TID2013 database.
In order to compare the computing speed, we compare the time required by GSC and the other IQE methods to process one image. Table 5 lists the running times of 9 IQE models on images of dimensions 372 × 512. The experiment was implemented in MATLAB 2017b and performed on a Core i5 3.40 GHz CPU with 16 GB memory. As can be seen from Table 5, GSC is one of the three fastest IQMs and has a significant advantage in efficiency compared with the other IQE methods.

Conclusions
In this paper, we proposed a new FR IQE model based on the GS, VSS, and CS of images. The GS measures the deformation of local structures, the VSS reflects the distortion of visually attractive regions in a given image, and the CS accounts for the chromatic distortion of images. The three parts enhance the accuracy of the proposed method and bring it closer to human visual evaluation. The experimental results on five image databases (CSIQ, LIVE, IVC, TID2008, and TID2013) show that our method correlates well with subjective quality evaluation.

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest
The authors declare that they have no conflicts of interest.