Subjective Score Predictor: A New Evaluation Function of Distorted Image Quality

Image quality assessment (IQA) evaluates the perceptual quality of images. Many objective IQA algorithms are built on the objective comparison of image features, and they are mainly trained and evaluated against the ground truth of subjective scores. Because subjective experiments suffer from inconsistent experimental conditions and cumbersome observation procedures, it is desirable to generate the ground truth for IQA research by objective computation. In this paper, we propose a subjective score predictor (SSP) that aims to provide the ground truth of IQA datasets. To stay in accord with the distortion information, the distortion strength of the distorted image is employed as the input parameter. To remain consistent with subjective opinion, the subjective score of the source image is taken as a quality base value, and the distortion parameter and the quality base value are integrated into a human visual model function to obtain the final SSP value. Experimental results demonstrate the advantages of the proposed SSP in the following aspects: it reflects the distortion strength effectively, it provides a competitive ground truth, and it gives a valid evaluation of objective IQA methods as well as of subjective scores.


Introduction
Image quality assessment (IQA) is fundamental to evaluating and improving the perceptual quality of images and is widely applied in image-based instrumentation [1,2]. The IQA problem can be addressed with objective or subjective approaches.
Over the years, various objective IQA methods have been proposed. According to the availability of a reference image, objective image quality metrics are classified as full-reference (FR), reduced-reference (RR), and no-reference (NR) [3]. Most existing methods are FR methods, which compare the distorted image with a complete reference image. The simplest FR metrics are the mean squared error (MSE) and the peak signal-to-noise ratio (PSNR). However, both compute image quality from pixel values alone and ignore the image structure and the human visual system (HVS). To overcome these drawbacks, Wang et al. proposed the structural similarity index (SSIM) to compare local patterns of luminance and contrast [4]. Building on SSIM, other researchers developed a gradient-based SSIM [5] and a multiscale SSIM [6]. However, SSIM and its variants compare features within corresponding patches of the reference and distorted images, so they ignore the image curvature information [7]. To investigate the visual disparity between a center patch and its spatial neighborhood, Zhou et al. proposed an FR IQA scheme that compares visual similarity in both interpatch and intrapatch ways [7]. Since the reference image is often not available in practice, NR-IQA methods have become a good alternative for evaluating distorted image quality. To compensate for the lack of prior knowledge of reference images, most NR-IQA algorithms are limited to specific distortions, dataset training, or IQA models. For example, Jiang et al. designed an image quality metric for the assessment of compressed remote sensing images [8]. Based on training on a dataset, Tang et al. learned a mapping from low-level features to the image quality scores given by human observers [9]. With a multivariate Gaussian (MVG) model, Mittal et al. proposed a blind image quality analyzer that measures the distance between the MVG fit of features from the test image and an MVG model of natural images [10]. In practice, the assumptions of these methods are often not well satisfied, so they can fail in real applications. RR-IQA methods therefore provide a solution lying between FR and NR-IQA methods, based on partial information about the reference image. For instance, Wang et al. and Li et al., respectively, developed RR image quality metrics using a wavelet-domain natural image statistic model and a divisive normalization-based image representation [11,12].
Generally, every objective IQA metric aims to evaluate image quality in agreement with the subjective opinion of human observers [13], so its performance is validated by comparison with the subjective scores in open IQA datasets such as LIVE [14], A57 [15], CSIQ [16], IVC [17], TID2008 [18], and Toyama [19]. However, these datasets contain only a limited number of images, since subjective experiments are time-consuming and expensive [20]. Among the above six datasets, the number of natural reference images is only up to 30 within Toyama. In practice, this is not enough to cover the vast proliferation of image data, nor to reflect the performance of objective IQA algorithms. In addition, the subjective scores may not faithfully reflect the ground truth of image quality because of varying experimental conditions, individual observers, and different processing methods applied to the raw scores. Thus, it is necessary to develop an objective predictor that correctly represents the real quality of a distorted image.
To predict the subjective scores of image quality, Kaya et al. imitate human observers with a trained multilayer neural network based on extracted statistical features [21]. This requires many training samples, including not only image features but also subjective scores. To address the lack of ground truth of image quality, Lu et al. calculate the objective distortion score (ODS) from the logarithm of the distortion parameter $p$, that is, $\ln(p)$ [22]. A larger ODS value means worse image quality. This is the first work to provide ground truth for IQA research through a simple function of the applied distortion. However, referring to the analyzed criteria of objective IQA [23], ODS has problems in terms of consistency, discrimination, and convergence. For example, since $\ln(\cdot)$ is a monotonically increasing function, the ODS keeps increasing with larger $p$ regardless of whether the image quality becomes better or worse. In addition, different distortions can yield the same ODS value at the same distortion strength, although the degrees of impairment are obviously different. To remain consistent with the subjective scores, the ODS values are mapped to the interval [0, 100] (the normalized ODS, NODS), but this normalization is limited by the difference between the maximum and minimum of the evaluated values. More seriously, when the distortion parameter is very small, the ODS value approaches minus infinity, which makes the normalization fail.
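The normalization problem described above can be illustrated with a small sketch. It assumes that ODS is simply the natural logarithm of the distortion parameter and that the mapping to [0, 100] is a plain min-max normalization; the function names and the parameter values are hypothetical and are not taken from [22].

```python
import math


def ods(p):
    """ODS as described above: natural log of the distortion parameter p (> 0)."""
    return math.log(p)


def min_max_to_100(scores):
    """Assumed min-max normalization onto [0, 100]; the exact NODS mapping may differ."""
    lo, hi = min(scores), max(scores)
    return [100.0 * (s - lo) / (hi - lo) for s in scores]


# Hypothetical distortion parameters; the first one is nearly distortion-free.
params = [1e-6, 0.5, 1.0, 2.0, 4.0]
raw = [ods(p) for p in params]
nods = min_max_to_100(raw)

for p, r, n in zip(params, raw, nods):
    print(f"p={p:<8g} ODS={r:8.3f} NODS={n:7.2f}")
```

With one very small parameter in the set, the minimum ODS becomes strongly negative and the remaining normalized scores are squeezed into a narrow band near 100, which is the loss of discrimination discussed above.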
As the evaluation of subjective scores faces the aforementioned problems, we design a subjective score predictor (SSP) function to calculate the scores of distorted images. To reflect the distortion strength in agreement with the ground truth of subjective scores, we combine the knowledge required for a successful IQA algorithm design, including the information of the source image, the distortion, and the human visual system (HVS) [23]. In this paper, the SSP function is shown to have three advantages. First, SSP is built on a single visual perception model, so it partially avoids the individual differences between observers across distortions and distortion levels. Second, the distortion information and the subjective score of the source image are both integrated to make the predicted score more consistent with human subjective scores. Third, an objective function is used to calculate the final value. It is efficient and feasible and reduces cost and manpower. Therefore, the proposed approach is well suited to constructing IQA datasets that cover more source images, such as high-resolution remote sensing images and disturbing medical images of diseases, which are difficult to test on human observers in subjective experiments.
The rest of this paper is organized as follows: Section 2 introduces the SSP model, and Section 3 analyzes its characteristics and extension. Its effectiveness is validated in Section 4. Section 5 concludes this paper.

The Proposed Model
To reflect the subjective scores of image quality with an objective function, we should first consider the requirements of a successful objective IQA algorithm. Inspired by recent work suggesting that a successful IQA algorithm design should combine knowledge of the source image, the distortion, and the HVS [23], we build our SSP function on information about the source image, the distortion, and the HVS characteristics, all of which are prior knowledge when constructing an IQA dataset.

The Source Image: Reference.
In all open IQA datasets, the entire image database is derived from selected source images with given distortions. Referring to the source images, the distorted images are subjectively evaluated by human observers, and then the raw scores are adjusted to obtain the final subjective quality scores. From the perspective of objective IQA, the source images are treated as reference images, and the subjective observation is an FR IQA process performed by the human eye. Therefore, the information of the source image $I_r$, including its distortion level $p_r$ and subjective score $S_r$, can be considered as reference information in SSP. For all the public datasets, the source images are given with perfect image quality, so the subjective score $S_r$ equals 100 and the distortion level $p_r$ is at zero distortion.

Distortion: Parameter.
Usually, the distortion types include noise, blur, compression, transmission, and intensity deviation, which are involved in the open IQA datasets [24]. Their parameters are the standard deviation of noise $\sigma_N$, the standard deviation of the blur kernel $\sigma_B$, the bit rate per pixel $R_{BPP}$, the transmission signal-to-noise ratio $R_{SNR}$, and the intensity change value $V_I$, respectively.
For each distortion type, the image quality varies monotonically with the strength of the distortion level. Intuitively, if the distortion strength reaches one extreme, the distortion is very small or even nonexistent. This means that the distortion is at the zero-distortion level, denoted as $p_0$, such as $\sigma_N = 0$ and $\sigma_B = 0$ for noise and blur, respectively. Ideally, the distortion level of a ground truth image is at the zero-distortion level $p_0$. Following the psychological mechanism of image understanding, it is difficult for humans to distinguish the quality levels of images once they are impaired beyond a certain threshold. This threshold is marked as the zero-score distortion level $p_T$. In short, the zero-distortion level $p_0$ and the zero-score distortion level $p_T$ portray the distortion property in SSP.

HVS: Mechanism.
The HVS involves complicated psychological inference and is not a direct translation of information [25]. A vision system is subdivided into multiple parts with quite distinct functions [26]. Based on the contrast sensitivity function (CSF), HVS models have been developed as band-pass or low-pass filters, which are reviewed by Kim and Allebach in the context of halftoning [27]. In terms of spatial frequency, the most popular modulation transfer functions (MTF) of the HVS are the Gaussian model, the exponential model, the Barten model, and the compound model [28,29]. Recently, Bayesian brain theory has suggested that the brain works with an internal generative mechanism for visual perception and image understanding [30]. Therefore, we look for a function that fits the signal processing approach of IQA and also reflects the physiological and psychological mechanisms of perception. Observing the popular CSF and MTF models in [27-29], the exponential function is employed as a prototype. In the same way, we build the SSP function on an exponential function with the prior knowledge of the reference information (distortion parameter $p_r$ and subjective score $S_r$) and the distortion information (zero-distortion parameter $p_0$ and zero-score threshold $p_T$).

SSP: Function.
Firstly, the zero-distortion parameter $p_0$ and the zero-score threshold $p_T$ are treated as the zero-point (high score) and the one-point (low score) of distortion strength. To limit the strength of different distortions to a uniform interval, the input parameter $p_i$ at the $i$th distortion level is normalized by the difference between the one-point $p_T$ and the zero-point $p_0$. Secondly, to ensure that the SSP has a score corresponding to the subjective opinion of the source image, the distortion strength is taken relative to the source image, that is, as the offset from the distortion level $p_r$ of the source image to the input parameter $p_i$. Thirdly, the power of the SSP function is adjusted by a positive fading factor $\lambda$ that depends on the distortion type. Finally, the SSP is defined as an exponential function in the following way:

$$S_i = S_r \cdot e^{-\lambda \frac{p_i - p_r}{p_T - p_0}} \quad (1)$$

According to this definition, the more severe the distortion, the smaller the SSP value, which conforms to human expectations of image quality. When the reference image is the ground truth with zero distortion (so that $p_r = p_0$ and $S_r = 100$), the formula simplifies to

$$S_i = 100 \cdot e^{-\lambda \frac{p_i - p_0}{p_T - p_0}} \quad (2)$$

As shown in (2), the SSP depends only on the actual distortion information of the test images. In particular, for IQA datasets, this avoids the individual differences between observers and the unfaithful subjective scores caused by complex experimental conditions.
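As a minimal sketch, the SSP definition can be written as a short function. It assumes the exponential form $S_i = S_r \cdot e^{-\lambda (p_i - p_r)/(p_T - p_0)}$ described in this section; the function name `ssp`, its argument names, and the example zero-distortion and zero-score values are hypothetical (the actual settings are listed in Table 1), while the fading factors 3.5 and 1.4 are the ones reported later for white noise and JPEG 2000.

```python
import math


def ssp(p_i, p_0, p_t, lam, p_r=None, s_r=100.0):
    """Subjective score predictor (sketch of the exponential form described above).

    p_i : distortion parameter of the test image
    p_0 : zero-distortion parameter (best quality)
    p_t : zero-score threshold (worst distinguishable quality)
    lam : positive fading factor of the distortion type
    p_r : distortion level of the reference (defaults to p_0, i.e. ground truth)
    s_r : subjective score of the reference (100 for a perfect source image)
    """
    if p_t == p_0:
        raise ValueError("p_t and p_0 must differ")
    if p_r is None:
        p_r = p_0
    return s_r * math.exp(-lam * (p_i - p_r) / (p_t - p_0))


# Decreasing-type example (noise): larger parameter means worse quality.
# p_0 = 0.0 and p_t = 2.0 are hypothetical values.
noise_scores = [ssp(p, p_0=0.0, p_t=2.0, lam=3.5) for p in (0.0, 0.5, 1.0, 2.0)]

# Increasing-type example (bit rate): larger parameter means better quality.
# p_0 = 3.0 and p_t = 0.0 are hypothetical values.
rate_scores = [ssp(p, p_0=3.0, p_t=0.0, lam=1.4) for p in (0.5, 1.0, 2.0, 3.0)]

print(noise_scores)
print(rate_scores)
```

Because the parameter is normalized by $p_T - p_0$, the same function covers both distortion directions: the exponent is 0 at the zero-distortion parameter and $-\lambda$ at the zero-score threshold.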

The Analysis of SSP
The SSP function can satisfy the following mathematical conditions.
(i) Bounded. For the subjective experiments to be meaningful, the impaired distortion level always lies between $p_0$ and $p_T$. Therefore, the input parameter has two extreme values, $p_0$ and $p_T$. Based on the simplified formula (2), when $p_i = p_0$, we get $S_i = S_r = 100$. When $p_i = p_T$, $S_i = 100/e^{\lambda}$. Therefore, the SSP is bounded in $[100/e^{\lambda}, 100]$.
(ii) Monotonous. According to how image quality varies as the numerical value of the distortion strength increases, distortions can be divided into increasing-type distortions (a larger numerical value means less distortion and a better image) and decreasing-type distortions (a larger numerical value means more distortion and a worse image). In both cases the normalized term $(p_i - p_0)/(p_T - p_0)$ grows from 0 to 1 as the distortion becomes stronger, so the SSP changes monotonically with the distortion parameter: it increases for increasing-type distortions and decreases for decreasing-type distortions.
(iii) Invertible. From formula (1), we can obtain the image quality score of the reference image, $S_r$, by rearranging terms:

$$S_r = S_i \cdot e^{-\lambda \frac{p_r - p_i}{p_T - p_0}} \quad (3)$$

This obeys the basic prototype of formula (1), only exchanging the information of the reference image and the test image.
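(iv) Recursive. If the objective assessment $S_1$ at parameter $p_1$ is known, the subjective score of the reference image can be obtained according to formula (3) as $S_r = S_1 \cdot e^{-\lambda((p_r - p_1)/(p_T - p_0))}$. Thus, at parameter $p_2$ of the same distortion, we can calculate the objective assessment as

$$S_2 = S_r \cdot e^{-\lambda \frac{p_2 - p_r}{p_T - p_0}} = S_1 \cdot e^{-\lambda \frac{p_2 - p_1}{p_T - p_0}} \quad (4)$$

which again has the form of formula (1), now with the distorted image of score $S_1$ at parameter $p_1$ as the reference.

(v) Multiple distortions. When a ground-truth source image is degraded by two different distortions in sequence, with parameters $p_1$ and $p_2$ and fading factors $\lambda_1$ and $\lambda_2$, the SSP after the first distortion, with reference to the ground truth, is

$$S_1 = S_r \cdot e^{-\lambda_1 \frac{p_1 - p_{0,1}}{p_{T,1} - p_{0,1}}} \quad (5)$$

After the first distortion is finished, $S_1$ is the reference for the second distortion. Therefore, the SSP with reference to $S_1$ is

$$S_2 = S_1 \cdot e^{-\lambda_2 \frac{p_2 - p_{0,2}}{p_{T,2} - p_{0,2}}} \quad (6)$$

Combining formula (5) into (6), we can get the final SSP to the ground truth as

$$S_2 = S_r \cdot e^{-\lambda_1 \frac{p_1 - p_{0,1}}{p_{T,1} - p_{0,1}} - \lambda_2 \frac{p_2 - p_{0,2}}{p_{T,2} - p_{0,2}}} \quad (7)$$

where $p_{0,k}$ and $p_{T,k}$ denote the zero-distortion and zero-score parameters of the $k$th distortion, and $S_r = 100$ for a perfect source image.

Overall, the SSP model is bounded, monotonous, invertible, and recursive, so it is easy to extend to multiple distortions. The SSP function depends mainly on objective distortion parameters, so it usually preserves the order of the distortion degrees. In addition, SSP avoids subjective experiments, which vary with experimental circumstances, individual observers, and image scenes, and it is easily obtained and faithful to the real image quality.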
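The four properties above can be checked numerically with a short sketch. It assumes the exponential form of formula (1) with a reference level `p_r` and reference score `s_r`; the parameter values `P0`, `PT`, and `LAM` and the two test levels 0.8 and 1.5 are hypothetical.

```python
import math


def ssp(p_i, p_r, s_r, p_0, p_t, lam):
    """Formula (1) as assumed here: score at p_i relative to a reference (p_r, s_r)."""
    return s_r * math.exp(-lam * (p_i - p_r) / (p_t - p_0))


# Hypothetical decreasing-type distortion (larger parameter, worse image).
P0, PT, LAM = 0.0, 2.0, 3.5

# (i) bounded and (ii) monotonous on a grid of levels between P0 and PT.
levels = [PT * k / 10 for k in range(11)]
scores = [ssp(p, P0, 100.0, P0, PT, LAM) for p in levels]
lower = 100.0 / math.exp(LAM)
is_bounded = all(lower - 1e-9 <= s <= 100.0 + 1e-9 for s in scores)
is_monotone = all(a > b for a, b in zip(scores, scores[1:]))

# (iii) invertible: swap the roles of test and reference to recover S_r.
s_1 = ssp(0.8, P0, 100.0, P0, PT, LAM)
s_r_recovered = ssp(P0, 0.8, s_1, P0, PT, LAM)

# (iv) recursive: scoring level 1.5 from the ground truth or from (0.8, s_1)
# should give the same value.
s_2_direct = ssp(1.5, P0, 100.0, P0, PT, LAM)
s_2_chained = ssp(1.5, 0.8, s_1, P0, PT, LAM)

print(is_bounded, is_monotone, s_r_recovered, s_2_direct, s_2_chained)
```

The recursion works because the exponents of two consecutive steps add, so the intermediate level cancels out.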

Experiments
This section presents experiments on the LIVE II dataset [14], chosen because its distortion parameters are given. In this dataset, 29 high-resolution 24 bits/pixel color images are distorted by five distortion types: JPEG 2000, JPEG, white noise, Gaussian blur, and transmission errors simulated with a fast fading Rayleigh channel model [31]. The subjective scores are given as the difference of the Mean Opinion Score (MOS) between the reference image and the distorted image, named the Difference Mean Opinion Score (DMOS).
Initialization (the setting of the parameters). For SSP, three parameters must first be initialized for each distortion: the zero-distortion parameter $p_0$, the zero-score threshold parameter $p_T$, and the fading factor $\lambda$.
To obtain $p_0$ and $p_T$ in SSP, we tabulate the distortion information, the subjective opinions, and their correspondence in the LIVE II dataset in Table 1. Considering the discriminating ability of human observers, the prior variables $p_0$ and $p_T$ are empirically initialized as the values in italic type at the bottom of Table 1.
The fading factor $\lambda$ is set on the expectation that SSP and MOS are equivalent; that is, we expect SSP to be consistent with the MOS scores in terms of the upper limit, the lower limit, and the range of values. Since the quality of the clear reference image is taken as 100, the MOS can simply be set to 100 - DMOS. To find the appropriate $\lambda$ for each distortion type, we varied the fading factor $\lambda$ from 0.5 to 20 in steps of 0.1 in our experiment. Figure 1 illustrates the average of the differences in the upper limit, the lower limit, and the range of SSP values at each sampled $\lambda$. As shown in Figure 1, there is exactly one point with the least average difference for each distortion, at which the SSP fits the MOS well. Therefore, we set $\lambda$ = 1.4, 1.7, 3.5, 2.5, and 1.8 as the fixed fading factors of JPEG 2000, JPEG, white noise, Gaussian blur, and transmission errors, respectively, which are also recorded at the bottom of Table 1.
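The search for the fading factor can be sketched as a simple grid search. The sketch assumes that the criterion is the mean of three absolute differences (upper limit, lower limit, and range) between the SSP and MOS values, as described above; the function names, the distortion levels, and the MOS values are hypothetical and do not come from LIVE II.

```python
import math


def ssp(p, p_0, p_t, lam):
    """Simplified SSP with a perfect reference (score 100 at p_0)."""
    return 100.0 * math.exp(-lam * (p - p_0) / (p_t - p_0))


def limit_gap(pred, target):
    """Mean absolute difference of the upper limit, lower limit, and range."""
    upper = abs(max(pred) - max(target))
    lower = abs(min(pred) - min(target))
    spread = abs((max(pred) - min(pred)) - (max(target) - min(target)))
    return (upper + lower + spread) / 3.0


def fit_fading_factor(params, mos, p_0, p_t):
    """Try lam = 0.5, 0.6, ..., 20.0 and keep the value with the smallest gap."""
    grid = [round(0.5 + 0.1 * k, 1) for k in range(196)]
    return min(
        grid,
        key=lambda lam: limit_gap([ssp(p, p_0, p_t, lam) for p in params], mos),
    )


# Hypothetical distortion levels and MOS values (MOS = 100 - DMOS).
params = [0.0, 0.5, 1.0, 1.5, 2.0]
mos = [100.0, 54.0, 29.0, 15.0, 8.2]
best = fit_fading_factor(params, mos, p_0=0.0, p_t=2.0)
print(best)
```

Only the extremes of the two score sets enter this criterion, so the fit pins down the span of SSP rather than its shape between the limits.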
Experiment 1 (the effectiveness of the proposed SSP). To measure the effectiveness of our SSP function, we compare SSP with the subjective DMOS scores in the dataset and with the recent NODS index [22]. In this experiment, the linear correlation between distortion strengths and image quality scores (i.e., SSP, DMOS, and NODS) is tested using the Pearson linear correlation coefficient (PLCC) and the Spearman rank order correlation coefficient (SROCC).
One point deserves attention here. Our method directly evaluates the image quality, similar to the subjective MOS. In contrast, the NODS index is a distortion score, more like DMOS, the image quality difference from the reference image. Therefore, the correlation between the proposed SSP function and the distortion should have the opposite sign to that between DMOS and the distortion, while the correlation of NODS should have the same sign as that of DMOS. Intuitively, for increasing-type distortions such as JPEG 2000, JPEG, and fast fading, SSP should give positive correlation values, since a larger distortion parameter yields better image quality, while DMOS and NODS should have negative values in both PLCC and SROCC. In contrast, for decreasing-type distortions such as white noise and Gaussian blur, the PLCC and SROCC values should be negative for SSP but positive for DMOS and NODS, since increasing the distortion parameter makes the image quality worse.
Table 2 tabulates the values of PLCC and SROCC between distortion strengths and the DMOS, NODS, and SSP scores, respectively. From it, we can observe that DMOS shows positive correlation for decreasing-type distortions such as white noise and Gaussian blur, and negative correlation for increasing-type distortions such as JPEG 2000, JPEG, and fast fading. SSP shows the opposite correlation to DMOS, which is consistent with the previous analysis. However, NODS shows a positive correlation regardless of the distortion type. Therefore, SSP reasonably captures the human expectation of how image quality changes with distortion strength for different distortions, as DMOS does, but the NODS index fails on the increasing-type ones. In addition, both SSP and NODS show stronger linear correlation in PLCC and SROCC than DMOS, since they exclude the variation in experimental conditions and the raw score processing error of DMOS. Furthermore, since SSP is calculated from the relative degree rather than the raw numerical value of the distortion parameter, SSP has better linear correlation than NODS in PLCC.
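The correlation test of Experiment 1 can be sketched in plain Python as follows. PLCC is the usual Pearson coefficient, and SROCC is computed here as the Pearson coefficient of average ranks; the helper names, the noise levels, and the `dmos_like` scores are hypothetical and only illustrate the expected signs for a decreasing-type distortion.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank order correlation coefficient (SROCC)."""
    return pearson(ranks(x), ranks(y))


# Hypothetical decreasing-type distortion (e.g. noise standard deviation).
params = [0.1, 0.5, 1.0, 1.5, 2.0]
ssp_scores = [100.0 * math.exp(-3.5 * p / 2.0) for p in params]
dmos_like = [100.0 - s for s in ssp_scores]  # a difference-type score

print(pearson(params, ssp_scores), spearman(params, ssp_scores))
print(pearson(params, dmos_like), spearman(params, dmos_like))
```

For this decreasing-type example the quality-type score correlates negatively with the parameter and the difference-type score positively, matching the sign analysis above.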
Experiment 2 (the comparison as ground truth). SSP is proposed to provide the ground truth of distorted image quality, similar to DMOS scores. To compare the SSP scores with DMOS more clearly, Figure 2 plots the DMOS and SSP scores for the different distortions as scatter points and gives their linear trends, computed as y − detrend(y) in MATLAB.
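A linear trend line of this kind can be sketched with an ordinary least-squares fit. This is a swapped-in plain Python illustration, not the MATLAB call used in the experiment: it fits the line against explicit x values, whereas MATLAB's `detrend` works on the sample index by default. The function name `linear_trend` and the example scores are hypothetical.

```python
def linear_trend(x, y):
    """Return the least-squares line evaluated at each x (the removed 'trend')."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    return [slope * a + intercept for a in x]


# Hypothetical scattered scores over increasing distortion parameters.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [12.0, 25.0, 31.0, 48.0, 55.0]
trend = linear_trend(x, y)
residual = [b - t for b, t in zip(y, trend)]  # what detrending would leave

print(trend)
print(residual)
```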
Compared with DMOS, three observations follow from Figure 2. (i) Similar effectiveness: the linear trend line of SSP is complementary to that of DMOS, so it reflects the overall variation trend of DMOS. (ii) Clear discriminability: SSP changes monotonically as the numerical value of the distortion increases. For a given distortion strength, SSP has a unique value, while the DMOS values fluctuate. (iii) Objective quantification: SSP depends only on the distortion information, without the influence of the image scene and human subjective factors, which shows that the proposed SSP function is well suited to serve as the ground truth of IQA.
In the distortion distinction test, we check whether the SSP changes reasonably with the real image quality. Figure 3 shows a group of distorted images of a house. The corresponding values of DMOS and SSP are shown in Table 3. Figures 3(a)-3(c) are images distorted by fast fading, and their signal-to-noise ratios are 17.9, 20.3, and 22.7, respectively. In theory, the quality scores should increase from (a) to (c). The SSP values in Table 3 follow this rule, while the DMOS values do not make sense for Figure 3 [...] for SSP values in Table 3, but the DMOS has almost the same value.
Experiment 3 (the validation for IQA methods). The ground truth of image quality is always used to evaluate the performance of IQA algorithms. In this experiment, we test whether SSP scores can be used to evaluate IQA algorithms. As mentioned before, objective IQA methods are classified as FR, RR, and NR. Therefore, as representatives we select the popular FR multiscale structural similarity (MS-SSIM) [6], the recent RR entropic differencing (RRED) [32], and the NR spatial-spectral entropy-based quality (SSEQ) [33], and test them on all 634 distorted images in the LIVE II dataset. Then, we calculate the PLCC and SROCC between the values of the IQA algorithms and the ground truth, DMOS or SSP, and rank the algorithms accordingly; in Table 4, the rank is given in the parentheses following the correlation strength. It can be seen from Table 4 that the IQA algorithms have the same rank when evaluated by SSP scores as by DMOS scores. Thus, SSP is applicable as ground truth for comparing the performance of different IQA algorithms.
Experiment 4 (the stabilization for multiple distortions). For many open IQA datasets, the subjective studies obtain MOS or DMOS on images corrupted by only one distortion. However, the majority of images in practical use may be corrupted by multiple distortions [34]. This motivated us to extend our SSP as in formula (6) to provide the ground truth of multiple-distortion datasets. We apply the modified SSP to the LIVE Multiply Distorted Image Quality Database (LIVE MD) [35] to evaluate the usability of our extended method. It contains two kinds of multiple distortions: blur followed by JPEG and blur followed by noise. The detailed information about the distortions is shown in Table 5. Referring to Table 1 for LIVE II, the fading factors are selected as 2.5, 1.7, and 3.5 for blur, JPEG, and noise, respectively. Also, the zero-distortion and zero-score parameters of blur and noise are set to the same values as in Table 1, because the same quantitative indicators of distortion strength are used, while those of JPEG, which uses a different distortion indicator, are set according to the parameter limits of the MATLAB imwrite function.
To evaluate the stability of our SSP method, the LIVE MD dataset is partitioned into 15 groups, each with the same combination of two distortion parameters applied to the 15 source images. Tables 6 and 7 give the mean and deviation values of DMOS and SSP (SSP1: the reference score is the MOS of the source images; SSP2: the reference score is 100) for blur + JPEG and blur + noise, respectively. Compared with DMOS, the deviation of SSP is smaller, even when the changes in the MOS values of the source images are taken into account. Such variation cannot be avoided in subjective experiments, since the observers are always influenced by the contents of the images [35]. In contrast, the SSP computed from the real distortion parameters is objective and stable, which can be easily observed from SSP2 in Tables 6 and 7.
More directly, Figure 4 records the mean scores of 100-DMOS and SSP. It shows that the 100-DMOS values are almost the same for blur + JPEG and blur + noise, while the larger differences appear at level 0 of JPEG and noise. The proposed SSP has small values for blur + JPEG owing to the large parameter $p_0$ of the JPEG distortion, but it is effective in distinguishing the qualities of different distortion levels, integrating the multiple distortions, and staying constant at the same distortion degree.
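The difference between SSP1 and SSP2 can be sketched by chaining two single-distortion stages, where the score after the first distortion becomes the reference score of the second. The sketch assumes the exponential form used throughout; the function names, the distortion settings `blur` and `noise` (apart from the fading factors 2.5 and 3.5 quoted above), and the source MOS values are hypothetical.

```python
import math
from statistics import mean, pstdev


def stage(s_ref, p, p_0, p_t, lam):
    """One distortion step applied to an image whose current score is s_ref."""
    return s_ref * math.exp(-lam * (p - p_0) / (p_t - p_0))


def ssp_two_stage(s_r, first, second):
    """Chain two distortions given as (p, p_0, p_t, lam) tuples."""
    s_1 = stage(s_r, *first)
    return stage(s_1, *second)


# Hypothetical settings: blur followed by noise, given as (p, p_0, p_t, lam).
blur = (1.0, 0.0, 4.0, 2.5)
noise = (0.2, 0.0, 2.0, 3.5)

# Hypothetical MOS values of three source images in one group.
source_mos = [92.0, 85.5, 90.1]

ssp1 = [ssp_two_stage(m, blur, noise) for m in source_mos]     # reference = MOS
ssp2 = [ssp_two_stage(100.0, blur, noise) for _ in source_mos]  # reference = 100

dev_ssp1 = pstdev(ssp1)
dev_ssp2 = pstdev(ssp2)
closed_form = 100.0 * math.exp(-2.5 * 1.0 / 4.0 - 3.5 * 0.2 / 2.0)

print(mean(ssp1), dev_ssp1)
print(mean(ssp2), dev_ssp2)
```

With a fixed reference score of 100, every image in a group receives the same value, so the group deviation is zero; with the source MOS as reference, the deviation only reflects the spread of the source scores.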

Conclusion
A novel metric, SSP, for distorted image quality has been proposed in this paper. It is derived from distortion information and based on an exponential prototype of the human visual model. Tested on the LIVE II dataset, SSP shows strong correlation with the distortion strength, effective consistency with the subjective DMOS scores, and a reasonable ground truth for evaluating IQA algorithms. Moreover, the proposed SSP is more stable than DMOS in handling combinations of multiple distortions on the LIVE MD dataset. Therefore, it is feasible to integrate the proposed SSP into IQA datasets, not only to directly generate the ground truth of various images but also to extend subjective experiments with more distortion categories.

Figure 1: The average differences between SSP and 100-DMOS at the sampled values of $\lambda$.

Figure 2: The scores and trend lines of DMOS and SSP for different distortion categories.

Figure 3: Group of distorted images for a house image (f).

Figure 4: The average scores of SSP and 100-DMOS for LIVE MD database.

Table 1: The characteristic information of LIVE II and the parameter setting of our proposed SSP function.
This formula also follows the definition of SSP in formula (1), which refers to the distorted image with score $S_1$ of image quality at parameter $p_1$. (v) Multiple Distortions. Our SSP can be extended to multiple distortions easily. When an IQA dataset with multiple distortions is constructed, the reference image is sequentially degraded by different distortions. Given that the source image is the ground truth, and given two distortion parameters $p_1$ and $p_2$ of different distortions with factors $\lambda_1$ and $\lambda_2$, respectively, a multiply distorted image is obtained by applying first the distortion with factor $\lambda_1$ and then the distortion with factor $\lambda_2$. Therefore, after the first distortion is finished, the SSP with reference to the ground truth for the resulting distorted image is $S_1 = S_r \cdot e^{-\lambda_1((p_1 - p_{0,1})/(p_{T,1} - p_{0,1}))}$, where $p_{0,1}$ and $p_{T,1}$ are the zero-distortion and zero-score parameters of the first distortion.

Table 2: PLCC and SROCC between distortion strengths and image quality scores including DMOS, NODS, and SSP.
Table 4 records the average values of PLCC and SROCC for each IQA index. Regardless of the positive or negative direction, the ranks of the IQA algorithms by PLCC and SROCC values are given in the parentheses following the correlation strength.

Table 5: The distortion information in the LIVE MD Database.

Table 6: The mean (deviation) statistics of DMOS and SSP scores for blur + JPEG. The source images: mean (DMOS) is 89.508; deviation is 4.3648. SSP1: using the MOS values in LIVE MD dataset as reference scores $S_r$ in formula (6). SSP2: setting reference scores $S_r$ in formula (6) as 100.

Table 7: The mean (deviation) statistics of DMOS and SSP scores for blur + noise. The source images: mean (DMOS) is 89.508; deviation is 4.3648. SSP1: using the MOS values in LIVE MD dataset as reference scores $S_r$ in formula (6). SSP2: setting reference scores $S_r$ in formula (6) as 100.