With the rapid development of Internet technology, images on the Internet are used in many aspects of people’s lives. The security and authorization of images depend strongly on image quality. Several problems have emerged as a result, among which the quality assessment and denoising of images are particularly prominent. This paper proposes a novel NR-IQA method based on a dual convolutional neural network structure. The method incorporates saliency detection, motivated by the human visual system (HVS), as a weighting function that reflects how strongly the distortion in each local area affects perceived quality. The model is trained using gray and color features in the HSV color space. It is then applied to the parameter selection of an image denoising algorithm. Experiments show that the proposed method can accurately evaluate image quality during denoising, which helps the iterative parameter optimization and improves the performance of the algorithm. When the cascaded algorithm is applied to image security and authorization, we obtain both improved image quality and results consistent with subjective assessment.
With the continuous increase in the size of video surveillance networks, current large-scale video surveillance systems contain tens of thousands of cameras and many millions of images. However, they may be interfered with and attacked [
Today’s security and authorization systems are very large, and millions of images and videos are produced on the Internet every day. Under such circumstances, it is unrealistic to employ a large number of people to subjectively evaluate the quality of each monitored image and video. Therefore, how to objectively evaluate such quality to meet the needs of monitoring and authorization has become a new direction of research in the field of security and authorization.
Image quality assessment is divided into subjective and objective evaluation methods. The subjective evaluation method [
A no-reference image quality assessment (NR-IQA) method can not only evaluate image processing algorithms but also optimize the image system [
Image preprocessing is a very important part of the image system. It can provide reliable data for subsequent image processing, thereby improving detection and recognition accuracy. There are many methods, such as image denoising and image enhancement, which can eliminate interference information and purposefully enhance useful information respectively. However, there are very few methods of image preprocessing in the field of image quality evaluation, because most preprocessing operations will enhance or reduce the distortion information in the image. The existing no-reference image quality evaluation methods, such as those discussed in the literature [
Example of image local normalization.
In the method proposed by Le Kang, the average value of all local quality scores for the test image is taken as the quality value for the entire image. However, such methods do not take into account human visual perception of the image, because the content of the local image block and its position in the entire image will affect the viewer’s visual perception. For example, people are less sensitive to distortion in flat areas of an image (such as blue sky) than in complex textured areas (such as edges). Each block in the image has a different effect on human-perceived image quality. We therefore use the saliency of image blocks in the following study as a weight to represent the degree of human visual perception. Figure
Schematic detection results: (a) original image and (b) saliency map.
Previous NR-IQA methods based on deep learning, such as [
The representation of an image in the HSV color space. The first column contains the three components of the reference image in the HSV color space. The second column is the three components of the JPEG2000 distorted image in the HSV color space. (a) and (b) represent a hue component; (c) and (d) represent a saturation component; (e) and (f) represent a value component.
In this section, we will introduce the proposed CNN model in detail. Figure
CNN model architecture.
In addition, to prevent overfitting when training the CNN model, we apply dropout regularization with a ratio of 0.5 before the last fully connected layer. Because image quality assessment is a regression problem, the CNN must predict a continuous variable. Thus, the Euclidean distance loss is chosen as the learning loss function:
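As an illustration only, a Euclidean (squared L2) loss over a batch of predicted block scores can be sketched as follows. The function name `euclidean_loss` and the 1/2 scaling with a batch mean are assumptions made for this sketch; the authors' exact normalization is not given in the text.

```python
import numpy as np


def euclidean_loss(predicted, target):
    """Half the mean squared difference between predicted and target scores.

    The 0.5 factor and the batch mean are assumed conventions, not taken
    from the paper.
    """
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    return 0.5 * float(np.mean((predicted - target) ** 2))
```

The loss is zero only when every predicted score equals its target, and it grows quadratically with the prediction error.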
Through the proposed CNN model, the local image quality evaluation can be obtained. An RGB image is given, converted into a grayscale image and a hue image in the HSV color space, and then the grayscale and hue images are sampled separately. The sampled image block is then subjected to a preprocessing operation, i.e., the local contrast normalization described in (
Preprocessed grayscale image blocks.
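The local contrast normalization used as preprocessing can be sketched as below. This is a minimal version assuming the common form (I − μ) / (σ + C), where μ and σ are the mean and standard deviation in a square window around each pixel. The function name, the window radius of 3, and C = 1 are illustrative assumptions, not values confirmed by the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_contrast_normalize(image, radius=3, c=1.0):
    """Subtract the local mean and divide by the local standard deviation.

    `radius` and `c` are assumed values chosen for illustration.
    """
    image = np.asarray(image, dtype=float)
    # Reflect-pad so that every pixel has a full (2*radius+1)^2 window.
    padded = np.pad(image, radius, mode="reflect")
    size = 2 * radius + 1
    windows = sliding_window_view(padded, (size, size))
    local_mean = windows.mean(axis=(-2, -1))
    local_std = windows.std(axis=(-2, -1))
    # The constant c keeps the division stable in flat regions.
    return (image - local_mean) / (local_std + c)
```

A perfectly flat block maps to all zeros, because each pixel equals its local mean.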
For a test image G, we obtain the corresponding saliency map S through the saliency detection model. In order to obtain the image block weight corresponding to the CNN model, we adopt a nonoverlapping sampling strategy for the saliency map S. The weight of the image block is expressed as
Algorithm architecture.
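The saliency-weighted pooling of block scores can be sketched as follows. Since the weight formula is not reproduced in the text, this sketch assumes that each block's weight is its mean saliency, normalized so that the weights sum to one. The names `block_weights` and `weighted_quality` are hypothetical.

```python
import numpy as np


def block_weights(saliency, block=64):
    """Mean saliency of each nonoverlapping block, normalized to sum to 1."""
    s = np.asarray(saliency, dtype=float)
    rows, cols = s.shape[0] // block, s.shape[1] // block
    # Drop border pixels that do not fill a complete block.
    s = s[: rows * block, : cols * block]
    means = s.reshape(rows, block, cols, block).mean(axis=(1, 3))
    total = means.sum()
    if total == 0:
        # No salient region at all: fall back to uniform weights.
        return np.full((rows, cols), 1.0 / (rows * cols))
    return means / total


def weighted_quality(block_scores, weights):
    """Image-level score as the weighted sum of the block scores."""
    return float(np.sum(np.asarray(block_scores, dtype=float) * weights))
```

With uniform saliency this reduces to the plain average of the block scores, which is the pooling rule attributed above to Le Kang's method.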
Nowadays, many algorithms [
The assessment method for image quality will be applied to an image processing algorithm to achieve optimal parameter selection. Based on the success of the CNN model for image quality assessment in Section
CNN model architecture.
The ROF model is one of the most effective methods for image denoising. It formulates image denoising as the minimization of an energy function, so that the image is smoothed while its edges are well preserved. Generally, the total variation of a noisy image is larger than that of the clean image, so noise can be removed by minimizing the total variation. Therefore, the ROF model for image denoising is as follows:
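For reference, the ROF model is commonly written in the form below. The symbols are assumptions of this sketch and may differ from the authors' notation: u is the denoised image, f the noisy image, Ω the image domain, and λ the parameter that balances smoothing against fidelity to f.

```latex
\min_{u} \; \int_{\Omega} \lvert \nabla u \rvert \, dx
  + \frac{\lambda}{2} \int_{\Omega} (u - f)^{2} \, dx
```

The first term is the total variation of u, and the second term penalizes deviation from the noisy observation.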
In order to study the effect of parameter selection based on image quality on denoising results, we denoise the Park image [
(a) Iterative process of different parameter values; (b) using SSIM and CNN Q to evaluate the denoising image quality of different parameter values.
However, in this approach, image quality is assessed only after the iteration has fully converged for each parameter value. When the parameter range is large, choosing the optimum parameter therefore requires a very large number of iterations.
Therefore, we use the change in quality of the iteratively generated denoised images as the basis for determining whether the iterative algorithm has converged. Let Dm = Q(m+1) − Q(m), where Q(m) is the quality score of the denoised image produced by the m-th iteration. The convergence procedure is presented in Algorithm
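The quality-based stopping rule can be sketched as below. Since the algorithm listing is not reproduced in the text, this sketch assumes that iteration stops once the absolute quality change |Dm| falls below a tolerance. The names `step`, `quality`, `tol`, and `max_iter` are hypothetical placeholders for one denoising update, the quality model Q, and the stopping settings.

```python
def iterate_until_quality_converges(step, quality, image, tol=1e-3, max_iter=1000):
    """Repeat `step` until the quality score changes by less than `tol`.

    Returns the final image and the number of iterations performed.
    """
    q_prev = quality(image)
    for m in range(1, max_iter + 1):
        image = step(image)
        q = quality(image)
        # Difference between successive quality scores (assumed criterion).
        if abs(q - q_prev) < tol:
            return image, m
        q_prev = q
    return image, max_iter
```

In a denoising setting, `step` would be one update of the iterative solver and `quality` the CNN-based score of the current image.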
Our improvement to the ROF algorithm is as follows. The algorithm can determine whether the parameter value is the optimal parameter we need before convergence, as shown in Figure
Framework of parameter selection.
Suppose
At the end of each iteration of the ROF algorithm, we use the above-mentioned evaluation algorithm to evaluate the quality of the denoising image
If we want to further improve the performance of the parameter selection framework, we can simplify the complexity of the image evaluation network model, as described in Section
We obtain the quality evaluation of the image by weighting the quality score of the sampled image block. Therefore, the sampling strategy can affect the overall algorithm performance. In this section, we study the effect of two parameters on the performance of the algorithm: the block size of the sampled image and the number of samples per image.
(1) Block size: we assume the sampled image is nonoverlapped. In order to study the effect of image block size on the performance of our algorithm, we use fixed sampling strides and different image block sizes. The results are shown in Table
LCC and SROCC with different patch sizes.
Size | 80 | 72 | 64 | 56 | 48 |
---|---|---|---|---|---|
LCC | 0.967 | 0.966 | 0.964 | 0.958 | 0.953 |
SROCC | 0.975 | 0.964 | 0.962 | 0.957 | 0.951 |
(2) Sampling stride: we study the effect of the number of image blocks sampled on each image. We use a fixed block size of 64x64 and various strides. Figure
LCC and SROCC with different sampling stride.
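The two sampling parameters studied here, block size and stride, can be illustrated with a small sketch. The function name `sample_blocks` is hypothetical, and border pixels that do not fill a complete block are simply skipped, which is an assumption of this sketch.

```python
import numpy as np


def sample_blocks(image, block=64, stride=64):
    """Extract square blocks on a regular grid.

    A stride equal to the block size gives nonoverlapping blocks;
    a smaller stride gives more, overlapping blocks per image.
    """
    image = np.asarray(image)
    height, width = image.shape[:2]
    return [
        image[r : r + block, c : c + block]
        for r in range(0, height - block + 1, stride)
        for c in range(0, width - block + 1, stride)
    ]
```

Reducing the stride increases the number of blocks scored per image, which trades computation time for a denser estimate of the image quality.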
In order to test the performance of our proposed algorithm, we conducted experiments on the LIVE database [
Tables
Median SROCC across 100 train-test iterations on the LIVE database.
SROCC | JP2K | JPEG | WN | BLUR | FF | ALL |
---|---|---|---|---|---|---|
PSNR | 0.870 | 0.885 | 0.942 | 0.763 | 0.874 | 0.866 |
SSIM | 0.939 | 0.946 | 0.964 | 0.907 | 0.941 | 0.913 |
FSIM | 0.970 | 0.981 | 0.967 | 0.972 | 0.949 | 0.964 |
| ||||||
DIIVINE | 0.913 | 0.910 | 0.984 | 0.921 | 0.863 | 0.916 |
BLIINDS-II | 0.929 | 0.942 | 0.969 | 0.923 | 0.889 | 0.931 |
BRISQUE | 0.914 | 0.965 | 0.979 | 0.951 | 0.877 | 0.940 |
CORNIA | 0.943 | 0.955 | 0.976 | | 0.906 | 0.942 |
CNN | 0.952 | | 0.978 | 0.962 | 0.908 | 0.956 |
Proposed | | 0.968 | | 0.962 | | |
Median LCC across 100 train-test iterations on the LIVE database.
LCC | JP2K | JPEG | WN | BLUR | FF | ALL |
---|---|---|---|---|---|---|
PSNR | 0.873 | 0.876 | 0.926 | 0.779 | 0.870 | 0.856 |
SSIM | 0.921 | 0.955 | 0.982 | 0.893 | 0.939 | 0.906 |
FSIM | 0.910 | 0.985 | 0.976 | 0.978 | 0.912 | 0.960 |
| ||||||
DIIVINE | 0.922 | 0.921 | 0.988 | 0.923 | 0.888 | 0.917 |
BLIINDS-II | 0.935 | 0.968 | 0.980 | 0.938 | 0.896 | 0.930 |
BRISQUE | 0.922 | 0.973 | 0.985 | 0.951 | 0.903 | 0.942 |
CORNIA | 0.951 | 0.965 | 0.987 | | 0.917 | 0.935 |
CNN | 0.953 | | 0.984 | 0.953 | 0.933 | 0.953 |
Proposed | | 0.975 | | 0.963 | | |
It can be seen that the method we propose has a high degree of consistency with human subjective evaluation. Compared to other no-reference image quality assessment methods, our method has the best performance on the three distortion types, JP2K, WN, and FF.
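The two indicators reported in the tables above, LCC (Pearson linear correlation coefficient) and SROCC (Spearman rank-order correlation coefficient), can be computed as in the following sketch. The function names are illustrative, and tied values receive their average rank.

```python
import numpy as np


def lcc(x, y):
    """Pearson linear correlation between predicted and subjective scores."""
    return float(np.corrcoef(x, y)[0, 1])


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def srocc(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return lcc(average_ranks(x), average_ranks(y))
```

LCC measures how linearly the predictions track the subjective scores, while SROCC depends only on their ordering, so a monotonic but nonlinear predictor can reach an SROCC of 1 with an LCC below 1.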
In order to test the generalization ability of our proposed algorithm, we conducted an experiment on the TID2008 database. We trained our model on the entire LIVE database and then tested the model on the TID2008 database. In Table
Experimental LCC and SROCC indicators on the TID2008 database.
 | BRISQUE | CORNIA | CNN | Proposed |
---|---|---|---|---|
LCC | 0.882 | 0.892 | 0.920 | |
SROCC | 0.892 | 0.880 | 0.903 | |
First, we evaluate the performance of our proposed evaluation algorithm. We train our CNN model on the LIVE database and test the performance of the model on the TID2008 database. Our training data consist of images with WN and blur distortions from the LIVE database. By default, all reported results are averaged over 100 train-test iterations. In each iteration, we randomly select 80% of the distorted images as the training set and the remaining 20% as the validation set. The result is shown in Table
Results on LIVE validation sets and TID2008 test sets.
 | LCC | SROCC |
---|---|---|
LIVE validation set | 0.966 | 0.957 |
TID2008 test set | 0.9161 | 0.9013 |
We apply the parameter selection framework to the ROF denoising model. First, we use different parameters to denoise the two images and observe the denoising results. Figure
(a) Noisy Park image; (b) denoising result with
Next, we study the influence of the parameter selection framework on the iterative denoising algorithm. In our experiment, we set
Park image: (a) ROF denoising process without the parameter selection framework; (b) ROF denoising process with the parameter selection framework.
In order to further prove this conclusion, we calculated the data generated by the denoising experiments on multiple images. The experimental results are shown in Table
Performance of our parameter selection framework (A: with the parameter selection framework; B: without the parameter selection framework).
Image | | Iterations of A | CPU time of A (s) | | Iterations of B | CPU time of B (s) | Reduced iterations |
---|---|---|---|---|---|---|---|
Park | 13 | 259 | 40.125 | 13 | 658 | 79.528 | 60.7% |
Fisher | 11 | 238 | 37.524 | 11 | 932 | 122.45 | 74.5% |
Pepper | 21 | 245 | 37.283 | 21 | 709 | 88.194 | 65.4% |
Man | 11 | 217 | 30.478 | 11 | 707 | 86.526 | 69.3% |
Cameraman | 13 | 274 | 45.589 | 13 | 1063 | 132.36 | 74.2% |
Barbara | 13 | 235 | 34.235 | 13 | 817 | 106.75 | 71.2% |
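The last column of the table can be checked with a short calculation. This sketch assumes that the reduction is the relative decrease in iteration count, (B − A) / B, where A is the count with the framework and B the count without it. That assumption agrees with the tabulated percentages to within about 0.1 percentage point.

```python
def reduced_iterations(with_framework, without_framework):
    """Relative reduction in iteration count, in percent (assumed definition)."""
    return 100.0 * (without_framework - with_framework) / without_framework


# Iteration counts (A, B) copied from the table above.
counts = {
    "Park": (259, 658),
    "Fisher": (238, 932),
    "Pepper": (245, 709),
    "Man": (217, 707),
    "Cameraman": (274, 1063),
    "Barbara": (235, 817),
}
reductions = {name: reduced_iterations(a, b) for name, (a, b) in counts.items()}
```

Under this definition, every image in the table shows a reduction of more than 60% in the number of iterations.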
We propose a cascaded algorithm for no-reference image quality assessment and image denoising based on a CNN and visual saliency, to be applied in image security and authorization. The algorithm weights image blocks by their saliency to obtain a quality assessment more in line with the human visual system, and it achieves very good performance on the LIVE database. The evaluation method is embedded into an image denoising algorithm, such as the total variation denoising algorithm. We propose a parameter selection framework that judges the merits of each parameter value according to the change in quality of the denoised images generated during the iterative process. Experiments verify that parameter selection significantly reduces the number of iterations of the denoising process, while the best parameter can still be found accurately among a set of candidates.
The data used to support the findings of this study are included within the article.
The authors declare that they have no conflicts of interest.