Research Article

Towards Automatic Image Exposure Level Assessment

The quality of acquired images can be markedly reduced by improper exposure. Thus, in many vision-related industries, such as imaging sensor manufacturing and video surveillance, an approach that can routinely and accurately evaluate the exposure levels of images is urgently needed. Taking an image as input, such a method is expected to output a scalar value that represents the overall perceptual exposure level of the examined image, ranging from extremely underexposed to extremely overexposed. However, studies focusing on image exposure level assessment (IELA) are quite sporadic. It should be noted that blind NR-IQA (no-reference image quality assessment) algorithms and metrics designed to measure the quality of contrast-distorted images cannot be used for IELA. The root reason is that, although these algorithms can quantify the quality distortion of an image, they cannot tell whether the distortion is due to underexposure or overexposure. This paper aims to resolve the issue of IELA to some extent and makes two contributions. Firstly, an Image Exposure Database (IEpsD) is constructed to facilitate the study of IELA. IEpsD comprises 24,500 images with various exposure levels, and for each image a subjective exposure score is provided, which represents its perceptual exposure level. Secondly, as IELA can be naturally formulated as a regression problem, we thoroughly evaluate the performance of modern deep CNN architectures on this specific task. Our evaluation results can serve as a baseline for researchers developing more sophisticated IELA approaches. To help other researchers reproduce our results, we have released the dataset and the relevant source code at https://cslinzhang.github.io/imgExpo/.


Introduction
Exposure is the total amount of light falling on a photographic medium when capturing an image [1]. Improper exposure will inevitably reduce the quality of the acquired images, e.g., by reducing contrast. Thus, how to assess the exposure levels of images (videos) and how to correct ill-exposed images (videos) are of paramount importance in the research area of multimedia.
An exposure distortion is understood as the overall quality degradation caused by improper exposure. In many industrial fields, a method that can accurately assess the exposure levels of images is in urgent need [2][3][4][5]. For example, almost all the modern digital cameras can work in "autoexposure" mode [2]. When the user is taking images with this mode, the camera will automatically adjust relevant hardware parameters (such as the aperture, the shutter speed, and the electronic gain [6]) using a particular autoexposure algorithm to make the collected images have proper exposure levels. Obviously, in order to verify the performance of an autoexposure algorithm, a method that can accurately assess the exposure levels of acquired images is indispensable. Another example commonly encountered is in video surveillance. For video surveillance, it is very common that lighting conditions are out of the adaptive capacity of the camera. Hence, it is quite necessary to continuously monitor the exposure level of the acquired video to determine its quality [4].
At present, the commonly adopted way of judging whether an image is properly exposed relies on the experience of photographers. These kinds of schemes are costly and inefficient, lack robustness, and cannot be applied to systems requiring real-time exposure level scores. Hence, there is an urgent need to develop computational image exposure metrics.
This work tries to solve the problem of IELA (Image Exposure Level Assessment) to some extent. The ultimate goal is to obtain a computerized model that can objectively and effectively predict the overall exposure level of any given image, with prediction results that correlate well with human subjective judgements. The target algorithm should quantify exposure in a meaningful manner, which means that the same predicted exposure score should preferably correspond to the same exposure level across different image contents. Such an IELA algorithm has many potential applications. For example, it could be used to measure or to optimize the performance of autoexposure models, which are of paramount importance for imaging sensor manufacturing industries.
In order to demonstrate the objectives of our work more clearly, in Figure 1 we present six images and give their exposure scores predicted by our proposed approach IEM_SN (short for "Image Exposure Metric with ShuffleNet"; refer to Section 4 for details). It should be pointed out that exposure scores predicted by IEM_SN vary continuously from −1 to +1. "−1" implies that the assessed image is extremely underexposed, "0" implies that it is correctly exposed, and "+1" implies that it is extremely overexposed. The more the exposure score deviates from "0," the more serious the exposure distortion is. This example demonstrates that IEM_SN's predictions of images' exposure levels correlate consistently with human judgements.
The rest of this article is organized as follows. Section 2 introduces the related work, our motivations, and our contributions. Section 3 presents details of IEpsD (short for "Image Exposure Database"), which is our newly established benchmark dataset for the study of IELA. Section 4 presents our DCNN-based image exposure level assessment model, IEM_X. Experimental results and related discussions are presented in Section 5. Finally, conclusions are provided in Section 6.

Related Work and Our Contributions
In this section, we first review some representative studies most relevant to our work, including existing approaches for IELA, approaches for no-reference (NR) quality assessment of contrast-distorted images, and approaches for blind NR image quality assessment (NR-IQA). Then, our motivations and contributions are presented.

Existing Approaches for IELA.
At present, work that specializes in IELA is quite sporadic. Human experience suggests that an image's exposure level can be characterized by its luminance histogram. It is generally believed that the histogram of a correctly exposed image spreads over the whole range of luminance; by contrast, histograms of overexposed (underexposed) images are shifted to the bright (dark) side. Moreover, the more severe the exposure distortion is, the more significant the shift will be. Several IELA metrics proposed in the literature are based on exactly this hypothesis. In Liu et al.'s patent [7], three quantities, "center," "centroid," and "effective width," are first extracted from the image's luminance histogram, and then the exposure level is derived from them using predefined rules. Following an idea similar to Liu et al.'s invention, Rychagov and Efimov [8] patented a method for exposure estimation by comparing the mean of the illuminance histogram with predefined thresholds. In Romaniak et al.'s approach [4,9], the average luminance of the three blocks with the highest mean luminance is regarded as the luminance upper bound L_U, and the average luminance of the three blocks with the lowest mean luminance is regarded as the luminance lower bound L_L. Then, the exposure metric is calculated as (L_U + L_L)/2.
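The block-based metric attributed to Romaniak et al. above can be sketched in a few lines. This is only an illustrative sketch: the function name, the 32-pixel block size, and the assumption that luminance is a 2-D array scaled to [0, 1] are ours and are not specified in the text; only the use of three extreme blocks and the final (L_U + L_L)/2 step come from the description.

```python
import numpy as np

def block_exposure_metric(luma, block=32, k=3):
    """Return (L_U + L_L) / 2 from the k brightest and k darkest block means.

    luma  : 2-D array of luminance values (assumed here to lie in [0, 1]).
    block : side length of the square blocks (hypothetical default).
    k     : number of extreme blocks averaged for each bound (3 in the text).
    """
    h, w = luma.shape
    # Crop so that the image tiles exactly into block x block cells.
    h_c, w_c = h - h % block, w - w % block
    tiles = luma[:h_c, :w_c].reshape(h_c // block, block, w_c // block, block)
    means = np.sort(tiles.mean(axis=(1, 3)).ravel())
    if means.size < k:
        raise ValueError("image too small for the requested number of blocks")
    lower = means[:k].mean()    # L_L: mean of the k darkest blocks
    upper = means[-k:].mean()   # L_U: mean of the k brightest blocks
    return (upper + lower) / 2.0
```

Because the result depends only on block luminance, two correctly exposed scenes with different contents (for instance a snow field and a dark forest) would receive different values, which is the content dependence discussed later in this section.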

Approaches for NR Quality Assessment of Contrast-Distorted Images.
In most cases, improper exposure can reduce the contrast of the acquired images. Hence, studies focusing on NR quality assessment of contrast-distorted images are quite relevant to our work. The recent progress made in this area is briefly reviewed here.
On seeing that a database specially dedicated to contrast-distortion assessment was lacking, Gu et al. [10] established a database comprising contrast-changed images and their associated subjective ratings.
With respect to quality assessment models of contrast-distorted images, existing schemes can be roughly classified into two categories: the ones based on supervised learning (SL) and the ones not based on SL. Representative approaches based on SL include [11][12][13][14]. In [12], Fang et al. first derived five NSS (natural scene statistics) models, in the form of probability density functions, based on the moment (mean, standard deviation, skewness, and kurtosis) and entropy features of images in SUN2012 [15]. Then, for any given image, a set of five likelihood features can be extracted based on the learned NSS models. Finally, they adopted SVR (support vector regression) to find the mapping between the feature vectors and the perceptual quality scores. Inspired by Fang et al.'s idea [12], both Ahmed et al.'s work [11] and Wu et al.'s work [13] followed a similar "features + SVR" framework. In [11], Ahmed and Der extended the 5-D feature vector proposed in [12] to a 6-D one by introducing a new directional contrast feature derived from the curvelet domain. In [13], Wu et al. extracted a 7-D feature vector (the image mean, the image variance, the image skewness, the image kurtosis, the image entropy, the mean of the phase congruency map [16], and the entropy of the phase congruency map) from each image. In Xu and Wang's approach [14], a 4-D feature vector, consisting of the perceptual contrast of the image, the skewness, the variance, and the intensity distribution number, is extracted from each image. For the regression model mapping the feature vectors to perceptual quality scores, they resorted to a three-layer BP neural network.
Among approaches not based on SL, in the method of Panetta [17], the image is first partitioned into blocks. Then, a local quality measure is derived for each block from its maximum and minimum luminance values. Finally, an overall single measure is obtained from the local measures based on the PLIP (parameterized logarithmic image processing) model [19]. In [18], Gu et al. first removed predictable regions from the image and then regarded the entropy of the regions with maximum information as the local quality measure. They also derived a global quality measure by comparing the image's histogram with the uniformly distributed histogram of maximum information. Finally, an overall quality score was generated as the weighted mean of the local and global measures.

Approaches for Blind NR-IQA.
Another research area related to our work is blind NR-IQA, which aims to devise algorithms to predict the image's perceptual quality without knowing its high-quality reference nor its quality distortion type. Hence, the recent progress made in this area will be reviewed as well.
With respect to blind NR-IQA models, most of the existing ones are "opinion aware," meaning that they are obtained by training on a dataset comprising quality-distorted images and the corresponding subjective scores. Typical approaches belonging to this category include [26][27][28][29][30][31][32][33], and they have similar architectures. At the training phase, a set of feature vectors is first extracted from the training images, and then a regression model that maps the feature vectors to the associated subjective scores is learned. At the testing phase, given an image f to be assessed, its feature vector is extracted first and is then input into the regression model learned at the training phase to predict f's objective quality score. Different kinds of regression models are adopted in these methods, including SVR [28][31][32][33], BP (backpropagation) neural networks [27], and deep neural networks [26,29,30,34].
Having noticed the disadvantages of opinion-aware blind NR-IQA models with respect to generalization ability and training sample collection, some researchers proposed adding new vectorized labels to aid evaluation [35], and some researchers began to develop opinion-unaware IQA models. These kinds of models rely neither on quality-distorted training images nor on subjective scores. Some eminent studies in this research direction have been reported. In [36], Mittal et al. proposed the Natural Image Quality Evaluator (NIQE) model. Given an image f to be evaluated, NIQE first extracts from it a set of local features and then fits them to a multivariate Gaussian (MVG) model. The perceptual quality of f is expressed as the distance between its MVG model and the MVG model learned from a set of high-quality natural images. Inspired by [36], Zhang et al. [37] introduced three additional types of quality-aware features. At the test stage, on each patch of a test image, a best-fit MVG model is computed online. The overall quality score of the test image is then obtained by averaging the patch scores. In [38], Xue et al. synthesized a virtual image set, in which the perceptual scores of the quality-distorted images were provided by FSIM (a full-reference IQA algorithm) [39].
Then, an NR-IQA model was learned from the established dataset by patch-based clustering. In [40], Wu
Figure 1: (a-f) Six images with various exposure levels. Their exposure scores predicted by our approach IEM_SN are −0.8870, −0.5043, −0.2577, 0.1368, 0.4739, and 0.5697, respectively. The output range of IEM_SN is from −1 to +1. "−1" implies that the image is extremely underexposed, "0" implies that it is properly exposed, and "+1" implies that it is extremely overexposed.
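The MVG comparison used by NIQE-style opinion-unaware models, described earlier in this subsection, can be illustrated as follows. This is a minimal sketch, assuming the pooled-covariance distance usually associated with NIQE; the function names are hypothetical, and the extraction of the local quality-aware features themselves is not shown.

```python
import numpy as np

def fit_mvg(features):
    """Fit an MVG model (mean vector, covariance matrix).

    features : array of shape (n_samples, n_features), n_features >= 2.
    """
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return mu, cov

def mvg_distance(mu1, cov1, mu2, cov2):
    """Distance between two MVG models using the averaged covariance.

    Assumed form: sqrt((mu1 - mu2)^T ((cov1 + cov2) / 2)^+ (mu1 - mu2)),
    where ^+ is the pseudo-inverse (used for numerical robustness).
    """
    diff = mu1 - mu2
    pooled = (cov1 + cov2) / 2.0
    return float(np.sqrt(diff @ np.linalg.pinv(pooled) @ diff))
```

A larger distance from the model of pristine natural images indicates lower quality. Note that the distance is unsigned, so on its own it cannot say whether a degradation comes from underexposure or overexposure.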

Our Motivations and Contributions.
Through the literature survey, it can be found that, although IELA is a problem of paramount importance, systematic and in-depth studies in this field are still lacking.
First, all the existing IELA metrics [4][7][8][9] are derived from luminance histograms, and accordingly, their shared drawback is that they are not independent of image content. In most cases, a useful IELA metric is expected to be content-independent. However, existing IELA metrics do not satisfy this requirement because they are defined entirely on luminance histograms. As shown in Figures 2(a)-2(c), three images have the same image content, but their histograms have different distribution patterns because of their different exposure levels. The histogram of the properly exposed image (Figure 2(a)) expands over the whole luminance range, while the histogram of the overexposed (underexposed) image moves to the right (left), as shown in Figure 2(b) (Figure 2(c)). Existing IELA methods [4][7][8][9] were designed precisely on the assumption that images' perceptual exposure levels can be well characterized by their luminance histograms. However, this assumption becomes problematic when applied to images taken from various scenes. As shown in Figures 2(d)-2(f), though all three images are exposed correctly, their histogram distribution patterns differ markedly from each other owing to their different contents. As a consequence, when dealing with images similar to Figures 2(d)-2(f), IELA metrics based entirely on luminance histograms [4][7][8][9] would yield erroneous prediction results. In short, the outputs of [4][7][8][9] depend on image contents, and consequently, their accuracy in measuring the image exposure level is quite limited.
Second, blind NR-IQA algorithms or metrics used to measure the quality of contrast-distorted images cannot be used for IELA. When an image with improper exposure is fed into these algorithms, they can quantify its quality degradation caused by improper exposure, but the evaluation results cannot indicate whether the degradation is due to underexposure or overexposure. This fact is further illustrated by the examples shown in Figure 3. By perceptual evaluation, it can be found that the images in Figures 3(a)-3(c) are underexposed, properly exposed, and overexposed, respectively. Their objective scores evaluated by "NIQMC" [10], "CS-BIQA" [33], and "IEM_SN" are presented in Table 1. NIQMC is a state-of-the-art metric to measure the quality of contrast-distorted images, and a higher NIQMC score indicates higher contrast. CS-BIQA is a representative modern blind NR-IQA model, and a lower CS-BIQA score indicates higher quality. IEM_SN is our proposed IELA model (refer to Section 4 for details) trained on our established dataset used for the IELA study (refer to Section 3 for details). From Table 1, it can be seen that NIQMC and CS-BIQA can characterize an image's quality degradation quite well. However, their results do not reflect whether the examined image is underexposed or overexposed. By contrast, the proposed IELA model IEM_SN can accurately and unambiguously evaluate the exposure levels of given images. The interpretation of IEM_SN's output can be found in Section 1.
Third, there is no publicly available benchmark dataset specially designed to study the IELA problem. To design and evaluate IELA approaches, such a dataset is indispensable.
This work attempts to fill the aforementioned research gaps partially.
The major contributions are briefed as follows.
(1) To facilitate training and testing IELA models, a benchmark dataset, namely, IEpsD (Image Exposure Database), has been established. IEpsD contains 24,500 images with different exposure levels. 3,500 of them were collected from the real world, while the other 21,000 were synthesized from properly exposed source images by using our exposure simulation pipeline. For each image in IEpsD, a corresponding subjective score is provided to represent its perceptual exposure level. To our knowledge, IEpsD is the first large-scale benchmark dataset established for the study of IELA. In our experiments, synthetic images in IEpsD are used for training IELA models, while the real-world ones are used for testing. For more details about IEpsD, refer to Section 3.
(2) The problem of IELA can be formulated as a regression problem from the input image to its subjective exposure score, which can be naturally solved by DCNNs (Deep Convolutional Neural Networks [45]). Hence, in this paper, a DCNN-based model IEM_X (Image Exposure Metric using X) is proposed for IELA, which can learn an end-to-end mapping from images to their subjective exposure scores. Here "X" denotes the concrete DCNN architecture used. In experiments, a thorough evaluation has been conducted to assess the performance of modern DCNN architectures for IELA in the framework of IEM_X (refer to Section 5 for details).
We have released IEpsD and the relevant source code at https://cslinzhang.github.io/imgExpo/ to help other researchers reproduce our results.
A preliminary version of this manuscript was presented at ICME 2018 [46]. The following improvements are made in this version: (1) the database IEpsD is substantially extended and a more reasonable way to perform the subjective evaluation of exposure levels is adopted; (2) the performance of blind NR-IQA models and of metrics used to measure the quality of contrast-distorted images in addressing the problem of IELA is thoroughly investigated and analyzed; (3) a thorough performance evaluation of modern DCNN architectures in the framework of IEM_X is conducted; and (4) more competing IELA models are evaluated in experiments.

IEpsD: A Benchmark Dataset for IELA
As stated in Section 2, since the community still lacks a database specially dedicated to IELA, we are motivated to establish such a dataset in this work. This section discusses the establishment of our image exposure dataset IEpsD and its practical use. By collecting and synthesizing images of various exposure levels from different shooting scenes, IEpsD finally contains 24,500 images. Additionally, each image in IEpsD is provided with a subjective score that is expected to represent its perceptual exposure level. Three phases were involved in constructing IEpsD: collection of real-world images, generation of synthetic images, and finally subjective evaluation.

Collection of Real-World Images.
In order to accurately quantify an IELA algorithm's prediction accuracy on real data, IEpsD should include a large number of real-world images. When taking these images, the shooting scenes need to be as diverse as possible, meaning that they should cover different kinds of objects (humans, plants, animals, human-made objects, etc.), different periods of the day (morning, noon, afternoon, evening, and night), different lighting conditions, and different shooting distances. Taking these factors into consideration, we finally collected images from 500 carefully planned shooting scenarios. An iPhone 7 Plus mobile phone was used for image collection. For digital cameras, exposure levels can be modulated in three ways. The first way is by enlarging or shrinking the aperture. The larger the iris aperture is, the more light reaches the imaging sensor in a fixed period of time. The second way is by adjusting the ISO sensitivity. The last way is by varying the exposure time. To simplify data collection, we changed only the exposure time and kept the other factors unchanged to obtain 7 different exposure results, ranging from extremely underexposed to extremely overexposed.
In the end, 3,500 (7 × 500) real-world images were collected, and we denote the dataset formed by them by IEpsD_R. Thumbnails of 28 sample images selected from IEpsD_R are shown in Figure 4. In Figure 4, from top to bottom, images in each row belong to one specific shooting scenario; from left to right, the exposure levels are changing from "extremely overexposed" to "extremely underexposed."
Figure 2: (a-c) Three images having the same contents but different exposure levels, along with their luminance histograms. (a) is properly exposed, while (b) and (c) are overexposed and underexposed, respectively. (d)-(f) are three images that are all properly exposed; however, their luminance histograms are quite different from each other due to their different contents.
Figure 3: By perceptual evaluation, the images in (a), (b), and (c) are underexposed, properly exposed, and overexposed, respectively. Their objective scores predicted by different metrics are presented in Table 1.
Table 1: Objective scores of images in Figure 3 obtained by different metrics (columns: Method, Figure 3(a), Figure 3(b), Figure 3(c)).

Generation of Synthetic Images with Various Exposure Levels.
To obtain an IELA model with satisfactory generalization capability, a large-scale training dataset, comprising a large number of images with various exposure levels, is indispensable. Unfortunately, establishing such a real-world dataset is extremely costly and laborious. To overcome this difficulty, we propose to use synthetic images for training IELA models. In the computer vision community, researchers have recently found that the use of synthetic images can effectively alleviate the problem of insufficient real training data. This has spurred the development of pipelines for synthesizing photo-realistic images. Synthetic data have already been explored to train models for problems such as object detection [47], semantic segmentation [48], and optical flow estimation [49]. In this paper, we propose a novel method for generating synthetic images with various exposure levels from properly exposed source images.
Suppose that $I$ is a given properly exposed source image. A synthetic image $\tilde{I}$ with a different exposure level can be created by modulating $I$'s illumination and saturation channels. In order to manipulate the illumination and saturation channels separately, we first convert $I$ from the RGB space to the HSV space. Denote the illumination channel and the saturation channel of $I$ by $I_v$ and $I_s$, respectively. Similarly, denote the illumination channel and the saturation channel of $\tilde{I}$ by $\tilde{I}_v$ and $\tilde{I}_s$, respectively. $\tilde{I}_v$ is generated by adjusting $I_v$ according to equation (1), where $x$ denotes the spatial location and $\theta$ is a global parameter controlling the amount of illumination adjustment. $\theta$ should be positive when simulating an overexposed image and negative when simulating an underexposed one.
In addition, $I_s$ needs to be adjusted to $\tilde{I}_s$ accordingly. As suggested by Romaniak et al. [4], the mapping function between $I(x)$'s exposure level $E_{in}(x)$ and its saturation value $I_s(x)$ conforms to an inverse asymmetric logit function (I-ALF), given by equation (2), where $a$, $b$, and $c$ are three given constants. $\tilde{I}(x)$'s exposure level $E_{out}(x)$ can be obtained by shifting $E_{in}(x)$ by a desired offset $eps$, i.e.,
$$E_{out}(x) = E_{in}(x) + eps. \quad (3)$$
At last, $\tilde{I}(x)$'s saturation value $\tilde{I}_s(x)$ can be calculated from $E_{out}(x)$ by the asymmetric logit function (ALF) of equation (4). Putting equations (2)-(4) together, we obtain the formula for adjusting $I_s$ to $\tilde{I}_s$ (equation (5)). In our implementation, $a$ is set to −3.2 and $c$ is set to 0.4. By altering the values of the parameters $\theta$ and $eps$, we can synthesize a series of variants of $I$ with different exposure levels. Specifically, to construct IEpsD, seven exposure levels were synthesized. In other words, from each properly exposed source image, seven images (including the source image itself) having different exposure levels, ranging from "extremely underexposed" to "extremely overexposed," were obtained. Sample synthetic images generated by our proposed scheme are shown in Figure 5. In Figure 5, images in the first column are the properly exposed source images, based on which the synthetic ones are generated. Columns 2-4 are the synthetic results of overexposed images, while columns 5-7 show the synthetic results of underexposed ones. By visual inspection, it can be found that the synthetic images produced by our scheme look quite natural and agree well with human perception.
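The three-step saturation adjustment described above (map saturation to an exposure level, shift it by eps, map back) can be sketched as a simple function composition. The concrete ALF and I-ALF with constants a, b, and c are not reproduced here; as a clearly labeled stand-in, the sketch below uses the plain logistic function and its inverse, and all function names are hypothetical.

```python
import math

def logistic(e):
    """Stand-in for the ALF (assumption: plain logistic, not the paper's form)."""
    return 1.0 / (1.0 + math.exp(-e))

def logit(s, tiny=1e-6):
    """Stand-in for the I-ALF: inverse of `logistic`, clamped away from 0 and 1."""
    s = min(max(s, tiny), 1.0 - tiny)
    return math.log(s / (1.0 - s))

def shift_saturation(s, eps, alf=logistic, inverse_alf=logit):
    """Adjust one saturation value s in [0, 1] by an exposure offset eps."""
    e_in = inverse_alf(s)   # saturation -> input exposure level E_in
    e_out = e_in + eps      # E_out = E_in + eps
    return alf(e_out)       # output exposure level -> new saturation
```

Any invertible pair can be passed through `alf` and `inverse_alf`; with eps = 0 the composition returns the input saturation unchanged, which is a useful sanity check when substituting the actual ALF.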
To establish the synthetic image dataset, we collected a set of properly exposed images from the Internet. Four volunteers (postgraduate students from Tongji University, Shanghai, China) were involved, and each of them was asked to search for 1000 high-quality images covering four categories: people, plants, animals, and man-made objects. Then, each of the 4000 collected images was visually examined by seven volunteer observers (undergraduate students from Tongji University). If no fewer than five of the seven observers confirmed that the image being examined was properly exposed, the image was retained. In this way, 3000 images were selected, and they were used as source images for generating the synthetic ones. Note that none of the images used here is included in IEpsD_R. Finally, using our proposed synthetic image generation model, 21,000 (7 × 3000) synthetic images were generated, and we denote the dataset formed by them by IEpsD_S.
To demonstrate the reliability of the synthetic images in the established dataset, we compare the real images and the generated images in terms of brightness, which reflects the plausibility of the simulated exposure levels to some extent. Figure 6 shows the comparison of the average brightness distributions of the real images and the synthetic images. The X-coordinate is the normalized average brightness value of the image. The Y-coordinate is the number of images. The seven colors of the bars represent the different exposure levels. It can be seen from the figure that the distributions of the average brightness of the real images and the synthetic images under the various exposure levels are similar, which indicates that the proposed algorithm for generating synthetic images is reasonable and effective.
The final dataset IEpsD comprises two parts, IEpsD_R (formed by real-world images) and IEpsD_S (formed by synthetic images).

Subjective Evaluation for IEpsD.
Once the image set IEpsD is ready, the next step is to assign to each image in IEpsD a subjective score that reflects its perceptual exposure level. The subjective evaluations were conducted following a single-stimulus strategy [50]. The reason for choosing a single-stimulus methodology instead of a double-stimulus one was that the number of images to be assessed (24,500 in total) was too large for a double-stimulus study. Subjective evaluations were performed on identical workstations. The monitors of the workstations were all 22-inch LCD monitors, and their screen resolutions were all set to 1920 × 1080. Evaluations were conducted in an indoor environment with normal illumination. Matlab software was developed to assist the subjective study. The lab setup is illustrated in Figure 7. Subjects taking part in the subjective evaluation were all undergraduate students of Tongji University, and they were inexperienced with image exposure level assessment. The number of subjects evaluating each image was 20.
To each participant, we explained the goal of the experiment and the experimental procedure. We also showed each participant the approximate range of image exposure levels and the corresponding scoring results in a short training session. Note that the images used in the training session were different from those used in the actual experiment. During the subjective evaluation, images were displayed to a subject in random order, and the randomization differed from subject to subject. A subject reported his/her judgement of the exposure level by dragging a slider on a quality scale. The quality scale was marked both numerically and textually and was divided into five equal portions, which were labeled "Extremely Underexposed," "Underexposed," "Normally Exposed," "Overexposed," and "Extremely Overexposed," respectively. After the subject evaluated the image, the position of the slider was converted into an integer exposure score by uniformly mapping the entire quality scale to the range [−50, 50]. In this way, raw exposure scores obtained from subjects were integers falling in the range [−50, 50]. The closer the score is to "0", the more likely the image is exposed normally. A score smaller than "0" means the examined image is underexposed, and a score above "0" means the image is overexposed. Moreover, the more the exposure score deviates from "0," the more serious the exposure distortion is.
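The slider-to-score conversion described above can be sketched as follows. This is an illustration only: representing the slider position as a fraction in [0, 1], the rounding rule, and the function names are assumptions, and the label boundaries simply follow from dividing [−50, 50] into five equal portions.

```python
LABELS = [
    "Extremely Underexposed",
    "Underexposed",
    "Normally Exposed",
    "Overexposed",
    "Extremely Overexposed",
]

def slider_to_score(position):
    """Map a slider position in [0, 1] uniformly to an integer in [-50, 50]."""
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie in [0, 1]")
    return int(round(position * 100.0 - 50.0))

def score_to_label(score):
    """Return the textual label of the fifth of the scale containing `score`."""
    index = min(int((score + 50) // 20), 4)  # each portion spans 20 units
    return LABELS[index]
```

For example, a slider left at the midpoint yields a raw score of 0, which falls in the "Normally Exposed" portion.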
Mathematical Problems in Engineering
Next, some postprocessing steps were applied to the subjects' raw scores. At first, to eliminate the influence of the subjects' different evaluation standards, the raw scores $d_{ij}$ were normalized as
$$z_{ij} = \frac{d_{ij} - \bar{d}_i}{\sigma_i},$$
where $d_{ij}$ is the exposure score of the image $I_j$ given by the ith subject, $\bar{d}_i$ is the mean score of the ith subject, $\sigma_i$ is the standard deviation of his/her scores for all images, and $z_{ij}$ is the normalized score of the image $I_j$ given by the ith subject. Then, we used a strategy similar to the one mentioned in [51] to filter out those heavily biased subjective scores, which satisfied
$$\left| z_{ij} - \bar{z}_j \right| > T \cdot \sigma_j,$$
where $\bar{z}_j$ is the mean of the normalized scores of $I_j$, $T$ is a threshold constant, and $\sigma_j$ is the standard deviation of $I_j$'s normalized scores. The mean of the remaining evaluation scores of $I_j$ was deemed as $I_j$'s subjective exposure score $s_j$:
$$s_j = \frac{1}{N_j} \sum_{i \in \mathcal{V}_j} z_{ij},$$
where $\mathcal{V}_j$ is the set of subjects whose scores for $I_j$ were retained and $N_j$ is the number of valid subjective scores for $I_j$.
Finally, $s_j$ is linearly rescaled to the range [−1, 1]. Now, for each image $I_j$ in IEpsD, we get a subjective score $s_j$ that reflects its perceptual exposure level.
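The postprocessing of raw scores described above (per-subject normalization, rejection of heavily biased scores, averaging, and rescaling) can be sketched with NumPy. This is a minimal sketch under stated assumptions: the function name, the threshold value T = 2.0, the use of population standard deviations, and the min-max form of the final linear rescaling are all hypothetical choices that the text does not specify.

```python
import numpy as np

def subjective_scores(raw, threshold=2.0):
    """Turn raw scores d_ij into per-image subjective scores in [-1, 1].

    raw       : array of shape (n_subjects, n_images) holding d_ij.
    threshold : the constant T used for outlier rejection (assumed value).
    Assumes no subject gave identical scores to every image (sigma_i > 0).
    """
    raw = np.asarray(raw, dtype=float)
    # Per-subject normalization: z_ij = (d_ij - mean_i) / sigma_i.
    z = (raw - raw.mean(axis=1, keepdims=True)) / raw.std(axis=1, keepdims=True)
    # Per-image statistics of the normalized scores.
    mean_j = z.mean(axis=0, keepdims=True)
    std_j = z.std(axis=0, keepdims=True)
    # A score is kept unless |z_ij - mean_j| > T * sigma_j.
    valid = np.abs(z - mean_j) <= threshold * std_j
    # s_j: mean of the valid normalized scores of image j.
    s = (z * valid).sum(axis=0) / valid.sum(axis=0)
    # Linear rescaling to [-1, 1] (assumption: min-max mapping).
    return 2.0 * (s - s.min()) / (s.max() - s.min()) - 1.0
```

A different linear map, for instance dividing by the largest absolute value so that 0 stays fixed, would also satisfy the description; the choice matters for how "0" is interpreted and should be checked against the released dataset.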

Practical Use of IEpsD.
In addition to being used for IELA research, our dataset also has great potential in lots of relevant fields like high dynamic range (HDR) and image exposure correction.
HDR images provide a wider dynamic range and more image details, and they reflect the visual effects of the real environment better than ordinary images. The most common way to capture HDR images is to take a series of low dynamic range (LDR) images at different exposures and then merge them into an HDR image [52]. IEpsD contains image sequences that are very diverse and often depict complex scenes with multiple objects. The images in each sequence share the same content while having different exposure levels, so they can be used to generate HDR images and to conduct related studies.
For image exposure correction, IEpsD can be used as a benchmark dataset to evaluate correction methods via full-reference image quality assessment (FR-IQA) metrics. It provides properly exposed, overexposed, and underexposed images and associated subjective scores.

IEM X : A DCNN-Based IELA Model
In this section, we discuss how to build an IELA model. It is desired that such a model can accurately and efficiently predict the perceptual exposure level of a given image. Such a problem can be naturally formulated as a regression problem, which can be well addressed by DCNN (Deep Convolutional Neural Network) models [45] in an end-to-end manner, mapping from input images to their associated exposure levels.
As is widely known, in the last five or six years, thanks to the emergence and maturity of DCNN, the field of multimedia processing has developed rapidly. In essence, DCNN is a representation learning technology [45]. During training, when a large amount of raw data is provided to the DCNN model, it can automatically discover a suitable internal representation of the data. Today, in many technical fields, DCNN-based approaches usually perform much better than non-DCNN-based ones due to the availability of larger training sets, deeper models, better training algorithms, and more powerful GPUs. The first CNN was invented by LeCun in 1989 [53], and since 2012, more elegant and powerful DCNN architectures have been continuously proposed in the literature, such as AlexNet [54], VGG [55], GoogLeNet [56], ResNet [57], DenseNet [58], and ShuffleNet [59].
We denote the proposed DCNN-based IELA model by IEM X , where "IEM" is short for "Image Exposure Metric" and "X" represents the concrete DCNN model used (in this paper, four specific DCNN models are investigated in the framework of IEM X , including GoogLeNet [56], ResNet [57], DenseNet [58], and ShuffleNet [59]). For training IEM X , the established dataset IE ps D_S described in Section 3 is used. The loss function is defined as

$$\ell(W) = \frac{1}{N} \sum_{j=1}^{N} \left( \mathrm{IEM}_X(I_j; W) - s_j \right)^2 + \lambda \|W\|_F^2,$$

where $W$ denotes the weights of the network, $\mathrm{IEM}_X(I_j; W)$ is the exposure level predicted by the network for $I_j$, $\lambda$ is a regularization parameter, $I_j$ is the $j$th training image whose subjective exposure level is $s_j$, $\|W\|_F$ returns $W$'s Frobenius norm, and $N$ is the number of training samples in IE ps D_S. Implementation details of IEM X are presented in Section 5.1. The general framework of IEM X is presented in Figure 8.
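To make the structure of such a regularized regression objective concrete, the following Python sketch evaluates a loss of this kind for a batch of predictions. It assumes a mean squared error data term and a squared Frobenius-norm penalty; the function name, the argument layout (weights given as a list of flattened layers), and the example value of the regularization parameter are hypothetical and chosen only for illustration.

```python
def iela_loss(predictions, targets, weights, lam=1e-4):
    """Regularized regression loss: data term plus weight penalty.

    predictions: predicted exposure levels, one per training image.
    targets:     subjective exposure levels s_j of the same images.
    weights:     network weights as a list of flattened layers (assumption).
    lam:         regularization parameter (example value, an assumption).
    """
    n = len(targets)
    # Data term: mean squared difference between prediction and target.
    data_term = sum((p - s) ** 2 for p, s in zip(predictions, targets)) / n
    # Penalty: squared Frobenius norm, i.e., the sum of all squared weights.
    frobenius_sq = sum(w * w for layer in weights for w in layer)
    return data_term + lam * frobenius_sq
```

In a deep learning framework, the data term would normally be computed on mini-batches and the penalty would typically be applied through the optimizer's weight decay setting rather than by hand.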

Implementation Details of IEM X .
Four state-of-the-art or representative DCNN architectures, including GoogLeNet [56], ResNet [57], DenseNet [58], and ShuffleNet [59], are investigated in the framework of IEM X , and the corresponding concrete IELA models are referred to as IEM GN , IEM RN , IEM DN , and IEM SN , respectively. IEM X s were trained on IE ps D_S. For training IEM X , we used the fine-tuning strategy, i.e., IEM X was fine-tuned from the deep model pretrained on ImageNet [60] for the task of image classification. For GoogLeNet, ResNet, DenseNet, and ShuffleNet, the models pretrained on ImageNet were provided by their authors and we used them directly in IEM X . TensorFlow [61] was used as our deep learning platform. Key hyperparameters used when training IEM X s were set as "optimizer" = "ADAM" [62], "learning rate" = 0.001, "batch size" = 8, and "weight decay" = 0.0001.

Test Protocol.
The collected dataset IE ps D_R was used to evaluate the approaches' capability for predicting the image's perceptual exposure level. The performance of representative blind NR-IQA models, QA models for contrast-distorted images, and models specially designed for IELA was thoroughly studied and analyzed.
Four widely accepted metrics are adopted to evaluate the performance of the competing methods. The first two are the Spearman rank-order correlation coefficient (SROCC) and the Kendall rank-order correlation coefficient (KROCC). Both of them compute the correlation coefficients between the objective scores predicted by the IELA models and the subjective exposure scores provided by the dataset. SROCC is defined as

$$\mathrm{SROCC} = 1 - \frac{6 \sum_{i=1}^{M} d_i^2}{M (M^2 - 1)},$$

where $d_i$ is the difference between the $i$th image's ranks in objective and subjective judgements and $M$ is the number of images in the test set. KROCC is defined as

$$\mathrm{KROCC} = \frac{M_c - M_d}{\frac{1}{2} M (M - 1)},$$

where $M_c$ and $M_d$ are the numbers of concordant and discordant pairs in the test set, respectively. SROCC and KROCC are both nonparametric rank-based correlation metrics, implying that they depend only on the rank of the data points.

The third metric is the Pearson linear correlation coefficient (PLCC) between subjective scores and objective scores after a nonlinear mapping. Denote by $\{s_i\}_{i=1}^{M}$ and $\{o_i\}_{i=1}^{M}$ the set of subjective scores and the set of corresponding objective scores, respectively. First, a nonlinear mapping given by the following regression function [20] is applied to $o_i$:

$$q_i = \beta_1 \left( \frac{1}{2} - \frac{1}{1 + \exp\left( \beta_2 (o_i - \beta_3) \right)} \right) + \beta_4 o_i + \beta_5,$$

where $\beta_k$, $k = 1, 2, \ldots, 5$, are the model parameters that could be fitted using a nonlinear regression process to maximize the correlation between $\{q_i\}_{i=1}^{M}$ and $\{s_i\}_{i=1}^{M}$. After that, we can compute the PLCC value by

$$\mathrm{PLCC} = \frac{\sum_{i=1}^{M} (q_i - \bar{q})(s_i - \bar{s})}{\sqrt{\sum_{i=1}^{M} (q_i - \bar{q})^2 \sum_{i=1}^{M} (s_i - \bar{s})^2}},$$

where $\bar{q}$ and $\bar{s}$ are the means of $\{q_i\}_{i=1}^{M}$ and $\{s_i\}_{i=1}^{M}$, respectively. The last metric is the RMSE (root mean squared error) between $\{s_i\}_{i=1}^{M}$ and $\{q_i\}_{i=1}^{M}$, which is defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{M} \sum_{i=1}^{M} (s_i - q_i)^2}.$$

Different from SROCC and KROCC, PLCC and RMSE can measure the prediction accuracy of IELA models. A better IELA model is anticipated to have higher SROCC, KROCC, and PLCC values and a lower RMSE value.

Evaluations of QA Models for Contrast-Distorted Images.
As we have stated in Section 2.2, in most cases, improper exposure can decrease the image's contrast. Thus, the studies focusing on quality assessment of contrast-distorted images (QACDI) are quite relevant to our work, and it is reasonable to examine how well they address the problem of IELA. Therefore, in this experiment, we evaluated the performance of six eminent QACDI models on IE ps D_R. The QACDI models evaluated included logAME [17], NR-CDIQA [12], NIQMC [18], and the methods in [11,13,14]. Note that NR-CDIQA and the models in [11,13,14] are based on supervised learning and were trained on a subset of CSIQ [21] comprising contrast-distorted images with associated subjective quality scores.
The evaluation results are listed in Table 2. We also list the results of our IELA model IEM SN in Table 2 for comparison.
Existing blind NR-IQA models can be classified into two categories, opinion-aware ones and opinion-unaware ones. The opinion-aware models are obtained by training on a dataset comprising distorted images and associated subjective scores, while the opinion-unaware ones do not require those kinds of training sets. BRISQUE [31], SSEQ [28], OG-IQA [27], NOREQI [32], CS-BIQA [33], and HyperIQA [63] are opinion-aware ones, and in this experiment, we used the trained models provided by their authors (for these five blind NR-IQA schemes, models provided by the authors were all trained on the entire LIVE dataset [20]). The other five models, NIQE [36], QAC [38], IL-NIQE [37], LPSI [40], and TCLT [42], are opinion-unaware ones.

Figure 8: The general framework of the proposed DCNN-based IELA model IEM X . I is the input image, and s is its predicted perceptual exposure level. "X" is a specific DCNN model.
Results of this experiment are reported in Table 3. In addition, we also list the results of our IELA model IEM SN in Table 3 for comparison.  Table 4.

Evaluations
To draw more convincing conclusions about the performance of the models, some statistical analysis is necessary [64]. We performed a left-tailed F-test [65] based on the prediction residuals of each model. The results of the significance test are shown in Figure 9. It can be seen that our method performs significantly better than all the other models.

5.6. Discussion.
Based on the experimental results reported in Sections 5.3–5.5, the following conclusions can be drawn.
(1) Existing QACDI models or blind NR-IQA models cannot address the problem of IELA well. From the results presented in Tables 2 and 3, it can be seen that, with these models, the assessment results of images' exposure levels do not correlate well with the subjective evaluations. Specifically, the best-performing QACDI model is Xu and Wang's model [14], whose SROCC value is 0.6716, and the best-performing blind NR-IQA model is QAC, whose SROCC value is 0.5415. Both of them perform much worse than the approaches specially designed for IELA, whose results are reported in Table 4. The poor performance of blind NR-IQA algorithms and QACDI models should be mainly attributed to the fact that they cannot tell whether the quality distortion is caused by overexposure or underexposure. Another reason is that none of the existing datasets commonly used to train IQA models comprises image samples with associated subjective exposure level scores.
(2) The proposed DCNN-based IELA model IEM X performs extremely well for predicting perceptual exposure levels of real-world images. From the results listed in Table 4, it can be seen that all the variants of IEM X achieve high SROCC, KROCC, and PLCC values and low RMSE values. IEM X performs considerably better than the other IELA models evaluated for comparison. In particular, IEM SN performs the best among all the models evaluated, with SROCC and PLCC values of 0.9850 and 0.9750, respectively.
(3) The proposed method for generating synthetic images with various exposure levels is quite reasonable. In order to provide sufficient data for training the DCNN-based IELA model IEM X , we propose a method to generate synthetic images with various exposure levels as described in Section 3.2. With this strategy, we generated the dataset IE ps D_S, based on which IEM X was trained. Then, IEM X was tested on IE ps D_R, consisting of real-world images. In other words, IEM X was trained on synthetic images, but it was tested on real-world images. The results reported in Table 4 demonstrate that even though IEM X s were trained on synthetic data, they perform quite well in predicting a real-world image's exposure level. This fact implies that the scheme we proposed for generating synthetic images with various exposure levels is quite effective. Such a scheme significantly reduces the cost of preparing data for training IELA models. How to effectively make use of synthetic data to solve vision problems should be given more attention by researchers.

Conclusion and Future Work
IELA models are highly desired in some vision-related industries. However, systematic studies specially focusing on this issue are still lacking. This work attempts to fill this research gap, and the contributions are from two aspects. First, an Image Exposure Database, namely, IE ps D, containing 24,500 images with multiple exposure levels, was established. For each image in IE ps D, we provide a subjective exposure score representing its perceptual exposure level. IE ps D can serve as a benchmark to train and test IELA models. To the best of our knowledge, IE ps D is the first of its kind. Second, we formulated the IELA problem as a regression problem and proposed a DCNN-based solution IEM X . Four specific DCNN architectures, GoogLeNet, ResNet, DenseNet, and ShuffleNet, were investigated in the framework of IEM X . Experimental results show that IEM X yields much better exposure level prediction performance than all the competing methods. Experimental results also corroborate that blind NR-IQA models or QACDI models could not yield acceptable performance when being exploited to address the IELA issue. In the near future, we will consider how to embed IELA metrics into the design of autoexposure algorithms.

Data Availability
The relevant source code and dataset have been made publicly available at https://cslinzhang.github.io/imgExpo/.

Conflicts of Interest
The authors declare no conflicts of interest.

Figure 9: The results of the significance test on dataset IE ps D_R. A value of 1 means that the first model (indicated by the row) has better performance than the second model (indicated by the column), with a confidence level of greater than 95%. A value of 0 means that the first model is not significantly better than the second model. If the value is always 0 no matter which of the two models is the first one, there is no significant difference in the performance of the two models.