Estimation of Particulate Levels Using Deep Dehazing Network and Temporal Prior

Particulate matter (PM) has become one of the important pollutants that deteriorate public health. Since PM is ubiquitous in the atmosphere, it is closely related to quality of life in many different ways. Thus, a system to accurately monitor PM in diverse environments is imperative. Previous studies using digital images have relied on individual atmospheric images and have not benefited from the spatial and temporal effects of image sequences. This weakness undermines predictive power. To address this drawback, we propose a predictive model using a deep dehazing cascaded CNN and temporal priors. The temporal prior accommodates instantaneous visual changes, and PM concentration is estimated from residuals between the original and dehazed images. The present method also provides, as a by-product, high-quality dehazed image sequences superior to those of nontemporal methods. The improvements are supported by various experiments under a range of simulation scenarios and assessments using standard metrics.


Introduction
Particulate matter (PM) consists of small particles suspended in the air, generally having an aerodynamic diameter smaller than or equal to 10 μm (micrometers). PM originates from anthropogenic activities (e.g., combustion of fossil fuel, dust) as well as natural sources (e.g., mineral dust, volcanic ash). PM measurements are commonly made for particles with an aerodynamic diameter smaller than or equal to 2.5 μm (PM2.5) and 10 μm (PM10). The size of PM is directly associated with health problems [1]. Inhaling small particles is known to be hazardous, as they can penetrate deep into the respiratory system [2]. In this regard, PM2.5 has been widely used as a key indicator in the air quality index, and thus, we focus on PM2.5 in this study (hereafter abbreviated as PM for brevity). Many experts point out that the recent increases in PM in many parts of the world are attributable to the rapid growth in global energy consumption [3]. Over the years, efforts have been made to identify adverse effects of PM on public health and the environment [4][5][6]. Notably, the large-scale retrospective cohort study of lung cancer by the World Cancer Institute reported that PM is a primary carcinogen, as the risk of lung cancer increased by 22% for an increase in PM of 10 μg/m³ [7]. Air pollution from PM has been one of the most controversial issues in East Asia. It is no longer negligible, and the media as well as researchers try to inform the public of its detrimental effects [8]. In particular, it was reported that the annual mean PM concentration in South Korea is twice as high as in the Organisation for Economic Co-operation and Development (OECD; https://www.oecd.org/) countries [9]. Under these circumstances, it is imperative to build an accurate air monitoring system to facilitate public alerts and prevention. The Korean government drastically expanded the PM monitoring network to help improve the PM forecast service to a remarkable extent.
However, there are still not enough PM monitoring stations to cover the whole country. Moreover, most of the stations are located in urban areas (e.g., Seoul, Busan), leaving many suburban and rural areas undermonitored.
Two different types of approaches can be used to estimate PM concentrations: sensor-based approaches and vision-based approaches.
1.1. Sensor-Based.
Improvements in PM measurement using sensor-based approaches have focused on developing more precise sensor units [10]. There are two types of devices [11], i.e., microbalance PM monitoring stations (accurate but expensive [12]) and portable light-scattering-based PM monitors [13,14]. For instance, the Korea Meteorological Administration (KMA) now operates 475 measuring stations and publicly reports PM concentration levels (i.e., limited to the station vicinity) every hour [15]. Most of the instruments operated by KMA are Tapered Element Oscillating Microbalance (TEOM) devices, which are designed to directly weigh PM on the filter [11]. Although highly accurate, TEOM is relatively expensive to install and maintain (approximately 200K USD per year) [16] and is subject to space limitations, which undermines its practicality. The light-scattering method is relatively affordable. In [11,17], PM collected via an airflow structure is measured by densely deployed sensors. However, both methods rely on large-scale sensing nodes and inevitably suffer from expensive maintenance costs for high coverage and reliability. Recently, several newly developed devices for mobile facilities (e.g., balloons and drones) have attracted interest [10,18,19], but carrying sensors to acquire data is still highly energy-intensive and less practical.

1.2. Vision-Based.
This approach is less explored than the sensor-based approach, so there is considerable room for improvement. To the best of our knowledge, all the vision-based studies exploit only individual images (e.g., [20][21][22]). Such single-image approaches are fairly sensitive to the motion blur frequently caused by camera or subject movements. Specifically, Liu et al. [20] manually determined several regions of interest targeting distant objects to derive the transmission map. The transmission map has proven its explanatory power for PM estimation [20][21][22][23]. However, the need to select regions of interest is a drawback. Li et al. [21] leveraged heterogeneous data composed of GPS, camera lens, magnetic sensor, official station, and image data. Combining these data, they generated high-dimensional features using kernel methods. Pan et al. [22] extracted haze effects using the Adaptive Transmission Map [24] and passed the derived features to a deep neural network designed on the basis of the well-known Boltzmann machine [25]. Since transmission values in a local patch (a.k.a. window) are assumed to be constant, the dehazed images inevitably contain blocking artifacts [22,26,27].
The rest of this paper is organized as follows. In Sections 2 and 3, we introduce related works and the proposed method, respectively. Section 4 describes the numerical experiments. We discuss the results and conclude in Section 5.

Related Work
According to the atmospheric scattering model (ASM), two factors are involved in the formation of a haze image: (1) direct attenuation and (2) airlight. When we take a photo, the reflected radiance coming from objects is attenuated on its way to the camera. This is due to the effects of atmospheric absorption (a.k.a. direct attenuation), and the intensity of attenuation increases with the camera distance (scene depth). In addition, there is another light, called airlight, resulting from the scattering of neighboring light sources (e.g., the sun) by haze [28]. Importantly, airlight is known to shift the color range of the object, and direct attenuation describes the scene radiance and its decay. Figure 1 illustrates how image degradation occurs under haze conditions. With a little algebra, this process can be formulated as follows:
$$I(x) = J(x)\,t(x) + A\,(1 - t(x)) \tag{1}$$
where x ∈ ℝ² denotes the pixel, I ∈ ℝ³ is the observed haze image in RGB channels, J ∈ ℝ³ is the haze-free image, t ∈ [0, 1] is the medium transmission describing the portion of the scene radiance that reaches the camera, and A is the global atmospheric light. Note that A is assumed to be homogeneous throughout the image and is determined using empirical techniques. The first term J(x)t(x) on the right-hand side of equation (1) represents direct attenuation, and the second term A(1 − t(x)) corresponds to airlight. The farther the objects are from the camera, the thicker the atmospheric layer between them.
Accordingly, the transmission is expressed as
$$t(x) = e^{-\beta d(x)} \tag{2}$$
where β is the scattering coefficient of the atmosphere, and d(x) is the distance from the object to the camera. Equation (2) indicates that the scene radiance is attenuated exponentially with the scene depth d. Here, it is intuitive that some haze effect H(x) deteriorates the haze-free image J(x) (i.e., the image obtained when t(x) = 1), causing the haze image I(x). Starting from this simple idea, we can consider the following relation:
$$I(x) = J(x) + H(x) \tag{3}$$
In image dehazing (a.k.a. haze removal), we restore J, A, and t from the observed haze image I.
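The scattering model described in this section can be illustrated on a single pixel. The following is a minimal Python sketch, not the authors' implementation: the function names, the toy pixel values, and the chosen scattering coefficient and atmospheric light are illustrative assumptions. It computes the transmission from depth, forms the haze pixel from direct attenuation plus airlight, and checks that the haze effect equals the residual between the haze pixel and the haze-free pixel.

```python
import math

def transmission(depth, beta):
    """Medium transmission t = exp(-beta * d): decays exponentially with depth."""
    return math.exp(-beta * depth)

def hazy_pixel(j, t, a):
    """Observed pixel I = J*t + A*(1 - t), applied to each RGB channel."""
    return [jc * t + ac * (1.0 - t) for jc, ac in zip(j, a)]

def haze_effect(j, t, a):
    """Haze effect H = J*(t - 1) + A*(1 - t), i.e. the residual I - J."""
    return [jc * (t - 1.0) + ac * (1.0 - t) for jc, ac in zip(j, a)]

# Toy values (assumptions): one haze-free pixel, grey airlight, beta = 0.1 per metre.
J = [0.20, 0.40, 0.60]
A = [0.90, 0.90, 0.90]
t = transmission(depth=10.0, beta=0.1)

I = hazy_pixel(J, t, A)
H = haze_effect(J, t, A)
residual = [ic - jc for ic, jc in zip(I, J)]
```

With zero depth the transmission is 1 and the haze pixel coincides with the haze-free pixel, so the haze effect vanishes; as depth grows, every channel drifts toward the atmospheric light.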

Proposed Method
The ultimate goal of our study is to measure PM from image sequences (i.e., video clips), with the deep neural network serving as a feature extractor. To achieve this goal, we first train the deep dehazing network to extract informative features that correlate strongly with PM concentration levels. In this section, we introduce two strategies that work together in the network: (1) deep compression for energy efficiency in the model architecture and (2) temporal priors to capture haze-related features in image sequences. The specific design of our dehazing network is presented in Figure 2. [29]. Inspired by this, we formulate the feature extraction network (FEN) at the front part of our dehazing network with deep cascaded convolutional layers of 436 kernels as in Figure 2. However, stacking deep convolutions introduces too many parameters and a high computational cost. We thus seek to simplify the network without loss of accuracy. To address this hurdle, we compress the FEN using network pruning (NP), which has been widely used to reduce network complexity and prevent overfitting [30][31][32]. We also eliminate unnecessary parameters in our network without performance loss. Since the local feature is more important than the global feature in image restoration [33], we reduced the number of kernels in a row (see Table 1). We apply the P1CL, a layer of 1 × 1 convolutions [34], to the image restoration network (IRN) that appears in the second half of our dehazing network in Figure 2. The P1CL has proven effective at enhancing the representational power of neural networks [34]. GoogLeNet exploits this technique in its inception modules, both to deepen the network and to reduce the dimensions inside these modules [35]. Inspired by GoogLeNet, we place the P1CL in front of the IRN to reduce the dimension of the feature maps accumulated through the FEN. This amounts to more than dimension reduction alone.
In doing so, we compress the feature maps so as to narrow the scope of the features involved. Besides, the P1CL also uses rectified linear activation [36]. In general, each P1CL consists of one or more 1 × 1 convolutions followed by a nonlinear activation function (e.g., ReLU), which adds nonlinearity and thereby helps approximate a highly nonlinear function such as the ASM. Taken together, we form the IRN to approximate the following equation, derived from equation (3) with a little algebra:
$$H(x) = J(x)\,(t(x) - 1) + A\,(1 - t(x)) \tag{4}$$
where H(x) denotes the haze effect and the two terms J(x)(t(x) − 1) and A(1 − t(x)) correspond to the lost scene radiance (e.g., by scattering or absorption) and the shift of the scene color, respectively. The main reason for parallel processing in the IRN is to divide the computational tasks and thereby reduce the burden on the network. The upper and lower units estimate the lost scene radiance and the shift of the scene color, respectively. Thanks to the parallel architecture, the IRN can process both tasks simultaneously, which also helps prevent overfitting through reduced network complexity. The residual between the original and dehazed images can serve as an explanatory variable related to haze effects for predicting PM levels. Motivated by this, we model the relationship between the haze-effect variables and PM levels. Using components stored in the estimated haze effects Ĥ (in Figure 3), we can estimate PM concentration levels.
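A 1 × 1 convolution mixes channels independently at each pixel, which is why it can shrink the number of feature maps without altering the spatial layout. The sketch below is a hedged plain-Python illustration of that idea, not the authors' P1CL: the function name `pointwise_conv_relu`, the nested-list layout, and the toy weights are all assumptions chosen for readability.

```python
def pointwise_conv_relu(feature_map, weights, bias):
    """Apply a 1x1 convolution followed by ReLU.

    feature_map: H x W x C_in nested lists
    weights:     C_out x C_in (one row per output channel)
    bias:        C_out values
    """
    out = []
    for row in feature_map:
        out_row = []
        for pixel in row:
            # Each output channel is a weighted sum of the input channels.
            mixed = [
                sum(w * v for w, v in zip(w_row, pixel)) + b
                for w_row, b in zip(weights, bias)
            ]
            out_row.append([max(0.0, m) for m in mixed])  # ReLU
        out.append(out_row)
    return out

# Toy example (assumed values): reduce 4 input channels to 2 output channels.
features = [[[1.0, 2.0, 3.0, 4.0], [0.0, -1.0, 2.0, 0.5]]]   # 1 x 2 x 4
weights = [[0.5, 0.5, 0.0, 0.0],     # output channel 0
           [0.0, 0.0, 1.0, -1.0]]    # output channel 1
bias = [0.0, 0.0]

reduced = pointwise_conv_relu(features, weights, bias)
```

The spatial size (1 × 2) is unchanged while the channel count drops from four to two, and negative responses are clipped to zero by the activation.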

Temporal Prior.
Since PM consists of infinitesimal particles floating in the air, it can be characterized by a transport phenomenon such as flow motion in video. Although we cannot directly identify the flow motion of PM by tracking all particles, we can indirectly discover its effects through the variability across image sequences in video. As introduced by Kim et al. [37], the differences between consecutive original frames in the two cases (safe ≤ 80 μg/m³ and harmful > 80 μg/m³) are clearly distinctive. This previous work supports the hypothesis that differences across frames carry significant information about PM concentration levels. This is motivated by the fact that moving objects create subtle changes. Going beyond the previous work, and inspired by the notion of a prior in Bayesian inference, we impose the fluid flow model proposed by Xie et al. [38] in order to additionally accommodate feature variations:
$$F_t = T(F_{t-1}, P_{t-1}) \tag{5}$$
where t denotes time and F, T, and P denote the fluid, the transport operator (a.k.a. advection or warp operator), and the temporal prior, respectively. Note that p_t ∈ ℝ², the element of P_t ∈ ℝ³, is the vector that holds the velocity and flow direction field. For brevity, we replace the existing physical prior P_{t−1} with P̄_{t−1}, where P̄_{t−1} denotes the average over a period of time prior to t. More precisely, Figure 4 describes how the proposed temporal prior is built from the sequential frames within the prior length: for each pixel, the RGB values are averaged across the image frames. Taken together, we combine the original RGB (3 channels) and its corresponding temporal priors (3 channels) into augmented inputs with a total of six channels (see the first half of the network in Figure 2). When exploiting priors that are almost consistent across image sequences, the network is expected to produce more consistent haze effects than the model with no temporal prior.
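The construction of the six-channel input can be sketched as follows. This is a minimal plain-Python illustration under stated assumptions: frames are H × W × 3 nested lists, the prior is the per-pixel mean over the `prior_length` frames that precede the current one, and the helper names `temporal_prior` and `augment_with_prior` are hypothetical rather than taken from the authors' code.

```python
def temporal_prior(frames):
    """Average RGB values per pixel over a list of H x W x 3 frames."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [
            [sum(f[r][c][ch] for f in frames) / n for ch in range(3)]
            for c in range(width)
        ]
        for r in range(height)
    ]

def augment_with_prior(frames, index, prior_length):
    """Stack frame `index` (3 channels) with the prior built from the
    `prior_length` frames before it (3 channels) into 6 channels per pixel."""
    if index < prior_length:
        raise ValueError("not enough earlier frames for the requested prior length")
    prior = temporal_prior(frames[index - prior_length:index])
    current = frames[index]
    return [
        [list(px) + pr for px, pr in zip(row, prior_row)]
        for row, prior_row in zip(current, prior)
    ]

# Toy clip (assumed values): three frames of size 1 x 2.
clip = [
    [[[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]],
    [[[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]],
    [[[1.0, 1.0, 1.0], [0.25, 0.5, 0.75]]],
]
six_channel = augment_with_prior(clip, index=2, prior_length=2)
```

Whether the averaging window should include the current frame, and how long it should be, are design choices this sketch does not settle; here the window strictly precedes the current frame.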

Datasets.
In this section, we describe the haze video datasets that we use. One of the challenges in creating real video datasets is that consecutive haze frames and the corresponding haze-free frames must be perfectly matched. To circumvent this, artificial haze video datasets based on existing videos can be used to train dehazing networks [39][40][41]. Yet, synthetic haze effects hardly accommodate the natural flow motions that disperse light, and they thereby distort pixel values. Instead, we collected datasets from various environments, including both indoor and outdoor areas (refer to the sample haze images in the Supplementary Material). One note of caution is that, since indoor environments are not directly exposed to the outside atmosphere, it is difficult to collect indoor video clips with high PM levels. Therefore, we had to open all the windows when PM was high or, at times, directly generate PM by burning incense or spraying potassium chloride. The thumbnail images are shown in

predictive power, we compare the true haze effects with the predicted haze effects. To do so, we fit three regression-type models: (1) random forest regression (RFR), (2) support vector regression (SVR) with the radial basis function (RBF) kernel, and (3) multilayer perceptron regression (MLPR). The metric to evaluate prediction accuracy is as follows: where AQI_ground and AQI_predicted refer to a statistic (e.g., mean, entropy, or variance of H(x)) from the ground truth and the proposed model, respectively. Table 3 summarizes the accuracy results for the test sets, and all indoor and outdoor scenarios are equally assigned.
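The statistics of H(x) named above (mean, entropy, and variance) can be computed along the following lines. This is a hedged sketch using only the Python standard library. The text does not specify how the entropy is computed, so the histogram-based Shannon entropy, the number of bins, the value range, and the function name `haze_statistics` are assumptions.

```python
import math
from collections import Counter
from statistics import mean, pvariance

def haze_statistics(haze_values, bins=8, lo=0.0, hi=1.0):
    """Summarise haze-effect magnitudes by mean, variance, and histogram entropy."""
    width = (hi - lo) / bins
    # Assign each value to a histogram bin, clamping to the valid bin range.
    counts = Counter(
        min(bins - 1, max(0, int((v - lo) / width))) for v in haze_values
    )
    total = len(haze_values)
    # Shannon entropy (in bits) of the empirical bin distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "mean": mean(haze_values),
        "variance": pvariance(haze_values),
        "entropy": entropy,
    }

# Toy haze-effect magnitudes (assumed values) flattened from one frame.
stats = haze_statistics([0.1, 0.1, 0.6, 0.6])
```

Each clip would then yield one such feature vector, which could be passed to a regressor such as the RFR, SVR, or MLPR models mentioned in the text.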

Indoor and Outdoor Environment.
We collected over 2,000 video clips from an indoor office and corridors at Konkuk University over several months. Here, the proposed model achieves its best accuracy, 86.72%, when adopting MLPR with the prior and the entropy statistic. Especially notable is that accuracy tends to increase when the temporal prior is applied, across all scenarios. In this sense, the results indicate that temporal priors provide the network with additional haze-related information. In addition, indoor environments are believed to be less sensitive to environmental factors, and thus, it is understandable that the indoor experiments outperform the outdoor ones as a whole.
For the outdoor experiments, we selected two populated locations in South Korea amid residential areas and a building complex. Over several months, we collected more than 3,000 video clips at each location. For reliability, we made sure that the outdoor data cover a wide range of PM levels. In Table 3, SVR with entropy presents 72.16% and 82.45% for nonprior and prior, respectively. Interestingly, priors in outdoor data allow considerable accuracy

Experimental Chamber.
In simulations, it is essential to assess diverse environmental conditions. To this end, we specially designed an experimental chamber to reproduce the conditions of interest. The experiments were carried out with respect to four factors: wind, temperature, humidity, and illuminance. In the course of the experiments, the other confounding factors were controlled in the experimental chamber. We gathered approximately 1,000 video clips across all experimental conditions. In Table 4, the results show that MLPR with entropy consistently presents high accuracy, over 80%, for almost all scenarios. Therefore, the results confirm that this prior-based model adequately controls environmental confounding factors that may intervene in haze effects.

Discussion
Undoubtedly, the latest AI has mainly focused on vision-based techniques (e.g., RGB and Lidar). Nonetheless, in the AI domain, infinitesimal materials still remain underestimated due to their invisible nature. In this regard, vision-based PM measurements offer many advantages in flexibility and accessibility for real-time air quality monitoring and extension to larger spatial scales. Given our findings, even with a low-cost optical sensor, the proposed method can offer further benefits in business and cost savings in practice. Moreover, from a methodological perspective, it also serves as a predictive model using the deep cascaded CNN and temporal prior. Compared to existing vision-based predictive models, the proposed model additionally accommodates temporal prior features among image sequences, aiming to improve predictive power by virtue of data augmentation. With various simulation designs (e.g., real data and experimental chamber data), we confirm that the proposed models are superior to the traditional models without temporal priors, showing outstanding predictive power. Further improvements can be made by finding the optimal length of frames for maximizing predictive power. This effort can facilitate porting the model to a gauging device and promote practical utility, since all vision-based models strongly depend on well-controlled radiance, which at times considerably discourages vision-based measurement techniques. To address this issue, we plan to develop alternative prediction models that account for particulate-related features only. We leave these topics for future research.

Data Availability
All the datasets and the codes are available at the author's website (http://www.hifiai.pe.kr/).

Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.