A Statistical Image Feature-Based Deep Belief Network for Fire Detection

Detecting fires is of great significance for guaranteeing the security of buildings and forests. However, it is difficult to quickly and accurately detect fire stages in complex environments because of the large variations in the fire features of color, texture, and shape in flame and smoke images. In this paper, a statistical image feature-based deep belief network (DBN) is proposed for fire detection. First, for each individual image, statistical image features extracted from the flame and smoke image in the time domain, frequency domain, and time-frequency domain are calculated to construct training and testing samples. Then, the constructed samples are fed into a DBN to classify the multiple fire stages in complex environments. The DBN automatically learns fire features layer by layer using restricted Boltzmann machines (RBMs). Experiments on benchmark data consisting of three groups of fire and fire-like images are classified by the present method, and the classification results are compared with those of the commonly used support vector machine (SVM) and convolutional deep belief networks (CDBNs) to demonstrate the superior classification accuracy.


Introduction
Accurate and timely detection of fires is important to save lives and to reduce property and economic losses. Recently, fire detection methods have been developed to monitor forest fires, civil infrastructure, and industrial fires [1][2][3][4][5][6]. The International Association of Fire and Rescue Services (CTIF) reported 23,535 building fire incidents in 18 cities around the world in 2017 [7]. Therefore, accurate and timely detection of fires using sensors is of great significance for protecting public safety [8].
Fire features, such as heat, gas, flame, and smoke, are the most commonly used in fire detection techniques to monitor fires. Point sensors are the most direct and commonly used fire sensing techniques in engineering applications [9][10][11]. However, point sensors may produce high false positive rates when signal processing techniques are used to extract features in complicated fire environments [11]. Fire images are effective and direct representations of fires in complex outdoor environments, which has attracted numerous researchers to construct modern fire alarm systems. Smoke or flame is the first signal of a fire; therefore, effective smoke or flame detection plays an important role in fire detection. The color, location, height, and optical density are the features of smoke, which is a combination of gases, airborne solids, and liquid particulates [12][13][14][15]. However, color-based smoke detection is not always reliable because gray or black smoke can be confused with nonsmoke pixels of other objects [16]. The exothermic reactions of fuel and oxidant create flames with various colors, flickering motions, and dynamic texture features. Spatial and temporal wavelet analysis combined with Weber contrast analysis and color segmentation was presented to enhance the dynamic texture features of flames [17]. However, although flames have distinct color features, color-based methods fail to recognize objects in the background with fire-like colors, which makes choosing a suitable color model a difficult problem. In general, once fires occur in buildings and forests, they are reflected in the observed smoke and flames with certain characteristics. However, the smoke or flame features are usually too weak to be observed due to strong background noise and other disturbances.
Traditional machine learning methods include support vector machines (SVMs) [18], fuzzy neural networks [19], the smoke segmentation-based local binary pattern Silhouettes coefficient variant (LSPSCV) [20], a color segmentation-based radial basis function (RBF) nonlinear Gaussian kernel binary SVM [21], a color segmentation-based fuzzy model [22], and fire frame segmentation based on Markov random fields [23]. However, most existing fire detection studies train models on smoke or flames from videos and often find it difficult to track more complicated smoke situations. Moreover, traditional machine learning methods only consider fire smoke or flames against an ideal background without much disturbance. Therefore, these artificial-intelligence fire detection models still have obvious shortcomings in learning complicated nonlinear relationships and thus have limited representation capacity.
Deep learning, a new branch of machine learning, has shown excellent ability in learning features from raw images [24]. The most distinctive property of deep learning models is their multilayer structure. With multiple hidden layers stacked hierarchically, a deep learning model can realize very complicated transformations and abstractions of raw images [25][26][27][28]. The deep belief network (DBN) is a generative deep learning model with powerful feature learning ability [29].
Pundir and Raman adopted DBNs to recognize fires accurately and robustly across a variety of scenarios, such as wildfire smoke videos, hill-base smoke videos, and indoor or outdoor smoke videos [30]. A DBN was carefully designed to extract nonlinear features for a better description of the important trends in a combustion process [31]. Kaabi et al. [32] developed a Gaussian mixture model (GMM) and the corresponding energy attitude of the smoke region based on RGB rules to preprocess smoke or flame images for DBN classification. Wang et al. [33] extended DBNs to coal-fired boilers to further predict NOx emissions. The main advantage of such intelligent diagnosis solutions is that the DBN does not rely on manual feature extraction and selection.
In this paper, a new fire detection method based on a DBN is proposed to detect fires using smoke and flame images. First, statistical image features extracted from the raw images are obtained to characterize the fire status. Second, training and testing samples are constructed from the statistical image features. Finally, the DBN is employed to identify the fire status from smoke or flame images under strong background noise and other disturbances. Compared with other commonly used methods, the classification accuracy of the proposed method is demonstrated using open-access experimental data.

Basic Theory of DBN
A DBN is a neural network with multiple hidden layers, which allow it to learn complex functions and thereby complete data transformation and abstraction through a successive learning process. The main architecture of a DBN is an ensemble of stacked RBMs; every two adjacent layers compose an RBM. Here, the DBN is composed of three stacked RBMs and an output layer. The learning process of a DBN consists of two stages: one is pretraining every individual RBM layer by layer in an unsupervised manner; the other is fine-tuning the whole network with the backpropagation algorithm in a supervised manner. More details can be found in References [34][35][36].
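The two-stage learning can be sketched with a minimal NumPy RBM trained by one-step contrastive divergence (CD-1) and stacked greedily. The 200-100-50 layer sizes match the DBN configured later in the paper; the data, learning rate, and epoch count are illustrative, and the supervised backpropagation fine-tuning stage is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli restricted Boltzmann machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def train_step(self, v0):
        # Positive phase: hidden activations driven by the data
        h0 = self.hidden_probs(v0)
        # Negative phase: one Gibbs step (CD-1); visible units kept as
        # probabilities for simplicity rather than binarized
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ self.W.T + self.b_v)
        h1 = self.hidden_probs(v1)
        # Gradient approximation: data statistics minus model statistics
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

# Greedy layer-wise pretraining of a 200-100-50 stack
X = rng.random((32, 40))          # 32 stand-in feature vectors
inputs, rbms = X, []
for n_hidden in (200, 100, 50):
    rbm = RBM(inputs.shape[1], n_hidden)
    for _ in range(10):
        rbm.train_step(inputs)
    inputs = rbm.hidden_probs(inputs)  # output feeds the next RBM
    rbms.append(rbm)
print(inputs.shape)  # (32, 50)
```

After pretraining, the stacked weights would initialize a feed-forward network whose output layer is then fine-tuned with labels.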

Overview.
The diagram of the fire detection method is shown in Figure 1. The general procedures are summarized as follows:

Step 1. Predefine smoke or flame patterns.
Step 2. Collect the raw images of the smoke or flame patterns using the image capture system, respectively. Moreover, raw images of unknown status are also collected using the same image capture system.
Step 3. Calculate feature descriptors from the raw images, i.e., color moments in HSV space, statistical features of the gray-scale image, statistical features of the gray-level co-occurrence matrix, the local binary pattern histogram, and wavelet transform decomposition (WTD)-based statistical features of the gray-scale image, which will be discussed in detail later.
Step 4. Construct the training samples for the known smoke or flame patterns and the testing samples for unknown smoke or flame patterns using the feature descriptors.
Step 5. Train each individual RBM layer by layer and then fine-tune the DBN (DBN learning).
Step 6. Obtain the unknown smoke or flame patterns after DBN learning process.

Color Moments in HSV Space
RGB space is the most commonly used color representation. However, researchers [37, 38] found that HSV space is closer to human visual perception and is well suited to low-level visual feature extraction. HSV refers to hue (0 to 360°, H), saturation (0 to 1, S), and value (0 to 1, V). In HSV space, colors are represented as combinations of the three channels. For any channel of HSV color space with pixel values p_j (j = 1, ..., N), the first-order moment (denoted as M_1) is

M_1 = (1/N) ∑_{j=1}^{N} p_j, (1)

the second-order moment (denoted as M_2) is

M_2 = ((1/N) ∑_{j=1}^{N} (p_j − M_1)^2)^{1/2}, (2)

and the third-order moment (denoted as M_3) is

M_3 = ((1/N) ∑_{j=1}^{N} (p_j − M_1)^3)^{1/3}. (3)

Figure 2(a) shows a smoke image in RGB space, and Figure 2(b) shows the same image in HSV space. The two spaces do not show much difference, except for a certain region of concern. Figure 2(c) shows the detection result of the smoke region in RGB space, and Figure 2(d) shows the same result in HSV space. It can be seen that HSV space has better properties. Therefore, the first, second, and third moments M_1, M_2, and M_3 of the three channels (H, S, V) generate a 9 × 1 statistical feature vector F1.
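The three moments in equations (1) to (3) can be sketched per channel with NumPy; the random stand-in HSV image and its value range are illustrative only.

```python
import numpy as np

def color_moments(channel):
    """First three color moments of one flattened color channel."""
    p = np.asarray(channel, dtype=float).ravel()
    m1 = p.mean()                          # first moment: mean, eq. (1)
    m2 = np.sqrt(np.mean((p - m1) ** 2))   # second moment: std dev, eq. (2)
    m3 = np.cbrt(np.mean((p - m1) ** 3))   # third moment: cube root of the
    return m1, m2, m3                      # third central moment, eq. (3)

rng = np.random.default_rng(1)
hsv = rng.random((64, 64, 3))   # stand-in HSV image, channels in [0, 1]
F1 = np.concatenate([color_moments(hsv[..., c]) for c in range(3)])
print(F1.shape)  # (9,)
```

Concatenating the three moments of H, S, and V yields the 9 × 1 vector F1 described above.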

Statistical Features in Gray-Scale Image.
Typical statistical features of a gray-scale image [39] are considered in this part, as listed in Table 1. A gray-level histogram represents the frequency distribution of pixels over gray-level bins: it counts similar pixels and stores the counts. The histogram analyzes the statistics of single pixels at each gray level. It can reflect changes caused by translation, rotation, and viewing angle on individual parts of an image. Entropy is calculated based on the gray-level histogram. Altogether, the 7 statistical features of the gray-scale image generate a 7 × 1 statistical feature vector F2. There is a certain gray-level relationship between two pixels separated by a certain distance in image space, i.e., the spatial correlation characteristics of gray levels in the image. The gray-level co-occurrence matrix (GLCM) [40] is a tool that describes such relations by measuring the co-occurrence frequencies among pairs of gray-level pixel values. The GLCM is obtained from the statistics of pixel pairs at a certain distance in the image. The elements on the diagonal of the GLCM tend to have larger values when the image is composed of blocks of pixels with similar gray values, while the elements away from the diagonal have larger values when the image pixels vary locally.
The GLCM reflects comprehensive gray-level co-occurrence information about an image. Through the GLCM, we can analyze the local patterns and arrangement rules of the image. To describe texture, statistics are usually computed on the basis of the GLCM. Table 2 gives the five typical GLCM statistics used here, which form the 5 × 1 statistical feature vector F3.
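The F2 and F3 computations can be sketched as below. The exact feature sets of Tables 1 and 2 are not fully recoverable here, so a typical choice is assumed: mean, variance, median, skewness, kurtosis, histogram entropy, and standard deviation for F2; and contrast, energy, homogeneity, correlation, and entropy for F3.

```python
import numpy as np

def gray_stats(img):
    """7 gray-scale statistics forming F2 (an assumed set; Table 1 is authoritative)."""
    p = np.asarray(img, dtype=float).ravel()
    mean, var, std = p.mean(), p.var(), p.std()
    skew = np.mean((p - mean) ** 3) / (std ** 3 + 1e-12)
    kurt = np.mean((p - mean) ** 4) / (std ** 4 + 1e-12)
    hist, _ = np.histogram(p, bins=256, range=(0, 256), density=True)
    entropy = -np.sum(hist[hist > 0] * np.log2(hist[hist > 0]))
    return np.array([mean, var, np.median(p), skew, kurt, entropy, std])

def glcm(img, levels=8):
    """Normalized GLCM for the horizontal (0, 1) pixel offset."""
    q = (np.asarray(img) * levels // 256).clip(0, levels - 1)
    m = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        m[i, j] += 1            # count co-occurring gray-level pairs
    return m / m.sum()          # normalize to joint probabilities

def glcm_stats(p):
    """5 typical GLCM statistics forming F3 (assumed to match Table 2)."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    corr = np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j + 1e-12)
    nz = p[p > 0]
    entropy = -np.sum(nz * np.log2(nz))
    return np.array([contrast, energy, homogeneity, corr, entropy])

rng = np.random.default_rng(2)
img = rng.integers(0, 256, (64, 64))
F2, F3 = gray_stats(img), glcm_stats(glcm(img))
print(F2.shape, F3.shape)  # (7,) (5,)
```

In practice scikit-image's `graycomatrix`/`graycoprops` would compute the GLCM statistics over several offsets and angles; the single horizontal offset here keeps the sketch short.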

Local Binary Pattern Histogram.
The local binary pattern (LBP) is an operator used to describe the local texture features of an image [41]. It is multiresolution and invariant to gray-scale and rotation changes; therefore, it is a good measure for image feature extraction, as shown in Figure 3. The LBP is often combined with histograms to represent images as feature vectors, i.e., the LBP histogram (LBPH). As a visual descriptor, the LBP can be obtained directly using the scikit-image package [42], which needs at least two parameters: the number of circularly symmetric neighbor set points and the radius of the circle. Histograms are then extracted from the generated LBP image; we use 18 bins to obtain the final histogram of the raw image, i.e., a feature vector denoted as b1 to b18.
From Figure 3, we can see that different parts of the frequency histogram capture a certain texture of the image. Therefore, the local binary pattern histogram is chosen here as a feature for smoke/flame detection. F4 is an 18 × 1 statistical feature vector formed from the frequency histogram. (The symbol definitions for Tables 1 and 2, e.g., P_n for the n-th pixel, N for the total pixel number, and the kurtosis, entropy, and inverse difference moment formulas, are given in the table notes.)
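A minimal sketch of the LBPH computation, using a plain 8-neighbor, radius-1 square variant in NumPy rather than scikit-image's circularly symmetric operator; the stand-in image and bin range are illustrative.

```python
import numpy as np

def lbp_histogram(img, bins=18):
    """Basic 8-neighbor LBP codes plus an 18-bin density histogram (F4)."""
    img = np.asarray(img, dtype=float)
    c = img[1:-1, 1:-1]                       # center pixels
    # 8 neighbors, clockwise from the top-left pixel
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c, dtype=int)
    for bit, (dy, dx) in enumerate(shifts):
        nb = img[1 + dy:img.shape[0] - 1 + dy,
                 1 + dx:img.shape[1] - 1 + dx]
        # set the bit where the neighbor is at least the center value
        code += (nb >= c).astype(int) << bit
    hist, _ = np.histogram(code, bins=bins, range=(0, 256), density=True)
    return hist   # the b1..b18 feature vector

rng = np.random.default_rng(3)
img = rng.integers(0, 256, (64, 64))
F4 = lbp_histogram(img)
print(F4.shape)  # (18,)
```

With scikit-image, `skimage.feature.local_binary_pattern(img, P, R)` would replace the loop, with P neighbor points and radius R as in the text.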

WTD-Based Statistical Features of Gray-Scale Image.
WTD can be understood as successive low-pass and high-pass filtering. Therefore, an image after WTD is a set of subimages containing different details of the raw image. Figure 4 shows a 3rd-level WTD: L3 is the low-resolution approximation of the original image, and H3_h, H3_v, and H3_d; H2_h, H2_v, and H2_d; and H1_h, H1_v, and H1_d are the wavelet subimages of horizontal, vertical, and diagonal details at the respective decomposition levels. Due to its simplicity and efficiency, the Haar wavelet is widely used, especially in image processing fields such as feature detection and image compression; therefore, the Haar wavelet is chosen in the present method. Figure 5 shows the corresponding subimages after the 3rd-level WTD. The final 7 subimages are denoted as L3, H3_h, H3_v, H3_d, H2_h, H2_v, and H2_d. As can be seen from Figure 5, the raw fire image and these 7 subimages show different details. We calculate the statistical features listed in Table 1 for each of the 7 subimages. Finally, we get 7 feature vectors, one per subimage, denoted W1 to W7. In the present investigation, we focus only on the lower frequency bands, and the subimages H1_h, H1_v, and H1_d are not considered.
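The 3rd-level decomposition can be sketched directly with NumPy averages and differences (a library such as PyWavelets would normally be used); the /2 normalization and the 64 × 64 stand-in image are illustrative.

```python
import numpy as np

def haar2d(x):
    """One level of a 2-D Haar transform: returns the approximation LL and
    the horizontal, vertical, and diagonal detail subimages."""
    a = (x[0::2, :] + x[1::2, :]) / 2      # row averages (low-pass)
    d = (x[0::2, :] - x[1::2, :]) / 2      # row differences (high-pass)
    ll = (a[:, 0::2] + a[:, 1::2]) / 2     # approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2     # horizontal details
    hl = (d[:, 0::2] + d[:, 1::2]) / 2     # vertical details
    hh = (d[:, 0::2] - d[:, 1::2]) / 2     # diagonal details
    return ll, lh, hl, hh

rng = np.random.default_rng(4)
img = rng.random((64, 64))

subimages, ll = {}, img
for level in (1, 2, 3):                    # decompose the LL band repeatedly
    ll, lh, hl, hh = haar2d(ll)
    subimages[f"H{level}_h"] = lh
    subimages[f"H{level}_v"] = hl
    subimages[f"H{level}_d"] = hh
subimages["L3"] = ll

# Keep only the 7 lower-band subimages used by the method
kept = ["L3", "H3_h", "H3_v", "H3_d", "H2_h", "H2_v", "H2_d"]
print(subimages["L3"].shape)  # (8, 8)
```

Applying the Table 1 statistics to each of the 7 kept subimages then yields the W1 to W7 feature vectors.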

Data and Evaluation Metrics
In our experiment, two case studies, on smoke detection and fire detection, are carried out. The benchmark data are collected from a smoke dataset [43] and a fire dataset [44]. In the smoke dataset, the first 500 images contain smoke and the last 500 images are normal scene images containing roads, trees, white curtains, lights, jets of water, and so on. In the fire dataset, the first 1048 images contain fire and the last 1048 images are normal images of natural landscapes, some of which contain sunlight that appears highly similar to fire. Table 3 lists brief information on the two datasets, and Figure 6 shows some of the images. The great differences between images show that detecting smoke in them is very challenging.
To evaluate the performance of the present method, we use the same evaluation metrics as in [40]: the detection rate, i.e., the true positive rate (TPR), the false alarm rate, i.e., the false positive rate (FPR), and the average accuracy rate (AAR), given as follows:

TPR = TP / (TP + FN), (4)

FPR = FP / (FP + TN), (5)

AAR = (TP + TN) / (TP + TN + FP + FN), (6)

in which TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.

Results and Discussion.
We first apply the model to smoke detection. In this case study, we conduct an ablation study with respect to various feature combinations and a comparison study with other methods. The smoke and nonsmoke images are divided into training and testing sets with a ratio of 0.6. The features extracted from the images and fed into the deep belief network are F1, F2, F3, F4, F5, F6 (F1 and F2), F7 (F3 and F4), F8 (F1, F2, F3, and F4), and F9 (F1, F2, F3, F4, and F5). Details are shown in Table 4. Figure 7 shows the results of the ablation study over the three metrics with respect to (w.r.t.) the 9 statistical feature vectors. Figure 8 shows the average accuracy and the standard deviation over the 9 statistical feature vectors F1 to F9. The combination of color features, textural features, and wavelet features shows the best performance. Then, we conduct a comparison study with two other machine learning methods. One is the convolutional deep belief network (CDBN) [45]. We choose this model for comparison because it learns hierarchical representations from raw images in an unsupervised manner without any label information, which is somewhat like the way we extract the statistical features.
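The feature combinations and the 0.6 split can be assembled as below. The block dimensions follow the descriptors above, with F5 assumed to be the 7 × 7 = 49 WTD statistics (W1 to W7); the random features are stand-ins for real descriptor outputs.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000                                  # e.g., the smoke dataset size
# Per-image feature blocks; F5 = 49 WTD statistics is an assumption
F1, F2, F3, F4, F5 = (rng.random((n, d)) for d in (9, 7, 5, 18, 49))

combos = {
    "F6": np.hstack([F1, F2]),            # color + gray-scale statistics
    "F7": np.hstack([F3, F4]),            # GLCM + LBPH
    "F8": np.hstack([F1, F2, F3, F4]),
    "F9": np.hstack([F1, F2, F3, F4, F5]),  # all descriptors
}

# 0.6 / 0.4 train-test split over shuffled indices
idx = rng.permutation(n)
split = int(0.6 * n)
train, test = idx[:split], idx[split:]
print(combos["F9"].shape, len(train), len(test))  # (1000, 88) 600 400
```

Each combination would then be fed to the DBN in turn for the ablation study.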
The other is the support vector machine (SVM) [46]. Codes released by the authors are used. The DBN in the proposed method is simply set to 200-100-50, while the CDBN is a two-layer unsupervised feature learning model followed by a softmax classifier, whose parameters are listed in Table 5. The kernel type of the SVM is sigmoid, with gamma set to 1 and cost set to 10; the other parameters take their default values. The inputs to the models are all the same except for the CDBN.
We run 50 trials to obtain the average results of the three evaluation metrics for the three compared methods, as shown in Figure 9. The TPR, FPR, and AAR of the proposed method are 97.14%, 5.28%, and 95.88%, respectively; those of the CDBN are 87.61%, 10.64%, and 88.42%; and those of the SVM are 64.39%, 34.63%, and 64.80%. Table 6 lists the best results among the 50 trials. Besides, the best average classification accuracy rates of the three methods are 97.68%, 92.72%, and 71.25%, respectively. Compared with the CDBN and SVM models, the present statistical image feature-based DBN obtains the best performance. Comparisons with more advanced models are shown in Table 7. We must admit that these SOTA models have strong feature learning ability from raw images but may suffer from a large number of training parameters and limited training samples, as well as varied backgrounds, diverse angles, and different lighting.
Fire detection is conducted next using the proposed method. As in the smoke detection case, both an ablation study and a comparison study are conducted. The fire and nonfire images are also divided into training and testing sets with a ratio of 0.6. Figure 10 shows the results of the ablation study over the three metrics w.r.t. the 9 feature combinations. The features are calculated using the color descriptor, the texture descriptors, and the WTD-based statistical features. The feature vectors F1 to F9 in this case are the same as those in the previous case. Figure 11 shows the average accuracy and the standard deviation over the 9 feature vectors; F9 has the best average accuracy.

In the comparison study for fire detection, we also run 50 trials. Figure 12 shows the average results of the three evaluation metrics. The TPR, FPR, and AAR of the proposed method are 91.68%, 9.66%, and 90.94%, respectively; those of the CDBN are 80.69%, 19.10%, and 80.79%; and those of the SVM are 69.56%, 29.96%, and 69.76%. Table 8 lists the best results among the 50 trials for fire detection. Besides, the best average classification accuracy rates of the three methods are 92.65%, 84.50%, and 75.06%, respectively. The present statistical image feature-based DBN shows superior results compared with the CDBN and SVM models. SOTA results are given in Table 9, which shows similar results to the smoke detection case.

Conclusion
Fires in buildings and forests are difficult to detect quickly and accurately due to complex environments. A statistical image feature-based DBN is proposed for fire detection to reduce the influence of variations in the color, texture, and shape features of flame and smoke images. Experiments using open-access benchmark data of fire and fire-like images are performed to verify the effectiveness of the present method. The relatively optimal combinations of statistical image features are analyzed using different feature combinations. The best average classification accuracy rates of the present method reach 97.68% and 92.65% for detecting smoke and fire, respectively, on the benchmark data. Compared with the two typical classification models, CDBN and SVM, the classification accuracy rate is higher by 4.96% and 26.43% for smoke detection and by 8.15% and 17.59% for fire detection, respectively. The results show that the present method can learn effective flame and smoke features with high accuracy in complicated classification tasks. Moreover, the method is expected to detect fire stages quickly with the help of small samples of statistical image features.

Conflicts of Interest
The authors declare that there are no conflicts of interest.