Generation of Smoke Dataset for Power Equipment and Study of Image Semantic Segmentation



Introduction
Electricity is one of the most important energy sources in modern human society. It is indispensable in daily life, industrial production, public facilities, economic development, and national defence. As critical infrastructure for generating and transmitting electricity, electric power equipment provides the fundamental guarantee for the continuous supply of electricity.
With the increasing depletion of fossil fuels and their adverse effects on the environment, the demand for electricity, a clean energy source, has been increasing year by year. This poses a huge challenge for the continuous, safe, and stable operation of power equipment, which is the foundation for the generation and transmission of electrical energy. Every year, accidents caused by fires in electric equipment occur around the world. The data show that electrical fires still rank first in number, accounting for as much as 50.4%. Fires in power equipment not only affect people's normal production and life but also cause casualties and economic losses [1][2][3][4]. Therefore, daily fire prevention and timely detection of fires are important means of ensuring the safe operation of power equipment.
The main causes of electric equipment fires are short circuits, overloads, equipment malfunctions, damage to power lines, extreme weather, illegal operations, and electric equipment failures. Before producing an open flame, electric equipment generates a large amount of smoke. Therefore, accurate and timely identification of smoke is critical in preventing fires from becoming more severe. Currently, many smoke detectors respond only once the smoke reaches a certain concentration, resulting in poor timeliness, and they also have the disadvantage of not being usable outdoors [5,6]. With the widespread use of video surveillance equipment and drones in the power system [7,8], these devices can provide a large amount of image data in real time, both indoors and outdoors, which supplies the basic data for image-based smoke identification. Therefore, identifying smoke from images has very good application prospects in the power system.
The simplest way to identify smoke in an image is to determine whether or not smoke is present, but this method cannot provide information on the location and size of the smoke, and therefore, research on this method is limited. The more common approach is to identify the location of the smoke or to segment it accurately, which corresponds to image object recognition and image semantic segmentation, respectively. The former quickly identifies the location of the smoke in an image or video frame and marks it with a bounding box (B-Box) [9]. The latter not only identifies the location of the smoke but also segments the smoke region. Accurate segmentation of the smoke therefore provides more useful information than localization alone: it gives the location of the smoke and its size, and, in combination with other algorithms, can even be used to estimate the smoke concentration. However, segmentation is more challenging because it requires distinguishing smoke pixels from background pixels and generating a smoke mask.
Traditional smoke segmentation algorithms generally rely on the color, shape, motion, and other properties of smoke. Therefore, techniques such as motion detection, wavelet analysis, the hidden Markov model (HMM), and the histogram of oriented gradients (HOG) are widely used to extract smoke features [10][11][12][13][14]. However, traditional smoke image segmentation algorithms are affected by factors such as lighting, weather, background, and the semitransparent nature of smoke, which leads to poor segmentation results. Neural networks can learn more abstract image features and are more robust. In recent years, using neural networks for smoke segmentation has become a research hotspot, and it has achieved good results in smoke segmentation tasks in different fields [15,16]. However, there is almost no research on smoke segmentation in the context of the power system, and there are three main problems in this regard.
First, the lack of data is a major issue for smoke segmentation in electric power systems, as it is in smoke segmentation research in general. Image segmentation neural networks learn image features from the input image dataset, and therefore, the dataset is an important factor that determines the final segmentation result. Fires in electric power equipment generally have small burning areas, high smoke density, and fast smoke generation, and the smoke color is mainly gray, black, and light blue. Therefore, such smoke differs from that in other fields such as forest fire prevention and urban building fire prevention, where the smoke is typically less dense and covers a larger area. In addition, smoke data for electric power equipment are harder to collect or obtain from the Internet than data in other fields, and even creating a suitable environment to obtain these data is difficult for cost-related reasons. In previous smoke segmentation research, artificial synthesis methods were often used to compensate for the lack of real images. This usually involves directly overlaying smoke images with masks onto other background images. Such a two-dimensional method overlooks the fact that real smoke is strongly affected by environmental lighting. In fact, the color and brightness of smoke vary in different environments and lighting conditions. Moreover, the smoke images generated by this method are static and cannot provide continuously changing smoke images to the neural network. Therefore, there is still a significant gap between two-dimensional generated smoke and real smoke.
Second, there is an issue with the annotation of smoke images. Because smoke edges have complex, rapidly changing shapes and smoke is translucent, it is difficult to annotate smoke accurately, which affects the final segmentation quality.
Lastly, there is a lack of research on neural networks for smoke segmentation in the field of power equipment. Although there are many studies on smoke segmentation in other fields such as forest fire prevention, there are very few on power equipment. Unlike in those fields, the smoke generated by power equipment is more localized and denser, and the background environment is more diverse, including both indoor and outdoor settings. Therefore, the smoke segmentation results achieved in other fields cannot be easily transferred to smoke segmentation tasks in the field of power equipment.
This article proposes a solution to the problem of insufficient real smoke data for electric power equipment by using 3D virtual software to create realistic smoke data. In addition, a new neural network structure called DS-UNet (double smoke UNet) based on UNet is proposed, which is tailored to the characteristics of smoke in electric power equipment and achieves better smoke segmentation results.
The innovation of this article lies in proposing a method to address the issue of insufficient real smoke data for electric equipment smoke image segmentation by using 3D virtual software to create realistic smoke data. The software simulates realistic environment lighting with HDR images, making the generated smoke consistent with real smoke in the same environment. In addition, more diverse background environments and occlusions are provided, and continuously changing smoke data (including the initial stage of smoke production) are generated to enhance the robustness of the neural network for electric equipment smoke segmentation.
We use 3D virtual software to generate precise mask images of smoke during the smoke generation process, eliminating the need for data annotation and, more importantly, improving the accuracy of subsequent segmentation.

Journal of Electrical and Computer Engineering
We propose a DS-UNet neural network tailored to the characteristics of electric equipment smoke, achieving more accurate segmentation of electric equipment smoke.

Related Works
Long et al. [17] first achieved end-to-end semantic segmentation using fully convolutional networks (FCNs), marking a groundbreaking advancement in image semantic segmentation. Their key contribution was the incorporation of skip connections within the network to fuse outputs from different layers, resulting in improved segmentation results. In addition, they initiated training by directly initializing the network's parameters with a VGG16 model pretrained on ImageNet [18], followed by fine-tuning on other datasets to accelerate convergence. Since the introduction of FCN, numerous researchers have conducted extensive studies in image segmentation, including applications such as smoke segmentation. Yuan et al. [19], inspired by the concept of FCN for semantic segmentation, proposed a deep smoke segmentation network aimed at inferring high-quality segmentation masks from ambiguous smoke images. To address the significant variations in smoke appearance related to texture, color, and shape, they divided the network into coarse and fine pathways. The first pathway utilized an encoder-decoder FCN with skip connections, extracting global contextual information of the smoke to generate coarse segmentation masks. To preserve the fine spatial details of the smoke, the second pathway was also designed as an encoder-decoder FCN with skip connections, but was shallower than the first pathway. Ultimately, experimental results on three synthetic smoke datasets and one real smoke dataset demonstrated that the proposed method's segmentation performance significantly outperformed existing segmentation algorithms based on neural networks trained for ambiguous data.
Based on FCN, Ronneberger et al. [20] introduced the UNet deep learning image segmentation model in 2015. UNet's primary characteristic is its symmetric encoder-decoder architecture with skip connections. The encoder portion resembles a convolutional neural network (CNN), progressively reducing the input image size to extract features at different scales. The decoder portion reverses this operation, gradually upscaling the feature maps to reconstruct finer details. Between these two parts, UNet incorporates skip connections, linking the output of the encoder with the input of the decoder to transmit both low-level and high-level feature information. UNet is also commonly trained with a loss function that combines cross-entropy and the Dice coefficient. This combination effectively penalizes discrepancies between predicted and actual segmentation areas. Since its introduction, UNet has found extensive use in various segmentation tasks, including medical image segmentation [21][22][23]. In addition, UNet has been applied to smoke segmentation tasks as well [15,24].
In addition to the aforementioned FCN and UNet, generative adversarial networks (GANs) and transformers have also been applied to image segmentation. Zhao et al. [25] employed GANs to enhance the contrast, sharpness, and brightness of segmented images. They utilized a staged fine-tuning strategy, gradually fine-tuning layers of deep neural networks from top to bottom to achieve optimal segmentation results. Ultimately, they achieved state-of-the-art performance in the bone age assessment (BAA) task using the RSNA dataset. While transformers [26] have made significant advancements in natural language processing, they have also been applied in the field of computer vision [27,28]. Zhou et al. [29] introduced a hybrid semantic segmentation algorithm named SCDeepLab. They combined Swin Transformer and CNN within the encoding and decoding framework of DeepLabv3+ to achieve accurate identification of tunnel lining cracks. This approach yielded excellent results in their study.
While transformers have demonstrated excellent performance in image classification and segmentation, they often require more training data. UNet, on the other hand, can achieve relatively good results with a smaller dataset, which is crucial for smoke image segmentation when data availability is limited. However, UNet still suffers from the loss of image details during the downsampling and upsampling processes. Consequently, accurately segmenting complex scenes such as occlusions and densely packed objects can be challenging in smoke segmentation. The dual-path network proposed by Yuan et al. [19] excels in extracting fine and coarse segmentation information from smoke data. However, FCN's convolutional operations focus only on local information around the current pixel, lacking broader contextual information. Therefore, this paper introduces a dual-path network based on UNet to address the aforementioned challenges in smoke segmentation. Unlike Feiniu Yuan's [19] approach, we utilize a dual UNet structure rather than FCN. In addition, both networks adopt the UNet architecture instead of the VGG16 structure. To retain more smoke information, there are more skip connections between the downsampling and upsampling layers. While this structure increases computational complexity compared to UNet, it yields superior segmentation results, making the added computational load worthwhile.

Generating a 3D Virtual Smoke Dataset.
As mentioned earlier, there is a lack of publicly available smoke data for power equipment. In other smoke segmentation tasks, a common way to address this issue is to directly overlay a smoke image with an alpha channel on a background image, as shown in Figure 1. However, this direct overlay method results in a very rough fusion of smoke and background. Smoke is semitransparent, and the alpha channel of a smoke image overlaid in this 2D way usually lacks the information needed to reflect this characteristic, making it difficult to blend the smoke with the background. Moreover, this method can only generate static smoke images and cannot provide continuously changing smoke. These issues hinder the training of neural networks because the synthesized images cannot realistically reflect either the relationship between smoke and background in real images or the characteristics of smoke.
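The direct 2D overlay described above amounts to per-pixel alpha blending. The following is a minimal sketch of that baseline, assuming NumPy arrays for the images; the function name `overlay_smoke` and the toy pixel values are hypothetical and are not taken from any of the cited works.

```python
import numpy as np

def overlay_smoke(background, smoke_rgb, alpha):
    """Blend a smoke layer onto a background using a per-pixel alpha map.

    background, smoke_rgb: uint8 arrays of shape (H, W, 3)
    alpha: float array of shape (H, W) with values in [0, 1]
    """
    a = alpha.astype(np.float32)[..., None]  # (H, W, 1), broadcast over RGB
    out = a * smoke_rgb.astype(np.float32) + (1.0 - a) * background.astype(np.float32)
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

# Toy example: uniform bright background, uniform gray smoke, 50% opacity in the centre.
background = np.full((4, 4, 3), 200, dtype=np.uint8)
smoke = np.full((4, 4, 3), 80, dtype=np.uint8)
alpha = np.zeros((4, 4), dtype=np.float32)
alpha[1:3, 1:3] = 0.5
composite = overlay_smoke(background, smoke, alpha)
```

Note that the smoke color in this blend is fixed in advance and does not respond to the lighting of the background scene, which is the limitation that motivates the 3D approach.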

To address the aforementioned issues, we propose using three-dimensional virtual technology to generate realistic smoke and thereby overcome the lack of smoke data. The use of three-dimensional virtual technology to generate smoke is now very mature and is widely used in the film and television industry to create realistic smoke and fire. The smoke generated by three-dimensional virtual technology not only overcomes the shortcomings of two-dimensional smoke mentioned above but can also be animated continuously, which is extremely helpful for neural networks to learn the continuous changes in smoke.
There are many 3D software programs that can generate smoke. In this study, we use the open-source Blender [30] software to generate virtual, realistic smoke. Blender is a free and open-source 3D creation suite. It supports the entire 3D pipeline: modelling, rigging, animation, simulation, rendering, compositing, and motion tracking, as well as video editing and game creation. Blender can generate very realistic smoke effects, and features such as the density and color of the smoke can be adjusted freely. To better integrate the smoke with the background environment, we use HDR images both as background images and as light sources that illuminate the generated smoke. This allows the smoke to reflect the lighting characteristics of the background realistically, producing real light and shadow effects. At the same time, the smoke images come with their own mask data, avoiding the problem of manual labeling. The detailed steps of generating smoke images in Blender are as follows, and the process is shown in Figure 2:
(1) Smoke generation: Blender can directly use the fluid module to generate realistic smoke. First, we create a smoke domain and then create a smoke generator inside the domain. We create an object with a shape similar to that of the object in the scene that needs to produce smoke and use it as the smoke generator. Smoke is produced from the surface of this object. At the same time, we set the parameters of the domain and smoke generator. As shown in Figure 3, we created an object similar to an insulator as a smoke generator, which is composited into the image of the power equipment.
(2) To integrate the smoke into the power equipment image and achieve a realistic effect, we use different HDR images to illuminate the smoke and serve as the background for the rendering process. The smoke not only reflects the background lighting but also casts shadows onto the background, making it more realistic.
(3) Rendering the image sequence: We set the rendering size to 1000 × 600. The image background and smoke are rendered and output together. The color mode is set to RGB.
(4) Rendering the smoke mask sequence: After rendering the previous set of images, we separately render the smoke mask images as the ground truth for segmentation training with DS-UNet. In this way, we do not have to label the images manually. Some generated smoke images and their corresponding smoke masks are shown in Figure 4. This method can generate both static smoke images and continuous image sequences. Figure 5 shows a sequence of generated smoke images.
Compared with manually collecting and labeling smoke data, using Blender to generate realistic smoke data avoids high data collection and labeling costs and therefore greatly reduces time and manpower costs. Table 1 compares the time and manpower required for one person to complete 10 scenes and 5000 smoke images. It should be noted that the image acquisition process in manual data labeling is affected by various spatial factors such as scenes and regions, so the actual time will be longer than estimated, and the estimate does not include transportation costs. In addition, because the edges of smoke are irregular, labeling takes longer than for other types of data. According to experience, it takes about 1 minute per image, so 5000 images require about 83.3 hours, which is about 3.47 days. This is the ideal case; normally, no one can work continuously for so long. From the table, we can see that the manual data labeling method requires manpower in every step, while the virtual data generation method requires manpower mainly in creating the virtual scenes; rendering is performed automatically by the computer and requires almost no manpower. (The rendering times in the table were obtained with the following hardware configuration: NVIDIA 3080 GPU, 32 GB of memory, and an Intel i7 CPU. With a better-performing computer, the rendering time will be shorter.)
It should be noted that the smoke mask is a grayscale image, not a binary image. Therefore, before training, we need to convert it to a binary image. We found that a threshold of 100 better preserves the edges of the smoke. To improve the robustness of the neural network used later, in addition to the generated smoke images, we also use real images accumulated from daily work or obtained from the Internet. To obtain more accurate segmentation masks, we use Photoshop's smart selection tool to select and segment the smoke in the real images. In the end, our dataset consists of a total of 6100 images, including 5100 generated images and 1000 real images. We use 3/4 of the generated images as the training set and 1/4 as the validation set, and the 1000 real images as the test set. In addition, during training, we use a series of image enhancement techniques to augment the dataset, such as rotation, flipping, cropping, and contrast changes. The detailed composition of the datasets is shown in Table 2. The data used to support this study are available at the following website: https://github.com/baihch7982/power_smoke_datasets.git.
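The mask binarization and the train/validation split described above can be sketched as follows. This is an illustrative sketch only: the helper names `binarize_mask` and `split_generated` are hypothetical, the comparison is assumed to be strict (`> 100`) because the text does not say whether the threshold value itself counts as smoke, and the shuffle with a fixed seed is an assumption about how the split might be drawn.

```python
import random
import numpy as np

def binarize_mask(gray_mask, threshold=100):
    """Convert a grayscale smoke mask (0-255) to a binary mask (0 or 1).

    Assumption: pixels strictly above the threshold are treated as smoke.
    """
    return (np.asarray(gray_mask) > threshold).astype(np.uint8)

def split_generated(num_images, train_fraction=0.75, seed=0):
    """Shuffle image indices and split them into training and validation sets."""
    indices = list(range(num_images))
    random.Random(seed).shuffle(indices)
    cut = int(num_images * train_fraction)
    return indices[:cut], indices[cut:]

gray = np.array([[0, 99, 100], [101, 180, 255]], dtype=np.uint8)
binary = binarize_mask(gray)

# 5100 generated images, 3/4 for training and 1/4 for validation.
train_idx, val_idx = split_generated(5100)
```

Because the generated images come from continuous animations, neighbouring frames are highly similar; splitting by scene rather than by individual frame would be a stricter alternative to the random split assumed here.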

Double Smoke UNet (DS-UNet).
Compared to other semantic segmentation networks, UNet achieves good segmentation results with only a small amount of data and is widely used in various semantic segmentation tasks, especially in medical image segmentation. However, in smoke segmentation, because smoke is semitransparent and has unclear edges, UNet alone cannot segment smoke edges well, and the model loses some details of the smoke during training. This is because UNet extracts high-level abstract image features through repeated downsampling and restores the resolution through repeated upsampling. Although information is passed through skip connections similar to those of ResNet, a lot of information is still lost because the downsampling and upsampling processes are not reversible.
Inspired by Feiniu Yuan's [19] dual-path network, we used a dual UNet to improve the UNet model and its segmentation performance on smoke. It should be noted that the only similarity between our work and the work proposed by the authors in reference [19] is that both use a deep network to allow the model to learn high-level features of the image and a shallow network to allow the model to learn shallow information about the image. Apart from this, our network structure is completely different.
The DS-UNet model uses a fully convolutional structure, as shown in Figure 6. The deep network uses the UNet network structure. Therefore, after five identical downsampling operations, the input image is gradually reduced in scale, while the channel dimension changes in the order of 64, 128, 256, 512, and 1024. Then, the image scale is restored to the input size by upsampling. In this process, except for the bottom layer, layers at the same level are connected by skip connections through concatenation. The downsampling process is implemented using max pooling, while the upsampling process uses transpose convolution rather than the interpolation used in earlier upsampling methods. The difference between the two is that interpolation-based upsampling has no trainable parameters, while transpose convolution has parameters that can be trained. Although interpolation is faster because it has no parameters, the trainable parameters in transpose convolution can better control the restoration process from a highly abstract feature space.
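To make the difference between interpolation and transpose convolution concrete, the following one-dimensional NumPy sketch compares the two. It is a simplified illustration, not the DS-UNet implementation: the kernel size of 2 and stride of 2 are assumptions (the text does not state them), and both function names are hypothetical.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Parameter-free upsampling: repeat every sample `factor` times."""
    return np.repeat(x, factor)

def transposed_conv1d(x, kernel, stride=2):
    """1-D transpose convolution: each input sample scatters a scaled copy
    of the kernel into the output, `stride` positions apart."""
    out = np.zeros((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        out[i * stride : i * stride + len(kernel)] += value * kernel
    return out

x = np.array([1.0, 2.0, 3.0])

# With the fixed kernel [1, 1], transpose convolution reproduces nearest-neighbour upsampling.
fixed = transposed_conv1d(x, np.array([1.0, 1.0]))

# With other kernel values (which training would adjust), the reconstruction differs.
learned = transposed_conv1d(x, np.array([0.5, 1.5]))
```

In this sketch, interpolation corresponds to one fixed choice of kernel, whereas transpose convolution leaves the kernel values free to be learned from data.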
The shallow network also uses the UNet network structure, but to preserve more information, it has fewer layers. At the same time, the number of convolutions in each layer is reduced from 2 to 1 (please see Figure 6 for the detailed network structure). The orange upper part is the shallow network for detailed feature extraction, and the blue lower part is the deep network for highly abstract feature extraction. After their respective downsampling and transpose convolution operations, the outputs of the two networks have the same spatial size, and we concatenate them directly along the channel dimension. The resulting channel dimension is the sum of the output dimensions of the two networks. Finally, a 1 × 1 convolution maps the concatenated features to an output with the same size as the input image, and a sigmoid gives the probability that each pixel belongs to smoke, from which the final segmentation result is obtained.
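The fusion step described above (channel-wise concatenation, a 1 × 1 convolution, and a sigmoid) can be sketched in NumPy as follows. This is a sketch under stated assumptions: the channel counts (64 and 32), the 8 × 8 spatial size, the random weights, and the 0.5 decision threshold are all hypothetical placeholders, and `fuse_and_predict` is not a name from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fuse_and_predict(deep_feat, shallow_feat, weight, bias):
    """Concatenate two (C, H, W) feature maps and apply a 1x1 convolution.

    A 1x1 convolution with a single output channel is a per-pixel weighted
    sum over the input channels, so it can be written as an einsum.
    """
    fused = np.concatenate([deep_feat, shallow_feat], axis=0)  # (C_deep + C_shallow, H, W)
    logits = np.einsum("c,chw->hw", weight, fused) + bias      # (H, W)
    return sigmoid(logits)                                     # per-pixel smoke probability

rng = np.random.default_rng(0)
deep = rng.normal(size=(64, 8, 8))      # hypothetical deep-branch output
shallow = rng.normal(size=(32, 8, 8))   # hypothetical shallow-branch output
weight = rng.normal(size=(96,)) * 0.1   # one weight per fused channel (64 + 32)

prob = fuse_and_predict(deep, shallow, weight, bias=0.0)
mask = (prob > 0.5).astype(np.uint8)    # assumed decision threshold
```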
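During the training of the whole network, we used PyTorch as the deep learning framework. Because the network output has already passed through a sigmoid, we used binary cross-entropy loss as the network's loss function instead of binary cross-entropy loss with logits; the latter would apply the sigmoid a second time and increase prediction errors. In its standard form, the binary cross-entropy loss is

$$L_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log p_i + (1 - y_i)\log(1 - p_i)\right]$$

where $y_i \in \{0, 1\}$ is the ground-truth label of pixel $i$, $p_i$ is the predicted smoke probability for that pixel, and $N$ is the number of pixels. Figure 7 shows the loss-epoch curve for DS-UNet.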
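The effect of applying the sigmoid twice can be checked numerically. The sketch below uses NumPy stand-ins for the two loss variants (in PyTorch these correspond to `torch.nn.BCELoss`, which expects probabilities, and `torch.nn.BCEWithLogitsLoss`, which applies the sigmoid internally); the function names and the sample values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y, eps=1e-12):
    """Binary cross-entropy on probabilities p in (0, 1)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def bce_with_logits(z, y):
    """Binary cross-entropy on raw scores z; the sigmoid is applied internally
    (written in a numerically stable form)."""
    return float(np.mean(np.maximum(z, 0.0) - z * y + np.log1p(np.exp(-np.abs(z)))))

logits = np.array([2.0, -1.0, 0.5, -3.0])
target = np.array([1.0, 0.0, 1.0, 0.0])
probs = sigmoid(logits)                        # what a network ending in a sigmoid outputs

correct = bce(probs, target)                   # sigmoid applied once
equivalent = bce_with_logits(logits, target)   # same value, computed from raw scores
double = bce_with_logits(probs, target)        # sigmoid effectively applied twice
```

Here `correct` and `equivalent` agree, while `double` is larger because the already-squashed probabilities are squashed again toward 0.5.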
To evaluate the performance of our proposed DS-UNet, we compared it with UNet and Feiniu Yuan's [19] dual-path FCN segmentation algorithm.We trained these comparative methods on the same power equipment smoke training data using the code reproduced according to the respective papers.

After training, we tested the mean pixel accuracy (mPA) of the three models using both the validation dataset and the test dataset. The results are shown in Table 3. We observed that DS-UNet consistently outperformed the other two models.
The evaluation metrics for image segmentation generally include the mean intersection over union (mIoU) and the Dice coefficient. Therefore, to further compare our algorithm with other algorithms, we calculated these two widely used metrics. For both, the closer the value is to 1, the better the segmentation. They are calculated as

$$\mathrm{mIoU} = \frac{1}{n}\sum_{i=1}^{n}\frac{|PR_i \cap GT_i|}{|PR_i \cup GT_i|}, \qquad \mathrm{Dice} = \frac{1}{n}\sum_{i=1}^{n}\frac{2\,|PR_i \cap GT_i|}{|PR_i| + |GT_i|}$$

where $PR_i$ denotes the predicted segmentation result of the $i$th image, $GT_i$ denotes the corresponding ground truth, and $n$ denotes the total number of images in the dataset. As shown in Tables 4 and 5, our method performs significantly better than the other methods on both the validation dataset and the test dataset. Our method achieves the highest mIoU among all the compared methods, indicating that our predicted segmentation is closest to the ground truth. It should be noted that if the shallow network, shown in orange in the upper part of the network structure figure, is removed, our model degrades to the UNet model. As can be seen from Tables 4 and 5, adding the shallow network has a significant effect on the segmentation performance.
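As a small worked example of these two metrics, the sketch below computes the per-image IoU and Dice on binary masks and averages them over the dataset, following the definitions above. The function names are hypothetical, and returning 1.0 when both the prediction and the ground truth are empty is an assumed convention that the text does not specify.

```python
import numpy as np

def iou(pred, gt):
    """Intersection over union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:            # both masks empty: assumed to count as a perfect match
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def dice(pred, gt):
    """Dice coefficient of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    total = pred.sum() + gt.sum()
    if total == 0:            # both masks empty: assumed to count as a perfect match
        return 1.0
    return float(2.0 * np.logical_and(pred, gt).sum() / total)

def mean_metric(metric, preds, gts):
    """Average a per-image metric over all images in a dataset."""
    return float(np.mean([metric(p, g) for p, g in zip(preds, gts)]))

preds = [np.array([[1, 1], [0, 0]]), np.array([[0, 1], [0, 1]])]
gts = [np.array([[1, 0], [0, 0]]), np.array([[0, 1], [0, 1]])]

miou = mean_metric(iou, preds, gts)         # (1/2 + 1) / 2
mean_dice = mean_metric(dice, preds, gts)   # (2/3 + 1) / 2
```

For the first image the intersection has 1 pixel and the union 2 pixels, so its IoU is 0.5 and its Dice is 2/3, which shows that Dice is never lower than IoU on the same pair of masks.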
Typically, the complexity of a neural network model is described using two metrics: the number of parameters and the computational cost (time complexity). Therefore, to further illustrate the complexity of DS-UNet, we counted the number of parameters as well as the amount of data used for inference (Table 6). As shown in Table 6, the number of parameters in the model reaches 31,037,697, and the amount of data used for the forward and backward passes reaches 3718.00. Because of the added branch, both the number of parameters and the amount of data used for inference increase. However, we believe that this is worthwhile in order to achieve better smoke segmentation results.
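To illustrate how parameter counts of this magnitude arise, the sketch below counts the weights and biases of plain convolutional layers along the 64, 128, 256, 512, 1024 channel progression. It is only a rough illustration under stated assumptions: 3 × 3 kernels, two convolutions per level, a 3-channel input, and no normalization layers are all assumed, and the sketch covers an encoder path only, so it is not expected to reproduce the figure in Table 6.

```python
def conv2d_params(c_in, c_out, kernel_size, bias=True):
    """Number of trainable parameters in one 2-D convolution layer."""
    return c_out * (c_in * kernel_size * kernel_size + (1 if bias else 0))

def double_conv_params(c_in, c_out, kernel_size=3):
    """Two stacked convolutions, as assumed for each level of the deep branch."""
    return conv2d_params(c_in, c_out, kernel_size) + conv2d_params(c_out, c_out, kernel_size)

channels = [64, 128, 256, 512, 1024]
encoder_total = 0
c_in = 3  # RGB input
for c_out in channels:
    encoder_total += double_conv_params(c_in, c_out)
    c_in = c_out

first_layer = conv2d_params(3, 64, 3)
```

Most of the parameters sit in the deepest levels, where both the input and output channel counts are largest, which is why an extra shallow branch adds comparatively few parameters.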
At the same time, we selected four images and created heat maps. As shown in Figure 8, the model has accurately identified the smoke regions in the images. This further demonstrates DS-UNet's ability to accurately recognize smoke in an electric equipment environment.
Smoke is different from objects with clear edges, such as cars and pedestrians. Its shape changes constantly, it is transparent, and its edges are blurred. Therefore, in daily use, it is very important to correctly segment continuous smoke images (such as videos). To test the segmentation ability of DS-UNet on continuous smoke data, we selected three different scenes with a duration of 1 minute (30 frames/second, a total of 1800 frames) for video data testing. The test results show that DS-UNet segments continuous smoke well, as shown in Figure 9, which displays the segmentation results for 1 second (30 frames) from these three scenes. The qualitative comparison between our method and other deep learning methods on the smoke test dataset of power equipment is shown in Figure 10. The first column shows the test images, the second column shows the corresponding ground truths, and the other columns show the segmentation results of DS-UNet, the method of Feiniu Yuan [19], and UNet. These images include both images generated by our method (first two rows) and real images (last two rows). The experimental results demonstrate that our method has good segmentation performance on both synthetic and real smoke images. Moreover, our method separates the smoke with clearer edges and more accurate positions than the other methods.

Conclusion
Electric equipment fires have always been one of the main hazards of electric equipment. Smoke detection and recognition are extremely important for electric equipment, as they can provide early warning before an open flame occurs. Compared to detection that relies on smoke concentration, image-based smoke recognition has the advantage of working in both indoor and outdoor environments. This paper addresses the problems of limited smoke data in the electrical system, difficulty in labeling data, and inadequate research on recognition algorithms. We propose using 3D virtual technology to generate smoke images and their masks, and using HDR environment images as both background and lighting so that the smoke is realistically combined with the background. Inspired by the dual-path networks [19], we also propose the DS-UNet model. The model uses two UNet structures: a deep network for extracting abstract features of the data and a shallow network for extracting detailed features. We trained the model on generated smoke data and conducted experiments on both generated and real images. Comparative experiments showed that the DS-UNet model has significantly better smoke segmentation performance on electric equipment than other similar models.

Figure 2: The process of generating smoke images in Blender.

Figure 3: Creating an object similar to an insulator as a smoke generator.

Figure 4: The smoke synthesis images and image masks generated by 3D virtual technology.

Figure 9: Segmentation effect of DS-UNet on continuous smoke images (two lines per scene).

Figure 10: Comparison of segmentation results on generated smoke images and real images (the first two rows are generated images and the last two rows are real images).

Table 2: Composition of the dataset.

Table 3: Comparison of image segmentation accuracy of different models on the validation set.

Table 4: Comparison of mIoU among different models.

Table 5: Comparison of Dice among different models.

Table 6: The number of parameters and computational cost (time complexity) of DS-UNet.