Real-Time Fire Detection Method Based on Computer Vision for Electric Vehicle Charging Safety Monitoring

In the process of charging and using electric vehicles, lithium batteries may cause hazards such as fire or even explosion due to thermal runaway. Therefore, a target detection model based on an improved YOLOv5 (You Only Look Once) algorithm is proposed for the features generated by lithium battery combustion. The K-means algorithm is used to cluster and analyse the target locations within the dataset, the residual structure and the number of convolutional kernels in the network are adjusted, and a convolutional block attention module (CBAM) is embedded to improve the detection accuracy without affecting the detection speed. The experimental results show that the improved algorithm achieves an overall mAP of 94.09%, an average F1 score of 90.00%, and a real-time detection speed of 42.09 FPS (frames per second). It can therefore meet typical real-time monitoring requirements, can be deployed at electric vehicle charging stations and production platforms for safety detection, and can help safeguard the safe production and development of electric vehicles in the future.


Introduction
As people's awareness of environmental protection increases, the proportion of new-energy electric vehicles is rising, and supporting infrastructure such as charging stations is expanding year by year. The power source of new-energy electric vehicles is mainly the lithium battery, which offers high energy density, long service life, and light weight and is therefore widely used in all kinds of new-energy vehicle products. However, during rapid charging at charging stations, the internal lithium battery may leak, catch fire, or even explode due to thermal runaway and other causes [1]. Therefore, monitoring the safety of electric vehicles during charging is especially important.
At present, deep learning-based target detection methods have gradually become mainstream; they fall into one-stage and two-stage algorithms.
Two-stage algorithms include the region-based convolutional neural network (R-CNN), Faster R-CNN, and similar networks. In the first step, a region proposal network (RPN) is trained to generate candidate regions, and in the second step, the class and location of each target are predicted by a convolutional neural network [2]. One-stage algorithms, mainly the single shot multibox detector (SSD) [3], treat detection as a regression problem: they omit the RPN and directly regress the class probabilities and location coordinates of objects, trading slightly lower accuracy for significantly higher detection speed.
The combustion of electric vehicle battery cells is complex and variable and follows no fixed pattern. Once the stable internal state of a lithium battery is disrupted by collision, overcharging, or similar events, thermal runaway quickly leads to decomposition of the battery electrolyte and other reactions, releasing a large amount of heat, heating the battery rapidly, and generating hydrogen, methane, and other gases along with dense white smoke. Lithium battery fires develop rapidly, often with deflagration and even explosion. The fire usually ignites the surrounding electrical equipment, causing smoke and sustained burning [4][5][6]. Experiments in [7] showed that when the battery compartment of an electric vehicle explodes, high temperature is the main way the surrounding environment is affected: the temperature in the battery compartment rises to 2158 K within 0.12 s, and the heat spreads horizontally through the pressure relief hole, which can easily ignite nearby charging piles or other vehicles. In addition, at large outdoor charging stations, traditional physical sensors can hardly cover all scenarios and are susceptible to environmental interference, such as smoke from drivers smoking or from nearby restaurants. If a fire is not detected in time, greater damage will result.
Earlier traditional detection methods relied on recognition based on hand-crafted image features. Smoke and flame features are diverse, and their color, texture, and motion characteristics are extremely complex. Xie et al. [8] proposed a method for early fire detection in enclosed indoor environments based on the reflective properties of firelight and developed a highly sensitive foreground identification method for flame detection using strategic background updates and block binarisation thresholds, but it is difficult to apply to complex outdoor scenes with multiple reflections. Liu et al. [9] presented the YdUaVa colour model to analyse the colour changes and motion trajectories of smoke in adjacent frames, thus roughly filtering out image blocks suspected of containing smoke. Du et al. [10] improved the ViBe algorithm using the color features of smoke to extract smoke features. Chen et al. [11] introduced a convolutional network to extract smoke texture information and combined it with the static texture information of the original image for detection. Zhao et al. [12] built a classification model of flame elements in the YCbCr color space and formulated new rules to reduce the interference of image brightness. Wu et al. [13] performed multithreshold segmentation based on the image grayscale entropy criterion and used an improved particle swarm optimization algorithm to select the thresholds, quickly separating flame targets from background regions. Krawtchouk moments were introduced to construct the feature vectors of flame images, which were then classified by support vector machines [14]. All the above methods rely on traditional image processing and suffer from low accuracy and slow speed.
Nowadays, many deep learning-based target detection methods are applied to video flame detection with better results. The accuracy of Faster R-CNN on the flame detection task was improved by a color-guided anchoring strategy that constrains the regions where anchor boxes are generated, but the detection speed still needs improvement [15]. The detection of small flame regions was improved by refining the prior boxes of the YOLOv3 network and using flame flicker features to reduce false detections [16], but YOLOv3 is slow and unsuitable for video stream monitoring. To detect ship fires, Wu et al. [17] modified the YOLOv4 algorithm with the SE attention module. Cai et al. [18] improved the YOLOv4 algorithm by replacing the network structure and applying pruning to achieve real-time object detection on an in-vehicle platform. Huo et al. [19] applied depthwise separable convolution to a YOLOv4 network to enhance its smoke detection ability, but the algorithm is relatively outdated and cannot detect early flames. Wu et al. [20] proposed a flame detection model based on the YOLOv5 network, but it could not detect the intense smoke in the early stages of lithium battery combustion and therefore lacked timely hazard prediction. Li et al. [21] applied the YOLOv5 algorithm to remote sensing and proposed the TCS-YOLO method, which adds convolutional layers and replaces activation functions to improve the efficiency of identifying oil storage tanks worldwide. Wang et al. [22] applied a structurally reparameterised re-param visual geometry group (RepVGG) model to the conventional CenterNet to achieve object detection in mobile driving scenarios. The YOLO [23] family of algorithms is also widely used in marine, biomedical, and autonomous driving applications and is highly versatile and stable [24][25][26][27][28][29][30]. In recent years, fires that occur while electric vehicles are parked or charging have accounted for more than half of all electric vehicle fire accidents [31][32][33]. Therefore, the safety monitoring of electric vehicles during charging is very important.
In this paper, we propose an enhanced YOLOv5-based electric vehicle charging safety monitoring algorithm that targets the lithium battery combustion and fire characteristics, such as white smoke and the firelight of deflagration, that thermal runaway may cause during charging. The proposed method can be applied directly to existing video surveillance equipment, making it less expensive, more universal, and easier to implement than traditional detection methods, and it also achieves higher monitoring accuracy and speed than the unimproved algorithm. The Methods section describes the algorithms used and the improvements made to address the problem. The Experimental Procedure and Results Analysis section then presents experiments and comparisons based on the improved approach. Finally, we present the experimental conclusions and prospects for future applications in real-world scenarios. Figure 1 depicts the overall flowchart. The main contributions of this paper are as follows:
(1) To address the poor detection capability of the original algorithm for small targets in complex electric vehicle combustion scenes, the number of convolution kernels is increased and more residual components are stacked in the feature extraction part, improving the detection of small targets.
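(2) To address the uncertainty and complexity of the target locations in the flame and smoke dataset, the K-means clustering algorithm is introduced to recluster the target locations in the dataset, obtaining the most suitable anchor boxes for this dataset and improving the training speed and accuracy of the algorithm.
(3) To enhance the method's extraction of the flame and smoke features generated by lithium battery combustion and to improve its generalization capability and robustness, the CBAM [34] is added after the backbone feature extraction network, which significantly improves flame and smoke detection with only a small amount of additional code.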

K-Means Algorithm-Based Flame Smoke Anchor Frame Planning.
The first step of the YOLO series algorithms in target detection is to generate candidate regions (anchor boxes). During lithium battery combustion, the state changes are complex and dramatic, with great uncertainty, so this method uses the K-means algorithm to recluster the target locations of the features in the dataset, obtaining anchor boxes tailored to the lithium battery combustion feature dataset, which are then used for training. First, K samples are selected from the dataset as the initial cluster centres $c_1, c_2, \ldots, c_K$. Then, for each sample $x_i$ in the dataset, the distance $d_{ik}$ to each of the K cluster centres is calculated, and the sample is assigned to the category of the cluster centre with the smallest distance. For each category k, the coordinates of its cluster centre are recalculated as the centre of mass (mean) of all samples belonging to that category. These steps are repeated until the cluster centre positions no longer change.
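The distance used in this clustering is

$$d_{ik} = 1 - \mathrm{IOU}(x_i, c_k),$$

where $d_{ik}$ is the distance from sample $x_i$ to cluster centre $c_k$ and IOU is the intersection over union between the two boxes. Compared with the traditional Euclidean distance, this measure has a smaller error, because with Euclidean distance large boxes produce larger errors than small boxes and dominate the clustering.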
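The clustering procedure above can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes each labelled box is reduced to its (width, height) in pixels, compares boxes as if they shared the same top-left corner (the usual convention when clustering anchor shapes), and updates each centre with the mean width and height of its members. The function names `iou_wh` and `kmeans_anchors` are hypothetical.

```python
import random


def iou_wh(box, centroid):
    """IOU of two boxes described only by (width, height), aligned at one corner."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k=9, max_iter=300, seed=0):
    """Cluster (w, h) pairs into k anchor boxes using d = 1 - IOU as the distance."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)  # pick k samples as the initial centres
    for _ in range(max_iter):
        # assign every box to the centre with the smallest 1 - IOU distance
        clusters = [[] for _ in range(k)]
        for box in boxes:
            dists = [1.0 - iou_wh(box, c) for c in centroids]
            clusters[dists.index(min(dists))].append(box)
        # move each centre to the mean width and height of its members
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                w = sum(b[0] for b in members) / len(members)
                h = sum(b[1] for b in members) / len(members)
                new_centroids.append((w, h))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centre
        if new_centroids == centroids:  # centres no longer move: converged
            break
        centroids = new_centroids
    return sorted(centroids, key=lambda c: c[0] * c[1])  # small to large anchors


# Example with a handful of hypothetical (w, h) boxes in pixels
boxes = [(10, 12), (12, 10), (11, 11), (48, 52), (50, 50),
         (52, 49), (200, 150), (190, 160), (210, 140)]
print(kmeans_anchors(boxes, k=3))
```

For the 9 anchors mentioned in the text, the same function would be called with `k=9` on all labelled boxes; since YOLOv5 uses three anchors per detection scale, the sorted result splits naturally into three groups of three. Some implementations use the median instead of the mean when updating centres.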
The clustering results are shown in Figure 2. The maximum values of the horizontal and vertical coordinates correspond to the input image size of this algorithm. Clustering is performed on all labelled flame positions in the dataset, yielding 9 cluster centres, which are marked with asterisks.
Table 1 compares the anchor boxes obtained by clustering the target positions in the dataset with the K-means algorithm against the default anchor boxes derived from the COCO dataset. Using the clustered anchor boxes improves the training accuracy and detection results.

YOLOv5 Network Model.
The YOLO series target detection algorithms solve the detection problem by dividing the image into grid cells and regressing bounding boxes relative to the anchor boxes of each cell; as end-to-end detectors, they offer good real-time performance. The network structure of the unimproved YOLOv5 algorithm is shown in Figure 3.
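As a concrete illustration of regressing boxes relative to grid cells and anchors, the sketch below decodes one raw prediction into a pixel-space box. The parameterisation (centre offset $2\sigma(t) - 0.5$ added to the cell index, size $(2\sigma(t))^2$ times the anchor) follows our reading of the public Ultralytics YOLOv5 code rather than anything stated in this paper, and `decode_box` is a hypothetical helper.

```python
import math


def sigmoid(x):
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_box(t, cell_x, cell_y, anchor_w, anchor_h, stride):
    """Turn raw outputs (tx, ty, tw, th) of one grid cell into a box in pixels.

    Centre: (2*sigmoid(t) - 0.5 + cell index) * stride, so it may fall
    up to half a cell outside the responsible cell.
    Size:   (2*sigmoid(t))**2 * anchor, so it ranges from 0 to 4x the anchor.
    """
    tx, ty, tw, th = t
    cx = (2.0 * sigmoid(tx) - 0.5 + cell_x) * stride
    cy = (2.0 * sigmoid(ty) - 0.5 + cell_y) * stride
    w = (2.0 * sigmoid(tw)) ** 2 * anchor_w
    h = (2.0 * sigmoid(th)) ** 2 * anchor_h
    return cx, cy, w, h


# Zero logits put the centre in the middle of cell (3, 4) and keep the anchor size
print(decode_box((0.0, 0.0, 0.0, 0.0), cell_x=3, cell_y=4, anchor_w=30, anchor_h=60, stride=16))
# → (56.0, 72.0, 30.0, 60.0)
```

Bounding the width and height to at most four times the anchor is one reason well-matched anchors, such as the clustered ones in Table 1, matter for training.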
The YOLOv5 target detection algorithm can be divided into four parts: input, backbone, neck, and head. In the input part, Mosaic data augmentation stitches four images into a new image through random scaling, random cropping, and random arrangement. Because each stitched image carries the content of four images, smaller batches suffice, which reduces the GPU memory required for training while greatly enriching the dataset and improving the robustness of the network. The backbone is the network that extracts the combustion features of lithium batteries during charging. YOLOv5 adds a Focus structure and a cross-stage partial network (CSPNet) structure to the backbone. The Focus structure samples every other pixel along the width and height of the image, obtaining four independent feature layers that are stacked to concentrate the width and height information into the channel space, quadrupling the number of input channels without losing image information, as shown in Figure 4. The CSPNet structure splits the feature map into two parts: one part passes through convolutions, and the other is concatenated and fused with the result. It enhances feature extraction and maintains accuracy while remaining lightweight, and it reduces computational bottlenecks and memory costs.
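The Focus slicing described above is a space-to-depth rearrangement. A minimal sketch on nested lists (channels × height × width, with even height and width assumed) is shown below; the slice order mirrors the one we understand the YOLOv5 Focus layer to use, and in the real network a convolution follows the concatenation. The name `focus_slice` is hypothetical.

```python
def focus_slice(img):
    """Space-to-depth slicing of a C x H x W nested list (H and W even).

    Takes every other pixel along height and width at the four offsets
    (0,0), (1,0), (0,1), (1,1), giving four H/2 x W/2 maps per channel,
    and stacks them along the channel axis: C x H x W -> 4C x H/2 x W/2.
    """
    out = []
    for dy, dx in ((0, 0), (1, 0), (0, 1), (1, 1)):
        for channel in img:
            out.append([row[dx::2] for row in channel[dy::2]])
    return out


# One 4 x 4 channel holding the values 0..15
img = [[[r * 4 + c for c in range(4)] for r in range(4)]]
sliced = focus_slice(img)
print(len(sliced), len(sliced[0]), len(sliced[0][0]))  # → 4 2 2
print(sliced[0], sliced[3])  # → [[0, 2], [8, 10]] [[5, 7], [13, 15]]
```

Every one of the 16 input values appears exactly once in the output, which is what "quadrupling the input channels without losing image information" means in practice.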

CBAM.
Adding an attention mechanism module is a common optimization in deep learning that lets the model mimic the way human eyes focus on salient information. CBAM assigns different weights to the image features extracted by the backbone network, suppressing useless information and improving the utilization of effective features in the neck.
CBAM is a simple and effective attention module for feedforward convolutional neural networks. Unlike the squeeze-and-excitation network (SENet) and the efficient channel attention network (ECANet), which apply channel attention only, CBAM places a spatial attention module after the channel attention module, and it combines max pooling and average pooling (by summation in the channel branch and by concatenation in the spatial branch) so that the feature map is adaptively refined with corresponding weights. CBAM effectively suppresses background information and emphasizes flame and smoke target features. In this paper, the output feature maps of the backbone network are fed into CBAM, a lightweight general-purpose module that can effectively improve detection accuracy with little impact on detection speed. The CBAM is shown in Figure 5.

Journal of Electrical and Computer Engineering
The channel attention mechanism passes the input feature map through global max pooling and global average pooling separately, feeds both pooled vectors through a shared multilayer perceptron (MLP), adds the two outputs, and applies the Sigmoid function to obtain the channel weights, which are then multiplied with the original input feature map, as shown in Figure 6. The channel attention is given by Equation (1):

$$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), \tag{1}$$

where $F$ is the input feature map, MLP() is the multilayer perceptron, AvgPool() is global average pooling, MaxPool() is global max pooling, and $\sigma$ is the Sigmoid operation.
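Equation (1) can be traced step by step in the plain-Python sketch below. It is illustrative only: the feature map is a nested list (C × H × W), the shared MLP has one hidden layer of C/r units with ReLU as in the original CBAM design, and its weights `w0` and `w1` are hypothetical random values rather than trained parameters.

```python
import math
import random


def sigmoid(x):
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def shared_mlp(v, w0, w1):
    """C -> C/r (ReLU) -> C; the same weights serve both pooled vectors."""
    hidden = [max(0.0, sum(a * b for a, b in zip(row, v))) for row in w0]
    return [sum(a * b for a, b in zip(row, hidden)) for row in w1]


def channel_attention(feat, w0, w1):
    """Return (M_c(F) * F, M_c(F)) for a C x H x W feature map, following Equation (1)."""
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]  # AvgPool
    mx = [max(max(row) for row in ch) for ch in feat]                            # MaxPool
    weights = [sigmoid(a + m) for a, m in zip(shared_mlp(avg, w0, w1), shared_mlp(mx, w0, w1))]
    # rescale every channel of the input by its attention weight
    refined = [[[weights[c] * v for v in row] for row in ch] for c, ch in enumerate(feat)]
    return refined, weights


# Hypothetical example: C = 4 channels, reduction ratio r = 2, 3 x 3 maps
rng = random.Random(0)
C, r, H, W = 4, 2, 3, 3
feat = [[[rng.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
w0 = [[rng.gauss(0, 0.5) for _ in range(C)] for _ in range(C // r)]
w1 = [[rng.gauss(0, 0.5) for _ in range(C // r)] for _ in range(C)]
refined, weights = channel_attention(feat, w0, w1)
print([round(w, 3) for w in weights])
```

Because every weight lies between 0 and 1, the module can only attenuate channels relative to one another; this is how it emphasizes informative channels while suppressing others.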
The spatial attention mechanism takes the feature map output by the channel attention module as its input feature map. First, max pooling and average pooling are performed along the channel dimension. Then, the two results are concatenated along the channel dimension. A convolution reduces them to one channel, and finally the result is passed through the Sigmoid function to generate the spatial attention map, which is multiplied with the input feature map to obtain the final features:

$$M_s(F) = \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F); \mathrm{MaxPool}(F)])\big), \tag{2}$$

where $\sigma$ is the Sigmoid operation and $f^{7\times 7}$ denotes a convolution with a 7 × 7 kernel. The spatial attention mechanism in CBAM is shown in Figure 7.
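Equation (2) admits the same kind of illustration. The sketch below pools across channels at each pixel, applies one 7 × 7 convolution with zero padding (written out as a plain cross-correlation, which is how deep learning libraries compute convolutions), and rescales all channels with the resulting map. The kernel here is a hypothetical placeholder; in CBAM it is learned.

```python
import math


def sigmoid(x):
    # numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def spatial_attention(feat, kernel, bias=0.0):
    """Return (M_s(F) * F, M_s(F)) for a C x H x W map, following Equation (2).

    `kernel` has shape 2 x k x k: one k x k slice for the average-pooled map
    and one for the max-pooled map. Zero padding keeps the H x W size.
    """
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    # pool along the channel axis at every position -> two H x W maps
    avg_map = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    max_map = [[max(feat[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    pooled = (avg_map, max_map)  # [AvgPool(F); MaxPool(F)]
    k = len(kernel[0])
    pad = k // 2
    attn = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = bias
            for p in range(2):
                for u in range(k):
                    for v in range(k):
                        y, x = i + u - pad, j + v - pad
                        if 0 <= y < H and 0 <= x < W:  # positions outside are zero padding
                            s += kernel[p][u][v] * pooled[p][y][x]
            attn[i][j] = sigmoid(s)
    # the single H x W attention map is shared by every channel
    refined = [[[attn[i][j] * feat[c][i][j] for j in range(W)] for i in range(H)] for c in range(C)]
    return refined, attn


# With an all-zero 7 x 7 kernel every attention value is sigmoid(0) = 0.5
feat = [[[float(c + i + j) for j in range(5)] for i in range(5)] for c in range(3)]
zero_kernel = [[[0.0] * 7 for _ in range(7)] for _ in range(2)]
refined, attn = spatial_attention(feat, zero_kernel)
print(attn[0][0], refined[2][4][4])  # → 0.5 5.0
```

In the full CBAM, this function would receive the output of the channel attention step, so the two attention maps are applied one after the other.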

Improvement in the Number of Residual Components and Convolutional Kernels.
The YOLOv5 algorithm is divided into networks with different weights and detection capabilities by using different numbers of residual components and convolutional kernels, giving networks of different depths and widths. The depth and width of the network, that is, the number of residual components and the number of convolutional kernels, are controlled by two parameters, Depth_multiple and Width_multiple. With the commonly used parameter settings, the networks are classified as YOLOv5-s, YOLOv5-m, YOLOv5-l, and YOLOv5-x; the corresponding parameters are shown in Table 2.
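The way Depth_multiple and Width_multiple translate into concrete layer sizes can be sketched as follows. The rounding rules (repeat counts rounded and kept at least 1, kernel counts rounded up to a multiple of 8) and the multiplier values per variant reflect our understanding of the public Ultralytics YOLOv5 configuration files, not values taken from Table 2; the base counts are the YOLOv5-l backbone numbers reported in the experiments (3, 9, 9, and 3 residual components; 64 to 1,024 kernels).

```python
import math

# (Depth_multiple, Width_multiple) per variant; assumed from the public
# Ultralytics yolov5s/m/l/x.yaml files, not from this paper's Table 2.
MULTIPLES = {"s": (0.33, 0.50), "m": (0.67, 0.75), "l": (1.00, 1.00), "x": (1.33, 1.25)}

BASE_REPEATS = [3, 9, 9, 3]               # residual components in the backbone CSP blocks
BASE_KERNELS = [64, 128, 256, 512, 1024]  # convolution kernels in the backbone CBL blocks


def scale_repeats(n, depth_multiple):
    """Scale a repeat count, never dropping a repeated block below one unit."""
    return max(round(n * depth_multiple), 1) if n > 1 else n


def scale_kernels(c, width_multiple, divisor=8):
    """Scale a kernel count and round it up to a multiple of `divisor`."""
    return math.ceil(c * width_multiple / divisor) * divisor


def variant(name):
    gd, gw = MULTIPLES[name]
    return [scale_repeats(n, gd) for n in BASE_REPEATS], [scale_kernels(c, gw) for c in BASE_KERNELS]


for name in "smlx":
    print(f"YOLOv5-{name}:", *variant(name))
```

Under these assumptions, YOLOv5-s keeps only 1, 3, 3, and 1 residual units with half the kernels, which is consistent with the observation that shallower, narrower variants detect small flame and smoke targets less reliably.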

Improved YOLOv5 Algorithm.
Finally, we adjusted the number of residual structures and the number of convolution kernels in the YOLOv5 algorithm to select the parameters best suited to lithium battery combustion feature detection, achieving better recognition results than the unimproved algorithm. A CBAM attention module was also added after the feature extraction network to increase the weights of valid features, allowing the algorithm to focus on important features and suppress unnecessary ones, which greatly improves detection accuracy without affecting detection speed.
The improved network structure is shown in Figure 8. The differences from the unimproved algorithm in Figure 3 are marked in Figure 8 with a blue background.

Experimental Procedure and Results Analysis
3.1. Dataset Acquisition and Preprocessing.
The initial characteristics of electric vehicle lithium battery fires are usually dominated by white smoke and open flames from deflagration, followed by the gradual ignition of other body structures, which produces black smoke, continuous flames, and so on. Although these features are extremely complex, they share some similarities, so we focus on selecting images with the above features to build the dataset. Figure 9 shows some of the images in the flame and smoke dataset: (a) an electric vehicle catching fire while charging at a charging station; (b) and (c) sudden lithium battery fires while an electric vehicle is in use; and (d) an electric vehicle catching fire at night. The LabelImg software is used to annotate the flame and smoke regions in these images and generate XML files containing the target location information, forming a dataset that follows the VOC annotation specification. In real scenarios, flames may be partially occluded and smoke may blend into the sky background, so rectangular boxes of different sizes are used for annotation in different situations to improve the accuracy of the dataset. The final experimental dataset consists of 3391 images.
The detection results of YOLOv5 under different parameters are listed in Table 3 and compared in Figure 10.
The data in Table 3 and Figure 10 show that the network with Depth_multiple = 1.00 and Width_multiple = 1.00 (YOLOv5-l), that is, with 3, 9, and 9 residual components in the CSP1_X structures, 3 residual components in the CSP2_3 structure, and 64, 128, 256, 512, and 1,024 convolutional kernels in the CBL structures, has the best feature extraction performance. It gives the best detection results and the highest accuracy for the intense smoke and flames generated by electric vehicle combustion.
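The LabelImg/VOC annotations described in Section 3.1 are typically converted into normalised centre-format label lines before YOLOv5-style training. The paper does not describe this step, so the sketch below is an assumption about a typical pipeline. It uses only the Python standard library; the class names `fire` and `smoke` and the function `voc_to_yolo` are hypothetical and should match whatever labels were used during annotation.

```python
import xml.etree.ElementTree as ET

CLASSES = ["fire", "smoke"]  # hypothetical label names; use those set in LabelImg


def voc_to_yolo(xml_text, classes=CLASSES):
    """Convert one VOC XML annotation into YOLO label lines.

    Each line is 'class_id x_center y_center width height', normalised to [0, 1].
    """
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in classes:
            continue  # skip labels outside the chosen class list
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(box.find(t).text) for t in ("xmin", "ymin", "xmax", "ymax"))
        xc = (xmin + xmax) / 2 / img_w
        yc = (ymin + ymax) / 2 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{classes.index(name)} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


sample = """<annotation><size><width>640</width><height>480</height><depth>3</depth></size>
<object><name>fire</name><bndbox><xmin>100</xmin><ymin>120</ymin><xmax>300</xmax><ymax>360</ymax></bndbox></object>
<object><name>smoke</name><bndbox><xmin>0</xmin><ymin>0</ymin><xmax>640</xmax><ymax>240</ymax></bndbox></object>
</annotation>"""
for line in voc_to_yolo(sample):
    print(line)
# → 0 0.312500 0.500000 0.312500 0.500000
# → 1 0.500000 0.250000 1.000000 0.500000
```

The (width, height) values in the converted labels are also the natural input for the anchor clustering step, once they are scaled back to the network input size.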

Comparison of Detection Effects of Different Backbone Feature Extraction Networks.
Based on the experiments in Section 3.3, YOLOv5-l was chosen as the basis for network improvement, and YOLOv5 with different backbone networks was then used to compare and test the […]. According to the data in Table 4, the average FPS is only 10.86. When CSPDarknet is used as the backbone feature extraction network, the mAP is only 2.11 percentage points lower, but the average FPS is significantly higher at 25.04. As Figure 11 shows, the YOLOv5 algorithm with CSPDarknet as the backbone network has the best detection performance, detecting smaller targets and resisting interference better than the others.

Comparison of YOLOv5 Algorithms Using Different Attention Mechanisms.
The comparative experiments in Section 3.4 show that the YOLOv5-l algorithm with CSPDarknet as the backbone network gives the best recognition results, so further improvements are based on it. Adding an attention mechanism allows the model to locate relevant information and suppress useless information. Commonly used attention mechanisms include SENet, CBAM, and ECANet. In our experiments, each of these three attention mechanisms is embedded at the same location to improve detection accuracy: before the neck, the image features at three scales output by the backbone network are fed into the attention module so that they are weighted. The YOLOv5 algorithm with CBAM increases the number of parameters by less than 0.01%, achieves the highest detection accuracy, and gives the best detection results on the features generated during lithium battery combustion. The detection results obtained with the SENet, ECANet, and CBAM attention mechanisms are compared in Figure 12.
Table 5 compares, on the same dataset and in the same experimental environment, the algorithm models with different backbone networks, different residual structure and convolutional kernel bases, and different attention mechanisms. It shows that the YOLOv5 algorithm based on the CSPDarknet backbone network with residual structure base 3, convolutional kernel base 64, and embedded CBAM performs best, giving the best results for electric vehicle charging safety monitoring.

Comparison of Improvement Results
Because lithium batteries react rapidly when burning, the temperature rises sharply, which can easily ignite the surrounding materials and set the whole car on fire. Therefore, we selected images of whole cars burning to conduct further comparative experiments and to test the universality and robustness of the algorithm.
Figure 13 compares the original images with the detection results of the unimproved and improved algorithms. The improved algorithm marks the locations of flames more accurately and detects smoke features.

Conclusions
To address the potential safety issues of electric vehicle charging, a target detection algorithm is proposed that monitors flame and smoke targets in real time in the complex scenes of the electric vehicle charging process. The best target detection model for EV charging safety monitoring was identified through experimental comparison and analysis and then improved by adding CBAM. The improved algorithm surpasses several well-known target detection algorithms in evaluation metrics, detection accuracy (mAP of 94.09%), anti-interference capability, and real-time performance. It can be ported inexpensively to mobile devices for real-time monitoring, providing a creative and practical solution for the safe operation of electric vehicle charging stations.
In the future, the algorithm can also be applied to unmanned charging stations, to the production, transport, and storage of lithium batteries, and to other real-time safety monitoring scenarios, including the detection of lithium battery fires in public areas, providing security for lithium battery-related applications.


(4) In the initial stage of model training, the weights obtained by training YOLOv5 on the COCO dataset are used for transfer learning, which speeds up the convergence of the model on the flame and smoke dataset, reduces training time, and improves the training results. The whole experiment runs for 100 training epochs with the confidence threshold set to 0.5; the batch size is 8 for the first 50 epochs and 4 for the last 50 epochs, the input size is 416 × 416, and Adam is used as the optimizer.
3.3. Comparison of Different Widths and Depths.
The same validation set was used to test the different algorithms described above on detecting the combustion feature targets of electric vehicles. The results are shown in Table 3.

Figure 9: Schematic diagram of the flame and smoke dataset. (a) Fire during charging, (b) side view of an electric vehicle fire, (c) front view of an electric vehicle fire, and (d) fire at night.

Table 2: Comparison of YOLOv5 algorithm structures under different parameters.

Table 3: Comparison of YOLOv5 detection metrics under different parameters.

Table 4: Comparison of evaluation metrics for different backbone feature extraction networks.

Table 5: Comparison of evaluation metrics for different backbone feature extraction networks.