Human Detection and Action Recognition for Search and Rescue in Disasters Using YOLOv3 Algorithm



Introduction
Drone reconnaissance is becoming popular nowadays for relief and rescue in catastrophic events. Nearly every week, another natural disaster somewhere in the world makes another headline. Speaking of India, natural disasters such as floods, tropical storms, typhoons, and the whole range of other natural calamities are among the greatest threats and most widely affect our nation's economy. Disasters and natural catastrophes cause enormous widespread human, material, ecological, and financial losses.
A variety of drone types are available on the market. A few of the basic drones are listed as follows: (a) multirotor, (b) fixed wing, and (c) fixed-wing hybrid vertical take-off and landing (VTOL).
1.1. Multirotor. It is the easiest and cheapest option of all the drones. It allows the user to attach a small camera and "keep an eye in the sky" for a short period of time.
1.2. Fixed-Wing. Fixed-wing drones (not "rotary wing," like helicopters) use a wing, as a typical aeroplane does, to generate lift instead of vertical-lift rotors. Because of that, they only need to use energy to move forward rather than to hold themselves up in the air, so they are considerably more efficient and able to stay in the air for long periods of time.

1.3. Fixed-Wing Hybrid VTOL.
The combination of the advantages of fixed-wing UAVs with hover capability forms another hybrid classification that can also take off and land vertically. There are different kinds of designs in development; some of them are primarily existing fixed-wing aircraft fitted with vertical-lift motors. Others are "tail sitters," which look like a conventional aircraft resting on the tail on the ground, pointing straight up for launch before pitching over to fly normally, or a type of "tilt rotor," where the rotors, or even the whole wing with the attached propellers, can rotate upwards at take-off and then point forward for level flight.
There are various types of natural disasters that occur every day in various parts of the world, such as earthquakes, floods, and wildfires. A wildfire, sometimes known as a forest fire, is an unplanned fire that happens in a forest, grassland, or other natural area. Wildfires can occur anywhere at any moment. They frequently result from human activity or natural occurrences like lightning; the cause of about half of the registered forest fires is never known.
According to a report in the Times of India (TOI), India's average annual economic losses due to disasters are estimated to be $9.8 billion. Moreover, speaking of recent disasters, around 103 lives were lost in the Assam floods (source: Hindustan Times). Drone surveillance involves using unmanned aerial vehicles (UAVs) to capture still pictures and video-based data to collect information on particular targets, which can be individuals, groups, or environments. In this document [1], they examined the use of UAV-based human detection technology in search and rescue operations during natural disasters. Drone surveillance is used to surreptitiously gather data about a target captured from a distance. It is shown in Figure 1.
In this research paper [2], they suggested a cost-effective CNN fire detection architecture for keeping track of videos. The model was based upon the architecture of GoogLeNet, taking into consideration its reasonable computational complexity and the relevance of the problem compared with other high-cost computing networks such as AlexNet. To balance efficiency with accuracy, the model had been refined considering the nature of the target and the fire data. The experimental results on fire benchmark datasets demonstrated the effectiveness of the proposed framework and validated its suitability for fire detection in CCTV monitoring systems compared to the latest methods.

Journal of Electrical and Computer Engineering

A significant topic in emergency response management was considered: the allocation and scheduling of rescue units. MOMILP was used by the author to allocate and schedule rescue units in the event of a natural disaster. The first goal was to reduce the sum of weighted relief operation completion times, and the second was to reduce the makespan. This model used a single-objective mixed-integer programme with a linear utility function to tackle the problems of prior models. Their experiment demonstrated the efficacy of the recommended strategy in achieving high-quality results. However, determining the processing time, travel time, and severity level still takes time [3].
In their previous work, through the use of VANET, cloud computing, and simulations, they created a system for handling disasters and developed evacuation strategies for the city. In this paper, they built on past research by utilising deep learning to forecast the behaviour of urban traffic. They also used GPUs to address the computationally intensive nature of deep learning methods. They were the first to bring a deep learning method to disaster management. They utilized real-world open road traffic data for a city, available through the UK Department of Transport. Their findings demonstrated the effectiveness of deep learning in managing disasters and accurately predicting traffic behaviour in emergency situations [4].
In this research [5], they proposed a method to improve post-natural-disaster management operations by selecting the appropriate disaster type and site. They initially retrieved the disaster-related tweets from the Twitter API using predefined keywords. At the second stage, the posts were cleaned and the noise level was reduced. The third stage then moved on to determining the disaster type and geolocation. The Named Entity Recognizer library and the Google Maps Geocoding API were also utilized to acquire the geolocation. They used the same three steps to retrieve news using the News API. To determine the veracity of each Twitter message, they contrasted the Twitter data with the data from the news.
In this paper [6], they provided a framework for early fire detection for CCTV security cameras that used customised convolutional neural networks to detect fire in a variety of indoor and outdoor environments. To ensure an autonomous response, they recommended an adaptive priority technique for the surveillance system's cameras. Finally, they offered a dynamic channel selection approach for cameras based on cognitive radio networks to guarantee precise data transfer. Experimental results demonstrated the higher accuracy of their fire detection methodology in comparison to state-of-the-art technologies and supported the applicability of their framework for successful fire catastrophe management.
The three goals of this study [7] were: (1) creating instructional resources for teaching disaster mitigation; (2) understanding student learning results when utilising the developed materials; and (3) understanding student reactions to the developed materials.
They discussed real-time human identification on a fully autonomous rescue UAV in this article using the YOLOv3 algorithm. The embedded system that was created could find swimmers in open water by using deep learning algorithms. This improved the operating capability of first responders by enabling the UAV to give help precisely and entirely unattended. The unique aspect of this work was the integration of computer vision algorithms and the global navigation satellite system (GNSS) for exact human recognition and release of rescue equipment. The hardware configuration as well as the system's performance assessment were covered in detail [8].
In order to identify aeroplanes from satellite photos, this study compared and evaluated the most recent CNN-based object detection models. The DOTA dataset was used to train the networks, and the DOTA dataset as well as independent Pleiades satellite pictures were used to assess their performance. According to COCO metrics and F1 scores, the faster R-CNN network produced the best results. With less processing time, the YOLO-v3 architecture also produced promising results, but SSD was unable to effectively converge on the training data within few iterations. With more rounds and various parameter settings, all of the networks tended to learn more. When compared to other networks, YOLO-v3 can be said to have a faster convergence capability; however, optimization techniques also play a significant part in the process. SSD was superior in object localization while having the weakest detection performance. The disparity in object sizes and diversities also had an impact on the results. Imbalances should be avoided, or the categories should be broken down into smaller grains, such as aeroplanes, gliders, small planes, jet planes, and warplanes, while training deep learning architectures [9].
In this document [10], they came up with a dataset on drones for human action recognition. This dataset can also serve detection and other similar tasks in different surveillance applications. The dataset provided was diverse in colour, size, actor, and background. This variation allowed for the generalization of the proposed dataset to various applications. In addition, their primary purpose was to provide support to search and rescue operations through drone surveillance. An experimental comparison of deep learning measurement models applied to the proposed dataset and additional publicly available datasets was submitted. A new detection model for the recognition of actions was also proposed in this document, which obtained a 7% higher mAP value compared to the advanced SSD when used on the publicly available Okutama dataset. The suggested model also achieved 0.98 mAP when it was applied to their two-class action detection datasets for SAR, which is a good performance value for a live application.

In this paper [11], to establish a uniform PASCAL VOC format picture database, they gathered pill images and used LabelImg. The pill dataset was used to train RetinaNet, SSD, and YOLOv3, three of the most popular object detection techniques currently available. The YOLOv3 model's loss function converged more quickly, which suggested that its training time was less than that of the other two models. As a result, it could better handle the effects of retraining the model owing to the frequent changes in pills in pharmacies. Using mAP and FPS as the evaluation metrics, they compared the three models.
The main purpose of this document was a conceptual model for deducing the emergency plan in the event of a natural disaster. The multiagent system (MAS) based emergency response plan deduction architecture, including both the internal structure and the reporting mechanism of the agents, had been designed. The suggested deduction model had been tested on the JADE platform. The findings indicated that the natural disaster contingency plan deduction conceptual model had been developed and was better able to meet the challenge of modelling in the complex system of inference of contingency plans and could achieve the desired results. Within NDREPMACS, there was a problem of coordination and collaboration among separate agents, which was referred to in the study. To address these issues, in-depth cooperative research in emergency management, artificial intelligence, the science of disasters, and other fields was needed [12].
The primary contribution of this article was to create a formal RSLD model for managing natural disasters that could be customised for viewing any real RSLD situation. Various stochastic aspects were considered during RSLD modelling, including the selection of trajectory, the destruction of the trajectory, the selection of vehicles, the destruction of vehicles, and the passage time of the trajectory. The findings of the analysis demonstrated the efficacy of the proposed framework, which could be extended to the formal model and used to assess other aspects of disaster management like evacuation shelters and evacuation planning [13].
They recommended a hybrid heuristic approach [14] built on a bilevel optimization model and a machine-learning framework. An optimization framework was used to iteratively find a better scenario using a supervised learning (regression) model that was trained using data from several contraflow scenarios. Real datasets from Winnipeg and Sioux Falls were used as benchmarks to assess this approach. It was programmed as a single computer programme that worked in conjunction with the general algebraic modeling system (GAMS) for optimization, MATLAB for machine learning, Equilibre Multimodal, Multimodal Equilibrium 3 (EMME3) for traffic simulation, MS-Access for data storage, and MS-Excel for data analysis (as an interface). By changing the direction of some roads, the algorithm improved accessibility to Winnipeg's centrally located, crowded, and congested districts while also producing optimal global solutions for the Sioux Falls example.
In this study [15], "deep learning" classification techniques were compared to human-coded photos released during Hurricane Harvey in 2017. The VGG-16 convolutional neural network/multilayer perceptron classifiers were used in their framework for feature extraction to classify the urgency and time period for a given image. They found that machine learning algorithms did not always capture the unique characteristics of disaster situations. Together, these techniques helped locate relevant material and requests by sorting through the volume of irrelevant social media posts.
In order to operationalize the science and technology roadmap for disaster risk reduction (DRR), this article evaluated the main knowledge gaps and potential for boosting the transdisciplinary approach (TDA). To promote science-based policy implementation at global, regional, and local levels for risk-informed decision-making throughout the post-2015 Agenda, it was necessary to strengthen the links between science and policy. The crucial TDA elements for enhancing DRR, climate change, and sustainable development strategies must be included, and DRR stakeholders must work together to improve collaboration through an integrated strategy that uses science, technology, and these TDA components [16].
The objectives of this work [17] were to investigate cutting-edge disaster recovery (DR) solutions, thoroughly analyze recently published research, and identify various methodologies that were described in the literature. 49 research studies, spanning the years 2007-2017, were examined as part of a comprehensive mapping effort. Numerous DR techniques were being researched. The findings revealed a variety of pertinent concerns, such as justifications for adopting DR solutions, implementation strategies, analytical methods, and metrics taken into account when DR solutions were analysed.
In this paper [18], they provided a useful foundation for studies on intelligent search, rescue, and disaster recovery missions. They examined the fundamental machine-learning methods needed for object detection and path planning in clever rescue operations. They also demonstrated the viability of the suggested architecture using a proof-of-concept hardware-in-the-loop (HIL) simulator framework to support a specific rescue mission scenario. By building a proof-of-concept prototype for search, rescue, and disaster recovery operations, they illustrated the Internet of Things (IoT) architecture and put it to the test.
In this research [19], they reviewed the ways in which post-tsunami disaster response had benefited from the development of remote sensing techniques. The performance assessments of the remote sensing techniques were addressed in light of the requirements for responding to the tsunami disaster in the long term.
In some other papers, the authors used live simulation data/real-time data ([16] and [18]), United Nations World Tourism Organization data [20], etc. They are given as follows.
In this study [16], population estimation in the region of interest and the population's migration pattern were investigated using cutting-edge object detectors. It was shown that the majority of detectors exhibit numerous detections for a single object, therefore exaggerating the number of objects. In order to decrease numerous redundant detections and enhance the count and mean average precision of the discovered classes, a nonmaxima-suppression (NMS) technique was applied after the detections.
A dataset on natural and man-made catastrophe occurrences was incorporated into a model of international tourism flows in order to evaluate the effects of various disaster types on foreign arrivals at the national level [20]. The findings demonstrated that various events can modify tourist flows to differing degrees. While a positive effect was observed in some situations, the results were frequently unfavorable and led to reduced tourist arrivals after an event. Destination managers who make critical decisions regarding recovery, reconstruction, and marketing benefited from understanding the relationship between catastrophic events and tourism.

Deep Learning Algorithms. The YOLO algorithm was used in articles [8, 9, 11, 21] and [22], and different types of CNN algorithms, such as R-CNN, faster R-CNN, fine-tuned CNN, and U-Net CNN, were used in papers [2, 6, 8, 9, 16, 19] and [10] for human detection and action recognition for SAR. They are given as follows.
In this paper [21], they looked at the usage of composite photos to hone an efficient victim detector. They were inspired because it was difficult to find real victim photographs for training, and the state-of-the-art detectors trained on the COCO dataset could not reliably identify disaster victims. They suggested that in order to create composite victim photos, human body parts should be copied and then pasted onto a background of rubble. Their approach took into account the fact that actual victims were frequently covered in the rubble and could only be seen in fragments. The body sections were thereby randomly pasted, as opposed to earlier approaches that copied and pasted an entire object instance. The experimental findings showed that fine-tuning the detectors using their composite images could significantly increase the average precision (AP). They had examined some cutting-edge detectors. Their unsupervised deep harmonisation network, which could create harmonic composite images for training and aided in improving the detectors even further, was also proven to be beneficial. The YOLOv3 algorithm was used to detect the human.

In this paper [22], they intended to introduce the use of robots for the initial investigation of the disaster site. The robots toured the area and used the video stream (with audio) they had recorded to locate the human survivors. They proceeded to transmit the survivor's discovered position to a main cloud server. In order to establish whether it was safe for rescue personnel to access the chosen area, it was also necessary to monitor the area's associated air quality. They employed a human detection model for images with a mean average precision (mAP) of 70.2%. The F1 score of the suggested method's speech detection technology was 0.9186, while the architecture's overall accuracy was 95.83%. To increase the accuracy of the detection, they merged audio and image detection (YOLOv3 algorithm) approaches.
In this work [23], a critical review of current disaster management and warning systems was performed, finding that the SMS alert system was highly useful but had not yet been approved in Romania. There were various gadgets which helped them build the system, as well as software modules capable of managing devices all over the world. In addition, along with the SMS alert system, the real-time data collection system was a vital component. Some international researchers had made a few feeble attempts to explore the subject, but this technique had been noted as more accurate.
In this article [24], they outlined automated techniques for extracting information from microblog posts. In particular, they emphasised extracting precious "nuggets of information": short, self-contained pieces of information relevant to disaster response. In order to categorise messages into a set of precise classes, their system made use of cutting-edge machine learning algorithms.
In this paper [25], they discussed a technology called artificial intelligence for disaster response, which classifies tweets in real time into a number of user-defined situational awareness categories. The platform used a combination of machine learning and human intelligence to assign tags to a subset of messages and create an algorithmic classification for the remaining messages. The platform used an active learning approach to select possible messages to tag and continuously learned to increase classification accuracy as new training examples became available.
The focus of this study was on a specific natural calamity, i.e., landslides, which was lacking in the preparation phase. This research provided a review of the conceptual framework for managing landslides as a natural hazard. Background, goal, model, substance, legitimation, implementation, and contribution were the seven criteria used in the evaluation. The evaluation revealed the framework's strengths and flaws, which could be used by the user to identify any gaps. The findings showed that the framework could be used as a guideline for landslide management during the preparedness phase [26].

Cloud Computing Technology. The cloud computing technology was used in articles [4, 27] and [22]. They are given as follows.
In this article [27], they put forward a disaster recovery (DR) structure for the e-learning environment. In particular, they described assistance in using the provided framework, and they showed the significance of the response to earthquake and tsunami events for the e-learning environment. They constructed the model system according to the suggested framework, and they described the results of its experimental use and examination. Going forward, they implemented their disaster recovery framework on a cloud orchestration framework like OpenStack. They tried to validate that it is effective in a cross-organizational environment with multipoint organizations. Also, they believed it was necessary to reconfigure an OpenFlow control method in order to shorten live migration time.
In this article [28], they presented a short review of how UAV capabilities had been used in disaster management, with examples of existing use in disaster management and adoption considerations. Disaster domains included fire, tornadoes, flooding, building and dam collapses, crowd monitoring, search and rescue, and postdisaster critical infrastructure monitoring. This review might increase awareness of both the promise and the problems of UAVs among those facing crisis and disaster management.
In this study [30], they highlighted unresolved research questions about unmanned aerial vehicles (UAVs) in disaster management (DM) and indicated the key uses of the UAV network for DM. Based on the reviewed studies, UAV networks, along with the wireless sensor network (WSN) and Internet of Things (IoT), were promising future technologies for DM applications. The combined role that WSN, IoT, and UAV systems could play in both natural and man-made DM was the main focus of this article. This paper's initial important contribution was the classification of ongoing research projects that use various technologies combined with UAVs for DM. It also concentrated on the various network architectures and technologies utilized in the DMS. Following that, it covered further crucial facets of using UAVs to provide emergency communication during natural and man-made disasters.

Network Performance.
OPNET was used to evaluate the network performance in papers [29, 31]. They are given as follows.
In this document [31], they evaluated the performance of the UAV-assisted intelligent on-board computing network to speed up search and rescue (SAR) missions and capabilities, because it could be rolled out in a short period of time and could assist most people in the event of a disaster. They examined network parameters such as delay, speed, and traffic sent and received, as well as path loss for the suggested network. It was also established that with the suggested parameter optimization, network performance increases considerably, ultimately resulting in much more efficient SAR missions in the event of a disaster and in tough environments.
In this article [29], the authors concentrated on network performance for effective disaster management through the collaboration of drone edge intelligence and smart wearable devices. They concentrated mostly on network connectivity factors to enhance real-time data exchange between wearable smart devices and drone edge intelligence. In this study, pertinent characteristics such as throughput, delay, and the load from drone edge intelligence were taken into account. Furthermore, it was demonstrated that when the aforementioned parameters were properly optimised, network performance could greatly improve, which would have a positive impact on the effectiveness and efficiency of search and rescue (SAR) team guiding and coordination.

Dataset Description and Sample Data
The dataset for the proposed work has been taken from https://www.leadingindia.ai/data-set.
The number of action classes is eight. The camera motion is present, slow, and steady. The total number of images used for the proposed model is 1,996, at a resolution of 1920 × 1080. The annotations are bounding boxes in the .txt (YOLO) format. The information about the dataset used for the proposed model is shown in Table 2.
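As a minimal illustration of the .txt (YOLO) annotation format mentioned above, the sketch below converts one normalised annotation line into pixel coordinates at the dataset's 1920 × 1080 resolution; the function name and the sample line are hypothetical, not taken from the dataset itself.

```python
# Hypothetical sketch: parse one YOLO-format annotation line.
# Each line is "<class_id> <x_center> <y_center> <width> <height>",
# with coordinates normalised to [0, 1] relative to the image size.

def parse_yolo_line(line, img_w=1920, img_h=1080):
    """Convert a normalised YOLO annotation line to pixel coordinates."""
    cls, xc, yc, w, h = line.split()
    xc, yc = float(xc) * img_w, float(yc) * img_h
    w, h = float(w) * img_w, float(h) * img_h
    # Centre/size form -> corner form (x_min, y_min, x_max, y_max).
    return int(cls), (xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2)

cls_id, box = parse_yolo_line("0 0.5 0.5 0.25 0.5")
```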
Table 3 shows sample images from the dataset with various actions of people captured by drone from various angles and locations. The actions are human standing and waving, human running, human standing, human sitting, human walking, human standing and running, human waving, and human standing and walking.

Proposed Algorithm with Flowchart
The proposed algorithm has used SSD with text detection. The steps involved in the proposed algorithm are given as follows:

4.2.
Step 2: Data Cleaning and Pre-Processing. Images whose annotations were not given were removed. For data preprocessing, the following steps were taken: (a) Frame selection: consecutive extracted frames repeated the same action features when they were first extracted. We experimented by eliminating 10, 15, 20, and 30 frames from the suggested dataset while preserving one frame. (b) Annotations: Labelme software was used to manually annotate 1,996 images, as the given annotation was not suitable for our proposed algorithm.
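The frame-selection step above can be sketched as follows; this is an illustrative simplification that keeps one frame out of every `step` frames (steps of 10, 15, 20, and 30 were tried), not the exact preprocessing script.

```python
# Illustrative sketch of frame selection: keep every `step`-th frame
# so that near-duplicate consecutive frames are dropped.

def select_frames(frames, step):
    return frames[::step]

kept = select_frames(list(range(100)), 20)  # indices 0, 20, 40, 60, 80
```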

4.3.
Step 3: Training the Model. 80% of the dataset images were used in training the network. We trained our model using the Darknet-53 convolutional neural network. On the human action detection dataset, with the validation set, we trained the network for about a week and achieved 94 percent accuracy. The darknet framework was used for all training and testing. The model was then adapted to carry out detection. We added four convolutional layers and two fully connected layers with randomly initialized weights, following their lead. As detection often requires fine-grained visual information, we increased the network input resolution from 224 × 224 to 448 × 448. Class probabilities and coordinates of the bounding box are predicted by our final layer. We want only one bounding box predictor to be responsible for each object during training. The task of predicting an object is assigned to the predictor whose prediction has the highest current IoU with the ground truth. Thereby, the bounding box predictors become more specialized. A convolutional neural network is generally composed of one input layer, one convolutional layer, one pooling layer, and one output layer. The system captures an image in two dimensions, and the convolution layer extracts and maps the important features of the image in the form of a sliding window; the pooling layer downsamples the input feature map, reduces the intricacy of the calculations, and extracts the main features. Image characteristic information is obtained by convolving. The flowchart of the YOLOv3 algorithm is shown in Figure 2.
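The 80/20 train/validation split described above can be sketched as follows; the file-name pattern and the random seed are assumptions for illustration, not details from the paper.

```python
# Hedged sketch of an 80/20 split of the 1,996 annotated images.
import random

def split_dataset(image_paths, train_fraction=0.8, seed=42):
    """Shuffle deterministically, then cut into train and validation lists."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_fraction)
    return paths[:cut], paths[cut:]

images = [f"frame_{i:04d}.jpg" for i in range(1996)]  # hypothetical names
train, val = split_dataset(images)
```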
The third iteration of the YOLO object detection algorithm is called YOLOv3. A YOLO network's structure resembles that of a typical CNN. It has multiple convolutional and max pooling layers before ending with the fully connected layers. By employing techniques like multiscale prediction and bounding box prediction through the use of logistic regression, YOLOv3 greatly improved the design. A novel CNN architecture called Darknet-53 is used in YOLOv3. Darknet-53, a variant inspired by the ResNet architecture, was created primarily for jobs involving object detection. On a variety of object detection benchmarks, it achieves cutting-edge performance thanks to its 53 convolutional layers. Anchor boxes in the YOLOv3 version have various scales and aspect ratios. To better fit the size and shape of the objects being detected, the anchor boxes in YOLOv3 are scaled and their aspect ratios are altered.
Furthermore, "feature pyramid networks" (FPNs) are introduced in YOLOv3. FPNs are a CNN architecture that can recognise objects at various scales. They build a pyramid of feature maps, and they use each level of the pyramid to find objects at various scales. Due to the model's ability to view the objects at different scales, this helps to improve the detection performance for small objects. The range of object sizes and aspect ratios that YOLOv3 can handle has increased. Figure 3 depicts the architecture diagram for the YOLOv3 algorithm, which has 106 layers of convolution.
One-step approach YOLOv3 features are given as follows.
This network does not look at the complete image; rather, it looks at the parts of the image that may contain the object.

(i) A single neural network predicts bounding boxes and class probabilities for these boxes
(ii) The input image is divided into S × S grids, each with "m" bounding boxes
(iii) For each bounding box, the network generates an offset value and a class probability
(iv) The object in the image is located using bounding boxes that have been selected and have a class probability higher than a threshold value

Object detection = localisation * classification.
It predicts B bounding boxes for every grid cell, with a confidence score for each box; each grid cell detects only one object, regardless of how many boxes B there are; and it forecasts C conditional class probabilities (one for each class, for the probability of the object's class).
As can be understood, the midpoint of the flow is the cut-off value, which determines the bounding box for an object in the image, as displayed in Figure 4.
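The class-probability cut-off described above can be illustrated with a minimal sketch; the box coordinates, scores, and threshold value below are made up for illustration.

```python
# Minimal sketch of the confidence cut-off: keep only boxes whose
# class-probability score exceeds the threshold value.

def threshold_boxes(boxes, scores, cutoff=0.5):
    return [b for b, s in zip(boxes, scores) if s > cutoff]

kept = threshold_boxes([(0, 0, 10, 10), (5, 5, 15, 15)], [0.9, 0.3])
```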
The object detection method used by SSD (single shot multibox detector) has two aspects as follows: many predictions, as expected, contain no object; class "0" is designated by SSD to indicate that a prediction has no object. SSD has no dedicated region proposal network. Rather, it uses a very simple process: small convolution filters are used to compute the location and class scores. After collecting feature maps, SSD employs three convolution filters for each cell to create predictions.

4.4.
Step 4: Testing the Model. YOLO is exceptionally quick during testing compared to classifier-based techniques because it only requires one network evaluation. The grid layout guarantees the spatial diversity of the bounding box forecasts. Because it is frequently obvious to which grid cell an object belongs, the network anticipates only one box for each object. For each picture, our network outputs 98 bounding boxes with corresponding class probabilities. On the other hand, large objects or those that are near the boundary of several cells can be localised by multiple cells. Nonmaximal suppression was applied at IoU > 0.5, and the outcomes were better.
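The nonmaximal suppression step at IoU > 0.5 can be sketched as below; this is a generic greedy NMS under the usual (x_min, y_min, x_max, y_max) box convention, not the exact implementation used in the paper, and the example detections are invented.

```python
# Greedy non-maximum suppression: keep the highest-scoring box and
# discard any remaining box that overlaps it above the IoU threshold.

def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)        # highest-scoring remaining box
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

# Two heavily overlapping detections of one person plus one distant detection.
kept = nms([(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)], [0.9, 0.8, 0.7])
```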

4.5.
Step 5: Performance Evaluation. IoU (intersection over union), mAP (mean average precision), precision, and recall are used for performance evaluation. They are given as follows.
The overlap between two boxes is measured through the IoU. We use it to see how close our predicted boundary is to the ground truth (the boundary of the real object). In some data sets, an IoU cutoff (for example, 0.5) determines whether a prediction is correct or wrong.
IoU is given by the following equation: IoU = Area of Overlap / Area of Union. IoU compares the predicted bounding box at test time against the actual (ground-truth) bounding box. If the prediction is identical to the ground truth, the IoU is 1; a value of IoU nearer to 1 indicates that the model is working well.
For object detection, the mAP (mean average precision) is the mean of the average precision attained across all classes. It is also worth noting that some studies use the terms average precision and mAP interchangeably. Precision and recall are given by the following equations: Precision = TP/(TP + FP) and Recall = TP/(TP + FN), where TP = true positive, TN = true negative, FP = false positive, and FN = false negative.
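The evaluation metrics above are straightforward to compute once the TP/FP/FN counts and the per-class average precisions are available. A minimal sketch of the definitions only (how TP/FP/FN are counted from IoU matching is outside this snippet):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

def mean_average_precision(per_class_ap):
    """mAP is simply the mean of the per-class average precisions."""
    return sum(per_class_ap) / len(per_class_ap)
```

For example, a detector with 8 true positives, 2 false positives, and no false negatives has a precision of 0.8 and a recall of 1.0.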

Experiments Results
Natural disasters are unforeseen occurrences that can result in significant economic and environmental losses as well as the loss of lives. As an organisation, it is our duty to ensure that suitable recovery plans are in place in case these natural tragedies occur. A save our souls (SOS) system can be put in place to deal with these kinds of situations.
The SOS system should be able to retrieve live video feeds and messages from a variety of sources, including cameras, news stations, and podcasts. The most typical SOS architecture consists of three main parts: alert, assess, and act.
Data on natural disasters, the weather, terrorism, disease outbreaks, etc., are all fed to the SOS system, and an alarm can be generated from this data. Both international and local incidents are entered into the system.
As part of the assessment step, the threat data are regularly analysed to weed out false positives and identify useful information. The assessment is performed by filtering incident data according to our specific region of interest, the incident's timestamp, its severity, etc. The impact on people, property, and businesses is taken into account, as well as past occurrences, likelihood of occurrence, and other variables. Ad hoc search, range filtering, and geographical queries should be used to evaluate the data and turn it into useful information during this phase. By doing this, we can query the data from a single interface, which enables the analyst to drill down into the data according to its seriousness or to apply any specific filter that makes the data more understandable.
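The filtering described in the assessment step can be sketched as a simple query over an incident feed. This is an illustrative sketch only: the field names ("region", "severity", "timestamp"), the severity scale, and the 24-hour window are assumptions, not the schema of any real SOS product.

```python
from datetime import datetime, timedelta

def assess_incidents(incidents, region, min_severity, window_hours=24):
    """Filter raw incident records down to actionable alerts by region
    of interest, minimum severity, and recency of the timestamp."""
    cutoff = datetime.now() - timedelta(hours=window_hours)
    return [i for i in incidents
            if i["region"] == region
            and i["severity"] >= min_severity
            and i["timestamp"] >= cutoff]
```

In a real system, the same filters would typically be pushed down into the search backend (range and geospatial queries) rather than applied in application code.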
The incident data are shared with victims during the action phase. After the assess phase, a decision is made on whether the occurrence is critical and whether it affects our assets. Emails, personal or business phone numbers, or other forms of contact can be used to communicate this to victims. At this point, the matter is carefully monitored by focusing on it and alerting upper management about the incident. Carrying out the communication phase can be difficult.
An SOS system can be used in natural disasters to locate and track the current impact zone and location of catastrophes such as earthquakes, floods, storms, and volcanic eruptions.
Effective reaction to security issues, improved operational performance, and protection of the business's assets are some of the benefits.
Real-time location alertness, data sharing between businesses, legal issues, poor implementation, recipients' lack of understanding, device failure, lack of a mobile network, difficulties communicating internationally, language barriers, the absence of a good procedure manual, etc., are disadvantages of SOS systems.
An SOS (save our souls) image (a "help is required" image) is given to the SSD using text-based detection. The proposed algorithm's detection accuracy is 73%. The output showing that help is required, with its accuracy, is shown in Figure 5.
A natural catastrophe is defined by the abnormal intensity of a natural agent (such as a flood, mudslide, earthquake, avalanche, or drought) when the usual precautions taken to mitigate the damage either could not prevent its emergence or could not be taken. No natural catastrophe, and hence no call for help, is detected in Figure 6; that is, no help is required.
A "help is not required" image without text is given to the SSD using text-based detection. The proposed algorithm's detection accuracy is 73%. The output showing that help is not required, with its accuracy, is shown in Figure 6.
The YOLOv3 algorithm is used only for the detection and localisation of images. The detection of human actions in various drone images (person standing, person running, person waving, and person walking) using the YOLOv3 deep learning technique is shown in Figure 7.
The YOLOv3 deep learning technique detects various actions such as person standing, person waving, and person walking with a confidence score, as shown in Figure 8. Figure 8 shows the detections along with the localisation and confidence scores for the same images with the same action classes.
The proposed model uses the YOLOv3 algorithm, which is compared with existing algorithms such as F-RCNN, SSD, and RFCN. The existing algorithms detected images with only six actions, could not handle text images, and showed action detection without a confidence score, as shown in Figure 9. YOLOv3 detects images with eight actions, also detects text images, and shows action detection with a confidence score, as shown in Figures 5-7.

Results and Discussion
The proposed model detects images with eight actions using YOLOv3, also detects text images using SSD, and shows action detection with a confidence score. The accuracy comparison between the existing algorithms (F-RCNN, SSD, and RFCN) and the proposed YOLOv3 algorithm is shown in Table 4 for the eight classes: human standing and waving, human running, human standing, human sitting, human walking, human standing and running, human waving, and human standing and walking.
Figure 10 shows the comparative analysis of the existing algorithms (F-RCNN, SSD, and RFCN) with the proposed algorithm (YOLOv3) using the mean average precision values. Figure 11 illustrates that the proposed algorithm (YOLOv3) provides the best accuracy of 95%. The accuracy achieved for SSD with text detection on a graphics processing unit (GPU) is 73%. Hence, these results show that our proposed algorithm stands out with both fewer and more classes: the proposed model has been trained on 8 classes over 1,996 images, with higher accuracy and faster results, whereas the existing models were trained on only around 700 images and only 6 classes.
Figure 12 shows human action detection on videos and real-time detection by the proposed model.

Conclusion and Future Work
In conclusion, the proposed algorithm (YOLOv3) is faster (it gives results in milliseconds), more accurate, and works on more classes than the existing detectors; the proposed model also succeeded in real-time detection where the other models fail. As the main objective is to assist natural disaster management and mitigation teams through drone surveillance, an experimental assessment of the deep-learning action detection model used was reported in the suggested study.
The average loss was also lower and the learning rate high in the proposed method, which was one of the reasons for the higher accuracy of 95%. Fast YOLO is among the fastest and most adaptable object detection systems available, and it pushes the limits of real-time object detection. YOLO is excellent for applications that need speed and reliable object recognition because it can adapt to new domains.
In future work, more action classes (in the datasets) can be included, such as help (waving both hands) and searching, which are more specific to the search and rescue area. We can try to increase the accuracy and decrease the detection time by using more efficient algorithms such as YOLOv4, which is currently in the testing phase. We can also use better-quality drones for clearer pictures, which can then be used to train the system. The proposed work can be further extended to other disasters such as floods, tornadoes, or tsunamis.
(a) Dataset size/live simulation data/real-time data/other data (b) Deep learning algorithms (c) Machine learning algorithms (d) Cloud computing technology (e) Robots/smart wearable devices/IoT and (f) Network performance 2.1. Dataset Size/Live Simulation Data/Real-Time Data/Other Data.

Figure 1 :
Figure 1: Flow diagram of drone surveillance process.

(i) Step 1 .
Data set collection and action selection (ii) Step 2. Data cleaning and preprocessing (iii) Step 3. Training the model (iv) Step 4. Testing the model (v) Step 5. Performance evaluation 4.1. Step 1: Data Set Collection and Action Selection. The dataset has been collected from drone-surveillance video from a leading AI website. Other details about the datasets are mentioned above.

Figure 6 :
Figure 6: Output to show that no help is required, with accuracy.

Figure 7 :
Figure 7: Various action detections (i.e., person standing, person running, person waving, and person walking) in drone images using the YOLOv3 deep learning technique.

Figure 5 :
Figure 5: Output to show help is required.

Figure 9 :
Figure 9: Action detection without confidence scores. (a) Human standing. (b) Human walking. (c) Human walking and standing. (d) Human waving and walking. (e) Human walking and waving. (f) Human walking and standing. (g) Human waving. (h) Human waving and standing. (i) Human waving.

Figure 10 :
Figure 10: Comparative analysis of the existing algorithms (F-RCNN, SSD, and RFCN) with the proposed algorithm (YOLOv3) using the mean average precision (mAP) values.

Figure 11 :
Figure 11: mAP of the proposed model with GPU.

Figure 12 :
Figure 12: Human action detection on videos and real-time detection.

Table 1 :
Existing papers on NDMM and SAR.

Table 4 :
Comparison between all the existing algorithms and the proposed algorithm.