Energy-Efficient Unmanned Aerial Vehicle (UAV) Surveillance Utilizing Artificial Intelligence (AI)

Recently, unmanned aerial vehicles (UAVs) have enhanced connectivity and accessibility for civilian and military applications. A group of UAVs with on-board cameras typically monitors or collects information about designated areas. The UAVs can build a distributed network to share, exchange, and process the collected sensing data before sending it to a data processing center. Large data transmissions among them may cause latency and high energy consumption. This paper deploys artificial intelligence (AI) techniques to process the video data streamed among the UAVs, so that each distributed UAV only needs to send the required information to the others. Each UAV processes its data using AI and transmits only the data that matters. The UAVs, formed as a connected network, communicate within a short communication range and share their data with each other. A convolutional neural network (CNN) extracts features from images automatically, so the UAVs send only the moving objects instead of whole frames. This significantly reduces redundant information for each UAV and for the whole network, saving a large amount of energy. The UAVs can also save energy for their motion in the sensing field. In addition, a flocking control algorithm is deployed to lead the group of UAVs across the working fields and to avoid obstacles when needed. Simulation and experimental results are provided to verify the proposed algorithms in both AI-based data processing and UAV control. The results show promising energy savings for the networks.


Introduction
Autonomous UAV networks have been deployed in many applications in both military and civil fields. With the ability to handle large amounts of data as well as high maneuverability, UAVs are capable of completing a wide range of applications such as oil and gas facility security [1], surveillance [2], emergency response, and seaport monitoring [3]. They are dynamic and effective for sensing and monitoring purposes [4], and, in particular, they can be a core technology in the Internet of Things vision, in which distributed UAVs collect sensing data and exchange the data with each other [5, 6].
A UAV network consists of sensing devices, control algorithms, and communications. UAVs in the network work together cooperatively to complete specific missions. Each UAV obtains visual sensing data with its onboard camera. The sensing data is then exchanged throughout the network for mission purposes. There are two main structures of information sharing: centralized and distributed [7]. In centralized networks, a central processor performs all tasks, including collecting data, computing, and delivering commands to the other nodes in the network. The centralized scheme has a single point of failure at the central processor, and every other node must maintain a connection with the central node. In distributed networks, information is exchanged between nodes, and the computation and decision-making are performed on each UAV itself. UAV networks usually operate in a distributed fashion to improve robustness and reduce the communication burden, as a UAV only needs to connect with its neighbors. Besides information sharing, another consideration is the control algorithm for multi-UAV formations. In UAV-based surveillance systems, UAVs encounter numerous obstacles because they normally operate at low altitudes in urban environments due to policy restrictions [8].
Control algorithms should be able to drive a UAV formation to targeted areas without collisions with obstacles or with other UAVs. In [9], a control algorithm for a team of micro-UAVs based on a leader-follower approach was proposed. These methods have shown good performance in terms of formation shape keeping and smooth maneuvering; however, obstacle avoidance was not considered. The Artificial Potential Field (APF) method has been investigated to deal with obstacle avoidance problems [10, 11]. In these papers, repulsive and attractive forces are generated by the potential field so that an agent avoids collisions while maintaining the desired distances within a formation. However, the APF method suffers from local minima: at these points, the total attractive and repulsive force is zero, which prevents the UAV from reaching its target. In addition, APF methods perform poorly around obstacles with convex and concave shapes [12]. Another powerful approach for controlling robot swarms is flocking control, first proposed by Olfati-Saber [13]. In flocking control, agents in a group only need to keep a certain distance from their neighbors, unlike formation control algorithms, where agents maintain rigid positions with respect to their neighbors. Flocking control algorithms allow the group to change formation shape effectively when encountering obstacles, which makes flocking a suitable approach for UAV-based surveillance systems.
UAV networks have provided highly successful applications for surveillance systems. A UAV-based platform for drought mapping of agricultural crops is presented in [14]. In [15], multiple UAVs are used to monitor and detect traffic congestion. A framework for wildfire monitoring based on a multi-UAV system is developed in [16]. Surveillance tasks often require the ability to monitor multiple points of interest rapidly. Since UAVs operate in aerial environments, they have a broader field of view and encounter fewer obstacles than other kinds of robots. These features make UAVs an appropriate choice for surveillance systems. An intelligent surveillance system (ISS) is a surveillance system with strong data analysis capabilities. An ISS can not only detect or track objects but also analyze data to anticipate the behavior of objects or upcoming events, with minimal human intervention. Numerous ISS applications can be found in the literature, such as traffic monitoring [17, 18] and home security [19]. The ISS is a modern technology that draws on knowledge from various technical fields such as sensing devices, communications, signal processing, and artificial intelligence (AI) [20]. However, due to the large number of cameras deployed in practical surveillance systems, the volume of collected sensing data is also large. This leads to numerous issues in terms of system accuracy, processing time, and data complexity.
AI technologies have developed rapidly in recent years. In [21], motion information is combined with a convolutional neural network (CNN) to classify and track crowds of people. Sultani et al. [22] develop specific classification models to recognize events and correctly identify various human activities. In [23], the authors propose a knowledge representation framework for describing patterns in video sequences; the framework detects on-screen objects more rapidly than deep learning techniques. AI techniques have also been used to manage network traffic: Ant Colony Optimization (ACO) has been applied to improve the performance of software-defined networks (SDN), increasing the quality of experience by 24.1% when ACO is applied to the weight graph of the SDN controller. Most AI algorithms require powerful hardware to process huge amounts of data, which limits the practical application of advanced AI-based signal processing algorithms.
The hardware constraints are even stricter in UAV network surveillance systems. A UAV can carry only a limited battery capacity, and equipping more onboard processing devices increases the weight of the UAV, which reduces its operating time. Commercial UAVs can operate for 20-40 minutes per charge cycle [4]. Most of the energy consumed goes to propulsion [24], which can be addressed by optimizing total flight time for data collection and analysis tasks in wireless sensor network applications on a single charge [25]. In surveillance applications, however, UAVs often hold a certain altitude and position until their energy runs out, so optimizing flight time may not be appropriate. The monitoring or sensing data may be images or videos that require a large amount of memory in each UAV, and transmitting this data to a server or between UAVs over wireless links also consumes considerable energy. As mentioned in [26], the power consumption of wireless data transmission is proportional to the package size; thus, the smaller the transmitted data, the smaller the energy consumption.
As shown in Figure 1, in a surveillance application, each UAV monitors a certain area, and data can be exchanged between neighboring UAVs. Collecting data in video format may cost a large amount of memory in each UAV as well as transmission bandwidth. In addition, while performing surveillance, the UAVs often hover at fixed positions; hence, most of the scene does not change over time, and only moving objects are of interest. Transmitting redundant data such as background frames and overlapped areas is a waste of resources [27], since further analysis tasks are only concerned with the moving objects.
In this work, a framework for energy-efficient UAV surveillance networks is proposed. A group of UAVs is deployed to cover an area that needs to be observed. A flocking algorithm drives the group of UAVs to the sensing areas; it guarantees that the UAV team can travel safely to the required locations and form an appropriate shape to cover them. Then, an AI-based method is proposed to reduce redundant data while the UAVs perform surveillance and data collection. The data processing algorithm consists of three main steps: (i) background modeling, which removes all moving objects from the scenes, and background stitching, which combines the backgrounds modeled by each UAV; (ii) extraction of noticed objects from each frame captured by the UAVs; and (iii) data reconstruction, combining the stitched background from step (i) with the noticed objects from step (ii). The methodology can be viewed as a kind of compressed sensing technique aimed at saving power by removing redundant sensor data [28, 29].
The rest of this paper is organized as follows: Section 2 briefly provides the system models describing both the UAV network deployed in the sensing field and the AI-based methods for processing the surveillance data collected by the UAVs. Section 3 formulates the problems and presents the flocking control algorithm and the AI-based data processing method in detail. Section 4 presents both simulation and experimental results following the steps modeled in Section 3. Finally, conclusions and future research directions are provided in Section 5.

System Model
In this section, the system models are presented. First is the model of a UAV network with the ability to travel, avoid obstacles, and collect video streaming data. Each distributed UAV can also exchange data with its neighbors to construct complete information about the sensing regions. Then, the AI-based data processing framework that enables an energy-efficient UAV network is analyzed.

Multiple UAV Systems.
Consider a team of UAVs deployed at a ground center. After receiving a mission request, the UAV team moves to a target location. The target location is defined as a virtual leader that leads the UAVs in a flocking control algorithm. The collaborative algorithm in [30] is chosen to drive the UAV team; the formation can safely reconfigure its shape to avoid collisions with obstacles while migrating. When the UAV team arrives at the target place, it gradually forms a quasi-lattice formation to fully cover the sensing areas.
Each UAV is assumed to obtain its global position through sensors such as GPS. A downward-facing camera mounted on each UAV provides a constant sensing range R_s. The UAVs are equipped with short-range wireless communication devices that allow them to communicate wirelessly with each other when the Euclidean distance between them is smaller than a constant R_c, called the communication range. Unlike in [30], the sensing range R_s is not required to be smaller than the communication range R_c to ensure nonoverlapping regions; in this work, overlapped regions are acceptable in order to guarantee coverage performance. During processing, the overlapped data is handled by the AI-based data processing algorithm proposed below.

AI-Based Data Processing Method.

The structure of the UAV system is given in Figure 1. Each UAV monitors a distinct area, and the areas handled by different UAVs may overlap with each other. The UAVs form a distributed network and share their local sensing information with the others to reconstruct the global sensing data.
In the first step, background modeling is performed on the UAVs. Captured videos are processed to create backgrounds that contain only nonmoving objects. The backgrounds are sent to the neighbors at the beginning and updated only when they change. The individual backgrounds are then stitched together to form a complete background of the sensing area. To handle overlapped images, an overlap detection algorithm is presented: keypoints and local invariant descriptors are detected, descriptors of overlapping images are matched, and a random sample consensus (RANSAC) algorithm is used to estimate a homography. The obtained homography matrix is then used to warp and stitch the overlapped pictures.
Secondly, the UAVs perform object extraction, where moving objects are detected by comparing differences in a continuous sequence of frames. If motion is detected, the details of the moving objects are determined by a convolutional neural network (CNN). These useful data are also shared across the UAV network.
Finally, reconstructed images are built from the extracted data sent by the other UAVs. The reconstruction can be performed on any UAV. As the sensing data is reduced by the proposed method, the burden on transmission bandwidth and computational resources is greatly diminished.

Problem Formulation
Wireless Communications and Mobile Computing

This section presents an overview of the approaches to the problems in multi-UAV-based surveillance systems. First, the flocking algorithm that drives the UAV formations to the sensing areas is presented. Next, the AI-based data processing method is given; its three steps, shown in Figure 2, are presented in detail.
3.1. Flocking Control for Multi-UAV Systems. In this section, the flocking algorithm for controlling a formation of UAVs is presented. The network of an N-UAV team is modeled by a graph G(V, E), where the vertex set V = {1, 2, ..., N} represents the UAV members and the edge set E = {(i, j): i, j ∈ V, i ≠ j} represents the communication links between pairs of UAVs. Let p_i, v_i ∈ R^2 denote the position and velocity vectors of the i-th UAV (i = 1, 2, ..., N). The dynamics of a UAV are described by the double-integrator model

  ṗ_i = v_i,  v̇_i = u_i,   (1)

where u_i is the control input vector for the i-th UAV. Equation (1) can be used to model distributed UAVs with omnidirectional motion capability.
Consider that each UAV has a communication range R_c. The set of neighbors of the i-th UAV is defined by

  N_i^a = { j ∈ V : ||p_j − p_i|| < R_c, j ≠ i },   (2)

where ||p_j − p_i|| is the Euclidean distance. The superscript a indicates the actual neighbors of the i-th UAV. To provide obstacle avoidance ability, the term "virtual neighbors" is introduced. The virtual neighbors of the i-th UAV are defined as

  N_i^o = { k ∈ V_o : ||p_ik − p_i|| < r_o },   (3)

where r_o is the obstacle detection range, V_o is the set of obstacles, and p_ik is the projection of the position of the i-th UAV onto the k-th obstacle. The virtual neighbors are used to generate the repulsive forces that prevent collisions between UAVs and obstacles.

A team of UAVs forms a formation structure to navigate a large sensing field, and each UAV must avoid collisions with other members as well as obstacles. The distributed flocking algorithm consists of three components, namely, formation control f_i^f, obstacle avoidance f_i^o, and navigation f_i^n:

  u_i = f_i^f + f_i^o + f_i^n.   (4)

The first term f_i^f generates the attractive and repulsive forces for the UAV members to form a formation and also regulates the velocity matching among UAVs in the group. It is designed as [21]

  f_i^f = c_1^a Σ_{j ∈ N_i^a} φ_a(||p_j − p_i||_σ) n_ij + c_2^a Σ_{j ∈ N_i^a} a_ij (v_j − v_i),   (5)

where c_1^a, c_2^a are positive constants, φ_a is the action function [19], n_ij is the vector along the line connecting p_i and p_j, and the σ-norm ||·||_σ is defined as ||x||_σ = (1/ε)[√(1 + ε||x||²) − 1] with constant ε > 0. The σ-norm is differentiable everywhere and is used to construct smooth collective potentials.
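To make the σ-norm and the formation term concrete, a minimal NumPy sketch follows. The gains, the desired spacing, and the placeholder action function are illustrative assumptions (the paper takes φ_a from [19], which is not reproduced here):

```python
import numpy as np

EPS = 0.1               # sigma-norm parameter epsilon > 0
C_A1, C_A2 = 5.0, 2.0   # gains c_1^a, c_2^a (illustrative values)
R_C, D = 10.0, 7.0      # communication range and assumed desired spacing

def sigma_norm(x):
    """||x||_sigma = (1/eps)(sqrt(1 + eps ||x||^2) - 1), smooth everywhere."""
    return (np.sqrt(1.0 + EPS * np.dot(x, x)) - 1.0) / EPS

def phi_a(z):
    """Placeholder action function: attractive beyond the desired spacing,
    repulsive inside it. The actual phi_a of [19] may differ in shape."""
    return np.tanh(z - sigma_norm(np.array([D, 0.0])))

def formation_force(i, p, v):
    """Formation term f_i^f: spacing control plus velocity matching."""
    f = np.zeros(2)
    for j in range(len(p)):
        d_ij = np.linalg.norm(p[j] - p[i])
        if j == i or d_ij >= R_C:
            continue  # j is not a neighbor of i
        # Smoothed unit vector n_ij along the line from p_i to p_j.
        n_ij = (p[j] - p[i]) / np.sqrt(1.0 + EPS * d_ij ** 2)
        f += C_A1 * phi_a(sigma_norm(p[j] - p[i])) * n_ij
        f += C_A2 * (v[j] - v[i])   # velocity matching (a_ij taken as 1)
    return f
```

For two UAVs closer than the desired spacing, the force on each points away from the other, as the repulsive branch of the action function intends.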
The second term f_i^o prevents the UAVs from colliding with obstacles in the sensing environment. It is designed as

  f_i^o = c_1^o Σ_{k ∈ N_i^o} φ_o(||p_ik − p_i||_σ) n_ik + c_2^o Σ_{k ∈ N_i^o} b_ik (v_ik − v_i),   (6)

where φ_o(·) is the repulsive action function [19] and n_ik is the vector along the line connecting p_ik and p_i. The adjacency matrices A = [a_ij] and B = [b_ik] are defined by the graph [31].

The UAVs may be deployed at a ground center and have to travel to certain places depending on the sensing mission. The last component, the navigation term, provides navigating ability for the UAV formation. It is given as

  f_i^n = −c_1 (p̄_i − p_γ) − c_2 (v̄_i − v_γ),   (7)

where c_1, c_2 are positive constants, p̄_i and v̄_i denote the average position and velocity over the set N_i^a ∪ {i}, and p_γ, v_γ are the position and velocity of the virtual leader.

3.2. Background Modeling. The median filter technique is used to perform background modeling. The main idea of the median filter is to run through the signal entry by entry, replacing each entry with the median of the neighboring entries. In this work, following the median filter idea, a number of frames are chosen randomly from the video captured by the UAVs, and the background modeling task is performed on them. The number of chosen frames may be varied through experiments to achieve better performance.

Algorithm 2: Moving area estimation.
Input: Two consecutive frames, t and t + k, of the input video.
Output: Motion area in each frame.
1. Segment the video into frames (consider each frame as an image).
2. Take two images/frames, A and B: the modeled background and frame t.
3. Convert images A and B into grayscale.
4. Compute the difference between the two grayscale images.
5. If a significant difference between frames A and B is detected, conclude that some motion has occurred.
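The grayscale-difference-and-threshold steps above can be sketched in a few lines of NumPy; the threshold values and function names are illustrative:

```python
import numpy as np

def to_gray(frame):
    """Convert an H x W x 3 RGB frame to grayscale (luminance approximation)."""
    return frame @ np.array([0.299, 0.587, 0.114])

def motion_mask(background, frame, thresh=25.0):
    """Steps 3-4: grayscale both images, difference them, and threshold the
    difference into a binary {0, 1} motion mask."""
    diff = np.abs(to_gray(background) - to_gray(frame))
    return (diff > thresh).astype(np.uint8)

def has_motion(background, frame, thresh=25.0, min_pixels=50):
    """Step 5: declare motion if enough pixels changed significantly."""
    return int(motion_mask(background, frame, thresh).sum()) >= min_pixels
```

The binary mask marks the candidate moving areas that are later passed to the classification model.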

After the UAVs perform their background modeling processes, the significant data is sent to the neighbors for reconstruction. However, the areas monitored by different UAVs may overlap, and the overlapping parts must be identified before stitching. To do that, an algorithm called overlapped area detection finds the keypoints and their correspondences between the backgrounds and then performs the background stitching. Algorithm 1 lists the overlapped area detection steps.

Noticed Object Extraction.

Object detection in aerial images is a challenging and interesting problem. With the cost of drones and UAVs decreasing, more aerial devices are being deployed, and there is a surge in the amount of aerial data being generated, so models that can extract valuable information from aerial data are very useful. However, most objects are only a few pixels wide, some objects are occluded, and objects in shade are even harder to detect. Thus, this work proposes a hybrid noticed-object extraction system that combines an existing motion detection method with a custom object classification model to extract valuable information from aerial data.
First, frame differencing and thresholding are applied to each frame to estimate the moving areas, referred to as the noticed areas. In UAV surveillance tasks the objects are often very small, so directly applying object detection algorithms can result in missed or incorrectly detected objects. Therefore, in this paper, the first step is to determine the motion area by comparing frame t with frame t + k, for k = 1, 2, 3, ..., to find the difference and hence the motion area, as shown in Algorithm 2.
After that, a custom convolutional neural network (CNN) object classification model is applied to those areas to decide whether each area contains objects. If an area contains objects that were predefined in the model training process, the object tracking task is executed and the significant data is extracted.
Because the training images differ in size, all images used to train the classification model are first resized into squares. The choice of image size, which affects the performance of the model, can be made through experiments. Because the training images extracted from the dataset are small, a lightweight classification model is built. From the resized images of size W × H × 3, where W is the width, H is the height, and 3 is the number of channels, the first layer is a convolution layer with 64 feature maps, a 3 × 3 kernel, and a stride of 1. These feature maps are then downsampled by a 2 × 2 max-pooling layer, and stage 1 ends with a dropout layer, as shown in Table 1.
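A minimal Keras sketch of such a stage-1 block might look like the following. The input size, dropout rate, and the layers after stage 1 are assumptions for illustration, since Table 1 is not reproduced here:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_classifier(size=32, n_classes=2):
    """Lightweight CNN following the described stage 1: a 64-filter 3x3
    convolution (stride 1), 2x2 max pooling, then dropout. The head and
    later layers are illustrative assumptions."""
    model = keras.Sequential([
        keras.Input(shape=(size, size, 3)),  # resized W x H x 3 patches
        layers.Conv2D(64, 3, strides=1, padding="same", activation="relu"),
        layers.MaxPooling2D(2),
        layers.Dropout(0.25),                # end of stage 1
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_classes, activation="softmax"),  # object vs. non-object
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Keeping the network this shallow is what makes per-patch classification cheap enough to run onboard a UAV.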
After classifying the areas that contain objects, the object tracking algorithm is implemented to track the objects and forward the extracted information to the central UAV or between the UAVs.

Data Reconstruction.
The data reconstruction can be performed between UAVs. Objects extracted by the other UAVs are sent to the central UAV or nearby UAVs along with their locations in the frame. Based on the received information, a UAV can process and perform the data reconstruction. The most important point is that each distributed UAV can obtain the whole data collected by the network with minimal storage, which shows the effectiveness of the proposed method.
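A minimal sketch of this reconstruction step, assuming each object arrives as a pixel patch together with its bounding box (x, y, w, h) in the frame:

```python
import numpy as np

def reconstruct(background, objects):
    """Rebuild a full frame from the shared background plus the extracted
    moving-object patches, each given as (x, y, w, h, pixels)."""
    frame = background.copy()
    for x, y, w, h, patch in objects:
        assert patch.shape[:2] == (h, w)
        # Paste the object back at its reported location.
        frame[y:y + h, x:x + w] = patch
    return frame
```

Because only the background (sent once) and the small patches are transmitted, every UAV can regenerate the full frames locally.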

Simulation and Experimental Results
In this section, the proposed flocking control algorithm is simulated with 10 UAVs. Then, the experimental results demonstrate all the steps described in Section 3. The experiments are performed using Python 3.6.

Simulation Results with Flocking Algorithm.
In this simulation, a group of 10 UAVs is deployed to perform surveillance tasks over a vast area. A unit is defined for calculating Euclidean distances among the UAVs and obstacles in the sensing field. Each UAV has a constant communication range R_c = 10 units. A virtual leader represents the target location for the group, which is driven by the control algorithm in Equation (4).
In Figure 3, as the UAVs in the group encounter obstacles, they separate and move around them. Connections among members may be interrupted without affecting control performance. When the group reaches the target, the UAV members gradually reconnect to form a quasi-alpha-lattice shape that covers an area around the target point. Figure 4 illustrates the quasi-alpha-lattice shape formed by the UAV group. All 10 UAVs stay around the virtual leader and start their surveillance mission. Based on the fixed communication range R_c, they keep connections to their neighbors as a grid network, as shown in the figure.

Stanford Drone Dataset.
The Stanford Drone Dataset is a massive dataset of aerial images collected by drones over the Stanford campus. The dataset is ideal for object detection and tracking problems. It contains about 60 aerial videos; for each video, bounding box coordinates are provided for 6 classes: "Pedestrian," "Biker," "Skateboarder," "Cart," "Car," and "Bus." The dataset is very rich in pedestrians and bikers, with these 2 classes covering about 85%-95% of the annotations.

Background Modeling.

Using multiple frames received from UAV surveillance, the backgrounds are modeled with a median filter technique. The number of frames needed to build the background can be selected depending on the scene and the location of the monitoring task. In this paper, because the UAVs monitor a campus area, the traffic volume of vehicles and people is low; therefore, 20 frames were randomly selected to model the background, as shown in Figure 5.
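The median-filter background modeling described above can be sketched as follows; the 20-frame sampling mirrors the experiment, while the function name and seed are ours:

```python
import numpy as np

def model_background(frames, n_samples=20, seed=0):
    """Median-filter background modeling: randomly sample frames and take
    the per-pixel median, which suppresses transient moving objects."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(frames), size=min(n_samples, len(frames)),
                     replace=False)
    return np.median(np.stack([frames[i] for i in idx]), axis=0)
```

Because a moving object occupies any given pixel in only a few frames, the per-pixel median converges to the static background.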
After the backgrounds of different scenes are captured and exchanged among the UAVs, any overlapping areas are handled by the algorithm presented in the previous section, as shown in Figure 6.

Noticed Object Extraction
4.4.1. Moving Area Estimation. After background modeling, moving area detection can be done by comparing the background itself with subsequent frames that may contain moving objects. Areas whose difference exceeds a predefined threshold are mapped to a binary [0, 1] mask. Using this technique instead of the traditional frame-differencing method gives better results, as shown in Figure 7.

4.4.2. Object Classification. The custom CNN classification model is configured as in Table 1. After many experiments to select the best hyperparameters, the model achieves its best performance with 6 epochs and 1000 steps per epoch. At epoch 6, the training accuracy is 0.965, the training loss is 0.11, the testing accuracy is 0.90, and the testing loss is 0.39, as shown in Figure 8. To evaluate the performance of the custom CNN object classification model, the F1-score, accuracy, and recall metrics are used, as shown in Table 2. Due to the small size of the objects in the dataset, the performance of the proposed network could be enhanced in future work. On the other hand, the proposed network shows very good computational efficiency, with a smaller number of operations in both the training and classification stages.
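For reference, the accuracy, recall, and F1 metrics used for evaluation can be computed for a binary object/non-object classifier as follows (a generic sketch, not the paper's evaluation code):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, recall, and F1 for a binary classifier (1 = object)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, recall, f1
```

Recall matters most here: a missed object (false negative) is never transmitted and thus lost, while a false positive only costs a little extra bandwidth.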

Moving Object Tracking and Extraction.

The classification model labels each moving area as object or non-object. If the moving area is an object, the object tracking algorithm is executed; otherwise, the area is ignored. Figure 9 shows some images extracted from categorized frames, including objects and non-objects of different sizes.

Data Reconstruction.
After the moving objects are tracked and extracted, their position information x, y, w, h is obtained, where x, y is the starting location and w, h are the width and height of the rectangular region of interest. To measure the effectiveness of the proposed method, the videos in the Stanford Drone Dataset are split into frames, and their total size is compared with the combined size of the background and the extracted moving objects. The videos have a resolution of 1422 × 1945, a frame rate of 30 fps, and a data rate of 50890 kbps. 300 frames taken from a video have a size of 138 MB; after processing, the total remaining size is approximately 14 MB. Thus, the data is reduced by about 90% while the image quality is preserved. Indeed, the proportion of objects in each frame is extremely small compared to the whole frame, so eliminating most of the unnecessary data yields a significant reduction. The results both save energy for data transmission among the UAVs and save storage capacity on the UAVs.
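The reported reduction can be checked with simple arithmetic:

```python
def reduction_percent(original_mb, reduced_mb):
    """Percentage of data eliminated by sending only background + objects."""
    return 100.0 * (original_mb - reduced_mb) / original_mb

# Figures reported for the Stanford Drone Dataset experiment:
# 300 raw frames ~ 138 MB vs. ~14 MB after processing.
saved = reduction_percent(138, 14)  # just under 90%
```

Since transmission energy grows with package size [26], a ~90% smaller payload translates directly into proportionally lower communication energy.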
Overall, the distributed UAV network can significantly reduce the data transmitted for video surveillance purposes using the AI-based data processing methods. In addition, the flocking control algorithm helps the UAVs operate in fields suited to their working tasks. Together, these form the proposed energy-efficient approach.

Conclusions and Future Developments
This paper proposes new methods both to control multiple UAVs and to process video surveillance data with AI techniques based on CNNs. The flocking control algorithm is applied to the distributed UAVs to lead them across the working fields while avoiding collisions and obstacles. An AI-based data processing method that significantly reduces redundant data streaming among the UAVs is proposed. The method also reduces training and classification time compared to existing methods such as YOLO detection. Overall, the proposed methods reduce the storage capacity and transmission bandwidth required in UAV surveillance applications. Indeed, the proportion of objects in each frame is extremely small, and transmitting the redundancy in each frame is unnecessary. Applying the method reduces approximately 90% of the excess data while preserving image quality, which significantly reduces the energy consumption of the UAVs during their tasks.
Future research can enhance the proposed solution. To improve system performance, the crucial step is the object classification task, which filters out areas wrongly detected in the previous step, thereby improving the efficiency of the method. Moreover, when the method is applied to more complex applications such as traffic surveillance and agriculture, more types of objects should be considered.

Data Availability
All the data used to support the findings of this study are included within the article.

Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.