
Journal of Sensors, Hindawi

ISSN 1687-7268, 1687-725X

2017, Article ID 3296874

DOI: 10.1155/2017/3296874

Review Article

A Review of Deep Learning Methods and Applications for Unmanned Aerial Vehicles

Adrian Carrio¹ (ORCID: http://orcid.org/0000-0001-6711-7279), Carlos Sampedro¹, Alejandro Rodriguez-Ramos¹, and Pascual Campoy¹

¹ Computer Vision and Aerial Robotics Group, Centre for Automation and Robotics (CAR) UPM-CSIC, Universidad Politécnica de Madrid, Calle José Gutiérrez Abascal 2, 28006 Madrid, Spain

Academic Editor: Vera Tyrsa

Received 28 April 2017; Accepted 18 June 2017; Published 14 August 2017

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Deep learning has recently shown outstanding results in solving a wide variety of robotic tasks in the areas of perception, planning, localization, and control. Its excellent capabilities for learning representations from the complex data acquired in real environments make it extremely suitable for many kinds of autonomous robotic applications. In parallel, Unmanned Aerial Vehicles (UAVs) are currently being extensively applied to several types of civilian tasks, in applications ranging from security, surveillance, and disaster rescue to parcel delivery or warehouse management. In this paper, a thorough review is performed of recently reported uses and applications of deep learning for UAVs, including the most relevant developments as well as their performance and limitations. In addition, a detailed explanation of the main deep learning techniques is provided. We conclude with a description of the main challenges for the application of deep learning to UAV-based solutions.

Funding: Spanish Ministry of Science (DPI2014-60139-R)

1. Introduction

Recent successes of deep learning techniques in solving many complex tasks by learning from raw sensor data have created a lot of excitement in the research community. However, deep learning is not a recent technology. It started being used back in 1971, when Ivakhnenko [1] trained an 8-layer neural network using the Group Method of Data Handling (GMDH) algorithm. The term deep learning began to be used during the 2000s, when Convolutional Neural Networks (CNNs), a computational model originally proposed in the 1980s [2] but trained efficiently in the 1990s [3], were able to provide decent results in visual object recognition tasks. At the time, datasets were small and computers were not powerful enough, so performance was often similar to or worse than that of classical Computer Vision algorithms. The development of CUDA for Nvidia GPUs, which enabled throughputs of over 1000 GFLOPS, and the publication of the ImageNet dataset, with 1.2 million images classified into 1000 categories [4], were key milestones in the popularization of CNNs with several layers (10⁹ to 10¹⁰ connections and 10⁷ to 10⁹ parameters). These deep models show great performance not only in Computer Vision tasks but also in other tasks such as speech recognition, signal processing, and natural language processing [5]. More details about recent advances in deep learning can be found in [6, 7].

Evidence of the suitability of deep learning for many kinds of autonomous robotic applications is the increasing number of robotics-related scientific publications on deep learning over the past decades, a trend that is expected to continue [8].

Due to the versatility, automation capabilities, and low cost of Unmanned Aerial Vehicles (UAVs), civilian applications in diverse fields have increased drastically in recent years. Some examples include power line inspection [9], wildlife conservation [10], building inspection [11], and precision agriculture [12]. However, UAVs are constrained in the size, weight, and power consumption of their payload, as well as in range and endurance. These limitations cannot be overlooked and are particularly relevant when deep learning algorithms are required to run on board a UAV.

In this survey, we have grouped publications according to the taxonomy proposed in Aerostack [13], an aerial robotics architecture consistent with the usual components related to perception, guidance, navigation, and control of unmanned rotorcraft systems. The purpose of referring to this architecture, depicted in Figure 1, is to achieve a better understanding of the nature of the components of the aerial robotic systems analyzed. Using this taxonomy also helps identify the components to which deep learning has not yet been applied. According to Aerostack, the components constituting an unmanned aerial robotic system can be classified into the following systems and interfaces:

(i) Hardware interfaces: this category includes interfaces with both sensors and actuators.

(ii) Motor system: the components of a motor system are motion controllers, which typically receive commands with desired values for a variable (position, orientation, or speed). These desired values are translated into low-level commands that are sent to the actuators.

(iii) Feature extraction system: feature extraction here refers to the extraction of useful features or representations from sensor data. The task of most deep learning algorithms is to learn data representations, so feature extraction systems are somewhat inherent to deep learning algorithms.

(iv) Situational awareness system: this system includes components that compile sensor information into state variables regarding the robot and its environment, pursuing environment understanding. SLAM algorithms are an example of a component within the situational awareness system.

(v) Executive system: this system receives high-level symbolic actions and generates detailed behaviour sequences.

(vi) Planning system: this type of system generates global solutions to complex tasks by means of planning (e.g., path planning and mission planning).

(vii) Supervision system: components in the supervision system simulate self-awareness, in the sense of being able to supervise other integrated systems. An example of this type of component is an algorithm that checks whether the robot is actually making progress towards its goal and reacts to problems (unexpected obstacles, faults, etc.) with recovery actions.

(viii) Communication system: the components in the communication system are responsible for establishing adequate communication with human operators and/or other robots.

Figure 1

Aerostack architecture, consisting of a layered structure, corresponding to the different abstraction levels in an unmanned aerial robotic system. The architecture has been applied here to systematically classify deep learning-based algorithms available in the state of the art which have been deployed for applications with Unmanned Aerial Vehicles.

The remainder of this paper is organized as follows. Section 2 describes the currently most relevant and prominent deep learning algorithms; for the sake of completeness, deep learning algorithms have been included regardless of whether they have been used directly in UAV applications. Section 3 presents the state of the art in deep learning for feature extraction in UAV applications. Section 4 surveys UAV applications of deep learning for the development of components of planning and situational awareness systems. Reported applications of deep learning for motion control in UAVs are presented in Section 5. Finally, Section 6 discusses the main challenges for the application of deep learning to UAVs.

2. Deep Learning in the Context of Machine Learning

Machine Learning is a capability that enables Artificial Intelligence (AI) systems to learn from data. A good definition of what learning involves is the following: “a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E” [15]. The nature of this experience E is typically used to classify Machine Learning algorithms into the following three categories: supervised, unsupervised, and reinforcement learning:

(i) In supervised learning, algorithms are presented with a dataset containing a collection of features. Additionally, labels or target values are provided for each sample. The knowledge is encoded in this mapping from features to labels or target values. Once trained, the algorithm is expected to find the mapping from the features of unseen samples to their correct labels or target values.

(ii) The purpose of unsupervised learning is to extract meaningful representations and explain key features of the data. No labels or target values are needed in this case in order to learn from the data.

(iii) In reinforcement learning algorithms, an AI agent interacts with a real or simulated environment. This interaction provides feedback between the learning system and the interaction experience, which is used to improve performance in the task being learned.

Deep learning algorithms are a subset of Machine Learning algorithms that typically involve learning representations at different hierarchy levels to enable building complex concepts out of simpler ones. The following paragraphs cover the most relevant deep learning technologies currently available in supervised, unsupervised, and reinforcement learning.

2.1. Supervised Learning

Supervised learning algorithms learn how to associate an input with some output, given a training set of examples of inputs and outputs [16]. The following paragraphs cover the most relevant algorithms nowadays in supervised learning: Feedforward Neural Networks, a popular variation of these called Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and a variation of RNNs called Long Short-Term Memory (LSTM) models.

Feedforward Neural Networks, also known as Multilayer Perceptrons (MLPs), are among the most common supervised learning models. Their purpose is to work as function approximators: given a sample vector $x$ with $n$ features, a trained algorithm is expected to produce an output value or classification category $y$ that is consistent with the mapping of inputs and outputs provided in the training set. The approximated function is usually built by stacking together several hidden layers that are activated in a chain to obtain the desired output. The number of hidden layers is usually referred to as the depth of the model, which explains the origin of the term deep learning: learning using models with several layers. These layers are made up of neurons or units whose activation, given an input vector $x \in \mathbb{R}^n$, is given by the following equation:

$$ a_\theta(x) = g(\theta^\top x), \tag{1} $$

where $\theta$ is a vector of $n$ weights and $g$ is an activation function that is usually chosen to be nonlinear. The activation of unit $k$ in layer $m$, given its $n$ inputs (the outputs of the previous layer $m-1$), is given by the following equation:

$$ a_k^{(m)} = g\left( \Theta_{k0}^{(m-1)} a_0^{(m-1)} + \Theta_{k1}^{(m-1)} a_1^{(m-1)} + \cdots + \Theta_{kn}^{(m-1)} a_n^{(m-1)} \right). \tag{2} $$
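To make equations (1) and (2) concrete, the following is a minimal plain-Python sketch of a forward pass, not code from the reviewed works. It assumes that $a_0^{(m-1)}$ is a constant bias input equal to 1 and that $g$ is the logistic sigmoid by default; the function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic function, one common choice for the nonlinear activation g."""
    return 1.0 / (1.0 + math.exp(-z))


def unit_activation(theta, x, g=sigmoid):
    """Equation (1): a_theta(x) = g(theta^T x) for a single unit."""
    return g(sum(t * xi for t, xi in zip(theta, x)))


def forward(layers, x, g=sigmoid):
    """Apply equation (2) layer by layer.

    `layers` is a list of weight matrices Theta^(m-1). Row k holds the weights
    of unit k in layer m; column 0 multiplies the bias input a_0 = 1, which is
    an assumption of this sketch.
    """
    a = list(x)
    for theta in layers:
        a_with_bias = [1.0] + a  # prepend a_0^(m-1) = 1
        a = [unit_activation(row, a_with_bias, g) for row in theta]
    return a


def identity(z):
    return z


# Two layers with identity activation: the first copies its two inputs,
# the second adds them.
layers = [[[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [[0.0, 1.0, 1.0]]]
print(forward(layers, [2.0, 3.0], g=identity))  # [5.0]
```

Replacing `identity` with the default sigmoid gives the usual nonlinear network; stacking more weight matrices in `layers` increases the depth.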

During learning, the weights of each unit are updated using backpropagation in order to optimize a cost function, which generally measures the discrepancy between the desired outputs and the actual ones.
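The learning step described above can be sketched for a network with a single hidden layer. This is an illustrative example under stated assumptions, none of which come from the text: sigmoid hidden units, a linear output, the squared-error cost $\frac{1}{2}(\hat{y}-y)^2$, per-sample gradient descent, and the XOR problem as toy data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(W1, W2, x):
    """Hidden layer h = sigmoid(W1 [1, x]); linear output y_hat = W2 [1, h]."""
    xb = [1.0] + list(x)
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W1]
    y_hat = W2[0] + sum(w * v for w, v in zip(W2[1:], h))
    return xb, h, y_hat


def train_step(W1, W2, x, y, lr):
    """One backpropagation + gradient-descent step on the cost 0.5 * (y_hat - y)^2.

    Weights are updated in place; returns the cost before the update.
    """
    xb, h, y_hat = forward(W1, W2, x)
    err = y_hat - y  # dCost/dy_hat
    # Error signal of each hidden unit, computed with the current W2.
    delta = [err * W2[k + 1] * h[k] * (1.0 - h[k]) for k in range(len(h))]
    hb = [1.0] + h
    for j in range(len(W2)):
        W2[j] -= lr * err * hb[j]
    for k, row in enumerate(W1):
        for i in range(len(row)):
            row[i] -= lr * delta[k] * xb[i]
    return 0.5 * err ** 2


def mean_cost(W1, W2, data):
    return sum(0.5 * (forward(W1, W2, x)[2] - y) ** 2 for x, y in data) / len(data)


def train(data, n_hidden=3, lr=0.1, epochs=3000, seed=0):
    """Train on (x, y) pairs; returns (initial cost, final cost)."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    initial = mean_cost(W1, W2, data)
    for _ in range(epochs):
        for x, y in data:
            train_step(W1, W2, x, y, lr)
    return initial, mean_cost(W1, W2, data)


xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
initial, final = train(xor)
print(f"cost before: {initial:.4f}, after: {final:.4f}")
```

In practice, deep learning frameworks compute these gradients automatically; writing them out by hand only serves to show where the error signal `delta` of each hidden unit comes from.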

Convolutional Neural Networks (CNNs), depicted in Figure 2, are a specific type of model designed to process data with a grid-like topology, such as images (2-dimensional grids of pixels) or time series (1-dimensional grids of samples). These models take their name from the linear mathematical operation of convolution, which is present in at least one of the layers of the network. The most typical convolution operation used in deep learning is the 2D convolution of a 2-dimensional image $I$ with a 2-dimensional kernel $K$, given by the following equation:

$$ C(i,j) = (I * K)(i,j) = \sum_m \sum_n I(m,n)\, K(i-m, j-n). \tag{3} $$
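Equation (3) can be checked with a direct, deliberately unoptimized implementation. This sketch treats kernel entries outside $K$ as zero, which yields the so-called full output. Note that many deep learning libraries actually implement cross-correlation (the same sum without flipping the kernel) and still call it convolution; since the kernel is learned, the distinction rarely matters in practice.

```python
def conv2d_full(I, K):
    """2D convolution as in equation (3): C(i, j) = sum_m sum_n I(m, n) K(i - m, j - n).

    I and K are lists of rows. Kernel indices outside K contribute zero, so the
    result is the "full" output of size (H + kh - 1) x (W + kw - 1).
    """
    H, W = len(I), len(I[0])
    kh, kw = len(K), len(K[0])
    C = [[0.0] * (W + kw - 1) for _ in range(H + kh - 1)]
    for i in range(H + kh - 1):
        for j in range(W + kw - 1):
            total = 0.0
            for m in range(H):
                for n in range(W):
                    u, v = i - m, j - n
                    if 0 <= u < kh and 0 <= v < kw:
                        total += I[m][n] * K[u][v]
            C[i][j] = total
    return C


# An impulse at the origin reproduces the kernel, a characteristic property of convolution.
print(conv2d_full([[1, 0], [0, 0]], [[1, 2], [3, 4]]))
# [[1.0, 2.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]
```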

Figure 2

A generic example of a Convolutional Neural Network model. The usual architecture alternates convolution and subsampling layers. Fully connected neurons are used in the last layers.

The output of the convolution operation is usually passed through a nonlinear activation function and then further modified by means of a pooling function, which replaces the output at a certain location with a value obtained from nearby outputs. This pooling function helps make the learned representation invariant to small translations of the input and performs subsampling of the input data. The most common pooling function is max pooling, which replaces the output with the maximum activation within a rectangular neighborhood. Convolution and pooling layers are stacked together to learn features in a hierarchical way. For example, when learning from images, layers closer to the input learn low-level feature representations (e.g., edges and corners), and those closer to the output learn higher-level representations (e.g., contours and parts of objects). Once the features of interest have been learned, their activations are used in the final layers, which are usually made up of fully connected neurons, to classify the input or to perform regression on it.
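A minimal sketch of max pooling, assuming non-overlapping 2 × 2 windows (the window size and stride are illustrative defaults), also shows the invariance to small translations mentioned above.

```python
def max_pool2d(X, size=2, stride=2):
    """Max pooling: each output is the maximum over a size x size window of X."""
    H, W = len(X), len(X[0])
    return [
        [max(X[i + di][j + dj] for di in range(size) for dj in range(size))
         for j in range(0, W - size + 1, stride)]
        for i in range(0, H - size + 1, stride)
    ]


X = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(max_pool2d(X))  # [[6, 8], [14, 16]]

# Shifting a feature by one pixel inside a pooling window leaves the output unchanged.
a = [[9, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
b = [[0, 9, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
print(max_pool2d(a) == max_pool2d(b))  # True
```

The invariance only holds for shifts that stay within the same pooling window; a shift across a window boundary changes the output.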

In contrast to MLPs, Recurrent Neural Networks (RNNs) are models in which the output is a function not only of the current inputs but also of the previous outputs, which are encoded into a hidden state $h$. This means that RNNs have memory of the previous outputs and can therefore encode the information present in the sequence itself, something that MLPs cannot do. As a consequence, this type of model can be very useful for learning from sequential data. The memory is encoded into an internal state and updated as indicated in the following equation:

$$ h_t = g(W x_t + U h_{t-1}), \tag{4} $$

where $h_t$ represents the hidden state at time step $t$. The weight matrices $W$ (input-to-hidden) and $U$ (hidden-to-hidden) determine the importance given to the current input and to the previous state, respectively. The activation is computed with a third weight matrix $V$ (hidden-to-output) as indicated by the following equation:

$$ a_t = V h_t. \tag{5} $$
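Equations (4) and (5) translate into a short loop over the sequence. This plain-Python sketch assumes $h_0 = 0$ and $g = \tanh$ by default; the example with an identity activation shows how the hidden state carries information from earlier inputs.

```python
import math


def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_forward(xs, W, U, V, g=math.tanh, h0=None):
    """Unroll equations (4) and (5) over the input sequence xs.

    h_t = g(W x_t + U h_{t-1}),  a_t = V h_t.
    U is square (hidden x hidden); h_0 defaults to the zero vector.
    """
    h = list(h0) if h0 is not None else [0.0] * len(U)
    outputs = []
    for x in xs:
        pre = [wx + uh for wx, uh in zip(matvec(W, x), matvec(U, h))]
        h = [g(z) for z in pre]
        outputs.append(matvec(V, h))
    return outputs


def identity(z):
    return z


# With identity activation and W = U = V = 1, the hidden state accumulates the inputs.
print(rnn_forward([[1.0], [1.0], [1.0]], [[1.0]], [[1.0]], [[1.0]], g=identity))
# [[1.0], [2.0], [3.0]]
```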

RNNs are usually trained using Backpropagation Through Time (BPTT), an extension of backpropagation that takes the temporal structure into account when computing the gradients. Using this method with long temporal sequences can lead to several issues: gradients accumulated over a long sequence can become extremely large or vanishingly small. These problems are referred to as exploding gradients and vanishing gradients, respectively. Exploding gradients are easier to handle, as they can be truncated or rescaled, whereas vanishing gradients can become too small for the network to learn from, or even too small to be represented with the finite numerical precision of a computer.
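The effect described above can be illustrated with a scalar caricature of BPTT, in which the gradient is multiplied by the same factor at every time step, together with clipping by norm as one common way of rescaling exploding gradients. Both functions are illustrative assumptions, not part of any specific library.

```python
import math


def gradient_scale(factor, steps):
    """Scalar caricature of BPTT: a gradient multiplied by the same factor at each of `steps` time steps."""
    return factor ** steps


def clip_by_norm(grad, max_norm):
    """Gradient clipping: rescale `grad` so that its L2 norm does not exceed `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    return [g * max_norm / norm for g in grad]


print(gradient_scale(0.9, 100))   # vanishing: about 2.7e-05
print(gradient_scale(1.1, 100))   # exploding: about 1.4e+04
print(gradient_scale(0.5, 1100))  # underflows to 0.0 in double precision
print(clip_by_norm([30.0, 40.0], 5.0))  # [3.0, 4.0]
```

Clipping preserves the direction of the gradient and only limits its length, which is why it helps with exploding gradients but does nothing for vanishing ones.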

Long Short-Term Memory (LSTM) models are a type of RNN architecture, proposed in 1997 by Hochreiter and Schmidhuber [17], that addresses the vanishing gradient problem by maintaining a more constant error through the use of gated cells, which effectively allow learning to continue over a much larger number of time steps. A typical LSTM cell is depicted in Figure 3. The input, output, and forget gate vector activations in a standard LSTM are given as follows:

$$ i_t = g(W_i x_t + U_i h_{t-1}), \qquad o_t = g(W_o x_t + U_o h_{t-1}), \qquad f_t = g(W_f x_t + U_f h_{t-1}). \tag{6} $$

Figure 3

A Long Short-Term Memory model, adapted from the original figure in [14]. Learned weights control, through the use of gates, how data enter, leave, and are deleted from the cell.

The cell state vector activation is given by the following equation:

$$ c_t = f_t \circ c_{t-1} + i_t \circ g(W_c x_t + U_c h_{t-1}), \tag{7} $$

where $\circ$ represents the Hadamard (element-wise) product. Finally, the hidden state (output) vector is given by the following equation:

$$ h_t = o_t \circ g(c_t). \tag{8} $$
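A single LSTM step following equations (6) to (8) can be sketched with scalar (one-unit) states. As assumptions of this sketch, the gates use the logistic sigmoid and the cell input and output use $\tanh$ (the usual choices for the generic $g$ above), and biases are omitted to match the equations. Saturating the forget gate open and the input gate closed shows how the cell state can be preserved.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM following equations (6)-(8).

    `p` holds scalar weights Wi, Ui, Wo, Uo, Wf, Uf, Wc, Uc. Biases are omitted,
    as in the equations above. The gates use the sigmoid and the cell input and
    output use tanh (common choices; the equations write g for all of them).
    """
    i = sigmoid(p["Wi"] * x + p["Ui"] * h_prev)  # input gate
    o = sigmoid(p["Wo"] * x + p["Uo"] * h_prev)  # output gate
    f = sigmoid(p["Wf"] * x + p["Uf"] * h_prev)  # forget gate
    candidate = math.tanh(p["Wc"] * x + p["Uc"] * h_prev)
    c = f * c_prev + i * candidate  # equation (7); the Hadamard product is scalar here
    h = o * math.tanh(c)            # equation (8)
    return h, c


names = ["Wi", "Ui", "Wo", "Uo", "Wf", "Uf", "Wc", "Uc"]

# Forget gate saturated open, input gate saturated closed: the cell state is kept.
keep = dict.fromkeys(names, 0.0)
keep.update(Wf=50.0, Wi=-50.0)
h, c = lstm_step(1.0, 0.0, 2.0, keep)
print(c)  # 2.0
```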

As already stated, LSTM gated cells have an internal recurrence in addition to the outer recurrence of RNNs. Cells store an internal state, which can be written to and read from. Gates control how data enter, leave, and are deleted from this cell state. These gates act on the signals they receive and, similar to a standard neural network, block or pass on information based on its strength and importance using their own sets of weights. Those weights, like the weights that modulate the input and hidden states, are adjusted via the recurrent network’s learning process. The cells learn when to allow data to enter, leave, or be deleted through the iterative process of making guesses, backpropagating the error, and adjusting weights via gradient descent. This type of architecture allows successful learning from long sequences, helping to capture diverse time scales and long-range dependencies. Practical aspects of the use of LSTMs and other deep learning architectures can be found in [18].

2.2. Unsupervised Learning

Unsupervised learning aims at developing models that are capable of extracting meaningful, high-level representations from high-dimensional, unlabeled sensory data. This functionality is inspired by the visual cortex, which requires only a very small amount of labeled data.

Deep Generative Models such as Deep Belief Networks (DBNs) [19, 20] allow the learning of several layers of nonlinear features in an unsupervised manner. DBNs are built by stacking several Restricted Boltzmann Machines (RBMs) [21, 22], resulting in a hybrid model in which the top two layers form an RBM and the bottom layers act as a directed graph constituting a Sigmoid Belief Network (SBN). The learning algorithm proposed in [19] is considered one of the first efficient ways of learning DBNs; it introduces a greedy layer-by-layer training procedure in order to obtain a deep hierarchical model. In this greedy learning procedure, the hidden activity patterns obtained in the current layer are used as the “visible” data for training the RBM of the next layer. Once the stacked RBMs have been learned and combined to form a DBN, a fine-tuning procedure using a contrastive version of the wake-sleep algorithm [23] is applied.

For a better understanding, the theoretical details of RBMs are provided in the following equations. The energy of a joint configuration $\{v, h\}$ can be calculated as follows:

$$ E(v,h;\theta) = -\sum_{i \in \text{vis}} v_i b_i - \sum_{j \in \text{hid}} h_j a_j - \sum_{i,j} W_{ij} v_i h_j, \tag{9} $$

where $\theta = \{W, b, a\}$ represents the model parameters, $v_i \in \{0,1\}$ are the “visible” stochastic binary units, which are connected to the “hidden” stochastic binary units $h_j \in \{0,1\}$, and the bias terms are denoted by $b_i$ for the visible units and $a_j$ for the hidden units.

The probability of a joint configuration over both visible and hidden units depends on the energy of that configuration and is given by (10), where $Z(\theta)$ represents the partition function (see (11)):

$$ P(v,h;\theta) = \frac{1}{Z(\theta)} \exp\left(-E(v,h;\theta)\right), \tag{10} $$

$$ Z(\theta) = \sum_v \sum_h \exp\left(-E(v,h;\theta)\right). \tag{11} $$

The probability assigned by the model to a visible vector $v$ can be computed as expressed in the following equation:

$$ P(v;\theta) = \frac{1}{Z(\theta)} \sum_h \exp\left(-E(v,h;\theta)\right). \tag{12} $$
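For a very small RBM, equations (9) to (12) can be evaluated exactly by enumerating all binary configurations. The parameters below are hypothetical; checking that $P(v)$ sums to 1 over all visible vectors confirms the normalization by $Z(\theta)$. Because $Z(\theta)$ has $2^{n_{\text{vis}} + n_{\text{hid}}}$ terms, this brute-force approach does not scale, which is why approximate training procedures such as Contrastive Divergence are used.

```python
import itertools
import math


def energy(v, h, W, b, a):
    """Equation (9): E(v, h) = -sum_i v_i b_i - sum_j h_j a_j - sum_ij W_ij v_i h_j."""
    e = -sum(vi * bi for vi, bi in zip(v, b))
    e -= sum(hj * aj for hj, aj in zip(h, a))
    e -= sum(W[i][j] * v[i] * h[j] for i in range(len(v)) for j in range(len(h)))
    return e


def configurations(n):
    """All binary vectors of length n."""
    return itertools.product((0, 1), repeat=n)


def partition_function(W, b, a):
    """Equation (11) by exhaustive enumeration: 2^(n_vis + n_hid) terms."""
    return sum(math.exp(-energy(v, h, W, b, a))
               for v in configurations(len(b))
               for h in configurations(len(a)))


def prob_visible(v, W, b, a):
    """Equation (12): P(v) = (1 / Z) * sum_h exp(-E(v, h))."""
    unnormalized = sum(math.exp(-energy(v, h, W, b, a)) for h in configurations(len(a)))
    return unnormalized / partition_function(W, b, a)


# Hypothetical parameters for a 3-visible, 2-hidden RBM.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
b = [0.1, -0.1, 0.2]
a = [0.0, 0.3]
total = sum(prob_visible(v, W, b, a) for v in configurations(3))
print(round(total, 10))  # 1.0
```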

The conditional distributions over the hidden variables $h$ and the visible variables $v$ factorize as shown in (13). Once a training sample is presented to the model, the binary state of each hidden unit is set to 1 with the probability given by (14). Analogously, once the binary states of the hidden units have been computed, the binary state of each visible unit is set to 1 with the probability given by (15):

$$ P(h \mid v;\theta) = \prod_j p(h_j \mid v), \qquad P(v \mid h;\theta) = \prod_i p(v_i \mid h), \tag{13} $$

$$ p(h_j = 1 \mid v) = \sigma\Big(\sum_i W_{ij} v_i + a_j\Big), \tag{14} $$

$$ p(v_i = 1 \mid h) = \sigma\Big(\sum_j W_{ij} h_j + b_i\Big), \tag{15} $$

where $\sigma(z) = 1/(1+\exp(-z))$ is the logistic function.

To train the RBM, learning is conducted by applying the Contrastive Divergence algorithm [22], in which the update rule applied to the model parameters is given by the following equation:

$$ \Delta W_{ij} = \epsilon \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{recons}} \right), \tag{16} $$

where $\epsilon$ is the learning rate, $\langle v_i h_j \rangle_{\text{data}}$ represents the expected value of the product of visible and hidden states at thermal equilibrium when training data is presented to the model, and $\langle v_i h_j \rangle_{\text{recons}}$ is the expected value of the product of visible and hidden states after running a Gibbs chain for a small number of steps (a single step in the common CD-1 variant).
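The conditional distributions (14) and (15) and the update rule (16) combine into a single CD-1 step. This sketch assumes one Gibbs step, probabilities instead of binary samples in the statistics, and the bias updates $\Delta b_i = \epsilon(v_i - \hat{v}_i)$ and $\Delta a_j = \epsilon(p(h_j{=}1 \mid v) - p(h_j{=}1 \mid \hat{v}))$, where $\hat{v}$ is the reconstruction; these are common practical choices but are not stated in the text above. Training on a single hypothetical pattern should move its mean-field reconstruction toward that pattern.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hidden_probs(v, W, a):
    """Equation (14): p(h_j = 1 | v) = sigmoid(sum_i W_ij v_i + a_j)."""
    return [sigmoid(sum(W[i][j] * v[i] for i in range(len(v))) + a[j]) for j in range(len(a))]


def visible_probs(h, W, b):
    """Equation (15): p(v_i = 1 | h) = sigmoid(sum_j W_ij h_j + b_i)."""
    return [sigmoid(sum(W[i][j] * h[j] for j in range(len(h))) + b[i]) for i in range(len(b))]


def cd1_update(v0, W, b, a, lr, rng):
    """One CD-1 update of W (equation (16)) and of the biases, in place.

    Positive statistics come from the data v0, negative statistics from a
    one-step Gibbs reconstruction. The bias rules are assumptions of this sketch.
    """
    ph0 = hidden_probs(v0, W, a)
    h_sample = [1.0 if rng.random() < p else 0.0 for p in ph0]  # sample hidden states
    pv1 = visible_probs(h_sample, W, b)                          # reconstruction
    ph1 = hidden_probs(pv1, W, a)
    for i in range(len(b)):
        for j in range(len(a)):
            W[i][j] += lr * (v0[i] * ph0[j] - pv1[i] * ph1[j])
        b[i] += lr * (v0[i] - pv1[i])
    for j in range(len(a)):
        a[j] += lr * (ph0[j] - ph1[j])


rng = random.Random(0)
W = [[rng.gauss(0.0, 0.01) for _ in range(2)] for _ in range(4)]
b, a = [0.0] * 4, [0.0] * 2
pattern = [1.0, 1.0, 0.0, 0.0]  # hypothetical training pattern
for _ in range(1000):
    cd1_update(pattern, W, b, a, lr=0.1, rng=rng)

# Mean-field reconstruction of the training pattern.
recon = visible_probs(hidden_probs(pattern, W, a), W, b)
print([round(p, 2) for p in recon])
```

Stacking RBMs for a DBN would repeat this procedure layer by layer, feeding `hidden_probs` of one trained RBM as the visible data of the next, as described for the greedy algorithm of [19].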

Deep neural networks can also be used for dimensionality reduction of the input data. For this purpose, deep “autoencoders” [24, 25] have been shown to provide successful results in a wide variety of applications, such as document retrieval [26] and image retrieval [27]. An autoencoder (see Figure 4) is an unsupervised neural network in which the target values are set to be equal to the inputs. Autoencoders are mainly composed of an “encoder” network, which transforms the input data into a low-dimensional code, and a “decoder” network, which reconstructs the data from the code. Training these deep models involves minimizing the error between the original data and its reconstruction. In this process, weight initialization is critical to avoid reaching a poor local optimum; thus, some authors have proposed a pretraining stage based on stacked RBMs followed by a fine-tuning stage using backpropagation [24, 27]. In addition, the encoder part of the autoencoder can serve as a good unsupervised nonlinear feature extractor. In this field, Stacked Denoising Autoencoders (SDAE) [25] have been shown to be effective unsupervised feature extractors in different classification problems. The experiments presented in [25] showed that training denoising autoencoders with higher noise levels forced the model to extract more distinctive and less local features.
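The denoising idea can be illustrated with the smallest possible autoencoder: a linear encoder that maps 2-D inputs to a 1-D code and a linear decoder that maps the code back. This is a hedged sketch, not the SDAE of [25]: it assumes Gaussian input corruption, squared error against the clean input, plain stochastic gradient descent, and hypothetical data lying on a line.

```python
import random


def train_denoising_autoencoder(data, noise_std=0.1, lr=0.01, epochs=200, seed=0):
    """Tiny linear denoising autoencoder: 2-D input -> 1-D code -> 2-D output.

    Encoder z = w_enc . x_noisy; decoder x_hat = w_dec * z. The squared error is
    measured against the clean input x, which is what makes it "denoising".
    """
    rng = random.Random(seed)
    w_enc = [rng.uniform(-0.1, 0.1) for _ in range(2)]
    w_dec = [rng.uniform(-0.1, 0.1) for _ in range(2)]
    for _ in range(epochs):
        for x in data:
            x_noisy = [xi + rng.gauss(0.0, noise_std) for xi in x]  # corrupt input
            z = sum(we * xi for we, xi in zip(w_enc, x_noisy))       # encode
            r = [wd * z - xi for wd, xi in zip(w_dec, x)]            # x_hat - x
            # Gradients of ||x_hat - x||^2 with respect to both weight vectors.
            g_dec = [2.0 * ri * z for ri in r]
            g_z = 2.0 * sum(ri * wd for ri, wd in zip(r, w_dec))
            g_enc = [g_z * xi for xi in x_noisy]
            w_dec = [wd - lr * g for wd, g in zip(w_dec, g_dec)]
            w_enc = [we - lr * g for we, g in zip(w_enc, g_enc)]
    return w_enc, w_dec


def reconstruction_error(data, w_enc, w_dec):
    """Mean squared reconstruction error on clean (uncorrupted) inputs."""
    total = 0.0
    for x in data:
        z = sum(we * xi for we, xi in zip(w_enc, x))
        total += sum((wd * z - xi) ** 2 for wd, xi in zip(w_dec, x))
    return total / len(data)


# Hypothetical data lying on a line in 2-D, so a 1-D code suffices.
data = [(t / 10, 2 * t / 10) for t in range(-10, 11)]
w_enc, w_dec = train_denoising_autoencoder(data)
baseline = reconstruction_error(data, [0.0, 0.0], [0.0, 0.0])  # reconstruct everything as 0
print(baseline, reconstruction_error(data, w_enc, w_dec))
```

After training, `w_enc` acts as a (here linear) feature extractor: the scalar code `z` summarizes each input, which is the role the encoder plays in the feature-extraction uses described above.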

Figure 4

Deep autoencoder. An autoencoder consists of an encoder network, which transforms the original input data into a low-dimensional code, and a decoder network, which reconstructs the data from the code.

2.3. Deep Reinforcement Learning

In reinforcement learning, an agent interacts with an environment, seeking to find the best action for each state at every time step (see Figure 5). The agent must balance exploration and exploitation of the state space in order to find the optimal policy that maximizes the accumulated reward from its interaction with the environment. In this context, an agent modifies its behaviour or policy based on the states visited, the actions taken, and the rewards received at every time step. Reinforcement learning thus constitutes an optimization process over the whole state space, aimed at maximizing the accumulated reward. Robotic problems are often task-based with a temporal structure, which makes them well suited to a reinforcement learning framework [28].

Figure 5

Generic structure of a reinforcement learning problem. The optimization methods to solve the reinforcement learning problem are mainly categorized into value function and policy search methods.

The standard reinforcement learning theory states that an agent is able to obtain a policy, which maps every state $s \in S$ to an action $a \in A$, where $S$ is the state space (the possible states of the agent in the environment) and $A$ is the finite action space. The dynamics of the agent in its environment are represented by the transition probability model $p(s_{t+1} \mid s_t, a_t)$ at time $t$. The policy can be stochastic, $\pi(a \mid s)$, with a probability associated with each possible action, or deterministic, $\pi(s)$. At each time step, the policy determines the action to be chosen, and the reward $r(s_t, a_t)$ is observed from the environment. The goal of the agent is to maximize the accumulated discounted reward $R_t = \sum_{i=t}^{T} \gamma^{i-t} r(s_i, a_i)$ from a state at time $t$ to time $T$ ($T = \infty$ for infinite horizon problems) [29]. The discount factor $\gamma$ assigns different weights to future rewards, so that, for $\gamma < 1$, rewards further in the future contribute less to $R_t$.
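The discounted return $R_t$ can be computed directly from its definition or, more efficiently, with the backward recursion $R_t = r(s_t,a_t) + \gamma R_{t+1}$, which follows from that definition. The sketch assumes a finite episode whose rewards are already known; the function names are illustrative.

```python
def discounted_return(rewards, gamma, t=0):
    """R_t = sum_{i=t}^{T} gamma^(i - t) r_i for a finite episode with rewards r_0..r_T."""
    return sum(gamma ** (i - t) * rewards[i] for i in range(t, len(rewards)))


def all_returns(rewards, gamma):
    """Every R_t in one backward pass, using the recursion R_t = r_t + gamma * R_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1.75
print(all_returns([1.0, 1.0, 1.0], 0.5))        # [1.75, 1.5, 1.0]
```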

For a specific policy $\pi$, the value function $V^\pi$ in (17) represents the expectation of the accumulated discounted reward $R_t$ for each state $s \in S$ (assuming a deterministic policy $\pi(s_t)$):

$$ V^\pi(s_t) = \mathbb{E}\left[ R_t \mid s_t, a_t = \pi(s_t) \right]. \tag{17} $$

The counterpart of the value function for every state-action pair $(s_t, a_t)$ is the action-value function $Q^\pi$ in (18):

$$ Q^\pi(s_t,a_t) = r(s_t,a_t) + \gamma \sum_{s_{t+1}} p(s_{t+1} \mid s_t,a_t)\, V^\pi(s_{t+1}). \tag{18} $$

The optimal policy $\pi^*$ is the one that maximizes the value function (or, equivalently, the action-value function), as in the following equation:

$$ \pi^* = \arg\max_\pi V^\pi(s_t). \tag{19} $$
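Equations (17) to (19) can be made concrete on a hypothetical two-state, two-action problem. The sketch evaluates each deterministic policy by repeatedly applying the recursion behind (18) and then picks the best policy by brute-force enumeration, a literal reading of (19) that is only feasible for tiny problems; practical methods avoid enumerating policies.

```python
import itertools


def evaluate_policy(policy, P, R, gamma, iters=500):
    """Iteratively evaluate a deterministic policy (sequence: state -> action).

    Repeats V(s) <- r(s, pi(s)) + gamma * sum_{s'} p(s' | s, pi(s)) V(s'),
    i.e. equation (18) with a = pi(s), until it (approximately) converges.
    P[s][a] is a list of (probability, next_state) pairs; R[s][a] is a reward.
    """
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [R[s][policy[s]] + gamma * sum(p * V[s2] for p, s2 in P[s][policy[s]])
             for s in range(len(P))]
    return V


def q_from_v(V, P, R, gamma):
    """Equation (18): Q(s, a) = r(s, a) + gamma * sum_{s'} p(s' | s, a) V(s')."""
    return [[R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
             for a in range(len(P[s]))]
            for s in range(len(P))]


def best_policy_by_enumeration(P, R, gamma):
    """Equation (19) taken literally: evaluate every deterministic policy and keep the
    best. Comparing summed values suffices here because an optimal policy is at least
    as good as any other in every state. Feasible only for tiny MDPs."""
    best, best_V = None, None
    for policy in itertools.product(*(range(len(P[s])) for s in range(len(P)))):
        V = evaluate_policy(policy, P, R, gamma)
        if best_V is None or sum(V) > sum(best_V):
            best, best_V = policy, V
    return best, best_V


# Hypothetical two-state MDP: action 1 switches state, action 0 stays;
# staying in state 1 earns reward 1, everything else earns 0.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 1)], [(1.0, 0)]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
policy, V = best_policy_by_enumeration(P, R, gamma=0.9)
print(policy, [round(v, 6) for v in V])  # (1, 0) [9.0, 10.0]
```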

A general problem in real robotic applications is that the state and action spaces are often continuous. A continuous state and/or action space can make the optimization problem intractable due to the overwhelming number of different states and/or actions. As a general framework for representation, deep learning is used to enhance reinforcement learning methods by providing learned feature representations; this combination is known as deep reinforcement learning. Reinforcement learning and optimal control aim at finding the optimal policy $\pi^*$ by means of several methods. The optimal solution can be sought directly in this original (primal) problem, or the dual formulation, $V^*$ and $Q^*$, can be the optimization objective. In this review, deep reinforcement learning methods are divided into two main categories: value function methods and policy search methods.

2.3.1. Value Function Methods

These methods seek to find the optimal $V^*$ or $Q^*$, from which the optimal policy $\pi^*$ in (20) is directly derived. Q-learning approaches are based on the optimization of the action-value function $Q$, using the Bellman optimality equation [29] for $Q$ (see (21)):

$$ \pi^* = \arg\max_{a_t} Q^*(s_t,a_t), \tag{20} $$

$$ Q^*(s_t,a_t) = \mathbb{E}\left[ r(s_t,a_t) + \gamma \max_{a_{t+1}} Q^*(s_{t+1},a_{t+1}) \right]. \tag{21} $$
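Q-learning turns (21) into a sample-based update, $Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\,[r(s_t,a_t) + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)]$, and derives the policy with (20). The tabular sketch below assumes an ε-greedy behaviour policy, a constant learning rate $\alpha$, and a hypothetical two-state problem; all names and values are illustrative.

```python
import random


def q_learning(P, R, gamma=0.9, alpha=0.1, epsilon=0.2, steps=20000, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy.

    P[s][a] is a list of (probability, next_state) pairs and R[s][a] a reward.
    Sample-based form of equation (21):
        Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    rng = random.Random(seed)
    Q = [[0.0] * len(P[s]) for s in range(len(P))]
    s = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(P[s]))                      # explore
        else:
            a = max(range(len(P[s])), key=lambda k: Q[s][k])  # exploit
        probs, next_states = zip(*P[s][a])
        s_next = rng.choices(next_states, weights=probs)[0]   # sample s_{t+1}
        target = R[s][a] + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s_next
    return Q


def greedy_policy(Q):
    """Equation (20): pi*(s) = argmax_a Q(s, a)."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in Q]


# Hypothetical two-state MDP: action 1 switches state, action 0 stays;
# staying in state 1 earns reward 1, everything else earns 0.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 1)], [(1.0, 0)]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
Q = q_learning(P, R)
print(greedy_policy(Q))  # [1, 0]
```

Unlike the enumeration of policies, this update never needs the transition model explicitly; it only uses sampled transitions, which is what makes Q-learning model-free.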

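The Deep Q-Network (DQN) method [30, 31] estimates the action-value function (see (22)) by means of a CNN model with a set of weights $\theta$, so that $Q^*(s,a) \approx Q(s,a;\theta)$:

$$ Q_i^*(s_t,a_t) = y_i = \mathbb{E}\left[ r(s_t,a_t) + \gamma \max_{a_{t+1}} Q(s_{t+1},a_{t+1};\theta_{i-1}) \,\middle|\, s_t,a_t \right]. \tag{22} $$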

The CNN can be trained by minimizing a sequence of loss functions $L_i(\theta_i)$, one per iteration $i$, as shown in the following equation:

$$ L_i(\theta_i) = \mathbb{E}\left[ \left( y_i - Q(s_t,a_t;\theta_i) \right)^2 \right]. \tag{23} $$
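The interplay between (22) and (23) is easiest to see with a linear function approximator in place of the CNN, which is an assumption of this sketch. The targets $y_i$ are computed with the previous parameters $\theta_{i-1}$ and held fixed while a gradient step is taken on $L_i(\theta_i)$. Terminal transitions and experience replay are omitted for brevity, and the transitions and parameters are hypothetical.

```python
def q_value(theta, phi, a):
    """Linear stand-in for the CNN: Q(s, a; theta) = theta[a] . phi(s)."""
    return sum(w * f for w, f in zip(theta[a], phi))


def dqn_targets(batch, theta_prev, gamma):
    """Equation (22) on a sampled batch: y = r + gamma * max_a' Q(s', a'; theta_{i-1}).

    Each transition is (phi_s, a, r, phi_next); theta_prev is kept fixed.
    """
    n_actions = len(theta_prev)
    return [r + gamma * max(q_value(theta_prev, phi_next, a_next) for a_next in range(n_actions))
            for _, _, r, phi_next in batch]


def dqn_loss(batch, ys, theta):
    """Sample estimate of equation (23): mean of (y - Q(s, a; theta))^2."""
    return sum((y - q_value(theta, phi_s, a)) ** 2
               for (phi_s, a, _, _), y in zip(batch, ys)) / len(batch)


def sgd_step(batch, ys, theta, lr):
    """One gradient-descent step on L_i(theta_i), holding the targets ys fixed."""
    new_theta = [row[:] for row in theta]
    for (phi_s, a, _, _), y in zip(batch, ys):
        err = q_value(theta, phi_s, a) - y
        for k, f in enumerate(phi_s):
            new_theta[a][k] -= lr * 2.0 * err * f / len(batch)
    return new_theta


theta_prev = [[1.0, 0.0], [0.0, 1.0]]  # parameters from iteration i - 1 (hypothetical)
theta = [[0.0, 0.0], [0.0, 0.0]]       # parameters being trained at iteration i
batch = [([1.0, 0.0], 0, 1.0, [0.0, 1.0]),
         ([0.0, 1.0], 1, 0.0, [1.0, 0.0])]
ys = dqn_targets(batch, theta_prev, gamma=0.9)
before = dqn_loss(batch, ys, theta)
theta = sgd_step(batch, ys, theta, lr=0.1)
print(ys, before, dqn_loss(batch, ys, theta))
```

Keeping the targets tied to $\theta_{i-1}$ rather than to the parameters being updated is what turns each iteration into an ordinary regression problem.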

In DQN, the state $s$ is the raw image, and the method has been widely tested on Atari games [31]. DQN is not designed for continuous action spaces; thus, it may have difficulty with some robotics problems previously addressed with continuous control. Continuous Q-learning with Normalized Advantage Functions (NAF) overcomes this issue by using a neural network that separately outputs a value function $V(x)$ and an advantage term $A(x,u)$, which is parametrized as a quadratic function of nonlinear features [32]. These two functions compose the final $Q(x,u \mid \theta^Q)$, given by the following equation:

$$ Q(x,u \mid \theta^Q) = A(x,u \mid \theta^A) + V(x \mid \theta^V), \tag{24} $$

with $x$ being the state, $u$ being the action, and $\theta^Q$, $\theta^A$, and $\theta^V$ being the sets of weights of the $Q$, $A$, and $V$ functions, respectively. This representation offers a simpler alternative to more standard actor-critic style algorithms while preserving the benefits of nonlinear value function approximation [32]. NAF is valid for continuous control tasks and takes advantage of trained models to approximate the standard model-free value function.
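For a one-dimensional action, the quadratic advantage used by NAF reduces to $A(x,u) = -\frac{1}{2} P(x)\,(u - \mu(x))^2$ with $P(x) \ge 0$, so the greedy action is simply $\mu(x)$ and no separate maximization over actions is required. The sketch below assumes this scalar case, writes $P(x) = L(x)^2$, and replaces the network outputs with hypothetical numbers; for vector actions, [32] builds $P(x)$ from a lower-triangular matrix $L(x)$ as $P(x) = L(x)L(x)^\top$.

```python
def naf_q(u, mu, l, v):
    """Q(x, u) = A(x, u) + V(x) for a scalar action, as in equation (24).

    mu, l and v stand for the network outputs mu(x), L(x) and V(x) at one state x
    (hypothetical values here). The advantage is the quadratic
    A(x, u) = -0.5 * P(x) * (u - mu(x))^2 with P(x) = L(x)^2 >= 0.
    """
    P = l * l
    advantage = -0.5 * P * (u - mu) ** 2
    return advantage + v


mu, l, v = 0.3, 2.0, 1.5
actions = [k / 100 for k in range(-100, 101)]
best = max(actions, key=lambda u: naf_q(u, mu, l, v))
print(best, naf_q(best, mu, l, v))  # 0.3 1.5
```

The grid search is only there to confirm the closed form: because the advantage is never positive and vanishes at $u = \mu(x)$, the maximum of $Q$ over actions equals $V(x)$.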

2.3.2. Policy Search Methods

Policy-based reinforcement learning methods aim at directly searching for the optimal policy $\pi^*$, which provides a feasible framework for continuous control. Deep Deterministic Policy Gradient (DDPG) [33] is based on the actor-critic paradigm [29], with two neural networks that approximate a greedy deterministic policy (actor) and the $Q$ function (critic). The actor network is updated by applying the chain rule to the expected return $J$ from the start distribution with respect to the actor parameters (see (25)):

$$ \nabla_{\theta^\mu} J \approx \mathbb{E}_{s_t \sim \rho^\beta}\left[ \nabla_{\theta^\mu} Q(s,a \mid \theta^Q) \big|_{s=s_t,\, a=\mu(s_t \mid \theta^\mu)} \right]. \tag{25} $$
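The chain rule in (25) splits the actor gradient into $\nabla_a Q(s,a \mid \theta^Q)$, evaluated at $a = \mu(s \mid \theta^\mu)$, times $\nabla_{\theta^\mu} \mu(s \mid \theta^\mu)$. The sketch below assumes a linear actor and a fixed, hypothetical critic in place of the learned critic network (in DDPG, the critic is trained at the same time), and averages over a handful of hypothetical states standing in for samples from $\rho^\beta$. A finite-difference check of the analytic gradient is a useful sanity test for such code.

```python
def actor(theta, s):
    """Deterministic linear actor mu(s | theta) = theta . s (scalar action)."""
    return sum(t * x for t, x in zip(theta, s))


def critic(s, a):
    """Hypothetical fixed critic standing in for Q(s, a | theta^Q)."""
    return -(a - (2.0 * s[0] - s[1])) ** 2


def critic_grad_a(s, a):
    """Analytic dQ/da of the critic above."""
    return -2.0 * (a - (2.0 * s[0] - s[1]))


def actor_gradient(theta, states):
    """Chain rule behind equation (25), averaged over sampled states:
    grad_theta J ~ mean_s [ dQ/da |_{a = mu(s)} * grad_theta mu(s) ],
    where grad_theta mu(s) = s for the linear actor."""
    grad = [0.0] * len(theta)
    for s in states:
        dq_da = critic_grad_a(s, actor(theta, s))
        for k, x in enumerate(s):
            grad[k] += dq_da * x / len(states)
    return grad


def objective(theta, states):
    """J(theta): mean critic value of the actor's actions over the sampled states."""
    return sum(critic(s, actor(theta, s)) for s in states) / len(states)


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical samples from rho^beta
theta = [0.5, 0.2]
for _ in range(200):  # gradient ascent on J
    g = actor_gradient(theta, states)
    theta = [t + 0.1 * gk for t, gk in zip(theta, g)]
print([round(t, 3) for t in theta])  # [2.0, -1.0]
```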

The DDPG method learns with, on average, 20 times fewer experience steps than DQN [33]. Both DDPG and DQN require large sample datasets, since they are model-free algorithms. The DNN-based Guided Policy Search (DNN-based GPS) method [34] learns to map raw visual information and joint states directly to joint torques. Compared to previous works, it managed to perform high-dimensional control, even from imperfect sensor data. DNN-based GPS has been widely applied to robotic control, from manipulation to navigation tasks [35, 36].

3. Deep Learning for Feature Extraction

The main objective of feature extraction systems is to extract representative features from the raw measurements provided by sensors on board a UAV.

3.1. With Image Sensors

Deep learning techniques for feature extraction using image sensors have been applied in a wide range of applications using different imaging technologies (e.g., monocular RGB cameras, RGB-D sensors, infrared, etc.). Despite the wide variety of sensors used for image processing, the main deep learning feature extractors are based on CNNs [67]. As explained in Section 2.1, CNN models consist of several stacked convolution and pooling layers. The convolution layers are responsible for extracting features from the data by convolving the input image with learned filters, while the pooling layers reduce the dimensionality of the outputs of the preceding convolution layers.

In the robotics field, feature extraction systems based on CNN models have been mainly applied to object recognition [42–48] and scene classification [51–54]. Concerning the object recognition task, recent advances have integrated object detection, by means of bounding box regression, and object classification within the same CNN model [42–44]. Unsupervised feature learning for object recognition was applied in [68], reducing the need for manually labeled training data, which can be extremely time-consuming and costly to obtain. Regarding the scene classification problem, recent advances have focused on learning efficient, global image representations from the convolutional and fully connected layers of pretrained CNNs in order to obtain representative image features [53]. In [52], it was also shown that the features learned by pretrained CNN models were able to generalize properly even to domains substantially different from those in which they were trained, such as the classification of aerial images. Scene classification on board a Parrot AR.Drone quadrotor was also presented in [40], where a 10-layer CNN was used to classify the input image of a forest trail into three classes, each of which represented the action to be taken in order to keep the aerial robot on the trail (turn left, go straight, and turn right).

Nowadays, object recognition and scene classification from aerial imagery using deep learning techniques have also acquired a relevant role in agricultural applications. In these kinds of applications, UAVs provide a low-cost platform for aerial image acquisition, while deep learned features are mainly used for plant counting and identification. Several applications have used deep learning techniques for this purpose [12, 49, 50, 55, 56], providing robust systems for monitoring the state of crops in order to maximize their productivity. In [55], a sparse autoencoder was used for unsupervised feature learning in order to perform weed classification from images taken by a multirotor UAV. In [56], a hybrid neural network for crop classification among 23 classes was proposed; it combined a Feedforward Neural Network, which handled histogram information, with a CNN. In [49], the well-known AlexNet CNN architecture proposed in [69] was used in combination with a sliding-window object proposal technique for palm tree detection and counting. Other similar approaches have focused on weed scouting, using a CNN model for weed species classification [12].

Deep learning techniques applied to images taken from UAVs have also gained importance in monitoring and search and rescue applications, such as jellyfish monitoring [70], road traffic monitoring [71], assisting avalanche search and rescue operations with UAV imagery [72], and terrorist identification [73]. In [72, 73], the use of pretrained CNN models for feature extraction is again worth noting. In both cases, the well-known Inception model [74] was used. In [72], the Inception model was used with a Support Vector Machine (SVM) classifier to detect possible survivors, while in [73], a transfer-learning technique was used to fine-tune the Inception network in order to detect possible terrorists.

Most of the presented approaches, especially in the field of object recognition, require GPUs to meet real-time constraints. In this respect, state-of-the-art object recognition systems are based on the approaches presented in [46, 47], in which the object recognizer is able to run at rates from 40 to 90 frames per second on an Nvidia GeForce GTX Titan X.

Despite the good results provided by the aforementioned systems, UAV constraints such as endurance, weight, and payload require the development of specific hardware and software solutions that can be embedded on board a UAV. Taking these limitations into account, only a few systems in the literature have embedded deep learning-based feature extraction algorithms running on GPU hardware on board a UAV. In [75], the problem of automatic detection, localization, and classification (ADLC) of plywood targets was addressed. The solution consisted of a cascade of CNN-based classifiers, trained on an Nvidia Titan X and applied to 24-megapixel RGB images processed by an Nvidia Jetson TK1 mounted on board a fixed-wing UAV. The ADLC algorithm ran the detection stage on the CPU cores, allowing the GPU to focus on the classification tasks.
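The CPU/GPU split described for [75] amounts to a two-stage pipeline in which detection and classification run concurrently. The sketch below is only an illustration of that structure: `detect` and `classify` are hypothetical placeholders, [75] does not describe its software in these terms, and in CPython the two threads only overlap in practice if `classify` spends its time in native code that releases the global interpreter lock.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List, Optional, Tuple

_STOP = object()  # sentinel telling the classification stage to finish


def run_cascade(frames: Iterable[Any],
                detect: Callable[[Any], Iterable[Any]],
                classify: Callable[[Any], Optional[str]],
                maxsize: int = 8) -> List[Tuple[int, Any, str]]:
    """Stage 1 (`detect`, the CPU side) proposes candidate regions per frame;
    stage 2 (`classify`, the GPU side) labels each candidate or rejects it by
    returning None. The stages run in separate threads joined by a bounded
    queue, so detection on the next frame can overlap with classification."""
    q: Any = queue.Queue(maxsize=maxsize)
    results: List[Tuple[int, Any, str]] = []

    def detector() -> None:
        for frame_id, frame in enumerate(frames):
            for region in detect(frame):
                q.put((frame_id, region))  # blocks if the classifier falls behind
        q.put(_STOP)

    def classifier() -> None:
        while True:
            item = q.get()
            if item is _STOP:
                break
            frame_id, region = item
            label = classify(region)
            if label is not None:
                results.append((frame_id, region, label))

    workers = [threading.Thread(target=detector), threading.Thread(target=classifier)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


# Toy usage: "frames" are lists of pixel values, bright values are candidates,
# and the hypothetical classifier accepts only even values as targets.
frames = [[1, 7, 8], [9, 2]]
found = run_cascade(frames,
                    detect=lambda f: [v for v in f if v > 5],
                    classify=lambda r: "target" if r % 2 == 0 else None)
print(found)  # [(0, 8, 'target')]
```

The bounded queue keeps memory use fixed: if the classification stage is slower, the detection stage simply waits instead of accumulating candidates.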

3.2. With Other Sensors

Most of the deep learning work in the literature has been applied to data captured by image sensors, owing to the consolidated results obtained with CNN models. However, deep learning techniques cover a wide range of applications and can be used in conjunction with sensors other than cameras, such as acoustic, radar, and laser sensors.

Deep learning techniques have been applied to acoustic data recognition in UAV-related scenarios [64, 65]. In [64], a Partially Shared Deep Neural Network (PS-DNN) was proposed to deal with sound source separation and identification using partially annotated data. For this purpose, the PS-DNN is composed of two partially overlapping subnetworks: a regression network for sound source separation and a classification network responsible for sound identification. The regression network is intended to improve the training of the classification network by providing it with a cleaner sound signal. Results showed that the PS-DNN model worked reasonably well for identifying people's voices in disaster situations. The data was collected using a microphone array on board a Parrot Bebop UAV.
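One way to train the two partially shared subnetworks of [64] on partially annotated data is a multi-task loss in which each term only counts the samples that carry the corresponding annotation. The sketch below assumes this masked formulation, a mean-squared separation loss, a cross-entropy identification loss, and a weighting factor `alpha`; these are illustrative choices, not the exact loss reported in [64], and the network outputs are given as plain lists.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (separation output, clean-signal target or None, class logits, class label or None)
Sample = Tuple[Sequence[float], Optional[Sequence[float]], Sequence[float], Optional[int]]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def partially_annotated_loss(batch: List[Sample], alpha: float = 1.0) -> float:
    """Separation (regression) loss averaged over the samples that have a clean
    target, plus alpha times the identification (classification) loss averaged
    over the samples that have a label. Missing annotations simply drop out."""
    reg = [mse(p, t) for p, t, _, _ in batch if t is not None]
    cls = [cross_entropy(z, c) for _, _, z, c in batch if c is not None]
    reg_term = sum(reg) / len(reg) if reg else 0.0
    cls_term = sum(cls) / len(cls) if cls else 0.0
    return reg_term + alpha * cls_term


batch = [
    ([0.0, 1.0], [0.0, 0.0], [2.0, 0.0], None),  # separation target only
    ([1.0], None, [0.0, 0.0], 1),                # identity label only
]
print(partially_annotated_loss(batch))  # 0.5 + ln 2, about 1.193
```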

In [65], the problem of identifying UAVs from their characteristic sound was addressed using a bidirectional LSTM-RNN with 3 layers and 300 LSTM blocks. This model outperformed the two other models evaluated, namely, Gaussian Mixture Models (GMM) and a CNN.
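The defining property of a bidirectional recurrent network, as used in [65], is that each time step is described by one state computed from the past and one computed from the future. The sketch below shows only that structure: a scalar tanh cell stands in for an LSTM block, there is a single layer rather than 3 layers of 300 blocks, and the weights are arbitrary illustrative values.

```python
import math
from typing import List, Sequence, Tuple


def rnn_pass(xs: Sequence[float], w_in: float, w_rec: float) -> List[float]:
    """Scalar tanh recurrence over a sequence; returns the state at every step.
    (A simplified stand-in for an LSTM layer, which adds gates and a memory cell.)"""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states


def bidirectional(xs: Sequence[float],
                  fwd: Tuple[float, float] = (0.5, 0.9),
                  bwd: Tuple[float, float] = (0.5, 0.9)) -> List[Tuple[float, float]]:
    """Run one recurrence left-to-right and another right-to-left, then pair
    the states so each time step carries both past and future context."""
    forward = rnn_pass(xs, *fwd)
    backward = rnn_pass(list(xs)[::-1], *bwd)[::-1]  # re-align to original order
    return list(zip(forward, backward))


# Toy "audio feature" sequence with a single burst in the first frame.
states = bidirectional([1.0, 0.0, 0.0])
print(states[0], states[-1])
```

In the full model, the paired states of the last layer would feed a classifier that decides whether the audio segment contains a UAV.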

Concerning radar technology, although radar data has not been widely addressed with deep learning techniques for UAVs in the literature, the recent advances presented in [62] are worth mentioning. In that work, the spectral correlation function (SCF) of the signals captured by a 2.4 GHz Doppler radar sensor was used to detect and classify micro-UAVs amongst 3 predefined classes. The model used for this purpose was a semisupervised DBN trained with the SCF data.

Regarding laser technology, [66] proposed a novel strategy for detecting safe landing areas based on point clouds captured by a LIDAR sensor mounted on a helicopter. In this work, 1 m³ subvolumes of a volumetric density map built from the original point cloud were used as input to a 3D CNN trained to predict the probability that the evaluated area is a safe landing zone. Several CNN models with one or two convolutional layers were evaluated on synthetic and semisynthetic datasets; in both cases, the 3D CNN model with two convolutional layers gave good results.
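Building the 3D CNN input described for [66] can be sketched as voxelizing the point cloud and cutting out a dense block. Assumptions: per-voxel point counts are used as a simple stand-in for the density measure of [66], and a 0.25 m voxel size is chosen so that a 4 x 4 x 4 block covers the 1 m³ subvolume mentioned above.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def density_map(points: Iterable[Point], voxel_size: float) -> "Counter[Voxel]":
    """Sparse volumetric map: number of LIDAR returns falling in each voxel."""
    return Counter(
        (math.floor(x / voxel_size), math.floor(y / voxel_size), math.floor(z / voxel_size))
        for x, y, z in points)


def subvolume(grid: "Counter[Voxel]", origin: Voxel, n: int) -> List[List[List[int]]]:
    """Dense n x n x n block of counts starting at voxel `origin`, indexed
    [i][j][k] along x, y, z: the kind of patch a 3D CNN would take as input.
    Counter returns 0 for empty voxels without inserting them."""
    ox, oy, oz = origin
    return [[[grid[(ox + i, oy + j, oz + k)] for k in range(n)]
             for j in range(n)]
            for i in range(n)]


points = [(0.1, 0.1, 0.1), (0.2, 0.05, 0.0), (0.9, 0.9, 0.9), (1.2, 0.0, 0.0)]
grid = density_map(points, voxel_size=0.25)
patch = subvolume(grid, origin=(0, 0, 0), n=4)  # 4 x 0.25 m = 1 m per side
print(patch[0][0][0], sum(v for plane in patch for row in plane for v in row))  # 2 3
```

Keeping the map sparse and densifying only the evaluated subvolume avoids allocating a dense grid over the whole surveyed area.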

4. Deep Learning for Planning and Situational Awareness

Several deep learning developments have been reported for tasks related to UAV planning and situational awareness. Planning tasks refer to generating solutions for complex problems without having to hand-code the environment model or the robot's skills or strategies into a reactive controller. Planning is required in unstructured, dynamic environments or when the scope and/or the robot's tasks are diverse. Typical tasks include path, motion, navigation, or manipulation planning. Situational awareness tasks give robots knowledge about their own state and the state of their environment. Examples of such tasks are robot state estimation, self-localization, and mapping.

4.1. Planning

Path planning for collaborative search and rescue missions with deep learning-based exploration is presented in [57]. This work, in which a UAV explores and maps the environment trying to find a traversable path for a ground robot, focuses on minimizing the overall deployment time (i.e., both exploration and path traversal). To map the terrain and find a traversable path, a CNN is proposed for terrain classification. Instead of using a pretrained CNN, training is done on the spot, which allows the classifier to be trained on demand with the terrain present at the disaster site [58]. However, the model takes around 15 minutes to train.

4.2. Situational Awareness

Cross-view localization of images is achieved with the help of deep learning in [59]. Although the work is presented as a solution for UAV localization, no UAVs were used for image collection, and the experiments were based on ground-level images only. The approach is based on mining a library of raw image data to find nearest-neighbor visual features (i.e., landmarks), which are then matched with the features extracted from an input query image. A pretrained CNN is used to extract features for matching verification purposes, and although the approach is claimed to have low computational complexity, the authors do not provide details about retrieval time.
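The retrieval step described for [59], finding the library features most similar to those of a query image, can be sketched as a brute-force nearest-neighbor search. The cosine similarity metric, the landmark names, and the 2-D descriptors are illustrative assumptions; in [59] the descriptors come from a pretrained CNN.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_landmarks(query: Sequence[float],
                      library: Dict[str, Sequence[float]], k: int = 1) -> List[str]:
    """Rank library entries (landmark name -> feature vector) by cosine
    similarity to the query descriptor and return the k best names."""
    ranked = sorted(library, key=lambda name: cosine(query, library[name]), reverse=True)
    return ranked[:k]


# Hypothetical descriptors; in practice they would come from a pretrained CNN.
library = {"tower": [1.0, 0.0], "bridge": [0.0, 1.0], "church": [0.7, 0.7]}
print(nearest_landmarks([0.9, 0.1], library, k=2))  # ['tower', 'church']
```

The cost of this exhaustive search grows linearly with the library size, which is why retrieval time matters for the approach.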

Ground-level query images are matched to a reference database of aerial images in [60]. Here, deep learning is applied to cope with the wide baseline and the large appearance variations between ground-level and aerial images. A pair-based network structure is proposed to learn deep representations from data for distinguishing matched from unmatched cross-view image pairs. Even though training took 4 days in the reported experiments, the use of fast algorithms such as locality-sensitive hashing allowed real-time cross-view matching at city scale. The main limitation of the approach is the need to estimate scale, orientation, and dominant depth at test time for ground-level queries.
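Locality-sensitive hashing, which [60] credits with enabling real-time matching at city scale, can be illustrated with the random-hyperplane family for cosine similarity. This is a generic sketch, not the configuration of [60]: the number of bits, the single hash table, and the key `aerial_tile_17` are assumptions, and the vectors would in practice be the learned deep representations.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple


class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity. Each signature bit is the
    sign of the projection onto a random Gaussian direction, so vectors with a
    small angle between them tend to share a bucket, and a query is compared
    only against the keys stored in its own bucket."""

    def __init__(self, dim: int, n_bits: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: DefaultDict[Tuple[int, ...], List[str]] = defaultdict(list)

    def signature(self, v: Sequence[float]) -> Tuple[int, ...]:
        return tuple(int(sum(p * x for p, x in zip(plane, v)) >= 0.0) for plane in self.planes)

    def add(self, key: str, v: Sequence[float]) -> None:
        self.buckets[self.signature(v)].append(key)

    def candidates(self, v: Sequence[float]) -> List[str]:
        return list(self.buckets.get(self.signature(v), []))


lsh = HyperplaneLSH(dim=3, n_bits=6)
lsh.add("aerial_tile_17", [1.0, 2.0, 3.0])
# A scaled copy points in the same direction, so it lands in the same bucket.
print(lsh.candidates([2.0, 4.0, 6.0]))  # ['aerial_tile_17']
```

Practical systems usually combine several tables (or probe neighboring buckets) to trade memory for recall, and re-rank the retrieved candidates with exact distances.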

In [61], a CNN is proposed to generate control actions (the permitted turns for a UAV) given an image captured on board and a global motion plan. This global motion plan specifies, by means of a potential function, the action to take at each position on the map. The purpose of the CNN is to learn the mapping from images to position-dependent actions. This is equivalent to performing image registration and then generating the control actions from the global motion plan, but here the behaviour is learned and efficiently encoded in a CNN, which outperformed classical image registration techniques. However, no tests on a real UAV were carried out and no information is provided about execution time, which might complicate deployment in a real UAV application.
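A global motion plan defined through a potential function can be illustrated on a small occupancy grid: the potential is the breadth-first distance to the goal, and the action at each free cell is the move that descends it. This is a sketch under assumptions: [61] does not specify this potential, the four grid moves stand in for its permitted turns, and in [61] the CNN learns to predict such actions from onboard images rather than from a known position.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def potential(grid: List[str], goal: Cell) -> Dict[Cell, int]:
    """Breadth-first distance to the goal over free cells ('.'); '#' is an obstacle."""
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}
    frontier = deque([goal])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in MOVES.values():
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == "." and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                frontier.append((nr, nc))
    return dist


def plan_action(dist: Dict[Cell, int], cell: Cell) -> Optional[str]:
    """Move that descends the potential from `cell`; None at the goal or if unreachable."""
    if cell not in dist or dist[cell] == 0:
        return None
    r, c = cell
    options = [(dist[(r + dr, c + dc)], name)
               for name, (dr, dc) in MOVES.items()
               if (r + dr, c + dc) in dist]
    return min(options)[1]  # lowest potential; ties broken alphabetically


GRID = ["...", ".#.", "..."]
dist = potential(GRID, goal=(0, 2))
print(plan_action(dist, (1, 0)))  # up
```

Labelling every position in this way yields (position, action) pairs; the learning problem in [61] is to recover the action from the image seen at that position.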

As the presented works show, developments in planning and situational awareness with deep learning for UAVs are still quite rudimentary. The path planning approach presented is limited to small-scale disaster sites, and the different localization and mapping approaches are still too slow and inaccurate for real UAV applications.

5. Deep Learning for Motion Control

Deep learning techniques for motion control have recently been explored in several research works. Classic control has solved diverse robotic control problems in a precise and analytic manner, allowing robots to perform complex maneuvers. Nevertheless, standard control theory only solves the problem for a specific case and an approximate robot model, and it cannot easily adapt to changes in the robot model and/or to hostile environments (e.g., a damaged propeller on a UAV, wind gusts, or rain). In this context, learning from experience is an important capability that can overcome many of these limitations.

As a key advantage, deep learning methods are able to generalize properly from sets of labelled input data. Deep learning allows a pattern to be inferred from raw inputs, such as images and LIDAR sensor data, which can lead to proper behaviour even in unknown situations. Concerning UAV indoor navigation, recent advances have led to the successful application of CNNs to map images to high-level behaviour directives (e.g., turn left, turn right, rotate left, and rotate right) [38, 39]. In [38], the Q function is estimated through a CNN, which is trained in simulation and successfully tested in real experiments. In [39], actions are directly mapped from raw images. In both methods, the learned model is run off board, usually taking advantage of a GPU in an external laptop.
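The value-based approach of [38] can be illustrated with the two generic pieces of Q-learning: epsilon-greedy action selection over the Q-values a CNN outputs for an image, and the one-step target those values are regressed towards. This is textbook Q-learning under stated assumptions (the action names, `epsilon`, and `gamma` are illustrative), not necessarily the exact training procedure of [38].

```python
import random
from typing import Sequence

# Hypothetical discrete directives, taken from the examples in the text.
ACTIONS = ("turn_left", "turn_right", "rotate_left", "rotate_right")


def select_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice over the Q-values the CNN outputs for one image:
    explore with probability epsilon, otherwise take the highest-valued action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def q_target(reward: float, next_q_values: Sequence[float],
             gamma: float = 0.99, terminal: bool = False) -> float:
    """One-step target r + gamma * max_a' Q(s', a'), towards which the network's
    Q(s, a) for the action actually taken would be regressed."""
    return reward if terminal else reward + gamma * max(next_q_values)


rng = random.Random(0)
a = select_action([0.1, 0.7, 0.2, 0.0], epsilon=0.0, rng=rng)
print(ACTIONS[a])                            # turn_right
print(q_target(1.0, [0.0, 2.0], gamma=0.5))  # 2.0
```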

With regard to UAV navigation in unstructured environments, some studies have focused on cluttered natural scenarios, such as dense forests or trails [40]. In [40], a DNN model was trained to map images to action probabilities (turn left, go straight, or turn right) with a final softmax layer, and it was tested on board using an ODROID-U3 processor. Its performance was then compared to that of two automated methods, an SVM and the method proposed in [76], and to that of two human observers.
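The softmax output layer described for [40] turns three network scores into probabilities. Converting them into a steering command by taking a yaw rate proportional to P(turn right) minus P(turn left) is an assumption made here for illustration; the logits would come from the DNN, and `gain` and the sign convention are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

CLASSES: Tuple[str, str, str] = ("turn_left", "go_straight", "turn_right")


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # shift for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def most_likely(logits: Sequence[float]) -> str:
    p = softmax(logits)
    return CLASSES[p.index(max(p))]


def yaw_command(logits: Sequence[float], gain: float = 1.0) -> float:
    """Signed yaw-rate command in [-gain, gain]: positive means turn right
    (an arbitrary convention), zero when left and right are equally likely."""
    p_left, _p_straight, p_right = softmax(logits)
    return gain * (p_right - p_left)


print(yaw_command([0.0, 0.0, 0.0]))   # 0.0
print(yaw_command([0.0, 0.0, 10.0]))  # close to +1: steer right
print(most_likely([0.0, 0.0, 10.0]))  # turn_right
```

Using the probabilities rather than only the most likely class gives a proportional command instead of a bang-bang one.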

In [37], navigable areas are predicted from a disparity image in the form of up to three bounding boxes. The center of the biggest bounding box found is selected as the next waypoint. Using this strategy, successful UAV flights were performed. The main drawback is the requirement to send the disparity images to a host device where all computations are made. The whole pipeline for UAV horizontal translation, disparity map generation, and waypoint selection takes about 1.3 seconds, which still makes navigation quite slow for real applications.

On the other hand, low-level motion control is challenging, since dealing with continuous and multivariable action spaces can become intractable. Nevertheless, recent works have proposed novel methods to learn low-level control policies from imperfect sensor data in simulation [41, 63]. In [63], a Model Predictive Controller (MPC) was used to generate data at training time in order to train a DNN policy, which was allowed to access only raw observations from the UAV onboard sensors. At test time, the UAV was able to follow an obstacle-free trajectory even in unknown situations. In [41], the well-known Inception v3 model (a pretrained CNN) was adapted so that the final layer provides six action nodes (three transitions and three orientations). After retraining, the UAV managed to cross a room filled with a few obstacles in random locations.
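The waypoint rule of [37], taking the center of the largest of up to three predicted bounding boxes, can be sketched as follows. The (x_min, y_min, x_max, y_max) pixel convention is an assumption, and turning the selected image point into a 3D waypoint, which requires the disparity map, is not shown.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def area(box: Box) -> float:
    x0, y0, x1, y1 = box
    return max(0.0, x1 - x0) * max(0.0, y1 - y0)


def next_waypoint(boxes: Sequence[Box]) -> Optional[Tuple[float, float]]:
    """Image-plane center of the largest predicted navigable box, or None
    when the network predicted no navigable area."""
    if not boxes:
        return None
    x0, y0, x1, y1 = max(boxes, key=area)
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)


predicted = [(0, 0, 10, 10), (20, 20, 60, 50), (5, 5, 15, 40)]
print(next_waypoint(predicted))  # (40.0, 35.0)
```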

Deep learning techniques for robotic motion control can be increasingly beneficial for inferring complex behaviours from raw observation data. Deep learning approaches have the potential to generalize, although current methods still have to overcome the difficulties of continuous state and action spaces, as well as issues related to sample efficiency. Furthermore, novel deep learning models require GPUs in order to work in real time. In this context, onboard GPUs, Field Programmable Gate Arrays (FPGAs), and Application-Specific Integrated Circuits (ASICs) are important aspects that hardware manufacturers should take into consideration.

6. Discussion

Deep learning has arisen as a promising set of technologies for meeting the current demands of highly autonomous UAV operations, due to its excellent capabilities for learning high-level representations from raw sensor data. Multiple success cases have been reported (Tables 1 and 2) in a wide variety of applications.

Table 1

Deep learning-based UAV applications grouped by learning algorithms and application fields.

Learning type	Algorithm	Tasks	Field of application	References
Supervised	CNN	Outdoor navigation	Navigation	[37–39]
Supervised	CNN	Indoor navigation	Navigation	[40, 41]
Supervised	CNN	Object recognition	Generic	[42–45]
Supervised	CNN	Object recognition	Generic	[46–48]
Supervised	CNN	Object recognition	Agriculture	[49, 50]
Supervised	CNN	Scene classification	Generic	[51–54]
Supervised	CNN	Scene classification	Agriculture	[55, 56]
Supervised	CNN	Path planning	Search & rescue	[57, 58]
Supervised	CNN	Image registration	Localization	[59–61]
Supervised	CNN	Image registration	Navigation	[59–61]
Unsupervised	Autoencoder	Feature extraction	Agriculture	[55]
Unsupervised	DBN	Feature extraction	UAV identification	[62]
Reinforcement	DQN	-	-	-
Reinforcement	DDPG	-	-	-
Reinforcement	NAF	-	-	-
Reinforcement	GPS	Indoor navigation	Navigation	[63]

Table 2

Deep learning-based UAV applications grouped by the type of system within an unmanned aerial systems architecture, the sensor technologies, and the type of learning algorithms: supervised (S), unsupervised (U), and reinforcement (R).

Aerial robot systems	Sensing technologies	Learning algorithms	References
Feature extraction	Image	S	[42–45]
Feature extraction	Image	S	[46–48]
Feature extraction	Image	S	[51–54]
Feature extraction	Image	S, U	[55]
Feature extraction	Acoustic	S	[64, 65]
Feature extraction	Radar	S, U	[62]
Feature extraction	LIDAR	S	[66]
Planning	Image	S	[57, 58]
Situational awareness	Image	S	[59–61]
Motion control	Image	S	[38–41]
Motion control	LIDAR	R	[63]

A straightforward conclusion from the surveyed articles is that images acquired from UAVs are currently the prevailing type of information exploited by deep learning, mainly due to the low cost, low weight, and low power consumption of image sensors. This also explains the dominance of CNNs among the deep learning algorithms used in UAV applications, given their excellent capabilities for extracting useful information from images.

However, deep learning techniques, UAV technology, and the combined use of both still present several challenges, which are preventing faster and further advances in this field.

Challenges in Deep Learning. Deep learning techniques still face several challenges, beginning with their own theoretical understanding. Examples include the lack of knowledge about the geometry of the objective function in deep neural networks and about why certain architectures work better than others. Furthermore, much effort is currently being put into finding efficient ways to do unsupervised learning, since collecting large amounts of unlabeled data is becoming economically and technologically less expensive. Success in this objective would allow algorithms to learn how the world works simply by observing it, as humans do.

Additionally, as mentioned in Section 2.3, real-world problems usually involve high-dimensional continuous state spaces (a large number of states and/or actions), which can make the problem intractable with current approaches, severely limiting the development of real applications. An efficient way of coping with these types of problems remains an open challenge.
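The intractability follows from simple counting: discretizing each of the d state dimensions into k bins yields k^d states. For instance, assuming a 12-dimensional quadrotor state (position, velocity, attitude, and angular rate, each along three axes; an illustrative choice) and only 10 bins per dimension:

```latex
|\mathcal{S}| = k^{d}, \qquad k = 10,\; d = 12 \;\Rightarrow\; |\mathcal{S}| = 10^{12}
```

This is far beyond what tabular methods can enumerate, even before a continuous action space is taken into account.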

Challenges in UAV Autonomy. Autonomous UAV operations, enabling safe navigation with little or no human supervision, are currently key to the development of several civilian and military applications. However, UAV platforms still have important flight endurance limitations, which restrict the size, weight, and power consumption of the payload. These limitations arise mainly from the current state of sensor and battery technology and restrict the capabilities required for autonomous operations. Undoubtedly, we will see developments in these areas in the coming years.

Furthermore, onboard processing is desirable for many UAV operations, especially those in which communications can compromise performance, such as when large amounts of data have to be transmitted and/or when limited bandwidth is available. Today, the design of powerful, miniaturized, low-power computing devices, particularly GPUs, is an active field of work for embedded hardware developers.

Challenges in Deep Learning-Based UAV Applications. This review reveals that, within the architecture of an unmanned aerial system, feature extraction systems are the type of system to which deep learning algorithms have been most widely applied. This is reasonable given the excellent abilities of deep learning to learn data representations from raw sensor data. Systems dealing with higher-level abstractions, such as UAV supervision and planning systems, have so far received little attention from the research community. These systems implement complex behaviours that have to be learned and for which supervised learning is difficult to apply (e.g., generating labelled datasets is complex).

Nevertheless, systems operating at lower levels of abstraction, such as feature extraction systems, still demand great computational resources. These resources are still hard to integrate on board UAVs, which then require powerful communication capabilities and off-board processing. Furthermore, the available computational resources are in most cases not compatible with online processing, limiting the applications in which reactive behaviours are necessary. This again raises the aforementioned challenge of advancing embedded hardware technology, but it should also encourage researchers to design more efficient deep learning architectures.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the Spanish Ministry of Science (Project DPI2014-60139-R). The LAL UPM and the MONCLOA Campus of International Excellence are also acknowledged for funding the predoctoral contract of one of the authors.

References

[1] A. G. Ivakhnenko, "Polynomial theory of complex systems," IEEE Transactions on Systems, Man and Cybernetics, vol. 1, no. 4, pp. 364–378, 1971. doi: 10.1109/TSMC.1971.4308320

[2] Fukushima, "Neocognitron: a hierarchical neural network capable of visual pattern recognition," Neural Networks, vol. 1, no. 2, pp. 119–130, 1988.

[3] LeCun, Bottou, Bengio, and Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2323, 1998. doi: 10.1109/5.726791

[4] Deng, Dong, and Socher, "ImageNet: a large-scale hierarchical image database," in Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255, Miami, Fla, USA, June 2009. doi: 10.1109/CVPRW.2009.5206848

[5] Bengio, Courville, and Vincent, "Representation learning: a review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, 2013. doi: 10.1109/tpami.2013.50

[6] Schmidhuber, "Deep learning in neural networks: an overview," Neural Networks, vol. 61, pp. 85–117, 2015. doi: 10.1016/j.neunet.2014.09.003

[7] Wang, Kuen, Shahroudy, Shuai, Liu, and Wang, "Recent advances in convolutional neural networks," preprint, https://arxiv.org/abs/1512.07108.

[8] Tai and Liu, "Deep-learning in mobile robotics - from perception to control systems: a survey on why and why not," CoRR abs/1612.07139, http://arxiv.org/abs/1612.07139.

[9] Martinez, Sampedro, Chauhan, and Campoy, "Towards autonomous detection and tracking of electric towers for aerial power line inspection," in Proceedings of the 2014 International Conference on Unmanned Aircraft Systems (ICUAS 2014), pp. 284–295, May 2014. doi: 10.1109/ICUAS.2014.6842267

[10] M. A. Olivares-Mendez, Ludivig, T. F. Bissyandé, Kannan, Zurad, Annaiyan, Voos, and Campoy, "Towards an autonomous vision-based unmanned aerial system against wildlife poachers," Sensors, vol. 15, no. 12, pp. 31362–31391, 2015. doi: 10.3390/s151229861

[11] Carrio, Pestana, J.-L. Sanchez-Lopez, Suarez-Fernandez, Campoy, Tendero, García-De-Viedma, González-Rodrigo, Bonatti, and J. G. Rejas-Ayuga, "Ubristes: UAV-based building rehabilitation with visible and thermal infrared remote sensing," in Proceedings of Robot 2015: Second Iberian Robotics Conference, pp. 245–256, Springer International Publishing, 2016.

[12] Fan, Huang, and Tian, "Real-time UAV weed scout for selective weed control by adaptive robust control and machine learning algorithm," in Proceedings of the 2016 ASABE Annual International Meeting, American Society of Agricultural and Biological Engineers, 2016.

[13] J. L. Sanchez-Lopez, Molina, Bavle, Sampedro, R. A. S. Fernández, and Campoy, "A multi-layered component-based approach for the development of aerial robotic systems: the Aerostack framework," Journal of Intelligent & Robotic Systems, pp. 1–27, 2017.

[14] Graves, "Generating sequences with recurrent neural networks," arXiv preprint, https://arxiv.org/abs/1308.0850.

[15] T. M. Mitchell, Machine Learning, vol. 45, no. 37, McGraw Hill, Burr Ridge, Ill, USA, 1997.

[16] Goodfellow, Bengio, and Courville, Deep Learning, MIT Press, Cambridge, Mass, USA, 2016.

[17] Hochreiter and Schmidhuber, "LSTM can solve hard long time lag problems," in Proceedings of the 10th Annual Conference on Neural Information Processing Systems (NIPS 1996), pp. 473–479, December 1996.

[18] Gibson and Patterson, Deep Learning, O'Reilly, 2016.

[19] G. E. Hinton, Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527–1554, 2006. doi: 10.1162/neco.2006.18.7.1527

[20] Bengio, Lamblin, Popovici, and Larochelle, "Greedy layer-wise training of deep networks," Advances in Neural Information Processing Systems, vol. 19, pp. 153–160, 2007.

[21] Smolensky, "Information processing in dynamical systems: foundations of harmony theory," Tech. Rep., DTIC Document, 1986.

[22] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002. doi: 10.1162/089976602760128018

[23] G. E. Hinton, Dayan, B. J. Frey, and R. M. Neal, "The 'wake-sleep' algorithm for unsupervised neural networks," Science, vol. 268, no. 5214, pp. 1158–1161, 1995. doi: 10.1126/science.7761831

[24] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006. doi: 10.1126/science.1127647

[25] Vincent, Larochelle, Lajoie, and Manzagol, "Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, pp. 3371–3408, 2010.

[26] Salakhutdinov and Hinton, "Semantic hashing," International Journal of Approximate Reasoning, vol. 50, no. 7, pp. 969–978, 2009. doi: 10.1016/j.ijar.2008.11.006

[27] Krizhevsky and G. E. Hinton, "Using very deep autoencoders for content-based image retrieval," in Proceedings of the 19th European Symposium on Artificial Neural Networks (ESANN '11), Bruges, Belgium, April 2011.

[28] Kober, J. A. Bagnell, and Peters, "Reinforcement learning in robotics: a survey," International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013. doi: 10.1177/0278364913495721

[29] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, vol. 1, MIT Press, Cambridge, Mass, USA, 1998.

[30] Mnih, Kavukcuoglu, Silver, Graves, Antonoglou, Wierstra, and Riedmiller, "Playing Atari with deep reinforcement learning," arXiv preprint, https://arxiv.org/abs/1312.5602.

[31] Mnih, Kavukcuoglu, Silver, A. A. Rusu, Veness, M. G. Bellemare, Graves, Riedmiller, A. K. Fidjeland, Ostrovski, Petersen, Beattie, Sadik, Antonoglou, King, Kumaran, Wierstra, Legg, and Hassabis, "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015. doi: 10.1038/nature14236

[32] Lillicrap, Sutskever, and Levine, "Continuous deep Q-learning with model-based acceleration," in Proceedings of the 33rd International Conference on Machine Learning, pp. 2829–2838, New York, NY, USA, June 2016. Preprint: https://arxiv.org/abs/1603.00748

[33] T. P. Lillicrap, J. J. Hunt, Pritzel, Heess, Erez, Tassa, Silver, and Wierstra, "Continuous control with deep reinforcement learning," preprint, https://arxiv.org/abs/1509.02971.

[34] Levine, Finn, Darrell, and Abbeel, "End-to-end training of deep visuomotor policies," Journal of Machine Learning Research, vol. 17, no. 39, pp. 1–40, 2016. Preprint: https://arxiv.org/abs/1504.00702

[35] Zhang, McCarthy, Finn, Levine, and Abbeel, "Learning deep neural network policies with continuous memory states," in Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA 2016), pp. 520–527, May 2016. doi: 10.1109/ICRA.2016.7487174

[36] Zhang, Kahn, Levine, and Abbeel, "Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search," in Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA 2016), pp. 528–535, May 2016. doi: 10.1109/ICRA.2016.7487175

[37] Shah, Khawad, and K. M. Krishna, "DeepFly: towards complete autonomous navigation of MAVs with monocular camera," in Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing (ICVGIP '16), pp. 59:1–59:8, New York, NY, USA, 2016.

[38] Sadeghi and Levine, "Real single-image flight without a single real image," preprint, https://arxiv.org/pdf/1611.04201.pdf.

[39] D. K. Kim and Chen, "Deep neural network for real-time autonomous indoor navigation," preprint, https://arxiv.org/abs/1511.04668.

[40] Giusti, Guzzi, D. C. Ciresan, F.-L. He, J. P. Rodriguez, Fontana, Faessler, Forster, Schmidhuber, G. D. Caro, Scaramuzza, and L. M. Gambardella, "A machine learning approach to visual perception of forest trails for mobile robots," IEEE Robotics and Automation Letters, vol. 1, no. 2, pp. 661–667, 2016. doi: 10.1109/LRA.2015.2509024

[41] Kelchtermans and Tuytelaars, "How hard is it to cross the room? Training (recurrent) neural networks to steer a UAV," preprint, https://arxiv.org/abs/1702.07600.

[42] Girshick, Donahue, Darrell, and Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR '14), pp. 580–587, Columbus, Ohio, USA, June 2014. doi: 10.1109/cvpr.2014.81

[43] Girshick, "Fast R-CNN," in Proceedings of the 15th IEEE International Conference on Computer Vision (ICCV '15), pp. 1440–1448, December 2015. doi: 10.1109/iccv.2015.169

[44] Ren, Girshick, and Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems, vol. 28, pp. 91–99, 2015.

[45] Lee, Wang, Crandall, Šabanovic, and Fox, "Real-time, cloud-based object detection for unmanned aerial vehicles," in Proceedings of the 1st IEEE International Conference on Robotic Computing (IRC), pp. 36–43, Taichung, Taiwan, April 2017. doi: 10.1109/IRC.2017.77

[46] Redmon, Divvala, Girshick, and Farhadi, "You only look once: unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, 2016. Preprint: https://arxiv.org/abs/1506.02640

[47] Redmon and Farhadi, "YOLO9000: better, faster, stronger," preprint, https://arxiv.org/abs/1612.08242.

[48] Liu, Anguelov, Erhan, Szegedy, Reed, C.-Y. Fu, and A. C. Berg, "SSD: single shot multibox detector," in Proceedings of the European Conference on Computer Vision, pp. 21–37, Springer, 2016.

[49] Cracknell, "Deep learning based oil palm tree detection and counting for high-resolution remote sensing images," Remote Sensing, vol. 9, no. 1, article 22, 2017. doi: 10.3390/rs9010022

[50] S. W. Chen, S. S. Shivakumar, Dcunha, Das, Okon, C. J. Taylor, and Kumar, "Counting apples and oranges with deep learning: a data-driven approach," IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 781–788, 2017. doi: 10.1109/LRA.2017.2651944

[51] Zhou, Lapedriza, Xiao, Torralba, and Oliva, "Learning deep features for scene recognition using places database," in Proceedings of the 28th Annual Conference on Neural Information Processing Systems (NIPS 2014), pp. 487–495, December 2014.

[52] O. A. B. Penatti, Nogueira, and J. A. Dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2015), pp. 44–51, June 2015. doi: 10.1109/CVPRW.2015.7301382

[53] G.-S. Xia and Zhang, "Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery," Remote Sensing, vol. 7, no. 11, pp. 14680–14707, 2015. doi: 10.3390/rs71114680

[54] Gangopadhyay, S. M. Tripathi, Jindal, and Raman, "SA-CNN: dynamic scene classification using convolutional neural networks," preprint, https://arxiv.org/abs/1502.05243.

[55] Hung and Sukkarieh, "Feature learning based approach for weed classification using high resolution aerial images from a digital camera mounted on a UAV," Remote Sensing, vol. 6, no. 12, pp. 12037–12054, 2014. doi: 10.3390/rs61212037

[56] Rebetez, H. F. Satizábal, and Mota, "Augmenting a convolutional neural network with local histograms - a case study in crop classification from high-resolution UAV imagery," in Proceedings of the European Symposium on Artificial Neural Networks, 2016.

[57] Delmerico, Mueggler, Nitsch, and Scaramuzza, "Active autonomous aerial exploration for ground robot path planning," IEEE Robotics and Automation Letters, vol. 2, no. 2, pp. 664–671, 2017. doi: 10.1109/LRA.2017.2651163

[58] Delmerico, Giusti, Mueggler, L. M. Gambardella, and Scaramuzza, "'On-the-spot training' for terrain classification in autonomous air-ground collaborative teams," in Proceedings of the International Symposium on Experimental Robotics (ISER), EPFL-CONF-221506, 2016.

[59] Taisho, Enfu, Kanji, and Naotoshi, "Mining visual experience for fast cross-view UAV localization," in Proceedings of the 8th Annual IEEE/SICE International Symposium on System Integration (SII 2015), pp. 375–380, December 2015. doi: 10.1109/SII.2015.7404949

[60] T.-Y. Lin, Cui, Belongie, and Hays, "Learning deep representations for ground-to-aerial geolocalization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015), pp. 5007–5015, June 2015. doi: 10.1109/CVPR.2015.7299135

[61] Aznar, Pujol, and Rizo, "Visual navigation for UAV with map references using ConvNets," in Advances in Artificial Intelligence, vol. 9868 of Lecture Notes in Computer Science, pp. 13–22, Springer, 2016. doi: 10.1007/978-3-319-44636-3_2

[62] G. J. Mendis, Randeny, Wei, and Madanayake, "Deep learning based Doppler radar for micro UAS detection and classification," in Proceedings of MILCOM 2016 - 2016 IEEE Military Communications Conference, pp. 924–929, Baltimore, Md, USA, November 2016. doi: 10.1109/MILCOM.2016.7795448

[63] Zhang, Kahn, Levine, and Abbeel, "Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search," in Proceedings of the 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 528–535, Stockholm, Sweden, May 2016. doi: 10.1109/ICRA.2016.7487175

[64] Morito, Sugiyama, Kojima, and Nakadai, "Partially shared deep neural network in sound source separation and identification using a UAV-embedded microphone array," in Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2016), pp. 1299–1304, October 2016. doi: 10.1109/IROS.2016.7759215

[65] Jeon, J.-W. Shin, Y.-J. Lee, W.-H. Kim, Kwon, and H.-Y. Yang, "Empirical study of drone sound detection in real-life environment with deep neural networks," preprint, https://arxiv.org/abs/1701.05779.

[66] Maturana and Scherer, "3D convolutional neural networks for landing zone detection from LiDAR," in Proceedings of the IEEE International Conference on Robotics and Automation (ICRA '15), pp. 3471–3478, IEEE, Washington, DC, USA, May 2015. doi: 10.1109/icra.2015.7139679

[67] LeCun, B. E. Boser, J. S. Denker, Henderson, R. E. Howard, W. E. Hubbard, L. D. Jackel, and D. S. Touretzky, "Handwritten digit recognition with a back-propagation network," Advances in Neural Information Processing Systems, vol. 2, pp. 396–404, 1990.

[68] Ghaderi and Athitsos, "Selective unsupervised feature learning with convolutional neural network (S-CNN)," in Proceedings of the 2016 23rd International Conference on Pattern Recognition (ICPR), pp. 2486–2490, December 2016. doi: 10.1109/ICPR.2016.7900009

[69] Krizhevsky, Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS '12), pp. 1097–1105, Lake Tahoe, Nev, USA, December 2012.

[70] Kim, Jung, Koo, J.-U. Shin, and Myung, "Development of a UAV-type jellyfish monitoring system using deep learning," in Proceedings of the 12th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI 2015), pp. 495–497, October 2015. doi: 10.1109/URAI.2015.7358813

[71] N. V. Kim and M. A. Chervonenkis, "Situation control of unmanned aerial vehicles for road traffic monitoring," Modern Applied Science, vol. 9, no. 5, pp. 1–13, 2015. doi: 10.5539/mas.v9n5p1

[72] Bejiga, Zeggada, Nouffidj, and Melgani, "A convolutional neural network approach for assisting avalanche search and rescue operations with UAV imagery," Remote Sensing, vol. 9, no. 2, article 100, 2017. doi: 10.3390/rs9020100

[73] Sawarkar, Chaudhari, Chavan, Zope, Budale, and Kazi, "HMD vision-based teleoperating UGV and UAV for hostile environment using deep learning," CoRR abs/1609.04147, http://arxiv.org/abs/1609.04147.

[74] Szegedy, Liu, Jia, Sermanet, Reed, Anguelov, Erhan, Vanhoucke, and Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '15), pp. 1–9, Boston, Mass, USA, June 2015. doi: 10.1109/cvpr.2015.7298594

[75] The Technion – Israel Institute of Technology, "Technion aerial systems 2016," Journal Paper for AUVSI Student UAS Competition, 2016.

[76] Santana, Correia, Mendonça, Alves, and Barata, "Tracking natural trails with swarm-based visual saliency," Journal of Field Robotics, vol. 30, no. 1, pp. 64–86, 2013. doi: 10.1002/rob.21423