Deep learning is a subfield of machine learning that aims to learn a hierarchy of features from input data. In recent years, researchers have intensively investigated deep learning algorithms for solving challenging problems in many areas, such as image classification, speech recognition, signal processing, and natural language processing. In this study, we not only review typical deep learning algorithms in computer vision and signal processing but also provide detailed information on how to apply deep learning to specific areas such as road crack detection, fault diagnosis, and human activity detection. In addition, this study discusses the challenges of designing and training deep neural networks.
Deep learning methods are a group of machine learning methods that learn features hierarchically, from lower levels to higher levels, by building a deep architecture. Because they automatically learn features at multiple levels of abstraction, deep learning methods enable a system to learn complex mapping functions directly from data.
The most characteristic feature of deep learning methods is that their models all have deep architectures, that is, networks with multiple hidden layers. In contrast, a shallow architecture has only one or two hidden layers. Deep architectures are loosely inspired by the mammalian brain. When given an input percept, the mammalian brain processes it using different areas of the cortex that abstract different levels of features. Researchers usually describe such concepts in hierarchical ways, with many levels of abstraction. Furthermore, mammalian brains also seem to process information through many stages of transformation and representation. A clear example is the primate visual system, which processes information in a sequence of stages: edge detection, primitive shapes, and then more complex visual shapes.
Inspired by the deep architecture of the mammalian brain, researchers investigated deep neural networks for two decades but found no effective training methods before 2006: networks with one or two hidden layers produced good experimental results, while networks with more hidden layers did not. In 2006, Hinton et al. proposed deep belief networks (DBNs) [
Many other deep architectures, such as autoencoders, deep convolutional neural networks, and recurrent neural networks, have been successfully applied in various areas. Regression [
This survey provides an overview of several deep learning algorithms and their emerging applications in several specific areas, featuring face recognition, road crack detection, fault diagnosis, and fall detection. As a complement to existing review papers [
Deep learning algorithms have been extensively studied in recent years. As a consequence, there are a large number of related approaches. Generally speaking, these algorithms can be grouped into two categories based on their architectures: restricted Boltzmann machines (RBMs) and convolutional neural networks (CNNs). In the following sections, we will briefly review these deep learning methods and their developments.
This section introduces how to build and train RBM-based deep neural networks (DNNs). Building and training a DNN involves two steps. First, a deep belief network (DBN) is built by stacking restricted Boltzmann machines (RBMs), and unlabeled data is fed in to pretrain the DBN; the pretrained DBN provides initial parameters for the deep neural network. In the second step, labeled data is used to train the DNN with back-propagation. After these two steps, a trained DNN is obtained. This section is organized as follows. Section
An RBM is an energy-based probabilistic generative model [
Restricted Boltzmann machine.
As a result of the lack of hidden-hidden and input-input interactions, the energy function of an RBM takes the simple bilinear form
\[
E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i w_{ij} h_j,
\]
where \(v_i\) and \(h_j\) denote the states of the visible and hidden units, \(a_i\) and \(b_j\) are their biases, and \(w_{ij}\) is the weight between \(v_i\) and \(h_j\).
We can obtain a tractable expression for the conditional probability of the hidden units given the visible units: because there are no hidden-hidden connections, the hidden units are conditionally independent, so \(P(h \mid v) = \prod_j P(h_j \mid v)\).
For a binary RBM, where \(v_i, h_j \in \{0, 1\}\), this gives
\[
P(h_j = 1 \mid v) = \sigma\Bigl(b_j + \sum_i v_i w_{ij}\Bigr),
\]
where \(\sigma(x) = 1/(1 + e^{-x})\) is the logistic sigmoid function.
Because the visible and hidden units play symmetric roles in the energy function, the visible units are likewise conditionally independent given the hidden units, with
\[
P(v_i = 1 \mid h) = \sigma\Bigl(a_i + \sum_j w_{ij} h_j\Bigr).
\]
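The conditional probabilities above are all that is needed to train an RBM with one-step contrastive divergence (CD-1), the standard pretraining procedure for DBNs. The following is a minimal NumPy sketch, not the authors' implementation; the layer sizes, learning rate, and toy data are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BinaryRBM:
    """Minimal binary RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_vis, n_hid, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.1, size=(n_vis, n_hid))
        self.a = np.zeros(n_vis)   # visible biases a_i
        self.b = np.zeros(n_hid)   # hidden biases b_j

    def p_h_given_v(self, v):
        # P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i w_ij)
        return sigmoid(self.b + v @ self.W)

    def p_v_given_h(self, h):
        # P(v_i = 1 | h) = sigmoid(a_i + sum_j w_ij h_j)
        return sigmoid(self.a + h @ self.W.T)

    def sample(self, p):
        # Draw binary states from the given Bernoulli probabilities.
        return (self.rng.random(p.shape) < p).astype(float)

    def cd1(self, v0, lr=0.1):
        """One CD-1 parameter update on a batch of visible vectors v0."""
        ph0 = self.p_h_given_v(v0)    # positive phase
        h0 = self.sample(ph0)
        pv1 = self.p_v_given_h(h0)    # one Gibbs step back to the visible layer
        ph1 = self.p_h_given_v(pv1)   # negative phase
        # Approximate gradient: <v h>_data - <v h>_reconstruction
        self.W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        self.a += lr * (v0 - pv1).mean(axis=0)
        self.b += lr * (ph0 - ph1).mean(axis=0)

# Train on a toy data set of binary patterns.
rng = np.random.default_rng(1)
data = (rng.random((64, 6)) < 0.5).astype(float)
rbm = BinaryRBM(n_vis=6, n_hid=3)
for _ in range(100):
    rbm.cd1(data)
```

After training, `rbm.p_h_given_v` gives the hidden activations that would be passed upward when stacking RBMs into a DBN.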
Although binary RBMs can achieve good performance on discrete inputs, their structure limits their ability to handle continuous-valued inputs. Thus, to achieve better performance on continuous-valued inputs, Gaussian RBMs are used for the visible layer [
Hinton et al. [
Deep belief network structure.
As Figure
After the unsupervised pretraining step for DBN, the next step is to use parameters from DBN to initialize the DNN and do supervised training for DNN using back-propagation. The parameters of the
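The supervised fine-tuning step can be sketched as follows. This is an illustrative NumPy implementation of back-propagation with a cross-entropy loss, not the authors' code; the layer sizes and toy data are arbitrary, and the randomly initialized weights stand in for the values that would be copied from the pretrained DBN.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)

# Layer sizes of the DNN (arbitrary for this toy example). In the full
# pipeline, Ws and bs would be initialized from the pretrained DBN;
# random values stand in for them here.
sizes = [8, 6, 4, 3]                      # input, two hidden layers, classes
Ws = [rng.normal(0.0, 0.1, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]

def backprop_step(X, y, lr=0.3):
    """One supervised back-propagation update with cross-entropy loss."""
    # Forward pass: sigmoid hidden layers, softmax output layer.
    acts = [X]
    for W, b in zip(Ws[:-1], bs[:-1]):
        acts.append(sigmoid(acts[-1] @ W + b))
    probs = softmax(acts[-1] @ Ws[-1] + bs[-1])
    # Backward pass: propagate the error signal layer by layer.
    delta = (probs - np.eye(sizes[-1])[y]) / len(X)   # dL/dz at the output
    for i in range(len(Ws) - 1, -1, -1):
        grad_W = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:   # error for the layer below, using the sigmoid derivative
            delta = (delta @ Ws[i].T) * acts[i] * (1.0 - acts[i])
        Ws[i] -= lr * grad_W
        bs[i] -= lr * grad_b

# Labeled toy data for the supervised fine-tuning step.
X = rng.normal(size=(32, 8))
y = rng.integers(0, 3, size=32)
for _ in range(500):
    backprop_step(X, y)
```

Starting this procedure from DBN-pretrained weights, instead of random ones, is precisely what the unsupervised step contributes.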
The convolutional neural network is one of the most powerful classes of deep neural networks for image processing tasks. It is highly effective and commonly used in computer vision applications [
The architecture of a convolutional neural network.
As Figure
Digital image representation and convolution matrix.
An example of convolution calculation.
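The convolution calculation illustrated above can be reproduced with a short NumPy sketch. The image and kernel values here are arbitrary examples; note that, like most CNN implementations, the code slides the kernel without flipping it (strictly speaking, a cross-correlation).

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution of a grayscale image with a small kernel."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Elementwise product of the kernel with the image patch, summed.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)
kernel = np.array([[1, 0],
                   [0, 1]], dtype=float)   # sums each 2x2 patch's diagonal

print(conv2d_valid(image, kernel))
# [[ 6.  8.]
#  [12. 14.]]
```

A 3x3 input convolved with a 2x2 kernel yields a 2x2 feature map, matching the "valid" output size formula (3 - 2 + 1).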
The subsampling layer is an important part of a convolutional neural network. It mainly reduces the size of the input feature maps in order to give the network more invariance and robustness. The most widely used subsampling method in image processing tasks is max pooling, so the subsampling layer is frequently called the max pooling layer. The max pooling method is shown in Figure
An example of the subsampling layer.
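Max pooling as described above keeps only the largest value in each non-overlapping window. A minimal NumPy sketch (the 2x2 window and the feature-map values are arbitrary):

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with a size x size window."""
    h, w = feature_map.shape
    h2, w2 = h // size, w // size
    trimmed = feature_map[:h2 * size, :w2 * size]   # drop any ragged edge
    # Group pixels into size x size blocks, then take the max of each block.
    return trimmed.reshape(h2, size, w2, size).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 9, 2],
               [3, 2, 4, 6]], dtype=float)

print(max_pool(fm))
# [[4. 5.]
#  [3. 9.]]
```

Each 2x2 block of the 4x4 feature map collapses to its maximum, halving both spatial dimensions.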
Full connection layers are similar to traditional feed-forward neural layers. They map the outputs of the preceding layers into a vector of predefined length. This vector can then be fed to a classifier to assign categories or used as a representation vector for further processing.
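The flattening and fully connected mapping can be sketched in a few lines of NumPy. The feature-map shape, the number of output classes, and the random weights are all illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the output of the last pooling layer: 4 feature maps of 3x3.
feature_maps = rng.normal(size=(4, 3, 3))

# Flatten to a single vector, then apply a fully connected layer that maps
# it to a fixed-length output (here, scores for 10 hypothetical classes).
x = feature_maps.reshape(-1)               # length 4 * 3 * 3 = 36
W = rng.normal(0.0, 0.1, size=(36, 10))    # fully connected weights
b = np.zeros(10)
scores = x @ W + b

print(scores.shape)  # (10,)
```

The resulting fixed-length vector is what a softmax classifier, or any downstream model, consumes.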
Compared to conventional machine learning methods, the advantage of deep learning is that it can build deep architectures that learn multiscale abstract features. Unfortunately, the large number of parameters in deep architectures may lead to overfitting.
The key idea of data augmentation is to generate additional data without introducing extra labeling costs. In general, data augmentation is achieved by deforming the existing samples. Mirroring, scaling, and rotation are the most common data augmentation methods [
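The three deformations named above can each be expressed in a line or two of NumPy. This is an illustrative sketch on a random stand-in image; the scaling variant uses a central crop with nearest-neighbour resampling as one simple choice among many.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(8, 8))   # stand-in for a training image

# Mirroring: flip left-right (np.flipud would flip vertically).
mirrored = np.fliplr(image)

# Rotation: multiples of 90 degrees need no interpolation.
rotated = np.rot90(image)

# Scaling: crop a central region, then resize back to the original size
# by nearest-neighbour indexing.
crop = image[1:7, 1:7]                       # 6x6 central crop
idx = np.arange(8) * crop.shape[0] // 8      # map 8 output rows to 6 source rows
scaled = crop[np.ix_(idx, idx)]              # back to 8x8

augmented = [mirrored, rotated, scaled]
print([a.shape for a in augmented])  # [(8, 8), (8, 8), (8, 8)]
```

Each deformed copy keeps the label of the original image, which is what makes augmentation label-cost free.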
Training a deep learning architecture is a time-consuming and nontrivial task. On one hand, it is difficult to obtain enough well-labeled data to train a deep learning architecture in real applications, even though data augmentation can help us obtain more training data.
For visual tasks, when it is hard to obtain sufficient data, a recommended approach is to start from a CNN pretrained on natural images (e.g., ImageNet) and then fine-tune it on the task-specific data set [
On the other hand, even with sufficient data, the deep learning architecture contains hundreds of thousands of parameters that must be initialized. Erhan et al. provided evidence that the pretraining step helps train deep architectures such as deep belief networks and stacked autoencoders [
Deep learning has been widely applied in various fields, such as computer vision [
As is well known, convolutional neural networks are very powerful tools for image recognition and classification. Different CNN variants are often evaluated on the well-known ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) data set, where they have achieved state-of-the-art performance in recent years [
Face recognition has been one of the most important computer vision tasks since the 1970s [
Figure
Logic flow of CNN-based face recognition [
Outline of DeepFace architecture [
Nowadays, face recognition in mobile computing is a very attractive topic [
Experiment results on LFW benchmark [
Technique | Accuracy |
---|---|
Human-level (cropped) [ | 0.9753 |
FaceNet [ | |
DeepFace-ensemble [ | |
OpenFace [ | |
Automatic detection of pavement cracks is an important task in transportation maintenance for ensuring driving safety. Inspired by recent successes in applying deep learning to computer vision and medical problems, a deep learning based method for crack detection has been proposed [
Illustration of the architecture of the proposed ConvNet [
The Receiver Operating Characteristic (ROC) curves of the proposed method, the SVM, and the Boosting method are shown in Figure
ROC curves [
Probability maps.
Scene 1
Scene 2
Scene 3
For each scene, each row shows the original image with a crack, the ground truth, and the probability maps generated by the SVM, the Boosting method, and the ConvNet. Green and blue pixels denote crack and noncrack, respectively, and higher brightness means higher confidence. The SVM cannot distinguish the crack from the background, and some of the cracks have been misclassified. Compared to the SVM, the Boosting method detects the cracks with higher accuracy. However, some of the background patches are classified as cracks, resulting in isolated green parts in Figure
Plant faults may cause abnormal operations, emergency shutdowns, equipment damage, or even casualties. With the increasing complexity of modern plants, it is difficult even for experienced operators to diagnose faults quickly and accurately. Thus, designing an intelligent fault detection and diagnosis system to aid human operators is a critical task in process engineering. Data-driven methods for fault diagnosis have become very popular in recent years, since they leverage powerful machine learning algorithms. Conventional supervised learning algorithms used for fault diagnosis include Artificial Neural Networks [
TEP is a simulation model that simulates a real industrial process. The model was first created by Eastman Chemical Company [
Tennessee Eastman Process [
For each simulation run, the simulation starts without faults, and the faults are introduced at sample 1. Each run collects a total of 1000 samples. Each single fault type has 5 independent simulation runs. The Tennessee Eastman Process has 20 different predefined faults, but faults 3, 9, and 15 are excluded from fault diagnosis because they have no effect, or only a subtle effect, on all the sensors [
Schematic diagram of HDNN [
Correct classification rate of SNN, DOHANN [
Human activity detection has drawn much attention from researchers due to high demands for security, law enforcement, and health care [
Fall detection is one of the very important human activity detection scenarios for researchers, since falls are a main cause of both fatal and nonfatal injuries for the elderly. Khan and Taati [
Block diagram of the deep learning based fall detector [
Though deep learning techniques achieve promising performance in multiple fields, several big challenges remain, as research articles indicate. These challenges are described as follows.
Training a deep neural network usually requires large amounts of data, as a larger training set can prevent the model from overfitting. Limited training data may severely affect the learning ability of a deep neural network. Unfortunately, many applications lack sufficient labeled data to train a DNN. Thus, how to train DNNs effectively and efficiently with limited data has become a hot topic.
Recently, two possible solutions have drawn attention from researchers. One is to generate new training data from the original training data using multiple data augmentation methods. Traditional ones include rotation, scaling, and cropping. In addition to these, Wu et al. [
In the early years, training a deep neural network was very time-consuming: it required large amounts of computational resources and was not suitable for real-time applications. Nowadays, GPUs are routinely used to accelerate the training of large DNNs through parallel computing. Thus, it is important to make the most of GPU computing ability when training DNNs. He and Sun [
Though deep learning algorithms achieve promising results on many tasks, the underlying theory is still not well understood. Many questions remain to be answered. For instance, which architecture performs better than others on a given task? How many layers, and how many nodes per layer, should a DNN have? Besides, there are several hyperparameters, such as the learning rate, the dropout rate, and the strength of the regularizer, that need to be tuned with specific knowledge.
Several approaches have been developed to help researchers gain a better understanding of DNNs. Zeiler and Fergus [
Although there has been progress in understanding the theory of deep learning, there is still much room for improvement.
This paper gives an overview of deep learning algorithms and their applications. Several classic deep learning algorithms such as restricted Boltzmann machines, deep belief networks, and convolutional neural networks are introduced. In addition to deep learning algorithms, their applications are reviewed and compared with other machine learning methods. Though deep neural networks achieve good performance on many tasks, they still have many properties that need to be investigated and justified. We discussed these challenges and pointed out several new trends in understanding and developing deep neural networks.
The authors declare that there is no conflict of interests regarding the publication of this paper.