
Nowadays, deep learning has achieved remarkable results in many computer vision tasks, for which the support of big data is essential. In this paper, we propose a full stage data augmentation framework to improve the accuracy of deep convolutional neural networks, which can also play the role of an implicit model ensemble without introducing additional model training costs. Simultaneous data augmentation during the training and testing stages ensures network optimization and enhances its generalization ability. Augmentation in the two stages needs to be consistent to ensure the accurate transfer of specific domain information. Furthermore, this framework is universal for any network architecture and data augmentation strategy and can therefore be applied to a variety of deep learning based tasks. Finally, experimental results on image classification on the coarse-grained dataset CIFAR-10 (93.41%) and the fine-grained dataset CIFAR-100 (70.22%) demonstrate the effectiveness of the framework by comparison with state-of-the-art results.

Computer vision was among the first and remains among the most widely applied fields of deep learning technology. After the advent of AlexNet [

However, deep learning still has many poorly understood properties, and the theory behind it is incomplete. In particular, because deep learning models are difficult to interpret, they are difficult to improve in a targeted manner. Researchers usually need to consider both optimization and generalization. Moreover, deep CNNs driven by big data still suffer from the “overfitting” problem; that is, the neural network can perform well on the training set but cannot generalize effectively to unseen test data. On the other hand, a larger model tends to perform better [

At present, many methods have been developed to alleviate the “overfitting” problem of deep CNNs, and they can be summarized as follows:

Regularization techniques for limiting network complexity, such as L2-regularization [

Data augmentation methods for expanding sample set, such as translation [

Model ensemble for reducing dependence on a single network, for example, auxiliary classification nodes in GoogLeNet [

Some special training tricks like well-designed initialization [

In this paper, we propose the full stage (i.e., training and testing stages) data augmentation framework in deep learning for natural image classification. Data augmentation in the training process is used to ensure that the network can mine the structural information of samples and finally converge in the appropriate position, and the data augmentation in the test process can play the role of model ensemble to reduce the dependence on a single network. Augmentation in two stages needs to be consistent to ensure accurate transfer of domain information. It is worth noting that the framework is universal to any network architecture and data augmentation strategy and can therefore be applied to a variety of deep learning based tasks. We have done extensive experiments on fine-grained and coarse-grained image classification datasets, that is, CIFAR-10 and CIFAR-100 [

The remainder of the paper is organized as follows. Section

Data augmentation is an effective method to reduce the “overfitting” of deep CNNs caused by limited training samples; it approximates the data probability space by manipulating input samples, for example, via horizontal flipping, random cropping, scale transformation, and noise disturbance. In general, as long as the quantity, quality, and diversity of the data in the dataset are increased, the effectiveness of the model can be improved. Sample pairing [

In addition, there are many regularization methods at the loss layer which can also be interpreted as an implicit data augmentation, such as Dropout [

In fact, approximating the real, natural input space through data augmentation is intuitive. A more comprehensive input space allows the model to converge to the global minimum or a better local minimum. However, the “overfitting” problem of deep CNNs still exists, which prompts us to rethink the influence of data augmentation during the training and testing processes on the optimization and generalization of deep CNNs.

Given a deep CNN model

In the forward propagation stage, the output of each layer is the input of the next: the output x_l of layer l is given by x_l = f_l(W_l x_{l−1} + b_l), where x_0 is the input sample, W_l and b_l are the weights and bias of layer l, and f_l is its nonlinear activation.
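As an illustration, this layer-wise forward pass can be sketched in NumPy. The two-layer sizes and the ReLU activation below are assumptions for the sketch, not the paper's actual architecture:

```python
import numpy as np

def relu(z):
    # ReLU activation, an illustrative choice of f_l
    return np.maximum(0.0, z)

def forward(x, weights, biases):
    """Propagate x through the layers: x_l = f(W_l @ x_{l-1} + b_l)."""
    a = x
    for W, b in zip(weights, biases):
        a = relu(W @ a + b)
    return a

rng = np.random.default_rng(0)
x0 = rng.normal(size=3)                                   # input x_0
Ws = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]   # toy layer weights
bs = [np.zeros(4), np.zeros(2)]
out = forward(x0, Ws, bs)                                 # output of the last layer
```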

The overall training objective can be written as L = −(1/N) Σ_i log p(y_i | x_i) + λ Σ_l ‖W_l‖²₂. The first term is the negative log-likelihood loss and the second term is the L2-regularization of all the weights.
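A minimal sketch of this loss, assuming a softmax output layer; the regularization strength `lam` and the toy logits are illustrative values, not the paper's settings:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss(logits, labels, weights, lam=1e-4):
    """Negative log-likelihood plus L2 penalty on all weight matrices."""
    probs = softmax(logits)
    nll = -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))
    l2 = lam * sum((W ** 2).sum() for W in weights)
    return nll + l2

logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])  # toy network outputs
labels = np.array([0, 2])                                # ground-truth classes
total = loss(logits, labels, [np.ones((3, 3))])
```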

In the back propagation stage, our goal is to minimize this loss with respect to all network parameters by propagating gradients from the output layer back through the network and updating the weights with stochastic gradient descent.
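The update step can be illustrated with SGD plus momentum, in which weight decay realizes the L2 term; the hyperparameter values and the one-dimensional quadratic objective below are placeholders for illustration, not the paper's settings:

```python
import numpy as np

def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One SGD update with momentum; weight decay realizes the L2 penalty."""
    g = grad + weight_decay * w          # gradient of loss + L2 term
    velocity = momentum * velocity - lr * g
    return w + velocity, velocity

# Minimize the toy objective f(w) = (w - 3)^2, whose gradient is 2*(w - 3)
w, v = np.array(0.0), np.array(0.0)
for _ in range(200):
    w, v = sgd_step(w, 2 * (w - 3), v, weight_decay=0.0)
# w should now be close to the minimizer 3
```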

The overall flow of training and testing of deep CNNs.

From the perspective of image acquisition, an acquired image is only one of many possible observations of the underlying scene, which could equally have been observed under different spatial transformations and noise disturbances. Direct inference on the acquired image may therefore yield biased results affected by the specific transformations and noise associated with the equipment and environment. To obtain a more reliable and robust prediction, we propose a full stage data augmentation framework to alleviate the “overfitting” problem in deep CNNs.

At the first level, that is, training stage, the training samples

At the second level, that is, test stage, we use the same distributions of augmentation parameters for the convergent deep CNN. Each test image is augmented to

Then the label corresponding to the location of the largest value in the one-dimensional averaged output vector is taken as the final prediction.
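Test-stage augmentation as described here, averaging the outputs over augmented copies and taking the arg max, can be sketched as follows. The two-class toy `model` and the augmentation list are stand-ins for a trained deep CNN and the paper's augmentation strategies:

```python
import numpy as np

def tta_predict(model, image, augment_fns):
    """Average the model outputs over augmented copies, then take the arg max."""
    probs = np.mean([model(f(image)) for f in augment_fns], axis=0)
    return int(np.argmax(probs))

def model(img):
    # toy stand-in for a trained CNN: scores derived from mean intensity
    s = img.mean()
    return np.array([1 - s, s])          # two-class probability vector

augments = [lambda x: x, lambda x: np.fliplr(x)]   # identity + horizontal flip
img = np.full((4, 4), 0.8)
label = tta_predict(model, img, augments)
```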

The whole full stage data augmentation framework.

Researchers [

By reducing the reconstruction error between original sample and augmented samples, we can obtain the updated parameters of deep CNNs. We have observed the parameter distribution of a series of networks

Parameter distribution of a series of deep CNNs by projecting the augmented images back into the input space. The horizontal axis represents the normalized network parameters.

Two benchmarks, CIFAR-10 and CIFAR-100, represent coarse-grained and fine-grained natural image classification tasks, respectively, and are used to evaluate the effectiveness of the full stage data augmentation framework under different levels of difficulty. CIFAR-10 and CIFAR-100 are labeled subsets of the 80 million tiny images dataset [

Some examples of images in CIFAR-10 (first row) and CIFAR-100 (second row).

Input images of CIFAR-10 and CIFAR-100 datasets [

Two specially designed deep CNNs are constructed to complete the image classification, as shown in Figure

The structure of two specially designed deep CNNs.

Network hyperparameters including initial learning rate, batch size, dropout rate, momentum, weight decay rate, and Leaky-ReLU hyperparameter

All the training and testing procedures of deep CNNs are carried out under the Caffe deep learning framework [

In this section, we report the experimental results and discuss possible reasons behind some phenomena. To prove the validity of proposed full stage data augmentation method, fivefold cross validation results are computed for final evaluation and comparison. Furthermore, the classification results of two datasets are presented separately in terms of the fineness degree of object categories.

We first report the baseline classification results before and after using full stage data augmentation method, as shown in Figure

Classification results on CIFAR-10 before and after using full stage data augmentation method. (a)–(d) represent “no data augmentation,” “augmentation in training stage,” “augmentation in test stage,” and “full stage data augmentation,” respectively.

Then we evaluate various data augmentation methods under the proposed full stage data augmentation framework, including translation, horizontal flip, rotation, scale transformation, and noise disturbance. Translation and horizontal flip follow the settings given above. The rotation range is from −5 to +5 degrees with a step size of 1 degree. Gaussian convolution kernels with different blur radii, including 3 × 3, 5 × 5, and 7 × 7 pixels, are used for the scale transformation. The noise disturbance adds Gaussian white noise of different intensities, including 0.01, 0.05, 0.1, and 0.2, to the original image. During the testing phase, the data augmentation strategy for test samples is kept consistent with that for training samples. The experimental results are presented in Table

Classification accuracy of various data augmentation methods on CIFAR-10 under the proposed full stage data augmentation framework.

Methods | CIFAR-10 (%) |
---|---|
Translation | 91.41 |
Horizontal flip | 90.27 |
Rotation | 88.78 |
Scale transformation | 90.70 |
Noise disturbance | 87.54 |
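The five augmentation operations evaluated above can be sketched with NumPy and SciPy. The parameter values here (translation range, blur sigmas, noise level) are illustrative stand-ins for the kernel sizes and intensities reported in the text:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

def translate(img, max_px=4):
    # random shift within +/- max_px pixels in each direction
    dy, dx = rng.integers(-max_px, max_px + 1, size=2)
    return ndimage.shift(img, (dy, dx), mode='nearest')

def hflip(img):
    return np.fliplr(img)

def rotate(img, max_deg=5):
    # small random rotation, keeping the original image size
    return ndimage.rotate(img, rng.uniform(-max_deg, max_deg),
                          reshape=False, mode='nearest')

def blur(img):
    # "scale transformation" via Gaussian blur; sigmas are illustrative
    return ndimage.gaussian_filter(img, sigma=rng.choice([0.5, 1.0, 1.5]))

def add_noise(img, sigma=0.05):
    # additive Gaussian white noise, clipped back to the valid range
    return np.clip(img + rng.normal(0, sigma, img.shape), 0.0, 1.0)

img = rng.random((32, 32))   # CIFAR-sized single-channel toy image
aug = add_noise(blur(rotate(hflip(translate(img)))))
```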

Finally, we compare against state-of-the-art results from a series of algorithms to demonstrate the effectiveness of the proposed full stage data augmentation framework, as shown in Table

Comparison with state-of-the-art algorithms on CIFAR-10.

Algorithms | CIFAR-10 (%) |
---|---|
Dropout [ ] | 84.40 |
Probout [ ] | 88.65 |
NIN + dropout [ ] | 89.59 |
Maxout + dropout [ ] | 88.32 |
Stochastic pooling [ ] | 84.86 |
Probabilistic weighted pooling [ ] | 88.71 |
Our method | 93.41 |

Owing to the high similarity between different classes and the scarcity of samples in each class, fine-grained image classification is more challenging than the coarse-grained classification task. Table

Experimental results on CIFAR-100 before and after using full stage data augmentation method.

Methods | CIFAR-100 (%) |
---|---|
No data augmentation | 61.85 |
Augmentation in training stage | 66.49 |
Augmentation in testing stage | 63.84 |
Full stage augmentation | 70.22 |

Then we also observe the effectiveness of various data augmentation methods on CIFAR-100, as given in Table

Classification accuracy of various data augmentation methods on CIFAR-100 under the proposed full stage data augmentation framework.

Methods | CIFAR-100 (%) |
---|---|
Translation | 63.73 |
Horizontal flip | 64.11 |
Rotation | 62.20 |
Scale transformation | 64.83 |
Noise disturbance | 60.47 |

Table

Comparison with state-of-the-art algorithms on CIFAR-100.

Algorithms | CIFAR-100 (%) |
---|---|
Probout [ ] | 61.86 |
NIN + dropout [ ] | 64.32 |
Maxout + dropout [ ] | 61.43 |
Stochastic pooling [ ] | 57.49 |
Probabilistic weighted pooling [ ] | 62.87 |
Our method | 70.22 |

In practice, one of the obstacles to the mature application of data augmentation strategies in deep learning is that it is difficult to determine how many augmented samples are sufficient. In other words, the regularization intensity of data augmentation is usually uncertain. Although some scholars [

We set up a series of data augmentation schemes of different sizes and observe the classification performance of the network, in an attempt to establish the relationship between the expanded sample size and the network's generalization boundary. The experimental results on CIFAR-10 and CIFAR-100 are shown in Figure

The relationship between the expanded sample size and network generalization ability, in which the number of images is expanded to

Then we visualize the convolutional kernels in the first layer of deep CNN trained on CIFAR-10/100, as shown in Figure

Visualization of convolutional kernels in the first layers of the deep CNNs trained on CIFAR-10 (a) and CIFAR-100 (b), respectively.

Data augmentation in training phase inevitably affects the network convergence, including the convergence speed and the final convergence position. We observe the decrease curve of the loss function of deep CNN on the training sets of CIFAR-10 and CIFAR-100 (see Figure

The optimization of the loss function of deep CNNs trained on CIFAR-10 and CIFAR-100, respectively.

In this paper, we propose a full stage data augmentation framework to improve the accuracy of deep CNNs, which can also play the role of model ensemble without introducing additional model training costs. Simultaneous data augmentation during the training and testing stages ensures network convergence and enhances generalization on unseen test samples. Furthermore, the framework is universal for any network architecture and data augmentation strategy and can therefore be applied to various deep learning based tasks. Finally, experiments on image classification on the coarse-grained dataset CIFAR-10 and the fine-grained dataset CIFAR-100 demonstrate the effectiveness of the proposed framework in comparison with state-of-the-art algorithms. Through visualization of convolutional kernels, we have shown that ordered convolutional kernels usually indicate effective extraction of structural information, while chaotic ones indicate “overfitting” of the network. We have also analyzed the relationship between data augmentation and network generalization ability and observed the impact of the framework on the convergence of deep CNNs. The empirical results show that the data augmentation framework improves the generalization ability of deep learning models while having a negligible impact on their convergence.

As for future research directions, we plan to apply the proposed full stage data augmentation method to more complex CNN structures and some other machine learning related applications, such as liveness detection and gait and face recognition. We believe that it can help improve the performance of deep learning models in a series of tasks.

The experimental data of CIFAR-10 and CIFAR-100 used to support the findings of this study are included within the paper.

The authors declare that there are no conflicts of interest regarding the publication of this paper.

This work was supported by the National Key R&D Program of China (Grant 2018YFC0831503), the National Natural Science Foundation of China (Grant 61571275), and Fundamental Research Funds of Shandong University (Grant 2018JC040).