
In recent years, research on artificial neural networks based on fractional calculus has attracted much attention. In this paper, we propose a fractional-order deep backpropagation (BP) neural network model with

It is well known that artificial neural networks (ANNs) are an abstraction, simplification, and simulation of the human brain and reflect its basic characteristics [

Fractional calculus has a history as long as that of integer-order calculus. Over the past three hundred years, the theory of fractional calculus has made great progress [

In this paper, we propose a deep fractional-order BP neural network with

The structure of the paper is as follows: in Section

In this section, the basic knowledge of fractional calculus is introduced, including the definitions and several simple properties used in this paper.

Unlike the integer-order derivative, the fractional derivative does not yet have a single unified definition. The commonly used definitions of the fractional derivative are the Grünwald-Letnikov (G-L), Riemann-Liouville (R-L), and Caputo derivatives [

The following is the G-L definition of fractional derivative:

The R-L definition of fractional derivative is as follows:

The Caputo definition of fractional derivative is as follows:
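The three definition formulas referenced above were elided in extraction; their standard forms, for order $v>0$ with $n-1 < v \le n$, $n \in \mathbb{N}$, and lower terminal $a$, are as follows (G-L, R-L, and Caputo, respectively):

```latex
{}_{a}D_{t}^{v}f(t) = \lim_{h \to 0} \frac{1}{h^{v}} \sum_{k=0}^{\lfloor (t-a)/h \rfloor} (-1)^{k} \binom{v}{k} f(t-kh)

{}_{a}D_{t}^{v}f(t) = \frac{1}{\Gamma(n-v)} \frac{d^{n}}{dt^{n}} \int_{a}^{t} \frac{f(\tau)}{(t-\tau)^{v-n+1}}\, d\tau

{}_{a}^{C}D_{t}^{v}f(t) = \frac{1}{\Gamma(n-v)} \int_{a}^{t} \frac{f^{(n)}(\tau)}{(t-\tau)^{v-n+1}}\, d\tau
```

Note that the Caputo definition differentiates first and integrates afterwards, which is what makes its derivative of a constant vanish.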

Fractional calculus is more difficult to compute than integer-order calculus. Several mathematical properties used in this paper are given here. The fractional differential of a linear combination of differintegrable functions is as follows:
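The linearity property reads, for constants $\lambda$ and $\mu$:

```latex
D^{v}\left[\lambda f(t) + \mu g(t)\right] = \lambda\, D^{v} f(t) + \mu\, D^{v} g(t)
```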

The fractional differential of a constant function

For the G-L definition,

For the R-L definition,

And for the Caputo definition

According to (
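The elided constant-function formulas, for a constant $C$ with lower terminal $a=0$, are (under the G-L and R-L definitions, and under the Caputo definition, respectively):

```latex
{}_{0}D_{t}^{v}\,C = \frac{C\, t^{-v}}{\Gamma(1-v)}, \qquad {}_{0}^{C}D_{t}^{v}\,C = 0
```

This nonzero-versus-zero difference is why the choice of definition matters for the derivation: only the Caputo derivative of a constant vanishes, matching the classical integer-order case.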

In this section, we introduce the fractional-order deep BP neural network with

Particularly, external outputs can exist in any layer except the last one. With the square error function, the error corresponding to

The total error of the neural networks is defined as
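The elided error definitions can be sketched under the usual squared-error convention; here $d_k$ (desired output), $o_k$ (network output), and $K$ (number of samples) are my notation, not necessarily the paper's:

```latex
E_{k} = \frac{1}{2}\left\| o_{k} - d_{k} \right\|^{2}, \qquad E = \sum_{k=1}^{K} E_{k}
```

(Some formulations average over $K$ instead of summing; either choice only rescales the learning rate.)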

In order to minimize the total error of the fractional-order deep BP neural network, the weights are updated by the fractional gradient descent method with the Caputo derivative. Let

Firstly, we define that

According to (

Then the relationship between

Then, according to the chain rule and (

The updating formula is
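As an illustration only (the exact update formula was elided above): a commonly used first-order truncation of the Caputo derivative of order $0 < v < 1$ of a smooth objective with respect to a weight $w$, expanded about a point $c$, scales the ordinary gradient by $(w-c)^{1-v}/\Gamma(2-v)$. A minimal sketch under that assumption, with the function names and toy objective being my own:

```python
import numpy as np
from math import gamma

def caputo_gradient(grad, w, c=0.0, v=0.9):
    """Approximate Caputo fractional gradient of order 0 < v < 1.

    First-order truncation (an assumption, not the paper's exact form):
        dE^v/dw^v  ~  (dE/dw) * |w - c|^(1-v) / Gamma(2-v),
    where c is the lower terminal of the derivative.
    """
    return grad * np.abs(w - c) ** (1.0 - v) / gamma(2.0 - v)

def fractional_gd_step(w, grad, lr=0.1, c=0.0, v=0.9):
    """One weight update: w <- w - lr * dE^v/dw^v."""
    return w - lr * caputo_gradient(grad, w, c, v)

# Toy usage: minimize E(w) = w^2 (so dE/dw = 2w) starting from w = 1.
w = 1.0
for _ in range(50):
    w = fractional_gd_step(w, grad=2.0 * w)
```

For $v = 1$ the scaling factor reduces to 1 and the update falls back to ordinary gradient descent, which is the sanity check one expects of any fractional generalization.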

The fractional-order BP neural network can easily overfit when the training set is small.

By introducing (

The updating formula is
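The penalty term itself was elided above; assuming an $L_2$-type penalty (consistent with the regularization method named in the conclusion), with $\lambda$ a hypothetical penalty coefficient and $\eta$ the learning rate, the regularized error and update would take the form:

```latex
\tilde{E}(W) = E(W) + \frac{\lambda}{2}\left\| W \right\|_{2}^{2}, \qquad W \leftarrow W - \eta\, \frac{\partial^{v} \tilde{E}}{\partial W^{v}}
```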

In this section, the convergence of the proposed fractional-order BP neural network is analyzed. According to previous studies [

(1) The activation functions

(2) The boundedness of the weight sequence

(3) The learning rate

(4) Let

Then, the influence of

According to (

The fractional-order derivative of

From (

In this case, the output of each layer in the neural networks is still given by (

When

Without loss of generality, according to (

Since the value of

Secondly, without loss of generality, for

To allow (

With (

Equation (

In this section, the following simulations were carried out to evaluate the performance of the proposed algorithm. The simulations were performed on the MNIST handwritten digit dataset. Each digit in the dataset is a 28 × 28 image, and each image is associated with a label from 0 to 9. We divided each image into four parts (top-left, bottom-left, bottom-right, and top-right), each a 14 × 14 matrix. We vectorized each part of the image as a 196 × 1 vector and each label as a 10 × 1 vector.
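The preprocessing described above can be sketched as follows; the function name and NumPy representation are my own, but the quadrant order, shapes, and one-hot label encoding follow the text:

```python
import numpy as np

def split_and_vectorize(image, label):
    """Split a 28x28 digit image into four 14x14 quadrants
    (top-left, bottom-left, bottom-right, top-right), flatten each
    to a 196x1 column vector, and one-hot encode the label as 10x1."""
    tl, tr = image[:14, :14], image[:14, 14:]
    bl, br = image[14:, :14], image[14:, 14:]
    parts = [q.reshape(196, 1) for q in (tl, bl, br, tr)]  # order from the text
    y = np.zeros((10, 1))
    y[label, 0] = 1.0
    return parts, y
```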

In order to identify the handwritten digits in the MNIST dataset, a neural network with 8 layers was constructed. Figure

The topological structure of the neural networks.

The MNIST dataset contains 60000 training samples and 10000 testing samples. The simulations demonstrate the performance of the proposed fractional-order BP neural network with

In order to explore the relationship between the fractional order and the neural network performance, fractional-order neural networks with different orders were trained. Figure

Performances of the algorithms when v > 2.

| Size of training set | Train Accuracy | Test Accuracy | Train Accuracy | Test Accuracy |
|---|---|---|---|---|
| 10000 | 88.65% | 83.52% | 76.31% | 72.66% |
| 20000 | 91.04% | 89.52% | 78.93% | 75.97% |
| 30000 | 93.03% | 90.65% | 82.51% | 80.79% |
| 40000 | 93.20% | 90.53% | 82.47% | 80.61% |
| 50000 | 93.02% | 91.23% | 82.53% | 81.60% |
| 60000 | 93.85% | 91.71% | 87.32% | 86.05% |

The relationship between the fractional order of gradient descent method and the neural network performance.

From Figure

Table

Optimal Orders and Highest Accuracies.

| Size of training set | Optimal order of training set | Optimal order of testing set | Highest training accuracy | Highest testing accuracy |
|---|---|---|---|---|
| 10000 | 10/9 | 11/9 | 98.53% | 90.31% |
| 20000 | 10/9 | 10/9 | 98.84% | 92.34% |
| 30000 | 11/9 | 11/9 | 99.05% | 93.50% |
| 40000 | 10/9 | 11/9 | 99.18% | 93.92% |
| 50000 | 1 | 10/9 | 99.20% | 94.56% |
| 60000 | 11/9 | 11/9 | 99.20% | 95.00% |

It can also be seen that, in each case, the training accuracy is much higher than the testing accuracy, which means that the BP neural networks exhibit obvious overfitting. To avoid overfitting, the integer-order and fractional-order BP neural networks with

The performance of the proposed fractional-order BP neural networks with

Performance comparison of different type BP neural networks.

| Size of training set | IOBP Training Accuracy | IOBP Testing Accuracy | FOBP Training Accuracy | FOBP Testing Accuracy | IOBP with regularization, Training Accuracy | IOBP with regularization, Testing Accuracy | FOBP with regularization, Training Accuracy | FOBP with regularization, Testing Accuracy | Improvement relative to IOBP | Improvement relative to FOBP |
|---|---|---|---|---|---|---|---|---|---|---|
| 10000 | 98.41% | 89.87% | 98.48% | 90.31% | 98.45% | 93.35% | 98.43% | 93.95% | 4.54% | 4.03% |
| 20000 | 98.81% | 92.28% | 98.84% | 92.34% | 98.75% | 95.09% | 98.79% | 95.13% | 3.09% | 3.02% |
| 30000 | 98.95% | 93.38% | 99.05% | 93.50% | 98.92% | 95.15% | 98.88% | 95.62% | 2.40% | 2.27% |
| 40000 | 99.05% | 93.83% | 99.01% | 93.92% | 98.96% | 95.63% | 98.95% | 95.83% | 2.13% | 2.03% |
| 50000 | 99.20% | 94.55% | 99.17% | 94.56% | 99.11% | 96.08% | 99.15% | 96.45% | 2.01% | 2.00% |
| 60000 | 99.17% | 94.87% | 99.20% | 95.00% | 99.13% | 96.51% | 99.17% | 96.70% | 1.93% | 1.79% |

(IOBP: integer-order BP neural network; FOBP: fractional-order BP neural network.)

We use the following formula to calculate improvement: improvement of A compared with B = (A − B) / B. For example, for 10000 training samples, the testing-accuracy improvement of the regularized fractional-order network relative to the integer-order network is (93.95% − 89.87%) / 89.87% ≈ 4.54%.

Performance comparison in terms of testing accuracy.

In Table

Then, the stability and convergence of the proposed fractional-order BP neural networks with

Changes of total error

Changes of

In this paper, we applied fractional calculus and a regularization method to deep BP neural networks. Unlike previous studies, the proposed model has no limitation on the number of layers, and the fractional order is extended to an arbitrary real number greater than 0.

The code of this work can be downloaded at

The authors declare that they have no conflicts of interest.

This work was supported in part by the National Key R&D Program of China under Grant 2017YFB0802300, the National Natural Science Foundation of China under Grant 61671312, the Science and Technology Project of Sichuan Province of China under Grant 2018HH0070, and the Strategic Cooperation Project of Sichuan University and Luzhou City under Grant 2015CDLZ-G22.