Artificial neural networks have been used extensively as trainable models for solving pattern recognition tasks. However, training a complex neural network on a very large training data set requires excessively long training times. In this correspondence, a new fast Linear Adaptive Skipping Training (LAST) algorithm for training artificial neural networks (ANN) is proposed. The core idea of this paper is to improve the training speed of an ANN by presenting only those input samples that were not classified correctly in the previous epoch, thereby dynamically reducing the number of input samples presented to the network at each epoch without affecting the network's accuracy. Shrinking the effective training set in this way reduces the training time and hence improves the training speed. The LAST algorithm also determines how many epochs a particular input sample has to skip, depending on the successful classification of that input sample. LAST can be incorporated into any supervised training algorithm. Experimental results show that the training speed attained by the LAST algorithm is considerably higher than that of other conventional training algorithms.
An artificial neural network (ANN) is a nonlinear information processing model that has been successfully applied to supervised pattern recognition tasks [
Speeding up ANN training remains an active focus of neural network research. Many studies have explored different improvements, such as estimating optimal initial weights [
All of the previously mentioned efforts focus on speeding up training by reducing the total number of epochs or by converging more quickly. However, every such technique still presents all the input samples in the training data set to the network at every single epoch. When a large amount of high-dimensional training data must be classified, this slows training down considerably. In fact, correctly classified input samples do not contribute to the weight update, since the error is driven by the misclassification rate. The intention of this research is therefore to provide a simple new algorithm for training ANNs quickly. The core idea of LAST is that when an input pattern is classified correctly by the network, that particular pattern is not presented again for a number of subsequent epochs.
The remainder of this paper is organized as follows. Section
The LAST algorithm, incorporated into a prototypical multilayer feedforward neural network architecture, is sketched in Figure
LAST incorporated in neural network architecture.
The network is furnished with
The network parameter symbols employed in this algorithm are defined here. Let
Since the network is fully connected, each node in a layer is linked to every node in the next layer. Let
In the BPN algorithm, each output unit compares its computed activation
The core idea of LAST is that when an input pattern is classified correctly by the network, that particular pattern is not presented again for a number of subsequent epochs, as determined by the algorithm.
The working principle of the LAST algorithm, as incorporated into the BPN algorithm, is documented as follows.
Initialize the connection weights (and biases) to uniformly distributed values within the specified range, and set the learning rate,
While the stopping criterion is not attained, perform Steps
For each training input sample, iterate through Steps
Activate the network by presenting the training input vector to the nodes of the input layer.
Propagate the training input vector forward from the input layer through the subsequent layers.
Hidden layer activation: each hidden node $z_j$ sums its weighted input signals, $\mathrm{net}_j = v_{0j} + \sum_i x_i\,v_{ij}$ (where $x_i$ is the $i$th input, $v_{ij}$ the input-to-hidden weight, and $v_{0j}$ the bias), and applies the nonlinear logistic sigmoid activation function to estimate its actual output, $z_j = f(\mathrm{net}_j) = \frac{1}{1 + e^{-\mathrm{net}_j}}$.
Differentiating the aforementioned activation function yields $f'(\mathrm{net}_j) = f(\mathrm{net}_j)\,[1 - f(\mathrm{net}_j)]$.
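This compact form follows directly from differentiating the logistic function:

$$f'(x) = \frac{d}{dx}\!\left(\frac{1}{1+e^{-x}}\right) = \frac{e^{-x}}{\left(1+e^{-x}\right)^{2}} = \frac{1}{1+e^{-x}}\cdot\left(1-\frac{1}{1+e^{-x}}\right) = f(x)\,\bigl(1-f(x)\bigr),$$

which is why the derivative can be computed from the node's output alone, without evaluating any additional exponentials.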
Output layer activation: each output node $y_k$ sums its weighted input signals, $\mathrm{net}_k = w_{0k} + \sum_j z_j\,w_{jk}$ (with hidden-to-output weights $w_{jk}$ and bias $w_{0k}$), and applies the nonlinear logistic sigmoid activation function to estimate its actual output, $y_k = f(\mathrm{net}_k) = \frac{1}{1 + e^{-\mathrm{net}_k}}$.
Differentiating the aforementioned activation function yields $f'(\mathrm{net}_k) = f(\mathrm{net}_k)\,[1 - f(\mathrm{net}_k)]$.
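As a rough illustration of the two forward-propagation steps above, here is a minimal Python sketch (the names sigmoid, forward_pass, x, V, b_v, W, and b_w are ours, standing in for the paper's input, weight, and bias symbols):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid activation, f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def forward_pass(x, V, b_v, W, b_w):
    """Forward pass through a 3-layer feedforward network.

    x   : input vector, shape (n_in,)
    V   : input-to-hidden weights v_ij, shape (n_in, n_hidden)
    b_v : hidden biases v_0j, shape (n_hidden,)
    W   : hidden-to-output weights w_jk, shape (n_hidden, n_out)
    b_w : output biases w_0k, shape (n_out,)
    """
    z = sigmoid(b_v + x @ V)  # hidden net input net_j, then activation z_j
    y = sigmoid(b_w + z @ W)  # output net input net_k, then actual output y_k
    return z, y
```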
For each output unit $y_k$, compute the error term $\delta_k = (t_k - y_k)\,f'(\mathrm{net}_k)$, where $t_k$ is the corresponding target output.
For each hidden unit $z_j$, compute the error term $\delta_j = f'(\mathrm{net}_j)\sum_k \delta_k\,w_{jk}$.
For each output unit, the weights and biases are updated as follows.
The weight correction is given by the following update rule: $w_{jk}(\text{new}) = w_{jk}(\text{old}) + \alpha\,\delta_k\,z_j$, where $\alpha$ is the learning rate.
The bias correction is given by the following update rule: $w_{0k}(\text{new}) = w_{0k}(\text{old}) + \alpha\,\delta_k$.
For each hidden unit, the weights and biases are updated as follows.
The weight correction is given by the following update rule: $v_{ij}(\text{new}) = v_{ij}(\text{old}) + \alpha\,\delta_j\,x_i$.
The bias correction is given by the following update rule: $v_{0j}(\text{new}) = v_{0j}(\text{old}) + \alpha\,\delta_j$.
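Continuing the sketch, the error terms and update rules of the preceding steps look as follows in Python (backprop_update builds on forward_pass from the earlier sketch; the momentum term used later in the experiments is omitted here for brevity):

```python
def backprop_update(x, t, V, b_v, W, b_w, alpha=0.3):
    """One BPN update for a single training pair (x, t); arrays are updated in place."""
    z, y = forward_pass(x, V, b_v, W, b_w)
    delta_out = (t - y) * y * (1.0 - y)          # delta_k = (t_k - y_k) f'(net_k)
    delta_hid = z * (1.0 - z) * (W @ delta_out)  # delta_j = f'(net_j) sum_k delta_k w_jk
    W += alpha * np.outer(z, delta_out)          # w_jk += alpha * delta_k * z_j
    b_w += alpha * delta_out                     # w_0k += alpha * delta_k
    V += alpha * np.outer(x, delta_hid)          # v_ij += alpha * delta_j * x_i
    b_v += alpha * delta_hid                     # v_0j += alpha * delta_j
    return np.abs(t - y)                         # absolute error, used by LAST below
```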
Measure the difference between the target output and the actual output of each input sample.
Compare the absolute error value,
Compute the probability value for presenting the input sample in the next epoch:
Calculate how many epochs the input sample has to skip and initialize the value of the skip count. If
Construct the new probability-based training dataset to be presented in the next epoch.
Test for the stopping condition, such as reaching an acceptable mean squared error (MSE), a maximum number of elapsed epochs, or the desired accuracy.
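The skipping rule itself (the steps above on the probability value and the skip count) did not survive extraction, so the following Python sketch substitutes an assumed linear schedule consistent with the algorithm's name: a sample classified correctly k consecutive times skips the next k epochs, and a misclassification resets it to being presented every epoch. The err_threshold parameter and the schedule are our assumptions, not the paper's exact rule.

```python
def train_last(X, T, V, b_v, W, b_w, n_epochs=675, alpha=0.3, err_threshold=0.5):
    """LAST-style training loop (sketch; uses backprop_update from above)."""
    n = len(X)
    skip_until = np.zeros(n, dtype=int)  # first epoch at which each sample reappears
    streak = np.zeros(n, dtype=int)      # consecutive correct classifications
    for epoch in range(n_epochs):
        for i in range(n):
            if epoch < skip_until[i]:
                continue                 # sample is skipped this epoch
            err = backprop_update(X[i], T[i], V, b_v, W, b_w, alpha)
            if np.all(err < err_threshold):            # classified correctly
                streak[i] += 1
                skip_until[i] = epoch + 1 + streak[i]  # skip next `streak` epochs
            else:
                streak[i] = 0            # misclassified: present every epoch again
    return V, b_v, W, b_w
```

Because correctly classified samples sit out a growing number of epochs, the number of presented samples per epoch shrinks whenever the network is doing well, which is the effect visible in the experiment figures below.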
The proposed LAST algorithm has been evaluated on both two-class and multiclass classification problems. The real-world benchmark data sets used for training and testing the algorithms are the Iris, Waveform, Heart, and Breast Cancer data sets, obtained from the UCI (University of California at Irvine) Machine Learning Repository [
Summary of the data sets used in this research.
Datasets | No. of attributes | No. of classes | No. of instances |
---|---|---|---|
Iris | 4 | 3 | 150 |
Waveform | 21 | 3 | 5000 |
Heart | 13 | 2 | 270 |
Breast cancer | 31 | 2 | 569 |
The initial weights are set to uniform random values in the range −0.5 to
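A one-function sketch of this initialization (the upper bound of the range was truncated above; the symmetric value +0.5 used here is our assumption):

```python
import numpy as np

rng = np.random.default_rng()

def init_network(n_in, n_hidden, n_out, w_range=0.5):
    """Draw all weights and biases uniformly from [-w_range, +w_range].
    w_range=0.5 is assumed; only the lower bound -0.5 is stated in the text."""
    V = rng.uniform(-w_range, w_range, size=(n_in, n_hidden))
    b_v = rng.uniform(-w_range, w_range, size=n_hidden)
    W = rng.uniform(-w_range, w_range, size=(n_hidden, n_out))
    b_w = rng.uniform(-w_range, w_range, size=n_out)
    return V, b_v, W, b_w
```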
The Iris database consists of measurements of 150 flower samples. For each flower, four features are measured: sepal length and width, and petal length and width. These four features are used to classify each sample into the appropriate Iris species: Iris Setosa, Iris Versicolour, and Iris Virginica. The 150 samples are evenly distributed among the three classes. Iris Setosa is linearly separable from the other two species, whereas Iris Virginica and Iris Versicolour are not linearly separable from each other. Of the 150 flower samples, 90 are used for training and 60 for testing.
The neural network is structured with 4, 5, and 1 neurons in the input, hidden, and output layers, respectively, for training on the Iris database, with a step size of 0.3 and a momentum constant of 0.8. This network is trained for 675 epochs using both the BPN and LAST algorithms.
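With the sketches above, this configuration would be run roughly as follows (X_train and T_train are hypothetical arrays holding the 90 Iris training samples and their targets):

```python
# 4-5-1 network, step size 0.3, 675 epochs, as stated in this section.
V, b_v, W, b_w = init_network(n_in=4, n_hidden=5, n_out=1)
train_last(X_train, T_train, V, b_v, W, b_w, n_epochs=675, alpha=0.3)
```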
The number of trained input samples and the training time taken by the BPN and LAST algorithms at each epoch are plotted in Figures
IRIS dataset: epoch-wise input samples taken by BPN and LAST algorithms.
IRIS dataset: epoch-wise training time taken by BPN and LAST algorithms.
The empirical results of the BPN and LAST algorithms are presented in Table
Result comparison of BPN and LAST algorithms for the IRIS dataset.
Neural network algorithm | Network topology | Number of epochs | Total number of input samples | Training time (in sec.) | Accuracy (%) |
---|---|---|---|---|---|
BPN | 4-5-1 | 675 | 60750 | 0.036189 | 91.67 |
LAST | 4-5-1 | 675 | 24148 | 0.016641 | 91.67 |
The Waveform database generator data set holds measurements of 5000 wave samples, distributed roughly equally (about 33% each) among three wave families [
A 3-layer feedforward neural network, containing 21, 10, and 1 units in the input, hidden, and output layers, respectively, is trained with the learning rate parameters
The number of trained input samples and the training time taken by the BPN and LAST algorithms at each epoch are plotted in Figures
Waveform dataset: epoch-wise input samples taken by BPN and LAST algorithms.
Waveform dataset: epoch-wise training time taken by BPN and LAST algorithms.
The empirical results of the BPN and LAST algorithms are presented in Table
Result comparison of BPN and LAST algorithms for the waveform dataset.
Neural network algorithm | Network topology | Number of epochs | Total number of input samples | Training time (in sec.) | Accuracy (%) |
---|---|---|---|---|---|
BPN | 21-10-1 | 815 | 3912000 | 0.005806 | 97.00 |
LAST | 21-10-1 | 815 | 2035031 | 0.000328 | 97.50 |
The Statlog Heart disease database consists of 270 patient samples. Each patient is described by 13 attributes, which are used to detect the presence or absence of heart disease.
The neural network is structured with 13, 5, and 1 neurons in the input, hidden, and output layers, respectively, for training on the Heart disease database, with a step size of 0.3 and a momentum constant of 0.9. This network is trained for 964 epochs using both the BPN and LAST algorithms.
The number of trained input samples and the training time taken by the BPN and LAST algorithms at each epoch are plotted in Figures
Heart dataset: epoch-wise input samples taken by LAST and BPN algorithms.
Heart dataset: epoch-wise training time taken by BPN and LAST algorithms.
The empirical results of the BPN and LAST algorithms are presented in Table
Result comparison of BPN and LAST algorithms for the Heart dataset.
Neural network algorithm | Network topology | Number of epochs | Total number of input samples | Training time (in sec.) | Accuracy (%) |
---|---|---|---|---|---|
BPN | 13-5-1 | 964 | 212080 | 0.024903 | 90.00 |
LAST | 13-5-1 | 964 | 98976 | 0.004097 | 92.00 |
The Wisconsin Breast Cancer Diagnosis data set contains 569 breast-tissue samples, of which 357 are diagnosed as benign and 212 as malignant. Each patient's characteristics are recorded using 32 numerical features.
The neural network is structured with 31, 15, and 1 neurons in the input, hidden, and output layers, respectively, for training on the Breast Cancer database, with a step size of 0.3 and a momentum constant of 0.9. This network is trained for 619 epochs using both the BPN and LAST algorithms.
The number of trained input samples and the training time taken by the BPN and LAST algorithms at each epoch are plotted in Figures
Breast Cancer dataset: epoch-wise input samples taken by LAST and BPN algorithms.
Breast Cancer dataset: epoch-wise training time taken by LAST and BPN algorithms.
The empirical results of the BPN and LAST algorithms are presented in Table
Result comparison of BPN and LAST algorithms for the Breast Cancer dataset.
Neural network algorithm | Network topology | Number of epochs | Total number of input samples | Training time (in sec.) | Accuracy (%) |
---|---|---|---|---|---|
BPN | 31-15-1 | 619 | 247600 | 0.023736 | 95.27 |
LAST | 31-15-1 | 619 | 39388 | 0.013930 | 95.27 |
From Tables
Total training time taken by the BPN and LAST algorithms.
A simple new Linear Adaptive Skipping Training algorithm for training multilayer feedforward neural networks (MFNN) has been systematically investigated in order to speed up the training phase. The empirical results demonstrate that the LAST algorithm dynamically reduces the total number of training input samples presented to the MFNN at every cycle. Decreasing the size of the training set in this way reduces the training time, thereby increasing the training speed. The proposed LAST algorithm proved faster than the standard BPN algorithm in training MFNN, and it can be incorporated into any supervised training algorithm for real-world problems.
Both algorithms were implemented on a machine with the following configuration: an Intel Core i5-3210M processor with a CPU speed of 2.50 GHz and 4 GB of RAM. The MATLAB version used for the implementation is R2010b.