High-Performance Machine Learning for Large-Scale Data Classification considering Class Imbalance

Currently, data classification is one of the most important ways to analyze data. However, with the development of data collection, transmission, and storage technologies, the scale of data has increased sharply. Additionally, because datasets often contain multiple classes with imbalanced distributions, the class imbalance issue has become increasingly prominent. Traditional machine learning algorithms lack the ability to handle these issues, so classification efficiency and precision may be significantly impacted. Therefore, this paper presents an improved artificial neural network enabling high-performance classification of imbalanced, large-volume data. Firstly, the Borderline-SMOTE (synthetic minority oversampling technique) algorithm is employed to balance the training dataset, which aims at improving the training of the back propagation neural network (BPNN); zero-mean, batch-normalization, and the rectified linear unit (ReLU) are then employed to optimize the input layer and hidden layers of the BPNN. Finally, an ensemble learning-based parallelization of the improved BPNN is implemented using the Hadoop framework. Positive conclusions can be drawn from the experimental results. Benefitting from Borderline-SMOTE, the imbalanced training dataset can be balanced, which improves the training performance and the classification accuracy. The improvements to the input and hidden layers also enhance training in terms of convergence. The parallelization and ensemble learning techniques enable the BPNN to perform high-performance large-scale data classification. The experimental results show the effectiveness of the presented classification algorithm.


Introduction
Classification is one of the most effective approaches for analyzing digital data in many academic and research fields, for example, medical research [1][2][3][4][5][6] and power system research [7][8][9][10][11][12]. Multiple applications including image processing, pattern recognition, and pattern matching benefit from accurate and efficient classification algorithms [1][2][3][4][5][6]. Li et al. [13] implemented medical image classification using a convolutional neural network (CNN). Their classification strategy can automatically and efficiently learn the graphical characteristics of interstitial lung disease; as a result, the strategy is able to supply accurate and efficient classification performance. Jiao et al. [14] also employed and improved CNN to classify mental load data. The experimental results show the effectiveness of their classification algorithm. Yang and Shen [15] reviewed the load classification research on electrical power systems. The paper pointed out that classification is an effective way of researching complicated load behaviors, which significantly affect power production and consumption. In order to implement natural language processing (NLP), Zhang et al. [16] employed deep learning as the underlying methodology to perform text classification. Their work is able to identify the semantic labels of texts successfully.
In recent years, a number of classification algorithms have been presented, mainly based on k-means [17], fuzzy c-means (FCM) [18], neural networks (NNs) [19], support vector machine (SVM) [20], and so on.
For example, Peng et al. [7] employed a number of clustering algorithms, including k-means, k-medoids, self-organizing maps (SOM) [21], and FCM, to recognize the patterns of electrical load data. Xu et al. [8] improved and optimized the k-means algorithm using the canopy algorithm. The authors claimed that the clustering accuracy and efficiency of their algorithm were significantly enhanced. Niazmardi et al. [22] presented an improved FCM algorithm which can achieve accurate results for clustering hyperspectral data. However, quite a number of researches [23][24][25] pointed out that flaws in the abovementioned unsupervised machine learning algorithms, for example, sensitivity to outliers, difficulty clustering nonlinearly separable classes, and empirically chosen parameters, prevent them from being used effectively. As a result, supervised machine learning algorithms, for example, neural networks, have been widely employed in classification research [26]. Gu et al. [27] optimized the learning rate and inertial factor of the traditional BPNN. Their improved self-adaptive BPNN algorithm shows great performance in terms of data modeling. Li et al. [9] combined BPNN and FCM to implement electrical load forecasting. The authors reported that the load forecasting accuracy can be significantly improved. Based on the conclusions of the researches [10,11,[23][24][25], BPNN has been proven quite suitable for classification tasks. Although BPNN has a number of advantages, it still has flaws, for example, the slow convergence in its training caused by the sensitivity to initial weights, sensitivity to the learning rate, gradient exploding, and gradient vanishing. However, several researches [28][29][30][31][32][33] suggested that batch-normalization [28] and ReLU [34] have great potential to solve these issues.
Additionally, current digital data collection benefits from the developments of smart meters, data communication systems, and data storage technologies, which results in sharp increases in data scale and data dimension. Tang et al. [35] pointed out that to overcome this issue, machine learning algorithms need to focus on dimension reduction and sample selection techniques to improve their efficiency. Farrell et al. [36] also proved that their improvements of the maximum entropy principle, random forest, and SVM can achieve satisfactory performance for processing high-dimensional data. Xu et al. [37] claimed that data labeling and training efficiency are two main challenges for large-scale data classification. Therefore, they presented a k-means and SVM-based strategy to implement large-scale data classification. Their experimental results show that, based on their strategy, the size of the training dataset can be reduced while maintaining the classification accuracy. Liu et al. [23] further pointed out that BPNN suffers extremely low efficiency when it classifies large-scale, high-dimensional data. The research also stated that distributed computing could be a suitable solution to overcome the low-efficiency issue. Su et al. [12] employed the Hadoop framework [38] to improve the efficiency of BPNN. The authors reported that their solution achieves satisfactory precision for data forecasting. Liu et al. [25] also presented a Spark-based distributed BPNN algorithm [39] which shows remarkable efficiency for classifying large-scale data. However, Liu et al. [23] pointed out that the algorithm decoupling-based parallelization of BPNN may generate a great number of iterations, which deteriorates the processing efficiency. The authors also claimed that the data separation-based parallelization may reduce the final classification accuracy.
Therefore, devising a parallelized BPNN with both high efficiency and high accuracy is valuable work.
It should also be noticed that, due to multiple classes and uneven data distribution, the class imbalance issue [40] frequently exists in training data. The issue can significantly affect the training effect, which further impacts the final classification accuracy. Zhang et al. [41] reported that the image recognition ability of a deep CNN declines in the case of unbalanced training data. Therefore, the authors presented a classification method which recognizes a query image by comparing the distances between the category centers of the CNN features of the whole training dataset and the corresponding CNN feature of the query image. Li et al. [42] also pointed out that although CNN supplies very high performance, it still suffers from the class imbalance problem. By adding an extra class-imbalance-aware regularization, the authors presented a new loss function that makes CNN more sensitive to the samples of the minority classes. Zhang et al. [43] also claimed that the class imbalance issue leads to significant misclassification by the deep belief network (DBN). To address the issue, the authors assigned unequal misclassification costs between classes, and these costs are then applied to the DBN to achieve accurate classification. Although the efforts contributed by the abovementioned researches are able to solve the class imbalance issue, Han et al. [40] suggested that sampling techniques are an effective way of rebuilding the samples of the minority class and thus of solving the class imbalance issue efficiently.
As a result, this paper presents a BPNN-based high-performance large-scale data classification method considering class imbalance. The paper firstly improves the Borderline-SMOTE [40] algorithm using the Fréchet distance [44] to solve the potential class imbalance issue in the training data. Secondly, in order to solve the slow convergence issue in the training phase of BPNN, zero-mean [45], batch-normalization, and ReLU are employed to improve the input layer, hidden layers, and activation function.
Thirdly, this paper also presents a MapReduce-based parallelization method for the improved BPNN. Based on data separation and ensemble learning techniques, the parallelization of the improved BPNN can be implemented. The rest of the paper is organized as follows: Section 2 presents the details of the methodologies for the BPNN improvements; Section 3 shows the experimental results; Section 4 concludes the paper.

Class Balance-Based Improved BPNN in Enabling Large-Scale Classification
This section firstly presents the Fréchet distance-based Borderline-SMOTE, which is able to solve the class imbalance issue in the training dataset; the section then presents the details of the BPNN improvements using zero-mean, batch-normalization, and ReLU. Finally, the section presents the ensemble learning-based parallelization of the improved BPNN using the Hadoop framework.

Fréchet Distance and Borderline-SMOTE-Based Class Balance.
The class imbalance issue may significantly impact the training of the BPNN, which finally leads to misclassification. Borderline-SMOTE [40] is proven to be an effective solution for balancing the classes in the training dataset. However, Borderline-SMOTE measures the similarity between data instances using the Euclidean distance, which lacks the ability to represent the shape features and sequential characteristics of the data instances. Therefore, this paper employs the Fréchet distance [44] instead of the Euclidean distance to measure the similarity between data instances.
Let P and Q denote two continuous curves, and let α and β denote continuous nondecreasing reparameterizations of [0, 1]. The continuous Fréchet distance is defined by equation (1):

F(P, Q) = inf_{α,β} max_{t∈[0,1]} ‖P(α(t)) − Q(β(t))‖,  (1)

where ‖·‖ represents the Euclidean norm and inf represents the infimum of the set [46]. The parameter t is continuous, so equation (1) cannot be computed directly for discrete data. Therefore, the discrete Fréchet distance was presented by the researches [47, 48]. Let P = (p_1, p_2, ..., p_u) and Q = (q_1, q_2, ..., q_v) denote two discrete curves, and let k denote the total number of Fréchet permutations (couplings) [47, 48]. For one Fréchet permutation W_j = {(p_i, q_i)}, 1 ≤ i ≤ u, 1 ≤ j ≤ k, let d_j denote the maximum of the point-to-point distances in W_j. The discrete Fréchet distance of P and Q can then be represented by equation (2):

F_d(P, Q) = min_{1≤j≤k} d_j = min_{1≤j≤k} max_{(p,q)∈W_j} ‖p − q‖.  (2)

The Fréchet distance has a better ability to represent the shape features and sequential characteristics of data instances [44]. Therefore, it is employed in this paper to compute the nearest neighbors for the Borderline-SMOTE algorithm.
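The discrete Fréchet distance of equation (2) can be computed without enumerating all k couplings, using the well-known Eiter-Mannila dynamic program. The following is a minimal sketch; the function name and the toy curves are illustrative, not from the paper:

```python
import numpy as np

def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two polylines P and Q
    (sequences of points), via the Eiter-Mannila dynamic program."""
    u, v = len(P), len(Q)
    ca = np.full((u, v), -1.0)          # memo table of partial distances

    def c(i, j):
        if ca[i, j] >= 0:
            return ca[i, j]
        d = np.linalg.norm(np.asarray(P[i]) - np.asarray(Q[j]))
        if i == 0 and j == 0:
            ca[i, j] = d
        elif i == 0:
            ca[i, j] = max(c(0, j - 1), d)
        elif j == 0:
            ca[i, j] = max(c(i - 1, 0), d)
        else:
            # best way to reach (i, j) from the three predecessor cells
            ca[i, j] = max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
        return ca[i, j]

    return c(u - 1, v - 1)
```

For two parallel horizontal segments one unit apart, the distance is exactly 1.0, matching the intuition of the "dog leash" metaphor often used for the Fréchet distance.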

Borderline-SMOTE in Enabling Class Balance.
Borderline-SMOTE is able to balance the classes of an imbalanced dataset [40]. It balances the dataset based on two remarkable capabilities: the identification of the border between the majority and minority classes, and the synthesis of samples around that border. The algorithm works as follows.
Step 1. In a dataset T, for each point p_i (i = 1, ..., pnum) in the minority class P, compute the set of its m nearest neighbors in T using the Fréchet distance (equation (2)). Inside the set, the number of points belonging to the majority class is m′ (0 ≤ m′ ≤ m).
Step 2. If m′ equals m, all m nearest neighbors are majority examples, so p_i is regarded as noise and neglected. Otherwise, if 0 ≤ m′ < m/2, p_i has little chance of being misclassified and does not need further processing. If m/2 ≤ m′ < m, p_i has a higher chance of being misclassified. Therefore, over all p_i in the minority class, a borderline set E = {p′_1, p′_2, ..., p′_dnum}, 0 ≤ dnum ≤ pnum, can be obtained, where each p′_i denotes a minority point lying on the borderline.
Step 3. For each p′_i in E, compute its k nearest neighbors within the minority class P, and then randomly select s points from the k neighbors. Let r_j (j = 1, 2, ..., s) denote a random value between 0 and 1, and let p_j denote one of the s selected neighbors. A new instance synthetic_j = p′_i + r_j × (p_j − p′_i) is then synthesized between p′_i and each selected neighbor.
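The three steps above can be sketched as follows. For brevity this sketch uses the Euclidean distance where the paper substitutes the Fréchet distance; the function name, parameter defaults, and toy data are illustrative assumptions:

```python
import numpy as np

def borderline_smote(X_min, X_maj, m=5, k=5, s=2, rng=None):
    """Sketch of Borderline-SMOTE: find borderline ("danger") minority
    points whose m-neighborhood is majority-dominated, then synthesize
    s new points between each such point and its k minority neighbors."""
    rng = np.random.default_rng(rng)
    X_all = np.vstack([X_min, X_maj])
    n_min = len(X_min)
    danger = []
    for i, p in enumerate(X_min):
        # Step 1: m nearest neighbors in the whole dataset (excluding p)
        d = np.linalg.norm(X_all - p, axis=1)
        nn = np.argsort(d)[1:m + 1]
        m_maj = np.sum(nn >= n_min)        # neighbors from the majority class
        # Step 2: m_maj == m is noise; m/2 <= m_maj < m is borderline
        if m / 2 <= m_maj < m:
            danger.append(i)
    synthetic = []
    for i in danger:
        # Step 3: interpolate toward randomly chosen minority neighbors
        p = X_min[i]
        d = np.linalg.norm(X_min - p, axis=1)
        nn = np.argsort(d)[1:k + 1]        # k minority-class neighbors
        for j in rng.choice(nn, size=min(s, len(nn)), replace=False):
            r = rng.random()
            synthetic.append(p + r * (X_min[j] - p))
    return np.array(synthetic)
```

Because each synthetic point is a convex combination of two minority points, all new instances stay inside the minority region, which is what highlights the border in Figure 1.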
A demonstration of the Fréchet distance-based Borderline-SMOTE is shown in Figure 1. Firstly, the algorithm is able to identify the border between two imbalanced classes. Secondly, new instances can be synthesized to balance the classes and further highlight the border between the two classes. Therefore, if the class imbalance issue exists in the training dataset of the BPNN, the Fréchet distance-based Borderline-SMOTE has great potential to improve the training performance. However, it should also be noted that the values of the two parameters k and m affect the performance of Borderline-SMOTE. Therefore, in the later algorithm evaluation sections, the optimal values of k and m are selected from a series of pre-executed experiments.

Improved BPNN Using Zero-Mean, Batch-Normalization, and ReLU.
As an effective classification algorithm, BPNN has been successfully employed in quite a number of researches [10, 11, 26, 49]. However, issues such as the sensitivity to initial weights, the sensitivity to the learning rate, gradient exploding, and gradient vanishing still need to be handled. Therefore, this paper firstly employs zero-mean to normalize the input data instances; secondly, batch-normalization and the ReLU activation function are employed to overcome the convergence issues in the training phase.

Brief Introduction of the Standard BPNN.
Figure 2 shows the architecture of a typical BPNN, which contains the input layer, several hidden layers, and the output layer. The hidden and output layers each contain a number of neurons. In Figure 2, x_1, ..., x_n represent the input of the BPNN; w_ij, b_i, and g represent the weight, bias, and activation function of a neuron; a_i represents the output of a neuron in the hidden layer; and y represents the output of the BPNN (a = y in the output layer). To facilitate the presentation, these parameters are written in the compact matrix form X, W, b, A, and Y. The network training consists of two phases: feed forward and back propagation. Feed forward propagates the input X through each layer of the network to obtain the output Y according to Y = g(W^T X + b). Back propagation employs a loss function to compute the error J between the output Y and the actual value Y_ according to J = loss(Y, Y_). The neural network then chooses a proper optimization algorithm, for example, stochastic gradient descent (SGD) [29], to update W and b using

W = W − α(∂J/∂W), b = b − α(∂J/∂b),

where α is the learning rate [50].
In terms of classification using the BPNN, the training phase is firstly carried out. Let instance_i = [a_1, a_2, ..., a_n] denote the i-th data instance in the training dataset, a denote one feature of instance_i, and c_j denote the class to which instance_i belongs. Firstly, each feature of instance_i is normalized. Secondly, the BPNN inputs instance_i and runs the feed forward to compute the output Y; the coded c_j is then regarded as the actual value Y_ to run the back propagation and update the weights and biases. Once all the training instances have been processed over several epochs and iterations, the training phase terminates. In the classification phase, let instance_k denote the k-th testing instance in the testing dataset. The BPNN inputs instance_k and computes the output using only the feed forward; the output is the class to which instance_k belongs. Once all the testing instances have been processed, the classification phase terminates. Figure 3 shows the BPNN improvements using zero-mean, batch-normalization, and ReLU. The details of the improvements are presented in the following sections.
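The feed forward and back propagation phases described above can be sketched for a one-hidden-layer network. The toy data, layer sizes, learning rate, and sigmoid activation below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-class problem: the label is 1 when x0 + x1 > 0.
X = rng.normal(size=(200, 2))
Y_ = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# One hidden layer with sigmoid activations, as in a standard BPNN.
W1 = rng.normal(scale=0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
g = lambda z: 1.0 / (1.0 + np.exp(-z))   # activation function g
alpha = 0.5                              # learning rate

losses = []
for epoch in range(500):
    # Feed forward: propagate X through each layer, Y = g(W^T X + b).
    A = g(X @ W1 + b1)
    Y = g(A @ W2 + b2)
    # Loss J = loss(Y, Y_), here the mean squared error.
    J = np.mean((Y - Y_) ** 2)
    losses.append(J)
    # Back propagation: chain rule through the sigmoid layers,
    # then the SGD-style updates W = W - alpha * dJ/dW, b = b - alpha * dJ/db.
    dY = 2 * (Y - Y_) * Y * (1 - Y) / len(X)
    dA = (dY @ W2.T) * A * (1 - A)
    W2 -= alpha * (A.T @ dY); b2 -= alpha * dY.sum(0)
    W1 -= alpha * (X.T @ dA); b1 -= alpha * dA.sum(0)
```

After the loop, the loss has decreased from its initial value; in the classification phase, only the two feed-forward lines would be evaluated for a testing instance.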

Zero-Mean-Based Input Layer Improvement.
Zero-mean is a normalization technique which improves the data distribution to accelerate gradient descent [45]. Let the input matrix X denote the batch_size input data instances of each iteration. After being processed by the zero-mean layer shown in Figure 3, the zero-centered data instances become the actual input. Let the matrix X_mean denote the average of the batch_size input data instances, and let the matrix X_zero-centered represent the zero-centered input; equations (3) and (4) then give the computations of X_mean and X_zero-centered:

X_mean = (1/batch_size) Σ_{i=1}^{batch_size} X_i,  (3)

X_zero-centered = X − X_mean.  (4)
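A minimal numerical sketch of the zero-mean step (the batch values are illustrative):

```python
import numpy as np

# A toy batch of 3 instances with 2 features each (batch_size = 3).
X = np.array([[1.0, 10.0],
              [3.0, 20.0],
              [5.0, 30.0]])

X_mean = X.mean(axis=0)     # equation (3): per-feature average over the batch
X_zero = X - X_mean         # equation (4): zero-centered input
```

After this step each feature has mean exactly zero over the batch, which centers the data around the origin before it enters the network.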

Batch-Normalization-Based Hidden Layer Improvement.
Briefly, in the training phase of a neural network with multiple hidden layers, the data distributions of different layers may vary, so the data are no longer independent and identically distributed (IID). Internal covariate shift (ICS) therefore occurs [28], which causes two main issues impacting the training. The first is that the variation of the parameters in the current layer changes the distribution of the input data of the next layer; as a result, the next layer has to tune itself to adapt to the distribution variation, so the learning performance deteriorates. The second is that the values of W and b may keep enlarging, which leads to larger values of W × x + b in each layer. Consequently, gradient saturation of the activation function may occur; the value of the updated gradient may then be extremely small in the back propagation, which deteriorates the network convergence.
However, batch-normalization [28] employs normalization operations to restore the input data distribution of each neuron in the hidden layers to the standard normal distribution N(0, 1), so that the activation function is able to work sensitively. Generally, in batch-normalization, the mean μ and variance σ of the input data enable the normalization. Additionally, the small variations of μ and σ across mini-batches also improve the generalization of the neural network. A scale factor γ and a displacement factor β are adopted to implement a linear transformation; both factors are trainable in batch-normalization. As a result, batch-normalization is capable of dealing with larger initial weights, larger learning rates, gradient issues, and the overfitting issue, which significantly improves the training performance of the neural network [28]. This paper employs batch-normalization to improve the hidden layers of the BPNN due to these remarkable advantages. It normalizes the linear output of the hidden layer, and the output of batch-normalization is then input into the nonlinear ReLU. The details of batch-normalization are shown in Algorithm 1, where D represents the output of the linear computation of the hidden layer; y_i denotes the output of batch-normalization, which is the input of the nonlinear ReLU; ε is a constant value to stabilize the training; μ_D and σ²_D represent the mean and variance of the input data instances; and d_i represents the standardized intermediate data.
Similar to the weights and biases of the BPNN, γ and β are also updated in the back propagation phase according to γ = γ − α(∂L/∂y_i)(∂y_i/∂γ) and β = β − α(∂L/∂y_i)(∂y_i/∂β), where L represents the back-propagated loss function of the hidden layer.
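The forward pass of Algorithm 1 can be sketched as follows; the function name and the toy batch are ours, and the trainable updates of γ and β are omitted for brevity:

```python
import numpy as np

def batchnorm_forward(D, gamma, beta, eps=1e-5):
    """Batch-normalization of a mini-batch D (batch_size x features):
    standardize each feature toward N(0, 1), then apply the learned
    linear transform y = gamma * d_hat + beta."""
    mu = D.mean(axis=0)                      # per-feature mean mu_D
    var = D.var(axis=0)                      # per-feature variance sigma^2_D
    d_hat = (D - mu) / np.sqrt(var + eps)    # standardized intermediate data
    return gamma * d_hat + beta              # scale and shift
```

With gamma = 1 and beta = 0, the output of each feature has mean 0 and variance very close to 1 (exactly 1 up to the stabilizing ε), which is what keeps the subsequent activation function out of its saturated region.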

ReLU Activation Function.
To overcome the saturation of the commonly used Sigmoid and Tanh activation functions, as well as to improve the convergence of the network, this paper employs ReLU, shown by equation (5), as the activation function of the neural network, which is able to handle the gradient issues [29]:

ReLU(x) = max(0, x),  (5)

where x represents the input of ReLU.
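As a one-line sketch of equation (5):

```python
import numpy as np

# ReLU: f(x) = max(0, x). Its gradient is 1 for x > 0, so it does not
# saturate the way Sigmoid and Tanh do for large inputs.
relu = lambda x: np.maximum(0.0, x)
```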

Ensemble Learning-Based Parallelization of BPNN.
In order to implement large-scale data classification, the Hadoop framework, based on the MapReduce computing model [51], is employed to parallelize the improved BPNN. This paper firstly separates the entire training dataset into a number of data chunks which are saved in HDFS (Hadoop Distributed File System); each participating mapper then initializes one sub-BPNN and inputs one data chunk, respectively. Each mapper trains its own classifier individually, so that a number of different classifiers are finally obtained in parallel. Therefore, for one testing data instance, these classifiers may generate different decisions. Finally, this paper also presents a weighted voting strategy to decide the ultimate classification result for the testing data instance.

Data Separation-Based Parallelization for the Improved BPNN.
Although the BPNN can be directly decoupled and parallelized in Hadoop, the research [23] suggested that the efficiency of such a parallelized algorithm is extremely low due to the imperfect iteration support of the Hadoop framework. Therefore, the parallelization in this paper is based on data separation. Firstly, the entire training dataset is separated into a number of data chunks using random sampling; the data chunks are saved in HDFS. However, in each data chunk, the class imbalance issue may be intensified or alleviated by the random sampling and separation. Therefore, if class imbalance exists in a data chunk, Borderline-SMOTE balances its classes. A number of mappers then start, each of which individually initializes one improved BPNN as a sub-BPNN and inputs one data chunk to train the parameters of that sub-BPNN. The workflow of each sub-BPNN in a mapper is shown in Figure 4.
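The data separation and per-mapper training can be sketched outside Hadoop as plain functions; all names are illustrative, and in the actual system the chunks would reside in HDFS with each mapper running as a Hadoop task:

```python
import numpy as np

def split_into_chunks(X, y, n_chunks, rng=None):
    """Random-sampling separation of the training set into n_chunks
    (stand-in for writing the chunks to HDFS)."""
    rng = np.random.default_rng(rng)
    idx = rng.permutation(len(X))
    return [(X[part], y[part]) for part in np.array_split(idx, n_chunks)]

def map_train(chunks, train_fn):
    """Each 'mapper' trains one sub-classifier on its own chunk.
    train_fn is any trainer returning a fitted classifier; class
    balancing (Borderline-SMOTE) would be applied per chunk first."""
    return [train_fn(Xc, yc) for Xc, yc in chunks]
```

In the real deployment the list comprehension in `map_train` runs as parallel mapper tasks, and a single reducer later aggregates the per-classifier decisions by weighted voting.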
Finally, once the training in every mapper terminates, multiple different classifiers are obtained in parallel. In the classification phase, one testing instance is input into all the classifiers; as a result, different classifiers may generate different classification results. One reducer collects all the results from the mappers to form an aggregation, in which the weighted voting is carried out to achieve the final classification result for the testing instance.

Weighted Voting.
In order to aggregate the multiple classification results into one final result in the reducer, this paper presents a classification reliability-based weighted voting, which achieves the final classification result according to the reliabilities of the results from the multiple classifiers.
It is known that each classifier has different classification accuracies for different classes. Let C represent the reliability matrix, n represent the number of classes in the training dataset, and r = [r_1, r_2, ..., r_n] represent the classification accuracies of one classifier over the n classes. The softmax function [52] can then be adopted to compute the reliability c_i for the i-th class of the classifier using equation (6):

c_i = e^{r_i} / Σ_{j=1}^{n} e^{r_j}.  (6)

Therefore, for m classifiers, the reliability matrix C can be represented by equation (7):

C = [c_11, ..., c_1n; ...; c_m1, ..., c_mn].  (7)

The output coding of each classifier is based on one-hot encoding [44], so the output p of a classifier is the probability distribution over the classes p_i (i = 1, 2, ..., n), as indicated by equation (8):

p = [p_1, p_2, ..., p_n].  (8)

The outputs of the m classifiers therefore form a probability distribution matrix P, shown by equation (9):

P = [p_11, ..., p_1n; ...; p_m1, ..., p_mn].  (9)

An intermediate matrix Q, shown by equation (10), is achieved by multiplying (element-wise) the reliability matrix C and the probability distribution matrix P:

Q = C ∘ P.  (10)

The sum r′_i of the elements in the i-th column of Q finally forms a weight matrix R, denoted by equation (11):

R = [r′_1, r′_2, ..., r′_n], r′_i = Σ_{j=1}^{m} q_ji.  (11)

Ultimately, the final classification result class_label for the input data instance is identified according to equation (12):

class_label = arg max_i R.  (12)

Figure 5 shows the logical flow of the BPNN parallelization using Hadoop.
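The weighted-voting equations can be sketched as follows, under our reading that C and P are combined element-wise; the function name and the example values are illustrative:

```python
import numpy as np

def weighted_vote(accuracies, probs):
    """Reliability-weighted voting across m classifiers and n classes.
    accuracies: m x n per-class accuracies r of the m classifiers.
    probs:      m x n (one-hot or softmax) outputs p of the m classifiers.
    Each row of accuracies is softmax-normalized into reliabilities C,
    the weights R are the column sums of Q = C * P (element-wise), and
    the final label is argmax_i R_i."""
    acc = np.asarray(accuracies, dtype=float)
    e = np.exp(acc - acc.max(axis=1, keepdims=True))   # stable softmax
    C = e / e.sum(axis=1, keepdims=True)               # reliability matrix
    Q = C * np.asarray(probs, dtype=float)             # element-wise weighting
    R = Q.sum(axis=0)                                  # column sums -> weights
    return int(np.argmax(R))                           # class_label
```

With this scheme, a classifier that is historically accurate on a class contributes more weight when it votes for that class, which is how the reducer turns several weak sub-BPNNs into one stronger decision.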

Experimental Results
The experiments are organized into three parts. The first part evaluates the performance of the Fréchet distance-based Borderline-SMOTE using a randomly generated, linearly inseparable, two-dimensional semiannular dataset. The second part evaluates the performance of the improved BPNN in the standalone environment using the Iris dataset [53], the Wine dataset [52], and the Vehicle Silhouettes dataset [54]. The third part evaluates the performance of the parallelized improved BPNN in the Hadoop distributed computing environment. The details of the experimental environments are listed in Table 1.

Evaluation for Fréchet Distance-Based Borderline-SMOTE.
A randomly generated dataset is employed to evaluate the Fréchet distance-based Borderline-SMOTE. The dataset has two classes, each of which contains 150 data instances. Based on the original dataset shown in Figure 6, an imbalanced dataset is generated, which is shown in Figure 7. Four class balance algorithms are implemented, including the Fréchet distance-based Borderline-SMOTE (k = 5, m = 5), random oversampling, random undersampling, and SMOTE. After the class balance of the imbalanced dataset, the improved BPNN carries out the classification using 100 randomly selected training instances and 90 testing instances. The classification results are shown in Table 2. Table 2 indicates that without class balance, the classification accuracy is only 66.67%. However, all the class balance algorithms help to improve the classification accuracy, among which the Fréchet distance-based Borderline-SMOTE significantly outperforms the other algorithms.
To further evaluate the potential of the Fréchet distance-based Borderline-SMOTE, 20000 instances belonging evenly to two classes are generated using random sampling. Firstly, 5000 instances are randomly selected from class 1, and then 1000, 500, 50, and 10 instances are randomly selected from class 2, respectively, to compose 4 imbalanced training datasets. After being balanced by the Fréchet distance-based Borderline-SMOTE (k = 5, m = 5), the improved BPNN is trained and then carries out the classification. The number of testing instances is 10000. The classification accuracies are listed in Table 3. Table 3 shows that imbalanced training data significantly impacts the classification accuracy. However, benefitting from the Fréchet distance-based Borderline-SMOTE, the classification accuracy can be improved substantially. Table 3 also indicates that the imbalance ratio impacts the classification accuracy: a slightly imbalanced training dataset achieves good classification performance once it is balanced by the Fréchet distance-based Borderline-SMOTE, whereas an extremely imbalanced training dataset leads to low classification accuracy even after it is balanced.

Evaluation for the Improved BPNN.
This section evaluates the performances of the improved BPNN in the standalone environment. The Iris dataset, Wine dataset, and Vehicle Silhouettes dataset are employed. The details of the training and testing instances are listed in Table 4. For comparison, SVM, the traditional BPNN, and the self-adaptive BPNN [55] are also implemented. Table 5 shows the average classification accuracy over 50 experiments for each algorithm using the Iris dataset. The traditional BPNN yields the lowest classification accuracy with the largest number of epochs. The presented improved BPNN and the self-adaptive BPNN achieve similar classification accuracy; however, in terms of the number of epochs, the self-adaptive BPNN slightly outperforms the improved BPNN. Tables 6 and 7 show the average classification accuracy over 50 experiments for each algorithm using the Wine dataset and the Vehicle Silhouettes dataset. Firstly, the traditional BPNN again shows the worst performance. Secondly, in the Wine dataset-based experiments, although the self-adaptive BPNN slightly outperforms the improved BPNN in terms of epochs, the classification accuracy of the improved BPNN is higher than that of the self-adaptive BPNN.
Thirdly, in the Vehicle Silhouettes dataset-based experiments, the improved BPNN performs the best in terms of both accuracy and epochs. Figures 8(a)-8(c) show the convergences of the traditional BPNN, the improved BPNN, and the self-adaptive BPNN using the Iris dataset, Wine dataset, and Vehicle Silhouettes dataset, respectively. For the simple Iris dataset, the three algorithms exhibit relatively close convergences, and the presented improved BPNN slightly outperforms the other two algorithms. In terms of the Wine dataset, the traditional BPNN performs the worst. Although the classification accuracy of the self-adaptive BPNN can reach 98%, its minimal accuracy is only 55%, which indicates the unstable performance of that algorithm. Contrarily, the improved BPNN yields the highest accuracy and the most stable performance. In terms of the Vehicle Silhouettes dataset, the improved BPNN also supplies the best accuracies.
To further indicate the performance of the improved BPNN, the statistical indices of the epoch and accuracy over 50 experiments using the three datasets are listed in Tables 8-10.
In the Iris dataset-based experiments, in terms of the epoch and accuracy tests, the improved BPNN and the self-adaptive BPNN perform similarly, though the improved BPNN shows a slightly higher variance of the epoch; both of them outperform the traditional BPNN.
In the Wine dataset-based experiments, although the traditional BPNN has the smallest mean and variance of the epoch, it shows the lowest mean accuracy with the highest variance. Contrarily, the improved BPNN significantly outperforms the self-adaptive BPNN in terms of both the epoch and accuracy.
In the Vehicle Silhouettes dataset-based experiments, the traditional BPNN shows the worst performance. The improved BPNN outperforms the self-adaptive BPNN in terms of accuracy. For the epoch, the improved BPNN shows the smallest mean, and its variance is slightly higher than that of the self-adaptive BPNN. Figures 11(a)-11(c) indicate that the classification accuracy of the improved BPNN is impacted by the batch size, and that the optimal batch size leading to the highest accuracy differs between datasets. Therefore, a proper batch size is able to improve the classification accuracy of the presented improved BPNN. Figures 12(a) and 12(b) indicate the numbers of epochs, nonconvergence cases, and overfitting cases of the traditional BPNN and the improved BPNN with varying learning rates (lr = 0.1, 0.01, 0.001, and 0.0001; 50 experiments for each lr).
Firstly, Figure 12(a) shows that if the learning rate is small (lr = 0.0001), the traditional BPNN cannot converge at all, which results in the loss of the experimental result. However, Figure 12(b) shows that even if the learning rate is 0.0001, the improved BPNN can still converge. Secondly, for each learning rate, the number of exceptions (nonconvergence and overfitting) of the improved BPNN is smaller than that of the traditional BPNN.
In the next experiments, imbalanced training datasets are generated based on the original Iris dataset, Wine dataset, and Vehicle Silhouettes dataset. The details of the training and testing instances are listed in Table 11. The imbalanced training datasets are firstly processed by the Fréchet distance-based Borderline-SMOTE (k = 5, m = 10), and then the improved BPNN carries out the classifications based on both the balanced and imbalanced training data 50 times. The average classification accuracies are shown in Figures 13(a)-13(c).
Figures 13(a) and 13(b) indicate that the imbalanced training datasets significantly impair the training of the improved BPNN, which finally leads to severe misclassifications. However, once balanced by the Fréchet distance-based Borderline-SMOTE, the network can be correctly and sufficiently trained. Therefore, the classification accuracies are greatly improved. Nevertheless, due to the complicated attributes of the instances in the Vehicle Silhouettes dataset, although the classification based on the balanced dataset also outperforms that based on the imbalanced dataset, its average accuracy is only 79.53%. This indicates that the dimensionality of the data may severely impact the performance of the class-balancing algorithm.
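To make the balancing step concrete, the following is a minimal NumPy sketch of Borderline-SMOTE with a discrete Fréchet distance used as the neighbor metric, reflecting the paper's "Fréchet distance-based Borderline-SMOTE". The function names and the exact way the Fréchet distance is applied to an instance's attribute sequence are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def frechet(p, q):
    # Discrete Fréchet distance between two attribute sequences
    # (dynamic programming over all monotone couplings).
    n, m = len(p), len(q)
    d = np.full((n, m), -1.0)
    def c(i, j):
        if d[i, j] >= 0:
            return d[i, j]
        cost = abs(p[i] - q[j])
        if i == 0 and j == 0:
            d[i, j] = cost
        elif i == 0:
            d[i, j] = max(c(0, j - 1), cost)
        elif j == 0:
            d[i, j] = max(c(i - 1, 0), cost)
        else:
            d[i, j] = max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), cost)
        return d[i, j]
    return c(n - 1, m - 1)

def borderline_smote(X, y, minority, k=5, m=10, rng=None):
    # For each minority sample, inspect its m nearest neighbors in the
    # whole set; "danger" (borderline) samples get k synthetic neighbors
    # interpolated toward nearby minority samples.
    rng = np.random.default_rng(rng)
    X_min = X[y == minority]
    synthetic = []
    for x in X_min:
        dists = np.array([frechet(x, z) for z in X])
        nn = np.argsort(dists)[1:m + 1]          # skip the sample itself
        n_maj = np.sum(y[nn] != minority)
        if m / 2 <= n_maj < m:                    # borderline ("danger") sample
            d_min = np.array([frechet(x, z) for z in X_min])
            knn = np.argsort(d_min)[1:k + 1]
            for j in rng.choice(knn, size=min(k, len(knn)), replace=False):
                gap = rng.random()
                synthetic.append(x + gap * (X_min[j] - x))
    return np.array(synthetic)
```

With k = 5 and m = 10 as in the experiments, the synthetic samples are appended to the minority class before training.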

Evaluation for the Parallelized Improved BPNN in the Hadoop Cluster
Firstly, the original Iris dataset is employed to evaluate the classification accuracy of the parallelized improved BPNN. A total of 105 instances are used as training instances, which are separated for the parallelized training; the other 45 instances are used for testing. Three mappers start in parallel, and each mapper initializes one sub-BPNN. The standalone BPNN and the parallelized long short-term memory network (LSTM) are also implemented for comparison. In particular, the configuration of the parallelized LSTM is the same as that of the parallelized BPNN, and the standalone BPNN likewise uses 105 training instances and 45 testing instances. Figure 14 indicates that the parallelized BPNN outperforms the standalone BPNN and the parallelized LSTM. Benefitting from the weighted voting, the potential accuracy loss caused by the data separation is properly handled. In fact, the classification accuracies of the 3 sub-BPNNs are 96.88%, 94.44%, and 91.89%, respectively, whereas the accuracy of the aggregated result reaches 100%. This further proves that the weighted voting can aggregate weak classifiers into a strong classifier. Additionally, the data separation may intensify the class imbalance of the training dataset in each sub-BPNN. However, the Fréchet distance-based Borderline-SMOTE can effectively solve this issue, which guarantees the classification accuracy. Moreover, the parallelized LSTM performs slightly worse than the parallelized BPNN. LSTM is well known to be suitable for processing time-series data, but the Iris dataset employed in the experiments is not a time series, which indicates that the type of the dataset may affect the accuracy of the parallelized LSTM.
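The aggregation step can be sketched as follows. This is a minimal example of weighted voting over the label predictions of the sub-BPNNs; the choice of validation accuracy as the voting weight is an assumption for illustration, since the paper does not spell out the exact weighting scheme here.

```python
import numpy as np

def weighted_vote(predictions, weights, n_classes):
    # predictions: (n_classifiers, n_samples) array of predicted labels
    # weights: one weight per sub-classifier, e.g. its validation accuracy
    # Each sub-classifier adds its weight to the score of the class it
    # predicts; the class with the highest total score wins.
    scores = np.zeros((predictions.shape[1], n_classes))
    for pred, w in zip(predictions, weights):
        scores[np.arange(len(pred)), pred] += w
    return scores.argmax(axis=1)

# Three sub-BPNNs weighted by the accuracies reported in the experiment.
preds = np.array([[0, 1],
                  [0, 2],
                  [1, 1]])
final = weighted_vote(preds, [0.9688, 0.9444, 0.9189], n_classes=3)
```

Here the two higher-weighted classifiers outvote the third on the first sample, and the first and third agree on the second sample, so the aggregated labels are [0, 1].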
For the classification efficiency evaluation, this paper replicates the Iris dataset to sizes from 1 MB to 1024 MB. The cluster starts 16 mappers in parallel. The processing times of the standalone BPNN, the parallelized BPNN, and the parallelized LSTM are shown in Figure 15. Figure 15 indicates that the performances of the three algorithms are quite close for smaller data sizes; due to the overhead of the Hadoop cluster, the standalone BPNN even outperforms the parallelized BPNN and the parallelized LSTM. However, as the data size increases, the processing time of the standalone BPNN rises sharply because of the computing resource limitations of the standalone environment. In contrast, owing to its lower computational cost, the parallelized BPNN slightly outperforms the parallelized LSTM and processes the large volume of data relatively efficiently.
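A data-separation mapper of the kind used here can be sketched with Hadoop Streaming, which runs an executable script as the map task. The routing function below deterministically assigns each training record to one of the sub-BPNN partitions; the partition count and the CRC32-based key are illustrative assumptions, not the paper's exact implementation.

```python
import sys
import zlib

N_SUBNETS = 3  # one sub-BPNN per reduce partition (assumption)

def map_record(line, n_parts=N_SUBNETS):
    # Route each training record to a partition key; the reducer for
    # that key trains one sub-BPNN on its share of the data.
    record = line.strip()
    key = zlib.crc32(record.encode()) % n_parts
    return f"{key}\t{record}"

if __name__ == "__main__":
    # Hadoop Streaming feeds input splits to this script on stdin.
    for line in sys.stdin:
        print(map_record(line))
```

The shuffle phase then groups records by key, so each reducer receives a disjoint share of the training set, mirroring the data separation described above.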

Conclusion
In order to serve the classification of large-scale data, this paper presents a parallelized improved BPNN algorithm. The parallelization is based on data separation and is implemented using the Hadoop framework. To overcome the classification accuracy loss caused by the separation, weighted voting is presented to improve the classification accuracy. According to the experimental results, the parallelization is effective for dealing with large-scale data. However, two further issues exist. The first is that the class imbalance in the training dataset significantly impacts the training of BPNN, which finally leads to the deterioration of the classification accuracy. Therefore, this paper presents the Fréchet distance-based Borderline-SMOTE algorithm to balance the classes. According to the experimental results, the balanced training dataset can tremendously improve the classification accuracy. The second is that a convergence issue may exist in BPNN. Therefore, this paper improves the input layer, hidden layers, and activation function of BPNN by employing zero-mean, batch-normalization, and ReLU, respectively. Based on the comparisons with the traditional BPNN and the self-adaptive BPNN, the presented improved BPNN has great potential to serve classification tasks accurately and efficiently.
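As a concrete illustration of the three layer-level improvements summarized above, the following is a minimal NumPy sketch of a forward pass combining zero-mean input preprocessing, batch-normalization, and ReLU. The function names and the placement of batch-normalization before the activation are illustrative assumptions consistent with common practice, not the authors' exact network code.

```python
import numpy as np

def zero_mean(X):
    # Input-layer preprocessing: centre each feature at zero.
    return X - X.mean(axis=0)

def batch_norm(Z, gamma=1.0, beta=0.0, eps=1e-5):
    # Hidden-layer improvement: normalise pre-activations within the
    # mini-batch, then scale and shift (gamma, beta are learnable).
    mu, var = Z.mean(axis=0), Z.var(axis=0)
    return gamma * (Z - mu) / np.sqrt(var + eps) + beta

def relu(Z):
    # Activation improvement: rectified linear unit.
    return np.maximum(Z, 0.0)

def hidden_forward(X, W, b):
    # One improved hidden layer: affine -> batch-norm -> ReLU.
    return relu(batch_norm(X @ W + b))
```

Centring the inputs and normalizing each mini-batch keep the pre-activations well scaled, which is the mechanism behind the faster and more reliable convergence reported for the improved BPNN.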
Data Availability
The data of the models and algorithms used to support the findings of this study are included within the article.