Performance Analysis of an Optimized ANN Model to Predict the Stability of Smart Grid

The stability of the power grid is a growing concern because of the high demand for, and supply of, electricity to smart cities, homes, factories, and so on. Different machine learning (ML) and deep learning (DL) models can be used to tackle the problem of stability prediction for the energy grid. This study elaborates on the necessity of IoT technology for making energy grid networks smart. Different prediction models, namely, logistic regression, naïve Bayes, decision tree, support vector machine, random forest, XGBoost, k-nearest neighbor, and an optimized artificial neural network (ANN), have been applied to an openly available smart energy grid dataset to predict its stability. The present article uses metrics such as accuracy, precision, recall, F1-score, and the ROC curve to compare the predictive models. Data augmentation and feature scaling have been applied to the dataset to get better results, and the augmented dataset provides better results than the original dataset. This study concludes that the deep learning model, an ANN optimized with the Adam optimizer, provides better results than the other predictive models, with 97.27% accuracy, 96.79% precision, 95.67% recall, and a 96.22% F1-score.


Introduction
An electric grid is said to be smart when it replaces traditional appliances with smart ones. A smart grid enables smart distribution of electric energy, provides smart meter infrastructure, and so on. The smart grid also encourages the use of renewable energy resources rather than nonrenewable ones [1]. Smart grids can improve every stage of the system: electricity generation from either renewable or nonrenewable resources, and distribution of energy according to the demand and supply of different smart sectors such as smart homes, smart offices, smart factories, and electric vehicles, as shown in Figure 1.
Nowadays, renewable energy resources dominate the market for electricity generation [2]. Smart grids face multiple problems, such as power grid resilience, cyber security, and smart energy management or distribution [3]. With the growing demand for renewable energy resources, the grid topology used for electricity distribution has become more and more decentralized.
A consumer of electricity can also act as a producer. So, unlike in traditional grids, power generation and consumption can occur at any terminal point [4]; such terminals are called prosumers. Generation and distribution of power are no longer limited to a central node.
With this decentralized approach, it becomes very difficult to manage and keep track of the generation and consumption of energy. Maintaining information on the demand and supply of electric power among producers and consumers becomes chaotic, which in turn makes it difficult to maintain the stability of the grid. To remove this overhead, decentral smart grid control (DSGC) has been proposed [5]. This control system keeps track of the power demand and the supply frequency at every producer-consumer node of the grid. Different decentralized topologies can be used in smart energy grids for power consumption and production, as shown in Figure 2.
DSGC increases the stability of the smart grid. The DSGC model considers parameters such as price elasticity, balancing power flow, and the reaction time of nodes, which help to check the stability of the energy grid network. Balancing power flow refers to the units produced or consumed by a producer or consumer, respectively. The reaction time of a node describes how its response changes with a rise or fall in the price of power units. The most critical part of a decentralized smart energy grid is the flow of information about electricity distribution.
This information flow helps in deciding the stability of the decentralized smart grid. For predicting stability, different ML and DL models play an indispensable role [6].
The main objectives of this study are as follows: (i) to provide a detailed overview of the integration of IoT with a smart energy grid using a three-layered architecture; (ii) to discuss the behavior of a smart grid dataset; (iii) to understand the effect of data augmentation on prediction accuracy; (iv) to propose an ANN model for the stability prediction of a decentralized grid system; and (v) to present a comparative analysis of traditional machine learning algorithms and the optimized ANN. The article is organized as follows. Section 2 explains the integration of IoT and the electric grid with a layered architecture. The related literature is reviewed in Section 3. Section 4 describes the stepwise methodology used for the analysis and gives a detailed overview of the prediction models used in this study. Section 5 discusses the experimental results, presents them in graphical form, and provides a comparative analysis of all the models.

Layered Architecture for IoT and Smart Grid Integration
The Internet of Things (IoT) has provided innovative solutions to real-life problems [7]. IoT has upgraded different technologies, such as computing, sensing, and communication. IoT can be integrated with grid systems, which further helps to resolve problems of smart grids such as balancing the demand and supply of energy [8].
The IoT-integrated smart grid system is divided into three layers [9]. Figure 3 presents the layered structure of the IoT-integrated smart grid system. From the bottom up:
(i) The first layer is the data collection layer. Different IoT devices, such as sensors and actuators, are used over a wireless sensor network to collect data from smart homes, distribution centers, smart factories, or smart renewable energy generation systems. These data are fed into layer 2.
(ii) The second layer is the data communication layer. In this layer, data are communicated to decision-makers, monitoring centers, and authorities to determine priorities and parameters. User devices are also attached to the second layer.
(iii) The third layer is the data storage and processing layer. In this layer, data are processed and analyzed using intelligent techniques such as artificial intelligence, machine learning, big data analytics, cloud computing, fog computing, and edge computing. These techniques process the data according to the guidelines and results required by the authorities and decision-makers in layer 2. The processed data are presented to the user in graphical, tabular, or textual form.
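To make the data flow through the three layers concrete, the following minimal Python sketch models each layer as a function. The `Reading` record, the device and site names, and the per-site averaging in layer 3 are illustrative assumptions, not part of the architecture shown in Figure 3.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    device_id: str  # hypothetical sensor or smart-meter id
    site: str       # where the reading was collected (home, factory, ...)
    kw: float       # measured power in kilowatts


def collect(raw):
    """Layer 1, data collection: wrap raw sensor tuples as readings."""
    return [Reading(dev, site, float(kw)) for dev, site, kw in raw]


def communicate(readings):
    """Layer 2, data communication: route readings to per-site channels
    that monitoring centers and decision-makers subscribe to."""
    channels = {}
    for r in readings:
        channels.setdefault(r.site, []).append(r)
    return channels


def process(channels):
    """Layer 3, storage and processing: reduce each channel to a summary
    (here, mean power per site) that can be shown as a table or chart."""
    return {site: round(mean(r.kw for r in rs), 3) for site, rs in channels.items()}


raw = [("m1", "home", 1.2), ("m2", "home", 0.8), ("m3", "factory", 12.0)]
summary = process(communicate(collect(raw)))
print(summary)  # {'home': 1.0, 'factory': 12.0}
```

Keeping each layer a separate function mirrors the bottom-up separation in the text: collection code never needs to know who consumes the data, and processing only sees routed channels.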
IoT can enhance the functioning of the smart grid by improving sensing and measurement [11]. It enables automatic monitoring and remote control, and it can help in forecasting the usage of renewable resources. The distribution of electricity, a very important part of the energy grid, is automated by IoT. The smart meter is one of the most popular applications of IoT in the smart energy grid [12,13].

Related Work
In the smart grid, tasks such as assigning renewable energy resources, short-term energy forecasting, and sensing the motion of occupants need to be analyzed to enhance efficiency. Yao et al. use machine learning models for this purpose and propose the Machine Learning Energy-Efficient Framework (MLEEF) [14]. The authors used solar energy for this study. Occupant profiles and energy profiles are studied in depth before applying machine learning frameworks to the dataset. The findings are measured as accuracy for checking energy consumption and load forecasting. A hybrid machine learning model is proposed by Sharmila et al. for smart energy management by optimizing energy distribution from the energy sources to the recipients. In this model, SVM is used for regression and classification problems within the big data five V's paradigm [15].
An Energy Management Model (EMM) that uses ML models to manage the energy flow of a smart grid is presented by Ahmed et al. [16]. Both optimization techniques and machine learning models are used to manage the energy distribution, and the study concludes that machine learning-based energy management models outperform optimization-based EMMs. Hourly data are collected from national renewable energy resources in the US, and the collected data statistics are used for the simulation. This 10-year average dataset is then divided into three seasons: winter, summer, and spring, and season-wise energy management models are analyzed to check which model performs better. Arzamasov et al. focused on the DSGC system [17]. They use the frequency of the alternating current (AC) to check the stability of the system: the frequency increases when excess electricity is generated and decreases when less electricity is produced. The authors aim to identify the instability of a four-node star network DSGC system using the mathematical model discussed in the study of Schäfer et al.
Chen et al. used a deep residual learning model for day-ahead short-term load forecasting (STLF) of electric load over the network [18]. The authors use an ANN as the base and propose a completely new end-to-end neural network in which multiple neural networks are ensembled into a single model. The ensemble is built in two phases. In the first phase, snapshots of the model are taken during training, with Adam used for optimization to adapt the learning rate. In the second phase, different models are initialized independently and trained with hyper-parameter tuning, and the average output of all the models is used for forecasting. A similar STLF model is proposed by Y. Wang et al. [47], in which an ensemble Markov model is used to forecast industrial electric load. The authors combine novel time series data mining techniques and then ensemble different prediction frameworks based on the Hidden Markov Model (HMM). Min-Max normalization is used to normalize the data, and the normalized data are fed to a bagging algorithm for resampling. After sampling, the hyper-parameters are set and an HMM is applied to every sample. All results are ensembled based on log-likelihood. Metrics such as mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE) measure the efficiency of the model. Wang et al. use SVM and XGBoost machine learning models for forecasting industrial load [20], with a Bayesian optimization algorithm used to tune the hyper-parameters of the XGBoost model. Different state-of-the-art methods are used to compare prediction accuracy.
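The normalization and error measures named above have standard definitions; the plain-Python sketch below computes them. The load and forecast values are made-up numbers for illustration only, and MAPE is undefined whenever an actual value is zero.

```python
import math


def min_max(values):
    """Min-Max normalization of a sequence to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error in percent (undefined if an actual value is 0)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


load = [100.0, 120.0, 80.0, 100.0]      # made-up hourly loads
forecast = [110.0, 120.0, 70.0, 100.0]  # made-up forecasts
print(min_max(load), mae(load, forecast), round(rmse(load, forecast), 3), mape(load, forecast))
```

RMSE penalizes the two 10-unit misses more heavily than MAE does, while MAPE weights the miss on the smaller load (80) more than the one on the larger load (100).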
An SVM model is proposed by Gupta et al. to predict blackouts in a smart grid [21]. This model is trained on a historical dataset and evaluated probabilistically. Ge et al. used a hybrid algorithm to forecast industrial power load [22]. First, the k-means algorithm groups the data into different clusters. After that, a reinforcement learning model is used with the SVM model, and swarm optimization is used to optimize the reinforcement learning model.
This optimized hybrid algorithm increased the accuracy of load prediction on a real-time dataset.
Wei et al. used a Deep Belief Network (DBN) over the smart grid network to detect false data injection attacks. Initially, the researchers used unsupervised learning to provide the Boltzmann machines with the initial weights used in the DBN. The back-propagation technique of the DBN then reduces errors in a top-to-bottom fashion and hence refines the model. The proposed approach gives better results than detecting false data injection attacks with an SVM model [23].
Amarasinghe et al. used deep neural network models to forecast energy load. A convolutional neural network (CNN) forecasts the load, and its results are compared with other predictive models, such as LSTM sequence-to-sequence, a shallow ANN, a restricted Boltzmann machine, and SVM, on the same kind of dataset. Root mean square error (RMSE) is used for the comparative analysis of the prediction models [24]. Muzumdar et al. demonstrated the suitability of ML predictive models for forecasting the balance and imbalance of energy flow in the energy grid [25]. The authors used historical communication and supply data from the data source. After data preprocessing, the refined data are split into training and testing datasets, on which different machine learning models are applied: multi-layer perceptron (MLP), bagging, AdaBoost, decision tree, random forest, naïve Bayes, KNN, SVM, and gradient boosting. Accuracy, RMSE, and MAE (each for both 10-fold cross-validation and a random split) are used to find the best predictive model among all the ML models applied to the dataset. Z. Guan et al. use a nonparametric Bayesian clustering model in a smart grid system to preserve the privacy of the big data collected from the smart grid. The infinite Gaussian mixture model (IGMM) is used in the implementation of this model, and the Laplace mechanism is used to release data so that privacy is maintained [48].
Wang et al. proposed a data reduction technique for wireless sensor networks (WSNs) to predict forthcoming data. The technique has two phases. The first is the data reduction phase (DRP), which reduces the amount of data to be transmitted over the WSN. The second is the data prediction phase (DPP), in which nontransmitted data are predicted at the base station itself. Data predictions are made using a Kalman filter [27].
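Predicting nontransmitted readings at the base station can be sketched with a scalar Kalman filter run identically at the sensor and at the base station: the sensor transmits a reading only when its own prediction misses it by more than a threshold, so both filters stay synchronized. This is a generic dual-prediction sketch under assumed settings (a random-walk state model, noise variances `q` and `r`, and `threshold`), not the exact scheme of Wang et al. [27].

```python
class ScalarKalman:
    """1-D Kalman filter with an assumed random-walk model: x_k = x_{k-1} + noise."""

    def __init__(self, x0, p0=1.0, q=0.01, r=0.1):
        self.x, self.p = x0, p0  # state estimate and its variance
        self.q, self.r = q, r    # process and measurement noise variances

    def predict(self):
        self.p += self.q  # estimate is carried forward; uncertainty grows
        return self.x

    def update(self, z):
        k = self.p / (self.p + self.r)  # Kalman gain
        self.x += k * (z - self.x)
        self.p *= 1.0 - k
        return self.x


def dual_prediction(measurements, threshold=0.5):
    """Sensor and base station run identical filters; a reading is sent only
    when the sensor's prediction misses it by more than `threshold`."""
    sensor = ScalarKalman(measurements[0])
    station = ScalarKalman(measurements[0])
    sent, reconstructed = 0, []
    for z in measurements[1:]:
        prediction = sensor.predict()
        station.predict()
        if abs(z - prediction) > threshold:
            sensor.update(z)   # the transmitted reading corrects both filters
            station.update(z)
            sent += 1
        reconstructed.append(station.x)  # otherwise the station keeps its prediction
    return sent, reconstructed


sent, series = dual_prediction([10.0, 10.1, 9.9, 10.0, 14.0, 14.1])
print(sent, [round(v, 2) for v in series])
```

Only readings that the shared model cannot anticipate cross the network, which is where the data reduction comes from.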
Alsamhi et al. used an ANN model to predict the strength of signals emitted from drones. These signals are used to communicate with IoT devices in a smart city environment, and the signal strength is used to find the next location in the path of a flying drone [28]. Nyangaresi and Alsamhi presented a study on secure signal transfer in a smart grid. In smart metering, data are transferred from consumers to power management centers. This study proposed a traffic signaling protocol developed to preserve the privacy of the data, and the protocol was found to be effective against man-in-the-middle attacks, desynchronization attacks, and impersonation attacks [29].

Dataset Used and Its Preprocessing.
This study uses a dataset from the UCI repository [30]. The dataset contains readings from a simulated renewable smart energy grid that is supposed to satisfy the needs of three consumers. It originally contains 10,000 records; for better results, it is augmented, and the augmented version contains 60,000 observations. This dataset contains 14
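Because the simulated grid is a symmetric star with one producer and three consumers, permuting the consumer nodes yields new, equally valid records with the same stability label, which is how 3! = 6 copies of each record arise. A minimal sketch follows; the column names (`tau1` to `tau4` for reaction times, `p1` to `p4` for power, `g1` to `g4` for price elasticity, with node 1 as the producer) are assumed to follow the UCI "Electrical Grid Stability Simulated Data" layout and should be checked against the actual file.

```python
from itertools import permutations

# Assumed UCI column layout: tau1..tau4 (reaction times), p1..p4 (power),
# g1..g4 (price elasticity); node 1 is the producer, nodes 2-4 are consumers.
CONSUMERS = (2, 3, 4)
PREFIXES = ("tau", "p", "g")


def augment(rows):
    """Return each row under all 3! = 6 orderings of the consumer nodes.

    rows: list of dicts with the columns above plus label columns
    (e.g. 'stab', 'stabf'), which are copied unchanged.
    """
    out = []
    for row in rows:
        for order in permutations(CONSUMERS):
            new = dict(row)  # producer columns and labels stay as they are
            for src, dst in zip(CONSUMERS, order):
                for prefix in PREFIXES:
                    # move all three attributes of a consumer together
                    new[f"{prefix}{dst}"] = row[f"{prefix}{src}"]
            out.append(new)
    return out
```

Applied to the 10,000 original records, this yields the 60,000 observations mentioned above; the identity permutation keeps the original record in the augmented set.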

Stepwise Methodology Used.
Figure 4 presents the stepwise approach used to predict the stability of the smart grid. The approach is as follows:
(i) First, the raw dataset is taken and augmented. Initially, the dataset has 10,000 observations. Since the grid is considered to be symmetric and three consumers are assumed in this study, the dataset is augmented 3! = 3 × 2 × 1 = 6 times. Data augmentation helps to obtain better accuracy.
(ii) Next is the data preprocessing step. No data-cleaning overhead is required because the data are acquired from simulation exercises, so there are no NaN or duplicate values.
(iii) After that, machine learning models, namely, logistic regression, k-nearest neighbor (KNN), decision tree, support vector machine (SVM), random forest, naïve Bayes, and XGBoost, are applied to the dataset to predict grid stability. Hyper-parameter tuning of these models is performed to get the best results. The specifications of these ML models are explained in the subsections of Section 4.
(iv) After hyper-parameter tuning, the performance of all the predictive models is measured using the evaluation metrics explained in Section 5.1.
(v) The deep learning model is applied to the scaled dataset. Because the ANN uses the ReLU activation function, the dataset needs to be scaled, so the features are standardized using the StandardScaler class of scikit-learn's preprocessing module.
(vi) After feature scaling, the proposed ANN model for the smart grid is applied. The specification of the proposed ANN model is explained in detail in Section 4.2.8.
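Step (v) standardizes each feature to zero mean and unit variance, which is what scikit-learn's `StandardScaler` does. The plain-Python sketch below shows the same computation. Fitting the statistics on the training split only and reusing them on the test split is a common practice assumed here; the text does not say which split the scaler was fitted on.

```python
from statistics import fmean, pstdev


def fit_scaler(X):
    """Per-column mean and population standard deviation (ddof = 0, as in StandardScaler)."""
    columns = list(zip(*X))
    means = [fmean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # constant column: leave its scale at 1
    return means, stds


def transform(X, means, stds):
    """z = (x - mean) / std for every feature of every row."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


X_train = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
X_test = [[4.0, 10.0]]
means, stds = fit_scaler(X_train)  # statistics from the training split only
print(transform(X_test, means, stds))
```

Reusing the training statistics on the test split keeps information about the test data from leaking into the model.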
In this study, ML and DL models are deployed to predict the stability of the decentralized energy grid; the DL model is an artificial neural network (ANN). These models are explained in detail in the following subsections.

Logistic Regression.
Logistic regression works somewhat like linear regression. It predicts the outcome on the basis of the individual characteristics of each feature [31]. The logistic regression model is very easy to regularize, and it calibrates its output as predicted probabilities [32]. Suppose Y is a predicted output feature that depends on the predictor variable X = {x_1, x_2, x_3, ..., x_n}. The linear predictor y is given as follows:
$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n + e$,
where a_0, a_1, a_2, ..., a_n are the regression coefficients and e is the error term. Logistic regression passes y through the logistic (sigmoid) function to obtain a class probability and uses the logarithmic (log) loss for cost evaluation.

K-Nearest Neighbor.
K-nearest neighbor (KNN) is a supervised machine learning model [33]. This supervised learning technique works on the principle of similarity among neighbors: a data point is classified on the basis of its distance to the labeled points nearest to it. It is also called a lazy learner algorithm. The KNN algorithm is used in many energy grid scenarios, such as forecasting low-voltage demand and estimating current and its state [34,35].
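The two classifiers described above can be sketched in a few lines of plain Python: logistic regression passes the linear predictor through the logistic function to get a probability, and KNN takes a majority vote among the k nearest training points. The coefficients, the Euclidean distance, and k = 3 are assumptions for illustration; Table 2 lists the hyper-parameters actually used.

```python
import math
from collections import Counter


def logistic_predict(x, a0, a):
    """P(y = 1 | x): the linear predictor a0 + sum(a_i * x_i) through the logistic function."""
    z = a0 + sum(ai * xi for ai, xi in zip(a, x))
    return 1.0 / (1.0 + math.exp(-z))


def knn_predict(X_train, y_train, x, k=3):
    """Majority label among the k training points nearest to x (Euclidean distance)."""
    nearest = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]


# Hypothetical coefficients and training points (1 = stable, 0 = unstable).
print(round(logistic_predict([0.5, -1.0], a0=0.2, a=[1.5, 0.8]), 3))
X_train = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
y_train = [0, 0, 1, 1]
print(knn_predict(X_train, y_train, [0.2, 0.3]))  # 0
```

KNN does no work at training time and defers all computation to prediction, which is why it is called a lazy learner; because it relies on distances, it also benefits from scaled features.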

Naïve Bayes.
Naïve Bayes is also a prediction model; it works on conditional probability [36]. To illustrate, let Z be a class label that is to be predicted from the conditional probabilities of the features X and Y. The naïve Bayes model [37] uses Bayes' rule to calculate the conditional probability of class Z given instances Y_0, Y_1, Y_2, ..., Y_n of attribute Y, and the posterior probability is used to predict the class. When attribute Y is conditionally independent of attribute X given the class, the posterior probability of Z can be calculated as follows [38]:
$P(Z \mid X, Y) = \frac{P(Z)\, P(X \mid Z)\, P(Y \mid Z)}{P(X, Y)}$.
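A Gaussian naïve Bayes classifier is one common way to apply Bayes' rule under the conditional-independence assumption to continuous features such as those in the smart grid dataset. The sketch assumes Gaussian per-feature likelihoods and works in log space to avoid underflow; it is an illustration, not necessarily the naïve Bayes variant used in this study.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Per-class log prior plus per-feature mean and variance."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        variances = [sum((v - m) ** 2 for v in c) / len(c) + var_floor
                     for c, m in zip(cols, means)]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Class with the largest log posterior (up to the shared evidence term)."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        # Conditional independence: the joint likelihood factorizes, so the
        # log-likelihood is a sum of per-feature Gaussian log-densities.
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


X = [[1.0, 2.0], [1.2, 1.8], [5.0, 6.0], [5.2, 6.2]]
y = [0, 0, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 2.0]), predict_gaussian_nb(model, [5.1, 6.1]))  # 0 1
```

The evidence term in the denominator is the same for every class, so it can be dropped when only the most probable class is needed.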

Decision Tree.
A decision tree is used to predict categorical targets. Decision rules are applied at the nodes of the tree, which split the dataset into particular classes [39]. The tree examines the value of a feature at each node and predicts the target value. Missing values in the data do not create a major hurdle for the decision tree model. Data normalization or scaling is not necessary, but a small change in the data can make this model unstable, that is, it can produce a very different tree [40].
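The decision rule at each node is typically chosen by trying thresholds on a feature and keeping the one whose child nodes are purest. The sketch below uses Gini impurity as the purity measure, which is an assumption; the text does not state the split criterion.

```python
def gini(labels):
    """Gini impurity 1 - sum_c p_c^2; 0.0 for an empty or pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Threshold t on one numeric feature (rule: value <= t goes left) that
    minimizes the size-weighted Gini impurity of the two child nodes."""
    best_t, best_score = None, float("inf")
    for t in sorted(set(values))[:-1]:  # the largest value would leave the right child empty
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


print(best_split([1, 2, 3, 10, 11, 12], ["unstable"] * 3 + ["stable"] * 3))  # (3, 0.0)
```

Only the ordering of feature values matters for such thresholds, which is why scaling is unnecessary; it also shows why small data changes can move a threshold and reshape the tree below it.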

SVM.
SVM is a supervised machine learning model in which the choice of kernel plays an important part. Statistical learning theory is considered the basis of this predictive model. The support vector machine (SVM) is used for classification as well as regression problems. The kernel used in SVM implicitly maps the input data into a high-dimensional feature space. The SVM model then constructs a hyperplane in this feature space, which is used to separate the data points into different classes and to find patterns for classification and regression problems. The cost function used in SVM to calculate the prediction cost is given in equation (4).
where Lo = loss function and W(l, h) = capacity of the learning model.
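Rather than the cost function of equation (4), the sketch below shows two standard SVM ingredients: the RBF kernel, which implicitly maps inputs into a high-dimensional feature space, and the kernel decision function f(x) = Σ α_i y_i K(x_i, x) + b together with the soft-margin hinge loss. The support vectors, coefficients, bias, and `gamma` value are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """K(x, z) = exp(-gamma * ||x - z||^2), an inner product in an implicit feature space."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_function(x, support_vectors, dual_coefs, b, gamma=0.5):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; dual_coefs holds alpha_i * y_i."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for sv, c in zip(support_vectors, dual_coefs)) + b


def hinge_loss(y, f):
    """Soft-margin SVM loss for a label y in {-1, +1} and decision value f."""
    return max(0.0, 1.0 - y * f)


# Hypothetical model: one support vector per class, zero bias.
svs, coefs = [[0.0, 0.0], [2.0, 2.0]], [1.0, -1.0]
f = decision_function([0.2, 0.1], svs, coefs, b=0.0)
print(1 if f >= 0 else -1, round(hinge_loss(1, f), 3))
```

The hyperplane is never built explicitly in feature space: the kernel evaluates the needed inner products directly, and the sign of f(x) gives the class.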

Random Forest.
Random forest is made up of many decision trees. It is an ensemble learning technique that uses trees as its base learners [41]. The prediction is made by averaging the outputs of the individual trees. Random forest also adapts to sparsity very well [42]. The random forest supervised learning model cannot work efficiently on high-dimensional datasets. M randomized decision trees are combined to form a finite forest, whose estimate at a query point x is
$m_{M,n}(\mathbf{x}; \Theta_1, \ldots, \Theta_M, \mathcal{D}_n) = \frac{1}{M} \sum_{j=1}^{M} m_n(\mathbf{x}; \Theta_j, \mathcal{D}_n)$,
where the prediction of the j-th tree at the query point x is denoted as $m_n(\mathbf{x}; \Theta_j, \mathcal{D}_n)$, and $\Theta_1, \ldots, \Theta_M$ are random variables that are independent of each other and independent of the training set $\mathcal{D}_n$.
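The averaging of tree predictions described above can be illustrated with a toy forest. For brevity, each tree in the sketch is a depth-1 regression tree (a stump) grown on a bootstrap sample and a randomly chosen feature, with the bootstrap and feature draws playing the role of the random variables Θ_j. Real random forests grow deeper trees and resample features at every split, so this is a simplified sketch rather than the configuration used in this study.

```python
import random


def fit_stump(X, y, feature):
    """Depth-1 regression tree on one feature: the threshold with least squared error."""
    pairs = sorted(zip((row[feature] for row in X), y))
    best = None
    for i in range(1, len(pairs)):
        t = pairs[i - 1][0]
        left = [v for x, v in pairs if x <= t]
        right = [v for x, v in pairs if x > t]
        if not right:  # threshold equals the largest value: no split
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all sampled values identical: predict the mean label
        mean_y = sum(y) / len(y)
        return lambda row: mean_y
    _, t, ml, mr = best
    return lambda row: ml if row[feature] <= t else mr


def fit_forest(X, y, n_trees=25, seed=0):
    """M trees, each fitted to its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        feature = rng.randrange(len(X[0]))
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return trees


def forest_predict(trees, row):
    """Finite-forest estimate: the mean of the M individual tree predictions."""
    return sum(tree(row) for tree in trees) / len(trees)


X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]]
y = [0, 0, 0, 1, 1, 1]  # 1 = stable
forest = fit_forest(X, y)
print(round(forest_predict(forest, [0.0]), 2), round(forest_predict(forest, [12.0]), 2))
```

With 0/1 labels, the averaged output can be read as the share of trees voting "stable" and thresholded at 0.5.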

XGBoost.
Extreme gradient boosting (XGBoost) has become very popular since its introduction [43]. It is a scalable tree boosting technique that provides an easy way to prevent over-fitting [44], and it can be very helpful for sparse datasets and datasets with a small sample size. Given a dataset D = {(x_i, y_i)} with n records and m features, the prediction of y_i, denoted $\hat{y}_i$, is given by the XGBoost model as
$\hat{y}_i = \sum_{k=1}^{K} f_k(\mathbf{x}_i), \quad f_k \in \mathcal{F}$,
where K is the number of additive functions (trees) used to predict the output and $\mathcal{F}$ is the space of regression trees [45].
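In XGBoost, the prediction for a record is the sum of the leaf weights returned by its K regression trees, $\hat{y}_i = \sum_k f_k(\mathbf{x}_i)$. The sketch below shows that sum and, for the logistic loss, the closed-form optimal leaf weight w* = -G/(H + λ) from the XGBoost paper, where G and H are the sums of the first and second derivatives of the loss over the samples in a leaf. The two trees and λ = 1 are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_margin(trees, x, base_margin=0.0):
    """y_hat = sum_k f_k(x): every tree f_k maps x to the weight of the leaf it lands in."""
    return base_margin + sum(f(x) for f in trees)


def logloss_grad_hess(y, margin):
    """First and second derivatives of the logistic loss w.r.t. the margin (y in {0, 1})."""
    p = sigmoid(margin)
    return p - y, p * (1.0 - p)


def optimal_leaf_weight(leaf_samples, lam=1.0):
    """w* = -G / (H + lambda) for the (label, current margin) pairs in one leaf."""
    G = H = 0.0
    for y, margin in leaf_samples:
        g, h = logloss_grad_hess(y, margin)
        G += g
        H += h
    return -G / (H + lam)


# Two hypothetical depth-1 trees over a single feature x[0].
trees = [lambda x: 0.4 if x[0] > 0.5 else -0.4,
         lambda x: 0.1 if x[0] > 0.8 else -0.2]
margin = predict_margin(trees, [0.9])
print(round(margin, 2), round(sigmoid(margin), 3))
```

The λ term in the denominator shrinks leaf weights toward zero, which is one of the regularization mechanisms behind XGBoost's resistance to over-fitting.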

Proposed ANN.
Deep learning (DL) is a subset of machine learning. DL models such as the ANN (artificial neural network) [46], CNN (convolutional neural network) [47], RNN (recurrent neural network), and RBM (restricted Boltzmann machine) are used as supervised deep learning techniques. Deep learning models such as AE (autoencoders), RM, CNN, RNN, and RBM can also be used as unsupervised models [45]. For this study, we use an optimized ANN, as shown in Figure 5. The ANN model has a single input layer, three hidden layers, and a single output layer. The input layer contains 12 nodes, each of the three hidden layers has 24 nodes, and the output layer has a single node. For the hidden layers, we choose the rectified linear activation function, popularly known as ReLU. ReLU outputs max(0, z), where z is the weighted linear combination of the inputs from the previous layer's nodes [48]; it therefore outputs zero for negative inputs and passes positive inputs through unchanged. In our dataset, the grid is either stable, represented as "1", or unstable, represented as "0", and all features are numerical values within a given range. The "sigmoid" function is used as the activation function of the output layer because the dataset has only two prediction classes, so the output is classified logistically: the sigmoid maps the output to a value between 0 and 1 that can be thresholded to assign the class.
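The forward pass of the described 12-24-24-24-1 network can be sketched in plain Python, with ReLU in the three hidden layers and a sigmoid at the output. The He-style random initialization and the 0.5 decision threshold are assumptions; in practice a framework such as Keras would be used, and the weights would be learned by training rather than drawn at random.

```python
import math
import random

LAYER_SIZES = [12, 24, 24, 24, 1]  # input, three hidden layers, output


def init_layers(sizes, seed=0):
    """Random He-style initialization (an assumption; the study does not state it)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, math.sqrt(2.0 / n_in)) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def relu(z):
    return max(0.0, z)  # zero for negative inputs, identity for positive ones


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(layers, x):
    """Return the network output: an estimated probability that the grid is stable."""
    a = x
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, a)) + b for row, b in zip(weights, biases)]
        is_output = i == len(layers) - 1
        a = [sigmoid(v) if is_output else relu(v) for v in z]
    return a[0]


layers = init_layers(LAYER_SIZES)
p = forward(layers, [0.1] * 12)  # 12 standardized features of one record
print("stable" if p >= 0.5 else "unstable", round(p, 3))
```

This architecture has 1,537 trainable weights and biases in total (312 + 600 + 600 + 25).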
To enhance the functioning of the ANN, an adaptive optimization algorithm, adaptive moment estimation ("Adam"), is used to train the network that predicts the stability of the grid. The Adam optimizer updates the weights of the ANN and adapts the learning rate of each weight individually [49]. Table 2 presents the hyper-parameter values of the machine learning and deep learning models used in this study.
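A single Adam update keeps exponential moving averages of the gradient and of its square, corrects their start-up bias, and scales each step by their ratio. The sketch below uses the default hyper-parameters from Kingma and Ba (lr = 0.001, β1 = 0.9, β2 = 0.999, ε = 1e-8) and a toy one-parameter objective; neither is taken from this study's settings.

```python
import math


def adam_step(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place to the parameter list `theta`."""
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grad):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g      # moving average of gradients
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g  # moving average of squared gradients
        m_hat = state["m"][i] / (1 - beta1 ** t)  # bias correction for the zero start
        v_hat = state["v"][i] / (1 - beta2 ** t)
        theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)  # per-parameter step size
    return theta


# Usage: minimize (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta, state = [0.0], {"t": 0, "m": [0.0], "v": [0.0]}
for _ in range(500):
    adam_step(theta, [2 * (theta[0] - 3.0)], state, lr=0.1)
print(round(theta[0], 3))
```

Because each step is normalized by the running root-mean-square of the gradient, the early steps have a size of roughly `lr` regardless of the raw gradient magnitude.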

Experimental Results and Discussion
In this section, details of the dataset used for this study are elaborated. All the evaluation metrics used for the analysis of the models are also elaborated in this section. Finally, performance comparison results are presented for all the predictive models.

Evaluation Metrics.
The repeated random train-test split validation method is used to obtain consistent results for the machine learning models used for prediction. Repeated random train-test split is a hybrid of the hold-out validation approach and the K-fold cross-validation method. The hold-out validation approach is commonly known as a train-test split.
In this method, the dataset is randomly split once into training and testing datasets, which may lead to under-fitting or over-fitting. K-fold cross-validation is a technique in which the dataset is divided into K folds; the predictive model is trained and evaluated on every split, and the accuracy is calculated for each split. This helps to avoid over-fitting.
In the repeated random train-test split approach, the dataset is split into training and testing datasets, and the splitting and evaluation process is then repeated multiple times (as in the K-fold cross-validation method). For the repeated random train-test split, the ShuffleSplit class and the cross_val_score function of scikit-learn are used in Python.
The number of splits used is 10, and the test dataset is 30% of the total dataset. The accuracy for each split of the repeated random train-test split approach is shown in Table 3. For every model, the highest accuracy achieved is highlighted.
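The repeated random train-test split can be sketched without scikit-learn: shuffle the indices, hold out 30% as the test set, and repeat 10 times. In scikit-learn itself, the same scheme is obtained by passing `ShuffleSplit(n_splits=10, test_size=0.3)` as the `cv` argument of `cross_val_score`. The `fit_predict` callback below is a hypothetical interface for whatever model is being evaluated.

```python
import math
import random


def shuffle_split(n, n_splits=10, test_size=0.3, seed=0):
    """Yield (train_idx, test_idx) pairs from independent random permutations."""
    rng = random.Random(seed)
    n_test = math.ceil(test_size * n)  # same rounding rule as scikit-learn's ShuffleSplit
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


def repeated_split_accuracy(fit_predict, X, y, **split_args):
    """Accuracy on each split; fit_predict(X_train, y_train, X_test) -> predictions."""
    scores = []
    for train, test in shuffle_split(len(X), **split_args):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return scores
```

Unlike K-fold cross-validation, the test sets of different splits may overlap; with 60,000 records, each split holds out 18,000 of them, and the mean of the 10 scores gives the averaged accuracy.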
In this experiment, binary classification accuracy, precision, recall, F1-score, and the ROC curve are used for evaluation. In the formulas below, TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, with "stable" as the positive class.
This study focuses on the accuracy of the model. Accuracy measures the proportion of correct predictions among all predictions, and a prediction model is considered best if it predicts the most values correctly: the higher the accuracy of a model, the more suitable it is for prediction. Accuracy (A) is calculated as the ratio of correctly predicted and classified testing samples to the total testing sample size, $A = \frac{TP + TN}{TP + TN + FP + FN}$.
Precision (P), also known as the positive predictive value, is calculated as the ratio of the number of correctly classified stable test samples to the total number of samples predicted as stable, $P = \frac{TP}{TP + FP}$.
Recall (R), referred to as sensitivity or the true positive rate, is the ratio of correctly classified stable test samples to the total number of actually stable test samples, $R = \frac{TP}{TP + FN}$.
F1-score (F) is the harmonic mean of precision and recall, $F = \frac{2PR}{P + R}$. The ROC curve is a graph that visualizes the performance of a binary classifier at all possible thresholds; it plots the true positive rate against the false positive rate. In this analysis, all machine learning models are compared and visualized with the help of the ROC curve, as shown in Figure 6.
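Accuracy, precision, recall, and F1-score can all be computed directly from the confusion-matrix counts, with "stable" (label 1) as the positive class. A minimal plain-Python sketch, using made-up labels:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) with `positive` (stable = 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn


def classification_scores(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted-stable that are stable
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of truly stable that were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


print(classification_scores([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))  # (0.75, 0.75, 0.75, 0.75)
```

Precision and recall share the numerator TP but differ in the denominator: predicted-stable samples for precision, actually stable samples for recall.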
As shown in Figure 6, SVM provides the highest true positive rate with the lowest false positive rate, followed closely by the XGBoost classifier. So, among all the machine learning models, SVM gives the best performance, and the KNN and logistic regression models give the lowest performance.
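An ROC curve is obtained by sweeping the decision threshold over the classifier's scores and recording the false and true positive rates at each threshold; the area under it (AUC) summarizes the curve in a single number. The plain-Python sketch below assumes binary labels with 1 = stable and uses made-up scores.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) points for every distinct threshold, highest first; labels are 0/1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


pts = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(pts, auc(pts))  # AUC = 0.75
```

A curve that rises steeply near the origin, as described for SVM above, means the classifier reaches a high true positive rate while still producing few false positives.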

Performance Comparison Analysis.
Seven machine learning models and one deep learning model are used to predict the stability of the distributed smart energy grid. The parameters used for predictive analysis, namely, accuracy, precision, recall, and F1-score, are described in Section 5.1. The performance of the different models is compared in Table 4, in which all parameters are averaged over the repeated random train-test splits. The logistic regression model gives the lowest performance, with an accuracy of 81.2%, precision of 75.8%, recall of 69.7%, and F1-score of 72.6%. The KNN model performs better than logistic regression, with an accuracy of 81.61%, precision of 76.3%, and recall of 70.7%; its F1-score, the harmonic mean of precision and recall, is 73.4%.
Naïve Bayes comes next. In terms of accuracy, naïve Bayes and decision tree perform almost equally: the accuracy of the naïve Bayes model is 83.04%. However, the precision and recall of the decision tree are much better than those of naïve Bayes, whose precision is 82.9%, recall is 66.2%, and F1-score is 73.6%. The decision tree performs better than logistic regression, KNN, and naïve Bayes, with an accuracy of 83.28%, precision of 86.5%, recall of 84.5%, and F1-score of 85.5%. According to these results, random forest is next in line, with an accuracy of 90.88%, precision of 90.6%, recall of 82.4%, and F1-score of 86.3%. SVM and XGBoost perform best among the machine learning models: their accuracies are 92.77% and 92.88%, precisions 92.1% and 91.2%, recalls 87.1% and 88.5%, and F1-scores 89.5% and 89.8%, respectively. The proposed deep learning model for the smart grid, the optimized ANN, outperforms all other predictive models, with an accuracy of 97.27%, precision of 96.79%, recall of 95.67%, and F1-score of 96.22%. Bar charts visualize the comparative analysis of the predictive models in terms of accuracy (Figure 7), precision (Figure 8), recall (Figure 9), and F1-score (Figure 10). On the basis of these charts and Table 4, this study concludes that the ANN is the best model for stability prediction in the smart grid.
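As a quick consistency check on the reported numbers, each F1-score should equal the harmonic mean 2PR/(P + R) of the reported precision P and recall R. The snippet below recomputes it from the values given above.

```python
# (precision %, recall %, reported F1 %) as given in the text and Table 4.
reported = {
    "Logistic regression": (75.8, 69.7, 72.6),
    "KNN": (76.3, 70.7, 73.4),
    "Naive Bayes": (82.9, 66.2, 73.6),
    "Decision tree": (86.5, 84.5, 85.5),
    "Random forest": (90.6, 82.4, 86.3),
    "SVM": (92.1, 87.1, 89.5),
    "XGBoost": (91.2, 88.5, 89.8),
    "Optimized ANN": (96.79, 95.67, 96.22),
}


def harmonic_f1(p, r):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


for name, (p, r, f1) in reported.items():
    print(f"{name:20s} reported F1 = {f1:6.2f}   2PR/(P+R) = {harmonic_f1(p, r):6.2f}")
```

The recomputed values agree with the reported F1-scores to within about 0.03 percentage points, which is consistent with rounding of the published figures.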
The augmented dataset also helps to achieve better accuracy than the original dataset. Arzamasov et al. [17] used the original dataset and obtained an accuracy of around 80% with a decision tree, whereas the proposed model on the augmented dataset gives an accuracy of 97.27%. Table 5 compares the accuracy of the proposed optimized ANN model with that of other models in this field.

Conclusion
Industry 4.0 has shown its impact through the Industrial Internet, that is, through IoT-enabling technologies, smart decision-making, and the automation of industries using machine learning, deep learning, data analytics, etc. The power grid is one of the domains where all these technologies are used to make it smart. In a smart grid, producers and consumers are connected through communication lines or the Internet, and IoT can help to convert conventional energy networks into smart energy grids. This article presents a three-layered IoT-integrated smart grid network. This integration can play a major role in predicting the stability of smart power grids in the near future using data collected from these communication lines or the Internet. Different predictive models, namely, logistic regression, naïve Bayes, KNN, decision tree, SVM, random forest, XGBoost, and optimized ANN, are used to analyze an openly available smart grid dataset. These predictive models are then compared based on accuracy, precision, recall, and F1-score, and the ROC curve is also used to compare them. The dataset is available in the UCI machine learning repository; it is first augmented and then scaled to obtain better results. As a result, we found that the decision tree predictive model gave an accuracy of 80% on the original dataset and 83.28% on the augmented dataset.
Among all the predictive models, the ANN optimized with the Adam optimizer and using ReLU and sigmoid activation functions achieved the highest accuracy, 97.27%. All comparative metrics are presented using bar charts, which show that the optimized ANN, a deep learning model, outperformed the remaining machine learning predictive models. In future work, other deep learning models will be proposed, which may give even better prediction results, and a real-time dataset may be used to apply the same predictive models and analyze their efficiency.

Data Availability
Data will be available on request from the submitting author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.