A Hybrid Deep Learning-Based Network for Photovoltaic Power Forecasting

For efficient energy distribution, microgrids (MGs) provide significant assistance to main grids and act as a bridge between power generation and consumption. Renewable energy generation resources, particularly photovoltaics (PVs), are considered a clean source of energy but are highly complex, volatile, and intermittent in nature, which makes forecasting them challenging. A reliable, optimized, and robust forecasting method deployed at the MG addresses these challenges by providing accurate renewable energy production forecasts and establishing a precise match between power generation and consumption at the MG. Furthermore, it supports effective planning, operation, and acquisition from the main grid when the MG has a surplus or a shortfall of energy. Therefore, in this work, we develop an end-to-end hybrid network for automatic PV power forecasting, comprising three basic steps. First, data preprocessing is performed to normalize the data, remove outliers, and handle missing values. Next, temporal features are extracted using deep sequential modelling schemes, followed by the extraction of spatial features via convolutional neural networks. These features are then fed to fully connected layers for PV power forecasting. In the third step, the proposed model is evaluated on publicly available PV power generation datasets, where it achieves lower error rates than state-of-the-art methods.


Introduction
Photovoltaic (PV) power generation is one of the easiest-to-access, low-cost, and most promising sources of renewable energy. As energy demand rises in developing countries, PV power generation increases annually, which helps mitigate the global energy and climate change crisis [1]. According to the Global Future Report, PV generation capacity will reach 8000 GW by 2050 [2]. However, atmospheric variables such as temperature, solar irradiance, humidity, and cloud properties cause significant uncertainty when integrating PVs into a microgrid (MG) [3][4][5][6][7]. An effective PV power forecasting model, by contrast, greatly improves solar power utilization [8][9][10]. Therefore, efficient forecasting models in the utility grid help operate the power grid economically and transfer the required energy to end users [11,12]. Over the years, MGs have played an important role in efficient energy management and distribution by ensuring reliability, two-way power flow, self-healing, and demand response [6]. Although MGs offer several advantages, the volatile and intermittent nature of PV power means that integrating a larger portion of renewable energy into existing power generating systems creates several challenges, such as load and demand mismatch, poor scheduling and operation, penalties enforced by customers, and fluctuations in the load connected to the power system. Integrating an intelligent forecasting model into the MG greatly reduces these problems.
Forecasting PV power is a time series (TS) forecasting problem, which is divided into univariate and multivariate forecasting [13]. Based on the time horizon, forecasting methods are divided into three types: long-term, medium-term, and short-term power forecasting [14,15]. Each type serves a different scheduling and planning purpose. Long-term forecasting contributes to long-term planning and decision-making over horizons such as a month or a year. Medium-term forecasting is used for medium-term scheduling, such as looking ahead one week or less. Finally, short-term forecasting is the most challenging because the target is to look ahead only a short period of time, such as hours, but it is the most reliable and accurate method for PV forecasting. The forecasting models themselves are divided into three types: physical, statistical, and deep learning models [12]. A physical model does not need historical data; it relies instead on solar radiation and the interaction between physical laws [16], and it consists of three sub-modules: numerical weather prediction [17], total sky image [18], and satellite image [19]. The modelling techniques of the physical model can be divided into regression models [20], autoregressive models [21], grey theory [22], Markov chains [23], and fuzzy theory [24]. However, physical models perform poorly in ultra-short-term forecasting because they take a long time and only produce six hours of meteorological data [16]. The results of physical models show large deviations and low precision; therefore, it is impractical to use them for PV forecasting [17] in the MG. Statistical forecasting modelling establishes a mapping between the historical data and the target forecasting data to predict future PV power [16].
It is easy to use and possesses strong interregional versatility, but due to the complex and volatile nature of PV power generation, its TS is complex and nonperiodic [25].
The traditional statistical forecasting model provides limited performance on large-scale historical data because of the long-range, complex temporal information. Furthermore, because its processing methods are shallow and simple, nonlinear PV power patterns strongly affect its prediction of PV power. Therefore, researchers investigated ANN-based approaches and significantly improved PV power forecasting performance thanks to their ability to learn the variational pattern of PV [26]. However, because of the different atmospheric variables and the complex patterns of weather conditions, these approaches are unable to extract the corresponding deep nonlinear characteristics and TS dynamics of PV power [27,28]. The task of nonlinear mapping and feature extraction is extremely challenging; therefore, the best way to tackle these challenges is to employ deep learning models that can extract discriminative features end-to-end [29,30]. In recent years, deep learning models have significantly improved results in image classification [31,32], video classification [33][34][35][36][37], and power forecasting on TS data [38][39][40][41][42]. For instance, Khan et al. [43] proposed a hybrid model for electricity forecasting in residential and commercial buildings. They used a CNN model for spatial feature extraction and then applied a bidirectional LSTM (Bi-LSTM) network for temporal feature extraction. Li et al. [44] proposed a hybrid model that integrated the wavelet transform with a CNN for PV power prediction over various horizons. Similarly, in [45], the authors predicted the day-ahead weather forecast data from the solar irradiance using LSTM and then established a mathematical model between irradiance and PV power to analyze the forecasting. Yona [46] proposed a novel method that uses atmospheric data and a deep neural network to forecast the next day's PV generation.
However, to forecast PV power accurately, numerous researchers have investigated different techniques for mapping the association between the historical data and the target attributes. Their methods mainly focus on either spatial or temporal features alone, without examining different discriminative feature extraction strategies that preserve the long-range temporal dependencies among complex PV power patterns. Therefore, in this paper, we explore different feature extraction mechanisms and propose a hybrid model that extracts temporal features first, followed by spatial features, for PV power forecasting. The proposed model was evaluated on four publicly available PV power generation datasets for hour-ahead forecasting. The experiments show that the proposed feature extraction mechanism achieves the lowest error rates when compared with state-of-the-art techniques.
The contributions of the proposed model are summarized as follows:
(1) A novel framework is proposed for the MG to accurately forecast hour-ahead power generation and thereby manage the energy distribution between consumers and suppliers effectively. A comparative study is then conducted over different deep learning models to identify efficient feature extraction mechanisms, and finally, a hybrid GRU-CNN network is proposed.
(2) Mainstream methods first learn the spatial and then the temporal features, which degrades the overall performance on complex nonlinear PV power patterns. Herein, the temporal features are prioritized over the spatial features to efficiently learn the long-range, complex nonlinear PV power patterns for hour-ahead PV power forecasting. The proposed model learns temporal dependencies using a multilayered GRU sequential deep model and spatial patterns using convolutional features, which makes it robust and generalized for hour-ahead PV power forecasting.
(3) To validate the performance of the proposed model, standard TS performance metrics, namely mean square error, mean absolute error, root mean square error, and mean bias error, are used to compare it with existing state-of-the-art methods on benchmark datasets. Our experimental results achieve the lowest error rates compared to other state-of-the-art methods.
[29] used a CNN for PV power forecasting. Sezer and Ozbayoglu [57] used the CNN model and changed the input format from 2D to 1D for TS data. A CNN is usually suitable for extracting and learning spatial features from the input data; however, temporal features also play a key role in TS PV power prediction. Therefore, researchers used the LSTM model to capture long-range temporal dependencies; for example, Qing and Niu [58] used meteorological and weather data as input to an LSTM model for solar irradiance prediction. Recently, researchers concluded that integrating a CNN with an LSTM model overcomes the shortcomings of a single model, as it combines the advantages of multiple models to jointly learn the spatial and temporal information for accurate and complex PV forecasting. Hybrid models have also been introduced in the TS prediction domain; for example, Liu et al. [59] used the wavelet transform followed by a CNN to extract low-frequency information, while an LSTM is used to extract high-frequency information. Qin et al. [60] used a CNN model for spatial feature extraction, while the temporal features were extracted by an LSTM model. To reduce the energy crisis and limit the harm caused by climate change, researchers have proposed the techniques mentioned above to integrate PV power forecasting into their existing power generation systems. The existing traditional methods employ structural and parameter adjustments of the forecasting model. Their performance is better for traditional forecasting tasks.
However, due to the extremely unsteady nature of PV power, especially on cloudy and rainy days [61,62], their performance degrades severely. In the literature, most researchers claim that both spatial and temporal features are important for accurate PV power forecasting [63,64]. Existing standalone deep learning networks are only capable of exploring either spatial or temporal features. To address this limitation, researchers are developing hybrid networks that can learn spatial and temporal features at the same time. However, in the context of PV power forecasting, hybrid networks have been developed without examining the order in which the discriminative spatial and temporal features are extracted. Therefore, in this paper, we comprehensively analyze different feature extraction mechanisms using a hybrid model. Our experiments show that learning temporal features with a GRU followed by spatial features with a CNN yields a more efficient and effective pattern representation and learning potential, thereby achieving the highest accuracy and greatly reducing the error rates compared to state-of-the-art methods.

Proposed Methodology
This section briefly discusses the overall flow of the proposed framework, in which power from the main grid flows through the MG towards the end users, as visualized in Figure 1. In this research, we have developed an intelligent and robust hybrid deep learning model, which mainly consists of three steps: preprocessing, model training, and evaluation. In the preprocessing step, outliers and abnormalities are removed from the data. In the second step, a training procedure is applied to various machine learning and deep learning models. In the third step, the final PV forecast is computed and evaluated using different error metrics. These steps are discussed in the subsequent sections.

Preprocessing.
A recent study shows that the performance of a deep learning model depends heavily on the input data [45]. Therefore, the PV power data is refined by filling missing values, removing outliers, standardizing, and normalizing, so that the proposed deep learning model can extract meaningful patterns efficiently. PV power data is obtained from the solar panel in a raw format that is incomplete and unorganized [42]. It contains abnormalities caused by sensor faults, bad weather conditions, and variable customer consumption. Feeding these data directly to the deep learning model degrades the overall prediction [40]. Therefore, the input data is passed through the preprocessing stage, which fills in each missing value with the mean of the next and previous values. The data is then normalized with the min-max method, and outliers are removed with the standard deviation method.
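The preprocessing steps described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the threshold `k` on the number of standard deviations, and the decision to drop (rather than clip) outliers are all hypothetical. Outliers are removed before scaling here so that extreme values do not compress the min-max range, which is a choice of this sketch.

```python
import statistics


def fill_missing(values):
    """Replace each None with the mean of its previous and next known values."""
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        # Nearest earlier value (already filled) and nearest later observed value.
        prev = next((filled[j] for j in range(i - 1, -1, -1)
                     if filled[j] is not None), None)
        nxt = next((values[j] for j in range(i + 1, len(values))
                    if values[j] is not None), None)
        neighbours = [v for v in (prev, nxt) if v is not None]
        if not neighbours:
            raise ValueError("series contains no observed values")
        filled[i] = sum(neighbours) / len(neighbours)
    return filled


def remove_outliers(values, k=3.0):
    """Drop points farther than k standard deviations from the mean.

    The threshold k is an assumption; the source does not state one.
    """
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= k * std]


def min_max_normalize(values):
    """Scale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

A typical call chain would be `min_max_normalize(remove_outliers(fill_missing(raw), k=2.0))`, where `raw` is a list of power readings with `None` marking missing samples.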

Temporal Feature Extraction.
To capture long-range temporal dependencies in complex PV power forecasting data, most researchers use a recurrent neural network (RNN), which learns weights across the hidden layers of the network for long-range dependencies in TS data [65]. The intermediate layers of the RNN preserve meaningful information from the previous state. The internal structure of the RNN is shown in Figure 2(a), where the input and output at time $t$ are represented by $x_t$ and $y_t$, the output of the single hidden layer at time $t$ is represented by $a_t$, and $w$ represents the weight matrices. Figure 2(a) can be mathematically represented as in equation (1).
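For reference, a simple RNN of this kind is commonly written in the textbook form below. It is assumed here to correspond to the structure in Figure 2(a); the weight subscripts ($w_{aa}$, $w_{ax}$, $w_{ya}$) and the output activation $g_2$ are notational assumptions that do not appear in the text.

```latex
a_t = g_1\left(w_{aa}\, a_{t-1} + w_{ax}\, x_t + b_a\right),
\qquad
y_t = g_2\left(w_{ya}\, a_t + b_y\right)
```

The hidden state $a_t$ depends on $a_{t-1}$, which is how information from earlier time steps is carried forward.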
In equation (1), the terms $g_1$, $b_a$, and $b_y$ represent the nonlinear activation and the bias terms, while $w$ refers to the weights learned when capturing temporal dependencies in PV power forecasting. The RNN suffers from the vanishing gradient problem when the time interval to the target output is long. A special variant called the GRU resolves this problem with two gating mechanisms, the reset gate and the update gate. It is also less complex than the LSTM model because it has fewer gates and requires fewer parameters during training [66].
Its visual representation is shown in Figure 2(b).
The mathematical representation of the GRU is given in equations (2) to (5). The weights of the update and reset gates are represented by $w_u$ and $w_r$; similarly, the candidate activation and the bias vectors are represented by $C_t$ and $b_u$, $b_r$, $b_c$, respectively. Here, $c_t$ is the output of the current unit, which is connected to the input of the next unit, and $c_{t-1}$ is the input of the current unit, which is also the output of the previous unit. The symbols $\sigma$ and $\tanh$ represent the activation functions, while the training input and its corresponding output at time stamp $t$ are represented by $x_t$ and $y_t$. The reset gate and update gate are represented by $\Gamma_r$ and $\Gamma_u$.
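A single GRU time step can be sketched as follows using the symbols defined above. This is a minimal sketch of one common two-gate formulation, assumed rather than confirmed to match the equations of this work; the candidate weight matrix `w_c`, the `params` dictionary, and the helper names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic activation, written sigma in the text."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, vector, bias):
    """Return weights @ vector + bias for plain Python lists."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def gru_step(x_t, c_prev, params):
    """Compute c_t from the input x_t and the previous output c_{t-1}.

    Assumed formulation:
      Gamma_u = sigma(w_u [c_{t-1}, x_t] + b_u)             (update gate)
      Gamma_r = sigma(w_r [c_{t-1}, x_t] + b_r)             (reset gate)
      C_t     = tanh(w_c [Gamma_r * c_{t-1}, x_t] + b_c)    (candidate)
      c_t     = Gamma_u * C_t + (1 - Gamma_u) * c_{t-1}
    """
    concat = c_prev + x_t  # list concatenation: [c_{t-1}, x_t]
    gamma_u = [sigmoid(z) for z in affine(params["w_u"], concat, params["b_u"])]
    gamma_r = [sigmoid(z) for z in affine(params["w_r"], concat, params["b_r"])]
    gated = [r * c for r, c in zip(gamma_r, c_prev)] + x_t
    candidate = [math.tanh(z)
                 for z in affine(params["w_c"], gated, params["b_c"])]
    return [u * cand + (1.0 - u) * prev
            for u, cand, prev in zip(gamma_u, candidate, c_prev)]
```

When the update gate is close to 0, the unit copies $c_{t-1}$ forward almost unchanged, which is what lets gradients survive over long intervals. Some libraries swap the roles of $\Gamma_u$ and $1 - \Gamma_u$; the two conventions are equivalent up to relabelling.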

Spatial Features
A CNN is composed of convolutional layers, pooling layers, and fully connected layers. Convolutional layers are the core layers and are responsible for extracting local features. The features extracted by the previous layer are convolved with the convolutional kernel to form the output feature map $j$. This involves convolution over multiple input feature maps; the mathematical representation is given in equation (6).
Here, the feature map of the input convolutional layer $l$ and $C_j$ is represented by $t^{(l)}_j$, while the bias, output, and kernel of the convolutional layer are represented by $b^{(l)}_j$, $y^{(l)}_j$, and $w^{(l)}_{ij}$, respectively. A ReLU activation function $f$ is used throughout the network, and its mathematical representation is shown in equation (7): $f(x) = \max(0, x)$.
The pooling layer, also known as the downsampling layer, is mainly responsible for reducing the dimensions of the features. It has several variants, such as average pooling and max pooling.
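The convolution, activation, and pooling operations described above can be illustrated with a short plain-Python sketch. It assumes "valid" padding, a stride of 1, and the cross-correlation form that deep learning libraries usually call convolution; the function names and the pooling window size are hypothetical.

```python
def relu(x):
    """ReLU activation: f(x) = max(0, x)."""
    return max(0.0, x)


def conv1d(feature_maps, kernels, bias):
    """Produce one output feature map from several input feature maps.

    Each input map i is slid against its own kernel, the results are summed
    over i, the bias is added, and ReLU is applied. Valid padding and a
    stride of 1 are assumptions of this sketch.
    """
    k = len(kernels[0])
    out_len = len(feature_maps[0]) - k + 1
    output = []
    for pos in range(out_len):
        total = bias
        for fmap, kernel in zip(feature_maps, kernels):
            total += sum(fmap[pos + n] * kernel[n] for n in range(k))
        output.append(relu(total))
    return output


def max_pool1d(values, size=2):
    """Downsample by keeping the maximum of each non-overlapping window."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]
```

With a kernel of size 3, an input of length 5 yields an output of length 3, and a pooling window of 2 then reduces it to a single value; this is the dimensionality reduction the pooling layer provides.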

Network Architecture.
The GRU module captures long-range dependencies, so it can learn useful information from TS data using its memory cells. Nonsalient information is discarded by its gating mechanism. Its output is directly connected to the CNN module. In the proposed hybrid model, the GRU module consists of two layers with cell sizes of 32 and 64, respectively, followed by a two-layer CNN module with a kernel size of 3 and 64 filters in each layer. A ReLU activation is used for nonlinearity. A detailed summary of the proposed model is given in Table 1. The output features are then flattened, and a fully connected layer with 16 neurons is applied. MSE is used as the loss function during training, and the trained model is then evaluated on the testing data.
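To make the layer stack concrete, the sketch below traces the tensor shape of one sample through the architecture just described. Several details are assumptions that the text does not state: both GRU layers return the full sequence so that the convolution can slide along time, the convolutions use "valid" padding with a stride of 1, a single output neuron follows the 16-neuron layer, and the input window has 12 time steps with 5 features. The function name is hypothetical.

```python
def trace_shapes(timesteps, n_features, kernel_size=3):
    """Return (layer name, output shape) pairs for one input sample.

    In Keras terms the stack would roughly be
    GRU(32, return_sequences=True) -> GRU(64, return_sequences=True)
    -> Conv1D(64, 3, activation="relu") x 2 -> Flatten -> Dense(16) -> Dense(1),
    which is an assumed reading of the description, not the authors' code.
    """
    shapes = [("input", (timesteps, n_features))]
    # Temporal features first: two stacked GRU layers with 32 and 64 cells.
    shapes.append(("gru_1", (timesteps, 32)))
    shapes.append(("gru_2", (timesteps, 64)))
    # Spatial features second: two Conv1D layers, 64 filters, kernel size 3.
    length = timesteps
    for index in (1, 2):
        length = length - kernel_size + 1  # valid padding, stride 1 (assumed)
        shapes.append((f"conv1d_{index}", (length, 64)))
    shapes.append(("flatten", (length * 64,)))
    shapes.append(("dense_16", (16,)))
    shapes.append(("output", (1,)))  # single forecast value (assumed)
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes(timesteps=12, n_features=5):
        print(f"{name:10s} {shape}")
```

Under these assumptions, each convolution shortens the sequence by two steps, so a 12-step window is flattened into 8 × 64 = 512 features before the fully connected layers.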

Experimental Results and Discussion
In this section, we discuss the PV power datasets, the evaluation metrics, and a comparative analysis with state-of-the-art methods. The proposed model is implemented in the Python programming language using Keras (2.3.1) with the TensorFlow (1.14.0) deep learning framework. A Windows 10 machine with a GeForce RTX 2070 SUPER graphics card is used to speed up the training process; the complete details are given in Table 2. The details of the datasets are given in Table 3. All these datasets are recorded from active solar power generation plants at five-minute resolution with different power generation capabilities. They consist of different attributes, for example, power generation and meteorological elements such as wind speed and weather temperature. For the experiments, each dataset is divided into 70% for training, 20% for validation, and 10% for testing.
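The 70/20/10 split and the framing of hour-ahead samples can be sketched as follows. The sketch assumes that the split is chronological (no shuffling), that one hour corresponds to 12 steps at five-minute resolution, and that each sample pairs a fixed-length input window with the value one hour after the window ends; none of these details is stated in the text, and the function names are hypothetical.

```python
def chronological_split(series, train_pct=70, val_pct=20):
    """Split a series into train, validation, and test parts in time order."""
    n = len(series)
    train_end = n * train_pct // 100
    val_end = train_end + n * val_pct // 100
    return series[:train_end], series[train_end:val_end], series[val_end:]


def make_windows(series, lookback=12, horizon=12):
    """Build (input window, target) pairs for hour-ahead forecasting.

    With five-minute data, horizon=12 means the target lies one hour after
    the last value in the window. Both defaults are assumptions.
    """
    samples = []
    for start in range(len(series) - lookback - horizon + 1):
        window = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        samples.append((window, target))
    return samples
```

Splitting before windowing keeps every test target strictly later than the training data, which avoids leaking future values into training.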

Evaluation Metrics.
The performance of the proposed model is evaluated using four widely used forecasting metrics, namely MSE, MAE, RMSE, and MBE, which are mathematically expressed in equations (8) to (11).
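The four metrics can be computed as in the sketch below, which uses their standard definitions. The sign convention for MBE (forecast minus actual, so a positive value indicates over-forecasting) is an assumption, since conventions differ between papers; the function names are hypothetical.

```python
import math


def mse(actual, predicted):
    """Mean square error: average of squared differences."""
    return sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def mbe(actual, predicted):
    """Mean bias error: average signed difference (forecast minus actual)."""
    return sum(p - a for a, p in zip(actual, predicted)) / len(actual)
```

Because RMSE is the square root of MSE, the two can be cross-checked against each other: an MSE of 0.0271 corresponds to an RMSE of about 0.1646. MBE can be near zero even when MAE is large, because positive and negative errors cancel.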

Experimental Results and Discussions
The performance of several deep learning models is evaluated, namely LSTM, GRU, CNN-LSTM, CNN-GRU, LSTM-CNN, and finally the proposed GRU-CNN model.

Detailed Comparative Analysis.
To analyze the performance of the proposed model, we used four real-world PV power datasets, whose details are given in Table 3. In the literature, there are two types of feature extraction: the first extracts either spatial or temporal features alone, and the second is a hybrid model in which either the spatial or the temporal features are prioritized. Table 4 shows the one-hour-ahead PV power forecasting results of the different standalone and hybrid models. Here, the error rates (MSE, MAE, RMSE, and MBE) of the proposed hybrid model are lower than those of the standalone models. A graphical comparison of the forecasting results of the naïve (SVR), state-of-the-art (LSTM-CNN), and proposed models is given in Figure 3, while the visual representation of the proposed model on each dataset is given in Figure 4. The results reveal that the naïve forecasting method performs much worse than the state-of-the-art and proposed methods. As shown in Figure 4, there is a narrow gap between the actual values and the values forecasted by the proposed model. This gap is larger for state-of-the-art models and much larger for naïve forecasting models.
To summarize the experiments in Table 4, effective feature extraction in TS PV power forecasting correlates highly with the forecasting performance of deep learning models. In our case, the temporal features are extracted first, and their dimensionality is then reduced. Using a 1D-CNN to extract spatial features is an effective approach for modelling complex PV power forecasting patterns. However, extracting temporal features using an LSTM model is not effective because it uses a three-gate structure. Therefore, due to the high-dimensional features, the final layers of the LSTM are not able to recognize the complex patterns of PV power. The GRU, in contrast, uses a two-gate structure, so its feature space is small compared to that of the LSTM; as a result, the GRU requires fewer computations and achieves the highest accuracy. The performance of the GRU-CNN model on the four datasets indicates that the proposed model is suitable for deployment in real-world PV power forecasting at the MG.

Quantitative Evaluation.
In this section, the experimental results are discussed to compare the performance of our model with that of other deep learning models. Table 5 compares the performance of the proposed model with existing state-of-the-art models; its first part shows the results on the DKASC-AS-1B dataset. For instance, Wang et al. [45] used a 1D-CNN model and achieved values of 0.304 and 0.822 for MAE and RMSE, respectively.
Similarly, a hybrid approach was also used in which the spatial features were extracted with a CNN and an LSTM was then used to learn the temporal information. Table 5 also presents the performance on the DKASC-AS-2Eco dataset compared with existing techniques. In the baseline research [44], the authors' experiments with a multilayer perceptron (MLP) achieved values of 1.0861 and 0.1995 for RMSE and MBE, respectively. They also used an RNN and reported an RMSE of 1.0581 and an MBE of −0.1442. LSTM and GRU networks were also used for PV forecasting, achieving RMSE and MBE values of 1.0382 and −0.084 for the LSTM and 1.0351 and 0.1206 for the GRU. In the last model [44], the authors decomposed the power series into subseries using wavelet packet decomposition and then used the LSTM model, achieving values of 0.2357 and 0.0067 for RMSE and MBE, respectively. Our proposed model achieved superior performance of 0.1646, 0.0271, 0.1157, and −0.0641 for RMSE, MSE, MAE, and MBE, respectively, when compared to existing models. Finally, the performance of the proposed model is evaluated on the DKASC-Yulara-SITE-3A [70] dataset against state-of-the-art methods. Chen et al. [71] proposed a radiation coordinate classification method called RCC-LSTM for solar forecasting.

Conclusion
Accurate PV power forecasting plays an important role in avoiding penalties enforced by customers on production companies, building trust in the energy markets, and scheduling energy generation. Mainstream traditional and deep learning methods rely on simple features and consider only spatial or temporal features when modelling the inherent nonlinear patterns of PV power series. In the proposed framework, we investigated different feature extraction mechanisms and showed experimentally that extracting temporal features followed by spatial features outperforms the existing state-of-the-art methods. The proposed framework consists of three main steps. In the first step, preprocessing is applied to the input data to fill in the missing values and normalize the data. After normalization, the data is fed to the GRU-CNN model, which learns first the temporal and then the spatial features. Finally, the performance of the proposed model is evaluated against its rivals, showing better prediction ability with the lowest error rates and better generalization potential. In the future, we plan to deploy the proposed model on resource-constrained home appliance devices for energy management.

Conflicts of Interest
The authors declare that they have no conflicts of interest.