An Improved Load Forecasting Method Based on the Transfer Learning Structure under Cyber-Threat Condition

The smart grid is regarded as an evolutionary regime of existing power grids. It integrates artificial intelligence and communication technologies to fundamentally improve the efficiency and reliability of power systems. One serious challenge for the smart grid is its vulnerability to cyber threats. In the event of a cyber attack, grid data may be missing; subsequently, load forecasting and power planning that rely on these data cannot be carried out by generation centers. To address this issue, this paper proposes a transfer learning-based framework for smart grid scheduling that is less reliant on local data while remaining capable of delivering schedules with low operating cost. Specifically, the proposed framework contains (1) a power forecasting model based on transfer learning, which can provide high-quality load prediction with limited training data, (2) a novel adaptive time series prediction method that models time series from a covariate shift perspective and aims to train the forecasting model with a strong generalization capability, and (3) a day-ahead optimal economic power scheduling model considering a shared energy storage station.


Introduction
In recent years, the emergence of renewables and big data has prompted a reform of the electrical network. As a consequence, the concept of a smart grid has become increasingly popular [1]. A smart grid is defined as the next-generation electrical grid with power-flow control, self-healing, and energy reliability using digital communications. Compared to the conventional power system, the smart grid is designed to integrate millions of smart sensors and advanced computing technologies into the whole grid [2]; it can efficiently realize real-time automatic control, intelligent regulation, online analysis and decision making, cooperative interaction, and other advanced functions of the power grid. One feature of the smart grid is the high share of renewable generation, whose intermittency poses a threat to grid reliability. One widely adopted solution to this problem is to employ an energy storage system (ESS) in microgrids. Since wind and photovoltaic power are nondispatchable, the dispatching strategy of the ESS becomes an important component of a smart grid with renewable generation.
Most existing energy storage planning approaches rely on historical data or highly precise generation/load forecasts [3,4]. Current power forecasting methods can be broadly categorized into two types: methods based on statistical analysis [5] and methods based on artificial intelligence (AI) algorithms [6]. For methods in the first category, Bayesian theory, multivariable linear regression, and the autoregressive integrated moving average (ARIMA) model are often employed [7,8]. These methods are better suited for fitting data that have periodic features; the high percentage of renewable energy sources considerably increases the randomness of power variations, making methods in this category less suitable for applications in smart grids. Methods based on AI algorithms, on the other hand, are theoretically favorable for predicting the output of systems with high nonlinearity and complex dynamic properties. In particular, a recurrent neural network (RNN) [9] helps handle nonlinear problems and captures more dynamic relationships between the input and the forecasted output than the artificial neural network (ANN) [10], and the long short-term memory (LSTM) unit proposed in [11] further improves its performance in the prediction of time series data.
A noteworthy concern is that all the abovementioned methods require sufficient training data. The heavy reliance on information networks leads to higher cyber risks for smart grids [12]. In the event of a cyber-attack, the local database could be tampered with or lost, which leads to serious consequences [13]. For instance, Ukraine's electricity supply system was hacked in 2017 through an attack on the data aggregator, which is the central node containing data from data collection base stations [14]. This event caused massive power outages, paralyzed connected nodes, and prevented the control center from accessing customer load data in time for power control and scheduling operations. It is of great interest to reduce the grid control centers' heavy reliance on local users' data, so that day-ahead or long-term power dispatch is not blocked by insufficient data. The primary issue that needs to be addressed is load forecasting based on inadequate local data.
Transfer learning (TL) [15,16] is a suitable tool for addressing the challenges discussed above. It is of great value to introduce transfer learning in power systems to efficiently utilize resources from different regions, discover the commonality of different datasets, and establish transfer learning-based forecasting methods.
The key idea of transfer learning is to use existing experience to solve similar tasks, exploit similarities between data and models, and apply the trained content to new tasks. Specifically, transfer learning allows the use of knowledge in the dataset with complete labels (i.e., the source domain) to solve problems in the dataset with missing labels (i.e., the target domain), using a trained model with good generalization capability [17]. The research in [18] presents the outstanding contribution of transfer learning in the field of image processing. Innovative breakthroughs have also been made in classification and target detection [19] in recent years. Lu et al. [20] proposed a general transfer learning-based framework for load forecasting with limited data. The influence of adopting different kernel functions in transfer learning for fault diagnosis is studied by Li et al. in [21]. The authors of [22] investigate the superiority of transfer learning in extracting features and aim to predict the wind speed in different environments. Yin et al. [23] proposed a hybrid transfer learning-based wind power forecasting model. Unfortunately, the potential relationship between statistical properties in time series and transfer learning is ignored in these works.
One critical issue for developing transfer learning-based forecasting methods is how to train a forecasting model with strong generalization capabilities. Many published forecasting methods for smart grids are based on the assumption that historical data follow the same distribution. The authors of [24] made great improvements in machine learning-based load forecasting in certain areas, but the generalization ability of their method is limited because the differences between data distributions are not considered. In a typical grid, especially one with high penetration of renewable generation that introduces high stochasticity, the distribution of the data in the temporal structure changes over time. Consider the illustrative graph in Figure 1: the probability distribution $P(x)$ varies across intervals, and the temporal covariate shift phenomenon appears after a new segment of data is added, where $P_a \neq P_b \neq P_c \neq P_{test}$. Here, the aforementioned issue is embodied in two aspects: first, how to build an adaptive prediction model that weakens the effect of covariate shift and accommodates the diversity of sample data; second, how to develop a probability distribution algorithm that minimizes the divergence between the distributions of different intervals.
Load prediction provides basic data for generation planning, day-ahead market offers, and intraday market trading, and it is important for the economic dispatch of the power system. In a wind/photovoltaic/energy storage complementary microgrid, the generation plan of the energy storage is the only dispatchable part. Researchers in [25] proposed a general method for determining the capacity and power of energy storage batteries and constructed a capacity allocation scheme for them. Researchers in [26] introduced the control and communication technology and the operating principle of cloud energy storage based on the Irish power system. At present, research on shared energy storage is in its initial stage. Existing work takes the shared energy storage system as the main research object and analyzes its business model and profitability, while in-depth research on the charging and discharging behavior and the economic benefits of users participating in the shared energy storage system is limited.
This paper introduces shared energy storage plants among different user groups and establishes an optimal scheduling model with the objective of minimizing the daily operation cost of the user groups. The main contributions of this paper are summarized as follows:
(1) To address the challenge that the generation center fails to develop power planning for grid operation due to inadequate local user data, we propose a power forecasting method based on transfer learning, where data from the source domain provide valuable reference information. Additionally, case studies provide a detailed analysis of how to choose an appropriate source domain, and the effect of negative transfer on model performance is analyzed.
(2) This work proposes to model the time series of load prediction from the covariate shift perspective. To train a forecasting model with strong generalization capability with transfer learning, we generate a combination mode in which the probability distributions of the time series vary across intervals, and an optimal split method is proposed to ensure that the resulting segments are the most dissimilar ones. Temporal distribution matching algorithms are proposed to minimize the divergence between the distributions of different intervals. Dynamic programming (DP) is applied to determine the optimal division points. The case study shows that the maximum improvement of the proposed forecasting method is up to 52.8% in mean absolute percentage error (MAPE) compared to other transfer learning-based methods, and up to 64.4% compared to the traditional method.
(3) With the accurate load prediction obtained from (1) and (2), a novel shared energy storage station (ESS) concept is proposed to form a framework for optimal scheduling based on the transfer learning method. The case study shows that the proposed framework can reduce the overall operating cost of the microgrid and maximize the benefit of grid operation, thus addressing the issue of energy curtailment [27] as well as the high cost of energy storage.
This paper is organized as follows. Section 2 presents the structure of the transfer learning-based forecasting algorithm with a limited dataset and the modeling of time series from a covariate shift perspective, which is critical to the generalization ability of forecasting models. Section 3 proposes a framework of optimal dispatch planning for the distributed microgrid, in which a shared energy storage station (ESS) is formed as a solution for multisource power grid scheduling, and analyzes the dispatchability of the cyber-attacked area under the proposed optimal economic dispatching method. The performance of transfer learning in addressing fragmented test data of the target domain is then examined. Moreover, using data obtained from the proposed forecasting method, the economic analysis of the proposed energy storage station is shown in Section 4. Finally, the main findings are summarized in Section 5.

Methodology of the TCS-Transfer Learning Model

2.1. Transfer Learning-Based Structure.
In most machine learning tasks, the training and test sets come from the same feature space and follow the same probability distribution. When these conditions are not satisfied, collecting data from the target domain to retrain a model takes a lot of resources. In particular, in a feature-rich smart grid, retraining the model is inefficient and time-consuming, which can seriously disrupt the schedule of the generation center. Transfer learning [28] offers a solution to this problem. Maximum mean discrepancy (MMD) is applied in this paper to define a more specific formula for the TL problem: (1) where $H$ represents the vector space that satisfies the objective function; $N_s$ and $N_t$ are the numbers of samples in the source domain $D_s$ and the target domain $D_t$, respectively; $l(x, y)$ denotes the loss function suitable for different structures; and $x$ and $y$ represent the sample and label, respectively. Argmin denotes the value of the variable that minimizes the objective function, and $\xi$ represents the metric coefficient. This paper selects root mean squared error (RMSE) as the measurement of error. Among many statistical measures, MMD is one of the most widely used distance metrics in TL and is an effective way to measure the discrepancy between any two different domains $D_i$ and $D_j$. It can be formulated using the following equations: where $D_i$ and $D_j$ represent any two different domains with sample sizes $m$ and $n$, and $k(\cdot, \cdot)$ represents the Gaussian kernel function. The radial basis kernel function $k(x, x')$ is used for mapping to higher-dimensional spaces; $x'$ is the center point of the kernel function, and $\sigma$ denotes the bandwidth, which controls the range of action of the Gaussian kernel function: the larger its value, the larger the local range of influence of the kernel.
$F$ denotes the unit ball in the reproducing kernel Hilbert space (RKHS). It is worth noting that not all source domains are suitable for selection. Krizhevsky et al. [29] provide an in-depth analysis of the role of pretraining models for transfer tasks, in which a pretrained model can be used as a benchmark model for the task of the target domain. Researchers in [30,31] also demonstrate that, for datasets with different distributions, pretraining in an appropriate source domain can greatly improve the accuracy of the results, whereas a source domain that has a weak correlation with the target domain may cause negative transfer. Therefore, it is essential to analyze the discrepancy between $D_s$ and $D_t$; this paper measures the difference by calculating the MMD value, and a source domain can be selected for pretraining if its MMD is below the preset threshold value.
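The MMD-based source selection described above can be illustrated with a minimal sketch. It assumes the commonly used biased empirical estimator of squared MMD with a Gaussian kernel on scalar samples, which may differ in detail from the paper's equations. The function names, the region names, the toy data, and the threshold of 0.1 are all hypothetical.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """k(x, y) = exp(-(x - y)^2 / (2 * sigma^2)) for scalar samples."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    """Biased empirical estimate of squared MMD between two 1-D samples."""
    m, n = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / (m * m)
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / (n * n)
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy

def select_source_domains(target, candidates, threshold, sigma=1.0):
    """Keep candidate domains whose squared MMD to the target is below the threshold."""
    scores = {name: mmd2(data, target, sigma) for name, data in candidates.items()}
    return {name: score for name, score in scores.items() if score < threshold}

# Hypothetical normalized load samples for a target region and two neighbors.
target = [0.9, 1.0, 1.1, 1.2, 1.0]
candidates = {
    "region_a": [1.0, 1.1, 0.8, 1.0, 1.3],  # load level close to the target
    "region_b": [4.0, 4.2, 3.9, 4.1, 4.0],  # load level far from the target
}
selected = select_source_domains(target, candidates, threshold=0.1)
print(selected)
```

In this toy setting only `region_a` passes the threshold, which mirrors the idea that a weakly related source domain should be excluded to avoid negative transfer.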

Problem Formulation of Temporal Covariate Shift (TCS).
The variate $x_i$ of the time series is assumed to follow the same probability distribution in most existing prediction methods. This assumption may yield satisfactory results on specific datasets, such as the load of a stand-alone device in a traditional prediction scenario, where diversity is relatively limited. However, the assumption does not hold in actual applications: owing to the huge amount of data and features, the variation of data distributions over time cannot be ignored.
Thus, our problem can be formulated as follows: split a given time series $S$ with $m$ labeled samples into $k$ segments $S_1 \ldots S_k$ with the most dissimilar distributions.
With reference to the definition of covariate shift [32] in the classification field, the temporal covariate shift can be described as follows: all intervals in the same period $i$ follow the same probability distribution $P_{S_{p,q}}(x, y)$, and the distribution differs when the time period changes. To train a prediction model with excellent generalization performance under temporal covariate shift, the main issue is to capture the common knowledge shared among different periods of $S_{p,q}$.
According to the principle of maximum entropy [33,34], finding intervals that are the most distinct from each other can help maximize the capacity of shared information within a time series under temporal covariate shift. It is reasonable to make the distributions of the intervals as diverse as feasible in order to maximize the entropy of the overall distributions of an array. This enables more generic and adaptable modeling of future data. Worst-case training using the original sequence enables the model to cope with the stochastic nature of the unknown data. Figure 2 shows the structure of the proposed prediction method based on transfer learning, whose primary task is to split the time series into $k$ segments. The splitting problem of the time series in (5) is solved by the greedy algorithm.
where the array λ[1...m] represents a time series of length m, and MMD(·, ·) denotes the distribution-based distance function that measures the distance between the distributions of any two segments S p ,
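The splitting idea can be sketched as follows. This is not the paper's dynamic programming or greedy procedure: for clarity it uses a plain exhaustive search over a small set of candidate cut points, and it assumes the objective is the mean pairwise squared MMD between segments. The function names, the candidate grid, and the toy series are hypothetical.

```python
import math
from itertools import combinations

def mmd2(xs, ys, sigma=1.0):
    """Biased squared MMD with a Gaussian kernel for 1-D samples."""
    def k_mean(a, b):
        total = sum(math.exp(-((u - v) ** 2) / (2.0 * sigma ** 2)) for u in a for v in b)
        return total / (len(a) * len(b))
    return k_mean(xs, xs) + k_mean(ys, ys) - 2.0 * k_mean(xs, ys)

def split_score(series, cuts, dist=mmd2):
    """Mean pairwise distance between the segments defined by the cut indices."""
    bounds = [0, *cuts, len(series)]
    segments = [series[s:e] for s, e in zip(bounds[:-1], bounds[1:])]
    pairs = list(combinations(segments, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

def most_dissimilar_split(series, k, candidate_cuts, dist=mmd2):
    """Try every choice of k - 1 candidate cuts and keep the highest-scoring split (k >= 2)."""
    best_cuts, best_score = None, -math.inf
    for cuts in combinations(sorted(candidate_cuts), k - 1):
        score = split_score(series, cuts, dist)
        if score > best_score:
            best_cuts, best_score = list(cuts), score
    return best_cuts, best_score

# Toy series with three regimes; candidate cuts every 10 steps.
series = [0.0] * 20 + [3.0] * 20 + [6.0] * 20
cuts, score = most_dissimilar_split(series, k=3, candidate_cuts=range(10, 60, 10))
print(cuts, round(score, 3))
```

Exhaustive search grows combinatorially with the number of candidate cuts, which is one reason a dynamic programming or greedy strategy is preferable on real load series.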

Matching Process and Fine-Tune.
The proposed method is designed to acquire the common information shared by distinct intervals by comparing their probability distributions. This section presents how to pretrain a model after obtaining the optimal split intervals in the raw data. In comparison to approaches that depend only on local or statistical knowledge, the pretrained model $M$ can generalize well to unknown datasets of the target domain. The pretraining problem formulated in Section 2.2 can be solved by the domain generalization (DG) [35] method, and the distribution matching loss of the network can be established as follows: (6) where $L(\cdot, \cdot)$ denotes the MSE loss in the source domain, and $S_i$ and $S_j$ are any two different intervals of λ[1 . . . m].
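As a rough illustration of a distribution matching loss, the sketch below assumes the total loss is the average per-segment MSE plus a weighted mean pairwise distance between segment-level features. The exact form of (6) is not recoverable from the text, so this form, the `weight` parameter, and the stand-in distance `mean_gap` (used instead of MMD to keep the example short) are assumptions.

```python
from itertools import combinations

def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def mean_gap(a, b):
    """Stand-in distribution distance: squared difference of sample means."""
    return (sum(a) / len(a) - sum(b) / len(b)) ** 2

def matching_loss(preds, targets, features, weight=0.5, dist=mean_gap):
    """Average prediction loss plus weighted mean pairwise distance between segment features."""
    pred_loss = sum(mse(p, t) for p, t in zip(preds, targets)) / len(preds)
    pairs = list(combinations(features, 2))
    match_loss = sum(dist(a, b) for a, b in pairs) / len(pairs)
    return pred_loss + weight * match_loss

# Two hypothetical segments with their predictions, labels, and hidden features.
preds = [[1.0, 2.0], [3.0, 5.0]]
targets = [[1.0, 2.0], [3.0, 3.0]]
features = [[0.0, 2.0], [2.0, 4.0]]
loss = matching_loss(preds, targets, features)
print(loss)
```

Minimizing the second term pushes the network toward representations whose distributions agree across segments, which is the mechanism the text relies on for generalization.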
In the proposed method, LSTM is employed as the main body of the network structure. Owing to the special memory unit of LSTM, potential relationships between data in a time series can be preserved, which supports high accuracy of the prediction results. Compared to the conventional recurrent neural network algorithm, which contains only one state $h_t$, the LSTM structure introduces the cell state $c_t$ to capture potential relationships in a long time series. The LSTM structure can be formulated as follows:

Computational Intelligence and Neuroscience

where $x_t$ is the input data; $W_{i,c,o}$ represent the weight matrices; $i_t$, $f_t$, and $o_t$ are the input, forget, and output gates of the LSTM structure, respectively; and $b$ represents the bias value. $c_{t-1}$ denotes the state of the memory cell at the previous step. The candidate value $\tilde{c}_t$ is generated by the tanh layer. $h_t$ is the output value, $h_{t-1}$ is the output value of the previous unit, the sigmoid function is denoted as $\sigma$, $*$ represents the dot product, and $\odot$ is the element-wise product. Figure 3 shows the structure of LSTM; the type of input data is described in Section 4. Figure 4 shows the flowchart of the proposed transfer learning-based forecasting structure. The implementation steps of the whole network are as follows:
(1) Collect the raw data of power fluctuations from neighboring cities and calculate the probability distribution of the candidate datasets.
(2) Calculate the distance between the source and target domains according to eq. (2) and select the appropriate source domain as the input data of the pretraining model.
(3) Divide the selected source domain into the $k$ most dissimilar segments using the proposed dynamic programming-based method according to eqs. (4) and (5).
(4) Treat the $k$ segments as different domains. Eq. (6) is used as the new loss function in the model with LSTM as the main network. After a prediction network with strong generalization capability is obtained, the target domain, which has very little data, is used for fine-tuning with eqs. (1)-(3).
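For reference, the LSTM gate updates described above are commonly written as below. This is the standard textbook formulation rather than a reconstruction of the paper's own equation; the forget-gate weight $W_f$ and the per-gate biases $b_f$, $b_i$, $b_c$, $b_o$ are notation assumed here.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right), \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \\
\tilde{c}_t &= \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \\
h_t &= o_t \odot \tanh(c_t).
\end{aligned}
```

The additive update of $c_t$ is what lets information persist across many time steps, which matters for the 168-step input windows used later in the case studies.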
The proposed temporal covariate shift formulation focuses on an easily neglected problem in time series. Dynamic programming solves the optimal split problem of the time series, treating the fragments as $k$ independently distributed individuals.
The generalization ability of the pretraining model using source domain data is greatly improved by taking the differences between these segments as an additional consideration in the loss function. The deep LSTM network can be trained with less learning time while overcoming the lack of sufficient local data for training.

Smart Grid with Multiple Sources of Energy Supplies.
Smart grids have been increasingly used in recent years for both residential and industrial purposes. In many microgrids, a hybrid wind-solar generation plant is employed owing to the complementary nature of wind and solar generation patterns. Therefore, the smart grid with renewable generation is studied in this paper, as shown in Figure 5. From the perspective of the Internet of Things (IoT) network structure of the smart grid, data are collected by a large number of smart meters in the users' network, encrypted by the gateway, gathered by the data aggregator, and then transmitted to the control center; this network data distribution characteristic is called the "funnel effect" [36]. In a smart grid with deeply integrated information systems, faults can propagate in more ways. When a node in the network fails, it triggers the failure of related nodes, and the existing scheme requires all users to work collaboratively: as long as one user fails, the whole system cannot operate normally. The transfer learning-based forecasting method proposed in this paper can substantially reduce the dependence of generation centers on the adequacy of local data.
The generation plant of the microgrid is divided into $N$ clusters, each having wind turbine and solar photovoltaic (PV) systems. The output power of the wind/PV system in the $j$-th cluster can be expressed as follows: where $P_{w,j}$, $P_{PV,j}$, and $P_{D,j}$ represent the output power of wind turbines, photovoltaic systems, and diesel systems, respectively. In modern microgrids, generators are usually used as backup power sources, so the term $P_{D,j}(t)$ in (8) can be ignored if no power failure occurs. The output can be further expressed as a per-unit (p.u.) value denoted by $P^{*}_{G,j}(t)$ as in eq. (9), where $P_{G,base}(t)$ denotes the base power in the DC bus, $P_{G,j}(t)$ is the actual power value of the DC bus, and $P_{Bat}(t)$ represents the power of the battery.

Operation Mode and Market Rules of a Shared Station.
The concept of a shared energy storage plant is shown in Figure 5. The operator of an energy storage station uses its financial advantage to establish a large shared energy storage plant among a group of customers and unifies the operation and management of the station to provide shared energy storage services to multiple customers in the same distribution network area. The generation center forecasts the load based on historical electricity consumption data and plans the charging and discharging of the shared energy storage plant, which minimizes the operating cost of the storage devices and saves customers the investment cost of installing and maintaining them. Power market rules require the dispatch plan to be submitted to the grid operator 24 hours ahead. Figure 6 depicts the framework of the proposed power dispatch schedule. The main purpose of this work is to solve the problem that the prediction process cannot operate due to insufficient local data; thus, the power data of the neighboring area are employed to train highly generalizable models, and a precise prediction model is obtained after fine-tuning with a small amount of local data. The generation center of the shared energy storage station transmits the remaining power of users who need charging directly to users who need discharging according to the charging and discharging demand of each user at each time. The station can exploit the complementary nature of customers' power consumption behavior, i.e., the difference in consumption of the same customers at different times and of different customers at the same time, and can thus make the most of a minimal investment in energy storage to meet customers' demand for energy storage use. The specific optimization strategy of shared storage stations is detailed in Section 3.3.

Optimal Economic Scheduling of Shared Energy Storage Station.
Having explained the forecasting method and the possible operational modes of the microgrid, the next task is to develop the dispatch strategy for the shared ESS to economically benefit users. The high investment cost of energy storage is the main factor limiting the application of energy storage technology on the demand side of the grid. This paper proposes the concept of a shared energy storage station, as shown in Figure 7, which is applied to the economic optimization scheduling of regional users; the minimum daily operating cost of the user group is achieved by coordinating the charging and discharging power of the users. The energy storage system allows users to store electricity during grid valley hours and release it during peak hours, thereby decreasing electricity costs and relieving the pressure of peak load regulation. According to the charging demand and discharging demand of each user in each period, the generation center will deliver the remaining electric energy of the user who needs to discharge directly to the user who needs to charge. If the total charging and discharging demand of users in the same time period is a net discharge, the regulation center will decide whether the users' electricity needs to be purchased from the main grid or stored in the shared power station according to the electricity price at that time.

Based on the prediction method proposed in the previous sections, this part develops a scheme to obtain the minimum grid operating cost through proper charging and discharging of the shared energy storage station. By using the shared energy storage station, users save the investment costs for the installation and maintenance of energy storage devices. Users pay a service fee to the generation center in exchange for shared energy storage services. The service fee is what users pay to the generation center when they use the shared energy storage station for charging and discharging; it is set to 0.16 $/kWh.

Objective Function of the Optimization Model.
According to the market rules presented in Section 3.2, the user group connected to the shared energy storage station uses the typical daily operating cost optimization as the objective function to determine the capacity, the maximum charging and discharging power of the energy storage station, and the charging and discharging power of the storage station for each time period of the user. e daily operating cost of the customer group includes the cost of electricity purchased from the grid and the service fee paid to the energy storage station.
where C represents the daily cost of electricity for the user community; C g and C s denote the cost of electricity purchased by the customer from the grid and the service fee paid to the energy storage station, respectively.
where $N$ represents the number of user groups connected to the same shared energy storage station (user groups of three areas are selected as case studies in this paper, so $i \in [1, 3]$); $\rho(t)$ ($/kWh) represents the price at which users purchase electricity from the grid; $T$ denotes the number of scheduled time periods; $P_{G,i}(t)$ indicates the power purchased by user $i$ from the grid in time interval $t$; $\Delta t$ represents the unit time length of the scheduled power; $\delta(t)$ is the service fee of the shared energy storage station; and $P_{E,D,i}(t)$ and $P_{E,C,i}(t)$ are the discharging power and charging power of the energy storage station at time $t$, respectively.
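The cost terms defined above can be illustrated with a short sketch. The summation form is inferred from the symbol definitions, not taken from the paper's equations: it assumes $C = C_g + C_s$, with $C_g$ summing $\rho(t) P_{G,i}(t) \Delta t$ and $C_s$ summing $\delta (P_{E,D,i}(t) + P_{E,C,i}(t)) \Delta t$ over users and periods, and a constant fee of 0.16 $/kWh. The function name and the toy values are hypothetical.

```python
def daily_cost(price, grid_power, discharge, charge, fee=0.16, dt=1.0):
    """Return (C, C_g, C_s) from per-user lists of per-period power values in kW."""
    # Cost of electricity purchased from the grid, summed over users and periods.
    c_grid = sum(price[t] * p * dt for user in grid_power for t, p in enumerate(user))
    # Service fee on energy exchanged with the shared storage station.
    c_service = sum(
        fee * (d + c) * dt
        for dis_user, ch_user in zip(discharge, charge)
        for d, c in zip(dis_user, ch_user)
    )
    return c_grid + c_service, c_grid, c_service

# Two users, two one-hour periods (valley price, then peak price).
price = [0.10, 0.30]                      # $/kWh
grid_power = [[10.0, 0.0], [5.0, 5.0]]    # kW bought from the grid
discharge = [[0.0, 4.0], [0.0, 0.0]]      # kW drawn from the station
charge = [[2.0, 0.0], [0.0, 0.0]]         # kW sent to the station
total, c_g, c_s = daily_cost(price, grid_power, discharge, charge)
print(total, c_g, c_s)
```

In an optimization model these sums would be the objective, with the power values as decision variables rather than fixed inputs.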

Constraint Condition.
The proposed power planning model must satisfy the following constraints, which include electrical power balance constraints [26] and operational constraints of the energy storage stations:
(1) Power balance constraint of the whole grid:
$$P_G(t) + P_{PV,i}(t) + P_{W,i}(t) + P_{ESS}(t) - P_{L,i}(t) = 0,$$
where $P_G(t)$ is the power purchased from the main network; $P_{PV,i}(t)$ represents the output power of the PV system of the $i$-th user in time interval $t$; $P_{W,i}(t)$ is the wind power of the $i$-th user in time interval $t$; $P_{L,i}(t)$ denotes the load power of the $i$-th user, which is a predicted value obtained by the method proposed in the previous section; $P_{ESS}$ is the power of the energy storage station, which covers the maximum power difference between generation and load in the grid for any period of time; and $P_{E,D,i}(t)$ and $P_{E,C,i}(t)$ represent the power in the discharging and charging states of the energy storage system.
(2) Charging and discharging power constraints for a shared energy storage station:
where $P_{\max}$ is the rated maximum charging and discharging power of the energy storage station; $\delta^{\max}_{ESS_D}$ and $\delta^{\max}_{ESS_C}$ are defined as the discharging and charging state factors, respectively, which ensure that the energy storage is neither overcharged nor overdischarged; $SOC_{\min}$ and $SOC_{\max}$ represent the operating range of the energy storage; $SOC_l$ and $SOC_h$ represent the optimal working interval of the energy storage; $SOC$ denotes the state of charge of the energy storage; $SOC_0$ is the initial state of charge; $\eta$ is the charge and discharge efficiency; $Q$ represents the electric charge quantity; and $I$ is the battery current.
(3) Power balance constraint of the energy storage station: the charging and discharging power of each user in a time period $t$ needs to be balanced with the charging power $P_C(t)$ and discharging power $P_D(t)$ of the energy storage station.
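The state-of-charge limits can be checked with a simple coulomb-counting sketch built from the symbols $SOC_0$, $\eta$, $Q$, and $I$. It assumes $SOC(t) = SOC_0 - \eta \sum I \Delta t / Q$ with positive current meaning discharge and a single efficiency applied in both directions, which is a simplification and not necessarily the paper's constraint. The limits 0.1 and 0.9 and all numeric values are hypothetical.

```python
def soc_trajectory(soc0, currents, capacity_ah, eta=0.95, dt_h=1.0):
    """Coulomb-counting SOC after each period; positive current means discharging."""
    soc, trajectory = soc0, []
    for current in currents:
        soc -= eta * current * dt_h / capacity_ah
        trajectory.append(soc)
    return trajectory

def within_limits(trajectory, soc_min=0.1, soc_max=0.9):
    """True if every SOC value stays inside [soc_min, soc_max]."""
    return all(soc_min <= s <= soc_max for s in trajectory)

# Discharge 10 A for two hours, then charge at 20 A for one hour (100 Ah battery).
plan = soc_trajectory(soc0=0.5, currents=[10.0, 10.0, -20.0], capacity_ah=100.0, eta=1.0)
print(plan, within_limits(plan))

# A plan that drains the battery below the lower limit is rejected.
deep = soc_trajectory(soc0=0.5, currents=[50.0], capacity_ah=100.0, eta=1.0)
print(deep, within_limits(deep))
```

A scheduler would apply such a check, or the equivalent linear inequalities, to every period of the 24-hour dispatch plan.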

Resolve Method.
The few nonlinear terms in the abovementioned constraints can significantly increase the difficulty and time of numerically solving the optimization problem. To overcome this challenge, the Big-M method [27] is adopted to linearize the nonlinear constraints in this work. The user scheduling model based on shared energy storage stations can then be converted to a mixed integer linear programming problem.
Thus, (15) can be reformulated as follows: where the value of $M$ is set to $10^8$ in this paper.
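As an illustration of the Big-M technique, suppose the nonlinear term is the complementarity condition that the station cannot charge and discharge in the same period. This is an assumption, since the original form of (15) is not recoverable here, and the binary variable $u(t)$ is introduced only for this sketch.

```latex
% Assumed nonlinear constraint: P_C(t) \, P_D(t) = 0, with P_C(t), P_D(t) \ge 0.
% Big-M linearization using a binary indicator u(t):
\begin{aligned}
0 \le P_C(t) &\le M \, u(t), \\
0 \le P_D(t) &\le M \, \bigl(1 - u(t)\bigr), \\
u(t) &\in \{0, 1\}, \qquad M = 10^{8}.
\end{aligned}
```

When $u(t) = 1$ only charging is possible, and when $u(t) = 0$ only discharging is possible, so the product term vanishes without appearing in the model. In practice a smaller $M$, such as the rated power limit, often gives the solver a tighter relaxation.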

Setup and Experimental Result of the Proposed Forecasting Method.
Field load data of three regions from Western Australia are utilized for the case studies, covering the time range of May 1, 2015, to July 1, 2021. A map of the regions of the presented case studies is shown in Figure 8, where Case 1 and Case 2 are two typical industrial electricity consumption areas, and Case 3 is a residential user group. Each node represents a region of independent microgrids that do not interfere with each other and are powered primarily by renewable energy. Of the historical data, 80% are used as the training set and the remaining 20% as a validation set (a ratio of 8 : 2); the test set comes from the real load data of users. When load forecasting is implemented, 168 steps are batched together to train the model and predict the net power in the next 24 steps; the sampling interval is 1 hour, and the forecasting horizon covers 24 hours. The input features mainly consist of the population, temperature, and calendar data, which contain the season, number of holidays, and weekdays. The features are summarized in Table 1. Figure 9 shows the raw data collected from the three cases presented in Figure 8. As shown by the yellow line, part of the data in the western mining smelter (Case 2) is absent because data loss occurred due to aggregator overload. Common deep learning-based prediction models are not universal across data sources, and each region needs to train a locally applicable prediction model based on its own database. According to the proposed transfer learning-based model, after comparing the MMD value of each database associated with the neighboring grid, the database of a microgrid in western mining Kambalda is chosen as the source domain.
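The windowing described above (168 hourly inputs, 24 hourly targets, 8 : 2 split) can be sketched as follows. The stride of one step and the chronological split are assumptions, since the text does not state how windows overlap or how the split is ordered. The function names and the synthetic series are hypothetical.

```python
def make_windows(series, n_in=168, n_out=24):
    """Pair each n_in-step history with the n_out steps that follow it."""
    samples = []
    for start in range(len(series) - n_in - n_out + 1):
        history = series[start:start + n_in]
        future = series[start + n_in:start + n_in + n_out]
        samples.append((history, future))
    return samples

def chronological_split(samples, train_ratio=0.8):
    """Keep the earliest windows for training and the latest for validation."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

# Thirty days of synthetic hourly load with a daily pattern.
hourly_load = [float(hour % 24) for hour in range(24 * 30)]
samples = make_windows(hourly_load)
train, valid = chronological_split(samples)
print(len(samples), len(train), len(valid))
```

A chronological split keeps later observations out of the training set, which avoids leaking future information into a forecasting model.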
To demonstrate the performance of the proposed structure, this work compares the proposed TCS-transfer learning model with four categories of methods:
(i) The traditional time series forecasting model LSTM with classical gradient descent. BP [37], ANN, and CNN [38] also belong to this category; LSTM is selected as the most applicable time series prediction model among them.
(ii) The recent time series structure ELM [39], which has no backward propagation, so the convergence time is substantially reduced and the training efficiency is greatly improved. The first two categories are classical methods of time series forecasting.
(iii) Variants of popular domain adaptation methods, namely MEDA-LSTM [40] and MMD-RNN [41], which are also based on the concept of transfer learning.
(iv) A transformer with an attention mechanism [42,43], a branch of transfer learning with a stronger generalization capability than the classical methods. This paper uses a transformer with 6 encoder blocks and 8 self-attention heads.
The relevant parameters are listed in Table 2. For each model, the parameters that produce the best performance are tuned by K-fold cross-validation.
The following comparison discusses the effectiveness of the proposed TCS-transfer learning model from two aspects. First, the power grid of the western mining smelter (Case 2) has missing data, so the traditional methods in categories (i) and (ii), such as LSTM and ELM, fail to forecast the load power from the fragmented dataset. Among the compared methods, only those based on transfer learning can handle missing data in the test set. The first part of the comparison in this section therefore uses different transfer learning-based approaches to predict the load power of Case 2. Second, this paper shows that the proposed method is also suitable for time series forecasting with a complete dataset: the comparison group for Case 1 and Case 3 contains both traditional and transfer learning-based methods.
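The K-fold tuning mentioned above can be sketched in a few lines. The text does not say how the folds were formed or which score was used, so the contiguous folds, the lowest-mean-score rule, and the names `k_fold_indices`, `select_by_cv`, and `score_fn` are all assumptions made for illustration.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, valid_idx) for k contiguous folds of nearly equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


def select_by_cv(candidates: Sequence, score_fn: Callable, n_samples: int, k: int = 5):
    """Return the candidate with the lowest mean validation score over k folds.

    score_fn(candidate, train_idx, valid_idx) is a user-supplied callable that
    trains on train_idx and returns an error (e.g. RMSE) on valid_idx.
    """
    def mean_score(candidate):
        scores = [score_fn(candidate, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        return sum(scores) / len(scores)

    return min(candidates, key=mean_score)
```

Contiguous, unshuffled folds are used here because neighboring hourly samples are strongly correlated; a walk-forward scheme that only validates on data later than the training data is a common alternative for time series.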

Performance of Transfer Learning in Addressing Fragmented Test Data in the Target Domain.
This section discusses the differences between the proposed TCS-transfer learning model and other transfer learning-based methods in load forecasting. In the first set of experiments, the transformer, MMD-RNN, and MEDA-LSTM serve as baselines for the proposed approach; the forecast load results are shown in Figure 10. Table 3 presents the prediction error of each method in terms of RMSE and MAPE. These prediction errors are evaluated by
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{x}_t - x_t\right)^2}, \qquad \mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{\hat{x}_t - x_t}{x_t}\right|,$$
where $\hat{x}_t$ denotes the predicted value of sample $t$, $x_t$ is the actual value, and $n$ represents the number of samples. The convergence times of the different forecasting methods, measured on a machine with an Intel Core i9-12900K processor, are listed in Table 4. The proposed TCS-transfer learning structure converges in 0.34 s, a 60.4% improvement over the MMD-RNN, and its prediction accuracy (MAPE) is improved by 52.8% over the RNN-based method. Although the proposed method converges slightly more slowly than the transformer, it achieves a large improvement in prediction accuracy.
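For reference, the two error metrics named above can be computed as in the sketch below, using their standard definitions. The helper `relative_improvement` is an assumption about how percentage gains such as those quoted in this section are presumably derived from two error values; the text does not state the formula.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error between predicted and actual values."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / n)


def mape(pred: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    n = len(actual)
    return 100.0 / n * sum(abs((p - a) / a) for p, a in zip(pred, actual))


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Percentage reduction of an error relative to a baseline (assumed formula)."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

MAPE is undefined when an actual value is zero, which can matter for net power series that cross zero when PV output matches demand; RMSE has no such restriction but depends on the scale of the load.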
To evaluate the effect of different splitting methods on the training results, two additional ways of dividing the time series, splits A and B, were designed in this work and are compared in Figure 11. Split A divides the sequence randomly into k segments; split B makes all intervals have similar distributions, which is the opposite of our proposed split method. The proposed split method aims to minimize the cost function in (5). In Figure 11, the green line shows the distribution distance (MMD), labeled "distance" on the Y-axis, and the blue line shows the RMSE. The results indicate that it is critical to divide the periods according to the worst case, where the distributions are the most varied.
Figure 12 shows the results of power dispatch solved according to (10)-(19) for the cases whose locations are illustrated in Figure 8. It can be observed from Case 1 in Figure 12 that the PV output power of the user group in the morning is less than the demand-side power; the electricity demanded in this period is purchased from the grid and partly drawn from the shared energy storage station, according to the optimal economics. When the PV power is higher than the demanded power, the surplus power within the community is stored by the energy storage station to avoid energy curtailment. During 15:00-20:00, the demand of the community cannot be met by the PV system; because the electricity price is high in this period, the shortfall is supplied by the shared energy storage station. Table 5 provides the electricity rates obtained from the energy provider. The configured shared energy storage station has a capacity of 2,508 kWh and a maximum charge/discharge power of 637 kW. It can be observed from Figure 13 that during 01:00-06:00, the users' electricity is purchased from the main grid and the energy storage station does not supply electricity to the customers.
During 10:00-17:00, the energy storage station is in a charging state and its stored energy rises from 382 kWh to the maximum value of 2,000 kWh. A negative charge/discharge power value means that the energy storage station is charging, and a positive value means that it is discharging. As can be seen from Figures 12 and 13, the electric loads of communities 1-3 reach a balanced state with no energy curtailment. Meanwhile, the energy storage station returns to its initial operating state after one cycle of operation, which ensures normal operation in the next cycle. Table 6 compares the daily cost of each case under different energy storage configurations. The first configuration is an independent energy storage system within each user; its total operating cost is AUD 2618.6, which is 36% more expensive than our proposed planning method. With the shared station, the energy storage capacity required by the customers is reduced by 37.3% owing to the complementary nature of the customers' power consumption behavior. Moreover, energy curtailment is well addressed and the output power of renewables is fully utilized. One limitation of shared energy storage stations is that they are prone to cause harmonics after they are connected to the grid, which can compromise power quality; therefore, more management effort is needed to ensure safe operation when using energy storage stations.
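The sign convention and the end-of-cycle condition described above can be made concrete with a small bookkeeping sketch. It assumes lossless charging and discharging and a 1-hour step, neither of which is stated in this section; the 24-hour power schedule below is invented for illustration and is not the dispatch shown in Figure 13. Only the capacity (2,508 kWh), the power limit (637 kW), and the starting energy (382 kWh) are taken from the text.

```python
from typing import List, Sequence


def energy_trajectory(e0_kwh: float, power_kw: Sequence[float], dt_h: float = 1.0) -> List[float]:
    """Stored energy after each step; negative power charges, positive discharges."""
    energy = [e0_kwh]
    for p in power_kw:
        # Lossless model (assumption): discharging p kW for dt_h hours removes p * dt_h kWh.
        energy.append(energy[-1] - p * dt_h)
    return energy


def schedule_is_valid(energy: Sequence[float], power_kw: Sequence[float],
                      e_max_kwh: float, p_max_kw: float) -> bool:
    """Check power limits, energy limits, and return to the initial state."""
    within_power = all(abs(p) <= p_max_kw for p in power_kw)
    within_energy = all(0.0 <= e <= e_max_kwh for e in energy)
    returns_to_start = abs(energy[-1] - energy[0]) < 1e-9
    return within_power and within_energy and returns_to_start


# Hypothetical schedule: idle 10 h, charge 7 h, discharge 5 h, idle 2 h.
schedule = [0.0] * 10 + [-200.0] * 7 + [280.0] * 5 + [0.0] * 2
trajectory = energy_trajectory(382.0, schedule)
ok = schedule_is_valid(trajectory, schedule, e_max_kwh=2508.0, p_max_kw=637.0)
```

Requiring the final stored energy to equal the initial value is what allows the same day-ahead schedule to be repeated on the following day without the station drifting toward empty or full.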

Conclusion
This paper proposed a framework for smart grid scheduling that is less reliant on local data while capable of delivering schedules with low operating costs. Specifically, the proposed framework contains the following: (1) a power forecasting model based on deep transfer learning, which can provide high-quality load prediction with limited training data; (2) a novel adaptive time series prediction method based on a neighboring area dataset, which aims to train the forecasting model with strong generalization capability; (3) a day-ahead optimal economic power scheduling model considering the shared energy storage station. Results based on a case study with field load data in Western Australia showed that the proposed forecasting method improves MAPE by up to 52.8% compared to other transfer learning-based methods and by up to 64.4% compared to the traditional method. The total operating cost after optimization according to the proposed method was reduced by 36.1%. These numbers indicate that the proposed framework is a promising approach to solving power planning problems with incomplete datasets, particularly under cyber threats.
In the future, we plan to explore a deeper extension of TCS-transfer learning to a transformer for better performance. Moreover, this work only designed a centralized energy storage system. If multiple energy storage systems are needed, optimal coordination between these dispatch-oriented systems would be a promising area for future investigation.
Data Availability
The feature data input into the prediction model used to support the findings of this study are available from the following sources: (1) Exemplary Energy Partners Company (http://www.exemplary.com.au/), (2) Office Holidays (https://www.officeholidays.com/countries/australia/2021), and (3) Australia Net Migration Rate 1950-2022 (https://www.macrotrends.net/countries/AUS/australia/netmigration). The net power data used to support the findings of this study are currently under embargo while the research findings are commercialized. Requests for data, 6/12 months after publication of this article, will be considered by the corresponding author.

Conflicts of Interest
The authors declare that they have no conflicts of interest.