Intelligent Detection and Recovery of Missing Electric Load Data Based on Cascaded Convolutional Autoencoders

Under the background of the Energy Internet, the ever-growing scale of the electric power system has brought new challenges and opportunities. Numerous categories of measurement data, as the cornerstone of communication, play a crucial role in the security and stability of the system. However, present sampling and transmission equipment inevitably suffers from missing data, which seriously degrades stable operation and state estimation. Therefore, in this paper, we take load data as an example and first develop a missing detection algorithm based on the absolute difference sequence (ADS) and linear correlation to detect any potentially missing data. Then, based on the detection results, we put forward a missing recovery model named cascaded convolutional autoencoders (CCAE) to recover the missing data. Innovatively, a special preprocessing method is adopted to reshape the one-dimensional load data into a two-dimensional matrix, so that image inpainting techniques can be applied to the problem. In addition, CCAE is designed to reconstruct the missing data grade by grade according to a priority strategy, which enhances its robustness in extreme missing situations. The numerical results on the load data of the Belgium grid validate the promising performance and effectiveness of the proposed solutions.


Introduction
Nowadays, measurement data are the foundation of the power system. The massive volume of collected data, especially electrical quantities such as voltage, current, and load, is tightly associated with safe operation and economic dispatch [1]. However, due to the growing size of the power grid and the massive number of field sensors, missing and anomalous measurements cannot be avoided; they are mainly caused by failures of terminal equipment or performance degradation of transmission channels [2,3]. Missing data may lead to serious consequences for stability, optimization, and fault prevention [4]. In addition, the measured values of the missing data can be replaced by unknown noise, which makes them more difficult to perceive and diagnose. Thus, efficient and accurate detection of data anomalies and recovery of missing data are fundamental to the development of data-driven analysis and advanced algorithmic solutions.
Essentially, the detection of missing data can be classified as a branch of abnormal or outlier detection, which has been investigated over the previous decades [5]. Among conventional algorithms, methods based on residuals and sudden changes have been discussed [6,7]. Based on statistical analysis, the 3σ criterion [8], Z-score [9], and clustering [10] have been introduced. Specifically, in the power system, scholars suggest approaches that exploit the correlation among multiple measured quantities to improve accuracy [11]. However, these solutions are susceptible to data pollution caused by the missing data itself, which leads to unsatisfactory performance.
In recent years, with the remarkable development of artificial intelligence (AI) technologies, more and more scholars have concentrated on applying AI technologies to data recovery [12]. In [13], an unsupervised learning framework based on the Wasserstein generative adversarial network (WGAN) is proposed to repair missing active power, reactive power, voltage amplitude, and phase data in the power system; it achieves high accuracy, but the missing mask must be detected in advance, and the processing efficiency is relatively low because of the one-dimensional convolution. In [14], an adaptive neural fuzzy inference system (ANFIS) model is developed to recover missing wind power data. It performs better than traditional empirical methods but is difficult to generalize to other data. Furthermore, many other solutions have been adopted (e.g., [15][16][17][18][19][20]), but almost all of them treat the missing data indiscriminately and without priority, which leads to poor performance in extreme situations. This paper takes Belgium load data (http://www.elia.be) as an instance and proposes a new model to detect and recover missing data. Firstly, beginning with an analysis of the characteristics of missing data, we present a detection method based on ADS and linear correlation to detect the potential missing mask in the incomplete input data. Secondly, preprocessing is applied to both the incomplete data and the detected missing mask, reshaping them as images (matrices). Finally, we present a CCAE model that recovers the missing regions of the image grade by grade with a defined priority. The rest of this paper is organized as follows. Section 2 introduces related work and basic theories. Section 3 develops the missing detection method (Section 3.1) and the missing recovery algorithm (Sections 3.2 and 3.3).
Section 4 gives numerical results and discussion based on the load data of the Belgium grid, where a missing mask generation model is designed and employed for testing. Finally, Section 5 concludes this paper, lists the strengths and weaknesses of the proposed model, and discusses future work.

Related Work
In this section, the related work, including abnormal detection, convolutional neural networks (CNNs), and autoencoders (AEs), is introduced; these are the basic technologies applied in this paper.

Abnormal Detection.
There are many abnormal detection models widely used in the literature. The elementary idea is to model the pattern of the data and then set a proper threshold or condition to pick out abnormal data in the dataset. In this part, the 3σ criterion is explored.
Consider a dataset that obeys the Gaussian distribution, also known as the normal distribution [21], $N(\mu, \sigma^2)$, as shown in Figure 1. The mean $\mu$ and standard deviation $\sigma$ can be estimated through maximum likelihood estimation (MLE). According to the properties of the Gaussian distribution, the probability that a value lies in the range $(\mu - 3\sigma, \mu + 3\sigma)$ is 99.7% [22]. Hence, data outside that range can be labeled as outliers. Even if the original data do not obey the Gaussian distribution strictly but only approximately, we can adjust the 3σ threshold appropriately, for example to 2.5σ or 3.5σ, and the criterion still works well.
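A minimal sketch of the 3σ criterion in Python follows. It assumes NumPy, uses the MLE estimates (the sample mean and the `ddof=0` standard deviation), and the helper name `sigma_outliers` and multiplier `k` are illustrative choices, with `k` standing in for the adjustable 3σ, 2.5σ, or 3.5σ threshold.

```python
import numpy as np

def sigma_outliers(data, k=3.0):
    """Return a boolean array marking values strictly beyond mu +/- k*sigma."""
    x = np.asarray(data, dtype=float)
    mu = x.mean()            # MLE of the mean
    sigma = x.std(ddof=0)    # MLE of the standard deviation
    return np.abs(x - mu) > k * sigma

# Hypothetical example: Gaussian samples with five injected gross outliers
rng = np.random.default_rng(0)
samples = rng.normal(loc=10.0, scale=2.0, size=10_000)
samples[:5] = 100.0
flags = sigma_outliers(samples, k=3.0)
print(flags[:5].all())       # the injected values are flagged
```

Note that the outliers themselves inflate the estimate of σ, which is one reason the threshold may need tuning in practice.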

Convolutional Neural Network.
A convolutional neural network (CNN) is a feedforward neural network with convolutional computation and is a typical image processing paradigm in deep learning [23]. A CNN is capable of representation learning and can classify its input image in a shift-invariant manner. Therefore, it is also called a shift-invariant artificial neural network (SIANN). An example of CNN-based classification is illustrated in Figure 2. The convolutional computation in CNN, illustrated in Figure 3, differs from the mathematical convolution in equation (1). In CNN, the convolution is performed on two-dimensional input and does not require flipping (reversing) the kernel:

$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau \quad (1)$$

The reason we employ CNN instead of an artificial neural network (ANN) [24] to process pictures is that parameter sharing in CNN enables us to analyze images with far fewer parameters, because fully connected layers are used only in the last few layers of a CNN. Hence, its training efficiency is better than that of an ANN when tackling the semantics of images. Also, the training strategy can be supervised or unsupervised, depending on the target.
Typically, there are different categories of layers in a CNN, e.g., the convolutional layer, the pooling layer, and the fully connected layer. The convolutional layer computes the inner product of the convolutional kernel and the input, region by region, so that features are extracted in the output, as demonstrated in Figure 3. The pooling layer is designed to reduce the output size of the convolutional layer and thereby reduce the computation in the following layers and the number of parameters in any subsequent fully connected layer. Another important benefit is that the pooling layer can alleviate overfitting and increase generalization ability. The major kinds of pooling layers are max pooling and average pooling, and their computation is shown in Figure 4. The fully connected layer is similar to the layers in an ANN. The only difference is that the two-dimensional output of the convolutional or pooling layer is first flattened into a one-dimensional vector, and then the fully connected layer is applied, as described in Figure 5.
Thus, in a CNN the fully connected layer is preceded by a flatten layer.
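The layer operations described above can be sketched with NumPy as follows. This is an illustrative toy rather than how deep learning frameworks implement them: `conv2d_valid` slides the kernel without flipping it (the CNN convention contrasted with equation (1)), flipping the kernel first yields the mathematical convolution, and `pool2d` performs non-overlapping max or average pooling. All function names are hypothetical.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """CNN-style 'convolution': slide the kernel without flipping it (cross-correlation)."""
    h, w = image.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # inner product of the kernel with the current region
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def pool2d(x, size=2, mode="max"):
    """Non-overlapping size-by-size pooling; trailing rows/columns are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    blocks = x[:h * size, :w * size].reshape(h, size, w, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
features = conv2d_valid(image, kernel)               # CNN convention, no flip
true_conv = conv2d_valid(image, np.flip(kernel))     # flipped kernel = mathematical convolution
pooled = pool2d(image, 2, "max")                     # 2x2 max pooling
flat = pooled.reshape(-1)                            # flatten before a fully connected layer
```

On this input the unflipped and flipped kernels give feature maps of opposite sign, which shows why the two definitions must not be confused.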
CNNs have been widely adopted in computer vision tasks such as image classification [25] and object recognition [26]. The development of autonomous driving is also closely tied to CNNs [27].

Autoencoder.
An autoencoder (AE) is a kind of supervised or unsupervised ANN used for data compression, representation learning, dimensionality reduction [28], and image denoising [29]; it was first proposed by Rumelhart in 1986 [30]. An example of an AE is presented in Figure 6. The output of an AE is required to be as close to its input as possible, as defined in (2) to (4). After training, the AE can efficiently encode the features of the input data:

$$f: \mathcal{X} \rightarrow \mathcal{F} \quad (2)$$
$$g: \mathcal{F} \rightarrow \mathcal{X} \quad (3)$$
$$f, g = \arg\min_{f, g} \left\| X - (g \circ f)(X) \right\|^2 \quad (4)$$

As shown in Figure 6, an AE has an encoder and a decoder, which are essentially fully connected layers. The encoder is responsible for feature learning, while the decoder should be able to reconstruct the input from the encoded features. Note that when dealing with image problems, an AE can also be built from convolutional layers [31], and that is exactly the basic structure used in this paper.
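To make the reconstruction objective of an AE concrete, the sketch below trains a linear autoencoder with plain gradient descent in NumPy. It is a deliberately simplified assumption: the encoder and decoder are single linear maps rather than the convolutional layers used later in this paper, and the names, learning rate, and epoch count are illustrative.

```python
import numpy as np

def train_linear_autoencoder(x, code_dim, lr=0.01, epochs=2000, seed=0):
    """Fit encoder f(x) = x @ w_enc and decoder g(h) = h @ w_dec by gradient
    descent on the mean squared reconstruction error ||x - g(f(x))||^2."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w_enc = rng.normal(scale=0.1, size=(d, code_dim))
    w_dec = rng.normal(scale=0.1, size=(code_dim, d))
    for _ in range(epochs):
        h = x @ w_enc                 # encoder: learned features
        err = h @ w_dec - x           # decoder output minus input
        grad_dec = (2.0 / n) * h.T @ err
        grad_enc = (2.0 / n) * x.T @ (err @ w_dec.T)
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec

def reconstruction_error(x, w_enc, w_dec):
    return float(np.mean(np.sum((x @ w_enc @ w_dec - x) ** 2, axis=1)))

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))   # rank-2 data in 6 dimensions
w_enc, w_dec = train_linear_autoencoder(x, code_dim=2)
baseline = float(np.mean(np.sum(x ** 2, axis=1)))         # error of an all-zero output
print(reconstruction_error(x, w_enc, w_dec) < baseline)
```

Because the toy data are rank 2, a 2-dimensional code is enough to reconstruct them, which illustrates the compression role of the encoded features.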

Methodology
In this section, we first discuss a missing detection algorithm that obtains the missing mask from the incomplete input data. Then, the input data and the detected missing mask are further preprocessed into matrices. After that, we build a CCAE model to recover the missing data in the matrices and reshape the matrices back into one-dimensional time series.

Missing Detection
3.1.1. Classification of Missing Segments. Usually, the missing data caused by faults of sampling and communication equipment manifest as discrete missing points, continuous missing segments, or a combination of the two [32], as demonstrated in Figure 7. When a contact of the sampling terminal is loose or another disturbance occurs, the waveform of the data may show discrete missing points. In contrast, at any link of the transmission chain, the loss of data packets caused by a temporary communication failure results in a continuous missing segment. Discrete missing points can be regarded as special cases of continuous missing segments with a length of one. Hence, we only consider missing segments.
In fact, the measured values in the missing segments are usually not exactly zero or NA but noise distributed near zero. This noise includes not only the background noise from the sampling equipment and transmission channel but also noise caused by failures or faults. To simplify the model, it is reasonable to assume that the noise in the missing segments follows a Gaussian distribution with zero mean and a variance considerably smaller than that of the normal data signal.
In this work, missing segments are classified as typical and atypical missing segments, as illustrated in Figures 8 and 9. Since normal data are very likely to be far from zero while the noise in missing segments is distributed near zero, the curve shows an abnormal jump at the beginning and the end of a missing segment. We call such missing segments typical missing segments, and they account for most cases. On the contrary, there is a small possibility that the normal data before or after a missing segment are also distributed near zero, so that the ranges of the normal data and the missing data overlap. In this situation, the curve may show an inconspicuous jump at the beginning or end of the missing segment. Therefore, we call such missing segments atypical missing segments.
Here, abnormal jumps are defined as the abnormal values in the ADS of the input sequence. The difference sequence (DS) of the load data naturally follows a Gaussian distribution with zero mean as well; thus, we can simply apply the 3σ criterion to identify the abnormal values in the ADS and thereby locate all the abnormal jumps. However, because an atypical missing segment may lack an abnormal jump at its beginning or end, as indicated in Figure 9, the jumps alone are still insufficient to locate both kinds of missing segments.
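The jump test can be sketched as follows, assuming NumPy and a hypothetical helper `abnormal_jumps`. It computes the DS and ADS, estimates σ_D under the zero-mean Gaussian assumption, and flags ADS values above 3σ_D. Note that the jumps themselves inflate this simple estimate of σ_D; a more robust estimator could be substituted.

```python
import numpy as np

def abnormal_jumps(x_obs, k=3.0):
    """Return 0-based indices i where |x'_{i+1} - x'_i| exceeds k * sigma_D."""
    x = np.asarray(x_obs, dtype=float)
    ds = np.diff(x)                        # DS: d_i = x'_{i+1} - x'_i
    ads = np.abs(ds)                       # ADS: a_i = |d_i|
    sigma_d = np.sqrt(np.mean(ds ** 2))    # MLE of sigma_D for N(0, sigma_D^2)
    return np.flatnonzero(ads > k * sigma_d)

# Hypothetical example: a smooth daily curve with one missing segment of noise near zero
rng = np.random.default_rng(0)
t = np.arange(500)
x_obs = 100.0 + 10.0 * np.sin(2 * np.pi * t / 96)
x_obs[200:210] = rng.normal(0.0, 1.0, size=10)
jumps = abnormal_jumps(x_obs)   # expected: jumps at indices 199 and 209
```

Index 199 marks the jump into the segment and index 209 the jump out of it, so this typical segment is fully delimited by its two jumps.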
In addition to the sudden jump of the curve, another feature may be less noticeable. Within a small window, segments of normal data usually form a regular curve with a high linear correlation with time, whereas missing data segments have a much lower linear correlation. This is another important factor for distinguishing missing data from normal data.
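A sliding-window version of the linear-correlation feature might look like the sketch below, assuming NumPy; the window length of 8 and the name `window_linear_correlation` are illustrative assumptions, not values taken from the paper. A window centered on a daily peak or valley can also show low correlation, so this feature is meant to complement the jump test rather than replace it.

```python
import numpy as np

def window_linear_correlation(x, window=8):
    """|Pearson r| between each length-`window` segment of x and the time index."""
    x = np.asarray(x, dtype=float)
    t = np.arange(window, dtype=float)
    scores = np.full(len(x) - window + 1, np.nan)
    for s in range(len(scores)):
        seg = x[s:s + window]
        if seg.std() == 0:
            continue                       # r is undefined for a constant window
        scores[s] = abs(np.corrcoef(t, seg)[0, 1])
    return scores

# Hypothetical example: a smooth ramp followed by a missing segment of small noise
rng = np.random.default_rng(0)
series = np.concatenate([np.linspace(0.0, 1.0, 50), 0.01 * rng.normal(size=50)])
scores = window_linear_correlation(series, window=8)
# windows inside the ramp score close to 1; windows inside the noise score much lower
```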

The Criterion of Sigma and Linear Correlation.
Based on the definitions in Section 3.1.1, the detection problem can be separated into two subproblems for typical and atypical missing segments, respectively. This paper first proposes a method to diagnose the typical missing segments; then, using these results, the remaining atypical missing segments can be detected.
Denote the ground truth data as $X = (x_1, x_2, \ldots, x_{n-1}, x_n)$, the incomplete input data as $X' = (x_1', x_2', \ldots, x_{n-1}', x_n')$, the missing mask as $M = (m_1, m_2, \ldots, m_{n-1}, m_n)$, and the noise signal as $W = (w_1, w_2, \ldots, w_{n-1}, w_n)$; then

$$X' = (\mathbf{1} - M) \odot X + M \odot W,$$

where $m_i$ is a binary number, $m_i = 1$ ($m_i = 0$) indicates that $x_i$ is missing (normal), the noise $w_i$ follows the Gaussian distribution $N(0, \sigma_N^2)$, and $\odot$ is the element-wise product operator. The steps to detect the missing mask $M$ from the incomplete input data $X'$ are as follows:
(1) Define the DS and ADS of $X'$ as
$$\mathrm{DS}(X') = (d_1, d_2, \ldots, d_{n-1}), \quad d_i = x_{i+1}' - x_i',$$
$$\mathrm{ADS}(X') = (a_1, a_2, \ldots, a_{n-1}), \quad a_i = |d_i|.$$
(2) Assume $d_i$ follows $N(0, \sigma_D^2)$ and calculate the standard deviation $\sigma_D$. Label all the $a_i$ in $\mathrm{ADS}(X')$ that exceed $3\sigma_D$ as abnormal jumps.

To evaluate our detection algorithm, we refer to the confusion matrix [33] in Table 1 and calculate the precision $P$, recall $R$, and $F_1$ score [34], defined as
$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F_1 = \frac{2PR}{P + R}.$$
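The detection metrics can be computed directly from the true and detected binary masks. The sketch below is plain Python with a hypothetical helper name and treats an undefined ratio (zero denominator) as 0.

```python
def detection_scores(true_mask, detected_mask):
    """Precision, recall, and F1 of a detected binary mask against the true mask."""
    pairs = list(zip(true_mask, detected_mask))
    tp = sum(1 for t, d in pairs if t == 1 and d == 1)   # missing and detected
    fp = sum(1 for t, d in pairs if t == 0 and d == 1)   # normal but flagged
    fn = sum(1 for t, d in pairs if t == 1 and d == 0)   # missing but not flagged
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical masks: three true positives, one false positive, one false negative
p, r, f1 = detection_scores([0, 1, 1, 1, 0, 0, 1, 0], [0, 1, 1, 0, 0, 1, 1, 0])
```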

Normalization.
After the missing mask $M$ in $X'$ has been detected, it is convenient to normalize $X'$ as $\tilde{X}' = (\tilde{x}_1', \tilde{x}_2', \ldots, \tilde{x}_{n-1}', \tilde{x}_n')$, and a possible choice is min-max normalization [35]:

$$\tilde{x}_i' = \frac{x_i' - \min X'}{\max X' - \min X'}.$$

However, because of the Gaussian noise in the missing segments, $\min X'$ and $\max X'$ may not be the real minimum and maximum of the data: the noise may be even lower (higher) than the real minimum (maximum). This leaves a gap in the value distribution after normalization, since the normal data cannot fill the whole normalized range.
To avoid this problem, we calculate the maximum and minimum values only over the normal data $\bar{X} = \{x_i' \mid x_i' \in X', m_i = 0\}$ and then normalize $x_i'$ as

$$\tilde{x}_i' = \begin{cases} \dfrac{x_i' - \min \bar{X}}{\max \bar{X} - \min \bar{X}}, & m_i = 0, \\ 0, & m_i = 1. \end{cases}$$

The normalized normal data then fully fill the range (0, 1), and there is no longer an apparent gap in the value distribution. A further advantage of doing so is that the noise in $x_i'$ is replaced by zero, which uniquely represents missing data and vice versa. Thus, the subsequent recovery algorithm can identify the missing data simply by reading the zero values, even without knowing the missing mask $M$.
When preparing the training datasets, the ground truth data $X$ are also normalized as $\tilde{X} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_{n-1}, \tilde{x}_n)$, where

$$\tilde{x}_i = \frac{x_i - \min \bar{X}}{\max \bar{X} - \min \bar{X}}.$$

It should be noted that the local maximum and minimum values used here for normalization may not be the global maximum and minimum of the normal data before the data went missing, because the real maximum and minimum could be blocked by the missing mask.
Thus, when we normalize those blocked data, the results may fall outside the range (0, 1). Fortunately, since the load data discussed in this paper have obvious periodicity, the local maximum and minimum are very close to the global maximum and minimum.
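The masked normalization can be sketched with NumPy as follows; `normalize_with_mask` is a hypothetical name. With plain min-max scaling the smallest normal value also maps to 0, so a practical implementation may want to keep normal values strictly above zero.

```python
import numpy as np

def normalize_with_mask(x_obs, mask):
    """Min-max normalize using only normal entries (mask == 0); set missing entries to 0."""
    x = np.asarray(x_obs, dtype=float)
    m = np.asarray(mask, dtype=bool)
    normal = x[~m]                     # the set X-bar of normal data
    lo, hi = normal.min(), normal.max()
    out = (x - lo) / (hi - lo)
    out[m] = 0.0                       # noise in missing segments becomes exactly zero
    return out, lo, hi                 # keep lo and hi to undo the scaling later

# Hypothetical example: two noisy missing points between normal load values
out, lo, hi = normalize_with_mask([7000.0, 14000.0, 3.2, -1.5, 10500.0], [0, 0, 1, 1, 0])
```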

Grade.
Repairing missing data means making the best estimate of the missing values from the adjacent normal data. Hence, making full use of the adjacent normal data is the key to solving the problem.
We note that different missing data within the same missing segment have different locations and, consequently, different numbers of adjacent available normal data. In detail, the missing data near the beginning or end of a missing segment are closer to the normal data, which makes them easier to recover, while the missing data at the center of the segment are far from the normal data and are difficult to recover.
To design more targeted recovery algorithms for different missing situations, this paper introduces a refinement of the missing mask $M$ detected in Section 3.1.2. The core idea is that the missing data at the center should be recovered based on the recovered missing data at the edges; that is, missing data at the edges have a higher priority and are recovered first. To distinguish missing data at different positions, we grade $M$ into $K$ submissing masks $T = (G_1, G_2, \ldots, G_K)$:

$$M = \sum_{j=1}^{K} G_j, \quad G_j = (g_1^{(j)}, g_2^{(j)}, \ldots, g_n^{(j)}),$$

where $g_i^{(j)}$ is a binary number similar to $m_i$. The missing segments in $M$ are divided into smaller submissing segments from the two ends toward the inside. The ratio of the submissing segments in $G_j$ relative to those in $M$ is defined as $R_j$:

$$R_j = \frac{\sum_{i=1}^{n} g_i^{(j)}}{\sum_{i=1}^{n} m_i}.$$

In this paper, the hyperparameters are $K = 3$, with $R_1 = 40\%$, $R_2 = 30\%$, and $R_3 = 30\%$, as shown in Figure 10.
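One way to split a mask into graded submasks from the two ends inward is sketched below in plain Python. It applies the ratios segment by segment, which only approximates the global ratios $R_j$; the tie-breaking order and the rule that every segment contributes at least one point to $G_1$ are assumptions of this sketch, and `grade_mask` is a hypothetical name.

```python
def grade_mask(mask, ratios=(40, 30, 30)):
    """Split a 0/1 mask into len(ratios) submasks, outermost points first.

    Ratios are integer percentages to avoid floating-point drift; within each
    missing segment, positions are ordered by distance to the nearest segment end.
    """
    n = len(mask)
    grades = [[0] * n for _ in ratios]
    cum, total = [], 0
    for r in ratios:
        total += r
        cum.append(total)
    i = 0
    while i < n:
        if mask[i] != 1:
            i += 1
            continue
        start = i
        while i < n and mask[i] == 1:
            i += 1
        end = i                                    # segment is mask[start:end]
        length = end - start
        order = sorted(range(start, end), key=lambda q: min(q - start, end - 1 - q))
        bounds = [round(length * c / total) for c in cum]
        bounds[0] = max(1, bounds[0])              # edge points always go to G_1
        prev = 0
        for j, b in enumerate(bounds):
            for q in order[prev:b]:
                grades[j][q] = 1
            prev = max(prev, b)
    return grades

# Hypothetical example: one missing segment of length 10
g1, g2, g3 = grade_mask([0] + [1] * 10 + [0])
```

Here $G_1$ receives the two outermost points at each end, and $G_2$ and $G_3$ share the interior, so the sum of the three submasks equals the original mask.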

Reshape.
The load data in the power system change with the patterns of social activity and production. Hence, they show evident periodicities at daily, weekly, quarterly, and yearly scales. When repairing this kind of data, referring only to the continuity between the directly adjacent normal data and the missing data (e.g., data within half an hour before and after) is not enough and leads to poor performance. Instead, the periodicity should be taken into consideration, meaning that we also have to refer to data in adjacent cycles (e.g., data at the same time of day in different periods), since those data are, to some extent, indirectly adjacent and share very similar patterns, as presented in Figure 11.
In terms of semantic intensity, direct adjacency is stronger than indirect adjacency, but the two can be combined as references. For the data at the edges of a missing segment, owing to the constraint of waveform continuity, the directly adjacent data are almost always the most important repair reference; for the missing data inside the segment, however, no directly adjacent data are available, and the indirectly adjacent data become a very significant factor instead. Traditional mathematical estimation and interpolation methods can only perceive data inside a very limited window around the missing data, so they cannot make full use of the information in indirectly adjacent data. As a result, their repair accuracy is low, especially for long continuous missing segments.
When dealing with similar problems, [35] proposes a method to transform one-dimensional harmonic data into a two-dimensional grayscale image by periodic truncation and reshaping, as demonstrated in Figure 12. Inspired by this, this paper reshapes the one-dimensional incomplete data $X'$ and the corresponding submissing masks $G_j$ into matrices.
Take $X'$ as an example. Assume the number of sampling points per day is $m$; then a dataset with $k$ days has $n = km$ points, and $X'$ is reshaped into the $k$-by-$m$ matrix

$$A_{X'} = \left[a_{i,j}\right]_{k \times m},$$

where $a_{i,j} = x'_{(i-1)m+j}$. For the load data of the Belgium grid, $k = 2000$ and $m = 96$.
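With row-major (C-order) storage, NumPy's `reshape` produces exactly this indexing. The sketch below assumes that any trailing partial day is dropped, and `reshape_days` is a hypothetical name.

```python
import numpy as np

def reshape_days(x, m):
    """Reshape a 1-D series into a k-by-m matrix, one day per row."""
    x = np.asarray(x)
    k = len(x) // m                  # number of complete days
    # C-order reshape gives a[i, j] = x[i*m + j] (0-based),
    # i.e. a_{i,j} = x'_{(i-1)m+j} in the 1-based notation above
    return x[:k * m].reshape(k, m)

a = reshape_days(np.arange(12), m=4)   # 3 days with 4 samples each
```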
Similarly, the matrices $A_{G_j}$ can be obtained. The two-dimensional structure of the matrix makes direct and indirect adjacency compatible with each other along the rows and columns, respectively. The reshaped matrix can thus be understood as a special "generalized" image. Based on the above analysis, after reshaping, the problem of repairing missing one-dimensional data becomes the problem of inpainting a two-dimensional image. In the following discussion, we combine deep learning and image processing techniques to address this problem.
Figure 10: Submissing masks $G_1$, $G_2$, and $G_3$ from the missing mask $M$.
Scientific Programming

Edge Padding.
After reshaping, the data located in the center of the matrices have more available adjacent data than the data at the edges, which means that the number of available adjacent data decreases radially outward from the center of the matrices, making the recovery of edge data more difficult than that of central data.
To solve the problem of the unbalanced radial distribution of available adjacent data, the most direct and effective method is to arrange some additional data on the edges. When we truncate the original one-dimensional data and reshape it into matrices, the two adjacent data at each truncation point are separated and placed at the end of one row and the beginning of the next row, respectively. So, the left edge and the right edge of the matrix are "adjacent" but misplaced by one row, as illustrated in Figure 13. Thus, the data at the left and right edges can be used as padding data for each other.
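Take $A_{X'}$ as an example. To improve this unbalanced distribution, two $k$-by-$p$ padding matrices $B_{X'}$ and $C_{X'}$ are designed, where $p$ is the padding depth. Following the misplaced adjacency, the left padding of row $i$ is taken from the last $p$ entries of row $i-1$ and the right padding from the first $p$ entries of row $i+1$:

$$b_{i,j} = a_{i-1,\,m-p+j}, \qquad c_{i,j} = a_{i+1,\,j}, \qquad 1 \le j \le p,$$

and the $k$-by-$L$ padding matrix is

$$Z_{X'} = \left[\, B_{X'} \;\; A_{X'} \;\; C_{X'} \,\right].$$

Define the ratio of $p$ to $m$ as the hyperparameter padding ratio $\eta$:

$$\eta = \frac{p}{m},$$

where $L = m + 2p$.
Similarly, the padding matrices $Z_{G_j}$ can be obtained.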
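Equivalently, each padded row is a window of the one-dimensional series that extends $p$ samples into the previous and next days, so the padding matrix can be built by slicing the series directly. The NumPy sketch below makes that assumption and fills the missing neighbours of the first and last rows with a constant; how the paper treats those two boundary rows is not specified here, and `edge_pad` is a hypothetical name.

```python
import numpy as np

def edge_pad(x, m, p, fill=0.0):
    """Build the k-by-(m + 2p) padding matrix Z from a 1-D series with m samples per row.

    Row i holds day i plus the last p samples of day i-1 (left padding) and the
    first p samples of day i+1 (right padding). Samples outside the series are
    replaced by `fill`, an assumption of this sketch.
    """
    x = np.asarray(x, dtype=float)
    k = len(x) // m
    x = x[:k * m]
    xp = np.concatenate([np.full(p, fill), x, np.full(p, fill)])
    # row i (0-based) covers original indices i*m - p ... i*m + m + p - 1
    return np.stack([xp[i * m: i * m + m + 2 * p] for i in range(k)])

z = edge_pad(np.arange(12), m=4, p=1, fill=-1.0)
eta = 1 / 4   # padding ratio p / m for this toy example
```

For the Belgium data, $p = 9$ and $m = 96$ would give $\eta = 9.375\%$.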

Slice.
Because the padding matrices have no padding data on their upper and lower sides, we cut them into smaller $L$-by-$L$ slices and use overlapping rows as padding data on the upper and lower sides. Take $Z_{X'}$ as an example; its slices are denoted $S_{X'}$, where the $t$-th slice $S_{X'}(t)$ is an $L$-by-$L$ matrix:

$$S_{X'}(t) = \left[s_{i,j}^{(t)}\right]_{L \times L}, \quad s_{i,j}^{(t)} = z_{(t-1)m+i,\,j}, \quad t = 1, 2, \ldots, n_s,$$

where $n_s$ is the number of slices. Hence, the first and last $p$ rows and columns of each slice are redundant (padding) data. As a result, when recovering a slice, we only need to consider its central $m$-by-$m$ region, defined as the core region:

$$\mathrm{Core}\left(S_{X'}(t)\right) = \left[u_{i,j}^{(t)}\right]_{m \times m}, \quad u_{i,j}^{(t)} = s_{i+p,\,j+p}^{(t)}.$$

In particular, the core areas of the first (last) slice also include the upper (lower) $p$ rows, which leads to an $(m+p)$-by-$m$ matrix:

$$u_{i,j}^{(1)} = s_{i,\,j+p}^{(1)}, \qquad u_{i,j}^{(n_s)} = s_{i+p,\,j+p}^{(n_s)}, \qquad 1 \le i \le m+p,\; 1 \le j \le m.$$

Similarly, the slices $S_{G_j}$ can be obtained.
Figure 12: Reshaping of the one-dimensional data to the two-dimensional image.
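The slicing and core extraction can be sketched with NumPy as below. The sketch assumes that slice $t$ starts at row $(t-1)m+1$ of the padding matrix (so consecutive slices overlap by $2p$ rows) and that $k - 2p$ is a multiple of $m$; both helper names are hypothetical. Stacking the cores recovers the unpadded columns of the padding matrix row for row.

```python
import numpy as np

def make_slices(z, m, p):
    """Cut the k-by-(m+2p) padding matrix into L-by-L slices with a row stride of m."""
    k, width = z.shape
    L = m + 2 * p
    if width != L:
        raise ValueError("z must have m + 2p columns")
    if k < L or (k - 2 * p) % m:
        raise ValueError("this sketch assumes k - 2p is a positive multiple of m")
    n_s = (k - 2 * p) // m
    # slice t (0-based) holds rows t*m ... t*m + L - 1; neighbours overlap by 2p rows
    return [z[t * m: t * m + L, :] for t in range(n_s)]

def core(slice_, t, n_s, m, p):
    """Core region of slice t; the first and last cores keep their outer p rows."""
    top = 0 if t == 0 else p
    bottom = m + 2 * p if t == n_s - 1 else m + p
    return slice_[top:bottom, p:p + m]
```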

Missing Recovery.
When a two-dimensional matrix represents one-dimensional data, recovering the one-dimensional data becomes an image inpainting problem. Recently, CNN and GAN technologies in deep learning have shown excellent performance on image inpainting. In [36], the authors use the first five convolutional layers of AlexNet [23] as an encoder to extract features from images with missing areas and then use six deconvolutional layers as a decoder to restore the missing regions from the learned features. Inspired by this, this paper puts forward a convolutional-autoencoder-based network to recover the missing data in the preprocessed matrices.
As is well known, feature learning in AlexNet is designed for object classification, where object positions do not matter because the category of an object does not depend on its position. Moreover, because of max pooling, "valid" padding, and the flatten layer, the size of the tensor is compressed, which blurs the position information. Therefore, it is nearly impossible to trace features in deep layers back to their original positions, especially when the input data are heavily damaged.
For the reshaped matrices, because the semantic information is not uniformly distributed across rows and columns, the original position of each feature is even more important during feature extraction. If the convolution-deconvolution framework is applied directly, the blurred locations of features in deep layers will further degrade the results. Moreover, since the "generalized" images are usually low-dimensional with low rank, the context encoder in [36] performs poorly, as its authors point out.
Thus, this paper presents a CCAE network based on the context encoder, designed as follows: it uses only convolutional layers, without any flatten, pooling, or fully connected layers; it changes the padding mode from "valid" to "same"; and it sets the stride to one, which keeps the height and width of the output tensor of every layer equal to those of the input, namely, $L$-by-$L$. Finally, the output matrix is restored to one-dimensional data.
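The effect of "same" padding with a stride of one can be checked with a small NumPy sketch: the output keeps the input's height and width, so each output value stays aligned with the input position it was computed from. The helper below is a hypothetical single-channel illustration, not the paper's layer implementation.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Stride-1 cross-correlation with zero 'same' padding: output shape equals input shape."""
    kh, kw = kernel.shape
    top, left = (kh - 1) // 2, (kw - 1) // 2
    padded = np.pad(x, ((top, kh - 1 - top), (left, kw - 1 - left)))  # zero padding
    out = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            # the window centred on (i, j) of the input produces output (i, j)
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

x = np.random.default_rng(0).normal(size=(8, 8))
delta = np.zeros((3, 3))
delta[1, 1] = 1.0
same = conv2d_same(x, delta)   # identity kernel: output equals input, same shape
```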

Network Structure.
To recover the preprocessed incomplete data S X′ with S G j , a CCAE model is proposed with the structure shown in Figure 14.
There are $K$ convolutional autoencoder blocks ($\mathrm{CAE}_j$) cascaded in CCAE, corresponding to the $K$ submissing masks $S_{G_j}$. Each $\mathrm{CAE}_j$ has an encoder $E_j$, a decoder $D_j$, and a filter $F_j$. Encoder $E_j$ and decoder $D_j$ each consist of $Q$ convolutional layers with a stride of one, "same" padding, and the ReLU activation function. The filter $F_j$ updates the recovery results in the missing segments of $S_{G_j}$, that is,

$$F_j = \left(\mathbf{1} - S_{G_j}\right) \odot F_{j-1} + S_{G_j} \odot D_j\!\left(E_j(F_{j-1})\right).$$

Then, $F_j$ is the input of $E_{j+1}$ in $\mathrm{CAE}_{j+1}$, which ensures that $S_{G_{j+1}}$ is recovered based on the recovery result of $S_{G_j}$. In particular, $F_0 = S_{X'}$.
Figure 13: Misplacement adjacency between the left and right sides and self-padding.
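The grade-by-grade update can be sketched as follows, assuming NumPy. Each trained $\mathrm{CAE}_j$ is replaced by an arbitrary callable; the stand-in `row_neighbour_mean`, which averages each entry's left and right neighbours, is purely hypothetical and only serves to show that the second grade is filled using values already recovered for the first grade.

```python
import numpy as np

def cascade_recover(s_x, sub_masks, blocks):
    """F_0 = S_X'; F_j keeps F_{j-1} outside S_Gj and takes block j's output inside it."""
    f = np.asarray(s_x, dtype=float)
    for g, cae in zip(sub_masks, blocks):
        g = np.asarray(g, dtype=bool)
        out = cae(f)                  # stands in for D_j(E_j(F_{j-1}))
        f = np.where(g, out, f)       # filter F_j: update only grade-j entries
    return f

def row_neighbour_mean(f):
    """Hypothetical stand-in for a trained CAE: mean of left and right neighbours."""
    left = np.concatenate([f[:, :1], f[:, :-1]], axis=1)
    right = np.concatenate([f[:, 1:], f[:, -1:]], axis=1)
    return (left + right) / 2

f0 = np.array([[4.0, 0.0, 0.0, 0.0, 8.0]])   # zeros mark missing data after normalization
g1 = np.array([[0, 1, 0, 1, 0]])             # edge points of the segment
g2 = np.array([[0, 0, 1, 0, 0]])             # centre point
result = cascade_recover(f0, [g1, g2], [row_neighbour_mean, row_neighbour_mean])
```

Recovering all three points in a single pass would leave the centre point at 0 here, because its neighbours would still be missing.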
Without a flatten layer or fully connected layers, we employ only two convolutional layers for feature learning. As a result, each learned feature stays at its original position; that is, the network encodes a feature vector for every point of the input matrix and places that feature vector at the same position as the computed point.
In this way, the position information is retained to the greatest extent during feature learning.

Loss Function.
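Note that the output matrices of CCAE are still $L$-by-$L$, of which only the $m$-by-$m$ core areas matter, as mentioned in Section 3.2.5. Hence, the loss function $\mathcal{L}$ is defined as the root mean squared error (RMSE) of the missing data within the core areas:

$$\mathcal{L} = \sqrt{\frac{\sum_{t} \left\| \mathrm{Core}\left(S_M(t)\right) \odot \left[\mathrm{Core}\left(S_{X'}^{\mathrm{rec}}(t)\right) - \mathrm{Core}\left(S_X(t)\right)\right] \right\|_F^2}{\sum_{t} \left\| \mathrm{Core}\left(S_M(t)\right) \right\|_1}},$$

where $S_{X'}^{\mathrm{rec}}$ is the output of CCAE, $S_M = \sum_{j=1}^{K} S_{G_j}$ marks the missing data, $\|\cdot\|_1$ denotes the entry-wise sum of a binary matrix, and $S_X$ is obtained from the normalized ground truth through the same preprocessing as in Section 3.2.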
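A masked RMSE restricted to the core area of a slice might be computed as below (NumPy, hypothetical helper names). Entries outside the mask, including the padding border, do not contribute to the loss.

```python
import numpy as np

def masked_rmse(pred, target, mask):
    """RMSE over the entries where mask is nonzero."""
    pred, target = np.asarray(pred, dtype=float), np.asarray(target, dtype=float)
    m = np.asarray(mask, dtype=bool)
    if not m.any():
        return 0.0
    return float(np.sqrt(np.mean((pred[m] - target[m]) ** 2)))

def core_masked_rmse(pred_slice, target_slice, mask_slice, m, p):
    """Masked RMSE restricted to the central m-by-m core of an L-by-L slice."""
    c = (slice(p, p + m), slice(p, p + m))
    return masked_rmse(pred_slice[c], target_slice[c], mask_slice[c])
```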

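Restore.
Since $S_{X'}^{\mathrm{rec}}$ is a set of normalized $L$-by-$L$ matrix slices $S_{X'}^{\mathrm{rec}}(t)$, it should be restored to a one-dimensional time series, which means reversing the preprocessing operations in Section 3.2. With $n_s = (k - 2p)/m$ slices, reorganize the core areas of the slices $S_{X'}^{\mathrm{rec}}(t)$ as a $k$-by-$m$ matrix $A_F$:

$$A_F = \begin{bmatrix} \mathrm{Core}\left(S_{X'}^{\mathrm{rec}}(1)\right) \\ \vdots \\ \mathrm{Core}\left(S_{X'}^{\mathrm{rec}}(n_s)\right) \end{bmatrix} = \left[f_{i,j}\right]_{k \times m}.$$

After that, reshape $A_F$ into an $n$-by-1 time series $\tilde{Y}^{\mathrm{rec}} = (\tilde{y}_1, \tilde{y}_2, \ldots, \tilde{y}_{n-1}, \tilde{y}_n)$, where $\tilde{y}_h = f_{i,j}$ for $(i-1)m + j = h$. Finally, reverse the normalization in Section 3.2.1 to obtain the restored results $Y^{\mathrm{rec}} = (y_1, y_2, \ldots, y_{n-1}, y_n)$, in which

$$y_h = \tilde{y}_h \left(\max \bar{X} - \min \bar{X}\right) + \min \bar{X}.$$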
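The restoration step can be sketched in NumPy as follows. It assumes the slice layout described in the Slice subsection (the first and last cores are $(m+p)$-by-$m$, the others $m$-by-$m$) and that `lo` and `hi` are the minimum and maximum of the normal data saved during normalization; the helper name is hypothetical.

```python
import numpy as np

def restore_series(rec_slices, m, p, lo, hi):
    """Stack slice cores into A_F (k-by-m), flatten row by row, and undo min-max scaling."""
    n_s = len(rec_slices)
    cores = []
    for t, s in enumerate(rec_slices):
        top = 0 if t == 0 else p                       # first core keeps its upper p rows
        bottom = m + 2 * p if t == n_s - 1 else m + p  # last core keeps its lower p rows
        cores.append(s[top:bottom, p:p + m])
    a_f = np.vstack(cores)          # A_F, k-by-m
    y_norm = a_f.reshape(-1)        # y~_h = f_{i,j} with h = (i-1)m + j
    return y_norm * (hi - lo) + lo  # y_h = y~_h (max - min) + min
```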

Experimental Results and Discussion
In this section, we validate our detection and recovery algorithms. Load data from the Belgium grid are used as training and test data, and a missing mask generation model is presented to produce generative missing masks under different parameters.

Experimental
If we simply select random segments of X as missing segments, the segments may collide (overlap), as demonstrated in Figure 16, which results in a lower missing rate than the given c.
Therefore, a stratified sampling model is proposed to solve this problem, as shown in Figure 17.
Define the number of missing data as NMD, namely,

$$\mathrm{NMD} = c \cdot n.$$

Then, divide NMD into NMS segments, where NMS is the number of missing segments and

$$\alpha \cdot \mathrm{NMD} \le \mathrm{NMS} \le \beta \cdot \mathrm{NMD},$$

where $\alpha$ and $\beta$ are length parameters of the missing segments that determine their average length ALMS:

$$\mathrm{ALMS} = \frac{2}{\alpha + \beta}.$$

Randomly generate an integer between $\alpha \cdot$NMD and $\beta \cdot$NMD as NMS and then stochastically divide NMD into NMS segments; then, according to the proportion of each of the NMS segments, divide M into NMS subsegments as well. Finally, within every subsegment of M, we independently place one missing segment, setting the bits inside it to 1 and all others to 0. This pipeline ensures that the missing rate c and the distribution of missing segments are independent. Some of the generative missing masks with given c, α, and β are shown in Figures 18 and 19. For each set of parameters, four missing masks are generated independently, and the total number of data is n = 200; the blue regions represent normal data with $m_i = 0$, while the red regions represent missing data with $m_i = 1$.
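The stratified sampling pipeline can be sketched in plain Python as below. The details are assumptions where the text is silent: NMS is drawn uniformly from the integers between ⌈α·NMD⌉ and ⌊β·NMD⌋, NMD is split into NMS positive lengths by random cut points, and each subsegment of the mask is sized in proportion to its segment length. Because every segment fits inside its own subsegment, segments never collide and the mask contains exactly NMD missing points. The function name is hypothetical.

```python
import math
import random

def generate_missing_mask(n, c, alpha, beta, seed=None):
    """Stratified sampling of a 0/1 missing mask with exactly round(c * n) ones."""
    rng = random.Random(seed)
    nmd = round(c * n)                              # number of missing data (NMD)
    if nmd == 0:
        return [0] * n
    lo = max(1, math.ceil(alpha * nmd))
    hi = max(lo, math.floor(beta * nmd))
    nms = min(rng.randint(lo, hi), nmd)             # number of missing segments (NMS)
    # split NMD into NMS positive segment lengths using random cut points
    cuts = sorted(rng.sample(range(1, nmd), nms - 1))
    edges = [0] + cuts + [nmd]
    lengths = [edges[i + 1] - edges[i] for i in range(nms)]
    # split the n positions into NMS subsegments proportional to the lengths
    mask = [0] * n
    cum, start = 0, 0
    for length in lengths:
        cum += length
        end = (n * cum) // nmd                      # subsegment is [start, end)
        offset = rng.randint(start, end - length)   # the segment always fits
        for i in range(offset, offset + length):
            mask[i] = 1
        start = end
    return mask

mask = generate_missing_mask(200, 0.1, 0.1, 0.15, seed=7)   # 20 missing points
```

With α = 0.1 and β = 0.15, NMS averages about 0.125·NMD, which matches the average segment length of 8 used in the experiments.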

Dataset Configurations.
As mentioned, we take the load data of the Belgium grid during 2014-2020 as an example, which involve 2000 days in total with 96 sampling points per day. The original data values range from 7000 MW to 14000 MW.
To better evaluate the proposed model, the detection and recovery components are tested independently, which means that the missing mask for the recovery component is the generative missing mask rather than the mask detected by the detection component. Hence, different dataset configurations are used for the two components.
For the detection component, we assume that the noise W obeys the Gaussian distribution $N(0, \sigma_N^2)$ and define the SNR in terms of the normal data signal power $P_{\mathrm{data}}$ and the noise signal power $P_{\mathrm{noise}}$ as

$$\mathrm{SNR} = 10 \log_{10} \frac{P_{\mathrm{data}}}{P_{\mathrm{noise}}},$$

where $P_{\mathrm{data}} = \frac{1}{n}\|X\|_2^2$ and $P_{\mathrm{noise}} = \sigma_N^2$. In the experiment, we consider SNR values ranging from 15 dB to 40 dB. Since the original data range from about 7000 to 14000, if we directly generate a missing mask on them, nearly all the missing segments will be typical missing segments. To comprehensively evaluate the missing detection algorithm, we linearly map the original data values into (−500, 0), (−250, 250), and (0, 500), respectively. The missing rate c is investigated as well and is set to 10%, 20%, and 40% separately. Besides, the length parameters are fixed as α = 0.1 and β = 0.15, so ALMS = 8. The configurations are shown in Table 2. The indexes used to assess the detection results are the precision P, recall R, and F1 score defined in Section 3.1.2.
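The noise level for a target SNR follows directly from the definition, $\sigma_N = \sqrt{P_{\mathrm{data}} / 10^{\mathrm{SNR}/10}}$. The NumPy sketch below (hypothetical helper names) computes it, performs the linear range mapping, and builds the corrupted input $(\mathbf{1} - M) \odot X + M \odot W$ under the Gaussian noise assumption.

```python
import numpy as np

def noise_sigma_for_snr(x, snr_db):
    """sigma_N such that 10*log10(P_data / P_noise) = snr_db, with P_data = ||x||^2 / n."""
    x = np.asarray(x, dtype=float)
    p_data = np.mean(x ** 2)
    p_noise = p_data / 10 ** (snr_db / 10)
    return float(np.sqrt(p_noise))        # P_noise = sigma_N^2

def linear_map(x, new_lo, new_hi):
    """Linearly map the values of x onto [new_lo, new_hi]."""
    x = np.asarray(x, dtype=float)
    return new_lo + (x - x.min()) * (new_hi - new_lo) / (x.max() - x.min())

def corrupt(x, mask, sigma_n, seed=0):
    """(1 - M) * X + M * W with W ~ N(0, sigma_n^2), element-wise."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, sigma_n, size=len(x))
    m = np.asarray(mask, dtype=float)
    return (1.0 - m) * np.asarray(x, dtype=float) + m * w

mapped = linear_map([7000.0, 10500.0, 14000.0], -250.0, 250.0)
sigma_n = noise_sigma_for_snr(mapped, 20.0)
```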
For the recovery component, because the noise is cleared to zero during normalization (Section 3.2.1), we do not consider the influence of the noise W. Instead, the parameters c, α, β, and p are studied. c is set to 5%, 10%, and 20%, while (α, β) is set to (0.1, 0.15), (0.05, 0.075), and (0.025, 0.0375), corresponding to ALMS values of 8, 16, and 32. Furthermore, the padding depth p is set to 0, 7, 9, and 11. For each p, the missing masks cover all 3 × 3 combinations of c and (α, β). The configurations are listed in Table 3. The index used to assess the recovery results is the RMSE defined in Section 3.3.2.
In addition, the missing masks with the above configurations are generated ten times independently to avoid stochastic disturbance in the results. For the datasets configured in Table 3, 80% are used as training sets and the remaining 20% as testing sets. The specifications of the software and hardware are presented in Tables 4 and 5. The optimizer is Adam, the learning rate decays exponentially over the epochs, and the batch size is 20 (slices $S_{X'}(t)$ with the corresponding missing masks).

Results and Discussion of Missing Detection.
As illustrated in Figure 20, for the data mapped into the positive range (0, 500) or the negative range (−500, 0), the precision, recall, and F1 are all nearly 100% when the SNR is over 20 dB, while the SNR needs to exceed 30 dB to reach the same result for the data mapped into (−250, 250). This is because the ratio of atypical missing segments for data mapped into (−250, 250) is much higher than for the other two ranges, and in that situation the noise in the missing segments is very likely to overlap with the normal data. Moreover, we might be unable to obtain enough typical missing segments for noise estimation, which affects the detection performance.
When the SNR decreases from 20 dB to 15 dB, the results deteriorate significantly. F1 for the data mapped into (−500, 0) and (0, 500) can drop to 0.7. The data mapped into (−250, 250) fare even worse, with F1 falling below 0.6. Fortunately, in most cases the data are entirely positive or entirely negative and lie far from zero, and the noise on the missing data is slight enough to ensure a high SNR.
Thus, the detected masks can be regarded as the ground-truth missing masks. This is also why the generated missing masks, rather than the detected ones, are used to test the subsequent recovery component. Another phenomenon is that the missing rate makes little difference to detection in the high-SNR region. In the low-SNR region, by contrast, detection improves as the missing rate increases. This may not be intuitive. Our further analysis indicates that when the SNR is low, the ratio of false-positive samples increases greatly because the normal data overlap with the noise, even when the missing rate is zero. Therefore, a higher missing rate brings more positive samples, which reduces the relative proportion of false positives to some degree.
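The sketch below computes precision, recall, and F1 from a detected missing mask and the generated (ground-truth) mask. Its usage example illustrates the missing-rate effect described above: when the number of false positives stays fixed, more true missing points raise the precision. It assumes that the scores are counted point by point over samples. The paper's exact definitions are given in Section 3.1.2 and may count segments instead. The function name `detection_scores` is hypothetical.

```python
import numpy as np


def detection_scores(detected, truth):
    """Point-wise precision, recall, and F1 of a detected mask vs. the ground truth."""
    d = np.asarray(detected, dtype=bool)
    t = np.asarray(truth, dtype=bool)
    tp = int(np.sum(d & t))    # missing points correctly flagged
    fp = int(np.sum(d & ~t))   # normal points wrongly flagged as missing
    fn = int(np.sum(~d & t))   # missing points that were not flagged
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # One false positive in both cases; the second mask has more true missing points.
    print(detection_scores([1, 0, 0, 1, 1, 0], [1, 1, 0, 0, 1, 0]))
    print(detection_scores([1, 1, 1, 1, 1, 0], [1, 1, 1, 0, 1, 0]))
```

In this toy case, a single false-positive point lowers precision to 2/3 when only two missing points are detected, but only to 4/5 when four are detected.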

Results and Discussion of Missing Recovery.
The RMSE for different padding depths is shown in Table 6. In general, the CCAE model performs well on the missing recovery problem, with a low RMSE on data normalized to the range (0, 1). The edge padding technique further reduces the recovery error, but an excessive padding depth degrades performance. In this experiment, the optimal depth is 9, which corresponds to a padding ratio of 9.375%.
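The reported numbers are mutually consistent with a 96-wide core area: 9/96 = 9.375%, and 96 + 2 × 9 = 114, the slice size quoted for Figure 21. The sketch below builds such a padded slice from a 1-D series. The layout is hypothetical. Row r of the core holds `series[r*96:(r+1)*96]`, so the left and right padding continue the series into the neighbouring rows, and the top and bottom padding use the p rows just outside the core block. Names such as `padded_slice` and `core_of` are also hypothetical.

```python
import numpy as np

CORE = 96  # core width implied by 9 / 96 = 9.375% and 114 = 96 + 2 * 9


def padded_slice(series, start_row, p, core=CORE):
    """Cut a (core + 2p) x (core + 2p) slice around a core x core block of the series.

    Hypothetical layout: core row r holds series[r*core:(r+1)*core]; each padded
    row extends that window by p neighbouring samples on both sides.
    """
    series = np.asarray(series)
    size = core + 2 * p
    out = np.empty((size, size), dtype=series.dtype)
    for i in range(size):
        r = start_row - p + i          # row index in the reshaped series
        lo = r * core - p              # p samples before the row starts
        hi = lo + size                 # p samples after the row ends
        if lo < 0 or hi > series.size:
            raise ValueError("not enough neighbouring data for this padding depth")
        out[i] = series[lo:hi]
    return out


def core_of(slice_2d, p):
    """Return the core area of a padded slice (p = 0 returns the whole slice)."""
    n_rows, n_cols = slice_2d.shape
    return slice_2d[p:n_rows - p, p:n_cols - p]


if __name__ == "__main__":
    s = np.arange(CORE * 200, dtype=float)
    sl = padded_slice(s, start_row=10, p=9)
    print(sl.shape, core_of(sl, 9).shape, 9 / CORE)
```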
During training, CCAE is allowed to produce high errors in the padding areas, which lets the loss function converge more easily and efficiently. Without the padding areas, those errors would unavoidably appear in the core areas, and this is the reason we designed the edge padding technique. In theory, a deeper padding supplies more adjacent data to the edge data in the core area. However, as the depth increases, the distance between the padding data and the core data also grows linearly. Beyond a certain distance, the padding data no longer provide useful information. Instead, they reduce the proportion of the core area in the padded slice and lower computational efficiency.
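A loss that ignores the padding band lets errors there go unpenalized, which is how the network can "park" its errors outside the core. The sketch below weights the squared error with a 0/1 mask that is 1 only inside the core area. It assumes that the loss and the reported RMSE are averaged over every entry of the core area. The paper's exact definition is in Section 3.3.2 and may differ, for example by averaging only over missing entries. The function names are hypothetical.

```python
import numpy as np


def core_weight(size, p):
    """0/1 weight map: 1 inside the core area, 0 in the padding band of depth p."""
    w = np.zeros((size, size))
    w[p:size - p, p:size - p] = 1.0
    return w


def core_mse(pred, target, p):
    """Mean squared error over the core area only; padding errors contribute nothing."""
    w = core_weight(pred.shape[0], p)
    return float(np.sum(w * (pred - target) ** 2) / np.sum(w))


def core_rmse(pred, target, p):
    """Square root of core_mse, for data normalized to (0, 1)."""
    return float(np.sqrt(core_mse(pred, target, p)))


if __name__ == "__main__":
    target = np.zeros((114, 114))
    pred = target.copy()
    pred[:9, :] = 5.0  # large error confined to the top padding band
    print(core_rmse(pred, target, p=9))
```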
To demonstrate the recovery performance of CCAE with p = 9, we randomly chose some output slices of $S_{X'_{\text{rec}}}$ and compared them with the input slices $S_{X'}$ and the ground truth $S_X$ in Figure 21, where the matrices are visualized as grayscale images. Each slice is 114-by-114. The blue, green, and yellow segments in the submissing masks represent the submissing segments in $G_1$, $G_2$, and $G_3$, respectively. The areas inside the red rectangles are the core areas. The core areas of the output images are nearly identical to those of the ground truth, even for inputs with a high missing rate and long missing segments, and the difference is imperceptible at normal scale. Only after zooming into the output can slight differences in texture relative to the ground truth be seen.
In the edge padding areas outside the core areas, however, there are obvious black holes, marked by the blue circles in Figure 21. They appear because those areas are ignored when defining the loss function in Section 3.3.2. In Figure 22, we restore some rows of the output to one-dimensional form to obtain $Y_{\text{rec}}$, and the results are consistent with the previous discussion.
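Restoring output rows to a 1-D sequence amounts to cropping away the padding band and concatenating the selected core rows. The sketch below assumes a hypothetical layout in which each row of the core area holds consecutive samples of the load series, so that row-major flattening recovers the time order. The name `rows_to_series` is also hypothetical.

```python
import numpy as np


def rows_to_series(slice_2d, p, rows=None):
    """Crop the core area of a padded slice and join the selected rows into 1-D data."""
    n_rows, n_cols = slice_2d.shape
    core = slice_2d[p:n_rows - p, p:n_cols - p]   # drop the padding band of depth p
    if rows is not None:
        core = core[rows]                         # e.g. slice(0, 3) for the first three rows
    return core.reshape(-1)                       # row-major: row after row in time order


if __name__ == "__main__":
    core = np.arange(96 * 96, dtype=float).reshape(96, 96)
    padded = np.pad(core, 9, mode="constant", constant_values=-1.0)
    y_rec = rows_to_series(padded, p=9, rows=slice(0, 3))
    print(y_rec.shape)
```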

Conclusion and Future Research
This paper proposes a missing load data detection and recovery model based on CCAE. For detection, we combine ADS and linear correlation as a criterion to detect potential missing segments. Based on the detection results, we further divide the detected missing mask into submissing masks with priority. We then reshape the original one-dimensional data and mask into two-dimensional matrices for data enhancement. The constructed matrices are treated as "generalized" images, which turns the recovery problem into image inpainting. Deep learning techniques are then applied, and we design a CCAE model to repair the damaged input matrices. To assess the algorithms, we build a missing mask generation model that produces the missing masks. Numerical results on the load data of the Belgian grid indicate that the developed detection and recovery algorithms perform satisfactorily under different missing situations. It should be highlighted that the proposed intelligent detection and recovery solution can also be applied to other time-series datasets.
The strengths of the proposed detection and recovery algorithms can be summarized as follows. The missing detection is nearly 100% accurate in most situations. The missing segments are recovered grade by grade following the priority of the submissing masks, which ensures recovery accuracy even for long missing segments. Reshaping the one-dimensional time series into a two-dimensional image is a powerful data enhancement method for load data, which enables the CNN to capture the semantics of the one-dimensional data. Finally, the structure of CCAE is not sensitive to the input size, so transfer learning to datasets with different periods is easy. On the other hand, the proposed solution requires further investigation because of the following potential limitations. First, under low SNR, some normal data distributed around zero may be wrongly labeled as missing data. Second, the training process requires a large amount of historical data, which is difficult to obtain for some problems. Third, there are many hyperparameters to optimize, which demands expert experience.
In future work, the proposed algorithmic solution can be improved in two respects. First, the models can be enhanced by adopting more sophisticated deep learning models. Second, the solution can be incorporated into a hybrid model that consists of multiple different machine learning algorithms. In addition, the proposed solution can be applied to and validated on time-series data from other application domains.

Data Availability
The Belgium load data used to support the findings of this study are available at http://www.elia.be.

Conflicts of Interest
The authors declare that they have no conflicts of interest.