Broad Echo State Network with Reservoir Pruning for Nonstationary Time Series Prediction

Nonstationary time series are generated in various natural and man-made systems, and their prediction is vital for advanced control and management. Neural networks have been explored for time series prediction, but modeling the nonstationary and nonlinear features of the data remains a problem. Referring to the time series features and network properties, a novel network is designed with dynamic optimization of the model structure. Firstly, the echo state network (ESN) is introduced into the broad learning system (BLS). The resulting broad echo state network (BESN) increases the training efficiency with the incremental learning algorithm by removing the error backpropagation. Secondly, an optimization algorithm is proposed to reduce the redundant information in the training process of the BESN units. The neurons in BESN are pruned with a fixed step size according to their contribution degree. Finally, the improved network is applied to different datasets. The tests on time series from natural and man-made systems show that the proposed network performs better on nonstationary time series prediction than typical methods, including the ESN, BLS, and recurrent neural networks.


Introduction
Time series data is observed and measured over time in human society and the natural environment. The analysis and prediction of time series data have drawn attention because they are vital for managing and controlling various man-made and natural systems. For example, the prediction of sales data is applied to optimize inventory and reduce social costs [1]. Stock data prediction can foresee the trend of capital flows [2]. Precipitation [3], water bloom [4], and typhoon intensity [5] are also predicted for natural environment protection and disaster prevention. The trend forecast of air pollutants provides strong support for the decision-making of relevant departments [6][7][8]. Nonstationary and nonlinear trends are an obvious feature of time series data in various application contexts. It is difficult to extract and represent data trends intuitively because the change rule of the time series is stochastic and complex. How to extract the data features and predict the future trend of the time series has therefore become a research issue.
The mainstream of time series prediction consists of statistical methods [9][10][11] and machine learning methods [12][13][14][15][16]. Machine learning methods include neural networks, deep learning networks, and the broad learning system (BLS). Statistical prediction methods mainly include the autoregressive (AR) model, the moving average (MA) model, the autoregressive moving average (ARMA) model, and the autoregressive integrated moving average (ARIMA) model. They transform nonstationary time series into stationary time series by differencing or integration. They are difficult to apply to nonstationary and nonlinear time series in real systems because the value of the loss function is hard to reduce through stationary data transformation alone. Machine learning has been widely used as a novel modeling solution that focuses on the data features with a black-box model. It uses nonlinear algorithms to reduce the loss function with a network structure [17]. It can also utilize the computing ability of the hardware to improve model accuracy, minimize the loss function value, and improve the network fitting ability. Typical machine learning methods include support vector machines, decision trees, recurrent neural networks, convolutional neural networks, BLS, etc. Among the representative and popular methods, deep learning networks improve prediction performance but require large-scale computing resources. BLS reduces the consumption of computer resources at the same prediction accuracy [12]. Meanwhile, the echo state network (ESN) [18] has been proven effective in time series modeling with a simple training algorithm. Therefore, this paper explores how to combine the advantages of different machine learning methods, including their structures and learning algorithms.
The literature and previous studies show that performance improvement usually relies on expanding the network structure and occupying more computing resources. A solution should be explored with an appropriate network scale and a fast learning algorithm. The ESN does not need backpropagation because only the output weight matrix is trained, while the input and reservoir pool weight matrices need not be updated. The learning mechanism of ESN is therefore highly efficient. The BLS builds a horizontal network instead of the vertical network of deep learning, so parameter passing is improved because it does not have to progress through multiple layers. Therefore, referring to the advantages of ESN and BLS, a novel broad echo state network (BESN) is designed in this paper to predict nonstationary time series. A BESN has been mentioned in [19], in which the raw data are imported into multiple reservoir pools running in parallel, with the unsupervised learning algorithm of the restricted Boltzmann machine. It mainly arranges the echo reservoirs in parallel, and it has no concepts of the mapping and enhancement layers of BLS. In this paper, BESN is a combination of ESN and BLS, in which the enhancement layer of the BLS is rebuilt with ESNs. Meanwhile, redundant nodes may exist in the ESN, so a pruning optimization algorithm is introduced. The broad pruning echo state network (BPESN) is thus established in this paper. This paper is organized as follows. The second section introduces related work on time series analysis and prediction methods. The third section introduces the BPESN proposed in this paper. The fourth section presents the experimental environment and experiments. The fifth section introduces the experimental results and analysis. The sixth section summarizes the paper.

Analysis of Nonstationary Time Series Data.
A nonstationary time series can be described with statistical variates, in which the mean and covariance of the data change dynamically over time [20]. The nonstationary trend is a typical characteristic of time series data, especially for real-world systems. The inherent complexity of actual data can be represented with nonstationary indexes [21,22]. The analysis of the nonstationary degree is the basis for time series modeling and prediction.
There are many methods to test the stationarity of time series data, among which the most commonly used are the correlation test [23] and the Augmented Dickey-Fuller (ADF) test [24]. The correlation test judges whether the data is stationary by checking whether the correlation function of the time series tails off or cuts off. The ADF test is a more rigorous method based on whether the mean and variance of the time series change over time. A standard test generates the probability value (p), the test statistic (TS), the 1% critical value (CV1), the 5% critical value (CV5), and the 10% critical value (CV10). The stationarity of the time series can be judged from p and from the relationship between the test statistic and the critical values. The null hypothesis of the ADF test is that the tested sequence has a unit root and is a nonstationary time series. When p < 0.05, the null hypothesis is rejected, and the time series is stationary [25]. With the lag order chosen by the Akaike information criterion (AIC), the tested time series is nonstationary when p ≥ 0.05, TS > CV1, TS > CV5, and TS > CV10, so that the null hypothesis is not rejected.
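The decision rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `adf_verdict` and `run_adf` and the "inconclusive" label are assumptions introduced here, and `run_adf` relies on the `adfuller` function of the statsmodels library with AIC-based lag selection.

```python
def adf_verdict(p_value, test_stat, critical_values, alpha=0.05):
    """Apply the decision rule described above to the outputs of an ADF test.

    critical_values maps "1%", "5%", "10%" to CV1, CV5, CV10.
    Returns "nonstationary" when the unit-root null cannot be rejected.
    """
    above_all = all(test_stat > cv for cv in critical_values.values())
    if p_value >= alpha and above_all:
        return "nonstationary"
    if p_value < alpha:
        return "stationary"
    return "inconclusive"  # hypothetical label for mixed evidence


def run_adf(series):
    """Run the ADF test with AIC lag selection (requires statsmodels)."""
    from statsmodels.tsa.stattools import adfuller  # imported lazily

    stat, p_value, _lags, _nobs, crit, _icbest = adfuller(series, autolag="AIC")
    return adf_verdict(p_value, stat, crit)
```

Because the ADF statistic is usually negative, "TS greater than all critical values" means the statistic is not negative enough to reject the unit root.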

Time Series Prediction Method.
Early time series modeling mainly used statistical methods. Based on the randomness theory of time series, the AR model [26] and the MA model [27] were proposed. The AR model uses the correlation between earlier and later data to establish a regression equation relating them. The MA model represents the series as a weighted sum of white-noise terms, and combining the MA and AR models yields the ARMA model [28]. The ARMA model exploits the correlation between the current data and the previous data and at the same time handles the randomly changing terms. ARIMA is commonly used for nonstationary time series [10]. It transforms a nonstationary time series into a stationary one by differencing adjacent time points. Machine learning has developed along with the era of big data. It makes full use of computer hardware and has given rise to many typical neural network models, such as the Back Propagation (BP) model [29], the Long Short-Term Memory (LSTM) model [30], the Gated Recurrent Unit (GRU) model [31], and the BLS model. The BP model is a multilayer neural network that propagates the error backward and updates the weight matrix continuously. LSTM and GRU were developed from the RNN to solve its short-term memory problem. They can also alleviate the vanishing and exploding gradient problems of the RNN to a certain extent. Both LSTM and GRU have internal gates that regulate the information flow, and the difference between them lies in the number of gates. Compared with LSTM, GRU consumes fewer computing resources when achieving the same accuracy. BLS is different from deep learning: the network structure is not deepened vertically but expanded horizontally, and it consumes fewer computer resources [12].
Statistical methods can model stationary time series accurately, but their effect is poor when the data is nonstationary and nonlinear with violent fluctuations. Machine learning applies to nonstationary time series through the network structure and the computing resources. When the time series dataset is large, the computing cost of a machine learning model increases significantly. ESN and BLS can effectively deal with the problem that large datasets cause large computer resource consumption.

Echo State Network.
ESN is a recurrent neural network that consists of three parts: the input layer, the reservoir pool, and the output layer. It maps the input data to a high-dimensional space in the reservoir pool with an input weight matrix [32]. ESN simplifies the training task of the network, in which only the weights of the output matrix are trained, without the traditional backpropagation. The topology of ESN is shown in Figure 1 [33].
In Figure 1, the input is defined as $u(t) = [u_1(t), u_2(t), \ldots, u_K(t)]$, and the network output is $y(t) = [y_1(t), y_2(t), \ldots, y_L(t)]$. K, N, and L represent the number of input samples, the number of neurons in the reservoir pool, and the output dimension, respectively. To ensure the echo state property of the reservoir pool, the spectral radius should be set between 0 and 1. The state update and the network output of the reservoir pool are defined in terms of the following quantities: $x(t+1)$ represents the state vector of the reservoir pool at $t+1$, $y(t+1)$ is the output of the model at $t+1$, $\alpha$ represents the leakage coefficient of the reservoir pool, ranging from 0 to 1, $W_{out} \in R^{L \times N}$ represents the output weight matrix, and $f(\cdot)$ and $g(\cdot)$ represent the activation functions of the reservoir pool and the output layer, respectively.
The input weight matrix $W_{in}$ and the reservoir weight matrix $W$ of the ESN are randomly generated and remain unchanged during training and testing, so the only thing that needs to be adjusted during the learning process is $W_{out}$.
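For reference, one commonly used leaky-integrator form that is consistent with the symbols defined above is sketched below. The input symbol $u$, the matrices $W_{in}$ and $W$, and the exact placement of $\alpha$ are assumptions, since conventions differ between ESN variants.

```latex
\begin{aligned}
x(t+1) &= (1-\alpha)\,x(t) + \alpha\, f\big(W_{in}\,u(t+1) + W\,x(t)\big),\\
y(t+1) &= g\big(W_{out}\,x(t+1)\big).
\end{aligned}
```

With $\alpha = 1$ the state is fully replaced at every step, while smaller values make the reservoir respond more slowly to the input.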
There are many methods for solving $W_{out}$, including ridge regression [34], the recursive least squares method [35], the pseudoinverse method [36], and the singular value decomposition method. In this paper, ridge regression is selected to solve $W_{out}$. Ridge regression is a biased estimation method, essentially a modified least-squares estimation. Unlike the unbiased method, it gives up unbiasedness in exchange for a more stable and reliable solution, which makes it more suitable for ill-posed problems with an ill-conditioned matrix. The reservoir pool in ESN tends to be ill-conditioned due to its sparse connections, so ridge regression is selected to solve $W_{out}$. The calculation of $W_{out}$ is as follows:

$$W_{out} = Y X^{T}\left(X X^{T} + \lambda I\right)^{-1},$$

where $X \in R^{N \times K}$ represents the state matrix of the reservoir pool, $Y \in R^{L \times K}$ represents the actual (target) output of the model, $\lambda$ is a regularization coefficient, and $I$ is the identity matrix. It is advisable to set the first element of $I$ to zero to exclude the bias connection from the regularization [37].
Since ESN was proposed in 2001, it has been widely used in time series prediction. However, the disadvantage of ESN is that many of its parameters are selected by trial and error, and it cannot adapt its configuration to different data automatically. Therefore, many scholars have optimized the design of its network structure. For example, Sheng C et al. proposed an improved version that adds noise to the ESN network [38], and Jun xu Liu et al. proposed a quantum-based ESN [39]. Among the many parameters of the ESN model, the selection of the reservoir pool size is significant. In this paper, the neurons in a single reservoir pool are pruned by a pruning optimization algorithm. After several prunings, the reservoir pool reaches the desired state.
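The ESN training procedure described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names, the tanh activation, the uniform weight range, the reservoir density, the washout length, and the toy sine-wave task are all choices made here for demonstration.

```python
import numpy as np


def make_reservoir(n_in, n_res, spectral_radius=0.9, density=0.1, seed=0):
    """Random, fixed W_in and sparse W rescaled to the given spectral radius."""
    rng = np.random.default_rng(seed)
    w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
    w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
    w[rng.random((n_res, n_res)) > density] = 0.0  # keep only sparse connections
    w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
    return w_in, w


def collect_states(u, w_in, w, alpha=0.3):
    """Run the leaky update over inputs u (n_in x K); returns X (N x K)."""
    n_res, steps = w.shape[0], u.shape[1]
    x = np.zeros(n_res)
    states = np.empty((n_res, steps))
    for t in range(steps):
        x = (1 - alpha) * x + alpha * np.tanh(w_in @ u[:, t] + w @ x)
        states[:, t] = x
    return states


def ridge_readout(states, targets, lam=1e-6):
    """W_out = Y X^T (X X^T + lam I)^-1, solved without an explicit inverse."""
    n_res = states.shape[0]
    gram = states @ states.T + lam * np.eye(n_res)
    return np.linalg.solve(gram, states @ targets.T).T


# Toy usage: one-step-ahead prediction of a sine wave.
signal = np.sin(0.1 * np.arange(501))
u, y = signal[None, :-1], signal[None, 1:]
w_in, w = make_reservoir(n_in=1, n_res=50)
X = collect_states(u, w_in, w)
w_out = ridge_readout(X[:, 100:], y[:, 100:])  # drop a 100-step washout
rmse = np.sqrt(np.mean((w_out @ X[:, 100:] - y[:, 100:]) ** 2))
```

Only `ridge_readout` involves training; `w_in` and `w` stay fixed after generation, which is the reason the ESN avoids backpropagation.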

Broad Learning System.
BLS is a feedforward neural network based on the random vector functional-link neural network [12] and an efficient machine learning method. The network structure of BLS includes the mapping layer, the enhancement layer, and the output layer. Compared with the random vector functional-link neural network, the mapping layer replaces the input layer. In BLS, the original data are passed through a linear mapping to form the nodes of the mapping layer, which realizes the feature extraction of the original data. The enhancement layer increases the model's ability to adapt to nonlinear data. Finally, the combined matrix of the mapping layer's output and the enhancement layer's output is used as the input of the output layer.
The output weight matrix is obtained by the ridge regression algorithm. The outputs of the mapping layer and the enhancement layer are computed with the following quantities: $W_{ej}$ and $W_{hi}$ represent the weight matrices of the mapping layer and the enhancement layer, and $\beta_{ej}$ and $\beta_{hi}$ represent the biases of the mapping layer and the enhancement layer, respectively. $W_{ej}$, $W_{hi}$, $\beta_{ej}$, and $\beta_{hi}$ are randomly generated, and n and m represent the number of network input samples and the number of neurons of the mapping layer, respectively.
We assume that $A_m$ is the combination of the mapping layer's output matrix $Z = [Z_1, Z_2, \ldots, Z_n]$ and the enhancement layer's output matrix $H = [H_1, H_2, \ldots, H_m]$. The output weight matrix can be solved by the pseudoinverse method commonly used in neural networks. However, when the input sample is large and a high training speed is required, the pseudoinverse method generally cannot meet the requirements [40], so regularization can be used to solve the output weight matrix indirectly [13]. In the resulting expressions for the network output and the output weight matrix, $Y$ represents the network output, $W_{out}$ represents the weight matrix of the network output, and $\lambda$ represents the coefficient of the regularization term.
Meanwhile, the structure of ESN brings redundant nodes into the network, and the network should be optimized toward a concise and effective structure. Concretely, the neurons in the reservoir pool of each ESN are sparsely connected, so they contribute differently to the entire network. The connection between the reservoir pool and the output layer is fully connected, in contrast to the sparse links between neurons in the reservoir pool [41]. Because of the sparse connection in the reservoir pool, redundant information may occur in needless nodes and their connection weights. The pruning algorithm is therefore introduced into the ESN in the broad framework. The correlation between the neurons of the reservoir pool is calculated, and the connection weights of the most highly correlated sets of neurons are set to zero. The remaining output weights are recalculated by the regression algorithm. The network structure proposed in this paper is shown in Figure 2.
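As a sketch of the standard BLS formulation that this subsection refers to (following the usual presentation in the BLS literature), the layer outputs and the regularized output weights can be written as below. The activation symbols $\phi$ and $\xi$, the input matrix $X$, and the one-row-per-sample orientation are assumptions made here.

```latex
\begin{aligned}
Z_j &= \phi\big(X W_{ej} + \beta_{ej}\big), \\
H_i &= \xi\big(Z W_{hi} + \beta_{hi}\big), \\
A_m &= [\,Z \mid H\,], \qquad Y = A_m W_{out}, \\
W_{out} &= \big(\lambda I + A_m^{\top} A_m\big)^{-1} A_m^{\top}\, Y .
\end{aligned}
```

As $\lambda \to 0$ the last expression approaches the pseudoinverse solution $W_{out} = (A_m)^{*} Y$, which is why the regularized form is described as an indirect way of solving the output weights.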

Broad Echo State Network with Reservoir Pruning
As shown in Figure 2, the features of the original data are extracted by linear mapping in the mapping layer. The output of the mapping layer is used as the input of the enhancement layer. Each unit of the enhancement layer is an ESN.
The network output can be obtained through the combination matrix of the enhancement layer's output and the mapping layer's output. If the effect is not up to the expected value, the number of ESNs is adjusted through the incremental algorithm, and the pruning optimization algorithm optimizes each ESN to improve the network model. The output matrices of the mapping layer and the enhancement layer are Z and H, as defined above, and each unit of the enhancement layer produces its output through the ESN state update.
By replacing each unit with an ESN, the enhancement layer gains the incremental learning ability. The entire network can adapt to time series data and extract the data features by dynamically adding ESN units until the network error reaches a preset value. The update calculation involves the following quantities: $A_m$ is the state matrix, $(A_m)^{*}$ is the pseudoinverse matrix of $A_m$, $W'$ is the output weight matrix after adding the ESN unit, $Y$ is the output of the network, and B, C, and D are auxiliary matrices of the update.
Thresholds should be set for adding ESN units adaptively, and they can be determined with the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE). The pseudoinverse $(A_m)^{*}$ of the state matrix need not be recalculated each time an ESN is added. Only an update based on the previous network parameters is needed, which significantly shortens the running time.
The correlation between the neurons in the reservoir pool of each ESN unit is calculated to obtain the correlation matrix. Here, $r_{nm}$ represents the correlation coefficient between the n-th neuron and the m-th neuron in the reservoir pool, $\bar{y}_n$ represents the mean value of the state of the n-th neuron in the reservoir pool, T represents the number of state vectors, and N represents the number of neurons in the reservoir pool.
According to the correlation coefficients, the correlation matrix of a single ESN unit is $r = [r_{nm}]_{N \times N}$. Based on the correlation matrix, the subscripts of the k largest elements in r are denoted as $S = [(s_{n1}, s_{m1})_1, (s_{n2}, s_{m2})_2, \ldots, (s_{nk}, s_{mk})_k]$. The neurons whose serial numbers match the subscripts in S are selected, and their weights are set to zero. Then, the ESN is trained again, and the weights of the output matrix are recalculated to obtain the network error.
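The idea of updating the pseudoinverse instead of recomputing it can be sketched with the block-column update that is standard in the BLS literature. This is an illustration under assumptions, not the paper's exact formulas: the function name `add_columns` is hypothetical, rows are taken to be samples, and the update shown covers only the generic case in which the new columns are not fully explained by the existing ones.

```python
import numpy as np


def add_columns(a, a_pinv, w, h_new, y):
    """Append columns h_new to A and update pinv(A) and W without refitting.

    Assumes the part of h_new not explained by A (matrix C below) has full
    column rank, which is the generic case for random ESN features.
    """
    d = a_pinv @ h_new                 # D = A^* H
    c = h_new - a @ d                  # C = H - A D, residual of the new columns
    b_t = np.linalg.pinv(c)            # B^T = C^*, pseudoinverse of a thin matrix
    new_pinv = np.vstack([a_pinv - d @ b_t, b_t])
    new_w = np.vstack([w - d @ (b_t @ y), b_t @ y])
    return np.hstack([a, h_new]), new_pinv, new_w


rng = np.random.default_rng(0)
A = rng.standard_normal((40, 6))       # existing [Z | H] features, rows = samples
H_new = rng.standard_normal((40, 3))   # states of one newly added unit
Y = rng.standard_normal((40, 1))
A_pinv = np.linalg.pinv(A)
W = A_pinv @ Y
A2, A2_pinv, W2 = add_columns(A, A_pinv, W, H_new, Y)
```

The only pseudoinverse computed during the update is that of the residual block `c`, which has as many columns as the added unit, so the cost grows with the size of the new unit rather than with the size of the whole network.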
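The selection of highly correlated neuron pairs can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes that pruning a neuron means excluding its state row before the output weights are refitted, which is one possible reading of setting its weights to zero.

```python
import numpy as np


def top_correlated_pairs(states, k):
    """Return the k neuron pairs (n, m), n < m, with the largest r_nm.

    states has shape (N, T): one row of T recorded states per neuron.
    """
    r = np.corrcoef(states)                    # N x N Pearson correlation matrix
    upper = np.triu_indices_from(r, k=1)       # skip the diagonal and duplicates
    order = np.argsort(-r[upper])[:k]
    return [(int(upper[0][i]), int(upper[1][i])) for i in order]


def prune_mask(n_neurons, pairs):
    """Boolean mask that is False for every neuron named in pairs."""
    keep = np.ones(n_neurons, dtype=bool)
    for n, m in pairs:
        keep[[n, m]] = False
    return keep


# Toy usage: neuron 2 is an affine copy of neuron 0, so r_02 = 1.
t = np.linspace(0.0, 1.0, 50)
states = np.vstack([
    np.sin(2 * np.pi * t),
    np.cos(2 * np.pi * t),
    2.0 * np.sin(2 * np.pi * t) + 1.0,
    t,
])
pairs = top_correlated_pairs(states, k=1)
mask = prune_mask(states.shape[0], pairs)
```

After masking, the output weights would be refitted on `states[mask]` and the resulting error compared with the error before pruning.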

Pruning Algorithm to Optimize the Network.
In this paper, each ESN in the enhancement layer is optimized to find the most appropriate size of the reservoir pool. The internal neuron connections of the ESN reservoir pool are sparse, and each neuron contributes to the network differently. The pruning optimization algorithm is a practical improvement that reduces the calculation cost, saves time, and improves the accuracy to a certain extent. The optimization calculates the correlation between the neurons in each reservoir pool, obtains the correlation matrix, and sets the weights corresponding to the k neuron pairs with the highest correlation to zero. The experimental results show that the pruning optimization algorithm can reduce the error of the network model. The algorithm flow is shown in Figure 3, and the overall algorithm is shown in Algorithm 1.

Experimental Environment and Settings.
The experiments are designed to verify the proposed neural network for nonstationary time series prediction. Two different types of time series data are selected as the prediction subjects. One dataset is the air quality monitoring data of Fangshan District in Beijing, and the other is the power load data of the United States. The two datasets represent different systems: the air quality data comes from the natural environment, and the power load data comes from a man-made system. The data can be regarded as typical time series in common systems. The experiments are conducted on a small computing platform based on a 64-bit Windows system, with 16 GB of RAM and an AMD R7 4800H (2.9 GHz) processor. The deep learning framework is based on TensorFlow 2.0 and Keras 2.4.3. The code is implemented in Python 3.7.
In this paper, some typical time series prediction models are selected as contrast methods. The proposed method is BPESN, which integrates the ESN with BLS and the pruning optimization algorithm. As the basis of the proposed BPESN, the typical non-feedback networks ESN and BLS are set as contrasts. Besides, the integrated structure of ESN and BLS, called BESN, is also set as a contrast. Moreover, the recurrent neural network has been a representative method in time series prediction, and LSTM is selected on behalf of the feedback recurrent neural networks. As an improvement of LSTM, GRU is widely applied in prediction and is also set as a comparison model. Meanwhile, k-fold cross-validation is a standard validation method for model evaluation and selection in the field of machine learning [42]. It aims at avoiding chance results. The evaluation indexes are SMAPE, MAE, RMSE, and $R^2$. The smaller the three error values (SMAPE, MAE, and RMSE), the better the model performance. $R^2$ reflects how well the final prediction model fits the data: the closer $R^2$ is to 1, the better the fit of the prediction model.
In the formulas of the evaluation indexes, n is the number of samples, $\hat{y}(k)$ is the k-th predicted value, $y(k)$ is the k-th actual value, and $y_{rv}$ is the mean of the predicted values.
Two datasets are used in the experiments, including the air quality monitoring dataset and the power load dataset. The air quality monitoring dataset of the Fangshan District in Beijing includes AQI, CO, NO2, O3, PM10, PM2.5, and SO2. For the evaluation of air quality, the air quality index (AQI) is a comprehensive indicator calculated from the concrete monitoring parameters. The AQI data is analyzed in the experiment. The data were monitored and recorded hourly, from February 15, 2017, to December 2, 2018, covering a total of 15,000 hours. In the experiment, 12,000 training samples and 3,000 test samples are included, so the training set accounts for 80% of the total samples and the test set accounts for 20%. The original trend of the AQI data is shown in Figure 4.
The US electric power load data were collected from January 1, 2017, to January 1, 2020, with a sampling interval of 1 hour. In this experiment, 900 continuous days of data were selected as the total sample, with 21,600 records. The training set accounts for 80% of the total sample number, and the test set accounts for 20%. The trend of the total samples is shown in Figure 5. The nonstationary degree of the two datasets is tested first. The ADF tests are conducted to obtain the indicators, and the test results are shown in Table 1.
It can be seen from Table 1 that the probability value p of both the air quality dataset and the US power load dataset is greater than 0.05, and the TS is greater than all three critical values. According to the method in Section 2.1, the null hypothesis cannot be rejected in the ADF test; that is, the tested sequence is nonstationary. It is concluded that both datasets in this experiment are nonstationary.
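The four indexes named above are commonly computed as in the sketch below. The exact variants are assumptions: SMAPE is taken here in its symmetric percentage form with a factor of 2, and $R^2$ uses the mean of the actual values, which is the usual convention. The function name `evaluate` is hypothetical.

```python
import numpy as np


def evaluate(actual, predicted):
    """Return SMAPE (in percent), MAE, RMSE and R^2 for two 1-D sequences."""
    y = np.asarray(actual, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    err = y_hat - y
    smape = 100.0 * np.mean(2.0 * np.abs(err) / (np.abs(y) + np.abs(y_hat)))
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    return {"SMAPE": smape, "MAE": mae, "RMSE": rmse, "R2": r2}


# Toy usage with three samples; the errors are 10, -10 and 0.
scores = evaluate([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
```

For this toy input the squared errors sum to 200 and the total variation of the actual values is 20,000, so $R^2$ is 0.99 while the MAE is 20/3.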

Prediction of Air Quality Monitoring Data.
In the experiment, the concrete parameters of the networks are determined based on the data. e parameters of the networks in this test are shown in Table 2.
The AQI data is predicted with the proposed network as well as the contrast methods. The results are shown in Figure 6, in which the results are denoted with lines in different colors. It can be found that the classical GRU and LSTM deviate from the actual values over the whole trend. Meanwhile, ESN diverges significantly where the nonstationary data fluctuates strongly. BESN, which combines the advantages of ESN and BLS, performs better. BPESN, which adds the pruning optimization algorithm to BESN, fits the actual data curve most closely.
The results of the different methods are represented by boxplots, as shown in Figure 7. In a boxplot, the box body represents the range of most of the data. It can be seen that the box body of ESN is the largest, indicating that the fluctuation of its predicted data is the largest. BESN makes the predictions more concentrated, and its mean and median are close to those of the actual data. BPESN performs the best among all the contrasts in view of the data distribution, median, and mean, indicating that the prediction ability of the BPESN model is the best.

Algorithm 1
Input: mappingNum, enhanceNum, mapFunction, enhanceFunction, thresholdRMSE, thresholdPrunNum, N, K
    mappingNum ← number of neurons in the mapping layer
    enhanceNum ← number of neurons in the enhancement layer
    mapFunction ← mapping layer activation function
    enhanceFunction ← enhancement layer activation function
    thresholdRMSE ← threshold of RMSE in incremental learning
    thresholdPrunNum ← threshold of the number of prunings
    N ← number of neurons in the reservoir pool of an ESN
    K ← number of neuron pairs pruned from the reservoir pool
Output: network prediction output after each pruning optimization
Algorithm:
(1) for i in mappingNum:
(2)     for j in enhanceNum:
(3)         enter the data into the mapping layer, initialize W_ej and β_ej, and get the matrix Z;
(4)         initialize the ESNs of the enhancement layer and collect the calculation result matrix H;
(5)         prune and optimize each ESN, calculate the RMSE after optimization, and record the number of prunings C;
(6)         if RMSE < thresholdRMSE or C < thresholdPrunNum:
(7)             add unpruned ESN units into the enhancement layer for further pruning;
(8)         else:
(9)             calculate the performance indexes of the pruned ESNs for the same mapping layer and different enhancement layers;
(10)    end
(11) end
(12) calculate the optimized parameters and the predicted output.

To evaluate the performance of each model quantitatively, the performance evaluation indexes are calculated according to Section 4.1.1, as shown in Table 3. It can be seen that BESN has a smaller prediction error than the single ESN and BLS models. In terms of SMAPE, MAE, RMSE, and $R^2$, BPESN performs better than the other contrasts.
For the optimization capability of the pruning algorithm on the air quality dataset, Figure 8 shows the performance change of the BESN model before and after optimization by the pruning algorithm. To validate the performance of BPESN on the air quality dataset, this experiment uses k-fold cross-validation (k = 10), and the evaluation metrics of each validation model are shown in Figure 9.

Prediction of Power Load Data.
In the experiment of power load prediction, the parameters are first set to obtain the best performance of the different models. The parameters of the networks are shown in Table 4. The power load data is predicted with the proposed network and the contrast methods, as shown in Figure 10.
The experimental results are denoted with lines in different colors, and it can be found that BPESN fits the actual data curve most closely. Meanwhile, BESN shows a significant improvement in the degree of fit compared to BLS and ESN, and BPESN is the best among the contrasts after the pruning optimization. The distribution of the prediction results is shown in Figure 11. It can be seen that the ESN and BLS boxes are larger. The upper and lower caps cover the largest range in the ESN model, indicating that its forecast data fluctuates the most.
The results of BPESN, obtained by applying the pruning algorithm to BESN, are the closest to the actual data in terms of box size, mean, and median, indicating that the predictive ability of the BPESN model is the best among all comparison models. The evaluation indicators are listed in Table 5. The BESN network model has smaller prediction errors and more accurate capabilities than ESN, BLS, GRU, and LSTM. BPESN is a network based on BESN with additional pruning optimization. The SMAPE, MAE, and RMSE of BPESN decrease compared with the BESN model, and the fitting degree of the BPESN network is better in view of $R^2$. Figure 12 shows the optimization capability of the pruning algorithm on the US power load dataset, similar to Figure 8. The relationship between the dash locations and the scatter points indicates that the BPESN network model has less prediction error than the BESN model. It can be seen that the pruning algorithm can still optimize the BESN model within a specific range. Cross-validation is also carried out on BPESN for the electric load dataset.
This experiment uses k-fold cross-validation (k = 10) for validation, and the evaluation metrics for each validation model are shown in Figure 13.

Error Analysis.
BPESN was proposed in this paper and verified in the experiments. It can be seen from Figures 7 and 11 that the boxes of ESN and BLS are larger, and the upper and lower caps of ESN have the largest coverage, indicating that the fluctuation of its predicted data is the largest. Although the box of BLS is smaller, its mean and median move up. The box of the BESN model is smaller, and its median and mean move down appropriately. It can be seen from Tables 3 and 5 that the $R^2$ of BESN is closer to 1 than those of ESN, GRU, and LSTM, indicating that the network fitting ability of BESN is good. From the perspective of RMSE on the Fangshan air quality dataset, BESN is 68.84%, 64.34%, 80.53%, and 80.46% lower than ESN, BLS, GRU, and LSTM, respectively. On the American power load dataset, the RMSE of BESN decreases by 27.67%, 3.23%, and 8.22% compared with ESN, GRU, and LSTM, respectively, but it increases by 5.42% compared with BLS. It can be seen from Figure 11 that BESN is not sensitive to the fluctuations of the power data, resulting in a smaller box and an increased RMSE. The pruning optimization algorithm acts on the reservoir layer of ESN and realizes pruning through the different contribution degrees of the reservoir pool neurons, which can improve network performance to a certain extent and is verified by the experiments. Figures 8 and 12 show the RMSE changes of the network before and after 100 prunings of BESN with the same configuration as BPESN. If a pruning is defined as effective when the RMSE after pruning is smaller than the RMSE before pruning, the effective pruning rate is 34% on the Fangshan air quality dataset and 90% on the American power load dataset. From the perspective of RMSE, the BPESN on the Fangshan air quality dataset decreased by 19.35% compared with BESN, and the BPESN on the American power load dataset decreased by 15.22%. From the perspective of $R^2$, the network fitting ability of BPESN compared with BESN is improved by 5.36% and 13.87%, respectively.
In terms of network model fitting degree and error, BPESN is superior to other models mentioned above.
To better evaluate the performance of the network models and avoid chance results, k-fold cross-validation (k = 10) is used to assess the BPESN models. As shown in Figure 9, the RMSE and $R^2$ fluctuate within the intervals [10.97, 14.22] and [0.88, 0.93], with mean values of 12.698 and 0.9067, respectively. As shown in Figure 13, the RMSE and $R^2$ fluctuate between the intervals [1112.96,

Complexity Analysis.
A more complex network computation and structure can reduce the prediction error and improve the prediction performance, but it usually increases time and space consumption. This subsection analyzes the complexity of the ESN-based models, including the time complexity and the space complexity. Assuming K > N, the running time of each ESN is dominated by the reservoir pool computation $O(KN^2)$ and the pseudoinverse computation $O(N^3)$ [37]. The space occupation is concentrated in the reservoir pool computation, and the space complexity is $O(N^2)$. The time complexity of the ESN is then as follows:

$$O\left(KN^2 + N^3\right) \approx O\left(KN^2\right).$$

The mapping layer in BESN is a linear matrix operation with time complexity $O(n)$, and there are m ESN units in the enhancement layer with time complexity $O(m(KN^2 + N^3))$. The space occupation is concentrated in the computation of the reservoir pools in the enhancement layer. The time complexity of BESN is then as follows:

$$O\left(n + m\left(KN^2 + N^3\right)\right) \approx O\left(m\left(KN^2 + N^3\right)\right) \approx O\left(mKN^2\right).$$

BPESN is implemented on the basis of BESN with the pruning optimization algorithm. The time complexity of the optimization algorithm is $O(2PN^2 \log N)$, where P is the number of prunings. The space complexity is $O(mN^2)$, and the time complexity of BPESN is shown in equation (13). The time and space complexity of the different ESN-based networks are summarized in Table 6.

$$O\left(n + m\left(KN^2 + N^3 + 2PN^2 \log N\right)\right) \approx O\left(m\left(KN^2 + N^3 + 2PN^2 \log N\right)\right) \approx O\left(m\left(KN^2 + 2PN^2 \log N\right)\right). \tag{13}$$

While the accuracy of the network model is improved, it can be seen from Table 6 that the time and space complexity of BPESN are larger than those of a single ESN and BESN.
This is due to the special structure of the broad learning system and the pruning optimization algorithm. BPESN needs more time and space to learn a network structure that improves network performance.

Conclusion
A new network structure is studied in this paper. The proposed BPESN integrates the ESN into the framework of BLS together with the pruning algorithm. In the optimization, the correlation coefficient matrix of the neurons in the reservoir pool is calculated, based on which the network model can remove redundant information. The network's fitting ability is improved, and a better prediction effect is achieved. In the experiment, the datasets are tested with the nonstationary evaluation method, and the results show that the proposed network applies to nonstationary time series data. Beyond the experimental verification in this paper, the proposed network should be further validated through theoretical analysis and practical applications in future work [43][44][45][46].

Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.