Impact of Noise on a Dynamical System: Prediction and Uncertainties from a Swarm-Optimized Neural Network

An artificial neural network (ANN) based on particle swarm optimization (PSO) was developed for time series prediction. The hybrid ANN+PSO algorithm was applied to the short-term prediction, x(t + 6), of the Mackey-Glass chaotic time series. The prediction performance was evaluated and compared with other studies available in the literature. We also present properties of the dynamical system through a study of the chaotic behaviour obtained from the predicted time series. Next, the hybrid ANN+PSO algorithm was complemented with a Gaussian stochastic procedure (called stochastic hybrid ANN+PSO) in order to obtain a new estimator of the predictions, which also allowed us to compute the uncertainties of the predictions for noisy Mackey-Glass chaotic time series. Thus, we studied the impact of noise for several cases with a white noise level ($\sigma_N$) from 0.01 to 0.1.


Introduction
The prediction of time series plays an important role in many applied fields of science, such as engineering, biology, physics, and meteorology. In particular, owing to their dynamical properties, the analysis and prediction of chaotic time series have attracted the interest of the scientific community. Chaotic time series are usually modeled by delay-differential equations; standard examples are the Mackey-Glass system [1] and the Ikeda equation [2] (for more examples, see [3]). Many methods have been used in chaotic time series analysis [4]. In recent decades, however, different types of artificial neural networks (ANN) have been widely used for forecasting chaotic time series, for example, the backpropagation algorithm [5], radial basis function networks [6], and recurrent networks [7].
On the other hand, the analysis of real-life time series requires taking into account the propagation of the input uncertainties. The observed data can be contaminated by different types of instrumental noise, such as white noise or noise proportional to the signal (the latter mainly arises from instrumental calibration). In the modeling of chaotic time series, the impact of noise can be treated as an errors-in-variables problem, where the noise is propagated into the prediction model.
In the literature, the impact of noise on chaotic time series prediction has barely been considered. We can find studies where the algorithms were tested from a theoretical point of view (e.g., see [8][9][10][11][12]) and works where the implementation was applied to real-life time series (e.g., see [9,13,14]). In addition, some authors have proposed modifications to the standard methods in order to improve the prediction performance in the presence of noise [9,14].
In this work, we used the Mackey-Glass chaotic time series in order to study the short-term prediction ($x(t + 6)$) with an artificial neural network optimized with a particle swarm algorithm (ANN+PSO). The method was applied to noiseless and noisy chaotic time series. In order to carry out the error propagation of the input noise, this hybrid algorithm was complemented with a Gaussian stochastic procedure to compute a new estimator of the predictions and their uncertainties. Note that ANNs have been used in combination with PSO in several applications. Principally, these applications include feed-forward neural network training [15][16][17][18], design of recurrent neural networks [19], design of radial basis function networks [20], and neural network control for nonlinear processes [21]. In addition, there are several current versions of PSO available in the literature (e.g., see the reviews [22][23][24]), but our application uses a standard PSO with inertia weight [25].
The use of a PSO with inertia weight is based on the following reasons: (1) this version of PSO is easy to understand and implement due to its simple concept and learning strategy; (2) as pointed out in [26], the PSO with inertia weight [25] and the PSO with constriction factor [27] are mathematically equivalent, and the PSO with constriction factor can be considered as a special case of the PSO with inertia weight [22,26] (note that this equivalence can be applied to other improved PSO algorithms that include a varying inertia weight schedule); (3) the inertia weight PSO algorithm is quite stable to population changes [23]; (4) the advantages and disadvantages of the PSO variants depend on the problem to solve [22][23][24]; (5) as a first approach to the study of noise effects on dynamical systems using an ANN combined with the inertia weight PSO algorithm, the present study may motivate and help researchers working in the field of evolutionary algorithms to develop new hybrid models or to apply other existing PSO models to this problem. To the best of the authors' knowledge, there is no application for forecasting noisy chaotic time series such as the one presented here, using a hybrid method that combines an ANN with a PSO algorithm.
This paper is organized as follows. In Section 2, we present a detailed description of the hybrid ANN+PSO method. Sections 3 and 4 present the simulation, the algorithm implementation, and the principal results obtained for the forecasting of noiseless and noisy chaotic time series, respectively. Finally, conclusions are given in Section 5.

Hybrid ANN+PSO Algorithm
Artificial neural networks (ANNs) are similar to biological neural networks in that they perform functions collectively and in parallel using connected nodes. Thus, ANNs are a family of biologically inspired statistical learning algorithms.
In this study, we consider one of the most successful and frequently used types of neural networks: a multilayer feedforward neural network, which is usually trained with a backpropagation learning algorithm (gradient descent on the error). Here, the ANN was implemented by replacing standard backpropagation with particle swarm optimization (PSO).
PSO is a population-based optimization tool, where the system is initialized with a population of random particles and the algorithm searches for optima by updating generations [28]. In each iteration, the velocity of each particle is calculated according to the following formula [29]:

$$v_i^{k+1} = w\,v_i^{k} + c_1 r_1 \left(p_i^{\mathrm{best}} - x_i^{k}\right) + c_2 r_2 \left(g^{\mathrm{best}} - x_i^{k}\right), \tag{1}$$

where $x$ and $v$ denote a particle position and its corresponding velocity in a search space, respectively. $k$ is the current step number, $w$ is the inertia weight, $c_1$ and $c_2$ are the acceleration constants, and $r_1$, $r_2$ are elements from two random sequences in the range (0, 1). $x_i^{k}$ is the current position of the particle, $p_i^{\mathrm{best}}$ is the best solution that this particle has reached, and $g^{\mathrm{best}}$ is the best solution that all the particles have reached. In general, the value of each component of $v$ can be clamped to the range $[-v_{\max}, +v_{\max}]$ to control excessive roaming of particles outside the search space [28,29]. After calculating the velocity, the new position of each particle is

$$x_i^{k+1} = x_i^{k} + v_i^{k+1}. \tag{2}$$

The procedure to calculate the output values, using the input values, is described in detail in [30].
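As an illustration of movement equations (1) and (2), a minimal NumPy sketch of a single swarm update is given below. It is not the authors' implementation: the function name `pso_step`, the array layout (one row per particle), and the numerical values used in the example call (in particular `v_max = 0.5`) are assumptions made only for this example.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w, c1, c2, v_max, rng):
    """One inertia-weight PSO update for a whole swarm.

    x, v, pbest: arrays of shape (n_particles, n_dims); gbest: shape (n_dims,).
    """
    r1 = rng.random(x.shape)  # random numbers, uniform in [0, 1)
    r2 = rng.random(x.shape)
    # Eq. (1): inertia term + cognitive term + social term
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    # Clamp each velocity component to [-v_max, +v_max]
    v_new = np.clip(v_new, -v_max, v_max)
    # Eq. (2): move the particles
    x_new = x + v_new
    return x_new, v_new

# Example call with hypothetical values (5 particles, 3 dimensions)
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(5, 3))
v = np.zeros_like(x)
x_new, v_new = pso_step(x, v, pbest=x.copy(), gbest=x[0],
                        w=0.7, c1=1.49, c2=1.49, v_max=0.5, rng=rng)
```

In this example the first particle is both its own best and the global best and starts at rest, so it does not move; the other particles are pulled towards it.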
The net inputs ($n$) are calculated for the hidden neurons from the input neurons. In the case of a neuron $j$ in the hidden layer, one has

$$n_j^h = \sum_{i} W_{j,i}^h\, p_i + b_j^h, \tag{3}$$

where $p$ is the vector of the training inputs, $W_{j,i}^h$ is the weight of the connection between the input neurons and the hidden layer $h$, and the term $b_j^h$ corresponds to the bias of neuron $j$ of the hidden layer $h$, reached in its activation. The PSO algorithm is very different from any of the traditional training methods [28]. Each neuron contains a position and a velocity. The position corresponds to the weight of a neuron ($x_i^{k} \to W_{j,i}^h$). The velocity is used to update the weight ($v_i^{k+1} \to W_{j,i}$). Starting from these inputs, the outputs ($y$) of the hidden neurons are calculated, using a transfer function $f^h$ associated with the neurons of this layer:

$$y_j^h = f^h\!\left(n_j^h\right). \tag{4}$$

The transfer functions can be linear or nonlinear. We used one hidden layer with $f^h$ as a hyperbolic tangent function (tansig) and $f^o$ as a linear function in the output layer:

$$f^h(n) = \tanh(n) = \frac{2}{1 + e^{-2n}} - 1, \qquad f^o(n) = n.$$

All the neurons of the ANN have an associated activation value for a given input pattern, and the algorithm continues by finding the error for each neuron, except those of the input layer. After finding the output values, the weights $W_{j,i}$ of all layers of the network are updated by PSO, using (1) and (2) [29]. The velocity is used to control how much the position is updated. On each step, PSO compares each weight using the data set. The network with the highest fitness is considered the global best. The other weights are updated based on the global best network rather than on their personal error or fitness [28].

In this paper, we used the mean square error (MSE) to determine the network fitness for the entire training set:

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i^{\mathrm{true}} - y_i^{\mathrm{calc}}\right)^2,$$

where $y^{\mathrm{true}}$ is the real data and $y^{\mathrm{calc}}$ is the calculated output value obtained from the normalized output ($y$) of the network. This process was repeated for the total number of patterns $N$ in the training set.
For a successful process, the objective of the algorithm is to update all the weights so as to minimize the total root mean squared error (RMSE): $F = \min(\mathrm{RMSE})$.
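The forward pass and the RMSE fitness described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the layer sizes are taken from the 4-6-1 architecture reported later for the noiseless case, and the flat parameter layout, together with the names `unpack`, `forward`, and `rmse`, is hypothetical. A flat vector is used because a PSO particle position is naturally a single vector of weights and biases.

```python
import numpy as np

N_IN, N_HID, N_OUT = 4, 6, 1  # assumed 4-6-1 architecture
# Total number of weights and biases = dimension of the PSO search space
N_PARAMS = N_HID * N_IN + N_HID + N_OUT * N_HID + N_OUT

def unpack(theta):
    """Split a flat particle position into weight matrices and bias vectors."""
    i = 0
    W_h = theta[i:i + N_HID * N_IN].reshape(N_HID, N_IN)
    i += N_HID * N_IN
    b_h = theta[i:i + N_HID]
    i += N_HID
    W_o = theta[i:i + N_OUT * N_HID].reshape(N_OUT, N_HID)
    i += N_OUT * N_HID
    b_o = theta[i:i + N_OUT]
    return W_h, b_h, W_o, b_o

def forward(theta, X):
    """Network output for inputs X of shape (n_patterns, N_IN)."""
    W_h, b_h, W_o, b_o = unpack(theta)
    hidden = np.tanh(X @ W_h.T + b_h)      # tansig hidden layer
    return (hidden @ W_o.T + b_o).ravel()  # linear output layer

def rmse(theta, X, y_true):
    """Root mean squared error used as the particle fitness."""
    return float(np.sqrt(np.mean((y_true - forward(theta, X)) ** 2)))
```

With this layout, the search space of the swarm has `N_PARAMS = 37` dimensions for a 4-6-1 network.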
In PSO, the inertia weight $w$, the constants $c_1$ and $c_2$, the number of particles $N_{\mathrm{part}}$, and the maximum particle speed $v_{\max}$ are the parameters to tune for a given problem. An exhaustive trial-and-error procedure was therefore applied to tune the PSO+ANN parameters. First, the effect of the population size $N_{\mathrm{part}}$ was analyzed for swarms of 25 to 100 individuals. For other applications, some authors have shown that a larger swarm increases the number of function evaluations needed to converge to an error limit [31]. In addition, Shi and Eberhart [32] illustrated that the population size has hardly any effect on the performance of a swarm algorithm. Figure 1(a) shows that the best population for this problem is 50 individuals. Next, the effect of $w$ was analyzed for values from 0.1 to 0.9. Figure 1(b) shows the values of $w$ that favoured the search of the particles and accelerated the convergence: for a linearly decreasing inertia weight starting at 0.7 and ending at 0.5, the PSO+ANN presents a good convergence. A usual choice for the acceleration coefficients is $c_1 = c_2$ [31]. The effect of these constants was evaluated for the commonly used values 1.49 and 2.00 [31,32]; $c_1 = c_2 = 1.49$ gave a better convergence than the other values. Table 1 shows the selected parameters for this hybrid algorithm.
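The linearly decreasing inertia weight mentioned above can be written as a one-line schedule. The sketch below assumes that the decrease is linear in the iteration index `k` and that `k_max` denotes the last iteration; both names are hypothetical.

```python
def inertia_weight(k, k_max, w_start=0.7, w_end=0.5):
    """Linearly decreasing inertia weight at iteration k (0 <= k <= k_max)."""
    return w_start - (w_start - w_end) * k / k_max
```

A larger weight at the beginning favours exploration of the search space, while the smaller final weight favours local refinement around the best solutions found.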
The step-by-step approach of PSO+ANN can be summarized as follows.
Step 1. Initialize the positions (weights and biases) and velocities of a group of particles randomly. The particles represent the weight vectors of ANN, including biases. The dimension of the search space is therefore the total number of weights and biases.
Step 2. The ANN is trained using the initial particle positions in PSO. The learning error produced by the ANN can be treated as the fitness value of each particle according to its initial weights and biases. The current best fitness achieved by particle $p$ is set as $p^{\mathrm{best}}$. The $p^{\mathrm{best}}$ with the best value is set as $g^{\mathrm{best}}$, and this value is stored.
Step 3. Evaluate the desired optimization fitness function (7) over a given data set.

Computational Intelligence and Neuroscience
Step 4. Compare the evaluated fitness value of each particle ($F_p$) with its $p^{\mathrm{best}}$ value. If $F_p < p^{\mathrm{best}}$, then $p^{\mathrm{best}} = F_p$, and the corresponding coordinates are stored as the best position of that particle so far.
Step 5. The objective function value is calculated for the new positions of each particle. If a better position is achieved by an agent, the $p^{\mathrm{best}}$ value is replaced by the current value. As in Step 2, the $g^{\mathrm{best}}$ value is selected among the $p^{\mathrm{best}}$ values. If the new $g^{\mathrm{best}}$ value is better than the previous one, it is replaced by the current value and this value is stored; that is, if $F_p < g^{\mathrm{best}}$, then $g^{\mathrm{best}} = F_p$, and the corresponding particle is the one having the overall best fitness over all particles in the swarm.
Step 6. The learning error at current epoch will be reduced by changing the particles position, which will update the weight and bias of the network. Change the velocity and location of the particle according to movement equations (1) and (2). The new sets of positions (weights and biases) are produced by adding the calculated velocity value to the current position value. Then, the new sets of positions are used to produce new learning error in ANN.
Step 7. The process stops when one of the stopping conditions is met (minimum learning error or maximum number of iterations); otherwise, loop to Step 3 until convergence.
Step 8. The optimum weight and biases for ANN model are obtained by PSO. Best training process is obtained for ANN.
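Steps 1 to 8 can be condensed into a short training loop. The sketch below is a generic illustration rather than the authors' code: `pso_minimize` and its arguments are hypothetical names, the default value of `v_max`, the initialization range, and the tolerance are assumptions, and the toy fitness at the end (fitting a straight line) merely stands in for the RMSE of the ANN on the training set. Inside the loop the particles are moved first and then re-evaluated, which is equivalent to cycling through Steps 3 to 6.

```python
import numpy as np

def pso_minimize(fitness, n_dims, n_particles=50, n_iter=200,
                 w_start=0.7, w_end=0.5, c1=1.49, c2=1.49,
                 v_max=1.0, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: random positions (weights and biases) and velocities
    x = rng.uniform(-1.0, 1.0, (n_particles, n_dims))
    v = rng.uniform(-v_max, v_max, (n_particles, n_dims))
    # Step 2: the initial fitness defines pbest and gbest
    f = np.array([fitness(p) for p in x])
    pbest, pbest_f = x.copy(), f.copy()
    g = int(np.argmin(pbest_f))
    gbest, gbest_f = pbest[g].copy(), float(pbest_f[g])
    for k in range(n_iter):
        # Step 6: move the particles with Eqs. (1) and (2)
        w = w_start - (w_start - w_end) * k / max(n_iter - 1, 1)
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -v_max, v_max)
        x = x + v
        # Step 3: evaluate the fitness at the new positions
        f = np.array([fitness(p) for p in x])
        # Step 4: update the personal bests
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        # Step 5: update the global best
        g = int(np.argmin(pbest_f))
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g].copy(), float(pbest_f[g])
        # Step 7: stop on minimum error (or when n_iter is exhausted)
        if gbest_f < tol:
            break
    # Step 8: best weights and biases found by the swarm
    return gbest, gbest_f

# Toy usage: recover slope and intercept of a line (not the Mackey-Glass data)
xs = np.linspace(-1.0, 1.0, 50)
ys = 0.5 * xs - 0.3
best, best_f = pso_minimize(
    lambda p: float(np.sqrt(np.mean((p[0] * xs + p[1] - ys) ** 2))), n_dims=2)
```

For the hybrid method, the `fitness` argument would be the RMSE of the network evaluated with the particle position as its weight vector.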
In our time series analysis, if the input noise level contribution is available, the RMSE in the training phase is computed taking into account $\sigma_{N,i}$, the noise level of each $i$-th element. Note that $\sigma_{N,i} = \sigma_N$ under a white noise assumption. Henceforth, we refer to the hybrid ANN+PSO defined above as the standard ANN+PSO.

The Stochastic ANN+PSO.
As defined so far, the standard ANN+PSO is not designed to carry out the error propagation of the input noise level contribution. Nevertheless, once the standard ANN+PSO has been executed and has provided the optimal topology, we can apply an additional method in order to compute the uncertainty of the prediction.
Note that once the topology is established (number of hidden layers, neurons in each hidden layer, transfer functions $f^h$, and weights and biases ($W_{j,i}^h$ and $b_j^h$)), the neural network acts as a function (called function ANN) whose output only depends on the input vector (see (4)). The idea is to generate simulations from the input data ($\tilde{x} \equiv \tilde{x}(t)$) via a Gaussian random number generator in order to propagate the intrinsic data noise through the function ANN.
For each $i$-th element of the input time series, we generate $K$ simulations as

$$\tilde{x}_{i,k} = \tilde{x}_i + \mathrm{GR}\!\left(0, \sigma_{N,i}^2\right), \qquad k = 1, \ldots, K,$$

where the input noise level $\sigma_{N,i}$ is known. $\mathrm{GR}(0, \sigma_{N,i}^2)$ is a random number generator following a Gaussian distribution with mean zero and variance $\sigma_{N,i}^2$. Finally, for the $i$-th element, each input data set $\tilde{x}_{i,k}$ provides an output $y_{i,k}$. These $y_{i,k}$ are used to compute a new estimator of the prediction ($\hat{y}$) and an error on the prediction ($\hat{\sigma}$).
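The stochastic procedure can be sketched as a Monte Carlo propagation through the trained network. The sketch below rests on assumptions that are not confirmed by the text: the estimator $\hat{y}$ is taken to be the sample mean of the simulated outputs, $\hat{\sigma}$ their sample standard deviation, and the number of simulations `n_sim = 1000` is arbitrary. The callable `ann` stands for the trained function ANN and is hypothetical.

```python
import numpy as np

def stochastic_prediction(ann, X, sigma_n, n_sim=1000, seed=0):
    """Propagate Gaussian input noise through a fixed network.

    ann: callable mapping inputs of shape (n_patterns, n_inputs)
         to outputs of shape (n_patterns,).
    sigma_n: standard deviation of the white input noise.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # One zero-mean Gaussian perturbation per simulation and per input element
    noise = rng.normal(0.0, sigma_n, size=(n_sim,) + X.shape)
    outputs = np.stack([ann(X + eps) for eps in noise])  # (n_sim, n_patterns)
    y_hat = outputs.mean(axis=0)             # assumed estimator: sample mean
    sigma_hat = outputs.std(axis=0, ddof=1)  # assumed error: sample std
    return y_hat, sigma_hat

# Toy check with a linear "network" that sums its four inputs
y_hat, sigma_hat = stochastic_prediction(lambda A: A.sum(axis=1),
                                         np.ones((3, 4)), sigma_n=0.1)
```

For the toy linear function, the output uncertainty should be close to $0.1\sqrt{4} = 0.2$, which gives a quick sanity check of the propagation.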

Noiseless Chaotic Time Series Prediction
We computed the chaotic time series from the Mackey-Glass time-delay differential system [1,33], which is described as follows:

$$\frac{dx(t)}{dt} = \frac{a\,x(t-\tau)}{1 + x^{10}(t-\tau)} - b\,x(t),$$

where $x$ (unitless) is the series at time $t$ and $\tau$ is the time delay. Here, we assumed that $a = 0.2$, $b = 0.1$, and $x(0) = 1.2$. Note that if $\tau \geq 17$, the time series shows a chaotic behaviour [33,34]. The nominal Mackey-Glass time series is obtained from numerical integration by a fourth-order Runge-Kutta method. This series was computed with a time sampling of 1 second. Thus, $x(t)$ is derived for $0 \leq t \leq t_h$ with $x(t) = 0$ for $t < 0$, where $t_h$ is the time horizon considered. The Mackey-Glass chaotic time series with $\tau = 17$ is considered as the nominal case, Noiseless (without noise contribution). Here, we generate two thousand data points ($t_h = 2000$).
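A minimal sketch of how such a series may be generated is shown below. It uses the parameters quoted above, but it makes one simplifying assumption that the text does not confirm: the delayed term $x(t - \tau)$ is held constant within each Runge-Kutta step, so that no interpolation of the history at half steps is needed. The function name `mackey_glass` is hypothetical.

```python
import numpy as np

def mackey_glass(n_points=2000, tau=17, a=0.2, b=0.1, x0=1.2, dt=1.0, n=10):
    """Integrate the Mackey-Glass equation with a fourth-order Runge-Kutta step.

    Returns n_points + 1 samples, x(0), x(dt), ..., x(n_points * dt).
    Assumption: the delayed value is frozen during each step.
    """
    lag = int(round(tau / dt))
    x = np.zeros(n_points + 1)
    x[0] = x0

    def rhs(x_now, x_lag):
        return a * x_lag / (1.0 + x_lag ** n) - b * x_now

    for i in range(n_points):
        x_lag = x[i - lag] if i >= lag else 0.0  # x(t) = 0 for t < 0
        k1 = rhs(x[i], x_lag)
        k2 = rhs(x[i] + 0.5 * dt * k1, x_lag)
        k3 = rhs(x[i] + 0.5 * dt * k2, x_lag)
        k4 = rhs(x[i] + dt * k3, x_lag)
        x[i + 1] = x[i] + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x

series = mackey_glass()
```

During the first $\tau$ seconds the delayed term is zero, so the series simply decays as $x(0)\,e^{-bt}$ before the delayed feedback sets in.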
According to the standard analysis of the Mackey-Glass chaotic time series, we consider four nonconsecutive points in the chaotic time series in order to predict the short-term $x(t + 6)$:

$$x(t + P) = f\big(x(t),\, x(t - \Delta),\, \ldots,\, x(t - (D - 1)\Delta)\big),$$

where this standard test assumes $D = 4$ and $\Delta = P = 6$ [6,34]. For this input, the first thousand data sets were used for learning (training), while the others were used for the prediction validation (prediction). In the ANN+PSO implementation on the nominal case, the optimum number of neurons in the hidden layer, $N_{HL}$, was found to be six; that is, the architecture is described as 4-6-1. Table 2 shows the RMSE (for the short-term prediction of the Mackey-Glass chaotic time series) of different computational methods obtained from the literature, for example, the backpropagation NN [35], the conjugate gradient ANN [36], the product operator T-norm [37], and the fuzzy system [38] (see references in Table 2). In the ANN+PSO configuration used here, the RMSE = 0.0138 indicates that the prediction performance is in good agreement with other methods. The inclusion of the PSO approach improves on ANN-based methods without PSO, for example, the conjugate gradient ANN (RMSE = 0.2296) and the backpropagation NN (RMSE = 0.0262).
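The construction of the input patterns and targets for this standard test can be sketched as follows. The function name `make_patterns` and the ordering of the four inputs (oldest first) are assumptions made for this example; the split into the first thousand patterns for training follows the description above.

```python
import numpy as np

def make_patterns(x, D=4, delta=6, P=6):
    """Build inputs [x(t-(D-1)*delta), ..., x(t-delta), x(t)] and targets x(t+P)."""
    x = np.asarray(x, dtype=float)
    first = (D - 1) * delta  # earliest t with a complete input history
    last = len(x) - P        # exclusive upper bound on t
    t = np.arange(first, last)
    X = np.stack([x[t - j * delta] for j in range(D - 1, -1, -1)], axis=1)
    y = x[t + P]
    return X, y

# Toy check on a ramp, where the indices are easy to read off
X, y = make_patterns(np.arange(100))
X_train, y_train = X[:1000], y[:1000]  # first thousand patterns (all of them here)
```

On the ramp, the first pattern is `[0, 6, 12, 18]` with target `24`, which makes the spacing $\Delta = 6$ and the horizon $P = 6$ explicit.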

Chaotic Behaviour.
Table 2: RMSE for the short-term prediction of the Mackey-Glass chaotic time series obtained with different methods.

Method                                      RMSE (t + 6)
Linear model [35]                           0.5503
Conjugate gradient ANN [36]                 0.2296
Product operator T-norm [37]                0.0907
Fuzzy system [38]                           0.0816
Cascade correlation NN [39]                 0.0624
Genetic algorithm and fuzzy system [40]     0.0490
Backpropagation NN [35]                     0.0262
Linguistic model (20 rules) [41]            0.0256
k-nearest neighbor [42]                     0.0194
This work                                   0.0138

As the Mackey-Glass time series without noise is a known system, it is possible to assess the ability of the ANN+PSO method to reproduce its chaotic behaviour. Figure 3 shows a representation of the chaotic attractor obtained from the Mackey-Glass time series. This figure shows that with $\tau = 17$ the system operates in a high-dimensional regime. The Mackey-Glass system is an infinite-dimensional system (because it is a time-delay equation) and, thus, has an infinite number of Lyapunov exponents ($\lambda$) [33]. The Lyapunov exponents of dynamical systems are one of a number of invariants that characterize the attractors of the system in a fundamental way [43]. Table 3 shows a comparison of the first four largest Lyapunov exponents of the Mackey-Glass system reported in [33] with the Lyapunov exponents obtained with the ANN+PSO method for $\tau = 17$. An approach to determine an appropriate cutoff value for the number of exponents can be related to the Lyapunov dimension [43]. This idea was originally explored by Kaplan and Yorke [44], who conjectured that this dimension ($D_{KY}$) is equal to the information dimension [45]. In our case, $D_{KY}$ is computed as 2.10. Note that, in Farmer

Noisy Chaotic Time Series Prediction
In the previous section, the ANN+PSO was shown to be an efficient method for the prediction of chaotic time series. Nevertheless, the effects of noise on the hybrid ANN+PSO implementation have not yet been studied. In order to study the impact of noise on chaotic time series prediction, we constructed the noisy time series by adding a noise contribution to the nominal case without noise. The Mackey-Glass noisy chaotic time series, $\tilde{x}_i \equiv \tilde{x}(t_i)$, is generated as

$$\tilde{x}_i = x_i + \epsilon_i,$$

where $\epsilon_i$ is the particular contribution of noise on the $i$-th element. It is estimated as $\epsilon_i = \mathrm{GR}(0, \sigma_{N,i}^2)$, with $\mathrm{GR}(0, \sigma_{N,i}^2)$ a Gaussian random number generator.
Note that $\sigma_{N,i}^2$ corresponds to the noise level considered. Here, we assume that the original data are affected by white noise; that is, the noise level is the same for each $i$-th element, $\sigma_{N,i} = \sigma_N$. (For clarification, although the noise level is the same at each time, the noise contribution is not, since the latter depends on the Gaussian random number generator.) Different white noise levels are considered, with $\sigma_N$ ranging from 0.01 to 0.1.

Figure 4 shows the noisy chaotic time series for $\sigma_N$ equal to 0.01 (green), 0.04 (blue), and 0.1 (red). As expected, the noisy time series with $\sigma_N = 0.01$ is the closest to the nominal case. The cases with $\sigma_N = 0.04$ and $\sigma_N = 0.1$ depart more visibly from the noiseless case, in particular for $\sigma_N = 0.1$.
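The construction of a noisy series from the nominal one can be sketched in a few lines. The function name `add_white_noise` is hypothetical, and `sigma_n` is interpreted here as the standard deviation of the Gaussian generator, which is an assumption about the notation.

```python
import numpy as np

def add_white_noise(x, sigma_n, seed=0):
    """Return x plus zero-mean Gaussian noise with the same level at every time."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    return x + rng.normal(0.0, sigma_n, size=x.shape)

# Toy check on a flat series: the scatter of the result estimates sigma_n
noisy = add_white_noise(np.zeros(20000), sigma_n=0.1)
```

Although the noise level is identical at every time, each element receives a different random contribution, as noted above.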

Noise Effect on ANN+PSO.
The standard ANN+PSO is applied to our noisy time series, which provides the optimum topology and the prediction. Then, the stochastic ANN+PSO is run in order to obtain a new prediction estimator $\hat{y}$ and the uncertainty of the prediction ($\hat{\sigma}$).
Impact on Architecture. For each noisy time series, in the standard ANN+PSO implementation, we carry out a detailed study of the architecture characterization. To determine the optimum $N_{HL}$, the RMSE is computed for different numbers of neurons in the hidden layer (from two up to thirty); the results are presented in Figure 5. For each series, the optimum $N_{HL}$ is obtained when the RMSE reaches a minimum. As expected, the characterization of the architecture is strongly related to the noise level in the input data. For low noise (such as 0.01), the optimum $N_{HL}$ is clearly identified in Figure 5; in contrast, in the most contaminated case ($\sigma_N = 0.1$), the selection depends on the fourth decimal of the RMSE (0.1292, 0.1291, and 0.1293 for 19, 20, and 21 neurons in the hidden layer, resp.). The RMSE and the optimum $N_{HL}$ are presented in Table 4. Using these values and according to the trend seen in Figure 5, we fit a linear model, which provides a correlation with a slope of 0.0085. Although the $N_{HL}$ for $\sigma_N = 0.08$ is not well characterized by this model, we find a clear linear correlation between the RMSE and the $N_{HL}$ for different noise levels. In this context, as an illustration, the inset (top-right side of Figure 5) shows the relation between $N_{HL}$ and the noise level, whose best linear fit is $N_{HL} = 146\,\sigma_N + 4.7$. Therefore, the impact of noise on the architecture of this hybrid neural network, for contributions lower than 0.1, can be characterized by a linear correlation of the RMSE with the $N_{HL}$ and of the $N_{HL}$ with the input noise $\sigma_N$.

The Prediction Performance. As an illustration, the predictions obtained for the noisy case $\sigma_N = 0.1$ with the standard ANN+PSO ($y^{\mathrm{calc}}$) and the stochastic ANN+PSO ($\hat{y}$) procedures are presented in Figure 6. As expected, even at this high noise level, the $y^{\mathrm{calc}}$ and $\hat{y}$ predictions are in agreement. In fact, the RMSE obtained from both methods is the same (to the third decimal) for each noisy case. For this reason, the RMSE shown in Table 4 represents the RMSE of both methods.
On the other hand, as expected, the RMSE increases with the noise level (see Figure 7). For example, we obtained an RMSE of 0.0138 and 0.13 for the noiseless and noisy (with $\sigma_N = 0.1$) cases, respectively. From Figure 7, we observe a linear correlation between the RMSE and the input noise level. The best fit model, without considering the RMSE of the noiseless case, corresponds to RMSE $= 1.3\,\sigma_N$, which shows a strong linear correlation. Therefore, we confirm that a higher noise level in the input data leads to a poorer prediction estimator, with an error that grows linearly with the input noise level.
Also, the ratio RMSE$_{\mathrm{noisy}}$/RMSE$_{\mathrm{noiseless}}$ (third column in Table 4) can be used to study the impact of noise on the performance efficiency of our implementation (with respect to the nominal case). The bottom-right panel of Figure 7 shows the performance efficiency against the input noise level. In the worst case, the performance efficiency degrades by one order of magnitude with respect to the noiseless case. Even so, the standard and stochastic ANN+PSO prove to be a powerful tool for making predictions of chaotic time series.
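As a quick consistency check of the fitted relations quoted in this subsection, they can be evaluated at the highest noise level, $\sigma_N = 0.1$. The arithmetic below uses only numbers stated in the text; the comparison with the values of Table 4 is approximate because the fits are rounded.

```latex
\begin{align*}
\mathrm{RMSE}(\sigma_N = 0.1) &\approx 1.3 \times 0.1 = 0.13, \\
\frac{\mathrm{RMSE}_{\mathrm{noisy}}}{\mathrm{RMSE}_{\mathrm{noiseless}}}
  &\approx \frac{0.13}{0.0138} \approx 9.4, \\
N_{HL}(\sigma_N = 0.1) &\approx 146 \times 0.1 + 4.7 = 19.3 .
\end{align*}
```

The first value reproduces the reported RMSE of 0.13, the ratio of about 9.4 corresponds to the quoted degradation of one order of magnitude, and the third value falls within the range of 19 to 21 hidden neurons for which the RMSE is nearly flat.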
In the literature, we did not find a similar implementation (in terms of the prediction horizon, the type and level of noise, etc.) that allows a straightforward comparison of results. For example, we can contrast our results with those presented by Sheng et al. 2012 [14]. They applied the Echo State Network (ESN) based on dual estimation to a noisy Mackey-Glass time series (with a sampling of 2 seconds) with a white noise level of $\sigma_N = 0.1$. However, their prediction horizon was one step ahead, which is shorter than ours.

Prediction Uncertainties. One of the main goals of this work is to estimate the uncertainty of the prediction. The prediction ($\hat{y}$) and the error bars ($\hat{\sigma}$) obtained from the stochastic ANN+PSO, for the noisy time series with $\sigma_N = 0.1$, are presented in Figure 8. When the error bars are considered, we confirm that our forecast and the input data agree at one sigma (68.3% confidence level), even for this strong noise contribution. The uncertainties obtained are presented in the lower panel of Figure 8. We found a minimum and maximum uncertainty of 0.024 and 0.13, respectively, with an average of $\langle\hat{\sigma}\rangle = 0.07$. This value is lower than the input noise level ($\langle\hat{\sigma}\rangle/\sigma_N = 0.7$), and this shows the impact of the error propagation in our methods. According to Figure 8, no relationship between the uncertainties and time is apparent.
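The agreement at one sigma can be quantified by the fraction of points whose observed value lies inside the error bar of the prediction. The sketch below is one possible way to compute it, not necessarily the check used by the authors; the function name `one_sigma_coverage` and the toy numbers are hypothetical.

```python
import numpy as np

def one_sigma_coverage(y_obs, y_hat, sigma_hat):
    """Fraction of points with |y_obs - y_hat| <= sigma_hat."""
    y_obs, y_hat, sigma_hat = map(np.asarray, (y_obs, y_hat, sigma_hat))
    return float(np.mean(np.abs(y_obs - y_hat) <= sigma_hat))

# Toy example: three of the four observations fall inside the error bars
coverage = one_sigma_coverage([1.0, 2.0, 3.0, 4.0],
                              [1.05, 2.5, 3.0, 3.95],
                              [0.1, 0.1, 0.1, 0.1])
```

For Gaussian errors with well-estimated uncertainties, this fraction would be expected to lie close to 0.68.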
Finally, Figures 6 and 8 show that ANN+PSO (with the standard and/or the stochastic implementation) is a robust tool for the short-term prediction of time series affected by white noise. In addition, the ANN+PSO method can now provide, for the first time, an estimation of the uncertainty of the prediction.

Conclusions
In this paper, a hybrid algorithm based on an artificial neural network and particle swarm optimization (ANN+PSO) is used in the short-term ($x(t + 6)$) prediction of the Mackey-Glass chaotic time series. In addition, a study of the impact of noise on our hybrid method is presented. Based on the results and discussion presented in this study, we draw the following conclusions.
(i) The current value ($x(t)$) and the past values used have an influential effect on the training and predicting capabilities of the chosen network.
(ii) In the noiseless case, the simulation shows that this hybrid ANN+PSO algorithm is a very powerful tool for the prediction of chaotic time series, and the low deviations found with the proposed method show an accuracy comparable with other methods available in the literature.
(iii) In the noisy cases, we have shown that the hybrid ANN+PSO is a robust tool for the short-term prediction of chaotic time series affected by white noise.
(iv) The impact of noise on the topology and performance efficiency of the ANN+PSO is important. However, this study shows that the error propagation through the ANN+PSO has a linear behaviour, which generates a linear relationship between the RMSE (optimization parameter) and the input noise level. Therefore, the PSO optimization provides a linearity which ensures that the neural network will converge to an appropriate solution, even if a noise level contribution is present.
(v) For the noisy cases, although a straightforward comparison with the literature is unavailable, the performance efficiency suggests that the standard/stochastic ANN+PSO implementation is affected to a lesser degree than other similar implementations.