Echo State Property upon Noisy Driving Input


The echo state property (ESP) was originally defined by Jaeger [5] to grasp the computational capabilities and mechanisms of ESNs, and it may be used as a design principle for reservoirs. Intuitively, a reservoir having this property can be successfully entrained by the driving input to generate high-dimensional, nonlinear, and rich memory properties for computations. In other words, the reservoir dynamics asymptotically wash out the transient initial state under the driving input; this is also referred to as "input forgetting" or "state forgetting" [5]. The original ESP [5] claims that an ESN meets the ESP if all state vectors driven by any input sequence from a compact set U asymptotically converge to the same state. A sufficient condition for achieving the ESP is that the largest singular value of the weight matrix is smaller than unity, while a necessary condition is that the spectral radius of the matrix is smaller than unity. Some following works elaborated on the ESP definition and its sufficient conditions by taking into account the nature of the driving input signals. The work by Yildiz et al. [42] considered the distribution of the driving input signals to refine the original ESP concept. This work had important ramifications for the conditions for the ESP: a spectral radius greater than unity does not necessarily imply a loss of the ESP, and thus the commonly used procedure of scaling the spectral radius to below unity to ensure the ESP can be flawed. Meanwhile, the condition on the largest singular value of the weight matrix was found to be too restrictive, leading to poor performance, and alternative conditions were formulated [42, 43]. Subsequently, less restrictive sufficient conditions for the ESP that prevent the fast washing-out problem were developed [42, 43]. More recently, Manjunath and Jaeger [44] provided an alternative formulation where the ESP is defined with respect to a specific input signal rather than a range of possible inputs. For a given input signal, the formulation prescribes the spectral properties of the network weight matrix W that satisfies the ESP. However, work still needs to be done to elucidate when the ESP is satisfied given a broad distribution of input signals, e.g., inputs in the presence of noise. Wainrib and Galtier [45] developed a cheap algorithm to establish a local and operational version of the ESP through the computation of the largest Lyapunov exponent. Basterrech [46] presented an empirical analysis of the accuracy and the input mapping of reservoirs. Kubota et al. [47] experimentally demonstrated that cultured neuronal networks can have the ESP to serve as physical reservoir computers.
As stated above, the ESP concept has undergone several refinements since its introduction, yet the validity of the widely used literature conditions for the ESP is limited because they do not always properly account for the nature of the driving input signals. In particular, the simple technical treatment (i.e., setting the (effective) spectral radius of the weight matrix below unity) often fails to guarantee the ESP when a combination of driving input and intrinsic reservoir dynamics causes unfavorable conditions for forgetting the transient initial state. Thus, the empirical assessment of the ESP in practice needs more elaboration in providing sufficient conditions for the ESP, considering that the ESP is a cooperative phenomenon of the intrinsic reservoir dynamics and a set of admissible driving inputs. This study specifically describes numerical simulations and an analytical characterization of the ESP in the presence of noisy driving inputs. The standard ESN model with a hyperbolic tangent activation function is used to examine the dynamical properties and the resulting ESP during MNIST handwritten digit classification tasks at different additive white Gaussian noise levels. The effects of the noise on the reservoir dynamics are characterized by various measures, including the correlations among the neuronal activities within a reservoir, the mapping of the noisy input to the reservoir, and the memory capacity. These dynamical properties are related to the MNIST classification accuracy and bifurcation dynamics of the reservoir. In addition, an ESP index for noisy driving input is developed based on the work by Gallicchio [48] to help easily assess the property in practical applications. Bifurcation analysis was employed to capture the underlying dynamical properties of the reservoir and to confirm the validity of the proposed ESP index.
By way of outline, Section 2 reviews the theoretical frameworks for understanding ESNs and the ESP. Section 3 describes the computational and theoretical methods, including the definition of the noisy input-driven ESP, bifurcation analysis, and the MNIST classification task. Section 4 begins by describing the changes in the dynamical properties of the reservoir against noise, followed by the bifurcation dynamics of the reservoir, and these are related to the deterioration of classification accuracy and the ESP index. Section 5 combines the discussion and conclusions.

Theoretical Framework
The standard ESN model [5] with N reservoir units, K inputs, and L outputs is defined as follows:

x_{k+1} = f(W x_k + W_in u_{k+1} + W_fb y_k),   y_{k+1} = g(W_out x_{k+1}),   (1)

where x_k ∈ R^{N×1}, u_k ∈ R^{K×1}, and y_k ∈ R^{L×1} are the internal, input, and output vectors at time k, respectively, W ∈ R^{N×N} is the internal weight matrix of the reservoir, W_in ∈ R^{N×K} is the input matrix, W_fb ∈ R^{N×L} is the feedback matrix, and W_out is the output matrix. The state activation function f = (f_1, ..., f_N)^T is a sigmoid function (usually each f_i is a hyperbolic tangent) applied component-wise with f(0) = 0, and the output activation function is g = (g_1, ..., g_L)^T, where each g_i is usually the identity or a sigmoid function. The compactness condition means F is defined on X × U, where X ⊂ R^N and U ⊂ R^K are compact sets; U^{−∞} and X^{−∞} denote the sets of left-infinite input and state vector sequences, respectively. We say x^{−∞} is compatible with u^{−∞} when x_{k+1} = F(u_{k+1}, x_k), ∀k ≤ 0.
The standard ESN with a hyperbolic tangent activation function and without feedback (i.e., f = tanh and W_fb = 0) is given by

x_{k+1} = tanh(W x_k + W_in u_{k+1}).   (2)

The ESP is connected to the spectral properties of the weight matrix W, and some work has been devoted to stating and refining sufficient/necessary conditions for the ESP of the standard ESN [5, 42-44]. A rather restrictive sufficient condition for the ESP was given by Jaeger [5] as σ_max(W) < 1, where σ_max(W) denotes the maximum singular value of W. Since this condition is too restrictive and the input is washed out very fast, it is not commonly used in practice. A less restrictive sufficient condition known to date is that W is diagonally Schur stable [42, 43]. More recently, Manjunath and Jaeger [44] provided an improved formulation of the sufficient condition for the ESP linked to an input, expressed in terms of the quantity C(t) = min(|W_in u(t)|) and the indicator function I, which is 1 when its argument is true and 0 otherwise. In the presence of zero input, a necessary condition for the ESP is ρ(W) < 1, where ρ(W) denotes the spectral radius of W [5]. However, for nonzero input signals, this condition is neither sufficient nor necessary for the ESP [42].
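As a concrete reading of equation (2), the following minimal NumPy sketch iterates the update; it is our own illustration, and the function name, sizes, and spectral radius value are assumptions rather than settings from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 1                                  # reservoir units, input dimension

W = rng.uniform(-1.0, 1.0, (N, N))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))      # rescale spectral radius to 0.9
W_in = rng.uniform(-0.5, 0.5, (N, K))

def esn_step(x, u):
    # One update of equation (2): x_{k+1} = tanh(W x_k + W_in u_{k+1})
    return np.tanh(W @ x + W_in @ u)

x = np.zeros(N)
for u in rng.normal(size=(200, K)):            # an arbitrary driving sequence
    x = esn_step(x, u)
```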
Conditions for the ESP used in the literature typically fail to properly account for the effects of driving input signals, often limiting the potentialities of the RC approach. Gallicchio [48] introduced an empirical ESP index that enables analysis of the stability regimes of reservoirs:

ESP index = (1/P) Σ_{p=1}^{P} [1/(T − L + 1)] Σ_{k=L}^{T} D(x_k^{(p)}, x_k^{(0)}),   (3)

where P is the number of randomly generated initial states, the first and the last time steps used for the calculation of the ESP index are denoted by L and T, respectively, D(x, y) represents the Euclidean distance between two vectors x and y, x_k^{(p)} is the trajectory started from the p-th random initial state, and x_k^{(0)} is the reference trajectory started from the zero state. A process φ on a state space X is defined as follows:

x_{k+1} = F(x_k, u_{k+1}), ∀k ∈ Z,   (4)

where the random variables U_k take values in a set U. The definition of the ESP for a specific input signal, which respects the nature of the expected input signals in more detail, was suggested by Manjunath and Jaeger [44]: a network F is said to have the echo state property with respect to an input sequence (u_k)_{k∈Z} if, with probability one, ∀u^{−∞} ∈ U^{−∞} and ∀x^{−∞}, y^{−∞} ∈ X^{−∞} compatible with u^{−∞}, it holds that x_0 = y_0.
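Equation (3) can be evaluated numerically along the lines of the following sketch, assuming the noise-free update of equation (2); the function name, default arguments, and trajectory handling are ours.

```python
import numpy as np

def esp_index(W, W_in, inputs, P=20, L=100, T=300, seed=0):
    """Average deviation, over P random initial states and steps L..T,
    between each trajectory and a reference started from the zero state."""
    rng = np.random.default_rng(seed)
    N = W.shape[0]

    def run(x0):
        x, traj = x0, []
        for u in inputs[:T]:
            x = np.tanh(W @ x + W_in @ u)      # equation (2)
            traj.append(x)
        return np.array(traj)

    ref = run(np.zeros(N))                     # reference trajectory x^(0)
    devs = [np.linalg.norm(run(rng.uniform(-1, 1, N))[L-1:T]
                           - ref[L-1:T], axis=1).mean()
            for _ in range(P)]
    return float(np.mean(devs))                # ~0 suggests the ESP holds
```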

Bifurcation Analysis in ESNs. The dynamical properties of ESNs have been investigated using bifurcation theory; it has been considered the standard method to examine qualitative changes of dynamical systems such as phase transitions and instabilities [49]. Yildiz et al. [42] used bifurcation analysis to prove that the spectral radius condition ρ(W) < 1 is not a necessary condition for the ESP under the zero-input environment [42]. With zero input u_k = 0, the origin becomes a trivial fixed point with a sigmoid activation function. Since the stationary origin state x_k = 0 is compatible with zero input, the problem is reduced to the stability and existence problem of additional fixed points, which can be analyzed with bifurcation theory in an autonomous dynamical system. In two-dimensional systems consisting of two nodes, the weight matrix should be in the stable triangle region [50]. For convenience, they assumed one component of the weight matrix to be zero, w_11 = 0. By fixing (det(W), tr(W)) = (c, c + 1), where c is positive, the eigenvalues of W are 1 and c, and new fixed points emerge away from the origin. This is called a degenerate bifurcation, which generates two more fixed points away from the origin in a two-dimensional system. Moreover, increasing the determinant det(W) with fixed trace tr(W) induces additional nontrivial fixed points away from the origin. This bifurcation analysis ensures the existence of nontrivial fixed points (at least four) away from the origin, which can be either asymptotically stable or unstable. The stability of these fixed points in the reservoir dynamics should be distinguished from stability in the bifurcation analysis.
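The eigenvalue bookkeeping behind this choice follows in one line from the characteristic polynomial of W:

```latex
\lambda^{2}-\operatorname{tr}(W)\,\lambda+\det(W)
  =\lambda^{2}-(c+1)\lambda+c
  =(\lambda-1)(\lambda-c)=0
  \quad\Longrightarrow\quad \lambda_{1}=1,\;\lambda_{2}=c,
```

so one eigenvalue sits exactly at the critical value 1 while the other is controlled by c.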
Using computational simulations, one can easily check the stability of these points by observing their basins of attraction. Yildiz et al. [42] showed a case in which there exist three asymptotically stable fixed points, including the origin and two fixed points from the degenerate bifurcation, and two saddle points from the pitchfork bifurcation. These results can be extended to higher dimensions using appropriate block matrices. However, the bifurcations of higher-dimensional nonautonomous dynamical systems are difficult to solve analytically: in higher dimensions, ESNs can exhibit the Neimark-Sacker bifurcation, indicating that fixed points other than the origin can exist at ρ(W) < 1, and the input noise significantly affects the dynamical properties of reservoirs and, in turn, the training and inference performance of ESNs. The mathematical theory called stochastic bifurcation theory has dealt with this type of dynamical system perturbed by additive stochastic noise [51]. However, artificial neural networks including ESNs map the input signal into the network through a nonlinear sigmoid-like activation function. This nonlinearity applied to the stochastic terms makes analytic approaches much more challenging due to the difficulty of linearly separating the deterministic and stochastic terms. To circumvent these problems, we resort to numerical simulations to examine the behavior of these stochastic and nonlinear dynamical systems.
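The basin inspection described above can be done in a few lines; the 2 × 2 matrix below is an arbitrary illustration with ρ(W) > 1, not a matrix from the paper.

```python
import numpy as np

W = np.array([[0.0, 1.3],      # an illustrative 2-node weight matrix,
              [0.9, 0.4]])     # not the one used in the paper

endpoints = set()
for x0 in np.random.default_rng(1).uniform(-1, 1, (100, 2)):
    x = x0
    for _ in range(1000):                 # iterate x_{k+1} = tanh(W x_k)
        x = np.tanh(W @ x)
    endpoints.add(tuple(np.round(x, 3)))  # cluster the converged states

print(endpoints)  # several distinct entries => multiple attractors
```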

Methods
We refine the literature definitions of the ESP [5, 42-44] and the ESP index [48] to adequately reflect the effects of noisy driving input u = ū + ξ, where ū is the input sequence from a compact set U and ξ is additive white Gaussian noise (AWGN). We also provide a sufficient condition for the ESP considering the noise.
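A minimal sketch of forming the noisy input u = ū + ξ at a target SNR, taking SNR = E[ū²]/Var(ξ) as in the task description of Section 3.3; the helper name is ours.

```python
import numpy as np

def add_awgn(u_clean, snr, rng):
    # SNR taken as E[u^2] / Var(noise); the noise variance follows from it
    noise_var = np.mean(u_clean ** 2) / snr
    return u_clean + rng.normal(0.0, np.sqrt(noise_var), u_clean.shape)
```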

Echo State Property upon Noisy Driving Input
Definition 1 (ESP with respect to noisy driving input). A network F: X × U ⟶ X (with the compactness condition) has the echo state property of tolerance ϵ and confidence c with respect to U and a noise process ξ if, for any left-infinite noisy input sequence u^{−∞} = ū^{−∞} + ξ^{−∞} and all x^{−∞}, y^{−∞} ∈ X^{−∞} compatible with u^{−∞}, it holds with probability at least c that ‖x_0 − y_0‖ ≤ ϵ. Here, ‖·‖ denotes the norm defined on the state space X. If ϵ = 0, c = 1, and ξ^{−∞} = (..., 0, 0), the definition reduces to the original one by Jaeger [5].
A sufficient condition for the ESP with respect to noisy driving input is given by the following proposition.

Proposition (sufficient condition for the ESP with respect to noisy driving input). For the standard ESN model in equation (2), if the following conditions are satisfied: (i) ξ and η are different realizations of the noise process; (ii) σ_max(W) < 1; and (iii) with probability at least c, σ_max(W_in) sup_k ‖ξ_k − η_k‖ / (1 − σ_max(W)) ≤ ϵ, then the ESN has the ESP of tolerance ϵ and confidence c with respect to the noisy driving input.

The proof of the proposition is presented in Appendix A. When the probability distribution of the noise is given, the regime of spectral properties of W and W_in where the ESP holds can be calculated using this proposition.
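Under our reconstruction of the proposition, the admissible regime can be probed by Monte Carlo sampling of the noise. The sketch below tests per-step noise differences rather than the supremum over time, so it is illustrative only; all names are ours.

```python
import numpy as np

def esp_condition_holds(W, W_in, noise_std, eps, c, n=100_000, seed=0):
    """Monte Carlo check of the reconstructed proposition: requires
    sigma_max(W) < 1, then tests whether the asymptotic bound
    sigma_max(W_in) * ||xi - eta|| / (1 - sigma_max(W)) stays below eps
    with probability at least c."""
    rng = np.random.default_rng(seed)
    s_w = np.linalg.norm(W, 2)            # largest singular value of W
    if s_w >= 1.0:
        return False                      # condition (ii) fails
    s_in = np.linalg.norm(W_in, 2)
    K = W_in.shape[1]
    diff = rng.normal(0, noise_std, (n, K)) - rng.normal(0, noise_std, (n, K))
    bound = s_in * np.linalg.norm(diff, axis=1) / (1.0 - s_w)
    return bool(np.mean(bound <= eps) >= c)
```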
Finally, we define the ESP index for noisy driving input to empirically assess the property:

ESP index (noisy) = (1/P) Σ_{p=1}^{P} [1/(T − L + 1)] Σ_{k=L}^{T} D(x̃_k^{(p)}, x_k^{(0)}),   (5)

where x̃_k^{(p)} is the trajectory of the process φ̃ defined using the noisy input sequence (equation (4)) and started from the p-th random initial state, and x_k^{(0)} is the noiseless reference trajectory started from the zero state.
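A sketch of equation (5) along the same lines as the earlier index; it reuses the add_awgn helper sketched above, and the function name and defaults are ours.

```python
import numpy as np

def noisy_esp_index(W, W_in, clean_inputs, snr, P=20, L=100, T=300, seed=0):
    """P noise-driven trajectories from random initial states are compared
    against a noiseless reference started from the zero state."""
    rng = np.random.default_rng(seed)
    N = W.shape[0]

    def run(x0, inputs):
        x, traj = x0, []
        for u in inputs[:T]:
            x = np.tanh(W @ x + W_in @ u)
            traj.append(x)
        return np.array(traj)

    ref = run(np.zeros(N), clean_inputs)           # noiseless reference
    devs = []
    for _ in range(P):
        noisy = add_awgn(clean_inputs, snr, rng)   # fresh noise realization
        traj = run(rng.uniform(-1, 1, N), noisy)
        devs.append(np.linalg.norm(traj[L-1:T] - ref[L-1:T], axis=1).mean())
    return float(np.mean(devs))
```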

Bifurcation Analysis.
For an ESN that satisfies the ESP, different reservoir states induced by the same driving input from a compact set U should asymptotically converge. If the input is zero (i.e., u = 0), the reservoir starting from any initial state should converge to the origin, which is a unique globally stable fixed point. The stability and uniqueness of this fixed point are analytically provable, yet analytical approaches seem challenging in the case of the nonlinear mapping of an input signal containing noise because the problem of separating the stochastic part from the deterministic term is not easy to solve. Instead, we use numerical simulations to examine the nonlinear, nonautonomous dynamical properties of the ESN upon noisy driving input.
The ESNs upon a driving input with white Gaussian noise evolve following the relation

x_{k+1} = tanh(W x_k + W_in ξ_k),   (6)

where x_k ∈ R^N is the reservoir state and ξ_k ∈ R^{K×1} is the K-dimensional vector whose components are independent white Gaussian noise. The input matrix is fixed to W_in = 0.5 I_{N×N} when noise is given, for convenience, and W_in = 0 without input. As discussed above, if there is no input (i.e., W_in = 0), the ESNs with the ESP will converge to the origin. Reservoir states were updated at each step following equation (6). After one update, all states are bounded within (−1, 1) since the activation function is bounded; hence, initial states within [−1, 1] are enough to examine the full dynamics of the reservoirs. In one-dimensional ESNs, 50 initial states within [−1, 1] were chosen from the uniform distribution U(−1, 1), and the reservoirs were updated for 300 steps. In two-dimensional ESNs, 100 initial states were chosen randomly in [−1, 1] × [−1, 1], with each component drawn from the uniform distribution U(−1, 1), and updated for 300 steps. Increasing the spectral radius of W (by multiplying by constants), we observe the stability of the ESNs to track the ESP. In the simulations, the input noise was given in the same way as in the MNIST classification task (Section 3.3).

Handwritten Digit Classification Tasks. The modified National Institute of Standards and Technology (MNIST) image database for handwritten digits ("0", "1", ..., "9") is used to test the performance of the standard ESN. The training and test accuracies are obtained using different numbers of examples: the sizes of the training sets vary from 120 to 240, ..., 30,000, and the test sets are one-sixth of the corresponding training set size, i.e., from 20 to 40, ..., 5,000 examples. Each training or testing set contains an equal number of samples from each digit. The pixel dimension of a digit image is 28 × 28 = 784 pixels. The AWGN was added to each image with different signal-to-noise ratios (SNRs). The SNR was defined as E(P̄²_{n,(i,j)})/Var(σ), where P̄_{n,(i,j)} is the intensity of the (i, j) pixel (i, j = 1, 2, ..., 28) in the n-th image without noise, and σ denotes white Gaussian noise with a mean of zero; the variance of σ is determined by the given SNR value. Pixel intensities of the noisy image are given as P_{n,(i,j)} = P̄_{n,(i,j)} + σ. Each image is transformed into 28 temporal driving input signals by presenting the normalized intensities of the 28 pixels in a column at each time step, i.e., the j-th element (j = 1, 2, ..., 28) of the input signal u_k at the time step k = 28n + i is given by P_{n,(i,j)}/255. The readout layer consists of 10 simple neurons with a linear activation function, where the output signal ŷ_D of the D-th readout neuron represents the likelihood that the input image at time t belongs to class D. We train the output ŷ_D to copy a target signal y_D, where y_D(t) = 1 if the current image at time t belongs to class D and y_D(t) = 0 otherwise. For the test, the class D with the highest probability is chosen.
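The encoding just described might look as follows in code; the helper names are ours, and whether rows or columns are presented per step depends on the image orientation convention.

```python
import numpy as np

def image_to_sequence(image):
    # image: (28, 28) array of pixel intensities in [0, 255];
    # step i presents the 28 normalized intensities P_{n,(i,j)} / 255
    return image.astype(float) / 255.0      # shape (28 steps, 28 inputs)

def one_hot_target(label, n_steps=28, n_classes=10):
    # y_D(t) = 1 while an image of class D is presented, else 0
    y = np.zeros((n_steps, n_classes))
    y[:, label] = 1.0
    return y
```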

Results
Figure 1 exhibits the effect of noise on the handwritten digit classification accuracy using the standard ESN model (see equation (2)). For each SNR level, the training and test accuracies converged as the training/test dataset size increased (left side of Figure 1(a); training and test accuracies colored cyan and blue, respectively), and the accuracies declined as the SNR level decreased (i.e., with increasing AWGN levels). The neuronal activities in the reservoir fluctuated more frequently as the SNR decreased (Figure 1(a)), and these fluctuations caused a decrease in the mean absolute value of the cross-correlations with a lag of zero (C_abs) between all pairs of neuronal activities in the reservoir (Figure 1(b)). The dependence of the training and test accuracies on the SNR was nonlinear, as displayed in Figure 1(c): both the training and test accuracies drastically dropped at SNR < ∼1.0. The short-term memory capacity (MC) of the reservoir, which is the primary measure of the network's ability to store past input information [52], nonlinearly decreased upon reducing the SNR level (i.e., promoting AWGN) (Figure 2(a)). The MC is the sum of the δ-delay short-term memory capacities MC_δ (δ = 1, ..., δ_max), i.e., MC = Σ_{δ=1}^{δ_max} MC_δ, where MC_δ is the coefficient of determination (R²) of the linear regression using the reservoir state vector x_k to predict the driving input signal u_{k−δ} with a delay of δ; δ_max was set to 20 in Figure 2(a), and the MC_δ values against SNR levels are displayed in Appendix B. The mapping score (S_M) of the driving input signal also decreased with increasing AWGN levels (Figure 2(b)). The score quantifies how much information of the driving input signal is mapped to the reservoir against noise by comparing the reservoir states induced by the driving inputs with and without noise. It is defined as S_M = 1/(1 + D(x̄, x)), where D denotes the Euclidean distance and x̄ and x are the state vectors induced by the input sequence without noise (ū_k) and by the noisy input sequence (u_k = ū_k + ξ_k), respectively. The trends of these two information processing indices of the reservoir are in good agreement with the deterioration pattern of the classification accuracy due to the noise (Figure 1(c)). This indicates that the addition of noise strongly influences the information processing of the reservoir by changing the neural activities, which in turn impacts the computational performance.
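Both measures can be sketched directly from their definitions; the regression details below (plain least squares, no regularization) are our assumption.

```python
import numpy as np

def memory_capacity(states, inputs, delta_max=20):
    # states: (T, N) reservoir trajectory; inputs: (T,) driving signal.
    # MC = sum over delta of R^2 when predicting u_{k-delta} from x_k.
    mc = 0.0
    for d in range(1, delta_max + 1):
        X, y = states[d:], inputs[:-d]
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ w
        mc += 1.0 - np.sum(resid**2) / np.sum((y - y.mean())**2)
    return mc

def mapping_score(x_clean, x_noisy):
    # S_M = 1 / (1 + D(x_bar, x)) with D the Euclidean distance
    return 1.0 / (1.0 + np.linalg.norm(x_clean - x_noisy))
```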
As a way to assess the computational capability of the reservoir, an echo state property (ESP) index for noisy driving input, which reflects the dynamical changes of the reservoir, is devised based on the work by Gallicchio [48]. The ESP index measures the average deviation of noise-driven trajectories induced by random initial states from a noiseless trajectory starting from the zero state (see Section 3.1 for a formal definition). It thus captures the existence of the ESP: intuitively, an ESP index value close to zero strongly suggests that the ESN possesses the ESP, while a larger value means the reservoir is far from having the ESP. Figure 3 displays the ESP index and test accuracy for noisy driving input as functions of the spectral radius and input scaling at different noise conditions: no addition of noise and adding AWGN with SNR from 4.0 to 0.1. The reservoirs were generated by varying the spectral radii and input scaling values as follows: the elements of the input matrix W_in were randomly sampled from a uniform distribution on the interval [−a, a], where a represents the input scaling. The internal weight matrix W was randomly generated so that 1% of the elements are nonzero, and these nonzero values are uniformly sampled from the range [−1, 1]. Then, W was rescaled to achieve the desired spectral radius. For the case without noise (Figure 3(a)), both the ESP index for noisy driving input and the original index by Gallicchio [48] generally agree with the distribution of the test accuracies on the spectral radius-input scaling plane: the index value becomes higher (i.e., less tendency to have the ESP) as the spectral radius increases and the input scaling decreases (i.e., towards the top-left corner of the plane). However, when the noise level is increased (Figures 3(b)-3(d)), the ESP index for noisy input captures well the collapse of the computational properties of the reservoirs (i.e., the deterioration of test accuracies), while the original index does not capture the effect of noise on the computation.
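The reservoir generation procedure just described, as a sketch; the function name and defaults are ours.

```python
import numpy as np

def make_reservoir(N, K, spectral_radius, input_scaling, density=0.01, seed=0):
    # W_in uniform on [-a, a]; W with ~1% nonzero entries uniform on [-1, 1],
    # then rescaled so that rho(W) equals the requested spectral radius.
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-input_scaling, input_scaling, (N, K))
    W = rng.uniform(-1.0, 1.0, (N, N)) * (rng.random((N, N)) < density)
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
    return W, W_in
```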
The detailed dynamical properties of ESNs upon noisy driving input are investigated using the one-dimensional (N = 1) and two-dimensional (N = 2) ESN models, in line with Yildiz's method for zero-input cases [42]. In the N = 1 case, W = ρ(W) is a constant scalar and is the spectral radius itself. x = 0 is a trivial fixed point of this system for arbitrary W. The stability of the trivial fixed point is determined by the spectral radius: x = 0 is stable if ρ(W) < 1 and unstable if ρ(W) > 1, and additional nonzero stable fixed points emerge for ρ(W) > 1. The evolution of the reservoir states obtained by numerical simulations confirmed these results. Figure 4 exhibits the dynamics of one-dimensional ESNs with zero input and with only white Gaussian noise. For the zero-input case, one-dimensional ESNs converge to the trivial fixed point x = 0 for ρ(W) = 0.9 (first column in Figure 4(a)) or to one of two nonzero fixed points for ρ(W) = 1.1 (second column in Figure 4(a)). The reservoir states manifested a pitchfork bifurcation (last column in Figure 4(a)): the states converged to a stable fixed point x = 0 for ρ(W) < 1. As the spectral radius increases, an unstable fixed point emerges around ρ(W) = 1, and two nonzero stable fixed points emerge for ρ(W) > 1 (Figure 4(a)). Upon white Gaussian noise, while the overall tendencies were similar to the zero-input case, the reservoir states fluctuated with the noise: different initial states converged to asymptotically the same trajectory for all noise levels (SNR = 4.0, 1.0, and 0.1), and the degree of fluctuation increased with the noise level (Figures 4(b)-4(d)). In the cases of ρ(W) = 0.9 (first column in Figures 4(b)-4(d)), reservoir states fluctuate around x = 0, and the degree of fluctuation increases as the noise level increases. In the cases of ρ(W) = 1.1 (second column in Figures 4(b)-4(d)), states flow into the vicinity of one of the two original fixed points. As the noise level increases (or the SNR decreases), the noise induces more blurring of the fixed points, while we can still see the footprints of the fixed points observed in the dynamics of ESNs without input. The last 100 steps are used to test the convergence of the reservoir states upon 10 different white Gaussian noise realizations (third column in Figures 4(b)-4(d)). For a given spectral radius ρ(W), states are colored dark green if they are bounded within the tolerance ϵ = 0.05, i.e., |x_k| < ϵ = 0.05 for 200 < k ≤ 300; otherwise, states are colored light green. In a noisy environment, states can deviate from the origin (i.e., |x_k| ≥ ϵ = 0.05) even at ρ(W) < 1.
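The one-dimensional sweep can be reproduced along these lines; ϵ = 0.05, 50 initial states, 300 steps, and W_in = 0.5 follow the text, while the noise level below is illustrative rather than one of the paper's SNR settings.

```python
import numpy as np

rng = np.random.default_rng(0)
eps, noise_std = 0.05, 0.1          # tolerance from the text; noise level ours
for w in np.linspace(0.5, 1.5, 11):                 # sweep rho(W) = |w|
    n_bounded = 0
    for x0 in rng.uniform(-1, 1, 50):               # 50 random initial states
        x, bounded = x0, True
        for k in range(300):
            # x_{k+1} = tanh(W x_k + W_in xi_k) with W_in = 0.5
            x = np.tanh(w * x + 0.5 * rng.normal(0, noise_std))
            if k >= 200 and abs(x) >= eps:          # check the last 100 steps
                bounded = False
        n_bounded += bounded
    print(f"rho(W) = {w:.1f}: {n_bounded}/50 runs stay within eps")
```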
We extend our analysis to two-dimensional ESNs in which two neighboring nodes (N = 2) are connected. In the N = 2 case, W is a 2 × 2 matrix, and the reservoir state at each step k can be represented by a two-dimensional vector x_k = [x_k^1, x_k^2]^T. Reflecting that the stability of the N = 2 case is not solely determined by ρ(W), the models exhibit various types of bifurcations compared to the N = 1 systems. In addition to constant rescaling, there are more bifurcation parameters, such as tr(W) [42]. Here, we analyze a system that exhibits a Hopf bifurcation under a noise-free environment. In a Hopf bifurcation, a stable fixed point becomes unstable, and a limit cycle arises around the fixed point as the bifurcation parameter crosses its critical value. The internal weight matrix W = c [0.274, −1.09; 0.83, 0.35] was used, and the spectral radius ρ(W) was rescaled by multiplying by the constant c > 0. The reservoir states in two-dimensional ESNs without noise converge to the trivial fixed point x = [0, 0]^T for ρ(W) = 0.9, as in the one-dimensional case (first column in Figure 5(a)). However, the reservoir states in two-dimensional ESNs with ρ(W) = 1.1 oscillate (second column in Figure 5(a)). The last 100 steps from different initial states converge (third column in Figure 5(a)). They exhibit a Hopf bifurcation: a stable fixed point becomes unstable, and a limit cycle arises around the fixed point as ρ(W) crosses 1. For spectral radii ρ(W) > 1, the states are distributed around the origin, implying the emergence of a limit cycle around the origin.

The results from the bifurcation analysis are related to the ESP index for noisy driving input (equation (5)) and the original index by Gallicchio [48]. The reservoir states in the bifurcation analysis (i.e., the last 250 time steps, L = 51 and T = 300) in Figures 4 and 5 were used to compute the ESP indices. As expected, both ESP indices agreed well in the case of zero input (i.e., no addition of white Gaussian noise) (Figure 6(a)). The values increase drastically as the spectral radius is promoted beyond unity (i.e., ρ(W) > 1), whereby the ESP is easily destroyed. For ρ(W) > 1, both indices increase with the spectral radius.
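The Hopf setting can be checked with the matrix given in the text; the script below (noise-free case) is our sketch.

```python
import numpy as np

W0 = np.array([[0.274, -1.09],
               [0.83,   0.35]])                 # matrix from the text
rho0 = max(abs(np.linalg.eigvals(W0)))

rng = np.random.default_rng(2)
for target_rho in (0.9, 1.1):
    W = (target_rho / rho0) * W0                # multiply by constant c > 0
    x = rng.uniform(-1, 1, 2)
    for _ in range(300):
        x = np.tanh(W @ x)                      # noise-free evolution
    print(target_rho, x)  # ~[0, 0] below 1; a point on the limit cycle above 1
```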

Discussion and Conclusions
Despite the importance of the empirical assessment of the ESP for the logical design and optimal operation of reservoir computers, the commonly used conditions for the ESP do not explicitly account for the interference from input noise, which can significantly affect performance. To provide useful information for the empirical and analytical assessment of the computational capabilities of ESNs, a series of extensive numerical simulations and an analytic characterization of the ESP were performed. The analysis began with the comparison of the primary dynamical properties of the standard ESN model at different input noise levels (i.e., no addition of noise and adding AWGN with SNR from 4.0 to 0.1). The significant and distinct relationship between the dynamical measures and the MNIST classification accuracy indicated that the noise-induced dynamical changes and the computational capability of the reservoir are fundamentally intertwined. We then provided the ESP index for noisy driving input, reflecting these dynamical changes, based on the work by Gallicchio [48], to help easily assess the computational capability of ESNs in practical applications. We have extended the bifurcation analysis of Yildiz et al. [42] using the one-dimensional and two-dimensional ESN models, by taking into account the effects of AWGN on the reservoir dynamics, to explicate the underlying physics of the noise effects and to confirm the validity of the proposed ESP index. In both the one-dimensional ESN (Figure 4) and the two-dimensional ESN (Figure 5), the convergence of the ESN bifurcates with an increasing spectral radius of the internal weight matrix ρ(W): pitchfork and Hopf bifurcations were observed, and the origin was a unique fixed point for small ρ(W) without noise. However, the fixed point was disturbed by the AWGN (Figures 4 and 5); this means that all state vectors driven by any input sequence from a compact set U would not asymptotically converge to the same state in the strict sense of the original ESP.

Appendix

A. Proof of the Proposition

Let x^{−∞} and y^{−∞} be state sequences compatible with the noisy input sequences ū^{−∞} + ξ^{−∞} and ū^{−∞} + η^{−∞}, respectively. Writing u_k = ū_k + ξ_k and v_k = ū_k + η_k, we have

‖x_{k+1} − y_{k+1}‖ = ‖F(x_k, u_k) − F(y_k, v_k)‖
= ‖tanh(W x_k + W_in u_k) − tanh(W y_k + W_in v_k)‖
≤ ‖W(x_k − y_k) + W_in(u_k − v_k)‖   (∵ tanh(x) is a 1-Lipschitz function)
≤ ‖W‖ ‖x_k − y_k‖ + ‖W_in‖ ‖ξ_k − η_k‖   (by the triangle inequality).

Applying the inequality above iteratively, we obtain

‖x_0 − y_0‖ ≤ ‖W‖^k ‖x_{−k} − y_{−k}‖ + Σ_{j=0}^{k−1} ‖W‖^j ‖W_in‖ ‖ξ_{−j} − η_{−j}‖.

Since σ_max(W) < 1 by condition (ii), the first term vanishes as k ⟶ ∞, and the second term is bounded by σ_max(W_in) sup_j ‖ξ_{−j} − η_{−j}‖ / (1 − σ_max(W)). By condition (iii), this bound does not exceed ϵ with probability at least c, and hence the ESN has the ESP of tolerance ϵ and confidence c. □

B. Short-Term Memory Capacity for Each Delay
Short-term memory capacity for each delay is provided in Figure 7.

Figure 1: Influence of signal-to-noise ratio (SNR) on the handwritten digit classification: (a) (left) training (cyan) and test (blue) accuracies against training/test dataset size and (right) the activity of a neuron in the reservoir given input digits without noise and in cases of additive white Gaussian noise (AWGN) with SNR = 4.0, SNR = 1.0, and SNR = 0.1. The size of the training dataset varied from 120 to 240, ..., 30,000, and the test data sizes were one-sixth of the training size (i.e., from 20 to 40, ..., 5,000). The accuracies using the largest dataset are displayed in each panel. (b) The mean absolute value of the cross-correlations with a lag of zero (C_abs) among the neuronal activities in the reservoir at different SNR levels (SNR = 4.0, 3.0, ..., 0.2, 0.1), where the cross-correlation with a lag of zero is taken between the i-th and j-th neurons. (c) Training and test accuracies over SNR levels for 30,000 training examples and 5,000 test examples. The horizontal dashed line indicates the test accuracy without noise, the same as indicated by the blue color in the top left panel of (a). Error bars indicate the standard error over 20 independent simulations, which are negligible in all cases.

Figure 2: Memory capacity and mapping score of the input digit against SNR levels. (a) Short-term memory capacity (MC). MC = Σ_{δ=1}^{20} MC_δ, where MC_δ denotes the coefficient of determination (R²) of the linear regression using the reservoir state vector x_k to predict the driving input signal u_{k−δ} with a delay of δ. (b) Mapping score S_M of driving input signals. The score is defined as S_M = 1/(1 + D(x̄, x)), where D denotes the Euclidean distance and x̄ and x are the state vectors compatible with the input sequence without noise (ū_k) and the noisy input sequence (u_k = ū_k + ξ_k), respectively. Error bars indicate the standard error over 20 independent simulations; the values were negligible in most cases.

Figure 3: ESP index and test accuracy for noisy driving input as functions of the spectral radius and input scaling, without noise and with AWGN at SNR from 4.0 to 0.1.

Figure 4: Reservoir states of one-dimensional ESNs (N = 1) with zero input and with only white Gaussian noise against the spectral radius of the internal weight matrix ρ(W).

Figure 5: Reservoir states x_k = [x_k^1, x_k^2]^T of two-dimensional ESNs with zero input and with only white Gaussian noise. Evolution of reservoir states with different initial states for one white Gaussian noise input against the spectral radius of the internal weight matrix ρ(W) = 0.9 (left) and 1.1 (center), and reservoir states against ρ(W) (right) in the last 100 steps (201 ≤ k ≤ 300; green shaded area for the first and second columns): (a) reservoir states without noise; (b) reservoir states driven by white Gaussian input noise with SNR = 4.0; (c) with SNR = 1.0; (d) with SNR = 0.1. Each panel of the first and second columns displays 100 reservoir states starting from random initial states, while each panel of the third column includes 1000 different states (100 randomized reservoir states × 10 randomized noise generations at each SNR). The internal weight matrix was chosen as W = c [0.274, −1.09; 0.83, 0.35] for all simulations, where c is a positive constant rescaling the spectral radius ρ(W). The noise levels are the same as in the MNIST tasks in Section 3.3. The last 100 states are colored dark green if they are bounded within the tolerance ϵ = 0.05, i.e., |x_k| < ϵ = 0.05, and light green otherwise.

Figure 6: ESP index for noisy driving input against the spectral radius in one-dimensional (N = 1) and two-dimensional (N = 2) ESNs: (a) without noise; (b) driven by white Gaussian input noise with SNR = 4.0; (c) with SNR = 1.0; (d) with SNR = 0.1. The noise levels are the same as in the MNIST tasks. All reservoir parameters are the same as in Figure 4 for N = 1 and in Figure 5 for N = 2; 100 independent simulations with different initial states were run for each spectral radius and each noise level. The original ESP index [48] values were compared over the last 250 time steps.