The present paper proposes a recurrent neural network model and a learning algorithm that can acquire the ability to generate multiple desired sequences. The network model is a dynamical system in which the transition function is a contraction mapping, and the learning algorithm is based on the gradient descent method. We present a numerical simulation in which a recurrent neural network acquires multiple periodic attractors consisting of five Lissajous curves, or of a Van der Pol oscillator with twelve different parameter settings. The present analysis clarifies that the model contains many stable regions as attractors, and that multiple time series can be embedded into these regions by the present learning method.

Recurrent neural networks (RNNs) have been successfully applied to the modeling of various types of dynamical systems. Since the universal approximation ability
of multilayer neural networks has been proved, RNNs can model arbitrary
dynamical systems and Turing machines [

There have been several approaches to this problem. In
order to avoid conflicts among parameter changes, the
mixture-of-experts-type architecture has been
investigated [

In the present study, we will focus on the training
method for RNNs to learn multiple attractor dynamics. Furthermore, we will show
how the present study relates to work on RNNs with contraction
transition functions. In recent years, RNNs with contraction transition mappings
have been investigated with respect to their performance in time series learning
[

We start by defining the concepts of the RNN and the
training method for multiple attractor dynamics. The RNN has the Elman net-type
architecture, and the training method is based on the
backpropagation through time (BPTT) algorithm [

We first consider a neural network model with
recurrent connection, such as the Elman net [

Architecture of the recurrent neural network. Solid arrows, dotted arrows, and boxes represent fixed connections, adjustable connections, and network states, respectively.
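
The Elman-type dynamics can be sketched as follows. This is a generic illustration, assuming the usual discrete update c_t = f(W_cc c_{t-1} + W_cx x_t), y_t = g(W_yc c_t); the network sizes, weights, and driving input below are illustrative assumptions, not the paper's settings, and the time constant used in the paper's continuous-time formulation is omitted here.

```python
# Minimal sketch of Elman-type recurrent dynamics (illustrative sizes and
# weights, not the paper's).  Context units feed back their own previous
# values, which is the defining loop of the Elman net.
import math
import random

random.seed(0)

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

N_CTX, N_IN, N_OUT = 4, 1, 1
W_cc = [[random.uniform(-0.5, 0.5) for _ in range(N_CTX)] for _ in range(N_CTX)]
W_cx = [[random.uniform(-0.5, 0.5) for _ in range(N_IN)] for _ in range(N_CTX)]
W_yc = [[random.uniform(-0.5, 0.5) for _ in range(N_CTX)] for _ in range(N_OUT)]

def step(context, x):
    # new context = squashed sum of recurrent and input contributions
    pre = vadd(matvec(W_cc, context), matvec(W_cx, x))
    new_context = tanh_vec(pre)
    output = tanh_vec(matvec(W_yc, new_context))
    return new_context, output

context = [0.0] * N_CTX
outputs = []
for t in range(50):
    x = [math.sin(0.1 * t)]          # an arbitrary driving input
    context, y = step(context, x)
    outputs.append(y[0])
```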

Dynamic states of the RNN at time step

We now define bistability for the RNN.

Assume

The bistability of a function
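
Although the paper's formal definition of bistability is more general, a one-dimensional caricature conveys the idea: for a gain g > 1, the map x → tanh(g·x) has two stable fixed points of opposite sign and an unstable fixed point at the origin, so iterates settle into one of two regions depending on the initial state. The specific gain below is an assumption of this sketch.

```python
# One-dimensional caricature of bistability: for g > 1, x -> tanh(g*x)
# has two symmetric stable fixed points and an unstable one at 0.
import math

def iterate(x0, g=2.0, steps=200):
    x = x0
    for _ in range(steps):
        x = math.tanh(g * x)
    return x

x_pos = iterate(0.1)    # settles at the positive stable fixed point
x_neg = iterate(-0.1)   # settles at the negative stable fixed point
```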

We present a formulation of the training procedure for
the RNN with a multiple teacher I/O time series. For every

We initialize every element of matrices

Assume that

For every

Let

Note that the maximum value of the error function

In this section, we conduct two types of experiments
as examples of using the training method for RNNs proposed in Section

Our first task is to learn the five Lissajous curves
defined by
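
Since the exact curve definitions are elided above, the following sketch only illustrates the general form of Lissajous teacher trajectories, x(t) = sin(a·t + δ), y(t) = sin(b·t); the five (a, b, δ) settings below are assumptions, not the paper's parameters.

```python
# Illustrative teacher trajectories as five Lissajous curves.  The
# frequency ratios and phases are assumptions of this sketch.
import math

def lissajous(a, b, delta, n_steps=200):
    ts = [2 * math.pi * k / n_steps for k in range(n_steps)]
    return [(math.sin(a * t + delta), math.sin(b * t)) for t in ts]

# five (a, b, delta) settings giving five distinct closed curves
params = [(1, 1, 0.0), (1, 2, math.pi / 2), (2, 1, 0.0),
          (2, 3, math.pi / 4), (3, 2, math.pi / 2)]
curves = [lissajous(a, b, d) for a, b, d in params]
```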

Trajectories of the teacher I/O time series in experiment 1.

We now describe the specific conditions applied to RNN
training. The time constant

Figure

Error and Kullback-Leibler divergence between the teaching sequences and output generated by the RNN for 20 000 learning steps in experiment 1.
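
The Kullback-Leibler divergence used as an evaluation measure can be sketched as follows, under the assumption (the paper's exact procedure is not reproduced here) that it is computed between histogram estimates of the teacher and generated output distributions; the bin count and add-one smoothing are choices of this sketch.

```python
# Sketch of the evaluation measure: KL divergence between histogram
# estimates of two scalar time series (binning/smoothing are assumptions).
import math

def histogram(xs, n_bins=20, lo=-1.0, hi=1.0):
    counts = [0] * n_bins
    for x in xs:
        i = min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)
        counts[max(i, 0)] += 1
    total = len(xs) + n_bins          # add-one smoothing avoids log(0)
    return [(c + 1) / total for c in counts]

def kl_divergence(p_seq, q_seq):
    p, q = histogram(p_seq), histogram(q_seq)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [math.sin(0.1 * t) for t in range(1000)]
same    = [math.sin(0.1 * t) for t in range(1000)]
other   = [math.sin(0.1 * t) ** 3 for t in range(1000)]
```

A perfectly reproduced sequence gives a divergence of zero, while a mismatched output distribution gives a positive value.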

Figure

Time series

In Figure

Time series

Our second task is to learn multiple attractors given
by the Van der Pol oscillator with different parameters. The Van der Pol
oscillator defined by
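
Generating such teacher sequences can be sketched by integrating the standard Van der Pol equation, x'' − μ(1 − x²)x' + x = 0, with forward Euler. The twelve parameter values used in the paper are not reproduced here; the μ values, initial conditions, and step size below are assumptions of this sketch.

```python
# Sketch of Van der Pol teacher sequences, x'' - mu*(1 - x^2)*x' + x = 0,
# integrated with forward Euler (mu values and step size are illustrative).
def van_der_pol(mu, x0=0.5, v0=0.0, dt=0.001, n_steps=40000):
    x, v = x0, v0
    xs = []
    for _ in range(n_steps):
        # tuple assignment: both updates use the old (x, v)
        x, v = x + dt * v, v + dt * (mu * (1 - x * x) * v - x)
        xs.append(x)
    return xs

sequences = {mu: van_der_pol(mu) for mu in (0.5, 1.0, 2.0)}
```

After the transient, each trajectory settles onto a limit cycle whose shape depends on μ (for μ = 1 the amplitude is close to 2).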

Teaching sequences of experiment 2. (a) Trajectories on

The parameters for learning are set as follows. Let

The error function and the Kullback-Leibler divergence
for 200 000 learning steps are displayed in Figure

Error and Kullback-Leibler divergence between the teaching sequences and output generated by the RNN for 200 000 learning steps in experiment 2.

Time
series

This result suggests that the RNN has acquired multiple periodic attractors constituted by the teacher I/O time series.

Assume that

Let
us consider a dynamical system on

there are

suppose that

We suppose that (

(1) Assume

Similarly, we can easily show that if

(2) Let

For any

On the other hand, for every

Then,

Schematic diagram of (

For any

Kullback-Leibler divergence between
the teaching sequences and output generated by the trained RNN with

Kullback-Leibler divergence between the
teaching sequences and output generated by the trained RNN with

In the last paragraph of the previous section, we showed that RNNs have many stable regions, and that the existence of these stable regions plays an important role in the learning of multiple sequences. However, the existence of multiple stable regions is not sufficient for successful multiple attractor learning: if the parameter changes driven by one time series interfere with those driven by the others, the time series cannot necessarily be embedded into separate regions. The same problem appears in the RNNPB method.
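
The role of the orthogonal units in avoiding such interference can be sketched as follows: each teacher sequence receives its own code over the orthogonal units, and distinct codes are mutually orthogonal, so the updates driven by different sequences act along separable directions. One-hot vectors, used below, are the simplest orthogonal choice; the paper's actual coding may differ.

```python
# Sketch of the orthogonality idea: assign each teacher sequence its own
# orthogonal code over dedicated "orthogonal units".  One-hot codes are
# an illustrative choice, not necessarily the paper's.
def orthogonal_codes(n_sequences):
    return [[1.0 if i == j else 0.0 for j in range(n_sequences)]
            for i in range(n_sequences)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

codes = orthogonal_codes(5)   # one unit-norm code per teacher sequence
```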

In the training algorithm defined in Section

In order to show the effect of the orthogonal units on
the conflict among teaching sequences, we consider the

Average

In this report, we have investigated a method of embedding multiple time series into a single RNN. In order to clarify the characteristics of the proposed approach, we compare it with other approaches with respect to the information representation of multiple sequences. The mixture-of-RNN-experts-type model composes a local representation in a separate RNN for each sequence. The local representation provides robustness against parameter changes during learning, but it lacks the ability to extract common patterns included in the sequences because of the independence of the local representations. In the proposed model, a local representation is constructed in the orthogonal units, while a global representation is also constructed in the internal units through the connection weights between the I/O units and the internal units. Since every sequence generated by the proposed model shares the state space and the connection weights, the model can extract common patterns of the sequences, as conventional neural networks do.

Another characteristic, which clarifies the difference between our model and other models, is whether the classification of each time series is self-organized in the state space. For example, in the mixture-of-RNN-experts-type model, the allocation of time series to each RNN is determined automatically. As another example, in the RNNPB model, the PB values are self-organized such that they can individualize each time series. In contrast, the proposed model needs the orthogonalization information for each time series. Since the sparse firing patterns that appear in the orthogonal units, corresponding to the time series, are given externally as teaching information, the classification of sequences is not self-organized. This inability to classify the time series automatically is a disadvantage of the proposed model. However, the time series can be classified by other clustering techniques before the proposed method is applied. Thus, by combining the proposed method with other clustering techniques, an algorithm that automatically classifies and generates multiple time series can be constructed.

In this paper, we have presented an RNN model and a learning algorithm that can acquire the ability to generate multiple sequences. The RNN model has two distinct properties, called bistability and orthogonality. Bistability guarantees the existence of multiple attractor structures in RNNs and provides the RNNs with a contraction transition mapping. Orthogonality, which is given as a function of the orthogonal vectors of the RNNs, helps prevent conflicts among the parameter changes caused by multiple training sequences. In the numerical experiments, RNNs with bistability and orthogonality learned multiple periodic attractors constituted by five Lissajous curves or by twelve Van der Pol oscillators with different parameters. Based on these results, the proposed model can be applied to the modeling of various types of dynamical systems that include multiple attractors.