Stock Price Prediction Based on Procedural Neural Networks

We present a spatiotemporal model, the procedural neural network, for stock price prediction. Compared with traditional models that have been successful in simulating stock markets, such as BNN (backpropagation neural networks), HMM (hidden Markov model), and SVM (support vector machine), the procedural neural network model processes spatial and temporal information synchronously without the sliding time window that is typically used in the well-known recurrent neural networks. Two different structures of procedural neural networks are constructed for modeling multidimensional time series problems. Learning algorithms for training the models and for the sustained improvement of learning are presented and discussed. Experiments on Yahoo stock market data from the past decade are carried out, and the simulation results of PNN, BNN, HMM, and SVM are compared.


Introduction
From the beginning of time, it has been a common human goal to make life easier and richer. The prevailing notion in society is that wealth brings comfort and luxury, so it is not surprising that so much work has been done on ways to predict the markets. Ever since stocks first appeared, predicting their movement has been a focus of interest because it can yield significant profits. There are several motivations for trying to predict stock market prices. The most basic of these is financial gain. Any system that can consistently pick winners and losers in the dynamic marketplace would make its owner very wealthy. Thus, many individuals, including researchers, investment professionals, and average investors, are continually looking for a superior system that will yield high returns. There is a second motivation in the research and financial communities. It has been proposed in the efficient market hypothesis (EMH) that markets are efficient in that opportunities for profit are discovered so quickly that they cease to be opportunities [1]. The EMH effectively states that no system can continually beat the market because, if this system becomes public, everyone will use it, thus negating its potential gain.
Predicting the trend of a stock price is a practically interesting and challenging topic. Fundamental and technical analyses were the first two methods used to forecast stock prices. Various technical, fundamental, and statistical indicators have been proposed and used with varying results. However, no one technique or combination of techniques has been successful enough to consistently "beat the market". With the development of neural networks, researchers and investors are hoping that the market mysteries can be unraveled. Although this is not an easy job because of the market's nonlinearity and uncertainty, many approaches have been tried, for example, artificial neural networks [2], fuzzy logic [3], evolutionary algorithms [4], statistical learning [5], Bayesian belief networks [6], hidden Markov models [7], granular computing [8], fractal geometry [9], and wavelet analysis [10].
Recently, a novel model named procedural neural networks (PNNs) was proposed to deal with spatiotemporal data modeling problems, especially for large multidimensional time series [11]. Unlike the traditional multilayer backpropagation neural networks (BNNs), the data in PNN are accumulated along the time axis before or after combining the contributions of the space components. While collecting these data, different components do not have to be sampled simultaneously, but only at the same intervals [12]. In this way, time series problems subject to synchronous sampling in all dimensions can be simulated by PNN. Moreover, the dimensional scale of the input for PNN does not increase, whereas in the recurrent BNN a fixed sliding time window, which makes the dimensional scale large, is usually chosen to deal with time series data [2]. As a result, the complexity of PNNs is intuitively decreased both in the scale of the model and in the time cost of training. Intrinsically, PNN differs from BNN in its mathematical mapping. BNN tries to map an n-dimensional point to another point in an m-dimensional space, while PNN maps an n-dimensional function to an m-dimensional point [13].
Advances in Artificial Neural Systems
Unlike previous work, this paper focuses on two quite different structures of PNNs and on their application to stock market prediction.
The rest of this paper is organized as follows. In Section 2, some general theories and analyses of stock markets are mentioned and some typical models for stock price prediction are introduced. In Section 3, the procedural neural network model is described in detail. In Section 4, the learning algorithm for training procedural neural networks is constructed, and the computational complexity of the algorithm is discussed. In Section 5, several experimental results are provided. Finally, Section 6 concludes this paper.

Typical Models for Stock Price Prediction
Stock markets are not perfect but pretty tough! Stock markets are filled with a certain energy and excitement. Such excitement and emotion are brought by the prospect of making a "buck," or by buying and selling in the hope of getting rich. Many people buy and sell shares in a rush to make huge fortunes (usually huge losses). Only by knowing future information about a particular market that nobody else knows can you hope to make a definite profit [8]. Stock market efficiency has been studied extensively over the past 40 years. Many of the reported anomalies could be the result of mismeasurements and of the failure to incorporate time-varying risks and returns as well as the cost of information [14]. In today's information age, there seems to be too much information out there and many opportunities to get overloaded or just plainly confused. The key is to decide which ideas are valid in which context and also to decide what you believe. The same goes for investing in shares.
With the development of stock chart software, you may backtest stock trading strategies, create stock trading systems, view buy and sell signals on the charts, and do a lot more. Today's charting packages offer up to hundreds of predefined indices and let you define your own for analyzing stock markets [1]. Given the number of possible choices, which indexes do you use? Your choice depends on what you are attempting to model. If you are after daily changes in the stock market, then use daily figures of the all-ordinaries index. If you are after long-term trends, then use long-term indexes like the ten-year bond yield. Even if you use an indicator that is a good representation of the total market, it is still no guarantee of a successful result. In summary, success depends heavily on the tool, or the model, that you use [8].

Statistical Methods: Hidden Markov Model and Bayes Networks.
Markov models [15,16] are widely used to model sequential processes and have achieved many practical successes in areas such as web log mining, computational biology, speech recognition, natural language processing, robotics, and fault diagnosis. The first-order Markov model contains a single variable, the state, and specifies the probability of each state and of transiting from one state to another. Hidden Markov models (HMMs) [17] contain two variables, that is, the (hidden) state and the observation. In addition to the transition probabilities, HMMs specify the probability of making each observation in each state. Because the number of parameters of a first-order Markov model is quadratic in the number of states (and higher for higher-order models), learning Markov models is feasible only in relatively small state spaces. This requirement makes them unsuitable for many data mining applications, which are concerned with very large state spaces.
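To make the roles of the transition and observation probabilities concrete, the following is a minimal sketch of the standard forward algorithm for an HMM. The two-state "market regime" example, the matrices, and the function name are illustrative assumptions, not values or code from the cited works.

```python
import numpy as np

def forward_likelihood(pi, A, B, observations):
    """Return P(observations) under an HMM using the forward algorithm.

    pi: (S,) initial state probabilities
    A:  (S, S) transition matrix, A[i, j] = P(next state j | current state i)
    B:  (S, O) emission matrix,  B[i, k] = P(observation k | state i)
    """
    # Joint probability of each hidden state and the first observation.
    alpha = pi * B[:, observations[0]]
    for obs in observations[1:]:
        # Propagate one step through the chain, then weight by the emission.
        alpha = (alpha @ A) * B[:, obs]
    return float(alpha.sum())

# Hypothetical two-state regime model ("calm", "volatile");
# observation 0 = price down, 1 = price up. All numbers are made up.
pi = np.array([0.6, 0.4])
A = np.array([[0.9, 0.1],
              [0.3, 0.7]])
B = np.array([[0.4, 0.6],
              [0.7, 0.3]])

likelihood = forward_likelihood(pi, A, B, [1, 1, 0])
# The transition matrix has S * S entries, quadratic in the number of states.
n_transition_params = A.size
```

The `A.size` line reflects the quadratic growth mentioned above: doubling the number of states quadruples the number of transition probabilities to estimate.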
Dynamic Bayesian networks (DBNs) generalize Markov models by allowing states to have an internal structure [18]. In a DBN, a state is represented by a set of variables, which can depend on each other and on variables in previous states. If the dependency structure is sufficiently sparse, it is possible to learn and reason successfully in much larger state spaces than with Markov models. However, DBNs are still restricted by the assumption that all states are described by the same variables with the same dependencies. In many applications, states naturally fall into different classes, and each class is described by a different set of variables.
Recently, there has been considerable interest in applying regime switching models driven by a hidden Markov chain to various financial problems. For an overview of the hidden Markov chain and its financial applications, see the work of Elliott et al. [19], of Elliott and Kopp [20], and of Aggoun and Elliott [21]. Works on the use of the hidden Markov chain in finance include those of Buffington and Elliott [22,23] on pricing European and American options, of Ghezzi and Piccardi on stock valuation [24], and of Elliott et al. [25] on option valuation in an incomplete market. Most of the literature concerns the pricing of options under a continuous-time Markov-modulated process, while Hassan et al. [26] propose and implement a fusion model that combines the hidden Markov model (HMM), artificial neural networks (ANNs), and genetic algorithms (GAs) to forecast financial market behavior.

Fractal and Dynamic Methods: Fractal Geometric Chaos.
The chaos theory [27] assumes that the return dynamics are not normally distributed and more complex approaches have to be used to study these time series.In fact, the Fractal Market Hypothesis assumes that the return dynamics are not dependent of the investors' attitudes and represent the result of the interaction of traders who, frequently, adopt different investment styles.The studies proposed in literature to analyze and predict stock price dynamics assume that, by looking at the past, one may collect useful information to understand the price formation mechanism.The initial approaches proposed in literature, the so-called technical analysis, assume that the price dynamics could be approximated with linear trends and could be analyzed using a standard mathematical or graphical approach [28].The high number of factors that are likely to influence the stock market dynamics makes this assumption incorrect and calls for the definition of more complex approaches that may succeed in studying these multiple relationships [29].
The nonlinear models are a heterogeneous set of econometric approaches that allow higher predictability levels, but not all the approaches may be easily applied to real data [30].Deterministic chaos represents the best trade-off to establish fixed rules in order to link future dynamics to past results of a time series without imposing excessively simple assumptions [31].In essence, chaos is a nonlinear deterministic process that looks random [32] because it is the result of an irregular oscillatory process influenced by an initial condition and characterized by an irregular periodicity [33].The chaos theory assumes that complex dynamics may be explained if they are considered as a combination of more simple trends that are easy to understand [34]: the higher the number of breakdowns, the higher the probability of identifying a few previously known basic profiles [35].Chaotic trends may be studied considering some significant points that represent attractors or deflectors for the time series being analyzed and the periodicity that exists in the relevant data.To improve the prediction accuracy of complex multivariate chaotic time series, recently, a scheme has been proposed based on multivariate local polynomial fitting with the optimal kernel function [36], which combines the advantages of traditional local, weighted, multivariate prediction methods.
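As a small illustration of a nonlinear deterministic process that looks random and is strongly influenced by its initial condition, the sketch below iterates the logistic map, a textbook chaotic system. It is not a model of stock prices and is not taken from the cited works; the parameter r = 4 and the starting values are arbitrary choices.

```python
def logistic_map(x0, r=4.0, steps=60):
    """Iterate x_{k+1} = r * x_k * (1 - x_k) and return the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

# Two trajectories whose initial conditions differ by only 1e-10.
a = logistic_map(0.2)
b = logistic_map(0.2 + 1e-10)

# Early on the trajectories are practically identical ...
gap_early = abs(a[5] - b[5])
# ... but the tiny initial difference is amplified until they are unrelated.
gap_late = max(abs(x - y) for x, y in zip(a[50:], b[50:]))
```

The rule is fully deterministic, since rerunning it from the same start reproduces the same path, yet long-range prediction fails because any measurement error in the initial condition grows roughly exponentially.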

Soft Computing: Neural Networks, Fuzzy Logic, and Genetic Algorithms.
Apparently, White (1988) was the first to use backpropagation neural networks (BNNs) for market forecasting [1]. He was curious about whether BNNs could be used to extract nonlinear regularities from economic time series and thereby decode previously undetected regularities in asset price movements, such as fluctuations of common stock prices. White found that his training results were overoptimistic, being the result of overfitting or of learning evanescent features. Since then, it has been well established that fusing the soft computing (SC) technologies, for example, BNN, fuzzy logic (FL), and genetic algorithms (GAs), may significantly improve the analysis (Jain and Martin 1999 [37], Abraham et al. 2001 [38]). There are two main reasons for this. First, these technologies are mostly complementary and synergistic. They are complementary in that BNNs are used for learning and curve fitting, FL is used to deal with imprecision and uncertainty, and GAs [39] are used for search and optimization. Second, as Zadeh (1992) [40] pointed out, merging these technologies allows for the exploitation of a tolerance for imprecision, uncertainty, and partial truth to achieve tractability, robustness, and low solution cost. Market forecasting and trading rules have numerous facets with potential applications for hybrids of the SC technologies.
Given this potential and the impetus on SC during the last decade, it is not surprising that a number of SC studies have focused on market forecasting and trading applications.As an example, Kuo et al. [3] use a genetic-algorithm-based fuzzy neural network to measure the qualitative effects on the stock price.A recent work introduced the generalized regression neural network (GRNN) which is used in various prediction and forecasting tasks [41].Due to robustness and flexibility of modeling algorithms, neurocomputational models are expected to outperform traditional statistical techniques such as regression and ARIMA in forecasting stock exchange price movements.

Machine Learning.
Applications of machine learning (ML) to stock market analysis include portfolio optimization, investment strategy determination, and market risk analysis. Duerson et al. [42] focus on the problem of investment strategy determination through the use of reinforcement learning techniques. Four techniques, two based on recurrent reinforcement learning (RRL) and two based on Q-learning, were utilized. Q-learning produced results that consistently beat buy-and-hold strategies on several technology stocks, whereas the RRL methods were often inconsistent and required further investigation. The behavior of the recurrent reinforcement learner needs further analysis. Although the technical justification for this method seems to be the most rigorous in the literature, the simpler methods, perhaps surprisingly, produced more consistent results. It has been observed that the turning point indicator generally performs better during short runs of trading, which supports the validity of the charting analysis techniques used by professional stock traders.
The support vector machine (SVM) is a very specific learning algorithm characterized by the capacity control of the decision function, the use of kernel functions, and the sparsity of the solution. Huang et al. [43] investigate the predictability of the direction of financial movement with SVM by forecasting the weekly movement direction of the NIKKEI 225 index. SVM is a promising tool for financial forecasting. As demonstrated in their empirical analysis, SVM seems to be superior to other individual classification methods in forecasting weekly movement direction. This is a clear message for financial forecasters and traders, which can lead to a capital gain. However, each method is known to have its own strengths and weaknesses, and the weaknesses of one method can be offset by the strengths of another when the methods are combined. In their study, the combined model performed best among all the forecasting methods.
For time series prediction, SVM is utilized as a regression function. However, when preparing samples for SVM, all functions, which are spread over a certain interval of time, have to be converted to spatial vectors. In essence, SVM therefore still maps static vectors from one space to another. PNN, by contrast, combines spatial and temporal information; namely, its neurons process information from space and time simultaneously. In [44], the author proposed an extended model, named the support function machine (SFM), in which each component of the vector is a time function, and applied it to stock price prediction.
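The difference in sample shape can be sketched as follows. The helper names and the synthetic array are hypothetical; the point is only that a sliding-window approach flattens each window into one long vector, whereas a PNN-style sample keeps the space and time axes separate.

```python
import numpy as np

def sliding_window_vectors(series, window):
    """Flatten every length-`window` slice of an (N, n) series into a vector of size window * n."""
    series = np.asarray(series, dtype=float)
    n_windows = series.shape[0] - window + 1
    return np.stack([series[k:k + window].ravel() for k in range(n_windows)])

def matrix_samples(series, window):
    """Keep every slice as an (n, window) matrix: rows are components, columns are time steps."""
    series = np.asarray(series, dtype=float)
    n_windows = series.shape[0] - window + 1
    return np.stack([series[k:k + window].T for k in range(n_windows)])

# Synthetic placeholder data: 10 days with 5 fields per day (not market data).
series = np.arange(50, dtype=float).reshape(10, 5)
flat = sliding_window_vectors(series, window=5)   # input dimension grows to 5 * 5 = 25
mats = matrix_samples(series, window=5)           # each sample stays a 5 x 5 matrix
```

With a vector-based model the input dimension is the window length times the number of fields, so lengthening the window enlarges the input space; in the matrix form the spatial dimension stays at the number of fields.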

Procedural Neural Network Models
3.1. Procedural Neuron Model. The invention of the procedural neuron provides an alternative modeling strategy for simulating time series problems that are related to some procedure [11]. This model also offers an approach to studying dynamic characteristics in classification or regression problems with a great deal of spatiotemporal data. The procedural neuron differs from the traditional artificial neuron in that it combines spatial and temporal information. In this way, neurons are endowed with spatial and temporal characteristics simultaneously. The weights, which connect neurons, are usually variable, that is, functions of time. The neurons are expected to be time-accumulating: they are not activated until the input has accumulated over a sufficiently long period of time. Compared with traditional artificial neurons, procedural neurons simulate biological neurons better physiologically. Moreover, many problems in real life can be reduced to a procedure, for example, agricultural planting, industrial production, and chemical reactions. However, it is mostly impracticable to simulate such procedures in the traditional way by constructing mathematical or physical equations.
A typical procedural neuron accepts a series of multidimensional inputs and produces a single corresponding output. In a procedural neuron, the aggregation operation involves not only the assembly of multiple inputs in space but also their accumulation in the time domain. The procedural neuron is thus an extension of the traditional neuron into the time domain, and the traditional neuron can be regarded as a special case of the procedural neuron. The structure of a procedural neuron with continuous time is shown in Figure 1, in which $X(t) = [x_1(t), x_2(t), \ldots, x_n(t)]^T$ is the input function vector of the procedural neuron, $W(t) = [w_1(t), w_2(t), \ldots, w_n(t)]^T$ is the weight function vector on the interval $[t_1, t_2]$, and $f$ is the activation function, such as a linear function, a sigmoid function, or a Gaussian function. There are two forms of accumulation, time first (the left in Figure 1) and space first (the right in Figure 1).
Mathematically, the static output of the procedural neuron in Figure 1 can be written as follows:
In detail, the corresponding component form of (1) is
The procedural neuron in Figure 1 is valuable only in theory, because most neural networks are constructed to solve discrete problems. In the case of discrete time, the procedural neuron takes the form shown in Figure 2 or Figure 3, in which the input data have been sampled along the time axis and appear as a matrix $\{x_{ij}\}_{n \times T}$. Similar to (2), the output of the discrete system in Figure 2 (time first) can be written as
and for Figure 3 (space first), it turns out to be
where $w_{i0}$, $w_{0j}$, and $v_0$ are the thresholds of the respective neurons.
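The two orders of accumulation in the discrete case can be sketched in code. The exact functional form here is an assumption made for illustration: thresholds are subtracted, the same activation is applied at both stages, and the outer weights `v` are hypothetical names chosen to match the threshold $v_0$.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def time_first_neuron(X, W, w_thr, v, v0, f=sigmoid):
    """Accumulate each component over time, then aggregate across components.

    X, W:  (n, T) sampled inputs x_ij and weights w_ij
    w_thr: (n,) thresholds w_i0, one per spatial component (assumed subtracted)
    v:     (n,) spatial weights (assumed); v0: scalar output threshold
    """
    inner = f((W * X).sum(axis=1) - w_thr)   # sum over j (time) for each i
    return f(v @ inner - v0)                 # then sum over i (space)

def space_first_neuron(X, W, w_thr, v, v0, f=sigmoid):
    """Aggregate across components at each time step, then accumulate over time.

    w_thr: (T,) thresholds w_0j, one per time step (assumed subtracted)
    v:     (T,) temporal weights (assumed); v0: scalar output threshold
    """
    inner = f((W * X).sum(axis=0) - w_thr)   # sum over i (space) for each j
    return f(v @ inner - v0)                 # then sum over j (time)

# Example with n = 3 components and T = 4 time samples (arbitrary numbers).
X = np.arange(12, dtype=float).reshape(3, 4) / 12.0
W = np.full((3, 4), 0.1)
y_time = time_first_neuron(X, W, np.zeros(3), np.ones(3), 0.0)
y_space = space_first_neuron(X, W, np.zeros(4), np.ones(4), 0.0)
```

Both variants consume the same n x T matrix; they differ only in which axis is summed before the inner activation, which is why the thresholds are indexed by component in one case and by time step in the other.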

Procedural Neural Network Models.
Since the first model of procedural neural networks (PNNs) was proposed [45], several studies have addressed this topic, including the construction of topological structures [11], the estimation of computational ability [46], learning algorithms [12], and time series applications [47]. Recently, the author has proposed various structures of PNN (e.g., the functional PNN [13], the complex number PNN [48], the segment PNN [49], and the SFM [44]). Generally, all these models try to simulate spatiotemporal problems, especially problems with large amounts of data. To solve time series problems, constructing a suitable PNN structure is the first step of practical value. Various structures of PNN can be designed based on the procedural neuron model just described, for example, the procedural perceptron, the multilayer PNN, and the feedback PNN. This paper focuses on two forms of PNN, which are composed of the time-first neuron and the space-first neuron. Equations (3) and (4) are two typical perceptrons of PNN in which the input functions are multidimensional time series, while the output is a static scalar. It is not difficult to extend these models to the case of multi-input and multi-output PNN (called the procedural perceptron), and the corresponding expressions are as follows:
where $y_k$ is the $k$th component of the output vector $Y = [y_1, y_2, \ldots, y_m]^T$. Commonly, these two equations can be rewritten in the following unified forms:
For the case of PNN with hidden layers, the structure is very similar to a multilayer feed-forward neural network, except that the input neurons can accept time series. A PNN model with one hidden layer is shown in Figure 4 (time first), in which $f_i$ denotes the transfer function of the $i$th procedural neuron as in Figure 2.
Also in Figure 4, the other parts of the neural network are quite similar to traditional multilayer neural networks. Considering the case of PNN with one hidden layer and time-first accumulation, we give the following expression:
where $g$ is a transfer function from the hidden nodes to the output layer, $u_{kh}$ is the weight connecting the hidden and the output nodes, and $H$ is the number of nodes in the hidden layer.
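A forward pass through a one-hidden-layer PNN can be sketched as below. This is an assumed form for illustration only: each hidden node is taken to hold its own n x T weight matrix and a subtracted threshold, and the names `W`, `theta_h`, `U`, and `theta_o` are hypothetical, with `U[k, h]` playing the role of $u_{kh}$.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def pnn_forward(X, W, theta_h, U, theta_o, f=sigmoid, g=sigmoid):
    """Map an (n, T) input matrix to an m-dimensional output vector.

    W:       (H, n, T) weights of the H hidden procedural neurons
    theta_h: (H,) hidden thresholds (assumed subtracted)
    U:       (m, H) weights u_kh from hidden node h to output node k
    theta_o: (m,) output thresholds (assumed subtracted)
    """
    # Each hidden node sums w_ij * x_ij over both space (i) and time (j).
    hidden = f(np.tensordot(W, X, axes=([1, 2], [0, 1])) - theta_h)   # shape (H,)
    # The output layer is an ordinary feed-forward layer on the hidden values.
    return g(U @ hidden - theta_o)                                     # shape (m,)

# Arbitrary sizes: n = 5 fields, T = 5 days, H = 3 hidden nodes, m = 2 outputs.
rng = np.random.default_rng(0)
X = rng.random((5, 5))
Y = pnn_forward(X, rng.normal(size=(3, 5, 5)), np.zeros(3),
                rng.normal(size=(2, 3)), np.zeros(2))
```

Only the input layer differs from an ordinary multilayer network: the matrix is reduced to H scalars by the hidden neurons, after which the computation is a standard dense layer.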
3.3. Some Basic Properties. Now, let us consider the computational ability of PNN, that is, what kinds of problems PNN can simulate and how PNNs realize them. In contrast to BNN, PNN tries to map an n-dimensional function to an m-dimensional vector. PNN is therefore a functional function defined on a function space. For the sake of convenience, let us focus on classification problems. PNN tries to classify the given functions or time series. In a spatiotemporal domain, we face much more complicated issues than those in Euclidean space, and much more work is needed. However, based on functional analysis, it is possible to approach functional classification, and some classic results on BNN can be extended to the case of PNN.
In [12] we discussed some theoretical problems of PNN, such as continuity, the Lipschitz condition, computational ability, and functional approximation. However, some issues remain open, for example, how to define the distance between two functions, how to define the margin between two classes of functions, and how to evaluate the complexity of functional functions for a given time series problem. Suppose two input functions such as $X(t)$ belong to a function space on $[t_1, t_2]$ endowed with the following metric distance:
where $p > 0$. Also assume $\mathcal{F}$ to be a special function set, which maps a functional space to a Euclidean space. For any $F_1, F_2 \in \mathcal{F}$ and considering the metric distance
what we are interested in is that (11) satisfies the following condition:
where $L$ is the so-called Lipschitz constant. If a function $F \in \mathcal{F}$ satisfies (12), we say $F$ is Lipschitzed. It is trivial to prove the following theorem [46].
Theorem 1. PNNs defined in (1)-(6) are Lipschitzed if the transfer functions $f$ (and $g$ in (8)) are Lipschitzed.
Figure 3: Procedural neurons for the case of discrete time (space first).
Actually, we have proved the following result in [12,46].
Theorem 2. For any functional function $F \in \mathcal{F}$ satisfying the Lipschitz condition as in (12), there is a procedural neural network $P$ in the form of (8) which satisfies
where $\varepsilon$ is an arbitrarily small positive real number.

Preparing Samples for Training PNN.
To prepare samples for training PNN, the data series must be cut into pieces, with each piece of data forming a sample. Different problems call for different ways of organizing samples. In the case of stock markets, the data of one week (five days) naturally form a relatively independent group of daily data. Sometimes the data from several weeks are taken as one (large) sample if these data are relatively dependent, for example, one-month data or one-season data.
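Cutting a daily series into five-day samples can be sketched as follows. Pairing each five-day block with the record of the following day as the target is an assumption made for illustration, as are the function name and the placeholder data.

```python
import numpy as np

def weekly_samples(daily, days_per_sample=5):
    """Split an (N, n) array of daily records into (X, Z) pairs.

    X is an (n, T) matrix holding T = days_per_sample consecutive days.
    Z is the n-vector of the day that follows (an assumed prediction target).
    """
    daily = np.asarray(daily, dtype=float)
    samples = []
    # Step by whole blocks so that samples do not overlap in time.
    for start in range(0, daily.shape[0] - days_per_sample, days_per_sample):
        X = daily[start:start + days_per_sample].T    # rows: fields, columns: days
        Z = daily[start + days_per_sample]            # next trading day
        samples.append((X, Z))
    return samples

# Placeholder data: 11 trading days with 5 fields each (not real quotes).
daily = np.arange(55, dtype=float).reshape(11, 5)
samples = weekly_samples(daily)
```

Larger samples such as one-month blocks follow from the same function by raising `days_per_sample`; only the number of columns of each matrix changes.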
For each sample, the data are discrete and appear as a matrix as follows:
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1T} \\ x_{21} & x_{22} & \cdots & x_{2T} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nT} \end{bmatrix}, \tag{15}$$
where $n$ is the dimension of the data in space and $T$ is the number of samplings on the time axis. To construct a sample, we typically choose $Z = [z_1, z_2, \ldots, z_m]$ as the corresponding output of the model, while the input is $X$, and $(X, Z)$ is the normal form of a sample. Compared with samples for training BNN, where both input and output are vectors, here $X$ in (15) is a matrix and $Z$ is a vector. So during training, PNN tries to map a matrix to a vector, namely, $P : X \to Z$ or $Z = P(X)$.
form, is the mean square error (MSE) represented as follows:
In the case of PNN with one hidden layer (see, e.g., (8)), the MSE becomes
Referring to the steepest gradient descent algorithm, we have the following formulas, which are used to compute the weight updates in PNN:
In the case of the procedural perceptron (see, e.g., (3)), the corresponding error function is
The weight updates reduce to
The final forms of the weight updates corresponding to (17)
training BNN. However, there are still some improvements compared with BNN, such as reducing the dimension of the input space, processing time series and spatial information jointly, and assigning different weights with respect to time and space.
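The gradient descent procedure can be sketched for a one-hidden-layer, time-first PNN as below. Several choices are assumptions made for illustration: the error is half the sum of squared differences, thresholds are subtracted, both transfer functions are sigmoids, a single learning rate is shared by all parameters, and training stops when the error changes by less than `tol` between loops. The defaults `lr=0.7` and `tol=1e-4` echo the values quoted in the experiments.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_pnn(samples, n, T, H, m, lr=0.7, tol=1e-4, max_loops=1000, seed=0):
    """Batch gradient descent for a one-hidden-layer, time-first PNN (sketch).

    samples: list of (X, Z) with X of shape (n, T) and Z of shape (m,).
    Returns the parameters and the error recorded at the start of each loop.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(H, n, T))   # hidden procedural weights
    theta = np.zeros(H)                          # hidden thresholds
    U = rng.normal(scale=0.1, size=(m, H))      # hidden-to-output weights
    phi = np.zeros(m)                            # output thresholds
    errors = []
    for _ in range(max_loops):
        gW, gtheta = np.zeros_like(W), np.zeros_like(theta)
        gU, gphi = np.zeros_like(U), np.zeros_like(phi)
        error = 0.0
        for X, Z in samples:
            # Forward pass.
            hidden = sigmoid(np.tensordot(W, X, axes=([1, 2], [0, 1])) - theta)
            Y = sigmoid(U @ hidden - phi)
            error += 0.5 * np.sum((Y - Z) ** 2)
            # Backward pass: chain rule through both sigmoid layers.
            delta_out = (Y - Z) * Y * (1.0 - Y)
            delta_hid = (U.T @ delta_out) * hidden * (1.0 - hidden)
            gU += np.outer(delta_out, hidden)
            gphi -= delta_out                    # thresholds enter with a minus sign
            gW += delta_hid[:, None, None] * X[None, :, :]
            gtheta -= delta_hid
        # Move every parameter against its gradient.
        W -= lr * gW
        theta -= lr * gtheta
        U -= lr * gU
        phi -= lr * gphi
        errors.append(error)
        if len(errors) > 1 and abs(errors[-2] - errors[-1]) < tol:
            break
    return (W, theta, U, phi), errors
```

Because the gradient with respect to the input weights is just the hidden error signal times the input matrix, the update has the same structure as in ordinary backpropagation, with one n x T weight matrix per hidden node instead of one weight vector.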

Experimental Results and Comparisons
The stock data come from the Yahoo Finance web site [50]. Historical chart data and daily updates were provided by Commodity Systems, Inc. (CSI). International historical chart data and daily updates were provided by Hemscott Americas. Fundamental company data were provided by Capital IQ. Quotes and other information supplied by independent providers can be identified on the Yahoo Finance partner page. The data of one week, or five days, compose a sample for PNN. For each sample there are five observation fields: the open price, the highest price, the lowest price, the closing price, and the stock volume. Fixing the parameters of PNN for a given data set is a matter of experiment. Here we choose a group of parameters for training a time-first PNN with one hidden layer, for example, the input node number $n = 5$, the time sampling $T = 1, 2, 3, 4, 5$, the hidden node number $H = 1$, and the transfer functions $f(x) = g(x) = (1 + \exp(-x))^{-1}$.
The parameters of the training algorithm are also important experimentally. Here we suggest a group of such parameters so that readers can repeat the experiment easily, for example, the size of the training set $N = 25$, the tolerance on the difference between the error functions of two successive loops $\varepsilon = 0.0001$, and the learning rates $\alpha = \beta = \gamma = 0.7$. Experiments show that for most of the test samples 10 loops of training are enough for the given $\varepsilon$. Here, 1000 samples are selected as the test set, and for each test sample the nearest $T + 25$ samples in date are used as the training set.
Figures 5 to 10 plot the open, high, low, and close prices and the volume, giving both the predicted and the actual values. The blue lines denote the actual values and the red lines stand for the predicted values.
In our experiment, we compare PNN with BNN, which is a special case of PNN when $T = 1$. The HMM and SVM methods are also compared.
Here is a simple method for evaluating the model's predictions. If the actual price increases or decreases on the next day and the prediction moves in the same direction, we say the model "hits" the point; otherwise, the model "misses" the point. The percentage of "hit" points among all points is called the hit rate. In Tables 2 and 3, the hit
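The hit-rate measure can be sketched in a few lines. Two details are assumptions: the direction of a prediction is measured against the previous actual price, and an unchanged price is counted as a non-increase. The function returns a fraction, which can be multiplied by 100 to obtain a percentage; the prices in the example are made up.

```python
def hit_rate(actual, predicted):
    """Fraction of days on which predicted and actual moves share the same direction.

    actual, predicted: equal-length sequences of prices; predicted[k] is the
    model's forecast for day k. Day 0 has no previous price and is skipped.
    """
    hits = 0
    total = 0
    for prev, act, pred in zip(actual[:-1], actual[1:], predicted[1:]):
        actual_up = act > prev        # did the price really rise?
        predicted_up = pred > prev    # did the model forecast a rise?
        hits += actual_up == predicted_up
        total += 1
    return hits / total if total else 0.0

# Made-up closing prices and forecasts for four days.
actual = [10.0, 11.0, 10.5, 10.8]
predicted = [10.0, 10.6, 11.2, 10.7]
rate = hit_rate(actual, predicted)
```

In this example the forecasts for the second and fourth days point the right way and the forecast for the third day does not, so two of the three scored days are hits.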

Figure 2: Procedural neurons for the case of discrete time (time first).

Figure 4: Procedural neural networks with one hidden layer (time first).

Figure 9: PNN prediction versus actual value of volume.
Figure 1: Procedural neurons with continuous time input functions.

Table 1: Ten records of a stock price list.
can be described as avoid the issue of minimal value of the error function. It has a similar problem with training BNN. Therefore, training PNN remains NP-hard in computational complexity as with
Table 1 lists ten-day records of Yahoo stock from 01/03/2000 to 01/14/2000.