A Novel Learning Scheme for Chebyshev Functional Link Neural Networks

A hybrid learning scheme (ePSO-BP) to train a Chebyshev Functional Link Neural Network (CFLNN) for classification is presented. The proposed method is referred to as hybrid CFLNN (HCFLNN). The HCFLNN is a type of feed-forward neural network that can transform the nonlinear input space into a higher-dimensional space where linear separability is possible. Moreover, the proposed HCFLNN combines the best attributes of particle swarm optimization (PSO), back-propagation learning (BP learning), and functional link neural networks (FLNNs). The proposed method eliminates the need for a hidden layer by expanding the input patterns using Chebyshev orthogonal polynomials. We demonstrate its effectiveness in classifying unknown patterns using publicly available datasets obtained from the UCI repository. The computational results are then compared with a functional link neural network (FLNN) with generic basis functions, a PSO-based FLNN, and EFLN. From the comparative study, we observed that the HCFLNN outperforms FLNN, PSO-based FLNN, and EFLN in terms of classification accuracy.


Introduction
In recent years, higher-order neural networks [1], particularly FLNNs, have been widely used to classify nonlinearly separable patterns, a task that can be viewed as approximating an arbitrary decision boundary. Broadly, artificial neural networks have become one of the most accepted soft computing tools for approximating the decision boundaries of a classification problem [2,3]. In fact, a multilayer perceptron (MLP) with a suitable architecture is capable of approximating virtually any function of interest [4]. This does not mean that finding such a network is easy. On the contrary, problems such as local minima trapping, saturation, weight interference, initial weight dependence, and overfitting make neural network training difficult.
An easy way to avoid these problems consists in removing the hidden layers. This may sound ill-considered at first, since it is the hidden layers that capture nonlinear input-output relationships. Encouragingly, the removal can be carried out without giving up nonlinearity, provided that the input layer is endowed with additional higher-order units [5,6]. This is the idea behind higher-order neural networks (HONNs) [7] such as functional link neural networks (FLNNs) [8], ridge polynomial neural networks (RPNNs) [1,7], and so on. HONNs have simple architectures and require fewer weights to learn the underlying approximating polynomials. This potentially reduces the number of required training parameters. As a result, they can learn faster, since each iteration of the training procedure takes less time. This makes them suitable for complex problem solving where the ability to retrain or adapt to new data in real time is critical. Many algorithms have been used to train functional link neural networks, such as the back-propagation learning algorithm [2], genetic algorithms [9], particle swarm optimization [10], and so on. Back-propagation learning algorithms have their own limitations. However, if the BP search starts near the optimum with a small tuning of the learning parameters, the search results can be improved.
Genetic algorithms and particle swarm optimization can be used for training the FLNN to reduce the risk of local optimality and speed up convergence. However, training with a genetic algorithm is discouraging because of the following limitations: the training process requires encoding and decoding operators, which are commonly regarded as a long-standing barrier for neural network researchers. Applying genetic algorithms to train neural networks may also be unsatisfactory because recombination operators incur several problems, such as competing conventions [11] and the epistasis effect [12]. For better performance, real-coded genetic algorithms [13,14] have been introduced. However, they generally employ random mutations and, hence, still require lengthy local searches near a local optimum. On the other hand, PSO has some attractive properties. It retains previous useful information, whereas GAs destroy previous knowledge of the problem once the population changes. PSO encourages constructive cooperation and information sharing among particles, which enhances the search for a global optimal solution. Successful applications of PSO to optimization problems such as function minimization [15,16] and neural network design [17,18] have demonstrated its potential. It is considered capable of reducing the ill effects of the BP learning algorithm, because it does not require gradient or differentiability information.
Unlike the GA, the PSO algorithm has no complicated operators such as crossover and mutation. In the PSO algorithm, the potential solutions, called particles, move through the problem space by following the current optimum particles. Generally speaking, the PSO algorithm has a strong ability to find the optimal result, but it has the disadvantage of easily getting trapped in a local optimum. After suitably tuning the parameters of the PSO algorithm, the rate of convergence can be increased, and the ability to find the global optimum can be enhanced. The PSO search is guided by tracking pbest, that is, each particle's best position in its history, and gbest, that is, the best position of all particles in their history, so it can rapidly arrive near the global optimum. However, because the PSO algorithm has several parameters that must be adjusted empirically, the search proceeds very slowly near the global optimum if these parameters are not set appropriately. Hence, to cope with this problem, we suggest a novel evolvable PSO (ePSO) combined with the back-propagation (BP) algorithm as a learning method for the Chebyshev functional link neural network (CFLNN), for fine tuning of the connection weights.
1.1. Outline. The remainder of this paper is organized as follows. Some of the recently proposed functional link neural networks (FLNNs) are reviewed in Section 2. Section 3 provides the detailed algorithm of HCFLNN for classification. In Section 4, we present experimental results with a comparative study. Section 5 concludes the article with the scope for future research.

Functional Link Neural Networks
FLNNs are higher-order neural networks without hidden units, introduced by Klassen et al. [19] in 1988. Despite their linear nature, FLNNs can capture nonlinear input-output relationships, provided that they are fed with an adequate set of polynomial inputs, or with functions that are a subset of a complete set of orthonormal basis functions spanning an n-dimensional representation space, constructed out of the original input attributes [20].
In contrast to the linear weighting of the input patterns produced by the linear links of an artificial neural network, the functional link acts on an element of a pattern, or on the entire pattern itself, by generating a set of linearly independent functions and then evaluating these functions with the pattern as the argument. Thus, class separability is possible in the enhanced feature space. For a D-dimensional classification problem, there are $(D + r)!/(D!\,r!)$ possible polynomials up to degree r that can be constructed. For most real-life problems, this number is too large, even for degree 2, which discourages the direct use of the full expansion. However, we can still resort to constructive and pruning algorithms to address this problem. In fact, Sierra et al. [21] have proposed an algorithm for the evolution of functional link networks that makes use of a standard GA [9] to evolve near-minimal linear architectures. However, the complexity of the algorithm still needs to be investigated.
Moreover, the dimensionality of many problems is itself very high, and increasing it further to a very large extent may not be an appropriate choice. So, it is advisable, and also a new research direction, to choose a small set of alternative functions that can map the function to the desired extent with a significant improvement in output. The FLNN with trigonometric basis functions for classification, as proposed in [8], is an example. The Chebyshev FLNN is another improvement in this direction; the details are discussed in Section 3. Some of the potential contributions in FLNNs and their successful application to a variety of problems are given below.
Haring and Kok [22] have proposed an algorithm that uses evolutionary computation (specifically, genetic algorithms and genetic programming) for the determination of functional links (one based on polynomials and another based on expression trees) in neural networks. Patra and Pal [23] have proposed an FLNN and applied it to the problem of channel equalization in a digital communication channel. It relies on the BP learning algorithm. Haring et al. [24] presented different ways to select and transform features using evolutionary computation and showed that this kind of feature selection is a special case of so-called functional links.
Dash et al. [25] have proposed an FLNN with trigonometric basis functions to forecast the short-term electric load. Panagiotopoulos et al. [26] have reported better results by applying FLNN to planning in an interactive environment between two systems: the challenger and the responder. Patra et al. [27] have proposed an FLNN with back-propagation learning for the identification of nonlinear dynamic systems.
Motivated by the encouraging performance of FLNN [23,27], Patra and van den Bos [28] came up with another FLNN with three sets of basis functions, namely Chebyshev, Legendre, and power series, to develop an intelligent model of the CPS involving less computational complexity. Consequently, its implementation can be economical and robust.
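As a rough illustration of how quickly this count grows, the following sketch evaluates the expression for a few attribute counts. The helper name `num_polynomial_terms` is hypothetical, and the count here includes the constant term, which is an assumption about how the terms are tallied.

```python
from math import comb

def num_polynomial_terms(D: int, r: int) -> int:
    """Number of monomials of total degree <= r in D variables,
    i.e. (D + r)! / (D! * r!), constant term included."""
    return comb(D + r, r)

# Attribute counts chosen for illustration (4, 13, and 9 input attributes).
for D in (4, 13, 9):
    print(D, [num_polynomial_terms(D, r) for r in (1, 2, 3)])
# 4 [5, 15, 35]
# 13 [14, 105, 560]
# 9 [10, 55, 220]
```

Even at degree 3, a 13-attribute problem already yields several hundred candidate inputs, which is why a small, well-chosen basis is attractive.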
In [21], a genetic algorithm for selecting an appropriate number of polynomials as a functional input to the network has been proposed by Sierra et al. and applied to the classification problem. However, their main concern was the selection of optimal set of functional links to construct the classifier. In contrast, the proposed method gives much emphasis on how to develop the learning skill of the classifier.
A Chebyshev functional link artificial neural network has been proposed by Patra and Kot [29] for nonlinear dynamic system identification. This is another improvement in this direction and also a source of inspiration to further validate the method in other application domains. The proposed method is an example. Singh and Srivastava [30] have estimated the degree of insecurity in a power system with a set of orthonormal trigonometric basis functions.
In [31], an evolutionary search of genetic type and multiobjective optimization such as accuracy and complexity of the FLNN in the Pareto sense is used to design a generalized FLNN with internal dynamics and applied to system identification.
Majhi and Shalabi [32] have applied FLNN to digital watermarking; their results show that FLNN performs better than other algorithms in this line. In [33], a comparative performance of three artificial neural networks has been given for the detection and classification of gear faults. The authors reported that FLNN is comparatively better than the others.
Misra and Dehuri [8] have used an FLNN for the classification problem in data mining, with the hope of obtaining a compact classifier with less computational complexity and faster learning. Purwar et al. [34] have proposed a Chebyshev functional link neural network for system identification of unknown dynamic nonlinear discrete-time systems. Weng et al. [35] have proposed a reduced decision feedback Chebyshev functional link artificial neural network (RDF-CFLANN) for channel equalization.
Two simple modified FLANNs are proposed by Krishnaiah et al. [36] for estimation of carrageenan concentration. In the first model, a hidden layer is introduced and trained by EBP. In the second model, functional links are introduced to the neurons in the hidden layer, and it is trained by EBP. In [37], a FLANN with trigonometric polynomial functions is used in intelligent sensors for harsh environment that effectively linearizes the response characteristics, compensates for nonidealities, and calibrates automatically. Dehuri et al. [38] have proposed a novel strategy for feature selection using genetic algorithm and then used as the input in FLANN for classification.
From this discussion, we can conclude that very few applications of HONNs have so far been made to the classification task. Although this area is theoretically rich, applications specifically in classification are scarce. Therefore, the proposed contribution can be another improvement in this direction.

Chebyshev Functional Link Neural Network.
It is well known from approximation theory that the nonlinear approximation capability of the Chebyshev orthogonal polynomials is very powerful. Combining the characteristics of the FLNN and the Chebyshev orthogonal polynomials results in the Chebyshev functional link neural network, which we name CFLNN. The proposed method utilizes the FLNN input-output pattern, the nonlinear approximation capabilities of the Chebyshev orthogonal polynomials, and the evolvable particle swarm optimization (ePSO)-BP learning scheme for classification.
The Chebyshev FLNN used in this paper is a single-layer neural network. The architecture consists of two parts, namely a transformation part (i.e., from a low-dimensional feature space to a high-dimensional feature space) and a learning part. The transformation part maps the input feature vector into the expanded feature space by an approximate transformation method. The transformation is the functional expansion (FE) of the input pattern, comprising a finite set of Chebyshev polynomials. As a result, the Chebyshev polynomial basis can be viewed as a new input vector. The learning part uses the newly proposed ePSO-BP learning.
Alternatively, we can approximate a function by a truncated power series. The power series expansion represents the function with a very small error near the point of expansion, but the error increases rapidly at points farther away. The computational economy gained by a Chebyshev series increases when the power series is slowly convergent. Therefore, Chebyshev series are frequently used for approximating functions and are much more efficient than power series of the same degree. Among orthogonal polynomials, the Chebyshev polynomials converge more rapidly than expansions in other sets of polynomials [8]. Moreover, Chebyshev polynomials are easier to compute than trigonometric polynomials. These properties of the Chebyshev polynomials motivated us to use the CFLNN for approximating decision boundaries in the feature space.
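The functional expansion can be sketched with the standard three-term recurrence $T_0(x) = 1$, $T_1(x) = x$, $T_{n+1}(x) = 2x\,T_n(x) - T_{n-1}(x)$. The snippet below is a minimal illustration, not the authors' implementation: the function names, the use of a single shared bias term, and the assumption that each attribute has been scaled to $[-1, 1]$ beforehand are all choices made here for the example.

```python
def chebyshev_terms(x, degree):
    """Return [T_1(x), ..., T_degree(x)] via T_{n+1} = 2x*T_n - T_{n-1}."""
    prev, curr = 1.0, float(x)          # T_0(x), T_1(x)
    terms = []
    for _ in range(degree):
        terms.append(curr)
        prev, curr = curr, 2.0 * x * curr - prev
    return terms

def expand_pattern(pattern, degree):
    """Map a D-dimensional pattern to 1 + D*degree expanded inputs.
    Assumption: one shared bias term (T_0 = 1) rather than one per attribute."""
    expanded = [1.0]
    for x in pattern:                   # each x is assumed to lie in [-1, 1]
        expanded.extend(chebyshev_terms(x, degree))
    return expanded

print(expand_pattern([0.5, -1.0], degree=3))
# [1.0, 0.5, -0.5, -1.0, -1.0, 1.0, -1.0]
```

The expanded vector then plays the role of the input layer, so only a single layer of weights remains to be learned.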

Evolvable Particle Swarm Optimization (ePSO).
Evolvable particle swarm optimization (ePSO) is an improvement over PSO [10]. PSO is a stochastic algorithm that searches for the best solution by simulating the movement and flocking of birds. The algorithm initializes a flock of birds randomly over the search space, where every bird is called a particle. These particles fly with a certain velocity and find the global best position after some iterations. At each iteration $k$, the $i$th particle is represented by a vector $x_i^k$ in multidimensional space that characterizes its position, and a vector $v_i^k$ that characterizes its velocity. Thus, PSO maintains a set of positions

$$\{x_1^k, x_2^k, \ldots, x_N^k\}$$

and a set of corresponding velocities

$$\{v_1^k, v_2^k, \ldots, v_N^k\}.$$

Initially, the iteration counter $k = 0$, and the positions $x_i^0$ and their corresponding velocities $v_i^0$ ($i = 1, 2, \ldots, N$) are generated randomly from the search space $\Omega$. Each particle changes its position $x_i^k$ at every iteration. The new position $x_i^{k+1}$ of the $i$th particle ($i = 1, 2, \ldots, N$) is biased towards its best position $p_i^k$, the position with the minimum function value $f(\cdot)$ found by the particle so far, referred to as the personal best or pbest, and towards the very best position $p_g^k$ found by its companions, referred to as the global best or gbest. The gbest is the best position in the set

$$P = \{p_1^k, p_2^k, \ldots, p_N^k\}.$$

We call a particle in $P$ good or bad depending on whether its personal best is a good or bad point in $P$. Consequently, we call the $i$th particle ($j$th particle) in $P$ the worst (the best) if $p_i^k$ ($p_j^k$) is the least (best) fitted, with respect to function value, in $P$.
At each iteration $k$, the position $x_i^k$ of the $i$th particle is updated by a velocity $v_i^{k+1}$ that depends on three components: its current velocity $v_i^k$, the cognition term (i.e., the weighted difference vector $(p_i^k - x_i^k)$), and the social term (i.e., the weighted difference vector $(p_g^k - x_i^k)$). Specifically, the positions are updated for the next iteration using

$$x_i^{k+1} = x_i^k + v_i^{k+1},$$

where

$$v_i^{k+1} = v_i^k + r_1 c_1 (p_i^k - x_i^k) + r_2 c_2 (p_g^k - x_i^k).$$

The parameters $r_1$ and $r_2$ are uniformly distributed random numbers in $[0, 1]$, and $c_1$ and $c_2$, known as the cognitive and social parameters, respectively, are popularly chosen to be $c_1 = c_2 = 2.0$ [40]. Thus, the values $r_1 c_1$ and $r_2 c_2$ introduce stochastic weighting in the difference vectors $(p_i^k - x_i^k)$ and $(p_g^k - x_i^k)$, respectively. The set $P$ is then updated from the new positions $x_i^{k+1}$ using the following rule, which minimizes the cost function:

$$p_i^{k+1} = \begin{cases} x_i^{k+1}, & \text{if } f(x_i^{k+1}) < f(p_i^k), \\ p_i^k, & \text{otherwise.} \end{cases}$$

This process of updating the velocities $v_i^k$, positions $x_i^k$, pbest $p_i^k$, and gbest $p_g^k$ is repeated until a user-defined stopping condition is met.
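One iteration of this basic update can be sketched as follows. This is an illustrative sketch of standard PSO for minimization, with hypothetical names (`pso_step`, `rng`); it draws fresh random numbers per component, which is one common convention and an assumption here.

```python
import random

def pso_step(positions, velocities, pbest, gbest, f, c1=2.0, c2=2.0, rng=random):
    """One basic PSO iteration for minimizing f; the lists are updated in place.
    Returns the new global best (the best entry of pbest)."""
    for i in range(len(positions)):
        for d in range(len(positions[i])):
            r1, r2 = rng.random(), rng.random()
            # velocity = current velocity + cognition term + social term
            velocities[i][d] += (c1 * r1 * (pbest[i][d] - positions[i][d])
                                 + c2 * r2 * (gbest[d] - positions[i][d]))
            positions[i][d] += velocities[i][d]
        # keep the personal best only if the new position improves on it
        if f(positions[i]) < f(pbest[i]):
            pbest[i] = list(positions[i])
    return min(pbest, key=f)

# Usage on a toy sphere function (all values are illustrative).
rng = random.Random(0)
sphere = lambda x: sum(v * v for v in x)
pos = [[rng.uniform(-5, 5) for _ in range(2)] for _ in range(5)]
vel = [[0.0, 0.0] for _ in range(5)]
pb = [list(p) for p in pos]
gb = min(pb, key=sphere)
gb = pso_step(pos, vel, pb, gb, sphere, rng=rng)
```

Because personal bests are replaced only on improvement, the function value of the returned global best never gets worse from one call to the next.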
We now briefly present a number of improved versions of PSO and then show where our modified PSO can stand.
Shi and Eberhart [39] made the first modification by introducing a constant inertia weight $\omega$, which controls how much a particle tends to follow its current direction compared to the memorized pbest $p_i^k$ and gbest $p_g^k$. Hence, the velocity update is given by

$$v_i^{k+1} = \omega v_i^k + r_1 c_1 (p_i^k - x_i^k) + r_2 c_2 (p_g^k - x_i^k),$$

where the values of $r_1$ and $r_2$ are realized componentwise. Shi and Eberhart [40] later proposed an inertia weight that is linearly reduced during the search. This entails a more global search during the initial stages and a more local search during the final stages. They also proposed limiting each particle's velocity to a specified maximum velocity $v_{\max}$. The maximum velocity was calculated as a fraction $\tau$ ($0 < \tau \le 1$) of the distance between the bounds of the search space, that is, $v_{\max} = \tau \cdot (x_u - x_l)$.
Fourie and Groenwold [41] suggested a dynamic inertia weight and maximum velocity reduction. In this modification, the inertia weight and maximum velocity are reduced by fractions $\alpha$ and $\beta$, respectively, if no improvement in $p_g^k$ occurs after a prespecified number of iterations $h$, that is,

$$\text{if } f(p_g^k) \ge f(p_g^{k-h}), \text{ then } \omega_{k+1} = \alpha\,\omega_k \text{ and } v_{\max}^{k+1} = \beta\, v_{\max}^{k},$$

where $\alpha$ and $\beta$ are such that $0 < \alpha, \beta < 1$. Clerc and Kennedy [42] introduced another interesting modification to PSO in the form of a constriction coefficient $\chi$, which controls all three components in the velocity update rule. This has the effect of reducing the velocity as the search progresses. In this modification, the velocity update is given by

$$v_i^{k+1} = \chi \left[ v_i^k + r_1 c_1 (p_i^k - x_i^k) + r_2 c_2 (p_g^k - x_i^k) \right].$$

Da and Ge [18] also modified PSO by introducing a temperature-like control parameter, as in the simulated annealing algorithm. Zhang et al. [43] modified PSO by introducing a new inertia weight during the velocity update. Generally, in the beginning stages of their algorithm the inertia weight $\omega$ should be reduced rapidly, whereas around the optimum it should be reduced slowly. Their rule is expressed in terms of the following quantities: $\omega_0$ is the initial inertia weight, $\omega_1$ is the inertia weight at the end of the linear section, MAXITER is the total number of search generations, MAXITER1 is the number of generations over which the inertia weight is reduced linearly, and $k$ is a variable whose range is [1, MAXITER]. By adjusting $k$, they obtain different ending values of the inertia weight.
In this work, the inertia weight is evolved as part of the search for the optimal set of weights. However, the evolution of the inertia weight is restricted between an upper limit ($\omega_u$) and a lower limit ($\omega_l$). If it crosses a boundary during the course of training the network, $\omega$ is brought back into range by a rule expressed in terms of $c_{\text{value}}$, the value by which the limit is exceeded.
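The linearly decreasing inertia weight and the velocity limit can be sketched as two small helpers. The names are hypothetical, and the start and end inertia values (0.9 and 0.4) and the fraction `tau=0.5` are illustrative defaults chosen for this example, not values taken from the paper.

```python
def linear_inertia(k, max_iter, w_start=0.9, w_end=0.4):
    """Inertia weight reduced linearly from w_start (k = 0) to w_end (k = max_iter)."""
    return w_start - (w_start - w_end) * k / max_iter

def clamp_velocity(v, x_lower, x_upper, tau=0.5):
    """Limit one velocity component to [-v_max, v_max], v_max = tau * (x_upper - x_lower)."""
    v_max = tau * (x_upper - x_lower)
    return max(-v_max, min(v_max, v))

print(linear_inertia(0, 100))             # 0.9
print(clamp_velocity(10.0, -1.0, 1.0))    # 1.0
print(clamp_velocity(0.3, -1.0, 1.0))     # 0.3
```

A large inertia early on favors exploration, while the clamp prevents particles from overshooting the search bounds in a single step.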

Advances in Artificial Neural
In addition, the proposed method uses an adaptive cognitive acceleration coefficient ($c_1$) and an adaptive social acceleration coefficient ($c_2$). The coefficient $c_1$ is decreased from its initial value $c_{1i}$ to its final value $c_{1f}$, while $c_2$ is increased from $c_{2i}$ to $c_{2f}$, following the update equations of [44].
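One common form for such a schedule is linear interpolation between the initial and final values over the run. The sketch below assumes that linear form, which may differ from the exact equations of [44]; the function name is hypothetical, and the default values are the coefficient limits and iteration budget reported in the parameter settings of this paper.

```python
def acceleration_coefficients(k, max_iter=500,
                              c1_i=2.5, c1_f=0.5, c2_i=0.5, c2_f=2.5):
    """Linearly interpolate c1 (decreasing) and c2 (increasing) at iteration k.
    Assumption: a linear schedule between the initial and final values."""
    frac = k / max_iter
    c1 = c1_i + (c1_f - c1_i) * frac
    c2 = c2_i + (c2_f - c2_i) * frac
    return c1, c2

print(acceleration_coefficients(0))     # (2.5, 0.5)
print(acceleration_coefficients(250))   # (1.5, 1.5)
print(acceleration_coefficients(500))   # (0.5, 2.5)
```

Under this schedule the swarm starts with a strong cognitive pull, which favors exploration, and ends with a strong social pull, which favors convergence around gbest.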

ePSO-BP Learning Algorithm.
The ePSO-BP is a learning algorithm that combines the global search capability of ePSO with the local search capability of the BP algorithm. Similar to the GA [9], the ePSO algorithm is a global algorithm with a strong ability to find the global optimum; however, it has the disadvantage that the search around the global optimum is very slow. The BP algorithm, on the contrary, has a strong ability to find a local optimum, but its ability to find the global optimum is weak. By combining ePSO with BP, a new algorithm referred to as the ePSO-BP hybrid learning algorithm is formulated in this paper. The fundamental idea of this hybrid algorithm is that, at the beginning of the search for the optimum, PSO is employed to accelerate training. When the fitness function value has not changed for some generations, or the change is smaller than a predefined threshold, the search process is switched to gradient descent. Similar to the ePSO algorithm, the search process of the ePSO-BP algorithm starts by initializing a group of random particles. First, all the particles are updated according to (4) until a new generation of particles is produced, and these new particles are then used to search for the global best (gbest) position in the solution space. Finally, the BP algorithm is used to search around the global optimum. In this way, the hybrid algorithm may find an optimum more quickly. The procedure for the ePSO-BP algorithm can be summarized by the following computational steps.
(1) Initialize the positions and velocities of a group of particles randomly.
(2) Evaluate the fitness value of each initialized particle; set $p_i$ to the position of the current particle and $p_g$ to the best position among the initialized particles.
(3) If the maximum number of generations has been reached, go to Step 10; else, go to Step 4.
(4) Store the best particle of the current population. Update the positions and velocities of all the particles according to (4) and (6), generating a group of new particles.
(5) Adjust the values of $c_1$ and $c_2$ using (11).
(6) Adjust the inertia weight $\omega$ according to (10) if it moves beyond its boundary.
(7) Evaluate the fitness value of each new particle, and replace the worst particle with the stored best particle. If the new position of the $i$th particle is better than $p_i$, set $p_i$ to the new position of the $i$th particle. If the best position of all new particles is better than $p_g$, update $p_g$.
(8) If the current $p_g$ is unchanged for 15 consecutive generations, go to Step 9; else, go to Step 3.
(9) Use the BP algorithm to search around $p_g$ for some epochs. If the search result is better than $p_g$, output the current search result; otherwise, output $p_g$.
(10) Output the global optimum $p_g$.
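The overall control flow of these steps can be sketched on a toy objective as follows. This is a deliberately simplified sketch, not the authors' implementation: it uses a fixed inertia weight instead of an evolved one, fixed acceleration coefficients, no replacement of the worst particle, and a finite-difference gradient in place of back propagation through a network. All names and default values (including `w=0.729` and `c1=c2=1.494`) are assumptions made for illustration; only the stagnation limit of 15 generations comes from the steps above.

```python
import random

def numerical_gradient(f, x, h=1e-6):
    """Central-difference gradient, standing in for the BP gradient."""
    grad = []
    for d in range(len(x)):
        xp, xm = list(x), list(x)
        xp[d] += h
        xm[d] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad

def pso_then_descent(f, dim, n_particles=20, max_iter=500, patience=15,
                     w=0.729, c1=1.494, c2=1.494, lr=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]
    gbest = list(min(pbest, key=f))
    stalled = 0
    for _ in range(max_iter):                      # global search phase
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if f(pos[i]) < f(pbest[i]):
                pbest[i] = list(pos[i])
        candidate = min(pbest, key=f)
        if f(candidate) < f(gbest):
            gbest, stalled = list(candidate), 0
        else:
            stalled += 1
        if stalled >= patience:                    # gbest unchanged: switch phase
            break
    x = list(gbest)                                # local search around gbest
    for _ in range(epochs):
        g = numerical_gradient(f, x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x if f(x) < f(gbest) else gbest         # keep the better of the two

# Toy usage: minimize a shifted quadratic in two dimensions.
shifted = lambda x: sum((v - 0.3) ** 2 for v in x)
best = pso_then_descent(shifted, dim=2)
```

The final comparison mirrors Step 9: the gradient-refined point is returned only if it improves on the swarm's global best.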
The parameter $\omega$ in the above ePSO-BP algorithm evolves simultaneously with the weights of the CFLNN during the course of training. The parameter MAXITER1 is generally adjusted to an appropriate value through repeated experiments; an adaptive gradient descent method is then used to search around the global optimum $p_g$. The BP algorithm based on gradient descent has a parameter called the learning rate, which controls the convergence of the algorithm to a local optimum. In practical applications, users usually employ theoretical, empirical, or heuristic methods to set a good value for this learning rate. In this paper, we adopted a strategy in which the learning rate $\mu$ is reduced as training proceeds; it is governed by two constants, $k$ and $\nu$, and by the variable epoch, which counts the training iterations. By adjusting $k$ and $\nu$, we can control how quickly the learning rate is reduced.

ePSO-BP Learning Algorithm for CFLNN.
Learning of a CFLNN may be considered as approximating or interpolating a continuous multivariate function $\varphi(X)$ by an approximating function $\varphi_W(X)$. In the CFLNN architecture, a set of basis functions $\phi$ and a fixed number of weight parameters $W$ are used to represent $\varphi_W(X)$. With a specific choice of the set of basis functions $\phi$, the problem is then to find the weight parameters $W$ that provide the best possible approximation of $\varphi$ on the set of input-output samples. This can be achieved by iteratively updating $W$. Readers interested in the detailed theory of FLNN can refer to [21].
Let $k$ training patterns, denoted by $\langle X_i, Y_i \rangle$, $i = 1, 2, \ldots, k$, be applied to the FLNN, and let the weight matrix be $W$. At the $i$th instant, $i = 1, 2, \ldots, k$, the $D$-dimensional input pattern is $X_i = \langle x_{i1}, x_{i2}, \ldots, x_{iD} \rangle$ and the CFLNN output is $\hat{y}_i$. The input patterns together with their target outputs can be arranged as the augmented matrix

$$\left[ \begin{array}{cccc|c} x_{11} & x_{12} & \cdots & x_{1D} & y_1 \\ x_{21} & x_{22} & \cdots & x_{2D} & y_2 \\ \vdots & \vdots & & \vdots & \vdots \\ x_{k1} & x_{k2} & \cdots & x_{kD} & y_k \end{array} \right].$$

The dimension of the input pattern is increased from $D$ to $D'$ by the set of basis functions $\phi$, so that the expanded pattern $\phi(X_i)$ has $D'$ components. The output $\hat{y}_i$ is computed from $\phi(X_i)$ and $W_i$, where $W_i$ is the weight vector associated with the $i$th output. The error associated with the $i$th output is given by $e_i(t) = y_i(t) - \hat{y}_i(t)$. Using ePSO-back-propagation (BP) learning, the weights of the CFLNN can be optimized. The high-level algorithm can then be summarized as follows.
(1) Input the set of given k training patterns.

Empirical Study
This section is divided into five subsections. Section 4.1 describes the datasets taken from the UCI [45] repository of machine learning databases. The parameters required for the proposed method are given in Section 4.2. In Section 4.3, the performance of the hybrid CFLNN on some of the datasets considered by Sierra et al. [21] is compared with the model proposed by Sierra et al. In Section 4.4, the classification accuracy of the hybrid CFLNN is compared with FLNN [8]. In Section 4.5, we compare the performance of the hybrid CFLNN with the FLNN proposed in [8] using cost matrix analysis and then with the results obtained by the StatLog project [46].

Description of the Datasets.
The availability of results from previous evolutionary and constructive algorithms (e.g., Sierra et al. [21], Prechelt [47]) has guided our selection of the following varied datasets, taken from the UCI repository of machine learning databases, for the addressed neural network learning. Table 1 presents a summary of the main features of each database that has been used in this study.

Table 1: Summary of the main features of the datasets.

Dataset   Patterns   Attributes   Classes   Class 1   Class 2   Class 3
IRIS      150        4            3         50        50        50
WINE      178        13           3         71        59        48
PIMA      768        8            2         500       268       -
BUPA      345        6            2         145       200       -
HEART     270        13           2         150       120       -
CANCER    699        9            2         458       241       -

The parameters of the proposed method are summarized in Table 2, whereas the parameters for the other algorithms are set as suggested by their authors. The parameters for EFLN were adopted as suggested in [21]. Similarly, the parameters for FLNN were set as suggested in [8].
The values of the parameters used in this paper are as follows. We set the swarm size to $N = 20 \times d$, where $d$ is the dimension of the problem under consideration. The lower limit ($\omega_l$) and upper limit ($\omega_u$) of the inertia weight are set to 0.2 and 1.8, respectively. Similarly, the initial and final values of the cognitive acceleration coefficient are set to $c_{1i} = 2.5$ and $c_{1f} = 0.5$. The initial and final values of the social acceleration coefficient are set to $c_{2i} = 0.5$ and $c_{2f} = 2.5$. The maximum number of iterations is fixed at MAXITER = 500.
In the case of BP learning, the learning parameter $\mu$ and the momentum factor $\nu$ in the hybrid CFLNN were chosen after several runs to obtain the best results. The functional expansion of the hybrid CFLNN was chosen in a similar manner.

Hybrid CFLNN versus EFLN.
In this subsection, we compare the results of the hybrid CFLNN with the results of EFLN with polynomial basis functions of degree 1, 2, and 3. The choice of the polynomial degree is a key question in FLNN with polynomial basis functions. However, Sierra et al. [21] have given some guidance on optimizing the polynomial degree that best suits the architecture. Considering polynomial degrees 1, 2, and 3, the possible numbers of expanded inputs for the above datasets are given in Table 3.
For the sake of convenience, we report the results of the experiments conducted on CANCER and BUPA and compare them with the EFLN method [21]. We partitioned both datasets into three sets: training, validation, and test sets. Both networks are trained for 1500 epochs (a setting that should be examined carefully) on the training set, and the error on the validation set was measured after every 10 epochs. Training was stopped when the maximum of 1500 epochs had been reached. The test set performance was then computed for the state of the network that had the minimum validation set error during the training process. This method, called early stopping, is a good way to avoid overfitting the network to the particular training examples used, which would reduce the generalization performance. The average error rates of HCFLNN and EFLN with respect to training, validation, and testing on the CANCER and BUPA datasets are shown in Table 4.
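The early-stopping protocol described above can be sketched generically as follows. The function and argument names are hypothetical; `train_one_epoch` and `validation_error` stand for whatever training and evaluation routines are in use and are supplied by the caller.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, validation_error,
                              max_epochs=1500, check_every=10):
    """Train for max_epochs, measure validation error every check_every epochs,
    and return a copy of the model state with the lowest validation error."""
    best_error = float("inf")
    best_state = copy.deepcopy(model)
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(model)
        if epoch % check_every == 0:
            err = validation_error(model)
            if err < best_error:
                best_error, best_state = err, copy.deepcopy(model)
    return best_state, best_error

# Toy usage: the "model" is a dict whose single weight drifts past its best value.
toy = {"w": 0.0}
step = lambda m: m.update(w=m["w"] + 1.0)
val = lambda m: abs(m["w"] - 50.0)
state, err = train_with_early_stopping(toy, step, val, max_epochs=100)
print(state["w"], err)   # 50.0 0.0
```

Test-set performance would then be measured on the returned state rather than on the final state of the network.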

Hybrid CFLNN versus FLNN.
Here, we discuss the comparative performance of the hybrid CFLNN and FLNN using three datasets: IRIS, WINE, and PIMA. In this case, the total set of samples is randomly divided into two equal folds. Each of these two folds is used alternately as a training set and as a test set. Because the proposed ePSO-BP learning method is a stochastic algorithm, 10 independent runs were performed for every single fold. The training results obtained for HCFLNN, averaged over 10 runs, are compared with the single run of FLNN. The results are reported in Tables 5 and 6 for confidence levels (α) of 95% and 98%, respectively.

Performance of Hybrid CFLNN versus FLNN Based on Heart Data.
In this subsection, we explicitly examine the performance of the HCFLNN model on the heart dataset using the 9-fold cross-validation methodology. The reason for using 9-fold cross validation is to allow comparison with a few of the representative algorithms considered in the StatLog project [46]. In 9-fold cross validation, we partition the database into nine subsets (heart1.dat, heart2.dat, ..., heart9.dat), where eight subsets are used for training and the remaining one is used for testing. The process is repeated nine times in such a way that each time a different subset of data is used for testing. Thus, the dataset was randomly segmented into nine subsets with 30 elements each. Each subset contains about 56% of samples from class 1 (without heart disease) and 44% of samples from class 2 (with heart disease).
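The partitioning scheme can be sketched as below. This is a plain random split with a hypothetical function name; it assumes the number of samples is divisible by the number of folds (as with 270 samples and 9 folds) and does not enforce the per-fold class proportions mentioned above.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k equally sized folds.
    Assumption: n_samples is divisible by k; any remainder would be dropped."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_size = n_samples // k
    folds = [idx[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

splits = list(k_fold_indices(270, 9))
print(len(splits), len(splits[0][0]), len(splits[0][1]))   # 9 240 30
```

Each sample appears in exactly one test fold, so the nine test results together cover the whole dataset once.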
The procedure makes use of a weight matrix, which is described in Table 7. The purpose of such a matrix is to penalize wrongly classified samples according to the penalty weight of the class. In general, the penalty weight for class 2 samples that are classified as class 1 samples is $\omega_1$, while the penalty weight for class 1 samples that are classified as class 2 samples is $\omega_2$. Therefore, the metric used for measuring the cost of wrongly classified patterns in the training and test datasets is given by

$$C_{\text{train}} = \frac{S_1 \omega_1 + S_2 \omega_2}{S_{\text{train}}}, \qquad C_{\text{test}} = \frac{S_1 \omega_1 + S_2 \omega_2}{S_{\text{test}}}, \tag{14}$$

where $C_{\text{train}}$ is the cost of the training set; $C_{\text{test}}$ is the cost of the test set; $S_1$ and $S_2$ denote the numbers of patterns that are wrongly classified as belonging to class 1 and class 2, respectively; and $S_{\text{train}}$ and $S_{\text{test}}$ are the total numbers of training and test patterns, respectively. Table 8 presents the errors and costs of the training and test sets for the FLNN model with weight values of $\omega_1 = 5$ and $\omega_2 = 1$. Table 9 illustrates the performance of HCFLNN based on the above definition of the cost matrix. The errors in the training and test sets are explicitly given.
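Read this way, the cost is a penalty-weighted misclassification rate. The sketch below assumes the form (S1·ω1 + S2·ω2) divided by the number of patterns, which is inferred from the definitions above; the function name and the example counts are hypothetical and are not results from the paper.

```python
def misclassification_cost(s1, s2, n_patterns, w1=5.0, w2=1.0):
    """Weighted cost: s1 patterns wrongly assigned to class 1 (penalty w1),
    s2 patterns wrongly assigned to class 2 (penalty w2)."""
    return (s1 * w1 + s2 * w2) / n_patterns

# Hypothetical fold of 30 test patterns with 2 + 3 misclassifications.
print(round(misclassification_cost(2, 3, 30), 4))   # 0.4333
```

With ω1 = 5 and ω2 = 1, missing a case of heart disease costs five times as much as a false alarm, so two classifiers with equal error rates can have quite different costs.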
The classification results found by the HCFLNN for the heart disease dataset were compared with the results found in the StatLog project [46]. According to [46], comparison consists of calculating the average cost produced by the nine data subsets used for validation. Table 10 presents the average cost for the nine training and test subsets. The result of the HCFLNN is highlighted in bold.

Conclusions and Research Directions
In this paper, we developed a new hybrid Chebyshev functional link neural network (HCFLNN). The hybrid model is constructed using the newly proposed ePSO-back-propagation learning algorithm and a functional link artificial neural network with orthogonal Chebyshev polynomials. The model was designed for the task of classification. Encouraged by the results of HCFLNN, our future research includes: (i) testing the proposed method on a larger number of real-life benchmark classification problems with highly nonlinear boundaries, (ii) mapping the input features with other functions such as Legendre, Gaussian, sigmoid, power series, and so forth, for better approximation of the decision boundaries, (iii) the stability and convergence analysis of the proposed method, and (iv) the evolution of an optimal FLNN using particle swarm optimization.
Because of its simple architecture and computational efficiency, the HCFLNN may be conveniently employed in other tasks of data mining and knowledge discovery in databases [4,8], such as clustering, feature selection, feature extraction, association rule mining, regression, and so on. The extra calculation generated by the higher-order units can be eliminated, provided that these polynomial terms are stored in memory instead of being recalculated each time the HCFLNN is trained.