The radial basis function (RBF) network has its foundation in conventional approximation theory and is capable of universal approximation. The RBF network is a popular alternative to the well-known multilayer perceptron (MLP), since it has a simpler structure and a much faster training process. In this paper, we give a comprehensive survey of the RBF network and its learning. Many aspects of the RBF network, such as network structure, universal approximation capability, radial basis functions, RBF network learning, structure optimization, normalized RBF networks, application to dynamic system modeling, and nonlinear complex-valued signal processing, are described. We also compare the features and capabilities of the RBF network and the MLP.

The multilayer perceptron (MLP) trained with backpropagation (BP) rule [

The RBF network has its origin in performing exact interpolation of a set of data points in a multidimensional space [

The RBF network is a three-layer (

Architecture of the RBF network. The input, hidden, and output layers have

For input

For a set of

The RBF network with a localized RBF, such as the Gaussian RBF network, is a receptive-field or localized network. The localized approximation method provides the strongest output when the input is near the prototype of a node. For a suitably trained localized RBF network, input vectors that are close to each other always generate similar outputs, while distant input vectors produce nearly independent outputs. This is the intrinsic local generalization property. A receptive-field network is an associative neural network in that a given input activates only a small subspace of the network. This property is particularly attractive, since modifying a receptive-field function produces only a local effect. Thus, receptive-field networks can be conveniently constructed by adjusting the parameters of the receptive-field functions and/or by adding or removing neurons. Another well-known receptive-field network is the cerebellar model articulation controller (CMAC) [

The RBF network has universal approximation and regularization capabilities. Theoretically, the RBF network can approximate any continuous function arbitrarily well, if the RBF is suitably chosen [

In [

This paper is organized as follows. In Section

A number of functions can be used as the RBF [

Among these RBFs, (

Another popular RBF for universal approximation is the thin-plate spline function (

The Gaussian and thin-plate spline functions.
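
The two RBFs discussed above can be sketched as follows; the function names, the width parameter `sigma`, and the convention φ(0) = 0 for the thin-plate spline (its limiting value) are illustrative choices, not definitions from the text.

```python
import numpy as np

def gaussian_rbf(r, sigma=1.0):
    """Localized Gaussian RBF: phi(r) = exp(-r^2 / (2 sigma^2))."""
    return np.exp(-r**2 / (2.0 * sigma**2))

def thin_plate_spline(r):
    """Thin-plate spline RBF: phi(r) = r^2 ln(r), with phi(0) = 0 (limit value)."""
    r = np.asarray(r, dtype=float)
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = r[nz]**2 * np.log(r[nz])
    return out
```

Note the qualitative difference: the Gaussian decays to zero away from its center (localized), while the thin-plate spline grows without bound (nonlocalized).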

A pseudo-Gaussian function in the one-dimensional space is introduced by selecting the standard deviation

Approximating functions with nearly constant-valued segments using localized RBFs is most difficult, and the approximation is inefficient. The sigmoidal RBF, as a composite of a set of sigmoidal functions, can be used to deal with this problem [

The popular Gaussian RBF is circular shaped. Many RBF nodes may be required for approximating a functional behavior with sharp noncircular features. In order to reduce the size of the RBF network, direction-dependent scaling, shaping, and rotation of Gaussian RBFs are introduced in [

RBF network learning can be formulated as the minimization of the MSE function

RBF network learning requires the determination of the RBF centers and the weights. Selection of the RBF centers is most critical to RBF network implementation. The centers can be placed on a random subset or all of the training examples, or determined by clustering or via a learning procedure. One can also use all the data points as centers in the beginning and then selectively remove centers using the

RBF network learning is usually performed using a two-phase strategy: the first phase specifies suitable centers

A simple method to specify the RBF centers is to randomly select a subset of the input patterns from the training set if the training set is representative of the learning problem. Each RBF center is exactly situated at an input pattern. The training method based on a random selection of centers from a large training set of fixed size is found to be relatively insensitive to the use of pseudoinverse; hence the method itself may be a regularization method [
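
A minimal sketch of this random-subset strategy, with each center placed exactly at a training pattern; the function name and the fixed seed are illustrative:

```python
import numpy as np

def select_random_centers(X, num_centers, seed=0):
    """Pick a random subset of the training patterns as RBF centers."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=num_centers, replace=False)
    return X[idx]
```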

For function approximation, one heuristic is to place the RBF centers at the extrema of the second-order derivative of a function and to place the RBF centers more densely in areas of higher absolute second-order derivative than in areas of lower absolute second-order derivative [

The Gaussian RBF network using the same

Clustering is a data analysis tool for characterizing the distribution of a data set and is usually used for determining the RBF centers. The training set is grouped into appropriate clusters whose prototypes are used as RBF centers. The number of clusters can be specified or determined automatically depending on the clustering algorithm. The performance of the clustering algorithm is important to the efficiency of RBF network learning.

Unsupervised clustering such as the
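
As a sketch, the cluster prototypes used as RBF centers can be computed with a few Lloyd iterations (a simple k-means); the farthest-point initialization here is an assumed choice to keep the prototypes spread out, not a method prescribed by the text.

```python
import numpy as np

def kmeans_centers(X, k, iters=100, seed=0):
    """Determine RBF centers as k-means cluster prototypes (Lloyd's iterations)."""
    rng = np.random.default_rng(seed)
    # Farthest-point initialization (assumed choice): start from a random
    # pattern, then repeatedly add the pattern farthest from all chosen ones.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(X[:, None, :] - np.array(centers)[None, :, :],
                                  axis=2), axis=1)
        centers.append(X[int(d.argmax())])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # Assign each pattern to its nearest prototype, then recompute means.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers
```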

After the RBF centers are determined, the covariance matrices of the RBFs are set to the covariances of the input patterns in each cluster. In this case, the Gaussian RBF network is extended to the generalized RBF network using the Mahalanobis distance, defined by the weighted norm [

After RBF centers and their widths or covariance matrices are determined, learning of the weights

After the parameters related to the RBF centers are determined,
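
With the centers and widths fixed, the hidden layer is a fixed feature map and the output weights solve a linear least-squares problem. A sketch assuming Gaussian RBFs with a common width (the pseudoinverse solution is obtained here via `np.linalg.lstsq`):

```python
import numpy as np

def rbf_design_matrix(X, centers, sigma):
    """Hidden responses: Phi[n, j] = exp(-||x_n - c_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :])**2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma**2))

def solve_weights(X, y, centers, sigma):
    """Least-squares output weights W = Phi^+ y (pseudoinverse solution)."""
    Phi = rbf_design_matrix(X, centers, sigma)
    W, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return W
```

When every training pattern is used as a center, this reduces to exact interpolation, since the Gaussian interpolation matrix is nonsingular for distinct points.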

When the full data set is not available and samples are obtained on-line, the RLS method can be used to train the weights on-line [
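
The online RLS update for the output weights can be sketched as below; the initialization constant `delta` and the forgetting factor `lam` are illustrative choices, not values from the text.

```python
import numpy as np

class RLSWeights:
    """Recursive least squares for the linear output weights of an RBF network."""
    def __init__(self, num_hidden, delta=1e3, lam=1.0):
        self.w = np.zeros(num_hidden)
        self.P = delta * np.eye(num_hidden)   # inverse-correlation estimate
        self.lam = lam                        # forgetting factor (1.0 = no forgetting)

    def update(self, phi, target):
        """One RLS step for a hidden-response vector phi and a scalar target."""
        Pphi = self.P @ phi
        k = Pphi / (self.lam + phi @ Pphi)    # gain vector
        err = target - self.w @ phi           # a priori error
        self.w = self.w + k * err
        self.P = (self.P - np.outer(k, Pphi)) / self.lam
        return err
```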

In order to eliminate the inversion operation given in (

The orthogonal least-squares (OLS) method [

The batch OLS method can not only determine the weights, but also choose the number and the positions of the RBF centers. The batch OLS can employ the forward [

Due to the orthogonalization procedure, it is very convenient to implement the forward and backward center selection approaches. The forward selection approach is to build up a network by adding, one at a time, centers at the data points that result in the largest decrease in the network output error at each stage. Alternatively, the backward selection algorithm sequentially removes from the network, one at a time, those centers that cause the smallest increase in the residual.
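
The forward selection idea can be sketched as follows. Note this is a simplified greedy variant that refits by least squares at each step instead of using the Gram-Schmidt orthogonalization of the true OLS procedure; the tolerance and function names are illustrative.

```python
import numpy as np

def forward_select_centers(Phi, y, tol=1e-3, max_centers=None):
    """Greedy forward selection of RBF centers (simplified OLS-style sketch).

    Phi[:, j] holds the responses of candidate center j over the training set.
    At each step, add the candidate giving the largest decrease in the
    sum-squared error; stop when the relative reduction falls below tol.
    """
    n, m = Phi.shape
    max_centers = max_centers or m
    selected = []
    sse = float(y @ y)
    while len(selected) < max_centers:
        best_j, best_sse = None, sse
        for j in range(m):
            if j in selected:
                continue
            cols = Phi[:, selected + [j]]
            w, *_ = np.linalg.lstsq(cols, y, rcond=None)
            r = y - cols @ w
            if r @ r < best_sse:
                best_j, best_sse = j, float(r @ r)
        if best_j is None or (sse - best_sse) / (y @ y) < tol:
            break
        selected.append(best_j)
        sse = best_sse
    return selected
```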

The error reduction ratio (ERR) due to the

ERR is a performance-oriented criterion. An alternative terminating criterion can be based on the Akaike information criterion (AIC) [

The computational complexity of the orthogonal decomposition of

The RBF center clustering method based on the Fisher ratio class separability measure [

Recursive OLS (ROLS) algorithms are proposed for updating the weights of single-input single-output [

The gradient-descent method provides the simplest solution. We now apply the gradient-descent method to supervised learning of the RBF network.

To derive the supervised learning algorithm for the RBF network with any useful RBF, we rewrite the error function (

Taking the first-order derivative of

The gradient-descent method is defined by the update equations

Initialization can be based on a random selection of the RBF centers from the examples and

For the Gaussian RBF network, the RBF at each node can be assigned a different width
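
One batch gradient-descent step for a single-output Gaussian RBF network with per-node widths might look as follows; the learning rate, the 1/2 factor in the error function, and the update of all three parameter groups in one step are assumptions of this sketch.

```python
import numpy as np

def gradient_step(X, y, C, sigma, w, lr=0.05):
    """One batch gradient step on the weights w (J,), centers C (J, d), and
    per-node widths sigma (J,) of a Gaussian RBF network, for E = 0.5*sum(e^2)."""
    diff = X[:, None, :] - C[None, :, :]            # (N, J, d) differences x_n - c_j
    d2 = (diff**2).sum(axis=2)                      # (N, J) squared distances
    Phi = np.exp(-d2 / (2.0 * sigma**2))            # (N, J) hidden responses
    err = Phi @ w - y                               # (N,) output errors
    # Gradients of E with respect to each parameter group.
    g_w = Phi.T @ err
    g_C = ((err[:, None] * Phi * w[None, :] / sigma**2)[:, :, None] * diff).sum(axis=0)
    g_sigma = ((err[:, None] * Phi * w[None, :]) * d2 / sigma**3).sum(axis=0)
    return w - lr * g_w, C - lr * g_C, sigma - lr * g_sigma
```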

When using RBFs with the same

The gradient-descent algorithms introduced so far are batch learning algorithms. By optimizing the error function

Although the RBF network trained by the gradient-descent method can provide performance equivalent to or better than that of the MLP trained with BP, the training times of the two methods are comparable [

The gradient-descent method is prone to finding local minima of the error function. For a reasonably well-localized RBF, an input generates a significant activation only in a small region, and the chance of getting stuck at a local minimum is small. Unsupervised methods can be used to determine

Actually, all general-purpose unconstrained optimization methods are applicable for RBF network learning by minimization of

The objective is to find suitable network structure and the corresponding network parameters. Some complexity criteria such as the AIC [

The LM method is used for RBF network learning [

In [

Linear programming models with polynomial time complexity are also employed to train the RBF network [

The expectation-maximization (EM) method [

The RBF network using regression weights can significantly reduce the number of hidden units and is effectively used for approximating nonlinear dynamic systems [

When approximating a given function

The Gaussian RBF network can be regarded as an improved alternative to the four-layer probabilistic neural network (PNN) [

Extreme learning machine (ELM) [

In order to achieve the optimum structure of an RBF network, learning can be performed by determining the number and locations of the RBF centers automatically using constructive and pruning methods.

The constructive approach gradually increases the number of RBF centers until a criterion is satisfied. The forward OLS algorithm [

In [

In a heuristic incremental algorithm [

The incremental RBF network architecture using hierarchical gridding of the input space [

The dynamic decay adjustment (DDA) algorithm is a fast constructive training method for the RBF network when used for classification [

Incremental RBF network learning is also derived based on the growing cell structures model [

The RAN is a sequential learning method for the localized RBF network, suitable for online modeling of nonstationary processes. The network begins with no hidden units. As pattern pairs are received during training, a new hidden unit may be recruited according to the novelty of the data, which is decided by two conditions
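
The growth mechanism can be sketched as below, assuming the two novelty conditions are a distance test (the input is far from every existing center) and an error test (the prediction error is large); the thresholds, the width rule, and the LMS fallback step are illustrative choices, not the exact RAN settings.

```python
import numpy as np

class RANSketch:
    """Resource-allocating network sketch: allocate a Gaussian unit when the
    input is far from all centers AND the error is large; otherwise adapt
    the output weights by an LMS step. All thresholds are illustrative."""
    def __init__(self, dist_thresh=0.5, err_thresh=0.1, overlap=0.8, lr=0.05):
        self.centers, self.sigmas, self.weights = [], [], []
        self.dist_thresh, self.err_thresh = dist_thresh, err_thresh
        self.overlap, self.lr = overlap, lr

    def _phi(self, x):
        C = np.array(self.centers)
        d2 = ((C - x)**2).sum(axis=1)
        return np.exp(-d2 / (2.0 * np.array(self.sigmas)**2))

    def predict(self, x):
        if not self.centers:
            return 0.0
        return float(np.array(self.weights) @ self._phi(x))

    def observe(self, x, target):
        err = target - self.predict(x)
        if self.centers:
            d_near = np.sqrt(((np.array(self.centers) - x)**2).sum(axis=1)).min()
        else:
            d_near = np.inf
        if abs(err) > self.err_thresh and d_near > self.dist_thresh:
            # Novel sample: allocate a new unit at x, width tied to d_near.
            self.centers.append(np.asarray(x, dtype=float))
            self.sigmas.append(self.overlap * min(d_near, 1.0))
            self.weights.append(err)
        elif self.centers:
            # Not novel: adapt existing weights with one LMS step.
            self.weights = list(np.array(self.weights) + self.lr * err * self._phi(x))
```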

Assuming that there are

The RAN method performs much better than the RBF network learning algorithm using random centers and that using the centers clustered by the

In [

Numerous improvements on the RAN have been made by integrating node-pruning procedure [

In [

The growing and pruning algorithm for RBF (GAP-RBF) [

In addition to the RAN algorithms with pruning strategy [

The normalized RBF network [

In [

As an efficient and fast growing RBF network algorithm, the constructive nature of DDA [

Various pruning methods for feedforward networks have been discussed in [

With the flavor of weight-decay technique, some regularization techniques for improving the generalization capability of the MLP and the RBF network are also discussed in [

In [

A theoretically well-motivated criterion for describing the generalization error is developed by using Stein’s unbiased risk estimator (SURE) [

The generalization error of a trained network can be decomposed into two parts, namely, an approximation error that is due to the finite number of parameters of the approximation scheme and an estimation error that is due to the finite number of data available [

The normalized RBF network is defined by normalizing the vector composed of the responses of all the RBF units [

A simple algorithm, called weighted averaging (WAV) [

The normalized RBF network given by (

In the normalized RBF network of the form (

The normalized RBF network loses the localized characteristics of the localized RBF network and exhibits excellent generalization properties, to the extent that hidden nodes need to be recruited only for training data at the boundaries of the class domains. This obviates the need for a dense coverage of the class domains, in contrast to the RBF network. Thus, the normalized RBF network softens the curse of dimensionality associated with the localized RBF network [
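
The loss of locality described above follows directly from the normalization: the hidden activations form a partition of unity, so the output is always a convex combination of the weights, even far from all centers. A minimal sketch (single output, common width, both assumed):

```python
import numpy as np

def normalized_rbf_output(x, centers, sigma, weights):
    """Normalized RBF network: divide the responses by their sum, so the
    hidden activations sum to one (a partition of unity)."""
    d2 = ((centers - x)**2).sum(axis=1)
    phi = np.exp(-d2 / (2.0 * sigma**2))
    return float(weights @ (phi / phi.sum()))
```

Unlike the plain localized RBF network, whose output decays to zero away from the centers, this output stays within the range of the weights everywhere.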

The normalized RBF network is an RBF network with a quasilinear activation function, whose squashing coefficient is decided by the activations of all the hidden units. The output units can also employ the sigmoidal activation function. The RBF network with the sigmoidal function at the output nodes outperforms the case of a linear or quasilinear function at the output nodes in terms of sensitivity to learning parameters, convergence speed, and accuracy [

The normalized RBF network is found functionally equivalent to a class of Takagi-Sugeno-Kang (TSK) systems [

Traditionally, RBF networks are used for function approximation and classification. They are trained to approximate a nonlinear function, and the trained RBF networks are then used to generalize. All applications of the RBF network are based on its universal approximation capability.

RBF networks have now been used in a vast variety of applications, such as face tracking and face recognition [

Special RBFs are customized to match the data characteristics of some problems. For instance, in channel equalization [

In the graphics or vision applications, the input domain is spherical. Hence, the spherical RBFs [

In a spherical RBF network, the kernel function of the

The sequential RBF network learning algorithms, such as the RAN family and the works in [

The state-dependent autoregressive (AR) model with functional coefficients is often used to model complex nonlinear dynamical systems. The RBF network can be used as a nonlinear AR time-series model for forecasting [

For time-series applications, the input to the network is

For online adaptation of nonlinear systems, a constant exponential forgetting factor is commonly applied to all the past data uniformly. This is undesirable for nonlinear systems whose dynamics are different in different operating regions. In [

Recurrent RBF networks, which combine features from the RNN and the RBF network, are suitable for the modeling of nonlinear dynamic systems [

Complex RBF networks are more efficient than the real-valued RBF network for nonlinear signal processing involving complex-valued signals, such as the equalization and modeling of nonlinear channels in communication systems. Digital channel equalization can be treated as a classification problem.

In the complex RBF network [

Although the input and centers of the complex RBF network [

Learning of the complex Gaussian RBF network can be performed in two phases, where the RBF centers are first selected by using the incremental

In [

Both the MLP and the RBF networks are used for supervised learning. In the RBF network, the activation of an RBF unit is determined by the distance between the input vector and the prototype vector. For classification problems, RBF units map input patterns from a nonlinear separable space to a linear separable space, and the responses of the RBF units form new feature vectors. Each RBF prototype is a cluster serving mainly a certain class. When the MLP with a linear output layer is applied to classification problems, minimizing the error at the output of the network is equivalent to maximizing the so-called network discriminant function at the output of the hidden units [

The MLP is a global method; for an input pattern, many hidden units will contribute to the network output. The localized RBF network is a local method; it satisfies the minimal disturbance principle [

The MLP has a very complex error surface, resulting in the problem of local minima or nearly flat regions. In contrast, the RBF network has a simple architecture with linear weights, and the LMS adaptation rule is equivalent to a gradient search of a quadratic surface, thus having a unique solution for the weights.

The MLP has greater generalization for each training example and is a good candidate for extrapolation. The extension of a localized RBF to its neighborhood is, however, determined by its variance. This localized property prevents the RBF network from extrapolation beyond the training data.
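
The no-extrapolation property can be checked numerically: under an assumed toy setup (Gaussian units spanning [-1, 1], illustrative width and weights), the network output vanishes far outside the region covered by the centers.

```python
import numpy as np

# Toy localized Gaussian RBF network with centers covering [-1, 1].
centers = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
weights = np.ones(5)
sigma = 0.3

def rbf_out(x):
    """Network output at x; decays to zero away from all centers."""
    d2 = ((centers - x)**2).sum(axis=1)
    return float(weights @ np.exp(-d2 / (2.0 * sigma**2)))

inside = rbf_out(np.array([0.0]))      # within the training region
outside = rbf_out(np.array([5.0]))     # far outside the training region
```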

The localized RBF network suffers from the curse of dimensionality. To achieve a specified accuracy, it needs much more data and more hidden units than the MLP. In order to approximate a wide class of smooth functions, the number of hidden units required for the three-layer MLP is polynomial with respect to the input dimensions, while the counterpart for the localized RBF network is exponential [

For the MLP, the response of a hidden unit is constant on a surface which consists of parallel

The error surface of the MLP has many local minima or large flat regions called plateaus, which lead to slow convergence of the training process for gradient search. For the localized RBF network, only a few hidden units have significant activations for a given input; thus the network modifies the weights only in the vicinity of the sample point and retains constant weights in the other regions. The RBF network requires orders of magnitude less training time for convergence than the MLP trained with the BP rule for comparable performance [

Generally speaking, the MLP is a better choice if the training data is expensive. However, when the training data is cheap and plentiful or online training is required, the RBF network is very desirable. In addition, the RBF network is insensitive to the order of the presentation of the adjusted signals and hence more suitable for online or subsequent adaptive adjustment [

The RBF network is a good alternative to the MLP. It has a much faster training process compared to the MLP. In this paper, we have given a comprehensive survey of the RBF network. Various aspects of the RBF network have been described, with emphasis placed on RBF network learning and network structure optimization. Topics on normalized RBF networks, RBF networks in dynamic systems modeling, and complex RBF networks for handling nonlinear complex-valued signals are also described. The comparison of the RBF network and the MLP addresses the advantages of each of the two models.

In the support vector machine (SVM) and support vector regression (SVR) approaches, when RBFs are used as the kernel function, SVM/SVR training automatically finds the important support vectors (RBF centers) and the weights. Of course, the training objective is not in the MSE sense.

Before we close this paper, we would also like to mention in passing some topics associated with the RBF network. Due to length restrictions, we refer the readers to [

The generalized single-layer network (GSLN) [

When a training set contains outliers, robust statistics [

Hardware implementations of neural networks are commonly based on building blocks and thus allow for the inherent parallelism of neural networks. The properties of the MOS transistor are desirable for analog designs of the Gaussian RBF network. In the subthreshold or weak-inversion region, the drain current of the MOS transistor has an exponential dependence on the gate bias and dissipates very low power, and this is usually exploited for designing the Gaussian function [

The authors acknowledge Professor Chi Sing Leung (Department of Electronic Engineering, City University of Hong Kong) and Professor M. N. S. Swamy (Department of Electrical and Computer Engineering, Concordia University) for their help in improving the quality of this paper. This work was supported in part by NSERC.