Asymmetric Variate Generation via a Parameterless Dual Neural Learning Algorithm

In a previous work (S. Fiori, 2006), we proposed a random number generator based on a tunable non-linear neural system, whose learning rule is designed on the basis of a cardinal equation from statistics and whose implementation is based on look-up tables (LUTs). The aim of the present manuscript is to improve the above-mentioned random number generation method by changing the learning principle, while retaining the efficient LUT-based implementation. The new method proposed here proves easier to implement and relaxes some previous limitations.


INTRODUCTION
Random numbers are currently used for a variety of purposes such as: cryptographic keys generation, games, some classes of scientific experiments as well as "Monte Carlo" methods in physics and computer science [1][2][3][4][5][6]. Standard programming environments are endowed with basic pseudorandom signal generators such as the uniform and the Gaussian ones, while usually the needed distributions are more involved than uniform/Gaussian. A simple example of application is password generation: a random password generator is a software that inputs from a random or pseudorandom number generator and automatically generates a password. An example of application where involved probability distributions are needed is in independent component analysis (ICA, [7]) testing: as the behavior of an ICA algorithm might depend on the statistical distribution of the sources, ICAalgorithm testing tools might require random sequences generators capable of producing random numbers distributed according to involved probability laws.
The principal methods known in the literature to obtain a batch of samples endowed with an arbitrary distribution from a samples batch having a simple distribution are the "transformation method" and the "rejection method" [8]. In the present paper, we focus on the transformation method, which may be well implemented through a tunable neural system, because the availability of a random number source and of a tunable nonlinear system, along with a proper learning procedure, allows obtaining a wide class of pseudorandom signal generators.
A well-known effect of nonlinear neural systems is to warp the statistical distribution of its input. In particular, we assume that the system under consideration has a nonlinear adaptive structure described by the transference y = f (x), where x ∈ X ⊆ R denotes the system input random signal, having probability density function p x (x), and y ∈ Y ⊆ R denotes the output signal, having probability density function p y (y), as shown in Figure 1. In the hypothesis that the neural system transference is strictly monotonic, namely f (x) > 0, for all x ∈ X, the relationship between the input distribution, the output distribution, and the system transfer function is known to be [9] p y (y) where f −1 (·) denotes the inverse of function f (·). Usually, (1) is interpreted as an analysis formula, which allows computing the output distribution when the input distribution and the system transference function are known. However, the cardinal equation (1) may also be interpreted as a formula that allows for designing the nonlinear system when Computational Intelligence and Neuroscience Neural system the distribution p x (·) is known and it is desired that the system responds according to a desired distribution p y (·). In fact, (1) may be rewritten as the differential equation: In general, such design operation is rather difficult, because (2) in the unknown f (·) involves the solution of a nonlinear differential equation, provided that a consistent boundary condition is specified.
In the recent contribution [10], we presented a pseudorandom samples generator based on a nonlinear monotonic neural system, whose transference function is denoted by f (·), tuned on the basis of the differential equation (2). The cardinal design equation (2) was proposed to be solved via a (relaxation-type) fixed-point algorithm. The key advantages of the method proposed in [10] are as follows. (a) In order to obtain a fully-tunable neural transference function, a look-up-table representation was chosen. It guarantees high flexibility in the shape of the neural transference as well as easiness of representation and handling of the involved quantities. (b) The fixed-point learning algorithm exhibits fast convergence over other possible methods such as the gradient-based one: unlike these methods, the fixed-point learning algorithm does not require the computation of derivatives of the involved functions.
The resulting random-number generation method should be thus read as a two-stage procedure. The first stage consists in solving the cardinal differential equation (2) in the unknown function f (·), given the distributions p x (·) and p y (·) as data. The second stage consists in generating input random samples drawn from the distribution p x (·), then letting such random samples pass through the learnt nonlinear neural system by computing output values y = f (x). The random samples y are assured to be distributed according to the probability density function p y (·).
However, we recognized that the method presented in [10] also suffers from some drawbacks, namely the following. (a) For numerical convergence purpose, each step of the fixed-point-type tuning algorithm needed to be followed by some normalization steps. Namely, from (2), it is easily seen that when the function p y ( f (x)) approaches 0, the computation of f (x) becomes ill-conditioned, therefore the quantity p y ( f (x)) was replaced by p y ( f (x)) + γ, with γ > 0 being a small constant to be properly sized. Also, in order to refine learning, after each iteration step, the solution f (x) needed to be normalized either by affine scaling, in order to control the range of variable y, or by linear scaling in order to match the true value of output distribution moment of preselected order. This, in turn, requires computing in advance the (closed form) moments of interest of the output distribution. (b) In spite of affine scaling, it was not easy to control the range of the output value y, as affine scaling does not guarantee convergence in every case of interest, therefore it could not be employed in every case. (c) The developed procedure was customized to generate output distributions that are either symmetric (namely, p y (−y) = p y (y)) or completely skewed to the right (namely, p y (y) = 0, for all y < 0) only. Asymmetric or general-shape distributions were not considered.
In the present paper, we consider the problem of extending the previous method to the generation of asymmetric distributions by removing the constraint of symmetry or skewedness to the right. Also, we propose a way to avoid normalization of probability density function. The solution of choice implies a change in the viewpoint of cardinal equation (1): instead of converting formula (1) into the differential equation (2), we convert it into a new differential equation, hereafter referred to as dual cardinal equation, which will prove easier to solve and more flexible to use in practice, while retaining the previous numerical representation/advantages. Thus, we will retain the effective numerical representation of the involved quantity already introduced in the works [10,11], based on the "look-up table" (LUT) implementation of neural activation functions as well as the efficient numerical algorithm to solve the dual cardinal equation. LUTs were proven to provide an efficient way of representing and handling the variables appearing within the devised random number generation algorithm. A prominent advantage of the procedure is the lack of hard computational requirements except for LUT handling, which consists of sorting/searching on lists of numbers and of few simple algebraic operations on numbers.
The effectiveness of the proposed approach will be evaluated through numerical experiments. In particular, the designed experiments followed a logical succession, beginning with a basic assessment of the proposed method when applied to bi-Gaussian distribution, which is then followed by comparably more difficult distributions, namely a generalized Gaussian distribution and an asymmetric Gamma distribution.
The existing method presented in [3] is worth discussing. It concerns a neural-networks-type algorithm to generate random vectors with arbitrary marginal distributions and correlation matrix, based on NORTA method. The "normalto-anything" (NORTA) method (see, e.g., [12]) is one of the most efficient methods for random vector generation. In [3], a technique was presented to generate the correlation matrix of normal random vectors based on an artificial neural networks approach. The NORTA algorithm works in the following way to generate random samples with prescribed probability density function. First, generate zero-mean unitvariance random samples x i , i ∈ {1, . . . , Q}. Then, generate the desired random samples as y i = P −1 y (Φ(x i )), where Φ(·) denotes the cumulative distribution function of a standard normal random variable and P y (·) denotes the desired cumulative distribution function, with P −1 ]. It appears, thus, as a transformation method.
Most of the methods of random vector generation known from the literature impose constraints on the size of the random vectors and many of them are applicable only for bivariate distributions whose components are equidistributed. Conversely, within the NORTA framework, marginal probability distributions for vector components as well as their correlation matrix may be specified. Obtaining the prescribed generated random vector correlation matrix requires solving an involved nonlinear system of equations, which is the most serious problem in this kind of approach. Paper [3] makes use of a multilayer perceptron neural network to estimate correlation matrices of normal random vectors, allowing thus to overcome the analytically involved equations of NORTA algorithm. While the method proposed here is more general than NORTA in the sense that it works for any kind of available generator (not only Gaussian), it is less general in the sense that it does not allow to generate multivariate random variables with prescribed joint statistics.

DUAL CARDINAL EQUATION AND ITS NUMERICAL SOLUTION
The present section formalizes the learning problem at hand and illustrates a fixed-point-based numerical algorithm to solve the dual cardinal equation.

Dual cardinal equation and neural system
The key point of the new method consists in learning the inverse function f −1 (·) instead of the function f (·). As it will be clarified in the next sections, this choice simplifies the learning problem while adding slight computational burden to the usage of the learnt neural system as a generative model. We denote by x = g(y) def = f −1 (y) the inverse function of the actual neural transfer function and refer to the new neural system, having g(·) as transfer function, as the "dual neural system" (shown in Figure 1). The purpose here is to learn a dual neural system that warps p y (·) into p x (·) under the constraint g (y) > 0, for all y ∈ Y. We denote the interval of interest for the generated random variable as Y = [y y]. With this hypothesis on the nonlinear dual neural transfer function, the cardinal equation (1) may be rewritten as which will be hereafter referred to as "dual cardinal equation." It is worth noting that the boundary condition g(y) = 0 is completely arbitrary. While there are no theoretical reasons to set the boundary condition in any specific way, the above choice is motivated by the observation that it simplifies the fixed-point adapting algorithm with respect to the previous version proposed in [10].
In general, a closed-form solution to (3) may not be realized, thus we should resort to an iterative learning algorithm to search for a solution. Formally, this means designing an algorithm that generates a succession of functions g n (y), n ∈ N, whose limit coincides to the solution of (3). A way to generate such a succession is to employ the algorithm: As a figure-of-convergence of learning process, we consider the weighted difference of function g(·) between two successive iterations, namely, As initial guess, we assume g 0 (y) = 0, for all y ∈ Y.
After learning an inverse function g(·), the numerical procedure should calculate the actual nonlinear function f (·) by numerical inversion. As it will be clarified in the next section, within the framework proposed here, such operation involves a very little computational effort.

Numerical implementation of the learning procedure
From an implementation viewpoint, the algorithm (4) needs to be discretized in order to obtain a version suitable to be implemented on a computer. We choose to represent function g n (y) by a numerical vector: in practice, we suppose the interval Y = [y y] of interest to be partitioned into N ≥ 1 discrete bins. This gives rise to the vector-type representation y ∈ R N+1 of the support of the output sequence probability density function, where y contains N + 1 values regularly spaced in Y with spacing-width denoted as Δ y . Then g n (y) may be represented by a numerical vector g n ∈ R N+1 and the neural input-output transference is now represented by the discrete relationship (g, y) ∈ R N+1 × R N+1 , namely, a numerical lookup table. The entries of a vector g n may be denoted by an extra footer, that is, by g n,k , with k ∈ {0, 1, . . . , N}. The interval Δ y relates to the integer N and may be defined as In order to translate the learning rule (4) into a version suitable to numerical representation, we should consider the inherent limitations of numerical integration of differential equations. The following notes are worth taking into account. (a) Output support selection: the ultimate purpose of the random number generation method under construction is to generate random samples with desired probability distribution within a range of interest, namely, with values within an interval that is deemed suitable for the purposes that random samples generation is launched for. Therefore, the output range Y = [y y] is to be freely selected according to the needs the random samples are to be generated for. Then, the above-mentioned vector y has entries y k computed as y k = y + k·Δ y , with k ∈ {0, 1, . . . , N}. (b) Input support selection: in order to prevent the denominator of the quantity g n+1 (y) in (4) to become too close to zero, a sensible choice is to carefully select the support X. As in this paper we consider the input probability density function to be either (symmetric) Gaussian or uniform, we set with R x > 0. The value of constant R x is to be selected in such a way that p x (R x ) 0. It is worth recalling that the support of the input distribution may be arbitrarily selected as it does not affect the support of the output distribution. (c) Iterative range scaling: after each learning step, an affine normalization operation is performed, that linearly scales the entries of the putative solution g n so that g n,0 = −R x and g n,N = R x .
In order to describe the numerical learning algorithm, the following operators are defined for a generic look-up where the subscript k denotes the kth entry of the vectors cumsum(h) and affscale{h; a, b}. The behavior of the "cumsum" operator is illustrated in Figure 2, which also provides a visual representation of look-up tables. In practice, the considered numerical version of the learning rule (4) writes (A0) g 0 := 0, where symbol := denotes vector values assignment and p y denotes the vector of N + 1 entries containing the values of p y (·) corresponding to the values in y, and its entries may be denoted as p yk , with k ∈ {0, 1, . . . , N}.
In terms of look-up-tables entries, the learning relaxation index Δg n of definition (5) may be approximated as

Use of the neural system as generative model
When a suitable dual neural system described by the transference g(·) has been learnt, it may be effectively used to generate random samples drawn from the desired statistical distribution. The number of available input samples (that coincides with the number of output samples to be generated) is hereafter denoted by Q. The difficulty here is that the input samples x are known while the output samples y are supposed to be computed as y = f (x). However, unlike in [10], the function f (·) is not known in the present setting as its inverse g(·) only has been learnt. Nevertheless, the inversion of function g(·) is not required in order to employ the dual neural system as a generative model, provided an appropriate usage of the look-up table representing g(·) is made. First, it is necessary to produce a realization {x i }, i ∈ {1, . . . , Q}, drawn from the available-generator distribution p x (·) (having, e.g., zero-mean Gaussian or uniform probability density function) ranging in X. About generation of input samples, as they are generated by using an available generator whose range is wider than X, some generated input samples will be necessarily discarded. The amount of discarded input samples may be quantified. Let us denote by P x (·) the cumulative distribution function of the input, namely, The ratio ρ of the number of discarded samples over the total number of generated samples is given by The parameter R x may thus be selected in order to adjust the value of ρ(R x ) to design needs. Then, it is necessary to address the proper values in the learnt look-up

COMPUTER-BASED NUMERICAL EXPERIMENTS
In the following experiments, we consider generating random univariate samples with prescribed density function within prescribed ranges of interest, supposing that a prototype Gaussian random number generator is available. The prototype Gaussian distribution has zero mean and unitary variance. The parameter R x was set to 1 in all the experiments, which corresponds to a ratio ρ ≈ 0.3173 that allows retaining about 70% of the generated input samples. The experiments were run on a 1.86 GHz, 512 MB-RAM platform. Simone

Experiments on a "bi-Gaussian" distribution
The first case of generation of a random variable concerns a "bi-Gaussian" distribution defined by that may assume fairly asymmetric shapes. The numerical results presented below pertain to values σ 1 = 0.3, μ 1 = −0.5, σ 2 = 1, and μ 2 = 0.8. The interval of interest for the output variable is set to Y = [−2. 5 4]. The total number of generated output samples amounts to Q = 68219. The number of points in which the function g(·) is computed is N = 1000. The results obtained by running the learning algorithm (7) are shown in Figure 3. The values of the index Δg n shows that the fixed-point algorithm may be stopped after 5 iterations. In Figure 3, the histogram estimates (with 50 bins) of the generated Gaussian data and of the bi-Gaussian output-obtained with the learnt dual system-may be observed as well.
Cumulative results on repeated independent trials are illustrated. The number of iterations of the algorithm (7) Table 1: Average results about the experiment on bi-Gaussian random number generation; averages computed over 100 independent trials when the algorithm (7)

Experiments on a generalized Gaussian distribution
The second example of random samples generation is about a generalized Gaussian distribution [13]: where sinh (·) denotes the hyperbolic sine function and sech(·) denotes the reciprocal of the hyperbolic cosine function (namely, the hyperbolic secant function) The present generalized Gaussian distribution (GGD) differs by the standard GGD model encountered in literature (see, e.g., [14]). It belongs to the general exponential family of distributions of the type p y (y) ∝ exp (−κ 2 (y)), with κ(·) satisfying appropriate compatibility conditions. The distribution (12) as well as the GGD in [14] belong to the above exponential family. The numerical results presented below pertain to values σ = 1, μ = 0.8, and λ = 0.5. The interval of interest for the output variable is set to Y = [− 3 4]. The total number of generated output samples amounts to Q = 68335. The    number of points in which the function g(·) is computed is N = 1000. The results obtained by running the learning algorithm (7) are shown in Figure 4. The values of the index Δg n show that the fixed-point algorithm may be safely stopped after 5 iterations again. In Figure 4, the histogram estimates (with 50 bins) of the generated Gaussian data and of the generalized Gaussian output may be observed as well.
Cumulative results are illustrated as well. The number of iterations of the algorithm (7) was set to 10, while the other data stayed the same of the previous single-run experiment. The number Nof points ranged from 200 to 1000 with step 200. The average number of generated samples varies between about 68250 and 68290. The obtained results are summarized in Table 3.

Experiments on a Gamma distribution
The third example is repeated from [10]: we considered the generation of a (symmetric) Gamma distribution: This choice is motivated by the observation that the random number generation algorithm in [10] gives rise to the most inaccurate result when tested on the Gamma distribution. The numerical results presented below pertain to values α = 0.8 and β = 4. The interval of interest for the output variable is set to Y = [− 2 2]. The total number of generated output samples amounts to Q = 68355. The number of points in which the function g(·) is computed is N = 1500. The results obtained by running the learning algorithm (7) are shown in Figure 5. The values of the index Δg n show that the fixed-point algorithm may be safely stopped after 8 Computational Intelligence and Neuroscience 5 iterations again. Figure 5 shows the histogram estimates (with 50 bins) of the generated Gaussian data and of the Gamma-distributed output.
Cumulative results were obtained by setting the number of iterations of the algorithm (7) to 20, while the other data stayed the same of the previous single-run experiment. The number Nof points ranged from 1000 to 1800 with step 200. The average number of generated samples varies between about 68230 and 68280. The obtained results are summarized in Table 4.

CONCLUSION
The aim of the present manuscript was to present a novel random number generation technique based on dual neural system learning. We elaborated over our recent work [10] in order to obtain a new learning algorithm free of the need of choosing parameters and normalization-criteria. The main idea is to shift the learning paradigm from the viewpoint of cardinal equation solving to dual cardinal equation solving, which appears to be more easily profitable.
The proposed numerical results confirmed the agreement between the desired and obtained distributions of the generated variate. The analysis of computational burden, in terms of running times, shows that the proposed algorithm is not computationally demanding.