Primordial non-Gaussianity and Bispectrum Measurements in the Cosmic Microwave Background and Large-Scale Structure

The most direct probe of non-Gaussian initial conditions has come from bispectrum measurements of temperature fluctuations in the Cosmic Microwave Background and of the matter and galaxy distribution at large scales. Such bispectrum estimators are expected to continue to provide the best constraints on the non-Gaussian parameters in future observations. We review and compare the theoretical and observational problems, current results and future prospects for the detection of a non-vanishing primordial component in the bispectrum of the Cosmic Microwave Background and large-scale structure, and the relation to specific predictions from different inflationary models.


I. INTRODUCTION
The standard inflationary paradigm predicts a flat Universe perturbed by nearly Gaussian and scale invariant primordial perturbations. These predictions have been verified to a high degree of accuracy by Cosmic Microwave Background (CMB) and Large-Scale Structure (LSS) measurements, such as those provided by the Wilkinson Microwave Anisotropy Probe (WMAP; Komatsu et al., 2009), the 2dF Galaxy Redshift Survey (2dFGRS; Percival et al., 2002) and the Sloan Digital Sky Survey (SDSS; Tegmark et al., 2004). Despite this success, it has proved to be difficult to discriminate between the vast array of inflationary scenarios that have been proposed by high-energy theoretical investigations, or even to rule-out alternatives to inflation.. Since most of the present constraints on the Lagrangian of the inflaton field have been obtained from measurements of the two-point function, or power spectrum, of the primordial fluctuations, a natural step is to extend the available information is to look at non-Gaussian signatures in higher order correlators.
The lowest order additional correlator to take into account is the three-point function or its counterpart in Fourier space, the bispectrum. Every model of inflation is characterized by specific predictions for the bispectrum of the primordial perturbations in the gravitational potential Φ(k). The bispectrum B Φ (k 1 , k 2 , k 3 ) of these perturbations is defined as where we have introduced the notation k i j ≡ k 1 +k 2 so that the Dirac delta function here is δ D (k 123 ) ≡ δ D (k 1 +k 2 +k 3 ).
Together with the assumption of statistical homogeneity and isotropy for the primordial perturbations, this implies that the bispectrum is a function of the triplet defined by the magnitude of the wavenumbers k 1 , k 2 and k 3 forming a closed triangular configuration. The current constraints that we are able to derive on the bispectrum B Φ (k 1 , k 2 , k 3 ) provide additional information about the early Universe; the possible detection of a non-vanishing primordial bispectrum in future observations would represent a major discovery, especially as it is predicted to be negligible by standard inflation.
The cosmological observable most directly related to the initial curvature bispectrum is given by the bispectrum of the CMB temperature fluctuations, which provide a map of the density perturbations at the time of decoupling, the earliest information we have about the Universe. Current measurements of individual triangular configurations of the CMB bispectrum are, however, consistent with zero. Studies of the primordial bispectrum, therefore, are usually characterized by constraints on a single amplitude parameter, denoted by f NL , once a specific model for B Φ is assumed. Since most models predict a curvature bispectrum obeying the hierarchical scaling B Φ (k, k, k) ∼ P 2 Φ (k), with P Φ (k) being the curvature power spectrum, the non-Gaussian parameter roughly quantifies the ratio f NL ∼ B Φ (k, k, k)/P 2 Φ (k), defining the "strength" of the primordial non-Gaussian signal. In addition, we can write where F(k 1 , k 2 , k 3 ) encodes the functional dependence of the primordial bispectrum on the specific triangle configurations. For brevity, the characteristic shape-dependence of a given bispectrum is often referred to simply as the bispectrum shape (a precise definition of the bispectrum shape function will be given in section II.A). Inflationary predictions for both the amplitude f NL and the shape of B Φ that are strongly model-dependent. Notice that the subscript "NL" stands for "nonlinear", since a common phenomenological model for the non-Gaussianity of the initial conditions can be written as a simple nonlinear transformation of a Gaussian field. Generically, of course, non-Gaussianity is associated with nonlinearities, such as nontrivial dynamics during inflation, resonant behaviour at the end of inflation ('preheating'), or nonlinear post-inflationary evolution. At the very least, future CMB and LSS observations are expected to be able to eventually detect the small last contribution. Perturbations in the CMB provide a particularly convenient test of the primordial density field because CMB temperature and polarization anisotropies are small enough to be studied in the linear regime of cosmological perturbations. Once the effects of foregrounds are properly taken into account, a non-vanishing CMB bispectrum at large scales would be a direct consequence of a non-vanishing primordial bispectrum. As we will see, while other CMB probes of primordial non-Gaussianity are available, such as tests of the topological properties of the temperature map based on Minkowski Functionals or measurements of the CMB trispectrum, the estimator for the non-Gaussian parameter f NL has been shown to be optimal. We will focus mostly on this bispectrum estimator in the section of this review dedicated to the CMB.
In the standard cosmological model, the large-scale structure of the Universe, that is, the distribution of matter and galaxies on large scales, is the result of the nonlinear evolution due to gravitational instability of the same initial density perturbations responsible for the CMB anisotropies. This is, perhaps, the most important prediction of the inflationary framework which provides a common origin for the CMB and large-scale structure perturbations as the result of tiny quantum fluctuations stretched over cosmological scales during a phase of accelerated expansion. The large-scale structure we observe at low redshift, however, is characterized by large voids and small regions with very large matter density, and it is therefore a much less direct probe of the initial conditions. The distribution of matter becomes a highly non-Gaussian field precisely as a result of the nonlinear growth of structures, even for Gaussian initial conditions. This non-Gaussianity is expressed, in particular, by a non-vanishing matter bispectrum at any measurable scale, including the largest scales probed by current or future redshift surveys. In this context, the effect of primordial non-Gaussianity, i.e. of an initial component in the curvature bispectrum, will constitute a correction to the galaxy bispectrum. It follows that the possibility of constraining or detecting this initial component is strictly related to our ability to distinguish it from other, primary sources of non-Gaussianity, that is the nonlinear gravitational evolution, and, in the case of galaxy surveys, nonlinear bias.
The study of non-Gaussian initial conditions for large-scale structure has a relatively long history, with important contributions going back to the mid eighties. The standard picture that has been developed over the years, assumed that, at large scales, the effect of primordial non-Gaussianity on the galaxy distribution is simply given in terms of an additional component to the galaxy bispectrum. This is obtained, in perturbation theory, as the linearly evolved and linearly biased initial matter bispectrum, related to the curvature bispectrum B Φ (k 1 , k 2 , k 3 ) by the Poisson equation. Such component becomes subdominant as the gravity-induced non-Gaussian contribution grows in time. In this framework, as one can expect, high-redshift and large-volume galaxy surveys would constitute the best probes of the initial conditions. It has been shown, in fact, that proposed and planned redshift surveys, such as Euclid (Refregier et al., 2010), should be able to provide constraints on the primordial non-Gaussian parameters comparable, if not better, than those expected from CMB missions such as Planck. What is more important, in the event of a detection by Planck, is that confirmation by large-scale structure observations will be required.
Recent results from N-body simulations with non-Gaussian initial conditions, however, have revealed a more complex picture. The effect of primordial non-Gaussianity at large scales is not limited to an additional contribution to the galaxy bispectrum, but it quite dramatically affects the galaxy bias relation itself, that is, the relation between the matter and galaxy distributions. A surprising consequence is that it induces a large correction even for the galaxy power spectrum. Such an effect has attracted considerable recent attention and, remarkably, have placed constraints on the non-Gaussian parameter from current LSS data-sets which already appear to marginally improve on CMB limits. However, from a theoretical point of view, a proper understanding of the phenomenon remains to be properly developed and, for example, reliable predictions for the galaxy bispectrum are not yet available. Most importantly, as for general cosmological parameter estimation, a complete likelihood analysis aimed at constraining, or detecting, primordial non-Gaussianity in large-volume redshift surveys should involve joint measurements of the galaxy power spectrum and bispectrum, as well as possibly higher-order correlation functions. While we are still far from a proper assessment of what such analysis would be able to achieve, current results in this direction are very encouraging.
This review is divided in four parts. In section II we will first discuss initial conditions as defined in terms of the primordial curvature bispectrum and its phenomenology. We will then review the observational consequences of primordial non-Gaussianity on the CMB bispectrum, section III, and on the large-scale structure bispectrum as measured in redshift surveys, section IV. In both cases we will discuss theoretical models for the observed bispectra and technical problems related to the estimation of the non-Gaussian parameters, with the differences that naturally characterize such distinct observables. We also give an example of joint analysis using both CMB and large-scale structure when we consider the possibility of constraining a strongly scale-dependent non-Gaussian parameter f NL (k), emerging in some recently proposed inflationary models.

A. The primordial bispectrum and shape function
The starting point for this discussion is the primordial gravitational potential perturbation Φ(x, t) which was seeded by quantum fluctuations during inflation or by some other mechanism in the very early universe (t t dec ). When characterizing the fluctuations Φ we usually work in Fourier space with the (flat space) transform defined through The primordial power spectrum P Φ (k) of these potential fluctuations is found using an ensemble average, where we have assumed that physical processes creating the fluctuations are statistically isotropic so that only the dependence on the wavenumber remains, k = |k|. Recall that for nearly scale-invariant perturbations, the fluctuation variance on the horizon scale k ≈ H is almost constant ∆ 2 k∼H ≈ k 3 P Φ (k)/2π 2 ≈ const., implying P Φ (k) ∼ k −3 . The primordial bispectrum B Φ (k , k 2 , k 3 ) is found from the Fourier transform of the three-point correlator as Φ(k 1 )Φ(k 2 )Φ(k 3 ) = (2π) 3 δ D (k 123 ) B Φ (k 1 , k 2 , k 3 ) . (II.5) Here, the delta function enforces the triangle condition, that is, the constraint that the wavevectors in Fourier space must close to form a triangle, k 1 + k 2 + k 3 = 0. Examples of such triangles are shown in fig. 1, illustrating the basic squeezed, equilateral and flattened triangles to which we will refer later. Note that a specific triangle can be completely described by the three lengths of its sides and so, in the isotropic case, we are able to describe the bispectrum using only the wavenumbers k 1 , k 2 , k 3 . The triangle condition restricts the allowed wavenumber configurations (k 1 , k 2 , k 3 ) to the interior of the tetrahedron illustrated in fig. 2. The most studied primordial bispectrum is the local model in which contributions from 'squeezed' triangles are dominant, that is, with e.g. k 3 k 1 , k 2 (as illustrated in the left of fig. 1). This is well-motivated physically as it encompasses 'superhorizon' effects during inflation when a large scale mode k 3 (say) which has exited the Hubble radius exerts a nonlinear influence on the subsequent evolution of smaller scale modes k 1 , k 2 . Although this effect is small in single field slow-roll inflation, it can be much larger for multifield models. In a weakly coupled regime, the potential can be split into two components, the linear term Φ L , representing a Gaussian field, giving the usual perturbation results plus a small local non-Gaussian term Φ NL (Salopek & Bond, 1990), Tetrahedral domain for allowed wavenumber configurations k 1 , k 2 , k 3 contributing to the primordial bispectrum B(k 1 , k 2 , k 3 )). A regular tetrahedron is shown satisfying k 1 + k 2 + k 3 ≤ 2k max ≡ 2K.
These considerations lead naturally to the definition of the primordial shape function  S (k 1 , k 2 , k 3 ) ≡ 1 N (k 1 k 2 k 3 ) 2 B Φ (k 1 , k 2 , k 3 ) , (II.12) where N is a normalization factor which is often chosen such that S is unity for the equal k i case, that is, S (k, k, k) = 1 (we shall discuss alternatives to this rather arbitrary convention later). For example, the canonical 'local' model (II.9) has the shape Thus it is usual to describe the primordial bispectrum in terms of an overall amplitude f NL and a transverse twodimensional shape S (k 1 , k 2 , k 3 ) = S (α,β), which incorporates any distinctive momentum dependence. Of course, if there is a non-trivial scale dependence, then the full three-dimensional dependence of S (k 1 , k 2 , k 3 ) on the k i must be retained.
There are other physically well-motivated shapes in the literature which have also been extensively studied. The simplest shape is the constant model S const (k 1 , k 2 , k 3 ) = 1 , (II.14) which, like the local model, has a large-angle analytic solution for the CMB bispectrum (Fergusson & Shellard, 2009). The local model tends to be the benchmark against which all other models are compared and normalized, but for practical purposes the constant model is much more useful, given its regularity at both late and early times. The equilateral shape is another important case with  S equil (k 1 , k 2 , k 3 ) = (k 1 + k 2 − k 3 )(k 2 + k 3 − k 1 )(k 3 + k 1 − k 2 ) k 1 k 2 k 3 . (II.15) While not derived directly from a physical model, it has been chosen phenomenologically as a separable ansatz for higher derivative models (Creminelli, 2003) and DBI inflation (Alishahiha et al., 2004). The equilateral shape is contrasted with the local model in fig. 3. Another important early result was the primordial bispectrum shape for single-field slow roll inflation derived by Acquaviva et al. (2003); Maldacena (2003) S Mald (k 1 , k 2 , k 3 ) ∝ (3 − 2η) k 2 1 k 2 2 + k 2 2 k 2 3 + k 2 3 k 2 1 k 1 k 2 k 3       (6 − 2η)S local (k 1 , k 2 , k 3 ) + 5 3 S equil (k 1 , k 2 , k 3 ) , (II.16) where , η are the usual slow roll parameters. In the second line, we have noted that this shape can be accurately represented as the superposition of local and equilateral shapes. The coefficients in (II.16), which include the scalar spectral index n − 1 = −6 + 2η −0.05, confirm that f NL 1 and so standard single slow roll inflation cannot produce an observationally significant signal. Nevertheless, it is interesting to determine which shape is dominant in (II.16) and to what extent other primordial shapes are independent from one another.
Whether two different primordial shapes can be distinguished observationally can be determined from the correlation between the corresponding two CMB bispectra weighted for the anticipated signal to noise, as in the estimator (see next section) and the Fisher matrix analysis (see section III.H). However, direct calculations of the CMB bispectrum can be very computationally demanding. A much simpler approach is to determine the independence of the two shape functions S and S from the correlation integral (Fergusson & Shellard, 2009, see also ) where we choose the weight function to be ω (k 1 , k 2 , k 3 ) = 1 k 1 + k 2 + k 3 , (II.18) reflecting the primary scaling of the CMB correlator. The shape correlator is then defined bȳ Here, the integral is over the tetrahedral region shown in fig. 2 taken out to a maximum wavenumber k k max corresponding to the experimental range l ≤ max for which forecasts are sought (with max ≈ τ 0 k max ). The weight function ω (k 1 , k 2 , k 3 ) appropriate for mimicking the large-scale structure bispectrum estimator (see section IV.C.2), would be different with varying scaling lawas introduced by the transfer functions for wavenumbers k above and below k eq , the inverse comoving horizon at equal matter-radiation. Nevertheless, the 1/k weight given in (II.18) provides a compromise between these scalings, and so shape correlation results should offer a useful first approximation.
Below we will survey primordial models in the literature, showing how close the shape correlator comes to a full Fisher matrix analysis. However, here we note that the local shape (II.13) and the equilateral shape (III.53) have only a modest 46% correlation. For the natural values of the slow roll parameters ≈ η we find the somewhat surprising result that S Mald is 99.7% correlated with S local (and it cannot be easily tuned otherwise because 3 ≈ η is not consistent with deviations from scale-invariance favored observationally n − 1 < 0). Such strong correspondences are important in defining families of related primordial shapes, thus reducing the number of different cases for which separate observational constraints must be sought.

B. General primordial bispectra and separable mode expansions
The three shape functions (II.13), (II.14) and (III.53) quoted above share the important property of separability, that is, they can be written in the form S (k 1 , k 2 , k 3 ) = X(k 1 ) Y(k 2 ) Z(k 3 ) + 5 perms. , (II.20) or as the sum of just a few such terms. As we shall see, if a shape S is separable, then the computational cost of evaluating the corresponding CMB bispectrum B 1 2 3 is dramatically reduced. In fact, without this property, the task of estimating whether a non-separable bispectrum is consistent with observation appears to be intractable (for large max ). Of course, the number of models which can be expressed directly in the form (II.20) is very limited, despite the usefulness of approximate ansätze such as the equilateral shape (III.53). Indeed, approximating non-separable shapes by educated guesses for for the separable functions X, Y, Z is neither systematic nor computationally efficient (because arbitrary non-scaling functions create numerical difficulties, as we shall explain later). FIG. 4 The one-dimensional tetrahedral polynomials q n (k) on the domain (II.23), rescaled to the unit interval for n = 0-5. Also plotted are the shifted Legendre polynomials P n (2x − 1) (dashed lines) which share qualitative features such as n nodal points.
Instead, we shall present a separable mode expansion approach for efficient calculations with any non-separable bispectrum, as described in detail in Fergusson et al. (2009) (and originally proposed in Fergusson & Shellard, 2007. Our aim will be to express any shape function as an expansion in mode functions where, here, for convenience we have represented the symmetrized products of the separable basis functions q p (k) as with a one-to-one mapping ordering the products as n ↔ {prs}. The important point is that the q p (k) must be an independent set of well-behaved basis functions which can be used to construct complete and orthogonal three-dimensional eigenfunctions on the tetrahedral region V T defined by (see fig. 2) The introduction of the cut-off at k max is motivated both by separability and the correspondence with the observational domain l ≤ max . In the shape correlator (II.19), we have already seen what is essentially an inner product between two shapes on this tetrahedral region, which we can define for two functions f, g as with weight function w. Satisfactory convergence for known bispectra can be found by using simple polynomials q p (k) in the expansion (II.21), that is, using analogues of Legendre polynomials on the domain (II.23). With unit weight, the polynomials satisfying q p (k 1 ), q r (k 1 ) = δ pr can be found by generating functions with the first three given by (Fergusson et al., 2009) The first few polynomials q p (k) are plotted in fig. 4, where they are contrasted with Legendre polynomials. The three-dimensional separable basis functions Q n in (II.22) reflect the six symmetries of the bispectrum through the permuted sum of the product terms. They could have been constructed directly from simpler polynomials, such as FIG. 5 Orthonormal eigenmode decomposition coefficients (II.26) for the equilateral and DBI models (upper panel) and shape correlations (II.19) of the original bispectrum against the partial sum up to a given mode n (lower panel). The correlation plot includes both primordial and late-time CMB bispectra for the equilateral and DBI models, as well as the late-time CMB bispectrum from cosmic strings (refer to section III). In all cases, we find that we need at most 15 three-dimensional modes to obtain a correlation greater than 98% (primordial convergence without the acoustic peaks requires only 6 modes).
1, k 1 + k 2 + k 3 , k 2 1 + k 2 2 + k 2 3 , ..., however, the q p polynomials have two distinct advantages. First, the q p 's confer partial orthogonality on the Q n and, secondly, these remain well-behaved when convolved with transfer functions.
In order to rapidly decompose an arbitrary shape function S into the coefficients α Q n ↔ α Q prs , it is more convenient to work in a non-separable orthonormal basis R n ( R n , R m = δ nm . These can be derived directly from the Q n through Gram-Schmidt orthogonalization, so that R n = n p=0 λ mp Q p with λ mp a lower triangular matrix (see Fergusson et al., 2009). Thus we can find the unique shape function decomposition In the orthonormal R n frame, Parseval's theorem ensures that the autocorrelator is simply S , S = n α R n 2 . Hence, with a simple and efficient prescription we can construct separable and complete basis functions on the tetrahedral domain (II.23) providing rapidly convergent expansions for any well-behaved shape function S . These eigenmode expansions will prove to be of great utility in subsequent sections. Examples of this bispectral decomposition and its rapid convergence for the equilateral and DBI models are shown in fig. 5.

C. Families of primordial models and their correlations
We will now briefly survey the main categories of primordial models in the literature and their relative independence, closely following the discussion in Fergusson & Shellard (2009).

The constant model
The constant model (II.14) is the simplest possible primordial shape with triangles of every configuration contributing equally to the bispectrum B(k 1 , k 2 , k 3 ); it is the equipartition model. The constant model was motivated initially by its simplicity (Fergusson & Shellard, 2009) leading to an analytic solution for the large-angle CMB bispectrum, as well as due to its close correlation with equilateral models. However, the shape does have more explicit physical motivation in at least one context (Chen & Wang, 2009), during multifield inflation for a slowly turning trajectory (denoted quasi-single field inflation). For multifield inflation, it is well known that the conversion of isocurvature fluctuations into curvature fluctuations during 'corner-turning' can source significant non-Gaussianity (see e.g. Rigopoulos et al., 2006a;Vernizzi & Wands, 2006). In the quasi-single field case with mass m ∼ H isocurvature modes, a detailed investigation of the ongoing conversion into the curvature mode demonstrated that novel shapes could be generated (Chen & Wang, 2009), amongst them shapes which were very nearly constant. Generically, these model-dependent shapes belonged to a one-parameter family which interpolated non-trivially between equilateral (III.53) and local (II.13) shapes (see also Renaux-Petel, 2009;Senatore et al., 2009). This is an important caveat for the present discussion, because non-Gaussian searches could uncover shapes intermediate between the categories we will discuss below.

Equilateral triangles -centre-weighted models
Bispectra dominated by contributions from nearly equilateral triangle configurations, k 1 ≈ k 2 ≈ k 3 can be fairly easily characterized analytically and are the most amenable to CMB searches. However, equilateral non-Gaussianity requires the amplification of nonlinear effects around the time modes exit the horizon, which is not possible in a slowroll single field inflation. Instead, the kinetic terms in the effective action must be modified as in the Dirac-Born-Infeld (DBI) model (Alishahiha et al., 2004) or by explicitly adding higher derivative terms, such as in K-inflation (see, for example, Chen et al., 2007). The resulting corrections modify the sound speed c s and inflation is able to take place in steep potentials. For DBI inflation, this leads to non-Gaussianity being produced with a shape function of the form (Alishahiha et al., 2004;Creminelli, 2003) (II.27) Another example of a model with non-standard kinetic terms is ghost inflation (Arkani-Hamed et al., 2004) with a derivatively-coupled field driving inflation and a trilinear term in the Lagrangian creating a non-zero equilateral-type shape S ghost tending towards constant. General non-Gaussian shapes arising from modifications to single field inflation have been extensively reviewed in Chen et al. (2007). Using a Lagrangian that was an arbitrary function of the field and its first derivative, they were able to identify six distinct shapes describing the possible non-Gaussian contributions. Half of these had negligible  amplitude being of the order of slow-roll parameters (with two already given in (II.16)). Of the remaining three shapes (Chen et al., 2007, see also Seery & Lidsey, 2005), one was believed to be subdominant, the second recovered the DBI shape (II.27), leaving a third distinct single field shape which is the inverse of the local shape (II.13), S single ∝ S local −1 . Finally, we recall the original equilateral shape (III.53), noting that it was introduced not because of a fundamental physical motivation, but as a separable approximation to the DBI shape (II.27) . Despite the apparent visual differences between these shapes (see Fergusson & Shellard, 2009), particularly near the edges of the tetrahedral domain, the shape correlator (II.19) reveals at least a 95% or greater correlation of the DBI, ghost and single shapes to the equilateral shape (III.53) (consistent with results in Smith & Zaldarriaga, 2006) . Comparative results between the shape correlator are given in Table I (together with the corresponding CMB correlation results brought forward and showing the efficacy of these estiomates). These particular centre-weighted shapes must be regarded as a single class which would-be extremely differentiate observationally, without a bispectrum detection of very high significance.
Finally, we comment on the 'orthogonal' shape S orthog proposed in Senatore et al. (2009), together with S equil , for characterizing single field inflation models with an approximate shift symmetry (see also Chen et al., 2007). This shape is approximately S orthog ∝ S equil − 2/3, which means it is very similar to an earlier study of flattened shapes (Meerburg et al., 2009) which proposed an 'enfolded' shape with S enfold ∝ S equil − 1. From the eigenmode decomposition (II.26) of the equilateral model shown in fig. 5, it is clear how the degree of correlation can be altered by subtracting out the important constant term.

Squeezed triangles -corner-weighted models
The local shape covers a wide range of models where the non-Gaussianity is produced by local interactions. These models have their peak signal in "squeezed" states where one k i is much smaller than the other two due to non-Gaussianity typically being produced on superhorizon scales. We have already observed that single-field slow-roll inflation (II.16) is dominated by the local shape (Bartolo et al., 2004a), though f loc. NL is tiny (Acquaviva et al., 2003;Bartolo et al., 2004a;Maldacena, 2003;?;?). The production of non-Gaussianity during multiple field inflation (Bernardeau & Uzan, 2003, 2002Lyth & Rodriguez, 2005;Rigopoulos et al., 2006aRigopoulos et al., ,b, 2007Seery & Lidsey, 2005;Vernizzi & Wands, 2006) shows much greater promise through conversion of isocurvature into adiabatic perturbations (see, for example, recent work in Byrnes et al., 2008;Chen & Wang, 2009;Naruko & Sasaki, 2009;Renaux-Petel, 2009, and references therein). The magnitude of the non-Gaussianity generated is normally around f loc. NL ≈ O(1), which is at the limit for Planck detection, but models can be tuned to create larger signals. Significant f loc. NL can be produced in curvaton models with f loc. NL ≈ O(100) (Bartolo et al., 2004b;Linde & Mukhanov, 2006;Lyth et al., 2003). Large f loc. NL can also be generated at the end of inflation from massless preheating or other reheating mechanisms (Chambers & Rajantie, 2008;Enqvist et al., 2005a,b).
We note that local non-Gaussianity can also be created in more exotic scenarios. Models based on non-local field theory, such as p-adic inflation, can have inflation in very steep potentials. Like single field slow-roll inflation, the predicted 'non-local' shape function is a combination of a dominant local shape (II.13) and an equilateral shape (III.53) (see, for example, Barnaby & Cline, 2008;Halliwell, 1993;Seery et al., 2008;Zaldarriaga, 2004). The ekpyrotic model can also generate significant f loc.
NL (Buchbinder et al., 2008;Koyama et al., 2007;Lehners & Steinhardt, 2008a,b). Here the density perturbations are generated by a scalar field rolling in a negative exponential potential, so non-linear interactions are important with f loc. NL ≈ O(100).
FIG. 6 Shape functions for the nearly scale-invariant 'warm' and 'flat' NG models, S (k 1 , k 2 , k 3 ) = S (α,β) on transverse slices with 2k = k 1 + k 2 + k 3 = const. These distinct and independent shapes prove to be largely uncorrelated with each other and the local and equilateral models illustrated in fig.Ḟrom Fergusson & Shellard (2009). In using the shape correlator for the local model, we must introduce a small-wavenumber cut-off, taken to be a k min = 2/tau 0 , otherwise the shape correlatorC(S local , S local ) becomes infinite. This logarithmic divergence does not afflict the CMB bispectrum b l 1 l 2 l 3 because we do not consider contributions below the quadrupole l = 2 (a threshold which is approximated by the primordial cut-off). The local shape is modestly correlated at the 40-55% level with the equilateral shapes, mainly through the constant term in the expansion (II.26). As can be seen in table II this somewhat underestimates the CMB correlator. Nevertheless, a NG signal of only modest significance should be able to distinguish between these independent models.
Finally, warm inflation scenarios, i.e. models in which dissipative effects play a dynamical role, are also predicted to produce significant non-Gaussianity (Gangui et al., 1994;Gupta et al., 2002). Contributions are again dominated by squeezed configurations but with a different more complex shape possessing a sign flip as the corner is approached (see fig. 6). This makes the warm S warm and local S local shapes essentially orthogonal with only a 33% correlation. Again, in using the shape correlator, we need to introduce the same phenomenological cut-off k min as for the local model, but we also note the more serious concern which is the apparent breakdown of the approximations used to calculate the warm inflation shape near the corners and edges.

Flattened triangles -edge-weighted models
It is possible to consider inflationary vacuum states which are more general than the Bunch-Davies vacuum, such as an excited Gaussian (and Hadamard) state (Holman & Tolley, 2008, see also discussions in Chen et al. 2007;Meerburg et al. 2009). Observations of non-Gaussianity in this case might provide insight into trans-Planckian physics. The proposed shape for the bispectrum is (II.28) The bispectrum contribution from early times is dominated by flattened triangles, with e.g. k 3 ≈ k 1 + k 2 , and for a small sound speed c s 1 can be large. Unfortunately, as the divergent analytic approximation breaks down at the boundary of the allowed tetrahedron, some form of cut-off must be imposed, as shown for the smoothed shape in fig. 6 where an edge truncation has been imposed together with a Gaussian filter. The lack of compelling physical motivation and ill-defined asymptotics make predictions for this model uncertain.

Features -scale-dependent models
There are also models in which the inflation potential has a feature, providing a break from scale-invariance. This can take the form of a either a step (Chen et al., 2006) or a small oscillation superimposed onto the potential (Bean et al., 2008). Analytic forms for both these three point functions have been presented in Chen et al. (2008) with one approximation taking the form, where k * is the associated scale of the feature in question and P is a phase factor. Results for the shape correlator for a particular feature model (with k * ≈ * /τ 0 and * = 50), are given in table II, showing that it is essentially independent of all the other shapes. Clearly, scale dependent feature models form a distinct fifth category beyond equilateral, local, warm and flat shapes.

A. The CMB bispectrum
In this section we will study the connection between the primordial bispectrum at the end of inflation and the observed bispectrum of CMB anisotropies B 1 2 3 . Our work will be primarily concerned with the analysis of the three-point function induced by a NG primordial gravitational potential Φ(k) in the CMB temperature fluctuation field. Temperature anisotropies are represented using the a m coefficients of a spherical harmonic decomposition of the cosmic microwave sky, Analogous expansions are performed for the E mode polarization field in order to produce polarization multipoles a E m . For simplicity and clarity, throughout most of this review we will focus on the temperature multipoles a T m , and omit the superscript T for convenience of notation. However we stress here that all of the considerations we make in the following can be readily applied to polarization multipoles and related bispectra. More discussion about this subject can be found in section III.H.2.
The primordial potential Φ is imprinted on the CMB multipoles a lm by a convolution with transfer functions ∆ l (k) representing the linear perturbation evolution, through the integral: The radiation transfer functions encode all the typical effects observed in the CMB power spectrum at linear order, that is, the Sachs-Wolfe effect, Integrated Sachs-Wolfe effect, doppler peaks and silk damping (see e.g. Dodelson, 2003;Ma & Bertschinger, 1995). An equation identical to (III.30) produces the E-mode polarization CMB multipoles starting from the primordial temperature fluctuation field, provided polarization transfer functions replace temperature transfer functions in the convolution above. It is sometimes useful to rewrite equation (III.30) in position, rather than Fourier, space. In this case it is straightforward to show that (III.30 becomes where, starting from the primordial potential Φ(x), we transform from Cartesian into polar coordinates x = (r,x), and defined: In this expression j is the spherical Bessel function of order . The CMB bispectrum is the three point correlator of the a m , so substituting we obtain B 1 2 3 m 1 m 2 m 3 = a 1 m 1 a 2 m 2 a 3 m 3 (III.34) where in the last line we have integrated over the angular parts of the three k i , having inserted the exponential integral form for the delta function in the bispectrum definition (II.5). The last integral over the angular part of x is known as the Gaunt integral, which can be expressed in terms of Wigner-3 j symbols as (for more details on these functions and their properties (see e.g. Komatsu, 2002, and references therein) Given that most theories we shall consider are assumed to be isotropic, the m-dependence can be factorized out of the physically relevant part of the bispectrum (Luo, 1994). It is then usual to work with the angle-averaged bispectrum, or the even more convenient reduced bispectrum which removes the geometric factors associated with the Gaunt integral, From the previous two formulae we also derive the following useful relations between the full, averaged and reduced bispectra: The reduced bispectrum from (III.34) then takes the much simpler form This is the key equation in this section, since it explicitly relates the primordial bispectrum, predicted by inflationary theories, to the reduced bispectrum observed in the cosmic microwave sky. This formula is entirely analogous to the well known relation linking the primordial curvature power spectrum P Φ (k) and the CMB angular power spectrum C , that is, Finally, it is important to note that the Gaunt integral in (III.41) encodes several constraints on the angle averaged bispectrum B 1 2 3 which are no longer transparent in the reduced bispectrum b 1 2 3 . These are: 1. The sum of the three multipoles i must be even.

The i 's satisfy the triangle condition
Analogous to the wavenumber constraint (II.23), the second condition is tells us that the only multipole configurations giving non-zero contributions to the bispectrum are those that form a closed triangle in harmonic ( -)space. For wavenumbers, the triangle condition is enforced through the x-integral over the three spherical Bessel functions j (k i x) which evaluates to zero if the k i 's cannot form a triangle, whereas in multipole space it is enforced by the angular integration dΩ x over the spherical harmonics Y i m i in (III.39).

B. Separable primordial shapes and CMB bispectrum solutions
In terms of the shape function (II.12), the reduced bispectrum (III.43) can be rewritten as The expression above can be simplified and simple analytic solutions can sometimes be obtained for the very important class of separable shapes obeying the ansatz S = XYZ, as in (II.20). Substituting (II.20) into (III.45), we find that where we have defined the quantities: Instead of the three-dimensional integral of (III.45) we now have to deal with a much more tractable product of three one-dimensional integrals. Moreover, if we work at large angular scales in the Sachs-Wolfe approximation, the transfer functions become ∆ l (k) = 1 3 j l [(τ o − τ dec ) k]. The presence of a product of spherical Bessel functions in the integrals above can lead in some cases to simple analytic solutions.
Let us demonstrate this for the separable primordial shapes considered in section II. The simplest possible shape, the constant model (II.14) with S (k 1 , k 2 , k 3 ) = 1, has a large-angle analytic solution for the reduced bispectrum (Fergusson & Shellard, 2009) The large-angle solution (III.48) is an important benchmark with which to compare the shape of late-time CMB bispectra from other models b 1 2 3 (note the l −4 scaling). The more general constant solution does not have an analytic solution because the transfer functions cannot be expressed in a simple form, but it can be evaluated numerically from the expression The numerical solution is shown in fig. 7, exhibiting the a regular pattern of acoustic peaks introduced by the oscillating transfer functions. For the local shape (II.13), the Sachs-Wolfe approximation also yields a large-angle analytic solution b local where the divergences for the squeezed triangles (k 1 k 2 , k 3 ...) in the primordial shape (II.13) are also reflected in b local 1 2 3 . It is straightforward, in principle, to calculate the full bispectrum from the separable expressions arising from (II.13), b local 1 2 3 = x 2 dx α 1 (x)β 2 (x)β 3 (x) + 2 perm. , (III.51) where the separated integrals analogous to (III.49) become However, we note that these highly oscillatory integrals must be evaluated numerically with considerable care. For the equilateral shape (III.53) we first make its separability explicit by expanding the expression in the form, While there is no simple large-angle analytic solution known for the equilateral model, it can be evaluated from the simplified expression b equil 1 2 3 = x 2 dx 2δ 1 δ 2 δ 3 + α 1 β 2 β 3 + 2 perm. + β 1 γ 2 δ 3 + 5 perm. , (III.54) where α l , β l are given in (III.52) and γ l , δ l are defined by (compare with the local case) (III.55)

C. Non-separable bispectra revisited
Recall the mode expansion (II.21) of a general non-separable primordial shape. If we substitute this into the expression for the reduced bispectrum (III.45), then the separability of the expansion leads to the same efficient calculation route discussed in the previous section through (Fergusson et al., 2009) where the q p simply result from convolving the basis functions q p (k) with the transfer functions, The computationally costly 3D integrals have again reduced to a sum over products of 1D integrals; we note that this economy arises because the triangle condition is enforced in (III.56) through the product of Bessel functions, resulting in a manifestly separable form in which we can interchange orders of integration. With this mode expansion, all nonseparable theoretical CMB bispectra b 1 2 3 become efficiently calculable provided there is a convergent expansion for the shape function.
In the same way that we decomposed an arbitrary primordial shape S (k 1 , k 2 , k 3 ) in section II.B, it is possible to construct analogous late-time separable basis functions Q n and orthonormal modes R n with which to describe the CMB bispectrum β 1 2 3 (Fergusson et al., 2009;Fergusson & Shellard, 2007). The tetrahedral domain V T defined by the triangle condition for multipole configurations { 1 , 2 , 3 } is essentially identical to that for wavenumbers (II.23), except that only even cases contribute 1 + 2 + 3 = 2n, n ∈ N. However, the appropriate weight function now incorporates Wigner-3 j symbols arising from bispectrum products, w 1 2 3 = 1 4π (2 1 + 1)(2 2 + 1)(2 3 + 1) l 1 l 2 l 3 0 0 0 2 and w s 1 2 3 = where in the second expression we have exploited the freedom to divide by a separable function v = (2 + 1) 1/6 and use a weight which makes the bispectrum functions more scale-invariant (eliminating an −1/2 factor -see below). The inner product between two functions f 1 2 3 and g 1 2 3 is altered from the primordial wavenumber integral (??) into a sum over multipoles on the tetrahedral domain, that is, But for the change in the weight (which only affects configurations near the edges of the tetrahedron), the 1D polynomialsq p ( ) and 3D separable product basis functions Q n ( 1 , 2 , 3 ) =q {p q r q s} (n ↔ {prs}), as well as the resulting orthonormal modes R n , are nearly identical to their primordial counterparts q p (k), Q n (k 1 , k 2 , k 3 ) and R n (k 1 , k 2 , k 3 ) defined in section II.B. We can now expand an arbitrary CMB bispectrum b 1 2 3 in both the separable and orthonormal mode expansions, which is achieved in the following form, where the variance term √ C C C reflects the signal-to-noise weighting expected in the CMB estimator (see section III.E). Again, the coefficients in the expansions are determined, first, from the orthonormal inner products α R n = R n , · and, secondly, the separableᾱ Q n are found with the transformation matrix analogous to (II.26). Examples of the convergence of these mode expansions for equilateral, DBI and cosmic string CMB bispectra are given in fig. 5.

D. CMB bispectrum calculations and correlations
Prior to the systematic mode expansion approach (III.56) being implemented, robust hierarchical schemes were developed to calculate any non-separable CMB bispectrum (III.45) directly (Fergusson & Shellard, 2007). These use the transverse coordinate system (k,α,β) given in (II.11) and employ adaptive methods on a triangular grid to accurately determine the oscillatory 2D αβ-integrations, with important efficiencies also coming from the flat sky approximation, binning and interpolation schemes. Precision to greater than 1% across the full Planck domain ≤ 2000 was established by direct comparison with analytic solutions such as (III.48) and (III.50). Examples of nonseparable (and separable) CMB bispectra found using these hierarchical coarse-graining methods are shown in figs 7 and 8. While the CMB bispectra b 1 2 3 retain the qualitative features of the primordial shape functions S (k 1 , k 2 , k 3 ), they are overlaid with the oscillatory transfer functions which give rise to a coherent pattern of acoustic peaks. These direct bispectrum calculations revealed that typical primordial models could be described by eigenmode or other expansions using only a limited number of terms.
Motivated by the form of the CMB estimator, we can define the following correlator to determine whether or not two competing theoretical bispectra can be distinguished by an ideal experiment, where the normalization N is defined as The emergence of the inner product (III.59) in the expression (III.61) means that substitution of the mode expansions (III.60) for the theoretical bispectra reduces the correlator to While the late time correlator (III.61) is the best measure of whether two CMB bispectra are truly independent, it can be demonstrated that for the majority of models the shape correlator (II.19) introduced earlier is sufficient to determine independence. On the basis of the direct calculation of the bispectrum results and the CMB correlator, we can now quantitatively check the forecasting accuracy of the primordial shape correlator proposed previously (again closely following the discussion in Fergusson & Shellard (2009)).

Nearly scale-invariant models
For nearly scale-invariant models, the centre values for the bispectrum b lll all have roughly the same profile but with different normalisations. As we see from (??), the oscillatory properties of the transfer functions for the CMB power spectrum, create a series of acoustic peaks for any combinations involving the following multipole values, l = 200, 500, 800, .... Of course, to observe the key differences between the scale invariant models we must study the bispectrum in the plane orthogonal to the (l, l, l)-direction, that is, the directions reflecting changes in the primordial shape functions. To plot the bispectrum (see figs 7 and 8), we consistently divide b lll by the large-angle CMB bispectrum solution for the constant model (II.14). This is analogous to multiplying the power spectrum C l 's by l(l + 1), because it serves to remove the overall −4 scaling of the bispectrum, flattening while preserving the transverse momentum-dependence primordial shape and the effects of the oscillating transfer functions.
The starting point is the constant model (II.14) which, despite its apparent simplicity, has a CMB bispectrum b const 1 2 3 revealing a non-trivial and coherent pattern of acoustic peaks that we have already noted (see fig. 7). Given that the constant model has no momentum dependence, we stress that the resulting bispectrum is the three-dimensional analogue of the angular power spectrum ( +1)C for a scale-invariant model. The largest (primary) peak, for example, is located where all three i = 220 (corresponding to the large blue region near the origin). We can interpret fig. 7, therefore, as the pure window function or beam effect of convolving any model with the radiation transfer functions ∆ (k) while transforming from Fourier to harmonic space. The CMB bispectrum for the equilateral model is plotted in fig. 8, showing how the the centre-weighting from the primordial shape is well-preserved despite the convolution with the oscillating transfer functions. For the full CMB correlator (III.61), the DBI, ghost and single shapes are generally even more closely correlated with equilateral, presumably because distinctive features are 'washed out' by the transformation from Fourier to harmonic space. Comparative results between the shape correlator and Fisher matrix analysis are given in Table I, establishing that these models are highly correlated and difficult to set apart observationally.
The CMB bispectrum for the local model is also shown in fig. 8, demonstrating a marked contrast with equilateral which reflects their different primordial shapes shown in fig. 3. The dominance of the signal in the squeezed limit creates strong parallel ridges of acoustic peaks which connect up and emanate along the corner edges of the tetrahedron(see Bucher et al., 2009, for further details). The 51% CMB correlation between the local and equilateral models is underestimated by the shape correlator at 41%, presumably because of effective smoothing due to the harmonic analysis. Reflecting their distinctive primordial properties, the CMB bispectra for the flat and warm models are poorly correlated with most of the other models, though the flat shape could be susceptible to confusion with the local CMB bispectrum with which it has a larger correlation (see table II). It is clear that the local, equilateral, warm and flat shapes form four distinguishable categories among the scale-invariant models.

Scale-dependent models, cosmic strings and other late-time phenomena
Models which have a non-trivial scaling, such as the feature models, can have starkly constrasting bispectra as illustrated in fig. ??. For example, instead of having the same pattern of acoustic peaks which characterise the scaleinvariant models, the feature model can become entirely anticorrelated so that the primary peak has the opposite sign. Later, for this particular choice of k * in (II.29)), for increasing l the phase of the oscillations becomes positively correlated by the second and third peaks. This can lead to small correlation with the other primoridial shapes, all below 45% as shown in table II for this k * and max . Clearly, these non-separable feature models form a distinct fifth category beyond the four scale-invariant shapes noted above and, of course, there are many possible model dependencies which can lead to further subdivision.
By way of further illustration of the breadth of other possible non separable CMB bispectra, we present the late-time CMB bispectrum predicted analytically for cosmic strings (Regan & Shellard, 2009 where min = min( 1 , 2 , 3 ), * = min(500, min ), ζ = min(1/500, 1/ min ) and L = ζ 1 2 ( 2 1 2 2 + 2 2 2 3 + 2 3 2 1 ) − 1 4 ( 4 1 + 4 2 + 4 3 ) . (III.65) Here, A ∼ (8πGµ) 3 is a model dependent amplitude with Gµ = µ/m 2 Pl measuring the string tension µ relative to the Planck scale. The cutoffs around ≈ 500 in (III.64) are associated with the string correlation length at decoupling (perturbations with 500 can only be causally seeded after last scattering). Here, the non-separable nature and very FIG. 8 The reduced CMB bispectra for several non-Gaussian models, including (top panels, left to right) equilateral, local, flattened models and (bottom panels) warm, feature, cosmic string models (see main text). All five primordial models are normalised relative to the constant solution (III.48) and are taken from Fergusson & Shellard (2009) different scaling of the string CMB bispectrum are clear from a comparison with (III.50). Moreover, given the latetime origin of this signal from string metric perturbations, the modulating effect of acoustic peaks from the transfer functions is absent, as is clear from fig. 8. This is just one example of late-time phenomena such as gravitational lensing, secondary anisotropies and contaminants which are accessible to analysis using the more general CMB mode expansions (III.60).

E. The estimation of f NL from CMB bispectra
In light of the previous discussion, it is evident how measurements of the bispectrum from CMB experimental data-sets are able to provide information about the primordial three point function of the cosmological curvature perturbation field at the end of inflation. This in turn allows us to put significant constraints on inflationary models, or on alternative models for the generation of cosmological perturbations. We will now start dealing with the problem of bispectrum estimation in the CMB, as a test of primordial non-Gaussianity.
Let us assume that we have measured the three point function of a given CMB dataset. There are now two general ways to exploit this information: 1. Tests of the Gaussian hypothesis. By comparing the measured three point function to its expected distribution obtained from Gaussian simulations we can detect if some configurations present a significant deviation from Gaussian expectations. The issue with this approach is that it is sensitive not only to primordial non-Gaussianity, but also to any other possible source of NG, including those of non-cosmological origin. Original bispectrum tests of this kind on COBE maps (Ferreira et al., 1998) revealed significant deviations from Gaussianity in the data. This NG signature in the three point function seemed to be localized in harmonic space around multipoles = 16 and was object of much scrutiny (e.g. Bromley & Tegmark, 1999;Magueijo, 2000a,b;Magueijo et al., 1999Magueijo et al., , 2000. It was then finally ascertained that the detected signal was not cosmological in origin, but due to a systematic artifact (Banday et al., 2000). Moreover, the overall statistical significance of the result disappeared in a later analyisis involving the measurement of all the bispectrum modes available in the map (Komatsu et al., 2002) (only a subset of all the configurations had been studied before). General tests of Gaussianity are very useful to identify unexpected effects in the data, and to monitor systematics. However, as long as we are interested in a primordial NG signal, it is better to follow the approach of making an ansatz for the bispectrum we expect from the theory under study and obtain a quantitative constraint on a given model. This approach is outlined in the point that follows.
2. f NL estimation. In this case we choose the primordial model that we want to test, characterizing it through its bispectrum shape. We then estimate the corresponding amplitude f model NL from the data. If the final estimate is consistent wih f model NL = 0, we conclude that no siginificant detection of the given shape is produced by the data, but we still determine important constraints on the allowed range of f model NL . Note that ideally we would like to do more than just constrain the overall amplitude, and reconstruct the entire shape from the data by measuring single configurations of the bispectrum. However, the expected primordial signal is too small to allow the signal from a single bispectrum triangle to emerge over the noise. For this reason we study the cumulative signal from all the configurations that are sensitive to f model NL .
Since in this review we are concerned with the study of the primordial bispectrum, we will take the latter approach, and deal with the problem of f NL estimation from measurements of the bispectrum in CMB maps. We will first present a cubic estimator that optimally extracts the f NL information from the data contained in the bispectrum (section III.E.1). We will then address the issue of understanding if this optimal cubic statistic extracts all the possible information available on f NL in the data, or if there is enough additional information beyond the three point function to allow more precise f NL measurements using non bispectrum-based estimators of f NL (section III.E.2). We will then discuss concrete numerical implementations of bispectrum estimators (section III.F) and review the experimental constraints on f NL obtained from bispectrum analysis of WMAP data (section III.G). Using a standard Fisher matrix analysis, forecasts on the f NL error bars are achievable for future CMB surveys (section III.H). Following, we will study the NG signals in the map that could contaminate the primordial NG measurement and how they are dealt with when analyzing the data. Finally we will describe algorithms for the simulation of primordial NG CMB maps that are useful for testing and validation of estimators before applying them to real data.
In the following, we assume that the reader is familiar with essential concepts in statistical estimation theory, such as the definition of a statistical estimator, the role played by maximum likelihood estimators in statistics, the definitions of unbiasedness and optimality, and the definition and main applications of the Fisher information matrix. The reader unfamiliar with these concepts can consult the appendix of this review and references therein.
1. Bispectrum estimator of f NL In this section we are concerned with the statistical inference of f NL from measurements of the bispectrum of the CMB anisotropies. We recall that we defined f NL earlier as the amplitude of the bispectrum of the primordial potential. In principle, we can include both temperature and polarization multipoles a T,E m in the analysis, in order to maximize the available data. However, for clarity we will consider only temperature multipoles in the following, and omit the superscript T in a m , for simplicity of notation. The extension to polarization is conceptually straightforward, and will be discussed in a following paragraph. We will start by considering a simple cubic 1 statistic written in the form: In the previous equationf NL represents the statistical estimate of f NL from the data, a m are the multipoles of the observed CMB temperature fluctuations, W m 1 m 2 m 3 1 2 3 are some weight functions, and N is a normalization factor that has to be chosen to make the estimator unbiased i.e. to ensure that: (III.67) We now want to find the weights W m 1 m 2 m 3 1 2 3 that provide the best estimator (i.e. the minimum error bar estimator) within the class of cubic statistics written in the form (III.66). It is a well known result (see Appendix A) that the best unbiased estimator of a parameter from a given data-set is the maximum-likelihood estimator. In order to answer our question we then have to write the bispectrum likelihood as a function of the parameter f NL , and maximize with respect to f NL .
In the assumption that the bispectrum configurations are characterized by a Gaussian distribution 2 , maximizing the likelihood is equivalent to minimizing the following χ 2 : where B obs 1 2 3 is the observed angular averaged bispectrum i.e. , by definition: and σ 2 is the bispectrum variance i.e. the a m six-point function: σ 2 = a 1 m 1 a 2 m 2 a 3 m 3 a 4 m 4 a 5 m 5 a 6 m 6 . (III.70) We will now make the assumption that we are working in the weak non-Gaussian limit i.e. f NL is small and the distribution of a m can be approximated as Gaussian in the calculation of the variance. The implications of this approximation will be discussed in greater detail in the following sections; for the moment it will suffice to point out that the weak non-Gaussian approximation is generally a good one since most inflationary models predict f NL to be small, and because the level of primordial non-Gaussianity is already constrained to be small by WMAP measurements (Komatsu et al., 2009;. After restricting indeces so that 1 ≤ 2 ≤ 3 and 4 ≤ 5 ≤ 6 , the six point function above can be calculated using Wick's theorem, yielding (Luo, 1994): In the last formula ∆ is a permutation factor that takes the value of 1 when all 's are different, 2 when two 's are equal and 6 when all 's are equal. We can now substitute (III.71) into (III.68), and differentiate with respect to f NL to get an explicit expression for the optimal cubic statistic we were looking for: where b 1 2 3 is the reduced bispectrum and G m 1 m 2 m 3 1 2 3 is the Gaunt integral defined by equation (III.39); N is the normalization factor mentioned at the beginning of the paragraph, that guarantees the unbiasedness of the estimator.
Note that the noise and window function of the experiment are included in the C l and b 1 2 3 that appear in the formula above, with the following replacements: W is the window function (not to be confused with the weights W) and N is the noise power spectrum (constant for uncorrelated white noise). The noise is assumed to be Gaussian, thus characterized by a vanishing three point function.
Comparing our result (III.72) to the initial ansatz (III.66), we then see that the optimal weights are: in other words we are weighting the observed bispectrum by its expected signal-to-noise ratio.
We have now constructed a statistic that optimally extracts the information about f NL from the bispectrum of the map. The question now is: is there additional information about f NL in the map that is not contained in the bispectrum? This issue will be investigated in the following sections. For the impatient reader we anticipate that the answer is no: the bispectrum statistic built here is actually the minimum error bar estimator of f NL from CMB data.

Optimality of the cubic estimator
In this section we address the issue of whether the cubic statistic (III.72) optimally extracts all the f NL information contained in the a m , or if other statistical estimators (e.g. four-point function, or pixel space statistics such as the Minkowski functionals, or again wavelet estimators, just to make a few among many possible examples) are able to produce smaller error bars and are thus more efficient than the bispectrum.
In a non-Gaussian primordial CMB map, the a m likelihood depends on the NG parameter f NL . We will indicate it with p(a| f NL ), where a indicates a vector including all the a m 's (we will assume all other cosmological parameters are fixed, and concentrate on f NL ). It is a well known result in parameter estimation theory that there is a lower limit on the error bars that can be assigned to a given parameter (in our case f NL ). Such lower limit, also known as the Rao-Cramer bound, is defined in term of the Fisher matrix F as 3 : We remind the reader that the Fisher matrix is defined as: If we can show that the bispectrum estimator of the previous section saturates the Rao-Cramer bound for the a m Fisher matrix above, then we conclude that it provides the best (i.e. minimum variance) estimate of f NL from the data, rather than just the best f NL estimate from the bispectrum of the data. In other words, no more information about f NL could be extracted from the a m than the information contained in the bispectrum. The aim of this section is to show that this is actually the case. The issue of the optimality of bispectrum estimators of f NL was addressed in great detail in Babich (2005). In this section we will basically review the main results of that study, referring the reader to the original paper for their complete derivation.
As we mention in appendix A, there is a sufficient and necessary condition for an estimator E to saturate the Rao-Cramer bound, expressed by formula (A.5). This condition, applied to our case, reads: Our aim is to show that the bispectrum statistic (III.72) satisfies this condition. We then need to start from a computation of the full likelihood p(a| f NL ) for a general primordial non-Gaussian model. Following Babich (2005) we will start by limiting ourselves to the particular case of the local model. We recall from a previous section that local NG is the only case for which an explicit expression for the primordial potential is provided. In real space: Starting from this formula it is possible to obtain a likelihood function for Φ, dependent on the parameter f loc. NL . This is done by means of an expansion in terms of the order parameter f loc.
NL Φ 2 L (x) . The full expression for the Probability Density Function (PDF) P(Φ| f loc. NL ) (see Babich, 2005) can be expanded around its Gaussian expectation for f loc. NL = 0, and schematically written as: where C is the covariance matrix of the Gaussian part of the potential Φ i.e.
Formula (III.80) is then telling us that the logarithm of the full likelihood can be decomposed into the sum of a Gaussian likelihood P G , plus a NG term that depends linearly on f loc. NL , and that this decomposition is accurate up to i.e. we are assuming that NG is weak, as we did in the previous section.
After computing P(Φ| f loc. NL ), one has to account for 2D projection and radiative transfer in order to obtain the required likelihood P(a| f loc. NL ). As shown in Babich (2005), this can be achieved by expanding the PDF III.80 in spherical harmonics and performing the functional integration: where δ (M) D is the Dirac delta function of dimension M, and M < N due to the 2D projection 4 The previous formula can be derived by recalling equation (III.31), together with the well known formula in probability theory: where δ D is again the Dirac delta function, x and y are random variables linked by the functional relation y = F(x), and P(x), P(y) are the PDFs of x and y respectively. Solving the functional integral (III.82) yields (Babich, 2005): In the previous formula we can recognize the standard a m PDF, valid in the standard Gaussian case, in the first term on the r.h.s. Added to this we find a first order f NL -correction proportional to the CMB angular bispectrum. Higher order correlators are not present at order O ( f NL Φ L (x) ). For reasons that will become clear shortly, although we have not computed it, we have explicitly denoted the O f 2 NL Φ 2 L (x) term in the expansion with I 2 (a, f NL ). Note that, besides assuming weak NG in this formula, we are also assuming rotational invariance (this is evident from the fact that the a m covariance matrix appearing in the Gaussian piece of (III.84) is diagonal and equal to C ). Rotational invariance is a general property of the CMB sky but it is broken when we deal with real CMB measurement characterized by inhomogeneous noise patterns and sky cuts. We will investigate these effects in the following section. For the moment we consider the purely ideal case described by (III.84). Armed with the PDF expression for local NG, and recalling the necessary and sufficient condition (III.78), we can finally determine whether the estimator (III.72) is optimal or not. First of all we see that: We then see from combining (III.85) and (III.72) that: We now see that, in order for the necessary and sufficient condition for optimality (III.78) to be verified, we need the (∂I 2 /∂ f NL ) term to be exactly equal to − f NL . The second order quantity I 2 should then be calculated explicitly in the expansion (III.84) in order to complete the calculation and verify if, or under which conditions, this is true. However this turns out not to be necessary if we consider the following "regularity condition for a PDF". 5 For a general PDF of a random variable x depending on a parameter λ we have Since this regularity condition must be valid for each value of the parameter λ (λ → f loc. NL in our case), it is clear that it must hold term-by-term, i.e. at each order, in the expansion (III.84). By taking the average value of equation (III.86), keeping in mind that the estimator is unbiased, and imposing (III.87), we then find that the average value of ∂I 2 (a, f NL )/∂ f NL must be exactly equal to − f NL . If we could then replace ∂I 2 (a, f NL )/∂ f NL in (III.86) with its average value we would exactly obtain the condition for optimality and conclude that the cubic estimator (III.72) saturates the Rao-Cramer bound. For present CMB experiments, the terms in the expansion (III.84) are evaluated summing over a large number of -modes 6 , or equivalently in pixel space, averaging over a large number of pixels (∼ 10 6 and 10 7 for WMAP and Planck respectively). For this reason we expect that the error made by replacing the f NL -order term in (III.86) with its average value will be very small. In (Babich, 2005), an estimate of this error has been done in the approximation of neglecting radiative transfer and projection effects (i.e. working in 3D with the primordial potential, rather than with a CMB map). The conclusion was that for a number of observations N > 30 the approximation above works very well. Moreover the variance of the f NL -order term scales like 1/N. In the full radiative transfer case we expect the scaling to be unchanged, although the coefficients in front of it that led to the N > 30 estimate might change. However, as noted above, the number of pixels in present-day experiments is many orders of magnitude larger than 30. That leads us to conclude that the approximation of replacing the average of the first order term in equation (III.86) is a very good one. We then reach the following important conclusion: for a rotational invariant CMB sky, in the limit of weak NG, the cubic estimator defined by formula (III.72) is the best unbiased CMB estimator of f loc.

NL
Let us now move to problem of generalizing the last conclusion to shapes different from local. In this case a full expression of the primordial potential Φ(x) is not available. The steps that lead to the conclusion that the local f NL estimator is optimal can thus not be reproduced. However it was pointed out in Babich (2005) that, in the limit of weak NG, the full CMB NG likelihood can still be expressed in terms of its power spectrum and bispectrum by mean of an Edgeworth expansion, regardless of its full expression. The Edgeworth expansion is basically a way to express a NG PDF as a series expansion around its Gaussian part Bernardeau & Kofman, 1995;Taylor & Watts, 2001). For CMB anisotropies one finds, at the end of the calculation: It is easy to see that ln P(a| f NL ) takes the same form as in equation (III.84). For this reason all the previous derivation applies also to the present case and the following conclusion holds: in the weak non-Gaussian limit and assuming rotational invariance of the CMB sky, the cubic estimator (III.72) is the best unbiased CMB estimator of f NL for any non-Gaussian shape.
Before concluding this section, we would like to stress that, despite the technical complications arising in the detailed probe of the bispectrum estimator's optimality, the physical reason behind this result is quite clear. We can always expand the a m PDF in series of its momenta. The order parameter of this expansion is ( f NL Φ ). This parameter is the natural measure of the amplitude of primordial NG, and it is actually predicted by inflation to be very small. For this reason higher order momenta in the primordial non-Gaussian a m PDF are suppressed with respect to the bispectrum. Basically, the information on f NL in a CMB map is entirely contained in the three point function.

Breaking rotational invariance
So far the assumption of rotational invariance of the CMB sky has been made when probing the optimality of the cubic estimator (III.72). In an ideal situation, the CMB sky is clearly rotationally invariant. However, two elements break rotational invariance in a CMB map derived from a real experiment: an anisotropic distribution of noise in pixel space, and a galactic mask. Anisotropic noise comes from the fact that the CMB sky is generally scanned in a non-uniform way: regions that are less contaminated by astrophysical foreground emission are generally observed more times, and are thus characterized by a lower noise level (see fig. 9 for an example). A sky cut has also to be introduced in order to remove the regions on the galactic plane that are most contaminated by foregrounds. When rotational invariance is broken the considerations of the previous two sections do not strictly apply anymore and the estimator (III.72) become sub-optimal. However, the same Edgeworth expansion approach that was adopted in the previous section can still be applied, but this time keeping rotation invariance breaking terms in the calculation, in order to find the new more general form of the optimal estimator. The general estimator turns out to be the sum of two terms: the first term is cubic in a m and is analogous to the one appearing in the rotationally invariant case, while the second term is linear in a m and accounts for breaking of rotational invariance. The explicit expression of this general optimal f NL estimator is (Creminelli et al., 2006): In the rotationally invariant case the a m covariance matrix C 1 m 1 , 2 m 2 is diagonal and equal to C , while the linear term is proportional to a monopole. We then recover the form of the cubic estimator III.72 as expected. Note that in the signal dominated regime of the experiment under study (e.g. 300 for WMAP and 1000 for Planck), and if the mask is not too large, then the simple cubic estimator (III.72) is still basically optimal, since we are in a nearly rotationally invariant case. For small masks it has been shown in Komatsu & Spergel (2001) that the bispectrum and power spectrum of the map are, to a good approximation, just rescaled by a factor f sky , representing the fraction of the sky left free by the mask, i.e. b mask 1 2 3 = f sky b f ull sky In this case one can then assume the covariance matrix to be diagonal, and account for the effects of the mask by correctly rescaling the normalization term in order to keep the estimator unbiased. This nearly rotationally invariant estimator then takes the form:f The approximation of weak non-Gaussianity is the basis for all the results derived so far. One can then ask at which point (i.e. for which values of f NL ) this approximation breaks down. As we observed earlier, the Edgeworth expansion (III.88) shows that the likelihood of a generic primordial NG distribution can be expanded in series of its momenta, with order parameter O f NL Φ 2 L (x) . We know that Φ L ∼ 10 −5 , while WMAP observations already constrain f NL 100. That means that the order parameter of the PDF expansion is ∼ 10 −3 and thus the weak NG approximation seems a very good one in the entire range of allowed and predicted f NL . However a subtle effect has been pointed out by , that change the previous conclusions in certain cases. Let us quickly summarize their main results. We already saw that, for the angular averaged bispectrum of a Gaussian temperature field: We then included this expression for the variance in the weights of the optimal estimator (III.72), and in the normalization factor N. It is easy to see that in this approximation the variance of the estimator can be predicted as: However the approximation of taking f NL = 0 in the calculation of the estimator variance is not always a good one if ∆ f NL is dominated by squeezed configurations 7 , or more in general by configurations in which one of the 's is small. It turns out that, in these cases, the f NL -dependent corrections to the Gaussian expectation of the bispectrum variance become important when f NL gets large enough. This effect increases the variance of the estimator with respect to the expectation for f NL = 0. There is a clear physical interpretation for this. One can see, for example by calculating (III.95) in the simple Sachs Wolfe case (see also section III.H) that the variance of the estimator scales roughly like 1/ max , or equivalently like 1/N pix in pixel space, N pix being the number of pixel in the observed map. This increase of the signal-to-noise ratio with the number of pixels is due to the fact that more and more bispectrum configurations are included into the sum over modes to estimate f NL . However, if the signal is completely dominated by low-modes, as in the local case, there is an intrinsic large cosmic variance, due to the small number of low-configurations. Clearly, cosmic variance cannot be beaten by increasing the resolution of the map.  then found that for N pix getting large, the S /N of the estimator for local NG grows asymptotically as (ln N pix ) i.e. much slower than the expected N pix , that one would obtain by neglecting f NL -dependent corrections in the calculation of the estimator variance. They carried out a calculation of the estimator variance in the flat-sky approximation, and neglected the transfer functions, to find the following expression: where A is the bispectrum amplitude. We clearly see from this formula what we were stating above i.e. when f NL gets large, the variance starts scaling like 1/ ln N 2 pix . The same formula also shows the technical point behind this behaviour: in the correction term, the order parameter is actually not f NL A 1/2 anymore but rather f NL A 1/2 N pix . This enhancement by a factor N pix can make the first order corrections non-negligible anymore. The natural question is now how large an f NL we need to make the correction term important in (III.96). Following  let's call σ 0 the standard deviation of the estimator computed for f loc. NL = 0. Let's say that for an experiment at a given angular resolution (defined by max in harmonic space or by N pix in pixel space) a value of f loc. NL is measured, equal to nσ 0 . Substituting this value into formula (??)behavior, and calling σ 2 f NL the real estimator variance, one finds the following relative correction to σ 0 : For an experiment like WMAP the r.h.s. term becomes ∼ 1 when f NL is about 6σ 0 away from the origin. For an experiment like Planck f NL has to be about 3.5σ 0 . The definition of an highf NL regime is thus dependent on the experiment under study, as a consequence of the fact that the enhancement of first order terms in the variance expression (∆E) depends on N pix . In other words we can conclude the following: if f loc. NL will be detected at several σ (in terms of the Gaussian expectation for the standard deviation), then f NL -dependent correction terms in the estimator variance will have to be taken into account, and the simple expansion (III.84) of the CMB likelihood does not constitute a valid approximation anymore.
One caveat in all this discussion is that formula (III.96) was obtained in flat sky and neglecting the transfer functions and it should be checked how dependent the final results are on these approximations. Since the scaling argument is based on a very general physical reason, i.e. the weight of squeezed configurations in the local f NL estimator discussed earlier in this section, one expects that the general scaling with N pix obtained in (III.96) does not depend on the details of radiative transfer and 2D projection. Liguori et al. (2007) actually checked the results of this section numerically, by applying an implementation of the optimal cubic estimator (III.72) to full-sky simulations of CMB local NG maps with different N pix and f loc. NL , including the full radiative transfer 8 . Although, as expected, the coefficients in formula (III.96) change with respect to the simple flat sky no radiative transfer approximation, the scaling of the error bars with N pix follows very well the expectations, going from ∼ 1/N pix ln N pix at low f NL to 1/ ln N pix when f NL is detected at high significance. Since in the large f NL regime the variance starts to scale very slowly, like 1/ ln 2 N pix , one is led to wonder whether the estimator discussed in the previous sections becomes suboptimal at this point, and whether a better one can be found. The answer to this question is not immediate. In order to check for the optimality of an estimator, as we have seen, one has to see if it saturates the Rao-Cramer bound. However, also the local f NL likelihood and Rao-Cramer bound estimated in the previous sections have to be recomputed, since they were obtained neglecting higher order terms in f NL A 1/2 . In order to account for the f NL A 1/2 N pix enhanced terms it is necessary to produce an exact expression of the full likelihood. This can be extremely challenging in the full radiative transfer case, but it is feasible in the flat sky no radiative transfer approach that we are considering (and that we showed earlier to be a good approximation as long as scaling arguments are involved).  proceed to calculate the full likelihood in this approximation and conclude that the optimal cubic estimator of weak local NG indeed does not saturate the Rao-Cramer bound in the highf NL regime. The estimator (III.72) is thus no longer optimal in this case. They then proceed (always in the flat sky no transfer function case) to derive a cubic estimator that saturates the Rao-Cramer bound also for large f loc. NL . We won't enter into the details of this derivation here, referring the reader to  for a complete discussion. The main aim of this section was to show under which conditions the optimality of the cubic estimator that we discussed in previous sections is valid. Since current bispectrum analysis of WMAP data (Komatsu et al., 2009; NL ∼ 2σ 0 , the weak NG approximation applies, and the cubic estimator we derived earlier is indeed an optimal estimator in this case. However, if future Planck measurement will produce a detection of f loc. NL at high significance the estimator will have to be modified in order to account for the enhanced variance in the highf NL regime. This is not necessarily a remote possibility if one considers that the present central value of f NL from WMAP estimates would produce a ∼ 8σ 0 detection with Planck. Before concluding this section we would like to remark once again that the variance enhancement discussed here applies only to non-Gaussianity of the local type, whose bispectrum is dominated by squeezed configurations, affected by a large cosmic variance. For the other shapes the cubic estimator (III.72) is optimal in both the small and highf NL regimes.

F. Numerical implementation of the bispectrum estimator
In this section we turn to the problem of finding an efficient numerical implementation of the optimal bispectrum estimator (III.89). Let us repeat its expression here for convenience: We remind the reader that this is the full expression, valid for the general non-rotationally invariant case. For a rotationally invariant CMB sky the linear term in the formula above vanishes, and the covariance matrix is diagonal and reduces to C , giving the simplified expression (III.72) that we reproduce again here for convenience: In a schematic way, the full estimator can be written as: where the "cubic" term is the one containing the product a 1 m 1 a 2 m 2 a 3 m 3 , while the linear term is the one dependent on a single a m and vanishing in the rotationally invariant case, where it is proportional to a monopole. It was shown before in a formal way that a pure cubic estimator becomes suboptimal when rotational invariance is broken, and adding the linear term is necessary to restore optimality. It is useful to try to understand the reason of this effect qualitatively and in a more intuitive way. Let's assume that we have a map characterized by non-stationary noise, and we are observing a region of the sky that was sampled many times so that the noise level in this area is low. That implies that the level of small scale power in this large region is lower than average. Now, for a specific realization of the CMB sky, this modulation of small scale power on a large region can look like a non-Gaussian signal sourcing a squeezed configuration of the bispectrum. On average this effect must cancel if the underlying noise model is Gaussian. However, this "confusion" between signal and noise increases the variance of any estimator of a primordial NG signal that is peaked on squeezed configurations. We know that this happens for the local model. This heuristic argument thus shows that, even though in principle a linear term must always be included when rotational invariance is broken, for a realistic noise model only local non-Gaussian estimates will be affected.
1. Primary cubic term for f NL Let us focus for the moment on the rotationally invariant case, where the linear term vanish, and the covariance matrix is simply C = C δ 1 2 δ m 1 m 2 . We immediately see that a brute force implementation of equation (III.100), consisting in computing and summing over all the bispectrum configurations, would take O 5 max operations, where max , the maximum multipole in the calculation, depends on the resolution of the experiment. As mentioned earlier, max ∼ 500 for WMAP and max ∼ 2000 for Planck in the signal dominated regime. At these resolutions a brute force approach would be absolutely unfeasible for a general shape. If however we assume that the primordial shape under study is separable then the dimensionality of the problem can be reduced and the overall number of operations scaled down significantly, making the computational cost affordable. Let us illustrate this point more in detail. Substituting (III.46) into the estimator expression (III.72), and remembering the identity (III.39), it is possible to rewrite (III.100) as follows (or more in general as a linear combination of terms of the following kind): From an inspection of previous formula we see how, as a direct consequence of separability, the initial sum over the indeces 1 2 3 , m 1 m 2 m 3 has been factorized in the product of three sums, each running over two indices , m. This greatly reduces the computational cost from O 5 max to O 3 max operations. If we define the new quantities we can recast the estimator expression above in the following form: where it is evident that we are now calculating our statistic in position space rather than in pixel space. Note how the filtered maps M X , M Y , M Z can be efficiently calculated using a fast harmonic transform algorithms such as those included in the HEALPix package. This fast position space algorithm was initially introduced in Komatsu et al. (2005) in the context of local f NL estimation, and applied to the estimation of WMAP 1-year data by the WMAP team in Komatsu et al. (2003). It was then applied to equilateral f NL estimation for the first time in Creminelli et al. (2006). An alternative numerical implementation with respect to the one used by the aforementioned authors was introduced in Smith & Zaldarriaga (2006). Although different under many technical aspects, this second algorithm is still based on the calculation of the position space statistic (III.106); we refer the reader to the original work for additional details. This second implementation has been used to produce alternative estimates of f loc. NL , and f eq. NL from WMAP data, and to estimate the amplitude of the orthogonal shape, recently introduced in Senatore et al. (2009). Let us now discuss the possible limitations of this numerical approach. As noted in section II, the separability condition is in principle quite restrictive: the only separable shape arising directly from primordial models of inflation is the local one. On the other hand, it is still possible to study non-separable models by finding separable shapes that are highly correlated to the primordial one. As observed in Creminelli et al. (2006); Fergusson & Shellard (2009);Smith & Zaldarriaga (2006), the f NL limits obtained from a highly correlated separable shape in this way will be very close to those that would have been obtained using the original non-separable model (see again sections II, III.B and II.B for a detailed discussion of this issue). We know from earlier sections that the other two shapes mentioned so far in this section besides local, namely the equilateral and orthogonal shape, have actually been derived as separable approximations of theoretical inflationary shapes. These approximations were obtained in an heuristic way i.e. an educated guess of a good separable approximation of the shape under study was made, and the correlation was checked a posteriori. There is obviously no a priori guarantee that this approach would be easily repeatable for all the shapes of interest. The eigenmode expansion method introduced in (Fergusson et al., 2009), and summarized by equation (II.26), however, provides a general and rigorous method to find separable approximations of any shape, thus enabling the estimation of any possible primordial model. In this case, recall that we expand our (non-separable) primordial shape function in terms of the separable basis functions Q n (see (II.26), constructed from symmetric polynomical products q p (k), as where the second expression for the reduced bispectrum b 1 2 3 (III.56) expands in convolved basis functions (III.57) in harmonic space with In the mode expansion approach, then, the f NL -estimator for a specific model generalises to the following where the filtered maps or shells M p (r,n) are defined by Defining the integral β prs ≡ drr 2 dΩn M {p M r M s} , the estimator collapses into the compact form where it is possible to show a precise relationship between the theoretical bispectrum expansion coefficients α prs and expectations for the observed coefficients β prs .
It was also pointed out in Fergusson et al. (2009) (see also Fergusson & Shellard (2007)), and summarized in formula (III.60), that the separation can be performed directly in harmonic space on the reduced bispectrum b 1 2 3 , rather than on the primordial shape S (k 1 , k 2 , k 3 ). This provides an alternative, but equivalent, late-time f NL -estimation pipeline with respect to the primordial shape separation approach given above (III.111). In fact, since orthonormality is more direct on the harmonic domain without the intervention of transfer functions, the approach is considerably more straightforward conceptually. In this case, expectations for the observational expansion coefficients in the orthonormal frame R n (with n ↔ {prs} see (III.60)) become simply β R n =ᾱ R n , that is, for an ensemble of maps possessing the theoretical bispectrum described by the coefficientsᾱ R n . This means that for a NG bispectrum signal of sufficient significance we can consider directly and efficiently reconstructing the bispectrum from the observed coefficients β R n using (III.60). We also note that the harmonic space separation scheme also allows for the estimation of noninflationary late-time bispectra, such as the bispectrum of cosmic strings, as well as other secondary anisotropies.
We can then conclude, in light of these developments, that the fast cubic statistic (III.106) can be applied in complete generality to any model of primordial NG, as well as to any other potential source of CMB NG. We also point out that alternative approaches have been considered for harmonic space f NL analysis using wavelets and binning. For example, Bucher et al. (2009) recently proposed using a suitable binning scheme in which the full expression for b 1 2 3 is calculated in a subset of all the triples 1 , 2 , 3 , small enough to make the calculation feasible while maintaining calculation accuracy. Approaches based on a harmonic space separation scheme, of course, require the full calculation of the reduced bispectrum b 1 2 3 in order to determine the correlation between the theoretical prediction and the final expanded or binned bispectrum. The calculation of b 1 2 3 implies the necessity of numerically solving the radiative transfer integral (III.30) for all the configurations 1 , 2 , 3 which appears to be intractable in the non separable case, since the dimensionality of the problem cannot be reduced. However, this can be achieved efficiently in the general case either using the separable mode expansion integral (III.56) or else the hierarchical adaptive approach Fergusson & Shellard (2007) discussed in section III.D.

Linear correction term for f NL
Let us now consider the realistic situation in which inhomogeneous noise and a sky-cut break rotational invariance (see fig. 9). In this case two complications arise: 1. A linear term in a m has to be added 2. The a m covariance matrix is now no longer diagonal. The inverse covariance weighting C −1 a that appears in expression (III.98) is hard to compute numerically for high angular resolution experiment, since its size makes a brute force numerical inversion impossible.
A first approach, introduced in Creminelli et al. (2006), is to simplify the problem by assuming that the covariance matrix is diagonal in the cubic term of the estimator, and then finding the linear term that minimize the variance under this assumption. In other words, we keep the cubic term in the form (III.100) and compute the variance of this term, relaxing the assumption of isotropy at this point 9 It turns out that the variance is minimized (while leaving the estimator unbiased) for the following choice of the linear term: where N = f sky 1 2 3 B 2 1 2 3 /C 1 C 2 C 3 is the normalization term. Despite being sub-optimal with respect to a full implementation of equation (III.98), this choice of linear term has been shown to significantly improve the error bars with respect to the simple cubic statistic (III.100) in presence of anisotropic noise. At the same time, the simplicity of this implementation in comparison to the full optimal statistic (III.98) is manifest, since no C −1 terms appear in equation (III.112). Let us consider again a separable primordial bispectrum shape that can be written S (k 1 , k 2 , k 3 ) = X(k 1 )Y(k 2 )Z(k 3 ) + perm.. Applying the same procedure as we did for the cubic term, the linear term can be recast in the form: where we explicitly wrote C 1 m 1 , 2 m 2 as a 1 m 1 a 2 m 2 . This last formula can be rewritten as:  Nevertheless, we do not observe the increase of variance at higher l max : the variance becomes smaller as we include more multipoles. This result is in contradiction with the result of Creminelli et al. (2006a) and Creminelli et al. (2006b). We attribute this discrepancy to the error in the normalization of linear term in their formula.
In the right panel of figure 2, we show the variance of f NL again using Gaussian simulations, but now in the presence of flat sky cut and in the absence of any noise. The purpose of the plot is to demonstrate (as pointed out in the previous section) that for the combined CMB temperature and polarization analysis, sky-cut does contribute significantly to the linear term. We find that the generalized estimator does a very good job in reducing the variance excess, and the simulated variance of f NL does accurately saturate the Fisher matrix bound.
FIG. 10 Error bars (obtained from Gaussian simulations) for the pure cubic (triangles) and pseudo-optimal (stars) implementations of the bispectrum estimator, to be compared to the solid red line, representing the Fisher matrix (Rao-Cramer) bound, saturated by the full optimal statistic described in the text. Left panel: error bars as a function of the maximum multipole included in the analysis. Right panel: error bars as a function of the fraction of the sky considered in the analysis. This analysis included both temperature and polarization data. From .
Like for the cubic part of the estimator, we have rewritten the linear term as a fast position space integral. The ensemble averages appearing in the last formula can be computed as Monte Carlo averages over a large number of Gaussian realization of the CMB sky, characterized by the same beam, mask and noise properties as the experiment under study. This pseudo-optimal, but relatively straightforward, implementation of the linear term has been adopted by a number of groups in order to estimate f loc. NL from WMAP data (Creminelli et al., 2006Komatsu et al., 2009;. The full optimal estimator (III.89) was implemented only quite recently in , where the authors developed an efficient conjugate gradient inversion (e.g. Shewchuk, 1994) algorithm based on earlier results from Smith et al. (2007), in order to compute the C −1 a pre-filtering in reasonable CPU-time. Note that after the inverse covariance matrix pre-filtering is calculated, the numerical implementation of the estimator is very similar to the one outlined above for the pseudo-optimal case. The new position space statistic is obtained from formulae (III.106), (III.114), by making the following replacements, wherever the corresponding quantitites appear: with analogous substitutions to be made for the Y, Y, Z, Z terms appearing in the same equations. The improvement in error bars from the pure cubic sub-optimal estimator, to the pseudo-optimal and optimal statistics is shown on fig. 10.

G. Experimental constraints on f NL
In order to obtain an estimate of f NL from a given data-set one has first to generate sets of Gaussian CMB maps and obtain the MC averages that appear in the linear term expression (III.114), after an inverse covariance pre-filtering of the full optimal estimator is implemented. The normalization term N can be pre-computed using formula (III.46) to evaluate numerically the theoretical bispectrum shape for the model we want to estimate. The statistic (III.98) can then be computed for the experimental data a obs to get our result: The error bars are then obtained by running the estimator on simulated Gaussian maps 10 : where . MC indicates the MC average and a sim a vector of simulated multipoles (obviously including mask, beam and noise features of the experiment). For an accurate step-by-step description of an f NL -analysis of WMAP data, including details about channel coadding, noise model, beams and pixel weighting schemes, we refer the reader to the explanations contained in Komatsu et al. (2009). The most stringent limits so far have been obtained by applying the bispectrum estimator to the WMAP datasets. Constraints have been put on the local, equilateral and orthogonal shapes. The best constraints come from the full implementation of the optimal estimator done in  and Senatore et al. (2009). They are, at 95% C.L., − 4 < f loc. NL < 80 , (III.118) −125 < f equil.

NL
< 71 (III.120) Since the first release of WMAP data, different groups have used the cubic statistic described in the previous paragraph, either in its pure cubic form (III.106), or in the improved version including the pseudo-optimal linear term implementation (III.114). The results of different analysis of the WMAP 1-year, 3-year and 5-year datasets are summarized and commented in table III, where just the local and equilateral shapes have been included since the only constraint on the orthogonal shape to date has been already mentioned in (III.118).

H. Fisher matrix forecasts
The fisher matrix, defined as the curvature of the likelihood function calculated in its peak reassessment (see equation (A.3) in Appendix A), plays a very important and well-known role in parameter estimation theory, not only because it defines the optimality of estimators through the Rao-Cramer bound, but also because it allows us to estimate a priori what the smallest error bars attainable will be for a given parameter (see again Appendix A). In other words, using the Fisher matrix we can forecast how well a parameter will be measured by a given experiment. This is very useful in order to optimize the experimental design to the detection of the parameters of interest. In our specific case, a Fisher matrix analysis will help us to understand what is the smallest f NL detectable in principle using different CMB datasets, and which experimental features can be improved in order to increase the sensitivity to f NL .

A general derivation
Formula (A.12) from appendix A, when applied to our case yields: The error bars can be obtained from Gaussian simulations as long as the weak NG approximation applies. As we saw earlier, this works at any f NL for any shape, except for the local shape when a large f loc. NL makes the error bars f NL -dependent. In this case the error bars would need to be calculated from NG simulations of f loc.
NL . So far no high-significance detection of f loc. NL has been reported, so working with G maps is at this stage sufficient to get accurate error bars.
NL < 147 WMAP 3-year estimate obtained in , corresponding to a "nearly 3-σ" detection of local NG, seems not to agree well with the 9 < f loc.
NL < 129, ∼ 2.3σ result obtained on the same data set in . The origin of the discrepancy is unclear, although it is argued in  that it might be due to differences in the coadding scheme of different data channels, or analogous differences in the choice of some weights. As pointed out in , one additional advantage of the fully optimal implementation of the estimator is actually that all the ambiguity related to the use of different coadding schemes disappears, since the optimal coadding strategy is automatically selected in the inverse covariance filtering process. Another discrepancy is that between the two equilateral constraints on WMAP 5-year data. It seems that the pseudo-optimal estimator produces better constraints than the optimal one. This is clearly not possible.  claim that their numerical pipeline calculates the theoretical ansatz for the bispectrum shape more accurately than it was done before. That is due to a subtlety that went unnoticed in previous works, consisting in the necessity to extend above the horizon the upper integration limit in the calculation of the equilateral shape related quantities β (r), γ (r), δ (r) (see equation (??)). This is required in order to obtain stable numerical solutions, and it calls for a reasessment of the expected and measured error bars, that actually increase with respect to previous calculations.
where B 1 2 3 is the angular averaged bispectrum (i.e. the measured quantity). This can be rewritten in terms of the reduced bispectrum as: where we have defined (see also w l 1 l 2 l 3 in (III.58)): I 1 2 3 = (2 1 + 1)(2 2 + 1)(2 3 + 1) 4π Note how the features of the experiment enter the Fisher matrix through the parameter max , defining the angular resolution, and in the angular power spectrum expression in the denominator, that contains the angular beam and experimental noise:C where C is the theoretical power spectrum for a given set of cosmological parameters, W is the beam of the experiment, and N is the experimental noise. N is a constant for uncorrelated noise. Likewise, the theoretical bispectrum will be convolved by the experimental beam.
Note that, since the noise is generally Gaussian, its three point function vanishes. The experimental noise thus only enters in the denominator of the Fisher matrix expression. The effects of partial sky coverage can be easily accounted for. From (III.91) it follows that if only a fraction f sky of the full sky is covered then the Fisher matrix takes a f sky factor in front, that produces a degradation of the error bars of f sky . We saw previously that for separable shapes the reduced bispectrum can be calculated either analytically, under some simplifying assumptions on the transfer functions (e.g. the Sachs-Wolfe approximation), or numerically through formula (III.43). It is then possible to evaluate numerically the fisher matrix and the corresponding error ∆ f NL ≡ √ 1/F. In the context of f NL estimation, the first calculation of this kind was done for f loc. NL in Komatsu & Spergel (2001), where it was found that WMAP could reach a sensitivity ∆ f NL = 20 (note how this bound is actually saturated by the optimal estimator results presented in table III), while Planck (The Planck Collaboration, 2006) could go down to ∆ f NL = 5 11 . What allows Planck to improve on WMAP is that it has a much better angular resolution and that it is cosmic variance dominated in a very large range of scales i.e. the power spectrum signal C b is larger than the noise N up to max = 2000. Angular resolution and sensitivity are the two factors that increase the ability of a CMB experiment to constrain f NL . This information is provided by the Fisher matrix expression (III.122). Looking at such expression, we notice how the signal-to-noise ratio is obtained by adding over all the bispectrum configurations up to max , weighted by their variance. Thus, the higher max is, the more configurations are included in the sum and the larger is the final sensitivity to f NL . On the other hand we see that, if the power spectrum of the instrumental noise appearing in the variance term in the denominator dominates from a certain S =N , then the signal contribution is suppressed above that threshold by the noise power spectra appearing in the denominator of (III.122). So what determines the sensitivity of a CMB experiment to f NL is the range of over which the instrumental noise is low, so the experiment is cosmic variance dominated. This range is 2000 for Planck and 500 for WMAP, hence Planck can obtain tighter constraints than WMAP. This is shown in fig. 11, where Fisher matrix forecasts of f NL are plotted for different CMB experiments: the predicted error bars decrease with up to the angular scale at which the measurements start to be noise dominated, after which the f NL signal-to-noise ratio saturates. A simple calculation done in  taking the Sachs Wolfe approximation, and working in flat sky, showed that before noise dominates the signal-to-noise ratio for the local shape grows as: where the (ln) is dictated by the coupling between large and small scales introduced by squeezed configurations, from which most of the local signal comes. Note also how, in absence of experimental noise, the beams in the numerator and in the denominator of (III.122) cancel each other out. An ideal noiseless CMB experiment would then have a signal to noise ratio indefinitely growing. However, this would not imply infinite sensitivity to f NL , because above a certain max secondary anisotropies would start to dominate. Fisher matrix analysis of the equilateral shape Smith & Zaldarriaga, 2006, e.g. ) showed that the minimum achievable error bars in this case are ∆ f NL 100 and ∆ f NL 60, for WMAP and Planck respectively 12 . Additional shapes are studied in Smith & Zaldarriaga (2006).

Polarization
Babich & Zaldarriaga (2004) showed with a Fisher matrix analysis that the CMB E-mode polarization measurements can be used to improve the sensitivity to f NL . Although we have dealt so far only with temperature bispectra and related estimators, including polarization is fairly straightforward. As usual the calculation starts from the formula (III.30) 11 Note that all the errors quoted in this section are at 1-σ. 12 Note how the larger error bars in this case with respect to the local constraints does not reflect a higher sensitivity of CMB measurement to f loc.
NL , but only the conventional choice of the normalization of the bispectrum amplitude in the definition of f NL . The normalizations are in fact chosen in such a way that the bispectra have the same value for equilateral configurations 1 = 2 = 3 , where the local bispectrum is suppressed, and the equilateral bispectrum is peaked.
linking the multipoles of CMB anisotropies to the primordial potential Φ, but this time including the polarization radiative transfer ∆ E (k) in the convolution integral: The bispectrum is then defined in the usual way, but this time more configurations can be built by correlating temperature and polarization multipoles: The point to emphasize is that the polarization signal is generated on scales where the temperature signal is suppressed by Silk damping. The polarization bispectra thus open a window over a new k-range in the 3D → 2D projection k → , and increase the overall information available. In other words, since the new configurations T T E, T EE, etc. including polarization are partially independent of the pure temperature (T T T ) bispectrum, adding those additional configurations to the Fisher matrix (and to the actual f NL estimation from data) increases the total signal available. The Fisher matrix expression now becomes: where i, j, k, p, q and r run over the T and E superscripts. We still work in the assumption that all the quantities involved are Gaussian, but now the different bispectra of temperature and polarization are correlated for a given configuration 1 , 2 , 3 , thus defining a multivariate Gaussian distribution. The full covariance matrix between bispectra (indicated by cov in the formula above) has then to be evaluated. A numerical evaluation of (III.129) shows ) that, for an ideal (i.e. noiseless) experiment, adding the polarization signal produces an improvement of a factor ∼ 2 on f NL constraints. For WMAP, adding polarization bispectra produces very little improvement, since polarization data are mostly noise dominated. For Planck, however, including polarization does generate a significant improvement, bringing the forecasted error bars from ∆ f NL 5 to ∆ f NL 3.5 . Some error bar forecasts from temperature and polarization bispectra as a function of max for different experimental designs including WMAP and Planck are shown in fig. 11. Motivated by this analysis, Yadav et al. (2007 have implemented a bispectrum estimator of f NL including both temperature and polarization bispectra. All the general considerations about optimality and the numerical implementation techniques described in previous sections apply in an analogous way to the temperature + polarization case, although the presence of additional bispectra with a non-trivial covariance matrix introduces a few additional technical complications. We refer the reader to Yadav et al. (2007 for further discussion.

I. Non-Gaussian contaminants
So far we have considered only primordial non-Gaussianity as a source of the three-point function of the CMB. However many other astrophysical and cosmological effects can produce an observable angular bispectrum. Among these, diffuse astrophysical foreground emission (see e.g. Bennett et al., 2003;Leach et al., 2008, and references therein), unresolved point sources (e.g. Komatsu et al., 2003) and secondary anisotropies are probably the most important NG sources. Since the main focus on this review is on the primordial bispectrum, we will not describe this NG sources in great detail. We will however outline in this section their main effects in order to understand if, and how, they could contaminate an estimate of primordial NG. Let us consider a number N s of sources of a CMB bispectrum signal and call B i 1 2 3 the bispectrum produced by the i-th source. Let us also indicate with B f NL =1 1 2 3 the primordial component of the bispectrum calculated for f NL = 1. For our purposes B f NL =1 1 2 3 is the signal that we want to measure, while the other bispectra are contaminants, that we would like to eliminate. The total bispectrum of the map in presence of these contaminants is then: where A i is the amplitude of the i-th bispectrum. If we have a precise prediction of the bispectra generated by the contaminants, we can then think of extending our f NL estimator to a joint-estimator of all the amplitude parameters. The optimal cubic f NL estimator defined in (III.72) would then be generalized to the multi-parameter case by minimizing the following χ 2 : The new errors on f NL in this case can be forecasted as usual by means of a Fisher matrix analysis. The Fisher matrix described in the previous paragraph can be generalized straightforwardly to the multi-parameter case. In this case F becomes an array whose entries are defined as: The optimal errors on a given amplitude A i (including f NL ) then become, according to the multi-dimensional generalization of the Rao-Cramer bound: where the crucial point to notice is that we now first invert the Fisher matrix and then we take the square root of the diagonal elements to find the errors. This is the error that is obtained when the full joint-parameter likelihood is calculated and then the 1-dimensional likelihood for a given parameter is obtained by integrating out all the other degrees of freedom: a process defined in statistics as marginalization. One can see that the inverse of the Fisher matrix defines the covariance matrix of the parameters under study. If the various parameters are completely uncorrelated, then the Fisher matrix is diagonal and we would have , showing that the parameters can obviously be estimated independently and the marginalization process doesn't change the error bars on a given parameter of interest (in our case f NL ). If the different parameters are correlated, however, then off-diagonal terms appear in the Fisher matrix and the error bars after marginalization (i.e. the "real" error bars to quote in the results) are larger than those that would have been obtained by naively neglecting contaminants. An obvious but useful observation is that two bispectral amplitudes will be strongly correlated when the respective shapes are similar. To make a practical example, the bispectrum generated by correlating weak lensing of CMB anisotropies with the Integrated Sachs-Wolfe (ISW) effect can be shown to be peaked on squeezed configurations. For this reason the presence of this effect can be a significant contaminant for estimates of local non-Gaussianity.
So far in this section we have described the degradation effects on the error bars if an hypothetical joint-estimator of all the CMB bispectrum amplitudes was built, and the amplitudes of contaminants were marginalized over to estimate f NL . However a joint-estimation might be difficult, due to factors like the presence of theoretical uncertainties on the shapes of contaminant bispectra or possible practical difficulties in finding an efficient implementation of this full bispectrum-likelihood estimator (e.g. if the additional secondary bispectra are non-separable). As a result, the practical approach so far has been to estimate only f NL using the techniques described in previous sections, and neglect possible non-primordial contaminants. In this case the possible effect of contaminants would not show up as a degradation of the error bars but in an even worse way, by introducing a bias in the f NL measurements. Let us see this by assuming that the CMB three point function takes contributions both from a primordial NG component and from a contaminant bispectrum with amplitude A i . Let us also assume that we can produce a set of NG Monte Carlo simulations of CMB maps including both bispectra. We assign a given f NL in input to the primordial component of our simulated maps. Finally we estimate the average f NL obtained from the simulations by applying the usual optimal cubic statistic described so far. The result of our MC average will be: where B observed is the averaged bispectrum extracted from the map. The f NL term on the r.h.s. of the second line comes from the fact that the normalization N is chosen in such a way as to obtain an unbiased estimator of the primordial component. However a second term is present, which accounts for the fact that a contaminant bispectrum, B cont. 1 2 3 is in the map; this term clearly biases the estimator 13 . The magnitude of the bias will depend again on how similar the shape of the contaminant bispectrum is to the primordial one. If, for example, the contaminant bispectrum is strongly peaked on equilateral configurations and suppressed on squeezed ones, a local estimator of NG will then not be significantly biased by it, since the second term in equation (III.134) will cancel-out. However, an estimate of f equil NL will in this case be significantly biased.
In general we can define the correlation coefficients between two bispectra, labeled i and j, as: (III.135) The definition of "correlation coefficient" becomes completely transparent if we rewrite the previous formula in terms of the Fisher matrix, and keep in mind that F −1 define the covariance matrix of the bispectrum amplitudes: The correlation coefficient vary by definition from 0, for totally uncorrelated shapes, to 1, for identical shapes. The more a given contaminant bispectrum is correlated to the primordial bispectrum that we want to measure, the larger will be the induced bias. At this point we distinguish between three possibilities. The first is that the contaminant bispectrum shape and amplitude are perfectly known. In that case we can compute the expected bias from formula (IV.173) and subtract from our estimate. The second possibility is that the shape of the contaminant bispectrum is known, but its amplitude is defined with a given uncertainty. In this case we can propagate this uncertainty by quoting it in addition to the statistical error bars on f NL obtained in the usual way. The third and worst possibility is that we are unaware of the presence of some contaminant effect, or we know nothing about its bispectrum. In this case we might obtain a biased estimate of f NL without knowing it and thus eventually misinterpret a spurious NG effect as primordial NG. Contaminants are then very dangerous, because if not properly taken into account can lead to spurious claim of detection of primordial NG. For this reason, if a positive detection of f NL were to be made at some point for a certain model, all possible tests for the presence of contaminant effects should be performed. Moreover, since we cannot be absolutely sure that we are considering all possible source of NG contamination, cross-validation of the result using other non-bispectrum based estimators will be very important. These other estimators (Minkowski Functionals, wavelets, needlets, higher order correlators are just some examples among those considered in the literature) are by construction sub-optimal estimators of the primordial component. However, in principle they are expected to produce a totally different response to NG contaminants than the primordial bispectrum. A cross-detection of f NL with many different statistics would then be much less likely due to some unknown spurious effect. Another way to test the primordial origin of an observed NG signal, recently proposed in , is to modify the optimal bispectrum estimator in order to evaluate a function of rather than a single amplitude f NL . The point is that if a clear detection of f NL is achieved at several σ, then the signal is large enough to allow a less radical data compression.  have then recently proposed to estimate the "bispectrum related power spectrum", C skew defined as: Like in the usual f NL estimator, the optimal S /N weighting is included, and the observed bispectrum from the map is correlated to the theoretical shape. However in this case we don't measure the overall amplitude, but rather the amplitude for each -bin. Note thatf by construction of C skew , the usual f NL estimator is then retrieved by summing the bispectrum-related power spectrum over all the . The general idea is now that the functional dependence of this skew power spectrum on will show significant variation between different sources of NG, allowing a clearer test of the hypothesis that the origin of the observed signal is primordial. A number of investigations of WMAP data have already been performed using this statistic in order to look for primordial and secondary signals (Calabrese et al., 2009;Smidt et al., 2009), and related pseudo-C l statistics have been developed in . In any case, as long as bispectrum estimators are considered, independently of the specific statistic or implementation, the best way to deal with NG contaminants is to make sure to list all of them and study their bispectra, or at least find ways to assess their potential impact on the final results. In the following paragraphs we will then turn our attention to a classification of the most important potential sources of spurious NG, and see how they are treated in the primordial bispectrum analysis. Finally, we will consider some effects that interact with the f NL measurement not necessarily by directly producing a secondary bispectrum, but rather by changing the normalization of the estimator or by increasing the error bars without producing any bias.

Diffuse foreground emission
There are three main astrophysical effects producing a galactic microwave emission from our galaxy in the typical frequency range of a CMB experiment (Bennett et al., 2003;Dodelson, 2003): free-free emission from electronion scattering, synchrotron emission from acceleration of cosmic ray electrons in magnetic fields, and thermal dust emission.
Since these sources produce signals with a peculiar spectral and spatial distribution, multifrequency observations allows the separation of them from the primordial component of the CMB signal by suitable component separation algorithms. In the resulting "cleaned" map the foreground contribution to the a m is minimized, although obviously it can never be completely eliminated. The remaining foreground contamination after cleaning is called the foreground residual. Note that the emission from the galactic plane of the CMB map is so strong that a clean separation of the primordial CMB component from the foregrounds is impossible. The galactic regions that are too contaminated to produce a clean component separation have to be masked out in the analysis. The size of the galactic mask will depend on the choice of the foreground flux level above which the pixel is considered too contaminated to be included in the analysis. The choice of the cut-off will depend on the specific analysis that one wants to perform on the data. Since the primordial NG signal is much smaller than the Gaussian component, more conservative masks (i.e. larger) need generally to be used for f NL estimates than those applied to C estimation. Direct information about the spatial distribution of foreground emission in the sky (i.e. free-free, synchrotron or dust) is provided in the form of templates, obtained either from the most foreground contaminated channels of the CMB experiment itself, or from external astrophysical surveys (e.g. observations of radio-emission, maps of Hα emission). Templates are affected by several sources of uncertainties and errors (see e.g. Bennett et al., 2003) and using them in assessing the possible impact on f NL of foreground emission or residuals has both advantages and disadvantages. The safest approach is probably to combine internal consistency tests on the data with analysis involving the use of templates.
The first extensive tests of possible foreground contamination in f NL measurements were performed in , where a detection of a primordial local signal at above 99.5% level on WMAP 3-years data was claimed. As explained earlier (see caption of table III), further analysis on more recent datasets and/or using more optimal estimators have led to an updated f loc.
NL estimate that is about 2-σ away from the origin, i.e. just a "hint" of a possible local signal, rather than a detection. However, as long as a detection was claimed in , tests to exclude a possible contamination from diffuse foregrounds had to be carried out. In this case the authors relied mostly on the "internal consistency test" approach. Their analysis included: 1. Expanding the original galactic mask in order to see if the estimated value of f NL is stable for different choice of the mask. A significantly lower value of f NL for a larger mask might mean that some unmasked noise contribution is affecting the measurement with the original mask.
2. Comparing f NL estimates from foreground reduced maps to estimates from "raw" maps that include a galactic mask, but have not gone through a component separation process. If foregrounds have a significant impact on f NL then one expects the measurements from raw and reduced maps to differ significantly.
3. Comparing different frequency channels. If foregrounds significantly contaminate measurements at given frequencies, then different channels should produce different results.
Analysis involving some kind of prior information about foreground emission were carried on both in  and . The two approaches adopted in this case were: 1. Producing simulations including both a Gaussian primordial CMB signal and the foreground emission. The latter has in this case to be generated according to a model that allows for a good reconstruction of the observed templates. The f NL estimator can then be applied to these simulations in order to check if the measured f NL is consistent with 0 (as it should be, in absence of significant foreground contamination, since the primordial input is Gaussian).
2. For an optimal estimator including full C −1 pre-filtering , adding the foreground templates to the noise covariance, by assigning infinite variance to each template T i templ (n). In this way the estimate is "blind" to the template amplitudes. This produces a loss of information that in turn determines an increase of the variance. The larger the contamination from foreground is, the more the variance increase. For negligible contamination, the variance stays the same. In any case, the effect of foregrounds is entirely included in the error bars, provided the assumed templates are accurate enough. This method of analysis, called template marginalization, is adopted in . A complete mathematical derivation of this method is provided in Rybicki & Press (1992).
In addition to the methods outlined above, there is also the possibility of using the foreground templates for a jointestimation of f NL and of the templates amplitudes (see equation (III.131)). This approach has been recently used in Cabella et al. (2009) for a needlet estimator. It could be obviously reapplied in the same form to a bispectrum estimator.
In conclusion, all the tests above have been applied to WMAP 3-years and 5-years data releases. No evidence for the presence of a significant contamination of the local f NL measurement from diffuse foreground was produced. Other shapes of f NL were not considered since the only type of non-Gaussianity that has produced a marginal detection is so far the local one. Although diffuse foregrounds and foreground residuals do not seem to contaminate primordial NG measurements in WMAP, this is not guaranteed to hold true for Planck, due to its much higher sensitivity.

Unresolved point sources
Extragalactic point sources are the most important foreground at small angular scales (see Wright et al., 2009). Sources are identified by searching the maps for bright spots that fit the beam profile, and then masked out. However not all the sources can be resolved and eliminated in this way. Unresolved point sources contaminate the map and are a source of a NG signal that can potentially interfere with primordial NG measurements. Unclustered extra-galactic point sources have a Poisson distribution and their bispectrum is then simply a constant: b ps 1 2 3 = b ps , (III.139) with an amplitude that has to be estimated from the data and depends on the level of contamination from unresolved sources. We can now use equation (III.136) to estimate the correlation between primordial shapes and the point source bispectrum. For a given choice of the amplitude we can also estimate the expected bias on the f NL estimator. Simulations of NG maps including the bispectrum from point sources can also be produced and the primordial f NL estimator for different shapes applied to them in order to estimate the bias. Finally, since b ps 1 2 3 is manifestly separable, an estimator of b ps can be built. All of these analysis were performed in Komatsu et al. (2009Komatsu et al. ( , 2003  NL was found in this case. As for the diffuse foreground case, the enhanced f NL sensitivity that Planck can achieve with respect to WMAP might increase the impact of these effects.

Secondary anisotropies
One big advantage of using CMB anisotropies to test primordial NG is that they are small and can then be treated in the linear regime. The CMB temperature fluctuation field is thus linked to the primordial potential through a linear convolution with radiation transfer functions, as we saw earlier. At this level, the Gaussianity of the primordial potential is conserved in the CMB temperature fluctuation field. If, however, we work at second order in perturbation theory, the initial conditions are propagated non-linearly into the observed CMB anisotropies, and the resulting CMB fluctuations are mildly non-Gaussian even starting from a Gaussian primordial curvature field. Second order effects are clearly very small. However they may well be of the same order of magnitude as primordial NG, since the NG component of the primordial potential is O f NL Φ 2 L (x) . In conclusion, secondary anisotropies are a potential source of CMB NG, at a level that could in principle contaminate estimates of primordial non-Gaussianity. To fully account for these effects, it is necessary to obtain a relation analogous to equation (III.30), but to second order in perturbation theory. Radiation transfer functions are obtained at first order by solving the linearized system of Boltzmann-Einstein equations (see e.g. Dodelson, 2003;Ma & Bertschinger, 1995). The same equations will then have to be expanded and numerically integrated at second order in this case. Having obtained second order transfer functions, the full angular bispectrum of secondary anisotropies can be calculated and correlated to the primordial one in order to check for the presence of contaminant effects. A full second order Boltzmann code is actually not yet available, although much progress has been made over the past few years in that direction. The full system of second order Einstein-Boltzmann equations has been derived in (Bartolo et al., 2006a,b;Pitrou, 2009;Pitrou et al., 2008) and partially integrated numerically in (Nitta et al., 2009) including only the source terms that can be written as product of first order perturbations. These terms have been shown to produce a totally negligible NG contamination. No numerical evaluation is available to date of the "genuine" second order source terms. Although a full solution of the relevant equations hasn't been obtained yet, a significant number of secondary effects are known, and have been modeled for some time. Among these there are for example weak lensing, Sunyaev-Zeldovich (SZ) effect, Rees-Sciama (RS) effect, and so on. Therefore, a natural approach that was adopted in the literature consisted in studying the bispectra arising from these well-known effects and from their correlations (e.g. ISW-lensing correlation, SZ-lensing correlation and so on). It goes beyond the purpose of this review to discuss in detail these results and their implications. Let us just mention them briefly. A fisher matrix analysis in Serra &  showed that the combination of bispectra arising from ISW-lensing, SZ-lensing and unresolved point sources produced a negligible contamination at the angular resolution and sensitivity of WMAP, but a significant one for an experiment with the characteristics of Planck. It was in particular shown that estimates of local NG would be biased, especially by ISW-lensing correlation, with f bias NL ∼ 10 for local NG. A similar result on ISW-lensing was obtained in another Fisher matrix analysis by Smith & Zaldarriaga (2006), and a similar level of contamination was found in Mangilli & Verde (2009) for the analogous RS-lensing bispectrum. A bispectrum estimator of local and equilateral NG was applied to simulated lensed primordial NG CMB maps by Hanson et al. (2009), and three main effects were studied: a possible bias induced by neglecting the lensing of primordial bispectrum in the normalization and weights of the estimator, an increase of the variance due to lensingproduced higher order correlators, and ISW-lensing bias. The only significant effect turned out to be the ISW-lensing bias on f loc. NL , at a level confirming Fisher matrix predictions. Note that this bias, being well-known and expected, can be simply calculated and subtracted from future Planck estimates. The reason why the coupling between lensing and ISW tend to bias the local estimate can be understood physically: large scale potential fluctuations source the ISW effect and produce a lensing signal on small scales, generating a NG signal on squeezed triangles. Although both the primordial local bispectrum and the ISW-lensing bispectrum are peaked on squeezed triangles, the presence of acoustic oscillations in the primordial configurations reduce the overall correlation between the two shapes, thus making the final bias significant, but not too large. In order to conclude our brief survey of studies of secondary bispectra, let us mention the work done in Babich & Pierpaoli (2008), where point source density modulation bispectra induced by lensing magnification and selection effects, as well as SZ modulation from lensing magnification were studied. The conclusion was again that these effect are negligible for WMAP but close to the sensitivity level of Planck. Despite the great attention received so far in the literature, much work still has to be done in the area of assessing NG contamination from secondary sources. It is clear that a complete and accurate description of secondary bispectra will be crucial for analysis of the future Planck data set.

Non-Gaussian noise
Systematics are another potential cause of contamination beyond astrophysical and cosmological sources. The noise in the experiment is generally well described as Gaussian. However possible non-Gaussian properties have to be tested in our context. This was done in  by taking differences of yearly WMAP data in order to create jackknife realizations of WMAP noise maps for different detectors, including instrument systematics. The estimator can then be applied to these realizations in order to check that a negligible f NL is measured. This was the result obtained on the WMAP 3-year data-set.

Other effects
In this section we quickly summarize other effects that could interfere with estimates of primordial non-Gaussianity, but did not fit the classification above in the sense that they do not correspond to NG effects contaminating the CMB sky or the instrument noise.
One of this effects is 1/ f noise, expected to affect especially the low frequency channels of Planck. The 1/ f noise component is generally removed from the map using "destriping" algorithms (see e.g. Efstathiou, 2005;Maino et al., 2002). The unsubtracted "destriping residuals" form a Gaussian correlated random field in pixel space. Their nontrivial covariance matrix should in principle be included in the inverse covariance pre-filtering of the optimal estimator. If not included in the pre-filtering, this effect could in principle enhance the estimator error bars (although it cannot generate any bias, since it is Gaussian). Unfortunately, a full numerical evaluation of this covariance matrix is quite challenging. Donzelli et al. (2009) applied the estimator in its pseudo-optimal implementation to maps of Gaussian CMB signal + noise, accounting only for anisotropic noise in the linear term, but including destriping residuals in the noise model adopted for the simulations. The final result shows that the error bars do not increase when 1/ f noise effects are included in the simulations, even though they are neglected in the covariance matrix appearing in the estimator.
Another effect to take into account for Planck is that of an asymmetric beam. The beam in the estimator normalization term is approximated as a circular beam. However Planck optical simulations (e.g. Sandri et al., 2004) show that in reality we have to deal with elliptic beams, characterized by a non-trivial azimuthal dependence. If the circular beam approximation in the normalization of the estimate is not accurate enough, a bias could be introduced. Moreover the anisotropy of the beam could cause an increase of the variance if neglected in the inverse covariance pre-filtering. Again, these effects were found to be negligible in tests on realistic simulations performed by Donzelli et al. (2009).
Finally, the estimate of f NL is done assuming a given cosmological model, i.e. by fixing all the other cosmological parameters to their best-fit value obtained from a likelihood analysis of the angular power spectrum. Since they are themselves the product of a statistical estimation process, these values obviously present uncertainties that should be propagated into the final f NL -error bars 14 . This calculation was done in Liguori & Riotto (2008), where it was found that the propagated error is f NL dependent and it can become important only if a large f NL will be detected in the data at some point.

J. Generation of simulated non-Gaussian CMB maps
In this section we will describe algorithms for the generation of non-Gaussian CMB maps with a given bispectrum. There are three main reasons why primordial NG simulations of the CMB are useful in the context of bispectrum estimation of f NL : 1. To test the unbiasedness of the f NL bispectrum estimator (by checking that the Monte Carlo average of the recovered f NL reproduces the f NL set in input).
2. To study how the expected primordial NG signal imprinted in the CMB is modified by the presence of other effects, like those considered in section III.I. For example, weak lensing of primordial NG might in principle change the observed bispectrum and affect the estimates. This can be studied again by testing the estimator on NG lensed simulations, as it was done in Hanson et al. (2009) 3. For local NG, to obtain the error bars of the f NL estimator if a large f NL is detected at several σ (see section III.E.4). We have previously seen that for a several sigma detection of local NG the bispectrum variance is f NLdependent. The Monte Carlo average (III.117) thus has to be evaluated on NG simulations with the measured f NL in input.
Unless we are in the situation described at point 3. of the list above all we need to produce is then maps with given power spectrum and bispectrum, since higher order correlators can be neglected. In the large local f NL case higher order correlators are instead important and have to be included. Fortunately the local case is the only one for which we have a full expression of the primordial potential Φ(x) that allows us to produce exact simulations. We will divide this section in two parts. In the first we will describe exact simulation algorithms of local NG, while in the second we will describe methods to generate maps with given power spectrum and bispectrum, starting from an arbitrary primordial shape.

Algorithms for local non-Gaussianity
First of all, let us recall that the CMB multipoles a m are related to the primordial gravitational potential Φ through the well known formula: where ∆ (k) are the radiation transfer function and the potential is written in Fourier space. We already met this formula when we calculated the relation between the primordial and CMB bispectrum in section III.A. We also recall that the local non-Gaussian primordial potential takes a very simple expression in real space, where: In the previous expression Φ L is a Gaussian random field, characterized by a primordial power spectrum P Φ (k) = Ak n−4 ; in the following we will refer to Φ L (x) as the Gaussian part of the primordial potential. The remaining non-Gaussian part of the potential is simply the square of the Gaussian part point by point (modulo a constant term, necessary to enforce the condition Φ(x) = 0; however it is clear that this term only affects the CMB monopole). It is then convenient to work directly in real space and recast formula (III.140) in the following form where α (r) ≡ dk k 2 j (kr)∆ (k), also used in (III.31), is the real space counterpart of the radiation transfer functions ∆ (k), j (kr) is a spherical Bessel function, and r is a look-back conformal distance. This formula suggests to structure an algorithm for the generation of local CMB NG maps in the following steps 1. Generate the Gaussian part Φ L of the potential in a box whose side is the present cosmic horizon.
2. Square the Gaussian part point by point to get the non-Gaussian part.
3. Expand in spherical harmonics the Gaussian and non-Gaussian parts of the potential for different values of the radial coordinate r in the simulation box.
4. Convolve the spherical harmonic expansions of Φ L and Φ NL with the radiation transfer function ∆ (r) in order to obtain the Gaussian and non-Gaussian part of the multipoles of the final NG CMB simulation. For a given choice of the non-Gaussian parameter f NL a CMB map is then obtained simply through the linear combination a m = a L m + f NL a NL m (the superscripts L and NL always indicating Gaussian and non-Gaussian respectively). The most difficult and time consuming part in this process is actually the generation of the Gaussian part of the potential Φ. One possibility is to generate the Gaussian part of the potential in a cubic box in Fourier space (where different modes are uncorrelated and have variance given by the primordial power spectrum P Φ (k), then apply a Fast Fourier Transform (FFT) algorithm to go to real space. Cartesian coordinates are then transformed into spherical coordinates by means of an interpolation algorithm in order to transform Φ L (x) into Φ L (r,n). Finally, the Gaussian potential in spherical coordinates is squared point by point to get the NG part and the spherical harmonic expansion and radiation transfer function convolution at point 4 of the list above are performed in order to obtain the multipoles of the final CMB map. The aforementioned algorithm was implemented in Komatsu et al. (2003) to generate NG local CMB maps at the resolution of the Planck satellite.
The difficulty with this approach arises from the fact that we are working in a box of the size of the present cosmic horizon, (about 15 Gpc in conformal time), but at the same time a cell in this box must have a side no bigger than 20 Mpc in order to resolve the last scattering surface, where most of the CMB signal is generated. A more convenient and accurate way to produce the local NG a m was found in (Liguori et al., 2006;Liguori et al., 2003): the idea is to work directly in spherical coordinates, use a non uniform discretization of the simulation box (since no sample points are needed in a large region of the box where photons are just free streaming, while many sample points are needed at last scattering, as we just pointed out above) and generate the multipoles of the expansion of Φ L (x) through the following two step approach: 1. Generated uncorrelated radial multipoles n m (r), gaussianly distributed and characterized by the following spectrum: n 1 m 1 (r 1 )n * 2 m 2 (r 2 ) = δ D (r 1 − r 2 ) r 2 δ 2 1 δ m 2 m 1 ; (III.143) where δ D is the Dirac delta function.
2. Filter the multipoles n m with suitable functions in order to produce a Gaussian random field with the properties of the multipole expansion of the primordial Gaussian potential Φ L . It can be shown that the expression of the filter functions is: where P Φ is the primordial curvature power spectrum, and the filtering operation takes the form Φ L m (r) = dr 1 r 2 1 n m (r 1 )W (r, r 1 ) . (III.145) In the last expression Φ L m (r) are the desired quantities, i.e. the multipoles of the expansion of the Gaussian part of the primordial potential for a given r.
This algorithm, recently improved in Elsner & Wandelt (2009), was used to produce NG local maps at the resolution of WMAP and Planck in temperature and polarization. An example of its results is shown in the upper panels of fig. 12. Fig. 13 shows 1-point PDFs of temperature anisotropies for different values of f loc. NL , extracted from these simulations.

Algorithms for arbitrary bispectra
In the limit of weak non-Gaussianity, an algorithm to produce non-Gaussian CMB simulations with a given power spectrum and bispectrum for separable primordial shapes was described in Smith & Zaldarriaga (2006). In this algorithm the non-Gaussian components of the CMB multipoles are obtained using the following formula: where a G m is the Gaussian part of the CMB multipoles, generated using the angular power spectrum C , while B 1 2 3 is the given bispectrum of the theoretical model for which simulations are required. Note that alternative algorithms to generate CMB maps with given bispectrum have been proposed in the literature (Contaldi & Magueijo, 2001;Rocha et al., 2005) but they are less general that the one introduced by equation (III.146). Although (III.146) is completely general, as before its numerical evaluation is only computationally affordable for bispectra that can be written in separable form. We have emphasized already that separability results in a reduction of the computational cost of the estimator (III.100) from O( 5 max ) to O( 3 max ) operations; the same argument applies here, and allows to rewrite (III.146) into an equivalent form in pixel space. Starting from formula (III.46), and substituting it in (III.146), we find: As already discussed in the f NL -estimator section, the limitation dictated by separability is clearly overcome by using the eigenfunction representations for the bispectrum (II.26) and (III.60) introduced in Fergusson et al. (2009). As usual, the basic idea is to start by expanding an arbitrary bispectrum shape S (either primordial or in the CMB) using a separable polynomial decomposition until a good level of convergence is achieved and then to substitute the mode decomposition into (III.146) to get a linear combination of numerically tractable terms written in the form (III.147). NL , in order to give an idea of the order of magnitude of the signal that one wants to detect. For f NL < 1000 the non-Gaussianity is too small to be seen this this plots. Note that WMAP constrain f loc. NL to be 100.
Using the separable mode coefficients α prs for the reduced bispectrum (III.56) and the filtered map expressions M p (r,n) (III.110) as the starting point, we find that the expression (III.147) generalises to a NG lm = 1 18 prs α prs dxx 2 q l p (x) dΩn Y m * l (n) M G r (r,n) M G s (r,n) , (III.148) where the M G p (r,n) are found by summing using a set of Gaussian a G m 's convolved with the q l p 's (refer to eqn (III.108)), Here, the accuracy of convergence with the α prs is parametrized in terms of the correlationC(S , S N ) between the original non-separable shape and the eigenmode expansion, as defined previously (II.19). Note that this convergence can also be checked more accurately using the full Fisher matrix correlation on the CMB bispectra C(b 1 2 3 , b N 1 2 3 ), described in sections III.H and III.I.
In addition to the bispectrum separability requirement, there is an important further caveat which can prevent the straightforward implementation of the algorithm (III.146). By construction, terms O( f 2 NL ) and higher are not explicitly controlled. Following the discussion in (Hanson et al., 2009) we can write the connected N-point functions as: Thus the condition that the map has the power spectrum C l specified in input will only be satisfied if the power spectrum of the non-Gaussian component in (III.150) remains small. Since this method does not control O( f 2 NL ) terms, one has to ascertain that spuriously large C NG l contributions do not affect the overall power spectrum significantly. It turns out that this effect plagues current map simulations if the standard separable expressions for the local and equilateral bispectra are directly substituted into (III.146). However a slight modification of equation (III.146), described in Hanson et al. (2009) andFergusson et al. (2009) allows us to overcome this problem at no computational cost. Moreover, it was shown by Fergusson et al. (2009) that maps obtained from the eigenmode expansions (II.26) and (III.60) are stable independently of the shape under study, thus making this map-making generating algorithm robust and fully general. Examples of DBI NG maps produced by combining the eigenmode expansion method with the map making algorithm described in this section are shown in the lower panels of fig. 12.

IV. LARGE-SCALE STRUCTURE
In the standard scenario, early perturbations produced during inflation are responsible for the common origin of the CMB temperature fluctuations and the large-scale matter and galaxy distributions in the Universe, i.e. the large-scale structure. The Cosmic Microwave Background provides a remarkable example of a Gaussian random field in nature. Information on cosmological parameters is in fact derived from measurements of its power spectrum, the C l 's, while bispectrum measurements from WMAP data remain consistent with zero. The distribution of matter, as we can infer today from shear or galaxy observations, unlike the CMB, can be described as a highly non-Gaussian random field, even for Gaussian initial conditions. The matter overdensity δ(x) is defined in terms of the matter density ρ(x) and its mean valueρ by with zero mean by construction. Here, at late times, the limiting value δ = −1 in voids, accounting for a large fraction of the volume of the Universe, while achieving values δ 1 in collapsed objects such as dark matter halos. Its probability distribution function is therefore expected, at low redshift, to depart strongly from a Gaussian distribution centred at δ = 0, even though it could be well approximated by it at decoupling, when perturbations around δ = 0 were of the order of δ ∼ 10 −5 . Such non-Gaussianity is the result of the nonlinear evolution of structures subject to gravitational instability.
In addition, nonlinearities in the bias relation between the galaxy and matter distributions constitute a second source of non-Gaussianity in the large-scale structure mapped out by redshift surveys. Non-Gaussian initial conditions would therefore provide a third component in the non-Gaussianity of the galaxy distribution. The question regarding the detection of effects due to primordial non-Gaussianity, is therefore strictly related to our ability to distinguish between these different contributions and, ultimately, it will depend on the robustness of our theoretical predictions in the linear and mildly nonlinear regime. From this respect, cosmological Perturbation Theory (PT), and its more recent developments, is very important for providing the tools to study the evolution of non-Gaussianities and how to differentiate their origin.
Considering only the matter distribution, the leading order prediction in standard PT for the matter bispectrum at large scales is given by the sum of a primordial component and a component due to gravitational instability, which is present also for Gaussian initial conditions. Until fairly recently it was assumed that this picture could be easily extended to the galaxy distribution, with the galaxy bispectrum receiving an additional contribution due to nonlinear bias. Following the historical development of the subject, in section IV.A we will discuss early work on higher-order moments of the matter and galaxy distribution, starting with the skewness. Here, most of the theoretical results on higher-order correlation functions are developed. We will then consider in IV.B the matter bispectrum and its description in Eulerian perturbation theory, with specific attention given to effects at large scales due to a primordial component, as well as at small-scales, nonlinear corrections in presence of non-Gaussian initial conditions. In section IV.C, we will deal with the galaxy bispectrum. We will first introduce the simple model based on local bias and discuss problems related to bispectrum measurements in redshift surveys with specific attention given to the detection of primordial non-Gaussianity. We will see how early results indicated that the galaxy bispectrum could be used as a tool to constrain non-Gaussian initial conditions which is, in principle, competitive with the CMB, illustrating this with actual results from current data-sets. We will then consider the outcome of recent N-body simulations with non-Gaussian initial conditions showing that the simple prediction for the galaxy bispectrum assumed in most of the previous literature on the subject fails to describe not only the measured halo bispectrum, but even the halo power spectrum, even at large scales! We now know that correlators of biased populations such as galaxies and dark matter halos, receive large corrections, at large scales, from local primordial non-Gaussianity. These results opened up new and promising opportunities for detection in future large-scale structure observations. Although, in our view, a proper understanding of these effects remains to be adequately developed at the time of writing, particularly with respect to higher-order galaxy correlation functions, we will describe the different descriptions proposed so far in the literature and the prospects for detection of primordial non-Gaussianity in measurements of the galaxy bispectrum.
From an historical perspective, non-Gaussian initial conditions have been studied for quite a long time. For instance, early works on the clustering of density peaks and rare objects can be found in Grinstein & Wise (1986) ;Lucchin & Matarrese (1988); Matarrese et al. (1986), while early N-body simulations with non-Gaussian initial conditions go back to the early eighties (Coles et al., 1993;Messina et al., 1990;Moscardini et al., 1991;Weinberg & Cole, 1992;White, 1999). In the early days, a large variety of non-Gaussian models, often defined in terms of a nonlinear transformation of a Gaussian field were considered. In some cases, a large non-Gaussian component were studied because, on the one hand, they could be used falsify some models and, on the other, as a way to reconcile contradictory observational results with theoretical frameworks. In this review, however, we will consider only models predicting small departures from Gaussian initial conditions which are consistent with CMB observations. Finally, while we focus in this review on direct bispectrum measurements, it should be stressed that the effects of primordial non-Gaussianity on large-scale structure are not limited to corrections to its higher-order correlation functions. Aside from the recent results on the galaxy power spectrum mentioned above, significant departures from Gaussian initial conditions are expected to have important effects on the halo mass function and therefore on the observed cluster number density. See section 2.1 in   In addition, the corresponding effect on the abundance of voids has been studied by Kamionkowski et al. (2009), while the possibility of constraining primordial non-Gaussianity from measurements of Minkowski Functionals in large-scale structure has been explored by Hikage et al. (2008Hikage et al. ( , 2006. Further effects on the intergalactic medium and reionization (Crociani et al., 2009;Viel et al., 2009) or on future 21cm observations (Cooray, 2006;Pillepich et al., 2007) have also been investigated. We refer the reader to other reviews in this issue for a more complete discussion of these alternative approaches.

A. The skewness
Since the first large-scale observations did not allow an accurate determination of individual bispectrum or trispectrum configurations, most of the attention in the early literature focused on the moments of the galaxy distribution, and, in the first place on the third-and fourth-order moments, i.e. the skewness and kurtosis, respectively. The "normalized" moment of order p can be defined in terms of the smoothed density field δ R (x) as For Gaussian initial conditions, a perturbative treatment of the equations of gravitational instability predicts at leading order (Peebles, 1980) s 3,R = 34 7 σ R , (IV.155) with σ 2 R = δ 2 R , computed in linear theory. When non-Gaussian initial conditions are present, one expects an extra contribution to the skewness, typically with a different relation with σ R , whose value depends on the non-Gaussian model. Comparisons between the second-and third-order moments, S 3,R and σ R , (as well as higher-order moments such as the kurtosis) measured in redshift surveys have been early recognized as a tool to test the Gaussianity of primordial perturbations, Coles & Frenk, 1991;Coles et al., 1993;Fry & Scherrer, 1994;Juszkiewicz et al., 1993;Lahav et al., 1993;Lucchin et al., 1994;Luo & Schramm, 1993). These works recognized as well the importance of reliable predictions in the nonlinear regime and of a proper modeling of the effects of galaxy bias. In this respect, Fry & Scherrer (1994) proposed a more quantitative prediction for the contribution to the galaxy skewness due to galaxy bias based perturbation theory and on the local bias expansion of (Fry & Gaztañaga, 1993). They derived, for the skewness of the galaxy distribution, an expression of the form where we assumed non-Gaussian initial conditions described by a non-vanishing initial skewness s (0) 3,R (but vanishing higher-order moments) and where b 1 and b 2 represent constant bias parameters typical of the galaxy population (which ODEL el dynamics is gov- is conformal time, al with the vacuumar sigma model for the modified leap-Turok 1994; Nagaulating the density texture dynamics is ity perturbations by itions for a gravitauctuations are then stwood 1981; Efsta-We have tested our cts, and our models ion simulations. The del turns out to be model characterized e we use ⍀ ϭ 1 and ure models can be for different matter etails are given in tions are estimated from Baugh et al. (1995). ions in each ensemers to estimate the e evolution of the variance in spheres Figure 1 shows the variance in the texture model (filled triangles) at two different epochs, 8 ϭ 0.1 and 8 ϭ 1.0. Note that the initial results match roughly the linear variance in the CDM ⌫ ϭ 0.5 model (dashed line), although in detail they are closer to the ⌫ ϭ 0.7 shape (continuous line). The evolved nonlinear variance at 8 ϭ 1.0 is close to the corresponding nonlinear variance in the Gaussian ⌫ ϭ 0.5 simulations (open circles). Note that the deviations from the linear growth are similar in the two cases, indicating that the initial non-Gaussianities have only a small effect in the nonlinear growth.
The top panel in Figure 2 shows the initial values of the normalized skewness; S 3 ϵ 3 / 2 2 in both models. The Gaussian model (open circles) matches well with the Zeldovich approximation, as expected (see Baugh et al. 1995). The non-Gaussian model shows a characteristic increase of S 3 with scale. At large scales, R ? 10 h Ϫ1 Mpc, a fit of the form 3 ϭ A 2 ␣ yields A 3 1 and ␣ 3 3/2 ϩ 0.1 (Fig. 2, dotted line). This is close to the strongly non-Gaussian transition mentioned in the Introduction. The lower panel in Figure 2 shows the evolution of S 3 for Gaussian models (open circles) at 8 ϭ 1 in comparison with the non-Gaussian models at 8 ϭ 0.4 (triangles) and 8 ϭ 1 (squares). The Gaussian models reproduce quite well the PT predictions at large scales. As the texture simulations evolve, the shape and amplitude of S 3 in the non-Gaussian model slowly approaches the one in the Gaussian models at small scales. At larger scales there is a change in the slope of S J , showing a characteristic minimum which separates the regime where gravity starts dominating the evolution from the one in which the initial conditions are still the dominant effect, i.e., equation (2) We find that the initial conditions follow: J ϭ 2 J/ 2 ϩ ⑀ , 0.1 (Fig. 3, dotted line), again close to the transition to non-Gaussian initial conditions. This tendency agrees the predictions for J ϭ 3 Ϫ 4 by Turok & Spergel he time evolution, at intermediate scales R 3 10 h Ϫ1 be approximated by z ϭ 0 ( 8 ϭ 1.0) and z ϭ 1.5 ( 8 ϭ 0.4).

COMPARISON WITH THE APM
ure 4 we show the values of S 3 , S 4 , and S 5 estimated angular APM Galaxy Survey (Gaztañaga 1994), no evolution in S J (filled symbols) or the texture (open s ymbol s) given by equation (4). The model for tion leads only to small differences, since the mean n the APM is only z3 0.15. These three-dimensional s result from using a simple scaling law to model the n effects. Although there are some potential probh this modeling (Bernardeau 1995), we believe that sults are accurate (see Gaztañaga 1995;Baugh & ga 1996). Depending on the way counts-in-cells are d, the mean angular amplitudes could increase or rapidly with scale at the largest scales, i.e., l ? 10Њ 1 Mpc). These diverging estimates are not reliable, pling effects from the finite APM volume dominate the statistics at these larger scales (Gaztañaga 1994;Baugh & Gaztañaga 1996). Large-scale galaxy fluctuations, ␦ g , might be biased tracers of the underlying matter fluctuations, ␦. To account for this possible bias and uncertainties in the normalization of the texture model, we consider different outputs and scale them with different biasing prescriptions. In Figure 4 we show S J in the texture model for two different outputs, together with the values at 8 ϭ 0.4 normalized with a linear biasing relation, ␦ g ϭ b␦, which produces S J, g ϭ S J /b J Ϫ 2 . We have chosen b ϭ 1.5 as the optimal value to match the APM amplitudes S J, g around 8 h Ϫ1 Mpc. Note that for 8 ϭ 0.4 the linear bias requires b ϭ 2.5 if we want to fit the APM variance at 8 h Ϫ1 Mpc, but this value of b produces a poor matching for S J, g . We introduce more biasing parameters with a nonlinear transformation: which for small variances, 2 Ͻ 1, still gives a linear relation for 2 but changes the final amplitudes to S g, J given by equation (10) in Fry & Gaztañaga (1993), e.g., S 3, g ϭ (S 3 ϩ 3c 2 )/b. For the texture amplitudes at 8 ϭ 0.4 we have to fix b ϭ 2.5, c 2 3 1, c 3 3 6 and c 4 3 Ϫ180 to match 2 , S 3 , S 4 , and S 5 at 8 h Ϫ1 Mpc in the APM. The resulting shapes are shown as long-dashed lines in Figure 4. For scales up to 40 h Ϫ1 Mpc in S 3 , or up to 20 h Ϫ1 Mpc in S 4 , we find no significant minimum or rise within the errors in the APM, in contrast to the unbiased or biased texture predictions.

CONCLUSION
In Gaussian models with a different initial power spectrum there is an excellent agreement for S J between PT and N-body simulations on scales where the variance is approximately linear (Juszkiewicz et al. 1993;Bernardeau 1994;Gaztañaga & Baugh 1995). The values of S J do not evolve much with time. we will discuss explicitly in section IV.C). This relatively simple expression describe the skewness measured in galaxy surveys, as the sum of three components corresponding to three sources of non-Gaussianity for the galaxy distribution: one primordial, one due to gravitational instability and the last to nonlinear bias. Further studies in perturbation theory can be found in (Chodorowski & Bouchet, 1996;Durrer et al., 2000) while an alternative derivation of the smoothed moments of the density field based on the spherical collapse model has been studied in (Gaztañaga & Fosalba, 1998). The skewness predicted by texture models has been studied in simulations as a function of the smoothing scale R by Gaztañaga & Mähönen (1996) and compared to measurements of the same quantities in the APS Galaxy Survey (Gaztañaga, 1994), see fig. 14. The differences between the s 3,R in the non-Gaussian texture model with respect to the Gaussian case provides a qualitative example of the typical effects that we expect for non-Gaussian initial conditions as a function of the smoothing scale R and redshift. The measured skewness, as higher order moments, corresponds to a single number. Despite the possibility to study its peculiar dependence on the smoothing scale R, it is nevertheless difficult to separate the different components, particularly with respect to bias effects. However, this possibility is offered in principle by direct measurements of the galaxy bispectrum, relying on its dependence on the shape of triangular configurations. In the next sections we will discuss in details first the bispectrum of the matter distribution then the bispectrum of the galaxy distribution, a direct observable in redshift surveys.

B. The matter bispectrum
In this review we will focus on the predictions for correlation functions in Fourier space from Eulerian Perturbation Theory (PT). This approach solves perturbatively the equations for the matter density and velocity field evolution governed by gravitational instability. These are the continuity equation, the Euler equation and Poisson equation relating the matter density and the gravitational potential. In the PT framework, the relation between the results and the initial conditions, given in terms of the initial correlators of the density field is particularly transparent. Moreover, recent works have significantly extended, as we will discuss later, the predicting power of this specific tool. Different approaches are also available: see, for instance, Scoccimarro (2000a) for a comparison between bispectrum measurements in N-body simulations and predictions in Lagrangian Perturbation Theory. We refer the reader to  for a comprehensive review of cosmological perturbation theory of the large-scale structure.

Leading-order results in Perturbation Theory
As mentioned before, we consider specifically models where non-Gaussian initial conditions are completely given in terms of the correlators of the curvature perturbations at early times, and the mechanism responsible for the extra non-Gaussian properties of the density field is not active during the subsequent evolution of matter perturbations, governed only by gravitational instability. In PT, the solution for the evolved matter density contrast is expressed as a series of corrections to the linear solution δ (1) (Fry, 1984) where each term can be written formally as 15 with F n (q 1 , . . . , q n ) representing the symmetrized n-order kernel in PT. The initial conditions in the Gaussian case are completely specified by the linear power spectrum P 0 (k), with δ (1) k 1 δ (1) k 2 = δ D (k 12 )P 0 (k 1 ), where we adopt the notation k i j ≡ k i + k j . Non-Gaussian initial conditions are described, in the first place, by a non-zero expression for the threepoint function of the linear solution, that is δ (1) k 1 δ (1) k 2 δ (1) k 3 . In turn, the initial matter correlators, i.e. the correlators of the linear solution δ (1) , are given in terms of the correlators of the curvature perturbations, as Notice that we denote with Φ the primordial curvature perturbations, i.e. evaluated during the matter dominated era, not their value linearly extrapolated at present time 16 . The linear, i.e. initial, power spectrum is given by while the initial bispectrum and trispectrum are (IV.164) Notice that given these simple relations between curvature and primordial matter correlators, issues such as the property of separability discussed in section II.B for the CMB bispectrum are not present in the case of three-dimensional, large-scale structure observables. The nonlinear power spectrum is obtained perturbatively from the expansion where the term δ (1) k 1 δ (1) k 2 corresponds to the linear solution, P 0 (k) while the other terms represent, in analogy with perturbation theory in quantum field theory, one-and higher loop corrections as they involve integrations over internal momenta. In particular, the term δ (1) k 1 δ (2) k 2 vanishes for Gaussian initial conditions as it depends on the initial bispectrum B 0 (see Taruya et al. 2008 for an analysis of nonlinear corrections to the matter power spectrum due to primordial non-Gaussianity).
In a similar fashion, nonlinear corrections in (IV.157) provide a perturbative expansion for the matter bispectrum, In this case, the leading order contributions are given by the tree-level terms δ (1) k 1 δ (1) k 2 δ (1) k 3 and δ (1) k 1 δ (1) k 2 δ (2) k 3 , with the first being the initial component and the second corresponding to a contribution to the matter bispectrum due to gravity alone, of the form B tree G (k 1 , k 2 , k 3 ) = 2F 2 (k 1 , k 2 )P 0 (k 1 )P 0 (k 2 ) + 2 perm. .

(IV.167)
Notice that this contribution is present even for Gaussian initial conditions as it depends only on the initial power spectrum P 0 and describe the emergence of non-Gaussianity due to gravitational instability. The leading order, treelevel expression of the matter bispectrum with non-Gaussian initial conditions is therefore given in terms of the sum B tree (k 1 , k 2 , k 3 ) = B 0 (k 1 , k 2 , k 3 ; z) + B tree G (k 1 , k 2 , k 3 ; z) .

(IV.168)
This expression corresponds to the first two terms on the r.h.s. of (IV.156) for the skewness, which can be obtained from (IV.168) by integration. The possibility of distinguishing the primordial component B 0 from the gravity-induced one B G relies on their specific and distinct dependence on scale, on the triangular configuration shape and on redshift. For a primordial non-Gaussianity described by a curvature bispectrum obeying the hierarchical scaling B Φ ∼ P 2 Φ , typical of weakly non-Gaussian models such as the local and equilateral ones, the different redshift and scale dependence of the two contributions is evident in their ratio for equilateral triangles (k 1 = k 2 = k 3 = k), given by 17 . (IV.169) We therefore expect, for a wide range of non-Gaussian models, the initial contribution B 0 to be larger at large models, for values of the respective parameters f NL corresponding to the current 95% C.L. limits , with the shaded area indicating the allowed region.
In addition, B tree G presents a specific dependence on triangle shapes, determined by gravitational instability and described by (IV.167) at tree-level. The shape dependence of B 0 , determined by the specific non-Gaussian model under consideration, is generically different. Such differences can be explicitly shown in plots of the reduced bispectrum, defined as Q(k 1 , k 2 , k 3 ) = B(k 1 , k 2 , k 3 ) P(k 1 )P(k 2 ) + 2 perm.
, (IV.170) which removes the redshift and scale dependencies of the gravity contribution. fig. 16 shows the reduced bispectrum Q(k 1 , k 2 , k 3 ) at tree-level in perturbation theory, at z = 1 for k 1 = 0.01 h Mpc −1 , k 2 = 1.5k 1 as a function of the angle θ between k 1 and k 2 . In all panels, the continuous line represents the gravity-induced term which assumes larger values for nearly collapsed triangles, i.e. for θ 0 or π. This indicates that the probability of finding larger values for the matter density in triplets of points forming a squeezed or folded triangle is larger than for nearly equilateral triangles. This prediction is confirmed by the typical filamentary nature of the large-scale structure, evident from snapshots of N-body simulations or images of redshift surveys, since along these filaments it is easier to form collapsed triangles than equilateral ones. It should be stressed that the bispectrum is, in fact, the lowest order statistic sensitive to the three-dimensionality of structures and that these features are not captured by the information contained in the power spectrum alone. The effects of the primordial component on the matter bispectrum are shown by the dashed lines which correspond, as in fig. 15 to the 2-σ limits from CMB observations, in the case of the local (upper left panel), equilateral (upper right panel) and orthogonal (lower left panel) models while they correspond to the values f NL = ±300 in the folded case, for which no experimental bounds are available. Although the large scales k 1 = 0.01 h Mpc −1 and k 2 = 0.015 h Mpc −1 and the relatively high redshift z = 1 have been chosen to enhance the effect of the non-Gaussian component, these triangles are not completely out of reach for future, large-volume surveys. Primordial non-Gaussianity modifies, in very specific ways, the shape dependence of the matter bispectrum produced by gravitational instability. While the dependence of the matter bispectrum on scale and redshift is responsible for the specific behavior of the skewness of the matter density field on the smoothing scale R and redshift, the sensitivity to the triangle shape is completely lost in analysis of the density higher-order moments. Instead, accurate measurements of the bispectrum, when achievable, offer in principle the possibility to disentangle the different contributions when triangles of different size and shape are included in the analysis.
The matter bispectrum is not, unfortunately, a direct observable. While we will discuss later how the statistical properties of the matter distribution can be inferred from galaxy redshift surveys, we should mention that the shear field in weak lensing surveys is another observable directly related to the matter distribution. The observational consequences on the weak lensing bispectrum, of a primordial non-Gaussian component (of the local type) such as the one in (IV.168), have been explored in Takada & Jain (2004). The authors find that, the primordial component alone (i.e. without contamination from the gravitational one) could be detected if f loc.
NL > 150 f 1/2 sky , assuming l max 500 and a tomography over four redshift bins for a galaxy number density ofn g = 100 arcmin −2 . The large cosmic variance for low 's makes difficult the detection of the primordial component, prominent instead at larger scales. As we will see in the next section, primordial non-Gaussianity has some effect on small scales as well, due to the nonlinear evolution of structures.

Second-order corrections
The simple prediction of (IV.168) for the matter bispectrum is expected to be valid at the largest observable scales and at high-redshift, where nonlinear evolution is subdominant. Despite the fact that such conditions correspond as well to the regime where a detection of the initial component B 0 is favored, the effects of non-Gaussian initial conditions can be significant even at smaller scales and at low redshift. Since these effects are the result of nonlinear gravitational evolution and non-Gaussian initial conditions, it is no longer possible to identify distinct contributions resulting from distinct sources of non-Gaussianity, as it is the case for the tree-level expression of (IV.168). Nevertheless, it is possible to distinguish individual corrections in PT to the matter bispectrum depending exclusively on the initial power spectrum P 0 , and therefore present as well for Gaussian initial conditions, and corrections depending instead on higher-order initial correlators, such as the initial bispectrum B 0 and trispectrum T 0 , which can be interpreted as small-scales effects due to non-Gaussian initial conditions. One-loop corrections in PT for Gaussian initial conditions have been studied in Scoccimarro (1997), while the extension of these results to non-Gaussian initial conditions is studied in Sefusatti (2009).
A comparison of these results with measurements of the matter bispectrum in N-body simulations (Desjacques et al., 2009) with non-Gaussian initial conditions of the local kind can be found in Sefusatti et al. (2010). Fig. 17 shows the equilateral configurations of the matter bispectrum measured in N-body simulations together with predictions from perturbation theory at tree-level (dashed line) and one-loop (continuous line). In particular, the upper left panel considers B(k, k, k) for Gaussian initial conditions while the upper right panels shows the same quantity divided by the tree-level prediction in PT to highlight the small-scales non-linear behavior. The lower left and right panels show, respectively, the ratio and the difference between the matter bispectrum with an initial local component corresponding to f NL = 100 and the Gaussian case. The agreement between one-loop predictions and the simulations results is quite remarkable, while we notice that the tree-level prediction fails to accurately describe the effect of primordial non-Gaussianity already at relatively large scales.
The significance of these relatively small corrections to individual configurations is to be considered in relation to the much larger number of configurations that can be measured as we include smaller and smaller scales and they could lead to a measurable effect when considered in terms of the cumulative signal to noise ratio.  17 Upper panels: equilateral configurations of the matter bispectrum measured in N-body simulations with Gaussian initial conditions (data points) and tree-level (dashed lines) and one-loop (continuous lines) predictions in perturbation theory. The right panel shows the ratio to the tree-level prediction with acoustic oscillations removed. Lower panels): ratio (left) and difference (right) between the matter bispectrum measured in realizations with local non-Gaussian initial conditions ( f NL = 100) and the Gaussian case, compared with PT predictions. From (Sefusatti et al., 2010). effects loose in part the shape dependence of the original initial bispectrum and require an accurate model (perhaps beyond standard perturbation theory) and strong priors on the underlying cosmological parameters to be distinguished from the nonlinear, "Gaussian" component. A step in the direction of improved predictions is offered by the promising results of Renormalized Perturbation Theory Crocce & Scoccimarro, 2006a,b) and of the Renormalization Group approach (Matarrese & Pietroni, 2007;Pietroni, 2008). The extension of the latter to the case of non-Gaussian initial conditions has been recently considered in Bartolo et al. (2009), which studies specific predictions for the matter power spectrum and bispectrum.

C. The Galaxy Bispectrum
From the discussion above, we could expect that future, large-volume and high-redshift galaxy surveys will be able to directly detect a possible, large primordial component to the matter bispectrum by measurements of the galaxy bispectrum, or at least provide constraints on the non-Gaussian parameters comparable to the constraints from measurements of the CMB bispectrum. Such an expectation is motivated by the simple observation that the number of Fourier modes available in a three-dimensional, ideal, all-sky galaxy survey is in principle much larger than the number of modes available in two-dimensional CMB maps.
The galaxy distribution is, however, a less direct probe of the early Universe than the CMB temperature fluctuations. On top of the nonlinear evolution of structures and its contribution to higher-order correlation functions, one has to take into account the nonlinear nature of galaxy bias, itself responsible for additional non-Gaussianity. An analysis of the galaxy bispectrum should therefore be able to detect a small primordial component by separating it from these primary contributions.
In this respect, an even more complex picture, due to additional and somehow unexpected effects of primordial non-Gaussianity on galaxy bias, has been emerging in the last couple of years, following the results of Dalal et al. (2008). N-body simulations have shown, in fact, that nonlinear bias and an initial bispectrum are not two distinct sources of non-Gaussianity for the galaxy bispectrum, not even at large scales! Instead a local initial component can significantly affect the bias relation precisely at large scales, adding extra corrections. In the spirit of a review and since we do not have, at the time of writing, a satisfactory model of the galaxy bispectrum in presence of non-Gaussian initial conditions, in this section we will summarize earlier results, while in section IV.C.5 we will present the recent developments that radically changed our understanding of the effects of local non-Gaussianity on the largescale structure and finally comment, in section IV.C.6 on some consequences for galaxy bispectrum measurements as far as current research provides.

The galaxy bispectrum and local bias
Until recently, it was commonly assumed, even for non-Gaussian initial conditions, that the galaxy overdensity δ g (x), defined in terms of the galaxy density n g (x) and its meann g as can be expressed, at large-scales, as a local function of the matter density contrast, δ(x) 18 , i.e.
Such a reasonable expectation is based on the fact that the physics of galaxy formation operates on much smaller scales, below the typical halo size, than those we are interested in. At large scales, where fluctuations are small, δ R 1, we can consider the Taylor expansion, (Fry & Gaztañaga, 1993) (IV.173) describing the bias relation between galaxy and matter in terms of a series of constant bias parameters, b i . This expansion allows for a consistent extension of the perturbative expressions for the matter correlators to the galaxy ones. In fact, from (IV.173) we can derive the galaxy three-point function in position space (IV.174) and the tree-level expression for the galaxy bispectrum given by where the second term on the r.h.s., proportional to the quadratic bias parameter b 2 , is of the same order of the gravityinduced contribution to the matter bispectrum B tree G , (IV.167). Relying on this simple result, measurements of the galaxy bispectrum has been considered in the first place, in the context of Gaussian initial conditions, as a way to determine the bias parameters, and break the degeneracy between linear bias (b 1 ) and the amplitude of matter fluctuations (e.g. σ 8 ), otherwise affecting power spectrum measurements (Fry, 1994;Jeong & Komatsu, 2009a;Matarrese et al., 1997;Nishimichi et al., 2007;Scoccimarro, 2000a;Scoccimarro et al., 1999;Sefusatti et al., 2006;Sefusatti & Scoccimarro, 2005). In this respect, the corresponding reduced galaxy bispectrum is where Q is the reduced matter bispectrum (including a possible initial contribution) and the effect of nonlinear bias is simply given by an additive constant term. As already mentioned, measurements of triangular configurations different in shape and size allow to disentangle the different sources of non-Gaussianity and determine independently b 1 and b 2 , provided that accurate predictions for the matter bispectrum, from PT or N-body simulations, are available (Guo & Jing, 2009a,b) and the effects of redshift distortions and the survey geometry are properly taken into account (Scoccimarro, 2000a;Smith et al., 2008).
In particular, if we allow the possibility of non-Gaussian initial conditions, then the matter bispectrum includes an initial contribution, so that we can rewrite (??) at tree-level explicitly as and we can extend the analysis to obtain simultaneous constraints on the bias parameters and on the parameter determining the amplitude of the primordial bispectrum, i.e. f NL . A first conservative estimate of the possibilities offered by this method in measurements of the galaxy bispectrum in the 2dF Galaxy Redshift Survey (Colless et al., 2001) and in the Sloan Digital Sky Survey (SDSS York et al., 2000) is given in Verde et al. (2000) as a simple extension of previous results for the bias alone (Matarrese et al., 1997) suggesting that a primordial component could be detected for values of a local f NL of the order of 10 3 -10 4 . As we will see in the next sections, a complete analysis of the galaxy bispectrum, including all measurable configurations can improve this estimate by more than an order of magnitude. Among the various observational issues in analyses of galaxy correlators, e.g. finite volume effects or completeness of the galaxy samples, we stress that particularly relevance has the problem of redshift distortions. Redshift distortions have in fact a significant impact on the shape dependence of the galaxy bispectrum, particularly at small scales (Scoccimarro, 2000a;Scoccimarro et al., 1999). A recent treatment of redshift distortions in bispectrum predictions (with Gaussian initial conditions) can be found in Smith et al. (2008).

A bispectrum estimator
In this section we define a simple estimator for the measurement of the galaxy bispectrum in N-body simulations as well as actual data. This allows us to derive an expression for the bispectrum variance and define a Fisher matrix for an analysis of the galaxy bispectrum in terms of the non-Gaussian (and bias) parameters. In the next section we will consider a proper likelihood analysis and the effects of the bispectrum covariance. Since what follows can be applied in general to bispectrum measurements we will consider, to simplify the notation, the case of the matter density field in Fourier space, described by the density contrast δ k . We will point-out relevant differences in the application to the galaxy distribution.
For a cubic box of volume V, a bispectrum estimator can be defined as (Scoccimarro et al., 1998) where V f ≡ k 3 f = (2π) 3 /V is the volume of the fundamental cell and where each integration is defined over the bin q i ∈ [k i − ∆k/2, k i + ∆k/2] centered at k i and of size ∆k equal to a multiple of the fundamental frequency k f . The Dirac delta function δ D (q 123 ) ensures that the wavenumbers q 1 , q 2 and q 3 indeed form a closed triangle, as imposed by translational invariance, while the normalization factor V B (k 1 , k 2 , k 3 ), given by represents the number of fundamental triangular configurations (given by the triplet q 1 , q 2 and q 3 ) that belong to the triangular configuration bin defined by the triangle sizes k 1 , k 2 and k 3 with uncertainty ∆k.
The leading contribution to the bispectrum variance following from this estimator, in analogy with the power spectrum case (Feldman et al., 1994), is given by Scoccimarro et al. (1998) with the factor s 123 = 6, 2, 1 respectively for equilateral, isosceles and general triangles and where P tot (k) ≡ P(k) + 1 (2π) 3 1 n , (IV.181) with the particle (or galaxy) number densityn accounting for the shot noise contribution. In the case of a galaxy distribution, the matter power spectrum P(k) on the r.h.s. should be replace with the galaxy power spectrum, expressed, at large-scales, by P g (k) = b 2 1 P(k), under the local bias assumption of (IV.173). (IV.180) constitutes the Gaussian limit to the bispectrum variance, as it neglects higher-order corrections dependent on the three-, four-and six-point, connected, correlation functions.

Fisher matrix forecasts
In this section we consider simple forecasts for the constraints on the non-Gaussian parameters from measurements of the galaxy bispectrum in future redshift surveys. Specifically, we will consider a Fisher matrix for reduced galaxy bispectrum Q g in terms of the non-Gaussian parameter f NL and the linear and quadratic bias parameters b 1 and b 2 . These three parameters characterize the relative weight of the different non-Gaussian contributions to the galaxy bispectrum. Since the possibility to detect a primordial component relies on our ability to separate the three contributions, a robust result should, at least, involve a marginalization over bias. On the other hand, we will assume all cosmological parameters as known. This is in part justified by the weak dependence of the matter reduced bispectrum on cosmology discussed in the previous section. In this respect, it can be shown that the reduced bispectrum has the same signal to noise as the bispectrum. For a given triangular configurations, in fact, since the variance of Q is dominated by the variance of B (see, for instance, Scoccimarro et al., 2004). The Fisher matrix can be written as where the indeces α and β run over the parameters of interest f NL , b 1 and b 2 , while the reduced bispectrum variance, as mentioned above, can be expressed in first approximation as Now, Q on the right-hand side is independent of z at the tree level when initial fluctuations are Gaussian. Therefore, the first term falls as 1=b 1 at higher z where b 1 is larger (Fig. 3). On the other hand, the second term actually grows as z: for the current example b 2 =b 2 1 0:016 at z 1 and 0.31 at z 3. Therefore, our sensitivity to b 2 grows with z, while our sensitivity to b 1 declines with z.
Let us study more quantitatively the sensitivity to b 1 . In the limit of linear bias, b 2 0, a signal-to-noise ratio of the reduced bispectrum of equilateral configurations is given by We expect, therefore, that a signal-to-noise ratio of the bispectrum from gravitational instability declines with z, resulting in an increasing error on b 1 at higher z.
In practice, however, we predict that galaxy surveys at higher z should result in better determinations of both b 1 and b 2 . The reason is quite simple: k max at higher z must be larger than that at lower z. In the upper left panel of Fig. 4 we show k max as determined from R; z 0:5: k max 0:17 h Mpc ÿ1 at z 1 and k max 0:47 h Mpc ÿ1 at z 3. The difference is clear: when the modes up to k max are included, a survey at z 3 yields an error on b 1 that is a factor of 5 better than that at z 1. As for b 2 , a survey at z 3 does better by nearly 2 orders of magnitude.
How about primordial non-Gaussianity? In the lower panels of Fig. 4 we show the predicted errors on f loc: NL and f eq: NL , marginalized over b 1 and b 2 . We find that the difference between z 1 and 3 is negligible at the same k max . This is a consequence of the fact that a signal-to-noise ratio for the primordial bispectrum component is not, in the first approximation, redshift dependent. For equilateral configurations one finds with ∆B 2 g given by (IV.180). Notice that ∆Q 2 g depends on the linear bias parameter b 1 . The sum over the triangles configurations can be explicitly defined in terms of three sums over the wavenumber k 1 , k 2 and k 3 in steps of ∆k, with k * min = max(k min , |k 1 − k 2 |) to ensure that a close triangle can be formed and with k max representing the minimal physical scale included in the analysis. Clearly, larger values of k max correspond to a much larger number of available configurations. For this reason, in fact, the cumulative signal to noise for the bispectrum, i.e. the sum of the signal to noise over all measurable configurations, grows more rapidly with k max than it does for the power spectrum. On the other hand, we expect the primordial component to decrease significantly at small scales (high-k). In practice, however, k max can be defined as the smallest scale at which we can trust our model for the galaxy bispectrum, in our case, the tree-level expression in (IV.177).
In fig. 18 the forecasted errors on bias parameters and non-Gaussian parameters as a function of k max for an ideal geometry galaxy survey of volume V = 10 h −3 Gpc 3 and a galaxy number density of n g = 5 × 10 −3 h 3 Mpc −3 at redshift z = 1 (dashed, red lines) and z = 3 (continuous, blue lines). The negligible difference between the results for the non-Gaussian parameters at different redshift is a consequence of the fact that the signal to noise of the primordial component to the matter and galaxy bispectrum for a single triangular configuration, B 0 /∆B 2 , is, in our approximation, constant, both as a function of redshift and scale. This is not the case for the contributions due to gravitational instability and bias. It is clear that the choice of k max significantly affects the final result. For instance, Sefusatti & Komatsu (2007) define k max , for a given survey, from as the inverse of the scale R given by the condition σ(R, z) = 0.5 to ensure that the tree-level predictions is applied within the mildly nonlinear range. Notice that the choice of k max depend on redshift, since at larger redshift we can expect a larger range of validity of perturbation theory predictions, both for matter and galaxy bispectrum. The dependence on the survey volume is simply given by ∼ 1/ √ V. Scoccimarro et al. (2004), from a Fisher matrix analysis as the one described above, have shown that the 2dF and SDSS surveys should be able to probe values of f loc. NL 100, assuming k max = 0.3 h Mpc −1 . They suggested as well that an all-sky survey with a galaxy number density of n g ∼ 3 × 10 −3 h 3 Mpc −3 up to redshift z ∼ 1 can probe values of f loc. NL of order unity. Sefusatti & Komatsu (2007) provided more specific predictions for a choice of planned and proposed high-redshift galaxy surveys, based on a similar Fisher approach, for the errors on non-Gaussian parameters both for the local and equilateral model. It is found that, for equilateral non-Gaussianity, the degeneracy between the non-Gaussian parameter f eq. NL and the bias parameters is severe, but it extends to unphysical regions of the b 1 -b 2 plane and it can be severely reduced by introducing a correlation between linear and quadratic bias as the one predicted by the halo model. The marginalization over bias can be then replaced by a marginalization over the parameters of the Halo Occupation Distribution describing the galaxy population. Sefusatti & Komatsu (2007) finds that future large-volume surveys (V ∼ 100 h −3 Gpc 3 at z ∼ 1, 2), designed to accurately measure acoustic oscillations in the galaxy correlation function and thus map the late-time expansion of the Universe, should be able to probe f loc.
NL ∼ 4 and f eq. NL ∼ 20, i.e. values comparable to those expected from future CMB missions. At that time they constituted they best forecasts for constraints on f NL from large-scale structure measurements. These results implied, in particular, that if Planck will indeed detect primordial non-Gaussianity, a confirmation by large-scale structure observations will be required.

Effects of covariance and current results
The simple Fisher matrix analysis described in the previous section makes several approximations, starting with the assumption of an ideal geometry for the survey under consideration, and the Gaussian variance for the galaxy bispectrum configurations. In fact we can expect a proper treatment of the survey selection function and of the bispectrum covariance to have a significant impact on the estimation of the non-Gaussian (and bias) parameters. For instance, triangular bispectrum configurations at the largest scales probed by a realistic redshift survey (where the initial component should provide the largest corrections) are indeed highly correlated, because of the limited number of measurable Fourier modes.
The issue of bispectrum covariance has been studied in (Scoccimarro, 2000a;Scoccimarro et al., 2004;Sefusatti et al., 2006;Sefusatti & Scoccimarro, 2005). For instance, Scoccimarro et al. (2004) compare the Fisher matrix results for an ideal survey with a volume and galaxy number density similar to those of the main sample of the SDSS, with the predictions resulting from a likelihood analysis of the same survey, including the effects of survey geometry and covariance. Such analysis involves all measurable triangular configurations defined by wavenumbers k 1 , k 2 , k 3 ≤ 0.3 h Mpc −1 , with ∆k = 0.015 h Mpc −1 , resulting in a total number of triangle bins, N T = 1015. The estimation of the corresponding, 1015 × 1015, bispectrum covariance matrix clearly represents a challenging computational problem as it cannot be determined from a relatively small number of N-body simulations. This work uses instead a code (Scoccimarro, 2000a) implementing particle displacements as predicted by second order Lagrangian perturbation theory (2LPT, see, for instance, , and references therein) to produce 6, 000 realizations of the density field. Such large number of realizations is in fact necessary for an accurate determination of the covariance matrix. In addition, the 2LPT results, including particle velocities, allow for exact redshift distortions. Each mock catalog, in redshift space, is then weighted according to the FKP procedure (Feldman et al., 1994;Matarrese et al., 1997;Scoccimarro, 2000a) to take into account the SDSS selection function. The same covariance matrix is compared to analytic expressions in Sefusatti et al. (2006).
Given a proper estimate of the covariance matrix, a likelihood function for the reduced bispectrum Q n can be defined time marginalized over the third our corresponds to the survey ge-. The lower right panel shows the er marginalization; smaller uncerarger volume survey geometry. We bispectrum measurements to conntical results are obtained by scaltalogs. e these results to those of the precorrespond to very different sur-, comparing Fig. 5 to the longd 4 shows that our more realistic rger by a factor of 4 -5. There are listic'' treatment to be actually an ble error bars with a more sophisollowing reasons. First, we only f the survey; second we use FKP optimal at large scales and thus ur sensitivity, particularly to priand finally, we have only used space. Due to the lack of translao a signal in open configurations. pare the results of Fig. 5 between larger volume survey leads to an ed error bars of 20% for b 1 , 35% his is more than what one expects ons to the constraining power of eased volume (ϳ20%), and is a vement of the bispectrum covarirower survey window function in

IV. CLUSTER ABUNDANCE AND PRIMORDIAL NON-GAUSSIANITY
The abundance of clusters probes the probability distribution function ͑PDF͒ of the matter fluctuations, and it is thus a natural candidate to constrain primordial non-Gaussianity ͓33,34,48 -51͔. For large masses, the abundance of clusters depends on the right tail of the PDF which decays exponentially for Gaussian initial conditions. However, before this can be used to place a constrain on f NL , it is necessary to have under control a number of systematic effects.
Even with recent progress in the determination of cosmological parameters, uncertainties on ⍀ m and in particular 8 can alter the Gaussian abundance prediction enough to make difficult probing the small levels of primordial non-Gaussianity corresponding to, e.g., f NL Ӎ100. In addition, calculation of the mass function from a given PDF is not straightforward. Even for Gaussian initial conditions, measurements in numerical simulations suffer from systematic uncertainties of order 10-30 %, depending on the definition of halo mass ͓52͔. For primordial non-Gaussianity of the type given by Eq. ͑1͒, the mass function has been estimated analytically in ͓33͔, but there has been so far no complementary study using numerical simulations to give a sense of the uncertainties involved.
At a more fundamental level, it is not even clear that one can probe f NL using rare events such as clusters, given that in terms of the normalized bispectrum eigenmodesq n that diagonalize it (Scoccimarro, 2000a). These can be expressed asq whereQ m ≡ Q m , ∆Q 2 m ≡ (Q m −Q m ) 2 and their signal to noise is given by where λ n represents the eigenvalue forq n , with q nqm = λ 2 n δ nm . The eigenmodes presenting the largest signal to noise can be easily interpreted by considering how they weight different bispectrum configurations. In fact, the largest signal to noise corresponds to an eigenmode defined by a nearly equal weighting of all triangles, and it therefore represents the overall bispectrum amplitude. The next eigenmode weights instead with opposite sign triangles close to the equilateral shape and nearly collinear triangle. Each eigenmode represents in fact a fraction of the information contained in the bispectrum configurations, and a crucial role in this respect is played by the shape and scale dependence. To illustrate this point, fig. 19 (from Scoccimarro et al., 2004) shows the 95% C.L. limits on f loc. NL from the likelihood analysis of the IRAS PSCz catalog (Saunders et al., 2000) as a function of the number of eigenmodes included.
Although the diagonalization of the covariance matrix does not ensure the exact independence of the eigenmodes, which can still present non-vanishing higher-order correlations, it has been shown that this is nevertheless a reasonable assumption in practice (Scoccimarro, 2000a) . This allows us to write a likelihood function for the non-Gaussian and bias parameters, denoted generically as p α , in terms of the product of the probability distribution functions P n (x) for each individual eigenmode, that is (IV.188) The probability distributions P n (x), which can be determined from the mock catalogs, are not expected in general to be Gaussian, although this can be in fact a good first-order approximation in the case of the SDSS main sample (Scoccimarro et al., 2004). A direct implementation of this kind of analysis, taking into account all measurable bispectrum configurations and their covariance, has been performed by Scoccimarro et al. (2001) for different IRAS catalogs (Efstathiou et al., 1990;Fisher et al., 1995;Strauss et al., 1992) and by  for the IRAS PSCz catalog (Saunders et al., 2000) considering the case of the χ 2 model of primordial non-Gaussianity (Scoccimarro, 2000b). Scoccimarro et al. (2004) derives the limit | f loc. NL | < 1800 at 95% C.L. for the bispectrum measured in the PSCz catalog. Along this lines, Scoccimarro et al. (2004) also studied the constraints on f loc. NL for local non-Gaussianity, that could be obtained from measurements of the galaxy bispectrum in the SDSS main sample, including the effects of the survey geometry and bispectrum covariance, forecasting the 1-σ error ∆ f loc. NL 150, after marginalization over the bias parameters. This work compared this more realistic estimate of the predicted errors on f loc. NL from the likelihood analysis of the SDSS bispectrum to the Fisher matrix forecast for an ideal geometry of nearly the same volume and galaxy density finding a worsening of a factor of 4-5. They point-out, however, that the realistic errors, which are an estimate from the north part of SDSS alone, should be taken as a an upper bound to the results actually achievable because of the FKP weighting scheme, not optimal at the largest scales where the primordial component is the largest and because of the fact that extra signal can be found as well in open configurations, not considered there, due to the broken translation invariance. We might add, based on the results of section IV.B.2, that nonlinear corrections present for non-Gaussian initial conditions might increase the overall signal due to a non-zero f NL , particularly on small scales where a large number of triangular configurations can be measured.
At this point we should remind the reader that all the results discussed so far on the galaxy bispectrum and its significance for constraining primordial non-Gaussianity, assume the expression (IV.177) to be a reliable prediction. As we shall se in the remainder of this section, this is not the case, as additional effects of non-Gaussian initial conditions have to be taken into account. Nevertheless, the primordial component, whose direct detection has been the main target of the earlier works discussed above, is still expected to provide a contribution to the galaxy bispectrum, and there are good reasons to believe that these results can be still interpreted as a "conservative estimate" of the possibilities offered by bispectrum measurements in the large-scale structure to test the Gaussianity of the initial conditions.

Primordial non-Gaussianity and non-local Galaxy Bias
The constraints and forecasts discussed so far in this section are based on the tree-level expression for the galaxy bispectrum, (IV.177), derived under the assumption of local bias, (IV.173). As anticipated, our understanding of galaxy bias in presence of primordial non-Gaussianity radically changed in the last two years, after Dalal et al. (2008) presented measurements of the halo power spectrum in simulations with non-Gaussian initial conditions of the local kind showing the presence of large corrections at large scales, not captured by the local bias prescription! Fig. 20 shows the matter-halo cross-power spectrum for different values of f NL from these simulations, where the unexpected effect of non-Gaussianity at large scales is evident.
Local bias, (IV.173)), in fact, implies a leading contribution to the galaxy (or halo) power spectrum of the simple form with no dependence on f NL , while the simulations results of Dalal et al. (2008), later confirmed by Desjacques et al. (2009);Grossi et al. (2009);Pillepich et al. (2008), are consistent with a scale-dependent correction to the linear bias P and the halo-matter cross spectrum P h ¼ h Ã h i. We have used the cross spectrum rather than the halo auto spectrum because the former should be less sensitive to shot noise from the small number of halos compared to dark matter particles. We have checked, however, that using the halo auto spectra to com results as the cross spectra; i.e. stochasticity. Examples of the v resulting bias factors are plotted As can be seen, we numericall predicted scale dependence. Be statistics of rare objects, the error simulations plotted in Fig. 8 ar tempt to improve the statistics on bining the bias measurements fr Figure 8 plots the average ratio b in our simulations and our ana using c ¼ 1:686 as predicted fr model [78]. In computing the ave we used a uniform weighting ac tions, redshifts, and mass bins. A shift the results by $10%, so w the systematic error in our com agreement between our numeric our predicted bias scale depende and perhaps surprising. Naively, what larger collapse threshold c ellipsoidal rather than spherical halos in this mass range [70].

VI. COSMOLOGICAL C
Having derived fitting formul clustering of halos in NG models well upcoming surveys may con NG could possibly affect the con cosmological parameters. We foc veys and redshift surveys. Cluste cosmological parameters, in pa rameters, by exploiting the expo galaxy cluster abundance on cosm goal for upcoming redshift surv energy by localizing baryonic ac features in the galaxy power spec Examples of upcoming survey Cosmology Telescope, 4 South Energy Survey, 6 WiggleZ, 7 Acceleration Probe, 9 and the Telescope. 10 Because primordial non-Gau abundance and power spectra o these types of surveys will be we NG. On the other hand, potentia FIG. 8 (color online). Ratio of the bias shift Áb measured from our simulations to that predicted by Eq. (9), using c ¼ 1:686. Biases were computed from cross spectra measured on 28 simulations with 5 various f NL ðÀ500; À100; 100; 500Þ, 3 various redshifts (z ¼ 0, 0.5, 1), and 5 halo mass bins. Note that at higher k, nonlinear evolution also generates scale FIG. 7 (color online). Cross-power spectra for various f NL . The upper panel displays P h ðkÞ, measured in our simulations at z ¼ 1 for halos of mass 1:6 Â 10 13 M < M < 3:2 Â 10 13 M . The solid line corresponds to the theoretical prediction for P with a fitted bias b 0 ¼ 3:25. We see a strongly scale-dependent correction to the bias for f NL Þ 0, increasing towards small k (large scales). The bottom panel displays the ratio bðk; f NL Þ=bðk; f NL ¼ 0Þ. The errors are computed from the scatter amongst our simulations and within the bins. Triangles correspond to our large (1024 3 particle) simulations whereas diamonds correspond to our smaller (512 3 particle) simulations. The dotted lines correspond to our expression for the bias dependence on f NL defined in Eq. (9). of the form

IMPRINTS
where δ c 1.68 is the linear critical density for spherical collapse, extrapolated at z = 0. Such correction therefore increases with the scale, with redshift via the growth factor D(z) and with the non-Gaussian parameter f NL , and vanishes for unbiased populations (b 1 = 1). A theoretical interpretation, based on the peak-background split (Cole & Kaiser, 1989) has been assumed by Afshordi & Tolley (2008); Dalal et al. (2008); Giannantonio & Porciani (2009);Slosar et al. (2008), and, with a somehow different derivation, by McDonald (2008). According to these works, the local relation between the galaxy density and the matter density in (IV.173) is modified, in presence of local primordial non-Gaussianity, to include an explicit dependence on the primordial curvature perturbation, Φ, i.e.
The galaxy two-point function, will be given by the following perturbative expansion, IV.193) where the second term can be rewritten as the scale-dependent of the linear bias parameter of Eq. (IV.191), with the 1/k 2 behavior resulting from the relation between δ k and Φ k given by M(k) ∼ k 2 . Giannantonio & Porciani (2009) describes, in fact, the galaxy distribution as multivariate distribution, although the matter density δ and the curvature Φ are not two independent random field, but they are related by Poisson equation, (IV.161). It should be noted, that no derivation of a similar effect due to a different kind of primordial non-Gaussianity (if feasible) has been, so far, proposed. The derivations presented in the works cited above, in fact, all rely on the relatively simple expression defining the local model (II.6) while their generalization to a model defined by a generic initial bispectrum is a quite challenging problem. Following the results of Dalal et al. (2008), moreover, an apparently different explanation, resulting in fact in a very similar but distinct effect on the galaxy power spectrum has been proposed by  and Taruya et al. (2008). Taruya et al. (2008), starting from the local bias prescription of (IV.173), point-out that the next-toleading order correction to the galaxy two-point function in presence of local primordial non-Gaussianity, represents, in fact, a large correction, identical up to a constant factor, in the large-scale limit, to the bias correction of (IV.191). The perturbative expression for the galaxy power spectrum is given by IV.194) where the second term, proportional to the quadratic bias parameter b 2 and dependent on the matter bispectrum B, corresponds to the lowest order, one-loop correction. Remarkably, for local non-Gaussianity, in the limit k → 0, such correction presents the same scale and redshift dependence, and, for massive halos or highly biased populations (b 1 1), even the same amplitude, as the one resulting from (IV.191). The expression, however, can be applied to any model of primordial non-Gaussianity, given the appropriate initial matter bispectrum (see, for instance, ). In the case of equilateral non-Gaussianity, the correction is almost negligible, while local non-Gaussianity appears to be a limiting case leading to a particularly significant effect. The same correction has been considered already by Scoccimarro (2000b) in the context of χ 2 initial conditions, where it leads to a redefinition of the bias parameters, with no additional scale-dependence.  presents a different derivation of an expression similar to the one of (IV.194), based on earlier works on the density peak correlation function (Grinstein & Wise, 1986;Matarrese et al., 1986). In this case, a specific prediction for the bias parameters, valid however only in the high density threshold limit, is included. It is interesting to notice that the possibility of large-scale effects on the correlations of biased distributions has been explicitly pointed-out by Grinstein & Wise (1986), although without further study.
The two distinct corrections to the galaxy power spectrum, one corresponding to the modified bias relation of (IV.192), the other to the perturbative correction due to nonlinear bias of (IV.194), have been studied in a comprehensive framework recently by Giannantonio & Porciani (2009), where the authors suggest that the effect measured in N-body simulations is mainly due to the multivariate nature of the galaxy distribution with local primordial non-Gaussianity, rather than the effect of nonlinear bias (IV.194). In addition, Desjacques et al. (2009) pointed-out that even the galaxy bias parameters b i , related in the framework of the halo model to the halo bias parameters b h,i (M) for halo populations of mass M, present a dependence of f NL due to the effects of non-Gaussianity on the halo mass function. The picture that has been emerging in the last years is therefore quite complex and it should be stressed that a wide consensus in the community on a well defined model, even for the galaxy power spectrum, is still lacking. For instance, a discrepancy of the order of a 10% between predictions and simulations results, did not find yet a unique interpretation (see discussions in Desjacques et al., 2009;Giannantonio & Porciani, 2009;Grossi et al., 2009;Maggiore & Riotto, 2009;Pillepich et al., 2008).
This rather surprising effect of local non-Gaussianity on the bias relation leads, remarkably, to the possibility of placing limits on f loc. NL from current large-scale structure observations, already comparable to limits from the CMB! (Afshordi & Tolley, 2008;Slosar et al., 2008). Specifically, Slosar et al. (2008) derived from measurements of the cross-correlation of several large-scale structure data-sets and the CMB  the 2-σ constraints (IV.195) leading to an marginal improvement of the WMAP results. Encouraging predictions for the constraints that can be derived in future spectroscopic as well as photometric redshift surveys can be found in Carbone et al. (2008). A fair comparison between these forecasts and those derived for the galaxy bispectrum in Sefusatti & Komatsu (2007) is clearly not possible as the latter do not include the effect on the bias relation discussed above. Two observations, however, are in order. In the first place, these effects on the galaxy power spectrum are specific of the local model of non-Gaussianity, while the galaxy bispectrum is in principle sensitive to any initial component B 0 . In the second FIG. 21 Large-scale contributions to the galaxy bispectrum due to primordial non-Gaussianity of the local (left panel) and equilateral (right panel) type described as one-loop corrections assuming a local bias prescription. Thin lines correspond to the contributions for Gaussian initial conditions. From Sefusatti (2009), see the reference for further details.
place, robust results can be obtained from galaxy power spectrum measurements at large scales in photometric surveys. The degradation of the information that can be extracted from bispectrum measurements in photometric surveys with respect to spectroscopic ones is still to be properly studied. The impact of photometric errors on the accurate determination of the bispectrum dependence on the triangle shape can in fact be significant.
6. The Galaxy Bispectrum after Dalal et al. (2008) First steps in the direction of an extension of the results discussed above to the galaxy bispectrum have been taken in Jeong & Komatsu (2009b) and Sefusatti (2009). Specifically, Jeong & Komatsu (2009b) considered an expression for the high-peak three-point function derived in Matarrese et al. (1986), analogous to the one for the two-point function studied by , and applied it to the case of local non-Gaussianity. Sefusatti (2009) considered instead the perturbative approach of Taruya et al. (2008) based on the local bias expansion of (IV.173), and applied it to local and equilateral non-Gaussianity.
These works show that the galaxy bispectrum is expected to be sensitive to both the initial matter bispectrum B 0 , as well as to the initial matter trispectrum T 0 , by means of a contribution analogous to (IV.194) and given by which represents a large correction at large scales, with an asymptotic behavior characterized by an extra 1/k 2 factor with respect to the primordial matter bispectrum component, B 0 and a dependence on f 2 NL . In addition, Sefusatti (2009) points-out that, unlike the power spectrum, large-scale corrections due to nonlinear bias are present as well for equilateral non-Gaussianity (and virtually for any non-pathological form of the primordial bispectrum and trispectrum). Fig. 21 shows the one-loop corrections to the galaxy bispectrum due to nonlinear bias and primordial non-Gaussianity under the assumption of local bias (Sefusatti, 2009). The left panel assumes local non-Gaussianity including a nonzero initial bispectrum and trispectrum, while the right panel assumes a non-zero initial bispectrum of the equilateral type. Thin lines correspond to Gaussian initial conditions. The black continuous line represents the matter bispectrum and therefore the first term on the r.h.s. of (IV.196), while the blue dashed line correspond to the second term. Notice that at next-to-leading order in PT, the matter bispectrum T depends on the initial trispectrum T 0 as well as the initial bispectrum B 0 , so that an effect is present also for equilateral non-Gaussianity where the figure assumes T 0 = 0.  . The halo bispectrum for some triangular configurations. Each panel shows the result for an isosceles configuration specified by α ≡ k 1 /k 3 and k ≡ k 1 = k 2 . Error bars are measurements from our simulations (the average and the standard error among different realizations) and solid lines are their 4-th order polynomial fits, while we keep the terms up to second and linear order for dashed and dotted lines. We use the outputs at z = 0.5 and consider the haloes more massive than 4.6 × 10 13 h −1 M .

FIG. 22
Measurements of a set of triangular configurations of the halo bispectrum in N-body simulations with local non-Gaussian initial conditions, as a function of the non-Gaussian parameter f NL . Large values of the parameter α correspond to more squeezed configurations. In the upper panels, the dependence of the halo bispectrum on f 2 NL is evident. From Nishimichi et al. (2009). It should be noted, however, that these results ignore, at least for local non-Gaussianity, the modified bias relation of (IV.192) (see, in this respect, some comments in Giannantonio & Porciani, 2009), and do not provide reliable predictions for the constant bias parameters. Furthermore, they have not been properly tested against measurements of the halo bispectrum in numerical simulations. The only work, at the time of writing, in this direction (Nishimichi et al., 2009) shows, however that the dependence of the halo bispectrum on f NL is roughly consistent with the functional form resulting from the prediction of (IV.196). The authors attempt as well, using a simple fit to their measurements, a preliminary forecast analysis for a future large-volume (100 h −3 Gpc 3 ), high-redshift survey, finding a detectable value of f NL 20, using a very limited number of configurations. Fig. 22 from Nishimichi et al. (2009) shows measurements of a set of triangular configurations of the halo bispectrum in simulations with local non-Gaussian initial conditions, as a function of the non-Gaussian parameter f NL , where the dependence of the halo bispectrum on f 2 NL is evident. A simple but reasonable expectation would be that the inclusion of the effects of primordial non-Gaussianity on galaxy bias will improve the results of Sefusatti & Komatsu (2007), which are based on the detectability of the primordial component alone. Our understanding of these phenomena is, however, evolving rapidly in these days, and these notes on recent developments are likely to become outdated relatively soon.

D. Running non-Gaussianity
1. The case of a scale-dependent f NL DBI models of inflation predict, as we have seen, a primordial curvature bispectrum very close to the equilateral model in its shape dependence. An additional but quite generic feature of these models is given by a significant departure from the hierarchical scaling B Φ (k, k, k) ∼ P 2 Φ (k) (Chen, 2005;Chen et al., 2007;Chen & Wang, 2009;Khoury & Piazza, 2008;Renaux-Petel, 2009;Shandera & Tye, 2006). More recently, this possibility has been explored as well in models of local non-Gaussianity (Byrnes et al., 2008(Byrnes et al., , 2009aByrnes & Tasinato, 2009;Kumar et al., 2009).
Under a phenomenological point of view, this extra scale-dependence can be described by a running f NL (k), or, more properly, in terms of an amplitude parameter f NL and a running parameter n NG , defined by (IV.197) where k p is a properly chosen pivot scale, while K(k 1 , k 2 , k 3 ) = (k 1 + k 2 + k 3 )/3 defines an overall scale characteristic of the triangular configuration on which B Φ (k 1 , k 2 , k 3 ) depends. In other terms, the f NL (K) defined above replaces the constant f NL in the definitions of the local and equilateral bispectra effectively introducing an extra dependence on scale.
Observational consequences of a running f NL (K) have been explored in Lo  and Sefusatti et al. (2009), while in Taruya et al. (2008) this effect is included in the prediction for one-loop corrections to the matter and galaxy power spectrum.
Lo  provided an analysis of the possibility of constraining the running parameter n NG by combining current limits from the CMB on the amplitude parameter f NL at the pivot scale k p = 0.04 Mpc −1 with future measurements of cluster abundance. The effect of a n NG significantly different from 1, can result in a much larger (or smaller) amount of non-Gaussianity on the smaller scales relevant for the cluster mass function. Fig. 23 (left panel, from Lo  illustrates the difference in the range of scales probed by different observables. Focusing in particular on the equilateral model for the curvature bispectrum, this work assumes the amplitude of f NL (k) to be constrained by the CMB bispectrum at the pivot point scale k p and derives the expected constraints on its running by considering the effective amplitude of f NL (k) at the smaller scales (k ∼ 0.3 − 0.6 h Mpc −1 ) probed by cluster surveys. For an all-sky cluster survey up to redshift z max = 1.3 they find the 1-σ constraints, marginalized over Ω m , σ 8 and h, assuming the fiducial values f NL = 38 and n NG = 0, ∆n NG 2 with a Planck prior ∆ f NL (k = k p ) = 40. Their analysis, however, does not include the simultaneous limits that measurement of the CMB bispectrum alone is expected to provide on both the amplitude f NL and running n NG . The solid line has n NG − 1 = 0, the dashed n NG − 1 = 0.2 and the dotted-dashed n NG − 1 = 0.6. The shaded region in the upper left-hand corner shows the range that is excluded at 95% confidence by current CMB data [27] for equilateral shape non-Gaussianity (plotted is the more conservative lower bound on f NL ). The shaded regions on the right show the range of scales probed by the galaxy bispectrum and by clusters. The range of scales probed by the bispectrum depends (among other things) on the redshift of the survey, survey volume and the number density of galaxies: the above plot assumes V ∼ 10h −3 Gpc 3 , z ∼ 1 and the maximum k is determined by the nonlinear scale [47].
Before beginning a detailed discussion of observables, it is interesting to note that, from an effective field theory point of view, one does not expect to see f NL more than a few without significant fine-tuning. For slow-roll models with standard kinetic terms, f NL is proportional to slow-roll parameters. For example, the contribution from the third derivative of the potential goes like V /2πH 10 −4 , where the small number comes from the amplitude of scalar fluctuations and the inequality comes from demanding a flat enough potential for sufficient inflation. This was shown in detail by Maldacena [14]. Terms with a shift symmetry in the inflaton field φ, as φ → φ + c (with c a constant) may be added without spoiling slow-roll, and in particular terms like (∇φ) 2n /M 4n−4 , with n > 2 and M some mass scale, may be added. In order to truncate the added terms at for Planck. Since it is always possible, given the observable of interest (e.g. the CMB bispectrum for a specific experiment) and the non-Gaussian model, to choose the pivot point in such a way to remove any degeneracy between the amplitude and the running parameters, a measurement of the running parameter comes at no cost with respect to the determination of the f NL (k = k p ). Notice, however, that, for reasons related to the numerical implementation of the CMB estimator, Sefusatti et al. (2009) assumes for the overall scale representative of a given triangular configuration, the geometric mean of the three wavenumber, i.e. K ≡ (k 1 k 2 k 3 ) 1/3 . While the difference with the more physically motivated definition in terms of the arithmetic mean K = (k 1 + k 2 + k 3 )/3 is very small for equilateral non-Gaussianity, in the local model this is not the case. Sefusatti et al. (2009) considers as well the Fisher matrix from large-scale structure information, and specifically the galaxy power spectrum (including the effect on halo bias) for the local model and the galaxy bispectrum, but in terms of the simple description of (IV.177), therefore excluding halo bias effects. This different choice of observables with respect to the model of primordial non-Gaussianity assumes a negligible effect of equilateral non-Gaussianity on the galaxy power spectrum (still to be confirmed by N-body simulations). It is shown, in particular, that future galaxy redshift surveys can significantly improve CMB results. Fig. 23 (right panel) shows the contours plots for the 1-σ uncertainties resulting for a joint Fisher matrix analysis of CMB and large-scale structure information. The expected limits are plotted against the predictions for the relation between the amplitude f loc. NL and the running n NG from DBI inflationary models. It is interesting to notice how these models predict a stronger running for smaller values of the amplitude parameter. In this respect, constraining the value of n NG can place additional limits on the parameters of the inflaton Lagrangian.

V. CONCLUSIONS
Weakly non-Gaussian initial conditions are defined, in most of the relevant inflationary models, by a non-vanishing bispectrum for the primordial curvature perturbations. The most direct observables of this primordial density correlator are, naturally, the bispectrum of the temperature fluctuations in the CMB and the bispectrum of the mass distribution at large scales as probed by galaxy surveys. In this review we presented an overview of the problems, results and expectations connected with the detection of (or constraints on) primordial non-Gaussianity specifically in bispectrum measurements of the CMB and LSS.
The CMB is an ideal observable for tests of primordial NG because temperature and polarization anisotropies can be described in the linear regime of cosmological perturbations. The statistical properties of the primordial curvature field are thus directly reflected in the pattern of CMB fluctuations. As we have seen, tests of primordial NG are formulated in terms of the estimation of the bispectrum amplitude f NL for each of the shapes predicted by different inflationary models. It was originally shown in the literature that a maximum likelihood estimator of the bispectrum optimally extracts all the f NL information from a CMB map. Extracting the primordial non-linear parameter from the bispectrum has subsequently become the standard way to test primordial NG in the CMB. The best f NL measurements to date come from analysis of the WMAP datasets, and roughly constrain the primordial bispectrum amplitude to be 100 for the local, equilateral and orthogonal shapes. Despite already being very stringent (the NG part of the CMB temperature anisotropies is constrained at the level of 10 −3 of the total fluctuation), these bounds are still far from the typical order of magnitude of primordial NG predicted by most inflationary models. As we have seen, Fisher matrix forecasts show that future results from the Planck satellite (whose release date is predicted to be in 2012) will improve previous WMAP constraints by roughly one order of magnitude, thus impacting the range of some theoretical predictions. This significant improvement is due to the better sensitivity of Planck to many more bispectrum configurations in the analysis, and the possibly of exploiting both temperature and polarization datasets. Another important limitation on current constraints is that inflationary predictions encompass more shapes than those that have been constrained so far. The reason why many shapes remain to be constrained is that they cannot be written as a separable product of one-dimensional functions of a single wavenumber. Separability, as we have seen, is a crucial property since it makes the actual analysis computationally affordable in terms of CPU time. We have reviewed recent work showing that this limitation can also be overcome in future analysis by means of a fully general, and mathematically well defined, eigenmode expansion of the bispectrum shape. Thanks to this, and in light of the significant improvement in sensitivity provided by Planck, better and more general CMB constraints on primordial NG models will be available in the near future. One caveat is that the high precision of the forthcoming CMB datasets makes them much more sensitive to other spurious (i.e. non primordial) sources of NG, that could bias the f NL estimate. Achieving an accurate control on these contaminants is clearly a crucial goal for future analysis. As we have seen, much work is being done in order to predict, detect and isolate non-primordial NG effects, but some issues still have to be addressed. In particular a complete prediction of the total bispectrum generated by second order cosmological perturbations is not yet available, although a number of effects have been studied in detail. Accurate characterization of NG from diffuse foreground residuals is another important issue that will require further investigation.
For large-scale structure, many aspects of the general CMB scenario outlined above change, as should be evident from a comparison of the discussions in sections III and IV. In the first place, we cannot rely on a direct relation between the observed galaxy bispectrum and the primordial curvature bispectrum predicted by inflationary models. As we have seen, a small departure from Gaussian initial conditions should result in a correction to the galaxy bispectrum induced by gravitational instability and nonlinear bias, constituting the dominant contributions. The nature of this correction is a complex problem in its own right, since it is due to the linearly evolved initial matter bispectrum as well as to the effects of primordial non-Gaussianity on the galaxy bias relation. Such effects are still under investigations and we do not have, to date, an accurate theoretical model. On the other hand, early results from galaxy power spectrum measurements are very encouraging, albeit restricted at present to the local non-Gaussian model. Current data sets already appear to be able to confirm and improve CMB results. In this respect, it is evident that the ultimate goal is the implementation of a complete large-scale structure analysis in terms of all measurable correlators, including power spectrum, bispectrum and beyond, i.e. an analysis that fully reflects the non-Gaussian nature of the mass and galaxy distributions even on large scales.
There are several issues which remain to be resolved, for which we can identify three main categories. First, we need to develop a robust model for the galaxy correlators accurately accounting for small-scale nonlinearities for both the matter and galaxy density fields, as well as in the presence of non-Gaussian initial conditions; this also must account describe non-localities in the bias relation. In this review we have briefly summarized the state of the art, noting that our understanding of these phenomena is evolving rapidly. Secondly, once a reliable model is available, it will be necessary to develop the machinery that will allow us, in the event of a future detection, to properly identify the effects of different models and their bispectrum shapes. In this respect, the CMB results presented in section III, provide an important benchmark. Finally, observational problems connected with redshift surveys such as the effects of redshift distortions and/or photometric errors, survey selection function, completeness, etc., will have to be addressed. We have not discussed these issues here as they are generic to all large-scale structure experiments, but they clearly represent a major challenge for the exploitation of future data-sets. Both the first and the last point are crucial for virtually all the science goals of future ground-based or satellite surveys, particularly dark energy studies. Although only partial results have been obtained so far, there is every indication that characterising non-Gaussianity in future galaxy surveys will result in a significant test of the initial conditions of the Universe. To summarize, sufficient experimental sensitivity has been reached recently in CMB experiments (namely WMAP) to allow for meaningful constraints on the non-linear parameter f NL for several different families of models. These results are already arguably the most stringent quantitative test of the predictions of standard inflation. However, much tighter constraints on a broader range of models are expected from the future Planck data release. Thus a dramatic confrontation is set to continue between the de facto standard model of inflation and observational datasets from both the CMB and large-scale structure. Tests of primordial non-Gaussianity are rapidly becoming one of the most effective and promising approaches for gleaning important information about the physical processes that generated the primordial cosmological perturbations.
In this context a crucial role is played by the Fisher information matrix, defined as The Fisher matrix appears in an important theorem, known as the Cramer-Rao inequality, stating that for any unbiased estimator of λ This theorem is then placing a lower bound on the error bars that can be attained when estimating a given parameter from a given set of observations. No matter which estimator is used, the smallest attainable error bars will be given by the square root of the inverse of the Fisher matrix. For a demonstration of this crucial result see e.g. Kendall & Stuart (1979) or, in relation to the CMB bispectrum, Babich (2005). It is then clear that the best estimator of a parameter is an unbiased estimator saturating the Rao-Cramer bound. If such an estimator is found, then it is impossible to obtain a better estimate using any other statistic. The question then becomes if, for a given the PDF p(x|λ), an estimator saturating the Rao-Cramer bound exists. It can be shown that a necessary and sufficient condition for an estimator E(x) of a parameter λ to be optimal is the following: where F is the Fisher information matrix just introduced above. Another crucial quantity in estimation theory is the so called maximum-likelihood estimator. In a maximumlikelihood (ML) approach we take the observed data set x obs as fixed and we estimate λ as the parameter that maximize the probability (likelihood) to observe the given data. In formulae, the ML-estimate of λ is the valueλ that satisfies: In this context the PDF p(x|λ) is often denoted as the likelihood function and indicated as L(x, λ). Two powerful theorems involving the likelihood have been proven: 1. If there is an optimal unbiased estimator (i.e. an unbiased estimator saturating the Rao-Cramer bound) then it is the maximum-likelihood estimator or a function of it.
2. The maximum likelihood estimator is asymptotically optimal, i.e. it saturates the Rao-Cramer bound when N → ∞, N being the number of repeated observations in our data set x obs (1) , . . . , x obs (N) . These two theorems answer our initial question about the best estimator choice. The first theorem basically states that if a best method exist, then the ML-estimator is that method. Note that this result follows naturally from the optimality condition (A.5) introduced above. The second theorem says that for very large data sets the ML-estimator is the best method, i.e. the one saturating the Rao-Cramer bound. In other words, when dealing with the practical problem of estimating a parameter from a given data set, we should in theory always choose a ML-likelihood approach. However in practice this is not always possible: for example, the PDF p(x|λ) might be too difficult to calculate or sample numerically, or the ML condition (A.6) (generally a complicated non-linear equation) too difficult to solve. In this case other approaches and different estimators have to be chosen.
An important role is played by the likelihood of Gaussian random variables. If a given observed variable O α is characterized by gaussianly distributed errors, then it is easy to see that its likelihood is L = e χ 2 /2 , (A.7)