Compact source detection in multi-channel microwave surveys: from SZ clusters to polarized sources

In this paper we describe the state-of-the art status of multi-frequency detection techniques for compact sources in microwave astronomy. From the simplest cases where the spectral behaviour is well-known (i.e. thermal SZ clusters) to the more complex cases where there is little a priori information (i.e. polarized radio sources) we will review the main advances and the most recent results in the detection problem.


I. INTRODUCTION
Extragalactic foregrounds play a crucial role in microwave astronomy, not only by their effect as contaminants of the Cosmic Microwave Background (CMB) but also by their own right as cosmological probes. Galaxies and galaxy clusters, if not properly identified and taken into account, can seriously affect the measurement of the CMB anisotropies angular power spectrum in temperature [1]- [3] and polarization [4], [5], CMB non-Gaussianity tests [6]- [12] and even the performance of component separation methods used for the study of Galactic foregrounds 1 [13], [14]. On the other hand, galaxy and galaxy cluster surveys in the sub-mm regime of the electromagnetic spectrum are powerful tools for cosmology [15]- [17]. This is the motivation of the considerable number of works on extragalactic foreground detection that have appeared in the literature on recent years.
As opposed to Galactic foregrounds, that are typically extended as diffuse clouds over large areas of the sky, individual extragalactic objects appear as compact blobs that subtend very small angular scales. For this reason, both galaxies and galaxy clusters are often referred to as compact sources and their detection/separation is typically treated as a problem apart from the one posed by the separation of Galactic diffuse components.
It is precisely the compactness of extragalactic sources that makes it possible to detect them against the fluctuations of the diffuse components (CMB included) in single-frequency 1 Instituto de Física de Cantabria, CSIC-UC, Av. los Castros s/n, 39005 Santander, Spain 2 Departamento de Matemáticas, Universidad de Oviedo, Calvo Sotelo s/n, 33007 Oviedo, Spain 3 Astrophysics Group, Cavendish Laboratory, JJ Thomson Avenue, Cambridge CB3 0HE, UK 1 On the other hand, the opposite is also true: foregrounds can affect the performance of compact source detection algorithms. In general, compact source detection algorithms find it easier to deal with diffuse foregrounds than diffuse component separation techniques to deal with compact sources, so the typical CMB analysis pipeline includes the detection of compact sources as a previous step to diffuse component separation.
(channel) settings. Most of the detection methods that have been proposed in the literature make use of this scale diversity. The well known SExtractor package [18], for example, is particularly good at estimating and then subtracting the background at coarse scales and then detecting compact sources by looking for small sets of connected pixels above a given threshold. For the same reason, techniques based on scaleselecting devices such as band-pass filters [19]- [24] and wavelets [25], [26] have proven to be very useful. More sophisticated detection Bayesian algorithms [27]- [31] make also use at some point of the scale diversity of compact sources versus diffuse foregrounds. A recent review on methods for the detection of compact sources in microwave images can be found in [32].
The majority of current CMB experiments, such as the Wilkinson Microwave Anisotropy Probe (WMAP) [33] and Planck [34], can observe the sky in several frequency bands simultaneously. Some of them are also capable of measuring not only the intensity of the microwave radiation, but also its polarization. This means that extragalactic compact sources can be observed in several different images or channels. Multichannel 2 information can be used to improve the chances of detecting compact sources. In this paper we will review the state of the art multi-channel detection techniques for compact sources in microwave astronomy.
Multi-channel (multi-frequency) astronomy is almost as old as the telescope itself: we have been doing it since the first colour filters became available. During the XX th and XXI st centuries astronomy has expanded its range of operation into the radio, microwave, infrared, ultraviolet, Xray and gamma regimes of the electromagnetic spectrum: a tremendous amount of information we are only beginning to piece together. The most basic multi-channel consideration we can do is: given that we have observed some object in the optical, let us see how does it look like in, for example, the infra-red. Catalogue cross-correlation and band-merging, follow-ups and the construction of spectral energy distribution (SED) curves are fundamental parts of this multi-channel quest for knowledge. But we are not going to review this.
A follow-up -looking at a certain position of the sky where a 2 In this work we wil use the term multi-channel detection instead of the more common multi-frequency term because it can accommodate both the classic problem of detecting a source using different frequency maps and the detection of a source in polarization using images of the Stokes' parameters I, Q, U . As we will see, both problems are formally equivalent and therefore we prefer the more general terminology. particular source has been already detected in a different region of the electromagnetic spectrum-can only work if we have a clear detection of the source in the first place. Sometimes, however, the sources are just too faint to be detected with enough significance in any of the available channels. But maybe if we could join together all the channels we would be able to detect the source. Even in the cases where the source is clearly detected in one channel, it may be too weak to be observed in other channels. Once again, one can ask if there is any possibility to use other information apart from just the source position from the 'good' channels in order to enhance the source in the 'bad' ones. Or we can more generally ask if there is a better way to combine all the channels to get better SEDs than just going channel by channel separately. This is the aim of our review.
The key for multi-channel source separation is spectral diversity: we hope the signal of interest to scale from one image to other in a different way than the other components (background). Assuming a linear mixture model in which N different, unknown sources are added with a set of channeldependent weights to produce M observed channels (M ≥ N ), and making certain assumptions about the statistical properties of the sources -for example statistical independence, or non-Gaussianity, etc-the Blind Source Separation (BSS) problem is solvable by means of a number of statistical signal processing techniques such as the Maximum Entropy Method [35], Independent Component Analysis [36], Correlated Component Analysis [37], [38] or wavelet-based Internal Template Fitting [39], just to mention a few of them. For a more detailed review of BSS methods applied to microwave experiments, see [14].
Unfortunately for our purposes the above mentioned BSS techniques are not well suited for the detection of extragalactic compact sources, except for the particular case of galaxy clusters observed through the thermal Sunyaev-Zel'dovich (tSZ) effect. Individual galaxies leave their imprint on the microwave sky through an enormous variety of astrophysical mechanisms -from radio active lobes to dust thermal emission-so that, strictly speaking, each individual galaxy has its own unique spectral behaviour. In the linear mixing model this translates into N ≫ M and the BSS problem is sorely under determined. New methods, specifically tailored for compact sources, become necessary. Even in the case of tSZ clusters, where all the sources share the same spectral behaviour, it may be advisable to apply other techniques different from the above mentioned BSS methods. The tSZ is sub-dominant at all frequencies, making it very difficult for BSS techniques to detect but the brightest clusters in the sky. The multi-channel detection methods we are going to describe in this review make use, up to different extent, of both scale and spectral diversities in order to optimize the detectability of compact sources.
In this paper we will review the most widely used methods for detecting extragalactic compact sources using multichannel data. We will first formalize the problem in section II. Then we will review the different approaches currently used in microwave astronomy. We choose the order in which we introduce the methods on the basis of the amount of information that is used to achieve detection: we will start in section III with traditional stacking and band-merging techniques that make a minimal use of multi-channel information and then proceed in section IV to a linear filtering scheme that takes into account the correlation of the background among different channels. In section V we will discuss how to filter the data when both the correlation of the background among channels and the spectral behaviour of the sources are known. We will see how the second condition can be relaxed. Then we will discuss the more general theory of Bayesian detection in section VI. Finally, we will devote section VII to the particular case of polarization data.

OBSERVATIONS
Let us consider a set of N two-dimensional images (channels) in which there is an unknown number of compact sources embedded in a mixture of instrumental noise and other astrophysical components. Without loss of generality, let us consider the case of a single source with a certain typical angular scale R and located at the origin of the coordinates. Our data model is where the subscript j = 1, . . . , N denotes the index of the channel: it may refer to a given frequency (i.e. 30 GHz, 44 GHz, etc), to a polarization channel (i.e. the Stokes' parameters I, Q, U ) or any other image indexing we can consider. The scale R of the sources can vary from one object to other (as it happens for galaxy clusters) or be the same for all the objects belonging to a given class (i.e. radio sources observed by low angular resolution experiments such as WMAP). It is common to factorize the source term as the product of a channel-dependent amplitude or intensity and a spatial profile The spatial profile, in turn, includes the possible effects of any channel-dependent point spread function (beam): The above formula can be expressed more easily in Fourier space: where for simplicity we have kept the same symbol for the functions in real and in Fourier space (the argument x or k indicates clearly enough in which space we are). We choose the normalization of the profile so that Finally, for simplicity we will consider symmetric beams and profiles, τ j ( x; R) = τ j (| x|; R) = τ j (x; R). This assumption is not strictly necessary and can be relaxed on demand, but it greatly simplifies calculations and is quite reasonable in most applications. The term n j ( x) in equation (1) is the generalized noise in the j th channel, containing not only instrumental noise, but also CMB and all the other astrophysical components apart from the compact sources. Let us suppose that the noise has zero mean and that it can be characterized by its cross-power spectrum: In other words, we work under the assumption that the properties of the noise can be sufficiently described by its second-order statistics. Please note that the homogeneity of the background is a necessary condition for the power spectrum to be a full second order description of the background; this condition is not globally met in real astronomical images (where the Galactic foregrounds, for example, are strongly non homogeneous), but we can always take patches small enough to satisfy local homogeneity (at least on a first approximation).

A. Notation
In this paper we will use the notation x to indicate coordinate vectors (both in real and Fourier spaces) and the notation x to indicate vectors whose components are labelled with the indexes of the different channels. According to this notation, we can write equations (1-6) as It will be useful in some cases to expand the vector of intensities as where I * can be interpreted as a channel-independent intensity and f is a vector containing the spectral behaviour of the sources across the different channels. For example, when we consider the tSZ I * can be associated with the cluster Compton-y parameter and f takes the well-known form, in thermodynamic temperature units, whereν = hν/kT , ν is the frequency of observation, k is the Boltzmann constant and T is the temperature of the CMB. Other example is provided by radio sources whose spectral behaviour can be approximated by a power law where ν 0 is some frequency of reference and α is known as the spectral index. In this case, I * can be interpreted as the source flux density at the reference frequency ν 0 . Please note that analytic spectral laws such as (9) or (10) are not always available, or even convenient. A list of symbols used in this paper can be found in table I, that appears in the Appendix.

III. BASIC MULTI-CHANNEL DETECTION
Probably, the simplest imaginable approach to the multichannel problem consists in trying somehow to transform it to a much simpler single-channel problem. The theory and applications of single-channel source detection are well studied in the literature; we will assume in this work that the reader is already familiar with the topic. A short, recent review can be found in [32] and references therein.
Let us start with the simplest -and most frequent-case of multi-channel observation we can conceive. Imagine we have just two independent observations D 1 and D 2 of the same source, taken at the same frequency and with the same instrumental characteristics (beam size, noise level, etc). A perfect example is any pair of identical radiometers in an experiment such as Planck. Let us also assume that the two observations are simultaneous, or at least that the source under study has not experienced variability in the time passed between observations. Then, it is well known that if the noises of the two observations are Gaussian with rms σ 1 = σ 2 = σ and independently distributed, the linear combinationD = (D 1 +D 2 )/2 has a lower rms noiseσ = σ/ √ 2. If the different channels have different noise levels but the noises are still uncorrelated, another well-known result is that the noise of the linear combination of the channels can be minimized by inverse noise variance weighting of the individual channels. Once the multi-channel data has been combined (or projected) into a single channel (or plane) it is straightforward to apply threshold detection, or filtering plus thresholding, or any other of the techniques described in [32]. With the appropriate modifications, the weighting internal linear combination scheme can be extended to correlated noise [40]. The problem with this approach is that if the source has not the same intensity in all the channels (or if its spectral behaviour is not perfectly known) it is impossible to relate the intensity observed in the combined image with the true intensity of the source.
Other merging schemes can be advisable in some other circumstances. Imagine that some of the components of the noise in (1) are constant across all the channels: the best example is the CMB itself in images expressed in thermodynamic temperature units. Then an obvious way to reduce the noise is to subtract pairs of channels. As an example, [41] and [42] used combinations of the WMAP W and V bands in order to produce a CMB-free map to better detect the elusive radio galaxies in the WMAP 5-year data. The problem with this approach is that if a source has by chance a spectrum flat enough, it may be also cancelled by the subtraction.

IV. MATRIX MULTI-FILTERS
A recurrent problem of channel linear combination is that accurate photometry -that is, the determination of I * -is not possible unless the exact values of all the components of the spectral behaviour f are known. But the spectral behaviour is not known in most cases. General laws such as (10) are not reliable when the range of frequencies involved is wide enough, and even if they are there is the additional problem that different sources in a given patch of the sky will have different spectral indexes. The classic solution in these cases, if one wants to persist using linear combination nevertheless, is to go back to the individual channels and extract photometric measurements on them.
In order to overcome these problems [43], [44] proposed a method that combines the signal-to-noise boosting capacity of linear filters with a particular merging scheme that makes totally irrelevant the spectral behaviour f . The method is called matrix multi-filters (MTXF) and its foundations can be summarized as follows: Let Ψ ij ( x) be a set (matrix) of N × N linear filters, and let us define the set of quantities The quantity above is the sum of a set of linear filterings of the individual channels. We intend to use the combined filtered images w i ( x) as estimators of the source amplitudes A i . For that purpose, we require that the filters Ψ ij satisfy the conditions: • The combined filtered image at the position of the source w i 0 is an unbiased estimator of A i .
• The combined filtered image at the position of the source w i 0 is an efficient estimator of A i , that is, the variance of the estimator is minimum.
Looking at the structure of (11) it is easy to verify that a sufficient condition in order to guarantee the first condition above is that is, the filter functions are orthonormal to the source profile functions. This guarantees statistical unbiasedness independently of whatever values f may take. The solution to the problem given the two conditions above and the constraints (12) is, in Fourier space where * denotes complex conjugation and In the case where P is diagonal (no noise correlation among channels), it can be shown that the matrix multi-filters default to a simple matched filter applied individually to each channel. The MTXF do not project N channels into a single effective plane where to detect the sources. They project instead N channels into N new planes where the sources can be detected and their N amplitudes A i (i = 1, . . . , N ) can be estimated separately. The difference with the single-channel approach is that each one of the N new planes w i is constructed by combination of the original N channels in such a way the noise is cancelled more effectively: the MTXF use the multi-channel cross-correlation of noise but do not use at all any information about the spectral behaviour of the sources. It can be described as a 'semi multi-channel approach', in the sense that it uses only half of the available information. When the noise cross-correlation among channels is zero, the method defaults to standard single-channel matched filtering. But when the cross-correlation is not null, the MTXF give better signal-to-noise boosts than the standard single-channel matched filter. Figure 1 shows the comparison between the single-channel matched filter and the MTXF for a simulation of the Planck 30, 44, 70 and 100 GHz channels. Note that for the 44 and 70 GHz channels the output matrix filtered maps look far cleaner than their matched filtered equivalents. For the 30 GHz channel, the distinction is not so clear, but some improvement can be appreciated nevertheless. Finally, for the 100 GHz channel both filtered images look practically identical. The gain factors obtained for these images with the MTXF are [2.9, 3.8, 3.5, 2.8] for the [30,44,70, 100] GHz channels. The signal-to-noise gain ratio between the MTXF and the matched filter is GMTXF/GMF = [1.38, 1.52, 1.49, 1.00] for the [30,44,70, 100] GHz channels. Note that in one of the channels (100 GHz) the signal-to-noise gain is equal to one; although this is not always the case, the MTXF tend to default to the standard matched filter for some of the channels when the multi-channel mixing does not add information in a constructive way: this is often (but not always) the case for the Planck 100 GHz when combined with the lower frequency channels, that have worse angular resolution and higher noise levels. This cannot be taken as a general rule, because the conditions change from one region of the sky to another. Only in the worst case the 100 GHz MTXF filtered image is as good as the single-channel MF filtered image. Regarding the number of true and spurious detections produced by both methods Figure 2 shows the receiver operating characteristic (ROC) curves for the MTXF and the MF in the four considered channels; a clear improvement can be appreciated at 30, 44 and 70 GHz, whereas both methods work similarly for thr 100 GHz case. This serves as an indication of how MTXF can produce results better or at least equal to single-channel matched filters without making any use of the sources spectral diversity.

V. MATCHED MULTI-FILTER
Multi-channel detection techniques reach the summit of their power when they fully exploit the spectral diversity of compact sources with respect to the background. he fully multi-channel compact source detection techniques that are being applied in the context of CMB astronomy lie in two main groups: linear filtering methods based on the so-called matched multi-filters plus some thresholding detection criteria and Bayesian methods, to be discussed in the next section.

A. Standard matched multi-filters
Let us assume that the spectral behaviour f of the sources is known a priori. An example is the well-known thermal Sunyaev-Zel'dovich effect which, ignoring relativistic effects, has a spectral behaviour given by (9). This opens interesting possibilities. For example, if f is known, it is straightforward to stack the different channels with an optimal weighting designed to minimize the effects of the background while keeping intact the intensity of the sources [45]: such that if there is a source of intensity I * located at the origin, then D 0 = I * .
The solution to this kind of internal linear combination of channels is given by the generalized eigenvalue problem [45] ( where the elements of the matrices G and M are given by It is evident that M is a measure of the cross-correlation of the noise among the different channels. The combination (15) using the weights obtained through (17) gives the optimal internal linear combination (ILC) map for the detection of sources with the spectral behaviour f . The ILC map can be further processed using a single-channel matched filter as suggested by [45]. In an independent work, [40] combined simulated multiwavelength maps in order to increase the average signal-to-noise ratio of galaxies assuming that their spectral behaviour could be modelled by expressions such as (10) with a known spectral index.
In the previous ILC method, however, the separation between linear combination and filtering seems somewhat artificial. The process of filtering and projecting into a single effective plane can be achieved in a single step by means of the so-called matched multi-filters [45]- [48]. Let us define the effective filtered plane as a sum of optimally filtered imageŝ where the ψ i are N linear filters depending on a certain scale parameter S. The meaning of this scale parameter will become evident shortly. If we impose to the effective filtered planeD the usual unbiasedness and efficiency conditions, that is • At the position of the sources,D is an unbiased estimator of I * . • The variance ofD is minimum. The solution, as proven in [45], is given in Fourier space by where the vector F has components F i k; S = f i τ i k; S . We show explicitly the dependence on S for a reason that will be clear immediately. The filters defined by (21) Figure 3 shows a small part of a realistic sky simulation obtained with the Planck Sky Model (PSM) package [49] for the Planck HFI frequencies of 100 143, 217, 353, 545 and 857 GHz. There is a bright galaxy cluster in the center of the images, but it is very hard to spot it visually. On the other hand, the presence of the cluster is evident in the MMF-filtered image shown in Figure 4.

B. Objects with different size
Until now, we have not made reference to the scale R of the sources that appeared in equations (1)- (5). In sections III and IV we have implicitly assumed that all the sources shared the same basic profile τ 0 . This is pretty well the case of

galaxies in low resolution experiments such as WMAP and
Planck: the angular scale of these objects is typically far smaller than the instrument psf and therefore all of them can be safely considered as point sources, with observed profiles that are basically equal to the observing beam. In that case, we can effectively forget about the scale R and assume that is something intrinsic to the profile τ 0 that does not require to be made explicit. But this cannot be the case for galaxy clusters: many of them are resolved objects even at the relatively low angular resolutions of Planck [50]. Detection algorithms must therefore provide not only a list of positions and intensities, but also of sizes of clusters. Moreover, they should give accurate photometry (values of I * ) for all the range of possible object sizes. MMF can be adapted to do precisely this in a very simple manner.
Let us assume that all the compact sources in a given multichannel image have the same spatial profile except for a scale parameter R. For example we may consider the universal galaxy cluster pressure profile given by [51] with a different contrast radius R 500 for each cluster in our sample. If the j th cluster has a true value of its scale parameter equal to R j , then is it straightforward to prove that the filters (21) satisfy the conditions of unbiasedness and maximum efficiency if and only if their scale is S = R j . Moreover, as a part of the demonstration it can be shown that the signal-to-noise boosting given by the filter for that particular cluster is maximum also if and only if S = R j . This suggests a very simple detection/estimation algorithm: 1) Filter the multi-channel image with a set of filters (21) with different scales S i . 2) Select as cluster candidates the positions of the maxima of the filtered images above a certain threshold. 3) For each cluster candidate j, obtain the curve showing the signal-to-noise boosting versus the scale S i . Find the location S max (j) of that curve. 4) Make R j = S max (j).

C. Unbiased matched multi-filters
Galaxy clusters interact with CMB photons not only through the thermal SZ effect, but also through the so-called kinematic SZ effect due to the proper motion of the cluster. This is a perfect example of a case in which the same object produces two signals with identical spatial distribution but different spectral behaviours. The mixing of the two signals can affect negatively the determination of each of them. Normally the kinematic effect is at least one order of magnitude fainter than the tSZ and therefore the bias introduced by it in tSZ measurements can be neglected. But the contrary is not true: the tSZ can contaminate significantly the estimation of the kinematic effect. In order to avoid this [46] proposed a modification of MMF specifically tailored to cancel out this bias.
Let us consider a case in which we wish to observe the kinematic SZ effect without being disturbed by tSZ. In thermodynamic units the vector f = [f 1 , . . . , f N ] of the thermal effect is given by equation (9). The spectral behaviour of the kinematic effect is, in the same units, flat: [1, . . . , 1]. We look for a set of N filters Φ i , . . . , Φ N such that the estimation of the kinematic effect is not affected by the thermal. A sufficient condition for this is which can be interpreted as a kind of orthogonality with respect to the spectral behaviour laws of the components. The solution, when we add the maximum efficiency constraint, looks similar to the MMF: In this equation the constants α, β and ∆ are given by Similarly, it is possible -albeit less interesting-to design filters that give the thermal SZ effect while cancelling the bias introduced by the kinematic effect. Both families of filters are called unbiased matched multi-filters (UMMF).

D. Unknown spectral behaviour
The prior knowledge of f is the fundamental key of the success of MMF and the reason why they are able to get high signal-to-noise ratio boosts with respect to the individual channels. However, as we mentioned before this prior knowledge is only certain when considering the SZ effect (and while ignoring relativistic effects). Extragalactic radio and infrared sources are not that simple to model. One possible solution is to group sources in broad families -radio flat, radio steep, dusty galaxies of a certain type, etc-and to define average spectral laws, such as (10), for each family. This approach will not be optimal for individual sources and will certainly introduce biases in the photometry, but at least is expected to improve the number of detections with respect to singlechannel detection.
Fortunately MMF allow us to solve the conundrum in a very elegant way by means of an adaptive filtering scheme similar to the one described in section V-B.
Now imagine that f describes the true (unknown) spectral behaviour of a source and that g = [g i ], i = 1, . . . , N is a new vector of equal size as f , but whose elements can take any possible value. We can define the MMF for vector g: where all the definitions are in Fourier space and we have not written down the explicit dependence on k for simplicity. If we apply the filter (27) to a source with true spectral behaviour f and true intensity I * we will get so I g = I * unless g = f . This is no help since we do not know the true value I * . But it can be shown [52] that the quantity where σ 2 g is the variance of the image filtered with (27), is maximum when g = f . SN R g is obviously the signal-to-noise ratio of the source after filtering. This allows us to reproduce the same kind of algorithm that was used in section V-B to detect the sources and estimate simultaneously their position, intensity and spectral behaviour (instead of their scale). The only difference here is that while in section V-B there was only one parameter per cluster to be determined (the scale), here we have N − 1 parameters per cluster (components of f ) to determine. The method has been recently applied to the two highest frequency channels of WMAP [53], leading to a new catalogue of 157 objects detected at the 5σ level at 94 GHz, which is a substantial improvement over the WMAP Five-Band Point Source Catalogue.

ESTIMATION
All the multi-channel detection methods above have a common two-step methodological approach: in a first moment the data are somehow pre-processed and then objects are detected by means of some criterion -typically thresholdingthat hopefully separates them from the background noise. For the pre-processing step, one chooses beforehand a given tool (e.g. linear combinations of channels, or a given kind of filter/wavelet) and adjusts a small number of parameters (e.g. the weights of the linear combination, the scale of the wavelet, etc) according to some optimality criterion (e.g. unbiasedness, maximum efficiency). For the detection step, an arbitrary threshold is chosen with the hope to reach a compromise between minimizing the number of spurious detections and maximizing the number of true detections. A third step for the estimation of a number of parameters of the sources (e.g. intensity, size...) can be attempted after detection or, in some cases, simultaneously to it 3 (as it occurred with MMF). Errors to the estimated parameters are calculated by means of some prescription such as Fisher analysis. All these steps are somewhat arbitrary and are focused only in the statistical properties of the background (and the deterministic spectral behaviour of the sources, in the case of multi-channel detection). This approach leaves out all probabilistic (and potentially useful) information about how many sources are expected to be found above a given flux limit, how are they distributed as a function of intensity, how many classes of sources are there and in which proportion are they present in the data, etc.
The Bayesian system of inference is the only one that provides a consistent extension of deductive logic to a broader class of degrees-of-belief by mapping them into the real interval [0, 1] [54]. Bayesian inference provides a logically consistent way of tackling the detection problem as a part of the decision theory, while incorporating a probabilistic description of the sources as a priori information in a natural way. The Bayesian framework also provides a sensible detection criterion through the Bayesian posterior odds ratio [54], [55]. In estimation, Bayesian methods give a full description of the a posteriori distribution of the parameters given the current data, thus allowing us to obtain expectation values, confidence level contours and any other statistics of interest.
A detailed description of multi-channel Bayesian detection of compact sources is given in [55]. Here we will just summarize the main theoretical aspects of the problem. Let us start with Bayes' theorem P r (Θ|D, H) = P r (D|Θ, H) P r (Θ|H) P r (D|H) , where D is the vector of the observations as in eq. (1) and is the vector Θ contains all the relevant parameters (positions, intensities, sizes, etc) to the detection problem and H is the underlining hypothesis. In the usual Bayesian terminology, P r(Θ|D, H) is the posterior probability distribution of the parameters, P r(D|Θ, H) ≡ L(Θ) is the Likelihood, P r (Θ|H) ≡ P r(Θ) is the prior and P r (D|H) ≡ Z is the Bayesian evidence. The most simple detection scenario can be described as a decision between two incompatible hypothesis H 1 (there is a source) and H 0 (there is not a source). In this case, it can be shown [54], [55] that the decision criterion that minimizes the expected loss (that is, that optimizes the balance between the number of true and false detections) is given by the following condition on the Bayesian odds ratio where ξ is a threshold that is fixed for any given loss function.
The choice of the loss function depends on the particular setting of the experiment; a common custom in microwave astronomy is to give identical weight to a spurious detection than to a missing true detection.

A. Likelihood
The form of the likelihood is determined by the statistical properties of the generalised noise (background sky emission plus instrumental noise) in each frequency channel. If the generalised noise is statistically homogeneous it is more convenient to work in Fourier space, since there are no correlations between the Fourier modes of the generalised noise. Please note that although Galactic emission is not homogeneous, one can always operate locally in patches small enough to make the homogeneity assumption valid (as a first approximation). It is also common to assume that both the background emission and instrumental noise are Gaussian random fields. This is a very accurate assumption for instrumental noise and the primordial CMB, but more questionable for Galactic emission. Under the previous assumptions, the likelihood ratio between the hypotheses H 1 and H 0 can be written as Note that the effect of the products X t P −1 Y, with X and Y generic vectors, is to project the multi-channel data into one single equivalent channel (or plane), as it happened in the MMF and UMMF. The similarities are more profound. It is shown in [55] that, when the source term s is written as the sum of N s compact sources distributed across the image, the likelihood ratio (32) can be split into two parts: • A sum of 'auto-terms' each containing just the parameters corresponding to the i th source (i = 1, . . . , N s ) • A sum of 'cross-terms' with mixed parameters corresponding to the i th , j th sources (i, j = 1, . . . , N s ) The cross-term goes quickly to zero when the sources are well separated, but must be taken into account when source blending is frequent (crowded fields). It is worth noting that maximising the likelihood ratio (32), in the absence of the cross-term (negligible source blending), with respect to the source intensities I * ,j , leads precisely to the MMF described in section V. This is the multi-channel generalization of the well-known result that matched filtering is the solution of the generalized maximum likelihood test (GLRT) for Gaussian noise and a single source [32], [56].

B. Priors
If the data model provides a good description of the observed data and the signal-to-noise ratio is high, then the likelihood will be very strongly peaked around the true parameter values and the prior will have little or no influence on the posterior distribution. At the faint end of the source population, however, priors will inevitably play an important role. Moreover, since for most cases in astronomy the faint tail overwhelmingly dominates the population, the selection of the priors becomes important and has to be addressed very carefully. Physical (informative) priors are particularly useful when addressing the detection problem, whereas noninformative priors can be more adapted to the task of parameter estimation once the sources have been detected [55]. For multichannel detection, the list of priors to be considered must include 1) Prior on the positions: For extragalactic sources, it is reasonable to assume a uniform prior. Clustering can be a problem, particularly for dusty galaxies observed at ν ≥ 300 GHz.
2) Prior on the number of sources: Following the same rationale of local uniformity, i.e no clustering, the probability of finding N s objects (above a given flux limit) in a sky patch follows a Poisson distribution where λ is the expected number of sources per patch.

3) Prior on intensity:
A usual informative prior on the distribution in intensity of extragalactic sources is a Generalized Cauchy Distribution such as in [31]. This provides a good model for the observed distribution of fluxes, fitting the de Zotti or Tucci models almost perfectly [2], [3]. Moreover, the distribution can be properly normalised as required for evidence evaluation. For the case of galaxy clusters, a power law distribution fits well the cluster populations assuming a standard WMAP best-fit ΛCDM cosmology [57] and the Jenkins mass function [58]. 4) Prior on the sizes: Point sources are best modelled by imposing the prior π(R) = δ(R) on the radius. For galaxy clusters, the fact that a significant number of them can be resolved must be taken into account. In [55] a truncated exponential law is found to fit the simulated catalogues very well.

5) Prior on the spectral parameters:
For galaxy clusters observed through the thermal Sunyaev-Zel'dovich effect, the spectral behaviour can be safely fixed to (9), if we ignore relativistic effects. For point sources, however, the fact that each source has a spectral behaviour that is different from all the others must be accommodated. Either if we use power laws such as (10) for radio sources or grey body laws for dusty galaxies, there is at least one spectral parameter that changes from one source to another. Informative priors can be derived from the available source counts models [2], [3], or uniform priors can be used when physical models are not available.
6) Priors on the models: If there is more than a kind of source in the data (e.g. we have radio sources, dusty galaxies, Galactic compact sources and galaxy clusters in the same image) then appropriate priors in the probability of the different hypotheses H j should be included in the Bayesian framework. More importantly, the Bayesian framework allows us to explicitly consider the probability of background fluctuations above a certain level in order to optimize the decision between the hypotheses H j and the null hypothesis H 0 (i.e. no source).

C. Bayesian detection methods in the literature
Solving the Bayesian equations for object detection and parameter estimation typically implies sampling from a very complex posterior distribution with variable dimensionality (dependent on the number of objects). Thus the main problem that Bayesian methods need to overcome is the computational burden of evaluating integrals over the posterior and its marginals. This can be avoided in some cases by semianalytical Maximum A Posteriori evaluation [31], but if a full analysis of the posterior is required then sampling is unavoidable. Typical implementations include Monte-Carlo Markov chain (MCMC) algorithms [27] and posterior refinements such as nested sampling [28], [59]. [29] implemented a simultaneous multiple minimization code based on Powell's direction set algorithm [60], that is generalized to the multichannel case in [55]. A full description of that algorithm and its catalogue making procedure it out of the scope of this review.

VII. POLARIZED SOURCES
An interesting case of multi-channel setting is the detection of extragalactic sources in polarization data. Mathematically the problem is not different from the general model given in section II, but the particularities of polarization observations have motivated the appearance of some specific techniques apart from the ones already discussed. CMB polarization has been described as the next observational frontier of cosmology. In the last few years, some methods have been specifically developed to address the important problem of detecting compact sources in polarization data. In this section, we will give a brief review of these methods, which have been applied to CMB simulations as well as to real data [61]- [63].
Polarization of light is conveniently described in terms of the Stokes parameters I, Q, U and V , (see [64] for an excellent review on CMB polarization). Q and U are the linear polarization parameters and V indicates the circular polarization. Whereas Q, U and V depend on the orientation of the receivers, the total polarization, defined as is invariant with respect to the relative orientation of the receivers and the direction of the incoming signal, and therefore has a clear physical meaning. This quantity can be treated as the modulus of a vector. Thus, the methods presented in [61] have to do with the study of a set of images which contain signals whose individual intensities can be considered as components of a vector, but where the quantity of interest is the modulus. First, we will consider the case of linear polarization, V ≡ 0, given that the CMB is not circularly polarized in the standard cosmological models [64]. However, since some models predict a possible circular polarization [65], we will comment briefly on this case later. If we have a compact source embedded in the data of Q and U , these can be expressed in the following way with A Q,U the compact source intensity in Q and U , τ ( x), the beam profile and n Q,U ( x) the corresponding noise in both components. The P -map, P ( x) ≡ (D 2 Q ( x) + D 2 U ( x)) 1/2 , is characterised by a source at the centre of the image with amplitude A ≡ (A 2 Q + A 2 U ) 1/2 immersed in non-additive noise which is correlated with the signal.
We assume the same beam profile for the images in Q and U , as well as a Gaussian independently distributed noise with zero mean and the same variance for Q and U . Given these typical conditions, the distribution of P if a source is present is the Rice distribution where σ is the noise rms deviation and I 0 is the modified Bessel function of zero order. By using the Neyman-Pearson lemma [32], a filter is obtained which produces the maximum likelihood estimator of the amplitude,Â. In the case of a pixelized image, we can writê where the indexes refer to the pixels and I 1 the modified Bessel function of the first order. This filter is called the Neyman-Pearson filter (NPF). Alternatively, a matched filter can be applied to each image and then, with the two filtered images Q MF , U MF a non-linear fusion P ≡ (Q 2 MF + U 2 MF ) 1/2 can be made pixel by pixel. This filter is called the filtered fusion (FF).
Simulations with the Planck characteristics showed that the FF performed better than the NPF especially for low polarization fluxes. The FF outperformed the NPF when the power of the detections with a given significance and the flux and position estimation were compared.
If the circular polarization is taken into account, the NPF and the FF can be also calculated. Indeed, the filters can be computed for the modulus of a vector with any number of components [63]. Simulations showed that in the case of circular polarization, the FF also produces better results than the NPF.
Taking into account the results with simulations, [62] applied the FF to the detection of polarized sources in the WMAP 5-year data. They detected, with a significance level greater than 99.99% in at least one WMAP channel, 22 objects, 5 of which were doubtful, since they did not have a plausible low frequency counterpart. These detections were a clear advance with respect to the 5 polarized sources listed by the WMAP team when they analysed the same data. The application of a filter targeted specifically for polarization detection had a clear influence on the improvement.
The application of these methods to Planck data can be expected in the future. It would also be interesting to combine them with a Bayesian approach, so that some previous information about polarization properties of the sources could be taken into account.

VIII. FINAL REMARKS
CMB image processing is a relatively young area of research. Much work remains to be done. There is an incipient but resolved interest among CMB cosmologists to incorporate the newest ideas to solving the problem of compact sources in microwave images.
In this paper we have reviewed the status of the different algorithms and methods that attempt the detection of extragalactic compact sources (galaxies and clusters of galaxies) using multi-channel (that is, obtained by means of more than one single detector) data. This should not be confounded with the traditional band-merging approach to catalogue making. We included the case of detection in polarized data because it is formally equivalent to the detection in multi-frequency experiments. The techniques we have reviewed include linear fusion of images, different multi-channel filtering methods and Bayesian object detection. Although the area of research is young, all of these methods have already been applied to astronomical data sets with great success. In next years, however, we expect to see new interesting developments in the field.
One of the most promising ideas for future research is related to the notions of sparsity and l p -approximations. For the particular case of point like objects, the idea of sparse dictionaries comes naturally [66]. However, the full application to multi-channel CMB compact source detection has not been addressed yet. Wavelet techniques, that are very popular in single-frequency source detection, open another interesting possibility that has not been explored yet. The same applies to other space-scale representations such as Gabor and Wigner-Ville transforms. Undoubtedly, both Bayesian and multi-filtering techniques will see substantial improvements in the next few years. All these new developments will be really come in handy for the new generation of many channel cosmological surveys such as the upcoming J-PAS 4 .

ACKNOWLEDGEMENT
The authors would like to thank the editors of this Special Issue for their invitation to submit a contribution. We also thank Prof. Gianfranco De Zotti for his useful advice. DH and FA acknowledge partial financial support from the Spanish Ministerio de Ciencia e Innovación project AYA2010-21766-C03-01. DH also acknowledges the Spanish Ministerio de Educación for a José Castillejo' mobility grant with reference JC2010-0096 and the Astronomy Department at the Cavendish Laboratory for their hospitality during the elaboration of this paper. FA also wants to thank the Astronomy Department at the Cavendish Laboratory for their hospitality during two wave vector (Fourier) N number of channels D j observed data at the j th channel s j signal at the j th channel n j noise at the j th channel A j source amplitude at the j th channel τ j source profile at the j th channel τ 0 intrinsic source profile (before smearing by the beam) b j beam point spread function at the j th channel R scale parameter P cross-power spectrum matrix f spectral behaviour of the source Ψ,Φ matrix (or vector) of filters SN R signal-to-noise ratio Θ vector of parameters (position, size, amplitude...) I, Q, U, V Stokes' polarization parameters P total polarization I 0 modified Bessel function of zero order