Kinematics of Milky Way Satellites: Mass Estimates, Rotation Limits, and Proper Motions

In the past several years high resolution kinematic data sets from Milky Way satellite galaxies have confirmed earlier indications that these systems are dark matter dominated objects. Further understanding of what these galaxies reveal about cosmology and the small scale structure of dark matter relies in large part on a more detailed interpretation of their internal kinematics. This article discusses a likelihood formalism that extracts important quantities from the kinematic data, including the amplitude of rotation, proper motion, and the mass distribution. In the simplest model the projected error on the rotational amplitude is shown to be $\sim 0.5 $ km s$^{-1}$ with $\sim 10^3$ stars from either classical or ultra-faint satellites. The galaxy Sculptor is analyzed for the presence of a rotational signal; no significant detection of rotation is found, and given this result limits are derived on the Sculptor proper motion. A criteria for model selection is discussed that determines the parameters required to describe the dark matter halo density profiles and the stellar velocity anisotropy. Applied to four data sets with a wide range of velocities, the likelihood is found to be more sensitive to variations in the slope of the dark matter density profile than variations in the velocity anisotropy. Models with variable radial velocity anisotropy are shown to be preferred relative to those in which this quantity is constant at all radii in the galaxy.


Introduction
Since their initial discovery [1], dwarf spheroidals (dSphs) have offered a unique insight into the formation of galaxies and structure on the smallest scales. Initially characterized as unusual and ghostly stellar systems, photometric studies tended to find that these systems contained old stellar populations with no recent signature of star formation activity [2]. Though photometrically well-studied since their discovery over seventy years ago, as late as nearly 30 years ago minimal was known on the internal kinematic properties of their stellar populations or on the kinematic properties of these objects in the Milky Way (MW) halo.
Aaronson [3] provided the first measurement of the line-of-sight velocities of stars in Milky Way dSphs. From the spectra of merely three carbon stars, Aaronson suggested a mass-to-light ratio for the Draco dSph nearly an order of magnitude greater than that of Galactic globular clusters. Follow-up studies of several dSphs, including Sextans, Fornax, Ursa Minor, Sculptor, increased the velocity samples by an order of magnitude, and in the process established these systems to be dark matter dominated [4,5,6]. It was further suggested that all of these systems share a similar dark matter halo mass of ∼ [1 − 5] × 10 7 M ⊙ [6]. Even at the time of these early measurements, it was understood that the mass distributions of these systems provide strong constraints on the properties of the particle nature of dark matter, including its mass and primordial phase-space density [7,8,9].
With the advent of high resolution, multi-object spectroscopy, the velocity samples from the brightest dSphs initially studied in Refs. [4,5,6] have now increased by up to three orders of magnitude [10,11,12]. These new data sets have revealed that the velocity dispersions of the systems are all ∼ 10 km s −1 , and in all cases the dispersions remain constant even out to the projected radius of the outermost velocity measurements [11]. Though the data sets have increased by more than ten-fold, the more modern analysis of these systems still confirms the global conclusion established from the initial observations that dSphs are strongly dark matter dominated [10,11,13,14].
Not only has the past several years seen an increase in the kinematic data sets for the brightest dSphs, the number of known Milky Way satellites has more than doubled due to the Sloan Digital Sky Survey (SDSS). As of the writing of this article, the SDSS has discovered 14 new Galactic satellites [15,16,17]. The new SDSS systems have lower luminosities and surface brightnesses than the 11 classical Milky Way satellites that were known prior to SDSS. The half-light radius for several of these new objects is less than ∼ 100 pc; this radius is smaller than the typical half-light radius of the classical satellites but still somewhat larger than the typical globular cluster half-light radius of ∼ 1 − 10 pc.
Several kinematics studies on the ultra-faint population of SDSS satellites have been undertaken in the past several years [18,19,20,21]. Using spectra from eight of the SDSS satellites, Simon and Geha [20] concluded that these objects are strongly dark matter dominated. Several of the ultra-faint satellites have velocity dispersions as low as ∼ 5 km s −1 , making them the most-promising systems to study the phase space limits of the dark matter. It has additionally been observed that the ultra-faint satellites are the most metal-poor systems known, and that they form a continuation of the luminosity metallicity trend set by the brightest dSphs [22,21].
With the above data sets now available, it is becoming increasingly necessary to develop better theoretical tools to interpret them. An important aspect of the theoretical modeling will necessarily require an interpretation of the kinematic data sets for the population of MW satellites; a detailed understanding of these kinematic data sets will be important not only for determining the mass distributions of each individual system, but for a global comparison to theories of Cold Dark Matter (CDM) [23,13]. Understanding the mass distributions will also be important for interpretation of limits on particle dark matter masses and annihilation cross sections in high-energy gammaray experiments [24,25,26]. Further, understanding the kinematics of these systems may eventually reveal whether they have dark matter cusps or cores, which would in itself provide a stringent test of the CDM paradigm [27].
The primary aim of this article is to discuss a maximum likelihood formalism that is used for extracting important physical quantities from dSph kinematic data sets. Section 2 begins by reviewing the properties of the kinematic data sets and defining the likelihood. Section 3 then uses the likelihood to extract rotational and proper motion signals. Section 4 discusses mass modeling and a new calculation for model selection. Section 5 presents the conclusions.

Likelihood Function and Error Modeling
Information on the kinematic properties of dSphs is extracted from the line-of-sight velocities of their individual stars. This section introduces the likelihood used in the data analysis and projections for the errors attainable on several parameters using the likelihood.

Likelihood Function
The probability for a velocity data set, v, is assumed to be of the form . (1) In Eq. 1 the dispersion of the distribution is given by the sum of the measurement uncertainty on a star, σ m , and the intrinsic dispersion of the system at the projected radius of the star. The latter quantity is symbolized by σ los and is determined by the model; Section 4 below provides more details on this quantity and specifically how it relates to the mass of the systems. The systemic line-of-sight velocity in the direction of the i th star is given by u. Written in the above form, Eq. 1 may be read as the probability for the data set, given the parameters u and σ los . Appealing to Bayes' Theorem and defining the likelihood function as the parameters u and σ los may be determined directly from the data by the maximization of Eq. 2. Equation 2 assumes uniform priors on the model parameters.
The form of Eq. 1 results from the convolution of Gaussian distribution which represents the measurement error on the velocity of a given star with a separate sampling distribution that is assumed to be Gaussian. It is the sampling distribution of velocities that is connected to physical quantities such as the velocity anisotropy of the stars, and the potential of the stellar and dark matter components. For a given model of the galaxy, the true line-of-sight velocity distribution function may indeed be nonaussian; certain limiting cases of the velocity distribution for analytic potentials have been considered in Ref. [28]. This paper shows that when attempting to reconstruct the line-of-sight velocity distribution for a given model, degeneracies exist between the stellar velocity anisotropy and the stellar and dark matter potentials. Though more information may be gained on model parameters if the true velocity distribution were known, and thus utilized in the parameter estimation, the Gaussian approximation provides the most conservative sampling distribution in reconstructing model parameters in variance estimation problems (for a specific discussion of this point, see the discussion in Chapter 8 of Ref. [29]). Further, the mass estimations presented here using the likelihood in Eq. 2 agree with mass estimates that use a Gaussian likelihood in the binned velocity dispersion [30,31]; in this latter case the velocity dispersion does not necessarily correspond to the variance of a Gaussian line-of-sight velocity distribution, making it self-consistent to determine parameters such as the velocity anisotropy.
The distribution function in Eq. 1 provides the simplest description of a data set. Including higher-order effects naturally introduces a larger set of model parameters. The first modification to Eq. 1 from higher order corrections comes from noting that the mean velocity, u, varies as a function of the position of the star in the galaxy. This variation in the mean velocity results from the fact that, for lines-of-sight with larger angles from the line-of-sight directly to the center of the galaxy, the proper motion of the object contributes an increasingly larger component to the line-of-sight velocity. To describe how the line-of-sight velocity varies as a function of position, consider a cartesian coordinate system in which the z-axis points in the direction of the observer from the center of the galaxy, the x-axis points in the direction of decreasing right ascension, and the y-axis points in the direction of increasing declination. The angle φ is measured counter-clockwise from the positive x-axis, and ρ is the angular separation from the center of the galaxy. The mean line-of-sight velocity is then In the small angle approximation, sin ρ ≃ R/D, where R = √ x 2 + y 2 , and D is the distance from the observer to the center of the dSph. Then sin φ = y/R, so that Eq. 3 can be written as u = v x x/D + v y y/D − v z . In the limit that the vector pointing from the observer to the center of the galaxy is exactly parallel to the lines-of-sight to each star, u ≃ −v z . Equation 3 show that the line-of-sight velocity of a system increases roughly linearly with the increase of the projected distance from the center of the dSph. This effect is purely geometric and may be used to recover the proper motion of a dSph with similar accuracy to the proper motions attained in ground and space-based measurements [32,12]; an application to a specific data set of Sculptor is given below. The extraction of dSph proper motions in this manner is analogous to the determination of the proper motions for the Large Magellanic Clould [33] and for M31 [34] from their stellar and satellite distributions, respectively.
There may also be rotational motion, in addition to the dominant contribution from random motions, present in the galaxy. Though rotation is intrinsic to the dynamics of the system and is not purely geometric as that described by Eq. 3, a simple parameterization is possible if the rotation amplitude is described by a term A sin(φ − φ 0 ), where φ 0 defines the projected axis of rotation. Adding all of the terms together gives the following expression for the line-of-sight velocity of a star, With the addition of each of the terms in Eq. 4, our likelihood function now reads and the vector set of 6 parameters (v x , v y , v z , A, φ 0 , σ los ) may be directly determined from the data. In the sections below these parameters are determined from an example data set; before jumping into this data analysis the following sub-section provides a discussion of the theoretical predictions for the errors attainable on these quantities.

Error Modeling
From the likelihood function defined in Eqs. 1 and 5, the Fisher matrix formalism may be used to derive projected errors on the model parameters. For m model parameters that are varied, the Fisher matrix is defined as an m by m matrix so that the entry for the a th and b th parameters is given by Here θ is a vector defining the set of parameters. In the simple case studied in this section, the parameters are given by θ = {v x , v y , v z , A, φ 0 , σ los }. According to the Rao-Cramer inequality, the minimum possible variance attainable on a parameter using maximum likelihood statistics is given by the inverse of the Fisher information matrix, F −1 aa . The average in Eq. 6 is taken over the data, and the derivatives are evaluated at the true model of parameter space. The inverse of the Fisher matrix thus provides an approximation for the true covariance of the parameters, and using F −1 provides a good approximation to the errors on parameters that are well-constrained by the data.
The Fisher matrix is constructed by differentiating the log of the likelihood function in Eq. 5. It will be understood that the total dispersion σ 2 ı = σ 2 los + σ 2 m is evaluated at the projected radius of the ı th star. Averaging over the likelihood function, and using the above definition of u, the final expression for the Fisher matrix is The sum is over the N number of observed stars in the galaxy. The analysis in this section considers the simplified case that σ 2 los does not in itself depend on any model parameters. A more detailed model would consider this quantity as a function of the parameters that describe the mass modeling of the system; this is discussed in more detail in Section 4 below.
In the second term in Eq. 7, the derivatives are with respect to the theory dispersion alone, whereas both of the contributions to the variance sum in the denominator. For the well-studied satellites, with intrinsic velocity dispersions of 10 km s −1 , the dispersion from the distribution function dominates the dispersion from the measurement uncertainty, while for many of the newly-discovered satellites, both contributions to the dispersions are similar. Equation 7 shows that, to determine the error on any of the θ parameters, one must determine 1) the distribution of stars within the dSph that have measured velocities, and 2) the error on the velocity of each star. This implies that the projected errors are independent of the mean velocity of the stars. Additionally, under the approximation that sin ρ ≪ 1 and no rotation, the first term in Eq. 7 vanishes, and the errors are independent of the parameters describing the mean motion of the system.
The projected errors obtained using Eq. 7 provide an excellent estimate of the measured errors on both v x and v y [32,12]. Though there has been no conclusive detection of a parameter similar to A in published kinematic data samples, it is interesting to determine the expected error on this quantity given expected future data samples. Figure 1 shows example error projections for A, for two different model galaxies. The upper solid curve assumes structural parameters similar to that of Segue 1, with a Plummer radius of 0.03 kpc and a stellar limiting radius of 0.1 kpc [35]. The lower dashed curve assumes structural parameters similar to that of Draco, with a King core radius of 0.18 kpc and a King limiting radius of 0.93 kpc [36]. Each curve assumes that the measurement uncertainty on each star is 2 km s −1 . In both cases, the stars have been uniformly distributed at projected positions in the galaxies; this provides a good representation of the present observational configurations.
In addition to their interesting applications for understanding the rotation and proper motion of the dSphs, the calculations presented in this section are crucial for uncovering properties of underlying dark matter distributions. For example a strong gradient may reflect ongoing tidal disruption, which would clearly affect dark matter mass modeling, as is discussed in more detail in Section 4.

Proper Motions and Rotation
This section discusses an application of the maximum likelihood formalism introduced in Section 2, with a specific focus on the methodology for extracting an intrinsic rotational signal and proper motions using an example data set. Extracting rotation from a data set is important for reasons discussed above, and, in addition to its phenomenological interests, extracting the proper motions of MW satellites may have important implications for understanding the origin of the accretion history of MW [37,38,39]. Specifically determining the latter would present a unique observational test of MW halo formation within the CDM paradigm.
Several dSphs have kinematics data sets large enough that statistically significant constraints may be placed on the parameters v x , v y , and A. For illustrative purposes this section considers just one example, the Sculptor dSph. Sculptor is located at a distance of 80 kpc and has a measured King limiting radius for its stellar distribution of ∼ 1.6 kpc [40]. Given these parameters it is one of the more spatially extended dSphs. The mass content of Sculptor has been estimated in several recent papers [41,14,13], and it has been shown that Sculptor may contain some degree of rotational support [41]. Further, the previous determinations of the proper motion of Sculptor from its line-ofsight velocities may indicate a discrepancy between the proper motion as determined from this method and from ground and space based measurements [12]. This latter fact may in itself be indicative of the presence of an intrinsic rotational component, provided the systematics on the ground and space-based determinations of the Sculptor proper motions are well-understood [42].
To extract the rotation and proper motion signal, a simplified model is considered by assuming the likelihood function is characterized by the six parameters introduced in Section 2. It is assumed that the intrinsic dispersion σ los is uniform throughout the galaxy, and does not depend on any of the parameters of the mass modeling introduced in Section 4 below. Introducing the set of parameters discussed in Section 4 does not affect the reconstruction of the parameters discussed in this section since the intrinsic dispersion is uncorrelated with the parameters of the function u [32]. In order to determine the probability distributions of the parameters θ = (v x , v y , v z , A, φ 0 , σ los ), a standard metropolis-hastings algorithm [29] is used to sample the likelihood function as written in Eq. 5. For all runs described here 10 4 accepted points were obtained in each chain, with the first 10% excluded to account for a conservative burn-in phase. For simplicity, a uniform proposal distribution is assumed for each of the parameters over a wide range chosen to encompass physically-accpetable values for each of these parameters. The line-of-sight velocity data used is taken from the Walker et al. [43] sample, and only those stars with > 90% c.l. probably for membership are used in the analysis.  Fig. 2 were obtained by allowing the A parameter to float freely, the distributions are found to be relatively unaffected if A is instead fixed so that A = 0. This reflects the fact that the radial gradient in the velocity of the stars is distinct from the intrinsic rotational component, which has a sinusoidal behavior as a functional of the position angle. The results presented in Fig. 2 are in agreement with the measurements of Walker et al. [12], though here a larger set of parameters is marginalized over in this analysis. Figure 3 shows the corresponding probability distribution for the rotational parameter A. Again the remaining five parameters (v x , v y , v z , φ 0 , σ los ) are marginalized over. The result is that, given the rotational parameterization and the using entire distribution of 1352 Sculptor stars, there is no statistically significant detection of rotation. From figure 3 the 90%. c.l. upper limit on the rotation is found to be ∼ 2 km s −1 . The result presented in figure 3 is somewhat degenerate with the parameters describing u; for example if v x and v y were (unphysically) set to zero, the implied upper limit on A reduces by about 50%.  Figure 3. The probability for the rotational amplitude, A, using the Sculptor line-ofsight velocity data. Figure 3 represents the averaged value of A throughout the entire galaxy. It may be possible that the rotation amplitude in the outer region differs from the rotation rate in the inner region; if this were the case then it is plausible that this effect is washed out in the averaging process. To provide a simple test for a possible differential rotation rate, an additional likelihood analysis was considered with just the outer sample of Sculptor stars. Here the outer stars are defined as only those with projected radius beyond 0.5 kpc. Even in this case, there is no statistically significant detection of A, though in this case the 90% c.l. upper limit increases to 10 km s −1 .

Mass Distributions and Model Selection Criteria
This section discusses the extension of the maximum likelihood analysis developed in Section 2, with a goal of using the kinematic data to determine dark matter mass distributions. A calculation of the gravitating mass of a stellar system is one of the more fundamental tasks in astronomy, and simple scaling arguments provide some guidance to anticipate the results. It is worthwhile to first review these arguments as applied to the dSphs before undertaking a more detailed and model dependent treatment.

Spherical Mass Modeling
Initial Estimates -Under the assumption that a star cluster is spherically-symmetric, the orbital distribution of the tracer particles are isotropic, that mass follows light, and the cluster is isolated from any external gravitational potential, the virial theorem provides a mass estimate of M vir ≃ r e σ 2 ⋆ /G, where r e is the observed extent of the cluster and σ ⋆ is the velocity dispersion of the stars. Although this is probably the simplest estimate one can make for the mass of a star cluster, it does provide a useful extremum bound. For example Merritt [44] has show that the virial theorem may be used to derive a lower bound on the mass of a star cluster, which is obtained from the assumption that all of the mass is concentrated as a point in the center. This minimum mass is given by M min = 3σ 2 ⋆ / r −1 G, where 1/ r −1 is the harmonic mean stellar radius in the cluster. Of course for dSphs it is not consistent to assume that these systems are isolated, since they are orbiting within the extended dark matter halo of the MW. For dSphs orbiting with the MW halo, the minimum mass estimate above is particularly useful, as it in turn provides a conservative estimate of the radius at which particles would be stripped due to the MW potential. As an example consider the case of Segue 1, which is a MW satellite with a stellar luminosity ∼ 340 L ⊙ at a Galactocentric distance of 28 kpc. From the de-projected light distribution, the harmonic mean stellar radius is ∼ 10 pc, and given the velocity dispersion of 4.3 km s −1 [21], the implied minimum mass of Segue 1 is ∼ 4 × 10 5 M ⊙ . Assuming that Segue 1 is a point mass orbiting in the potential of the MW, the radius at which particles would be presently getting stripped is the Jacobi radius, r t = [M/3M MW ] 1/3 D, where D = 28 kpc. Assuming the minimum mass of M = M min , r t ≃ 300 pc. It is important to note that this provides only an estimate of the instantaneous tidal radius; if Segue 1 came significantly closer to the MW in the past then this estimate would differ. The above estimate provides a lower bound on the radius at which particles would be getting stripped, under the assumption of a circular orbit. A similar argument for the tidal radius of Segue 1 was considered in Geha et al. [21] using the Illingworth approximation for the mass as M min (For an alternative interpretation for the origin of Segue 1, see Ref. [45]).
Jeans Equation -At the next level of detail from the dynamical perspective, an estimate for the mass of the dSphs may be obtained by appealing to the sphericallysymmetric jeans equation, assuming that the gravitating mass of the system consists of both stars and dark matter. The analysis here closely follows the treatment given in the appendix of Strigari et al. [13], and refers to this paper for further details. A standard discussion of the spherical jeans equations comes from Ref. [46].
The spherical jeans equation is Here ρ ⋆ is the de-projected stellar density profile, the circular velocity is V c (r) = GM/r, and the parameter β(r) = 1 − σ 2 r /σ 2 t characterizes the difference between the radial and tangential velocity dispersions of the stars. Integrating σ 2 r along the line-of-sight gives the velocity dispersion as a function of projected radius, R, Here, I(R) is the projected surface density of the stellar distribution, and ρ ⋆ is the three-dimensional stellar distribution. In Eq. (9), σ r depends on the parameterization of the mass distribution of the dark matter component. The stellar density profile is taken to be fixed; for example the measurements of the projected density profiles for many of the classical satellites come from Ref. [47], and more updated profiles from, e.g., Refs. [36,48,49,50], while measurements of the density profiles for the ultra-faint satellite come from Ref. [35]. It is important to note that fixing the stellar density profile may introduce a degeneracy in determining the projected velocity dispersion profile, particularly in the central regions [51]. However the effect on the integrated mass distributions as considered here is less severe, motivating the assumption of fixing ρ ⋆ , rather than marginalizing over it, in the analysis. Given the above assumption for the velocity anisotropy of the stars and for the shape of the dark matter profile for the galaxy, the likelihood function can now schematically be written as For compactness, the vector y = (v x , v y , v z , A, φ 0 ) has been defined, and β and Φ are vectors that describe the stellar velocity anisotropy and the gravitational potential of the system, respectively. The line-of-sight velocity dispersion is dependent on β and Φ through the spherical jeans equation. The mass of the system, as well as quantities related to the mass distribution, are determined via Φ, and thus by integrating out the model parameters one may determine the probability distribution for the mass of the system contained within a fixed physical radius.
Error Projections on Mass Distribution -Before performing an example calculation using Eq. 10, it is interesting to get an idea as to how the errors on the mass distribution depend on the physical radius within which the mass is determined. To perform these estimates, we again appeal to the Fisher matrix formalism outlined above. However the analysis here is different from above in that now the likelihood depends on the vector set of parameters β and Φ in addition to y.
The example considered here uses the velocity data sample from Fornax of Walker et al. [43], specifically the stars with > 90% c.l. for membership. This gives a total of 2409 Fornax members. The three-dimensional surface density profile for Fornax is assumed to take the form with the parameters {a, b, c, [r 0 /kpc], [r cut /kpc]} = {0.3, 1.2, 3.0, 0.8, 1.1}. A profile of this form with these parameters is consistent with the recent measurements of Fornax star counts [50], though generally the results presented are independent of the normalization of the surface density profile. The stellar mass-to-light ratio is assumed to be unity, consistent with the results presented in Ref. [50]. The dark matter density profile is assumed to be the einasto profile, and following CDM simulations, α = 0.17 [52]. The velocity anisotropy is assumed to be of the following form, Thus in the Fisher matrix calculation the base set of parameters are now given by θ = {ρ −2 , r −2 , β 0 , β 1 , r β } (the rotational and geometric parameters, y are ignored here: this is justified given that the Fornax data is consistent with A = 0 and that the y parameters do not correlate with the parameters that determine the mass). Given the base set of parameters in θ used to calculate F, the error on a derived parameter, g, is given by The derived parameter specifically considered here is the log of the mass within a given fixed physical radius (See Ref. [53] for another example where the derived parameter considered is the log slope of the dark matter density profile). Where desired Gaussian priors may be taken by simply adding 1/σ 2 aa to the aa component of the Fisher matrix. Figure 4 shows the error on the log of mass as a function of the physical radius within which the mass is measured. Here the fiducial baseline parameters for the velocity anisotropy have been taken as {β 0 , β 1 , r β /kpc} = {−0.5, 0, 0.2}, implying slightly tangential orbits in the central region of the halo and isotropic orbits at outer radii. Different combinations of {ρ −2 , r −2 } have been taken as indicated to represent the degeneracy between these two parameters when fitting the data. Each of these parameters sets, combined with an anisotropy model, produces a velocity dispersion profile that roughly fits the profile of Fornax. While the goal here is to not undertake a direct fit to the data and to explore the exact degeneracy space of these parameters, examining these three sets of fiducial parameters gives a feel for how the constraints on the mass depend on the fiducial parameter set. Priors on each of r −2 and r β are taken as 1/(5 kpc) 2 , while priors on β 0 and β 1 are taken as 1/1 2 . Each of these priors are motivated by the range of these parameters scanned in the algorithm described in the sub-section below. As is seen, for the stellar profile considered above and the fiducial set of parameters taken, the best-constrained mass is at a radius ∼ 0.6 − 1.0 kpc. This best constrained radius is found to be relatively weakly dependent on the sets of fiducial parameters, particularly near the best constrained mass, provided that they give a good fit to both the star count and velocity dispersion data of Fornax. For kinematic data sets that have been analyzed, the mass is a seen to be strongly-constrained at the approximate half-light radius, which is a general property of dispersion supported systems [54,55].  Fornax Mass Distribution -The probability for the mass distribution of Fornax is now determined directly from the kinematic data, and compared to the projected error on the mass distribution as determined from Fig. 4. As above, a metropolis hastings algorithm is used to determine the respective parameter distributions, and the same data for both the star counts and the line-of-sight velocity distribution have been used. In the parameter scan, uniform priors have been taken on each of the parameters over the following ranges as follows:  Figure 5 shows two example probability distributions for the Fornax mass, within 0.6 kpc (left) and within the approximate Fornax stellar tidal radius of 3 kpc (right). The probability distributions are seen to be slightly non-gaussian, particularly the M(3 kpc) distribution. Comparing the approximate width of each of these distributions with the errors projected in Fig. 4 provides generally good agreement, in spite of the intrinsic assumption in the Fisher matrix formalism that the errors on the parameters are Gaussian. Specifically for the left panel, a Gaussian fit gives log 10 [M (0.6 kpc)/M ⊙ ] = 7.47 ± 0.04. These results confirm the general trend seen in Fig. 4 that the error on the integrated mass within a fixed physical radius increases at larger radii towards outer regions of the halo.
Results of the calculations for the mass distributions of the entire population of dSphs are presented in Refs. [23,13,54,55]. These results, as well as more recent determinations, show that the central mass distributions for the dSphs are very similar, despite an over four order of magnitude variation in their luminosities. The average density within a spherical radius of ∼ 0.3 kpc is ∼ 0.1 M ⊙ pc −3 ; for the brightest satellites baryons can contribute to the potential in this central region, while for the least luminous satellites the potential is dominated by dark matter within this region. Within the context of spherical models, these constant central density results are robust to the specific parameterization of the mass distribution, primarily due to the fact that the integrated mass is directly constrained via the jeans equation and the approximately similar scale for the velocity dispersion profiles [13].
Model Selection -The likelihood formalism introduced above does not give any information regarding the optimal parameterization of the dark matter mass profile. For example referring to the calculation above, is the einasto profile with just two free parameters an acceptable description of the data? Given the parameterization of the dynamics via the spherical jeans equation, we can answer this question and determine how many parameters are required to describe the mass profile given the maximum likelihood formalism. Moreover, we can determine how the parameterization of the density profile depends on the given data sets. For example Segue 1, with only 24 measured line-of-sight velocities, may require a smaller set of parameters than does Fornax, which has ∼ 2400 measured line-of-sight velocities.
To specifically answer the question of how to determine the appropriate set of parameters in maximum likelihood theory one may appeal to the bayes evidence. For the purposes here the evidence, E, is defined as the integral of the likelihood in Eq. 10 over all of the model parameters. When comparing models, the ratio of their respective evidences gives an idea of how much more probable one model is over another. For example if 1 < ∆ ln E < 2.5, the different between the two models is substantial; for 2.5 < ∆ ln E < 5, the different between the two models is strong, and for ∆ ln E > 5 the difference between the two models is decisive [56].
As an illustration, four different dSphs that span a wide range in their respective number of velocities are considered: Segue 1, Sextans, Sculptor, and Fornax. These dSphs have 24, 424, 1352, and 2409 stars respectively; for the latter three galaxies we consider only those stars that have a probability of > 90% membership from the Walker et al. [43] sample. For each dSph we determine how many model parameters are necessary to describe the data, and we consider several different models.
For the "Baseline" 3 parameter model, the following range of parameter space is integrated over for Fornax, Sculptor, and Sextans: log 10 [ρ 0 /M ⊙ kpc −3 ] = [6 : 10], log 10 [r 0 /kpc] = [−1 : 1]. The velocity anisotropy is assumed to be a constant, β, with a range given by β ≡ β 0 = β 1 = [−2 : 0.5]. For Segue 1 the ranges are the same except for the scale radius, which is taken to vary over the range log 10 [r 0 /kpc] = [−2 : 0]; this range is motivated by the likely upper limit to the dark matter tidal radius for Segue 1 [21]. All of the ranges above are chosen as plausible values to describe the halos of dSphs. A flat prior is chosen over these regions; as a further detail one may chose a prior that weights each of these parameters differently, for by example considering the scatter in the ρ s − r s relation as seen in CDM simulations [57]. The bayes evidence for the Baseline model will be denoted as E 0 .
Three different models are compared to this Baseline 3 parameter model: i) a model in which the parameter space for α is enlarged in the range [0.14 : 0.3] (corresponding to 1/α ≃ [3 − 7]), so that central slopes that are both more flat and more steep than the CDM value are allowed; ii) a model with the Baseline 3-parameter ρ(r) profile, but with a three-parameter velocity anisotropy profile which depends on the three-dimensional physical radius as in Eq. 13 and iii) a model in which α = [0.14 : 0.3] and the β(r) profile in Eq. 13 is assumed.
Model i) is thus described by four parameters, while model ii) is described by five free parameters, and model iii) is described by six free parameters. In Table 1, we define model i) as the "Exp" model, model ii) is denoted as the " β" model, and model iii) is denoted as " β+ EXP". These models provide useful illustrations of the calculation of the evidence as applied to dSphs; alternative models may of course be defined and even larger parameter spaces may be explored. The utility of the above models as defined allows us to explore to what extent CDM-like inner slopes are more favored, and to what extent an alternative parameterization of the velocity anisotropy provides a better fit to the data as compared to simply changing the value of the inner slope.
The results for the ratio of the bayes evidence for the various models, relative to E 0 , are shown in Table 1. For each of the galaxies, we see roughly the same pattern; as more parameters are added, the better that the model fits the data. This result implies that models with larger sets of parameters are favored even after penalization for the larger volume of parameter space that is integrated over. Allowing for a larger volume of parameter space for the dark matter density profile affects the evidence more than simply varying the shape of the anisotropy profile. In total, the best fitting models are those that allow both the velocity anisotropy and the central slope to vary freely, i.e. the β+ EXP models.
The results in Table 1 indicate that for all four galaxies variable velocity anisotropies are slightly preferred relative to those with constant velocity anisotropy, and that central dark matter profiles both less cuspy and more cuspy than ΛCDM based fits are equally acceptable. Future data sets, both line-of-sight velocities and potential proper motion measurements for stars in dSphs [58,53,59], will be important in narrowing the acceptable ranges for both the velocity anisotropy and the central slope.

Conclusion
This article has discussed the analysis of kinematic data from Milky Way dwarf spheroidals, with a primary motivation of 1) understanding physical quantities that are well-constrained by the data and 2) understanding the systematics that underly the determination of the dark matter masses of these systems, given the simplest assumption that the dSphs are purely pressure supported systems. Of the possible systematics perhaps the most significant and observationally-accessible is the determination of a velocity gradient in the data sample, which may be indicative of tidal disruption from the potential well of the Milky Way. The results in the literature indicate that, based on the kinematic data alone, velocity gradients due to tidal disruption or rotation are not conclusively present in any of the dSphs. This article has provided an example, using a simple parameterization, of how to search for rotation in the kinematic data sets using a maximum likelihood analysis. The kinematic sample of Sculptor was analyzed, and it was found that the maximum likelihood rotational amplitude is zero, with an upper limit of ∼ 2 km s −1 at 90% c.l. The magnitude of these errors are consistent with the projected magnitude of the errors from theoretical modeling.
When modeling the mass distribution of the dark matter halos of the dSphs, degeneracies between model parameters affect the determination of the total mass profiles, even in the context of the simplest spherical models. To shed light on these degeneracies, this article has discussed a new criteria for model selection applied to the dSph kinematic data sets, taking a step towards determining how many parameters are needed to describe the mass distribution of spherical halos. For the four dSphs studied here, chosen because they have a wide range of available line-of-sight velocities, it is shown that, assuming CDM-motivated Einasto profiles for the dark matter halos, models with variable velocity anisotropy are slightly preferred relative to those with constant velocity anisotropy. Further, central slopes for the dark matter profile that are found in CDM simulations are not a unique description of the data sets; both more cuspy and less cuspy models are allowed for the central slope. This is primarily due to the degeneracy between the central dark matter slope with the central stellar profile and the velocity anisotropy distribution [53].
Future photometric and kinematic data sets promise to further pin down the mass distributions of the dSph dark matter halos. Upcoming data for the ultra-faint satellites will be particularly important, and may be able to show whether any tidal effects are present in these galaxies. Further, development of non-spherical distributions for both the light and dark matter should be considered given these data sets (for initial results along these lines see Ref [60]). Controlling systematics in these data sets will prove to be important step towards further testing the currently favored ΛCDM theory of structure formation.