Genetic Diversity of Sheep Breeds from Albania, Greece, and Italy Assessed by Mitochondrial DNA and Nuclear Polymorphisms (SNPs)

We employed mtDNA and nuclear SNPs to investigate the genetic diversity of sheep breeds of three countries of the Mediterranean basin: Albania, Greece, and Italy. In total, 154 unique mtDNA haplotypes were detected by means of D-loop sequence analysis. The major nucleotide diversity was observed in Albania. We identified haplogroups, A, B, and C in Albanian and Greek samples, while Italian individuals clustered in groups A and B. In general, the data show a pattern reflecting old migrations that occurred in postneolithic and historical times. PCA analysis on SNP data differentiated breeds with good correspondence to geographical locations. This could reflect geographical isolation, selection operated by local sheep farmers, and different flock management and breed admixture that occurred in the last centuries.

this problem. Nuclear genome evolves five-to-ten times slower than mtDNA; it is contributed by both parents and its variability is less affected by demographic forces such as bottleneck. Therefore, nuclear markers can detect more recent genetic events that influence the extant divergence of domestic breeds. Several studies have demonstrated that the combination of nuclear and mtDNA markers can increase the information obtained [27][28][29][30]. The use of both markers might provide a more accurate and comprehensive understanding of a species' history [31]. SNP markers could help in understanding the recent evolutionary history of domestic animals [10,32]. We aimed at investigating the geographic distribution of the genetic diversity of sheep breeds in Albania, Greece, and Italy and to gather information on the migration history of the species. To accomplish that, we employed sequence data from the mitochondrial D-loop and 27 nuclear loci (SNPs).

Sampling and DNA Extraction
We focused on sheep breeds of Albania, Greece, and Italy. Samples of the European mouflon were also included. About twenty unrelated samples per breed were selected. Three animals per flock from 11 farms spread over the traditional rearing area were sampled. A total of 313 animals from 18 sheep breeds were analyzed. Breeds, acronyms used, and country of origin of each breed are reported in Table 1. Part of the samples were obtained from a previous project (Econogene, http://www.econogene.eu/). Blood samples were collected in EDTA tubes and frozen at −20 • C until extraction. Genomic DNA was isolated using standard procedures, checked for DNA quality on agarose gel and quantified using a DTX microplate reader (Beckman Coulter) after staining with PicoGreen (Invitrogen).

Amplification and Sequencing of the Mitochondrial D-Loop
To amplify the partial D-loop of 721 bp, primers described by Tapio et al. [7] were used from 15,541 to 16,261 of the complete sequence described by Hiendleder et al. [33] available in GenBank (NC 001941.1).
Polymerase chain reaction (PCR) was performed in a total volume of 50 μL containing 20 ng of genomic DNA, 40 pMol of each primer (Sigma-Aldrich), 200 μM dNTPs, 5X PCR buffer, and 5 units of Taq DNA polymerase (Promega) on a PCR Thermo Cycler (MJ Research). A 5 minutes denaturation step at 95 • C was followed by 14 cycles of denaturation at 95 • C for 30 sec, annealing for 30 sec starting at 62 • C and decreasing 0.5 • C per cycle and extension at 72 • C for 120 sec, then by 20 cycles of denaturation at 94 • C for 30 sec, annealing at 55 • C for 30 sec and extension at 72 • C for 120 sec; the final extension step was carried out at 72 • C for 5 minutes. PCR products were purified through ExoSap-IT (USB Corporation) to remove residual primers and dNTPs and used as templates for forward and reverse sequencing reactions.
Sequencing was performed using the primers described by Tapio et al. [7] with a CEQ 8800 sequencer using DTCS QuickStart Kit and purifying with Agencourt CleanSEQ 96 (Beckman Coulter), according to the manufacturer's instructions. After the optimization of the sequencing protocol, sequencing was outsourced to Macrogen (http://www.macrogen.com/). The sequences of D-loop were submitted to GenBank (accession numbers: JN184789-JN184999).
Geographic distribution of eigenvectors was performed to investigate population genetic differences on the basis of their geographic distances. This approach permitted the generation of a synthetic configuration of locations based on the pairwise genetic distances that matched the real geographic configuration. Principal component analysis (PCA) scores for the first two components, obtained using Nei's 1973 genetic distance, were plotted on a geographic map. As breeds are scattered among several farms, a virtual geographic entity representing the centroid of each breed on geographic maps was created using WGS84 geographical coordinates [38]. For a given component, it is a measure of the variance accounted for by that component. On thematic maps produced with the geographic information system (GIS) Manifold software package (Manifold System, Version 7, Manifold Net Ltd., Carson City, USA, http://www.manifold.net/), all breeds are thus represented according to a geometric distribution (see Figures 3(a) and 3(b)). Breeds showing high eigenvectors contribute sensibly to the explanation of the variance related to the component displayed. Classes were elaborated on the basis of the criterion of the natural breaks (Jenks optimization method). This algorithm reduces the variance within classes and maximizes the variance between classes. Colour classes were chosen in order to support the distinction between the different categories of behaviours observed: green: positive contribution; yellow: intermediary values; red: negative contribution to the component displayed.

Nuclear Polymorphism Analysis
The same 313 sheep belonging to 18 breeds sequenced at D-loop were genotyped with 37 previously described SNPs [39]. SNP ascertainment bias was minimised by sequencing target DNA in at least 8 individuals from different populations. Large-scale genotyping of all animals was performed by outsourcing to a commercial genotyping company (http://www.Kbioscience.co.uk/). Allele frequencies, Nei's estimation of observed and expected heterozygosities (Ho and He, resp.), were calculated using Fstat 2.93 [40]. Weir and Cockerham's [41] estimates of F is per population, F st per locus, and population pairs were calculated for each locus using Genalex 4.0 [42]. The same software was used to test deviations from Hardy-Weinberg equilibrium (HWE) for each locus and population and for locus over all populations; test for conformity with HWE expectations was assessed by calculating the Chi-squared value.
A PCA was performed on the covariance matrix of SNP frequency data to investigate spatial patterns of genetic variation using GENETIX software [43].
Geographic distribution of eigenvectors was performed as described above using pairwise genetic distances [47] calculated on the basis of the selected SNP markers.

Mitochondrial Haplotypes
Ninety-three polymorphic sites and 154 haplotypes were identified from 313 sequences. Relatively high haplotype diversity was found in all three sampled geographic regions; the largest nucleotide diversity is present in Albania (0.02107) while the highest number of haplotypes observed is recorded in Greece (83) ( Table 2).
The average number of nucleotide differences and the average number of nucleotide substitutions per site were used to calculate the genetic distance between breeds. The lowest distance was observed between Laticauda and Anogeiano (D: 2.357-Dxy: 0.006), while the highest distance was observed between Bardhoka and Kymi (D: 12.450-Dxy: 0.03) ( Table 3).
AMOVA revealed that mitochondrial diversity is mainly distributed within breeds (95.04%) and only in part among regions (0.90%); low variability was also found among breeds/within regions (4.06%) ( Table 4).

Phylogenetic Analysis and Haplogroups
The NJ tree obtained from mtDNA haplotypes and wild sheep sequences, used as out-group, revealed three of the five haplogroups described in the literature: A, B, and C ( Figure 1). Haplogroup B is the most frequent among the analyzed samples (89%), while A and C are less common (8% and 3%, resp.).
Greek and Albanian breeds are present in all three haplogroups, while Italian breeds are present only in haplogroups B and A (Table 5). This is shown also in Figure 2, representing the percentage of occurrence of each haplogroup in Albania, Greece, and Italy.

SNP Analysis
A total of 37 SNPs identified as polymorphic on eighteen sheep breeds selected throughout Europe [39] were applied to genotype 313 individuals of three Albanian, ten Greek, and five Italian sheep breeds. After removing those found monomorphic in the selected breeds, 27 SNPs were used for the analysis.
The frequencies of the major alleles ranged from 0.99 for the locus LEP1 to 0.538 for the locus IL2 1. Except for CAST 1, LEP1, LEP2, GDF8, and PRNP 1, which show frequencies of the rare alleles of 0.035,  Table 6). The distances range from 0.022 (Ruda-Karagouniko) to 0.253 (Sfakia-Gentile di Puglia) using Reynolds' distances and from 0.008 (Ruda-Karagouniko) to 0.096 (Gentile di Puglia-Anogeiano) using Nei's distances. Both indices indicate Ruda and Karagouniko as the breeds with the minimum pairwise distance and show the maximum distances between the Italian breed Gentile di Puglia and Anogeiano, Karagouniko, Kefalleneas, Kymi, Lesvos, Pilioritiko, Sfakia, Skopelos, and Bergamasca breeds. The Mantel test showed correspondence between geographic and genetic distances with a P = 0.04.  [44], below, and Reynolds' [45], above, genetic distances. Breed codes are as in Table 1. BAR

PCA on SNPs
Genetic relationships were also explored by means of PCA. To examine the overall pattern of population differentiation, we considered the first three axes, which cumulatively explained 48.87% of the total inertia contained in the data set ( Figure 4). From PCA, it can be seen that the some breeds are quite differentiated, with good correspondence to geographical locations, even if SNPs were few (4 of the 5 Italian breeds are well separated from the main cluster). Particularly, a differentiation between northern Italian (Bergamasca and Delle Langhe) and southern Italian (Gentile di Puglia, Altamurana, and Laticauda) breeds can be seen. The projection of loci in the space formed by the first three principal components (data not shown) shows that the differentiation of outlying breeds is caused by a small number of SNPs: the Delle Langhe population is mainly affected by alleles in the MSTN gene along the second component.; the Skopelos breed position in the graph is affected by alleles in the PRP gene and the Laticauda breed (lesser) differentiation is mainly due to alleles in the CALPA and LEP genes. PCA scores calculated on mtDNA marker for the first 2 components were plotted on a geographic map, using the centroids of the sampling area of each breed (Figures 3(a) and 3(b)). The highest eigenvector contribution (coloured in green) was observed for Albanian and Greek breeds, as expected, even if four breeds show unexpectedly low diversity.
As for mtDNA analyses, the results of the PCA based on SNPs data were also used for making inferences about population genetic differences on the basis of their geographic distances. PCA scores for the first two principal components were plotted on a geographic map, using the centroids of the sampling area of each breed (Figures 3(c) and 3(d)). The line separating the map in two regions shows the isoline for an eigenvalue of 0. RUDA breed is an exception showing an eigenvalue above 0 but located in an area where all other breeds show lower eigenvalues. In accordance with the domestication history of the species, genetic diversity was higher in south-eastern populations than in north-western populations. The first component of Figure 3(c), in fact, shows a regular loss of genetic diversity towards North West.

DISCUSSION
Agriculture arose mainly within the distribution range of the wild ancestors of the most valuable livestock, such as the Fertile Crescent of southwest Asia, where early farmers were able to outcompete local huntergatherers. Once livestock slowly spread northwest across Europe, farming also shifted northwest from the Fertile Crescent to areas where farming had never arisen independently-first to Greece, then to Italy, and finally to northwest Europe [50]. Therefore, today the most productive farming zones do not correspond to those most productive in the past. Then, founder effect, genetic drift, and natural or artificial selection led to the formation of distinct breeds or varieties [51].
We therefore focused on the analysis of sheep of three countries aligned on this route, to evidence signs of migration. Geographical isolation, natural and artificial selection for physical or productive characters, genetic drift, mutations, and interpopulation gene flows have altered gene frequencies over many generations. The genetic diversity within and across breeds and species forms the basis of our current animal genetic resources for food production and other purposes.
The nature of the markers used for the analysis can affect the detection of geographical structuring, as suggested by Naderi et al. [52]. In fact, mtDNA informativeness is limited because it does not detect male-mediated gene flow and does not predict the nuclear genomic diversity [53]. Moreover, results may be affected by phenomena such as homoplasy, incomplete lineage sorting, effective population sizes, and sex-biased dispersal [27]. By combining markers with different modes of inheritance and rates of evolution this bias can be minimized [54].
Our mtDNA analysis shows higher levels of sheep nucleotide diversity in the South-East, which is congruent with data reported in the literature [22] and congruent with the proximity to the domestication centre. This is confirmed by eigenvector analysis, which showed high contribution to variance by Albanian and Greek breeds, even if four breeds show unexpectedly low diversity. However, this behaviour can be explained by recent isolation or selection for some traits that reduced the overall genetic diversity through bottleneck. Very high haplotypes diversity was found in all three regions analysed (greater than 0.9), in agreement with previously published works on Portuguese breeds [9], Indian breeds [55], and Balkan sheep [56]. The major mitochondrial variation is distributed within breeds (95.04%), while it is lower among regions (0.90%) and among breeds within regions (4.06%). Phylogenetic methods were employed to examine the evolutionary history of the 18 breeds. Neighbour-joining and median-joining network revealed three of the described haplogroups, A, B, and C. The mouflon shares a haplotype with domestic sheep, as previously reported by Hiendleder et al. [33].
The SNP analysis revealed a rare allele frequency <5% for LEP1 and LEP2 loci, in agreement with those observed on a different European breed panel by Pariset et al. [39]. Observed and expected mean heterozygosity also showed similar values to those reported in the same paper. Expected heterozygosity values, which can indicate response to selection, are higher than observed heterozygosity values (Hs 0.063, 0.07, and 0.042; Ho 0.052, 0.06, and 0.038 in CALPA, PRNP-1, and GDF8, resp.).
Among the breeds tested, Altamurana showed the highest F is value suggesting the inbreeding in this population.
Regarding the phylogeographic structure we found that the 95.2% of variation occurred within breeds indicating the weak phylogeographic structure in sheep. These data are consistent with those previously published by Kijas et al. using a different SNP panel [17]. Sheep generally do not have a strong geographic structure and show a high genetic variability within breeds.
Anyway, Mantel test analysis using SNPs revealed a correlation between genetic and geographic distance. The possibility to assess the presence of a geographic component in genetic diversity using SNPs was already reported in previous studies on sheep [39] and goats [10,57,58].
In the PCA, the breeds appear differentiated with 48.9% of the variance explained by the first three principal components. Also this analysis shows a good correspondence to geographical locations: the breeds remaining separated by the main group are all Italian. PCA supports therefore a westward route to Italy that could indicate that transport of animals made by sea as already proposed for cattle [11,12] and goats [8,59]. This is particularly plausible because small sized species as sheep are easy to transport during migration   and commercial trade and can adapt easily to various environments [53,60,61]. A similar decrease in genetic diversity as well as an increase in the level of differentiation at the breed level from South-East to North-West in European sheep breeds, supporting the hypothesis of livestock migration from the Middle East towards western and northern Europe, was found by Peter et al. [16] and Lawson Handley et al. [15], using other nuclear markers. The formulation of the modern breed concept during mid-1800s has caused remarkable changes in the livestock sector: large-scale production expanded and its application to breeding and husbandry practices led to the formation of well-defined breeds that were exposed to intense anthropogenic selection. The differentiation of three breeds observed using PCA analysis could be related to a recent selection, which appears to be linked to CALPA (Laticauda), PRNP 1 (Skopelos), and GDF8 (Delle Langhe) SNP markers. Gentile di Puglia breed seems influenced by both CALPA and GDF8. In particular, these two genes have an effect on conformation and therefore are an easy target of selection. Other SNPs related to meat traits were found potentially under the effect of selection and apparently not associated with production attitude of the breeds [58].

CONCLUSIONS
We employed mtDNA and nuclear SNPs to investigate the genetic diversity of sheep breeds of three countries of the Mediterranean basin: Albania, Greece, and Italy. Our results showed significant genetic differentiation among the sheep breeds, supported by mtDNA and by SNP. The differentiation identified by nuclear markers could indicate a reduced gene flow due to geographical isolation, associated with different flocks management, or an effect of the introduction of different stocks centuries ago (cf. Figure 3(d), showing the 2nd dimension geographic distribution of eigenvectors). In general, D-loop sequence analysis shows a pattern reflecting migrations that occurred in postneolithic and historical times, with the most divergent mtDNA lineage occurring in the southern breeds, as shown in Figure 2 and Table 5. PCA on SNP data differentiated breeds with good correspondence to geographical locations. It is interesting that the correlation between genetic and geographic distances revealed using nuclear SNPs was not confirmed by mtDNA, for which Mantel test was not significant. Our results seem to indicate a better correlation between geographic distances and autosomal markers.