The aim of this study was to identify single nucleotide polymorphisms (SNPs) that could be associated with back fat thickness (BFT) in pigs. To achieve this goal, we evaluated the potential and limits of an experimental design that combined several methodologies. DNA samples from two groups of Italian Large White pigs with divergent estimated breeding values (EBVs) for BFT were separately pooled and sequenced, after preparation of reduced representation libraries (RRLs), using Ion Torrent technology. Using SNAPE for SNP calling in the sequenced DNA pools, 39,165 SNPs were identified; a quarter of them were novel variants not reported in dbSNP. Combining sequencing data with Illumina PorcineSNP60 BeadChip genotyping results on the same animals, 661 genomic positions overlapped, with a good approximation of minor allele frequency estimation. A total of 54 SNPs showing enriched alleles in one or the other RRL might be potential markers associated with BFT. Some of these SNPs were close to genes involved in obesity-related phenotypes.
The pig (
To understand the biological mechanisms affecting BFT in pigs, we recently carried out several studies to elucidate the genetic factors involved in the definition of this trait and to obtain a systems biology comparative picture of human and pig obesity related traits [
Taking advantage of the sequenced genome of the pig and its reference assembly (Sscrofa10.2) [
In this study, with the final aim of identifying SNPs that could be useful to evaluate the peculiarities of the Italian Large White heavy pig breed and explain, at least in part, the missing genetic variability for the BFT trait not completely captured by our previous association works, we tested the potential and limits of an experimental design in which we used the Ion Torrent sequencing technology to sequence RRLs. Reduced representation libraries were obtained by enzymatically digesting DNA pools constructed from divergent Italian Large White pigs with extreme estimated breeding values (EBVs) for BFT. In addition, we used Illumina PorcineSNP60 BeadChip genotyping data already generated from the same animals to obtain a comparative analysis and validation of the sequencing information.
A subset of the Italian Large White pigs that were previously used in a GWA study, carried out to identify markers associated with BFT EBV [
Genomic DNA was extracted from blood using the Wizard Genomic DNA Purification kit (Promega Corporation, Madison, WI, USA). Extracted DNA was quantified using a NanoPhotometer P-330 instrument (Implen GmbH, München, Germany) and pooled at equimolar concentration to constitute two DNA pools, one including DNA from the 50 Italian Large White pigs with the lowest BFT EBV and a second including DNA from the 50 Italian Large White pigs with the highest BFT EBV.
The investigated animals were previously genotyped with the Illumina PorcineSNP60 BeadChip (Illumina Inc., San Diego, CA, USA), interrogating 62,163 SNPs [
Ten micrograms of DNA from each of the two pools were digested overnight with 50 U of
Sequencing of the two RRLs was obtained using 200 ng of DNA that was purified by agarose gel electrophoresis as described above, enzymatically sheared, end-repaired, and adapter-ligated using the Ion Xpress Plus Fragment Library Kit (Life Technologies). Obtained DNA material was size-selected using the e-gel system (Invitrogen, Carlsbad, CA, USA) and bands corresponding to 100 bp of inserts were collected and quantified by qPCR using a StepOnePlus Real-Time PCR System (Life Technologies). Selected fragments were clonally amplified, purified, and sequenced using the Ion One Touch 100 Template Kit and the Ion PGM Sequencing Kit with two Ion 318 chips (Life Technologies), for the two RRLs.
Obtained sequencing reads were filtered and trimmed using the Ion Torrent suite v.2.2 (Life Technologies) which (i) eliminated polyclonal sequences and sequences of low quality and (ii) trimmed adapters and low quality 3′-ends. Then data were inspected with FastQC v.0.11.22 [
To evaluate differences in allele frequency derived from the number of alternative reads between the two RRLs, Fisher’s exact test was computed for each alternative genomic position covered by a minimum depth of 3x. All the positions with
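As an illustration of the per-position test described above, the following minimal Python sketch computes a two-sided Fisher's exact test on a 2 × 2 table of reference and alternative read counts in the two pools. It is not the code used in the study: the function names, the choice to require the minimum depth in both pools, and the example read counts are assumptions made here for illustration only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables that are no more likely than
    # the observed one (small tolerance for floating point ties).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def compare_pools_at_position(pos_ref, pos_alt, neg_ref, neg_alt, min_depth=3):
    """Return the p-value for one position, or None if either pool is too shallow.

    Assumption: the minimum depth (3x in the text) is required in both pools.
    """
    if pos_ref + pos_alt < min_depth or neg_ref + neg_alt < min_depth:
        return None
    return fisher_exact_two_sided(pos_ref, pos_alt, neg_ref, neg_alt)


# Hypothetical read counts: (reference, alternative) in each pool.
print(compare_pools_at_position(pos_ref=1, pos_alt=7, neg_ref=8, neg_alt=1))
```

With the low read depths typical of this design, the table margins are small, so only strongly skewed read counts can reach a small p-value; this is one reason why an exact test is a natural choice over a chi-square approximation here.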
A total of 3,390,796 and 3,731,776 sequenced reads were obtained from the two RRLs produced using the positive and negative BFT EBV DNA pools, respectively (Table
Summary of sequencing data obtained from the two reduced representation libraries (RRLs) of the positive (Pos_
Information1 | Pos_ | Neg_ | Pos + Neg |
---|---|---|---|
Sequenced reads | 3,581,496 | 3,887,066 | 7,468,562 |
Reads after preprocessing | 3,390,796 | 3,731,776 | 7,122,572 |
Removed duplicates | 698,191 | 845,961 | 1,544,152 |
Mapped reads (Qm > 20; Rdup) | 1,449,838 | 1,476,125 | 2,925,963 |
Sequenced bases (Qm > 20; Rdup) | 137,429,598 | 145,859,611 | 256,880,473 |
Mean and max depth of coverage (Qm > 20; Rdup) | 1.18; 209 | 1.16; 217 | 1.29; 426 |
Sequenced bases (Qm > 20; RD ≥ 3; Rdup) | 3,394,898 | 3,057,171 | 3,942,266 |
Sequenced bases retained by SNAPE (Qm > 20; RD ≥ 3; Rdup) | 3,369,555 | 3,034,731 | 237,969 (in common) |
SNPs (Qm > 20; RD ≥ 3; Rdup) | 10,694 | 10,339 | 39,165 |
Using sequencing data, a total of 39,165 putative SNPs were called with high confidence by SNAPE [
Summary of the SNP annotation results obtained using the variant effect predictor (VEP) tool.
Gene position or SNP effect | Number of SNPs |
---|---|
3 prime UTR variant | 203 |
3 prime UTR variant, NMD transcript variant | 1 |
5 prime UTR variant | 58 |
Downstream gene variant | 2710 |
Intergenic variant | 24414 |
Intron variant | 12591 |
Intron variant, NMD transcript variant | 126 |
Intron variant, noncoding transcript variant | 306 |
Missense variant | 159 |
Missense variant, splice region variant | 8 |
Noncoding transcript exon variant, noncoding transcript variant | 29 |
Splice acceptor variant | 2 |
Splice donor variant | 1 |
Splice region variant, 3 prime UTR variant | 1 |
Splice region variant, intron variant | 25 |
Splice region variant, synonymous variant | 12 |
Stop gained | 2 |
Stop lost | 1 |
Stop retained variant | 1 |
Synonymous variant | 217 |
Synonymous variant, NMD transcript variant | 3 |
Upstream gene variant | 2675 |
Total | 43545 |
To validate some of the called SNPs, we took advantage of the Illumina PorcineSNP60 BeadChip genotyping data obtained on the same animals used to construct the two RRLs. Considering SNP positions covered by a minimum of three reads, 661 out of 62,163 SNPs of the chip (1.1%) were identified from the 13,596,939 sequenced positions (0.45% of the porcine genome). SNAPE analysis of these positions reported that (i) 3 positions were discarded and 8 had read depth < 3 (because of further SNAPE features applied in addition to the general criteria adopted), (ii) 257 were identified as SNPs (152 polymorphic SNPs carrying two alleles, while 105 SNPs were monomorphic for an allele different from that of the reference genome), and (iii) 375 positions showed only the sequence of the reference genome.
Of the 653 overlapping positions (661 – 8 = 653), (i) for 28 the chip genotype data of the individual pigs could not be retrieved (probably because of problems in the design of the chip probes that prevented genotyping) and (ii) of 63 positions at which all individuals were homozygous for a single genotype, 59 matched the genotype inferred by NGS, whereas 2 were called as heterozygous and 2 were called as homozygous for a noncomplementary nucleotide by the sequencing data (Table S3). Looking in more detail at the 28 SNPs that failed to yield reliable genotyping data from the PorcineSNP60 BeadChip, both alleles were present in the NGS reads for 12 of them, 15 showed only one allele, and one was an erroneous SNP.
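The three outcomes reported for the chip positions (reference sequence only, polymorphic with two alleles, and monomorphic for a non-reference allele) can be sketched as a simple classification on allele counts. This is a deliberately naive illustration and not SNAPE's method: SNAPE applies its own statistical model and filters, whereas the hypothetical `classify_position` function below only counts observed alleles and ignores base quality and sequencing error.

```python
def classify_position(ref_allele, allele_counts, min_depth=3):
    """Classify one pooled position from its allele read counts.

    allele_counts maps each allele (e.g. "A") to the number of reads
    supporting it. The labels returned are hypothetical names chosen
    for this sketch.
    """
    depth = sum(allele_counts.values())
    if depth < min_depth:
        return "low_depth"
    observed = {allele for allele, n in allele_counts.items() if n > 0}
    if observed == {ref_allele}:
        return "reference_only"
    if len(observed) == 1:
        # A single allele is seen and it differs from the reference.
        return "monomorphic_alternative"
    return "polymorphic"


# Hypothetical positions: reference allele and read counts per allele.
examples = [
    ("A", {"A": 5}),
    ("A", {"G": 4}),
    ("C", {"C": 2, "T": 3}),
    ("T", {"T": 1, "C": 1}),
]
for ref, counts in examples:
    print(ref, counts, classify_position(ref, counts))
```

At a depth of only a few reads, a position that is truly polymorphic in the pool can easily appear as reference-only or monomorphic, which is consistent with the need to validate calls against individual genotypes.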
In addition to these overlaps between NGS sequencing and genotyping data, we wanted to evaluate if the estimated allele frequencies derived by NGS in RRLs obtained from DNA pools could match the true allele frequencies at the same positions obtained by using the PorcineSNP60 BeadChip. Starting from 559 SNPs (derived by the subsequent filtering steps of the 661 SNPs reported above), 262 (145 called SNPs by SNAPE) had the same type of substitution. Excluding the transversions GC
Summary of regression analysis between allele frequency estimated by Ion Torrent sequencing and the allele frequency obtained by genotyping with the Illumina PorcineSNP60 BeadChip.
| RD | Polymorphic sites | | Polymorphic and monomorphic sites | |
|---|---|---|---|---|
| | | Positions | | Positions |
| | 0.1199 | 258 | 0.6882 | 317 (258 + 59) |
| | 0.1601 | 99 | 0.6399 | 119 (99 + 20) |
| | 0.1611 | 36 | 0.5868 | 41 (36 + 5) |
| | 0.3866 | 11 | 0.7006 | 13 (11 + 2) |
RD = read depth;
Scatter plot of allele frequency estimated by Ion Torrent sequencing data for SNPs called by at least 6 reads (allele frequency NGS) and obtained by genotyping data (MAF genotyping) for the same SNPs.
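The comparison between sequencing-based and chip-based allele frequencies can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of the study: the function names and all numeric values are hypothetical, the chip frequency is obtained by counting alleles in individual diploid genotypes, and the coefficient of determination of a simple linear regression is computed as the squared Pearson correlation.

```python
def allele_frequency_from_genotypes(genotypes, allele):
    """Frequency of `allele` among diploid genotypes written as e.g. "AG"."""
    total_alleles = 2 * len(genotypes)
    return sum(g.count(allele) for g in genotypes) / total_alleles


def allele_frequency_from_reads(allele_reads, depth):
    """Naive pooled estimate: fraction of reads carrying the allele."""
    return allele_reads / depth


def r_squared(x, y):
    """Coefficient of determination of the simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("R squared is undefined when one variable is constant")
    return sxy ** 2 / (sxx * syy)


# Hypothetical values for four SNPs (not data from the study).
ngs_freq = [0.10, 0.35, 0.50, 0.80]
chip_freq = [0.15, 0.30, 0.55, 0.75]
print(r_squared(ngs_freq, chip_freq))
```

Because a read-count frequency at depth 3 can only take the values 0, 1/3, 2/3, and 1, agreement with the chip frequencies is expected to be coarse at low read depth.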
For each of the two initial pileups, we filtered out genomic positions with depth < 3x and then used SNAPE to extract the allele frequency of each genomic position, taking advantage of the filters it implements. Polymorphic positions were compared among the 237,969 positions that were in common between the two RRLs (Table
In order to evaluate if the 54 SNPs that showed differences in number of alternative reads between the two RRLs were located in chromosome regions associated with BFT in Italian Large White pigs (listed in Table S4), we compared their positions on the basis of our previous GWA study carried out in the same breed [
Overlapping results between the SNPs associated with back fat thickness as identified with the Ion Torrent sequencing data (
Chr. | Marker | PosM | PGWAS | | |
---|---|---|---|---|---|
1 | ALGA0000009 | 52,297 | | 68,514 | |
1 | ALGA0000014 | 79,763 | | 68,514 | |
6 | M1GA0008302 | 787,265 | | 873,061 | |
6 | M1GA0008318 | 945,991 | | 873,061 | |
6 | M1GA0008329 | 996,248 | | 873,061 | |
9 | DRGA0009307 | 17,138,159 | | 16,885,924 | |
12 | DIAS0000309 | 48,865,200 | | 48,937,212 | |
Chr. = chromosome; marker = marker in the Illumina PorcineSNP60 BeadChip; PosM = nucleotide position of the marker on the Sscrofa10.2 reference genome;
Next generation sequencing is changing how markers associated with production traits are identified in livestock species. Several applications and strategies have been designed, mainly using Illumina platforms (e.g., [
Reduced representation libraries were generated as a simple strategy to reduce the complexity of mammalian genomes and to obtain information from a small part of it that can be sampled after restriction fragment digestion [
Among the 159 SNPs causing missense mutations, 37 were predicted to affect the function of the encoded protein (Table S2). These polymorphisms will be prioritized to evaluate their association with several production traits together with SNPs whose alleles were differentially enriched in the two RRLs (Table
Several methodological approaches were tested in this study for the first time: (i) partial sequencing of the pig genome with Ion Torrent technology, starting from DNA pools and using RRLs; (ii) the application of SNP calling and MAF estimation to low coverage Ion Torrent sequencing data from DNA pools; (iii) the validation of SNPs called in DNA pools using individual genotyping data from the same animals included in the pools; (iv) the possibility of identifying enriched alleles in the two sequenced RRLs representing the two extremes of an important phenotype (BFT). All these approaches were implemented in a case study that tried to identify additional markers associated with BFT in the Italian Large White pig breed. The purpose was to set up a strategy that could reduce the sequencing cost as much as possible while producing data useful to identify novel markers for the targeted trait. Association studies will be carried out to evaluate the effects of the 54 selected markers.
Ion Torrent can be successfully applied for SNP discovery even if its limited throughput reduced the possibilities to obtain reliable allele frequencies in the two DNA pools. Other reductionist approaches, like genotyping by sequencing or genotyping by genome reducing and sequencing [
The authors declare that they have no financial and personal relationships with other people or organizations that can inappropriately influence their work.
Samuele Bovo, Francesca Bertolini, and Giuseppina Schiavo contributed equally to this work.
The authors thank ANAS for providing data and samples, Sara De Fanti (BiGEA Department) and Emilio Scotti (DISTAL) of the University of Bologna for technical assistance, members of the Centre for Genome Biology for their support, and Rita Casadio and Pier Luigi Martelli (Biocomputing Group, University of Bologna) for their advice on data analysis. This study was supported by Italian MiPAAF (INNOVAGEN Project) and AGER-HEPIGET (Grant no. 2011-0279) funds.