Genome-Wide Association Study towards Genomic Predictive Power for High Production and Quality of Milk in American Alpine Goats

This paper reports an exploratory study based on quantitative genomic analysis in dairy traits of American Alpine goats. The dairy traits are quality-determining components in goat milk, cheese, ice cream, etc. Alpine goat phenotypes for quality components have been routinely recorded for many years and deposited in the Council on Dairy Cattle Breeding (CDCB) repository. The data collected were used to conduct an exploratory genome-wide association study (GWAS) from 72 female Alpine goats originating from locations throughout the U.S. Genotypes were identified with the Illumina Goat 50K single-nucleotide polymorphisms (SNP) BeadChip. The analysis used a polygenic model where the dropping criterion was a call rate ≥ 0.95. The initial dataset was composed of ~60,000 rows of SNPs and 21 columns of phenotypic traits and composed of 53,384 scaffolds containing other informative data points used for genomic predictive power. Phenotypic association with the 50K BeadChip revealed 26,074 reads of candidate genes. These candidate genes segregated as separate novel SNPs and were identified as statistically significant regions for genome and chromosome level trait associations. Candidate genes associated differently for each of the following phenotypic traits: test day milk yield (13,469 candidate genes), test day protein yield (25,690 candidate genes), test day fat yield (25,690 candidate genes), percentage protein (25,690 candidate genes), percentage fat (25,690 candidate genes), and percentage lactose content (25,690 candidate genes). The outcome of this study supports elucidation of novel genes that are important for livestock species in association to key phenotypic traits. Validation towards the development of marker-based selection that provides precision breeding methods will thereby increase the breeding value.


Introduction
The use of goats to convert poorly digestible fiber into highquality meat and milk has been a mainstay since 9,000 YBP [1]. Goats provide milk to humans more than any other dairy animal [2]. Specialization in milk production has made conversion of fiber to useable human nutrients very profitable [1]. Goat milk is generally consumed at a much larger margin than cow milk worldwide. Dairy goats are comprised of 380,000 as of January 2018 [3].
Global demand for milk consumption per capita is increasing due to both population growth and consumer preferences [4]. Genome-wide association analysis exists for other milk-producing ruminants [4]. Complex genetic and metabolic networks are responsible for regulating lactation physiology [5][6][7] that differ between goat milk and cow milk according to the Nutritional Composition of Goat Milk Products in the U.S. [8] based on its fat, lactose, and fatty acid composition. This suggests novel differences exist in the regulatory networks in caprines [7].
The secretory mechanism of lactation [7,9] is well described, and current models exist for the regulation of fat, protein, and lactose synthesis in goat mammary tissue. The chromosomal loci and the resultant expression of microRNA by the lactating mammary glands have been proposed [10]. These discoveries using high throughput genotyping and sequencing tools generate large quantities of data with moderate time and money [10].
GWAS have benefitted from genotyping technologies like that of the high-density Goat 50K SNP BeadChip [10]. However, identification of causative mutations related to American goat milk yield and other American goat milk traits is lacking [4]. Whole genome level sequence variants used in association analysis provide a clearer understanding for biological mechanisms associated with each quantitative trait loci (QTL) [4]. To improve genomic selection, we employed a combination with current bioinformatics software, such as PLINK [11] in combination with R [12], Qlick [13], and analysis using SNP & Variation Suite v8.8.1 [14].
Current genotyping arrays can capture small fractions of low frequency variants for individual goats. New or custom genotyping array developments provide avenues to utilize high priority variants with GWAS [15]. Custom chips include both common variants selected to replicate the original GWAS signals and a selection of implicated regions for relevant traits by GWAS [15]. Although second array-based technologies are limited in the range of loci they can target, cost-effective, fine-mapped, low frequency alleles may support the construction of chips that foster improved statistical power detecting variants [15].
The GoatSNP50 BeadChip was used to determine genetic diversity of Boer, Cashmere, and Rangeland goats [16]. The success of more recent genomic studies in the dairy goat industry suggests that successful implementation of low-panel SNP assays could revolutionize goat breeding and production in the United States, as it did with dairy cattle [17].
High-density SNP data has effectively identified regions of the genome that are important predictors of production traits (i.e., [18] and [19]). The development of robust prediction equations for genomic selection often requires the use of large numbers of animals with high-quality genotypes and phenotypes. However, when numbers of samples are limited, improvements in statistical limitations can result in preferential selection of the most informative samples [15]. Quantitative traits can use much smaller sample sizes than can do random sampling [15], and the extreme costs associated with genome scale association studies sometimes require that qualitative similar data analysis uses alternative methods when including smaller sample sizes [20].
This study explored the use of genome-wide associations to identify heritable milking traits in the American Alpine goat. We hypothesize that associations would exist and could be elucidated for future development of a low-panel SNP BeadChip for use in research and animal management that would not be cost prohibitive for researchers and smaller scale producers.
The specific aims of the study were to (1)  The production data was collected from milk records contributed and indexed for each trait into the CDCB repository. Does from throughout the United States were utilized in this study with varying phenotypes and environmental stimuli. Neither handling and feeding conditions throughout the experiment nor birth month nor season factors were included in the statistical analysis. The statistical study was carried out following two generally applied mathematical models, a mixed model and a fixed model. The mixed model is described in where Y ijklm is the test day measurement (milk, log SCC, fat percentage, protein percentage, casein percentage, serum protein percentage, lactose percentage, and total solid percentage) for a particular lactation L i of a doe of birthing age A k at parity P l within A k , during postpartum week W j , and with m number goats weaned (N m ) [21]. Lactation was the only random effect, with 155 levels. The week, age, parity within age, and number of lambs weaned were fixed effects. This mixed model was based on that applied by Stanton et al. [22] for estimating lactation curves of dairy cows for milk, fat, and protein. This model presents greater validity when used to study lactation curves with a high number of test day for lactation and could be considered a good approximation to more complex mathematical models. The lactation random effect explains more variation in test day yield than does any other factor.  6) with genome-wide significance levels greater than -log 10 ðpÞ = 4:0 is shown in the figures. Association with milk yield (TDMilk), protein yield (TDProtein), fat yield (TDFat), protein, fat, and lactose composition was completed, respectively. Phenotypic/genotypic associations were observed for variance, and novel SNPs with strong association for particular traits were identified that segregated for inclusion on a low-density SNP Chip. After these basic association tests correlating specific phenotypic traits, the number of SNPs for each goat was selected based on the strongest associations having p < 0:05, where their negative logarithms would be the greatest. Analysis was conducted using SNP & Variation Suite v8.8.1 [14]. Spreadsheets were generated that identified relevant associations between different data sets and were used to produce figures and charts for additional visualization of the strongest associations having p < 0:05.  The GWAS data sets of phenotypes associated with genotypic components of the 50KSNP BeadChip are shown ( Table 1). The number of statistically significant (p ≤ 0:05) SNPs at a call rate ≥ 0:95 was selected for approximate numbers of genotype/phenotype association (Table 1).

International Journal of Genomics
Quality genotypic data points were obtained after thorough quality control and quality assurance (QC/QA) [24]. This GWAS study used several advantageous procedures involving approaches that distinguish chromosome aberra-tions that may affect genotype accuracy and detect genotyping artifacts from dependence of Hardy-Weinberg equilibrium (HWE) [24,25] test p values. Allelic frequency of the p values as well as the capability of selecting SNPs,    (Figures 1-6) with genome-wide significance levels (−log 10 ðpÞ = 4:0) was completed for test day milk, test day protein, test day fat, protein, fat, and lactose, respectively. Phenotypic/genotypic associations were identified for variance, and novel SNPs with strong association for particular traits were segregated for inclusion on a low-density SNP chip. Manhattan plots for these six analyses indicated several significant regions at the genome threshold indicating outliers and chromosome level, niche association with specific, phenotypic traits. Figure 19 serves as an example of the many areas of identification of genomic coordinates specifically compared to known genes in the National Center for Biotechnology Information (NCBI) repository.

4.1.
Purpose. Custom-designed genotyping tools that are smaller and are less costly than the original, high-density tools have similar predesigned amounts of computational power. The selection criteria include, but are not limited to, small p values ( * p < 0:05), large estimated effect sizes (odds ratios), and previous reports of association. They also include biologically relevant pathways, plausible function of variations, or a combination of all these factors. This is an exploratory study and did not analyze high probability rare variants. Previous studies show that 460 to 4,600 individuals are needed to assume certainty for  7 International Journal of Genomics detections [15]. Also, Lee et al. [15] mentioned that the detection of rare-variant associations is underpowered when involving standard single-variant association analysis. Complex traits are predicted using genomic selection (GS) where information involving thousands of genetic markers from genomic signatures is utilized [26]. This exploratory study proposes that a more specific method be used to identify salient genomic signatures.
The goals of this exploratory study were to identify regions of the Alpine goat genomes that are important for predicting genetic merit in female dairy goats. This is useful to develop genomic prediction equations for the genetic evaluation of economically important milk production traits (TDMilk, TDProtein, TDFat, protein, fat, and lactose). This would further identify methods for high-yield and highquality milk towards the production of goat milk, cheese, ice cream, and many other dairy goat products for the con-sumer through the producer. The selection of goat phenotypes could be predicted while it is young, before reaching the age of gestation, thereby facilitating appropriate use of resources and positively leading to selecting genetic change in goat herds.
These approaches comprise the first step in identifying SNPs for future production of a low-density SNP panel (possibly a 5K BeadChip) available for goat producers at a reduced cost, compared to the current cost of a 50K Bead-Chip. This would facilitate both genetic selection and integration of genomic technologies into production systems, contributing to precision agriculture.

Genomic
Variability. This study identified information that can be used for the benefit of genomics in goats and    Figures 16-18 indicate, the same number of SNPs associates to the same chromosomes. However, a closer look reveals they are of the same quantity but not of the same composition (same number of SNPs but different SNPs). Many explanations for these data sets (the results involving possible indels, substitutions, translocations, and other chromosomal, stable aberrations) suggest a more complex system than initial identification of quantity provides. Because these exist, within the same breed of female American Alpine goats, more detailed studies are warranted.   [27] show that little overlap occurs between breed genomic regions of breeds that are under selection. This is because traits are complex and do not always display classic signatures of selection. Onzima et al. [27] mention that the processes of complex networks of genes acting together to environmental stressors are what lead to diverse biological, molecular, and cellular functions. They also suggest that most signatures are breed specific. Although this study was small in comparison to other GWAS, it is still remarkable that it supports the idea that breed specificity, a driving force in the genomic composition, can be explained as part of adaptation to varied environmental conditions [27].
Another explanation comes from Keller et al. [28], who described that variation can occur due to the stochastic nature of genomes caused by recombination of chromosomes in autosomal cells. Furthermore, the different frequencies that are observed within the same breed could be due to different population structures, population history, and use of selection strategies [29]. Brito et al. [30] made similar propositions for variability that in general, the breed development history impacted genes because they were potentially under selection.

Yield Variability.
Another area of interest is the variability that appears in yield as it relates to the phenotypes under observation in this study. The variability in yield among the individual traits within the same breed of goat is as follows: lactose < protein < TDProtein < TDFat > Fat < TDMilk. Although there are several studies considering variability from a genomic aspect, there are not many that refer to the effects of the genomic composition and selection of the animals on their trait variability.

Conclusion
Although this study is exploratory and is based on low numbers of samples, it strongly suggests an association between rare variants with traits of interest. To carry out a conclusive study for rare-variant studies on Alpine dairy goats, genotyping by sequencing or genotyping much larger numbers of individuals is ideal. Nevertheless, this study shows robust associations of signals and has undertaken the discovered variants or genes to molecular or cellular functions for over 25,960 distinct SNPs that are associated with specific traits expressed by American Capra hircus or Alpine dairy goats.
The amount of genotypic data in this study was relatively high, considering the low number of subjects. It enabled the discovery of multiple regions of the genome that were associated at chromosome-wide significance levels with desired traits from American Alpine dairy goats. A combined approximation of 30,594 SNPs was detected for six phenotypic traits at levels of significance ( * p < 0:05), respectively. These results imply that the possibility of a marker-based selection of desired traits is possible.
Currently, there is a 50K SNP BeadChip that is available to identify those goats that contain the genes for desirable traits examined here. The creation of a 5K or less SNP chip for the traits described in this study can provide similar information about each goat and would be a better way of precision breeding while preserving resources for better use in a goat-producing farm or facility. A larger scale study is cleared indicated to more conclusively identify DNA fragments for inclusion on a low-panel SNP chip. Finally, additional insight as to the relationship of phenotypic/genotypic effects on the variability in yields should be examined.

Conflicts of Interest
None of the authors of this paper have a financial or personal relationship with other people or organizations that could inappropriately influence or bias the content of the paper. No one, other than those mentioned or played a part in this document, was involved in the collection, analyses, or interpretation of data.  Figure 19: Identification of genomic coordinates (Combo_Pheno_Marker) using a portion of a heat map associated with TDMilk in comparison to RefSeq genes in National Center for Biotechnology Information (NCBI). The chromosomes of Capra hircus were used to for association with known sequences of SNPs. LOC102169846 is associated with estrogen sulfotransferase. CSN1S1 is associated with casein alpha s1. ODAM is an ameloblast associated with a processed peptide: odontogenic ameloblast-associated protein. CABS1 is associated with a calcium-binding protein. These and many more associated SNPs identified by this study are found in the NCBI repository.