Variations in the Regulatory Region of Alpha S1-Casein Milk Protein Gene among Tropically Adapted Indian Native (Bos Indicus) Cattle

Regulatory region of milk protein alpha S1-casein (αS1-CN) gene was sequenced, characterized, and analyzed to detect variations among 13 Indian cattle (Bos indicus) breeds. Comparative analysis of 1,587 bp region comprising promoter (1,418 bp), exon-I (53 bp), and partial intron-I (116 bp) revealed 35 nucleotide substitutions (32 within promoter region, 1 in exon-I, and 2 in partial intron-I region) and 4 Indels. Within promoter, 15 variations at positions −1399 (A > G), −1288 (G > A), −1259 (T > C), −1158 (T > C), −1016 (A > T), −941 (T > G), −778 (C > T), −610 (G > A), −536 (A > G), −521 (A > G), −330 (A > C), −214 (A > G), −205 (A > T), −206 (C > A), and −175 (A > G) were located within the potential transcription factor binding sites (TFBSs), namely, NF-κE1/c-Myc, GATA-1, GATA-1/NF-E, Oct-1/POU3F2, MEF-2/YY1, GATA-1, AP-1, POU1F1a/GR, TMF, GAL4, YY1/Oct-1, HNF-1, GRalpha/AR, GRalpha/AR, and AP-1, respectively. Seventy-four percent (26/35) of the observed SNPs were novel to Indian cattle and 11 of these novel SNPs were located within one or more TFBSs. Collectively, these might influence the binding affinity towards their respective nuclear TFs thus modulating the level of transcripts in milk and affecting overall protein composition. The study provides information on several distinct variations across indicine and taurine αS1-CN regulatory domains.


Introduction
Bovine caseins are distinguished into four protein fractions, namely, alpha S1-casein, alpha S2-casein, beta-casein, and kappa-casein encoded by genes: S1-CN, S2-CN, -CN, and -CN, respectively [1]. Alpha S1-casein represents the major protein fraction (31%) among the bovine milk proteins (caseins and whey) and constitutes up to 40% of total casein [1]. It is a calcium sensitive and highly phosphorylated protein. It has an important role in the capacity of milk to transport calcium phosphate and is organized at 5 ′ -terminus of casein cluster located on bovine chromosome 6 (BTA6). Till now, 9 variants (A-I) have been reported in the coding region of S1-CN. Amongst these, B and C, differing in amino acid substitution (Glu/Gly) at position 192 of the mature protein, are the most common. e variant C has been reported to be common in zebu breeds, while other rare variants like A, D, and F have only been reported in European cattle [2]. ese variants are well characterized and their associations with quantitative effects on milk performance/production parameters have been widely reported [3,4]. e results on association studies involving only coding region variants are not always consistent [5] and this might be attributed to the presence of intragenic haplotypic combination of variants in the regulatory and coding regions [3,6]. Moreover, casein gene expression is also known to be differentially regulated by hormones and most of the potential hormone receptor binding sites occur within the 5 ′ -�anking region of casein genes [6]. us, mutations at these regulatory regions might also have enduring effect on milk protein gene regulation at transcriptional level [6,7] either individually or as inter-or intragenic haplotypes.

ISRN Biotechnology
For S1-CN, mutations in the promoter region have been reported to in�uence the protein-coding efficiency of the relevant structural gene by changing the binding affinity towards their respective nuclear transcription factors (TFs) [8,9] and can thus be considered as functional candidate for milk protein content. Additionally, variants of S1-CN5 ′ -�anking region ( S1-CN5 ′ ) have also been associated with economically important traits like longevity and somatic cell scores in different taurine breeds [3,10,11]. Sequence variation within S1-CN5 ′ has been widely studied in several species like cattle involving mainly B. taurus [10][11][12][13][14], yak [15], buffalo [16], goat [17], and sheep [18]. In contrast to the studies conducted on S1-CN5 ′ in B. taurus and few zebu cattle [12,19], no systematic study has been made to reveal the variations and haplotypes existing in S1-CN5 ′ among Indian cattle breeds. e Indian native (B. indicus) breeds are adapted to diverse climatic conditions and range from good milch (of dairy merit) animals to extreme draught types. ese breeds are known for their survival under inadequate feeding and low-input production practices naturally. Further, due to evolutionary divergence, B. indicus and B. taurus are expected to have variations in the candidate genes related to dairy traits. Keeping in view the scanty information available in Indian native cattle breeds, the present study was aimed to (i) sequence the full-length S1-CN5 ′ in 13 Indian zebu breeds; (ii) search for putative TFs based on the indicine sequence (sequence speci�c to Indian zebu cattle); (iii) check if detected polymorphisms lie within identi�ed TFBSs; (iv) identify variations within indicine breeds and their comparison with taurine breeds; and (v) identify homologies in the regulatory domains as well as phylogenetic relationship for S1-CN5 ′ from different mammalian species.

PC� Primers and
Pmpli�cation of S1-CN5 ′ . Primer pairs S1-CN-F1 (5 ′ -CCAATCCAGATATTGAACCTGC-3 ′ ) and S1-CN-R1 (5 ′ -ATAGGAAAGTACCAATACTTG-C-3 ′ ) were used to amplify a fragment of 1,639 bp including promoter, exon-I, and intron-I region of S1-CN. e primers were designed through PRIMER2 soware using cow genomic sequence data available at ENSEMBL database (BTAU 4.0:6). PCR reaction was performed in 25 L reaction mixture containing 100-150 ng of genomic DNA, 5 p mole of each primer, 200 M of dNTPs mix, 1X PCR buffer, 1.5 mM MgCl 2, and 1.5 unit of Taq DNA polymerase (Invitrogen, Carlsbad, CA, USA) and was carried out in Mastercycler ep Gradient thermal cycler (Eppendorf, Germany) using thermal cycling conditions as initial denaturation at 95 ∘ C for 2 min, followed by 30 cycles at 95 ∘ C for 60 sec, 59 ∘ C for 45 sec and 72 ∘ C for 2.30 min with a �nal extension at 72 ∘ C for 10 min. e amplicons were electrophoresed through 1.2% SafeView stained agarose gel and were visualized under UV transilluminator.

Sequencing, Annotation and Comparative
Analysis of S1-CN5 ′ . e PCR product from each sample was puri�ed using Exonuclaese I/Calf Intestinal Phosphatase (Exo-CIP) enzymatic treatment and used to sequence S1-CN5 ′ region using forward primer S1-CN-F1 and three additional internal primers (P1F: 5 ′ -GTTCCTGTCATACAACTGTG-3 ′ , P2F: 5 ′ -ACTGGACACGACTTAGAAAC-3 ′ , and P3F: 5 ′ -CAATGCCATTCCATTTCCTG-3 ′ ) designed on the 1,639 bp amplicon. Sequencing reactions were performed with BigDye v3.1 cycle Sequencing Kit in an ABI PRISM 3130 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). e resulting sequences were aligned and polymorphic sites identi�ed from sequence comparison were con-�rmed through manual inspection. e transcriptional factor binding sites were identi�ed using TESS (http://www .cbil.upenn.edu/cgi-bin/tess/tess/), MATCH [21] (http:// www.gene-regulation.com/cgi-bin/pub/programs/match/bin/ match.cgi/), TRANSFAC [22] (http://www.biobase-international.com/), AliBaba2.1 Search engine [23] (http://www .gene-regulation.com/pub/programs/alibaba2/index.html), and from the literature [15,19]. e potential functions of the putative TFBSs were determined from TRANSFAC database. PHASE v2.1.1 soware [2426] was used to analyze identify haplotypes. e frequency of variations was calculated as number of animals with variation/total population size, whereas breed-wise frequency was estimated as number of mutations within a breed/total mutations. Linkage disequilibrium (LD) measures, ′ and 2 , between all single nucleotide polymorphisms (SNPs) were estimated using Arlequin v3.5 soware [27]. To determine the homologies among the DNA binding domain, sequences of S1-CN5 ′ from major milk producing mammalian species (B. indicus, B. taurus, B. bubalis, and C. hircus) were extracted from GenBank and Ensemble databases. Molecular Evolutionary Genetic Analysis (MEGA) soware version 5.0 [28] was used for the comparative sequence analysis and phylogenetic sequence analyses employing the Neighbor-Joining (NJ) method as this method does not require the assumption of a constant rate of evolution. Genetic distances were estimated by the p-distance model and standard errors of the estimates were obtained through 5,000 bootstrap replicates.
Among 36 variations observed in the promoter region; 15 were located within the putative TFBS in�uencing the binding affinity of their respective TFs, thus possibly affecting the gene transcript. Further, in intron-I, variation at position 82 (T > A) was also located within G�-1 TFBS ( Table 1). Many of the observed variations in�uenced more than one nuclear factor (Table 1). However, observed deletions in S1-CN5 ′ did not affect any known TFBS. Across the variations at DNA binding domains, −1259 (T > C) is located within the GATA-1 and NF-E exhibited maximum frequency (0.63).

Homologies of Regulatory Domains among Major Dairy
Species. Sequence comparison of S1-CN5 ′ among major livestock species of dairy purpose (cattle, buffalo, and goat) revealed divergence at binding domain for several ubiquitous TFs and motifs speci�c to mammary gland and hormone receptors. Amongst the 31 different TFB elements annotated for Indian zebu cattle, 12 showed variations ( Figure 2). ese variable regions included transcriptional activators such as, ER, MEF, 16 bp milk box, and TBP, repressors of gene regulation such as YY1 and AP-1, while others were related with basal regulation of gene expression such as GAL4, GATA-1, Oct-1, POU3F2, Sox-5, and TFIID ( Figure 2). (----)  Variations reported in B. taurus [19] but not observed in the present study. based on UPGMA with 5,000 bootstrap replicates for S1-CN5 ′ among 15 different mammalian species revealed four major groups. Ruminants from Bovidae family (cattle, yak, buffalo, goat, and sheep) were grouped together; members of Hominidae family (human, chimpanzee, orangutan, and gorilla) and monkey formed another group; rat and mouse from Muridae family were clustered together, while, horse from Equidae family was distinctly separated (Figure 3).

Discussion
Due to close linkage of four casein genes, regulatory domains of one casein gene might in�uence the other caseins as well in addition to the respective casein [29]. It is pertinent to study variation in S1-CN regulatory region as it is located at the 5′ end of casein group with orientation in the sense direction and most likely its 5 ′ region controls the transcription regulation of other caseins [10]. Further, compared to other caseins, S1-CN5 ′ is the most variable [3] and these variations might in�uence the encoded transcripts and hence the milk composition and properties. Evidence for signi�cant association of mutations within the regulatory region of casein complex with production traits across different taurine (B. taurus) breeds has been provided in number of studies [6,7,9,10].
In the present study, sequence characterization of S1-CN5 ′ among Indian zebu cattle (B. indicus) revealed a dense region of binding sites for tissue-speci�c factors, hormone receptors, and ubiquitary transcription factors with few overlapping binding sites. Overall 39 variations identi�ed in Indian zebu cattle breed indicated high variability of S1-CN5 ′ . e polymorphic nature of S1-CN5 ′ has also been reported by Schild and Geldermann [19] with 17 variable sites  (Table 1). Among the 4 variations unique to B. taurus the variation −728 (− > T) was genotyped using SspI restriction site and results suggested signi�cant association of heterozygous genotype (− > T) with average higher -S1 protein content in taurine breeds [6,7,9,19,30]. is variation at −728 (− > T) has been observed to have close linkage with −175 A > G (intragenic haplotype; [4]). Further, intergenic haplotypes have also been reported for S1-CN5 ′ variation −728 (− > T) with variation in S1-CN5 ′ −1084 C > T and −186 T > C and -CN5 ′ (−109 C > G) [7]. However, in contrast to such reports, variation/deletion was not observed at position −728 among the analyzed Indian cattle (B. indicus) breeds and all animals were homozygous for T allele (TT). Another important variation at −175 (A > G) located within the binding site of ubiquitously expressed AP-1 TF [19] showed different genotypic pattern across B. taurus and Indian native cattle breeds. e particular variation was genotyped across 62 Simmental and 80 German Holstein cows by Kuss et al. [13,14] and re�ected association of heterozygous AG genotype with high S1-CN protein content. Conversely, none of the animals in our study showed heterozygous AG genotype at position −175, as all were either homozygous with GG or AA genotype. ese �ndings clearly demonstrate the nucleotide divergence in the regulatory region of S1-CN5 ′ across Indian and taurine cattle breeds.
Out of 39 variations observed in the present study, 16 were located within putative TFBS, some of which are ubiquitously expressed and involved in regulation of tissue-speci�c gene expression. e variations located within transcriptional activators included −1158 T > C and −330 A > C, positioned in the potential binding sites for ubiquitously expressed Oct-1 that could possibly change the transcriptional mechanism. e −1158 T > C also overlaps with POU3F2 TFBS speci�c to nervous system and binds Oct-1 (Figure 1, Supplementary  Table S3). e variation at −1016 A > T was located within MEF-2 (regulator of cellular differentiation). Among the group of transcriptional repressors within the cis-acting elements, observed variation c536 A > G was positioned at TMF binding domain which represses activation of TATA box and 82 T > A (intornic region variation) within the G�1 TFBS (Supplementary Table S3). Some of the variations were located within the binding sites for TFs with activator and the repressor activity was; −1016 A > T and −330 A > C within ubiquitously distributed YY1 TFBS; −610 G > A, −207 A > T and −206 C > A within GR TFBS while; −778 C > T and −175 A > G within AP-1 TFBS. AP-1 is a known transcriptional activator, but few studies also suggest its role as repressor for S1-CN5 ′ [13,14]. e variations located within other important TFBS included −1399 A > G and −1259 T > C, located within Nuclear TFs (NF-Kappa E1 and NF-E, resp.) ( Table 1) which are nuclear proteins with unknown speci�c function. −1399 A > G also overlaps with binding domain of c-Myc ( Figure  1, Table 1). Similarly, variations −1288 G > A, −1259 T > C, and −941 T > G are marked within the GATA-1. Both c-Myc and GATA-1 are regulators involved in cell proliferation and cell growth. Under category of tissue-speci�c TFs, the observed variations were G-610A located within POU1F1a that in�uences secretion from pituitary gland and has transactivation activity; −214 A > G within the liver speci�c activator, HNF-1 that acts in cooperation with other TFs. Although not tissue speci�c, variation at −521 A > G was sited within the TF GAL4 that mediates transactivation of gene regulation (Supplementary Table S3). Additionally, variations at −207 A > T and −206 C > A overlapping with the binding domain for GRalpha were located within AR that mediates androgen-speci�c gene regulation. Eleven out of the sixteen above-discussed variations occurring within important putative TFBSs are speci�c to Indian zebu cattle and have not been observed in any other breed>species. e remaining �ve variations (−1016 A > T, −53 A > G, −521 A > G, −330 A > C, and −175 A > G) are common with B. taurus (Table 1). e effect of these variations, individually or in combination, could in�uence the regulation of S1-CN gene expression effectively. e variations from B. taurus counterpart at genomic level also indicate the possible differences in milk performance traits of the two subspecies. Also, homology differences of regulatory sequences among major dairy species (cattle, buffalo, and goat) might be responsible for difference at production level. As regulation of gene expression is under multifactorial control, there is a need to focus on haplotypes rather than individual variations. e present study generates the knowledge related to variations in naturally evolved Indian cattle breeds within regulatory region of S1-CN gene, wherein such information was lacking. e novel variations found in Indian cattle breeds may be responsible for differential content of milk components as compared to taurine breeds. is study needs to be extended further in combination with protein coding gene polymorphism (intragenic haplotypes) to evaluate effects of promoter polymorphism on milk production traits. e ability to link sequence variability to dairy traits in context of other members of casein family using SNP chip or other tools could be important. is would lead to efficient utilization of resources like Indian native cattle impacting the socioeconomic structure of large population in India.