Mechanism of Gene Amplification via Yeast Autonomously Replicating Sequences

The present investigation was aimed at understanding the molecular mechanism of gene amplification. The interplay of fragile sites in promoting gene amplification was also elucidated. The amplification promoting sequences (APS) were chosen from the Saccharomyces cerevisiae ARS, the 5S rRNA regions of Plantago ovata and P. lagopus, proposed sites of replication pausing at the Ste20 gene locus of S. cerevisiae, and the bent DNA sequences within the fragile site FRA11A in humans. The gene amplification assays showed that plasmids bearing APS from yeast and humans led to enhanced protein concentration as compared to the wild type. Both the in silico and in vitro analyses pointed to the strong bending potential of these APS. In addition, high mitotic stability and the presence of TTTT repeats and SAR amongst these sequences encourage gene amplification. Phylogenetic analysis of S. cerevisiae ARS was also conducted. The combinatorial power of the different aspects of APS analyzed in the present investigation was harnessed to reach a consensus about the factors which stimulate gene expression in the presence of these sequences. It was concluded that AT-rich tracts present in fragile sites of yeast serve as binding sites for MAR/SAR and DNA unwinding elements. The DNA-protein interactions necessary for ORC activation are facilitated by DNA bending. These specific bindings at ORC promote repeated rounds of DNA replication leading to gene amplification.


Introduction
Gene amplification is a cellular process in which multiple copies of a particular gene or genes are produced, thereby leading to their enhanced expression. In some organisms it is an integral part of the normal developmental process [1], or it is closely associated with abnormal processes, such as malignancies [2], increased drug resistance [3], and mutations [4]. Dhar et al. [5] reported the origin of a complete chromosome due to amplification of rRNA sequences in Plantago. A significant advance in this field was the discovery of the cis-acting genetic element aps (amplification promoting sequence) from the nontranscribed spacer region of tobacco ribosomal DNA (rDNA), which reportedly increased the level of expression of recombinant proteins [6].
In the yeast genome, gene amplification is closely associated with specific DNA sequences known as autonomously replicating sequences (ARS). These sequences are identified by their unique ability to confer high frequency transformation and stable plasmid maintenance [7]. ARS are short DNA sequences of a few hundred base pairs which support maintenance of plasmids in growing yeast cells. Some of these ARS elements are known to behave as origins of replication [8,9]. ARS elements are spread over the sixteen chromosomes of yeast, on average once every 30-40 kb. Apart from the extrachromosomal origin function, many but not all ARS elements also function as replication origins in their original chromosomal context [10,11]. A strong ARS has been estimated to yield 50,000 transformants/μg of DNA, while the weakest ARS yields < 100 transformants/μg of DNA [12]. Based on the time taken for replication initiation, the origins in yeast have been classified into early and late replicating [13]. Dhar et al. [14] reported a comprehensive compilation on the distribution of the ARS within the genus Saccharomyces. The present study addresses the physical aspects governing replication efficiency and the mechanism of gene amplification via ARS elements. A thorough understanding of these elements may aid in their better utilization for the development of specialized yeast vectors for scientific and commercial applications in genetic engineering. 2 The Scientific World Journal

Isolation of Amplification Promoting Sequences.
Five different sources were used for the isolation of amplification promoting sequences (APS): (I) for the in silico analysis, all the ARS located on the sixteen S. cerevisiae chromosomes were considered (∼740). Only some of the ARS elements proposed to behave as CEOs (compromised early origins) were taken into consideration for experimental purposes. These were ARS310, ARS315, ARS606, ARS806, ARS1305, ARS1426, and ARS1512; (II) the nontranscribed spacer (NTS) region of the 5S ribosomal DNA of P. ovata (366 bp); (III) the full 5S rDNA region (363 bp; NTS region is 242 bp) of P. lagopus, chosen to check its efficacy as an amplification promoting sequence; (IV) sites of replication fork impediment at the Ste20 gene located on chromosome VIII of S. cerevisiae [15]; and (V) regions prone to DNA bending within the fragile site FRA11A. FRA11A maps to the 11q13.3 region within the 11q13 locus of human chromosome 11. The DNA sequence of FRA11A retrieved from NCBI was analyzed in depth to identify the regions of maximum curvature and DNA bending using the bend.it and TWIST-FLEX programs. The 11 Mb sequence of FRA11A was analysed using several bioinformatics tools. Although there were many regions with curvature > 14, only two regions were found with curvature > 16. These regions, FSI (11445795-11455795) and FSII (11451795-11452795), were selected for further analysis (Table 1).
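The threshold-based selection of curved regions described above can be sketched as follows. This is a minimal illustration only, assuming that a per-position curvature profile has already been exported from a prediction tool as a plain list of numbers; the function name `regions_above`, the `min_len` parameter, and the toy profile are hypothetical and are not taken from bend.it, TWIST-FLEX, or the original analysis.

```python
def regions_above(profile, threshold, min_len=1):
    """Return (start, end) index pairs, end exclusive, for each contiguous
    run of positions whose value is strictly greater than `threshold`."""
    regions = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the profile.
    if start is not None and len(profile) - start >= min_len:
        regions.append((start, len(profile)))
    return regions


# Toy curvature profile (made-up numbers, for illustration only).
toy_profile = [5.0, 15.0, 17.0, 15.0, 3.0, 14.5, 9.0, 16.5, 18.0]
candidate_regions = regions_above(toy_profile, 14.0)  # "curvature > 14"
strongest_regions = regions_above(toy_profile, 16.0)  # "curvature > 16"
```

Raising the threshold from 14 to 16 narrows the candidate list, which mirrors how the two FRA11A regions were singled out from the many regions with curvature > 14.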

PCR Amplification.
Genomic DNA was isolated from yeast strains (Table 2), plants, and resected normal and tumor tissues using standard protocols [16][17][18]. RNA was removed from the DNA samples by treatment with RNase A at a concentration of 10 μg/mL. Specific primers were designed for amplification of the APS using IDT (Integrated DNA Technologies) online software. The oligonucleotide primers for each APS were flanked with the BamHI and SalI restriction sites at the 5′ and the 3′ ends. For BamHI the sequence "GGATCC" was used and for SalI "GTCGAC" was used. The trinucleotides "GCA" or "CAG" were also added as support at the 5′ end, as done previously by Venkateswarlu et al. [19]. The sequences of all the primers used in this study are listed in Table 3. The APS sequences were PCR amplified using the specific primers. Desired fragments were eluted from the gel using the GenAxy DNA mini elution kit. YEp51G (yeast episomal plasmid) was used for vector cassette construction (Figure 1). It is a shuttle plasmid bearing leucine and ampicillin as the
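The primer layout described in this section (a supporting trinucleotide, then a restriction site, then the target-specific sequence) can be sketched as follows. This is an illustrative sketch, not the authors' design procedure: the recognition sequences and trinucleotides come from the text, whereas the assignment of BamHI to the forward primer and SalI to the reverse primer, the fixed core length, the helper names, and the example target are all assumptions made for illustration. The actual primers are those listed in Table 3.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence, as given in the text
SALI_SITE = "GTCGAC"   # SalI recognition sequence, as given in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tailed_primer(core, site, clamp):
    """Assemble 5'-clamp + restriction site + target-specific core-3'."""
    return clamp + site + core.upper()


def design_primer_pair(target, core_len=20, fwd_clamp="GCA", rev_clamp="CAG"):
    """Hypothetical helper: tailed forward and reverse primers for `target`.

    Assumption: BamHI is placed on the forward primer and SalI on the
    reverse primer; the source does not state this assignment explicitly.
    """
    target = target.upper()
    if len(target) < core_len:
        raise ValueError("target is shorter than the requested core length")
    fwd_core = target[:core_len]
    rev_core = reverse_complement(target[-core_len:])
    forward = tailed_primer(fwd_core, BAMHI_SITE, fwd_clamp)
    reverse = tailed_primer(rev_core, SALI_SITE, rev_clamp)
    return forward, reverse


# Made-up target sequence, for illustration only.
example_target = "ATGCGTACCGTTAGCTAGGCTTAACGGATCGATCGTACGATCGTAGCTAGCTAGGCTA"
example_fwd, example_rev = design_primer_pair(example_target, core_len=18)
```

In practice the core length would be chosen from melting temperature and specificity criteria, which this sketch does not attempt to model.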

Gene Amplification Assays.
Crude yeast cell extract was prepared. The Bradford assay was used to determine the concentration of solubilized protein [21]. Mitotic stability and plasmid loss rate assays were performed as described by Dani and Zakian [22] with slight modifications. For curvature analysis of the APS, native 10% polyacrylamide gel electrophoresis was carried out to confirm the bending intensities of the fragments that showed noticeable bendability during the in silico analysis. The protocol was adopted from Bechert et al. [23]. The migration of the marker fragments was determined and calibration curves were plotted (logarithm of the base pairs versus distance migrated). A separate calibration graph for determining the length of the APS-containing DNA fragments was constructed for each gel.
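The per-gel calibration described above (logarithm of marker size versus distance migrated) can be sketched as a simple least-squares fit. This is a minimal sketch under stated assumptions: a straight-line relationship over the marker range is assumed, the function names are hypothetical, and the ratio of apparent to actual length is a commonly used summary of anomalous migration that the source does not explicitly say was computed.

```python
import math


def fit_calibration(marker_bp, marker_distance):
    """Least-squares fit of log10(bp) = slope * distance + intercept."""
    if len(marker_bp) != len(marker_distance) or len(marker_bp) < 2:
        raise ValueError("need at least two (size, distance) marker pairs")
    xs = list(marker_distance)
    ys = [math.log10(bp) for bp in marker_bp]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("marker distances must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apparent_size(distance, slope, intercept):
    """Apparent fragment length (bp) read off the calibration line."""
    return 10 ** (slope * distance + intercept)


def migration_anomaly(apparent_bp, actual_bp):
    """Apparent length divided by sequence length. Values above 1 indicate
    retarded migration, as is typically seen for curved DNA on native gels."""
    return apparent_bp / actual_bp
```

Because the fit is repeated for each gel, a fragment's apparent size is always read against markers run under the same conditions, which is the point of calibrating every gel individually.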

In Silico Analysis.
After assessing the amplification potential of the APS by gene amplification assays, the sequences were further analysed using several bioinformatics tools (WEB-THERMODYN, TREP, model.it, bend.it, SDSC, Twist-Flex, EMBOSS, and RNAfold). This analysis was essentially carried out to investigate the role of the sequence-dependent behaviour of the APS in promoting gene amplification. The SGD and oriDB databases were used for collation of information. An effort was also made to identify the presence and location of known replication enhancer (RE) sequences and matrix attachment regions (MAR) within these APS [24]. A 189 bp region has been reported earlier by Amati et al. [25] to have a crucial role in ARS activity and scaffold binding. The presence of this sequence in S. cerevisiae ARS was checked using BLAST. Geneious Pro (5) software was used for the phylogenetic analysis of S. cerevisiae ARS elements using the neighbour joining method (Table 5).

Results
Genomic DNA was successfully isolated from S. cerevisiae, P. ovata, P. lagopus, and resected tissue samples. Amplification products within the 320-350 bp size range corresponding to the 5S rRNA of P. ovata and P. lagopus were observed (Figure 2).
The amplification products of different ARS were observed as differentially migrating bands. The amplicon corresponding to ARS315 was of the expected size of ∼428 bp, while that of ARS606 was ∼367 bp. A bright band of size ∼381 bp was observed representing ARS1512. PCR amplification products of ARS1305 and ARS1426 were comparatively of lower molecular weight, that is, ∼281 bp and ∼328 bp, respectively (Figure 3). A PCR product of the expected molecular weight (∼613 bp) was observed for Ste-YF1, an amplicon of ∼529 bp corresponded to Ste-YF2, and a ∼531 bp amplification product was obtained for Ste-YF3. The PCR product representing Ste-YF4 was also successfully amplified (∼487 bp) (Figure 4). A PCR amplicon of ∼546 bp was observed corresponding to FRAI, while an amplification product of ∼689 bp represented FRAII (Figure 5).
The increase in activity was analysed spectrophotometrically from the crude cell lysates. Each sample was analysed in three biological replicates to confirm the values obtained. Protein activity increased significantly in the cells having the plasmid as compared to the wild type cells without the plasmid. Maximum activity was observed in pYEp51GA-6 followed by pYEp51GA-13, while wild type pYEp51G showed the minimum activity. Except for pYEp51GA-1 and pYEp51GA-2, which showed an insignificant increase in expression of the GUK1 gene, the remaining constructs showed an increase in protein activity (Table 6). The transformation efficiency and the mitotic stability of the cells were calculated as shown in Table 7. It was observed that YEp51GA-3, YEp51GA-5, YEp51GA-8, YEp51GA-12, and YEp51GA-13 were more stable as compared to the other transformants over the generations. High plasmid loss, indicative of low mitotic stability, was observed in pYEp51GA-9, pYEp51GA-7, and the wild type.
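The quantities reported in Table 7 can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the exact procedure used here: it assumes that mitotic stability is scored as the fraction of cells retaining the plasmid (colonies on selective versus nonselective medium) and that the loss rate per generation follows the widely used formulation 1 − (F/I)^(1/N), where I and F are the plasmid-bearing fractions before and after N generations of nonselective growth. Whether the "slight modifications" to the Dani and Zakian protocol alter this calculation is not stated, and all function names are hypothetical.

```python
def transformation_efficiency(colonies, dna_micrograms):
    """Transformants per microgram of plasmid DNA plated."""
    if dna_micrograms <= 0:
        raise ValueError("DNA amount must be positive")
    return colonies / dna_micrograms


def mitotic_stability(colonies_selective, colonies_nonselective):
    """Fraction of cells that still carry the plasmid."""
    if colonies_nonselective <= 0:
        raise ValueError("nonselective colony count must be positive")
    return colonies_selective / colonies_nonselective


def loss_rate_per_generation(initial_fraction, final_fraction, generations):
    """Plasmid loss rate per generation, 1 - (F/I)**(1/N) (assumed formula)."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    if not 0 < initial_fraction <= 1 or not 0 <= final_fraction <= 1:
        raise ValueError("fractions must lie between 0 and 1")
    return 1.0 - (final_fraction / initial_fraction) ** (1.0 / generations)
```

A lower loss rate per generation corresponds to a more stable transformant, which is the sense in which some constructs are described above as more stable than others.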
In most of the ARS elements, it was found that tetranucleotide stretches of "A" followed by "TTTT" were present in the regions depicting highest curvature (Figure 6). Long stretches of "A" were also located in the bend regions of ARS506 and ARS1426, while extended "T" stretches were also detected (e.g., ARS606 and ARS1125). An interesting observation was that the ARS showing the maximum bending tendencies in the experimental (Table 8) and theoretical analyses (ARS315, ARS606, ARS1305, and ARS1426) had A4 repeats followed by T4 repeats (highlighted by arrows in Figure 7). Also, there were ample trinucleotide sequences like TTA and ATA, known to have high curvature values, in the regions of the ARS elements under study. The present observations regarding DNA bending at APS fragments indicate the strong possibility of the presence of replication enhancer sequences within them which promote DNA bending and hence may lead to gene amplification (Table 9). Out of the six RE sequences used for the present analysis, RE2, RE4, and RE6 were absent from all of the ARS, while RE3 and RE5 were present in many ARS. Also, the presence of RE1 was observed in ARS1127, while Ste20-YF3 and FSII showed the presence of RE3. Moreover, ARS315, ARS606, ARS1305, ARS1426, and ARS1512 showed maximum homology with the 189 bp sequence. FRAI showed lesser similarity with this sequence as compared to FRAII. A general observation was that all the selected APS had high similarity with this sequence, which points towards the formation of secondary structures by these APS during gene amplification.
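The sequence features discussed above (A4 repeats followed by T4 repeats, and extended A or T stretches) can be located with a simple pattern scan. This is a minimal sketch, assuming that "A4 followed by T4" means the two tracts are directly adjacent and that a stretch of four or more identical bases counts as a tract; both choices, the function names, and the toy sequence are illustrative assumptions rather than the criteria used in the original analysis.

```python
import re

A4T4 = r"A{4}T{4}"  # A4 immediately followed by T4 (adjacency is assumed)
A_TRACT = r"A{4,}"  # runs of four or more A
T_TRACT = r"T{4,}"  # runs of four or more T


def find_motif(seq, pattern):
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in re.finditer(pattern, seq.upper())]


def at_content(seq):
    """Fraction of A and T bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return (seq.count("A") + seq.count("T")) / len(seq)


# Made-up sequence, for illustration only.
toy_seq = "GCAAAATTTTCGTTTTTGAAAAAC"
a4t4_hits = find_motif(toy_seq, A4T4)
a_tracts = find_motif(toy_seq, A_TRACT)
t_tracts = find_motif(toy_seq, T_TRACT)
```

Comparing the positions returned by such a scan with the coordinates of the predicted curvature maxima is one way to check whether the tracts fall inside the bent regions.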
Curvature propensity and bendability values of the selected APS sequences are listed in Table 10. The magnitude of the predicted curvature propensity was within the range calculated for experimentally tested curved motifs (>1 and <22.5) for all the APS except those obtained from plants. Based on these results, the 2D projections depicting exact regions of DNA bending amongst the ARS are shown in Figure 8. Further, local bending of selected APS was predicted by static-geometry models as well using model.it DNA analysis tool. 3D models of the bent DNA were created from the sequence data and gave the output as a PDB file (Figure 9).
The analysis of hereditary molecular differences amongst these ARS elements was carried out in order to gain information about their evolutionary relationships. The phylogram generated in this manner had three main branches (Figure 10), where ARS1316 and ARS1507 appeared as outliers with a bootstrap value of 100, while the rest of the ARS were clustered in the same branch. In phylogenetic analysis, branch length is a measure of the amount of divergence between two nodes in a tree. ARS1620 showed the maximum branch length, followed by ARS701 and ARS416. Apart from these, ARS502, ARS513, ARS1221, ARS910, ARS310, ARS319, ARS1302, ARS1001, and ARS108 also showed longer branch lengths as compared to the rest. ARS located on chromosome V showed longer branch lengths as compared to others (ARS518, ARS503, ARS515, ARS519, ARS513, ARS504, and ARS502). A noteworthy observation was that ARS located on the same chromosome were placed very close to each other in the dendrogram, for example, ARS of chromosome XVI (ARS1618, 1607, 1608, 1633, 1603, 1626, 1629, 1631, and 1604). Phylogenetic analysis also pointed out that the ARS containing an intact ACS sequence were also evolutionarily proximal. The adjacent placement of very similar sequences (ARS1200-1 and 1200-2) further supports the accuracy of the tree.

Discussion
Identifying the key players in the gene amplification process and the factors that induce them remains a challenge across a number of biological processes in both prokaryotes and eukaryotes [2]. The aim of the present research was to understand the mechanism of gene amplification via fragile sites. Therefore, ARS elements known to behave as CEOs (compromised early origins), which are known to cause fragility in yeast, were chiefly considered [26]. Slight differences in significant and conserved functional sequence motifs within ARS can modulate their ORC binding affinity and origin activity [14]. Although the basis of the origin inefficiency of CEO ARS elements is not clear, it is possible that the chromatin conformation of these ARS results in weaker binding of the ARS to trans-acting factors crucial for origin activation. This has been proposed as a possible explanation for the fact that only one origin, that is, ARS607, is used in >85% of cells in chromosome VI [27]. Gene amplification assays performed during the present investigation showed that plasmids bearing amplification promoting sequences from yeast and humans had significantly enhanced protein activity. It has been documented that the effect of APS is likely due to a combination of (i) its structural features such as its 80% A+T content, (ii) repetitive ARS core elements, (iii) DNA bending, and (iv) SAR-related (scaffold attachment region) sequences [6]. All of these factors play a crucial role in the formation of protein-DNA complexes that initiate and regulate nearby gene amplification and transcription. A synergistic effect of these factors may explain the significant increase in gene expression associated with the vector cassettes bearing ARS1426 and FRAII. The bioinformatics analysis was essentially carried out to investigate the role of the sequence-dependent behaviour of the APS in promoting gene amplification.
The exact regions harbouring the bent DNA, RE sequences, and MARs amongst the chosen APS were also successfully located.
Out of all the APS chosen for the present investigation, the selected 189 bp SAR was found in the APS elements derived from yeast and humans. This specific SAR has been reported in Drosophila, where a 189 bp region from the 5′ SAR element of the fushi tarazu (ftz) gene was found to be crucial for ARS activity and scaffold binding [25]. Similar studies have also shown that 40% of the 58 Drosophila SARs tested function as ARS elements in yeast [28]. The nuclear scaffold interacts with genomic DNA at these specific sites (SARs) and forms the basis of the DNA loops. This indicates that the AT-rich genomic DNA present in these APS elements remains specifically attached to residual nuclear structures. A correlation between the replication enhancer sequences and SARs was evident from earlier studies which confirm that several scaffold binding sites coincide with or map very near to enhancer elements [29]. Further, a strong affinity of these regions for DNA stem-loops or cruciforms has reportedly been seen frequently during eukaryotic replication [30]. For example, the 14-3-3 proteins present in yeast, plants, amphibians, and invertebrates, located within the nucleus, are involved in eukaryotic DNA replication via binding to the cruciform DNA that forms transiently at replication origins [31]. Also, deletion of the cruciform binding domain of the protein at the ori site leads either to reduction or failure in replication in budding yeast [32]. This analysis indicates a positive correlation between scaffold binding affinity and ARS activity, as the homologs of 14-3-3 proteins in S. cerevisiae, Bmh1p and Bmh2p, have cruciform DNA-binding activity and associate in vivo with ARS307 [30].
DNA bending is known to play a key role in the formation of nucleoprotein structures, as well as in the specific interaction of proteins with their DNA sites. Many studies have suggested that DNA bending is necessary for origin activity [33]. Such regions of DNA bending have been reported in yeast ORC, promoter regions of prokaryotes, GAL1-10 and GAL80 regulatory genes of yeast, regions of nucleosome formation, and recombination sites using electrophoretic and circular permutation analyses [34,35]. Both in silico and in vitro analyses of DNA bending pointed to the strong bending potential of some of the APS, like FRAII and Ste20-YF3, apart from the S. cerevisiae CEO ARS elements ARS315, ARS606, ARS1305, ARS1426, and ARS1512. The results of this analysis are in line with the studies of Hagerman [36], wherein it was proposed that direct repeats of the sequence GA4T4C gave rise to bending, while GT4A4C did not. Our observations are also consistent with the preliminary model for ARS elements given by Eckdahl and Anderson [37]. AT stretches have been associated with increased instability and gene amplification events in yeast [38].
Surprisingly, the plant APS (PO5 and PL5) could not efficiently trigger gene amplification. Minimal DNA bending tendencies, high plasmid loss indicative of low mitotic stability, detection of only the tetranucleotide "A" repeats, and the absence of TTTT repeats and SAR-related sequences may explain this behaviour. Even though DNA bending was not observed in the plant APS chosen for this study, bent DNA regions have been reported in association with mutant proteins of transcriptionally active OccR-octopine complexes in plant tumors [39], plant MADS-box proteins [40], and rRNA promoter upstream sequences in Arabidopsis thaliana [41].
To correlate the evolutionary trends of the ARS with their physical properties, a combined analysis of the eight parameters (i.e., free energy values (ΔG), REP values, origin efficiency, SIDD, flanking elements, colocalization of a possible fragile site, curvature, and GC content) of each of these chromosomal ARS was conducted (Supp I). This was compared with their closely placed members in the phylogenetic tree. A thorough comparison of the properties of the members of each cluster revealed interesting results. It was observed that, in addition to the closeness in the phylogenetic tree, ARS1316 and 1507 also share similarity with respect to their physical properties. They behave as confirmed origins of replication and have low curvature (9-10), low TREP, and low G+C values. ARS1620 which showed the longest branch length had high SIDD value, no flanking elements, and low curvature (approximately 7). These properties were similar to the properties of ARS701 which had the second longest branch length. ARS elements of chromosome IV were found to be located close to one another and approximately all the members showed similar properties; for example, ARS409, 432, and 418 showed replication efficiency, fragile site colocalization, high curvature, and ΔG and REP values (curvatures 14-16 were noteworthy). Similarly, ARS518, 503, 515, 519, and 513 showed very similar properties, while ARS of chromosomes IV and XII showed very similar properties. ARS308-1308 were more closely placed in the phylogram and this close relatedness was well reflected in their physical properties (high ΔG, low curvature, presence of LTR, and fragile sequences).

[Figure text (proposed mechanism of gene amplification): Regions of low helical stability caused by the AT-rich tracts facilitate the initial unwinding of the DNA molecule and are often found in the vicinity of replication origins. AT tracts serve as binding sites for nuclear matrix-associated proteins and DUE (DNA unwinding elements) at the site where DNA unwinding begins. DNA bending acts as a structural element to facilitate the necessary DNA-protein interactions. These bend sites also serve to facilitate initiator protein interactions or the association of the origin with the nuclear matrix. Such specific bindings at ORC promote repeated rounds of DNA replication leading to gene amplification.]
The combinatorial power of all the aspects of amplification promoting sequences analyzed in the present investigation was harnessed to reach a consensus on the factors which stimulate gene expression in the presence of these sequences. A thorough comparison of the properties of the members of each cluster revealed interesting results. Nearly all the closely placed members showed similar properties, for example, replication efficiency, fragile site colocalization, high curvature, and ΔG and REP values. It is worth mentioning here that the analysis of APS has shown that the replication enhancer sequences and SARs are also localized in their high curvature regions. The higher DNA bendability evident at the regions prone to gene amplification can hence be attributed partly to the presence of the tetranucleotide repeats of A and T, SAR-like sequences, and replication enhancer sequences in this region. The protein binding sites in the S. cerevisiae ARS may play a crucial role in the stimulation of origin activity. It has been reported that the OBF1 protein and its DNA-binding site are involved in regulating the activation of nuclear origins of replication in S. cerevisiae; that is, the OBF1 DNA-binding site is an enhancer of DNA replication [42].
The location of the SAR in Figure 11 is also supported by an earlier study which reported that the highly AT-rich region forms the 5′ boundary of the functional gene regulatory domains [25].
The collated results of the analysed features of the dissected S. cerevisiae ARS are represented diagrammatically (Figure 11). Based on the experiments carried out in the present study, a mechanism of gene amplification was hypothesized (Figure 12). In a nutshell, the proposed mechanism of gene amplification via the ARS (autonomously replicating sequences) of S. cerevisiae is that AT-rich tracts present in fragile sites of yeast serve as binding sites for MAR/SAR and DNA unwinding elements (Figure 12). The DNA-protein interactions necessary for ORC activation are facilitated by DNA bending. These specific bindings at ORC promote repeated rounds of DNA replication leading to gene amplification. The combinatorial power of all the aspects of amplification promoting sequences analyzed in the present investigation paved the way to a consensus on the factors which stimulate gene expression in the presence of these sequences.