Phylogenetic Determination of Chenopodium quinoa Based on the Chloroplast Genes rbcL and matK

,


Introduction
Te cultivation of quinoa (Chenopodium quinoa Willd.) is vital to feeding Andean communities; however, cultivation has increased notably in regions of North America, Europe, Africa, and Asia because of its great adaptability to diferent environmental conditions [1].Additionally, the quinoa grain has a high nutritional value and contains all essential amino acids, which is why it is considered a high-quality protein food global production system [2].Studies on the morphoagronomic characteristics of quinoa varieties in countries such as Colombia, Peru, and Bolivia have shown high phenotypic variability [1,[3][4][5].However, the phenotypic characterization of quinoa varieties has limitations because information is reduced, and the expression of quantitative traits is subject to strong environmental infuences [6].Morphological parameters, such as plant height, shape and size of the leaf or stem, fower characteristics, inforescences, and grain, are not an identifcation method that diferentiates Chenopodium species or varieties since most parameters vary with age, stage of development, and environmental conditions.
An alternative for the identifcation of plant species and varieties is the use of molecular markers or DNA barcodes that provide more information on inter-and intraspecifc variation of the species [7,8].Many studies have shown that DNA barcoding is an efective tool for plant identifcation [9][10][11][12].Likewise, given the high conservation of its sequence, compact size, lack of recombination, and maternal inheritance, the chloroplast genome has been used for the generation of genetic markers and for phylogenetic classifcation [13], genetic divergence [14], and DNA barcoding systems for molecular identifcation [15].Commonly studied chloroplast coding genes include atpF-atpH, matK, rbcL, rpoB, rpoC1, psbK-psbL, and trnH-psbA since they meet the requirements for use as DNA barcodes in plants [16].Hollingworth et al. (2009) recommended the combination of two loci, ribulose-1,5-bisphosphate carboxylase (rbcL) and maturase (matK), as a plant barcode based on assessments of recoverability, sequence quality, and levels [17].For species discrimination, this two-locus barcoding can identify species and contribute to the discovery of new species [18,19].
Studies of these two genes in Chenopodium showed the paraphyletic origin of this genus [6].An accurate assessment of the magnitude and organization pattern of the genetic diversity of individuals or conserved material of Colombian quinoa is essential to understanding the genetic structure and reliably identifying each genotype, elucidating the variability of quinoa cultivars [20], which will allow adequate use of the potential of varieties along with their conservation.Terefore, understanding the phylogenetic relationships of Colombian quinoas could be an important tool in the search for unique adaptive production, quality, and resistance characteristics for breeding programs that seek to further explore the genetic background of this species [13,21].Te objective of this study was to understand the phylogenetic relationships between Colombian quinoa with matK and rcbL markers to establish strategies in breeding and conservation programs that lead to more efcient selection processes and the identifcation of quinoa germplasm of agronomic, industrial, or pharmaceutical interest.

Materials and Methods
2.1.Plant Material.Tirty-two accessions from the Quinoa seed collection of the Biotecnología Vegetal de la Gobernación de Boyacá were evaluated (Table 1), of which nine seedlings were planted with a randomized complete block design (RCBD) under greenhouse conditions in the city of Tunja, located at an altitude of 2690 meters above sea level, with an average temperature of 13 °C, relative humidity of 78%, and 12 : 12 photoperiod.Of the nine plants planted per accession, a random sample was taken for sequencing the chloroplastic genes.

DNA Extraction and PCR Amplifcation of Specifc
Regions of the Chloroplast.DNA was isolated from young leaves of each of the quinoa accessions using the modifed method of Dellaporta et al. [22], visualized on 0.8% agarose gels in a Maxicell Primo EC-340 electrophoresis chamber.Te concentration was determined by spectrophotometry in a Biotek EPOCH 2 device, followed by dilution using HPLC water to a total volume of 100 µl at 10 ng/µl, stored at −20 °C.For the PCR amplifcation of the rbcL and matK regions of the chloroplast, the primers (5′-ATTATACTCCTGAGTATGA-3′)-jrR (5′-ACTCCATTTGCTAGCTTC-3′) and matKF (5′-CTATATCCACTTAGTATT-TTCAGGAGG′)-matKR (5′-AAAGTTCTAGCACAAGAAAGTCGA-3′) were used, respectively, according to [6].Te amplifcation cycles were initial temperature at 95 °C for 1 min, followed by 35 cycles of denaturation at 94 °C for 30 seconds.Te annealing temperature was 54 °C for both primers, and the extension was at 72 °C for 1.5 min.Te amplifed products were separated on 1.2% agarose gels at 180 volts for 1.5 hours in a Maxicell Primo EC-340 electrophoresis gel system chamber and stained with ethidium bromide.

Sequencing and Sequence
Analysis.Amplicon sequencing was carried out with capillary electrophoresis on an ABI PRISM 3500 XL sequencer.Te editing, assembly, and alignment of the sequences were done with BioEdit v.7.0.9.0 and MEGA-X [23].All sequence data were uploaded to the National Center for Biotechnology Information (NCBI), and the access codes of the 62 sequences are OQ148485-OQ148548 (https://www.ncbi.nlm.nih.gov/popset/?term=OQ148485).Sequences were individually subjected to BLAST searches for similarity with sequences in GenBank databases and were further evaluated on the Barcode of Life Data System v4 (BOLD) platform.Multiple alignments of the sequences were produced with MUSCLE and CLUSTAL using MEGA-X to identify SNPs in the aligned sequences.For the general characterization of the sequences, the conserved, variable, informative parsimony, and singleton sites were evaluated, the nucleotide composition of the sequences by accession was estimated, and the pattern and nucleotide substitution rates of the matK and rbcL regions were determined.To evaluate the evolutionary distances, the model with the best ft was determined, and three phylogenetic trees were generated with the selected model, the frst using the matK region, the second using the rbcL region, and the third concatenating the two matK and rbcL regions with the neighbor-joining tree algorithm, the Jukes-Cantor sequence distance estimation model, and the bootstrap of 1000 replicates.

Results
Te rbcL and matK chloroplast regions of 32 Chenopodium accessions from central-eastern Colombia were amplifed to generate molecular profles that discriminated between C. quinoa and other Chenopodium.Te nucleotide sequences of the amplicons were deposited in the NCBI GenBank and can be accessed under the assigned accession numbers (Table 1).Multiple alignments of the nucleotide sequences representing the rbcL and matK regions of the chloroplast identifed the size of the chloroplast matK gene at ∼650 bp and the size of the rbcL gene at ∼1350 bp.Te 32 matK gene sequences showed that each sequence had an average of 646 bp, with 56.96% variability, with a total of 260 conserved sites, 77 informative sites by parsimony, and a transition/transversion ratio of 0.69.Te rbcL gene sequences, despite the fact that they presented a low percentage of variability of 32.88%, had a higher PIS (parsimony-informative site) than matK with 85 sites and a transition/transversion ratio of 0.82 (Table 2).A total of 434 amino acids were translated from the rbcL gene sequences, while a total of 215 amino acids were translated with matK.
Te accessions for the matK gene showed varied compositions, and the Q19-5 sample had diferent compositions than the other accessions with 29.7% thymine (T), 15.7% cytosine (C), 36.8% adenine (A), and 17.8% guanine (G).When the sequences obtained from the rbcL gene were compared, the composition of the nucleotides was similar, with percentages close to the average for the genes.Only accession Q9-2 had a sequence size of 518 bp, which represented approximately half the base pairs of the average sequence.Te %GC was higher in rbcL than in matK, with 42.3%.
To fnd the best model of the evolutionary distances and phylogeny based on the two genes, the models with lower BIC (Bayesian information criterion) scores were considered the best description of the nucleotide substitution pattern during evolution, according to [22].Te analyses of the 32 According to the Tamura 3-parameter model and matK gene sequences, the nucleotide substitution rate ranged from 0.05 (AC, TG, CG, and GC) to 0.14 (CT and GA).For the rbcL gene, the substitution rates ranged between 0.06 (AC, TG, CG, and GC) and 0.14 (CT and GA), and, for the matK and rbcL concatenated genes, it was between 0.06 (AC, TG, CG, and GC) and 0.13.(CT and GA) (Figure 1).
Te nucleotide substitution pattern was evaluated at complete codon positions (1st + 2nd + 3rd nucleotide) and according to the best-ft model of the matK and rbcL sequences, which is shown in Table 3. Te transitional substitution was greater than the transversional substitution in the matK and rbcL regions.However, the matK region exhibited a higher substitution rate from C to T. In contrast, the frequency of change from G to A, T to C, and A to G in the rbcL region was higher than that of matK.
Te concatenated sequences of matK and rbcL were analyzed using BLAST to search for homology with the sequences reported in GenBank, where 96% of the sequences presented 99% identity with the species Chenopodium quinoa Willd.except for accession Q9-2, which presented 98.37% identity with Chenopodium desiccatum; however, this same accession presented 98.37% similarity with Chenopodium atrovirens on the BOLD platform.Te BOLD results were diferent in three accessions, and the species returned from this database did not match those from BLAST.Te Q8-3 accession showed 99.57% similarity with Chenopodium suecicum in BOLD, while, in BLAST, there was 99.83% identity with Chenopodium quinoa Willd.Accession Q23-3 presented 96.16% similarity with Chenopodium desiccatum in BOLD (Table 4).
Te results of the sequence analyses were consistent with the phylogenetic reconstruction from the matK and rbcL sequences, which discriminated accessions based on the number of base substitutions per site, as evidenced by the tree topologies of the sequences obtained with the neighborjoining method (Figures 2(a) and 2(b)), which showed that some accessions had high genetic variability and a low index of genetic divergence.
Te phylogenetic trees obtained with the sequences of the two genes analyzed independently showed similar groupings in some accessions in the two trees; for example, the grouping for accessions Q30-8 and Q4-6 did not correspond to the place of origin.
Finally, the phylogenetic tree obtained by concatenating the matK and rbcL regions conformed to the tree built by the matK segment; that is, there was little variation in the conformation of the groups, except for slight changes in the branch supports (Figure 3).Comparing these results with those obtained showed similarity and coherence in the distribution of the groups.

Discussion
Genetic variation is important for the survival of plants since it leads to adaptation and evolution [23].Likewise, studies on genetic diversity among cultivated plant species are essential for conservation and improvement programs [24].Tis study, where 32 Chenopodium accessions were analyzed by evaluating matK and rbcL genes, showed that these regions contain higher amounts of adenine + thymine (A + T) than guanine + cytosine (G + C) and that nucleotide substitution rates were high, as were genetic distances, demonstrating that the use of chloroplast molecular markers plays an important role in elucidating evolutionary relationships and identifying species, such as quinoa [25].
Terefore, it is essential to understand the evolutionary causes of nucleotide changes and their patterns of variation between species since this generates genetic divergence [26].In this study, the matK region exhibited a higher nucleotide substitution rate than the rbcL region.Ho et al. showed that the rbcL gene has fewer variations than other chloroplast genes [11].In addition, this gene presents more universality in the primers than in the matK gene.Te latter has combinations of several primers to achieve amplifcation of the region [27,28].However, the universality of primers is an important factor in the identifcation of unknown species because primers are more efective if they can be used to amplify diferent types of plants [29].In this study, the sequencing of both matK and rbcL genes showed efectiveness in the identifcation of species.
Te analysis of the nucleotide composition of the sequences showed that matK was the region with the greatest variability or polymorphic sites with respect to the rbcL gene.Te latter region evolves slowly and is, therefore, used more frequently in phylogenetic analyses [30].In addition, the rbcL gene is a more universal region and is less efcient in separating closely related taxa [16], although some studies on angiosperm species have shown that this region presents enough variation to distinguish species [31].Meanwhile, matK, which presents a faster evolution, is widely used for phylogenetic analyses.In this study, the analysis of the two regions and the analysis of the translation revealed that the nucleotide composition was similar to Chenopodium quinoa.
Searching the BOLD and GenBank databases of the concatenated matK and rbcL genes showed 98 to 100% agreement with Chenopodium quinoa sequences; however, some sequences coincided with C. desiccatum, C. atrovirens, and C. suecicum, and this coincides with what was obtained by [32] using Bayesian and maximum parsimony analysis of the noncoding trnL-F (cpDNA) and nuclear ITS regions, where tetraploid species of the genus Chenopodium such as C. berlandieri subsp.nuttalliae and C. quinoa belong to Chenopodium sensu stricto.Terefore, the DNA barcoding used in this study was able to distinguish intraspecifc and interspecifc divergences because, among other things, of the fact that both regions have high amplifcation and sequencing success rates [33].
Studies on C. murale obtained similar results using the matK and rbcL genes, showing 100% match with C. murale using BLAST, while, in BOLD, the rbcL gene showed a high 4 International Journal of Agronomy degree of similarity with several taxa, including C. ambrosioides, C. album, and C. fcifolium, ranging from 96.3 to 100% [11].Research by Huang et al. [34] showed that the combination of the rbcL and matK genes separated 40% of the sampled species in the combined data set for the family Apiaceae.Terefore, these two regions, matK and rbcL, have been shown to be ideal for species identifcation.Quinoa is currently gaining international attention given the productive and nutritional potential of its seeds, which has caused a signifcant increase in planting areas.Because it is characterized by its primary and secondary germplasm, the authors of [14] reported the complete mitochondrial and chloroplast genome sequences of quinoa accession PI614886 and the identifcation of sequence variants in additional quinoa accessions and related species.Tis was the frst reported mitochondrial genome assembly in the genus Chenopodium.Te inference of phylogenetic relationships between Chenopodium species based on mitochondrial and chloroplast variants supports the hypothesis that the ancestor of the A genome was the cytoplasmic donor in the original tetraploidization event and that highland and coastal quinoas were generated independently, producing results such as those found in this study.
Ibrahim et al. 2019 characterized seven quinoa genotypes with diferent provenances and identifed genetic polymorphisms and unique markers in each genotype with variation at the seed color level using ISSR, SCoT markers, and two chloroplastid markers (rbcL and rpoC1) [15].Sequence alignment revealed that rbcL recovered from the quinoa genotypes with high similarity to other rbcL genes obtained from Chenopodium species in other studies, with similarities ranging between 76 and 80%.Furthermore, the rbcL gene showed highly consistent genetic similarity with low genetic evolution and mutation.On the other hand, the sequence alignment revealed that the rpoC1 obtained from the quinoa genotypes had high similarity with other rpoC1 genes obtained from other plant species in other studies.Te  International Journal of Agronomy  6 International Journal of Agronomy similarities ranged from 80% to 81%.Te evolutionary divergence in Chenopodium using rbcL gene sequences for 19 accessions found 0.68% diversity in interspecifc sequences [6].Studies on the Chenopodium genus have shown the ability of a single barcoding gene (rbcL) or a combination of these genes to distinguish individuals of the Chenopodium genus [15,35].In addition, phylogeny factors and the identifcation of associated gene families are essential tools for understanding the evolution, mutation, and genetic factors associated with the main characteristics of the Chenopodium genus [25].
For the conservation or genetic improvement of plants, phylogenetic studies represent the basis for inferring the evolutionary history of the species, their delimitation, genetic diferentiation, and gene fow [36].Tis information provides the necessary tools for conservation and input to predict the genetic diversity of species in breeding programs [37].Te reconstruction of the phylogenetic trees that took into account the matK region, rbcL region, and both concatenated regions in this study suggested that most of the quinoa accessions are closely related, which implies a close relationship that is associated with the reproductive system of the species and its coevolution processes in the Andean productive systems.Terefore, our results support future quinoa breeding and genetic conservation programs.

Conclusions
Te present study demonstrated the efcacy of the matK and rbcL genes as reliable markers to support Chenopodium quinoa conservation and improvement programs, locally, nationally, and internationally.Although the results of the rbcL gene were less promising, the use of both genetic markers for the molecular identifcation of the Chenopodium genus is recommended as an efective, modern, and widely used method.
Te phylogenetic trees obtained from the matK and rbcL gene analyses provided strong evidence of phylogenetic relationships between the evaluated accessions, where similar clades were established between the phylogenetic trees obtained with the matK gene and the tree where both matK and rbcL genes were concatenated.Tis fnding implies that genetic divergence in quinoa is phylogenetically informative at the species level.In addition, the low rates of genetic diferentiation may be due to the lack of certifed seeds and the constant exchange of seeds between farmers in the principal producing areas of the Andean region.Terefore, DNA barcoding in Chenopodium quinoa is an appropriate tool for establishing the correct identifcation of this species, although more phylogenetic studies involving more species from the Chenopodium genus are needed to fully understand the phylogeny of quinoa.Te composition and sequence data provide a database for other studies on quinoa diversity.International Journal of Agronomy

Figure 1 :
Figure1: Nucleotide substitution rates (matK, rbcL, and matK + rbcL).Te X-axis shows the substitution of one nucleotide for another during evolution.For example, AT implies a mutation in which adenine was replaced by thymine, and AC adenine was replaced by cytosine.Te Y-axis shows the rate of substitution in the analyzed sequences.

Figure 2 :
Figure 2: Phylogenetic tree based on the sequences of 32 quinoa accessions by neighbor-joining method.(a) Sequences of matK.(b) Sequences of rbcL.

Figure 3 :
Figure 3: Phylogenetic tree based on the combination of matK and rbcL sequences of 32 quinoa accessions by the neighbor-joining method.

Table 1 :
Chenopodium accessions are molecularly characterized by the matK and rbcL genes and GenBank access number.

Table 2 :
Sequence statistics and nucleotide pair frequency analysis of sequences representing chloroplast rbcL, and matK loci of accessions of Chenopodium investigated in the present study.sequences showed that the best evolutionary model for the matK gene sequences was 3-parameter Tamura (T92 + G + I), with 65 parameters and the lowest BIC value, 5973.62.For the rbcL gene, the best-ft model was a 3-parameter Tamura (T92), with 63 parameters and a BIC value of 7875.96.For the concatenated matK and rbcL sequences, the 3-parameter Tamura (T92 + G + I) model had 65 parameters and a BIC value of 25590.22. nucleotide

Table 3 :
Pattern of nucleotide substitution of matK and rbcL regions (in percentage).

Table 4 :
Searching results of matK and rbcL genes on GenBank and BOLD databases.
*Bold values indicate that there is no match between GenBank and BOLD databases.