DNA Barcoding for Minor Crops and Food Traceability

This outlook paper addresses the traceability of minor crops. These crops comprise a large number of locally distributed plants with modest production in terms of both cultivated acreage and quantity of final product. With globalization, minor crops are spreading because of their benefits for human health or their use as food supplements. This trend increases the risk of species substitution and uncontrolled admixture in manufactured plant products, with potentially severe consequences for consumer health. A reliable identification system is therefore essential to evaluate the quality and provenance of minor agricultural products. DNA-based techniques can help achieve this goal; in particular, DNA barcoding has gained primary importance thanks to its universality and versatility. Here, we present the advantages of using DNA barcoding for the characterization and traceability of minor crops, based on our previous and ongoing studies at the ZooPlantLab (Milan, Italy). We also discuss how DNA barcoding could be transferred from the laboratory to the food supply chain, from field to table.


DNA Barcoding for Plant Identification
Plants, as primary producers, have been the basis of human nutrition since time immemorial. It is estimated that about 7,000 plant species have been cultivated for consumption in human history (FAO data), and a large number of cultivars and varieties are also recognized. The Commission on Genetic Resources for Food and Agriculture (http://www.fao.org/nr/cgrfa/cthemes/plants/en/) estimated that 30 crops, currently referred to as major agricultural products, provide 95% of human food energy needs (e.g., rice, wheat, maize, and potato). These resources are widely monitored and well characterized with DNA markers developed specifically for each cultivar (see, e.g., [1][2][3]). By contrast, reliable characterization tools for minor varieties are far from being defined. Minor crops include plants grown for food, pharmaceutical, cosmetic, and ornamental purposes with modest production in terms of cultivated acreage and quantity of final product [4]. There are no fixed threshold values that define a minor crop; conventionally, however, all local varieties can be placed in this category. Most of these species or varieties show distinctive traits from an alimentary, pharmaceutical, or ornamental point of view. Examples of minor crops that are now widely cultivated and distributed worldwide include Goji (Lycium barbarum L. [5]), Chokeberry (Aronia melanocarpa (Michx.) [6]), Peach Palm (Bactris gasipaes Kunth [7]), Teff (Eragrostis tef (Zucc.) [8]), and Okra (Abelmoschus esculentus (L.) Moench [9]). Many minor crops have traditionally been produced and consumed locally [10], but the continuous demand from developed countries for new active metabolites for human health and nutrition has increased their diffusion at the global level [11][12][13][14]. This phenomenon increases the risk of species substitution or uncontrolled admixture in manufactured plant products. Substitution or adulteration can be deliberate (e.g., to maximize financial gain) or inadvertent (e.g., due to insufficient knowledge by farmers), but in either case it can have serious consequences for consumers [14][15][16][17][18][19].
Given these premises, defining a reliable traceability system is clearly a major concern when plants, plant parts, or plant extracts are used in the food industry. Unequivocal identification is also essential to establish quality assurance procedures for agricultural products, to authenticate their geographical provenance (e.g., for products with a protected designation of origin), and to prevent commercial fraud and adulteration.
Agricultural products undergo extensive processing and manufacturing before they are released to consumers. These processes alter plant structure, preventing the use of morphological characters to identify most agricultural products. To overcome this limitation, protein and/or DNA analysis is nowadays the main tool for plant traceability. However, although chemical and protein-based approaches are useful for characterizing the composition of fresh products, these methods can be biased by several factors, such as intensive food manufacturing, the limited number of detectable isozymes, or the high tissue and developmental-stage specificity of the markers [20]. DNA markers are more informative than protein- or chemical-based methods because DNA better withstands industrial processes such as shredding, boiling, pressure cooking, or treatment with chemical agents (see, e.g., [18,21,22]). This property allows successful identification of plant material even when it is present only in trace amounts [23,24]. Moreover, advanced technologies and efficient commercial DNA extraction kits make it possible to obtain an acceptable yield of genetic material from processed or degraded plant material [25].
As a consequence, DNA markers have rapidly become the most widely used tools in the genetic analysis of crops and cultivars, as well as in the tracking and certification of raw materials in food industry processes [26][27][28][29][30][31][32]. PCR-based methods are more sensitive and faster than other technologies for characterizing agricultural products [1][2][3]. Among these, discontinuous molecular markers such as RAPDs, AFLPs, and their variants (e.g., ISSR, SSAP) have been successfully adopted for the characterization of crop species [24]. Sequence-based systems such as single nucleotide polymorphisms (SNPs) and simple sequence repeats (SSRs) are also used because of their high level of polymorphism and reproducibility [30]. However, because these approaches are highly species-specific, they require prior knowledge of the target organism's DNA sequence, and their application is often limited to a single species.
In the last decade, DNA barcoding was proposed as a universal DNA-based tool for species identification [33]. The name "DNA barcoding" is a metaphor for the way a scanner uniquely identifies a product from the stripes of its Universal Product Code (UPC). Analogously, the approach is based on the variability within one or a few standard genomic regions called "DNA barcodes" [33]. The rationale of the method is that each barcode sequence corresponds uniquely to one species (i.e., low intraspecific variability) but differs substantially between taxa (i.e., high interspecific variability) [33,34]. DNA barcoding combines three important innovations: molecularization of identification (i.e., the use of DNA variability to differentiate taxa), standardization of the process (from sample collection to the analysis of molecular results), and computerization (i.e., the non-redundant transfer of data into digital databases) [34].
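The rationale of low intraspecific and high interspecific variability (often called the "barcoding gap") can be checked with simple arithmetic on aligned sequences. The following is a minimal sketch, assuming the sequences are already aligned to equal length and using the uncorrected p-distance (the fraction of differing sites); the species labels and toy sequences are invented for illustration and are not real barcode data. Published studies typically use model-corrected distances (e.g., Kimura 2-parameter) on real alignments.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: fraction of aligned sites that differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(samples: list[tuple[str, str]]) -> tuple[float, float]:
    """Return (largest intraspecific, smallest interspecific) p-distance.

    `samples` is a list of (species_label, aligned_sequence) pairs.
    If the first value is smaller than the second, every specimen is
    closer to all of its conspecifics than to any other species.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    return max(intra, default=0.0), min(inter)


# Hypothetical toy alignment: two specimens for each of two invented species.
samples = [
    ("species_A", "ACGTACGTAC"),
    ("species_A", "ACGTACGTAT"),
    ("species_B", "ACGAACCTGC"),
    ("species_B", "ACGAACCTGA"),
]
max_intra, min_inter = barcode_gap(samples)
print(max_intra, min_inter, max_intra < min_inter)  # 0.1 0.3 True
```

When the two values overlap, at least one specimen is as close to another species as to its own, which is exactly the situation in which a single barcode region fails.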
Several plastid and nuclear regions have been proposed as plant barcodes [35][36][37], and some of them are now used to identify crop species, as recently reviewed in [38]. In 2009, the Plant Working Group of the Consortium for the Barcode of Life (CBOL) defined a standard core-barcode panel based on the combination of portions of two coding plastid regions, matK and rbcL [39,40]. Despite their high universality in terms of amplification and sequencing success, these coding regions fail in some cases because closely related species share identical sequences [41]. The internal transcribed spacer regions of nuclear ribosomal DNA (ITS) were recommended as an additional marker because they are highly variable in angiosperms [40]. ITS works well in many plant groups but, in some cases, incomplete concerted evolution and intraindividual variation make it unsuitable as a universal plant barcode [40]. By contrast, combining matK and rbcL with the plastid noncoding intergenic region trnH-psbA improves the identification performance of DNA barcoding. As a consequence, trnH-psbA is increasingly used, owing to its easy amplification and high genetic variability among closely related taxa [15,35,42].
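Why adding trnH-psbA to the matK + rbcL core can raise identification success is easiest to see with a toy example: two species that share identical core-barcode haplotypes become distinguishable once a third, more variable locus is added to the multilocus profile. The sketch below assumes each locus has already been reduced to one haplotype label per species; the species names and haplotype labels are hypothetical.

```python
from collections import Counter


def resolved_species(profiles: dict[str, dict[str, str]], markers: list[str]) -> set[str]:
    """Species whose combined haplotypes at `markers` are shared with no other species."""
    combined = {sp: tuple(loci[m] for m in markers) for sp, loci in profiles.items()}
    counts = Counter(combined.values())
    return {sp for sp, key in combined.items() if counts[key] == 1}


# Hypothetical haplotype labels (not real data): species_X and species_Y share
# their rbcL and matK haplotypes but carry different trnH-psbA haplotypes.
profiles = {
    "species_X": {"rbcL": "r1", "matK": "m1", "trnH-psbA": "t1"},
    "species_Y": {"rbcL": "r1", "matK": "m1", "trnH-psbA": "t2"},
    "species_Z": {"rbcL": "r2", "matK": "m2", "trnH-psbA": "t3"},
}

core = resolved_species(profiles, ["rbcL", "matK"])
extended = resolved_species(profiles, ["rbcL", "matK", "trnH-psbA"])
print(sorted(core))      # ['species_Z']
print(sorted(extended))  # ['species_X', 'species_Y', 'species_Z']
```

The same uniqueness check applies below the species level, for instance to cultivars that carry exclusive haplotypes at one locus.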
At the University of Milano-Bicocca (Milan, Italy), the ZooPlantLab group (http://www.zooplantlab.btbs.unimib.it/) is one of the most active centers using DNA barcoding as a universal traceability system. The ZooPlantLab research team investigates concrete problems in the agricultural production of minor crops by transferring the analytical pipeline from the laboratory to the food supply chain. This approach aims to overcome technical traceability problems and to offer robust solutions to the market.
In the following sections, we present some potential applications and advantages of DNA barcoding for identifying and tracing minor crops along the food supply chain. We also examine the most innovative DNA barcoding approaches recently adopted to characterize these agricultural products.

Traceability of Minor Crops in the Supply Chain: The Case of Spices
Spices are a clear example of minor crops. Most of them belong to the Lamiaceae, a large family of 264 genera and almost 7,000 described species [78] characterized by aromatic oils and secondary metabolites. Thanks to their peculiar chemical profiles, these plants are commonly used as flavorings in cooking, essences in cosmetics, and active components in medicines. Given their economic importance, many members of the Lamiaceae have been widely investigated with approaches ranging from morphology to chemistry and genetics, in order to characterize their variability and improve the quality of cultivated varieties [25,26,79,80].
Although some species show distinctive morphological traits, this family includes many critical genera such as Thymus [43], in which differences among closely related taxa are limited to a few minor morphological characters. Moreover, morphology can be ineffective for tracing spices along the supply chain (i.e., from cultivation sites to final products), which usually involves intensive manufacturing processes such as crushing, powdering, or aqueous/alcoholic extraction of plant material.
International agencies such as the American Spice Trade Association (ASTA, http://www.astaspice.org) and the European Spice Association (ESA, http://www.esa-spices.org/) support characterization of the phytochemical profile to assess the quality of herbs and spices. Evaluating chemical characteristics is essential to standardize the industrial production of spice-derived products; however, in most cases, chemical analysis cannot unambiguously identify the source plants at the species level [26]. For this reason, we proposed DNA barcoding as a universal and suitable tool to characterize and trace aromatic species. DNA analyses were performed on different plant parts [22] and on derived products (e.g., oils, extracts) stored under different conditions (e.g., dried, frozen). In our study [22], we investigated six major groups of cooking spices (mint, basil, oregano, sage, thyme, and rosemary), including their most relevant cultivars and hybrids. We collected samples at different stages of the industrial supply chain, from seeds and plants grown by private farmers or sold in garden centers to commercial dried spices and other manufactured products. We also tested the performance of DNA barcoding on plant extracts. The extraction protocols yielded good amounts of high-quality DNA from all samples, which was then used in the subsequent steps of the analysis (i.e., PCR and sequencing). Sufficient DNA was also extracted from several plant extracts using commercial kits (Labra M., unpublished data). This first result indicated that the industrial processes used to transform raw plant material, such as drying, crushing, and aqueous or alcoholic extraction, do not excessively degrade DNA. Among the four DNA barcoding regions tested (rbcL, matK, trnH-psbA, and rpoB), trnH-psbA ranked first in interspecific genetic divergence, followed by matK and rbcL, whereas rpoB showed the lowest sequence divergence among the tested taxa (see [22] for further details).
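A ranking of markers by genetic divergence can be sketched by averaging pairwise interspecific distances for each marker. The following is a minimal illustration, assuming one pre-aligned sequence per species for each marker and uncorrected p-distances; the 10-bp sequences are invented, chosen only so that the output ordering mirrors the one reported above, and they are not the data of [22].

```python
from itertools import combinations
from statistics import mean


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites that differ between two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_interspecific(alignment: dict[str, str]) -> float:
    """Mean pairwise p-distance over all species pairs (one sequence per species)."""
    return mean(p_distance(a, b) for a, b in combinations(alignment.values(), 2))


# Hypothetical toy alignments for three invented species per marker.
alignments = {
    "rbcL":      {"sp1": "ATGTCACCAC", "sp2": "ATGTCACCAC", "sp3": "ATGTCGCCAC"},
    "matK":      {"sp1": "ATGGCTAAGC", "sp2": "ATGGCTAGGC", "sp3": "ATGACTAAGC"},
    "trnH-psbA": {"sp1": "ATTGACCTAA", "sp2": "ATCGTCCTGA", "sp3": "GTTGACGTAC"},
    "rpoB":      {"sp1": "ATGCTTCGGA", "sp2": "ATGCTTCGGA", "sp3": "ATGCTTCGGA"},
}

ranking = sorted(alignments, key=lambda m: mean_interspecific(alignments[m]), reverse=True)
for marker in ranking:
    print(f"{marker}: {mean_interspecific(alignments[marker]):.3f}")
# trnH-psbA: 0.400
# matK: 0.133
# rbcL: 0.067
# rpoB: 0.000
```

A real comparison would also weigh amplification and sequencing success, since a highly variable region is of little use if it cannot be recovered reliably from processed samples.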
Our results partially supported the CBOL guidelines [40]. The two core-barcode markers (matK + rbcL) correctly assigned the tested spices to the expected genus and, in most cases, also to the species. However, the highest identification performance was achieved by adding the trnH-psbA region. A clear example is basil (genus Ocimum), a group of 30-160 species with many recognized cultivars [81]. In our study, exclusive trnH-psbA haplotypes were found for almost all tested cultivars, providing a reliable system for their identification. This result deserves attention because it is one of the first pieces of evidence that DNA barcoding can discriminate organisms below the species level.
Our analyses also revealed the capability of DNA barcoding to identify parental and hybrid species in some members of the Lamiaceae. An example is peppermint (Mentha piperita L.), a sterile hybrid of M. aquatica L. × M. spicata L. [82,83]. The plastid markers used in this study confirmed that M. spicata L. is the maternal parent of M. piperita L.: plastids are generally inherited maternally in angiosperms, and both taxa showed the same plastid profile. However, to definitively confirm the hybrid origin of M. piperita L. and to identify the contribution of each parent, the codominant nuclear marker ITS2 was sequenced (Labra M., unpublished data).
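Sequencing a codominant nuclear marker such as ITS2 can support a hybrid hypothesis because, at sites where the parental sequences differ, an additive hybrid is expected to show both bases at once (a double peak in a Sanger chromatogram, usually recorded with an IUPAC ambiguity code). The sketch below builds that expected additive profile from two aligned parental sequences and compares it with an observed one. It assumes gap-free alignments containing only A, C, G, and T, and it ignores the incomplete concerted evolution mentioned earlier, which can erase additivity; the sequences are hypothetical and are not real Mentha data.

```python
# IUPAC ambiguity codes for each unordered pair of distinct nucleotides.
IUPAC_PAIRS = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def expected_additive(parent1: str, parent2: str) -> str:
    """Consensus expected for a hybrid carrying both parental copies of the locus."""
    if len(parent1) != len(parent2):
        raise ValueError("parental sequences must be aligned to the same length")
    return "".join(
        a if a == b else IUPAC_PAIRS[frozenset(a + b)]  # KeyError for non-ACGT bases
        for a, b in zip(parent1, parent2)
    )


def is_additive(observed: str, parent1: str, parent2: str) -> bool:
    """True if the observed consensus matches the expected additive pattern exactly."""
    return observed == expected_additive(parent1, parent2)


# Hypothetical aligned ITS2 fragments for two putative parents.
parent_a = "ACGTTGCA"
parent_b = "ACATTGCG"
print(expected_additive(parent_a, parent_b))        # ACRTTGCR
print(is_additive("ACRTTGCR", parent_a, parent_b))  # True
print(is_additive("ACGTTGCA", parent_a, parent_b))  # False (matches one parent only)
```

Plastid markers alone cannot make this distinction, because a maternally inherited locus carries only one parental copy.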
Overall, the most relevant result of our work was the demonstration that DNA barcoding is universal in the context of minor-crop traceability. Using a single primer pair for each of a few DNA barcoding markers and following standard laboratory protocols, it is possible to recognize the source species from different plant parts or from processed derived materials. The same approach is also useful for validating several other herbal products commonly sold on the market, such as tea [50], saffron [44,84], ginseng [69], black pepper [59], and many others (see also Table 1). These cases clearly highlight the versatility of DNA barcoding. It is a genuinely practical tool for the molecular traceability of agricultural products, since most minor crops have not yet been characterized with species-specific markers, such as SSRs or SNPs, that would allow a reliable DNA fingerprinting system. Moreover, DNA barcoding requires no prior knowledge of the genome of the investigated species, and its analytical procedures can easily be adopted by any laboratory equipped for molecular biology.

Commercial Frauds and Dangerous Substitutions
Nowadays, the global diffusion of several minor crops in the absence of suitable traceability protocols is leading to frequent cases of plant substitution and inadvertent or deliberate adulteration. Several commercial frauds have been documented in which minor crops were substituted with related taxa that have higher productivity or biomass but lack the agronomic and nutritional characteristics of the original species or cultivars [27,85,86] (see also Table 1). Striking cases were observed for some of the most common spices, such as Mediterranean oregano adulterated with Cistus incanus L. and Rubus caesius L. [87][88][89], and saffron substituted with Crocus vernus (L.) Hill, Carthamus, and Curcuma [19,44,84]. In this context, DNA barcoding can be decisive because it can not only verify the presence or absence of the original species but also identify the substituting species. One of the most striking substitution cases revealed by our investigations concerns fish meat (e.g., sold as slices, fillets, blocks, surimi, fish sticks, and fins). For this product category, manufacturing often removes every morphological feature that could identify the original species. In our molecular investigation [90], we documented frequent substitution of Palombo (the Italian vernacular name for Mustelus mustelus and Mustelus asterias) with other, less valuable shark species. About 80% of the screened fish products did not correspond to these two species but to other species or genera, some of which are fished or marketed illegally.

Table 1 (excerpt): Molecular identification of minor crops in complex matrices
Natural health products: identification of pharmaceutical plants in commercial products [69]
Juices and plant-based beverages: juice authentication [70][71][72]
Honey: identification of pollen and plant residuals [73]
Jams or yogurt: identification of fruit in commercial products [74,75]
Food supplements: identification of allergenic plants [76,77]

Building on this experience, we tested the usefulness of DNA barcoding for evaluating contamination in plant-based products. For example, in a pilot study on spices, our group detected contaminant DNA in commercial samples of sage (Salvia) produced by local farmers. This DNA belonged to species of the family Poaceae (Festuca sp.). We hypothesized that these contaminant plants grew accidentally alongside the sage and that fragments of them were collected, shredded, and admixed into the final commercial products (Labra M., unpublished data). Such situations are dangerous if the contaminant taxon is toxic or allergenic for humans. A typical example is nuts and almonds, which cause allergies in many people [91]. Several commercial foodstuffs (e.g., bakery products, pastries, and snacks) have shown contamination by these plants (see, e.g., [76,92]). Here too, DNA barcoding is a very versatile tool, allowing the detection of both (and of many other allergenic taxa) even when they are present only in traces [76].
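Once barcode sequences recovered from a mixed sample (for instance, clones or high-throughput reads) have been assigned to taxa, detecting a contaminant such as Festuca in a sage product reduces to tallying the assignments and flagging taxa that are not declared on the label. The sketch below assumes the taxonomic assignment has already been done; the counts, the placeholder name `taxon_X`, and the `min_fraction` cut-off (used here to ignore isolated spurious assignments) are all hypothetical, and sequence proportions are at best a rough proxy for the actual mass fraction of each ingredient.

```python
from collections import Counter


def composition(assignments: list[str]) -> dict[str, float]:
    """Relative frequency of each taxon among the classified sequences."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def undeclared_taxa(assignments: list[str], declared: set[str],
                    min_fraction: float = 0.01) -> dict[str, float]:
    """Taxa absent from the label whose share reaches `min_fraction`."""
    return {taxon: share for taxon, share in composition(assignments).items()
            if taxon not in declared and share >= min_fraction}


# Hypothetical classified sequences from a commercial "sage" sample.
assignments = ["Salvia"] * 95 + ["Festuca"] * 4 + ["taxon_X"] * 1
print(undeclared_taxa(assignments, declared={"Salvia"}, min_fraction=0.02))
# {'Festuca': 0.04}
```

Lowering the cut-off increases sensitivity to genuine traces, such as allergenic taxa, at the cost of reporting more spurious assignments.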
Similarly, DNA barcoding can efficiently identify the plant species responsible for intoxication or poisoning. In recent years, plant exposures have been among the most frequent poisoning cases reported by poison control centers [15,93,94]. Many of these are due to inadvertent misidentification, as reported in [95], where the authors documented the confusion of wild salad (Lactuca alpina (L.) Wallr.) with Aconitum spp. and of wild garlic (Allium ursinum L.) with Colchicum sp. Both Aconitum and Colchicum contain toxic metabolites with severe consequences for human health when ingested [96,97]. Our analysis showed that DNA barcoding allowed us to detect poisonous plants and to identify specific sequence-characterized amplified regions (SCARs) useful in a real-time PCR approach for rapid diagnosis in poison centers [60].
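Identifying a suspect plant from its barcode amounts to finding the closest reference sequence and accepting the match only if it is close enough. The sketch below is a simplified stand-in for the alignment-based searches used in practice (e.g., BLAST against BOLD or GenBank): it assumes the query and references are already aligned to the same length, uses uncorrected p-distance, and applies an arbitrary 5% threshold. The reference sequences are invented and do not correspond to real Allium or Colchicum barcodes.

```python
from typing import Optional


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned sites that differ between two sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def identify(query: str, references: dict[str, str],
             max_distance: float = 0.05) -> Optional[str]:
    """Return the closest reference taxon, or None if even the best match is too distant."""
    best_taxon, best_d = None, float("inf")
    for taxon, ref in references.items():
        d = p_distance(query, ref)
        if d < best_d:
            best_taxon, best_d = taxon, d
    return best_taxon if best_d <= max_distance else None


# Hypothetical 20-bp reference library (invented sequences, not real barcodes).
references = {
    "Allium ursinum": "ATGCTAGCTAGGCTTACGAT",
    "Colchicum sp.":  "ATGCAAGCTTGGCATACGTT",
}
print(identify("ATGCAAGCTTGGCATACGAT", references))  # Colchicum sp. (1 mismatch in 20)
print(identify("TTTTTTTTTTTTTTTTTTTT", references))  # None (no reference within 5%)
```

Returning None rather than the nearest taxon matters in a poisoning context: a query with no close reference should be reported as unidentified, not forced onto the wrong species.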

Plant Molecular Identification in Complex Matrices
Most food and cosmetic products are made of a pool of plant species, including major and minor crops and wild species. These are considered complex matrices [31], and establishing their traceability requires universal tools able to identify each plant species unambiguously. Because DNA barcoding regions and primers are designed to be universal [33], PCR amplification of a complex matrix produces several DNA barcoding amplicons corresponding to different species. For this reason, we tested this diagnostic method for identifying the plant composition of different mixed products, such as commercial potpourris [14] and multifloral honeys (Bruni et al., submitted). For most of these herbal products, the label does not report a detailed list of ingredients; consequently, it is difficult to know which species were used in their preparation and, above all, how safe they are for human health. In the case of potpourris, our results showed that the principal ingredients are simple aromatic plants [99,100,102,103] that are sometimes edible (e.g., Salvia officinalis L.; Ocimum basilicum L.) or ornamental (e.g., Salvia splendens Sellow ex J.A. Schultes, Lavandula angustifolia Miller) without negative effects on human health. In other cases, these products contained plants that produce natural toxic metabolites, such as alkaloids, that are dangerous for human health [14,98,99,100]. However, the main critical element for identifying plant-based complex matrices is the availability of DNA barcoding reference databases [101,102]. To date, the Barcode of Life Data System (BOLD, http://www.boldsystems.org/ [103]) contains 52,767 plant DNA sequences, although several minor crops and local varieties are missing. Recent works by our laboratory and other groups highlighted the need for dedicated reference archives of DNA barcoding data for these kinds of plants [31,67,101,102,104,105]. In another study, we demonstrated that, starting from a robust local database, it is possible to characterize the pollen composition of multifloral honey, one of the most complex food matrices. Our tests, conducted on honey samples produced in the Italian Alps, showed a conspicuous presence of endemic taxa. This result allowed us to assess not only the composition of the honeys but also their geographical origin (Bruni et al., submitted). See also Table 1 for further examples.
Compared with agricultural products made from a single plant, the molecular characterization of complex matrices requires some technical advances, especially in the sequencing step. The traditional Sanger sequencing method [106] can be used for direct sequencing only when the amplicons derive from a single taxon. Complex matrices often contain DNA from many individuals belonging to a broad taxonomic group (e.g., angiosperms), and amplification of a given locus (e.g., a plant DNA barcode region) may generate amplicons of the same size from different taxa, which prevents direct Sanger sequencing. A possible solution is a preliminary cloning step to separate individual DNA templates, but this strategy has its own limitations (e.g., high costs) and can introduce biases (e.g., low representation of the sequenced colonies for highly complex matrices [107,108]). Recovering DNA sequences from the tens to thousands of specimens present in a complex food matrix requires reading DNA from multiple templates in parallel. Since 2005, advances in next-generation sequencing (NGS) technologies [109] have helped address this issue at ever-lower costs. Several high-throughput sequencing platforms based on different chemistries and detection techniques have since been commercialized [108]. NGS technologies can generate up to tens of millions of sequencing reads in parallel, and these approaches are being used in a variety of applications, including the traceability of food matrices containing agricultural products [73,74,110].
In conclusion, given the rapid evolution and standardization of NGS technologies, we think that combining them with a universal approach such as DNA barcoding can offer new opportunities for the traceability of minor crops from field to table.

Table 1: List of studies dealing with DNA barcoding identification of minor crops.