Recent advances in biological instrumentation and associated experimental technologies now permit an unprecedented efficiency and scale for the acquisition of genomic data, at ever-decreasing costs. Further advances, with accompanying decreases in cost, are expected in the very near term. It now becomes appropriate to discuss the best uses of these technologies in the context of the angiosperms. This white paper proposes a complete genomic census of the approximately 500,000 species of flowering plants, outlines the goals of this census and their value, and provides a road map towards achieving these goals in a timely manner.
The angiosperms (flowering plants) are believed to comprise somewhere around 500,000 species [
The main objective is to establish vibrant and efficient communication among scientific experts in the disparate disciplines of taxonomy, systematics, cytometry, genomics, and bioinformatics, with the aim of planning a comprehensive molecular census of the angiosperms. This census is envisaged to start with measurement of nuclear genome sizes (
The rationale underlying a complete global molecular census is primarily and simply that it would provide an accounting of existing plant biodiversity generated by millions of years of evolution, using the most advanced and cost-effective means available. The angiosperms, or flowering plants, emerged approximately 140 million years ago, their sudden appearance and rapid diversification being described by Darwin as “an abominable mystery” [
As the primary autotrophs on land, angiosperms represent an invaluable resource of genetic and genomic information. It is therefore surprising, and somewhat alarming, how little of this information has been collected. In part, this has been due to cost. Historically, information of the type that we propose be gathered (cytological data and DNA sequences) has been obtained using expensive and delicate equipment with high operating costs.
At present, biases in genomic information for angiosperms reflect the interests of society: Dirzo and Raven [
The justification for a comprehensive molecular census of the angiosperms is easy to articulate. Firstly, because we lack molecular information for the great majority of plant species, any attempt to acquire this information will provide value. Secondly, it is clear that increasing pressures on the environment from the largely uncontrolled growth of human activity are increasingly resulting in loss of biodiversity [
Putting these two observations together, it is obvious we risk extinction of many (perhaps most) plant species
If we accept that it would be valuable to derive a comprehensive molecular census of the angiosperms, it is reasonable to move on to examine the value of each of the census components. Within the context of this article and this volume, these components are the
The
Many research groups have an ongoing interest in studying the processes responsible for genome structure variation in multicellular plants, and the influence of this variation on gene and chromosome function. It is clear that nuclear genomes in angiosperms are highly dynamic, particularly as reflected in the amplification of transposable elements (TEs), in the removal of selectively neutral DNA sequences, in the occurrence of polyploidy, in recombinational rearrangement by ectopic events, and the resolution of double-strand DNA breaks [
At the cellular and organismal levels, analysis of
Differences in
As of now, genome sizes have been measured and compiled for 8,959 species [
Most
The Accuri C6, costing around $35,000, has an extraordinarily large dynamic range of measurement. This dynamic range easily accommodates DNA content values spanning 0.32–80.00 pg in a single instrument run, and analysis of noise and upper limits indicates the C6 should be able to handle the largest and smallest angiosperm values yet reported [
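The dynamic range quoted above can be made concrete with a small calculation. The following is a minimal sketch, not a description of any instrument software: the function names `dynamic_range` and `fits_in_run` are hypothetical, and the only inputs taken from the text are the 0.32 pg and 80.00 pg limits, which correspond to a 250-fold span, or roughly 2.4 decades on a logarithmic scale.

```python
import math


def dynamic_range(min_pg: float, max_pg: float) -> tuple[float, float]:
    """Return (fold range, decades) for a span of DNA content values in pg."""
    if min_pg <= 0 or max_pg < min_pg:
        raise ValueError("expected 0 < min_pg <= max_pg")
    fold = max_pg / min_pg
    return fold, math.log10(fold)


def fits_in_run(values_pg: list[float], min_pg: float, max_pg: float) -> bool:
    """True if every DNA content value lies inside the measurable span."""
    return all(min_pg <= v <= max_pg for v in values_pg)


if __name__ == "__main__":
    # Limits quoted in the text for a single instrument run.
    fold, decades = dynamic_range(0.32, 80.00)
    print(f"fold range: {fold:.0f}, decades: {decades:.2f}")

    # Hypothetical DNA content values (pg), for illustration only.
    print(fits_in_run([0.5, 12.0, 79.9], 0.32, 80.00))
```

A check of this kind could be used when planning a plate of mixed samples, to flag values expected to fall outside the span that a single set of instrument settings can accommodate.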
Low-cost plate samplers are now available. A general criticism of much current
A follow-on is that part of these analyses would be more effectively done in the field, so that immediate results guide subsequent sampling. The Partec CyFlow miniPOC instrument is particularly notable in this respect. Although it was designed for CD4 and CD4% testing in developing countries, its technical specifications indicate that it is clearly adaptable for genome size measurements: it weighs less than 5 kg, operates on a 12-volt battery supply, costs less than $20,000, and, being equipped with USB ports, can be readily interfaced with wireless communication devices.
An additional impediment to large-scale
Based on these advances, it is now conceivable, in terms of affordability, methodology, and practicality, to consider extending
It is, finally, worth emphasizing quality control of
The angiosperms are unusual, compared to most eukaryotic clades, in that their genome sizes span an extraordinary range. The smallest reported angiosperm genome size, that of
In part, the first question is answered by the established occurrence and dynamic amplification and loss of repetitive DNA sequences within the genome (see, for example, [
Dramatic decreases in costs of sequencing have accompanied the emergence of new sequencing technologies [
Massively parallel sequencing has revolutionized molecular biology by making genomic sequencing possible for many more organisms than previously attainable. Low redundancy and shallow coverage genome survey sequences from this type of sequencing have the potential to rapidly provide large, cost-effective datasets for phylogenetic inference, to replace single gene or spacer regions as DNA barcodes, and to provide a plethora of data for other comparative molecular evolution studies [
Coupling low coverage (1–5
The angiosperms are also unusual in terms of the widespread occurrence of ancestral genome duplication events [
As indicated above, rapid advances in technology mean that our capability to acquire sequence data is in danger of outstripping our ability to select appropriate questions to be addressed. In effect, we will be in the position of the amateur photographer a few years back, encountering the replacement of film by digital technologies. Without the limitation imposed by the time and cost of buying and developing film, the art of photography has been fundamentally changed. Care in selection of subject, exposure, and so on, has been replaced by taking many more pictures in the expectation that at least some will be outstanding. The danger is the inevitable deluge of data, together with the lack of both software to adequately curate these data and storage to archive them. It will be critical to design appropriate experiments and strategies, to assemble teams, and to generate funding resources, such that a complete molecular census of the angiosperms can be achieved.
Performing a complete molecular census of the angiosperms will require coordinated planning. First and foremost, we need to establish a network of scientists who display the ability to work interactively and without egotism, who are also aware of the technical, technological, and infrastructural issues that relate to the collection of census data, and who are sensitive to the cultural, social, and geographical issues that necessarily accompany collection activities. Coordination will also lead to the identification of taxonomic and collection deficiencies that must be remedied for census implementation. This will impact the training programs recognized as important in the plant sciences, and is certainly anticipated to put much more emphasis on the “classical” training of plant systematists, morphologists, and taxonomists.
The scientists wishing to participate in coordination activities should have a common interest in genome structure, function, and evolution, and, since they use many different genetic, molecular, and computational strategies to investigate related questions, a greater level of interaction would promote synergistic activities. Analyzing genome sizes across all angiosperms would provide a solid foundation for initiating collaborations. For instance, if some lineage of plants were found to have unusually rapid rates of genome size change, this would provide an impetus to those investigating transposable elements to look for
As in all comprehensive research projects, the great majority of the samples will be simple to collect and analyze, but the last few samples are likely to be recalcitrant to analysis or (more likely) collection. One of the first tasks of the coordinating group will be to set initial priorities for sample analysis. Although these priorities cannot be established in advance of assembly of a coordinating group with broad disciplinary and international representation, it is likely that the first studies would be focused on full sampling across the angiosperm phylogeny. One can imagine that the secondary priorities will be for deeper sampling of specific families that have interesting properties discovered in the first broad sampling and/or families that have active research communities.
A first glance at genome size data can also identify lineages that have recently changed their ploidy, where 1
Finally, research coordination activities that include participants with years of field work experience in many countries would make use of current best practices and existing contacts for collaboration. The characterization, preservation, and utilization of angiosperm genetic diversity is a vital issue worldwide, so a full buy-in to this proposed approach is expected from the full international community of plant scientists.
In summary, the proposed coordination activities would aim to bring together a range of scientists with similar interests who have not previously worked together, but whose collective interests span multiple relevant, interconnected disciplines and technologies. We expect that the proposed collaboration will generate genome size data leading to myriad and diverse spinoffs, as the data will implicate particularly interesting lineages for further investigation, including cases where genome change appears to be rapid or in an unexpected direction. Once genome sizes are known broadly, future choices of genomes that deserve full shotgun (or complete, pending technological advances and funding) sequence analysis can be based on a much more complete knowledge foundation.
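The link between genome size and the choice of sequencing targets can be illustrated with a short calculation. This is a minimal sketch under stated assumptions, not a method proposed in this paper: it uses the commonly cited conversion of 1 pg of DNA to approximately 978 Mbp, the names `Candidate`, `bases_required`, and `rank_by_effort` are hypothetical, and the species entries are made-up placeholders rather than measured values.

```python
from dataclasses import dataclass

# Assumption: commonly used conversion factor, 1 pg of DNA ~ 978 Mbp.
MBP_PER_PG = 978.0


@dataclass
class Candidate:
    name: str          # placeholder label, not a real taxon
    c_value_pg: float  # 1C nuclear DNA content in picograms


def pg_to_mbp(pg: float) -> float:
    """Convert DNA content in picograms to megabase pairs."""
    return pg * MBP_PER_PG


def bases_required(c_value_pg: float, coverage: float) -> float:
    """Raw bases needed to reach a given average coverage of a 1C genome."""
    return pg_to_mbp(c_value_pg) * 1e6 * coverage


def rank_by_effort(candidates: list[Candidate], coverage: float) -> list[Candidate]:
    """Order candidates from least to most sequencing effort at a fixed coverage."""
    return sorted(candidates, key=lambda c: bases_required(c.c_value_pg, coverage))


if __name__ == "__main__":
    # Hypothetical genome sizes, for illustration only.
    pool = [
        Candidate("species_A", 12.5),
        Candidate("species_B", 0.4),
        Candidate("species_C", 3.1),
    ]
    for c in rank_by_effort(pool, coverage=5.0):
        gbp = bases_required(c.c_value_pg, 5.0) / 1e9
        print(f"{c.name}: {pg_to_mbp(c.c_value_pg):.0f} Mbp, {gbp:.1f} Gbp of reads")
```

Sequencing effort is only one possible criterion. A coordinating group would presumably weigh it alongside ploidy, phylogenetic position, and the availability of material.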
The census relates to crops at multiple direct levels. First, the data generated by grants that will be written or otherwise facilitated by the coordination and planning activities should uncover unusual and unexpected properties of angiosperm genome size, leading to discoveries regarding repeat structure and, ultimately, genome sequence. The identification of wild diploid relatives of major and minor crops will provide a road map for complete sequencing, and will focus the development of other molecular tools for those crops. The relevance of orphan crops, particularly in developing countries, should be mentioned. In general, these have not benefited from the application of modern and intense breeding programs, and therefore dramatic yield improvements may be feasible. Finally, through the many investigations stimulated by the interactions enabled through the census, we fully anticipate the discovery of novel genomic features and sequence types, whose evolutionary properties and mechanistic underpinnings presently are unknown but which might have direct relevance to crop improvement.
What type of research coordination might be considered? We propose the following.
This workshop would be designed to provide theoretical underpinning and practical training in census activities. Ideally, workshop participants would include graduate students and postdoctoral research associates, in order to provide them direct experience in technical methods, which are envisaged as being either based on wet laboratory or
An annual conference centered around census goals and activities would provide a natural environment to establish collaborative activities.
This would be set up to allow registration of network participants, to promote interactions among these participants, to archive methods and technologies, and to provide hyperlinks to resources, for example, to tools for resolving conflicts in the application of scientific names.
These would be between participating laboratories, to allow efficient transfer of information, particularly for cross-training students, postdoctoral research associates, and beginning faculty.
These would outline future research needs and directions, Grand Challenge questions, and would provide written support and resources for proposals to acquire the specific data types, starting with
The plant
This will be a critical part of any proposed census, and the search for funds for implementation of the collection, measurement, and archiving activities will be challenging. Review panels in national funding agencies tend not to favor projects that propose surveys without also addressing an underlying biological question. Part of this bias comes from the predominance of hypothesis testing as the core of the scientific method within biological research, and it contrasts dramatically with the situation in other observational sciences, such as astronomy, where large-scale collection of data in the absence of hypotheses is the norm [
Notable resources in terms both of infrastructure and research personnel already exist, and these would provide an excellent foundation for the proposed coordination activities.
Botanical gardens represent the historical locations for plant collections, both living and in archival vouchered forms. Some important examples are in the developed countries, including the Royal Botanic Gardens, Kew (RBG Kew), the Arnold Arboretum, the New York Botanical Garden, and the Missouri Botanical Garden. Others, in developing countries, provide access to notable biodiversity (for example, Xishuangbanna, China, and Bogor, Indonesia). RBG Kew has established a program of measurement of
A second type of collection is exemplified by seed storage programs at the Svalbard Global Seed Vault, at the USDA facility in Fort Collins, Colorado, and by RBG Kew at Wakehurst Place. The first and second are repositories of seed accessions primarily of crop species of importance to the world and to US agriculture, respectively. The third has a comprehensive mandate, having already banked dry seed of 10% of the world’s seed plant diversity since 2000, and aiming to reach 25% by 2020. For the latter two locations, a regular cycle of seed germination, to verify viability in storage over time, is an integral part of the program. The seedling samples generated in this way represent an obvious and available resource for genome size measurements and for genomic DNA extraction.
Global resources in terms of research personnel include cytologists active in genome size measurements, scientists involved in large-scale genome sequencing efforts, taxonomists, systematists, evolutionary biologists, and bioinformaticians. These interested and trained scientists are the most vital resource of all, and the proposed pan-angiosperm
A number of cytologists are active in flow cytometric genome measurement, particularly in Europe. Work by cytologists and collaborators has defined the means to efficiently obtain
The rate of production of genomic sequences is increasing nearly exponentially, driven by considerable innovation in instrumentation and sequencing technologies [
A number of collaborative mechanisms currently exist in the US that link personnel and activities within this group. For example, an NSF Research Coordination Network already exists on the topic of Microevolutionary Molecular and Organismic Research in Plant History (microMorph;
A key program currently driving collaboration in plant bioinformatics in the US is the iPlant Collaborative. Close interaction with this program and its scientists will be important for developing and using informatics tools to link international efforts in angiosperm genomics. iPlant has links with the 1KP project, and via that to NESCent, further extending the potential for collaborative links.
Overall, the proposed pan-angiosperm census should develop and strengthen communication, knowledge, and scientific training between individuals, and their laboratory members, who are acknowledged experts in one, or at most two, of the five research focus areas identified above. Understanding the gaps in information between experts in these areas will allow us to design workshops that effectively identify these gaps, and fill them. By serving as a bridge between different international groups that are addressing similar census goals, we should be able to facilitate the set-up and funding of formal activities that efficiently divide census tasks across geographical locations, and that successfully engage the support of the various governments. We envisage the process of writing white papers as an important aspect of our activities, since this should provide local funding applications with an international imprimatur signifying approval from the global scientific community. We emphasize that although a number of local activities are in place to address the goal of a molecular census, none are comprehensive and all would benefit from a coordination mechanism such as that suggested here.
We are currently at an unprecedented moment for the investigation of life on Earth. The scale of generation of DNA sequence information dwarfs any previous type of biological information gathering. The costs of these analyses continue to fall, and the power of tools for extracting biologically meaningful insights from these data continues to grow, at extraordinary rates. All plants harbor genetic novelty with potential agricultural, biomedical, environmental, and industrial value, and the small number of species with any molecular analysis at all indicates that only a tiny portion of this value has been identified. Still, with 500,000 species, full genome sequence analysis of all angiosperms is not on the near-term horizon. Priorities need to be set, with genome size and ploidy as key criteria. Moreover, genome size is itself an important biological feature that can help tell us how genomes evolve and function. A broad characterization of
We propose a coordinated process to set priorities and foster communications between laboratories worldwide that will pursue