Source: CORNELL UNIVERSITY submitted to
CONSERVATION, MANAGEMENT, ENHANCEMENT AND UTILIZATION OF PLANT GENETIC RESOURCES
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
REVISED
Funding Source
Reporting Frequency
Annual
Accession No.
1014310
Grant No.
(N/A)
Project No.
NYC-149859
Proposal No.
(N/A)
Multistate No.
NC-_old7
Program Code
(N/A)
Project Start Date
Oct 4, 2017
Project End Date
Sep 30, 2022
Grant Year
(N/A)
Project Director
Doyle, JE, J.
Recipient Organization
CORNELL UNIVERSITY
(N/A)
ITHACA,NY 14853
Performing Department
PLANT BREEDING AND GENETICS
Non Technical Summary
This project aims to characterize genetic variation and species relationships in perennial wild relatives of soybean. This group of 30 or more species has levels of genetic variation considerably higher than soybean or its annual progenitor. Perennial species harbor resistance to numerous plant pathogens and are adapted to diverse climatic niches. We seek a better understanding of this largely untapped resource for soybean. The principal source of data will be molecular markers sampled across the entire genome.
Animal Health Component
0%
Research Effort Categories
Basic
80%
Applied
20%
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
202                    1820                              1080                      100%
Knowledge Area
202 - Plant Genetic Resources;

Subject Of Investigation
1820 - Soybean;

Field Of Science
1080 - Genetics;
Goals / Objectives
Collect and maintain plant genetic resources of dedicated crops and their crop wild relatives, evaluate and enhance this germplasm. Characterize plant germplasm using a combination of molecular and traditional techniques and utilize modern plant genetic techniques to help manage plant germplasm.
Project Methods
We have used genotyping by sequencing (GBS) to generate genome-wide single nucleotide polymorphism (SNP) data from 20 species and 115 accessions. These data greatly expand our understanding of relationships within and among species of the perennial wild relatives of soybean, which had previously been based on chloroplast, nrDNA, and single-copy nuclear gene datasets for only a handful of accessions per species. We have sampled as many as half of the putative species but have not sampled enough accessions to cover their geographic ranges. We plan to complete genotyping of at least 800 accessions with current funding but realize that there may be additional taxa of interest or a need to repeat some of the previous work. The CSIRO germplasm collection includes over 2000 accessions of perennial Glycine, and the USDA maintains just over 1000 accessions.
We will analyze the SNP data at several levels. First, we want to adequately circumscribe species so that future projects do not confuse variation among species with variation within species, at either the genetic or the phenotypic level. We will continue to use Bayes Factor Delimitation to test hypotheses of species relationships. We have successfully determined that the dysploid (2n=38) species referred to as G. tomentella D1 and/or D2 are a single variable species, G. "varia" (in prep.). Other questions remain with regard to species delimitation. For example, there are at least ten additional species to be delimited within the G. tomentella species complex and a minimum of three additional species within the G. tabacina diploid species complex.
Population Structure: Because these are predominantly selfing species, variation among populations can be quite high while genetic variation within populations is low. In our preliminary dataset, G. syndetika, a recently described species known from only ten accessions, has deep population substructure that was not apparent in previous studies with small sample sizes. For this reason we plan to sample as many accessions as possible across the geographic range of each species.
Some of the allopolyploid species have expanded their range outside of the Australian continent, beyond the range of their diploid progenitors. We are interested in examining climate niche data for these species and how niche relates to their genetic diversity, comparing accessions within the diploid ranges with those outside of the current progenitor ranges.
Within-species genetic variation and phenotypic data from other USDA-supported projects and collaborations will be used to decide which crosses to make in creating mapping populations to investigate white mold and aphid resistance. Linkage maps for G. tomentella D3, G. syndetika and G. stenophita would be useful in understanding the role of chromosome rearrangements and polyploidy in Glycine evolution.
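The Bayes Factor Delimitation step described above compares competing species models by their marginal likelihoods. The following is a minimal sketch of that comparison, assuming the marginal log-likelihoods have already been estimated by some external program. The model names and numeric values are invented for illustration and are not results from this project. The statistic shown is the commonly used 2 x (difference in marginal log-likelihood), where values above roughly 10 are often read as very strong support for the better model.

```python
from typing import Dict, List, Tuple


def rank_species_models(marginal_lnl: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank delimitation models by marginal log-likelihood.

    Returns (model name, 2*lnBF) pairs, best model first. 2*lnBF is
    measured against the best model, so the best model scores 0.0 and
    larger values mean stronger evidence against that model.
    """
    best = max(marginal_lnl.values())
    ranked = sorted(marginal_lnl.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, 2.0 * (best - lnl)) for name, lnl in ranked]


# Hypothetical marginal log-likelihoods (NOT project data).
example_models = {
    "D1_and_D2_as_two_species": -10250.0,
    "D1_and_D2_as_one_species": -10230.5,
}

for name, two_ln_bf in rank_species_models(example_models):
    print(f"{name}: 2lnBF vs best = {two_ln_bf:.1f}")
```

In this invented example the single-species model ranks first, and the two-species model is disfavored by 2lnBF = 39.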

Progress 10/01/20 to 09/30/21

Outputs
Target Audience: Academic scholars, postdocs, and students.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Training opportunities are available for the postdoctoral fellow involved in the project through collaborations with academic, government, and industry groups. The postdoctoral fellow will also receive additional training in project organization and publication by mentoring graduate and undergraduate students working on parts of the project.
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? Much effort in the upcoming year will focus on analysis of the large amount of data we now possess. This will require a great deal of integration because of the heterogeneous nature of the data: GBS and whole-genome skim data generated here, and BUSCO and organelle genome data from Syngenta. To further collaborative goals with the USDA soybean germplasm collection, we will continue to generate and analyze sequence data from DNA samples sent to us from the collection. Planned emphases in this effort include: all available accessions of the two C-genome species, for comparison with accessions representing potential new species thought to belong to that group; additional samples newly accessioned by the USDA germplasm collection, particularly samples lacking definitive identification; and accessions for which we have DNA or seed samples from our nearly 40 years of research on perennial Glycine and for which genome skim data are not yet available. Taxonomic emphasis is likely to be on G. stenophita and allopolyploid species possessing a G. stenophita genome (G. tabacina, G. pescadrensis), given our focus on white mold resistance in G. stenophita accessions. In general, subsequent steps in the overall project will focus on allopolyploids.

Impacts
What was accomplished under these goals?
This year was marked by a shift in methodology from genotyping by sequencing (GBS) to whole-genome skim sequencing, driven both by the decreased cost of DNA sequencing and by the greater genomics capabilities of the new postdoctoral fellow now overseeing the project. The shift was also stimulated by the availability, through collaboration with a private corporation (Syngenta), of Illumina short-read genome skim sequences representing complete organellar genomes and 2500 nuclear genes for over 550 Glycine accessions. The nuclear genes were from the BUSCO set of genes shared across many taxa and commonly used to assess genome sequencing completeness.
A research highlight was the acquisition, from the Australian National Herbarium (CANB), of dried herbarium material of G. remota, a new species of Glycine from Western Australia described in 2015. A genome skim sequence was obtained from this species and combined with GBS sequences from diploid Glycine species, generated earlier in this project, to place G. remota in the Glycine phylogeny as a member of the I-genome group. A paper describing this finding is in preparation.
Phylogenetic analysis of the Syngenta plastid genome (plastome) sequences was conducted, as were analyses of the first 100 BUSCO nuclear genes, to develop a pipeline for the planned full analysis of the nuclear genome. Initial findings are very promising, yielding practical advances, such as identification of misclassified accessions, and elucidating phylogenetic patterns, such as incongruence between plastid and nuclear genomes.
Another milestone was the establishment of a formal collaboration with the USDA soybean germplasm collection through its new Curator, Adam Mahan. This led to our sequencing a plate of 96 samples sent to us from the collection, including unidentified or provisionally identified samples recently obtained from Australia. Analysis of these data, as well as of the full Syngenta dataset and our previously generated GBS data, is pending.

Publications


    Progress 10/01/19 to 09/30/20

    Outputs
    Target Audience: Seminars and conference presentations were made to academic scholars, postdocs, and students.
    Changes/Problems: Nothing Reported
    What opportunities for training and professional development has the project provided? Nothing Reported
    How have the results been disseminated to communities of interest?
    University of Wisconsin, Madison, WI, December, 2019. "A polyploid odyssey."
    Cornell University, Plant Breeding & Genetics Section, September, 2020. "The impacts of polyploidy, from cells to species."
    What do you plan to do during the next reporting period to accomplish the goals? To facilitate identification of progenitor taxa and to document the extent of diversity within G. tomentella T4 (= G. tomentella sensu stricto), an allopolyploid comprising subgenomes contributed by an H-genome diploid species (one of 6 possible taxa) and the G. tomentella D3 diploid (D-genome), we propose to assemble genomes de novo from three genetically different T4 individuals using a hybrid approach with long and short reads. Specifically, we will use Oxford Nanopore to 50x coverage and Illumina NovaSeq to 50x coverage. We will then use the newly generated genome assemblies as references to which we map our GBS data. Comparing the percentage of reads mapped and the number of SNPs called will provide insights into the likely diploid progenitors of the polyploid accessions. Further downstream analyses such as PCA and Structure will allow us to determine which of the diploids is genetically closest to each of the polyploid's two homoeologous subgenomes. If resources are available, further Illumina (2 x 250 reads) genome resequencing of additional diploid progenitors to 20x coverage will be done to identify regions that are shared among the different genomes. Ultimately, we would like to look for signatures of selection in the genomes of these plants, which are known to harbor resistances and tolerances to biotic and abiotic stresses.
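The plan above to compare the percentage of reads mapped and SNPs called across references can be sketched as follows. This is only an illustration under stated assumptions: the reference labels and all counts are hypothetical, and the report does not specify which aligner or SNP caller would produce them. The sketch assumes one set of GBS reads from a polyploid accession has been mapped separately to each candidate reference, and it ranks the references by mapping rate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MappingResult:
    reference: str      # label of the candidate reference (hypothetical)
    total_reads: int    # reads submitted to the aligner
    mapped_reads: int   # reads that aligned to this reference
    snps_called: int    # SNPs called against this reference

    @property
    def percent_mapped(self) -> float:
        """Percentage of reads that mapped; 0.0 if no reads were supplied."""
        if self.total_reads == 0:
            return 0.0
        return 100.0 * self.mapped_reads / self.total_reads


def rank_references(results: List[MappingResult]) -> List[MappingResult]:
    """Order candidate references from highest to lowest mapping rate."""
    return sorted(results, key=lambda r: r.percent_mapped, reverse=True)


# Invented counts for one polyploid accession (NOT project data).
results = [
    MappingResult("H_genome_candidate_A", 1_000_000, 870_000, 15_200),
    MappingResult("H_genome_candidate_B", 1_000_000, 910_000, 12_400),
    MappingResult("D3_diploid", 1_000_000, 890_000, 13_100),
]

for r in rank_references(results):
    print(f"{r.reference}: {r.percent_mapped:.1f}% mapped, {r.snps_called} SNPs")
```

A higher mapping rate together with fewer SNPs against a given reference would be consistent with a closer relationship, but mapping rate is only one line of evidence and depends on assembly quality as well.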

    Impacts
    What was accomplished under these goals?
    This past year was marked by difficulties even before the pandemic brought the project to a halt. These included deaths and other challenges in the family of the Research Associate leading the project. Because of her at-risk status, the Research Associate did not feel comfortable returning to campus while students were present, even after the lab received reactivation permission.
    A major goal had been to determine the identities of allotetraploid Glycine populations in the islands between Taiwan and mainland China. Several different allotetraploid species occur on Taiwan and in the Ryukyu Islands of Japan, and Chinese colleagues who have been studying their island populations were collaborating with us. However, after significant investment of effort and communication, it was determined that the DNA samples proposed for use in genotyping by sequencing (GBS) analyses were too degraded for that purpose, which was a major setback.
    Progress was made on morphological studies of perennial Glycine, leading to the return of herbarium specimens on loan from Royal Botanic Gardens, Kew, that had been borrowed years ago by a graduate student who did not remain for a Ph.D. In addition, some progress was made on Bayes Factor species delimitation and phylogeny reconstruction of diploid accessions using GBS data. This work, when completed, will likely result in the recognition of several new species. Communication is ongoing with the soybean germplasm collection at Illinois, with the goal of helping to curate the collection based on new information generated as part of this project.

    Publications


      Progress 10/01/18 to 09/30/19

      Outputs
      Target Audience: Seminars and conference presentations were made to academic scholars, postdocs, and students.
      Changes/Problems: To facilitate identification of progenitor taxa and to document the extent of diversity within G. tomentella T4 (= G. tomentella sensu stricto), an allopolyploid comprising subgenomes contributed by an H-genome diploid species (one of 6 possible taxa) and the G. tomentella D3 diploid, we propose to assemble genomes de novo from three genetically different T4 individuals using a hybrid approach with long and short reads. Specifically, we will use Oxford Nanopore to 50x coverage and Illumina NovaSeq to 100x coverage. We will then use the newly generated reference sequences to map our GBS data. Using PCA and Structure, we will be able to determine which of the diploids is genetically closest to each of the polyploid's two homoeologous subgenomes. If time and money allow, we will sequence additional diploid progenitors to 50x coverage using Illumina (2 x 250 reads) to identify regions that are shared among the different genomes. Ultimately, we would like to look for signatures of selection in the genomes of these plants, which are known to harbor resistances and tolerances to biotic and abiotic stresses.
      What opportunities for training and professional development has the project provided? Nothing Reported
      How have the results been disseminated to communities of interest? The PI has reported findings at the following seminars and conferences:
      West Virginia University, Morgantown, WV, November, 2018. "Polyploidy: Significance and Unanswered Questions."
      45th Annual South African Association of Botanists, African Mycological Society, and South African Society for Systematic Botany Joint Congress, Johannesburg, South Africa, January, 2019 (keynote/plenary speaker). "A systematist in Wonderland: Harnessing multi-omics data to understand patterns of plant biodiversity and the processes involved in its generation."
      College of William & Mary, Williamsburg, VA, March, 2019. "What does polyploidy do?"
      What do you plan to do during the next reporting period to accomplish the goals? Plans are in place to complete 4-6 more GBS libraries composed mostly of tetraploid individuals, including newly discovered material from China. To help in determining progenitor taxa, we propose the additional sequencing described in the Changes/Problems section. We will take an iterative approach to our data analysis of the tetraploids, including a "big picture" analysis with all taxa included but fewer individuals, and analyses focusing on individual triads (two diploid progenitors and their allopolyploid). Our additional work will focus on a triad that is more diverse than those previously studied. Preliminary analyses are completed; additional analyses needed for publication will be undertaken.

      Impacts
      What was accomplished under these goals?
      We have taken an iterative approach to data analysis. With data from over 600 accessions, we have identified accessions that were misidentified in the USDA/GRIN and/or CSIRO germplasm collections, or that represent new taxa. Work in the past year has focused on diploid species of the G. tomentella complex. In many instances it was important to include tetraploid individuals in initial analyses to detect misidentified accessions. We used NeighborNet in the SplitsTree package to cluster accessions. This preliminary step helped identify new taxa. We have used Admixture and Bayesian phylogenetic analyses to confirm groupings and new taxa. Bayes Factor Delimitation favors the hypothesis with 13 species among diploid G. tomentella, which is four more species than anticipated based on previous work with markers having less resolution than our genome-wide GBS approach.
      Preliminary investigation of the tetraploid data sets indicates that, for four of as many as eight tetraploid species, the progenitor species will not be as easy to determine as once thought. The number of potential diploid progenitor taxa has increased, and their close relationships make it challenging to determine which diploid dataset to work with. We have proposed changes in the Changes/Problems section to facilitate accurate description of the genetic make-up of the tetraploids.

      Publications


        Progress 10/04/17 to 09/30/18

        Outputs
        Target Audience: Researchers interested in wild relatives of soybean; biologists interested in species delimitation, polyploidy and legume phylogeny; anyone interested in the collection, identification, and maintenance of diversity in germplasm banks.
        Changes/Problems: Nothing Reported
        What opportunities for training and professional development has the project provided? Nothing Reported
        How have the results been disseminated to communities of interest? Presentations by Jeff Doyle:
        Reed College, Portland, OR, November 10, 2017. "Perspectives on the Prevalence, Pattern, and Process of Plant Polyploidy"
        University of Texas, Austin, March, 2018. "Polyploidy: Significance and Unanswered Questions"
        Michigan State University, March, 2018. "Polyploidy: Significance and Unanswered Questions"
        7th International Legume Conference, Sendai, Japan, September, 2018 (symposium co-organizer and speaker, Root to tip legume phylogenomics: Building the Foundation for Next Generation Legume Systematics). "Genomics, transcriptomics, and more: The making of a model non-model legume system, perennial Glycine (Phaseoleae)"
        West Virginia University, Morgantown, WV, November, 2018. "Polyploidy: Significance and Unanswered Questions."
        What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

        Impacts
        What was accomplished under these goals?
        The legume genus Glycine comprises two groups of species (subgenera): 1) the annual subgenus, Soja, consisting of two recognized annual species, the soybean (G. max) and its wild progenitor, G. soja, native to eastern Asia; and 2) the perennial subgenus, Glycine, most of whose species are native to Australia. For decades there were barely a half dozen recognized species in subgenus Glycine. Today there are 29 recognized species in the subgenus, but it has been known that one named species, G. tomentella, actually encompasses a polyploid complex involving a minimum of eight species requiring delimitation and description. Additionally, it was known that G. tabacina, a name properly applied only to a tetraploid entity, also includes at least three still unnamed diploids.
        We are using a "next-generation" sequencing approach (genotyping by sequencing; GBS), which detects variation at many thousands of DNA sites across the entire genome, to sample more accessions than any previous project. This has revealed more species and more variation than originally thought. For example, we have discovered more variation among accessions designated "G. tomentella" and now consider it to harbor up to fourteen undescribed species.
        A second important contribution of this project will be to alert the USDA germplasm repository to misidentified accessions. So far, out of 542 accessions, 39 have had incorrect species designations. At least a dozen accessions without designations have been placed in the appropriate species groups. Another dozen accessions have not been placed within a known species group. These accessions either represent new species or are awaiting analyses with other groups of taxa.
        The major emphasis in this portion of the NC 7 project is the group's second goal: to use molecular methods to characterize the available wild perennial Glycine germplasm.
        1) We have sequenced and mapped reads from eight GBS libraries. Reads and SNPs were checked and filtered for quality. SNPs were used in network analyses to distinguish diploid from tetraploid accessions, confirm previous designations, and test hypotheses of new affiliations.
        2) We have over 20,000 filtered genome-wide SNPs from 542 accessions of perennial Glycine. These SNPs have been employed in network analyses at several different levels of comparison.
        3) The most striking result to date is that what we had considered a single species, based on smaller sample sizes, shows levels of divergence suggesting that it should actually be recognized as two species. Ironically, as we investigate more accessions, a taxon that an older hypothesis divided into two species appears more likely to be a single species. Our understanding of the depth of divergence between taxa and genome groups has improved. We are now investigating the possibility that there are a total of 14 unrecognized species within G. tomentella.
        4) When analyses are completed, we will notify the Soybean Germplasm repository of accessions that have changed species affiliation. The data that have been collected will be used to name and describe new species.
        We have found from other projects assessing the wild relatives of soybean that understanding species and the number of accessions for each species is imperative to designing genetic experiments. Too few accessions within a species make tools like genome-wide association mapping unusable. As we develop species identifications and analyze collection patterns, we will be able to inform future collections in Australia.
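The report states that SNPs were checked and filtered for quality but does not give the criteria. As a hedged illustration only, the sketch below shows one common style of SNP filter based on the fraction of missing genotype calls and the minor allele frequency. The thresholds (20% missing, 5% minor allele frequency) and the genotype coding (count of alternate alleles, with None for a missing call) are assumptions for this example, not the project's actual settings.

```python
from typing import List, Optional

# Genotype coding assumed here: 0, 1, or 2 copies of the alternate allele,
# or None when no call was made for that accession.
Genotype = Optional[int]


def snp_passes(genotypes: List[Genotype],
               max_missing: float = 0.2,   # assumed threshold
               min_maf: float = 0.05) -> bool:  # assumed threshold
    """Return True if a SNP passes missing-data and allele-frequency filters."""
    if not genotypes:
        return False
    called = [g for g in genotypes if g is not None]
    missing_fraction = (len(genotypes) - len(called)) / len(genotypes)
    if not called or missing_fraction > max_missing:
        return False
    # Each called diploid genotype contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    minor_allele_freq = min(alt_freq, 1.0 - alt_freq)
    return minor_allele_freq >= min_maf


# Toy genotype vectors across five accessions (NOT project data).
snps = {
    "snp_variable": [0, 1, 2, 0, None],       # 20% missing, variable
    "snp_monomorphic": [0, 0, 0, 0, 0],       # no variation
    "snp_sparse": [None, None, 1, 0, 2],      # 40% missing
}

kept = [name for name, calls in snps.items() if snp_passes(calls)]
print(kept)
```

For allopolyploid accessions the diploid genotype coding would need to be reconsidered, since reads from two subgenomes can map to the same site.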

        Publications