Source: VIRGINIA POLYTECHNIC INSTITUTE submitted to NRP
GENOME-GUIDED ADAPTIVE INTROGRESSION IN DISEASE RESISTANT AMERICAN CHESTNUT POPULATIONS
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1018599
Grant No.
2019-67013-29173
Cumulative Award Amt.
$500,000.00
Proposal No.
2018-06175
Multistate No.
(N/A)
Project Start Date
Feb 15, 2019
Project End Date
Feb 14, 2024
Grant Year
2019
Program Code
[A1141]- Plant Health and Production and Plant Products: Plant Breeding for Agricultural Production
Recipient Organization
VIRGINIA POLYTECHNIC INSTITUTE
(N/A)
BLACKSBURG,VA 24061
Performing Department
For Resources & Environ Consrv
Non Technical Summary
American chestnut was once a dominant and keystone species of eastern hardwood forests, providing substantial ecosystem services and economic benefits. In the early part of the 20th century, chestnuts were decimated by an exotic fungal blight. Billions of trees were killed by the blight, although the species persists as periodic sprouts from uninfected roots. Two parallel methods have emerged to develop blight-resistant trees for reforestation, hybrid breeding and genetic modification, both of which are nearing the goal of robust disease resistant trees. However, before widespread re-introduction of disease resistant chestnuts to natural forests can be realized, we need to incorporate natural adaptive genetic diversity. To do this, we will couple genome re-sequencing with modern analytical tools to comprehensively characterize patterns of genomic variation in wild chestnuts. Specifically, we will sequence the genomes of 500 trees from across the natural species range. We will use these data to characterize contemporary population connectivity (i.e., barriers to pollen and/or seed movement); to understand historical demographic events that may impinge on genetic diversity across the species range; and to identify relationships between genome and environment. This information will be used to develop a strategy for vegetative propagation to conserve naturally occurring adaptive diversity in wild chestnut trees, which will allow us to expand the genomic variability and provide for local adaptation in disease resistant populations. This project primarily aligns with the 'locally adapted cultivar development' program area priority.
Animal Health Component
70%
Research Effort Categories
Basic
30%
Applied
70%
Developmental
0%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
202                    0613                              1080                      50%
202                    0613                              1081                      50%
Goals / Objectives
The rapid growth of American chestnut, coupled with its decay resistant wood, previously made it the single most valuable hardwood species in the country. Moreover, the prodigious and reliable chestnut seed crop was an important source of food and feed throughout its native range. In the first half of the 20th century, an exotic fungal blight decimated the species, killing approximately four billion trees. Two approaches to developing blight resistant American chestnuts are currently in progress. First, The American Chestnut Foundation (TACF) has introduced blight resistance genes from Chinese chestnut (Castanea mollissima) into American chestnut, followed by repeated backcrossing to American chestnut to yield disease resistant populations for reforestation. Second, TACF and collaborators have developed genetically modified American chestnut lines that show levels of resistance comparable to that of Chinese chestnut. Federal regulatory approval is currently being sought (decisions expected by 2020) to use these trees for restoration. Crossing the transgenic founder lines with wild germplasm will be required to produce diverse, locally adapted reforestation populations.
Our long-term goal is to develop disease resistant, locally adapted American chestnut populations for restoration of the species. To this end, we will comprehensively characterize the genomic basis of local adaptation in remaining natural American chestnut populations, which will guide further breeding to both increase the effective population size and improve local adaptation of our blight resistant cultivars. We will accomplish these goals through the following objectives:
Objective I. Rangewide sampling and genome re-sequencing of natural chestnut populations (Year 1) - In collaboration with TACF's extensive network of state chapters, we are in the process of collecting ~1000 geo-referenced leaf samples from across the species range. We will sequence the genomes of ~500 of these samples, using the recently completed C. dentata genome as a reference. This will result in a comprehensive dataset of genetic diversity for American chestnut, both across the species range and across the genome.
Objective II. Population structure, gene flow, and demographic history (Year 2-3) - As natural barriers to migration of pollen and seed, as well as historical demography, may affect how we distribute our sampling effort for vegetative propagation and subsequent breeding, we will characterize range-wide and local population structure (i.e., connectivity, or lack thereof, among populations) and historical population size changes.
Objective III. Genotype-environment analyses to uncover the genomic bases of local adaptation (Year 2-3) - We will identify relationships between genome and environment using a variety of modern statistical tools. The results of these analyses will inform selection of germplasm for conservation/breeding through genomic prediction of environment for individual trees.
Objective IV. Genome-informed germplasm conservation and breeding (Year 3-4) - Objectives I through III will guide selection and vegetative propagation of a diverse sample of wild chestnut germplasm that will be housed in TACF orchards, and subsequently crossed with our transgenic, disease resistant cultivars to develop deployment populations for specific locations across the natural chestnut range.
Project Methods
Objective I. Rangewide sampling and genome re-sequencing of natural chestnut populations (Year 1) - We will generate 15X re-sequencing data for 500 chestnut trees sampled from across 23 U.S. states and one Canadian province (Ontario). To do this, we will extract genomic DNA (gDNA) from young leaf tissue using a Qiagen DNeasy kit and prepare libraries using the TruSeq DNA PCR Free Library Preparation Kit. The libraries will be sequenced on an Illumina NovaSeq 6000 instrument with the S4 flow-cell in 2x150bp paired end mode. Resulting data will be aligned to the C. dentata reference genome using the Burrows-Wheeler Aligner (BWA) mem algorithm, and subsequently converted to BAM format, sorted, and indexed with SAMtools. Polymorphisms (SNPs and INDELs) will be called using the Genome Analysis Toolkit (GATK) HaplotypeCaller algorithm, with the resulting VCF file filtered to yield a final high quality dataset.
Objective II. Population structure, gene flow, and demographic history (Year 2-3) - To identify demographically independent units of potential germplasm conservation interest, we will use Discriminant Analysis of Principal Components (DAPC), which combines the dimensionality reduction of PCA with the between-group partitioning of discriminant analysis, and is able to successfully recover population groupings from data simulated under a variety of population genetic models. As in many temperate tree species, we expect isolation-by-distance in American chestnut. Thus, there is unlikely to be a 'true' number of discrete genetic populations. Our goal is therefore to identify a number of clusters that provides a useful representation of the among-population variability present in the species.
To understand the demographic history of extant chestnut populations, we will use SMC++ (Sequential Markov Coalescent + Plenty of Unlabeled Samples), a computationally efficient extension of the pairwise sequentially Markov coalescent (PSMC).
SNP data will first be converted to SMC++ format with the vcf2smc function, and population histories will be subsequently inferred with the estimate function, using a mutation rate of 5 x 10^-8. The outcome of this analysis will be estimates of effective population size (Ne) going backward in time. This will reveal both the current species Ne and historical changes associated with late Pleistocene climate oscillations, and will allow us to identify extant populations that may have been acutely affected by such bottlenecks, and thus harbor less extant variation.
Objective III. Genotype-environment analyses to uncover the genomic bases of local adaptation (Year 2-3) - To identify relationships between allele frequencies and environmental variables, we will use Bayenv2 software. The intuition behind this method is to control genotype-environment relationships for the shared evolutionary history of populations by including an allelic covariance matrix estimated from a panel of neutral markers. We will run Bayenv2 on both raw climate variables and the leading principal components of all 21 WorldClim variables. The outcome of these analyses will be a large set of candidate loci for local adaptation to be used in subsequent modeling of multilocus genome-environment relationships (see below). It should be noted that our goal is to capture as much standing adaptive variation as possible. Thus, false negatives are more of a concern than false positives.
We will also test for signals of local adaptation in a multivariate framework. An advantage of this approach over standard univariate outlier tests is the ability to detect subtle shifts in allele frequencies across multiple loci, while simultaneously identifying the key environmental variables to which those shifts are attributable.
Specifically, we will employ redundancy analysis (RDA), a constrained ordination approach in which multivariate regressions are fitted between a set of predictors (i.e., environmental and geographic variables) and response variables (i.e., genotype data). Significant RDA axes will be identified through permutation, and candidate genotype-environment relationships will be those with loadings >3 standard deviations from the mean loadings of the significant axes. This method has been shown to outperform both univariate and several other multivariate approaches in terms of both true positive and false positive rates.
Objective IV. Genome-informed germplasm conservation and breeding (Year 3-4) - We will take a two-pronged approach to prioritizing areas for ex situ conservation and breeding with disease resistant chestnut lines. First, broad management units will be defined on the basis of the results of Objective II, in which neutral markers were used to elucidate patterns of population structure and postglacial history for the species. This will be done by identifying geographic breakpoints revealed by the DAPC analysis, which may include, for example, broad northern and southern groups, and a division defined by the Appalachian mountains. Second, adaptive units will be identified within these larger management units using machine learning. The intuition behind this method is that the genotype-environment relationships uncovered in Objective III provide information on genomic architecture of local adaptation. Our goal is to understand how the underlying multivariate genotypes are arrayed in space, and how to divide the range into provenances that are homogeneous with respect to their genome-wide adaptive allelic content.
Targets for the number of wild-type trees to propagate within each management and adaptive unit will be proportional to allelic diversity among putatively adaptive loci and to the predicted future spatial extent of suitable climate associated with each adaptive unit. Methods for propagation of wild trees will include transplanting re-sprouted stems into orchards, grafting wild scion onto root stock, top-work grafting, and collecting seed from rare flowering wild trees. All of these methods have previously been used in TACF's backcross breeding program.
We used simulations to determine how varying the number of transgenic founders and generations of outcrossing affects diversity in our blight resistant lines. These simulations indicated that effective population size can be increased to > 500 individuals and the average inbreeding coefficient reduced to < 0.01 by outcrossing two transgenic founder trees over three generations with 50, 150, and 450 (650 total) wild-type American parents, and using 3 progeny per cross as parents in subsequent generations. To combine resistance to chestnut blight and Phytophthora root rot, American chestnut backcross trees selected for resistance to Phytophthora cinnamomi will be bred with transgenic blight resistant trees in the third outcross generation.
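The outlier rule described under Objective III (candidate loci are those whose loadings fall more than 3 standard deviations from the mean loading on a significant RDA axis) can be illustrated with a short Python sketch. It assumes the per-SNP loadings for a single axis have already been extracted from a fitted RDA; the function name rda_outliers, the SNP identifiers, and the toy loadings are hypothetical and are not taken from the project's code.

```python
from statistics import mean, pstdev


def rda_outliers(loadings, n_sd=3.0):
    """Return SNP IDs whose loading on one RDA axis lies more than
    n_sd standard deviations from the mean loading of that axis.

    loadings: dict mapping a (hypothetical) SNP ID to its loading.
    """
    values = list(loadings.values())
    centre = mean(values)
    spread = pstdev(values)  # population SD across all SNPs on this axis
    if spread == 0:
        return []  # no variation, so nothing can be an outlier
    return sorted(
        snp for snp, value in loadings.items()
        if abs(value - centre) > n_sd * spread
    )


# Toy data (assumed): 200 SNPs with small loadings plus one strongly loaded SNP.
toy = {f"snp{i}": 0.01 * ((i % 5) - 2) for i in range(200)}
toy["snp_candidate"] = 0.9
candidates = rda_outliers(toy)
```

In practice the rule would be applied to each significant axis in turn and the resulting candidate lists combined; whether the project pooled axes in exactly this way is not stated here.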

Progress 02/15/19 to 02/14/24

Outputs
Target Audience: The primary audience for this project will be public and private landowners interested in restoring American chestnut to Appalachian forests. There is a widespread recognition in the region of the economic and ecological role this species previously played on the landscape, as evidenced by the citizen science undertaken by TACF volunteers over the last three decades. The goal of blight resistant chestnuts has nearly been realized, and this project will support the final piece of the restoration puzzle, namely, diversifying those resistant trees so they can be planted across the climatically heterogeneous native range of the species.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? By integrating molecular biology, bioinformatics, and genecology, this project has provided interdisciplinary education and training at various levels. Former PhD candidate Alexander Sandercock was primarily responsible for data generation and analyses associated with this project, which resulted in a first author publication in Molecular Ecology in 2022 and another that has been published in PNAS. Dr. Sandercock defended his thesis May 26, 2023, and has since moved to a post-doctoral position with USDA/Cornell Breeding Insight, where he is leveraging the skills he's gained in bioinformatics and computational biology to advance complex trait dissection in specialty crops.
How have the results been disseminated to communities of interest? Sequence data have been deposited in the GenBank short read archive (PRJNA804196), and our allied projects with the Hudson Alpha Institute to generate high quality American and haplotype-resolved Chinese chestnut genomes are available on Phytozome.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? The American chestnut (Castanea dentata), once a prominent hardwood species in the eastern United States, was devastated by the introduction of the fungal pathogen Cryphonectria parasitica in the early 20th century, leading to the functional extinction of the species. Despite the survival of millions of trees as root collar sprouts, these trees rarely reproduce. We are developing blight-resistant American chestnuts through interspecific hybridization, conspecific backcrossing, and genetic engineering, but success relies on incorporating adaptive genomic diversity from wild germplasm to produce locally adapted restoration populations across varying climatic conditions. To inform these efforts, we conducted a comprehensive analysis of 384 wild American chestnut trees to assess population structure, demographic history, and genomic diversity. Our study identified three distinct genetic populations (northeast, central, and southwest), each with unique adaptive allele frequencies. We found the highest genomic diversity in the southwest, reflecting historical bottleneck events associated with Quaternary glaciation. Genomic regions under positive selection suggest a common evolutionary response to fungal pathogens across populations, indicating that American chestnuts underwent postglacial expansion from the southern part of their range. To develop sampling recommendations for ex situ conservation of wild adaptive genetic variation, we identified polymorphisms with evidence of past climate-related selection, and found that on the basis of this subset of SNPs, the species range can be subdivided into three seed zones, and that 21 to 29 trees per seed zone will need to be conserved to capture most extant adaptive diversity. Additionally, we evaluated 269 backcross trees to understand the extent to which breeding programs have captured wild adaptive diversity.
This analysis provided insights into optimal reintroduction sites for specific families based on their adaptive profiles and projected future climate conditions. Overall, our results offer a strategic blueprint for the ex situ conservation of germplasm and the targeted reintroduction of blight-resistant American chestnut populations. This approach can be applied to restore the American chestnut across its native range and offers a model for developing restoration plans for other imperiled tree species.

Publications

  • Type: Journal Articles Status: Published Year Published: 2024 Citation: Sandercock A.M., Westbrook J.W., Zhang Q., Holliday J.A. (2024). A genome-guided strategy for climate resilience in American chestnut restoration populations. Proceedings of the National Academy of Sciences, 121 (30) e2403505121. https://doi.org/10.1073/pnas.240350512


Progress 02/15/23 to 02/14/24

Outputs
Target Audience: The primary audience for this project will be public and private landowners interested in restoring American chestnut to Appalachian forests. There is a widespread recognition in the region of the economic and ecological role this species previously played on the landscape, as evidenced by the citizen science undertaken by TACF volunteers over the last three decades. The goal of blight resistant chestnuts has nearly been realized, and this project will support the final piece of the restoration puzzle, namely, diversifying those resistant trees so they can be planted across the climatically heterogeneous native range of the species.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? By integrating molecular biology, bioinformatics, and genecology, this project has provided interdisciplinary education and training at various levels. Former PhD candidate Alexander Sandercock was primarily responsible for data generation and analyses associated with this project, which resulted in a first author publication in Molecular Ecology last year and another that has been accepted pending revision in PNAS. Dr. Sandercock defended his thesis May 26, 2023, and has since moved to a post-doctoral position with USDA/Cornell Breeding Insight, where he is leveraging the skills he's gained in bioinformatics and computational biology to advance complex trait dissection in specialty crops.
How have the results been disseminated to communities of interest? In addition to journal publications and conference presentations detailed above, our sequencing data have been deposited in the GenBank short read archive (PRJNA804196), and our allied projects with the Hudson Alpha Institute to generate high quality American and haplotype-resolved Chinese chestnut genomes are available on Phytozome (https://phytozome-next.jgi.doe.gov/info/Cdentata_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaMahoganyHAP1_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaMahoganyHAP2_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaNankingHAP2_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaNankingHAP1_v1_1).
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? We showed previously that adaptive diversity across the American chestnut range can be split into three distinct groups: southern, central, and northern. These divisions corresponded with temperature and precipitation gradients, suggesting climate and population structure significantly influenced genomic variation. We used a re-sampling approach to show that collecting pollen from approximately 20-30 trees in each zone effectively captures adaptive diversity for future breeding efforts, and plans for this sampling are being developed. Using whole-genome sequencing on 371 selected backcross trees from the breeding program of The American Chestnut Foundation (TACF), we also found that historical sampling of wild pollen has been reasonably effective at capturing the above diversity, but careful construction of future crosses will be necessary to avoid its loss in future generations. Breeding simulations were performed in AlphaSimR to assess how allelic diversity and correlations with wild-type allele frequencies varied with increasing numbers of crosses among backcross and wild-type parent trees currently planted in TACF orchards. We found that greater than 95% of the allelic diversity was represented with 30 crosses in each seed zone, with only marginal gains from additional crosses. During the current reporting period we primarily focused on completing a manuscript detailing the above results, which has been preliminarily accepted for publication in The Proceedings of the National Academy of Sciences.
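The re-sampling idea mentioned above (how many trees per zone are needed to capture adaptive diversity) can be sketched as follows. This is a simplified illustration rather than the project's procedure: it assumes each tree is represented by the set of allele labels it carries, scores a subsample by the fraction of all observed alleles it contains, and uses hypothetical function names. The actual analysis may have used a different criterion, such as matching allele frequencies.

```python
import random


def fraction_captured(trees, n, replicates=200, seed=1):
    """Mean fraction of alleles in the full sample that are present
    in a random subsample of n trees (drawn without replacement).

    trees: non-empty list of sets, each holding the allele labels
    one tree carries.
    """
    rng = random.Random(seed)
    all_alleles = set().union(*trees)
    total = 0.0
    for _ in range(replicates):
        subsample = rng.sample(trees, n)
        captured = set().union(*subsample)
        total += len(captured) / len(all_alleles)
    return total / replicates


def smallest_sample(trees, target=0.95, **kwargs):
    """Smallest n whose mean captured fraction reaches the target."""
    for n in range(1, len(trees) + 1):
        if fraction_captured(trees, n, **kwargs) >= target:
            return n
    return len(trees)


# Toy zone (assumed): 50 trees, one shared allele plus one of ten rarer alleles each.
toy_zone = [{"common", f"rare{i % 10}"} for i in range(50)]
n_needed = smallest_sample(toy_zone, target=0.95)
```

Running smallest_sample separately for each seed zone would give zone-specific targets, which is the kind of output the 20-30 trees per zone recommendation summarizes.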

Publications

  • Type: Journal Articles Status: Published Year Published: 2023 Citation: Conn C.E., Howie N., Lynch M., Lee S., Young E., Westbrook J., Holliday J., Zhang Q. & Cipollini M.L. (2023). Validation of an Alternative Small Stem Assay for Blight Resistance in Chestnut Seedlings and Recommendations for Broader Use. Plant Disease, 107(5), 1576-1583. doi:10.1094/PDIS-06-22-1489-RE
  • Type: Journal Articles Status: Other Year Published: 2024 Citation: (Accepted subject to revision) Sandercock A., Westbrook J., Zhang Q. & Holliday J. (2024). The road to restoration: Identifying and conserving the adaptive legacy of American chestnut. Proceedings of the National Academy of Sciences.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Holliday J., Westbrook J., Sandercock A. & Malukiewicz J. (2023). Quantitative, functional, and comparative genomic tools for species restoration: the case of American chestnut. In Southern Forest Tree Improvement Conference. Knoxville, TN.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Holliday J., Westbrook J., Malukiewicz J. & Sandercock A. (2023, September 27). Quantitative, functional, and comparative genomic tools for species restoration: the case of American chestnut. In VII Encuentro Científico en Biología Vegetal y Biotecnología, de moléculas a ecosistemas. University of Talca, Chile.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Holliday J., Sandercock A. & Westbrook J. (2023). Genomic tools for American chestnut restoration. In Forest Genetics 2023. Vernon, BC, Canada.


Progress 02/15/22 to 02/14/23

Outputs
Target Audience: The primary audience for this project will be public and private landowners interested in restoring American chestnut to Appalachian forests. There is a widespread recognition in the region of the economic and ecological role this species previously played on the landscape, as evidenced by the citizen science undertaken by TACF volunteers over the last three decades. The goal of blight resistant chestnuts has nearly been realized, and this project will support the final piece of the restoration puzzle, namely, diversifying those resistant trees so they can be planted across the climatically heterogeneous native range of the species.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? By integrating molecular biology, bioinformatics, and genecology, this project has provided interdisciplinary education and training at various levels. PhD candidate Alexander Sandercock has been primarily responsible for data generation and analyses associated with this project, which resulted in a first author publication last year and another to be submitted by end of summer 2023. Mr. Sandercock will be defending his thesis May 26, 2023, and will subsequently be moving on to a post-doctoral position with USDA/ARS Breeding Insight, where he will leverage the skills he's gained in bioinformatics and computational biology to advance complex trait dissection in specialty crops.
How have the results been disseminated to communities of interest? During this reporting period we published two journal articles and presented at one online meeting organized by the American Chestnut Foundation.
Our sequencing data have been deposited in the GenBank short read archive (PRJNA804196), and our allied projects with the Hudson Alpha Institute to generate high quality American and haplotype-resolved Chinese chestnut genomes are available on Phytozome (https://phytozome-next.jgi.doe.gov/info/Cdentata_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaMahoganyHAP1_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaMahoganyHAP2_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaNankingHAP2_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaNankingHAP1_v1_1).
What do you plan to do during the next reporting period to accomplish the goals? We are currently in a 6-month no cost extension and are focused on completing a manuscript describing patterns of local adaptation in wild American chestnut populations, as well as in the backcross breeding program, and how this information can/will be used to develop the ex situ conservation plan that was the ultimate goal of this project.

Impacts
What was accomplished under these goals?
Identifying loci underlying climatic adaptation and defining seed zones
We used our 21 million SNP whole-genome sequencing dataset to pinpoint loci associated with climate, define seed zones for germplasm conservation, and create a new method for sampling wild adaptive diversity. Our past work indicated the American chestnut range was split into three distinct groups: southern, central, and northern. These divisions corresponded with temperature and precipitation gradients, suggesting climate and population structure significantly influenced genomic variation. Our models demonstrated that climate contributed the most to explainable variance. In assessing genotype-environment associations, we used ten climate variables. To manage computational demands and limit the need for multiple test corrections, we replaced the environmental variables with three PC axes. This approach identified a total of 18,483 potentially adaptive SNPs across two GEA methods. We hypothesized that climate-related genomic variation in the American chestnut could be divided into seed zones reflecting eastern US temperature and precipitation gradients. Models indicated that two or three zones were likely, with the latter dividing the range into north, central, and south regions. The genetic differentiation between these seed zones was significantly higher at adaptive than neutral loci, indicating that while the number of seed zones was the same as the number of background populations, our approach successfully identified loci related to local adaptation. Our previous work showed that genomic diversity in American chestnut decreased with increasing latitude, suggesting that a lower sampling intensity would suffice in the southernmost Zone 3, and that a higher intensity would be needed in northern zones. We found this to be true, with fewer trees needed in Zone 3, and more in Zone 1, to achieve specific diversity targets.
We also found that reaching a 99% allele frequency match required approximately 10 times the sampling intensity of a 90% match. This finding had implications for how many trees would need to be sampled depending on the seed zone model used.
The adaptive capacity of the backcross breeding program
We also conducted whole-genome sequencing on 371 selected backcross trees from the breeding program of The American Chestnut Foundation (TACF). Our goals were to quantify the wild adaptive diversity within the backcross populations and identify the most suitable areas for reintroducing the backcross families. Although TACF and its state chapters maintain breeding orchards across the species' historical range, pollen collections favored flowering wild trees from the central area, Seed Zone 2. As a result, we predicted a Seed Zone 2 ancestry bias within the backcross samples. Indeed, ancestry for TACF and central state chapter breeding materials primarily traced back to Seed Zone 2. Of the backcross trees, about 76% contained ancestry from at least two seed zones. The allele-frequency distribution for adaptive loci in both wild and backcross samples showed a shift towards medium frequency alleles in the wild population and low frequency alleles in the backcross population. Despite this, the overall correspondence between the wild and backcross populations was high, with the backcross population explaining approximately 80% of allele frequency variation in the wild population. Given the complex ancestry of the backcross trees, determining the best geography and climate for each tree based solely on pedigree information is challenging. We used Locator software to estimate the most suitable seed zone for each tree. This software proved reasonably accurate, averaging an error of 169.82 km for wild-type trees and 313.16 km for backcross trees.
Although the backcross population contained adaptive genomic diversity from throughout the natural range, it contained the least from the southern Seed Zone 3. We will therefore recommend enhanced sampling intensity from Seed Zone 3. In future we hope to develop reciprocal common gardens to measure phenotypic reaction norms to different climates, and test our predicted matching of genotypes to geography/climate.
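The prediction errors quoted above are distances in kilometres between a tree's known location and the location predicted from its genotype. The report does not say how those distances were computed; one common choice, shown here purely as an assumed sketch, is the great-circle (haversine) distance averaged over trees. The function names and the Earth radius constant are illustrative and are not taken from Locator or from the project's code.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def mean_error_km(true_coords, predicted_coords):
    """Average distance between paired (lat, lon) true and predicted locations."""
    distances = [
        haversine_km(t[0], t[1], p[0], p[1])
        for t, p in zip(true_coords, predicted_coords)
    ]
    return sum(distances) / len(distances)


# Toy example (assumed coordinates): one perfect prediction, one off by 1 degree of longitude.
error = mean_error_km([(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (0.0, 1.0)])
```

For backcross trees, whose true "home" location is not directly observable, the reference point used to score a prediction has to be defined separately (for example from pedigree); that choice is outside this sketch.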

Publications

  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Sandercock AM, Westbrook J, Zhang Q, Johnson H, Saielli T, Scrivani J, Fitzsimmons S, Collins K, Schmutz J, Grimwood J, Holliday JA (2022) Frozen in time: Rangewide genomic diversity, structure, and demographic history of relict American chestnut populations. Molecular Ecology 31 (18), 4640-4655.
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Conn CE, Howie N, Lynch M, Lee S, Young E, Westbrook JW, Holliday JA, Zhang Q, Cipollini M (2022) Validation of an alternative small stem assay for blight resistance in backcross hybrid chestnuts (Castanea spp.) and recommendations for its expanded use. Plant Disease. 2022/11/16.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: Holliday JA, Sandercock A, Westbrook J. Discovery of candidate genes for blight and root rot resistance in Castanea. TACF Annual Meeting (Invited). Sept 30-Oct , 2022.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: Holliday JA, Sandercock A, Westbrook J. Genomic tools for species restoration: the case of American chestnut (Castanea dentata). IUFRO Tree Biotechnology Conference (Invited). July 6-8, 2022.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: Sandercock, A., Holliday, J., & Westbrook, J. (2021) Landscape genomics of American chestnut. In TACF Science and Technology Committee Annual Meeting. Online.


Progress 02/15/21 to 02/14/22

Outputs
Target Audience: The primary audience for this project will be public and private landowners interested in restoring American chestnut to Appalachian forests. There is a widespread recognition in the region of the economic and ecological role this species previously played on the landscape, as evidenced by the citizen science undertaken by TACF volunteers over the last three decades. The goal of blight resistant chestnuts has nearly been realized, and this project will support the final piece of the restoration puzzle, namely, diversifying those resistant trees so they can be planted across the climatically heterogeneous native range of the species.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? By integrating molecular biology, bioinformatics, and genecology, this project has provided interdisciplinary education and training at various levels. During spring 2019, we were fortunate to recruit a PhD student, Alexander Sandercock, to work on the project. Mr. Sandercock has a wealth of experience in conservation biology and genetics and quickly took the lead in managing the collections, extracting DNA, and coordinating with the sequencing core at HAI. Prior to the pandemic, we also had four undergraduate students working on the project, mostly assisting with DNA extraction and sample management. Finally, Research Associate Qian Zhang, who manages our laboratory, has assisted with optimizing our gDNA extraction protocols to meet the quantity/quality requirements set by the sequencing facility.
How have the results been disseminated to communities of interest? During this reporting period Ph.D. student Alex Sandercock presented at one online meeting organized by the American Chestnut Foundation.
Alex also drafted a manuscript focused on his analyses of diversity, population structure, and demographic history in chestnut (Objective II), which has been submitted as a preprint to bioRxiv and will be submitted to a journal for publication shortly. With this submission, our whole-genome sequence data will be released in the sequence read archive at NCBI. Finally, we published an article in Chestnut (the Journal of The American Chestnut Foundation), which summarizes our findings to date with respect to genetic diversity in wild chestnut populations. What do you plan to do during the next reporting period to accomplish the goals?The next reporting period will thus be focused on data analysis related to Objectives III and IV. Specifically, we will test for genotype-environment relationships Latent Factor Mixed Models (LFMM) andBayPass software. We will also useredundancy analysis (RDA) to test for multivariate genotype-environment relationships. RDA isa constrained ordination approach in which multivariate regressions are fitted between a set of predictors (i.e., environmental and geographic variables) and response variables (in this case, genotype data).The outcome of these analyses will be a large set of candidate loci for local adaptation to be used in subsequent modeling of multilocus genome-environment relationships. We will then partition our three populations(or, management units) into adaptive units using Multivariate Random Forests (MRF).Specifically, we will first reduce the dimensionality of the genotype matrix containing adaptive loci with PCA, retaining those principal components that capture approximately 90% of the total variance. We will then build MRFswith individual PC loadings as the response variables and geography (latitude, longitude, elevation) as predictors. 
Each management unit will be treated as an independent group in this analysis, such that the result is a set of geographic coordinates that define relatively homogeneous provenances with respect to the frequencies of adaptive alleles, and these provenances will form the basis for ex situ conservation efforts.
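
The variance-retention step described above can be illustrated with a minimal Python sketch. It assumes the eigenvalues from a PCA of the adaptive-locus genotype matrix are already in hand; the function name `n_components_for_variance` and the example eigenvalues are hypothetical and are not taken from the project's analysis.

```python
from itertools import accumulate


def n_components_for_variance(eigenvalues, target=0.90):
    """Smallest number of leading PCs whose cumulative variance share reaches `target`."""
    if not eigenvalues:
        raise ValueError("eigenvalues must not be empty")
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive value")
    for k, running in enumerate(accumulate(ordered), start=1):
        if running / total >= target:
            return k
    return len(ordered)  # guard against floating-point shortfall


# Hypothetical eigenvalues, for illustration only.
example_eigenvalues = [6.0, 2.0, 1.5, 0.3, 0.2]
print(n_components_for_variance(example_eigenvalues, target=0.90))
```

The retained components would then serve as the response variables in the random forest step, with latitude, longitude, and elevation as predictors.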

Impacts
What was accomplished under these goals? During this reporting period we used our WGS data for each of 384 American chestnut genotypes, sampled from across the entire historical species range, as well as a reference panel comprising all congeners, to estimate population structure, demographic history, genomic diversity, and signatures of selection. We also performed WGS on a Castanea species reference panel of 96 individuals to detect potential hybrid ancestry in the putative C. dentata samples. The reference panel included 19 C. sativa, 15 C. pumila var. pumila, 10 C. pumila var. ozarkensis, 6 C. pumila var. alabamensis, 4 C. dentata, 1 C. dentata x mollissima hybrid, 1 C. seguinii, 2 C. henryi, 18 C. crenata, and 20 C. mollissima. Of the 384 C. dentata samples sequenced, 86 had greater than 20x coverage, 242 had 10-20x coverage, and 56 had less than 10x coverage. Eighteen samples with greater than 10% missing data were removed, and 10 additional samples were removed that had > 10% cluster membership with one or more of the Castanea species reference samples based on ADMIXTURE analysis. The final C. dentata dataset contained 356 individuals with an average coverage of ~17x and 21,116,005 high quality SNPs. The Castanea species reference dataset contained 92 samples and 49,309,429 SNPs that passed the filtering criteria. Population structure within C. dentata, estimated with Discriminant Analysis of Principal Components (DAPC) and ADMIXTURE software, was best explained by a two or three population model. The three population ADMIXTURE model was characterized by a southwest, central, and northeast cluster. The southwest and central populations separated in northern Georgia and eastern Tennessee, while the central and northeast populations have an area of admixture in Pennsylvania before becoming more distinctly separated in southern New York. The two population DAPC model included the same southern population and boundary as ADMIXTURE, but the central and northeastern populations were merged. 
Both analyses were mostly in agreement with population memberships at the same K values. SMC++ estimates of effective population size (Ne) over time suggest that each population underwent contractions and expansions beginning approximately two million years ago. All populations followed a similar pattern of demographic history; however, the southwest population lagged the central and northeastern populations' events by approximately 100,000 years. Ne rapidly increased for all three populations approximately 6,700-11,700 years ago, after which the central population underwent an additional contraction within the past 7,000 years. The southwest population had the highest contemporary Ne, followed by the northeast and central populations (Ne(southwest) = 20,306; Ne(central) = 8,347; Ne(northeast) = 13,078). The southwest population had the greatest nucleotide diversity, followed by the central and northeast populations (π(southwest) = 0.0069; π(central) = 0.0064; π(northeast) = 0.0058). All populations had negative average Tajima's D, which was similarly clinal (D(southwest) = -1.083; D(central) = -1.016; D(northeast) = -0.335). Consistent with these negative values for Tajima's D, there was an excess of rare variants relative to neutral expectations in each population, suggesting recent expansion following a bottleneck. Sliding window analyses revealed heterogeneous genome-wide Tajima's D, nucleotide diversity, and FST. Throughout the genome, the southwest population had the most negative Tajima's D values, followed by the central and northeast populations. Similarly, the southwest population had the highest nucleotide diversity values throughout the genome, with decreasing values for the central and northeast populations. The highest FST values were attributed to the southwest-northeast population pair. 
Finally, we identified genomic regions under selection within each population, which suggests that defense against fungal pathogens is a common target of selection across all populations. Taken together, these results suggest that American chestnut underwent a postglacial expansion from the southern portion of its range leading to three extant populations. These populations will serve as management units for breeding adaptive genetic variation into the blight-resistant tree populations for targeted reintroduction efforts.
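
As an illustration of how the Tajima's D statistic reported above relates to the frequencies of variants, the following Python sketch computes D from an unfolded site frequency spectrum using the standard Tajima (1989) constants. It is a teaching example under stated assumptions, not the project's analysis pipeline: the function name `tajimas_d` and the toy spectra are hypothetical.

```python
import math


def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum.

    sfs[i - 1] is the number of segregating sites at which the derived
    allele occurs on exactly i of the n sampled chromosomes (i = 1..n-1).
    """
    n = len(sfs) + 1
    seg_sites = sum(sfs)
    if n < 4:
        raise ValueError("need at least 4 chromosomes (variance is zero below that)")
    if seg_sites == 0:
        raise ValueError("no segregating sites")

    # Mean pairwise differences (pi), summed over sites.
    pairs = n * (n - 1) / 2
    pi = sum(count * i * (n - i) for i, count in enumerate(sfs, start=1)) / pairs

    # Watterson's estimator and the variance constants.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = seg_sites / a1
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / math.sqrt(variance)


# Toy spectra for n = 10 chromosomes (hypothetical counts).
mostly_rare = [100, 0, 0, 0, 0, 0, 0, 0, 0]          # all singletons
mostly_intermediate = [0, 0, 0, 0, 100, 0, 0, 0, 0]  # all at frequency 5/10
print(tajimas_d(mostly_rare) < 0, tajimas_d(mostly_intermediate) > 0)
```

A spectrum dominated by singletons pulls the pairwise diversity below Watterson's estimator and gives a negative D, while a spectrum dominated by intermediate-frequency variants gives a positive D.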

Publications

  • Type: Conference Papers and Presentations Status: Published Year Published: 2021 Citation: Sandercock, A., Holliday, J., & Westbrook, J. (2021). Landscape genomics of American chestnut. In TACF Science and Technology Committee Annual Meeting. Online.
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Sandercock AM, Westbrook J, Zhang Q, Johnson H, Saielli T, Scrivani J, Fitzsimmons S, Collins K, Schmutz J, Grimwood J, Holliday JA (2022) Whole-genome resequencing reveals the population structure, genomic diversity, and demographic history of American chestnut (Castanea dentata). bioRxiv. doi: https://doi.org/10.1101/2022.02.11.480151
  • Type: Other Status: Published Year Published: 2022 Citation: Sandercock AM, Westbrook J, Holliday JA (2022) The history and landscape of genetic diversity in American Chestnut. Chestnut: Journal of The American Chestnut Foundation.


Progress 02/15/20 to 02/14/21

Outputs
Target Audience: The primary audience for this project will be public and private landowners interested in restoring American chestnut to Appalachian forests. There is widespread recognition in the region of the economic and ecological role this species previously played on the landscape, as evidenced by the citizen science undertaken by TACF volunteers over the last three decades. The goal of blight-resistant chestnuts has nearly been realized, and this project will support the final piece of the restoration puzzle, namely, diversifying those resistant trees so they can be planted across the climatically heterogeneous native range of the species. Changes/Problems: Not surprisingly, the pandemic slowed our progress somewhat. We were unable to use the lab for DNA extractions during spring 2020, and have been greatly limited in capacity since then. Our collaborators at HudsonAlpha shifted focus somewhat to sequencing viral genomes, and this slowed progress on sequencing our libraries. Nevertheless, we now have all data in hand and expect to make rapid progress with analysis in the coming year. What opportunities for training and professional development has the project provided? By integrating molecular biology, bioinformatics, and genecology, this project has provided interdisciplinary education and training at various levels. During spring 2019, we were fortunate to recruit a PhD student, Alexander Sandercock, to work on the project. Mr. Sandercock has a wealth of experience in conservation biology and genetics and quickly took the lead in managing the collections, extracting DNA, and coordinating with the sequencing core at HAI. Prior to the pandemic, we also had four undergraduate students working on the project, mostly assisting with DNA extraction and sample management. We currently have one undergraduate (Risa Dickerman) helping with gDNA extraction, who is also developing her own project around estimating population structure in chestnut. 
Finally, Research Associate Qian Zhang, who manages our laboratory, has assisted with optimizing our gDNA extraction protocols to meet the quantity/quality requirements set by the sequencing facility. How have the results been disseminated to communities of interest? During this reporting period we published two journal articles and presented at one online meeting organized by The American Chestnut Foundation. Once completed, our sequencing data will be deposited in the GenBank short read archive, and processed data files will be deposited in DataDryad. What do you plan to do during the next reporting period to accomplish the goals? The next reporting period will be focused on data analysis related to Objectives II and III. We recently received the final sequence data and graduate student Alex Sandercock is in the process of completing the bioinformatics tasks. Alex developed a pipeline for bioinformatic processing of these large sequence files that involves splitting the data into individual chromosomes and completing the alignment and variant calling steps in parallel on our HPC systems. Alex has also been testing a variety of software tools for the analysis of population structure (e.g., 'Admixture', 'Discriminant Analysis of Principal Components', 'Uniform Manifold Approximation and Projection'), demographic history (e.g., 'Sequential Markovian Coalescent', 'NeEstimator', 'SNeP'), and genotype-environment relationships (e.g., 'Latent Factor Mixed Models', 'BAYENV'). We describe some preliminary results from these analyses above, and will repeat these and others once the full SNP set from all 384 samples is available.

Impacts
What was accomplished under these goals? Sampling: To date, 384 American chestnut trees have been sampled and had their DNA extracted for this study. The trees were sampled throughout the American chestnut geographical range and from different ecoregions. Chestnut leaf samples were obtained by TACF and citizen volunteers in each region. Leaf samples were primarily collected from May through July in 2018, 2019, and 2020 with a preference for young leaves, and the GPS coordinates of each sample location were recorded. When possible, leaves were kept cool using wet packs and shipped cool to preserve the DNA. If wet packs were not available, leaves were desiccated with silica gel for shipping. DNA Isolation/Sequencing: Upon receiving the samples, we cataloged and stored the leaves at -80 °C. For the first 96 DNA extractions, we used Qiagen's DNeasy Plant DNA extraction kit, modified with a phenol-chloroform cleanup step instead of the Qiagen "shredder" column. This modification was used to reduce the chance of DNA loss in the cleanup steps since the leaves that were used were older. When DNA concentration was low, a secondary CTAB-based extraction was used. For samples 97 through 384, we ground samples to a powder using a Spex 2000 Geno/Grinder and used Qiagen's DNeasy Plant DNA extraction kit, modified with an additional 100% ethanol wash step to remove salts that carried over from the precipitation steps. For each protocol, DNA extracts were evaluated for quality and quantity with a Nanodrop and Qubit, respectively. DNA was then stored in 100-200 µl of AE solution at -20 °C. Library preparation and genomic sequencing were conducted at the HudsonAlpha Institute for Biotechnology (HAI). In collaboration with HAI, genomic DNA (gDNA) samples were sequenced on an Illumina NovaSeq 6000 instrument. The Illumina S4 flow cell was run in 2x150 bp paired-end mode. Of the 384 samples sequenced, 86 had greater than 20x coverage, 242 had 10-20x coverage, and 56 had less than 10x coverage. 
Bioinformatics: The bioinformatics analyses were performed on Virginia Tech's Advanced Research Computing system due to the large dataset and required computing resources. A reference genome for American chestnut (previously completed by HAI) was used for this analysis. SNPs were called using a custom pipeline adapted from the Broad Institute's Genome Analysis Toolkit (GATK) best practices. The individual FASTQ files received from HudsonAlpha were aligned using the Burrows-Wheeler Aligner (BWA) mem algorithm with the C. dentata genome as a reference. The resulting SAM files were converted to BAM format and then sorted and indexed using SAMtools (Li & Durbin 2010; Li et al. 2009). The GATK HaplotypeCaller algorithm (McKenna et al. 2010; Poplin et al. 2017) was used to call polymorphisms (SNPs and INDELs) by chromosome, and GatherVcfs was used to combine the individual sample chromosome GVCF files into a single GVCF file for each sample. The samples were then passed through the GATK GenotypeGVCFs tool to perform joint genotyping, and the resulting VCF output file was filtered using the GATK VariantFiltration algorithm. The following flags were used for filtering: low map quality (MQ < 40); high strand bias (FS > 40); differential map quality between reads supporting the reference and alternative alleles (MQRankSum < -12.5); bias between the reference and alternate alleles in the position of alleles within the reads (ReadPosRankSum < -8.0); and low depth of coverage (DP < 5). Preliminary analysis: So far, 192 samples have been processed through the bioinformatics step. The 192 samples were processed as two sets of 96 samples due to the large file sizes. The first dataset contained ~47 million SNPs and INDELs, and the second contained ~57 million SNPs and INDELs. These datasets will be combined when the remaining 192 samples have completed the bioinformatics step. Preliminary analyses evaluating population structure and demographic history were performed using the first 96 samples. 
DAPC and ADMIXTURE analyses were used to estimate population structure, and preliminary results suggest two and three populations, respectively. Both analyses are in agreement with an independent northeastern population, which is consistent with previous studies (Müller et al. 2018). We also identified a distinct Alabama population from the ADMIXTURE analysis, which is novel, and may reflect the glacial refuge for this species, or possibly introgression with other Castanea species in this area (e.g., chinquapins). Additionally, we estimated the demographic history of C. dentata using SMC++ (Terhorst et al. 2017) and assumed a generation time of 30 years. American chestnut population size most likely declined several times in the past, beginning ~2.7 million years ago before rapidly recovering ~50 thousand years ago. Thus, early results suggest that the American chestnut population structure can be described by a two or three population model and that multiple past demographic events influenced current genomic diversity.
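
The hard-filter thresholds listed in the Bioinformatics paragraph above can be expressed as a short Python sketch. This is an illustration of the thresholds only, not the GATK VariantFiltration invocation used in the project: the helper names `parse_info` and `failed_filters` and the filter labels (`lowMQ`, `highFS`, and so on) are hypothetical, and the sketch assumes each annotation is read as a single numeric value from a VCF INFO field. Missing annotations are skipped rather than treated as failures.

```python
def parse_info(info_field):
    """Parse a VCF INFO string such as 'DP=12;MQ=60.0' into a dict of floats."""
    info = {}
    for item in info_field.split(";"):
        key, sep, value = item.partition("=")
        if not sep:
            continue  # flag-type annotation with no value; not used here
        try:
            info[key] = float(value)
        except ValueError:
            pass  # non-numeric or multi-valued annotation; ignored in this sketch
    return info


def failed_filters(info):
    """Return the labels of the hard filters that a variant fails."""
    checks = [
        ("lowMQ", "MQ", lambda v: v < 40.0),                          # low mapping quality
        ("highFS", "FS", lambda v: v > 40.0),                         # strand bias
        ("lowMQRankSum", "MQRankSum", lambda v: v < -12.5),           # ref/alt mapping quality difference
        ("lowReadPosRankSum", "ReadPosRankSum", lambda v: v < -8.0),  # ref/alt read position bias
        ("lowDP", "DP", lambda v: v < 5),                             # low depth of coverage
    ]
    return [label for label, key, fails in checks
            if key in info and fails(info[key])]


# Hypothetical INFO strings, for illustration only.
print(failed_filters(parse_info("DP=12;MQ=60.0;FS=1.2;MQRankSum=0.1;ReadPosRankSum=-0.5")))
print(failed_filters(parse_info("DP=3;MQ=35.0;FS=55.0")))
```

A variant with an empty result would pass all five filters; any returned label marks the variant for removal or flagging.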

Publications

  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Isabel N, Holliday JA, Aitken SN (2020) Forest genomics: Advancing climate adaptation, forest health, productivity, and conservation. Evolutionary Applications 13(1): 3-10.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: Sandercock A, Westbrook J, Holliday, JA. Annual Meeting of NIFA Project NE-1833 (Biological Improvement of Chestnut through Technologies that Address Management of the Species and its Pathogens and Pests). Landscape genomics of the American chestnut. September 17, 2020 (Virtual).
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Carlson JE, Staton ME, Quaye CA, Cannon N, Zhebentyayeva T, Islam-Faridi N, Yu J, Huff M, Mandal M, Lasky JR, Noorai RE, Lasky JR, Saski C, Ficklin S, Drautz-Moses DI, Fitzsimmons S, Fan S, Conrad A, Schuster SC, Abbott AG, Westbrook J, Holliday JA, Nelson CD, Georgi L, Hebard FV (2020) A reference genome assembly and adaptive trait analysis of Castanea mollissima Vanuxem, a source of resistance to chestnut blight in restoration breeding. Tree Genetics and Genomes. 16(57).


Progress 02/15/19 to 02/14/20

Outputs
Target Audience: The primary audience for this project will be public and private landowners interested in restoring American chestnut to Appalachian forests. There is widespread recognition in the region of the economic and ecological role this species previously played on the landscape, as evidenced by the citizen science undertaken by TACF volunteers over the last three decades. The goal of blight-resistant chestnuts has nearly been realized, and this project will support the final piece of the restoration puzzle, namely, diversifying those resistant trees so they can be planted across the climatically heterogeneous native range of the species. Changes/Problems: We had some issues with gDNA quality/quantity, mainly from older leaves. Because the library preparation method employed by HudsonAlpha does not use PCR (which has the advantage that no PCR duplicates will be present in the data), a higher amount of DNA is required. Despite extensive efforts to optimize our protocol, we decided that we would simply re-collect those problematic samples early in the growing season of 2020. What opportunities for training and professional development has the project provided? By integrating molecular biology, bioinformatics, and genecology, this project has provided interdisciplinary education and training at various levels. During spring 2019, we were fortunate to recruit a PhD student, Alexander Sandercock, to work on the project. Mr. Sandercock has a wealth of experience in conservation biology and genetics and quickly took the lead in managing the collections, extracting DNA, and coordinating with the sequencing core at HAI. We have also had four undergraduate students working on the project, mostly assisting with DNA extraction and sample management. Finally, Research Associate Qian Zhang, who manages our laboratory, has assisted with optimizing our gDNA extraction protocols to meet the quantity/quality requirements set by the sequencing facility. 
How have the results been disseminated to communities of interest? During this reporting period we published two journal articles and presented at two conferences. Included among these conferences were two presentations at American Chestnut Foundation meetings, which were aimed at coordination/collaboration between our group and the extensive network of citizen scientists affiliated with the various TACF state chapters. Once completed, our next generation sequencing data will be deposited in the GenBank short read archive, and processed data files will be deposited in DataDryad. What do you plan to do during the next reporting period to accomplish the goals? Our next reporting period will be focused on completing collection and sequencing of all ~500 samples, completing bioinformatics on the resulting data, and beginning analyses outlined in Objectives II and III.

Impacts
What was accomplished under these goals? Objective I. During spring and summer of 2019, in conjunction with The American Chestnut Foundation, we collected samples from >500 wild re-sprouts of American chestnut. Genomic DNA (gDNA) was extracted from these samples using either a modified Qiagen kit or the CTAB method. In the course of these extractions, we realized that some of the samples were collected too late in the season, and despite extensive efforts to optimize the extraction procedure, the yield and/or quality was too low in ~250 trees for the PCR-free kit our collaborators at the HudsonAlpha Institute for Biotechnology (HAI) are using to construct whole-genome sequence (WGS) libraries. Specifically, because this kit does not use PCR, the input amounts of DNA must be considerably higher than for kits that have a PCR step. We were able to obtain high quality DNA of sufficient concentration for WGS from 269 samples. As a result, we will complete additional sampling this spring, targeting only young, recently emerged leaves that give the highest DNA yield. In fall of 2019, we sent an initial set of 96 samples to HAI for library preparation and sequencing. Staff at HAI completed an additional round of QC on these and subsequently made libraries and sequenced them on an Illumina NovaSeq instrument, with a target coverage of 20X per sample. We subsequently sent an additional 96 samples for library prep and sequencing in 2020. The remaining samples for which we have high quality DNA are being held until additional sampling is completed in spring 2020, after which we will extract DNA from the newly collected samples and ship a final set of 288 samples to HAI during summer 2020, for a total of 480 trees sequenced with WGS.

Publications

  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Westbrook JW, Zhang Q, Mandal MK, Jenkins EV, Barth LE, Jenkins JW, Grimwood J, Schmutz J, Holliday JA (2019) Optimizing genomic selection for blight resistance in American chestnut backcross populations: A trade-off with American chestnut ancestry implies resistance is polygenic. Evolutionary Applications 13(1).
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Westbrook JW, Holliday JA, Newhouse A, Powell WA (2020) A plan to diversify transgenic blight-tolerant American chestnut population. Plants, People, Planet 2(1).
  • Type: Conference Papers and Presentations Status: Published Year Published: 2019 Citation: Genomics to Accelerate American Chestnut Restoration. Virginia Chapter, The American Chestnut Foundation, Annual Meeting Guest Speaker, 11/16/2019
  • Type: Conference Papers and Presentations Status: Published Year Published: 2019 Citation: Genomics of Local Adaptation in Trees. The American Chestnut Foundation Annual Meeting. Gettysburg, PA, October 18-19, 2019