Progress 02/15/19 to 02/14/24
Outputs Target Audience: The primary audience for this project will be public and private landowners interested in restoring American chestnut to Appalachian forests. There is widespread recognition in the region of the economic and ecological role this species previously played on the landscape, as evidenced by the citizen science undertaken by TACF volunteers over the last three decades. The goal of blight-resistant chestnuts has nearly been realized, and this project will support the final piece of the restoration puzzle, namely, diversifying those resistant trees so they can be planted across the climatically heterogeneous native range of the species. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? By integrating molecular biology, bioinformatics, and genecology, this project has provided interdisciplinary education and training at various levels. Former PhD candidate Alexander Sandercock was primarily responsible for data generation and analyses associated with this project, which resulted in a first-author publication in Molecular Ecology in 2022 and another that has been published in PNAS. Dr. Sandercock defended his thesis on May 26, 2023, and has since moved to a post-doctoral position with USDA/Cornell Breeding Insight, where he is leveraging the skills he has gained in bioinformatics and computational biology to advance complex trait dissection in specialty crops. How have the results been disseminated to communities of interest? Sequence data have been deposited in the GenBank short read archive (PRJNA804196), and the high-quality American and haplotype-resolved Chinese chestnut genomes generated by our allied projects with the HudsonAlpha Institute are available on Phytozome. What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
The American chestnut (Castanea dentata), once a prominent hardwood species in the eastern United States, was devastated by the introduction of the fungal pathogen Cryphonectria parasitica in the early 20th century, leading to the functional extinction of the species. Despite the survival of millions of trees as root collar sprouts, these trees rarely reproduce. We are developing blight-resistant American chestnuts through interspecific hybridization, conspecific backcrossing, and genetic engineering, but success relies on incorporating adaptive genomic diversity from wild germplasm to produce locally adapted restoration populations across varying climatic conditions. To inform these efforts, we conducted a comprehensive analysis of 384 wild American chestnut trees to assess population structure, demographic history, and genomic diversity. Our study identified three distinct genetic populations (northeast, central, and southwest), each with unique adaptive allele frequencies. We found the highest genomic diversity in the southwest, reflecting historical bottleneck events associated with Quaternary glaciation. Genomic regions under positive selection suggest a common evolutionary response to fungal pathogens across populations, and, taken together, our results indicate that American chestnut underwent postglacial expansion from the southern part of its range. To develop sampling recommendations for ex situ conservation of wild adaptive genetic variation, we identified polymorphisms with evidence of past climate-related selection and found that, on the basis of this subset of SNPs, the species range can be subdivided into three seed zones and that 21 to 29 trees per seed zone will need to be conserved to capture most extant adaptive diversity. Additionally, we evaluated 269 backcross trees to understand the extent to which breeding programs have captured wild adaptive diversity.
This analysis provided insights into optimal reintroduction sites for specific families based on their adaptive profiles and projected future climate conditions. Overall, our results offer a strategic blueprint for the ex situ conservation of germplasm and the targeted reintroduction of blight-resistant American chestnut populations. This approach can be applied to restore the American chestnut across its native range and offers a model for developing restoration plans for other imperiled tree species.
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2024
Citation:
Sandercock A.M., Westbrook J.W., Zhang Q., Holliday J.A. (2024). A genome-guided strategy for climate resilience in American chestnut restoration populations. Proceedings of the National Academy of Sciences, 121 (30) e2403505121. https://doi.org/10.1073/pnas.2403505121
Progress 02/15/23 to 02/14/24
Outputs Target Audience: The primary audience for this project will be public and private landowners interested in restoring American chestnut to Appalachian forests. There is widespread recognition in the region of the economic and ecological role this species previously played on the landscape, as evidenced by the citizen science undertaken by TACF volunteers over the last three decades. The goal of blight-resistant chestnuts has nearly been realized, and this project will support the final piece of the restoration puzzle, namely, diversifying those resistant trees so they can be planted across the climatically heterogeneous native range of the species. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? By integrating molecular biology, bioinformatics, and genecology, this project has provided interdisciplinary education and training at various levels. Former PhD candidate Alexander Sandercock was primarily responsible for data generation and analyses associated with this project, which resulted in a first-author publication in Molecular Ecology last year and another that has been accepted pending revision in PNAS. Dr. Sandercock defended his thesis on May 26, 2023, and has since moved to a post-doctoral position with USDA/Cornell Breeding Insight, where he is leveraging the skills he has gained in bioinformatics and computational biology to advance complex trait dissection in specialty crops. How have the results been disseminated to communities of interest? In addition to the journal publications and conference presentations listed in this report, our sequencing data have been deposited in the GenBank short read archive (PRJNA804196), and the high-quality American and haplotype-resolved Chinese chestnut genomes generated by our allied projects with the HudsonAlpha Institute are available on Phytozome (https://phytozome-next.jgi.doe.gov/info/Cdentata_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaMahoganyHAP1_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaMahoganyHAP2_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaNankingHAP2_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaNankingHAP1_v1_1). What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
We showed previously that adaptive diversity across the American chestnut range can be split into three distinct groups: southern, central, and northern. These divisions corresponded with temperature and precipitation gradients, suggesting that climate and population structure significantly influenced genomic variation. We used a re-sampling approach to show that collecting pollen from approximately 20-30 trees in each zone effectively captures adaptive diversity for future breeding efforts, and plans for this sampling are being developed. Using whole-genome sequencing of 371 selected backcross trees from the breeding program of The American Chestnut Foundation (TACF), we also found that historical sampling of wild pollen has been reasonably effective at capturing this diversity, but careful construction of future crosses will be necessary to avoid its loss in future generations. Breeding simulations were performed in AlphaSimR to assess how allelic diversity and correlations with wild-type allele frequencies varied with increasing numbers of crosses among backcross and wild-type parent trees currently planted in TACF orchards. We found that greater than 95% of the allelic diversity was represented with 30 crosses in each seed zone, with only marginal gains from additional crosses. During the current reporting period we primarily focused on completing a manuscript detailing the above results, which has been preliminarily accepted for publication in the Proceedings of the National Academy of Sciences.
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2023
Citation:
Conn C.E., Howie N., Lynch M., Lee S., Young E., Westbrook J., Holliday J., Zhang Q. & Cipollini M.L. (2023). Validation of an Alternative Small Stem Assay for Blight Resistance in Chestnut Seedlings and Recommendations for Broader Use. Plant Disease, 107(5), 1576-1583. doi:10.1094/PDIS-06-22-1489-RE
- Type:
Journal Articles
Status:
Other
Year Published:
2024
Citation:
(Accepted subject to revision) Sandercock A., Westbrook J., Zhang Q. & Holliday J. (2024). The road to restoration: Identifying and conserving the adaptive legacy of American chestnut. Proceedings of the National Academy of Sciences.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2023
Citation:
Holliday J., Westbrook J., Sandercock A. & Malukiewicz J. (2023). Quantitative, functional, and comparative genomic tools for species restoration: the case of American chestnut. In Southern Forest Tree Improvement Conference. Knoxville, TN.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2023
Citation:
Holliday J., Westbrook J., Malukiewicz J. & Sandercock A. (2023, September 27). Quantitative, functional, and comparative genomic tools for species restoration: the case of American chestnut. In VII Encuentro Científico en Biología Vegetal y Biotecnología, de moléculas a ecosistemas. University of Talca, Chile.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2023
Citation:
Holliday J., Sandercock A. & Westbrook J. (2023). Genomic tools for American chestnut restoration. In Forest Genetics 2023. Vernon, BC, Canada.
Progress 02/15/22 to 02/14/23
Outputs Target Audience: The primary audience for this project will be public and private landowners interested in restoring American chestnut to Appalachian forests. There is widespread recognition in the region of the economic and ecological role this species previously played on the landscape, as evidenced by the citizen science undertaken by TACF volunteers over the last three decades. The goal of blight-resistant chestnuts has nearly been realized, and this project will support the final piece of the restoration puzzle, namely, diversifying those resistant trees so they can be planted across the climatically heterogeneous native range of the species. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? By integrating molecular biology, bioinformatics, and genecology, this project has provided interdisciplinary education and training at various levels. PhD candidate Alexander Sandercock has been primarily responsible for data generation and analyses associated with this project, which resulted in a first-author publication last year and another to be submitted by the end of summer 2023. Mr. Sandercock will be defending his thesis on May 26, 2023, and will subsequently be moving on to a post-doctoral position with USDA/ARS Breeding Insight, where he will leverage the skills he has gained in bioinformatics and computational biology to advance complex trait dissection in specialty crops. How have the results been disseminated to communities of interest? During this reporting period we published two journal articles and presented at one online meeting organized by The American Chestnut Foundation. Our sequencing data have been deposited in the GenBank short read archive (PRJNA804196), and the high-quality American and haplotype-resolved Chinese chestnut genomes generated by our allied projects with the HudsonAlpha Institute are available on Phytozome (https://phytozome-next.jgi.doe.gov/info/Cdentata_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaMahoganyHAP1_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaMahoganyHAP2_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaNankingHAP2_v1_1; https://phytozome-next.jgi.doe.gov/info/CmollissimaNankingHAP1_v1_1). What do you plan to do during the next reporting period to accomplish the goals? We are currently in a 6-month no-cost extension and are focused on completing a manuscript describing patterns of local adaptation in wild American chestnut populations, as well as in the backcross breeding program, and how this information can and will be used to develop the ex situ conservation plan that was the ultimate goal of this project.
Impacts What was accomplished under these goals?
Identifying loci underlying climatic adaptation and defining seed zones
We used our 21 million SNP whole-genome sequencing dataset to pinpoint loci associated with climate, define seed zones for germplasm conservation, and create a new method for sampling wild adaptive diversity. Our past work indicated that the American chestnut range was split into three distinct groups: southern, central, and northern. These divisions corresponded with temperature and precipitation gradients, suggesting that climate and population structure significantly influenced genomic variation. Our models demonstrated that climate contributed the most to explainable variance. In assessing genotype-environment associations (GEA), we used ten climate variables. To manage computational demands and limit the need for multiple test corrections, we replaced the environmental variables with three PC axes. This approach identified a total of 18,483 potentially adaptive SNPs across two GEA methods. We hypothesized that climate-related genomic variation in the American chestnut could be divided into seed zones reflecting eastern US temperature and precipitation gradients. Models indicated that two or three zones were likely, with the latter dividing the range into north, central, and south regions. The genetic differentiation between these seed zones was significantly higher at adaptive than at neutral loci, indicating that while the number of seed zones was the same as the number of background populations, our approach successfully identified loci related to local adaptation. Our previous work showed that genomic diversity in American chestnut decreased with increasing latitude, suggesting that a lower sampling intensity would be needed in the southernmost Zone 3 and a higher intensity in the northern zones. We found this to be true, with fewer trees needed in Zone 3, and more in Zone 1, to achieve specific diversity targets.
We also found that reaching a 99% allele frequency match required approximately 10 times the sampling intensity of a 90% match. This finding has implications for how many trees would need to be sampled depending on the seed zone model used.
The adaptive capacity of the backcross breeding program
We also conducted whole-genome sequencing on 371 selected backcross trees from the breeding program of The American Chestnut Foundation (TACF). Our goals were to quantify the wild adaptive diversity within the backcross populations and identify the most suitable areas for reintroducing the backcross families. Although TACF and its state chapters maintain breeding orchards across the species' historical range, pollen collections favored flowering wild trees from the central area, Seed Zone 2. As a result, we predicted a Seed Zone 2 ancestry bias within the backcross samples. Indeed, ancestry for TACF and central state chapter breeding materials primarily traced back to Seed Zone 2. Of the backcross trees, about 76% contained ancestry from at least two seed zones. The allele-frequency distributions for adaptive loci differed between the two groups: the wild population was shifted towards medium-frequency alleles and the backcross population towards low-frequency alleles. Despite this, the overall correspondence between the wild and backcross populations was high, with the backcross population explaining approximately 80% of allele frequency variation in the wild population. Given the complex ancestry of the backcross trees, determining the best geography and climate for each tree based solely on pedigree information is challenging. We used Locator software to estimate the most suitable seed zone for each tree. This software proved reasonably accurate, averaging an error of 169.82 km for wild-type trees and 313.16 km for backcross trees.
Although the backcross population contained adaptive genomic diversity from throughout the natural range, it contained the least from the southern Seed Zone 3. We will therefore recommend enhanced sampling intensity from Seed Zone 3. In the future, we hope to develop reciprocal common gardens to measure phenotypic reaction norms to different climates and to test our predicted matching of genotypes to geography and climate.
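The re-sampling idea described above (how many trees must be sampled before the sample's allele frequencies match those of the whole seed zone) can be sketched as follows. This is an illustrative sketch only, not the project's actual method: the genotypes are simulated, and the match statistic (squared correlation between subsample and full-zone allele frequencies), the function names, and the replicate count are all assumptions made for the example.

```python
import numpy as np

def allele_freqs(genotypes):
    # genotypes: (n_trees, n_snps) array of 0/1/2 alternate-allele counts
    return genotypes.mean(axis=0) / 2.0

def match_r2(sub_freqs, full_freqs):
    # Assumed match statistic: squared Pearson correlation of allele frequencies
    r = np.corrcoef(sub_freqs, full_freqs)[0, 1]
    return r * r

def trees_needed(genotypes, target, n_reps=50, seed=0):
    """Smallest subsample size whose mean match over n_reps random draws reaches target."""
    rng = np.random.default_rng(seed)
    full = allele_freqs(genotypes)
    n_trees = genotypes.shape[0]
    for n in range(2, n_trees + 1):
        scores = []
        for _ in range(n_reps):
            idx = rng.choice(n_trees, size=n, replace=False)
            scores.append(match_r2(allele_freqs(genotypes[idx]), full))
        if np.mean(scores) >= target:
            return n
    return n_trees

# Simulated stand-in for one seed zone: 100 trees, 300 putatively adaptive SNPs
sim_rng = np.random.default_rng(1)
true_freqs = sim_rng.uniform(0.05, 0.95, size=300)
genotypes = sim_rng.binomial(2, true_freqs, size=(100, 300))

n90 = trees_needed(genotypes, target=0.90)
n99 = trees_needed(genotypes, target=0.99)
print(f"trees for 90% match: {n90}; trees for 99% match: {n99}")
```

Because both calls use the same seed, the stricter target can never need fewer trees than the looser one; the size of the gap depends on the simulated data and on the chosen match statistic.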
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2022
Citation:
Sandercock AM, Westbrook J, Zhang Q, Johnson H, Saielli T, Scrivani J, Fitzsimmons S, Collins K, Schmutz J, Grimwood J, Holliday JA (2022) Frozen in time: Rangewide genomic diversity, structure, and demographic history of relict American chestnut populations. Molecular Ecology 31 (18), 4640-4655.
- Type:
Journal Articles
Status:
Published
Year Published:
2022
Citation:
Conn CE, Howie N, Lynch M, Lee S, Young E, Westbrook JW, Holliday JA, Zhang Q, Cipollini M (2022) Validation of an alternative small stem assay for blight resistance in backcross hybrid chestnuts (Castanea spp.) and recommendations for its expanded use. Plant Disease. 2022/11/16.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2022
Citation:
Holliday JA, Sandercock A, Westbrook J. Discovery of candidate genes for blight and root rot resistance in Castanea. TACF Annual Meeting (Invited). Sept 30-Oct , 2022.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2022
Citation:
Holliday JA, Sandercock A, Westbrook J. Genomic tools for species restoration: the case of American chestnut (Castanea dentata). IUFRO Tree Biotechnology Conference (Invited). July 6-8, 2022.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2022
Citation:
Sandercock, A., Holliday, J., & Westbrook, J. (2021) Landscape genomics of American chestnut. In TACF Science and Technology Committee Annual Meeting. Online.
Progress 02/15/21 to 02/14/22
Outputs Target Audience: The primary audience for this project will be public and private landowners interested in restoring American chestnut to Appalachian forests. There is widespread recognition in the region of the economic and ecological role this species previously played on the landscape, as evidenced by the citizen science undertaken by TACF volunteers over the last three decades. The goal of blight-resistant chestnuts has nearly been realized, and this project will support the final piece of the restoration puzzle, namely, diversifying those resistant trees so they can be planted across the climatically heterogeneous native range of the species. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? By integrating molecular biology, bioinformatics, and genecology, this project has provided interdisciplinary education and training at various levels. During spring 2019, we were fortunate to recruit a PhD student, Alexander Sandercock, to work on the project. Mr. Sandercock has a wealth of experience in conservation biology and genetics and quickly took the lead in managing the collections, extracting DNA, and coordinating with the sequencing core at HAI. Prior to the pandemic, we also had four undergraduate students working on the project, mostly assisting with DNA extraction and sample management. Finally, Research Associate Qian Zhang, who manages our laboratory, has assisted with optimizing our gDNA extraction protocols to meet the quantity/quality requirements set by the sequencing facility. How have the results been disseminated to communities of interest? During this reporting period Ph.D. student Alex Sandercock presented at one online meeting organized by The American Chestnut Foundation. Alex also drafted a manuscript focused on his analyses of diversity, population structure, and demographic history in chestnut (Objective II), which has been submitted as a preprint to bioRxiv and will be submitted to a journal for publication shortly. With this submission, our whole-genome sequence data will be released in the sequence read archive at NCBI. Finally, we published an article in Chestnut (the Journal of The American Chestnut Foundation), which summarizes our findings to date with respect to genetic diversity in wild chestnut populations. What do you plan to do during the next reporting period to accomplish the goals? The next reporting period will be focused on data analysis related to Objectives III and IV. Specifically, we will test for genotype-environment relationships using Latent Factor Mixed Models (LFMM) and BayPass software.
We will also use redundancy analysis (RDA) to test for multivariate genotype-environment relationships. RDA is a constrained ordination approach in which multivariate regressions are fitted between a set of predictors (i.e., environmental and geographic variables) and response variables (in this case, genotype data). The outcome of these analyses will be a large set of candidate loci for local adaptation to be used in subsequent modeling of multilocus genome-environment relationships. We will then partition our three populations (or management units) into adaptive units using Multivariate Random Forests (MRF). Specifically, we will first reduce the dimensionality of the genotype matrix containing adaptive loci with PCA, retaining those principal components that capture approximately 90% of the total variance. We will then build MRFs with individual PC loadings as the response variables and geography (latitude, longitude, elevation) as predictors. Each management unit will be treated as an independent group in this analysis, such that the result is a set of geographic coordinates that define relatively homogeneous provenances with respect to the frequencies of adaptive alleles, and these provenances will form the basis for ex situ conservation efforts.
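The dimensionality-reduction step in this plan (keep the leading principal components that together capture about 90% of the variance in the adaptive-locus genotype matrix) can be sketched with NumPy alone. This is a sketch under stated assumptions: the genotype matrix is simulated, the function name is hypothetical, and the subsequent random-forest step is not shown. The per-individual coordinates returned here are what would be passed on as response variables.

```python
import numpy as np

def pcs_for_variance(genotypes, target=0.90):
    """Return per-individual PC coordinates covering at least `target` of total variance."""
    centered = genotypes - genotypes.mean(axis=0)
    # SVD of the centered matrix: squared singular values are proportional to PC variances
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # First index at which the cumulative proportion reaches the target
    k = min(int(np.searchsorted(cumulative, target)) + 1, len(s))
    scores = u[:, :k] * s[:k]
    return scores, cumulative[:k]

# Simulated stand-in: 60 trees genotyped (0/1/2) at 500 candidate adaptive loci
rng = np.random.default_rng(7)
freqs = rng.uniform(0.1, 0.9, size=500)
genotypes = rng.binomial(2, freqs, size=(60, 500)).astype(float)

scores, cumulative = pcs_for_variance(genotypes, target=0.90)
print(f"retained {scores.shape[1]} PCs explaining {cumulative[-1]:.3f} of the variance")
```

With real data, strong population structure would typically concentrate variance in far fewer components than in this unstructured simulation.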
Impacts What was accomplished under these goals?
During this reporting period we used our whole-genome sequencing (WGS) data for each of 384 American chestnut genotypes, sampled from across the entire historical species range, as well as a reference panel comprising all congeners, to estimate population structure, demographic history, genomic diversity, and signatures of selection. We also performed WGS on a Castanea species reference panel of 96 individuals to detect potential hybrid ancestry in the putative C. dentata samples. The reference panel included 19 C. sativa, 15 C. pumila var. pumila, 10 C. pumila var. ozarkensis, 6 C. pumila var. alabamensis, 4 C. dentata, 1 C. dentata x mollissima hybrid, 1 C. seguinii, 2 C. henryi, 18 C. crenata, and 20 C. mollissima. Of the 384 C. dentata samples sequenced, 86 had greater than 20x coverage, 242 had 10-20x coverage, and 56 had less than 10x coverage. Eighteen samples with greater than 10% missing data were removed, and 10 additional samples were removed that had > 10% cluster membership with one or more of the Castanea species reference samples based on ADMIXTURE analysis. The final C. dentata dataset contained 356 individuals with an average coverage of ~17x and 21,116,005 high-quality SNPs. The Castanea species reference dataset contained 92 samples and 49,309,429 SNPs that passed the filtering criteria. Population structure within C. dentata, estimated with Discriminant Analysis of Principal Components (DAPC) and ADMIXTURE software, was best explained by a two- or three-population model. The three-population ADMIXTURE model was characterized by a southwest, central, and northeast cluster. The southwest and central populations separated in northern Georgia and eastern Tennessee, while the central and northeast populations have an area of admixture in Pennsylvania before becoming more distinctly separated in southern New York. The two-population DAPC model included the same southern population and boundary as ADMIXTURE, but the central and northeastern populations were merged.
The two analyses mostly agreed on population membership at the same K values. SMC++ estimates of effective population size (Ne) over time suggest that each population underwent contractions and expansions beginning approximately two million years ago. All populations followed a similar pattern of demographic history; however, the southwest population lagged the central and northeastern populations' events by approximately 100,000 years. Ne rapidly increased for all three populations approximately 6,700-11,700 years ago, after which the central population underwent an additional contraction within the past 7,000 years. The southwest population had the highest contemporary Ne, followed by the northeast and central populations (Ne(southwest) = 20,306; Ne(central) = 8,347; Ne(northeast) = 13,078). The southwest population had the greatest nucleotide diversity, followed by the central and northeast populations (π(southwest) = 0.0069; π(central) = 0.0064; π(northeast) = 0.0058). All populations had negative average Tajima's D, which was similarly clinal (D(southwest) = -1.083; D(central) = -1.016; D(northeast) = -0.335). Consistent with these negative values for Tajima's D, there was a deficiency of rare variants and an excess of high-frequency variants in each population, suggesting recent expansion following a bottleneck. Sliding window analyses revealed heterogeneous genome-wide Tajima's D, nucleotide diversity, and FST. Throughout the genome, the southwest population had the most negative Tajima's D values, followed by the central and northeast populations. Conversely, the southwest population had the highest nucleotide diversity values throughout the genome, with decreasing values for the central and northeast populations. The highest FST values were attributed to the southwest-northeast population pair.
Finally, we identified genomic regions under selection within each population, which suggests that defense against fungal pathogens is a common target of selection across all populations. Taken together, these results suggest that American chestnut underwent a postglacial expansion from the southern portion of its range leading to three extant populations. These populations will serve as management units for breeding adaptive genetic variation into the blight-resistant tree populations for targeted reintroduction efforts.
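For readers less familiar with the summary statistics reported above, the following minimal sketch computes nucleotide diversity (here as the total number of pairwise differences, not scaled per base pair) and Tajima's D from a small matrix of phased 0/1 alleles using the standard textbook formulas. It is not the sliding-window software used in the project; the function name and the two toy datasets are assumptions made for illustration.

```python
import math
import numpy as np

def tajimas_d(haplotypes):
    """haplotypes: (n_chromosomes, n_sites) array of 0/1 alleles. Returns (pi, D)."""
    hap = np.asarray(haplotypes)
    n = hap.shape[0]
    counts = hap.sum(axis=0)
    segregating = (counts > 0) & (counts < n)
    s = int(segregating.sum())
    if s == 0:
        return 0.0, float("nan")
    k = counts[segregating].astype(float)
    # Mean number of pairwise differences, summed over segregating sites
    pi = float(np.sum(2.0 * k * (n - k)) / (n * (n - 1)))
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # D compares pi with the Watterson-style estimate S / a1
    d = (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return pi, d

# Toy data, 10 chromosomes x 10 sites:
# every site a singleton (alternate allele seen once)
singletons = np.eye(10, dtype=int)
# every site at intermediate frequency (alternate allele on 5 of 10 chromosomes)
intermediate = np.vstack([np.ones((5, 10), dtype=int), np.zeros((5, 10), dtype=int)])

pi_rare, d_rare = tajimas_d(singletons)
pi_mid, d_mid = tajimas_d(intermediate)
print(f"singletons: pi={pi_rare:.3f}, D={d_rare:.3f}")
print(f"intermediate: pi={pi_mid:.3f}, D={d_mid:.3f}")
```

In the toy data, the all-singleton matrix gives a negative D and the intermediate-frequency matrix gives a positive D, which shows how the sign of the statistic tracks the shape of the allele-frequency spectrum.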
Publications
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2021
Citation:
Sandercock, A., Holliday, J., & Westbrook, J. (2021). Landscape genomics of American chestnut. In TACF Science and Technology Committee Annual Meeting. Online.
- Type:
Journal Articles
Status:
Published
Year Published:
2022
Citation:
Sandercock AM, Westbrook J, Zhang Q, Johnson H, Saielli T, Scrivani J, Fitzsimmons S, Collins K, Schmutz J, Grimwood J, Holliday JA (2022) Whole-genome resequencing reveals the population structure, genomic diversity, and demographic history of American chestnut (Castanea dentata). bioRxiv. doi: https://doi.org/10.1101/2022.02.11.480151
- Type:
Other
Status:
Published
Year Published:
2022
Citation:
Sandercock AM, Westbrook J, Holliday JA (2022) The history and landscape of genetic diversity in American Chestnut. Chestnut: Journal of The American Chestnut Foundation.
Progress 02/15/20 to 02/14/21
Outputs Target Audience: The primary audience for this project will be public and private landowners interested in restoring American chestnut to Appalachian forests. There is widespread recognition in the region of the economic and ecological role this species previously played on the landscape, as evidenced by the citizen science undertaken by TACF volunteers over the last three decades. The goal of blight-resistant chestnuts has nearly been realized, and this project will support the final piece of the restoration puzzle, namely, diversifying those resistant trees so they can be planted across the climatically heterogeneous native range of the species. Changes/Problems: Not surprisingly, the pandemic slowed our progress somewhat. We were unable to use the lab for DNA extractions during spring 2020 and have been greatly limited in capacity since then. Our collaborators at HudsonAlpha shifted focus somewhat to sequencing viral genomes, and this slowed their progress in sequencing our libraries. Nevertheless, we now have all data in hand and expect to make rapid progress with analysis in the coming year. What opportunities for training and professional development has the project provided? By integrating molecular biology, bioinformatics, and genecology, this project has provided interdisciplinary education and training at various levels. During spring 2019, we were fortunate to recruit a PhD student, Alexander Sandercock, to work on the project. Mr. Sandercock has a wealth of experience in conservation biology and genetics and quickly took the lead in managing the collections, extracting DNA, and coordinating with the sequencing core at HAI. Prior to the pandemic, we also had four undergraduate students working on the project, mostly assisting with DNA extraction and sample management. We currently have one undergraduate (Risa Dickerman) helping with gDNA extraction, who is also developing her own project around estimating population structure in chestnut.
Finally, Research Associate Qian Zhang, who manages our laboratory, has assisted with optimizing our gDNA extraction protocols to meet the quantity/quality requirements set by the sequencing facility. How have the results been disseminated to communities of interest? During this reporting period we published two journal articles and presented at one online meeting organized by The American Chestnut Foundation. Once completed, our sequencing data will be deposited in the GenBank short read archive, and processed data files will be deposited in DataDryad. What do you plan to do during the next reporting period to accomplish the goals? The next reporting period will be focused on data analysis related to Objectives II and III. We recently received the final sequence data, and graduate student Alex Sandercock is in the process of completing the bioinformatics tasks. Alex developed a pipeline for bioinformatic processing of these large sequence files that involves splitting the data into individual chromosomes and completing the alignment and variant calling steps in parallel on our HPC systems. Alex has also been testing a variety of software tools for the analysis of population structure (e.g., 'Admixture', 'Discriminant Analysis of Principal Components', 'Uniform Manifold Approximation and Projection'), demographic history (e.g., 'Sequential Markovian Coalescent', 'NeEstimator', 'SNeP'), and genotype-environment relationships (e.g., 'Latent Factor Mixed Models', 'BAYENV'). We describe some preliminary results from these analyses in this report, and will repeat these and others once the full SNP set from all 384 samples is available.
Impacts What was accomplished under these goals?
Sampling
To date, 384 American chestnut trees have been sampled and had their DNA extracted for this study. The trees were sampled throughout the American chestnut geographical range and from different ecoregions. Chestnut leaf samples were obtained by TACF and citizen volunteers in each region. Leaf samples were primarily collected from May through July in 2018, 2019, and 2020 with a preference for young leaves, and the GPS coordinates of each sample location were recorded. When possible, leaves were kept cool using wet packs and shipped cool to preserve the DNA. If wet packs were not available, leaves were desiccated with silica gel for shipping.
DNA Isolation/Sequencing
Upon receiving the samples, we cataloged and stored the leaves at -80 °C. For the first 96 DNA extractions, we used Qiagen's DNeasy Plant DNA extraction kit, modified with a phenol-chloroform cleanup step instead of the Qiagen "shredder" column. This modification was used to reduce the chance of DNA loss in the cleanup steps, since the leaves that were used were older. When DNA concentration was low, a secondary CTAB-based extraction was used. For samples 97 through 384, we ground samples to a powder using a Spex 2000 Geno/Grinder and used Qiagen's DNeasy Plant DNA extraction kit, modified with an additional 100% ethanol wash step to remove salts that carried over from the precipitation steps. For each protocol, the extracted DNA was evaluated for quality and quantity with a Nanodrop and Qubit, respectively. DNA was then stored in 100-200 µl of AE solution at -20 °C. Library preparation and genomic sequencing were conducted at the HudsonAlpha Institute for Biotechnology (HAI). In collaboration with HAI, genomic DNA (gDNA) samples were sequenced on an Illumina NovaSeq 6000 instrument using an S4 flow cell in 2x150 bp paired-end mode. Of the 384 samples sequenced, 86 had greater than 20x coverage, 242 had 10-20x coverage, and 56 had less than 10x coverage.
Bioinformatics The bioinformatics analyses were performed on Virginia Tech's Advanced Research Computing system because of the size of the dataset and the computing resources required. A reference genome for American chestnut (previously completed by HAI) was used for this analysis. SNPs were called using a custom pipeline adapted from the Broad Institute's Genome Analysis Toolkit (GATK) best practices. The individual FASTQ files received from HAI were aligned using the Burrows-Wheeler Aligner (BWA) mem algorithm with the C. dentata genome as a reference. The resulting SAM files were converted to BAM format and then sorted and indexed using SAMtools (Li & Durbin 2010; Li et al. 2009). The GATK HaplotypeCaller algorithm (McKenna et al. 2010; Poplin et al. 2017) was used to call polymorphisms (SNPs and INDELs) by chromosome, and GatherVcfs was used to combine the per-chromosome GVCF files into a single GVCF file for each sample. The samples were then passed through GATK GenotypeGVCFs to perform joint genotyping, and the resulting VCF file was filtered using the GATK VariantFiltration tool. The following criteria were used to flag variants: low mapping quality (MQ < 40); high strand bias (FS > 40); differential mapping quality between reads supporting the reference and alternate alleles (MQRankSum < -12.5); bias between the reference and alternate alleles in the position of alleles within the reads (ReadPosRankSum < -8.0); and low depth of coverage (DP < 5).
Preliminary analysis So far, 192 samples have been processed through the bioinformatics step, as two sets of 96 samples because of the large file sizes. The first dataset contained ~47 million SNPs and INDELs, and the second contained ~57 million SNPs and INDELs. These datasets will be combined when the remaining 192 samples have completed the bioinformatics step. Preliminary analyses evaluating population structure and demographic history were performed using the first 96 samples.
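The hard-filter thresholds listed under Bioinformatics can be written out as a small predicate, which may help when checking individual sites by hand. This is a minimal Python sketch and not the project's pipeline, which used GATK VariantFiltration. The filter names (`low_MQ`, `high_FS`, and so on) and the function `failed_filters` are hypothetical, and the choice to let a missing annotation pass is an assumption.

```python
from typing import Callable, Dict, List, Mapping, Tuple

# Thresholds are those stated in the report; the filter names are hypothetical labels.
HARD_FILTERS: Dict[str, Tuple[str, Callable[[float], bool]]] = {
    "low_MQ": ("MQ", lambda v: v < 40.0),
    "high_FS": ("FS", lambda v: v > 40.0),
    "low_MQRankSum": ("MQRankSum", lambda v: v < -12.5),
    "low_ReadPosRankSum": ("ReadPosRankSum", lambda v: v < -8.0),
    "low_DP": ("DP", lambda v: v < 5.0),
}


def failed_filters(info: Mapping[str, float]) -> List[str]:
    """Return the names of the hard filters that a site's annotations fail."""
    failed = []
    for name, (key, is_bad) in HARD_FILTERS.items():
        value = info.get(key)
        # Assumption: an annotation that is absent for a site does not fail the filter.
        if value is not None and is_bad(value):
            failed.append(name)
    return failed


# Made-up annotation values for two illustrative sites.
print(failed_filters({"MQ": 58.2, "FS": 1.3, "MQRankSum": 0.1, "ReadPosRankSum": -0.4, "DP": 31}))
print(failed_filters({"MQ": 35.0, "FS": 52.7, "DP": 3}))
```

A site that returns an empty list passes all five criteria; any other site would be flagged with the listed filter names.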
DAPC and ADMIXTURE analyses were used to estimate population structure, and preliminary results suggest two and three populations, respectively. Both analyses support an independent northeastern population, which is consistent with previous studies (Müller et al. 2018). The ADMIXTURE analysis also identified a distinct Alabama population, which is novel and may reflect the glacial refuge for this species or possibly introgression from other Castanea species in this area (e.g., chinquapins). Additionally, we estimated the demographic history of C. dentata using SMC++ (Terhorst et al. 2017), assuming a generation time of 30 years. American chestnut population size most likely declined several times in the past, beginning ~2.7 million years ago, before rapidly recovering ~50 thousand years ago. Thus, early results suggest that American chestnut population structure can be described by a two- or three-population model and that multiple past demographic events influenced current genomic diversity.
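Because the demographic dates above depend on the assumed 30-year generation time, it can be useful to see how they translate to generations and how they would shift under a different assumption. The Python sketch below assumes the usual coalescent convention that time in years equals time in generations multiplied by generation time. The function names are illustrative, and the 20-year alternative is a hypothetical value chosen only to show the sensitivity, not a figure from this project.

```python
GENERATION_TIME_YEARS = 30  # generation time assumed in the report


def years_to_generations(years: float, generation_time: float = GENERATION_TIME_YEARS) -> float:
    """Convert a date in years before present to generations before present."""
    return years / generation_time


def generations_to_years(generations: float, generation_time: float = GENERATION_TIME_YEARS) -> float:
    """Convert generations before present to years before present."""
    return generations * generation_time


# Dates reported in the text, in years before present.
decline_onset_years = 2_700_000
recovery_years = 50_000

decline_generations = years_to_generations(decline_onset_years)
recovery_generations = years_to_generations(recovery_years)
print(f"decline onset: ~{decline_generations:,.0f} generations ago")
print(f"recovery: ~{recovery_generations:,.0f} generations ago")

# Hypothetical sensitivity check: the same number of generations under a 20-year generation time.
alt_decline_years = generations_to_years(decline_generations, generation_time=20)
print(f"decline onset with a 20-year generation time: ~{alt_decline_years:,.0f} years ago")
```

Under this convention the inferred dates scale linearly with the generation time, so the timing of the decline and recovery should be read together with the 30-year assumption.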
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2020
Citation:
Isabel N, Holliday JA, Aitken SN (2020) Forest genomics: Advancing climate adaptation, forest health, productivity, and conservation. Evolutionary Applications 13(1): 3-10.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2020
Citation:
Sandercock A, Westbrook J, Holliday JA. Annual Meeting of NIFA Project NE-1833 (Biological Improvement of Chestnut through Technologies that Address Management of the Species and its Pathogens and Pests). Landscape genomics of the American chestnut. September 17, 2020 (Virtual).
- Type:
Journal Articles
Status:
Published
Year Published:
2020
Citation:
Carlson JE, Staton ME, Quaye CA, Cannon N, Zhebentyayeva T, Islam-Faridi N, Yu J, Huff M, Mandal M, Lasky JR, Noorai RE, Saski C, Ficklin S, Drautz-Moses DI, Fitzsimmons S, Fan S, Conrad A, Schuster SC, Abbott AG, Westbrook J, Holliday JA, Nelson CD, Georgi L, Hebard FV (2020) A reference genome assembly and adaptive trait analysis of Castanea mollissima Vanuxem, a source of resistance to chestnut blight in restoration breeding. Tree Genetics and Genomes. 16(57).
Progress 02/15/19 to 02/14/20
Outputs Target Audience: The primary audience for this project will be public and private landowners interested in restoring American chestnut to Appalachian forests. There is widespread recognition in the region of the economic and ecological role this species previously played on the landscape, as evidenced by the citizen science undertaken by TACF volunteers over the last three decades. The goal of blight-resistant chestnuts has nearly been realized, and this project will support the final piece of the restoration puzzle, namely, diversifying those resistant trees so they can be planted across the climatically heterogeneous native range of the species. Changes/Problems: We had some issues with gDNA quality/quantity, mainly from older leaves. Because the library preparation method employed by HudsonAlpha does not use PCR (which has the advantage that no PCR duplicates will be present in the data), a higher amount of DNA is required. Despite extensive efforts to optimize our protocol, we decided to simply re-collect the problematic samples early in the 2020 growing season. What opportunities for training and professional development has the project provided? By integrating molecular biology, bioinformatics, and genecology, this project has provided interdisciplinary education and training at various levels. During spring 2019, we were fortunate to recruit a PhD student, Alexander Sandercock, to work on the project. Mr. Sandercock has a wealth of experience in conservation biology and genetics and quickly took the lead in managing the collections, extracting DNA, and coordinating with the sequencing core at HAI. We have also had four undergraduate students working on the project, mostly assisting with DNA extraction and sample management. Finally, Research Associate Qian Zhang, who manages our laboratory, has assisted with optimizing our gDNA extraction protocols to meet the quantity/quality requirements set by the sequencing facility.
How have the results been disseminated to communities of interest? During this reporting period we published two journal articles and presented at two conferences. These included two presentations at American Chestnut Foundation meetings, which were aimed at coordination and collaboration between our group and the extensive network of citizen scientists affiliated with the various TACF state chapters. Once completed, our next-generation sequencing data will be deposited in the GenBank short read archive, and processed data files will be deposited in DataDryad. What do you plan to do during the next reporting period to accomplish the goals? Our next reporting period will be focused on completing collection and sequencing of all ~500 samples, completing bioinformatics on the resulting data, and beginning the analyses outlined in objectives II and III.
Impacts What was accomplished under these goals?
Objective I. During spring and summer of 2019, in conjunction with The American Chestnut Foundation, we collected samples from >500 wild re-sprouts of American chestnut. Genomic DNA (gDNA) was extracted from these samples using either a modified Qiagen kit or the CTAB method. In the course of these extractions, we realized that some of the samples were collected too late in the season, and despite extensive efforts to optimize the extraction procedure, the yield and/or quality was too low in ~250 trees for the PCR-free kit our collaborators at the HudsonAlpha Institute (HAI) are using to construct whole-genome sequence (WGS) libraries. Specifically, because this kit does not use PCR, the input amounts of DNA must be considerably higher than for kits that have a PCR step. We were able to obtain high-quality DNA of sufficient concentration for WGS from 269 samples. As a result, we will complete additional sampling this spring, targeting only young, recently emerged leaves that give the highest DNA yield. In fall of 2019, we sent an initial set of 96 samples to HAI for library preparation and sequencing. Staff at HAI completed an additional round of QC on these samples and subsequently made libraries and sequenced them on an Illumina NovaSeq instrument, with a target coverage of 20X per sample. We subsequently sent an additional 96 samples for library preparation and sequencing in 2020. The remaining samples for which we have high-quality DNA are being held until additional sampling is completed in spring 2020, after which we will extract DNA from the newly collected samples and ship a final set of 288 samples to HAI during summer 2020, for a total of 480 trees sequenced with WGS.
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2020
Citation:
Westbrook JW, Zhang Q, Mandal MK, Jenkins EV, Barth LE, Jenkins JW, Grimwood J, Schmutz J, Holliday JA (2019) Optimizing genomic selection for blight resistance in American chestnut backcross populations: A trade-off with American chestnut ancestry implies resistance is polygenic. Evolutionary Applications 13(1).
- Type:
Journal Articles
Status:
Published
Year Published:
2020
Citation:
Westbrook JW, Holliday JA, Newhouse A, Powell WA (2020) A plan to diversify transgenic blight-tolerant American chestnut population. Plants, People, Planet 2(1).
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2019
Citation:
Genomics to Accelerate American Chestnut Restoration. Virginia Chapter, The American Chestnut Foundation, Annual Meeting Guest Speaker, 11/16/2019
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2019
Citation:
Genomics of Local Adaptation in Trees. The American Chestnut Foundation Annual Meeting. Gettysburg, PA, October 18-19, 2019