Source: UNIV OF CONNECTICUT submitted to
A CONSERVATION GENOMICS PROGRAM TO IDENTIFY CLIMATE ADAPTED GENES IN THE EASTERN HEMLOCK (TSUGA CANADENSIS)
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1030624
Grant No.
2023-67012-40000
Cumulative Award Amt.
$225,000.00
Proposal No.
2022-09626
Multistate No.
(N/A)
Project Start Date
Aug 1, 2023
Project End Date
Jul 31, 2025
Grant Year
2023
Program Code
[A1141]- Plant Health and Production and Plant Products: Plant Breeding for Agricultural Production
Project Director
Fetter, K. C.
Recipient Organization
UNIV OF CONNECTICUT
438 WHITNEY RD EXTENSION UNIT 1133
STORRS,CT 06269
Performing Department
(N/A)
Non Technical Summary
Forests are critically important resources for human and biological systems. They provide numerous benefits, including raw materials for timber industries, habitat for agricultural pollinators, soil stabilization and enrichment, and improved water quality, to name a few. Globally, trees are facing an extinction crisis, with nearly a third of species threatened or endangered, and in the United States, 11% of trees face extinction. The eastern hemlock (Tsuga canadensis) is a keystone species of eastern forests in the United States and Canada and is facing an extinction crisis from an invasive pest and climate change. This project aims to create the knowledge base to aid the recovery of the species and to prepare for future challenges to its existence.
To achieve these goals, principles of conservation genomics will be used to create a new reference genome and to identify beneficial genetic variation. A reference genome is a set of DNA sequences that ideally span every chromosome, where genes are identified and described according to their structure and function. Reference genomes aid genetic projects in numerous ways; most importantly, they provide an index of the genome and a description of every gene and regulatory element. Using the reference genome and a sample of DNA sequences from wild trees, genetic variation that is beneficial for the persistence of the species can be identified and described. Seeds from trees containing the most beneficial genetic variation in challenging environments can be collected and stored for future breeding efforts or reforestation. By applying principles of conservation, tree improvement, and evolutionary biology, species and the benefits they provide can be protected from extinction.
Animal Health Component
30%
Research Effort Categories
Basic
70%
Applied
30%
Developmental
(N/A)
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
202                   0613                             1080                     100%
Knowledge Area
202 - Plant Genetic Resources;

Subject Of Investigation
0613 - Mixed conifer-broadleaf forests;

Field Of Science
1080 - Genetics;
Goals / Objectives
The major goal of this project is to preserve healthy forest ecosystems through conservation of biological resources. Healthy forests are essential for functioning ecosystems that directly support agroforestry and industries that rely on intact and functioning forest ecosystems. Covering over 1.3 million hectares, the eastern hemlock (Tsuga canadensis) is an integral part of eastern North American forests. Throughout its range, T. canadensis is recognized as a keystone species, providing important ecosystem services for water quality, soil stabilization, and wildlife habitat, as well as economic and aesthetic assets for humans. In 2013, the IUCN listed T. canadensis as near threatened, with many local populations having been lost in recent decades. Climate change threatens more populations, both directly and indirectly. There is a pressing need for conservation in the eastern hemlock system focused on genomics, gene conservation, seed banking, and breeding.
Investing in personnel and bioinformatic resources to support tree improvement and gene conservation is a major goal of this work. Tree improvement is an important objective to sustain food and forest resources to meet the challenges of the 21st century. Demand for agriculture and forest products is increasing at a rapid pace. At the same time, the challenges facing forests and agriculture from biotic and abiotic sources are increasing too. Scientists with broad training in genomics, population biology, and plant breeding are needed to meet the needs of society. In this grant, a postdoctoral fellow (PDF), a graduate student, and undergraduate students will be involved in training and research focused on tree improvement. The team will be led by the PDF to generate a reference genome for a species of conservation concern and to generate a population genotyping data set.
The major intended achievement of the proposed work is to stabilize T. canadensis populations and to prevent the species from becoming threatened or endangered. The milestones towards this achievement will take years to decades. Some of the milestones can be achieved in a shorter time span, e.g. within the time frame of the NIFA fellowship. Other milestones, however, require significant longer-term efforts, for example creating breeding populations, testing breeding lines for increased stress resilience, and deploying genetic resources in a landscape.
To achieve the major goals, the following objectives will be performed during the time period of the NIFA fellowship:
1. Create a reference genome for a near-threatened forest keystone species.
2. Create a sequenced population set from a range-wide sample of wild accessions.
3. Identify climate-adaptive genetic variation for gene conservation.
4. Execute training and career development plan.
5. Communicate results to the scientific and conservation communities.
Project Methods
Objective 1: Generate a chromosome-scale reference genome for the eastern hemlock
Creating the reference genome
An accession from Harvard University's Arnold Arboretum has been selected for sequencing. The tree (#1509-1*A) was collected in 1882 from wild provenance in Boston, MA. Leaf tissue was collected and high molecular weight DNA was extracted using a Circulomics Nanobind Plant Nuclei Big DNA Kit. DNA quality was assessed, and the DNA was then fragmented to approximately 20 Kb. After passing quality control, we sequenced four libraries using an Oxford Nanopore PromethION to a coverage of 20x for the 20 Gb genome. To account for the higher error rate of Nanopore data, we created a set of Illumina short reads. Genomic DNA was extracted from the haploid megagametophyte of a seed of the reference tree. Paired-end libraries of 350 bp and 550 bp were prepared and sequenced on an Illumina NovaSeq S4 for 300 cycles to 80x coverage. This work was funded by a grant from The Nature Conservancy.
Using short and long reads, we will create a draft hybrid assembly using MaSuRCA (Zimin et al., 2013). The quality of the genome assembly will be assessed with BUSCO (Simão et al., 2015) and QUAST (Gurevich et al., 2013). To create a high-quality, chromosome-scale assembly, we will contract with Dovetail Genomics to use chromosome conformation capture methods to generate a linkage map for each chromosome. We will provide a draft assembly and libraries prepared with their Omni-C kit. The libraries will be sequenced with Illumina short reads. The final assembled genome will merge our draft with the linkage map. This method represents the current state of the art for genome assembly.
Genome annotation provides critical structural and functional information to an assembly. We will annotate the genome using an in-house pipeline developed for eukaryotic non-model organisms. The pipeline, called EASEL (Webster et al., 2022), is a genome annotation tool that leverages machine learning, RNA folding, and functional annotations to enhance gene prediction accuracy. EASEL utilizes AUGUSTUS (Stanke & Morgenstern, 2005) with parameters optimized for prediction of gene models incorporating extrinsic evidence from transcripts and protein alignments. The EASEL pipeline aligns transcript reads from RNA-seq and assembles putative transcripts via StringTie2 (Kovaka et al., 2019) and PsiCLASS (Song et al., 2019). Open reading frames are predicted through TransDecoder (Brian & Papanicolaou, 2020) utilizing a gene family database (EggNOG, Huerta-Cepas et al., 2019) for refinement. Transcriptome and protein hints are generated by aligning the frame-selected sequences to the genome. These hints are independently used to train AUGUSTUS, and the resulting predictions are combined into a single gene set. Implicated gene structures are further refined through machine learning based on a set of primary and secondary sequence features (e.g. RNA folding, Kozak consensus, splice sites, functional annotation). Annotations are combined and filtered from a feature matrix of output and scored to generate the highest supported annotation. Descriptive statistics of the final assembly will be reported.
Objective 2: Identify climate adapted genomic variation for gene conservation
Genetic variation that increases fitness of an individual in a population in its home range is said to be locally adaptive. Finding locally adaptive genetic variation is difficult for several reasons: allele frequencies at neutral sites across a genome can mimic those at sites under selection; tests to detect sites under selection may lack the statistical power to yield true positives; and fitness effects may be small or undetectable, to name a few challenges (Savolainen et al., 2013). The objective here is to identify a set of loci that demonstrate allele frequencies indicative of associations to climate, while accounting for background levels of allele frequency variation due to ancestry, kinship, or both. To achieve this objective I will conduct a gene-environment analysis. Genetic offsets will be identified using current and future climate predictions with gradient forest models, and compared to generate climate risk assessments.
Creating the SNP set
In total, tissue from 288 unique genotypes will be collected from two sources: seed banks (N=144) and field collections (N=144). Seeds representing a wide range of geographic diversity will be ordered from GRIN, the Canadian NTSC, and the Holden Arboretum (Fig. 3). Field collections of leaves will be made in critical portions of the species' distribution not represented in seed banks, namely, the range core (sensu Hampe & Petit, 2005) in New England and New York, and populations in the Great Lakes basin. Both regions represent the largest population sizes of hemlock (in terms of basal area) (Fitzpatrick et al., 2012), and are likely to harbor adaptive genetic variation.
Seeds will be stratified for 10 weeks, germinated following Jetton et al. (2014), and grown in petri dishes until the six-leaf stage. Whole plants will be placed in 96-well microcollection tubes and lysed for DNA extraction. Leaf tissue will be collected in the field and preserved in silica gel. Approximately 250 mg of leaf tissue will be packed into a microcollection tube, lysed, and DNA extracted. Staff at CGI will perform DNA extractions. Libraries for ddRADseq will be prepared using the SphI-MluCI restriction enzyme pair following Johnson et al. (2017) and checked for quality control. Paired-end 150 bp reads will be sequenced on an Illumina NovaSeq 6000 for 300 cycles across 6 lanes at CGI.
Read quality will be checked with fastp (Chen et al., 2018) and adapters trimmed with sickle (Joshi, 2011). To detect variants, a combination of bioinformatic tools will be used to align reads to the reference genome (bwa, Li & Durbin, 2009), to sort, de-duplicate, and index reads (samtools, Li et al., 2009), and to call SNPs (bcftools, Danecek et al., 2021). Sites will be filtered with bcftools to retain biallelic sites without indels, with site-wise missingness <20%, genotype quality >90, and a site depth appropriate to the coverage. SNPs that show excess heterozygosity in tests of Hardy-Weinberg equilibrium will be removed.
Climate adaptation and genetic offsets
Neutral population genetic structure will be described using ADMIXTURE (Alexander et al., 2009) to obtain a Q-matrix, and with DAPC (Jombart et al., 2010). Bioclimatic variables from each site will be downloaded from WorldClim (Fick & Hijmans, 2017) and summarized using PCA. Climate adapted candidate sites will be identified from a gene-environment analysis in a latent-factor mixed model using LFMM2 (Caye et al., 2019). LFMMs are regression models fit for each genetic site that include explanatory variables for climate (PC1) and a correction for ancestry (i.e. the Q-matrix). Predictive models generalizing genetic-environment relationships of current and future climate will be fit with gradient forests to yield genetic importance values (Ellis et al., 2012). Future climate will be projected 50 years into the future using two different emission scenarios (RCP 4.5 and 8.5). Local genetic offsets will be calculated as Euclidean distances between genetic importance values of current and future climates for each sample location following Fitzpatrick and Keller (2015). Genetic offsets will be visualized and spatially mapped for the entire species' range using kriging interpolation with the gstat package (Graeler et al., 2016). Areas with large genetic offsets are priority areas for gene conservation.
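The local genetic offset described above is the Euclidean distance, at each sample location, between the vector of genetic importance values under current climate and the vector under projected future climate. The following is a minimal Python sketch of that arithmetic only. It is not the project's pipeline, which fits gradient forest models and performs kriging with the tools cited in the methods. The site names and importance values are hypothetical and chosen purely for illustration.

```python
import math


def genetic_offset(current, future):
    """Euclidean distance between current and future importance vectors.

    Both arguments are equal-length sequences of genetic importance values,
    one value per climate predictor, for a single sample location.
    """
    if len(current) != len(future):
        raise ValueError("current and future vectors must have equal length")
    return math.sqrt(sum((f - c) ** 2 for c, f in zip(current, future)))


# Hypothetical importance values (current, future) for two sample locations.
sites = {
    "site_A": ([0.0, 0.0, 0.0], [3.0, 4.0, 0.0]),
    "site_B": ([0.10, 0.20, 0.05], [0.10, 0.20, 0.05]),
}

# One offset per location; larger values indicate a larger predicted mismatch.
offsets = {name: genetic_offset(cur, fut) for name, (cur, fut) in sites.items()}

# Rank locations from largest to smallest offset to flag conservation priorities.
ranked = sorted(offsets, key=offsets.get, reverse=True)

for name in ranked:
    print(f"{name}: offset = {offsets[name]:.3f}")
```

In this toy example the location whose importance values change most between climates ranks first, which mirrors how areas with large offsets would be flagged as priorities for gene conservation.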

Progress 08/01/23 to 07/31/24

Outputs
Target Audience:
During the reporting period, the primary target audiences for our project included conservation professionals at The Nature Conservancy and Holden Arboretum, undergraduate and graduate students at the University of Connecticut's Department of Ecology and Evolutionary Biology, and various foresters and land managers in state and federal systems. Our engagement with these groups was crucial for disseminating our research findings on tree conservation and developing resilient forest ecosystems. We prioritized these audiences because of their direct involvement in conservation practices, ecological research, and land management, ensuring that our work had a practical and immediate impact. Our efforts also focused on diverse undergraduates, particularly those who were first-generation college students. By including these students in our project, we aimed to foster a new generation of conservation scientists who are equipped with the knowledge and skills to address the challenges of climate change and invasive pests. This group is significant because they bring unique perspectives and experiences to the field of conservation, and their involvement helps to diversify and strengthen the scientific community. Additionally, working with these students aligns with the broader goals of the USDA AFRI EWD program to support education and training in food and agriculture disciplines. We also made a concerted effort to reach out to general scientific and conservation professionals through workshops and educational programs at conferences (PAG and NAFGS). These activities were designed to increase awareness of the importance of tree conservation and the threats posed by climate change and invasive species. By engaging with a wide audience, we aimed to build a broader base of support for conservation efforts and to educate people about the role of forests in maintaining healthy agroecological systems.
Changes/Problems:
Obtaining tissue took longer than expected. Tissue for the study comes from wild sources, and the range of the species spans the entire east coast of the USA. Another challenge is DNA extraction. The DNA of this species is hard to extract cleanly, and I have had to do considerable troubleshooting, but progress is being made.
What opportunities for training and professional development has the project provided?
I designed and taught a reading course on conservation genomics with 15 students enrolled. I attended a workshop on conservation genomics of imperiled forest trees hosted by The Nature Conservancy in April 2024. During the workshop, I worked and trained closely with conservation and tree breeding professionals. A major outcome of the workshop was the design of a new grant to fund applied conservation in forest trees using the genomic resources developed in the current USDA NIFA grant.
How have the results been disseminated to communities of interest?
Through the classroom, in workshops, and at scientific conferences.
What do you plan to do during the next reporting period to accomplish the goals?
I will finish the DNA extractions and get the libraries sequenced. Then, I will detect variants and perform the climate adaptation modelling.

Impacts
What was accomplished under these goals?
Goal #1 was achieved. The reference genome was sequenced and assembled. Chromosome conformation capture (Hi-C) sequencing was also performed and assembled. Much progress was made on Goal #2. 756 samples were collected from across the range of the focal species. DNA extractions began and are presently underway. Goal #4 was carried out. I designed and taught a reading course on conservation genomics with 15 students enrolled. Goal #5 was achieved at two conferences: PAG in San Diego in January 2024, and the North American Forest Genetics Society meeting in Oaxaca, Mexico in June 2024.

Publications