Performing Department
(N/A)
Non Technical Summary
This project aims to build a hop pangenome by assembling reference genomes for male Humulus lupulus var. cordifolius and H. japonicus. Combined with the previously sequenced male var. lupuloides, var. lupulus, and var. neomexicanus genomes, these assemblies will represent the diversity in Humulus and will aid in the development of genomic resources, an understanding of species diversity (e.g., copy number variation [CNV], presence/absence variation [PAV], and core and dispensable genes), and molecular markers for hop breeding programs. Modernizing hop breeding programs worldwide will require new genomic tools in the form of a pangenome and an understanding of genotype-phenotype relationships. The pangenome will be an enabling tool for identifying novel genes for hop improvement and will supply the sequence data needed to develop a common marker platform for future breeding applications. For example, a hop pangenome will provide a resource for developing markers for important disease resistance genes, which will improve the long-term sustainability of new hop varieties. Developing novel, disease-resistant cultivars would be especially useful because it would reduce overall pesticide use, which is associated with greater environmental sustainability and stewardship and with increased profits for US hop growers. With funding from the USDA NIFA Postdoctoral Fellowship, I will develop and assess the utility of hop pangenomic resources.
Animal Health Component
50%
Research Effort Categories
Basic
50%
Applied
50%
Developmental
0%
Goals / Objectives
Hop (Humulus lupulus) is a dioecious, perennial plant and the primary bittering and flavoring agent in beer. Hop breeding has long focused on developing varieties with favorable agronomic characteristics and unique flavors and aromas. While only female plants are cultivated commercially for their inflorescences, or "cones", breeding and germplasm collections contain both male and female plants, which are used as breeding parents to develop novel variation, including variation for disease resistance. However, we still lack a complete understanding of genome organization and structure within Humulus germplasm resources. Multiple lines of evidence indicate the presence of significant structural variation within and between individuals, and especially between previously described subpopulations. By developing a better understanding of hop genome organization, we can develop tools to more efficiently introgress novel alleles from diverse germplasm. Herein, I will develop Humulus pangenomic resources to enable the development of genomics-assisted breeding tools that aid in modernizing hop breeding programs. My objectives are to 1) develop a hop pangenome and examine comparative genomics and diversity within the Humulus species; and 2) assess the utility of the hop pangenome graph via a pangenome-wide association study (panGWAS).
Project Methods
To generate reference genomes, I will sequence two male genotypes using PacBio Revio (60x coverage) and Dovetail Hi-C (60x coverage): a wild H. l. var. cordifolius genotype previously received as seed from Suntory Holdings in Japan, and an H. japonicus genotype resulting from a cross between two individuals, 'SJ1' and 'HJ8' (Havill et al., in prep). These individuals were selected because of the availability of the germplasm, the presence of both sex chromosomes (X1X2Y1Y2 and XY1Y2), and the presence of important phenotypic traits such as aphid and powdery mildew resistance. I will assemble scaffolds using hifiasm (Cheng et al. 2021) and generate pseudomolecules representing autosomal and sex chromosomes in the H. l. var. cordifolius and H. japonicus accessions using JUICER (Durand et al. 2016). I will assess genome assembly quality using metrics such as gene completeness with BUSCO (Manni et al. 2021).
I will align previously generated data from two F1 populations, one within each species (Havill et al. 2023a; Havill et al., in prep), to each genome reference as the basis for constructing a high-density genetic map with JoinMap5 following established procedures (Havill et al. 2023a; van Ooijen 2019). We will examine the extent of expected genome coverage based upon genetic map length. We will examine relationships between the genetic and physical maps with Spearman's correlation coefficient. We will also examine recombination rate variation within each species' genome and compare differences between species, parents, chromosomes, and haplotypes (Dukic et al. 2016).
Genome annotation will be conducted using two approaches: (1) identifying genes using RNAseq, and (2) identifying repeat sequences.
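As a rough illustration of the genetic-versus-physical map comparison described above, the following minimal sketch computes Spearman's correlation coefficient for the markers on a single chromosome. The marker positions are hypothetical placeholders, and the helper names (`average_ranks`, `spearman_rho`) are introduced here for illustration only; they are not part of JoinMap5 or any pipeline named in this project. The sketch does not handle the degenerate case where all positions are identical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical markers on one chromosome:
# (physical position in Mb, genetic position in cM)
markers = [(1.2, 0.0), (5.8, 3.1), (12.4, 3.1), (30.0, 9.7), (41.5, 22.4), (47.9, 35.0)]
physical = [p for p, _ in markers]
genetic = [g for _, g in markers]

rho = spearman_rho(physical, genetic)
print(f"Spearman rho = {rho:.3f}")
```

A rank-based coefficient suits this comparison because genetic distance is expected to increase monotonically, but not linearly, with physical distance; low-recombination regions compress many megabases into a few centimorgans.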
To capture a significant portion of the transcribed genes, RNAseq will be conducted in triplicate on seven tissues: apical vegetative buds (2 cm of apical tissue containing the meristem); 10-day-old stems from the 3rd internode; 10-day-old root tissue; 10-day-old, young, fully developed leaves from the 3rd node; and floral tissue from growth stages BBCH 51 (inflorescence buds visible), 71 (beginning of cone development), and 89 (cone maturity) (Rossbauer et al. 1995). RNAseq data will be analyzed using BRAKER2 (Hoff et al. 2016). Repeat sequences will be identified using RepeatMasker (Smit et al. 2015).
I will construct a hop pangenome using the var. lupulus genome (21110M) as the backbone reference, together with four additional haplotype-resolved genome references: MN-1421 (H. l. var. lupuloides), MN-586 (H. l. var. neomexicanus), H. l. var. cordifolius, and H. japonicus. I will also leverage existing genomes, such as cv. 'Cascade', and other high-quality genomes that may become available during this research. I will use Minigraph-Cactus (Hickey et al. 2024) for pangenome graph construction.
To examine relationships within Humulus, I will first identify single-copy genes using OrthoFinder (Emms & Kelly 2019). To identify homologous chromosomes and begin characterizing structural variation (SV), I will conduct synteny analyses using GENESPACE (Lovell et al. 2018). To characterize SVs, including insertions, deletions, and translocations, I will use the var. lupulus assembly as the reference and align the remaining genomes to it using minimap2 (Li 2018). I will test for gene family expansions or contractions using CAFE to examine gene family evolution for traits of interest, including disease resistance (De Bie et al. 2006).
I will identify protein sequences with 100% similarity in each genome and remove them using CD-HIT (Fu et al. 2012). Nonredundant protein sequences will be clustered into gene families using OrthoFinder (Emms & Kelly 2019).
Based on their frequency, we will divide genes into the following three categories: core (present in all five genotypes), dispensable (present in more than one but fewer than five genotypes), and private (present in only one genotype).
I will generate low-pass (3x) Illumina resequencing data for the wild hop core collection (WHCC; Havill et al., in prep). A field trial containing the WHCC is planted in a randomized complete block design with two replicates (plants). Phenotypic data will be collected in 2025 and 2026 for the following traits: biological sex (female or male); flowering time (day of the year); yield* (inflorescence yield at 10% moisture); powdery mildew severity (0-100%, where 0 = no infection and 100 = 100% infection); downy mildew severity (0-100%, where 0 = no infection and 100 = 100% infected leaves); and maturity* (days to cone ripeness following first observed flower). Traits marked with * will be collected on female individuals only.
The low-pass (3x) Illumina resequencing data on the WHCC will be mapped against the pangenome graph constructed in Aim 1, and SNPs and SVs will be called using vg and Giraffe (Garrison et al. 2018; Sirén et al. 2021). I will combine previously unpublished genotyping-by-sequencing data for each accession in the WHCC with the resequencing data and impute missing data within the combined dataset using BEAGLE, together with stringent quality filtering to achieve high accuracy (Browning et al. 2018; Wu et al. 2019). I will use the resulting VCF to examine population structure using ADMIXTURE and r/PopHelper (Alexander et al. 2009; Francis 2017), genetic diversity using r/hierfstat (Goudet 2005), and the demographic history of Humulus germplasm with r/slendr (Petr et al. 2023). I will compare population structure estimates obtained from SNPs and from SVs separately to examine the consequences of SVs for admixture inferences.
Phenotypic trial data collected from each genotype will be subjected to statistical analyses for each trait.
Phenotypes exhibiting significant genetic variation will be combined with genetic data to conduct a pangenome-wide association study (panGWAS) with r/MVP (Yin et al. 2021) to identify key marker-trait associations using independent quality-filtered SNP and SV datasets for assessed traits.
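The core/dispensable/private gene classification described in the methods above can be sketched as follows. This is a minimal illustration under stated assumptions: the gene family names and presence/absence calls are hypothetical placeholders, the genome labels are shorthand for the five genotypes named in the methods, and the function `classify_family` is introduced here for illustration rather than taken from OrthoFinder or any other tool.

```python
from collections import Counter

# Shorthand labels for the five genotypes in the planned pangenome.
GENOMES = ["lupulus", "lupuloides", "neomexicanus", "cordifolius", "japonicus"]


def classify_family(present_in, n_genomes=len(GENOMES)):
    """Classify a gene family by the number of genomes that carry it.

    core        = present in all genomes
    dispensable = present in more than one but fewer than all genomes
    private     = present in exactly one genome
    """
    k = len(set(present_in))
    if k == n_genomes:
        return "core"
    if k == 1:
        return "private"
    if 1 < k < n_genomes:
        return "dispensable"
    raise ValueError("a gene family must be present in at least one genome")


# Hypothetical presence/absence calls: family ID -> genomes carrying it.
families = {
    "FAM0001": GENOMES,
    "FAM0002": ["lupulus", "cordifolius"],
    "FAM0003": ["japonicus"],
    "FAM0004": ["lupulus", "lupuloides", "neomexicanus", "cordifolius"],
}

categories = {fam: classify_family(genomes) for fam, genomes in families.items()}
summary = Counter(categories.values())

for fam, cat in categories.items():
    print(fam, cat)
print(dict(summary))
```

If additional genomes such as cv. 'Cascade' are incorporated, only `GENOMES` needs to change, because the category thresholds are defined relative to the number of genomes rather than hard-coded to five.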