Source: HUDSONALPHA INSTITUTE FOR BIOTECHNOLOGY submitted to NRP
UNRAVELING THE MECHANISMS OF SEX DETERMINATION IN HEMP
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1030323
Grant No.
2023-67013-39620
Cumulative Award Amt.
$644,082.00
Proposal No.
2022-10318
Multistate No.
(N/A)
Project Start Date
May 1, 2023
Project End Date
Apr 30, 2026
Grant Year
2023
Program Code
[A1141]- Plant Health and Production and Plant Products: Plant Breeding for Agricultural Production
Recipient Organization
HUDSONALPHA INSTITUTE FOR BIOTECHNOLOGY
601 GENOME WAY
HUNTSVILLE, AL 35806-2908
Performing Department
(N/A)
Non Technical Summary
Hemp is a crop species with enormous potential for sustainably producing fiber, oil, and protein in the United States. One major complication for hemp breeding and fiber production is that hemp plants have separate male and female sexes, often in a roughly 50:50 ratio, yet female plants produce more fiber and oil than male plants do. A key question in hemp biology is which genes control the sex of a hemp plant (whether it is male or female), with the future goal of producing all-female populations that yield more fiber biomass and oil. In this proposal, we will generate high-quality genomic resources for hemp that focus specifically on identifying the genes that control sex determination. We will sequence and assemble several male (XY) hemp genomes and identify the genes in those genomes. Further, we will sequence the genomes of plants with mutations in flower development to narrow down the genes that control sex. Identifying these genes will allow us to produce genetic markers that are perfectly linked to sex, which is crucial for hemp breeding and production. In addition, identifying these genes sets the stage for producing all-female hemp populations that yield more biomass for fiber, oil, and protein. Lastly, this project will develop a training program focused specifically on recruiting Alabama HBCU undergraduates into the agricultural industry through a joint genomics and breeding training program at HudsonAlpha and New West Genetics.
Animal Health Component
40%
Research Effort Categories
Basic
40%
Applied
40%
Developmental
20%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
201 | 1730 | 1080 | 100%
Knowledge Area
201 - Plant Genome, Genetics, and Genetic Mechanisms;

Subject Of Investigation
1730 - Hemp;

Field Of Science
1080 - Genetics;
Goals / Objectives
Hemp is a crop with tremendous potential for producing fiber, oil, and protein. One complication is that hemp is dioecious, with separate male and female sexes governed by an XY sex chromosome pair, and female plants produce more of the valuable products (oil and fiber) than males do. This project aims to identify the genes that control sex determination in hemp. The XY sex chromosome pair has been challenging to assemble, which limits our ability to identify the master sex determination genes that control male, female, and monoecious flower formation. The deliverables of this proposal include a set of sex-linked markers that perfectly predict sex and can be used to assay sex in any hemp population, as well as the discovery of likely master sex determination genes that could be modified to control sex in hemp and produce all-female populations. We propose to identify the master sex determination genes in hemp through Objective (1): building five hemp genomes that completely resolve the XY sex chromosome pair and identifying all genes on the sex chromosomes; Objective (2): creating a comprehensive gene expression atlas for all major tissues of hemp; and Objective (3): performing a large-scale resequencing experiment on an ionizing radiation mutant population to identify sex mutants. Further, we will create an interdisciplinary internship program between HudsonAlpha and New West Genetics that targets students from Alabama's Historically Black Colleges and Universities.
Project Methods
Aim 1: Develop chromosome-scale assemblies for five hemp genomes, including fully phased X/Y sex chromosomes.
In this first aim, we will generate five genome assemblies for hemp, including 3 XY males and 2 monoecious individuals. We will use PacBio HiFi long-read sequencing, together with Dovetail Omni-C scaffolding, to generate chromosome-scale genomes. A key focus of these genomes will be the accurate reconstruction of the X/Y sex chromosome pair, which has historically been among the most challenging chromosomes to assemble in both plants and animals. Genome assemblies will be validated using BUSCO completeness, synteny, and contiguity statistics. The sex chromosome pair will be partitioned into the non-recombining sex-determination region (SDR) and the pseudoautosomal region (PAR) using sex-linked k-mers identified by our in-house "Cytogenetics-by-Sequencing" pipeline. All genes within the SDR will be treated as potential candidate sex-determination genes, and we will use the gene expression atlas from Aim 2 to narrow down candidate genes involved in floral development.
Aim 2: Construct a gene expression atlas to dissect sex determination
To predict putative sex determination genes, we will construct a high-resolution gene expression atlas to annotate genes in the five genome assemblies. This atlas will comprise at least 10 tissue types and stages, including 1) young leaf, 2) expanded leaf, 3) stem, 4) root, 5) early male flower bud, 6) mid male flower bud, 7) open male flower, 8) senescing male flower, 9) early female flower bud, 10) mid female flower bud, 11) open female flower, 12) senescing female flower, and 13) seed. RNA will be isolated from these tissues for mRNA sequencing, followed by differential gene expression testing.
The genomes and RNA-seq data will be released in a public gene expression browser.
Aim 3: Identify master sex regulator genes and develop universal sex-linked markers
In this aim, we will further home in on the master sex determination genes that control male versus female flower development. New West Genetics has produced an ionizing radiation mutant population that will be screened for floral mutations such as male sterility, female sterility, and hermaphroditism. These mutant individuals will be resequenced by whole-genome shotgun sequencing and aligned to the reference genomes produced in Aim 1. The key goal is to identify one or more genes that are the key regulators of sex determination. A second outcome of this aim is to use the genomes and resequencing data, both public and generated here, to produce a set of sex-linked loci that are universally predictive of sex. Current marker development is often restricted to small germplasm pools; here, we instead leverage the power of pangenomics.
Aim 4: Create an internship program focused on HBCU students learning genomic and field skills
HudsonAlpha, New West Genetics, and Alabama A&M will partner in this aim to create an internship program that trains the next generation of students at the interface of genomics, breeding, and field work. We will recruit two students per year for this internship program, focusing on Alabama HBCU students. Interns will spend two months in the Harkess Lab at HudsonAlpha Institute for Biotechnology as part of the BioTrain genomics education program, and then several weeks at New West Genetics.
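The sex-linked k-mer idea behind the SDR/PAR partition in Aim 1, and behind the universal sex-linked markers in Aim 3, can be illustrated with a minimal Python sketch. It is not the in-house "Cytogenetics-by-Sequencing" pipeline; it simply assumes an XY system in which Y-linked (SDR) sequence yields k-mers present in every male and absent from every female. The function names, the toy reads, and k = 4 are hypothetical and chosen only for illustration.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings (k-mers) of one read."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_kmers(reads: list[str], k: int) -> set[str]:
    """Union of k-mers across all reads from one individual."""
    found: set[str] = set()
    for read in reads:
        found |= kmers(read, k)
    return found


def male_specific_kmers(males: list[list[str]], females: list[list[str]], k: int) -> set[str]:
    """K-mers seen in every male and in no female (candidate Y/SDR sequence)."""
    if not males:
        raise ValueError("need at least one male sample")
    # Keep only k-mers shared by all males ...
    shared_by_males = set.intersection(*(sample_kmers(r, k) for r in males))
    # ... then discard any k-mer observed in at least one female.
    seen_in_females = set().union(*(sample_kmers(r, k) for r in females))
    return shared_by_males - seen_in_females


# Toy reads (hypothetical): every sample carries AAAACCCC;
# only the males also carry the stand-in Y-linked sequence GGGGTTTT.
females = [["AAAACCCC"], ["AAAACCCCA"]]
males = [["AAAACCCC", "GGGGTTTT"], ["GGGGTTTTG", "AAAACC"]]

print(sorted(male_specific_kmers(males, females, k=4)))
# ['GGGG', 'GGGT', 'GGTT', 'GTTT', 'TTTT']
```

On real data, a sketch like this would also need canonical (reverse-complement-collapsed) k-mers, a much larger k, minimum count thresholds to discard sequencing errors, and some tolerance for a k-mer missed in a low-coverage male.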

Progress 05/01/23 to 04/30/24

Outputs
Target Audience: In year 1, this grant was geared specifically toward the Cannabis genomics community, including both academic research scientists and industry, as we worked toward genomic data generation. Working closely with our vendor New West Genetics, a fiber hemp breeding and seed company, we have engaged several scientists within New West Genetics as well as within HudsonAlpha. A key goal of our proposal is to build an internship program that spans the nonprofit HudsonAlpha and New West Genetics, specifically targeting non-traditional and minority students to enter the Cannabis genomics and breeding industry. We were not able to recruit a summer intern because of the May start date of the proposal, but we have recruited TJ Singh, an undergraduate from Mississippi State University, to begin in summer 2024. We will compensate for the lack of year 1 interns by increasing the number of summer interns in years 2-3. Hannah Mueller, a technician in the Harkess Lab and a first-generation college student, was responsible for much of the wet-lab work for this proposal.
Changes/Problems: We have had no major changes to this proposal. However, given the falling costs of genome sequencing technologies (e.g., PacBio sequencing on the Revio and Illumina sequencing on the NovaSeq X), we have been able to expand the scope of Objective 1 to include 10 more genomes through a collaboration with the Smart laboratory, without any change in budget.
What opportunities for training and professional development has the project provided?
PI Alex Harkess presented this ongoing work at four invited university seminars in year 1: Iowa State University, Harvard University and the Arnold Arboretum, the University of Georgia, and the University of Alabama. Dr. Sarah Carey presented some of this work at the Cannabis Genomics session at Plant and Animal Genome 2024.
Hannah Mueller, a technician on this project, presented some of this work at the American Society of Plant Biologists (ASPB) 2023 meeting in Savannah, Georgia. Hannah also gained substantial expertise in high-molecular-weight DNA isolation, RNA isolation, and PacBio and Omni-C library preparation through this project. In April 2024, Hannah left our laboratory to pursue her dream of physical therapy school, and she openly credits this laboratory experience with helping her find her path. We are very proud of the opportunities this grant provided to her. Dr. Sarah Carey, although funded by her own USDA NIFA postdoctoral fellowship, has taken ownership of much of the genome assembly for this project as it relates to her funded proposal. For instance, she has independently led the assembly of the five Objective 1 genomes, which are also directly useful for her NIFA postdoctoral fellowship on common hop sex chromosomes. Sarah has become a leader in the field of sex chromosome genomics and assembly. Dr. Julie Robinson was funded by this proposal as a postdoc and recently received a 2024 USDA NIFA postdoctoral fellowship on developing inducible sex determination systems in soybean. Again, we are very proud of the opportunities this grant has given the trainees in my laboratory. The training program described in the "Broader Impacts" of this proposal recruited its first intern this year; the internship itself will take place in 2024.
How have the results been disseminated to communities of interest?
The project has been disseminated in several ways in year 1, primarily through invited seminars and conference presentations. PI Alex Harkess presented this ongoing work at four invited university seminars: Iowa State University, Harvard University and the Arnold Arboretum, the University of Georgia, and the University of Alabama. Dr. Sarah Carey presented some of this work at the Cannabis Genomics session at Plant and Animal Genome 2024, at an invited seminar at Oregon State University, and at two internal HudsonAlpha research seminars. Hannah Mueller, a technician on this project, presented some of this work at the American Society of Plant Biologists (ASPB) 2023 meeting in Savannah, Georgia. We are currently preparing several manuscripts using the pangenome data from Objectives 1 and 3, which will likely be submitted in year 2. We had no publications in year 1. Upon the grant award, we issued a press release (https://www.eurekalert.org/news-releases/991632), as did New West Genetics (https://newwestgenetics.com/2023/06/07/hudsonalpha-and-new-west-genetics-collaborate-on-usda-nifa-grant/).
What do you plan to do during the next reporting period to accomplish the goals?
In year 2 of this proposal, we are on track to complete several major goals and are currently ahead of schedule. First, we aim to complete the genome assemblies for all 5 sequenced genotypes. Given that the raw data are already in hand, we anticipate no major delays in this goal. The major goal for year 2 is to polish the assemblies; once all of the Objective 2 RNA-seq has been sequenced, we will have all the material needed for gene annotation. We have expanded this aim to include more genomes through a collaboration with Dr. Larry Smart's laboratory, where we are assisting in the data generation and assembly of an additional 10 genomes. In year 2, we will focus heavily on building the pangenome graph for Cannabis. Second, as mentioned above, we are currently sequencing all of the RNA-seq necessary to complete Objective 2. We will work toward releasing these data as a gene expression atlas displayed on a physical representation of a plant, using the R package gganatogram, which builds on ggplot2. This will be released as a publicly available module on our lab website in year 2.
All differential expression analyses will be conducted in year 2, and we will overlap genes showing male-specific expression with the set of X- and Y-linked genes to narrow down putative sex determination genes. Third, for Objective 3, we have generated all of the raw low-pass Illumina genotyping data needed from 960 individuals to map dioecy and monoecy genes. We are currently using our pangenome graph from Aim 1 to map these low-pass data and expect to narrow down the candidate sex determination genes within the year 2 reporting period. One challenge was our inability to recruit a summer intern given the May start date of the proposal, so our proposed internship program was delayed until year 2. We have recruited one international student from Mississippi State University to begin a 4-month internship in year 2.
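The planned overlap of male-specific expression with X- and Y-linked genes amounts to filtering a differential expression table and intersecting the result with the sex-linked gene set. This minimal sketch assumes each gene has a log2 fold change (male versus female flower tissue) and an adjusted p-value; the `DEResult` type, the gene IDs, and the thresholds (log2 fold change of at least 1, adjusted p-value below 0.05) are hypothetical conventions, not values taken from this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DEResult:
    """Hypothetical row of a male-vs-female flower differential expression table."""
    gene_id: str
    log2_fold_change: float  # > 0 means higher expression in male tissue
    padj: float              # multiple-testing-adjusted p-value


def male_biased(results: list[DEResult], lfc_min: float = 1.0,
                alpha: float = 0.05) -> set[str]:
    """Genes significantly up-regulated in male tissue."""
    return {r.gene_id for r in results
            if r.padj < alpha and r.log2_fold_change >= lfc_min}


def candidate_sex_genes(results: list[DEResult], sex_linked: set[str],
                        lfc_min: float = 1.0, alpha: float = 0.05) -> set[str]:
    """Male-biased genes that are also in the X-/Y-linked gene set."""
    return male_biased(results, lfc_min, alpha) & sex_linked


# Hypothetical example data.
results = [
    DEResult("geneA", 3.2, 0.001),   # male-biased and sex-linked
    DEResult("geneB", 2.5, 0.010),   # male-biased but autosomal
    DEResult("geneC", -2.0, 0.001),  # female-biased
    DEResult("geneD", 1.5, 0.200),   # not significant
]
sex_linked = {"geneA", "geneC", "geneD"}

print(sorted(candidate_sex_genes(results, sex_linked)))  # ['geneA']
```

The intersection shortens the candidate list; it does not by itself distinguish a master regulator from downstream floral genes that happen to be sex-linked.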

Impacts
What was accomplished under these goals?
In the first year of this project, we have met or exceeded our goals for data generation and analysis. Objective 1: We have generated all raw data for 5 chromosome-scale PacBio genomes (3 XY males, 1 XX monoecious individual, and 1 XX female). This includes 60X coverage of PacBio HiFi data and 60X coverage of Dovetail Omni-C sequencing. These data are currently being assembled into chromosome-scale assemblies. Additionally, owing to the falling prices of genome sequencing (in particular Illumina PE150 costs), we collaborated with Dr. Larry Smart's laboratory at USDA/Cornell to assist in Omni-C data generation to scaffold 10 additional Cannabis genomes. Combined, this dataset is poised to produce the largest Cannabis pangenome to date. We are working closely with Smart lab postdoc George Stack, who is leading these genome assemblies. Objective 2: For the gene expression atlas used to uncover sex determination genes on the X and Y, our vendor collaborators at New West Genetics have harvested 96 tissue types in replicate for RNA isolation. We have isolated RNA from all of these tissues, and the samples are currently in TruSeq stranded RNA-seq library preparation. These libraries are slated for sequencing in year 2 of this proposal (summer 2024). Objective 3: Our vendor collaborators at New West Genetics have also provided excellent plant material for several mutant populations related to sex variation. In year 1, we partnered with Josh Clevenger's laboratory at HudsonAlpha to generate 1X coverage low-pass Illumina genotyping data for 960 Cannabis individuals from an F2 population segregating for sex phenotypes: monoecy, maleness, and femaleness. These data have all been generated, at an average of 1X coverage per haplotype.
Using the pangenome graph from Objective 1, we are currently mapping sex variation to the X and Y sex chromosomes with the Clevenger Lab's Khufu genotyping pipeline. Preliminary results show hits on the X chromosome that match some known sex determination genes in the Rosales. We believe this is the densest Cannabis genotyping dataset produced to date, and we are currently performing additional phenotyping on the New West Genetics germplasm that we sequenced in order to map additional traits (oil and flowering time in particular). Further, we are demonstrating that low-pass genotyping data can be mapped to pangenome graphs, which is not yet routine.
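To give a sense of what 1X low-pass genotyping implies, a Poisson approximation of read depth (the standard Lander-Waterman assumption) gives the expected fraction of sites in one individual that receive no reads, or too few reads to observe both alleles at a heterozygous site (which needs at least two). This minimal sketch assumes a mean per-individual depth of 1X and uniform coverage; real depth is overdispersed, and it does not describe the internals of the Khufu pipeline.

```python
import math


def poisson_pmf(k: int, mean_depth: float) -> float:
    """Probability that a site is covered by exactly k reads."""
    return math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)


def fraction_uncovered(mean_depth: float) -> float:
    """Expected fraction of sites with zero reads, e^(-depth)."""
    return poisson_pmf(0, mean_depth)


def fraction_at_least(mean_depth: float, min_reads: int) -> float:
    """Expected fraction of sites covered by at least `min_reads` reads."""
    return 1.0 - sum(poisson_pmf(k, mean_depth) for k in range(min_reads))


depth = 1.0  # assumed mean per-individual depth
print(round(fraction_uncovered(depth), 3))    # 0.368
print(round(fraction_at_least(depth, 2), 3))  # 0.264
```

At 1X, roughly 37% of sites carry no read in a given individual and only about 26% carry two or more, which is why low-pass designs typically lean on many individuals and linked markers rather than per-site depth.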

Publications