Source: OREGON STATE UNIVERSITY submitted to NRP
CHROMOSOME-LEVEL ASSEMBLY AND GENOMIC DATA SCIENCE TO REVEAL INSIGHTS ABOUT CONE DEVELOPMENT, DISEASE RESISTANCE, AND THE EVOLUTION OF HOP
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1023066
Grant No.
2020-67034-31722
Cumulative Award Amt.
$120,000.00
Proposal No.
2019-07273
Multistate No.
(N/A)
Project Start Date
Jun 15, 2020
Project End Date
Aug 14, 2022
Grant Year
2020
Program Code
[A7101]- AFRI Predoctoral Fellowships
Recipient Organization
OREGON STATE UNIVERSITY
(N/A)
CORVALLIS, OR 97331
Performing Department
COS Biochem & Biophysics
Non Technical Summary
Hop (Humulus lupulus) is known for the unique array of bioactive compounds that it produces, and it has a long history of use in traditional medicine. Hop produces xanthohumol, which has anticancer activity, and bitter acids, which impart flavor in brewing and possess antimicrobial activity. Hop is also susceptible to diseases including powdery mildew (PM), which reduces the quality and quantity of hop cone yield. The genome of hop is large and complex, which challenged early assembly efforts. Because of advances in sequencing and assembly, we now have a chromosome-level assembly that we can analyze for gene content and other genomic features. With a comprehensive annotation of genes, we can investigate evolutionary patterns in hop and extensively analyze gene expression and function. First, large-scale structural similarities between hop and closely related species, including hemp and mulberry, will be identified by aligning genes from the hop assembly to the genes of these other species. Significantly similar gene sequences will be ordered based on their genomic coordinates to detect regions of conserved gene order between the assemblies. In the next phase of the project, gene expression during hop cone development will be assessed. RNA will be extracted from cone and leaf tissues at five time points during cone development, followed by sequencing and alignment of the RNA-seq reads to the hop assembly. Overall, the purpose of this study is to 1) find genes that show variable expression during development; 2) predict the function of these genes by comparing them to genes from other species with known function that have a highly similar DNA sequence; 3) determine the location of genes along the chromosome; and 4) identify other genes and genomic features surrounding these genes of interest. The results of this study will have an impact on improving hop health and production.
My goal is to uncover genes that both show variable expression during development and contribute to the presentation of desired traits, including disease resistance and synthesis of bioactive compounds. Mapping similarities between the genomes of hop and closely related species will improve our understanding of the evolutionary history of hop. It will also provide insight into whether genes involved in traits of interest are conserved between species or show evidence of selective pressure and duplication. The results of this study will inform the development of disease-resistant hop cultivars containing a desired profile of bioactive compounds through improved genomics-assisted breeding strategies. Taken together, the chromosome-level assembly of hop will transform hop genomics.
Animal Health Component
25%
Research Effort Categories
Basic
75%
Applied
25%
Developmental
0%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
201                    2230                              1080                      75%
206                    2230                              1040                      25%
Goals / Objectives
The first major goal of this project is to use the chromosome-level assembly of the Cascade hop genome to construct a high-quality set of gene predictions, repeat annotations, and other genomic features, including long non-coding RNAs (lncRNAs). The predicted function of gene models and their location along the length of chromosome scaffolds will allow for a comparison of gene content, conserved gene ordering and large-scale synteny, and occurrence of gene duplication in closely related plant species. Mapping regions of synteny will be important both for understanding how hop has evolved and for revealing selective pressures from environmental and cultivation stresses. Overall, a comprehensive annotation of the chromosome-level assembly will help hop researchers to select for desirable traits and will inform breeding and harvesting strategies. Hereafter, this part of the project will be referred to as CASA (chromosome-level assembly and synteny analysis).
The first objective of CASA is to perform gene prediction and repeat annotation using the chromosome-level assembly of Cascade. I will assess the quality of these annotations using established metrics, including the percentage of gene models containing a protein domain, and the extent and type of evidence supporting the gene model prediction.
The second objective of CASA builds upon the first objective through an analysis of gene content. Regions of synteny between hop and closely related species will be detected based on the relative location of genes along chromosome scaffolds. I will investigate the mechanism underlying duplication events by looking for evidence of whole genome duplication (WGD) and tandem duplication, and by investigating functional enrichment of genes that occur in syntenic regions.
The next phase of this project is a time-course (TC) RNA-seq study in cone and leaf tissues at five time points during cone development.
The major goal of the TC RNA-seq study is to discover how gene and lncRNA expression changes during development. Detection of patterns of coordinated expression between genes and lncRNAs could provide information about how genes are regulated during development.
The first objective of the TC RNA-seq study is to collect cone and leaf tissues at five developmental time points, extract and sequence the RNA, align the RNA-seq reads to the Cascade assembly, and then perform differential expression analysis.
The second objective of the TC RNA-seq study is to annotate lncRNAs, identify patterns of co-expression with protein-coding genes, and assign functional associations based on the homology of protein-coding genes to known genes.
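As an illustration of the co-expression objective, the following is a minimal Python sketch of how lncRNA and protein-coding gene pairs might be screened. It uses Pearson correlation and a false discovery rate of 0.05, as stated in the Project Methods. The Fisher z-transformation used to approximate p-values, the Benjamini-Hochberg correction, and all function and variable names are assumptions made for illustration only; the report does not say how significance was computed, and this is not the project's actual pipeline.

```python
import math
from itertools import product
from statistics import NormalDist


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: correlation is undefined, treat as 0
    return sxy / math.sqrt(sxx * syy)


def fisher_p(r, n):
    """Approximate two-sided p-value for r using the Fisher z-transformation.

    Assumption: this normal approximation is rough for small sample sizes.
    """
    if n <= 3:
        raise ValueError("the Fisher z-transformation needs more than 3 samples")
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def coexpressed_pairs(lnc_expr, gene_expr, fdr=0.05):
    """Return (lncRNA, gene, r, adjusted p) for pairs passing the FDR cutoff."""
    pairs = list(product(lnc_expr, gene_expr))
    n = len(next(iter(lnc_expr.values())))
    rs = [pearson(lnc_expr[lnc], gene_expr[gene]) for lnc, gene in pairs]
    qs = benjamini_hochberg([fisher_p(r, n) for r in rs])
    return [(lnc, gene, r, q) for (lnc, gene), r, q in zip(pairs, rs, qs) if q <= fdr]
```

Both inputs are dictionaries mapping a transcript identifier to a list of expression values, one value per sample and in the same sample order. Both positive and negative correlations can pass the cutoff, so the sign of r should be inspected when interpreting a pair.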
Project Methods
Overview:
The chromosome-level assembly of Cascade will provide the foundation for identifying genomic features, including genes, repeat sequences, and long non-coding RNAs, that are associated with multiple important biological processes, including disease response and synthesis of desirable bioactive compounds.
Methods:
Repeat Annotation
Repeat sequences in the hop assembly will be identified with a de novo library of long terminal repeat (LTR) retrotransposon sequences and a library of repeat annotations from plants. The de novo library of LTRs will be constructed using LTR_FINDER, LTRharvest, and LTR_retriever. Masking will be performed by RepeatMasker.
Gene Prediction
Gene models will be constructed by first aligning sources of external evidence to the assembly, including hop-specific RNA-seq and ESTs, and plant genes from UniProt. Gene prediction programs Augustus and SNAP will be trained on the Cascade assembly. Draft gene models will be iteratively constructed and improved with MAKER using the external alignment data and the trained gene prediction programs.
Gene model quality will be assessed by 1) quality metrics specific to MAKER, including the annotation edit distance (AED) and quality index (QI); 2) the overall number of gene models and the percentage of genes containing a protein domain; and 3) visually inspecting gene models on a browser to compare against RNA-seq transcripts and alignments.
Gene Duplication and Synteny
Duplication in the hop genome will be assessed by aligning genes from the Cascade assembly all-against-all, collecting putative paralogs based on an E-value and score threshold, ordering the genes based on their genomic coordinates, and analyzing this set to detect conserved gene order.
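The paralog collection and ordering steps described above can be sketched in Python as follows. This is a simplified illustration, not the project's pipeline: it assumes the all-against-all alignments are in BLAST tabular format (outfmt 6, with the E-value in column 11 and the bit score in column 12), the threshold values are placeholders, and the tandem-pair rule (paralogs at most `max_gap` positions apart on the same chromosome) is one common working definition rather than something stated in the proposal. All function names are hypothetical.

```python
from collections import defaultdict


def filter_paralogs(blast_lines, max_evalue=1e-10, min_bitscore=100.0):
    """Keep non-self hits from BLAST tabular (outfmt 6) lines passing both thresholds."""
    pairs = set()
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if query != subject and evalue <= max_evalue and bitscore >= min_bitscore:
            pairs.add(tuple(sorted((query, subject))))  # A-B and B-A count once
    return pairs


def gene_ranks(gene_coords):
    """Map gene -> (chromosome, rank), ranking genes by start coordinate per chromosome."""
    by_chrom = defaultdict(list)
    for gene, (chrom, start) in gene_coords.items():
        by_chrom[chrom].append((start, gene))
    ranks = {}
    for chrom, genes in by_chrom.items():
        for rank, (_, gene) in enumerate(sorted(genes)):
            ranks[gene] = (chrom, rank)
    return ranks


def tandem_pairs(pairs, ranks, max_gap=1):
    """Paralog pairs on the same chromosome whose ranks differ by at most max_gap."""
    tandem = []
    for a, b in sorted(pairs):
        (chrom_a, rank_a), (chrom_b, rank_b) = ranks[a], ranks[b]
        if chrom_a == chrom_b and abs(rank_a - rank_b) <= max_gap:
            tandem.append((a, b))
    return tandem
```

Detecting longer runs of conserved gene order requires chaining many such pairs into collinear blocks, which this sketch does not attempt.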
Synteny between hop and its close relatives will be investigated by aligning genes from the hop assembly to the genes of closely related species, including Cannabis sativa (hemp), Trema orientalis, Parasponia andersonii, Morus notabilis (mulberry), and Vitis vinifera (wine grape), and an extensively annotated model plant, such as Arabidopsis. The task of detecting synteny will be accomplished using MCScanX, and syntenic blocks will be visualized using Tripal Synteny Viewer.
Time-course RNA-seq
Hop cones produce compounds valued for their aromatic and therapeutic properties, including essential oils, polyphenols, and bitter acids. The production of these compounds varies during development. Hop cones also show age-related resistance to powdery mildew (PM), where resistance to PM is acquired later in cone development. Because biosynthetic activity and disease response change during development, it will be valuable to monitor gene expression throughout these formative stages.
Tissue from cone and leaf will be gathered at five time points during cone development. RNA will be extracted using a Qiagen Plant RNeasy mini kit, followed by library prep and paired-end sequencing. RNA-seq reads will be processed with Trimmomatic to remove adaptor sequences, poor-quality bases, and poor-quality reads. Reads will be aligned to the Cascade assembly with HISAT2 and assembled into transcripts with StringTie. Differential expression across time points will be assessed with maSigPro. Functional enrichment of genes will be assessed based on homology to UniProt plant genes and their associated GO terms. To visually capture changes in gene expression that occur during development, gene expression will be displayed on GBrowse using the TopoView glyph.
Expression of select genes will be validated by droplet digital PCR (polymerase chain reaction).
Identification of long non-coding RNAs
Long non-coding RNAs (lncRNAs) will be identified by selecting transcripts at least 200 nucleotides (nt) in length and removing transcripts containing either a protein or transposon domain. Putative lncRNAs must lack an open reading frame (ORF), or have an ORF less than 300 nt in length. The pool of candidate lncRNA transcripts will be assessed for protein-coding potential with the Coding-Potential Assessment Tool (CPAT). CPAT will be trained on protein-coding genes and lncRNAs from Arabidopsis.
Functional Association and Co-expression of lncRNAs and Protein-Coding Genes
The functional association of lncRNAs can be inferred from the predicted function of co-expressed protein-coding genes. Co-expression will be determined from the pairwise Pearson correlation coefficient computed between all lncRNA and protein-coding gene pairs, applying a false discovery rate threshold of 0.05.
Effort
Data generated from this project will be uploaded to public databases, including the Sequence Read Archive (SRA), GenBank, NCBI Assembly, and Ag Data Commons, as well as our centralized repository for hop genomics, http://hopbase.org. Scripts written to perform computational analyses will be available on GitHub.
Findings will be shared with scientists through scientific publications, conference posters and presentations, and social media. Also, our online resource, HopBase, provides a platform to engage with researchers who study hop and its close relatives.
Efforts to engage the public will include social media and writing for popular science news outlets about the evolution of hop and its unique biochemistry.
My scientific outreach to the community will include two components. The first is my involvement as a co-host on the radio show about graduate student research at Oregon State University, Inspiration Dissemination (ID).
As a co-host, I will use this outlet to encourage graduate students studying plant genomics to participate in episodes of ID. I will also share my research on ID as a guest. The episode and accompanying blog post will be promoted on social media to reach a broader audience.
The second component of my scientific outreach is my involvement in a computational biology summer camp for middle school students in summer 2021 that is organized by my primary advisor. I have participated in this camp during summers 2017-2019. For the next year, I will create content for this session, including a command line and web-based activity, and I will lead this session. The focus will be plant genomics, including foundational concepts in genome assembly and annotation, with an emphasis on what makes this area of research interesting and impactful to our daily lives. This camp provides an opportunity to mentor students and is an excellent opportunity to educate students about agricultural research happening in Oregon.
Evaluation
The success of the project will be measured by the creation and impact of deliverables, including manuscripts and the addition of new features and data to http://hopbase.org. Two manuscripts are expected to result from this project: one manuscript at the end of year one, and a second manuscript at the end of year two. The first manuscript will report the chromosome-level assembly of Cascade and comparative genomic analyses between hop and closely related species. The second manuscript will contain the results of the time course RNA-seq study. Data will be uploaded to http://hopbase.org, NCBI Assembly, GenBank, the Sequence Read Archive, and Ag Data Commons. Scripts written to perform computational analyses will be made available on GitHub. Posters and presentations at conferences that include findings from this project will also be an indicator of success. Feedback from scientists in a conference setting will provide a way to assess progress and gain new perspectives.
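To make the lncRNA selection criteria from the Project Methods concrete (transcripts of at least 200 nt, no protein or transposon domain, and either no ORF or an ORF shorter than 300 nt), here is a minimal Python sketch. It is illustrative only. It assumes an ORF runs from an ATG to the first in-frame stop codon, scans all six reading frames, ignores ORFs that lack a stop codon, and takes the domain search result as a precomputed boolean. The function names are hypothetical, and the subsequent CPAT coding-potential step is not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_orf_length(seq):
    """Length in nt of the longest ATG-to-stop ORF (stop included) in six frames.

    Assumption: ORFs that run off the end of the transcript without a stop
    codon are not counted.
    """
    seq = seq.upper()
    best = 0
    for strand in (seq, seq.translate(COMPLEMENT)[::-1]):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    best = max(best, i + 3 - start)
                    start = None  # look for the next ORF in this frame
    return best


def is_lncrna_candidate(seq, has_domain_hit, min_length=200, max_orf=300):
    """Apply the length, domain, and ORF filters to one transcript sequence."""
    if len(seq) < min_length or has_domain_hit:
        return False
    return longest_orf_length(seq) < max_orf
```

Transcripts that pass these filters would still be candidates only; the Methods describe scoring them for protein-coding potential before they are called lncRNAs.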

Progress 06/15/20 to 07/31/22

Outputs
Target Audience:
Target Audience Summary: The target audience reached by my work includes scientists, the general public, and middle school students.
Efforts:
Scientists
During my fellowship I reached scientists through scientific publications and preprints. I also used Twitter to engage with the scientific community.
General public
As part of my extension and outreach activity, I developed content for a Computational Biology Camp for middle school students, organized by my lab and primary advisor, Dr. Hendrix. I was also a lead instructor for one of the days of the camp. Our camp was held in-person at Oregon State University during the week of July 25-29, 2022, and need-based scholarships were available to students. The goal for the camp was to introduce students to foundational ideas in computational biology and phylogenomics. Students learned how to form a scientific research question, and then how to use Biopython and bioinformatic web tools to perform biological sequence analysis to answer the question. The web page and content I developed this year for the camp can be accessed at the below URL. This is in addition to the content I developed in previous years.
From 2016-2021, I was a co-host on Inspiration Dissemination (ID), a radio show, blog, and podcast about graduate student life and research at Oregon State University. Co-hosting ID provided an opportunity to meaningfully refine my science communication skills to share scientific findings and information with the general public.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided?
My fellowship provided extensive opportunities for training and professional development. Included below is a list of activities and accomplishments since I began my fellowship in June 2020.
Awards
I wrote and submitted a successful postdoctoral fellowship proposal for the NSF Postdoctoral Research Fellowship in Biology in December 2021. The proposed project will take place in the lab of Todd Michael at the Salk Institute for Biological Studies in San Diego, California. This project was funded by NSF and will begin in September 2022. I received the "Graduate Researcher of the Year" award from my department at OSU, Biochemistry and Biophysics, on June 15, 2022. I was a finalist for the student travel award to attend the Plant and Animal Genome (PAG) 2022 conference in San Diego, California (email correspondence with Dr. Tom Blake, PAG plant travel grant coordinator). I received the award for "Best Graduate Student Poster" at the Center for Quantitative Life Sciences (CQLS) Fall Conference on September 21, 2021 at OSU. Poster title: A chromosome-level assembly of the "Cascade" hop (Humulus lupulus) genome provides insights into Cannabaceae evolution.
Manuscript and proposal writing
I had the opportunity to write and revise my manuscript about the chromosome-level assembly of the Cascade hop genome. This process included assembling and editing the manuscript, submitting the manuscript to a journal, and then making revisions and composing a response to reviewers. I participated in writing and developing ideas for a proposal about comparative genomic and metabolomic studies of terpene synthase genes in hop and hemp. This proposal was developed by the Cannabaceae research group that includes interdisciplinary researchers from USDA, the Global Hemp Innovation Center (GHIC), and the departments of biochemistry, horticulture, and pharmacy at OSU. The resulting proposal was submitted to USDA in May 2021 and received notice of award in December 2021. Through these meetings, I had the opportunity to participate in grant proposal writing related to my fellowship-associated research goals.
Service
I was asked to be part of the search committee for a Research Development Coordinator in the College of Science at Oregon State University. I was recruited to participate in the search committee by Professor Vrushali Bokil, Associate Dean of Research and Graduate Studies in the College of Science. I served as an ad hoc reviewer for Frontiers in Plant Science, section Plant Bioinformatics, in February 2022, for HortScience in July 2021, and for Scientific Reports in May 2021.
Workshops, coursework, and continued work on HopBase.org
I took a class in Spring 2022 called Algorithms for Computational Biology that was taught by my graduate advisor, Dr. Hendrix (received an A in this class). The class included a variety of algorithms used in computational biology, including sequence alignment with Smith-Waterman and Needleman-Wunsch, Burrows-Wheeler transform, and methods for developing phylogenetic trees. I took a class called Statistical Genomics in Winter 2021 (I received an A in this class). Besides learning about many different statistical methods commonly used in genomics, such as dimensionality reduction using PCA, I completed a group project. I worked with two other students to develop a project focused on clustering and dimension reduction of time-labeled gene expression data from post-mortem human brain tissue samples. By applying clustering and dimension reduction techniques, our goal was to train a model capable of predicting time-of-day on unlabeled gene expression data sets. We performed data analysis in this course with R. I was accepted to the Total Evidence Dating Virtual Workshop led by the Vascular Plant Timeline Working Group, held on Wednesdays and Fridays from March 2-30, 2022, which was a great opportunity to ask questions about Bayesian statistics and fossil calibration dates to generate a fossil-calibrated time tree.
I continued to update and maintain the data on HopBase.org, which also involved answering questions and troubleshooting problems from database users.
Talks
I was invited to give a virtual talk about my research to the Gary Muehlbauer Lab in the Department of Agronomy and Plant Genetics at University of Minnesota. I gave a talk about my graduate research project as a prospective postdoctoral fellow to the labs of the Harnessing Plants Initiative at the Salk Institute in May 2022. I had the opportunity to give a talk at an international conference, the 5th International Humulus Symposium, about my work on HopBase.org, held virtually in March 2021.
Teaching
I had the opportunity to further develop learning content and to lead activities for the computational biology camp for middle school students. The camp was organized by my graduate advisor, Dr. Hendrix, and was held during the week of July 25-29, 2022. Previously, the camp was held on July 12-16, 2021 and November 14-15, 2020. I helped to mentor and train a first-year graduate student who took over my graduate project on the hop genome.
Milestones
I graduated with my PhD from the Department of Biochemistry and Biophysics at Oregon State University on July 21, 2022.
How have the results been disseminated to communities of interest?
We disseminated results to communities of interest primarily through peer-reviewed publications in scientific journals. We also deposited our sequencing data to NCBI (BioProject ID PRJNA562558) and have made other accompanying data, including gene models, available on http://hopbase.cqls.oregonstate.edu/. Our computational pipelines and scripts are available on GitHub: https://github.com/padgittl/CascadeDovetail and https://github.com/padgittl/CascadeHopAssembly, to make the protocols more transparent and to guide others embarking on similar tasks.
For my first-author paper about the PacBio assembly of the Cascade hop genome, released in February 2021, we shared the announcement of the publication on Twitter and coordinated an announcement of the publication with the OSU press office. Coordinating the press release through OSU resulted in being contacted by local news organizations about the release of our paper, which reached a broader, more general audience. The peer-reviewed publication will allow our research to reach an audience interested in the genomics of hop and its close relatives. To enhance public understanding about computational biology and genomics, I helped to organize the computational biology camp for middle school students. A primary goal of the camp was to encourage interest in learning among young students, particularly students who might not be aware of opportunities in computational biology, and to demonstrate how a career in science can be fun and also provide benefits to people. From 2016-2021, I was a co-host on Inspiration Dissemination (ID), a radio show, blog, and podcast about graduate student life and research at Oregon State University. Co-hosting ID provided an opportunity to meaningfully refine my science communication skills to share scientific findings and information with the general public. I was centrally involved in preparing and hosting ID. During my time as a co-host on ID, I was part of more than 30 shows and I helped to mentor new co-hosts. The process of producing a show consisted of meeting with the guest for an hour-long pre-interview where we mapped the trajectory of the 30-minute, on-air show, followed by writing a 500-800-word blog post and composing an outline of questions for the show.
Blog: http://blogs.oregonstate.edu/inspiration/author/padgittl/
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported

Impacts
What was accomplished under these goals?
Impact Statement: My project had two major goals. The first major goal was to use the chromosome-level assembly of the Cascade hop genome to construct a high-quality set of gene model predictions, repeat annotations, and long non-coding RNAs (lncRNAs). Towards this goal, I used multiple methods to develop and evaluate gene model predictions, repeat and transposon annotations, and lncRNAs. The second phase of the project was the time-course (TC) RNA-seq study where the major goal was to identify changes in expression associated with genes and lncRNAs during development, with the intention of detecting patterns of coordinated expression between genes and lncRNAs as a way to discover facets of gene regulation.
1) Major activities completed / experiments conducted: For the first phase of the project, in the first objective, I performed assembly polishing and repeat masking. I also assessed the quality of the gene models using established metrics, including the percentage of gene models containing a protein domain, and the extent and type of evidence supporting the gene model prediction. As part of the second objective, I performed a detailed investigation of gene content, organization, evolution, and synteny (conserved gene order). I investigated the mechanism underlying duplication events by looking for evidence of whole genome duplication and tandem duplication using distributions of Ks values (the number of synonymous substitutions per synonymous site), and by investigating functional enrichment of genes occurring in syntenic regions. For the TC RNA-seq study, I assembled the transcripts and processed the data for the differential expression analysis. The TC RNA-seq study was a collaborative project with Dr. Renee Eriksen, a post-doc in the lab of my co-advisor, Dr. Henning.
We published two papers directly related to the major goals of the project:
Padgitt-Cobb LK, Kingan SB, Wells J, Elser J, Kronmiller B, Moore D, Concepcion G, Peluso P, Rank D, Jaiswal P, Henning J, Hendrix DA (2021). A draft phased assembly of the diploid cascade hop (Humulus lupulus) Genome. The Plant Genome, e20072.
Eriksen RL, Padgitt-Cobb LK, Randazzo AM, Hendrix DA, Henning JA. Gene Expression of Agronomically Important Secondary Metabolites in cv. 'USDA Cascade' Hop (Humulus lupulus L.) Cones during Critical Developmental Stages. Journal of the American Society of Brewing Chemists. 2021 Aug 28:1-4.
A third manuscript reporting the chromosome-level assembly of Cascade is posted as a preprint to bioRxiv:
Padgitt-Cobb LK, Pitra NJ, Matthews PD, Henning JA, Hendrix DA. A chromosome-level assembly of the "Cascade" hop (Humulus lupulus) genome uncovers signatures of molecular evolution and improves time of divergence estimates for the Cannabaceae family. bioRxiv. 2022. https://www.biorxiv.org/content/early/2022/04/12/2022.03.24.485698.full.pdf
Simultaneously, we published two additional papers that used the hop genome assembly as the reference assembly:
Padgitt-Cobb LK, Kothen-Hill S, Henning J, Hendrix DA. The long-read genome assembly of hop (Humulus lupulus) uncovers the pseudoautosomal region and other genomic features. In V International Humulus Symposium 1328 2021 Mar 8 (pp. 1-16).
Eriksen RL, Padgitt-Cobb LK, Townsend MS, Henning JA (2021). Gene expression for secondary metabolite biosynthesis in hop (Humulus lupulus L.) leaf lupulin glands exposed to heat and low-water stress. Scientific Reports, 11(1), 1-18.
2) Data collected: Using the chromosome-level assembly of Cascade and RNA-seq data from cone, leaf, meristem, and stem tissues, I developed gene models and then used these gene models as the basis for further modes of data analysis and generation, including creation of orthologous gene groups, syntenic gene blocks, phylogenetic trees, and functional annotations. Based on the information obtained from the gene models, I also developed a set of lncRNAs that will be a valuable resource for further study.
3) Summary statistics and discussion of results: For the first phase of the project, the outcomes included 1) identification of gene content and genomic organization; 2) identification of syntenic gene blocks; and 3) development of orthologous gene groups, including single-copy orthologs that we used for estimating a time tree based on multiple methods. The analysis of orthologous gene groups incorporated protein sequences from seven other plant species: Cannabis sativa, Morus notabilis, Parasponia andersonii, Prunus persica, Trema orientale, Vitis vinifera, and Ziziphus jujuba. Further, we performed extensive comparative genomic analyses with other members of the Cannabaceae family, including hemp (Cannabis sativa). We found an enrichment of defense response and terpene synthase genes that tend to co-localize near each other, which has implications for understanding how copy number variation and genomic arrangement influence defense response and terpene production in hop. Defense response and terpene production are areas of intense research interest for hop and its close relatives. For the second phase of the project (TC RNA-seq study), we performed an analysis of gene expression during critical stages of development in hop cones.
We found that important secondary metabolic genes are up-regulated during the middle stages of cone development, which is supported by previous research reporting the abundance of secondary metabolites during the middle stage of development. We hypothesized that stress during the middle stage of cone development could be particularly harmful to the production of bitter acids in hop. From these results, we can conclude that the time of harvest is an important factor in determining the final concentration of desired secondary metabolites.
4) Key outcomes or other accomplishments realized: The major outcomes of this project signify a change in knowledge. Specifically, key direct outcomes of this work include the submission of our manuscript reporting these results to bioRxiv (Padgitt-Cobb LK, Pitra NJ, Matthews PD, Henning JA, Hendrix DA. A chromosome-level assembly of the "Cascade" hop (Humulus lupulus) genome uncovers signatures of molecular evolution and improves time of divergence estimates for the Cannabaceae family. bioRxiv. 2022. https://www.biorxiv.org/content/early/2022/04/12/2022.03.24.485698.full.pdf). Our bioRxiv preprint is currently under review. The findings reported in the bioRxiv preprint demonstrate an improved understanding of genomic content and evolution in hop. The assembly, gene models, gene model functional annotations, and repeat annotations will continue to inform and guide research efforts for the plant science community. This work also serves as the foundation for future research projects for other graduate and undergraduate students. Further, the genomic resources that we developed are publicly available on http://hopbase.cqls.oregonstate.edu/ and under NCBI BioProject ID PRJNA562558. Scripts and pipelines are accessible on https://github.com/padgittl/CascadeDovetail. Taken together, these resources will be valuable for the plant science community.

Publications

  • Type: Theses/Dissertations Status: Accepted Year Published: 2022 Citation: Padgitt-Cobb, Lillian K. 2022. Towards Resolving Functional and Evolutionary Mysteries of the Large and Heterozygous Genome of Hop (Humulus Lupulus) and the Cannabaceae Family Using Genomic Data Science. Oregon State University. https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/x920g4930
  • Type: Conference Papers and Presentations Status: Published Year Published: 2021 Citation: Padgitt-Cobb LK, Kothen-Hill S, Henning J, Hendrix DA. The long-read genome assembly of hop (Humulus lupulus) uncovers the pseudoautosomal region and other genomic features. In V International Humulus Symposium 1328 2021 Mar 8 (pp. 1-16).
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Eriksen RL, Padgitt-Cobb LK, Randazzo AM, Hendrix DA, Henning JA. Gene Expression of Agronomically Important Secondary Metabolites in cv. USDA Cascade Hop (Humulus lupulus L.) Cones during Critical Developmental Stages. Journal of the American Society of Brewing Chemists. 2021 Aug 28:1-4.
  • Type: Journal Articles Status: Under Review Year Published: 2022 Citation: Padgitt-Cobb LK, Pitra NJ, Matthews PD, Henning JA, Hendrix DA. A chromosome-level assembly of the "Cascade" hop (Humulus lupulus) genome uncovers signatures of molecular evolution and improves time of divergence estimates for the Cannabaceae family. bioRxiv. 2022. https://www.biorxiv.org/content/early/2022/04/12/2022.03.24.485698.full.pdf
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Padgitt-Cobb LK, Kingan SB, Wells J, Elser J, Kronmiller B, Moore D, Concepcion G, Peluso P, Rank D, Jaiswal P, Henning J, Hendrix DA (2021). A draft phased assembly of the diploid cascade hop (Humulus lupulus) Genome. The Plant Genome, e20072.
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Eriksen RL, Padgitt-Cobb LK, Townsend MS, Henning JA (2021). Gene expression for secondary metabolite biosynthesis in hop (Humulus lupulus L.) leaf lupulin glands exposed to heat and low-water stress. Scientific Reports, 11(1), 1-18.


Progress 06/15/20 to 06/14/21

Outputs
Target Audience: Target Audience Summary: The target audience reached by my work this year includes scientists, the general public, and middle school students. Efforts: Scientists: I reached scientists through two peer-reviewed scientific publications. Specifically, my research reached scientists through "A draft phased assembly of the diploid Cascade hop (Humulus lupulus) genome," published in The Plant Genome in February 2021. I am also second author on a published paper in Scientific Reports, titled "Gene expression for secondary metabolite biosynthesis in hop (Humulus lupulus L.) leaf lupulin glands exposed to heat and low-water stress." Although these two papers do not directly include the work described in my proposal, the published work is relevant to my target audience, including researchers interested in hop breeding, genetics, and genomics, haplotype-phased assemblies, and genome assembly more generally. Additionally, I am second author on a manuscript currently under review at the Journal of the American Society of Brewing Chemists, titled "Gene expression of agronomically important secondary metabolites in cv. 'USDA Cascade' hop (Humulus lupulus L.) cones during critical developmental stages," which targets a similar audience to the two papers mentioned previously. Once published, this work will help breeders to determine the optimal time to harvest hop cones to achieve the desired aromatic profile, based on gene expression levels during development. We found increased expression of genes involved in bitter acid biosynthesis during the mid stage of development, which will be relevant information to the target audience. On March 8th, 2021, I gave a virtual talk at the 5th International Humulus Symposium about my work on HopBase, titled "Developing and improving genomic resources for Humulus lupulus at HopBase.org."
Through this talk, I was able to directly reach scientists in the hop and hemp community, and received both feedback and questions about HopBase. I shared my scripts and run commands on GitHub, with the intention of making our scientific process more transparent, and also with the goal of helping other scientists to perform similar analyses. I used my Twitter platform to engage with the general scientific community. General public: My co-advisor, Dr. John Henning, was interviewed for Oregon Public Broadcasting's program "Think Out Loud" about the PacBio long-read assembly of the hop Cascade genome. Also, my advisors and I were interviewed for a news piece by a journalist at Capital Press. By sharing the results of our work with the media, we were able to reach the general community. Middle school students: As part of my extension and outreach activity, I developed content for a Computational Biology Camp for middle school students, organized by my lab and primary advisor, Dr. Hendrix. I was also a lead instructor for one of the days of the camp. Our camp was held over the weekend of November 14-15, 2020, and need-based scholarships were available. The goal for the camp was to introduce students to foundational ideas in computational biology and phylogenomics. Students learned how to form a scientific research question, and then how to use Biopython and bioinformatic web tools to perform biological sequence analysis to answer the question. The web pages I developed for the camp can be accessed at the links below: http://compbiocamp.cgrb.oregonstate.edu/2020/TreeOfLifePart1.html http://compbiocamp.cgrb.oregonstate.edu/2020/TreeOfLifePart3.html http://compbiocamp.cgrb.oregonstate.edu/2020/TreeOfLifePart4.html
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Training activities: I had weekly one-on-one meetings with my primary advisor, Dr. Hendrix, to discuss ideas, coding, and results, and I also met with both Dr. Hendrix and my co-advisor, Dr. Henning, to discuss ideas and results. I completed a graduate-level course in statistical genomics, offered by the statistics department at Oregon State University (OSU), during winter 2021. This course involved a collaborative final project using a dataset provided by a group member. We performed clustering and dimension reduction on a gene expression data set labelled by time, with the goal of generating principal components that could be included in a machine learning model to predict time in a time-unlabeled gene expression dataset. I collaborated on two projects with Dr. Renee Eriksen, a USDA post-doc in Dr. Henning's lab. The first project involved gene expression from hop tissues exposed to heat and drought stress, and this project culminated in a publication in Scientific Reports, titled "Gene expression for secondary metabolite biosynthesis in hop (Humulus lupulus L.) leaf lupulin glands exposed to heat and low-water stress." For this project, I did transcript assembly and prepared the data for differential expression analysis. The second collaborative project involved the time-course RNA-seq experiment described in objective two of my fellowship proposal, and this work culminated in a manuscript that is under review at the Journal of the American Society of Brewing Chemists, titled "Gene expression of agronomically important secondary metabolites in cv. 'USDA Cascade' hop (Humulus lupulus L.) cones during critical developmental stages." For this project, I did the transcript assembly and prepared the data for differential expression analysis. In May 2021, I reviewed a manuscript for Scientific Reports about the identification of long non-coding RNAs in a non-model plant species exposed to drought. This experience offered a valuable opportunity to acquire the skills to review a manuscript.
Professional development activities: I contributed to an interdisciplinary collaboration involving hop and hemp researchers, including researchers associated with the Global Hemp Innovation Center (GHIC) at OSU, as well as USDA-ARS, to identify and discuss potentially synergistic areas of research. Through these meetings, I have had the opportunity to participate in grant proposal writing related to my fellowship-associated research goals. I had the opportunity to give a talk at an international conference, the 5th International Humulus Symposium, about my work on HopBase.org, held virtually in March 2021. For the same event, my primary advisor, Dr. Hendrix, was invited to give a keynote talk and submit a conference manuscript, which was titled "The Long-read Genome Assembly of Hop (Humulus lupulus) Uncovers the Pseudoautosomal Region and other Genomic Features." I had the opportunity to develop content for the virtual Computational Biology Camp for middle school students, organized by my lab and held on the weekend of November 14-15, 2020. I developed most of the content for the second day of activities for the two-day camp, and also led the camp activities on the second day. This opportunity allowed me to develop skills in teaching and the creation of educational materials.
How have the results been disseminated to communities of interest? Results have primarily been disseminated to communities of interest through peer-reviewed publications. For my first-author paper about the PacBio assembly of the Cascade hop genome, released in February 2021, we shared the announcement of the publication on Twitter and coordinated an announcement of the publication with the OSU press office. Coordinating the press release through OSU led local news organizations to contact us about the paper, which allowed the work to reach a broader, more general audience.
The peer-reviewed publication will allow our research to reach an audience interested in the genomics of hop and its close relatives. My computational pipelines are accessible on my GitHub project page, both to make the protocol more transparent and to guide others embarking on similar tasks. I provided scripts and run commands on GitHub to document the process of gene prediction and repeat identification, and to centralize all of the references that were helpful to me while developing the pipeline. My GitHub project page is here: https://github.com/padgittl/CascadeHopAssembly.
What do you plan to do during the next reporting period to accomplish the goals? Annotate lncRNAs in the chromosome-level assembly and TC-RNA-seq data set. Finish comparative genomic analysis of hop and hemp genes, including identification of collinear genes and orthologous gene groups. This will include an analysis of sequence evolution based on dN/dS, and detection of large-scale duplication events with dS plots. Identify co-expression of lncRNAs with protein-coding genes and assign functional associations based on the homology of protein-coding genes to known genes. Finish preparing manuscript describing the chromosome-level assembly of Cascade, including gene model predictions, repeat annotations, lncRNA annotations, and comparative analyses of hop and closely related species. Make data publicly available on relevant sites, including NCBI, Ag Data Commons, and HopBase. Write an article about comparative genomics of hop and hemp for a general audience that is interested in science, targeting an outlet such as WIRED or Nautilus. Prepare post-doctoral fellowship applications in anticipation of graduation in June 2022.

Impacts
What was accomplished under these goals? IMPACT: Hop (Humulus lupulus) is prized for its variety of aromatic and flavor compounds, as well as compounds with anti-microbial properties and pharmaceutical relevance, including terpenes and prenylated flavonoids. Hop is susceptible to pathogens, including powdery mildew, that cause damage and interfere with production of desirable compounds. Uncovering genetic mechanisms underlying response to different stressors is an important area of study for hop researchers, and requires high-quality genomic resources. Recent advances in technology have made assembly of the large and complex hop genome more tractable. The PacBio long-read assembly (PBA) of the cultivar Cascade has provided the scaffolding foundation for a chromosome-level assembly (CLA), the most complete hop assembly to date. I developed a set of gene models for the CLA, which I am using to investigate sequence evolution, gene function, and gene ordering along the length of chromosomes, to improve our understanding of how the genome is organized. The close relationship between hop and hemp, both members of the Cannabaceae, opens the door for comparative genomic analyses to uncover their shared evolutionary history, as well as how similarities and differences in their gene and regulatory content may help guide breeding strategies. This work will improve our understanding of genomic features in hop, to guide future research efforts. "The first major goal of this project is to use the chromosome-level assembly of the Cascade hop genome to construct a high-quality set of gene predictions, repeat annotations, and other genomic features, including long non-coding RNAs (lncRNAs)." 1) Major activities completed/experiments conducted: I generated repeat annotations and gene models, and assessed their quality. The repeat annotation method is described in my paper about the PBA.
The gene prediction approach incorporates alignment evidence from a variety of sources and uses two gene prediction tools: Augustus trained by BUSCO, and SNAP. All of the evidence is compiled by MAKER to generate a consensus gene prediction. 2) Data collected: Data already collected. 3) Summary statistics and discussion of results: First, I polished the CLA with DNA short reads from Cascade using the POLCA polishing tool. Polishing improved BUSCO statistics, which increased from 85.8% single-copy and 6.9% duplicated (92.7% complete overall) to 88.4% single-copy and 7.7% duplicated (96.1% complete overall). Additionally, the N50 of the CLA (345.3 Mb) is much larger than that of the PBA (672.6 kb), and the CLA has far fewer scaffolds (1533) than the PBA (8661). I found that 64.3% of the CLA is repeat-associated, based on de novo repeat annotations and sequence similarity to a set of known plant repeat sequences. Long terminal repeat (LTR) retrotransposons make up 62.1% of the CLA overall, and represent 96.7% of repeat content. There are 71233 gene models. Quality assessment includes counting the number of genes with a Pfam domain and using the annotation edit distance (AED) from MAKER to evaluate the extent of supporting evidence for a gene. An AED of zero denotes total agreement with the evidence and an AED of one denotes total lack of consistency with the evidence. I calculated that 64.8% of genes have an AED below 0.5. I annotated the genes based on similarity to UniProt genes and Pfam domains, and also identified repeat-associated genes. There are 48143 genes (67.6%) with a Pfam domain and 32511 repeat-associated genes (45.6%). There are 16148 genes (22.7%) with significant similarity to a UniProt plant gene or non-repeat Pfam domain. 4) Key outcomes or other accomplishments realized: Changes in knowledge include the N50 (345.3 Mb) and number of scaffolds (1533) in the CLA; the percentage of the CLA that is repeat-associated (64.3%); the overall number of genes (71233); and predicted gene function.
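For readers less familiar with the assembly and annotation statistics quoted above, the following minimal Python sketch shows how an N50 value is computed and re-derives the reported gene percentages from the reported gene counts. The function names and the toy scaffold lengths are hypothetical and chosen for illustration only; they are not taken from the project's pipeline. The gene counts are the ones reported in this section.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Toy scaffold lengths (hypothetical; not the hop assembly).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))  # 70, because 80 + 70 = 150 reaches half of 300

# Gene counts as reported in this section.
total_genes = 71233
print(percent(48143, total_genes))  # genes with a Pfam domain
print(percent(32511, total_genes))  # repeat-associated genes
print(percent(16148, total_genes))  # UniProt plant or non-repeat Pfam hits
```

Because N50 is weighted by sequence length, a small number of very long scaffolds can raise it sharply, which is consistent with the CLA having both a much larger N50 and far fewer scaffolds than the PBA.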
"The second objective of CASA builds upon the first objective through an analysis of gene content. Regions of synteny between hop and closely related species will be detected based on the relative location of genes along chromosome scaffolds." 1) Major activities completed/experiments conducted This objective includes a comparative analysis of genes in hop and closely related species. First, collinear genes were identified with MCScanX. Synonymous substitution rates (dS) were calculated for each pair of paralogs from hop and hemp. Species-specific dS values were visualizedto detect any peaks suggesting duplication events. Orthologous gene groups (OGGs) were identified by OrthoFinder among closely related species, including hop, hemp, Parasponia andersonii, Trema orientale, Morus notabilis, Prunus persica, Vitis vinifera, and Ziziphus jujuba. 2) Data collected Data already collected. 3) Summary statistics and discussion of results I found 2396 collinear gene pairs between hop and hemp, 543 hemp-only pairs, and 2489 hop-only pairs. Each of the 10 largest CLA scaffolds shares significant collinearity with placed scaffolds from the hemp assembly. The dS plot for hemp shows an enrichment of dS values near zero and above 1.25. Hop is also enriched for dS values near zero. The abundance of dS values near zero suggests a large number of recently duplicated genes. This analysis requires further work. There are 3289 OGGs with at least one gene from each of the eight species in the analysis, and 245 groups have at least two genes from every species, suggesting this subset of OGGs contains at least one duplication event. There are 882 groups containing single-copy genes in all species. There are 78 OGGs containing only hop genes; 70 OGGs contain single-copy genes in only hop and hemp, and 161 groups contain more than one gene from only hop and hemp. 
4) Key outcomes or other accomplishments realized: Changes in knowledge include my initial work to uncover the extent of shared gene content among closely related species, conserved gene ordering in hop and hemp, and sequence evolution of paralogs within species. "The first objective of the TC RNA-seq study is to collect tissues, extract RNA, sequence, and align the RNA-seq from cone and leaf tissues at five developmental time points to the Cascade assembly, and then perform differential expression analysis." 1) Major activities completed/experiments conducted: For this activity, I collaborated with Dr. Renee Eriksen, a post-doc in Dr. Henning's lab. We performed a time-course (TC) RNA-seq study using the PBA as the reference genome. I performed the transcript assembly and Dr. Eriksen performed the analysis of differential expression between time points. This work is described in a manuscript titled "Gene expression of agronomically important secondary metabolites in cv. 'USDA Cascade' hop (Humulus lupulus L.) cones during critical developmental stages," which is currently under review. Our work is based on the first objective for the TC RNA-seq study, using the PBA as the reference genome. We plan to re-analyze these data using the CLA. 2) Data collected: Data already collected. 3) Summary statistics and discussion of results: We assembled RNA-seq from cone tissue of Cascade at three stages of development, corresponding to early, mid, and late, or near harvest. The mid stage showed a significant increase in expression of genes involved in secondary metabolic pathways, including biosynthesis of bitter acids, xanthohumol, and oils. 4) Key outcomes or other accomplishments realized: Our findings show that expression of biosynthesis genes is higher during the mid stage, with implications for understanding how stress may impact production of key metabolites at this time point.
"The second objective of the TC RNA-seq study is to annotate lncRNAs, identify patterns of co-expression with protein-coding genes, and assign functional associations based on the homology of protein-coding genes to known genes." 1) Major activities completed/experiments conducted Completion of this objective is a goal for the upcoming year.

Publications

  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Padgitt-Cobb, L. K., Kingan, S. B., Wells, J., Elser, J., Kronmiller, B., Moore, D., Concepcion, G., Peluso, P., Rank, D., Jaiswal, P., Henning, J., Hendrix, D. A. (2021). A draft phased assembly of the diploid cascade hop (Humulus lupulus) Genome. The Plant Genome, e20072.
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Eriksen, R. L., Padgitt-Cobb, L. K., Townsend, M. S., & Henning, J. A. (2021). Gene expression for secondary metabolite biosynthesis in hop (Humulus lupulus L.) leaf lupulin glands exposed to heat and low-water stress. Scientific Reports, 11(1), 1-18.
  • Type: Journal Articles Status: Under Review Year Published: 2021 Citation: Eriksen, R.L., Padgitt-Cobb, L. K., Randazzo, A.M., Hendrix, D. A., Henning, J.A. (2021) Gene expression of agronomically important secondary metabolites in cv. USDA Cascade hop (Humulus lupulus L.) cones during critical developmental stages.
  • Type: Conference Papers and Presentations Status: Submitted Year Published: 2021 Citation: Padgitt-Cobb, L. K., Henning, J., Hendrix, D.A. The Long-read Genome Assembly of Hop (Humulus lupulus) Uncovers the Pseudoautosomal Region and other Genomic Features.