Performing Department
Students
Non Technical Summary
Decades of studying plant gene function have revealed that DNA that does not encode proteins is far from "junk DNA"; rather, it has a strong and widespread impact on plant traits. In corn, for example, genetic variation in noncoding regions explains on average 40% of the genetic variation across agronomic traits. At the molecular level, we know that binding between short segments of noncoding DNA and certain proteins dictates the level, timing, and location at which a particular gene is expressed (or silenced). A genome contains tens of thousands of such noncoding regulatory DNA segments, which potentially operate and interact to control gene expression and, ultimately, agronomic traits. We know that variation in these noncoding DNA elements explains agronomic performance, and that targeted changes to specific noncoding sequences can alter agronomic traits such as yield. However, predicting phenotypic outcomes from changes to individual regulatory DNA sequences remains challenging. Consequently, our limited understanding of noncoding DNA limits crop improvement.
The overarching goal is to uncover putative regulatory DNA elements as conserved noncoding sequences (CNSs) within grass genomes, establish correlations between CNS variation and awn traits, and dissect the function of CNSs with suspected roles in awn traits. Two primary questions will be addressed: 1) How does variation in conserved noncoding sequences relate to awn traits over deep evolutionary time? 2) How do specific conserved noncoding sequences regulate awn development? The central hypothesis is that awns repeatedly evolved in the grasses via cis-regulatory modification of a conserved leaf blade developmental program. Ultimately, the proposed work will unveil principles and strategies for precisely enhancing crop plants through targeted editing of regulatory DNA elements.
This project aims to dissect the function of specific noncoding DNA elements. To do so, I will ask how variation in putative regulatory DNA elements is correlated with important agronomic traits across grass species, using the grass awn as an example. Awns impact agricultural yield, playing roles in photosynthesis, seed dispersal and sprouting, and defense. While many grasses have awns, they are not essential and are thought to have arisen at least 12 times independently. Structurally, awns are modified leaf blades, and leaf blades are present in all grasses. This suggests that the awn arises through expression of conserved leaf blade genes in a new time and place, possibly through modification of regulatory DNA sequences near genes known to be required to form leaf blades. Descriptions of awn traits are available for over 11,000 grass species, and the genomes of approximately 180 grass species have been sequenced, providing a wealth of phenotypic and genetic data with which to associate variation in noncoding sequence with species-level traits. Furthermore, genome editing tools are available for multiple grass species.
First, I will use Conservatory, a recently developed algorithm, to generate a comprehensive database of putative regulatory DNA elements among 180 sequenced genomes representing grass diversity. Second, I will associate regulatory DNA variation with publicly available awn phenotypes, focusing on genes expressed in developing awns. Finally, to investigate the function of specific noncoding sequences, I will use genome editing to test individual putative regulatory DNA sequences and combinations of them. Initial editing will be carried out in maize and Brachypodium distachyon, with a focus on five candidate regulatory sequences near DROOPING LEAF (DL), a gene required for awn development.
Animal Health Component
(N/A)
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Goals / Objectives
The major goal of this project is to identify the functional roles of specific regulatory DNA sequences in agronomic traits. Regulatory DNA sequences typically do not specify the structure of a protein, but instead control when, where, and how much of a particular protein is made. This regulatory control, and ultimately plant traits, emerge from the combined operation and interaction of tens of thousands of short, modular regulatory DNA sequences. Recent work has shown that targeted changes to specific regulatory DNA sequences can improve agronomic performance, including yield traits, but the outcomes of those changes were not predictable. This research aims to close that knowledge gap, moving toward rational fine-tuning of agronomic performance. The chosen system of study is the grasses, which include the major cereal crops maize, rice, wheat, and sorghum. Many grass species have awns, which are long, bristle-like outgrowths borne on the leaves of grass flowers. Across grass species, awns vary in form and carry out diverse and important functions such as photosynthesis, seed burial, germination, and defense. Despite this diversity of form and function, awns are modified leaf blades, and leaf blades are essential in all grasses. Furthermore, grasses with leaf blade defects often show correlated awn defects, implying that leaf blades and awns use the same genes for their development. This project tests the prediction that awns arise through regulatory changes to leaf blade genes, and that specific, modular regulatory sequences are required to produce awns. Using this system, this project aims to uncover regulatory DNA sequences that, when removed or altered, predictably change awn traits. This dissection of awn regulatory logic may provide new insights into gene regulation that could be leveraged for precise crop improvement.
Specific objectives include:
Identify DNA sequences that likely have regulatory function in grasses. Often, such DNA sequences are preserved across evolutionary time and can be detected computationally with methods established in the laboratory that the PD has recently joined. The PD will adapt these methods to annotate putative regulatory sequences in 120 diverse grass genomes.
Identify genes that control awn development. The PD will measure gene activity throughout the development of floral leaves in the awned grass Brachypodium distachyon and the awnless grass Zea mays (maize). Genes active during this developmental course, and the putative regulatory sequences near these genes, will be prioritized for characterization.
Identify regulatory DNA sequences that correlate with awn traits across species. The PD will develop, publicly release, and maintain novel software to carry out these analyses.
Test the function of regulatory DNA sequences with suspected roles in awn development by CRISPR editing, followed by thorough phenotypic characterization of awns and leaves. Initial characterization will dissect the function of regulatory sequences near a gene that, when removed, ablates both leaf blade midribs and awns.
Provide training in agricultural sciences, plant molecular biology, evolutionary biology, and genome editing to two undergraduate students.
Participate in at least one public-facing outreach event per year.
Publish 2-3 research articles pertaining to this project.
Present research findings at one conference per year.
Project Methods
The first activity of the project involves identification of conserved noncoding sequences (CNSs) in grass genome assemblies. Genome assemblies will be obtained from public databases and collaborators. The published analysis pipeline Conservatory will be used to identify CNSs. In brief, Conservatory leverages established open-source software to carry out several routine bioinformatics tasks, including protein sequence alignment, identification of species-specific orthologous proteins, multiple sequence alignment, and identification of conserved positions in a multiple sequence alignment. The output of Conservatory is a genome annotation text file in GFF3 format. Annotations will be subjected to quality control by reanalysis of publicly available genomic datasets, including ATAC-seq, ChIP-seq, DAP-seq, methyl-seq, GWAS hits, selective sweeps, and known transcriptional enhancers. Success will also be measured by comparing the new annotations to published CNS annotations; the updated annotation is expected to recover known annotations, add new ones, and refine the known breadth of CNS conservation across species. To deliver science-based knowledge from this activity, annotation files will be made available on a public-facing website for general use, and both the number of visits to this website and the number of downloads will be monitored. Conservatory software is hosted publicly on GitHub; the PD will contribute to ongoing development and maintenance and will assist users with troubleshooting. Suitable datasets and analysis software are already available for use; therefore, this activity will be carried out in Year 1.
The second aim of the project is to reconstruct transcriptional profiles of developing floral bracts and associate variation in noncoding sequence with species-level awn phenotypes. The PD will define morphologically distinct stages of bract development in wild-type and select mutants of Brachypodium distachyon and Zea mays using scanning electron microscopy. Total RNA will be extracted from individual floral bracts of wild-type and select mutant Brachypodium distachyon and Zea mays lines spanning these stages, using standard techniques. RNA-seq libraries will be constructed and subjected to short-read nucleotide sequencing using standard techniques and quality controls. Sequencing data will be quality controlled and analyzed using standard approaches for single-cell transcriptomics, including sequence read alignment to a reference genome, alignment quality control, read counting and normalization, and unsupervised clustering. To measure success, we will check that transcriptional programs at distinct stages of lemma development show expression of "usual suspect" genes, gleaned from previous studies on lemma development in grasses. This effort is expected to reveal genes that are expressed in developing lemmas and that may be critical for formation of awns. Next, variation in CNSs near these genes will be correlated with species-level awn phenotypes available from public databases, using one of several phylogenetically aware association mapping frameworks: phylogenetic principal component analysis, relative evolutionary rates, and phylogenetic least-squares regression. All analysis frameworks leverage open-source software tools; the PD will write and publicly release custom analysis code as necessary. Sample collection, library preparation, sequencing, and analysis will be carried out in Year 1. Additional analysis will be carried out in Year 2 as needed.
The third aim is to test the function of putative regulatory sequences that exhibit association with awn phenotypes. The PD will design and make various DNA constructs for expression of CRISPR-Cas9 components in Brachypodium distachyon and Zea mays, using established protocols in the PD advisor's lab. Wild-type and mutant strains of Brachypodium distachyon will be transformed with these constructs using established protocols; edited maize lines will be generated by a core transformation facility and phenotyped by the PD. The resulting transgenic strains will be genotyped for heritable edits at target loci of interest using PCR and Sanger sequencing. Edited lines will be phenotyped for the following traits: awn presence/absence, awn length, awn thickness, awn cell number, presence of chlorenchyma, photosynthetic rate, awn morphology, leaf morphology, leaf blade:sheath ratio, and leaf vascular traits. Appropriate controls, such as removal of awns from sibling lines that lack edits or CRISPR-Cas9 transgenes, will be included for photosynthesis measurements. As additional controls, full knockout lines of the genes associated with CNSs of interest will be generated and phenotyped. Success will be measured by alteration of awn phenotypes in edited lines, and by intermediate effects in CNS-edited lines compared to wild-type and knockout mutant lines. In Year 1, edited lines will be generated and seed of suitable lines bulked for phenotyping. Phenotyping will be carried out from late in Year 1 into Year 2 of the project.
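As an illustration of the annotation quality-control step in the first activity, the following minimal Python sketch computes the fraction of CNS annotations that overlap at least one open-chromatin peak. It assumes the CNS annotation is a standard GFF3 file (1-based, inclusive coordinates) and that peaks, for example from ATAC-seq, are supplied in BED format (0-based, half-open coordinates). The function names, the "Conservatory" and "CNS" labels in the example records, and the example coordinates are hypothetical and are not taken from the Conservatory software.

```python
from bisect import bisect_right
from collections import defaultdict


def parse_gff3(lines):
    """Yield (seqid, start, end) as 0-based, half-open intervals from GFF3 lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # GFF3 columns 4 and 5 hold 1-based, inclusive start and end.
        yield fields[0], int(fields[3]) - 1, int(fields[4])


def parse_bed(lines):
    """Yield (chrom, start, end) from BED lines (already 0-based, half-open)."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])


def build_index(intervals):
    """Group intervals by sequence and merge them into sorted, disjoint spans."""
    by_seq = defaultdict(list)
    for seqid, start, end in intervals:
        by_seq[seqid].append((start, end))
    index = {}
    for seqid, spans in by_seq.items():
        spans.sort()
        merged = [spans[0]]
        for start, end in spans[1:]:
            if start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[seqid] = merged
    return index


def overlaps(index, seqid, start, end):
    """Return True if [start, end) shares at least one base with an indexed span."""
    spans = index.get(seqid, [])
    # Locate the last span that starts before `end`; it is the only candidate
    # that needs checking because spans are sorted and disjoint.
    i = bisect_right(spans, (end - 1, float("inf"))) - 1
    return i >= 0 and spans[i][1] > start


def fraction_supported(cns_intervals, peak_index):
    """Fraction of CNS intervals overlapping at least one peak (0.0 if none given)."""
    cns = list(cns_intervals)
    if not cns:
        return 0.0
    hits = sum(overlaps(peak_index, s, a, b) for s, a, b in cns)
    return hits / len(cns)


# Hypothetical example records, for illustration only.
gff3_lines = [
    "##gff-version 3",
    "chr1\tConservatory\tCNS\t101\t150\t.\t+\t.\tID=cns1",
    "chr1\tConservatory\tCNS\t501\t520\t.\t+\t.\tID=cns2",
    "chr2\tConservatory\tCNS\t11\t30\t.\t-\t.\tID=cns3",
]
bed_lines = [
    "chr1\t140\t200",
    "chr1\t520\t600",  # starts exactly where cns2 ends, so it does not overlap
    "chr2\t0\t11",
]

peak_index = build_index(parse_bed(bed_lines))
fraction = fraction_supported(parse_gff3(gff3_lines), peak_index)
print(f"{fraction:.2f} of CNSs overlap a peak")
```

Converting both inputs to 0-based, half-open intervals before comparison avoids off-by-one errors at feature boundaries. The same overlap test could in principle be reused for the other interval-based datasets listed above, such as ChIP-seq or DAP-seq peaks, though in practice an established interval toolkit would likely be preferred for genome-scale data.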