Source: UNIVERSITY OF CALIFORNIA, RIVERSIDE submitted to
TRANSPOSABLE ELEMENTS AS DRIVERS OF GENE AND GENOME EVOLUTION
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0233738
Grant No.
(N/A)
Project No.
CA-R-BPS-5070-H
Proposal No.
(N/A)
Multistate No.
(N/A)
Program Code
(N/A)
Project Start Date
Jun 1, 2013
Project End Date
May 31, 2018
Grant Year
(N/A)
Project Director
Wessler, SU, R.
Recipient Organization
UNIVERSITY OF CALIFORNIA, RIVERSIDE
(N/A)
RIVERSIDE,CA 92521
Performing Department
Botany and Plant Sciences
Non Technical Summary
Transposable elements (TEs) are fragments of DNA that can move from one site in the genome to another and often increase their copy number in the process. This project focuses on basic and applied research on TEs, which were first discovered in maize and are now known to comprise the largest component of all sequenced plant (and animal) genomes. The plants studied in this project are members of the grass clade, which is the most important source of calories in the human diet. However, because TEs are ubiquitous in all genomes, the findings from this project will inform studies of all crops grown in California, including all fruit and vegetable species. TEs create a significant fraction of genomic diversity, which is responsible for differences in gene expression patterns within a species. For example, they frequently insert into gene regulatory regions, where they alter the tissue-specific pattern of expression or the activation of transcription in response to stresses such as drought, salinity or cold temperature. In addition, because TEs are themselves induced by a variety of stresses, they are uniquely suited to increase the frequency of mutation when a population is most in need of generating diversity to survive. As mentioned above, TEs comprise the majority of all characterized plant genomes, where they generate significant diversity. However, because the vast majority of TEs in all plants and animals are no longer active, it is difficult to determine how a population senses danger and responds by rearranging its genome. It is for this reason that this project focuses on a few closely related rice strains where a TE called mPing is actively amplifying throughout the genome and generating diversity right before our eyes.
These studies are important to California because we are particularly interested in understanding how TEs generate diversity that allows plants to adapt to climatic extremes such as temperature, drought and salinity. This proposal also seeks to increase science literacy in the plant sciences. Specifically, we have developed and are expanding the Dynamic Genome courses to involve incoming UCR freshmen in authentic research experiences through the analysis of TEs in plant genomes. As such, UCR freshmen will be introduced to the excitement of plant research and obtain the cutting-edge tools necessary to continue in research throughout their college careers.
Animal Health Component
(N/A)
Research Effort Categories
Basic
(N/A)
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
201                   0999                             1080                     10%
201                   1510                             1080                     10%
201                   1530                             1080                     20%
206                   0999                             1080                     10%
206                   1510                             1080                     10%
206                   1530                             1080                     10%
903                   0999                             1080                     10%
903                   1510                             1080                     10%
903                   1530                             1080                     10%
Goals / Objectives
The first objective is to resequence four strains of Oryza sativa (japonica) where the mPing transposable element is rapidly amplifying throughout the genome. These strains harbor the most active transposable element (TE) known to date in any eukaryote and, as such, provide an unprecedented opportunity to observe the earliest events of a TE burst in real time. Over the past ten years my laboratory has discovered the mPing element and then characterized many aspects of its movement. Of specific interest are how the host survives the insertion of approximately 25 new mPing elements per plant per generation and the extent of genotypic and phenotypic diversity generated by independent bursts in these four strains. The second objective is to exploit a series of recombinant inbred populations that have been developed by our collaborators at Kyoto University and are being propagated and phenotyped by our collaborators at the Dale Bumpers Rice Institute in Arkansas (Okumoto) to assess the consequences of mPing insertions on quantitative traits. Resequencing and comparative analysis of the four strains described in objective one will allow the development of markers to genotype the recombinant inbred populations. The third objective is to develop authentic research projects for the Dynamic Genome courses derived from ongoing research in the laboratory. Expected outputs from this project include new software to aid in the annotation of transposable elements in resequenced short reads from a variety of crop plants, the identification of active TEs for use as mutagens in transposon tagging protocols, the training of undergraduates in genomics methodologies who will continue their training in faculty research laboratories, and the development of curricula for undergraduates to study plant genomics. The latter will be disseminated through publication in science education journals and on our website.
Project Methods
For the resequencing effort, one of the four strains will serve as the reference genome for the project and will require an initial build using Illumina 500 bp paired-end libraries, 3 kb and 5 kb mate pair libraries and an 8 kb 454 mate pair library to a coverage of ~180X. The assembly will be built using SOAPdenovo and/or Velvet software. Given the expected similarity of the 3 other strains, they will be resequenced to ~100-fold coverage using only a 500 bp mate pair library and 2 lanes on the upgraded UCR HiSeq-2000 Illumina machine. SNPs, indels and chromosome rearrangements will be determined for one of the strains, HEG4, which is one of the two parents of the recombinant inbred population that is currently being grown in Arkansas. The other parent is the reference Nipponbare genome. SNPs will be detected using the Broad Institute Genome Analysis Toolkit (GATK). Breakdancer will be used to detect deletions. Detection of smaller indels and other rearrangements will be performed with Pindel and custom Perl scripts.
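The coverage targets above (~180X for the reference build, ~100X for the other strains) translate into read counts through the usual relation coverage = (number of bases sequenced) / (genome size). The following is a minimal sketch of that arithmetic, not the project's actual planning: the genome size (~380 Mb) and the 100 bp paired-end read length are assumptions chosen for illustration and do not come from the report.

```python
import math


def reads_needed(genome_size_bp, target_coverage, read_length_bp, paired=True):
    """Return the number of fragments (read pairs if paired=True) needed
    to reach a target mean coverage: coverage = bases sequenced / genome size."""
    bases_needed = genome_size_bp * target_coverage
    bases_per_fragment = read_length_bp * (2 if paired else 1)
    return math.ceil(bases_needed / bases_per_fragment)


# Assumed values for illustration only (not stated in the report).
GENOME_SIZE_BP = 380_000_000   # assumed approximate rice genome size
READ_LENGTH_BP = 100           # assumed read length per mate

pairs_100x = reads_needed(GENOME_SIZE_BP, 100, READ_LENGTH_BP)
pairs_180x = reads_needed(GENOME_SIZE_BP, 180, READ_LENGTH_BP)

print(f"~100X coverage: {pairs_100x:,} read pairs")
print(f"~180X coverage: {pairs_180x:,} read pairs")
```

Under these assumed values, the mean-coverage estimate ignores duplicate reads, quality trimming and uneven coverage, so real sequencing plans usually add a margin on top of the computed figure.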

Progress 06/01/13 to 05/31/18

Outputs
Target Audience: This project has two major target audiences: undergraduate students who participate in Bio20 and the summer internship experience, and citrus researchers. For Bio20 during AY 2017-18, 13 sections of up to 24 first-year undergraduate students participated in an authentic research project that utilized citrus and the citrus genome sequence generated by this project. Furthermore, during summer 2017, 16 rising second-year students participated in an intensive summer research project involving the characterization of citrus promoters from genes under circadian (clock) control. In addition, we have made the citrus genome sequence available to a handful of citrus researchers. When the genome sequence is published later this year, that number will certainly increase significantly. Changes/Problems: In 2016 the direction of this project was changed to the analysis of TEs from the citrus genome. To this end we have spent the past 18 months generating a high-quality genome sequence of the Fairchild variety, which will be of use to the wider citrus community and will serve as a source of California-relevant projects for our undergraduate students in Bio20. Briefly, progress can be summarized as follows: To build a better reference genome for citrus, we generated 120-fold long reads from Fairchild mandarin using the PacBio single-molecule real-time (SMRT) sequencing platform. In total, 3.9 million reads were obtained with an average read length of 9.4 kb. The long reads were assembled using the diploid assembler FALCON, and the alternative heterozygous contigs were removed using HaploMerger2. The assembled genome is 360 Mb with a contig N50 of 3,635 kb. To further improve the assembly, we generated 150-fold 10x Genomics linked reads and a 280-fold Bionano genome map. These data were used to scaffold the assembled contigs and fill the sequence gaps between contigs. Finally, we obtained a reference genome of 366 Mb with a contig N50 of 10 Mb.
All chromosomes were assembled into fewer than 10 contigs. The assembly is better than reference genomes that were reported in prior studies, such as clementine mandarin (genome size of 301 Mb and contig N50 of 119 kb), sweet orange (genome size of 327 Mb and contig N50 of 50 kb), and pummelo (genome size of 345 Mb and contig N50 of 2,182 kb). Finally, this project has also involved culturing citrus for the eventual introduction of rice TEs. To this end, culturing of Fairchild is in progress. We restarted the embryogenic callus induction and somatic embryogenesis in the Fall of 2017. We are culturing the embryo-like structures on both solid and liquid media. We are optimizing the growth factors for the media to prevent callus browning and cell apoptosis. This experiment will be further standardized using more embryos for reproducibility in the coming months. The protocols for callus induction and direct regeneration from cotyledonary explants are being finalized, will be repeated with more explants, and can be used for Agrobacterium-mediated gene transfer in Spring 2019. In addition to this, we are developing protocols for embryogenic callus induction using embryos from the developing ovules of Fairchild fruit. What opportunities for training and professional development has the project provided? Both the citrus and rice components of this project provided extensive training opportunities in both wet-lab and dry (computational) work. For first- or second-year undergraduates: During 2017-2018, thirteen sections of the Biology 20 course analyzed different aspects of the Fairchild genome. In winter 2018 two sections amplified loci associated with resistance to HLB in Fairchild and other varieties in the Citrus Collection. These students were given specific regions of the genome for which to design PCR primers for amplification and sequencing, in order to identify polymorphisms between the varieties.
The other sections of the course identified and amplified putative clock genes from Fairchild. During summer 2018, 17 students (10 first-years and 7 transfers) continued this project to identify clock gene homologs in the Fairchild mandarin genome. The students started with the protein sequences of the known Arabidopsis clock genes (TOC1, CCA1, PRR7, PRR9, and LHY1) and used TARGeT (target.cyverse.org) to identify candidate genes in Fairchild. The students designed primers to amplify the genes, including the putative promoter regions. They analyzed the sequences from other citrus varieties as well. We are planning on submitting these sequences to GenBank. The students cloned the promoter region of each gene with the aim of testing for circadian activity in tobacco leaf assays. After the summer program ended, 12 students enrolled in a research course (Biology 190) to continue the project. During fall 2018 the students moved the cloned sequences into expression vectors, transformed the resulting plasmids into Agrobacterium and infiltrated tobacco leaves. Leaf disks were imaged for three days, and activity was detected for one of the five promoter regions tested. The expression was not cyclic. Fourteen new students (all transfer students) in Biology 190 are repeating the fall work with these promoters using a slightly modified experimental design that includes the promoter region plus the 5' UTR, to see if critical signals were missing from the original cloned sequences. This work was done in collaboration with Dr. Dawn Nagel, an associate professor in Botany and Plant Sciences. For advanced undergraduates: The undergraduate laboratory assistants (ULAs) worked with the students in the sections to help with learning techniques and data analysis. ULAs also receive training in preparing for lab courses and assist with reagent and supply preparation. The ULAs are a major component of the course and are often the first people the students ask for assistance.
Postdoctoral associate: The citrus genome sequencing project introduced the postdoctoral associate to the field of third-generation sequencing technologies and the computational analysis necessary to perform genome assembly (PacBio long reads, Bionano genome mapping, and 10x Genomics linked reads). How have the results been disseminated to communities of interest? Most of the results have been published or are in preparation. In addition, Wessler receives frequent invitations (about 1/month) to present seminars on the rice project and the Bio20 course. What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported
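The primer-design step mentioned in the training summary above can be illustrated with a quick sanity check of a candidate primer. The report does not say which tool or criteria the students used, so this is only a sketch under stated assumptions: the primer sequence is hypothetical, and the melting temperature uses the Wallace rule (Tm = 2(A+T) + 4(G+C)), a rough estimate suited to short oligonucleotides rather than a replacement for dedicated primer-design software.

```python
def gc_fraction(primer):
    """Fraction of bases in the primer that are G or C."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer):
    """Rough melting temperature in degrees C by the Wallace rule:
    Tm = 2 * (A + T) + 4 * (G + C). Intended for short oligos only."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Hypothetical 20-nt primer, invented for illustration (not from the project).
example_primer = "ATGCGTACCTGAAGCTTGCA"

print(f"length: {len(example_primer)} nt")
print(f"GC fraction: {gc_fraction(example_primer):.2f}")
print(f"Wallace Tm: {wallace_tm(example_primer)} C")
```

A check like this is useful mainly for spotting primers with extreme GC content or badly mismatched melting temperatures within a pair before ordering them.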

Impacts
What was accomplished under these goals? All of the original goals have been accomplished; the results are either published or described in manuscripts in preparation. Specifically: (1) Software (RelocaTE and RelocaTE2) to facilitate the identification of transposable elements when comparing a reference genome to short (Illumina) reads was developed and published. (2) The four strains with bursting mPing elements were completely sequenced and compared. Analyses of these genome sequences led, for the first time, to an understanding of how a higher eukaryote (in this case, rice) survives a TE burst and how the burst continues without being epigenetically silenced. These results were published in two publications - Lu et al., PNAS 2017, and Chen et al., Nat Comm 2019. (3) The entire population of 272 RILs was sequenced (~10X coverage) and phenotyped in Arkansas. Surprisingly, we determined that the amplification of mPing has a minimal impact on host phenotype. A manuscript describing these findings is in preparation and will be submitted to eLife when completed. (4) Several modules for the Bio20 lab were designed and executed. We plan to publish these in a suitable STEM education journal.

Publications

  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Chen, J., Lu, L., Benjamin, J., Diaz, S., Hancock, CN, Stajich, JE, Wessler, SR (2019) Tracking the origin of two genetic components associated with transposable element bursts in domesticated rice. Nature Comm. 10: 641


Progress 10/01/16 to 09/30/17

Outputs
Target Audience: There were two target audiences: 1 - Undergraduate students enrolled in Bio20 (the Dynamic Genome course) who participated in the analyses of the citrus genome. 2 - Scientists working with citrus who can benefit from the availability of a high-quality reference genome. Changes/Problems: 1 - An unknown student discarded all of our tissue culture lines and we had to start from square one. What opportunities for training and professional development has the project provided? During the academic year 2016-2017, 15 out of 18 sections (over 300 students) worked on citrus projects verifying TE insertions. The summer students in 2016 tried to characterize gene families in citrus and in 2017 tried to clone hATs. There were 17 students each summer. This summer we plan to characterize the clock genes in citrus. How have the results been disseminated to communities of interest? Nothing Reported What do you plan to do during the next reporting period to accomplish the goals? 1 - We will continue to refine the Fairchild genome sequence and continue annotation. 2 - We will continue developing cell culture lines and, hopefully, will introduce Ping and mPing constructs this year. 3 - We are collaborating with Dr. Xiaoyu Zhang to annotate the chromatin of Fairchild to better understand the genome and its expression. We are in the process of generating a genome-wide, high-resolution set of data on chromatin structure, modifications and composition in citrus. The results should allow us to systematically identify and curate cis-regulatory elements, and better understand the interactions between genetic and epigenetic pathways in transcriptional regulation. Ongoing experiments are designed to profile transcription activities (RNA-seq), DNA methylation (whole-genome bisulfite sequencing), histone modifications and variants (ChIP-seq), transcription factor binding sites (ATAC-seq) and 3D chromatin interactions (Hi-C) in young leaves.
The methods optimized here will also be applied to other tissues, developmental stages and cultivars, as well as to different biotic or abiotic stress conditions, in the future. It is expected that the results will not only aid in the annotation of the citrus genome, but also provide the first comprehensive view of chromatin structure and function in this perennial tree species.

Impacts
What was accomplished under these goals? The goals were changed in a 2016 report and, for some reason, that is not reflected anywhere in the documentation available at this site. The revised goals of the project were (1) to identify active transposable elements (TEs) in the citrus genome, and (2) to generate strains that contain the active rice transposable elements Ping and mPing. Our difficulties identifying full-length TEs in the published Clementine genome sequence led to the decision to generate a high-quality citrus genome using the Fairchild strain. Over the past two years we have made significant progress, which is summarized below: (1) Fairchild genome sequencing: To build a better reference genome for citrus, we generated 73-fold long reads from Fairchild mandarin using the PacBio single-molecule real-time (SMRT) sequencing platform. In total, 3.2 million reads were obtained with an average read length of 8.3 kb. The long reads were assembled using the diploid assembler FALCON, which resulted in a 374 Mb genome with a contig N50 of 349 kb. The assembly is better than reference genomes that were reported in prior studies, such as clementine mandarin (genome size of 301 Mb and contig N50 of 119 kb) and sweet orange (genome size of 327 Mb and contig N50 of 50 kb). However, the assembled Fairchild genome still has many sequencing gaps, particularly in heterozygous regions. To further improve the Fairchild genome assembly, we generated 30-fold PacBio long reads with an average length of 20 kb, 150-fold 10x Genomics linked reads, and a 280-fold Bionano genome map. We assembled the Fairchild genome using a hybrid approach that uses PacBio long reads to build contigs and uses linked reads and the Bionano genome map to scaffold the contigs. We obtained a reference-quality genome of 366 Mb with a contig N50 of 10 Mb. All chromosomes were assembled into fewer than 10 contigs.
We further phased the heterozygous SNPs using both 10x Genomics linked reads and a pollen-genotyping method, which generated chromosomal-level haplotype blocks. These phased heterozygous SNPs were used to facilitate the assembly of haplotypes and to resolve the structural differences between haplotypes. (2) Generate citrus strains with rice TEs: Culturing of Fairchild is in progress. In 2017 protocols were being finalized in which undeveloped Fairchild ovules started producing callus and embryo-like structures. Unfortunately, the entire process of embryogenic callus induction and somatic embryogenesis had to be restarted in the Fall of 2017 because the research material was accidentally disposed of by a student worker in the tissue culture facility. Currently we are collecting material for solid and liquid cell culture with the same growth factors in both media. This experiment will be further standardized using more explants for reproducibility in the coming months. In addition to this, we are developing protocols for embryogenic callus induction using embryos from the mature ovules of Fairchild fruit. Finally, we are developing protocols for callus induction and direct regeneration from cotyledonary explants.
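The contig N50 values quoted in the genome sequencing summary above (for example, 349 kb for the initial FALCON assembly versus 10 Mb for the hybrid assembly) follow the standard definition: the length of the contig at which the cumulative length, counted from the longest contig down, first reaches half of the total assembly size. The following is a minimal sketch of that calculation using made-up toy contig lengths; it is not the project's assembly-evaluation pipeline.

```python
def n50(contig_lengths):
    """Return the contig N50: the length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (arbitrary units), invented for illustration only.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print("total length:", sum(toy_contigs))
print("N50:", n50(toy_contigs))
```

Because N50 depends on the total assembly length, comparisons between assemblies of different sizes (such as those listed above) are indicative rather than exact.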

Publications

  • Type: Journal Articles Status: Accepted Year Published: 2017 Citation: Lu, L., Chen, J., Robb, S., Okumoto, Y., Stajich, J., Wessler, SR. Tracking the genome-wide outcomes of a transposable element burst over decades of amplification. Proc. Natl. Acad. Sci. 114: 10550-59.


Progress 10/01/15 to 09/30/16

Outputs
Target Audience: We have several target audiences. --UCR undergraduates who are enrolled in the Dynamic Genome course (Bio20) as freshmen. Enrollment increases by 1 section/quarter each year. During the reporting period, 15 sections of Bio20 (about 300 students) were taught. --Plant researchers who use software developed during this project. --Community college students who participate in Dynamic Genome workshops. --Instructors/Professors who utilize the active learning lessons we have published in CourseSource. Changes/Problems: As discussed above, the focus of the project was changed dramatically last year from rice to citrus. This was done largely to take advantage of the fantastic citrus collection, which is now a focus of both the research and educational goals. What opportunities for training and professional development has the project provided? Because there was no budget associated with this project prior to the end date of this report, there were no funds for training personnel. All personnel associated with the above accomplishments were funded from other grants. How have the results been disseminated to communities of interest? Wessler has presented the results from this project at several universities and at scientific workshops and symposia. James Burnette has been a speaker at several meetings where the focus is science education. James Burnette and Alex Cortez have led over 10 outreach events for the local community, including high school teachers and community college students and professors. These active learning experiences involve isolation of DNA from both plants and animals (fish) and its analysis. What do you plan to do during the next reporting period to accomplish the goals? The focus of this project was changed in my June 2016 submission to the analysis of transposable elements in the citrus genome. To this end we are generating a high-quality genome sequence of the Fairchild variety, which will be of use to the wider citrus community.
Next year's progress report will include a summary of the new reference genome and the use of citrus genomics in my Dynamic Genome class.

Impacts
What was accomplished under these goals? We resequenced the four strains (EG4/HEG4 and A119/A123) of Oryza sativa (japonica) where the mPing transposable element is rapidly amplifying. Comparative analysis of the 4 strains showed that they derive from two independent bursts of the mPing transposon. The two strain pairs have been maintained by self- or sib-pollination for over 20 and 100 years, respectively. The burst is associated with an increase in the autonomous Ping element. Most importantly, we show by several criteria that epigenetic regulation has been maintained in all strains, meaning that this burst is evading detection by the host. Over the past year we have also performed whole-genome epigenetic analysis, including the methylome and two histone marks. The data will be submitted for publication this year. The second objective is to exploit a series of recombinant inbred populations that were developed by our collaborators at Kyoto University and were propagated and phenotyped at the Dale Bumpers Rice Institute in Arkansas. This year we completed high-throughput sequencing of all 275 RILs and determined that they contain over 12,000 new mPing insertions. The strains are being precision-phenotyped at the Danforth Plant Center and analyzed for drought and salt stress. The third objective is to develop authentic research projects for the Dynamic Genome courses derived from ongoing research in the laboratory. One outcome of this was the development of new software (RelocaTE2), which aids in the annotation of transposable elements in resequenced short reads.

Publications

  • Type: Journal Articles Status: Published Year Published: 2015 Citation: Robb, S, Burnette, JM, Chapovskya, A, Palmer, K, Wessler, SR (2015) An open source, collaborative electronic notebook for undergraduate laboratory classes. CourseSource. 2: 1-10.
  • Type: Journal Articles Status: Published Year Published: 2016 Citation: Burnette, JM, Kanizay, L., Chester, N., and Wessler SR. (2016) Dilution and Pipetting Lesson Using Food Dyes. CourseSource 3: 1-5
  • Type: Journal Articles Status: Published Year Published: 2017 Citation: Chen J, Wrightsman TR, Wessler SR, Stajich JE. RelocaTE2: a high resolution transposable element insertion site mapping tool for population resequencing. PeerJ. 2017 Jan 26;5:e2942. doi: 10.7717/peerj.2942.


Progress 10/01/14 to 09/30/15

Outputs
Target Audience: Nothing Reported Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? Nothing Reported How have the results been disseminated to communities of interest? Nothing Reported What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Nothing to report.

Publications


Progress 10/01/13 to 09/30/14

Outputs
Target Audience: Nothing Reported Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? Nothing Reported How have the results been disseminated to communities of interest? Nothing Reported What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Nothing to report.

Publications


Progress 06/01/13 to 09/30/13

Outputs
Target Audience: Nothing Reported Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? Nothing Reported How have the results been disseminated to communities of interest? Nothing Reported What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Nothing to report.

Publications