Source: UNIVERSITY OF ARIZONA submitted to NRP
QUANT-COGE BROWSER
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1000317
Grant No.
(N/A)
Cumulative Award Amt.
(N/A)
Proposal No.
(N/A)
Multistate No.
(N/A)
Project Start Date
Oct 1, 2013
Project End Date
Sep 30, 2018
Grant Year
(N/A)
Program Code
[(N/A)]- (N/A)
Recipient Organization
UNIVERSITY OF ARIZONA
888 N EUCLID AVE
TUCSON,AZ 85719-4824
Performing Department
Plant Science
Non Technical Summary
How eukaryotic organisms regulate mRNA levels is a fundamental question in biology. Most early attention focused on the study of gene transcription, and posttranscriptional mechanisms have only recently gained recognition for their regulatory importance. These epigenetic regulatory pathways control mRNA levels both transcriptionally and posttranscriptionally, and pioneering work in Arabidopsis thaliana has helped define these processes. As a result, a wealth of epigenomic information is already available for this model plant. However, it is almost entirely unusable to the wider research community because of the computationally intensive procedures needed to leverage these data resources. We will therefore develop an easy-to-use web-based system to store, access, and visualize Arabidopsis epigenetic data in a comparative genomics context: the EPIC-CoGe Browser, which will subsequently be expanded to support other organisms and other forms of quantitative data.
Animal Health Component
100%
Research Effort Categories
Basic
(N/A)
Applied
100%
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
201                    2499                              1080                      50%
202                    7410                              1080                      50%
Goals / Objectives
Objective: The development of an easy-to-use web-based system to store, access, and visualize epigenetic data in a comparative context will enable the international plant research community to access data and test hypotheses that could not previously be addressed.
Project Methods
The Browser will synthesize existing investments from three NSF-funded projects: EPIC, CoGe, and the iPlant Collaborative.
EPIC, whose mission is "reading the second [genetic] code [of life by] mapping epigenomes to understand plant growth, development and adaptation to the environment," is currently funded as a Research Coordination Network. Its primary goal has been to coordinate the research activities of the international community and to develop a whitepaper to drive this effort. However, this community currently lacks a computational browser to access and visualize epigenetic data. Its research interests are also diverse: while much of the epigenetic community originally focused on the model plant system Arabidopsis thaliana, the community's research interests span all plants, including those of agronomic importance for global food safety and sustainability. To achieve such broad applicability, the EPIC-CoGe Browser requires scalable computing resources and data management systems.
The iPlant Collaborative is a large investment by the NSF to create cyberinfrastructure (CI) for the plant research community. Cyberinfrastructure is made up of extensible, scalable, and capable computing resources together with "domain expertise", which includes computer science, mathematics, statistics, algorithms, and all disciplines of plant biology. iPlant is building and deploying the software systems necessary to connect supercomputing resources (XSEDE) to computational biologists, bench biologists, field biologists, and plant breeders.
The comparative genomics platform, CoGe, is part of the "powered by iPlant" program. CoGe uses iPlant's CI to achieve the scalability necessary to serve the entire comparative genomics community for all domains of life (CoGe currently makes available 20,000 genomes from ~15,000 organisms). In addition, CoGe provides a suite of web-based tools for in-depth analyses and comparisons of genomic data.
The EPIC-CoGe Browser will be an extension of CoGe and, likewise, a member of the Powered by iPlant program, which gives it access to the required scalable and capable computational resources. Year one of this project will focus on public epigenetics data for Arabidopsis thaliana and on developing the four subsystems described above. As the technology for amassing epigenetics data easily and inexpensively continues to improve, the need for the EPIC-CoGe Browser will continue to grow as more plant species are investigated. Year two of the project will focus on catering to the needs of the epigenetic research community by (1) providing researchers with more data management and collaboration tools, (2) supporting additional organisms, and (3) supporting advanced comparative analyses and publication-quality images. Data management and collaboration tools are required for ongoing research with pre-publication data. These systems will permit researchers to add their own data to EPIC-CoGe, share those data among a group of researchers, and restrict public access to them, while still being able to engage the broader community to solicit help and analytical expertise. EPIC-CoGe will engage the rice and maize research communities in order to expand the EPIC-CoGe Browser's capabilities to additional species, specifically those of agronomic and food safety importance. Because EPIC-CoGe is based on the CoGe system, which inherently supports thousands of organisms, these examples will permit its expansion to all domains of life. In addition, CoGe provides many tools for comparative genomics, and the data visualizations of EPIC-CoGe will be adapted for use in these tools. This synthesis of data and analytical tools will permit information from well-studied plants to be leveraged for less understood plants.
To best meet the needs of the plant epigenetic research community, year two will also focus on soliciting feedback from scientists through online questionnaires, discussion forums, and workshops. The workshops will be held at national and international conferences such as the International Conference on Arabidopsis Research and the Annual Maize Genetics Conference.

Progress 10/01/13 to 09/30/18

Outputs
Target Audience: EPIC-CoGe (https://genomevolution.org/wiki/index.php/EPIC-CoGe) is an expanded suite of tools for the comparative genomics browser CoGe (https://genomevolution.org). Its main target audience is biology students and researchers who want to manage, analyze, and compare a variety of genomic information. Over the past year, CoGe was visited by 66,900 users, who made 106,600 visits and viewed over 280,000 web pages. These users performed 67,000 analyses with CoGe and loaded 2,300 new genomes and 2,900 functional genomics datasets.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? One postdoc was trained this past year.
How have the results been disseminated to communities of interest? Publications and web-based tutorials (linked above).
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? This past year, the keystone paper for EPIC-CoGe was published in Bioinformatics: https://www.ncbi.nlm.nih.gov/pubmed/29474529
In addition, a postdoc in the group developed a comprehensive set of tutorials for using CoGe to aid in the research of Plasmodium genomes: https://genomevolution.org/wiki/index.php/Using_CoGe_for_the_analysis_of_Plasmodium_spp
The team also participated in several additional published works, including:
  • Algorithm development: Models for similarity distributions of syntenic homologs and applications to phylogenomics: https://ieeexplore.ieee.org/abstract/document/8423659
  • Polyploidy analysis of Brassica genomes: From Alpha-Duplication to Triplication and Sextuplication: https://link.springer.com/chapter/10.1007/978-3-319-43694-4_5
  • Discoveries of the effects of polyploidy and gene dosage on genome evolution: Patterns of Population Variation in Two Paleopolyploid Eudicot Lineages Suggest That Dosage-Based Selection on Homeologs Is Long-Lived: https://academic.oup.com/gbe/article/10/3/999/4943970

Publications

  • Type: Journal Articles Status: Published Year Published: 2018 Citation: Nelson AD, Haug-Baltzell AK, Davey S, Gregory BD, Lyons E. EPIC-CoGe: managing and analyzing genomic data. Bioinformatics. 2018 Feb 20;34(15):2651-3.
  • Type: Journal Articles Status: Published Year Published: 2018 Citation: Castillo AI, Nelson AD, Haug-Baltzell AK, Lyons E. A tutorial of diverse genome analysis tools found in the CoGe web-platform using Plasmodium spp. as a model. Database. 2018 Jan 1;2018.


Progress 10/01/16 to 09/30/17

Outputs
Target Audience: CoGe provides a suite of web-based tools and resources to help life science researchers manage, analyze, and visualize genome data. In the past year, CoGe was visited over 102,500 times by nearly 59,000 users. These users loaded over 2,000 new genomes and 3,000 -omic datasets into CoGe. In addition, they ran over 60,000 analyses with CoGe.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? This project was completed by a first-year graduate student rotating in the Lyons Research Group at the University of Arizona. Throughout the course of this project, the student developed programming and data analysis skills, along with the best practices used for writing and distributing open-source research software.
How have the results been disseminated to communities of interest? LoadExp+ is a web portal through which numerous genomic and epigenomic analyses can be conducted. It allows users to upload, analyze, and visualize a variety of public and private NGS data using CoGe's web-accessible graphical user interfaces (GUIs) and application programming interfaces (APIs). The streamlined user interface is designed to let novice users move quickly from data analysis to visualization, and lets more experienced users customize their analyses and make them available to collaborators. The supported analyses include RNAseq, whole-genome bisulfite sequencing (BS-seq), ChIPseq, SNP identification, and population genetics calculations. The collection of these tools in one location, their integration with an advanced genome browser, and the use of CoGe's user-friendly interface present advantages over other web-based bioinformatics platforms for the life sciences. For advanced users, a REST API is also available for programmatic access to data integrated through LoadExp+ (https://goo.gl/Pf4xjf).
What do you plan to do during the next reporting period to accomplish the goals? We are planning to:
  • Finish and publish the graphical user interface for the dynamic visualization of quantitative genomic data
  • Publish a use-case paper on using these tools to assist researchers in an underserved genomic community

Impacts
What was accomplished under these goals? To fully exploit NGS technologies in biological research, we must increase access to NGS analysis tools. We have tackled this problem through the creation of LoadExp+, an integrated suite of NGS workflows for the analysis of genomic and epigenomic data within the CoGe platform. These workflows enable users to easily perform a variety of analyses, share their data with collaborators, and visualize their results on a single web-based platform with an intuitive GUI. While many web-based platforms exist to assist researchers analyzing NGS data, LoadExp+ provides additional features for managing public and private data, support for many types of NGS data, and seamless integration with the EPIC-CoGe genome browser.

Publications

  • Type: Journal Articles Status: Published Year Published: 2017 Citation: Grover J, Bomhoff M, Davey S, Gregory BD, Mosher RA, Lyons E. LoadExp+: A web-based suite that integrates next-gen sequencing data analysis workflows and visualization. Plant Direct 1:2 (2017)


Progress 10/01/15 to 09/30/16

Outputs
Target Audience: CoGe provides a suite of web-based tools and resources to help life science researchers manage, analyze, and visualize genome data. In the past year, CoGe was visited over 103,000 times by over 60,000 users. These users loaded over 2,000 new genomes and 2,600 -omic datasets into CoGe. In addition, they ran over 80,000 analyses with CoGe.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Online tutorials in comparative genomics: https://genomevolution.org/wiki/index.php/Fish_Comparative_Genomics
How have the results been disseminated to communities of interest? Publications (see previous) and talks at conferences: https://pag.confex.com/pag/xxiv/recordingredirect.cgi/id/1978
What do you plan to do during the next reporting period to accomplish the goals? With the explosion in the number of genome sequences being generated, we will work towards developing new additions to CoGe to help with the analysis and visualization of many-to-many genome comparisons.

Impacts
What was accomplished under these goals?
  • FractBias, a new tool added to CoGe to enable the rapid detection and visualization of fractionation bias following polyploidy: https://genomevolution.org/wiki/index.php/FractBias
  • Integration of all available assembled fish genomes into CoGe for comparative genomic analyses, with tutorials on how to analyze these genomes: https://genomevolution.org/wiki/index.php/Fish_Comparative_Genomics

Publications

  • Type: Journal Articles Status: Published Year Published: 2017 Citation: Joyce BL, Haug-Baltzell A, Davey S, Bomhoff M, Schnable JC, Lyons E. FractBias: a graphical tool for assessing fractionation bias following polyploidy. Bioinformatics doi:10.1093/bioinformatics/btw666 (2016)
  • Type: Book Chapters Status: Published Year Published: 2016 Citation: Joyce BL, Baltzell AKH, Bomhoff M, Lyons E. Hook, line, and sinker: using CoGe tools for catching fish genome evolution. Bioinformatics in Aquaculture (2017)
  • Type: Book Chapters Status: Published Year Published: 2016 Citation: Joyce BL, Baltzell AKH, McCarthy FM, Bomhoff M, Lyons E. iAnimal: cyberinfrastructure to support data-driven science. Bioinformatics in Aquaculture (2017)


Progress 10/01/14 to 09/30/15

Outputs
Target Audience: How eukaryotic organisms regulate mRNA levels is a fundamental question in biology. Most early attention focused on the study of gene transcription, and posttranscriptional mechanisms have only recently gained recognition for their regulatory importance. These epigenetic regulatory pathways control mRNA levels both transcriptionally and posttranscriptionally, and pioneering work in Arabidopsis thaliana has helped define these processes. As a result, a wealth of epigenomic information is already available for this model plant. However, it is almost entirely unusable to the wider research community because of the computationally intensive procedures needed to leverage these data resources. We will therefore develop an easy-to-use web-based system to store, access, and visualize Arabidopsis epigenetic data in a comparative genomics context: the EPIC-CoGe Browser, which will subsequently be expanded to support other organisms and other forms of quantitative data.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? We are working on prototyping pipelines for variant analysis and methylation calls. We are also developing a better unified search interface to let researchers identify data of interest.

Impacts
What was accomplished under these goals? Developed RNASeq processing workflows that let researchers send FASTQ data to EPIC-CoGe and then have those reads cleaned, mapped, quantified, and loaded as experiments. The work supported by this grant was the foundation that led to the more robust set of tools provided by another USDA project, iAnimal (2013-00984).
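The four stages named above (clean, map, quantify, load) can be illustrated with a toy sketch. This is not EPIC-CoGe code: every function name, the exact-substring "mapping", the filtering thresholds, and the tiny in-memory data are assumptions made purely for illustration. A production workflow would rely on dedicated read-trimming, alignment, and quantification tools.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional

# NOTE: all names here are hypothetical; this only mirrors the
# clean -> map -> quantify -> load sequence described in the report.

def clean_reads(reads: Iterable[str], min_len: int = 4) -> List[str]:
    """Uppercase reads; drop those that are too short or contain ambiguous bases (N)."""
    cleaned = []
    for read in reads:
        read = read.upper()
        if len(read) >= min_len and "N" not in read:
            cleaned.append(read)
    return cleaned

def map_read(read: str, genes: Dict[str, str]) -> Optional[str]:
    """Toy mapper: return the first gene whose sequence contains the read exactly."""
    for name, seq in genes.items():
        if read in seq:
            return name
    return None

def quantify(reads: Iterable[str], genes: Dict[str, str]) -> Dict[str, int]:
    """Count mapped reads per gene; unmapped reads are ignored."""
    counts: Counter = Counter()
    for read in reads:
        gene = map_read(read, genes)
        if gene is not None:
            counts[gene] += 1
    return dict(counts)

def load_experiment(name: str, counts: Dict[str, int]) -> dict:
    """Package the per-gene counts as a simple 'experiment' record."""
    return {"name": name, "measurements": sorted(counts.items())}

def run_workflow(name: str, raw_reads: Iterable[str], genes: Dict[str, str]) -> dict:
    """Run the four stages in order."""
    cleaned = clean_reads(raw_reads)
    counts = quantify(cleaned, genes)
    return load_experiment(name, counts)

# Made-up reference sequences and reads for demonstration only.
GENES = {"geneA": "ACGTACGTGG", "geneB": "TTGACCATGA"}
RAW_READS = ["ACGTAC", "acgtgg", "TTGACC", "ACNTAC", "AC", "GGGGGG"]

if __name__ == "__main__":
    print(run_workflow("demo", RAW_READS, GENES))
```

In this sketch, the reads containing N or shorter than four bases are discarded at the cleaning stage, and the read that matches no gene is dropped at quantification, so only mapped reads contribute to the loaded experiment.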

Publications

  • Type: Journal Articles Status: Published Year Published: 2015 Citation: Tang H, Zhang X, Miao C, Zhang J, Ming R, Schnable J, Schnable P, Lyons E, Lu J. ALLMAPS: Robust scaffold ordering based on multiple maps. Genome Biology 16:3 (2015)
  • Type: Conference Papers and Presentations Status: Published Year Published: 2015 Citation: Plant and Animal Genome Conference: Computer Workshop Genome Management and Analysis with CoGe


Progress 10/01/13 to 09/30/14

Outputs
Target Audience: Various research communities that leverage genome data, including plant and animal scientists.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Workshops for EPIC-CoGe were held at:
  • PAG XXII Conference
  • EPIC Meeting
  • JCVI Summer workshop on genomics
  • University of Nebraska
  • University of Oklahoma
How have the results been disseminated to communities of interest? In-person workshops, virtual workshops, written tutorials (https://genomevolution.org/wiki/index.php/Tutorials), and video tutorials (https://genomevolution.org/wiki/index.php/Tutorials#Video_Tutorials).
What do you plan to do during the next reporting period to accomplish the goals? Develop RNASeq processing pipelines.

Impacts
What was accomplished under these goals? Completion of the user data management system for EPIC-CoGe.

Publications

  • Type: Conference Papers and Presentations Status: Published Year Published: 2014 Citation: Syntenic Analysis of Banana's Paleopolyploidy Events E Lyons Plant and Animal Genome XXII Conference
  • Type: Journal Articles Status: Published Year Published: 2014 Citation: Genome and transcriptome sequencing identifies breeding targets in the orphan crop tef (Eragrostis tef) Gina Cannarozzi, Sonia Plaza-Wüthrich, Korinna Esfeld, Stéphanie Larti, Yi Song Wilson, Dejene Girma, Edouard de Castro, Solomon Chanyalew, Regula Blösch, Laurent Farinelli, Eric Lyons, Michel Schneider, Laurent Falquet, Cris Kuhlemeier, Kebebew Assefa, Zerihun Tadele BMC Genomics 15(1)
  • Type: Conference Papers and Presentations Status: Published Year Published: 2014 Citation: EPIC-CoGe: Functional and Diversity Comparative Genomics E Lyons Plant and Animal Genome XXII Conference