Development of maize gene expression database to augment gene discovery

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

HATCH

Reporting Frequency

Annual

Accession No.

0230311

Grant No.

(N/A)

Cumulative Award Amt.

(N/A)

Proposal No.

(N/A)

Multistate No.

(N/A)

Project Start Date

Oct 1, 2012

Project End Date

Sep 30, 2014

Grant Year

(N/A)

Program Code

[(N/A)]- (N/A)

Recipient Organization
UNIV OF MINNESOTA
(N/A)
ST PAUL,MN 55108

Performing Department
Plant Biology

Non Technical Summary
It is an exciting time to study plant genetics. The availability of genomic sequences, together with approaches to study their function, has the potential to enable rapid progress. Maize has long been a model organism for the study of quantitative genetics and phenotypic variation. The overall goal of this project is to use functional genomics approaches to identify molecular causes of phenotypic variation in maize inbred lines. This project will focus on developing computational tools and data analysis approaches to efficiently combine gene expression data with genotype and phenotype data. While this project will focus on datasets involving maize, the findings and approaches are expected to be widely applicable to many species.

Animal Health Component

(N/A)

Research Effort Categories

Basic

(N/A)

Applied

(N/A)

Developmental

(N/A)

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
201	1510	1080	100%

Knowledge Area
201 - Plant Genome, Genetics, and Genetic Mechanisms;

Subject Of Investigation
1510 - Corn;

Field Of Science
1080 - Genetics;

Keywords

Goals / Objectives
Objectives: The first two objectives both relate to the continued development of the COB (Co-expression bench) database. The third objective will utilize COB to better understand the molecular basis of plant height variation.
1. Add expression viewer and phenotypic data to COB. Previous efforts have established a powerful tool for visualizing co-expression data. Our goal is to extend this website with tools for visualizing expression data from many experiments and for correlating expression levels with phenotypic variation.
2. Develop a robust approach for integrating RNAseq and microarray data. There is a growing abundance of genome-wide transcript profiling data derived from both microarray and RNAseq experiments. Both data types have value, but they can be difficult to integrate in a single view and analysis. We will develop approaches to integrate these data types and will collect these data for our database.
3. Apply COB for discovery of plant height variation genes. The COB tools will be applied to identify functional variants that underlie QTL for plant height variation in maize. This will provide an opportunity to demonstrate the utility of our approaches for discovery of functional variants in complex plant genomes.

Project Methods
A primary goal of this project is to develop a tool for exploring the relationship of gene expression levels with phenotypic variance. The currently developed tools allow for the easy generation and exploration of co-expression networks generated from developmental tissues or from different genotypes. In our first aim the goal will be to develop further tools to visualize expression patterns of particular genes and to relate expression levels to phenotype. A researcher often wishes to visualize the expression pattern of a particular gene (or set of genes) in different tissues or genotypes. COB currently contains the expression data from a maize expression atlas (Sekhon et al., 2011) that includes 60 different stages or tissues from a single genotype, B73. COB also contains expression profiles for seedling tissue of 62 different genotypes, including 24 teosinte and 38 maize lines. We will implement tools that will allow users to input a single gene or multiple genes and generate heatmaps that visualize the expression of the gene(s) in both experiments. The data layout will be dynamic to allow for reordering of genes, tissues or genotypes. If multiple genes are used as a query, the results will include each of the individual genes as well as a visual for the average of the group of genes. In addition, the underlying data will be downloadable in a text format to allow for user manipulation. A primary objective is to enable the discovery of genes that underlie phenotypic variation. To enable this objective we will also add phenotypic data to COB. Many groups have been performing association mapping in maize and have collected phenotypic data on large populations of maize (Buckler et al., 2009; Krill et al., 2010; Yan et al., 2010; Brown et al., 2011; Cook et al., 2011; Tian et al., 2011), and many additional datasets are likely to be published soon.
These phenotypic data will be collected from Gramene or other sources, and the genotypes will be mapped to the same set of lines for which expression data are available. Several different tools will be developed to explore the relationships between expression and phenotype. A user will be able to query a specific gene and recover information about the correlation between that gene's expression level and all phenotypes (along with statistical significance estimates). Alternatively, a user can compare a list of candidate genes to a specific phenotype. In addition, we will collect GWAS data and make it possible to query significant genetic associations of SNPs near particular genes with the phenotype of interest. The primary goal of this first objective is to fully develop a set of tools for visualizing and exploring the relationship of gene expression patterns with phenotypic differences in maize genotypes. Co-PI Myers' group has familiarity with the development of these types of tools, and we do not anticipate substantial issues completing this aim. A set of beta-users in the maize community will provide feedback to ensure that the tools we develop are useful and provide the desired visualizations or analyses.
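The gene-versus-phenotype query described above can be illustrated with a short sketch. It is not the COB implementation: the function names, the choice of Pearson correlation, the permutation-based significance estimate, and all numeric values are assumptions made for illustration only. The sketch correlates one gene's expression level with each trait across the genotypes that have both measurements.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: how often shuffled phenotypes match or beat |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible p-value of zero.
    return (hits + 1) / (n_perm + 1)


def correlate_gene_with_phenotypes(expression, phenotypes, min_lines=3):
    """expression: {genotype: level}; phenotypes: {trait: {genotype: value}}."""
    results = {}
    for trait, values in phenotypes.items():
        # Only genotypes measured for both expression and the trait are used.
        shared = sorted(set(expression) & set(values))
        if len(shared) < min_lines:
            continue
        x = [expression[g] for g in shared]
        y = [values[g] for g in shared]
        results[trait] = (pearson(x, y), permutation_p_value(x, y))
    return results


# Made-up values for illustration; not measurements from this project.
example_expression = {"B73": 5.1, "Mo17": 7.9, "W22": 6.4, "Oh43": 9.2, "Ki11": 4.3}
example_phenotypes = {
    "plant_height_cm": {"B73": 210.0, "Mo17": 245.0, "W22": 228.0, "Oh43": 260.0, "Ki11": 198.0},
    "days_to_anthesis": {"B73": 70.0, "Mo17": 68.0},  # skipped: too few shared lines
}
example_results = correlate_gene_with_phenotypes(example_expression, example_phenotypes)
```

A permutation test is used here because it makes no distributional assumption about expression or trait values; with many genes and traits, a multiple-testing correction would also be needed.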

Progress 10/01/12 to 09/30/14

Outputs
Target Audience: Our target audience for this project is maize geneticists as well as geneticists or breeders working on other crop plants. Our goal was to develop tools that could be used to enable discovery of genes that might underlie important phenotypic variation in maize.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? A graduate student, Rob Schaefer, was trained in this project. Rob is in a computational biology graduate program, and this project provided opportunities for him to work with field staff and understand the basics of plant genetics as he was working to develop computational resources for the community.
How have the results been disseminated to communities of interest? The main findings of this project are reported in a publication (Schaefer et al., 2014) and the COB tool generated by this project is publicly available at http://csbio.cs.umn.edu/cob/.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? A primary goal of this project was to improve COB as a tool for exploring the genes (and their expression patterns) that might underlie phenotypic QTL. We have made a number of improvements to COB and have published a manuscript that highlights how this tool can be used to discover novel biology regarding maize quantitative trait variation. In addition, we provide use-cases to illustrate how a user can go from phenotypic data or QTL locations to candidate genes.

Publications

Type: Journal Articles Status: Published Year Published: 2014 Citation: Schaefer RJ, Briskine R, Springer NM, Myers CL (2014) Discovering Functional Modules Across Diverse Maize Transcriptional Datasets Using COB, The Co-expression Browser. PLoS One 9(6):e99193.

Progress 10/01/12 to 09/30/13

Outputs
Target Audience: The primary target audience for this project is maize geneticists and breeders. In addition, the work performed on this project is expected to have relevance for researchers studying other crop plants as well. This project aims to develop resources to better utilize transcriptome data to understand and predict complex traits.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? This project has provided training opportunities for a computer science student, Roman Briskine. He has had the opportunity to combine his expertise in computer science with training experiences in plant genetics and genomics.
How have the results been disseminated to communities of interest? The results of this work are made available through the project database website (http://csbio.cs.umn.edu/cob/) and several of the analyses have been reported in a publication (Sekhon et al., 2013).
What do you plan to do during the next reporting period to accomplish the goals? We will continue to add functionality to the website. In particular, we anticipate generating tools to visualize expression patterns for sets of genes and adding phenotypic data.

Impacts
What was accomplished under these goals? The main objectives of this project are (1) to develop added functionality for a maize co-expression database (COB: http://csbio.cs.umn.edu/cob/); (2) to develop approaches to integrate RNAseq and microarray data into co-expression networks; and (3) to apply tools from the COB database to discover potential candidate genes that influence natural variation for plant height. The COB database has been improved during this year. Tools have been added to allow for queries based on specific genes and sets of chromosomal coordinates. We are continuing efforts to add additional functions for visualizing expression relationships and phenotypic data. Significant progress was made on finding methods to integrate RNAseq and microarray data. Co-expression networks were constructed using different datasets and compared to determine if robust relationships could be identified. The results of these analyses were published in a PLoS One manuscript this year (Sekhon et al., 2013). The third objective involves identification of candidate genes. The appropriate phenotypic data were generated, and we are working to integrate these data into the database.
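One simple way to compare co-expression networks built from different datasets is sketched below. The report does not describe the method actually used, so everything here is an assumption for illustration: the gene names, the expression values, the Pearson threshold of 0.8, and the use of Jaccard overlap between edge sets as the measure of agreement.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(profiles, threshold=0.8):
    """Return gene pairs whose expression profiles correlate at or above threshold.

    profiles: {gene: [expression value per sample]} within one dataset.
    """
    edges = set()
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        if pearson(profiles[gene_a], profiles[gene_b]) >= threshold:
            edges.add((gene_a, gene_b))
    return edges


def edge_overlap(edges_a, edges_b):
    """Jaccard index of two edge sets: shared edges divided by all edges."""
    union = edges_a | edges_b
    if not union:
        return 0.0
    return len(edges_a & edges_b) / len(union)


# Hypothetical profiles for three placeholder genes in two datasets.
microarray_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
rnaseq_profiles = {
    "geneA": [10.0, 20.0, 30.0, 40.0],
    "geneB": [12.0, 25.0, 33.0, 41.0],
    "geneC": [11.0, 19.0, 33.0, 38.0],
}

microarray_edges = coexpression_edges(microarray_profiles)
rnaseq_edges = coexpression_edges(rnaseq_profiles)
shared_fraction = edge_overlap(microarray_edges, rnaseq_edges)
```

Because correlation is computed within each dataset separately, the two platforms never need to share a measurement scale; only the resulting edges are compared. Edges that appear in both networks are candidates for the robust relationships mentioned above.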

Publications

Type: Journal Articles Status: Published Year Published: 2013 Citation: Sekhon RS, Briskine R, Hirsch CN, Myers CL, Springer NM, Buell CR, de Leon N, Kaeppler SM. Maize gene atlas developed by RNA sequencing and comparative evaluation of transcriptomes based on RNA sequencing and microarrays. PLoS One. 2013 Apr 23;8(4):e61005.