Performing Department
Genetics and Biochemistry
Non-Technical Summary
Anyone who has used an Internet search engine knows that volumes of free digital information are available to mine, from the best price on shoes to the location of Fiji. The same is true of agricultural genetic information stored on the Internet, which can be mined and used to develop new crops. New crops are always needed, and accelerating the crop development cycle is essential to mitigate the effects of population pressure and climate change on yields of food and plant co-products (biofuel, cotton, etc.). The applied basic research described in this proposal could have a powerful impact on human health and commerce. Faster crop development is essential given competition from world markets for existing crop commodities, the enormous market potential of bioenergy and other co-products, and the threat climate change poses by shifting the living (e.g. emerging pests and weeds) and non-living (e.g. floods and droughts) factors that affect crop yield. In short, new technologies such as the analysis of huge DNA sequence datasets described in this proposed research may well be essential to maintaining and improving the profitability of the US/SC agricultural industry, now and in an uncertain future where the Farmer's Almanac may no longer predict the climate.
Animal Health Component
0%
Research Effort Categories
Basic
80%
Applied
20%
Developmental
(N/A)
Goals / Objectives
Objective A: Build a Crop Gene Interaction Network Database. We will insert gene interaction networks for crops relevant to SC into our GeneNet Engine data-mining resource. These networks will be collected from public sources and, in some cases, constructed de novo using NGS data. A key functionality of the GeneNet Engine is the delivery of known DNA polymorphisms (e.g. SNPs with flanking DNA) near genes that the user has found to be relevant to the biology of interest.
Objective B: Construct Translational Genomics Software. We will construct tools to analyze complex gene interaction patterns (networks). The GeneNet Engine itself is a tool that allows users to explore networks by finding interacting partners by specific gene names, genetic signals of a trait, and enriched biological functions. We will explore ways to improve this tool for comparative genomics. In addition, we will work on the construction of fast network alignment software to identify conserved interaction patterns between crops.
Objective C: Outreach to SC Crop Development Community. If travel funds are provided, we will deliver on-site training in the use of these tools and their relevance to crop development.
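The GeneNet Engine functionality described under Objective A, returning known SNPs and their flanking DNA near genes of interest, amounts to an interval query over sorted marker positions. The Python sketch below is illustrative only: the `Snp` record, the `SnpIndex` class, the `flanking_sequence` helper, and the 5 kb default window are hypothetical names and values, not part of the GeneNet Engine.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    """Hypothetical SNP record; pos is a 1-based chromosome coordinate."""
    snp_id: str
    chrom: str
    pos: int


class SnpIndex:
    """Per-chromosome SNP lists sorted by position for fast window queries."""

    def __init__(self, snps):
        self._snps = {}
        for snp in sorted(snps, key=lambda s: (s.chrom, s.pos)):
            self._snps.setdefault(snp.chrom, []).append(snp)
        # Parallel position lists let bisect find window boundaries in O(log n).
        self._pos = {c: [s.pos for s in lst] for c, lst in self._snps.items()}

    def near_gene(self, chrom, start, end, flank=5000):
        """Return SNPs on chrom within [start - flank, end + flank], inclusive."""
        positions = self._pos.get(chrom, [])
        lo = bisect_left(positions, start - flank)
        hi = bisect_right(positions, end + flank)
        return self._snps.get(chrom, [])[lo:hi]


def flanking_sequence(chrom_seq, pos, k=50):
    """Return up to k bases either side of 1-based pos, SNP base included."""
    left = max(0, pos - 1 - k)
    return chrom_seq[left:pos + k]


# Example: markers around a hypothetical gene spanning Chr01:1000-2000.
index = SnpIndex([
    Snp("s1", "Chr01", 100),
    Snp("s2", "Chr01", 6000),
    Snp("s3", "Chr01", 20000),
    Snp("s4", "Chr02", 150),
])
print([s.snp_id for s in index.near_gene("Chr01", 1000, 2000)])  # ['s1', 's2']
```

Sorting once and bisecting keeps each query logarithmic in the number of SNPs per chromosome; a production database would more likely rely on an indexed table, which this sketch does not attempt to model.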
Project Methods
Objective A: Build a Crop Gene Interaction Network Database. In this objective, we will add networks of agricultural relevance to South Carolina. These could include bioenergy grasses, soybean, cotton, vegetables, and others as they are generated by the Feltus lab or other groups. To obtain the networks, we will A) search the literature for gene interaction networks (RNAseq- and hybridization-based) for target crops relevant to South Carolina; and B) construct co-expression networks de novo from public RNAseq data. For example, there are at least two soybean networks available from the literature (Yim, Yu et al. 2013; Yu, Zhang et al. 2014), and there are 138 soybean RNAseq datasets in the NCBI SRA database (SRA 2014). Given the PI's previous research, we are particularly interested in adding bioenergy grass networks, including one we are constructing for sorghum (unpublished data). For de novo RNAseq network construction, FASTQ files will be pre-processed with Trimmomatic (Trimmomatic 2013) to remove adapters and soft-trim reads, and then mapped with bowtie2/TopHat (Trapnell, Pachter et al. 2009; bowtie2 2013) to the relevant genome assembly, with gene model coordinates supplied in GTF files. The SAM/BAM alignments for all conditions will be used to construct FPKM matrices, which will be input into a gene co-expression network (GCN) construction pipeline for gene interaction module discovery. PI Feltus is currently exploring alternative RNAseq normalization techniques (Dillies, Rau et al. 2013) for GCN construction. We will remove outlier distributions and construct a single global GCN for each species; we will also explore statistical partitioning of the expression sets prior to network construction, as we have done for rice (Ficklin and Feltus 2013) and Arabidopsis (Feltus, Ficklin et al. 2013). Constructing multiple GCNs allows maximal capture of gene co-expression space.
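The step from alignments to a co-expression network can be illustrated with a minimal Python sketch. FPKM uses its standard definition (reads × 10^9 divided by gene length in bp × total mapped reads); the log2(FPKM + 1) transform, Pearson correlation, and fixed |r| cutoff of 0.9 are illustrative assumptions, not the settings of the pipeline described here, and all function names are hypothetical.

```python
import math
from itertools import combinations


def fpkm(counts, gene_lengths_bp, total_mapped_reads):
    """FPKM = reads * 1e9 / (gene length in bp * total mapped reads)."""
    return {g: c * 1e9 / (gene_lengths_bp[g] * total_mapped_reads)
            for g, c in counts.items()}


def fpkm_matrix(samples, gene_lengths_bp):
    """samples: list of (counts dict, total mapped reads) -> gene: FPKM per sample."""
    per_sample = [fpkm(counts, gene_lengths_bp, total) for counts, total in samples]
    return {g: [s.get(g, 0.0) for s in per_sample] for g in gene_lengths_bp}


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    if max(x) == min(x) or max(y) == min(y):
        return 0.0
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_edges(matrix, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    # Log-transform to damp the influence of very highly expressed genes.
    logged = {g: [math.log2(v + 1) for v in vals] for g, vals in matrix.items()}
    edges = []
    for g1, g2 in combinations(sorted(logged), 2):
        r = pearson(logged[g1], logged[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Example: four genes across four hypothetical conditions.
expr = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1], "d": [5, 5, 5, 5]}
for edge in coexpression_edges(expr):
    print(edge)
```

Gene "d" never forms an edge because a constant expression profile carries no co-expression signal; a real pipeline would typically filter such genes, and choose its threshold statistically, before this step.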
Once constructed, each GCN will be partitioned into gene modules (sub-networks) using network community discovery techniques such as MCL (Hwang, Cho et al. 2006) and link community approaches (Ahn, Bagrow et al. 2010; Kalinka and Tomancak 2011), both of which have been extremely informative in our ongoing analysis of Arabidopsis and rice networks. This will be followed by enrichment analysis of GO, InterPro, and KEGG terms (Huang, Sherman et al. 2009; Ficklin, Luo et al. 2010). In this way, we will generate a set of candidate co-expressed gene interaction modules with functional annotation. The significance of the annotations within a module can be tested using enrichment analysis and simulation techniques for which the Feltus lab has developed relevant code. All modules from public or de novo constructed GCNs will be inserted into the GeneNet Engine database. In addition, any public genetic data in an easily accessible format can be loaded into the database to associate GWAS and QTL signals with gene interaction modules. Furthermore, any public DNA polymorphism data (e.g. SNP-DB) can be loaded to associate useful DNA markers with genes of interest. As mentioned above, a key usage example would be for a researcher to identify a candidate gene of interest in the database, find interacting genes, and then select DNA markers near these genes for a molecular breeding experiment.
Objective B: Construct Translational Genomics Software. In this objective, we will create tools to realize the translational genomics potential of agricultural crops. For example, we will collaborate with computer engineers to port network alignment algorithms such as IsoRankN (Liao, Lu et al. 2009) to optimized code. We are currently developing global network alignment software, based on homology and topology, in NVIDIA's CUDA so that it runs quickly on GPUs and scales with network size (unpublished software).
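The module annotation testing mentioned above is commonly done with a one-sided hypergeometric test; the Python sketch below uses that test with a Bonferroni correction. It is an assumed, simplified stand-in: the Feltus lab's own enrichment and simulation code is not described here, and `module_enrichment`, the term labels, and the 0.05 cutoff are hypothetical.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    N genes in the background, K of them carry the term,
    n genes in the module, k module genes carry the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def module_enrichment(module_genes, annotations, background, alpha=0.05):
    """Return (term, k, Bonferroni-adjusted p) for significantly enriched terms."""
    background = set(background)
    module = set(module_genes) & background
    N, n = len(background), len(module)
    # Only terms seen in the module are tested; each test counts toward Bonferroni.
    terms = set().union(*(annotations.get(g, set()) for g in module))
    results = []
    for term in sorted(terms):
        K = sum(1 for g in background if term in annotations.get(g, ()))
        k = sum(1 for g in module if term in annotations.get(g, ()))
        p_adj = min(1.0, enrichment_pvalue(k, n, K, N) * len(terms))
        if p_adj <= alpha:
            results.append((term, k, p_adj))
    return results


# Example: 20 background genes; the module holds all 5 genes tagged TERM_A.
background = [f"g{i}" for i in range(20)]
annotations = {f"g{i}": {"TERM_A"} for i in range(5)}
annotations["g10"] = {"TERM_B"}
print(module_enrichment(background[:5], annotations, background))
```

Bonferroni is conservative; a false discovery rate procedure is a common alternative when many terms are tested per module.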
If funds are available for back-end compute resources, we will create a web-based tool that performs network alignment using this GPU-optimized code. The GeneNet Engine itself is a tool that allows users to explore networks by finding interacting partners by specific gene names, genetic signals of a trait, and enriched biological functions. We will explore ways to improve this tool for comparative genomics, such as creating mapping tables between syntenic regions of related crop genomes.
Objective C: Outreach to SC Crop Development Community. If travel funds are provided, PI Feltus will provide an on-site, one-day workshop at a target site (e.g. Pee Dee REC) on how to use these tools, with examples of their importance. If travel funds are not provided, a remote workshop will be organized. Workshops will occur at least once per year, probably in the fall after the summer crop harvest has been completed.
Outputs. The focal output of this project will be the networks deposited into the GeneNet Engine database and any public software tools developed. From a deliverable perspective, each network and tool is a potential publishable unit; therefore, the number of peer-reviewed publications will be a key metric of progress. In addition, the utility of these tools will be assessed both qualitatively and quantitatively. The IP addresses of those who access the database and any tools we develop will be logged, and the number of unique South Carolina users will be counted. Records will also be kept of any contacts by South Carolina researchers via workshop, email, and telephone to qualitatively assess the impact of this work on South Carolina crop researchers. Furthermore, user email addresses will be recorded and online surveys (e.g. Survey Monkey) will be conducted to monitor the success of the project and to gather feedback for tool improvement.
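The homology-plus-topology alignment idea behind Objective B can be sketched with the pairwise IsoRank update R = αAR + (1 - α)E, in which a gene pair scores highly when its neighbouring pairs also score highly, while sequence similarity E supplies a prior; a greedy one-to-one matching then reads off aligned pairs. This is plain single-threaded Python for pairwise IsoRank, not IsoRankN's multiple-network clustering and not the lab's CUDA software; the function names, α = 0.8, and the iteration count are assumptions.

```python
def isorank(adj1, adj2, homology, alpha=0.8, iters=50):
    """Pairwise IsoRank-style scores.

    adj1, adj2: node -> set of neighbours (undirected, symmetric).
    homology: (node1, node2) -> non-negative sequence similarity score.
    """
    nodes1, nodes2 = sorted(adj1), sorted(adj2)
    total_h = sum(homology.values()) or 1.0
    # E: homology prior normalised to sum to 1.
    E = {(i, j): homology.get((i, j), 0.0) / total_h
         for i in nodes1 for j in nodes2}
    R = {pair: 1.0 / len(E) for pair in E}
    for _ in range(iters):
        new = {}
        for i in nodes1:
            for j in nodes2:
                # Topological support: neighbour-pair scores spread by degree.
                s = sum(R[(u, v)] / (len(adj1[u]) * len(adj2[v]))
                        for u in adj1[i] for v in adj2[j])
                new[(i, j)] = alpha * s + (1 - alpha) * E[(i, j)]
        norm = sum(new.values()) or 1.0
        R = {pair: score / norm for pair, score in new.items()}
    return R


def greedy_match(R):
    """One-to-one alignment: repeatedly take the best pair with both nodes unused."""
    used1, used2, pairs = set(), set(), []
    for (i, j), score in sorted(R.items(), key=lambda kv: (-kv[1], kv[0])):
        if score > 0 and i not in used1 and j not in used2:
            pairs.append((i, j))
            used1.add(i)
            used2.add(j)
    return pairs


# Example: two three-gene paths with homology hints along the diagonal.
net1 = {"a1": {"b1"}, "b1": {"a1", "c1"}, "c1": {"b1"}}
net2 = {"a2": {"b2"}, "b2": {"a2", "c2"}, "c2": {"b2"}}
hom = {("a1", "a2"): 1.0, ("b1", "b2"): 1.0, ("c1", "c2"): 1.0}
print(greedy_match(isorank(net1, net2, hom)))
```

Each iteration's neighbour loops cost time proportional to the product of the two networks' edge counts, which grows quickly for genome-scale networks; that is the kind of workload a GPU port is meant to absorb.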