Source: UNIVERSITY OF ILLINOIS submitted to NRP
CONTROL OF GENE EXPRESSION IN CROPS: DISCOVERY RESEARCH USING HIGH PERFORMANCE COMPUTING, ENUMERATIVE ANALYSIS AND MACHINE LEARNING
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
0202721
Grant No.
(N/A)
Cumulative Award Amt.
(N/A)
Proposal No.
(N/A)
Multistate No.
(N/A)
Project Start Date
Oct 1, 2010
Project End Date
Sep 30, 2015
Grant Year
(N/A)
Program Code
[(N/A)]- (N/A)
Recipient Organization
UNIVERSITY OF ILLINOIS
2001 S. Lincoln Ave.
URBANA,IL 61801
Performing Department
Crop Sciences
Non Technical Summary
Just as organisms come in many shapes and sizes, so does the DNA in their genomes. Complex organisms tend to have larger genomes, carrying more information in terms of the number of DNA bases. The plant kingdom contains the largest genomes of all, and though it will be some time before the very biggest are sequenced, the large genomes of crop plants such as maize, soybean and rice are now completed. Although annotation of the genomes will provide an overview of the protein coding sequences, regulatory sequences of crop plants are not currently accessible to automated annotation, with some relatively simple exceptions such as intron splice sites. This problem is essentially one of code breaking. The more that is known about the functions, expression and structure of genes and their products, the more the deficiencies in our knowledge about the regulatory DNA become apparent. The regulation of gene expression is the key to understanding most of the agriculturally important characteristics and responses of plants, including tolerance to environmental stresses, nutrient partitioning and morphology. Great strides have been made in understanding these systems in the model plant Arabidopsis and it is now possible to transfer many of these discoveries and methods into crop systems. In particular, the promoters of soybean are now accessible for analysis.
Animal Health Component
(N/A)
Research Effort Categories
Basic
(N/A)
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
201                    2499                              1040                      25%
201                    2499                              1060                      75%
Goals / Objectives
1. Implement web and command-line tools developed for Arabidopsis for the genome of soybean: Modern computers make the analysis of a relatively large genome such as soybean possible on a single, powerful server-class machine. We will implement both evolutionary and enumerative motif analysis programs on a server and provide the service via a web site. We anticipate that this site will be widely used by soybean researchers in academia and industry. We are partnering with the ARS-funded Soybase program to ensure the website is permanently available and secure.
2. Investigate soybean functional genomics data for new tissue- and response-specific motifs and pathways: A comprehensive set of predictions will be made concerning the critical regulatory DNA sequences of the soybean genome. To begin with a global analysis of soybean promoters, we will compare windows of sequence upstream of the transcriptional start sites characterized by a bulk 5' RACE experiment followed by short-read sequencing. We will target the mechanisms that regulate co-expression of genes in specific tissues, investigating the pathways responsible for cell fate determination and cell-specific gene expression. In collaboration with others we will also investigate pathways of biotic and abiotic stress response.
3. Use these newly developed tools to compare all of the soybean promoter and intergenic regions and open new avenues for exploring regulatory pathways by discovering promoter motifs: The soybean genome and the soon-to-be-completed sequences of Glycine tomentella and Phaseolus vulgaris provide a compelling opportunity to discover regulatory motifs by comparison to closely related genomes, as well as by cross-analysis of expression data and DNA sequence. We will examine these genomes for evidence of conserved promoter motifs and protein binding sites.
4. Use systems biology knowledge to develop synthetic promoters to drive expression of transgenes: We will use synthetic biology techniques to synthesize DNA bearing combinations of promoter elements that do not exist in soybean itself, with the aim of developing promoter sequences that will drive expression at known levels and in known tissues (or constitutively throughout the plant). The use of luciferase and GFP reporter genes will allow expression to be monitored in vivo in transiently or permanently transformed plants. While recombinant DNA and transgenic plant work will be performed in other projects and with other funding sources, data from these experiments will be used to refine and enhance procedures outlined in this proposal.
Project Methods
Parallelization of the enumerative programs will be performed by rewriting them in the C++ programming language in a form more easily adapted to the Message Passing Interface (MPI), the communication standard for computer clusters. Detection of conserved motifs will be done by refining the enumerative approach and applying alternative statistics such as Z-scores. Alignment of the large groups of sequences produced will be accomplished using the MAFFT algorithm (http://mafft.cbrc.jp/alignment/software/). Support vector machine algorithms used will include the svmlite module tool for Perl and the e1071 package for the R environment for statistical computing (http://cran.r-project.org/src/contrib/Descriptions/e1071.html). Biological validation of predicted motifs will be accomplished using transient bombardment assays with dual luciferase constructs and construction of stable transgenic plants containing motif-driven expression constructs. Luciferase or GFP reporter genes will enable the direct visualization of temporal (luciferase) or spatial (GFP) regulation of gene expression with bioluminescence detection and fluorescence microscopy.
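As an illustration of the enumerative approach with Z-score statistics described above, the following is a minimal sketch only, not the project's C++/MPI implementation. It assumes a simple scheme in which every k-mer is enumerated, the number of promoters containing it is counted in a foreground (co-expressed) set and a background set, and a Z-score is computed from a normal approximation to the binomial. The function names, the k-mer length and the choice of approximation are all assumptions made for this example.

```python
import math
from collections import Counter


def kmers_present(seq, k):
    """Return the set of distinct k-mers occurring in one sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def count_sequences_with_kmer(seqs, k):
    """Count, for every k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in seqs:
        counts.update(kmers_present(seq, k))
    return counts


def motif_z_scores(foreground, background, k):
    """Z-score of each foreground k-mer against its background frequency.

    Assumed statistic (hypothetical, for illustration): with n foreground
    sequences, x of which contain the k-mer, and background proportion p,
    z = (x - n*p) / sqrt(n*p*(1-p)).
    """
    fg = count_sequences_with_kmer(foreground, k)
    bg = count_sequences_with_kmer(background, k)
    n = len(foreground)
    scores = {}
    for kmer, x in fg.items():
        p = bg.get(kmer, 0) / len(background)
        if p <= 0.0 or p >= 1.0:
            continue  # z is undefined when the background variance is zero
        scores[kmer] = (x - n * p) / math.sqrt(n * p * (1 - p))
    return scores
```

Calling `motif_z_scores(foreground, background, 6)` on two lists of promoter sequences would return a dictionary of hexamers and their scores; k-mers absent from the background are skipped rather than scored, which is a simplification a real analysis would need to handle more carefully.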

Progress 10/01/10 to 09/30/15

Outputs
Target Audience: Target audiences include scientific journal readers and conference attendees, as well as students educated as part of the research project. Changes/Problems: Since major funding to support the proposed laboratory efforts in Arabidopsis with recombinant DNA and transgenic plants was not obtained except from industry sources that precluded publications, the public part of the project was refocused on software and collaborative bioinformatics analysis. What opportunities for training and professional development has the project provided? Several graduate and undergraduate students have been trained in plant genome informatics and related disciplines such as statistics and probability. How have the results been disseminated to communities of interest? Results were disseminated via academic publications, web-based software products and conference presentations. The software is currently available at http://stan.cropsci.uiuc.edu/tools.php. We are also providing other software specifically to certain collaborators. What do you plan to do during the next reporting period to accomplish the goals? This is a final report; however, the project will continue. We hope to refine the methods for promoter and transcription factor analysis further, to make the software available to a wider audience, and to continue research into new mechanisms of plant gene regulation.

Impacts
What was accomplished under these goals? Synthetic biology research directed at the above goals is necessary in order to evaluate the biological significance of the promoters. Without the extensive funding needed to do this in our own laboratory, we focused on bioinformatics and promoter identification in collaboration with other academic and industry groups who have created transgenic plants and performed validation studies. We hope to reactivate the transgenic project when funds are available, but our in silico project has had a number of outputs. We have been cited and/or acknowledged in several publications where others have used our promoter analysis software, and we are collaborating with other groups in ongoing work. We have supplied a large number of predicted promoters to an industry collaborator for evaluation for biotechnology purposes. Also, our promoter and genome analysis and identification work has led to publications from our own group and enabled other discoveries.

Publications

  • Type: Journal Articles Status: Published Year Published: 2015 Citation: K.A. Hudson and M.E. Hudson. 2015. The basic helix-loop-helix transcription factor family in the sacred lotus, Nelumbo nucifera. Tropical Plant Biology 7 (2), 65-70A.
  • Type: Journal Articles Status: Published Year Published: 2015 Citation: K.A. Hudson and M.E. Hudson. 2015. Classification of basic helix-loop-helix transcription factors of soybean. International Journal of Genomics #603182.


Progress 10/01/13 to 09/30/14

Outputs
Target Audience: Target audiences include scientific journal readers and conference attendees, as well as students educated as part of the research project. Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? Undergraduate and graduate students have been trained in bioinformatics. How have the results been disseminated to communities of interest? Via publications, conferences, and collaboration. What do you plan to do during the next reporting period to accomplish the goals? We will apply for competitive funds to restart the synthetic biology project, and continue the bioinformatics analysis of crop promoters.

Impacts
What was accomplished under these goals? Lacking funding for further synthetic biology research directed at the above goals, we focused on bioinformatics and promoter identification in collaboration with other academic and industry groups. We hope to reactivate the transgenic project when funds are available.

Publications

  • Type: Journal Articles Status: Published Year Published: 2014 Citation: Li, Y., Varala, K. and Hudson, M.E. A survey of the small RNA population during far-red light-induced apical hook opening. Front. Plant. Sci. 5: 156.


Progress 01/01/13 to 09/30/13

Outputs
Target Audience: Readers of the scientific literature, conference attendees, and colleagues. Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? An MS student (Kathleen Keating) worked on this project and graduated in 2013. How have the results been disseminated to communities of interest? The website is up but the URL has not yet been disseminated. What do you plan to do during the next reporting period to accomplish the goals? We will share the website URL with interested parties once the website is fully functional.

Impacts
What was accomplished under these goals? We completed the public version of a promoter analysis website to allow analysis of many different plant genomes. This is currently undergoing beta testing.

Publications

  • Type: Journal Articles Status: Submitted Year Published: 2014 Citation: Hudson, K. and Hudson, M. Analysis of the Sacred Lotus bHLH family.


Progress 01/01/12 to 12/31/12

Outputs
OUTPUTS: We have designed synthetic plant promoters with previously developed software using public data for three groups of promoters: constitutive strong promoters, root specific promoters, and promoters active in leaves but not in seed. We are currently in the process of generating transgenic plants containing constructs with synthetic plant promoters driving reporter genes. A website with our software is now live for all plant species in phytozome.org. PARTICIPANTS: Kathleen Keating (graduate student) and Tong Geon Lee (postdoc) have participated in this project. We are partnering with Dow Agrosciences and with the USDA ARS in Ames, IA to apply the software that has been developed. TARGET AUDIENCES: The following publication from USDA ARS in West Lafayette, IN used our software in order to identify new pathways of regulation of gene expression in soybean: Hudson, K.A. 2010. The Circadian clock-controlled transcriptome of developing soybean seeds. The Plant Genome. 3(1):1-11. Also, our software is being used in research groups in industry including at Dow. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
Dow Agrosciences are working with our technology in the area of crop biotechnology. A new website to allow access to our promoter analysis software is available and has had several hits.

Publications

  • D.E. Cook, T.G. Lee, X. Guo, S. Melito, K. Wang, A.M. Bayless, J. Wang, T.J. Hughes, D.K. Willis, T.E. Clemente, B.W. Diers, J. Jiang, M.E. Hudson and A.F. Bent. 2012. Copy number variation of multiple genes at Rhg1 mediates nematode resistance in soybean. Science 338, 1206-1209.
  • B.T. James, C. Chen, A. Rudolph, K. Swaminathan, J.E. Murray, J.-K. Na, A.K. Spence, B. Smith, M.E. Hudson, S.P. Moose and R. Ming. 2012. Development of microsatellite markers in autopolyploid sugarcane and comparative analysis of conserved microsatellites in sorghum and sugarcane. Molecular Breeding 30, 661.
  • Y. Li, K. Varala, S.P. Moose and M.E. Hudson. 2012. The inheritance pattern of 24 nt siRNA clusters in Arabidopsis hybrids is influenced by proximity to transposable elements. PLoS ONE 7, e47043.


Progress 01/01/11 to 12/31/11

Outputs
OUTPUTS: We have developed software approaches to design synthetic plant promoters using public data for three groups of promoters: constitutive strong promoters, root specific promoters, and promoters active in leaves but not in seed. We are currently in the process of generating transgenic plants containing constructs with synthetic plant promoters driving reporter genes. We are using fluorescent protein reporters and have developed an assay for expression levels of these reporters. PARTICIPANTS: Kathleen Keating (graduate student) has been the main participant in this project. Dow Agrosciences are a partner organization. Students on this project have many opportunities for professional development and training. TARGET AUDIENCES: Crop biotechnology companies and public sector researchers are the main target audience of this project. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
We are working together with Dow Agrosciences to apply our technology to implementations in crop biotechnology. A new web site to allow access to our promoter analysis software should be available shortly.

Publications

  • Varala, K., Swaminathan, K., Li, Y. and Hudson, M.E. 2011. Rapid genotyping of soybean cultivars using high throughput sequencing. PLoS ONE 6(9):e24811. doi:10.1371/journal.pone.0024811.
  • Li, Y., Swaminathan, K. and Hudson, M.E. 2011. Rapid, organ-specific transcriptional responses to light regulate photomorphogenic development in dicot seedlings. Plant Physiology August 2011 vol. 156 no. 4 2124-2140.


Progress 01/01/10 to 12/31/10

Outputs
OUTPUTS: The project has produced a set of software tools that can analyze promoter sequences from crop plants. We are currently in the process of supplying these tools to the USDA ARS soybase project in Ames, Iowa and to Dow Agrosciences in Indianapolis. Dow Agrosciences is funding further research in our lab on how to apply these tools to choose and engineer promoters for the expression of transgenes. PARTICIPANTS: Kathleen Keating (graduate student) and Shayan Dhanani (undergraduate) have participated in this project. We are partnering with Dow Agrosciences and with the USDA ARS in Ames, IA to apply the software that has been developed. TARGET AUDIENCES: The following publication from USDA ARS in West Lafayette, IN used our software in order to identify new pathways of regulation of gene expression in soybean: Hudson, K.A. 2010. The Circadian clock-controlled transcriptome of developing soybean seeds. The Plant Genome. 3(1):1-11. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
We have identified a large number of promoters from the soybean genome corresponding to all the promoters of annotated genes. These have been used by ourselves and others to identify new regulatory elements as well as those elements previously known from other species, to identify their role in soybean transcriptional networks. We are currently making transgenic plants using selected promoter constructs.

Publications

  • No publications reported this period


Progress 01/01/09 to 12/31/09

Outputs
OUTPUTS: Regulatory DNA element discovery algorithms continue to have impact in research, now including work from other groups (for example, a publication from USDA ARS at Purdue University recently used and cited our website at http://stan.cropsci.uiuc.edu/index.php). Our website now allows the discovery of regulatory elements in soybean, which makes it more relevant to agricultural gene expression systems. We are now exploring the use of regulatory site discovery in combination with RNA sequencing. This has led to a collaboration with Dow Agrosciences to explore computational analysis of gene expression in crops. PARTICIPANTS: Two graduate trainees and three undergraduate trainees were trained as part of this project. TARGET AUDIENCES: Nothing significant to report during this reporting period. PROJECT MODIFICATIONS: Not relevant to this project.

Impacts
Small motif analysis methods developed under this project have been further used to develop genotyping methods for crops. Publications are in preparation on these methods. The new project with Dow Agrosciences is completely derived from this project. It has also led to several invited presentations at conferences and university seminar series.

Publications

  • Schwarz, D., Robertson, H.M., Feder, J.L., Varala, K., Hudson, M.E., Ragland, G.J., Hahn, D.A. and Berlocher, S.H. 2009. Sympatric ecological speciation meets pyrosequencing: Sampling the transcriptome of the apple maggot Rhagoletis pomonella. BMC Genomics (In Press).
  • Kim, K.-S., Bellendir, S., Hudson, K., Hill, C., Hartman, G., Hyten, D., Hudson, M. and Diers, B. 2009. Fine mapping the soybean aphid resistance gene Rag1 in soybean. Theoretical App. Genet. (In Press).
  • Kikis, E.A., Oka, Y., Hudson, M.E., Nagatani, A. and Quail, P.H. 2009. Residues clustered in the light-sensing knot of Phytochrome B are necessary for conformer-specific binding to signaling partner PIF3. PLoS Genetics 5: e1000352.
  • Hudson, M.E. and Kane, N. 2009. Plant genomes do a balancing act. Mol. Ecol. 18, 2743-2745.
  • Pan, Y., Michael, T.P., Hudson, M.E., Kay, S., Chory, J. and Schuler, M.A. 2009. Cytochrome P450s as reporters for circadian-regulated pathways. Plant Physiol. 150, 858-878.
  • Muellner, M.G., Attene-Ramos, M.S., Hudson, M.E., Wagner, E.D. and Plewa, M.J. 2009. Human cell toxicogenomic analysis of bromoacetic acid: A regulated drinking water disinfection by-product. Env. Mol. Mutagenesis (In Press).
  • Tuteja, J.H., Zabala, G., Varala, K., Hudson, M. and Vodkin, L. 2009. Endogenous, tissue-specific siRNAs silence the chalcone synthase gene family in Glycine max seed coats. Plant Cell (In Press).


Progress 01/01/08 to 12/31/08

Outputs
OUTPUTS: The algorithms developed to discover regulatory DNA elements in plants have resulted in the discovery of a key mechanism by which plants respond to predation by insects (see 2007 report). However, the elements predicted to control light-responsive gene expression have now been shown to be insufficient for the response of gene expression to light (unlike the element responsible for touch/predation response). The Web site developed for this project continues to have impact. See http://stan.cropsci.uiuc.edu/index.php. PARTICIPANTS: Two graduate trainees and two undergraduate trainees were trained as part of this project. One graduate trainee received a Master of Science degree. TARGET AUDIENCES: Not relevant to this project. PROJECT MODIFICATIONS: Negative results on the competence of predicted motifs to confer light-responsiveness have led to the de-prioritization of this line of research. The main focus has now shifted to short read sequencing methods and computational data analysis for these techniques. We are using these techniques to explore the role of small RNA in gene expression responses, which may explain our earlier results.

Impacts
Small motif analysis methods developed under this project have been applied to the assembly and analysis of short read sequence data, with impacts in many fields including entomology and soybean genetics. This has led to publications in diverse fields such as entomology, social behavior and nematology as well as crop genomics. It has also led to several invited presentations at conferences and university seminar series.

Publications

  • Kaczorowski, K.A., Kim, K.-S., Diers, B.W. and Hudson, M. 2008. Microarray-based genetic mapping using soybean near-isogenic lines and generation of SNP markers in the Rag1 aphid-resistance interval. Plant Genome 1, issue 2.
  • Craig, J.P., Bekal, S., Hudson, M., Domier, L., Niblack, T. and Lambert, K.N. 2008. Analysis of a horizontally transferred pathway involved in vitamin B6 biosynthesis from the soybean cyst nematode Heterodera glycines. Mol. Biol. Evol. 25, 2085-2098.
  • Bekal, S., Craig, J., Hudson, M., Niblack, T., Domier, L. and Lambert, K.N. 2008. Genomic DNA sequence comparison between two inbred soybean cyst nematode biotypes facilitated by massively parallel 454 micro-bead sequencing. Mol. Genet. Genomics 279, 535-543.


Progress 01/01/07 to 12/31/07

Outputs
OUTPUTS: The algorithms developed to discover regulatory DNA elements in plants have resulted in the discovery of a key mechanism by which plants respond to predation by insects (see Publications). The Web site developed for this project continues to have impact in several areas of plant biology, as evidenced by a recent citation in a paper in the journal Science. PROJECT MODIFICATIONS: Our focus on small DNA motifs has led to the inclusion of short read sequencing methods and computational data analysis for these techniques into the project, in addition to the continuing research on regulatory DNA.

Impacts
Regulatory DNA analysis using the methods developed under this project has led to the knowledge that plant genomes respond to herbivory by means of a previously uncharacterized pathway. This has had an impact on the fields of plant-insect interactions and signal transduction. Also, the publication of these results has increased the impact of the website, and we hope this will lead to greater impact in future. The transgenic plants developed as part of this project are now being characterized to monitor pathway activity during maturity and flowering. The light-responsive elements are expected to lead to further discovery of uncharacterized processes in plants. Small motif analysis methods developed under this project have been applied to the assembly and analysis of short read sequence data, with impacts in many fields including entomology and soybean genetics.

Publications

  • Hudson, M. 2007. Sequencing breakthroughs for genomic ecology and evolutionary biology. Published online October 1 2007. Molecular Ecology Notes, OnlineEarly Articles. doi:10.1111/j.1471-8286.2007.02019.x. (Forthcoming in Molecular Ecology Resources).
  • Toth, A.L., Varala, K., Newman, T.C., Miguez, F.E., Hutchison, S.K., Willoughby, D.A., Simons, J.F., Egholm, M., Hunt, J.H., Hudson, M. and Robinson, G.E. 2007. Wasp gene expression supports an evolutionary link between maternal behavior and eusociality. Science 318, 441-444.
  • Walley, J.W., Coughlan, S., Hudson, M., Covington, M.F., Kaspi, R., Banu, G., Harmer, S.L. and Dehesh, K. 2007. Mechanical stress induces biotic and abiotic stress responses via a novel cis-element. PLOS Genetics, doi:10.1371/journal.pgen.0030172.eor.
  • Hudson, M., Bruggink, T., Chang, S.H., Yu, W., Han, B., Wang, X., van der Toorn, P. and Zhu, T. 2007. Analysis of gene expression during Brassica seed germination using a cross-species microarray platform. Plant Genome 2, S-96-S-117, Crop Science 47, 96-112.
  • Swaminathan, K., Varala, K. and Hudson, M. 2007. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey. BMC Genomics 8:132.


Progress 01/01/06 to 12/31/06

Outputs
Transgenic Arabidopsis plants carrying luciferase reporter constructs have been created as a means to check the computational prediction of regulatory elements. Preliminary experiments suggest that at least two regulatory motifs have the ability to confer light regulation on a reporter gene. In addition, refinement of computer prediction algorithms has led to further advancement in the number of promoter elements detected in two recent experiments, one with defense elicitors and another with signaling analogs. A new tool has been made available to allow searches with microarray data for known regulatory elements.

Impacts
Many downloads of the Sift software and users of the website indicate impact on several laboratories working on regulation of plant gene expression, as do two citations in the literature. We hope that improved versions of our website will lead to greater impact.

Publications

  • Hudson, M.E. 2006. Photoreceptor biotechnology. In: Light and Plant Development, G.C. Whitelam, ed., Blackwell Publishing.


Progress 01/01/05 to 12/31/05

Outputs
All necessary equipment has been installed and configured in the laboratory, including a dual-processor Linux server for bioinformatics services (the server has a website and BLAST server accessible to Crop Sciences, and another site externally accessible at stan.cropsci.uiuc.edu). An IBM Linux cluster has been installed and configured for bioinformatics research. In addition, an externally accessible beta version of a promoter analysis site has been deployed that has already generated interest from several other laboratories. A program (degsuite) has been developed using this hardware by Adam Thomas under the supervision of the PI. This program is the first to offer statistically rigorous promoter motif analysis, where the experimenter can determine potential regulatory sites on the basis of P values calculated using the hypergeometric distribution and corrected for multiple testing.
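The kind of test described above can be sketched as follows. This is an illustrative example only and is not taken from degsuite: the function names are hypothetical, and the Bonferroni method is an assumption, since the report does not say which multiple-testing correction the program applies. Given N promoters in the genome, K of which contain a motif, the P value is the probability of seeing at least x motif-containing promoters in a selected group of n.

```python
from math import comb


def hypergeom_upper_tail(x, N, K, n):
    """P(X >= x) for X ~ Hypergeometric(N promoters, K with motif, n selected)."""
    total = comb(N, n)
    hi = min(K, n)
    # Sum the probability of every outcome at least as extreme as x.
    # math.comb returns 0 when n - i exceeds N - K, so impossible terms vanish.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(x, hi + 1)) / total


def bonferroni(p_values):
    """Bonferroni correction (assumed here): scale each P value by the test count."""
    m = len(p_values)
    return {name: min(1.0, p * m) for name, p in p_values.items()}
```

For example, `bonferroni({motif: hypergeom_upper_tail(x, N, K, n) for ...})` would give one corrected P value per candidate motif. Bonferroni is the most conservative common choice; a false discovery rate procedure would be a reasonable alternative when many motifs are tested.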

Impacts
It is expected that the Degsuite program and the promoter analysis website will have a significant impact on laboratories working on the discovery of promoter elements in plants. It has already led to the discovery of potential regulatory elements in the promoters of plant P450 genes involved in defense against disease (manuscript in preparation). Consequently it is hoped that this work will lead to a more thorough understanding of the regulation of plant gene expression.

Publications

  • Chen, W.J., Chang, S.H., Hudson, M., Kwan, W.K., Li, J., Estes, B., Knoll, D., Shi, L. and Zhu, T. 2005. Contribution of transcriptional regulation to natural variations in Arabidopsis. Genome Biology 6:R32.