Source: UTAH STATE UNIVERSITY submitted to
REPRODUCTIVE PERFORMANCE IN DOMESTIC RUMINANTS
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0227686
Grant No.
(N/A)
Project No.
UTA01062
Proposal No.
(N/A)
Multistate No.
W-2112
Program Code
(N/A)
Project Start Date
Oct 1, 2011
Project End Date
Sep 30, 2016
Grant Year
(N/A)
Project Director
Stevens, J.
Recipient Organization
UTAH STATE UNIVERSITY
(N/A)
LOGAN,UT 84322
Performing Department
Agricultural Experiment Station
Non Technical Summary
Gene expression technology has been applied to [nuclear transfer] cattle cloning to identify and characterize a potential genetic basis for successful vs. unsuccessful cloned pregnancies. Conventional statistical methods to analyze gene expression data do not allow for either controlling meaningful error rates within the framework of nested hypotheses or characterizing multivariately differentially expressed genes, which abound in these cattle cloning experiments. The purpose of this study is to develop and disseminate appropriate statistical methods for such experiments, including the creation of convenient implementations of these methods. The identification of appropriate statistical methods will constitute a change in knowledge (both for the principal investigator as well as the target audience of agricultural genomics researchers), and the creation of convenient software implementations will lead to a change in actions as these methods are adopted by agricultural genomics researchers.
Animal Health Component
(N/A)
Research Effort Categories
Basic
(N/A)
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
901 | 7310 | 2090 | 90%
301 | 3310 | 1080 | 10%
Goals / Objectives
Discover and translate molecular, metabolic, genomic, endocrine, and immunologic mechanisms that influence testicular and ovarian function, reproductive behavior, conception rate, embryo and fetal development, attainment of puberty, and effects of climate/season on reproductive patterns of domestic ruminants. [Activities] The principal investigator will mentor statistics graduate students on thesis and dissertation topics related to this project, and will teach the relevant statistical methods in an annual senior/graduate-level statistical bioinformatics course. [Events] The results of this project will be disseminated to the target audience through both conference participation and publication. [Products] The statistical methods developed by this project will be packaged into convenient software implementations made available to the target audience. [Change in Knowledge] Agricultural genomics researchers will become aware of the appropriate data analysis methods for nested hypothesis testing situations. [Change in Action] Agricultural genomics researchers will adopt these appropriate data analysis methods for nested hypothesis testing situations. By adopting these statistical methods, agricultural genomics researchers will make more justified conclusions from their genomics experiments.
Project Methods
The W2112 project outline states that "fulfillment of the objective should lead to the development and application of methodologies to improve animal health, well-being, and reproductive efficiency of domestic ruminants." The methodologies to be developed by the proposed research are statistical methods for the appropriate analysis of gene expression data. The intent of these methods is to identify and characterize differentially expressed genes between SCNT (somatic cell nuclear transfer) and non-SCNT samples, so that the genetic basis for successful SCNT pregnancies can be better understood. This should lead to a greater SCNT success rate, so that attributes such as carcass quality can be maximized and preserved in successive generations of cattle. To develop the statistical tools to achieve the general W2112 project objective, the following five specific procedures will be employed:
[1] Develop a global multivariate gene set test for the case where each gene has multiple P-values corresponding to multiple tests of differential expression. This will be a natural extension of an existing univariate gene set test to characterize significantly differentially expressed genes, which uses a single P-value for differential expression of each gene. The extension will make use of meta-analytic methods to combine P-values.
[2] Develop a nonparametric multivariate test of shift between a gene set and its complement. This will be a variation on an existing test for differential expression that relies on a multivariate test for shift. The emphasis will be on characterizing significant genes in the SCNT example.
[3] Extend tree-based gatekeeping strategies to a "forest" of trees. Existing methods for controlling error rates in high-dimensional testing will be synthesized to identify (via simulation) one that best controls the false discovery rate in cases of large-scale nested testing, where one hypothesis (like a test of whether a gene exhibits an embryo type by gestation days interaction) serves as a gate-keeper for another set of tests (such as tests of whether a gene is differentially expressed across embryo types).
[4] Adapt the recent Benjamini-Hochberg tree approach to allow non-homogeneous trees and dependent tests. In the motivating SCNT example, we expect non-homogeneity (such as false nulls [real differences between embryo types] possibly nested within true nulls [no embryo type by gestation days interaction]) and dependent tests (such as relationships among genes).
[5] Well-documented interfaces to the developed statistical methods will be created for the agricultural bioinformatics community. An R package (or two) will be created and submitted to the Bioconductor repository to supplement published papers summarizing the findings of the project, with particular application to the SCNT gene expression work. Conference presentations (including to the Conference on Applied Statistics in Agriculture) will aid in the dissemination of these results.
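As a rough illustration of what "meta-analytic methods to combine P-values" in procedure [1] can look like, the sketch below applies Fisher's method, one classical combining rule, to several P-values per gene. This is an assumption chosen for illustration: the project text does not say which combining method was used, Fisher's method assumes independent tests (which contrasts on the same gene generally are not), and the gene names and P-values are hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine P-values with Fisher's method (assumes independent tests).

    Under the null, -2 * sum(log p) is chi-square with 2k degrees of
    freedom, where k is the number of P-values combined.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one P-value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # Fisher statistic / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical data: one P-value per comparison for each gene.
gene_p_values = {
    "geneA": [0.01, 0.03, 0.20],
    "geneB": [0.60, 0.45, 0.80],
}
combined = {gene: fisher_combine(ps) for gene, ps in gene_p_values.items()}
for gene, p in combined.items():
    print(gene, round(p, 4))
```

A single combined P-value per gene is what allows an existing univariate gene set test to be reused; a method that accounts for dependence among the comparisons would be preferable for real data.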

Progress 10/01/11 to 09/30/16

Outputs
Target Audience: The target audience of this project has been the community of agricultural genomics researchers, with a focus on those working in domestic livestock reproduction, as well as the agricultural statistics community supporting such genomics research.
Changes/Problems: From the original project proposal, Objective 2 (Develop a nonparametric multivariate test of shift between a gene set and its complement) and Objective 4 (Adapt the Benjamini-Hochberg tree approach to allow non-homogeneous trees and dependent tests) were not pursued once they were determined to be statistically and biologically unjustifiable, as described in the Accomplishments section.
What opportunities for training and professional development has the project provided? This project has supported four graduate students: Garrett Saunders (PhD 2014), Dennis Mecham (MS 2014), Russell Banks (MS 2015), and Michael Bishop (MS expected 2017). This support included annual participation (for principal investigator and students) in the Conference on Applied Statistics in Agriculture from 2012-2016, and in the Joint Statistical Meetings from 2013-2015. In addition, the principal investigator has used examples and statistical ideas from this project in his Statistical Bioinformatics course to train both undergraduate and graduate students (from both statistics and animal science backgrounds).
How have the results been disseminated to communities of interest? The results of this project have been disseminated through a combination of the following:
[1] formal conference presentations (five times to the Conference on Applied Statistics in Agriculture, and three times to the Joint Statistical Meetings)
[2] publications (five statistical methodology papers, one animal science application paper, and two animal science abstracts)
[3] discussions at conferences (five times at the Conference on Applied Statistics in Agriculture, three times at the Joint Statistical Meetings, three times at the annual meeting of the multi-state project Research Advances in Agricultural Statistics, and three times at the annual meeting of the multi-state project Reproductive Performance in Domestic Ruminants)
[4] depositing the report, thesis, or dissertation of each of the three graduate students who completed their degrees in Digital Commons
[5] depositing (and maintaining) our mvGST package for R in the Bioconductor repository
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? The accomplishments of this project can be best summarized by referring to the five objectives of the original proposal.
Objective 1: [Develop a global multivariate gene set test.] The gene set testing method that we published in the 2012 Proceedings of the Conference on Applied Statistics in Agriculture was implemented in our R package mvGST by MS student Dennis Mecham (Objective 5 below) in 2014. In 2015, this test was expanded to handle gene annotation ambiguities (see Objective 5 below) by MS student Russell Banks.
Objective 2: [Develop a nonparametric multivariate test of shift between a gene set and its complement.] MS student Russell Banks investigated this development in 2014 and determined that it would be inherently problematic. It would necessarily involve a competitive test (comparing a gene set to its complement) and gene sampling, both of which were shown in a 2007 paper to decrease statistical power and lead to biologically suspect conclusions. In light of this, Objective 2 was no longer pursued.
Objective 3: [Extend tree-based gatekeeping strategies to a "forest" of trees.] In 2014 our paper in BMC Bioinformatics (with PhD student Garrett Saunders) showed how the tree-based gatekeeping strategies can be applied to a gene ontology graph, at any specified focus level (or depth in the tree). This essentially fulfilled Objective 3 because the "forest" of trees is a special case where the focus level is chosen below the root node, generating many (even thousands of) disjoint trees. Our tree-based gatekeeping approach was also extended beyond gene expression data to QTL data in our 2014 BMC Genetics and 2014 Current Genomics papers (both with PhD student Garrett Saunders).
Objective 4: [Adapt the Benjamini-Hochberg tree approach to allow non-homogeneous trees and dependent tests.] PhD student Garrett Saunders demonstrated (both in our 2014 BMC Bioinformatics paper and in his May 2014 dissertation) that the Benjamini-Hochberg approach was less meaningful because it only controls the false discovery rate, rather than the family-wise error rate. Our 2014 BMC Bioinformatics paper (from Objective 3) discusses the added value in considering the family-wise error rate. In light of this, Objective 4 was no longer pursued.
Objective 5: [Provide well-documented interfaces to the developed statistical methods for the agricultural bioinformatics community.] Our software package mvGST (for multivariate gene set testing) was accepted in 2014 to Bioconductor, a peer-reviewed repository of R packages for the analysis of bioinformatic data. This package implements both the global multivariate gene set test (Objective 1) and the tree-based gatekeeping strategy in a gene ontology graph (Objectives 3 and 4). In 2015, MS student Russell Banks developed methods to expand the utility of this package to handle data from non-model organisms, and also to resolve gene annotation ambiguities. With our mvGST package, agricultural genomics researchers can now conveniently use their gene expression data (from any platform and in any annotated organism, not only model organisms) to identify biological processes that are differentially active (up or down) in multiple comparisons of simultaneous interest, with strong family-wise error rate control that was previously not computationally feasible. Since its 2014 acceptance to Bioconductor, the mvGST package has been downloaded by over 2,000 unique users.
As the initial objectives of this project were being completed, attention was also paid to ongoing collaborations with animal reproductive performance researchers, focusing on their research questions involving gene expression analysis and gene set testing.
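The distinction drawn under Objective 4 between false discovery rate and family-wise error rate control can be illustrated with two standard procedures on the same P-values. The sketch below is a generic textbook comparison of the Holm step-down rule (family-wise error rate) and the Benjamini-Hochberg step-up rule (false discovery rate); it is not the tree-structured method from the project's papers, and the P-values are hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: controls the family-wise error rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank+1)-th smallest P-value to alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first non-rejection
    return reject


def bh_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: controls the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Track the largest rank whose P-value is below alpha * rank / m.
        if p_values[idx] <= alpha * rank / m:
            cutoff = rank
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject


# Hypothetical P-values for five hypotheses.
p_values = [0.001, 0.012, 0.02, 0.03, 0.2]
holm = holm_reject(p_values)
bh = bh_reject(p_values)
print("Holm rejections:", holm)
print("BH rejections:  ", bh)
```

On these values Holm rejects two hypotheses and Benjamini-Hochberg rejects four. Holm is the more conservative rule, but its guarantee covers the probability of making any false rejection at all, which is the stronger statement the report prefers.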

Publications

  • Type: Conference Papers and Presentations Status: Other Year Published: 2016 Citation: Bishop, M. O., Stevens, J. R., Isom, S., Conference on Applied Statistics in Agriculture, "Assessing Individual Oocyte Viability through Gene Expression Profiles," Kansas State University. (May 2, 2016 - May 3, 2016)
  • Type: Conference Papers and Presentations Status: Other Year Published: 2016 Citation: Stevens, J. R., Animal Reproduction and Biotechnology Laboratory Seminar, "Annotation Tools for Multivariate Gene Set Testing in Non-Model Organisms," Colorado State University Department of Biomedical Sciences. (October 10, 2016)
  • Type: Journal Articles Status: Published Year Published: 2016 Citation: Polejaeva, I., Ranjan, R., Davies, C., Regouski, M., Hall, J., Olsen, A., Meng, Q., Rutigliano, H., Dosdall, D., Angel, N., Sachse, F., Seidel, T., Thomas, A., Stott, R., Panter, K., Lee, P., Van Wettere, A., Stevens, J. R., Wang, Z., MacLeod, R., Marrouche, N. F., White, K. (2016). Increased Susceptibility to Atrial Fibrillation Secondary to Myocardial Fibrosis in Transgenic Goats Expressing Transforming Growth Factor-β1 in the Heart. Journal of Cardiovascular Electrophysiology, 27, 1220-1229. onlinelibrary.wiley.com/doi/10.1111/jce.13049/epdf


Progress 10/01/14 to 09/30/15

Outputs
Target Audience: During this reporting period, we targeted agricultural genomics researchers as well as statistics researchers involved in methodological developments for genomic data analysis.
Changes/Problems: As this project nears completion this year, I will turn attention to a new problem that has arisen with my collaborator Clay Isom. Briefly, Dr. Isom is interested in using gene expression data to assess oocyte quality, comparing expression values from cloned oocytes to in vivo oocytes (the gold standard). We hope to develop appropriate statistical methods for this, with the aim of identifying cloned oocytes more likely to have success rates closer to those seen with in vivo oocytes. I hope to submit this as a new project, related to either the W2112 (Reproductive Performance in Domestic Ruminants) or the NCCC170 (Research Advances in Agricultural Statistics) multi-state projects.
What opportunities for training and professional development has the project provided? The project provided partial support to one MS student (Russell Banks), allowing him to both develop efficient statistical computing skills and practice interdisciplinary statistical consulting in bioinformatics applications. Also, I continue to use examples and statistical ideas from this project in my Statistical Bioinformatics course, taught to both undergraduate and graduate students (from both statistics and animal science backgrounds).
How have the results been disseminated to communities of interest? This year, dissemination has involved (1) two presentations (at the Conference on Applied Statistics in Agriculture, and at the Joint Statistical Meetings), (2) discussions at the annual meeting of the NCCC170 (Research Advances in Agricultural Statistics) multi-state project, and (3) the depositing of my MS student Russell Banks's thesis in Digital Commons.
What do you plan to do during the next reporting period to accomplish the goals? My work on this project this next year will focus on collaborations with animal reproductive performance researchers, and on the combined refinement / publicity of our mvGST software package.
Objective 1: I will continue my collaborations with animal reproductive performance researchers, focusing on their research questions involving gene expression analysis and gene set testing. These collaborators at Utah State primarily include Clay Isom, Abby Benninghoff, and Irina Polejaeva.
Objective 5: I will incorporate the methods developed by my MS student Russell Banks into our mvGST package. This will expand the utility of our mvGST package beyond basic model organisms, to handle data from the sheep genome, for example. It will also resolve some remaining mvGST package issues with multiple gene naming systems. We anticipate submitting a manuscript presenting and promoting this package.

Impacts
What was accomplished under these goals?
Objective 1: Develop a global multivariate gene set test. This objective was effectively fulfilled in 2014, as described in that year's report. In 2015, this test was expanded to handle gene annotation ambiguities (see Objective 5 below), such as when there is not a one-to-one correspondence between gene names in gene expression data for a non-model organism and gene names in an annotation database.
Objective 2: Develop a nonparametric multivariate test of shift between a gene set and its complement. This objective was dropped in 2014, as described in that year's report.
Objective 3: Extend tree-based gatekeeping strategies to a "forest" of trees. This objective was effectively fulfilled in 2014, as described in that year's report.
Objective 4: Adapt the Benjamini-Hochberg tree approach to allow non-homogeneous trees and dependent tests. This objective was dropped in 2014, as described in that year's report.
Objective 5: Provide well-documented interfaces to the developed statistical methods for the agricultural bioinformatics community. Our software package mvGST (for multivariate gene set testing) was accepted in 2014 to Bioconductor, a peer-reviewed repository of R packages for the analysis of bioinformatic data. This package implements both the global multivariate gene set test (Objective 1) and the tree-based gatekeeping strategy in a gene ontology graph (Objectives 3 and 4). In the current 2015 reporting year, my MS student Russell Banks developed methods to expand the utility of this package to handle data from non-model organisms, and also to resolve gene annotation ambiguities. With our mvGST package, agricultural genomics researchers can now conveniently use their gene expression data (from any platform and in any annotated organism, not only model organisms) to identify biological processes that are differentially active (up or down) in multiple comparisons of simultaneous interest, with strong family-wise error rate control that was previously not computationally feasible. During the 2015 reporting year, the mvGST package was downloaded by over 1,000 unique users.

Publications

  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Stevens, J. R. (Presenter & Author), Mecham, D. (Author Only), Isom, S. C. (Author Only), Saunders, G. (Author Only), Joint Statistical Meetings, "mvGST: Multivariate and Directional Gene Set Testing," American Statistical Association, Seattle, Washington. (August 8, 2015 - August 13, 2015)
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Stevens, J. R. (Presenter & Author), Saunders, G. (Author Only), Fu, G. (Author Only), Research Advances in Agricultural Statistics, "The Bivariate Null Kernel Method for LD-based QTL Mapping," NCCC-170 / USSES, Mayaguez, Puerto Rico. (June 25, 2015 - June 26, 2015)
  • Type: Conference Papers and Presentations Status: Other Year Published: 2015 Citation: Stevens, J. R. (Presenter & Author), Mecham, D. (Author Only), Isom, S. C. (Author Only), Saunders, G. (Author Only), Conference on Applied Statistics in Agriculture, "mvGST: Multivariate and Directional Gene Set Testing," Kansas State University, Manhattan, Kansas. (April 27, 2015 - April 28, 2015)


Progress 10/01/13 to 09/30/14

Outputs
Target Audience: During this reporting period, we targeted agricultural genomics researchers as well as statistics researchers involved in methodological developments for genomic data analysis.
Changes/Problems: In light of our discovery this year that a nonparametric multivariate test of shift between a gene set and its complement will be inherently problematic (see Accomplishments section), Objective 2 will no longer be pursued. Also, in light of our discovery that the Benjamini-Hochberg approach could be considered less meaningful than a family-wise error rate approach (see Accomplishments section), Objective 4 will no longer be pursued.
What opportunities for training and professional development has the project provided? The project provided partial support to two MS students (Dennis Mecham and Russell Banks) and one PhD student (Garrett Saunders), allowing them to both develop efficient statistical computing skills and practice interdisciplinary statistical consulting in bioinformatics applications. In addition, the PhD student (Garrett Saunders) made a conference poster presentation (to agricultural statistics researchers) that helped him refine his statistical methodology and communication skills. Finally, I continue to use examples and statistical ideas from this project in my Statistical Bioinformatics course, taught to both undergraduate and graduate students (from both statistics and animal science backgrounds).
How have the results been disseminated to communities of interest? This year, dissemination has involved (1) a presentation (at the Conference on Applied Statistics in Agriculture), (2) discussions at the annual meeting of the W2112 (Reproductive Performance in Domestic Ruminants) multi-state project, (3) publication of three manuscripts (principally our November 2014 BMC Bioinformatics paper, which focused on gene set testing with a reproductive performance application; our June 2014 BMC Genetics and November 2014 Current Genomics papers involve extensions of our statistical methods for QTL data), and (4) inclusion of our mvGST software package in the Bioconductor repository in October 2014.
What do you plan to do during the next reporting period to accomplish the goals? My work on this project this next year will focus on collaborations with animal reproductive performance researchers (particularly those involving gene set testing), and on the combined refinement / publicity of our mvGST software package.
Objective 1: I will continue my collaborations with animal reproductive performance researchers, focusing on their research questions involving gene set testing. These collaborators at Utah State include Clay Isom (current USDA grant and pending NIH R21 proposal), Ken White and Abby Benninghoff (pending NIH R01 proposal), and Jeff Mason (upcoming NIH R01 proposal). I am also collaborating with another W2112 project member, Tod Hansen at Colorado State, on a gene set testing project with sheep reproduction data; my MS student Russell Banks is involved in this collaboration.
Objective 5: My MS student Russell Banks will work to expand the utility of our mvGST package beyond basic model organisms, to handle data from the sheep genome, for example. I will also work with Russell to resolve some remaining mvGST package issues with multiple gene naming systems. We anticipate submitting a manuscript advertising this package, and making at least one conference presentation.

Impacts
What was accomplished under these goals?
Objective 1: Develop a global multivariate gene set test. The gene set testing method that I published in my May 2012 paper (with collaborator Clay Isom) was implemented in the R package mvGST by my MS student Dennis Mecham (Objective 5 below).
Objective 2: Develop a nonparametric multivariate test of shift between a gene set and its complement. My MS Statistics student Russell Banks investigated this development and determined that it would be inherently problematic. It would necessarily involve a competitive test (comparing a gene set to its complement) and gene sampling, both of which were shown in a 2007 paper to decrease statistical power and lead to biologically suspect conclusions. In light of this, Objective 2 will no longer be pursued.
Objective 3: Extend tree-based gatekeeping strategies to a "forest" of trees. In November 2014, I published a paper in BMC Bioinformatics (with my PhD student Garrett Saunders and collaborator Clay Isom) showing how the tree-based gatekeeping strategies can be applied to a gene ontology graph, at any specified focus level (or depth in the tree). This essentially fulfills Objective 3 because the "forest" of trees is a special case where the focus level is chosen below the root node, generating many (even thousands of) disjoint trees. Our tree-based gatekeeping approach was also extended beyond gene expression data to QTL data in our June 2014 BMC Genetics and November 2014 Current Genomics papers (both with my PhD student Garrett Saunders and collaborator Guifang Fu).
Objective 4: Adapt the Benjamini-Hochberg tree approach to allow non-homogeneous trees and dependent tests. My PhD student Garrett Saunders demonstrated (both in our November 2014 BMC Bioinformatics paper and in his May 2014 dissertation) that the Benjamini-Hochberg approach was less meaningful because it only controls the false discovery rate, rather than the family-wise error rate. Our November 2014 BMC Bioinformatics paper (from Objective 3) discusses the added value in considering the family-wise error rate. In light of this, Objective 4 will no longer be pursued.
Objective 5: Provide well-documented interfaces to the developed statistical methods for the agricultural bioinformatics community. In October 2014 our software package mvGST (for multivariate gene set testing) was accepted to Bioconductor, a peer-reviewed repository of R packages for the analysis of bioinformatic data. I authored this package along with my MS Statistics student Dennis Mecham. This package implements both the global multivariate gene set test (Objective 1) and the tree-based gatekeeping strategy in a gene ontology graph (Objectives 3 and 4). My MS student Russell Banks is working to expand the utility of this package beyond basic model organisms. With our mvGST package, agricultural genomics researchers can now conveniently use their gene expression data (from any platform) to identify biological processes that are differentially active (up or down) in multiple comparisons of simultaneous interest, with strong family-wise error rate control that was previously not computationally feasible. Since its acceptance to Bioconductor in October 2014, the mvGST package has been downloaded by over 250 unique users.
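To make the idea of gatekeeping on a tree more concrete, the following sketch tests a small hierarchy top-down: a child hypothesis is tested only if its parent was rejected, and each node is tested at a level proportional to the number of leaves beneath it. This is a simplified illustration under stated assumptions, not the focus-level procedure of the BMC Bioinformatics paper and not the mvGST implementation. The term names, tree shape, and P-values are hypothetical, the real Gene Ontology is a directed acyclic graph rather than a tree, and no error-rate guarantee is claimed for this toy rule.

```python
def count_leaves(tree, node):
    """Number of leaf nodes at or below `node` (a leaf counts as 1)."""
    children = tree.get(node, [])
    if not children:
        return 1
    return sum(count_leaves(tree, child) for child in children)


def gatekeep(tree, p_values, root, alpha=0.05):
    """Top-down testing: children are tested only if their parent is rejected.

    Each node is tested at alpha * (leaves under node) / (leaves under root),
    an illustrative weighting chosen for this sketch.
    """
    total_leaves = count_leaves(tree, root)
    rejected = []
    queue = [root]
    while queue:
        node = queue.pop(0)  # breadth-first traversal
        level = alpha * count_leaves(tree, node) / total_leaves
        if p_values[node] <= level:
            rejected.append(node)
            queue.extend(tree.get(node, []))  # the gate opens for the children
    return rejected


# Hypothetical hierarchy of biological processes (parent -> children).
tree = {
    "reproduction": ["embryo development", "gamete generation"],
    "embryo development": ["placenta development", "blastocyst formation"],
}
# Hypothetical gene set P-values for each process.
p_values = {
    "reproduction": 0.001,
    "embryo development": 0.01,
    "gamete generation": 0.04,
    "placenta development": 0.004,
    "blastocyst formation": 0.30,
}
rejected = gatekeep(tree, p_values, root="reproduction")
print(rejected)
```

Choosing a focus level below the root, as the report describes, would correspond to running a procedure of this kind separately from many starting nodes, which is where the "forest" of disjoint trees comes from.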

Publications

  • Type: Journal Articles Status: Published Year Published: 2014 Citation: Fu, G., Saunders, G., Stevens, J. R. (2014). Holm multiple correction for large-scale gene-shape association mapping. BMC Genetics, 15(Suppl 1), S5. www.biomedcentral.com/1471-2156/15/S1/
  • Type: Journal Articles Status: Published Year Published: 2014 Citation: Saunders, G., Fu, G., Stevens, J. R. (2014). A Graphical Weighted Power Improving Multiplicity Correction Approach for SNP Selections. Current Genomics, 15(5), 380-389. benthamscience.com/journal/abstracts.php?journalID=cg&articleID=125861
  • Type: Journal Articles Status: Published Year Published: 2014 Citation: Saunders, G., Stevens, J. R., Isom, S. C. (2014). A shortcut for multiple testing on the directed acyclic graph of gene ontology. BMC Bioinformatics, 15, 349. www.biomedcentral.com/1471-2105/15/349
  • Type: Other Status: Published Year Published: 2014 Citation: Stevens, J. R., Mecham, D. S. (2014). mvGST: multivariate and directional gene set testing. Bioconductor. www.bioconductor.org/packages/release/bioc/html/mvGST.html
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Stevens, J. R., Graduate student seminar, "Research Program Overview," Utah State University, Dept. of Mathematics and Statistics. (September 2014)
  • Type: Conference Papers and Presentations Status: Other Year Published: 2014 Citation: Saunders, G., Fu, G., Stevens, J. R., Conference on Applied Statistics in Agriculture, "A Hierarchical Weighted-Bonferroni Multiplicity Correction in Linkage Disequilibrium Based QTL Mapping," Kansas State University, Manhattan, KS. (April 28, 2014 - April 29, 2014)


Progress 01/01/13 to 09/30/13

Outputs
Target Audience: During this reporting period, we targeted agricultural genomics researchers as well as statistics researchers involved in methodological developments for genomic data analysis. Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? During this reporting period, the principal investigator has advised two Statistics graduate students (Ph.D. student Garrett Saunders and M.S. student Dennis Mecham) on dissertation/thesis topics motivated by this project. This has increased the statistical expertise of the students in the areas of multiple hypothesis testing as well as statistical computing. In addition, the supported Ph.D. student made two conference poster presentations that allowed him to discuss the details of his work with other agricultural statistics researchers. Through these discussions, he was introduced to other possible applications of his work, and this has led to an ongoing collaboration with QTL methodology researcher Guifang Fu (another faculty member in the Department of Mathematics and Statistics). This collaboration has involved a great deal of one-on-one mentoring regarding issues of multiple hypothesis testing, and has led to one manuscript submitted (to Genetics) and another in preparation. Finally, the principal investigator continues to use examples and statistical ideas he has learned through this project in his annual Statistical Bioinformatics course to both undergraduate and graduate students. How have the results been disseminated to communities of interest? The principal investigator and the supported Ph.D. student were involved in presentations at the Kansas State University Conference on Applied Statistics in Agriculture (targeting agricultural genomics statisticians) and the Joint Statistical Meetings (targeting more general biomedical genomics statisticians). These presentations focused on the statistical methods developed as part of this project. 
The biological knowledge gained from applying these methods to the motivating livestock cloning experiment were published (with collaborator S. Clay Isom) in Physiological Genomics and presented at the International Plant & Animal Genome conference (targeting agricultural genomics researchers). Finally, the principal investigator referred to this project during an invited talk at the Mathematics Department of BYU-Idaho, where the students were unaware of such research activities for statisticians. What do you plan to do during the next reporting period to accomplish the goals? During the next reporting period, the principal investigator will continue to focus on the development and refinement of statistical methods and tools for agricultural genomics researchers in the areas of gene set testing (Specific Aim 1) and structured multiple hypothesis testing (Specific Aim 2). Specific Aim 1 in the initial project proposal was to develop statistical methods for “Characterization of multivariately differentially expressed genes”. Work planned during the next reporting period under this Specific Aim include the following: (1) The principal investigator will graduate M.S. student Dennis Mecham, whose research project includes the development of the mvGST (MultiVariate Gene Set Testing) software package. (2) The mvGST package will be submitted to an appropriate online repository. (3) A manuscript co-authored by M.S. student Dennis Mecham will be submitted for publication to publicize the mvGST package to agricultural genomics researchers. (4) The principal investigator will recruit a new graduate student to look at a nonparametric multivariate test of shift between a gene set and its complement (a remaining objective from the initial project proposal). 
Specific Aim 2 in the initial project proposal was to develop statistical methods that “Control error rates in nested multiple hypothesis tests.” Work planned during the next reporting period under this Specific Aim includes the following: (1) The principal investigator will graduate Ph.D. student Garrett Saunders, whose dissertation includes the development of a computationally efficient method to control the family-wise error rate in nested gene set testing (the focuslevel software package). (2) The Ph.D. student Garrett Saunders will make his focuslevel package publicly available through an appropriate online repository. (3) The principal investigator will work with Ph.D. student Garrett Saunders to publish 2-3 papers from his dissertation, all involving error rate control in nested multiple hypothesis testing. One paper will publicize this focuslevel package to agricultural genomics researchers. (4) The Ph.D. student Garrett Saunders will make a presentation at the Conference on Applied Statistics in Agriculture, focusing on his developed methods to control the family-wise error rate in nested multiple hypothesis testing.
Finally, the principal investigator will continue collaborating with colleagues in the Animal, Dairy, and Veterinary Sciences Department on activities and grant proposals relevant to both Specific Aims 1 and 2, including a funded 3-year USDA grant. These collaborators include Clay Isom, Abby Benninghoff, and Ken White.

Impacts
What was accomplished under these goals? Specific Aim 1 in the initial project proposal was to develop statistical methods for “Characterization of multivariately differentially expressed genes”. (1) The major activity completed within Specific Aim 1 is that a Statistics M.S. student (Dennis Mecham) has written functions to be included in a statistical software package. These functions are efficient implementations of statistical methods developed by the principal investigator in a 2012 conference paper on multivariate differential expression. (2) One of the objectives within Specific Aim 1 was the construction of a statistical software package to be disseminated. The package (mvGST, for MultiVariate Gene Set Testing) is not yet complete, but the core functions it will use have been written. Progress on this Specific Aim is still preliminary, so there are not yet (3) significant findings or (4) key impacts to report.
Specific Aim 2 in the initial project proposal was to develop statistical methods that “Control error rates in nested multiple hypothesis tests.” (1) The major activity completed within Specific Aim 2 has been the development of a tree-based gatekeeping method to characterize differentially expressed genes in terms of their common biological processes. Thousands of biological processes are represented in the Gene Ontology (GO) database, and a Statistics Ph.D. student (Garrett Saunders) partially supported by this project has developed and validated a statistical method that uses gene expression data (as from a motivating livestock cloning experiment involving various embryo types) to test each biological process for differential expression (or differential activity) between embryo types. This necessarily involves nested hypothesis tests because some biological processes are special cases of others.
(2) One of the specific objectives under Specific Aim 2 was the creation and dissemination of a statistical software package implementing the developed statistical methods. The Ph.D. student has created this package (focuslevel), and at the end of this reporting period is preparing to submit it to an online repository of packages. (3) The most significant results produced by this student were findings that his method is 15,000 times faster than the currently available method for testing in structured scenarios, and that his method limits the probability of any Type I errors. The computation time is important because the currently available method (through the globaltest package) is only feasible on very small (and contrived) datasets, whereas this newly-developed method is applicable to large, real datasets, such as the data from the motivating livestock cloning experiment. (4) A key accomplishment in this reporting period is that through the Ph.D. student’s two conference presentations, the agricultural statistics community became aware of the computational feasibility of testing all biological processes in a structured setting while limiting the probability of any Type I errors.
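The idea of nested testing with a limit on the probability of any Type I error can be illustrated with a minimal Python sketch. This is not the focuslevel method or the globaltest procedure described above; it is a deliberately simple stand-in that combines a Bonferroni threshold with top-down gatekeeping (a child process is tested only if its parent was rejected). The function name, the process names, and the p-values are all hypothetical. Because every rejection made here would also be made by an ordinary Bonferroni correction over all nodes, the family-wise error rate stays at or below alpha.

```python
from collections import deque


def gatekeeping_bonferroni(parents, pvalues, alpha=0.05):
    """Top-down testing on a rooted tree of nested hypotheses.

    parents: dict mapping each node to its parent (None for a root).
    pvalues: dict mapping each node to its raw p-value.
    Returns the set of rejected nodes.

    Illustrative assumption: each node is compared to alpha / m, where m
    is the total number of nodes, and a node is tested only after its
    parent has been rejected.
    """
    threshold = alpha / len(pvalues)

    # Build a parent -> children lookup and collect the root nodes.
    children = {}
    roots = []
    for node in pvalues:
        parent = parents.get(node)
        if parent is None:
            roots.append(node)
        else:
            children.setdefault(parent, []).append(node)

    rejected = set()
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if pvalues[node] <= threshold:
            rejected.add(node)
            # The gate opens: children of a rejected node become testable.
            queue.extend(children.get(node, []))
    return rejected


# Hypothetical toy hierarchy (names and p-values are made up).
parents = {
    "biological_process": None,
    "reproduction": "biological_process",
    "metabolism": "biological_process",
    "embryo_development": "reproduction",
}
pvalues = {
    "biological_process": 0.001,
    "reproduction": 0.004,
    "metabolism": 0.2,
    "embryo_development": 0.0001,
}
print(sorted(gatekeeping_bonferroni(parents, pvalues)))
```

In this toy example the threshold is 0.05 / 4 = 0.0125, so the root, "reproduction", and "embryo_development" are rejected while "metabolism" is not. If the p-value for "reproduction" were large, "embryo_development" would never be tested, however small its own p-value.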

Publications

  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Isom, S. C., Stevens, J. R., Li, R., Spollen, W., Cox, L., Spate, L., Murphy, C., Prather, R. (2013). Transcriptional profiling by RNA-Seq of peri-attachment porcine embryos generated using a variety of assisted reproductive technologies (ART). Physiological Genomics, 45(14), 577-589. physiolgenomics.physiology.org/content/45/14/577.abstract
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Saunders, G. (Presenter & Author), Stevens, J. R. (Author Only), Isom, C. (Author Only), Joint Statistical Meetings, "An Improved FWER-Controlling Method in Gene Ontology Graphs," American Statistical Association, Montreal, Quebec. (August 2013)
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Saunders, G. (Presenter & Author), Stevens, J. R. (Author Only), Isom, C. (Author Only), Conference on Applied Statistics in Agriculture, "An Improved FWER-Controlling Method in Gene Ontology Graphs," Kansas State University, Manhattan, Kansas. (April 29, 2013 - April 30, 2013)
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Cox, L. (Presenter & Author), Ward, A. (Author Only), Stevens, J. R. (Author Only), Isom, C. (Author Only), International Plant & Animal Genome XXI, "Gene expression analysis of in vivo and in vitro matured porcine Metaphase II oocytes," San Diego, CA. (January 2013)


Progress 01/01/12 to 12/31/12

Outputs
OUTPUTS: Activities: The principal investigator (i) mentored a second-year statistics Ph.D. student, including two special topics courses on statistical methods relevant to this project; (ii) published a manuscript dealing with statistical methods related to this project; (iii) submitted two manuscripts (one dealing with statistical methods, one dealing with assisted reproductive technologies) related to this project; and (iv) submitted a funding proposal as co-investigator related to this project.
Events: In May 2012 the principal investigator gave a contributed talk at the Conference on Applied Statistics in Agriculture, hosted by Kansas State University; the audience was approximately 200 statisticians working in agricultural applications; this talk presented work on gene set testing from this project. In June 2012 the principal investigator submitted (as co-investigator) an NIH proposal involving gene expression and methylation with cloned livestock embryos; this proposal focuses on developing a noninvasive assay to assess embryo viability, and is currently under review. In August 2012 the principal investigator submitted a manuscript describing statistical methods for gene set testing; the manuscript is currently under review. In November 2012 the principal investigator resubmitted (as co-author) a revised manuscript dealing with gene expression in livestock embryos from assisted reproductive technologies.
Services: The principal investigator and the second-year statistics Ph.D. student provided statistical consulting to the Isom lab at USU, which is employing genomic technologies for animal reproductive studies; the data from the Isom lab motivated the statistical development aspect of this project. The principal investigator and another, first-year statistics Ph.D. student provided statistical consulting to the genotyping core lab at USU's CIB; this consulting was to help identify haplotypes of the MHC class I genes in cattle. The principal investigator also served on the graduate committees for three students in agricultural sciences.
Products: A collaboration with agricultural genomics researchers at USU was continued, related to statistical methods in this project, resulting in one published manuscript and two more currently under review.
PARTICIPANTS: The principal investigator is Dr. John R. Stevens, and he worked with second-year statistics Ph.D. student Garrett Saunders (multiple hypothesis testing and gene set testing) and first-year statistics Ph.D. student Darl Flake (haplotype identification) related to this project. The principal investigator also provided statistical support to several genomic studies at USU using statistical methods related to this project. The Center for Integrated BioSystems (CIB) at USU is a partner organization that has provided useful discussions for research related to this project. Collaborators at USU for research related to this project include Clay Isom (ADVS), Abby Benninghoff (ADVS), Joanie Hevel (CHEM), Sean Johnson (CHEM), Anhong Zhou (BIE), Roger Coulombe (ADVS), and Mike Lefevre (NDFS). Non-USU collaborators for research related to this project include Brenda Alexander (U. of Wyoming) and Tod Hansen (Colorado State).
TARGET AUDIENCES: The results of this project will be of most interest to agricultural genomics researchers, particularly those involved in methodological developments for data analysis.
PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
Change in knowledge: The agricultural statistics community became aware of the possibility of multivariate gene set testing, and also of the need for directionality in gene set testing.

Publications

  • Isom, S. C., Stevens, J. R., Li, R., Spollen, W., Spate, L., Murphy, C., & Prather, R. (2012). Transcriptional profiling by RNA-Seq of peri-attachment porcine embryos generated using a variety of assisted reproductive technologies (ART). Physiological Genomics. (Submitted).
  • Stevens, J. R., & Nicholas, G. (2012). Assessing Numerical Dependence in Gene Expression Summaries with the Jackknife Expression Difference. PLoS ONE, 7(8): e39570. (Published).
  • Stevens, J. R., & Isom, C. (2012). Gene Set Testing to Characterize Multivariately Differentially Expressed Genes. Proceedings of Conference on Applied Statistics in Agriculture. Kansas State University. (Published).
  • Isom, S. C., Stevens, J. R., Li, R., Spollen, W., & Prather, R. S. (2012). Transcriptional profiling by high-throughput sequencing of porcine pre- and peri-implantation embryos. Reproduction, Fertility and Development, USA (P184-184). (Published).