Progress 01/01/10 to 12/31/14
Outputs Target Audience: Target audience is researchers using SAS software for statistical analysis. Focus is on designed experiments, so anyone in academia, business or government generating data from an experimental design is in the target audience. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? Researchers can self-train in using SAS for statistical analysis using example datasets and instructions on the DAWG website. How have the results been disseminated to communities of interest? The DAWG website is usually in the top 10 hits from Google searches involving SAS and experimental designs. We have been slow to advertise the website as we are still filling in details, even though the structure is complete. What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
This project has accomplished the following for each of the numbered objectives: 1) danda.sas was fully developed during this project, with only updates and bug fixes needed in the final year. danda.sas has been made publicly available through the DAWG website and SourceForge, a website that hosts thousands of open source software projects. 2) The DAWG website offers a step-by-step guide to the use of danda.sas for analysis of a wide variety of experimental designs. The structure of the website is complete, and tabs and design module pages are posted. However, details such as sample datasets and example outputs are still being filled in, so this objective is about 80% complete. Work will continue even though the project is terminating. 3) Progress continues on creating similar tools for the R software, an increasingly popular free statistics, graphics and data management program, but this objective is only about 25% finished. Realistically, an entire project should be devoted to this effort. 4) Each year of the project had two or three large statistical analysis problems that needed substantial computing resources, e.g. weeks of computing time and 64+ gigabytes of memory. These were solved using SAS or R, but the solutions were specific to the research data, so general-purpose tools could not be produced.
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2014
Citation:
Bastin, B. C., A. Houser, C. P. Bagley, K. M. Ely, R. R. Payton, A. M. Saxton, F. N. Schrick, J. C. Waller, and C. J. Kojima. A polymorphism in XKR4 is significantly associated with serum prolactin concentrations in beef cows grazing endophyte-infected tall fescue. Animal Genetics (2014) 45(3): 439-441
- Type:
Websites
Status:
Published
Year Published:
2014
Citation:
http://dawg.utk.edu
|
Progress 10/01/13 to 09/30/14
Outputs Target Audience: Target audience is researchers using SAS software for statistical analysis. Focus is on designed experiments, so anyone in academia, business or government generating data from an experimental design is in the target audience. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? Researchers can self-train in using SAS for statistical analysis using example datasets and instructions on the DAWG website. How have the results been disseminated to communities of interest? The DAWG website is usually in the top 10 hits from Google searches involving SAS and experimental designs. We have been slow to advertise the website as we are still filling in details, even though the structure is complete. What do you plan to do during the next reporting period to accomplish the goals? Highest priority is finishing Objective 2, filling in training examples on the DAWG website.
Impacts What was accomplished under these goals?
This project has accomplished the following for each of the numbered objectives: 1) danda.sas has been completed, with only minor enhancements and bug fixes added this year. 2) The DAWG website structure is complete, and tabs and design module pages are posted. However, details such as sample datasets and example outputs are still being filled in, so this objective is about 80% complete. 3) Progress continues on creating similar tools for the R software, an increasingly popular free statistics, graphics and data management program, but this objective is only about 25% finished. Realistically, an entire project should be devoted to this effort. 4) This objective covers specialized software solutions to huge computing problems associated with data analysis. As is typically the case, this year's challenges were gene expression data and QTL analyses. The publications list shows one example.
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2014
Citation:
Bastin, B. C., A. Houser, C. P. Bagley, K. M. Ely, R. R. Payton, A. M. Saxton, F. N. Schrick, J. C. Waller, and C. J. Kojima. A polymorphism in XKR4 is significantly associated with serum prolactin concentrations in beef cows grazing endophyte-infected tall fescue. Animal Genetics (2014) 45(3): 439-441
- Type:
Websites
Status:
Published
Year Published:
2014
Citation:
http://dawg.utk.edu
|
Progress 01/01/13 to 09/30/13
Outputs Target Audience: Research scientists world-wide using SAS software to statistically analyze their data. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? Researchers can engage in self-training by working through the modules on the DAWG website. How have the results been disseminated to communities of interest? The danda.sas file is easily found using a web search, and is freely downloadable. Similarly the DAWG website is openly accessible to anyone with an internet connection. What do you plan to do during the next reporting period to accomplish the goals? Goal 2) Since this will be the final year of the project, plans are to complete all modules within the DAWG website. We also have publications planned to make the resource citable, which may stimulate use. Goal 3) We plan to have a working (but beta version) copy of a macro that will enable easy use of R. This will at least provide a skeleton that can be filled out by future efforts.
Impacts What was accomplished under these goals?
Goal 1) This goal has been accomplished, but error correction and the addition of options requested by users have continued. Goal 2) The DAWG website is approximately half finished, with most of the common experimental design modules now online. Goal 3) Development of tools for the R software package similar to those in Goal 1 has continued, with orthogonal polynomial capability now functional. This allows researchers to generate contrasts for treatments that are amounts, such as fertilizer treatments of 0-150 lbs/acre. Goal 4) A chestnut restoration project required repeated-measures analysis of variance on a large dataset. Preliminary runs found that the analysis took at least 6 days, which is impractical given the number of variables and the multiple reruns typically needed for repeated-measures models. An additional complexity was the presence of binary (not normally distributed) variables. Work-arounds were developed to obtain acceptably accurate results within the limits of feasibility.
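The orthogonal polynomial contrasts mentioned under Goal 3 can be illustrated with a short sketch. This is not the project's R or SAS code; it is a minimal Python illustration, assuming equal replication at every treatment level. The function name and the four fertilizer rates are hypothetical, chosen only to span the 0-150 lbs/acre range given as an example.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def orthogonal_polynomial_contrasts(levels, max_degree):
    """Return integer contrast coefficients for degrees 1..max_degree.

    Applies Gram-Schmidt to the columns 1, x, x^2, ... evaluated at the
    treatment levels, so unequally spaced levels are handled as well.
    Assumes equal replication at every level (hypothetical helper).
    """
    xs = [Fraction(x) for x in levels]
    if max_degree >= len(set(xs)):
        raise ValueError("max_degree must be less than the number of distinct levels")

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    basis = [[Fraction(1)] * len(xs)]  # degree 0: the intercept column
    for degree in range(1, max_degree + 1):
        col = [x ** degree for x in xs]
        # Remove the part of this column already explained by lower degrees.
        for b in basis:
            coef = dot(col, b) / dot(b, b)
            col = [c - coef * bi for c, bi in zip(col, b)]
        basis.append(col)

    contrasts = []
    for col in basis[1:]:
        # Rescale each contrast to its smallest integer coefficients.
        lcm_den = reduce(lambda a, b: a * b // gcd(a, b),
                         (c.denominator for c in col), 1)
        ints = [int(c * lcm_den) for c in col]
        g = reduce(gcd, (abs(i) for i in ints))
        contrasts.append([i // g for i in ints])
    return contrasts


# Hypothetical fertilizer rates in lbs/acre.
rates = [0, 50, 100, 150]
linear, quadratic, cubic = orthogonal_polynomial_contrasts(rates, 3)
print(linear)     # [-3, -1, 1, 3]
print(quadratic)  # [1, -1, -1, 1]
print(cubic)      # [-1, 3, -3, 1]
```

Each row sums to zero and is orthogonal to the others, so the coefficients can be used to test linear, quadratic and cubic trends across the rates. Exact fractions are used so that the integer coefficients are not affected by rounding.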
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2013
Citation:
Fallen BD, Hatcher CN, Allen FL, Kopsell DA, Saxton AM, Chen P, Kantartzi SK, Cregan PB, Hyten DL, and Pantalone VR. Soybean Seed Amino Acid Content QTL Detected Using the Universal Soy Linkage Panel 1.0 with 1,536 SNPs. Journal of Plant Genome Sciences (2013) 1(3): 68-79
|
Progress 01/01/12 to 12/31/12
Outputs OUTPUTS: Objective 1. Two versions of the danda.sas software have been fully tested, and are now released on the SourceForge website (https://sourceforge.net/projects/danda/). Version 2.11 is the last version that will run with SAS version 9.2 or earlier, and will only be updated to fix errors. Version 2.12 has been greatly modified to capitalize on SAS version 9.3 changes. The biggest revisions were to switch completely from Proc Mixed to Proc Glimmix for the mixed model analysis of variance computations, and to remove all use of SAS/GRAPH, relying instead upon the new ODS graphics capabilities. The latter may help users reduce the number of products that must be licensed from SAS Institute. Objective 2. The DAWG instructional website has been completely revised to mirror changes in SAS 9.3 output formatting. This was necessary as the previous formatting looks severely dated once users become accustomed to the SAS 9.3 html format. Objective 3. R software continues to grow in popularity, so creating a danda.sas equivalent for this language has become a higher priority. A preliminary step towards this objective has been accomplished, with the translation of one of the danda.sas macros (pdmix) into an R function. This was used in the Ernest et al. 2012 publication cited below. Objective 4. The largest compute application for this reporting period was the calculation of epistatic effects for soybean breeding. Preliminary runs using the EPISTACY SAS program written by JB Holland showed that computation would take approximately one year, clearly not feasible. Optimization of the SAS code produced an approximately 6-fold increase in speed, and other hardware tricks allowed the program to run in one month. This was for additive*additive interactions, but results suggested additive*additive*additive epistasis might be worth exploring. 
Given the exponential increase in time required (for example, 50,000 SNP genotypes would require 50,000 months), this was left as an open question. PARTICIPANTS: Nothing significant to report during this reporting period. TARGET AUDIENCES: Nothing significant to report during this reporting period. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
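The scaling problem described under Objective 4 can be checked with a rough count of marker combinations. The sketch below is not part of EPISTACY. It assumes that run time is proportional to the number of distinct marker combinations tested, and that a two-way scan of the same markers takes one month. The function names are hypothetical.

```python
from math import comb


def interaction_tests(n_markers, order):
    """Number of distinct marker combinations of the given interaction order."""
    return comb(n_markers, order)


def projected_months(n_markers, baseline_months=1.0):
    """Scale a two-way run time to a three-way scan.

    Assumes time is proportional to the number of combinations tested,
    which is a simplification made for this illustration only.
    """
    two_way = interaction_tests(n_markers, 2)
    three_way = interaction_tests(n_markers, 3)
    return baseline_months * three_way / two_way


n = 50_000  # marker count used as the example in the report
print(interaction_tests(n, 2))      # 1249975000
print(interaction_tests(n, 3))      # 20832083350000
print(round(projected_months(n)))   # 16666
```

Under these assumptions the three-way scan takes on the order of tens of thousands of months, the same order of magnitude as the figure quoted above, which is why the three-way analysis was left open.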
Impacts The frequency of questions and requests concerning the danda.sas program indicates that it continues to gain usage within the agricultural research community. SourceForge downloads last year were 48, a positive indicator given the new and unadvertised nature of that resource. Optimization of statistical software and computing hardware for research data has produced answers for soybean breeding research that would have otherwise been impossible to obtain.
Publications
- Ernest, B.; Gooding, J. R.; Campagna, S. R.; Saxton, A. M. & Voy, B. H. (2012) MetabR: an R script for linear model analysis of quantitative metabolomic data. BMC Research Notes 5: 596. http://dx.doi.org/10.1186/1756-0500-5-596
|
Progress 01/01/11 to 12/31/11
Outputs OUTPUTS: Objective 1. Two versions (2.11 and 2.12) of the DANDA.SAS software have been developed. The 2.12 version is being optimized for SAS 9.3, which has made major changes in default output formats. A SourceForge website (open-source project management site) has been created as a second outlet for the DANDA.SAS software. A book proposal has been sent to SAS Press which will provide examples and usage details for DANDA.SAS. As part of the book development, version 2.12 has been extensively changed to make usage more streamlined. Objective 2. No progress. Objective 3. No progress. Objective 4. The only publication requiring substantial computing is listed, a microarray paper. However, RNAseq experimental data has been generated by one research group within the Institute of Agriculture, producing about 100,000,000 sequence reads. Preliminary analyses were impossible on standard desktop computers. PARTICIPANTS: Nothing significant to report during this reporting period. TARGET AUDIENCES: Nothing significant to report during this reporting period. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
Impacts Objective 1. Interest in DANDA.SAS is growing, as indicated by email questions. It is anticipated that when (if) the book project is published, this Hatch project will be successful at putting useful statistical tools in the hands of the scientific community. Objective 4. Application of statistics and computing to research data is essential in the genomics area, and genomics methods are becoming more widely used in all agricultural disciplines. Research support or collaboration is critical.
Publications
- Hill RD, Gouffon JS, Saxton AM and Su C. 2012. Differential gene expression in mice infected with distinct Toxoplasma strains. Infection and Immunity 80(3):968-74. Epub 2011 Dec 5.
|
Progress 01/01/10 to 12/31/10
Outputs OUTPUTS: Objective 1. A new version (1.30) of the DANDA.SAS software has been developed. After further testing, this version will be posted on the DAWG website. This version has many new capabilities and bug fixes. Objective 2. No progress. Objective 3. A key program in DANDA.SAS has been translated for use in R. Objective 4. As indicated in the publications, several high-demand computing problems were addressed. The Wadl paper is a good example, in which about 3 weeks of computing time were needed to compute permutation tests for genetic markers. The Lutz paper required adaptation of a previously written SAS macro for generating the contrast statements needed to conduct a diallel analysis. PARTICIPANTS: Nothing significant to report during this reporting period. TARGET AUDIENCES: Nothing significant to report during this reporting period. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
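To show why permutation tests for genetic markers are computationally demanding, here is a generic sketch of a two-group permutation test. It is not the procedure used in the Wadl paper, whose details are not given here. The function name and the trait values are hypothetical, and a simple difference in group means is assumed as the test statistic.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign the trait values to the two groups at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical trait scores for plants carrying two different marker alleles.
allele_1 = [10, 11, 12, 13, 14, 15]
allele_2 = [1, 2, 3, 4, 5, 6]
p = permutation_p_value(allele_1, allele_2)
print(p < 0.05)  # True
```

The cost is one full pass over the data per permutation, repeated for every marker, so thousands of permutations across many markers quickly add up to long run times.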
Impacts Objective 1. DANDA.SAS version 1.30 does not require the IML package in SAS, which will allow greater access. Each package in SAS is an additional yearly fee, so some scientists did not have access to IML. Objective 4. Application of current statistical software on computing hardware has enhanced the extraction of scientific meaning from data in several research projects.
Publications
- Kim HY, Stewart TP, Wyatt BN, Siriwardhana N, Saxton AM, Kim JH. (2010) Gene expression profiles of a mouse congenic strain carrying an obesity susceptibility QTL under obesigenic diets. Genes & Nutrition 5(3), 237-250. 10.1007/s12263-009-0163-0.
- Lutz CG, Armas-Rosales AM, and Saxton AM. Genetic effects influencing salinity tolerance in six varieties of tilapia (Oreochromis) and their reciprocal crosses. Aquaculture Research 2010:1-11.
- Stewart TP, Kim HY, Saxton AM and Kim JH. 2010. Genetic and genomic analysis of hyperlipidemia, obesity and diabetes using (TALLYHO/JngJ x C57BL/6J) F2 mice. BMC Genomics 11:713.
- Wadl PA, Saxton AM, Wang X, Pantalone VR, Rinehart TA and Trigiano RN. 2011. Simple Sequence Repeat (SSR) Markers Associated with Red Foliage in Cornus florida L. Molecular Breeding 27:409-416 10.1007/s11032-011-9551-4.
|