Source: UNIVERSITY OF CALIFORNIA, DAVIS submitted to NRP
JWAS: JULIA IMPLEMENTATION OF WHOLE-GENOME ANALYSES SOFTWARE
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1015599
Grant No.
2018-67015-27957
Cumulative Award Amt.
$400,000.00
Proposal No.
2017-05192
Multistate No.
(N/A)
Project Start Date
May 1, 2018
Project End Date
Apr 30, 2022
Grant Year
2018
Program Code
[A1201]- Animal Health and Production and Animal Products: Animal Breeding, Genetics, and Genomics
Recipient Organization
UNIVERSITY OF CALIFORNIA, DAVIS
410 MRAK HALL
DAVIS, CA 95616-8671
Performing Department
Animal Science
Non Technical Summary
Genomic data are increasingly being incorporated into animal and plant breeding programs to speed up genetic improvement through more accurate genetic evaluations of selection candidates at young ages. Related statistical methods and user-friendly computational tools are demanding to implement and understand. We will remove the limitations of currently available software tools by developing a well-documented open-source software platform ideal for routine data analyses and "reproducible research" using all available pedigree, phenotypic and genomic information on multiple traits simultaneously that makes it easy for our community of researchers to participate, document, maintain and extend. The friendly user interface and fast computing speed of our software tool will provide a powerful convenience for associations, suppliers and researchers in industry and academia.
Animal Health Component
50%
Research Effort Categories
Basic
30%
Applied
50%
Developmental
20%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
304                    3910                              1080                      60%
304                    3910                              2090                      30%
304                    3910                              2080                      10%
Goals / Objectives
The long-term goal of this project is to develop a single-language software platform ideal for routine data analyses and "reproducible research" in genomic prediction and GWAS using complete or incomplete genomic data ("single-step" methods) that makes it easy for our community of researchers to participate, document, maintain and extend. The objectives are:
1. Extend JWAS to accommodate production versions of single-trait and multi-trait "single-step" Bayesian regression analyses with incomplete genomic data.
2. Improve the computational efficiency of JWAS by implementing various computational strategies corresponding to the characteristics of data (e.g. employing different strategies for n > p and p > n, where n and p indicate the number of individuals and the number of markers).
3. Improve the computational efficiency and reduce the memory requirement of JWAS by implementing parallel Gibbs sampling strategies.
4. Further incorporate into JWAS the multi-core CPU and GPU computing capabilities of Julia.
5. Document source code and examples for JWAS using the interactive Jupyter notebook that is ideal for "reproducible research".
6. Extend JWAS to accommodate categorical traits.
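As a rough illustration of the kind of sampler that Objective 3 aims to speed up, the Python sketch below runs a serial single-site Gibbs sampler for a Bayesian marker-effects regression. It is not JWAS code (JWAS is written in Julia) and it is not the parallel strategy developed in this project. The function name, the assumption that both variance components are known, and the simulated data are all hypothetical choices made to keep the example short.

```python
import random

def gibbs_marker_effects(X, y, var_b, var_e, n_iter=600, burn_in=100, seed=1):
    """Single-site Gibbs sampler for y = Xb + e.

    Assumed model (illustrative only): b_j ~ N(0, var_b), e_i ~ N(0, var_e),
    with both variances treated as known. Returns the posterior mean of b.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]  # marker columns
    xpx = [sum(v * v for v in col) for col in cols]         # x_j'x_j
    lam = var_e / var_b
    b = [0.0] * p
    resid = list(y)                                         # y - Xb, with b = 0
    post_sum = [0.0] * p
    for it in range(n_iter):
        for j in range(p):
            col = cols[j]
            # x_j'(y - X_{-j} b_{-j}): add marker j's own contribution back
            rhs = sum(c * r for c, r in zip(col, resid)) + xpx[j] * b[j]
            lhs = xpx[j] + lam
            # Full conditional of b_j is N(rhs / lhs, var_e / lhs)
            new_bj = rng.gauss(rhs / lhs, (var_e / lhs) ** 0.5)
            delta = new_bj - b[j]
            for i in range(n):
                resid[i] -= col[i] * delta                  # keep residuals current
            b[j] = new_bj
        if it >= burn_in:
            for j in range(p):
                post_sum[j] += b[j]
    kept = n_iter - burn_in
    return [s / kept for s in post_sum]

# Simulated data: 200 individuals, 5 markers (hypothetical values)
data_rng = random.Random(0)
true_b = [2.0, -1.0, 0.0, 1.0, 0.5]
X = [[data_rng.gauss(0, 1) for _ in true_b] for _ in range(200)]
y = [sum(x * b for x, b in zip(row, true_b)) + data_rng.gauss(0, 1) for row in X]
estimates = gibbs_marker_effects(X, y, var_b=1.0, var_e=1.0)
```

Each marker effect is sampled conditional on all the others, so the inner loop is sequential by construction; this dependence is what makes parallelizing Gibbs sampling a non-trivial design problem.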
Project Methods
1. We will incorporate single-trait SSBR into the user interface of JWAS and extend JWAS to accommodate multi-trait SSBR.
2. Alternative computational strategies will be implemented for p > n and n > p in JWAS for both complete and incomplete genomic data. We will compare the computational efficiency of these alternative strategies with datasets of various characteristics on different computer hardware platforms.
3. We will implement the parallel Gibbs sampling strategy in JWAS and compare the performance of these computational strategies.
4. Code for multi-core computing will be implemented to incorporate the multi-core CPU and GPU computing capabilities of Julia into JWAS.
5. We will further document the source code and build comprehensive examples for a wide range of models fitted using JWAS and simulated data. This will be achieved using the interactive Jupyter notebook to make JWAS ideal for "reproducible research" and easy to maintain and extend. The documentation will contain live open-source code, equations, visualizations and explanatory text. This documentation will also be available for users accessing JWAS through JuliaBox and Jupyterhub, which require no installation and make computations available immediately from the user's browser.
6. Currently, JWAS only supports a limited range of models and only for analyses of continuous traits. Bayesian regression methods have been modified for categorical traits.
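To make the n > p versus p > n distinction in Method 2 concrete, the Python sketch below solves the same ridge-type marker-effects problem in two algebraically equivalent ways and picks whichever linear system is smaller. This is only an illustration of why the data dimensions matter, under the assumption of a simple ridge penalty; it is not the strategy implemented in JWAS, and all function names are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Assumes A is square and non-singular (true here: A is positive definite).
    """
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, m))
        x[r] = (M[r][m] - s) / M[r][r]
    return x

def ridge_primal(X, y, lam):
    """Solve the p x p system (X'X + lam*I) b = X'y."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    rhs = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]
    return solve(A, rhs)

def ridge_dual(X, y, lam):
    """Solve the n x n system (XX' + lam*I) a = y, then return b = X'a."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][k] * X[j][k] for k in range(p)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    a = solve(A, y)
    return [sum(X[i][j] * a[i] for i in range(n)) for j in range(p)]

def ridge_marker_effects(X, y, lam):
    """Pick the smaller system: p x p when n >= p, n x n when p > n."""
    n, p = len(X), len(X[0])
    return ridge_primal(X, y, lam) if n >= p else ridge_dual(X, y, lam)

# Simulated p > n case: 6 individuals, 15 markers (hypothetical values)
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(15)] for _ in range(6)]
y = [rng.gauss(0, 1) for _ in range(6)]
b_auto = ridge_marker_effects(X, y, lam=1.0)   # uses the 6 x 6 system
b_primal = ridge_primal(X, y, lam=1.0)         # same answer via the 15 x 15 system
max_diff = max(abs(u - v) for u, v in zip(b_auto, b_primal))
```

Both routes give the same estimates up to rounding, but the cost of the solve scales with the cube of the system size, so choosing by the smaller of n and p can matter a great deal when the number of markers far exceeds the number of individuals, or the reverse.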

Progress 05/01/18 to 04/30/22

Outputs
Target Audience: Breeders and scientists in industry and academia.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? We have instructed 3 short courses in which tools and methods developed in this project were taught, including "Whole Genome Analyses Using Julia" from December 3rd to 7th, 2018 at the Campus of the TUM School of Life Sciences Weihenstephan, Freising, Germany and "Modern Programming in Genomic Prediction" in 2019 and 2022 at University of California, Davis, US.
How have the results been disseminated to communities of interest? Peer-reviewed articles have been published, providing extensive opportunities for interaction with stakeholders (animal breeders and other geneticists from both academia and industry). Talks and posters were presented at conferences. Results (methods and tools) are used through collaborations with associations and companies such as the American Simmental Association.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? The long-term goal of this project is to develop a single-language software platform ideal for routine data analyses and "reproducible research" in genomic prediction and GWAS using complete or incomplete genomic data ("single-step" methods) that makes it easy for our community of researchers to participate, document, maintain and extend. The objectives are: 1. Extend JWAS to accommodate production versions of single-trait and multi-trait "single-step" Bayesian regression analyses with incomplete genomic data. 2. Improve the computational efficiency of JWAS by implementing various computational strategies corresponding to the characteristics of data (e.g. employing different strategies for n > p and p > n, where n and p indicate the number of individuals and the number of markers). 3. Improve the computational efficiency and reduce the memory requirement of JWAS by implementing parallel Gibbs sampling strategies. 4. Further incorporate into JWAS the multi-core CPU and GPU computing capabilities of Julia. 5. Document source code and examples for JWAS using the interactive Jupyter notebook that is ideal for "reproducible research". 6. Extend JWAS to accommodate categorical traits. We have accomplished all objectives.

Publications

  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Vallejo, R. L., Cheng, H., Fragomeni, B. O., Shewbridge, K. L., Gao, G., MacMillan, J. R., et al. (2019). Genome-wide association analysis and accuracy of genome-enabled breeding value predictions for resistance to infectious hematopoietic necrosis virus in a commercial rainbow trout breeding population. Genetics Selection Evolution, 51(1), 114. http://doi.org/10.1186/s12711-019-0489-z
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Runcie, D., & Cheng, H. (2019). Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods. G3 (Bethesda, Md.), 9(11), g3.400598.20193741. http://doi.org/10.1534/g3.119.400598
  • Type: Other Status: Published Year Published: 2019 Citation: Marrano, A., Sideli, G. M., Leslie, C. A., Cheng, H., & Neale, D. B. (2019). Deciphering of the Genetic Control of Phenology, Yield, and Pellicle Color in Persian Walnut (Juglans regia L.). Frontiers in Plant Science, 10, 6376. http://doi.org/10.3389/fpls.2019.01140
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Zhao, T., Fernando, R., Garrick, D., Cheng, H. (2020). Fast parallelized sampling of Bayesian regression models for whole-genome prediction. Genetics Selection Evolution, 52(1), 111.
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Qu, J., Kachman, S., Fernando, R., Garrick, D., Cheng, H. (2020). Exact Distribution of Linkage Disequilibrium in the Presence of Mutation, Selection or Minor Allele Frequency Filtering. Frontiers in Genetics, 11, 18.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: Abhilash Dhal, Jiayi Qu, Hao Cheng, Genome-Wide Association Studies Combining Genotyped and Non-Genotyped Relatives Using Bayesian Regression Methods with Mixture Priors, Plant & Animal Genome XXVIII, 2020
  • Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: Tianjing Zhao, Rohan Fernando, Dorian Garrick, Hao Cheng, Fast Parallelized Sampling of Bayesian Linear Mixed Models for Whole-Genome Prediction, Plant & Animal Genome XXVIII, 2020
  • Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: Jiayi Qu, Stephen Kachman, Rohan Fernando, Dorian Garrick, Hao Cheng, Exact Distribution of Linkage Disequilibrium in the Presence of Mutation, Selection or Minor Allele Frequency Filtering, Plant & Animal Genome XXVIII, 2020
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Wang, Z., Chapman, D., Morota, G., and Cheng, H., 2020, A Multiple-trait Bayesian Variable Selection Regression Method for Integrating Phenotypic Causal Networks in Genome-Wide Association Studies. G3: Genes, Genomes, Genetics. https://doi.org/10.1534/g3.120.401618
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Vallejo, R., Fragomeni, B., Cheng, H., Gao, G., Long, R., Shewbridge, K., MacMillan, J., Towner, R., and Palti, Y., 2020. Assessing Accuracy of Genomic Predictions for Resistance to Infectious Hematopoietic Necrosis Virus with Progeny Testing of Selection Candidates in a Commercial Rainbow trout Breeding Population, Frontiers in Veterinary Science, 7, 939, https://doi.org/10.3389/fvets.2020.590048
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Li, J., Wang, Z., Fernando, R. and Cheng, H., 2021. Tests of association based on genomic windows can lead to spurious associations when using genotype panels with heterogeneous SNP densities. Genetics Selection Evolution, 53, 45. https://doi.org/10.1186/s12711-021-00638-x
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Zhao, T., Fernando, R., and Cheng, H., 2021. Interpretable Artificial Neural Networks incorporating Bayesian Alphabet Models for Genome-wide Prediction and Association Studies, G3: Genes, Genomes, Genetics. https://doi.org/10.1093/g3journal/jkab228
  • Type: Journal Articles Status: Published Year Published: 2021 Citation: Wang, Z., and Cheng, H., 2021. Single-Trait and Multiple-Trait Genomic Prediction From Multi-Class Bayesian Alphabet Models Using Biological Information. Frontiers in Genetics, 12:717457. https://doi.org/10.3389/fgene.2021.717457
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Chen, C., Garrick, D., Fernando, R, Karaman, E., Stricker, C., Keehan, M., and Cheng, H., 2022, XSim Version 2: Simulation of Modern Breeding Programs, G3: Genes, Genomes, Genetics, jkac032, https://doi.org/10.1093/g3journal/jkac032
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Zhao, T., Zeng, J., and Cheng, H., 2022, Extend Mixed Models to Multi-layer Neural Networks for Genomic Prediction Including Intermediate Omics Data, Genetics, 221:1, iyac034, https://doi.org/10.1093/genetics/iyac034
  • Type: Journal Articles Status: Published Year Published: 2022 Citation: Qu, J., Morota, G., and Cheng, H., 2022, A Bayesian random regression method using mixture priors for genome-enabled analysis of time-series high-throughput phenotyping data, The Plant Genome, accepted.


Progress 05/01/20 to 04/30/21

Outputs
Target Audience: Breeders and researchers in industry and academia.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Several peer-reviewed articles have been published, providing extensive opportunities for interaction with stakeholders (animal breeders and other geneticists from both academia and industry).
What do you plan to do during the next reporting period to accomplish the goals? We will further develop methods and improve the computational efficiency of JWAS by various computational strategies corresponding to the characteristics of data (e.g. employing different strategies for n > p and p > n, where n and p indicate the number of individuals and the number of markers).

Impacts
What was accomplished under these goals? The long-term goal of this project is to develop a single-language software platform ideal for routine data analyses and "reproducible research" in genomic prediction and GWAS using complete or incomplete genomic data ("single-step" methods) that makes it easy for our community of researchers to participate, document, maintain and extend. The objectives are: 1. Extend JWAS to accommodate production versions of single-trait and multi-trait "single-step" Bayesian regression analyses with incomplete genomic data. 2. Improve the computational efficiency of JWAS by implementing various computational strategies corresponding to the characteristics of data (e.g. employing different strategies for n > p and p > n, where n and p indicate the number of individuals and the number of markers). 3. Improve the computational efficiency and reduce the memory requirement of JWAS by implementing parallel Gibbs sampling strategies. 4. Further incorporate into JWAS the multi-core CPU and GPU computing capabilities of Julia. 5. Document source code and examples for JWAS using the interactive Jupyter notebook that is ideal for "reproducible research". 6. Extend JWAS to accommodate categorical traits. We have extended JWAS to accommodate single-trait and multi-trait "single-step" Bayesian regression analyses for continuous and categorical traits. We have implemented parallel Gibbs sampling strategies to improve computational efficiency and reduce the memory requirement of JWAS. We have documented source code and examples for JWAS. We have implemented Bayesian methods for genome-wide association studies. We have incorporated into JWAS the multi-core CPU capabilities of Julia.

Publications

  • Type: Journal Articles Status: Published Year Published: 2020 Citation: A Multiple-trait Bayesian Variable Selection Regression Method for Integrating Phenotypic Causal Networks in Genome-Wide Association Studies
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Assessing Accuracy of Genomic Predictions for Resistance to Infectious Hematopoietic Necrosis Virus with Progeny Testing of Selection Candidates in a Commercial Rainbow trout Breeding Population
  • Type: Journal Articles Status: Accepted Year Published: 2021 Citation: Tests of association based on genomic windows can lead to spurious associations when using genotype panels with heterogeneous SNP densities


Progress 05/01/19 to 04/30/20

Outputs
Target Audience: Breeders and researchers in industry and academia.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? We were scheduled to instruct two short courses this year, including one at the Sixth International Conference of Quantitative Genetics and one before Visions III: Star Gazing into the Galaxy of Animal Genetics and Genomics. However, both have been postponed until next year due to COVID-19.
How have the results been disseminated to communities of interest? Several posters were presented at the Plant and Animal Genome Conference in 2020, providing extensive opportunities for interaction with stakeholders (animal breeders and other geneticists from both academia and industry). Several peer-reviewed articles have been published, providing extensive opportunities for interaction with stakeholders (animal breeders and other geneticists from both academia and industry). Results (methods and tools) are used through collaborations with associations and companies such as the American Simmental Association.
What do you plan to do during the next reporting period to accomplish the goals? We will further develop methods and improve the computational efficiency of JWAS by various computational strategies corresponding to the characteristics of data (e.g. employing different strategies for n > p and p > n, where n and p indicate the number of individuals and the number of markers), as well as the multi-core CPU and GPU computing capabilities of Julia. We will document source code and examples for JWAS. We will organize and instruct more short courses.

Impacts
What was accomplished under these goals? We have extended JWAS to accommodate single-trait and multi-trait "single-step" Bayesian regression analyses for continuous and categorical traits. We have implemented parallel Gibbs sampling strategies to improve computational efficiency and reduce the memory requirement of JWAS. We have documented source code and examples for JWAS. We have implemented Bayesian methods for genome-wide association studies.

Publications

  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Vallejo, R. L., Cheng, H., Fragomeni, B. O., Shewbridge, K. L., Gao, G., MacMillan, J. R., et al. (2019). Genome-wide association analysis and accuracy of genome-enabled breeding value predictions for resistance to infectious hematopoietic necrosis virus in a commercial rainbow trout breeding population. Genetics Selection Evolution, 51(1), 114. http://doi.org/10.1186/s12711-019-0489-z
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Runcie, D., & Cheng, H. (2019). Pitfalls and Remedies for Cross Validation with Multi-trait Genomic Prediction Methods. G3 (Bethesda, Md.), 9(11), g3.400598.20193741. http://doi.org/10.1534/g3.119.400598
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Marrano, A., Sideli, G. M., Leslie, C. A., Cheng, H., & Neale, D. B. (2019). Deciphering of the Genetic Control of Phenology, Yield, and Pellicle Color in Persian Walnut (Juglans regia L.). Frontiers in Plant Science, 10, 6376. http://doi.org/10.3389/fpls.2019.01140
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Zhao, T., Fernando, R., Garrick, D., Cheng, H. (2020). Fast parallelized sampling of Bayesian regression models for whole-genome prediction. Genetics Selection Evolution, 52(1), 111.
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Qu, J., Kachman, S., Fernando, R., Garrick, D., Cheng, H. (2020). Exact Distribution of Linkage Disequilibrium in the Presence of Mutation, Selection or Minor Allele Frequency Filtering. Frontiers in Genetics, 11, 18.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: Abhilash Dhal, Jiayi Qu, Hao Cheng, Genome-Wide Association Studies Combining Genotyped and Non-Genotyped Relatives Using Bayesian Regression Methods with Mixture Priors, Plant & Animal Genome XXVIII, 2020
  • Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: Tianjing Zhao, Rohan Fernando, Dorian Garrick, Hao Cheng, Fast Parallelized Sampling of Bayesian Linear Mixed Models for Whole-Genome Prediction, Plant & Animal Genome XXVIII, 2020
  • Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: Jiayi Qu, Stephen Kachman, Rohan Fernando, Dorian Garrick, Hao Cheng, Exact Distribution of Linkage Disequilibrium in the Presence of Mutation, Selection or Minor Allele Frequency Filtering, Plant & Animal Genome XXVIII, 2020


Progress 05/01/18 to 04/30/19

Outputs
Target Audience: Associations and researchers in industry and academia.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? We have instructed 2 short courses in which tools and methods developed in this project were taught, including "Whole Genome Analyses Using Julia" from December 3rd to 7th, 2018 at the Campus of the TUM School of Life Sciences Weihenstephan, Freising, Germany and "Modern Programming in Genomic Prediction" from June 24th to 28th, 2019 at University of California, Davis, US.
How have the results been disseminated to communities of interest? Results have been disseminated to communities of interest through the short courses mentioned above and through collaborations with industry. Results (methods and tools) are used through collaborations with associations and companies such as the American Simmental Association.
What do you plan to do during the next reporting period to accomplish the goals? We will further improve the computational efficiency of JWAS by implementing parallel Gibbs sampling strategies and various computational strategies corresponding to the characteristics of data (e.g. employing different strategies for n > p and p > n, where n and p indicate the number of individuals and the number of markers), as well as the multi-core CPU and GPU computing capabilities of Julia. We will document source code and examples for JWAS. We will organize and instruct more short courses.

Impacts
What was accomplished under these goals? We have extended JWAS to accommodate single-trait and multi-trait "single-step" Bayesian regression analyses with incomplete genomic data. We have extended JWAS to accommodate categorical traits. Some of the computational strategies to improve the computational efficiency and reduce the memory requirement have been studied.

Publications