Single-Step Bayesian Method for Genomic Prediction That Combines Information From Genotyped and Non-Genotyped Animals

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

AFRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

1005495

Grant No.

2015-67015-22947

Cumulative Award Amt.

$350,000.00

Proposal No.

2014-05926

Multistate No.

(N/A)

Project Start Date

Feb 1, 2015

Project End Date

Jan 31, 2019

Grant Year

2015

Program Code

[A1201]- Animal Health and Production and Animal Products: Animal Breeding, Genetics, and Genomics

Recipient Organization
IOWA STATE UNIVERSITY
2229 Lincoln Way
AMES, IA 50011

Performing Department
Animal Science

Non Technical Summary
Genomic data are increasingly being used for genetic evaluation of livestock populations, resulting in more accurate evaluations on selection candidates even before any phenotypes on these animals can be recorded. This has been shown to result in faster genetic improvement in many livestock populations. At present, genomic information is available only on a small subset of the animals, and methods for genomic evaluation must combine information from animals that have both genomic and phenotypic data with information from animals that have only phenotypic data. The current method of genomic evaluation that combines information from genotyped and non-genotyped animals is becoming computationally infeasible as its computational burden is proportional to the cube of n, where n is the number of animals with genomic data. To address this problem, we have developed a method that has a computational burden proportional to n. In this project, this method will be extended to combine information from multiple traits. Further, strategies to improve its computational efficiency and accuracy will be investigated. Data from real livestock populations will be used to compare the performance of this approach with that currently used.

Animal Health Component

50%

Research Effort Categories

Basic

(N/A)

Applied

50%

Developmental

50%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
303	7310	1080	100%

Knowledge Area
303 - Genetic Improvement of Animals;

Subject Of Investigation
7310 - Experimental design and statistical methods;

Field Of Science
1080 - Genetics;

Keywords

bayesian regression

genomic selection

whole-genome association

whole-genome prediction

Goals / Objectives
The long-term goal of this project is to develop software that enables use of single-step Bayesian regression (SSBR) analyses for multi-trait genomic prediction in large pedigrees with incomplete genotype data, using all available phenotypic, genotypic, and pedigree data. The same software will also be useful for making inferences on the location and magnitude of causal loci in genome-wide association studies (GWAS). The overall objective of the research proposed here is to develop and implement computationally efficient strategies for multi-trait SSBR analyses with large pedigrees. This will be accomplished by:
1) Extending SSBR to accommodate multi-trait analyses.
2) Improving the computational efficiency of SSBR by comparing alternative MCMC strategies to sample marker effects in parallel.
3) Improving the accuracy of SSBR by comparing the effect of alternative methods to impute the missing covariates.
4) Comparing the computational efficiency and accuracy of SSBR with single-step best linear unbiased prediction (SSBLUP) in real livestock populations.

Project Methods
1) Bayesian regression has been extended to accommodate multi-trait analyses and to combine data from genotyped and non-genotyped animals. These extensions will be combined to provide an SSBR method that accommodates multiple traits. The advantage of multi-trait analyses will be studied by comparing their accuracy of genomic selection with that of single-trait analyses.
2) The computational efficiency of two methods for sampling marker effects in parallel will be compared to the current method, which samples marker effects in sequence.
3) The method of imputation used in SSBR, best linear prediction, does not optimally use all available information. Other methods of imputation that capture more of the available information will be adapted for SSBR. The accuracy of genomic selection will be used to compare these approaches.
4) Data from real livestock populations will be used to compare SSBR and SSBLUP based on accuracy of genomic selection and efficiency of computation.
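
As a rough illustration of the best linear prediction step named in item 3, the Python sketch below imputes marker covariates for a non-genotyped animal as A_ng A_gg^{-1} M_g, where A is the pedigree relationship matrix and M_g holds the observed covariates. The three-animal pedigree, the covariate values, and the function name `blp_impute` are hypothetical, and the covariates are assumed to be centered already. It is a sketch under those assumptions, not the project's implementation.

```python
import numpy as np

def blp_impute(A, genotyped, nongenotyped, M_g):
    """Best linear prediction of marker covariates for non-genotyped animals.

    A            : pedigree relationship matrix for all animals
    genotyped    : indices of genotyped animals (same order as rows of M_g)
    nongenotyped : indices of non-genotyped animals
    M_g          : observed marker covariates, assumed centered (assumption)
    """
    A_gg = A[np.ix_(genotyped, genotyped)]
    A_ng = A[np.ix_(nongenotyped, genotyped)]
    # Solve A_gg X = M_g instead of forming the inverse explicitly.
    return A_ng @ np.linalg.solve(A_gg, M_g)

# Hypothetical pedigree: animals 0 and 1 are unrelated parents of animal 2.
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])

# Hypothetical covariates for the two genotyped parents at three markers.
M_g = np.array([[ 1.0, -1.0, 0.0],
                [-1.0,  1.0, 1.0]])

M_n = blp_impute(A, [0, 1], [2], M_g)
print(M_n)
```

In this toy pedigree the imputed covariates are simply the parental average. The prediction leaves an imputation residual that a full analysis would still need to handle.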

Progress 02/01/15 to 01/31/19

Outputs
Target Audience: Students and research personnel from academia and industry working in animal breeding and genetics with an emphasis on the analysis of genomic data, especially genomic selection.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Presentations at scientific conferences, seminars to research groups, workshops on genomic evaluation, and publication of journal articles.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Statistical and computational methods were developed for efficient use of Bayesian regression analyses with multi-trait genomic data. The methods developed under this project have been implemented in two software packages. The first is JWAS, an open-source package written in the Julia programming language and aimed at the academic and research communities. This package is used in courses and workshops for genomic analyses. The second is BOLT, a package developed by Theta Solutions that is suitable for national and international genomic evaluations. BOLT is used in the US by the American Hereford Association for its single-step North American Hereford evaluation, and International Genetic Solutions uses BOLT for the largest North American single-step beef cattle evaluation, which involves multiple traits, multiple countries, and multiple breeds. It is also used by chicken, pig, and dairy cattle organizations. As a result of the developments of this project, several groups in Europe are considering the adoption of Bayesian regression as an alternative to genomic best linear unbiased prediction (GBLUP) for their genomic evaluation and selection programs. Further, techniques and ideas from this project have been the basis for new developments in genomic methodology by other research groups. As part of this project, we have also shown how Bayesian regression is useful for locating genes that affect agriculturally important traits. Finally, we have compared Bayesian regression with GBLUP. These studies have contributed to a better understanding of GBLUP and its limitations, and we have proposed how to overcome some of these limitations.

Publications

Type: Journal Articles Status: Published Year Published: 2018 Citation: Toosi, A., Fernando, R.L., Dekkers, J.C.M. 2018. Genome-wide mapping of quantitative trait loci in admixed populations using mixed linear model and Bayesian multiple regression analysis. Genetics Selection Evolution 50 (1), 32.
Type: Journal Articles Status: Published Year Published: 2018 Citation: Zeng, J., Garrick, D., Dekkers, J., Fernando, R. 2018. A nested mixture model for genomic prediction using whole-genome SNP genotypes. PLoS ONE 13 (3), e0194683.
Type: Journal Articles Status: Published Year Published: 2018 Citation: Cheng, H., Kizilkaya, K., Zeng, J., Garrick, D., Fernando, R. 2018. Genomic prediction from multiple-trait Bayesian regression methods using mixture priors. Genetics, 209, 89-103.
Type: Journal Articles Status: Published Year Published: 2018 Citation: Yang, J., Ramamurthy, R.K., Qi, X., Fernando, R.L., Dekkers, J.C.M., Garrick, D.J., Nettleton, D., Schnable, P.S. and others. 2018. Empirical comparisons of different statistical models to identify and validate kernel row number-associated variants from structured multi-parent mapping populations of maize. G3: Genes, Genomes, Genetics. 8 (11), 3567-3575.
Type: Books Status: Published Year Published: 2018 Citation: Saatchi, M, Fernando, RL, Hyde, L, Atkins, J, McGuire, S, Shafer, W, Spangler, ML, Golden, B. 2018. Empirical progeny equivalent of genotyped animals in a multi-breed beef cattle genetic evaluation using a single-step Bayesian regression model. Iowa State University, Animal Industry Report 664 (1) 25
Type: Conference Papers and Presentations Status: Published Year Published: 2018 Citation: Cheng, H., Fernando, R., and Garrick, D. 2018. JWAS: Julia implementation of Whole-genome Analyses Software. Proceedings of the 11th World Congress on Genetics Applied to Livestock Production
Type: Conference Papers and Presentations Status: Published Year Published: 2018 Citation: Fernando, R. and Gianola, D. 2018. Bayesian inference of genomic similarity among individuals from markers and phenotypes. Proceedings of the 11th World Congress on Genetics Applied to Livestock Production
Type: Journal Articles Status: Published Year Published: 2019 Citation: Gianola, Daniel; Fernando, Rohan L; Garrick, Dorian J. 2019. A certain invariance property of BLUE in a whole-genome regression context. Journal of Animal Breeding and Genetics, 136(2), 113-117.
Type: Journal Articles Status: Published Year Published: 2019 Citation: Westhues, Matthias; Heuer, Claas; Thaller, Georg; Fernando, Rohan; Melchinger, Albrecht E. 2019. Efficient genetic value prediction using incomplete omics data. Theoretical and Applied Genetics, 1-12. Springer Berlin Heidelberg.
Type: Journal Articles Status: Published Year Published: 2019 Citation: Weng, Ziqing; Wolc, Anna; Su, Hailin; Fernando, Rohan L; Dekkers, Jack CM; Arango, Jesus; Settar, Petek; Fulton, Janet E; O'Sullivan, Neil P; Garrick, Dorian J. 2019. Identification of recombination hotspots and quantitative trait loci for recombination rate in layer chickens. Journal of Animal Science and Biotechnology, 10(1), 20. BioMed Central.

Progress 02/01/17 to 01/31/18

Outputs
Target Audience: Research workers in animal breeding and genetics
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? One PhD student and one postdoctoral associate worked on this research.
How have the results been disseminated to communities of interest? Two invited presentations and four peer-reviewed publications.
What do you plan to do during the next reporting period to accomplish the goals? Study the application of multiple-trait SSBR to large data sets. Publish a paper on parallel computing strategies for SSBR. Compare alternative strategies for imputing missing genotypes.

Impacts
What was accomplished under these goals?
IMPACT STATEMENT: To facilitate faster genetic improvement in livestock populations, we are researching new methods of genomic evaluation in large pedigrees with incomplete genotype data. This project year, we increased knowledge of using these methods for gene discovery in analyses involving large numbers of genotypes. Use of the proposed approach would lead to discovery of more genes controlling traits of economic importance because it does not incur a multiple-test penalty. Other research groups are adopting this approach. We have also increased our understanding of how to obtain evaluations that are unbiased and have improved accuracy by including a term for the genetic mean in the model. This feature is already being applied in the beef cattle evaluations.
Long-term goal) One of the advantages of SSBR (single-step Bayesian regression) is that it allows inference based on Bayesian posterior probabilities. We used theory and computer simulation to demonstrate that the proportion of false positives among significant results can be controlled by using Bayesian posterior probabilities for inference. Unlike inference based on controlling the genomewise error rate, this approach does not incur a multiple-test penalty, which is increasingly important for GWAS that may involve millions of tests.
Objective 1) Extending SSBR to accommodate multi-trait analyses. Nothing to report this period.
Objective 2) Improving the computational efficiency of SSBR by comparing alternative MCMC strategies to sample marker effects in parallel. An efficient method was developed for leave-one-out cross validation in parallel.
Objective 3) Improving the accuracy of SSBR by comparing the effect of alternative methods to impute the missing covariates. When genotypes are observed for all individuals, centering genotype covariates has no effect on genomic prediction; this is not the case when some genotypes are missing. Single-step genomic prediction is used to combine information from genotyped and non-genotyped animals. In this situation, observed genotypes should be centered using genotype means from unselected founders, which may not be available. We proposed an alternative analysis that does not require centering genotypes but instead fits the mean of unselected individuals as an unobserved fixed effect. Computer simulation was used to demonstrate the improved accuracy of this analysis in a population undergoing selection.
Objective 4) Comparing the computational efficiency and accuracy of SSBR with SSBLUP in real livestock populations. Nothing to report this period.
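
The posterior-probability argument made under the long-term goal above can be illustrated with a small Python sketch. It assumes that each test (for example, a genomic window) comes with a posterior probability of association; the values, the threshold, and the function name `select_by_posterior_probability` are hypothetical and are not taken from the project's software.

```python
import numpy as np

def select_by_posterior_probability(pip, threshold):
    """Declare signals where the posterior probability of association is at
    least `threshold`, and return the expected proportion of false positives
    (PFP) among the declared signals."""
    pip = np.asarray(pip, dtype=float)
    selected = np.flatnonzero(pip >= threshold)
    if selected.size == 0:
        # Convention chosen for this sketch: no declared signals, PFP of 0.
        return selected, 0.0
    # A declared signal is a false positive with probability 1 - PIP, so the
    # expected PFP is the average of 1 - PIP over the selected set.
    expected_pfp = float(np.mean(1.0 - pip[selected]))
    return selected, expected_pfp

# Hypothetical posterior probabilities for six genomic windows.
pip = [0.99, 0.95, 0.40, 0.97, 0.10, 0.92]
selected, expected_pfp = select_by_posterior_probability(pip, threshold=0.9)
print(selected, expected_pfp)
```

Because every declared signal has a posterior probability of at least the threshold, the expected PFP cannot exceed one minus the threshold, however many tests are run. That is one way to see why no multiple-test penalty arises.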

Publications

Type: Journal Articles Status: Published Year Published: 2017 Citation: Fernando, R., Toosi, A., Wolc, A., Garrick, D., & Dekkers, J. (2017). Application of Whole-Genome Prediction Methods for Genome-Wide Association Studies: A Bayesian Approach. Journal of Agricultural, Biological and Environmental Statistics, 22(2), 172-193. http://doi.org/10.1007/s13253-017-0277-6
Type: Journal Articles Status: Published Year Published: 2017 Citation: Fernando, R. L., Cheng, H., Sun, X., & Garrick, D. J. (2017). A comparison of identity-by-descent and identity-by-state matrices that are used for genetic evaluation and estimation of variance components. Journal of Animal Breeding and Genetics, 134(3), 213-223. http://doi.org/10.1111/jbg.12275
Type: Journal Articles Status: Published Year Published: 2017 Citation: Cheng, H., Garrick, D. J., & Fernando, R. L. (2017). Efficient strategies for leave-one-out cross validation for genomic best linear unbiased prediction. Journal of Animal Science and Biotechnology, 8(1), 38. http://doi.org/10.1186/s40104-017-0164-6
Type: Journal Articles Status: Published Year Published: 2017 Citation: Hsu, W.-L., Garrick, D. J., & Fernando, R. L. (2017). The Accuracy and Bias of Single-Step Genomic Prediction for Populations Under Selection. G3 (Bethesda, Md.), 7(8), 2685-2694. http://doi.org/10.1534/g3.117.043596

Progress 02/01/16 to 01/31/17

Outputs
Target Audience: Research workers in animal breeding and genetics
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? One PhD student and two postdoctoral associates worked on this research.
How have the results been disseminated to communities of interest? One workshop, two invited presentations and five peer-reviewed publications.
What do you plan to do during the next reporting period to accomplish the goals? Complete the research on objectives 2 and 3, and submit manuscripts on these results.

Impacts
What was accomplished under these goals?
Overall impact statement: In order to develop software that enables use of single-step Bayesian regression (SSBR) analyses for multi-trait genomic prediction in large pedigrees with incomplete genotype data, we worked on developing computationally efficient strategies for the analysis. This reporting period, two approaches were used for genomic selection. The first uses a genomic relationship matrix in a breeding value model for prediction of genetic merit (GBLUP). The second uses genotypes as covariates in a random-regression model for prediction (RRBLUP). We showed that the genomic relationship matrix, G, is singular when the number of animals exceeds the number of markers used in computing genomic relationships. In this situation, the inverse of G, which is needed for genomic prediction by GBLUP, is not defined. We developed a theory for exact best linear unbiased prediction when G is singular. It can be shown that addressing the problems associated with the singular G matrix results in a hybrid model that fits a random-regression model for animals with genotypes and a breeding value model for animals without genotypes. Strategies were developed for efficient genomic prediction with large numbers of genotyped and non-genotyped animals using the hybrid model.
Objective 1) Extending SSBR to accommodate multi-trait analyses. Current multi-trait Bayesian regression methods assume a locus has an effect on all traits or on none of them. This assumption was relaxed to allow a locus to have an effect on any subset of the traits. This has been implemented in Julia and applied to real data. A manuscript describing the method and the results is being revised following review.
Objective 2) Improving the computational efficiency of SSBR by comparing alternative MCMC strategies to sample marker effects in parallel. A paper that shows the advantages of parallel sampling of marker effects is being prepared for publication.
Objective 3) Improving the accuracy of SSBR by comparing the effect of alternative methods to impute the missing covariates. The LDMIP method of imputation combines linkage disequilibrium and linkage information. When this method was used in a chicken pedigree for imputation of missing genotypes and the residual from imputation was ignored, prediction accuracies were higher than those from SSBR with BLP imputation. This approach is being further investigated to see if the advantage of imputation with LDMIP holds up in other situations.
Objective 4) Comparing the computational efficiency and accuracy of SSBR with SSBLUP in real livestock populations. Results from a comparison of methods show that for some traits SSBR gives better results than SSBLUP. A paper has been published with these results.
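
The singularity of G described in the impact statement above can be checked numerically. The Python sketch below uses simulated genotypes with hypothetical sizes and builds one simple form of genomic relationship matrix (centered genotypes, divided by the number of markers). It then compares a random-regression prediction with a breeding-value prediction that never inverts G. Fixed effects are ignored and the variance ratio `lam` is an assumed value, so this is a toy check and not the hybrid model developed in the project.

```python
import numpy as np

rng = np.random.default_rng(0)
n_animals, n_markers = 8, 5   # hypothetical sizes with more animals than markers

# Simulated genotypes coded 0/1/2, centered by column means.
genotypes = rng.integers(0, 3, size=(n_animals, n_markers)).astype(float)
Z = genotypes - genotypes.mean(axis=0)

# One simple form of genomic relationship matrix (an assumption of this sketch).
G = Z @ Z.T / n_markers
rank_G = np.linalg.matrix_rank(G)
print(rank_G, n_animals)   # rank is at most n_markers, so G has no inverse

# Hypothetical phenotypes and an assumed ratio of residual to marker variance.
y = rng.normal(size=n_animals)
lam = 1.0

# Random-regression form: solve an n_markers x n_markers system.
marker_effects = np.linalg.solve(Z.T @ Z + lam * np.eye(n_markers), Z.T @ y)
u_rr = Z @ marker_effects

# Breeding-value form written so that Z Z' itself is never inverted.
K = Z @ Z.T
u_bv = K @ np.linalg.solve(K + lam * np.eye(n_animals), y)

print(np.allclose(u_rr, u_bv))
```

The two predictions agree because Z (Z'Z + lam I)^{-1} Z' equals Z Z' (Z Z' + lam I)^{-1}, which is why a marker-based formulation remains usable when G cannot be inverted.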

Publications

Type: Journal Articles Status: Published Year Published: 2016 Citation: Karaman, E., Cheng, H., Firat, M. Z., Garrick, D. J., & Fernando, R. L. (2016). An Upper Bound for Accuracy of Prediction Using GBLUP. PLoS ONE, 11(8), e0161054. http://doi.org/10.1371/journal.pone.0161054
Type: Journal Articles Status: Published Year Published: 2016 Citation: Sun, X., Fernando, R., & Dekkers, J. (2016). Contributions of linkage disequilibrium and co-segregation information to the accuracy of genomic prediction. Genetics Selection &.
Type: Journal Articles Status: Published Year Published: 2016 Citation: Fernando, R. L., Cheng, H., & Garrick, D. J. (2016). An efficient exact method to obtain GBLUP and single-step GBLUP when the genomic relationship matrix is singular. Genetics Selection Evolution, 48(1), 80. http://doi.org/10.1186/s12711-016-0260-7
Type: Journal Articles Status: Published Year Published: 2016 Citation: Fernando, R. L., Cheng, H., Golden, B. L., & Garrick, D. J. (2016). Computational strategies for alternative single-step Bayesian regression models with large numbers of genotyped and non-genotyped animals. Genetics Selection Evolution, 48(1), 96. http://doi.org/10.1186/s12711-016-0273-2
Type: Journal Articles Status: Published Year Published: 2017 Citation: Lee, J., Cheng, H., Garrick, D., Golden, B., Dekkers, J., Park, K., et al. (2017). Comparison of alternative approaches to single-trait genomic prediction using genotyped and non-genotyped Hanwoo beef cattle. Genetics Selection Evolution, 49(1), 2. http://doi.org/10.1186/s12711-016-0279-9

Progress 02/01/15 to 01/31/16

Outputs
Target Audience: Scientists working in the area of genomic prediction
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? PhD students and postdoctoral associates worked on this research.
How have the results been disseminated to communities of interest? Four workshops on genomic prediction were conducted in 2015. Two papers have been submitted.
What do you plan to do during the next reporting period to accomplish the goals? Submit a paper on the comparison of a new model for multi-trait analysis with the current model. Investigate strategies for sampling marker effects in parallel. Compare the computational efficiency and accuracy of single-step Bayesian regression (SSBR) with single-step best linear unbiased prediction (SSBLUP) in real livestock populations.

Impacts
What was accomplished under these goals? An efficient method for genomic prediction that combines information from multiple traits in large pedigrees with incomplete genotype data was implemented in the Julia programming language. The computational burden of the implemented method increases linearly with the number of animals contributing to the analysis. This implementation is useful for studying the performance of the proposed method.
Objective 1) Extending SSBR to accommodate multi-trait analyses. Current multi-trait Bayesian regression methods assume a locus has an effect on all traits or on none of them. This assumption was relaxed to allow a locus to have an effect on any subset of the traits. This has been implemented in Julia and applied to real data.
Objective 2) Improving the computational efficiency of SSBR by comparing alternative MCMC strategies to sample marker effects in parallel. This objective will be addressed during years 2 and 3.
Objective 3) Improving the accuracy of SSBR by comparing the effect of alternative methods to impute the missing covariates. The linkage disequilibrium multilocus iterative peeling (LDMIP) method of imputation combines linkage disequilibrium and linkage information. When this method was used in a chicken pedigree for imputation of missing genotypes and the residual from imputation was ignored, prediction accuracies were higher than those from SSBR with best linear prediction (BLP) imputation. Further investigation is needed to see if this advantage of imputation with LDMIP holds up in other situations.
Objective 4) Comparing the computational efficiency and accuracy of SSBR with SSBLUP in real livestock populations. This objective will be addressed in year 3.
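
To make the relaxed assumption under Objective 1 concrete, the Python sketch below enumerates which traits a locus may affect. With t traits, the all-or-none assumption permits two patterns, while allowing any subset permits 2^t. The function name `inclusion_patterns` and the choice of three traits are illustrative assumptions, not part of the Julia implementation.

```python
from itertools import product

def inclusion_patterns(n_traits):
    """All 0/1 patterns indicating which traits a locus affects.

    A 1 in position k means the locus has an effect on trait k.
    There are 2**n_traits such patterns.
    """
    return list(product((0, 1), repeat=n_traits))

patterns = inclusion_patterns(3)

# The patterns allowed under the all-or-none assumption.
all_or_none = [p for p in patterns if len(set(p)) == 1]

print(len(patterns))
print(all_or_none)
```

For three traits, the relaxed model considers eight patterns instead of two. The count doubles with each added trait, which hints at why computational efficiency matters for multi-trait analyses.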

Publications