Progress 11/01/02 to 10/31/06
Outputs Finite mixture models can uncover heterogeneity due to hidden structure. Quantitative genetics of continuous characters having a finite mixture of Gaussian components was explored. The partition of variance in a mixture, the covariance between relatives under the supposition of an additive genetic model, and the offspring-parent regression were derived. Formulae for assessing the effect of mass selection operating on a mixture were worked out. Expressions for the genetic and phenotypic correlations between mixture and Gaussian traits, and between two mixture traits were derived as well. Semi-parametric procedures for prediction of total genetic value for quantitative traits, that make use of phenotypic and genomic data simultaneously, were developed. The methods focus on the treatment of massive information provided by, e.g., single-nucleotide polymorphisms, which can create a mixture of distributions. It was argued that standard parametric methods for quantitative
genetic analysis cannot handle the multiplicity of potential interactions arising in models with, e.g., hundreds of thousands of markers, and that most of the assumptions required for an orthogonal decomposition of variance are violated in artificial and natural populations. A fully Bayesian method for quantitative genetic analysis of data consisting of ranks of, e.g., genotypes, scored at a series of events or experiments was developed. The model postulates a latent structure, with an underlying variable realized for each genotype or individual involved in the event. The rank observed is assumed to reflect the order of the values of the unobserved variables, i.e., the classical Thurstonian model of psychometrics. A study was conducted to apply finite mixture models to field data for somatic cell scores (SCS) for estimation of genetic parameters. Data were approximately 170,000 test-day records for SCS from first-parity Holstein cows in Wisconsin, USA. Five different models were
applied, each one with an increasing level of complexity. The best model was one for which genetic and permanent environmental variances were heterogeneous, but residual variances were homogeneous. The genetic effects for the two components suggested that SCS from healthy and infected cattle were different traits, with a genetic correlation between high and low SCS of only 0.13. Robust threshold models with multivariate Student's t or multivariate slash link functions were employed to infer genetic parameters of clinical mastitis at different stages of lactation, with each cow defining a cluster of records. The robust fits were compared with that from a multivariate probit model using a pseudo-Bayes factor and an analysis of residuals. Results suggest that clinical mastitis resistance is not the same trait across periods, corroborating earlier findings with probit models.
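The partition of variance in a mixture mentioned above can be illustrated at the phenotypic level with the law of total variance: the variance of a finite mixture is the weighted within-component variance plus the variance among component means. The sketch below is only that general identity, not the genetic partition derived in the project; the function name and the numerical values are hypothetical.

```python
def mixture_variance(probs, means, variances):
    """Return (total, within, between) variance of a finite Gaussian mixture.

    probs: mixing proportions (must sum to 1)
    means, variances: component means and variances
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("mixing proportions must sum to 1")
    overall_mean = sum(p * m for p, m in zip(probs, means))
    # Weighted average of the component variances
    within = sum(p * v for p, v in zip(probs, variances))
    # Variance among component means around the overall mean
    between = sum(p * (m - overall_mean) ** 2 for p, m in zip(probs, means))
    return within + between, within, between


# Hypothetical two-component trait: 75% of records from a low-mean component
total, within, between = mixture_variance([0.75, 0.25], [2.0, 6.0], [1.0, 1.0])
```

With well-separated component means, most of the total variance can come from the between-component term, which is one reason a single Gaussian model may describe such a trait poorly.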
Impacts This project developed theory and methods for genetic analysis of heterogeneous characters whose statistical distribution requires the specification of a mixture of distributions, such as somatic cell scores in cattle, gene expression data and traits for which there may be major genes segregating, but without knowing the genotypes involved. Models and methods were applied to dairy cattle records on somatic cell scores from the USA and Norway, and were found to perform better than standard methods of genetic evaluation. The idea of mixtures also underlies a wide class of semi-parametric methods, which may be useful in conjunction with genomic selection. Application of these methods to livestock populations can increase the accuracy of prediction of genetic merit of animals.
Publications
- D. Gianola, R. L. Fernando and A. Stella. 2006. Genomic assisted prediction of genetic value with semi-parametric procedures. Genetics 173, 1761-1776.
- D. Gianola, B. Heringstad and J. Odegaard. 2006. On the quantitative genetics of mixture characters. Genetics 173, 2247-2255.
- D. Gianola and H. Simianer. 2006. A Thurstonian model for quantitative genetic analysis of ranks: A Bayesian approach. Genetics 174, 1613-1624.
- P. J. Boettcher, D. Caraviello and D. Gianola. 2007. Genetic analysis of somatic cell scores of Holstein cows with a Bayesian mixture model. Journal of Dairy Science 90, 435-443.
- Y. M. Chang, D. Gianola, B. Heringstad and G. Klemetsdal. 2006. A comparison between multivariate slash, Student-t and probit threshold models for analysis of clinical mastitis in first lactation cows. Journal of Animal Breeding and Genetics 123, 290-300.
|
Progress 01/01/04 to 12/31/04
Outputs Mastitis elevates SCC, inducing a positive correlation between SCS and the disease. Selection against mastitis has focused on genetic evaluations for low level of SCC. An observed SCC can be viewed as drawn from a two-component mixture defined by the unknown health status of a cow. A mixture model was developed, assuming that health class membership associated with a test-day record of SCS was fully determined by an underlying variable. The probability of putative mastitis may vary between sub-groups. Further, a baseline SCS may be affected by fixed and random effects. Based on simulations, the model gave unbiased estimates of parameters. We fitted a finite mixture model (FMM) to somatic cell counts in goats and compared the fit to that of a standard linear mixed effects model. Bacteriological information was used to assess the ability of the model to classify records as from healthy or infected goats. Data were 4518 observations of SCS and bacterial infection from both
udder halves of 310 goats from 5 herds in Northern Italy. Records were from a complete production season, and were taken monthly from February to November of 2000. Explanatory factors included a three-parameter regression on days in milk; fixed class effects of herd-test-day, parity group, and udder side (left or right); and random effects of goat and udder half within goat. The two-component FMM included a fixed mean for the second component of the model (theoretically corresponding to infected udder halves), and an unknown probability of membership to a given putative infection status. A Bayesian approach was used for the analysis, with Gibbs sampling employed to obtain draws from posterior distributions of parameters of interest. The Deviance Information Criterion (DIC) was used to compare the fit of the two models. The FMM yielded a much lower estimate of residual variance than the standard model (1.28 vs. 3.02 SCS²), and a slightly higher estimate for the between-goat variance
(1.79 vs. 1.48). The DIC was much lower for the FMM, indicating a better fit to the data. The FMM was able to classify correctly 60% and 48% of the healthy and infected observations, respectively. This was slightly greater than what would be expected from random classification, but not high enough for useful mastitis diagnosis. Nevertheless, increased precision of genetic evaluation is the goal of applying the FMM, rather than timely and accurate mastitis diagnosis. The results suggest that more research on FMM for SCS is merited and necessary for proper application. Prediction of random effects with mixtures with Gaussian distributions was studied from a non-Bayesian perspective, assuming that location and dispersion parameters are known. The focus was on calculating the best predictor for several models. Coverage included mixture sampling models, and mixtures for the distribution of the random effects. Longitudinal and cross-sectional specifications such as those arising in animal
breeding and genetics, were examined. The best linear predictor and the best linear unbiased predictor were derived for these models.
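The classification of records as healthy or infected described above rests on the posterior probability that a record belongs to the second mixture component. A minimal sketch of that calculation via Bayes' rule follows, assuming the mixing proportion and the component means and variances are known and ignoring the fixed and random effects of the fitted models; all names and numerical values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def prob_infected(scs, p_infected, mean_healthy, mean_infected,
                  var_healthy, var_infected):
    """Posterior probability that a record comes from the 'infected' component."""
    num = p_infected * normal_pdf(scs, mean_infected, var_infected)
    den = num + (1.0 - p_infected) * normal_pdf(scs, mean_healthy, var_healthy)
    return num / den


# Hypothetical parameters: 25% of records infected, component means 2 and 6
example = prob_infected(5.0, 0.25, 2.0, 6.0, 1.0, 1.0)
```

A record would typically be assigned to the infected class when this probability exceeds a chosen cutoff such as 0.5; the cutoff is a design choice and is not specified in the report.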
Impacts Theory and algorithms for genetic analysis of Gaussian mixtures were developed and applied to livestock data, with the primary motivation being mastitis, an udder disease. Application of these procedures to livestock populations can enhance the effectiveness of genetic selection for increased resistance to disease.
Publications
- D. Gianola, J. Odegaard, B. Heringstad, G. Klemetsdal, D. Sorensen, P. Madsen, J. Jensen and J. Detilleux. 2004. Mixture model for inferring susceptibility to mastitis in dairy cattle: a procedure for likelihood-based inference. Genetics, Selection, Evolution 36, 3-27.
- P. J. Boettcher, P. Moroni, G. Pisoni and D. Gianola. 2005. Application of a finite mixture model to somatic cell scores of Italian goats. Journal of Dairy Science 88, 2209-2216.
- D. Gianola. 2005. A primer on prediction of random effects in finite mixture models with Gaussian components. Journal of Animal Breeding and Genetics 122, 145-160.
- J. Odegaard, J. Jensen, P. Madsen, D. Gianola, G. Klemetsdal and B. Heringstad. 2004. A Bayesian Liability-Normal mixture model for analysis of a continuous mastitis-related trait. Journal of Dairy Science 88, 2652-2659.
|
Progress 01/01/03 to 12/31/03
Outputs The distribution of somatic cell score (SCS) in cows with and without intramammary infection (mastitis) may be different. SCS could be regarded as a mixture of at least two components depending on cow udder health status. A heteroscedastic two-component Bayesian normal mixture model with random effects was developed and implemented via Gibbs sampling. The model was evaluated using simulated data sets. SCS was simulated as a mixture representing two alternative udder health statuses (healthy or mastitic). Animals were assigned randomly to the two components according to the probability of group membership (Pm). Random effects (additive genetic and permanent environment), when included, had identical distributions across mixture components. Posterior probabilities of putative mastitis were estimated for all observations, and model adequacy was evaluated. Fitting different residual variances in the two mixture components seems to cause bias in estimation of parameters.
When the components are difficult to disentangle, so are their residual variances; this biases estimation of Pm and of location parameters of the two underlying distributions. When all variance components were identical across mixture components, the mixture model analyses returned parameter estimates without bias and with a high degree of accuracy. Including random effects in the model significantly increased the probability of correct classification. No sizable differences in probability of correct classification were found between models in which a single animal effect (ignoring relationships) was fitted and models where this effect was split into genetic and permanent environmental components utilizing relationship information. Finite mixture models can separate a heterogeneous population into homogeneous components. These models can be used to classify individuals to the unknown member groups, e.g., using somatic cell scores to assign cows into udder health groups. A simulation
study of a two-component normal mixture model was carried out. Parameters were estimated by maximum likelihood using the EM algorithm. The objective was to evaluate alternative Monte Carlo implementations of the E-step. Four hundred individuals were randomly assigned to two sub-populations with a mixture parameter (P) of 75%; heritability was 0.2, and the residual variance was homogeneous. The E-step was done with Gibbs sampling. Different lengths of burn-in and of sampling periods were run to study the efficiency of the Monte Carlo implementation. Location parameters and P converged quickly regardless of the length of burn-in and the number of Gibbs samples collected; however, smaller mean-squared errors were observed with longer burn-in. Variance components were less stable, and a larger number of Gibbs samples gave smaller mean-squared errors. A longer burn-in at the beginning stage of the Monte Carlo implementation and an increase in the number of Gibbs samples collected should result in better performance of
the algorithm.
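To make the EM iteration concrete, the sketch below fits a two-component normal mixture with a homogeneous variance to simulated data (400 individuals, P = 0.75). It is a deliberately simplified illustration: it has no random effects and uses the exact E-step, not the Monte Carlo (Gibbs sampling) E-step evaluated in the study. The component means, the seed, the starting values and the function name are assumptions made for the example.

```python
import math
import random


def em_two_normal(y, n_iter=200):
    """EM for a two-component normal mixture with a common variance.

    Returns (p, mu1, mu2, var), where p is the proportion in component 1.
    """
    n = len(y)
    ordered = sorted(y)
    # Crude starting values: means of the lower and upper halves of the data
    mu1 = sum(ordered[: n // 2]) / (n // 2)
    mu2 = sum(ordered[n // 2:]) / (n - n // 2)
    grand = sum(y) / n
    var = sum((v - grand) ** 2 for v in y) / n
    p = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each record is from component 1
        # (the normalizing constant cancels because the variance is common)
        w = []
        for v in y:
            d1 = p * math.exp(-0.5 * (v - mu1) ** 2 / var)
            d2 = (1.0 - p) * math.exp(-0.5 * (v - mu2) ** 2 / var)
            w.append(d1 / (d1 + d2))
        # M-step: weighted updates of proportion, means and common variance
        s1 = sum(w)
        p = s1 / n
        mu1 = sum(wi * v for wi, v in zip(w, y)) / s1
        mu2 = sum((1.0 - wi) * v for wi, v in zip(w, y)) / (n - s1)
        var = sum(wi * (v - mu1) ** 2 + (1.0 - wi) * (v - mu2) ** 2
                  for wi, v in zip(w, y)) / n
    return p, mu1, mu2, var


# Simulated data: 400 records, 75% from N(0, 1) and 25% from N(4, 1) (hypothetical means)
random.seed(1)
data = [random.gauss(0.0 if random.random() < 0.75 else 4.0, 1.0) for _ in range(400)]
p_hat, mu1_hat, mu2_hat, var_hat = em_two_normal(data)
```

In a Monte Carlo EM, the weights computed in the E-step would be replaced by averages over sampled component memberships, which is where burn-in length and the number of Gibbs samples enter.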
Impacts The expectation is that a mixture model may enhance the accuracy of genetic evaluation for resistance to mastitis which, at present, is based on analysis of somatic cell scores, ignoring the type of heterogeneity considered in a mixture specification.
Publications
- J. Odegaard, J. Jensen, P. Madsen, D. Gianola, G. Klemetsdal and B. Heringstad. 2003. Detection of mastitis in dairy cattle by use of mixture models for repeated somatic cell scores: A Bayesian approach via Gibbs sampling. Journal of Dairy Science 86, 3694-3703.
- Y. M. Chang, D. Gianola, J. Odegaard, J. Jensen, P. Madsen, D. Sorensen, G. Klemetsdal and B. Heringstad. 2003. Evaluation of a Monte Carlo EM algorithm for likelihood inference in a finite normal mixture model with random effects. European Association for Animal Production, 54th Annual Meeting, Rome, Italy.