Source: UNIVERSITY OF MARYLAND submitted to
A COMPREHENSIVE SOFTWARE PACKAGE FOR GENE MAPPING IN ANIMAL PEDIGREES IN THE POST-GENOME ERA
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
NEW
Funding Source
Reporting Frequency
Annual
Accession No.
0209252
Grant No.
2007-35205-17883
Project No.
MDR-2006-04839
Proposal No.
2006-04839
Multistate No.
(N/A)
Program Code
43.0
Project Start Date
Jan 15, 2007
Project End Date
Jan 14, 2010
Grant Year
2007
Project Director
O'Connell, J.
Recipient Organization
UNIVERSITY OF MARYLAND
(N/A)
BALTIMORE, MD 21201
Performing Department
(N/A)
Non Technical Summary
Artificial insemination (AI) organizations are ready to embrace the promise of the new genomics to guide animal improvement programs, replacing population genetics principles with genome-wide selection to identify young bulls to enter progeny testing programs. Currently over $30 million is invested in these programs with only a 1 in 10 success rate. Identifying DNA sequence variants that help predict the success of young bulls in progeny programs will lead to substantial savings by the industry and enhance international competitiveness. A major obstacle to identifying these variants is that analyses on the large complex pedigrees arising from animal improvement programs are often intractable, even for small numbers of genotypes, let alone for the thousands currently available. The purpose of this project is to develop new analytical methods, computational algorithms and software programs for the animal genetics research community that can effectively handle the large number of genotypes. Our primary approach is to exploit the fact that genetic markers that are physically close in the genome tend to be highly correlated with regard to their haplotype structure, that is, the particular combinations of DNA variants an offspring inherits from each parent. This correlation, called linkage disequilibrium, allows accurate reconstruction of the gene flow in a pedigree, significantly reducing the computational complexity of the problem.
Animal Health Component
(N/A)
Research Effort Categories
Basic
60%
Applied
40%
Developmental
(N/A)
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
304 | 3499 | 1080 | 70%
304 | 3499 | 2080 | 30%
Goals / Objectives
Genetics is being transformed in the post-genome era by unprecedented advances in genotyping technology. The number of single nucleotide polymorphisms (SNPs) available for genome-wide analyses has increased 1000-fold in just four years, reaching the one million mark. This increase will continue to accelerate as we move to whole-genome sequencing. These SNPs are thought to represent the majority of the functional variation within the genome, and thus offer unprecedented opportunities to map loci that control economically important quantitative traits of interest to animal improvement programs. The promise of the new genomics to replace expensive population-based methods of selection with DNA-based methods will lead to substantial economic savings and increase the viability of breeding programs. A major obstacle to realizing this promise is the computational complexity of analyzing large numbers of genetic markers on the large complex pedigrees arising from animal improvement programs. The change in the genotype landscape requires new analysis methods and software tools that can scale with the genotyping technology. The objective of this project is to develop the methods and software tools necessary to tackle these computational issues, in order to enhance the analysis of genome-wide SNP data on animal pedigrees for genome-wide selection and for quantitative trait nucleotide (QTN) and quantitative trait loci (QTL) mapping. These software tools will be freely available to the research community.
Project Methods
Our approach to reducing the complexity of the genotype space prior to analysis involves two parts. The first part is how we handle missing genotype data. Generally we must assign an individual without genotype data all possible genotypes, which increases the complexity exponentially. To reduce this impact, we will implement a novel genotype recoding algorithm that collapses alleles that do not have observable parent-offspring transmission into a single super allele. This recoding preserves the likelihood of the pedigree while reducing the number of genotype states. The second part is how we handle missing phase information between heterozygous genotypes, which determines the possible haplotype combinations. The number of phase combinations is two raised to the number of heterozygous genotypes. While the number of phases therefore grows exponentially in the number of genotypes, very few of these combinations are actually biologically plausible. Genotypes that are physically close in the genome tend to be highly correlated with regard to their haplotype structure. This correlation, called linkage disequilibrium, will concentrate the posterior probability distribution of the genotype data of an individual on very few haplotype combinations. We plan to build an efficient computational engine that exploits the genotype recoding and linkage disequilibrium by dividing the large number of markers into smaller, more computationally tractable sets of overlapping markers. We plan to sample from the posterior probability distributions of the smaller sets, which will tend to be concentrated on only a few states, and then build the distribution over the larger sets. We expect that the linkage disequilibrium between the overlapping markers will reduce the number of long-range haplotype possibilities as we scaffold the short-range haplotypes. The reduction in computational complexity will allow more sophisticated genetic models to be applied to the data.
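The three ideas in the methods above (allele recoding, exponential growth of phase combinations, and overlapping marker sets) can be illustrated with a minimal Python sketch. This is not the project's software: the function names, the "S" super-allele label, and the window parameters are hypothetical. The recoding step is also simplified, lumping every allele not seen in any typed individual, which stands in for the parent-offspring transmission test described above.

```python
from itertools import product

SUPER = "S"  # hypothetical label for the collapsed super allele


def recode_alleles(freqs, observed):
    """Collapse alleles never seen in a typed individual into one super allele.

    Assumption: "observed in any typed individual" is a simplified stand-in
    for the transmission-based criterion in the project description.
    The super allele receives the summed frequency of the lumped alleles.
    """
    recoded = {a: f for a, f in freqs.items() if a in observed}
    lumped = sum(f for a, f in freqs.items() if a not in observed)
    if lumped > 0:
        recoded[SUPER] = lumped
    return recoded


def phase_configurations(genotypes):
    """Enumerate ordered haplotype pairs consistent with unphased genotypes.

    Each heterozygous marker can be oriented two ways, so h heterozygous
    markers yield 2**h ordered configurations.
    """
    het = [i for i, (a, b) in enumerate(genotypes) if a != b]
    configs = []
    for flips in product((False, True), repeat=len(het)):
        flip_at = dict(zip(het, flips))
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotypes):
            if flip_at.get(i, False):
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        configs.append((tuple(hap1), tuple(hap2)))
    return configs


def overlapping_windows(n_markers, size, overlap):
    """Split marker indices into windows that share `overlap` markers."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    windows = []
    start = 0
    while True:
        end = min(start + size, n_markers)
        windows.append(list(range(start, end)))
        if end == n_markers:
            break
        start += step
    return windows


if __name__ == "__main__":
    # Four alleles, only two observed: the other two merge into "S".
    print(recode_alleles({"1": 0.4, "2": 0.3, "3": 0.2, "4": 0.1}, {"1", "2"}))
    # Two heterozygous markers out of three: 2**2 phase configurations.
    print(len(phase_configurations([("A", "G"), ("C", "C"), ("T", "A")])))
    # Ten markers in windows of four, adjacent windows sharing two markers.
    print(overlapping_windows(10, 4, 2))
```

In this toy case the recoding cuts the alleles at the locus from four to three, so the number of unordered genotypes an untyped individual must be assigned falls from ten to six. The shared markers between adjacent windows are what would let short-range haplotypes be matched up when building longer ones.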