Progress 03/15/20 to 03/14/21
Outputs Target Audience:The target audience is primarily animal genomics researchers. Collectively our work will increase the power of gene mapping studies, facilitate gene, haplotype, pathway and network-based GWAS, improve genome annotation, provide functional evidence to aid in prioritization of candidate genes, and expedite the identification of functional alleles in the horse. The Camoco framework developed in Objective 4 will be of use to genomics researchers across species. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? • Development of computational skills. Objective 1 was developed into the PhD thesis of a boarded large animal internist, Dr. Sian Durward-Akhurst. During this project, through classes, mentoring, and working with other students, Dr. Durward-Akhurst has developed the necessary computational skills to produce and analyze the 30TB of data that make up this project. • Presentation skills. Dr. Durward-Akhurst has presented these data at several conferences and during our College of Veterinary Medicine graduate student seminar series. • Collaboration. To gain additional cases for these 11 diseases and additional diseases to investigate in the future, Dr. Durward-Akhurst has worked with horse owners and veterinarians. • Grant writing skills. Dr. Durward-Akhurst assisted with and was mentored in writing additional grants to support the extra cases included in this analysis, as well as 3 grants as principal investigator and an additional 5 grants as co-investigator to follow up the variants identified in horses with AFIB in 600 Standardbred and 600 Thoroughbred racehorses. • Mentoring. Dr. Durward-Akhurst co-mentored a DVM summer scholar for 2 summers on part of the disease-causing variant identification work for equine myotonia. Dr. Durward-Akhurst also co-mentored 2 DVM summer scholar students on: 1) the work exploring the AFIB variants; and 2) the estimation of the false positive rate of the genetic burden variants in 2020. Both students have elected to continue in our lab. One will continue working on the AFIB project and the other is developing the preliminary data required to start developing a genetic variation catalog to improve conservation efforts for endangered raptors. • Additional training. Dr. Durward-Akhurst has received a Morris Animal Foundation fellowship to support her continuing investigation of the AFIB variants. 
The success of this work has led to 2 clinician-scientist position offers at well-respected veterinary schools and a job interview at a 3rd well-respected veterinary school. • PhD student Mr. Jonah Cullen attended the Rocky Mountain Genomics HackCon at the University of Colorado Boulder (July 2019). • Mr. Cullen worked within a small team on a web tool for improving functional annotation via co-expression networks. • Attendance at the RMGHC provided Mr. Cullen the opportunity to meet and network with both peers and senior investigators from a diverse range of biological and computational backgrounds. • Mr. Cullen has led the expansion of the biweekly hacky hour to include researchers throughout the veterinary medicine college. • Mr. Cullen will attend the University of Washington's 7th Summer Institute in Statistics for Big Data (SISBID), which will provide the opportunity to gain additional skills in big data visualization and machine learning, as well as to meet and network (virtually) with peers. How have the results been disseminated to communities of interest? Presentations: Sian Durward-Akhurst: Invited presentations • Durward-Akhurst SA. The genetics of cardiac arrhythmias. University of Minnesota Equine Center Research symposium, MN, USA. January 2021. • Durward-Akhurst SA. Cardiology for the horse: a day-to-day approach. 1st Congresso Internacional de Residencia em Medicina Veterinaria, Universidade Federal de Minas Gerais, Brazil. September 2020. Scientific presentations • Durward-Akhurst SA, Mickelson JR, Stauthammer C, McCue ME. Genetic bases of cardiac arrhythmias in the horse. Equine Cardiology Retreat, PA, USA. June 2020. • Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. The frequency of loss of function variants in the equine population. Equine and NRSP8 workshops, Plant and Animal Genome Conference, CA, USA. January 2020. Abstract presentations • Durward-Akhurst SA, Schaefer RJ, Grantham B, Carey K, Mickelson JR, McCue ME. 
The frequency of phenotype associated variants in the equine population. Dorothy Havemeyer Equine Genetics Research Retreat, NY, USA. February 2021. Poster presentations • Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Loss of function variants in the equine population. Plant and Animal Genome Conference, CA, USA. January 2020. Jonah Cullen: • Cullen JN, Schaefer RJ, Durward-Akhurst SA, Mickelson JR, McCue ME. Assessing the impact of sequencing platform on transcriptome assembly, differential expression, and variant discovery in the horse (poster), Plant and Animal Genome Conference, San Diego CA (2020). Rob Schaefer: • Schaefer RJ, Beeson SK, Durward-Akhurst SA, Grantham B, Carey K, Mickelson JR, McCue ME. Whole Genome Imputation in the horse. Plant and Animal Genome Conference, CA, USA. January 2020. Summer Scholar students: • Springer K, Durward-Akhurst SA, Mickelson JR, McCue ME. Investigating loss of function variants in the general equine population. Student Chapter of the American Veterinary Medicine Association Congress. March 2021. • Adam E, Durward-Akhurst SA, Mickelson JR, McCue ME. Investigating and validation of the genetic bases of cardiac arrhythmias including atrial fibrillation in Standardbreds. Boehringer Ingelheim-NIH National Veterinary Scholars Symposium. August 2020. • Springer K, Durward-Akhurst SA, Mickelson JR, McCue ME. Investigating loss of function variants in the general equine population. Boehringer Ingelheim-NIH National Veterinary Scholars Symposium. August 2020. • Mahoney KC, Tate NM, Wanner NM, Durward-Akhurst SA, McCue ME, Mickelson JR, Friedenberg SG, Furrow E. The use of computational (in silico) tools to predict pathogenicity of missense variants in the horse. Boehringer Ingelheim-NIH National Veterinary Scholars Symposium. August 2020. High school students: • Nadkarni I, Durward-Akhurst SA, McCue ME. Computational validation of putative arrhythmia causing variants. Minnesota High School Research Symposium. 
February 2021. What do you plan to do during the next reporting period to accomplish the goals? Objective 1. • Presentations - we will continue to present our data on the variant catalog and highlight the utility of precision medicine approaches for veterinary species. • Genetic testing guidelines - we will write up guidelines and recommendations for producing reliable genetic tests that horse owners and veterinarians can use and interpret without concerns about the variants being false positives. • Structural variant analysis - the structural variant results will be analyzed to establish the best way to determine the overlap between the identification tools and written up as the first large scale study of structural variants from whole genome sequencing in the horse. • Disease-causing variant analysis - the final prioritization and ranking of the variants will be performed and then grants submitted to follow up on these variants and determine the true disease-causing variants. The atrial fibrillation follow-up is already funded and we expect to produce a publication in the next 18 months. We have successfully recruited an excellent PhD student who will start working on the idiopathic renal hematuria variants. This will likely make up her PhD thesis. Objective 2. • Sustain development of the computational tools and publish pipelines and imputation results. Objective 3. • Pipeline construction and comparisons between callers. • Finalize and containerize complete indexing and mapping pipelines. • Prepare RNA for sequencing from 200+ tissues. • RNA sequencing on remaining tissues. • Generate tissue-specific networks as part of the equine expression atlas. • Develop methods to build cross-tissue networks. • Present preliminary results at general and equine-specific conferences.
Impacts What was accomplished under these goals?
Objective 1. WGS were collected from 534 horses of 44 breeds, including ≥15 horses from each of 10 target breeds (Arabian, Belgian, Clydesdale, Icelandic horse, Morgan, Quarter Horse [QH], Shetland pony, Standardbred, Thoroughbred, Welsh Pony) that represent the major groupings of genetic diversity in domestic horses. The WGS were mapped to the equine reference genome. SNPs and small structural variants: Single nucleotide polymorphisms (SNPs) and small structural variants were identified using a modified version of the GATK best practices pipeline. Bcftools and GATK HaplotypeCaller were used to identify the variants and the intersect was used for downstream analysis. ANNOVAR and SnpEff were used to predict the functional effect of the variants and then all high impact variants (high by both variant effect predictors, or high by one and moderate by the other) were summed to give the genetic burden in the population and within the target breeds. The number of variants identified was associated with the depth of coverage, and therefore estimated marginal means (EMMEANs) accounting for depth of coverage were used for the analysis. The total genetic burden corrected for the estimated false positive rate is 5,807 variants, with each horse carrying on average 846 genetic burden variants (range: 213 - 1,193). Our analysis of the 10 target breeds showed significant differences in the genetic burden between breeds (p <0.001), with the highest average genetic burden in Icelandic horses (755 variants/horse) and lowest in Thoroughbreds (585 variants/horse). Reported locations of causative and associated variants for equine phenotypes were extracted from the Online Mendelian Inheritance in Animals catalogue (https://omia.org/home/). There were a reported 34 disease-causing, 68 disease-associated, 50 non-disease-causing, and 4 non-disease-associated variants. We identified between 34% and 91% of these variants in our cohort. 
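The genetic burden rule described above (high impact by both predictors, or high by one and moderate by the other) can be sketched as follows. This is only an illustration: it assumes the ANNOVAR and SnpEff annotations have already been normalized to a shared set of impact labels, and the function names and input layout are hypothetical rather than taken from the project's pipeline.

```python
from collections import Counter

# Assumed normalized impact labels; the real annotation vocabularies of the
# two predictors differ and would need to be mapped onto these first.
HIGH, MODERATE = "high", "moderate"


def is_burden_variant(impact_a: str, impact_b: str) -> bool:
    """Return True if the variant is high by both predictors,
    or high by one and moderate by the other."""
    pair = {impact_a.lower(), impact_b.lower()}
    return pair == {HIGH} or pair == {HIGH, MODERATE}


def burden_per_horse(calls):
    """Count burden variants per horse.

    calls: iterable of (horse_id, impact_a, impact_b) tuples, one per
    variant carried by that horse (hypothetical input layout).
    """
    counts = Counter()
    for horse_id, impact_a, impact_b in calls:
        if is_burden_variant(impact_a, impact_b):
            counts[horse_id] += 1
    return counts
```

Depth-of-coverage adjustment and the false positive correction mentioned in the text are not modeled here; the sketch covers only the classification and counting step.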
Large structural variants: Structural variants were identified using BreakDancer (insertions, deletions, duplications, inversions, and intra-chromosomal translocations [SVs] and inter-chromosomal translocations [CTXs]), cn.MOPS (copy number variants [CNVs]), DELLY (SVs and CTXs), and GenomeSTRiP (SVs and CNVs) with each tool's base settings. Python was used to extract the exact intersect (based on matching break points at the 5' and 3' ends of the variant), the union, and the 50% overlap (CNVs and SVs) of the variant callers for each structural variant type. Between 288,802 and 14,842,349 SVs, 1,120,501 and 10,019,966 CTXs, and 135,988 and 370,220 CNVs were identified. Analysis of the overlap between structural variant callers is ongoing and a publication is in preparation, with submission expected by the end of this year. We have identified and performed WGS (12-20x coverage) on 28 horses representing 11 highly detrimental genetic diseases in the horse that are analogous to human Mendelian diseases. Variants were prioritized based on: 1) their frequency in the catalog of equine genetic variation that we developed (cut-offs of minor allele frequency ≤5% and ≤ the Mendelian disease variant frequency expected under Hardy-Weinberg equilibrium [q]); 2) their presence in genes reported to be intolerant to damaging variants; 3) being computationally predicted to have a high impact on phenotype; and 4a) presence in all disease cases (AFIB, IRH, myotonia); or 4b) following a recessive, de novo, or dominant inheritance pattern in the offspring compared to the parent (alopecia areata, microphthalmia). Objective 2. See 2020 report for more details. Objective 3. As part of objective 3, we have developed two publicly available containerized workflows: 1) IndexForTheFuture (https://github.com/jonahcullen/IndexForTheFuture), and 2) RNAMapping (https://github.com/UMN-EGGL/RNAMapping). 
IndexForTheFuture was designed to ensure the stable and reproducible generation of reference genome indices directly from the NCBI and Ensembl servers to be used for mapping RNA-seq data. We are currently improving this pipeline to run entirely within our publicly available Docker container (https://hub.docker.com/repository/docker/jonahcullen/ec3index). This pipeline generates both STAR and Salmon genome indices for any specified NCBI or Ensembl annotation release. The RNAMapping pipeline is capable of processing RNA-seq data regardless of platform or sequencing strategy from raw FASTQs through the generation of a multi-sample, non-redundant transcriptome (i.e. improving the physical annotation) and transcript/gene-level quantification. Moreover, the output from this workflow may be used to improve functional annotation via generation of tissue-specific co-expression networks. Seven additional horses were sacrificed and tissues collected in fall of 2020 after COVID delays. Based on the previously RNA-sequenced tissue samples and available frozen tissue sets, we designed a strategy to maximize the number and diversity of tissues to sequence. RNA is currently being isolated, and mRNA and small RNA libraries will be generated and sequenced in summer 2021.
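Returning to the structural variant comparison under Objective 1, the exact-intersect and 50% overlap criteria between callers could look roughly like the sketch below. The report does not say whether the 50% overlap was reciprocal; reciprocity, the half-open coordinate convention, and the `SV` record type are all assumptions made for illustration.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical minimal structural variant record."""
    chrom: str
    start: int   # assumed 0-based, inclusive
    end: int     # assumed exclusive
    svtype: str  # e.g. "DEL", "DUP", "INV"


def exact_match(a: SV, b: SV) -> bool:
    """Exact intersect: same chromosome, type, and both break points."""
    return a == b


def reciprocal_overlap(a: SV, b: SV, min_frac: float = 0.5) -> bool:
    """True if the shared interval covers at least min_frac of BOTH calls.

    Reciprocity is an assumption; a one-sided 50% rule would test only one
    of the two length conditions.
    """
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return False
    return (shared >= min_frac * (a.end - a.start)
            and shared >= min_frac * (b.end - b.start))
```

Requiring the overlap to hold in both directions prevents a very large call from one tool from "matching" many small calls from another, which matters when per-tool call counts differ by orders of magnitude as reported above.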
Publications
- Type: Journal Articles. Status: Published. Year Published: 2020. Citation: Hisey EA, Hermans H, Lounsberry Z, Avila F, Knickelbein K, Durward-Akhurst SA, McCue ME, Kalbfleisch T, Lassaline ME, Back W, Bellone RR. (2020). Whole genome sequencing identified a 16 kilobase deletion on ECA13 associated with distichiasis in Friesian horses. BMC Genomics. 21, 848.
- Type: Journal Articles. Status: Submitted. Year Published: 2021. Citation: Durward-Akhurst SA, Schaefer RJ, Grantham B, Carey WK, Mickelson JR, McCue ME. Genetic variation and the distribution of variant types in the horse. Submitted to Genome Research April 2021.
- Type: Journal Articles. Status: Submitted. Year Published: 2021. Citation: Durward-Akhurst SA, Schaefer RJ, Springer K, Grantham B, Carey WK, Mickelson JR, McCue ME. The genetic burden and frequency of disease-associated variants in the equine population. Submitted to Nature Genetics April 2021.
Progress 03/15/19 to 03/14/20
Outputs Target Audience:The target audience is primarily animal genomics researchers. Collectively our work will increase the power of gene mapping studies, facilitate gene, haplotype, pathway and network-based GWAS, improve genome annotation, provide functional evidence to aid in prioritization of candidate genes, and expedite the identification of functional alleles in the horse. The Camoco framework developed in Objective 4 will be of use to genomics researchers across species. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? • Development of computational skills. Objective 1 was developed into the PhD thesis of a boarded large animal internist, Dr. Sian Durward-Akhurst. During this project, through classes, mentoring, and working with other students, Dr. Durward-Akhurst has developed the necessary computational skills to produce and analyze the 30TB of data that make up this project. • Presentation skills. Dr. Durward-Akhurst has presented these data at several conferences and during our College of Veterinary Medicine graduate student seminar series. • Collaboration. To gain additional cases for these 10 diseases and additional diseases to investigate in the future, Dr. Durward-Akhurst has worked with horse owners and veterinarians. • Grant writing skills. Dr. Durward-Akhurst assisted with and was mentored in writing additional grants to support the extra cases included in this analysis, as well as 3 grants as principal investigator to follow up the variants identified in horses with AFIB in 600 Standardbred racehorses. • Mentoring. Dr. Durward-Akhurst co-mentored a DVM summer scholar for 2 summers on part of the disease-causing variant identification work for equine myotonia. Dr. Durward-Akhurst is also co-mentoring 3 new students on submitting summer scholar proposals to work on the follow-up to the AFIB disease-causing variants. • Postdoctoral student Dr. Robert Schaefer led a hack-a-thon team at the Rocky Mountain Genome Hack-a-thon where we implemented additional features in COB. Results from these tools were presented at several meetings and conferences (below). • PhD student Mr. Jonah Cullen attended the Rocky Mountain Genomics HackCon at the University of Colorado Boulder (July 2019). • Mr. Cullen worked within a small team on a web tool for improving functional annotation via co-expression networks. • Attendance at the RMGHC provided Mr. 
Cullen the opportunity to meet and network with both peers and senior investigators from a diverse range of biological and computational backgrounds. How have the results been disseminated to communities of interest? Publications: All publications in preparation are listed in the text above. Presentations: Rob Schaefer: • (Invited Talk) UCD School of Agriculture and Food Science, Dublin Ireland. Integrating GWAS SNPs with CoExpression Networks to Prioritize Candidate Genes in Complex Traits • (hack-a-thon) Rocky Mountain Hack-a-thon. Boulder, CO. Team Lead: COB. • (poster) Maize Genetics Conference, St. Louis, MO. The transcriptional landscape of diverse maize genotypes • (poster) Plant and Animal Genomics Conference, San Diego, CA. Cloud Scalable Computational Tools for the Horse Genome • (talk/demo) Havemeyer Horse Genetics Workshop, Pavia, Italy. Processing tens of millions of genotypes with HapDab and analyzing tissue specific gene co-expression networks with Camoco. • (hack-a-thon) Equine Gene Annotation Hack-Con. Host. St. Paul, MN. • (hack-a-thon) Rocky Mountain Hack-a-thon. Boulder, CO. • (hack-a-thon) Mozilla Global Code Sprint. Host. St. Paul, MN. • (Software demo) Plant and Animal Genome Conference, San Diego, CA. Identifying High Priority Candidate Genes from GWAS using Co-Expression Networks. • (Invited Talk) Bio5 Institute and Cyverse. Tucson AZ. Using Co-expression networks to unravel gene function in agricultural species. Sian Durward-Akhurst: • Durward-Akhurst SA. Genetic variation and the frequency of deleterious mutations (genetic burden) in healthy horses. University of Minnesota Supercomputing Institute seminar, Minneapolis, MN. October 2018. Scientific presentations • Durward-Akhurst SA. Using the genome to improve disease diagnosis. ACVIM Forum, Phoenix, Arizona, USA. June 2019. • Durward-Akhurst SA. Identification of disease-causing variants for pituitary dwarfism in Quarter Horses. 
Genofling, University of Minnesota Genomics Center, Minneapolis, USA. April 2019. Finalist Genopitch. • Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Loss of function variants in the equine population. Equine and NRSP8 workshops, Plant and Animal Genome Conference, CA, USA. January 2020. • Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Tools for Precision Medicine in the Horse. AVMA/AVMF Young Investigator Award competition, Worcester State University, MA, USA. July 2019. Finalist. • Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Genetic variation and the frequency of deleterious mutations (genetic burden) in healthy horses. Lightning talk: Plant and Animal Genome Conference, San Diego, USA. January 2019. • Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Loss of function variants in the equine population. Plant and Animal Genome Conference, CA, USA. January 2020. • Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Loss of function variants in the equine population. Calgary International Equine Symposium: Innovation and Discovery, Calgary, AB, Canada. September 2019. • Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Genetic variation and the frequency of deleterious mutations (genetic burden) in healthy horses. ACVIM Forum, Phoenix, Arizona, USA. June 2019. • Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Genetic variation and the frequency of deleterious mutations (genetic burden) in healthy horses. Plant and Animal Genome Conference, San Diego, USA. January 2019. • Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Genetic variation and the frequency of deleterious mutations (genetic burden) in healthy horses. Calgary International Equine Symposium: Innovation and Discovery, Calgary, AB, Canada. September 2018. 1st place Graduate Student Poster award. • Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Single nucleotide polymorphism in the equine population. 
Minnesota Supercomputing Institute research day, University of Minnesota, Minneapolis, MN, USA. April 2018. 2nd place Biological and Medical Sciences category. Jonah Cullen: • Cullen JN, Schaefer RJ, Beeson S, Mickelson JR, McCue ME. RNA-seq driven gene annotation in the horse (poster), Plant and Animal Genome Conference, San Diego CA (2018). • Cullen JN, Schaefer RJ, Mickelson JR, McCue ME. RNA-seq workflow for improved physical and functional annotation of the equine genome (poster), Calgary International Equine Symposium, Calgary Alberta (2019). • Cullen JN, Schaefer RJ, Durward-Akhurst SA, Mickelson JR, McCue ME. Assessing the impact of sequencing platform on transcriptome assembly, differential expression, and variant discovery in the horse (poster), Plant and Animal Genome Conference, San Diego CA (2020). Summer Scholar students: • Ellingson L, Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Identification of genetic variants associated with myotonia in the horse. Morris Animal Foundation Scientific Advisory Board Meeting, Denver, Colorado, USA. June 2018. 2nd place Exemplary Summer Scholar award. • Ellingson L, Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Identification of genetic variants associated with myotonia in the horse. Boehringer Ingelheim-NIH National Veterinary Scholars Symposium, National Institutes of Health MSU, USA. August 2017. What do you plan to do during the next reporting period to accomplish the goals? Objective 1. • Structural variant analysis - the structural variant results will be analyzed and written up as the first large scale study of structural variants from whole genome sequencing in the horse. • Disease-causing variant analysis - the final prioritization and ranking of the variants will be performed and then grants submitted to follow up on these variants and determine the true disease-causing variants. 
• Publications on the: 1) genetic variation in the equine population; 2) frequency of previously reported disease-causing variants in the equine population; and 3) improvement in number of variants identified with high allele frequencies between EquCab2 and EquCab3 will be written and submitted. Objective 2. • Sustain development of the computational tools and publish pipelines and imputation results. Objective 3. • In Q1 2020, we plan to sample the 20 tissue types from 6 horses and send to the sequencing core for library preparation and sequencing. RNA-seq data will be processed, assembled, and analyzed in Q2 and Q3, followed by the initial manuscript submission in Q4. • Website development to begin in Q4 and finish in 2021.
Impacts What was accomplished under these goals?
Objective 1. WGS were collected from 534 horses of 49 breeds, including ≥15 horses from each of 10 target breeds (Arabian, Belgian, Clydesdale, Icelandic horse, Morgan, Quarter Horse [QH], Shetland pony, Standardbred, Thoroughbred, Welsh Pony) that represent the major groupings of genetic diversity in domestic horses. The WGS were mapped to the equine reference genome, and single nucleotide polymorphisms (SNPs) and small structural variants identified using a modified version of the GATK best practices pipeline. Bcftools and GATK-haplotype caller were used to identify the variants and the intersect used for downstream analysis. ANNOVAR and SnpEff were used to predict the functional effect of the variants and then all high impact variants (high by both variant effect predictors, or high by one and moderate by the other) were summed to give the genetic burden in the population and within the target breeds. 29,882,273 variants were identified; these included 28,273,058 SNPs and 1,609,215 small structural variants. The total genetic burden is 8,683 variants with each horse carrying on average 2,409 genetic burden variants (range 538 - 3,674). Our analysis of the 10 target breeds showed significant differences in the genetic burden between breeds (p <0.001), with the highest average genetic burden in Icelandic horses (2,906 variants/horse) and lowest in Morgans (2,037 variants/horse). This provides the first large-scale catalog of genetic variation and estimation of the genetic burden in healthy horses. Larger structural variants have been called with Breakdancer, cn.MOPS, Delly, and GenomeSTRiP and analysis is ongoing. Publications on the: 1) genetic variation in the equine population; 2) frequency of previously reported disease-causing variants in the equine population; and 3) improvement in number of variants identified with high allele frequencies between EquCab2 and EquCab3 are in progress. 
We have identified and performed WGS (12-20x coverage) on 24 horses representing 10 highly detrimental genetic diseases in the horse that are analogous to human Mendelian diseases. These diseases include: 5 skeletal muscle diseases (dystrophic and non-dystrophic myotonia, eosinophilic myositis, malignant hyperthermia (MH) without the RYR1 C7360G allele, and hyperkalemic periodic paralysis (HYPP) without the SCN4A F1416L allele); alopecia areata, which causes severe hair loss and must be managed with corticosteroids that can lead to laminitis; hemochromatosis, an iron storage disease that leads to liver failure; complete bilateral absence of the vas deferens causing stallion infertility; paroxysmal atrial fibrillation (AFIB), the most common pathologic arrhythmia in horses, which may contribute to sudden cardiac death; idiopathic renal hematuria (IRH), a life-threatening disease of Arabians; and microphthalmia, a performance-limiting disease of Sport Horses. WGS has been mapped to the reference genome and alleles identified using the pipeline developed in objective 1. Variants have been prioritized based on the reported disease prevalence (AFIB) or a presumed disease prevalence of 1% given that the diseases are considered rare in the general horse population and assuming that they follow recessive inheritance patterns. Variants with an allele frequency in the catalog of genetic variation greater than expected based on disease prevalence were excluded. Variants were then further prioritized based on: 1) their presence in genes reported to be intolerant to damaging variants; 2) being computationally predicted to have a high impact on phenotype; and 3) presence in all disease cases (AFIB, IRH, myotonia); or 4) following a recessive, de novo, or dominant inheritance pattern in the offspring compared to the parent (alopecia areata, microphthalmia). Final ranking using the American College of Medical Genetics guidelines for disease-causing variant prioritization is ongoing. 
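The allele frequency cut-off implied by a presumed prevalence can be worked out from Hardy-Weinberg equilibrium: for a fully penetrant recessive disease, prevalence equals q squared, so the expected allele frequency is q = sqrt(prevalence), which gives q = 0.1 for the presumed 1% prevalence. The sketch below illustrates that arithmetic only; full penetrance and a single causal allele are simplifying assumptions, and the function names are hypothetical.

```python
import math


def max_allele_frequency_recessive(prevalence: float) -> float:
    """Expected disease-allele frequency q under Hardy-Weinberg equilibrium
    for a fully penetrant recessive disease, where prevalence = q**2."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    return math.sqrt(prevalence)


def passes_frequency_filter(catalog_af: float, prevalence: float) -> bool:
    """Keep a variant only if its frequency in the variant catalog does not
    exceed the frequency expected from the disease prevalence."""
    return catalog_af <= max_allele_frequency_recessive(prevalence)
```

For example, with a presumed prevalence of 0.01 a variant seen at 5% in the catalog would be retained, while one seen at 20% would be excluded.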
The long-term goal is to validate the likely variants by sequencing them in a larger population of individuals to ensure segregation of the allele with the disease phenotype, and to develop genetic tests for easy disease diagnosis and identification of carriers to assist breeders with decreasing the frequency of clinical disease in their herd. Objective 2. A comprehensive and reproducible computational pipeline was designed to facilitate the creation of a genotype imputation and haplotype reference panel in the horse. Segments of the pipeline were defined using the workflow management system 'Snakemake', and the computational environment was containerized using both Docker and Singularity. Containers are publicly available on both DockerHub and the Singularity Container Library. Source code to build the pipeline is available at our public laboratory git repository (https://github.com/UMN-EGGL). Whole genome sequences from 549 horses were mapped to EquCab3, and both GATK HaplotypeCaller and bcftools 'mpileup' were used to produce joint genotyping calls across every base pair in the genome. SNPs from both callers were assessed for baseline quality control and variant quality scores were recalibrated using 6 features shared between the GATK and bcftools variant callers. Gaussian mixture models were fit for both callers using training data derived from SNP sets defined from the recent MNEc2M SNP genotyping array. Using a true positive probability threshold of 99.5%, recalibrated scores were used to combine SNPs from GATK and bcftools into a combined set of 20.8M SNPs representing a high-confidence, whole-genome SNP reference panel for the horse (WGSNP). Variants were phased in 50Kb windows using Beagle 5.1. Imputation accuracies from MNEc2M to WGSNP were estimated using leave-one-out cross-validation, resulting in between 85% and 99.8% accuracy under naïve imputation scenarios in over 11 horse breeds. Objective 3. 
As part of objective 3, we have developed two publicly available containerized workflows: 1) IndexForTheFuture (https://github.com/jonahcullen/IndexForTheFuture), and 2) RNAMapping (https://github.com/UMN-EGGL/RNAMapping). IndexForTheFuture was designed to ensure the stable and reproducible generation of reference genome indices directly from the NCBI and Ensembl servers to be used for mapping RNA-seq data. The RNAMapping workflow is capable of processing RNA-seq data regardless of platform or sequencing strategy from raw FASTQs through the generation of a multi-sample, non-redundant transcriptome (i.e. improving the physical annotation) and transcript/gene-level quantification. Moreover, the output from this workflow may be used to improve functional annotation via generation of tissue-specific co-expression networks. Through containerization, usage of these workflows removes the variability associated with both lab-specific computational infrastructure (e.g. differences in computing platforms from equine research groups at various universities) and genome release. Additionally, we have begun analyzing the differences in the physical annotation (via transcriptomes), differential expression, and functional annotation (via co-expression networks) observed between samples sequenced on different platforms. Expected results will serve to better inform how to consider platform-specific variance and merge data from multiple sources in the continued development of the equine tissue expression atlas. In addition to data resources, several major contributions were made to general purpose bioinformatics computational tools. Major features were implemented for storing biological data using Minus80, an open source bioinformatics Python library. Minus80 has had 14 minor releases and 1 major release throughout the duration of this grant.
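As an illustration of the leave-one-out accuracy estimate described under Objective 2, the sketch below computes genotype concordance for each held-out horse. It is not the project's pipeline: `impute` is a hypothetical stand-in for an external imputation tool, genotypes are assumed to be coded as 0/1/2 alternate-allele counts, and concordance is just one of several possible accuracy measures.

```python
def genotype_concordance(true_genotypes, imputed_genotypes):
    """Fraction of sites where the imputed genotype equals the held-out
    truth. Sites where either value is None (missing) are skipped."""
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    compared = matches = 0
    for truth, imputed in zip(true_genotypes, imputed_genotypes):
        if truth is None or imputed is None:
            continue
        compared += 1
        matches += int(truth == imputed)
    if compared == 0:
        raise ValueError("no comparable sites")
    return matches / compared


def leave_one_out_accuracy(panel, impute):
    """panel: dict mapping horse_id -> list of genotypes (0/1/2 or None).
    impute: hypothetical callable(reference_panel, horse_id) returning the
    imputed genotype list for the held-out horse."""
    scores = {}
    for horse_id, truth in panel.items():
        # Build the reference panel without the horse being evaluated.
        reference = {h: g for h, g in panel.items() if h != horse_id}
        scores[horse_id] = genotype_concordance(truth, impute(reference, horse_id))
    return scores
```

Holding out each horse in turn keeps the evaluated animal's own haplotypes out of the reference panel, so the score reflects how well the panel imputes an unseen individual.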
Publications
Progress 03/15/18 to 03/14/19
Outputs Target Audience:The target audience is primarily animal genomics researchers. Collectively our work will increase the power of gene mapping studies, facilitate gene, haplotype, pathway and network-based GWAS, improve genome annotation, provide functional evidence to aid in prioritization of candidate genes, and expedite the identification of functional alleles in the horse. The Camoco framework developed in Objective 4 will be of use to genomics researchers across species. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? • Development of computational skills. Objective 1 was developed into the PhD thesis of a boarded large animal internist, Dr. Sian Durward-Akhurst. During this project, through classes, mentoring, and working with other students, Dr. Durward-Akhurst has developed the necessary computational skills to produce and analyze the 30TB of data that make up this project. • Presentation skills. Dr. Durward-Akhurst has presented these data at several conferences and during our College of Veterinary Medicine graduate student seminar series. • Collaboration. To gain additional cases for these 10 diseases and additional diseases to investigate in the future, Dr. Durward-Akhurst has worked with horse owners and veterinarians. • Grant writing skills. Dr. Durward-Akhurst assisted with and was mentored in writing additional grants to support the extra cases included in this analysis, as well as 3 grants as principal investigator to follow up the variants identified in horses with AFIB in 600 Standardbred racehorses. • Mentoring. Dr. Durward-Akhurst co-mentored a DVM summer scholar for 2 summers on part of the disease-causing variant identification work for equine myotonia. Dr. Durward-Akhurst is also co-mentoring 3 new students on submitting summer scholar proposals to work on the follow-up to the AFIB disease-causing variants. • Postdoctoral student Dr. Robert Schaefer led a hack-a-thon team at the Rocky Mountain Genome Hack-a-thon where we implemented additional features in COB. Results from these tools were presented at several meetings and conferences (below). • PhD student Mr. Jonah Cullen attended the Rocky Mountain Genomics HackCon at the University of Colorado Boulder (July 2019). • Mr. Cullen worked within a small team on a web tool for improving functional annotation via co-expression networks. • Attendance at the RMGHC provided Mr. 
Cullen the opportunity to meet and network with both peers and senior investigators from a diverse range of biological and computational backgrounds. How have the results been disseminated to communities of interest? Publications: All publications in preparation are listed in the text above. Presentations: Rob Schaefer: (Invited Talk) UCD School of Agriculture and Food Science, Dublin, Ireland. Integrating GWAS SNPs with CoExpression Networks to Prioritize Candidate Genes in Complex Traits. (hack-a-thon) Rocky Mountain Hack-a-thon. Boulder, CO. Team Lead: COB. (poster) Maize Genetics Conference, St. Louis, MO. The transcriptional landscape of diverse maize genotypes. (poster) Plant and Animal Genome Conference, San Diego, CA. Cloud Scalable Computational Tools for the Horse Genome. (talk/demo) Havemeyer Horse Genetics Workshop, Pavia, Italy. Processing tens of millions of genotypes with HapDab and analyzing tissue-specific gene co-expression networks with Camoco. (hack-a-thon) Equine Gene Annotation Hack-Con. Host. St. Paul, MN. (hack-a-thon) Rocky Mountain Hack-a-thon. Boulder, CO. (hack-a-thon) Mozilla Global Code Sprint. Host. St. Paul, MN. (Software demo) Plant and Animal Genome Conference, San Diego, CA. Identifying High Priority Candidate Genes from GWAS using Co-Expression Networks. (Invited Talk) Bio5 Institute and Cyverse. Tucson, AZ. Using Co-expression networks to unravel gene function in agricultural species. Sian Durward-Akhurst: Durward-Akhurst SA. Genetic variation and the frequency of deleterious mutations (genetic burden) in healthy horses. University of Minnesota Supercomputing Institute seminar, Minneapolis, MN. October 2018. Scientific presentations: Durward-Akhurst SA. Using the genome to improve disease diagnosis. ACVIM Forum, Phoenix, Arizona, USA. June 2019. Durward-Akhurst SA. Identification of disease-causing variants for pituitary dwarfism in Quarter Horses. Genofling, University of Minnesota Genomics Center, Minneapolis, USA. April 2019.
Finalist Genopitch. Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Loss of function variants in the equine population. Equine and NRSP8 workshops, Plant and Animal Genome Conference, CA, USA. January 2020. Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Tools for Precision Medicine in the Horse. AVMA/AVMF Young Investigator Award competition, Worcester State University, MA, USA. July 2019. Finalist. Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Genetic variation and the frequency of deleterious mutations (genetic burden) in healthy horses. Lightning talk: Plant and Animal Genome Conference, San Diego, USA. January 2019. Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Loss of function variants in the equine population. Plant and Animal Genome Conference, CA, USA. January 2020. Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Loss of function variants in the equine population. Calgary International Equine Symposium: Innovation and Discovery, Calgary, AB, Canada. September 2019. Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Genetic variation and the frequency of deleterious mutations (genetic burden) in healthy horses. ACVIM Forum, Phoenix, Arizona, USA. June 2019. Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Genetic variation and the frequency of deleterious mutations (genetic burden) in healthy horses. Plant and Animal Genome Conference, San Diego, USA. January 2019. Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Genetic variation and the frequency of deleterious mutations (genetic burden) in healthy horses. Calgary International Equine Symposium: Innovation and Discovery, Calgary, AB, Canada. September 2018. 1st place Graduate Student Poster award. Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Single nucleotide polymorphism in the equine population. Minnesota Supercomputing Institute research day, University of Minnesota, Minneapolis, MN, USA. April 2018.
2nd place, Biological and Medical Sciences category. Jonah Cullen: Cullen JN, Schaefer RJ, Beeson S, Mickelson JR, McCue ME. RNA-seq driven gene annotation in the horse (poster), Plant and Animal Genome Conference, San Diego CA (2018). Cullen JN, Schaefer RJ, Mickelson JR, McCue ME. RNA-seq workflow for improved physical and functional annotation of the equine genome (poster), Calgary International Equine Symposium, Calgary Alberta (2019). Cullen JN, Schaefer RJ, Durward-Akhurst SA, Mickelson JR, McCue ME. Assessing the impact of sequencing platform on transcriptome assembly, differential expression, and variant discovery in the horse (poster), Plant and Animal Genome Conference, San Diego CA (2020). Summer Scholar students: Ellingson L, Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Identification of genetic variants associated with myotonia in the horse. Morris Animal Foundation Scientific Advisory Board Meeting, Denver, Colorado, USA. June 2018. 2nd place Exemplary Summer Scholar award. Ellingson L, Durward-Akhurst SA, Schaefer RJ, Mickelson JR, McCue ME. Identification of genetic variants associated with myotonia in the horse. Boehringer Ingelheim-NIH National Veterinary Scholars Symposium, National Institutes of Health MSU, USA. August 2017. What do you plan to do during the next reporting period to accomplish the goals? Objective 1. Structural variant analysis - the structural variant results will be analyzed and written up as the first large-scale study of structural variants from whole genome sequencing in the horse. Disease-causing variant analysis - the final prioritization and ranking of the variants will be performed, and grants will then be submitted to follow up on these variants and determine the true disease-causing variants.
Publications on the: 1) genetic variation in the equine population; 2) frequency of previously reported disease-causing variants in the equine population; and 3) improvement in number of variants identified with high allele frequencies between EquCab2 and EquCab3 will be written and submitted. Objective 2. Sustain development of the computational tools and publish pipelines and imputation results. Objective 3. In Q1 2020, we plan to sample the 20 tissue types from 6 horses and send to the sequencing core for library preparation and sequencing. RNA-seq data will be processed, assembled, and analyzed in Q2 and Q3, followed by the initial manuscript submission in Q4. Website development to begin in Q4 and finish in 2021.
Impacts What was accomplished under these goals?
Objective 1. WGS were collected from 534 horses of 49 breeds, including ≥15 horses from each of 10 target breeds (Arabian, Belgian, Clydesdale, Icelandic horse, Morgan, Quarter Horse [QH], Shetland pony, Standardbred, Thoroughbred, Welsh Pony) that represent the major groupings of genetic diversity in domestic horses. The WGS were mapped to the equine reference genome, and single nucleotide polymorphisms (SNPs) and small structural variants were identified using a modified version of the GATK best practices pipeline. Bcftools and GATK HaplotypeCaller were used to identify the variants, and the intersection of the two call sets was used for downstream analysis. ANNOVAR and SnpEff were used to predict the functional effect of the variants, and all high impact variants (high by both variant effect predictors, or high by one and moderate by the other) were then summed to give the genetic burden in the population and within the target breeds. A total of 29,882,273 variants were identified; these included 28,273,058 SNPs and 1,609,215 small structural variants. The total genetic burden is 8,683 variants, with each horse carrying on average 2,409 genetic burden variants (range 538-3,674). Our analysis of the 10 target breeds showed significant differences in the genetic burden between breeds (p < 0.001), with the highest average genetic burden in Icelandic horses (2,906 variants/horse) and the lowest in Morgans (2,037 variants/horse). This provides the first large-scale catalog of genetic variation and estimation of the genetic burden in healthy horses. Larger structural variants have been called with Breakdancer, cn.MOPS, Delly, and GenomeSTRiP, and analysis is ongoing. Publications on the: 1) genetic variation in the equine population; 2) frequency of previously reported disease-causing variants in the equine population; and 3) improvement in number of variants identified with high allele frequencies between EquCab2 and EquCab3 are in progress.
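The genetic burden rule described above can be sketched in a few lines of Python. This is an illustration of the stated rule only (high impact by both predictors, or high by one and moderate by the other), not the code used in the project. It assumes that each predictor's output has already been mapped onto SnpEff-style impact tiers, and all function and variable names are hypothetical.

```python
# Illustrative sketch only: the tier mapping and all names are assumptions,
# not the project's actual pipeline code.
IMPACT_RANK = {"MODIFIER": 0, "LOW": 1, "MODERATE": 2, "HIGH": 3}


def is_burden_variant(impact_a: str, impact_b: str) -> bool:
    """True if high by both predictors, or high by one and moderate by the other."""
    lower, upper = sorted((IMPACT_RANK[impact_a], IMPACT_RANK[impact_b]))
    return upper == IMPACT_RANK["HIGH"] and lower >= IMPACT_RANK["MODERATE"]


def burden_set(calls_a, calls_b, impacts_a, impacts_b):
    """Intersect two callers' variant IDs, then keep the burden variants.

    impacts_a / impacts_b map variant ID -> impact tier from each predictor.
    """
    shared = set(calls_a) & set(calls_b)
    return {v for v in shared if is_burden_variant(impacts_a[v], impacts_b[v])}


def burden_per_horse(carried, burden):
    """Count burden variants per horse; carried maps horse -> variant IDs."""
    return {horse: len(set(variants) & burden) for horse, variants in carried.items()}
```

Averaging the values returned by `burden_per_horse` within a breed would give a per-breed figure comparable in kind to the variants/horse values reported above.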
We have identified and performed WGS (12-20x coverage) on 24 horses representing 10 highly detrimental genetic diseases in the horse that are analogous to human Mendelian diseases. These diseases include: 5 skeletal muscle diseases (dystrophic and non-dystrophic myotonia, eosinophilic myositis, malignant hyperthermia (MH) without the RYR1 C7360G allele, and hyperkalemic periodic paralysis (HYPP) without the SCN4A F1416L allele); alopecia areata, which causes severe hair loss and must be managed with corticosteroids that can lead to laminitis; hemochromatosis, an iron storage disease that leads to liver failure; complete bilateral absence of the vas deferens, causing stallion infertility; paroxysmal atrial fibrillation (AFIB), the most common pathologic arrhythmia in horses, which may contribute to sudden cardiac death; idiopathic renal hematuria (IRH), a life-threatening disease of Arabs; and microphthalmia, a performance-limiting disease of Sport Horses. WGS has been mapped to the reference genome and alleles identified using the pipeline developed in objective 1. Variants have been prioritized based on the reported disease prevalence (AFIB) or a presumed disease prevalence of 1%, given that the diseases are considered rare in the general horse population and assuming that they follow recessive inheritance patterns. Variants with an allele frequency in the catalog of genetic variation greater than expected based on disease prevalence were excluded. Variants were then further prioritized based on: 1) presence in genes reported to be intolerant to damaging variants; 2) a computationally predicted high impact on phenotype; and either 3) presence in all disease cases (AFIB, IRH, myotonia) or 4) a recessive, de novo, or dominant inheritance pattern in the offspring compared to the parents (alopecia areata, microphthalmia). Final ranking using the American College of Medical Genetics guidelines for disease-causing variant prioritization is ongoing.
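As a rough illustration of the prevalence-based filter described above, the sketch below derives a maximum plausible allele frequency from disease prevalence and drops variants that exceed it. The report does not state how the expected frequency was calculated; this sketch assumes Hardy-Weinberg proportions, a single fully penetrant recessive allele, and hypothetical function and field names.

```python
import math


def max_recessive_allele_frequency(prevalence: float) -> float:
    """Upper bound on allele frequency for a fully penetrant recessive disease.

    Assumption (not from the report): Hardy-Weinberg proportions, so the
    fraction of affected homozygotes is q**2 and therefore q = sqrt(prevalence).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    return math.sqrt(prevalence)


def filter_by_prevalence(variants, prevalence: float):
    """Keep variants whose catalog allele frequency does not exceed the bound.

    Each variant is a dict with a hypothetical 'af' key holding its allele
    frequency in the catalog of genetic variation.
    """
    cutoff = max_recessive_allele_frequency(prevalence)
    return [v for v in variants if v["af"] <= cutoff]
```

Under these assumptions, a presumed prevalence of 1% corresponds to a cutoff allele frequency of about 0.1, so a variant seen at a frequency of 0.3 in the catalog would be excluded while one at 0.05 would be retained.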
The long-term goal is to validate the likely variants by sequencing them in a larger population of individuals to ensure segregation of the allele with the disease phenotype, and to develop genetic tests for easy disease diagnosis and identification of carriers to assist breeders with decreasing the frequency of clinical disease in their herd. Objective 2. A comprehensive and reproducible computational pipeline was designed to facilitate the creation of a genotype imputation and haplotype reference panel in the horse. Segments of the pipeline were defined using the workflow management system 'Snakemake', and the computational environment was containerized using both Docker and Singularity. Containers are publicly available on both DockerHub and the Singularity Container Library. Source code to build the pipeline is available at our public laboratory git repository (https://github.com/UMN-EGGL). Whole genome sequences from 549 horses were mapped to EquCab3, and both GATK HaplotypeCaller and bcftools 'mpileup' were used to produce joint genotyping calls across every base pair in the genome. SNPs from both callers were assessed for baseline quality control, and variant quality scores were recalibrated using 6 features shared between the GATK and bcftools variant callers. Gaussian mixture models were fit for both callers using training data derived from SNP sets defined from the recent MNEc2M SNP genotyping array. Using a true positive probability threshold of 99.5%, recalibrated scores were used to combine SNPs from GATK and bcftools into a set of 20.8M SNPs representing a high-confidence, whole-genome SNP reference panel for the horse (WGSNP). Variants were phased in 50Kb windows using Beagle 5.1. Imputation accuracies from MNEc2M:WGSNP were estimated using leave-one-out cross validation, resulting in between 85% and 99.8% accuracy under naïve imputation scenarios in over 11 horse breeds. Objective 3.
As part of objective 3, we have developed two publicly available containerized workflows: 1) IndexForTheFuture (https://github.com/jonahcullen/IndexForTheFuture), and 2) RNAMapping (https://github.com/UMN-EGGL/RNAMapping). IndexForTheFuture was designed to ensure the stable and reproducible generation of reference genome indices directly from the NCBI and ENSEMBL servers to be used for mapping RNA-seq data. The RNAMapping workflow is capable of processing RNA-seq data regardless of platform or sequencing strategy, from raw FASTQs through the generation of a multi-sample, non-redundant transcriptome (i.e. improving the physical annotation) and transcript/gene-level quantification. Moreover, the output from this workflow may be used to improve functional annotation via generation of tissue-specific co-expression networks. Through containerization, use of these workflows removes the variability associated with both lab-specific computational infrastructure (e.g. differences in computing platforms from equine research groups at various universities) and genome release. Additionally, we have begun analyzing the differences in the physical annotation (via transcriptomes), differential expression, and functional annotation (via co-expression networks) observed between samples sequenced on different platforms. Expected results will better inform how to account for platform-specific variance and merge data from multiple sources in the continued development of the equine tissue expression atlas. In addition to data resources, several major contributions were made to general purpose bioinformatics computational tools. Major features were implemented for storing biological data using Minus80, an open source bioinformatics Python library. Minus80 has had 14 minor releases and 1 major release throughout the duration of this grant.
Publications
Progress 03/15/17 to 03/14/18
Outputs Target Audience:
Nothing Reported
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
Nothing Reported
How have the results been disseminated to communities of interest?
Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
See 2019 report for updated information
Publications