Source: UNIV OF MINNESOTA submitted to
DISCOVERING CAUSAL VARIANTS FOR COMPLEX DISEASE USING FUNCTIONAL NETWORKS IN THE HORSE
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
1008816
Grant No.
2016-67012-24841
Project No.
MIN-62-G01
Proposal No.
2015-03476
Multistate No.
(N/A)
Program Code
A7201
Project Start Date
Feb 15, 2016
Project End Date
Feb 14, 2018
Grant Year
2016
Project Director
Schaefer, R. J.
Recipient Organization
UNIV OF MINNESOTA
(N/A)
ST PAUL,MN 55108
Performing Department
Veterinary Population Medicine
Non Technical Summary
The recent availability of high throughput technologies in agricultural animals provides an opportunity to advance our understanding of complex, agriculturally important traits. Genome wide association studies have identified thousands of loci linked to agriculturally important traits; however in most cases the causal gene remains unknown. Assessing a single data type can often miss complex models that require variation across multiple levels of biological regulation. Integrating several sources of unbiased, genomic information allows for efficient ranking of interesting candidate regions discovered by GWAS. We propose building tools to integrate available sources of genomic data in the horse to build a multi-staged data integration model for prioritization of QTL candidate genes. Using these tools, we will investigate Equine Metabolic Syndrome (EMS) in a disease specific (case-control) meta-dimensional model by integrating whole genome SNP data, muscle and adipose RNAseq, and metabolomic data from horses phenotyped for EMS. We hypothesize an integrated, network based approach will better explain the genotype to phenotype relationship of EMS than any single dataset alone. Linking phenotype to causal genes is critical to understanding the biology underlying traits, and in the context of disease, the identification of potential preventative measures and therapeutic targets. The results of this study have the potential to substantially expand our understanding of the molecular and genetic factors that contribute to the pathophysiology of EMS, and improve our ability to predict disease risk. Furthermore, since this approach is generalizable to any phenotype of interest, our long term goal is to develop tools that allow integration of genomic and other high-dimensional datasets to better understand complex phenotypic traits and extend them to other agricultural animals.
We will deploy these tools using iAnimal as a developmental platform ensuring any research group generating association data will be able to use our tools.
Animal Health Component
50%
Research Effort Categories
Basic
(N/A)
Applied
100%
Developmental
(N/A)
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
304                   3810                             1080                     50%
304                   3810                             2080                     50%
Goals / Objectives
Our long term goal is to develop tools that allow integration of genomic and other high-dimensional datasets in agricultural animals to better understand complex phenotypic traits. Our goal in this proposal is to build tools that facilitate linking genotype to phenotype in the domestic horse. Professional development goals: The research outlined in this post-doctoral fellowship will allow me to capitalize on the skills I learned during my PhD and to expand my computational and experimental skill sets by developing tools to integrate across multiple high-throughput data types, and to apply network integration to a complex multivariate phenotype characterizing equine metabolic syndrome. To date, with my undergraduate degree in computer science and PhD in computational biology, my training has been largely computationally focused. This post-doctoral fellowship will grant me an opportunity to enhance my understanding and skills in experimental biology by working closely with my mentor, who is trained as an experimental biologist. In addition to scientific training, I will also use this period of support to continue to develop other professional skills critical to success in academia. It is my overarching professional goal to approach genomics in an interdisciplinary sense by working closely with experimental experts in the field to develop the best computational approaches possible, implemented specifically with the organism in mind. Professional objective 1: I will write scientific proposals to address the biologic and computational next steps from this work to appropriate agencies (e.g.
NIFA, Morris Animal Foundation, NSF), with the input of my mentor/mentoring committee, and I will receive formal training in grant and manuscript writing through short courses offered at the University of Minnesota. Professional objective 2: I will expand my teaching skills by enrolling in two courses through the Preparing Future Faculty program; this coursework includes a mentored teaching experience as well as other professional development. Professional objective 3: I will take advantage of the opportunity to learn techniques related to sample preparation and management and basic molecular biology: DNA and RNA isolation, PCR, gel electrophoresis, restriction fragment length polymorphism (RFLP) assay design, Sanger sequencing, long-range PCR, cloning of PCR fragments for sequencing, etc. Research Goals: The research outlined in this post-doctoral fellowship will allow me to capitalize on the skills I learned during my PhD and to expand my computational and experimental skill sets by developing tools to integrate across multiple high-throughput data types, and to apply network integration to a complex multivariate phenotype characterizing equine metabolic syndrome. In Objective 1 we will use available sources of genomic data to build a multi-staged data integration model for prioritization of QTL candidate genes applicable to any phenotype of interest in the horse using the Camoco (CoAnalysis of Molecular Components) computational framework. Further, since this approach is generalizable to any phenotype of interest, we propose using iAnimal as a developmental platform, meaning any research group generating association data will be able to use our tools. In Objective 2 we will extend this general tool, incorporating three additional datasets to investigate Equine Metabolic Syndrome (EMS).
Whole genome SNP data, muscle and adipose tissue RNAseq, and metabolomic data from horses phenotyped for EMS will be used to build disease specific (case-control) co-expression and co-metabolite networks. These networks will then be integrated with genotypic (SNP) variation to construct a meta-dimensional EMS model. We expect that an integrated, network based approach will better explain the genotype to phenotype relationship of EMS. This project will provide a valuable tool for the agricultural community, especially for species with modest genetic resources. In addition, it will offer insights into the genetic architecture of EMS.
Project Methods
2 million SNP markers for 513 horses representing 15 breeds chosen to represent distinct phylogenetic clades will be computationally phased using 3 sets of trios built into the design. Breed specific haplotype length will be calculated. Marker to haplotype mappings will be stored in a SQL database and an application programming interface (API) will be developed using the general purpose programming language Python 3.4. Co-expression networks will be built from Illumina 100bp paired end RNA-Seq in 80 individuals and 2 different tissues in addition to 6 individuals in 11 tissues (n=226 separate tissue samples). Network information will be imported into SQL databases using the existing tools available in the open source network analysis software Camoco. Co-expression networks provide an intuitive, unbiased functional look into gene sets, such as those generated by SNP-to-gene mappings. Efforts: Both pipelines and subsequent databases will be deployed on iAnimal servers using the available Atmosphere cloud service, which provides access to iAnimal's core infrastructure resources, including high performance computing environments and the ability to easily link in web based interfaces. Material for a workshop will be created to teach the community how to use the tools we develop.
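
As an illustration of the marker to haplotype storage described above, the following is a minimal sketch of a SQL-backed Python interface. It is not the project's actual schema or API: the table layout, the class name HaplotypeStore, its method names, and the use of SQLite are all assumptions made for the example.

```python
import sqlite3

# Hypothetical schema: one row per marker, one row per breed specific haplotype block.
SCHEMA = """
CREATE TABLE marker (
    marker_id TEXT PRIMARY KEY,
    chrom     TEXT NOT NULL,
    pos       INTEGER NOT NULL
);
CREATE TABLE haplotype (
    hap_id    INTEGER PRIMARY KEY,
    breed     TEXT NOT NULL,
    chrom     TEXT NOT NULL,
    hap_start INTEGER NOT NULL,
    hap_end   INTEGER NOT NULL
);
"""


class HaplotypeStore:
    """Illustrative wrapper around a SQLite database of markers and haplotype blocks."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.executescript(SCHEMA)

    def add_marker(self, marker_id, chrom, pos):
        self.conn.execute(
            "INSERT INTO marker (marker_id, chrom, pos) VALUES (?, ?, ?)",
            (marker_id, chrom, pos),
        )
        self.conn.commit()

    def add_haplotype(self, breed, chrom, start, end):
        cur = self.conn.execute(
            "INSERT INTO haplotype (breed, chrom, hap_start, hap_end) "
            "VALUES (?, ?, ?, ?)",
            (breed, chrom, start, end),
        )
        self.conn.commit()
        return cur.lastrowid

    def haplotypes_for(self, marker_id, breed):
        """Return (hap_id, start, end) for blocks of one breed that contain the marker."""
        return self.conn.execute(
            """
            SELECT h.hap_id, h.hap_start, h.hap_end
            FROM marker AS m
            JOIN haplotype AS h
              ON h.chrom = m.chrom
             AND m.pos BETWEEN h.hap_start AND h.hap_end
            WHERE m.marker_id = ? AND h.breed = ?
            ORDER BY h.hap_start
            """,
            (marker_id, breed),
        ).fetchall()
```

Keeping breed as a column on the haplotype table reflects the breed specific haplotype lengths mentioned above; the same marker can then resolve to different blocks depending on the breed queried.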

Progress 02/15/16 to 02/14/18

Outputs
Target Audience:The recent availability of high throughput technologies in agricultural animals provides an opportunity to advance our understanding of complex, agriculturally important traits. Genome wide association studies have identified thousands of loci linked to agriculturally important traits; however in most cases the causal gene remains unknown. Assessing a single data type can often miss complex models that require variation across multiple levels of biological regulation. Integrating several sources of unbiased, genomic information allows for efficient ranking of interesting candidate regions discovered by GWAS. We propose building tools to integrate available sources of genomic data in the horse to build a multi-staged data integration model for prioritization of QTL candidate genes. Using these tools, we will investigate Equine Metabolic Syndrome (EMS) in a disease specific (case-control) meta-dimensional model by integrating whole genome SNP data, muscle and adipose RNAseq, and metabolomic data from horses phenotyped for EMS. We hypothesize an integrated, network based approach will better explain the genotype to phenotype relationship of EMS than any single dataset alone. Linking phenotype to causal genes is critical to understanding the biology underlying traits, and in the context of disease, the identification of potential preventative measures and therapeutic targets. The results of this study have the potential to substantially expand our understanding of the molecular and genetic factors that contribute to the pathophysiology of EMS, and improve our ability to predict disease risk. Furthermore, since this approach is generalizable to any phenotype of interest, our long term goal is to develop tools that allow integration of genomic and other high-dimensional datasets to better understand complex phenotypic traits and extend them to other agricultural animals. 
We will deploy these tools using Cyverse as a developmental platform ensuring any research group generating association data will be able to use our tools. Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? Objective 1: Grant writing and scientific proposals. During the course of this fellowship and under the guidance of Dr. McCue, I've had several internal and external grant proposals successfully funded. The Functional Annotation of the Animal Genome consortium (FAANG) supported my travel to the Plant and Animal Genome conference in 2016 ($700). A successfully funded grant to the University of Minnesota Informatics Institute (UMII) supported travel for Dr. McCue and me to present work from this project at the Cold Spring Harbor Network Biology Conference in January 2017 ($4,406). Finally, a travel grant submitted to the Microbial and Plants Genetics and Genomics Institute (MPGI) at the University of Minnesota supported travel to St. Louis to present computational methods at the Maize Genetics Conference ($200). Several non-travel grants were also awarded during the project period. In collaboration with Dr. McCue, I wrote an internal UMII "updraft" grant which was funded for $5,000 to support an undergraduate data wrangler whom I directly oversaw. I was also closely involved in writing key sections as a co-investigator for two USDA grants with Dr. McCue. One was for a Genome Tools grant titled "Tools to Link Genotype to Phenotype in the Horse", which was successfully funded through USDA-NIFA in 2017, and the other was for an Animal Health grant titled "Functional Prioritization of Candidate Genes and Alleles for Equine Metabolic Syndrome", funded through USDA-AFRI in 2017. Exposure to these processes has set me up for future submissions to both internal and external funding sources.
Objective 2: Professional development and teaching. To fulfill this portion of my professional development, I took on several leadership roles throughout the course of this project. I was involved with several programs sponsored by Mozilla, a tech non-profit dedicated to advocating for strong leaders in open source and open science (https://www.mozilla.org/en-US/about/). In addition to being invited to participate in their internal, bi-annual meeting (see above), I also received training through their Open Leadership training program, which focused on how to organize and manage open source and open science projects. After completing this training I accepted a role as a mentor for other open science projects during the next round of training, where I directly mentored two students running open science projects. Based on my development as an open science advocate, I was asked to serve as a figshare ambassador for 2018. I've also pursued training in professional development by participating in a four week course at the University of Minnesota on Value Proposition Design that covers how to organize products from a project to solicit support in the form of start-ups or small businesses. During the project period, I've also developed more traditional roles in academic leadership. I helped form and am a co-chair of a new Functional Annotation of the Animal Genome (FAANG) working group that focuses on developing best practices and methods for network analysis and that currently has a community of 30 animal geneticists throughout the world. Additionally, I'm currently co-editing a Research Topic with Dr. David MacHugh, a bovine researcher at University College Dublin, where we will organize and edit a collection of articles for a special interest section titled "Integrative and Network Genomics in Agricultural Animals" for the academic publisher "Frontiers" (https://www.frontiersin.org).
Objective 3: Laboratory Techniques: With support from this fellowship, I've had the opportunity to gain experience in a laboratory that handles and processes wet-lab samples. In the transition from my PhD, where I was in a strictly dry lab, Dr. McCue's lab offered hands-on experience with techniques for molecular biology. I learned specifics of sample preparation for DNA and RNA isolation. Since joining Dr. McCue's lab, I've had the opportunity to perform some hands-on data generation with PCR and an RFLP assay. I've also had the opportunity to experience a horse necropsy where an individual was euthanized and tissue samples were extracted and processed. With this first hand experience, I've come to better appreciate the effort and work that goes into sample collection and to respect the cost and sacrifice of working in animal species where the impact of sacrificed animals is high. How have the results been disseminated to communities of interest? Results have been presented at several national and international conferences. Two posters and one talk were presented at the Plant and Animal Genome Conference in January 2017. Two posters and an oral presentation were presented at the International Society for Animal Genetics conference (ISAG) in July of 2017. A single poster and a software demo showcasing the features of Camoco were presented at the Plant and Animal Genome conference in 2018. All of the software developed as a part of this fellowship was designed using software engineering best practices and is available as free and open source software. Code related to Ponytools, Camoco, Minus80, LocusPocus and HapDab is freely available as a software suite from GitHub (https://github.com/LinkageIO). Since the start of this fellowship, 589 individual git commits have been incorporated into Camoco, 144 individual git commits have been incorporated into Ponytools, and 152 git commits have been made to Minus80.
132 commits have been made to LocusPocus, and 49 commits have been made to HapDab. In total, 1,066 git commits have been incorporated to software related to this project. What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? National and global agricultural concerns, such as conservation of animal breeding populations in the face of endemic disease or climate change, mean that advancements in agriculture have become increasingly reliant on data and technology. In the past decade, scientists have been deeply invested in developing technology to cheaply sequence genomes. The first genomes cost millions of dollars each and focused on human health and medical applications largely through the use of model organisms such as mice or cell cultures. Now, almost two decades later, genome sequencing technologies have become affordable enough to be used to study agricultural species, which requires the sequencing of natural populations containing hundreds of samples (i.e. herds). While the data are becoming more available for scientists to interpret, non-model and especially agricultural species lack the specialized computational tools necessary to fully and robustly analyze these large datasets. In the past two years, we have made substantial progress in developing software and computational tools specifically for studying genomic sequencing data for agricultural species. Agricultural species have biological complexities that make direct application of technology developed for model species, such as those described above, difficult. These hurdles, such as genome size, economic impact of sacrificed samples, and species generational time, make computational approaches that utilize big data very desirable. Using the tools we are developing here, we are homing in on specific genetic components for complex agricultural traits. In the horse, we've computationally predicted genes that affect Equine Metabolic Syndrome, both to predict disease risk in animals that have already been bred and to identify potential targets for preventative measures and therapies.
Research goals: Objective 1: Develop necessary computational tools and resources for integrating heterogeneous data in the horse. 1a) Using SNP markers from 513 horses representing 15 different phylogenetic clades, we've calculated a breed specific haplotype map for ~1.9M SNPs. SNPs were binned into 1 megabase windows and linkage disequilibrium (LD; r² based) was calculated between SNPs in each window. LD was found to decay as a function of base pair distance; however, the rate of decay was found to be breed dependent. LD is used to estimate haplotype length at each of the ~1.9M SNPs that are available on the MNEc2M and MNEc670K arrays (See Other Products Section). This haplotype map is available to the equine genetics community as a resource through two different software packages we've developed. The first package, called "PonyTools", implements software tools used to analyze data generated with the MNEc2M and MNEc670K arrays. The second software package, called "HapDab", is a general purpose haplotype database software tool that is generalizable to other animal systems. Two other software packages for managing and analyzing biological datasets were the direct result of this fellowship. "Minus80" is a software library that I developed to better handle raw biological data and to allow for freezing/unfreezing large datasets using a cloud based system, and "LocusPocus" is a general purpose library used to represent genetic coordinates. 1b) We used gene expression data available in our lab to create gene co-expression networks in the horse. We have developed open-source software called Camoco that takes gene expression data across a set of conditions (case vs control) or from genotypically diverse individuals (breed data) and identifies small clusters of genes that have similar responses in gene expression.
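
As an illustration of the windowed LD calculation described in 1a, the following Python sketch bins SNPs into 1 megabase windows and computes pairwise r² within each window. It is not the PonyTools or HapDab implementation: the function names, the 0/1/2 genotype coding, and the use of the squared Pearson correlation between genotype dosages as the r² estimate are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

WINDOW_BP = 1_000_000  # 1 megabase windows, as in the text


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2).

    Returns None when either SNP is monomorphic, since r^2 is undefined there.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return (sxy * sxy) / (sxx * syy)


def windowed_ld(snps, window=WINDOW_BP):
    """Compute pairwise r^2 for SNPs that fall in the same fixed-size window.

    snps: iterable of (chrom, pos, genotypes) with genotypes as a list of 0/1/2.
    Returns a list of (chrom, pos1, pos2, distance_bp, r2) tuples.
    """
    bins = defaultdict(list)
    for chrom, pos, genotypes in snps:
        bins[(chrom, pos // window)].append((pos, genotypes))

    results = []
    for chrom, index in sorted(bins):
        members = sorted(bins[(chrom, index)], key=lambda snp: snp[0])
        for (pos1, geno1), (pos2, geno2) in combinations(members, 2):
            r2 = r_squared(geno1, geno2)
            if r2 is not None:
                results.append((chrom, pos1, pos2, pos2 - pos1, r2))
    return results
```

Running windowed_ld separately on the individuals of each breed and plotting r² against distance_bp would give one decay curve per breed, which is one way to see the breed dependent decay rate reported here.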
Using Camoco, we have built and analyzed co-expression networks using gene expression data isolated from 4 different tissues: Muscle, Adipose, Cartilage and Bone. The co-expression networks contained between 78,688 and 288,632 gene-gene interactions. Networks were vetted against known biological functions (e.g. GO Terms), finding that between 32% and 41% of functional gene sets specified by GO were co-expressed in our networks. These gene co-expression networks are currently being used to interpret results from GWA studies. Using the haplotype map described in 1a, SNPs associated with traits of interest are being mapped to genes within the haplotype window and analyzed for network interactions. Camoco calculates the probability of connectedness among a group of genes within a subnetwork through bootstrapping (i.e. randomization). This approach has resulted in a change in action when interpreting GWAS results within our lab. Candidate gene identification using co-expression networks does not rely on manual gene curation. Instead, genes are prioritized in a data driven approach, using interactions in the co-expression network to rank genes that are putatively related to disease. In addition to Camoco being freely available software for anyone to use on GitHub, we developed user friendly modules built on the Cyverse (formerly iAnimal) Discovery Environment (http://cyverse.org). We developed container based (Docker) pipelines that are easily deployable and shareable with geneticists in the equine community as well as with scientists studying other organisms. Objective 2. Determine gene pathway interactions responsible for equine metabolic syndrome (EMS). In a genome wide association study, we identified 2,375 SNPs associated with 11 different morphometric, biochemical, or hormonal phenotypes related to EMS in both Morgan horses and Welsh ponies, two breeds affected by the disease.
After SNP to gene mapping, these SNPs encompass 183 different independent genomic regions that contain 2,962 potential candidate genes. Case and control groups were delimited by taking the top 25% and bottom 25% phenotypic extremes for either insulinemic response or fasting adiponectin levels. In skeletal muscle and adipose tissue we found 463 and 798 genes, respectively, that exhibited differential expression between our comparison groups for insulin response. Similarly, we found 1,001 and 497 differentially expressed (DE) genes in skeletal muscle and adipose tissue, respectively, for high and low adiponectin concentrations. These results indicate that the tissues we used to examine EMS phenotypes were biologically relevant. Using the co-expression networks described above (Muscle, Adipose, Bone and Cartilage), we found 259 genes that exhibited significant co-expression among loci identified by the EMS GWAS. These candidate genes are currently being evaluated in follow-up analyses with additional functional data (see below). Complementary to the gene co-expression networks (described above), we built a network based on abundances of 646 metabolites in 40 Welsh ponies. The resultant network contained 208,335 interactions and formed 39 distinct network clusters. Initial quality control metrics indicate that the clusters formed in the network follow expected distributions for biological networks. Efforts to pinpoint the genetic components of variation in metabolite abundances were, however, inconclusive. Additional metabolite data for 264 Welsh ponies and 286 Morgan horses are currently being generated and the metabolomics analysis pipeline is being refined with support from a currently ongoing USDA animal health grant awarded in 2017 to Dr. McCue.
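
The case and control definition used in this objective (the top 25% and bottom 25% phenotypic extremes) can be sketched as follows. This is only an illustration: the function name split_extremes, the rounding down of the group size, and the handling of ties by input order are assumptions, not details taken from the analysis itself.

```python
def split_extremes(phenotypes, fraction=0.25):
    """Split individuals into low and high phenotypic extremes.

    phenotypes: dict mapping an individual ID to a numeric phenotype value.
    Returns (low_ids, high_ids), each holding floor(n * fraction) individuals.
    Which extreme counts as "case" depends on the phenotype (for example, a
    high insulin response versus a low adiponectin level), so that labelling
    is left to the caller.
    """
    ranked = sorted(phenotypes, key=phenotypes.get)
    k = int(len(ranked) * fraction)
    low_ids = ranked[:k]
    high_ids = ranked[len(ranked) - k:]
    return low_ids, high_ids
```

Individuals in the middle 50% are excluded from both groups, which sharpens the contrast between comparison groups at the cost of sample size.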
To further pinpoint genetic components of EMS and to provide validation for the candidate genes we have identified here, we have expanded the scope of this experiment by increasing the number of samples used to build the networks from 28 individuals to 92. We've also secured funding to increase the number of tissues used in the tissue expression network to 12 tissues in 12 horses. These additional samples have been collected and RNA has been isolated for sequencing. We anticipate that including these additional samples will aid in further identifying candidate genes for EMS.
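
The randomization test for connectedness described under Objective 1b can be sketched in a few lines of Python. This is not Camoco's API or its exact statistic: the function names, the unweighted edge set, the definition of density as the fraction of gene pairs that share an edge, and the add-one empirical p-value are all assumptions chosen to keep the example small.

```python
import random
from itertools import combinations


def density(genes, edges):
    """Fraction of gene pairs in `genes` that are connected in `edges`.

    edges: set of frozenset({gene_a, gene_b}) describing an unweighted network.
    """
    genes = sorted(set(genes))
    n_pairs = len(genes) * (len(genes) - 1) // 2
    if n_pairs == 0:
        return 0.0
    hits = sum(1 for a, b in combinations(genes, 2) if frozenset((a, b)) in edges)
    return hits / n_pairs


def density_pvalue(candidates, all_genes, edges, n_perm=1000, seed=0):
    """Empirical p-value for the density of a candidate gene set.

    The null distribution comes from random gene sets of the same size drawn
    from `all_genes`; the p-value is the share of random sets that are at
    least as dense as the candidates, with an add-one correction.
    """
    rng = random.Random(seed)
    observed = density(candidates, edges)
    size = len(set(candidates))
    universe = sorted(all_genes)
    at_least = sum(
        1
        for _ in range(n_perm)
        if density(rng.sample(universe, size), edges) >= observed
    )
    return observed, (at_least + 1) / (n_perm + 1)
```

In practice the candidate set would be the genes mapped from GWAS SNPs through the haplotype windows, and the random sets would ideally preserve the structure of those windows; drawing genes uniformly, as done here, is the simplest possible null.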

Publications

  • Type: Journal Articles Status: Published Year Published: 2016 Citation: Robert J. Schaefer, Jean-Michel Michno, Chad L. Myers. Unraveling gene function in agricultural species using gene co-expression networks. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms. 30 July 2016
  • Type: Conference Papers and Presentations Status: Published Year Published: 2017 Citation: Schaefer et al. Camoco: a computational framework for inter-relating GWAS loci and unraveling gene function using co-expression networks. Plant and Animal Genome Conference. January 2017.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2017 Citation: Schaefer et al. Unraveling gene function using gene co-expression networks in the domestic horse. Plant and Animal Genome Conference. January 2017.
  • Type: Journal Articles Status: Published Year Published: 2017 Citation: Barbara Wallner, Nicola Palmieri, Claus Vogl, Doris Rigler, Elif Bozlak, Thomas Druml, Vidhya Jagannathan, Tosso Leeb, Ruedi Fries, Jens Tetens, Georg Thaller, Julia Metzger, Ottmar Distl, Gabriella Lindgren, Carl-Johan Rubin, Leif Andersson, Robert Schaefer, Molly McCue, Markus Neuditschko, Stefan Rieder, Christian Schlötterer, Gottfried Brem. Y Chromosome Uncovers the Recent Oriental Origin of Modern Stallions. Current Biology, 2017; DOI: http://doi.org/10.1016/j.cub.2017.05.086
  • Type: Journal Articles Status: Under Review Year Published: 2018 Citation: Robert J Schaefer, Jean-Michel Michno, Joseph Jeffers, Owen Hoekenga, Brian Dilkes, Ivan Baxter, Chad Myers. Integrating co-expression networks with GWAS to prioritize causal genes in maize. bioRxiv. DOI: https://doi.org/10.1101/221655
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2018 Citation: Robert J Schaefer, Jean-Michel Michno, Elaine M Norton, Joseph R Jeffers, Owen Hoekenga, Brian P. Dilkes, Ivan Baxter, Molly E McCue, Chad Myers. Camoco: Identifying High Priority Candidate Genes from GWAS using Co-Expression Networks. Plant and Animal Genome Conference. San Diego, CA. 2018.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2018 Citation: Robert J Schaefer, Jean-Michel Michno, Elaine M Norton, Joseph R Jeffers, Owen Hoekenga, Brian P. Dilkes, Ivan Baxter, Molly E McCue and Chad Myers. Camoco: Identifying High Priority Candidate Genes from GWAS using Co-Expression Networks. Plant and Animal Genome Conference. San Diego, CA. 2018
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2018 Citation: Robert J Schaefer, James R Mickelson and Molly E McCue. A Haplotype Map and Imputation Resource in the Horse. Plant and Animal Genome Conference. San Diego, CA. 2018.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2017 Citation: Robert Schaefer et al. Unraveling gene function using co-expression networks in the domestic horse. International Society of Animal Genetics. Dublin, Ireland. 2017.


Progress 02/15/16 to 02/14/17

Outputs
Target Audience: Target audiences for this fellowship have included: (1) myself, as a mentee of Dr. McCue; (2) members of the McCue lab; and (3) the larger equine genomics community, who are impacted by the genomic tools we are developing here. Changes/Problems: Nothing Reported What opportunities for training and professional development has the project provided? Coming from a mainly computational background, I have expanded my skill set by working closely with members of the McCue lab. I have gained benchtop laboratory experience in several areas. I was introduced to several new protocols including PCR, gel electrophoresis and RFLP tagging. Being incorporated into a mixed wet/dry lab has allowed me to appreciate the process of data generation and has given me insight into experimental design and implementation. In turn, I have been able to share computational expertise with the primarily veterinary and biological science PhD students in the lab. I founded and lead a bi-weekly "hackey-hour" that is meant to prime students in the lab on computational techniques and promote best practices in software (scripting) development. In addition to gaining skills in lab techniques, I have also gained skills related to grantsmanship and writing. Under the mentorship of Dr. McCue, I was directly involved in the submission of two successfully funded USDA proposals and one successfully funded proposal from the Morris Animal Foundation. As co-investigator on these grants, I have received substantial mentoring on the crafting of successful grants and have experienced the process of submitting them. Using the skills I've learned here, I recently submitted an independent proposal for future fellowship support through the Mozilla Science Laboratory, a nonprofit that supports computational scientists. How have the results been disseminated to communities of interest? Results have been presented at several international conferences.
Two posters and one talk were presented at the Plant and Animal Genome Conference in January 2017, and an abstract has been selected for an oral presentation at the upcoming conference of the International Society for Animal Genetics in July. Code related to Ponytools and Camoco is freely available from GitHub (see references above). Since the start of this fellowship, 384 individual git commits have been incorporated into Camoco and 67 individual git commits have been incorporated into Ponytools. What do you plan to do during the next reporting period to accomplish the goals? In the next reporting period we plan to: finish RNASeq of the 15 muscle and 15 adipose tissues; incorporate the metabolite to gene mapping into the Camoco software suite; analyze integrated EMS GWAS results with disease specific co-expression networks; enroll in a Preparing Future Faculty course; organize a workshop at the Plant and Animal Genome conference (PAG) in January 2017 for network analysis for the equine community; and maintain and package accompanying software for distribution through Cyverse (previously iPlant).

Impacts
What was accomplished under these goals? National and global agricultural concerns, such as crop response to climate change or conservation of animal breeding populations in the face of endemic disease, mean that advancements in agriculture have become increasingly reliant on data and technology. In the past decade, scientists have been invested in developing technology to cheaply sequence genomes. The first genomes cost millions of dollars each and focused on human health and medical applications largely through the use of model organisms such as mice or cell cultures. Now, almost two decades later, genome sequencing technologies have become affordable enough to be used to study agricultural species, which requires the sequencing of hundreds of samples (i.e. herds and fields). In the past year, we have made substantial progress in developing software and computational tools specifically for studying genomic sequencing data for agricultural species. Agricultural species have biological complexities that make direct application of technology developed for model species, such as those described above, difficult. These hurdles, such as genome size, economic impact of sacrificed samples, and generational time, make computational approaches that utilize big data very desirable. Using the tools we are developing here, we are homing in on specific genetic components for complex agricultural traits. In the horse, we've computationally predicted genes that affect Equine Metabolic Syndrome, both to predict disease risk in animals that have already been bred and to identify potential targets for preventative measures and therapies. This NIFA postdoctoral fellowship has allowed me to take computational skills that I learned during my PhD, in a lab that studies biology in the model species yeast, and to rapidly apply them to the domestic horse, a major veterinary system.
Building on this experience, and in preparation for the future, I've extended my role as a computational scientist to create collaborations with other agricultural and veterinary communities to share software and methods. These include researchers who study other non-model species such as corn, potato, and soybean, as well as new collaborations with researchers studying canine and feline systems.
Research goals:
Objective 1: Develop necessary computational tools and resources for integrating heterogeneous data in the horse.
1a) Using SNP markers from 513 horses representing 15 different phylogenetic clades, we've calculated a breed-specific haplotype map for ~1.9M SNPs. SNPs were binned into 1-megabase windows, and linkage disequilibrium (LD; r2-based) was calculated between SNPs in each window. LD was found to decay as a function of base-pair distance; however, the rate of decay was breed dependent. LD is used to estimate haplotype length at each of the ~1.9M SNPs available on the MNEc2M and MNEc670K arrays (see Other Products section). This haplotype map will be available to the equine genetics community as a resource through a software package we developed called PonyTools1 (see Other Products section). We are currently implementing routines, to be incorporated into PonyTools, that will enable SNP-to-gene mapping based on the calculated haplotype windows. This objective has resulted in a change in knowledge within the equine genetics community: the resource will be used by community members who are using the MNEc670K array our group previously developed. Using our accompanying tools, researchers will be able to more accurately identify candidate genes around SNPs associated with their traits of interest.
1b) We used gene expression data available in our lab to create gene co-expression networks in the horse.
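As an illustration of the windowed LD calculation described in 1a, the following is a minimal sketch and not the PonyTools implementation. It assumes genotypes coded as 0/1/2 alternate-allele counts with no missing calls and no monomorphic SNPs, and it approximates r2 as the squared Pearson correlation between genotype vectors, which may differ from the estimator actually used. The function name `windowed_r2` and the toy data are hypothetical.

```python
import numpy as np

def windowed_r2(positions, genotypes, window_size=1_000_000):
    """Return rows of (distance_bp, r2) for every SNP pair sharing a window.

    positions: base-pair coordinates of SNPs on one chromosome.
    genotypes: array of shape (n_snps, n_individuals), coded 0/1/2.
    Assumption: r2 is the squared Pearson correlation of genotype vectors.
    """
    positions = np.asarray(positions)
    genotypes = np.asarray(genotypes, dtype=float)
    bins = positions // window_size          # window index of each SNP
    pairs = []
    for b in np.unique(bins):
        idx = np.flatnonzero(bins == b)
        if idx.size < 2:
            continue                         # no pairs in this window
        r = np.corrcoef(genotypes[idx])      # SNP-by-SNP correlation matrix
        i, j = np.triu_indices(idx.size, k=1)
        dist = np.abs(positions[idx[i]] - positions[idx[j]])
        pairs.append(np.column_stack([dist, r[i, j] ** 2]))
    return np.vstack(pairs) if pairs else np.empty((0, 2))

# Toy data: two SNPs in the first 1 Mb window, two in the second.
positions = [100, 200, 1_500_000, 1_500_300]
genotypes = [
    [0, 1, 2, 1, 0, 2],
    [0, 1, 2, 1, 0, 2],   # identical to the first SNP
    [0, 1, 2, 1, 0, 2],
    [2, 1, 0, 1, 2, 0],   # perfectly anti-correlated with the third SNP
]
ld = windowed_r2(positions, genotypes)
```

Plotting the r2 column against the distance column, separately for each breed, would give the kind of decay curve from which a breed-specific haplotype length could be estimated.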
We have developed open-source software called Camoco2 that takes gene expression data across a set of conditions (case vs. control) or from genotypically diverse individuals (breed data) and identifies small clusters of genes that have similar responses in gene expression. These clusters collectively form a gene information network called a gene co-expression network. Networks are being used to link genomic loci identified by genome-wide association studies (GWAS) to candidate genes identified within a biological network. Using Camoco, we have built and analyzed co-expression networks using gene expression data from four different tissues: muscle, adipose, cartilage, and bone. The co-expression networks contained between 78,688 and 288,632 gene-gene interactions. Networks were vetted against known biological functions (e.g., GO terms or KEGG) in order to estimate the informativeness of the interactions within the network. We found that between 32% and 41% of functional gene sets specified by GO were significantly co-expressed in our networks, indicating that network modules contained valid biological information. These gene co-expression networks are currently being used to interpret results from GWAS. Using the haplotype map described in 1a, SNPs associated with traits of interest are being mapped to genes within the haplotype window and analyzed for network interactions. Camoco calculates the probability of connectedness among a group of genes within a subnetwork through bootstrapping (i.e., randomization). This approach has resulted in a change in action when interpreting GWAS results within our lab. Candidate gene identification using co-expression networks does not rely on manual gene curation. Instead, genes are prioritized in a data-driven approach, using interactions in the co-expression network to rank genes that are putatively related to disease.
1 http://github.com/schae234/PonyTools
2 http://github.com/schae234/Camoco
Objective 2.
Determine gene pathway interactions responsible for equine metabolic syndrome.
We have made substantial progress towards the aims of Objective 2. We have identified genomic regions associated with Equine Metabolic Syndrome (EMS) by using the tools developed in Objective 1. Previous GWAS performed in the lab identified 2,375 SNPs associated with 11 different morphometric, biochemical, or hormonal phenotypes related to EMS in both Morgan horses and Welsh Ponies, two breeds affected by the disease. After SNP-to-gene mapping, these SNPs encompass 183 independent genomic regions that contain 2,962 potential candidate genes. Using the co-expression networks described above (muscle, adipose, bone, and cartilage), 259 genes exhibit significant co-expression among loci identified by the EMS GWAS. We are currently building disease-specific co-expression networks to better describe EMS SNPs. We proposed performing RNA-Seq on 15 horses using tissue from both skeletal muscle and the tail head depot (adipose). To date, we have isolated RNA from all 15 skeletal muscle samples and are moving on to the adipose samples, which require a more complicated isolation protocol. Isolated RNA will be submitted together in the near future to reduce batch effects. Additionally, funding has been secured to perform RNA sequencing on additional samples from more breeds. This work is ongoing. Co-metabolite networks have been constructed from the metabolomic data; however, work on mapping metabolites to genes is still ongoing. We expect that once sequencing has been completed on the additional muscle and adipose samples, analysis will continue as proposed in year 2.
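The randomization test used to judge whether GWAS candidate genes are more connected in a co-expression network than expected by chance can be sketched as follows. This is a generic illustration, not Camoco's API or its exact statistic: it assumes the network is a symmetric NumPy adjacency matrix, uses mean pairwise edge weight as the connectedness score, and draws random gene sets of the same size as the null. All function names and the toy network are hypothetical.

```python
import numpy as np

def subnetwork_density(adjacency, genes):
    """Mean edge weight over all pairs of `genes` (row/column indices)."""
    genes = np.asarray(genes)
    sub = adjacency[np.ix_(genes, genes)]
    i, j = np.triu_indices(genes.size, k=1)
    return sub[i, j].mean()

def empirical_p_value(adjacency, candidate_genes, n_random=1000, seed=0):
    """Compare observed density to random gene sets of equal size."""
    rng = np.random.default_rng(seed)
    observed = subnetwork_density(adjacency, candidate_genes)
    n_genes = adjacency.shape[0]
    k = len(candidate_genes)
    null = np.array([
        subnetwork_density(adjacency, rng.choice(n_genes, size=k, replace=False))
        for _ in range(n_random)
    ])
    # Add-one correction keeps the empirical p-value strictly above zero.
    p = (1 + np.sum(null >= observed)) / (n_random + 1)
    return observed, p

# Toy network: 30 genes with sparse random edges plus a planted 5-gene module.
rng = np.random.default_rng(42)
upper = np.triu((rng.random((30, 30)) < 0.1).astype(float), k=1)
adjacency = upper + upper.T
module = np.arange(5)
adjacency[np.ix_(module, module)] = 1.0
np.fill_diagonal(adjacency, 0.0)

observed, p_value = empirical_p_value(adjacency, module)
```

In a real analysis the random sets would more plausibly be drawn to match the structure of the GWAS loci (for example, the number of genes per haplotype window) rather than uniformly across the genome; the uniform draw here is a simplification.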

Publications

  • Type: Journal Articles Status: Published Year Published: 2016 Citation: Robert J. Schaefer, Jean-Michel Michno, Chad L. Myers. Unraveling gene function in agricultural species using gene co-expression networks. Biochimica et Biophysica Acta (BBA) - Gene Regulatory Mechanisms. 30 July 2016
  • Type: Journal Articles Status: Under Review Year Published: 2017 Citation: Robert J Schaefer, Mikkel Schubert, Ernest Bailey, Danika L. Bannasch, Eric Barrey, Gila Kahila Bar-Gal, Gottfried Brem, Samantha A. Brooks, Ottmar Distl, Ruedi Fries, Carrie J. Finno, Vinzenz Gerber, Bianca Haase, Vidhya Jagannathan, Ted Kalbfleisch, Tosso Leeb, Gabriella Lindgren, Maria Susana Lopes, Nuria Mach, Artur da Câmara Machado, James N. MacLeod, Annette McCoy, Julia Metzger, Cecilia Penedo, Sagi Polani, Stefan Rieder, Imke Tammen, Jens Tetens, Georg Thaller, Andrea Verini-Supplizi, Claire M. Wade, Barbara Wallner, Ludovic Orlando, James R. Mickelson, Molly E. McCue. Development of a high-density, 2M SNP genotyping array and 670k SNP imputation array for the domestic horse. BMC Genomics. May 2017. doi: https://doi.org/10.1101/112979.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2017 Citation: Schaefer et al. Camoco: a computational framework for inter-relating GWAS loci and unraveling gene function using co-expression networks. Plant and Animal Genome Conference. January 2017.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2017 Citation: Schaefer et al. Unraveling gene function using gene co-expression networks in the domestic horse. Plant and Animal Genome Conference. January 2017.