Performing Department
(N/A)
Non Technical Summary
Problem and Proposed Solution:
Genomic selection, the ability to predict complex phenotypes from DNA information, has tremendously accelerated animal breeding programs in recent years. This revolutionary approach is already mainstream in advanced breeding programs for species such as dairy cattle. However, further expansion of genomic selection to larger populations, additional breeding programs, underserved species, and developing countries requires greater prediction accuracy at lower genotyping cost. The first challenge can be addressed by whole-genome re-sequencing to generate ultra-high-density DNA marker panels in the training population. Unfortunately, the species that would benefit the most from genome re-sequencing as a genotyping approach, such as cattle, pigs, and fish species, have notoriously repetitive genomes. The drawback of sequencing genomes with high repetitive content is that a significant portion of the sequencing cost is spent characterizing these uninformative regions. Up to 80% of the short reads generated by a re-sequencing project may be expected to map to high-copy regions and be discarded during bioinformatic analysis.
Targeting genome re-sequencing to non-repetitive regions of the genome will reduce cost, increase throughput, eliminate wasteful sequencing of non-informative regions, improve the prediction accuracy of genomic selection, and make the process accessible to a wider range of producers. Furthermore, directing sequencing to non-repetitive genic and regulatory regions is more likely to uncover variants that affect traits of importance. To reduce costs further, genomes can be sequenced at low depth and the resulting missing data filled in by imputation. The problem with this solution is that some markers associated with key genetic conditions and abnormalities cannot be inferred by imputation and remain missing. Therefore, an ideal product for whole-genome, sequencing-based genotyping of cattle will deplete unwanted repetitive elements while simultaneously enriching for key regions of the genome. The market currently lacks a product that rapidly and efficiently eliminates repetitive regions from genomes while enriching for key variants, and that can be readily integrated into standard re-sequencing pipelines.
The goal of this SBIR Phase I is to determine the technical feasibility of GD-Seq, a product to characterize the cattle genome using an improved whole-genome, sequencing-based genotyping approach. Upon completion of the product, we anticipate that important regions of the genome will be enriched 300-500% compared to standard sequencing methods. Simultaneously, the company expects to enrich markers associated with key traits about 100-fold. This will allow sequencing resources to be better allocated towards important regions of the genome, and key genetic markers to be accurately analyzed even when the rest of the genome is sequenced at low depth. This product is a significant improvement over current methodologies because it will solve major pain points associated with SNP chips and other sequencing-based approaches.
Anticipated Results and Commercial Applications:
The commercial focus of this technical project is to facilitate the conversion from DNA chip-based genotyping to a new generation of genotyping that relies on re-sequencing using GD-Seq. The process is highly scalable and automatable, yet remains cost-effective by improving sequencing data utilization. This creates the opportunity for more animals to be genotyped at higher marker density, which will support an industry that increasingly demands higher genetic gains, especially for more complex traits such as feed efficiency, fertility, and animal health. A significant market exists for this technology among animal producers, breeders, and researchers.
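The relationship between the repeat fraction and the achievable enrichment can be illustrated with simple arithmetic. The following is a minimal sketch, not the company's method: it assumes that reads are allocated in proportion to the library molecules that survive depletion, that informative regions are untouched, and that "enrichment" is read as a fold change in the share of reads landing in informative regions (the summary does not define the term). The function name and parameters are hypothetical.

```python
def informative_fold_enrichment(repeat_read_fraction, repeat_depletion):
    """Fold change in the share of reads that land in informative regions.

    repeat_read_fraction: share of reads mapping to repeats with no depletion
                          (the summary cites up to 0.80).
    repeat_depletion:     fraction of repeat-derived library molecules removed
                          before sequencing (0 = none, 1 = all); hypothetical.
    """
    if not 0.0 <= repeat_read_fraction < 1.0:
        raise ValueError("repeat_read_fraction must be in [0, 1)")
    if not 0.0 <= repeat_depletion <= 1.0:
        raise ValueError("repeat_depletion must be in [0, 1]")

    informative = 1.0 - repeat_read_fraction
    surviving_repeats = repeat_read_fraction * (1.0 - repeat_depletion)
    # Share of reads in informative regions after depletion.
    new_share = informative / (informative + surviving_repeats)
    return new_share / informative


if __name__ == "__main__":
    for depletion in (0.5, 0.75, 0.9, 1.0):
        fold = informative_fold_enrichment(0.80, depletion)
        print(f"repeat depletion {depletion:.0%}: {fold:.2f}-fold")
```

Under these assumptions the fold change simplifies to 1 / (1 - f × d), where f is the repeat read fraction and d the depletion efficiency, so with 80% of reads in repeats the ceiling is 5-fold, reached only if every repeat-derived molecule is removed.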
Animal Health Component
50%
Research Effort Categories
Basic
30%
Applied
50%
Developmental
20%
Goals / Objectives
The goal of this SBIR Phase I is to determine the technical feasibility of GD-Seq, a product to characterize the cattle genome using an improved whole-genome, sequencing-based genotyping approach. At the end of Phase I, the team expects to have answered the following technical questions:
Can depleting the repetitive portion of the cattle genome during library construction shift the allocation of sequencing resources from repeats to more informative regulatory and gene regions?
Is it possible to program the Cas9 enzyme to efficiently cleave a large number of targets in parallel?
When sequencing the cattle genome at low depth, can certain key regions be enriched so that they receive additional sequencing data and can be accurately characterized without imputation?
The technical objectives of this Phase I project are:
Technical Objective 1. Coarse depletion of repetitive DNA sequences from the cattle genome.
Upon concluding Technical Objective 1, the team will have demonstrated the feasibility of reducing the heterochromatin of cattle prior to sequencing without negatively affecting the sequencing of genes and regulatory regions. In addition, the team will have validated which enzyme combinations to use and how efficient the treatments are for different cattle breeds. A 200-300% enrichment towards low-copy regions of the genome compared to standard WGS is expected as an outcome.
Technical Objective 2. Targeted depletion of repetitive elements from the cattle genome.
Upon concluding Technical Objective 2, RAPiD Genomics will have demonstrated the ability to synthesize a large number of gRNA molecules that direct Cas9 to cleave specific regions of the genome. Furthermore, the team will have evidence of how the abundance of each target affects the stoichiometry of loading Cas9 with the appropriate amount of gRNA, and of how to perform this in a multiplex reaction where thousands of gRNA+Cas9 molecules are complexed together. The team expects to achieve 200% enrichment towards low-copy regions of the genome compared to standard WGS.
Technical Objective 3. Enrichment for markers linked to important genetic conditions and abnormalities in the cattle genome.
Upon concluding Technical Objective 3, RAPiD Genomics will have demonstrated the ability to enrich key regions of the genome using a simple and effective approach. The team will know how to properly design the target-specific primers and balance them in solution relative to the universal primers that amplify the other genomic regions. The team expects to enrich these targets about 100X, so that when the rest of the genome is sequenced at low depth, for example 1X, these important regions are sequenced at 100X, providing high-confidence genotyping.
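The contrast between 1X and 100X in Technical Objective 3 can be made concrete with a small calculation. The sketch below is illustrative only and rests on stated assumptions that are not part of the proposal: read depth at a site follows a Poisson distribution, each read samples either allele of a heterozygous site with probability 1/2, and sequencing errors are ignored. The function name and its parameters are hypothetical.

```python
import math


def prob_both_alleles_seen(mean_depth, min_reads_per_allele=1):
    """Probability that a heterozygous site shows both alleles.

    Assumes Poisson-distributed depth with the given mean. Splitting reads
    between two alleles with probability 1/2 each gives two independent
    Poisson counts with mean mean_depth / 2 (Poisson thinning).
    """
    if mean_depth < 0:
        raise ValueError("mean_depth must be non-negative")
    lam = mean_depth / 2.0
    # P(count < k) for one allele, summed term by term.
    p_below = sum(
        math.exp(-lam) * lam**i / math.factorial(i)
        for i in range(min_reads_per_allele)
    )
    # Both alleles must independently reach the minimum read count.
    return (1.0 - p_below) ** 2


if __name__ == "__main__":
    for depth in (1, 5, 30, 100):
        p = prob_both_alleles_seen(depth)
        print(f"mean depth {depth:>3}X: P(both alleles observed) = {p:.4f}")
```

Under this simplified model only about 15% of heterozygous sites show both alleles at 1X, which is why low-depth data depend on imputation, whereas at 100X the probability is effectively 1.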
Project Methods
This project will be divided into experiments as described below.
Experiment 1.1 - Selection of methylation-dependent enzymes to deplete cattle heterochromatin.
In this experiment, the team will evaluate four commercially available methylation-dependent restriction enzymes to identify which are best able to deplete the cattle heterochromatin. DNA extracted from cattle ear punches will be digested by these enzymes during library construction. The resulting libraries will be sequenced to identify the regions of the genome that each enzyme depleted. The best enzyme combination is the one that depletes a large portion of the cattle genome without digesting important genic regions.
Experiment 1.2 - Evaluation of heterochromatin depletion on a diverse germplasm.
After selecting the best methylation-dependent restriction enzyme combination for the cattle genome in Experiment 1.1, the team will evaluate the performance of this approach on diverse cattle germplasm. This experiment will provide a detailed catalogue of the regions that are depleted from genomes of different breed origin, and of the proportional enrichment achieved in single-copy regions of these breeds.
Experiment 2. Application of CRISPR-Cas9 system to large-scale programmable repeat depletion.
Although the depletion approach proposed in Objective 1 is expected to eliminate a large percentage of repetitive DNA prior to sequencing, many repeats will likely survive that treatment and still consume valuable sequencing resources. The second objective of this proposal is to evaluate a method for targeted depletion of specific regions using a CRISPR-Cas9 system. Cas9 is an endonuclease that can be programmed, through a guide RNA molecule, to cleave specific regions of the genome. The goal is to evaluate the ability of Cas9 to cleave thousands of targets in a single multiplex reaction. In this experiment, the team will construct a large number of guide RNA molecules using a modification of its proprietary batch oligonucleotide construction method, and evaluate their capacity to further deplete undesired genomic regions to improve GD-Seq in cattle.
Experiment 3.1. Application of single primer enrichment to WGS libraries.
Strategy to incorporate single primer extension into target enrichment.
A single primer extension concept will be developed during the final PCR enrichment step of library construction to elevate the sequencing signal for main targets in the cattle genome. During the standard library enrichment process, a pool of target-specific primers and Illumina universal enrichment primers will be added to a PCR reaction. The linear extension primers provide additional copies of the target-containing library molecules at every subsequent PCR cycle, increasing their abundance in the final PCR pool over what would be present without additional enrichment. Results from this experiment will provide the primer design characteristics, optimized reaction conditions, and enrichment efficiencies necessary for implementing targeted enrichment.
Experiment 3.2. Application of single primer enrichment to GD-Seq libraries.
In this experiment, the team will evaluate whether GD-Seq libraries can be coupled with targeted enrichment to produce a tool that can both deplete unwanted repetitive regions and enrich for loci of interest. Stacking these two library treatments in the same workflow will provide a significant improvement over existing skim sequencing methodologies. The results of this experiment will guide the development of a first-class product with a simple, scalable, and automatable workflow that combines depletion and enrichment.
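As an illustration of the first computational step behind Experiment 2, the sketch below enumerates candidate Cas9 target sites in a repeat sequence. It is not the team's design pipeline: it assumes the commonly used Streptococcus pyogenes Cas9, which requires an NGG PAM immediately 3' of a 20-nucleotide protospacer (the proposal only says "Cas9"), and the function names are hypothetical. A real design would also need to exclude guides that match genic or regulatory regions.

```python
import re

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_candidate_protospacers(seq, guide_len=20):
    """List (strand, start, protospacer) for every guide_len-mer followed by NGG.

    Assumes an SpCas9-style NGG PAM. Start positions on the "-" strand are
    0-based offsets into the reverse-complemented sequence, not the input.
    """
    seq = seq.upper()
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % guide_len)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1)))
    return hits


if __name__ == "__main__":
    # Toy sequence for demonstration only; not a real cattle repeat.
    toy_repeat = "A" * 20 + "TGG"
    for strand, start, protospacer in find_candidate_protospacers(toy_repeat):
        print(strand, start, protospacer)
```

Scanning both strands matters because Cas9 can bind either one; in a multiplex design, the resulting candidates would then be filtered and pooled into the guide RNA set.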