Addition by Subtraction – a new tool to improve genetic gain and accelerate breeding decisions

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

SMALL BUSINESS GRANT

Reporting Frequency

Annual

Accession No.

1025779

Grant No.

2021-33530-34402

Cumulative Award Amt.

$100,000.00

Proposal No.

2021-01084

Multistate No.

(N/A)

Project Start Date

Jul 1, 2021

Project End Date

Feb 28, 2023

Grant Year

2021

Program Code

[8.3]- Animal Production & Protection

Recipient Organization
RAPID GENOMICS LLC
747 SW 2ND AVE STE 354
GAINESVILLE, FL 32601-6284

Performing Department
(N/A)

Non Technical Summary
Problem and Proposed Solution: Genomic selection, or the ability to predict complex phenotypes using DNA information, has tremendously accelerated animal breeding programs in recent years. This revolutionary approach is already mainstream in advanced breeding programs for species such as dairy cattle. However, further expansion of genomic selection to larger populations, additional breeding programs, underserved species, and developing countries requires greater prediction accuracy at lower genotyping cost. The first challenge can be addressed by whole-genome re-sequencing to generate ultra-high-density DNA marker panels in the training population. Unfortunately, the species that would benefit the most from genome re-sequencing as a genotyping approach, such as cattle, pigs, and fish species, have notoriously repetitive genomes. The drawback of sequencing genomes with high repetitive content is that a significant portion of the sequencing cost is spent characterizing these uninformative regions. Up to 80% of the short reads generated from a re-sequencing project may be expected to map to high-copy regions and be discarded during bioinformatic analysis.
Targeting genome re-sequencing to non-repetitive regions of the genome will reduce cost, increase throughput, eliminate wasteful sequencing of non-informative regions, improve the prediction accuracy of genomic selection, and give a wider range of producers access to the process. Furthermore, directing sequencing to non-repetitive genic and regulatory regions is more likely to uncover variants that impact traits of importance. As a strategy to reduce costs further, genomes can be sequenced at low depth, followed by imputation to fill in the large amount of missing data inherent to these approaches. The problem with this solution is that some markers associated with key genetic conditions and abnormalities cannot be inferred via imputation and remain missing. Therefore, an ideal product for whole-genome, sequencing-based genotyping of cattle will deplete unwanted repetitive elements while simultaneously enriching for key regions of the genome.
The market currently lacks a product that allows rapid and efficient elimination of repetitive regions from genomes while enriching for key variants, and that can be readily integrated into standard re-sequencing pipelines. The goal of this SBIR Phase I is to determine the technical feasibility of GD-Seq, a product to characterize the cattle genome using an improved whole-genome, sequencing-based genotyping approach. Upon completion of the product, we anticipate that important regions of the genome will be enriched 300-500% compared to standard sequencing methods. Simultaneously, the company expects to enrich markers associated with key traits around 100 times. This will allow sequencing resources to be better allocated towards important regions of the genome, and key genetic markers to be accurately analyzed even when the rest of the genome is sequenced at low depth. This product presents a significant improvement over current methodologies because it will solve major pain points currently associated with SNP chips and other sequencing-based approaches.
Anticipated Results and Commercial Applications: The commercial focus of this technical project is to facilitate the transition from DNA chip-based genotyping to a new generation of genotyping that relies on re-sequencing using GD-Seq. The process is highly scalable and automatable while remaining cost-effective by improving sequencing data utilization. This provides the opportunity for more animals to be genotyped with higher marker density, which will support an industry that increasingly demands higher genetic gains, especially on more complex traits like feed efficiency, fertility, and animal health. A significant market exists for this technology among animal producers, breeders, and even researchers.
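The relationship between repeat depletion and the share of reads that land in informative regions can be sketched with simple arithmetic. The Python sketch below is illustrative only: the function names, the assumption that non-repetitive molecules are untouched by the treatment, and the 90% depletion efficiency used in the note that follows are assumptions made for this example, not values reported by the project.

```python
def informative_fraction(repeat_fraction, depletion_efficiency):
    """Share of reads from non-repetitive regions after repeat depletion.

    repeat_fraction: share of library molecules derived from repeats
        before depletion (0 <= value < 1).
    depletion_efficiency: share of repeat-derived molecules removed
        by the treatment (0 <= value <= 1).
    Assumption: non-repetitive molecules are not affected by depletion.
    """
    if not 0 <= repeat_fraction < 1:
        raise ValueError("repeat_fraction must be in [0, 1)")
    if not 0 <= depletion_efficiency <= 1:
        raise ValueError("depletion_efficiency must be in [0, 1]")
    informative = 1 - repeat_fraction
    surviving_repeats = repeat_fraction * (1 - depletion_efficiency)
    return informative / (informative + surviving_repeats)


def fold_enrichment(repeat_fraction, depletion_efficiency):
    """Fold change in the informative read share relative to no depletion."""
    before = 1 - repeat_fraction
    after = informative_fraction(repeat_fraction, depletion_efficiency)
    return after / before
```

Under this simplified model, if 80% of reads come from repeats and a hypothetical treatment removes 90% of repeat-derived molecules, `fold_enrichment(0.8, 0.9)` evaluates to about 3.6, meaning the same sequencing output yields roughly 3.6 times as many informative reads.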

Animal Health Component

50%

Research Effort Categories

Basic

30%

Applied

50%

Developmental

20%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
303	3440	1081	50%
304	3440	1080	50%

Knowledge Area
303 - Genetic Improvement of Animals; 304 - Animal Genome;

Subject Of Investigation
3440 - Meat, dairy cattle;

Field Of Science
1081 - Breeding; 1080 - Genetics;

Keywords

sequencing-based genotyping

Goals / Objectives
The goal of this SBIR Phase I is to determine the technical feasibility of GD-Seq, a product to characterize the cattle genome using an improved whole-genome, sequencing-based genotyping approach. At the end of Phase I, the team expects to have answered the following technical questions:
Can depleting the repetitive portion of the cattle genome during library construction shift the allocation of sequencing resources from repeats to more informative regulatory and gene regions?
Is it possible to program the Cas9 enzyme to efficiently cleave a large number of targets in parallel?
When sequencing the cattle genome at low depth, can we enrich certain key regions so that they receive additional sequencing data and can be accurately characterized without imputation?
The technical objectives of this Phase I project are:
Technical Objective 1. Coarse depletion of repetitive DNA sequences from the cattle genome. Upon concluding Technical Objective 1, the team will have demonstrated the feasibility of reducing the heterochromatin of cattle prior to sequencing without negatively affecting the sequencing of genes and regulatory regions. In addition, the team will have validated which enzyme combinations to use and how efficient the treatments are for different cattle breeds. A 200-300% enrichment towards low-copy regions of the genome compared to standard WGS is expected as an outcome.
Technical Objective 2. Targeted depletion of repetitive elements from the cattle genome. Upon concluding Technical Objective 2, RAPiD Genomics will have demonstrated the ability to synthesize a large number of gRNA molecules to target Cas9 to cleave specific regions of the genome. Furthermore, the team will have evidence of how the abundance of each target affects the stoichiometry of loading Cas9 with the appropriate amount of gRNA, as well as how to perform this in a multiplex reaction where thousands of gRNA+Cas9 molecules are complexed together. The team expects to achieve 200% enrichment towards low-copy regions of the genome compared to standard WGS.
Technical Objective 3. Enrichment for markers linked to important genetic conditions and abnormalities in the cattle genome. Upon concluding Technical Objective 3, RAPiD Genomics will have demonstrated the ability to enrich key regions of the genome using a simple and effective approach. The team will know how to properly design and balance the target-specific primers in solution relative to the universal primers that amplify the other genomic regions. The team expects to enrich these targets about 100X, so that when the rest of the genome is sequenced at low depth, for example 1X, these important regions are sequenced at 100X, providing high-confidence genotyping.
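To illustrate why a site sequenced at 1X cannot be genotyped directly while a site at 100X can, the following Python sketch uses two textbook approximations. Both are simplifying assumptions made for this example rather than the project's genotyping model: read depth across sites follows a Poisson distribution, and each read at a heterozygous site samples either allele with probability 0.5 with no sequencing errors.

```python
import math


def prob_no_coverage(mean_depth):
    """Poisson probability that a site receives zero reads at a given mean depth."""
    return math.exp(-mean_depth)


def prob_both_alleles_seen(depth):
    """Probability that both alleles of a heterozygous site appear among `depth` reads.

    Assumes each read samples one of the two alleles with probability 0.5
    and ignores sequencing errors.
    """
    if depth < 2:
        # A single read (or none) can never show both alleles.
        return 0.0
    return 1 - 2 * 0.5 ** depth
```

Under these assumptions, about 37% of sites receive no reads at all at a mean depth of 1X, and a heterozygous site covered by a single read can never be recognized as heterozygous, which is why low-depth data depend on imputation. At a depth of 100 reads, the probability of observing both alleles is effectively 1.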

Project Methods
This project will be divided into experiments as described below.
Experiment 1.1 - Selection of methylation-dependent enzymes to deplete cattle heterochromatin. In this experiment, the team will evaluate four commercially available methylation-dependent restriction enzymes to identify the one best able to deplete cattle heterochromatin. DNA extracted from cattle ear punches will be digested by these enzymes during library construction. The resulting libraries will be sequenced so we can identify the regions of the genome they depleted. The best enzyme combination is the one that depletes a large portion of the cattle genome without digesting important genic regions.
Experiment 1.2 - Evaluation of heterochromatin depletion on a diverse germplasm. After selecting the best methylation-dependent restriction enzyme combination for the cattle genome in Experiment 1.1, the team will evaluate the performance of this approach on a diverse germplasm of cattle. This experiment will provide a detailed catalogue of the regions that are depleted from genomes of different breed origin, and the proportional enrichment that is achieved in single-copy regions of these breeds.
Experiment 2. Application of the CRISPR-Cas9 system to large-scale programmable repeat depletion. Although the depletion approach proposed in Objective 1 is expected to eliminate a large percentage of repetitive DNA prior to sequencing, many repeats will likely survive that treatment and still consume valuable sequencing resources. The second objective of this proposal is to evaluate a method to carry out targeted depletion of specific regions using a CRISPR-Cas9 system. Cas9 is an endonuclease that can be programmed to cleave specific regions of the genome through a guide RNA molecule. The goal is to evaluate the performance of Cas9 in cleaving thousands of targets in a single multiplex reaction. In this experiment, the team will construct a large number of guide RNA molecules using a modification of its proprietary batch oligonucleotide construction method, and evaluate their capacity to further deplete undesired genomic regions to improve GD-Seq in cattle.
Experiment 3.1 - Application of single primer enrichment to WGS libraries. Strategy to incorporate single primer extension into target enrichment: a single primer extension concept will be developed during the final PCR enrichment step of library construction to elevate the sequencing signal for main targets in the cattle genome. During the standard library enrichment process, a pool of target-specific primers and Illumina universal enrichment primers will be added to a PCR reaction. The linear extension primers provide additional copies of the target-containing library molecules at every subsequent PCR cycle, increasing their abundance in the final PCR pool over what would be present without additional enrichment. Results from this experiment will provide the primer design characteristics, optimized reaction conditions, and enrichment efficiencies necessary for implementing targeted enrichment.
Experiment 3.2 - Application of single primer enrichment to GD-Seq libraries. In this experiment, the team will evaluate whether GD-Seq libraries can be coupled with targeted enrichment to produce a tool that can both deplete unwanted repetitive regions and enrich for loci of interest. Stacking these two library treatments in the same workflow will provide a significant improvement over existing skim sequencing methodologies. The results obtained from this experiment will guide the development of a first-class product with a simple, scalable, and automatable workflow that combines depletion and enrichment.
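The selection criterion stated for Experiment 1.1 (deplete a large portion of the genome without digesting important genic regions) could be expressed as a simple ranking rule. The Python sketch below is hypothetical: the two metrics, the 0.95 retention threshold, and the function name are assumptions introduced for illustration and are not taken from the project's analysis.

```python
def pick_enzyme(metrics, min_genic_retention=0.95):
    """Pick the enzyme with the lowest repeat read fraction that preserves genic regions.

    metrics maps an enzyme name to a tuple
    (repeat_read_fraction, genic_retention), where genic_retention is the
    coverage of genic regions relative to an undigested control library.
    Returns None if no enzyme meets the retention threshold.
    Both metrics and the threshold are assumptions for this sketch.
    """
    eligible = {
        name: values
        for name, values in metrics.items()
        if values[1] >= min_genic_retention
    }
    if not eligible:
        return None
    # Among eligible enzymes, the lowest repeat read fraction wins.
    return min(eligible, key=lambda name: eligible[name][0])
```

As a usage example with placeholder numbers (not project results), `pick_enzyme({"enzyme_A": (0.60, 0.99), "enzyme_B": (0.35, 0.97), "enzyme_C": (0.25, 0.80)})` returns `"enzyme_B"`: `enzyme_C` removes more repeats but fails the genic retention threshold.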

Progress 07/01/21 to 02/28/23

Outputs
Target Audience: During this Phase I project, we reached out to the following audiences:
Key opinion leaders working in cattle genomics, to identify the regions of the genome associated with important genetic conditions and abnormalities, which we attempted to enrich prior to low-depth sequencing.
Industry partners with whom we planned to commercialize a final product, including genomics and kit manufacturing companies.
Key cattle breeding associations that will ultimately be clients and provide samples to us. They helped gauge interest in the product.
Changes/Problems: Despite several tests to optimize Technical Objective 3, which aimed at enriching key regions of the genome without adding steps to the protocol, we were unable to reach significant enrichment. Of the 29 targets we designed and attempted to enrich, none was enriched beyond the control, suggesting that a change in approach will be needed. We have identified alternative strategies, which will be tested in future phases of the project.
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? The target audience was contacted during the sales and business development process that RAPiD Genomics routinely carries out with them. This allowed us to continuously monitor the interest of the target audience in the GD-Seq product and adjust the focus to key priorities. For example, it has become clear that coarse depletion is more important than targeted depletion of the genomic DNA because it will lead to a cheaper and faster protocol. Similarly, these activities clarified how important it is to enrich key target regions to obtain a product that the community will readily adopt, an area that requires more research and optimization.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? It is important to characterize the genome of cattle so that this DNA information can be incorporated into the breeding process and lead to more efficient livestock production. It is also important for this process to be done cost-effectively, requiring continuous and disruptive technological improvements to how the DNA information is generated. Because the genome of cattle is large and contains a significant fraction of less informative regions, commonly referred to as repetitive regions, the first goal for the product being developed under this application is to deplete these repetitive regions. The DNA information can then be generated more efficiently via sequencing, as the sequencing resources (cost and time) are focused on the more important regions of the cattle genome. The second goal was to create a high-throughput, simple, and cost-effective protocol for going from tissue to depleted DNA ready for sequencing. The cattle industry has identified variants in the cattle genome that are biologically and economically important. When sequencing the DNA, it is necessary that these regions are well sampled and characterized, which cannot be achieved when sequencing is done at low depth, known as skim or low-pass sequencing. Therefore, the last goal was to enrich these regions while preparing the samples for sequencing. When the GD-Seq product is released, companies and organizations performing cattle breeding will be able to use it as a standard genotyping service. Customers will send samples to our laboratory, which will process them from tissue to DNA information and return useful genetic information that they can overlay on their breeding program to select the best animals.
Experiment 1.1 - From tissue to sequencing libraries. We adopted ear punches collected in AllFlex Tissue Sampling Units (Merck) as the default for all experiments. This is a standard collection protocol in the industry and leads to reasonably consistent results. A protocol for high-throughput extraction of DNA from tissue was developed using magnetic beads and liquid handling automation. Another outcome was a protocol to go from extracted DNA to sequencing, known as library preparation. We streamlined the process by using an enzymatic tagmentation reaction, which is faster than alternative methods and suits the overarching goals of the GD-Seq product.
Experiment 1.2 - Selection of methylation-dependent enzymes to deplete cattle heterochromatin. Different enzymes were evaluated for removing the repetitive portion of the genome associated with high levels of methylation, commonly described as heterochromatin. All enzymes depleted the DNA to different extents, but MspJI produced the best results and was adopted for subsequent tests.
Experiment 2.1 - Batch construction of guide RNAs for Cas9. We devised and built a system to construct thousands of guide RNAs for Cas9 in parallel. The process starts by identifying Cas9 targets, to which we add known sequences on both sides that are used to convert the molecules from DNA to RNA. The DNA molecules were synthesized in parallel on a microarray chip and converted to functional guides via a series of molecular biology reactions.
Experiment 2.2 - Self-programming of argonaute enzyme to deplete repetitive elements. We investigated the capability of a prokaryotic argonaute enzyme to "learn" from repetitive elements and create its own targets to deplete them. To achieve that, we exposed the enzyme to cattle Cot-1 DNA, which contains large amounts of repetitive DNA. In this context, the argonaute enzyme uses this DNA to create short guides that can then be used to deplete these regions when combined with genomic DNA. Preliminary results suggest that this process is viable, but further evaluation is needed to determine its efficiency and its specificity for the repetitive targets.
Experiment 3.1 - Target enrichment. A set of 29 key regions of the cattle genome was identified via consultation with the target audience. Oligos for these targets were designed to enrich these regions during the library construction optimized in Experiment 1.1. The resulting libraries were sequenced to determine the percentage of enrichment obtained, with different conditions explored. Further optimization is required, because the results obtained by the completion of the project did not show significant enrichment for these key regions.
The results obtained during this project allowed us to standardize individual cattle sample collection, cost-effectively extract DNA, convert the DNA to sequencing-ready molecules, and identify the best enzyme for coarse depletion of the repetitive DNA, so that sequencing resources are concentrated on more important regions of the genome. An alternative approach was developed in which argonaute enzymes are self-programmed to deplete regions of the genome; it will be further explored in future phases of the project. The project also explored the possibility of enriching key regions of the genome in the same steps in which libraries are constructed, therefore adding no time to the process.
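The template design described in Experiment 2.1 (a Cas9 target flanked by known sequences that allow conversion from DNA to RNA) can be sketched as follows. This is an illustration under stated assumptions, not the company's proprietary method: the report does not name the flanking sequences, so the T7 promoter and the first bases of a commonly used sgRNA scaffold are used here as stand-ins, and only sites followed by an NGG PAM on the given strand are considered.

```python
import re

# Assumed flanking sequences (not specified in the report):
# a T7 promoter upstream for in vitro transcription, and the first
# bases of a commonly used sgRNA scaffold downstream.
T7_PROMOTER = "TAATACGACTCACTATAG"
SCAFFOLD_START = "GTTTTAGAGCTAGAAATAGC"


def find_protospacers(sequence, length=20):
    """Return (position, protospacer) for each site followed by an NGG PAM.

    Only the given strand is scanned; overlapping sites are reported.
    """
    sequence = sequence.upper()
    # Lookahead so that overlapping candidate sites are all found.
    pattern = re.compile(r"(?=([ACGT]{%d})[ACGT]GG)" % length)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


def build_template(protospacer):
    """Assemble a DNA oligo: upstream flank + target + downstream flank."""
    return T7_PROMOTER + protospacer + SCAFFOLD_START
```

For a repeat consensus sequence held in a variable such as `repeat_seq` (hypothetical), `[build_template(p) for _, p in find_protospacers(repeat_seq)]` would list candidate oligos for parallel synthesis. A real design would also need to screen guides against genic regions to avoid off-target cleavage, which this sketch does not attempt.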

Publications

Progress 07/01/21 to 06/30/22

Outputs
Target Audience: During this Phase I project, we plan to reach out to the following audiences as we advance the technical aspects needed to develop a product:
Key opinion leaders working in cattle genomics, to identify the regions of the genome associated with important genetic conditions and abnormalities, which we plan to enrich prior to low-depth sequencing.
Industry partners with whom we hope to commercialize a final product, including genomics, veterinary, and kit manufacturing companies.
Key cattle breeding associations that will ultimately be clients and provide samples to us. They will help gauge interest in the product and provide valuable testing material.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? The next steps in this project will focus on two objectives. The first is the targeted depletion of residual unfavorable regions of the genome that can still be removed, to further focus the sequencing resources on more relevant regions. The second is to attempt to enrich key regions of the genome that are biologically or commercially important to the target audience, which will give more certainty when classifying animals for those regions.

Impacts
What was accomplished under these goals? It is important to characterize the genome of cattle so that this DNA information can be incorporated into the breeding process and lead to more efficient livestock production. It is also important for this process to be done cost-effectively, requiring continuous and disruptive technological improvements to how the DNA information is generated. Because the genome of cattle is large and contains a significant fraction of less informative regions, commonly referred to as repetitive regions, the first goal for the product being developed under this application is to deplete these repetitive regions. The DNA information can then be generated more efficiently via sequencing, as the sequencing resources (cost and time) are focused on the more important regions of the cattle genome. The second goal was to create a high-throughput, simple, and cost-effective protocol for going from tissue to depleted DNA ready for sequencing. When the GD-Seq product is released, companies and organizations performing cattle breeding will be able to use it as a standard genotyping service. Customers will send samples to our laboratory, which will process them from tissue to DNA information and return useful genetic information that they can overlay on their breeding program to select the best animals.
Technical Objective 1. Coarse depletion of repetitive DNA sequences from the cattle genome.
Experiment 1.1 - From tissue to sequencing. We adopted ear punches collected in AllFlex Tissue Sampling Units (Merck) as the default for all experiments. This is a standard collection protocol in the industry and leads to reasonably consistent results. A protocol for high-throughput extraction of DNA from tissue was developed using magnetic beads and liquid handling automation. Another outcome was a protocol to go from extracted DNA to sequencing, known as library preparation. We streamlined the process by using an enzymatic tagmentation reaction, which is faster than alternative methods and suits the overarching goals of the GD-Seq product.
Experiment 1.2 - Selection of methylation-dependent enzymes to deplete cattle heterochromatin. Different enzymes were evaluated for removing the repetitive portion of the genome associated with high levels of methylation, commonly described as heterochromatin. All enzymes depleted the DNA to different extents, but MspJI produced the best results and was adopted for subsequent tests.
Altogether, the results obtained so far allow us to standardize individual cattle sample collection, cost-effectively extract DNA, convert the DNA to sequencing-ready molecules, and identify the best enzyme for coarse depletion of the repetitive DNA we want to eliminate, so that sequencing resources are concentrated on more important regions of the genome.

Publications