Source: UNIV OF MINNESOTA submitted to
COMPARISON OF CRISPR-INDUCED MUTATIONS AT DUPLICATE LOCI, INCLUDING CHANGES THAT ARE ON-TARGET, NON-TARGET, AND UNINTENDED
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1031365
Grant No.
2023-33522-41008
Cumulative Award Amt.
$551,099.00
Proposal No.
2023-02447
Multistate No.
(N/A)
Project Start Date
Sep 1, 2023
Project End Date
Aug 31, 2026
Grant Year
2023
Program Code
[HX]- Biotechnology Risk Assessment
Recipient Organization
UNIV OF MINNESOTA
(N/A)
ST PAUL,MN 55108
Performing Department
(N/A)
Non Technical Summary
Targeted genetic modifications have the potential to create novel agricultural products that are free of foreign DNA. Modifications that involve a single mutation are exempt from the regulatory process applied to traditional bioengineered crops. However, multiple gene copies or members of a gene family contribute to many agronomically important plant phenotypes. Therefore, a single mutation is insufficient to induce the desired phenotype. For federal regulatory agencies to make science-based decisions, it is necessary to determine if modifications of multiple gene copies or gene family members pose an increased risk to the environment or human health. We will investigate the potential for unintended changes due to CRISPR/Cas9 modification of five agronomically important gene families in soybean where multiple gene copies must be modified to induce a desired phenotype. We will use long-read DNA sequencing to characterize the full genomic complement of the five gene families in two soybean varieties. CRISPR/Cas9 edits generate double-stranded breaks in DNA that can induce chromosomal structural variants. Thus, we will use long-read sequencing to detect structural and nucleotide sequence variants potentially induced by CRISPR/Cas9 edits of multiple linked and dispersed copies of gene family members. Visual comparisons and hyperspectral analysis will be used to identify any unintended phenotypic changes in modified plants. Our data will increase understanding of the environmental risks associated with CRISPR/Cas9-induced changes in an important crop plant and provide a contrast with variation induced by unregulated practices such as conventional breeding and mutagenesis.
Animal Health Component
50%
Research Effort Categories
Basic
50%
Applied
50%
Developmental
0%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
201                    1820                              1080                      100%
Knowledge Area
201 - Plant Genome, Genetics, and Genetic Mechanisms;

Subject Of Investigation
1820 - Soybean;

Field Of Science
1080 - Genetics;
Goals / Objectives
Objective 1. Use long-read DNA sequencing to characterize duplicate genes in two soybean experimental lines.
Objective 2. Characterize CRISPR modifications at duplicate genes that vary in linkage, copy number, sequence divergence, and phenotypic effect.
Objective 3. Use long-read DNA sequencing and phenotypic screening to identify potential off-target or unintended changes induced by multiple CRISPR edits.
Project Methods
Objective 1. Use long-read DNA sequencing to characterize duplicate genes in two soybean experimental lines. (Years 1 - 2).
In Objective 1, we address the fundamental challenge of working with duplicate genes: often, the number of copies of each gene is unknown. We aim to determine the number of complete, partial, and pseudogene copies of each of the genes we target in Bert and Williams82 before making comparisons with CRISPR-modified lines. Because sequence similarity can vary among gene copies, the most important factor is the similarity of the copies at the genic positions that match the guide RNAs (gRNAs) used to target them.
Several approaches can be combined for this characterization, including mapping existing short reads to canonical versions of each gene to measure read depth and, thus, copy number. We can also perform local reassembly of existing PacBio reads to improve the characterization of each region. Ultimately, we expect long and accurate Oxford Nanopore Technologies (ONT) sequences to provide a direct read-through of these five genes and their various copies in Bert and Williams82. Thus, the product of this objective will be the full-length, sequence-level characterization of the five duplicate genes we are targeting in both Bert and Williams82.
Objective 2. Characterize CRISPR modifications at duplicate genes that vary in linkage, copy number, sequence divergence, and phenotypic effect. (Years 1 - 3).
In Objective 2, we will assess CRISPR edits produced to create phenotypic changes that improve the agronomic adaptation, nutritional value, and utility of soybean. Each CRISPR-modified soybean line will be grown in the greenhouse and monitored for phenotypic changes resulting from genetic modification. We will assess modified lines for changes related to the modified genes. For Gibberellin 2-oxidase 8, this will involve changes in plant height or architecture, while the other four gene families affect seed traits that are assessed through proteomic and digestibility studies. Phenotypic characterization of modified lines in this objective will help select lines for further characterization by long-read sequencing in Objective 3.
Objective 3. Use long-read DNA sequencing and phenotypic screening to identify potential off-target or unintended changes induced by multiple CRISPR edits. (Years 1 - 3).
We will generate ONT whole-genome long-read sequence for ≥ 12 CRISPR-modified lines for comparison with unmodified versions of the transformation germplasm. The sequence will be used to identify targeted, non-targeted, and unintended changes in soybean genomes. Long sequence reads from individual lines will be mapped to assemblies of 'Bert' and 'Williams82' supplemented with fully characterized versions of the duplicate genes generated under Objective 1. While we will pay special attention to the duplicate genes targeted for CRISPR modification, DNA sequence data is inherently whole-genome, and structural changes anywhere in the genome will be identified. Because modification of multiple CRISPR targets could induce structural rearrangements, we will use at least two current approaches that take advantage of ONT sequence for structural variant detection.
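The read-depth approach to copy number described under Objective 1 can be illustrated with a minimal sketch. It assumes that per-base depths over a gene and over known single-copy regions have already been extracted from a read alignment. The function name and the depth values are hypothetical, and this is not the project's pipeline.

```python
from statistics import median


def estimate_copy_number(gene_depths, background_depths):
    """Return the ratio of median depth over a gene to median depth over
    single-copy background regions (a rough copy-number estimate)."""
    background = median(background_depths)
    if background <= 0:
        raise ValueError("background depth must be positive")
    return median(gene_depths) / background


# Toy per-base depths (hypothetical values, not project data).
single_copy_depths = [29, 31, 30, 30, 28, 32]
gene_depths = [61, 59, 60, 62, 58, 60]

ratio = estimate_copy_number(gene_depths, single_copy_depths)
print(f"depth ratio: {ratio:.2f}, rounded copy number: {round(ratio)}")
```

A ratio near 2 would suggest that reads from two near-identical copies collapsed onto one canonical gene sequence. Divergent copies may not map to the canonical sequence at all, which is one reason the methods combine this with reassembly and long reads.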

Progress 09/01/23 to 08/31/24

Outputs
Target Audience: Target audiences include federal regulators, plant genetics and genomics researchers, geneticists and breeders, evolutionary biologists, and population geneticists. They also include users and producers of soybeans or other crops subject to transgenesis or other forms of genetic modification.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided?
The project has provided training and professional development opportunities for undergraduate students and a postdoctoral scholar. All trainees involved in the project have improved their biocomputing and programming skills and are learning data management and analysis skills. The lab also has a weekly lab meeting and journal club that provides the opportunity to discuss science and genetics informally with lab members. The grant supports postdoctoral scholar Dr. Arge, who also contributed to developing DNA sequence handling pipelines. Dr. Arge is involved in DNA sequencing sample preparation and analysis. He also helps prepare manuscripts related to the project. The grant has also supported three undergraduate students, Elaine Lee, Jacob Pacheco, and Nathan Liang. Ms. Lee has now completed her undergraduate degree. Mr. Pacheco is completing his undergraduate degrees. Mr. Liang graduated and works as a computer programmer for Wells Fargo Bank. All three students continue to contribute to the data analysis related to this project.
How have the results been disseminated to communities of interest?
Our primary means of disseminating the research has been through the scientific literature. A manuscript built on the broader theme of the nucleotide sequence-level consequences of biotechnology approaches is currently in review at PLOS Genetics and available on bioRxiv (doi: 10.1101/2024.05.06.592067). Both the code used for data handling and analysis and all of the primary and important intermediate data are made available. The manuscript noted above includes a public GitHub repository for code (https://github.com/MorrellLAB/Barley_Mutated), and all data that is not in GenBank will be available from a Data Repository for the University of Minnesota (DRUM) archive record that has a unique digital object identifier (DOI) and is indexed by search engines. More general, multipurpose code developed in connection with the project is also available in public repositories. These include file conversion scripts (https://github.com/MorrellLAB/File_Conversions) and a workflow for variant calling using nucleotide sequencing data (https://github.com/MorrellLAB/sequence_handling). A public repository also includes resources related to the soybean Williams82 reference genome (https://github.com/MorrellLAB/Williams82_reference). Code related to the current project is in a private repository at https://github.com/MorrellLAB/usda_brag_crispr.
What do you plan to do during the next reporting period to accomplish the goals?
The protein domain-based search for gene family members has dramatically improved the identification of genes that could be modified by a guide RNA targeting a member of the gene family. It also makes it clear that gene copy numbers can exceed initial estimates from a genome assembly. However, a focus on protein domains has the potential to overlook partial gene copies, gene fusions, or pseudogenes. Pseudogenes could be a particular problem because they may not contain readily recognizable protein domains but could still be subject to unintended changes if they provide a sufficient match to a (relatively short) CRISPR/Cas9 guide RNA. To address this issue, particularly regarding pseudogenes, we are working on an alignment-free approach based on k-mers (short strings of DNA of length k). This approach will allow us to break target gene family members into smaller segments, which can then be identified in sequence reads without alignment to a genome assembly. Our approach will follow that of Pajust & Remm (2023 - doi:10.1038/s41598-023-44636-z). This approach, combined with the protein domain-based search for gene family members, should provide a comprehensive catalog of every copy of each gene family member. This is important for identifying potential off-target changes, but also for identifying all of the members of a family that would need to be modified to successfully induce phenotypic changes through gene knockouts.
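The general idea of the alignment-free k-mer approach can be sketched as follows. This is a generic illustration, not the method of Pajust & Remm or the project's code. The sequences, the value of k, and all function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(target, k):
    """Collect k-mers from both strands of a target gene sequence."""
    return kmers(target, k) | kmers(reverse_complement(target), k)


def shared_fraction(read, index, k):
    """Fraction of a read's distinct k-mers that occur in the target index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0
    return len(read_kmers & index) / len(read_kmers)


# Hypothetical target sequence and reads, for illustration only.
target = "ATGGCTAGCTAGGATCCGATCGATTACG"
k = 8
index = build_index(target, k)

read_from_gene = target[2:22]                      # read derived from the target
read_other_strand = reverse_complement(read_from_gene)
read_unrelated = "T" * 20

for name, read in [("gene", read_from_gene),
                   ("reverse strand", read_other_strand),
                   ("unrelated", read_unrelated)]:
    print(name, shared_fraction(read, index, k))
```

Reads that share a high fraction of k-mers with a target could be flagged as candidate gene copies, including pseudogenes that lack recognizable protein domains. A real analysis would need a larger k and some tolerance for sequencing errors.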

Impacts
What was accomplished under these goals?
Objective 1 - Create CRISPR/Cas9 edited soybean families where lineages differ in the number of generations they carried the CRISPR/Cas9 transgene (Years 1 - 2).
We have completed Oxford Nanopore (ONT) whole genome sequencing for the soybean lines Williams82 and Bert used as the transformation germplasm in our study. ONT sequencing now includes 42x mean coverage for Bert and 39x for Williams82. This is sufficient coverage to achieve our goals. These two lines are used for CRISPR/Cas9 modification of seven duplicate gene families. The Stupar lab has published an updated "platinum" genome assembly of Williams82 based on Pacific Biosciences (PacBio) sequencing (Espina et al. 2024 - doi: 10.1111/tpj.17026). We are using this reference as it likely incorporates a more accurate characterization of multigene families. Postdoctoral scholar Luis Willian Pacheco Arge has developed an original software pipeline designed to identify all members of a gene family based on a search for protein domains similar to the target loci. Dr. Arge has named this pipeline Genome-Wide Gene Family Scan (GWGFScan). It is under development and will be available at https://github.com/MorrellLAB/GWGFScan. Among the seven genes that the Stupar Lab targets, we have currently identified additional gene family members for three of the families. These include tandem duplicates of genes. For one gene family, our analysis suggests one fewer gene family member. As detailed below, this analysis likely does not capture pseudogenes or partial gene copies. We are implementing approaches to identify all gene copies that CRISPR/Cas9 guide RNAs could alter.
Objective 2 - Use phenotypic screening and DNA resequencing to identify on-target, off-target, and genome-wide changes induced by CRISPR/Cas9 activity (Years 1 - 3).
We are using AVITI 300 bp paired-end sequencing for a set of samples modified by CRISPR/Cas9 at our seven target gene families. This will permit a first detailed look at modifications at target and non-target loci. Phenotypic screening for off-target changes in individual plants is ongoing and is an important part of the characterization of modified plants. We have also developed workflows and resources that characterize "callable" regions of the soybean genome, primarily by identifying genomic regions with high degrees of repetitive sequences or deletions relative to the reference. We have also developed code for manipulating long-read (Oxford Nanopore) sequencing data. These resources are noted in this report and available in public GitHub repositories.
Objective 3 - Use long-read DNA sequencing and phenotypic screening to identify potential off-target or unintended changes induced by multiple CRISPR edits.
While completing long-read sequencing for the soybean lines Bert and Williams82, used as the genetic background in all these experiments, we have optimized the conditions for DNA extraction and ONT sequencing. This typically involves loading and washing a flow cell so that three reactions are loaded per flow cell. This results in ~15x coverage per sample per flow cell, which is sufficient coverage for our off-target variant detection efforts. Thus, we can generate ONT resequencing data for ~2 CRISPR/Cas9-modified samples per gene family targeted for modification.
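The coverage figures reported above imply a simple yield calculation, sketched here. The genome size of about 1.1 Gb is an assumption based on the commonly cited approximate size of the soybean genome, not a value from this report. The function names are hypothetical.

```python
GENOME_SIZE_BP = 1.1e9  # assumption: approximate soybean genome size


def required_yield_bp(coverage, samples, genome_size_bp=GENOME_SIZE_BP):
    """Total bases a flow cell must yield for a given per-sample coverage."""
    return coverage * samples * genome_size_bp


def mean_coverage(total_bases, genome_size_bp=GENOME_SIZE_BP):
    """Mean coverage obtained from a given number of sequenced bases."""
    return total_bases / genome_size_bp


# Three samples per flow cell at ~15x each, as described in the report.
yield_bp = required_yield_bp(coverage=15, samples=3)
print(f"approximate yield needed per flow cell: {yield_bp / 1e9:.1f} Gb")
print(f"coverage per sample: {mean_coverage(yield_bp / 3):.1f}x")
```

Under this assumption, three samples at ~15x correspond to roughly 50 Gb of usable sequence per flow cell. Actual yield depends on read-length distribution and filtering, which this sketch ignores.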

Publications

  • Type: Theses/Dissertations Status: Accepted Year Published: 2023 Citation: Liu C (2023) Biological problem solving through computation. Conservancy University of Minnesota https://hdl.handle.net/11299/259763
  • Type: Other Journal Articles Status: Submitted Year Published: 2024 Citation: Liu C, Frascarelli G, Stec AO, Heinen S, Lei L, Wyant SR, Legg E, Spiller M, Muehlbauer GJ, Smith KP, Morrell PL. 2024. Sodium azide mutagenesis induces a unique pattern of mutations. bioRxiv 2024.05.06.592067. doi:10.1101/2024.05.06.592067