Progress 09/01/23 to 08/31/24
Outputs Target Audience: Target audiences include federal regulators, plant genetics and genomics researchers, geneticists and breeders, evolutionary biologists, and population geneticists. They also include users and producers of soybeans or other crops subject to transgenesis or other forms of genetic modification. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
The project has provided training and professional development opportunities for undergraduate students and a postdoctoral scholar. All trainees involved in the project have improved their biocomputing and programming skills and are learning data management and analysis skills. The lab also holds a weekly lab meeting and journal club, which gives lab members the opportunity to discuss science and genetics informally. The grant supports postdoctoral scholar Dr. Arge, who contributed to developing pipelines for handling DNA sequencing data, is involved in DNA sequencing sample preparation and analysis, and helps prepare manuscripts related to the project. The grant has also supported three undergraduate students: Elaine Lee, Jacob Pacheco, and Nathan Liang. Ms. Lee has now completed her undergraduate degree. Mr. Pacheco is completing his undergraduate degrees. Mr. Liang graduated and works as a computer programmer for Wells Fargo Bank. All three students continue to contribute to the data analysis related to this project.
How have the results been disseminated to communities of interest?
Our primary means of disseminating the research has been through the scientific literature. A manuscript built on the broader theme of the nucleotide sequence-level consequences of biotechnology approaches is currently in review at PLOS Genetics and available on bioRxiv (doi: 10.1101/2024.05.06.592067). Both the code used for data handling and analysis and all of the primary and important intermediate data are made available. The manuscript noted above includes a public GitHub repository for code (https://github.com/MorrellLAB/Barley_Mutated), and all data that are not in GenBank will be available from a Data Repository for the University of Minnesota (DRUM) archive record that has a unique digital object identifier (DOI) and is indexed by search engines.
More general, multipurpose code developed in connection with the project is also available in public repositories. These include file conversion scripts (https://github.com/MorrellLAB/File_Conversions) and a workflow for variant calling using nucleotide sequencing data (https://github.com/MorrellLAB/sequence_handling). A public repository also includes resources related to the soybean Williams82 reference genome (https://github.com/MorrellLAB/Williams82_reference). Code related to the current project is in a private repository (https://github.com/MorrellLAB/usda_brag_crispr).
What do you plan to do during the next reporting period to accomplish the goals?
The protein domain-based search for gene family members has dramatically improved the identification of genes that could be modified by a guide RNA targeting a member of the gene family. It also makes it clear that gene copy numbers can exceed initial estimates from a genome assembly. However, a focus on protein domains has the potential to overlook partial gene copies, gene fusions, or pseudogenes. Pseudogenes could be a particular problem because they may not contain readily recognizable protein domains but could still be subject to unintended changes if they provide a sufficient match to a (relatively short) CRISPR/Cas9 guide RNA. To address this issue, particularly regarding pseudogenes, we are working on an alignment-free approach based on k-mers (short DNA strings of length k). This approach will allow us to break target gene family members into smaller segments that can then be identified in sequence reads without alignment to a genome assembly. Our approach will follow that of Pajust & Remm (2023; doi:10.1038/s41598-023-44636-z). Combined with the protein domain-based search for gene family members, it should provide a comprehensive catalog of every copy of each gene family member. This is important for identifying potential off-target changes and also for identifying all members of a gene family that would need to be modified to induce phenotypic changes through gene knockouts.
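The k-mer idea described above can be illustrated with a minimal Python sketch. It is not the Pajust & Remm method or the project's pipeline; the function names, the toy sequences, and the choice of k are all hypothetical and chosen only for illustration. The sketch decomposes a target gene sequence into canonical k-mers (a k-mer and its reverse complement are treated as the same string) and counts how many reads contain each one, with no alignment to an assembly.

```python
from typing import Dict, Iterable, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def canonical_kmers(seq: str, k: int) -> Set[str]:
    """Return the set of canonical k-mers in seq.

    The canonical form is the lexicographically smaller of a k-mer and
    its reverse complement, so both strands map to the same key.
    k-mers containing characters other than A/C/G/T are skipped.
    """
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


def read_support(target: str, reads: Iterable[str], k: int) -> Dict[str, int]:
    """For each canonical k-mer of the target, count the reads containing it."""
    target_kmers = canonical_kmers(target, k)
    support = {kmer: 0 for kmer in target_kmers}
    for read in reads:
        for kmer in canonical_kmers(read, k) & target_kmers:
            support[kmer] += 1
    return support


def fraction_recovered(support: Dict[str, int]) -> float:
    """Fraction of target k-mers seen in at least one read."""
    if not support:
        return 0.0
    return sum(1 for count in support.values() if count > 0) / len(support)


if __name__ == "__main__":
    # Toy sequences (hypothetical); real analyses would use much larger k.
    target = "ACGTTGCAAGG"
    reads = [reverse_complement(target), "TTTTTTTTTT"]
    print(fraction_recovered(read_support(target, reads, k=5)))
```

In practice the per-k-mer counts, scaled by sequencing depth, are what make copy number estimation possible; a partial gene copy or pseudogene would show up as support for only a subset of the target k-mers.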
Impacts What was accomplished under these goals?
Objective 1 - Create CRISPR/Cas9 edited soybean families where lineages differ in the number of generations they carried the CRISPR/Cas9 transgene (Years 1 - 2).
We have completed Oxford Nanopore (ONT) whole genome sequencing for the soybean lines Williams82 and Bert, which are used as the transformation germplasm in our study. ONT sequencing now includes 42x mean coverage for Bert and 39x for Williams82. This is sufficient coverage to achieve our goals. These two lines are used for CRISPR/Cas9 modification of seven duplicate gene families. The Stupar lab has published an updated "platinum" genome assembly of Williams82 based on Pacific Biosciences (PacBio) sequencing (Espina et al. 2024 - doi: 10.1111/tpj.17026). We are using this reference as it likely incorporates a more accurate characterization of multigene families. Postdoctoral scholar Luis Willian Pacheco Arge has developed an original software pipeline designed to identify all members of a gene family based on a search for protein domains similar to the target loci. Dr. Arge has named this pipeline Genome-Wide Gene Family Scan (GWGFScan). It is under development and will be available at https://github.com/MorrellLAB/GWGFScan. Among the seven genes that the Stupar Lab targets, we have so far identified additional gene family members for three of the families. These include tandem duplicates of genes. For one gene family, our analysis suggests one fewer gene family member. As detailed in the plans for the next reporting period, this analysis likely does not capture pseudogenes or partial gene copies. We are implementing approaches to identify all gene copies that CRISPR/Cas9 guide RNAs could alter.
Objective 2 - Use phenotypic screening and DNA resequencing to identify on-target, off-target, and genome-wide changes induced by CRISPR/Cas9 activity (Years 1 - 3).
We are using AVITI 300 bp paired-end sequencing for a set of samples modified by CRISPR/Cas9 at our seven target gene families. This will permit a first detailed look at modifications at target and non-target loci. Phenotypic screening for off-target changes in individual plants is ongoing and is an important part of the characterization of modified plants. We have also developed workflows and resources that characterize "callable" regions of the soybean genome, primarily by identifying genomic regions with high degrees of repetitive sequences or deletions relative to the reference. We have also developed code for manipulating long-read (Oxford Nanopore) sequencing data. These resources are noted in this report and available in public GitHub repositories.
Objective 3 - Use long-read DNA sequencing and phenotypic screening to identify potential off-target or unintended changes induced by multiple CRISPR edits.
While completing long-read sequencing for the soybean lines Bert and Williams82, used as the genetic background in all these experiments, we have optimized the conditions for DNA extraction and ONT sequencing. This typically involves loading and washing a flow cell so that three reactions are loaded per flow cell. This results in ~15x coverage per sample per flow cell, which is sufficient coverage for our off-target variant detection efforts. Thus, we can generate ONT resequencing data for ~2 CRISPR/Cas9-modified samples per gene family that is targeted for modification.
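The coverage figures quoted above follow from simple arithmetic: mean coverage is the total number of sequenced bases divided by genome size. The short Python sketch below shows that calculation. The per-load yields and the ~1.1 Gb soybean genome size are illustrative assumptions rather than values from this report, and the function names are hypothetical.

```python
import math
from typing import Sequence

# Assumed approximate soybean genome size in bases (not stated in this report).
ASSUMED_GENOME_SIZE = 1.1e9


def coverage_from_loads(load_yields_bases: Sequence[float],
                        genome_size: float = ASSUMED_GENOME_SIZE) -> float:
    """Mean coverage from the summed yield of successive flow cell loads."""
    return sum(load_yields_bases) / genome_size


def flow_cells_needed(target_coverage: float,
                      coverage_per_flow_cell: float) -> int:
    """Smallest whole number of flow cells that reaches the target coverage."""
    return math.ceil(target_coverage / coverage_per_flow_cell)


if __name__ == "__main__":
    # Three hypothetical loads of 5.5 Gb each on one washed flow cell.
    print(round(coverage_from_loads([5.5e9, 5.5e9, 5.5e9]), 1))
    # Flow cells required for a 42x data set at ~15x per flow cell.
    print(flow_cells_needed(42, 15))
```

Under these assumptions, three loads per flow cell give roughly 15x, and a reference-quality data set near 40x would require about three flow cells, whereas a single flow cell suffices for a resequenced sample.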
Publications
- Type:
Theses/Dissertations
Status:
Accepted
Year Published:
2023
Citation:
Liu C (2023) Biological problem solving through computation. Conservancy University of Minnesota https://hdl.handle.net/11299/259763
- Type:
Other Journal Articles
Status:
Submitted
Year Published:
2024
Citation:
Liu C, Frascarelli G, Stec AO, Heinen S, Lei L, Wyant SR, Legg E, Spiller M, Muehlbauer GJ, Smith KP, Morrell PL. 2024. Sodium azide mutagenesis induces a unique pattern of mutations. bioRxiv 2024.05.06.592067. doi:10.1101/2024.05.06.592067