Progress 09/01/24 to 08/31/25
Outputs
Target Audience: Our target audience for this reporting period included basic and applied entomologists, applied geneticists, and regulatory agencies involved in resistance management for pesticidal biotechnologies. An additional target audience includes developers of algorithms for variant calling from short-read data.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
Funding for this project has provided Co-PD Taylor with the opportunity to recruit an undergraduate student researcher at Hofstra University. This student will complete a year-long senior capstone project focused on the downstream analysis of the genome assemblies produced this year. The student will learn about the research process and fundamental genetics and genomics, and will develop bioinformatics skills including gene annotation, variant effect prediction, and gene tree reconstruction. PD Fritz has had the opportunity to train one postdoctoral researcher (Ben Schultz) at the University of Maryland. Ben began his postdoctoral position in June of 2025 after a lengthy search. He is learning the H. zea study system, further developing his bioinformatics and data analytics skills, and beginning to mentor other trainees involved in this project. Co-PD Molloy has been provided the opportunity to train one University of Maryland graduate student (Junyan Dai). Junyan Dai will enter his second year of the computer science PhD program in fall 2025. Since his hire in spring 2025, Junyan has learned fundamentals of genomics and population genetics, familiarized himself with the H. zea study system, and developed bioinformatics skills. This training is critical for Junyan to apply knowledge acquired through his PhD coursework in computer science (e.g., algorithm design and analysis, parallel computing, and machine learning) to this project.
How have the results been disseminated to communities of interest?
PD Fritz and co-PD Taylor each shared results from this project in presentations given at the 2025 Plant and Animal Genome Conference. These presentations reached academic researchers in the fields of agriculture, entomology, and applied evolutionary biology.
PD Fritz also gave one invited seminar on results from this project at the University of Virginia during the spring of 2025, reaching academic researchers in the fields of basic and applied evolutionary biology.
What do you plan to do during the next reporting period to accomplish the goals?
Objective 1: Measure variation among CNV resistance haplotypes using long-read sequencing
In the next reporting period we will complete the downstream analysis of the two newly assembled haplotypes. Downstream analysis will include manual gene model curation, analysis of potential functional impacts of gene variants, and reconstruction of the evolutionary history of the gene amplification. Additionally, using the sequencing and analysis approach validated in this project period, we will target two more individuals known to have different serine protease homolog gene copy numbers for whole genome PacBio HiFi sequencing. Whole genome HiFi sequencing is more expensive than our original targeted adaptive nanopore approach. Using only whole genome HiFi sequencing, we could accomplish our planned objectives, but with a more limited sample size. Because we are interested in describing haplotypic variation within and among multiple populations, we plan to try one more targeted adaptive nanopore sequencing run with a modified protocol. In consultation with the sequencing center, we decided that hard masking the reference sequence and modifying DNA isolation protocols to minimize fragmentation could yield significantly better results. If this second attempt at adaptive nanopore sequencing of chromosome 9 does not produce satisfactory results, we will fully pivot to using only whole genome PacBio HiFi sequencing, which we have shown is capable of describing variation in these important resistance haplotypes.
Objective 2. Characterize the ancestral state of the Cry1Ab resistance-associated genomic region in H. zea.
In the upcoming reporting period, we will use the newly assembled haploid genomes from resistant and susceptible individuals (see Obj 1) to develop baits for target-capture sequencing. We will work with Arbor BioSciences or a similar company to optimize and synthesize bait sets. Museum-quality H. zea samples collected before or shortly after the commercial release of Cry1-expressing crops will be identified and submitted for sequencing at Arbor BioSciences. Upon receipt of the sequencing data, we will begin bioinformatic analyses to characterize haplotypic diversity at the time of commercial release of Bt crops.
Objective 3. Develop novel computational methods for discovering resistance-related CNVs from data produced by genomic monitoring.
In Fall 2025, we will complete our CNV benchmarking study and prepare this work for publication in a peer-reviewed scientific journal. We have planned several analyses to finalize our preliminary results. First, we will evaluate whether filtering CNV regions impacts relative method performance; for example, we will filter CNV regions associated with repetitive sequences (e.g., transposable elements) as well as those identified in all individuals, which likely reflect common differences from the reference genome. Second, we will compare methods in terms of copy number. Specifically, many methods, including LUMPY, identify CNV regions associated with gains or losses but do not indicate a specific copy number, whereas other methods like CNVkit indicate both the regions and the copy number. As our preliminary results indicate that LUMPY identifies CNV regions more accurately than CNVkit, we will evaluate the impact of using different methods to call copy number in the CNV regions identified by LUMPY. For this evaluation, we will consider read mappability scores of the CNV regions.
Finally, we will extend our benchmarking study to include a greater number of CNV callers, including those developed in recent years (e.g., SurVIndel2, Nat Comm, 2024). In Spring 2026, we will begin working on supporting objective 3b: developing an algorithm for identifying CNVs in a population associated with resistance evolution.
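The region filtering planned for the benchmarking study can be illustrated with a minimal Python sketch. It is not the project's pipeline: the interval representation (chromosome, start, end), the any-base-overlap criterion, the function names, and the toy coordinates are all assumptions made for illustration only.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_calls(calls_by_individual, repeat_regions):
    """Drop CNV calls that overlap a repeat or that are matched in every other individual.

    Hypothetical helper: a call counts as 'identified in all individuals' when
    each other individual has at least one overlapping call.
    """
    filtered = {}
    for name, calls in calls_by_individual.items():
        # Call lists of every individual except the current one.
        others = [c for n, c in calls_by_individual.items() if n != name]
        kept = []
        for call in calls:
            # Filter 1: call overlaps an annotated repetitive region.
            if any(overlaps(call, rep) for rep in repeat_regions):
                continue
            # Filter 2: call is matched in all other individuals.
            shared_by_all = bool(others) and all(
                any(overlaps(call, other) for other in other_calls)
                for other_calls in others
            )
            if shared_by_all:
                continue
            kept.append(call)
        filtered[name] = kept
    return filtered


# Toy data (made-up coordinates, not project results).
toy_calls = {
    "a": [("chr9", 100, 200), ("chr1", 10, 20), ("chr2", 5, 50)],
    "b": [("chr9", 110, 190)],
    "c": [("chr9", 150, 250), ("chr2", 40, 60)],
}
toy_repeats = [("chr1", 0, 15)]
toy_filtered = filter_calls(toy_calls, toy_repeats)
```

In this toy example the chr9 call is removed from every individual because all three carry it, the chr1 call is removed because it overlaps a repeat, and the chr2 calls are kept because individual "b" lacks one. A real analysis would likely use a stricter criterion such as reciprocal overlap.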
Impacts
What was accomplished under these goals?
At the time of writing this proposal, we had identified that the chromosome with the strongest effect on Cry1Ab resistance in H. zea contained a copy number variant of multiple trypsin-like genes. This unexpected finding of a non-target-site mechanism of resistance led to our proposed objectives, which we use to organize our accomplishments below.
Objective 1: Measure variation among CNV resistance haplotypes using long-read sequencing (20% complete)
The major goal of objective 1 was to characterize variation in Cry resistance copy number haplotypes using long-read sequencing. In year 1 we planned to complete the sequencing, and in years 2 and 3 we planned to complete analysis and publication. Our planned approach focused on targeted Nanopore long-read sequencing of chromosome 9 for three populations, with whole genome sequencing for a small number of individuals. As targeted Nanopore sequencing of chromosomes is an emerging technology, we completed a trial sequencing run on a subset of samples. Analysis of that trial run suggested that repetitive content on the targeted chromosome negatively impacted sequencing yield, coverage, and the resulting assemblies. Based on those preliminary results, we pivoted to whole genome long-read sequencing with PacBio HiFi reads as an alternative approach to characterize resistance haplotypes. Using whole genome PacBio HiFi sequencing and a trio-binning assembly approach, we successfully produced high-quality resistant and susceptible whole genome assemblies. These assemblies contained fully resolved resistant and susceptible haplotypes of the target region. The resistant haplotype included three copies of the ~200 kb region described in our previously published papers.
Automated annotation also identified three copies of most of the genes in that ~200 kb region, including the trypsin-like genes (serine protease homologs), which are the candidate genes we have linked to resistance evolution. The resistant assembly includes full sequences of each gene copy and the breakpoints of the amplification events. Pivoting to whole genome PacBio HiFi sequencing allowed us not only to recover the sequences of the amplified genes we were targeting but also to pursue genome-wide analyses investigating the role of gene amplification in resistance evolution more broadly. With these first assemblies, we have developed an analysis pipeline that can be applied to other samples to begin describing variation in resistance haplotypes.
Objective 2. Characterize the ancestral state of the Cry1Ab resistance-associated genomic region in H. zea (2% complete)
The major goal of objective 2 was to characterize ancestral variation at the region of Chr9 containing the resistance-related CNV for H. zea by sequencing historical, museum-quality samples from our collection. We had planned to begin bait development and sample sequencing in year 1, and to complete analyses and publication in year 2. During this performance period, we advertised for a postdoc. However, we were unable to find a candidate after our first call for applications, which delayed the start of this objective. We were able to hire a postdoctoral researcher to assist with objective 2 in June of 2025. They have begun identifying and working with the gDNA samples that we wish to use for objective 2.
Objective 3. Develop novel computational methods for discovering resistance-related CNVs from data produced by genomic monitoring (25% complete)
The major goal of objective 3 was to develop a computational method for discovering resistance-related copy number variants (CNVs) from genomic monitoring data (Years 1 - 3). We proposed to accomplish this goal through three supporting objectives.
First, we planned to evaluate the utility of existing methods for calling CNVs from whole genome sequencing of H. zea individuals (Supporting Objective 3a). Second, we planned to develop an algorithm that takes CNVs called from genomic monitoring data as input and identifies those likely to be associated with resistance evolution (Supporting Objective 3b). Third, we planned to apply CNV callers and our method to real H. zea time-series data to evaluate its potential for resistance monitoring (Supporting Objective 3c). We proposed to complete supporting objectives 3a, 3b, and 3c in years 1-2, years 2-3, and year 3, respectively.
Trio data sets are considered a gold standard for benchmarking variant callers, as Mendelian consistency can be evaluated between the parents and offspring. To address supporting objective 3a, we generated a benchmarking data set by performing whole genome sequencing of 4 H. zea trios (12 individuals total). To our knowledge, this is the first trio data set generated for H. zea or related systems. Moreover, each trio is an F1 cross between Cry-resistant and susceptible individuals, with the respective presence or absence of the CNV on chromosome 9 confirmed via digital droplet PCR. Simultaneously, we recruited a computer science PhD student (Junyan Dai), who is leading the CNV calling benchmarking efforts. Junyan has just completed benchmarking three of the most popular CNV callers: CNVkit, GATK-gCNV, and LUMPY. These methods made 16.5, 10358.5, and 45982 CNV calls on average, respectively, of which 6 (36%), 2976.5 (29%), and 1745.5 (4%) were inconsistent with Mendelian inheritance. Not only did LUMPY achieve the lowest inconsistency rate, but it was also the only method to correctly identify CNVs in the regions targeted by digital droplet PCR across all individuals. To summarize, our preliminary results demonstrate that (1) popular CNV callers are highly variable in their performance on H. zea, highlighting the need for benchmarking, and that (2) LUMPY may produce sufficiently accurate CNV calls for genomic monitoring of H. zea.
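The inconsistency percentages reported above follow directly from the average call counts, as the short Python sketch below shows. The second function illustrates one possible way to decide whether an offspring call is Mendelian-consistent (same CNV type and at least 50% reciprocal overlap with a parental call). That criterion, the threshold, and all function names are assumptions for illustration, not the definition used in the benchmarking study.

```python
# Average total calls and Mendelian-inconsistent calls, as reported in the text.
reported = {
    "CNVkit": (16.5, 6.0),
    "GATK-gCNV": (10358.5, 2976.5),
    "LUMPY": (45982.0, 1745.5),
}


def inconsistency_rate(total_calls, inconsistent_calls):
    """Fraction of calls that are inconsistent with Mendelian inheritance."""
    if total_calls <= 0:
        raise ValueError("total_calls must be positive")
    return inconsistent_calls / total_calls


def is_consistent(child_call, parent_calls, min_overlap=0.5):
    """Hypothetical consistency test for one offspring CNV call.

    Calls are (chrom, start, end, cnv_type) tuples with end > start.
    The call is treated as consistent if some parental call has the same
    chromosome and type and overlaps it reciprocally by at least min_overlap.
    """
    chrom, start, end, cnv_type = child_call
    for p_chrom, p_start, p_end, p_type in parent_calls:
        if p_chrom != chrom or p_type != cnv_type:
            continue
        shared = max(0, min(end, p_end) - max(start, p_start))
        if (shared / (end - start) >= min_overlap
                and shared / (p_end - p_start) >= min_overlap):
            return True
    return False


# Percentages rounded to the nearest whole number, per method.
rates_percent = {
    method: round(100 * inconsistency_rate(total, bad))
    for method, (total, bad) in reported.items()
}
```

Rounding the computed rates to whole percentages reproduces the 36%, 29%, and 4% figures given for CNVkit, GATK-gCNV, and LUMPY.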
Publications