Seeing double: Evidence for copy number variants in transgenic crop resistance and methods for their early detection

Recipient Organization
UNIV OF MARYLAND
(N/A)
COLLEGE PARK,MD 20742

Performing Department
(N/A)

Non Technical Summary
This research seeks to understand the frequency and mechanisms by which pests overcome plant resistance traits conferred by engineered genes (BRAG priority 5i), and to develop a monitoring framework to improve transgenic crop durability. We have shown that genomic monitoring can track changes in resistance genotypes of wild pests over time, and this information can be leveraged to detect emerging resistance and trigger remediation. Our prior work focused on single nucleotide polymorphisms, but genomic monitoring can detect multiple types of genetic variants conferring pest resistance. We recently found that gene copy number variants (CNVs) strongly contribute to the field-evolved transgenic crop resistance observed in H. zea, our pest model. Understanding the role of CNVs in resistance evolution will be key to insect resistance management because CNVs often act to broadly increase detoxification and metabolism, conferring cross-resistance to many compounds (including insecticides expressed by transgenic crops). Our proposed work will generate data and develop algorithms to improve detection of emerging resistance caused by CNVs. Using long-read sequencing and targeted historical sequencing, we will characterize genome-wide CNVs in H. zea, which will provide important insight into their role in transgenic crop resistance. To improve CNV detection from genomic monitoring data, we will develop a novel algorithm and benchmark it against available tools, providing a new resource for agricultural researchers and regulators. This work will advance our understanding of which genomic patterns correspond to resistance-associated CNV evolution and develop approaches for detecting those evolutionary signals in genomic monitoring data.

Animal Health Component

50%

Research Effort Categories

Basic

25%

Applied

50%

Developmental

25%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
211	3110	1080	100%

Knowledge Area
211 - Insects, Mites, and Other Arthropods Affecting Plants;

Subject Of Investigation
3110 - Insects;

Field Of Science
1080 - Genetics;

Keywords

plant-incorporated protectant resistance management

copy number variant

genomic monitoring

Goals / Objectives
Objective 1: Measure variation among CNV resistance haplotypes using long-read sequencing. To accomplish Objective 1, we will identify CNVs by sequencing H. zea samples collected before, during, and after resistance evolution with long-read technology. Using whole-genome nanopore sequencing, we will reconstruct genome-wide CNV haplotypes in three Cry1Ab-resistant samples (Objective 1a). We will also describe population-level haplotypic variation in CNVs for 75 samples using targeted sequencing of one resistance-associated chromosome (Chr9; Objective 1b). These data will allow us to recover full resistance-associated haplotypes, improve our understanding of the evolutionary processes giving rise to resistance-associated CNVs, and provide a set of validated CNVs for benchmarking CNV detection algorithms.

Objective 2: Characterize the ancestral state of the Cry1Ab resistance-associated genomic region in H. zea. To accomplish Objective 2, we will sequence the region of Chr9 containing the resistance-related CNV for field-collected, museum-quality H. zea from 1996. These data will allow us to examine whether the CNV existed as standing genetic variation in H. zea prior to commercial release of Bt crops. Sequence data from these samples will then be compared to existing DNA sequence data from samples collected in later years, allowing us to determine both the extent of DNA variation linked to the CNV and how selection shaped Chr9 immediately following commercial release of Bt transgenic crops.

Objective 3: Develop novel computational methods for discovering resistance-related CNVs from data produced by genomic monitoring. To accomplish Objective 3, we will first generate benchmark datasets and use them to evaluate the utility of existing methods for discovering resistance-related CNVs from genomic monitoring data (Supporting Objective 3a). Second, we will develop new algorithms, specifically targeting wild insect population data collected across multiple time points (Supporting Objective 3b). Finally, we will apply our methods to real H. zea time-series data to evaluate their potential for improved resistance monitoring (Supporting Objective 3c).

Project Methods
Our plan of work combines two evolutionary genomics experiments to determine the number and types of mutational events that resulted in Bt resistance evolution (Objective 1), and to test whether the trypsin CNV on Chr9 existed as standing genetic variation prior to commercial release of Bt crops (Objective 2). Results from these studies will help determine how selection shaped H. zea's Chr9 as Bt adoption grew. Understanding these patterns will allow us to improve upon existing genomic approaches for resistance monitoring. Specifically, we will use data from our first two objectives, gather existing data from previous BRAG-funded work, and develop simulated datasets to benchmark existing software for identification of CNVs in insects (Objective 3). Understanding the strengths and limitations of existing algorithms, typically developed for use in human cancer genomics, will enable us to improve upon them or develop novel algorithms for use in other organisms (Objective 3).

For Objective 1, we will sequence genomes of field-collected individuals from 2002, 2012, and 2019 with ultra-long reads from an Oxford Nanopore PromethION. This will allow us to assemble through repetitive regions and recover full sequences of each CNV. With the addition of Illumina short-read data for polishing of the error-prone long reads, we will resolve full CNV haplotypes with high confidence. We will initially use long-read whole-genome sequencing for one individual per time point (Objective 1a). This will allow us to characterize the CNV landscape across the genome, identify genomic breakpoints, and recover full copies of each duplicated gene separately. We will also identify which CNVs are likely to play a role in resistance by comparison to already identified genomic windows under selection and resistance QTL.
We will then use targeted Nanopore adaptive sequencing to identify resistance-associated haplotypic diversity on Chr9 (containing our Bt resistance CNV) in 25 individuals from each of our three time points (Objective 1b), spanning the period in which Bt resistance spread. Data sets generated for Objectives 1a and 1b will allow us to use both alignment-based and assembly-based CNV detection, providing two lines of evidence for each CNV. After assembly, we will predict and manually curate gene models for the CNV, and reconstruct gene amplification history. Based upon our gene models, we will also identify potential effects of variants on gene copy function, detect signals of selection separately for each gene copy, and compare gene expression across copies using existing RNA-seq data.

For Objective 2, we will empirically generate ancestral genetic information (e.g. prior to deployment of a widespread management tool) for wild H. zea exposed to increasing Bt selective pressure. This will reveal early diversity at the trypsin CNV and its flanking regions. To accomplish this objective, we will use target sequence capture to prepare Illumina libraries for 36 ancestral H. zea samples. These target capture libraries will allow us to sequence much of the 5-6 Mb region of Chr9. Target capture reads will be mapped to the H. zea genome, followed by SNP calling. This will generate SNP and coverage depth data that can be used to analyze evolutionary genomic patterns at (and outside of) our target CNV. SNP and coverage depth data from this ancestral dataset will be compared to samples collected from the same geographic region in later years (2002, 2008, 2010, 2012, and 2017; n = 25-30 per year). Within this comparative framework, we will test our hypothesis that the 1996 samples will have the highest diversity in our CNV region, and that strong selection on the CNV in later years will have produced an appreciable decline in diversity, both within the trypsin cluster and in the flanking regions.
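The diversity comparison across years could be sketched as follows. This is a minimal illustration, not the project's pipeline: it assumes biallelic SNPs with no missing genotypes, and the function names, the region length, and the allele counts are all hypothetical.

```python
def nucleotide_diversity(alt_allele_counts, n_chromosomes, region_length):
    """Average pairwise differences per site (pi) for biallelic SNPs.

    alt_allele_counts: alternate-allele count at each SNP in the region
    n_chromosomes:     sampled chromosomes (2 x number of diploid individuals)
    region_length:     number of callable sites in the region, variable or not
    Assumes every SNP was genotyped in all sampled chromosomes.
    """
    if n_chromosomes < 2 or region_length < 1:
        raise ValueError("need at least 2 chromosomes and 1 callable site")
    pairs = n_chromosomes * (n_chromosomes - 1)
    # Per-site heterozygosity with the n/(n-1) small-sample correction:
    # 2 * k * (n - k) / (n * (n - 1))
    total = sum(2 * k * (n_chromosomes - k) / pairs for k in alt_allele_counts)
    return total / region_length


def diversity_by_year(counts_by_year, chromosomes_by_year, region_length):
    """Compute pi for the same region in each sampling year."""
    return {
        year: nucleotide_diversity(counts, chromosomes_by_year[year], region_length)
        for year, counts in counts_by_year.items()
    }


# Invented toy counts, for illustration only (not project data).
toy_counts = {1996: [10, 30, 36], 2017: [1, 0, 2]}
toy_chromosomes = {1996: 72, 2017: 50}
toy_pi = diversity_by_year(toy_counts, toy_chromosomes, region_length=1000)
for year in sorted(toy_pi):
    print(year, toy_pi[year])
```

Under the stated hypothesis, the value for 1996 would exceed the values for later years within the trypsin cluster and its flanking regions; computing it in windows along Chr9 would show how far the decline extends.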
We will also compare depth of coverage across years (scaled to mean overall depth per sample) to test for evidence of the CNV in the ancestral population. Finally, we will ask whether the CNV or the haplotypes flanking the modern CNV existed at detectable frequencies in these ancestral samples, providing insights into their presence as standing genetic variation and qualities that might influence their detection in genomic monitoring data.

For Objective 3, we will combine existing and novel data streams to test existing algorithms/software, as well as improve upon or develop new algorithms/software for CNV detection from genomic monitoring data, in the following stages.

Stage 1 (Supporting Objective 3a): Generate benchmark datasets and use them to evaluate the utility of existing methods for discovering resistance-related CNVs from genomic monitoring data.

We will generate three benchmark datasets. Benchmark (A) will be created by Illumina sequencing of up to four H. zea trios. Benchmark (B) will be synthetic genomic monitoring data for H. zea. Specifically, we will perform SLiM4 population simulations to create synthetic genomes with variants under neutral evolution (genetic drift) and positive selection (e.g. due to conferring a resistance phenotype). The resulting simulated genomes will be used to guide the introduction of copy number and single nucleotide variants into an H. zea reference genome, producing time-series population data. Lastly, reads will be generated from these synthetic genomes using the ART simulator. Benchmark (C) will be created in a similar fashion to Benchmark (B) but using a broader set of genomes drawn from the USDA-ARS Ag100Pest Initiative. Benchmark (A) will be used to evaluate the utility of existing methods for discovering and genotyping CNVs from insect genomes (e.g. CNVnator, CNVcaller, DELLY, and MANTIS). Benchmarks (B) and (C) will be used to evaluate the utility of existing methods for discovering resistance-related CNVs from time-series data.
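The step in Benchmark (B) that introduces a copy number variant into a reference sequence could look roughly like the sketch below. It is only an illustration of the idea: the function name, the truth-record fields, and the toy sequence are hypothetical, and it handles a single tandem duplication on one haplotype rather than variants guided by SLiM output.

```python
def add_tandem_duplication(reference, start, end, extra_copies=1):
    """Return (mutated_sequence, truth_record) with reference[start:end]
    repeated in tandem immediately after the original copy.

    Coordinates are 0-based and half-open, as in Python slicing.
    """
    if not 0 <= start < end <= len(reference):
        raise ValueError("duplication interval must lie within the reference")
    if extra_copies < 1:
        raise ValueError("extra_copies must be at least 1")
    segment = reference[start:end]
    mutated = reference[:end] + segment * extra_copies + reference[end:]
    # The truth record is what makes the dataset a benchmark: callers are
    # scored against these known breakpoints and copy numbers.
    truth = {
        "start": start,
        "end": end,
        "copies_on_haplotype": 1 + extra_copies,
    }
    return mutated, truth


# Toy sequence for illustration only.
toy_reference = "AACCGGTT"
toy_mutated, toy_truth = add_tandem_duplication(toy_reference, 2, 4)
print(toy_mutated, toy_truth)
```

Reads simulated from the mutated sequence would then carry a known duplication, so each caller's breakpoints and copy number estimates can be compared against the recorded truth.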
First, we will apply methods of CNV calling and genotyping (e.g. CNVnator, CNVcaller, DELLY, and MANTIS). Second, we will transform the resulting CNV genotypes into two-state variants, indicating whether or not an individual carries two copies. Third, we will apply methods for detecting two-state variants under positive selection from time-series population data (e.g. FIT and Timesweeper). At each stage, we will evaluate methods for accuracy and computational efficiency.

Stage 2 (Supporting Objective 3b): Develop new algorithms, specifically targeting wild insect population data collected across multiple time points.

We will develop new methods that address the limitations of existing methods identified during benchmarking. We will also develop a novel statistical framework for identifying copy number variants (i.e. multi-state variants) under positive selection from time-series population data, as no such method exists for this task. Lastly, we will develop tools for validating and inspecting results.

Stage 3 (Supporting Objective 3c): Apply our methods to real H. zea time-series data to evaluate the potential for improved resistance monitoring.

We will apply the bioinformatics pipelines developed in our project to genomic monitoring data collected for H. zea from 2002, 2008, 2010, 2012, and 2017 (n = 25-30 per year). We will run the pipelines taking each of these years in turn as the present time point to determine the earliest year in which resistance-related CNVs (e.g. the trypsin 77 CNV) can be detected. This will help us to determine the utility of our methods for early detection.
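The two-state transformation and the "each year as the present" evaluation can be sketched together as below. This is a deliberately simplified stand-in: it uses a one-sided two-proportion z-test against the earliest sample rather than FIT, Timesweeper, or the methods to be developed, and the function names, the significance threshold, and the toy genotypes are all hypothetical.

```python
from math import erfc, sqrt


def to_two_state(copy_numbers, normal_copies=2):
    """Collapse integer copy-number genotypes to booleans:
    True when an individual does not carry the normal diploid count."""
    return [cn != normal_copies for cn in copy_numbers]


def one_sided_two_proportion_p(k0, n0, k1, n1):
    """P-value for 'carrier frequency in sample 1 exceeds sample 0',
    using a pooled two-proportion z-test (normal approximation)."""
    pooled = (k0 + k1) / (n0 + n1)
    se = sqrt(pooled * (1 - pooled) * (1 / n0 + 1 / n1))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of an increase
    z = (k1 / n1 - k0 / n0) / se
    return 0.5 * erfc(z / sqrt(2))  # upper-tail normal probability


def earliest_detection_year(genotypes_by_year, alpha=0.01):
    """Treat each later year in turn as the present and return the first
    year whose carrier frequency is significantly above the baseline."""
    years = sorted(genotypes_by_year)
    baseline = to_two_state(genotypes_by_year[years[0]])
    k0, n0 = sum(baseline), len(baseline)
    for present in years[1:]:
        current = to_two_state(genotypes_by_year[present])
        p = one_sided_two_proportion_p(k0, n0, sum(current), len(current))
        if p < alpha:
            return present
    return None


# Invented copy-number genotypes, for illustration only (not project data).
toy_genotypes = {
    2002: [2] * 25,
    2008: [2] * 23 + [3] * 2,
    2010: [2] * 17 + [3] * 8,
    2012: [2] * 10 + [3] * 10 + [4] * 5,
    2017: [2] * 3 + [4] * 22,
}
print(earliest_detection_year(toy_genotypes))
```

Collapsing genotypes to two states discards the distinction between, say, three and four copies, which is the limitation the multi-state framework in Stage 2 is meant to address.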