Source: PURDUE UNIVERSITY submitted to NRP
IF YOU BUILD IT, WILL IT SPREAD? QUANTIFYING GENE FLOW IN MANAGED AND UNMANAGED HONEY BEE POPULATIONS ACROSS THE UNITED STATES
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1028967
Grant No.
2022-33522-37854
Cumulative Award Amt.
$499,959.00
Proposal No.
2022-03036
Multistate No.
(N/A)
Project Start Date
Aug 1, 2022
Project End Date
Jul 31, 2026
Grant Year
2022
Program Code
[HX]- Biotechnology Risk Assessment
Recipient Organization
PURDUE UNIVERSITY
(N/A)
WEST LAFAYETTE,IN 47907
Performing Department
Entomology
Non Technical Summary
Honey bees are essential agricultural pollinators, contributing $18 billion to the US economy. US beekeepers lose ~30% of their colonies annually to pests and pathogens. Recent genomic work has identified candidate genes that may mitigate or even reverse the decline, and both public and private institutions are generating trans- or mutagenic honey bees. The potential for gene flow among honey bee populations is high: much of the industry is migratory, with interstate shipping commonplace, and bee mating biology encourages outbreeding. For society to weigh the benefits of transgenic honey bees against the risks of escape, fundamental information is needed about the potential for gene flow and the likelihood of phenotypes persisting. Africanized honey bees (AHBs) offer a unique natural experimental system to explore both the spread of honey bee alleles and the effectiveness of regulatory measures in place to manage AHB spread. We propose to obtain genome sequences from 5,000 honey bees (~100 from each US state) and use Resistance Modelling to identify specific management and environmental features that best predict AHB-specific gene flow. In partnership with the Apiary Inspectors of America (AIA), the Purdue Plant and Pest Diagnostic Lab, and our stakeholders across the country, we will generate the first-ever reference population genomic database for honey bees in the country. These data will also permit us to investigate how effectively current containment and agricultural practices and landscape features have restricted gene flow of AHB, and whether they can be expected to prevent the spread of transgenes.
Animal Health Component
10%
Research Effort Categories
Basic
80%
Applied
10%
Developmental
10%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
304 | 3110 | 1080 | 100%
Knowledge Area
304 - Animal Genome;

Subject Of Investigation
3110 - Insects;

Field Of Science
1080 - Genetics;
Goals / Objectives
Honey bees are essential agricultural pollinators, contributing $18 billion to the US economy. US beekeepers lose ~30% of their colonies annually to pests and pathogens. Recent genomic work has identified candidate genes that may mitigate or even reverse the decline, and both public and private institutions are generating trans- or mutagenic honey bees. The potential for gene flow among honey bee populations is high: much of the industry is migratory, with interstate shipping commonplace, and bee mating biology encourages outbreeding. For society to weigh the benefits of transgenic honey bees against the risks of escape, fundamental information is needed about the potential for gene flow and the likelihood of phenotypes persisting. Africanized honey bees (AHBs) offer a unique natural experimental system to explore both the spread of honey bee alleles and the effectiveness of regulatory measures in place to manage AHB spread. We propose to obtain genome sequences from 5,000 honey bees (~100 from each US state) and use Resistance Modelling to identify specific management and environmental features that best predict AHB-specific gene flow. In partnership with the Apiary Inspectors of America (AIA), the Purdue Plant and Pest Diagnostic Lab, and our stakeholders across the country, we will generate the first-ever reference population genomic database for honey bees in the country. These data will also permit us to investigate how effectively current containment and agricultural practices and landscape features have restricted gene flow of AHB, and whether they can be expected to prevent the spread of transgenes.
Project Methods
We propose to sequence 5,000 honey bee DNA samples (100 per US state) sourced from collaborators and community scientists. Because of the magnitude of sampling, we have enlisted the expertise of Purdue's Plant and Pest Diagnostic Laboratory (PPDL), led by Tom Creswell (Letter of Support), and the AIA (Letter of Support). The PPDL diagnoses plant diseases and identifies insects, plants, and weeds for members of the public, research faculty, staff, students, and private businesses. They are fully equipped to receive and process samples and communicate their findings to the end user. In 2019 alone, the lab provided 4,507 diagnoses on 2,644 submitted samples of plant and insect pests and others sourced from across the United States. In addition to their identification and diagnostic services, PPDL provides educational programming in the form of talks and extension bulletins that teach stakeholders how to correctly collect and submit samples and how to interpret and act on diagnoses. Work led by PD Harpur has established PPDL as a central facility for AHB testing, able to receive samples and process them for genome sequencing, a service listed and available through our partnership with the AIA. PPDL will ultimately serve as the primary sample handler for this objective.
A high density of DNA markers will be needed in view of the high level of recombination in the honey bee, averaging >5 crossovers per chromosome pair per meiosis [55], and the rapid decay of linkage disequilibrium (LD) with physical distance (~50% reduction in r² over only 500 bp [56]). Our genotyping-by-sequencing (GBS) protocol will allow for such a map. Our protocol is modified from Multiplexed Shotgun Genotyping [57] combined with the TASSEL-GBS analysis pipeline v2.0 [50], using an Illumina NextSeq in the UGA core facility (Figure). In a 96-multiplexed library, genotypes are tagged with barcodes in the first 6 bases of raw reads; populations with more than 96 individuals require multiple libraries.
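The barcode step described above can be illustrated with a short Python sketch. This is not the TASSEL-GBS implementation: the barcode sequences, sample names, and reads are hypothetical, and exact-match barcode lookup (no mismatch tolerance) is an assumption made for simplicity. The sketch assigns each read to a sample by its first 6 bases and drops reads that begin with 'N' or carry no known barcode.

```python
def demultiplex(reads, barcode_to_sample, barcode_len=6):
    """Assign reads to samples by their leading barcode.

    Returns (reads_by_sample, n_dropped). Reads that start with 'N' or whose
    first `barcode_len` bases match no known barcode are dropped.
    """
    reads_by_sample = {sample: [] for sample in barcode_to_sample.values()}
    n_dropped = 0
    for read in reads:
        barcode = read[:barcode_len]
        if read.startswith("N") or barcode not in barcode_to_sample:
            n_dropped += 1
            continue
        # Strip the barcode so only the genomic portion of the read is kept.
        reads_by_sample[barcode_to_sample[barcode]].append(read[barcode_len:])
    return reads_by_sample, n_dropped


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGTAC": "bee_001", "TTGCAA": "bee_002"}
raw_reads = [
    "ACGTACGGATTC",  # bee_001
    "TTGCAACCGTA",   # bee_002
    "NCGTACGGATT",   # starts with N: dropped
    "GGGGGGAAAA",    # unknown barcode: dropped
    "ACGTACTTTT",    # bee_001
]

assigned, dropped = demultiplex(raw_reads, barcodes)
print(assigned, dropped)
```

In a real run, each 96-sample library would carry its own barcode-to-sample table, so a population with more than 96 individuals would be processed one library at a time.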
Reads that start with 'N' and reads with no barcode in the first 6 bases are removed. Processed reads are input to TASSEL-GBS [50], a reference-based genotype caller, to guide stacking of short reads from the same genomic location, with alignment by BWA [58]. A broadly sampled genomic reference population (5,000 colony samples, above) provides a data set powerful enough to identify segregating ancestral haplotypes and quantify introgression across the country (Objectives 2-3) [59,60]. Collected data will be stored, visualized, filtered, and shared via SQLite, a database engine that requires no configuration and integrates readily with R for subsequent statistical analysis [61]. SQLite will also enable us to present data publicly through PPDL. This will allow AIA to detect AHB in their respective states and integrate the database into their management decisions in the future. Populations will first be defined as individual US states and then defined genetically using the unsupervised mode of ADMIXTURE [62] and PCAdmix [63] at bi-allelic sites relative to reference samples (see Objective 2). Population genetic analyses (mismatch distribution, Tajima's D, Fu's F, pairwise Fst) and tests for neutrality and linkage disequilibrium will be carried out using pegas in R [64]. Data types will range from 'passport' information (colony locations and histories as collected by community scientists and AIA, and metadata input by PPDL) through qualitative and/or quantitative traits to GBS data for each colony. Ultimately, PD Harpur is responsible for archiving all data types (see Data Management Plan).
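As a rough illustration of the SQLite storage described above, the sketch below builds a small in-memory table and summarises it by state. It uses Python's built-in sqlite3 module rather than R, and the table name, column names, and all values are hypothetical placeholders, not project data or the project's actual schema.

```python
import sqlite3

# In-memory database for illustration; a real deployment would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE samples (
        sample_id          TEXT PRIMARY KEY,
        state              TEXT NOT NULL,
        colony_type        TEXT,   -- e.g. 'managed' or 'feral'
        a_lineage_fraction REAL    -- hypothetical ancestry estimate in [0, 1]
    )
    """
)

# Placeholder rows (made up for this sketch).
rows = [
    ("IN-0001", "IN", "managed", 0.02),
    ("IN-0002", "IN", "feral", 0.04),
    ("TX-0001", "TX", "feral", 0.40),
    ("TX-0002", "TX", "managed", 0.20),
]
conn.executemany("INSERT INTO samples VALUES (?, ?, ?, ?)", rows)
conn.commit()


def ancestry_by_state(connection):
    """Return (state, sample count, mean ancestry fraction) per state."""
    cursor = connection.execute(
        "SELECT state, COUNT(*), AVG(a_lineage_fraction) "
        "FROM samples GROUP BY state ORDER BY state"
    )
    return cursor.fetchall()


summary = ancestry_by_state(conn)
for state, n_samples, mean_fraction in summary:
    print(state, n_samples, round(mean_fraction, 3))
```

Because the whole database is a single file, the same table could be opened from R or exposed read-only to inspectors without running a database server.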
Following this, for each sample, we will compile environmental variables based on sample location from GIS layers relevant to dispersal (roadways, local latitudinal variation, land cover) and biology (land cover, water features). These layers will come from the National Transportation Dataset (NTD) [78]; the National Land Cover Dataset (NLCD) [79]; raster layers in BeeScape [80] (where available, representing available forage, insecticide risk, and foraging quality); honey bee colony density [81]; and climatic variables extracted from high-resolution MODIS collections [82]. All layers will be represented at the same resolution (30-100 m, based on availability) and clipped to a 5 km buffer surrounding each sampling location, representative of a honey bee colony's foraging and mating distance. This data set will also include variables accounting for state-by-state variation in AHB management strategy (Letter of Support). Work by our collaborator Dr. Jamie Ellis (University of Florida) has documented variance in AHB management across the country (Figure 3): Ellis' group has found that 35% (18) of mainland US states have no regulations in place to limit the spread of AHB. Of the 18 states with no AHB importation regulations, 11 additionally have no mandatory inspections of imported colonies. These states (e.g. Kansas, Utah) represent the extreme end of the distribution of management, and we predict higher levels of AHB ancestry in them relative to states that highly regulate importation ('no AHB Importation'; Figure 3; e.g. Indiana, Ohio) after accounting for environmental variance (see next paragraph).
Specific management information and the number of colonies moved state-to-state (e.g. propagule pressure), both kindly provided in the Ellis data set, will be included in subsequent analysis to identify predictors of AHB introgression and to identify which management practices (if any) best prevent introgression, using an Optimal Resistance Value approach implemented in ResistanceGA in R [41]. The resistance value of a surface is a value assigned to each landscape or environmental feature representing how much it affects connectivity among populations [83]. ResistanceGA identifies how landscape features affect gene flow among populations by iteratively varying the 'resistance' of any number of landscape (or other) features, computing the overall 'resistance' between populations [42], and then evaluating each included variable's performance using a regression against genetic distance. Optimal regression models are chosen based on the Akaike information criterion. To control for collinearity among predictors, we will evaluate model significance using the multiple regression on distance matrices function in the ecodist package in R and calculate variance inflation [84]. This method has been successfully applied to understand human-mediated movement of invasive species [85], among other systems. Because this method relies on genetic distance (and not the exact ploidy of the organism), it is highly tractable across species.
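The model-selection idea above (regress genetic distance on each candidate resistance distance, then compare fits by the Akaike information criterion) can be shown with a toy Python sketch. It is not ResistanceGA, which optimises resistance surfaces and fits more appropriate models for pairwise data; here ordinary least squares stands in as a simplifying assumption, and the feature names and all distance values are hypothetical.

```python
import math


def ols_rss(x, y):
    """Residual sum of squares for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


def aic(rss, n, k):
    """AIC for a Gaussian least-squares fit, omitting constants shared by all models."""
    return n * math.log(rss / n) + 2 * k


# Hypothetical pairwise values for 10 population pairs (not project data).
genetic_distance = [0.10, 0.22, 0.29, 0.41, 0.52, 0.58, 0.71, 0.79, 0.92, 0.98]
candidates = {
    "land_cover": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0],
    "roads": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0],
}

n_pairs = len(genetic_distance)
# k = 3 parameters per model: intercept, slope, residual variance.
scores = {
    name: aic(ols_rss(resistance, genetic_distance), n_pairs, 3)
    for name, resistance in candidates.items()
}
best = min(scores, key=scores.get)  # lowest AIC is preferred
print(best, scores)
```

Pairwise distances share populations and so are not independent observations, which is one reason significance is assessed with permutation-based methods such as multiple regression on distance matrices rather than with the plain regression used in this sketch.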

Progress 08/01/23 to 07/31/24

Outputs
Target Audience: During this reporting period, our target audience consisted primarily of beekeepers and bee managers across the United States, a group estimated at approximately 250,000 individuals. This audience includes a diverse range of stakeholders, from small-scale hobbyist beekeepers to commercial operations managing thousands of colonies. Within this larger group, a specific focus was placed on beekeepers and bee managers in Indiana, who represent a significant segment of our direct outreach efforts. Indiana alone accounts for around 4,000 active beekeepers, including those involved in commercial, sideline, and hobby beekeeping, as well as queen breeders and migratory pollination services.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? We held one Field Day at Purdue, training 250 beekeepers to monitor genetics in their colonies.
How have the results been disseminated to communities of interest? We presented at the Field Day (250 beekeepers). We also presented results at 6 major beekeeping meetings nationwide, reaching ~1000 additional beekeepers.
What do you plan to do during the next reporting period to accomplish the goals? We anticipate at least one publication by next period. During the next reporting period, we will focus on advancing each of our primary objectives, building on the significant progress already made. Our first objective, which involves developing a comprehensive reference population genomic dataset for U.S. managed and feral honey bees, will be a key area of effort. To enhance the breadth and utility of this dataset, we plan to expand our sample collection efforts. This will involve collaboration with additional community science groups, beekeepers, and regional stakeholders, particularly targeting underrepresented geographic areas. 
By increasing the diversity and number of samples, we aim to capture a more complete picture of the genetic landscape of honey bees across the United States. We anticipate adding at least 1,000 new samples to the existing dataset, which will further strengthen its value as a foundational resource for genomic studies and breeding programs. Additionally, we will continue to analyze the genetic diversity and population structure using advanced bioinformatics tools. This analysis will provide crucial insights into the genetic makeup of U.S. honey bee populations and inform strategies for improving colony health and resilience. To facilitate broader access and utility, we plan to develop a publicly accessible database platform, allowing researchers, breeders, and beekeepers to explore and utilize this genomic information. For our second objective, which focuses on identifying Africanized Honey Bee (AHB)-associated genomic haplotypes, we will refine and enhance our current methods. We are employing a dual approach using both mitochondrial DNA sequencing and genomic ancestry analysis tools like ancestryHMM. This strategy allows us to pinpoint specific AHB-associated haplotypes with high accuracy, providing a clearer understanding of how these traits manifest and spread within the honey bee population. Over the next period, we will work on refining the resolution of these haplotyping methods, improving the accuracy of A-haplotype identification. Furthermore, we plan to validate our findings through field testing with samples of known AHB origin. This validation process will ensure the robustness and reliability of our approach, enabling us to confidently identify AHB traits and understand their genetic underpinnings. The identification of these genomic markers will be a significant step forward in understanding the transmission genetics of AHB traits, which is critical for predicting the potential spread of transgenes in honey bee populations. 
In addressing our third objective, which involves quantifying AHB introgression across the United States and examining environmental factors influencing this process, we plan to begin detailed mapping of introgression levels using the extensive dataset we have collected. By integrating genomic data with environmental variables at multiple spatial scales (3 km, 5 km, and 10 km radii around each sample site), we aim to identify patterns and factors that affect the spread of AHB genetic traits. This analysis will involve correlating the levels of introgression with various landscape and climatic features, such as habitat type, land use, temperature, and precipitation. Understanding these relationships will provide insights into how environmental conditions might facilitate or hinder the spread of AHB traits, offering a predictive framework for assessing the risk of AHB introgression in different regions. Finally, we recognize the importance of engaging with stakeholders and the broader beekeeping community to ensure our findings are accessible and actionable. Throughout the next reporting period, we plan to conduct outreach activities, including webinars, workshops, and the development of extension materials, to share preliminary results and gather feedback. Engaging directly with beekeepers, queen breeders, and industry stakeholders will help us refine our methods and align our research with the practical needs of those managing honey bee colonies. By providing updates and educational resources, we aim to equip beekeepers with the knowledge to make informed decisions based on the latest genomic insights, ultimately supporting healthier and more resilient honey bee populations. Through these combined efforts, we anticipate making substantial progress toward our goals, enhancing the understanding of honey bee genetics, and contributing valuable resources to the beekeeping community.

Impacts
What was accomplished under these goals?
Objective 1: Development of a Complete Reference Population Genomic Data Set. We have successfully sequenced over 2,500 honey bee samples collected nationwide, creating the largest genomic dataset for honey bees globally. This effort combined samples from managed and feral colonies, with targeted contributions from community-science initiatives. This dataset provides a comprehensive reference for assessing genetic diversity and population structure in U.S. honey bees, serving as a critical resource for future genomic selection efforts.
Objective 2: Identification of Africanized Honey Bee (AHB) Genomic Haplotypes. Current work involves the identification of AHB-associated genomic haplotypes using a dual approach. We are employing mitochondrial DNA analysis and the ancestryHMM software to detect and characterize A-haplotypes. This process helps us pinpoint specific regions of the genome that may be associated with AHB traits, allowing us to understand the spread and impact of these haplotypes across the bee population.
Objective 3: Quantification of AHB Introgression Across the USA. We are poised to quantify AHB introgression by leveraging our extensive environmental dataset, which includes variables at 3 km, 5 km, and 10 km radii around each sample site. This spatial approach will enable us to analyze how various environmental factors influence the spread of AHB genomic traits, providing insights into landscape-level impacts on bee genetics.
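The multi-radius summary mentioned under Objective 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: point features and great-circle (haversine) distances stand in for the raster layers a real analysis would clip, and the site coordinates and feature points are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_within(site, features, radii_km=(3, 5, 10)):
    """Count feature points falling within each radius of a sample site."""
    distances = [haversine_km(site[0], site[1], lat, lon) for lat, lon in features]
    return {radius: sum(d <= radius for d in distances) for radius in radii_km}


# Hypothetical sample site and feature points (e.g. apiary locations).
site = (40.4237, -86.9212)
features = [
    (40.4337, -86.9212),  # about 1.1 km north
    (40.4637, -86.9212),  # about 4.4 km north
    (40.5037, -86.9212),  # about 8.9 km north
    (40.6237, -86.9212),  # about 22 km north, outside every buffer
]

counts = count_within(site, features)
print(counts)
```

Each radius yields its own column of predictors per site, so the same downstream model can be fitted at 3 km, 5 km, and 10 km and the scales compared.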

Publications


    Progress 08/01/22 to 07/31/23

    Outputs
    Target Audience: The target audience for our work primarily encompasses a diverse range of stakeholders deeply involved in the apiculture industry within the United States. First, beekeepers represent a significant portion of our audience. They rely on advancements in genetic documentation to enhance the health, productivity, and resilience of their colonies. Apiary inspectors, tasked with ensuring the well-being of bee populations, benefit from our genetic documentation efforts to assess and manage disease risks more effectively. Additionally, bee breeders are key players as they seek to develop bee strains with desirable traits, such as disease resistance or high honey production, leveraging our genetic insights. Lastly, genetic scientists find value in our work as they contribute to the scientific understanding of honeybee genetics, fostering innovation and sustainable practices within the industry. Together, these stakeholders form a crucial audience invested in the advancement of genetic documentation for honeybees in the country, driving progress and sustainability in apiculture.
    Changes/Problems: Nothing Reported
    What opportunities for training and professional development has the project provided? Our project has offered valuable opportunities for training and professional development, particularly through our participation in national and international conferences. We're proud to highlight that we've sent one student to attend three such conferences, providing them with exposure to cutting-edge research, networking opportunities with experts in the field, and a platform to present their own findings. These conferences serve as invaluable forums for knowledge exchange, skill development, and fostering collaborations within the scientific community. By supporting our student's attendance at these events, we're investing in the next generation of researchers and equipping them with the tools and experiences necessary to excel in their careers. 
How have the results been disseminated to communities of interest? The dissemination of our results to communities of interest has been a priority for our project. We've employed various avenues to ensure broad access to our findings. Firstly, we've actively participated in conferences, both national and international, where we've presented our research outcomes to a diverse audience of scientists, policymakers, and stakeholders in the apiculture industry. Furthermore, recognizing the importance of engaging directly with beekeepers, we've attended and presented at nearly a dozen beekeeper conferences. These events serve as vital platforms for sharing our findings with those directly impacted by our research, fostering dialogue, and gathering feedback from practitioners in the field. Additionally, to ensure that beekeepers have access to relevant genetic information, we've sent beekeepers an 'ancestry' report for each of the 2,000 samples we've collected. This direct dissemination approach empowers beekeepers with actionable insights derived from our genetic analysis, enabling them to make informed decisions about their colonies' management and health. Through these concerted efforts, we've strived to bridge the gap between scientific research and practical application, ensuring that our findings are accessible and beneficial to the communities of interest in the apiculture sector.
What do you plan to do during the next reporting period to accomplish the goals? In the upcoming reporting period, our focus remains steadfast on advancing our project's goals in genetic documentation of honey bees across the United States. We'll continue our efforts in collecting honey bee samples from diverse regions to ensure a comprehensive representation in our dataset. By leveraging both low-pass and GBS sequencing methods, we aim to reach our target of analyzing 5,000 genomes, providing invaluable insights into honey bee genetics. 
Furthermore, we'll refine our analytical pipeline to enhance the accuracy of identifying sample ancestry, enabling more nuanced interpretations of genetic data. Integrating environmental variables into our analysis will deepen our understanding of the factors shaping honey bee populations and gene flow dynamics. Dissemination of our findings remains a priority. We'll actively participate in conferences, engaging with both scientific and beekeeper communities to share our research outcomes and gather feedback. Providing 'ancestry' reports to beekeepers with each sample fosters direct communication and empowers practitioners with actionable genetic insights. By diligently pursuing these avenues, we aim to contribute significantly to the understanding of honey bee genetics and inform strategies for managing population dynamics, ultimately promoting the sustainability and resilience of honey bee populations in the United States.

    Impacts
    What was accomplished under these goals? We've made significant strides towards achieving our project goals. Specifically, we've successfully sequenced 2000 honey bee genomes utilizing two genotyping strategies, namely low-pass sequencing and Genotyping-by-Sequencing (GBS). This comprehensive approach allows us to gather a wealth of genetic data from diverse bee populations, providing invaluable insights into their genomic makeup. Importantly, our sampling efforts have been extensive, with specimens obtained from nearly every state across the country, demonstrating a broad representation of honey bee populations. Furthermore, we've developed a robust pipeline for analyzing the genetic data, enabling us to accurately identify sample ancestry and understand the genetic diversity within and among populations. This analytical framework is essential for our research, allowing us to uncover patterns of gene flow and assess the effectiveness of containment measures. In addition to genetic analysis, we've also optimized a method for extracting environmental variables from each sample. This crucial step allows us to integrate environmental data with genetic information, providing a holistic understanding of the factors influencing honey bee populations and gene flow. While we've made significant progress, our work is ongoing, and we continue to collect samples to achieve our ultimate goal of sequencing 5000 honey bee genomes. This comprehensive dataset will serve as the foundation for our research, facilitating a deeper understanding of honey bee genetics and informing strategies for managing gene flow and mitigating risks associated with transgenic honey bees.

    Publications