Applying Pangenomics to Polyploid Breeding Programs Using Blueberry as a Model

Recipient Organization
HUDSONALPHA INSTITUTE FOR BIOTECHNOLOGY
601 GENOME WAY
HUNTSVILLE, AL 35806-2908

Performing Department
(N/A)

Non Technical Summary
For many important crops, it is difficult to breed in new traits, such as disease resistance, quickly. For example, blueberries are a perennial crop that takes many years to become productive. Without genetic technology, we have to wait until the blueberry plant reaches a productive age and then collect extensive field data to evaluate its traits. This means that when new crosses are made, we may not know for years whether they address the issues we need them to address. In fact, with conventional breeding, it can take up to 20 years to develop a new blueberry line. This can have devastating effects on the agricultural industry, as a new stressor (drought, disease, pests, etc.) could wipe out a crop's yield for that year because the plants are susceptible. However, new genetic technologies allow us to breed better crops much faster. We can do this by evaluating the genetics, or genotype, of a newly bred young plant to see if it has the set of traits we are looking for, without waiting for it to grow and collecting extensive data. This allows plant breeders to decide which plants to continue growing and which to cull. It also informs breeders as to which plants to cross for the best possible outcome. These technologies are already widely used to improve the efficiency of crop breeding for many species and have been shown to drastically decrease the time it takes to release a new line. However, some genomically more complicated species do not have the same tools available yet. Many fruit and nut crops take many years to become productive, take much longer to breed into new varieties, and have genomes in which it is difficult to find traits of interest. These crops must not be left behind in this new era of genetics-based breeding. Blueberries serve as a great model for how to construct and implement new genomic tools for these crops.
Blueberries are a healthy and delicious crop that is native to the United States and has large economic value for the agricultural industry. There are a few species of cultivated blueberry, and they have different ploidy levels. Ploidy refers to the number of genome sets present in an organism. Humans are diploid, as they have two genome sets, one from their mother and one from their father. Blueberries are more complicated, as they are tetraploid, having four genome sets, or hexaploid, having six genome sets. In addition, they are highly heterozygous, meaning each genome set might differ greatly from the others. This makes developing genomic tools for blueberries more difficult. With this project, we will use blueberries as a model to show how we can implement state-of-the-art genomic tools to identify genetic regions that correlate with agriculturally important traits and to develop markers for these traits that breeders can then use in their breeding labs. Specifically, we will generate new genome assemblies for blueberry lines that are being used for breeding at the University of Georgia (UGA) and create a pangenome, a reference map consisting of multiple blueberry genomes. This pangenome will contain most of the genetic diversity found within UGA's breeding program. We will then use the pangenome reference to map the genetics of a large population (>800 lines) of blueberries. Using the genetic information from this blueberry population mapped to the pangenome, along with data taken in the field, we will be able to identify the genetic regions correlated with fruit weight, flowering time, and fruit development time. We will then build markers that indicate whether these genetic regions are present and that can be used cheaply and quickly in a standard breeding lab, so breeders can screen their lines early for these traits.
We will release all new genetic and genomic data for public use, which will be of great value to blueberry breeders in the Southeast. We will also develop and release an analytical pipeline that deals efficiently with complicated genomes. These tools will greatly aid breeders worldwide who work on crops with complicated genomes.

Animal Health Component

100%

Research Effort Categories

Basic

Applied

100%

Developmental

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
201	1120	1081	70%
202	1120	1080	30%

Knowledge Area
202 - Plant Genetic Resources; 201 - Plant Genome, Genetics, and Genetic Mechanisms;

Subject Of Investigation
1120 - Blueberry;

Field Of Science
1081 - Breeding; 1080 - Genetics;

Keywords

Goals / Objectives
The goal of this project is to use blueberry, a complicated polyploid species, to model how pangenomics can be applied directly in a specific and productive breeding program. My specific aims are:
1. Produce a high-quality pangenome of blueberry cultivars from the UGA breeding program.
2. Utilize the genomes and phenotyping data from UGA to pinpoint the underlying genes for late flowering time, fruit development time, and berry size.
3. Develop markers that can be used in UGA's blueberry breeding program.
A robust bioinformatic pipeline for polyploid pangenome and marker association analysis will be generated. The discoveries from this project will have a positive impact, not only accelerating blueberry breeding but also extending the practical application of pangenomics to other polyploid crop species.

Project Methods
Aim 1: High molecular weight DNA will be extracted from tissue of selected blueberry lines and checked for quality using the Femto Pulse System. Library preparation, HiFi sequencing using PacBio Revio, and Dovetail Hi-C will be done at HAGSC. Whole-genome HiFi sequencing data will be assembled into scaffolds using Hifiasm (Cheng et al., 2021), and Hi-C data will be used to create pseudomolecules with JUICER (Durand et al., 2016). Repeats will be identified and soft-masked using RepeatMasker (Smit et al., 2013-2015), and genes will be annotated with BRAKER2 (Brůna et al., 2020). I will use an in-house script to match alleles between the phased haplotypes. The pangenome will first be constructed using Minigraph-Cactus v. 2.5.1 (Hickey et al., 2020; Hickey et al., 2022). The pangenome will then be refined using an in-house pangenome graph pipeline called khufuPAN. khufuPAN utilizes Giraffe from the vg toolkit for short-read mapping (Hickey et al., 2020; Sirén et al., 2021). The final output will be a panHap map that can be visualized on a web interface (https://w-korani.shinyapps.io/cyclops_eye_ii/). There is some flexibility, as genomes may be added to or removed from the pangenome.
Aim 2: Eight hundred DNA samples of phenotyped blueberries will be skim-sequenced using Illumina NovaSeq 6000. Genetic variant curation will be performed with the Khufu pipeline (https://hudsonalpha.org/khufudata/) by mapping the reads to the pangenome graph generated in Aim 1, calling SNPs, indels, PAVs, and CNVs, and imputing missing markers. A GWAS will then be conducted using the allele calls from the pangenome graph. TASSEL will be used to assign multiple alleles at a single locus when necessary (Bradbury et al., 2007). For each SNP position, a p-value will be calculated based on the linear regression of phenotypic scores on genotypes. The p-value will determine whether the SNP is significantly associated with the phenotype.
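The per-SNP regression test described for Aim 2 can be illustrated with a minimal sketch. This is not the Khufu or TASSEL implementation. It assumes genotypes are coded as alternate-allele dosages (0 to 4 for a tetraploid), fits an ordinary least-squares line per SNP, and approximates the two-sided p-value with a normal distribution, which is only reasonable for large samples such as the roughly 800 lines planned here. The function name `snp_association` and the example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def snp_association(dosages, phenotypes):
    """Regress phenotype on allele dosage for one SNP.

    Returns (slope, approximate two-sided p-value). The p-value uses a
    normal approximation to the t-distribution (an assumption of this
    sketch), so it is only appropriate for large sample sizes.
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = mean(dosages), mean(phenotypes)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        return 0.0, 1.0  # monomorphic SNP carries no information
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(dosages, phenotypes))
    se = sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    if se == 0:
        return slope, (0.0 if slope != 0 else 1.0)
    z = slope / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return slope, p_value


# Hypothetical example: dosage at one SNP vs. berry weight (g)
dosages = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
berry_weight = [1.0, 1.3, 1.4, 1.9, 2.1, 0.9, 1.2, 1.6, 1.8, 2.0]
slope, p_value = snp_association(dosages, berry_weight)
print(f"slope={slope:.3f} p={p_value:.3g}")
```

In a genome-wide scan this test would be repeated for every SNP, so the significance threshold would need to account for multiple testing.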
The relevant regions of the genome will be analyzed by NCBI BLAST for gene identification (Johnson et al., 2008).
Aim 3: Utilizing the SNP/indel data obtained in Aim 2, I will design KASPar markers for each trait of interest using online tools. I will determine the most reliable markers and disseminate them to the blueberry breeding program for use in genomic selection.
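As an illustration of the input such marker design typically starts from, the sketch below extracts the sequence flanking a SNP and writes the two alleles in bracket notation, `LEFT[REF/ALT]RIGHT`. The bracket notation, the 50 bp default flank, and the function name `kasp_design_string` are assumptions made for this example. The project text does not specify which online tools or input formats will be used.

```python
def kasp_design_string(reference, pos, ref_allele, alt_allele, flank=50):
    """Return 'LEFT[REF/ALT]RIGHT' for a SNP at 0-based position `pos`.

    `reference` is the sequence containing the SNP. The flanks are
    truncated at the sequence ends if fewer than `flank` bases exist.
    """
    if not 0 <= pos < len(reference):
        raise ValueError("SNP position is outside the reference sequence")
    if reference[pos].upper() != ref_allele.upper():
        raise ValueError("reference allele does not match the sequence")
    left = reference[max(0, pos - flank):pos]
    right = reference[pos + 1:pos + 1 + flank]
    return f"{left}[{ref_allele}/{alt_allele}]{right}"


# Hypothetical example with a short sequence and 3 bp flanks
print(kasp_design_string("ACGTACGTAC", 4, "A", "G", flank=3))  # CGT[A/G]CGT
```

In a highly heterozygous polyploid, the flanking sequence would also need to be checked for nearby variants that could interfere with primer binding. That step is not shown here.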