Leveraging pangenome-enabled breeding tools to better understand the genotype-phenotype map for fiber quality traits and stress tolerance in upland cotton

Recipient Organization
HUDSONALPHA INSTITUTE FOR BIOTECHNOLOGY
601 GENOME WAY
HUNTSVILLE, AL 35806-2908

Performing Department
(N/A)

Non Technical Summary
Cotton breeding has made significant strides thanks to the rapid accumulation of upland cotton reference genomes, which have helped identify crucial genomic regions linked to superior fiber quality, robust stress resilience, and valuable Pima cotton introgressions. However, fully leveraging these genetic insights in practical breeding pipelines is challenging because of two primary gaps: 1) lack of integrated and standardized genomic resources: existing genetic data are often fragmented, hindering comprehensive comparisons across cultivars; and 2) absence of user-friendly tools: breeders lack intuitive platforms to explore and use regions of interest identified through genetic mapping studies.
This project aims to close those gaps by creating a comprehensive, well-organized collection of cotton genomes, called a "pan-genome," that brings together and improves existing genome data. This resource will help scientists and breeders better understand how cotton varieties differ at the genetic level. We will also pinpoint key genetic regions linked to the traits breeders care about most. To make this work accessible and useful, we will develop a user-friendly online platform called the "Breeder Resource Hub." This tool will allow cotton breeders to easily explore genetic regions of interest, share data, and apply new discoveries to speed up the development of better cotton varieties through both traditional and modern breeding methods. Ultimately, this research can lead to more sustainable cotton farming, stronger rural economies, and better consumer cotton products, all while helping American agriculture remain competitive globally.

Animal Health Component

25%

Research Effort Categories

Basic

60%

Applied

25%

Developmental

15%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
201	1710	1080	100%

Knowledge Area
201 - Plant Genome, Genetics, and Genetic Mechanisms

Subject Of Investigation
1710 - Upland cotton

Field Of Science
1080 - Genetics

Keywords

upland cotton breeding

Goals / Objectives
Overview of goals: Cotton breeders have made impressive genetic gains over the decades, improving yield and resilience to environmental stressors, even with limited genetic variation in cultivated germplasm. This progress is especially notable given that pre-adapted genotypes rarely exist for many modern challenges, such as emerging pests, pathogens, and environmental stresses. However, traditional breeding is showing signs of reaching its limits, including a narrowing genetic base in elite cultivars and an increasing demand to breed for region-specific or niche traits.
To sustain improvement, cotton breeding must move beyond traditional methods and embrace advanced approaches. Modeling the relationship between genes and traits (the genotype-to-phenotype map) can speed up trait discovery and enhance breeding efficiency. Genomic tools can, for example, help identify fiber quality genes from historical Pima cotton introgressions or drought-response loci across varieties, enabling targeted improvement in upland cotton.
Despite this promise, integrating genomics into cotton breeding remains challenging because of cotton's large, polyploid genome, the growing number of reference genomes, and the crop's history of interspecific hybridization. These factors complicate the effective application of genomic findings across cultivars in real-world breeding programs. The overall goal of this project is to develop an integrated pan-genome resource and query tools that empower breeders with genome-informed decision-making throughout the breeding cycle and optimize selection strategies across breeding varieties. This breeder-focused hub will enable intuitive exploration of high-value genomic regions, accelerating both traditional and biotechnology-driven cotton breeding initiatives.
Objective 1: to build an integrated pan-genome resource.
The upland cotton pan-genome resource, developed through re-annotation and vetting of a set of more than 20 high-quality reference genomes, will facilitate more accurate comparative analyses and a deeper understanding of the variability among cultivars.
Objective 2: to perform an integrative analysis of pan-genomic regions of interest across upland cotton varieties. Here, we will project genomic regions of interest across the pan-genome resource, permitting users to query QTL intervals associated with critical breeding targets, including superior fiber quality and robust stress tolerance. These pre-computed intervals will be updated regularly following interactions with breeders and other stakeholders.
Objective 3: to develop and implement a 'Breeder Resource Hub'. We will empower practitioners with a user-friendly web resource for exploring pan-genomic variation. This interactive platform will not only serve as a central repository for the integrated genomic data but also offer intuitive exploration of specific regions of interest (QTLs), promoting data sharing and use across both traditional and biotechnology-informed cotton breeding efforts.

Project Methods
Objective 1 methods. To advance cotton breeding, we will create a comprehensive, user-friendly data hub bridging upland cotton genomic resources with breeder needs. First, we will aggregate 20+ high-quality upland cotton genomes selected for breeding relevance, collaborating with genome assembly and annotation teams to align on public data-sharing goals and incorporating additional genomes after year one as needed. Each genome will undergo stringent quality control to exclude problematic assemblies. Accurate chromosome organization will rely on PacBio HiFi sequencing, Hi-C/Omni-C scaffolding, and short-read polishing, and annotation will draw on Iso-Seq/Kinnex long-read RNA sequencing and multi-tissue short-read RNA-seq to support comprehensive gene models and coexpression analysis.
Using the comparative genomics engine of our GENESPACE tool, we will construct synteny-constrained orthologs to map gene counterparts across genomes within meiotically homologous regions, mitigating polyploidy-related ambiguity. This gene identifier "dictionary" will enable seamless translation of annotations across resources. All genomes will be re-annotated using a standardized framework to minimize methodological biases in gene presence/absence variation (PAV) and copy number variation (CNV). This re-annotated pan-genome will undergo GENESPACE syntenic orthology analysis for consistent cross-genome comparisons, enhancing alternative isoform and transcript abundance analyses to inform candidate gene selection (see Objective 2). Finally, summary statistics and graphical visualizations will be provided through an interactive, breeder-focused web portal (see Objective 3).
Objective 2 methods. To enhance trait discovery in upland cotton, we will address limitations of the single-reference (TM-1) approach used in quantitative genetics studies, which identify quantitative trait loci (QTLs) for fiber quality, yield, and stress tolerance but are poorly aligned with modern breeding cultivars.
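To make the single-reference limitation concrete, the following is a minimal sketch of what translating a QTL interval from one reference (for example TM-1) into another genome's coordinates involves. It is not the DEEPSPACE or GENESPACE algorithm: the SyntenicBlock record, the function names, and the linear interpolation within a block are assumptions introduced only for illustration, and the coordinates are made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SyntenicBlock:
    """One collinear block between a reference and a target genome.

    Hypothetical record format; assumes ref_end > ref_start.
    """
    ref_chr: str
    ref_start: int
    ref_end: int
    tgt_chr: str
    tgt_start: int
    tgt_end: int
    inverted: bool = False  # True if the block runs in reverse in the target


def project_position(block: SyntenicBlock, pos: int) -> int:
    """Linearly interpolate a reference position into target coordinates."""
    frac = (pos - block.ref_start) / (block.ref_end - block.ref_start)
    span = block.tgt_end - block.tgt_start
    if block.inverted:
        return round(block.tgt_end - frac * span)
    return round(block.tgt_start + frac * span)


def project_interval(
    blocks: List[SyntenicBlock], chrom: str, start: int, end: int
) -> List[Tuple[str, int, int]]:
    """Project [start, end] on `chrom` through every overlapping block.

    Returns one (target chromosome, start, end) tuple per overlapping block,
    so an interval that spans a rearrangement yields several target intervals.
    """
    hits = []
    for b in blocks:
        if b.ref_chr != chrom or b.ref_end < start or b.ref_start > end:
            continue
        # Clip the query to the part covered by this block before projecting.
        lo, hi = max(start, b.ref_start), min(end, b.ref_end)
        p1, p2 = project_position(b, lo), project_position(b, hi)
        hits.append((b.tgt_chr, min(p1, p2), max(p1, p2)))
    return hits


# Made-up blocks: one collinear, one inverted and relocated in the target.
example_blocks = [
    SyntenicBlock("A01", 1_000_000, 2_000_000, "A01", 1_200_000, 2_400_000),
    SyntenicBlock("A01", 2_500_000, 3_500_000, "A01", 5_000_000, 6_000_000, inverted=True),
]
print(project_interval(example_blocks, "A01", 1_500_000, 3_000_000))
```

In this toy case a single reference interval maps to two separate target intervals, which is the kind of discrepancy that a single-reference coordinate system hides.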
We will develop two complementary pan-genome-informed methods to translate QTL positions across diverse genomes and enable practical application.
First, we will create a queryable "atlas" of physical positions: a subgenome-aware synteny map of meiotically homologous sequences across all reference genomes in our pan-genome resource. Using optimized alignment methods (alpha version: https://github.com/jtlovell/DEEPSPACE), we will develop, test, and release the DEEPSPACE tool as the first deliverable in year one, enabling precise mapping of QTL positions across genomes. Next, we will develop a user-friendly query tool to retrieve the physical positions of meiotic homologs across all genomes based on user-supplied coordinates (genome ID, chromosome, start, end). In year two, supplemental modules will extract genes, variants, and other attributes within these intervals, enhancing accessibility for breeders.
Additionally, we will explore regions of interest using two computational approaches. First, we will apply our optimized pan-genome graph construction pipeline (minigraph-cactus, vg toolkit, pggb, odgi, sequenceTubeMap) to build local pan-genome graphs, informing haplotype block reconstruction for gene editing. Second, we will implement a k-mer-based variant detection method to identify diagnostic short sequences for haplotypes in specified intervals. We will vet unique k-mers to type variation across whole-genome sequenced genotypes and provide access to these results via the Breeder Resource Hub (see Objective 3).
Objective 3 methods. Under Objective 3, we will develop a user-friendly Breeder Resource Hub hosted on Phytozome, addressing the limitations of existing resources like cottongen.org, which lack streamlined cross-genome query functionality.
This hub will enable breeders to leverage recent quantitative trait loci (QTLs), introgressed regions, and rapid genotyping of offspring, reducing reliance on costly fiber phenotyping.
First, we will design optimized data structures for efficient storage, retrieval, and analysis of large-scale genomic datasets. The hub will integrate all reference genome assemblies from Objective 1, linked to existing upland cotton genomes on Phytozome, with updated gene models while preserving prior versions for compatibility with external databases. Tools will facilitate exploration of structural variations, genes, and orthologs across cultivars, aiding identification of genotyping targets and candidate genes. Additionally, the portal will allow breeders to define and compare specific regions of interest across genomes. In year two, we will incorporate static databases of k-mer-based variants from Objective 2, enhancing the hub's utility for breeding applications.
To evaluate project progress, we will incorporate the proposed work into our internal quarterly review process. Our team will rigorously assess the stated objectives and deliverables, ensuring their timely release to the broader cotton research community.
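As an illustration of the k-mer-based variant typing whose results the hub will serve, the sketch below finds k-mers that occur in exactly one haplotype of an interval and then counts hits to those k-mers in a sample's reads. This is a simplified assumption of how such typing could work, not the project's method: the function names, the toy sequences, and k = 5 are hypothetical, and the sketch ignores reverse complements, sequencing errors, and k-mers that also occur elsewhere in the genome.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of distinct k-mers in a sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def diagnostic_kmers(haplotypes: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """For each haplotype, keep the k-mers found in that haplotype and no other."""
    owners = defaultdict(set)
    for name, seq in haplotypes.items():
        for km in kmers(seq, k):
            owners[km].add(name)
    result: Dict[str, Set[str]] = {name: set() for name in haplotypes}
    for km, names in owners.items():
        if len(names) == 1:
            result[next(iter(names))].add(km)
    return result


def type_sample(reads: Iterable[str], diag: Dict[str, Set[str]], k: int) -> Dict[str, int]:
    """Count diagnostic k-mer hits per haplotype across a sample's reads.

    Each read contributes at most one hit per distinct k-mer it contains.
    """
    lookup = {km: name for name, kms in diag.items() for km in kms}
    counts = {name: 0 for name in diag}
    for read in reads:
        for km in kmers(read, k):
            if km in lookup:
                counts[lookup[km]] += 1
    return counts


# Toy haplotypes that differ by a single base in the middle of the interval.
toy_haplotypes = {"hapA": "ACGTACGTTAGC", "hapB": "ACGTACCTTAGC"}
toy_diag = diagnostic_kmers(toy_haplotypes, k=5)
print(type_sample(["TACGTTAG"], toy_diag, k=5))
```

A real implementation would also need to vet candidate k-mers against the rest of the genome, as the Objective 2 methods describe, before using them to type whole-genome sequenced genotypes.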