Source: HUDSONALPHA INSTITUTE FOR BIOTECHNOLOGY submitted to NRP
PAN-MAGIC GENOMIC PLATFORM FOR DISEASE RESISTANCE, DROUGHT TOLERANCE, AND YIELD ENHANCEMENT IN PEANUT
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1030499
Grant No.
2023-78408-39694
Cumulative Award Amt.
$490,000.00
Proposal No.
2022-07290
Multistate No.
(N/A)
Project Start Date
Apr 15, 2023
Project End Date
Apr 14, 2026
Grant Year
2023
Program Code
[A1811]- AFRI Commodity Board Co-funding Topics
Recipient Organization
HUDSONALPHA INSTITUTE FOR BIOTECHNOLOGY
601 GENOME WAY
HUNTSVILLE, AL 35806-2908
Performing Department
(N/A)
Non Technical Summary
Using genomics technology to identify molecular markers for selecting traits of interest is still not as efficient as it could be. Sequencing technology has outpaced the methods used to connect genotypes to phenotypes effectively across a wide range of species. One major time constraint in breeding programs is the separation of the applied breeding program from the genetic discovery program. Using breeding-relevant structured populations in an innovative way has the potential to increase the resolution of genetic mapping and decrease the time it takes to develop important genomic tools. The ability to react quickly to emerging threats in agriculture by identifying, selecting for, and deploying native genetic variation is an important goal for translational genomics research. We will use advanced sequencing technology to construct a pangenome for a newly developed peanut multiparent advanced generation intercross (MAGIC) population. We will assess the advantages of constructing a MAGIC population in an inbred polyploid crop that has minimal genetic diversity. We will map two very important traits in peanut: stem rot resistance and drought tolerance. We will use a modified strategy to attain higher genetic resolution, i.e., to increase our ability to detect variation that is tightly linked to the trait of interest. This strategy will rely on the genetic power of the pangenome to shift resource allocation from population advancement and replication to one year of intensive phenotyping and genetic mapping. We anticipate that this shift in resources will allow beneficial genomic tools to be identified quickly within breeding-relevant populations, effectively combining breeding with genetic discovery.
Agronomically relevant germplasm can be selected and tested within the genetic experiment, vastly decreasing the time from discovery to crop improvement. The goal of this project is to demonstrate that this approach is valuable as a new tool for developing genomic resources more quickly and improving crops more effectively to feed a growing population in an evolving environment.
Animal Health Component
25%
Research Effort Categories
Basic
25%
Applied
25%
Developmental
50%
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
202 | 1830 | 1081 | 100%
Knowledge Area
202 - Plant Genetic Resources;

Subject Of Investigation
1830 - Peanut;

Field Of Science
1081 - Breeding;
Goals / Objectives
The major goals of the project are to demonstrate the efficacy of developing pangenomic resources for multiparent structured populations that are directly relevant to breeding programs, and to demonstrate that pangenomics in breeding populations helps achieve functional marker discovery and deployment in advanced germplasm lines. The specific objectives are as follows:
1) Develop a pangenome graph for an 18-way multiparent advanced generation intercross (MAGIC) population developed in peanut, using PacBio HiFi assembled genomes.
2) Obtain genome-wide variation from low-coverage whole genome sequencing data on all the F1 hybrids used to construct the population across the 4 rounds of crossing.
3) Analyze the variants using the pangenome graph and obtain a high-resolution map of crossovers during construction of the population to evaluate the impact of rounds of intercrossing.
4) Obtain genome-wide variation on the final MAGIC hybrid families resulting from the 4 rounds of intercrossing, to provide genotype information for functional marker discovery using the population.
5) Demonstrate how utilizing pangenomics increases the resolution for identifying functional variation by mapping two important traits in peanut: stem rot resistance and tolerance to late-season drought stress.
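Objective 3 amounts to locating the points along each chromosome where a gamete's founder of origin changes. The sketch below assumes that founder-of-origin calls per marker already exist (for example, from imputation against the pangenome graph) and that each peanut subgenome chromosome is treated as diploid. The functions `smooth_short_runs` and `call_crossovers`, the `min_run` threshold, and the example data are hypothetical and are not taken from the project's pipeline.

```python
from typing import List, Tuple


def smooth_short_runs(origin: List[str], min_run: int) -> List[str]:
    """Fold runs shorter than min_run markers into the run to their left.

    In low-pass data an isolated switch is more likely a miscall than a true
    double crossover. A short run at the very start of the chromosome has no
    left neighbour and is kept as-is.
    """
    runs: List[Tuple[str, int]] = []          # maximal (founder, length) blocks
    for founder in origin:
        if runs and runs[-1][0] == founder:
            runs[-1] = (founder, runs[-1][1] + 1)
        else:
            runs.append((founder, 1))
    merged: List[Tuple[str, int]] = []
    for founder, length in runs:
        # Absorb short runs, and re-join neighbours that now share a founder.
        if merged and (length < min_run or merged[-1][0] == founder):
            merged[-1] = (merged[-1][0], merged[-1][1] + length)
        else:
            merged.append((founder, length))
    return [founder for founder, length in merged for _ in range(length)]


def call_crossovers(positions: List[int], origin: List[str], min_run: int = 3):
    """Return (left_pos, right_pos, from_founder, to_founder) for each switch in
    founder of origin; the crossover lies somewhere between the two positions."""
    smoothed = smooth_short_runs(origin, min_run)
    return [
        (positions[k - 1], positions[k], smoothed[k - 1], smoothed[k])
        for k in range(1, len(smoothed))
        if smoothed[k] != smoothed[k - 1]
    ]


# Hypothetical example: one gamete, ten markers, founders labelled A-C.
positions = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
origin = ["A", "A", "A", "B", "A", "A", "C", "C", "C", "C"]
events = call_crossovers(positions, origin, min_run=2)
```

With `min_run=2` the single B marker is absorbed as a likely miscall and one crossover is reported between positions 600 and 700; with `min_run=1` every switch is reported. The threshold trades sensitivity to short double crossovers against robustness to low-pass genotyping error.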
Project Methods
The main objectives of the project are to develop a pangenome and use it to map two important traits in peanut using low-coverage whole genome sequencing (WGS). We have sequenced and assembled genomes of the 18 parent genotypes. Those assemblies will be used to construct a pangenome graph using PanPipes. All of the plants used during crossing will be sequenced, and genome-wide variation will be assessed using the pangenome graph. Chromosomal crossover sites will be quantified after each round of crossing, from the first round to the fourth. The final set of hybrid families will be genotyped in the same way using low-coverage WGS. The unique aspect of this work is the use of low-coverage sequencing to genotype a polyploid crop species. We have developed specialized informatics to analyze these data effectively, but this approach is not commonplace.
There are two traits of interest we will map using the MAGIC population: stem rot resistance and drought tolerance. Because the final set of families needs to be advanced to fixation, we will map using remnant seed of the 8-way hybrids (3 rounds of crossing).
For stem rot resistance, we will germinate 500 F2 seeds from a set of 8-way hybrid plants. We will take cuttings of each plant to recover seed and take tissue from each plant to hold for DNA extraction. Rows of a stem rot tolerant variety will be sown per agronomic standards in a field used exclusively for stem rot evaluation. The 500 plants will be randomly transplanted into the rows, 5 to a row, and marked with flags. When the row canopy is mature, we will inoculate the flagged plants with an agar disc of Sclerotium rolfsii at the base of the plant and water the plants every day for 5 days to produce high humidity. After sufficient disease pressure has developed, the plants will be scored for stem rot resistance on a 1 to 5 scale.
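Randomizing transplant positions helps keep field gradients from lining up with genotype. A minimal sketch of the randomization described above, assuming plant IDs are simple strings; the ID scheme, the seed, and the function name `randomize_layout` are hypothetical. With 500 plants at 5 per row, the layout has 100 rows.

```python
import random
from typing import Dict, List


def randomize_layout(plant_ids: List[str], plants_per_row: int, seed: int) -> Dict[int, List[str]]:
    """Randomly assign plants to field rows, a fixed number of plants per row."""
    if plants_per_row <= 0 or len(plant_ids) % plants_per_row:
        raise ValueError("number of plants must be a positive multiple of plants_per_row")
    rng = random.Random(seed)   # fixed seed so the field map can be regenerated
    shuffled = list(plant_ids)
    rng.shuffle(shuffled)
    n_rows = len(shuffled) // plants_per_row
    return {
        row + 1: shuffled[row * plants_per_row:(row + 1) * plants_per_row]
        for row in range(n_rows)
    }


# Hypothetical IDs for the 500 F2 plants.
plants = [f"F2_{i:03d}" for i in range(1, 501)]
layout = randomize_layout(plants, plants_per_row=5, seed=2023)
```

Recording the seed alongside the map makes it possible to trace each flagged plant back to its DNA sample and seed cutting.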
The 100 most resistant plants and the 100 most susceptible plants will be sequenced, and a QTL-seq analysis will be conducted by bulking in silico to map potential resistance-linked loci using the pangenome graph. This analysis departs from usual methods in three ways: 1) we will use unreplicated single plants and "replicate alleles" rather than discrete genotypes; 2) we will sequence individuals separately and form the bulks in silico instead of pooling DNA and sequencing it together; and 3) we will use a pangenome to analyze the bulk sequencing data.
For drought tolerance, we will use a similar strategy: we will germinate plants in the greenhouse, take cuttings to retain seed from each line, and transplant them into the field. We will transplant 1,000 plants in 100 ft rows spaced 2 plants per foot. At about 100 days after planting, we will restrict water for 40 days using drought shelters over the plots. The plots will be rated on a 1 to 5 visual scale over the course of the drought stress. We will test two methods to map drought-linked haplotypes: 1) bulk analysis as described above, comparing drought-tolerant plants with drought-susceptible plants; and 2) comparison of drought-tolerant plants with a randomly sampled set that represents a null model.
The outputs will be evaluated by their ability to successfully map field-based traits and to quickly identify loci strongly associated with high-value disease resistance and stress tolerance. The results of the mapping efforts will be used to select disease-resistant and drought-tolerant lines that will be tested in replicated trials in the final year of the project. The key milestones are the successful identification of molecular targets for improvement and a test of the efficacy of using genomic data to select for those targets in a genome-assisted breeding program.
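The in silico bulking can be sketched with the SNP-index statistic used in QTL-seq: reads from the individually sequenced plants in each tail are summed per variant, the alternate-allele fraction is computed for each bulk, and the two bulks are subtracted. The second drought strategy replaces the susceptible bulk with random samples of equal size, giving an empirical null. This is a sketch under assumptions, not the project's implementation: read counts are taken as dense individuals-by-variants matrices (for example, exported from the graph-based variant calls), and all function names and the toy data are hypothetical.

```python
import numpy as np


def snp_index(alt: np.ndarray, ref: np.ndarray, members: np.ndarray) -> np.ndarray:
    """In silico bulk: sum the reads of the selected individuals per variant and
    return the alternate-allele fraction (SNP-index); NaN where the bulk has no reads."""
    a = alt[members].sum(axis=0)
    depth = a + ref[members].sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(depth > 0, a / depth, np.nan)


def delta_snp_index(alt, ref, scores, n_tail):
    """SNP-index of the high-score tail minus that of the low-score tail.

    On a 1-5 disease scale (1 = no disease) the low tail is the resistant
    bulk and the high tail the susceptible bulk."""
    order = np.argsort(scores, kind="stable")
    low, high = order[:n_tail], order[-n_tail:]
    return snp_index(alt, ref, high) - snp_index(alt, ref, low)


def tail_vs_random(alt, ref, scores, n_tail, n_draws, seed=0):
    """Compare the low-score (e.g. drought-tolerant) tail with random samples of
    the same size; return its SNP-index and an empirical two-sided p-value."""
    rng = np.random.default_rng(seed)
    tail = np.argsort(scores, kind="stable")[:n_tail]
    observed = snp_index(alt, ref, tail)
    null = np.stack([
        snp_index(alt, ref, rng.choice(len(scores), size=n_tail, replace=False))
        for _ in range(n_draws)
    ])
    centre = np.nanmean(null, axis=0)
    # Count random bulks deviating from the null centre at least as much as the tail.
    extreme = np.abs(null - centre) >= np.abs(observed - centre)
    p_value = (extreme.sum(axis=0) + 1) / (n_draws + 1)
    return observed, p_value


# Toy data: 6 plants x 2 variants; variant 0 tracks the score, variant 1 does not.
scores = np.array([1, 1, 1, 5, 5, 5])
alt = np.array([[2, 1], [3, 1], [1, 1], [0, 1], [0, 1], [0, 1]])
ref = np.array([[0, 1], [0, 1], [0, 1], [2, 1], [3, 1], [1, 1]])
delta = delta_snp_index(alt, ref, scores, n_tail=3)
observed, p_value = tail_vs_random(alt, ref, scores, n_tail=3, n_draws=200, seed=1)
```

Summing reads weights each plant by its coverage; averaging per-plant allele fractions instead would weight plants equally, which may matter when depth varies widely at ~1x.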

Progress 04/15/24 to 04/14/25

Outputs
Target Audience: We reached bioinformaticians and software developers by releasing two major software packages that help analyze whole genome sequencing data on pangenome graphs and interpret the results. KhufuEnv includes a set of 132 tools that efficiently analyze large NGS datasets. We reached plant breeders, genomicists, geneticists, and peanut researchers by presenting at conferences including Advances in Arachis Genomics and Biotechnology (2025, Goa, India), the American Peanut Research and Education Society annual meeting (2025, Richmond, VA), the National Association of Plant Breeders annual conference (2025, Hawaii), and the Plant and Animal Genome Conference (2025, San Diego, CA). We reached professionals in the field of breeding and genomics by presenting in the first PAG webinar series (July 10, 2025). The webinar had more than 500 registrations, and attendees represented 30 countries.
Changes/Problems: There have been no major changes in approach. We did add one intriguing activity: we selected the most promising single plants, increased them in winter nurseries, and started evaluating plots for drought tolerance and white mold resistance. The first-year plots were extremely tolerant to drought. We look forward to the second-year evaluations.
What opportunities for training and professional development has the project provided? This project has provided training for one PhD student and two postdoctoral researchers.
How have the results been disseminated to communities of interest? Results have been disseminated through national and international conferences (PAG, AAGB, APRES, NAPB) and through webinars (PAG/GenomeWeb webinar series). We have released two pieces of software on GitHub for bioinformaticians.
What do you plan to do during the next reporting period to accomplish the goals? The next reporting period will include a third season of field phenotyping, full analysis of the first two years, and final analysis of the recombination events within the population. We will also prepare and finalize the remaining publications and complete the review process for those currently under review.

Impacts
What was accomplished under these goals?
The project has 3 main objectives:
Objective 1 - Evaluate the impact of a multiparent crossing scheme on creating novel combinations of alleles in peanut and identifying functional variation
Objective 2 - Genetically map molecular targets for strong drought tolerance and disease resistance segregating in the MAGIC population
Objective 3 - Utilize pangenomic graphs as a method to increase the efficiency of genetic mapping in complex populations
Objective 1
The first step toward accurately identifying crossovers within the generations of the MAGIC population is to derive a novel imputation strategy that utilizes the population structure. Using genotyping information from each level of the 18-way MAGIC population (specific to each cross), we can iteratively impute genotypes across each level for skim-seq data and assess the recombination architecture in the F1s. Having sequencing data through the final crossing scheme, we can also assess how much initial diversity was lost and whether there is any bias toward certain founders in the population. Given that MAGIC populations are often used for mapping, we will also compare the accuracy of using the initial founder HiFi data to impute the members of the 16-way cross with that of an iterative imputation pipeline that accounts for genotypes across each generation. Because imputed genotypes serve as the foundation for identifying QTL in MAGIC populations, this comparison may provide insight into the importance of subsampling within MAGIC generations for genotyping.
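The iterative idea can be illustrated with a small hidden Markov model. This is a sketch under stated assumptions, not the script in the MAGIC repository: sites are biallelic (0/1), each subgenome chromosome is treated as diploid, each parent is represented by two phased haplotypes (identical for an inbred founder), and the child's low-pass reads arrive as per-site ref/alt counts. Each hidden state records which haplotype the child inherited from each parent; Viterbi decoding yields phased child haplotypes that could serve as the reference for the next round of crossing, and each switch in the decoded path marks a crossover. The function `impute_child`, the error rate `eps`, and the per-interval switch probability `r` are hypothetical.

```python
from itertools import product

import numpy as np

# Hidden state: (haplotype inherited from parent 1, haplotype inherited from parent 2).
STATES = list(product((0, 1), (0, 1)))


def impute_child(p1, p2, ref, alt, eps=0.01, r=0.01):
    """Viterbi-impute one child of the cross p1 x p2.

    p1, p2:   (2, M) arrays of phased 0/1 alleles for each parent.
    ref, alt: (M,) read counts for the child; sites with no reads are imputed.
    Returns the child's phased haplotypes (2, M) and the decoded state path.
    """
    m, n = len(ref), len(STATES)
    p_alt = np.array([eps, 0.5, 1.0 - eps])      # expected alt-read fraction by dosage
    log_emit = np.empty((m, n))
    for s, (i, j) in enumerate(STATES):
        q = p_alt[p1[i] + p2[j]]                  # dosage 0, 1 or 2 at every site
        log_emit[:, s] = alt * np.log(q) + ref * np.log(1.0 - q)
    # Each parental gamete switches haplotype (a crossover) with probability r per interval.
    log_trans = np.array([
        [np.log((r if i != k else 1 - r) * (r if j != l else 1 - r)) for (k, l) in STATES]
        for (i, j) in STATES
    ])
    score = np.log(1.0 / n) + log_emit[0]        # uniform start
    back = np.zeros((m, n), dtype=int)
    for site in range(1, m):
        cand = score[:, None] + log_trans         # cand[a, b]: from state a to state b
        back[site] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_emit[site]
    path = np.empty(m, dtype=int)
    path[-1] = score.argmax()
    for site in range(m - 1, 0, -1):
        path[site - 1] = back[site, path[site]]
    sites = np.arange(m)
    from_p1 = p1[[STATES[s][0] for s in path], sites]
    from_p2 = p2[[STATES[s][1] for s in path], sites]
    return np.vstack([from_p1, from_p2]), path


# Toy cross: parent 1 is heterozygous (two distinct haplotypes), parent 2 is homozygous.
p1 = np.array([[0, 0, 0, 0, 0, 0, 0, 0],
               [1, 1, 1, 1, 1, 1, 1, 1]])
p2 = np.zeros((2, 8), dtype=int)
ref = np.array([4, 4, 0, 4, 2, 0, 2, 2])   # sites 2 and 5 have no reads
alt = np.array([0, 0, 0, 0, 2, 0, 2, 2])
haps, path = impute_child(p1, p2, ref, alt)
crossovers_in_p1 = int(np.sum(np.diff([STATES[s][0] for s in path]) != 0))
```

The uncovered sites are filled from the inherited parental haplotype, and the single switch in parent 1's haplotype between sites 3 and 4 is the crossover that the population-level analysis would tally.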
Pangenome graph construction: captures 4,795,144 variants across 18 haplotypes.
Sequencing of MAGIC population: the PanMap consists of 669,891 filtered variants across 770 individuals; average coverage 0.77x.
Sequencing of "ground-truth" MAGIC population subset: the PanMap consists of filtered variants across 34 "ground-truth" samples, which are being down-sampled to 1x, 0.5x, and 0.25x to assess the impact of coverage on variant calling and imputation; average coverage 6.3x.
Iterative imputation pipeline: starting at the F1 generation, imputation is initially driven by HiFi (>20x) long-read sequencing data of the parents as the reference, then iteratively imputes missing genotypes for the following generations of each cross, using the imputed genotypes from each layer as the reference for the next. The current working script is available on GitHub (https://github.com/hw85241/MAGIC) and will serve as a resource for others who take this approach.
Objective 2
In 2023, two field experiments were conducted. First, 740 individual plants (50 checks), representing F1 families from an 8-way cross, were germinated in the greenhouse. The plants were sampled for DNA, tagged, and transplanted in the field within rows of the white mold tolerant cultivar Georgia 12Y. One month after transplanting, plants were inoculated with Sclerotium rolfsii mycelium agar disk punches, and the plots were well watered for two days to promote fungal growth. Disease ratings were taken every week starting one week after inoculation, on a 1 to 5 scale where 1 represents no disease and 5 represents plant death. Ratings of 2, 3, and 4 represent disease progression from small lesions on the main stem to larger lesions on the main stem and lateral stems. A total of 700 plants were sequenced using the Twist 96-plex whole genome sequencing library prep. The average sequencing depth across the population was 0.86x.
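Two quick calculations help frame the low-pass design. Under an idealized Poisson model of read placement, the expected fraction of sites with no reads in an individual is e^(-depth), roughly 42% at 0.86x, which is why imputation is central. Down-sampling the 6.3x ground-truth samples can be approximated by binomial thinning, keeping each read independently with probability target/current depth. The sketch assumes per-site ref/alt read counts; real low-pass coverage is typically more uneven than Poisson, and thinning counts only approximates subsampling the reads themselves. The function names and example counts are hypothetical.

```python
import math

import numpy as np


def fraction_uncovered(mean_depth: float) -> float:
    """Expected fraction of sites with zero reads if read depth is Poisson(mean_depth)."""
    return math.exp(-mean_depth)


def thin_reads(ref, alt, current_depth, target_depth, seed=0):
    """Approximate down-sampling: keep each read independently with
    probability target_depth / current_depth (binomial thinning of counts)."""
    if not 0 < target_depth <= current_depth:
        raise ValueError("target depth must be positive and at most the current depth")
    keep = target_depth / current_depth
    rng = np.random.default_rng(seed)
    return rng.binomial(ref, keep), rng.binomial(alt, keep)


for depth in (6.3, 1.0, 0.86, 0.5, 0.25):
    print(f"{depth}x: {fraction_uncovered(depth):.1%} of sites expected to have no reads")

# Hypothetical per-site counts for one ground-truth sample, thinned from 6.3x to 0.5x.
ref_counts = np.array([6, 3, 7, 0, 4])
alt_counts = np.array([0, 3, 0, 6, 2])
ref_05, alt_05 = thin_reads(ref_counts, alt_counts, current_depth=6.3, target_depth=0.5)
```

Comparing genotype calls and imputed genotypes from the thinned counts against the full-depth calls gives a per-depth concordance estimate.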
The resistant and susceptible tails of the disease distribution were identified for genetic mapping; analysis is still in progress. Second, 900 plants (100 checks), representing F1 families from a separate 8-way cross, were germinated in the greenhouse. The plants were sampled for DNA, tagged, and transplanted in the field spaced one foot apart. After 100 days, drought shelters were placed over the plots to induce stress. Visual drought stress ratings were taken at 3 time points beginning 10 days after stress induction, with the final rating at 40 days after stress induction. The visual rating used a 1 to 5 scale, where 1 is no visual stress (no wilting) and 5 is completely brown and dead. To control for edge effects (the edges of the shelter received less stress than the middle rows), the plants were separated into edge rows (600 plants) and middle rows (300 plants). All 900 plants were sequenced to ~1x coverage. For the edge rows, the tolerant bulk averaged a rating of 1 and the susceptible bulk a rating of 4. For the middle rows, the tolerant bulk averaged 1.4 and the susceptible bulk 4.8. Both sets of bulks will be analyzed separately and together. In 2024, the experiment was replicated with 700 plants analyzed for white mold and 900 plants analyzed for drought. All 1,600 plants were genotyped with ~1x coverage WGS, and variation has been cataloged on the pangenome graph. Analysis is currently underway.
Objective 3
To efficiently profile variation on the pangenome graph, we developed a comprehensive pipeline that prepares graphs from Minigraph-Cactus, calls variants with vg tools and filters them for accuracy, and filters the final variant calls for allele frequency, depth, and missing data. We named this pipeline KhufuPAN, a pangenome-aware version of our internal software Khufu (https://www.hudsonalpha.org/khufudata/). KhufuPAN can be accessed publicly at https://github.com/w-korani/KhufuPAN.
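The final KhufuPAN filtering step (allele frequency, depth, and missing data) can be illustrated on a genotype matrix. This is a hypothetical sketch, not KhufuPAN's code or defaults: genotypes are assumed to be diploid dosages with -1 for missing calls, and the thresholds are placeholders that would need tuning for ~1x data. The upper depth bound is a common guard against collapsed repeats, which can matter in an allotetraploid where homeologous sequences may map together.

```python
import numpy as np


def filter_variants(genotypes, depth, min_maf=0.05, max_missing=0.5,
                    min_mean_depth=0.5, max_mean_depth=5.0):
    """Boolean mask of variants passing allele-frequency, missing-data and depth filters.

    genotypes: (individuals, variants) diploid dosages 0/1/2, with -1 for a missing call.
    depth:     (individuals, variants) read depth; its mean is taken over all individuals.
    """
    called = genotypes >= 0
    n_called = called.sum(axis=0)
    missing_rate = 1.0 - n_called / genotypes.shape[0]
    alt_sum = np.where(called, genotypes, 0).sum(axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        alt_freq = alt_sum / (2.0 * n_called)           # NaN if nothing was called
    maf = np.nan_to_num(np.minimum(alt_freq, 1.0 - alt_freq), nan=0.0)
    mean_depth = depth.mean(axis=0)
    return ((missing_rate <= max_missing) & (maf >= min_maf)
            & (mean_depth >= min_mean_depth) & (mean_depth <= max_mean_depth))


# Toy matrix: 4 individuals x 4 variants.
genotypes = np.array([[0, 0, 2, -1],
                      [2, 0, 2, -1],
                      [1, 0, -1, -1],
                      [0, 0, 2, 1]])
depth = np.ones_like(genotypes)
keep = filter_variants(genotypes, depth)
```

Here only the first variant passes: the second and third are monomorphic among called genotypes, and the fourth is missing in three of four individuals.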
To efficiently filter and query graph-based variants, we also created KhufuEnv (Wright et al., 2025), a suite of tools that can be used to convert, filter, and query hapmap and panmap files. We developed a new format, the panmap, that contains the information necessary to understand the variation from a graph and how it relates to the graph structure. Critical information, such as the graph genome from which the allele is derived and the size and sequence of the variant, is included.
References
Hickey G, Monlong J, Ebler J, Novak AM, Eizenga JM, Gao Y, et al. (2024) Pangenome graph construction from genome alignments with Minigraph-Cactus. Nature Biotechnology 42:663-673
Lee K, Korani W, Bentz PC, Pokhrel S, Ozias-Akins P, Harkess A, Vaughn J, Clevenger J (2025) Long-Read Low-Pass Sequencing for High-Resolution Trait Mapping. bioRxiv 2025.01.09.632261. doi: https://doi.org/10.1101/2025.01.09.632261
Wright HC, Davis CEM, Clevenger J, Korani W (2025) KhufuEnv, an auxiliary toolkit for building computational pipelines for plant and animal breeding. bioRxiv 2025.03.28.645917. doi: https://doi.org/10.1101/2025.03.28.645917

Publications


Progress 04/15/23 to 04/14/24

Outputs
Target Audience: We presented our work at the Advances in Arachis Genomics and Biotechnology biannual conference, held this year in Goa, India. Two presentations at this conference showcased the use of local pangenomes for integrated gene discovery and selection. A manuscript describing the pangenome was submitted for peer review and is currently in review.
Changes/Problems: There have been no major changes to the proposed approach.
What opportunities for training and professional development has the project provided? The project fully supports one plant breeding doctoral student. The data generated are also part of projects involving two postdoctoral researchers. HudsonAlpha hosts undergraduate interns every summer, and aspects of the project have been part of training two interns.
How have the results been disseminated to communities of interest? We have presented the project to the peanut community at domestic and international meetings, including the American Peanut Research and Education Society and the Advances in Arachis Genomics and Biotechnology meeting. We have prepared one manuscript that is currently in peer review.
What do you plan to do during the next reporting period to accomplish the goals? We have one more field season to collect phenotypes on drought and white mold. We will fully analyze the genotypes and phenotypes from years 1 and 2. We will finalize the hybrid analysis of crossing over during MAGIC population construction. We will finalize manuscripts describing the work.

Impacts
What was accomplished under these goals?
1) We have fully developed the pangenome graph of all 18 parents, including reference-level genome assemblies.
2) All F1 hybrids from the 2-way, 4-way, 8-way, and 16-way crosses were sequenced and analyzed. A novel imputation strategy was developed to fully track recombination in the population and is currently being used to map the crossovers.
4) Over 3,000 individuals were phenotyped for either white mold resistance or drought tolerance and were genotyped with low-pass sequencing and analysis. The phenotype data from year 2 are currently being integrated with the genotypes for mapping.
5) Mapping from years 1 and 2 for drought and white mold resistance is underway.

Publications