A genetics-based data analysis system for breeders in polyploid breeding programs

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

AFRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

1027948

Grant No.

2022-67013-36269

Cumulative Award Amt.

$634,487.00

Proposal No.

2021-07591

Multistate No.

(N/A)

Project Start Date

Jan 1, 2022

Project End Date

Sep 30, 2025

Grant Year

2022

Program Code

[A1141]- Plant Health and Production and Plant Products: Plant Breeding for Agricultural Production

Recipient Organization
NORTH CAROLINA STATE UNIV
(N/A)
RALEIGH,NC 27695

Performing Department
Horticultural Science

Non Technical Summary
Many important agricultural species are polyploids, i.e., they carry more than two copies of their genome. They range from staple food crops (potato, sweetpotato) to fruits (strawberry, kiwi, blueberry, banana), ornamental flowers (roses, chrysanthemum), forage crops, turfgrass, and sugar and energy crops (sugarcane). The transmission of genetic material across generations is much more intricate and harder to unravel in polyploids than in diploids such as maize, rice, and soybean. Understanding these inheritance patterns is nonetheless essential in breeding programs. Once the patterns are correctly assessed, specific genomic positions can be associated with important traits, the responsible genes can sometimes be identified, and this information can be used in breeding.
In the last few years, we have developed a series of computational tools that help breeders and geneticists answer these questions by analyzing genomic data in polyploid species: VCF2SM and SuperMASSA for processing raw DNA sequences and identifying genetic markers, MAPpoly for constructing genetic maps, and QTLpoly for locating genes that affect trait phenotypes and for performing prediction. Currently, our tools are implemented for a limited genetic design, full-sib families, and we are extending them to multiple families. This project proposes to extend our polyploid genomic tools further, to the general multiple-generation pedigree breeding populations typically found in practical polyploid breeding programs. Moreover, we propose to develop a new downstream computational tool, DecisionPoly, that is user-friendly and offers clearly illustrated, actionable information to assist polyploid breeders in making short- and long-term breeding decisions, based on the collected and learned information about their breeding populations, for different breeding objectives.

Animal Health Component

30%

Research Effort Categories

Basic

50%

Applied

30%

Developmental

20%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
201	2499	1080	50%
201	2410	1081	50%

Knowledge Area
201 - Plant Genome, Genetics, and Genetic Mechanisms;

Subject Of Investigation
2410 - Cross-commodity research--multiple crops; 2499 - Plant research, general;

Field Of Science
1081 - Breeding; 1080 - Genetics;

Keywords

decisionpoly

polysomic inheritance

prediction

selection

simulation

Goals / Objectives
The main goal of this project is to develop a comprehensive, integrated, open-source, and publicly available data analysis pipeline to process genomic data, infer the complex inheritance patterns from parents to offspring, map genes that are important for breeding objectives, and offer breeders actionable information for short- and long-term breeding decisions in practical breeding programs for polyploid species. Upstream, we aim to develop computational tools to deal with different genomic data, call biallelic and multiallelic markers, combine genetic and genomic information from multiple breeding families, and build complex genetic models from complex population structures. Downstream, we aim to develop a user-friendly computational tool offering illustrated, actionable information to assist breeders in making breeding decisions, based on the collected and learned information about their breeding populations, for different breeding objectives.
For this project, the specific objectives are:
1. To further develop haplotyping algorithms that consider all the relevant information from complex breeding schemes with multiple generations and partially interconnected multiple families.
2. To extend the genetic models between genotypes and phenotypes to all scenarios presented in item 1 and build sound and efficient statistical analysis procedures for genetic discovery and more accurate prediction.
3. To implement a user-friendly computational tool as a Shiny application in R to help breeders make short- and long-term decisions in their breeding programs. Breeders will be able to use the results obtained in items 1 and 2 to make educated short-term decisions, such as which individuals to select and mate, as well as long-term decisions, supported by interactive breeding exercises (predicting outcomes for various breeding decisions), optimization of decisions based on breeding objectives, and multiple-generation forward simulation exercises (predicting results for different breeding strategies).

Project Methods
Haplotype inference in complex pedigrees
To infer the haplotypes in complex pedigrees, we will extend our previous work on constructing genetic maps in full-sib families. Our solution uses multilocus hidden Markov model (HMM) analysis and works for even ploidy levels from diploids up to autooctaploids. Our current model takes probability-based dosage markers as input and recovers the multiple polyploid genotypes present in the segregating population, which are otherwise masked by the biallelic nature of SNPs. Given the flexibility of the HMM framework, we also expect to extend our model to use multiallelic markers. The concept of HMM in genetic mapping is to use multiple linked markers to estimate the parental linkage phase and the genetic distance between markers, and to reconstruct the offspring haplotypes. This procedure can correct many potential genotyping errors. By drawing on interconnected sources of evidence (multiple SNPs and multiple individuals), the HMM aggregates information across SNPs and partially compensates for the intrinsically high error rate of individual marker dosage calls in polyploid species. Multilocus analysis to construct genetic maps and offspring haplotypes in polyploids is extremely important for information recovery and marker data quality control. We will implement these innovations in our open-source, publicly available R package MAPpoly.
Genetic models between genotypes and phenotypes in complex pedigrees
To integrate all the relevant information (genomic markers and trait phenotypes) in a complex pedigree breeding population for joint and informed analysis, we need a cohesive quantitative genetic model that applies to the whole breeding population. Such a model can be devised based on the alleles of the population founders, assuming that all the segregating alleles of breeding individuals can be traced back to the founders in probability.
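The multilocus HMM idea from the haplotype-inference paragraph above can be illustrated with a deliberately small sketch. It is not MAPpoly's algorithm. It assumes a single pair of parental homologs (the diploid case, two hidden states), a fixed recombination fraction `r` between adjacent markers, and a symmetric genotyping error rate `err`. These settings and all names are illustrative assumptions. Python is used only for compactness, since the project's own tools are R packages. The forward-backward pass shows how linked markers can outvote an isolated miscall.

```python
def haplotype_posteriors(obs, r=0.05, err=0.05):
    """Posterior probability of the inherited homolog (0 or 1) at each marker.

    obs: list of observed homolog calls (0, 1, or None for missing).
    r:   assumed recombination fraction between adjacent markers.
    err: assumed probability that a call reports the wrong homolog.
    """
    states = (0, 1)
    trans = [[1 - r, r], [r, 1 - r]]  # stay vs. recombine

    def emit(state, call):
        if call is None:  # a missing call carries no information
            return 1.0
        return 1 - err if call == state else err

    n = len(obs)
    # Forward pass: joint probability of the calls up to marker k and state s.
    fwd = [[0.0, 0.0] for _ in range(n)]
    for s in states:
        fwd[0][s] = 0.5 * emit(s, obs[0])
    for k in range(1, n):
        for s in states:
            prev = sum(fwd[k - 1][p] * trans[p][s] for p in states)
            fwd[k][s] = emit(s, obs[k]) * prev

    # Backward pass: probability of the calls after marker k given state s.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for k in range(n - 2, -1, -1):
        for s in states:
            bwd[k][s] = sum(
                trans[s][q] * emit(q, obs[k + 1]) * bwd[k + 1][q] for q in states
            )

    # Combine and normalize to get per-marker posteriors.
    post = []
    for k in range(n):
        weights = [fwd[k][s] * bwd[k][s] for s in states]
        total = sum(weights)
        post.append([w / total for w in weights])
    return post


# One offspring, seven linked markers; the fourth call disagrees with its neighbors.
calls = [0, 0, 0, 1, 0, 0, 0]
for marker, probs in enumerate(haplotype_posteriors(calls)):
    print(marker, round(probs[0], 3))
```

With these assumed rates, the posterior at the fourth marker still favors homolog 0, because two recombination events in a short interval are less likely than one genotyping error. A polyploid model has many more hidden states (one per combination of inherited homologs), but the same forward-backward logic applies.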
The challenge is how to perform the genetic analysis efficiently and informatively. There are two strategies.
The first is to extend our current QTL random-effect genetic model for a full-sib family (implemented in QTLpoly) to a complex multi-generational breeding population, with the alleles and their effects defined in terms of the founders' alleles. Just as a G matrix is computed for GBLUP (for genomic selection, GS), a corresponding G matrix (or Q matrix) can be computed for each targeted QTL. Multiple Q matrices can be included in one model for multiple QTL. The statistical importance of each QTL can be evaluated through its variance component. This strategy is similar to the one proposed by Amadeu et al. (2020) for QTL mapping in tetraploid diallels in their software DiaQTL. However, a more efficient way to identify multiple QTL is needed in this case, as a sequential genome search is not computationally efficient. The second strategy may help in this respect.
The second strategy is first to build a large set of founder allele effects sampled at every specified interval position along the genome; then to submit that set of allelic effects to a LASSO analysis (Tibshirani, 1996) to shrink it to a small set; and finally to identify, from this reduced set, potential QTL positions for evaluation. Some combination of the two strategies can help identify both QTL and their relevant allelic effects in a computationally more manageable way. We can first try to find significant QTL additive effects and then, conditional on those, try to find their essential dominance effects. This analysis can be used for genetic discovery (identification of QTL and of additive, dominance, and epistatic allelic effects). It can also be optimized to predict the performance of individuals in future generations. This prediction can be used for selection and mating design and for optimizing experimental designs (which can be aided by forward simulation).
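The shrinkage step of the second strategy can be sketched with a plain coordinate-descent LASSO. This is a generic textbook solver for the objective (1/2n)||y - Xb||^2 + lam * ||b||_1, not the QTLpoly implementation. The design matrix stands in for founder-allele predictors sampled along the genome. The function names, the penalty `lam`, and the toy data are assumptions made for illustration. Python is used only for compactness, since the project's own tools are R packages.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam, returning exactly 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(x, y, lam, n_iter=100, tol=1e-10):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    x: list of n rows, each a list of p predictor values (hypothetical
       founder-allele predictors in this illustration).
    y: list of n phenotypes, assumed centered.
    """
    n, p = len(y), len(x[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X beta while beta is all zeros
    col_scale = [sum(x[i][j] ** 2 for i in range(n)) / n for j in range(p)]

    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_scale[j] == 0.0:  # constant-zero column: nothing to estimate
                continue
            # Covariance of predictor j with the residual that excludes its own fit.
            rho = sum(
                x[i][j] * (residual[i] + x[i][j] * beta[j]) for i in range(n)
            ) / n
            new_beta = soft_threshold(rho, lam) / col_scale[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= x[i][j] * delta
                beta[j] = new_beta
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Toy data: the first predictor has a large effect, the second a negligible one.
x = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [2.0, 0.1, -2.0, -0.1]
print(lasso_coordinate_descent(x, y, lam=0.1))
```

In this toy case the second coefficient is shrunk to exactly zero while the first is retained with some bias toward zero. That is the behavior the strategy relies on to reduce a genome-wide set of candidate allelic effects to a few positions worth evaluating.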
Identifying significant dominance allelic effects is particularly important for variety development because they contribute to heterosis or specific combining ability. Of course, our computational tool will also have an option to compute the standard genome-wide G matrix and use it for GBLUP in GS, at least for comparison. We will implement these innovations in our open-source, publicly available R package QTLpoly.
Development of a terminal tool for breeders - DecisionPoly
To truly help breeders incorporate genomic information in their breeding programs, we need to make an extra effort to put the downstream tool in the hands of breeders so that it helps them make breeding decisions directly. For this purpose, we ask ourselves: what do breeders need? We think they may need a computational tool that can assist them in making short- and long-term breeding decisions, based on the collected and learned information about their breeding populations, for different breeding objectives.
Data analysis using the upstream computational tools (SuperMASSA, MAPpoly, and QTLpoly) requires considerable proficiency in R and an understanding of the scientific and technical issues behind the tools. Thus, we will implement DecisionPoly, an easy-to-use interactive application that can access the relevant information from MAPpoly and QTLpoly, display information and results graphically, provide options for users to perform further breeding analyses and simulations, and give breeders direct control over the decision-making process.
We will implement this application with Shiny, an R package that makes it easy to build interactive web apps straight from R, with direct connections to databases and to the upstream R packages (MAPpoly and QTLpoly). It is highly interactive, can perform analyses, communicates results to users graphically, and tells the story from the data.
We can host this standalone app on a webpage, embed it in R Markdown documents, or build dashboards.
We will develop our tools using the standardized technical specifications of the Breeding API (BrAPI, https://brapi.org/) to facilitate communication between different breeding platforms. We will also implement other ways to import and export datasets, such as CSV and Excel files, since these formats are prevalent in many breeding programs. In addition, we will follow the FAIR guidelines to make our analyses, code, and datasets findable, accessible, interoperable, and reusable for the scientific community, especially breeders.
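As a closing illustration for these methods, the standard genome-wide G matrix mentioned earlier for GBLUP can be sketched from a marker dosage table. The sketch uses one common additive formulation for autopolyploids: dosages are centered by ploidy times the allele frequency, and the cross-product is scaled by ploidy times the sum of p(1 - p). This choice of formulation, the function name, and the toy data are assumptions, not a description of what QTLpoly computes. Python is used only for compactness, since the project's own tools are R packages.

```python
def additive_g_matrix(dosages, ploidy):
    """Additive genomic relationship matrix from allele dosages.

    dosages: list of n individuals, each a list of m marker dosages in 0..ploidy.
    ploidy:  ploidy level shared by all individuals (assumed constant here).
    """
    n, m = len(dosages), len(dosages[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in dosages) / (ploidy * n) for j in range(m)]
    # Center each dosage by its expected value under that frequency.
    z = [[row[j] - ploidy * freqs[j] for j in range(m)] for row in dosages]
    # Expected dosage variance summed over markers.
    scale = ploidy * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic; G is undefined")
    return [
        [sum(z[a][j] * z[b][j] for j in range(m)) / scale for b in range(n)]
        for a in range(n)
    ]


# Three tetraploid individuals scored at two markers.
toy = [[0, 4], [2, 2], [4, 0]]
for row in additive_g_matrix(toy, ploidy=4):
    print(row)
```

The locus-specific Q matrices described for the QTL model play the same role as G but are built from allele-origin probabilities at a single position rather than from dosages across the whole genome.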

Progress 01/01/22 to 09/30/25

Outputs
Target Audience: The project primarily served scientists and practitioners who generate, curate, and act on genomic information in polyploid crops. Core audiences included breeders, breeding program leads, and quantitative geneticists in public institutions and non-profit initiatives who need reliable haplotyping, QTL mapping, and decision support to select parents and crosses. This group spanned programs in sweetpotato, potato, blueberry, alfalfa, kiwi, blackberry, and tomato, and provided feedback through collaborations with NCSU breeding programs, USDA and Cornell's Breeding Insight, the International Potato Center, and international teams. The work also engaged bioinformatics researchers and software developers who build and maintain analysis pipelines; this community adopted and extended our open-source packages (for example, MAPpoly and QTLpoly), contributed issues and improvements, and benefited from our emphasis on FAIR data practices. To support the next generation of researchers, we provided training for students and early-career scientists. Graduate students and postdocs used the software in theses and research projects, with mentorship delivered through tutorials, workshops, and accessible online materials. The project further supported genotyping platform users and data managers responsible for marker panel evaluation and quality control by enabling segregation checks, haplotype reconstruction, and QA/QC workflows. Finally, to translate analytical outputs into action for decision makers in breeding operations, we initiated development of the breeder-facing application GGSpoly and released a first version that exposes QTLpoly analyses given an existing map. Although the principal focus was on public-sector users, the open-source model and documented interfaces made these tools relevant to private companies and contract research providers. Engagement was domestic and international, which broadened access for programs operating with limited resources and for teaching environments.
Changes/Problems: One of the main challenges encountered during the project was the development of GGSpoly, the breeder-facing decision support tool. Although we deployed an initial version that connects to QTLpoly outputs and provides breeders with an accessible entry point for selection and mating decisions, the full implementation proved more complex than anticipated. Integrating simulation, optimization, and visualization features in a way that is both statistically rigorous and user-friendly required more time and resources than were available within this award. As a result, only part of the planned functionality was completed during the project period. This outcome reflects the inherent complexity of polyploid data structures and the need to design a platform that breeders can use with confidence. The work completed so far provides a solid foundation, and we plan to continue refining and expanding GGSpoly under future funding.
What opportunities for training and professional development has the project provided? The project provided hands-on training for multiple graduate students and postdoctoral researchers, giving them practical experience in genetic mapping, QTL analysis, simulation, and software development. Key trainees included Lujia Mo (Ph.D.), Amelia Loeb (Ph.D.), Nyssa Ndey-Bongo (M.S.), Gabriel Gesteira (postdoc), and Guilherme da Luz (visiting Ph.D.). Each incorporated the tools into active research projects ranging from blueberry genetic map construction to sweetpotato QTL analysis and common bean population studies. Mentorship was provided through tutorials, collaborative troubleshooting, and integration of methods into thesis and dissertation projects. Participants also gained experience with reproducible workflows, QA/QC practices, and open-source development, supporting their professional growth in both breeding and computational research.
How have the results been disseminated to communities of interest? Findings and tools were disseminated through a combination of in-person, virtual, and open-access platforms. Presentations were given at the Plant and Animal Genome Conference and the Tools for Polyploids workshop, with recorded sessions made available online. All source code and tutorials were released openly on GitHub and mirrored in the USDA Ag Data Commons, ensuring long-term access. Publications included peer-reviewed journal articles, book chapters on sweetpotato genomics, and the Nature Plants genome paper. These outputs reached both U.S. and international audiences. Community uptake has been strong, with MAPpoly alone exceeding 39,000 CRAN downloads by the end of the project, reflecting its adoption in breeding, teaching, and research contexts.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? For the first objective, we released MAPpoly2, enabling haplotype reconstruction and mapping in multi-population and mixed-ploidy contexts. The software was validated with tetraploid potato diallels and interconnected hexaploid sweetpotato populations, and performance improvements made large-scale hexaploid analysis practical. We also developed SIMpoly, a simulation tool for testing algorithm performance under complex conditions. These tools supported the release of public mid-density genotyping panels for blueberry and sweetpotato and provided ultradense maps that were essential for the first fully phased, chromosome-scale sweetpotato genome assembly, published in Nature Plants. Under the second objective, we extended genotype-to-phenotype modeling to multi-population structures. A new QTL mapping approach was implemented and applied to sweetpotato, identifying loci for beta-carotene, dry matter, and starch. We also developed the theoretical framework for a G2A model in polyploids. While the polyploid implementation is ongoing, a proof-of-concept study in maize produced methodological advances and a submitted manuscript, laying the groundwork for future applications in polyploid crops. For the third objective, we initiated GGSpoly, a Shiny application that connects to QTLpoly outputs and provides an accessible entry point for breeders to explore QTL results for selection and mating decisions. Although only a subset of features could be implemented during the project period, the app establishes the foundation for future simulation and optimization functions. Development adhered to FAIR and BrAPI standards to ensure reproducibility and interoperability with breeding data platforms.

Publications

Type: Peer Reviewed Journal Articles Status: Published Year Published: 2025 Citation: Alves da Mata, A. P., Gemenet, D. C., Diaz, F., David, M., Mosquera, V., Bachega Feijó Rosa, J. R., Gonçalves dos Santos, I., de Siqueira Gesteira, G., Mollinari, M., Khan, A., Yencho, G. C., Zeng, Z., & da Silva Pereira, G. (2025). Linkage map construction and QTL mapping for morphological traits in Ipomoea trifida, a diploid sweetpotato relative. The Plant Genome, 18(3). https://doi.org/10.1002/tpg2.70106
Type: Peer Reviewed Journal Articles Status: Accepted Year Published: 2025 Citation: Gesteira, G. de S., Ferreira, G. C., Mollinari, M., Santos, M. F., Jank, L., Vilela, M. de M., Raposo, A., Chiari, L., Zeng, Z.-B., & Garcia, A. A. F. (2025). Genetic linkage mapping in Megathyrsus maximus (Jacq.) with multiple dosage markers. G3: Genes, Genomes, Genetics, 15(9). https://doi.org/10.1093/g3journal/jkaf126
Type: Peer Reviewed Journal Articles Status: Published Year Published: 2025 Citation: Wu, S., Sun, H., Zhao, X., Hamilton, J. P., Mollinari, M., Gesteira, G. D. S., Kitavi, M., Yan, M., Wang, H., Yang, J., Yencho, G. C., Buell, C. R., & Fei, Z. (2025). Phased chromosome-level assembly provides insight into the genome architecture of hexaploid sweetpotato. Nature Plants, 11(9), 1951-1959. https://doi.org/10.1038/s41477-025-02079-6

Progress 01/01/24 to 12/31/24

Outputs
Target Audience: As in the previous reporting period, our project continued to engage breeders, geneticists, researchers, and graduate students with a strong interest or active involvement in polyploid genetics and breeding. This year, our direct collaborations expanded to include work on tomato, sweetpotato, potato, kiwi, blackberry, and blueberry, further demonstrating the broad applicability of our methods across diverse species. Additionally, the open-source nature of our implementations enhances accessibility for a wide range of users, including those in educational institutions, smaller breeding programs, and resource-limited regions. Our target audience remains both domestic and international, reinforcing the global impact of our work.
Changes/Problems: As anticipated in our initial proposal, the complexity of polyploid genetics, particularly in hexaploid crops like sweetpotato, posed significant challenges that extended beyond the scope and timeline of this project. Developing theoretical frameworks for mixed-ploidy populations, implementing the G2A model in polyploids, and integrating these methodologies into GGSpoly required extensive computational and methodological advancements. While we completed foundational aspects of this work, not all planned objectives could be fully achieved within this project's timeframe. Given these challenges and the long-term nature of our research, we proactively planned for a phased approach and, as stated in our initial proposal, anticipated the need for continued efforts beyond this project. Accordingly, we have submitted a new proposal to build on our progress and fully implement and refine these methodologies.
What opportunities for training and professional development has the project provided? Ph.D. candidate Lujia Mo and postdoctoral researcher Gabriel Gesteira have been actively engaged in the development of the G2A model, gaining hands-on experience in theoretical modeling and computational analysis. Additionally, several students are using MAPpoly2 in their research. Among them, Master's student Nyssa Ndey-Bongo is working on genetic map construction in blueberry, while visiting Ph.D. student Guilherme da Luz from Brazil is developing an integrated genetic map for F2 and RIL populations in common bean and will apply the G2A model for QTL mapping. Furthermore, many other students have benefited from the project by using these packages in their research and actively engaging in discussions and troubleshooting.
How have the results been disseminated to communities of interest? We have disseminated our work through conferences, workshops, and scholarly articles. With the conclusion of our SCRI project "Tools for Polyploids", 2024 marked the last opportunity to present our work in that forum. However, we believe we have built significant momentum, as evidenced by the frequent visits to our GitHub repository and YouTube videos. MAPpoly alone has received over 39,000 accesses on CRAN to date, highlighting its widespread use. Furthermore, all source code for our project is openly available on GitHub (https://github.com/mmollina), ensuring accessibility to the broader scientific community. Additionally, all our packages are hosted in the USDA Ag Data Commons, further facilitating data sharing and accessibility.
What do you plan to do during the next reporting period to accomplish the goals? By the next reporting period, which will be the final report before the project's conclusion on June 30, 2025, we expect to have submitted the MAPpoly2 manuscript and initiated the implementation of the G2A model in polyploids. Our focus will be on finalizing ongoing analyses, refining methodologies, and ensuring that the developed tools are well-documented and accessible to the community.

Impacts
What was accomplished under these goals?
Objective 1: We continued developing MAPpoly2, significantly enhancing its performance and integrating essential QA/QC functions for robust analysis. Additionally, we conducted further simulations and formalized the theory for mixed-ploidy populations. While this delayed the paper submission, it will be completed and included in the final report. To further test MAPpoly2 in complex genetic scenarios, we developed SIMpoly, a simulation tool designed to assess its performance under challenging conditions. SIMpoly is now available to the community, providing researchers with a valuable resource to evaluate and refine their analyses in diverse polyploid settings. As outlined in the previous report, we collaborated with Breeding Insight - Cornell to construct genetic maps under restricted information conditions in blueberry and sweetpotato, enabling the public release of mid-density genotyping platforms for breeders. These maps were crucial for verifying SNP segregation patterns, reconstructing haplotypes, and confirming panel quality. These efforts resulted in peer-reviewed publications (see the Products section). Furthermore, we used MAPpoly2 to generate ultradense genetic maps of sweetpotato, which were instrumental in phasing chromosomes at the genome level. Given the need to construct six ultradense maps (~40K SNPs each, one per haplome), the efficiency of MAPpoly2 was critical in achieving these results. This work has been submitted and is currently under review.
Objective 2: We developed the theoretical framework for applying the G2A model to polyploids, though its implementation is still pending. Our first major step in applying this concept was in studying the genetic basis of heterosis in maize. While not directly applicable to the polyploid nature of this project, this work serves as a foundation for future applications. Using a maize dataset, we developed theoretical approaches, applied them for methodological advancements, and provided illustrative examples. A manuscript detailing these methods and procedures has been submitted and is currently under review in Genetics.
Objective 3: The development of GGSpoly aims to bridge the usability gap by providing a more accessible platform for genomic analyses. While progress was slower than anticipated due to the complexity of developing the underlying theory, we made significant foundational strides. In its current version, GGSpoly performs QTL analysis by running QTLpoly in the backend, given an existing genetic map. Although this represents only the first step, it lays the groundwork for a fully integrated, breeder-friendly analytical pipeline. We plan to expand its functionality in a future phase of funding.

Publications

Type: Peer Reviewed Journal Articles Status: Accepted Year Published: 2024 Citation: Fraher, S., Schwarz, T., Heim, C., De Siqueira Gesteira, G., Mollinari, M., Da Silva Pereira, G., Zeng, Z.-B., Brown-Guedira, G., Gorny, A., & Yencho, G. C. (2024). Discovery of a major QTL for resistance to the guava root-knot nematode (Meloidogyne enterolobii) in Tanzania, an African landrace sweetpotato (Ipomoea batatas). Theoretical and Applied Genetics, 137, 234. https://doi.org/10.1007/s00122-024-04739-1
Type: Conference Papers and Presentations Status: Published Year Published: 2024 Citation: Mollinari, M., Gesteira, G. S., Taniguti, C. H., da Silva Pereira, G., Zhao, D., Wu, S., Moraes, A. C. L., Vigna, B. B. Z., Souza, A. P., Garcia, A. A. F., Fei, Z., Sheehan, M., Byrne, D., Riera-Lizarazu, O., Yencho, C., & Zeng, Z.-B. (2024, January 16). Genomic challenges in polyploid crops: An overview of progress so far. Plant and Animal Genome Conference (PAG 31), January 12-17, 2024. PAG.
Type: Peer Reviewed Journal Articles Status: Accepted Year Published: 2024 Citation: Zhao, D., Sandercock, A. M., Mejia-Guerra, M. K., Mollinari, M., Heller-Uszynska, K., Wadl, P. A., Webster, S. A., Beil, C. T., & Sheehan, M. J. (2024). A Public Mid-Density Genotyping Platform for Hexaploid Sweetpotato (Ipomoea batatas [L.] Lam). Genes, 15(8), 1047. https://doi.org/10.3390/genes15081047
Type: Peer Reviewed Journal Articles Status: Accepted Year Published: 2024 Citation: Zhao, D., Sapkota, M., Glaubitz, J., Bassil, N., Mengist, M., Iorizzo, M., Heller-Uszynska, K., Mollinari, M., Beil, C. T., & Sheehan, M. (2024). A public mid-density genotyping platform for cultivated blueberry (Vaccinium spp.). Genetic Resources, 5(9), 36-44. https://doi.org/10.46265/genresj.WQZS1824
Type: Peer Reviewed Journal Articles Status: Under Review Year Published: 2024 Citation: Wu, S., Sun, H., Hamilton, J. P., Mollinari, M., Gesteira, G. D. S., Kitavi, M., Yan, M., Wang, H., Yang, J., Yencho, G. C., Buell, C. R., & Fei, Z. (2024). Phased chromosome-level genome assembly provides insight into the origin of hexaploid sweetpotato. bioRxiv. https://doi.org/10.1101/2024.08.17.608395
Type: Book Chapters Status: Published Year Published: 2024 Citation: de Siqueira Gesteira, G., da Silva Pereira, G., Zeng, Z.-B., & Mollinari, M. (2025). Genetic maps in sweetpotato. In G. C. Yencho, B. A. Olukolu, & S. Isobe (Eds.), The Sweetpotato Genome. Compendium of Plant Genomes. Springer, Cham. https://doi.org/10.1007/978-3-031-65003-1_5
Type: Book Chapters Status: Published Year Published: 2024 Citation: da Silva Pereira, G., da Silva, C. C., Feijó Rosa, J. R. B., Sobowale, O. O., de Siqueira Gesteira, G., Mollinari, M., & Zeng, Z.-B. (2025). New analytical tools for molecular mapping of quantitative trait loci in sweetpotato. In G. C. Yencho, B. A. Olukolu, & S. Isobe (Eds.), The Sweetpotato Genome. Compendium of Plant Genomes. Springer, Cham. https://doi.org/10.1007/978-3-031-65003-1_6

Progress 01/01/23 to 12/31/23

Outputs
Target Audience: During this reporting period, our project specifically targeted breeders, geneticists, researchers, and graduate students with an interest or active involvement in polyploid genetics and breeding. More specifically, we directly interacted with groups working on alfalfa and roses, which are both tetraploids, as well as Koronivia grass (Urochloa humidicola) and sweetpotato, which are hexaploids. Also, the open-source nature of our implementations potentially benefits a wide array of users, including those in educational settings, smaller breeding programs, or regions with limited resources. Our target audience includes both domestic and international personnel and institutions.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? During the reporting period, we engaged with graduate students who learned to use MAPpoly2 and QTLpoly. As mentioned in the previous report, NCSU Ph.D. candidate Simon Fraher continued to employ our tools in sweetpotato populations. With the aid of our resources, he initiated a project aimed at developing sweetpotato germplasm for the NCSU breeding program, focusing on enhancing skin robustness to minimize injury and reduce postharvest losses due to disease and desiccation. Additionally, another Ph.D. candidate at NCSU, Amelia Loeb, has worked with MAPpoly2, gaining an understanding of genetic mapping in multi-parental breeding populations. She is keen on contributing to projects at the intersection of breeding and bioinformatics, and we are committed to supporting her in these pursuits. Furthermore, postdoctoral researcher Gabriel Gesteira is contributing significantly to our team and assisting in developing GGSpoly.
How have the results been disseminated to communities of interest? Our efforts have been shared through conferences, workshops, and scholarly articles. We were active participants at the 31st Plant and Animal Genome Conference, where the Project Director (PD) and Co-Project Director (Co-PD) delivered presentations related to our project. The PD also showcased MAPpoly2 at the Tools for Polyploids workshop, which attracted around 250 attendees, including breeders, geneticists, plant pathologists, researchers, and students from both academia and industry. Recordings of the presentations from the Tools for Polyploids workshop are accessible as videos on their website (MAPpoly2 presentation showing the alfalfa consensus map construction: https://youtu.be/eOT2mUXZJgc). Moreover, all the source code for our project is openly available on our GitHub repository at https://github.com/mmollina.
What do you plan to do during the next reporting period to accomplish the goals?
Objectives 1 and 2: We are committed to further refining the algorithms developed for objectives 1 and 2. Our goal is to enhance their efficiency and effectiveness, and then to submit our advancements to peer-reviewed journals for validation and dissemination.
Objective 3: Our focus will be on advancing GGSpoly, particularly in integrating the backend tools from objectives 1 and 2 with the user-friendly web interface. We plan to engage closely with our breeding partners to collect feedback and tailor the tool to meet their specific needs, ensuring it is both practical and valuable for their work.

Impacts
What was accomplished under these goals?
Objective 1: We made advances in haplotype reconstruction methodologies. We introduced MAPpoly2, a genetic mapping and haplotyping software for multi-population structures, which is now available on GitHub (refer to the "Other Products" section). The software minimizes the number of decisions users need to make, which improves both speed and usability. Our theoretical advances include algorithms capable of integrating different ploidy levels across populations. This feature is particularly important when obtaining seedless varieties through odd-ploidy-level offspring. We conducted simulations (rpubs.com/mmollin/multi_family_simulation) with mixed ploidy levels and found that the algorithms perform robustly. In a collaborative effort with USDA/Cornell Breeding Insight, we built interconnected tetraploid maps in alfalfa and reconstructed the offspring's haplotypes (refer to "Products"; full analysis available at https://rpubs.com/mmollin/tutorial_mappoly2). We also constructed a complex hexaploid map and derived offspring haplotypes in a modest-sized biparental population (60-50) using the DArTag mid-density targeted genotyping platform. A paper detailing this work is being prepared. This achievement is consistent with the challenging nature of this objective, as outlined in our proposal. Our current focus is on applying this novel algorithm to a tetraploid potato population comprising 16 parents and 399 offspring, with full-sib family sizes ranging from 1 to 68 individuals. While we anticipate a certain degree of information overlap within full-sib offspring, our findings so far suggest that the construction of consensus maps and haplotypes in breeding populations is feasible and promising. To share these theoretical and practical advances, we started preparing a scientific publication, which we plan to submit in 2024.
Objective 2: We have implemented a multi-population QTL mapping method that will serve as a base for extending the genetic models that relate genotypes to phenotypes to all scenarios presented in Objective 1.
Objective 3: We have initiated the preliminary implementation of GGSpoly, accessible at https://gesteira.statgen.ncsu.edu/shiny/ggspoly/. This initial version lays the foundation for a comprehensive web application that is currently under development. It serves as the framework on which we are building a user-friendly computational tool in R Shiny, designed specifically to assist breeders in making informed decisions about their breeding programs.

Publications

Type: Journal Articles Status: Published Year Published: 2023 Citation: da Costa Lima Moraes, A., Mollinari, M., Ferreira, R.C.U. et al. Advances in genomic characterization of Urochloa humidicola: exploring polyploid inheritance and apomixis. Theor Appl Genet 136, 238 (2023). https://doi.org/10.1007/s00122-023-04485-w
Type: Journal Articles Status: Published Year Published: 2023 Citation: Zhao, D., Mejia-Guerra, K. M., Mollinari, M., Samac, D., Irish, B., Heller-Uszynska, K., Beil, C. T. and Sheehan, M. J. (2023) A public mid-density genotyping platform for alfalfa (Medicago sativa L.), Genetic Resources, 4(8), pp. 55-63. doi: 10.46265/genresj.EMOR6509.
Type: Journal Articles Status: Published Year Published: 2023 Citation: Cristiane Hayumi Taniguti, Lucas Mitsuo Taniguti, Rodrigo Rampazo Amadeu, Jeekin Lau, Gabriel de Siqueira Gesteira, Thiago de Paula Oliveira, Getulio Caixeta Ferreira, Guilherme da Silva Pereira, David Byrne, Marcelo Mollinari, Oscar Riera-Lizarazu, Antonio Augusto Franco Garcia, Developing best practices for genotyping-by-sequencing analysis in the construction of linkage maps, GigaScience, Volume 12, 2023, giad092, https://doi.org/10.1093/gigascience/giad092
Type: Conference Papers and Presentations Status: Published Year Published: 2024 Citation: Mollinari M., Gesteira G.S., Taniguti C.H., Pereira G.D.S., Zhao D., Wu S., Garcia A.A.F., Fei Z., Sheehan M., Byrne D., Riera-Lizarazu O., Yencho C., Zeng Z-B. Genomic Challenges in Polyploid Crops: An Overview of Progress so Far. In: International Plant & Animal Genome Conference 31. (Oral Presentation - Presenter) Available at https://tinyurl.com/2dp7ejtb

Progress 01/01/22 to 12/31/22

Outputs
Target Audience: We aimed to engage breeders and graduate students with an interest or active involvement in polyploid breeding. The plant species studied by these groups included blueberries, potatoes, sweetpotatoes, roses, yams, ornamental flowers, and blackberries, among others. Our target audience included both domestic and international personnel and institutions.
Changes/Problems: In our initial proposal, we identified the Mwanga Diversity Population (MDP) as the main resource for providing a representative sample of genetic data to be used in developing and testing our models and implementations. However, despite our collaborator's efforts to obtain good-quality DNA from the MDP materials and several adjustments to the genotyping protocol, we were unable to obtain genotype information for the MDP progeny. As a result, testing the efficiency of the multi-parental model in that scenario has not been possible. This work is ongoing, and we expect to acquire the necessary data in the next reporting cycle. In the meantime, we have established collaborations with other breeding groups, including the International Potato Center (CIP), which provided us with a population that allowed us to construct a consensus genetic map and reconstruct the haplotypes in the progeny. Additionally, we are collaborating with the Texas A&M University Rose Breeding and Genetics Program, which supplied an interconnected rose population with varying ploidy levels, enabling us to test our procedures. We also decided to change the name of our user-friendly downstream tool from DecisionPoly to GGSpoly (GGS for Genetic and Genomic Selection), which will be implemented in the next reporting cycles.
What opportunities for training and professional development has the project provided? We are collaborating with several graduate students and postdoctoral researchers on work directly related to this project. Among the graduate students, Simon Fraher successfully used our tools to identify a single major QTL that explained 70% of the variation in resistance to the nematode Meloidogyne enterolobii and is currently developing markers for marker-assisted selection in the NCSU sweetpotato breeding program. We are also collaborating closely with Gabriel Gesteira, a postdoctoral researcher at NCSU specializing in genotype-phenotype associations, and Cristiane Taniguti, a postdoctoral fellow at Texas A&M University, who is involved in multi-population haplotype construction and the development of user-friendly tools.
How have the results been disseminated to communities of interest?
Formal classroom instruction: We presented our work in guest lectures at NCSU, including Plant Cytogenetics in Plant Breeding, Breeding Asexually Propagated Crops, and Quantitative Genetics Theory and Methods.
Workshops and conferences: We participated in the following events to share our knowledge and expertise:
a. American Society for Horticultural Science Conference
b. 30th Plant and Animal Genome Conference
c. Tools for Polyploids workshop, attended by 238 participants
d. Advancing Computing Skills in Plant Breeding workshop, organized by the NCSU plant breeding consortium, with an average of 30 attendees
Extension and outreach: The talks from the Tools for Polyploids workshop were made available in video format on the Tools for Polyploids webpage at https://www.polyploids.org/2023recordedpresentations. In addition, all source code for this project is freely available in the git repository https://github.com/mmollina.
What do you plan to do during the next reporting period to accomplish the goals?
Objectives 1 and 2: a) Continue to improve our phasing algorithm and explore ways to include pedigrees with small family sizes and multiple generations. b) Finish the implementation of MAPpoly 2.0 and improve the implementation of MAPpoly-MP and QTLpoly-MP.
Objective 3: Leveraging our VIEWpoly visualization framework for polyploid genetic analysis (https://cran.r-project.org/package=viewpoly), we plan to begin developing and implementing GGSpoly in the upcoming reporting cycle. GGSpoly is a computational tool intended to help breeders make informed short- and long-term breeding decisions, for diverse objectives, based on gathered and processed data about their breeding populations. We anticipate collaborating closely with breeding groups such as the NCSU Sweetpotato and Potato Breeding Team led by Craig Yencho and the Texas A&M Rose Breeding Group led by Oscar Riera-Lizarazu and David Byrne in order to optimize GGSpoly's effectiveness in real-world breeding situations.

Impacts
What was accomplished under these goals?
Objective 1: We successfully implemented mapping and haplotype reconstruction algorithms for partially interconnected polyploid families, including diploid (2x), tetraploid (4x), and hexaploid (6x) families. Our current implementation allows for the integration of families with varying ploidy levels, enabling the simultaneous reconstruction of genetic maps and offspring haplotypes in single-generation populations, given a sufficiently large number of individuals within full-sib populations. The performance of these algorithms was verified using real-world populations, such as a tetraploid potato population in a partial diallel with a 3 x 9 parental configuration and three interconnected hexaploid sweetpotato populations, as well as several in silico simulations. Furthermore, we have optimized the algorithms used to build genetic maps for individual parents. Our preliminary evaluations with real data showed a substantial decrease in processing time: a 10-fold improvement for tetraploid families and a 100-fold improvement for hexaploid families. This refinement is crucial for future implementation, because the analysis of markers in single parents will serve as the initialization step for merging genetic maps in multiparental families in subsequent phases. Currently, we are implementing these functions in a new version of our mapping software, MAPpoly, which will incorporate several significant updates designed to enhance its capabilities and efficiency.
Objective 2: We also made considerable advances in connecting the haplotypes obtained in Objective 1 to phenotypes in interconnected populations. Our algorithm extends the one previously developed by Pereira et al. (2020). It relies on a random-effect model applied in the context of the Multiple Interval Mapping (MIM) procedure. To evaluate and demonstrate the utility of our approach, we applied it to the same three interconnected hexaploid sweetpotato populations described in Objective 1 and to several in silico simulations. Using the posterior haplotype probabilities, we performed multiple QTL mapping and detected significant QTL for beta-carotene, dry matter, and starch content, with consistent allele effects across sub-populations.
Objective 3: Nothing to report during this period.

Publications

Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Mollinari M. Computational Tools for Genomic Analysis in Polyploids in: 2022 ASHS Annual Conference (presenter) https://ashs.confex.com/ashs/2022/meetingapp.cgi/Paper/38640
Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Gesteira G.S., Mollinari M., Pereira G.da S., Olukolu B.A., Oloka B.M., Yencho C., Zeng Z-B. Genetic Mapping in Interconnected Hexaploid Sweetpotato Populations in: International Plant & Animal Genome 30 Conference (presenter) https://plan.core-apps.com/pag_2023/abstract/d8c2bdba-5e59-4f12-ab32-9e01d221a7a0
Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Wu S., Sun H., Kitavi M., Hamilton J.P., Gesteira G.S., Mollinari M., Zeng Z-B., Yencho C., Buell R., Fei Z. Advances in the Development of Chromosome-Scale Haplotype-Resolved Genome Assemblies of Hexaploid Sweetpotatoes in: International Plant & Animal Genome 30 Conference https://plan.core-apps.com/pag_2023/abstract/18b6ff6a-ca63-4074-ae87-e4e99fea4f11