Use QTL mapping and genomic selection with epistasis for predictive plant breeding

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

HATCH

Reporting Frequency

Annual

Accession No.

1005398

Grant No.

(N/A)

Cumulative Award Amt.

(N/A)

Proposal No.

(N/A)

Multistate No.

(N/A)

Project Start Date

Nov 13, 2014

Project End Date

Sep 30, 2019

Grant Year

(N/A)

Program Code

[(N/A)]- (N/A)

Recipient Organization
NORTH CAROLINA STATE UNIV
(N/A)
RALEIGH,NC 27695

Performing Department
Statistics

Non Technical Summary
This project will develop a data-based and QTL-based statistical analysis and simulation tool for designing experiments in predictive plant breeding. The tool will be based on our newly improved multiple interval mapping method, which takes epistasis into account; this is a major strength of the project. The statistical analysis method and simulation tool will provide a means to analyze in detail the genetic association of a genome with a phenotype in a mapping population, and to use that information to design appropriate breeding experiments. The tool can also be used to study and evaluate different breeding strategies and mating schemes that are relevant to the breeding population. This is probably the most appropriate way to conduct marker-assisted selection, and a step toward the eventual breeding method: breeding by genetic design.

Animal Health Component

40%

Research Effort Categories

Basic

30%

Applied

40%

Developmental

30%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
201	7310	1080	25%
201	7310	1081	25%
901	7310	2090	50%

Knowledge Area
901 - Program and Project Design, and Statistics; 201 - Plant Genome, Genetics, and Genetic Mechanisms;

Subject Of Investigation
7310 - Experimental design and statistical methods;

Field Of Science
2090 - Statistics, econometrics, and biometrics; 1081 - Breeding; 1080 - Genetics;

Keywords

quantitative trait loci

Goals / Objectives
The genomic revolution has changed plant breeding from traditional phenotype-based selection to marker-assisted selection, which uses whole-genome molecular marker information and the estimated relationship between markers and phenotype to perform selection. Marker-assisted selection and breeding have become the norm in plant breeding. The main challenge for marker-assisted predictive breeding is the genetic complexity of many quantitative traits that are important for breeding. Usually many genetic loci are segregating in the breeding population and are relevant to the breeding goal. These loci can have complex linkage structure (e.g., repulsion linkage) and interaction patterns. The key to improving the efficiency of selection is to take these complexities into account when building a selection model. In this project, we propose to develop a new method that can effectively take these complexities into account for marker-assisted selection. This selection method is based on a newly improved multiple interval mapping method that can map multiple quantitative trait loci (QTL) with epistasis in a breeding population. Based on the mapping result, the method can provide a predictive estimate of breeding value for any pair of breeding individuals. This new predictive breeding method should be much more powerful than many currently available methods because it uses not only the estimated main effects of multiple loci but also the dominance and epistatic effects, thus taking into account the likely genetic combinations produced by a pair of breeding individuals or lines. We propose to develop selection methods suited to both the pure-line breeding paradigm, as in soybean, and the hybrid breeding paradigm, as in maize.

Project Methods
How to map quantitative trait loci (QTL) with epistasis efficiently and reliably has been a persistent problem in QTL mapping analysis. Studying epistatic QTL raises a number of difficulties. Linkage poses a significant challenge to finding epistatic QTL reliably. If multiple QTL are linked and interact, searching for them becomes a very delicate issue. A commonly used strategy, a two-dimensional genome scan for pairs of QTL with epistasis, can suffer from low statistical power and may also lead to false identification because of complex linkage disequilibrium and interaction patterns.
To tackle the problem of complex interaction among multiple linked QTL, we recently developed a three-stage search strategy. In the first stage, main-effect QTL are searched for and mapped. In the second stage, we search for epistatic QTL that interact significantly with the QTL already identified. In the third stage, new epistatic QTL are searched for in pairs. This strategy rests on the consideration that most genetic variance is due to the main effects of QTL. Thus, by mapping the main-effect QTL first, the statistical power of the second and third stages for mapping epistatic QTL can be maximized. The search for main-effect QTL is robust and does not bias the search for epistatic QTL, owing to a property of the orthogonal genetic model: the additive and additive-by-additive variances are independent despite linkage. The model search criterion is evaluated empirically and dynamically using a score-statistic-based resampling procedure. We demonstrated through simulations that the method has good power and a low false-positive rate in identifying QTL and epistasis. This method, called Epis-MIM, provides an effective and powerful solution for mapping multiple QTL with complex epistatic patterns.
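The three-stage order of the search can be illustrated with a small sketch. This is not Epis-MIM: it scans markers rather than intervals, codes genotypes as -1/+1, and replaces the score-statistic resampling criterion with a fixed, hypothetical threshold on the reduction in residual sum of squares (min_gain). All function names are assumptions made for illustration only.

```python
import itertools
import numpy as np

def term_column(X, term):
    """Design column for a term: one marker (main effect) or the product of two (interaction)."""
    return np.prod(X[:, list(term)], axis=1)

def rss(X, y, terms):
    """Residual sum of squares of y regressed on an intercept plus the given terms."""
    design = np.column_stack([np.ones(len(y))] + [term_column(X, t) for t in terms])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def forward_add(X, y, terms, candidates, min_gain):
    """Greedily add candidates while the best one removes more than min_gain of the total SS."""
    total_ss = float(np.sum((y - y.mean()) ** 2))
    terms = list(terms)
    candidates = [c for c in candidates if c not in terms]
    while candidates:
        base = rss(X, y, terms)
        gains = [base - rss(X, y, terms + [c]) for c in candidates]
        k = int(np.argmax(gains))
        if gains[k] <= min_gain * total_ss:
            break
        terms.append(candidates.pop(k))
    return terms

def three_stage_search(X, y, min_gain=0.05):
    m = X.shape[1]
    # Stage 1: main-effect loci.
    terms = forward_add(X, y, [], [(j,) for j in range(m)], min_gain)
    mapped = sorted(t[0] for t in terms)
    # Stage 2: interactions in which at least one partner is an already mapped locus.
    stage2 = sorted({tuple(sorted((i, j))) for i in mapped for j in range(m) if j != i})
    terms = forward_add(X, y, terms, stage2, min_gain)
    # Stage 3: pairwise interactions among loci that are not yet in the model.
    known = {j for t in terms for j in t}
    stage3 = [p for p in itertools.combinations(range(m), 2) if not known & set(p)]
    return forward_add(X, y, terms, stage3, min_gain)
```

The function returns the selected terms as tuples: a one-element tuple is a main effect and a two-element tuple is an interaction. Because stages 2 and 3 start from the model built in stage 1, the residual variance they work against is already reduced, which is the power argument made in the text.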
The method has been implemented in the user-friendly computer software Windows QTL Cartographer.
We propose to develop a predictive procedure for marker-assisted selection (MAS) based on the Epis-MIM method. Epis-MIM is particularly suited to prediction because it takes the whole genetic architecture of a quantitative trait into account in model building and prediction, not just part of it as many other statistical methods do. With the inclusion of the epistatic part, the method can take gene interactions into account in the prediction. We will also build a data- and model-based simulation tool for designing and evaluating different selection schemes for predictive MAS.
The tool will be integrated with the Epis-MIM method and with procedures for building a comprehensive QTL model from the mapping population, and supplemented with simulation procedures that simulate specific matings between individuals and calculate the predicted genotypic values of the offspring. The simulation tool will implement a number of breeding strategies and mating schemes for exploring different selection options for a particular population and data set. It will be interactive and can be used to predict selection responses over multiple generations.
This predictive breeding simulation tool can have a significant impact on breeding practice. It can be used to evaluate the efficiency and performance of different breeding strategies (MAS vs. GS), different MAS methods, and different mating designs. It can also help to design a breeding program and to study associated issues, such as the populations selected for breeding, sample size requirements, and the number of generations.
MAS and genomic selection (GS) need to be compared. In MAS (based on QTL mapping), individual genome locations (QTL) or markers are selected and used for selection based on their significant association with the phenotype.
Genomic selection (GS) is an emerging alternative to MAS. In GS, the test for significance is omitted: all markers in the genome are used simultaneously, their effects (large or small) are estimated and combined into genomic estimated breeding values (GEBVs), and the GEBVs are used for selection. Each approach has advantages and disadvantages. Our approach has two notable advantages that are particularly important for predictive breeding in crops. First, Epis-MIM takes dominance and epistasis into account in model building, parameter estimation, and breeding prediction. Thus, it can account for the specific mating of two individuals or lines and should be more powerful than traditional MAS and GS, which consider only the additive effects of the genome for a particular individual or line. Second, in designing and evaluating different predictive breeding strategies and selection schemes, it is much more practical to consider a small and important subset of the genome than the whole genome, as in GS.
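A minimal sketch of why epistasis matters when predicting a specific cross in the pure-line setting. It is not the project's simulation tool. It assumes inbred lines coded -1/+1 at each QTL, additive and additive-by-additive effects only (no dominance, so it does not cover the hybrid paradigm), and free recombination between QTL. The effect values and function names are hypothetical.

```python
from itertools import product

def genotypic_value(x, mu, a, aa):
    """Genotypic value: mean + additive effects + additive-by-additive interactions.

    x  : genotype codes per QTL (-1 or +1 for an inbred line)
    a  : additive effect per QTL
    aa : dict mapping a pair of QTL indices (i, j) to its epistatic effect
    """
    value = mu + sum(a_i * x_i for a_i, x_i in zip(a, x))
    value += sum(effect * x[i] * x[j] for (i, j), effect in aa.items())
    return value

def expected_progeny_mean(p1, p2, mu, a, aa):
    """Expected value of a random inbred progeny line from the cross p1 x p2.

    Assumes unlinked QTL, so E[x_i * x_j] = E[x_i] * E[x_j]; the model is linear
    in each x_i, so it can be evaluated at the mean genotype code.
    """
    mean_x = [(u + v) / 2 for u, v in zip(p1, p2)]
    return genotypic_value(mean_x, mu, a, aa)

def best_progeny(p1, p2, mu, a, aa):
    """Best inbred genotype obtainable from the cross, by enumerating segregating loci."""
    options = [(u,) if u == v else (u, v) for u, v in zip(p1, p2)]
    return max(product(*options), key=lambda x: genotypic_value(x, mu, a, aa))
```

For example, with mu = 10, a = [1.0, 1.0], and parents (1, -1) and (-1, 1), an additive-only model (aa = {}) selects the recombinant (1, 1) as the target genotype. Adding a negative interaction, aa = {(0, 1): -1.5}, makes the parental type (1, -1) the better target, so the two models rank the outcomes of the same cross differently.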

Progress 11/13/14 to 09/30/19

Outputs
Target Audience: The research reported in this project is targeted to geneticists and plant breeders. The polyploid genome data analysis and computational tool development are targeted more specifically to the polyploid genetics and plant breeding community.
Changes/Problems: In 2015, during the course of this project, we shifted our research emphasis to polyploid linkage and QTL analysis because of a great opportunity to join a project supported by the Bill & Melinda Gates Foundation. Polyploid genomic data analysis is a tremendously challenging problem and an exciting area of research. The overall objectives of our research are to develop analytic methods and computational tools for data analysis, from raw DNA sequence reads to SNP calling, genetic linkage map construction, QTL mapping, and genomic selection, in full-sib families of sweetpotato, a hexaploid species; and to apply the methods and tools to the genomic data generated by our collaborators.
What opportunities for training and professional development has the project provided? For the first part of the project, a graduate student participated in the project. The student gained valuable research experience and was trained in high-level statistical genetics research and analysis. The student learned advanced statistical analysis methods, such as LASSO and GBLUP, and statistical genetics principles, and performed original research using computer simulation and real data analysis. For the second part of the project, two postdoctoral research associates were hired. One was responsible for the development of MAPpoly, and the other for the development of QTLpoly. The training and research of both postdocs were outstanding. After this project, one postdoc was promoted to Research Assistant Professor and continues the research on MAPpoly development. The other postdoc was recruited to an Assistant Professor position in Brazil.
How have the results been disseminated to communities of interest? The research results have been disseminated to the scientific community through journal publications and presentations at scientific meetings. Both MAPpoly and QTLpoly were publicly released and are freely distributed to the scientific community. Through GT4SP and a USDA/SCRI project, we joined the effort to teach MAPpoly and QTLpoly to the scientific community in an annual workshop.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals?
The research in this project is divided into two periods. In the first period, research focused on the original proposal, "Use QTL mapping and genomic selection with epistasis for predictive plant breeding", that is, on exploring the use of QTL epistasis for genomic selection in plant breeding. We made significant progress on this research topic and report the results below. In the second period, owing to the opportunity to join the Genomic Tools for Sweetpotato Improvement (GT4SP) project supported by the Bill & Melinda Gates Foundation, we worked on the theory, methods, and computational tools for polyploid genome analysis. The progress made in this period was a game-changing achievement and is also reported below.
Part 1: QTL epistasis and genomic selection with epistasis
1. QTL mapping with epistasis: We developed a three-stage method to search for epistatic quantitative trait loci (QTL), which was published in BMC Genetics (2014, 15(1):112). This method was extensively tested and improved through simulation studies. The simulation results verify that our method is currently the most efficient (i.e., it detects more QTL epistasis) and least biased method for detecting QTL epistasis. The methods were implemented in Windows QTL Cartographer (https://brcwebportal.cos.ncsu.edu/qtlcart/WQTLCart.htm).
2. Consistent estimation of genomic epistasis: A related question is how much epistasis exists in current plant breeding populations, i.e., how important the issue is for plant breeding. To answer this question, we need appropriate statistical and computational ways to obtain unbiased estimates of the genetic variance components (additive, dominance, and epistatic). We explored several approaches (e.g., LASSO and GBLUP) and found that GBLUP can give unbiased estimates. We performed extensive simulations to demonstrate this desirable and long-sought property.
3. Consistent estimation of genomic epistasis in maize NAM populations: We applied GBLUP to the well-known maize NAM populations, 25 populations of recombinant inbred lines with 200 lines each, to estimate the genomic additive and epistatic variances for a whole host of phenotypes measured in these populations. We also applied the method to a hybrid rice population to obtain consistent estimates of the genomic variance partition (additive, dominance, additive x additive, additive x dominance, and dominance x dominance variances) for yield and yield component traits. The result is a breakthrough in our understanding of the importance of epistasis: genomic epistasis can be defined theoretically and estimated consistently.
4. Evaluation of genomic selection with or without epistasis for pure-line breeding: With our Syngenta collaborators, we compared by simulation the efficiency of selection with and without epistasis for two breeding approaches: pure-line breeding (the soybean paradigm) and hybrid breeding (the maize paradigm). We made significant progress on the soybean paradigm and obtained many very interesting simulation results. For the maize paradigm, we signed an agreement with Syngenta that gave us access to Syngenta data for this research. However, after a change of company ownership, Syngenta backed away from the agreement, and this project was not pursued further.
Part 2: Computational tool development for polyploid genomic data analysis
Starting in 2015, we shifted our research emphasis to polyploid linkage and QTL analysis and computational tool development, following the funding of the GT4SP project by the Bill & Melinda Gates Foundation. Polyploid genomic data analysis is a tremendously challenging problem and an exciting area of research. The overall objectives of our research are to develop analytic methods and computational tools for genomic data analysis, from raw DNA sequence reads to SNP calling, genetic linkage map construction, QTL mapping, and genomic selection, in full-sib families of sweetpotato, a hexaploid species; and to apply the methods and tools to the genomic data generated by the project.
1. MAPpoly development: MAPpoly is an R package (https://github.com/mmollina/MAPpoly) that implements algorithms for linkage map construction in experimental populations derived from biparental crosses of autopolyploids with even ploidy levels from 2 up to 8. MAPpoly contains functions for data reading, pairwise and multilocus recombination fraction estimation, phasing, marker ordering, haplotype inference, and graphical visualization and diagnostics.
2. BT population analysis with MAPpoly: In GT4SP, we have a full-sib family of Beauregard x Tanzania (BT) with whole-genome GBS data and phenotypes for a number of quantitative traits. We used this population for our computational tool development. First, we improved our GBS-based genotype calling process by using the posterior probability distribution of the genotypes provided by the SuperMASSA software. This procedure is especially important for populations where the read depth for SNPs is not as high as in the BT population. Then we produced the first complete integrated multilocus genetic map in hexaploid sweetpotato, containing 31,778 SNPs distributed in 15 linkage groups with a total length of 4,132.7 cM. To do this, we used a hybrid approach that harnesses the speed of pairwise analysis and the power of the hidden Markov model (HMM) to deal with incomplete information. We also reconstructed the haplotypes of the 311 individuals in the BT population. Ultimately, this is the most comprehensive result that a genetic mapping procedure can produce, since it describes in detail the inheritance pattern of the homologous chromosomes from both parents ('Beauregard' and 'Tanzania') to their offspring. We estimated and tested preferential pairing and multivalent formation. Apart from a small deviation from random pairing in linkage group 2, we found that sweetpotato shows mostly non-preferential pairing behavior in the BT population, and we observed an average of ~10% multivalent formation across all linkage groups. We observed significant collinearity between the hexaploid sweetpotato and the two diploid reference genomes (Ipomoea trifida and Ipomoea triloba).
3. QTLpoly development: We have developed a random-effect multiple interval mapping (REMIM) model for multiple QTL detection and characterization in autopolyploid species, and implemented it as a computational tool in R. The software QTLpoly is available at https://github.com/guilherme-pereira/qtlpoly. Implemented functions include: performing score statistic-based multiple QTL search and model optimization; fitting multiple QTL and estimating QTL variance components using REML; estimating QTL allelic effects and predicting individual breeding values for selection based on the detected QTL and available genotypic data; and drawing QTL profiles, support intervals, and dot plots for single or multiple traits. We have also performed extensive simulations for both hexaploid and tetraploid populations to evaluate the analysis methods and procedures. These provided the basis for our guidelines to the community on using QTLpoly for QTL mapping data analysis in autopolyploid populations. We have performed QTL mapping data analysis using QTLpoly in several populations, including: 'Beauregard' x 'Tanzania' (hexaploid, 315 full-sibs) for beta-carotene, flesh color, yield-related and quality traits; 'Tanzania' x 'Beauregard' (hexaploid, 245 full-sibs) for root knot nematode; 'New Kawogo' x 'Beauregard' (hexaploid, 287 full-sibs) for SPVD, SPW, and yield and quality-related traits; 'Atlantic' x B1829-5 (tetraploid, 153 full-sibs) for yield and quality-related traits and potato scab disease.
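As a sketch of the inputs to the GBLUP variance partition described in Part 1 (items 2 and 3), the code below builds an additive genomic relationship matrix for inbred lines and an additive-by-additive matrix as its element-wise (Hadamard) product. This is a commonly used construction, not necessarily the one used in the project. The -1/+1 marker coding, the scaling to a mean diagonal of 1, and the function names are assumptions. Estimating the variance components themselves would additionally require fitting a mixed model (for example by REML), which is not shown.

```python
import numpy as np

def additive_relationship(M):
    """Additive genomic relationship matrix for inbred lines.

    M : array (lines x markers) with genotypes coded -1/+1.
    Markers are centered, and the matrix is scaled to a mean diagonal of 1.
    """
    Z = M - M.mean(axis=0)
    G = Z @ Z.T
    return G / np.mean(np.diag(G))

def epistatic_relationship(G):
    """Additive-by-additive relationship matrix as the Hadamard product G * G, rescaled."""
    H = G * G
    return H / np.mean(np.diag(H))
```

With these two matrices, a model of the form y = mean + g_a + g_aa + e, with g_a and g_aa having covariance proportional to the additive and additive-by-additive matrices, gives the additive and epistatic genomic variances as the two estimated variance components.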

Publications

Type: Journal Articles Status: Published Year Published: 2014 Citation: Laurie, C., S. Wang, L.A. Carlini-Garcia and Z.-B. Zeng (2014) Mapping epistatic quantitative trait loci. BMC Genetics 15(1), 112
Type: Theses/Dissertations Status: Published Year Published: 2015 Citation: Wenjing Lu (2015) Genome-wide association, epistasis and selection of quantitative traits in experimental populations. PhD Thesis, North Carolina State University
Type: Journal Articles Status: Published Year Published: 2017 Citation: Schumann, M., Z-B Zeng, M. E. Clough, and G. C. Yencho. (2017) Linkage map construction and QTL analysis for internal heat necrosis in autotetraploid potato. Theoretical and Applied Genetics 130: 2045. DOI 10.1007/s00122-017-2941-1
Type: Conference Papers and Presentations Status: Published Year Published: 2018 Citation: Pereira, G., DC. Gemenet, M Mollinari, B Olukolu, F Diaz, V Mosquera, W Gruneberg, A Khan, GC Yencho, Z-B Zeng (2018) Multiple QTL Mapping in Hexaploid Sweetpotato for Yield and Yield Components. Proceedings of Plant and Animal Genome XXVI Conference.
Type: Conference Papers and Presentations Status: Published Year Published: 2018 Citation: Mollinari M, B Olukolu, G Pereira, DC. Gemenet, A Khan, M Kitavi, M David, M Ghislain, GC Yencho, Z-B Zeng (2018) Construction of an ultradense genetic map in hexaploid sweetpotato. Proceedings of Plant and Animal Genome XXVI Conference.
Type: Journal Articles Status: Accepted Year Published: 2019 Citation: Lara, L., M. Santos, L. Jank, L. Chiari, M. Vilela, R. Amadeu, J. Santos, G. Pereira, Z.-B. Zeng, A. Garcia (2019) Genomic Selection with Allele Dosage in Panicum maximum (Jacq.). G3: Genes, Genomes and Genetics (in press).
Type: Journal Articles Status: Submitted Year Published: 2019 Citation: Mollinari, M., B. Olukolu, G. Pereira, D. Gemenet, C. Yencho, Z.-B. Zeng (2019) Unraveling the hexaploid sweetpotato inheritance using ultra-dense multilocus mapping (submitted).
Type: Journal Articles Status: Submitted Year Published: 2019 Citation: G. Pereira, D. Gemenet, M. Mollinari, B. Olukolu, F. Diaz, V. Mosquera, W. Gruneberg, A. Khan, C. Yencho and Z.-B. Zeng (2019) Multiple QTL mapping in autopolyploids: a random-effect model approach with application in a hexaploid sweetpotato full-sib population (submitted).
Type: Journal Articles Status: Submitted Year Published: 2019 Citation: Gemenet, DC., G. Pereira, F. Diaz, V. Mosquera, M. Mollinari, B.A. Olukolu, M. David, M. Kitavi, G. Burgos, T.Z. Felde, M. Ghislain, E. Carey, R. Mwanga, L. Coin, Z. Fei, C.R. Buell, B. Yada, C. Yencho, Z.-B. Zeng, A. Khan, W. Gruneberg (2019) Translating Genomic Research to Address Development and Adoption Bottlenecks of Nutritious Sweetpotato [Ipomoea batatas (L.) Lam.] in sub-Saharan Africa (submitted).

Progress 10/01/17 to 09/30/18

Outputs
Target Audience: Nothing Reported
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? Currently, MAPpoly and QTLpoly are implemented for a single full-sib family of an autopolyploid species. With the anticipated continued funding of the project by the Bill & Melinda Gates Foundation, we plan to extend the methods and programs to multiple connected full-sib and half-sib families, which are the typical breeding populations for outcrossing autopolyploid species. This will take multiple years of effort.

Impacts
What was accomplished under these goals?
After years of hard work, we have reached a milestone in developing statistical methods and computer programs for genomic data analysis in autopolyploid populations. We have developed two packages:
MAPpoly (https://github.com/mmollina/MAPpoly): an R package that constructs a complete joint linkage map from dosage markers for a full-sib family in autopolyploid species (2X, 4X, 6X, and 8X);
QTLpoly (https://github.com/guilherme-pereira/qtlpoly/): an R package that performs random-effect multiple interval mapping (REMIM) in full-sib families of autopolyploid species based on REML estimation and score statistics.
Two main papers describing the methods and applications have been submitted for publication.
1. Unraveling the hexaploid sweetpotato inheritance using ultra-dense multilocus mapping
The hexaploid sweetpotato (Ipomoea batatas (L.) Lam., 2n = 6x = 90) is an important staple food crop worldwide. Due to its high ploidy level, genetic studies in sweetpotato lag behind those in major diploid crops. In this work, we perform a comprehensive analysis to investigate the inheritance landscape in sweetpotato using an ultra-dense multilocus integrated genetic map in a biparental population derived from the cultivars "Beauregard" and "Tanzania". To construct the map, including the phasing of the parental polyplotypes, we used our newly implemented software MAPpoly. The resulting map included 30,684 SNPs distributed in 15 linkage groups, each one containing six phased homologous chromosomes for each parent. We observed a marked collinearity between the I. batatas map and two related diploid species (I. trifida and I. triloba), with noticeable rearrangements in several homology groups. Using a hidden Markov model framework, we computed the probabilities of all possible underlying genotypes in the full-sib population and inferred their polyplotypes. We found clear evidence of multivalent formation, although the majority of the meiotic configurations originated from bivalent pairing. We detected low levels of preferential pairing in one linkage group, whereas in the remaining groups pairing was random, indicating the autopolyploid inheritance of sweetpotato. The map presented here is a fundamental resource for future sweetpotato studies, including QTL analysis and assembly of the I. batatas genome. In addition, our approach is readily extendable to tetraploids and octaploids.
2. Multiple QTL mapping in autopolyploids: a random-effect model approach with application in a hexaploid sweetpotato full-sib population
Several autopolyploid species, such as sweetpotato, Ipomoea batatas (L.) Lam. (2n = 6x = 90), have important social and economic impact, mostly in developing countries. Yet the detection and characterization of quantitative trait loci (QTL) have remained limited. Due to the genetic complexity of autopolyploids, current fixed-effect models can only fit a single QTL and are generally hard to interpret. Here we report the use of a random-effect model approach to map multiple QTL based on score statistics in a sweetpotato bi-parental population ('Beauregard' x 'Tanzania') with 315 full-sibs. Phenotypic data were collected for eight yield component traits in six environments in Peru, and joint adjusted means were obtained using mixed models. An integrated linkage map consisting of 30,666 markers distributed along 15 linkage groups spanning 2,702.01 cM was used to obtain the genotype conditional probabilities of putative QTL at every cM position. Multiple interval mapping was performed using the R package QTLpoly and detected a total of 41 QTL, ranging from one to ten QTL per trait. Some regions, such as those on LGs 3 and 15, were consistently detected among root number and yield traits. In addition, some QTL were found to affect commercial and noncommercial root traits distinctly. Further best linear unbiased predictions allowed us to characterize additive allele effects as well as to compute QTL-based breeding values for selection. Together with quantitative genotyping and its appropriate usage in linkage analyses, this QTL mapping methodology will facilitate the use of genomic tools in sweetpotato breeding as well as in other autopolyploid species.

Publications

Type: Conference Papers and Presentations Status: Published Year Published: 2018 Citation: Pereira, G., DC. Gemenet, M Mollinari, B Olukolu, F Diaz, V Mosquera, W Gruneberg, A Khan, GC Yencho, Z-B Zeng (2018) Multiple QTL Mapping in Hexaploid Sweetpotato for Yield and Yield Components. Proceedings of Plant and Animal Genome XXVI Conference.
Type: Conference Papers and Presentations Status: Published Year Published: 2018 Citation: Mollinari M, B Olukolu, G Pereira, DC. Gemenet, A Khan, M Kitavi, M David, M Ghislain, GC Yencho, Z-B Zeng (2018) Construction of an ultradense genetic map in hexaploid sweetpotato. Proceedings of Plant and Animal Genome XXVI Conference.
Type: Journal Articles Status: Accepted Year Published: 2019 Citation: Lara, L., M. Santos, L. Jank, L. Chiari, M. Vilela, R. Amadeu, J. Santos, G. Pereira, Z.-B. Zeng, A. Garcia (2019) Genomic Selection with Allele Dosage in Panicum maximum (Jacq.). G3: Genes, Genomes and Genetics (in press).
Type: Journal Articles Status: Submitted Year Published: 2019 Citation: Mollinari, M., B. Olukolu, G. Pereira, D. Gemenet, C. Yencho, Z.-B. Zeng (2019) Unraveling the hexaploid sweetpotato inheritance using ultra-dense multilocus mapping (submitted).
Type: Journal Articles Status: Submitted Year Published: 2019 Citation: G. Pereira, D. Gemenet, M. Mollinari, B. Olukolu, F. Diaz, V. Mosquera, W. Gruneberg, A. Khan, C. Yencho and Z.-B. Zeng (2019) Multiple QTL mapping in autopolyploids: a random-effect model approach with application in a hexaploid sweetpotato full-sib population (submitted).
Type: Journal Articles Status: Submitted Year Published: 2019 Citation: Gemenet, DC., G. Pereira, F. Diaz, V. Mosquera, M. Mollinari, B.A. Olukolu, M. David, M. Kitavi, G. Burgos, T.Z. Felde, M. Ghislain, E. Carey, R. Mwanga, L. Coin, Z. Fei, C.R. Buell, B. Yada, C. Yencho, Z.-B. Zeng, A. Khan, W. Gruneberg (2019) Translating Genomic Research to Address Development and Adoption Bottlenecks of Nutritious Sweetpotato [Ipomoea batatas (L.) Lam.] in sub-Saharan Africa (submitted).

Progress 10/01/16 to 09/30/17

Outputs
Target Audience: This report is for general scientists and professionals with some basic understanding of genetics, genomics, and plant breeding.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? The research results have been disseminated to communities of interest through scientific papers published in journals and conference proceedings and through the distribution of software.
What do you plan to do during the next reporting period to accomplish the goals? We need to finish the development of the MAPpoly and QTLpoly software. Two main scientific papers that describe the methodology of linkage map construction and QTL mapping in autopolyploid species are in preparation.

Impacts
What was accomplished under these goals?
The research during the last year focused on developing MAPpoly and QTLpoly, using the genetic and phenotypic data of the BT population as the basis for methodology and tool development and improvement, as well as for data analysis. A few highlights are provided here.
BT population linkage map construction and findings: We improved our GBS-based genotype calling process by using the posterior probability distribution of the genotypes provided by the SuperMASSA software. Rather than using a fixed ad hoc read-count threshold, we removed SNPs for which fewer than 75% of individuals had a maximum genotype probability of 0.8 or higher. With this procedure, the number of SNPs increased from 20,992 (the last report) to 38,701 while a high map quality was maintained. This procedure is especially important for populations where the read depth for SNPs is not as high as in the BT population. Thus, we anticipate an improvement in the genetic linkage maps that we previously constructed for the lower-read-depth TB and NKB populations. We produced, to the best of our knowledge, the first complete integrated multilocus genetic map in hexaploid sweetpotato, containing 31,778 SNPs distributed in 15 linkage groups with a total length of 4,132.7 cM, with individual groups ranging from 173.9 cM (LG 8) to 398.0 cM (LG 1). To do this, we used a hybrid approach that harnesses the speed of pairwise analysis and the power of the hidden Markov model (HMM) to deal with incomplete information. Using the HMM, we also obtained, for every individual and along the whole genome, the conditional probabilities of the 400 possible hexaploid genotypes in a full-sib population. We also reconstructed the haplotypes of the 311 individuals in the BT population.
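The probability-based SNP filter described above can be sketched as follows. This is an illustration rather than the project's pipeline: the array layout (SNPs x individuals x dosage classes), the function name, and the reading of the rule as "keep a SNP when at least 75% of individuals have a maximum posterior genotype probability of at least 0.8" are assumptions.

```python
import numpy as np

def keep_snps(post, min_prob=0.8, min_frac=0.75):
    """Return a boolean mask of SNPs that pass the genotype-probability filter.

    post : array of shape (n_snps, n_individuals, n_dosage_classes) holding the
           posterior probability of each dosage class (7 classes, 0..6, for a hexaploid).
    A genotype call counts as confident when its largest posterior probability is at
    least min_prob; a SNP is kept when at least min_frac of individuals are confident.
    """
    confident = post.max(axis=2) >= min_prob
    return confident.mean(axis=1) >= min_frac
```

The mask can then be used to subset the marker data, for example post[keep_snps(post)].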
The reconstructed offspring haplotypes are ultimately the most comprehensive result that a genetic mapping procedure can produce, since they describe in detail the inheritance pattern of the homologous chromosomes from both parents ('Beauregard' and 'Tanzania') to their offspring. This information is critical for understanding the genetic architecture of quantitative traits. In some cases, it was not possible to define the entire haplotype of a particular linkage group for some individuals because segments of homologous chromosomes within a homology group were identical. We estimated two recombination features uniquely associated with polyploid species, namely preferential pairing and multivalent formation. Apart from a small deviation from random pairing in linkage group 2, we found that sweetpotato shows mostly non-preferential pairing behavior in the BT population, and we observed an average of ~10% multivalent formation across all linkage groups. Since we use a robust HMM to estimate the final map, we believe that these deviations do not have serious consequences for the final map. Thus, we concluded that the inheritance pattern in sweetpotato is essentially polysomic, consistent with the observations above. We observed significant collinearity between the hexaploid sweetpotato and the two diploid reference genomes (Ipomoea trifida and Ipomoea triloba). We found that 7 of the 15 linkage groups had chromosome-wise collinearity. The remaining linkage groups showed rearrangements of synteny blocks, which varied from a few megabase pairs (e.g., LG 1) to entire chromosome arms (e.g., LG 2 and LG 3).
MAPpoly development: The construction of our maps was only possible because of the continuous enhancement of the software MAPpoly, available at https://github.com/mmollina/MAPpoly. MAPpoly was able to compute over 700 million pairwise recombination fractions, including all kinds of simplex and multiplex SNP configurations.
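Two counts quoted in this report can be checked with simple combinatorics. The sketch below assumes that each parent transmits half of its homologs without double reduction, and that the pairwise analysis covered every pair among the 38,701 filtered SNPs. Both are assumptions made for illustration, and the function names are hypothetical.

```python
from math import comb

def fullsib_genotype_classes(ploidy):
    """Number of offspring genotype classes at one locus in a full-sib family.

    Each parent can transmit any ploidy/2 of its ploidy homologs, giving
    C(ploidy, ploidy/2) gamete types per parent (double reduction ignored).
    """
    gametes_per_parent = comb(ploidy, ploidy // 2)
    return gametes_per_parent ** 2

def marker_pairs(n_markers):
    """Number of unordered marker pairs among n_markers."""
    return comb(n_markers, 2)
```

Under these assumptions, fullsib_genotype_classes(6) gives 20 x 20 = 400, matching the 400 hexaploid genotypes mentioned in this report, and marker_pairs(38701) gives 748,864,350, which is consistent with "over 700 million" pairwise recombination fractions.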
For each pair, the program returns a maximum likelihood estimate of the recombination fraction and the linkage phase configuration. The inclusion of new arguments in the map estimation function enables the automation of a significant part of the mapping pipeline, which now requires little human intervention to build a "de novo" map. We have observed that our mapping pipeline is very stable, eliminating possible artifacts generated by incomplete SNP information. Our pipeline to build a "de novo" map includes the following steps: low-quality SNP filtering, pairwise recombination fraction estimation, linkage group formation, "de novo" ordering through the MDS algorithm, and multipoint map reconstruction, including phasing. Finally, the conditional genotype probabilities along the whole genome can be calculated straightforwardly. We are still implementing functions to reconstruct the offspring haplotypes automatically. Nevertheless, the R scripts used in our analysis are available. Moreover, if genomic information is available, MAPpoly has the functions necessary to perform a genome-assisted improvement, which will also be a feature in an upcoming version.

QTLpoly development: We have developed a random-effect multiple interval mapping (REMIM) model for QTL detection and characterization in autopolyploid species, and implemented a computational tool in R. In brief, the model scans the whole genome at a given step size (such as 1 cM) and fits QTL at positions that explain a significant part of the phenotypic variance. Significance is assessed by score statistics, and the variance components associated with QTL are estimated using restricted maximum likelihood (REML). The model uses a stepwise procedure to select QTL.
In comparison, currently published methods for autopolyploids cannot fit more than one QTL at a time because of the testing and estimation problems that come with estimating numerous fixed effects. To simplify this QTL mapping strategy, we developed QTLpoly, an R package for polyploid QTL analysis that implements many analysis functions based on score statistics and REML estimation. QTLpoly is available at https://github.com/guilherme-pereira/qtlpoly. These functions are able to: perform score statistic-based multiple QTL search and model optimization; fit multiple QTL and estimate QTL variance components using REML; estimate QTL allelic effects and predict individual breeding values for selection based on the detected QTL and available genotypic data; and draw QTL profiles, support intervals, and dot plots for single or multiple traits. We have performed extensive simulations for both hexaploid and tetraploid populations to evaluate the analysis methods and procedures. These will provide the basis for guidelines to the community on using QTLpoly for QTL mapping in autopolyploid populations. We have performed QTL mapping data analysis using QTLpoly in several populations, including: 'Beauregard' x 'Tanzania' (hexaploid, 315 full-sibs), analyzed for beta-carotene and flesh color, yield-related and quality traits (from Peru), and yield-related traits (from Ghana); 'Tanzania' x 'Beauregard' (hexaploid, 245 full-sibs), evaluated for root-knot nematode (from NCSU); 'New Kawogo' x 'Beauregard' (hexaploid, 287 full-sibs), with historical data on SPVD, SPW, and yield and quality-related traits (from Uganda); and 'Atlantic' x B1829-5 (tetraploid, 153 full-sibs), test data consisting of yield and quality-related traits (from NCSU) and potato scab disease (from USDA).
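The genotype-probability filter described under the BT linkage map findings above can be illustrated with a minimal sketch. It assumes that each individual carries a vector of posterior probabilities over the seven dosage classes (0 to 6) of a hexaploid SNP, and that "maximum genotype probability of 0.8" means a maximum of at least 0.8. The function names, parameter names, and toy data are hypothetical and are not part of SuperMASSA or MAPpoly.

```python
def keep_snp(post_probs, min_prob=0.8, min_frac=0.75):
    """Return True if enough individuals have a confident genotype call.

    post_probs: one list per individual, holding the posterior probability
    of each dosage class for this SNP (assumed input layout).
    """
    # Count individuals whose most probable dosage reaches the threshold.
    confident = sum(1 for probs in post_probs if max(probs) >= min_prob)
    # Keep the SNP when at least min_frac of individuals are confident.
    return confident >= min_frac * len(post_probs)


def call(dosage, p, n_classes=7):
    """Toy posterior: probability p on one dosage, the rest spread evenly."""
    rest = (1.0 - p) / (n_classes - 1)
    return [p if k == dosage else rest for k in range(n_classes)]


# Two hypothetical SNPs scored in four individuals.
snp_a = [call(1, 0.95), call(2, 0.90), call(1, 0.85), call(0, 0.40)]
snp_b = [call(1, 0.95), call(2, 0.50), call(1, 0.60), call(0, 0.40)]

kept = [name for name, snp in (("snp_a", snp_a), ("snp_b", snp_b))
        if keep_snp(snp)]
print(kept)
```

In this toy case, snp_a is retained because three of its four individuals are called with high confidence, while snp_b is dropped. Unlike a fixed read-count threshold, the criterion depends only on how decisive the posterior distribution is.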

Publications

Type: Journal Articles Status: Published Year Published: 2017 Citation: Schumann, M., Z-B Zeng, M. E. Clough, and G. C. Yencho. (2017) Linkage map construction and QTL analysis for internal heat necrosis in autotetraploid potato. Theoretical and Applied Genetics 130: 2045. DOI 10.1007/s00122-017-2941-1

Progress 10/01/15 to 09/30/16

Outputs
Target Audience: Scientists in the plant breeding community.
Changes/Problems: We have shifted our research emphasis to polyploid linkage and QTL analysis because of a project supported by the Bill & Melinda Gates Foundation. It has been tremendously challenging and also exciting to work on this project. The overall objectives of our research are to develop analytic methods and computational tools for data analysis, from raw DNA sequence reads through SNP calling, genetic linkage map construction, and QTL mapping to genomic selection, in full-sib families of sweetpotato, a hexaploid species; and to apply the methods and tools to the genomic data generated by our collaborators.
What opportunities for training and professional development has the project provided? A graduate student and two postdocs participated in this project. The project provided valuable training opportunities for the student and postdocs.
How have the results been disseminated to communities of interest? Several research papers have been produced and are in the process of publication. The Polymap software is still in development. When development is finished, we will release it to the community.
What do you plan to do during the next reporting period to accomplish the goals? Polyploid genetics and genomics is a relatively long-term project. Our next priority is to finish and release Polymap and to start working on QTL analysis methods for polyploids.

Impacts
What was accomplished under these goals? Evaluation of genomic selection with or without epistasis for pure-line breeding (the soybean paradigm): Jointly with Syngenta collaborators, we compared by simulation the efficiency of genomic selection with and without epistasis for two breeding approaches: pure-line breeding (the soybean paradigm) and hybrid breeding (the maize paradigm). We made significant progress on the soybean paradigm and obtained many interesting simulation results. The results show that genomic selection with epistasis can yield some additional gain in selection response. However, the amount of additional gain depends critically on the sample size, which is related to the accuracy of epistasis estimation. That is, to practice genomic selection with epistasis for traits with low additive but high epistatic heritability, a sufficiently large sample size is required.

We have shifted our research emphasis to polyploid linkage and QTL analysis because of a project supported by the Bill & Melinda Gates Foundation. It has been tremendously challenging and also exciting to work on this project. The overall objectives of our research are to develop analytic methods and computational tools for data analysis, from raw DNA sequence reads through SNP calling, genetic linkage map construction, and QTL mapping to genomic selection, in full-sib families of sweetpotato, a hexaploid species; and to apply the methods and tools to the genomic data generated from the project.

We target our analysis to genotyping-by-sequencing (GBS) data. GBS has been performed on two populations, one each of the diploid (I. trifida M9xM19 cross) and hexaploid (I. batatas BxT cross) sweetpotato species. GBS reads were processed using the Tassel-GBS pipeline (Glaubitz et al. 2014) and tags were aligned using Bowtie2 (Langmead et al. 2013). Previously, through our collaborator (Dr. G.R.A.
Margarido) we have extended the Tassel-GBS pipeline to polyploid data. Quantitative genotype calls based on read depths were obtained using the SuperMASSA software (Serang et al. 2012). In the last year, by analyzing the data from this project, we have updated and optimized SuperMASSA for polyploid GBS-based analysis. For linkage analysis, we use our OneMap R package (Margarido et al. 2007) for the diploid population. For polyploid populations, we have been developing the Polymap package for general autopolyploid linkage analysis. For QTL analysis and genomic selection, we are working on a general strategy that uses a random-effect model to perform interval mapping and multiple interval mapping in full-sib families.

Research progress: We have performed linkage analysis for the I. trifida M9xM19 cross and the I. batatas BxT cross. Through these analyses, we have been updating and optimizing the SuperMASSA, OneMap, and Polymap code. For the first population, a total of 1,692 markers were mapped in 15 linkage groups (LGs). LGs ranged from 65 to 200 markers and from 130.88 to 269.62 cM in length, spanning 2,595.94 cM in total (average marker density = 1.53 cM per marker). For the second population, a total of 1,811 markers were also grouped in 15 LGs.

Progress on Polymap, the software to build genetic linkage maps in full-sib populations of autopolyploid species: As part of the effort to obtain a genetic map in sweetpotato, software called Polymap has been developed. Polymap is an R package that implements algorithms specially tailored to experimental populations derived from biparental crosses of autopolyploids with even ploidy levels from 2 up to 10. Although the concept of genetic linkage mapping is relatively simple, the combinatorial properties and the increasing amount of missing information in practical data that arise from the multiple sets of chromosomes in a polyploid species make the construction of such maps extremely challenging.
In its current stage of development, Polymap contains 122 functions, including data reading, pairwise and multilocus recombination fraction estimation procedures, phasing algorithms, graphical visualization and diagnostics, and several marker ordering procedures. All these functions were tested at different ploidy levels in a variety of linkage phase scenarios. Polymap is flexible enough to construct linkage maps in the presence or absence of a reference genome. With genomic information, it is possible to use the position of the markers along the genome to narrow down the search space for marker orders. On the other hand, Polymap is also capable of constructing "de novo" maps using only the information contained in a full-sib population. "De novo" map construction is very important in several polyploid species, including sweetpotato, where whole-genome sequences are not completely assembled and a genetic map could help with that task. Any type of molecular marker can be used for constructing a complete genetic map in Polymap. However, the amount of information contained in the markers will be reflected in the quality of the final map. One important aspect of markers is their capability to distinguish allelic variants present in different homologous chromosomes. In an ideal case, markers can distinguish different alleles in all parental homologous chromosomes (completely informative markers). However, this is rarely the case. In this project, the allelic variation is assessed using sequencing technologies, where the ratio between the numbers of reads containing different allelic variants (SNPs) can be translated into allelic dosages. Polymap can handle these types of markers and also has functions that provide the amount of information for any combination of dosages and linkage phase configurations. For QTL analysis, we have formulated a strategy that uses a random-effect model for general autopolyploid QTL analysis.
For example, for an autotetraploid there are potentially 6 x 6 = 36 genotypes at each QTL, and for an autohexaploid there are potentially 20 x 20 = 400 genotypes at each QTL. For these genotypes, we can formulate a set of alleles with additive effects assumed to be random with distribution N(0, σ_a^2) and their interaction (dominance) effects assumed to be random with distribution N(0, σ_d^2). Detecting and mapping a QTL then amounts to testing whether σ_a^2 and σ_d^2 are greater than zero at each putative QTL position along the linkage map. Research along this line is in progress in conjunction with Polymap development.
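The genotype counts quoted above follow from a simple combinatorial argument, sketched below. The sketch assumes that all homologs of each parent are distinguishable, that each gamete receives half of a parent's homologs, and that there is no double reduction. The function names are hypothetical and are not taken from Polymap.

```python
from math import comb


def gametes_per_parent(ploidy):
    """Number of distinct gametes one parent can form at a locus.

    Assumes an even ploidy level, distinguishable homologs, and no
    double reduction: choose ploidy/2 homologs out of ploidy.
    """
    if ploidy < 2 or ploidy % 2 != 0:
        raise ValueError("ploidy must be an even number >= 2")
    return comb(ploidy, ploidy // 2)


def full_sib_genotypes(ploidy):
    """Number of possible genotypes at a locus in a biparental cross."""
    # One gamete from each parent, chosen independently.
    return gametes_per_parent(ploidy) ** 2


for ploidy in (2, 4, 6):
    print(ploidy, gametes_per_parent(ploidy), full_sib_genotypes(ploidy))
```

For ploidy 4 this gives 6 gametes per parent and 36 genotypes, and for ploidy 6 it gives 20 gametes per parent and 400 genotypes, matching the figures in the text. The rapid growth of this count with ploidy is one reason a random-effect formulation, with a small number of variance components to test, is attractive compared with estimating one fixed effect per genotype.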

Publications

Type: Theses/Dissertations Status: Published Year Published: 2015 Citation: Wenjing Lu (2015) Genome-wide association, epistasis and selection of quantitative traits in experimental populations. PhD Thesis, North Carolina State University

Progress 11/13/14 to 09/30/15

Outputs
Target Audience: This report is for general scientists and professionals with some basic understanding of genetics, genomics, and plant breeding.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? A graduate student participated in this project. The student gained valuable research experience and was trained in performing high-level statistical genetics research. For the project, the student learned high-level statistical analysis methods, such as LASSO and GBLUP, as well as statistical genetics principles, and performed original research using computer simulation and real data analysis.
How have the results been disseminated to communities of interest? The research results have been disseminated to the scientific community through journal publications and presentations at scientific meetings.
What do you plan to do during the next reporting period to accomplish the goals? We plan to perform an extensive simulation study to evaluate the role of epistasis in plant breeding. Initially, we will target the pure-line breeding system, the so-called soybean paradigm. We will evaluate the efficiency of genomic selection with epistasis, in terms of selection response, compared with genomic selection without epistasis. Through simulation, we intend to answer questions such as how important it is to take epistasis into account in genomic selection for plant breeding improvement, and what the optimal methods and procedures are for doing so.

Impacts
What was accomplished under these goals? QTL mapping with epistasis: The paper "Mapping epistatic quantitative trait loci" was published in BMC Genetics (2014, 15(1):112). The paper reports a three-stage method to search for epistatic quantitative trait loci (QTL). The results of this study formed the basis of the current project: to explore a new marker-assisted predictive breeding method that uses information on QTL epistasis. As part of further study on QTL epistasis, we set out to perform a comprehensive simulation study comparing the efficiency and bias of several alternative statistical methods for mapping QTL epistasis. The goal of this study is to test how our three-stage approach compares with a few alternative approaches. The results indicate that, among the methods compared, ours is the most efficient (i.e., it detects more QTL epistasis) and the least biased for QTL epistasis detection. The paper is in the process of publication.

Consistent estimation of genomic epistasis: The above studies ask how to detect significant QTL main effects and epistatic effects for breeding. A related, but different, question is how much epistasis there is in current plant breeding populations, i.e., how important the issue can be for plant breeding. Many breeding data sets, both public and private, are available for study. To answer this question, we need appropriate statistical and computational ways to estimate the genetic variance components (additive, dominance, and epistatic components) without bias. The major challenge is that the dimension of the data for the problem (all pairwise combinations of genomic markers) is huge, on the order of millions or more. There are both statistical and computational challenges. We explored several approaches (e.g., LASSO, ISIS, and mixed models) and obtained some interesting and promising results. We also evaluated GBLUP for estimating genomic epistatic variance and found that GBLUP can give an unbiased estimator.
We performed extensive simulations to demonstrate this desirable and long-sought property.

Consistent estimation of genomic epistasis in maize NAM populations: We applied GBLUP to the well-known maize NAM populations (25 populations of recombinant inbred lines, each with 200 lines) to estimate the genomic additive and epistatic variances for a whole host of phenotypes measured in the NAM populations. The results show that estimates of additive heritability range from 0.2 to 0.6 and estimates of epistatic heritability range from 0.05 to 0.3 for the traits measured, and that many traits associated with life history tend to have low additive heritability and high epistatic heritability. We also applied the method to a hybrid rice population to obtain consistent estimates of the genomic variance partition (additive, dominance, additive x additive, additive x dominance, and dominance x dominance variances) for yield and yield-component traits. The result is a breakthrough in our understanding of the importance of epistasis. The study paper is currently in the process of publication.
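The report does not state which relationship matrices were used in the GBLUP analyses. One common construction, shown here only as an assumed illustration, builds an additive genomic relationship matrix G from centered marker scores and uses its element-wise (Hadamard) square as the additive x additive epistatic relationship matrix. The function names and the toy marker data are hypothetical, and the REML step that estimates the variance components from these matrices is not shown.

```python
def additive_grm(markers):
    """Additive genomic relationship matrix (VanRaden-style sketch).

    markers: list of rows (one per line), each a list of marker scores
    coded as the number of copies (0, 1, or 2) of one allele.
    """
    n, m = len(markers), len(markers[0])
    # Allele frequency of the counted allele at each marker.
    freqs = [sum(row[j] for row in markers) / (2.0 * n) for j in range(m)]
    # Scaling constant: sum of 2p(1-p) across markers.
    scale = sum(2.0 * p * (1.0 - p) for p in freqs)
    # Center each marker column at its mean, 2p.
    z = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in markers]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale
             for k in range(n)] for i in range(n)]


def epistatic_grm(g):
    """Additive x additive relationship matrix as the Hadamard square of G."""
    return [[value * value for value in row] for row in g]


# Four hypothetical inbred lines scored at five markers (0 or 2 copies).
toy_markers = [
    [0, 2, 2, 0, 2],
    [2, 2, 0, 0, 2],
    [0, 0, 2, 2, 0],
    [2, 0, 0, 2, 0],
]

G = additive_grm(toy_markers)
GG = epistatic_grm(G)
for row_g, row_gg in zip(G, GG):
    print([round(v, 2) for v in row_g], [round(v, 2) for v in row_gg])
```

With matrices of this kind, the number of variance components stays small (for example one additive, one epistatic, and one residual) regardless of how many marker pairs exist, which sidesteps the dimensionality problem of modeling every pairwise interaction explicitly. In practice the epistatic matrix is often rescaled, for instance by its mean diagonal, before fitting.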

Publications

Type: Journal Articles Status: Published Year Published: 2014 Citation: Laurie, C., S. Wang, L.A. Carlini-Garcia and Z.-B. Zeng (2014) Mapping epistatic quantitative trait loci. BMC Genetics 15(1), 112