Source: UNIVERSITY OF CALIFORNIA, DAVIS submitted to NRP
USING DIPLOID GERMPLASM TO UNDERSTAND GENOME STRUCTURE, SEGREGATION DISTORTION, AND YIELD OF ALFALFA
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1030338
Grant No.
2023-67013-39617
Cumulative Award Amt.
$649,982.00
Proposal No.
2022-10301
Multistate No.
(N/A)
Project Start Date
Jul 1, 2023
Project End Date
Jun 30, 2026
Grant Year
2023
Program Code
[A1141]- Plant Health and Production and Plant Products: Plant Breeding for Agricultural Production
Recipient Organization
UNIVERSITY OF CALIFORNIA, DAVIS
410 MRAK HALL
DAVIS,CA 95616-8671
Performing Department
(N/A)
Non Technical Summary
Alfalfa, Medicago sativa L., is the most important forage crop in the USA. However, biomass yield improvement has been slow. We propose to investigate diploid alfalfa from three interrelated angles to better understand the genetic basis of biomass yield and identify possible genetic approaches to future yield improvement. First, we will develop highly accurate, haplotype resolved genome sequences for eight diploid genotypes from both alfalfa subspecies and resequence an additional eight genotypes to gain further information on structural variation among these individuals. Our sequences will include the first whole genome sequence of the yellow-flowered subspecies falcata. Second, we will investigate the biologically interesting phenomenon of segregation distortion toward excess heterozygosity. We will survey more than 20 segregating populations for excess heterozygosity, describing similarities and differences between the regions with distortion. Third, we will develop several large segregating populations that express heterozygote excess in some genomic regions and use them to genetically map quantitative trait loci (QTL) for yield and yield components. Finally, we will look for coincidence of QTL, loci with distorted segregation, and structural variants in the parental genomes to help explain variations in yield and propose mechanisms for how these loci control yield. These results will lead to genetic markers to assist selection for yield and potentially strategies to more effectively select for higher yield.
Animal Health Component
50%
Research Effort Categories
Basic
50%
Applied
50%
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
201                    1640                              1080                      50%
201                    1640                              1081                      50%
Goals / Objectives
In this project, we aim to address four objectives:
(1) To develop high-quality, haplotype-resolved reference genomes of diploid M. sativa subsp. caerulea and subsp. falcata in order to evaluate structural variation,
(2) To identify genomic loci exhibiting segregation distortion toward excess heterozygosity in diverse genetic backgrounds,
(3) To map quantitative trait loci (QTL) for biomass yield and yield components, and
(4) To compare the location of structural variants, loci exhibiting segregation distortion toward excess heterozygosity, and QTL for biomass yield and yield components.
Our long-term goal is to understand mechanistically the genetic drivers of biomass yield and to use that understanding to predict gene and/or haplotype combinations that will lead to yield improvement. This project is a first step toward that goal and will lead to testable hypotheses about loci controlling yield that can be incorporated into breeding programs.
Project Methods
Objective 1. We will sequence the genomes of 16 genetically diverse diploid alfalfa genotypes using PacBio HiFi and Omni-C to generate phased genome assemblies of each genotype. These assemblies will be further improved with whole-genome paired-end 150 bp (PE150) short-read sequences to correct any errors that may be present. We will extract high molecular weight DNA and develop SMRTbell libraries for High Fidelity (HiFi) sequencing based on PacBio protocols. The Omni-C libraries will be prepared using the Dovetail® Omni-C® Kit according to the manufacturer's protocol. Short-read PE150 libraries will be developed based on Illumina protocols. All sequencing will be done at the UC Davis DNA Technologies Core facility on a PacBio Sequel II system (HiFi), an Illumina HiSeq (Omni-C), and a NovaSeq S4 sequencer (Illumina PE150). We anticipate sequencing the entire alfalfa genome of ~800 Mbp to a depth of greater than 50×.

We will generate de novo assemblies of these genotypes as follows. We will assemble HiFi reads using hifiasm (Cheng et al., 2021). Omni-C data will be assembled using ALLHiC (Zhang et al., 2020). Omni-C and HiFi data will be combined using built-in functions of hifiasm for complete assemblies. Short reads (Illumina) will be aligned to assemblies using bowtie2 (Langmead and Salzberg, 2012). Alignments of haplotypes within and across genotypes will be performed with minimap2 (Li, 2018; Li, 2021; Kalikar et al., 2022). Single nucleotide and structural variation between haplotypes, both within individual genotypes and across genotypes, will be identified using the Genome Analysis Toolkit (GATK; McKenna et al., 2012) and SyRI (Goel et al., 2019).

We will annotate genomes ab initio using Augustus (Hoff and Stanke, 2019). To improve annotation, we will use existing transcript data from our previously published experiments (Li et al., 2012) and the existing whole-genome sequencing projects.
Further, we will also use isoform sequencing (Iso-seq) of the two genotypes for which gold standard reference genomes will be developed. We will collect leaves one week after plants have been clipped to a 3 cm stubble, and leaves and flower buds at about four weeks of regrowth, when plants have begun to flower. Leaves will be collected on ice and frozen at -80 °C before extraction using Qiagen RNeasy kits. cDNA libraries will be made by the UC Davis DNA Technologies Core facility and sequenced on the PacBio Sequel II system to obtain full-length transcripts. Iso-seq data along with the previously published transcripts will be used to annotate assembled genomes using an established pipeline with IsoSeq3, Minimap2, StringTie, and EVidenceModeler that has been used by the Monroe lab and others.

Objective 2. To examine the prevalence of segregation distortion across diploid germplasm, we propose to evaluate distortion in S1 and F2 populations derived from the 16 parental genotypes sequenced above. These populations will include intra- and inter-subspecies hybrids, replicate F2 populations derived from different F1 genotypes from the same parents, reciprocal populations, and multiple populations for a given parent.

For genotyping we will use the DArTag platform developed by Diversity Arrays Technology. Recently, the Breeding Insights program has developed two 3000 SNP arrays on this platform. The arrays were developed from SNPs identified through resequencing a series of alfalfa genotypes from across the fall dormancy spectrum; thus, we anticipate these arrays will be useful for our purposes here. Plants will be grown in the greenhouse under typical conditions (~25 °C with a 16-hour photoperiod). Young, freshly expanded leaves free from disease and insect damage will be harvested, placed on ice, and freeze-dried.
DNA extraction and sample preparation will follow the DArT instructions.

For each S1/F2 population, we will initially screen 46 individuals and the parental genotype, with two populations per plate (94 genotypes plus two slots for the DArTag controls). With this population size, our power to detect moderate deviations with a χ² test at α = 0.05 is over 0.8. If the 46 individuals give ambiguous results, we will genotype more individuals. However, at this phase of the project, our aim is simply to determine the universality of segregation distortion, particularly toward excess heterozygosity, so genotyping more populations is the priority. The DArT marker sequences will be aligned to the alfalfa genome sequences we have developed in Objective 1.

Most populations will be too small to create genetic maps, but for the two larger populations, we will create genetic maps using QTL IciMapping (Meng et al., 2015), which has been used previously to identify segregation distortion regions (SDR) in rice (Liang et al., 2020), and JoinMap 5.0 (Van Ooijen, 2006; https://www.kyazma.nl/index.php/JoinMap/) to compare maps between populations. Segregation distortion regions will be defined as exceeding a LOD threshold of 3.0 in QTL IciMapping (Meng et al., 2015). Segregation distortion affects linkage analysis, so we will consider using the methods of Lorieux et al. (1995) to account for it, as we did in Li et al. (2011a), if necessary.

Objective 3. The individual genotypes of the populations will be heterozygous. To evaluate individual genotypes in the past, we have used vegetative clones via rooted stem cuttings for yield QTL mapping. Depending on the number of clones (or seeds) we have available, we will use either an α-lattice or an augmented design with replicated checks for field trials (Zystro et al., 2018; Burgueño et al., 2018).
We will plant trials at the UC Davis Plant Sciences Farm in Davis, CA.

We will measure biomass yield beginning in April or May of the spring after establishment, or when plants begin to flower, which depends on weather and the level of dormancy this germplasm shows. We will harvest the plots at intervals when ~50% of plants have started to flower. Diploid germplasm tends to regrow more slowly than cultivated tetraploids based on my previous observations. Therefore, we expect to have 2-4 harvests per year.

At each harvest period, we will score maturity based on the scale of Kalu and Fick (1981), measure the length of the tallest stem of each individual plant, harvest the center plant in the plot into one sample bag for additional measurements, and harvest all other plants in the plot into a second sample bag. Harvest will be done with hand sickles, cutting at ~5 cm from the soil surface. Both samples will be dried in a forced air drier at 60 °C, and the dry matter weighed. On the single-plant subsample, we will separate leaves and stems, weigh each component, and compute a leaf:stem ratio. We will also count the number of primary stems on the plant.

Phenotypic data will be analyzed with linear mixed models in ASReml-R to generate best linear unbiased predictors (BLUP) for use in QTL mapping with JoinMap 5 for F2 populations or TASSEL (Bradbury et al., 2007) for the advanced intercross populations (AIP). Broad sense heritabilities will be computed, and correlations among traits analyzed. A meta-analysis of QTL (Goffinet and Gerber, 2000) from our experiments and those of other alfalfa yield studies will be conducted using Meta-QTL (Veyrieras et al., 2007) to update the consensus mapping paper published by Ray et al. (2018).

Objective 4. In this objective, which in some respects is accomplished as we go along with the others, we explicitly bring the three previous objectives together.
Starting with the genomes of our parents, we will assess what differentiates them and determine whether any significant features observed at the genome level (insertions, deletions, etc.) are related to the regions where we have seen segregation distortion and to regions where QTL for biomass yield or yield components are located. Overlapping regions may be evident from markers falling into QTL intervals and from similar markers being associated with multiple traits and/or segregation distortion.
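
The comparison described under Objective 4 amounts to intersecting three sets of genomic intervals. The following is a minimal sketch of that idea, not the project's pipeline: the `Region` type, the `find_coincident` function, and all coordinates and labels are hypothetical and introduced here only for illustration. It assumes QTL intervals, segregation distortion regions, and structural variants have already been placed on the same reference coordinates.

```python
from typing import List, NamedTuple, Tuple


class Region(NamedTuple):
    """A genomic interval on one chromosome (hypothetical helper type)."""
    chrom: str
    start: int  # inclusive, in bp
    end: int    # inclusive, in bp
    label: str


def overlaps(a: Region, b: Region) -> bool:
    """True if two regions share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def find_coincident(
    qtl: List[Region], sdr: List[Region], svs: List[Region]
) -> List[Tuple[str, str, int, int, List[str]]]:
    """For each overlapping QTL/SDR pair, report the shared interval and
    the structural variants that fall at least partly inside it."""
    hits = []
    for q in qtl:
        for s in sdr:
            if not overlaps(q, s):
                continue
            lo, hi = max(q.start, s.start), min(q.end, s.end)
            shared = Region(q.chrom, lo, hi, "shared")
            inside = [v.label for v in svs if overlaps(shared, v)]
            hits.append((q.label, s.label, lo, hi, inside))
    return hits


# Made-up example data; none of these intervals come from the project.
example_qtl = [Region("chr1", 1_000_000, 5_000_000, "yield_QTL_1")]
example_sdr = [Region("chr1", 4_000_000, 6_000_000, "SDR_1")]
example_svs = [Region("chr1", 4_500_000, 4_600_000, "inversion_A")]

for hit in find_coincident(example_qtl, example_sdr, example_svs):
    print(hit)
```

A pairwise scan like this is adequate for the modest numbers of QTL and distortion regions expected per population; a sorted sweep or an interval tree would be the usual choice if the structural variant list grows large.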

Progress 07/01/23 to 06/30/24

Outputs
Target Audience: The primary targets of this research are the scientific community at large and the alfalfa breeding community, both in the private and public sectors. As such, we have presented our research results at several national/international conferences during the reporting period, including the International Plant and Animal Genome conference and the North American Alfalfa Improvement Conference. In addition, this research will have important implications for alfalfa improvement leading to improved cultivars. Because of this, we have also discussed this project at field days, including at the Intermountain Research and Extension Center in Tulelake, CA, to explain to farmers, ranchers, and extension personnel what we are doing in more basic genetics to improve the yield and other traits of alfalfa.

Changes/Problems: The project is proceeding as anticipated in all respects.

What opportunities for training and professional development has the project provided?
This project has an outstanding group of young scientists involved with it. An MS student is taking the lead on Objectives 2 and 3 and is deeply involved in Objective 1; a PhD student is leading the bioinformatics to assemble genome sequences in Objective 1; and a postdoc is providing input regarding the methodologies of DNA sequence assembly, scaffolding, and annotation. The group meets regularly, with the advanced student and postdoc assisting the MS student with coding and bioinformatics and the MS student explaining alfalfa biology and genetics to the others.

How have the results been disseminated to communities of interest?
To date, we have primarily reported results through poster presentations at conferences.
In the reporting period, we had posters at the International Plant and Animal Genome conference in San Diego, where the lead student (Cree King) also gave an impromptu oral presentation of her poster during the Alfalfa Workshop, and at the North American Alfalfa Improvement Conference. Additional conference presentations are planned for the next reporting period. We have also discussed the project at field days, especially at the UC ANR Intermountain Research and Extension Center field day, where we have many of the plants used in this project growing in the field.

What do you plan to do during the next reporting period to accomplish the goals?
Within the next reporting period, we expect to complete the genome sequence of our reference individual, including a full annotation, and of the other sequenced individuals. This will complete Objective 1. We will then begin to compare the sequences to assess differences within and between subspecies as part of Objective 4. We will complete the analysis of segregation distortion in F2 populations, completing Objective 2. We will identify the location of markers showing distortion in different populations on our reference genomes, enabling us to begin to assess the relationship of distortion with structural differences between the genomes of the parents, as part of Objective 4. We will create the advanced intercross populations, genotype them with the same marker set used previously, and plant the populations in the field to measure biomass yield phenotypes as part of Objective 3.

Impacts
What was accomplished under these goals?

(1) To develop high-quality, haplotype-resolved reference genomes of diploid M. sativa subsp. caerulea and subsp. falcata in order to evaluate structural variation.
During the reporting period, we sequenced seven diploid alfalfa genotypes from both diploid subspecies (i.e., subsp. caerulea and subsp. falcata) using PacBio HiFi Revio long-read technology. In addition, for one of the sequenced individuals, which we are using as our reference genotype, we obtained Dovetail Omni-C sequence data to assist with sequence scaffolding and to further phase the haplotypes of this genotype. That reference individual is an F1 hybrid between two plants, one from each subspecies. Using this plant enabled us to use trio binning to help assemble the haplotypes of the F1 and, further, will give us extremely detailed assemblies of each diploid subspecies. We generated initial assemblies from the HiFi sequencing, with final resolution happening after the reporting period. In addition, we collected RNA from the two parents and the F1 genotype used for sequencing at four time points during one day and from several tissues at each time point to generate expression data to guide genome annotation. The data are in hand, but the annotation will be done in the next reporting period. Thus, this objective is well on the way to completion.

(2) To identify genomic loci exhibiting segregation distortion toward excess heterozygosity in diverse genetic backgrounds.
During the reporting period, we obtained DNA marker data using the DArTag 3000 SNP product developed as part of the Breeding Insights program. The SNP data were obtained from 14 small F2 populations to understand the prevalence of segregation distortion across diverse germplasm. These F2 populations included crosses both within and between subspecies of diploid alfalfa. All populations showed a substantial amount of segregation distortion, mostly toward excess heterozygosity.
Thus, our initial hypothesis that excess heterozygosity is a pervasive feature of alfalfa populations appears to be confirmed. In the next reporting period, we will increase the size of several of these populations, characterize them further, make genetic maps, and tie the maps to the genome sequences. Thus, this objective is also progressing well and should be completed in the next reporting period.

(3) To map quantitative trait loci (QTL) for biomass yield and yield components.
During this reporting period, we started to develop advanced intercross populations by selfing and intercrossing pairs of F2 individuals within several of the populations. Additional rounds of self-pollination and/or intercrossing will be done. The goal of these populations is to induce more recombination to enable the isolation of segregation distortion loci to smaller genetic intervals and, ultimately, to identify the loci responsible for the distortion. These populations will be used to assess biomass yield under field conditions beginning in 2025. In addition to these advanced intercross populations, we have also measured biomass on one F2 population in the greenhouse. Data for this population have not yet been analyzed.

(4) To compare the location of structural variants, loci exhibiting segregation distortion toward excess heterozygosity, and QTL for biomass yield and yield components.
This objective will begin in the coming year as the DNA sequences are completed, the marker data are analyzed, and QTL information starts coming in.
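
The per-marker screen behind accomplishment (2) can be illustrated with a chi-square goodness-of-fit test of F2 genotype counts against the expected 1:2:1 ratio. This is a minimal sketch under stated assumptions rather than the analysis actually run: the function name, the returned fields, and the example counts are hypothetical, and the project methods separately define segregation distortion regions by a LOD threshold in QTL IciMapping, which this sketch does not reproduce.

```python
import math


def f2_segregation_test(n_aa, n_ab, n_bb, alpha=0.05):
    """Chi-square goodness-of-fit test of F2 genotype counts against 1:2:1.

    n_aa and n_bb are the two homozygous classes; n_ab is the heterozygotes.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped individuals")
    observed = (n_aa, n_ab, n_bb)
    expected = (0.25 * n, 0.50 * n, 0.25 * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three genotype classes give 2 degrees of freedom, and the chi-square
    # survival function for df = 2 has the closed form exp(-x / 2).
    p_value = math.exp(-chi2 / 2.0)
    return {
        "chi2": chi2,
        "p_value": p_value,
        "distorted": p_value < alpha,
        # Direction of the deviation: more heterozygotes than expected.
        "het_excess": n_ab > expected[1],
    }


# Made-up counts for one marker in a population of 46 individuals.
print(f2_segregation_test(5, 36, 5))
```

With roughly 3000 markers tested per population, a fixed per-marker threshold of 0.05 would flag many markers by chance alone, so in practice a multiple-testing correction or a region-based criterion would likely be applied on top of a test like this.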

Publications

  • Type: Conference Papers and Presentations Status: Published Year Published: 2024 Citation: King, C., Davis, M., Bird, K., Monroe, J.G., and Brummer, E.C. Identifying Segregation Distortion Loci in Diverse Genetic Backgrounds of Diploid Alfalfa. Plant and Animal Genome 31, January 12 - January 17, 2024, San Diego, CA. https://plan.core-apps.com/pag_2024/abstract/4a31f4bf732fdfe0f1c7fd0aad1a25bb
  • Type: Conference Papers and Presentations Status: Published Year Published: 2024 Citation: King, Cree, Matt Davis, Kevin Bird, Grey Monroe, and E. Charles Brummer. Understanding Segregation Distortion in Diploid Alfalfa. 2024 Joint Conference NAAIC, Trifolium, & Grass Breeders, June 24-26, 2024, Pasco, WA. https://www.naaic.org/Meetings/National/2024meeting/35-King.pdf