Insights into Plant-Microbe Interactions by Studying Genome Variation in Medicago

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

HATCH

Reporting Frequency

Annual

Accession No.

1000228

Grant No.

(N/A)

Cumulative Award Amt.

(N/A)

Proposal No.

(N/A)

Multistate No.

(N/A)

Project Start Date

Oct 1, 2013

Project End Date

Sep 30, 2016

Grant Year

(N/A)

Program Code

[(N/A)]- (N/A)

Recipient Organization
UNIV OF MINNESOTA
(N/A)
ST PAUL,MN 55108

Performing Department
Plant Pathology

Non Technical Summary
Legumes play a vital role in ecological and agricultural systems. Among cultivated crops, legumes are unique in their ability to fix atmospheric nitrogen through symbioses with rhizobial bacteria. Alfalfa (Medicago sativa) is a legume that occupies a key role as forage for livestock production. Alfalfa is the fourth most widely grown crop in the United States, with an annual value exceeding $8 billion. Unfortunately, the alfalfa genome has not yet been sequenced, and its autotetraploid genetics make it difficult to study in terms of genome organization and genetic analysis. The closely related M. truncatula is therefore often used as a model for genome studies. My research program will investigate the genomics of M. truncatula, emphasizing the use of genome sequence data to explore genome variation and the architecture of gene families important in plant-microbe interactions. We will use functional assays, de novo genome assembly, and bioinformatic analyses to characterize genes underlying symbiosis in M. truncatula. Legumes are noteworthy for the sophisticated symbioses they form with rhizobial bacteria (Sinorhizobium). However, existing knowledge about symbioses comes primarily from knockout mutants, an approach that can miss genes of subtle yet significant effect. This project seeks to discover the genes most likely to experience active selection and therefore to be important in the contemporary evolution of rhizobial and mycorrhizal symbioses. In earlier work, we used genome-wide association analysis (GWAS) to discover several strongly supported candidate loci, often with independent lines of evidence (expression profile, correlation with multiple traits, published symbiotic phenotype). In contrast to earlier GWAS studies, our results were based on whole-genome resequencing, which enabled SNP analysis at much higher density and without ascertainment bias.
We will now go on to test ~100 of these candidate loci through reverse genetic experiments involving Tnt1 insertion lines and Agrobacterium rhizogenes-based gene silencing. Structural variants (SVs) and copy number variants (CNVs) are both known to have major impacts on phenotypic variation. This matters greatly for the genomics of symbiosis because certain large gene families, especially NB-ARC disease resistance genes and nodule-specific cysteine-rich peptides, play important roles in symbiosis and other plant-microbe interactions. Unfortunately, SVs and CNVs are difficult to discover with confidence using medium-depth next-generation sequencing. Therefore, we will deeply sequence and de novo assemble 30 nodal M. truncatula accessions to discover SVs and CNVs with high confidence. Sequence-based variant discovery will be complemented by comparative genomic hybridization and experimental validation. Ultimately, SVs and CNVs will be imputed genome-wide for our entire panel of 250 Medicago accessions.

Animal Health Component

(N/A)

Research Effort Categories

Basic

100%

Applied

(N/A)

Developmental

(N/A)

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
201	1419	1080	100%

Knowledge Area
201 - Plant Genome, Genetics, and Genetic Mechanisms;

Subject Of Investigation
1419 - Leguminous vegetables, general/other;

Field Of Science
1080 - Genetics;

Keywords

copy number variation

genome resequencing

Goals / Objectives
Legumes play a vital role in ecological and agricultural systems. Among cultivated crops, legumes are unique in their ability to fix atmospheric nitrogen through symbioses with rhizobial bacteria. Indeed, legume-rhizobia symbioses contribute more than 90 Tg of fixed nitrogen per year worldwide, an amount that would require roughly 300 Tg of fuel (>$30 billion) annually if replaced using the Haber-Bosch process (Kinzig 1994). Because legumes are not nitrogen-limited, they produce remarkable levels of protein, a property that is both biologically and agriculturally significant. Nearly 33% of all nutritional nitrogen comes from legumes, and in many developing countries legumes are the single most important source of vegetable protein (Graham et al 2003). Legumes also provide a significant fraction of the world's edible oil and synthesize an impressive array of isoflavonoid and triterpene saponin compounds, chemicals with anti-cancer, anti-inflammatory, or cardiovascular health-promoting properties. Among legumes, alfalfa (Medicago sativa) occupies a key role as forage for livestock production. Alfalfa is the fourth most widely grown crop in the United States, with an annual value exceeding $8 billion (http://www.naaic.org/). When its value in mixtures with other forages is considered, alfalfa is actually equal in value to either wheat or soybeans. Unfortunately, the genome of alfalfa has not yet been sequenced. Moreover, alfalfa displays autotetraploid genetics, making it difficult to study in terms of genome organization and genetic analysis. Consequently, the closely related Medicago truncatula is often used as a model for genome studies. My research program investigates the genomics of M. truncatula, emphasizing the use of genome sequence data to explore genome variation and the architecture of gene families important in plant-microbe interactions, especially genes critical in symbiotic nitrogen fixation and resistance to disease pathogens. M. truncatula and its microbial partners are superb models for studying the biology of symbiosis. M. truncatula is widely recognized as an excellent model for legume genomics and has been the subject of a long and highly productive history of symbiosis research (Young and Udvardi 2009). M. truncatula played a key role in the initial description of the chemical dialogue underlying nodulation, and it is the system in which many of the known molecular factors in nodulation were originally described (Geurts et al 2005). The power of M. truncatula as a model for legume research expanded further in 2011, when a high-quality, BAC-based genome sequence for M. truncatula was published (Young 2011). The research proposed here extends the results of earlier CRIS / MAES work in which my colleagues and I used next-generation sequencing to explore sequence variation throughout the M. truncatula genome. We went on to perform genome-wide association mapping (GWAS) to discover candidate genes associated with rhizobial nodulation and other fitness traits. We will now go on to validate these candidates and explore other types of genome variation and their impact on nodulation and, more generally, on plant-microbe interactions. This research, which is long-term and broad in scope, relies primarily on NSF Plant Genome funding. The CRIS / MAES funding requested here is an essential complement. Specifically, CRIS / MAES funding provides partial salary support for the lead PI on the project (Young) plus the associate scientist who acts as overall lab manager.

GOAL 1. Identify and validate candidate genes playing a role in M. truncatula-rhizobium symbioses

In the preceding cycle of CRIS / MAES, my colleagues and I used genome-wide association analysis (GWAS) to discover several strongly supported candidate loci associated with rhizobial symbiosis. Many of these candidates were also supported by independent lines of evidence (expression profile, correlation with multiple traits, published symbiotic phenotype). In contrast to earlier GWAS studies, our results were based on whole-genome resequencing, which enabled single nucleotide polymorphism (SNP) analysis at much higher density and without ascertainment bias (Stanton-Geddes et al 2013). As a foundation for this work, we identified >6 million SNPs (approximately 750,000 in coding regions) present at a minor allele frequency (MAF) > 0.02 (Branca et al 2011). We used the entire set of SNPs to explore the genetic basis of variation in nodules per plant and rhizobial strain specificity, along with developmental traits such as flowering time. Phenotypic data were collected from 226 accessions grown in replicate, with each plant co-inoculated with two strains of S. meliloti (~2,000 plants assayed). GWAS was conducted using mixed-model analyses of variance as implemented in TASSEL, with confounding effects of demographic history (i.e., population structure) minimized by accounting for relatedness with a kinship matrix. The top 100 candidate SNPs (those with the most significant P values) include several within or adjacent to genes whose biological functions make them promising candidates for validation. Candidate SNPs tagged several nodulation-related genes, such as SERK2, MtnodGRP3, MtMMPL1, NFP, CaML3, and MtnodGRP3A. GWAS also identified numerous genes coding for nodule-specific cysteine-rich peptides (NCRs), proteins previously shown to play a role in Sinorhizobium differentiation in nodules (Haag et al 2011).
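The MAF threshold and the kinship correction above can be illustrated with a minimal sketch. It assumes genotypes are coded as alternate-allele dosages (0, 1, 2) per accession and builds a VanRaden-style genomic relationship matrix, one common form of kinship matrix used in mixed-model GWAS. This is not TASSEL's implementation, and all function names are hypothetical.

```python
from typing import List


def alt_allele_freq(genotypes: List[int]) -> float:
    """Alternate-allele frequency for one SNP; genotypes are dosages 0, 1, 2."""
    return sum(genotypes) / (2 * len(genotypes))


def maf(genotypes: List[int]) -> float:
    """Minor allele frequency: the rarer of the two allele frequencies."""
    p = alt_allele_freq(genotypes)
    return min(p, 1.0 - p)


def filter_by_maf(snps: List[List[int]], threshold: float = 0.02) -> List[List[int]]:
    """Keep SNPs whose minor allele frequency exceeds the threshold."""
    return [g for g in snps if maf(g) > threshold]


def vanraden_kinship(snps: List[List[int]]) -> List[List[float]]:
    """Kinship K = Z Z' / (2 * sum p(1-p)), where Z = G - 2p.

    `snps` holds one row per SNP, each row one dosage per accession.
    Assumes every SNP is polymorphic (filter first), so the denominator is > 0.
    """
    n = len(snps[0])
    k = [[0.0] * n for _ in range(n)]
    denom = 0.0
    for g in snps:
        p = alt_allele_freq(g)
        denom += 2 * p * (1 - p)
        z = [x - 2 * p for x in g]  # center each SNP by its expected dosage
        for i in range(n):
            for j in range(n):
                k[i][j] += z[i] * z[j]
    return [[v / denom for v in row] for row in k]


# Four largely inbred accessions (dosages 0 or 2); the last SNP is monomorphic.
snps = [[0, 0, 2, 2], [0, 2, 2, 2], [0, 0, 0, 0]]
kept = filter_by_maf(snps)
K = vanraden_kinship(kept)
```

In a mixed model, a matrix like `K` describes the expected covariance of the polygenic background among accessions, so that SNP effects are tested against relatedness rather than against population structure.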
We also identified SNPs that explain a significant portion of trait variance within genes not previously recognized as having a possible nodulation function, as well as within totally uncharacterized genes: important candidates for the novel gene discovery that is the focus of our next round of symbiosis research.

GOAL 2. Explore structural variation and characterize the genome architecture of M. truncatula symbiosis-related gene families

Among the plant factors essential for legume symbiosis are two key gene families: the NB-ARCs, which are also known to play a role in disease resistance, and the nodule cysteine-rich peptides (NCRs), which are observed only in Medicago and its close taxonomic relatives (Alunni 2007). Recent studies indicate central roles for both protein families in symbiosis (Haag 2011), while previous studies of genome variation have shown that both exhibit high levels of sequence and structural variation (Branca 2011, Lai 2010). Notably, these gene families are highly diverse not only at the level of SNP variation, but even more so at the level of structural and copy number variation. However, the role of this variation, especially its impact on symbiosis and other plant-microbe interactions, cannot be better defined without deeper resequencing combined with de novo genome assembly. Our earlier GWAS mapping targeted only SNP variation, even though structural variants (SVs) and copy number variants (CNVs) are both known to have major impacts on phenotypic variation. This matters greatly for the genomics of symbiosis and disease resistance because certain large gene families, namely NB-ARCs and nodule cysteine-rich peptides, play especially important roles in plant-microbe interactions. Unfortunately, SVs and CNVs are difficult to discover with confidence using medium-depth next-generation (Illumina) sequencing alone. Therefore, we will deeply sequence and de novo assemble 30 nodal M. truncatula accessions and then compare the assemblies directly to discover SVs and CNVs with high confidence.
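One reason read depth matters for CNV discovery can be sketched with a simple read-depth copy-number estimate. This is a minimal illustration, assuming read counts at a locus are Poisson-distributed and that all reads map uniquely; in real gene families, ambiguous mapping of reads from near-identical paralogs is an additional (often larger) problem that this sketch ignores. The function names and parameter values are hypothetical.

```python
import math
from typing import Tuple


def expected_reads(depth: float, locus_length: int, read_length: int) -> float:
    """Reads expected from a single-copy locus: depth * length / read length."""
    return depth * locus_length / read_length


def copy_number_estimate(observed_reads: int,
                         reads_per_copy: float) -> Tuple[float, float, float]:
    """Copy-number estimate with an approximate 95% interval.

    Under a Poisson model the standard error of a count n is sqrt(n), so the
    interval narrows in proportion to 1 / sqrt(depth).
    """
    cn = observed_reads / reads_per_copy
    half_width = 1.96 * math.sqrt(observed_reads) / reads_per_copy
    return cn, cn - half_width, cn + half_width


# A hypothetical 2 kb, three-copy locus sequenced with 100 bp reads.
cn10, lo10, hi10 = copy_number_estimate(600, expected_reads(10, 2000, 100))
cn100, lo100, hi100 = copy_number_estimate(6000, expected_reads(100, 2000, 100))
```

Both depths estimate three copies, but the interval at 100X is roughly three times narrower than at 10X, which is the statistical side of why deeper sequencing supports more confident CNV calls.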

Project Methods
METHODS 1. Identify and validate candidate genes playing a role in M. truncatula-rhizobium symbioses

Based on the results described earlier, we will test approximately 100 nodulation candidates through reverse genetic experiments involving Tnt1 insertion lines and Agrobacterium rhizogenes-based gene silencing. Lines produced by these transposon tagging and RNA silencing strategies will be examined through interaction assays with previously defined Sinorhizobium strains. Currently, more than 22,000 independent Tnt1 insertion lines have been generated at the Noble Foundation, where we have established a collaboration with project leaders Michael Udvardi and Jiangqi Wen. Tnt1 insertion lines are estimated to contain an average of 25 insertions per genome (Tadege et al 2008), so the Tnt1-tagged mutant collection represents ~525,000 independent insertions altogether, translating to insertions in ~90% of Medicago genes (Tadege et al 2009). We will begin by searching Noble's existing flanking sequence tag (FST) database; if mutants are lacking, we will work with Noble to design gene-specific primers to screen DNA pools. Direct association between Tnt1 insertions and candidates will be confirmed either by the presence of multiple independent Tnt1 insertions in the same gene model or by co-segregation between the Tnt1-disrupted target gene and the phenotype in test crosses that we will make. In parallel, my colleagues and I will use RNAi silencing technology to deliver hairpin constructs to Medicago roots; these generate dsRNA that degrades homologous target mRNA in a sequence-specific manner (Fusaro et al 2006). Briefly, a fragment of the target gene's coding region is PCR-amplified and cloned into a binary vector as an inverted repeat. This inverted repeat (or "hairpin") is then transcribed from a nodule-specific promoter. The construct is next introduced into a low-virulence Agrobacterium strain so as not to drastically alter root phenotype or confound downstream phenotyping.
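Two quantitative pieces of this reverse-genetics plan can be sketched in a few lines: the layout of an inverted-repeat (hairpin) construct, and the Poisson arithmetic linking average insertions per gene to the fraction of genes carrying at least one Tnt1 insert. Both are minimal sketches under stated assumptions: the spacer sequence is a placeholder (many hairpin vectors use an intron or other spacer between the arms), and the coverage model assumes insertions land independently and at random among genes, which real Tnt1 insertion does not strictly do. Function names are hypothetical.

```python
import math

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_hairpin(target_fragment: str, spacer: str) -> str:
    """Sense arm + spacer + antisense arm.

    When transcribed, the two arms are complementary and can pair into the
    dsRNA stem that triggers sequence-specific degradation of the target mRNA.
    """
    return target_fragment + spacer + reverse_complement(target_fragment)


def fraction_genes_tagged(insertions_per_gene: float) -> float:
    """Poisson probability that a gene carries at least one insertion: 1 - exp(-lambda)."""
    return 1.0 - math.exp(-insertions_per_gene)


hairpin = build_hairpin("ATGGCC", "TTTT")
```

Under this idealized model, tagging ~90% of genes corresponds to an average of ln(10), about 2.3 insertions per gene; biased insertion into some genes would raise the number needed.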
Upon Agrobacterium infection, a root transformed with the hairpin transgene is generated. The composite plant can then be inoculated with rhizobia to induce nodules. Using these techniques, we will knock down GWAS-discovered candidate gene transcripts in order to test their function in nodulation pathways. The hairpin RNA platform is a robust and rapid assay that has been developed over the past decade by legume researchers (Limpens et al 2004).

METHODS 2. Explore structural variation and characterize the genome architecture of M. truncatula symbiosis-related gene families

Our de novo sequencing will target 30 diverse and informative M. truncatula accessions as a basis for discovering structural variants (SVs) and copy number variants (CNVs) at high resolution and with high confidence. The strategy will use 100X Illumina coverage to compensate for short read lengths and bolster quality. We will employ a mix of fragment sizes to mitigate the assembly challenges posed by genomic repeats of differing lengths (Gnerre 2011). The shortest fragment size will be designed to generate overlapping read pairs that can be joined into longer, higher-quality sequences for contig assembly. Moderate and larger fragment sizes provide long-range connectivity for extending contigs and scaffolds. We have selected assemblers that are compatible with this mix of data types (Gnerre 2011): ALLPATHS-LG merges overlapping reads, ABySS tolerates broad insert-size distributions, and Celera and ABySS filter chimeric mate-pair reads. We will evaluate all assemblies by intrinsic and extrinsic (orthogonal) measures. Intrinsic measures include the number of incorporated reads, total bases in contigs, combined span of large contigs, contig N50, scaffold N50, and mate-constraint satisfaction. Extrinsic measures include concordance with reference sequences, including BACs, organelle sequences, transcript sequences, and other Medicago assemblies, keeping in mind the limitations of each reference.
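Contig and scaffold N50, two of the intrinsic measures listed above, follow the same simple rule: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of all assembled bases. A minimal sketch (the function name is hypothetical; scaffold N50 applies the same function to scaffold spans):

```python
from typing import List


def n50(lengths: List[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


contig_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
```

For `contig_lengths`, the total is 54 bases; the three longest contigs (10 + 9 + 8 = 27) reach half of that, so the N50 is 8.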
Accurate discovery of structural and copy number variants is a crucial prerequisite to understanding small-effect and large-effect phenotypic variation. Consequently, we will call a comprehensive set of SVs and CNVs across all 30 nodal accessions and then choose a subset for experimental validation to estimate our false-positive and false-negative rates. No assembler can correctly assemble every region of the genome, so we will examine potential regions of misassembly that might lead to inaccurate SV calls. Because M. truncatula has low levels of heterozygosity, regions of clustered heterozygosity indicate overassembly, such as occurs when repeats or members of gene families are collapsed: reads from distinct copies then pile onto a single assembled copy, and the differences among those copies appear as heterozygous sites. By aligning the reads back to the assembly they created, we can quickly flag regions that should be approached with caution. By assessing heterozygosity density in adjacent genes, we can efficiently identify genomic regions where the number of genes in the assembly may not reflect the number of genes in the actual genome. This not only flags regions where SVs should be interpreted with caution but also gives us the opportunity to explore these regions manually. For identifying and comparing SVs among accessions, the M. truncatula reference genome sequence (A17) is not an ideal point of comparison: preliminary data indicate numerous regions where A17 deviates in structure and content from other accessions, so a more neutral reference for Medicago accessions would be preferable. We will therefore begin by creating a pan-Medicago reference (Gan et al 2011). For this, we will generate a "minimal complete representation" reference that includes all novel sequence from any of the 30 accessions, so that no new sequence must be inserted in order to derive any one of those accessions from it. Our de novo assemblies and SV predictions will be complemented by comparative genomic hybridization (CGH) technology (Haun 2011).
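The heterozygosity screen described above can be sketched as a windowed count. This minimal sketch assumes heterozygous sites have already been called (0-based positions) from reads aligned back to their own assembly; it counts them in fixed, non-overlapping windows rather than per gene, and the window size and threshold are hypothetical tuning parameters, not values from this project.

```python
from typing import List, Tuple


def flag_overcollapsed_windows(het_positions: List[int], seq_length: int,
                               window: int, max_het: int) -> List[Tuple[int, int, int]]:
    """Return (start, end, het_count) for windows with more than max_het
    heterozygous sites, a signal of possibly collapsed repeats or gene copies
    in an otherwise largely homozygous genome."""
    n_windows = (seq_length + window - 1) // window
    counts = [0] * n_windows
    for pos in het_positions:
        counts[pos // window] += 1
    flagged = []
    for i, count in enumerate(counts):
        if count > max_het:
            start = i * window
            flagged.append((start, min(start + window, seq_length), count))
    return flagged


het_sites = [150, 12_010, 12_050, 12_300, 12_900, 30_000]
flags = flag_overcollapsed_windows(het_sites, seq_length=40_000, window=10_000, max_het=2)
```

Here only the 10-20 kb window, with four heterozygous sites, is flagged; a per-gene version would simply replace fixed windows with gene coordinates.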
The two platforms are complementary, in part, because they are based on different chemistries and subject to different biases. CGH relies on relative hybridization of genomic sequences to a pre-ascertained set of probes, whereas the sequencing platform relies on library construction and adequate read coverage. SVs detected by both platforms can be called with high confidence. To create a Medicago platform, we will leverage the sequence data from the de novo genomes to build a comprehensive set of probes, resulting in a one-million-feature Medicago genome tiling array based on Agilent technology. The array probes, 50-70 nt oligonucleotides, will be designed from non-repetitive sequence identified in the A17 genome plus the 30 de novo Medicago assemblies. The CGH platform should therefore be useful for mapping and cross-validating SVs that are present in at least one of the sequenced accessions. Approximately 100 putative structural and copy number variants will be subject to wet-lab verification. Preference will be given to variants expected to have biological importance, namely those that alter genes, especially symbiosis-related genes or genes expressed in nodules. Because of their high sequence similarity and highly rearranged nature, tandem clusters of NB-ARC and NCR genes are expected to pose particular challenges to the sequence assemblies on which our SVs are called, so a significant number of the regions we verify will involve these types of gene clusters. Candidate SVs will be verified by PCR using primers spanning the precise breakpoints identified by sequencing, outward-facing primers for suspected tandem duplications, and far-separated primers for large-scale deletions, which should yield a product only if the deletion is actually present. PCR products will be cloned and sequenced to verify breakpoints. Finally, SVs and CNVs will be imputed genome-wide for our entire ~250-accession GWAS panel.
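Deciding whether a sequence-based SV call and a CGH call describe the same event requires a matching rule. One common convention is a minimum reciprocal overlap; the 50% threshold below is an assumption, as is ignoring SV type, and the function names are hypothetical. A minimal sketch:

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (sequence name, start, end), half-open


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Smaller of overlap/len(a) and overlap/len(b); 0 if they do not overlap."""
    if a[0] != b[0]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def concordant_calls(seq_calls: List[Interval], cgh_calls: List[Interval],
                     min_ro: float = 0.5) -> List[Interval]:
    """Sequence-based calls supported by at least one CGH call."""
    return [s for s in seq_calls
            if any(reciprocal_overlap(s, c) >= min_ro for c in cgh_calls)]


seq_calls = [("chr1", 1000, 2000), ("chr1", 5000, 5200), ("chr2", 100, 900)]
cgh_calls = [("chr1", 1100, 2100), ("chr1", 5000, 9000)]
both = concordant_calls(seq_calls, cgh_calls)
```

The reciprocal rule keeps the first call (90% overlap in both directions) but rejects the second, which lies entirely inside a much larger CGH segment: a one-sided overlap test would have accepted that mismatch.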

Progress 10/01/13 to 09/30/16

Outputs
Target Audience: Legume Researchers, Plant Genomicists, Plant Breeders, Plant Pathologists, Bioinformaticists and Computational Biologists

Changes/Problems: Nothing Reported

What opportunities for training and professional development has the project provided? The Young Lab hosted one graduate student and one undergraduate, both under-represented minority females, on the project during 2015. During this reporting period, the Young Lab also hosted two post-doctoral researchers.

How have the results been disseminated to communities of interest? My colleagues and I have given talks at multiple scientific conferences and seminars, including the Plant and Animal Genome Conference 2016 (San Diego, CA) and the Medicago Genetics and Genomics Conference (Ardmore, OK). The results of Medicago genome sequencing and gene family analysis have been submitted and are currently under review.

What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals?

GOAL 1. Identify and validate candidate genes playing a role in M. truncatula-rhizobium symbioses.
ACCOMPLISHMENTS: We used a combination of gene-disruption platforms (Tnt1 retrotransposons, hairpin RNA-interference constructs, and CRISPR/Cas9 nucleases) together with randomized, well-replicated experiments to evaluate the function of genes that an earlier GWAS in Medicago truncatula had identified as candidates contributing to variation in the symbiosis between legumes and rhizobia. We evaluated ten candidate genes found in six clusters of strongly associated candidate SNPs, selected on the basis of their strength of statistical association, proximity to annotated gene models, and root or nodule expression. We found statistically significant effects on nodule production for three candidate genes, each validated in two independent mutants. The annotated functions of these three genes suggest that their contributions to quantitative variation in nodule production occur through processes not previously connected to nodulation, including phosphorus supply and salicylic acid-related defense responses. These results demonstrate the utility of GWA combined with reverse mutagenesis technologies for discovering and validating genes that contribute to naturally occurring variation in quantitative traits. They also highlight the potential for GWA to complement forward genetics in identifying the genetic basis of ecologically and economically important traits.

GOAL 2. Explore structural variation and characterize the genome architecture of M. truncatula symbiosis-related gene families.
ACCOMPLISHMENTS: Genome-wide synteny based on 15 diverse M. truncatula genome assemblies effectively detected different types of structural variants, indicating that as much as 22% of the genome is involved in large structural changes, altogether affecting 28% of gene models.
A total of 63 million base pairs (Mbp) of novel sequence was discovered, expanding the reference genome space for Medicago by 16%. Pan-genome analysis revealed that 42% (180 Mbp) of genomic sequence is missing in one or more accessions, while pan-proteome analysis identified 67% (50,700) of all ortholog groups as dispensable, estimates comparable to those of recent studies in maize and soybean. Environmentally sensitive gene families (ESGFs) were found to be enriched in the accession-specific gene pool. Among ESGFs, the nucleotide-binding site leucine-rich repeat gene family harbors the highest levels of nucleotide diversity, large-effect single-nucleotide changes, protein diversity, and presence/absence variation. However, the leucine-rich repeat and heat shock families are also disproportionately affected by large-effect single-nucleotide changes and show even higher levels of copy number variation.
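The core/dispensable partition behind the pan-proteome figures can be sketched from a presence/absence table of ortholog groups. In this minimal sketch, a group is "core" if present in every accession and "dispensable" if missing from at least one; accession-specific groups (present in exactly one accession) are reported as a subset of the dispensable set. The input format and function name are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def partition_pan_genome(presence: Dict[str, Set[str]],
                         n_accessions: int) -> Tuple[List[str], List[str], List[str]]:
    """Split ortholog groups into core, dispensable, and accession-specific lists.

    `presence` maps each ortholog group to the set of accessions containing it.
    """
    core, dispensable, specific = [], [], []
    for group, members in presence.items():
        if len(members) == n_accessions:
            core.append(group)
        else:
            dispensable.append(group)
            if len(members) == 1:
                specific.append(group)
    return core, dispensable, specific


presence = {"OG1": {"A", "B", "C"}, "OG2": {"A", "B"}, "OG3": {"C"}}
core, dispensable, specific = partition_pan_genome(presence, n_accessions=3)
dispensable_fraction = len(dispensable) / len(presence)
```

Enrichment of a gene family in the accession-specific pool can then be tested by comparing its share of `specific` with its share of all groups.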

Publications

Type: Journal Articles Status: Awaiting Publication Year Published: 2016 Citation: Burghardt LT, Young ND, Tiffin P (2016) A guide to genome-wide association mapping in plants. Current Protocols in Plant Biology (In press).
Type: Journal Articles Status: Published Year Published: 2016 Citation: Young ND, Zhou P, Silverstein KAT (2016) Exploring structural variants in environmentally-sensitive gene families. Current Opinion in Plant Biology 30: 19-24, http://dx.doi.org/10.1016/j.pbi.2015.12.012.

Progress 10/01/14 to 09/30/15

Outputs
Target Audience: Legume Researchers, Plant Genomicists, Plant (especially Legume) Breeders, Plant Pathologists, Bioinformaticists and Computational Biologists

Changes/Problems: Nothing Reported

What opportunities for training and professional development has the project provided? My lab has hosted three undergraduates, including two under-represented minorities, who worked on the project during 2015. During this reporting period, I also advised two graduate students, including one under-represented minority female, as well as two post-doctoral researchers. One graduate student completed his studies during the past year and is now working at the Minnesota Supercomputing Institute as a bioinformatics researcher and consultant.

How have the results been disseminated to communities of interest? My colleagues and I have given talks at multiple scientific conferences and seminars, including the Plant and Animal Genome Conference 2015 (San Diego, CA) and the Plant Genome Evolution Conference (Amsterdam, Netherlands). The results of Medicago genome sequencing and gene family analysis were submitted, and a new manuscript responding to the reviewers' comments has been prepared and is nearing completion for re-submission. A separate manuscript describing validation of symbiosis candidates is in progress and expected to be ready for submission by the end of January 2016.

What do you plan to do during the next reporting period to accomplish the goals?
*** Publication: Submit the two central manuscripts capturing the most significant accomplishments of the project so far. Prepare and submit another new manuscript describing a novel approach to assembling genome sequence data from a combination of next-generation short DNA sequence reads and the newer long-read technology based on the PacBio sequencing system.
*** Research: Complete the validation of all remaining symbiosis candidate genes, including the creation of multiple versions (alleles) of each candidate gene in order to obtain independent evidence supporting the observed biological function of positive candidates. Further characterize gene families that have expanded and been shown to play a role in symbiosis/nodulation through additional genome sequencing, sequence variant analysis, and a new round of functional genomic testing.
*** Training: Continue mentoring and advising (1-3) undergraduates, (1) graduate student, and (2) post-doctoral researchers.

Impacts
What was accomplished under these goals?

GOAL 1. Identify and validate candidate genes playing a role in M. truncatula-rhizobium symbioses. Of the 12 top gene candidates previously associated with symbiotic nodulation through genome-wide association mapping (GWAS), all 12 have now been tested multiple times. For these functional tests, my colleagues and I knock out (or knock down) the candidate gene using one of the following strategies: 1) RNAi, in which a double-stranded hairpin RNA triggers sequence-specific degradation of the gene's messenger RNA; 2) Tnt1, in which the gene itself is disrupted by a retrotransposon insertion; 3) TALENs and CRISPRs, in which the gene is targeted by either a protein or a protein-RNA complex that produces a double-stranded break. So far, five of the genes have shown statistically significant effects on nodulation. This is especially significant because, to our knowledge, it represents the first time GWAS has been used to discover genes involved in symbiosis that were then validated through functional genomics tests.

GOAL 2. Explore structural variation and characterize the genome architecture of M. truncatula symbiosis-related gene families. A total of 16 different M. truncatula genomes have been re-sequenced using next-generation sequencing technology and assembled de novo into high-quality genome sequence assemblies. Gene families predicted to play a role in plant-microbe interactions have been analyzed in detail, with special emphasis on their genome architecture, the structure of their multi-gene clusters, and the types of structural variants that underlie differences among gene family members. My colleagues and I have found that these gene families are the most variable in plant genomes, but that genes putatively involved in plant disease resistance are more variable than those involved in symbiosis.
One gene family that we are characterizing, which is highly specific to Medicago and its close relatives, is especially interesting because its gene products can either act as antibiotics that kill inappropriate rhizobial bacteria or, in appropriate ones, induce bacterial differentiation into a symbiotic partner.

Publications

Type: Journal Articles Status: Awaiting Publication Year Published: 2015 Citation: Martinez-Vaz B, Denny R, Sadowsky M, Young ND (2015) An alternative approach to identification of unknowns: designing a protocol to verify the identities of nitrogen fixing bacteria. Journal of Microbiology and Biology Education, In press.
Type: Journal Articles Status: Published Year Published: 2015 Citation: Kang Y, Sakiroglu M, Krom N, Stanton-Geddes J, Wang M, Lee Y-C, Young ND, Udvardi M (2015) Genome-wide association of drought-related and biomass traits with HapMap SNPs in Medicago truncatula. Plant Cell Environ. doi: 10.1111/pce.12520.
Type: Journal Articles Status: Published Year Published: 2015 Citation: Gentzbittel L, Andersen SU, Ben C, Rickauer M, Stougaard J, Young ND (2015) Naturally occurring diversity helps to reveal genes of adaptive importance in legumes. Frontiers in Plant Science, 6: 269. doi: 10.3389/fpls.2015.00269
Type: Journal Articles Status: Published Year Published: 2015 Citation: Bonhomme M, Boitard S, San Clemente H, Dumas B, Young ND, Jacquet C (2015) Genomic signature of selective sweeps reveals adaptation of Medicago truncatula to root-associated microorganisms. Molecular Biology and Evolution, doi: 10.1093/molbev/msv092.
Type: Journal Articles Status: Under Review Year Published: 2016 Citation: Young ND, Zhou P, Silverstein KAT (2016) Exploring structural variants in environmentally-sensitive gene families. Current Opinion in Plant Biology (In review).

Progress 10/01/13 to 09/30/14

Outputs
Target Audience: Crop Breeders, primarily soybean breeders. Plant Pathologists, primarily scientists interested in soybean cyst nematodes. Plant Geneticists, primarily scientists working on the evolution of disease resistance genes, especially in legume species. Plant Genomicists, targeting researchers carrying out large-scale, whole-genome sequencing projects of plant species, especially legumes. Plant Bioinformaticists, especially programmers developing software for discovery and analysis of genes involved in plant-microbe interactions.

Changes/Problems: Nothing Reported

What opportunities for training and professional development has the project provided?
UNDERGRADUATE SUMMER RESEARCH FELLOWS. As part of the University of Minnesota's Life Sciences Summer Undergraduate Research Program (LSSURP), the Young Lab co-hosted three summer students who worked on the Medicago and Rhizobium projects. These students came from either Hamline University, a primarily undergraduate liberal arts institution in St. Paul, MN, or the University of Puerto Rico. As part of their summer projects, students participated in workshops on choosing and applying to graduate schools, giving scientific presentations, and good practice in scientific professionalism. At the end of the summer, students either gave a presentation about their research or presented posters.
GRADUATE STUDENT TRAINING. Three graduate students in the Young Lab (Joseph Guhlin, Diana Trujillo, Peng Zhou) work on this project, all working toward their Ph.D. degrees. During this reporting cycle, Guhlin and Zhou had the opportunity to present posters at national/international meetings (primarily the Plant and Animal Genome Conference). They also participated in University of Minnesota courses focused on successful teaching and on scientific professionalism. The students participate in, and regularly lead, lab meeting discussions about their own research and/or journal club presentations.
POST-DOCTORAL TRAINING. Post-doctoral researcher Shaun Curtin leads the work on nodulation gene validation. During this project period, he presented his work twice, once in the Plant Pathology departmental seminar series and once at the Plant and Animal Genome Conference.
How have the results been disseminated to communities of interest? Project PI Nevin Young gave invited presentations about this research at the Plant and Animal Genome (PAG) Conference (San Diego, January 2014) and the International Conference on Legume Genetics and Genomics (Saskatoon, July 2014). Post-doctoral associate Shaun Curtin gave an invited presentation about his work at the PAG Conference and in the Plant Pathology departmental seminar series. Graduate students Guhlin and Zhou presented project posters at the PAG Conference. Two refereed publications and four abstracts describing the project's research were published during 2014.
What do you plan to do during the next reporting period to accomplish the goals? GENOME SEQUENCING, VARIANT DISCOVERY AND GENE FAMILY ANALYSIS. Genome resequencing, structural variant discovery, and characterization of complex gene families will continue until all 30 targeted accessions have been completed. In the process, we will develop novel approaches to assembling plant genome sequences that combine short-read (Illumina) and long-read (PacBio) technologies. A draft pan-genome for Medicago truncatula will be created as a basis for exploring the cysteine-rich peptide (CRP) and NBS-LRR (disease resistance) gene families in Medicago.
CANDIDATE GENE VALIDATION. Preliminary validation work has indicated that five of the genes previously discovered through genome-wide association studies show a phenotype in lines carrying gene knockout/knockdown constructs.
These early results will be repeated and cross-validated by lines with additional knockout/knockdown alleles to provide convincing evidence for nodulation-related phenotypes.

Impacts
What was accomplished under these goals? STRUCTURAL VARIATION AND NODULATION CANDIDATE GENE DISCOVERY. We identified copy-number variants (CNVs) in M. truncatula HapMap accessions and performed a genome-wide association study (GWAS) on nodulation-related traits. CNVs were identified by comparing read depth in each accession with the corresponding reference read depth. These CNV polymorphisms were then coded as SNP-like markers and combined with the published Medicago HapMap SNPs. An association analysis of this combined variant set, conducted with TASSEL, identified a CNV (a deletion relative to the reference genome) of a nodule-specific cysteine-rich (NCR) peptide gene that is associated with reduced total nodule count. This NCR deletion was validated by comparison to de novo assemblies. The observed CNV events had not previously been tagged by SNP calls and are in low linkage disequilibrium with nearby SNPs. These results indicate that CNVs may contribute to phenotypic variation that SNP-only methods have not captured, and hint that other structural variants may also play an important role in phenotype. GENOME RESEQUENCING IN MEDICAGO. Previous studies using whole-genome sequence data to identify sequence polymorphisms (SNPs and short insertions/deletions) have relied on mapping short reads to a single reference (ecotype A17). However, the limitations of read-mapping approaches have hindered variant detection and characterization in repeat-rich regions as well as in highly divergent regions. Studies of large gene families are hindered for the same reason, because of high sequence similarity among family members and high divergence among accessions. During this project period, we sequenced and created de novo assemblies for the genomes of 15 natural M. truncatula accessions.
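The read-depth comparison and linkage-disequilibrium check described under STRUCTURAL VARIATION above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's pipeline: read depth is assumed to be pre-computed per fixed-size window for each accession and for the reference (A17) library, each depth vector is normalized by its own median, and the 0.5 and 1.5 ratio thresholds are hypothetical. Deletion calls are then coded 0/1 across accessions, like a biallelic SNP, and LD with a nearby SNP is measured as the squared correlation (r²) between genotype vectors. The association test itself was run in TASSEL and is not reproduced here. All function names are illustrative.

```python
from statistics import median

# Hypothetical thresholds (not stated in the report): a normalized depth ratio
# below DEL_RATIO is called a deletion, above DUP_RATIO a duplication.
DEL_RATIO = 0.5
DUP_RATIO = 1.5


def call_cnv(sample_depth, ref_depth, del_ratio=DEL_RATIO, dup_ratio=DUP_RATIO):
    """Call CNVs per window: -1 = deletion, 0 = no call, 1 = duplication.

    Each depth vector is divided by its own median window depth so that
    differences in total sequencing coverage between libraries cancel out.
    """
    s_med = median(sample_depth)
    r_med = median(ref_depth)
    if s_med == 0 or r_med == 0:
        raise ValueError("median depth is zero; coverage too low to normalize")
    calls = []
    for s, r in zip(sample_depth, ref_depth):
        if r == 0:
            calls.append(0)  # no reference coverage: window cannot be assessed
            continue
        ratio = (s / s_med) / (r / r_med)
        if ratio < del_ratio:
            calls.append(-1)
        elif ratio > dup_ratio:
            calls.append(1)
        else:
            calls.append(0)
    return calls


def deletion_marker(calls_per_accession, window):
    """Code one window as a SNP-like marker across accessions (1 = deleted)."""
    return [1 if calls[window] == -1 else 0 for calls in calls_per_accession]


def r_squared(x, y):
    """Squared Pearson correlation between two 0/1 genotype vectors (LD r^2)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return cov * cov / (vx * vy)


if __name__ == "__main__":
    # Toy depths for five windows: window 2 looks deleted, window 4 duplicated.
    ref = [10, 10, 10, 10, 10]
    acc = [10, 10, 2, 10, 25]
    print(call_cnv(acc, ref))  # [0, 0, -1, 0, 1]

    # A CNV marker that a nearby SNP does not tag (r^2 = 0).
    print(r_squared([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.0
```

Normalizing each library by its median keeps an accession with higher overall coverage from looking like a genome-wide duplication. An r² near 0, as in the toy example, is the pattern the report describes: a CNV marker that nearby SNPs do not tag, so SNP-only association would miss it.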
Genome-wide pairwise alignments and the construction of synteny blocks enabled accurate detection of SNPs, short InDels, and large structural variants (large InDels, copy-number changes, and translocations). Nucleotide substitution rates (SNPs per bp, ranging from 0.63% in HM058 to 2.37% in HM340) were dramatically higher than previous estimates based only on aligning next-generation sequencing reads to the reference genome. Among the most variable genes are two defense-related gene families: the nucleotide-binding site leucine-rich repeat (NBS-LRR) gene family and the nodule-specific cysteine-rich peptide (CRP) gene family. Whereas structural changes within genes are frequent in longer genes such as NBS-LRRs, family expansion/contraction is an important source of variation for shorter genes such as CRPs.
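To make the "SNPs per bp" figures concrete, the sketch below computes a substitution rate from a gapped pairwise alignment of an accession's assembly against the A17 reference. It is a minimal illustration under assumptions: the alignment is given as two equal-length strings (a hypothetical, simplified input; real whole-genome aligners emit richer formats), and columns containing a gap or an N are excluded from both numerator and denominator. The report does not describe exactly how its rates were computed, and `snp_rate` is an illustrative name.

```python
def snp_rate(ref_aln: str, qry_aln: str) -> float:
    """SNPs per aligned bp from a gapped pairwise alignment.

    Columns with a gap ('-') or an ambiguous base ('N') in either sequence are
    skipped, so short InDels are neither counted as substitutions nor added to
    the denominator.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    aligned = 0
    snps = 0
    for a, b in zip(ref_aln.upper(), qry_aln.upper()):
        if a in "-N" or b in "-N":
            continue
        aligned += 1
        if a != b:
            snps += 1
    return snps / aligned if aligned else 0.0


if __name__ == "__main__":
    # One mismatch in 10 aligned columns.
    print(f"{snp_rate('ACGTACGTAC', 'ACGTTCGTAC'):.2%}")  # 10.00%
    # One gap column skipped, one mismatch in the remaining 8 columns.
    print(snp_rate("ACGT-ACGT", "ACGTTACCT"))  # 0.125
```

Under this definition, the 2.37% reported for HM340 corresponds to roughly 2.4 SNPs per 100 aligned bp. Because an assembly-to-assembly alignment can cover divergent regions where short reads fail to map, it can count substitutions that a read-mapping approach would miss.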

Publications

Type: Journal Articles Status: Published Year Published: 2014 Citation: Trujillo DI, Silverstein KAT, Young ND (2014) Genomic characterization of the LEED..PEEDs, a gene family unique to Medicago lineage. G3: Genes | Genomes | Genetics. doi:10.1534/g3.114.011874.
Type: Journal Articles Status: Published Year Published: 2014 Citation: Yoder JB, Stanton-Geddes J, Zhou P, Briskine R, Young ND, Tiffin P (2014) Genomic signature of local adaptation to climate in Medicago truncatula. Genetics 196(4):1263-1275 doi: 10.1534/genetics.113.159319.
Type: Journal Articles Status: Published Year Published: 2014 Citation: Bao Y, Vuong T, Meinhardt C, Tiffin P, Denny R, Chen S, Nguyen HT, Orf JH, Young ND (2014) Potential of association mapping and genomic selection to explore PI88788 derived soybean cyst nematode resistance. The Plant Genome 7(3). doi:10.3835/plantgenome2013.11.0039.
Type: Conference Papers and Presentations Status: Published Year Published: 2014 Citation: Zhou P, et al. (2014) The Medicago pan-genome reveals large-scale variation. Plant and Animal Genome XXII (San Diego, CA, January 2014), P348.
Type: Conference Papers and Presentations Status: Published Year Published: 2014 Citation: Curtin S, et al. (2014) A genome engineering toolbox for legume functional genomics. Plant and Animal Genome XXII (San Diego, CA, January 2014), P309.
Type: Conference Papers and Presentations Status: Published Year Published: 2014 Citation: Guhlin J, et al. (2014) ODG: A graph database generator for omics data. Plant and Animal Genome XXII (San Diego, CA, January 2014), P985.
Type: Conference Papers and Presentations Status: Published Year Published: 2014 Citation: Young N (2014) Exploring nodulation through genome-wide association studies in Medicago truncatula. Plant and Animal Genome XXII (San Diego, CA, January 2014), W480.