Progress 09/15/13 to 09/14/17
Outputs Target Audience: The audiences included: a) Researchers in the U.S. and Canada: Bean Improvement Cooperative (biennial meetings, 2015 in Niagara Falls, Canada, and 2017 in East Lansing, Michigan), American Society of Agronomy (annual meeting 2015, Minneapolis), Stanford University, University of California, Riverside, National Association of Plant Breeders (Davis, 2017); b) Researchers internationally: Plant and Animal Genome Meeting (San Diego, CA, 2015), Gansu University scientist visiting UC Davis (2016), PanAfrican Grain Legume Conference (Zambia, 2016); c) Growers/farmers, mainly in California (yearly field day at the end of August). The researchers include breeders, geneticists, pathologists, and gene bank curators. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? One postdoc: Dr. Andrea Ariani, in genotyping-by-sequencing and bioinformatics (see Ariani and Gepts 2015; Ariani et al. 2016, 2018; Ariani and Gepts, submitted). One graduate student: Dr. Jorge Berny Mier y Teran, in the reaction of wild beans to drought stress in comparison with domesticated beans (see Berny Mier y Teran et al. 2018). How have the results been disseminated to communities of interest? See above, under target audiences. What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
1.1 Objective 1: Use existing SNP platform to compare genetic diversity in wild and domesticated types of the Andean and Mesoamerican gene pools. The existing platform consisted of a SNP chip, i.e., the BARCBean6K_3 Infinium SNP array (Song et al. 2015). This SNP chip was used to compare wild and domesticated accessions in the core collection of the USDA Phaseolus bean collection, maintained at the Western Regional Plant Introduction Station in Pullman, WA. A STRUCTURE analysis identified K = 3 or K = 7 groups in this collection. The K = 3 groups represented the Andean domesticated group and the ecogeographic races Mesoamerica and Durango of the Middle American domestication. For K = 7, two of the three K3 groups (K3.1 and K3.3) were subdivided into smaller groups, whereas K3.2 remained unsubdivided. The Andean group K3.1 was subdivided into two groups: K7.4 and K7.7. Based on seed type (color, shape, and size; Fig. S4) and phaseolin type, these two groups were Andean in origin. With one exception, group K7.4 originated in Ecuador and Peru. The seeds were large, especially in Ecuador, and round, suggesting they may be representatives of race Peru (Singh et al. 1991; Fig. S4). The geographic distribution of groups K7.1 and K7.5 differed in that K7.1 was present in Mexico but also had a relatively important representation in the northern countries of Central America (Guatemala, Honduras, and Nicaragua). K7.5, on the other hand, was distributed almost entirely in Mexico. Inspection of the seed types of these two groups suggests that K7.5 represents eco-geographic race Durango of the northern, more arid highlands of Mexico. The presence of wild beans from Mexico, however, seems incongruous in a core collection devoted primarily to domesticated types. Either these wild bean accessions should be removed, or additional wild beans representing the populations distributed in Central America and the Andes should be included in this core collection.
Recent studies have examined the diversity of wild beans (Tohme et al. 1996; Kwak and Gepts 2009). We recommend that this core collection be revised to broaden its coverage. 1.2 Objective 2: Develop a Genotyping-by-Sequencing (GBS) pipeline for further SNP discovery. Genotyping-by-sequencing (GBS) represents a simple, cost-effective, and highly multiplexed alternative for species with or without an available reference genome. However, this technology requires specific optimization for each species, especially for the restriction enzyme (RE) used. Here we report on the application of GBS in a test experiment with 18 genotypes of wild and domesticated Phaseolus vulgaris. After in silico digestion of the P. vulgaris reference genome sequence with different REs, we selected CviAII as the most suitable RE for GBS in common bean, based on the high frequency and even distribution of its restriction sites. To check the applicability of the GBS protocol using CviAII, a test experiment was performed with 18 wild and domesticated Phaseolus vulgaris genotypes belonging to both the Andean and Mesoamerican gene pools, and including a representative of the wild ancestral gene pool from northern Peru, G21245. A total of 47,838 variants (61% of those identified) were located in non-repetitive regions of the genome, divided between 44,875 (94%) SNPs, 1,940 (3%) deletions, and 1,693 (3%) insertions. The 23,273 SNPs and InDels located in genic sequences identified 11,027 different genes, with an average of 2 variants per gene. The phylogenetic analysis based on the identified SNPs and InDels was clearly consistent with the division into different gene pools and into domesticated and wild lines, and was supported by high bootstrap values. In particular, the phylogenetic tree was automatically rooted with the ancestral genotype G21245 from northern Peru. Overall, phylogenetic analysis of the variants identified using GBS with CviAII appeared reliable and supported previous genetic diversity information about this species.
We concluded that GBS is a simple, cost-effective, and highly multiplexed protocol for plant genotyping using NGS technologies. Even though the use of a frequent-cutting, methylation-insensitive enzyme will require higher genome coverage, the small genome size of common bean and the results presented in this study clearly show the advantages of using CviAII for GBS in common bean. 1.3 Objective 3: Use the GBS pipeline to further compare diversity, divergence, and adaptation in the wild and domesticated common bean gene pools. The wild progenitor of common bean has an exceptionally large distribution, from northern Mexico to northwestern Argentina, which is unusual among crop wild progenitors. This research sought to document the major range expansion events that led to this distribution and the associated environmental changes. Using genotyping-by-sequencing and geographic information systems applied to a sample of 246 accessions of wild P. vulgaris, including 157 genotypes of the Mesoamerican, 77 of the southern Andean, and 12 of the Northern Peru-Ecuador gene pools, we identified five subpopulations based on ~20,000 SNPs. Three of these subpopulations belong to the Mesoamerican gene pool (Northern and Central Mexico; Oaxaca; and Southern Mexico, Central America, and northern South America) and one each to the Northern Peru-Ecuador (PhI) and the southern Andean gene pools. The data suggest that water-related traits, such as drought tolerance or water-use efficiency, may be more common in the MW1 and MW2 groups and in their derived domesticates, such as race Durango (Singh et al. 1991). In contrast, the data are consistent with adaptation to the relatively cooler and moister environments encountered by Andean wild beans and their domesticates, e.g., in Colombia (Debouck et al. 1993) and western Europe (the Netherlands; Zeven 1997).
The inferred divergence time between the Andean and Mesoamerican gene pools averaged T1 ~ 87,000 years, with a 95% confidence interval (CI) of 86,635-88,186 years and a highest posterior density (HPD) interval for the estimated divergence time of 50,008-168,809 years. The estimated divergence time between the Ancestral and the main common bean group averaged T2 ~ 373,000 years (95% CI: 371,799-374,321; 95% HPD: 300,009-505,122). Thus, the Ecuador-N. Peru group represents an early separation from the main wild gene pool of P. vulgaris, consistent with the presence of ancestral phaseolin types (Kami et al. 1996) endemic to this group, as well as the sharing of ancestral chloroplast DNA between this group and the Mesoamerican gene pool (Chacon et al. 2007). Given that the core area of the genus Phaseolus as defined by Maréchal et al. (1978) is located in Mesoamerica (Delgado-Salinas et al. 1999; Freytag and Debouck 2002), the existence of species or populations outside this area suggests one or more dispersal events. In general, we propose that the current distribution of wild P. vulgaris has been achieved by seed dispersal at three spatial scales, each associated with its own temporal scale. Distance and frequency of seed dispersal are inversely correlated. In turn, these long-distance migration events have exposed wild beans to different climates and soil types. The northward migration, in particular, has led to potential adaptation to hotter and drier conditions. Further experimentation is necessary to determine the value of introgression from these wild beans into the domesticated gene pool.
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2015
Citation:
Ariani A, Gepts P 2015. Genome-wide identification and characterization of aquaporin gene family in common bean (Phaseolus vulgaris L.). Mol Genet Genomics 290 (5): 1771-1785. DOI: 10.1007/s00438-015-1038-2
- Type:
Journal Articles
Status:
Published
Year Published:
2015
Citation:
Kole C, Muthamilarasan M, Henry R, Edwards D, Sharma R, Abberton M, Batley J, Bentley A, Blakeney M, Bryant J, Cai H, Cakir M, Cseke LJ, Cockram J, Oliveira ACd, Pace CD, Dempewolf H, Ellison S, Gepts P, Greenland A, Hall A, Hori K, Howe GT, Hughes S, Humphreys MW, Iorizzo M, Ismail AM, Marshall A, Mayes S, Nguyen HT, Ogbonnaya FC, Ortiz R, Paterson AH, Simon PW, Tohme J, Tuberosa R, Valliyodan B, Varshney RK, Wullschleger SD, Yano M, Prasad M. Application of genomics-assisted breeding for generation of climate resilient crops: Progress and prospects. Frontiers in Plant Science 6: 563. doi:10.3389/fpls.2015.00563
- Type:
Journal Articles
Status:
Published
Year Published:
2015
Citation:
Abberton M, Batley J, Bentley A, Bryant J, Cai H, Cockram J, Costa de Oliveira A, Cseke LJ, Dempewolf H, De Pace C, Edwards D, Gepts P, Greenland A, Hall AE, Henry R, Hori K, Howe GT, Hughes S, Humphreys M, Lightfoot D, Marshall A, Mayes S, Nguyen HT, Ogbonnaya FC, Ortiz R, Paterson AH, Tuberosa R, Valliyodan B, Varshney RK, Yano M. Global agricultural intensification during climate change: a role for genomics. Plant Biotechnology Journal: 1-4. doi: 10.1111/pbi.12467
- Type:
Journal Articles
Status:
Published
Year Published:
2016
Citation:
Ariani A, Berny JC, Gepts P (2016) Genome-wide identification of SNPs and copy number variation in common bean (Phaseolus vulgaris L.) using genotyping-by-sequencing (GBS). Molecular Breeding 36 (online): 87 (11 pages). DOI: 10.1007/s11032-016-0512-9
- Type:
Journal Articles
Status:
Published
Year Published:
2017
Citation:
Rendón-Anaya M, Montero-Vargas JM, Saburido-Álvarez S, Vlasova A, Capella-Gutierrez S, Ordaz-Ortiz JJ, Aguilar OM, Vianello-Brondani RP, Santalla M, Delaye L, Gabaldón T, Gepts P, Winkler R, Guigó R, Delgado-Salinas A, Herrera-Estrella A. Genomic history of the origin and domestication of common bean unveils its closest sister species. Genome Biology 18: 60. DOI: 10.1186/s13059-017-1190-6
- Type:
Journal Articles
Status:
Published
Year Published:
2017
Citation:
Rendón-Anaya M, Herrera-Estrella A, Gepts P, Delgado-Salinas A (2017) A new species of Phaseolus (Leguminosae, Papilionoideae) sister to Phaseolus vulgaris, the common bean. Phytotaxa 313: 259-266. DOI: 10.11646/phytotaxa.313.3.3
- Type:
Book Chapters
Status:
Published
Year Published:
2017
Citation:
Gepts P. Genetic aspects of crop domestication. In: Hunter D, Guarino L, Spillane C, McKeown PC (eds) Routledge Handbook of Agricultural Biodiversity. Routledge/Taylor & Francis, New York
- Type:
Journal Articles
Status:
Published
Year Published:
2018
Citation:
Berny Mier y Teran JC, Konzen ER, Medina V, Palkovic A, Ariani A, Tsai SM, Gilbert ME, Gepts P (2018) Root and shoot variation in relation to potential intermittent drought adaptation of Mesoamerican wild common bean (Phaseolus vulgaris L.). Annals of Botany doi: 10.1093/aob/mcy221
- Type:
Journal Articles
Status:
Published
Year Published:
2018
Citation:
Ariani A, Berny Mier y Teran J, Gepts P. Spatial and temporal scales of range expansion in wild Phaseolus vulgaris. Molecular Biology and Evolution 35 (1): 119-131 doi: 10.1093/molbev/msx273
- Type:
Journal Articles
Status:
Submitted
Year Published:
2019
Citation:
Signatures of Environmental Adaptation During Range Expansion of Wild Common Bean (Phaseolus vulgaris)
|
Progress 09/15/13 to 09/14/14
Outputs Target Audience: The project was presented and discussed during the Annual Bean Breeding Field Day on Sept. 4, 2014 (such a field day is held every year in the first week of September). The audience is diverse and consists of growers and warehouse handlers affiliated with the California Dry Bean Advisory Board, the California Crop Improvement Association, other bean growers such as the organic bean industry (e.g., Lundberg Family Farm), seed saving companies, visiting scientists, visiting scholars (Humphrey Fellows at UC Davis), and UC Davis students. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? Four individuals have benefited from training and professional development in this project. Dr. Andrea Ariani, a postdoctoral fellow funded by this project, has successfully developed the GBS protocol (see Objective 2) and is now in the process of subjecting the entire sample of 300 wild beans to GBS. Mr. Jorge Berny, a PhD student from Mexico with funding from CONACYT, has developed the sample of 300 wild beans and extracted DNA from the respective accessions. Ms. Saarah Kuzay, an undergraduate student, is participating in the SNP analysis of the domesticated accessions as part of, and funded by, the BeanCAP. The data on wild beans will be added to her current data. Ms. Antonia Palkovic (not funded by this project) is an assistant specialist in charge of greenhouse cultivation of beans, including wild beans. Wild beans present a special challenge because of their seed dormancy at germination and their aggressive climbing growth habit. How have the results been disseminated to communities of interest?
Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? Goal 1: SNP diversity using the current platform based on the BARCBEAN6 chip. We have completed the analysis of the domesticated materials. These data will then be available when the SNP data of the wild lines become available in Year 2. Data will be analyzed for polymorphism levels, differentiation between wild and domesticated types (Fst), and potential selection signatures of domestication as well as geographic and climatic correlations. Goal 2: This goal is largely completed but additional domesticated lines may be sequenced to provide a better reference in comparison with wild accessions. Goal 3: GBS will be completed during Year 2. This will allow variant calling to start. Frequency, genome distribution, and other population genetic parameters will be estimated.
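As an illustration of the wild vs. domesticated differentiation measure (Fst) mentioned under Goal 1, the following is a minimal sketch for a single biallelic SNP. It assumes equal weighting of the two groups and uses the textbook heterozygosity form, Fst = (HT - HS) / HT; the function names and the example allele frequencies are hypothetical, and this is not necessarily the estimator that will be applied to the project data.

```python
def expected_het(p):
    """Expected heterozygosity 2p(1-p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_biallelic(p_wild, p_dom):
    """Fst = (HT - HS) / HT for two groups weighted equally (an assumption).

    p_wild and p_dom are the frequencies of the same allele in the wild and
    domesticated samples. Returns None when the SNP is monomorphic overall,
    because Fst is undefined in that case.
    """
    p_bar = (p_wild + p_dom) / 2.0
    h_total = expected_het(p_bar)
    if h_total == 0.0:
        return None
    h_within = (expected_het(p_wild) + expected_het(p_dom)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, for illustration only.
print(fst_biallelic(1.0, 0.0))  # alternative alleles fixed in each group
print(fst_biallelic(0.5, 0.5))  # identical frequencies in both groups
print(fst_biallelic(0.8, 0.2))  # intermediate differentiation
```

Genome-wide values are usually summarized across many SNPs (for example as a ratio of averages), which this single-locus sketch does not attempt.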
Impacts What was accomplished under these goals?
Objective 1: Use the existing SNP platform to compare gene-pool-wide and genome-wide diversity and divergence patterns between wild and domesticated types in the Andean and Mesoamerican gene pools of common bean (Phaseolus vulgaris). The National Plant Germplasm System of the USDA and the Genetic Resources Unit of CIAT in Colombia currently hold the majority of wild common bean accessions. From their databases, we selected 300 accessions, with the criterion of covering the widest possible range of geographic collection origins without redundancy. We included 110 genotypes that were previously studied for diversity by three independent research groups (Chacón et al., 2005; Kwak and Gepts, 2009; Bitocchi et al., 2012). The sample is representative geographically, climatically, and altitudinally. Of the total, 249 accessions have been sent for genotyping with the 5,398 SNP markers developed by the Common Bean Coordinated Agricultural Project (BeanCAP), using the Illumina Infinium BARCBean6K_3 SNP array at the Soybean Genomics and Improvement Laboratory, ARS/USDA in Beltsville, Maryland (Hyten et al., 2010). Objective 2: Develop a Genotyping-by-Sequencing (GBS) pipeline for further SNP discovery. 2.1 In silico evaluation of restriction enzyme cutting sites in the P. vulgaris genome. Thanks to the availability of the P. vulgaris whole-genome sequence on Phytozome (Goodstein et al. 2012; Schmutz et al. 2014), a survey of different restriction enzymes and their relative cutting sites was performed. From the Biopython suite, enzymes were selected that create a 'sticky' end after cleaving, cut only once for each recognition site, and do not recreate the restriction site after cleaving. Because of the relatively low level of methylated DNA in the bean genome (~30%; Abid et al. 2010), another approach was followed.
For each enzyme, the distribution of recognition sites in repetitive vs. non-repetitive parts of the genome was determined; those enzymes that preferentially cut in the non-repetitive part of the genome were selected. Within this subset of enzymes, the ones with the higher count (repetitive vs. non-repetitive) and the lower p value (based on a binomial test) were selected. Among these enzymes, CviAII was chosen because it has the highest occurrence in the non-repetitive parts of the genome and is a neoschizomer (i.e., it recognizes the same nucleotide sequence but cleaves at a different position) of NlaIII, an enzyme used in our lab for the construction of RESCAN sequencing libraries (Monson-Miller et al. 2012) in common bean, thus allowing a possible comparison of SNPs detected with different technologies. This enzyme was then analyzed for the in silico length distribution of fragment sizes along the genome. 2.2 Application of GBS protocol on different Phaseolus genotypes - Library preparation and quality evaluation. To check the applicability of the GBS approach using CviAII, a test experiment was performed with 18 wild and domesticated Phaseolus genotypes belonging to both the Andean and Mesoamerican gene pools. Genotypes included G21245 (W, Intermediate), CAL143 (D, A; drought tolerant), G19833 (D, A; reference genome of Schmutz et al. 2014), UC0801 (D, A; UCD variety), Midas (D, A; parent of RI population), PI417653 (W, M), PI319441 (W, M), PI343950 (W, M), G12873 (W, M; parent of RI population), SEA5 (D, M; drought tolerant), Pinto San Rafael (D, M; drought tolerant), UCD Flor de Mayo (D, M; UC Davis advanced line), SER118 (D, M; drought tolerant), Matterhorn (D, M; drought tolerant variety, Michigan State U.), UCD9634 (D, M; drought tolerant UC Davis advanced line), L88-63 (D, M; drought tolerant), Victor (D, M; drought tolerant), and PI311859 (D, P. dumosus as outgroup species).
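The in silico evaluation in section 2.1 (counting recognition sites, deriving fragment lengths, and testing for enrichment in non-repetitive sequence) can be sketched in plain Python as follows. This is an illustrative sketch only, not the project's Biopython-based pipeline: the function names, the toy sequence, and the toy counts are hypothetical, and the cut position within CATG (one base into the site, i.e., C^ATG) is stated here as an assumption about CviAII.

```python
from math import comb


def find_sites(seq, site="CATG"):
    """Return the 0-based start position of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, site="CATG", cut_offset=1):
    """Lengths of the fragments produced by cutting `cut_offset` bases into each site.

    cut_offset=1 models a C^ATG cut (assumption for CviAII).
    """
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


def binomial_upper_tail(k, n, p):
    """One-sided binomial test: P(X >= k) for X ~ Binomial(n, p).

    Here k would be the number of sites in non-repetitive sequence, n the total
    number of sites, and p the non-repetitive fraction of the genome.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


toy = "AACATGTTTTCATGGG"          # hypothetical 16-bp sequence with two CATG sites
print(find_sites(toy))            # [2, 10]
print(fragment_lengths(toy))      # [3, 8, 5]
print(binomial_upper_tail(8, 10, 0.5))
```

For a whole genome, the exact sum in `binomial_upper_tail` becomes slow for very large site counts; a statistics library routine would be the practical choice there.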
Specific barcodes and adapters for CviAII were designed with the GBS barcode adapter generator (http://www.deenabio.com/services/gbs-adapters). One of the parameters that needs to be optimized for each species in GBS is the initial adapter concentration. This is essential for correct ligation between adapters and genomic DNA, but also to avoid the formation of adapter-dimers that would be needlessly sequenced. We optimized the adapter concentration to 4.5 ng per reaction and reduced the ligation buffer concentration to 0.6x, instead of 1x, during the ligation step. This optimization led to a higher library concentration, with an average fragment size between 170 and 300 bp and a complete absence of adapter-dimers (fragments of ~130 bp). Except for these optimizations, the library was prepared following the protocol of Elshire et al. (2011) and sequenced in a single lane of an Illumina HiSeq2000 flowcell, using the 50-bp cycle protocol at the QB3 laboratory at UC Berkeley, CA, USA. A total of 137,026,622 50-bp single-end reads was generated. Of these, 127,384,853 sequences (93%) passed the initial quality trimming. Among these ~127M reads, 3,002,729 (2.4%) were removed because they were shorter than 30 bp after trimming of reads containing the RE recognition site or adapter contaminants, or because they did not contain the RE overhang sequence after the barcode index. As expected from the library preparation strategy, there was a high level of duplicated reads, with 13,278,501 unique reads in the dataset, suggesting a mean redundancy of ~10x for each read tag. These data suggest that the overall library quality was high and consistent with the experimental approach. 2.3 Alignment and variant calling. The number of reads was almost equally distributed among the different lines, with ~90% of annotated genes (> 25,000) being tagged by at least one read (Table III).
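The read filtering described in section 2.2 (assign each read to a sample by its barcode, require the RE overhang sequence right after the barcode, and discard inserts shorter than 30 bp) might look like the sketch below. The barcodes, sample names, function name, and the overhang remnant "ATG" are hypothetical placeholders chosen for illustration; only the 30-bp minimum length comes from the text above, and the actual pipeline may differ.

```python
def assign_read(read, barcodes, remnant="ATG", min_len=30):
    """Return (sample, insert) for a usable read, or None if the read is discarded.

    barcodes maps barcode sequence -> sample name (hypothetical values below).
    remnant is the sequence expected immediately after the barcode; "ATG" is an
    assumed placeholder for the CviAII overhang, not a value from the report.
    """
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    for barcode in sorted(barcodes, key=len, reverse=True):
        if not read.startswith(barcode):
            continue
        insert = read[len(barcode):]
        if insert.startswith(remnant) and len(insert) >= min_len:
            return barcodes[barcode], insert
    return None


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"ACGT": "sample_1", "TTGCA": "sample_2"}
good = "ACGT" + "ATG" + "C" * 30
no_overhang = "ACGT" + "GGG" + "C" * 30
too_short = "TTGCA" + "ATG" + "C" * 10

for read in (good, no_overhang, too_short):
    print(assign_read(read, barcodes))
```

A production demultiplexer would also handle sequencing errors in the barcode and trim adapter read-through, which are left out of this sketch.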
In particular, almost 50% of the reads in each individual could be aligned with the reference genome, and 50% of the aligned reads tagged gene sequences. The total number of reads per gene in each line ranged from 36 to 84, with a mean of 52 reads per gene in each individual. These results are consistent with the in silico digestion of the P. vulgaris genome and showed a homogeneous read mapping rate for races belonging to different gene pools, and also for a sister species in the same genus (P. dumosus). A total of 77,595 high-quality SNPs and InDels were identified. Among the variants, 73,656 (94.9%) were SNPs, 2,088 (2.7%) were deletions, and 1,851 (2.4%) were insertions. The InDels ranged from 1 to 8 bp, with the majority being mononucleotide insertions and deletions. Because of the repetitive nature of most plant genomes, and the resulting miscalls of SNPs and InDels in repetitive regions, all variants located in these regions were removed. In total, 61% (47,838) of the identified variants were located in non-repetitive regions of the genome. This ratio is similar to the in silico occurrence of CviAII recognition sites in non-repetitive vs. repetitive regions of the genome. These variants are divided between 44,875 (94%) SNPs, 1,940 (3%) deletions, and 1,693 (3%) insertions. For further analysis, only these non-repetitive variants were considered. The SNP and InDel distributions were significantly and highly correlated with chromosome length (r=0.79, p=0.004) (Fig. 3), with a mean of ~4,328 and a median of 4,312 variants per chromosome and a median of 79 variants per Mb. The distribution of SNPs and InDels showed a non-random pattern, with few or no SNPs in the centromeric parts of the genome. Of the identified 47,838 SNPs and InDels, 23,273 (49%) were located in genic sequences, with 11,163 (23%) in CDS, 2,285 (5%) in untranslated regions (UTRs), and 9,825 (21%) in introns.
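The variant summary above can be reproduced with a short sketch: classifying each variant as a SNP, insertion, or deletion from its reference and alternate alleles, converting counts to percentages, and computing a Pearson correlation such as the one reported between variant counts and chromosome length. The function names and the toy correlation input are hypothetical; the three variant counts are the ones quoted in the text.

```python
from math import sqrt


def classify_variant(ref, alt):
    """Classify a variant from its reference and alternate alleles."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # equal-length multi-base substitution


def percentages(counts):
    """Convert a dict of counts into percentages of the total, rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Counts of all high-quality variants, as quoted in the text above.
all_variants = {"SNP": 73656, "deletion": 2088, "insertion": 1851}
print(sum(all_variants.values()))   # 77595
print(percentages(all_variants))    # {'SNP': 94.9, 'deletion': 2.7, 'insertion': 2.4}
print(classify_variant("A", "G"), classify_variant("AT", "A"), classify_variant("A", "AT"))
```

With per-chromosome variant counts and chromosome lengths as the two input lists, `pearson_r` would give the kind of coefficient reported above; the associated p value would need a statistics library.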
For all the genotypes analyzed, 45-49% of the SNPs and InDels were located in genic sequences; among them, ~50% were located in CDS, ~40% in introns, and ~10% in UTRs. The 23,273 SNPs and InDels located in genic sequences identified 11,027 different genes, with an average of 2 variants per gene. Objective 3: None so far.
Publications
|