Conservation and Divergence in the Common Bean (Phaseolus vulgaris) Genome during Domestication Assessed by Next-Generation Sequencing

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

AFRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

1000929

Grant No.

2013-67013-21224

Cumulative Award Amt.

$400,000.00

Proposal No.

2013-01906

Multistate No.

(N/A)

Project Start Date

Sep 15, 2013

Project End Date

Sep 14, 2017

Grant Year

2013

Program Code

[A1141]- Plant Health and Production and Plant Products: Plant Breeding for Agricultural Production

Recipient Organization
UNIVERSITY OF CALIFORNIA, DAVIS
410 MRAK HALL
DAVIS,CA 95616-8671

Performing Department
Dept. of Plant Sciences

Non Technical Summary
The wild progenitors of crop plants contain more biodiversity than their domesticated descendants. For example, genes for yield, disease and pest resistance, and tolerance to environmental stresses such as drought and heat have been found in wild ancestors but are absent in the corresponding crops. This project seeks to facilitate the use of such extra diversity in the breeding of improved bean varieties, whether dry beans or green beans. The use of biodiversity collections for crop improvement has been limited in the past for a number of reasons, including the difficulty of evaluating such collections in a systematic way and the difficulty of transferring genetic diversity of interest while leaving deleterious wild traits behind. Wild beans are of great interest because genes for yield and for disease and pest resistance have been identified previously among them, but also because they have a very broad geographic distribution, from northern Mexico to northwestern Argentina, suggesting that they have diverse adaptations to drought or heat. In order to identify wild beans of interest in this broad distribution, we will perform a novel combination of analyses, namely an analysis of DNA diversity combined with a computer-based eco-geographic analysis, to: 1) identify those wild bean populations that live in the driest/hottest environments; and 2) identify those genes or genome regions correlated with distribution in the driest/hottest environments. In turn, the chosen wild beans can then be used in crosses with domesticated beans to enhance drought or heat tolerance, using the genes identified as molecular markers in marker-assisted selection. This project will provide an unsurpassed detailed view of genetic diversity in wild beans, which will facilitate the maintenance and utilization of this biodiversity. It will develop a new high-throughput genomic resource for beans, namely a large number of new molecular markers. It will initiate new breeding efforts to obtain bean varieties that are more tolerant to drought and heat, two traits that may climate-proof beans.

Animal Health Component

50%

Research Effort Categories

Basic

50%

Applied

50%

Developmental

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
201	1410	1070	25%
202	1411	1080	50%
203	0510	1081	25%

Knowledge Area
203 - Plant Biological Efficiency and Abiotic Stresses Affecting Plants; 202 - Plant Genetic Resources; 201 - Plant Genome, Genetics, and Genetic Mechanisms;

Subject Of Investigation
1410 - Beans (dry); 1411 - Beans (fresh, fresh-processed); 0510 - Wilderness;

Field Of Science
1070 - Ecology; 1081 - Breeding; 1080 - Genetics;

Keywords

domestication syndrome

germplasm conversion

marker-assisted selection

next generation sequence

Goals / Objectives
The major goal of this project is to provide a detailed characterization of the genetic diversity and adaptation of wild beans (Phaseolus vulgaris) at the molecular level to facilitate their use in domesticated bean improvement. The wild progenitor of common bean has an extraordinarily wide distribution, ranging from northern Mexico to northwestern Argentina. This distribution area encompasses both tropical and subtropical regions, which makes wild common bean an attractive model to study biological adaptation to a wide range of climates and climate change in a common genetic background. Specifically, the project will use an existing single-nucleotide polymorphism (SNP) platform to compare diversity in wild and domesticated beans from the Andean and Mesoamerican gene pools at the gene-pool-wide and whole-genome levels. It will also develop a new Genotyping-by-Sequencing (GBS) pipeline to increase the available SNPs from ~5,000 to ~30,000. Using this pipeline, diversity, divergence, and adaptation in the wild and domesticated gene pools will be further characterized and, using geographic information systems (GIS), correlated with local environmental conditions (mainly climate) to identify polymorphisms linked to or directly involved in local adaptation. This project will provide an unsurpassed view of genetic divergence and adaptation of the wild progenitor of common bean, which contains a significant and largely unused amount of genetic diversity for bean improvement. It will develop a new genomic resource for common bean, namely a large number of new SNP markers linked to sequences of agronomic interest. Genomic information will be made available as soon as possible in the public genome database for Phaseolus beans, PhaseolusGenes.

Project Methods
Methods: Objective 1: We will use an Illumina iSelectHD Infinium beadchip with 6,000 SNPs (labeled BARCBEAN6K_3), developed by Illumina based on information provided by P. Cregan (USDA-ARS, Beltsville, MD) as part of the BeanCAP. This chip contains ~5,200 scoreable SNPs. Analysis will be conducted on 288 wild and domesticated bean lines. Results will be analyzed with established software to characterize genetic diversity among gene pools and between wild and domesticated types. We will also investigate genetic diversity according to the distribution of SNPs along the 11 bean chromosomes using the whole-genome sequence of common bean available at phytozome.net. Two novel elements are the following. First, genetic diversity data will be correlated with environmental variables to identify subgroups of wild beans that may be adapted to more extreme environmental conditions such as drought or heat. Second, we will correlate environmental variation with variation at specific SNP loci. Correlated loci become candidates for the actual genes underlying tolerance phenotypes and can be used to perform marker-assisted selection. Objective 2: We will develop a genotyping-by-sequencing platform to identify a large number of additional SNP markers. We will essentially follow the method proposed by the Buckler group at Cornell University. Parameters of importance include the choice of the restriction enzyme and the adapters, especially the barcode-containing adapter. A bioinformatics pipeline linked to the PhaseolusGenes database will be set up. Objective 3: The GBS protocol developed in the previous objective will be applied to a sample of 288 wild beans. Analyses will be the same as described in Objective 1.
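The second novel element, correlating environmental variation with variation at individual SNP loci, can be illustrated with a minimal sketch. It assumes genotypes coded as allele dosage (0, 1, 2) and one climate value per accession; the SNP names, dosages, and precipitation figures are invented for illustration, and the plain Pearson correlation is only a stand-in for whatever association statistic the project actually applied. A real analysis would typically also account for population structure.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # monomorphic locus or constant climate variable
    return cov / sqrt(var_x * var_y)

# Hypothetical data: allele dosage at two SNPs for six wild accessions,
# and the annual precipitation (mm) at each accession's collection site.
genotypes = {
    "snp_A": [0, 0, 1, 1, 2, 2],
    "snp_B": [1, 0, 2, 0, 1, 2],
}
annual_precip_mm = [1400, 1250, 900, 850, 500, 450]

# Correlate each locus with the climate variable and rank by |r|.
correlations = {snp: pearson_r(dosage, annual_precip_mm)
                for snp, dosage in genotypes.items()}
ranked = sorted(correlations, key=lambda s: abs(correlations[s]), reverse=True)
```

Loci at the top of `ranked` would be the candidates to examine further; with thousands of SNPs, a correction for multiple testing would also be needed.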
Efforts: Knowledge obtained in this project will be disseminated in a variety of ways: a) formal or informal course presentations, e.g., PLB143: http://www.plantsciences.ucdavis.edu/gepts/pb143/pb143.htm ; b) internships for undergraduate students (many recruited from PLB143) and rotations for graduate students; c) visiting students and scientists; d) general or commodity meetings; and e) farmers' meetings.
Evaluations: We will follow this timeline to measure progress towards each objective on a trimester or semester basis.

Objective	Activity	Schedule (Year 1 A B, Year 2 A B, Year 3 A B)
1: Low-density SNP diversity	DNA extraction and quality control	XXX
1: Low-density SNP diversity	Infinium Assay at UC Davis Genome Center	XXXXXXXXXX
1: Low-density SNP diversity	Data analyses (GenomeStudio, etc.)	XXXXXXXXXXXXXXXX
2: Development of a GBS pipeline	DNA extraction and quality control	XXX
2: Development of a GBS pipeline	Library development and testing (restriction enzyme, barcodes, multiplexing, etc.) & sequencing	XXXXXXXXXXXXXXXX
2: Development of a GBS pipeline	Bioinformatics pipeline development	XXXXXXXX
3: GBS assessment of bean diversity	DNA extraction and quality control	XXXXXXXXX
3: GBS assessment of bean diversity	GBS	XXXXXXXXXXXX
3: GBS assessment of bean diversity	Bioinformatics analysis	XXXXXXXXXXXXXXXXXX

Progress 09/15/13 to 09/14/17

Outputs
Target Audience: The audiences included: a) Researchers in the U.S. and Canada: Bean Improvement Cooperative (biennial meetings, 2015 in Niagara Falls, Canada, and 2017 in East Lansing, Michigan), American Society of Agronomy (annual meeting 2015, Minneapolis), Stanford University, University of California, Riverside, National Association of Plant Breeders (Davis, 2017); b) Researchers internationally: Plant and Animal Genome Meeting (San Diego, CA: 2015), Gansu University scientist visiting UC Davis (2016), PanAfrican Grain Legume Conference (Zambia, 2016); c) Growers/farmers, mainly in California (yearly field day at end of August). The researchers include breeders, geneticists, pathologists, and gene bank curators.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? One postdoc: Dr. Andrea Ariani, in genotyping-by-sequencing and bioinformatics (see Ariani and Gepts 2015, Ariani et al. 2016, 2018, Ariani and Gepts, submitted). One graduate student: Dr. Jorge Berny Mier y Teran, in wild bean reaction to drought stress, in comparison with domesticated beans (see Berny Mier y Teran et al. 2018).
How have the results been disseminated to communities of interest? See above: in target groups.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? 1.1 Objective 1: Use existing SNP platform to compare genetic diversity in wild and domesticated types of the Andean and Mesoamerican gene pools. The existing platform consisted of a SNP chip, i.e., the BARCBean6K_3 Infinium SNP array (Song et al. 2015). This SNP chip was used to compare wild and domesticated accessions in the core collection of the USDA Phaseolus bean collection, maintained at the Western Regional Plant Introduction Station in Pullman, WA. A STRUCTURE analysis identified K = 3 or K = 7 groups in this collection. The K = 3 groups represented the Andean domesticated group and the ecogeographic races Mesoamerica and Durango of the Middle American domestication. For K = 7, two of the three K = 3 groups (K3.1 and K3.3) were subdivided into smaller groups, whereas K3.2 remained unsubdivided. The Andean group K3.1 was subdivided into two groups: K7.4 and K7.7. Based on seed type (color, shape, and size; Fig. S4) and phaseolin type, these two groups were Andean in origin. With one exception, group K7.4 originated in Ecuador and Peru. The seeds were large, especially in Ecuador, and round-shaped, suggesting they may be representatives of race Peru (Singh et al. 1991; Fig. S4). The geographic distribution of groups K7.1 and K7.5 differed in that K7.1 was present in Mexico but included a relatively important representation in the northern countries of Central America (Guatemala, Honduras, and Nicaragua). K7.5, on the other hand, was distributed almost entirely in Mexico. Inspection of the seed types of these two groups suggests that K7.5 represents eco-geographic race Durango of the northern, more arid highlands of Mexico. In contrast, the presence of wild beans from Mexico seems incongruous in a core collection devoted primarily to domesticated types. Either these wild bean accessions should be removed, or additional wild beans representing the populations distributed in Central America and the Andes should be included in this core collection. 
Recent studies have examined the diversity of wild beans (Tohme et al. 1996; Kwak and Gepts 2009). We recommend that this core collection be revised to broaden its coverage. 1.2 Objective 2: Develop a Genotyping-by-Sequencing (GBS) pipeline for further SNP discovery. Genotyping-by-sequencing (GBS) represents a simple, cost-effective, and highly multiplexed alternative for species with or without an available reference genome. However, this technology requires specific optimization for each species, especially for the restriction enzyme (RE) used. Here we report on the application of GBS in a test experiment with 18 genotypes of wild and domesticated Phaseolus vulgaris. After an in silico digestion of the P. vulgaris reference genome sequence with different REs, we selected CviAII as the most suitable RE for GBS in common bean based on the high frequency and even distribution of its restriction sites. In order to check the applicability of the GBS protocol using CviAII, the test experiment included wild and domesticated genotypes belonging to both the Andean and Mesoamerican gene pools, as well as a representative of the wild ancestral gene pool from northern Peru, G21245. The total number of variants located in non-repetitive regions was 47,838 (61% of all variants identified), divided between 44,875 (94%) SNPs, 1,940 (3%) deletions and 1,693 (3%) insertions. The 23,273 SNPs and InDels located in genic sequences identified 11,027 different genes, with an average of 2 variants per gene. The phylogenetic analysis based on the identified SNPs and InDels was consistent with the division into gene pools and into domesticated and wild lines, and was supported by high bootstrap values. In particular, the phylogenetic tree was automatically rooted with the ancestral genotype G21245 from northern Peru. Overall, phylogenetic analysis of the variants identified using GBS with CviAII appeared reliable and supported previous genetic diversity information about this species. 
We concluded that GBS is a simple, cost-effective, and highly multiplexed protocol for plant genotyping using NGS technologies. Even though the use of a frequent-cutting, methylation-insensitive enzyme requires higher genome coverage, the small genome size of common bean and the results presented in this study clearly show the advantages of using CviAII for GBS in common bean. 1.3 Objective 3: Use the GBS pipeline to further compare diversity, divergence, and adaptation in the wild and domesticated common bean gene pools. The wild progenitor of common bean has an exceptionally large distribution, from northern Mexico to northwestern Argentina, which is unusual among crop wild progenitors. This research sought to document the major events of range expansion that led to this distribution and the associated environmental changes. Using genotyping-by-sequencing and geographic information systems applied to a sample of 246 accessions of wild P. vulgaris, including 157 genotypes of the Mesoamerican, 77 of the southern Andean, and 12 of the Northern Peru-Ecuador gene pools, we identified five subpopulations based on ~20,000 SNPs. Three of these subpopulations belong to the Mesoamerican gene pool (Northern and Central Mexico; Oaxaca; and Southern Mexico, Central America, and northern South America) and one each to the Northern Peru-Ecuador (PhI) and the southern Andean gene pools. The data suggest that water-related traits, such as drought tolerance or water-use efficiency, may be more common in the MW1 and MW2 groups and in the derived domesticates, such as race Durango (Singh et al. 1991). In contrast, the data are consistent with adaptation to the relatively cooler and moister environments encountered by Andean wild beans and their domesticates, e.g., in Colombia (Debouck et al. 1993) and western Europe (the Netherlands; Zeven 1997). 
The inferred divergence time between the Andean and Mesoamerican gene pools averaged T1 ~ 87,000 years, with a 95% confidence interval (CI) of 86,635-88,186 years and a highest posterior density (HPD) interval for the estimated divergence time of 50,008 to 168,809 years. The estimated divergence time between the Ancestral and the main common bean group averaged T2 ~ 373,000 years (95% CI: 371,799-374,321; 95% HPD: 300,009-505,122). Thus, the Ecuador-N. Peru group represents an early separation from the main wild gene pool of P. vulgaris, consistent with the presence of ancestral phaseolin types (Kami et al. 1996) endemic in this group, as well as the sharing of ancestral chloroplast DNA between this group and the Mesoamerican gene pool (Chacon et al. 2007). Given that the core area of the genus Phaseolus as defined by Maréchal et al. (1978) is located in Mesoamerica (Delgado-Salinas et al. 1999; Freytag and Debouck 2002), the existence of species or populations outside this area suggests one or more dispersal events. In general, we propose that the current distribution of wild P. vulgaris has been achieved by seed dispersal at three spatial scales, each associated with its own temporal scale. Distance and frequency of seed dispersal are inversely correlated. In turn, the long-distance migration events have exposed wild beans to different climates and soil types. The northward migration, in particular, has led to potential adaptation to hotter and drier conditions. Further experimentation is necessary to determine the value of introgression from these wild beans into the domesticated gene pool.
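The divergence-time estimates above are reported with both a confidence interval around the mean and a much wider highest posterior density (HPD) interval. As a minimal sketch of how an HPD interval can be obtained from a set of posterior draws, the function below returns the shortest interval containing a given fraction of the samples. The draws in the usage example are hypothetical and are not the project's data; the report does not say which software computed its intervals.

```python
from math import ceil

def hpd_interval(samples, mass=0.95):
    """Shortest interval that contains at least `mass` of the posterior draws."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must be in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, ceil(mass * n))  # number of draws the interval must cover
    best_low, best_high = ordered[0], ordered[window - 1]
    # Slide a window of fixed count over the sorted draws; keep the narrowest.
    for start in range(1, n - window + 1):
        low, high = ordered[start], ordered[start + window - 1]
        if high - low < best_high - best_low:
            best_low, best_high = low, high
    return best_low, best_high

# Hypothetical posterior draws of a divergence time (years), right-skewed.
draws = [52_000, 61_000, 70_000, 74_000, 80_000, 83_000,
         88_000, 95_000, 120_000, 165_000]
low, high = hpd_interval(draws, mass=0.5)
```

For a skewed posterior, the HPD interval is not symmetric around the mean, which is one reason it can differ markedly from a confidence interval on the mean itself.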

Publications

Type: Journal Articles Status: Published Year Published: 2015 Citation: Ariani A, Gepts P 2015. Genome-wide identification and characterization of aquaporin gene family in common bean (Phaseolus vulgaris L.). Mol Genet Genomics 290 (5): 1771-1785. DOI: 10.1007/s00438-015-1038-2
Type: Journal Articles Status: Published Year Published: 2015 Citation: Kole C, Muthamilarasan M, Henry R, Edwards D, Sharma R, Abberton M, Batley J, Bentley A, Blakeney M, Bryant J, Cai H, Cakir M, Cseke LJ, Cockram J, Oliveira ACd, Pace CD, Dempewolf H, Ellison S, Gepts P, Greenland A, Hall A, Hori K, Howe GT, Hughes S, Humphreys MW, Iorizzo M, Ismail AM, Marshall A, Mayes S, Nguyen HT, Ogbonnaya FC, Ortiz R, Paterson AH, Simon PW, Tohme J, Tuberosa R, Valliyodan B, Varshney RK, Wullschleger SD, Yano M, Prasad M. Application of genomics-assisted breeding for generation of climate resilient crops: Progress and prospects. Frontiers in Plant Science 6: 563. doi:10.3389/fpls.2015.00563
Type: Journal Articles Status: Published Year Published: 2015 Citation: Abberton M, Batley J, Bentley A, Bryant J, Cai H, Cockram J, Costa de Oliveira A, Cseke LJ, Dempewolf H, De Pace C, Edwards D, Gepts P, Greenland A, Hall AE, Henry R, Hori K, Howe GT, Hughes S, Humphreys M, Lightfoot D, Marshall A, Mayes S, Nguyen HT, Ogbonnaya FC, Ortiz R, Paterson AH, Tuberosa R, Valliyodan B, Varshney RK, Yano M. Global agricultural intensification during climate change: a role for genomics. Plant Biotechnology Journal: 1-4. doi: 10.1111/pbi.12467
Type: Journal Articles Status: Published Year Published: 2016 Citation: Ariani A, Berny JC, Gepts P (2016) Genome-wide identification of SNPs and copy number variation in common bean (Phaseolus vulgaris L.) using genotyping-by-sequencing (GBS). Molecular Breeding 36 (online): 87 (11 pages). DOI: 10.1007/s11032-016-0512-9
Type: Journal Articles Status: Published Year Published: 2017 Citation: Rendón-Anaya M, Montero-Vargas JM, Saburido-Álvarez S, Vlasova A, Capella-Gutierrez S, Ordaz-Ortiz JJ, Aguilar OM, Vianello-Brondani RP, Santalla M, Delaye L, Gabaldón T, Gepts P, Winkler R, Guigó R, Delgado-Salinas A, Herrera-Estrella A. Genomic history of the origin and domestication of common bean unveils its closest sister species. Genome Biology 18: 60. DOI: 10.1186/s13059-017-1190-6
Type: Journal Articles Status: Published Year Published: 2017 Citation: Rendón-Anaya M, Herrera-Estrella A, Gepts P, Delgado-Salinas A (2017) A new species of Phaseolus (Leguminosae, Papilionoideae) sister to Phaseolus vulgaris, the common bean. Phytotaxa 313: 259-266. DOI: 10.11646/phytotaxa.313.3.3
Type: Book Chapters Status: Published Year Published: 2017 Citation: Gepts P. Genetic aspects of crop domestication. In: Hunter D, Guarino L, Spillane C, McKeown PC (eds) Routledge Handbook of Agricultural Biodiversity. Routledge/Taylor & Francis, New York
Type: Journal Articles Status: Published Year Published: 2018 Citation: Berny Mier y Teran JC, Konzen ER, Medina V, Palkovic A, Ariani A, Tsai SM, Gilbert ME, Gepts P (2018) Root and shoot variation in relation to potential intermittent drought adaptation of Mesoamerican wild common bean (Phaseolus vulgaris L.). Annals of Botany doi: 10.1093/aob/mcy221
Type: Journal Articles Status: Published Year Published: 2018 Citation: Ariani A, Berny Mier y Teran J, Gepts P. Spatial and temporal scales of range expansion in wild Phaseolus vulgaris. Molecular Biology and Evolution 35 (1): 119-131 doi: 10.1093/molbev/msx273
Type: Journal Articles Status: Submitted Year Published: 2019 Citation: Signatures of Environmental Adaptation During Range Expansion of Wild Common Bean (Phaseolus vulgaris)

Progress 09/15/15 to 09/14/16

Outputs
Target Audience: The target audience is primarily other researchers (plant breeding, plant pathology) and curators of gene banks. I have attended bean meetings as well as genomics meetings.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Andrea Ariani, postdoc; Jorge Berny, PhD graduate student in Horticulture & Agronomy.
How have the results been disseminated to communities of interest? Presentation at ITQB, Lisbon, Portugal, September 2015; poster at Plant and Animal Genome XXIV, San Diego, January 2016; presentation at CINVESTAV, Irapuato, Mexico.
What do you plan to do during the next reporting period to accomplish the goals? We will continue analyzing the correlation between climatic variables and DNA variation in wild beans. We will also conduct an analysis of drought tolerance in domesticated x wild crosses.

Impacts
What was accomplished under these goals? Wild common beans (Phaseolus vulgaris) constitute an extremely useful resource for broadening the genetic diversity of the domesticated gene pools and improving them through breeding. However, this resource is still under-exploited owing to the scarcity of information on genetic diversity and resistance markers in wild common beans, information that is essential for an efficient breeding program. For this reason, we applied genotyping-by-sequencing (GBS) to a panel of ~280 wild common beans representative of the genetic diversity and geographical distribution of this species. With this approach, we identified 19,126 variants in the common bean genome across 246 wild individuals. We used this variant dataset to analyze genetic diversity, population structure, and phylogenetic relationships among the different wild gene pools of this species, and also for evolutionary studies using coalescent simulation and Approximate Bayesian Computation (ABC). In addition, we applied a landscape genomics approach by coupling genotypic information with GIS data and bioclimatic databases, and identified several markers involved in environmental adaptation in wild common bean. This is the first study reporting a genome-wide genetic characterization of wild Phaseolus vulgaris using state-of-the-art genotyping technologies. The results should allow more efficient germplasm management for this species and should facilitate the improvement of domesticated common bean by introgressing abiotic stress resistance traits from the wild gene pool.
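To make the idea of Approximate Bayesian Computation concrete, here is a minimal rejection-sampling sketch. It is deliberately a toy: the simulator draws from a normal distribution rather than running a coalescent simulation, and the prior bounds, tolerance, and observed summary value are all invented for illustration. It shows only the general accept/reject logic, not the project's actual model or software.

```python
import random

def simulate_summary(theta, n_samples, rng):
    """Toy simulator: summary statistic = mean of draws from Normal(theta, 1).

    In a real study this step would be a coalescent simulation that returns
    summary statistics computed from simulated SNP data.
    """
    return sum(rng.gauss(theta, 1.0) for _ in range(n_samples)) / n_samples

def abc_rejection(observed, prior_low, prior_high, n_draws, epsilon,
                  n_samples=30, seed=1):
    """Keep prior draws whose simulated summary lies within epsilon of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)      # draw from the prior
        simulated = simulate_summary(theta, n_samples, rng)
        if abs(simulated - observed) <= epsilon:        # compare summaries
            accepted.append(theta)
    return accepted

# Hypothetical observed summary statistic and a flat prior on [0, 10].
posterior = abc_rejection(observed=4.0, prior_low=0.0, prior_high=10.0,
                          n_draws=5000, epsilon=0.2)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted values approximate the posterior distribution of the parameter; a smaller `epsilon` gives a closer approximation at the cost of fewer accepted draws.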

Publications

Progress 09/15/14 to 09/14/15

Outputs
Target Audience: The target audience is primarily other researchers (plant breeding, plant pathology) and curators of gene banks. I have attended bean meetings as well as genomics meetings.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Postdoc: Dr. Andrea Ariani; graduate student in Horticulture & Agronomy: J. Berny.
How have the results been disseminated to communities of interest? Presentation of a poster at the Plant & Animal Genomics meeting in Jan. 2015 in San Diego: Ariani Andrea, Berny Jorge, Gepts Paul. Characterization of genetic diversity and climatic selection in wild common bean (Phaseolus vulgaris) using Genotyping by Sequencing (GBS) technology.
What do you plan to do during the next reporting period to accomplish the goals? Continue analysis of relationships between DNA sequence variation and environmental variables.

Impacts
What was accomplished under these goals? During domestication and subsequent selection, plants were subjected to a severe bottleneck that drastically reduced genetic diversity in cultivated crops. This reduced diversity restricts the scope for breeding improvement in crop plants. Crop wild relatives constitute a valuable genetic resource for broadening the domesticated gene pools, even though they have been poorly characterized and exploited in breeding programs. Common bean (Phaseolus vulgaris) originated in the Americas and diverged into two separate gene pools (Mesoamerican and Andean) that underwent domestication independently. Wild common beans have a wide geographical distribution, from northern Mexico to northern Argentina, that reflects adaptation to different environments. In this project we aim to genotype, using GBS technology, 284 wild common beans distributed from Central to South America. The goal of the project is to generate a high-density SNP resource for wild common beans. These data will be useful for a comprehensive analysis of genetic diversity and signatures of climatic selection in this species. After an in silico digestion of the common bean reference genome, we selected CviAII as the most suitable enzyme for our goals. We tested the approach with a sample of 17 wild and domesticated common bean genotypes and one domesticated P. dumosus accession. We identified >47k high-quality variants located in ~11k annotated genes. The number of variants per chromosome was positively correlated with chromosome size (r = 0.79), with a median of 79 variants/Mb. These results suggest a dense variant distribution suitable for a comprehensive analysis of genetic diversity and signatures of climatic selection in common bean.
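The in silico digestion step can be sketched in a few lines of standard-library Python. The sketch assumes a soft-masked genome sequence in which repeats are lowercase (an assumption about the input, not something the report states), counts occurrences of the CATG recognition sequence of CviAII in repetitive versus non-repetitive sequence, and applies a one-sided binomial test whose null proportion is simply the non-repetitive fraction of the sequence. The toy sequence and the helper names are hypothetical; the 2013-14 report mentions the Biopython suite for the enzyme survey, which this sketch does not use.

```python
import re
from math import comb

SITE = "CATG"  # recognition sequence of CviAII (palindromic, so one strand suffices)

def site_positions(seq, site=SITE):
    """0-based start positions of every occurrence of the site, case-insensitive."""
    pattern = re.compile("(?=" + re.escape(site) + ")", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(seq)]

def count_by_masking(seq, site=SITE):
    """Count sites starting in non-repetitive (uppercase) vs repetitive (lowercase) sequence."""
    non_rep = rep = 0
    for pos in site_positions(seq, site):
        if seq[pos].isupper():
            non_rep += 1
        else:
            rep += 1
    return non_rep, rep

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): one-sided test for enrichment."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def fragment_lengths(seq, site=SITE):
    """Distances between consecutive sites, a proxy for digested fragment sizes."""
    pos = site_positions(seq, site)
    return [b - a for a, b in zip(pos, pos[1:])]

# Toy soft-masked sequence (hypothetical): lowercase marks a repeat.
toy_genome = "ACATGTTCATGGA" + "acgtacgtcatgac" + "GGCATGCATGTT"
non_rep, rep = count_by_masking(toy_genome)
non_rep_fraction = sum(1 for c in toy_genome if c.isupper()) / len(toy_genome)
p_value = binomial_upper_tail(non_rep, non_rep + rep, non_rep_fraction)
```

On a real genome the exact binomial sum would be replaced by a statistics library routine, and the same counts would be computed for each candidate enzyme before comparing them.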

Publications

Progress 09/15/13 to 09/14/14

Outputs
Target Audience: The project was presented and discussed during the Annual Bean Breeding Field Day on Sept. 4, 2014 (such a field day is held every year in the first week of September). The audience is diverse and consists of growers and warehouse handlers affiliated with the California Dry Bean Advisory Board, the California Crop Improvement Association, other bean growers such as the organic bean industry (e.g., Lundberg Family Farm), seed saving companies, visiting scientists, visiting scholars (Humphrey Fellows at UC Davis), and UC Davis students.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Four individuals have benefited from training and professional development in this project. Dr. Andrea Ariani, a postdoctoral fellow funded by this project, has so far successfully developed the GBS protocol (see Objective 2) and is now subjecting the entire sample of 300 wild beans to GBS. Mr. Jorge Berny, a PhD graduate student from Mexico with funding from CONACYT, has assembled the sample of 300 wild beans and extracted DNA from the respective accessions. Ms. Saarah Kuzay, an undergraduate student, is participating in the SNP analysis of the domesticated accessions as part of, and funded by, the BeanCAP. Her data on wild beans will be added to the current data. Ms. Antonia Palkovic (not funded by this project) is an assistant specialist in charge of greenhouse cultivation of beans, including wild beans. Wild beans pose a special challenge because of their dormancy during germination and their aggressive climbing growth habit.
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? Goal 1: SNP diversity using the current platform based on the BARCBEAN6K_3 chip. We have completed the analysis of the domesticated materials. These data will be ready for comparison when the SNP data of the wild lines become available in Year 2. Data will be analyzed for polymorphism levels, differentiation between wild and domesticated types (Fst), and potential selection signatures of domestication, as well as geographic and climatic correlations. Goal 2: This goal is largely completed, but additional domesticated lines may be sequenced to provide a better reference for comparison with wild accessions. Goal 3: GBS will be completed during Year 2. This will allow variant calling to start. Frequency, genome distribution, and other population genetic parameters will be estimated.
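The differentiation statistic planned under Goal 1 (Fst between wild and domesticated types) can be illustrated for a single biallelic SNP. The sketch below uses the classical definition Fst = (Ht - Hs) / Ht with two equally weighted populations; the report does not say which estimator was intended, so this choice, the function names, and the allele frequencies used in the example are assumptions.

```python
def expected_het(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def fst_two_pops(p_wild, p_dom):
    """Fst for one biallelic SNP, two populations weighted equally.

    Ht is the expected heterozygosity of the pooled population and Hs the
    mean expected heterozygosity within populations.
    """
    p_bar = (p_wild + p_dom) / 2.0
    h_t = expected_het(p_bar)
    if h_t == 0.0:
        return 0.0  # locus fixed for the same allele in both populations
    h_s = (expected_het(p_wild) + expected_het(p_dom)) / 2.0
    return (h_t - h_s) / h_t

# Hypothetical allele frequencies at one SNP in wild and domesticated samples.
fst_example = fst_two_pops(p_wild=0.8, p_dom=0.2)
```

Values near 0 indicate similar allele frequencies in the two groups, while values near 1 indicate nearly fixed differences; genome scans for domestication signatures typically look for loci with unusually high values relative to the genome-wide distribution.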

Impacts
What was accomplished under these goals? Objective 1: Use the existing SNP platform to compare gene-pool-wide and genome-wide diversity and divergence patterns between wild and domesticated types in the Andean and Mesoamerican gene pools of common bean (Phaseolus vulgaris). The National Plant Germplasm System of the USDA and the Genetic Resources Unit of CIAT in Colombia currently hold the majority of wild common bean accessions. From their databases, we selected 300 accessions with the criterion of covering as many geographic origins of collection as possible without redundancy. We included 110 genotypes that were previously studied for diversity by three independent research groups (Chacón et al. 2005; Kwak and Gepts 2009; Bitocchi et al. 2012). The sample is representative geographically, climatically, and altitudinally. Of the total, 249 accessions have been sent for genotyping with the 5,398 SNP markers developed by the Common Bean Coordinated Agricultural Project (BeanCAP), using the Illumina Infinium BARCBEAN6K_3 BeadChip at the Soybean Genomics and Improvement Laboratory, ARS/USDA in Beltsville, Maryland (Hyten et al. 2010).
Objective 2: Develop a Genotyping-by-Sequencing (GBS) pipeline for further SNP discovery.
2.1 In silico evaluation of restriction enzyme cutting sites in the P. vulgaris genome
Thanks to the availability of the P. vulgaris whole-genome sequence on Phytozome (Goodstein et al. 2012; Schmutz et al. 2014), a survey of different restriction enzymes and their cutting sites was performed. From the Biopython suite, enzymes were selected that create a 'sticky' end after cleaving, cut only once per recognition site, and do not recreate the restriction site after cleaving. Due to the relatively low level of methylated DNA in the bean genome, ~30% (Abid et al. 2010), another approach was followed. 
For each enzyme, the distribution of recognition sites in repetitive vs non-repetitive parts of the genome was determined; those enzymes that preferentially cut in the non-repetitive part of the genome were selected. Within this subset, the enzymes with the highest counts in non-repetitive relative to repetitive sequence and the lowest p-values (based on a binomial test) were retained. Among these enzymes, CviAII was chosen because it has the highest occurrence in the non-repetitive parts of the genome and is a neoschizomer of NlaIII (i.e., it recognizes the same nucleotide sequence but cleaves at a different position), an enzyme used in our lab for the construction of RESCAN sequencing libraries (Monson-Miller et al. 2012) in common bean, thus allowing a possible comparison of SNPs detected with different technologies. This enzyme was then analyzed for the in silico length distribution of fragment sizes along the genome.
2.2 Application of the GBS protocol to different Phaseolus genotypes: library preparation and quality evaluation
In order to check the applicability of the GBS approach using CviAII, a test experiment was performed with 18 wild and domesticated Phaseolus genotypes belonging to both the Andean and Mesoamerican gene pools. Genotypes (W = wild, D = domesticated; A = Andean, M = Mesoamerican) included G21245 (W, Intermediate), CAL143 (D, A; drought tolerant), G19833 (D, A; reference genome of Schmutz et al. 2014), UC0801 (D, A; UCD variety), Midas (D, A; parent of RI population), PI417653 (W, M), PI319441 (W, M), PI343950 (W, M), G12873 (W, M; parent of RI population), SEA5 (D, M; drought tolerant), Pinto San Rafael (D, M; drought tolerant), UCD Flor de Mayo (D, M; UC Davis advanced line), SER118 (D, M; drought tolerant), Matterhorn (D, M; drought tolerant variety, Michigan State U.), UCD9634 (D, M; drought tolerant UC Davis advanced line), L88-63 (D, M; drought tolerant), Victor (D, M; drought tolerant), and PI311859 (D, P. dumosus as outgroup species). 
Specific barcodes and adapters for CviAII were designed with the GBS barcode adapter generator (http://www.deenabio.com/services/gbs-adapters). One of the parameters that needs to be optimized for each species in GBS is the initial adapter concentration. This is essential for correct ligation between adapters and genomic DNA, and also for avoiding the formation of adapter-dimers that would be needlessly sequenced. We optimized the adapter concentration to 4.5 ng per reaction and reduced the ligation buffer concentration to 0.6x, instead of 1x, during the ligation step. This optimization led to a higher library concentration, with average fragment sizes between 170 and 300 bp and a complete absence of adapter-dimers (fragments of ~130 bp). Apart from these optimizations, the library was prepared following the protocol of Elshire et al. (2011) and sequenced in a single lane of an Illumina HiSeq2000 flowcell, using the 50 bp cycle protocol at the QB3 laboratory at UC Berkeley, CA, USA. A total of 137,026,622 50-bp single-end reads were generated. Of these, 127,384,853 sequences (93%) passed the initial quality trimming. Among these ~127M reads, 3,002,729 (2.4%) were removed, either because they were shorter than 30 bp after trimming of reads containing the restriction enzyme (RE) recognition site or adapter contaminants, or because they did not contain the RE overhang sequence after the barcode index. As expected from the library preparation strategy, there was a high level of duplicated reads, with 13,278,501 unique reads in the dataset, suggesting a mean redundancy of approximately 10x for each read tag. These data suggest that the overall library quality was high and consistent with the experimental approach.

2.3 Alignment and variant calling

The number of reads was almost equally distributed among the different lines, with ~90% of annotated genes (> 25,000) being tagged by at least one read (Table III).
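The read-level figures reported for the sequenced library can be re-derived with a few lines of arithmetic. The sketch below uses only the counts given in the report; the variable names are illustrative, and computing redundancy from the reads retained after filtering (rather than from all reads passing quality trimming) is an assumption.

```python
# Read counts as reported for the single HiSeq2000 lane.
total_reads = 137_026_622    # raw 50-bp single-end reads
passed_qc = 127_384_853      # reads passing initial quality trimming
removed = 3_002_729          # too short, or missing the RE overhang
unique_tags = 13_278_501     # distinct read sequences in the dataset

pass_rate = passed_qc / total_reads      # fraction passing quality trimming
removed_rate = removed / passed_qc       # fraction discarded after trimming
retained = passed_qc - removed           # reads kept for downstream analysis
# Assumption: redundancy is taken as retained reads per unique read tag.
redundancy = retained / unique_tags

print(f"passed QC:  {pass_rate:.1%}")
print(f"removed:    {removed_rate:.1%}")
print(f"retained:   {retained:,}")
print(f"redundancy: {redundancy:.1f}x")
```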
In particular, almost 50% of the reads in each individual could be aligned to the reference genome, and 50% of the aligned reads tagged gene sequences. The total number of reads per gene in each line ranged from 36 to 84, with a mean of 52 reads per gene in each individual. These results are consistent with the in silico digestion of the P. vulgaris genome, and showed a homogeneous read mapping rate for races belonging to different gene pools, and also for a sister species in the same genus (P. dumosus).

A total of 77,595 high-quality SNPs and InDels were identified. Among the variants, 73,656 (94.9%) were SNPs, 2,088 (2.7%) were deletions and 1,851 (2.4%) were insertions. The InDels ranged from 1 to 8 bp, with the majority of them being mononucleotide insertions and deletions. Because most plant genomes are highly repetitive, and SNPs and InDels are therefore prone to miscalls in repetitive regions, all variants located in these regions were removed. In total, 61% (47,838) of the identified variants were located in non-repetitive regions of the genome. This ratio is similar to the in silico occurrence of CviAII recognition sites in non-repetitive vs. repetitive regions of the genome. These variants are divided between 44,875 (94%) SNPs, 1,940 (3%) deletions and 1,693 (3%) insertions. For further analysis, only these non-repetitive SNPs were considered. The SNP and InDel distributions were significantly and strongly correlated with chromosome length (r=0.79, p=0.004) (Fig. 3), with a mean of ~4,328 and a median of 4,312 variants per chromosome, and a median of 79 variants per Mb. The distribution of SNPs and InDels showed a non-random pattern, with few or no SNPs in the centromeric parts of the genome. Of the 47,838 SNPs and InDels identified, 23,273 (49%) were located in genic sequences, with 11,163 (23%) in CDS, 2,285 (5%) in untranslated regions (UTRs) and 9,825 (21%) in introns.
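The genic breakdown of the non-repetitive variants can be checked directly from the reported counts. The following is a small sketch; the dictionary layout and variable names are illustrative and not part of the project's pipeline.

```python
# Counts of non-repetitive SNPs and InDels as reported above.
non_repetitive_variants = 47_838
genic = {"CDS": 11_163, "UTR": 2_285, "intron": 9_825}

genic_total = sum(genic.values())  # all variants falling in genic sequences

# Share of each feature class among all non-repetitive variants ...
share_of_all = {k: v / non_repetitive_variants for k, v in genic.items()}
# ... and among genic variants only.
share_of_genic = {k: v / genic_total for k, v in genic.items()}

print(f"genic: {genic_total:,} ({genic_total / non_repetitive_variants:.0%})")
for feature in genic:
    print(f"{feature:>6}: {share_of_all[feature]:.0%} of all, "
          f"{share_of_genic[feature]:.0%} of genic")
```

The second set of shares expresses the same counts relative to genic variants only, which is the basis used for per-genotype comparisons.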
For all the genotypes analyzed, 45-49% of the SNPs and InDels were located in genic sequences; among them, ~50% were located in CDS, ~40% in introns, and ~10% in UTRs. The 23,273 SNPs and InDels located in genic sequences tagged 11,027 different genes, with an average of 2 variants per gene.

Objective 3: None so far.

Publications