Tools and Applications of Gene-by-Gene Sequencing in Common Bean


Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

NRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

0203112

Grant No.

2005-35300-15450

Cumulative Award Amt.

(N/A)

Proposal No.

2005-00757

Multistate No.

(N/A)

Project Start Date

Apr 15, 2005

Project End Date

Oct 14, 2008

Grant Year

2005

Program Code

[52.1]- (N/A)

Recipient Organization
NORTH DAKOTA STATE UNIV
1310 BOLLEY DR
FARGO,ND 58105-5750

Performing Department
PLANT SCIENCES

Non Technical Summary
To date, genomic information has been collected for only a few select plant species. Gene sequence data is the most abundant information currently available. It is now time to use the data from those species to assist with genomic analysis in other species. Tools will be developed that organize that data in a useful manner that enables other researchers studying other plant species to apply gene sequence based analysis to their species of choice. We will demonstrate this usefulness by applying those tools to the study of common bean (Phaseolus vulgaris L.).

Animal Health Component

50%

Research Effort Categories

Basic

50%

Applied

50%

Developmental

(N/A)

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
201	1410	1080	50%
201	2499	1080	50%

Knowledge Area
201 - Plant Genome, Genetics, and Genetic Mechanisms;

Subject Of Investigation
1410 - Beans (dry); 2499 - Plant research, general;

Field Of Science
1080 - Genetics;

Keywords

Goals / Objectives
Extensive genomic resources are not available to most plant researchers, yet all can benefit by applying the data obtained by the large genome projects. The most abundant genomic resource is the collection of gene sequences developed from gene-by-gene or whole-genome sequencing approaches. It is now time for researchers working with other crop species to mine this sequence information and convert it into useful genomic tools. The first objective of this project is to develop tools and procedures that define clusters of plant orthologs and paralogs. In addition, a WWW interface will be developed that displays multiple alignments of cluster members. Finally, approaches will be defined and implemented to design primer sequences for the subsequent amplification of a cluster ortholog from another plant species. Common bean (Phaseolus vulgaris L.) is one species for which genomic resources are underdeveloped. A greater abundance of gene sequence data would be useful for coordinating the genetic and physical maps of the species. In addition, the sequence data are critical for the discovery of genes controlling important phenotypes using candidate gene and association mapping approaches. Therefore, the second objective is to apply these clustering, alignment, and primer development tools to sequence one or more fragments from each of 300 genes from common bean. These genes will then be mapped on the community-wide BAT93 x Jalo EEP558 linkage map.

Project Methods
Objective 1: Complete gene models will be collected for all genes characterized by gene-by-gene approaches or complete genome sequencing. An all-against-all BLAST analysis will be performed. Complete linkage hierarchical clustering will then be performed at specific e-values to define the orthologous/paralogous relationships. Clusters of orthologs/paralogs defined at a specific e-value will be aligned using multiple alignment techniques. The alignment data will then be used as input to define primers for amplification of orthologous fragments from other species. The clusters, multiple alignments, and primer information will be delivered via a WWW interface that enables the user to apply a taxonomic approach to select the cluster members from which the alignments and primers are defined.
Objective 2: Orthologs discovered using the tools described above, along with any available EST data, will be used to define primers for PCR amplification of gene fragments from the common bean (Phaseolus vulgaris L.) genotypes BAT93 and Jalo EEP558. The genes will represent those involved in metabolic pathways, Arabidopsis mutant phenotypes, and other relevant agronomic phenotypes. The fragments will be sequenced, and indel or SNP polymorphisms will be used to develop mapping primers. The BAT93 x Jalo EEP558 recombinant inbred population will be scored, and a linkage map consisting of these genes along with previously described molecular and phenotypic markers will be developed.
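The clustering step in Objective 1 can be illustrated with a minimal Python sketch. It is not the project's implementation: the function name, the input layout (a dictionary of pairwise e-values), and the example identifiers are assumptions made for illustration. Under complete linkage, two clusters merge only when every cross-cluster pair has an e-value at or below the chosen threshold, and pairs with no BLAST hit are treated as infinitely distant.

```python
import math
from itertools import combinations


def complete_linkage_clusters(ids, evalues, threshold):
    """Naive agglomerative complete-linkage clustering on BLAST e-values.

    ids       : iterable of sequence identifiers (hypothetical names)
    evalues   : dict mapping frozenset({a, b}) -> e-value of the a/b hit;
                pairs with no hit are simply absent
    threshold : merge clusters only if the WORST cross-pair e-value
                is <= threshold
    """
    def dist(a, b):
        # A missing pair means no BLAST hit, i.e. infinitely dissimilar.
        return evalues.get(frozenset((a, b)), math.inf)

    def linkage(c1, c2):
        # Complete linkage: the largest (worst) e-value between members.
        return max(dist(a, b) for a in c1 for b in c2)

    clusters = [frozenset([i]) for i in ids]
    while len(clusters) > 1:
        best = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        if linkage(*best) > threshold:
            break  # no remaining pair of clusters satisfies the threshold
        clusters = [c for c in clusters if c not in best] + [best[0] | best[1]]
    return sorted(sorted(c) for c in clusters)


# Toy example with made-up identifiers and e-values.
hits = {
    frozenset(("A", "B")): 1e-80,
    frozenset(("A", "C")): 1e-50,
    frozenset(("B", "C")): 1e-10,
    frozenset(("C", "D")): 1e-60,
}
print(complete_linkage_clusters(["A", "B", "C", "D"], hits, threshold=1e-30))
```

In this toy data set, single linkage would chain all four sequences together through the A/C hit, whereas complete linkage keeps A/B and C/D apart because B and C (and the pairs involving D) do not meet the threshold.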

Progress 04/15/05 to 10/14/08

Outputs
OUTPUTS: The two major goals of the project are: 1) develop a 300-gene transcript map of common bean using the community-wide BAT93 x Jalo EEP558 mapping population; and 2) develop an online tool in which sequence data can be used to design primers that amplify target genes from species severely lacking genomic sequence resources. We have completed the first goal. All of the data have been collected, CAPS and dCAPS markers were developed, the parents of the major common bean mapping populations were scored for these polymorphic loci, and all of the data are currently loaded into the LIS database at NCGR. We are going beyond this goal and developing a low-density SNP map from the sequence data that we have collected. In addition, we have collaborated with the Jackson lab (Purdue) to develop a physical map of common bean and integrate it with the genetic map. We are also working with the soybean research community to trace the ancestry of the soybean genome using the common bean transcript map as a reference. All of these last activities were beyond the initial goals of the project. For goal two, we have implemented or developed all of the tools needed to extract the necessary sequence data, define appropriate gene families, discover appropriate primer site targets, and report primer sequences. We are currently implementing these in our WWW interface.
PARTICIPANTS: Nothing significant to report during this reporting period.
TARGET AUDIENCES: Nothing significant to report during this reporting period.
PROJECT MODIFICATIONS: Not relevant to this project.
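A CAPS marker depends on a polymorphism that creates or removes a restriction enzyme recognition site, so the two parental alleles give different digestion patterns. The following Python sketch shows one way to screen a pair of allele sequences for such a difference. It is an illustration only and not the project's pipeline: the function names and the example sequences are hypothetical, and the three enzymes are listed simply because their recognition sites are well known and palindromic.

```python
def count_sites(seq, site):
    """Count (possibly overlapping) occurrences of a recognition site."""
    seq, site = seq.upper(), site.upper()
    width = len(site)
    return sum(1 for i in range(len(seq) - width + 1) if seq[i:i + width] == site)


def caps_candidates(allele_a, allele_b, enzymes):
    """Return the enzymes whose site count differs between two alleles.

    Only the given strand is scanned, which is sufficient for palindromic
    sites; non-palindromic sites would also need a reverse-complement scan.
    """
    return sorted(
        name
        for name, site in enzymes.items()
        if count_sites(allele_a, site) != count_sites(allele_b, site)
    )


# Recognition sites of three common enzymes (all palindromic).
ENZYMES = {"EcoRI": "GAATTC", "HindIII": "AAGCTT", "TaqI": "TCGA"}

# Made-up allele fragments that differ by a single A/G SNP.
allele_bat = "ATGCGAATTCGGTA"
allele_jalo = "ATGCGAGTTCGGTA"
print(caps_candidates(allele_bat, allele_jalo, ENZYMES))
```

When no enzyme distinguishes the alleles, a dCAPS design introduces a mismatch in the primer to create a site next to the SNP; that step is not covered by this sketch.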

Impacts
The development of the database will enable researchers working on species without significant sequence data to apply the modern candidate gene genomic approach to their research. The gene-based map of common bean will provide a framework for 1) gene cloning of important agronomic target genes in common bean and 2) the application of comparative legume genome analysis to the improvement of the crop.

Publications

Schlueter JA, Goichochea JL, Gill V, Lin J-Y, Yu U, Collura K, Vallejos, Thome J, Blair M, McClean P, Wing R, Jackson SA. 2008. BAC-end sequence and a draft physical map of the common bean (Phaseolus vulgaris L.) genome. Tropical Plant Biology 1:40-48.
Mamidi, S., Lee, R.K., Terpstra, J., Schlueter, J.A., Dixon, P., Shoemaker, R.C., Lavin, M., and McClean, P.E. 2008. Investigating gene duplication events in legumes using EST sequence data. Plant and Animal Genome XVI Abstracts. http://www.intl-pag.org/16/abstracts/PAG16_P05f_378.html.
McConnell, M.D., Lee, R.K., Choi, I.Y., Song, Q., Song, Q.J., Cregan, P., McClean, P.E. 2008. Macrosyntenic relations between common bean (Phaseolus vulgaris L.), Medicago, Arabidopsis, and Poplar. http://www.intl-pag.org/16/abstracts/PAG16_P05f_420.html.
Gepts P, Aragao F, de Barros E, Blair, MW, Brondani R, Broughton WJ, Galasso I, Hernandez G, Kami J, Lariguet P, McClean P, Melotto M, Miklas P, Pauls P, Pedrosa-Harand A, Porch T, Sanchez F, Sparvoli F and Yu K. 2008. Genomics of Phaseolus beans, a major source of dietary protein and micronutrients in the Tropics. In: Moore PH, and Ming R (Eds), Genomics of Tropical Crop Plants. Springer, Berlin, 113-143.

Progress 04/15/06 to 04/14/07

Outputs
Database development: The latest version of the database can be viewed at http://134.129.125.203/. The current version of the database has full CDS sequences for Viridiplantae species in GenBank (as of November 2006) except Arabidopsis, rice, and Medicago. For those three species, the database is populated with the gene models generated from the ongoing sequencing projects. We have implemented a phylogenetic approach to the database. The user selects the specific level of the taxonomic hierarchy for which they want data. For example, a legume researcher can expand Angiosperms/Eudicots/Core Eudicots/Rosids and then select Legumes, or can select any level above to obtain a broader survey of sequences. After the user enters a query, the results show the individual records in the database that meet the query criteria. Next, the user selects one of the records, and other sequences that are similar at a user-selected e-value are returned. We are currently implementing several other features. First, the user will be able to view a multiple alignment of the sequences (using the MultAlin algorithm). Next, they will be able to create primers based on the Primer3 algorithm. They will also be able to download the nucleotide and amino acid sequences for the sequences in the cluster. Once all of the features are in place, we intend to completely update the database by including all of the Viridiplantae CDS sequences along with all of the full gene models from all sequenced plant genomes. When complete, the database should include over 125,000 records for orthology and paralogy searches as well as primer design.
Gene-by-gene sequencing in common bean: 300 genes were mapped using markers from the core BJ map (Freyre et al. 1998) as a guide for linkage group assignment. The completed map is 1586.7 cM in length and contains 205 markers mapped at a LOD score of 3.0 or better, 159 of which are new gene-based markers. A total of 303 markers were mapped at LOD 2.0 or better, 214 of which are gene-based. In addition, 139 markers could be assigned to bins between markers on the LOD 2.0 map. In total, 285 gene-based markers could be assigned to a location. The map also contains the locations of other previously mapped markers. Linkage group (LG) 2 is the longest linkage group, at 207.5 cM, and LG 5 is the shortest, at 77.2 cM. The number of markers on each linkage group ranges from 27 (LG 4 and LG 10) to 56 (LG 2). The average number of markers on each linkage group is 41, and the average number of gene-based markers on each linkage group is 26. We have deposited the map and all associated data in the Legume Information System database (see http://www.comparative-legumes.org/cgi-bin/cmap/map_set_info?species_acc=Pv&map_type_acc=-1 and select the maps associated with McClean (NDSU) 2007). We have begun studying synteny with other species and found relationships that extend over tens of megabases in Medicago and poplar and tens of centimorgans in common bean. As with other plant species, we observed synteny over a distance of about a megabase in Arabidopsis and a few centimorgans in common bean.
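The taxonomic selection described for the database interface amounts to keeping the records whose lineage begins with the path the user has expanded. A minimal Python sketch of that idea follows. The record layout, the identifiers, and the function name are assumptions for illustration and do not reflect the actual database schema.

```python
def select_by_lineage(records, path):
    """Return the ids of records whose lineage starts with the chosen path.

    records : list of dicts with hypothetical keys "id" and "lineage"
    path    : sequence of taxonomic levels selected by the user
    """
    path = tuple(path)
    return [r["id"] for r in records if tuple(r["lineage"][:len(path)]) == path]


# Made-up records using the hierarchy labels quoted in the report.
ROSIDS = ("Angiosperms", "Eudicots", "Core Eudicots", "Rosids")
records = [
    {"id": "bean_1", "lineage": ROSIDS + ("Legumes",)},
    {"id": "arabidopsis_1", "lineage": ROSIDS + ("Brassicales",)},
    {"id": "rice_1", "lineage": ("Angiosperms", "Monocots")},
]

# A legume researcher selects the Legumes node.
print(select_by_lineage(records, ROSIDS + ("Legumes",)))
# Selecting a higher level gives a broader survey of sequences.
print(select_by_lineage(records, ("Angiosperms", "Eudicots")))
```

Selecting a shorter path returns a superset of the records returned by any longer path that extends it, which matches the broader survey described for higher levels of the hierarchy.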

Impacts
The development of the database will enable researchers working on species without significant sequence data to apply the modern candidate gene genomic approach to their research. The gene-based map of common bean will provide a framework for 1) gene cloning of important agronomic target genes in common bean and 2) the application of comparative legume genome analysis to the improvement of the crop.

Publications

Rossi M, Mamidi S, Bellucci E, McConnell MD, Lee RD, Papa R, McClean PE. 2007. The effect of selection on loci within close proximity of domestication loci in common bean (Phaseolus vulgaris L.). Phaseomic V Abstracts. p. 9.
McConnell MD, Mamidi S, Rossi M, Lee RK, and McClean PE. 2007. A gene-based linkage map of common bean (Phaseolus vulgaris L.). Plant and Animal Genome XV Abstracts. (http://www.intl-pag.org/15/abstracts/PAG15_P05f_414.html)
Mamidi S, Rossi M, McConnell MD, Lee RK, Papa R, McClean PE, and Bellucci E. 2007. Investigation of the domestication process in common bean (Phaseolus vulgaris L.) using multilocus data. Plant and Animal Genome XV Abstracts. (http://www.intl-pag.org/15/abstracts/PAG15_P05f_416.html)
Buchfink DJ, Denton A, McClean P. 2007. Database and tools for primer design. Plant and Animal Genome XV Abstracts. (http://www.intl-pag.org/15/abstracts/PAG15_P08a_843.html)

Progress 04/15/05 to 04/14/06

Outputs
Database development: The goal of this aspect of the project is to develop a database that users interested in gene-by-gene sequencing can query to obtain a suite of primers with which the target gene can be amplified from their species of interest. To that end, we created a database structure to store all relevant data. We have downloaded all of the publicly available gene models for Arabidopsis, rice, Medicago, and maize from the databases that curate these data. In addition, we downloaded all full gene models available from GenBank for species other than these model species. Collectively, we have over 100,000 sequences. These sequences were analyzed in an all-against-all manner using blastp and clustered using complete linkage clustering. Clusters defined at specific similarity levels were aligned using the MultAlin algorithm. An algorithm was developed to search the alignment for the best regions for primer development. A Perl script was tested to pass specific sequences to Primer3 for primer development. More specifically, we implemented a BioSQL schema in a PostgreSQL database. For clustering, the constraint required that 50% of each gene be involved in the cluster. Finally, to evaluate the clustering, we used a histogram-based evaluation measure and determined that complete linkage clustering provided better alignments than single linkage clustering.
Gene-by-gene sequencing of common bean: The second goal of the project is to perform gene-by-gene sequencing with common bean. We collected all of the known tentative consensus (TC) sequences of common bean and compared them with all of the gene models from Arabidopsis. Genes to be sequenced were selected based on similarity in a BLAST search. TC sequences were used as queries for an all-against-all blastp analysis against individual databases containing Arabidopsis thaliana genes with mutant phenotypes, genes under selection during domestication in maize, A. thaliana genes involved in biochemical pathways, and all A. thaliana genes. A gene was selected for sequencing if it had at least 100 nucleotides in the 3' UTR and an E-value less than 1e-30 with the top hit. Primers were designed with Primer3 with a target TC fragment size of 450-500 nucleotides, a primer size of 18-28 nucleotides, and a Tm of about 58 °C for all primers. The 3' primer was targeted to a location 150 nt downstream of the putative stop codon. Fragments were amplified from BAT93 (Bat) and Jalo EEP558 (Jalo) genomic DNA and directly sequenced. Of the more than 1000 genes analyzed to date, DNA sequence data for the two genotypes were obtained for 322. Of these, 222 genes were polymorphic. A total of 1003 polymorphisms were detected, and of these 85.5% were SNPs. On average, one SNP was detected every 151 nt, and one indel was observed every 897 nt. Of the polymorphisms, 44.1% were located in introns, 38.7% in exons, and 17.0% in the 3' UTR. SNPs were evenly distributed between introns and exons, whereas indels were largely found within introns. The sequence polymorphism data were used to develop CAPS markers, and to date, we have mapped 52 genes on the Bat x Jalo linkage map.
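The gene selection rule and the Primer3 settings described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Perl script used in the project: the function names, the example identifier, and the placeholder template are hypothetical, and the tag names follow the Boulder-IO input format of Primer3 version 2.x, which may differ from the version available at the time of the work. The placement of the 3' primer 150 nt downstream of the stop codon is not encoded here.

```python
def passes_selection(utr3_length, top_hit_evalue):
    """Selection rule from the report: at least 100 nt of 3' UTR and a
    top-hit E-value below 1e-30."""
    return utr3_length >= 100 and top_hit_evalue < 1e-30


def primer3_record(seq_id, template):
    """Build one Boulder-IO input record for primer3_core.

    Tag names are those of Primer3 2.x (an assumption); values mirror the
    settings quoted in the report.
    """
    tags = [
        ("SEQUENCE_ID", seq_id),
        ("SEQUENCE_TEMPLATE", template),
        ("PRIMER_PRODUCT_SIZE_RANGE", "450-500"),  # target fragment size
        ("PRIMER_MIN_SIZE", 18),                   # primer length, lower bound
        ("PRIMER_MAX_SIZE", 28),                   # primer length, upper bound
        ("PRIMER_OPT_TM", 58.0),                   # Tm of about 58 degrees C
    ]
    # A line containing only "=" terminates a Boulder-IO record.
    return "".join(f"{key}={value}\n" for key, value in tags) + "=\n"


# Hypothetical candidate: identifier and template are placeholders.
candidate = {"id": "TC_example", "utr3_length": 140, "evalue": 1e-45}
if passes_selection(candidate["utr3_length"], candidate["evalue"]):
    record = primer3_record(candidate["id"], "ACGT" * 150)
    print(record)
```

The resulting text could be piped to `primer3_core` on standard input; any additional constraints would be added as further tag lines before the terminating `=`.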

Impacts
The development of the database will enable researchers working on species without significant sequence data to apply the modern candidate gene genomic approach to their research. The gene-based map of common bean will provide a framework for 1) gene cloning of important agronomic target genes in common bean and 2) the application of comparative legume genome analysis to the improvement of the crop.

Publications

McConnell, M., Mamidi, S., Lee, R. and McClean, P.E. 2006. DNA sequence polymorphisms among common bean genes. Annu. Rept. Bean Improv. Coop. 49: in press.
Kar, A., Dorr, D., Denton, A. and McClean, P. 2006. Evaluating clusterings for primer design. Plant and Animal Genome XIV Abstracts. p. 322.
Dorr, D., and Denton, A. 2006. Clustering sequences by length alignment. Plant and Animal Genome XIV Abstracts. p. 327.
McClean, P.E., Lee, R.D., McConnell, M.D., Mamidi, S. and White, A. 2006. Sequence and marker-based diversity in common bean. Plant and Animal Genome XIV Abstracts. p. 40.
McConnell, M., Mamidi, S., Lee, R., and McClean, P. 2006. Mapping putative functional genes in Phaseolus vulgaris. Plant and Animal Genome XIV Abstracts. p. 212.