Full-length cDNA Resources for Legume Genomics - INSTITUTE FOR GENOMIC RESEARCH

FULL-LENGTH CDNA RESOURCES FOR LEGUME GENOMICS

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

NRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

0207303

Grant No.

2006-35300-17144

Cumulative Award Amt.

(N/A)

Proposal No.

2006-03597

Multistate No.

(N/A)

Project Start Date

Sep 1, 2006

Project End Date

Aug 31, 2010

Grant Year

2006

Program Code

[52.1]- Plant Genome

Recipient Organization
INSTITUTE FOR GENOMIC RESEARCH
9712 MEDICAL CENTER DRIVE
ROCKVILLE,MD 20850

Performing Department
(N/A)

Non Technical Summary
Legumes are one of the two most important crop families in the world and, due to their ability to convert atmospheric nitrogen into organic compounds, have high levels of protein that provides nearly 33% of all human nutritional requirements for nitrogen. Large-scale genome sequencing efforts that will provide many fundamental insights into legume biology are underway for two model species with small genomes, Medicago truncatula and Lotus japonicus, and one crop species, soybean, which is a mainstay of US agriculture. Complementary DNAs (cDNAs) are derived from the messenger RNAs (mRNAs) of expressed genes by a process of reverse transcription followed by second-strand DNA synthesis. Due to technical limitations, not all cDNAs represent the entire mRNA of a gene. Full-length cDNAs (FL-cDNAs) contain the entire coding sequence of a gene and are important in plant genomics for two reasons: gene structure annotation and gene function analysis. In the research proposed here, we will generate large collections (12,000-15,000) of FL-cDNA clones and sequences for each of the three species and make them publicly available to the academic and industrial agricultural research communities. These sequences will be used to provide high-quality annotation of gene structures in the genome sequences of Medicago, Lotus and soybean. The clones will be used for functional studies by expressing the encoded proteins and studying their biochemical properties, such as catalytic activity and the nature of the proteins with which they interact.

Animal Health Component

25%

Research Effort Categories

Basic

75%

Applied

25%

Developmental

(N/A)

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
201	1820	1080	33%
201	2420	1080	67%

Knowledge Area
201 - Plant Genome, Genetics, and Genetic Mechanisms;

Subject Of Investigation
1820 - Soybean; 2420 - Noncrop plant research;

Field Of Science
1080 - Genetics;

Keywords

Goals / Objectives
The project will generate clones and sequences for approximately 12,000 full-length cDNAs from each of three species: Medicago truncatula, Lotus japonicus and Glycine max (soybean).

Project Methods
For each species, the project will generate normalized libraries from RNAs pooled from 3 groups of tissues: 1. flowers, early seed, late seed, stems; 2. leaves, abiotic and biotic stressed leaves, tissue culture and elicitor-treated plants; 3. roots, early and late nodules, abiotic and biotic stress. From each library, 13,000-14,000 clones will be sequenced from the 5' end. Potentially full-length clones will be identified by comparison of the 5' sequence with a comprehensive protein database. Representative full-length clones for each distinct coding sequence will be re-arrayed and sequenced from both ends. All sequences will be deposited in GenBank and the clones distributed through the Noble Foundation in the US and INRA-CNRGV in the EU.

Progress 09/01/06 to 08/31/10

Outputs
OUTPUTS: We constructed three normalized cDNA libraries from each of three legume species: soybean, Medicago and Lotus. For each species, one library was from above-ground tissue, one from below-ground tissue and one from various stressed tissues (biotic and abiotic). For each species, we generated ~40,000 Sanger reads, predominantly from the 5' ends of the clones. For the soybean and Medicago libraries, approximately 40% of the clones appeared to be 5' complete (i.e. full-length) as judged by the TargetIdentifier program, while for Lotus, the figure was ~35%. A non-redundant set of potentially full-length clones was identified for each species and these were re-arrayed for complete sequencing. To facilitate sequencing, clones were arranged in plates according to their predicted insert size: less than 750 bp, 750-1200 bp, and above 1200 bp. All re-arrayed clones were sequenced from both ends. Primers for walking the larger clones were designed either from available genome sequence or from the 5' and 3' reads of the clones themselves. The medium- and large-insert clones were then subjected to one or two rounds of primer walking, resulting in up to 4 or 6 sequences per clone respectively. After sequencing was complete, reads were assembled on a clone-by-clone basis to produce either a single sequence for each full-length cDNA clone or two partial sequences (5' and 3') if insufficient sequence was obtained. The clone-by-clone assemblies were submitted to GenBank as individual records so that they can be cross-referenced to the clones themselves, which will be available through stock centers in the US and in Europe. Results from the project have been reported at the Model Legumes Congress, at USDA workshops at the Plant and Animal Genome Conference and in seminars at a number of institutions. All sequence data has been submitted to GenBank.
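
The size-based plate arrangement and read counts described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual pipeline: the function names are hypothetical, and the mapping of one walking round to medium inserts and two rounds to large inserts (two extra reads per round) is an assumption read from the "4 or 6 sequences per clone respectively" wording.

```python
# Illustrative sketch only; names are hypothetical, thresholds come from the report.
SMALL_MAX_BP = 750    # inserts below this need end reads only
MEDIUM_MAX_BP = 1200  # inserts of 750-1200 bp form the medium class


def size_class(insert_bp: int) -> str:
    """Assign a clone to a plate class from its predicted insert size."""
    if insert_bp < SMALL_MAX_BP:
        return "small"
    if insert_bp <= MEDIUM_MAX_BP:
        return "medium"
    return "large"


def max_reads(insert_bp: int) -> int:
    """Upper bound on reads per clone.

    Assumption: every clone gets one 5' and one 3' end read, and each
    primer-walking round adds one read from each end (medium: one round,
    large: two rounds).
    """
    walking_rounds = {"small": 0, "medium": 1, "large": 2}[size_class(insert_bp)]
    return 2 + 2 * walking_rounds


if __name__ == "__main__":
    for bp in (500, 1000, 2000):
        print(bp, size_class(bp), max_reads(bp))
```

Under these assumptions a 1,000 bp insert falls in the medium class with at most 4 reads, and a 2,000 bp insert in the large class with at most 6.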
PARTICIPANTS: The following individuals worked on this project: Yongli Xiao (Staff Scientist): performed early work on FL-cDNA identification and resequencing. Now a Staff Scientist at NIH. Bill Moskal (Research Associate): library construction, sequencing and data analysis. Now a group leader at Dow Agrochemicals. Foo Cheung (Bioinformatics Engineer): data analysis and GenBank submissions. Now a group leader at NIH. Agnes Chan (Assistant Professor): library construction, data analysis, re-arraying and resequencing. Currently Assistant Professor at JCVI. Vivek Krishnakumar (Bioinformatics Engineer): data analysis and GenBank submission. Currently Bioinformatics Engineer at JCVI. cDNA clones will be available through the Noble Foundation (USA) and CNRGV (the French Plant Genomic Resource Center). TARGET AUDIENCES: Target audience for the initial sequencing efforts was the groups involved in annotating genome sequence. Target audience for the FL-cDNA clones is the plant research community. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
The initial rounds of random sequencing contributed to the collections of ESTs in GenBank and became parts of the datasets used for gene structure prediction and validation in their respective whole genome sequencing projects: soybean, Medicago and Lotus. The full-length cDNA clones provide a valuable resource for researchers who will use the clones either to test predictions from their own research by creating transgenic plants or to express protein for antibody generation, biochemical activity determination or other purposes. Totals of 9,946 soybean, 8,273 Medicago and 8,566 Lotus full-length cDNA clones were created, sequenced and made available. We also responded to a number of requests for specific clones while the project was in progress.

Publications

No publications reported this period

Progress 09/01/07 to 08/31/08

Outputs
OUTPUTS: We have 1. completed construction and random sequencing of 3 Medicago libraries; 2. completed construction and random sequencing of 3 soybean libraries; 3. completed construction and pilot sequencing of 3 Lotus libraries. Complete sequencing of the Lotus libraries is in progress. PARTICIPANTS: Nothing significant to report during this reporting period. TARGET AUDIENCES: Nothing significant to report during this reporting period. PROJECT MODIFICATIONS: Due to personnel changes, the project was not completed within the initial funding period and a third-year no-cost extension has been approved.

Impacts
We have settled on the program "TargetIdentifier", which is very similar to our own in-house pipelines, as the best way to move ahead with FL-cDNA clone identification. Our analysis to date has identified ~3,000 Medicago FL-clones; these have been re-arrayed and fully sequenced. We have identified over 5,000 FL-clones in the first-pass analysis and begun planning to re-array and re-sequence them. We have also used PASA (the Program to Assemble Spliced Alignments), in conjunction with the soybean genomic sequence, to distinguish near-identical paralogs and have found ~2,000 genes (1,000 pairs) with >95% identity that can be sequenced as separate clones. We have submitted 17,560 new Medicago EST sequences and 40,977 new soybean EST sequences to GenBank.

Publications

No publications reported this period

Progress 09/01/06 to 08/31/07

Outputs
OUTPUTS: RNA samples for library construction were solicited from workers in legume genomics worldwide. For soybean, we received a total of 45 diverse samples from 6 different laboratories. Three normalized libraries have been constructed from separate pools of RNA representing above-ground tissue, below-ground tissue and various kinds of stressed tissue. Approximately 10,000 5' reads have been generated from each of these libraries. A total of 32,575 good reads assembled into 6,306 contigs, with an average of 4 reads per contig, and 7,730 singletons. For Medicago truncatula, we received 37 samples from 7 different laboratories. To date, two normalized libraries have been constructed from pooled RNAs, one from above-ground and one from below-ground tissues. A third library, representing various stressed tissues, is under construction. A total of 17,500 reads from these two libraries assembled into 3,719 contigs, with an average of ~3 reads per contig, and 5,584 singletons, representing 9,303 unique sequences. For the soybean and Medicago libraries respectively, on average 43% and 53% of the 5' reads represent unique sequences, and 40% and 48% of the reads are 5'-complete (possess a methionine start codon), based on alignment to protein databases. In all, 16% and 33% of the soybean and Medicago libraries respectively represent unique and 5'-complete clones. There have been fewer donations of Lotus RNA; construction of the first library will be initiated shortly and other donations are being solicited. After potentially full-length clones are identified, they are re-arrayed and sequenced from both ends, using primer walking for longer clones. Individual 5' EST reads and sequences of full-length cDNA assemblies are submitted to GenBank. The full-length clones themselves will be deposited in and available from stock centers at the Noble Foundation (Ardmore OK, USA) and at the French National Resources Centre for Plant Genomics (CNRGV) at INRA, Toulouse.
PARTICIPANTS: William A. Moskal Jr., M.S. - Research Associate (library construction, sequencing and bioinformatics); Agnes P. Chan, Ph.D. - Staff Scientist (library construction, sequencing and bioinformatics); Yongli Xiao, Ph.D. - Staff Scientist (sequencing and bioinformatics); Foo Cheung, Ph.D. - Bioinformatics Engineer (bioinformatics); members of the J. Craig Venter Institute Joint Technology Center (sequencing and clone re-arraying). TARGET AUDIENCES: Legume research community.
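
The assembly figures quoted in the Outputs above can be cross-checked with simple arithmetic. The Python sketch below is an illustration only; the function name is hypothetical, and it assumes that singletons are counted within the reported read totals and that each contig or singleton counts as one unique sequence.

```python
def assembly_stats(total_reads: int, contigs: int, singletons: int) -> dict:
    """Derive summary figures from reported read, contig and singleton counts.

    Assumption: singletons are part of total_reads, so the reads that went
    into contigs are total_reads - singletons.
    """
    unique_sequences = contigs + singletons
    reads_in_contigs = total_reads - singletons
    return {
        "unique_sequences": unique_sequences,
        "unique_fraction": unique_sequences / total_reads,
        "reads_per_contig": reads_in_contigs / contigs,
    }


# Counts as reported for the 2006-2007 period.
soybean = assembly_stats(total_reads=32575, contigs=6306, singletons=7730)
medicago = assembly_stats(total_reads=17500, contigs=3719, singletons=5584)

if __name__ == "__main__":
    for name, stats in (("soybean", soybean), ("Medicago", medicago)):
        print(
            f"{name}: {stats['unique_sequences']} unique sequences, "
            f"{stats['unique_fraction']:.1%} of reads unique, "
            f"{stats['reads_per_contig']:.2f} reads per contig"
        )
```

Under these assumptions the reported counts reproduce the 9,303 unique Medicago sequences and the 43% and 53% unique-read fractions for soybean and Medicago.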

Impacts
Full-length cDNA sequences are a crucial resource for accurate prediction of gene structures in genomic DNA. The sequences generated in this project will be used to enhance the annotation of the genomes of Lotus japonicus, Medicago truncatula and Glycine max, all of which will be more or less completely sequenced within the next 18 months. Full-length cDNA clones provide a valuable resource for legume researchers to undertake expression and functional characterization of genes of interest.

Publications

No publications reported this period