Source: THE SAMUEL ROBERTS NOBLE FOUNDATION, INC. submitted to
FIVE THOUSAND VIRUS GENOMES
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
TERMINATED
Funding Source
Reporting Frequency
Annual
Accession No.
0209207
Grant No.
2007-35600-17833
Project No.
OKLR-2007-01012
Proposal No.
2007-01012
Multistate No.
(N/A)
Program Code
23.2
Project Start Date
Jan 1, 2007
Project End Date
Jun 30, 2010
Grant Year
2007
Project Director
Roossinck, M. J.
Recipient Organization
THE SAMUEL ROBERTS NOBLE FOUNDATION, INC.
2510 SAM NOBLE PARKWAY
ARDMORE,OK 73401
Performing Department
(N/A)
Non Technical Summary
Viruses are a vital part of every complex ecosystem, but they remain largely unknown. The 5000 Virus Genomes project is a study of RNA viruses that occur in wild plants. We will look for viruses without regard to symptoms in plants. The study will be conducted in the Area Conservacion Guanacaste in northwestern Costa Rica, because this is an area of extremely high biodiversity (in 350,000 acres there are more plants than in the United States and Canada) where the infrastructure for research is well established. A subset of the plant families found in the region will be analyzed for the presence of viruses, and any positive samples will be analyzed to determine the RNA sequence of these viruses. A corollary study will use sequence information to determine if viruses are moving from agricultural lands to wildlands or from wildlands to agricultural lands. Our preliminary information shows that about 60% of wild plants are infected with RNA viruses, but most do not show any signs of disease. The results of this study will probably change the way we think about viruses, because most of what we know about viruses comes from studying the ones that cause disease in humans and domesticated plants and animals.
Animal Health Component
(N/A)
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
212                    4030                              1040                      100%
Knowledge Area
212 - Pathogens and Nematodes Affecting Plants;

Subject Of Investigation
4030 - Viruses;

Field Of Science
1040 - Molecular biology;
Goals / Objectives
Viruses are a vital part of all complex ecosystems. They play a significant role in maintaining host population balance, productivity and sustainability in both domestic and wildland plants and animals, and in globally important ecosystem cycles such as the nutrient cycle of the seas. However, viruses are largely unknown, and no comprehensive analyses of any terrestrial viruses have been undertaken. Our objectives are to analyze 5,000 genomes of plant RNA viruses collected from a natural environment by collecting wildland plants, assessing virus infection by analysis of double-stranded RNA, and rapidly determining the viral genomic sequences by an adaptation of 454 Life Sciences' genome sequencing technology. We will build new bioinformatics tools to assess diversity, incidence, relationships and movement of the viruses. Data will be deposited in a unique Taxonomy Node in GenBank to make them searchable by the scientific community. The project will analyze thousands of viral genomes, most of which will probably be novel, link the viral genomes to their appropriate hosts, and analyze the relationships of viruses between wildlands and agricultural lands.
Project Methods
The 5000 Virus Genomes project will collect plant samples from a highly diverse region of northwestern Costa Rica. Details of location (GPS) and plant status information will be recorded, and plants will be photographed and identified to species level, in collaboration with an ongoing plant inventory. Samples will be analyzed for the presence of double-stranded RNA, a hallmark of RNA virus infection, through total nucleic acid extraction and chromatography with CF11 cellulose, followed by agarose gel electrophoresis. Positive samples will be prepared for sequencing on 454 Life Sciences Technology picotiter plates by random-primed reverse transcription and PCR amplification, adding unique 4 nt tags to identify the samples. Samples will be pooled in batches of 20, and each pool will be applied to one of the 16 lanes of a picotiter plate, allowing analysis of 320 samples per plate. The sequences will be determined and compiled with the collection database. New bioinformatics tools will be developed that can query the data as to incidence, recurrence of individual viruses, relationships between collection sites, time of year, plant species, etc. A potyvirus, Zucchini yellow mosaic virus, which has already been identified from several samples of wild plants and one agricultural plant adjacent to the study site, will be used for phylogenetic analyses to determine virus flow between agricultural lands and wildlands.
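The tagging and pooling scheme described above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual software: it assumes the 4 nt tag sits at the 5' end of each read and must match exactly, and the function names, tag sequences and sample IDs are all hypothetical.

```python
from collections import defaultdict

TAG_LENGTH = 4         # the methods describe unique 4 nt tags
SAMPLES_PER_LANE = 20  # samples pooled per lane
LANES_PER_PLATE = 16   # lanes per picotiter plate


def samples_per_plate(per_lane=SAMPLES_PER_LANE, lanes=LANES_PER_PLATE):
    """Plate capacity when each lane holds one pool of tagged samples."""
    return per_lane * lanes


def demultiplex(reads, tag_to_sample, tag_length=TAG_LENGTH):
    """Group the reads from one lane by their leading tag.

    reads: iterable of nucleotide strings (tag assumed at the 5' end).
    tag_to_sample: dict mapping each tag to a sample ID (hypothetical IDs).
    Returns (by_sample, unassigned). The tag is trimmed from assigned reads;
    reads whose first bases match no known tag are returned untouched.
    """
    by_sample = defaultdict(list)
    unassigned = []
    for read in reads:
        tag = read[:tag_length].upper()
        sample = tag_to_sample.get(tag)
        if sample is None:
            unassigned.append(read)
        else:
            by_sample[sample].append(read[tag_length:])
    return dict(by_sample), unassigned


if __name__ == "__main__":
    # Invented tags and reads, for illustration only.
    tags = {"ACGT": "sample_01", "TGCA": "sample_02"}
    lane_reads = ["ACGTGGGATTTC", "TGCAAACCCGGG", "acgtTTTT", "NNNNACGT"]
    assigned, leftover = demultiplex(lane_reads, tags)
    print(samples_per_plate())
    print(assigned)
    print(leftover)
```

Exact matching keeps the sketch short. Four positions give 256 possible tags, so a set of 20 per lane could in principle be chosen to differ from one another at more than one position, which would allow some tolerance of sequencing errors in the tag; that refinement is not shown here.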

Progress 01/01/07 to 06/30/10

Outputs
OUTPUTS: Outreach. The PI has continued her Public Lectures in Science series. The past season saw large numbers of attendees. The 2009-2010 series included four science lectures. In addition, the PI hosted a summer scholar in 2010 who worked on analysis of melanin production in endophytic fungi. The work of the 2009 summer scholar is now published. The PI participated as a lecturer in the 2010 Cold Spring Harbor Plant Biology summer course. The PI also continued her role as Postdoc Mentor for the Noble Foundation, organizing teaching opportunities and training in seminar presentations, grant writing and job application skills.
Research. We have collected and processed over 7500 plant samples from the Area Conservacion Guanacaste over the grant period and beyond (Table 1). These samples are from 16 different microhabitats around the conservation area. They were identified to species level in the field and photographed, and GPS readings were taken and recorded. Of these samples, over 7000 are completely processed for sequencing, and the remaining few are in progress. We will complete these even though the funding has finished. This project started with a steep learning curve on how to do the pyrosequencing and analyze the massive amounts of data that it generated (Table 2), much more than we anticipated at the beginning. We have completed the first-stage bioinformatics for over half of these samples, and have discovered about 2500 new viruses (Figure 1). The final analysis of the data will require considerably more effort, for several reasons: the software to properly assemble the data from each sample is flawed, and we have had to write new software and repeat a lot of the analysis; the tools for analysis are just now being developed; and we had a very large number of sequences that did not have any related sequences in GenBank, so we have not been able to determine whether these are new viruses or to do proper annotation.
The task of analyzing the massive amount of data generated proved to be much greater than we anticipated. For example, we did not expect that almost everything we found would be novel, and we did not expect that we would get such a large number of "singletons" that were almost certainly generated by the poor quality of assembly. In our initial analysis we had decided not to include singletons; however, we now realize that we will need to reassess this. If we include singletons we find that over 98% of the plants have viruses. This means that our total discovery will be at least 8,000 new viruses, since many samples are infected with multiple viruses. We are currently seeking funding to finish the bioinformatics analysis, and to extend our study to look at movement of viruses between wildlands and crops.
PARTICIPANTS: Nothing significant to report during this reporting period.
TARGET AUDIENCES: Nothing significant to report during this reporting period.
PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
Included in our samples was a repetition of the ecological collection samples: these were originally collected a few years ago from two sites in the dry forest, an old growth site and a 25-year succession site. We sampled the same sites in 2009 to determine if there have been any changes in virus incidence and distribution. From the previous collection we find that virus incidence is highest in the late dry season. This surprising result may be because, as we have shown in lab studies (Xu et al, 2008. New Phytologist 180:911), viruses confer drought tolerance. Hence in the late dry season only virus-infected plants are still green and suitable for collection. The incidence of virus infection in plants varies from about 50% to almost 100%, depending on the analysis, with many samples containing multiple infections. The largest number of virus families found in any single plant is nine. For the most part these plants are all asymptomatic. The most common viruses are the persistent viruses: Partitiviridae, Chrysoviridae, Totiviridae, and the endornaviruses. These virus families are also found in fungi; other work from our lab has shown that totiviruses, traditionally believed to be only found in fungi, are plant viruses. The data generated by this study are allowing us to gain a deep understanding of plant virus ecology, a newly emerging field that will have significant impact on the future of food production worldwide.
Bullet points of novel findings:
* Viruses are extremely prevalent in wild plants, and incidence may approach 100%.
* The most common viruses are persistent viruses in the families Partitiviridae, Totiviridae, Chrysoviridae and the genus Endornavirus.
* One well-known pathogenic plant virus, Zucchini yellow mosaic virus (ZYMV), was found frequently in wild plants from many different plant families. ZYMV was asymptomatic in all wild plants, but is responsible for an emerging plant disease in surrounding agricultural areas. Phylogenetic analysis showed that the ZYMV most likely came from local melon farms and moved into the wild plants.
* The wild ZYMV seems to be under considerable positive selection for changes in the aphid transmission domains that render the virus non-transmissible, suggesting wild ZYMV may be evolving to use a new vector.
* Almost all of the plant viruses found in this study are novel viruses.
* A large proportion of the sequence data is for "unknowns", i.e. they have no relatives in GenBank. These are most likely the sequences of completely novel viruses.
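The way an incidence estimate can depend on the analysis, for example on whether singletons are counted, can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the project's pipeline: a singleton is treated here as a virus-like sequence supported by a single read, the function name and data layout are hypothetical, and the toy numbers are invented and are not project results.

```python
def incidence(samples, include_singletons):
    """Fraction of samples with at least one virus-like sequence.

    samples: dict mapping a sample ID to a list of (sequence_id, read_count)
    pairs for its virus-like sequences. A read_count of 1 marks a singleton
    (an assumption made for this sketch).
    """
    if not samples:
        raise ValueError("no samples given")
    # Without singletons, a sequence needs at least two supporting reads.
    min_reads = 1 if include_singletons else 2
    positive = sum(
        1
        for hits in samples.values()
        if any(count >= min_reads for _, count in hits)
    )
    return positive / len(samples)


if __name__ == "__main__":
    # Invented toy data: four plants, one with no virus-like sequences.
    toy = {
        "plant_A": [("contig_1", 12)],
        "plant_B": [("single_1", 1)],
        "plant_C": [("contig_2", 5), ("single_2", 1)],
        "plant_D": [],
    }
    print(incidence(toy, include_singletons=False))
    print(incidence(toy, include_singletons=True))
```

In this toy set, plant_B counts as infected only when singletons are included, so the estimate rises once they are counted. The same effect, on real data, is one reason the reported incidence spans a range rather than a single value.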

Publications

  • Roossinck, M. J. 2010. Lifestyles of plant viruses. Phil. Trans. R. Soc. B 365:1899-1905.
  • Roossinck, M. J., P. Saha, G. B. Wiley, J. Quan, J. D. White, H. Lai, F. Chavarria, G. Shen, and B. A. Roe. 2010. Ecogenomics: Using massively parallel pyrosequencing to understand virus ecology. Mol. Ecol. 19 (S1):81-88.
  • Manuscripts in preparation: Saha, P., B. Roe, M.J. Roossinck. 2011. Widespread incidence of asymptomatic Zucchini yellow mosaic virus in wild plants.
  • Saha, P., J. He, M.J. Roossinck. 2011. Incidence of viruses in wild plants in a tropical biodiversity hotspot.