Source: OKLAHOMA STATE UNIVERSITY submitted to NRP
MASSIVELY PARALLEL SEQUENCING (MPS) AS A DIAGNOSTIC AND FORENSIC ANALYSIS TOOL FOR PLANT PATHOGENS
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
0220956
Grant No.
2010-85605-20542
Cumulative Award Amt.
$914,338.00
Proposal No.
2009-05017
Multistate No.
(N/A)
Project Start Date
Jan 15, 2010
Project End Date
Jan 14, 2015
Grant Year
2010
Program Code
[91510]- Plant Biosecurity
Recipient Organization
OKLAHOMA STATE UNIVERSITY
(N/A)
STILLWATER,OK 74078
Performing Department
Entomology And Plant Pathology
Non Technical Summary
Diagnostics laboratories face tremendous challenges in dealing with plant pathogens. The vast array of possible crop species, each with its own set of possible pathogens, results in the need to detect thousands of microbes with high specificity and sensitivity. In addition, the legal requirements for diagnostic labs require that all assays be validated and all assay users be certified in order for the results to be accepted as valid. Under these conditions, diagnostics labs would spend all of their time and resources developing assays, validating those assays, and training and certifying workers. The best solution would be a single protocol that could detect and identify any and all pathogens in a given sample simultaneously, with the capacity to do forensic analysis and search for unknown and genetically modified organisms. No such technology currently exists, but MPS has the potential to fulfil such a role. The goal and objectives of this project are to develop MPS as a diagnostic tool that can meet all the needs described above.
Animal Health Component
40%
Research Effort Categories
Basic
50%
Applied
40%
Developmental
10%
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
212                   2410                             1160                     60%
712                   2410                             1160                     40%
Goals / Objectives
Develop a "reverse search" tool, which would allow a sequence database generated by a massively parallel sequencing (MPS) run to be searched with a limited number of key pathogen sequences. This will cut processing time exponentially, increasing efficiency and throughput. Test MPS diagnostic capacity on select plant pathogens from viruses, phytoplasmas, bacteria, fungi and stramenopiles. Test MPS forensic (strain typing) capacity on select plant pathogens, where previous typing data is available. Test the ability of MPS to detect plant pathogens that have been genetically engineered to express toxins or proteins harmful to humans, by screening plant pathogen DNA samples spiked with examples of such genes or commonly used genetic engineering promoters and/or plasmid screening markers. Establish a collaborative research and educational link between the Oklahoma State National Institute for Microbial Forensics & Food and Agricultural Biosecurity and the Fort Detrick National Interagency Biodefense Campus.
Project Methods
Massively parallel sequencing (MPS) generates tremendous numbers of overlapping short sequence reads that are then compiled by computer to generate huge contigs of sequence. In theory, MPS generates sequence for all DNAs in a given sample. For example, if a plant is infected with a bacterial pathogen the resulting MPS data would contain a mixture of plant and bacterial sequences. The most common use of MPS is to generate genomic sequences, but the technology can also be used to create a diagnostic assay with the capacity to detect and potentially strain type any and all pathogens in a sample. This would give MPS diagnostics the capacity to act as a forensic tool, and potentially allow the screening of DNA samples for the hallmarks of genetically engineered harmful toxins/proteins. This project would adapt this technology to pathogen identification by using a "reverse search" tool, to search an MPS database with a limited number of key pathogen sequences. This will cut processing time exponentially, increasing efficiency and throughput. DNA extractions from plants infected with pathogens of interest will be subjected to MPS sequencing. The sequence data will be used to create a database that is "probed" bioinformatically to determine the microbial population present. The technique can also be used for forensic analysis and/or to search for genetically modified microbes by changing the signature sequences used for probing.
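The "reverse search" idea described above can be illustrated with a minimal Python sketch. It is not the project's software: exact substring matching stands in for a BLAST search, and the function names, reads and probe sequences are all hypothetical. The point it shows is that a small panel of signature sequences is checked against every read, rather than every read being searched against a large reference collection.

```python
# Minimal sketch of a "reverse search": a few signature sequences (probes)
# are looked up in the read set. Exact matching is an assumption made for
# brevity; the report describes BLAST-based searching.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def reverse_search(reads, probes):
    """Count, for each probe, the reads containing it on either strand."""
    hits = {name: 0 for name in probes}
    for read in reads:
        for name, probe in probes.items():
            if probe in read or reverse_complement(probe) in read:
                hits[name] += 1
    return hits


# Hypothetical data: two reads carry probe_A (one on each strand),
# the third is a host-like read with no signature.
reads = [
    "TTGACCGATTACGGATCCAAGT",
    "ACTTGGATCCGTAATCGGTCAA",
    "GGGGCCCCAAAATTTTGGGGCC",
]
probes = {"probe_A": "CGATTACGGATC", "probe_B": "AAACCCGGGTTT"}

hits = reverse_search(reads, probes)
print(hits)
```

Changing the pathogen of interest, or switching to forensic or genetic-engineering signatures, would only change the `probes` panel; the read set is left as it is.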

Progress 01/15/10 to 01/14/15

Outputs
Target Audience: US Department of Homeland Security, US Federal Bureau of Investigation, USDA Animal and Plant Health Inspection Service, National Plant Diagnostic Network, National Plant Disease Recovery System, US Defense Threat Reduction Agency, Oklahoma Office of Homeland Security, European Union 7th Framework - Security
Changes/Problems: The project has ended.
What opportunities for training and professional development has the project provided? (1) Through a strong educational component, we addressed a critical emerging national need for scientists trained and experienced in both traditional and modern areas of plant pathology, and knowledgeable and appreciative of new National initiatives in agricultural biosecurity and forensic capability. (2) All 3 graduate students completed a second summer internship at the USDA-ARS Emerging Diseases & Pathogens Laboratory, Ft. Detrick, MD. (3) All students and postdocs attended multiple meetings and presented posters on the project research. (4) Three 4-H Summer Camps were held at Oklahoma State University, educating teens and adults on plant pathology, forensics and molecular biology. Attendance included 40-50 teenaged 4-H youth and 10-20 Extension Educators. Evaluation forms from both adults and teens were extremely positive.
How have the results been disseminated to communities of interest? (1) All students and postdocs attended multiple meetings and presented posters on the project research. (2) The work was the subject of four invited PI talks at national and international meetings.
What do you plan to do during the next reporting period to accomplish the goals? The project has ended.

Impacts
What was accomplished under these goals? (1) The bioinformatic software EDNA, developed by grant participants, consists of two components: a) e-probe selection software based on a modified version of TOFI, and b) pathogen detection software based on a modified version of BLAST. (2) The EDNA pipeline was tested using simulated sequence databases and found capable of detecting RNA viruses, DNA viruses, bacteria, oomycetes and fungi when pathogen sequences represented at least 0.5% of the total read content (Stobbe et al., 2013). The fungal/oomycete detection protocol was modified for specialized detection of eukaryotic pathogens in a eukaryotic background (Espindola et al., 2015). (3) The EDNA pipeline was validated with sequencing data sets containing a DNA and an RNA plant virus (Stobbe et al., 2014), bacterial plant pathogens (Daniels et al., in preparation), human pathogens on plants (Blagden et al., 2015), and oomycete and fungal pathogens (in preparation). Procedures to test the statistical significance of BLAST hits by e-probes of the databases were evaluated and optimal parameters for searching sequence databases were established. (4) The ability of EDNA to detect genetically modified organisms was tested using a GFP-modified Serratia marcescens strain (in preparation). (5) Support vector machine models were developed and explored to distinguish host plant from bacterial and from fungal sequences, generating the capacity to detect unknown pathogens with no counterparts in GenBank. The final product was presented to researchers from USDA-ARS, APHIS and Ag Canada in an October 2014 workshop in Beltsville, MD.
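The e-probe selection step in item (1) can be pictured with a short Python sketch. It is only an illustration of the general idea of keeping target sequences that are absent from near-neighbor or host sequences; it is not the modified TOFI software, and the function names, window sizes and sequences are hypothetical.

```python
# Illustrative e-probe selection: slide a window along the target genome and
# keep windows not found (on either strand) in any background sequence.
# This exact-match filter is an assumption, not the project's algorithm.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def windows(seq, length, step=1):
    """Return all substrings of the given length, taken every `step` bases."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


def select_probes(target, backgrounds, length, step):
    """Keep target windows that do not occur in any background sequence."""
    background_kmers = set()
    for seq in backgrounds:
        background_kmers.update(windows(seq, length))
        background_kmers.update(windows(reverse_complement(seq), length))
    return [w for w in windows(target, length, step) if w not in background_kmers]


# Hypothetical toy sequences: the first target window is shared with the
# background and is therefore rejected as a probe.
target = "AAAACCCC" + "GATTACAG" + "ACGTTGCA"
backgrounds = ["TTAAAACCCCTT"]

probes = select_probes(target, backgrounds, length=8, step=8)
print(probes)
```

A real design would use much longer probes and tolerate mismatches when screening against the background, which is why a BLAST-style check is the more natural fit for that step.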

Publications

  • Type: Journal Articles Status: Published Year Published: 2014 Citation: Stobbe, A.H., Schneider, W.L., Hoyt, P. and Melcher, U. (2014) Screening metagenomic data for viruses using the E-probe Diagnostic Nucleic acid Analysis (EDNA). Phytopathology 104: 1125-1129.
  • Type: Journal Articles Status: Published Year Published: 2014 Citation: Melcher, U., Verma, R., and Schneider, W. (2014) Metagenomic search strategies for interactions among plants and multiple microbes. Front. Plant Sci., 5:268 doi: 10.3389/fpls.2014.00268.
  • Type: Journal Articles Status: Accepted Year Published: 2015 Citation: Espindola, A., Schneider, W.L., Hoyt, P., Marek, S., and Garzon, C. (2015) A new approach for detecting fungal and oomycete plant pathogens in next generation sequencing metagenomic data utilizing electronic probes. Int. J. Data Min. Bioinf. (accepted).
  • Type: Journal Articles Status: Submitted Year Published: 2015 Citation: Blagden, T., Schneider, W.L., Melcher, U., Daniels, J. and Fletcher, J. (2015) Adaptation and validation of E-probe Diagnostic Nucleic acid Analysis for the detection of Escherichia coli O157:H7 in metagenomic data of complex food matrices. J. Food Sci. (submitted).
  • Type: Theses/Dissertations Status: Accepted Year Published: 2013 Citation: Stobbe, Anthony. 2013. Virus detection in a metagenomics sequence dataset: Methods and applications. Oklahoma State University, ProQuest, UMI Dissertations Publishing.
  • Type: Theses/Dissertations Status: Accepted Year Published: 2013 Citation: Daniels, Jon. 2013. The use of next generation sequencing to detect plant pathogenic prokaryotes. Oklahoma State University, ProQuest, UMI Dissertations Publishing.
  • Type: Theses/Dissertations Status: Accepted Year Published: 2013 Citation: Espindola, Andres. 2013. Massively parallel sequencing (MPS) as a diagnostic and forensic analysis tool for important fungi and chromista plant pathogens. Oklahoma State University, ProQuest, UMI Dissertations Publishing.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2013 Citation: Blagden, T., Schneider, W., Melcher, U., and Fletcher, J. (2013) In silico adaptation of EDNA (E- probe Diagnostic Nucleic Acid Analysis) for detection of foodborne pathogens. Intl. Asso. Food Prot. Meeting, Charlotte, North Carolina.
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2015 Citation: Dutta M., Stobbe A., Schneider W., Malmstrom C. and Melcher U. K. (2015) Adaptation and validation of E-probe Diagnostic Nucleic Acid analysis for screening metagenomic data for viruses using universal virus microarray probes. 8th Wkshp Virus Evol., Pennsylvania State University, State College, PA.


Progress 01/15/13 to 01/14/14

Outputs
Target Audience: US Department of Homeland Security, US Federal Bureau of Investigation, USDA Animal and Plant Health Inspection Service, National Plant Diagnostic Network, National Plant Disease Recovery System, US Defense Threat Reduction Agency, Oklahoma Office of Homeland Security, European Union 7th Framework – Security
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided?
  • STEM training for 4-H youth and educators
  • Teaching and mentoring experience for graduate students and postdocs teaching the 4-H participants
  • Students and postdocs presented oral papers and posters at national and international professional meetings
  • Students and postdocs are writing and submitting their manuscripts to peer-reviewed journals
How have the results been disseminated to communities of interest?
  • Talks and posters at several scientific meetings, both national and international
  • Submission of papers to refereed journals
  • Application as STEM activity for 4-H youth
What do you plan to do during the next reporting period to accomplish the goals?
  • Complete the research data analyses
  • Prepare and submit manuscripts to refereed journals
  • Host the final subject matter expert guest lecturer for our Agricultural Biosecurity course
  • Prepare funding applications for establishment and maintenance of a data depository of validated e-probe sequences

Impacts
What was accomplished under these goals? Output: This project addresses the vulnerability of U.S. agriculture to intentional targeting by adapting massively parallel sequencing (MPS) for detection and diagnosis. The method is rapid, identifies any pathogen in a complex sample and can detect hallmarks of genetic engineering. Goals are: (1) develop a molecular tool to capture key diagnostic sequence information from an existing MPS sequence database by “reverse search”, (2) test MPS diagnostic capacity on plant pathogens by generating sample sequence databases, (3) test MPS forensic (strain typing) capacity on plant pathogens, (4) test MPS detection of plant pathogens engineered to express toxins or proteins, and (5) prepare young scientists via a multidisciplinary graduate program, internships on the Ft. Detrick Biosecurity Campus, and microbial forensics leadership workshops for 4-H youth and teachers. Progress: (1) The EDNA pipeline was validated with sequencing data sets for Bean golden mosaic virus (DNA) and Plum pox virus (RNA), both of which were detected in a metagenomic sample containing both microbial and plant sequences when the pathogen reads were at least 1.34%. Procedures to test the statistical significance of the BLAST hits by e-probes of the databases were evaluated and optimal sequence searching parameters were established. E-probes specific for Serratia marcescens (Sm), green fluorescent protein (GFP) and antibiotic resistance genes were designed. Nucleotide sequences from zucchini, healthy or infected with Sm or with Sm+GFP plasmid, were probed with the e-probes. (2) Twenty to 5000 unique queries each, generated for the bacteria Xylella fastidiosa 9a5c, Xanthomonas oryzae, Ralstonia solanacearum race 3 biovar 2, and Candidatus Liberibacter asiaticus, were used to BLAST mock sample databases (MSD) of plant and pathogen sequences. All bacteria were detected, even when in low abundance.
Pathogen-specific queries, ranging in length from 15 nt to 60 nt, were created for detection of the bacteria R. solanacearum race 3 biovar 2 and Pst. The e-probe sets were used to BLASTn NGS runs of diseased host tissues. Both bacterial pathogens were readily detectable in planta. The NGS data contained sequences from multiple bacteria, fungi, host genome, mitochondrial genomes, and the chloroplast genome typical of a metagenomic sample from an infected plant. Both the bioinformatic tools and e-probe designs were effective and specific. (3) Pythium ultimum, Phakopsora pachyrhizi, Puccinia graminis and Phytophthora ramorum were detected with high accuracy in 454 sequencing output data using E-probes. Lowering the e-value decreased the false positive hits but lowered the number of positive hits. Using multiple e-probes of 40 nt-140 nt may increase sensitivity and reduce computing needs. EDNA was applied to MSD containing genome traces of three isolates of Pythium to evaluate the test’s Stramenopile discriminative capacity. (4) Unique e-probes were generated for human pathogens Escherichia coli and Salmonella spp. (5) Support vector machines were explored to distinguish host plant from bacterial and from fungal sequences. (6) Two students presented posters at the 2013 Annual Meeting of the American Phytopathological Society. (7) A manuscript on the theoretical aspects of EDNA development was published in a refereed journal. (8) A manuscript describing strain discrimination among viruses by the EDNA strategy is under revision following initial review for publication in a peer-reviewed journal. (9) The 3rd 4-H Summer Camp/Workshop was held in Stillwater, OK, in June 2013, with approximately 20 teenaged 4-H youth and 10 Extension Educators and teachers. A new “Teachers’ Lesson Guide” was prepared to accompany the student manual. Evaluation forms from adults and teens were extremely positive. Outcome/Impact: Deep sequencing (or next-generation sequencing) has altered the molecular biology landscape in many ways, including the development of the field of metagenomics.
Deep sequencing using metagenomics principles has been applied to diagnostics, generating massive volumes of data and increasing the likelihood of finding pathogens present at low levels. However, sorting through the volumes of data can be cumbersome. Significant portions of any metagenome are irrelevant for pathogen detection, and the assembly of complete genomes is not actually necessary for pathogen diagnostics. The initial objective of this work was to develop bioinformatics tools for the streamlining of sequence data for diagnostic purposes. EDNA finds nucleic acid signatures of microbes of interest without assembly and GenBank BLAST steps. In silico simulations indicated that the procedure was both sensitive and specific in the detection of RNA and DNA viruses, as well as prokaryotic and eukaryotic organisms. Experiments using actual infected plant samples suggest that EDNA specificity may be even better than levels observed in in silico simulations. EDNA also is flexible, allowing the user to adjust specificity levels and sensitivity levels to suit the needs of the assay. The data indicate that (1) Query sequences should be 80 bases in length to optimize specificity and sensitivity. (2) The number of query sequences generated is proportional to the size and availability of the pathogen genome. (3) The approach has the potential to be very sensitive. Mock databases containing 0.5 percent pathogen “reads” were consistently identified using only 4 query sequences. (4) The sensitivity of MPS diagnostics can be increased by increasing the number of queries. For example, a mock database containing 0.1 percent Xylella fastidiosa reads was correctly identified consistently when 1000+ query sequences were used. (5) The specificity of MPS diagnostics is flexible. A reverse BLAST E value of 10 to the power of negative 3 is necessary to limit false positives. 
(6) A statistical approach to determining identification confidence levels has been developed based on the number of random matches that occur with a negative control query panel made up of reversed query sequences. (7) With the MPS technique, a plant sample can be simultaneously (a) screened bioinformatically for organisms of concern, (b) strain-typed for forensic purposes, and (c) searched for signs of genetic engineering. (8) EDNA has already been used to identify unknown viruses in samples obtained by USDA-APHIS PPQ. In summary, we have developed a bioinformatics pipeline for such screening, created sample sequence datasets to test the pipeline and used the database for pathogen typing to support forensic microbiological attribution. The assay, as is or with appropriate modification, can be used for many other applications including food safety testing, border inspection, and routine diagnosis. (9) Through a strong educational component, we also address a critical emerging national need for scientists trained and experienced in both traditional and modern areas of plant pathology, and knowledgeable and appreciative of new National initiatives in agricultural biosecurity and forensic capability.
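The negative-control idea in item (6) can be sketched in Python as follows. The report does not say which statistical test was used, so the one-sided sign test below is an assumption chosen for simplicity, exact substring matching stands in for BLAST, and all names and sequences are hypothetical. What the sketch does take from the text is the construction of the control panel: each query sequence is reversed (not reverse-complemented), so the decoys keep the same length and base composition but should match only by chance.

```python
from math import comb


def count_hits(reads, probe):
    """Number of reads that contain the probe as an exact substring."""
    return sum(1 for read in reads if probe in read)


def decoy_comparison(reads, probes):
    """Compare hit counts of real probes with those of their reversed decoys.

    Returns (real_counts, decoy_counts, p_value), where p_value comes from a
    one-sided sign test on the paired counts (ties are dropped).
    """
    real = [count_hits(reads, p) for p in probes]
    decoy = [count_hits(reads, p[::-1]) for p in probes]
    wins = sum(r > d for r, d in zip(real, decoy))
    losses = sum(r < d for r, d in zip(real, decoy))
    n = wins + losses
    if n == 0:
        return real, decoy, 1.0
    # P(X >= wins) for X ~ Binomial(n, 0.5)
    p_value = sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
    return real, decoy, p_value


# Hypothetical reads, each carrying one real probe; the last is background.
probes = ["ACGGTCAT", "GGATTCCA", "TTGACGCA", "CAGTAGGC"]
reads = [
    "TTACGGTCATGG",
    "CCGGATTCCATT",
    "AATTGACGCACC",
    "GGCAGTAGGCAA",
    "TTTTTTTTTTTT",
]

real, decoy, p_value = decoy_comparison(reads, probes)
print(real, decoy, p_value)
```

With only four probes the smallest attainable p-value is 1/16, which illustrates why larger query panels give more room to establish confidence.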

Publications

  • Type: Journal Articles Status: Accepted Year Published: 2014 Citation: Stobbe, T., W. Schneider, P.R. Hoyt, and U. Melcher. 2014. Screening metagenomic data for viruses using the E-Probe Diagnostic Nucleic Acid Assay (EDNA). Phytopathology. Accepted with revision.
  • Type: Journal Articles Status: Published Year Published: 2013 Citation: Stobbe, T., J. Daniels, A. Espindola, U. Melcher, F. Ochoa Corona, C. Garzon, R. Verma, J. Fletcher and W. Schneider. 2013. Electronic diagnostic nucleic acid analysis (EDNA): A theoretical approach for improved handling of massively parallel sequencing data for diagnostics. J. Micr. Meth. http://dx.doi.org/10.1016/j.mimet.2013.07.002
  • Type: Books Status: Published Year Published: 2013 Citation: Fletcher, J., F.M. Ochoa Corona, and M. Payton. 2013. Plant disease diagnostics for forensic applications. In: Proceedings of the International Congress of Plant Pathology
  • Type: Theses/Dissertations Status: Published Year Published: 2013 Citation: Espindola, AS. 2013. Massively Parallel Sequencing (MPS) As a Diagnostic and Forensic Analysis Tool for Important Fungi and Chromista Plant Pathogens. Master of Science Thesis, Department of Entomology and Plant Pathology, Oklahoma State University
  • Type: Theses/Dissertations Status: Submitted Year Published: 2013 Citation: Daniels, J.M. 2013. The use of next generation sequencing to detect plant pathogenic prokaryotes. Master of Science Thesis, Department of Entomology and Plant Pathology, Oklahoma State University.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Blagden, T.D., W. Schneider, U. Melcher, and J. Fletcher. 2013. In silico Adaptation of EDNA (E-probe Diagnostic Nucleic Acid Analysis) for Detection of Foodborne Pathogens. International Association of Food Protection Annual Meeting. Charlotte, North Carolina.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2013 Citation: Daniels, J., T. Stobbe, A. Espindola, W. Schneider, J. Sallee, T. Blagden, F. Ochoa Corona, C. Garzon, and J. Fletcher. 2013. CSI in a tomato disease plot: Engaging 4-H youth and educators in STEM through investigative plant pathology. APS Annual Meeting, Austin, TX.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Daniels, J., W. Schneider, J. Fletcher and F. Ochoa Corona. 2013. Next generation sequencing and its application as a biosecurity tool. Gordon Rsch Conf. Chem. & Biolog. Terrorism Defense, Ventura, CA.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Daniels, J., W. Schneider, J. Fletcher and F. Ochoa Corona. 2012. A need for simple and user-friendly detection for waterborne microbes. Oklahoma Water Resources Research Board Water Research Symposium, Tulsa, OK.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2013 Citation: Espindola A, Schneider W, Garzon CD. Pythium aphanidermatum strain-discrimination from 454 pyrosequencing metagenomic samples. 2013. Bioinformatics strategies for microbial forensics. APS Annual Meeting, Austin, TX.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2013 Citation: Espindola A, Schneider W, and Garzon C. 2013. Strain identification of Pythium aphanidermatum in metagenomic samples from 454 pyrosequencing. Oomycete Molecular Genetics Network Meeting. Pacific Grove, CA.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2012 Citation: Espindola, A., W. Schneider, J. Fletcher and C. Garzon. 2012. Using next generation sequencing as a diagnostic tool for Phytophthora ramorum and Pythium ultimum. Oomycete Molecular Genetics Meeting, Nanjing, China.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2012 Citation: Espindola, A., W. Schneider, J. Fletcher and C. Garzon 2012. Validation of EDNA, a newly developed bioinformatics tool, for the detection of Pythium ultimum from metagenomic samples. APS Annual Meeting, Providence, RI.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2012 Citation: Espindola, A., W. Schneider, J. Fletcher and C. Garzon 2012. Validation of EDNA, a newly developed bioinformatics tool, for detection of Phakopsora pachyrhizi from metagenomic samples. APS Annual Meeting, Providence, RI.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Espíndola A, Stobbe A, Daniels J, Ochoa-Corona F, Fletcher J, Melcher U, Garzón C, Schneider W. Massive Parallel Sequencing as a forensic and diagnostic tool for plant diseases. Universidad Politécnica Salesiana, Quito Ecuador.
  • Type: Conference Papers and Presentations Status: Other Year Published: 2013 Citation: Espíndola A, Stobbe A, Daniels J, Ochoa-Corona F, Fletcher J, Melcher U, Garzón C, Schneider W. Massive Parallel Sequencing: a tool for forensic analysis and diagnosis of plant diseases caused by fungi and oomycetes. Pontificia Universidad Católica del Ecuador, Quito Ecuador.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2013 Citation: Schneider, W.S., R. Verma, A. Stobbe, J. Daniels, A. Espindola, T. Blagden, J. Fletcher, F. Ochoa-Corona, C. Garzon, and U. Melcher. 2013. Bioinformatics strategies for microbial forensics. APS Annual Meeting, Austin, TX.


Progress 01/15/12 to 01/14/13

Outputs
OUTPUTS: This project addresses the vulnerability of U.S. agriculture to intentional targeting by adapting massively parallel sequencing (MPS) for detection and diagnosis. The method is rapid, identifies any pathogen in a complex sample, and can detect hallmarks of genetic engineering. Goals are: (1) develop a molecular tool to capture key diagnostic sequence information from an existing MPS sequence database by "reverse search," (2) test MPS diagnostic capacity on plant pathogens by generating sample sequence databases, (3) test MPS forensic (strain typing) capacity on plant pathogens, (4) test MPS detection of plant pathogens engineered to express toxins or proteins, and (5) prepare young scientists via a multidisciplinary graduate program, internships on the Ft. Detrick Biosecurity Campus, and microbial forensics leadership workshops for 4-H youth and teachers. Progress: (1) The EDNA pipeline was validated with sequencing data sets for Bean golden mosaic virus (DNA) and Plum pox virus (RNA), both of which were detected in a metagenomic sample containing both microbial and plant sequences when the pathogen reads were at least 1.34%. Procedures to test the statistical significance of BLAST hits by e-probes of the databases were evaluated and optimal sequence searching parameters were established. E-probes specific for Serratia marcescens (Sm), green fluorescent protein (GFP) and antibiotic resistance genes were designed. Nucleotide sequences from zucchini, healthy or infected with Sm or with Sm+GFP plasmid, were probed with the e-probes. (2) Twenty to 5000 unique queries each, generated for the bacteria Xylella fastidiosa 9a5c, Xanthomonas oryzae, Ralstonia solanacearum race 3 biovar 2, and Candidatus Liberibacter asiaticus, were used to BLAST mock sample databases (MSD) of plant and pathogen sequences. All bacteria were detected, even when in low abundance.
(3) Pythium ultimum, Phakopsora pachyrhizi, Puccinia graminis and Phytophthora ramorum were detected with high accuracy in 454 sequencing output data using E-probes. Lowering the e-value decreased the false positive hits but lowered the number of positive hits. Using multiple e-probes of 40 nt-140 nt may increase sensitivity and reduce computing needs. EDNA was applied to MSD containing genome traces of three isolates of Pythium to evaluate the test's Stramenopile discriminative capacity. (4) Unique e-probes were generated for human pathogens Escherichia coli and Salmonella spp. (5) Support vector machines were explored to distinguish host plant from bacterial and from fungal sequences. (6) All 3 graduate students completed a second summer internship at the USDA-ARS Emerging Diseases & Pathogens Laboratory, Ft. Detrick, MD. (7) All students presented posters at the 2012 Annual Meeting of the American Phytopathological Society. (8) A manuscript on EDNA was submitted to a journal. (9) The 2nd 4-H Summer Camp was held in June 2012, with 16 teenaged 4-H youth and six Extension Educators. Evaluation forms from adults and teens were extremely positive. (10) Project PIs were awarded the 3rd Place OSU 2012 Research Innovation Award for this creative multidisciplinary project.
PARTICIPANTS: Blagden, Trenna, Postdoc, OSU; Daniels, Jon, Ph.D. student, OSU; Espindola, Andres, Ph.D. student, OSU; Fletcher, Jacqueline, Full Professor; Garzon, Carla, Asst. Professor, OSU; Hoyt, Peter, Assoc. Rsch Scientist, OSU; Melcher, Ulrich, Professor, OSU; Ochoa Corona, F., Asst. Professor, OSU; Sallee, Jeff, Asst. Professor & Extension Specialist, OSU; Schneider, William, Plant Pathologist, USDA ARS; Stobbe, Tony, Ph.D. student, OSU; Verma, Ruchi, Postdoc, OSU
TARGET AUDIENCES: US Department of Homeland Security, US Federal Bureau of Investigation, USDA Animal and Plant Health Inspection Service, National Plant Diagnostic Network, National Plant Disease Recovery System, US Defense Threat Reduction Agency, Oklahoma Office of Homeland Security, European Union 7th Framework - Security
PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
Deep sequencing (or next-generation sequencing) has altered the molecular biology landscape in many ways, including the development of the field of metagenomics. Deep sequencing using metagenomics principles has been applied to diagnostics, generating massive volumes of data and increasing the likelihood of finding pathogens present at low levels. However, sorting through the volumes of data can be cumbersome. Significant portions of any metagenome are irrelevant for pathogen detection, and the assembly of complete genomes is not actually necessary for pathogen diagnostics. The initial objective of this work was to develop bioinformatic tools for the streamlining of sequence data for diagnostic purposes. EDNA finds nucleic acid signatures of microbes of interest without assembly and GenBank BLAST steps. In silico simulations indicated that the procedure was both sensitive and specific in the detection of RNA and DNA viruses, as well as prokaryotic and eukaryotic organisms. Initial experiments using actual infected plant samples suggest that EDNA specificity may be even better than levels observed in in silico simulations. EDNA also is flexible, allowing the user to adjust specificity levels and sensitivity levels to suit the needs of the assay. The data indicate that (1) Query sequences should be 80 bases in length to optimize specificity and sensitivity. (2) The number of query sequences generated is proportional to the size and availability of the pathogen genome. (3) The approach has the potential to be very sensitive. Mock databases containing 0.5 percent pathogen "reads" were consistently identified using only 4 query sequences. (4) The sensitivity of MPS diagnostics can be increased by increasing the number of queries. For example, a mock database containing 0.1 percent Xylella fastidiosa reads was correctly identified consistently when 1000+ query sequences were used. (5) The specificity of MPS diagnostics is flexible. 
A reverse BLAST E value of 10 to the power of negative 3 is necessary to limit false positives. (6) A statistical approach to determining identification confidence levels has been developed based on the number of random matches that occur with a negative control query panel made up of reversed query sequences. (7) With the MPS technique, a plant sample can be simultaneously (a) screened bioinformatically for organisms of concern, (b) strain-typed for forensic purposes, and (c) searched for signs of genetic engineering. (8) EDNA has already been used to identify unknown viruses in samples obtained by USDA-APHIS PPQ. In summary, we have developed a bioinformatic pipeline for such screening, created sample sequence datasets to test the pipeline and used the database for pathogen typing to support forensic microbiological attribution. Through a strong educational component, we also address a critical emerging national need for scientists trained and experienced in both traditional and modern areas of plant pathology, and knowledgeable and appreciative of new National initiatives in agricultural biosecurity and forensic capability.
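As background for the E-value cutoff in item (5), the standard Karlin-Altschul statistics used by BLAST can be written out. This is textbook material rather than anything specific to the project, and reading m as the e-probe length and n as the total length of the read database is an assumption about how the "reverse" search is set up. K and lambda are parameters of the scoring system, S is the raw alignment score and S' the bit score.

```latex
E = K\,m\,n\,e^{-\lambda S}
\qquad\text{or, in terms of the bit score,}\qquad
E = m\,n\,2^{-S'}, \quad S' = \frac{\lambda S - \ln K}{\ln 2}
```

Read this way, a cutoff of E at or below 10^{-3} keeps only matches that would be expected by chance less than once in a thousand searches of that size. Because E grows in proportion to n, the same e-probe match receives a larger E-value against a larger read set, which is one reason a fixed cutoff trades false positives against the number of positive hits.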

Publications

  • Daniels, J., A. Stobbe, A. Espindola, W. Schneider, J. Fletcher and F. Ochoa Corona. 2012. Next generation sequencing as a diagnostic tool for biosecurity agencies. ASM Biodefense Conference, Washington, D.C.
  • Espindola, A., W. Schneider, J. Fletcher and C. Garzon. 2012. Using next generation sequencing as a diagnostic tool for Phytophthora ramorum and Pythium ultimum. Oomycete Molecular Genetics Meeting, Nanjing, China.
  • Espindola, A., W. Schneider, J. Fletcher and C. Garzon. 2012. Validation of EDNA, a newly developed bioinformatics tool, for the detection of Pythium ultimum from metagenomic samples. APS Annual Meeting, Providence, RI.
  • Espindola, A., W. Schneider, J. Fletcher and C. Garzon. 2012. Validation of EDNA, a newly developed bioinformatics tool, for detection of Phakopsora pachyrhizi from metagenomic samples. APS Annual Meeting, Providence, RI.
  • Stobbe, A., U. Melcher, J. Fletcher and W. Schneider. 2012. Validation of a unique sequence-based detection of plant pathogens using next generation sequence data. APS Annual Meeting, Providence, RI.
  • Verma, R. and U. Melcher. 2012. A Support Vector Machine based method to distinguish proteobacterial proteins from eukaryotic plant proteins. BMC Bioinformatics 13(S15):S9.


Progress 01/15/11 to 01/14/12

Outputs
OUTPUTS: This project addresses the vulnerability of U.S. agriculture to intentional targeting by adapting massively parallel sequencing (MPS; megasequencing) technology for detection and diagnosis. The method, based on pathogen genomic sequences, is rapid, identifies any pathogen (including unknowns) within a complex sample, and can detect hallmarks of genetic engineering. Goals are: (1) develop a molecular tool to capture key diagnostic sequence information from an existing MPS sequence database by "reverse search," (2) test MPS diagnostic capacity on plant pathogens by generating sample sequence databases, (3) test MPS forensic (strain typing) capacity on select plant pathogens, (4) test the ability of MPS to detect plant pathogens engineered to express toxins or proteins, and (5) prepare young scientists via a multidisciplinary graduate program, internships on the Ft. Detrick Biosecurity Campus, and leadership workshops on microbial forensics for 4-H youth and teachers. Progress: (1) Bioinformatic pipelines were developed for quick (<48hr) e-probe generation for targeted pathogens having a sequenced genome. (2) An additional diagnostic bioinformatic process, termed E-probe Diagnostic Nucleic acids Analysis (EDNA), avoids assembly and GenBank BLAST steps while successfully finding nucleic acid signatures of microbes of interest. (3) Optimal EDNA parameters were established, including query length and e-value. (4) EDNA features a high degree of flexibility, allowing the user to adjust specificity levels and sensitivity levels to suit the needs of the assay. (5) The addition of a BLAST check step in the e-probe design process reduces the number of e-probes generated, thereby reducing the number of hits, the rate of false negatives, and the computation required. (6) In silico simulations indicated that EDNA was sensitive and specific in the detection of RNA and DNA viruses, bacteria and fungi. 
(7) In silico, EDNA was valid and reliable even when microbes were tested in complex host samples. (8) By comparing the target scores to negative control decoys, pathogen sequences are detectable at ratios as low as 0.5% of the sequence reads. (9) MetaSim generation of mock sequence databases (MSDs) was streamlined to accept a range of modifications, depending on the host and current biosecurity needs, for use in expediting BLAST searches. (10) Initial experiments using actual infected plant samples suggest that EDNA specificity may be even better than levels observed in in silico simulations. (11) All 3 graduate students completed a summer internship at the USDA-ARS Emerging Diseases & Pathogens Laboratory, Ft. Detrick, MD. (12) All 3 graduate students attended the Annual Meeting of the American Phytopathological Society in August 2011 and presented posters on the project research. (13) A manuscript on the EDNA procedure is almost ready for submission to a journal. (14) The first 4-H Summer Camp was held at Oklahoma State University in June 2011; attendance included 16 teenage 4-H youth and six Extension Educators. Evaluation forms from both adults and teens, which were extremely positive, will be used for improvements in the second camp scheduled for summer 2012. PARTICIPANTS: Blagden, Trenna (Postdoc, OSU); Daniels, Jon (Ph.D. student, OSU); Espindola, Andres (Ph.D. student, OSU); Garzon, Carla (Assoc. Professor, OSU); Hoyt, Peter (Assoc. Rsch Scientist, OSU); Melcher, Ulrich (Professor, OSU); Ochoa Corona, F. (Asst. Professor, OSU); Schneider, William (Plant Pathologist, USDA ARS); Stobbe, Tony (Ph.D. student, OSU); Verma, Ruchi (Postdoc, OSU). TARGET AUDIENCES: US Department of Homeland Security; US Federal Bureau of Investigation; USDA Animal and Plant Health Inspection Service; National Plant Diagnostic Network; National Plant Disease Recovery System; US Defense Threat Reduction Agency; Oklahoma Office of Homeland Security; European Union 7th Framework - Security. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.

Impacts
The mock databases generated by the various programs were used to test the bioinformatic tools being developed for the final protocol. The modified TOFI generated diagnostic query sequences for all of the pathogens used so far, and these diagnostic sequences successfully identified pathogen "reads" within the mock sequence database. The data indicate that (1) Query sequences should be 80 bases in length to optimize specificity and sensitivity. (2) The number of query sequences generated is proportional to the size and availability of the pathogen genome. (3) The approach has the potential to be very sensitive. Mock databases containing 0.5 percent pathogen "reads" were consistently identified using only 4 query sequences. (4) The sensitivity of MPS diagnostics can be increased by increasing the number of queries. For example, a mock database containing 0.1 percent Xylella fastidiosa reads was correctly identified consistently when 1000+ query sequences were used. (5) The specificity of MPS diagnostics is also flexible. A reverse BLAST E value of 10 to the power of negative 3 is necessary to limit false positives. (6) A statistical approach to determining identification confidence levels has been developed based on the number of random matches that occur with a negative control query panel made up of reversed query sequences. (7) With the MPS technique, a plant sample can be simultaneously (a) screened bioinformatically for organisms of concern, (b) strain-typed for forensic purposes, and (c) searched for signs of genetic engineering. In summary, we have developed a bioinformatic pipeline for such screening, created sample sequence datasets to test the pipeline and used the database for pathogen typing to support forensic microbiological attribution. 
Through a strong educational component, we also address a critical emerging national need for scientists trained and experienced in both traditional and modern areas of plant pathology, and knowledgeable and appreciative of new National initiatives in agricultural biosecurity and forensic capability.
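The E-value cutoff in item (5) can be applied as a simple filter on search results. The sketch below assumes the hits are available in the standard 12-column tabular BLAST output (BLAST+ `-outfmt 6`), in which the eleventh column is the E value. The report does not say which output format the project used, and the function name `filter_hits` and the sample lines are hypothetical.

```python
def filter_hits(tabular_lines, max_evalue=1e-3):
    """Keep (query_id, subject_id, evalue) for hits at or below the E-value cutoff."""
    kept = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        evalue = float(fields[10])  # column 11 of the tabular format is the E value
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Hypothetical hits: one strong full-length match and one weak partial match.
sample = [
    "probe1\tread42\t100.0\t80\t0\t0\t1\t80\t5\t84\t2e-38\t148",
    "probe2\tread77\t85.0\t20\t3\t0\t1\t20\t9\t28\t0.5\t22.3",
]
hits = filter_hits(sample)
```

Only the first sample line passes the default cutoff; loosening `max_evalue` trades specificity for sensitivity, which is the flexibility the text describes.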

Publications

  • Stobbe, A., J. Daniels, A. Espindola, W. Schneider, J. Fletcher, U. Melcher. 2011. Massively parallel sequencing used as a diagnostic tool. DTRA Chem-Bio Defense Annual Meeting, Las Vegas, NV. Abstract.
  • Daniels, J., A. Stobbe, A. Espindola, W. Schneider, J. Fletcher and F.M. Ochoa-Corona. 2011. Massively parallel sequencing as a diagnostic tool for bacterial plant pathogens. DTRA Chem-Bio Defense Annual Meeting, Las Vegas, NV. Abstract.
  • Daniels, J., T. Stobbe, A. Espindola, W. Schneider, J. Fletcher and F.M. Ochoa-Corona. 2011. In silico simulation of massively parallel sequencing as a diagnostic tool for bacterial phytopathogens. APS Annual Meeting, Honolulu, HI. Abstract.
  • Espindola, A., T. Stobbe, J. Daniels, J. Fletcher, W. Schneider and C. Garzon. 2011. Query based detection of eukaryotic plant pathogens Puccinia graminis and Phytophthora ramorum in computer-generated pyrosequencing databases. DTRA Chem-Bio Defense Annual Meeting, Las Vegas, NV. Abstract.
  • Espindola, A., J. Daniels, T. Stobbe, J. Fletcher, C. Garzon and W. Schneider. 2011. Design and validation of queries for the detection of Puccinia graminis in simulated metagenomes. APS Annual Meeting, Honolulu, HI. Abstract.
  • Espindola, A., J. Daniels, T. Stobbe, J. Fletcher, C. Garzon and W. Schneider. 2011. Design and validation of queries for the detection of Phytophthora ramorum in simulated metagenomes. APS Annual Meeting, Honolulu, HI. Abstract.
  • Schneider, W.L., A. Stobbe, J. Daniels, A. Espindola, R. Verma, T. Blagden, J. Fletcher, F. Ochoa-Corona, C. Garzon, P. Hoyt, and U. Melcher. 2011. Finding the diagnostic needle in a deep sequence data haystack: E-probe Diagnostic Nucleic acids Analysis (EDNA). Binational Agricultural Research & Development Conference, Microarrays and Next-generation Sequencing for Detection and Identification of Plant Viruses; College Park, MD. Abstract.


Progress 01/15/10 to 01/14/11

Outputs
OUTPUTS: This project addresses the vulnerability of U.S. agriculture to intentional targeting by adapting massively parallel sequencing (MPS; megasequencing) technology for detection and diagnosis. The method, based on pathogen genomic sequences, is rapid, identifies any pathogen (including unknowns) within a complex sample, and can detect hallmarks of genetic engineering. Goals are: (1) develop a molecular tool to capture key diagnostic sequence information from an existing MPS sequence database by "reverse search," (2) test MPS diagnostic capacity on plant pathogens by generating sample sequence databases, (3) test MPS forensic (strain typing) capacity on select plant pathogens, (4) test the ability of MPS to detect plant pathogens engineered to express toxins or proteins, and (5) prepare young scientists via a multidisciplinary graduate program, internships on the Ft. Detrick Biosecurity Campus, and leadership workshops on microbial forensics for 4-H youth and teachers. Progress: 1. A reverse search tool was developed by a modification of existing BLAST codes. 2. A script (UnAssembler), developed for the creation of simulated sequence runs (mock databases), takes a genome sequence as a FASTA file and generates another FASTA file of variable-length "reads" corresponding to the read lengths expected from various MPS platforms. Read start positions are chosen randomly (or set by a sliding window); the average read length is specified by the user and can vary within a user-specified range. Coverage (in the usual "fold" sense used in genome projects) is also variable. A step was added to correct for end under-representation with circular genomes. 3. An additional script (Samplemaker) was developed to mix UnAssembler pathogen and host "reads" at various ratios, creating mock databases simulating an MPS run output. 4. Using MetaSim and Samplemaker scripts, 135 mock sequence databases (MSDs) were established with pathogen sequences representing 20%, 10%, 1%, 0.1% or 0% of the total sequence reads. Complete sets of MSDs were generated for an RNA virus (Plum pox virus), a DNA virus (Bean golden mosaic virus) and two bacteria. Work on spiroplasma, fungal and oomycete MSDs is underway. 5. An existing primer design program, Tools for Oligo Fingerprint Identification (TOFI), was modified to develop query sequences for the MPS project. The TOFI component designed to select diagnostic probes based on thermodynamic and binding qualities was adapted to eliminate potential queries based on homopolymer content (problematic for sequencing) and length. 6. Queries (probes) against near neighbors have been generated for all test viruses and bacteria. 7. Strain-specific query sequences were selected for 10 pathogens. Databases, formatting, and queries have been completed for various sequenced strains of Ralstonia solanacearum. These outputs have substantially met objectives 1 and 2, as well as contributing to objective 3. 8. Planning has begun, in conjunction with OSU's 4-H Youth Programs, for the 4-H Summer Camp to be held on the OSU campus in July 2010. This planning contributes to the completion of objective 5. PARTICIPANTS: Nothing significant to report during this reporting period. TARGET AUDIENCES: Nothing significant to report during this reporting period. PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
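The read simulation and mixing steps described in items 2 to 4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the UnAssembler or Samplemaker code: the function names, parameters, and the uniform read-length distribution are hypothetical choices, and sequencing errors are not modeled.

```python
import random


def simulate_reads(genome, mean_len, len_range, coverage, circular=False, rng=None):
    """Cut random-start reads from a genome until the requested fold coverage is reached."""
    rng = rng or random.Random()
    target_bases = int(len(genome) * coverage)
    # For circular genomes, append a second copy so reads can span the origin
    # instead of being under-represented near the ends of the linear sequence.
    template = genome + genome if circular else genome
    reads, total = [], 0
    while total < target_bases:
        length = rng.randint(mean_len - len_range, mean_len + len_range)
        length = max(1, min(length, len(genome)))
        if circular:
            start = rng.randrange(len(genome))
        else:
            start = rng.randrange(len(genome) - length + 1)
        reads.append(template[start:start + length])
        total += length
    return reads


def mix_reads(pathogen_reads, host_reads, pathogen_fraction, total, rng=None):
    """Build a mock database of `total` reads with a set fraction of pathogen reads."""
    rng = rng or random.Random()
    n_pathogen = round(total * pathogen_fraction)
    sample = [rng.choice(pathogen_reads) for _ in range(n_pathogen)]
    sample += [rng.choice(host_reads) for _ in range(total - n_pathogen)]
    rng.shuffle(sample)
    return sample


# Hypothetical usage: a 200-base circular "genome" at 2-fold coverage,
# then a 1,000-read mock database with 10% pathogen reads.
rng = random.Random(0)
toy_genome = "ACGT" * 50
toy_reads = simulate_reads(toy_genome, mean_len=20, len_range=5,
                           coverage=2.0, circular=True, rng=rng)
mock_db = mix_reads(["P"], ["H"], pathogen_fraction=0.1, total=1000, rng=rng)
```

Sampling reads with replacement keeps the mixing step simple; a closer simulation might draw each read only once or add platform-specific error profiles.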

Impacts
The mock databases generated by the various programs were used to test the bioinformatic tools being developed for the final protocol. The modified TOFI generated diagnostic query sequences for all of the pathogens used so far, and these diagnostic sequences successfully identified pathogen "reads" within the mock sequence database. The data indicate that: 1. Query sequences should be 80 bases in length to optimize specificity and sensitivity. 2. As expected, the number of query sequences generated is proportional to the size and availability of the pathogen genome. 3. The approach has the potential to be very sensitive. For example, a mock database containing 1% Bean golden mosaic virus "reads" was correctly identified using only 4 query sequences. 4. The sensitivity of MPS diagnostics can be increased by increasing the number of queries. For example, a mock database containing 0.1% Xylella fastidiosa "reads" was correctly identified consistently when 1000+ query sequences were used. 5. A reverse BLAST E value of 10 to the power of negative 3 is necessary to limit false positives. 6. A statistical approach to determining identification confidence levels is being developed based on the number of random matches that occur with a negative control query panel made up of 1000 random query sequences. A poster on the work, "Generation of probe sequences for use in massively parallel sequencing as a diagnostic tool," was presented by Ph.D. student Anthony Stobbe at the 2010 Biochemistry & Molecular Biology Graduate Student Association Symposium, September 23, 2010, Oklahoma State University, Stillwater, OK, and a manuscript is in preparation for submission to a peer-reviewed journal. With the MPS technique, a plant sample can be simultaneously (a) screened bioinformatically for organisms of concern, (b) strain-typed for forensic purposes, and (c) searched for signs of genetic engineering.
We will develop a bioinformatic pipeline for such screening, create sample sequence datasets to test the pipeline and use the database for pathogen typing to support forensic microbiological attribution. Through a strong educational component, we also address a critical emerging national need for scientists trained and experienced in both traditional and modern areas of plant pathology, and knowledgeable and appreciative of new National initiatives in agricultural biosecurity and forensic capability.
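The query constraints reported for this period (80-base queries, with homopolymer-containing candidates removed during design) can be illustrated with a short sketch. It is not the modified TOFI code: the function names, the sliding-window step of 20 bases, and the maximum homopolymer run of 4 bases are assumptions, and the near-neighbor comparison that makes a query diagnostic is omitted.

```python
import re


def has_homopolymer(seq, max_run=4):
    """True if any base repeats more than `max_run` times in a row."""
    return re.search(r"(.)\1{%d,}" % max_run, seq) is not None


def candidate_queries(genome, length=80, step=20, max_run=4):
    """Slide a window along the genome and keep windows free of long homopolymers."""
    candidates = []
    for start in range(0, len(genome) - length + 1, step):
        window = genome[start:start + length]
        if not has_homopolymer(window, max_run):
            candidates.append((start, window))
    return candidates


# Hypothetical 250-base sequence with a 10-base poly-A stretch in the middle.
toy_genome = "ACGT" * 30 + "A" * 10 + "ACGT" * 30
kept = candidate_queries(toy_genome)
kept_starts = [start for start, _ in kept]
```

Windows overlapping the poly-A stretch are dropped, leaving only candidates from the flanking sequence; in practice these candidates would still need to be screened against near-neighbor genomes before use.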

Publications

  • No publications reported this period