Progress 03/01/21 to 02/28/25
Outputs Target Audience: Our target audience was broad. Research efforts were aimed directly at diagnosticians, who serve stakeholders. However, findings also impacted scientific researchers.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
Overall, the project trained undergraduate and graduate students, postdoctoral researchers, and FRAs. It provided cross-training between computational biology, plant pathology, and diagnostics. It also gave diagnosticians training in bioinformatics and helped basic biologists apply their research to a diagnostic setting.
How have the results been disseminated to communities of interest?
Results have been disseminated via peer-reviewed publications, presentations at conferences, a workshop at APS (2024), and modules and a pipeline on nf-core. We are scheduled to present PathogenSurveillance at APS (2025), both as an oral presentation (submitted abstract) and as a workshop (accepted).
What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
Aim I. Implement and integrate tools to apply WGS for disease diagnostics.
Obj. 1. Develop standards and optimize preparatory workflow for use in plant clinics. This aim was completed and published in Iruegas-Bocardo et al. (2023). Briefly, we reported on the effects of sequencing depth on genome assembly and on the accuracy of calling single nucleotide polymorphisms (SNPs). In addition, we reported on the importance of comparing not only core genomes but also accessory genomes when analyzing whole genome sequences to draw conclusions on epidemiological links. We demonstrated that SNP calling programs and reference genome sequences (relationship to samples and quality of assemblies) can have significant effects on conclusions. Last, in Iruegas-Bocardo et al. (2023), we made recommendations on best practices.
Obj. 2. Integrate currently used WGS analysis tools into a workflow for use in plant clinics. We have completed the development of the PathogenSurveillance pipeline. Our pipeline is based on Nextflow, a workflow engine that uses software containers such as Docker and Singularity, or the Conda management system, to run pipelines independent of the execution environment (Di Tommaso et al. 2017). The pipeline takes raw Illumina or Nanopore reads and performs whole genome analyses of bacteria and eukaryotes. The new features are the analysis of Nanopore reads and of eukaryotes. PathogenSurveillance automates several analysis steps, including: 1) processing reads, 2) quickly inferring a taxonomic identity, 3) assembling and annotating genome sequences, 4) clustering orthologs, 5) identifying from public databases the closest related strains, 6) building core genome and SNP phylogenies, and 7) building minimum spanning networks. Two powerful attributes allow non-experts without sufficient computing infrastructure to use our pipeline: 1) automated analysis, including automated identification of publicly available close reference genome sequences, and 2) coupling the Seqera Platform to Google cloud infrastructure. The Seqera Platform manages workflows while AWS provides cost-effective computing infrastructure. A functional pipeline is already available and is being beta tested by collaborators. Approximately 15 modules, many of which have general purposes, have been developed. The continuous integration testing necessary for pipelines to be accepted by nf-core has taken longer than expected. We are currently experiencing unusual errors that are not due to our pipeline but occur under run variables rarely used by nf-core. This is the last hurdle prior to public release of the pipeline and submission of a manuscript describing it.
Obj. 3. Develop visuals to effectively interpret and communicate WGS data. These were completed in prior years. Our pipeline generates an interactive HTML report as well as a static PDF report. The interactive HTML report includes phylogenetic trees as well as a minimum spanning network. Last, the report provides key information to the diagnostician, such as taxonomic identities of samples and quality of genome sequences. We have also completed building R Markdown documents that provide higher level summaries for researchers interested in mining the data for more basic questions on the biology of samples.
Aim II. Implement and integrate tools to apply Meta-WGS for disease diagnostics.
Obj. 1. Develop standards and optimize preparatory workflow for use in plant clinics. This work was completed and its efficacy has been demonstrated for detecting Xylella fastidiosa. We have also completed testing the applicability of these methods for detecting plant-associated pathogens that reside within leaf tissues and among more complex microbial communities.
Obj. 2. Integrate available Meta-WGS analysis tools into a workflow for use in plant clinics. This was completed. We used Nanopore sequencing to characterize samples infected by members of the genus Xylella to survey population diversity and inform on infections by subspecies complexes. More details can be found in products (Abdelrazek et al., 2024).
Obj. 3. Develop and optimize novel machine learning algorithms. This was completed (Johnson et al., 2023). Briefly, we constructed a k-mer frequency table and found that a random forest model had the best combination of run-time and accuracy, based on the analysis of tomato metagenomes.
Aim III. Plant disease clinics validate the developed protocols and tools.
Obj. 1. Validation using inoculated samples. This was completed previously and is described in Aim II, objective 2.
Obj. 2. Validation using field collected samples. This was completed, as described in Aim I, objective 1 and again in Iruegas-Bocardo et al. (2024).
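The k-mer frequency table mentioned under Aim II, Obj. 3 can be illustrated with a minimal sketch. This is not the implementation published in Johnson et al. (2023); the function names, the choice of k, and the example reads are assumptions made only for illustration. Each read is turned into one row of normalized k-mer frequencies, which is the kind of feature table a classifier such as a random forest could be trained on.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(read, k=3):
    """Return normalized frequencies of all 4**k DNA k-mers in one read."""
    read = read.upper()
    # Count every overlapping window of length k.
    counts = Counter(read[i:i + k] for i in range(len(read) - k + 1))
    # Fixed column order so that every read yields a comparable row.
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    # Windows containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[kmer] for kmer in alphabet)
    return [counts[kmer] / total if total else 0.0 for kmer in alphabet]


def kmer_table(reads, k=3):
    """Build a table with one row per read and one column per k-mer."""
    return [kmer_frequencies(read, k) for read in reads]


# Hypothetical reads, used only to show the shape of the table.
reads = ["ACGTACGTAC", "GGGGCCCCGG"]
table = kmer_table(reads, k=2)
```

Rows of such a table, paired with labels such as infected or healthy, could be passed to a classifier; the model choice and labels are left out here because they depend on the study design.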
Publications
- Type:
Peer Reviewed Journal Articles
Status:
Published
Year Published:
2024
Citation:
Sudermann, M.A., Foster, Z.S.L., Chang, J.H., Grünwald, N.J. (2024) Metabarcoding for plant pathologists. Canadian Journal of Plant Pathology. 46(2), 142-160.
- Type:
Peer Reviewed Journal Articles
Status:
Published
Year Published:
2025
Citation:
Iruegas-Bocardo, F., Sutton, W., Buchanan, R.A., Grünwald, N.J., Chang, J.H., and Putnam, M.L. (2025) Canker and dieback of Alnus rubra is caused by Lonsdalea quercina. Phytopathology. 115:112-116.
- Type:
Peer Reviewed Journal Articles
Status:
Awaiting Publication
Year Published:
2025
Citation:
Abdelrazek, S., Rodriguez Salamanca, L., and Vinatzer, B.A. (2025) Metagenomic sequencing of tomato plants with wilt symptoms allows for strain-level pathogen identification and genome-based characterization. Phytopathology (in press).
|
Progress 03/01/23 to 02/29/24
Outputs Target Audience: Diagnosticians, growers and farmers, researchers.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
At Oregon State University, we trained one postdoctoral researcher and two undergraduate students. At Virginia Tech, we trained one research associate and one graduate student. At Ohio State University, we trained one postdoctoral researcher. Training is trans-disciplinary and includes biology, computer science, mathematics, and statistics.
How have the results been disseminated to communities of interest?
Results have been disseminated via a peer-reviewed publication and in presentations at conferences.
What do you plan to do during the next reporting period to accomplish the goals?
1. Include use of long-read sequences in the pipeline.
2. Add POCP to the pipeline.
3. Deploy pipeline at OSU Plant Clinic.
4. Complete manuscript on pipeline.
5. Run workshop at APS (Memphis, TN) on pipeline.
6. Complete manuscript that describes a machine learning pipeline that assigns importance to SNPs or genes.
Impacts What was accomplished under these goals?
Aim I. Implement and integrate tools to apply WGS for disease diagnostics.
Obj. 1. Develop standards and optimize preparatory workflow for use in plant clinics. This aim was completed in a prior reporting period.
Obj. 2. Integrate currently used WGS analysis tools into a workflow for use in plant clinics. We are extremely excited about our accomplishments under this objective and believe that this pipeline has the potential to be transformational for diagnostic settings and highly useful for research scientists. The pipeline is functional and nearly complete. It was built in the nf-core framework (Ewels et al. 2020). Our pipeline is based on Nextflow, a workflow engine that uses software containers such as Docker and Singularity, or the Conda management system, to run pipelines independent of the execution environment (Di Tommaso et al. 2017). The pipeline takes raw Illumina reads and performs whole genome analyses of bacteria. It automates several analysis steps, including: 1) processing reads, 2) quickly inferring a taxonomic identity, 3) assembling and annotating genome sequences, 4) clustering orthologs, 5) identifying from public databases the closest related strains, 6) building core genome and SNP phylogenies, and 7) building minimum spanning networks. Two powerful attributes allow non-experts without sufficient computing infrastructure to use our pipeline: 1) automated analysis, including automated identification of publicly available close reference genome sequences, and 2) coupling the Seqera Platform to AWS cloud infrastructure. The Seqera Platform manages workflows while AWS provides cost-effective computing infrastructure. A functional pipeline is already available and is being beta tested by collaborators. Approximately 15 modules, many of which have general purposes, have been developed.
Obj. 3. Develop visuals to effectively interpret and communicate WGS data. Our pipeline generates an interactive HTML report as well as a static PDF report. The interactive HTML report includes phylogenetic trees as well as a minimum spanning network. Last, the report provides key information to the diagnostician, such as taxonomic identities of samples and quality of genome sequences. We are also building R Markdown documents that provide higher level summaries for researchers who may be interested in mining the data for more basic questions on the biology of samples.
Aim II. Implement and integrate tools to apply Meta-WGS for disease diagnostics.
Obj. 1. Develop standards and optimize preparatory workflow for use in plant clinics. This aim was completed in a prior reporting period.
Obj. 2. Integrate available Meta-WGS analysis tools into a workflow for use in plant clinics. This has been completed. We used Nanopore sequencing to characterize samples infected by members of the genus Xylella to survey population diversity and inform on infections by subspecies complexes. More details can be found in products (Abdelrazek et al., 2024).
Obj. 3. Develop and optimize novel machine learning algorithms. This was nearly complete in the previous annual report and has been finished during this reporting period. More details can be found in products (Johnson et al., 2023).
Aim III. Plant disease clinics validate the developed protocols and tools.
Obj. 1. Validation using inoculated samples. This was completed previously.
Obj. 2. Validation using field collected samples. The Virginia Tech and Oregon State University Plant Clinics have archived over 50 samples and their corresponding pathogen isolates and have diagnosed them using traditional methods.
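The network step listed under Aim I, Obj. 2 can be illustrated with a simplified relative of a minimum spanning network: a minimum spanning tree built from pairwise SNP differences. This is a sketch under stated assumptions and not the pipeline's code. The isolate names and SNP strings are hypothetical, the distance is a plain Hamming distance over aligned SNP positions, and Kruskal's algorithm returns a single tree, whereas a minimum spanning network would also keep alternative links of equal weight.

```python
from itertools import combinations


def hamming(a, b):
    """Number of aligned positions at which two SNP strings differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_tree(profiles):
    """Kruskal's algorithm on pairwise Hamming distances between isolates."""
    names = sorted(profiles)
    edges = sorted(
        (hamming(profiles[a], profiles[b]), a, b)
        for a, b in combinations(names, 2)
    )
    parent = {name: name for name in names}

    def find(name):
        # Walk to the root of the set, compressing the path on the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    tree = []
    for dist, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, dist))
    return tree


# Hypothetical aligned SNP profiles for four isolates.
snp_profiles = {
    "isolate_A": "AAAAAA",
    "isolate_B": "AAAAAT",
    "isolate_C": "AAAATT",
    "isolate_D": "GGGATT",
}
tree = minimum_spanning_tree(snp_profiles)
```

Each tuple in `tree` links two isolates and gives the number of SNP differences between them, which is the information a network figure in a report would display.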
Publications
- Type:
Conference Papers and Presentations
Status:
Other
Year Published:
2023
Citation:
Genomic Source Attribution of Salmonella Using Machine Learning on Metagenomics Sequencing Data. School of Plant and Environmental Sciences Annual Symposium. Chinnareddy S, Liao J, Li S. October 5th 2023.
- Type:
Conference Papers and Presentations
Status:
Other
Year Published:
2023
Citation:
PathogenDx: Automated Analysis of Whole Genome Sequencing Data for the Identification and Analysis of Pathogen Populations. ICPP. Bocardo, Foster, Phan, Witherell, Weisberg, Putnam, Chang, and Grunwald. 2023
- Type:
Conference Papers and Presentations
Status:
Other
Year Published:
2023
Citation:
Xanthomonas hortorum pv. pelargonii in geranium. Plant Health 2023. APS. Denver, CO. Roman-Reyna, V. 2023.
- Type:
Journal Articles
Status:
Published
Year Published:
2023
Citation:
Crosby KC, Rojas M, Sharma P, Johnson MA, Mazloom R, Kvitko BH, Smits TH, Venter SN, Coutinho TA, Heath LS, Palmer M, Vinatzer BA (2023) Genomic delineation and description of species and within-species lineages in the genus Pantoea. Frontiers in Microbiology. doi.org/10.3389/fmicb.2023.1254999
- Type:
Journal Articles
Status:
Published
Year Published:
2024
Citation:
Abdelrazek S, Bush E, Oliver CL, Liu H, Sharma P, Flores MA, Donegan MA, Almeida R, Nita M, Vinatzer BA (2024) A survey of Xylella fastidiosa in the US state of Virginia reveals wide distribution of both subspecies fastidiosa and multiplex in grapevine. Phytopathology. doi.org/10.1094/PHYTO-06-23-0212-R
- Type:
Journal Articles
Status:
Published
Year Published:
2023
Citation:
Johnson MA, Vinatzer BA, Li S (2023) Reference-Free Plant Disease Detection Using Machine Learning and Long-Read Metagenomic Sequencing. Applied and Environmental Microbiology. doi.org/10.1128/aem.00260-23
- Type:
Journal Articles
Status:
Published
Year Published:
2024
Citation:
Roman-Reyna, V, Sharma, A, Toth, H, Konkel, Z, Lmiotek, N, Murthy, S, Faith, S, Slot, J, Hand, F, Goss, E, Jacobs, J (2024) Live tracking of a plant pathogen outbreak reveals rapid and successive, multidecade plasmid reduction. mSystems. 9(2). https://doi.org/10.1128/msystems.00795-23
|
Progress 03/01/22 to 02/28/23
Outputs Target Audience: Diagnosticians, growers and farmers, researchers.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
At Oregon State University, two postdoctoral researchers and one post-baccalaureate student were trained. At The Ohio State University, one postdoctoral researcher, one graduate research assistant, and one undergraduate student were trained. At Virginia Tech, one graduate student was trained in developing machine learning tools and one research faculty member was trained in using nanopore sequencing. All were trained at the intersection of plant disease diagnostics, evolutionary biology, genomics as well as metagenomics, and machine learning.
How have the results been disseminated to communities of interest?
Results have been disseminated to the greater scientific community and to the National Plant Diagnostic Network (by way of their newsletter) via a peer-reviewed publication. Our project was presented at the 2022 national meeting of the National Plant Diagnostic Network in Davis, CA. This group is composed of university and state department of agriculture diagnostic professionals.
Marcela A. Johnson, Haijie Liu, Elizabeth Bush, Parul Sharma, Shu Yang, Reza Mazloom, Lenwood S. Heath, Mizuho Nita, Song Li, Boris A. Vinatzer. "Long-read metagenomics to investigate plant disease outbreaks beyond plant pathogen detection and identification". AMS Fall Central Sectional Meeting. Sept 17-18, 2022. El Paso, TX. Oral presentation.
Marcela A. Johnson. "From CS to Bioinformatics and Beyond". UTEP Bioinformatics colloquium. Sept 16, 2022. El Paso, TX. Oral presentation.
What do you plan to do during the next reporting period to accomplish the goals?
Aim I. Implement and integrate tools to apply WGS for disease diagnostics.
Obj. 2. Integrate currently used WGS analysis tools into a workflow for use in plant clinics. We have gained a strong familiarity with the Nextflow programming language and expect to complete module and pipeline development during the next period. We will also develop methods that automate the selection of reference genome sequences.
Obj. 3. Develop visuals to effectively interpret and communicate WGS data. We will determine the type of information required in reports. We expect to draft a report that includes easily accessible visualizations. We will test the utility of Nextflow Tower, which allows users to interact via web browsers to run Nextflow pipelines and execute them on local computing or HPC cloud-based environments. Nextflow Tower is a potential method for sharing reports.
Aim II. Implement and integrate tools to apply Meta-WGS for disease diagnostics.
Obj. 2. Integrate available Meta-WGS analysis tools into a workflow for use in plant clinics. We will continue using Meta-WGS for disease diagnostics.
Obj. 3. Develop and optimize novel machine learning algorithms. We will use simulations to develop alignment-free methods for diagnostics, e.g., to predict virulence gene sequences.
Aim III. Plant disease clinics validate the developed protocols and tools.
Obj. 2. Validation using field collected samples. Plant Disease Clinics are currently isolating DNA from samples and isolates. We expect to use our methods to analyze the DNA and compare the findings with those derived from traditional diagnostic methods. We will present our project at APS and ICPP in the summer of 2023.
Impacts What was accomplished under these goals?
Aim I. Implement and integrate tools to apply WGS for disease diagnostics.
Obj. 1. Develop standards and optimize preparatory workflow for use in plant clinics. This aim has been completed and published in Iruegas-Bocardo et al. (2023). Briefly, we reported on the effects of sequencing depth on genome assembly and on the accuracy of calling single nucleotide polymorphisms (SNPs). In addition, we reported on the importance of comparing not only core genomes but also accessory genomes when analyzing whole genome sequences to draw conclusions on epidemiological links. We demonstrated that SNP calling programs and reference genome sequences (relationship to samples and quality of assemblies) can have significant effects on conclusions. Last, in Iruegas-Bocardo et al. (2023), we made recommendations on best practices. We would also like to highlight that this published work was done in collaboration with multiple plant clinics and showed the importance of sharing genomic data among a network of clinics.
Obj. 2. Integrate currently used WGS analysis tools into a workflow for use in plant clinics. WGS analysis requires expertise in biocomputing, and computing tools often require specific environments. To overcome these hurdles, we are implementing our pipeline in the nf-core framework (Ewels et al. 2020). Our pipeline is based on Nextflow, a workflow engine that uses software containers such as Docker and Singularity, or the Conda management system, to run pipelines independent of the execution environment (Di Tommaso et al. 2017). A powerful feature of nf-core is that it has a large community of developers who contribute open-source modules. This flexibility is highly advantageous because modules can be incorporated into a diversity of pipelines. Moreover, nf-core has oversight to ensure that all contributed modules meet its standards. To date, we have contributed four modules to nf-core and connected them to several others available in nf-core to produce a partial pipeline that aligns reads to a reference to identify SNPs. Using the same dataset reported in Iruegas-Bocardo et al. (2023), we demonstrated a proof-of-concept implementation of a Nextflow pipeline and showed that it performed as expected.
Obj. 3. Develop visuals to effectively interpret and communicate WGS data. As part of the nf-core pipeline, we included tools to visualize relationships in a minimum spanning network. We are also in the process of building modules for genome assembly and for constructing core genome phylogenies as an additional visualization tool. We are also working on report formats suitable for sharing by email or as a web presentation.
Aim II. Implement and integrate tools to apply Meta-WGS for disease diagnostics.
Obj. 1. Develop standards and optimize preparatory workflow for use in plant clinics. This work was completed in the previous year.
Obj. 2. Integrate available Meta-WGS analysis tools into a workflow for use in plant clinics. We have begun using meta-WGS in the diagnostic setting. We used Nanopore sequencing to characterize samples infected by members of the Ralstonia and Xylella genera to survey population diversity and inform on infections by subspecies complexes. We used meta-WGS to support diagnoses of an emergent fungal pathogen that causes vascular disease on dogwood, redbud, and maple. Additionally, we used Illumina sequencing to diagnose fungal and bacterial pathogens of diverse plant hosts including tomato, pepper, potato, cabbage, and geranium. We are comparing different Illumina read depths to define the appropriate limits for pathogen detection with meta-WGS.
Obj. 3. Develop and optimize novel machine learning algorithms. We previously developed and tested a k-mer based machine learning method using both convolutional neural networks and random forest. The machine learning methods were tested with nanopore sequencing data generated for Pseudomonas syringae-infected tomato leaves and for Xylella-infected grapevine. Findings showed that for P. syringae data, the majority of reads from infected samples were from the pathogen and other microorganisms, and the machine learning methods were able to distinguish pathogen-derived reads from plant-derived reads based largely on differential abundance. For Xylella data, a substantial number of reads in the samples were from the host plant. Nonetheless, the machine learning method was able to differentiate pathogen reads based on GC content, as reads derived from host plants tended to be more AT rich. A revised manuscript is currently under review.
Aim III. Plant disease clinics validate the developed protocols and tools.
Obj. 1. Validation using inoculated samples. This was completed. See objective 2 of Aim II.
Obj. 2. Validation using field collected samples. The three Plant Clinics have archived over 250 tissue samples and their corresponding pathogen isolates and have used multiple means to arrive at a diagnosis. This material will be used as a check on the Meta-WGS analyses of the samples.
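The observation under Aim II, Obj. 3 that host-derived reads tended to be more AT rich can be illustrated by computing GC content per read. The sketch below is only an illustration and not the machine learning method described above. The function names, the example reads, and the 0.5 cutoff are assumptions, and a fixed cutoff is a far cruder separator than a trained model.

```python
def gc_content(read):
    """Fraction of called bases (A, C, G, T) in a read that are G or C."""
    called = [base for base in read.upper() if base in "ACGT"]
    if not called:
        return 0.0  # a read with no called bases carries no signal
    return sum(base in "GC" for base in called) / len(called)


def split_by_gc(reads, threshold=0.5):
    """Partition reads into GC-rich and AT-rich groups.

    The threshold is a hypothetical cutoff chosen only for this example.
    """
    gc_rich, at_rich = [], []
    for read in reads:
        (gc_rich if gc_content(read) >= threshold else at_rich).append(read)
    return gc_rich, at_rich


# Hypothetical reads: two GC rich, two AT rich.
reads = ["GGCCGCGC", "ATATTAAT", "GCGCATGC", "AATTTTAC"]
gc_rich, at_rich = split_by_gc(reads)
```

In practice, the GC distributions of host and pathogen reads overlap, so a cutoff like this would serve only as a quick descriptive check before applying a classifier.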
Publications
- Type:
Journal Articles
Status:
Accepted
Year Published:
2023
Citation:
Iruegas-Bocardo, F., Weisberg, A. J., Riutta, E. R., Kilday, K., Bonkowski, J. C., Creswell, C., Daughtrey, M. L., Rane, K., Grünwald, N. J., Chang, J. H., and Putnam, M. L. (2023). Whole genome sequencing-based tracing of a 2022 introduction and outbreak of Xanthomonas hortorum pv. pelargonii. Phytopathology. https://doi.org/10.1094/PHYTO-09-22-0321-R.
- Type:
Journal Articles
Status:
Published
Year Published:
2022
Citation:
Bernal E, Rotondo F, Roman-Reyna V, Klass T, Timilsina S, Minsavage GV, Iruegas-Bocardo F, Goss EM, Jones JB, Jacobs JM, Miller SA, Francis DM. Migration Drives the Replacement of Xanthomonas perforans Races in the Absence of Widely Deployed Resistance. Front Microbiol. 2022 Mar 18;13:826386. doi: 10.3389/fmicb.2022.826386.
https://www.frontiersin.org/articles/10.3389/fmicb.2022.826386/full
- Type:
Journal Articles
Status:
Accepted
Year Published:
2023
Citation:
Roman-Reyna V, Curland RD, Velez-Negron Y, Ledman KE, Gutierrez Castillo DE, Beutler J, Butchacas J, Brar G, Roberts R, Dill-Macky R, Jacobs JM. Development of genome-driven, lifestyle-informed markers for identification of the cereal-infecting pathogens Xanthomonas translucens pathovars undulosa and translucens. Phytopathology. 12 Oct 2022 (epub ahead of print) https://doi.org/10.1094/PHYTO-07-22-0262-SA
- Type:
Theses/Dissertations
Status:
Accepted
Year Published:
2023
Citation:
https://vtechworks.lib.vt.edu/handle/10919/113825
|
Progress 03/01/21 to 02/28/22
Outputs Target Audience: Plant disease diagnosticians and researchers.
Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
One postdoctoral researcher (Oregon State University), one postdoctoral researcher (The Ohio State University), one graduate student (Virginia Tech), and one undergraduate (Oregon State University) have been trained at the intersection of plant disease diagnostics, evolutionary biology, genomics as well as metagenomics, and machine learning.
How have the results been disseminated to communities of interest?
Results have been disseminated via a peer-reviewed publication and a manuscript in review. We have also communicated results to our advisory board, whose members represent federal and state regulatory agencies, lead diagnostic clinics, and/or have leadership roles in the national plant diagnostics network. We (investigators from each institution) have presented at departmental and national meetings. We have also communicated our project to individual stakeholders and have invited their participation (submitting samples and providing feedback on findings). Last, we developed and presented workshops on using these methods in diagnostics. Workshops were delivered at four US universities and two international institutions.
What do you plan to do during the next reporting period to accomplish the goals?
Aim I. Implement and integrate tools to apply WGS for disease diagnostics.
Obj. 2. Integrate currently used WGS analysis tools into a workflow for use in plant clinics. We will continue to develop PathID for automating data processing and analyses. We will start developing the three databases of marker genes. We will circumscribe additional species, subspecies, and phylogroups within species in LINbase with a focus on Agrobacterium and related genera.
Obj. 3. Develop visuals to effectively interpret and communicate WGS data. We will continue adapting tools such as Nextstrain for data visualization.
Aim II. Implement and integrate tools to apply Meta-WGS for disease diagnostics.
Obj. 2. Integrate available Meta-WGS analysis tools into a workflow for use in plant clinics. We will write detailed protocols on how to use the devised Illumina and Nanopore analysis workflow so that it can be tested by disease clinic personnel.
Obj. 3. Develop and optimize novel machine learning algorithms. We will further study the machine learning methods using interpretable machine learning approaches. We will test Deep-Lift and RF-SHAP, two advanced feature selection methods, to select informative k-mers for the classification. We will determine the k-mer distributions in the genome and identify the minimum set of informative k-mers needed to achieve high classification accuracy. For samples with low accuracy, we plan to test a two-stage ML model in which the first-layer model determines whether the reads are from the host genome or the pathogen genome, and the second-layer model determines whether the sample can be classified as infected or healthy.
Aim III. Plant disease clinics validate the developed protocols and tools.
Obj. 1. Validation using inoculated samples. Oregon State University will continue infecting plants to generate samples for future use. VT will also start this objective and compare results with those obtained from samples consisting of known combinations of mock microbial community DNA, healthy plant DNA, and known pathogen DNA at different concentrations. VT will share DNA of these samples and of inoculated samples with the group at Ohio State University so that we can compare results obtained at VT using nanopore sequencing with those obtained at Ohio State University using Illumina sequencing.
Obj. 2. Validation using field collected samples. Plant Disease Clinics will continue archiving samples for use once pipelines are ready to be tested.
For goal IV, we have arranged for a speaking engagement at the National Plant Diagnostic Network meeting in April of 2022. At this event, we will present our accomplishments and future goals to scientists from the 50 states and US territories.
Impacts What was accomplished under these goals?
Goal I.
Obj. 1. Reads from deeply sequenced and previously assembled genomes varying in size from approximately 5 Mb to 50 Mb have been used to assess the impact of sequencing depth on genome coverage, genome assembly, and robustness of SNP calls. For bacterial genomes, a depth of coverage of 20X consistently yielded results comparable to assemblies derived from all reads. For eukaryotic pathogen genomes, a 40X coverage is necessary. Findings will be used to guide diagnosticians on multiplexing strategies prior to sequencing and on the reliability of results after sequencing. A similar approach was employed to determine the minimal quality for genome assemblies to be used in LINbase. First, assemblies using only Illumina reads were compared with hybrid assemblies using both short Illumina reads and long nanopore reads. It was found that the two types of assemblies were assigned the same LINs up to position U (99.99% ANI). Therefore, we concluded that closing a genome assembly with long reads does not need to be included in our workflow. Second, assemblies of different quality (in regard to number of contigs, N50, and length of shortest contig) were made using different numbers of Illumina reads. We found that as long as the number of contigs was below 500, the N50 was above 50,000 bp, and the shortest contig was longer than 500 bp, the assigned LINs stayed the same up to position P (99.925% ANI). This ANI threshold is higher than the breadth of the clonal, cool-virulent brown rot pandemic lineage (approximately corresponding to the select agent R. solanacearum Race 3 biovar 2), which has an ANI breadth of 99.9% (manuscript in preparation based on results obtained as part of USDA APHIS Farmbill project AP19PPQS&T00C083). Therefore, the minimal assembly quality (contig number <500; N50 >50,000 bp; shortest contig >500 bp) will be used for our disease diagnostics workflow for WGS-based identification using LINbase. Last for this aim, using an in-house Illumina MiniSeq purchased using funds received from a state agency, we have made significant progress in developing a standard preparatory workflow for preparing DNA and making libraries.
Obj. 2. We are in the process of scripting an automated workflow that will initiate data processing and analyses once sequencing reads are transferred onto our servers (PathID). We are adapting our pipeline for Nextflow, which overcomes key issues with portability, reproducibility, and continuous checkpointing. We are currently examining a recently developed Nextflow-based pipeline called Bactopia to assess the ease with which we can adapt it for our needs. We have circumscribed Pseudomonas, Xanthomonas, Xylella, and Ralstonia species, subspecies, and phylogroups within species in LINbase. The Ralstonia circumscriptions were done as part of the USDA APHIS Farmbill project AP19PPQS&T00C083. Any user of LINbase can now identify a genome sequence as a member of any of the circumscribed groups. We are in the process of adding circumscriptions of genome-based taxa (including validly published named species as well as genome-similarity-based genomospecies) for Agrobacterium and related genera as well. We have also used currently available genome sequences to calibrate the LINbase classification scheme to that of hierBAPS. This is a crucial step for determining sub-species level relationships of plant pathogens and for identifying suitable references for calling single nucleotide polymorphisms. To make LINbase and the planned PathID Web server compatible with each other, we are developing an application programming interface (API) for LINbase so that the future PathID Web server can communicate with LINbase.
Obj. 3. We are testing Nextstrain as a potential method for rapidly visualizing data.
Goal II.
Obj. 1. This work has been completed and its efficacy has been demonstrated for detecting Xylella fastidiosa (see products). We are currently testing the applicability of these methods for detecting plant-associated pathogens that reside within leaf tissues and among more complex microbial communities.
Obj. 2. For Meta-WGS using nanopore sequencing, we have devised the workflow shown in Figure 11. Once this workflow has been validated, we will train the Virginia Tech Plant Disease Clinic personnel in using this workflow.
Obj. 3. We have developed and tested a k-mer based machine learning method using both convolutional neural networks and random forest. The machine learning methods were tested with nanopore sequencing data generated for Pseudomonas syringae-infected tomato leaves and for Xylella-infected grapevine. For P. syringae data, we have achieved high accuracy of > 95% using both models. For Xylella data, we can only achieve 65% accuracy. The success rates seem to be related to the complexity of the sequencing library. In P. syringae samples, the majority of reads from infected samples were from the pathogen and other microorganisms. In contrast, for the Xylella data, a substantial number of reads in the samples were from the host plant.
Goal III.
Obj. 1. For nanopore sequencing, we have mostly used field collected samples so far. Analyzing these samples with Meta-WGS and comparing the obtained results with qPCR, we have realized the need to start validation using known concentrations of pure DNA of mock microbial communities mixed with known concentrations of pure healthy plant DNA and known concentrations of pure pathogen DNA. We will compare Meta-WGS results obtained with these samples with qPCR to compute detection thresholds. Once this step is completed, we will transition to inoculated samples. We have inoculated tomato plants with a strain of agrobacteria. Total DNA will be collected from galls of various ages. These samples will be used in the future to test the efficacy of Meta-WGS methods.
Obj. 2. Not yet started, but the associated plant disease clinics have begun archiving diagnosed samples that will be used in the future to assess the efficacy of WGS- and Meta-WGS-based detection methods.
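The minimal assembly quality criteria reported under Goal I, Obj. 1 (fewer than 500 contigs, N50 above 50,000, shortest contig longer than 500) can be expressed as a short check. This is a minimal sketch and not the project's workflow code. The function names and example contig lengths are hypothetical, and lengths are assumed to be in base pairs.

```python
def n50(contig_lengths):
    """Length of the contig at which half of the total assembly is reached."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def passes_minimal_quality(contig_lengths, max_contigs=500,
                           min_n50=50_000, min_shortest=500):
    """Apply the three thresholds stated in the report to one assembly."""
    if not contig_lengths:
        return False
    return (
        len(contig_lengths) < max_contigs
        and n50(contig_lengths) > min_n50
        and min(contig_lengths) > min_shortest
    )


# Hypothetical assemblies described only by their contig lengths.
good_assembly = [2_000_000, 1_500_000, 900_000, 600]
fragmented_assembly = [2_000_000, 400]  # shortest contig is too short

good_ok = passes_minimal_quality(good_assembly)
fragmented_ok = passes_minimal_quality(fragmented_assembly)
```

An assembly that fails any one of the three criteria would be flagged for resequencing or reassembly before being used for identification.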
Publications
- Type:
Journal Articles
Status:
Accepted
Year Published:
2021
Citation:
Roman-Reyna et al., (2021; https://doi.org/10.1128/mSystems.00591-21)
|