Source: VIRGINIA POLYTECHNIC INSTITUTE submitted to NRP
ACQUISITION OF LONG READ, HIGH-THROUGHPUT SEQUENCING DEVICE FOR FOOD AND AGRICULTURE RESEARCH AT VIRGINIA TECH
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1024329
Grant No.
2020-70410-32900
Cumulative Award Amt.
$297,000.00
Proposal No.
2020-07814
Multistate No.
(N/A)
Project Start Date
Sep 1, 2020
Project End Date
Aug 31, 2023
Grant Year
2020
Program Code
[EGP]- Equipment Grants Program
Recipient Organization
VIRGINIA POLYTECHNIC INSTITUTE
(N/A)
BLACKSBURG,VA 24061
Performing Department
School of Plant Sciences
Non Technical Summary
In this equipment grant, we propose to purchase a high-throughput, long-read sequencing instrument, the PromethION24, from Oxford Nanopore Technologies. Gene sequencing has become an indispensable tool for research in agriculture, food, and environmental sciences. Through this grant, 18 participating research groups from Virginia Tech will gain access to the long-read sequencing capacity provided by the PromethION24. Traditional sequencing technology can only generate reads of 100-250 base pairs, whereas the PromethION24 can generate reads of up to one million base pairs. Such long-read technology can enable novel research projects in genomics and metagenomics of plants and animals, water quality and antibiotic resistance genes, soil health and microbial metagenomics, disease vectors, and food safety and security. With long-read sequencing, we can assemble genomes of microorganisms that are pathogenic to plants or animals in order to achieve early disease detection. We can sequence DNA found in wastewater treatment plants to detect antibiotic resistance genes in order to prevent further spread of microorganisms carrying such genes. We can also use genomic sequencing to assist breeding and selection of specialty crops. This instrument will be used to sequence DNA or RNA samples from plants, animals, and microorganisms found in water, soil, plant surfaces, infected tissues, animal digestive tracts, and food products. Depending on the specific research question, approaches will include whole genome sequencing, metagenomic sequencing, and transcriptome and meta-transcriptome sequencing. The participating groups will submit internal grant proposals for their research projects, and these proposals will be peer reviewed. Results will be disseminated through peer-reviewed publications, extension publications, poster and oral presentations at conferences, and reports to end users.
Using genomic sequencing, the goals of this project include: faster diagnostic tools for plant diseases; improved throughput and accuracy for antibiotic resistance gene identification; high-throughput identification of beneficial microbiomes in the cattle digestive system; improved genome sequences and genetic maps of crops; new methods for foodborne pathogen identification; and new computational pipelines and machine learning tools to analyze nanopore sequencing data. We will also engage in training of undergraduate and graduate students, postdocs, and research scientists. We will broaden the participation of women and underrepresented minorities in Ag and Engineering education through programs including VT-REEL and MAOP for outreach to underrepresented minorities, and the Governor's School of Agriculture for outreach to high school girls. Towards the end of this grant, we expect to provide service to regional and national research programs, including more than five neighboring states and ten higher education institutions, particularly universities with strong Ag research programs.
Animal Health Component
25%
Research Effort Categories
Basic
75%
Applied
25%
Developmental
(N/A)
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
201                   1419                             1081                     10%
205                   1131                             1160                     10%
206                   1550                             1101                     10%
201                   1820                             1010                     10%
304                   3410                             1020                     10%
305                   3320                             1050                     10%
102                   4099                             1070                     10%
102                   0210                             1100                     20%
712                   4010                             1100                     10%
Goals / Objectives
The major goals of this project are to use a long-read sequencing instrument, the PromethION24, to advance cutting-edge research and extension projects; to educate and train postdocs, senior scientists, and graduate, undergraduate, and K-12 students; and to serve regional and national needs for genomic sequencing in agriculture research and extension. On the research front, we plan to use this equipment to enhance research and extension activities in genomics and metagenomics of plants and animals, water quality and antibiotic resistance genes, soil health and microbial metagenomics, disease vectors, and food safety and security. The data generated by this equipment will spur new trans-disciplinary collaboration between agriculture research and engineering, computer science, machine learning, and data sciences. The PromethION24 will be maintained by our Genomics Core Facility, which has extensive experience in serving the Virginia Tech research community. The Fralin Life Sciences Institute will provide long-term support during and after the grant period. All participating collaborators agree to broaden the participation of women, underrepresented minorities, and high school students in Ag and Engineering education. Towards the end of this grant, we expect to provide service to regional and national research programs with a focus on agriculture-related research.
Project Methods
In the first year, an internal competition among the participating groups will be organized. In the second year, additional projects will be solicited from groups outside the original participating groups. In the third year, external projects will be solicited from other research institutions. Priority will be given to projects funded by USDA and to projects related to USDA research. We plan to conduct two rounds of internal competition. The first round will start in Fall 2020 and will award up to 5 projects. The second round will start in Spring 2020. The exact sequencing method will depend on the project. Typical experiments will include genome (DNA) or transcriptome (RNA) sequencing for plants, animals, or insects, and metagenomics or metatranscriptomics for microbial species. Graduate students, undergraduate students, and senior research personnel will be trained in nanopore sample preparation. The main educational component will be training in data analytics. Classroom instruction will include hands-on analysis of nanopore data in a graduate class, "introduction to genomic data sciences", which will be offered every fall semester. Students will learn how to use high performance computing clusters to analyze long-read sequencing data and how to run computational pipelines for data quality control, read mapping, and read assembly. Students will develop Python and R scripts to integrate long-read sequencing into their class projects. Another graduate class, "plant genomics", will be offered every other spring semester, and this class will include a tutorial on how to use long-read sequencing data for genomic research in agriculture. In this class, students will be required to develop a proposal using long-read sequencing technologies as part of their projects.
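As an illustration of the kind of quality-control script a student might write for long-read data, the following is a minimal sketch that computes read count, mean read length, and read-length N50 from FASTQ records. It is not the course's actual pipeline: the function names and the toy records are made up for this example, and it assumes a plain FASTQ layout with exactly four lines per record.

```python
import io
from typing import Iterable, Iterator, List


def read_lengths(fastq_lines: Iterable[str]) -> Iterator[int]:
    """Yield the sequence length of each record in a 4-line-per-record FASTQ stream."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of every record holds the bases
            yield len(line.strip())


def n50(lengths: List[int]) -> int:
    """Return the length L such that reads of length >= L contain at least half of all bases."""
    if not lengths:
        return 0
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Toy records invented for illustration only (hypothetical data, not real reads).
toy_fastq = (
    "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nACGTAC\n+\nIIIIII\n"
)
lengths = list(read_lengths(io.StringIO(toy_fastq)))
mean_length = sum(lengths) / len(lengths)
print(f"reads={len(lengths)} mean_length={mean_length:.1f} N50={n50(lengths)}")
```

For real runs, the same functions could be fed an opened FASTQ file instead of the in-memory toy string; compressed or multi-line FASTQ files would need a proper parser.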
Evaluation: Research and extension projects will be evaluated based on the amount of data generated and released to the public domain; peer-reviewed journal or extension publications; workshops and presentations given; posters and abstracts published; the number of undergraduate students, graduate students, and senior personnel trained using this instrument; and grant proposals submitted or funded using data generated by this instrument. Education projects will be evaluated based on the number of students trained in the classes and workshops. Outreach activities will be evaluated based on the number of K-12 students who participated in the Governor's School summer program. Training of under-represented minorities during this project will also be evaluated based on the established processes of the VT-REEL and MAOP programs.

Progress 09/01/20 to 08/31/23

Outputs
Target Audience: 1. Faculty and senior research scientists at Virginia Tech working in plant genomics, plant pathology, animal health, animal microbiome, soil microbiome, environmental microbiome, and disease vectors. 2. Academic research scientists outside Virginia Tech in the above-mentioned research areas. 3. Graduate and undergraduate students working in agriculture-related disciplines. 4. Agricultural producers, farmers, and crop advisors. 5. State commodity board and CALS advisory board members, including high-level executives, business owners, and state policy makers.
Changes/Problems: For the tomato project, two attempts at nanopore sequencing on the PromethION were made, but both failed due to (i) incompatibility of the library kit and flow cell or (ii) a corrupted flow cell or corrupted data. We are currently waiting for the new direct RNA-seq kit and a compatible flow cell. The quality of nanopore flow cells is highly variable from batch to batch, which causes a lot of frustration for users.
What opportunities for training and professional development has the project provided? During the full three-year project period, we educated and trained 12 postdocs and senior scientists and 50+ graduate students (including attendees of the nanopore day event), and conducted outreach activities with more than 60 K-12 students through the Governor's School of Agriculture and 4H events.
How have the results been disseminated to communities of interest? The results have been disseminated to the communities via poster and oral presentations at international, national, and regional conferences, publications in peer-reviewed journals, and a hybrid training event, Virginia Tech nanopore day, with more than 30 attendees.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? Significant progress has been made under the outlined goals of the research projects. We have used the PromethION24 for whole genome resequencing and methylation analysis. Our aims to serve regional and national needs in genomic sequencing for agriculture research and extension have been achieved through the research projects listed in the product section and summarized below.
(1) Edamame/Soybean Genomics: We performed whole genome resequencing for 28 edamame/soybean varieties. We identified structural variations in soybean genomes using Nanopore long-read sequencing, and we validated the newly identified structural variations using bench experiments. A follow-up experiment using CRISPR editing is planned to confirm the function of structural variations in soybean genomes. This research can benefit the soybean and edamame breeding community by introducing a new type of genomic marker based on structural variation.
(2) We performed whole genome resequencing of mosquito genomes and obtained >100 Gbases of PromethION reads from four individuals of Aedes aegypti. We produced preliminary haplotype-resolved assemblies for two individuals. The quality of the assemblies exceeded that of the most recent published A. aegypti genome assembly. This research can further benefit the research community working on vector-borne diseases.
(3) We performed whole genome resequencing of the turkey genome. The aim is to polish the domestic turkey genome and explore genome divergence in wild turkey species. We generated additional samples for Eastern Wild Turkey and Mexican Ocellated Turkey.
(4) We performed whole genome resequencing for five Brassicaceae species: Arabidopsis thaliana, Camelina sativa, Schrenkiella parvula, Eutrema salsugineum, and Sisymbrium irio. The results were used to enhance genome and organelle assembly for a comparative study.
(5) We sequenced nine libraries covering three different mouse cortical cell types. We evaluated methylation levels for over 40M CpG sites, finished one paper that is ready for submission, and integrated the data into an online resource (not public yet).
(6) We created a long-read-based transcriptome library for developing tomato fruits, which we planned to use for differential transcript expression and co-expression analyses.
(7) We sequenced the genome of a green alga, Chlamydomonas reinhardtii. Bioinformatic analyses are ongoing with a draft genome, which is planned for future publication.
(8) We performed metagenomic sequencing for plant disease diagnostics. We completed four sequencing runs with between 13 and 24 samples per run and submitted data to NCBI SRA for the first two runs. Overall, we generated over 1 terabyte of data for the third and fourth runs.
In summary, the accomplishments include the successful utilization of the PromethION24 for various sequencing projects, significant progress in whole genome resequencing for different species, training of students and researchers, and the generation of substantial genomic data for further analysis and publication. The projects align with the broader goals of advancing genomic research and addressing key questions in agriculture, metagenomics, and genomics of various organisms.

Publications

  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: DNA methylation regulates TrkB isoform expression in CNS. Wei X and Olsen ML, Neuroscience 2022, San Diego, Nov 2022.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Methylome, transcriptome and alternative splicing profiling of neurons, astrocytes, and microglia. Wei X and Olsen ML, Biology of Genomes 2023, Cold Spring Harbor, May 2023.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Genome wide 5mC and 5hmC patterns determine unique transcriptional signatures, transcriptional regulators and alternative splicing of neural cell types in mouse brain. Wei X, Li J, Cheng Z, Wei S, Yu G and Olsen ML, Neuroscience 2023, Washington D.C., Nov 2023
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Genome wide 5mC and 5hmC patterns determine unique transcriptional signatures, transcriptional regulators and alternative splicing of neural cell types in mouse brain. Wei X, Li J, Cheng Z, Wei S, Yu G and Olsen ML, 2023 SoN summer research retreat, July 2023.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Genome wide 5mC and 5hmC patterns contribute to unique transcriptional signatures, transcriptional regulators and alternative splicing of neural cell types in mouse brain. Wei X, Li J, Cheng Z, Wei S, Yu G and Olsen ML, Neuroscience 2023, Washington D.C., Nov 2023
  • Type: Conference Papers and Presentations Status: Published Year Published: 2023 Citation: Long-Read Whole Genome Sequencing Reveals Novel Structural Variation Markers for Important Agronomic and Quality Traits of Soybean. Zhibo Wang, Kassaye Belay, Joe Paterson, Qijian Song, Patrick Bewick, Bo Zhang, Song Li, Nature Plant. Under preparation.
  • Type: Journal Articles Status: Published Year Published: 2023 Citation: A survey of Xylella fastidiosa in the US state of Virginia reveals wide distribution of both subspecies fastidiosa and multiplex in grapevine. Sahar Abdelrazek, Elizabeth Bush, Charlotte L. Oliver, Haijie Liu, Parul Sharma, Marcela Aguilera Flores, Monica Ann Donegan, Rodrigo Almeida, Mizuho Nita, and Boris Vinatzer. Phytopathology.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: Abdelrazek, S., et al. "Characterization of the Xylella fastidiosa population in Virginia using metagenomics." Phytopathology, Vol. 112, No. 11. American Phytopathological Society, St. Paul, MN, 2022.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2022 Citation: Application of Interpretable Machine Learning Methods in Plant Genomics, Song Li, Virginia Tech, 9th Plant Genomics & Gene Editing Congress USA


Progress 09/01/21 to 08/31/22

Outputs
Target Audience: 1. Faculty and senior research scientists at Virginia Tech working in plant genomics, plant pathology, animal health, environmental microbiome, and disease vectors. 2. Academic research scientists outside Virginia Tech in the above-mentioned research areas. 3. Graduate and undergraduate students working in agriculture-related disciplines.
Changes/Problems: One of the major challenges is the quality of the flow cells and the expiration date of the flow cell contract. We did not expect that all the flow cells would expire within 1 year of purchasing this machine. As a result, several potential users could not use the free flow cells provided as part of the sequencing machine package. The quality of the flow cells also varies from batch to batch. There are also complaints that the vendor's handling of orders and packages is not very timely.
What opportunities for training and professional development has the project provided? We trained three postdoctoral associates and five graduate students during this period in nanopore sequencing experimental design, library preparation, and data analysis. We also hosted summer 4H and Governor's School events as outreach activities to demonstrate the power of our sequencing capability, and we explained the use of sequencing for plant breeding and plant disease surveillance, among other applications.
How have the results been disseminated to communities of interest? The results have been disseminated to potential users via poster presentations and conferences. We have found several potential users from NC State University and UNC Charlotte who are interested in whole genome resequencing for plant and microbial genomes.
What do you plan to do during the next reporting period to accomplish the goals? We plan to expand the user base of this equipment to more complex applications such as sequencing of microbiomes and viruses, and RNA sequencing for plant and animal systems.

Impacts
What was accomplished under these goals? We have started an internal grant competition and determined a priority list of projects to be run on the sequencing machine. The list is based on the technical difficulty of library preparation and the relevance of the data to the project goals. From a technical point of view, whole genome DNA sequencing is the most straightforward for library preparation, followed by microbiome projects, and then by RNA-seq and virus sequencing. In this year, we completed most genomic DNA sequencing-based projects and started to move to microbiome and RNA-seq experiments.

Publications


Progress 09/01/20 to 08/31/21

Outputs
Target Audience: Our audience includes the following categories: 1. Faculty and senior research scientists at Virginia Tech working in plant genomics, plant pathology, cattle health, cattle microbiome, soil microbiome, environmental microbiome, disease vectors, water quality, and food safety. 2. Academic research scientists outside Virginia Tech in the above-mentioned research areas. 3. Extension agents related to the above-mentioned areas. 4. Graduate and undergraduate students working in agriculture-related disciplines. 5. Agricultural producers, farmers, and crop advisors. 6. State commodity board and CALS advisory board members, including high-level executives, business owners, and state policy makers.
Changes/Problems: We have faced many challenges due to the COVID-19 pandemic. These challenges include: (1) Delivery of the machine was delayed at customs, and the machine arrived four months behind schedule. Shipping costs exceeded the proposed budget, but the Fralin Life Sciences Institute covered the additional shipping cost. (2) Onsite service and in-person training were impacted by COVID restrictions. (3) Flow cell quality is variable, and a portion of the flow cells were not of good quality before the warranty date. (4) Other experimental supplies, such as tubes for extracting high-molecular-weight DNA from samples, were constantly out of stock. These shortages are partly due to COVID-related supply chain problems. (5) The summer high school event was not organized due to COVID concerns.
What opportunities for training and professional development has the project provided? We have trained 25 graduate students and two postdocs in the nanopore data analysis pipeline using computational tools. Six senior research scientists, postdocs, and research assistants have been trained in Nanopore library preparation. A senior research scientist from the genomic sequencing center at Virginia Tech has been trained to operate the PromethION sequencing machine.
How have the results been disseminated to communities of interest? The results have been disseminated through multiple channels, including (1) an internal Nanopore user email list; (2) teaching of a graduate-level class; and (3) hands-on training in library preparation, operation of the sequencing machine, and data analysis.
What do you plan to do during the next reporting period to accomplish the goals? Our plan for the next reporting period includes the following activities:
1. Organizing nanopore day. This event will happen during the week of March 5-13, 2022, which is spring break for Virginia Tech students. The event will be hosted at Virginia Tech as a full-day event. The schedule will include a presentation from Nanopore company representatives and research presentations from local investigators on data generated using the PromethION machine. This event will also include hands-on training in sample preparation. Participating members will be asked to bring their samples, and a Nanopore representative will supervise participants during real-time sample preparation and sequencing.
2. In the second year, we plan to encourage participating PIs to sequence more samples that are not DNA. For example, we plan to sequence more samples for RNA-seq, virus-seq, and microbiome-seq. We will expand the use of the machine to projects in the food science, antibiotic resistance, and plant pathology domains.
3. We plan to incorporate more educational events, including guest lectures in graduate and undergraduate classes, and to encourage students and postdocs to showcase their research results through poster presentations at local, national, and international meetings. If the situation permits, we will include a nanopore sequencing demo in summer high school camps such as the Virginia Governor's School of Agriculture in summer 2022.
4. We plan to develop a standardized protocol and pricing structure for external users (those who were not involved in the original proposal but are future customers). This will be useful for the third year of the project, when we will open up service to external stakeholders such as users from other universities.
5. To eliminate the data transfer bottleneck, we will establish a 10Gb connection to the sequencing machine. We also plan to seek funding for better computing clusters through NSF-MSI, and to develop a portal on the current cluster computing system.
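To give a sense of why link speed matters for the data transfer bottleneck, here is a back-of-the-envelope sketch of idealized transfer times. It rests on several assumptions not stated in the report: "10Gb" is read as 10 gigabits per second, the 1 Gb/s comparison link is hypothetical, 1 TB is used only as a round illustrative data size, and protocol overhead and disk speed are ignored unless an efficiency factor is supplied.

```python
def transfer_hours(data_bytes: float, link_bits_per_second: float,
                   efficiency: float = 1.0) -> float:
    """Idealized time in hours to move data_bytes over a link.

    efficiency is the assumed fraction of nominal bandwidth actually achieved.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / 3600


ONE_TB = 1e12  # illustrative data size in bytes (assumption)

# Hypothetical comparison: a 1 Gb/s link versus the planned 10 Gb/s link.
for label, rate in (("1 Gb/s", 1e9), ("10 Gb/s", 1e10)):
    print(f"{label}: {transfer_hours(ONE_TB, rate):.2f} h per TB")
```

Under these assumptions the faster link cuts the idealized transfer time tenfold; real throughput would depend on storage and network conditions.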

Impacts
What was accomplished under these goals?
Internal proposal and first round of sequencing. As described in the original proposal, we organized an internal proposal competition and, based on the internal proposals, initiated the first round of nanopore sequencing for PIs who contributed supporting letters to the proposal. The first round of nanopore sequencing projects focused on whole genome DNA sequencing due to the simplicity of sample preparation. This round of projects helped the project team establish the protocols for sample preparation, quality control, and the data analysis pipeline. We were also able to better understand the entire workflow of nanopore sequencing and the computational infrastructure needed to analyze nanopore data.
Developed course materials for teaching. We have developed teaching materials for analyzing nanopore sequencing data. The teaching materials include slides for two class sessions dedicated to the analysis of nanopore data and other related materials on using Linux clusters for genome data analysis. The teaching materials also include sample data and analytic pipelines developed for whole genome read mapping, read polishing, and whole genome de novo assembly. The teaching materials have been used in one graduate-level class, 'introduction to genomic data science', and will be used in future offerings of this class.

Publications