Source: UNIVERSITY OF KENTUCKY submitted to NRP
TRANSLATING GENOMICS INTO EFFECTIVE PEST MANAGEMENT: EXPANDING ACCESS TO BIOINFORMATIC RESOURCES FOR SPECIES IDENTIFICATION AND PATHWAY ANALYSIS
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1030563
Grant No.
2023-67012-39892
Cumulative Award Amt.
$146,000.00
Proposal No.
2022-09766
Multistate No.
(N/A)
Project Start Date
Sep 1, 2023
Project End Date
Jan 15, 2025
Grant Year
2023
Program Code
[A1112]- Pests and Beneficial Species in Agricultural Production Systems
Recipient Organization
UNIVERSITY OF KENTUCKY
500 S LIMESTONE 109 KINKEAD HALL
LEXINGTON, KY 40526-0001
Performing Department
(N/A)
Non Technical Summary
Accelerating climate change and globalization are driving unexpected changes in the distribution of invasive pest species. Accurate and rapid species identification and pathway analysis are critical to effective management, but pests often belong to closely related species groups that are morphologically indistinct yet have different ecological impacts. For these groups, genomic diagnostic marker panels are an important identification tool. However, genotype calling and analysis rely on bioinformatic pipelines that require a high level of expertise to execute and interpret, a significant barrier to the wide-scale adoption of a powerful phytosanitary resource. To overcome this barrier, this postdoctoral project intends to develop a user-friendly, end-to-end software program for sequence-based diagnostic tool analysis. Starting with next-generation sequencing data, it will flexibly analyze sequence data or call genotypes for probabilistic species identification, down to strain or population depending on the panel, and, if applicable, perform geographic source/pathway analysis. Accessibility will be addressed with secure online hosting and the creation of a user-friendly interface. By emphasizing standardization, the software will be intentionally designed as a universal framework for sequencing-based diagnostics, providing flexibility for diverse, taxonomically difficult pest groups. Thus, the proposed work contributes to the AFRI Farm Bill Priority Area: Plant Health and Production and Animal Products and Program Area: 1c. Pests and Beneficial Species in Agricultural Production Systems (Program Area Priority Code: A1112) as a rapid, flexible resource for pest management, and it fits the AFRI EWD goal of "Advancing Science" by providing training that is highly relevant to the future of the USDA.
Animal Health Component
50%
Research Effort Categories
Basic
50%
Applied
50%
Developmental
(N/A)
Classification

Knowledge Area (KA) | Subject of Investigation (SOI) | Field of Science (FOS) | Percent
211 | 3110 | 1130 | 100%
Goals / Objectives
Develop a user-friendly, end-to-end software program for sequence-based diagnostic tool analysis. It will flexibly analyze next-generation sequence data or call genotypes for species identification, down to strain or population depending on the panel, and if applicable, perform geographic source/pathway analysis. Accessibility will be addressed with secure online hosting and the creation of a user-friendly interface. By emphasizing standardization, the software will be intentionally designed as a universal framework for sequencing-based diagnostics, providing flexibility for diverse, taxonomically difficult pest groups.
Project Methods
I. Software development: Develop a universal application programming interface (API) for the analysis of sequence data for diagnostic questions.
I-A. File inputs.
I-B-1. Data analysis: Genotyping SNPs from fastq data.
I-B-2. Data analysis: Genotypes from non-HTS platforms.
I-B-3. Data analysis: Creating consensus sequences.
I-C-1. Diagnostic analysis: Species assignment.
I-C-2. Diagnostic analysis: Cluster assignment.
II. Software accessibility: Develop a graphical user interface (GUI) and demonstrate the API and GUI online.
II-A. Develop interface.
II-B. Demonstrate the program online.
III. Document and demonstrate the utility of the software with publicly available tephritid data.
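To illustrate what the species-assignment step (I-C-1) involves, the following minimal Python sketch assumes one common approach, a likelihood-based assignment test. Each candidate species is described by reference alternate-allele frequencies at the panel SNPs; an individual's diploid genotype likelihood is computed under Hardy-Weinberg equilibrium, treating markers as independent, and the likelihoods are normalized into posterior probabilities under a uniform prior. This is not necessarily the method the project's software uses; the function names, the panel layout, and the toy allele frequencies are hypothetical.

```python
import math
from typing import Dict, List, Optional

# Hypothetical panel layout: species name -> alternate-allele frequency at each SNP.
Panel = Dict[str, List[float]]


def genotype_log_likelihood(alt_counts: List[Optional[int]],
                            alt_freqs: List[float],
                            eps: float = 1e-3) -> float:
    """Log-likelihood of one diploid individual's genotypes (0, 1 or 2 alternate
    alleles per SNP; None = missing), assuming Hardy-Weinberg equilibrium and
    independent markers."""
    total = 0.0
    for g, p in zip(alt_counts, alt_freqs):
        if g is None:                       # skip missing genotype calls
            continue
        p = min(max(p, eps), 1.0 - eps)     # avoid log(0) when an allele is fixed
        q = 1.0 - p
        total += math.log({0: q * q, 1: 2 * p * q, 2: p * p}[g])
    return total


def assign_species(alt_counts: List[Optional[int]], panel: Panel) -> Dict[str, float]:
    """Posterior probability of each candidate species under a uniform prior."""
    log_liks = {sp: genotype_log_likelihood(alt_counts, freqs)
                for sp, freqs in panel.items()}
    best = max(log_liks.values())           # log-sum-exp shift for numerical stability
    weights = {sp: math.exp(ll - best) for sp, ll in log_liks.items()}
    norm = sum(weights.values())
    return {sp: w / norm for sp, w in weights.items()}
```

For example, `assign_species([2, 0, 2], {"species_A": [0.9, 0.1, 0.8], "species_B": [0.1, 0.9, 0.2]})` gives species_A a posterior probability above 0.99. Working in log space with the log-sum-exp shift keeps the calculation stable when a panel contains many markers.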

Progress 09/01/23 to 01/15/25

Outputs
Target Audience: Researchers and technicians with limited bioinformatic experience who are conducting studies/surveys (often agricultural) that require insect diagnostics.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Nothing Reported
How have the results been disseminated to communities of interest? Nothing Reported
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? To support the bioinformatic pipelines, I learned about software management, namely workflows and software containerization. Workflow tools (I learned the Common Workflow Language, CWL) provide an infrastructure to set up and run a sequence of tasks, for example, importing sequence data and then calling genotypes. Software containerization creates a portable, isolated environment that holds the main programs and their dependencies, which makes it easier to run programs across different computing environments. I learned how to make Docker containers; Docker is a popular open-source platform for containerization.
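The idea behind a workflow, a sequence of tasks in which each step consumes the outputs of earlier steps, can be sketched in plain Python. This is not CWL, which describes steps declaratively in YAML or JSON documents; it is a minimal illustration, assuming steps are Python functions, that runs them in dependency order with the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The step names and toy functions are hypothetical stand-ins, not real genotype calling.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# A step is a function plus the names of the upstream steps whose outputs it takes.
Step = Tuple[Callable[..., Any], List[str]]


def run_workflow(steps: Dict[str, Step]) -> Dict[str, Any]:
    """Run every step after its dependencies and return all step outputs by name."""
    for name, (_, deps) in steps.items():
        missing = [d for d in deps if d not in steps]
        if missing:
            raise ValueError(f"step {name!r} depends on unknown steps {missing}")
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    results: Dict[str, Any] = {}
    # static_order() yields each step only after all of its predecessors and
    # raises graphlib.CycleError if the dependencies contain a cycle.
    for name in TopologicalSorter(graph).static_order():
        func, deps = steps[name]
        results[name] = func(*(results[d] for d in deps))
    return results


# Hypothetical three-step pipeline of toy functions, listed out of order on purpose.
toy_steps: Dict[str, Step] = {
    "call_genotypes": (lambda reads: {"SNP1": reads.count("A")}, ["import_reads"]),
    "import_reads": (lambda: ["A", "C", "A"], []),
    "report": (lambda calls: f"SNP1 alt count: {calls['SNP1']}", ["call_genotypes"]),
}
```

Calling `run_workflow(toy_steps)` runs `import_reads`, then `call_genotypes`, then `report`, regardless of the order in which the steps were declared. A container adds the complementary piece: each step's program and its dependencies travel together, so the same workflow can run on different machines.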

Publications


    Progress 09/01/23 to 08/31/24

    Outputs
    Target Audience: Researchers monitoring invasive or pest insects for phytosanitary purposes.
    Changes/Problems: Nothing Reported
    What opportunities for training and professional development has the project provided? I developed teaching experience by creating and teaching an introductory bioinformatics workshop ("Bioinformatics, from Command Line to Genomes") at the University of Iceland. This week-long, bootcamp-style workshop on bioinformatics and data analysis in population genomics ran from May 13 to 16, 2024.
    How have the results been disseminated to communities of interest? A publication on diagnostic marker selection in the journal Bioinformatics, and publicly available code on GitHub.
    What do you plan to do during the next reporting period to accomplish the goals? For the first objective, I will develop the code for the API that allows different bioinformatic software and pipelines to interact. For the second objective, I will develop the GUI that allows users to interact with the software without command-line experience. The third objective is to document and demonstrate the utility of the software with publicly available tephritid data. This objective depends on the first two; however, I should be able to start on it while the other two are in progress.

    Impacts
    What was accomplished under these goals? The first objective is software development: an analysis pipeline with a universal application programming interface (API) that allows the pipeline programs to share data. I am learning the necessary Python programming skills and have researched bioinformatic programs for the pipeline. Because this project is intended for non-model systems and emerging pests, I also considered programs that streamline the development of new diagnostic panels. To that end, I co-authored an R package, snpAIMeR (https://github.com/OksanaVe/snpAIMeR), that helps select panel markers from a set of candidate markers; the paper describing it was published in the journal Bioinformatics. The second objective addresses software accessibility: a graphical user interface (GUI) for point-and-click analysis, so that users do not need to know the command line. Here, I am learning the necessary Python programming, including the Tkinter library for creating GUI applications and the Bokeh library for interactive data visualization.
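One way to let separate pipeline programs share data, as the API described above is meant to do, is to agree on a single data structure with a plain-text serialization that any program can read or write. The following Python sketch assumes a hypothetical `GenotypeTable` class holding diploid alternate-allele counts per sample and marker, written to and read from tab-separated text; it illustrates the design idea and is not the project's actual API.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GenotypeTable:
    """Hypothetical shared structure: alternate-allele counts (0, 1, 2; None = missing)
    for each sample at each marker."""
    markers: List[str]
    calls: Dict[str, List[Optional[int]]] = field(default_factory=dict)

    def add_sample(self, sample: str, genotypes: List[Optional[int]]) -> None:
        # Reject rows that do not line up with the marker list.
        if len(genotypes) != len(self.markers):
            raise ValueError(f"{sample}: expected {len(self.markers)} genotypes")
        self.calls[sample] = genotypes

    def to_tsv(self) -> str:
        """Serialize to tab-separated text; missing calls are written as NA."""
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample", *self.markers])
        for sample, gts in self.calls.items():
            writer.writerow([sample, *("NA" if g is None else g for g in gts)])
        return buf.getvalue()

    @classmethod
    def from_tsv(cls, text: str) -> "GenotypeTable":
        """Parse text produced by to_tsv (or by any other tool using the same layout)."""
        rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
        table = cls(markers=rows[0][1:])
        for row in rows[1:]:
            table.add_sample(row[0], [None if v == "NA" else int(v) for v in row[1:]])
        return table
```

A genotyping step could call `to_tsv()` and write the result to a file, and a downstream assignment step, in Python or another language such as R, could read that file back; `GenotypeTable.from_tsv(table.to_tsv())` reproduces an equal table.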

    Publications

    • Type: Journal Articles Status: Published Year Published: 2024 Citation: Vertacnik, K. L., Vernygora, O. V., & Dupuis, J. R. (2024). snpAIMeR: R package for evaluating ancestry informative marker contributions in non-model population diagnostics. Bioinformatics, btae377.