DSFAS PARTNERSHIP: High-Dimensional Phenotype Data Management and Analysis Infrastructure for Plant Breeding

Recipient Organization
CLEMSON UNIVERSITY
(N/A)
CLEMSON, SC 29634

Performing Department
(N/A)

Non Technical Summary
Plant high-dimensional phenotypes (HDPs) are an increasingly valuable tool for understanding biological function and informing plant breeding selection decisions. Also known as omics data, HDPs encompass a wide range of data types, including spectral, metabolomic, transcriptomic, proteomic, and microbiome metagenomic data. To further the utilization of HDPs in plant breeding programs, these data must be clearly labeled, easily accessible, and fully integrated with other data types. Our interdisciplinary team of plant scientists and software developers has extensive experience building user-friendly, open-source data management and analysis tools that have been widely adopted by plant breeders and geneticists. This project focuses on adding support for HDPs to these tools through the following objectives:
A. Generate appropriate data models for HDPs that accurately represent data structure and metadata for plant breeding and genetics use cases;
B. Develop Breeding Application Programming Interface (BrAPI) standards to efficiently handle each HDP data type;
C. Design and implement HDP storage structures in a BrAPI-enabled breeding database;
D. Integrate HDP BrAPI endpoints into widely used plant breeding data collection tools;
E. Develop HDP BrAPI-enabled analysis applications (BrAPPs) that integrate omics and other plant breeding data types.
The tools developed in this project will provide the digital ecosystem necessary to handle and integrate HDP data with other plant breeding data types, ultimately accelerating the development of improved crop varieties for food, fiber, and fuel. This proposed project addresses two AFRI Program Priority Areas: "Plant Health and Production and Plant Products" and "Agriculture Systems and Technology."

Animal Health Component

(N/A)

Research Effort Categories

Basic

(N/A)

Applied

(N/A)

Developmental

100%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
204	2410	1081	100%

Knowledge Area
204 - Plant Product Quality and Utility (Preharvest);

Subject Of Investigation
2410 - Cross-commodity research--multiple crops;

Field Of Science
1081 - Breeding;

Keywords

api

data management

high dimensional phenotypes

plant breeding

Goals / Objectives
The overall aim of this project is to increase the accessibility of high-dimensional phenotypes (HDPs) for plant breeding programs through the development of tools for HDP data collection, transfer, storage, and analysis. Specific objectives include:
A. Generate appropriate data models for HDPs (spectral, transcriptomic, metabolomic, proteomic, and microbiome data) that accurately represent data structure and associated metadata for plant breeding and genetics use cases.
B. Develop Plant Breeding API (BrAPI) standards to efficiently handle HDPs based on the data models for each data type.
C. Design and implement HDP storage structures in BrAPI-enabled breeding databases.
D. Integrate HDP BrAPI calls into widely used plant breeding data collection tools.
E. Develop HDP BrAPI-enabled analysis applications (BrAPPs) that integrate omics and other plant breeding data types.

Project Methods
OBJECTIVE A. The Plant Breeding API (BrAPI) and Breedbase require well-defined data structures and comprehensive, descriptive metadata for the transfer, storage, and analysis of phenotypic data. We have developed a generic high-dimensional phenotype (HDP) data framework that captures the necessary metadata for HDP experiments. This framework has already been used to generate data models for near-infrared spectra, metabolomics, and transcriptomics HDP data types. In this objective, we will consult with domain experts to map domain-specific metadata terms onto our generic data model for the remaining HDP data types, proteomics and metagenomics. These data models will be referenced in each of the subsequent objectives. We define HDP data models according to the following levels: field trial, sampling trial, protocol, phenotype annotation, and data matrix.

This objective will be co-led by PD Hershberger and co-PI Jannink, with support from the entire project team. The project team has been meeting weekly for the past two years to discuss HDP data models and their translation into BrAPI standards and data storage systems. Hershberger and Jannink will lead these regular group meetings and coordinate discussions with domain experts to further define specific HDP data models.

OBJECTIVE B. The BrAPI specification describes a standard RESTful (Representational State Transfer) web service API for breeding data. We have developed data models and endpoints for describing NIRS, metabolomics, and transcriptomics data structures using a shared design pattern, with specific details defined for metadata unique to each data type. This design pattern uses well-defined data models that follow the level concepts described above and defines a set of standard RESTful endpoints for each HDP data type.
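As an illustration only, the five data model levels named above could be represented as linked records. The sketch below is a minimal, hypothetical layout: every class, field name, and the `subset` helper are assumptions made for this example and are not taken from the BrAPI specification or from Breedbase.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FieldTrial:
    trial_id: str
    name: str


@dataclass
class SamplingTrial:
    sampling_id: str
    field_trial_id: str  # hypothetical link back to the parent field trial


@dataclass
class Protocol:
    protocol_id: str
    data_type: str  # e.g. "nirs", "metabolomics", "transcriptomics"
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class PhenotypeAnnotation:
    annotation_id: str
    label: str  # e.g. a wavelength, metabolite, or transcript name


@dataclass
class DataMatrix:
    sampling_id: str
    protocol_id: str
    sample_ids: List[str]
    annotation_ids: List[str]
    values: List[List[float]]  # rows = samples, columns = annotations

    def __post_init__(self) -> None:
        # Reject matrices whose shape disagrees with the row/column labels.
        if len(self.values) != len(self.sample_ids):
            raise ValueError("one row per sample is required")
        for row in self.values:
            if len(row) != len(self.annotation_ids):
                raise ValueError("one column per annotation is required")

    def subset(self, samples: List[str], annotations: List[str]) -> "DataMatrix":
        """Return the requested samples and annotations, in the order given."""
        rows = [self.sample_ids.index(s) for s in samples]
        cols = [self.annotation_ids.index(a) for a in annotations]
        return DataMatrix(
            self.sampling_id,
            self.protocol_id,
            list(samples),
            list(annotations),
            [[self.values[r][c] for c in cols] for r in rows],
        )
```

Keeping the matrix separate from the protocol and annotation records means the descriptive metadata can be handled without loading the numeric values, which are typically the bulk of an HDP data set.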
This set of endpoints allows a client application to search for and retrieve metadata from the Protocol and Phenotype Annotation levels, as well as retrieve all or a subset of a data matrix.

Development of the proteomic and metagenomic BrAPI specifications will follow the same process used for the existing HDP specifications. A developer will translate the data fields from the data models (Objective A) into technical specification documents. The technical specification documents use the OpenAPI 3 standard and are written in the YAML format. The developer will take the list of data fields and construct a JSON schema model for a Protocol-level object and a Phenotype Annotation-level object specific to the HDP data type. Endpoint definitions to search and retrieve the data will be added across the data types. The finished product will be a collection of OpenAPI 3 YAML files with all the endpoint and data model definitions that can be readily shared and displayed online with tools like Apiary and SwaggerHub.

We will test the specification thoroughly by evaluating the structure and content of the data retrieved by the API while it is being built, checking that all fields are populated and have the expected content.

This objective will be led by Key Personnel Selby with support from co-PD Mueller and co-PD Rife.

OBJECTIVE C. Breedbase provides a BrAPI-compliant digital ecosystem for plant breeding and genetics research. Support for spectral data upload and storage was recently added to Breedbase by our team through a collaboration between co-PD Mueller and PD Hershberger. In this objective, we will expand Breedbase HDP data storage capabilities for new data types and implement the new HDP BrAPI calls (Objective B) directly in the Breedbase platform.

Guided by the data models generated in Objective A, storage and upload features for metabolomic, transcriptomic, proteomic, and metagenomic data types will be added to Breedbase.
These new HDPs will be stored in JavaScript Object Notation (JSON) format in the Postgres database, following the previously developed example of the spectral data type. Storing JSON documents in database fields provides NoSQL-style document storage inside the relational database, giving it both SQL and NoSQL features. HDP JSON objects will be linked to other levels of data type-specific metadata and relevant objects in the database, as described by the data models generated in Objective A. HDP observations will be linked to general metadata, including the observation unit, operator, timestamp, and geolocation. This information is stored in a relational database schema based on the Chado Natural Diversity schema.

HDP BrAPI endpoints will be implemented directly in the underlying Breedbase code. Breedbase is built on Catalyst, an open-source web framework with excellent support for parsing complex RESTful URLs, which has enabled the robust implementation of BrAPI.

An updated Breedbase build is released approximately every two weeks. The development process relies heavily on the issue tracking, project management, automated build, and code release tools available from GitHub.

This objective will be led by co-PD Mueller with support from Key Personnel Selby.

OBJECTIVE D. The PhenoApps suite is a collection of open-source Android apps designed to be easily integrated into plant breeding workflows for data collection and management. Field Book, the flagship app in this suite, is a digital equivalent of the traditional paper field books used in plant breeding. Field Book supports the BrAPI v2.0 standard.
In this objective, we will integrate support for portable spectrometers into Field Book, along with the HDP BrAPI endpoints necessary to transmit spectral data to BrAPI-compliant databases.

This objective will be led by co-PD Rife with support from Key Personnel Selby.

OBJECTIVE E. BrAPI-enabled applications (BrAPPs) are standalone tools that pull data using BrAPI for analysis or visualization. They can run independently or be integrated into existing data analysis pipelines and BrAPI-compliant databases.

For this objective, we will develop BrAPPs that allow plant breeders to use HDPs more readily for selection and analysis, with a focus on helper tools, quality control (QC), and low-dimensional phenotype prediction.

To develop and deploy helper BrAPPs efficiently, we will first work toward adding HDP-specific BrAPI-based R functions to existing BrAPI R packages such as brapi and QBMS. Integrating these functions into publicly available packages will enable existing R-based tools to be extended to support BrAPI communication. We will develop a BrAPP to impute missing HDP phenotype observations and present the user with sample-by-phenotype cases in which observed and predicted values deviate beyond a statistically defined threshold. We will design a BrAPP that facilitates routine sample clustering, allowing users to select and visualize clusters in lower dimensions and to select, filter, or subset individual clusters for further analysis and investigation.

We will develop a BrAPP based on the Breedbase spectral prediction tool. The underlying R package, waves, is maintained by PD Hershberger. As part of this objective, the Breedbase spectral prediction tool will be refactored into a standalone, BrAPI-compliant R Shiny application that will be compatible with any BrAPI-compliant database.

Multi-omic prediction will be supported through a BrAPP that combines functionality from existing, non-BrAPI-compliant analysis tools.
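To make the QC BrAPP described earlier in this objective more concrete, the sketch below imputes missing cells and flags sample-by-phenotype cells that deviate from a predicted value. It is a deliberately simple stand-in: the column mean serves as the "predicted" value and the threshold is `k` standard deviations. The function name, the use of the column mean, and the default `k` are all assumptions made for this example; the project does not specify its imputation or prediction method.

```python
from statistics import mean, stdev
from typing import List, Optional, Tuple

Matrix = List[List[Optional[float]]]


def impute_and_flag(
    matrix: Matrix, k: float = 3.0
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Fill missing cells with the column mean and flag large deviations.

    Returns the imputed matrix and a list of (sample, phenotype) indices whose
    observed value lies more than k standard deviations from the column mean.
    """
    n_cols = len(matrix[0])
    imputed = [list(row) for row in matrix]
    flagged: List[Tuple[int, int]] = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        col_mean = mean(observed)
        # A single observation has no sample standard deviation; skip flagging.
        col_sd = stdev(observed) if len(observed) > 1 else 0.0
        for i, row in enumerate(matrix):
            if row[j] is None:
                imputed[i][j] = col_mean
            elif col_sd > 0 and abs(row[j] - col_mean) > k * col_sd:
                flagged.append((i, j))
    return imputed, flagged
```

A model-based predictor would replace `col_mean` with a per-cell prediction; the flagging logic would otherwise stay the same.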
Breedbase currently contains a genomic relationship matrix calculator that will be converted to a standalone, BrAPI-compliant R Shiny application and further extended to calculate relationship matrices from HDPs, enabling analyses complementary to traditional genomic BLUP, including phenomic prediction and transcriptomic prediction. A MegaLMM R package is available on GitHub under a permissive MIT license and will be integrated into an existing genomic prediction tool in Breedbase, which will be further updated to support BrAPI.

This objective will be led by PD Hershberger and Key Personnel Jannink and supported by co-PD Mueller.
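One common way to build a relationship matrix from HDP features, by analogy with a genomic relationship matrix computed from standardized markers, is to center and scale each feature and average the cross-products between samples. The sketch below shows that construction in plain Python. Whether the Breedbase calculator will use this particular scaling is an assumption, and the function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def relationship_matrix(features: List[List[float]]) -> List[List[float]]:
    """Compute K = Z Z^T / m from a samples-by-features matrix.

    Z holds the features after centering and scaling each column to unit
    (population) variance; m is the number of features that vary.
    """
    n = len(features)
    p = len(features[0])
    scaled_cols: List[List[float]] = []
    for j in range(p):
        col = [row[j] for row in features]
        mu, sd = mean(col), pstdev(col)
        if sd == 0:
            continue  # an invariant feature carries no relationship information
        scaled_cols.append([(x - mu) / sd for x in col])
    if not scaled_cols:
        raise ValueError("no feature varies across samples")
    m = len(scaled_cols)
    return [
        [sum(c[i] * c[k] for c in scaled_cols) / m for k in range(n)]
        for i in range(n)
    ]
```

With this scaling the diagonal of K averages 1 and each row sums to zero, so K can be used in place of a genomic relationship matrix in a BLUP-style model, for example with near-infrared spectra as the features for phenomic prediction.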