Source: CLEMSON UNIVERSITY submitted to NRP
DSFAS PARTNERSHIP: HIGH-DIMENSIONAL PHENOTYPE DATA MANAGEMENT AND ANALYSIS INFRASTRUCTURE FOR PLANT BREEDING
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1032341
Grant No.
2024-67021-42519
Cumulative Award Amt.
$725,925.00
Proposal No.
2023-11724
Multistate No.
(N/A)
Project Start Date
Sep 1, 2024
Project End Date
Aug 31, 2027
Grant Year
2024
Program Code
[A1541]- Food and Agriculture Cyberinformatics and Tools
Recipient Organization
CLEMSON UNIVERSITY
(N/A)
CLEMSON,SC 29634
Performing Department
(N/A)
Non Technical Summary
Plant high-dimensional phenotypes (HDPs) are an increasingly valuable tool for understanding biological function and informing plant breeding selection decisions. Also known as omics data, HDPs encompass a wide range of data types, including spectral, metabolomic, transcriptomic, proteomic, and microbiome metagenomic data. To further the utilization of HDPs in plant breeding programs, these data must be clearly labeled, easily accessible, and fully integrated with other data types. Our interdisciplinary team of plant scientists and software developers has extensive experience building user-friendly, open-source data management and analysis tools that have been widely adopted by plant breeders and geneticists. This project focuses on adding support for HDPs to these tools through the following objectives:
A. Generate appropriate data models for HDPs that accurately represent data structure and metadata for plant breeding and genetics use cases;
B. Develop Breeding Application Programming Interface (BrAPI) standards to efficiently handle each HDP data type;
C. Design and implement HDP storage structures in a BrAPI-enabled breeding database;
D. Integrate HDP BrAPI endpoints into widely used plant breeding data collection tools;
E. Develop HDP BrAPI-enabled analysis applications (BrAPPs) that integrate omics and other plant breeding data types.
The tools developed from this project will provide the digital ecosystem necessary to handle and integrate HDP data with other plant breeding data types, ultimately accelerating the development of improved crop varieties for food, fiber, and fuel. This proposed project addresses the AFRI Program Priority Areas of Plant Health and Production and Plant Products and Agriculture Systems and Technology.
Animal Health Component
(N/A)
Research Effort Categories
Basic
(N/A)
Applied
(N/A)
Developmental
100%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
204                    2410                              1081                      100%
Goals / Objectives
The overall aim of this project is to increase the accessibility of high-dimensional phenotypes (HDPs) for plant breeding programs through the development of tools for HDP data collection, transfer, storage, and analysis. Specific objectives include:
1. Generate appropriate data models for HDPs (spectral, transcriptomic, metabolomic, proteomic, and microbiome data) that accurately represent data structure and associated metadata for plant breeding and genetics use cases.
2. Develop Plant Breeding API (BrAPI) standards to efficiently handle HDPs based on the data models for each data type.
3. Design and implement HDP storage structures in BrAPI-enabled breeding databases.
4. Integrate HDP BrAPI calls into widely used plant breeding data collection tools.
5. Develop HDP BrAPI-enabled analysis applications (BrAPPs) that integrate omics and other plant breeding data types.
Project Methods
OBJECTIVE A. The Plant Breeding API (BrAPI) and Breedbase require well-defined data structures and comprehensive, descriptive metadata for the transfer, storage, and analysis of phenotypic data. We have developed a generic high-dimensional phenotype (HDP) data framework that captures the necessary metadata for HDP experiments. This framework has already been used to generate data models for near-infrared spectra, metabolomics, and transcriptomics HDP data types. In this objective, we will consult with domain experts to map domain-specific metadata terms onto our generic data model for the remaining HDP data types, proteomics and metagenomics. These data models will be referenced in each of the subsequent objectives. We define HDP data models according to the following levels: field trial, sampling trial, protocol, phenotype annotation, and data matrix.
This objective will be co-led by PD Hershberger and co-PI Jannink, with support from the entire project team. The project team has been meeting weekly for the past two years to discuss HDP data models and their translation into BrAPI standards and data storage systems. Hershberger and Jannink will lead these regular group meetings and coordinate discussions with domain experts to further define specific HDP data models.
OBJECTIVE B. The BrAPI specification describes a standard RESTful (Representational State Transfer) web service API for breeding data. We have developed data models and endpoints for describing near-infrared spectroscopy (NIRS), metabolomics, and transcriptomics data structures using a shared design pattern, with specific details defined for metadata unique to each data type. This design pattern uses well-defined data models that follow the level concepts described above and defines a set of standard RESTful endpoints for each HDP data type.
This set of endpoints allows a client application to search for and retrieve metadata from the protocol and phenotype annotation levels, as well as retrieve all or a subset of a data matrix.
The proteomic and metagenomic BrAPI specifications will be developed using the same process as the existing HDP specifications. A developer will translate the data fields from the data models (Objective A) into technical specification documents. The technical specification documents use the OpenAPI 3 standard and are written in the YAML format. The developer will take the list of data fields and construct a JSON schema model for a protocol-level object and a phenotype annotation-level object specific to the HDP data type. Endpoint definitions to search and retrieve the data will be added across the data types. The finished product will be a collection of OpenAPI 3 YAML files with all the endpoint and data model definitions that can be readily shared and displayed online with tools like Apiary and SwaggerHub.
We will test the specification thoroughly by evaluating the structure and content of the data being retrieved by the API while it is being built, checking to ensure all the fields are populated and have the expected content.
This objective will be led by Key Personnel Selby with support from co-PD Mueller and co-PD Rife.
OBJECTIVE C. Breedbase provides a BrAPI-compliant digital ecosystem for plant breeding and genetics research. Support for spectral data upload and storage was recently added to Breedbase by our team through a collaboration between co-PD Mueller and PD Hershberger. In this objective, we will expand Breedbase HDP data storage capabilities for new data types and implement the new HDP BrAPI calls (Objective B) directly into the Breedbase platform.
Guided by the data models generated in Objective A, storage and upload features for metabolomic, transcriptomic, proteomic, and metagenomic data types will be added to Breedbase.
These new HDPs will be stored in a special JavaScript Object Notation (JSON) format in the Postgres database following the previously developed example of the spectral data type. Postgres supports JSON-typed fields, so document-style (NoSQL) data can be stored inside an otherwise relational database, creating a database with both SQL and NoSQL features. HDP JSON objects will be linked to other levels of data type-specific metadata and relevant objects in the database, as described by the data models generated in Objective A. HDP observations will be linked to general metadata, including the observation unit, operator, timestamp, and geolocation. This information is stored in a relational database schema based on the Chado Natural Diversity schema.
HDP BrAPI endpoints will be implemented directly into the underlying Breedbase code. Breedbase is built on Catalyst, an open-source web framework with excellent support for parsing complex RESTful URLs that has enabled the robust implementation of BrAPI.
An updated Breedbase build is released nearly every two weeks. The development process relies heavily on the issue tracking, project management, automated build, and code release tools available from GitHub.
This objective will be led by co-PD Mueller with support from Key Personnel Selby.
OBJECTIVE D. The PhenoApps suite is a collection of open-source Android apps designed to be easily integrated into plant breeding workflows for data collection and management. Field Book, the flagship app in this suite, is a digital equivalent of traditional paper field books used in plant breeding. Field Book supports the BrAPI v2.0 standard.
In this objective, we will integrate support for portable spectrometers into Field Book along with the HDP BrAPI endpoints necessary to transmit spectral data to BrAPI-compliant databases.
This objective will be led by co-PD Rife with support from Key Personnel Selby.
OBJECTIVE E. BrAPI-enabled applications (BrAPPs) are standalone tools that pull data using BrAPI for analysis or visualization. They can run independently or easily integrate into existing data analysis pipelines and BrAPI-compliant databases.
For this objective, we will develop BrAPPs that allow plant breeders to utilize HDPs more readily for selection and analysis, with a focus on helper tools, quality control (QC), and low-dimensional phenotype prediction.
To develop and deploy helper BrAPPs efficiently, we will first work toward adding HDP-specific BrAPI-based R functions to existing BrAPI R packages such as brapi and QBMS. Integrating these functions into publicly available packages will enable existing R-based tools to be extended to support BrAPI communication. We will develop a BrAPP to impute missing HDP phenotype observations and present the user with sample-by-phenotype cases where observed and predicted values deviate by more than a statistically defined threshold. We will design a BrAPP that facilitates routine sample clustering, allowing users to select and visualize clusters in lower dimensions and select, filter, or subset individual clusters for further analysis and investigation.
We will develop a BrAPP based on the Breedbase spectral prediction tool. The underlying R package, waves, is maintained by PD Hershberger. As part of this objective, the Breedbase spectral prediction tool will be refactored into a standalone, BrAPI-compliant R Shiny application that will enable compatibility with any BrAPI-compliant database.
Multi-omic prediction will be supported through a BrAPP that combines functionality from existing, non-BrAPI-compliant analysis tools.
Breedbase currently contains a genomic relationship matrix calculator that will be converted to a standalone BrAPI-compliant R Shiny application and further extended to calculate relationship matrices from HDPs, allowing analyses complementary to traditional genomic BLUP, including phenomic prediction and transcriptomic prediction. A MegaLMM R package is available on GitHub with a permissive MIT license and will be integrated into an existing genomic prediction tool in Breedbase that will be further updated to support BrAPI.
This objective will be led by PD Hershberger and Key Personnel Jannink and supported by co-PD Mueller.
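As a rough illustration of what a relationship matrix calculated from HDPs could look like, the sketch below standardizes each feature (column) of a small sample-by-feature HDP matrix and averages the cross-products between samples, K = ZZ'/p. This is one common formulation and is an assumption made here for illustration; it is not taken from the Breedbase calculator or from MegaLMM. The function names and the example values are hypothetical.

```python
from statistics import mean, pstdev


def standardize_columns(matrix):
    """Center each feature column to mean 0 and scale it to unit population SD.

    Constant columns (SD of 0) are dropped because they cannot be scaled
    and carry no information about differences between samples.
    """
    scaled_columns = []
    for column in zip(*matrix):
        mu = mean(column)
        sd = pstdev(column)
        if sd == 0:
            continue
        scaled_columns.append([(value - mu) / sd for value in column])
    # Transpose back to one row per sample.
    return [list(row) for row in zip(*scaled_columns)]


def relationship_matrix(matrix):
    """Return K = Z Z' / p, where Z is the standardized sample-by-feature matrix."""
    z = standardize_columns(matrix)
    if not z:
        raise ValueError("no variable features available to compute relationships")
    n_samples = len(z)
    n_features = len(z[0])
    return [
        [
            sum(a * b for a, b in zip(z[i], z[j])) / n_features
            for j in range(n_samples)
        ]
        for i in range(n_samples)
    ]


# Hypothetical example: three samples measured at four spectral wavelengths.
spectra = [
    [0.10, 0.20, 0.30, 0.25],
    [0.12, 0.22, 0.29, 0.27],
    [0.30, 0.10, 0.50, 0.20],
]

for row in relationship_matrix(spectra):
    print([round(value, 3) for value in row])
```

Because every retained feature is scaled to unit variance, the diagonal of K averages to 1, which puts it on a scale comparable to a genomic relationship matrix built the same way from marker data.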

Progress 09/01/24 to 08/31/25

Outputs
Target Audience: The target audience during this reporting period was high-dimensional plant phenotype domain experts, mainly university faculty working adjacent to the plant breeding space. As the project continues, the target audience will broaden to include plant breeders working with a wide range of crops.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided?
Project personnel participated in several training and professional development activities during the reporting period. Internal training occurred through expert consultations on metabolomics and microbiome data models, which provided feedback that directly shaped project outputs while building cross-disciplinary knowledge between plant and computer scientists. Professional development opportunities supported technical skill development in high-dimensional phenotype (HDP) data management and integration, as well as broader growth in plant breeding and informatics. Co-PI Lukas Mueller, collaborator Peter Selby, and team members Chaney Courtney and Ben Maza attended the 2025 BrAPI Hackathon in Los Baños, Philippines, where Selby introduced the project and Maza presented on BrAPI use with HDPs. The hackathon fostered collaborative software development and offered professional visibility within the global BrAPI community. Additional professional development occurred through conference participation. PD Jenna Hershberger delivered an invited oral presentation at the 2025 National Association of Plant Breeders (NAPB) Annual Meeting. Postdoc McKena Wilson presented posters on phenotypic data infrastructure at the Clemson University Postdoctoral Research Symposium and the NAPB Annual Meeting. Maza also gave an invited seminar at the Boyce Thompson Institute, strengthening communication skills with diverse audiences.
Collectively, these activities advanced technical expertise, communication, and professional networks for the early-career scientists on the project.
Hershberger, J. (2025, May 21). Building phenomics capacity in a new public vegetable breeding program [Oral Presentation]. 2025 National Association of Plant Breeders Annual Meeting, Kona, HI, USA.
Maza, B. (2025, June 2-6). Using BrAPI with High Dimensional Phenotypes [Oral Presentation]. BrAPI Hackathon, Los Baños, Philippines.
Maza, B. (2025, July 7). Helping Plant Breeders Manage Complex Data [Oral Presentation]. Boyce Thompson Institute, Ithaca, NY, USA.
Wilson, M., Courtney, C., Jannink, JL., Maza, B., Mueller, L., Powell, A., Rife, T., Selby, P., Hershberger, J. (2025, April 22). Managing the Complexity: Phenotypic Data Infrastructure for Next-Gen Plant Breeding [Poster]. Clemson University Postdoctoral Research Symposium, Clemson, SC, USA.
Wilson, M., Courtney, C., Jannink, JL., Maza, B., Mueller, L., Powell, A., Rife, T., Selby, P., Hershberger, J. (2025, May 19-23). Managing the Complexity: Phenotypic Data Infrastructure for Next-Gen Plant Breeding [Poster]. 2025 National Association of Plant Breeders Annual Meeting, Kona, HI, USA.
How have the results been disseminated to communities of interest?
Results and project activities were disseminated through Zoom calls with HDP domain experts and through invited and contributed presentations and posters (see professional development section above). At the 2025 NAPB Annual Meeting, PD Hershberger's invited talk highlighted project goals and early outcomes for the broader plant breeding community. Wilson's poster presentations at NAPB and Clemson's Postdoctoral Research Symposium further shared project progress with meeting attendees. International dissemination occurred through project participation in the 2025 BrAPI Hackathon in the Philippines.
By contributing project use cases and demonstrations, the team engaged directly with a global network of plant breeding software developers. Maza's invited seminar at the Boyce Thompson Institute also provided an opportunity to communicate project goals and progress to potential software users and researchers outside the immediate plant breeding community.
What do you plan to do during the next reporting period to accomplish the goals?
Objective 1: Generate appropriate data models for HDPs. We will develop a draft data model for proteomic data and review it with proteomics domain expert Dr. Aleksandra Skirycz (Michigan State University) to ensure it accurately represents data structure and metadata. Previously developed data models for metabolomic, transcriptomic, and microbiome metagenomic data will be reviewed with domain experts to incorporate feedback and ensure alignment with research workflows.
Objective 2: Develop BrAPI standards for HDPs. BrAPI calls for storing and retrieving HDP data within Breedbase will be refined, incorporating feedback from domain experts. This includes finalizing and testing the user interface for uploading and downloading metabolomic, transcriptomic, and microbiome data.
Objective 3: Design and implement HDP storage structures in BrAPI-enabled breeding databases. We will continue improving data management within Breedbase, enhancing user interface features for organizing and monitoring uploaded HDP datasets to ensure efficient storage and accessibility.
Objective 4: Integrate HDP BrAPI calls into widely used data collection tools. We will release an updated Field Book app with spectrometer integration features developed through this project and will update Field Book documentation to support adoption by users. We will begin work to support spectral data transfer from Field Book to Breedbase via BrAPI.
Objective 5: Develop HDP BrAPI-enabled analysis applications (BrAPPs). We will continue development of BrAPPs for HDP data analysis and visualization, including spectral prediction, filtering, clustering, principal component analysis, heatmaps, dataset creation, and plotting based on user-selected parameters. These tools will be informed by features in the existing (but non-BrAPI-enabled) Sol Genomics Expression Atlas and Breedbase spectral data analysis tools and by ongoing consultation with domain experts. Implementation of HDP support in the QBMS R package will also continue in collaboration with Khaled Al-Sham'aa (ICARDA).
In addition to addressing these specific objectives, the project team will continue to participate in professional development, training, and dissemination activities to support skill growth and community engagement.

Impacts
What was accomplished under these goals?
The overarching goal of this project is to increase the accessibility and usability of high-dimensional phenotypes (HDPs), including spectral, transcriptomic, metabolomic, proteomic, and microbiome data, for plant breeding programs by developing tools for their collection, transfer, storage, and analysis. Significant progress has been made toward this aim across several objectives during the reporting period.
To support Objective 1, we first developed introductory materials to share with domain experts in order to guide discussions and gather feedback. We then met with experts in plant metabolomics (Dr. Lauren Brzozowski, University of Kentucky, and Dr. Gaurav Moghe, Cornell University) and microbiome and metagenomics (Dr. Jason Wallace, University of Georgia) to review proposed metadata fields and formats. These consultations informed the development of draft data models for metabolomics and microbiome metagenomic data that capture both data structure and metadata required for plant breeding and genetics applications. In addition, we finalized the spectral and transcriptomic data models that had been drafted prior to the project start, ensuring that they were well aligned with domain-specific needs and workflows.
For Objective 2, we worked toward extending Plant Breeding API (BrAPI) standards to accommodate HDPs by translating the data models developed in Objective 1 into draft BrAPI standards. We have posted these draft standards online for use by other BrAPI developers at https://brapinewconceptpreview.docs.apiary.io/#/reference/high-dimensional-phenotypes.
In alignment with Objective 3, we implemented BrAPI calls for spectral, transcriptomic, and metabolomic data within Breedbase, enabling the upload, download, and retrieval of these data types.
This functionality was extended through several contributions to the Sol Genomics Network GitHub repository, with 8 pull requests referencing "NIRS" and 10 referencing "BrAPI" during the reporting period. For example, pull request 5491 (https://github.com/solgenomics/sgn/pull/5491) added a new spectral data section to the trial detail page, while pull request 5381 (https://github.com/solgenomics/sgn/pull/5381) created a landing page for the management of transcriptomic data.
Progress toward Objective 4 was achieved by integrating support for mobile spectrometers and colorimeters into the Field Book Android application, including implementation of a snowflake database structure to facilitate current and future device integration. Testing is underway to finalize these updates prior to release.
Finally, to advance Objective 5, a BrAPI-enabled application was developed in Breedbase to visualize and plot spectral data for individual samples within field trials, providing a proof of concept for analysis tools that integrate HDPs with other breeding data types. To support further BrAPI application (BrAPP) development, we are working to integrate HDP BrAPI calls into QBMS, an R package for BrAPI, with contributions planned in coordination with package developer Khaled Al-Sham'aa (ICARDA). We also consulted with Dr. Trupti Joshi (University of Missouri) to explore opportunities for incorporating BrAPI into her existing web-based HDP analysis tools, such as G2PDeep, SoyKB, and KBCommons.
Collectively, these accomplishments demonstrate strong progress toward developing robust, standardized frameworks for managing HDPs within plant breeding software ecosystems, while building broad community engagement to ensure that the tools are well aligned with domain-specific workflows and priorities. Progress to date is in line with the proposed project timeline.

Publications