Source: UNIVERSITY OF CALIFORNIA, DAVIS submitted to NRP
PLATFORMS AND METHODS FOR SHARING AND COLLABORATION ON AG2P USING PUBLIC AND CONFIDENTIAL DATA
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
ACTIVE
Funding Source
Reporting Frequency
Annual
Accession No.
1031416
Grant No.
2023-70412-41054
Cumulative Award Amt.
$1,174,382.00
Proposal No.
2023-06068
Multistate No.
(N/A)
Project Start Date
Sep 15, 2023
Project End Date
Sep 14, 2026
Grant Year
2023
Program Code
[AG2PI] - Agricultural Genome to Phenome Initiative
Recipient Organization
UNIVERSITY OF CALIFORNIA, DAVIS
410 MRAK HALL
DAVIS, CA 95616-8671
Performing Department
(N/A)
Non Technical Summary
Data sharing and collaboration are of increasing importance to enable validation, further research, and joint analysis of multiple data sets. However, these processes are often complicated or even prevented because public data may have limited visibility and accessibility, and private data often contain confidential or proprietary information, especially in the case of industry data. Our proposed research takes a tiered approach to facilitate access to and sharing of data for genomic and phenomic analyses.
Animal Health Component
50%
Research Effort Categories
Basic
40%
Applied
50%
Developmental
10%
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
303                    7310                              1081                      25%
304                    7310                              1081                      25%
201                    7310                              1081                      50%
Goals / Objectives
To effectively promote open science in agriculture while addressing confidentiality concerns, we advocate for the following multifaceted strategy: (1) fostering streamlined data sharing of public data, (2) innovating data sharing methods that protect confidentiality, and (3) enabling collaborative research without data sharing. With the long-term goal of enabling efficient and effective AG2P research and applications to advance livestock and crop production, these strategies form the first three specific aims of our proposal, with a strong integrated education component as the fourth aim.
Project Methods
Methods include developing datasets and resources to foster streamlined data sharing, using homomorphic encryption for confidentiality-preserving encrypted data sharing, employing federated and transfer learning for collaborative research without data sharing, and delivering educational resources.
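As a rough illustration of collaborative research without data sharing, the sketch below fits a ridge regression across three sites that exchange only aggregate summaries (X'X and X'y), never individual-level records. It is a minimal one-shot aggregation example, not the project's actual federated or transfer learning method: the function names `local_summaries` and `pooled_ridge`, the simulated genotype codes, and the penalty value are all assumptions made for illustration, and sharing such summaries is not by itself a confidentiality guarantee.

```python
import numpy as np


def local_summaries(X, y):
    """Computed privately at each site; only these aggregates leave the site."""
    return X.T @ X, X.T @ y


def pooled_ridge(summaries, lam):
    """Add up the per-site summaries and solve the ridge normal equations."""
    xtx = sum(s[0] for s in summaries)
    xty = sum(s[1] for s in summaries)
    p = xtx.shape[0]
    return np.linalg.solve(xtx + lam * np.eye(p), xty)


# Simulated data (hypothetical): three sites, 20 markers coded 0/1/2.
rng = np.random.default_rng(0)
p = 20
beta_true = rng.normal(size=p)
sites = []
for n in (60, 80, 100):
    X = rng.integers(0, 3, size=(n, p)).astype(float)
    y = X @ beta_true + rng.normal(scale=0.5, size=n)
    sites.append((X, y))

# Estimate obtained from shared summaries only.
beta_fed = pooled_ridge([local_summaries(X, y) for X, y in sites], lam=1.0)

# Reference: the same model fitted on the stacked raw data.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_central = np.linalg.solve(X_all.T @ X_all + 1.0 * np.eye(p), X_all.T @ y_all)

print(np.allclose(beta_fed, beta_central))
```

Because the ridge solution depends on the data only through X'X and X'y, the pooled-summary estimate matches the estimate from the combined raw data up to floating-point error.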

Progress 09/15/23 to 09/14/24

Outputs
Target Audience: Our audience includes undergraduate and graduate students, research scientists, plant and animal breeding professionals, and data-sharing stakeholders such as field-based plant and animal breeders.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? We organized two stakeholder advisory meetings to discuss our progress and proposed solutions with stakeholders, including editors from scientific journals and representatives from plant and animal genetics companies.
How have the results been disseminated to communities of interest? We organized two stakeholder advisory meetings to discuss our progress and proposed solutions with stakeholders, including editors from scientific journals and representatives from plant and animal genetics companies.
What do you plan to do during the next reporting period to accomplish the goals? We will continue improving the database. To this end, we will reach out to editors and colleagues, grant them access to the database, and seek their suggestions for improvements. Additionally, we will finalize and submit the manuscript currently in preparation and make the searchable database publicly available. Furthermore, we will continue to develop and validate the encryption methods and federated learning techniques.

Impacts
What was accomplished under these goals? We built the searchable database: we defined the metadata required to identify data sets and to enable searches, created the database infrastructure, developed a web interface, and deployed a beta version online. We identified 137 publicly available genomic data sets and populated the database with the metadata for these data sets. For each data set we also created a data loader (accessible through the searchable database) that can be used to load the data set into an R environment after downloading it. Thus, the searchable database is functional and populated with 137 data sets. We outlined a manuscript (to be submitted as a 'resource' paper) that will introduce this resource to the genomic selection community. In this manuscript we will present an application of the resource: benchmarking standard genomic prediction models on 43 selected data sets. The analyses will serve two purposes: (i) to illustrate possible uses of the searchable database, and (ii) to provide benchmarks that can be used in future methods and software research. To facilitate the use of these benchmarks, we will share a DOI for all the data sets used in the benchmarks, formatted and ready to be analyzed. We further developed our statistical methods for data encryption and federated learning, and validated these methods using real data.
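The report does not spell out the encryption scheme, so the following is only a sketch under an assumed stand-in: a random orthogonal transformation applied to genotypes and phenotypes. It shows why a linear model fitted to data transformed this way can return the same estimates as one fitted to the plaintext data. The simulated data, the variable names, and the use of ordinary least squares are assumptions for illustration and should not be read as the project's method or as a statement about its security properties.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 5

# Simulated plaintext data (hypothetical): genotypes coded 0/1/2 and a phenotype.
X = rng.integers(0, 3, size=(n, p)).astype(float)
y = X @ rng.normal(size=p) + rng.normal(scale=0.3, size=n)

# Secret key: a random n x n orthogonal matrix from the QR factorization
# of a Gaussian matrix, so that Q.T @ Q is the identity.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))

# The data owner shares only the transformed data.
X_enc = Q @ X
y_enc = Q @ y

# Least squares on plaintext and on transformed data.
beta_plain, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_enc, *_ = np.linalg.lstsq(X_enc, y_enc, rcond=None)

print(np.allclose(beta_plain, beta_enc))
```

The estimates agree because (QX)'(QX) = X'X and (QX)'(Qy) = X'y whenever Q is orthogonal, so the normal equations are unchanged even though the individual rows of the shared matrices no longer correspond to individual records.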

Publications

  • Type: Other Journal Articles Status: Published Year Published: 2024 Citation: Zhao, T., Wang, F., Mott, R., Dekkers, J., Cheng, H., 2024, Using encrypted genotypes and phenotypes for collaborative genomic analyses to maintain data confidentiality, Genetics
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2024 Citation: Donna Li, Tianjing Zhao, Jack C M Dekkers, Richard Mott, and Hao Cheng, Using Encrypted Genotypes and Phenotypes for Reproducible and Collaborative Genomic Analyses to Maintain Data Confidentiality, AGBT Ag, 2024
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2024 Citation: Gustavo de los Campos, Hao Cheng, Jack Dekkers, Juan Steibel, Regularization And Transfer Learning in Genomic Models Through Gradient Descent with Early Stopping, AGBT Ag, 2024
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2024 Citation: Blessing Olabosoye, Hao Cheng, Jack Dekkers, Juan Steibel, Gustavo de los Campos, Transfer learning and meta-analysis strategies for optimizing genomic prediction accuracy without data sharing, AGBT Ag, 2024
  • Type: Conference Papers and Presentations Status: Accepted Year Published: 2024 Citation: Hao Cheng, Jack Dekkers, Juan Steibel, Gustavo de los Campos, Platforms and Methods for Sharing and Collaboration on Agricultural Genome to Phenome Using Public and Confidential Data, AGBT Ag, 2024