FACT: Enabling Association Mapping And Landscape Genomics Through The Advanced Integration Of Genotype, Phenotype, And Geospatial Data

Sponsoring Institution

National Institute of Food and Agriculture

Project Status

COMPLETE

Funding Source

AFRI COMPETITIVE GRANT

Reporting Frequency

Annual

Accession No.

1019897

Grant No.

2019-67021-29920

Cumulative Award Amt.

$500,000.00

Proposal No.

2018-09223

Multistate No.

(N/A)

Project Start Date

Aug 1, 2019

Project End Date

Sep 30, 2023

Grant Year

2019

Program Code

[A1541]- Food and Agriculture Cyberinformatics and Tools

Recipient Organization
UNIV OF CONNECTICUT
438 WHITNEY RD EXTENSION UNIT 1133
STORRS,CT 06269

Performing Department
Environmental & Evol. Biology

Non Technical Summary
Funding agencies have made significant investments to mitigate losses and improve production of economically and ecologically important species, including support for population genetic studies. These require generation and integration of genotype, phenotype, and environmental data. The associated datasets must also be connected to analytical pipelines supported by high performance computing to contend with high throughput technologies. Currently, applications for the collection of this data and associated metadata are rare, and not universally adopted. Moreover, the outcomes of large-scale studies, revealing potential adaptive and causal variants, are often lost between genome versions and intermediate assemblies.
We will develop the first web-based application that integrates genotype-phenotype metadata and data for model and non-model plant systems in a geospatial context. The field-to-analysis framework will connect data collection, data submission, ontology-based metadata annotation, storage, and exchange directly to high performance computing for association/landscape genomics analysis to examine adaptive potential, genotypic diversity from wild accessions, impact of invasive species, and productivity of breeding populations. This application is being developed within the open-source Tripal framework, which represents a federation of over 25 plant and animal genomics/phenomics databases.
Plant health, productivity, and biogeographical response to environmental challenges have consequences beyond food and timber production: they affect the environment, spanning areas of biodiversity, carbon cycling, and planetary health. Providing access to high quality genotypic, phenotypic, and environmental data in a semantically aware, geospatial-enabled, web-based platform connected to high performance computing resources will enable interrogation, storage, and exchange of data from large-scale association studies.

Animal Health Component

10%

Research Effort Categories

Basic

40%

Applied

10%

Developmental

50%

Classification

Knowledge Area (KA)	Subject of Investigation (SOI)	Field of Science (FOS)	Percent
201	0430	1080	25%
202	0699	1081	25%
136	0699	1070	25%
123	0199	1060	25%

Knowledge Area
136 - Conservation of Biological Diversity; 202 - Plant Genetic Resources; 201 - Plant Genome, Genetics, and Genetic Mechanisms; 123 - Management and Sustainability of Forest Resources;

Subject Of Investigation
0430 - Climate; 0199 - Soil and land, general; 0699 - Trees, forests, and forest products, general;

Field Of Science
1060 - Biology (whole systems); 1070 - Ecology; 1081 - Breeding; 1080 - Genetics;

Keywords

Goals / Objectives
The objective of this proposal is to develop CartograTree, the first web-based application that integrates genotype-phenotype metadata and raw data for georeferenced individuals from model and non-model systems. The software will provide an interactive geospatial interface and access to high performance computing for real-time analysis of important biological questions. This framework will be implemented for tree databases but also developed as an independent Tripal module, CartograPlant.
Goal 1: Leverage Tripal3 enabled databases to build a FAIR model for data submission, storage, and validation across genotype, phenotype, and environmental data that can be applied broadly to plants.
Goal 2: Enable Web-based association and landscape genetics workflows that can interpret heterogeneous input data and provide a robust data filtering, visualization, and analytic framework via Tripal Galaxy.
Goal 3: Integrate diverse environmental layers to provide rapid access to high resolution data for georeferenced individuals that can be directly assessed for landscape genomics.
Goal 4: Train and educate researchers in landscape genomics via TreeSnap and CartograTree, as well as Tripal plant database providers with CartograPlant.

Project Methods
We will build a global capability for landscape genomics and association mapping by the following six steps:
1. Develop and test metadata collection, identifier assignment, and machine-actionable datasets with collaborator data for the Tripal Plant PopGen Submit module
2. Map the scientific protocols for AM and landscape genomics to informatic workflows
3. Develop new approaches to facilitate geospatial data discovery and integration
4. Integrate with TreeSnap to provide a field-to-data connection with CartograTree for scientists
5. Architect and implement the major components: Web-based, interactive geospatially-aware GWAS for forest trees

1. Tripal Plant PopGen Submit Pipeline
The current TGDR pipeline serves as a basic but not FAIR representation for data submission for forest tree species. The proposed Tripal Plant PopGen Submit (TPPS) pipeline will provide an interface for the collection of genotypic, phenotypic, and environmental data from any georeferenced (exact or regional) plant-focused experiment. Once the user is logged in to their Tripal database profile, TPPS will ask the user specific questions on the design of the study, starting with the data that was obtained (what combination of genotype, phenotype, and environmental data). The user will be asked to provide a text file to identify the species and georeferenced location of the trees and/or sites that were evaluated. This file will be validated as machine-readable before proceeding. Metadata on the analytical process will be recorded through directed questions relating to tools, statistical assumptions, and population/kinship analysis. Following successful submission, the researcher will be provided with a long-term TreeGenes accession number that is associated with a DOI generated via Zenodo.

2. Informatic components supporting Association Mapping and Landscape Genomics
Submission of genotype, phenotype, and environmental (G/P/E) data will take place via TPPS, which will be hosted at TreeGenes.
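The machine-readability check on the species and location file described for TPPS could be sketched as follows. This is only an illustration: the column names (`tree_id`, `species`, `latitude`, `longitude`), the tab delimiter, and the function name `validate_locations` are assumptions and are not taken from TPPS itself.

```python
import csv
import io

# Hypothetical column layout; the real TPPS template may differ.
REQUIRED = ("tree_id", "species", "latitude", "longitude")


def validate_locations(text):
    """Return a list of human-readable problems found in a tab-delimited file."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not (row["species"] or "").strip():
            problems.append(f"line {line_no}: empty species")
        try:
            lat = float(row["latitude"])
            lon = float(row["longitude"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: coordinates are not numeric")
            continue
        # Decimal-degree bounds for WGS84 latitude and longitude.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append(f"line {line_no}: coordinates out of range")
    return problems
```

An empty list would mean the file passed these basic checks; a real submission pipeline would likely also verify species names against a taxonomy, which is not attempted here.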
In addition to direct submissions, the source databases will import data from TreeSnap (georeferenced phenotype), Dryad (georeferenced genotype/phenotype), and TRY-DB (georeferenced phenotype). Any public user of CartograTree will be able to browse, query, and analyze the public datasets.
Genotypes: The majority of the analytical functionality will focus on SNP markers, the predominant marker type for fine-scale mapping. Upon selection of a set of individuals, users will view a summary of shared SNPs with statistics on missing data, allele frequencies, and previously identified associations. Metadata associated with the genotypes includes the sequencing technology, bioinformatic processes used to identify the polymorphism, physical/relative locations, and validation status. The metadata for a given association will include imputation techniques applied, analysis model used, covariates (kinship and population structure), and multiple-testing methods employed.
Phenotypes: High-throughput phenotyping (phenomics) as well as traditional phenotyping metrics are housed in a variety of different repositories and formats. CartograTree will focus on phenotypic data retrieved from three different derivative sources. For the purpose of this application, our goal is to leverage the growing set of repositories that are enabling a variety of reporting standards.
Analysis: After filters are applied, the user may access the active search or combine across several saved searches associated with their profile. The combined datasets, which may include any combination of genotype, phenotype, and environmental data, are pre-processed prior to upload for analysis on a high-performance computing (HPC) cluster. The location of the Galaxy run will be transparent to the user, but the design is optimized to resolve significant loads on the local UCONN clusters.
Since TPPS-submitted datasets contain information on the study design (common garden, landscape, etc.), the workflows will prompt the user with recommended processes based upon the metadata available. Once the analysis is complete, CartograTree will download the results from the Galaxy instance into the user's profile and convert them to a vector layer format, which the user can display on the map as an overlay.

3. Develop new approaches to facilitate geospatial data discovery and integration
In CartograTree, the Mapnik/Renderd/Mod_tile framework will allow tiling of OSM, and GeoServer with GeoWebCache will allow tiling of the other layers (both raster and vector formats) to optimize performance. Different functionality is provided by GeoServer based on the type of layer. In addition, GeoServer enables the cross-query of the data in vector layers through the Extended Common Query Language (ECQL). We will extend the functionality of GeoServer to allow querying raster layers as well.

4. Architect and implement the major components: Web-based, interactive geospatially-aware GWAS for forest trees
(1) The user experience (UX) is an immersive full browser window view of the geographical mapping area. Client-side architecture is a SPA (Single Page Application), whereby page elements are updated continuously and asynchronously. The user does not navigate over multiple pages, but engages all interactions on the opening page. The SPA is enabled by a JavaScript MVC (Model-View-Controller) framework (2) using jQuery, Backbone, and Bootstrap for scaffolding and user interface (UI) components, and MapBox GL JS and other libraries for map tiling interaction. Industry-standard HTML5, CSS3, and JavaScript AJAX technologies are supported by all major browsers. The client-side MVC framework communicates directly with tiling servers for tiling maps (3).
Custom user layers will be pre-processed into mapping tiles and hosted along with open source OpenStreetMap tiles on either our own open source GeoServer tiling server or on MapBox. User layers are available for geo-queries (4). For CartograTree persistence, we will use a NoSQL JSON data store such as MongoDB, specifically for high density genotype and environmental data. This acts as an interface with backend source data, yielding performance benefits that are requisite for responsive Web applications. GWAS data is derived from TreeGenes and HWG submissions via TPPS, and pre-processed for Web application querying and delivery (5). Lastly, best practice is to segregate Web servers and analytic servers (6). Specific workflows around PLINK will be developed with support from Galaxy workflows and the Tripal Gateway. The workflow results are stored locally in the user's profile and pushed out to CyVerse, via a connected profile, for longer term storage.

5. Advance and integrate TreeSnap, a citizen science and phenotyping mobile platform
TreeSnap, a forest health mobile application developed by Co-PI Staton, aims to connect the public with ongoing tree research programs. TreeSnap uses the ubiquity of smartphones to engage the public in scouting for trees affected by invasive insects and diseases. The TreeSnap mobile application, available for free on iOS and Android, has an intuitive interface, allowing users to take photos, collect GPS coordinates, and report on tree features. This proposal will support an established connection (API) between TreeSnap and CartograTree, to provide a single interface with access to the geolocated data from both sources.
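To make the map tiling discussed in this section concrete, the following minimal sketch converts a georeferenced individual's coordinates into tile indices. It assumes the widely used OpenStreetMap XYZ ("slippy map") tiling scheme in Web Mercator; whether the project's tile servers are configured with exactly this grid is an assumption, and the function name `latlon_to_tile` is hypothetical.

```python
import math

MAX_LAT = 85.05112878  # latitude limit of the Web Mercator projection


def latlon_to_tile(lat, lon, zoom):
    """Convert WGS84 decimal degrees to XYZ tile indices at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    lat = max(-MAX_LAT, min(MAX_LAT, lat))
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or the southern latitude limit stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

A client would request the tile at `zoom/x/y` from the tiling server for each visible index. Row 0 is the northernmost row in this scheme, which is why the latitude term is subtracted from 1.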

Progress 08/01/19 to 09/30/23

Outputs
Target Audience: The CartograPlant project focused on serving the needs of researchers in academia, government, and NGOs interested in integrated assessments of plant populations through well-connected genotype, phenotype, and environmental metrics in a georeferenced framework. The software also provided opportunities for land owners and citizen scientists to contribute data through TreeSnap. This project enabled expansion of the TreeSnap mobile utility with regard to metadata collection and customization for researcher-led collection projects. The CartograPlant project connected with the larger database community through its implementation as a Tripal (tripal.info) module. In addition, the HPC capacity and analytic workflows were developed through the open-source bioinformatic framework Galaxy (https://usegalaxy.org/). There was also directed effort by our postdoc scholar lead, Dr. Irene Cobo-Simon, and other members of the team to engage the community. Specifically, we hosted a total of five training sessions (through AG2PI, Botany, Plant and Animal Genome, and two direct offerings). In total, these trained over 187 end users. Dr. Cobo-Simon and our lead database developer, Emily Grau, also served on the genotype-phenotype committee of the AgBioData Working Groups, which met weekly to develop best practices for data; the audience here includes other databases hosting species of agricultural interest. CartograPlant was presented at a total of 11 different conferences/meetings. Three unique competitions were run via social media (primarily on Twitter/LinkedIn) to encourage the submission of peer reviewed data sets with appropriate metadata through the FAIR workflow known as the Tripal Plant PopGen Submit pipeline. We also expanded our biocuration team and its efforts to include the curation of environmental layers, public population genetic studies, reference genomes/annotations, and manual curation of trait data.
The biocuration training model included over 10 undergraduates over the duration of the project. The students work together as a coordinated team with a leader who trains new students on biocuration best practices. Finally, we supported several different teams: the World Forest ID organization that tracks illegal logging, the US Forest Service, two different large-scale JGI projects hosting forest tree data, and researchers working in horticultural systems to improve representation of the environmental layers.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The final year of the project provided mentorship for a team of four undergraduate students who served on the biocuration team. As part of this mission, they learned how to read scientific literature in the field of population genetics and landscape genomics. They coordinated with the authors to retrieve details and files not available with the studies. They documented progress and announced releases of studies once fully integrated into CartograPlant. The biocuration team was also involved in social media outreach on the project through the use of Twitter and relevant e-mail lists. They also assisted with two competitions that were run during the time of plant-based conferences (SFTIC and WFGA this past summer). In addition, Dr. Herndon (co-PI) mentored two undergraduates from the Computer Science department who focused on developing interactive visualizations in D3 that can be integrated into the analytic frameworks housed in Galaxy. In Co-PI Dr. Staton's laboratory, the new technical developer for TreeSnap was trained in the development of high quality software documentation.
How have the results been disseminated to communities of interest? In the final year of the project, we focused on community training through virtual workshops that would be accessible to users everywhere.
We hosted these demonstrations and workshops through the USDA-funded AG2PI network, as well as the AgBioData consortium, the SFTIC annual meeting, and two independent virtual offerings. Training included walkthroughs of both association mapping and landscape genomics workflows, including data filtering. We also hosted a total of two contests for data submission (study submission) that the biocuration team coordinated. We also presented this work at three different conferences (Plant and Animal Genome, SFTIC, The Nature Conservancy, and WFGA). We are also involved in two other USDA-NIFA funded initiatives, AG2PI and the AgBioData community, with a focus on genotype/phenotype FAIR data standards. Finally, we continued our support of the USDA efforts for World Forest ID with application of CartograPlant to track illegal logging.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? In the final year, the team continued their focus on workflow development for CartograPlant. Workflow analysis focused on the ability to re-map SNP locations onto new releases of the genome annotation, integrate across GBS-style studies and array-based investigations, and generate population structure maps in a semi-automated manner. We developed new interactive visualization methods for SNP imputation to assist researchers in determining optimal thresholds. We also developed new visualizations for environmental data, including improvements in the intersections across map layers. We optimized the environmental metrics as well to speed the downstream analytic workflow for landscape genomics. We also completed a full re-design of the front page menus based on feedback received from the advisory committee. Training for the community included two dedicated virtual workshops that were broadly advertised to the research community. These were led by the primary postdoc developer and lead database administrator. We also trained members of the Nature Conservancy, Timber Tracking networks, and student researchers.

Publications

Type: Conference Papers and Presentations Status: Accepted Year Published: 2023 Citation: CartograPlant: Cyberinfrastructure to improve plant health and productivity in the context of a changing climate. SFTIC Conference, June 2023
Type: Conference Papers and Presentations Status: Accepted Year Published: 2023 Citation: Integrated Platform for Association Mapping and Landscape Genomics. Plant and Animal Genome, January 2023
Type: Conference Papers and Presentations Status: Accepted Year Published: 2022 Citation: Computational tools and infrastructure to improve forest health, Virtual Presentation for the IUFRO Tree Biotechnology Meeting, Harbin, China, July 2022
Type: Conference Papers and Presentations Status: Accepted Year Published: 2023 Citation: Transforming forest health through genomics, Virtual Presentation for the Atlanta Botanical Garden, February 2023.
Type: Conference Papers and Presentations Status: Accepted Year Published: 2023 Citation: Seeing the Forest for the Trees: Bioinformatics Solutions to Conserve Biodiversity, Women in Bioinformatics Keynote, New Haven, CT, November 2023

Progress 08/01/21 to 07/31/22

Outputs
Target Audience: In Year 3, the CartograPlant project continued to focus on connections with the Tripal (tripal.info) / Galaxy (https://usegalaxy.org/) community to facilitate workflow development. There was also directed effort by our postdoc scholar lead, Dr. Irene Cobo-Simon, to engage the community. Specifically, we hosted a virtual training session in June 2022 that had a total of 40 attendees. Dr. Cobo-Simon also gave a virtual presentation for Botany 2022 (a hybrid conference in July 2022). She also served on the genotype-phenotype committee of the AgBioData Working Groups, which met weekly to develop best practices for data; the audience here includes other databases hosting species of agricultural interest. Two unique competitions were run via social media (primarily on Twitter) to encourage the submission of peer reviewed data sets with appropriate metadata through the FAIR workflow known as the Tripal Plant PopGen Submit pipeline. We also expanded our curation team and its efforts to include the curation of environmental layers, public population genetic studies, and manual curation of trait data. We now employ a total of five undergraduates who work in a coordinated team with a leader who trains new students on biocuration best practices. Finally, we continued to interact with several different teams: the World Forest ID organization that tracks illegal logging, the US Forest Service, two different large-scale JGI projects hosting forest tree data, and researchers working in horticultural systems to improve representation of the environmental layers.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Most of the team continued from the second year into the third. The third year of the project provided mentorship for postdoc Dr. Irene Cobo-Simon as well as a team of six undergraduate students who served on the biocuration team.
As part of this mission, they learned how to read scientific literature in the field of population genetics and landscape genomics. They coordinated with the authors to retrieve details and files not available with the studies. They documented progress and announced releases of studies once fully integrated into CartograPlant. The biocuration team was also involved in social media outreach on the project through the use of Twitter and relevant e-mail lists. They also assisted with two competitions that were run during the time of plant-based conferences (Botany and NAFGS this past summer). This past year, I enacted a new system of team-based feedback for the biocuration team so that they could provide direct feedback on data integrity aspects of the interface. In addition, Dr. Herndon (co-PI) mentored two undergraduates from the Computer Science department who focused on developing interactive visualizations in D3 that can be integrated into the analytic frameworks housed in Galaxy. In Co-PI Dr. Staton's laboratory, Abdullah Almasaeed is the lead technical developer for TreeSnap and has continued to provide support for new trait-based ontologies.
How have the results been disseminated to communities of interest? In Year 3, we focused on community training through virtual workshops that would be accessible to users everywhere. We hosted two of these workshops (about three hours in length each). All PIs participated, and the lead postdoc led most of each session. Training included walkthroughs of both association mapping and landscape genomics workflows, including data filtering. We also hosted a total of two contests for data submission (study submission) that the biocuration team coordinated. We also presented this work at three different conferences (Botany, NAFGS, and Plant and Animal Genome). We are also involved in two other USDA-NIFA funded initiatives, AG2PI and the AgBioData community, with a focus on genotype/phenotype FAIR data standards.
Finally, we continued our support of the USDA efforts for World Forest ID with application of CartograPlant to track illegal logging.
What do you plan to do during the next reporting period to accomplish the goals? During our final year of the project, we will host an in-person workshop at Plant and Animal Genome (January 2023). We will also publish two papers on CartograPlant to demonstrate its utility as a meta-analysis tool, with a specific focus on the ability to confirm genotype-phenotype associations across studies and identify new associations. We will provide a set of updated training materials through our project page Git and lead one additional virtual workshop. We will also work directly with our advisory board to review the workflows that are available in CartograPlant and seek feedback on the exposed parameters, visualizations, and overall usability.

Impacts
What was accomplished under these goals? In Year 3, the team continued their focus on workflow development for CartograPlant. Workflow analysis focused on the ability to re-map SNP locations onto new releases of the genome annotation, integrate across GBS-style studies and array-based investigations, and generate population structure maps in a semi-automated manner. We specifically focused on improving the process for population structure and have included an automated process that generates study-specific and species-specific structure maps after the upload of a new study. These layers are now available instantly to the end users. The second primary focus was visualization. Here, we developed new interactive visualization methods for SNP filtering that allowed more iterative adjustment to the datasets and allowed the user to change parameters after examining the impact in preliminary analysis. We also developed visualizations for environmental data, specifically correlation matrices to match what was available for phenotypes. We optimized the environmental metrics as well to speed the downstream analytic workflow for landscape genomics. Finally, the project focused on outreach to the community through two dedicated virtual workshops that were broadly advertised to the research community. These were led by the primary postdoc developer and ran for three hours each.
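As a rough illustration of the correlation matrices mentioned for environmental data, the sketch below computes pairwise Pearson correlations between environmental variables sampled at the same set of georeferenced sites. It is not the project's implementation; the function names and the example variable names in the usage note are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(layers):
    """layers maps a variable name to its values at the same ordered sites."""
    names = list(layers)
    return {a: {b: pearson(layers[a], layers[b]) for b in names} for a in names}
```

For example, `correlation_matrix({"mean_temp": [...], "annual_precip": [...]})` would return a nested dictionary that a heatmap view could render; strongly correlated layers are candidates for removal before a landscape genomics analysis.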

Publications

Type: Conference Papers and Presentations Status: Accepted Year Published: 2022 Citation: Integrated Platform for Association Mapping and Landscape Genomics. Plant and Animal Genome. January 2022 in the Tripal Workshop Session.
Type: Conference Papers and Presentations Status: Accepted Year Published: 2022 Citation: CartograPlant: Cyberinfrastructure for genotype, phenotype, and environment. North American Forest Genetics Society. June 2022.
Type: Conference Papers and Presentations Status: Accepted Year Published: 2022 Citation: TreeGenes and CartograPlant: Resources for Plant Conservation and Breeding. Botany 2022 Virtual Presentation. July 2022.
Type: Conference Papers and Presentations Status: Accepted Year Published: 2022 Citation: Cyberinfrastructure to improve forest health and productivity. Virtual Keynote Presentation for the Australasian Plant Breeding Conference, May 2022.
Type: Conference Papers and Presentations Status: Accepted Year Published: 2021 Citation: Integrated technologies to support forest tree biodiversity and conservation genomics, Virtual Presentation for Natural Resources Canada, Quebec City, Canada. December 2021.

Progress 08/01/20 to 07/31/21

Outputs
Target Audience: In Year 2, the CartograPlant project focused on connections with the Tripal (tripal.info) / Galaxy (https://usegalaxy.org/) community to facilitate workflow development. There was also directed effort by our postdoc scholar lead, Dr. Irene Cobo-Simon, to engage the community. Specifically, we hosted a workshop during the Botany 2021 meeting (July 2021) that had 32 attendees. We also participated in a Virtual Tripal Hackathon to work with other developers on challenges related to Tripal-Galaxy workflow implementation. We presented at the North American Forest Genetics Student Symposium Conference (May 2021). Three unique competitions were run via social media to encourage the submission of peer reviewed data sets with appropriate metadata through the FAIR workflow known as the Tripal Plant PopGen Submit pipeline. This enhanced the repositories of self-described plant population studies. Finally, we interacted with several different teams: the World Forest ID organization that tracks illegal logging, the US Forest Service, and researchers working in horticultural systems to improve representation of the environmental layers.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? Most of the team continued from the first year into the second. The second year of the CartograPlant project provided mentorship for postdoc Dr. Irene Cobo-Simon as well as a team of four undergraduate students who served on the biocuration team. As part of this mission, they learned how to read scientific literature in the field of population genetics and landscape genomics. They coordinated with the authors to retrieve details and files not available with the studies. They documented progress and announced releases of studies once fully integrated into CartograPlant. The biocuration team was also involved in social media outreach on the project through the use of Twitter and relevant e-mail lists.
They also assisted with three different competitions that were run during the time of plant-based conferences. In addition, Dr. Herndon (co-PI) mentored three undergraduates from the Computer Science department who focused on developing interactive visualizations in D3 that can be integrated into the analytic frameworks housed in Galaxy. In Co-PI Dr. Staton's laboratory, Abdullah Almasaeed is the lead technical developer for TreeSnap (https://treesnap.org/) and has added support for the ontologies that are being used as part of the MIAPPE standards in CartograPlant. He has also put together a secondary application, known as WildType, that is ideal for users to connect directly to the TPPS system to upload their data.
How have the results been disseminated to communities of interest? This year, we published a new review paper on Tripal which highlights the CartograPlant module. In addition, we interacted with the community through journal submission contests. A total of three contests were run during the time of virtual conferences, when communication on Twitter was frequent. We also interacted with the Tripal community via the first virtual Hackathon as well as with the Galaxy community through the community conference. The Botany 2021 workshop on Population Genetics and Landscape Genomics with CartograPlant was well attended by faculty, postdocs, and graduate students.
What do you plan to do during the next reporting period to accomplish the goals? The final year will provide an opportunity to publish two papers. One is in the final stages of writing for submission to Global Change Biology and describes the overall platform. The second will be a use case on the utility of meta-analysis using the three systems described previously. This final year will also see expansion in the type of Galaxy workflows available and how the users interact with them.
Finally, we will work with the community to test the utility of the UI as well as the efficiency of the FAIR data submission system in order to fine-tune these processes. This year will also be dedicated to outreach to journals to ensure their participation in encouraging FAIR data submission. We will also continue to work with organizations such as the USFS to ensure that other key resources, such as FIA, can be explored and integrated in CartograPlant.

Impacts
What was accomplished under these goals? In Year 2, the team focused on workflow development for CartograPlant. We worked with three different systems: Populus trichocarpa, Zea mays, and Vitis vinifera. In all cases, studies were loaded that had overlapping use cases (working at least partially from the same georeferenced population). Workflow analysis focused on the ability to re-map SNP locations onto new releases of the genome annotation, integrate across GBS-style studies and array-based investigations, and generate population structure maps in a semi-automated manner. The workflows for this portion are now available in the local Galaxy instance that supports CartograPlant. Users with accounts can present data associated with loaded studies to the available analytics. The workflows are also partially implemented with visualizations that ease the burden of filter selection on the users. Visualizations present overviews of the missing data by individual and by locus. Finally, the project focused on outreach to the community through a Botany workshop, the Tripal Hackathon, and the Galaxy Community Conference.
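The missing-data overviews by individual and by locus can be illustrated with a short sketch. This is an assumption-laden example rather than the project's code: the input layout (a dictionary of per-individual genotype calls), the `"NA"` missing-value code, and the function name `missing_rates` are all hypothetical.

```python
def missing_rates(genotypes, missing="NA"):
    """Compute the fraction of missing calls per individual and per locus.

    genotypes maps an individual ID to a list of calls, one per locus,
    with every individual genotyped at the same ordered loci.
    """
    ids = list(genotypes)
    n_loci = len(genotypes[ids[0]])
    by_individual = {
        i: sum(call == missing for call in genotypes[i]) / n_loci for i in ids
    }
    by_locus = [
        sum(genotypes[i][j] == missing for i in ids) / len(ids)
        for j in range(n_loci)
    ]
    return by_individual, by_locus
```

A filtering step could then drop individuals or loci whose rate exceeds a user-chosen threshold, which is the kind of decision such summaries are meant to support.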

Publications

Type: Journal Articles Status: Published Year Published: 2021 Citation: Staton, M., Cannon, E., Sanderson, L. A., Wegrzyn, J., Anderson, T., Buehler, S., ... & Ficklin, S. (2021). Tripal, a community update after 10 years of supporting open source, standards-based genetic, genomic and breeding databases. Briefings in bioinformatics, 22(6), bbab238.
Type: Conference Papers and Presentations Status: Accepted Year Published: 2021 Citation: Cyberinfrastructure for Plant Conservation and Breeding. Virtual Workshop for Botany 2021. July 2021.
Type: Conference Papers and Presentations Status: Accepted Year Published: 2021 Citation: Integrated Analysis for CartograPlant: The Galaxy of Plant Populations for Galaxy Community Conference 2021. June 2021.

Progress 08/01/19 to 07/31/20

Outputs
Target Audience: The target audience for the original CartograTree was the forest tree breeding community. As part of this proposal, we are extending the application to reach the plant breeding and conservation research community. During year one, the project team expanded the Tripal Plant PopGen Submit (TPPS) pipeline to accept data from a wider range of plant systems, which included updating the taxonomic definitions. This also required a new release to accommodate the new Minimal Information About a Plant Phenotyping Experiment (MIAPPE) standards. These updates were coordinated with end users, announced via our annual newsletter, and coordinated with associated journals. Our team also coordinated and integrated data from the Botanical Information and Ecology Network (BIEN). This collaboration enabled full integration of species range maps and occurrence data from a global repository of plants. The CartograPlant team formed a biocuration team of undergraduates that was led and trained by our most experienced undergraduate biocuration team member. As part of this effort, we developed a social media campaign via Twitter (TreeGenesDB) and also reached the community through conferences, including the Plant and Animal Genome (PAG) Conference. Finally, we organized the first scientific advisory meeting to communicate with our stakeholders on the utility of the existing environmental layers. We focused on loading new layers associated with the NSF-funded NEON project and layers associated with finer resolution of soil and land-use types.
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The first year of the CartograPlant project provided mentorship for the newly hired lead postdoc, Dr. Irene Cobo-Simon, as well as a team of four undergraduate students who served on the biocuration team. As part of this mission, they learned how to read scientific literature in the field of population genetics and landscape genomics.
They coordinated with the authors to retrieve details and files not available with the studies. They documented progress and announced releases of studies once fully integrated into CartograPlant. The biocuration team was also involved in social media outreach on the project through the use of Twitter and relevant e-mail lists. In addition, Dr. Herndon (co-PI) mentored three undergraduates from the Computer Science department who focused on developing interactive visualizations in D3 that can be integrated into the analytic frameworks housed in Galaxy. In Co-PI Dr. Staton's laboratory, Abdullah Almsaeed is the lead technical developer for the TreeSnap application (https://treesnap.org/), which was updated during this period.
How have the results been disseminated to communities of interest? Results have been disseminated through the public release of CartograPlant. Updates are communicated through Twitter, community mailing lists, and at the Plant and Animal Genome Conference (talk and computer demonstration). The project has already resulted in three publications highlighting CartograPlant and Galaxy-based analytics.
What do you plan to do during the next reporting period to accomplish the goals? The primary focus for the upcoming year will be on Goal 2 as well as continuing our efforts on Goal 4. Related to Goal 2, we will focus on developing robust workflows in Galaxy that are connected to intelligent analytics and interactive visualizations. Goal 4 will include more community-level training on the implemented workflows as well as feedback from the community on how to improve these.

Impacts
What was accomplished under these goals? The primary products from Year 1 include the fully public version of CartograPlant v2.0 (https://cartograplant.org/); the integration of 14 new environmental layers, with a specific focus on the NEON, soil, and human impact global layers; and the incorporation of full range maps for all plant species with associated data from the Botanical Information and Ecology Network (BIEN) database (https://bien.nceas.ucsb.edu/bien/). This release was also associated with a full update of the Tripal Plant PopGen Submit (TPPS - https://tpps.readthedocs.io/en/latest/intro.html) framework in our open-source database framework (Tripal v3.0), which now uses the new Minimal Information About a Plant Phenotyping Experiment (MIAPPE - https://www.miappe.org/) standards. This new TPPS release is available as a module for installation in any of the over 30 active Tripal databases (https://tripal.info/).

Publications

Type: Journal Articles Status: Published Year Published: 2020 Citation: Wegrzyn, Jill L., Taylor Falk, Emily Grau, Sean Buehler, Risharde Ramnath, and Nic Herndon. "Cyberinfrastructure and resources to enable an integrative approach to studying forest trees." Evolutionary Applications 13, no. 1 (2020): 228-241.
Type: Journal Articles Status: Published Year Published: 2019 Citation: Wegrzyn, Jill L., Margaret A. Staton, Nathaniel R. Street, Dorrie Main, Emily Grau, Nic Herndon, Sean Buehler et al. "Cyberinfrastructure to improve forest health and productivity: the role of tree databases in connecting genomes, phenomes, and the environment." Frontiers in plant science 10 (2019): 813.
Type: Journal Articles Status: Published Year Published: 2020 Citation: Spoor, Shawna, Connor Wytko, Brian Soto, Ming Chen, Abdullah Almsaeed, Bradford Condon, Nic Herndon et al. "Tripal and Galaxy: supporting reproducible scientific workflows for community biological databases." Database 2020 (2020).
Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: Wegrzyn, J.L. Integrating beyond genomics: Cyberinfrastructure for forest health and productivity. Virtual Presentation for the CAES, New Haven, Connecticut. August 2020.
Type: Conference Papers and Presentations Status: Published Year Published: 2019 Citation: Wegrzyn, J.L. Cyberinfrastructure for forest health: landscape genomics for the future. Ecological Institute at UNAM, Mexico City, Mexico. November 2019.
Type: Conference Papers and Presentations Status: Published Year Published: 2020 Citation: Grau E.S., Wegrzyn, J.L. Computer Demonstration: TreeGenes and CartograPlant. Plant and Animal Genome XXVIII, San Diego, California. January 2020.