Progress 08/01/19 to 09/30/23
Outputs Target Audience: The CartograPlant project focused on serving the needs of researchers in academia, government, and NGOs interested in integrated assessments of plant populations through well-connected genotype, phenotype, and environmental metrics in a georeferenced framework. The software also provided opportunities for landowners and citizen scientists to contribute data through TreeSnap. This project enabled expansion of the TreeSnap mobile application with regard to metadata collection and customization for researcher-led collection projects. The CartograPlant project connected with the larger database community through its implementation as a Tripal (tripal.info) module. In addition, the HPC capacity and analytic workflows were developed through the open-source bioinformatics framework Galaxy (https://usegalaxy.org/). Our postdoctoral scholar lead, Dr. Irene Cobo-Simon, and other members of the team also made a directed effort to engage the community. Specifically, we hosted a total of five training sessions (through AG2PI, Botany, Plant and Animal Genome, and two direct offerings), which together trained more than 187 end users. Irene, as well as our lead database developer, Emily Grau, also served on the genotype-phenotype committee of the AgBioData Working Groups, which met weekly to develop best practices for data; the audience here includes other databases hosting species of agricultural interest. CartograPlant was presented at a total of 11 different conferences/meetings. Three unique competitions were run via social media (primarily Twitter/LinkedIn) to encourage the submission of peer-reviewed data sets with appropriate metadata through the FAIR workflow known as the Tripal Plant PopGen Submit pipeline. We also expanded our biocuration team and broadened its efforts to include the curation of environmental layers, public population genetic studies, reference genomes/annotations, and manual curation of trait data. 
The biocuration training model included more than 10 undergraduates over the duration of the project. The students worked together as a coordinated team with a leader who trained new students on biocuration best practices. Finally, we supported several different teams to improve representation of the environmental layers: the World Forest ID organization that tracks illegal logging, the US Forest Service, two different large-scale JGI projects hosting forest tree data, and researchers working in horticultural systems. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? The final year of the project provided mentorship for a team of four undergraduate students who served on the biocuration team. As part of this mission, they learned how to read scientific literature in the field of population genetics and landscape genomics. They coordinated with authors to retrieve details and files not available with the published studies. They documented progress and announced releases of studies once fully integrated into CartograPlant. The biocuration team was also involved in social media outreach for the project through Twitter and relevant e-mail lists. They also assisted with two competitions that were run during plant-focused conferences (SFTIC and WFGA this past summer). In addition, Dr. Herndon (co-PI) mentored two undergraduates from the Computer Science department who focused on developing interactive visualizations in D3 that can be integrated into the analytic frameworks housed in Galaxy. In Co-PI Dr. Staton's laboratory, the new technical developer for TreeSnap was trained in the development of high-quality software documentation. How have the results been disseminated to communities of interest? In the final year of the project, we focused on community training through virtual workshops that would be accessible to users everywhere. We hosted these demonstrations and workshops through the USDA-funded AG2PI network, the AgBioData consortium, the SFTIC annual meeting, and two independent virtual offerings. Training included walkthroughs of both association mapping and landscape genomics workflows, including data filtering. We also hosted a total of two data (study) submission contests that the biocuration team coordinated. We also presented this work at Plant and Animal Genome, SFTIC, The Nature Conservancy, and WFGA. 
We are also involved in two other USDA-NIFA-funded initiatives, AG2PI and the AgBioData community, with a focus on genotype/phenotype FAIR data standards. Finally, we continued our support of the USDA efforts for World Forest ID with the application of CartograPlant to track illegal logging. What do you plan to do during the next reporting period to accomplish the goals?
Nothing Reported
Impacts What was accomplished under these goals?
In the final year, the team continued its focus on workflow development for CartograPlant. Workflow development focused on the ability to re-map SNP locations onto new releases of the genome annotation, integrate across GBS-style studies and array-based investigations, and generate population structure maps in a semi-automated manner. We developed new interactive visualization methods for SNP imputation to help researchers determine optimal thresholds. We also developed new visualizations for environmental data, including improvements in the intersections across map layers. We also optimized the environmental metrics to speed up the downstream analytic workflow for landscape genomics. We completed a full redesign of the front-page menus based on feedback received from the advisory committee. Training for the community included two dedicated virtual workshops that were broadly advertised to the research community. These were led by the primary postdoctoral developer and the lead database administrator. We also trained members of The Nature Conservancy, Timber Tracking networks, and student researchers.
Publications
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2023
Citation:
CartograPlant: Cyberinfrastructure to improve plant health and productivity in the context of a changing climate
SFTIC Conference, June 2023
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2023
Citation:
Integrated Platform for Association Mapping and Landscape Genomics. Plant and Animal Genome, January 2023
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2022
Citation:
Computational tools and infrastructure to improve forest health, Virtual Presentation for the IUFRO Tree Biotechnology Meeting, Harbin, China, July 2022
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2023
Citation:
Transforming forest health through genomics, Virtual Presentation for the Atlanta Botanical Garden, February 2023.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2023
Citation:
Seeing the Forest for the Trees: Bioinformatics Solutions to Conserve Biodiversity, Women in Bioinformatics Keynote, New Haven, CT, November 2023
|
Progress 08/01/21 to 07/31/22
Outputs Target Audience: In Year 3, the CartograPlant project continued to focus on connections with the Tripal (tripal.info) / Galaxy (https://usegalaxy.org/) community to facilitate workflow development. Our postdoctoral scholar lead, Dr. Irene Cobo-Simon, also made a directed effort to engage the community. Specifically, we hosted a virtual training session in June 2022 with a total of 40 attendees. Irene also gave a virtual presentation at Botany 2022 (a hybrid conference held in July 2022). Irene also served on the genotype-phenotype committee of the AgBioData Working Groups, which met weekly to develop best practices for data; the audience here includes other databases hosting species of agricultural interest. Two unique competitions were run via social media (primarily Twitter) to encourage the submission of peer-reviewed data sets with appropriate metadata through the FAIR workflow known as the Tripal Plant PopGen Submit pipeline. We also expanded our curation team and broadened its efforts to include the curation of environmental layers, public population genetic studies, and manual curation of trait data. We now employ a total of five undergraduates who work as a coordinated team with a leader who trains new students on biocuration best practices. Finally, we continued to interact with several different teams to improve representation of the environmental layers: the World Forest ID organization that tracks illegal logging, the US Forest Service, two different large-scale JGI projects hosting forest tree data, and researchers working in horticultural systems. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? Most of the team continued from the second year into the third. The third year of the project provided mentorship for postdoc Dr. Irene Cobo-Simon as well as a team of six undergraduate students who served on the biocuration team. As part of this mission, they learned how to read scientific literature in the field of population genetics and landscape genomics. They coordinated with authors to retrieve details and files not available with the published studies. They documented progress and announced releases of studies once fully integrated into CartograPlant. The biocuration team was also involved in social media outreach for the project through Twitter and relevant e-mail lists. They also assisted with two competitions that were run during plant-focused conferences (Botany and NAFGS this past summer). This past year, I enacted a new system of team-based feedback for the biocuration team so that they could provide direct feedback on data integrity aspects of the interface. In addition, Dr. Herndon (co-PI) mentored two undergraduates from the Computer Science department who focused on developing interactive visualizations in D3 that can be integrated into the analytic frameworks housed in Galaxy. In Co-PI Dr. Staton's laboratory, Abdullah Almasaeed is the lead technical developer for TreeSnap and has continued to provide support for new trait-based ontologies. How have the results been disseminated to communities of interest? In Year 3, we focused on community training through virtual workshops that would be accessible to users everywhere. We hosted two of these workshops (about three hours each). All PIs participated, and the lead postdoc led most of each session. Training included walkthroughs of both association mapping and landscape genomics workflows, including data filtering. 
We also hosted a total of two data (study) submission contests that the biocuration team coordinated. We also presented this work at three different conferences (Botany, NAFGS, and Plant and Animal Genome). We are also involved in two other USDA-NIFA-funded initiatives, AG2PI and the AgBioData community, with a focus on genotype/phenotype FAIR data standards. Finally, we continued our support of the USDA efforts for World Forest ID with the application of CartograPlant to track illegal logging. What do you plan to do during the next reporting period to accomplish the goals? During our final year of the project, we will host an in-person workshop at Plant and Animal Genome (January 2023). We will also publish two papers on CartograPlant to demonstrate its utility as a meta-analysis tool, with a specific focus on the ability to confirm genotype-phenotype associations across studies and identify new associations. We will provide a set of updated training materials through our project Git page and lead one additional virtual workshop. We will also work directly with our advisory board to review the workflows available in CartograPlant and seek feedback on the exposed parameters, visualizations, and overall usability.
Impacts What was accomplished under these goals?
In Year 3, the team continued its focus on workflow development for CartograPlant. Workflow development focused on the ability to re-map SNP locations onto new releases of the genome annotation, integrate across GBS-style studies and array-based investigations, and generate population structure maps in a semi-automated manner. We specifically focused on improving the population structure process and added an automated step that generates study-specific and species-specific structure maps after a new study is uploaded. These layers are now available instantly to end users. The second primary focus was visualization. Here, we developed new interactive visualization methods for SNP filtering that allow more iterative adjustment of the datasets and let users change parameters after examining their impact in a preliminary analysis. We also developed visualizations for environmental data, specifically correlation matrices to match what was available for phenotypes. We also optimized the environmental metrics to speed up the downstream analytic workflow for landscape genomics. Finally, the project focused on outreach to the community through two dedicated virtual workshops that were broadly advertised to the research community. These were led by the primary postdoctoral developer and ran for three hours each.
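The environmental correlation matrices mentioned above can be illustrated with a minimal sketch, assuming each environmental layer has already been sampled at the georeferenced tree locations of a study. The layer names, values, and the 0.7 cutoff are hypothetical, and screening out strongly correlated predictors before landscape genomic analyses is a common practice rather than a description of how CartograPlant computes or uses its matrices.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant layer has undefined correlation
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(
    layers: Dict[str, List[float]]
) -> Tuple[List[str], List[List[float]]]:
    """Pairwise Pearson correlations between environmental layers.
    Each layer holds one value per sampled tree location, in the same order."""
    names = list(layers)
    matrix = [[pearson(layers[a], layers[b]) for b in names] for a in names]
    return names, matrix


def highly_correlated(
    names: List[str], matrix: List[List[float]], cutoff: float = 0.7
) -> List[Tuple[str, str, float]]:
    """Hypothetical screening step: pairs of layers with |r| >= cutoff."""
    return [
        (names[i], names[j], matrix[i][j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if abs(matrix[i][j]) >= cutoff
    ]


if __name__ == "__main__":
    # Hypothetical values extracted at four sampled locations.
    layers = {
        "mean_temp": [10.0, 12.0, 14.0, 16.0],
        "annual_precip": [800.0, 700.0, 600.0, 500.0],
        "elevation": [100.0, 300.0, 200.0, 400.0],
    }
    names, matrix = correlation_matrix(layers)
    for name, row in zip(names, matrix):
        print(name, [round(r, 2) for r in row])
    print(highly_correlated(names, matrix, cutoff=0.9))
```

With these toy values, temperature and precipitation are perfectly anti-correlated (r = -1.0), so only that pair survives a 0.9 cutoff; a phenotype correlation matrix could be built the same way from per-individual trait values.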
Publications
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2022
Citation:
Integrated Platform for Association Mapping and Landscape Genomics. Plant and Animal Genome. January 2022 in the Tripal Workshop Session.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2022
Citation:
CartograPlant: Cyberinfrastructure for genotype, phenotype, and environment. North American Forest Genetics Society. June 2022.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2022
Citation:
TreeGenes and CartograPlant: Resources for Plant Conservation and Breeding. Botany 2022 Virtual Presentation. July 2022.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2022
Citation:
Cyberinfrastructure to improve forest health and productivity. Virtual Keynote Presentation for the Australasian Plant Breeding Conference, May 2022.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2021
Citation:
Integrated technologies to support forest tree biodiversity and conservation genomics, Virtual Presentation for Natural Resources Canada, Quebec City, Canada. December 2021.
|
Progress 08/01/20 to 07/31/21
Outputs Target Audience: In Year 2, the CartograPlant project focused on connections with the Tripal (tripal.info) / Galaxy (https://usegalaxy.org/) community to facilitate workflow development. Our postdoctoral scholar lead, Dr. Irene Cobo-Simon, also made a directed effort to engage the community. Specifically, we hosted a workshop during the Botany 2021 meeting (July 2021) that had 32 attendees. We also participated in a Virtual Tripal Hackathon to work with other developers on challenges related to Tripal-Galaxy workflow implementation. We presented at the North American Forest Genetics Student Symposium Conference (May 2021). Three unique competitions were run via social media to encourage the submission of peer-reviewed data sets with appropriate metadata through the FAIR workflow known as the Tripal Plant PopGen Submit pipeline. This enhanced the repository of self-described plant population studies. Finally, we interacted with several different teams to improve representation of the environmental layers: the World Forest ID organization that tracks illegal logging, the US Forest Service, and researchers working in horticultural systems. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? Most of the team continued from the first year into the second. The second year of the CartograPlant project provided mentorship for postdoc Dr. Irene Cobo-Simon as well as a team of four undergraduate students who served on the biocuration team. As part of this mission, they learned how to read scientific literature in the field of population genetics and landscape genomics. They coordinated with authors to retrieve details and files not available with the published studies. They documented progress and announced releases of studies once fully integrated into CartograPlant. The biocuration team was also involved in social media outreach for the project through Twitter and relevant e-mail lists. They also assisted with three different competitions that were run during plant-focused conferences. In addition, Dr. Herndon (co-PI) mentored three undergraduates from the Computer Science department who focused on developing interactive visualizations in D3 that can be integrated into the analytic frameworks housed in Galaxy. In Co-PI Dr. Staton's laboratory, Abdullah Almasaeed is the lead technical developer for TreeSnap (https://treesnap.org/) and has added support for the ontologies used as part of the MIAPPE standards in CartograPlant. Abdullah has also built a secondary application, WildType, that lets users connect directly to the TPPS system to upload their data. How have the results been disseminated to communities of interest? This year, we published a new review paper on Tripal that highlights the CartograPlant module. In addition, we interacted with the community through journal submission contests. A total of three contests were run during virtual conferences, when communication on Twitter was frequent. 
We also interacted with the Tripal community via the first virtual hackathon, as well as with the Galaxy community through the Galaxy Community Conference. The Botany 2021 workshop on Population Genetics and Landscape Genomics with CartograPlant was well attended by faculty, postdocs, and graduate students. What do you plan to do during the next reporting period to accomplish the goals? The final year will provide an opportunity to publish two papers. The first, which describes the overall platform, is in the final stages of writing for submission to Global Change Biology. The second will be a use case on the utility of meta-analysis using the three systems described previously. This final year will also see expansion in the types of Galaxy workflows available and how users interact with them. Finally, we will work with the community to test the utility of the UI as well as the efficiency of the FAIR data submission system in order to fine-tune these processes. This year will also be dedicated to outreach to journals to ensure their participation in encouraging FAIR data submission. We will also continue to work with organizations such as the USFS to ensure that other key resources, such as FIA, can be explored and integrated in CartograPlant.
Impacts What was accomplished under these goals?
In Year 2, the team focused on workflow development for CartograPlant, working with three different systems: Populus trichocarpa, Zea mays, and Vitis vinifera. In all cases, the studies loaded had overlapping use cases (working at least partially from the same georeferenced population). Workflow development focused on the ability to re-map SNP locations onto new releases of the genome annotation, integrate across GBS-style studies and array-based investigations, and generate population structure maps in a semi-automated manner. The workflows for this portion are now available in the local Galaxy instance that supports CartograPlant. Users with accounts can send data associated with loaded studies to the available analytics. The workflows are also partially implemented with visualizations that ease the burden of filter selection on users. These visualizations present overviews of missing data by individual and by locus. Finally, the project focused on outreach to the community through a Botany workshop, the Tripal Hackathon, and the Galaxy Community Conference.
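The missing-data overviews described above rest on two simple rates: the fraction of missing calls per individual and per locus. The sketch below is a minimal illustration, assuming a genotype matrix with one row per individual and one column per SNP locus, with missing calls stored as `None`. The function names, the threshold values, and the choice to filter loci before individuals are illustrative assumptions, not the CartograPlant Galaxy implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: rows are individuals, columns are SNP loci.
# Each call is an alternate-allele dosage (0, 1, 2) or None if missing.
Genotypes = List[List[Optional[int]]]


def missing_rates(geno: Genotypes) -> Tuple[List[float], List[float]]:
    """Return (per-individual, per-locus) fractions of missing calls."""
    n_ind = len(geno)
    n_loci = len(geno[0]) if n_ind else 0
    per_ind = [
        sum(call is None for call in row) / n_loci if n_loci else 0.0
        for row in geno
    ]
    per_locus = [
        sum(geno[i][j] is None for i in range(n_ind)) / n_ind
        for j in range(n_loci)
    ]
    return per_ind, per_locus


def filter_missing(
    geno: Genotypes, max_locus_missing: float, max_ind_missing: float
) -> Tuple[List[int], List[int]]:
    """Drop loci above the locus threshold, then drop individuals above the
    individual threshold (recomputed on the retained loci only).
    Returns the indices of the kept individuals and the kept loci."""
    _, per_locus = missing_rates(geno)
    keep_loci = [j for j, r in enumerate(per_locus) if r <= max_locus_missing]
    reduced = [[row[j] for j in keep_loci] for row in geno]
    per_ind, _ = missing_rates(reduced)
    keep_ind = [i for i, r in enumerate(per_ind) if r <= max_ind_missing]
    return keep_ind, keep_loci


if __name__ == "__main__":
    geno = [
        [0, 1, None, 2],
        [None, None, None, 1],
        [1, 1, 0, None],
    ]
    print(missing_rates(geno))
    print(filter_missing(geno, max_locus_missing=0.5, max_ind_missing=0.5))
```

Here `filter_missing(geno, 0.5, 0.5)` returns `([0, 2], [0, 1, 3])`: locus 2 is dropped because two of its three calls are missing, and individual 1 is then dropped because two of its three remaining calls are missing. An interactive view would simply recompute these rates as a user moves the two thresholds.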
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2021
Citation:
Staton, M., Cannon, E., Sanderson, L. A., Wegrzyn, J., Anderson, T., Buehler, S., ... & Ficklin, S. (2021). Tripal, a community update after 10 years of supporting open source, standards-based genetic, genomic and breeding databases. Briefings in Bioinformatics, 22(6), bbab238.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2021
Citation:
Cyberinfrastructure for Plant Conservation and Breeding. Virtual Workshop for Botany 2021. July 2021.
- Type:
Conference Papers and Presentations
Status:
Accepted
Year Published:
2021
Citation:
Integrated Analysis for CartograPlant: The Galaxy of Plant Populations. Galaxy Community Conference 2021. June 2021.
|
Progress 08/01/19 to 07/31/20
Outputs Target Audience: The target audience for the original CartograTree application was the forest tree breeding community. As part of this proposal, we are extending the application to reach the plant breeding and conservation research community. During year one, the project team expanded the Tripal Plant PopGen Submit pipeline to accept data from a wider range of plant systems, which included updating the taxonomic definitions. This also required a new release to accommodate the new Minimal Information About a Plant Phenotyping Experiment (MIAPPE) standards. These updates were coordinated with end users, announced via our annual newsletter, and coordinated with associated journals. Our team also coordinated with and integrated data from the Botanical Information and Ecology Network (BIEN). This collaboration enabled full integration of species range maps and occurrence data from a global repository of plants. The CartograPlant team formed a biocuration team of undergraduates that was led and trained by our most experienced undergraduate biocuration team member. As part of this effort, we developed a social media campaign via Twitter (TreeGenesDB) and also reached the community through conferences, including the Plant and Animal Genome (PAG) Conference. Finally, we organized the first scientific advisory meeting to communicate with our stakeholders on the utility of the existing environmental layers. We focused on loading new layers associated with the NSF-funded NEON project and layers with finer resolution of soil and land-use types. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? The first year of the CartograPlant project provided mentorship for the newly hired lead postdoc, Dr. Irene Cobo-Simon, as well as a team of four undergraduate students who served on the biocuration team. As part of this mission, they learned how to read scientific literature in the field of population genetics and landscape genomics. They coordinated with authors to retrieve details and files not available with the published studies. They documented progress and announced releases of studies once fully integrated into CartograPlant. The biocuration team was also involved in social media outreach for the project through Twitter and relevant e-mail lists. In addition, Dr. Herndon (co-PI) mentored three undergraduates from the Computer Science department who focused on developing interactive visualizations in D3 that can be integrated into the analytic frameworks housed in Galaxy. In Co-PI Dr. Staton's laboratory, Abdullah Almasaeed is the lead technical developer for the TreeSnap application (https://treesnap.org/), which was updated during this period. How have the results been disseminated to communities of interest? Results have been disseminated through the public release of CartograPlant. Updates are communicated through Twitter, community mailing lists, and at the Plant and Animal Genome Conference (talk and computer demonstration). The project has already resulted in three publications highlighting CartograPlant and Galaxy-based analytics. What do you plan to do during the next reporting period to accomplish the goals? The primary focus for the upcoming year will be on Goal 2, while continuing our efforts on Goal 4. For Goal 2, we will focus on developing robust workflows in Galaxy that are connected to intelligent analytics and interactive visualizations. 
Goal 4 will include more community-level training on the implemented workflows as well as feedback from the community on how to improve these.
Impacts What was accomplished under these goals?
The primary products from Year 1 include the fully public version of CartograPlant v2.0 (https://cartograplant.org/); the integration of 14 new environmental layers, with a specific focus on the NEON, soil, and human-impact global layers; and the incorporation of full range maps for all plant species with associated data from the Botanical Information and Ecology Network (BIEN) database (https://bien.nceas.ucsb.edu/bien/). This release was also associated with a full update of the Tripal Plant PopGen Submit (TPPS: https://tpps.readthedocs.io/en/latest/intro.html) framework in our open-source database framework (Tripal v3.0), which now uses the new Minimal Information About a Plant Phenotyping Experiment (MIAPPE: https://www.miappe.org/) standards. This new TPPS release is available as a module for installation in any of the more than 30 active Tripal databases (https://tripal.info/).
Publications
- Type:
Journal Articles
Status:
Published
Year Published:
2020
Citation:
Wegrzyn, Jill L., Taylor Falk, Emily Grau, Sean Buehler, Risharde Ramnath, and Nic Herndon. "Cyberinfrastructure and resources to enable an integrative approach to studying forest trees." Evolutionary Applications 13, no. 1 (2020): 228-241.
- Type:
Journal Articles
Status:
Published
Year Published:
2019
Citation:
Wegrzyn, Jill L., Margaret A. Staton, Nathaniel R. Street, Dorrie Main, Emily Grau, Nic Herndon, Sean Buehler et al. "Cyberinfrastructure to improve forest health and productivity: the role of tree databases in connecting genomes, phenomes, and the environment." Frontiers in Plant Science 10 (2019): 813.
- Type:
Journal Articles
Status:
Published
Year Published:
2020
Citation:
Spoor, Shawna, Connor Wytko, Brian Soto, Ming Chen, Abdullah Almsaeed, Bradford Condon, Nic Herndon et al. "Tripal and Galaxy: supporting reproducible scientific workflows for community biological databases." Database 2020 (2020).
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2020
Citation:
Wegrzyn, J.L. Integrating beyond genomics: Cyberinfrastructure for forest health and productivity. Virtual Presentation for the CAES, New Haven, Connecticut. August 2020.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2019
Citation:
Wegrzyn, J.L. Cyberinfrastructure for forest health: landscape genomics for the future. Ecological Institute at UNAM, Mexico City, Mexico. November 2019.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2020
Citation:
Grau, E.S., Wegrzyn, J.L. Computer Demonstration: TreeGenes and CartograPlant. Plant and Animal Genome XXVIII, San Diego, California. January 2020.