Progress 03/01/24 to 02/28/25
Outputs Target Audience: The target audience is aquaculture genomics researchers. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided?
Nothing Reported
How have the results been disseminated to communities of interest? We presented AquaMine at the Plant and Animal Genome Conference in January 2025. What do you plan to do during the next reporting period to accomplish the goals? Complete the final release of AquaMine.
Impacts What was accomplished under these goals?
Impact Statement The aquaculture genomics research community is working to generate genomic resources in order to develop technologies that will enhance aquaculture production efficiency, sustainability, product quality and profitability. To understand the genome's role in the expression of economically important traits, scientists must integrate information about genes and other genomic elements with their own research data for downstream analyses. This project will provide a bioinformatics resource that enables data mining, integration and comparison of genomic datasets of aquaculture species. The web-accessible data mining resource will empower aquaculture researchers, with or without programming skills, to leverage the genomic data in their research, thereby accelerating discoveries that will lead to a better understanding of physiological mechanisms underlying commercially important traits. Progress Toward Specific Aims Aim 1. Develop AquaMine, a high performance data mining system that integrates genome assemblies and annotation data for species of importance to US aquaculture. 1) Major activities completed / experiments conducted Activities toward aim 1 were in the following steps in preparation of the next release of AquaMine: 1) obtaining and processing genome assemblies and genes from NCBI for 26 new or updated genomes, 2) processing a new data source (NCBI) for GO annotation, 3) completing the RNA-seq pipeline and computation of gene expression levels for 302 RNA-seq experiments downloaded from the NCBI SRA. 2) Data collected We did not generate new data, but collected and processed data from external sources as described above. 3) Summary statistics and discussion of results We now have genomes and genes for a total of 58 species. 4) Key outcomes or other accomplishments realized NA Aim 2. Enhance the available genomic data with additional information using existing computational pipelines. 
1) Major activities completed / experiments conducted 2) Data collected We did not generate new raw data, but computed datasets using existing data. 3) Summary statistics and discussion of results NA 4) Key outcomes or other accomplishments realized NA Aim 3. Foster a user community by engaging researchers throughout the development process to ensure usability and support of aquaculture research objectives. 1) Major activities completed / experiments conducted We presented AquaMine at the Plant and Animal Genome Conference in January 2025. 2) Data collected NA 3) Summary statistics and discussion of results NA 4) Key outcomes or other accomplishments realized NA
Publications
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2025
Citation:
Sivasankaran SK, Walsh AT, Palti Y, Roberts SB, Elsik CG. AquaMine: Tools for Mining Genomic Regions of Aquaculture Species. Poster. Plant and Animal Genome Conference. San Diego, CA. Jan 10-15 2025.
Progress 03/01/23 to 02/29/24
Outputs Target Audience: The target audience is aquaculture genomics researchers. Changes/Problems: Genome browser development was not originally proposed, but the research community has requested it, so we will put more effort into genome browser development in the next project period. What opportunities for training and professional development has the project provided? We presented an AquaMine tutorial at the NRSP-8 Aquaculture Workshop, which took place at North Carolina State University in May 2023. We have trained groups at Washington State University and at the USDA-ARS Cool and Cold Water Aquaculture Research Lab on the use of the JBrowse/Apollo genome browser. How have the results been disseminated to communities of interest? The release of AquaMinev1.2 was announced in emails sent to research community members and on our Slack channel. We presented AquaMine and the new genome browser at the Plant and Animal Genome Conference in January 2024. We also presented an AquaMine tutorial at the NRSP-8 Aquaculture Workshop, which took place at North Carolina State University in May 2023. What do you plan to do during the next reporting period to accomplish the goals? We will develop a new release of AquaMine with updated data and new genome assemblies for some species, and will create genome browsers for additional species. We will present AquaMine to the research community and conduct additional tutorials.
Impacts What was accomplished under these goals?
Impact Statement The aquaculture genomics research community is working to generate genomic resources in order to develop technologies that will enhance aquaculture production efficiency, sustainability, product quality and profitability. To understand the genome's role in the expression of economically important traits, scientists must integrate information about genes and other genomic elements with their own research data for downstream analyses. This project will provide a bioinformatics resource that enables data mining, integration and comparison of genomic datasets of aquaculture species. The web-accessible data mining resource will empower aquaculture researchers, with or without programming skills, to leverage the genomic data in their research, thereby accelerating discoveries that will lead to a better understanding of physiological mechanisms underlying commercially important traits. Progress Toward Specific Aims Aim 1. Develop AquaMine, a high performance data mining system that integrates genome assemblies and annotation data for species of importance to US aquaculture. 1) Major activities completed / experiments conducted Major activities toward aim 1 were in the following areas: 1) completing the AquaMinev1.2 release, 2) testing AquaMinev1.2 prior to public release, 3) developing a new genome browser, 4) deploying a BLAST search tool connected to the genome browser. Most of the data preparation for AquaMinev1.2 was reported in the previous annual report, but the new database was not loaded, tested and released until this project period. The AquaMine webpage was updated, and the computed GO annotations were made available on the data download page. Based on input from the research community, we decided to develop a genome browser using data that had been processed for the AquaMine database to enable users to visualize annotations. 
The first genome browser was developed for rainbow trout using the JBrowse platform and includes the Apollo genome annotation tool. Data tracks include gene predictions (Ensembl and RefSeq), 29 RNA-seq experiments, single nucleotide polymorphisms and repetitive elements. The RNA-seq data is available in 6 different visualizations. In addition to the complete gene prediction tracks, there are two tracks that highlight disagreements between Ensembl and RefSeq gene models. 2) Data collected We did not generate new data, but we processed and integrated existing data. 3) Summary statistics and discussion of results AquaMinev1.2 has a total of 1,009,341,849 data objects. These include 39 genomes, 2,743,593 genes and 1,493,882 proteins. 4) Key outcomes or other accomplishments realized AquaMinev1.2 was made publicly available, and announced to the research community. Aim 2. Enhance the available genomic data with additional information using existing computational pipelines. 1) Major activities completed / experiments conducted Most of the proposed computations were completed in the last project period. This year we computed RNA-seq-based transcript assemblies using StringTie to create RNA-seq-based gene models for JBrowse. 2) Data collected We did not generate new raw data, but computed datasets using existing data. 3) Summary statistics and discussion of results NA 4) Key outcomes or other accomplishments realized NA Aim 3. Foster a user community by engaging researchers throughout the development process to ensure usability and support of aquaculture research objectives. 1) Major activities completed / experiments conducted The release of AquaMinev1.2 was announced in emails sent to research community members and on our Slack channel. We presented AquaMine and the new genome browser at the Plant and Animal Genome Conference in January 2024. 
We also presented an AquaMine tutorial at the NRSP-8 Aquaculture Workshop, which took place at North Carolina State University in May 2023. We have trained groups at Washington State University and at the USDA-ARS Cool and Cold Water Aquaculture Research Lab on the use of the JBrowse/Apollo genome browser. 2) Data collected NA 3) Summary statistics and discussion of results NA 4) Key outcomes or other accomplishments realized NA
Publications
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2024
Citation:
Elsik CG. AquaMine Genomic Data Mining Warehouse and Genome Browsers for Species of Importance to US Aquaculture and Fisheries. Aquaculture Workshop presentation. Plant and Animal Genome Conference. San Diego, CA. Jan 12-17 2024.
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2024
Citation:
Konvalina JD, Walsh AT, Tripathi V, Gao G, Roberts SB, Palti Y, Elsik CG. AquaMine Genomic Data Mining Warehouse and Genome Browsers for Species of Importance to US Aquaculture and Fisheries. Poster. Plant and Animal Genome Conference. San Diego, CA. Jan 12-17 2024.
Progress 03/01/22 to 02/28/23
Outputs Target Audience: The target audience is aquaculture genomics researchers. Changes/Problems:
Nothing Reported
What opportunities for training and professional development has the project provided? Five research community members, in addition to the Co-PIs, learned about AquaMine and contributed to testing the first release. We created an AquaMine Slack channel to provide quick responses to questions. How have the results been disseminated to communities of interest? AquaMine was presented in the Aquaculture workshop at the Plant and Animal Genome Conference (PAG) in January 2023. The presentation included demonstration videos. A poster was also presented at PAG. What do you plan to do during the next reporting period to accomplish the goals? We will complete the second release (AquaMinev1.2) early in the reporting period, will demonstrate it to the working group, will hold Zoom tutorials, and will seek feedback from researchers. We will develop the third release with updated data, and will seek functional annotation data to incorporate into AquaMine.
Impacts What was accomplished under these goals?
Impact Statement The aquaculture genomics research community is working to generate genomic resources in order to develop technologies that will enhance aquaculture production efficiency, sustainability, product quality and profitability. To understand the genome's role in the expression of economically important traits, scientists must integrate information about genes and other genomic elements with their own research data for downstream analyses. This project will provide a bioinformatics resource that enables data mining, integration and comparison of genomic datasets of aquaculture species. The web-accessible data mining resource will empower aquaculture researchers, with or without programming skills, to leverage the genomic data in their research, thereby accelerating discoveries that will lead to a better understanding of physiological mechanisms underlying commercially important traits. Progress Toward Specific Aims Aim 1. Develop AquaMine, a high performance data mining system that integrates genome assemblies and annotation data for species of importance to US aquaculture. 1) Major activities completed / experiments conducted Major activities toward aim 1 were in the following areas: 1) completing the AquaMinev1.1 release, 2) testing AquaMinev1.1 prior to public release, 3) collecting feedback on AquaMinev1.1 from research community members after public release, 4) gathering and parsing data for the AquaMinev1.2 release, 5) modifying the InterMine data model for new AquaMinev1.2 datasets, 6) modifying data loaders for the new datasets, and 7) test loading the AquaMinev1.2 datasets. In preparation for AquaMinev1.2, we gathered updated data from the following sources: NCBI, Ensembl, UniProt, InterPro, KEGG and OrthoDB. 
New datasets included genomes and gene sets of nine new fishery species, orthologs from Ensembl Compara, GO annotation from Ensembl BioMart, genomic variants from the European Variation Archive for three species, Reactome pathways for the model organisms (Drosophila melanogaster, Danio rerio and Homo sapiens), RNA-seq based gene expression levels with metadata (see Objective 2), reciprocal best hit (RBH) orthologs (see Objective 2) and computed GO annotation for all species (see Objective 2). We modified the data model to accommodate RBH and gene expression data. Data parsers for NCBI and Ensembl GFF3 files were updated, and a new script was developed to convert GO annotation from Ensembl BioMart into GAF format. Each dataset was test loaded into the PostgreSQL database individually to check for data format issues, and then subsets of data were loaded to check data integration and postprocessing. 2) Data collected We did not generate new data, but collected and formatted data as described above. 3) Summary statistics and discussion of results We have integrated genes and genomes of 37 aquatic eumetazoan species and 2 non-aquatic model organisms, as well as associated gene information and orthologous relationships. 4) Key outcomes or other accomplishments realized AquaMinev1.1 was made publicly available, and feedback was collected from research community members. The individual datasets for the second release, AquaMinev1.2, have been tested and validated. Aim 2. Enhance the available genomic data with additional information using existing computational pipelines. 1) Major activities completed / experiments conducted We reran our ortholog pipeline to develop an updated AquaMine-Ortho dataset which includes the new species. 
Ortholog sets were computed for the following taxonomic groups: Bilateria, Deuterostomia, Actinopterygii, Teleostei, Euteleosteomorpha, Neoteleostei, Carangaria, Eupercaria, Protacanthopterygii, Salmonidae, Otomorpha, Protostomia, Arthropoda, Mollusca. At the request of community members, we created a one-to-one ortholog dataset called AquaMine-RBH based on reciprocal best hit (RBH) protein alignments. AquaMine-RBH orthologs were computed between model organisms or well-annotated non-model organisms and other species with a similar evolutionary history of whole genome duplication. RBH orthologs were computed between Homo sapiens and Crassostrea gigas, Crassostrea virginica, Haliotis rufescens, Homarus americanus, Mercenaria mercenaria, Penaeus monodon, Penaeus vannamei, Procambarus clarkii and Lepisosteus oculatus; Crassostrea gigas and the other mollusks; Lottia gigantea and the other mollusks; Drosophila melanogaster and the other arthropods; Danio rerio and the other Teleostei except the salmonids; Oreochromis niloticus and the other Neoteleostei; Salmo salar and the other salmonids; Oncorhynchus mykiss and the other salmonids; Oncorhynchus kisutch and the other salmonids. We developed a pipeline to compute Gene Ontology (GO) annotation for all species to augment the GO annotation data available from UniProt and Ensembl BioMart. The pipeline included 1) identifying protein domains using InterProScan with the -goterms flag to include GO terms in the output and 2) leveraging the RBH one-to-one ortholog data to transfer GO terms from UniProt and Ensembl BioMart of well-annotated species to the other species. We computed RNA-seq-based expression levels for Haliotis rufescens, Oncorhynchus kisutch, Oncorhynchus mykiss, Oreochromis niloticus, Penaeus monodon and Salmo salar (166 experiments total from 8 gene expression atlas-type BioProjects). 
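The reciprocal best hit criterion behind AquaMine-RBH can be illustrated with a minimal Python sketch. It is not the project's pipeline: the input layout (query, subject, score tuples, as might be derived from tabular protein alignment output), the function names and the toy identifiers are all assumptions made for illustration. A pair is kept only when each protein is the other's top-scoring hit.

```python
def best_hits(alignments):
    """Map each query to its single highest-scoring subject.

    `alignments` is an iterable of (query, subject, score) tuples.
    This layout is hypothetical; ties keep the first hit encountered.
    """
    best = {}
    for query, subject, score in alignments:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy alignment scores with made-up protein identifiers.
a_vs_b = [
    ("humanA", "oysterX", 250.0),
    ("humanA", "oysterY", 90.0),
    ("humanB", "oysterY", 120.0),
    ("humanC", "oysterY", 300.0),
]
b_vs_a = [
    ("oysterX", "humanA", 245.0),
    ("oysterY", "humanC", 310.0),
    ("oysterY", "humanB", 115.0),
]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In the toy data, humanB's best hit is oysterY, but oysterY's best hit is humanC, so humanB has no one-to-one partner. This is the property that makes RBH pairs conservative candidates for transferring GO terms between species.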
FASTQ sequences were obtained from the NCBI Sequence Read Archive (SRA), trimmed, and aligned to the respective genome assembly with Hisat2. StringTie2 was used to compute gene expression levels as FPKM and TPM values. Metadata was curated from the SRA and NCBI BioSamples. 2) Data collected We did not generate new raw data, but computed datasets using existing data. 3) Summary statistics and discussion of results The AquaMine-Ortho dataset has 375,341 ortholog clusters (i.e. families of orthologous genes) among 14 ortholog sets computed at different taxonomic levels. The AquaMine-RBH dataset includes 2,280,512 one-to-one ortholog pairs. The AquaMine-GO-Annotation dataset includes GO annotations of 782,942 genes in the non-model aquatic species, ranging from 12,659 to 33,550 genes per species, with a total of 7,477,383 GO annotations. The gene expression data includes 166 gene expression experiments from various tissues. 4) Key outcomes or other accomplishments realized The new AquaMine-Ortho, AquaMine-RBH, AquaMine-GO and gene expression data have been tested and validated, and they will be included in the production instance of AquaMine v1.2. Aim 3. Foster a user community by engaging researchers throughout the development process to ensure usability and support of aquaculture research objectives. 1) Major activities completed / experiments conducted The release of AquaMinev1.1 was announced in emails sent to research community members. We created an AquaMine Slack channel so research community members who wished to contribute to testing and feedback could easily communicate with the AquaMine team. We presented AquaMine at the Plant and Animal Genome Conference in January 2023. 2) Data collected NA 3) Summary statistics and discussion of results NA 4) Key outcomes or other accomplishments realized NA
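For readers unfamiliar with the FPKM and TPM units reported under Aim 2, the following Python sketch shows their standard textbook definitions. It is an illustration only, not StringTie's internal computation: the per-gene fragment counts, gene lengths, function names and gene identifiers are hypothetical inputs chosen for the example.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of transcript per million mapped fragments.

    counts:  {gene: mapped fragment count}  (hypothetical input)
    lengths: {gene: transcript length in bases}
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mapped fragments")
    return {g: counts[g] * 1e9 / (lengths[g] * total) for g in counts}


def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = {g: counts[g] / lengths[g] for g in counts}
    denom = sum(rates.values())
    if denom == 0:
        raise ValueError("no mapped fragments")
    return {g: rates[g] / denom * 1e6 for g in counts}


# Toy example with made-up gene identifiers.
counts = {"geneA": 100, "geneB": 300, "geneC": 600}
lengths = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

fpkm_values = fpkm(counts, lengths)
tpm_values = tpm(counts, lengths)
print(fpkm_values)
print(tpm_values)
```

Both units correct for gene length and sequencing depth. TPM values always sum to one million within a sample, which makes proportions easier to compare across the tissue samples in an expression atlas.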
Publications
- Type:
Conference Papers and Presentations
Status:
Published
Year Published:
2023
Citation:
Elsik CG, Walsh AT, Triant DA, Palti Y, Roberts S. AquaMine Genomic Data Mining Warehouse for Species of Importance to US Aquaculture and Fisheries. Workshop Presentation and Poster. Plant and Animal Genome Conference, January 13-18, 2023.
Progress 03/01/21 to 02/28/22
Outputs Target Audience: The target audience is aquaculture genomics researchers. Changes/Problems: The start of the project was delayed several months due to a 6-month delay in receiving a new server for this project (purchased using internal funding). Although we were able to start collecting some data prior to receiving the new server, we did not have sufficient computational resources on our other servers to load and test a new InterMine database with multiple genomes. As a result, we did not finish loading the final production instance of the first AquaMine release prior to the end of the first project period. The final production instance will be completed within the first two months of the second project period, and we will get back on schedule for the next release. What opportunities for training and professional development has the project provided?
Nothing Reported
How have the results been disseminated to communities of interest? We presented preliminary development work in the Plant & Animal Genome virtual Aquaculture workshop. What do you plan to do during the next reporting period to accomplish the goals? We will complete the first release early in the reporting period, demonstrate AquaMine to the AquaMine Working Group, hold Zoom tutorials announced on AngenMap, create demo videos, and seek feedback from researchers. We will complete the second release, which will include additional computed datasets.
Impacts What was accomplished under these goals?
Impact Statement The aquaculture genomics research community is working to generate genomic resources in order to develop technologies that will enhance aquaculture production efficiency, sustainability, product quality and profitability. To understand the genome's role in the expression of economically important traits, scientists must integrate information about genes and other genomic elements with their own research data for downstream analyses. This project will provide a bioinformatics resource that enables data mining, integration and comparison of genomic datasets of aquaculture species. The web-accessible data mining resource will empower aquaculture researchers, with or without programming skills, to leverage the genomic data in their research, thereby accelerating discoveries that will lead to a better understanding of physiological mechanisms underlying commercially important traits. Progress Toward Specific Aims Aim 1. Develop AquaMine, a high performance data mining system that integrates genome assemblies and annotation data for species of importance to US aquaculture. 1) Major activities completed / experiments conducted Major activities toward aim 1 were in the following areas: 1) data gathering, 2) data parsing, 3) modifying data loaders, 4) test loading the database, 5) modifying the InterMine WebApp, 6) developing new query templates, and 7) testing the database and webapps. We gathered data from the following sources: NCBI, Ensembl, UniProt, InterPro, KEGG and OrthoDB. 
We curated genome data for twelve US aquaculture species: Crassostrea gigas (Pacific oyster), Crassostrea virginica (Eastern oyster), Ictalurus punctatus (channel catfish), Morone saxatilis (striped sea bass), Oncorhynchus kisutch (coho salmon), Oncorhynchus mykiss (rainbow trout), Oreochromis niloticus (Nile tilapia), Penaeus monodon (giant tiger prawn), Penaeus vannamei (Pacific white shrimp), Perca flavescens (yellow perch), Salmo salar (Atlantic salmon) and Seriola lalandi dorsalis (California yellowtail). To enhance comparative approaches based on orthology, we also included genomes of 14 additional aquatic eumetazoan species of importance to US fisheries and basic research, as well as genes from the model organisms Danio rerio, Drosophila melanogaster, Homo sapiens and Lottia gigantea. We developed new parsers for genome assembly files, NCBI and Ensembl gene annotations, and UniProt, and we developed a new data model for ortholog clusters. Each dataset was test loaded into the PostgreSQL database individually to check for data format issues, and then subsets of data were loaded to check data integration and postprocessing. 2) Data collected We did not generate new data, but collected and formatted data as described above. 3) Summary statistics and discussion of results We have integrated genes and genomes of 26 aquatic eumetazoan species and genes of 4 model organisms, as well as associated gene information and orthologous relationships. The database has 424,720,421 objects. 4) Key outcomes or other accomplishments realized The individual datasets have been tested and validated. The production instance of AquaMine v1.1 will be completed early in the second project year (see Changes/Problems). Aim 2. Enhance the available genomic data with additional information using existing computational pipelines. 
1) Major activities completed / experiments conducted We computed a new ortholog dataset called AquaMine-Ortho using Orthologer, the pipeline developed by OrthoDB. By running the pipeline ourselves we were able to include species and last common ancestral taxonomic groups that are not available at OrthoDB. Ortholog sets were computed for the following taxonomic groups: Deuterostomia, Actinopterygii, Euteleosteomorpha, Neoteleostei, Percomorphaceae, Carangaria, Eupercaria, Protacanthopterygii, Salmoninae, Otomorpha, Protostomia, Arthropoda, Mollusca. The model organisms Danio rerio, Drosophila melanogaster, Homo sapiens and Lottia gigantea were included to allow researchers to leverage additional information from model species based on orthology. 2) Data collected We did not generate new raw data, but computed orthologs using existing protein sequence data from NCBI. 3) Summary statistics and discussion of results The AquaMine-Ortho dataset has 13 ortholog sets computed at different taxonomic levels, with a total of 293,061 ortholog clusters (i.e. families of orthologous genes). 4) Key outcomes or other accomplishments realized The AquaMine-Ortho data has been tested and validated, and will be included in the production instance of AquaMine v1.1 early in the second project year (see Changes/Problems). The pre-loaded data was shared with USDA NCCCWA for use in research. Aim 3. Foster a user community by engaging researchers throughout the development process to ensure usability and support of aquaculture research objectives. 1) Major activities completed / experiments conducted We presented preliminary development work in the Plant & Animal Genome virtual Aquaculture workshop in January 2022, and sought input from community members. 2) Data collected Nothing to report 3) Summary statistics and discussion of results Nothing to report 4) Key outcomes or other accomplishments realized Researchers became aware of the project and provided suggestions.
Publications
- Type:
Other
Status:
Other
Year Published:
2022
Citation:
Elsik CG. AquaMine - a High Performance Genomic Data Mining System for Species of Importance to US Aquaculture. Virtual Presentation. Plant and Animal Genome Conference, January 8-12, 2022.