Progress 09/15/06 to 07/31/09
Outputs OUTPUTS: Information from this project is being disseminated through an established GO annotation website, AgBase. Expressed protein sequence tags (ePSTs), which are experimentally derived protein coding sequences that are not in the current genome annotation, are displayed on AgBase using the Generic Genome Browser (GBrowse). The GBrowse display includes biological evidence to allow evaluation of the strength of each ePST, including the presence of a valid start codon, the number of peptides used to identify the ePST, the coverage of the potential ORF by peptides, the presence/absence of a ribosomal binding site, the presence/absence of conserved domain(s), codon bias, and confidence in peptide identifications. All of the ePSTs and the biological evidence for M. haemolytica, H. somni, and P. multocida can be accessed at the AgBase website (http://www.agbase.msstate.edu/epst/) under the "Microbial GBrowsers" link. A computational pipeline for proteogenomic mapping was developed as a component of this project, and it is available for download through AgBase at http://www.agbase.msstate.edu/index.html under the "Tools" link. A peptide validation tool, PepOut, was also developed. PepOut uses a target/decoy approach and combines an outlier-based machine learning approach with Bayesian statistics to estimate the probability that each peptide identification is true, which is a critical step in the proteogenomic mapping process. Results from this project were also disseminated to the bovine respiratory disease community at the annual Conference for Research Workers in Animal Diseases.
PARTICIPANTS: Dr. Mark L. Lawrence at the Mississippi State University College of Veterinary Medicine (MSU-CVM) is the project director. Dr. Shane Burgess (MSU-CVM), Dr. Susan Bridges (MSU Department of Computer Sciences and Engineering), and Dr. Bindu Nanduri (MSU-CVM) are the project co-directors. Dr. James Watt, a postdoctoral scientist, is analyzing proteogenomic mapping data. Protein isolations were conducted by Michelle Banes (MSU-CVM), and mass spectrometry was conducted by Dr. Tibor Pechan at the MSU Life Sciences and Biotechnology Institute. Nan Wang, a Ph.D. student under the direction of Dr. Bridges, constructed the proteogenomic mapping computational pipeline and is conducting the mapping analyses. Ranjit Kumar, a Ph.D. student under the direction of Dr. Lawrence, is responsible for displaying results in GBrowse and establishing links with collaborators at other institutions. Dr. Sarah Highlander at the Baylor College of Medicine directed genome sequencing of M. haemolytica; H. somni genome sequencing was conducted by Dr. Tom Inzana at the Virginia Tech College of Veterinary Medicine; and the 2.38 Mbp genome sequence of P. multocida nontoxigenic porcine pneumonic pasteurellosis isolate 3480 was finished by Dr. Allison Gillaspy at the Oklahoma University Health Sciences Center.
TARGET AUDIENCES: This project specifically benefits the bovine respiratory disease research community by improving the genome annotation for the three most important bacterial pathogens responsible for causing bovine respiratory disease: Mannheimia haemolytica, Histophilus somni, and Pasteurella multocida. Accurate and accessible annotation of their genomes is needed to maximize the utility of the genome sequences to the U.S. agricultural research community. In a broader sense, this project also serves as an important model for experimental annotation of other agricultural microbial genomes. The techniques and tools developed are publicly available to benefit annotation efforts of other microbial genomes. This project also evaluated whether the gene prediction methods used to annotate these genomes are accurate.
PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
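The target/decoy idea that PepOut builds on can be illustrated with a minimal sketch. This is not PepOut itself, which combines outlier detection with Bayesian statistics; it only shows the generic step of estimating a false discovery rate from decoy hits. The function names and the (score, is_decoy) input format are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Each hit is (search score, True if the match came from the decoy database).
Hit = Tuple[float, bool]


def decoy_fdr(hits: List[Hit], threshold: float) -> float:
    """Estimate FDR at a score threshold as decoy hits / target hits."""
    targets = sum(1 for score, is_decoy in hits
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in hits
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


def lowest_threshold_at_fdr(hits: List[Hit], max_fdr: float) -> Optional[float]:
    """Return the lowest observed score whose estimated FDR is <= max_fdr."""
    for threshold in sorted({score for score, _ in hits}):
        if decoy_fdr(hits, threshold) <= max_fdr:
            return threshold
    return None


# Hypothetical scores, for illustration only.
example_hits = [(4.0, False), (3.5, False), (3.0, True),
                (2.5, False), (2.0, True), (1.5, False)]
```

With the hypothetical scores above, accepting every hit gives an estimated FDR of 0.5 (two decoys against four targets), so a stricter threshold is needed before the accepted peptides are used for mapping.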
Impacts Proteogenomic mapping has provided comprehensive experimental evidence to improve the annotation of the M. haemolytica, P. multocida, and H. somni genomes. For the M. haemolytica genome, which is not a finished genome, it provided biological evidence for the expression of many proteins that were annotated as pseudogenes. Thus, proteogenomic mapping is an efficient method for improving the annotation of unfinished genomes. For P. multocida and H. somni, which are finished genomes, proteogenomic mapping also identified novel proteins, but fewer than for M. haemolytica because the sequence and annotation of these genomes are of better quality. Importantly, this project produced computational tools for proteogenomic mapping as well as new computational tools to assess the quality of peptides and ePSTs. Our results indicate that assessment of ePST quality is critical to the success of any proteogenomic mapping project.
Publications
- Wang, N., B. Nanduri, M. L. Lawrence, S. M. Bridges, and S. C. Burgess. 2010. Gene Model Detection Using Mass Spectrometry. Methods in Molecular Biology: Proteome Bioinformatics. Vol. 604. ISBN: 978-1-60761-443-2.
- Watt, J. M., G. D. Ramsey, S. K. Bridges, B. Nanduri, R. Kumar, S. C. Burgess, and M. L. Lawrence. 2009. Experimental annotation of Mannheimia haemolytica A:1 by proteogenomic mapping. Conference for Research Workers in Animal Disease, Chicago, Illinois.
- Wang, N., S. C. Burgess, M. L. Lawrence, and S. M. Bridges. 2009. Proteogenomic Mapping for Structural Annotation of Prokaryote Genomes. Proceedings of IJCBS09, Shanghai, China.
- Lawrence, M. L., J. Watt, S. Bridges, B. Nanduri, N. Wang, R. Kumar, and S. C. Burgess. 2008. Improvement of genome annotation for bovine respiratory disease pathogens by proteogenomic mapping. Conference for Research Workers in Animal Disease, Chicago, Illinois.
- Wang, N., S. C. Burgess, M. L. Lawrence, B. Nanduri, F. McCarthy, C. Yuan, and S. M. Bridges. 2008. Novel algorithms for structural annotation of prokaryotic genomes. ISBM 2008, Toronto, Canada.
- Wang, N., C. Yuan, B. Nanduri, and S. M. Bridges. 2008. Integrating evidence for evaluation of potential protein-coding genes using Bayesian networks. Proceedings of BIOCOMP08, Las Vegas, Nevada.
- Wang, N., C. Yuan, D. Wu, S. C. Burgess, B. Nanduri, and S. M. Bridges. 2008. PepOut: Distance-based Outlier Detection Model for Improving MS/MS Peptide Identification Confidence. MCBIOS 2008, Oklahoma City, Oklahoma.
Progress 09/15/07 to 09/14/08
Outputs OUTPUTS: A critical component of this proposal is a coordinated effort with the PIs of the genome sequencing projects for Mannheimia haemolytica, Histophilus somni, and Pasteurella multocida. Information will be disseminated to these annotation projects through an established GO annotation website, AgBase (http://www.agbase.msstate.edu/). Another objective of this project is to develop proteogenomic mapping tools and improve annotation tools available for researchers to conduct functional genomics and systems biology investigations on these pathogens. These tools are also being made available through AgBase. Expressed protein sequence tags (ePSTs), which are experimentally derived protein coding sequences that are not in the current genome annotation, will be displayed on AgBase using the Generic Genome Browser (GBrowse). The GBrowse display will include biological evidence to allow evaluation of the strength of each ePST, including the presence of a valid start codon, the number of peptides used to identify the ePST, the coverage of the potential ORF by peptides, the presence/absence of a ribosomal binding site, the presence/absence of conserved domain(s), codon bias, and confidence in peptide identifications. Results from this project are being disseminated through national meetings and will be published in peer-reviewed journals.
PARTICIPANTS: Dr. Mark L. Lawrence at the Mississippi State University College of Veterinary Medicine (MSU-CVM) is the project director. Dr. Shane Burgess (MSU-CVM), Dr. Susan Bridges (MSU Department of Computer Sciences and Engineering), and Dr. Bindu Nanduri (MSU-CVM) are the project co-directors. Dr. James Watt, a postdoctoral scientist, is analyzing proteogenomic mapping data. Protein isolations were conducted by Michelle Banes (MSU-CVM), and mass spectrometry was conducted by Dr. Tibor Pechan at the MSU Life Sciences and Biotechnology Institute. Nan Wang, a Ph.D. student under the direction of Dr. Bridges, constructed the proteogenomic mapping computational pipeline and is conducting the mapping analyses. Ranjit Kumar, a Ph.D. student under the direction of Dr. Lawrence, is responsible for displaying results in GBrowse and establishing links with collaborators at other institutions. Dr. Sarah Highlander at the Baylor College of Medicine directed genome sequencing of M. haemolytica; H. somni genome sequencing was conducted by Dr. Tom Inzana at the Virginia Tech College of Veterinary Medicine; and the 2.38 Mbp genome sequence of P. multocida nontoxigenic porcine pneumonic pasteurellosis isolate 3480 was finished by Dr. Allison Gillaspy at the Oklahoma University Health Sciences Center.
TARGET AUDIENCES: This project will specifically benefit the bovine respiratory disease research community by improving the genome annotation for the three most important bacterial pathogens responsible for causing bovine respiratory disease: Mannheimia haemolytica, Histophilus somni, and Pasteurella multocida. Accurate and accessible annotation of their genomes is needed to maximize the utility of the genome sequences to the U.S. agricultural research community. In a broader sense, this project will also serve as an important model for experimental annotation of other agricultural microbial genomes. The techniques and tools developed will be made publicly available to benefit annotation efforts of other microbial genomes, and the project will result in the establishment of a centralized database for annotation of agricultural microbial genomes. This project will also determine whether the gene prediction methods used to annotate these genomes are accurate, and it is likely that proteins will be identified that were not predicted in the annotation.
PROJECT MODIFICATIONS: Nothing significant to report during this reporting period.
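One item in the ePST evidence list above, the coverage of the potential ORF by peptides, can be sketched as follows. This is a minimal illustration under the assumption that coverage means the fraction of ORF residues overlapped by at least one identified peptide; the function name and the example sequences are hypothetical, and the project's pipeline may define coverage differently.

```python
from typing import List


def peptide_coverage(orf_protein: str, peptides: List[str]) -> float:
    """Fraction of ORF residues covered by at least one peptide match."""
    if not orf_protein:
        return 0.0
    covered = [False] * len(orf_protein)
    for peptide in peptides:
        # Mark every occurrence of the peptide, including overlapping ones.
        start = orf_protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = orf_protein.find(peptide, start + 1)
    return sum(covered) / len(orf_protein)


# Hypothetical 10-residue ORF with two identified peptides.
example_coverage = peptide_coverage("MKTAYIAKQR", ["KTAY", "AKQR"])
```

In this hypothetical case eight of the ten residues are covered, so an ePST supported by these two peptides would be reported with 80% coverage.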
Impacts High-throughput proteogenomic mapping will provide economical, rapid, and comprehensive experimental evidence to improve the annotation of the M. haemolytica, P. multocida, and H. somni genomes. Experimental evidence is being provided for the existence of protein products from the predicted protein coding sequences that were identified during annotation of these three genomes. In addition, proteins are being identified that were not predicted in the genome annotation, which will directly improve the genome annotations. This will result in improved proteomics databases for these species, a better understanding of the size and diversity of proteomes in Pasteurellaceae, and improved ability to model functional genomics datasets for these species. In addition, the computational tools for proteogenomic mapping and annotation that are being developed will be freely available through the AgBase website.
Publications
- Lawrence, M. L., J. Watt, S. Bridges, B. Nanduri, N. Wang, R. Kumar, and S. C. Burgess. 2008. Improvement of genome annotation for bovine respiratory disease pathogens by proteogenomic mapping. Conference for Research Workers in Animal Disease, Chicago, Illinois.
- Wang, N., C. Yuan, S. C. Burgess, and S. M. Bridges. 2008. Distance-based Outlier Detection Model for Improving MS/MS Peptide Identification Confidence. MCBIOS 2008, Oklahoma City, Oklahoma.
- Wang, N., S. C. Burgess, M. L. Lawrence, B. Nanduri, F. McCarthy, C. Yuan, and S. M. Bridges. 2008. Novel algorithms for structural annotation of prokaryotic genomes. ISBM 2008, Toronto, Canada.
Progress 09/15/06 to 09/14/07
Outputs OUTPUTS: The objective of this proposal is to conduct experimental annotation of the three bovine respiratory disease (BRD) pathogens Mannheimia haemolytica, Histophilus somni, and Pasteurella multocida by proteogenomic mapping to improve their ongoing structural annotations and provide functional annotation. Another objective of this project is to improve annotation tools available for researchers to conduct functional genomics and systems biology investigations on these pathogens. To date, proteogenomic maps of Mannheimia haemolytica strain PHL213 and P. multocida strain 3480 have been produced. Proteins were isolated from each strain in triplicate and analyzed by multi-dimensional protein identification technology (MuDPIT) using two-dimensional liquid chromatography with electrospray ionization tandem mass spectrometry. The resulting mass spectra were searched against their respective protein databases using SEQUEST (Bioworks 3.2 cluster; ThermoElectron). For proteogenomic mapping, tandem mass spectra were also searched against the respective genome sequences translated in all six potential frames using SEQUEST. Peptides were validated using an outlier detection method in which a k-nearest neighbor approach compares Xcorr and ∆CN scores derived from true and randomized databases. The lists of peptides identified from the genome sequences were compared with the lists identified from protein databases. For peptides identified only from the genome sequence, our automated proteogenomic mapping pipeline was then used to produce expressed protein sequence tags (ePSTs), which are the theoretical protein coding sequences that contain the peptide sequence identified by mass spectrometry. To visualize the ePSTs, they were displayed against their respective genomes using the Generic Genome Browser (GBrowse). Biological evidence is currently being incorporated into GBrowse to allow evaluation of the strength of each ePST, including the presence of a valid start codon, the number of peptides used to identify the ePST, the coverage of the potential ORF by peptides, the presence/absence of a ribosomal binding site, the presence/absence of conserved domain(s), codon bias, and confidence in peptide identifications. Proteins have been isolated and analyzed from Histophilus somni strain 2336, and proteogenomic mapping of this strain is currently being conducted.
PARTICIPANTS: Dr. Mark L. Lawrence at the Mississippi State University College of Veterinary Medicine (MSU-CVM) is the project director. Dr. Shane Burgess (MSU-CVM), Dr. Susan Bridges (MSU Department of Computer Sciences and Engineering), and Dr. Bindu Nanduri (MSU-CVM) are the project co-directors. Protein isolations were conducted by Michelle Banes (MSU-CVM), and mass spectrometry was conducted by Dr. Tibor Pechan at the MSU Life Sciences and Biotechnology Institute. Nan Wang, a Ph.D. student under the direction of Dr. Bridges, constructed the proteogenomic mapping computational pipeline and is conducting the mapping analyses. Ranjit Kumar, a Ph.D. student under the direction of Dr. Lawrence, is responsible for displaying results in GBrowse and establishing links with collaborators at other institutions. A critical component of this proposal is a coordinated effort with the PIs of the genome sequencing projects for Mannheimia haemolytica, Histophilus somni, and Pasteurella multocida. Information will be disseminated to these annotation projects through an established GO annotation website (AgBase). Dr. Sarah Highlander at the Baylor College of Medicine directed genome sequencing of M. haemolytica; H. somni genome sequencing was conducted by Dr. Tom Inzana at the Virginia Tech College of Veterinary Medicine; and the 2.38 Mbp genome sequence of P. multocida nontoxigenic porcine pneumonic pasteurellosis isolate 3480 was finished by Dr. Allison Gillaspy at the Oklahoma University Health Sciences Center.
TARGET AUDIENCES: Since 2000, the USDA has invested heavily in microbial genome sequencing, which has resulted in the completion of a large number of genome sequences from animal pathogens that impact U.S. agriculture. However, the ability to decipher the information content of sequenced genomes is currently limited, and this has seriously hindered the full experimental exploitation of these sequences. In particular, there is no experimental evidence for the existence of predicted protein products from the large majority of annotated genes in sequenced microbial genomes. Furthermore, standardized gene ontology (GO) to facilitate data retrieval is not consistently used. This project will specifically benefit the bovine respiratory disease research community by improving the genome annotation for the three most important bacterial pathogens responsible for causing bovine respiratory disease: Mannheimia haemolytica, Histophilus somni, and Pasteurella multocida. Accurate and accessible annotation of their genomes is needed to maximize the utility of the genome sequences to the U.S. agricultural research community. In a broader sense, this project will also serve as an important model for experimental annotation of other agricultural microbial genomes. The techniques and tools developed will be made publicly available to benefit annotation efforts of other microbial genomes, and the project will result in the establishment of a centralized database for annotation of agricultural microbial genomes. This project will also determine whether the gene prediction methods used to annotate these genomes are accurate, and it is likely that proteins will be identified that were not predicted in the annotation.
PROJECT MODIFICATIONS: No Project Modifications information reported.
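The six-frame search and the comparison of peptide lists described under OUTPUTS above can be sketched in a few lines. This is an illustrative simplification, not the project's pipeline: SEQUEST performs spectrum matching, whereas the sketch only translates a genome in six frames, locates an already identified peptide, and keeps peptides found only in the genome search. All function names and the toy sequence are assumptions.

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

# Standard codon assignments (the same amino acid assignments are used
# for bacterial genomes), built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

Frame = Tuple[str, int]  # (strand, offset), e.g. ("+", 0)


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translations(genome: str) -> Dict[Frame, str]:
    frames: Dict[Frame, str] = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def frames_containing(genome: str, peptide: str) -> List[Frame]:
    """Frames whose translation contains the peptide sequence."""
    return [frame for frame, protein in six_frame_translations(genome).items()
            if peptide in protein]


def genome_only_peptides(genome_hits: Iterable[str],
                         protein_db_hits: Iterable[str]) -> Set[str]:
    """Peptides identified from the genome search but not the protein database."""
    return set(genome_hits) - set(protein_db_hits)


toy_genome = "ATGAAAACCGCGTAA"  # hypothetical sequence; translates to MKTA*
```

Peptides returned by `genome_only_peptides` are the candidates from which ePSTs would be built; the frame and position of each match indicate where in the genome the surrounding open reading frame should be sought.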
Impacts High-throughput proteogenomic mapping will provide economical, rapid, and comprehensive experimental evidence to improve the annotation of the M. haemolytica, P. multocida, and H. somni genomes. Experimental evidence is being provided for the existence of protein products from the predicted protein coding sequences that were identified during annotation of these three genomes. In addition, proteins are being identified that were not predicted in the genome annotation, which will directly improve the genome annotations. This will result in improved proteomics databases for these species, a better understanding of the size and diversity of proteomes in Pasteurellaceae, and improved ability to model functional genomics datasets for these species. In addition, the computational tools for proteogenomic mapping and annotation that are being developed will be freely available through the AgBase website.
Publications
- No publications reported this period