Source: UNIVERSITY OF CALIFORNIA, BERKELEY submitted to NRP
COMPUTATIONAL APPROACHES FOR STRUCTURAL AND FUNCTIONAL GENOMICS
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
0187244
Grant No.
(N/A)
Cumulative Award Amt.
(N/A)
Proposal No.
(N/A)
Multistate No.
(N/A)
Project Start Date
Oct 1, 2000
Project End Date
Sep 30, 2005
Grant Year
(N/A)
Program Code
[(N/A)]- (N/A)
Recipient Organization
UNIVERSITY OF CALIFORNIA, BERKELEY
(N/A)
BERKELEY,CA 94720
Performing Department
PLANT BIOLOGY
Non Technical Summary
(N/A)
Animal Health Component
(N/A)
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)   Subject of Investigation (SOI)   Field of Science (FOS)   Percent
901                   4099                             1080                     16%
901                   2499                             1080                     16%
901                   3999                             1080                     16%
901                   7310                             1080                     16%
201                   7310                             1080                     18%
304                   7310                             1080                     18%
Goals / Objectives
The principal aim of this project is to develop computational methods for structural and functional genomics, using the genome both as a base for investigation and as a resource to help answer biological questions. Structural genomics projects attempt to provide an experimental structure or a good theoretical model for every protein in all completed genomes. Our work will involve organizing proteins into families according to homology, predicting structure from homology and constructing coordinate models, maintaining an information resource for structural genomics, developing methods for selection of proteins for experimental characterization, and analyzing solved structures to detect homologies and functional information. The computational functional genomics aspect of this project will primarily involve moving beyond pairwise sequence comparison in order to achieve reliable functional annotation of complete genomes. This includes the use of gene genealogies to trace gene histories and functional divergences, non-homology approaches for functional characterization (such as Rosetta Stone and Phylogenetic Profiles), and "reverse genomics" comparison of multiple complete genomes to locate genes associated with characterized cellular or biochemical functions. We also plan to quantitatively combine sequence comparison with expression and other experimental functional data to improve computational molecular and cellular functional characterization.
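
To illustrate the phylogenetic-profile idea named above, here is a minimal sketch. It assumes each gene is summarized by a presence/absence vector across a fixed list of genomes, and that genes with identical or near-identical profiles are treated as candidates for functional linkage. The gene names, the profiles, and the `max_distance` parameter are hypothetical and are not taken from the project.

```python
from itertools import combinations


def hamming(profile_a, profile_b):
    """Count the genomes in which two presence/absence profiles disagree."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


def linked_pairs(profiles, max_distance=0):
    """Return gene pairs whose profiles differ in at most max_distance genomes.

    profiles maps a gene name to a tuple of 0/1 flags, one per genome.
    """
    pairs = []
    for gene_a, gene_b in combinations(sorted(profiles), 2):
        if hamming(profiles[gene_a], profiles[gene_b]) <= max_distance:
            pairs.append((gene_a, gene_b))
    return pairs


# Hypothetical profiles over four genomes (1 = a homolog is present).
example_profiles = {
    "geneA": (1, 0, 1, 1),
    "geneB": (1, 0, 1, 1),
    "geneC": (0, 1, 0, 1),
}
candidate_links = linked_pairs(example_profiles)
```

A real analysis would also need a significance estimate, since profiles shared by chance become likely when only a few genomes are compared.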
Project Methods
See above.

Progress 10/01/00 to 09/30/05

Outputs
Our research group has projects in many disciplines of computational genomics; very brief sketches of major achievements of the group follow. *Structural genomics*: we participated in the Berkeley Structural Genomics Center, which made significant progress towards providing structural information for every soluble protein in Mycoplasma pneumoniae. We developed a strategy which became the basis for target selection in Protein Structure Initiative II. We assessed the impact of structural genomics, including the extent to which it met expectations and how it compares with traditional structural biology. *Protein function prediction*: we developed an automated method of predicting protein function from sequence. Our prediction method, called SIFTER, employs statistical tools and phylogenetic principles. SIFTER appears to work up to an order of magnitude better than other common methods. *Regulated unproductive splicing and translation*: We discovered that a large fraction of natural human alternative mRNAs are apparent targets of an RNA surveillance system called nonsense-mediated mRNA decay and are therefore degraded rather than being translated to make protein. We proposed that these mRNAs fated for destruction are a by-product of a novel regulatory system found in organisms from yeast to man. Subsequent analyses have uncovered several instances where this method appears to be used for gene regulation. Mutations can cause dysregulation, and we have filed patents for the diagnosis and treatment of individuals affected. We have also globally surveyed the landscape of nonsense-mediated mRNA decay in fly using an alternative-splicing specific DNA microarray. *Classification of Ancient Protein Evolutionary Relationships*: We have developed a computational framework to detect evolutionary relationships amongst proteins that lack any sequence similarity, using structural principles. 
*Structural Classification of Proteins and ASTRAL*: We have continued to support the comprehensive evolutionary and structural classification of all proteins, in the SCOP and ASTRAL databases. *Structural Classification of RNA*: We have constructed a comprehensive classification of all RNA 3D structures, and analyzed these for their salient motifs and substructures, and their bound metals. *Protein structure analysis*: We have measured protein sequence-structure correlations, suggesting the limitations of threading methods to predict protein structure. We also assessed entropy and correlations in protein secondary structure, which allowed us to create an extremely simple and efficient predictor comparable to the previous best methods. *Analysis of sequence comparison methods*: We have developed the premier methods of assessing different approaches for comparing protein sequences, using the SCOP gold standard and the Bayesian bootstrap for statistically-rigorous comparison. This allowed us to determine the optimal parameters and methods for sequence database searching and alignment. *Research tools*: We have built and distributed numerous popular tools, for example to run very large sequence database searches and to create sequence logos.
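
The Bayesian bootstrap mentioned above can be sketched in a few lines. This is a minimal illustration of the general technique, not the group's published procedure: each replicate draws observation weights from a flat Dirichlet distribution and recomputes a weighted statistic, here the difference in mean score between two hypothetical search methods evaluated on the same queries. The function names, the per-query scores, and the choice of statistic are all assumptions.

```python
import random


def dirichlet_weights(n, rng):
    """Draw one weight vector from a flat Dirichlet(1, ..., 1) distribution."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def bayesian_bootstrap_diff(scores_a, scores_b, replicates=2000, seed=0):
    """Return sorted replicate values of weighted mean(a) - weighted mean(b).

    scores_a and scores_b are paired: entry i of each list is the score
    of method A and method B on the same query (hypothetical data).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be paired")
    rng = random.Random(seed)
    n = len(scores_a)
    diffs = []
    for _ in range(replicates):
        weights = dirichlet_weights(n, rng)
        diffs.append(sum(w * (a - b) for w, a, b in zip(weights, scores_a, scores_b)))
    diffs.sort()
    return diffs


def credible_interval(sorted_values, level=0.95):
    """Read a central interval off the sorted replicate values."""
    tail = (1.0 - level) / 2.0
    low = sorted_values[int(tail * len(sorted_values))]
    high = sorted_values[int((1.0 - tail) * len(sorted_values)) - 1]
    return low, high
```

If the resulting interval for the difference excludes zero, the two methods would be judged to differ on this test set. Unlike the classical bootstrap, no query is ever dropped from a replicate; it only receives a smaller weight.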

Impacts
The completed and ongoing research directly aids the interpretation of genome sequences and protein structures. Applications of this information include understanding of the roles of molecular functions of genes and proteins for use in genetic analysis and engineering for medicine, agriculture, and biotechnology. Some work unravels basic mechanisms of gene regulation with pervasive impact in understanding the biology of eukaryotic organisms. This has implications for how we learn about development and host-pathogen interactions. It is also fundamental for interpreting the etiology and effect of disease mutations, and it has led us to propose a new treatment for numerous diseases that may provide significant benefit with limited adverse effects.

Publications

  • Hendrix DK, Brenner SE, Holbrook SR. 2006. RNA structural motifs: building blocks of a modular biomolecule. Quarterly Reviews of Biophysics in press.
  • Engelhardt BE, Jordan MI, Brenner SE. 2006. A statistical graphical model for predicting protein molecular function. Proceedings of the 23rd ICML in press.
  • Leontis NB, Altman R, Berman HM, Brenner SE, Brown J, Engelke D, Harvey SC, Holbrook SR, Jossinet F, Lewis SE, Major F, Mathews DH, Richardson J, Williamson JR, Westhof E. 2006. The RNA Ontology Consortium: an open invitation to the RNA community. RNA 12:533-541. doi:10.1261/rna.2343206
  • Smith A, Chandonia JM, Brenner SE. 2006. ANDY: a general, fault-tolerant tool for database searching on computer clusters. Bioinformatics 22:618-620. doi:10.1093/bioinformatics/btk020
  • Chandonia JM, Brenner SE. 2006. The impact of structural genomics: expectations and outcomes. Science 311:347-351. doi:10.1126/science.1121018
  • Soergel DAW, Lareau LF, Brenner SE. 2006. Regulation of gene expression by the coupling of alternative splicing and nonsense-mediated mRNA decay. in Maquat L, ed., Nonsense-mediated mRNA decay. Landes Bioscience. 175-196.
  • Stefan LR, Zhang R, Levitan AG, Hendrix DK, Brenner SE, Holbrook SR. 2006. MeRNA: a database of metal ion binding sites in RNA structures. Nucleic Acids Research 34:D131-D134. doi:10.1093/nar/gkj058
  • Chandonia JM, Kim SH, Brenner SE. 2005. Target selection and deselection at the Berkeley Structural Genomics Center. Proteins: Structure, Function, and Bioinformatics 62:356-370. doi:10.1002/prot.20674
  • Price GA, Crooks GE, Green RE, Brenner SE. 2005. Statistical evaluation of pairwise protein sequence comparison with the Bayesian bootstrap. Bioinformatics 21:3824-3831. doi:10.1093/bioinformatics/bti627
  • Kim SH, Shin DH, Liu J, Oganesyan V, Chen S, Xu QS, Kim SJ, Das D, Schulze-Gahmen U, Holbrook SR, Holbrook EL, Martinez BA, Oganesyan N, DeGiovanni A, Lou Y, Henriquez M, Huang C, Jancarik J, Pufan R, Choi IG, Chandonia JM, Hou J, Gold B, Yokota H, Brenner SE, Adams PD, Kim R. 2005. Structural genomics of minimal organisms and protein fold space. Journal of Structural and Functional Genomics 6:63-70. doi:10.1007/s10969-005-2651-9
  • Engelhardt BE, Jordan MI, Muratore KE, Brenner SE. 2005. Protein molecular function prediction by Bayesian phylogenomics. PLoS Computational Biology 1: e45. doi:10.1371/journal.pcbi.0010045
  • Crooks GE, Green RE, Brenner SE. 2005. Pairwise alignment incorporating dipeptide covariation. Bioinformatics 21:3704-3710. doi:10.1093/bioinformatics/bti616
  • Chandonia JM, Brenner SE. 2005. Update on the Pfam5000 strategy for selection of structural genomics targets. Proceedings of the 27th IEEE EMBS Conference 2540.1-5.
  • Carninci P, ... (12 authors) ..., Brenner SE, et al. 2005. The transcriptional landscape of the mammalian genome. Science 309:1559-1563. doi:10.1126/science.1112014
  • Bourne PE, Brenner SE, Eisen MB. 2005. PLoS Computational Biology: a new community journal. PLoS Computational Biology 1:e4. doi:10.1371/journal.pcbi.0010004
  • Brenner SE, Tramontano A. 2005. Sequences and topology: a decade of genomes. Current Opinion in Structural Biology 15:245-247. doi:10.1016/j.sbi.2005.05.009
  • Blanchette M, Labourier E, Green RE, Brenner SE, Rio DC. 2005. Global analysis of positive and negative pre-mRNA splicing regulators in Drosophila. Genes & Development 19:1306-1314. doi:10.1101/gad.1314205
  • Crooks GE, Brenner SE. 2005. An alternative model of amino acid replacement. Bioinformatics 21:975-980. doi:10.1093/bioinformatics/bti109
  • Zachariah MA, Crooks GE, Holbrook SR, Brenner SE. 2005. A generalized affine gap model significantly improves protein sequence alignment accuracy. Proteins: Structure, Function, and Bioinformatics 58:329-338. doi:10.1002/prot.20299
  • Chandonia JM, Brenner SE. 2005. Implications of structural genomics target selection strategies: Pfam5000, whole genome, and random approaches. Proteins: Structure, Function, and Bioinformatics 58:166-179. doi:10.1002/prot.20298
  • Crooks GE, Wolfe J, Brenner SE. 2004. Measurements of protein sequence-structure correlations. Proteins: Structure, Function, and Bioinformatics 57:804-810. doi:10.1002/prot.20262
  • Tamura M, Hendrix DK, Klosterman PS, Schimmelman NRB, Brenner SE, Holbrook SR. 2004. SCOR: structural classification of RNA, version 2.0. Nucleic Acids Research 32:D182-D184. doi:10.1093/nar/gkh080
  • Green RE, Lewis BP, Hillman RT, Blanchette M, Lareau LF, Garnett AT, Rio DC, Brenner SE. 2003. Widespread predicted nonsense-mediated mRNA decay of alternatively-spliced transcripts of human normal and disease genes. Bioinformatics 19:i118-i121. doi:10.1093/bioinformatics/btg1015 [ISMB2003 Conference Abstract]
  • Lewis BP, Green RE, Brenner SE. 2003. Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans. Proceedings of the National Academy of Sciences of the United States of America 100:189-192. doi:10.1073/pnas.0136770100
  • Klosterman PS, Tamura M, Holbrook SR, Brenner SE. 2002. SCOR: a structural classification of RNA database. Nucleic Acids Research 30:392-394.
  • Chandonia J-M, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE. 2002. ASTRAL compendium enhancements. Nucleic Acids Research 30:260-263.
  • Lo Conte L, Brenner SE, Hubbard TJP, Chothia C, Murzin AG. 2002. SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Research 30:264-267.
  • Zupicich J, Brenner SE, Skarnes WC. 2001. Computational prediction of membrane-tethered transcription factors. Genome Biology 2:research0050.1-0050.6. doi:10.1186/gb-2001-2-12-research0050
  • Brenner SE. 2001. A tour of structural genomics. Nature Reviews Genetics 2:801-9.
  • Green RE, Brenner SE. 2002. Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison. Proceedings of the IEEE 90:1834-47. doi:10.1109/JPROC.2002.805303
  • Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, et al. 2002. The Bioperl toolkit: Perl modules for the life sciences. Genome Research 12:1611-8. doi:10.1101/gr.361602
  • Mougous JD, Green RE, Williams SJ, Brenner SE, Bertozzi CR. 2002. Sulfotransferases and sulfatases in mycobacteria. Chemistry & Biology 9:767-76. doi:10.1016/S1074-5521(02)00175-8


Progress 01/01/04 to 12/31/04

Outputs
Following are excerpts from abstracts of published works from this year: *Tamura, Hendrix, Klosterman, Schimmelman, Brenner, Holbrook 2004* SCOR provides a comprehensive perspective and understanding of RNA motif three-dimensional structure, function, tertiary interactions and their relationships. SCOR 2.0 represents a major expansion and introduces a new classification organization. *Andreeva, Howorth, Brenner, Hubbard, Chothia, Murzin 2004* SCOP participates in a project that aims to rationalize and integrate the data on proteins held in several sequence and structure databases. *Chandonia, Hon, Walker, Lo Conte, Koehl, Levitt, Brenner 2004* The ASTRAL Compendium, partially derived from the SCOP database of protein structure domains, provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. *Hillman, Green, Brenner 2004* For several of the PTC+ isoforms we identified, existing experimental evidence can be reinterpreted and is consistent with the action of NMD to degrade the transcripts. Several genes with mRNA isoforms that we identified as PTC+ show how previous experimental results may be understood in light of NMD. *Klosterman, Hendrix, Tamura, Holbrook, Brenner 2004* Release 2.0.1 of the SCOR database contains a classification of the internal and hairpin loops in a comprehensive collection of 497 NMR and X-ray RNA structures. *Ranatunga, Hill, Mooster, Holbrook, Schulze-Gahmen, Xu, Bessman, Brenner, Holbrook 2004* We have determined the crystal structure, at 1.4 Å, of the Nudix hydrolase DR1025 from the extremely radiation-resistant bacterium Deinococcus radiodurans. *Crooks, Hon, Chandonia, Brenner 2004* WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment, providing a richer and more precise description of sequence similarity than consensus sequences. 
*Lareau, Green, Bhatnagar, Brenner 2004* Recent studies have investigated not only the scope but also the biological impact of alternative splicing on a large scale, revealing that its role in generating proteome diversity may be augmented by a role in regulation. *Crooks, Brenner 2004* We find that the important inter-sequence interactions are short ranged, that correlations between neighboring amino acids are essentially uninformative and that only one-fourth of the total information needed to determine the secondary structure is available from local inter-sequence correlations. *Blanchette, Labourier, Green, Brenner, Rio 2004* Immunopurification of nuclear RNP complexes showed that dU2AF50 associates with intronless mRNAs and bioinformatic analyses confirm the prevalence of U2AF binding sites in the majority of intronless mRNAs. *Chandonia, Earnest, Brenner 2004* A report on the Keystone Symposium "Structural Genomics," held concurrently with the "Frontiers in Structural Biology" symposium, Snowbird, USA, 13-19 April 2004. *Crooks, Wolfe, Brenner 2004* We found that local interactions along the amino acid chain are far more important than non-local contacts and that correlations between proximate amino acids are essentially uninformative.
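
The WebLogo excerpt above describes sequence logos without saying how letter heights are obtained. The sketch below shows the standard information-content calculation behind a logo, under stated assumptions: the alignment is gap-free, all sequences have equal length, and no small-sample correction is applied. It is an illustration of the general idea and makes no claim about how WebLogo itself is implemented; the function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size):
    """Return (information in bits, per-symbol heights) for one alignment column.

    Information is log2(alphabet_size) minus the Shannon entropy of the
    observed symbol frequencies; each symbol's height is its frequency
    times that information.
    """
    counts = Counter(column)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    info = math.log2(alphabet_size) - entropy
    heights = {symbol: (c / total) * info for symbol, c in counts.items()}
    return info, heights


def logo_heights(alignment, alphabet_size=4):
    """Compute information and symbol heights for every column of an alignment."""
    if len({len(seq) for seq in alignment}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    return [column_information(col, alphabet_size) for col in zip(*alignment)]
```

For a DNA alignment (`alphabet_size=4`) a perfectly conserved column carries 2 bits, while a column split evenly between two bases carries 1 bit, which is why a logo conveys more than a consensus sequence.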

Impacts
The completed and ongoing research directly aids the interpretation of genome sequences and protein structures. Applications of this information include understanding of the roles of molecular functions of genes and proteins for use in genetic analysis and engineering for medicine, agriculture, and biotechnology. Newer work unravels basic mechanisms of gene regulation with pervasive impact in understanding the biology of eukaryotic organisms. This has implications for how we learn about development and host-pathogen interactions. It is also fundamental for interpreting the etiology and effect of disease mutations, and it has led us to propose a new treatment for numerous diseases that may provide significant benefit with limited adverse effects.

Publications

  • Tamura M, Hendrix DK, Klosterman PS, Schimmelman NRB, Brenner SE, Holbrook SR. 2004. SCOR: structural classification of RNA, version 2.0. Nucleic Acids Research 32:D182-D184.
  • Andreeva A, Howorth D, Brenner SE, Hubbard TJP, Chothia C, Murzin AG. 2004. SCOP database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Research 32:D226-D229.
  • Chandonia JM, Hon G, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE. 2004. The ASTRAL compendium in 2004. Nucleic Acids Research 32:D189-D192.
  • Hillman RT, Green RE, Brenner SE. 2004. An unappreciated role for RNA surveillance. Genome Biology 5:R8.1-R8.16.
  • Klosterman PS, Hendrix D, Tamura M, Holbrook SR, Brenner SE. 2004. Three-dimensional motifs from the SCOR, structural classification of RNA database: extruded strands, base triples, tetraloops, and U-turns. Nucleic Acids Research 32:2342-2352.
  • Ranatunga W, Hill EE, Mooster JL, Holbrook EL, Schulze-Gahmen U, Xu WL, Bessman MJ, Brenner SE, Holbrook SR. 2004. Structural studies of the nudix hydrolase DR1025 from Deinococcus radiodurans and ligand complexes. Journal of Molecular Biology 339:103-116.
  • Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: a sequence logo generator. Genome Research 14:1188-1190.
  • Lareau LF, Green RE, Bhatnagar RS, Brenner SE. 2004. The evolving roles of alternative splicing. Current Opinion in Structural Biology 14:273-282.
  • Crooks GE, Brenner SE. 2004. Protein secondary structure: entropy, correlations and prediction. Bioinformatics 20:1603-1611.
  • Chandonia JM, Earnest T, Brenner SE. 2004. Contrasting structural genomics and structural biology. Genome Biology 5:343.
  • Blanchette M, Labourier E, Green RE, Brenner SE, Rio DC. 2004. Genome-wide analysis reveals a novel function for the Drosophila splicing factor U2AF50 in the nuclear export of intronless mRNAs. Molecular Cell 14:775-786.
  • Crooks GE, Wolfe J, Brenner SE. 2004. Measurements of protein sequence-structure correlations. Proteins: Structure, Function, and Bioinformatics 57:804-810.


Progress 01/01/03 to 12/31/03

Outputs
Following are abstracts of published works from this year: *Lewis, Green, Brenner 2003* To better understand the role of alternative splicing, we conducted a large-scale analysis of reliable alternative isoforms of known human genes. Each isoform was classified according to its splice pattern and supporting evidence. We found that one-third of the alternative transcripts examined contain premature termination codons, and most persist even after rigorous filtering by multiple methods. These transcripts are apparent targets of nonsense-mediated mRNA decay (NMD), a surveillance mechanism that selectively degrades nonsense mRNAs. Several of these transcripts are from genes for which alternative splicing is known to regulate protein expression by generating alternate isoforms that are differentially subjected to NMD. We propose that regulated unproductive splicing and translation (RUST), through the coupling of alternative splicing and NMD, may be a pervasive, underappreciated means of regulating protein expression. *Green et al 2003* We have recently shown that a third of reliably-inferred alternative mRNA isoforms are candidates for nonsense-mediated mRNA decay (NMD), an mRNA surveillance system (Lewis et al., 2003, Proc. Natl Acad. Sci. USA, 100, 189-192). Rather than being translated to yield protein, these transcripts are expected to be degraded and may be subject to regulated unproductive splicing and translation (RUST). Our initial experimental studies are consistent with these predictions and suggest an unappreciated role for NMD in several human diseases.
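
The abstracts above do not spell out how a premature termination codon is recognized. A commonly used heuristic, assumed here, is the "50-nucleotide rule": a stop codon that lies more than about 50 nucleotides upstream of the last exon-exon junction marks the transcript as a candidate NMD target. The sketch below applies that heuristic to a spliced mRNA. The function names, the coordinate convention (distance measured from the end of the stop codon), and the threshold default are assumptions for illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_stop(mrna, cds_start):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    for i in range(cds_start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return i
    return None


def last_junction(exon_lengths):
    """Position of the final exon-exon junction in spliced mRNA coordinates."""
    return sum(exon_lengths[:-1])


def is_ptc_candidate(mrna, cds_start, exon_lengths, threshold=50):
    """Apply the assumed 50-nucleotide rule to one spliced transcript."""
    if sum(exon_lengths) != len(mrna):
        raise ValueError("exon lengths must add up to the mRNA length")
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so the rule cannot apply
    stop = find_stop(mrna, cds_start)
    if stop is None:
        return False
    stop_end = stop + 3  # first position after the stop codon
    return last_junction(exon_lengths) - stop_end > threshold


# Hypothetical 115-nt transcript: ATG, three codons, a stop, then 100 nt of 3' UTR.
example_mrna = "ATG" + "GCT" * 3 + "TAA" + "A" * 100
```

With this toy transcript, splitting the exons as `[80, 35]` places the last junction 65 nt downstream of the stop codon, so the isoform is flagged; splitting them as `[40, 75]` leaves only 25 nt and it is not.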

Impacts
The completed and ongoing research directly aids the interpretation of genome sequences and protein structures. Applications of this information include understanding of the roles of molecular functions of genes and proteins for use in genetic analysis and engineering for medicine, agriculture, and biotechnology. Newer work unravels basic mechanisms of gene regulation with pervasive impact in understanding the biology of eukaryotic organisms. This has implications for how we learn about development and host-pathogen interactions. It is also fundamental for interpreting the etiology and effect of disease mutations, and it has led us to propose a new treatment for numerous diseases that may provide significant benefit with limited adverse effects.

Publications

  • Green RE, Lewis BP, Hillman RT, Blanchette M, Lareau LF, Garnett AT, Rio DC, Brenner SE. 2003. Widespread predicted nonsense-mediated mRNA decay of alternatively-spliced transcripts of human normal and disease genes. Bioinformatics 19:i118-i121.
  • Lewis BP, Green RE, Brenner SE. 2003. Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans. Proceedings of the National Academy of Sciences of the United States of America 100:189-192.


Progress 01/01/02 to 12/31/02

Outputs
Following are abstracts of select published works: *Green & Brenner 2002* The exponentially growing library of known protein sequences represents molecules connected by an intricate network of evolutionary and functional relationships. To reveal these relationships, virtually every molecular biology experiment incorporates computational sequence analysis. The workhorse methods for this task make alignments between two sequences to measure their similarity. Informed use of these methods, such as NCBI BLAST, WU-BLAST, FASTA and SSEARCH, requires understanding of their effectiveness. To permit informed sequence analysis, we have assessed the effectiveness of modern versions of these algorithms using the trusted relationships among ASTRAL sequences in the Structural Classification of Proteins database classification of protein structures. We have reduced database representation artifacts through the use of a normalization method that addresses the uneven distribution of superfamily sizes. To allow for more meaningful and interpretable comparisons of results, we have implemented a bootstrapping procedure. We find that the most difficult pairwise relations to detect are those between members of larger superfamilies, and our test set is biased toward these. However, even when results are normalized, most distant evolutionary relationships elude detection. *Klosterman et al 2002* The Structural Classification of RNA (SCOR) database provides a survey of the three-dimensional motifs contained in 259 NMR and X-ray RNA structures. In one classification, the structures are grouped according to function. The RNA motifs, including internal and external loops, are also organized in a hierarchical classification. The 259 database entries contain 223 internal and 203 external loops; 52 entries consist of fully complementary duplexes. A classification of the well-characterized tertiary interactions found in the larger RNA structures is also included along with examples. 
The SCOR database is at http://scor.lbl.gov. *Chandonia et al, 2002* The ASTRAL compendium provides several databases and tools to aid in the analysis of protein structures, particularly through the use of their sequences. It is partially derived from the SCOP database of protein domains, and it includes sequences for each domain as well as other resources useful for studying these sequences and domain structures. Several major improvements have been made to the ASTRAL compendium since its initial release 2 years ago. The number of protein domain sequences included has doubled from 15,190 to 30,867, and additional databases have been added. In cases where a SCOP domain spans several protein chains, all of which can be traced back to a single genetic source, a 'genetic domain' sequence is created by concatenating the sequences of each chain in the order found in the original gene sequence. Both the original-style library of SCOP sequences and a new library including genetic domain sequences are available. Selected representative subsets of each of these libraries, based on multiple criteria and degrees of similarity, are also included. ASTRAL may be found at http://astral.stanford.edu/.
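
The 'genetic domain' construction described above amounts to ordering the chain fragments of a domain by their position in the source gene and joining them. A minimal sketch follows; the `ChainFragment` type, its `gene_order` field, and the example sequences are hypothetical and do not reflect ASTRAL's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainFragment:
    """One chain's contribution to a multi-chain domain (hypothetical type)."""
    chain_id: str
    gene_order: int  # assumed: rank of this chain within the original gene
    sequence: str


def genetic_domain_sequence(fragments):
    """Concatenate chain sequences in the order they occur in the gene."""
    ordered = sorted(fragments, key=lambda fragment: fragment.gene_order)
    return "".join(fragment.sequence for fragment in ordered)


# Chains listed in structure-file order, which need not match gene order.
example_fragments = [
    ChainFragment(chain_id="B", gene_order=2, sequence="KLM"),
    ChainFragment(chain_id="A", gene_order=1, sequence="ACD"),
]
example_domain = genetic_domain_sequence(example_fragments)
```

Sorting on gene order rather than chain identifier matters because chain labels in a structure file are arbitrary and may not follow the order of the original gene.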

Impacts
The completed and ongoing research directly aids the interpretation of genome sequences and protein structures. Applications of this information include understanding of the roles of molecular functions of genes and proteins for use in genetic analysis and engineering for medicine, agriculture, and biotechnology. Newer work unravels basic mechanisms of gene regulation with pervasive impact in understanding the biology of eukaryotic organisms.

Publications

  • Green RE, Brenner SE. 2002. Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison. Proceedings of the IEEE 90:1834-47.
  • Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, et al. 2002. The Bioperl toolkit: Perl modules for the life sciences. Genome Research 12:1611-8.
  • Mougous JD, Green RE, Williams SJ, Brenner SE, Bertozzi CR. 2002. Sulfotransferases and sulfatases in mycobacteria. Chemistry & Biology 9:767-76.
  • Klosterman PS, Tamura M, Holbrook SR, Brenner SE. 2002. SCOR: a structural classification of RNA database. Nucleic Acids Research 30:392-394.
  • Chandonia J-M, Walker NS, Lo Conte L, Koehl P, Levitt M, Brenner SE. 2002. ASTRAL compendium enhancements. Nucleic Acids Research 30:260-263.
  • Lo Conte L, Brenner SE, Hubbard TJP, Chothia C, Murzin AG. 2002. SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Research 30:264-267.


Progress 01/01/01 to 12/31/01

Outputs
My research group computationally studies sequenced genomes to deepen understanding of molecular evolution, structure, and function. This report summarizes all research activities in my laboratory from Jan-Dec 2001. Structural genomics aims to provide an experimental structure or a theoretical model for every protein in completed genomes. Motivating this effort is the immense power of structure to elucidate evolution and function. To this end, the Berkeley Structural Genomics Center intends to produce a near-complete structural complement of the genomes for M. genitalium and M. pneumoniae. I have overseen selection of optimal targets for crystallography. Work is also progressing on a LIMS to track lab results. Protein structure is far better conserved than protein sequence, and therefore allows recognition of homology of more distantly related molecules. However, inference of homology from structure is presently a subjective judgment. To address this issue, we assessed the significance of similar structures with beta-bulges in corresponding locations. These are generally thought to be artifacts, so it was surprising when we found that apparently unrelated proteins would often have bulges in similar positions. This suggests that beta-bulges are not informative for inferring homology because they play an unexpectedly important structural role. Though the number of protein structures is growing rapidly, it is dwarfed by the quantity of sequences without structures. Threading is a powerful, but as yet unreliable, means of matching sequences and structures and thereby bringing structural information to a vast repertoire of sequences for which experimental structures are not yet available. We have made progress in constructing a flexible threading system, whose key enhancement is its ability to accept experimental data. 
We are beginning studies with simulated data, and we look forward to testing these methods with novel genome sequences and restraints from mass spectrometry. The past year witnessed tremendous growth in the number of RNA structures, which is expected to grow further in the coming years. In order to aid interpretation of these data, my collaborators and I have built a Structural Classification of RNA (SCOR) database, which organizes motifs in RNA structures and provides a functional classification and examples of tertiary interactions. In completing this database, we made exciting discoveries, such as the prevalence of certain geometries for base-triples, and the curious existence of a diloop that had been evolutionarily identified as a tetraloop. A new area of research in my group involves the relationship between alternative splicing and protein structure. In the course of these studies, we discovered that a majority of alternative splicing events affect the protein product in a dramatic manner. This has launched a new set of investigations to understand alternative splicing's potential interactions with other cellular systems as a major post-transcriptional protein expression regulatory mechanism.

Impacts
This project is basic genomics research, whose applied impact, while indirect, is profound. Detailed scientific study of molecular function of newly discovered genes is critical for virtually every aspect of molecular biological endeavor, including improving knowledge of genomically-studied crops; interpreting and undermining pathogenicity; and aiding the understanding, diagnosis, and therapeutics for human disease.

Publications

  • Brenner SE. 2001. A Tour of Structural Genomics. Nature Reviews Genetics 2:801-809.
  • Zupicich J, Brenner SE, Skarnes WC. 2001. Computational prediction of membrane-tethered transcription factors. Genome Biology 2:research0050.1-0050.6.