Source: UNIVERSITY OF ILLINOIS submitted to NRP
SYSTEMS AND NETWORK BIOLOGY FOR AGRICULTURAL PRODUCTION
Sponsoring Institution
National Institute of Food and Agriculture
Project Status
COMPLETE
Funding Source
Reporting Frequency
Annual
Accession No.
1014249
Grant No.
(N/A)
Cumulative Award Amt.
(N/A)
Proposal No.
(N/A)
Multistate No.
(N/A)
Project Start Date
Oct 1, 2017
Project End Date
Sep 30, 2022
Grant Year
(N/A)
Program Code
[(N/A)]- (N/A)
Recipient Organization
UNIVERSITY OF ILLINOIS
2001 S. Lincoln Ave.
URBANA,IL 61801
Performing Department
Crop Sciences
Non Technical Summary
Our research proposal will explore the dynamics of biomolecular systems with network approaches that integrate evolutionary and functional genomic information. This includes deriving processes that are responsible for patterns of biological change so that we can make predictive statements about molecular biology. Our research impacts breeding, bioengineering and biomedicine applications for agricultural, consumer, and environmental sciences (ACES), which ultimately depend on our understanding of the protein molecules of the cell and their associated molecular functions. The rising human population demands increased production of food, feed, fodder, fiber and fuel while mitigating global impacts on climate change. These applications include, for example, the search for efficient biofuels, development of new biotechnologies and nanotechnologies, enhancement of food and agricultural systems, and the mitigation of pests and pollutants. Proteins sustain life on our planet, enabling major biogeochemical cycles crucial for planetary stability, crosstalk between animals, plants and microbes, photosynthesis and nitrogen fixation, and crucial signaling in brain activities important for cognition and behavior. Their misfolding results in neurobiological diseases such as Alzheimer's, their challenge causes plant and animal pathogenesis, and their deregulation results in cancer. Understanding proteomic makeup and patterns of functional recruitment in metabolic networks is necessary to help engineer plants and microbes to secure America's energy future. Despite the importance of this sophisticated machinery, how it operates and determines biological function has yet to be uncovered, including the rationale for molecular change and the mysterious origin of the 'vocabulary' that shapes genetics. While we seek knowledge of basic principles underlying biological organization, these principles can be applied to very specific problems.
Examples: (1) Principles of protein structure and organization can be applied to genetic engineering of photosynthetic antenna and reaction centers of photosynthesis necessary to increase yield of biomass-producing microbes; (2) Principles of biological network organization can be used to design new metabolic pathways important for lignin degradation and breakdown of biomass recalcitrance; (3) Principles of protein and ribonucleoprotein complex organization can be used to create new nanosystems or nanobioreactors for the food industry or to monitor the environment. We are also interested in generating statistical and bioinformatics tools for molecular biology and genomics that can impact molecular biology applications for agriculture.
Animal Health Component
(N/A)
Research Effort Categories
Basic
100%
Applied
(N/A)
Developmental
(N/A)
Classification

Knowledge Area (KA)    Subject of Investigation (SOI)    Field of Science (FOS)    Percent
201                    2499                              1040                      20%
304                    3899                              1040                      20%
201                    4030                              1040                      10%
201                    4099                              1040                      50%
Goals / Objectives
We will use networks to study the dynamics of complex systems. Networks can describe how atoms or residues move in relation to other atoms or residues in biomolecules. Similarly, networks can describe how genes are transcribed to exert metabolic activities during the physiological activities or behavior of an organism. Networks can also describe how structural modules in molecules emerge and change throughout evolutionary history, while they produce more complex metabolic or signaling networks. The focus of this proposal is to untangle the principles behind the dynamics of biological networks at all of these different temporal scales. The methodologies to track network dynamics have already been developed by our team. Each of our five objectives explores network dynamics at different time frames, from molecular dynamics occurring at nanosecond or microsecond levels to molecular rearrangements unfolding during the course of billions of years of biological evolution. Our approaches include mining genomic information and molecular dynamics to discover the most basic patterns controlling molecular and cellular structure and their associated functions for systems and synthetic biology and bioengineering.

Objective 1. Analyzing how molecular functions shape dynamics in evolution and bioengineering of protein molecules. In this objective a completely novel proof-of-principle will be developed at the interface of two seemingly disparate fields of science, evolutionary genomics and molecular dynamics.

Objective 2. Active sites and reaction chemistries and the impact of the combinatorial rearrangement of elementary functional loops (EFLs) and protein domains in the emergence of primordial molecular functions.

Objective 3. Metabolite-centric reporter pathway analysis in Arabidopsis under stress to study transcriptome-level physiological changes. Transcriptome data can be a starting point when it is difficult to obtain genome-scale qualitative proteomic and metabolomic data for analysis of metabolic change that occurs at physiological or developmental levels of an organism. Here we will perform reporter metabolite analysis using transcriptome data to uncover key pathways that outline metabolic responses of Arabidopsis to cold stress.

Objective 4. Uncovering pathway modules shared by reward-dependent behaviors in behavioral genomics. In collaboration with Sandra Rodriguez-Zas (Animal Sciences, UIUC), we will study how behavior alters gene expression patterns of an organism.

Objective 5. Developing a graph-theoretical dynamical and historical view of protein domain organization. The history of the rearrangement of structural domains in proteins will help dissect the processes of recruitment of molecular functions.
Project Methods
Methods focus on biodiversity at high levels of molecular complexity, building on evolutionary and functional genomic knowledge generated during almost two decades of research in structural bioinformatics. The toolset that will be used is extensive and includes a number of computational biology applications, including phylogenomic and chemoinformatic analysis, molecular dynamics (MD) simulation, and the use of systems biology and network biology strategies, which include network representation, visualization and analysis.

Protein domain structures in genomic sequences will be assigned using hidden Markov models of structural recognition with tools that have over 95% prediction accuracy. Structures will be benchmarked relative to those deposited in the PDB database. We will use MD simulations of protein structure with the NAMD package parallelized in the Blue Waters supercomputer environment, one of the fastest in the world, which is available on campus. MD simulations of protein loops will be explored on a timescale of 50-70 ns. We will construct a dynamics space, a 'dynamosome', by calculating the eigenvalues of the top 5 principal components from principal component analysis and centrality metrics from a network based on the dynamic cross-correlation matrix of the motions of protein residues. In order to assess the presence or absence of a specific network topology, we will calculate maximum modularity scores, alpha values to test power law behavior, and Bartels' test statistic for measuring the extent of modularity, scale-freeness and randomness of networks. We will perform unsupervised clustering of the trajectories using dynamosome variables. We also plan to use methods that classify community structure patterns exhibited by loops and their correlation to specific function.
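The network step of the dynamosome construction can be illustrated with a minimal sketch. It computes a dynamic cross-correlation matrix from aligned residue coordinates and derives a degree centrality from it. The function names, the 0.5 correlation threshold, the use of absolute correlations to define edges, and the synthetic trajectory are all assumptions made for illustration; they are not the project's NAMD-based pipeline, and the principal component step is not covered here.

```python
import math
import random


def dccm(traj):
    """Dynamic cross-correlation matrix of residue motions.

    traj is a list of frames; each frame is a list of (x, y, z) tuples,
    one per residue, assumed to be already aligned to a common reference.
    """
    n_frames = len(traj)
    n_res = len(traj[0])
    # Mean position of each residue over the trajectory.
    mean = [[sum(frame[i][d] for frame in traj) / n_frames for d in range(3)]
            for i in range(n_res)]
    # Displacement of each residue from its mean position, per frame.
    disp = [[[frame[i][d] - mean[i][d] for d in range(3)] for i in range(n_res)]
            for frame in traj]

    def cov(i, j):
        # Time average of the dot product of two displacement vectors.
        return sum(sum(f[i][d] * f[j][d] for d in range(3)) for f in disp) / n_frames

    var = [cov(i, i) for i in range(n_res)]
    return [[cov(i, j) / math.sqrt(var[i] * var[j]) for j in range(n_res)]
            for i in range(n_res)]


def degree_centrality(corr, threshold=0.5):
    """Fraction of other residues linked to each residue.

    Assumption: two residues are linked when |C_ij| >= threshold.
    """
    n = len(corr)
    return [sum(1 for j in range(n) if j != i and abs(corr[i][j]) >= threshold) / (n - 1)
            for i in range(n)]


# Synthetic toy trajectory (not MD output): residues 0 and 1 move together,
# residue 2 moves in the opposite direction, residue 3 moves independently.
random.seed(0)
traj = []
for _ in range(500):
    s = random.gauss(0, 1)
    t = random.gauss(0, 1)
    traj.append([(s, 0.0, 0.0), (1.0 + s, 0.0, 0.0), (2.0 - s, 0.0, 0.0), (3.0 + t, 0.0, 0.0)])

C = dccm(traj)
centrality = degree_centrality(C)
```

In this toy case the correlated and anticorrelated residues form one connected group, while the independent residue stays isolated. In a real analysis the threshold would need to be chosen and justified against the data.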
Our goal is to reconstruct a "structure-evolution" space that would complement our dynamosome.

Phylogenomic analyses will make use of the protein domain census to construct phylogenetic statements using advanced methods of phylogenetic reconstruction, including the parsimony ratchet methodology as performed in PAUP*. We will also use Gene Ontology (GO) database definitions of molecular functions to index genomic, phylogenomic and MD information.

Gene expression data from Arabidopsis and other plants under different abiotic stresses will be retrieved from the GEO database of NCBI (http://www.ncbi.nlm.nih.gov/geo/) or other plant genome databases. Annotated genes and biochemical reactions will be retrieved from AraCyc and manually rechecked using KEGG. Data will be subjected to metabolite reporter analysis in the Matlab environment. Enrichment analysis will identify up- and down-regulated genes of significance. Network analyses will use Cytoscape and Pajek software. A number of scripts in our laboratory will allow task automation.
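To make the metabolite reporter step concrete, here is a minimal Python sketch of the classic reporter metabolite score of Patil and Nielsen, in which gene-level p-values are converted to Z-scores, aggregated over the genes neighboring a metabolite, and corrected against random gene sets of the same size. This is a swapped-in textbook formulation, not necessarily the variant used in the project's Matlab scripts. The gene names, p-values, metabolite labels and the number of random samples are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def gene_z_scores(p_values):
    """Convert differential-expression p-values (0 < p < 1) to Z-scores."""
    inv = NormalDist().inv_cdf
    return {gene: inv(1.0 - p) for gene, p in p_values.items()}


def reporter_score(genes, z, n_random=2000, seed=0):
    """Background-corrected aggregate Z-score for one metabolite.

    genes: genes whose enzymes use or produce the metabolite.
    z: Z-scores for all genes in the network.
    """
    k = len(genes)
    raw = sum(z[g] for g in genes) / math.sqrt(k)
    # Background distribution: aggregate scores of random gene sets of size k.
    rng = random.Random(seed)
    pool = list(z.values())
    background = [sum(rng.sample(pool, k)) / math.sqrt(k) for _ in range(n_random)]
    return (raw - mean(background)) / pstdev(background)


# Hypothetical p-values and gene-metabolite neighborhoods, for illustration only.
p_values = {"g1": 0.001, "g2": 0.004, "g3": 0.01, "g4": 0.3,
            "g5": 0.5, "g6": 0.7, "g7": 0.8, "g8": 0.9}
neighbors = {"met_A": ["g1", "g2", "g3"], "met_B": ["g6", "g7", "g8"]}

z = gene_z_scores(p_values)
scores = {met: reporter_score(genes, z) for met, genes in neighbors.items()}
```

A metabolite surrounded by strongly changed genes (here `met_A`) receives a high score and would be flagged as a candidate reporter; the cutoff for significance is left to the analyst.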

Progress 10/01/19 to 09/30/20

Outputs
Target Audience: Nothing Reported
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The project enables training of graduate and undergraduate students.
How have the results been disseminated to communities of interest? Research advances that were generated were published in well-reputed peer-reviewed journals, including PLoS One, Genome Biology and Evolution and Evolutionary Bioinformatics. Research from the laboratory of PI Caetano-Anollés has been reported by National Geographic and other outlets.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? We coupled network and systems biology approaches and evolutionary genomic information to advance our understanding of the origin and evolution of crucial molecular systems such as metabolic networks, elementary functionome networks, the proteome and networks of protein dynamics. We revealed important patterns of evolutionary recruitment that tailored the structural makeup of metabolism. We also studied the evolution of life, including the evolutionary origin of viruses and bacteria, and the diversification of populations of important microbial pathogens:

(1) Dissecting enzyme recruitment in metabolic networks: Enzyme recruitment is a fundamental evolutionary driver of modern metabolism. We see evidence of recruitment at work in the metabolic Molecular Ancestry Networks (MANET) database (http://manet.illinois.edu), an online resource that integrates data from KEGG, SCOP and structural phylogenomic reconstruction. First, we released version 3.0 of MANET, which updates data from KEGG and SCOP, links enzyme and PDB information with PDBsum, and traces evolutionary information of domains defined at fold family level of SCOP classification in metabolic subnetwork diagrams. Compared to SCOP folds used in the previous versions, fold families are cohesive units of functional similarity that are highly conserved at sequence level and offer a 10-fold increase of data entries. We surveyed enzymatic, functional and catalytic site distributions among superkingdoms showing that ancient enzymatic innovations followed a biphasic temporal pattern of diversification typical of module innovation. We grouped enzymatic activities of MANET into a hierarchical system of subnetworks and mesonetworks matching KEGG classification. The evolutionary growth of these modules of metabolic activity was studied using bipartite networks and their one-mode projections at enzyme, subnetwork and mesonetwork levels of organization. Evolving metabolic networks revealed patterns of enzyme sharing that transcended mesonetwork boundaries and supported the patchwork model of metabolic evolution. We also explored the scale-freeness, randomness and small-world properties of evolving networks as possible organizing principles of network growth and diversification. The network structure shows an increase in hierarchical modularity and scale-free behavior as metabolic networks unfold in evolutionary time. Remarkably, this evolutionary constraint on structure was stronger at lower levels of metabolic organization. Evolving metabolic structure reveals a 'principle of granularity', an evolutionary increase of the cohesiveness of lower-level parts of a hierarchical system.

(2) The origin and evolution of viruses: The canonical frameworks of viral evolution describe viruses as cellular predecessors, reduced forms of cells, or entities that escaped cellular control. The discovery of giant viruses has changed these standard paradigms. Their genetic, proteomic and structural complexities resemble those of cells, prompting a redefinition and reclassification of viruses. In a previous genome-wide analysis of the evolution of structural domains in proteomes, with domains defined at the fold superfamily level, we found the origins of viruses intertwined with those of ancient cells. We have extended these data-driven analyses to the study of fold families, confirming the co-evolution of viruses and ancient cells and the genetic ability of viruses to foster molecular innovation. The results support our suggestion that viruses arose by genomic reduction from ancient cells and validate a co-evolutionary 'symbiogenic' model of viral origins.

(3) Bacterial origins: The candidate phyla radiation (CPR) is a proposed subdivision within the bacterial domain comprising several candidate phyla. CPR organisms are united by small genome and physical sizes, lack several metabolic enzymes, and populate deep branches within the bacterial subtree of life. These features raise intriguing questions regarding their origin and mode of evolution. In this study, we performed a comparative and phylogenomic analysis to investigate CPR origin and evolution. Unlike previous gene/protein sequence-based reports of CPR evolution, we used protein domain superfamilies classified by protein structure databases to resolve the evolutionary relationships of CPR with non-CPR bacteria, Archaea, Eukarya, and viruses. Across all supergroups, CPR shared the most superfamilies with non-CPR bacteria and were placed as deep branching bacteria in most phylogenomic trees. CPR contributed 1.22% of new superfamilies to bacteria, including the ribosomal protein L19e, and encoded four core superfamilies that are likely involved in cell-to-cell interaction and establishing episymbiotic lifestyles. Although CPR and non-CPR bacterial proteomes gained common superfamilies over the course of evolution, CPR and Archaea had more common losses. These losses mostly involved metabolic superfamilies. In fact, phylogenies built from only metabolic protein superfamilies separated CPR and non-CPR bacteria. These findings indicate that CPR are bacterial organisms that have probably evolved in an Archaea-like manner via the early loss of metabolic functions. We also discovered that phylogenies built from metabolic and informational superfamilies gave contrasting views of the groupings among Archaea, Bacteria, and Eukarya, which add to the current debate on the evolutionary relationships among superkingdoms.

(4) Genetic structure of Ralstonia populations: We also studied diversification of bacteria at population levels. Bacterial wilt-causing Ralstonia threaten numerous crops throughout the world. We studied the population structure of 196 isolates of Ralstonia solanacearum and 39 isolates of Ralstonia pseudosolanacearum, which were collected from potato- and tomato-growing areas in 19 states of Brazil. Regardless of the species, three groups of isolates were identified. One group encompassed R. pseudosolanacearum isolates. The other two groups comprise isolates of R. solanacearum (phylotype II) split according to geographic regions, one made of isolates from the North and Northeast and the other made of isolates from the Central, Southeast, and South regions (CSS). The analysis of genetic variability revealed that the proximity of some geographic regions and the movement of potato tubers could have facilitated migration and therefore low genetic differentiation between geographic regions. Finally, geography, which also influences host distribution, affects the structure of the population of R. solanacearum in Brazil. Despite quarantine procedures in Brazil, increasing levels of trade are a threat to biosecurity, and these results emphasize the need for improving our regional efforts to prevent the dispersal of pathogens.

(5) Pathways of mutational change in SARS-CoV-2 proteomes: The massive worldwide spread of the SARS-CoV-2 virus is fueling the COVID-19 pandemic. We investigated the genomic accumulation of mutations at various time points of the early pandemic to identify changes in mutationally highly active genomic regions that are occurring worldwide. The analysis revealed dominant variants, most of which were located in loop regions and on the surface of the proteins. Mutation entropy decreased between March and April of 2020 after steady increases at several sites, including sites in the spike (S) protein that were previously found to be associated with higher case fatality rates. Notable expanding mutations involve the nucleocapsid (N) protein inter-domain linker region and the viroporin encoded by ORF3a. These results predict an ongoing mutational shift from the spike and replication complex to other regions, especially to encoded molecules known to represent major β-interferon antagonists.
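The mutation entropy tracked in item (5) can be illustrated with a short sketch. The report does not define the measure, so this example assumes Shannon entropy of the residues observed at each aligned site; the function names and the four toy sequences are hypothetical and are not SARS-CoV-2 data.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one aligned site."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def alignment_entropy(seqs):
    """Per-site entropy for a list of equal-length aligned sequences."""
    return [site_entropy(col) for col in zip(*seqs)]


# Toy alignment: site 0 and site 3 are invariant, site 1 is split evenly
# between two residues, and site 2 carries one minority variant.
seqs = ["MKDL", "MKDL", "MRDL", "MRGL"]
entropy = alignment_entropy(seqs)
```

Computing such a profile for sequences sampled in successive time windows would show whether variability at a site is rising (a variant is spreading) or falling (a variant is becoming fixed or disappearing).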

Publications

  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Tomaszewski, T., DeVries, R.S., Dong, M., Bhatia, G., Norsworthy, M.D., Zheng, X. and Caetano-Anollés, G. 2020. New pathways of mutational change in SARS-CoV-2 proteomes involve regions of intrinsic disorder important for virus replication and release. Evolutionary Bioinformatics 16: 1176934320965149.
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Kang, C., Sun, F., Yan, L., Bai, J. and Caetano-Anollés, G. 2019. Genome-wide identification and characterization of the vacuolar H+-ATPase subunit H gene family in crop plants. International Journal of Molecular Sciences 20(20):5125.
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Mughal, F. and Caetano-Anollés, G. 2019. MANET 3.0: Hierarchy and modularity in evolving metabolic networks. PLoS One 14(10):e0224201.
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Nasir, A., Caetano-Anollés, G. and Claverie J.-M. 2019. Editorial: Viruses, genetic exchange, and the Tree of Life. Frontiers in Microbiology 10:2782.
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Ribeiro Santiago, T., Lopes, C.A., Caetano-Anollés, G. and Mizubuti, E.S.G. 2020. Genetic structure of Ralstonia solanacearum and Ralstonia pseudosolanacearum in Brazil. Plant Disease 104(4): 1019-1025.
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Bokhari, R.H., Manirjan, N., Jeong, H., Kim, K.M., Caetano-Anollés, G. and Nasir, A. 2020. Bacterial origin and reductive evolution of the CPR group. Genome Biology and Evolution 12(3): 103-121.
  • Type: Journal Articles Status: Published Year Published: 2020 Citation: Mughal, F., Nasir, A. and Caetano-Anollés, G. 2020. The origin and evolution of viruses inferred from fold family structure. 165(10): 2177-2191.


Progress 10/01/18 to 09/30/19

Outputs
Target Audience: Nothing Reported
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The project enables training of graduate and undergraduate students.
How have the results been disseminated to communities of interest? Research advances that were generated were published in well-reputed peer-reviewed journals, including Scientific Reports, Journal of Molecular Evolution and Evolutionary Bioinformatics. PI Caetano-Anollés reported findings at several universities and venues, including the famous 2019 Albany: 20th Conversation meeting and the Frontiers in Genomics Program at UNAM, Cuernavaca, Mexico.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? We coupled network and systems biology approaches and evolutionary genomic information to advance our understanding of the origin and evolution of crucial molecular systems such as the proteome, metabolic networks, elementary functionome networks, and networks of protein dynamics. We revealed important patterns of evolutionary recruitment that tailored the structural makeup of these systems. We also studied horizontal gene transfer (HGT) processes responsible for recruitments, using the human microbiome as a model system, and the rooting of phylogenomic trees important to understand the evolution of the proteome and the Tree of Life (ToL):

(1) Emergence of hierarchical modularity in evolving networks uncovered by phylogenomic analysis: Networks describe how parts associate with each other to form integrated systems which often have modular and hierarchical structure. In biology, network growth involves two processes, one that unifies and the other that diversifies. Here, we propose a biphasic (bow-tie) theory of module emergence. In the first phase, parts are at first weakly linked and associate variously. As they diversify, they compete with each other and are often selected for performance. The emerging interactions constrain their structure and associations. This causes parts to self-organize into modules with tight linkage. In the second phase, variants of the modules diversify and become new parts for a new generative cycle of higher level organization. The paradigm predicts the rise of hierarchical modularity in evolving networks at different timescales and complexity levels. Remarkably, phylogenomic analyses uncover this emergence in the rewiring of metabolomic and transcriptome-informed metabolic networks, the nanosecond dynamics of proteins, and evolving networks of metabolism, elementary functionomes, and protein domain organization.

(2) Horizontal gene transfer in human-associated microorganisms inferred by phylogenetic reconstruction and reconciliation: HGT is widespread in the evolution of prokaryotes, especially those associated with the human body. We implemented large-scale gene-species phylogenetic tree reconstructions and reconciliations to identify putative HGT-derived genes in the reference genomes of microbiota isolated from six major human body sites by the NIH Human Microbiome Project. Comparisons with a control group representing microbial genomes from diverse natural environments indicated that HGT activity increased significantly in the genomes of human microbiota, which confirms previous findings. Roughly more than half of the total genes in the genomes of human-associated microbiota were transferred (donated or received) by HGT. Up to 60% of the detected HGTs occurred either prior to the colonization of the human body or involved bacteria residing in different body sites. The latter could suggest 'genetic crosstalk' and movement of bacterial genes within the human body via hitherto poorly understood mechanisms. We also observed that HGT activity increased significantly among closely-related microorganisms and especially when they were united by physical proximity, suggesting that the 'phylogenetic effect' can significantly boost HGT activity. Finally, we identified several core and widespread genes least influenced by HGT that could become useful markers for building robust ToLs and address several outstanding technical challenges to improve the phylogeny-based genome-wide HGT detection method for future applications.

(3) Testing empirical support for evolutionary models that root the tree of diversified life: ToLs can only be rooted with direct methods that seek optimization of character state information in ingroup taxa. This involves optimizing phylogenetic tree, model, and data in an exercise of reciprocal illumination. Rooted ToLs have been built from a census of protein structural domains in proteomes using two kinds of models. Fully-reversible models use standard-ordered (additive) characters and Wagner parsimony to generate unrooted trees of proteomes that are then rooted with Weston's generality criterion. Non-reversible models directly build rooted trees with unordered characters and asymmetric stepmatrices of transformation costs that penalize gain over loss of domains. We tested the empirical support for the evolutionary models with character state reconstruction methods using two published proteomic datasets. We showed that the reversible models match reconstructed frequencies of character change and are faithful to the distribution of serial homologies in trees. In contrast, the non-reversible models went counter to trends in the data they must explain, attracting organisms with large proteomes to the base of the rooted trees while violating the triangle inequality of distances. This can lead to serious reconstruction inconsistencies that show model inadequacy. Our study highlights the aprioristic perils of disposing of countering evidence in natural history reconstruction.
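The triangle inequality mentioned in item (3) can be checked mechanically on any matrix of transformation costs. The sketch below is a generic illustration under stated assumptions: the two 3-state matrices are hypothetical and are not the stepmatrices used in the published datasets, and the function name is introduced here for illustration only.

```python
from itertools import permutations


def triangle_violations(cost):
    """List state triples (i, j, k) where the direct cost i -> k exceeds
    the cost of the indirect route i -> j -> k."""
    n = len(cost)
    return [(i, j, k) for i, j, k in permutations(range(n), 3)
            if cost[i][k] > cost[i][j] + cost[j][k]]


# Ordered (additive) character: the cost is the absolute difference between states.
additive = [[0, 1, 2],
            [1, 0, 1],
            [2, 1, 0]]

# Hypothetical asymmetric matrix that penalizes gains (row < column) over losses
# and makes the direct two-step gain far more costly than two single gains.
asymmetric = [[0, 2, 10],
              [1, 0, 2],
              [1, 1, 0]]

ok = triangle_violations(additive)
bad = triangle_violations(asymmetric)
```

Here the additive matrix yields no violations, while the asymmetric one reports the triple (0, 1, 2), because going from state 0 to state 2 directly costs more than passing through state 1.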

Publications

  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Jeong, H., Arif, B., Caetano-Anollés, G., Kim, K.M. and Nasir, A. 2019. Horizontal gene transfer in human-associated microorganisms inferred by phylogenetic reconstruction and reconciliation. Scientific Reports 9(1):12173.
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Diene, S.M., Pinault, L., Keshri, V., Armstrong, N., Khelaifia, S., Chabriere, E., Caetano-Anollés, G., Colson, P., La Scola, B., Rolain, J.M., Pontarotti, P. and Raoult, D. 2019. Human metallo-beta-lactamase enzymes degrade penicillin. Scientific Reports 9(1):12173.
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Caetano-Anollés, G., Aziz, F., Mughal, F., Gräter, F., Koç, I., Caetano-Anollés, K. and Caetano-Anollés, D. 2019. Emergence of hierarchical modularity in evolving networks uncovered by phylogenomic analysis. Evolutionary Bioinformatics 15: 1176934319872980.
  • Type: Journal Articles Status: Published Year Published: 2019 Citation: Caetano-Anollés, D., Nasir, A., Kim, K.M. and Caetano-Anollés G. 2019. Testing empirical support for evolutionary models that root the tree of life. J. Molecular Evolution 87(2-3):131-142.
  • Type: Conference Papers and Presentations Status: Published Year Published: 2019 Citation: Caetano-Anollés, G., Mughal, F. and Aziz M.F. 2019. 73. A double tale of evolutionary accretion in the structure of biological networks. J. Biomolecular Structure and Dynamics 37(S1):46-47.


Progress 10/01/17 to 09/30/18

Outputs
Target Audience: Nothing Reported
Changes/Problems: Nothing Reported
What opportunities for training and professional development has the project provided? The project enables training of graduate and undergraduate students.
How have the results been disseminated to communities of interest? Research advances that were generated were published in well-reputed peer-reviewed journals, including Frontiers in Microbiology, Frontiers in Bioengineering and Biotechnology, Bioessays, Evolutionary Bioinformatics, Science Progress, Biochimie, and Briefings in Bioinformatics. PI Caetano-Anollés reported findings at several universities and venues, including the Evolution - Genetic Novelty/Genomic Variations by RNA Networks and Viruses Conference that took place in Salzburg, Austria.
What do you plan to do during the next reporting period to accomplish the goals? Nothing Reported

Impacts
What was accomplished under these goals? We used the phylogenomic methodologies we have developed to advance the study of the dynamics of complex systems. Our initial effort focused on the study of the interface between gene expression and metabolic networks of plants subjected to stress (Objective 3). We also advanced our understanding of secreted proteins in bacteria, the exchange of genetic information between cellular organisms of the three superkingdoms of life, andthe genomic makeup of giant viruses. This knowledge will be needed to complete other objectives. (1)Metabolite-centric reporter pathway and tripartite network analysis of Arabidopsis under cold stress:The study of plant resistance to cold stress and the metabolic processes underlying its molecular mechanisms benefit crop improvement programs. Scientists have described biochemical changes that protect plant cells from damageand some of the genes controlling them. However, it is not clear how all the cellular processes involved in plant protection work together. Lacking this global view, plant breeders have struggled to engineer cold-tolerant crops. We investigated the effects of cold stress on the metabolic pathways of Arabidopsisthaliana, a small plant commonly studied to understand genetic and physiological processes. These effects were directly inferred at system level from transcriptome data. The strategy goes beyond the traditional approach of examining a single gene, protein, or biochemical pathway at a time. Instead, it simultaneously examines the entire collection of genes, metabolites, pathways, and reactions involved in the cold stress response.First, we coupled transciptome data and a database that annotates genes and gene products to identify significantly changed gene expressionat four time points of cold treatment. Second, a metabolite-centric reporter pathway analysis approach that we developed enabled the computation of metabolites significantly associated with transcripts. 
Third, tripartite networks of gene-metabolite-pathway connectivity outlined the response of metabolites and pathways to cold stress. Our metabolome-independent analysis revealed stress-associated metabolites in pathway routes of the cold stress response, including amino acid, carbohydrate, lipid, hormone, energy, photosynthesis, and signaling pathways. Cold stress first triggered the mobilization of energy from glycolysis and ethanol degradation to enhance TCA cycle activity via acetyl-CoA. Interestingly, tripartite networks lacked power law behavior and scale free connectivity, favoring modularity. Network rewiring explicitly involved energetics, signal, carbon and redox metabolisms and membrane remodeling. Our study contributes a possible route forward for plant breeders and biological engineers, though more research is required to determine if the pathways involved can be modified simultaneously. Specifically, the methodology will allow scientists to use systems biology tools to study metabolic reactions that populate important pathways, and collectively engineer enzymes to improve how plants respond to environmental insults. The use of complex networks that systematically link the activities of genes to relevant biological functions now open remarkable opportunities for genetic engineering and synthetic biology. (2)Phylogenetic profiling of secreted proteins translocated byspecialized bacterial secretion 'effector' systems: Proteins are secreted to the extracellular medium by free-living bacteria or directly injected into other competing organisms to hinder or kill. We explored an approach based on the evolutionary dependence that most of the effectors maintain with their specific secretion system to analyze the co-occurrence of any orthologous protein group and their corresponding secretion system across multiple genomes. 
We compared and complemented our methodology with sequence-based machine learning prediction tools for the type III, IV and VI secretion systems. Finally, we provided the predictive results for the three secretion systems in 1606 complete genomes at http://www.iib.unsam.edu.ar/orgsissec/.This study adds a much-needed new dimension to the protein secretion classification problem that is taxonomically unbiased and based on the concept of genome evolution. (3)Archaea-first and the co-evolutionary diversification of superkingdoms: The origins and evolution of the Archaea, Bacteria, and Eukarya remain controversial. Phylogenomic-wide studies of molecular features that are evolutionarily conserved, such as protein structural domains, suggest Archaea is the first superkingdom to diversify from a stem line of descent. This line embodies the last universal common ancestor of cellular life. We developed a model in which ancestors of Euryarchaeota co-evolved with those of Bacteria prior to the diversification of Eukarya. We found this co-evolutionary scenario was supported by comparative genomic and phylogenomic analyses of the distributions of fold families of domains in the proteomes of free-living organisms, which show horizontal gene recruitments and informational process homologies. We also found the modelbenefits from the molecular study of cell physiologies responsible for membrane phospholipids, methanogenesis, methane oxidation, cell division, gas vesicles, and the cell wall. Our theory however challenges popular cell fusion and two-domain of life scenarios derived from sequence analysis, demanding phylogenetic reconciliation. (4)Ancestrality and mosaicism of giant viruses supporting the definition of the fourth supergroup of life:Giant viruses of amoebae were discovered in 2003. Since then, their diversity has greatly expanded. They were suggested to form a fourth branch of life alongside Bacteria, Archaea, and Eukarya. 
Their origin and ancestrality remain controversial. Here, we specify the evolution and definition of giant viruses. Phylogenetic and phenetic analyses of informational gene repertoires of giant viruses and selected bacteria, archaea and eukaryota were performed, including structural phylogenomics based on protein structural domains grouped into 289 universal fold superfamilies (FSFs). Hierarchical clustering analysis was performed based on a binary presence/absence matrix constructed using 727 informational COGs from cellular organisms. The presence/absence of 'universal' FSF domains was used to generate an unrooted maximum parsimony phylogenomic tree. The gene content of a giant virus was also compared with those of a bacterium, an archaeon, and a eukaryote with small genomes. Overall, cladistic analyses based on gene sequences of very central and ancient proteins and on highly conserved protein fold structures, as well as phenetic analyses, were congruent regarding the delineation of a fourth branch of microbes composed of giant viruses. Giant viruses appeared as a basal group in the tree of all proteomes. A pangenome and core genome determined for Rickettsia bellii (bacterium), Methanomassiliicoccus luminyensis (archaeon), Encephalitozoon intestinalis (eukaryote), and Tupanvirus (giant virus) showed that a substantial proportion of Tupanvirus genes overlap with those of the cellular microbes. In addition, substantial genome mosaicism was observed, with 51, 11, 8, and 0.2% of Tupanvirus genes best matching those of viruses, Eukarya, Bacteria, and Archaea, respectively. Finally, we found that genes themselves may be subject to lateral sequence transfers. In summary, our data highlight the quantum leap between classical and giant viruses.
Phylogenetic and phyletic analyses, together with the study of protein fold superfamilies, confirm previous evidence of the existence of a fourth supergroup of life that includes giant viruses, and highlight its ancestrality and mosaicism. They also indicate that the best evolutionary representations for giant viruses and cellular microorganisms are rhizomes, and that sequence transfers rather than gene transfers have to be considered.
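As an illustration of the pangenome and core genome comparison described above, the following minimal sketch treats each genome as a set of gene-family identifiers. The pangenome is taken as the union of all sets and the core genome as their intersection. The family identifiers, genome labels, and function names are hypothetical placeholders and do not reproduce the data or software used in the study.

```python
def pan_and_core(genomes):
    """Return (pangenome, core genome) for a dict of genome -> gene-family set."""
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)        # families present in at least one genome
    core = set.intersection(*gene_sets)  # families present in every genome
    return pan, core


def shared_fraction(genomes, focal):
    """Fraction of the focal genome's families found in any other genome."""
    others = set().union(
        *(families for name, families in genomes.items() if name != focal)
    )
    focal_families = genomes[focal]
    return len(focal_families & others) / len(focal_families)


# Hypothetical toy data: gene-family identifiers per genome.
genomes = {
    "bacterium": {"f1", "f2", "f3"},
    "archaeon": {"f1", "f2", "f4"},
    "eukaryote": {"f1", "f5"},
    "giant_virus": {"f1", "f2", "f5", "f6"},
}

pan, core = pan_and_core(genomes)
overlap = shared_fraction(genomes, "giant_virus")
print(len(pan), sorted(core), overlap)
```

A set-based representation keeps the comparison independent of gene order and makes the overlap of one genome with the others a direct set intersection.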

Publications

  • Type: Journal Articles Status: Published Year Published: 2018 Citation: Caetano-Anollés, K., Caetano-Anollés, D., Nasir, A., Kim, K.M. and Caetano-Anollés, G. 2018. Order and polarity in character state transformation models that root the tree of life. Biochimie 149:135-136. doi: 10.1016/j.biochi.2018.04.001.
  • Type: Journal Articles Status: Published Year Published: 2018 Citation: Staley, J.T. and Caetano-Anollés, G. 2018. Archaea-first and the co-evolutionary diversification of domains of life. Bioessays 40(8):e1800036. doi: 10.1002/bies.201800036.
  • Type: Journal Articles Status: Published Year Published: 2018 Citation: Bai, J., Sun, F., Wang, M., Su, L., Li, R. and Caetano-Anollés, G. 2018. Genome-wide analysis of the MYB-CC gene family of maize. Genetica, doi: 10.1007/s10709-018-0042-y.
  • Type: Journal Articles Status: Published Year Published: 2018 Citation: Koç, I., Yuksel, I. and Caetano-Anollés, G. 2018. Metabolite-centric reporter pathway and tripartite network analysis of Arabidopsis under cold stress. Front. Bioeng. Biotechnol. 6:121. doi: 10.3389/fbioe.2018.00121.
  • Type: Journal Articles Status: Published Year Published: 2018 Citation: Caetano-Anollés, D., Caetano-Anollés, K. and Caetano-Anollés, G. 2018. Evolution of macromolecular structure: A double tale of biological accretion and diversification. Sci. Prog. 101(4):360-383. doi: 10.3184/003685018X15379391431599.
  • Type: Journal Articles Status: Published Year Published: 2018 Citation: Caetano-Anollés, G., Nasir, A., Kim, K.M. and Caetano-Anollés, D. 2018. Rooting phylogenies and the tree of life while minimizing ad hoc and auxiliary assumptions. Evol. Bioinform. 14:1176934318805101. doi: 10.1177/1176934318805101.
  • Type: Journal Articles Status: Published Year Published: 2018 Citation: Colson, P., Levasseur, A., La Scola, B., Sharma, V., Nasir, A., Pontarotti, P., Caetano-Anollés, G. and Raoult, D. 2018. Ancestrality and mosaicism of giant viruses supporting the definition of the fourth TRUC of microbes. Front. Microbiol. 9:2668. doi: 10.3389/fmicb.2018.02668.
  • Type: Book Chapters Status: Published Year Published: 2017 Citation: Caetano-Anollés, G., Minhas, B.F., Aziz, F., Mughal, F., Shahzad, K., Tal, G., Mittenthal, J.E., Caetano-Anollés, D., Koç, I., Nasir, A., Caetano-Anollés, K. and Kim, K.M. 2017. The compressed vocabulary of the proteins of Archaea. In: G. Witzany (ed.), Biocommunication of Archaea. Springer, Dordrecht, The Netherlands, pp. 147-264.
  • Type: Journal Articles Status: Published Year Published: 2018 Citation: Zalguizuri, A., Caetano-Anollés, G. and Lepek, V.C. 2018. Phylogenetic profiling, an untapped resource for the prediction of secreted proteins and its complementation with sequence-based classifiers in bacterial type III, IV and VI secretion systems. Brief. Bioinform. bby009, doi: 10.1093/bib/bby009.