TOPICS IN
COMPUTATIONAL MOLECULAR BIOLOGY
18.418
February 8, 2012
CRISPR, adaptive immunity system in Archaea and Bacteria: Lamarckian evolution and a general model of evolution of environmental sensors
Eugene Koonin
NIH
Senior Investigator

The CRISPR-Cas adaptive immunity system is present in nearly all Archaea and about half of Bacteria. This system consists of arrays of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) and suites of CRISPR-Associated (cas) genes; the CRISPR cassettes contain unique spacers about 40 base pairs in length within each repeat unit. Some of the spacers are identical to fragments of viral or plasmid genes. It has been shown that Cas proteins provide the enzymatic activities required to use the spacer transcripts as guide RNAs that cleave and inactivate the cognate alien DNA and, in some cases, possibly mRNA. The CRISPR-Cas systems are encoded by operons that have extraordinarily diverse architectures and a high rate of evolution of both the cas genes and the unique spacer content. Three complementary approaches to the study of CRISPR evolution will be presented. First, comprehensive analysis of the sequences and structures of Cas proteins using the most sensitive methods of computational analysis yielded a simple scenario for the origin and evolution of the CRISPR-Cas systems that implies the origin of prokaryotic adaptive immunity in thermophilic Archaea. Second, a comprehensive analysis of the selection processes that act on cas genes revealed a gradient from moderate to extremely weak purifying selection across the cas gene suite. Third, a mathematical model based on a cost-benefit analysis of the CRISPR-Cas system in the course of its coevolution with viromes of varying diversity was developed. Exploration of the parameter space of this model shows that selection prevents the loss of the CRISPR-Cas system within an interval of moderate viral diversity. At both very low and very high viral diversity, CRISPR-Cas systems become practically useless for bacteria and archaea, and are lost due to their intrinsic cost. This model has more general applications for the evolution of various environmental sensors.
The CRISPR-Cas systems that incorporate new information into a genome in response to environmental cues seem to present a case of bona fide Lamarckian evolution.

February 15, 2012
TBA
Erez Lieberman-Aiden
Harvard

New structures often emerge when we explore a known phenomenon from a more global vantage point. For instance, any given book can be read and comprehended. But what happens when we try to read all the books at once? Or: the local structure of DNA is a double helix. But if DNA did not fold further, the human genome - which is two meters long - could never fit inside the nucleus of a cell. How does it fold? This talk will focus on the extraordinary potential of technologies that enable us to zoom out, in the process transforming familiar concepts, like the contents of a book or the shape of DNA, into new research horizons. First, I will describe efforts, together with my collaborator Jean-Baptiste Michel and Google, to create tools for the quantitative analysis of a significant portion of the historical record. We began by constructing a reliable corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. 'Culturomics' provides insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. The Google Ngram Viewer, a simple web-based tool we released for the analysis of this corpus, was used over a million times in the first 24 hours. Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities. In the second half of my talk, I will describe Hi-C, a novel technology for probing the three-dimensional architecture of whole genomes. Developed together with collaborators at the Broad Institute and UMass Medical School, Hi-C couples proximity-dependent DNA ligation and massively parallel sequencing. My lab employs Hi-C to construct spatial proximity maps of the human genome. 
Hi-C maps have revealed that active and inactive portions of the human genome are spatially segregated, i.e., that cells employ a sort of 'regulatory origami' as they turn genes on and off. At the megabase scale, the genomic fold is consistent with a fractal globule, a knot-free conformation that enables maximally dense packing while preserving the ability to easily fold and unfold any genomic locus.
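
As a rough illustration of what a spatial proximity map is, the sketch below bins pairs of genomic positions into a symmetric matrix of contact counts. The function name `contact_map`, the bin size, and the toy position pairs are assumptions made for illustration only; they are not taken from the Hi-C pipeline described in the talk.

```python
def contact_map(pairs, genome_length, bin_size):
    """Count contacts between fixed-size genomic bins.

    pairs: iterable of (pos_a, pos_b) positions on one chromosome
           (hypothetical input; positions must be < genome_length).
    Returns a symmetric list-of-lists matrix of contact counts.
    """
    n_bins = (genome_length + bin_size - 1) // bin_size  # ceiling division
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos_a, pos_b in pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror the count to keep the matrix symmetric
    return matrix


# Toy data (assumed): positions in base pairs on a 4 Mb region, 1 Mb bins.
toy_pairs = [
    (100_000, 150_000),
    (200_000, 3_500_000),
    (1_200_000, 1_900_000),
    (3_100_000, 250_000),
]
toy_map = contact_map(toy_pairs, genome_length=4_000_000, bin_size=1_000_000)
for row in toy_map:
    print(row)
```

In a map like this, frequent contacts between bins that are far apart along the sequence are the signal that two loci sit close together in space.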

February 22, 2012
TBA

February 29, 2012
Shape Shifting: protein statistical physics as a linear programming problem
Jeremy England
MIT http://www.mit.edu

Since a protein's shape typically provides the basis for its function, the conformational rearrangements of proteins in response to ligand binding, mutation, and covalent modification very often underlie biologically important molecular events, whether in the normal course of transducing a signal or through deleterious misfolding. A new analytical model of how structure depends on sequence enables us to use linear programming to examine many of these phenomena from the standpoint of statistical mechanics, so that we may begin to predict and explain specific changes in protein structure ranging from allosteric motion to the onset of aggregation disease.

March 7, 2012
Reducing microbial unemployment: functional roles in the human microbiome
Curtis Huttenhower
Harvard

Among many surprising insights, the genomic revolution has helped us to realize that we're never alone and, in fact, barely human. For most of our lives, we share our bodies with some ten times as many microbes as human cells; these are resident in our gut and on nearly every body surface, and they are responsible for a tremendous diversity of metabolic activity, immunomodulation, and intercellular signaling. In order to understand these microbes' relationship with their hosts, however, we must establish how homeostasis is maintained in health or dysregulated in disease. I will present an overview of microbial metabolism and function core to the healthy human microbiome and a survey of microbes that cooperate and compete to fulfill these metabolic roles. Since even bacteria within the same "species" regularly carry strikingly different genomes, it is critical to identify community membership at the species or strain level whenever possible. Finally, I will discuss how metabolic function normally present in the gut microbiota is disrupted in inflammatory diseases such as Crohn's disease and ulcerative colitis.

March 14, 2012
Protein structure determination using model fragments from the Protein Data Bank
Ian Stokes-Rees
Harvard http://www.harvard.edu

The Protein Data Bank (PDB) contains over 70,000 macromolecular structures with atomic coordinates. Most of these models have been established using X-ray crystallography and a technique called Molecular Replacement (MR), which is used to bootstrap the phase information that is lost in the diffraction data. MR relies on a good selection of a model fragment, usually covering 15-40% of the unknown structure, that has sufficient structural homology for initial phase estimates to be calculated. Certain structures do not have known structural homologs, making MR difficult or impossible, and in other cases known (or anticipated) structural homologs do not provide successful phasing through the MR process. By harnessing US national cyberinfrastructure we have demonstrated that searches for good MR template models can be expanded to the entire fragment database derived from the PDB, the Structural Classification of Proteins (SCOP), consisting of over 100,000 domains. Such a search can consume over 50,000 hours of computing, but has been shown to identify models suitable for phasing where no other technique had previously succeeded. This can also overcome challenges of low sequence identity and small model fragments (<10% of the structure). (PNAS Vol 107, 2010 DOI:10.1073/pnas.1012095107)
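
To put the quoted figures in perspective, the short calculation below works out the average cost per domain and an idealized wall-clock time. The domain count and CPU-hour total are the lower bounds stated in the abstract; the core count `assumed_cores` is a hypothetical value chosen only to illustrate the scaling, not a number reported by the speaker.

```python
# Figures quoted in the abstract (both are lower bounds: "over").
domains = 100_000
cpu_hours = 50_000

# Average compute cost of trying one domain as an MR search model.
minutes_per_domain = cpu_hours * 60 / domains

# Hypothetical pool size (assumption), with perfect parallel scaling.
assumed_cores = 2_000
wall_clock_hours = cpu_hours / assumed_cores

print(f"{minutes_per_domain:.0f} CPU-minutes per domain")
print(f"{wall_clock_hours:.0f} hours on {assumed_cores} cores (ideal scaling)")
```

Because each domain can be tried independently, a search of this size lends itself to distributed computing, which is consistent with the use of national cyberinfrastructure mentioned above.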
© MIT 2015