TOPICS IN
COMPUTATIONAL MOLECULAR BIOLOGY
18.418
February 3, 2010: Introduction to 18.418
Bonnie Berger
Massachusetts Institute of Technology
Professor, Applied Mathematics

An introduction to the course.

February 8, 2010: Class

Overview of data used in computational biology and a brief discussion of the paper.

February 10, 2010: Cancelled Due to Snow

February 16, 2010: Class

Unfortunately, the talk scheduled for the previous Wednesday had to be moved to a later date, but we will read one of Dana Pe'er's papers and discuss it in class. The presenters are:

  • Defendant: Mark Lipson
  • Committee: Sara Sheehan

February 17, 2010: Network Medicine: From Cellular Networks to the Human Diseasome
Albert-Laszlo Barabasi
Northeastern University and Harvard Medical School
Emil T. Hofman Professor
Director, Center for Complex Network Research

The ultimate goal of understanding sub-cellular networks is to gain insights into normal cellular functions and to understand the microscopic nature of perturbations that could lead to human diseases. A network of disorders and disease genes linked by known disorder-gene associations offers a platform to explore in a single graph-theoretic framework all known phenotype and disease gene associations, indicating the common genetic origin of many diseases. We find that the vast majority of disease genes are nonessential and show no tendency to encode hub proteins, and their expression pattern indicates that they are localized in the functional periphery of the network. We also study the evolution of patient illness using a network summarizing the disease associations extracted from 32 million Medicare claims, demonstrating that the cellular-level links between disease-causing proteins are amplified in the population as comorbidity patterns.
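One way to picture the disorder-gene network in this abstract is as a bipartite graph projected onto disorders: two disorders are linked when they share at least one associated gene. The following is a minimal sketch of that projection, not the speaker's actual pipeline. The function name `project_disorders` and the disorder and gene names are made-up placeholders, not real associations.

```python
from itertools import combinations

# Hypothetical disorder -> gene associations (placeholder names, not real data).
disorder_genes = {
    "disorder_A": {"GENE1", "GENE2"},
    "disorder_B": {"GENE2", "GENE3"},
    "disorder_C": {"GENE4"},
}


def project_disorders(assoc):
    """Link two disorders when they share at least one gene.

    Returns a dict mapping a sorted disorder pair to the set of shared genes.
    """
    edges = {}
    for d1, d2 in combinations(sorted(assoc), 2):
        shared = assoc[d1] & assoc[d2]
        if shared:
            edges[(d1, d2)] = shared
    return edges


disease_network = project_disorders(disorder_genes)
print(disease_network)
```

In this toy input only disorder_A and disorder_B share a gene, so they form the single edge of the projected network; disorder_C stays isolated.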

February 22, 2010: Class

Please read the seminal disease-network paper below (the first paper). The presenters are:

  • Defendant: Angela Yen
  • Committee: George Tucker

February 24, 2010. Starting at 11am: Visualizing the drug-target landscape
Enoch Huang
Pfizer Inc.
Executive Director, Head of Computational Sciences, Pfizer Research Centers of Emphasis
Boston University
Adjunct Assistant Professor, Bioinformatics

Stephen Campbell
Pfizer Inc.

Generating new therapeutic hypotheses for human disease requires the analysis and interpretation of many different experimental datasets. Assembling a holistic picture of the current landscape of drug discovery activity remains a challenge, however, because of the lack of integration between biological, chemical and clinical resources. Although tools designed to tackle the interpretation of individual data types are abundant, systems that bring together multiple elements to directly enable decision making within drug discovery programmes are rare. In this article, we review the path that led to the development of a knowledge system to tackle this problem within our organization and highlight the influences of existing technologies on its development. Central to our approach is the use of visualization to better convey the overall meaning of an integrated set of data including disease association, druggability, competitor intelligence, genomics and text mining. Organizing such data along lines of therapeutic precedence creates clearly distinct "zones" of pharmaceutical opportunity, ranging from small-molecule repurposing to biotherapeutic prospects and gene family exploitation. Mapping content in this way also provides a visual alerting mechanism that evaluates new evidence in the context of old, reducing information overload by filtering redundant information. In addition, we argue the need for more tools in this space and highlight the role that data standards, new technologies and increased collaboration might have in achieving this aim.

March 1, 2010: Class

Please read the paper that pertains to the previous talk. To view this paper you will need MIT certificates and must go through libproxy.mit.edu. Once you follow the link, you might expect the PDF link to appear on that page, but it does not. Instead, go to the table of contents for the journal issue by clicking the text "Volume 15, Issues 1-2" that appears several lines above the article's title. The PDF version of the paper will then be listed under "Reviews", titled "Visualizing the drug target landscape".

The presenters are:

  • Defendant 1: Sara Sheehan
  • Defendant 2: Minh Huynh-Le
  • Committee: Alex Levin

March 3, 2010: Driving Mutations: Lessons from Yeast and Cancer
Dana Pe'er
Columbia University
Assistant Professor, Biological Sciences

In this talk we will discuss a new perspective on the question "How does variation in genotype encode for phenotypic diversity?" We will discuss methods that harness gene expression to identify genetic variants that influence a trait of interest. Our premise is that much of the influence of genotype on phenotype is mediated by changes in the regulatory network and these can be inferred using gene expression. We will discuss 3 vignettes: 1) We will demonstrate how biological modularity can be used to gain significant statistical power for linkage analysis of eQTLs (expression Quantitative Trait Loci). This will provide a refined map of linkages that provides new insight into non-additivity in genetic interactions and the influence of genetic variation on how a cell sees its environment. 2) We will present Camelot, an algorithm that integrates genotype and gene expression collected in a reference condition (un-drugged) and phenotype data to predict complex quantitative phenotypes in entirely different conditions (drug response) and identify causal genes that influence these traits. We will discuss why gene expression gives such a large boost in power. 3) We will present Conexic, a novel Bayesian Network-based framework to integrate chromosomal copy number and gene expression data to detect genetic alterations in tumors that drive proliferation, and to model how these alterations perturb normal cell growth/survival. We demonstrate how this method can uncover an important role of protein trafficking in melanoma.

March 8, 2010: Class

Since we already read one of Dana's papers, we will be covering another druggability-related paper. To access this paper you will need to have MIT certificates installed. The presenters are:

  • Defendant: Eric Eisner
  • Committee: Angela Yen

March 10, 2010: Computational inference of tumor heterogeneity for cancer phylogenetics
Russell Schwartz
Carnegie Mellon University
Associate Professor, Department of Biological Science and Computer Science

While cancer can in theory develop from a seemingly infinite variety of combinations of mutations, in practice most tumors seem to fall into a relatively small number of recurring sub-types characterized by roughly equivalent sequences of genetic abnormalities by which healthy cells progress into increasingly aggressive tumors. This observation raises the hope that identifying these common sub-types and their defining genetic features will lead to new prognostic markers and drug targets. One promising approach to this problem is "tumor phylogenetics": treating tumors as evolving populations and analyzing their likely evolutionary pathways through phylogenetic algorithms. Two main variants of this approach have been proposed: a tumor-by-tumor approach, in which one treats each observed tumor in a population as a possible end state in a phylogenetic tree or network; and a cell-by-cell approach, in which one examines differences between individual cells in a tumor sample to build trees explaining variation both within and between tumors. The latter approach has the advantage of allowing one access to information about within-tumor heterogeneity that can provide important clues about conserved pathways of tumor progression, but at the cost of allowing one to examine only a few markers of state per cell versus the genome-wide marker sets one can apply to samples of whole tumors or significant sub-regions thereof.

Here, we will examine recent work intended to give us many of the advantages of each of the two approaches to tumor phylogenetics. This work uses computational inferences of within-tumor heterogeneity to infer cell-by-cell progression from tissue-wide measures of tumor state. The approach builds on the use of "unmixing" methods that allow us to treat each tumor as a mixture of fundamental cell states and computationally infer the states and their usage in each tumor. We will see how one can pose tumor unmixing as a problem in computational geometry. We will then examine algorithms for solving this problem in the presence of noisy, high-dimensional assays of tumor state. Finally, we will see how the resulting mixture models can be applied for phylogenetic studies on development. The methods will be illustrated by application to genome-wide expression and DNA copy number data sets from lung and breast tumors. The results of these studies show the promise of computational inferences as a way of gaining the advantages of both genome-wide assays of tumor state and within-tumor heterogeneity to develop a more complete picture of the common mechanisms of tumor progression.

Joint work with Stanley Shackney, Ayshwarya Subramanian, David Tolliver, and Charalampos Tsourakakis
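The "unmixing" idea in this abstract can be illustrated geometrically in its simplest case: if a tumor profile is a mixture of two cell states, it lies on the line segment between them, and the mixing fraction is recovered by projecting onto that segment. The sketch below assumes the two pure states are already known, which is the easier half of the problem; the work described in the talk also infers the states themselves. The function `mixing_fraction` and all profile values are hypothetical.

```python
def mixing_fraction(sample, state_a, state_b):
    """Least-squares fraction f of state_a such that
    sample ~= f * state_a + (1 - f) * state_b, clipped to [0, 1]."""
    direction = [a - b for a, b in zip(state_a, state_b)]
    offset = [s - b for s, b in zip(sample, state_b)]
    denom = sum(d * d for d in direction)
    if denom == 0:
        raise ValueError("the two states must differ")
    f = sum(o * d for o, d in zip(offset, direction)) / denom
    return min(1.0, max(0.0, f))


# Hypothetical profiles for two pure cell states (three markers each).
healthy = [1.0, 0.0, 2.0]
aggressive = [3.0, 4.0, 0.0]
# A simulated tumor that is 25% "aggressive" and 75% "healthy".
tumor = [0.25 * a + 0.75 * h for a, h in zip(aggressive, healthy)]

print(round(mixing_fraction(tumor, aggressive, healthy), 6))
```

With more than two states the segment becomes a simplex, and the projection becomes a constrained least-squares problem.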

March 15, 2010: Class

The presenters are:

  • Defendant 1: Eric Price
  • Defendant 2: George Tucker
  • Committee: Mark Lipson

March 17, 2010: Recent results on RNA
Peter Clote
Boston College
Professor, Department of Biology and Department of Computer Science
Ecole Polytechnique & Universite Paris-Sud
Digiteo Chair, Computer Science

In this talk, we present new results concerning RNA structure obtained by our groups in Boston and Paris. (1) We describe a quadratic time and linear space segmentation algorithm for RNA secondary and tertiary structure, with applications to localization of genes within a high scoring window of a gene finder. (2) We describe a TABU (local search) algorithm that determines near optimal folding pathways between two given RNA secondary structures. Since this problem is known to be NP-complete, there is an interest in developing such efficient approximation algorithms. (3) We describe a novel implementation of a non-Boltzmannian sampling algorithm for RNA secondary structures with several applications.

This work is joint with I. Dotu, F. Lau, W.A. Lorenz, P. Van Hentenryck.

March 22, 2010: Spring Break! No Class.

March 24, 2010: Spring Break! No Class.

March 29, 2010: Class

The presenters are:

  • Defendant: Timur Zhiyentayev
  • Committee: Robert Chen

March 31, 2010: Graphical models of interacting proteins
Chris Bailey-Kellogg
Dartmouth College
Associate Professor, Computer Science

When studying a particular class of protein-protein interactions (e.g., PDZ domains and their ligands, or serine proteases and their inhibitors), a central goal is to model the underlying determinants of molecular recognition in a manner supporting prediction of and design for affinity and specificity. We have been addressing that goal by developing probabilistic graphical models for interacting proteins, one model centered on sequence information and one on structural information. Our sequence-based model integrates multiple sequence alignments and information about which pairs of proteins are known to interact, in order to extract residue "cross-coupling" constraints and thereby evaluate the plausibility of new interactions. We show that our approach uncovers biologically interesting constraints, successfully identifies known interactions, and makes explainable predictions about novel interactions. Our structure-based model employs molecular modeling and evaluates possible side-chain conformations, in order to compute partition functions and thereby predict changes in free energy of association for variants. We demonstrate that our approach is fast and accurate, and improves prediction error relative to gold-standard approaches due to its explicit computation of free energies.

Joint work with Alan Friedman (Purdue), Chris Langmead (CMU), and Naren Ramakrishnan (Va. Tech.)
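The structure-based model in this abstract computes partition functions over side-chain conformations to predict free energy changes. As a rough illustration of that relationship only, the sketch below computes Z as a Boltzmann sum over conformation energies and the free energy as -kT ln Z, then compares two ensembles. It is a simplification that ignores the bound/unbound bookkeeping needed for a free energy of association, and all energies are hypothetical values.

```python
import math

KT = 0.593  # k_B * T in kcal/mol at roughly 298 K


def partition_function(energies, kt=KT):
    """Boltzmann sum over conformation energies (kcal/mol)."""
    return sum(math.exp(-e / kt) for e in energies)


def free_energy(energies, kt=KT):
    """Free energy -kT ln Z of the conformational ensemble."""
    return -kt * math.log(partition_function(energies, kt))


# Hypothetical side-chain conformation energies (kcal/mol).
wild_type = [-5.0, -4.5, -3.0]
variant = [-4.0, -3.8, -2.0]

delta = free_energy(variant) - free_energy(wild_type)
print(f"change in free energy: {delta:.2f} kcal/mol")
```

Because the sum runs over every conformation, the ensemble free energy is always at or below the single lowest conformation energy, which is the entropic contribution that a lowest-energy-only estimate would miss.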

April 5, 2010: Class

The presenters are:

  • Defendant: Nur Shahir
  • Committee 1: Janice Jang
  • Committee 2: Boling Jiang

April 7, 2010: Computational Epidemiology
John Brownstein
Harvard Medical School
Assistant Professor of Pediatrics
Director, Computational Epidemiology Group

The rapid global reach of telecommunications has permitted public health professionals to communicate more effectively. In particular, internet-based resources such as discussion sites and online news sources, accessible through free and unrestricted subscription, are valuable sources of information. These data also exemplify unprecedented potential for increasing public awareness of public health issues and early warning of disease outbreaks prior to their widespread recognition. However, despite an abundance of disparate electronic resources, none is comprehensive. Each has geographic, population and expertise gaps. There is a lack of integration between tools and information sources, and the output of these systems often comes as unstructured free text. Today, I will discuss the current capabilities and future directions in the use of non-traditional data sources for the purposes of public health surveillance. I will specifically discuss the application of Web 2.0 and mobile phones as new approaches to rapid detection of emerging infectious diseases. In particular, I will describe a system, Healthmap.org, a free and open resource that attempts to address these issues by enhancing surveillance of infectious diseases through integration. Drawing from recent examples such as the H1N1 pandemic and the Haiti earthquake, I will demonstrate how new surveillance technology is providing early warning and tracking of new and emerging public health threats.

April 12, 2010: Class

The presenters are:

  • Defendant: Chris Lin
  • Committee: Eric Eisner

April 14, 2010: Characterization of Somatic Mutations in Cancer Genomes
Ben Raphael
Brown University
Assistant Professor, Computer Science and the Center for Computational Molecular Biology

Cancer is a disease that is driven by somatic mutations that accumulate in the genome during an individual’s lifetime. Recent advances in DNA sequencing technology are enabling genome-wide measurements of these mutations in numerous cancers. I will discuss algorithmic approaches for two problems that arise in cancer genome analysis. The first problem is the inference of somatic mutations from the short sequences produced by current DNA sequencing technologies. Somatic mutations in cancer occupy a continuum of scales ranging from single nucleotide mutations through structural rearrangements of large blocks of DNA sequence. I will describe an algorithm for classification and comparison of structural rearrangements using paired-read DNA sequencing data. The second problem is to distinguish functional mutations that drive cancer progression from neutral “passenger” mutations. Recent cancer sequencing studies have shown that somatic mutations are distributed over a large number of genes. This mutational heterogeneity is due in part to the fact that somatic mutations target cellular signaling and regulatory pathways, and that a mutation in dozens of possible genes might be sufficient to perturb a pathway. While some of these pathways are well characterized, many others are only approximately known. This approximate information is represented as an interaction network, a graph whose nodes are genes and whose edges represent biological interactions between genes. I will describe HotNet, an algorithm to identify subnetworks of an interaction network that are mutated in a significant number of cancer genomes. HotNet models mutations as heat sources and employs a diffusion process on the interaction network to find “hot subnetworks.” We also derive a statistical test to rigorously assess whether the number of hot subnetworks is significant under a suitable null hypothesis. I will illustrate applications of these algorithms to data from The Cancer Genome Atlas, a project that is characterizing the genomes of thousands of samples from dozens of cancer types.
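To give a feel for the "mutations as heat sources" idea in this abstract, here is a minimal sketch of heat spreading over a small interaction graph. It uses a simple random-walk-with-restart style update as a stand-in; it is not HotNet's actual diffusion kernel or statistical test. The graph, the gene labels, the `restart` parameter, and the 0.05 threshold are all illustrative assumptions.

```python
def diffuse(graph, sources, restart=0.4, steps=50):
    """Spread heat from mutated genes over an undirected interaction graph.

    graph: dict mapping gene -> set of neighbouring genes (symmetric).
    sources: dict mapping gene -> source heat (e.g. mutation frequency).
    Each step, a node receives `restart` times its source heat and passes
    the remaining (1 - restart) share of its current heat equally to its
    neighbours. Heat at a node with no neighbours is not passed on.
    """
    heat = {g: sources.get(g, 0.0) for g in graph}
    for _ in range(steps):
        nxt = {g: restart * sources.get(g, 0.0) for g in graph}
        for g, neighbours in graph.items():
            if not neighbours:
                continue
            share = (1.0 - restart) * heat[g] / len(neighbours)
            for n in neighbours:
                nxt[n] += share
        heat = nxt
    return heat


# Toy network: a path A-B-C plus a separate pair D-E; only A is mutated.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
sources = {"A": 1.0}

heat = diffuse(graph, sources)
hot = sorted(g for g, h in heat.items() if h > 0.05)
print(hot)
```

Heat decays with network distance from the mutated gene, and the disconnected pair D-E receives none, so thresholding the heat picks out the connected neighbourhood around A as a candidate "hot" subnetwork.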

April 19, 2010: Patriots' Day! No Class.

Although there is no class due to the Patriots' Day holiday, please make sure to read and submit a summary for the following paper by class time on Wednesday, April 21.

April 21, 2010: Preventing the Incidentalome and Practicing a Responsible Personalized Medicine.
Isaac Kohane
Children's Hospital, Boston
Chair, Informatics Program
Harvard University
Lawrence J. Henderson Professor of Pediatrics and Health Sciences and Technology

The promise of genomic medicine includes personalizing diagnoses and therapies, if not to the level of the individual, then to a small population of individuals with shared pathophysiology. The availability of commodity-priced genome-scale assays has led to increased popular demand and expectation for the application of these assays to clinical care. Yet there are several structural impediments to the safe practice of genomic medicine, all of which fall within the domain of biomedical informatics. These include (A) the growth of the Incidentalome, the tsunami of false positives that inevitably result from application of massively parallel tests; (B) the lack of systematic interpretations of genomic tests and evaluation of their performance; (C) the absence of a mechanism to transfer the growing knowledge of genomics to the physician at the point of care; and (D) the increasingly blurry line between clinical research and clinical care. Fortunately there are several developments in biomedical informatics that address these impediments. Foremost among these is the movement to treat subjects as full and autonomous partners in research collaborations even as they continue to be treated as patients, and secondarily the industrialization of phenotyping and sample acquisition methodologies to match the efficiencies of genome-scale measurements. I will review both the challenges and leading exemplars of the solutions brought to bear.
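The arithmetic behind the Incidentalome is easy to check: even a very accurate test produces false positives almost surely once it is applied thousands of times in parallel. The sketch below assumes the tests are independent and share one false-positive rate, which is a simplification, and the numbers (10,000 tests at a 0.1% false-positive rate) are illustrative and do not come from the talk.

```python
def expected_false_positives(n_tests, false_positive_rate):
    """Expected number of false positives across n_tests tests."""
    return n_tests * false_positive_rate


def prob_any_false_positive(n_tests, false_positive_rate):
    """Probability of at least one false positive, assuming independence."""
    return 1.0 - (1.0 - false_positive_rate) ** n_tests


# Illustrative numbers only: 10,000 parallel tests, each 99.9% specific.
n_tests = 10_000
fpr = 0.001

print(expected_false_positives(n_tests, fpr))
print(prob_any_false_positive(n_tests, fpr))
```

Under these assumptions a healthy individual would expect about ten false alarms from a single genome-scale panel, and the chance of getting none is negligible.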

April 26, 2010: Class

Please read and summarize both of these papers. The presenters are:

  • Defendant 1: Janice Jang
  • Defendant 2: Boling Jiang
  • Committee: Nur Shahir

April 28, 2010: Remote Homology Detection: Beyond Hidden Markov Models
Lenore Cowen
Tufts University
Professor, Computer Science

Profile Hidden Markov Models remain one of the best general tools to recognize proteins that fold into the same structural motif as a solved protein structure. We look at two different approaches to generalize them to do better at predicting remote protein homologs. The first incorporates evolutionary information in a novel way to construct a better artificial training set. The second is a framework for incorporating pairwise preferences in beta-sheet formation together with HMM models into a Markov Random Field. We show how the second framework can be used to better predict beta-propeller folds, and identify an interesting family of bacterial hybrid two-component sensor system proteins whose N-terminal region we predict contains a double beta-propeller.

May 3, 2010: Class

The presenters are:

  • Defendant: Robert Chen
  • Committee: Minh Huynh-Le

May 5, 2010: Genetic Circuits
Ron Weiss
Massachusetts Institute of Technology
Department of Biological Engineering
Department of Electrical Engineering and Computer Science

Synthetic biology is revolutionizing how we conceptualize and approach the engineering of biological systems. Recent advances in the field are allowing us to expand beyond the construction and analysis of small gene networks towards the implementation of complex multicellular systems with a variety of applications. In this talk I will describe our integrated computational / experimental approach to engineering complex behavior in living systems ranging from bacteria to stem cells. In our research, we appropriate useful design principles from electrical engineering and other well-established fields. These principles include abstraction, standardization, modularity, and computer-aided design. But we also spend considerable effort towards understanding what makes synthetic biology different from all other existing engineering disciplines and discovering new design and construction rules that are effective for this unique discipline.

We will briefly describe the implementation of genetic circuits with finely-tuned digital and analog behavior and the use of artificial cell-cell communication to coordinate the behavior of cell populations for programmed pattern formation. Recent results on implementing Turing patterns with engineered bacteria will be presented. Arguably the most significant contribution of synthetic biology will be in medical applications such as tissue engineering. We will discuss preliminary experimental results for obtaining precise spatiotemporal control over stem cell differentiation. For this purpose, we couple elements for gene regulation, cell fate determination, signal processing, and artificial cell-cell communication. We will conclude by discussing the design and preliminary results for creating an artificial tissue homeostasis system where genetically engineered stem cells indefinitely maintain a desired level of pancreatic beta cells despite attacks by the autoimmune response. The system, which relies on artificial cell-cell communication, various regulatory network motifs, and programmed differentiation into beta cells, may one day be useful for the treatment (or cure) of diabetes.

May 10, 2010: Class

The presenters are:

  • Defendant: Eric Price
  • Committee: George Tucker

© MIT 2015