February 2, 2011 | Introduction to 18.418 |
Bonnie Berger Massachusetts Institute of Technology Professor, Applied Mathematics An introduction to the course. |
February 7, 2011 | Class |
February 9, 2011 | Discriminating coding and non-coding RNAs using comparative sequence analysis |
Stefan Washietl Massachusetts Institute of Technology Postdoctoral Fellow, Kellis Group, CSAIL In my talk, I will first briefly review challenges and the current state of the art for genome-wide annotation of non-coding RNAs. To locate non-coding RNAs in a genome accurately, it turned out to be critical to know which parts are actually coding. Although there are many sophisticated protein gene finders and very good annotations exist for most model organisms, there are also ambiguous and non-standard situations in which these programs fail. We have therefore developed a new algorithm called "RNAcode", a program to detect coding regions in multiple sequence alignments that is optimized for emerging applications not covered by current protein gene finding software. Our algorithm combines evolutionary information from nucleotide substitution and gap patterns in a unified framework and also deals with real-life issues such as alignment and sequencing errors. It uses an explicit statistical model with no machine learning component and can therefore be applied "out of the box", without any training, to data from all domains of life. I will demonstrate how RNAcode was used in combination with mass spectrometry experiments to predict and confirm seven novel short peptides in E. coli that have evaded annotation so far. As another example of a typical application, I will show how RNAcode can be used together with the structural RNA gene finder RNAz to study ambiguous cases of dual function genes that function on both the RNA and protein level. |
February 14, 2011 | Class |
February 16, 2011 | The evolution of eusociality |
Corina Tarnita Harvard University Junior Fellow, Society of Fellows Eusociality, in which some individuals reduce their lifetime reproductive potential to raise the offspring of others, underlies the most advanced forms of social organization and the ecologically dominant role of social insects. For the past four decades, kin selection theory, based on the concept of inclusive fitness, has been the major theoretical attempt to explain the evolution of eusociality. In this talk I propose that standard natural selection theory in the context of precise models of population structure represents a simpler and superior approach, allows the evaluation of multiple competing hypotheses, and provides an exact framework for interpreting empirical observations. |
February 21, 2011 | Class |
February 23, 2011 | Probabilistic Graphical Model for Protein Structure Prediction |
Jinbo Xu Toyota Technological Institute at Chicago Assistant Professor If we know the primary sequence of a protein, can we predict its three-dimensional structure by computational methods? This is one of the most important and difficult problems in computational molecular biology and has tremendous implications for protein functional study and drug discovery. Existing computational methods for protein structure prediction can be broadly classified into two categories: template-based modeling (i.e., protein threading/homology modeling) and template-free modeling (i.e., ab initio folding). Template-based modeling predicts structure of a protein using experimental structures in the Protein Data Bank (PDB) as templates while template-free modeling predicts protein structure without depending on a template. This talk will present new probabilistic graphical models for knowledge-based protein structure prediction. In particular, this talk will present a regression-tree-based Conditional Random Fields (CRF) method for template-based modeling and a Conditional Random Fields/Conditional Neural Fields (CRF/CNF) method for template-free modeling. Experimental results indicate that our template-based method performs extremely well, especially on hard template-based modeling targets and our template-free method is also very promising for mainly-alpha proteins. |
February 28, 2011 | Class |
March 2, 2011 | Ensemble Predictions of beta-sheet Protein Structures |
Jerome Waldispuhl McGill University Assistant Professor In this talk, I will describe my work in the area of protein structure prediction. I will introduce new ensemble modeling techniques which can analyze and predict an entire landscape of structural solutions, rather than simple single answer optimizations. This philosophy has a broad impact on our understanding of protein folding properties. |
March 9, 2011 | Evolutionary dynamics of cancer |
Franziska Michor Dana-Farber Cancer Institute and Harvard School of Public Health Associate Professor |
March 16, 2011 | Information from Networks |
Leonid Chindelevitch Pfizer The networks describing the interaction between different biological entities can yield a lot of interesting information if analyzed properly. This talk will describe the analysis of two kinds of networks: metabolic networks and causal regulatory networks. We will construct mathematical models to ask questions of each kind of network, describe the algorithms required to provide answers, and finally discuss the kind of biological insights that arise from this analysis. |
March 23, 2011 | Spring Break |
March 30, 2011 | How do cells pack their DNA, and why do we care about it |
Leonid Mirny |
April 6, 2011 | Evidence of abundant stop codon readthrough in Drosophila and other metazoa |
Irwin Jungreis Massachusetts Institute of Technology Research Scientist, Kellis Lab When encountering the stop codons of certain genes, ribosomes will insert a standard amino acid and continue translating, instead of stopping. While such stop codon readthrough occurs in many viral genomes, it has been observed for only a handful of eukaryotic genes. In 2007, Mike Lin found comparative genomics evidence that for 149 Drosophila genes the open reading frame following the stop codon is protein-coding, hinting that stop codon readthrough might be common in Drosophila. We have applied a wealth of bioinformatics techniques and genome-wide data sets to:
|
April 13, 2011 | Techniques for the analysis of ancient DNA |
Nick Patterson Broad Institute Two papers published last year described the analysis of DNA of Neandertals found in Vindija Cave, Croatia and DNA of a hominin from Denisova Cave, Siberia. I briefly describe the main results, but then go into more detail on the analysis which uses some novel methodology. |
April 20, 2011 | Dimensionality reduction in the analysis of human genetics data |
Petros Drineas Rensselaer Polytechnic Institute Dimensionality reduction algorithms (either deterministic or randomized) have been widely used for data analysis in numerous application domains, including the study of human genetics. For instance, linear dimensionality reduction techniques (such as Principal Components Analysis) have been extensively applied in population genetics. In this talk we will discuss such applications and their implications for human genetics, as well as the potential of applying non-linear or supervised dimensionality reduction techniques in this area. |
April 27, 2011 | Modeling Intrinsically Disordered Proteins |
Collin Stultz MIT A number of neurodegenerative disorders such as Alzheimer’s disease and Parkinson’s disease involve the formation of protein aggregates. The primary constituent of these aggregates belongs to a unique class of heteropolymers called intrinsically disordered proteins (IDPs). While many proteins fold to a unique conformation that is determined by their amino acid sequence, IDPs do not adopt a single well-defined conformation in solution. Instead they populate a heterogeneous set of conformers under physiological conditions. Nevertheless, despite this intrinsic propensity for disorder, a number of these proteins can form ordered aggregates both in vitro and in vivo. As the formation of these aggregates may play an important role in disease pathogenesis, a detailed structural characterization of these proteins and their mechanism of aggregation is of critical importance. One problematic issue is that the characterization of intrinsically disordered proteins is quite challenging because accurate models of these systems require a description of both their thermally accessible conformers and the associated relative stabilities or weights. These structures and weights are typically chosen such that calculated ensemble averages agree with some set of prespecified experimental measurements; however, the large number of degrees of freedom in these systems typically leads to multiple conformational ensembles that are degenerate with respect to any given set of experimental observables. In this talk I will discuss a method for modeling these systems that is based on Bayesian statistics. A unique and powerful feature of the approach is that it provides a built-in error measure that allows one to assess the accuracy of the resulting ensemble. We apply the method to the intrinsically disordered proteins, tau protein and alpha synuclein, which have been implicated in the pathogenesis of Alzheimer’s disease and Parkinson’s disease, respectively. The models reveal specific patterns of long-range contacts that may play a role in the aggregation process. |
May 4, 2011 | Liability threshold modeling increases power in case-control association studies |
Alkes Price Harvard University Genetic case-control association studies often include data on covariates, such as body mass index (BMI) or age, that may modify the underlying genetic risk of case or control samples. For example, in type 2 diabetes, odds ratios estimated from low-BMI cases are larger than those estimated from high-BMI cases. An unanswered question is how to optimally use this information to maximize statistical power. In this study we show via simulation that our approach to fitting liability threshold models and computing association statistics, which accounts for disease prevalence and non-random ascertainment, can use this information to increase power. Our method outperforms standard case-control association tests, case-control tests with covariates, tests of gene x covariate interaction, and tests that restrict to a subset of samples. We investigate empirical case-control studies of type 2 diabetes, prostate cancer, breast cancer, rheumatoid arthritis, age-related macular degeneration, and end-stage kidney disease over a total of 78,256 samples. In these data sets, liability threshold modeling outperforms logistic regression for 104 of the 140 known associated variants investigated (p-value < 10^-9). The improvement varied across diseases with a 17% median increase in test statistics, corresponding to a greater than 25% increase in power. Application of liability threshold modeling to future case-control association studies of these diseases, or other diseases with analogous effects of covariates on genetic risk, will yield a substantial increase in power for disease gene discovery. |
May 11, 2011 | Mona Singh |