Massachusetts Institute of Technology: "Foundations of Biology" course teaching resources (English version), Lecture 4: Organization of Topics

7.91 / 7.36 / BE.490 Lecture #4 Mar. 4, 2004 Markov & Hidden Markov Models for DNA Sequence Analysis Chris Burge

Organization of Topics

Lecture   Model                              Dependence Structure
3/2       Weight Matrix Model                Independence
3/4       Hidden Markov Model                Local Dependence
3/9       Energy Model, Covariation Model    Non-local Dependence

Markov & Hidden Markov Models for DNA
• Markov Models for splice sites
• Hidden Markov Models - looking under the hood
• The Viterbi Algorithm
• Real World HMMs
See Ch. 4 of Mount

Review of DNA Motif Modeling & Discovery
• WMMs for splice sites
• Information Content of a Motif
• The Motif Finding/Discovery Problem
• The Gibbs Sampler
• Motif Modeling - Beyond Weight Matrices
See Ch. 4 of Mount

Information Content of a DNA/RNA Motif
(Sequence logo over positions -3 -2 -1 1 2 3 4 5 6)

f_k = freq. of nt k at position

Shannon Entropy:
$$H(\vec{f}) = -\sum_k f_k \log_2(f_k) \quad \text{(bits)}$$

Information/position:
$$I(\vec{f}) = 2 - H(\vec{f}) = 2 + \sum_k f_k \log_2(f_k) = \sum_k f_k \log_2\left(\frac{f_k}{1/4}\right) \quad \text{(bits)}$$

Motif containing m bits of info. will occur approximately once per $2^m$ bases of random sequence
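The entropy and information formulas on this slide can be checked numerically. The following is a minimal Python sketch, not code from the lecture: the function names and the four-position `toy_motif` are hypothetical, and summing information over positions assumes the positions are independent (the weight matrix assumption).

```python
import math

def shannon_entropy(freqs):
    """H(f) = -sum_k f_k * log2(f_k) in bits; zero frequencies contribute 0."""
    return -sum(f * math.log2(f) for f in freqs if f > 0)

def position_information(freqs):
    """I(f) = 2 - H(f) in bits, for the 4-letter nucleotide alphabet."""
    return 2.0 - shannon_entropy(freqs)

def motif_information(columns):
    """Total information of a motif, assuming independent positions."""
    return sum(position_information(col) for col in columns)

# Hypothetical toy motif; each row is (f_A, f_C, f_G, f_T) at one position.
toy_motif = [
    [0.0, 0.0, 1.0, 0.0],      # invariant G: 2 bits
    [0.0, 0.0, 0.0, 1.0],      # invariant T: 2 bits
    [0.5, 0.0, 0.5, 0.0],      # A or G equally likely: 1 bit
    [0.25, 0.25, 0.25, 0.25],  # uniform: 0 bits
]

m = motif_information(toy_motif)
# A motif with m bits is expected about once per 2**m bases of random sequence.
expected_spacing = 2 ** m
print(f"m = {m} bits, about one match per {expected_spacing:.0f} random bases")
```

For this toy motif the positions contribute 2 + 2 + 1 + 0 bits, so m = 5 and a match is expected roughly once per 32 bases of random sequence.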

Variables Affecting Motif Finding

L = avg. sequence length
N = no. of sequences
I = info. content of motif
W = motif width

gcggaagagggcactagcccatgtgagagggcaaggacca
atctttctcttaaaaataacataattcagggccaggatgt
gtcacgagctttatcctacagatgatgaatgcaaatcagc
taaaagataatatcgaccctagcgtggcgggcaaggtgct
gtagattcgggtaccgttcataaaagtacgggaatttcgg
tatacttttaggtcgttatgttaggcgagggcaaaagtca
ctctgccgattcggcgagtgatcgaagagggcaatgcctc
aggatggggaaaatatgagaccaggggagggccacactgc
acacgtctagggctgtgaaatctctgccgggctaacagac
gtgtcgatgttgagaacgtaggcgccgaggccaacgctga
atgcaccgccattagtccggttccaagagggcaactttgt
ctgcgggcggcccagtgcgcaacgcacagggcaaggttta
tgtgttgggcggttctgaccacatgcgagggcaacctccc
gtcgcctaccctggcaattgtaaaacgacggcaatgttcg
cgtattaatgataaagaggggggtaggaggtcaactcttc
aatgcttataacataggagtagagtagtgggtaaactacg
tctgaaccttctttatgcgaagacgcgagggcaatcggga
tgcatgtctgacaacttgtccaggaggaggtcaacgactc
cgtgtcatagaattccatccgccacgcggggtaatttgga
tcccgtcaaagtgccaacttgtgccggggggctagcagct
acagcccgggaatatagacgcgtttggagtgcaaacatac
acgggaagatacgagttcgatttcaagagttcaaaacgtg
cccgataggactaataaggacgaaacgagggcgatcaatg
ttagtacaaacccgctcacccgaaaggagggcaaatacct
agcaaggttcagatatacagccaggggagacctataactc
gtccacgtgcgtatgtactaattgtggagagcaaatcatt
…

How is the 5'ss recognized?

U1 snRNA  3'.........CCAUUCAUAG-5'
                      ||||||
Pre-mRNA  5'.......UUCGUGAGU.......3'
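The base pairing drawn on this slide can be reproduced with a short Python sketch. It is illustrative only: the function `pairing_string` is hypothetical, the register (the pre-mRNA segment CGUGAGU placed directly under the first seven U1 bases as written 3' to 5') is an assumption about the slide's layout, and marking G·U wobble pairs with ":" is a notation chosen here, not the slide's.

```python
# Watson-Crick and G-U wobble pairs, as (base on 3'->5' strand, base on 5'->3' strand).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}

def pairing_string(strand_3to5, strand_5to3):
    """Mark each column '|' (Watson-Crick), ':' (G-U wobble) or ' ' (no pair).

    Both strands are given as drawn in a duplex diagram: the top strand
    written 3'->5' and the bottom strand 5'->3', already aligned column by column.
    """
    if len(strand_3to5) != len(strand_5to3):
        raise ValueError("aligned strands must have equal length")
    marks = []
    for top, bottom in zip(strand_3to5, strand_5to3):
        if (top, bottom) in WATSON_CRICK:
            marks.append("|")
        elif (top, bottom) in WOBBLE:
            marks.append(":")
        else:
            marks.append(" ")
    return "".join(marks)

u1_segment = "CCAUUCA"   # U1 snRNA as written 3'->5' on the slide (first 7 nt)
site = "CGUGAGU"         # pre-mRNA 5'->3'; assumed to be positions -1..+6

marks = pairing_string(u1_segment, site)
print(u1_segment)
print(marks)
print(site)
```

Under this assumed register, six of the seven columns pair (five Watson-Crick pairs and one G·U wobble), which agrees with the six pairing bars on the slide.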

RNA Energetics I

  3'...CCAUUCAUAG-5'
        ||||||
  5'...CGUGAGU.........3'

Free energy of helix formation derives from:
• base pairing: G·C > A·U > G·U
• base stacking:

  5' --> 3'
    U X          G p A
    A Y          C p U
  3' <-- 5'

Doug Turner's Energy Rules:

X \ Y     A       C       G       U
A         .       .       .     -1.30
C         .       .     -2.40     .
G         .     -2.10     .     -1.00
U       -0.90     .     -1.30     .
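The stacking table on this slide can be held in a small lookup structure. The sketch below is a hedged illustration: the dictionary name and helper are hypothetical, the assignment of each value to an (X, Y) cell follows one reading of the flattened slide table, and the unit is presumably kcal/mol (the usual unit for the Turner rules), which the slide does not state.

```python
# Stacking energies for the stack 5'-UX-3' over 3'-AY-5', keyed by (X, Y).
# Values are copied from the slide; cell placement is an assumed reading of
# the flattened table, and the unit (presumably kcal/mol) is not stated there.
UX_AY_STACK = {
    ("A", "U"): -1.30,
    ("C", "G"): -2.40,
    ("G", "C"): -2.10,
    ("G", "U"): -1.00,  # G-U wobble pair
    ("U", "A"): -0.90,
    ("U", "G"): -1.30,  # U-G wobble pair
}

def stack_energy(x, y):
    """Return the tabulated energy for X paired with Y, or None if X and Y do not pair."""
    return UX_AY_STACK.get((x, y))

# Rank the X-Y pairs from most to least stabilizing (most negative first).
ranked = sorted(UX_AY_STACK.items(), key=lambda item: item[1])
for (x, y), energy in ranked:
    print(f"5'-U{x}-3' / 3'-A{y}-5': {energy:+.2f}")
```

In this reading, the two stacks that add a G·C or C·G pair are the most stabilizing, consistent with the base-pairing ranking G·C > A·U > G·U.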

RNA Energetics II

A)  N p N p N p N p N p N p N
    x   |   |   |   |   x   x
    N p N p N p N p N p N p N
    Lots of consecutive base pairs - good

B)  N p N p N p N p N p N p N
    x   |   |   x   |   |   x
    N p N p N p N p N p N p N
    Internal loop - bad

C)  N p N p N p N p N p N p N
    x   |   |   |   x   |   x
    N p N p N p N p N p N p N
    Terminal base pair not stable - bad

Generally A will be more stable than B or C

Conditional Frequencies in 5'ss Sequences
(Position axis: -1 1 2 3 4 5 6)

5'ss which have G at +5
Pos   -1   +3   +4   +6
A      9   44   75   14
C      4    3    4   18
G     78   51   13   19
T      9    3    9   49

5'ss which lack G at +5
Pos   -1   +3   +4   +6
A      2   81   51   22
C      1    3   28   20
G     97   15    9   30
T      0    2   12   28

Data from Burge, 1998, "Computational Methods in Molecular Biology"
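One way to quantify how strongly each position depends on the nucleotide at +5 is to compare the two conditional distributions column by column. The Python sketch below uses total variation distance, which is a choice made here for illustration and not a method named in the lecture; the variable and function names are hypothetical, and the counts are copied from the two tables above.

```python
NUCLEOTIDES = "ACGT"
POSITIONS = ["-1", "+3", "+4", "+6"]

# Rows copied from the slide: values per position for each nucleotide.
HAVE_G5 = {"A": [9, 44, 75, 14], "C": [4, 3, 4, 18],
           "G": [78, 51, 13, 19], "T": [9, 3, 9, 49]}
LACK_G5 = {"A": [2, 81, 51, 22], "C": [1, 3, 28, 20],
           "G": [97, 15, 9, 30], "T": [0, 2, 12, 28]}

def column(table, j):
    """Return the normalized (A, C, G, T) distribution at column j."""
    values = [table[n][j] for n in NUCLEOTIDES]
    total = sum(values)  # columns sum to about 100 because of rounding
    return [v / total for v in values]

def total_variation(p, q):
    """Total variation distance: half the sum of absolute differences."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

tv_by_position = {
    pos: total_variation(column(HAVE_G5, j), column(LACK_G5, j))
    for j, pos in enumerate(POSITIONS)
}
most_dependent = max(tv_by_position, key=tv_by_position.get)

for pos in POSITIONS:
    print(f"{pos}: TV distance = {tv_by_position[pos]:.3f}")
print("largest shift at position", most_dependent)
```

A distance of 0 would mean the position is independent of +5. All four positions show a nonzero shift here, which is the kind of dependence a weight matrix model cannot represent because it treats positions independently.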