MIT: Foundations of Biology course teaching resources (English version), Lecture 3: Review of DNA Seq

7.91 / 7.36 / BE.490 Lecture #3 Mar. 2, 2004 DNA Motif Modeling & Discovery Chris Burge

Review of DNA Seq. Comparison/Alignment
• Target frequencies and mismatch penalties
• Eukaryotic gene structure
• Comparative genomics applications:
  - Pipmaker (2 species comparison)
  - Phylogenetic Shadowing (many species)
• Intro to DNA sequence motifs

Organization of Topics

Lecture   Object Model                       Dependence Structure
3/2       Weight Matrix Model                Independence
3/4       Hidden Markov Model                Local Dependence
3/9       Energy Model, Covariation Model    Non-local Dependence

DNA Motif Modeling & Discovery
• Review - WMMs for splice sites
• Information Content of a Motif
• The Motif Finding/Discovery Problem
• The Gibbs Sampler
• Motif Modeling - Beyond Weight Matrices
See Ch. 4 of Mount

Splicing Model I
[Figure: splicing diagram labeling the 5’ splice site, branch site, and 3’ splice site]

Weight Matrix Models II

5’ splice signal

Pos    -3    -2    -1    …    +5    +6
Con:    C     A     G    …     G     T
A      0.3   0.6   0.1   …    0.1   0.1
C      0.4   0.1   0.0   …    0.1   0.2
G      0.2   0.2   0.8   …    0.8   0.2
T      0.1   0.1   0.1   …    0.0   0.5

Background

Pos    Generic
A      0.25
C      0.25
G      0.25
T      0.25

$S = S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8 S_9$

Odds Ratio:

$$R = \frac{P(S|+)}{P(S|-)} = \frac{P_{-3}(S_1)\,P_{-2}(S_2)\,P_{-1}(S_3) \cdots P_{5}(S_8)\,P_{6}(S_9)}{P_{bg}(S_1)\,P_{bg}(S_2)\,P_{bg}(S_3) \cdots P_{bg}(S_8)\,P_{bg}(S_9)}$$

Background model is homogeneous and assumes independence

Weight Matrix Models III

$S = S_1 S_2 S_3 S_4 S_5 S_6 S_7 S_8 S_9$

Odds Ratio:

$$R = \frac{P(S|+)}{P(S|-)} = \frac{P_{-3}(S_1)\,P_{-2}(S_2)\,P_{-1}(S_3) \cdots P_{5}(S_8)\,P_{6}(S_9)}{P_{bg}(S_1)\,P_{bg}(S_2)\,P_{bg}(S_3) \cdots P_{bg}(S_8)\,P_{bg}(S_9)} = \prod_{k=1}^{9} \frac{P_{-4+k}(S_k)}{P_{bg}(S_k)}$$

Score:

$$s = \log_2 R = \sum_{k=1}^{9} \log_2 \frac{P_{-4+k}(S_k)}{P_{bg}(S_k)}$$

Neyman-Pearson Lemma: Optimal decision rules are of the form $R > C$
Equiv.: $\log_2(R) > C'$ because log is a monotone function
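The log-odds score on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide elides positions +1 to +4 with an ellipsis, so the toy matrix below uses just the five columns that are shown (-3, -2, -1, +5, +6) and is not the full 9-position 5’ss model. The names `WMM`, `BACKGROUND`, and `log_odds_score` are hypothetical.

```python
import math

# Five columns copied from the slide's table (positions -3, -2, -1, +5, +6).
# Positions +1..+4 are elided on the slide, so this is a toy 5-column motif.
WMM = {
    "A": [0.3, 0.6, 0.1, 0.1, 0.1],
    "C": [0.4, 0.1, 0.0, 0.1, 0.2],
    "G": [0.2, 0.2, 0.8, 0.8, 0.2],
    "T": [0.1, 0.1, 0.1, 0.0, 0.5],
}
# Homogeneous background: the same base frequencies at every position.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_score(window, wmm=WMM, background=BACKGROUND):
    """Return s = sum over k of log2(P_k(S_k) / P_bg(S_k)) for one window."""
    width = len(wmm["A"])
    if len(window) != width:
        raise ValueError(f"window must have length {width}")
    score = 0.0
    for k, base in enumerate(window.upper()):
        p = wmm[base][k]
        if p == 0.0:
            # A base with probability zero makes the odds ratio zero.
            return float("-inf")
        score += math.log2(p / background[base])
    return score


# Score of the consensus bases C A G G T at the five columns shown.
consensus_score = log_odds_score("CAGGT")
```

Because the score is a sum of per-position terms, each column contributes independently, which is exactly the independence assumption of the weight matrix model. In practice pseudocounts are often added so that a zero entry does not force a score of minus infinity; the slide does not say whether that was done here.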

Weight Matrix Models IV

Slide WMM along sequence:

ttgacctagatgagatgtcgttcacttttactgagctacagaaaa ……

Assign score to each 9 base window. Use score cutoff to predict potential 5’ splice sites.
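The scanning procedure described on this slide might look like the following sketch. Assumptions: since the slide's matrix elides its middle columns, the example uses a toy 3-column matrix (the -3, -2, -1 columns) instead of a 9-base window, the cutoff of 3.5 bits is a made-up value, and the input is assumed to contain only A, C, G, T. The function names are hypothetical.

```python
import math

# Toy 3-column matrix: the -3, -2, -1 columns shown on the slide. The real
# model uses a 9-base window; the shorter width is only for illustration.
WMM = {
    "A": [0.3, 0.6, 0.1],
    "C": [0.4, 0.1, 0.0],
    "G": [0.2, 0.2, 0.8],
    "T": [0.1, 0.1, 0.1],
}
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def window_score(window, wmm, background):
    """Log2 odds of one window; -inf if any base has probability zero."""
    score = 0.0
    for k, base in enumerate(window):
        p = wmm[base][k]
        if p == 0.0:
            return float("-inf")
        score += math.log2(p / background[base])
    return score


def scan(sequence, wmm, background, cutoff):
    """Return (start, window, score) for each window scoring above cutoff."""
    seq = sequence.upper()
    width = len(wmm["A"])
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = window_score(window, wmm, background)
        if score > cutoff:
            hits.append((start, window, score))
    return hits


# Example sequence from the slide; the cutoff is a hypothetical value.
SEQ = "ttgacctagatgagatgtcgttcacttttactgagctacagaaaa"
hits = scan(SEQ, WMM, BACKGROUND, cutoff=3.5)
```

Raising the cutoff returns fewer candidate sites and lowering it returns more, which is the trade-off the cutoff controls when predicting potential 5’ splice sites.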

Histogram of 5’ss Scores

[Figure: histograms of scores for true 5’ splice sites and “decoy” 5’ splice sites; x-axis: Score (1/10 bit units)]

Measuring Accuracy:

Sensitivity = % of true sites w/ score > cutoff
Specificity = % of sites w/ score > cutoff that are true sites

Sn:   20%   50%   90%
Sp:   50%   32%    7%
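The two accuracy measures defined on this slide can be computed from lists of scores as in the sketch below. The function name and the example scores are hypothetical and are not the data behind the histogram. Note that "specificity" as defined here (the fraction of above-cutoff sites that are true) is what is elsewhere often called positive predictive value or precision.

```python
def sensitivity_specificity(true_scores, decoy_scores, cutoff):
    """Return (Sn, Sp) using the slide's definitions.

    Sn = fraction of true sites with score > cutoff
    Sp = fraction of sites with score > cutoff that are true sites
    """
    true_pos = sum(1 for s in true_scores if s > cutoff)
    false_pos = sum(1 for s in decoy_scores if s > cutoff)
    sn = true_pos / len(true_scores) if true_scores else float("nan")
    predicted = true_pos + false_pos
    sp = true_pos / predicted if predicted else float("nan")
    return sn, sp


# Made-up scores in 1/10 bit units, for illustration only.
true_scores = [95, 80, 70, 40]
decoy_scores = [85, 60, 30, 20, 10, 5]
sn, sp = sensitivity_specificity(true_scores, decoy_scores, cutoff=75)
```

Lowering the cutoff can only raise sensitivity, but it also admits more decoys. Since decoys typically far outnumber true sites, specificity tends to drop, which matches the pattern in the Sn/Sp values on the slide.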

What does this result tell us?

A) Splicing machinery also uses other information besides 5’ss motif to identify splice sites; OR
B) WMM model does not accurately capture some aspects of the 5’ss that are used in recognition

(or both)

This is a pretty common situation in biology