MIT: Foundations of Biology course materials (English edition), Lecture 2: The Language of Genomics

7.91 / 7.36 / BE.490 Lecture #2 Feb. 26, 2004 DNA Sequence Comparison & Alignment Chris Burge

Review of Lecture 1: “Genome Sequencing & DNA Sequence Analysis”
• The Language of Genomics
  - cDNAs, ESTs, BACs, Alus, etc.
• Dideoxy Method / Shotgun Sequencing
  - The ‘shotgun coverage equation’ (Poisson)
• Flavors of BLAST
  - BLAST[PNX], TBLAST[NX]
• Statistics of High Scoring Segments

Shotgun Sequencing a BAC or a Genome
BAC: 200 kb (NIH); genome: 3 Gb (Celera)
Sonicate, subclone → subclones → sequence, assemble → shotgun contigs
What would cause problems with assembly?
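The review lists the ‘shotgun coverage equation’ (Poisson). As a minimal sketch, assuming the usual Lander-Waterman idealization (reads of equal length L dropped uniformly at random onto a target of length G), N reads give mean coverage c = NL/G, the number of reads covering any one base is Poisson with mean c, and so the expected fraction of bases never sequenced is e^(-c). The 500 bp read length and the read counts are hypothetical; only the 200 kb BAC size comes from the slide.

```python
import math

def coverage(num_reads: int, read_len: int, target_len: int) -> float:
    """Mean shotgun coverage c = N * L / G."""
    return num_reads * read_len / target_len

def fraction_uncovered(c: float) -> float:
    """Poisson probability that a given base is hit by zero reads: P(0) = e^(-c)."""
    return math.exp(-c)

def reads_needed(target_len: int, read_len: int, max_uncovered: float) -> int:
    """Smallest N such that e^(-N * L / G) <= max_uncovered."""
    c_required = -math.log(max_uncovered)
    return math.ceil(c_required * target_len / read_len)

if __name__ == "__main__":
    # Hypothetical example: a 200 kb BAC sequenced with 500 bp reads.
    G, L = 200_000, 500
    for n_reads in (400, 1600, 3200):
        c = coverage(n_reads, L, G)
        missed = fraction_uncovered(c) * G
        print(f"N={n_reads:5d}  c={c:4.1f}x  expected bases unsequenced ~ {missed:,.0f}")
    print("reads for <= 0.1% unsequenced:", reads_needed(G, L, 0.001))
```

This idealization says nothing about repeats (such as Alus), which can break an assembly even when every base is covered.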

DNA Sequence Alignment IV
Which alignments are significant?
Q:   1 ttgacctagatgagatgtcgttcacttttactgagctacagaaaa 45
       |||| |||||||||||| | |||||||||||||||||||||||||
S: 403 ttgatctagatgagatgccattcacttttactgagctacagaaaa 447
Identify high scoring segments whose score S exceeds a cutoff x using dynamic programming.
Scores follow an extreme value distribution:
$P(S > x) = 1 - \exp[-Kmn\,e^{-\lambda x}]$
for sequences of length m, n, where K and λ depend on the score matrix and the composition of the sequences being compared.
(Same theory as for protein sequence alignments)
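To make the significance question concrete, the sketch below scores the ungapped alignment shown above and evaluates the extreme value tail formula. It assumes a +1 match / -1 mismatch scheme; for that scheme and equal base frequencies λ = ln 3, since (1/4)e^λ + (3/4)e^(-λ) = 1 at λ = ln 3. K = 0.1 and the database length n = 10^6 are placeholder values chosen for illustration, not tabulated parameters.

```python
import math

QUERY = "ttgacctagatgagatgtcgttcacttttactgagctacagaaaa"    # Q: 1-45
SUBJECT = "ttgatctagatgagatgccattcacttttactgagctacagaaaa"  # S: 403-447

def ungapped_score(a: str, b: str, match: float = 1.0, mismatch: float = -1.0) -> float:
    """Score an ungapped alignment position by position."""
    if len(a) != len(b):
        raise ValueError("ungapped alignment needs equal-length strings")
    return sum(match if x == y else mismatch for x, y in zip(a, b))

def p_score_exceeds(x: float, m: int, n: int, K: float, lam: float) -> float:
    """Extreme value tail P(S > x) = 1 - exp(-K m n e^(-lambda x)).

    math.expm1 keeps precision when the probability is tiny.
    """
    return -math.expm1(-K * m * n * math.exp(-lam * x))

if __name__ == "__main__":
    s = ungapped_score(QUERY, SUBJECT)
    identities = sum(x == y for x, y in zip(QUERY, SUBJECT))
    print(f"identities {identities}/{len(QUERY)}, score S = {s}")
    lam = math.log(3)  # lambda for +1/-1 scores with uniform base composition
    K = 0.1            # placeholder, NOT a tabulated value
    n = 1_000_000      # hypothetical database length
    print(f"P(S > {s}) ~ {p_score_exceeds(s, len(QUERY), n, K, lam):.3g}")
```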

From M. Yaffe Notes (cont) Lecture #2
• The random sequence alignment scores give rise to an “extreme value” distribution, which looks like a skewed Gaussian.
• This is called the Gumbel extreme value distribution.
For a normal distribution with mean m and standard deviation σ (variance σ²), the height of the curve is described by $Y = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-m)^2}{2\sigma^2}\right]$
For an extreme value distribution, the height of the curve is described by $Y = \exp[-x - e^{-x}]$
…and $P(S > x) = 1 - \exp[-e^{-\lambda(x-u)}]$, where $u = (\ln Kmn)/\lambda$
Can show that the mean extreme score is ~ $\log_2(nm)$, and the probability of getting a score that exceeds some number of “standard deviations” x is: $P(S > x) \approx Kmn\,e^{-\lambda x}$.
*** K and λ are tabulated for different matrices ***
For the less statistically inclined: $E \approx Kmn\,e^{-\lambda S}$
[Figure: probability density curves Y_ev for the extreme value distribution (A) and Y_n for the normal distribution (B), plotted against x. The area under each curve is 1.]
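The sketch below puts the two curve formulas side by side and adds the E-value to P-value conversion. Both distributions are in the standard forms written above (not moment-matched), so the comparison only illustrates tail shape: the extreme value tail decays roughly like e^(-x), far more slowly than the normal e^(-x²/2). The conversion P = 1 - e^(-E) follows from the formulas above, since P(S > x) = 1 - exp[-Kmn e^(-λx)] and E ≈ Kmn e^(-λS). Function names are hypothetical.

```python
import math

def gumbel_pdf(x: float) -> float:
    """Standard extreme value (Gumbel) density: Y = exp(-x - e^(-x))."""
    return math.exp(-x - math.exp(-x))

def gumbel_tail(x: float) -> float:
    """P(X > x) = 1 - exp(-e^(-x)) for the standard Gumbel."""
    return -math.expm1(-math.exp(-x))

def normal_pdf(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Normal density with the given mean and standard deviation."""
    return math.exp(-((x - mean) ** 2) / (2 * sd ** 2)) / (sd * math.sqrt(2 * math.pi))

def normal_tail(x: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """P(X > x) for the normal distribution, via the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2)))

def evalue_to_pvalue(E: float) -> float:
    """Expected number of chance hits E -> probability of at least one: 1 - e^(-E)."""
    return -math.expm1(-E)

if __name__ == "__main__":
    for x in (2, 4, 6):
        print(f"x={x}: Gumbel tail {gumbel_tail(x):.2e}   normal tail {normal_tail(x):.2e}")
    for E in (10, 1, 0.01):
        print(f"E={E}: P={evalue_to_pvalue(E):.4f}")
```

For small E the two agree (E = 0.01 gives P ≈ 0.00995), which is why E-values are often read directly as probabilities.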

DNA Sequence Comparison & Alignment
• Target frequencies and mismatch penalties
• Eukaryotic gene structure
• Comparative genomics applications:
  - Pipmaker (2 species comparison)
  - Phylogenetic Shadowing (many species)
• Intro to DNA sequence motifs
See Ch. 7 of Mount

DNA Sequence Alignment V
How is λ related to the score matrix?
λ is the unique positive solution to the equation*:
$\sum_{i,j} p_i p_j e^{\lambda s_{ij}} = 1$
where $p_i$ = frequency of nt i and $s_{ij}$ = score for aligning an i,j pair
What kind of an equation is this? What would happen to λ if we doubled all the scores? What does this tell us about the nature of λ?
*Karlin & Altschul, 1990
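A numerical sketch of the λ equation: f(λ) = Σ p_i p_j e^(λ s_ij) - 1 is convex with f(0) = 0, so when the expected score is negative (and some score is positive) its single positive root can be found by bisection. The helper names are hypothetical, and bisection is just one simple root-finder that works here. Running it on a +1/-1 matrix with equal base frequencies and then on the doubled +2/-2 matrix shows the effect the slide asks about.

```python
import math
from typing import Dict, Tuple

Scores = Dict[Tuple[str, str], float]

def match_mismatch(match: float, mismatch: float, alphabet: str = "ACGT") -> Scores:
    """Simple DNA scoring matrix: `match` on the diagonal, `mismatch` elsewhere."""
    return {(a, b): (match if a == b else mismatch) for a in alphabet for b in alphabet}

def karlin_altschul_lambda(p: Dict[str, float], s: Scores, tol: float = 1e-12) -> float:
    """Positive root of f(lam) = sum_ij p_i p_j exp(lam * s_ij) - 1, found by bisection."""
    def f(lam: float) -> float:
        return sum(p[i] * p[j] * math.exp(lam * sij) for (i, j), sij in s.items()) - 1.0

    # f(0) = 0 and f'(0) = expected score; a positive root exists only if the
    # expected score is negative and at least one score is positive.
    expected = sum(p[i] * p[j] * sij for (i, j), sij in s.items())
    if expected >= 0 or max(s.values()) <= 0:
        raise ValueError("need a negative expected score and some positive score")

    # f is convex: negative on (0, lambda), positive beyond it. Grow hi past the root.
    lo, hi = 0.0, 1.0
    while f(hi) < 0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    uniform = {b: 0.25 for b in "ACGT"}
    lam1 = karlin_altschul_lambda(uniform, match_mismatch(1.0, -1.0))
    lam2 = karlin_altschul_lambda(uniform, match_mismatch(2.0, -2.0))
    print(f"+1/-1: lambda = {lam1:.6f} (ln 3 = {math.log(3):.6f})")
    print(f"+2/-2: lambda = {lam2:.6f}")
```

For +1/-1 the root is ln 3 ≈ 1.0986; for +2/-2 it is half that, because only the product λs_ij enters the equation. λ therefore behaves as a scale factor that converts raw scores into natural units.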

DNA Sequence Alignment VI
What scoring matrix to use for DNA?
Usually use simple match-mismatch matrices:

s_ij:  A   C   G   T
  A    1   m   m   m
  C    m   1   m   m
  G    m   m   1   m
  T    m   m   m   1

m = “mismatch penalty” (must be negative)
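The matrix above can be written out directly. The sketch also computes the expected score per aligned pair of random bases, since the local-alignment statistics assume that this expectation is negative, a slightly stronger condition than m < 0: with equal base frequencies it means m < -1/3. The function names are hypothetical, and the AT-rich composition is an illustrative assumption showing that the threshold shifts with base composition.

```python
from typing import Dict

BASES = "ACGT"

def score_matrix(m: float) -> Dict[str, Dict[str, float]]:
    """+1 on the diagonal, mismatch penalty m everywhere else."""
    return {a: {b: (1.0 if a == b else m) for b in BASES} for a in BASES}

def expected_score(m: float, p: Dict[str, float]) -> float:
    """Mean score per aligned pair of random bases: sum_ij p_i p_j s_ij."""
    s = score_matrix(m)
    return sum(p[a] * p[b] * s[a][b] for a in BASES for b in BASES)

def mismatch_threshold(p: Dict[str, float]) -> float:
    """m must lie below this value for the expected score to be negative."""
    p_same = sum(q * q for q in p.values())  # chance that two random bases match
    return -p_same / (1.0 - p_same)

if __name__ == "__main__":
    s = score_matrix(-1.0)
    print(" " * 4 + "".join(f"{b:>5}" for b in BASES))
    for a in BASES:
        print(f"{a:>4}" + "".join(f"{s[a][b]:>5.0f}" for b in BASES))
    uniform = {b: 0.25 for b in BASES}
    at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}  # hypothetical composition
    print("uniform threshold:", round(mismatch_threshold(uniform), 3))
    print("AT-rich threshold:", round(mismatch_threshold(at_rich), 3))
```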

DNA Sequence Alignment VII
How to choose the mismatch penalty?
Use theory of High Scoring Segment composition*
High scoring alignments will have composition:
$q_{ij} = p_i p_j e^{\lambda s_{ij}}$
where $q_{ij}$ = frequency of i,j pairs (“target frequencies”) and $p_i$, $p_j$ = freq of i, j bases in sequences being compared
What would happen to the target frequencies if we doubled all of the scores?
*Karlin & Altschul, 1990
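A quick check of the target-frequency formula, assuming equal base frequencies and a +1/-1 matrix, for which λ = ln 3 solves the λ equation. The computed q_ij sum to 1, and identical pairs account for 75% of the total, so +1/-1 scoring is implicitly tuned to segments of about 75% identity. Doubling every score while halving λ leaves every q_ij unchanged, because only λs_ij appears in the formula. Function names are hypothetical.

```python
import math

BASES = "ACGT"

def target_frequencies(p, match, mismatch, lam):
    """q_ij = p_i p_j exp(lam * s_ij) for a match/mismatch DNA matrix."""
    return {(a, b): p[a] * p[b] * math.exp(lam * (match if a == b else mismatch))
            for a in BASES for b in BASES}

def fraction_identical(q):
    """Total target frequency of identical pairs (i == j)."""
    return sum(v for (a, b), v in q.items() if a == b)

if __name__ == "__main__":
    uniform = {b: 0.25 for b in BASES}
    lam = math.log(3)  # solves the lambda equation for +1/-1 with uniform bases
    q1 = target_frequencies(uniform, 1.0, -1.0, lam)
    q2 = target_frequencies(uniform, 2.0, -2.0, lam / 2)  # doubled scores, halved lambda
    print("sum of q_ij:", round(sum(q1.values()), 12))
    print("identity in high-scoring segments:", round(fraction_identical(q1), 3))
    print("unchanged after doubling:", all(math.isclose(q1[k], q2[k]) for k in q1))
```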

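DNA Sequence Alignment VIII
Still figuring out how to choose the mismatch penalty m
Target frequencies: $q_{ij} = p_i p_j e^{\lambda s_{ij}} \;\Rightarrow\; s_{ij} = \ln(q_{ij}/p_i p_j)/\lambda$
If you want to find regions with R% identities (assuming equal base frequencies, $p_i = 1/4$, with r spread over the 4 identical pairs and 1 - r over the 12 mismatched pairs):
$r = R/100, \quad q_{ii} = r/4, \quad q_{ij} = (1-r)/12 \;\; (i \neq j)$
Set $s_{ii} = 1$. Then, for $i \neq j$:
$m = s_{ij} = \frac{s_{ij}}{s_{ii}} = \frac{\ln(q_{ij}/p_i p_j)/\lambda}{\ln(q_{ii}/p_i p_i)/\lambda} \;\Rightarrow\; m = \frac{\ln(4(1-r)/3)}{\ln(4r)}$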
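The closing formula is easy to tabulate. This sketch assumes, as the derivation does, a match score of +1 and equal base frequencies; the function names are hypothetical. It also uses λ = ln(4r), which follows from s_ii = ln(q_ii/p_i p_i)/λ = 1, to confirm that the chosen m satisfies the λ equation.

```python
import math

def mismatch_penalty(R: float) -> float:
    """m = ln(4(1-r)/3) / ln(4r), with r = R/100, match score +1, uniform bases."""
    r = R / 100.0
    if not 0.25 < r < 1.0:
        # r = 0.25 is chance identity: ln(4r) = 0 and no positive lambda exists
        raise ValueError("R must lie strictly between 25 and 100")
    return math.log(4 * (1 - r) / 3) / math.log(4 * r)

def implied_lambda(R: float) -> float:
    """With s_ii = 1: lambda = ln(q_ii / p_i^2) = ln(4r)."""
    return math.log(4 * R / 100.0)

if __name__ == "__main__":
    for R in (75, 85, 95, 99):
        print(f"R = {R}% identity -> m = {mismatch_penalty(R):.2f}")
```

R = 75 gives m = -1; higher target identities call for harsher penalties (about -2.0 at 95% and -3.1 at 99%).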