Massachusetts Institute of Technology: Foundations of Biology, course teaching resources (English edition), Lecture 2: More Pairwise Sequence Comparisons

7.91 - Lecture #2
Michael Yaffe
More Pairwise Sequence Comparisons
ARDFSHGLLENKLLGCDSMRWE
.::. .:::. .:::: :::.
GRDYKMALLEQWILGCD-MRWD
- and -
Multiple Sequence Alignment
ARDFSHGLLENKLLGCDSMRWE
.::. .:::. .:::: :::.
GRDYKMALLEQWILGCD-MRWD
.::. ::.: .. :. .:::
SRDW--ALIEDCMV-CNFFRWD
Reading for this lecture: Mount pp. 8-9, 65-89, 96-115, 140-155, 161-170

Outline
• Recursion and dynamic programming
• Applied dynamic programming: global alignments (Needleman-Wunsch)
• Applied dynamic programming: local alignments (Smith-Waterman)
• Substitution matrices: PAM, BLOSUM, Gonnet
• Gaps: linear and affine
• Alignment statistics
• What you need to know to optimize an alignment

Outline (cont)
• Multiple sequence alignments: MSA, Clustal
• Block analysis
• Position-Specific Scoring Matrices (PSSM)

Examples
O(n^k) is “polynomial time”, and as long as k < 3 it is tractable.
Consider our un-gapped dot matrix. Global alignment:
[Figure: dot matrix with one sequence along the columns (1 to n) and the other along the rows (1 to m)]
...essentially an O(mn) problem
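To make the O(mn) count concrete, here is a minimal sketch of an un-gapped dot matrix. The function name `dot_matrix` is hypothetical, and the two short sequences are borrowed from the scoring example later in this lecture; every residue of one sequence is compared with every residue of the other, so the work is m × n comparisons.

```python
def dot_matrix(top, side):
    """Return a len(side) x len(top) grid that is True where residues match."""
    return [[a == b for a in top] for b in side]


top, side = "VDSCY", "VESLCY"
grid = dot_matrix(top, side)

# Print the matrix: '*' marks identical residues, '.' marks everything else.
print("  " + top)
for residue, row in zip(side, grid):
    print(residue, "".join("*" if hit else "." for hit in row))

# One comparison per cell, so the cost is m * n.
comparisons = sum(len(row) for row in grid)
print("comparisons:", comparisons)
```

For these sequences the grid has 6 × 5 = 30 cells, which is the whole cost of the un-gapped comparison.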

O.K. Examples
O(n) is better than O(n log(n)), which is better than O(n^2), which is better than O(n^3).
Terrible Examples
O(k^n) = exponential time... horrible!
NP (non-deterministic polynomial) problems: for the hardest of these, no polynomial-time solution is known.
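A small sketch of how quickly these classes separate as n grows. The helper `growth_table` is hypothetical, the logarithm is taken base 2, and k = 2 is an assumed base for the exponential case; only the relative growth matters.

```python
import math


def growth_table(sizes=(10, 20, 40)):
    """Return one dict per problem size with the operation count of each class."""
    rows = []
    for n in sizes:
        rows.append({
            "n": n,
            "n log n": round(n * math.log2(n)),
            "n^2": n ** 2,
            "n^3": n ** 3,
            "2^n": 2 ** n,  # exponential time with k = 2 (assumed)
        })
    return rows


for row in growth_table():
    print(row)
```

Doubling n doubles the O(n) count and multiplies the O(n^3) count by 8, but it squares the O(2^n) count.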

Recursion and Dynamic Programming
Aligning two protein sequences without gaps is roughly an O(mn) problem.
With gaps, it becomes computationally astronomical and cannot be done by direct comparison methods (= 2^(2L) / √(2πL), where L = sequence length).
The alternative is to compare all possible pairs of characters (matches and mismatches), taking gaps into account as well, while keeping the number of comparisons manageable. The approach is called dynamic programming. It is mathematically proven to produce an optimal alignment.
It needs a substitution or similarity matrix and some way to account for gaps.
Example of how to score an alignment. Write down two sequences:
sequence #1               V    D    S    -    C    Y
sequence #2               V    E    S    L    C    Y
Score from sub. matrix    4    2    4  -11    9    7
Score = Σ(AA pair scores) - gap penalty = 15
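The scoring example above can be sketched in a few lines. The pair scores and the gap penalty of 11 are the values shown on the slide; the names `PAIR_SCORES`, `GAP_PENALTY`, `pair_score` and `score_alignment` are hypothetical, and the lookup table only covers the pairs needed here.

```python
# Pair scores as shown on the slide (they agree with BLOSUM62).
PAIR_SCORES = {("V", "V"): 4, ("D", "E"): 2, ("S", "S"): 4,
               ("C", "C"): 9, ("Y", "Y"): 7}
GAP_PENALTY = 11  # the slide charges -11 for the single gap


def pair_score(a, b):
    """Look up a symmetric substitution score for residues a and b."""
    if (a, b) in PAIR_SCORES:
        return PAIR_SCORES[(a, b)]
    return PAIR_SCORES[(b, a)]  # raises KeyError if the pair is not in the table


def score_alignment(row1, row2, gap_penalty=GAP_PENALTY):
    """Sum pair scores column by column, subtracting the penalty for each gap."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row1, row2):
        if a == "-" or b == "-":
            total -= gap_penalty
        else:
            total += pair_score(a, b)
    return total


print(score_alignment("VDS-CY", "VESLCY"))  # 4 + 2 + 4 - 11 + 9 + 7 = 15
```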

BLOSUM 62 Scoring Matrix
    C   S   T   P   A   G   N   D   E   Q   H   R   K   M   I   L   V   F   Y   W
C   9
S  -1   4
T  -1   1   5
P  -3  -1  -1   7
A   0   1   0  -1   4
G  -3   0  -2  -2   0   6
N  -3   1   0  -2  -2   0   6
D  -3   0  -1  -1  -2  -1   1   6
E  -4   0  -1  -1  -1  -2   0   2   5
Q  -3   0  -1  -1  -1  -2   0   0   2   5
H  -3  -1  -2  -2  -2  -2   1  -1   0   0   8
R  -3  -1  -1  -2  -1  -2   0  -2   0   1   0   5
K  -3   0  -1  -1  -1  -2   0  -1   1   1  -1   2   5
M  -1  -1  -1  -2  -1  -3  -2  -3  -2   0  -2  -1  -1   5
I  -1  -2  -1  -3  -1  -4  -3  -3  -3  -3  -3  -3  -3   1   4
L  -1  -2  -1  -3  -1  -4  -3  -4  -3  -2  -3  -2  -2   2   2   4
V  -1  -2   0  -2   0  -3  -3  -3  -2  -2  -3  -3  -2   1   3   1   4
F  -2  -2  -2  -4  -2  -3  -3  -3  -3  -3  -1  -3  -3   0   0   0  -1   6
Y  -2  -2  -2  -3  -2  -3  -2  -3  -2  -1   2  -2  -2  -1  -1  -1  -1   3   7
W  -2  -3  -2  -4  -3  -2  -4  -4  -3  -2  -2  -3  -3  -1  -3  -2  -3   1   2  11

Scoring system should:
• favor matching identical or related amino acids
• penalize for poor matches and for gaps
To get a good scoring system we need to know how often a particular amino acid pair is found in related proteins compared with its occurrence by chance. This is the information contained in the substitution matrix. We also need to know when a gap would be a better choice.
Deriving realistic substitution matrices:
First we need to know the frequency of one amino acid substituting for another in related proteins, P(ab), compared with the chance that the substitution occurred by chance, based on the relative frequencies of each amino acid in proteins, q(a) and q(b). Call this the “odds ratio”: P(ab) / (q(a) q(b)).
If we do this for all positions in an alignment, then the total probability will be the product of the odds ratios at each position. But multiplication is computationally expensive, so take the log(odds ratio) at each position and add them instead.
Matrices like PAM and BLOSUM are derived from these log odds ratios and contain positive and negative numbers reflecting the likelihood of amino acid substitutions in related proteins.
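A minimal sketch of the log-odds idea, under stated assumptions: the frequencies below are hypothetical numbers chosen for round results, not values from any real matrix, and the function name `log_odds_score` is illustrative. The scaling of 2 × log2 (half-bit units, rounded to an integer) is the convention used for BLOSUM62.

```python
import math


def log_odds_score(p_ab, q_a, q_b, scale=2.0):
    """Rounded, scaled log2 of the odds ratio P(ab) / (q(a) q(b))."""
    return round(scale * math.log2(p_ab / (q_a * q_b)))


# Hypothetical frequencies: the pair is seen 4x more often than chance ...
favoured = log_odds_score(p_ab=0.01, q_a=0.05, q_b=0.05)
# ... or 4x less often than chance.
disfavoured = log_odds_score(p_ab=0.000625, q_a=0.05, q_b=0.05)
print(favoured, disfavoured)

# Adding logs gives the same answer as multiplying the odds ratios.
odds_per_position = [4.0, 0.5, 2.0]  # hypothetical odds ratios
product = math.prod(odds_per_position)
sum_of_logs = sum(math.log2(x) for x in odds_per_position)
print(math.log2(product), sum_of_logs)
```

Pairs seen more often than chance get positive scores and pairs seen less often get negative ones, which is why the matrices mix positive and negative entries.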

To do Dynamic Programming:
First write one sequence across the top, and one down along the side.
       Gap      V        D        S    C    Y
Gap    0        1 gap    2 gaps
V      1 gap
E      2 gaps
S
L
C
Y
Note: linear gap penalty γ(n) = nA, where A = gap penalty

To do Dynamic Programming:
First write one sequence across the top, and one down along the side.
        i =    0     1     2     3     4     5
             Gap     V     D     S     C     Y
j = 0  Gap     0    -8   -16   -24   -32   -40
    1  V      -8
    2  E     -16
    3  S     -24
    4  L     -32
    5  C     -40
    6  Y     -48
Each remaining cell holds a score Sij.
So scoring Sij requires that we know S(i-1, j-1), S(i, j-1) and S(i-1, j). It is therefore recursive: we use the solutions of smaller problems to solve larger ones, and we store how we got to the Sij score, i.e. the intermediate solutions, in a tabular matrix. Computer scientists call this dynamic programming, where “programming” means the matrix, not some kind of computer code.
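The recursion can be sketched as a global (Needleman-Wunsch style) matrix fill. This is an illustrative sketch, not the lecture's own code: the names `SUBSET`, `sub_score` and `needleman_wunsch_matrix` are hypothetical, the scores are the BLOSUM62 entries for the seven residues involved, and the linear gap penalty A = 8 is inferred from the -8, -16, ... border values on the slide.

```python
# BLOSUM62 entries for the residues V, D, S, C, Y, E, L (symmetric pairs).
SUBSET = {
    "VV": 4, "VD": -3, "VS": -2, "VC": -1, "VY": -1, "VE": -2, "VL": 1,
    "DD": 6, "DS": 0, "DC": -3, "DY": -3, "DE": 2, "DL": -4,
    "SS": 4, "SC": -1, "SY": -2, "SE": 0, "SL": -2,
    "CC": 9, "CY": -2, "CE": -4, "CL": -1,
    "YY": 7, "YE": -2, "YL": -1,
    "EE": 5, "EL": -3,
    "LL": 4,
}


def sub_score(a, b):
    """Symmetric lookup: try the pair in both orders."""
    key = a + b
    return SUBSET[key] if key in SUBSET else SUBSET[b + a]


def needleman_wunsch_matrix(top, side, gap=8):
    """Fill S[j][i] from its three neighbours; i runs across, j runs down."""
    rows, cols = len(side) + 1, len(top) + 1
    S = [[0] * cols for _ in range(rows)]
    for i in range(1, cols):          # first row: i leading gaps
        S[0][i] = -gap * i
    for j in range(1, rows):          # first column: j leading gaps
        S[j][0] = -gap * j
    for j in range(1, rows):
        for i in range(1, cols):
            diagonal = S[j - 1][i - 1] + sub_score(top[i - 1], side[j - 1])
            from_above = S[j - 1][i] - gap   # gap in the top sequence
            from_left = S[j][i - 1] - gap    # gap in the side sequence
            S[j][i] = max(diagonal, from_above, from_left)
    return S


matrix = needleman_wunsch_matrix("VDSCY", "VESLCY")
for row in matrix:
    print(" ".join(f"{value:4d}" for value in row))
print("optimal global score:", matrix[-1][-1])
```

With these assumptions the bottom-right cell is 18, i.e. the five aligned pairs (4 + 2 + 4 + 9 + 7 = 26) minus one gap of 8. Recovering the alignment itself would also require storing, for each cell, which of the three neighbours gave the maximum.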