Zhejiang University: companion PPT slides for the course "Bioinformatics" (2nd edition), 3 Analysis and alignment of sequences, 3.4 Multiple sequence alignment and domain finding

Companion slides (PPT3-3) for "Bioinformatics" (《生物信息学》), 2nd edition, edited by Fan Longjiang (樊龙江), 2021
3.4 Multiple sequence alignment and domain finding
• (1) Multiple sequence alignment and progressive global alignment (ClustalW)
• (2) Find and model local multiple alignment
• (3) How to evaluate the quality of a PSSM?

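[Diagram] Aligning multiple sequences:
• DNA/RNA: conserved functional sites/elements (databases: Rfam, Dfam, etc.)
• Protein: functional domains (databases: PROSITE, Pfam, etc.)
• Representations: information content/entropy, profiles, HMMs, regular expressions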

(1) Multiple sequence alignment and progressive global alignment (ClustalW)
Why produce a multiple sequence alignment?
• Conserved regions/domains are likely to represent regions that are essential for structure and function: the core of proteins
• A multiple sequence alignment is a starting point for an evolutionary (phylogenetic) analysis
• Using more than two sequences results in a more convincing alignment by revealing conserved regions in all of the sequences

Types of multiple sequence alignment
• Global alignment, in which entire sequences are aligned at the same time using an extension of dynamic programming
• Local alignment, in which conserved local regions are either
  • derived by removing stretches of a global alignment, or
  • found by statistical methods

Example of local MSA
1 AGGCTT
2 AAGCTA
3 AGACTT
4 AAACTA
[Figure: the block above is the domain that aligns well; the rest of the proteins, on either side of it, do not align well.]
The aim is to find an identifiable common, usually longest, pattern with some degree of variability. No gaps are shown in this example, but they can be accommodated.
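The four-sequence block above can be summarised column by column. The following is a minimal sketch, assuming an ungapped block of equal-length sequences; the function names `column_profile`, `conserved_columns`, and `block_pattern` are illustrative and do not come from the slides.

```python
from collections import Counter

# The ungapped local block from the slide.
block = ["AGGCTT", "AAGCTA", "AGACTT", "AAACTA"]


def column_profile(seqs):
    """Return one Counter of residue counts per alignment column."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("an ungapped block needs sequences of equal length")
    return [Counter(col) for col in zip(*seqs)]


def conserved_columns(seqs):
    """0-based indices of columns in which every sequence has the same residue."""
    return [i for i, counts in enumerate(column_profile(seqs)) if len(counts) == 1]


def block_pattern(seqs):
    """Write the block as a regular-expression-like pattern, one term per column."""
    parts = []
    for counts in column_profile(seqs):
        residues = "".join(sorted(counts))
        parts.append(residues if len(residues) == 1 else f"[{residues}]")
    return "".join(parts)


print(conserved_columns(block))  # [0, 3, 4]
print(block_pattern(block))      # A[AG][AG]CT[AT]
```

Three of the six columns are fully conserved, and the other three each allow two residues, which is the "some degree of variability" the slide refers to.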

Challenges
• Finding an optimal alignment of more than two sequences that includes matches, mismatches, and gaps, and that takes into account the degree of variation in all of the sequences at the same time, poses a very difficult challenge.
• A second computational challenge is identifying a reasonable method of obtaining a cumulative score for the substitutions in a column of an MSA.

Multiple sequence alignment is computationally complex
Suppose one tries to align three sequences by extending the method of aligning two sequences to a three-dimensional scoring matrix.
[Figure: a three-dimensional scoring matrix with one axis per sequence; the optimal score is sought in three dimensions.]
Problems:
• The time and space needed grow as the sequence length raised to the power of the number of sequences
• This is feasible for three sequences but not for more than three
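To make the three-dimensional table concrete, here is a minimal sketch that computes only the optimal score of a global alignment of three sequences. The scoring values (match +1, mismatch -1, gap -2, gap against gap 0) and the choice to score each column as the sum of its three pairwise scores are assumptions made for illustration; the slides do not specify them.

```python
from itertools import product

MATCH, MISMATCH, GAP = 1, -1, -2  # illustrative values, not from the slides


def pair_score(a, b):
    """Score one pair of symbols within a column ('-' denotes a gap)."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def column_score(a, b, c):
    """Score a three-symbol column as the sum of its three pairwise scores."""
    return pair_score(a, b) + pair_score(a, c) + pair_score(b, c)


def align3_score(s1, s2, s3):
    """Optimal global alignment score of three sequences using a 3-D table."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    neg_inf = float("-inf")
    # table[i][j][k] = best score for aligning s1[:i], s2[:j], s3[:k]
    table = [[[neg_inf] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    table[0][0][0] = 0
    # Seven possible moves: each sequence contributes a residue (1) or a gap (0),
    # excluding the all-gap column.
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i, j, k in product(range(n1 + 1), range(n2 + 1), range(n3 + 1)):
        for di, dj, dk in moves:
            pi, pj, pk = i - di, j - dj, k - dk
            if pi < 0 or pj < 0 or pk < 0:
                continue
            a = s1[pi] if di else "-"
            b = s2[pj] if dj else "-"
            c = s3[pk] if dk else "-"
            candidate = table[pi][pj][pk] + column_score(a, b, c)
            if candidate > table[i][j][k]:
                table[i][j][k] = candidate
    return table[n1][n2][n3]


print(align3_score("ACG", "ACG", "ACG"))  # 9: three columns, three matching pairs each
```

The table has (n1 + 1)(n2 + 1)(n3 + 1) cells and each cell looks at up to seven predecessors, which is why the approach stops being practical as sequences are added.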

Alignment of three sequences by dynamic programming
• For three protein sequences, each 300 amino acids in length, and excluding gaps, the number of comparisons to be made by dynamic programming is $300^3 = 2.7 \times 10^7$, whereas only $300^2 = 9 \times 10^4$ are required for two sequences of this length. (The number of steps and the memory required for $N$ sequences of $M$ amino acids each is $M^N$.)
• Carrillo and Lipman (1988) found a way (the sum of pairs, SP, method, used in the MSA program) to reduce the number of comparisons that must be made without compromising the attempt to find an optimal alignment
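The growth of $M^N$ can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `dp_comparisons` is illustrative, and like the slide it ignores gaps.

```python
def dp_comparisons(length, n_sequences):
    """Comparisons for full dynamic programming on n sequences of equal length."""
    return length ** n_sequences


for n in (2, 3, 4, 5):
    print(f"N = {n}: {dp_comparisons(300, n):.1e} comparisons")
```

For N = 2 and N = 3 this reproduces the figures on the slide, 9 × 10^4 and 2.7 × 10^7; each additional sequence of this length multiplies the work by a further factor of 300.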

Basic idea of the MSA program: sum of pairs (SP) method
[Figure: sequences A, B, and C, with the pairwise alignment A-C indicated.]
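The sum-of-pairs score itself can be sketched briefly: the score of a column is the sum of the scores of all pairs of symbols in it, and the score of an alignment is the sum over its columns. The match, mismatch, and gap values below are assumptions chosen for illustration (a real analysis would normally use a substitution matrix), and the sketch covers only the scoring, not the way the MSA program limits the search.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score a pair of aligned symbols; '-' is a gap and gap-gap pairs score 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sp_column(column):
    """Sum-of-pairs score of one alignment column (a string of symbols)."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def sp_alignment(rows):
    """Sum-of-pairs score of an alignment given as equal-length rows."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all rows of an alignment must have equal length")
    return sum(sp_column("".join(col)) for col in zip(*rows))


rows = ["AGGCTT", "AAGCTA", "AGACTT", "AAACTA"]
print(sp_column("AAAA"))   # 6: all six pairs match
print(sp_alignment(rows))  # 12: three conserved columns (+6 each), three split columns (-2 each)
```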

Thus, approximate methods are used, including:
(1) a progressive global alignment of the sequences, starting with an alignment of the most alike sequences and then building an alignment by adding more sequences;
(2) iterative methods that make an initial alignment of groups of sequences and then revise the alignment to achieve a more reasonable result;
(3) alignments based on locally conserved patterns found in the same order in the sequences;
(4) use of statistical methods and probabilistic models of the sequences
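The first step of approach (1), deciding which sequences are "most alike" and in what order to add the rest, can be sketched as follows. This is a simplified greedy ordering based on raw pairwise global alignment scores, not ClustalW's guide-tree procedure; the scoring values and the function names `nw_score` and `progressive_order` are assumptions for illustration, and raw scores favour longer sequences, so a real implementation would normalise them.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score (score only, linear gap cost)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def progressive_order(seqs):
    """Indices in the order they would be added: the most alike pair first,
    then repeatedly the sequence most similar to any sequence already chosen."""
    if len(seqs) < 2:
        return list(range(len(seqs)))
    score = {}
    for i, j in combinations(range(len(seqs)), 2):
        score[i, j] = score[j, i] = nw_score(seqs[i], seqs[j])
    first = max(combinations(range(len(seqs)), 2), key=lambda p: score[p])
    order = list(first)
    remaining = set(range(len(seqs))) - set(order)
    while remaining:
        nxt = max(sorted(remaining), key=lambda r: max(score[r, c] for c in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


seqs = ["ACGTACGT", "TTTTGGGG", "ACGTACGT", "ACGAACGA"]
print(progressive_order(seqs))  # [0, 2, 3, 1]
```

The two identical sequences are paired first, the close variant is added next, and the dissimilar sequence comes last. A progressive aligner would then build the alignment in this order, keeping gaps that were introduced in earlier steps.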