MIT: Foundations of Biology course materials (English edition), Lecture 5: Review - Homology Modeling

Methods for Protein Structure Prediction: Homology Modeling & Fold Recognition
Next time: Ab Initio Prediction
7.91, Amy Keating

Review - Homology Modeling
• Identify a protein with a similar sequence whose structure has been solved (the template)
• Align the target sequence with the template
• Use the alignment to build an approximate structure for the target
• Fill in any missing pieces
• Fine-tune the structure
• Evaluate success
An excellent review: Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29 (2000): 291-325
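The "use the alignment to build an approximate structure" and "fill in any missing pieces" steps above can be illustrated with a minimal sketch. It assumes only one Cα coordinate per residue and an already computed pairwise alignment written with '-' for gaps; the function name and input format are hypothetical. Real modeling programs do much more (side chains, loop building, refinement); this only shows how an alignment decides which template coordinates are copied and which target residues are left to be filled in.

```python
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]


def transfer_coordinates(target_aln: str, template_aln: str,
                         template_coords: List[Coord]) -> List[Optional[Coord]]:
    """Copy template C-alpha coordinates onto aligned target residues.

    target_aln / template_aln: the two rows of a pairwise alignment ('-' = gap).
    template_coords: one (x, y, z) per template residue, in sequence order.
    Returns one entry per target residue: the copied coordinate, or None where
    the target residue is aligned to a gap and must be filled in later.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("alignment rows must have equal length")
    if len(template_aln.replace("-", "")) != len(template_coords):
        raise ValueError("need exactly one coordinate per template residue")

    model: List[Optional[Coord]] = []
    tmpl_index = 0  # position of the next unused template residue
    for t_char, s_char in zip(target_aln, template_aln):
        if t_char != "-":
            if s_char != "-":
                model.append(template_coords[tmpl_index])  # aligned pair: copy
            else:
                model.append(None)  # insertion in the target: nothing to copy
        if s_char != "-":
            tmpl_index += 1  # this column consumed a template residue


    return model


# Hypothetical 4-residue template laid out along the x axis.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = transfer_coordinates("ACXD-", "AC-DE", coords)
missing = [i for i, c in enumerate(model) if c is None]
print(model)    # [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), None, (7.6, 0.0, 0.0)]
print(missing)  # [2]
```

Target residue 2 ('X') has no template partner, so it is exactly the kind of "missing piece" a loop-modeling step would have to build.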

[Figure: TEMPLATE-TARGET and MODEL-TARGET comparisons plotted against SEQUENCE IDENTITY (0-100%) for example proteins including EDN, CRABPI and NM23; labels mark the TEMPLATE-TARGET DIFFERENCE and the ALIGNMENT ERROR.]
These numbers come from an entirely automated process; manual intervention can do better. Marti-Renom et al. Annu. Rev. Biophys. Biomol. Struct. 29 (2000): 291-325. Courtesy of Annual Reviews, Nonprofit Publisher of the Annual Review™ Series. Used with permission.
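The horizontal axis of this figure is target-template sequence identity. A minimal sketch of computing it from a pairwise alignment follows; note that conventions differ (the denominator may be the alignment length, the shorter sequence, or the aligned positions), and dividing by the gap-free aligned columns is an assumption made here, not necessarily the convention used in the figure.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over the gap-free columns of a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    aligned = identical = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # columns with a gap are excluded from the denominator
        aligned += 1
        if a == b:
            identical += 1
    return 100.0 * identical / aligned if aligned else 0.0


print(percent_identity("AC-DE", "ACGDQ"))  # 75.0
```

Here 3 of the 4 gap-free columns match, giving 75%.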

Homology Modeling on a Genomic Scale
• Requires automation
– Can't choose templates or fine-tune the alignment by hand!
• MODBASE and 3D-CRUNCH
http://alto.compbio.ucsf.edu/modbase-cgi/index.cgi
http://www.expasy.ch/swissmod/SM_3DCrunch.html
• Automatic assessment is critical: how reliable is the model?

One approach to assessment
Want to compute the probability that a prediction is good, based on properties of the model.
For a given score of the model (e.g. the Q-score, more on this later), use a training set of known examples together with Bayes' rule:
$$P(A \mid B) = \frac{P(A \wedge B)}{P(B)} = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\lnot A)\,P(B \mid \lnot A)}$$
where A = good model, ¬A = bad model, B = Q-score.
Assume a good and a bad model are equally likely a priori, i.e. P(A) = P(¬A). The priors then cancel:
$$P(\text{good} \mid \text{Q-score}) = \frac{P(\text{Q-score} \mid \text{good})}{P(\text{Q-score} \mid \text{good}) + P(\text{Q-score} \mid \text{bad})}$$
[Figure: probability of each Q-score value for good models and for bad models.]
Sanchez, R., and A. Sali. "Large-scale Protein Structure Modeling of the Saccharomyces cerevisiae Genome." Proc Natl Acad Sci U S A 95, no. 23 (10 November 1998): 13597-602.
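The Bayes'-rule calculation above can be sketched as follows. This is a minimal illustration, assuming that P(Q-score | good) and P(Q-score | bad) are estimated with simple histograms over a labelled training set; the histogram estimator, the bin edges, the function names and the toy scores are all hypothetical choices, not the procedure of Sanchez and Sali. Because both likelihoods are read from the same bin, the bin width cancels and bin fractions can stand in for densities.

```python
from bisect import bisect_right
from typing import List, Sequence


def bin_probabilities(samples: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of training samples in each half-open bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in samples:
        i = bisect_right(edges, x) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
    return [c / len(samples) for c in counts]


def p_good_given_score(score: float, good_scores: Sequence[float],
                       bad_scores: Sequence[float], edges: Sequence[float],
                       prior_good: float = 0.5) -> float:
    """Bayes' rule with histogram estimates of P(score|good) and P(score|bad).

    With the default prior_good = 0.5 (P(A) = P(not A)) the priors cancel,
    giving P(good|score) = P(score|good) / (P(score|good) + P(score|bad)).
    """
    i = bisect_right(edges, score) - 1
    if not 0 <= i < len(edges) - 1:
        raise ValueError("score lies outside the histogram range")
    like_good = bin_probabilities(good_scores, edges)[i]
    like_bad = bin_probabilities(bad_scores, edges)[i]
    numerator = prior_good * like_good
    denominator = numerator + (1.0 - prior_good) * like_bad
    if denominator == 0.0:
        raise ValueError("no training examples fall in this score bin")
    return numerator / denominator


# Hypothetical training scores for models already labelled good or bad.
good = [0.7, 0.8, 0.9, 0.85]
bad = [0.1, 0.2, 0.3, 0.75]
edges = [0.0, 0.5, 1.0]
print(p_good_given_score(0.8, good, bad, edges))  # 0.8
print(p_good_given_score(0.2, good, bad, edges))  # 0.0
```

For a score of 0.8, all good examples but only a quarter of the bad ones fall in the upper bin, so the posterior is 1.0 / (1.0 + 0.25) = 0.8.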

MODBASE http://alto.compbio.ucsf.edu/modbase-cgi/index.cgi
• 733,239 sequences & 7,120 non-redundant structures
• Fold Assignments (by PSI-BLAST)
– Reliable fold assignments: 827,007 for 413,311 sequences
– Average folds per sequence: 2.0
– Average length of queries: 511 amino acids
– Average length of folds: 229 amino acids
• Comparative Models (by MODELLER)
– Reliable models: 547,473
– Sequences with reliable models: 327,393 (59%)
– Structures used as templates: 6,366 (89%)
For a reliable fold assignment: PSI-BLAST E-value < 0.0001 OR a reliable model.
For a reliable model: at least 30% of Cα atoms superpose within 3.5 Å of their correct positions.
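The two reliability criteria on this slide can be written down directly. The sketch below assumes the model and the correct (experimental) Cα coordinates are already superposed and listed residue by residue, and reads "30% within 3.5 Å" as "at least 30%"; the function names are hypothetical. On a genomic scale the correct positions are usually unknown, which is why reliability has to be estimated, for example with the Bayesian approach on the previous slide.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def fraction_within(model_ca: Sequence[Coord], true_ca: Sequence[Coord],
                    cutoff: float = 3.5) -> float:
    """Fraction of C-alpha atoms within `cutoff` angstroms of their true positions.

    Assumes both lists are already superposed and entry i in each list is
    the same residue.
    """
    if len(model_ca) != len(true_ca) or not model_ca:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    close = sum(1 for m, t in zip(model_ca, true_ca) if math.dist(m, t) <= cutoff)
    return close / len(model_ca)


def is_reliable_model(model_ca: Sequence[Coord], true_ca: Sequence[Coord],
                      min_fraction: float = 0.30, cutoff: float = 3.5) -> bool:
    return fraction_within(model_ca, true_ca, cutoff) >= min_fraction


def is_reliable_fold_assignment(evalue: float, model_is_reliable: bool,
                                max_evalue: float = 1e-4) -> bool:
    # Reliable if the PSI-BLAST E-value is below the threshold OR the model is reliable.
    return evalue < max_evalue or model_is_reliable


model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
true = [(0.0, 0.0, 1.0), (3.8, 0.0, 5.0), (7.6, 0.0, 0.0)]
print(fraction_within(model, true))             # 0.6666666666666666
print(is_reliable_model(model, true))           # True
print(is_reliable_fold_assignment(1e-3, False))  # False
```

Two of the three atoms lie within 3.5 Å (deviations of 1, 5 and 0 Å), so this toy model passes the 30% threshold.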

Example
You've just cloned a new gene from S. pombe: look it up in ModBase.
• Putative galactosyltransferase-associated protein kinase (GenBank accession #3006192)
[Screenshot: ModBase results table (TARGET, MODEL, DATA and TEMPLATE columns; dataset SP/TR-2001) listing models annotated as serine/threonine protein kinases, with templates including a human cyclin-dependent kinase.]
Pieper, Ursula, Narayanan Eswar, Ashley C. Stuart, Valentin A. Ilyin, and Andrej Sali. "MODBASE, A Database of Annotated Comparative Protein Structure Models." Nucl. Acids Res. 30 (2002): 255-259. http://alto.compbio.ucsf.edu/modbase-cgi/index.cgi

Model of the new S. pombe gene
[Figure: TARGET model shown alongside TEMPLATE = 1HCL]
PDB ID: 1HCL. Schulze-Gahmen, U., J. Brandsen, H. D. Jones, D. O. Morgan, L. Meijer, J. Vesely, and S. H. Kim. "Multiple Modes of Ligand Recognition: Crystal Structures of Cyclin-dependent Protein Kinase 2 in Complex with ATP and Two Inhibitors, Olomoucine and Isopentenyladenine." Proteins 22 (1995): 378.
The Protein Data Bank (PDB, http://www.pdb.org/) is the single worldwide repository for the processing and distribution of 3-D biological macromolecular structure data. Berman, H. M., J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. "The Protein Data Bank." Nucleic Acids Research 28 (2000): 235-242.
(PDB Advisory Notice on using materials available in the archive: http://www.rcsb.org/pdb/advisory.html)

The CASP contests
• Critical Assessment of Protein Structure Prediction
• Began in 1994 (CASP1)
• Held every two years
• Experimentalists submit target sequences
• Predictors submit and rank blind predictions
• Assessors develop criteria to judge success
• A meeting is held to discuss the results, and a journal issue (of Proteins) is published to describe them
• In theory, this identifies the problem areas, and people go back and work on them for the next round of CASP

CASP4 Target T0111 (example of a CASP target)
1. Protein Name: enolase
2. Organism Name: Escherichia coli
3. Number of amino acids (approx.): 431
4. Accession number: P08324
5. Sequence Database: Swiss-Prot
6. Amino acid sequence:
SKIVKIIGREIIDSRGNPTVEAEVHLEGGFVGMAAAPSGASTGSREALEL
RDGDKSRFLGKGVTKAVAAVNGPIAQALIGKDAKDQAGIDKIMIDLDGTE
NKSKFGANAILAVSLANAKAAAAAKGMPLYEHIAELNGTPGKYSMPVPMM
NIINGGEHADNNVDIQEFMIQPVGAKTVKEAIRMGSEVFHHLAKVLKAKG
MNTAVGDEGGYAPNLGSNAEALAVIAEAVKAAGYELGKDITLAMDCAASE
FYKDGKYVLAGEGNKAFTSEEFTHFLEELTKQYPIVSIEDGLDESDWDGF
AYQTKVLGDKIQLVGDDLFVTNTKILKEGIEKGIANSILIKFNQIGSLTE
TLAAIKMAKDAGYTAVISHRSGETEDATIADLAVGTAAGQIKTGSMSRSD
RVAKYNQLIRIEEALGEKAPYNGRKEIKGQA
7. Additional Information: oligomerization state is a dimer in the presence of magnesium, by dynamic light scattering, by small-angle X-ray solution scattering, and in the recently solved crystal structure.
8. Homologous sequence of known structure: yes
9. Current state of the experimental work: Structure solved by molecular replacement. Refinement to 2.5 Å resolution is near completion. Current Rfree 27%; R 22%.