Fudan University: Genomics course teaching resources (study materials): US Proposes New Classification Standards for Gene Sequencing Data

Science 9 October 2009: Vol. 326 no. 5950 pp. 236-237, DOI: 10.1126/science.1180614

GENOMICS: Genome Project Standards in a New Era of Sequencing

P. S. G. Chain, Genomic Standards Consortium, Human Microbiome Project Jumpstart Consortium

For over a decade, genome sequences have adhered to only two standards that are relied on for purposes of sequence analysis by interested third parties (1, 2). However, ongoing developments in revolutionary sequencing technologies have resulted in a redefinition of traditional whole-genome sequencing that requires reevaluation of such standards. With commercially available 454 pyrosequencing (followed by Illumina, SOLiD, and now Helicos), there has been an explosion of genomes sequenced under the moniker “draft”; however, these can be very poor quality genomes (due to inherent errors in the sequencing technologies, and the inability of assembly programs to fully address these errors). Further, one can only infer that such draft genomes may be of poor quality by navigating through the databases to find the number and type of reads deposited in sequence trace repositories (and not all genomes have this available), or to identify the number of contigs or genome fragments deposited to the database. The difficulty in assessing the quality of such deposited genomes has created some havoc for genome analysis pipelines and has contributed to many wasted hours. Exponential leaps in raw sequencing capability and greatly reduced prices have further skewed the time- and cost-ratios of draft data generation versus the painstaking process of improving and finishing a genome. The result is an ever-widening gap between drafted and finished genomes that only promises to continue (see the figure, page 236); hence, there is an urgent need to distinguish good from poor data sets.

US Proposes New Classification Standards for Gene Sequencing Data
Time: 2009-10-27 08:55. Source: Science and Technology Daily

US researchers have proposed quality standards for gene sequencing data. The standards could help researchers develop more effective vaccines, or help public health authorities and security personnel respond more quickly to potential public health emergencies.

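Recently, a genetics team at Los Alamos National Laboratory (LANL) in the United States, together with an international consortium, proposed a set of standards intended to clarify the quality of publicly available gene sequencing data. The new standards could ultimately enable genetic researchers to develop more effective vaccines, or help public health authorities and security personnel respond more quickly to potential public health emergencies.

In the latest issue of Science, LANL geneticist Patrick Chain and his colleagues proposed six labels for genome sequencing data that classify sequence data by completeness and accuracy and, consequently, by reliability. The labels would be used in public databases, where only two labels are currently in use. The work matters because researchers rely on such data every day to cross-reference unknown genetic material against that of known organisms; with the new classification, retrieving and comparing data should become much more efficient.

The cells of every organism contain DNA, which is built from four kinds of molecular building blocks (bases) that pair up into base pairs; base pairs arranged in specific sequences make up genes. These gene sequences can carry genetic instructions that are beneficial or harmful to the organism. Genome researchers have catalogued thousands of genetic data sets and deposited them in public databases for other researchers to use. However, because genetic data are complex, the information in public databases ranges from rough to highly refined. In the past, these data were usually classified into just two categories, “draft” and “finished,” which left too much uncertainty about their accuracy.

Chain said that sequencing technology has advanced considerably over the past few years and publicly available genetic data have grown explosively: the amount of base-pair sequence data generated each day now runs into the billions, orders of magnitude more than a few years ago. Different sequencing technologies differ in accuracy, and a high degree of uncertainty in a sequence can lead researchers down a wrong path for a year or more. Standards are therefore needed that give researchers a clear assessment of the quality of genetic sequence data.

Chain worked with genome sequencing centers large and small, including the U.S. Department of Energy Joint Genome Institute, the Sanger Institute, the Human Microbiome Project Jumpstart Consortium sequencing centers, Michigan State University, and the Ontario Institute for Cancer Research, to propose expanding the existing two categories of sequence data to six. The six standards range from the “standard draft sequence,” which represents the minimum requirement for public submission, to the “finished sequence,” the highest standard, whose acceptance criterion is no more than one error per 100,000 base pairs.

Chris Detter, leader of the LANL Genome Science Group and director of the Joint Genome Institute-LANL Center, said the aim of the work is for all major genome centers and genomics research groups to use the gradations of sequence data that suit their needs. To preserve the integrity of genome sequence records as far as possible, smaller research centers can also use this grading scheme when generating and submitting their results, helping other scientists understand what work has been completed. (Feng Weidong)

Standards for a New Genomic Era
LANL among organizations proposing new genome sequence strategies
Los Alamos, New Mexico, October 21, 2009: A team of geneticists at Los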
Alamos National Laboratory, together with a consortium of international researchers, has recently proposed a set of standards designed to elucidate the quality of publicly available genetic sequencing information. The new standards could eventually allow genetic researchers to develop vaccines more efficiently or help public health or security personnel respond more quickly to potential public-health emergencies.

In a recent issue of Science, Los Alamos geneticist Patrick Chain and colleagues presented six labels for genome sequence data that are, or will become, available in public databases, rather than the two labels used today. The six labels would roughly characterize the completeness and accuracy, and consequently the potential reliability, of genetic sequencing data. This is of great importance since researchers use such data on a daily basis for cross-referencing unknown genetic material with the genetic material of known organisms.

Every living organism with DNA has chromosomes made of four molecular building blocks, or bases, represented by the letters A, T, G, and C. One chromosome can contain millions of base pairs arranged like rungs on a ladder of DNA. The base pairs are arranged in sets of specific sequences that make up genes. These gene sequences can contain genetic instructions that help or harm an organism, for example by encoding enzymes that digest certain foods, or by inducing cellular aberrations that give rise to certain diseases.

Genome researchers have catalogued genetic data from thousands of organisms and placed them in publicly available libraries. Researchers can use these libraries to cross-check genetic data, for example when attempting to isolate an unknown public health threat, or to determine where a potentially helpful or harmful gene may be located on an organism’s chromosome. For scientific fields such as biofuels research or environmental remediation, genetic data could help researchers determine whether microorganisms can efficiently break down plant matter to aid in ethanol production, or digest environmental contaminants like hydrocarbons.

However, because of the complexity of genetic data, genetic information in
public libraries can range from very rough to very refined. In the past, genetic data has been classified either as “draft” or “finished,” leaving a wide range of uncertainty about the potential accuracy of genetic data.

“In the past few years we’ve seen major advances in genetic sequencing technology, so we’ve seen an explosion in the amount of publicly available data,” said Chain, who is lead author of the Science paper. “The amount of base-pair sequencing data generated each day is in the billions, orders of magnitude larger than what was generated a few years ago. Different sequencing technologies have different levels of accuracy. High degrees of uncertainty in a sequence can potentially lead a researcher down a wrong path that they could follow for a year or more. We now have a need for standards that will provide researchers with an unambiguous estimation of the quality of genetic sequence data.”

Working with researchers from genome sequencing centers big and small (including the U.S. Department of Energy’s Joint Genome Institute, the Sanger Institute, the Human Microbiome Project Jumpstart Consortium sequencing centers, Michigan State University, and the Ontario Institute for Cancer Research, among others), Chain and colleagues have proposed that sequence data be placed into one of six categories that augment the existing two categories. The six standards range from “standard draft sequence,” representing minimum requirements for public submission, to a “finished sequence,” the highest standard, which can be verified to contain only one sequencing error per 100,000 base pairs.

“My hope is all the major genome centers and advanced genomics groups use the gradations that fit their needs,” said Chris Detter, LANL Genome Science Group Leader and Joint Genome Institute-LANL Center director. “Some centers may want all six, while some may only want three, but as long as they keep them intact, we are in good shape. Then, my hope is that the smaller genomics groups adopt the classes as written to help the rest of the scientific community know what they are generating and submitting.”

Other DOE JGI authors on the Science paper include David Bruce, Phil
Hugenholtz, Nikos Kyrpides, Alla Lapidus, Sam Pitluck, and Jeremy Schmutz. Other collaborating institutions are the Sanger Institute and the HMP Jumpstart Consortium sequencing centers (Washington University School of Medicine, the Broad Institute, the J. Craig Venter Institute, and Baylor College of Medicine), as well as Michigan State University, the Ontario Institute for Cancer Research, the National Center for Biotechnology Information, Seattle Children’s Hospital and Research Institute, Emory GRA, and the Naval Medical Research Center.

About Los Alamos National Laboratory (www.lanl.gov)

Los Alamos National Laboratory, a multidisciplinary research institution engaged in strategic science on behalf of national security, is operated by Los Alamos National Security, LLC, a team composed of Bechtel National, the University of California, The Babcock & Wilcox Company, and the Washington Division of URS for the Department of Energy’s National Nuclear Security Administration. Los Alamos enhances national security by ensuring the safety and reliability of the U.S. nuclear stockpile, developing technologies to reduce threats from weapons of mass destruction, and solving problems related to energy, environment, infrastructure, health, and global security concerns.
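The “finished sequence” criterion described in the press release (no more than one sequencing error per 100,000 base pairs) corresponds to an error rate of 10^-5, or 99.999% per-base accuracy. The minimal Python sketch below checks an observed error count against that threshold and converts the threshold to a Phred-scaled quality, Q = -10 log10(p), which works out to Q50. The function names, the Phred conversion, and the example assembly figures are illustrative assumptions, not part of the proposed standard; estimating how many errors a real assembly actually contains is a separate problem the sketch does not address.

```python
import math

# Threshold taken from the press release: at most one sequencing error
# per 100,000 base pairs for a "finished sequence".
FINISHED_MAX_ERRORS_PER_BP = 1 / 100_000


def error_rate(errors: int, length_bp: int) -> float:
    """Observed errors per base pair (hypothetical helper)."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return errors / length_bp


def phred_quality(rate: float) -> float:
    """Phred-scaled quality Q = -10 * log10(p) for error probability p."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    return -10 * math.log10(rate)


def meets_finished_threshold(errors: int, length_bp: int) -> bool:
    """True if the observed error rate is at most 1 error per 100,000 bp."""
    return error_rate(errors, length_bp) <= FINISHED_MAX_ERRORS_PER_BP


if __name__ == "__main__":
    # A hypothetical 5 Mb assembly with different residual error counts.
    print(meets_finished_threshold(40, 5_000_000))  # 8e-6 per bp -> True
    print(meets_finished_threshold(60, 5_000_000))  # 1.2e-5 per bp -> False
    print(round(phred_quality(FINISHED_MAX_ERRORS_PER_BP)))  # 50
```

The comparison uses `<=` to follow the “at most one error” reading of the criterion; a strict `<` would be the choice if the threshold is read as a bound that must not be reached.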