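Bioon (生物谷) report: Researchers from the J. Craig Venter Institute in the United States (established from TIGR), the University of Toronto in Canada, the University of California San Diego, and the Universitat de Barcelona in Spain have recently published the diploid genome sequence of a single individual, opening the door to future genome comparisons and to an era of individualized genomic information.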
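Each person's genomic information is packaged into 23 pairs of chromosomes. One set of 23 comes from each parent, and each parent's DNA is in turn a mixture of genes inherited from their own ancestors. The human genome therefore functions as a diploid, and complex interactions between alleles and/or their non-coding regulatory elements can give rise to new phenotypes.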
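Scientists first observed the diploid nature of the human genome at the chromosome level more than 40 years ago, and clinical laboratories still use karyotyping as the standard whole-genome test. As molecular biology has advanced, other techniques such as chromosomal fluorescence in situ hybridization (FISH) and microarray-based genetic analysis have also contributed substantially to genetic analysis. Even with these techniques, however, scientists suspect that only a small fraction of the genetic variation in a given sample is actually being observed.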
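Over the past decade, the development of high-throughput DNA sequencing and advanced bioinformatic analysis has made it possible to sequence most of the human genome. Two versions of the human genome sequence are now available: one produced by the International Human Genome Sequencing Consortium using a clone-based approach, and one produced by random whole-genome shotgun sequencing.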
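In this paper, researchers at the J. Craig Venter Institute analyzed what this report describes as 19 million gene sequences and 13 million non-coding sequences, which together match the 32 million random DNA fragments cited in the abstract. They used the latest methods to examine in detail how the two copies of each chromosome differ in sequence. The analysis found more than 4.1 million variants, including single-nucleotide differences, sequence insertions and deletions, and differences in gene copy number.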
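Their approach was based mainly on whole-genome shotgun sequencing, combined with advanced genome assembly strategies and software, which allowed them to assemble the diploid genome into large segments (>200 kilobases). Compared with earlier human genome sequences, most of the variants they found were SNPs, a class that has already been studied extensively. The sequencing also uncovered less-studied variation such as insertions and deletions. Non-SNP variants of this kind account for a minority of all variant events (22%), although according to the abstract they involve 74% of all variant bases.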
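These data depict the molecular characteristics of a diploid human genome, providing a starting point for future genome comparisons and opening an era of individualized genomic information.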
Original source:
PLoS Biology
Received: May 9, 2007; Accepted: July 30, 2007; Published: September 4, 2007
The Diploid Genome Sequence of an Individual Human
Samuel Levy1*, Granger Sutton1, Pauline C. Ng1, Lars Feuk2, Aaron L. Halpern1, Brian P. Walenz1, Nelson Axelrod1, Jiaqi Huang1, Ewen F. Kirkness1, Gennady Denisov1, Yuan Lin1, Jeffrey R. MacDonald2, Andy Wing Chun Pang2, Mary Shago2, Timothy B. Stockwell1, Alexia Tsiamouri1, Vineet Bafna3, Vikas Bansal3, Saul A. Kravitz1, Dana A. Busam1, Karen Y. Beeson1, Tina C. McIntosh1, Karin A. Remington1, Josep F. Abril4, John Gill1, Jon Borman1, Yu-Hui Rogers1, Marvin E. Frazier1, Stephen W. Scherer2, Robert L. Strausberg1, J. Craig Venter1
1 J. Craig Venter Institute, Rockville, Maryland, United States of America, 2 Program in Genetics and Genomic Biology, The Hospital for Sick Children, and Molecular and Medical Genetics, University of Toronto, Toronto, Ontario, Canada, 3 Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, United States of America, 4 Genetics Department, Facultat de Biologia, Universitat de Barcelona, Barcelona, Catalonia, Spain
Presented here is a genome sequence of an individual human. It was produced from 32 million random DNA fragments, sequenced by Sanger dideoxy technology and assembled into 4,528 scaffolds, comprising 2,810 million bases (Mb) of contiguous sequence with approximately 7.5-fold coverage for any given region. We developed a modified version of the Celera assembler to facilitate the identification and comparison of alternate alleles within this individual diploid genome. Comparison of this genome and the National Center for Biotechnology Information human reference assembly revealed more than 4.1 million DNA variants, encompassing 12.3 Mb. These variants (of which 1,288,319 were novel) included 3,213,401 single nucleotide polymorphisms (SNPs), 53,823 block substitutions (2–206 bp), 292,102 heterozygous insertion/deletion events (indels)(1–571 bp), 559,473 homozygous indels (1–82,711 bp), 90 inversions, as well as numerous segmental duplications and copy number variation regions. Non-SNP DNA variation accounts for 22% of all events identified in the donor, however they involve 74% of all variant bases. This suggests an important role for non-SNP genetic alterations in defining the diploid genome structure. Moreover, 44% of genes were heterozygous for one or more variants. Using a novel haplotype assembly strategy, we were able to span 1.5 Gb of genome sequence in segments >200 kb, providing further precision to the diploid nature of the genome. These data depict a definitive molecular portrait of a diploid human genome that provides a starting point for future genome comparisons and enables an era of individualized genomic information.
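The 22% figure for non-SNP events can be checked against the counts listed in the abstract. The short Python sketch below is only an illustrative recomputation, not part of the authors' analysis. It assumes that the five enumerated categories make up the event total (segmental duplications and copy number variation regions are described only as "numerous" and are left out), and the dictionary keys and function name are hypothetical labels chosen here.

```python
# Event counts copied from the abstract. The keys are illustrative labels,
# not identifiers used by the authors.
variant_counts = {
    "snp": 3_213_401,
    "block_substitution": 53_823,
    "heterozygous_indel": 292_102,
    "homozygous_indel": 559_473,
    "inversion": 90,
}


def non_snp_share(counts):
    """Return (total events, non-SNP events, non-SNP fraction of events)."""
    total = sum(counts.values())
    non_snp = total - counts["snp"]
    return total, non_snp, non_snp / total


total, non_snp, share = non_snp_share(variant_counts)
print(f"total events:   {total:,}")
print(f"non-SNP events: {non_snp:,}")
print(f"non-SNP share:  {share:.1%}")
```

Under these assumptions the enumerated events sum to 4,118,889, consistent with "more than 4.1 million" variants, and the non-SNP share rounds to 22%. The 74% share of variant bases cannot be recomputed this way, because the abstract gives size ranges rather than per-category base totals.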