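Bioon report: A new study indicates that computer-based statistical methods are helping researchers use gene fragments from species to build a family tree of all species on Earth. Although DNA sequence data are accumulating at an accelerating pace, the sequences are usually only fragments, which creates many problems for constructing the "tree of life". Databases already hold genetic information for many vertebrates and leafy plants, but sequences from less commonly studied organisms, such as bacteria and fungi, are sparse. Now Amy Driskell and colleagues have proposed an interesting computational approach that combines large amounts of incomplete data from different databases. Although the genetic data for some species were as much as 92% incomplete, the researchers were still able to extract useful information, which they used to confirm evolutionary relationships derived from more complete sequences. The latest issue of Science reports this work, and the same issue also carries a review article summarizing progress in this area of biological research.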
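Bioon experts note that how to handle massive genomic databases, and in future proteomic databases, has long been a vexing problem for scientists. This computer-based statistical method promises to summarize and consolidate such data and yield more meaningful conclusions. At the same time, we should recognize that no current model can fully explain data this complex, and that more numerous and more refined computational models will emerge in the future.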
We assess the phylogenetic potential of 300,000 protein sequences sampled from Swiss-Prot and GenBank. Although only a small subset of these data was potentially phylogenetically informative, this subset retained a substantial fraction of the original taxonomic diversity. Sampling biases in the databases necessitate building phylogenetic data sets that have large numbers of missing entries. However, an analysis of two "supermatrices" suggests that even data sets with as much as 92% missing data can provide insights into broad sections of the tree of life.
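To make the "missing data" figure concrete, here is a minimal sketch of how the share of empty cells in a taxon-by-gene supermatrix could be computed. The toy matrix, the taxon and gene names, and the function `missing_fraction` are all hypothetical illustrations; this is not the authors' method or data.

```python
from typing import Dict, List, Optional

# Hypothetical toy supermatrix: one row per taxon, one column per gene.
# A gene that is absent from a row, or set to None, counts as a missing entry.
Supermatrix = Dict[str, Dict[str, Optional[str]]]


def missing_fraction(matrix: Supermatrix, genes: List[str]) -> float:
    """Return the fraction of taxon-by-gene cells that have no sequence."""
    total_cells = len(matrix) * len(genes)
    if total_cells == 0:
        raise ValueError("supermatrix has no cells")
    missing_cells = sum(
        1
        for row in matrix.values()
        for gene in genes
        if row.get(gene) is None
    )
    return missing_cells / total_cells


# Made-up sequences, for illustration only.
genes = ["gene_1", "gene_2", "gene_3"]
toy_matrix: Supermatrix = {
    "taxon_a": {"gene_1": "MKV", "gene_2": None, "gene_3": None},
    "taxon_b": {"gene_1": "MKI", "gene_2": "ATL"},
    "taxon_c": {"gene_2": "ATV", "gene_3": "GGH"},
    "taxon_d": {"gene_3": "GGR"},
}

print(f"Missing entries: {missing_fraction(toy_matrix, genes):.0%}")
```

In this toy case 6 of the 12 cells are empty. A supermatrix with 92% missing data, as in the abstract, would have fewer than one filled cell in twelve.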
Related article
Genomic Databases and the Tree of Life
Keith A. Crandall and Jennifer E. Buhay
Science 12 November 2004: 1144-1145