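Researchers at the University of Missouri recently found identical DNA sequences in entirely different regions of several plant genomes. "No one had been able to complete a study on this scale before," said Dmitry Korkin, an assistant professor of computer science and the paper's lead author. The findings were published in PNAS.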
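After the White House Office of Science and Technology Policy announced its "Big Data Research and Development Initiative", the analysis of very large data sets became a national priority. A multidisciplinary team at the University of Missouri has met that big-data challenge: using a pioneering computational algorithm, the team found identical DNA sequences shared across different plant and animal species, resolving a major question in biology.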
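"Our findings help explain some of the mysteries of plant evolution," said Gavin Conant, a co-author of the study and an assistant professor of animal sciences. "Basic research on plant genomes provides the raw material and improved techniques for developing drugs and crops."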
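Earlier studies had found long stretches of identical code in the DNA of different animals. Before this new MU study, however, computer programs were not powerful enough to find identical sequences in plant DNA, because the identical segments do not sit at the same positions in each genome.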
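In the study, the genomes of six animals (dog, chicken, human, mouse, macaque, and rat) were compared with one another. Likewise, the genomes of six plants (Arabidopsis, soybean, rice, cottonwood, sorghum, and grape) were compared with one another. The sequence comparisons ran on 48 computers, each capable of 1 million searches per hour, and took four weeks, for a total of about 32 billion searches.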
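The reported total can be checked with simple arithmetic. The sketch below assumes the 48 machines ran around the clock for exactly 28 days, which the article does not state explicitly; the constant and function names are illustrative only.

```python
# Back-of-the-envelope check of the search count reported in the article.
# Assumption (not stated in the article): all machines ran 24 hours a day
# for the full four weeks.
COMPUTERS = 48
SEARCHES_PER_HOUR = 1_000_000  # per computer, as reported
HOURS_PER_DAY = 24
DAYS = 4 * 7  # four weeks


def total_searches(computers: int, searches_per_hour: int, days: int) -> int:
    """Total searches performed by all computers over the given number of days."""
    return computers * searches_per_hour * HOURS_PER_DAY * days


total = total_searches(COMPUTERS, SEARCHES_PER_HOUR, DAYS)
print(f"{total:,} searches")  # 32,256,000,000 searches
```

Under that assumption the product is roughly 32.3 billion, consistent with the figure of about 32 billion given above.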
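Although the researchers found that plant species share identical sequences just as animal species do, they say the sequences arose through different evolutionary processes.
"People might expect to see convergent evolution, but we don't think that is the case," Conant said. "Plants and animals are both complex multicellular organisms that have to cope with many of the same environmental conditions, such as taking in air and water and dealing with changes in the weather, yet their genomes encode the solutions to these challenges in different ways."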
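The MU team's work lays the foundation for future studies of why plants and animals developed different genetic mechanisms and how those mechanisms operate. Their basic research also prepares the ground for new discoveries that could improve human life. Beyond strengthening the potential of genetic science to fight disease, the computer program used for the sequence analysis could itself aid the development of new drugs.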
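"The same algorithm can be used to find identical sequence patterns across an organism's entire set of proteins," Korkin said. "That could help find new targets for existing drugs or study the side effects of those drugs." (Bioon.com)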
doi:10.1073/pnas.1121356109
Long identical multispecies elements in plant and animal genomes
Jeff Reneker, Eric Lyons, Gavin C. Conant, J. Chris Pires, Michael Freeling, Chi-Ren Shyu, and Dmitry Korkin
Ultraconserved elements (UCEs) are DNA sequences that are 100% identical (no base substitutions, insertions, or deletions) and located in syntenic positions in at least two genomes. Although hundreds of UCEs have been found in animal genomes, little is known about the incidence of ultraconservation in plant genomes. Using an alignment-free information-retrieval approach, we have comprehensively identified all long identical multispecies elements (LIMEs), which include both syntenic and nonsyntenic regions, of at least 100 identical base pairs shared by at least two genomes. Among six animal genomes, we found the previously known syntenic UCEs as well as previously undescribed nonsyntenic elements. In contrast, among six plant genomes, we only found nonsyntenic LIMEs. LIMEs can also be classified as either simple (repetitive) or complex (nonrepetitive), they may occur in multiple copies in a genome, and they are often spread across multiple chromosomes. Although complex LIMEs were found in both animal and plant genomes, they differed significantly in their composition and copy number. Further analyses of plant LIMEs revealed their functional diversity, encompassing elements found near rRNA and enzyme-coding genes, as well as those found in transposons and noncoding DNA. We conclude that despite the common presence of LIMEs in both animal and plant lineages, the evolutionary processes involved in the creation and maintenance of these elements differ in the two groups and are likely attributable to several mechanisms, including transfer of genetic material from organellar to nuclear genomes, de novo sequence manufacturing, and purifying selection.
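To make the definition of a LIME concrete, here is a minimal sketch of finding stretches of at least `min_len` identical bases shared by two sequences, wherever they occur. It is not the authors' method: the abstract says only that they used an alignment-free information-retrieval approach, whereas this toy version uses a plain hash index of fixed-length windows. The function name `shared_identical_elements` is hypothetical, and the sketch ignores reverse complements, comparisons across more than two genomes, and the memory constraints of real genome-scale data.

```python
from collections import defaultdict


def shared_identical_elements(seq_a: str, seq_b: str, min_len: int = 100) -> set[str]:
    """Return every maximal substring of length >= min_len found in both sequences.

    Matches may sit at unrelated positions (non-syntenic), as with LIMEs.
    """
    # Index every window of length min_len in seq_a by its start positions.
    index = defaultdict(list)
    for i in range(len(seq_a) - min_len + 1):
        index[seq_a[i:i + min_len]].append(i)

    found = set()
    for j in range(len(seq_b) - min_len + 1):
        for i in index.get(seq_b[j:j + min_len], ()):
            # If the preceding bases also match, this seed lies inside a longer
            # match that is reported from its own leftmost seed, so skip it.
            if i > 0 and j > 0 and seq_a[i - 1] == seq_b[j - 1]:
                continue
            # Extend the exact match to the right as far as it goes.
            length = min_len
            while (i + length < len(seq_a) and j + length < len(seq_b)
                   and seq_a[i + length] == seq_b[j + length]):
                length += 1
            found.add(seq_a[i:i + length])
    return found


# Toy example with a short threshold; the study used min_len = 100.
genome_a = "TTTACGTACGGTCAAA"
genome_b = "GGGACGTACGGTCCCC"
print(shared_identical_elements(genome_a, genome_b, min_len=8))
```

Because the index is keyed on sequence content rather than position, the shared element is found even when it lies in different regions of the two sequences, which is the property that distinguishes nonsyntenic LIMEs from syntenic UCEs.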