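Recently, Zhang Zhang, a "Hundred Talents Program" researcher at the Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, led his team in developing ParaAT (Parallel Alignment and back-Translation), a parallel alignment tool for protein-coding DNA sequences. The work was published in Biochemical and Biophysical Research Communications (BBRC).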
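Homologous sequence alignment is one of the most widely used analyses in bioinformatics. Alignment of protein-coding DNA sequences is especially common and underpins comparative genomics, molecular evolution, and phylogenetics. The usual way to obtain such an alignment is to "back-translate" a protein sequence alignment into a DNA alignment, which gives more reliable and accurate results than aligning the DNA sequences directly. Several tools have been proposed for this purpose, all following the same strategy: align the protein sequences first, then back-translate the result into a DNA alignment. However, these tools can process only one group of homologous sequences per run and cannot handle multiple homologous groups at once.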
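The back-translation step described above can be illustrated with a minimal Python sketch. It is not ParaAT's code. It assumes the protein alignment has already been produced by some aligner, that each coding sequence is in frame, and that gaps are written as `-`. The function name `back_translate` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons


def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread the codons of `cds` onto an already-aligned protein sequence.

    Each residue in the protein alignment is replaced by its own codon and
    each protein gap by three nucleotide gaps, so codon boundaries survive.
    """
    residues = sum(1 for ch in aligned_protein if ch != gap)

    # Assumption: a trailing stop codon may be present in the CDS but
    # absent from the protein sequence; drop it if so.
    if len(cds) == 3 * (residues + 1) and cds[-3:].upper() in STOP_CODONS:
        cds = cds[:-3]
    if len(cds) != 3 * residues:
        raise ValueError("CDS length does not match the protein sequence")

    codons = iter(cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(gap * 3 if ch == gap else next(codons)
                   for ch in aligned_protein)


# Toy homologous pair: the first protein has a one-residue gap.
protein_alignment = {"seq1": "MK-L", "seq2": "MKAL"}
coding_sequences = {"seq1": "ATGAAACTG", "seq2": "ATGAAGGCTCTG"}

dna_alignment = {name: back_translate(protein_alignment[name],
                                      coding_sequences[name])
                 for name in protein_alignment}
```

Because gaps are inserted three nucleotides at a time, the resulting DNA alignment keeps every codon intact, which is what downstream codon-based analyses rely on.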
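To address this limitation of existing tools, researchers at the institute developed ParaAT. ParaAT aligns multiple groups of homologous protein-coding DNA sequences in parallel. It handles large-scale alignment across many homologous groups, substantially reduces running time, and achieves a good parallel speedup, making it well suited to analyzing massive datasets.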
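The idea of processing many homologous groups at once can be sketched in Python with a worker pool. This is only an illustration and does not reflect how ParaAT is implemented. The names `run_groups`, `summarize_group`, and `speedup` are hypothetical, and `summarize_group` is a trivial stand-in for the real per-group work (protein alignment followed by back-translation).

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_group(cds_by_gene):
    """Stand-in for the per-group task; here it only counts codons."""
    return {gene: len(cds) // 3 for gene, cds in cds_by_gene.items()}


def run_groups(groups, worker, max_workers=4):
    """Apply `worker` to every homologous group using a pool of threads.

    `groups` maps a group name to that group's data. Results are returned
    as a dict keyed by the same group names.
    """
    names = list(groups)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so zip() pairs them correctly.
        results = pool.map(worker, (groups[name] for name in names))
        return dict(zip(names, results))


def speedup(serial_seconds, parallel_seconds):
    """Parallel speedup: serial running time divided by parallel running time."""
    return serial_seconds / parallel_seconds


# Two toy homologous groups, each mapping gene name -> coding sequence.
groups = {
    "group1": {"geneA": "ATGAAA", "geneB": "ATGAAACTG"},
    "group2": {"geneC": "ATG"},
}
codon_counts = run_groups(groups, summarize_group, max_workers=2)
```

Threads fit the case where each worker mostly waits on an external aligner process. For CPU-bound work written in pure Python, `ProcessPoolExecutor` from the same module would be the usual substitute.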
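ParaAT runs on different operating systems and supports multiple output formats, which facilitates downstream bioinformatic analyses (for example, with KaKs_Calculator, a tool for detecting natural selection pressure). (Bioon.com)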
doi: 10.1016/j.bbrc.2012.02.101
A parallel tool for constructing multiple protein-coding DNA alignments
Zhang Zhang, Jingfa Xiao, Jiayan Wu, Haiyan Zhang, Guiming Liu, Xumin Wang, Lin Dai
Constructing multiple homologous alignments for protein-coding DNA sequences is crucial for a variety of bioinformatic analyses but remains computationally challenging. With the growing amount of sequence data available and the ongoing efforts largely dependent on protein-coding DNA alignments, there is an increasing demand for a tool that can process a large number of homologous groups and generate multiple protein-coding DNA alignments. Here we present a parallel tool – ParaAT that is capable of parallelly constructing multiple protein-coding DNA alignments for a large number of homologs. As testified on empirical datasets, ParaAT is well suited for large-scale data analysis in the high-throughput era, providing good scalability and exhibiting high parallel efficiency for computationally demanding tasks. ParaAT is freely available for academic use only at http://cbb.big.ac.cn/software