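Recently, Zhang Zhang, a "Hundred Talents Program" researcher at the Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, led his team in developing ParaAT (Parallel Alignment and back-Translation), a parallel alignment tool for protein-coding DNA sequences. The work was published in the journal Biochemical and Biophysical Research Communications (BBRC).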
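Homologous sequence alignment is one of the most widely used analysis methods in bioinformatics. Alignment of protein-coding DNA sequences is the most common case and is fundamental to comparative genomics, molecular evolution, phylogenetics, and related fields. To obtain such alignments, the usual approach is to "back-translate" a protein alignment into a DNA alignment, which gives more reliable and accurate results than aligning the DNA sequences directly. Several tools have been proposed for this purpose, and all follow the same strategy: align the protein sequences first, then back-translate the result into a DNA alignment. However, these tools can process only one group of homologs at a time and cannot align multiple homologous groups in a single run.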
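The back-translation step can be illustrated with a minimal Python sketch. It is not ParaAT's code. The function name `back_translate` and the example sequences are hypothetical, and the sketch assumes each coding sequence is in frame, has had its stop codon removed, and matches the protein residue for residue.

```python
def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Each residue in the protein alignment consumes the next codon of the
    coding sequence; each gap column becomes a three-character gap.
    """
    residues = sum(1 for ch in aligned_protein if ch != gap)
    if len(cds) != 3 * residues:
        raise ValueError("coding sequence length does not match the protein")

    codons = []
    pos = 0  # index of the next unused nucleotide in cds
    for ch in aligned_protein:
        if ch == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Hypothetical two-sequence protein alignment and the matching coding sequences.
protein_alignment = {"seq1": "MK-L", "seq2": "MKAL"}
coding_sequences = {"seq1": "ATGAAACTG", "seq2": "ATGAAGGCTCTG"}

dna_alignment = {
    name: back_translate(protein_alignment[name], coding_sequences[name])
    for name in protein_alignment
}
```

Because gaps are inserted only in whole codons, the resulting DNA alignment keeps the reading frame intact, which is why this route is preferred over aligning the nucleotides directly.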
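Given these limitations of conventional tools, researchers at the institute developed ParaAT to address the problem. ParaAT aligns multiple groups of homologous protein-coding DNA sequences in parallel. It handles large-scale, multi-group homolog alignment, substantially reduces run time, and achieves good parallel speedup, making it suitable for analyzing massive datasets.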
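The idea of handling many homologous groups at once can be sketched as follows, using Python's standard `concurrent.futures` module. This is an illustration of the general pattern only, not ParaAT's implementation. `process_group` is a hypothetical stand-in for one alignment-plus-back-translation job (here it merely pads sequences with gaps), and the group names and sequences are made up.

```python
from concurrent.futures import ThreadPoolExecutor


def process_group(group):
    """Hypothetical stand-in for one alignment-plus-back-translation job."""
    name, seqs = group
    width = max(len(s) for s in seqs.values())
    # Toy "alignment": right-pad with gaps so all sequences share one length.
    return name, {seq_id: s.ljust(width, "-") for seq_id, s in seqs.items()}


def process_all(groups, workers=4):
    """Run process_group on every homologous group using a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input groups.
        return dict(pool.map(process_group, groups.items()))


# Made-up input: two homologous groups, each mapping sequence IDs to sequences.
groups = {
    "group1": {"a": "ATG", "b": "ATGAAA"},
    "group2": {"c": "ATGCCC"},
}
results = process_all(groups, workers=2)
```

Each group is independent of the others, so the jobs can be distributed across workers without coordination. Parallel speedup is conventionally measured as the serial run time divided by the parallel run time. In CPython, a thread pool helps mainly when each job waits on an external program or on I/O; for pure-Python CPU-bound work, `ProcessPoolExecutor` would be the usual substitute.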
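ParaAT runs on different operating systems and supports multiple output formats, which facilitates downstream bioinformatics analyses (for example, KaKs_Calculator, which is used to detect natural selection pressure). (Bioon.com)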
doi: 10.1016/j.bbrc.2012.02.101
A parallel tool for constructing multiple protein-coding DNA alignments
Zhang Zhang, Jingfa Xiao, Jiayan Wu, Haiyan Zhang, Guiming Liu, Xumin Wang, Lin Dai
Constructing multiple homologous alignments for protein-coding DNA sequences is crucial for a variety of bioinformatic analyses but remains computationally challenging. With the growing amount of sequence data available and the ongoing efforts largely dependent on protein-coding DNA alignments, there is an increasing demand for a tool that can process a large number of homologous groups and generate multiple protein-coding DNA alignments. Here we present a parallel tool – ParaAT that is capable of parallelly constructing multiple protein-coding DNA alignments for a large number of homologs. As testified on empirical datasets, ParaAT is well suited for large-scale data analysis in the high-throughput era, providing good scalability and exhibiting high parallel efficiency for computationally demanding tasks. ParaAT is freely available for academic use only at http://cbb.big.ac.cn/software