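Recently, Zhang Zhang, a "Hundred Talents Program" researcher at the Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, led his team in developing ParaAT (Parallel Alignment and back-Translation), a tool for the parallel alignment of protein-coding DNA sequences. The work was published in the journal Biochemical and Biophysical Research Communications (BBRC).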
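Homologous sequence alignment is one of the most widely used analysis methods in bioinformatics. Alignment of protein-coding DNA sequences is the most common case and is fundamental to comparative genomics, molecular evolution, phylogenetics, and related fields. The usual way to obtain such an alignment is to "back-translate" a protein sequence alignment into a DNA alignment, which gives a more reliable and accurate result than aligning the DNA sequences directly. Several tools have been proposed for this purpose, and all follow the same strategy: first align the protein sequences, then back-translate the result into a DNA alignment. However, these tools can process only one group of homologous sequences at a time and cannot align multiple homologous groups in a single run.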
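The back-translation step described above can be sketched in a few lines of Python. This is a minimal illustration, not ParaAT's own code: the function name `back_translate` and the example sequences are hypothetical, and the sketch assumes each coding sequence is in frame, has its stop codon removed, and matches its protein row residue for residue.

```python
def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread a coding DNA sequence onto its gapped protein alignment row.

    Each residue is replaced by the codon that encodes it, and each
    alignment gap becomes a three-nucleotide gap, so the reading frame
    is preserved in the resulting DNA alignment row.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * n_residues:
        raise ValueError("CDS length must be three times the residue count")

    # Split the ungapped coding sequence into consecutive codons.
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(gap * 3 if aa == gap else next(codons)
                   for aa in aligned_protein)


# Hypothetical two-sequence protein alignment and the matching CDSs.
protein_alignment = {"seq1": "MK-L", "seq2": "MKAL"}
coding_sequences = {"seq1": "ATGAAACTG", "seq2": "ATGAAGGCTCTG"}

dna_alignment = {
    name: back_translate(row, coding_sequences[name])
    for name, row in protein_alignment.items()
}
print(dna_alignment["seq1"])  # ATGAAA---CTG
print(dna_alignment["seq2"])  # ATGAAGGCTCTG
```

Because gaps are inserted only in multiples of three, codons stay intact, which is what downstream codon-based analyses require.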
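To address the limitations of these existing tools, researchers at the Beijing Institute of Genomics developed ParaAT. ParaAT aligns multiple groups of homologous protein-coding DNA sequences in parallel. It handles large-scale alignment across many homologous groups, substantially reduces running time, and achieves good parallel speedup, which makes it suitable for the analysis of very large datasets.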
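The idea of processing many homologous groups at once can be sketched as follows. This is an illustration under stated assumptions, not ParaAT's implementation: `align_group` is a hypothetical placeholder that only pads sequences to equal length, standing in for a real "align proteins, then back-translate" step, and the group names and timings are invented. The `speedup` and `efficiency` helpers use the standard definitions, serial time divided by parallel time and speedup divided by the number of workers.

```python
from concurrent.futures import ThreadPoolExecutor


def align_group(group):
    """Hypothetical placeholder for aligning one homologous group.

    A real worker would align the proteins and back-translate them;
    here the sequences are simply right-padded with gaps so the
    example stays self-contained.
    """
    name, sequences = group
    width = max(len(seq) for seq in sequences.values())
    return name, {sid: seq.ljust(width, "-") for sid, seq in sequences.items()}


def align_groups_in_parallel(groups, workers=4):
    """Dispatch independent homologous groups to a pool of workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(align_group, groups.items()))


def speedup(serial_seconds, parallel_seconds):
    """Parallel speedup: serial run time divided by parallel run time."""
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, workers):
    """Parallel efficiency: speedup divided by the number of workers."""
    return speedup(serial_seconds, parallel_seconds) / workers


# Invented homologous groups, each processed independently of the others.
groups = {
    "group1": {"a": "ATG", "b": "ATGAAA"},
    "group2": {"c": "ATGC"},
}
alignments = align_groups_in_parallel(groups, workers=2)
print(alignments["group1"]["a"])   # ATG---
print(speedup(120.0, 30.0))        # 4.0
print(efficiency(120.0, 30.0, 8))  # 0.5
```

Threads are enough when each worker mostly waits on an external alignment program. For pure-Python, CPU-bound work, `ProcessPoolExecutor` from the same module would be the more appropriate choice.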
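ParaAT runs on different operating systems and supports multiple output formats, which simplifies downstream bioinformatic analyses (for example, with KaKs_Calculator, a tool for detecting natural selection pressure). (Bioon.com)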
doi: 10.1016/j.bbrc.2012.02.101
A parallel tool for constructing multiple protein-coding DNA alignments
Zhang Zhang, Jingfa Xiao, Jiayan Wu, Haiyan Zhang, Guiming Liu, Xumin Wang, Lin Dai
Constructing multiple homologous alignments for protein-coding DNA sequences is crucial for a variety of bioinformatic analyses but remains computationally challenging. With the growing amount of sequence data available and the ongoing efforts largely dependent on protein-coding DNA alignments, there is an increasing demand for a tool that can process a large number of homologous groups and generate multiple protein-coding DNA alignments. Here we present a parallel tool – ParaAT that is capable of parallelly constructing multiple protein-coding DNA alignments for a large number of homologs. As testified on empirical datasets, ParaAT is well suited for large-scale data analysis in the high-throughput era, providing good scalability and exhibiting high parallel efficiency for computationally demanding tasks. ParaAT is freely available for academic use only at http://cbb.big.ac.cn/software