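Recently, a team led by Zhang Zhang, a "Hundred Talents Program" researcher at the Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, designed and developed a new algorithm for detecting codon usage bias (CUB): the Codon Deviation Coefficient (CDC) model. The results were published in the journal BMC Bioinformatics.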
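In an original approach, the work applies the intersection, union, and complement operations of probability theory to composition analysis. It expresses the composition of the four nucleotides in terms of GC content (S) and purine content (R), derives codon and amino acid composition from these, and so arrives at a composition model based on S and R. Applying this model to examine the CUB of genes leads to the CDC algorithm. Unlike existing measures such as CAI and ENC, CDC uses GC content and purine content to account for the background composition specific to each sequence, uses bootstrap resampling to test the significance of CUB, and does not require highly expressed genes as prior information.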
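The composition model described above can be illustrated with a short Python sketch. It is not the authors' implementation. It assumes that GC content and purine content are independent at each codon position, so that, for example, the frequency of A (a purine that is not G or C) is (1 - S) * R. The cosine-based deviation and the parametric bootstrap are also illustrative assumptions, and all function names are hypothetical; consult the paper for the exact definitions.

```python
import math
import random
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
SENSE_CODONS = ["".join(c) for c in product("ACGT", repeat=3)
                if "".join(c) not in STOP_CODONS]
SENSE_SET = set(SENSE_CODONS)


def nucleotide_composition(s, r):
    """Expected A/C/G/T frequencies from GC content s and purine content r.

    Assumption: the two properties are independent, so each frequency is
    the product of a GC (or non-GC) term and a purine (or non-purine) term.
    """
    return {
        "A": (1 - s) * r,        # purine, not G/C
        "G": s * r,              # purine, G/C
        "C": s * (1 - r),        # pyrimidine, G/C
        "T": (1 - s) * (1 - r),  # pyrimidine, not G/C
    }


def split_codons(seq):
    """Split a coding sequence into sense codons (stops and odd tails dropped)."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return [c for c in codons if c in SENSE_SET]


def position_composition(codons):
    """Estimate S and R at each codon position, then the four base frequencies."""
    comps = []
    for pos in range(3):
        bases = [c[pos] for c in codons]
        s = sum(b in "GC" for b in bases) / len(bases)
        r = sum(b in "AG" for b in bases) / len(bases)
        comps.append(nucleotide_composition(s, r))
    return comps


def expected_codon_usage(comps):
    """Expected codon frequencies: product of position-specific base
    frequencies, renormalised over the 61 sense codons."""
    raw = {c: comps[0][c[0]] * comps[1][c[1]] * comps[2][c[2]]
           for c in SENSE_CODONS}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


def observed_codon_usage(codons):
    counts = Counter(codons)
    return {c: counts[c] / len(codons) for c in SENSE_CODONS}


def deviation(expected, observed):
    """One minus the cosine similarity of the two usage vectors (assumed metric)."""
    dot = sum(expected[c] * observed[c] for c in SENSE_CODONS)
    norm_e = math.sqrt(sum(v * v for v in expected.values()))
    norm_o = math.sqrt(sum(v * v for v in observed.values()))
    return 1 - dot / (norm_e * norm_o)


def bootstrap_p_value(codons, n_boot=1000, seed=0):
    """Parametric bootstrap (assumed null): draw codons from the expected
    usage and count how often the resampled deviation reaches the observed one."""
    rng = random.Random(seed)
    expected = expected_codon_usage(position_composition(codons))
    observed_dev = deviation(expected, observed_codon_usage(codons))
    weights = [expected[c] for c in SENSE_CODONS]
    hits = 0
    for _ in range(n_boot):
        sample = rng.choices(SENSE_CODONS, weights=weights, k=len(codons))
        if deviation(expected, observed_codon_usage(sample)) >= observed_dev:
            hits += 1
    return observed_dev, hits / n_boot
```

Calling `bootstrap_p_value(split_codons(sequence))` on a coding sequence returns a deviation between 0 and 1 together with an estimated p-value. No highly expressed reference gene set is needed, because the background is estimated from the sequence itself.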
In validation, CDC outperformed several existing measures on simulated data. On real data, the correlation coefficient between CDC and gene expression level was higher than that of other algorithms, and in Escherichia coli the significance of CUB was found to be closely linked to gene function.
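With this work released, researchers can analyze CUB more accurately and quickly, and in turn study in greater depth how gene mutation, gene expression, protein function, and related traits evolve under natural selection pressure. (Bioon.com)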
doi:10.1186/1471-2105-13-43
Codon Deviation Coefficient: a novel measure for estimating codon usage bias and its statistical significance
Zhang Zhang, Jun Li, Peng Cui, Feng Ding, Ang Li, Jeffrey P Townsend and Jun Yu
Background
Genetic mutation, selective pressure for translational efficiency and accuracy, level of gene expression, and protein function through natural selection are all believed to lead to codon usage bias (CUB). Therefore, informative measurement of CUB is of fundamental importance to making inferences regarding gene function and genome evolution. However, extant measures of CUB have not fully accounted for the quantitative effect of background nucleotide composition and have not statistically evaluated the significance of CUB in sequence analysis.

Results
Here we propose a novel measure--Codon Deviation Coefficient (CDC)--that provides an informative measurement of CUB and its statistical significance without requiring any prior knowledge. Unlike previous measures, CDC estimates CUB by accounting for background nucleotide compositions tailored to codon positions and adopts the bootstrapping to assess the statistical significance of CUB for any given sequence. We evaluate CDC by examining its effectiveness on simulated sequences and empirical data and show that CDC outperforms extant measures by achieving a more informative estimation of CUB and its statistical significance.

Conclusions
As validated by both simulated and empirical data, CDC provides a highly informative quantification of CUB and its statistical significance, useful for determining comparative magnitudes and patterns of biased codon usage for genes or genomes with diverse sequence compositions.