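As protein structural information continues to accumulate and structural genomics continues to advance, a growing number of proteins with known structure but unknown function are being deposited in the international macromolecular database (the PDB), and the functions and functional sites of these proteins need to be annotated. Meanwhile, as experimental biology progresses, the functions and functional sites of some proteins whose functions were previously known may need to be re-annotated. Developing an accurate, fast algorithm suitable for large-scale functional annotation is therefore one of the important research topics in structural bioinformatics. Although many algorithms already exist for functionally annotating protein structures or sequences, their accuracy and sensitivity, among other properties, need further improvement.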
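Recently, Gong-Hua Li, a doctoral student in Jing-Fei Huang's research group, developed a new algorithm for predicting protein functional sites (CMASA) under his supervisor's guidance. Compared with other known algorithms, it offers higher accuracy and sensitivity and is also computationally fast. Using CMASA, members of Jing-Fei Huang's group annotated the catalytic sites of enzymes in the PDB and found 166 new, previously unannotated enzyme catalytic sites. (Bioon.com)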
Original source recommended by Bioon:
BMC Bioinformatics 2010, 11:439. doi:10.1186/1471-2105-11-439
CMASA: an accurate algorithm for detecting local protein structural similarity and its application to enzyme catalytic site annotation
Gong-Hua Li and Jing-Fei Huang
Background
The rapid development of structural genomics has resulted in many proteins of unknown function being deposited in the Protein Data Bank (PDB), so predicting the functions of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but they need further improvement in accuracy, sensitivity, and computational speed. Here, an accurate algorithm, CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict the unknown functions of proteins from local protein structural similarity. The algorithm was evaluated on a test set of 164 enzyme families and compared with other methods.
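To make the idea of comparing local structure through a matrix of residue-to-residue distances more concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the published CMASA procedure: the function names (`distance_matrix`, `matrix_deviation`), the use of one representative atom per residue, the fixed residue correspondence between the two sites, and the toy coordinates are all hypothetical.

```python
import math


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (angstroms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def matrix_deviation(m1, m2):
    """Root-mean-square difference over the upper triangles of two matrices.

    Assumes both matrices describe the same number of residues, listed in
    corresponding order (a simplification made for this sketch).
    """
    n = len(m1)
    if n != len(m2):
        raise ValueError("matrices must have the same size")
    diffs = [
        (m1[i][j] - m2[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return math.sqrt(sum(diffs) / len(diffs)) if diffs else 0.0


# Hypothetical three-residue site: one representative atom per residue.
template = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

# The same arrangement shifted rigidly in space (toy data).
candidate = [(x + 10.0, y - 2.0, z + 5.0) for x, y, z in template]

# A differently shaped arrangement (toy data).
other = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (0.0, 8.0, 0.0)]

template_m = distance_matrix(template)
same_shape = matrix_deviation(template_m, distance_matrix(candidate))
different_shape = matrix_deviation(template_m, distance_matrix(other))

print(f"shifted copy deviation:    {same_shape:.3f}")
print(f"different shape deviation: {different_shape:.3f}")
```

Internal distances do not change when a structure is rotated or translated, so two sites can be compared this way without first superimposing them. How CMASA actually builds, matches, and scores its matrices is described in the paper itself and may differ from this sketch.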
Results
The evaluation shows that CMASA is highly accurate (0.96), sensitive (0.86), and fast enough for large-scale functional annotation. Compared with both sequence-based and global structure-based methods, CMASA can not only find remotely homologous proteins but also detect active-site convergence. Compared with other methods based on local structure comparison, CMASA performs better than both FFF (a method that uses geometry to predict protein function) and SPASM (a local structure alignment method); it is more sensitive than PINTS and more accurate than JESS (both local structure alignment methods). CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB, and at least 166 putative catalytic sites were suggested that cannot be found in the Catalytic Site Atlas (CSA).
Conclusions
CMASA is an accurate algorithm for detecting local protein structural similarity, and it has several advantages in predicting enzyme active sites. CMASA can be used for large-scale enzyme active site annotation. CMASA is available through the mail-based server