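Dr. Jiyang Zhang and colleagues in Professor Yunping Zhu's group at the Beijing Proteome Research Center / State Key Laboratory of Proteomics have built a Bayesian model for analyzing proteome data identified by the "shotgun" approach, substantially raising the utilization rate of proteomic mass spectrometry data. The paper appears in the latest issue of the authoritative international proteomics journal Molecular & Cellular Proteomics (MCP). The same issue also carries two research papers from the groups of Associate Professor Ying Jiang and Professor Xiaohong Qian at the same institute, a record number of papers from a single institution in one issue of the journal.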
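Large-scale, high-throughput proteomics research generates massive amounts of data that contain a great deal of noise. Because highly reliable data are the basis for further biological analysis, current analysis methods all apply overly strict criteria. These lower the false-positive rate, but they also artificially raise the false-negative rate and lower data utilization. "Making maximal use of experimental data while guaranteeing high confidence" has therefore long been a goal of the proteomics community. The "shotgun" approach is currently the most important and most widely used technical strategy in proteome identification. Building on the randomized (decoy) database strategy, a nonparametric probability density model, and Bayes' formula, the group established a multivariate Bayesian nonparametric model for filtering tandem mass spectrometry data. Rigorous evaluation on standard proteins and complex samples showed that the model has good sensitivity and generality and can raise the utilization of mass spectrometry data by 10% to 40%, reportedly the best result in the field to date. (Bioon.com)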
Original source recommended by Bioon:
Molecular & Cellular Proteomics 8:547-557, 2009. doi:10.1074/mcp.M700558-MCP200
Bayesian Nonparametric Model for the Validation of Peptide Identification in Shotgun Proteomics
Jiyang Zhang, Jie Ma, Lei Dou, Songfeng Wu, Xiaohong Qian, Hongwei Xie, Yunping Zhu, and Fuchu He**
From the State Key Laboratory of Proteomics, Beijing Proteome Research Center, Beijing Institute of Radiation Medicine, Beijing 102206, China; School of Mechanical Engineering and Automatization, National University of Defense Technology, Changsha 410073, China; and ** Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
Tandem mass spectrometry combined with database searching allows high throughput identification of peptides in shotgun proteomics. However, validating database search results, a problem for which many solutions have been proposed, is still advancing in some aspects, such as the sensitivity, specificity, and generalizability of the validation algorithms. Here a Bayesian nonparametric (BNP) model for the validation of database search results was developed that incorporates several popular techniques in statistical learning, including the compression of feature space with a linear discriminant function, the flexible nonparametric probability density function estimation for the variable probability structure in complex problems, and the Bayesian method to calculate the posterior probability. Importantly, the BNP model is naturally compatible with the popular target-decoy database search strategy. We tested the BNP model on standard proteins and real, complex sample data sets from multiple MS platforms and compared it with PeptideProphet, the cutoff-based method, and a simple nonparametric method (proposed by us previously). The performance of the BNP model was shown to be superior in sensitivity and generalizability for all data sets searched. Some high quality matches that had been filtered out by other methods were detected and assigned a high probability by the BNP model. Thus, the BNP model can validate database search results effectively and extract more information from MS/MS data.
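To make the last two ingredients named in the abstract concrete (nonparametric density estimation and a Bayesian posterior on top of a target-decoy search), here is a minimal Python sketch. It is not the authors' implementation. It assumes each peptide-spectrum match has already been reduced to a single discriminant score (the linear discriminant step is omitted), that decoy matches are a fair stand-in for incorrect target matches, and that the target and decoy databases are the same size, so the fraction of incorrect target matches can be estimated as the decoy count divided by the target count. The function names and the Silverman rule-of-thumb bandwidth are illustrative choices, not taken from the paper.

```python
import math
import random


def silverman_bandwidth(samples):
    """Rule-of-thumb kernel width (an illustrative choice, not from the paper)."""
    n = len(samples)
    mean = sum(samples) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> Gaussian kernel density estimate built from samples."""
    h = silverman_bandwidth(samples) if bandwidth is None else bandwidth
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


def posterior_correct(target_scores, decoy_scores):
    """Return a function x -> estimated P(match is correct | score x).

    Bayes' rule: P(incorrect | x) = pi0 * f_incorrect(x) / f_all(x), where
    f_incorrect is estimated from decoy matches and f_all from target matches.
    Assumption: pi0 (prior probability that a target match is incorrect) is
    approximated by the decoy-to-target count ratio.
    """
    f_all = gaussian_kde(target_scores)
    f_incorrect = gaussian_kde(decoy_scores)
    pi0 = min(1.0, len(decoy_scores) / len(target_scores))

    def posterior(x):
        mixture = f_all(x)
        if mixture <= 0.0:
            return 0.0  # no target evidence at this score
        p = 1.0 - pi0 * f_incorrect(x) / mixture
        return min(1.0, max(0.0, p))  # clip estimation noise into [0, 1]

    return posterior


if __name__ == "__main__":
    # Synthetic scores, for illustration only: incorrect matches near 0,
    # correct matches near 4.
    random.seed(0)
    decoys = [random.gauss(0.0, 1.0) for _ in range(500)]
    targets = ([random.gauss(0.0, 1.0) for _ in range(500)]
               + [random.gauss(4.0, 1.0) for _ in range(500)])
    posterior = posterior_correct(targets, decoys)
    for score in (0.0, 2.0, 5.0):
        print(score, round(posterior(score), 3))
```

Because the densities are estimated from the data rather than fitted to a fixed parametric family, the same code applies unchanged to scores from different search engines or instruments, which is roughly the generalizability argument the abstract makes. A match would then be accepted when its posterior exceeds a chosen threshold instead of when each raw score passes a fixed cutoff.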