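On March 27, the international journal BMC Genomics published online a new collaborative paper titled "Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences." The work comes from Li Yang's group at the Institute for Computational Biology and Ling-Ling Chen's group at the Institute of Biochemistry and Cell Biology, both part of the Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences. The study developed a new computational pipeline that does not need the corresponding genomic information. Instead, it compares RNA transcriptome data across multiple samples. With this approach the authors found a large number of previously unknown clustered RNA editing sites in vivo and showed that these sites are regulated differently across tissues.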
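In higher organisms, the predominant form of RNA editing is the A-to-I (adenosine-to-inosine) modification. It is catalyzed by ADAR (adenosine deaminases that act on RNA) enzymes, which convert adenosine (A) into inosine (I). During translation, inosine is read as guanosine (G), so editing at such a site is equivalent to an A-to-G substitution. This can change the encoded amino acid and so increases the diversity and functional versatility of gene products. RNA editing in non-coding regions can also alter the function and fate of RNA molecules, for example by affecting alternative splicing and subcellular localization. For these reasons, the regulation of RNA editing is critical for post-transcriptional RNA diversity and function. In recent years, high-throughput sequencing has been widely used to predict RNA editing sites, which has greatly advanced the field. However, both the sequencing technologies and the downstream computational analysis have limitations, so accurate transcriptome-wide prediction of RNA editing remains a major challenge.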
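Because inosine is read as guanosine, an A-to-I edit appears in sequencing data as a reference A covered by a G in the read. The Python sketch below is purely illustrative and is not the pipeline described in the paper. It assumes the read is already aligned to the reference without gaps, and the function name and parameters are hypothetical. A real analysis must also separate true edits from SNPs and sequencing or mapping errors.

```python
def a_to_g_mismatches(reference: str, read: str, start: int = 0, strand: str = "+") -> list[int]:
    """Return 0-based reference positions where an A-to-I edit would show up.

    Hypothetical helper for illustration only. Assumes `read` is aligned to
    reference[start:start + len(read)] with no insertions or deletions.
    On the '+' strand an edited A is read as G. For a transcript on the '-'
    strand, the same event appears as T-to-C on the forward reference.
    """
    ref_base, obs_base = ("A", "G") if strand == "+" else ("T", "C")
    positions = []
    for offset, base in enumerate(read.upper()):
        pos = start + offset
        # Keep only the mismatch type expected from A-to-I editing.
        if reference[pos].upper() == ref_base and base == obs_base:
            positions.append(pos)
    return positions
```

For example, `a_to_g_mismatches("GGATACAAGT", "GGGTGCAAGT")` returns `[2, 4]`, the two reference As that are read as G.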
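The team developed an efficient computational pipeline and used it to predict RNA editing sites. In human tissues they identified more than 600 novel clustered A-to-I RNA editing sites and characterized how these sites are regulated differently between tissues. Importantly, the study also found clustered RNA editing sites in non-repetitive sequences and described their sequence and structural features. The pipeline and its findings add to our understanding of RNA editing and open new directions for studying its functions. Previous detection methods sequenced the genomic DNA of the same sample to exclude background signals. This pipeline does not need that step. It only compares RNA transcriptome data from multiple samples and still achieves highly accurate A-to-I RNA editing predictions. Notably, while this work was under review, a paper in Nat Methods (Ramaswami, et al., Nat Methods, 2013, 10: 128-132) reported a different method that also predicts A-to-I RNA editing from transcriptome data alone. This suggests that future studies could apply similar approaches to more transcriptome datasets to further investigate the roles of RNA editing in regulating gene expression.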
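One way to picture the idea of replacing genomic DNA with multiple transcriptomes is as a recurrence filter. Under this approach, a candidate site is kept only if it appears in several independent samples and does not match a known genomic variant. The Python sketch below is a simplified illustration under that assumption and does not reproduce the authors' actual parameters or filters. The names `recurrent_sites`, `min_samples`, and `known_variants` are hypothetical.

```python
from collections import Counter


def recurrent_sites(sample_sites, min_samples=2, known_variants=frozenset()):
    """Keep candidate sites seen in at least `min_samples` samples.

    Hypothetical filter for illustration only.
    sample_sites: dict mapping sample name -> iterable of (chrom, pos) candidates.
    known_variants: (chrom, pos) sites to discard, e.g. catalogued SNPs.
    """
    counts = Counter()
    for sites in sample_sites.values():
        # Count each site at most once per sample.
        counts.update(set(sites))
    return {
        site
        for site, n in counts.items()
        if n >= min_samples and site not in known_variants
    }
```

Requiring recurrence across samples favors constitutive edits and suppresses sample-specific sequencing errors. Excluding known variants partly compensates for the missing genomic DNA.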
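The work was carried out jointly by Dr. Shanshan Zhu of the Institute for Computational Biology, Jian-Feng Xiang, a graduate student at the Institute of Biochemistry and Cell Biology, and their colleagues. It was funded by the Chinese Academy of Sciences, the National Natural Science Foundation of China, and the Science and Technology Commission of Shanghai Municipality. (Bioon.com)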
doi:10.1186/1471-2164-14-206
Prediction of constitutive A-to-I editing sites from human transcriptomes in the absence of genomic sequences
Shanshan Zhu, Jian-Feng Xiang, Tian Chen, Ling-Ling Chen and Li Yang
Background: Adenosine-to-inosine (A-to-I) RNA editing is recognized as a cellular mechanism for generating both RNA and protein diversity. Inosine base pairs with cytidine during reverse transcription and therefore appears as guanosine during sequencing of cDNA. Current approaches of RNA editing identification largely depend on the comparison between transcriptomes and genomic DNA (gDNA) sequencing datasets from the same individuals, and it has been challenging to identify editing candidates from transcriptomes in the absence of gDNA information.
Results: We have developed a new strategy to accurately predict constitutive RNA editing sites from publicly available human RNA-seq datasets in the absence of relevant genomic sequences. Our approach establishes new parameters to increase the ability to map mismatches and to minimize sequencing/mapping errors and unreported genome variations. We identified 695 novel constitutive A-to-I editing sites that appear in clusters (named "editing boxes") in multiple samples and which exhibit spatial and dynamic regulation across human tissues. Some of these editing boxes are enriched in non-repetitive regions lacking inverted repeat structures and contain an extremely high conversion frequency of As to Is. We validated a number of editing boxes in multiple human cell lines and confirmed that ADAR1 is responsible for the observed promiscuous editing events in non-repetitive regions, further expanding our knowledge of the catalytic substrate of A-to-I RNA editing by ADAR enzymes.
Conclusions: The approach we present here provides a novel way of identifying A-to-I RNA editing events by analyzing only RNA-seq datasets. This method has allowed us to gain new insights into RNA editing and should also aid in the identification of more constitutive A-to-I editing sites from additional transcriptomes.
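The abstract describes edited sites that occur in clusters, called "editing boxes." A simple way to illustrate this is to group sorted editing positions on one chromosome into runs where neighboring sites lie within a maximum distance. The Python sketch below assumes that gap-based rule. The thresholds `max_gap` and `min_sites` are hypothetical values chosen for illustration, not the definition used in the paper.

```python
def editing_boxes(positions, max_gap=30, min_sites=3):
    """Group editing positions (one chromosome, one strand) into clusters.

    Hypothetical rule for illustration only. Consecutive sites no more than
    `max_gap` bases apart join the same cluster. Clusters with at least
    `min_sites` sites are returned as (first_pos, last_pos, n_sites).
    """
    boxes = []
    current = []
    for pos in sorted(set(positions)):
        if current and pos - current[-1] > max_gap:
            # Gap too large: close the current cluster before starting a new one.
            if len(current) >= min_sites:
                boxes.append((current[0], current[-1], len(current)))
            current = []
        current.append(pos)
    # Flush the final cluster.
    if len(current) >= min_sites:
        boxes.append((current[0], current[-1], len(current)))
    return boxes
```

For example, `editing_boxes([10, 20, 35, 200, 210, 500], min_sites=2)` returns `[(10, 35, 3), (200, 210, 2)]`. The isolated site at 500 is dropped.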