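Recently, the well-known international journal Evolutionary Bioinformatics published the latest research from the Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences: "LCGbase: A Comprehensive Database for Lineage-Based Co-regulated Genes." The researchers successfully built a database of lineage-based co-regulated genes in vertebrates.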
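In vertebrates, and especially in mammals, genome sequence features and gene positions show good collinearity, and these complex, dynamic gene arrangements and chromosomal structures are important for maintaining body development and cell differentiation. Several basic questions, however, have long puzzled researchers. What is the minimal unit of conserved and variable gene clusters in the genomes of species from different evolutionary lineages (primates, rodents, carnivores, artiodactyls, and so on)? How do these gene clusters relate to nucleosome positioning and chromosome folding? Is the clustering of these genes random or biased? Which clusters are random, and which are functionally related?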
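To address these questions, under the guidance of Professor Jun Yu, deputy director of the Beijing Institute of Genomics and director of the Key Laboratory of Genome Sciences and Information, the group of Dr. Dapeng Wang, Yubin Zhang, and Zhonghua Fan collected genome annotation data from a broad range of species, including mammals, birds, reptiles, amphibians, and fish, and chose representative insects, nematodes, and fungi as outgroups. Using the human genome as the reference, the study mapped the genomes of the other species onto it on the basis of homologous genes, taking a conserved pair of "core genes" as the unit (with the transcriptional orientation conserved as "head-to-head," "tail-to-tail," or "head-to-tail"). The study also provides several tools for investigating co-regulation mechanisms, including modules for co-evolution, co-expression, gene function enrichment, and promoter analysis.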
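The three pair orientations named above can be illustrated with a short Python sketch. It is not LCGbase's own code: the `Gene` record, the function names, and the example coordinates are hypothetical, and it assumes the usual convention that a gene's "head" is its 5' end (the left coordinate on the "+" strand, the right coordinate on the "-" strand).

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene annotation record (not LCGbase's schema)."""
    name: str
    chrom: str
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"


def classify_pair(left: Gene, right: Gene) -> str:
    """Classify two genes on the same chromosome by transcriptional orientation."""
    if left.chrom != right.chrom:
        raise ValueError("genes must lie on the same chromosome")
    for gene in (left, right):
        if gene.strand not in ("+", "-"):
            raise ValueError(f"unknown strand for {gene.name}: {gene.strand!r}")
    # Make sure `left` really is the upstream (leftmost) gene.
    if left.start > right.start:
        left, right = right, left
    if left.strand == right.strand:
        return "head-to-tail"   # co-directional
    if left.strand == "-" and right.strand == "+":
        return "head-to-head"   # divergent: the two 5' ends are adjacent
    return "tail-to-tail"       # convergent: the two 3' ends are adjacent


def adjacent_pairs(genes: List[Gene]) -> Iterator[Tuple[str, str, str]]:
    """Yield (left name, right name, orientation) for neighboring genes."""
    ordered = sorted(genes, key=lambda g: (g.chrom, g.start))
    for a, b in zip(ordered, ordered[1:]):
        if a.chrom == b.chrom:  # skip neighbors that span two chromosomes
            yield a.name, b.name, classify_pair(a, b)


if __name__ == "__main__":
    demo = [
        Gene("A", "chr1", 100, 200, "+"),
        Gene("B", "chr1", 300, 400, "-"),
        Gene("C", "chr1", 500, 600, "+"),
        Gene("D", "chr1", 700, 800, "+"),
    ]
    for pair in adjacent_pairs(demo):
        print(pair)
```

A pair would count as conserved between two species when the orthologs of both genes are also adjacent and fall into the same orientation class; that comparison step is left out of the sketch.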
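The database and its associated tools provide strong support for analyzing genomic variation events, such as gene duplication, loss, insertion, inversion, and chromosome-level polyploidization, that relate to the conservation of a specific gene or group of genes within a branch of the evolutionary tree and its variability between branches. (生物谷 Bioon.com)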
doi:10.4137/EBO.S8540
LCGbase: A Comprehensive Database for Lineage-Based Co-regulated Genes
Dapeng Wang, Yubin Zhang, Zhonghua Fan, Guiming Liu and Jun Yu
Animal genes of different lineages, such as vertebrates and arthropods, are well-organized and blended into dynamic chromosomal structures that represent a primary regulatory mechanism for body development and cellular differentiation. The majority of genes in a genome are actually clustered, and these clusters are evolutionarily stable to different extents and biologically meaningful when evaluated among genomes within and across lineages. Until now, many questions concerning gene organization, such as what is the minimal number of genes in a cluster and what is the driving force leading to gene co-regulation, remain to be addressed. Here, we provide a user-friendly database, LCGbase (a comprehensive database for lineage-based co-regulated genes), hosting information on evolutionary dynamics of gene clustering and ordering within the animal kingdom in two different lineages: vertebrates and arthropods. The database is constructed on a web-based Linux-Apache-MySQL-PHP framework with an effective interactive user-inquiry service. Compared to other gene annotation databases with similar purposes, our database has three clear advantages. First, our database is inclusive, including all high-quality genome assemblies of vertebrates and representative arthropod species. Second, it is human-centric, since we map all gene clusters from other genomes in an order of lineage-ranks (such as primates, mammals, warm-blooded, and reptiles) onto the human genome and start the database from well-defined gene pairs (a minimal cluster where the two adjacent genes are oriented as co-directional, convergent, or divergent pairs) and extend to large gene clusters. Furthermore, users can search for any adjacent genes and their detailed annotations. Third, the database provides flexible parameter definitions, such as the distance between the transcription start sites of two adjacent genes, which is extendable to genes flanking the cluster across species.
We also provide useful tools for sequence alignment, gene ontology (GO) annotation, promoter identification, gene expression (co-expression), and evolutionary analysis. This database not only provides a way to define lineage-specific and species-specific gene clusters but also facilitates future studies on gene co-regulation, epigenetic control of gene expression (DNA methylation and histone marks), and chromosomal structures in the context of gene clusters and species evolution. LCGbase is freely available at http://lcgbase.big.ac.cn/LCGbase.
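The abstract mentions a user-adjustable distance between the transcription start sites (TSS) of two adjacent genes. As a rough illustration of how such a parameter might be applied, the Python sketch below filters neighboring genes on one chromosome by TSS distance. It is an assumption-laden example, not the database's implementation: the record layout, the function names, and the coordinates are hypothetical, and the TSS is taken to be the left coordinate on the "+" strand and the right coordinate on the "-" strand.

```python
from typing import List, Tuple

# Hypothetical record layout: (name, start, end, strand), all on one chromosome.
GeneRecord = Tuple[str, int, int, str]


def tss_position(start: int, end: int, strand: str) -> int:
    """Return the transcription start site implied by the strand."""
    if strand == "+":
        return start
    if strand == "-":
        return end
    raise ValueError(f"unknown strand: {strand!r}")


def pairs_within_tss_distance(
    genes: List[GeneRecord], max_distance: int
) -> List[Tuple[str, str, int]]:
    """List adjacent gene pairs whose TSSs lie within max_distance bases."""
    ordered = sorted(genes, key=lambda g: g[1])  # order by left coordinate
    selected = []
    for left, right in zip(ordered, ordered[1:]):
        distance = abs(tss_position(*right[1:]) - tss_position(*left[1:]))
        if distance <= max_distance:
            selected.append((left[0], right[0], distance))
    return selected


if __name__ == "__main__":
    demo = [
        ("geneA", 100, 200, "-"),    # TSS at 200
        ("geneB", 300, 900, "+"),    # TSS at 300
        ("geneC", 1000, 2000, "-"),  # TSS at 2000
    ]
    print(pairs_within_tss_distance(demo, max_distance=1000))
```

Raising `max_distance` admits more loosely spaced pairs, which is one way to read the "flexible parameter definitions" described in the abstract.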