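Mathematicians at Michigan Technological University have recently developed a new software tool, the Ensemble Learning Approach (ELA). ELA compares the genetic makeup of different individuals in order to single out disease-related genes. Using the software, the researchers can identify causal genes for certain human genetic diseases. They also found 11 variants in genes associated with type 2 diabetes, namely single nucleotide polymorphisms (SNPs). The study was published in the journal Genetic Epidemiology.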
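In a complex genetic disease such as type 2 diabetes, a single gene mutation may contribute to the disease, but several genes acting together may also cause it. Interactions among multiple genes have long been hard to study, because matching up and evaluating combinations of the roughly 500,000 genetic markers (SNPs) across the human genome was all but computationally infeasible.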
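A quick count shows why an exhaustive search is so costly. The short Python snippet below simply takes the article's figure of about 500,000 markers as given and counts how many two- and three-marker combinations exist. It ignores the separate statistical problem of correcting for that many tests.

```python
from math import comb

# The article's figure: roughly 500,000 markers genotyped across the genome.
N_MARKERS = 500_000

pairs = comb(N_MARKERS, 2)    # every two-marker combination
triples = comb(N_MARKERS, 3)  # every three-marker combination

print(f"two-marker combinations:   {pairs:,}")
print(f"three-marker combinations: {triples:.2e}")
```

The count works out to 124,999,750,000 pairs (about 1.25 × 10^11) and roughly 2.1 × 10^16 triples, and it grows rapidly with each extra marker allowed in a combination.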
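ELA sidesteps this problem in two steps. It first narrows the search to potential disease genes only. It then uses statistical methods to work out which SNPs can cause disease on their own and which do so only in combination with other genes. To test the model on real data, the team genotyped 1,000 people in the United Kingdom: 500 patients with type 2 diabetes and 500 healthy controls. They found 11 SNPs that are highly likely to contribute to type 2 diabetes. (Bioon.com)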
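The article does not describe how ELA narrows the search. The Python sketch below is a hypothetical illustration of the screen-then-combine idea only. It swaps in a simple single-marker screen that scores each SNP with N·r², where r is the correlation between genotype (coded 0/1/2) and case status. This statistic is closely related to the Cochran-Armitage trend test. The sketch keeps the top-ranked SNPs and shows how few pairs remain. The data are simulated. The names `trend_stat` and `screen`, the cutoff of 20 SNPs, and the effect size are all assumptions.

```python
import math
import random


def trend_stat(genotype_col, status):
    """N * r^2, with r the Pearson correlation of genotype score and case status."""
    n = len(status)
    mx = sum(genotype_col) / n
    my = sum(status) / n
    sxy = sum((x - mx) * (s - my) for x, s in zip(genotype_col, status))
    sxx = sum((x - mx) ** 2 for x in genotype_col)
    syy = sum((s - my) ** 2 for s in status)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP or all-case / all-control sample
    return n * sxy * sxy / (sxx * syy)


def screen(genotypes, status, keep):
    """Hypothetical screen: keep the `keep` SNPs with the smallest 1-df p-values."""
    scored = []
    for j in range(len(genotypes[0])):
        stat = trend_stat([row[j] for row in genotypes], status)
        pval = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square, 1 df
        scored.append((pval, j))
    scored.sort()
    return [j for _, j in scored[:keep]]


# Simulated data (assumption): 1,000 people, 200 SNPs, only SNP 7 affects risk.
rng = random.Random(1)
N, P, CAUSAL = 1000, 200, 7
genotypes, status = [], []
for _ in range(N):
    g = [rng.choice((0, 1, 1, 2)) for _ in range(P)]  # allele frequency 0.5
    prob = 0.3 + 0.2 * (g[CAUSAL] - 1)
    genotypes.append(g)
    status.append(1 if rng.random() < prob else 0)

kept = screen(genotypes, status, keep=20)
print("pairs before screening:", math.comb(P, 2))
print("pairs after screening: ", math.comb(len(kept), 2))
print("causal SNP retained:", CAUSAL in kept)
```

One caveat: a purely marginal screen can drop SNPs that matter only in combination. That is the situation the abstract reports for the diabetes data, where no single SNP showed a significant marginal effect yet the set of 11 had a significant joint effect. ELA's own selection step may therefore work quite differently.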
Original source (as recommended by Bioon):
Genetic Epidemiology, Volume 32, Issue 4, Pages 285-300
An ensemble learning approach jointly modeling main and interaction effects in genetic association studies
Zhaogong Zhang (1,2), Shuanglin Zhang (1,2), Man-Yu Wong (3), Nicholas J. Wareham (4), Qiuying Sha (1)*
(1) Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan
(2) Heilongjiang University, Harbin, China
(3) Department of Mathematics, Hong Kong University of Science and Technology, Hong Kong, China
(4) Department of Public Health and Primary Care, University of Cambridge Institute of Public Health, Cambridge, United Kingdom
Complex diseases are presumed to be the result of interactions among several genes and environmental factors, with each gene having only a small effect on the disease. Thus, methods that account for gene-gene interactions when searching for a set of marker loci in different genes or across the genome, and that analyze these loci jointly, are critical. In this article, we propose an ensemble learning approach (ELA) to detect a set of loci whose main and interaction effects jointly have a significant association with the trait. In the ELA, we first search for base learners and then combine the effects of the base learners by a linear model. Each base learner represents a main effect or an interaction effect. The result of the ELA is easy to interpret. When the ELA is applied to analyze a data set, we can get a final model, an overall P-value of the association test between the set of loci involved in the final model and the trait, and an importance measure for each base learner and each marker involved in the final model. The final model is a linear combination of some base learners. We know which base learner represents a main effect and which one represents an interaction effect. The importance measure of each base learner or marker can tell us the relative importance of the base learner or marker in the final model. We used extensive simulation studies as well as a real data set to evaluate the performance of the ELA. Our simulation studies demonstrated that the ELA is more powerful than the single-marker test in all the simulation scenarios. The ELA also outperformed the other three existing multi-locus methods in almost all cases.
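The abstract describes the final model as a linear combination of base learners, each representing a main effect or an interaction effect, with an importance measure for each learner. The Python sketch below illustrates that structure only. It is not the authors' algorithm. It swaps in a greedy, matching-pursuit style least-squares fit. At each step it adds the single unused base learner that most reduces the residual sum of squares. A base learner is either a centered SNP genotype or the product of two centered genotypes. Each learner's share of the total reduction serves as a stand-in importance score. The function names, the 0/1 outcome coding, and the simulated effect sizes are hypothetical, and no P-value is computed.

```python
import random
from itertools import combinations


def center(col):
    m = sum(col) / len(col)
    return [v - m for v in col]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_base_learners(genotypes):
    """Hypothetical base learners: one main effect per SNP, one interaction per SNP pair."""
    p = len(genotypes[0])
    cols = [center([row[j] for row in genotypes]) for j in range(p)]
    learners = {("main", j): cols[j] for j in range(p)}
    for j, k in combinations(range(p), 2):
        learners[("int", j, k)] = center([a * b for a, b in zip(cols[j], cols[k])])
    return learners


def fit_ensemble(genotypes, y, n_learners=2):
    """Greedy stand-in for ELA's search; each learner is used at most once."""
    learners = build_base_learners(genotypes)
    intercept = sum(y) / len(y)
    residual = [v - intercept for v in y]
    model, gains = [], {}
    for _ in range(n_learners):
        best, best_gain, best_coef = None, -1.0, 0.0
        for name, f in learners.items():
            ff = dot(f, f)
            if name in gains or ff == 0:
                continue
            fr = dot(f, residual)
            gain = fr * fr / ff  # drop in residual sum of squares if f is added
            if gain > best_gain:
                best, best_gain, best_coef = name, gain, fr / ff
        if best is None:
            break
        residual = [r - best_coef * v for r, v in zip(residual, learners[best])]
        model.append((best, best_coef))
        gains[best] = best_gain
    total = sum(gains.values()) or 1.0
    importance = {name: g / total for name, g in gains.items()}
    return intercept, model, importance


# Simulated data (assumption): risk depends on SNP 0 alone and on SNPs 2 and 3 jointly.
rng = random.Random(0)
X, y = [], []
for _ in range(2000):
    g = [rng.choice((0, 1, 1, 2)) for _ in range(6)]
    prob = 0.35 + 0.15 * (g[0] - 1) + 0.15 * (g[2] - 1) * (g[3] - 1)
    X.append(g)
    y.append(1 if rng.random() < prob else 0)

intercept, model, importance = fit_ensemble(X, y, n_learners=2)
for name, coef in model:
    print(name, round(coef, 3), round(importance[name], 3))
```

Centering the genotypes before forming products keeps the interaction learners roughly uncorrelated with the main-effect learners, so each selected learner can be read as either a main effect or an interaction, much as the abstract describes.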
In an application to a large-scale case-control study for Type 2 diabetes, the ELA identified 11 single nucleotide polymorphisms that have a significant multi-locus effect (P-value=0.01), while none of the single nucleotide polymorphisms showed significant marginal effects and none of the two-locus combinations showed significant two-locus interaction effects.