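How can recent positive selection be detected reliably? The latest work by researcher Haipeng Li realizes a dream that theoretical population genetics has pursued for 20 years.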
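Positive selection is an important evolutionary force: it gives individuals carrying a particular mutation a survival and reproductive advantage over individuals that do not carry it. It plays an important role in the evolution of wild populations and modern humans, and a decisive role in the domestication of plants and animals (for example rice, dogs, and pigs). Although we cannot return to the past, positive selection events from the last 10,000 to 100,000 years usually leave traces in an organism's genome. We can therefore detect these events and locate the mutations responsible for the adaptive evolution, which opens the door to studying what those mutations do. Ultimately, the hope is that this research will shed light on a fundamental question in evolution: the biological mechanisms of adaptive evolution.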
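When testing for these positive selection events, however, researchers found that the size of the population under study had very likely been changing over the past tens of thousands of years, and that such changes in population size give the relevant tests (neutrality tests) a high false positive rate. For example, Drosophila melanogaster originally lived in a small region of southern Africa and began to spread to the rest of the world 10,000 to 60,000 years ago. This population expansion left traces in the fly genome that are almost identical to those of positive selection, so the false positive rate of the tests is very high (up to 80% to 90%). As a result, tests for recent positive selection have low credibility.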
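To reduce the false positive rate, almost the only approach in current use is to analyze genetic polymorphism data at the genome-wide level. However, the information left behind by long-term evolution is limited, and so is current computational capacity, so the models used cannot approach reality arbitrarily closely, and it is very hard to estimate accurately the parameters of a natural population's historical size changes. This approach therefore still cannot give an accurate estimate of the actual false positive rate, and its reliability still cannot be proven mathematically. Moreover, because some important preconditions cannot be met, the approach is hard to apply to the great majority of wild species and domesticated plants and animals.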
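Against this background, ever since Tajima proposed the well-known Tajima's D test in 1989, a major goal of theoretical population genetics has been to address its shortcomings by building an effective method for detecting recent positive selection that is not affected by changes in population size. Reaching this goal has been the dream the field has chased for more than 20 years.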
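For readers unfamiliar with the statistic mentioned above, here is a minimal sketch of the standard Tajima's D computation. It is an illustration only, not code from the paper: the function name `tajimas_d` and the input format (one string of `0`/`1` characters per sampled sequence, biallelic sites only) are assumptions made for this example.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length 0/1 strings, one string per sampled sequence.

    Assumption for this sketch: every site is biallelic and coded as '0' or '1'.
    """
    n = len(haplotypes)
    if n < 4:
        # The variance term of D is zero for n = 2 and n = 3.
        raise ValueError("need at least 4 sequences")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all sequences must have the same length")

    # Number of '1' alleles at each site; keep only segregating sites.
    counts = [sum(h[i] == "1" for h in haplotypes) for i in range(length)]
    seg = [c for c in counts if 0 < c < n]
    s = len(seg)
    if s == 0:
        raise ValueError("no segregating sites")

    # pi: average number of pairwise differences.
    pairs = n * (n - 1) / 2
    pi = sum(c * (n - c) for c in seg) / pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D compares pi with Watterson's estimator S / a1.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))
```

An excess of rare variants makes D negative, and both a recent selective sweep and a population expansion can produce that pattern, which is the confounding problem the article describes.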
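Haipeng Li's latest work has turned this dream into reality. He proposes an entirely new strategy, detecting recent positive selection by testing the topology of the tree, and has built the corresponding statistical method. Both mathematical analysis and computer simulation show that the outcome of this hypothesis test is not affected by historical changes in population size, such as bottlenecks and population expansions. This means that however the population size has changed in the past, the false positive rate of the new method will not exceed the significance level chosen for the hypothesis test.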
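The new method needs no information about population history and no estimates of population parameters, and it does not need genome-wide polymorphism data. Polymorphism data from a region of only 100 to 1,000 bp is enough to detect recent positive selection reliably. The new method is expected to give a strong boost to research in related fields. (Bioon.com)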
Original source recommended by Bioon:
Molecular Biology and Evolution, doi:10.1093/molbev/msq211
A new test for detecting recent positive selection that is free from the confounding impacts of demography
Haipeng Li
Department of Computational Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
It has been a long-standing interest in evolutionary biology to search for the traces of recent positive Darwinian selection in organisms. However, such efforts have been severely hindered by the confounding signatures of demography. As a consequence neutrality tests often lead to false inference of positive selection since they detect the deviation from the standard neutral model. Here, using the maximum frequency of derived mutations (MFDM) to examine the unbalanceness of the tree of a locus, I propose a statistical test that is analytically free from the confounding effects of varying population size and has a high statistical power (up to 90.5%) to detect recent positive selection. When compared with five well-known neutrality tests for detecting selection (i.e., Tajima's D-test, Fu & Li's D-test, Fay & Wu's H-test, the E-test and the joint DH test), the MFDM test is indeed the only one free from the confounding impacts of bottlenecks and size expansions. Simulations based on wide-range parameters demonstrated that the MFDM test is robust to background selection, population subdivision and admixture (including hidden population structure). Moreover, when two high-frequency mutations are introduced, the MFDM test is robust to the misinference of derived and ancestral variants of segregating sites due to multiple hits. Finally, the sensitivity of the MFDM test in detecting balancing selection is also discussed. In summary, it is demonstrated that summary statistics based on tree topology can be used to detect selection, and this work provides a reliable method that can distinguish selection from demography even when DNA polymorphism data from only one locus is available.
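As a rough illustration of the quantity named in the abstract, the sketch below finds the maximum derived-allele count among the segregating sites of one locus. It is not the paper's implementation and does not compute the test's p-value; the function name `max_derived_count` and the input format (one `0`/`1` string per sequence, with `1` marking the derived allele, as would be inferred from an outgroup) are assumptions made for this example.

```python
def max_derived_count(haplotypes):
    """Return (k, n): the largest derived-allele count k among segregating
    sites, and the sample size n.

    Assumption for this sketch: '1' is the derived allele and '0' the
    ancestral allele at every site.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no sequences given")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("all sequences must have the same length")

    # Derived-allele count per site; sites where the derived allele is
    # absent (0) or fixed (n) are not segregating and are skipped.
    counts = [sum(h[i] == "1" for h in haplotypes) for i in range(length)]
    seg = [c for c in counts if 0 < c < n]
    if not seg:
        raise ValueError("no segregating sites")
    return max(seg), n
```

A derived mutation carried by nearly all sampled sequences suggests a very unbalanced tree at the locus, which is the kind of signal the abstract says the MFDM test examines.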