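Professor Cannon and colleagues in the Ecological Evolution Group at Xishuangbanna Tropical Botanical Garden have developed a new analysis method that needs no prior assembly. It explores sequence differences among a set of target genomes by tabulating whether DNA fragments that reach a given level of "complexity" are present in the sequencing data, and how often they occur.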
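The article does not say how fragment "complexity" is scored. The sketch below therefore assumes one common choice: the Shannon entropy of a k-mer's base composition. Low-entropy k-mers, such as poly-A runs, are discarded, and the remaining k-mers are tallied directly from raw reads with no assembly step. The names `count_complex_kmers`, `k=21` and `min_entropy=1.5` are hypothetical illustrations, not parameters from Cannon et al.

```python
from collections import Counter
from math import log2

# Complement table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the base composition of seq.

    Ranges from 0.0 (a single repeated base) to 2.0 (equal A/C/G/T).
    Used here as a stand-in complexity score (an assumption)."""
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in Counter(seq).values())


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse
    complement, so reads from either strand are counted together."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def count_complex_kmers(reads, k=21, min_entropy=1.5):
    """Tabulate how often each sufficiently complex k-mer occurs in the reads.

    k and min_entropy are illustrative defaults, not the study's parameters."""
    table = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) - set("ACGT"):
                continue  # skip windows containing N or other ambiguity codes
            if shannon_entropy(kmer) >= min_entropy:
                table[canonical(kmer)] += 1
    return table
```

For example, `count_complex_kmers(["AAAAAACGT"], k=4, min_entropy=1.4)` keeps only `AACG` and `ACGT`, each counted once. It drops the low-entropy windows `AAAA` and `AAAC`.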
Until now, most comparative genomic analyses of short-read sequence (SRS) data have required a previously assembled reference sequence. This requirement has limited how far such data could be used in bioinformatics research.
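Cannon and colleagues compared large volumes of genomic diversity data for nine tree species, at levels ranging from populations to families. As a control, they simulated SRS data for three plant species with known genomes. This revealed the quality and distributional bias of the sequencing reaction.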
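A toy simulation can illustrate the control idea. Error-free reads are sampled uniformly from a known genome at a chosen coverage, and the sketch then measures how many of that genome's distinct k-mers the reads recover. If real reads from the same genome recover noticeably fewer k-mers at similar coverage, that would point to sequencing bias or errors. This is a simplified sketch that assumes uniform read placement, no sequencing errors and a fixed read length. `simulate_reads` and `recovery_fraction` are hypothetical helpers, not the simulator used in the study.

```python
import random


def simulate_reads(genome: str, read_len: int, coverage: float, seed: int = 0):
    """Draw error-free reads at uniformly random start positions so that the
    expected mean depth equals `coverage` (an idealized control only)."""
    if len(genome) < read_len:
        raise ValueError("genome shorter than read length")
    rng = random.Random(seed)
    n_reads = int(coverage * len(genome) / read_len)
    max_start = len(genome) - read_len
    starts = (rng.randint(0, max_start) for _ in range(n_reads))
    return [genome[s:s + read_len] for s in starts]


def kmer_set(seqs, k):
    """All distinct k-mers occurring in a collection of sequences."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def recovery_fraction(genome: str, reads, k: int) -> float:
    """Fraction of the genome's distinct k-mers that appear in the reads.
    Erroneous k-mers in real reads are ignored because only the
    intersection with the true k-mer set is counted."""
    truth = kmer_set([genome], k)
    return len(truth & kmer_set(reads, k)) / len(truth)
```

Under the uniform model, nearly all interior k-mers are recovered at 10x coverage. The misses cluster near the genome ends, where fewer read start positions can cover a k-mer.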
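The method identifies three main types of information-rich complex DNA fragments ("complexmers"), each with its own statistical properties. Type I complexmers are unique to a single genome. However, they have a high false-positive rate, because they depend strongly on sequencing coverage and on how reads are distributed. Type II complexmers are shared by two genomes and can reveal potential differences in copy number. Type III complexmers are shared by a particular subset of genomes and can be linked to morphological and geographic differences among species.

Because it can analyze massive datasets without prior assembly, the method greatly extends the use of short-read sequencing in non-model organisms. It also offers a new way to pick out the most informative genetic elements directly, for further assembly and detailed study. The study demonstrates a practical application as well. In an endangered timber species, the team found numerous population-level genetic markers that could potentially identify the origin of individual timber and help regulate the international timber trade.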
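Given a table of complexmer counts for each genome, the three types described above can be assigned from the set of genomes in which each complexmer occurs. The sketch below assumes simple presence rules: one genome gives Type I, two genomes give Type II, and more than two but not all genomes give Type III. It also adds two helpers. The first is a depth-normalized count ratio for probing copy-number differences in Type II complexmers. The second is an idealized Poisson estimate, e^(-c_k), of the chance that a k-mer with expected k-mer coverage c_k is never sampled. Such missed k-mers are one way a shared fragment becomes a false-positive Type I call at low coverage. All names and thresholds are hypothetical, and the study's exact classification criteria are not given here.

```python
from collections import Counter
from math import exp


def classify_complexmers(tables, min_count=1):
    """Assign each complexmer a type from the set of genomes it occurs in.

    tables maps genome name -> Counter of complexmer counts. Assumed rules:
    1 genome -> "I", 2 genomes -> "II", 3..n-1 genomes -> "III";
    k-mers found in all n genomes (n > 2) are dropped as uninformative."""
    n = len(tables)
    owners = {}
    for name, counts in tables.items():
        for kmer, c in counts.items():
            if c >= min_count:
                owners.setdefault(kmer, set()).add(name)
    result = {}
    for kmer, found_in in owners.items():
        m = len(found_in)
        if m == 1:
            kind = "I"
        elif m == 2:
            kind = "II"
        elif m < n:
            kind = "III"
        else:
            continue  # shared by every genome
        result[kmer] = (kind, frozenset(found_in))
    return result


def copy_ratio(kmer, a, b, tables, depth):
    """Depth-normalized count ratio of a complexmer in genome a vs genome b.
    depth maps genome -> mean k-mer coverage. A ratio near 2 hints that a
    carries about twice as many copies (a hypothetical interpretation)."""
    return (tables[a][kmer] / depth[a]) / (tables[b][kmer] / depth[b])


def prob_unsampled(read_coverage, read_len, k):
    """Poisson chance that a k-mer present in a genome never appears in its
    reads, assuming uniform coverage; real sequencing bias can raise it."""
    kmer_coverage = read_coverage * (read_len - k + 1) / read_len
    return exp(-kmer_coverage)


if __name__ == "__main__":
    # Toy k-mer labels and counts for four genomes, for illustration only.
    tables = {
        "A": Counter({"x": 2, "y": 1, "z": 4, "v": 1}),
        "B": Counter({"x": 1, "z": 2, "v": 3}),
        "C": Counter({"x": 3, "w": 1, "v": 2}),
        "D": Counter({"x": 5}),
    }
    for kmer, (kind, found_in) in sorted(classify_complexmers(tables).items()):
        print(kmer, kind, sorted(found_in))
    print(copy_ratio("z", "A", "B", tables, {"A": 1.0, "B": 1.0}))
    print(prob_unsampled(10, 100, 21))
```

At 10x read coverage with 100 bp reads and k = 21, the expected k-mer coverage is 8. The miss probability is therefore about e^-8, roughly 3 in 10,000. At 2x coverage it rises to e^-1.6, roughly 1 in 5. This illustrates why Type I calls are so sensitive to coverage.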
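Breakthroughs in next-generation DNA sequencing have opened a new platform for studying the ecology and evolution of tropical forests. Cannon's study is an important step in Xishuangbanna Tropical Botanical Garden's effort to apply genomics to several areas: the functional adaptation and evolution of plants under climate change, species diversification and coexistence, and the conservation of the critically endangered natural resources of Asian tropical forests. (Bioon.com)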
Original source (recommended by Bioon):
Molecular Ecology Volume 19 Issue s1, Pages 147 - 161
Assembly free comparative genomics of short-read sequence data discovers the needles in the haystack
CHARLES H. CANNON*†, CHAI-SHIAN KUA*, D. ZHANG* and J.R. HARTING
*Ecological Evolution Group, Xishuangbanna Tropical Botanic Garden, Chinese Academy of Sciences, Menglun, Mengla 666303, China; †Department of Biological Sciences, Texas Tech University, Lubbock, TX 79409, USA
Most comparative genomic analyses of short-read sequence (SRS) data rely upon the prior assembly of a reference sequence. Here, we present an assembly free analysis of SRS data that discovers sequence variants among focal genomes by tabulating the presence and frequency of 'complex' fragments in the data. Using data from nine tree species, we compare genomic diversity from populations to families. As a control, we simulated SRS data for three known plant genomes. The results provide insight into the quality and distributional bias of the sequencing reaction. Three main types of informative complexmers were identified, each possessing unique statistical properties. Type I complexmers are unique to a genome but suffer from a high false positive rate, being highly dependent on read coverage and distribution. Type II complexmers are shared between two genomes and can highlight potential copy-number differences. Type III complexmers are exclusive to a subset of genomes and can be useful for associating genetic differences with phenotypic or geographic variation. At the population level in an endangered timber species, numerous markers were identified that could potentially determine geographic origin of individuals and regulate international trade. We observed that the genomic data for the four fig species were more divergent than for stone oak species, possibly due to their complex pollination syndrome and high rates of gene flow. Our approach greatly enhances the application of SRS technology to the study of non-model organisms and directly identifies the most informative genetic elements for more detailed study and assembly.