A SNP Map of the Rat Genome Generated from cDNA Sequences
Heike Zimdahl,1 Gerald Nyakatura,2 Petra Brandt,2 Herbert Schulz,1 Oliver Hummel,1,3 Berthold Fartmann,2 David Brett,2 Marcus Droege,2 Jan Monti,1 Young-Ae Lee,1 Yinyan Sun,1,3 Shaying Zhao,4 Eitan E. Winter,5 Chris P. Ponting,5 Yuan Chen,6 Arek Kasprzyk,6 Ewan Birney,6 Detlev Ganten,1 Norbert Hubner1*
The understanding of disease susceptibility and biological variability will largely depend on the identification of functional genetic variants. We describe a map of more than 12,000 gene-based single nucleotide polymorphisms (SNPs) from transcribed regions. The map was created by aligning cDNA sequences from three other rat strains to the recent rat BN/SsNHsd/Mcwi (henceforth BN) genome assembly (1). Forty-two percent of the SNPs could be mapped to genes with a known translational start site in the rat. These comprise 3785 SNPs in untranslated regions and 1229 coding-region SNPs (cSNPs). The cSNPs are divided roughly equally between synonymous and nonsynonymous changes.
To identify SNPs in transcribed sequences, we sequenced 371,916 clones from non-normalized cDNA libraries generated from several tissues (2). The cDNA libraries were derived from two widely used inbred rat strains. The first is SHRSP, one of the most commonly used experimental models in cardiovascular research. The second is WKY, a common reference strain. Additional cDNA libraries were produced from Sprague Dawley (SD) rats, a commonly used but not fully inbred strain. Sequences were compared using SSAHA-SNP software (3) to identify differences within high-quality matches (2). In total, we identified 12,985 high-quality candidate SNPs in transcribed sequences across the rat genome (Table 1) (2).
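The idea of calling candidate SNPs only where a cDNA read matches the reference with high quality can be illustrated with a small sketch. It is not SSAHA-SNP (3) and does not reproduce its alignment step. It assumes the read has already been aligned to the BN reference without gaps. It reports a mismatch only if the mismatched base and its flanking bases meet base-quality thresholds and the flanks otherwise match the reference. The function name `call_candidate_snps` and the thresholds `min_q`, `flank` and `flank_q` are hypothetical placeholders, not the parameters the authors used.

```python
from dataclasses import dataclass


@dataclass
class CandidateSnp:
    ref_pos: int   # 0-based position in the (BN) reference segment
    ref_base: str
    alt_base: str


def call_candidate_snps(ref, read, quals, min_q=23, flank=5, flank_q=15):
    """Report quality-filtered mismatches between a pre-aligned, ungapped read
    and its reference segment. Thresholds are illustrative assumptions only."""
    if not (len(ref) == len(read) == len(quals)):
        raise ValueError("ref, read and quals must have the same length")
    snps = []
    for i, (r, b) in enumerate(zip(ref, read)):
        # Skip matches and ambiguous base calls.
        if r == b or "N" in (r, b):
            continue
        # The mismatched base itself must be high quality.
        if quals[i] < min_q:
            continue
        lo, hi = i - flank, i + flank + 1
        if lo < 0 or hi > len(ref):
            continue  # not enough flanking sequence to judge the neighbourhood
        window = [j for j in range(lo, hi) if j != i]
        flanks_good = all(quals[j] >= flank_q for j in window)
        flanks_match = all(ref[j] == read[j] for j in window)
        if flanks_good and flanks_match:
            snps.append(CandidateSnp(i, r, b))
    return snps
```

Requiring clean, high-quality flanks trades sensitivity for specificity. This fits a screen in which candidate SNPs are later checked by resequencing.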
Table 1. Results of SNP screening in transcribed sequences. "Number of base pairs (bp) screened" includes only the reads that passed all filters.
--------------------------------------------------------------------------------
Strain comparison       No. of bp screened   No. of candidate SNPs
--------------------------------------------------------------------------------
SHRSP versus BN                  5,114,620                   4,748
WKY versus BN                    4,699,327                   4,138
SD versus BN                     3,557,619                   2,765
WKY versus SHRSP                                               478
WKY versus SD                                                  407
SHRSP versus SD                                                449
--------------------------------------------------------------------------------
Total                           13,371,566                  12,985
--------------------------------------------------------------------------------
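The totals in Table 1 can be checked directly, and a rough polymorphism rate can be derived from them. Screened base pairs are reported only for the three comparisons against BN, so this sketch divides those base pairs by the SNPs from the same three comparisons. The variable and function names are illustrative.

```python
# Figures transcribed from Table 1: (bp screened, candidate SNPs).
BN_COMPARISONS = {
    "SHRSP versus BN": (5_114_620, 4_748),
    "WKY versus BN": (4_699_327, 4_138),
    "SD versus BN": (3_557_619, 2_765),
}
# Comparisons not involving BN report SNP counts only.
OTHER_COMPARISONS = {
    "WKY versus SHRSP": 478,
    "WKY versus SD": 407,
    "SHRSP versus SD": 449,
}


def bp_per_snp(bp_screened, snps):
    """Average number of screened base pairs per candidate SNP."""
    return bp_screened / snps


total_bp = sum(bp for bp, _ in BN_COMPARISONS.values())
bn_snps = sum(n for _, n in BN_COMPARISONS.values())
total_snps = bn_snps + sum(OTHER_COMPARISONS.values())
per_comparison = {name: bp_per_snp(bp, n) for name, (bp, n) in BN_COMPARISONS.items()}
pooled = bp_per_snp(total_bp, bn_snps)

if __name__ == "__main__":
    for name, value in per_comparison.items():
        print(f"{name}: 1 SNP per {value:,.0f} bp")
    print(f"Pooled BN comparisons: 1 SNP per {pooled:,.0f} bp")
    print(f"Totals: {total_bp:,} bp screened, {total_snps:,} candidate SNPs")
```

The two totals reproduce the Total row (13,371,566 bp and 12,985 SNPs). Per comparison, the script gives about 1 SNP per 1,077 bp (SHRSP), 1,136 bp (WKY) and 1,287 bp (SD). The pooled figure is about 1 SNP per 1,148 bp, close to the rate of roughly 1 SNP per 1100 bp given in the text.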
To test the accuracy of our results, we resequenced 300 SNPs in genomic DNA. Of these, 287 (95.7%) were confirmed, indicating that most of the candidate SNPs are true variants. The rate of polymorphism between BN and the other strains used here was approximately 1 SNP per 1100 base pairs. Transitions outnumbered transversions by roughly 2:1, consistent with prior observations for human and mouse SNPs (4, 5).

SNPs were not randomly distributed across the transcribed loci. We tested a model in which SNPs are randomly distributed over clustered expressed sequence tags (ESTs). Under this model, the number of SNPs per locus follows a random (Poisson) distribution, given the length of each transcript and the mean observed polymorphism rate. Comparing the observed distribution with this expectation rejected the model (P < 10^-19) (2). There was an excess of loci with either no SNP or multiple SNPs.

This nonrandom distribution suggests regional variation due to gene history. That is, the coalescent time to the most recent common ancestor of the alleles at a locus can vary greatly across the genome, owing to either natural selection or demographic effects (2). Determining the complete haplotype structure across commonly used laboratory rat strains may therefore provide an important tool for identifying disease susceptibility genes, as recently suggested for the mouse (4, 6).
Mutations that cause disease frequently occur in the coding sequence and directly influence protein structure and function. However, many diseases result from mutations that influence various aspects of messenger RNA (mRNA) metabolism (7). To provide an initial functional annotation and to maximize the utility of the map, we characterized SNPs at three levels [supporting online material (SOM) Text]:
(i) the transcript or nucleotide level,
(ii) the protein level, and
(iii) the genomic level.
The data can be viewed graphically in the ENSEMBL database (8).
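The transcript-level and protein-level categories can be illustrated with a short sketch. A SNP's position relative to an annotated start codon decides whether it falls in the 5' untranslated region (UTR), the coding sequence or the 3' UTR. This is why a known translational start site is needed before a SNP can be placed in these classes.

For coding SNPs, the sketch translates the reference and variant codons with the standard genetic code and compares the amino acids. The function `classify_transcript_snp`, its 0-based coordinate conventions and the uppercase-DNA input are assumptions for illustration. This is not the authors' annotation pipeline.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)}


def classify_transcript_snp(transcript, pos, alt, cds_start, cds_end):
    """Classify a SNP at 0-based `pos` in an uppercase cDNA `transcript`.

    `cds_start` is the index of the A of the annotated ATG start codon and
    `cds_end` is the exclusive end of the coding sequence (stop codon included).
    """
    if pos < cds_start:
        return "5' UTR"
    if pos >= cds_end:
        return "3' UTR"
    frame = (pos - cds_start) % 3  # position of the SNP within its codon
    codon_start = pos - frame
    ref_codon = transcript[codon_start:codon_start + 3]
    alt_codon = ref_codon[:frame] + alt + ref_codon[frame + 1:]
    # Changes to or from a stop codon ('*') also count as nonsynonymous here.
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "nonsynonymous"


if __name__ == "__main__":
    # Hypothetical transcript: 5' UTR "GG", CDS ATG GCT TAA, 3' UTR "CC".
    t = "GGATGGCTTAACC"
    for pos, alt in [(0, "C"), (7, "C"), (5, "A"), (12, "A")]:
        print(pos, alt, classify_transcript_snp(t, pos, alt, cds_start=2, cds_end=11))
```

In this example, changing GCT to GCC keeps alanine (synonymous), while changing GCT to ACT gives threonine (nonsynonymous).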
This first-generation cDNA-based SNP map for the rat provides a public resource for defining functional variation across the genome. It represents a valuable tool for comparative genome analysis with other mammalian species and should help identify biomedically important genes.
References and Notes
1. Rat Genome Sequencing Project Consortium. Data available from www.hgsc.bcm.tmc.edu/projects/rat/.
2. Materials and Methods are available as supporting online material on Science Online.
3. Z. Ning, A. J. Cox, J. C. Mullikin, Genome Res. 11, 1725 (2001).
4. K. Lindblad-Toh et al., Nature Genet. 24, 38 (2000).
5. International SNP Map Working Group, Nature 409, 928 (2001).
6. C. M. Wade et al., Nature 420, 574 (2002).
7. J. T. Mendell, H. C. Dietz, Cell 107, 411 (2001).
8. All SNP data aligned to the assembled rat genome sequence and annotation with respect to described genes and genomic features are incorporated into the ENSEMBL database (www.ensembl.org). Integrating our data into the ENSEMBL database allows it to be visualized within the context of the growing body of information about the human and other genome sequences represented. Additionally, all data can be found at dbSNP (www.ncbi.nlm.nih.gov/SNP), which details the exact allelic variant of the described SNP. SNPs can be accessed from dbSNP and ENSEMBL databases using accession numbers ss16343960 to ss16355610.
9. We thank all members of the Rat Genome Sequencing Project Consortium for providing the assembled rat genome sequence. We also thank H. Lehrach and T. J. Aitman for comments on the manuscript. Supported by MWG and by a grant from the NGFN/BMBF (German Ministry for Research and Education) to N.H.