In a dramatic preview of post-genomic research, a team of bioinformaticians and geneticists has identified the gene defect underlying Leigh Syndrome, French Canadian type (LSFC), a fatal hereditary disease prevalent in the Saguenay-Lac-Saint-Jean (SLSJ) region of Quebec. The discovery, which resulted from cross-referencing DNA, protein, and gene expression databases, was published in the January 14, 2003, issue of the Proceedings of the National Academy of Sciences.
One out of 2,000 children born in the SLSJ area suffers from this recessive form of Leigh Syndrome, which causes mental retardation and ultimately premature death. About one out of every 23 inhabitants in this region carries a copy of the defective gene. Children who inherit two copies of the faulty gene develop the disease. The new finding will form the basis of a genetic test to screen for gene carriers and to identify children with the disease before serious symptoms arise.
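The two figures above, a carrier rate of about 1 in 23 and an incidence of about 1 in 2,000, can be checked against each other with a little arithmetic. The sketch below assumes Hardy-Weinberg proportions (random mating, no selection among carriers), an assumption the article does not state. The function names are illustrative only.

```python
import math

def allele_frequency_from_carrier_rate(carrier_rate):
    """Solve 2q(1 - q) = carrier_rate for the smaller root q.

    Assumes Hardy-Weinberg proportions, which the article does not claim.
    """
    return (1 - math.sqrt(1 - 2 * carrier_rate)) / 2

def expected_incidence(carrier_rate):
    """Expected fraction of births inheriting two defective copies (q squared)."""
    q = allele_frequency_from_carrier_rate(carrier_rate)
    return q ** 2

carrier_rate = 1 / 23                 # "one out of every 23 inhabitants"
incidence = expected_incidence(carrier_rate)
births_per_case = 1 / incidence       # compare with "one out of 2,000 children"
print(f"Expected incidence: about 1 in {births_per_case:.0f} births")
```

Under that assumption, the carrier rate implies an incidence of roughly 1 in 2,000 births, so the two reported numbers agree.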
Two years ago, the Whitehead Institute Center for Genome Research pinned the location of the LSFC gene to a section of chromosome 2 containing about 30 known and suspected genes. During a weekend in the fall of 2001, Whitehead fellow Vamsi Mootha ran a custom-built software program through two large gene expression databases: READ (Riken Expression Array Database) and a compilation of cancer study results published online by the Whitehead group.
Mootha was investigating whether any of those 30 candidate LSFC genes showed similar expression patterns to known genes involved in the synthesis and function of mitochondria (microscopic energy-generating organelles within cells). In the early 1990s, researchers at the Hospital for Sick Children in Toronto had shown that LSFC patients suffer defects in energy metabolism.
Why use a cancer gene database to hunt down a metabolic disease gene? Gene chip studies typically produce data on tens of thousands of genes, thereby providing information about many genes that aren't directly involved in cancer.
Within a few days of starting the analysis, one intriguing "mitochondrial-like" gene kept "popping up," Mootha says. He then checked data from a proteomics study of mitochondria that he had previously (and conveniently) worked on. He also ran the same analysis on two expression databases from the Genomics Institute of the Novartis Research Foundation's Gene Expression Atlas.
Everything pointed to the same gene, LRPPRC, which codes for an RNA-binding protein likely involved in the processing of mitochondrial gene transcripts.
Finally, genetic testing verified what the software search had indicated, revealing the precise abnormality in LSFC.
Twenty-two patients and 32 parents were tested for the mutation. All of them had mutations in LRPPRC, including one patient with a distinct mutation in each copy of the gene. This last finding, Mootha says, is the "crowning evidence" that proves this is the genuine defect.
It took about seven months to go from identifying the gene to confirming it was the culprit. But that's lightning speed compared to traditional approaches to hunting for disease genes, which typically took years, sometimes decades, in the pre-genome era.
In the Neighborhood
Mootha's neighborhood analysis software finds genes that have similar gene expression patterns. "If you ask, 'Who does this gene travel with?' you can use that information to find new genes that do the same general thing as ones you already know about," he says, such as genes involved in energy metabolism. Because the expression databases Mootha used are public, anyone could follow his lead, in principle. (Mootha will even send the program to those requesting it by e-mail.)
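The "who does this gene travel with?" idea can be illustrated with a toy example. What follows is a simplified sketch, not Mootha's program. It assumes that similarity is measured by Pearson correlation across samples and that each candidate is scored by how many known reference genes fall among its k most similar genes. The gene names and expression values are invented for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def neighborhood_score(gene, expression, reference_set, k):
    """Count reference genes among the k genes most correlated with `gene`."""
    others = [g for g in expression if g != gene]
    ranked = sorted(
        others,
        key=lambda g: pearson(expression[gene], expression[g]),
        reverse=True,
    )
    return sum(1 for g in ranked[:k] if g in reference_set)

# Invented expression values across five hypothetical samples.
expression = {
    "mito_ref_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "mito_ref_2": [1.1, 2.1, 2.9, 4.2, 4.8],
    "mito_ref_3": [0.9, 1.8, 3.2, 3.9, 5.1],
    "unrelated_1": [5.0, 1.0, 4.0, 2.0, 3.0],
    "unrelated_2": [3.5, 1.5, 4.5, 2.5, 2.0],
    "unrelated_3": [4.2, 0.8, 4.6, 1.9, 3.1],
    "candidate_A": [2.0, 4.1, 5.9, 8.2, 9.9],
    "candidate_B": [4.0, 1.0, 5.0, 2.0, 2.5],
}
reference = {"mito_ref_1", "mito_ref_2", "mito_ref_3"}  # known mitochondrial genes
candidates = ["candidate_A", "candidate_B"]             # genes in the mapped interval

scores = {c: neighborhood_score(c, expression, reference, k=3) for c in candidates}
best = max(scores, key=scores.get)
print(scores, "-> most mitochondrial-like:", best)
```

In this toy data, candidate_A rises and falls with the reference genes and so scores highest. A real analysis would have to choose k, handle thousands of genes, and judge whether a score is higher than chance would produce.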
"Many platforms are now generating a lot of genomics information," says Scott Jokerst, senior informatics product manager at Affymetrix Inc. "Research like this provides clear evidence that people can put data together in a way that can be applied." Mootha hopes this approach can be used to find more disease genes.
"Complex diseases are still going to be difficult, but it's heartening when you see this happen with even simpler diseases, where the genetic location wasn't known," Jokerst says.
"We're excited about using [the technique] again," Mootha says, "and a few people have already approached us to suggest projects. But this is not a program that you just drop onto the data. It has to be tailored for each case."
Several Canadian groups collaborated with the Whitehead Institute on this study, including researchers at McGill University and The Hospital for Sick Children in Toronto. The proteomics studies were conducted at MDS Proteomics in Denmark.