BIOINFORMATICS OF SIGNAL TRANSDUCTION PATHWAYS: Dong Xu
Traditionally, signal transduction pathways have been studied one at a time through ad hoc approaches that require many experiments and a very long time. With the advent of bioinformatics and high-throughput measurement technologies, e.g., microarray chips for gene/protein expression and two-hybrid systems for protein-protein interactions, it is now feasible and essential to develop new and effective protocols for systematic characterization of signal transduction pathways. The high-throughput data provide a protein interaction map at the cellular level, based on both physical interactions between proteins and genetic interactions between genes. Bioinformatics can combine these data with existing computational tools to generate hypotheses about signal transduction pathways. This includes constructing pathway models, studying evolutionary relationships, understanding biological mechanisms using various bioinformatic tools (e.g., subcellular localization prediction, structure prediction, and gene regulatory region analysis), interpreting and integrating experimental data, and rationally designing new experiments. After hypothesis generation, the constructed models can be used for mathematical modeling of their behavior, including dynamics/kinetics modeling and biomaterial flux modeling. The results of mathematical modeling can in turn provide feedback for refining the model.
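As a minimal illustration of the dynamics/kinetics modeling mentioned above, the following Python sketch integrates a hypothetical two-step activation cascade (a signal activates a kinase, and the active kinase activates a target). The model structure, the function names, and all rate constants are assumptions chosen for illustration only; they do not describe any specific pathway in this proposal. The closed-form steady state gives a simple check on the simulation, which is the kind of feedback that can be used to refine a model.

```python
def simulate_cascade(signal, k1=1.0, d1=0.5, k2=2.0, d2=0.5,
                     dt=0.01, t_end=50.0):
    """Forward-Euler integration of a hypothetical two-step cascade.

    State variables are the active fractions (0..1) of a kinase and of
    its downstream target. All rate constants are illustrative.
    Returns a list of (time, kinase_active, target_active) tuples.
    """
    kinase, target = 0.0, 0.0
    steps = int(round(t_end / dt))
    trajectory = [(0.0, kinase, target)]
    for i in range(1, steps + 1):
        # Activation is proportional to the upstream input and to the
        # inactive fraction; deactivation is first order.
        d_kinase = k1 * signal * (1.0 - kinase) - d1 * kinase
        d_target = k2 * kinase * (1.0 - target) - d2 * target
        kinase += dt * d_kinase
        target += dt * d_target
        trajectory.append((i * dt, kinase, target))
    return trajectory


def steady_state(signal, k1=1.0, d1=0.5, k2=2.0, d2=0.5):
    """Analytical steady state of the same model (both derivatives zero)."""
    if signal == 0:
        return 0.0, 0.0
    kinase = k1 * signal / (k1 * signal + d1)
    target = k2 * kinase / (k2 * kinase + d2)
    return kinase, target


if __name__ == "__main__":
    final_time, kinase, target = simulate_cascade(signal=1.0)[-1]
    print("simulated:", kinase, target)
    print("analytical:", steady_state(signal=1.0))
```

Forward Euler is used here only to keep the sketch dependency-free; a dedicated simulator or a stiff ODE solver would be the more appropriate choice for realistic pathway models.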
In characterizing signal transduction pathways, another important source of information is the set of known pathways in other genomes. For example, if a particular transport pathway is partially or fully characterized in yeast, it can possibly be used as a template in characterizing the corresponding or related pathway in another genome. Over the years, a number of signal transduction pathways have been fully or partially characterized in different genomes by different research communities. These pathways have been carefully extracted from the literature and put into various databases, several of which are dedicated to signal transduction networks. CSNDB (http://geo.nihs.go.jp/csndb/) is a data- and knowledge-base for signaling pathways of human cells. Transpath (http://193.175.244.148/) focuses on pathways involved in the regulation of transcription factors in different species, including human, mouse and rat. SPAD (http://www.grt.kyushu-u.ac.jp/eny-doc/) is an integrated database for genetic information and signal transduction systems. The most comprehensive and widely used database for biological pathways is KEGG (http://star.scl.kyoto-u.ac.jp/kegg/), which contains information on metabolic pathways, signal transduction pathways, and molecular assemblies. Many other data resources directly related to signal transduction are also available, e.g., the data at the Alliance for Cell Signaling (http://www.afcs.org/) and BioNOME (http://bionome.sdsc.edu/). Given a known or constructed pathway model, several tools can simulate the behavior of the model, either for a particular pathway or at the whole-cell level, e.g., Gepasi (http://gepasi.dbs.aber.ac.uk/softw/gepasi.html), Virtual Cell (http://www.nrcam.uchc.edu/vcell_development/vcell_dev.html), and E-Cell (http://www.e-cell.org/).
There have been a number of attempts to construct signal transduction pathway models using various computational frameworks, such as Bayesian networks (Friedman et al, 2000), Boolean networks (Shmulevich et al, 2002), differential equations (Jamshidi et al, 2001; Kato et al, 2000), and steady-state models (Kyoda et al, 2000), generally based on a single type of experimental data, such as microarray gene expression data. While potentially promising, these modeling methodologies make scant use of the multitude of available information sources in a coherent manner, and thus produce overly simplistic solutions. This is particularly problematic given that high-throughput experimental data are generally very noisy, intrinsically incomplete, and possibly inconsistent. We propose to develop bioinformatic techniques for integrating information from experimental data (e.g., gene expression data, protein-protein interaction data, genomic sequence data, subcellular localization data, and lipidomics data), and to design targeted experiments for the study of specific pathway components. One unique strength of this proposal is our strong capacity to generate various kinds of high-throughput data, as described in the five cores of this proposal, whereas other high-throughput centers generally focus on one technology. Such experimental capacity provides immediate access to related data and fast validation of prediction results. In addition, our bioinformatics effort has wide coverage, including bioinformatics analysis of high-throughput data, protein structure prediction, hypothesis generation and mathematical modeling of pathways, and machine-learning techniques for molecular biology, whereas other bioinformatics groups typically specialize in a particular area.
A wide range of in-house experimental and computational expertise allows us to shorten the iterative cycle from data collection to data analysis and rational design of experiments, so that we can significantly reduce the cost and time needed to fully characterize a signal transduction pathway. To our knowledge, this places us in a unique position that is not matched elsewhere. The bioinformatics effort of this proposal has both service and research aims. The service aim is to assist in the discovery and integration of knowledge produced by all specific projects of the proposal. The research aim is to investigate new computational methods for knowledge discovery and integration. The Xu Lab at Oak Ridge National Laboratory will primarily work on the integration of knowledge about signaling pathways. The Discovery Systems Laboratory (DSL) at Vanderbilt University will provide standard and novel supervised and unsupervised learning algorithms and their implementations, in support of all projects and of novel research on the induction of focused regions in specific regulatory pathways. The two teams will work closely together. In particular, the Xu Lab will apply novel algorithms developed by DSL for the initial construction of signal transduction pathways.
Dong Xu has access to well-established bioinformatics infrastructure at Oak Ridge National Laboratory, including a strong research team with diverse expertise across a wide range of bioinformatics studies, high-performance computing facilities, and a comprehensive collection of bioinformatics software and databases (see the facility description). He has a good track record in many areas of bioinformatics related to this project, including sequence comparison, protein structure prediction, gene expression data analysis, regulatory region analysis, and signal transduction pathway construction. A relevant project that he carried out is the application of a consensus approach to studying the signal transduction pathway for amino acid and peptide transport in the yeast S. cerevisiae. Genes encoding the amino acid and peptide transporters are differentially regulated by the presence of specific amino acids and peptides in the growth medium. Receptors on the cytoplasmic membrane sense extracellular amino acids and peptides and transduce a signal to intracellular molecules. Among the receptors, Ptr3p plays a crucial role as a switch for regulating expression of the di/tri-peptide transporter, Ptr2p, as well as a number of amino acid permeases. It is proposed that a signal transduction pathway is activated between Ptr3p and the transcription factors of the amino acid and peptide transporters. Several key questions related to this transport pathway remain unsolved, including which pathway components lie between Ptr3p and the transcription factors for proteins in the related pathways. In collaboration with the experimentalist Dr. J. Becker at the University of Tennessee, we have performed computational studies on these questions using various tools and data.
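The question of which components lie between a receptor and a transcription factor can be framed as a path search in an interaction map. The Python sketch below is an illustration only and is not the method of the study referred to here: it enumerates simple paths between two nodes and ranks them by how many consecutive proteins have compatible subcellular localizations. The graph, the node names ("receptor", "A", "B", "C", "TF"), the localization labels, and the compatibility rule are all hypothetical placeholders, not real interaction data.

```python
def simple_paths(graph, source, target, max_edges=4):
    """Yield every simple path from source to target with <= max_edges edges."""
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        if len(path) - 1 >= max_edges:
            continue  # path already uses the maximum number of edges
        for neighbor in graph.get(node, ()):
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                stack.append((neighbor, path + [neighbor]))


def path_score(path, localization, compatible):
    """Fraction of consecutive pairs whose localizations are compatible."""
    pairs = list(zip(path, path[1:]))
    hits = sum((localization[a], localization[b]) in compatible
               for a, b in pairs)
    return hits / len(pairs)


def rank_paths(graph, source, target, localization, compatible, max_edges=4):
    """Return (score, path) tuples, best score first, shorter paths first on ties."""
    scored = [(path_score(p, localization, compatible), p)
              for p in simple_paths(graph, source, target, max_edges)]
    return sorted(scored, key=lambda item: (-item[0], len(item[1]), item[1]))


# Hypothetical toy interaction map (undirected, stored as adjacency lists).
TOY_GRAPH = {
    "receptor": ["A", "B"],
    "A": ["receptor", "C", "TF"],
    "B": ["receptor", "TF"],
    "C": ["A", "TF"],
    "TF": ["A", "B", "C"],
}
# Hypothetical localization annotations.
TOY_LOCALIZATION = {
    "receptor": "membrane", "A": "cytoplasm", "B": "mitochondrion",
    "C": "cytoplasm", "TF": "nucleus",
}
# Assumed rule: a signal moves membrane -> cytoplasm -> nucleus.
TOY_COMPATIBLE = {
    ("membrane", "membrane"), ("membrane", "cytoplasm"),
    ("cytoplasm", "cytoplasm"), ("cytoplasm", "nucleus"),
    ("nucleus", "nucleus"),
}

if __name__ == "__main__":
    for score, path in rank_paths(TOY_GRAPH, "receptor", "TF",
                                  TOY_LOCALIZATION, TOY_COMPATIBLE):
        print(round(score, 2), " -> ".join(path))
```

In practice the score would also need to account for functional annotation and for the reliability of each reported interaction; the localization-only rule is used here to keep the sketch short.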
We have constructed (Yu et al, submitted) an interaction map for the Ssy1p-Ptr3p-Ssy5p complex and the transcription factors that control proteins in the related pathways, using various sources of information, including data from DIP (http://dip.doe-mbi.ucla.edu), BIND (http://binddb.org) and gene expression data (Forsberg et al, 2001; Zhu et al, 2000). Protein function and subcellular localization information was used to select the most probable pathways, i.e., a path whose constituents have more closely related functions and compatible subcellular localizations has a better chance of being the correct pathway. Regulatory region analysis and gene expression analysis validate the pathway model by showing how the se