国产精品综合色区在线观看,国产日韩欧美激情视频在线观看,国产乱妇乱子视频在线

隨著結(jié)構(gòu)基因組的發(fā)展,，對(duì)結(jié)構(gòu)進(jìn)行分析,，預(yù)測(cè)，注釋的工具的應(yīng)用和發(fā)展越來越重要了,。在這篇review中作者從四個(gè)方面介紹了這個(gè)領(lǐng)域近來的發(fā)展：特異的結(jié)構(gòu)比對(duì)（specifically structure alignment）,，進(jìn)化距離遠(yuǎn)的同源及類似蛋白的檢測(cè)（detection of remote homologs and analogs），同源建模（homology modeling）和從結(jié)構(gòu)預(yù)測(cè)功能（use of structures to predict function）,。作者還對(duì)結(jié)構(gòu)基因組的各種理論進(jìn)行了探討,，包括基于結(jié)構(gòu)的序列空間聚類和基因組范圍的功能識(shí)別等。

Structural genomics: Computational　methods for structure analysis

SHARON GOLDSMITH-FISCHMAN AND BARRY HONIG
Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Columbia University,
New York, New York 10032, USA

Abstract
The success of structural genomics initiatives requires the development and application of tools for structure　analysis, prediction, and annotation. In this paper we review recent developments in these areas; specifically　structure alignment, the detection of remote homologs and analogs, homology modeling and the use of　structures to predict function. We also discuss various rationales for structural genomics initiatives. These　include the structure-based clustering of sequence space and genome-wide function assignment. It is also　argued that structural genomics can be integrated into more traditional biological research if specific
biological questions are included in target selection strategies.　
Keywords: Structural genomics; homology modeling; structure alignments; functional analysis.

Structural genomics is a term that refers to high-throughput　three-dimensional structure determination and analysis of　biological macromolecules, at this stage primarily individual
protein domains. The determination of the three-dimensional　structures of proteins has for many years come　under the classification of “curiosity” or “hypothesis　driven” research. Structures were generally determined because　they could be expected to teach us something new
about a biological problem; for example, the details of an　enzyme mechanism, the nature of a molecular recognition　process, or the energetic basis of energy transduction processes.
An important spinoff of structural biology has been　the discovery of new relationships between amino acid sequences　and protein structures, and among different protein　structures. New computational tools have been developed to　exploit the information that has become available, and many　remarkable and unexpected relationships have been uncovered.　Concepts such as protein family, fold, and superfamily　have been introduced (Orengo et al. 1997; Hubbard et al.1999), and detailed taxonomies have been developed that　help us understand the complex three-dimensional shapes of　proteins. Structural genomics represents a new direction in
structural biology in that it is based on the goal of determining　as many structures as possible, even in advance of a　well-defined biological question. Nevertheless, the field is
ultimately “curiosity driven” but the questions being asked　now relate to the discovery of complex relationships in sequence　and structure space and, ultimately, to a deeper understanding　of many biological problems once these relationships　are understood.
It should be recognized that in the short run, structural　genomics will not lead to the determination of the large　macromolecular structures and complexes that have been　the hallmark of the breathtaking advances in structural biology　in recent years. Indeed, the experimental focus has　generally been on technological advances in expression, purification,
crystallization, and structure determination,　whereas the structures themselves need not be particularly　interesting in terms of the new biology they reveal. The　stated goals of structural genomics have reflected this reality.

One goal that was expressed at an early stage was the　determination of a representative for each protein fold, thus　providing complete coverage of “fold space.” However,　there is a great deal of ambiguity in the definition of a fold,and in fact, an argument can be made that fold space should　be viewed as continuous (Shindyalov and Bourne 2000;　Yang and Honig 2000a; Harrison et al. 2002). A more precise　expression of the goals of structural genomics is the
experimental determination of enough structures so that all　other structures can be built with homology modeling (Vitkup　et al. 2001; Chance et al. 2002). The figure of 30%　sequence identity is often used as the cutoff, above which a　homology model is viewed as meaningful.

Based on this　criterion, it has been estimated that about 16,000 structures　need to be determined (Vitkup et al. 2001). It is perhaps　remarkable, and for a computational biologist quite satisfying,　that an experimental strategy is based on the ability to　construct a model. Functional considerations are also being　used as criteria in target selection, as discussed in a recent　article in this journal from the New York Structural Genomics　Research Consortium (Chance et al. 2002). An additional　strategy, based on the functional information derivable
from homology models, will be introduced below.

Although the goals of structural genomics are varied,　there is no question that the coming years will see a major　increase in the number of proteins whose three-dimensional　structures are known. The status of structural genomics targets,including information relating to gene expression, protein　purification, and structure determination, is curated and　made publicly available at targetdb.pdb.org. How should　the newly determined structures be used? As part of NESG　(Northeast Structural Genomics Consortium; www.nesg.org; Bertone et al. 2001) we are attempting to derive maximum　information from each new structure that is determined　by our consortium. Our goals are to assign function　or to suggest functional hypotheses for each new structure　and to update sequence–structure relationships on a continuous　basis. The availability of new structures allows us to　test existing methods and to consider what new computational　tools might be developed so as to better exploit the　continuous flow of new structural information. The goal of　this article is to discuss the approaches currently being used　to analyze new structures and, in a number of cases, to point　to new directions. Specifically we summarize (1) structural　relationships among proteins, (2) methods that combine sequence　and structural information to derive new relationships　between distantly related proteins, (3) protein structure　prediction by homology, and (4) structure-based assignment　of protein function. As will be discussed, there is　a clear synergistic relationship between computational　methods and target selection in structural genomics. Computational　methods generally become increasingly effective　when the data set of protein structures upon which they are　based increases. In parallel, the criteria used to choose the
structures to be determined experimentally will benefit from　improvements in the sensitivity and precision of methods　used to define sequence, structure, and functional relationships
between proteins.

Structure alignments

Folded proteins are constructed from a small number of　secondary structure elements (SSEs) whose mutual organization　in space creates distinct and somewhat striking threedimensional
patterns. These were visually characterized and　classified in a seminal paper by Jane Richardson, who also　emphasized their esthetic qualities (Richardson 1981). The　visual classification of protein structures and the relationships　that can be detected in this way are embodied in the　SCOP database (Hubbard et al. 1999). SCOP classifies most　globular proteins into class (Levitt and Chothia 1976), fold,　superfamily, and family; these terms have been widely　adopted. Fold refers primarily to the organization of SSEs in　space, while superfamily refers to proteins with the same　fold and a related function. In addition to SCOP, another　widely used classification scheme is the CATH database of　Thornton, Orengo et al. (1997). CATH is partially automatic　and makes use of the SSAP program (Orengo and　Taylor 1996) to derive structural relationships. As is the　case for SCOP, CATH also involves manual intervention,　particularly to discriminate between groups of proteins that　have a similar fold (analogs) and those that have a common　fold and a functional relationship (homologs, or superfamily　members in the SCOP terminology).

SCOP and CATH are extremely valuable resources that　have been used to derive structure/function relationships　between proteins. SCOP by its very nature does not provide
objective measures by which to describe structural relationships,　whereas CATH, although including such a score　(SSAP), is not based exclusively on structural relationships.

A variety of objective measures are available from the many　structure superposition algorithms that have been introduced　in recent years (Swindells et al. 1998), all of which　are based on one or more geometric criteria that are use to　characterize structural similarity. For example, DALI　(Holm and Sander 1993) measures structural similarity　based on C contact distances while others algorithms such　as SSAP (Orengo and Taylor 1996), VAST (Madej et al.
1995), and PrISM (Yang and Honig 2000a) make use of　secondary structure information. The CE algorithm (Shindyalov　and Bourne 1998) utilizes aligned fragment pairs of　a given length; the program MAMMOTH (Ortiz et al. 2002)　performs sequence- and secondary structure-independent
structural alignments by detecting subsets of aligned pairs;　the MUSTA program (Leibowitz et al. 2001) performs multiple　structure alignments implementing an alignment algorithm　based on a geometric hashing technique that is sequence-　independent (Nussinov and Wolfson 1991).

Structural　similarity is often measured in terms of RMSD, but　this can be ambiguous, as the definition of topologically　equivalent residues to be used in the calculation of an
RMSD is not always clear. The problem becomes more　extreme for distantly related proteins. In addition, RMSD　measures are clearly more meaningful as the length of the　aligned segments increases, a problem that has recently　been addressed by Carugo and Pongor (2001), who introduced　a normalized RMSD score. Structural similarity is　often reported in terms of a Z-score, which measures a　deviation from a database average. Z-scores determined　with different algorithms have no direct relationship but, in　the context of a widely used program such as DALI, provide　an extremely meaningful measure of structural relationships.PrISM, a program developed in our lab (Yang and　Honig 2000a), defines a protein structural distance (PSD)
that is a composite score including a contribution from both　an RMSD term and a term that accounts for the spatial　arrangement of SSEs. In this way it attempts to measure　similarities even for distantly related proteins.

This brief summary illustrates the underlying reality that　there is no unique and objective measure for defining structural　relationships between proteins. Nor, as pointed out by　Godzik, is there unique way of aligning two proteins(Godzik 1996). On the other hand, most algorithms yield　similar conclusions as to what is similar and what is not.　Significant discrepancies do, however, arise, and it may be　worthwhile utilizing multiple approaches in a particular application.

The identification of relationships between proteins based　on structural alignments has a number of goals. The identification　of homologs permits functional inferences so　clearly revealed in the SCOP and CATH databases. In contrast,identifying structural analogs provides no obvious　functional information but may be extremely useful if one is　interested in common sequence features that cause a protein　to adopt a particular topology (Yang and Honig 2000b). It is　interesting in this regard that it is difficult to distinguish　homologs from analogs on purely structural terms (Rost　1997; Russell et al. 1998; Yang and Honig 2000b). This　difficulty reflects the fact that sequence features that determine　structure are often distinct from those that determine　function. Indeed, it is often the case that there is no detectable　sequence conservation pattern common to structures　with the same fold. Thus, when a well-defined sequence　pattern is observed it is generally an indication of an evolutionary
relationship, suggestive of a functional relationship,　rather than resulting from the sequence requirements　for a particular fold (Yang and Honig 2000c).

Sequence- and structure-based remotehomolog and analog detection

Because the detection of functional homologs is the central　approach currently used to assign function to specific genes,　increasing the sensitivity of homolog detection is a problem　of central importance. Pairwise sequence search methods　such as BLAST and FASTA (Pearson and Lipman 1988;　Altschul et al. 1997) can reliably identify homologs down to　about the 30% sequence identity level. More recently, multiple　sequence profile methods such as PSI-Blast (Altschul　et al. 1997) and Hidden Markov Models (HMM; Krogh et　al. 1994; Eddy 1995, 1996; Karplus et al. 1998) have significantly　enhanced our ability to detect remote homologs　beyond what is possible with pairwise sequence methods.　Databases such as Pfam (Bateman et al. 2002) and SMART　(Schultz et al. 2000) contain multiple sequence alignments　of individual protein and domain families. Other sequence　derived database include PROSITE (Falquet et al. 2002),
which contains profiles generated from multiple sequence　alignments; EMOTIF (Huang and Brutlag 2001), which　contains the sequence alignments from the BLOCKS+ and　PRINTS databases (Attwood and Beck 1994; Henikoff et al.　1999), and InterPro (Apweiler et al. 2000), a database combining　information present in the PRINTS, PROSITE,　Pfam, TIGERFAMs (Haft et al. 2001), and ProDom (Servant　et al. 2002) databases and cross-referenced with　BLOCKS. These databases can be searched for functional　relationships using Web-based servers.

Pure sequence-based methods have inherent statistical　limits. The use of structural information has been shown to　increase both the detection sensitivity and alignment accuracy
once a relationship has been detected. Structure-based　3D–1D sequence profiles (Bowie et al. 1991) and fold recognition　methods (Jones et al. 1992) have been in use for　some time (Jones 1997) and will not be reviewed here.　Recent advances in both of these approaches have involved
methods that combine sequence, structural, and functional　information (Johnson et al. 1993; Elofsson et al. 1996;　Fischer et al. 1996; Rice et al. 1997; Rost 1997; Jaroszewski　et al. 1998; Hargbo and Elofsson 1999; Jones et al. 1999;　Kelley et al. 2000; Panchenko et al. 2000; Kolinski et al.　2001; Shi et al. 2001; Reva et al. 2002).

The use of multiple structure alignments to derive multiple　sequence alignments offers an alternative approach to　the mapping of structural information onto sequence (Mizuguchi
et al. 1998; Matsuo and Bryant 1999; Yang and　Honig 2000c; Reddy et al. 2001). Sequence profiles generated　from multiple structure alignments have been used to　identify homologous core structures (Matsuo and Bryant　1999), to identify evolutionarily conserved residues (Mirny
and Shakhnovich 1999; Yang and Honig 2000c; Reddy et　al. 2001), and to derive structure-based substitution matrices　(Ogata et al. 1998; Prlic et al. 2000; Blake and Cohen 2001).　On the other hand, although the use of multiple structure　information can improve alignment quality (Panchenko et　al. 2000; Shi et al. 2001), sequence profiles generated entirely　from multiple structure alignments alone do not perform　particularly well (Panchenko et al. 2000; Gough and
Chothia 2002; Griffiths-Jones and Bateman 2002). These　papers have shown that profiles constructed with the individual　family members as seeds performed better than those　constructed with a single combined alignment of an entire．

family. Recently, multiple structure alignments have been　combined with multiple sequence alignments to enhance　both sequence search sensitivity (Kelley et al. 2000) and　alignment quality (Al-Lazikani et al. 2001). In these approaches,　sequences for which structures are available are　aligned with a multiple structure alignment. Each of these　sequences is then used as a seed to align to related sequences　using a multiple sequence alignment. The individual　multiple sequence alignments are then merged　through the multiple structure alignment to yield what we　will term a sequence enhanced multiple structure alignment.
However, even when sequence and structural information　are combined, strong sequence signals can at times be lost　when three-dimensional position specific scoring matrices(3D-PSSM) are used (Kelley et al. 2000).

Homology modeling

Homology or comparative modeling involves the prediction　of the structure of a query sequence from the structures of　one or more structural templates. The procedure involves　the identification of possible templates that have a clear　sequence relationship to the query, the assembly of the　model, the prediction of regions of the structure that are　likely to have different conformations than the templates　(e.g., loops), and ultimately, the refinement of the structure　in an attempt to account for inherent differences between　the template and query structures. As mentioned above, homology　modeling figures heavily as a rationale for structural　genomics initiatives under the stated assumption that　accurate models can be built for query sequences that have　a greater than 30% sequence identity with their best template.
Of course, the accuracy requirements for a homology　model depend in large part on why the model is being built.　For example, if one is using a model in structure-based drug　design there is a clear need for a highly accurate description　of the ligand binding site. In contrast, if an electrostatic．

The quality of the alignment of the query to the template　sequence is a major factor in determining the quality of　homology models. This is one of the sources of the 30%　rule, because alignment quality usually decreases dramatically　below about 30% sequence identity. (A structural explanation　for this observation has been offered by Chung　and Subbiah, 1996). On the other hand, continuing improvements　in profile-based sequence alignment methods have　extended the range of sequence identities where it seems　pattern on a protein surface (Honig and Nicholls 1995) is of　interest to help identify a binding interface with another　protein, nucleic acid, or membrane, it may be possible to　suffice with a less reliable model based on lowers levels of　sequence identity to the template. Thus, the 30% rule is best　appropriate to attempt the construction of a homology　model. It is interesting in this regard that some of the homology　modeling targets in the CASP4 (Venclovas 2001)　and CASP5 experiments have no homologs in the PDB that　have significant levels of sequence identity with the query.　Rather, the assignment to the homology category is based　on the availability of statistically significant Psi-Blast hits　(Tramontano 1998). Advances in the accuracy of sequence　alignments using structure-based profile methods such as　those described above should result in continuing improvements　in the quality of homology models and in an increase　in the number of sequences for which a meaningful homology　model can be built.　used as a useful guideline rather than as a meaningful cutoff　as to when a model should be viewed as reliable.

Once one or more templates have been identified and　alignments have been decided, a decision must be made as　to how to construct the model. A number of strategies have　been adopted (Sanchez and Sali 1997). When one template　is clearly preferable, the coordinates of the aligned residues　in the query are simply superimposed on those of the template.　When more than one template is available, it is possible　to construct a model based on multiple templates　where the coordinates of the query are required to satisfy　spatial constraints defined by interatomic distances in each　of the templates. This is the main strategy adopted in the　widely used MODELLER program (Sali and Blundell　1993). A third option is to construct a composite model that　is assembled from structural fragments taken from different　templates. A new program, Nest, developed in our lab, allows　the user to choose any one of these options or to allow　the program to decide on its own which option to choose　http://trantor.bioc.columbia.edu/∼xiang/jackal/index.html;　Z. Xiang and B. Honig, in prep.).

Once a model has been constructed for the backbone, the　conformations of side chains that are different between the　query and template need to be predicted. The standard procedures　is to sample rotamer libraries for each residue (Ponder　and Richards 1987; Xiang and Honig 2001; Dunbrack　2002) and search for the combinations of side chain conformations　with the lowest energy (Bower et al. 1997; Levitt　et al. 1997). There are a large number of possible rotamer
combinations for an entire protein, and this has led to the　application of advanced sampling techniques to the problem　(Lee and Subbiah 1991; Vasquez 1996; De Maeyer et al.1997). These seem particularly important for problems of　protein design where the actual sequence of the protein　needs to be determined (Dahiyat and Mayo 1997; Gordon　and Mayo 1999; Looger and Hellinga 2001). However, the　combinatorial problem is much less severe than what was　generally assumed if one wishes to predict the conformations　of buried side chains in a given protein, presumably　because the problem of packing the protein interior is so　constrained by steric factors (Xiang and Honig 2001). If a　large number of rotamers per residue are sampled, it is　generally possible to find low energy conformations thatcorrespond quite closely to the native structure (Xiang and　Honig 2001; Jacobson et al. 2002).

A second problem that has been addressed for many years　is the prediction of the conformation of loops. These often　correspond to regions of the query sequence that are different　than those of the template so that the problem of loop　prediction is an important element in homology modeling.　Two general approaches have been applied to the prediction　of loop conformation: database search and ab initio techniques.　In the database search method, a library of segments　derived from known protein structures is searched for conformations　that fit the topological constraint of the loop　stems. Loop candidates found in this way can then be evaluated　by different criteria such as sequence relationships between　the template and query segment or some measure of　conformational energy. In some applications, such methods
can be very powerful; for example, when canonical structures　exist, as is the case for the hypervariable loops of　antibodies (Chothia and Lesk 1987; Martin and Thornton　1996). However, in general, there is no guarantee that the　correct loop conformation can be found in the PDB. Ab　initio methods involve the generation of a large number of　loop conformations, usually randomly, and their evaluation　based on some sort of energy function (Bruccoleri and　Karplus 1990; Rapp and Friesner 1999; Fiser et al. 2000).Recent advances in ab initio loop prediction (Fiser et al.2000; Xiang et al. 2002) suggest that the approach will, in　general, yield significantly more accurate predictions than　fragment based methods. This is because it is possible to　generate and evaluate far more conformations for a given　loop than are available from fragments in the PDB.

Despite progress in sequence alignment, model building,　and loop and side chain prediction, formidable problemsstill remain in homology modeling. Even the most accurate　loop and side chain procedures only work well when the　conformation of the backbone is accurately known, and this　often is not be the case. Indeed, there are frequently differences　in conformation between query and template, and　unless these can be predicted, they pose an inherent 　imitation　on all model-building procedures. This is another reason　that prediction accuracy declines at low levels of sequence　identity; as proteins diverge more in sequence, they　tend to diverge more in structure (Yang and Honig 2000b).　Thus, even if an alignment is perfect, the accuracy of the　homology model will depend on the degree of structural　similarity between template and query. Structural genomics　initiatives will help solve this problem by providing more　templates with significant levels of sequence identity to every　protein whose structure is not known. In addition, continuing　advances in the ability to evaluate conformational　free energies offer the possibility of significant improvements　in structure prediction.

Programs such as Verify 3D (Luthy et al. 1992) and Prosa　II (Sippl 1993) that provide measures of protein stability arealready widely used in evaluating the reliability of different　models and in identifying regions of a structure that do not　appear to be native-like. However, if a truly accurate　method of evaluating relative conformational free energies　were available the procedure of constructing a large number　of models for each protein and choosing based on their　relative stabilities would become far more effective. Conformational　free energy “scoring functions” fall into two　categories: statistical effective energy functions that are　based on the observed properties of amino acids in known　structures and physical effective energy functions that are　based on a direct evaluation of the conformational free energy　of a protein (Sippl 1995; Samudrala and Moult 1998;　Simons et al. 1999; Lazaridis and Karplus 2000; Petrey and　Honig 2000). Both approaches are generally capable of discriminating
native conformations from incorrectly folded　“decoys,” but this does not guarantee that they are able to　determine which of two partially incorrect structures is closest　to the native conformation. Even if this was possible, the　problem of beginning with a structure that differs from the　native, and relaxing it in such a way so as to arrive at a　near-native conformation, or at least a structure that is closer　to the native than the initial conformation, still remains. Of　course, the two problems are closely linked; the more accurate
the evaluation of the conformational free energy, the　more likely it is that some procedure that produces conformational　change, such as molecular dynamics, will be able　to improve on a model constructed from one or more templates.　The goal of significantly improving a structure constructed　from one or more templates poses a major computational　and theoretical challenge that must be overcome if　there is to be significant progress in homology modeling　beyond what will result from the availability of an increasing　number of structures.　

Structure-based functional analysis

The functional annotation of newly determined structures(Teichmann et al. 2001) is important, not only for gainingbiological insights, but for uncovering novel relationships　between sequence, structure, and function. There are a number　of ways in which structure determination can aid in the　assignment of function. Some of these involve enhancements　in widely used methods while others will require the　development of totally new technologies that exploit three　dimensional structures in novel ways. The assignment of　function based on sequence homology is, of course, the　most direct way of assigning function so that the advances　in remote homolog detection based on structural information,　as summarized above, will inevitably improve function　assignment. Similarly, structural alignment methods allow　the direct determination of relationships between proteins　that are not evident from sequence analysis so that each newstructure has the potential of revealing new functional relationships(Zarembinski et al. 1998; Hwang et al. 1999; Volz　1999; Du et al. 2000).

A number of groups have investigated the correlation　between sequence, structure, and functional similarity using　E.C. numbers as a measure of functional relationships, including
identifying the relationship between CATH or　SCOP classification and E.C. number (Martin et al. 1998;　Hegyi and Gerstein 1999; Devos and Valencia 2000;　Pawlowski et al. 2000; Wilson et al. 2000; Todd et al. 2001;　Rost 2002). Thornton and coworkers (Martin et al. 1998)　report that enzyme function is not closely related to protein　fold, indeed, it is well known that enzymes with very similar　functions can have very different folds (the classic example
being serine proteases such as subtilisin and trypsin). In　contrast, a correlation was found between protein architecture　and ligand type. Russell et al. have found that there are
locations on groups of analogous proteins that show a tendency　to bind substrates despite the absence of any evidence　that these proteins are evolutionarily related (Russell et al.
1998). It is not clear that the existence of such “supersites”　suggests the existence of a common ancestor or whether　they simply reflect an underlying structural or physicalchemical
property of a particular fold. Studies of this type　raise intriguing questions whose answers may become more　accessible as the number of available three-dimensional　structures grows. These goals will be aided by efforts such　as the Gene Ontology (GO) project, which provide a more　detailed functional annotations of gene products than has　been available in the past (Ashburner et al. 2000). GO classifies　sequences into multiple categories, based on their specific　function, biological process, and subcellular location,　and should thus enable the detection of relationships that　would not otherwise be evident.　

Although the expectation is that many proteins solved as　a result of structural genomics initiatives will not have identifiable　sequence or structural homologs, it is clear that in
many cases alignments of global structures will reveal relationships　that aid in the assignment of function. However,there are many ways to define function, and as the desired
definition becomes increasingly precise it becomes necessary　to examine local motifs rather than global features. In　parallel with the sequence motif databases that have been　assembled (e.g., BLOCKS, PRINTS, PROSITE) a number　of groups have developed methods to define three-dimensional　motifs so that each new structure can be searched　for the presence of a particular local pattern. For example,Wallace et al. (1996, 1997) have assembled a database of
three-dimensional templates of active site residues. Their　PROCAT database can be searched (http://www.biochem.ucl.ac.uk/bsm/PROCAT/PROCAT.html) to identify groups　of residues in a query structure with orientations consistent　with those in known active sites (Wallace et al. 1996). Kasuya　and Thornton have shown that many PROSITE patternshave a common three-dimensional structure that provides　a basis for creating a library of functional templates
(Kasuya and Thornton 1999). Skolnick et al. developed geometric　and conformational descriptors of active site residues　(Fetrow and Skolnick 1998; Fetrow et al. 1999). These　“fuzzy functional forms” contain information from sequence　conservation, and biochemical data, as well as describe　geometric relationships between active site residues.

Another approach that has been used to identify functionally　important regions involves the mapping of conserved　sequence features on the protein surface. Methods in this　category include Evolutionary Trace (Lichtarge et al. 1996),　ConSurf (Armon et al. 2001), 3D-Cluster (Landgraf et al.2001), and AL2CO (Pei and Grishin 2001). These approaches　have been used to successfully identify clusters of　specific ligand binding residues (Aloy et al. 2001; Armon et
al. 2001; Pei and Grishin 2001; Lichtarge and Sowa 2002;Madabushi et al. 2002).

The physical and chemical nature of the protein surface　offers an approach to the analysis of function that has only　been partially exploited. Electrostatic, hydrophobicity, and　geometric patterns such as those revealed in the GRASP　program (Nicholls et al. 1991) have been used for some time　to identify functionally important regions on protein surfaces.For example, patches of electrostatic potential are　often indicators of a binding interface, usually to a molecular
with a potential of opposite sign (Honig and Nicholls　1995). This is, however, not always the case; indeed, some　interfaces appear to exploit electrostatic interactions to drive　binding, while in others hydrophobic residues appear to be　the dominant surface feature (Sheinerman and Honig 2002).

The quantitative description of protein surfaces is a problem　that offers considerable potential in the structure-based　analysis of function. In analogy with structural alignment
methods that have been developed to identify common geometric　features in the polypeptide backbone, it would be　extremely valuable to be able to identify common surface　features that are characteristic of a particular binding function.　These might include descriptors of shape, electrostatic　potential, hydrophobicity, and sequence conservation,　mapped onto a protein surface rather onto individual amino　acids. A number of efforts with this goal in mind have
already been reported. These include patch analysis of　(Jones and Thornton 1997), which analyzes surface features　of patches of residues, and the GRASS (Nayal et al. 1999)　and SPIN servers (www.trantor.bioc.columbia.edu), which　can be used to identify functionally relevant esidues in protein　structures. Recently, Klebe and coworkers (Schmitt et　al. 2002) reported a new method to recognize similar binding　pockets on protein surfaces independent of any sequence　or structural relationship these proteins might have.　They describe a database (Cavbase) that contains active-site　cavity descriptors that can be searched to help assign function　to proteins of that may exhibit similar binding properties.

Discussion

In this review we have summarized (although not exhaustively)computational methods that are an integral component　of current structural genomics initiatives. These methods　can aid in the detection of novel relationships between　proteins and in assigning function to individual proteins.　Computational tools in this area can be expected to improve　as additional data become available and as our understanding　of the principles of protein structure and function continues　to develop. Indeed, the advent of structural genomic　initiatives is certain to spur the development of a host of　new computational methods aimed at detecting new relationships
between sequence, structure, and function.　Large-scale analysis is a common underlying theme in　much current research effort in areas characterized by the　“omics” suffix. Genome-wide structure prediction and function　assignment are agreed upon goals as is the structure and
function-based clustering of sequence space. These goals　are clearly of great value, but they are only part of the　picture. Much of biology still involves a focus on individual　problems, and this is likely to remain the case for years to　come. Thus, it becomes appropriate to ask how the vast　quantities of new data can be used to address specific problems　in detail rather than to provide a broad overview of　genome-wide behavior. One approach is to develop a etailed
structural description of entire protein families that is　not accessible if the structures of just a few family members　are available. It is important, for example, not only to know　what family members have in common but also to understand　how they are different. Biological specificity lies in　the differences, for example, in the nucleotide sequencespecific　DNA recognition of closely related transcription　factors or in the differential phosphotyrosine ontaining　peptide binding of different SH2 domains. Knowing the　structures of all family members would clearly be useful in　addressing questions such as these, but the spirit of many
structural genomics initiatives is to avoid solving the structures　of many closely related proteins.

The methods described in this review offer at least a　partial solution to the problem. Continued progress in the　development of sequence and structure alignment methods　will increase the sensitivity of remote homolog detection　and alignment accuracy will continue to improve. In parallel,　homology modeling will become an increasingly more　accurate procedure, and it will become possible to construct　meaningful homology models for entire protein families.　Such models will provide the basis for a more detailed　analysis of function than has been available in the past and,　in the hands of researchers interested in the biology of a
particular problem, will provide powerful tools for the　analysis of experimental data and for the design of new　experiments (see, e.g., Murray and Honig 2002). The number　of structures required to model all members of a particularfamily will not necessarily conform to the 30% rule　mentioned above, but rather may be dictated by the properties　of the family and by the particular question being　asked. This protein family-based approach will still conform　to the structural genomics approach of solving enough　structures so that all others can be obtained from homology　modeling, but the number of structures needed to be solved　will be determined on a case by case basis. A target selection　strategy with these ideas in mind may provide a link　between structural genomics initiatives and more traditional　research in structural biology.

Acknowledgments

This work was supported by grants NIH P50GM62413, NIH　GM30518, and NSF DBI-9904841. We are deeply appreciative of　our interactions with Drs. Gaetano Montelione, Diana Murray,　Burkhard Rost, and Mark Gerstein who are all members of the　bioinformatics group of the NESG Consortium. We also acknowledge　our collaborations with the members of the experimental　group of the NESG Consortium who have provided us with the　data required to apply the functional annotation to current problems　on a real-time basis. We thank Emil Alexov, Chris Tang, Lei
Xie, and Jason Xiang for informative discussions on the topics　covered in this article.

References

Al-Lazikani, B., Sheinerman, F.B., and Honig, B. 2001. Combining multiple　structure and sequence alignments to improve sequence detection and alignment:　Application to the SH2 domains of Janus kinases. Proc. Natl. Acad.Sci. 98: 14796–14801.
Aloy, P., Querol, E., Aviles, F.X., and Sternberg, M.J. 2001. Automated structure-based prediction of functional sites in proteins: Applications to assessing　the validity of inheriting protein function from homology in genome　annotation and to protein docking. J. Mol. Biol. 311: 395–408.
Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W.,and Lipman, D.J. 1997. Gapped BLAST and PSI-BLAST: A new generation　of protein database search programs. Nucleic Acids Res. 25: 3389–3402.
Apweiler, R., Attwood, T.K., Bairoch, A., Bateman, A., Birney, E., Biswas, M.,Bucher, P., Cerutti, L., Corpet, F., Croning, M.D., et al. 2000. InterPro—An　integrated documentation resource for protein families, domains and functionalsites. Bioinformatics 16: 1145–1150.
Armon, A., Graur, D., and Ben-Tal, N. 2001. ConSurf: An algorithmic tool for　the identification of functional regions in proteins by surface mapping of　phylogenetic information. J. Mol. Biol. 307: 447–463.
Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M.,Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al. 2000. Gene　ontology: Tool for the unification of biology. The Gene Ontology Consortium.Nat. Genet. 25: 25–29.
Attwood, T.K. and Beck, M.E. 1994. PRINTS—A protein motif fingerprint　database. Protein Eng. 7: 841–848.
Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M., and Sonnhammer, E.L. 2002. The　Pfam protein families database. Nucleic Acids Res. 30: 276–280.
Bertone, P., Kluger, Y., Lan, N., Zheng, D., Christendat, D., Yee, A., Edwards,A.M., Arrowsmith, C.H., Montelione, G.T., and Gerstein, M. 2001. SPINE:An integrated tracking database and data mining approach for identifying　feasible targets in high-throughput structural proteomics. Nucleic Acids Res.29: 2884–2298.
Blake, J.D. and Cohen, F.E. 2001. Pairwise sequence alignment below the　twilight zone. J. Mol. Biol. 307: 721–735.
Bower, M.J., Cohen, F.E., and Dunbrack Jr., R.L. 1997. Prediction of protein　side-chain rotamers from a backbone-dependent rotamer library: A new　homology modeling tool. J. Mol. Biol. 267: 1268–1282.
Bowie, J.U., Luthy, R., and Eisenberg, D. 1991. A method to identify proteinsequences that fold into a known three-dimensional structure. Science 253:164–170.
Bruccoleri, R.E. and Karplus, M. 1990. Conformational sampling using hightemperature
molecular dynamics. Biopolymers 29: 1847–1862.
Carugo, O. and Pongor, S. 2001. A normalized root-mean-square distance forcomparing protein three-dimensional structures. Protein Sci. 10: 1470–1473.
Chance, M.R., Bresnick, A.R., Burley, S.K., Jiang, J.S., Lima, C.D., Sali, A.,Almo, S.C., Bonanno, J.B., Buglino, J.A., Boulton, S., et al. 2002. Structural　genomics: A pipeline for providing structures for the biologist. Protein Sci.11: 723–738.
Chothia, C. and Lesk, A.M. 1987. Canonical structures for the hypervariable　regions of immunoglobulins. J. Mol. Biol. 196: 901–917.
Chung, S.Y. and Subbiah, S. 1996. A structural explanation for the twilight zone　of protein sequence homology. Structure 4: 1123–1127.
Dahiyat, B.I. and Mayo, S.L. 1997. Probing the role of packing specificity in　protein design. Proc. Natl. Acad. Sci. 94: 10172–10177.
De Maeyer, M., Desmet, J., and Lasters, I. 1997. All in one: A highly detailed　rotamer library improves both accuracy and speed in the modelling of　sidechains by dead-end elimination. Fold. Des. 2: 53–66.
Devos, D. and Valencia, A. 2000. Practical limits of function prediction. Proteins　41: 98–107.
Du, X., Choi, I.G., Kim, R., Wang, W., Jancarik, J., Yokota, H., and Kim, S.H.2000. Crystal structure of an intracellular protease from Pyrococcus horikoshiiat 2-Å resolution. Proc. Natl. Acad. Sci. 97: 14079–14084.
Dunbrack, R.L. 2002. Rotamer libraries in the 21st century. Curr. Opin. Struct.Biol. 12: 431–440.
Eddy, S.R. 1995. Multiple alignment using hidden Markov models. Proc. Int.Conf. Intell. Syst. Mol. Biol. 3: 114–120.
———. 1996. Hidden Markov models. Curr. Opin. Struct. Biol. 6: 361–365.
Elofsson, A., Fischer, D., Rice, D.W., Le Grand, S.M., and Eisenberg, D. 1996.
A study of combined structure/sequence profiles. Fold. Des. 1: 451–461.
Falquet, L., Pagni, M., Bucher, P., Hulo, N., Sigrist, C.J., Hofmann, K., and　Bairoch, A. 2002. The PROSITE database, its status in 2002. Nucleic Acids　Res. 30: 235–238.
Fetrow, J.S. and Skolnick, J. 1998. Method for prediction of protein function　from sequence using the sequence-to-structure-to-function paradigm with　application to glutaredoxins/thioredoxins and T1 ribonucleases. J. Mol.Biol. 281: 949–968.
Fetrow, J.S., Siew, N., and Skolnick, J. 1999. Structure-based functional motif　identifies a potential disulfide oxidoreductase active site in the serine/threonine　protein phosphatase-1 subfamily. FASEB J. 13: 1866–1874.
Fischer, D., Rice, D., Bowie, J.U., and Eisenberg, D. 1996. Assigning amino　acid sequences to 3-dimensional protein folds. FASEB J. 10: 126–136.　Fiser, A., Do, R.K., and Sali, A. 2000. Modeling of loops in protein structures.Protein Sci. 9: 1753–1773.
Godzik, A. 1996. The structural alignment between two proteins: Is there aunique answer? Protein Sci. 5: 1325–1338.
Gordon, D.B. and Mayo, S.L. 1999. Branch-and-terminate: A combinatorial　optimization algorithm for protein design. Struct. Fold. Des. 7: 1089–1098.　Gough, J. and Chothia, C. 2002. SUPERFAMILY: HMMs representing all　proteins of known structure. SCOP sequence searches, alignments and genome　assignments. Nucleic Acids Res. 30: 268–272.
Griffiths-Jones, S. and Bateman, A. 2002. The use of structure information to　increase alignment accuracy does not aid homologue detection with profile　HMMs. Bioinformatics 18: 1243–1249.
Haft, D.H., Loftus, B.J., Richardson, D.L., Yang, F., Eisen, J.A., Paulsen, I.T.,and White, O. 2001. TIGRFAMs: A protein family resource for the functionalidentification of proteins. Nucleic Acids Res. 29: 41–43.
Hargbo, J. and Elofsson, A. 1999. Hidden Markov models that use predicted　secondary structures for fold recognition. Proteins 36: 68–76.
Harrison, A., Pearl, F., Mott, R., Thornton, J., and Orengo, C. 2002. Quantifying　the similarities within fold space. J. Mol. Biol. 323: 909–926.
Hegyi, H. and Gerstein, M. 1999. The relationship between protein structure and　function: A comprehensive survey with application to the yeast genome.J. Mol. Biol. 288: 147–164.
Henikoff, S., Henikoff, J.G., and Pietrokovski, S. 1999. Blocks+: A non-redundant　database of protein alignment blocks derived from multiple compilations.Bioinformatics 15: 471–479.
Holm, L. and Sander, C. 1993. Protein structure comparison by alignment of　distance matrices. J. Mol. Biol. 233: 123–138.
Honig, B. and Nicholls, A. 1995. Classical electrostatics in biology and chemistry.Science 268: 1144–1149.
Huang, J.Y. and Brutlag, D.L. 2001. The EMOTIF database. Nucleic Acids Res.29: 202–204.
Hubbard, T.J., Ailey, B., Brenner, S.E., Murzin, A.G., and Chothia, C. 1999SCOP: A structural classification of proteins database. Nucleic Acids Res.
27: 254–256.
Hwang, K.Y., Chung, J.H., Kim, S.H., Han, Y.S., and Cho, Y. 1999. Structurebased
identification of a novel NTPase from Methanococcus jannaschii.Nat. Struct. Biol. 6: 691–696.
Jacobson, M.P., Friesner, R.A., Xiang, Z., and Honig, B. 2002. On the role of　the crystal environment in determining protein side-chain conformations.J. Mol. Biol. 320: 597–608.
Jaroszewski, L., Rychlewski, L., Zhang, B., and Godzik, A. 1998. Fold prediction　by a hierarchy of sequence, threading, and modeling methods. Protein　Sci. 7: 1431–1440.
Johnson, M.S., Overington, P.J., and Blundell, T.L. 1993. Alignment and　searching for common protein folds using a data bank of structural templates.J. Mol. Biol. 231: 735–752.
Jones, D.T. 1997. Progress in protein structure prediction. Curr. Opin. Struct.Biol. 7: 377–387.
Jones, D.T., Taylor, W.R., and Thornton, J.M. 1992. A new approach to protein　fold recognition. Nature 358: 86–89.
Jones, D.T., Tress, M., Bryson, K., and Hadley, C. 1999. Successful recognition　of protein folds using threading methods biased by sequence similarity and　predicted secondary structure. Proteins Suppl 3: 104–111.
Jones, S., and Thornton, J.M. 1997. Prediction of protein–protein interaction　sites using patch analysis. J. Mol. Biol. 272: 133–143.
Karplus, K., Barrett, C., and Hughey, R. 1998. Hidden Markov models for　detecting remote protein homologies. Bioinformatics 14: 846–856.
Kasuya, A. and Thornton, J.M. 1999. Three-dimensional structure analysis of　PROSITE patterns. J. Mol. Biol. 286: 1673–1691.
Kelley, L.A., MacCallum, R.M., and Sternberg, M.J. 2000. Enhanced genome　annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol.299: 499–520.
Kolinski, A., Betancourt, M.R., Kihara, D., Rotkiewicz, P., and Skolnick, J.　2001. Generalized comparative modeling (GENECOMP): A combination of　sequence comparison, threading, and lattice modeling for protein structureprediction and refinement. Proteins 44: 133–149.
Krogh, A., Brown, M., Mian, I.S., Sjolander, K., and Haussler, D. 1994. Hidden　Markov models in computational biology. Applications to protein modeling.J. Mol. Biol. 235: 1501–1531.
Landgraf, R., Xenarios, I., and Eisenberg, D. 2001. Three-dimensional cluster　analysis identifies interfaces and functional residue clusters in proteins.　J. Mol. Biol. 307: 1487–1502.
Lazaridis, T. and Karplus, M. 2000. Effective energy functions for protein　structure prediction. Curr. Opin. Struct. Biol. 10: 139–145.　
Lee, C. and Subbiah, S. 1991. Prediction of protein side-chain conformation by　packing optimization. J. Mol. Biol. 217: 373–388.
Leibowitz, N., Fligelman, Z.Y., Nussinov, R., and Wolfson, H.J. 2001. Automated　multiple structure alignment and detection of a common substructural　motif. Proteins 43: 235–245.
Levitt, M. and Chothia, C. 1976. Structural patterns in globular proteins. Nature　261: 552–558.
Levitt, M., Gerstein, M., Huang, E., Subbiah, S., and Tsai, J. 1997. Protein　folding: The endgame. Annu. Rev. Biochem. 66: 549–579.
Lichtarge, O. and Sowa, M.E. 2002. Evolutionary predictions of binding surfaces　and interactions. Curr. Opin. Struct. Biol. 12: 21–27.
Lichtarge, O., Bourne, J.R., and Cohen, F.E. 1996. An evolutionary tracemethod defines binding surfaces common to protein families. J. Mol. Biol.257: 342–358.
Looger, L.L. and Hellinga, H.W. 2001. Generalized dead-end elimination algorithms　make large-scale protein side-chain structure prediction tractable:　Implications for protein design and structural genomics. J. Mol. Biol. 307:429–445.
Luthy, R., Bowie, J.U., and Eisenberg, D. 1992. Assessment of protein models　with three-dimensional profiles. Nature 356: 83–85.Madabushi, S., Yao, H., Marsh, M., Kristensen, D.M., Philippi, A., Sowa, M.E.,and Lichtarge, O. 2002. Structural clusters of evolutionary trace residues are　statistically significant and common in proteins. J. Mol. Biol. 316: 139–154.
Madej, T., Gibrat, J.F., and Bryant, S.H. 1995. Threading a database of protein　cores. Proteins 23: 356–369.　Martin, A.C. and Thornton, J.M. 1996. Structural families in loops of homologous　proteins: Automatic classification, modelling and application to antibodies.
J. Mol. Biol. 263: 800–815.　Martin, A.C., Orengo, C.A., Hutchinson, E.G., Jones, S., Karmirantzou, M.,　Laskowski, R.A., Mitchell, J.B., Taroni, C., and Thornton, J.M. 1998. Proteinfolds and functions. Structure 6: 875–884.
Matsuo, Y. and Bryant, S.H. 1999. Identification of homologous core structures.　Proteins 35: 70–79.Mirny, L.A. and Shakhnovich, E.I. 1999. Universally conserved positions in　protein folds: Reading evolutionary signals about stability, folding kinetics　and function. J. Mol. Biol. 291: 177–196.
Mizuguchi, K., Deane, C.M., Blundell, T.L., and Overington, J.P. 1998.HOMSTRAD: A database of protein structure alignments for homologous　families. Protein Sci. 7: 2469–2471.
Murray, D. and Honig, B. 2002. Electrostatic control of the membrane targeting　of C2 domains. Mol. Cell 9:145–154.
Nayal, M., Hitz, B.C., and Honig, B. 1999. GRASS: A server for the graphical　representation and analysis of structures. Protein Sci. 8: 676–679.
Nicholls, A., Sharp, K.A., and Honig, B. 1991. Protein folding and association:　Insights from the interfacial and thermodynamic properties of hydrocarbons.Proteins 11: 281–296.
Nussinov, R. and Wolfson, H.J. 1991. Efficient detection of three-dimensional　structural motifs in biological macromolecules by computer vision techniques.Proc. Natl. Acad. Sci. 88: 10495–10499.
Ogata, K., Ohya, M., and Umeyama, H. 1998. Amino acid similarity matrix for　homology modeling derived from structural alignment and optimized by theMonte Carlo method. J. Mol. Graph. Model 16: 178–189, 254.
Orengo, C.A. and Taylor, W.R. 1996. SSAP: Sequential structure alignment　program for protein structure comparison. Methods Enzymol. 266: 617–635.
Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., Swindells, M.B., and Thornton,　J.M. 1997. CATH—A hierarchic classification of protein domain structures.Structure 5: 1093–1108.
Ortiz, A.R., Strauss, C.E.M., and Olema, O. 2002. MAMMOTH (matching　moleular molecular models obtained from theory): An automated method　for model comparison. Protein Sci. 11: 2606–2621.
Panchenko, A.R., Marchler-Bauer, A., and Bryant, S.H. 2000. Combination　of threading potentials and sequence profiles improves fold recognition.J. Mol. Biol. 296: 1319–1331.
Pawlowski, K., Jaroszewski, L., Rychlewski, L., and Godzik, A. 2000. Sensitive　sequence comparison as protein function predictor. Pacific Symposium on　Biocomputing 5: 42–53.
Pearson, W.R. and Lipman, D.J. 1988. Improved tools for biological sequence　comparison. Proc. Natl. Acad. Sci. 85: 2444–2448.
Pei, J. and Grishin, N.V. 2001. AL2CO: Calculation of positional conservation　in a protein sequence alignment. Bioinformatics 17: 700–712.
Petrey, D. and Honig, B. 2000. Free energy determinants of tertiary structure　and the evaluation of protein models. Protein Sci. 9: 2181–2191.
Ponder, J.W., and Richards, F.M. 1987. Tertiary templates for proteins. Use of　packing criteria in the enumeration of allowed sequences for different structural　classes. J. Mol. Biol. 193: 775–791.
Prlic, A., Domingues, F.S., and Sippl, M.J. 2000. Structure-derived substitution　matrices for alignment of distantly related sequences. Protein Eng. 13:545–550.
Rapp, C.S. and Friesner, R.A. 1999. Prediction of loop geometries using a　generalized born model of solvation effects. Proteins 35: 173–183.
Reddy, B.V., Li, W.W., Shindyalov, I.N., and Bourne, P.E. 2001. Conserved　key amino acid positions (CKAAPs) derived from the analysis of common　substructures in proteins. Proteins 42: 148–163.
Reva, B., Finkelstein, A., and Topiol, S. 2002. Threading with chemostructural　restrictions method for predicting fold and functionally significant residues:　Application to dipeptidylpeptidase IV (DPP-IV). Proteins 47: 180–193.
Rice, D.W., Fischer, D., Weiss, R., and Eisenberg, D. 1997. Fold assignments　for amino acid sequences of the CASP2 experiment. Proteins Suppl 1:113–122.
Richardson, J.S. 1981. The anatomy and taxonomy of protein structure. Adv.　Protein Chem. 34: 167–339.
Rost, B. 1997. Protein structures sustain evolutionary drift. Fold. Des. 2: S19–S24.
———. 2002. Enzyme function less conserved than anticipated. J. Mol. Biol.318: 595–608.
Rost, B., Schneider, R., and Sander, C. 1997. Protein fold recognition by prediction-based threading. J. Mol. Biol. 270: 471–480.
Russell, R.B., Sasieni, P.D., and Sternberg, M.J. 1998. Supersites within superfolds.Binding site similarity in the absence of homology. J. Mol. Biol. 282:903–918.
Sali, A. and Blundell, T.L. 1993. Comparative protein modelling by satisfaction　of spatial restraints. J. Mol. Biol. 234: 779–815.
Samudrala, R. and Moult, J. 1998. An all-atom distance-dependent conditional　probability discriminatory function for protein structure prediction. J. Mol.Biol. 275: 895–916.
Sanchez, R. and Sali, A. 1997. Advances in comparative protein-structure modelling.Curr. Opin. Struct. Biol. 7: 206–214.
Schmitt, S., Kuhn, D., and Klebe, G. 2002. A new method to detect relatedfunction among proteins independent of sequence and fold homology.J. Mol. Biol. 323: 387–406.
Schultz, J., Copley, R.R., Doerks, T., Ponting, C.P., and Bork, P. 2000.
SMART: A web-based tool for the study of genetically mobile domains.Nucleic Acids Res. 28: 231–234.
Servant, F., Bru, C., Carrere, S., Courcelle, E., Gouzy, J., Peyruc, D., and Kahn,D. 2002. ProDom: Automated clustering of homologous domains. BriefBioinform. 3: 246–251.
Sheinerman, F.B. and Honig, B. 2002. On the role of electrostatic interactionsin the design of protein–protein interfaces. J. Mol. Biol. 318: 161–177.
Shi, J., Blundell, T.L., and Mizuguchi, K. 2001. FUGUE: Sequence-structurehomology recognition using environment-specific substitution tables andstructure-dependent gap penalties. J. Mol. Biol. 310: 243–257.
Shindyalov, I.N. and Bourne, P.E. 1998. Protein structure alignment by incrementalcombinatorial extension (CE) of the optimal path. Protein Eng. 11:739–747.
———. 2000. An alternative view of protein fold space. Proteins 38: 247–260.Simons, K.T., Ruczinski, I., Kooperberg, C., Fox, B.A., Bystroff, C., and Baker,D. 1999. Improved recognition of native-like protein structures using acombination of sequence-dependent and sequence-independent features of　proteins. Proteins 34: 82–95.
Sippl, M.J. 1993. Recognition of errors in three-dimensional structures of proteins.　Proteins 17: 355–362.
———. 1995. Knowledge-based potentials for proteins. Curr. Opin. Struct.　Biol. 5: 229–235.
Swindells, M.B., Orengo, C.A., Jones, D.T., Hutchinson, E.G., and Thornton,J.M. 1998. Contemporary approaches to protein structure classification.Bioessays 20: 884–891.
Teichmann, S.A., Murzin, A.G., and Chothia, C. 2001. Determination of protein　function, evolution and interactions by structural genomics. Curr. Opin.　Struct. Biol. 11: 354–363.
Todd, A.E., Orengo, C.A., and Thornton, J.M. 2001. Evolution of function in　protein superfamilies, from a structural perspective. J. Mol. Biol. 307:1113–1143.
Tramontano, A. 1998. Homology modeling with low sequence identity. Methods　14: 293–300.
Vasquez, M. 1996. Modeling side-chain conformation. Curr. Opin. Struct. Biol.6: 217–221.
Venclovas, C. 2001. Comparative modeling of CASP4 target proteins: Combining　results of sequence search with three-dimensional structure assessment.Proteins Suppl 5: 47–54.
Vitkup, D., Melamud, E., Moult, J., and Sander, C. 2001. Completeness in　structural genomics. Nat. Struct. Biol. 8: 59–66.
Volz, K. 1999. A test case for structure-based functional assignment: The 1.2 Å　crystal structure of the yjgF gene product from Escherichia coli. Protein Sci.　8: 2428–2437.
Wallace, A.C., Laskowski, R.A., and Thornton, J.M. 1996. Derivation of 3D　coordinate templates for searching structural databases: Application to Ser-His-Asp catalytic triads in the serine proteinases and lipases. Protein Sci. 5:1001–1013.
Wallace, A.C., Borkakoti, N., and Thornton, J.M. 1997. TESS: A geometric hashing　algorithm for deriving 3D coordinate templates for searching structural　databases. Application to enzyme active sites. Protein Sci. 6: 2308–2323.
Wilson, C.A., Kreychman, J., and Gerstein, M. 2000. Assessing annotation　transfer for genomics: Quantifying the relations between protein sequence,　structure and function through traditional and probabilistic scores. J. Mol.Biol. 297: 233–249.
Xiang, Z. and Honig, B. 2001. Extending the accuracy limits of prediction for　side-chain conformations. J. Mol. Biol. 311: 421–430.　Xiang, Z., Soto, C.S., and Honig, B. 2002. Evaluating conformational freeenergies: The colony energy and its application to the problem of loop　prediction. Proc. Natl. Acad. Sci. 99: 7432–7437.
Yang, A.S. and Honig, B. 2000a. An integrated approach to the analysis and　modeling of protein sequences and structures. I. Protein structural alignment　and a quantitative measure for protein structural distance. J. Mol. Biol. 301:665–678.
———. 2000b. An integrated approach to the analysis and modeling of protein　sequences and structures. II. On the relationship between sequence and　structural similarity for proteins that are not obviously related in sequence.　J. Mol. Biol. 301: 679–689.
———. 2000c. An integrated approach to the analysis and modeling of protein　sequences and structures. III. A comparative study of sequence conservation　in protein structural families using multiple structural alignments. J. Mol.Biol. 301: 691–711.
Zarembinski, T.I., Hung, L.W., Mueller-Dieckmann, J.H., Kim, K.K., Yokota,　H., Kim, R., and Kim, S.H. 1998. Structure-based assignment of the biochemical　function of a hypothetical protein: A test case of structural genomics.Proc. Natl. Acad. Sci. 95: 15189–15193.

• 衛(wèi)生部：先看病后付費(fèi)不宜推行	• 控費(fèi)失當(dāng)危及醫(yī)保制度根在公立醫(yī)院改革
• 公立醫(yī)院改革將如何影響醫(yī)療器械產(chǎn)業(yè)？	• 醫(yī)改持續(xù)深化影響漸顯藥品流通業(yè)利潤(rùn)增幅下滑
• 三明醫(yī)改三回歸：砍掉虛高抹掉灰色	• 黃潔夫：改善醫(yī)療生態(tài)環(huán)境是醫(yī)改成功的關(guān)鍵
• 陳仲強(qiáng)委員：發(fā)展社會(huì)辦醫(yī)先要卸包袱	• 從常用藥品短缺現(xiàn)象談醫(yī)藥價(jià)格改革
• 回眸2012年新醫(yī)改—邁向病有所醫(yī)新時(shí)代	• 互聯(lián)網(wǎng)醫(yī)療能否拆除醫(yī)院壁壘,？目前多以預(yù)約掛號(hào)

互邦電動(dòng)輪椅HBLD1-A(	海波 811B-D雙人電動(dòng)
海波 811B 經(jīng)濟(jì)型電動(dòng)	互邦電動(dòng)輪椅HBLD2-A

VIP

推廣服務(wù)

結(jié)構(gòu)基因組：在結(jié)構(gòu)分析中的計(jì)算方法

平臺(tái)客服